Re: [PATCH V2] mm: Replace all open encodings for NUMA_NO_NODE

2018-11-26 Thread Michael Ellerman
Anshuman Khandual  writes:
> At present there are multiple places where an invalid node number is encoded
> as -1. Even though this is implicitly understood, it is always better to have
> a macro for it. Replace these open encodings for an invalid node number with
> the global macro NUMA_NO_NODE. This helps remove NUMA related assumptions like
> 'invalid node' from various places, redirecting them to a common definition.
>
> Signed-off-by: Anshuman Khandual 
> ---
> Changes in V2:
>
> - Added inclusion of 'numa.h' header at various places per Andrew
> - Updated 'dev_to_node' to use NUMA_NO_NODE instead per Vinod
>
> Changes in V1: (https://lkml.org/lkml/2018/11/23/485)
>
> - Dropped OCFS2 changes per Joseph
> - Dropped media/video drivers changes per Hans
>
> RFC - https://patchwork.kernel.org/patch/10678035/
>
> Build tested this with multiple cross compiler options like alpha, sparc,
> arm64, x86, powerpc, powerpc64le etc with their default configs, which might
> not have compile tested all the driver related changes. I would appreciate
> folks giving this a test in their respective build environments.
>
> All these places for replacement were found by running the following grep
> patterns on the entire kernel tree. Please let me know if this might have
> missed some instances or replaced some false positives. I would appreciate
> suggestions, inputs and review.
>
> 1. git grep "nid == -1"
> 2. git grep "node == -1"
> 3. git grep "nid = -1"
> 4. git grep "node = -1"
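
(For anyone skimming the diffstat below, the change pattern throughout the
series is essentially the following; a made-up illustration, not a hunk from
the patch. NUMA_NO_NODE comes from include/linux/numa.h, where it is defined
as -1.)

#include <linux/numa.h>

static int example_pick_node(void)
{
	/* Before this series, code open-coded the invalid node: */
	int nid = -1;

	/* After: same value, spelled via the common definition. */
	nid = NUMA_NO_NODE;

	return nid;
}
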
>
>  arch/alpha/include/asm/topology.h |  3 ++-
>  arch/ia64/kernel/numa.c   |  2 +-
>  arch/ia64/mm/discontig.c  |  6 +++---
>  arch/ia64/sn/kernel/io_common.c   |  3 ++-
>  arch/powerpc/include/asm/pci-bridge.h |  3 ++-
>  arch/powerpc/kernel/paca.c|  3 ++-
>  arch/powerpc/kernel/pci-common.c  |  3 ++-
>  arch/powerpc/mm/numa.c| 14 +++---
>  arch/powerpc/platforms/powernv/memtrace.c |  5 +++--

These powerpc changes all look fine.

Acked-by: Michael Ellerman 


cheers


Re: [PATCH] lkdtm: print real addresses

2018-11-26 Thread Greg Kroah-Hartman
On Wed, Nov 07, 2018 at 08:14:10PM +, Christophe Leroy wrote:
> Today, when doing an lkdtm test before the random generator is
> ready, (ptrval) is printed instead of the address at which the
> fault is performed:
> 
> [ 1597.337030] lkdtm: Performing direct entry EXEC_USERSPACE
> [ 1597.337142] lkdtm: attempting ok execution at (ptrval)
> [ 1597.337398] lkdtm: attempting bad execution at (ptrval)
> [ 1597.337460] kernel tried to execute user page (77858000) -exploit attempt? 
> (uid: 0)
> [ 1597.344769] Unable to handle kernel paging request for instruction fetch
> [ 1597.351392] Faulting instruction address: 0x77858000
> [ 1597.356312] Oops: Kernel access of bad area, sig: 11 [#1]
> 
> If the lkdtm test is done later on, it prints a hashed address.
> 
> In both cases this is pointless. The purpose of the test is to
> ensure the kernel generates an Oops at the expected address,
> so real addresses need to be printed. This patch fixes that.

I am pretty sure this is intentional.  Kees?
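
For context, the distinction being discussed is the standard printk pointer
hashing: %p prints a hashed value (and "(ptrval)" until the hashing key is
available early in boot), while %px prints the raw address. A minimal sketch,
not a hunk from the patch, where addr is the address the test will jump to:

	/* %p: hashed (prints "(ptrval)" before the random generator is ready) */
	pr_info("lkdtm: attempting bad execution at %p\n", addr);

	/* %px: raw address, which is what you need to compare with the Oops */
	pr_info("lkdtm: attempting bad execution at %px\n", addr);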



Re: [PATCH] lkdtm: do not depend on CONFIG_BLOCK

2018-11-26 Thread Greg Kroah-Hartman
On Fri, Nov 09, 2018 at 07:05:51AM +, Christophe Leroy wrote:
> Most parts of lkdtm don't require CONFIG_BLOCK.
> 
> This patch limits the dependency on CONFIG_BLOCK to the parts that need
> it, in order to give embedded platforms which don't select CONFIG_BLOCK
> the opportunity to use LKDTM.
> 
> Fixes: fddd9cf82c9f ("make LKDTM depend on BLOCK")
> Signed-off-by: Christophe Leroy 
> ---
>  drivers/misc/lkdtm/core.c | 7 ++-
>  lib/Kconfig.debug | 1 -
>  2 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/misc/lkdtm/core.c b/drivers/misc/lkdtm/core.c
> index 2837dc77478e..bc76756b7eda 100644
> --- a/drivers/misc/lkdtm/core.c
> +++ b/drivers/misc/lkdtm/core.c
> @@ -40,9 +40,12 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  #include 
>  
> +#ifdef CONFIG_BLOCK
> +#include 
> +#endif

Why would this config option be needed to be checked just to be able to
include a .h file?

And shouldn't you be depending on SCSI instead?

> +
>  #ifdef CONFIG_IDE
>  #include 
>  #endif
> @@ -101,7 +104,9 @@ static struct crashpoint crashpoints[] = {
>   CRASHPOINT("FS_DEVRW",   "ll_rw_block"),
>   CRASHPOINT("MEM_SWAPOUT","shrink_inactive_list"),
>   CRASHPOINT("TIMERADD",   "hrtimer_start"),
> +# ifdef CONFIG_BLOCK
>   CRASHPOINT("SCSI_DISPATCH_CMD",  "scsi_dispatch_cmd"),
> +# endif

Again, scsi?

thanks,

greg k-h


Re: use generic DMA mapping code in powerpc V4

2018-11-26 Thread Christoph Hellwig
Any comments?  I'd like to at least get the ball moving on the easy
bits.

On Wed, Nov 14, 2018 at 09:22:40AM +0100, Christoph Hellwig wrote:
> Hi all,
> 
> this series switches the powerpc port to use the generic swiotlb and
> noncoherent dma ops, and to use more generic code for the coherent
> direct mapping, as well as removing a lot of dead code.
> 
> As this series is very large and depends on the dma-mapping tree I've
> also published a git tree:
> 
> git://git.infradead.org/users/hch/misc.git powerpc-dma.4
> 
> Gitweb:
> 
> 
> http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/powerpc-dma.4
> 
> Changes since v3:
>  - rebase on the powerpc fixes tree
>  - add a new patch to actually make the baseline amigaone config
>configure without warnings
>  - only use ZONE_DMA for 64-bit embedded CPUs, on pseries an IOMMU is
>always present
>  - fix compile in mem.c for one configuration
>  - drop the full npu removal for now, will be resent separately
>  - a few git bisection fixes
> 
> The changes since v1 are too big to list and v2 was not posted in public.
> 
---end quoted text---


Re: [PATCH RESEND] powerpc/perf: Update perf_regs structure to include SIER

2018-11-26 Thread Christophe LEROY




On 26/11/2018 at 19:04, Madhavan Srinivasan wrote:

On each sample, the Sample Instruction Event Register (SIER) content
is saved in pt_regs. SIER does not have an entry as-is in pt_regs;
instead, its content is saved in the "dar" register of pt_regs.

This patch adds another entry to the perf_regs structure to include the
"SIER" printing, which internally maps to the "dar" of pt_regs.

Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Alexander Shishkin 
Cc: Anju T Sudhakar 
Cc: Ravi Bangoria 
Tested-by: Ravi Bangoria 
Signed-off-by: Madhavan Srinivasan 
---


I see this patch is already in patchwork 
https://patchwork.ozlabs.org/patch/941908/ and I can't spot any 
difference between the two. Please say here (under the ---) the reason 
for resending.


Christophe



  arch/powerpc/include/uapi/asm/perf_regs.h   | 1 +
  arch/powerpc/perf/perf_regs.c   | 1 +
  tools/arch/powerpc/include/uapi/asm/perf_regs.h | 1 +
  tools/perf/arch/powerpc/include/perf_regs.h | 3 ++-
  tools/perf/arch/powerpc/util/perf_regs.c| 1 +
  5 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/uapi/asm/perf_regs.h 
b/arch/powerpc/include/uapi/asm/perf_regs.h
index 9e52c86ccbd3..ff91192407d1 100644
--- a/arch/powerpc/include/uapi/asm/perf_regs.h
+++ b/arch/powerpc/include/uapi/asm/perf_regs.h
@@ -46,6 +46,7 @@ enum perf_event_powerpc_regs {
PERF_REG_POWERPC_TRAP,
PERF_REG_POWERPC_DAR,
PERF_REG_POWERPC_DSISR,
+   PERF_REG_POWERPC_SIER,
PERF_REG_POWERPC_MAX,
  };
  #endif /* _UAPI_ASM_POWERPC_PERF_REGS_H */
diff --git a/arch/powerpc/perf/perf_regs.c b/arch/powerpc/perf/perf_regs.c
index 09ceea6175ba..c262aea22ad9 100644
--- a/arch/powerpc/perf/perf_regs.c
+++ b/arch/powerpc/perf/perf_regs.c
@@ -69,6 +69,7 @@ static unsigned int pt_regs_offset[PERF_REG_POWERPC_MAX] = {
PT_REGS_OFFSET(PERF_REG_POWERPC_TRAP, trap),
PT_REGS_OFFSET(PERF_REG_POWERPC_DAR, dar),
PT_REGS_OFFSET(PERF_REG_POWERPC_DSISR, dsisr),
+   PT_REGS_OFFSET(PERF_REG_POWERPC_SIER, dar),
  };
  
  u64 perf_reg_value(struct pt_regs *regs, int idx)

diff --git a/tools/arch/powerpc/include/uapi/asm/perf_regs.h 
b/tools/arch/powerpc/include/uapi/asm/perf_regs.h
index 9e52c86ccbd3..ff91192407d1 100644
--- a/tools/arch/powerpc/include/uapi/asm/perf_regs.h
+++ b/tools/arch/powerpc/include/uapi/asm/perf_regs.h
@@ -46,6 +46,7 @@ enum perf_event_powerpc_regs {
PERF_REG_POWERPC_TRAP,
PERF_REG_POWERPC_DAR,
PERF_REG_POWERPC_DSISR,
+   PERF_REG_POWERPC_SIER,
PERF_REG_POWERPC_MAX,
  };
  #endif /* _UAPI_ASM_POWERPC_PERF_REGS_H */
diff --git a/tools/perf/arch/powerpc/include/perf_regs.h 
b/tools/perf/arch/powerpc/include/perf_regs.h
index 00e37b106913..1076393e6f43 100644
--- a/tools/perf/arch/powerpc/include/perf_regs.h
+++ b/tools/perf/arch/powerpc/include/perf_regs.h
@@ -62,7 +62,8 @@ static const char *reg_names[] = {
[PERF_REG_POWERPC_SOFTE] = "softe",
[PERF_REG_POWERPC_TRAP] = "trap",
[PERF_REG_POWERPC_DAR] = "dar",
-   [PERF_REG_POWERPC_DSISR] = "dsisr"
+   [PERF_REG_POWERPC_DSISR] = "dsisr",
+   [PERF_REG_POWERPC_SIER] = "sier"
  };
  
  static inline const char *perf_reg_name(int id)

diff --git a/tools/perf/arch/powerpc/util/perf_regs.c 
b/tools/perf/arch/powerpc/util/perf_regs.c
index ec50939b0418..07fcd977d93e 100644
--- a/tools/perf/arch/powerpc/util/perf_regs.c
+++ b/tools/perf/arch/powerpc/util/perf_regs.c
@@ -52,6 +52,7 @@ const struct sample_reg sample_reg_masks[] = {
SMPL_REG(trap, PERF_REG_POWERPC_TRAP),
SMPL_REG(dar, PERF_REG_POWERPC_DAR),
SMPL_REG(dsisr, PERF_REG_POWERPC_DSISR),
+   SMPL_REG(sier, PERF_REG_POWERPC_SIER),
SMPL_REG_END
  };
  



Re: [PATCH v5 2/4] kgdb: Fix kgdb_roundup_cpus() for arches who used smp_call_function()

2018-11-26 Thread kbuild test robot
Hi Douglas,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on kgdb/kgdb-next]
[also build test ERROR on v4.20-rc4 next-20181126]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:
https://github.com/0day-ci/linux/commits/Douglas-Anderson/kgdb-Fix-kgdb_roundup_cpus/20181127-115425
base:   https://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb.git kgdb-next
config: sparc64-allyesconfig (attached as .config)
compiler: sparc64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=7.2.0 make.cross ARCH=sparc64

Note: the linux-review/Douglas-Anderson/kgdb-Fix-kgdb_roundup_cpus/20181127-115425 HEAD b8d0502e65f2d7a2187baa69870146a6fbf18c9f builds fine.
      It only hurts bisectibility.

All errors (new ones prefixed by >>):

   kernel/debug/debug_core.c: In function 'kgdb_roundup_cpus':
>> kernel/debug/debug_core.c:261:18: error: 'struct debuggerinfo_struct' has no member named 'rounding_up'
       kgdb_info[cpu].rounding_up = false;
                      ^

vim +261 kernel/debug/debug_core.c

   244  
   245  void __weak kgdb_roundup_cpus(void)
   246  {
   247  call_single_data_t *csd;
   248  int this_cpu = raw_smp_processor_id();
   249  int cpu;
   250  int ret;
   251  
   252  for_each_online_cpu(cpu) {
   253  /* No need to roundup ourselves */
   254  if (cpu == this_cpu)
   255  continue;
   256  
   257  csd = &per_cpu(kgdb_roundup_csd, cpu);
   258  csd->func = kgdb_call_nmi_hook;
   259  ret = smp_call_function_single_async(cpu, csd);
   260  if (ret)
 > 261  kgdb_info[cpu].rounding_up = false;
   262  }
   263  }
   264  

---
0-DAY kernel test infrastructure            Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation




Re: [PATCH v4 2/4] kgdb: Fix kgdb_roundup_cpus() for arches who used smp_call_function()

2018-11-26 Thread Doug Anderson
Hi,

On Wed, Nov 14, 2018 at 2:07 PM Will Deacon  wrote:
>
> > +void __weak kgdb_call_nmi_hook(void *ignored)
> > +{
> > + kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
> > +}
>
> I suppose you could pass the cpu as an argument, but it doesn't really
> matter.

I probably won't change it for now if it doesn't matter...


> Also, I think there are cases where the CSD callback can run without
> having received an IPI, so we could potentially up passing NULL for the regs
> here which probably goes boom.

Hrm, good point.  This is not a new issue so I'd tend to add it to the
TODO list rather than block the series.  I'll also add a comment in
the code since I'm touching it anyway.

Interestingly enough quite a bit of things continue to work just fine
even if this is NULL.  I simulated setting this to NULL for all CPUs
and I could drop into the debugger, type "btc" to backtrace all CPUs,
attach to kgdb, etc.  ...but when I typed "cpu 1" it went boom.  So it
seems like parts of kdb use this but definitely not everything.

Also interesting is that on MIPS this is always NULL.  I have no idea
why but my patch series preserves this oddity.  Presumably if someone
was on a SMP MIPS machine and did "cpu 1" from kdb they'd go boom too.

In general kdb has a lot of crufty stuff like this in it.  We need to
work to get rid of the cruft but one step at a time I think.

I've started a kgdb-wishlist:

https://bugs.chromium.org/p/chromium/issues/list?q=label%3Akgdb-wishlist

...and this is crbug.com/908723


> > +void __weak kgdb_roundup_cpus(void)
> > +{
> > + call_single_data_t *csd;
> > + int this_cpu = get_cpu();
>
> Do you actually need to disable preemption here? afaict, irqs are already
> disabled by the kgdb core.

Ah, right.  I can just use raw_smp_processor_id().  Done.  I didn't
try to see if I could use smp_processor_id() since
kgdb_call_nmi_hook() already used raw_smp_processor_id(), but I can
dig if you wish.


> > + int cpu;
> > +
> > + for_each_cpu(cpu, cpu_online_mask) {
>
> for_each_online_cpu(cpu) ?

Done.


> I'm assuming this is serialised wrt CPU hotplug somehow?

I doubt it.  I can add it to my wishlist (crbug.com/908722), but I
don't think it's something I'm going to try to solve right now and
it's definitely not new.  I think we need to make some sort of attempt
in kgdb_cpu_enter() to stop hotplugging, though we'd have to take into
account that we may be entering kgdb in an IRQ context so it might be
hard to grab a mutex.  We need to account for it there since that
function has code like:

> while (kgdb_do_roundup && --time_left &&
>        (atomic_read(&masters_in_kgdb) + atomic_read(&slaves_in_kgdb)) !=
>        online_cpus)
>         udelay(1000);

...and that would also be broken if cpus were plugging / unplugging.

In general, at least, the worst case would be that we'd either have an
extra 1 second delay entering the debugger (because we were waiting
for a CPU to respond that's been hotplugged) or we'd enter kgdb
without stopping one of the CPUs.  Neither of those is ideal but I
don't think we'd end up in too bad shape.

Oh, but actually, I guess I should probably check the error return of
smp_call_function_single_async() and if it returns an error I should
unset rounding_up...  That would make things behave slightly better
and is probably right anyway.
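
For reference, the check being described here is roughly what ended up in v5
(it is also visible in the kbuild robot report earlier in this digest):

	ret = smp_call_function_single_async(cpu, csd);
	if (ret) {
		/* Request never went out; don't wait for this cpu to round up. */
		kgdb_info[cpu].rounding_up = false;
	}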


Overall: thank you very much for the review and the feedback.  Sorry
I'm not really fixing everything here.  My hope is to move things to a
slightly better state but I don't have time to fix everything.
Hopefully I can find some more time soon to fix more, or perhaps
someone else will.


-Doug


[PATCH v5 2/4] kgdb: Fix kgdb_roundup_cpus() for arches who used smp_call_function()

2018-11-26 Thread Douglas Anderson
When I had lockdep turned on and dropped into kgdb I got a nice splat
on my system.  Specifically it hit:
  DEBUG_LOCKS_WARN_ON(current->hardirq_context)

Specifically it looked like this:
  sysrq: SysRq : DEBUG
  [ cut here ]
  DEBUG_LOCKS_WARN_ON(current->hardirq_context)
  WARNING: CPU: 0 PID: 0 at .../kernel/locking/lockdep.c:2875 
lockdep_hardirqs_on+0xf0/0x160
  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.0 #27
  pstate: 604003c9 (nZCv DAIF +PAN -UAO)
  pc : lockdep_hardirqs_on+0xf0/0x160
  ...
  Call trace:
   lockdep_hardirqs_on+0xf0/0x160
   trace_hardirqs_on+0x188/0x1ac
   kgdb_roundup_cpus+0x14/0x3c
   kgdb_cpu_enter+0x53c/0x5cc
   kgdb_handle_exception+0x180/0x1d4
   kgdb_compiled_brk_fn+0x30/0x3c
   brk_handler+0x134/0x178
   do_debug_exception+0xfc/0x178
   el1_dbg+0x18/0x78
   kgdb_breakpoint+0x34/0x58
   sysrq_handle_dbg+0x54/0x5c
   __handle_sysrq+0x114/0x21c
   handle_sysrq+0x30/0x3c
   qcom_geni_serial_isr+0x2dc/0x30c
  ...
  ...
  irq event stamp: ...45
  hardirqs last  enabled at (...44): [...] __do_softirq+0xd8/0x4e4
  hardirqs last disabled at (...45): [...] el1_irq+0x74/0x130
  softirqs last  enabled at (...42): [...] _local_bh_enable+0x2c/0x34
  softirqs last disabled at (...43): [...] irq_exit+0xa8/0x100
  ---[ end trace adf21f830c46e638 ]---

Looking closely at it, it seems like a really bad idea to be calling
local_irq_enable() in kgdb_roundup_cpus().  If nothing else that seems
like it could violate spinlock semantics and cause a deadlock.

Instead, let's use a private csd alongside
smp_call_function_single_async() to round up the other CPUs.  Using
smp_call_function_single_async() doesn't require interrupts to be
enabled so we can remove the offending bit of code.

In order to avoid duplicating this across all the architectures that
use the default kgdb_roundup_cpus(), we'll add a "weak" implementation
to debug_core.c.

Looking at all the people who previously had copies of this code,
there were a few variants.  I've attempted to keep the variants
working like they used to.  Specifically:
* For arch/arc we passed NULL to kgdb_nmicallback() instead of
  get_irq_regs().
* For arch/mips there was a bit of extra code around
  kgdb_nmicallback()

NOTE: In this patch we will still get into trouble if we try to round
up a CPU that failed to round up before.  We'll try to round it up
again and potentially hang when we try to grab the csd lock.  That's
not new behavior but we'll still try to do better in a future patch.

Suggested-by: Daniel Thompson 
Signed-off-by: Douglas Anderson 
---

Changes in v5:
- Add a comment about get_irq_regs().
- get_cpu() => raw_smp_processor_id() in kgdb_roundup_cpus().
- for_each_cpu() => for_each_online_cpu()
- Error check smp_call_function_single_async()

Changes in v4: None
Changes in v3:
- No separate init call.
- Don't round up the CPU that is doing the rounding up.
- Add "#ifdef CONFIG_SMP" to match the rest of the file.
- Updated desc saying we don't solve the "failed to roundup" case.
- Document the ignored parameter.

Changes in v2:
- Removing irq flags separated from fixing lockdep splat.
- Don't use smp_call_function (Daniel).

 arch/arc/kernel/kgdb.c | 10 ++---
 arch/arm/kernel/kgdb.c | 12 ---
 arch/arm64/kernel/kgdb.c   | 12 ---
 arch/hexagon/kernel/kgdb.c | 27 ---
 arch/mips/kernel/kgdb.c|  9 +---
 arch/powerpc/kernel/kgdb.c |  4 ++--
 arch/sh/kernel/kgdb.c  | 12 ---
 include/linux/kgdb.h   | 15 +++--
 kernel/debug/debug_core.c  | 44 ++
 9 files changed, 62 insertions(+), 83 deletions(-)

diff --git a/arch/arc/kernel/kgdb.c b/arch/arc/kernel/kgdb.c
index 0932851028e0..68d9fe4b5aa7 100644
--- a/arch/arc/kernel/kgdb.c
+++ b/arch/arc/kernel/kgdb.c
@@ -192,18 +192,12 @@ void kgdb_arch_set_pc(struct pt_regs *regs, unsigned long 
ip)
instruction_pointer(regs) = ip;
 }
 
-static void kgdb_call_nmi_hook(void *ignored)
+void kgdb_call_nmi_hook(void *ignored)
 {
+   /* Default implementation passes get_irq_regs() but we don't */
kgdb_nmicallback(raw_smp_processor_id(), NULL);
 }
 
-void kgdb_roundup_cpus(void)
-{
-   local_irq_enable();
-   smp_call_function(kgdb_call_nmi_hook, NULL, 0);
-   local_irq_disable();
-}
-
 struct kgdb_arch arch_kgdb_ops = {
/* breakpoint instruction: TRAP_S 0x3 */
 #ifdef CONFIG_CPU_BIG_ENDIAN
diff --git a/arch/arm/kernel/kgdb.c b/arch/arm/kernel/kgdb.c
index f21077b077be..d9a69e941463 100644
--- a/arch/arm/kernel/kgdb.c
+++ b/arch/arm/kernel/kgdb.c
@@ -170,18 +170,6 @@ static struct undef_hook kgdb_compiled_brkpt_hook = {
.fn = kgdb_compiled_brk_fn
 };
 
-static void kgdb_call_nmi_hook(void *ignored)
-{
-   kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
-}
-
-void kgdb_roundup_cpus(void)
-{
-   local_irq_enable();
-   smp_call_function(kgdb_call_nmi_hook, NULL, 0);
-   
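
(The arm diff is truncated in this archive. For reference, the new generic
helper added to kernel/debug/debug_core.c looks roughly like the version
quoted by the kbuild robot earlier in this digest; a sketch, not the exact
committed hunk:)

static DEFINE_PER_CPU(call_single_data_t, kgdb_roundup_csd);

void __weak kgdb_call_nmi_hook(void *ignored)
{
	kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
}

void __weak kgdb_roundup_cpus(void)
{
	call_single_data_t *csd;
	int this_cpu = raw_smp_processor_id();
	int cpu;

	for_each_online_cpu(cpu) {
		/* No need to roundup ourselves */
		if (cpu == this_cpu)
			continue;

		csd = &per_cpu(kgdb_roundup_csd, cpu);
		csd->func = kgdb_call_nmi_hook;
		/* v5 also checks the return value here; see patch 3/4 */
		smp_call_function_single_async(cpu, csd);
	}
}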

[PATCH v5 1/4] kgdb: Remove irq flags from roundup

2018-11-26 Thread Douglas Anderson
The function kgdb_roundup_cpus() was passed a parameter that was
documented as:

> the flags that will be used when restoring the interrupts. There is
> local_irq_save() call before kgdb_roundup_cpus().

Nobody used those flags.  Anyone who wanted to temporarily turn on
interrupts just did local_irq_enable() and local_irq_disable() without
looking at them.  So we can definitely remove the flags.

Signed-off-by: Douglas Anderson 
Acked-by: Will Deacon 
---

Changes in v5: None
Changes in v4: None
Changes in v3: None
Changes in v2:
- Removing irq flags separated from fixing lockdep splat.

 arch/arc/kernel/kgdb.c | 2 +-
 arch/arm/kernel/kgdb.c | 2 +-
 arch/arm64/kernel/kgdb.c   | 2 +-
 arch/hexagon/kernel/kgdb.c | 9 ++---
 arch/mips/kernel/kgdb.c| 2 +-
 arch/powerpc/kernel/kgdb.c | 2 +-
 arch/sh/kernel/kgdb.c  | 2 +-
 arch/sparc/kernel/smp_64.c | 2 +-
 arch/x86/kernel/kgdb.c | 9 ++---
 include/linux/kgdb.h   | 9 ++---
 kernel/debug/debug_core.c  | 2 +-
 11 files changed, 14 insertions(+), 29 deletions(-)

diff --git a/arch/arc/kernel/kgdb.c b/arch/arc/kernel/kgdb.c
index 9a3c34af2ae8..0932851028e0 100644
--- a/arch/arc/kernel/kgdb.c
+++ b/arch/arc/kernel/kgdb.c
@@ -197,7 +197,7 @@ static void kgdb_call_nmi_hook(void *ignored)
kgdb_nmicallback(raw_smp_processor_id(), NULL);
 }
 
-void kgdb_roundup_cpus(unsigned long flags)
+void kgdb_roundup_cpus(void)
 {
local_irq_enable();
smp_call_function(kgdb_call_nmi_hook, NULL, 0);
diff --git a/arch/arm/kernel/kgdb.c b/arch/arm/kernel/kgdb.c
index caa0dbe3dc61..f21077b077be 100644
--- a/arch/arm/kernel/kgdb.c
+++ b/arch/arm/kernel/kgdb.c
@@ -175,7 +175,7 @@ static void kgdb_call_nmi_hook(void *ignored)
kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
 }
 
-void kgdb_roundup_cpus(unsigned long flags)
+void kgdb_roundup_cpus(void)
 {
local_irq_enable();
smp_call_function(kgdb_call_nmi_hook, NULL, 0);
diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c
index a20de58061a8..12c339ff6e75 100644
--- a/arch/arm64/kernel/kgdb.c
+++ b/arch/arm64/kernel/kgdb.c
@@ -289,7 +289,7 @@ static void kgdb_call_nmi_hook(void *ignored)
kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
 }
 
-void kgdb_roundup_cpus(unsigned long flags)
+void kgdb_roundup_cpus(void)
 {
local_irq_enable();
smp_call_function(kgdb_call_nmi_hook, NULL, 0);
diff --git a/arch/hexagon/kernel/kgdb.c b/arch/hexagon/kernel/kgdb.c
index 16c24b22d0b2..012e0e230ac2 100644
--- a/arch/hexagon/kernel/kgdb.c
+++ b/arch/hexagon/kernel/kgdb.c
@@ -119,17 +119,12 @@ void kgdb_arch_set_pc(struct pt_regs *regs, unsigned long 
pc)
 
 /**
  * kgdb_roundup_cpus - Get other CPUs into a holding pattern
- * @flags: Current IRQ state
  *
  * On SMP systems, we need to get the attention of the other CPUs
  * and get them be in a known state.  This should do what is needed
  * to get the other CPUs to call kgdb_wait(). Note that on some arches,
  * the NMI approach is not used for rounding up all the CPUs. For example,
- * in case of MIPS, smp_call_function() is used to roundup CPUs. In
- * this case, we have to make sure that interrupts are enabled before
- * calling smp_call_function(). The argument to this function is
- * the flags that will be used when restoring the interrupts. There is
- * local_irq_save() call before kgdb_roundup_cpus().
+ * in case of MIPS, smp_call_function() is used to roundup CPUs.
  *
  * On non-SMP systems, this is not called.
  */
@@ -139,7 +134,7 @@ static void hexagon_kgdb_nmi_hook(void *ignored)
kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
 }
 
-void kgdb_roundup_cpus(unsigned long flags)
+void kgdb_roundup_cpus(void)
 {
local_irq_enable();
smp_call_function(hexagon_kgdb_nmi_hook, NULL, 0);
diff --git a/arch/mips/kernel/kgdb.c b/arch/mips/kernel/kgdb.c
index eb6c0d582626..2b05effc17b4 100644
--- a/arch/mips/kernel/kgdb.c
+++ b/arch/mips/kernel/kgdb.c
@@ -219,7 +219,7 @@ static void kgdb_call_nmi_hook(void *ignored)
set_fs(old_fs);
 }
 
-void kgdb_roundup_cpus(unsigned long flags)
+void kgdb_roundup_cpus(void)
 {
local_irq_enable();
smp_call_function(kgdb_call_nmi_hook, NULL, 0);
diff --git a/arch/powerpc/kernel/kgdb.c b/arch/powerpc/kernel/kgdb.c
index 59c578f865aa..b0e804844be0 100644
--- a/arch/powerpc/kernel/kgdb.c
+++ b/arch/powerpc/kernel/kgdb.c
@@ -124,7 +124,7 @@ static int kgdb_call_nmi_hook(struct pt_regs *regs)
 }
 
 #ifdef CONFIG_SMP
-void kgdb_roundup_cpus(unsigned long flags)
+void kgdb_roundup_cpus(void)
 {
smp_send_debugger_break();
 }
diff --git a/arch/sh/kernel/kgdb.c b/arch/sh/kernel/kgdb.c
index 4f04c6638a4d..cc57630f6bf2 100644
--- a/arch/sh/kernel/kgdb.c
+++ b/arch/sh/kernel/kgdb.c
@@ -319,7 +319,7 @@ static void kgdb_call_nmi_hook(void *ignored)
kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
 }
 
-void kgdb_roundup_cpus(unsigned long flags)
+void 

[PATCH v5 0/4] kgdb: Fix kgdb_roundup_cpus()

2018-11-26 Thread Douglas Anderson
This series was originally part of the series ("serial: Finish kgdb on
qcom_geni; fix many lockdep splats w/ kgdb") but it made sense to
split it up.

It's believed that dropping into kgdb should be more robust once these
patches are applied.

Changes in v5:
- Add a comment about get_irq_regs().
- get_cpu() => raw_smp_processor_id() in kgdb_roundup_cpus().
- for_each_cpu() => for_each_online_cpu()
- Error check smp_call_function_single_async()

Changes in v4:
- Removed smp_mb() calls.
- Also clear out .debuggerinfo.
- Also clear out .debuggerinfo and .task for the master.
- Remove clearing out in kdb_stub for offline CPUs; it's now redundant.

Changes in v3:
- No separate init call.
- Don't round up the CPU that is doing the rounding up.
- Add "#ifdef CONFIG_SMP" to match the rest of the file.
- Updated desc saying we don't solve the "failed to roundup" case.
- Document the ignored parameter.
- Don't round up a CPU that failed rounding up before new for v3.
- Don't back trace on a cpu that didn't round up new for v3.

Changes in v2:
- Removing irq flags separated from fixing lockdep splat.
- Don't use smp_call_function (Daniel).

Douglas Anderson (4):
  kgdb: Remove irq flags from roundup
  kgdb: Fix kgdb_roundup_cpus() for arches who used smp_call_function()
  kgdb: Don't round up a CPU that failed rounding up before
  kdb: Don't back trace on a cpu that didn't round up

 arch/arc/kernel/kgdb.c  | 10 +
 arch/arm/kernel/kgdb.c  | 12 --
 arch/arm64/kernel/kgdb.c| 12 --
 arch/hexagon/kernel/kgdb.c  | 32 
 arch/mips/kernel/kgdb.c |  9 +
 arch/powerpc/kernel/kgdb.c  |  6 +--
 arch/sh/kernel/kgdb.c   | 12 --
 arch/sparc/kernel/smp_64.c  |  2 +-
 arch/x86/kernel/kgdb.c  |  9 +
 include/linux/kgdb.h| 22 +++
 kernel/debug/debug_core.c   | 65 -
 kernel/debug/debug_core.h   |  1 +
 kernel/debug/kdb/kdb_bt.c   | 11 +-
 kernel/debug/kdb/kdb_debugger.c |  7 
 14 files changed, 98 insertions(+), 112 deletions(-)

-- 
2.20.0.rc0.387.gc7a69e6b6c-goog



Re: [PATCH] serial: 8250: Default SERIAL_OF_PLATFORM to SERIAL_8250

2018-11-26 Thread Florian Fainelli
+PPC folks

On 11/23/18 10:20 AM, Guenter Roeck wrote:
> On Mon, Nov 19, 2018 at 12:50:50PM -0800, Guenter Roeck wrote:
>> On Mon, Nov 19, 2018 at 10:44:30AM -0800, Florian Fainelli wrote:
>>> On 11/15/18 5:16 PM, Guenter Roeck wrote:
 On Thu, Nov 15, 2018 at 11:48:20AM -0800, Florian Fainelli wrote:
>
> OK, would you mind testing this below? It seems to me that 8250_of.c is
> incompatible with arch/powerpc/kernel/legacy_serial.c and that is what
> is causing the issue here.
>
> diff --git a/drivers/tty/serial/8250/Kconfig
> b/drivers/tty/serial/8250/Kconfig
> index d7737dca0e48..21cb14cbd34a 100644
> --- a/drivers/tty/serial/8250/Kconfig
> +++ b/drivers/tty/serial/8250/Kconfig
> @@ -483,7 +483,7 @@ config SERIAL_8250_PXA
>
>  config SERIAL_OF_PLATFORM
> tristate "Devicetree based probing for 8250 ports"
> -   depends on SERIAL_8250 && OF
> +   depends on SERIAL_8250 && OF && !(PPC && PPC_UDBG_16550)
> default SERIAL_8250
> help
>   This option is used for all 8250 compatible serial ports that

 44x/virtex5_defconfig has both PPC_UDBG_16550 and SERIAL_OF_PLATFORM 
 enabled
 and fails to boot (or display anything on the console) with this patch 
 applied.
>>>
>>> Thanks for trying, can you either share or provide a link to the mpc85xx
>>> and ml507 qemu command lines that you use? I spent a good chunk of my
>>> time trying to get a kernel to boot but has failed so far.
>>>
>>
> 
> Any update ? I still see the boot failures in next-20181123.

Yes, I took most of last week off, sorry for the delay. I have finally
been able to boot a kernel using your instructions, thanks. The problem
is the following call chain:

of_platform_serial_probe()
 -> serial8250_register_8250_port()
    -> up->port.uartclk == 0, return -EINVAL
 -> irq_dispose_mapping()

and then we basically unwind what we just did in
of_platform_serial_probe(), including disabling the UART's IRQ. Comment
out the irq_dispose_mapping() call and you get a functional console
again. 8250_of.c is clearly not a full replacement for legacy_serial.c
(that was a first attempt), so we need to keep that code around. Making
the depends even more conditional also does not sound too appealing,
because while we have identified mpc85xx as being problematic, who knows
about other platforms. So the best I have that does not involve a revert
is the patch below.

Ben, Michael, would that sound reasonable to you? It seems to me that
there is a million ways to shoot the legacy_serial 8250 registration in
the foot in any cases, and having CONFIG_SERIAL_OF_PLATFORM just made it
painfully obvious.

diff --git a/arch/powerpc/kernel/legacy_serial.c
b/arch/powerpc/kernel/legacy_serial.c
index 33b34a58fc62..31353a27d714 100644
--- a/arch/powerpc/kernel/legacy_serial.c
+++ b/arch/powerpc/kernel/legacy_serial.c
@@ -16,7 +16,7 @@
 #include 
 #include 

-#undef DEBUG
+#define DEBUG

 #ifdef DEBUG
 #define DBG(fmt...) do { printk(fmt); } while(0)
@@ -70,6 +70,29 @@ static void tsi_serial_out(struct uart_port *p, int
offset, int value)
writeb(value, p->membase + offset);
 }

+#ifdef CONFIG_SERIAL_OF_PLATFORM
+static struct property uart_prop = {
+   .value = "disabled",
+   .name = "status",
+   .length = strlen("disabled"),
+};
+
+static void __init disable_uart_node(struct device_node *np)
+{
+   /* To avoid having 8250_of.c attempt to register the same device,
+    * fail to do so and call irq_dispose_mapping(), just disable
+    * the device_node for now.
+    */
+   of_update_property(np, &uart_prop);
+   pr_info("%s: disabled UART node\n", __func__);
+}
+#else
+static inline void disable_uart_node(struct device_node *np)
+{
+}
+#endif
+
 static int __init add_legacy_port(struct device_node *np, int want_index,
  int iotype, phys_addr_t base,
  phys_addr_t taddr, unsigned long irq,
@@ -80,6 +103,8 @@ static int __init add_legacy_port(struct device_node
*np, int want_index,
u32 shift = 0;
int index;

+   disable_uart_node(np);
+
/* get clock freq. if present */
clk = of_get_property(np, "clock-frequency", NULL);
if (clk && *clk)



-- 
Florian


Re: [PATCH v2 01/20] perf/doc: update design.txt for exclude_{host|guest} flags

2018-11-26 Thread Suzuki K Poulose

Hi Andrew,

On 26/11/2018 11:12, Andrew Murray wrote:

Update design.txt to reflect the presence of the exclude_host
and exclude_guest perf flags.

Signed-off-by: Andrew Murray 


Thanks a lot for adding this !


---
  tools/perf/design.txt | 4 
  1 file changed, 4 insertions(+)

diff --git a/tools/perf/design.txt b/tools/perf/design.txt
index a28dca2..5b2b23b 100644
--- a/tools/perf/design.txt
+++ b/tools/perf/design.txt
@@ -222,6 +222,10 @@ The 'exclude_user', 'exclude_kernel' and 'exclude_hv' bits 
provide a
  way to request that counting of events be restricted to times when the
  CPU is in user, kernel and/or hypervisor mode.
  
+Furthermore the 'exclude_host' and 'exclude_guest' bits provide a way
+to request counting of events restricted to guest and host contexts when
+using KVM virtualisation.


minor nit: could we generalise this to:

"using Linux as the hypervisor".

Otherwise, looks good to me.

Cheers
Suzuki
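
For context, these are existing bits in struct perf_event_attr; a minimal
user-space sketch (illustrative only) of requesting host-only counting via
the standard perf_event_open() interface:

#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <string.h>
#include <unistd.h>

static int open_host_only_cycles(void)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HARDWARE;
	attr.config = PERF_COUNT_HW_CPU_CYCLES;
	attr.exclude_guest = 1;	/* don't count while a guest is running */

	/* this task, any CPU, no group leader, no flags */
	return syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
}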


[PATCH] powerpc: Fix COFF zImage booting on old powermacs

2018-11-26 Thread Paul Mackerras
Commit 6975a783d7b4 ("powerpc/boot: Allow building the zImage wrapper
as a relocatable ET_DYN", 2011-04-12) changed the procedure descriptor
at the start of crt0.S to have a hard-coded start address of 0x500000
rather than a reference to _zimage_start, presumably because having
a reference to a symbol introduced a relocation which is awkward to
handle in a position-independent executable.  Unfortunately, what is
at 0x500000 in the COFF image is not the first instruction, but the
procedure descriptor itself, that is, a word containing 0x500000,
which is not a valid instruction.  Hence, booting a COFF zImage
results in a "DEFAULT CATCH!, code=FFF00700" message from Open
Firmware.

This fixes the problem by (a) putting the procedure descriptor in the
data section and (b) adding a branch to _zimage_start as the first
instruction in the program.

Fixes: 6975a783d7b4 ("powerpc/boot: Allow building the zImage wrapper as a 
relocatable ET_DYN")
Signed-off-by: Paul Mackerras 
---
 arch/powerpc/boot/crt0.S | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/boot/crt0.S b/arch/powerpc/boot/crt0.S
index 32dfe6d..9b9d174 100644
--- a/arch/powerpc/boot/crt0.S
+++ b/arch/powerpc/boot/crt0.S
@@ -15,7 +15,7 @@
 RELA = 7
 RELACOUNT = 0x6ff9
 
-   .text
+   .data
/* A procedure descriptor used when booting this as a COFF file.
 * When making COFF, this comes first in the link and we're
 * linked at 0x500000.
@@ -23,6 +23,8 @@ RELACOUNT = 0x6ff9
.globl  _zimage_start_opd
 _zimage_start_opd:
.long   0x500000, 0, 0, 0
+   .text
+   b   _zimage_start
 
 #ifdef __powerpc64__
 .balign 8
-- 
2.7.4



[Internal Review PATCH] powerpc/pseries: Relocate drmem.h to pseries

2018-11-26 Thread Michael Bringmann
The pseries-specific dynamic memory features are currently
implemented in several non-pseries-specific files.
This patch set moves the implementation of the device-tree parsing
code for the properties ibm,dynamic-memory and ibm,dynamic-memory-v2,
and their representation in the kernel, into the pseries
platform-specific directory.

This patch moves drmem.h from arch/powerpc/include/asm to
arch/powerpc/platforms/pseries, and fixes the include file references
in the pseries files.

Signed-off-by: Michael Bringmann 
---
 arch/powerpc/include/asm/drmem.h|  107 ---
 arch/powerpc/platforms/pseries/drmem.c  |2 
 arch/powerpc/platforms/pseries/drmem.h  |  107 +++
 arch/powerpc/platforms/pseries/hotplug-memory.c |2 
 arch/powerpc/platforms/pseries/lparcfg.c|2 
 5 files changed, 110 insertions(+), 110 deletions(-)
 delete mode 100644 arch/powerpc/include/asm/drmem.h
 create mode 100644 arch/powerpc/platforms/pseries/drmem.h

diff --git a/arch/powerpc/include/asm/drmem.h b/arch/powerpc/include/asm/drmem.h
deleted file mode 100644
index 7c1d8e7..000
--- a/arch/powerpc/include/asm/drmem.h
+++ /dev/null
@@ -1,107 +0,0 @@
-/*
- * drmem.h: Power specific logical memory block representation
- *
- * Copyright 2017 IBM Corporation
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
- * as published by the Free Software Foundation; either version
- * 2 of the License, or (at your option) any later version.
- */
-
-#ifndef _ASM_POWERPC_LMB_H
-#define _ASM_POWERPC_LMB_H
-
-struct drmem_lmb {
-   u64 base_addr;
-   u32 drc_index;
-   u32 aa_index;
-   u32 flags;
-};
-
-struct drmem_lmb_info {
-   struct drmem_lmb*lmbs;
-   int n_lmbs;
-   u32 lmb_size;
-};
-
-extern struct drmem_lmb_info *drmem_info;
-
-#define for_each_drmem_lmb_in_range(lmb, start, end)   \
-   for ((lmb) = (start); (lmb) <= (end); (lmb)++)
-
-#define for_each_drmem_lmb(lmb)\
-   for_each_drmem_lmb_in_range((lmb),  \
-   &drmem_info->lmbs[0],   \
-   &drmem_info->lmbs[drmem_info->n_lmbs - 1])
-
-/*
- * The of_drconf_cell_v1 struct defines the layout of the LMB data
- * specified in the ibm,dynamic-memory device tree property.
- * The property itself is a 32-bit value specifying the number of
- * LMBs followed by an array of of_drconf_cell_v1 entries, one
- * per LMB.
- */
-struct of_drconf_cell_v1 {
-   __be64  base_addr;
-   __be32  drc_index;
-   __be32  reserved;
-   __be32  aa_index;
-   __be32  flags;
-};
-
-/*
- * Version 2 of the ibm,dynamic-memory property is defined as a
- * 32-bit value specifying the number of LMB sets followed by an
- * array of of_drconf_cell_v2 entries, one per LMB set.
- */
-struct of_drconf_cell_v2 {
-   u32 seq_lmbs;
-   u64 base_addr;
-   u32 drc_index;
-   u32 aa_index;
-   u32 flags;
-} __packed;
-
-#define DRCONF_MEM_ASSIGNED    0x00000008
-#define DRCONF_MEM_AI_INVALID  0x00000040
-#define DRCONF_MEM_RESERVED    0x00000080
-
-static inline u32 drmem_lmb_size(void)
-{
-   return drmem_info->lmb_size;
-}
-
-#define DRMEM_LMB_RESERVED 0x80000000
-
-static inline void drmem_mark_lmb_reserved(struct drmem_lmb *lmb)
-{
-   lmb->flags |= DRMEM_LMB_RESERVED;
-}
-
-static inline void drmem_remove_lmb_reservation(struct drmem_lmb *lmb)
-{
-   lmb->flags &= ~DRMEM_LMB_RESERVED;
-}
-
-static inline bool drmem_lmb_reserved(struct drmem_lmb *lmb)
-{
-   return lmb->flags & DRMEM_LMB_RESERVED;
-}
-
-u64 drmem_lmb_memory_max(void);
-void __init walk_drmem_lmbs(struct device_node *dn,
-   void (*func)(struct drmem_lmb *, const __be32 **));
-int drmem_update_dt(void);
-
-#ifdef CONFIG_PPC_PSERIES
-void __init walk_drmem_lmbs_early(unsigned long node,
-   void (*func)(struct drmem_lmb *, const __be32 **));
-#endif
-
-static inline void invalidate_lmb_associativity_index(struct drmem_lmb *lmb)
-{
-   lmb->aa_index = 0xffffffff;
-}
-
-#endif /* _ASM_POWERPC_LMB_H */
diff --git a/arch/powerpc/platforms/pseries/drmem.c 
b/arch/powerpc/platforms/pseries/drmem.c
index 01ac651..a52f10e 100644
--- a/arch/powerpc/platforms/pseries/drmem.c
+++ b/arch/powerpc/platforms/pseries/drmem.c
@@ -17,7 +17,7 @@
 #include 
 #include 
 #include 
-#include 
+#include "drmem.h"
 
 static struct drmem_lmb_info __drmem_info;
 struct drmem_lmb_info *drmem_info = &__drmem_info;
diff --git a/arch/powerpc/platforms/pseries/drmem.h 
b/arch/powerpc/platforms/pseries/drmem.h
new file mode 100644
index 000..7c1d8e7
--- /dev/null
+++ b/arch/powerpc/platforms/pseries/drmem.h
@@ -0,0 +1,107 @@
+/*
+ * drmem.h: Power specific logical memory 
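
(The new-file half of the diff is truncated here; per the diffstat it
recreates the 107-line header under arch/powerpc/platforms/pseries. For
readers unfamiliar with this header, a short sketch of how its iterator is
typically used, based only on the definitions shown above:)

	struct drmem_lmb *lmb;

	for_each_drmem_lmb(lmb) {
		if (lmb->flags & DRCONF_MEM_ASSIGNED)
			pr_info("LMB at 0x%llx, drc index 0x%x\n",
				lmb->base_addr, lmb->drc_index);
	}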

[Internal Review PATCH] powerpc/pseries: Move DRMEM processing out of numa.c

2018-11-26 Thread Michael Bringmann
The pseries-specific dynamic memory features are currently
implemented in several non-pseries-specific files.
This patch set moves the implementation of the device-tree parsing
code for the properties ibm,dynamic-memory and ibm,dynamic-memory-v2,
and their representation in the kernel, into the pseries
platform-specific directory.

This patch refactors references to drmem features out of numa.c, so
that they can be moved to drmem.c.  Changes include exporting a few
support functions from numa.c via arch/powerpc/include/asm/topology.h,
and the creation of a platform function, platform_parse_numa_properties,
that any powerpc platform may implement.

Signed-off-by: Michael Bringmann 
---
 arch/powerpc/include/asm/topology.h|   13 +
 arch/powerpc/mm/numa.c |  238 +++
 arch/powerpc/platforms/pseries/drmem.c |  330 
 3 files changed, 329 insertions(+), 252 deletions(-)

diff --git a/arch/powerpc/include/asm/topology.h 
b/arch/powerpc/include/asm/topology.h
index a4a718d..0c1ad7e 100644
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -135,5 +135,18 @@ static inline void shared_proc_topology_init(void) {}
 #endif
 #endif
 
+extern unsigned long numa_enforce_memory_limit(unsigned long start,
+   unsigned long size);
+extern void initialize_distance_lookup_table(int nid,
+   const __be32 *associativity);
+extern int fake_numa_create_new_node(unsigned long end_pfn,
+   unsigned int *nid);
+
+struct assoc_arrays {
+   u32 n_arrays;
+   u32 array_sz;
+   const __be32 *arrays;
+};
+
 #endif /* __KERNEL__ */
 #endif /* _ASM_POWERPC_TOPOLOGY_H */
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 3a048e9..6c982df 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -39,7 +39,6 @@
 #include 
 #include 
 #include 
-#include 
 
 static int numa_enabled = 1;
 
@@ -87,8 +86,8 @@ static void __init setup_node_to_cpumask_map(void)
dbg("Node to cpumask map for %d nodes\n", nr_node_ids);
 }
 
-static int __init fake_numa_create_new_node(unsigned long end_pfn,
-   unsigned int *nid)
+int __init fake_numa_create_new_node(unsigned long end_pfn,
+   unsigned int *nid)
 {
unsigned long long mem;
char *p = cmdline;
@@ -194,7 +193,7 @@ int __node_distance(int a, int b)
 }
 EXPORT_SYMBOL(__node_distance);
 
-static void initialize_distance_lookup_table(int nid,
+void initialize_distance_lookup_table(int nid,
const __be32 *associativity)
 {
int i;
@@ -209,6 +208,7 @@ static void initialize_distance_lookup_table(int nid,
distance_lookup_table[nid][i] = of_read_number(entry, 1);
}
 }
+EXPORT_SYMBOL(initialize_distance_lookup_table);
 
 /* Returns nid in the range [0..MAX_NUMNODES-1], or -1 if no useful numa
  * info is found.
@@ -356,98 +356,6 @@ static void __init get_n_mem_cells(int *n_addr_cells, int 
*n_size_cells)
of_node_put(memory);
 }
 
-static unsigned long read_n_cells(int n, const __be32 **buf)
-{
-   unsigned long result = 0;
-
-   while (n--) {
-   result = (result << 32) | of_read_number(*buf, 1);
-   (*buf)++;
-   }
-   return result;
-}
-
-struct assoc_arrays {
-   u32 n_arrays;
-   u32 array_sz;
-   const __be32 *arrays;
-};
-
-/*
- * Retrieve and validate the list of associativity arrays for drconf
- * memory from the ibm,associativity-lookup-arrays property of the
- * device tree..
- *
- * The layout of the ibm,associativity-lookup-arrays property is a number N
- * indicating the number of associativity arrays, followed by a number M
- * indicating the size of each associativity array, followed by a list
- * of N associativity arrays.
- */
-static int of_get_assoc_arrays(struct assoc_arrays *aa)
-{
-   struct device_node *memory;
-   const __be32 *prop;
-   u32 len;
-
-   memory = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory");
-   if (!memory)
-   return -1;
-
-   prop = of_get_property(memory, "ibm,associativity-lookup-arrays", &len);
-   if (!prop || len < 2 * sizeof(unsigned int)) {
-   of_node_put(memory);
-   return -1;
-   }
-
-   aa->n_arrays = of_read_number(prop++, 1);
-   aa->array_sz = of_read_number(prop++, 1);
-
-   of_node_put(memory);
-
-   /* Now that we know the number of arrays and size of each array,
-* revalidate the size of the property read in.
-*/
-   if (len < (aa->n_arrays * aa->array_sz + 2) * sizeof(unsigned int))
-   return -1;
-
-   aa->arrays = prop;
-   return 0;
-}
-
-/*
- * This is like of_node_to_nid_single() for memory represented in the
- * 

[Internal Review PATCH] powerpc/pseries: Move DRMEM processing out of prom.c

2018-11-26 Thread Michael Bringmann
The pseries-specific dynamic memory features are currently
implemented in several non-pseries-specific files.
This patch set moves the implementation of the device-tree parsing
code for the properties ibm,dynamic-memory and ibm,dynamic-memory-v2,
and their representation in the kernel, into the pseries
platform-specific directory.

This patch refactors references to drmem features out of prom.c, so
that they can be moved to drmem.c.  Changes include creating a
platform function, platform_early_init_dt_scan_memory_ppc, that any
powerpc platform may implement, and moving a support function to
arch/powerpc/include/asm/sparsemem.h.

Signed-off-by: Michael Bringmann 
---
 arch/powerpc/include/asm/platform.h|   23 
 arch/powerpc/include/asm/prom.h|3 +
 arch/powerpc/include/asm/sparsemem.h   |   19 +++
 arch/powerpc/kernel/prom.c |   90 +---
 arch/powerpc/platforms/pseries/drmem.c |   73 ++
 5 files changed, 122 insertions(+), 86 deletions(-)
 create mode 100644 arch/powerpc/include/asm/platform.h

diff --git a/arch/powerpc/include/asm/platform.h 
b/arch/powerpc/include/asm/platform.h
new file mode 100644
index 000..36f0f9e
--- /dev/null
+++ b/arch/powerpc/include/asm/platform.h
@@ -0,0 +1,23 @@
+#ifndef _POWERPC_PLATFORM_H
+#define _POWERPC_PLATFORM_H
+#ifdef __KERNEL__
+
+/*
+ * Definitions for talking to the Platform-specific functions of PowerPC
+ *
+ * Copyright (C) 2018 Michael Bringmann, IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+#include 
+
+/* Memory initialization support */
+extern int platform_early_init_dt_scan_memory_ppc(unsigned long node,
+   const char *uname,
+   int depth, void *data);
+
+#endif /* __KERNEL__ */
+#endif /* _POWERPC_PLATFORM_H */
diff --git a/arch/powerpc/include/asm/prom.h b/arch/powerpc/include/asm/prom.h
index b04c5ce..4504773 100644
--- a/arch/powerpc/include/asm/prom.h
+++ b/arch/powerpc/include/asm/prom.h
@@ -180,5 +180,8 @@ extern int of_read_drc_info_cell(struct property **prop,
 /* Option Vector 6: IBM PAPR hints */
 #define OV6_LINUX  0x02/* Linux is our OS */
 
+/* Other functions */
+extern bool validate_mem_limit(u64 base, u64 *size);
+
 #endif /* __KERNEL__ */
 #endif /* _POWERPC_PROM_H */
diff --git a/arch/powerpc/include/asm/sparsemem.h 
b/arch/powerpc/include/asm/sparsemem.h
index 68da493..25edfc2 100644
--- a/arch/powerpc/include/asm/sparsemem.h
+++ b/arch/powerpc/include/asm/sparsemem.h
@@ -32,5 +32,24 @@ static inline int hot_add_scn_to_nid(unsigned long scn_addr)
 #endif /* CONFIG_NUMA */
 #endif /* CONFIG_MEMORY_HOTPLUG */
 
+
+#ifdef CONFIG_SPARSEMEM
+static inline bool validate_mem_limit(u64 base, u64 *size) 
+{
+   u64 max_mem = 1UL << (MAX_PHYSMEM_BITS);
+
+   if (base >= max_mem)
+   return false;
+   if ((base + *size) > max_mem)
+   *size = max_mem - base;
+   return true;
+}
+#else
+static inline bool validate_mem_limit(u64 base, u64 *size) 
+{
+   return true;
+}
+#endif
+
 #endif /* __KERNEL__ */
 #endif /* _ASM_POWERPC_SPARSEMEM_H */
diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index fe758ce..ea32fee 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -58,7 +58,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 
 #include 
 
@@ -444,96 +444,14 @@ static int __init early_init_dt_scan_chosen_ppc(unsigned 
long node,
  * size if it cross the limit.
  */
 
-#ifdef CONFIG_SPARSEMEM
-static bool validate_mem_limit(u64 base, u64 *size)
-{
-   u64 max_mem = 1UL << (MAX_PHYSMEM_BITS);
-
-   if (base >= max_mem)
-   return false;
-   if ((base + *size) > max_mem)
-   *size = max_mem - base;
-   return true;
-}
-#else
-static bool validate_mem_limit(u64 base, u64 *size)
-{
-   return true;
-}
-#endif
-
-#ifdef CONFIG_PPC_PSERIES
-/*
- * Interpret the ibm dynamic reconfiguration memory LMBs.
- * This contains a list of memory blocks along with NUMA affinity
- * information.
- */
-static void __init early_init_drmem_lmb(struct drmem_lmb *lmb,
-   const __be32 **usm)
-{
-   u64 base, size;
-   int is_kexec_kdump = 0, rngs;
-
-   base = lmb->base_addr;
-   size = drmem_lmb_size();
-   rngs = 1;
-
-   /*
-* Skip this block if the reserved bit is set in flags
-* or if the block is not assigned to this partition.
-*/
-   if ((lmb->flags & DRCONF_MEM_RESERVED) ||
-   !(lmb->flags & DRCONF_MEM_ASSIGNED))
-   return;
-
-   if (*usm)
-   is_kexec_kdump = 

[Internal Review PATCH] powerpc/pseries: Relocate drmem.c to pseries

2018-11-26 Thread Michael Bringmann
The pseries-specific dynamic memory features are currently
implemented in several non-pseries-specific files.
This patch set moves the implementation of the device-tree parsing
code for the properties ibm,dynamic-memory and ibm,dynamic-memory-v2,
and their representation in the kernel, into the pseries
platform-specific directory.

This patch moves drmem.c from arch/powerpc/mm to
arch/powerpc/platforms/pseries.

Signed-off-by: Michael Bringmann 
---
 arch/powerpc/mm/Makefile|2 
 arch/powerpc/mm/drmem.c |  447 ---
 arch/powerpc/platforms/pseries/Makefile |3 
 arch/powerpc/platforms/pseries/drmem.c  |  447 +++
 4 files changed, 450 insertions(+), 449 deletions(-)
 delete mode 100644 arch/powerpc/mm/drmem.c
 create mode 100644 arch/powerpc/platforms/pseries/drmem.c

diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
index ca96e7b..06281e0f 100644
--- a/arch/powerpc/mm/Makefile
+++ b/arch/powerpc/mm/Makefile
@@ -9,7 +9,7 @@ CFLAGS_REMOVE_slb.o = $(CC_FLAGS_FTRACE)
 
 obj-y  := fault.o mem.o pgtable.o mmap.o \
   init_$(BITS).o pgtable_$(BITS).o \
-  init-common.o mmu_context.o drmem.o
+  init-common.o mmu_context.o
 obj-$(CONFIG_PPC_MMU_NOHASH)   += mmu_context_nohash.o tlb_nohash.o \
   tlb_nohash_low.o
 obj-$(CONFIG_PPC_BOOK3E)   += tlb_low_$(BITS)e.o
diff --git a/arch/powerpc/mm/drmem.c b/arch/powerpc/mm/drmem.c
deleted file mode 100644
index 3f18036..000
--- a/arch/powerpc/mm/drmem.c
+++ /dev/null
@@ -1,447 +0,0 @@
-/*
- * Dynamic reconfiguration memory support
- *
- * Copyright 2017 IBM Corporation
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
- * as published by the Free Software Foundation; either version
- * 2 of the License, or (at your option) any later version.
- */
-
-#define pr_fmt(fmt) "drmem: " fmt
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-static struct drmem_lmb_info __drmem_info;
-struct drmem_lmb_info *drmem_info = &__drmem_info;
-
-u64 drmem_lmb_memory_max(void)
-{
-   struct drmem_lmb *last_lmb;
-
-   last_lmb = &drmem_info->lmbs[drmem_info->n_lmbs - 1];
-   return last_lmb->base_addr + drmem_lmb_size();
-}
-
-static u32 drmem_lmb_flags(struct drmem_lmb *lmb)
-{
-   /*
-* Return the value of the lmb flags field minus the reserved
-* bit used internally for hotplug processing.
-*/
-   return lmb->flags & ~DRMEM_LMB_RESERVED;
-}
-
-static struct property *clone_property(struct property *prop, u32 prop_sz)
-{
-   struct property *new_prop;
-
-   new_prop = kzalloc(sizeof(*new_prop), GFP_KERNEL);
-   if (!new_prop)
-   return NULL;
-
-   new_prop->name = kstrdup(prop->name, GFP_KERNEL);
-   new_prop->value = kzalloc(prop_sz, GFP_KERNEL);
-   if (!new_prop->name || !new_prop->value) {
-   kfree(new_prop->name);
-   kfree(new_prop->value);
-   kfree(new_prop);
-   return NULL;
-   }
-
-   new_prop->length = prop_sz;
-#if defined(CONFIG_OF_DYNAMIC)
-   of_property_set_flag(new_prop, OF_DYNAMIC);
-#endif
-   return new_prop;
-}
-
-static int drmem_update_dt_v1(struct device_node *memory,
- struct property *prop)
-{
-   struct property *new_prop;
-   struct of_drconf_cell_v1 *dr_cell;
-   struct drmem_lmb *lmb;
-   u32 *p;
-
-   new_prop = clone_property(prop, prop->length);
-   if (!new_prop)
-   return -1;
-
-   p = new_prop->value;
-   *p++ = cpu_to_be32(drmem_info->n_lmbs);
-
-   dr_cell = (struct of_drconf_cell_v1 *)p;
-
-   for_each_drmem_lmb(lmb) {
-   dr_cell->base_addr = cpu_to_be64(lmb->base_addr);
-   dr_cell->drc_index = cpu_to_be32(lmb->drc_index);
-   dr_cell->aa_index = cpu_to_be32(lmb->aa_index);
-   dr_cell->flags = cpu_to_be32(drmem_lmb_flags(lmb));
-
-   dr_cell++;
-   }
-
-   of_update_property(memory, new_prop);
-   return 0;
-}
-
-static void init_drconf_v2_cell(struct of_drconf_cell_v2 *dr_cell,
-   struct drmem_lmb *lmb)
-{
-   dr_cell->base_addr = cpu_to_be64(lmb->base_addr);
-   dr_cell->drc_index = cpu_to_be32(lmb->drc_index);
-   dr_cell->aa_index = cpu_to_be32(lmb->aa_index);
-   dr_cell->flags = cpu_to_be32(drmem_lmb_flags(lmb));
-}
-
-static int drmem_update_dt_v2(struct device_node *memory,
- struct property *prop)
-{
-   struct property *new_prop;
-   struct of_drconf_cell_v2 *dr_cell;
-   struct drmem_lmb *lmb, *prev_lmb;
-   u32 lmb_sets, prop_sz, seq_lmbs;
-   u32 

[Internal Review PATCH] powerpc/pseries: Refactor code to centralize drmem feature

2018-11-26 Thread Michael Bringmann
The pseries-specific dynamic memory features are currently
implemented in several non-pseries-specific files.
This patch set moves the implementation of the device-tree parsing
code for the properties ibm,dynamic-memory and ibm,dynamic-memory-v2,
and their representation in the kernel, into the pseries
platform-specific directory.

Signed-off-by: Michael Bringmann 

Michael Bringmann (4):
  powerpc/pseries: Relocate drmem.c to pseries
  powerpc/pseries: Move DRMEM processing out of prom.c
  powerpc/pseries: Move DRMEM processing out of numa.c
  powerpc/pseries: Relocate drmem.h to pseries



[Internal Review PATCH 2/3] powerpc/numa: Define mapping between HW and kernel cpus

2018-11-26 Thread Michael Bringmann
Define an interface to map external powerpc cpus across multiple nodes
to a range of kernel cpu values.  The mapping is intended to prevent
confusion within the kernel about the cpu+node mapping, and about the
configuration changes that may happen due to powerpc LPAR migration or
other associativity changes during the lifetime of a system.  These
interfaces will be used entirely within the powerpc kernel code to
maintain separation between the machine and kernel contexts.

Signed-off-by: Michael Bringmann 
---
 arch/powerpc/include/asm/topology.h   |   31 +++
 arch/powerpc/platforms/pseries/Kconfig|   10 ++
 arch/powerpc/platforms/pseries/Makefile   |1 
 arch/powerpc/platforms/pseries/cpuremap.c |  131 +
 4 files changed, 173 insertions(+)
 create mode 100644 arch/powerpc/platforms/pseries/cpuremap.c

diff --git a/arch/powerpc/include/asm/topology.h 
b/arch/powerpc/include/asm/topology.h
index 4621f40..db11969 100644
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -135,5 +135,36 @@ static inline void shared_proc_topology_init(void) {}
 #endif
 #endif
 
+#define CPUREMAP_NO_CPU    (~0)
+#define CPUREMAP_NO_THREAD (~0)
+
+#ifdef CONFIG_CPUREMAP
+extern int cpuremap_thread_to_cpu(int thread_index);
+   /* Return CPUREMAP_NO_CPU if not found */
+extern int cpuremap_map_cpu(int thread_index, int in_core_ndx, int node);
+   /* Return CPUREMAP_NO_CPU if fails */
+extern int cpuremap_reserve_cpu(int cpu);
+   /* Return CPUREMAP_NO_CPU if fails */
+extern int cpuremap_release_cpu(int cpu);
+   /* Return CPUREMAP_NO_CPU if fails */
+extern int cpuremap_cpu_to_thread(int cpu);
+   /* Return CPUREMAP_NO_THREAD if not found */
+extern void cpuremap_init(void);
+   /* Identify necessary constants & alloc memory at boot */
+#else
+static inline int cpuremap_thread_to_cpu(int thread_index)
+{
+   return thread_index;
+}
+static inline int cpuremap_map_cpu(int thread_index, int in_core_ndx, int node)
+{
+   return thread_index;
+}
+static inline int cpuremap_reserve_cpu(int cpu) { return cpu; }
+static inline int cpuremap_release_cpu(int cpu) { return cpu; }
+static inline int cpuremap_cpu_to_thread(int cpu) { return cpu; }
+static inline void cpuremap_init(void) {}
+#endif
+
 #endif /* __KERNEL__ */
 #endif /* _ASM_POWERPC_TOPOLOGY_H */
diff --git a/arch/powerpc/platforms/pseries/Kconfig 
b/arch/powerpc/platforms/pseries/Kconfig
index 2e4bd32..c35009f 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig
@@ -145,3 +145,13 @@ config PAPR_SCM
tristate "Support for the PAPR Storage Class Memory interface"
help
  Enable access to hypervisor provided storage class memory.
+  Enable access to hypervisor provided storage class memory.
+
+config CPUREMAP
+bool "Support for mapping hw cpu+node to kernel index"
+depends on SMP && (PPC_PSERIES)
+---help---
+  Say Y here to be able to remap hw cpu+node to standardized
+  kernel CPUs at runtime on Pseries machines.
+
+  Say N if you are unsure.
diff --git a/arch/powerpc/platforms/pseries/Makefile 
b/arch/powerpc/platforms/pseries/Makefile
index a43ec84..ad49d8e 100644
--- a/arch/powerpc/platforms/pseries/Makefile
+++ b/arch/powerpc/platforms/pseries/Makefile
@@ -13,6 +13,7 @@ obj-$(CONFIG_KEXEC_CORE)  += kexec.o
 obj-$(CONFIG_PSERIES_ENERGY)   += pseries_energy.o
 
 obj-$(CONFIG_HOTPLUG_CPU)  += hotplug-cpu.o
+obj-$(CONFIG_CPUREMAP) += cpuremap.o
 obj-$(CONFIG_MEMORY_HOTPLUG)   += hotplug-memory.o pmem.o
 
 obj-$(CONFIG_HVC_CONSOLE)  += hvconsole.o
diff --git a/arch/powerpc/platforms/pseries/cpuremap.c 
b/arch/powerpc/platforms/pseries/cpuremap.c
new file mode 100644
index 000..86fdf12
--- /dev/null
+++ b/arch/powerpc/platforms/pseries/cpuremap.c
@@ -0,0 +1,131 @@
+// SPDX-License-Identifier: GPL-2.0
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct cpuremap_cpu {
+   int thread_index;
+   /* Set to thread_index from ibm,ppc-interrupt-server#s arrays
+* Don't clear when release'ed
+*/
+   int node;
+   bool in_use;
+   /* Set to true when reserve'ed
+* Don't clear when release'ed
+   */
+};
+
+struct cpuremap_struct {
+   int num_nodes;
+   int num_cores;
+   int num_threads_per_core;
+   struct cpuremap_cpu *threads;
+} cpuremap_data;
+
+
+void cpuremap_init(void)
+{
+   int i, k;
+
+   /* Identify necessary constants & alloc memory at boot */
+   cpuremap_data.num_threads_per_core = 8;
+   cpuremap_data.num_cores = 32;
+   cpuremap_data.num_nodes =
+   nr_cpu_ids /
+   (cpuremap_data.num_threads_per_core * cpuremap_data.num_cores);
+   cpuremap_data.threads = kcalloc(nr_cpu_ids, 

[Internal Review PATCH 3/3] powerpc/numa: Apply mapping between HW and kernel cpus

2018-11-26 Thread Michael Bringmann
Apply the new interface to map external powerpc cpus across multiple
nodes to a range of kernel cpu values.  The mapping is intended to
prevent confusion within the kernel about the cpu+node mapping and
about the configuration changes that may happen due to powerpc LPAR
migration or other associativity changes during the lifetime of a
system.  These interfaces exchange the thread_index provided by the
'ibm,ppc-interrupt-server#s' properties for an internal index to be
used by the kernel scheduling interfaces.

Signed-off-by: Michael Bringmann 
---
 arch/powerpc/mm/numa.c   |   45 +-
 arch/powerpc/platforms/pseries/hotplug-cpu.c |   15 +++--
 2 files changed, 41 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 460d60f..9825fc9 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1063,7 +1063,8 @@ u64 memory_hotplug_max(void)
 
 struct topology_update_data {
struct topology_update_data *next;
-   unsigned int cpu;
+   unsigned int old_cpu;
+   unsigned int new_cpu;
int old_nid;
int new_nid;
 };
@@ -1253,13 +1254,13 @@ static int update_cpu_topology(void *data)
 
for (update = data; update; update = update->next) {
int new_nid = update->new_nid;
-   if (cpu != update->cpu)
+   if (cpu != update->new_cpu)
continue;
 
-   unmap_cpu_from_node(cpu);
-   map_cpu_to_node(cpu, new_nid);
-   set_cpu_numa_node(cpu, new_nid);
-   set_cpu_numa_mem(cpu, local_memory_node(new_nid));
+   unmap_cpu_from_node(update->old_cpu);
+   map_cpu_to_node(update->new_cpu, new_nid);
+   set_cpu_numa_node(update->new_cpu, new_nid);
+   set_cpu_numa_mem(update->new_cpu, local_memory_node(new_nid));
vdso_getcpu_init();
}
 
@@ -1283,7 +1284,7 @@ static int update_lookup_table(void *data)
int nid, base, j;
 
nid = update->new_nid;
-   base = cpu_first_thread_sibling(update->cpu);
+   base = cpu_first_thread_sibling(update->new_cpu);
 
for (j = 0; j < threads_per_core; j++) {
update_numa_cpu_lookup_table(base + j, nid);
@@ -1305,7 +1306,7 @@ int numa_update_cpu_topology(bool cpus_locked)
struct topology_update_data *updates, *ud;
cpumask_t updated_cpus;
struct device *dev;
-   int weight, new_nid, i = 0;
+   int weight, new_nid, i = 0, ii;
 
if (!prrn_enabled && !vphn_enabled && topology_inited)
return 0;
@@ -1349,12 +1350,16 @@ int numa_update_cpu_topology(bool cpus_locked)
continue;
}
 
+   ii = 0;
for_each_cpu(sibling, cpu_sibling_mask(cpu)) {
ud = [i++];
ud->next = [i];
-   ud->cpu = sibling;
ud->new_nid = new_nid;
ud->old_nid = numa_cpu_lookup_table[sibling];
+   ud->old_cpu = sibling;
+   ud->new_cpu = cpuremap_map_cpu(
+   get_hard_smp_processor_id(sibling),
+   ii++, new_nid);
cpumask_set_cpu(sibling, _cpus);
}
cpu = cpu_last_thread_sibling(cpu);
@@ -1370,9 +1375,10 @@ int numa_update_cpu_topology(bool cpus_locked)
pr_debug("Topology update for the following CPUs:\n");
if (cpumask_weight(_cpus)) {
for (ud = [0]; ud; ud = ud->next) {
-   pr_debug("cpu %d moving from node %d "
- "to %d\n", ud->cpu,
- ud->old_nid, ud->new_nid);
+   pr_debug("cpu %d, node %d moving to"
+" cpu %d, node %d\n",
+ud->old_cpu, ud->old_nid,
+ud->new_cpu, ud->new_nid);
}
}
 
@@ -1409,13 +1415,20 @@ int numa_update_cpu_topology(bool cpus_locked)
 cpumask_of(raw_smp_processor_id()));
 
for (ud = [0]; ud; ud = ud->next) {
-   unregister_cpu_under_node(ud->cpu, ud->old_nid);
-   register_cpu_under_node(ud->cpu, ud->new_nid);
+   unregister_cpu_under_node(ud->old_cpu, ud->old_nid);
+   register_cpu_under_node(ud->new_cpu, ud->new_nid);
 
-   dev = get_cpu_device(ud->cpu);
+   dev = get_cpu_device(ud->old_cpu);
if (dev)
kobject_uevent(>kobj, KOBJ_CHANGE);
-   cpumask_clear_cpu(ud->cpu, _associativity_changes_mask);
+   cpumask_clear_cpu(ud->old_cpu, _associativity_changes_mask);
+   if (ud->old_cpu != 

[Internal Review PATCH 1/3] powerpc/numa: Conditionally online new nodes

2018-11-26 Thread Michael Bringmann
Add an argument to let the caller determine whether the node identified
for a cpu after an associativity / affinity change should be initialized.
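
In the hunks shown here every caller passes true; as a sketch of the
intent of the new argument, a lookup-only path could do:

	/* look up the nid without forcing NODE_DATA() initialization */
	nid = find_and_online_cpu_nid(cpu, false);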

Signed-off-by: Michael Bringmann 
---
 arch/powerpc/include/asm/topology.h  |2 +-
 arch/powerpc/mm/numa.c   |6 +++---
 arch/powerpc/platforms/pseries/hotplug-cpu.c |2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/topology.h 
b/arch/powerpc/include/asm/topology.h
index a4a718d..4621f40 100644
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -90,7 +90,7 @@ static inline void update_numa_cpu_lookup_table(unsigned int 
cpu, int node) {}
 extern int start_topology_update(void);
 extern int stop_topology_update(void);
 extern int prrn_is_enabled(void);
-extern int find_and_online_cpu_nid(int cpu);
+extern int find_and_online_cpu_nid(int cpu, bool must_online);
 extern int timed_topology_update(int nsecs);
 extern void __init shared_proc_topology_init(void);
 #else
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 3a048e9..460d60f 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1197,7 +1197,7 @@ static long vphn_get_associativity(unsigned long cpu,
return rc;
 }
 
-int find_and_online_cpu_nid(int cpu)
+int find_and_online_cpu_nid(int cpu, bool must_online)
 {
__be32 associativity[VPHN_ASSOC_BUFSIZE] = {0};
int new_nid;
@@ -1210,7 +1210,7 @@ int find_and_online_cpu_nid(int cpu)
if (new_nid < 0 || !node_possible(new_nid))
new_nid = first_online_node;
 
-   if (NODE_DATA(new_nid) == NULL) {
+   if (must_online && (NODE_DATA(new_nid) == NULL)) {
 #ifdef CONFIG_MEMORY_HOTPLUG
/*
 * Need to ensure that NODE_DATA is initialized for a node from
@@ -1337,7 +1337,7 @@ int numa_update_cpu_topology(bool cpus_locked)
continue;
}
 
-   new_nid = find_and_online_cpu_nid(cpu);
+   new_nid = find_and_online_cpu_nid(cpu, true);
 
if (new_nid == numa_cpu_lookup_table[cpu]) {
cpumask_andnot(_associativity_changes_mask,
diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 2f8e621..620cb57 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -366,7 +366,7 @@ static int dlpar_online_cpu(struct device_node *dn)
!= CPU_STATE_OFFLINE);
cpu_maps_update_done();
timed_topology_update(1);
-   find_and_online_cpu_nid(cpu);
+   find_and_online_cpu_nid(cpu, true);
rc = device_online(get_cpu_device(cpu));
if (rc)
goto out;



[Internal Review PATCH] powerpc/pseries: Remap hw to kernel cpu indexes

2018-11-26 Thread Michael Bringmann
Define and apply a new interface to map hardware-specific powerpc cpu
ids to a kernel-specific range of cpu values.  The mapping is intended
to prevent confusion within the kernel about the cpu+node mapping and
about the configuration changes that may happen due to powerpc LPAR
migration or other associativity changes during the lifetime of a
system.  These interfaces exchange the thread_index provided by the
'ibm,ppc-interrupt-server#s' properties for an internal index to be
used by the kernel scheduling interfaces.

Signed-off-by: Michael Bringmann 

Michael Bringmann (3):
  powerpc/numa: Conditionally online new nodes
  powerpc/numa: Define mapping between HW and kernel cpus
  powerpc/numa: Apply mapping between HW and kernel cpus



[PATCH 4/4] selftests/powerpc: Add checks for transactional sigreturn

2018-11-26 Thread Breno Leitao
This is a new test case that raises a signal and starts a suspended
transaction inside the signal handler.

It returns from the signal handler with the CPU in suspended state, but
without setting the user context MSR Transaction State (TS) field.

The kernel signal handling code should be able to handle this
discrepancy instead of crashing.

This code can be compiled and used to test both the 32-bit and 64-bit
signal handlers.

Signed-off-by: Breno Leitao 
Signed-off-by: Gustavo Romero 
---
 tools/testing/selftests/powerpc/tm/.gitignore |  1 +
 tools/testing/selftests/powerpc/tm/Makefile   |  2 +-
 .../powerpc/tm/tm-signal-sigreturn-nt.c   | 46 +++
 3 files changed, 48 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/powerpc/tm/tm-signal-sigreturn-nt.c

diff --git a/tools/testing/selftests/powerpc/tm/.gitignore 
b/tools/testing/selftests/powerpc/tm/.gitignore
index c3ee8393dae8..208452a93e2c 100644
--- a/tools/testing/selftests/powerpc/tm/.gitignore
+++ b/tools/testing/selftests/powerpc/tm/.gitignore
@@ -11,6 +11,7 @@ tm-signal-context-chk-fpu
 tm-signal-context-chk-gpr
 tm-signal-context-chk-vmx
 tm-signal-context-chk-vsx
+tm-signal-sigreturn-nt
 tm-vmx-unavail
 tm-unavailable
 tm-trap
diff --git a/tools/testing/selftests/powerpc/tm/Makefile 
b/tools/testing/selftests/powerpc/tm/Makefile
index 9fc2cf6fbc92..75a685359129 100644
--- a/tools/testing/selftests/powerpc/tm/Makefile
+++ b/tools/testing/selftests/powerpc/tm/Makefile
@@ -4,7 +4,7 @@ SIGNAL_CONTEXT_CHK_TESTS := tm-signal-context-chk-gpr 
tm-signal-context-chk-fpu
 
 TEST_GEN_PROGS := tm-resched-dscr tm-syscall tm-signal-msr-resv 
tm-signal-stack \
tm-vmxcopy tm-fork tm-tar tm-tmspr tm-vmx-unavail tm-unavailable 
tm-trap \
-   $(SIGNAL_CONTEXT_CHK_TESTS) tm-sigreturn
+   $(SIGNAL_CONTEXT_CHK_TESTS) tm-sigreturn tm-signal-sigreturn-nt
 
 top_srcdir = ../../../../..
 include ../../lib.mk
diff --git a/tools/testing/selftests/powerpc/tm/tm-signal-sigreturn-nt.c 
b/tools/testing/selftests/powerpc/tm/tm-signal-sigreturn-nt.c
new file mode 100644
index ..56fbf9f6bbf3
--- /dev/null
+++ b/tools/testing/selftests/powerpc/tm/tm-signal-sigreturn-nt.c
@@ -0,0 +1,46 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2018, Breno Leitao, Gustavo Romero, IBM Corp.
+ *
+ * A test case that creates a signal and starts a suspended transaction
+ * inside the signal handler.
+ *
+ * It returns from the signal handler with the CPU at suspended state, but
+ * without setting usercontext MSR Transaction State (TS) fields.
+ */
+
+#define _GNU_SOURCE
+#include 
+#include 
+
+#include "utils.h"
+
+void trap_signal_handler(int signo, siginfo_t *si, void *uc)
+{
+   ucontext_t *ucp = (ucontext_t *) uc;
+
+   asm("tbegin.; tsuspend.;");
+
+   /* Skip 'trap' instruction if it succeed */
+   ucp->uc_mcontext.regs->nip += 4;
+}
+
+int tm_signal_sigreturn_nt(void)
+{
+   struct sigaction trap_sa;
+
+   trap_sa.sa_flags = SA_SIGINFO;
+   trap_sa.sa_sigaction = trap_signal_handler;
+
+   sigaction(SIGTRAP, _sa, NULL);
+
+   raise(SIGTRAP);
+
+   return EXIT_SUCCESS;
+}
+
+int main(int argc, char **argv)
+{
+   test_harness(tm_signal_sigreturn_nt, "tm_signal_sigreturn_nt");
+}
+
-- 
2.19.0



[PATCH 3/4] powerpc/tm: Unset MSR[TS] if not recheckpointing

2018-11-26 Thread Breno Leitao
There is a TM Bad Thing bug that can be triggered by returning from a
signal context while in a suspended transaction but with the ucontext
MSR[TS] unset.

Being in a transaction forces regs->msr[TS] to be set at syscall
entrance (since the CPU state is transactional), and the kernel then
calls treclaim() to flush the transaction state, which is done based on
the live (mfmsr) MSR state.

Since the user context MSR[TS] is not set, restore_tm_sigcontexts() is
not called, so no recheckpoint is executed and the CPU state is left
non-transactional. When rfid is executed, SRR1 will have MSR[TS] set,
but the CPU state is non-transactional, causing the TM Bad Thing with
the following stack:

[   33.862316] Bad kernel stack pointer 3fffd9dce3e0 at c000c47c
cpu 0x8: Vector: 700 (Program Check) at [c0003ff7fd40]
pc: c000c47c: fast_exception_return+0xac/0xb4
lr: 3fff865f442c
sp: 3fffd9dce3e0
   msr: 800102a03031
  current = 0xc0041f68b700
  paca= 0xcfb84800   softe: 0irq_happened: 0x01
pid   = 1721, comm = tm-signal-sigre
Linux version 4.9.0-3-powerpc64le (debian-ker...@lists.debian.org) (gcc 
version 6.3.0 20170516 (Debian 6.3.0-18) ) #1 SMP Debian 4.9.30-2+deb9u2 
(2017-06-26)
WARNING: exception is not recoverable, can't continue

The same problem happens with the 32-bit signal handler, and the fix is
very similar: if tm_recheckpoint() is not executed, then regs->msr[TS]
should be zeroed.

This patch also fixes a sparse warning related to lack of indentation when
CONFIG_PPC_TRANSACTIONAL_MEM is set.

Fixes: 2b0a576d15e0e ("powerpc: Add new transactional memory state to the 
signal context")
CC: Stable  # 3.10+
Signed-off-by: Breno Leitao 
---
 arch/powerpc/kernel/signal_32.c | 18 +-
 arch/powerpc/kernel/signal_64.c | 20 
 2 files changed, 29 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/kernel/signal_32.c b/arch/powerpc/kernel/signal_32.c
index e6474a45cef5..6327fd79b0fb 100644
--- a/arch/powerpc/kernel/signal_32.c
+++ b/arch/powerpc/kernel/signal_32.c
@@ -1140,11 +1140,11 @@ SYSCALL_DEFINE0(rt_sigreturn)
 {
struct rt_sigframe __user *rt_sf;
struct pt_regs *regs = current_pt_regs();
+   int tm_restore = 0;
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
struct ucontext __user *uc_transact;
unsigned long msr_hi;
unsigned long tmp;
-   int tm_restore = 0;
 #endif
/* Always make any pending restarted system calls return -EINTR */
current->restart_block.fn = do_no_restart_syscall;
@@ -1192,11 +1192,19 @@ SYSCALL_DEFINE0(rt_sigreturn)
goto bad;
}
}
-   if (!tm_restore)
-   /* Fall through, for non-TM restore */
+   if (!tm_restore) {
+   /*
+* Unset regs->msr because ucontext MSR TS is not
+* set, and recheckpoint was not called. This avoid
+* hitting a TM Bad thing at RFID
+*/
+   regs->msr &= ~MSR_TS_MASK;
+   }
+   /* Fall through, for non-TM restore */
 #endif
-   if (do_setcontext(_sf->uc, regs, 1))
-   goto bad;
+   if (!tm_restore)
+   if (do_setcontext(_sf->uc, regs, 1))
+   goto bad;
 
/*
 * It's not clear whether or why it is desirable to save the
diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
index 83d51bf586c7..daa28cb72272 100644
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -740,11 +740,23 @@ SYSCALL_DEFINE0(rt_sigreturn)
   _transact->uc_mcontext))
goto badframe;
}
-   else
-   /* Fall through, for non-TM restore */
 #endif
-   if (restore_sigcontext(current, NULL, 1, >uc_mcontext))
-   goto badframe;
+   /* Fall through, for non-TM restore */
+   if (!MSR_TM_ACTIVE(msr)) {
+   /*
+* Unset MSR[TS] on the thread regs since MSR from user
+* context does not have MSR active, and recheckpoint was
+* not called since restore_tm_sigcontexts() was not called
+* also.
+*
+* If not unsetting it, the code can RFID to userspace with
+* MSR[TS] set, but without CPU in the proper state,
+* causing a TM bad thing.
+*/
+   current->thread.regs->msr &= ~MSR_TS_MASK;
+   if (restore_sigcontext(current, NULL, 1, >uc_mcontext))
+   goto badframe;
+   }
 
if (restore_altstack(>uc_stack))
goto badframe;
-- 
2.19.0



[PATCH 2/4] powerpc/tm: Print scratch value

2018-11-26 Thread Breno Leitao
A TM Bad Thing exception is usually raised by one of three different
problems: a) touching SPRs in an active transaction; b) using a TM
instruction while the facility is disabled; and c) setting a wrong
MSR/SRR1 at RFID.

The first two cases are easy to identify by looking at the instructions.
The last case is harder, because the MSR is masked after RFID, so it is
very useful to look at the previous MSR (SRR1) before RFID as well as
the current, masked MSR.

Since the MSR is saved in the paca just before RFID, this patch prints
it when a TM Bad Thing happens, helping to understand which invalid TM
transition is causing the exception.

Signed-off-by: Breno Leitao 
---
 arch/powerpc/kernel/traps.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 9a86572db1ef..00af2c4febf4 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -1434,7 +1434,8 @@ void program_check_exception(struct pt_regs *regs)
goto bail;
} else {
printk(KERN_EMERG "Unexpected TM Bad Thing exception "
-  "at %lx (msr 0x%lx)\n", regs->nip, regs->msr);
+  "at %lx (msr 0x%lx) tm_scratch=%llx\n",
+  regs->nip, regs->msr, get_paca()->tm_scratch);
die("Unrecoverable exception", regs, SIGABRT);
}
}
-- 
2.19.0



[PATCH 1/4] powerpc/tm: Save MSR to PACA before RFID

2018-11-26 Thread Breno Leitao
As at the other exit points, move SRR1 (MSR) into paca->tm_scratch so
that, if there is a TM Bad Thing at RFID, it is easy to see which SRR1
value was being used.

Signed-off-by: Breno Leitao 
---
 arch/powerpc/kernel/entry_64.S | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 7b1693adff2a..ada1879a4e26 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -286,6 +286,10 @@ BEGIN_FTR_SECTION
HMT_MEDIUM_LOW
 END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 
+#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+   std r8, PACATMSCRATCH(r13)
+#endif
+
ld  r13,GPR13(r1)   /* only restore r13 if returning to usermode */
ld  r2,GPR2(r1)
ld  r1,GPR1(r1)
-- 
2.19.0



Re: [Intel-wired-lan] [PATCH V2] mm: Replace all open encodings for NUMA_NO_NODE

2018-11-26 Thread Jeff Kirsher
On Mon, 2018-11-26 at 17:56 +0530, Anshuman Khandual wrote:
> At present there are multiple places where invalid node number is
> encoded
> as -1. Even though implicitly understood it is always better to have
> macros
> in there. Replace these open encodings for an invalid node number
> with the
> global macro NUMA_NO_NODE. This helps remove NUMA related assumptions
> like
> 'invalid node' from various places redirecting them to a common
> definition.
> 
> Signed-off-by: Anshuman Khandual 

For the 'ixgbe' driver changes.

Acked-by: Jeff Kirsher 


> ---
> Changes in V2:
> 
> - Added inclusion of 'numa.h' header at various places per Andrew
> - Updated 'dev_to_node' to use NUMA_NO_NODE instead per Vinod
> 
> Changes in V1: (https://lkml.org/lkml/2018/11/23/485)
> 
> - Dropped OCFS2 changes per Joseph
> - Dropped media/video drivers changes per Hans
> 
> RFC - https://patchwork.kernel.org/patch/10678035/
> 
> Build tested this with multiple cross compiler options like alpha,
> sparc,
> arm64, x86, powerpc, powerpc64le etc with their default config which
> might
> not have compiled tested all driver related changes. I will
> appreciate
> folks giving this a test in their respective build environment.
> 
> All these places for replacement were found by running the following
> grep
> patterns on the entire kernel code. Please let me know if this might
> have
> missed some instances. This might also have replaced some false
> positives.
> I will appreciate suggestions, inputs and review.
> 
> 1. git grep "nid == -1"
> 2. git grep "node == -1"
> 3. git grep "nid = -1"
> 4. git grep "node = -1"
> 
>  arch/alpha/include/asm/topology.h |  3 ++-
>  arch/ia64/kernel/numa.c   |  2 +-
>  arch/ia64/mm/discontig.c  |  6 +++---
>  arch/ia64/sn/kernel/io_common.c   |  3 ++-
>  arch/powerpc/include/asm/pci-bridge.h |  3 ++-
>  arch/powerpc/kernel/paca.c|  3 ++-
>  arch/powerpc/kernel/pci-common.c  |  3 ++-
>  arch/powerpc/mm/numa.c| 14 +++---
>  arch/powerpc/platforms/powernv/memtrace.c |  5 +++--
>  arch/sparc/kernel/auxio_32.c  |  3 ++-
>  arch/sparc/kernel/pci_fire.c  |  3 ++-
>  arch/sparc/kernel/pci_schizo.c|  3 ++-
>  arch/sparc/kernel/pcic.c  |  7 ---
>  arch/sparc/kernel/psycho_common.c |  3 ++-
>  arch/sparc/kernel/sbus.c  |  3 ++-
>  arch/sparc/mm/init_64.c   |  6 +++---
>  arch/sparc/prom/init_32.c |  3 ++-
>  arch/sparc/prom/init_64.c |  5 +++--
>  arch/sparc/prom/tree_32.c | 13 +++--
>  arch/sparc/prom/tree_64.c | 19 ++---
> --
>  arch/x86/include/asm/pci.h|  3 ++-
>  arch/x86/kernel/apic/x2apic_uv_x.c|  7 ---
>  arch/x86/kernel/smpboot.c |  3 ++-
>  arch/x86/platform/olpc/olpc_dt.c  | 17 +
>  drivers/block/mtip32xx/mtip32xx.c |  5 +++--
>  drivers/dma/dmaengine.c   |  4 +++-
>  drivers/infiniband/hw/hfi1/affinity.c |  3 ++-
>  drivers/infiniband/hw/hfi1/init.c |  3 ++-
>  drivers/iommu/dmar.c  |  5 +++--
>  drivers/iommu/intel-iommu.c   |  3 ++-
>  drivers/misc/sgi-xp/xpc_uv.c  |  3 ++-
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  5 +++--
>  include/linux/device.h|  2 +-
>  init/init_task.c  |  3 ++-
>  kernel/kthread.c  |  3 ++-
>  kernel/sched/fair.c   | 15 ---
>  lib/cpumask.c |  3 ++-
>  mm/huge_memory.c  | 13 +++--
>  mm/hugetlb.c  |  3 ++-
>  mm/ksm.c  |  2 +-
>  mm/memory.c   |  7 ---
>  mm/memory_hotplug.c   | 12 ++--
>  mm/mempolicy.c|  2 +-
>  mm/page_alloc.c   |  4 ++--
>  mm/page_ext.c |  2 +-
>  net/core/pktgen.c |  3 ++-
>  net/qrtr/qrtr.c   |  3 ++-
>  tools/perf/bench/numa.c   |  6 +++---
>  48 files changed, 146 insertions(+), 108 deletions(-)





Re: [Intel-wired-lan] [PATCH V2] mm: Replace all open encodings for NUMA_NO_NODE

2018-11-26 Thread Jens Axboe
On 11/26/18 10:56 AM, Jeff Kirsher wrote:
> On Mon, 2018-11-26 at 17:56 +0530, Anshuman Khandual wrote:
>> At present there are multiple places where invalid node number is
>> encoded
>> as -1. Even though implicitly understood it is always better to have
>> macros
>> in there. Replace these open encodings for an invalid node number
>> with the
>> global macro NUMA_NO_NODE. This helps remove NUMA related assumptions
>> like
>> 'invalid node' from various places redirecting them to a common
>> definition.
>>
>> Signed-off-by: Anshuman Khandual 
> 
> For the 'ixgbe' driver changes.
> 
> Acked-by: Jeff Kirsher 

Lost the original, but for mtip32xx:

Acked-by: Jens Axboe 

-- 
Jens Axboe



[PATCH RESEND] powerpc/perf: Update perf_regs structure to include SIER

2018-11-26 Thread Madhavan Srinivasan
On each sample, the Sample Instruction Event Register (SIER) content
is saved in pt_regs. SIER does not have an entry of its own in pt_regs;
instead, its content is saved in the "dar" field of pt_regs.

This patch adds another entry to the perf_regs structure to expose
"SIER", which internally maps to the "dar" field of pt_regs.
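
As a usage sketch (the function name is made up; PERF_REG_POWERPC_SIER
is the enum value added by this patch, the other attr fields are the
standard perf_event_open() ones), a consumer would request the new
register via PERF_SAMPLE_REGS_INTR:

	#include <string.h>
	#include <linux/perf_event.h>
	#include <asm/perf_regs.h>

	/* Ask for SIER in the sampled interrupt registers. */
	static void example_setup_attr(struct perf_event_attr *attr)
	{
		memset(attr, 0, sizeof(*attr));
		attr->size = sizeof(*attr);
		attr->type = PERF_TYPE_HARDWARE;
		attr->config = PERF_COUNT_HW_CPU_CYCLES;
		attr->sample_period = 100000;
		attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_REGS_INTR;
		attr->sample_regs_intr = 1ULL << PERF_REG_POWERPC_SIER;
	}

With the matching tools/perf change, the register should also be
selectable by name from perf record (via its --intr-regs option).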

Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Alexander Shishkin 
Cc: Anju T Sudhakar 
Cc: Ravi Bangoria 
Tested-by: Ravi Bangoria 
Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/include/uapi/asm/perf_regs.h   | 1 +
 arch/powerpc/perf/perf_regs.c   | 1 +
 tools/arch/powerpc/include/uapi/asm/perf_regs.h | 1 +
 tools/perf/arch/powerpc/include/perf_regs.h | 3 ++-
 tools/perf/arch/powerpc/util/perf_regs.c| 1 +
 5 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/uapi/asm/perf_regs.h 
b/arch/powerpc/include/uapi/asm/perf_regs.h
index 9e52c86ccbd3..ff91192407d1 100644
--- a/arch/powerpc/include/uapi/asm/perf_regs.h
+++ b/arch/powerpc/include/uapi/asm/perf_regs.h
@@ -46,6 +46,7 @@ enum perf_event_powerpc_regs {
PERF_REG_POWERPC_TRAP,
PERF_REG_POWERPC_DAR,
PERF_REG_POWERPC_DSISR,
+   PERF_REG_POWERPC_SIER,
PERF_REG_POWERPC_MAX,
 };
 #endif /* _UAPI_ASM_POWERPC_PERF_REGS_H */
diff --git a/arch/powerpc/perf/perf_regs.c b/arch/powerpc/perf/perf_regs.c
index 09ceea6175ba..c262aea22ad9 100644
--- a/arch/powerpc/perf/perf_regs.c
+++ b/arch/powerpc/perf/perf_regs.c
@@ -69,6 +69,7 @@ static unsigned int pt_regs_offset[PERF_REG_POWERPC_MAX] = {
PT_REGS_OFFSET(PERF_REG_POWERPC_TRAP, trap),
PT_REGS_OFFSET(PERF_REG_POWERPC_DAR, dar),
PT_REGS_OFFSET(PERF_REG_POWERPC_DSISR, dsisr),
+   PT_REGS_OFFSET(PERF_REG_POWERPC_SIER, dar),
 };
 
 u64 perf_reg_value(struct pt_regs *regs, int idx)
diff --git a/tools/arch/powerpc/include/uapi/asm/perf_regs.h 
b/tools/arch/powerpc/include/uapi/asm/perf_regs.h
index 9e52c86ccbd3..ff91192407d1 100644
--- a/tools/arch/powerpc/include/uapi/asm/perf_regs.h
+++ b/tools/arch/powerpc/include/uapi/asm/perf_regs.h
@@ -46,6 +46,7 @@ enum perf_event_powerpc_regs {
PERF_REG_POWERPC_TRAP,
PERF_REG_POWERPC_DAR,
PERF_REG_POWERPC_DSISR,
+   PERF_REG_POWERPC_SIER,
PERF_REG_POWERPC_MAX,
 };
 #endif /* _UAPI_ASM_POWERPC_PERF_REGS_H */
diff --git a/tools/perf/arch/powerpc/include/perf_regs.h 
b/tools/perf/arch/powerpc/include/perf_regs.h
index 00e37b106913..1076393e6f43 100644
--- a/tools/perf/arch/powerpc/include/perf_regs.h
+++ b/tools/perf/arch/powerpc/include/perf_regs.h
@@ -62,7 +62,8 @@ static const char *reg_names[] = {
[PERF_REG_POWERPC_SOFTE] = "softe",
[PERF_REG_POWERPC_TRAP] = "trap",
[PERF_REG_POWERPC_DAR] = "dar",
-   [PERF_REG_POWERPC_DSISR] = "dsisr"
+   [PERF_REG_POWERPC_DSISR] = "dsisr",
+   [PERF_REG_POWERPC_SIER] = "sier"
 };
 
 static inline const char *perf_reg_name(int id)
diff --git a/tools/perf/arch/powerpc/util/perf_regs.c 
b/tools/perf/arch/powerpc/util/perf_regs.c
index ec50939b0418..07fcd977d93e 100644
--- a/tools/perf/arch/powerpc/util/perf_regs.c
+++ b/tools/perf/arch/powerpc/util/perf_regs.c
@@ -52,6 +52,7 @@ const struct sample_reg sample_reg_masks[] = {
SMPL_REG(trap, PERF_REG_POWERPC_TRAP),
SMPL_REG(dar, PERF_REG_POWERPC_DAR),
SMPL_REG(dsisr, PERF_REG_POWERPC_DSISR),
+   SMPL_REG(sier, PERF_REG_POWERPC_SIER),
SMPL_REG_END
 };
 
-- 
2.7.4



Re: [PATCH v2 16/20] sparc: perf/core: advertise PMU exclusion capability

2018-11-26 Thread David Miller
From: Andrew Murray 
Date: Mon, 26 Nov 2018 11:12:32 +

> The SPARC PMU has the capability to exclude events based on context
>  - let's advertise that we support the PERF_PMU_CAP_EXCLUDE
> capability to ensure that perf doesn't prevent us from handling
> events where any exclusion flags are set.
> 
> Signed-off-by: Andrew Murray 

Acked-by: David S. Miller 


Re: [PATCH RFC] mm/memory_hotplug: Introduce memory block types

2018-11-26 Thread David Hildenbrand
On 26.11.18 15:20, Michal Suchánek wrote:
> On Mon, 26 Nov 2018 14:33:29 +0100
> David Hildenbrand  wrote:
> 
>> On 26.11.18 13:30, David Hildenbrand wrote:
>>> On 23.11.18 19:06, Michal Suchánek wrote:  
> 

 If we are going to fake the driver information we may as well add the
 type attribute and be done with it.

 I think the problem with the patch was more with the semantic than the
 attribute itself.

 What is normal, paravirtualized, and standby memory?

 I can understand DIMM device, baloon device, or whatever mechanism for
 adding memory you might have.

 I can understand "memory designated as standby by the cluster
 administrator".

 However, DIMM vs baloon is orthogonal to standby and should not be
 conflated into one property.

 paravirtualized means nothing at all in relationship to memory type and
 the desired online policy to me.  
>>>
>>> Right, so with whatever we come up, it should allow to make a decision
>>> in user space about
>>> - if memory is to be onlined automatically  
>>
>> And I will think about if we really should model standby memory. Maybe
>> it is really better to have in user space something like (as Dan noted)
> 
> If it is possible to designate the memory as standby or online in the
> s390 admin interface and the kernel does have access to this
> information it makes sense to forward it to userspace (as separate
> s390-specific property). If not then you need to make some kind of
> assumption like below and the user can tune the script according to
> their usecase.

Also true, standby memory really represents a distinct type of memory
block (memory seems to be there but really isn't). Right now I am
thinking about something like this (tried to formulate it on a very
generic level because we can't predict which mechanism might want to
make use of these types in the future).


/*
 * Memory block types allow user space to formulate rules if and how to
 * online memory blocks. The types are exposed to user space as text
 * strings in sysfs. While the typical online strategies are described
 * along with the types, there are use cases where that can differ (e.g.
 * use MOVABLE zone for more reliable huge page usage, use NORMAL zone
 * due to zone imbalance or because memory unplug is not intended).
 *
 * MEMORY_BLOCK_NONE:
 *  No memory block is to be created (e.g. device memory). Used internally
 *  only.
 *
 * MEMORY_BLOCK_REMOVABLE:
 *  This memory block type should be treated as if it can be
 *  removed/unplugged from the system again. E.g. there is a hardware
 *  interface to unplug such memory. This memory block type is usually
 *  onlined to the MOVABLE zone, to e.g. make offlining of it more
 *  reliable. Examples include ACPI and PPC DIMMs.
 *
 * MEMORY_BLOCK_UNREMOVABLE:
 *  This memory block type should be treated as if it can not be
 *  removed/unplugged again. E.g. there is no hardware interface to
 *  unplug such memory. This memory block type is usually onlined to
 *  the NORMAL zone, as offlining is not beneficial. Examples include boot
 *  memory on most architectures and memory added via balloon devices.
 *
 * MEMORY_BLOCK_STANDBY:
 *  The memory block type should be treated as if it can be
 *  removed/unplugged again, however the actual memory hot(un)plug is
 *  performed by onlining/offlining. In virtual environments, such memory
 *  is usually added during boot and never removed. Onlining memory will
 *  result in memory getting allocated to a VM. This memory type is usually
 *  not onlined automatically but explicitly by the administrator. One
 *  example is standby memory on s390x.
 */
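
To make the intended user space consumption concrete, a policy helper
keyed off such a type string could look like this (purely illustrative;
the type names are the proposal above, not an existing interface, and
the returned strings are what would be written to the memory block's
"state" attribute):

#include <string.h>

static const char *online_mode(const char *type)
{
	if (!strcmp(type, "removable"))
		return "online_movable";	/* may be unplugged again */
	if (!strcmp(type, "unremovable"))
		return "online";		/* boot/balloon memory */
	if (!strcmp(type, "standby"))
		return NULL;			/* leave to the administrator */
	return "online";
}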

> 
>>
>> if (isS390x() && type == "dimm") {
>>  /* don't online, on s390x system DIMMs are standby memory */
>> }
> 
> Thanks
> 
> Michal
> 


-- 

Thanks,

David / dhildenb


Re: [PATCH v2 03/20] perf/core: add PERF_PMU_CAP_EXCLUDE for exclusion capable PMUs

2018-11-26 Thread Andrew Murray
On Mon, Nov 26, 2018 at 02:10:24PM +, Robin Murphy wrote:
> Hi Andrew,
> 
> On 26/11/2018 11:12, Andrew Murray wrote:
> > Many PMU drivers do not have the capability to exclude counting events
> > that occur in specific contexts such as idle, kernel, guest, etc. These
> > drivers indicate this by returning an error in their event_init upon
> > testing the events attribute flags. This approach is error prone and
> > often inconsistent.
> > 
> > Let's instead allow PMU drivers to advertise their ability to exclude
> > based on context via a new capability: PERF_PMU_CAP_EXCLUDE. This
> > allows the perf core to reject requests for exclusion events where
> > there is no support in the PMU.
> > 
> > Signed-off-by: Andrew Murray 
> > ---
> >   include/linux/perf_event.h | 1 +
> >   kernel/events/core.c   | 9 +
> >   2 files changed, 10 insertions(+)
> > 
> > diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> > index b2e806f..69b3d65 100644
> > --- a/include/linux/perf_event.h
> > +++ b/include/linux/perf_event.h
> > @@ -244,6 +244,7 @@ struct perf_event;
> >   #define PERF_PMU_CAP_EXCLUSIVE0x10
> >   #define PERF_PMU_CAP_ITRACE   0x20
> >   #define PERF_PMU_CAP_HETEROGENEOUS_CPUS   0x40
> > +#define PERF_PMU_CAP_EXCLUDE   0x80
> >   /**
> >* struct pmu - generic performance monitoring unit
> > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > index 5a97f34..9afb33c 100644
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -9743,6 +9743,15 @@ static int perf_try_init_event(struct pmu *pmu, 
> > struct perf_event *event)
> > if (ctx)
> > perf_event_ctx_unlock(event->group_leader, ctx);
> > +   if (!ret) {
> > +   if (!(pmu->capabilities & PERF_PMU_CAP_EXCLUDE) &&
> > +   event_has_any_exclude_flag(event)) {
> 
> Technically this is a bisection-breaker, since no driver has this capability
> yet - ideally, this patch should come after all the ones introducing it to
> the relevant drivers (with the removal of the now-redundant code from the
> other drivers at the end).

Indeed. Though it is possible to first introduce the capability, update the
relevant drivers to advertise it, then add the change to core.c, and finally
remove the error checks made unnecessary by the new capability.
This approach would be bisection-proof.
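
For instance, the "update the relevant drivers" step would amount to
something like this in each exclusion-capable driver (the example_*
names are made up; with the negative-capability alternative it would
instead be the drivers that cannot filter which set
PERF_PMU_CAP_NO_EXCLUDE):

	static struct pmu example_pmu = {
		.event_init	= example_event_init,
		.add		= example_event_add,
		.del		= example_event_del,
		.start		= example_event_start,
		.stop		= example_event_stop,
		.read		= example_event_read,
		/* event_init honours the exclude_* attribute flags */
		.capabilities	= PERF_PMU_CAP_EXCLUDE,
	};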

> 
> Alternatively, since we already have several other negative capabilities,
> unless there's a strong feeling against adding any more then it might work
> out simpler to flip it to PERF_PMU_CAP_NO_EXCLUDE, such that we only need to
> introduce the core check then directly replace the open-coded event checks
> with the capability in the appropriate drivers, and need not touch the
> exclusion-supporting ones at all.

This would certainly be less risky and invasive (e.g. compare the number of
files touched between this v2 and the previous v1).

I'm happy with either approach.

Thanks,

Andrew Murray

> 
> Robin.
> 
> > +   if (event->destroy)
> > +   event->destroy(event);
> > +   ret = -EINVAL;
> > +   }
> > +   }
> > +
> > if (ret)
> > module_put(pmu->module);
> > 


Re: New linux-next KCFLAGS

2018-11-26 Thread Gustavo A. R. Silva

On 11/26/18 8:00 AM, Stephen Rothwell wrote:

Hi all,

On Mon, 26 Nov 2018 17:49:26 +1100 Stephen Rothwell  
wrote:


On Sun, 25 Nov 2018 21:46:20 -0800 Kees Cook  wrote:


Excellent! (Though, wait, does this mean everyone _else_ will see this
too? I'm worried that will be way too noisy...)


Ah, yes, you may be right.  everyone will see them :-(

I may have to figure out another way to do this for tomorrow.


OK, starting tomorrow (later today), I have just added
-Wimplicit-fallthrough to KCFLAGS in my build scripts.



Awesome!

Thank you, Stephen. :)
--
Gustavo


[PATCH] powerpc/mm/hash: Handle user access of kernel address gracefully

2018-11-26 Thread Aneesh Kumar K.V
With commit 2865d08dd9ea ("powerpc/mm: Move the DSISR_PROTFAULT sanity check")
we moved the protection fault access check before the vma lookup. That means
we now hit that WARN_ON when user space accesses a kernel address.  Before
the commit this was handled by find_vma() not finding a vma for the kernel
address and treating that access as a bad area access.

Avoid the confusing WARN_ON and convert it to a ratelimited printk.
With the patch we now get:

for load:
[  187.700294] a.out[5997]: User access of kernel address (c000dea0) - 
exploit attempt? (uid: 1000)
[  187.700344] a.out[5997]: segfault (11) at c000dea0 nip 1317c0798 lr 
7fff80d6441c code 1 in a.out[1317c+1]
[  187.700429] a.out[5997]: code: 6000 6042 3c4c0002 38427790 4b20 
3c4c0002 38427784 fbe1fff8
[  187.700435] a.out[5997]: code: f821ffc1 7c3f0b78 6000 e9228030 
<8929> 993f002f 6000 383f0040

for exec:
[  225.100903] a.out[6067]: User access of kernel address (c000dea0) - 
exploit attempt? (uid: 1000)
[  225.100938] a.out[6067]: segfault (11) at c000dea0 nip 
c000dea0 lr 129d507b0 code 1
[  225.100943] a.out[6067]: Bad NIP, not dumping instructions.
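
For reference, the user space accesses that produce the two messages
above are essentially the following (sketch; each statement on its own
triggers the corresponding case):

	volatile char *p = (volatile char *)0xc000000000000000UL; /* kernel addr */
	char c = *p;			/* the "for load" case */
	((void (*)(void))p)();		/* the "for exec" case */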

Fixes: 2865d08dd9ea ("powerpc/mm: Move the DSISR_PROTFAULT sanity check")
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/fault.c | 29 -
 1 file changed, 24 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 1697e903bbf2..46f280068c45 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -342,8 +342,21 @@ static inline void cmo_account_page_fault(void) { }
 #endif /* CONFIG_PPC_SMLPAR */
 
 #ifdef CONFIG_PPC_STD_MMU
-static void sanity_check_fault(bool is_write, unsigned long error_code)
+static void sanity_check_fault(bool is_write, bool is_user,
+  unsigned long error_code, unsigned long address)
 {
+   /*
+* userspace trying to access kernel address, we get PROTFAULT for that.
+*/
+   if (is_user && address >= TASK_SIZE) {
+   printk_ratelimited(KERN_CRIT "%s[%d]: "
+  "User access of kernel address (%lx) - "
+  "exploit attempt? (uid: %d)\n",
+  current->comm, current->pid, address,
+  from_kuid(_user_ns, current_uid()));
+   return;
+   }
+
/*
 * For hash translation mode, we should never get a
 * PROTFAULT. Any update to pte to reduce access will result in us
@@ -373,11 +386,17 @@ static void sanity_check_fault(bool is_write, unsigned 
long error_code)
 * For radix, we can get prot fault for autonuma case, because radix
 * page table will have them marked noaccess for user.
 */
-   if (!radix_enabled() && !is_write)
-   WARN_ON_ONCE(error_code & DSISR_PROTFAULT);
+   if (radix_enabled() || is_write)
+   return;
+
+   WARN_ON_ONCE(error_code & DSISR_PROTFAULT);
 }
 #else
-static void sanity_check_fault(bool is_write, unsigned long error_code) { }
+static void sanity_check_fault(bool is_write, bool is_user,
+  unsigned long error_code, unsigned long address)
+{
+
+}
 #endif /* CONFIG_PPC_STD_MMU */
 
 /*
@@ -435,7 +454,7 @@ static int __do_page_fault(struct pt_regs *regs, unsigned 
long address,
}
 
/* Additional sanity check(s) */
-   sanity_check_fault(is_write, error_code);
+   sanity_check_fault(is_write, is_user, error_code, address);
 
/*
 * The kernel should never take an execute fault nor should it
-- 
2.19.1



Re: [PATCH RFC] mm/memory_hotplug: Introduce memory block types

2018-11-26 Thread Michal Suchánek
On Mon, 26 Nov 2018 14:33:29 +0100
David Hildenbrand  wrote:

> On 26.11.18 13:30, David Hildenbrand wrote:
> > On 23.11.18 19:06, Michal Suchánek wrote:  

> >>
> >> If we are going to fake the driver information we may as well add the
> >> type attribute and be done with it.
> >>
> >> I think the problem with the patch was more with the semantic than the
> >> attribute itself.
> >>
> >> What is normal, paravirtualized, and standby memory?
> >>
> >> I can understand DIMM device, baloon device, or whatever mechanism for
> >> adding memory you might have.
> >>
> >> I can understand "memory designated as standby by the cluster
> >> administrator".
> >>
> >> However, DIMM vs baloon is orthogonal to standby and should not be
> >> conflated into one property.
> >>
> >> paravirtualized means nothing at all in relationship to memory type and
> >> the desired online policy to me.  
> > 
> > Right, so with whatever we come up, it should allow to make a decision
> > in user space about
> > - if memory is to be onlined automatically  
> 
> And I will think about if we really should model standby memory. Maybe
> it is really better to have in user space something like (as Dan noted)

If it is possible to designate the memory as standby or online in the
s390 admin interface and the kernel does have access to this
information it makes sense to forward it to userspace (as separate
s390-specific property). If not then you need to make some kind of
assumption like below and the user can tune the script according to
their usecase.

> 
> if (isS390x() && type == "dimm") {
>   /* don't online, on s390x system DIMMs are standby memory */
> }

Thanks

Michal


Re: [PATCH v2 03/20] perf/core: add PERF_PMU_CAP_EXCLUDE for exclusion capable PMUs

2018-11-26 Thread Robin Murphy

Hi Andrew,

On 26/11/2018 11:12, Andrew Murray wrote:

Many PMU drivers do not have the capability to exclude counting events
that occur in specific contexts such as idle, kernel, guest, etc. These
drivers indicate this by returning an error in their event_init upon
testing the events attribute flags. This approach is error prone and
often inconsistent.

Let's instead allow PMU drivers to advertise their ability to exclude
based on context via a new capability: PERF_PMU_CAP_EXCLUDE. This
allows the perf core to reject requests for exclusion events where
there is no support in the PMU.

Signed-off-by: Andrew Murray 
---
  include/linux/perf_event.h | 1 +
  kernel/events/core.c   | 9 +
  2 files changed, 10 insertions(+)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index b2e806f..69b3d65 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -244,6 +244,7 @@ struct perf_event;
  #define PERF_PMU_CAP_EXCLUSIVE0x10
  #define PERF_PMU_CAP_ITRACE   0x20
  #define PERF_PMU_CAP_HETEROGENEOUS_CPUS   0x40
+#define PERF_PMU_CAP_EXCLUDE   0x80
  
  /**

   * struct pmu - generic performance monitoring unit
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 5a97f34..9afb33c 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9743,6 +9743,15 @@ static int perf_try_init_event(struct pmu *pmu, struct 
perf_event *event)
if (ctx)
perf_event_ctx_unlock(event->group_leader, ctx);
  
+	if (!ret) {

+   if (!(pmu->capabilities & PERF_PMU_CAP_EXCLUDE) &&
+   event_has_any_exclude_flag(event)) {


Technically this is a bisection-breaker, since no driver has this 
capability yet - ideally, this patch should come after all the ones 
introducing it to the relevant drivers (with the removal of the 
now-redundant code from the other drivers at the end).


Alternatively, since we already have several other negative 
capabilities, unless there's a strong feeling against adding any more 
then it might work out simpler to flip it to PERF_PMU_CAP_NO_EXCLUDE, 
such that we only need to introduce the core check then directly replace 
the open-coded event checks with the capability in the appropriate 
drivers, and need not touch the exclusion-supporting ones at all.


Robin.


+   if (event->destroy)
+   event->destroy(event);
+   ret = -EINVAL;
+   }
+   }
+
if (ret)
module_put(pmu->module);
  



Re: [PATCH V2] mm: Replace all open encodings for NUMA_NO_NODE

2018-11-26 Thread Anshuman Khandual



On 11/26/2018 06:18 PM, David Hildenbrand wrote:
> On 26.11.18 13:26, Anshuman Khandual wrote:
>> At present there are multiple places where invalid node number is encoded
>> as -1. Even though implicitly understood it is always better to have macros
>> in there. Replace these open encodings for an invalid node number with the
>> global macro NUMA_NO_NODE. This helps remove NUMA related assumptions like
>> 'invalid node' from various places redirecting them to a common definition.
>>
>> Signed-off-by: Anshuman Khandual 
>> ---
>> Changes in V2:
>>
>> - Added inclusion of 'numa.h' header at various places per Andrew
>> - Updated 'dev_to_node' to use NUMA_NO_NODE instead per Vinod
> 
> Reviewed-by: David Hildenbrand 

Thanks David. My bad, forgot to add your review tag from the earlier version.


Re: New linux-next KCFLAGS

2018-11-26 Thread Stephen Rothwell
Hi all,

On Mon, 26 Nov 2018 17:49:26 +1100 Stephen Rothwell  
wrote:
>
> On Sun, 25 Nov 2018 21:46:20 -0800 Kees Cook  wrote:
> >
> > Excellent! (Though, wait, does this mean everyone _else_ will see this
> > too? I'm worried that will be way too noisy...)  
> 
> Ah, yes, you may be right.  everyone will see them :-(
> 
> I may have to figure out another way to do this for tomorrow.

OK, starting tomorrow (later today), I have just added
-Wimplicit-fallthrough to KCFLAGS in my build scripts.

-- 
Cheers,
Stephen Rothwell




Re: [PATCH RFC] mm/memory_hotplug: Introduce memory block types

2018-11-26 Thread David Hildenbrand
On 26.11.18 13:30, David Hildenbrand wrote:
> On 23.11.18 19:06, Michal Suchánek wrote:
>> On Fri, 23 Nov 2018 12:13:58 +0100
>> David Hildenbrand  wrote:
>>
>>> On 28.09.18 17:03, David Hildenbrand wrote:
 How to/when to online hotplugged memory is hard to manage for
 distributions because different memory types are to be treated differently.
 Right now, we need complicated udev rules that e.g. check if we are
 running on s390x, on a physical system or on a virtualized system. But
 there is also sometimes the demand to really online memory immediately
 while adding in the kernel and not to wait for user space to make a
 decision. And on virtualized systems there might be different
 requirements, depending on "how" the memory was added (and if it will
 eventually get unplugged again - DIMM vs. paravirtualized mechanisms).

 On the one hand, we have physical systems where we sometimes
 want to be able to unplug memory again - e.g. a DIMM - so we have to online
 it to the MOVABLE zone optionally. That decision is usually made in user
 space.

 On the other hand, we have memory that should never be onlined
 automatically, only when asked for by an administrator. Such memory only
 applies to virtualized environments like s390x, where the concept of
 "standby" memory exists. Memory is detected and added during boot, so it
 can be onlined when requested by the admininistrator or some tooling.
 Only when onlining, memory will be allocated in the hypervisor.

 But then, we also have paravirtualized devices (namely xen and hyper-v
 balloons), that hotplug memory that will never ever be removed from a
 system right now using offline_pages/remove_memory. If at all, this memory
 is logically unplugged and handed back to the hypervisor via ballooning.

 For paravirtualized devices it is relevant that memory is onlined as
 quickly as possible after adding - and that it is added to the NORMAL
 zone. Otherwise, it could happen that too much memory in a row is added
 (but not onlined), resulting in out-of-memory conditions due to the
 additional memory for "struct pages" and friends. MOVABLE zone as well
 as delays might be very problematic and lead to crashes (e.g. zone
 imbalance).

 Therefore, introduce memory block types and online memory depending on
 it when adding the memory. Expose the memory type to user space, so user
 space handlers can start to process only "normal" memory. Other memory
 block types can be ignored. One thing less to worry about in user space.
   
>>>
>>> So I was looking into alternatives.
>>>
>>> 1. Provide only "normal" and "standby" memory types to user space. This
>>> way user space can make smarter decisions about how to online memory.
>>> Not really sure if this is the right way to go.
>>>
>>>
>>> 2. Use device driver information (as mentioned by Michal S.).
>>>
>>> The problem right now is that there are no drivers for memory block
>>> devices. The "memory" subsystem has no drivers, so the KOBJ_ADD uevent
>>> will not contain a "DRIVER" information and we ave no idea what kind of
>>> memory block device we hold in our hands.
>>>
>>> $ udevadm info -q all -a /sys/devices/system/memory/memory0
>>>
>>>   looking at device '/devices/system/memory/memory0':
>>> KERNEL=="memory0"
>>> SUBSYSTEM=="memory"
>>> DRIVER==""
>>> ATTR{online}=="1"
>>> ATTR{phys_device}=="0"
>>> ATTR{phys_index}==""
>>> ATTR{removable}=="0"
>>> ATTR{state}=="online"
>>> ATTR{valid_zones}=="none"
>>>
>>>
>>> If we would provide "fake" drivers for the memory block devices we want
>>> to treat in a special way in user space (e.g. standby memory on s390x),
>>> user space could use that information to make smarter decisions.
>>>
>>> Adding such drivers might work. My suggestion would be to let ordinary
>>> DIMMs be without a driver for now and only special case standby memory
>>> and eventually paravirtualized memory devices (XEN and Hyper-V).
>>>
>>> Any thoughts?
>>
>> If we are going to fake the driver information we may as well add the
>> type attribute and be done with it.
>>
>> I think the problem with the patch was more with the semantic than the
>> attribute itself.
>>
>> What is normal, paravirtualized, and standby memory?
>>
>> I can understand DIMM device, baloon device, or whatever mechanism for
>> adding memory you might have.
>>
>> I can understand "memory designated as standby by the cluster
>> administrator".
>>
>> However, DIMM vs baloon is orthogonal to standby and should not be
>> conflated into one property.
>>
>> paravirtualized means nothing at all in relationship to memory type and
>> the desired online policy to me.
> 
> Right, so with whatever we come up, it should allow to make a decision
> in user space about
> - if memory is to be onlined automatically

And I will think about if we really should 

Re: [PATCH v2] cpufreq: powernv: add of_node_put()

2018-11-26 Thread Frank Lee
On Wed, Nov 21, 2018 at 12:02 PM Viresh Kumar  wrote:
>
> On 20-11-18, 11:05, Yangtao Li wrote:
> > The of_find_node_by_path() returns a node pointer with refcount
> > incremented,but there is the lack of use of the of_node_put() when
> > done.Add the missing of_node_put() to release the refcount.
> >
> > Signed-off-by: Yangtao Li 
> > ---
> > Changes in v2
> > -update changelog
> > ---
> >  drivers/cpufreq/powernv-cpufreq.c | 17 +++--
> >  1 file changed, 11 insertions(+), 6 deletions(-)
>
> Acked-by: Viresh Kumar 
>
> --
> viresh
Hi Gautham:

Can you add your Reviewed-by here?

Thanks,
Yangtao


Re: [PATCH V2] mm: Replace all open encodings for NUMA_NO_NODE

2018-11-26 Thread David Hildenbrand
On 26.11.18 13:26, Anshuman Khandual wrote:
> At present there are multiple places where invalid node number is encoded
> as -1. Even though implicitly understood it is always better to have macros
> in there. Replace these open encodings for an invalid node number with the
> global macro NUMA_NO_NODE. This helps remove NUMA related assumptions like
> 'invalid node' from various places redirecting them to a common definition.
> 
> Signed-off-by: Anshuman Khandual 
> ---
> Changes in V2:
> 
> - Added inclusion of 'numa.h' header at various places per Andrew
> - Updated 'dev_to_node' to use NUMA_NO_NODE instead per Vinod

Reviewed-by: David Hildenbrand 

-- 

Thanks,

David / dhildenb


Re: [PATCH] cpufreq: powernv: add of_node_put()

2018-11-26 Thread Gautham R Shenoy
Hello Yangtao Li,

On Tue, Nov 20, 2018 at 07:57:31AM -0500, Yangtao Li wrote:
> use of_node_put() to release the refcount.
>

Thanks for the patch.

Reviewed-by: Gautham R. Shenoy 

> Signed-off-by: Yangtao Li 
> ---
>  drivers/cpufreq/powernv-cpufreq.c | 17 +++--
>  1 file changed, 11 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/cpufreq/powernv-cpufreq.c 
> b/drivers/cpufreq/powernv-cpufreq.c
> index bf6519cf64bc..7e7ad3879c4e 100644
> --- a/drivers/cpufreq/powernv-cpufreq.c
> +++ b/drivers/cpufreq/powernv-cpufreq.c
> @@ -253,18 +253,18 @@ static int init_powernv_pstates(void)
> 
>   if (of_property_read_u32(power_mgt, "ibm,pstate-min", _min)) {
>   pr_warn("ibm,pstate-min node not found\n");
> - return -ENODEV;
> + goto out;
>   }
> 
>   if (of_property_read_u32(power_mgt, "ibm,pstate-max", _max)) {
>   pr_warn("ibm,pstate-max node not found\n");
> - return -ENODEV;
> + goto out;
>   }
> 
>   if (of_property_read_u32(power_mgt, "ibm,pstate-nominal",
>_nominal)) {
>   pr_warn("ibm,pstate-nominal not found\n");
> - return -ENODEV;
> + goto out;
>   }
> 
>   if (of_property_read_u32(power_mgt, "ibm,pstate-ultra-turbo",
> @@ -293,14 +293,14 @@ static int init_powernv_pstates(void)
>   pstate_ids = of_get_property(power_mgt, "ibm,pstate-ids", _ids);
>   if (!pstate_ids) {
>   pr_warn("ibm,pstate-ids not found\n");
> - return -ENODEV;
> + goto out;
>   }
> 
>   pstate_freqs = of_get_property(power_mgt, "ibm,pstate-frequencies-mhz",
> _freqs);
>   if (!pstate_freqs) {
>   pr_warn("ibm,pstate-frequencies-mhz not found\n");
> - return -ENODEV;
> + goto out;
>   }
> 
>   if (len_ids != len_freqs) {
> @@ -311,7 +311,7 @@ static int init_powernv_pstates(void)
>   nr_pstates = min(len_ids, len_freqs) / sizeof(u32);
>   if (!nr_pstates) {
>   pr_warn("No PStates found\n");
> - return -ENODEV;
> + goto out;
>   }
> 
>   powernv_pstate_info.nr_pstates = nr_pstates;
> @@ -352,7 +352,12 @@ static int init_powernv_pstates(void)
> 
>   /* End of list marker entry */
>   powernv_freqs[i].frequency = CPUFREQ_TABLE_END;
> +
> + of_node_put(power_mgt);
>   return 0;
> +out:
> + of_node_put(power_mgt);
> + return -ENODEV;
>  }
> 
>  /* Returns the CPU frequency corresponding to the pstate_id. */
> -- 
> 2.17.0
> 



Re: [PATCH RFC] mm/memory_hotplug: Introduce memory block types

2018-11-26 Thread David Hildenbrand
On 23.11.18 19:06, Michal Suchánek wrote:
> On Fri, 23 Nov 2018 12:13:58 +0100
> David Hildenbrand  wrote:
> 
>> On 28.09.18 17:03, David Hildenbrand wrote:
>>> How to/when to online hotplugged memory is hard to manage for
>>> distributions because different memory types are to be treated differently.
>>> Right now, we need complicated udev rules that e.g. check if we are
>>> running on s390x, on a physical system or on a virtualized system. But
>>> there is also sometimes the demand to really online memory immediately
>>> while adding in the kernel and not to wait for user space to make a
>>> decision. And on virtualized systems there might be different
>>> requirements, depending on "how" the memory was added (and if it will
>>> eventually get unplugged again - DIMM vs. paravirtualized mechanisms).
>>>
>>> On the one hand, we have physical systems where we sometimes
>>> want to be able to unplug memory again - e.g. a DIMM - so we have to online
>>> it to the MOVABLE zone optionally. That decision is usually made in user
>>> space.
>>>
>>> On the other hand, we have memory that should never be onlined
>>> automatically, only when asked for by an administrator. Such memory only
>>> applies to virtualized environments like s390x, where the concept of
>>> "standby" memory exists. Memory is detected and added during boot, so it
>>> can be onlined when requested by the admininistrator or some tooling.
>>> Only when onlining, memory will be allocated in the hypervisor.
>>>
>>> But then, we also have paravirtualized devices (namely xen and hyper-v
>>> balloons), that hotplug memory that will never ever be removed from a
>>> system right now using offline_pages/remove_memory. If at all, this memory
>>> is logically unplugged and handed back to the hypervisor via ballooning.
>>>
>>> For paravirtualized devices it is relevant that memory is onlined as
>>> quickly as possible after adding - and that it is added to the NORMAL
>>> zone. Otherwise, it could happen that too much memory in a row is added
>>> (but not onlined), resulting in out-of-memory conditions due to the
>>> additional memory for "struct pages" and friends. MOVABLE zone as well
>>> as delays might be very problematic and lead to crashes (e.g. zone
>>> imbalance).
>>>
>>> Therefore, introduce memory block types and online memory depending on
>>> it when adding the memory. Expose the memory type to user space, so user
>>> space handlers can start to process only "normal" memory. Other memory
>>> block types can be ignored. One thing less to worry about in user space.
>>>   
>>
>> So I was looking into alternatives.
>>
>> 1. Provide only "normal" and "standby" memory types to user space. This
>> way user space can make smarter decisions about how to online memory.
>> Not really sure if this is the right way to go.
>>
>>
>> 2. Use device driver information (as mentioned by Michal S.).
>>
>> The problem right now is that there are no drivers for memory block
>> devices. The "memory" subsystem has no drivers, so the KOBJ_ADD uevent
>> will not contain a "DRIVER" information and we ave no idea what kind of
>> memory block device we hold in our hands.
>>
>> $ udevadm info -q all -a /sys/devices/system/memory/memory0
>>
>>   looking at device '/devices/system/memory/memory0':
>> KERNEL=="memory0"
>> SUBSYSTEM=="memory"
>> DRIVER==""
>> ATTR{online}=="1"
>> ATTR{phys_device}=="0"
>> ATTR{phys_index}==""
>> ATTR{removable}=="0"
>> ATTR{state}=="online"
>> ATTR{valid_zones}=="none"
>>
>>
>> If we would provide "fake" drivers for the memory block devices we want
>> to treat in a special way in user space (e.g. standby memory on s390x),
>> user space could use that information to make smarter decisions.
>>
>> Adding such drivers might work. My suggestion would be to let ordinary
>> DIMMs be without a driver for now and only special case standby memory
>> and eventually paravirtualized memory devices (XEN and Hyper-V).
>>
>> Any thoughts?
> 
> If we are going to fake the driver information we may as well add the
> type attribute and be done with it.
> 
> I think the problem with the patch was more with the semantic than the
> attribute itself.
> 
> What is normal, paravirtualized, and standby memory?
> 
> I can understand DIMM device, baloon device, or whatever mechanism for
> adding memory you might have.
> 
> I can understand "memory designated as standby by the cluster
> administrator".
> 
> However, DIMM vs baloon is orthogonal to standby and should not be
> conflated into one property.
> 
> paravirtualized means nothing at all in relationship to memory type and
> the desired online policy to me.

Right, so with whatever we come up, it should allow to make a decision
in user space about
- if memory is to be onlined automatically
- to which zone memory is to be onlined

The rules are encoded in user space; the type will allow a decision to be
made. One important part will be whether the memory can eventually be
offlined + 
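
As background for the user-space side of such a policy, a minimal sketch of
how a handler can online a memory block through sysfs today (illustrative
only; the path layout follows the udevadm output quoted above, and the block
name and error handling are placeholders):

#include <stdio.h>

/*
 * Online one memory block, e.g. online_memory_block("memory42").
 * Writing "online_movable" or "online_kernel" instead of "online"
 * selects the target zone explicitly.
 */
static int online_memory_block(const char *block)
{
	char path[128];
	FILE *f;

	snprintf(path, sizeof(path),
		 "/sys/devices/system/memory/%s/state", block);
	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fputs("online", f) == EOF) {
		fclose(f);
		return -1;
	}
	return fclose(f) ? -1 : 0;
}

Whatever type information ends up being exposed would then simply gate
whether, and with which zone, this write is issued.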

[PATCH V2] mm: Replace all open encodings for NUMA_NO_NODE

2018-11-26 Thread Anshuman Khandual
At present there are multiple places where invalid node number is encoded
as -1. Even though implicitly understood it is always better to have macros
in there. Replace these open encodings for an invalid node number with the
global macro NUMA_NO_NODE. This helps remove NUMA related assumptions like
'invalid node' from various places redirecting them to a common definition.

Signed-off-by: Anshuman Khandual 
---
Changes in V2:

- Added inclusion of 'numa.h' header at various places per Andrew
- Updated 'dev_to_node' to use NUMA_NO_NODE instead per Vinod

Changes in V1: (https://lkml.org/lkml/2018/11/23/485)

- Dropped OCFS2 changes per Joseph
- Dropped media/video drivers changes per Hans

RFC - https://patchwork.kernel.org/patch/10678035/

Build tested this with multiple cross compiler options like alpha, sparc,
arm64, x86, powerpc, powerpc64le etc with their default config which might
not have compile tested all driver related changes. I will appreciate
folks giving this a test in their respective build environment.

All these places for replacement were found by running the following grep
patterns on the entire kernel code. Please let me know if this might have
missed some instances. This might also have replaced some false positives.
I will appreciate suggestions, inputs and review.

1. git grep "nid == -1"
2. git grep "node == -1"
3. git grep "nid = -1"
4. git grep "node = -1"

 arch/alpha/include/asm/topology.h |  3 ++-
 arch/ia64/kernel/numa.c   |  2 +-
 arch/ia64/mm/discontig.c  |  6 +++---
 arch/ia64/sn/kernel/io_common.c   |  3 ++-
 arch/powerpc/include/asm/pci-bridge.h |  3 ++-
 arch/powerpc/kernel/paca.c|  3 ++-
 arch/powerpc/kernel/pci-common.c  |  3 ++-
 arch/powerpc/mm/numa.c| 14 +++---
 arch/powerpc/platforms/powernv/memtrace.c |  5 +++--
 arch/sparc/kernel/auxio_32.c  |  3 ++-
 arch/sparc/kernel/pci_fire.c  |  3 ++-
 arch/sparc/kernel/pci_schizo.c|  3 ++-
 arch/sparc/kernel/pcic.c  |  7 ---
 arch/sparc/kernel/psycho_common.c |  3 ++-
 arch/sparc/kernel/sbus.c  |  3 ++-
 arch/sparc/mm/init_64.c   |  6 +++---
 arch/sparc/prom/init_32.c |  3 ++-
 arch/sparc/prom/init_64.c |  5 +++--
 arch/sparc/prom/tree_32.c | 13 +++--
 arch/sparc/prom/tree_64.c | 19 ++-
 arch/x86/include/asm/pci.h|  3 ++-
 arch/x86/kernel/apic/x2apic_uv_x.c|  7 ---
 arch/x86/kernel/smpboot.c |  3 ++-
 arch/x86/platform/olpc/olpc_dt.c  | 17 +
 drivers/block/mtip32xx/mtip32xx.c |  5 +++--
 drivers/dma/dmaengine.c   |  4 +++-
 drivers/infiniband/hw/hfi1/affinity.c |  3 ++-
 drivers/infiniband/hw/hfi1/init.c |  3 ++-
 drivers/iommu/dmar.c  |  5 +++--
 drivers/iommu/intel-iommu.c   |  3 ++-
 drivers/misc/sgi-xp/xpc_uv.c  |  3 ++-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  5 +++--
 include/linux/device.h|  2 +-
 init/init_task.c  |  3 ++-
 kernel/kthread.c  |  3 ++-
 kernel/sched/fair.c   | 15 ---
 lib/cpumask.c |  3 ++-
 mm/huge_memory.c  | 13 +++--
 mm/hugetlb.c  |  3 ++-
 mm/ksm.c  |  2 +-
 mm/memory.c   |  7 ---
 mm/memory_hotplug.c   | 12 ++--
 mm/mempolicy.c|  2 +-
 mm/page_alloc.c   |  4 ++--
 mm/page_ext.c |  2 +-
 net/core/pktgen.c |  3 ++-
 net/qrtr/qrtr.c   |  3 ++-
 tools/perf/bench/numa.c   |  6 +++---
 48 files changed, 146 insertions(+), 108 deletions(-)

diff --git a/arch/alpha/include/asm/topology.h 
b/arch/alpha/include/asm/topology.h
index e6e13a8..5a77a40 100644
--- a/arch/alpha/include/asm/topology.h
+++ b/arch/alpha/include/asm/topology.h
@@ -4,6 +4,7 @@
 
 #include 
 #include 
+#include 
 #include 
 
 #ifdef CONFIG_NUMA
@@ -29,7 +30,7 @@ static const struct cpumask *cpumask_of_node(int node)
 {
int cpu;
 
-   if (node == -1)
+   if (node == NUMA_NO_NODE)
return cpu_all_mask;
 
	cpumask_clear(&node_to_cpumask_map[node]);
diff --git a/arch/ia64/kernel/numa.c b/arch/ia64/kernel/numa.c
index 92c3762..1315da6 100644
--- a/arch/ia64/kernel/numa.c
+++ b/arch/ia64/kernel/numa.c
@@ -74,7 +74,7 @@ void __init build_cpu_to_node_map(void)

Re: [PATCH] of: add dtc annotations functionality to dtx_diff

2018-11-26 Thread Frank Rowand
Hi Rob,

I got the cc: list wrong on this patch, please disregard.

I will resend (with unchanged version) to the correct cc: list.

-Frank


On 11/26/18 3:54 AM, frowand.l...@gmail.com wrote:
> From: Frank Rowand 
> 
> Add -T and --annotate command line arguments to dtx_diff.  These
> arguments will be passed through to dtc.  dtc will then add source
> location annotations to its output.
> 
> Signed-off-by: Frank Rowand 
> ---
> 
> This feature depends upon commit 5667e7ef9a9a ("annotations: add the
> annotation functionality") in the dtc git repository.  To use the
> new flags before the new version of dtc is imported to the linux
> kernel, download the dtc repository, compile dtc with the make command,
> then add the path of the dtc repository to the shell PATH variable.
> 
>  scripts/dtc/dtx_diff | 13 -
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/scripts/dtc/dtx_diff b/scripts/dtc/dtx_diff
> index 8c4fbad2055e..0d8572008729 100755
> --- a/scripts/dtc/dtx_diff
> +++ b/scripts/dtc/dtx_diff
> @@ -21,6 +21,7 @@ Usage:
>  diff DTx_1 and DTx_2
>  
>  
> +  --annotate   synonym for -T
> -f   print full dts in diff (--unified=9)
> -h   synonym for --help
> -help   synonym for --help
> @@ -28,6 +29,7 @@ Usage:
> -s SRCTREE   linux kernel source tree is at path SRCTREE
>  (default is current directory)
> -S   linux kernel source tree is at root of current git repo
> +   -T   Annotate output .dts with input source file and line (-T 
> -T for more details)
> -u   unsorted, do not sort DTx
>  
>  
> @@ -174,6 +176,7 @@ compile_to_dts() {
>  
>  # -  start of script
>  
> +annotate=""
>  cmd_diff=0
>  diff_flags="-u"
>  dtx_file_1=""
> @@ -208,6 +211,14 @@ while [ $# -gt 0 ] ; do
>   shift
>   ;;
>  
> + -T | --annotate )
> + if [ "${annotate}"  = "" ] ; then
> + annotate="-T"
> + elif [ "${annotate}"  = "-T" ] ; then
> + annotate="-T -T"
> + fi
> + shift
> + ;;
>   -u )
>   dtc_sort=""
>   shift
> @@ -327,7 +338,7 @@ cpp_flags="\
>  DTC="\
>   ${DTC} \
>   -i ${srctree}/scripts/dtc/include-prefixes \
> - -O dts -qq -f ${dtc_sort} -o -"
> + -O dts -qq -f ${dtc_sort} ${annotate} -o -"
>  
>  
>  # -  do the diff or decompile
> 



[PATCH] of: add dtc annotations functionality to dtx_diff

2018-11-26 Thread frowand . list
From: Frank Rowand 

Add -T and --annotate command line arguments to dtx_diff.  These
arguments will be passed through to dtc.  dtc will then add source
location annotations to its output.

Signed-off-by: Frank Rowand 
---

This feature depends upon commit 5667e7ef9a9a ("annotations: add the
annotation functionality") in the dtc git repository.  To use the
new flags before the new version of dtc is imported to the linux
kernel, download the dtc repository, compile dtc with the make command,
then add the path of the dtc repository to the shell PATH variable.

 scripts/dtc/dtx_diff | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/scripts/dtc/dtx_diff b/scripts/dtc/dtx_diff
index 8c4fbad2055e..0d8572008729 100755
--- a/scripts/dtc/dtx_diff
+++ b/scripts/dtc/dtx_diff
@@ -21,6 +21,7 @@ Usage:
 diff DTx_1 and DTx_2
 
 
+  --annotate   synonym for -T
-f   print full dts in diff (--unified=9)
-h   synonym for --help
-help   synonym for --help
@@ -28,6 +29,7 @@ Usage:
-s SRCTREE   linux kernel source tree is at path SRCTREE
 (default is current directory)
-S   linux kernel source tree is at root of current git repo
+   -T   Annotate output .dts with input source file and line (-T 
-T for more details)
-u   unsorted, do not sort DTx
 
 
@@ -174,6 +176,7 @@ compile_to_dts() {
 
 # -  start of script
 
+annotate=""
 cmd_diff=0
 diff_flags="-u"
 dtx_file_1=""
@@ -208,6 +211,14 @@ while [ $# -gt 0 ] ; do
shift
;;
 
+   -T | --annotate )
+   if [ "${annotate}"  = "" ] ; then
+   annotate="-T"
+   elif [ "${annotate}"  = "-T" ] ; then
+   annotate="-T -T"
+   fi
+   shift
+   ;;
-u )
dtc_sort=""
shift
@@ -327,7 +338,7 @@ cpp_flags="\
 DTC="\
${DTC} \
-i ${srctree}/scripts/dtc/include-prefixes \
-   -O dts -qq -f ${dtc_sort} -o -"
+   -O dts -qq -f ${dtc_sort} ${annotate} -o -"
 
 
 # -  do the diff or decompile
-- 
Frank Rowand 



[PATCH v2 20/20] perf/core: remove unused perf_flags

2018-11-26 Thread Andrew Murray
Now that perf_flags is not used, we remove it.

Signed-off-by: Andrew Murray 
---
 include/uapi/linux/perf_event.h   | 2 --
 tools/include/uapi/linux/perf_event.h | 2 --
 2 files changed, 4 deletions(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index f35eb72..ba89bd3 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -445,8 +445,6 @@ struct perf_event_query_bpf {
__u32   ids[0];
 };
 
-#define perf_flags(attr)   (*(&(attr)->read_format + 1))
-
 /*
  * Ioctls that can be done on a perf event fd:
  */
diff --git a/tools/include/uapi/linux/perf_event.h 
b/tools/include/uapi/linux/perf_event.h
index f35eb72..ba89bd3 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -445,8 +445,6 @@ struct perf_event_query_bpf {
__u32   ids[0];
 };
 
-#define perf_flags(attr)   (*(&(attr)->read_format + 1))
-
 /*
  * Ioctls that can be done on a perf event fd:
  */
-- 
2.7.4



[PATCH v2 19/20] x86: perf/core: advertise PMU exclusion capability

2018-11-26 Thread Andrew Murray
For PMUs that have the capability to exclude events based
on context, let's advertise that we support the
PERF_PMU_CAP_EXCLUDE capability to ensure that perf doesn't
prevent us from handling events where any exclusion flags
are set.

Signed-off-by: Andrew Murray 
---
 arch/x86/events/core.c  | 2 ++
 arch/x86/events/intel/bts.c | 2 +-
 arch/x86/events/intel/pt.c  | 4 +++-
 3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index dfb2f7c..3f51916 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2297,6 +2297,8 @@ static struct pmu pmu = {
.event_idx  = x86_pmu_event_idx,
.sched_task = x86_pmu_sched_task,
.task_ctx_size  = sizeof(struct x86_perf_task_context),
+
+   .capabilities   = PERF_PMU_CAP_EXCLUDE,
 };
 
 void arch_perf_update_userpage(struct perf_event *event,
diff --git a/arch/x86/events/intel/bts.c b/arch/x86/events/intel/bts.c
index 24ffa1e..4976695 100644
--- a/arch/x86/events/intel/bts.c
+++ b/arch/x86/events/intel/bts.c
@@ -601,7 +601,7 @@ static __init int bts_init(void)
}
 
bts_pmu.capabilities= PERF_PMU_CAP_AUX_NO_SG | PERF_PMU_CAP_ITRACE |
- PERF_PMU_CAP_EXCLUSIVE;
+ PERF_PMU_CAP_EXCLUSIVE | PERF_PMU_CAP_EXCLUDE;
bts_pmu.task_ctx_nr = perf_sw_context;
bts_pmu.event_init  = bts_event_init;
bts_pmu.add = bts_event_add;
diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
index 8d016ce..2d811f8 100644
--- a/arch/x86/events/intel/pt.c
+++ b/arch/x86/events/intel/pt.c
@@ -1516,7 +1516,9 @@ static __init int pt_init(void)
pt_pmu.pmu.capabilities =
PERF_PMU_CAP_AUX_NO_SG | PERF_PMU_CAP_AUX_SW_DOUBLEBUF;
 
-   pt_pmu.pmu.capabilities |= PERF_PMU_CAP_EXCLUSIVE | PERF_PMU_CAP_ITRACE;
+   pt_pmu.pmu.capabilities  = PERF_PMU_CAP_EXCLUSIVE |
+  PERF_PMU_CAP_ITRACE |
+  PERF_PMU_CAP_EXCLUDE;
pt_pmu.pmu.attr_groups   = pt_attr_groups;
pt_pmu.pmu.task_ctx_nr   = perf_sw_context;
pt_pmu.pmu.event_init= pt_event_init;
-- 
2.7.4



[PATCH v2 18/20] x86: perf/core remove unnecessary checks for exclusion

2018-11-26 Thread Andrew Murray
For x86 PMUs that do not support context exclusion we do not
advertise the PERF_PMU_CAP_EXCLUDE capability. This ensures
that perf will prevent us from handling events where any
exclusion flags are set. Let's remove the now unnecessary
check for exclusion flags.

This change means that amd/iommu and amd/uncore will now
also indicate that they do not support exclude_{hv|idle} and
intel/uncore that it does not support exclude_{guest|host}.

Signed-off-by: Andrew Murray 
---
 arch/x86/events/amd/iommu.c| 5 -
 arch/x86/events/amd/uncore.c   | 5 -
 arch/x86/events/intel/uncore.c | 8 
 3 files changed, 18 deletions(-)

diff --git a/arch/x86/events/amd/iommu.c b/arch/x86/events/amd/iommu.c
index 3210fee..eb35fe4 100644
--- a/arch/x86/events/amd/iommu.c
+++ b/arch/x86/events/amd/iommu.c
@@ -223,11 +223,6 @@ static int perf_iommu_event_init(struct perf_event *event)
if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK)
return -EINVAL;
 
-   /* IOMMU counters do not have usr/os/guest/host bits */
-   if (event->attr.exclude_user || event->attr.exclude_kernel ||
-   event->attr.exclude_host || event->attr.exclude_guest)
-   return -EINVAL;
-
if (event->cpu < 0)
return -EINVAL;
 
diff --git a/arch/x86/events/amd/uncore.c b/arch/x86/events/amd/uncore.c
index 8671de1..d6ade30 100644
--- a/arch/x86/events/amd/uncore.c
+++ b/arch/x86/events/amd/uncore.c
@@ -201,11 +201,6 @@ static int amd_uncore_event_init(struct perf_event *event)
if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK)
return -EINVAL;
 
-   /* NB and Last level cache counters do not have usr/os/guest/host bits 
*/
-   if (event->attr.exclude_user || event->attr.exclude_kernel ||
-   event->attr.exclude_host || event->attr.exclude_guest)
-   return -EINVAL;
-
/* and we do not enable counter overflow interrupts */
hwc->config = event->attr.config & AMD64_RAW_EVENT_MASK_NB;
hwc->idx = -1;
diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
index 27a4614..f1d78d9 100644
--- a/arch/x86/events/intel/uncore.c
+++ b/arch/x86/events/intel/uncore.c
@@ -695,14 +695,6 @@ static int uncore_pmu_event_init(struct perf_event *event)
if (pmu->func_id < 0)
return -ENOENT;
 
-   /*
-* Uncore PMU does measure at all privilege level all the time.
-* So it doesn't make sense to specify any exclude bits.
-*/
-   if (event->attr.exclude_user || event->attr.exclude_kernel ||
-   event->attr.exclude_hv || event->attr.exclude_idle)
-   return -EINVAL;
-
/* Sampling not supported yet */
if (hwc->sample_period)
return -EINVAL;
-- 
2.7.4



[PATCH v2 16/20] sparc: perf/core: advertise PMU exclusion capability

2018-11-26 Thread Andrew Murray
The SPARC PMU has the capability to exclude events based on context
 - let's advertise that we support the PERF_PMU_CAP_EXCLUDE
capability to ensure that perf doesn't prevent us from handling
events where any exclusion flags are set.

Signed-off-by: Andrew Murray 
---
 arch/sparc/kernel/perf_event.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/sparc/kernel/perf_event.c b/arch/sparc/kernel/perf_event.c
index d3149ba..38fac17 100644
--- a/arch/sparc/kernel/perf_event.c
+++ b/arch/sparc/kernel/perf_event.c
@@ -1571,6 +1571,7 @@ static struct pmu pmu = {
.start_txn  = sparc_pmu_start_txn,
.cancel_txn = sparc_pmu_cancel_txn,
.commit_txn = sparc_pmu_commit_txn,
+   .capabilities   = PERF_PMU_CAP_EXCLUDE,
 };
 
 void perf_event_print_debug(void)
-- 
2.7.4



[PATCH v2 17/20] x86: perf/core: remove unnecessary checks for exclusion

2018-11-26 Thread Andrew Murray
For drivers that do not support context exclusion we do not
advertise the PERF_PMU_CAP_EXCLUDE capability. This ensures
that perf will prevent us from handling events where any exclusion
flags are set. Let's remove the now unnecessary check for
exclusion flags.

Signed-off-by: Andrew Murray 
---
 arch/x86/events/amd/ibs.c  | 12 
 arch/x86/events/amd/power.c|  9 +
 arch/x86/events/intel/cstate.c |  8 +---
 arch/x86/events/intel/rapl.c   |  8 +---
 arch/x86/events/intel/uncore_snb.c |  8 +---
 arch/x86/events/msr.c  |  8 +---
 6 files changed, 5 insertions(+), 48 deletions(-)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index d50bb4d..9e43ef6 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -253,15 +253,6 @@ static int perf_ibs_precise_event(struct perf_event 
*event, u64 *config)
return -EOPNOTSUPP;
 }
 
-static const struct perf_event_attr ibs_notsupp = {
-   .exclude_user   = 1,
-   .exclude_kernel = 1,
-   .exclude_hv = 1,
-   .exclude_idle   = 1,
-   .exclude_host   = 1,
-   .exclude_guest  = 1,
-};
-
 static int perf_ibs_init(struct perf_event *event)
 {
	struct hw_perf_event *hwc = &event->hw;
@@ -282,9 +273,6 @@ static int perf_ibs_init(struct perf_event *event)
	if (event->pmu != &perf_ibs->pmu)
return -ENOENT;
 
-   if (perf_flags(&event->attr) & perf_flags(&ibs_notsupp))
-   return -EINVAL;
-
if (config & ~perf_ibs->config_mask)
return -EINVAL;
 
diff --git a/arch/x86/events/amd/power.c b/arch/x86/events/amd/power.c
index 2aefacf..ef80c60 100644
--- a/arch/x86/events/amd/power.c
+++ b/arch/x86/events/amd/power.c
@@ -136,14 +136,7 @@ static int pmu_event_init(struct perf_event *event)
return -ENOENT;
 
/* Unsupported modes and filters. */
-   if (event->attr.exclude_user   ||
-   event->attr.exclude_kernel ||
-   event->attr.exclude_hv ||
-   event->attr.exclude_idle   ||
-   event->attr.exclude_host   ||
-   event->attr.exclude_guest  ||
-   /* no sampling */
-   event->attr.sample_period)
+   if (event->attr.sample_period)
return -EINVAL;
 
if (cfg != AMD_POWER_EVENTSEL_PKG)
diff --git a/arch/x86/events/intel/cstate.c b/arch/x86/events/intel/cstate.c
index 9f8084f..af919c4 100644
--- a/arch/x86/events/intel/cstate.c
+++ b/arch/x86/events/intel/cstate.c
@@ -280,13 +280,7 @@ static int cstate_pmu_event_init(struct perf_event *event)
return -ENOENT;
 
/* unsupported modes and filters */
-   if (event->attr.exclude_user   ||
-   event->attr.exclude_kernel ||
-   event->attr.exclude_hv ||
-   event->attr.exclude_idle   ||
-   event->attr.exclude_host   ||
-   event->attr.exclude_guest  ||
-   event->attr.sample_period) /* no sampling */
+   if (event->attr.sample_period) /* no sampling */
return -EINVAL;
 
if (event->cpu < 0)
diff --git a/arch/x86/events/intel/rapl.c b/arch/x86/events/intel/rapl.c
index 32f3e94..9cb94e6 100644
--- a/arch/x86/events/intel/rapl.c
+++ b/arch/x86/events/intel/rapl.c
@@ -397,13 +397,7 @@ static int rapl_pmu_event_init(struct perf_event *event)
return -EINVAL;
 
/* unsupported modes and filters */
-   if (event->attr.exclude_user   ||
-   event->attr.exclude_kernel ||
-   event->attr.exclude_hv ||
-   event->attr.exclude_idle   ||
-   event->attr.exclude_host   ||
-   event->attr.exclude_guest  ||
-   event->attr.sample_period) /* no sampling */
+   if (event->attr.sample_period) /* no sampling */
return -EINVAL;
 
/* must be done before validate_group */
diff --git a/arch/x86/events/intel/uncore_snb.c 
b/arch/x86/events/intel/uncore_snb.c
index 8527c3e..26441eb 100644
--- a/arch/x86/events/intel/uncore_snb.c
+++ b/arch/x86/events/intel/uncore_snb.c
@@ -374,13 +374,7 @@ static int snb_uncore_imc_event_init(struct perf_event 
*event)
return -EINVAL;
 
/* unsupported modes and filters */
-   if (event->attr.exclude_user   ||
-   event->attr.exclude_kernel ||
-   event->attr.exclude_hv ||
-   event->attr.exclude_idle   ||
-   event->attr.exclude_host   ||
-   event->attr.exclude_guest  ||
-   event->attr.sample_period) /* no sampling */
+   if (event->attr.sample_period) /* no sampling */
return -EINVAL;
 
/*
diff --git a/arch/x86/events/msr.c b/arch/x86/events/msr.c
index b4771a6..c4fa5d4 100644
--- a/arch/x86/events/msr.c
+++ b/arch/x86/events/msr.c
@@ -160,13 +160,7 @@ static int msr_event_init(struct perf_event *event)
return -ENOENT;
 
/* unsupported modes and filters */
-   if (event->attr.exclude_user   ||
-

[PATCH v2 15/20] s390: perf/events: advertise PMU exclusion capability

2018-11-26 Thread Andrew Murray
The s390 cpum_cf and cpum_sf PMUs have the capability to exclude
events based on context. Let's advertise that we support the
PERF_PMU_CAP_EXCLUDE capability to ensure that perf doesn't
prevent us from handling events where any exclusion flags are set.

Signed-off-by: Andrew Murray 
---
 arch/s390/kernel/perf_cpum_cf.c | 1 +
 arch/s390/kernel/perf_cpum_sf.c | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/arch/s390/kernel/perf_cpum_cf.c b/arch/s390/kernel/perf_cpum_cf.c
index cc085e2..7b583ed 100644
--- a/arch/s390/kernel/perf_cpum_cf.c
+++ b/arch/s390/kernel/perf_cpum_cf.c
@@ -667,6 +667,7 @@ static struct pmu cpumf_pmu = {
.start_txn= cpumf_pmu_start_txn,
.commit_txn   = cpumf_pmu_commit_txn,
.cancel_txn   = cpumf_pmu_cancel_txn,
+   .capabilities = PERF_PMU_CAP_EXCLUDE,
 };
 
 static int cpumf_pmf_setup(unsigned int cpu, int flags)
diff --git a/arch/s390/kernel/perf_cpum_sf.c b/arch/s390/kernel/perf_cpum_sf.c
index 5c53e97..25a64aa 100644
--- a/arch/s390/kernel/perf_cpum_sf.c
+++ b/arch/s390/kernel/perf_cpum_sf.c
@@ -1885,6 +1885,8 @@ static struct pmu cpumf_sampling = {
 
.setup_aux= aux_buffer_setup,
.free_aux = aux_buffer_free,
+
+   .capabilities = PERF_PMU_CAP_EXCLUDE,
 };
 
 static void cpumf_measurement_alert(struct ext_code ext_code,
-- 
2.7.4



[PATCH v2 14/20] powerpc: perf/core: remove unnecessary checks for exclusion

2018-11-26 Thread Andrew Murray
For PowerPC PMUs that do not support context exclusion we do not
advertise the PERF_PMU_CAP_EXCLUDE capability. This ensures that
perf will prevent us from handling events where any exclusion
flags are set. Let's remove the now unnecessary check for exclusion
flags.

Signed-off-by: Andrew Murray 
---
 arch/powerpc/perf/hv-24x7.c |  9 -
 arch/powerpc/perf/hv-gpci.c |  9 -
 arch/powerpc/perf/imc-pmu.c | 18 --
 3 files changed, 36 deletions(-)

diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
index 72238ee..d13d8a9 100644
--- a/arch/powerpc/perf/hv-24x7.c
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -1306,15 +1306,6 @@ static int h_24x7_event_init(struct perf_event *event)
return -EINVAL;
}
 
-   /* unsupported modes and filters */
-   if (event->attr.exclude_user   ||
-   event->attr.exclude_kernel ||
-   event->attr.exclude_hv ||
-   event->attr.exclude_idle   ||
-   event->attr.exclude_host   ||
-   event->attr.exclude_guest)
-   return -EINVAL;
-
/* no branch sampling */
if (has_branch_stack(event))
return -EOPNOTSUPP;
diff --git a/arch/powerpc/perf/hv-gpci.c b/arch/powerpc/perf/hv-gpci.c
index 43fabb3..e0ce0e0 100644
--- a/arch/powerpc/perf/hv-gpci.c
+++ b/arch/powerpc/perf/hv-gpci.c
@@ -232,15 +232,6 @@ static int h_gpci_event_init(struct perf_event *event)
return -EINVAL;
}
 
-   /* unsupported modes and filters */
-   if (event->attr.exclude_user   ||
-   event->attr.exclude_kernel ||
-   event->attr.exclude_hv ||
-   event->attr.exclude_idle   ||
-   event->attr.exclude_host   ||
-   event->attr.exclude_guest)
-   return -EINVAL;
-
/* no branch sampling */
if (has_branch_stack(event))
return -EOPNOTSUPP;
diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index 1fafc32b..49c0b1c 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -473,15 +473,6 @@ static int nest_imc_event_init(struct perf_event *event)
if (event->hw.sample_period)
return -EINVAL;
 
-   /* unsupported modes and filters */
-   if (event->attr.exclude_user   ||
-   event->attr.exclude_kernel ||
-   event->attr.exclude_hv ||
-   event->attr.exclude_idle   ||
-   event->attr.exclude_host   ||
-   event->attr.exclude_guest)
-   return -EINVAL;
-
if (event->cpu < 0)
return -EINVAL;
 
@@ -748,15 +739,6 @@ static int core_imc_event_init(struct perf_event *event)
if (event->hw.sample_period)
return -EINVAL;
 
-   /* unsupported modes and filters */
-   if (event->attr.exclude_user   ||
-   event->attr.exclude_kernel ||
-   event->attr.exclude_hv ||
-   event->attr.exclude_idle   ||
-   event->attr.exclude_host   ||
-   event->attr.exclude_guest)
-   return -EINVAL;
-
if (event->cpu < 0)
return -EINVAL;
 
-- 
2.7.4



[PATCH v2 13/20] powerpc: perf/core: advertise PMU exclusion capability

2018-11-26 Thread Andrew Murray
For PowerPC PMUs that have the capability to exclude events
based on context, let's advertise that we support the
PERF_PMU_CAP_EXCLUDE capability to ensure that perf doesn't
prevent us from handling events where any exclusion flags are
set.

Signed-off-by: Andrew Murray 
---
 arch/powerpc/perf/core-book3s.c  | 1 +
 arch/powerpc/perf/core-fsl-emb.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 81f8a0c..2f44b09 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -2007,6 +2007,7 @@ static struct pmu power_pmu = {
.commit_txn = power_pmu_commit_txn,
.event_idx  = power_pmu_event_idx,
.sched_task = power_pmu_sched_task,
+   .capabilities   = PERF_PMU_CAP_EXCLUDE,
 };
 
 /*
diff --git a/arch/powerpc/perf/core-fsl-emb.c b/arch/powerpc/perf/core-fsl-emb.c
index ba48584..cea5bcb 100644
--- a/arch/powerpc/perf/core-fsl-emb.c
+++ b/arch/powerpc/perf/core-fsl-emb.c
@@ -596,6 +596,7 @@ static struct pmu fsl_emb_pmu = {
.start  = fsl_emb_pmu_start,
.stop   = fsl_emb_pmu_stop,
.read   = fsl_emb_pmu_read,
+   .capabilities   = PERF_PMU_CAP_EXCLUDE,
 };
 
 /*
-- 
2.7.4



[PATCH v2 12/20] mips: perf/core: advertise PMU exclusion capability

2018-11-26 Thread Andrew Murray
The MIPS PMU has the capability to exclude events based on
context. Let's advertise that we support the PERF_PMU_CAP_EXCLUDE
capability to ensure that perf doesn't prevent us from handling
events where any exclusion flags are set.

Signed-off-by: Andrew Murray 
---
 arch/mips/kernel/perf_event_mipsxx.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/mips/kernel/perf_event_mipsxx.c 
b/arch/mips/kernel/perf_event_mipsxx.c
index 4138635..d7813d0 100644
--- a/arch/mips/kernel/perf_event_mipsxx.c
+++ b/arch/mips/kernel/perf_event_mipsxx.c
@@ -670,6 +670,7 @@ static struct pmu pmu = {
.start  = mipspmu_start,
.stop   = mipspmu_stop,
.read   = mipspmu_read,
+   .capabilities   = PERF_PMU_CAP_EXCLUDE,
 };
 
 static unsigned int mipspmu_perf_event_encode(const struct mips_perf_event 
*pev)
-- 
2.7.4



[PATCH v2 11/20] drivers/perf: perf/core: advertise PMU exclusion capability

2018-11-26 Thread Andrew Murray
The arm_spe PMU has the capability to exclude events based on
context. Let's advertise that we support the PERF_PMU_CAP_EXCLUDE
capability to ensure that perf doesn't prevent us from handling
events where any exclusion flags are set.

Signed-off-by: Andrew Murray 
---
 drivers/perf/arm_spe_pmu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
index 54ec278..a09c38a 100644
--- a/drivers/perf/arm_spe_pmu.c
+++ b/drivers/perf/arm_spe_pmu.c
@@ -902,7 +902,8 @@ static int arm_spe_pmu_perf_init(struct arm_spe_pmu 
*spe_pmu)
 
spe_pmu->pmu = (struct pmu) {
.module = THIS_MODULE,
-   .capabilities   = PERF_PMU_CAP_EXCLUSIVE | PERF_PMU_CAP_ITRACE,
+   .capabilities   = PERF_PMU_CAP_EXCLUSIVE | PERF_PMU_CAP_ITRACE |
+ PERF_PMU_CAP_EXCLUDE,
.attr_groups= arm_spe_pmu_attr_groups,
/*
 * We hitch a ride on the software context here, so that
-- 
2.7.4



[PATCH v2 10/20] drivers/perf: perf/core: remove unnecessary checks for exclusion

2018-11-26 Thread Andrew Murray
For drivers that do not support context exclusion we do not
advertise the PERF_PMU_CAP_EXCLUDE capability. This ensures that
perf will prevent us from handling events where any exclusion
flags are set. Let's remove the now unnecessary check for
exclusion flags.

This change means that qcom_{l2|l3}_pmu will now also indicate that
they do not support exclude_{host|guest} and that xgene_pmu also does
not support exclude_idle and exclude_hv.

Note that for qcom_l2_pmu we now implicitly return -EINVAL instead
of -EOPNOTSUPP. This change will result in the perf userspace
utility retrying the perf_event_open system call with fallback
event attributes that omit the unsupported flags.

Signed-off-by: Andrew Murray 
---
 drivers/perf/qcom_l2_pmu.c | 8 
 drivers/perf/qcom_l3_pmu.c | 7 ---
 drivers/perf/xgene_pmu.c   | 5 -
 3 files changed, 20 deletions(-)

diff --git a/drivers/perf/qcom_l2_pmu.c b/drivers/perf/qcom_l2_pmu.c
index 842135c..518e18c 100644
--- a/drivers/perf/qcom_l2_pmu.c
+++ b/drivers/perf/qcom_l2_pmu.c
@@ -509,14 +509,6 @@ static int l2_cache_event_init(struct perf_event *event)
return -EOPNOTSUPP;
}
 
-   /* We cannot filter accurately so we just don't allow it. */
-   if (event->attr.exclude_user || event->attr.exclude_kernel ||
-   event->attr.exclude_hv || event->attr.exclude_idle) {
-   dev_dbg_ratelimited(&l2cache_pmu->pdev->dev,
-   "Can't exclude execution levels\n");
-   return -EOPNOTSUPP;
-   }
-
if (((L2_EVT_GROUP(event->attr.config) > L2_EVT_GROUP_MAX) ||
 ((event->attr.config & ~L2_EVT_MASK) != 0)) &&
(event->attr.config != L2CYCLE_CTR_RAW_CODE)) {
diff --git a/drivers/perf/qcom_l3_pmu.c b/drivers/perf/qcom_l3_pmu.c
index 2dc63d6..e28bd2f 100644
--- a/drivers/perf/qcom_l3_pmu.c
+++ b/drivers/perf/qcom_l3_pmu.c
@@ -495,13 +495,6 @@ static int qcom_l3_cache__event_init(struct perf_event 
*event)
return -ENOENT;
 
/*
-* There are no per-counter mode filters in the PMU.
-*/
-   if (event->attr.exclude_user || event->attr.exclude_kernel ||
-   event->attr.exclude_hv || event->attr.exclude_idle)
-   return -EINVAL;
-
-   /*
 * Sampling not supported since these events are not core-attributable.
 */
if (hwc->sample_period)
diff --git a/drivers/perf/xgene_pmu.c b/drivers/perf/xgene_pmu.c
index 0e31f13..bdc55de 100644
--- a/drivers/perf/xgene_pmu.c
+++ b/drivers/perf/xgene_pmu.c
@@ -914,11 +914,6 @@ static int xgene_perf_event_init(struct perf_event *event)
if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK)
return -EINVAL;
 
-   /* SOC counters do not have usr/os/guest/host bits */
-   if (event->attr.exclude_user || event->attr.exclude_kernel ||
-   event->attr.exclude_host || event->attr.exclude_guest)
-   return -EINVAL;
-
if (event->cpu < 0)
return -EINVAL;
/*
-- 
2.7.4



[PATCH v2 09/20] drivers/perf: perf/core: remove unnecessary checks for exclusion

2018-11-26 Thread Andrew Murray
For drivers that do not support context exclusion we do not
advertise the PERF_PMU_CAP_EXCLUDE capability. This ensures
that perf will prevent us from handling events where any exclusion
flags are set. Let's remove the now unnecessary check for
exclusion flags.

Signed-off-by: Andrew Murray 
---
 drivers/perf/arm-cci.c   | 9 -
 drivers/perf/arm-ccn.c   | 5 +
 drivers/perf/arm_dsu_pmu.c   | 8 +---
 drivers/perf/hisilicon/hisi_uncore_pmu.c | 9 -
 4 files changed, 2 insertions(+), 29 deletions(-)

diff --git a/drivers/perf/arm-cci.c b/drivers/perf/arm-cci.c
index 1bfeb16..501b497 100644
--- a/drivers/perf/arm-cci.c
+++ b/drivers/perf/arm-cci.c
@@ -1327,15 +1327,6 @@ static int cci_pmu_event_init(struct perf_event *event)
if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK)
return -EOPNOTSUPP;
 
-   /* We have no filtering of any kind */
-   if (event->attr.exclude_user||
-   event->attr.exclude_kernel  ||
-   event->attr.exclude_hv  ||
-   event->attr.exclude_idle||
-   event->attr.exclude_host||
-   event->attr.exclude_guest)
-   return -EINVAL;
-
/*
 * Following the example set by other "uncore" PMUs, we accept any CPU
 * and rewrite its affinity dynamically rather than having perf core
diff --git a/drivers/perf/arm-ccn.c b/drivers/perf/arm-ccn.c
index 7dd850e..decf881 100644
--- a/drivers/perf/arm-ccn.c
+++ b/drivers/perf/arm-ccn.c
@@ -741,10 +741,7 @@ static int arm_ccn_pmu_event_init(struct perf_event *event)
return -EOPNOTSUPP;
}
 
-   if (has_branch_stack(event) || event->attr.exclude_user ||
-   event->attr.exclude_kernel || event->attr.exclude_hv ||
-   event->attr.exclude_idle || event->attr.exclude_host ||
-   event->attr.exclude_guest) {
+   if (has_branch_stack(event)) {
dev_dbg(ccn->dev, "Can't exclude execution levels!\n");
return -EINVAL;
}
diff --git a/drivers/perf/arm_dsu_pmu.c b/drivers/perf/arm_dsu_pmu.c
index 660cb8a..036414c 100644
--- a/drivers/perf/arm_dsu_pmu.c
+++ b/drivers/perf/arm_dsu_pmu.c
@@ -562,13 +562,7 @@ static int dsu_pmu_event_init(struct perf_event *event)
return -EINVAL;
}
 
-   if (has_branch_stack(event) ||
-   event->attr.exclude_user ||
-   event->attr.exclude_kernel ||
-   event->attr.exclude_hv ||
-   event->attr.exclude_idle ||
-   event->attr.exclude_host ||
-   event->attr.exclude_guest) {
+   if (has_branch_stack(event)) {
dev_dbg(dsu_pmu->pmu.dev, "Can't support filtering\n");
return -EINVAL;
}
diff --git a/drivers/perf/hisilicon/hisi_uncore_pmu.c 
b/drivers/perf/hisilicon/hisi_uncore_pmu.c
index 9efd241..f028cbc 100644
--- a/drivers/perf/hisilicon/hisi_uncore_pmu.c
+++ b/drivers/perf/hisilicon/hisi_uncore_pmu.c
@@ -142,15 +142,6 @@ int hisi_uncore_pmu_event_init(struct perf_event *event)
if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK)
return -EOPNOTSUPP;
 
-   /* counters do not have these bits */
-   if (event->attr.exclude_user||
-   event->attr.exclude_kernel  ||
-   event->attr.exclude_host||
-   event->attr.exclude_guest   ||
-   event->attr.exclude_hv  ||
-   event->attr.exclude_idle)
-   return -EINVAL;
-
/*
 *  The uncore counters not specific to any CPU, so cannot
 *  support per-task
-- 
2.7.4



[PATCH v2 08/20] arm: perf/core: remove unnecessary checks for exclusion

2018-11-26 Thread Andrew Murray
For drivers that do not support context exclusion we do not
advertise the PERF_PMU_CAP_EXCLUDE capability. This ensures that
perf will prevent us from handling events where any exclusion
flags are set. Let's remove the now unnecessary check for
exclusion flags.

Signed-off-by: Andrew Murray 
---
 arch/arm/mach-imx/mmdc.c | 8 +---
 arch/arm/mm/cache-l2x0-pmu.c | 8 
 2 files changed, 1 insertion(+), 15 deletions(-)

diff --git a/arch/arm/mach-imx/mmdc.c b/arch/arm/mach-imx/mmdc.c
index 04b3bf7..b937a15 100644
--- a/arch/arm/mach-imx/mmdc.c
+++ b/arch/arm/mach-imx/mmdc.c
@@ -293,13 +293,7 @@ static int mmdc_pmu_event_init(struct perf_event *event)
return -EOPNOTSUPP;
}
 
-   if (event->attr.exclude_user||
-   event->attr.exclude_kernel  ||
-   event->attr.exclude_hv  ||
-   event->attr.exclude_idle||
-   event->attr.exclude_host||
-   event->attr.exclude_guest   ||
-   event->attr.sample_period)
+   if (event->attr.sample_period)
return -EINVAL;
 
if (cfg < 0 || cfg >= MMDC_NUM_COUNTERS)
diff --git a/arch/arm/mm/cache-l2x0-pmu.c b/arch/arm/mm/cache-l2x0-pmu.c
index afe5b4c..ba92f9e 100644
--- a/arch/arm/mm/cache-l2x0-pmu.c
+++ b/arch/arm/mm/cache-l2x0-pmu.c
@@ -314,14 +314,6 @@ static int l2x0_pmu_event_init(struct perf_event *event)
event->attach_state & PERF_ATTACH_TASK)
return -EINVAL;
 
-   if (event->attr.exclude_user   ||
-   event->attr.exclude_kernel ||
-   event->attr.exclude_hv ||
-   event->attr.exclude_idle   ||
-   event->attr.exclude_host   ||
-   event->attr.exclude_guest)
-   return -EINVAL;
-
if (event->cpu < 0)
return -EINVAL;
 
-- 
2.7.4



[PATCH v2 07/20] arm: perf: conditionally advertise PMU exclusion capability

2018-11-26 Thread Andrew Murray
The ARM PMU driver can be used to represent a variety of ARM based
PMUs. Some of these PMUs provide support for context exclusion; where
this is the case, we advertise it via the PERF_PMU_CAP_EXCLUDE
capability to ensure that perf doesn't prevent us from handling events
where any exclusion flags are set.

Signed-off-by: Andrew Murray 
---
 drivers/perf/arm_pmu.c | 15 +--
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index 7f01f6f..4409035 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -357,13 +357,6 @@ static irqreturn_t armpmu_dispatch_irq(int irq, void *dev)
 }
 
 static int
-event_requires_mode_exclusion(struct perf_event_attr *attr)
-{
-   return attr->exclude_idle || attr->exclude_user ||
-  attr->exclude_kernel || attr->exclude_hv;
-}
-
-static int
 __hw_perf_event_init(struct perf_event *event)
 {
struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
@@ -393,9 +386,8 @@ __hw_perf_event_init(struct perf_event *event)
/*
 * Check whether we need to exclude the counter from certain modes.
 */
-   if ((!armpmu->set_event_filter ||
-    armpmu->set_event_filter(hwc, &event->attr)) &&
-    event_requires_mode_exclusion(&event->attr)) {
+   if (armpmu->set_event_filter &&
+   armpmu->set_event_filter(hwc, &event->attr)) {
pr_debug("ARM performance counters do not support "
 "mode exclusion\n");
return -EOPNOTSUPP;
@@ -861,6 +853,9 @@ int armpmu_register(struct arm_pmu *pmu)
if (ret)
return ret;
 
+   if (pmu->set_event_filter)
+   pmu->pmu.capabilities |= PERF_PMU_CAP_EXCLUDE;
+
	ret = perf_pmu_register(&pmu->pmu, pmu->name, -1);
if (ret)
goto out_destroy;
-- 
2.7.4



[PATCH v2 06/20] arc: perf/core: advertise PMU exclusion capability

2018-11-26 Thread Andrew Murray
The ARC PMU has the capability to exclude events based on
context. Let's advertise that we support the
PERF_PMU_CAP_EXCLUDE capability to ensure that perf doesn't
prevent us from handling events where any exclusion flags are
set.

Signed-off-by: Andrew Murray 
---
 arch/arc/kernel/perf_event.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arc/kernel/perf_event.c b/arch/arc/kernel/perf_event.c
index 8aec462..cf8ea6d 100644
--- a/arch/arc/kernel/perf_event.c
+++ b/arch/arc/kernel/perf_event.c
@@ -514,6 +514,7 @@ static int arc_pmu_device_probe(struct platform_device 
*pdev)
.start  = arc_pmu_start,
.stop   = arc_pmu_stop,
.read   = arc_pmu_read,
+   .capabilities   = PERF_PMU_CAP_EXCLUDE,
};
 
if (has_interrupts) {
-- 
2.7.4



[PATCH v2 05/20] alpha: perf/core: remove unnecessary checks for exclusion

2018-11-26 Thread Andrew Murray
As the Alpha PMU doesn't support context exclusion we do not
advertise the PERF_PMU_CAP_EXCLUDE capability. This ensures that
perf will prevent us from handling events where any exclusion
flags are set. Let's remove the now unnecessary check for
exclusion flags.

This change means that __hw_perf_event_init will now also
indicate that it doesn't support exclude_host and exclude_guest and
will now implicitly return -EINVAL instead of -EPERM. This is likely
more desirable as -EPERM will result in a kernel.perf_event_paranoid
related warning from the perf userspace utility.

Signed-off-by: Andrew Murray 
---
 arch/alpha/kernel/perf_event.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/arch/alpha/kernel/perf_event.c b/arch/alpha/kernel/perf_event.c
index 5613aa37..5c17077 100644
--- a/arch/alpha/kernel/perf_event.c
+++ b/arch/alpha/kernel/perf_event.c
@@ -630,12 +630,6 @@ static int __hw_perf_event_init(struct perf_event *event)
return ev;
}
 
-   /* The EV67 does not support mode exclusion */
-   if (attr->exclude_kernel || attr->exclude_user
-   || attr->exclude_hv || attr->exclude_idle) {
-   return -EPERM;
-   }
-
/*
 * We place the event type in event_base here and leave calculation
 * of the codes to programme the PMU for alpha_pmu_enable() because
-- 
2.7.4



[PATCH v2 04/20] perf/hw_breakpoint: perf/core: advertise PMU exclusion capability

2018-11-26 Thread Andrew Murray
The breakpoint PMU has the capability to exclude kernel contexts, so
let's advertise that we support the PERF_PMU_CAP_EXCLUDE
capability so that perf doesn't prevent us from handling events
with any exclusion flags set.

Signed-off-by: Andrew Murray 
---
 kernel/events/hw_breakpoint.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c
index d6b5618..fefe3d5 100644
--- a/kernel/events/hw_breakpoint.c
+++ b/kernel/events/hw_breakpoint.c
@@ -669,6 +669,8 @@ static struct pmu perf_breakpoint = {
.start  = hw_breakpoint_start,
.stop   = hw_breakpoint_stop,
.read   = hw_breakpoint_pmu_read,
+
+   .capabilities   = PERF_PMU_CAP_EXCLUDE,
 };
 
 int __init init_hw_breakpoint(void)
-- 
2.7.4



[PATCH v2 03/20] perf/core: add PERF_PMU_CAP_EXCLUDE for exclusion capable PMUs

2018-11-26 Thread Andrew Murray
Many PMU drivers do not have the capability to exclude counting events
that occur in specific contexts such as idle, kernel, guest, etc. These
drivers indicate this by returning an error in their event_init upon
testing the event's attribute flags. This approach is error-prone and
often inconsistent.

Let's instead allow PMU drivers to advertise their ability to exclude
based on context via a new capability: PERF_PMU_CAP_EXCLUDE. This
allows the perf core to reject requests for exclusion events where
there is no support in the PMU.

Signed-off-by: Andrew Murray 
---
 include/linux/perf_event.h | 1 +
 kernel/events/core.c   | 9 +
 2 files changed, 10 insertions(+)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index b2e806f..69b3d65 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -244,6 +244,7 @@ struct perf_event;
 #define PERF_PMU_CAP_EXCLUSIVE 0x10
 #define PERF_PMU_CAP_ITRACE0x20
 #define PERF_PMU_CAP_HETEROGENEOUS_CPUS0x40
+#define PERF_PMU_CAP_EXCLUDE   0x80
 
 /**
  * struct pmu - generic performance monitoring unit
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 5a97f34..9afb33c 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9743,6 +9743,15 @@ static int perf_try_init_event(struct pmu *pmu, struct 
perf_event *event)
if (ctx)
perf_event_ctx_unlock(event->group_leader, ctx);
 
+   if (!ret) {
+   if (!(pmu->capabilities & PERF_PMU_CAP_EXCLUDE) &&
+   event_has_any_exclude_flag(event)) {
+   if (event->destroy)
+   event->destroy(event);
+   ret = -EINVAL;
+   }
+   }
+
if (ret)
module_put(pmu->module);
 
-- 
2.7.4



[PATCH v2 02/20] perf/core: add function to test for event exclusion flags

2018-11-26 Thread Andrew Murray
Add a function that tests if any of the perf event exclusion flags
are set on a given event.

Signed-off-by: Andrew Murray 
---
 include/linux/perf_event.h | 9 +
 1 file changed, 9 insertions(+)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 53c500f..b2e806f 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1004,6 +1004,15 @@ perf_event__output_id_sample(struct perf_event *event,
 extern void
 perf_log_lost_samples(struct perf_event *event, u64 lost);
 
+static inline bool event_has_any_exclude_flag(struct perf_event *event)
+{
+   struct perf_event_attr *attr = &event->attr;
+
+   return attr->exclude_idle || attr->exclude_user ||
+  attr->exclude_kernel || attr->exclude_hv ||
+  attr->exclude_guest || attr->exclude_host;
+}
+
 static inline bool is_sampling_event(struct perf_event *event)
 {
return event->attr.sample_period != 0;
-- 
2.7.4



[PATCH v2 01/20] perf/doc: update design.txt for exclude_{host|guest} flags

2018-11-26 Thread Andrew Murray
Update design.txt to reflect the presence of the exclude_host
and exclude_guest perf flags.

Signed-off-by: Andrew Murray 
---
 tools/perf/design.txt | 4 
 1 file changed, 4 insertions(+)

diff --git a/tools/perf/design.txt b/tools/perf/design.txt
index a28dca2..5b2b23b 100644
--- a/tools/perf/design.txt
+++ b/tools/perf/design.txt
@@ -222,6 +222,10 @@ The 'exclude_user', 'exclude_kernel' and 'exclude_hv' bits 
provide a
 way to request that counting of events be restricted to times when the
 CPU is in user, kernel and/or hypervisor mode.
 
+Furthermore the 'exclude_host' and 'exclude_guest' bits provide a way
+to request counting of events restricted to guest and host contexts when
+using KVM virtualisation.
+
 The 'mmap' and 'munmap' bits allow recording of PROT_EXEC mmap/munmap
 operations, these can be used to relate userspace IP addresses to actual
 code, even after the mapping (or even the whole process) is gone,
-- 
2.7.4
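
For readers less familiar with these attribute bits, a minimal user-space
sketch of requesting a host-only cycle count via perf_event_open (purely
illustrative and not part of the patch; the event choice is arbitrary and
error handling is kept to a minimum):

#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>

int main(void)
{
	struct perf_event_attr attr;
	int fd;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HARDWARE;
	attr.config = PERF_COUNT_HW_CPU_CYCLES;
	attr.exclude_guest = 1;	/* do not count while a KVM guest is running */

	/* pid 0 = this task, cpu -1 = any CPU, no group leader, no flags */
	fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
	if (fd < 0)
		perror("perf_event_open");

	return fd < 0 ? 1 : 0;
}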



[PATCH v2 00/20] perf/core: Generalise event exclusion checking

2018-11-26 Thread Andrew Murray
Many PMU drivers do not have the capability to exclude counting events
that occur in specific contexts such as idle, kernel, guest, etc. These
drivers indicate this by returning an error in their event_init upon
testing the event's attribute flags.

However this approach requires that each time a new event modifier is
added to perf, all the perf drivers need to be modified to indicate that
they don't support the attribute. This results in additional boiler-plate
code common to many drivers that needs to be maintained. Furthermore the
drivers are not consistent with regards to the error value they return
when reporting unsupported attributes.

This patchset allow PMU drivers to advertise their ability to exclude
based on context via a new capability: PERF_PMU_CAP_EXCLUDE. This allows
the perf core to reject requests for exclusion events where there is no
support in the PMU.

This is a functional change, in particular:

 - Some drivers will now additionally (but correctly) report unsupported
   exclusion flags. It's typical for existing userspace tools such as
   perf to handle such errors by retrying the system call without the
   unsupported flags (see the sketch below).

 - Drivers that do not support any exclusion that previously reported
   -EPERM or -EOPNOTSUPP will now report -EINVAL - this is consistent
   with the majority and results in userspace perf retrying without
   exclusion.

All drivers touched by this patchset have been compile tested.
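
As a concrete illustration of the retry behaviour mentioned above, a sketch
of the kind of fallback a user-space caller can implement (this is not the
actual perf tool code; the event choice and structure are illustrative only):

#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>
#include <stdio.h>

static int open_counter(struct perf_event_attr *attr)
{
	/* pid 0 = this task, cpu -1 = any CPU, no group leader, no flags */
	return syscall(__NR_perf_event_open, attr, 0, -1, -1, 0);
}

int main(void)
{
	struct perf_event_attr attr;
	int fd;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HARDWARE;
	attr.config = PERF_COUNT_HW_INSTRUCTIONS;
	attr.exclude_kernel = 1;	/* first attempt: user space only */

	fd = open_counter(&attr);
	if (fd < 0 && errno == EINVAL) {
		/* PMU cannot filter by context, retry without exclusion */
		attr.exclude_kernel = 0;
		fd = open_counter(&attr);
	}
	if (fd < 0)
		perror("perf_event_open");

	return fd < 0 ? 1 : 0;
}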

Changes from v1:

 - Changed approach from explicitly rejecting events in unsupporting PMU
   drivers to explicitly advertising a capability in PMU drivers that
   do support exclusion events

 - Added additional information to tools/perf/design.txt

 - Rename event_has_exclude_flags to event_has_any_exclude_flag and
   update commit log to reflect it's a function

Andrew Murray (20):
  perf/doc: update design.txt for exclude_{host|guest} flags
  perf/core: add function to test for event exclusion flags
  perf/core: add PERF_PMU_CAP_EXCLUDE for exclusion capable PMUs
  perf/hw_breakpoint: perf/core: advertise PMU exclusion capability
  alpha: perf/core: remove unnecessary checks for exclusion
  arc: perf/core: advertise PMU exclusion capability
  arm: perf: conditionally advertise PMU exclusion capability
  arm: perf/core: remove unnecessary checks for exclusion
  drivers/perf: perf/core: remove unnecessary checks for exclusion
  drivers/perf: perf/core: remove unnecessary checks for exclusion
  drivers/perf: perf/core: advertise PMU exclusion capability
  mips: perf/core: advertise PMU exclusion capability
  powerpc: perf/core: advertise PMU exclusion capability
  powerpc: perf/core: remove unnecessary checks for exclusion
  s390: perf/events: advertise PMU exclusion capability
  sparc: perf/core: advertise PMU exclusion capability
  x86: perf/core: remove unnecessary checks for exclusion
  x86: perf/core remove unnecessary checks for exclusion
  x86: perf/core: advertise PMU exclusion capability
  perf/core: remove unused perf_flags

 arch/alpha/kernel/perf_event.c   |  6 --
 arch/arc/kernel/perf_event.c |  1 +
 arch/arm/mach-imx/mmdc.c |  8 +---
 arch/arm/mm/cache-l2x0-pmu.c |  8 
 arch/mips/kernel/perf_event_mipsxx.c |  1 +
 arch/powerpc/perf/core-book3s.c  |  1 +
 arch/powerpc/perf/core-fsl-emb.c |  1 +
 arch/powerpc/perf/hv-24x7.c  |  9 -
 arch/powerpc/perf/hv-gpci.c  |  9 -
 arch/powerpc/perf/imc-pmu.c  | 18 --
 arch/s390/kernel/perf_cpum_cf.c  |  1 +
 arch/s390/kernel/perf_cpum_sf.c  |  2 ++
 arch/sparc/kernel/perf_event.c   |  1 +
 arch/x86/events/amd/ibs.c| 12 
 arch/x86/events/amd/iommu.c  |  5 -
 arch/x86/events/amd/power.c  |  9 +
 arch/x86/events/amd/uncore.c |  5 -
 arch/x86/events/core.c   |  2 ++
 arch/x86/events/intel/bts.c  |  2 +-
 arch/x86/events/intel/cstate.c   |  8 +---
 arch/x86/events/intel/pt.c   |  4 +++-
 arch/x86/events/intel/rapl.c |  8 +---
 arch/x86/events/intel/uncore.c   |  8 
 arch/x86/events/intel/uncore_snb.c   |  8 +---
 arch/x86/events/msr.c|  8 +---
 drivers/perf/arm-cci.c   |  9 -
 drivers/perf/arm-ccn.c   |  5 +
 drivers/perf/arm_dsu_pmu.c   |  8 +---
 drivers/perf/arm_pmu.c   | 15 +--
 drivers/perf/arm_spe_pmu.c   |  3 ++-
 drivers/perf/hisilicon/hisi_uncore_pmu.c |  9 -
 drivers/perf/qcom_l2_pmu.c   |  8 
 drivers/perf/qcom_l3_pmu.c   |  7 ---
 drivers/perf/xgene_pmu.c |  5 -
 include/linux/perf_event.h   | 10 ++
 include/uapi/linux/perf_event.h  |  2 --
 kernel/events/core.c

[PATCH 2/2] powerpc/configs: Update ppc64_defconfig with savedefconfig

2018-11-26 Thread Michael Ellerman
Update ppc64_defconfig with savedefconfig. No symbols are added or
removed; this is 100% movement.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/configs/ppc64_defconfig | 68 ++--
 1 file changed, 34 insertions(+), 34 deletions(-)

diff --git a/arch/powerpc/configs/ppc64_defconfig 
b/arch/powerpc/configs/ppc64_defconfig
index 6cdf034e8625..860a3a72b559 100644
--- a/arch/powerpc/configs/ppc64_defconfig
+++ b/arch/powerpc/configs/ppc64_defconfig
@@ -1,4 +1,3 @@
-CONFIG_PPC64=y
 CONFIG_SYSVIPC=y
 CONFIG_POSIX_MQUEUE=y
 CONFIG_NO_HZ=y
@@ -16,14 +15,7 @@ CONFIG_BLK_DEV_INITRD=y
 CONFIG_BPF_SYSCALL=y
 # CONFIG_COMPAT_BRK is not set
 CONFIG_PROFILING=y
-CONFIG_OPROFILE=m
-CONFIG_KPROBES=y
-CONFIG_JUMP_LABEL=y
-CONFIG_MODULES=y
-CONFIG_MODULE_UNLOAD=y
-CONFIG_MODVERSIONS=y
-CONFIG_MODULE_SRCVERSION_ALL=y
-CONFIG_PARTITION_ADVANCED=y
+CONFIG_PPC64=y
 CONFIG_PPC_SPLPAR=y
 CONFIG_DTL=y
 CONFIG_SCANLOG=m
@@ -45,14 +37,11 @@ CONFIG_CPU_FREQ_GOV_USERSPACE=y
 CONFIG_CPU_FREQ_GOV_CONSERVATIVE=y
 CONFIG_CPU_FREQ_PMAC64=y
 CONFIG_HZ_100=y
-CONFIG_BINFMT_MISC=m
 CONFIG_PPC_TRANSACTIONAL_MEM=y
 CONFIG_KEXEC=y
 CONFIG_KEXEC_FILE=y
 CONFIG_CRASH_DUMP=y
 CONFIG_IRQ_ALL_CPUS=y
-CONFIG_KSM=y
-CONFIG_TRANSPARENT_HUGEPAGE=y
 CONFIG_PPC_64K_PAGES=y
 CONFIG_SCHED_SMT=y
 CONFIG_HOTPLUG_PCI=y
@@ -60,6 +49,21 @@ CONFIG_HOTPLUG_PCI_RPA=m
 CONFIG_HOTPLUG_PCI_RPA_DLPAR=m
 CONFIG_PCCARD=y
 CONFIG_ELECTRA_CF=y
+CONFIG_VIRTUALIZATION=y
+CONFIG_KVM_BOOK3S_64=m
+CONFIG_KVM_BOOK3S_64_HV=m
+CONFIG_VHOST_NET=m
+CONFIG_OPROFILE=m
+CONFIG_KPROBES=y
+CONFIG_JUMP_LABEL=y
+CONFIG_MODULES=y
+CONFIG_MODULE_UNLOAD=y
+CONFIG_MODVERSIONS=y
+CONFIG_MODULE_SRCVERSION_ALL=y
+CONFIG_PARTITION_ADVANCED=y
+CONFIG_BINFMT_MISC=m
+CONFIG_KSM=y
+CONFIG_TRANSPARENT_HUGEPAGE=y
 CONFIG_NET=y
 CONFIG_PACKET=y
 CONFIG_UNIX=y
@@ -163,7 +167,6 @@ CONFIG_TIGON3=y
 CONFIG_BNX2X=m
 CONFIG_CHELSIO_T1=m
 CONFIG_BE2NET=m
-CONFIG_S2IO=m
 CONFIG_IBMVETH=m
 CONFIG_EHEA=m
 CONFIG_E100=y
@@ -174,6 +177,7 @@ CONFIG_IXGBE=m
 CONFIG_I40E=m
 CONFIG_MLX4_EN=m
 CONFIG_MYRI10GE=m
+CONFIG_S2IO=m
 CONFIG_PASEMI_MAC=y
 CONFIG_QLGE=m
 CONFIG_NETXEN_NIC=m
@@ -323,23 +327,6 @@ CONFIG_NLS_CODEPAGE_437=y
 CONFIG_NLS_ASCII=y
 CONFIG_NLS_ISO8859_1=y
 CONFIG_NLS_UTF8=y
-CONFIG_MAGIC_SYSRQ=y
-CONFIG_DEBUG_KERNEL=y
-CONFIG_DEBUG_STACK_USAGE=y
-CONFIG_DEBUG_STACKOVERFLOW=y
-CONFIG_SOFTLOCKUP_DETECTOR=y
-CONFIG_HARDLOCKUP_DETECTOR=y
-CONFIG_DEBUG_MUTEXES=y
-CONFIG_LATENCYTOP=y
-CONFIG_FUNCTION_TRACER=y
-CONFIG_SCHED_TRACER=y
-CONFIG_BLK_DEV_IO_TRACE=y
-CONFIG_CODE_PATCHING_SELFTEST=y
-CONFIG_FTR_FIXUP_SELFTEST=y
-CONFIG_MSI_BITMAP_SELFTEST=y
-CONFIG_XMON=y
-CONFIG_BOOTX_TEXT=y
-CONFIG_PPC_EARLY_DEBUG=y
 CONFIG_CRYPTO_TEST=m
 CONFIG_CRYPTO_PCBC=m
 CONFIG_CRYPTO_HMAC=y
@@ -362,8 +349,21 @@ CONFIG_CRYPTO_LZO=m
 CONFIG_CRYPTO_DEV_NX=y
 CONFIG_CRYPTO_DEV_NX_ENCRYPT=m
 CONFIG_CRYPTO_DEV_VMX=y
-CONFIG_VIRTUALIZATION=y
-CONFIG_KVM_BOOK3S_64=m
-CONFIG_KVM_BOOK3S_64_HV=m
-CONFIG_VHOST_NET=m
 CONFIG_PRINTK_TIME=y
+CONFIG_MAGIC_SYSRQ=y
+CONFIG_DEBUG_KERNEL=y
+CONFIG_DEBUG_STACK_USAGE=y
+CONFIG_DEBUG_STACKOVERFLOW=y
+CONFIG_SOFTLOCKUP_DETECTOR=y
+CONFIG_HARDLOCKUP_DETECTOR=y
+CONFIG_DEBUG_MUTEXES=y
+CONFIG_LATENCYTOP=y
+CONFIG_FUNCTION_TRACER=y
+CONFIG_SCHED_TRACER=y
+CONFIG_BLK_DEV_IO_TRACE=y
+CONFIG_CODE_PATCHING_SELFTEST=y
+CONFIG_FTR_FIXUP_SELFTEST=y
+CONFIG_MSI_BITMAP_SELFTEST=y
+CONFIG_XMON=y
+CONFIG_BOOTX_TEXT=y
+CONFIG_PPC_EARLY_DEBUG=y
-- 
2.17.2



[PATCH 1/2] powerpc/configs: Remove unnecessary ftrace symbols

2018-11-26 Thread Michael Ellerman
In commit 539df7fcb303 ("powerpc/configs: Enable function trace by
default") we added:

  CONFIG_FTRACE=y
  CONFIG_FUNCTION_TRACER=y
  CONFIG_FUNCTION_GRAPH_TRACER=y

To ppc64_defconfig, powernv_defconfig and pseries_defconfig.

But only CONFIG_FUNCTION_TRACER=y is required: CONFIG_FTRACE is
default y if DEBUG_KERNEL is enabled, which we have. And then
CONFIG_FUNCTION_GRAPH_TRACER is default y when CONFIG_FUNCTION_TRACER
is enabled.

The extra symbols were already removed from powernv_defconfig in
commit 9a018fb1e147 ("powerpc/config: powernv_defconfig updates").

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/configs/ppc64_defconfig   | 2 --
 arch/powerpc/configs/pseries_defconfig | 2 --
 2 files changed, 4 deletions(-)

diff --git a/arch/powerpc/configs/ppc64_defconfig 
b/arch/powerpc/configs/ppc64_defconfig
index f2515674a1e2..6cdf034e8625 100644
--- a/arch/powerpc/configs/ppc64_defconfig
+++ b/arch/powerpc/configs/ppc64_defconfig
@@ -331,9 +331,7 @@ CONFIG_SOFTLOCKUP_DETECTOR=y
 CONFIG_HARDLOCKUP_DETECTOR=y
 CONFIG_DEBUG_MUTEXES=y
 CONFIG_LATENCYTOP=y
-CONFIG_FTRACE=y
 CONFIG_FUNCTION_TRACER=y
-CONFIG_FUNCTION_GRAPH_TRACER=y
 CONFIG_SCHED_TRACER=y
 CONFIG_BLK_DEV_IO_TRACE=y
 CONFIG_CODE_PATCHING_SELFTEST=y
diff --git a/arch/powerpc/configs/pseries_defconfig 
b/arch/powerpc/configs/pseries_defconfig
index 5e09a40cbcbf..ea79c519863d 100644
--- a/arch/powerpc/configs/pseries_defconfig
+++ b/arch/powerpc/configs/pseries_defconfig
@@ -290,9 +290,7 @@ CONFIG_DEBUG_STACKOVERFLOW=y
 CONFIG_SOFTLOCKUP_DETECTOR=y
 CONFIG_HARDLOCKUP_DETECTOR=y
 CONFIG_LATENCYTOP=y
-CONFIG_FTRACE=y
 CONFIG_FUNCTION_TRACER=y
-CONFIG_FUNCTION_GRAPH_TRACER=y
 CONFIG_SCHED_TRACER=y
 CONFIG_BLK_DEV_IO_TRACE=y
 CONFIG_CODE_PATCHING_SELFTEST=y
-- 
2.17.2



Re: [PATCH 5/5] arch: simplify several early memory allocations

2018-11-26 Thread Mike Rapoport
On Mon, Nov 26, 2018 at 12:21:34AM -0800, Christoph Hellwig wrote:
> >  static void __init *early_alloc_aligned(unsigned long sz, unsigned long 
> > align)
> >  {
> > -   void *ptr = __va(memblock_phys_alloc(sz, align));
> > -   memset(ptr, 0, sz);
> > -   return ptr;
> > +   return memblock_alloc(sz, align);
> >  }
> 
> What is the point of keeping this wrapper?

No point indeed. I'll remove it in v2.
 
> >  static void __init *early_alloc(unsigned long sz)
> >  {
> > -   void *ptr = __va(memblock_phys_alloc(sz, sz));
> > -   memset(ptr, 0, sz);
> > -   return ptr;
> > +   return memblock_alloc(sz, sz);
> >  }
> 
> Same here.
> 

Here it provides a shortcut for allocations with align == size, but can be
removed as well.

-- 
Sincerely yours,
Mike.



Re: [PATCH 5/5] arch: simplify several early memory allocations

2018-11-26 Thread Christoph Hellwig
>  static void __init *early_alloc_aligned(unsigned long sz, unsigned long 
> align)
>  {
> - void *ptr = __va(memblock_phys_alloc(sz, align));
> - memset(ptr, 0, sz);
> - return ptr;
> + return memblock_alloc(sz, align);
>  }

What is the point of keeping this wrapper?

>  static void __init *early_alloc(unsigned long sz)
>  {
> - void *ptr = __va(memblock_phys_alloc(sz, sz));
> - memset(ptr, 0, sz);
> - return ptr;
> + return memblock_alloc(sz, sz);
>  }

Same here.