Re: [PATCH 0/4] powerpc: build out kprobes blacklist

2017-04-26 Thread Naveen N. Rao
On 2017/04/27 11:24AM, Masami Hiramatsu wrote:
> Hello Naveen,
> 
> On Tue, 25 Apr 2017 22:04:05 +0530
> "Naveen N. Rao"  wrote:
> 
> > This is the second in the series of patches to build out an appropriate
> > kprobes blacklist. This series blacklists system_call() and functions
> > involved when handling the trap itself. Not everything is covered, but
> > this is the first set of functions that I have tested with. More
> > patches to follow once I expand my tests.
> 
> OK, btw, have you tried putting kprobes on these functions and
> seen a kernel panic happen?

Thanks for the question! I re-checked and just realized I hadn't 
tested the PAPR case (stolen time accounting). On testing, I found 
that those functions are not a problem, since they are only invoked when 
we are coming in from user space and not when we take a trap in-kernel.  
I will re-spin patch 3/4. The other functions have been tested and do 
cause issues with kprobes.

There are still a few more functions involved (especially with 
CONFIG_PREEMPT) where I couldn't reproduce an issue, so I haven't 
included those in this series.  I am continuing to test this further and 
will post subsequent patches once I work out which other functions are 
problematic.

> 
> > I have converted many labels into private -- these are labels that I
> > felt are not necessary to read stack traces. If any of those are
> > important to have, please let me know.
> 
> At least from the viewpoint of kprobe-blacklist macros, it seems
> good to me :)
> 
> Reviewed-by: Masami Hiramatsu 
> 
> for this series.
> 
> Thank you,

Thanks for the review!
- Naveen



[PATCH-RESEND] cxl: Route eeh events to all drivers in cxl_pci_error_detected()

2017-04-26 Thread Vaibhav Jain
Fix a boundary condition where, in some cases, an eeh event that results
in a card reset isn't passed on to a driver attached to the virtual PCI
device associated with a slice. This happens when a slice-attached
device driver returns a value other than PCI_ERS_RESULT_NEED_RESET from
its eeh error_detected() callback, which causes an early return from
cxl_pci_error_detected(), so drivers attached to the other AFUs on the
card aren't notified.

The patch fixes this by making sure that all slice-attached
device drivers are notified and that the return values from their
error_detected() callbacks are aggregated in a scheme where a request
for 'disconnect' trumps all and 'none' trumps 'need_reset'.

Cc: sta...@vger.kernel.org
Fixes: 9e8df8a21963("cxl: EEH support")
Based-on: https://patchwork.ozlabs.org/patch/755799/
Signed-off-by: Vaibhav Jain 
Reviewed-by: Andrew Donnellan 
Acked-by: Frederic Barrat 

---
Change-log:

Resend
- Marked to stable tree and added acks.
---
 drivers/misc/cxl/pci.c | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
index dd9a128..f98d649 100644
--- a/drivers/misc/cxl/pci.c
+++ b/drivers/misc/cxl/pci.c
@@ -1782,7 +1782,7 @@ static pci_ers_result_t cxl_pci_error_detected(struct pci_dev *pdev,
 {
struct cxl *adapter = pci_get_drvdata(pdev);
struct cxl_afu *afu;
-   pci_ers_result_t result = PCI_ERS_RESULT_NEED_RESET;
+   pci_ers_result_t result = PCI_ERS_RESULT_NEED_RESET, afu_result;
int i;
 
/* At this point, we could still have an interrupt pending.
@@ -1886,15 +1886,18 @@ static pci_ers_result_t cxl_pci_error_detected(struct pci_dev *pdev,
for (i = 0; i < adapter->slices; i++) {
afu = adapter->afu[i];
 
-   result = cxl_vphb_error_detected(afu, state);
-
-   /* Only continue if everyone agrees on NEED_RESET */
-   if (result != PCI_ERS_RESULT_NEED_RESET)
-   return result;
+   afu_result = cxl_vphb_error_detected(afu, state);
 
cxl_context_detach_all(afu);
cxl_ops->afu_deactivate_mode(afu, afu->current_mode);
pci_deconfigure_afu(afu);
+
+   /* Disconnect trumps all, NONE trumps NEED_RESET */
+   if (afu_result == PCI_ERS_RESULT_DISCONNECT)
+   result = PCI_ERS_RESULT_DISCONNECT;
+   else if ((afu_result == PCI_ERS_RESULT_NONE) &&
+(result == PCI_ERS_RESULT_NEED_RESET))
+   result = PCI_ERS_RESULT_NONE;
}
 
/* should take the context lock here */
-- 
2.9.3



[PATCH-RESEND v4] cxl: Force context lock during EEH flow

2017-04-26 Thread Vaibhav Jain
During an eeh event, when the cxl card is fenced and the card's sysfs
attribute perst_reloads_same_image is set, the following warning message
is seen in the kernel logs:

 [   60.622727] Adapter context unlocked with 0 active contexts
 [   60.622762] [ cut here ]
 [   60.622771] WARNING: CPU: 12 PID: 627 at
 ../drivers/misc/cxl/main.c:325 cxl_adapter_context_unlock+0x60/0x80 [cxl]

Even though this warning is harmless, it clutters the kernel log during
an eeh event. It is triggered because the EEH callback
cxl_pci_error_detected() doesn't obtain a context-lock before forcibly
detaching all active contexts; when the context-lock is later released
during the call to cxl_configure_adapter() from cxl_pci_slot_reset(),
the warning in cxl_adapter_context_unlock() fires.

To fix this warning, we acquire the adapter context-lock via
cxl_adapter_context_lock() in the eeh callback
cxl_pci_error_detected() once all the virtual AFU PHBs are notified
and their contexts detached. The context-lock is released in
cxl_pci_slot_reset() after the adapter is successfully reconfigured
and before we call the slot_reset callback on slice-attached
device drivers.

Cc: sta...@vger.kernel.org
Fixes: 70b565bbdb91("cxl: Prevent adapter reset if an active context exists")
Reported-by: Andrew Donnellan 
Signed-off-by: Vaibhav Jain 
Acked-by: Frederic Barrat 
Reviewed-by: Matthew R. Ochs 
Tested-by: Uma Krishnan 
---
Change-Log:

Resend
- Added Acks and Sign-offs.

v3..v4
- Moved the call to context-unlock from cxl_pci_resume to
cxl_pci_slot_reset to let cxlflash module activate its master context
during slot reset. (Fred)

v2..v3
- As discussed with Fred removed function
cxl_adapter_context_force_lock() which may potentially expose the code
to deadlock in the future.
-  Other details of changes in cxl_pci_error_detected() to fix an
earlier issue of eeh callbacks not being passed on to all slices, is
being reworked as a separate patch.

v1..v2
- Moved the call to cxl_adapter_context_force_lock() from
cxl_pci_error_detected() to cxl_remove. (Fred)
---
 drivers/misc/cxl/pci.c | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
index b27ea98..dd9a128 100644
--- a/drivers/misc/cxl/pci.c
+++ b/drivers/misc/cxl/pci.c
@@ -1496,8 +1496,6 @@ static int cxl_configure_adapter(struct cxl *adapter, struct pci_dev *dev)
if ((rc = cxl_native_register_psl_err_irq(adapter)))
goto err;
 
-   /* Release the context lock as adapter is configured */
-   cxl_adapter_context_unlock(adapter);
return 0;
 
 err:
@@ -1596,6 +1594,9 @@ static struct cxl *cxl_pci_init_adapter(struct pci_dev *dev)
if ((rc = cxl_sysfs_adapter_add(adapter)))
goto err_put1;
 
+   /* Release the context lock as adapter is configured */
+   cxl_adapter_context_unlock(adapter);
+
return adapter;
 
 err_put1:
@@ -1895,6 +1896,13 @@ static pci_ers_result_t cxl_pci_error_detected(struct pci_dev *pdev,
cxl_ops->afu_deactivate_mode(afu, afu->current_mode);
pci_deconfigure_afu(afu);
}
+
+   /* should take the context lock here */
+   if (cxl_adapter_context_lock(adapter) != 0)
+   dev_warn(&adapter->dev,
+"Couldn't take context lock with %d active-contexts\n",
+atomic_read(&adapter->contexts_num));
+
cxl_deconfigure_adapter(adapter);
 
return result;
@@ -1913,6 +1921,13 @@ static pci_ers_result_t cxl_pci_slot_reset(struct pci_dev *pdev)
if (cxl_configure_adapter(adapter, pdev))
goto err;
 
+   /*
+* Unlock context activation for the adapter. Ideally this should be
+* done in cxl_pci_resume but cxlflash module tries to activate the
+* master context as part of slot_reset callback.
+*/
+   cxl_adapter_context_unlock(adapter);
+
for (i = 0; i < adapter->slices; i++) {
afu = adapter->afu[i];
 
-- 
2.9.3



Re: [PATCH v2] cxl: Prevent IRQ storm

2017-04-26 Thread Alastair D'Silva
On Thu, 2017-04-27 at 14:41 +1000, Andrew Donnellan wrote:
> On 27/04/17 11:37, Alastair D'Silva wrote:
> > From: Alastair D'Silva 
> > 
> > In some situations, a faulty AFU slice may create an interrupt
> > storm,
> > rendering the machine unusable. Since these interrupts are
> > informational
> > only, present the interrupt once, then mask it off to prevent it
> > from
> > being retriggered until the card is reset.
> > 
> > Changelog:
> > v2
> > Rebase against linux-next
> 
> The patch changelog shouldn't be part of the commit message - it
> should 
> go under a "---" line after the sign-off so it doesn't get included
> in 
> the final commit.
> 
> Also now that I've taken a second look, I think the summary line of
> the 
> commit message could be more descriptive, something like:
> 
> "cxl: mask slice error interrupt after first occurrence"
> 
> or something to that effect?
> 

Ok

-- 
Alastair D'Silva
Open Source Developer
Linux Technology Centre, IBM Australia
mob: 0423 762 819



Re: [PATCH v2] cxl: Prevent IRQ storm

2017-04-26 Thread Andrew Donnellan

On 27/04/17 11:37, Alastair D'Silva wrote:

From: Alastair D'Silva 

In some situations, a faulty AFU slice may create an interrupt storm,
rendering the machine unusable. Since these interrupts are informational
only, present the interrupt once, then mask it off to prevent it from
being retriggered until the card is reset.

Changelog:
v2
Rebase against linux-next


The patch changelog shouldn't be part of the commit message - it should 
go under a "---" line after the sign-off so it doesn't get included in 
the final commit.


Also now that I've taken a second look, I think the summary line of the 
commit message could be more descriptive, something like:


"cxl: mask slice error interrupt after first occurrence"

or something to that effect?


Andrew

--
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited



Re: [PATCH] powerpc/mm/hugetlb: Add support for 1G huge pages

2017-04-26 Thread Aneesh Kumar K.V
Anshuman Khandual  writes:

> On 04/17/2017 10:44 PM, Aneesh Kumar K.V wrote:
>> POWER9 supports hugepages of size 2M and 1G in radix MMU mode. This patch
>> enables the usage of the 1G page size for hugetlbfs. This also updates the
>> helpers so that we can do 1G page allocation at runtime.
>> 
>> Since we can do this only when radix translation mode is enabled, we can't use
>> the generic gigantic_page_supported helper. Hence provide a way for the
>> architecture to override the gigantic_page_supported helper.
>> 
>> We still don't enable the 1G page size on the DD1 version. This is to avoid
>> doing the workaround mentioned in commit 6d3a0379ebdc8 ("powerpc/mm: Add
>> radix__tlb_flush_pte_p9_dd1()").
>> 
>> Signed-off-by: Aneesh Kumar K.V 
>> ---
>>  arch/powerpc/include/asm/book3s/64/hugetlb.h | 13 +
>>  arch/powerpc/mm/hugetlbpage.c|  7 +--
>>  arch/powerpc/platforms/Kconfig.cputype   |  1 +
>>  mm/hugetlb.c |  4 
>>  4 files changed, 23 insertions(+), 2 deletions(-)
>> 
>> diff --git a/arch/powerpc/include/asm/book3s/64/hugetlb.h b/arch/powerpc/include/asm/book3s/64/hugetlb.h
>> index cd366596..86f27cc8ec61 100644
>> --- a/arch/powerpc/include/asm/book3s/64/hugetlb.h
>> +++ b/arch/powerpc/include/asm/book3s/64/hugetlb.h
>> @@ -50,4 +50,17 @@ static inline pte_t arch_make_huge_pte(pte_t entry, struct vm_area_struct *vma,
>>  else
>>  return entry;
>>  }
>> +
>> +#if defined(CONFIG_ARCH_HAS_GIGANTIC_PAGE) &&			\
>> +	((defined(CONFIG_MEMORY_ISOLATION) && defined(CONFIG_COMPACTION)) || \
>> +	 defined(CONFIG_CMA))
>> +#define gigantic_page_supported gigantic_page_supported
>
> As I have mentioned in later part of the reply, it does not really
> make sense to have both arch call back as well as generic config
> option checking to decide on whether a feature is enabled or not.
>
>> +static inline bool gigantic_page_supported(void)
>> +{
>> +if (radix_enabled())
>> +return true;
>> +return false;
>> +}
>> +#endif
>> +
>>  #endif
>> diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
>> index a4f33de4008e..80f6d2ed551a 100644
>> --- a/arch/powerpc/mm/hugetlbpage.c
>> +++ b/arch/powerpc/mm/hugetlbpage.c
>> @@ -763,8 +763,11 @@ static int __init add_huge_page_size(unsigned long long size)
>>   * Hash: 16M and 16G
>>   */
>>  if (radix_enabled()) {
>> -if (mmu_psize != MMU_PAGE_2M)
>> -return -EINVAL;
>> +if (mmu_psize != MMU_PAGE_2M) {
>> +if (cpu_has_feature(CPU_FTR_POWER9_DD1) ||
>> +(mmu_psize != MMU_PAGE_1G))
>> +return -EINVAL;
>> +}
>>  } else {
>>  if (mmu_psize != MMU_PAGE_16M && mmu_psize != MMU_PAGE_16G)
>>  return -EINVAL;
>> diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
>> index ef4c4b8fc547..f4ba4bf0d762 100644
>> --- a/arch/powerpc/platforms/Kconfig.cputype
>> +++ b/arch/powerpc/platforms/Kconfig.cputype
>> @@ -343,6 +343,7 @@ config PPC_STD_MMU_64
>>  config PPC_RADIX_MMU
>>  bool "Radix MMU Support"
>>  depends on PPC_BOOK3S_64
>> +select ARCH_HAS_GIGANTIC_PAGE
>>  default y
>>  help
>
> As we are already checking for radix_enabled() test inside function
> gigantic_page_supported(), do we still need to conditionally enable
> this on Radix based platforms only ?
>
>
>>Enable support for the Power ISA 3.0 Radix style MMU. Currently this
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index 3d0aab9ee80d..2c090189f314 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -1158,7 +1158,11 @@ static int alloc_fresh_gigantic_page(struct hstate *h,
>>  return 0;
>>  }
>>  
>> +#ifndef gigantic_page_supported
>>  static inline bool gigantic_page_supported(void) { return true; }
>> +#define gigantic_page_supported gigantic_page_supported
>> +#endif
>
> As seen above, now that arch's decision to support this feature is not
> based solely on ARCH_HAS_GIGANTIC_PAGE config option but also on the
> availability of platform features like radix, is it a good time to have
> an arch call back deciding on gigantic_page_supported() test instead of
> just checking presence of config options like 
>
> #if defined(CONFIG_ARCH_HAS_GIGANTIC_PAGE) && \
>   ((defined(CONFIG_MEMORY_ISOLATION) && defined(CONFIG_COMPACTION)) || \
>   defined(CONFIG_CMA))
>
> We should not have both as proposed. I mean CONFIG_ARCH_HAS_GIGANTIC_PAGE
> should not be enabled unless we have MEMORY_ISOLATION && COMPACTION && CMA
> and once enabled we should have arch_gigantic_page_supported() deciding for
> gigantic_page_supported().

I will update the patch. I guess I can also fix up the other arches that
enable GIGANTIC_PAGE accordingly.

-aneesh



Re: [PATCH 0/4] powerpc: build out kprobes blacklist

2017-04-26 Thread Masami Hiramatsu
Hello Naveen,

On Tue, 25 Apr 2017 22:04:05 +0530
"Naveen N. Rao"  wrote:

> This is the second in the series of patches to build out an appropriate
> kprobes blacklist. This series blacklists system_call() and functions
> involved when handling the trap itself. Not everything is covered, but
> this is the first set of functions that I have tested with. More
> patches to follow once I expand my tests.

OK, btw, have you tried putting kprobes on these functions and
seen a kernel panic happen?

> I have converted many labels into private -- these are labels that I
> felt are not necessary to read stack traces. If any of those are
> important to have, please let me know.

At least from the viewpoint of kprobe-blacklist macros, it seems
good to me :)

Reviewed-by: Masami Hiramatsu 

for this series.

Thank you,

> 
> - Naveen
> 
> Naveen N. Rao (4):
>   powerpc/kprobes: cleanup system_call_common and blacklist it from
> kprobes
>   powerpc/kprobes: un-blacklist system_call() from kprobes
>   powerpc/kprobes: blacklist functions invoked on a trap
>   powerpc/kprobes: blacklist functions involved when returning from
> exception
> 
>  arch/powerpc/kernel/entry_64.S   | 94 +++-
>  arch/powerpc/kernel/exceptions-64s.S |  1 +
>  arch/powerpc/kernel/time.c   |  3 ++
>  arch/powerpc/kernel/traps.c  |  3 ++
>  arch/powerpc/platforms/pseries/dtl.c |  2 +
>  5 files changed, 60 insertions(+), 43 deletions(-)
> 
> -- 
> 2.12.1
> 


-- 
Masami Hiramatsu 


Re: [PATCH] kallsyms: optimize kallsyms_lookup_name() for a few cases

2017-04-26 Thread Masami Hiramatsu
On Thu, 27 Apr 2017 01:38:10 +0530
"Naveen N. Rao"  wrote:

> Michael Ellerman wrote:
> > "Naveen N. Rao"  writes:
> >> diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
> >> index 6a3b249a2ae1..d134b060564f 100644
> >> --- a/kernel/kallsyms.c
> >> +++ b/kernel/kallsyms.c
> >> @@ -205,6 +205,12 @@ unsigned long kallsyms_lookup_name(const char *name)
> >>unsigned long i;
> >>unsigned int off;
> >>  
> >> +  if (!name || *name == '\0')
> >> +  return false;
> >> +
> >> +  if (strnchr(name, MODULE_NAME_LEN, ':'))
> >> +  return module_kallsyms_lookup_name(name);
> >> +
> >>for (i = 0, off = 0; i < kallsyms_num_syms; i++) {
> >>off = kallsyms_expand_symbol(off, namebuf, ARRAY_SIZE(namebuf));
> > ... 
> > }
> > return module_kallsyms_lookup_name(name);
> > 
> > Is the rest of the context.
> > 
> > Which looks a bit odd, we already did module lookup previously?
> > 
> > But it's correct, because you can lookup a symbol in a module without a
> > module prefix, it just looks in every module.
> 
> Yes.
> 
> > 
> > You could invert the logic, ie. check that there isn't a ":" in the name
> > and only in that case do the for loop, always falling back to module
> > lookup.
> > 
> > Or just add a comment explaining why we call module lookup in two places.
> 
> Good point. Here's a v2 - I'm using a goto so as to not indent the code too 
> much.
> 
> Thanks for the review!
> - Naveen
> 
> --
> [PATCH v2] kallsyms: optimize kallsyms_lookup_name() for a few cases
> 
> 1. Fail early for invalid/zero length symbols.
> 2. Detect names of the form  and skip checking for kernel
> symbols in that case.
> 

Looks good to me.

Reviewed-by: Masami Hiramatsu 

Thanks,


> Signed-off-by: Naveen N. Rao 
> ---
>  kernel/kallsyms.c | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
> index 6a3b249a2ae1..f7558dc5c6ac 100644
> --- a/kernel/kallsyms.c
> +++ b/kernel/kallsyms.c
> @@ -205,12 +205,20 @@ unsigned long kallsyms_lookup_name(const char *name)
>   unsigned long i;
>   unsigned int off;
>  
> + if (!name || *name == '\0')
> + return 0;
> +
> + /* For symbols of the form :, only check the modules */
> + if (strnchr(name, MODULE_NAME_LEN, ':'))
> + goto mod;
> +
>   for (i = 0, off = 0; i < kallsyms_num_syms; i++) {
>   off = kallsyms_expand_symbol(off, namebuf, ARRAY_SIZE(namebuf));
>  
>   if (strcmp(namebuf, name) == 0)
>   return kallsyms_sym_address(i);
>   }
> +mod:
>   return module_kallsyms_lookup_name(name);
>  }
>  EXPORT_SYMBOL_GPL(kallsyms_lookup_name);
> -- 
> 2.12.2
> 


-- 
Masami Hiramatsu 


[PATCH v2] cxl: Prevent IRQ storm

2017-04-26 Thread Alastair D'Silva
From: Alastair D'Silva 

In some situations, a faulty AFU slice may create an interrupt storm,
rendering the machine unusable. Since these interrupts are informational
only, present the interrupt once, then mask it off to prevent it from
being retriggered until the card is reset.

Changelog:
v2
Rebase against linux-next

Signed-off-by: Alastair D'Silva 
---
 drivers/misc/cxl/native.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/misc/cxl/native.c b/drivers/misc/cxl/native.c
index 194c58e..95a776a 100644
--- a/drivers/misc/cxl/native.c
+++ b/drivers/misc/cxl/native.c
@@ -1205,7 +1205,7 @@ static irqreturn_t native_slice_irq_err(int irq, void *data)
 {
struct cxl_afu *afu = data;
u64 errstat, serr, afu_error, dsisr;
-   u64 fir_slice, afu_debug;
+   u64 fir_slice, afu_debug, irq_mask;
 
/*
 * slice err interrupt is only used with full PSL (no XSL)
@@ -1226,6 +1226,10 @@ static irqreturn_t native_slice_irq_err(int irq, void *data)
dev_crit(&afu->dev, "AFU_ERR_An: 0x%.16llx\n", afu_error);
dev_crit(&afu->dev, "PSL_DSISR_An: 0x%.16llx\n", dsisr);
 
+   /* mask off the IRQ so it won't retrigger until the card is reset */
+   irq_mask = (serr & 0xff80000000000000ULL) >> 32;
+   serr |= irq_mask;
+
cxl_p1n_write(afu, CXL_PSL_SERR_An, serr);
 
return IRQ_HANDLED;
-- 
2.9.3



[PATCH] powerpc/powernv: Fix missing attr initialisation in opal_export_attrs()

2017-04-26 Thread Michael Ellerman
In opal_export_attrs() we dynamically allocate some bin_attributes. They're
allocated with kmalloc() and although we initialise most of the fields, we don't
initialise write() or mmap(), and in particular we don't initialise the lockdep
related fields in the embedded struct attribute.

This leads to a lockdep warning at boot:

  BUG: key c000f11906d8 not in .data!
  WARNING: CPU: 0 PID: 1 at ../kernel/locking/lockdep.c:3136 lockdep_init_map+0x28c/0x2a0
  ...
  Call Trace:
lockdep_init_map+0x288/0x2a0 (unreliable)
__kernfs_create_file+0x8c/0x170
sysfs_add_file_mode_ns+0xc8/0x240
__machine_initcall_powernv_opal_init+0x60c/0x684
do_one_initcall+0x60/0x1c0
kernel_init_freeable+0x2f4/0x3d4
kernel_init+0x24/0x160
ret_from_kernel_thread+0x5c/0xb0

Fix it by kzalloc'ing the attr, which fixes the uninitialised write() and
mmap(), and calling sysfs_bin_attr_init() on it to initialise the lockdep
fields.

Fixes: 11fe909d2362 ("powerpc/powernv: Add OPAL exports attributes to sysfs")
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/platforms/powernv/opal.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
index 76e153fc1f93..7925a9d72cca 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -637,13 +637,14 @@ static void opal_export_attrs(void)
if (of_property_read_u64_array(np, prop->name, &vals[0], 2))
continue;
 
-   attr = kmalloc(sizeof(*attr), GFP_KERNEL);
+   attr = kzalloc(sizeof(*attr), GFP_KERNEL);
 
if (attr == NULL) {
pr_warn("Failed kmalloc for bin_attribute!");
continue;
}
 
+   sysfs_bin_attr_init(attr);
attr->attr.name = kstrdup(prop->name, GFP_KERNEL);
attr->attr.mode = 0400;
attr->read = export_attr_read;
-- 
2.7.4



Re: [PATCH] cxl: Prevent IRQ storm

2017-04-26 Thread Andrew Donnellan

On 27/04/17 11:09, Alastair D'Silva wrote:

Patch looks good, thanks!
It doesn't apply cleanly on the 'next' tree due to the capi2
patchset
though, so you should probably rebase on that tree. The bits have
changed a bit on PSL9, but the approach still works (error type
reported
in the first byte, and the corresponding masking bits are still
right-shifted by 32).



Hmm, both you & the documentation say 8 bits, but the code suggests 9:
#define CXL_PSL_SERR_An_afuto   (1ull << (63-0))
#define CXL_PSL_SERR_An_afudis  (1ull << (63-1))
#define CXL_PSL_SERR_An_afuov   (1ull << (63-2))
#define CXL_PSL_SERR_An_badsrc  (1ull << (63-3))
#define CXL_PSL_SERR_An_badctx  (1ull << (63-4))
#define CXL_PSL_SERR_An_llcmdis (1ull << (63-5))
#define CXL_PSL_SERR_An_llcmdto (1ull << (63-6))
#define CXL_PSL_SERR_An_afupar  (1ull << (63-7))
#define CXL_PSL_SERR_An_afudup  (1ull << (63-8))

Referenced from irq.c:cxl_afu_decode_psl_serr()

Thoughts?


The latest version of the (IBM internal) PSL8 workbook which I happen to 
have at hand lists bit 8 as "Reserved" in the bitfield diagram, but 
lists it as "afudup" in the description underneath, so I think it's safe 
to say it's the first 9 bits.


--
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited



Re: [PATCH] cxl: Prevent IRQ storm

2017-04-26 Thread Alastair D'Silva
On Wed, 2017-04-26 at 11:23 +0200, Frederic Barrat wrote:
> 
> Le 26/04/2017 à 08:40, Alastair D'Silva a écrit :
> > From: Alastair D'Silva 
> > 
> > In some situations, a faulty AFU slice may create an interrupt
> > storm,
> > rendering the machine unusable. Since these interrupts are
> > informational
> > only, present the interrupt once, then mask it off to prevent it
> > from
> > being retriggered until the card is reset.
> > 
> > Signed-off-by: Alastair D'Silva 
> > ---
> 
> Patch looks good, thanks!
> It doesn't apply cleanly on the 'next' tree due to the capi2
> patchset 
> though, so you should probably rebase on that tree. The bits have 
> changed a bit on PSL9, but the approach still works (error type
> reported 
> in the first byte, and the corresponding masking bits are still 
> right-shifted by 32).
> 

Hmm, both you & the documentation say 8 bits, but the code suggests 9:
#define CXL_PSL_SERR_An_afuto   (1ull << (63-0))
#define CXL_PSL_SERR_An_afudis  (1ull << (63-1))
#define CXL_PSL_SERR_An_afuov   (1ull << (63-2))
#define CXL_PSL_SERR_An_badsrc  (1ull << (63-3))
#define CXL_PSL_SERR_An_badctx  (1ull << (63-4))
#define CXL_PSL_SERR_An_llcmdis (1ull << (63-5))
#define CXL_PSL_SERR_An_llcmdto (1ull << (63-6))
#define CXL_PSL_SERR_An_afupar  (1ull << (63-7))
#define CXL_PSL_SERR_An_afudup  (1ull << (63-8))

Referenced from irq.c:cxl_afu_decode_psl_serr()

Thoughts?

-- 
Alastair D'Silva
Open Source Developer
Linux Technology Centre, IBM Australia
mob: 0423 762 819



Re: [PATCH] kallsyms: optimize kallsyms_lookup_name() for a few cases

2017-04-26 Thread Naveen N. Rao
Michael Ellerman wrote:
> "Naveen N. Rao"  writes:
>> diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
>> index 6a3b249a2ae1..d134b060564f 100644
>> --- a/kernel/kallsyms.c
>> +++ b/kernel/kallsyms.c
>> @@ -205,6 +205,12 @@ unsigned long kallsyms_lookup_name(const char *name)
>>  unsigned long i;
>>  unsigned int off;
>>  
>> +if (!name || *name == '\0')
>> +return false;
>> +
>> +if (strnchr(name, MODULE_NAME_LEN, ':'))
>> +return module_kallsyms_lookup_name(name);
>> +
>>  for (i = 0, off = 0; i < kallsyms_num_syms; i++) {
>>  off = kallsyms_expand_symbol(off, namebuf, ARRAY_SIZE(namebuf));
>   ... 
>   }
>   return module_kallsyms_lookup_name(name);
> 
> Is the rest of the context.
> 
> Which looks a bit odd, we already did module lookup previously?
> 
> But it's correct, because you can lookup a symbol in a module without a
> module prefix, it just looks in every module.

Yes.

> 
> You could invert the logic, ie. check that there isn't a ":" in the name
> and only in that case do the for loop, always falling back to module
> lookup.
> 
> Or just add a comment explaining why we call module lookup in two places.

Good point. Here's a v2 - I'm using a goto so as to not indent the code too 
much.

Thanks for the review!
- Naveen

--
[PATCH v2] kallsyms: optimize kallsyms_lookup_name() for a few cases

1. Fail early for invalid/zero length symbols.
2. Detect names of the form  and skip checking for kernel
symbols in that case.

Signed-off-by: Naveen N. Rao 
---
 kernel/kallsyms.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index 6a3b249a2ae1..f7558dc5c6ac 100644
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -205,12 +205,20 @@ unsigned long kallsyms_lookup_name(const char *name)
unsigned long i;
unsigned int off;
 
+   if (!name || *name == '\0')
+   return 0;
+
+   /* For symbols of the form :, only check the modules */
+   if (strnchr(name, MODULE_NAME_LEN, ':'))
+   goto mod;
+
for (i = 0, off = 0; i < kallsyms_num_syms; i++) {
off = kallsyms_expand_symbol(off, namebuf, ARRAY_SIZE(namebuf));
 
if (strcmp(namebuf, name) == 0)
return kallsyms_sym_address(i);
}
+mod:
return module_kallsyms_lookup_name(name);
 }
 EXPORT_SYMBOL_GPL(kallsyms_lookup_name);
-- 
2.12.2



Re: [PATCH] powerpc/kprobes: refactor kprobe_lookup_name for safer string operations

2017-04-26 Thread Naveen N. Rao

Excerpts from Masami Hiramatsu's message of April 26, 2017 10:11:

On Tue, 25 Apr 2017 21:37:11 +0530
"Naveen N. Rao"  wrote:


Use safer string manipulation functions when dealing with a
user-provided string in kprobe_lookup_name().

Reported-by: David Laight 
Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/kernel/kprobes.c | 47 ++-
 1 file changed, 20 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index 160ae0fa7d0d..5a17e6467be9 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -53,7 +53,7 @@ bool arch_within_kprobe_blacklist(unsigned long addr)
 
 kprobe_opcode_t *kprobe_lookup_name(const char *name, unsigned int offset)

 {
-   kprobe_opcode_t *addr;
+   kprobe_opcode_t *addr = NULL;
 
 #ifdef PPC64_ELF_ABI_v2

/* PPC64 ABIv2 needs local entry point */
@@ -85,36 +85,29 @@ kprobe_opcode_t *kprobe_lookup_name(const char *name, unsigned int offset)
 * Also handle  format.
 */
char dot_name[MODULE_NAME_LEN + 1 + KSYM_NAME_LEN];
-   const char *modsym;
bool dot_appended = false;
-   if ((modsym = strchr(name, ':')) != NULL) {
-   modsym++;
-   if (*modsym != '\0' && *modsym != '.') {
-   /* Convert to  */
-   strncpy(dot_name, name, modsym - name);
-   dot_name[modsym - name] = '.';
-   dot_name[modsym - name + 1] = '\0';
-   strncat(dot_name, modsym,
-   sizeof(dot_name) - (modsym - name) - 2);
-   dot_appended = true;
-   } else {
-   dot_name[0] = '\0';
-   strncat(dot_name, name, sizeof(dot_name) - 1);
-   }
-   } else if (name[0] != '.') {
-   dot_name[0] = '.';
-   dot_name[1] = '\0';
-   strncat(dot_name, name, KSYM_NAME_LEN - 2);
+   const char *c;
+   ssize_t ret = 0;
+   int len = 0;
+
+   if ((c = strnchr(name, MODULE_NAME_LEN, ':')) != NULL) {
+   c++;
+   len = c - name;
+   memcpy(dot_name, name, len);
+   } else
+   c = name;
+
+   if (*c != '\0' && *c != '.') {
+   dot_name[len++] = '.';
dot_appended = true;
-   } else {
-   dot_name[0] = '\0';
-   strncat(dot_name, name, KSYM_NAME_LEN - 1);
}
-   addr = (kprobe_opcode_t *)kallsyms_lookup_name(dot_name);
-   if (!addr && dot_appended) {
-   /* Let's try the original non-dot symbol lookup */
+   ret = strscpy(dot_name + len, c, KSYM_NAME_LEN);
+   if (ret >= 0)


Here, maybe you can skip the case of ret == 0. (Or, would we have
a symbol which only has "."?)


Ah, indeed. Good point. We just need the test to be (ret > 0).

Michael,
If the rest of the patch is fine by you, would you be ok to make the 
small change above? If not, please let me know and I'll re-spin. Thanks.




Others look good to me.


Thanks for the review,
- Naveen




Re: [PATCH v3 1/2] powerpc/fadump: reduce memory consumption for capture kernel

2017-04-26 Thread Michal Suchánek
Hello,

On Wed, 26 Apr 2017 13:05:20 +0530
Hari Bathini  wrote:

> With fadump (dump capture) kernel booting like a regular kernel, it
> almost needs the same amount of memory to boot as the production
> kernel, which is unwarranted for a dump capture kernel. But with no
> option to disable some of the unnecessary subsystems in fadump
> kernel, that much memory is wasted on fadump, depriving the
> production kernel of that memory.
> 
> Introduce kernel parameter 'fadump_append=' that would take regular
> kernel parameters as a comma separated list, to be enforced when
> fadump is active. This 'fadump_append=' parameter can be leveraged to
> pass parameters like nr_cpus=1, cgroup_disable=memory and numa=off,
> to disable unwarranted resources/subsystems.

According to Linux admin guide at
Documentation/admin-guide/kernel-parameters.{rst,txt}

there are arguments like isolcpus=1,2,10-20,100-2000:2/25
baycom_ser_hdx=,, or console=ttyUSB0[,options] that
include a comma. On the other hand, the same guide suggests that
parameters can be quoted: param="spaces in here".

So I think it would be more sensible to document the existing quoting
mechanism in the fadump_append option documentation rather than
inventing your own that is incompatible with some existing options.

Thanks

Michal


Re: [PATCH] powerpc/mm/hugetlb: Add support for 1G huge pages

2017-04-26 Thread Anshuman Khandual
On 04/17/2017 10:44 PM, Aneesh Kumar K.V wrote:
> POWER9 supports hugepages of size 2M and 1G in radix MMU mode. This patch
> enables the usage of the 1G page size for hugetlbfs. This also updates the
> helpers so that we can do 1G page allocation at runtime.
> 
> Since we can do this only when radix translation mode is enabled, we can't use
> the generic gigantic_page_supported helper. Hence provide a way for 
> architecture
> to override gigantic_page_supported helper.
> 
> We still don't enable 1G page size on the DD1 version. This is to avoid doing
> the workaround mentioned in commit 6d3a0379ebdc8 ("powerpc/mm: Add
> radix__tlb_flush_pte_p9_dd1()")
> 
> Signed-off-by: Aneesh Kumar K.V 
> ---
>  arch/powerpc/include/asm/book3s/64/hugetlb.h | 13 +
>  arch/powerpc/mm/hugetlbpage.c|  7 +--
>  arch/powerpc/platforms/Kconfig.cputype   |  1 +
>  mm/hugetlb.c |  4 
>  4 files changed, 23 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/hugetlb.h 
> b/arch/powerpc/include/asm/book3s/64/hugetlb.h
> index cd366596..86f27cc8ec61 100644
> --- a/arch/powerpc/include/asm/book3s/64/hugetlb.h
> +++ b/arch/powerpc/include/asm/book3s/64/hugetlb.h
> @@ -50,4 +50,17 @@ static inline pte_t arch_make_huge_pte(pte_t entry, struct 
> vm_area_struct *vma,
>   else
>   return entry;
>  }
> +
> +#if defined(CONFIG_ARCH_HAS_GIGANTIC_PAGE) && \
> + ((defined(CONFIG_MEMORY_ISOLATION) && defined(CONFIG_COMPACTION)) || \
> +  defined(CONFIG_CMA))
> +#define gigantic_page_supported gigantic_page_supported

As I have mentioned in a later part of this reply, it does not really
make sense to have both an arch callback as well as generic config
option checking to decide whether a feature is enabled or not.

> +static inline bool gigantic_page_supported(void)
> +{
> + if (radix_enabled())
> + return true;
> + return false;
> +}
> +#endif
> +
>  #endif
> diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
> index a4f33de4008e..80f6d2ed551a 100644
> --- a/arch/powerpc/mm/hugetlbpage.c
> +++ b/arch/powerpc/mm/hugetlbpage.c
> @@ -763,8 +763,11 @@ static int __init add_huge_page_size(unsigned long long 
> size)
>* Hash: 16M and 16G
>*/
>   if (radix_enabled()) {
> - if (mmu_psize != MMU_PAGE_2M)
> - return -EINVAL;
> + if (mmu_psize != MMU_PAGE_2M) {
> + if (cpu_has_feature(CPU_FTR_POWER9_DD1) ||
> + (mmu_psize != MMU_PAGE_1G))
> + return -EINVAL;
> + }
>   } else {
>   if (mmu_psize != MMU_PAGE_16M && mmu_psize != MMU_PAGE_16G)
>   return -EINVAL;
> diff --git a/arch/powerpc/platforms/Kconfig.cputype 
> b/arch/powerpc/platforms/Kconfig.cputype
> index ef4c4b8fc547..f4ba4bf0d762 100644
> --- a/arch/powerpc/platforms/Kconfig.cputype
> +++ b/arch/powerpc/platforms/Kconfig.cputype
> @@ -343,6 +343,7 @@ config PPC_STD_MMU_64
>  config PPC_RADIX_MMU
>   bool "Radix MMU Support"
>   depends on PPC_BOOK3S_64
> + select ARCH_HAS_GIGANTIC_PAGE
>   default y
>   help

As we are already checking radix_enabled() inside
gigantic_page_supported(), do we still need to conditionally enable
this on Radix-based platforms only?


> Enable support for the Power ISA 3.0 Radix style MMU. Currently this
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 3d0aab9ee80d..2c090189f314 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1158,7 +1158,11 @@ static int alloc_fresh_gigantic_page(struct hstate *h,
>   return 0;
>  }
>  
> +#ifndef gigantic_page_supported
>  static inline bool gigantic_page_supported(void) { return true; }
> +#define gigantic_page_supported gigantic_page_supported
> +#endif

As seen above, now that the arch's decision to support this feature is not
based solely on the ARCH_HAS_GIGANTIC_PAGE config option but also on the
availability of platform features like radix, is it a good time to have
an arch callback decide the gigantic_page_supported() test instead of
just checking the presence of config options like

#if defined(CONFIG_ARCH_HAS_GIGANTIC_PAGE) && \
((defined(CONFIG_MEMORY_ISOLATION) && defined(CONFIG_COMPACTION)) || \
defined(CONFIG_CMA))

We should not have both as proposed. I mean, CONFIG_ARCH_HAS_GIGANTIC_PAGE
should not be enabled unless we have MEMORY_ISOLATION && COMPACTION && CMA,
and once it is enabled we should have arch_gigantic_page_supported() deciding
gigantic_page_supported().



[PATCH v2 2/2] powerpc/mm/radix: Optimise tlbiel flush all case

2017-04-26 Thread Michael Ellerman
_tlbiel_pid() is called with a ric (Radix Invalidation Control) argument of
either RIC_FLUSH_TLB or RIC_FLUSH_ALL.

RIC_FLUSH_ALL says to invalidate the entire TLB and the Page Walk Cache (PWC).

To flush the whole TLB, we have to iterate over each set (congruence class) of
the TLB. Currently we do that and pass RIC_FLUSH_ALL each time. That is not
incorrect but it means we flush the PWC 128 times, when once would suffice.

Fix it by doing the first flush with the ric value we're passed, and then if it
was RIC_FLUSH_ALL, we downgrade it to RIC_FLUSH_TLB, because we know we have
just flushed the PWC and don't need to do it again.

Signed-off-by: Aneesh Kumar K.V 
[mpe: Split out of combined patch, tweak logic, rewrite change log]
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/mm/tlb-radix.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/tlb-radix.c b/arch/powerpc/mm/tlb-radix.c
index ae2e799822bd..5e17c4e873a5 100644
--- a/arch/powerpc/mm/tlb-radix.c
+++ b/arch/powerpc/mm/tlb-radix.c
@@ -46,9 +46,20 @@ static inline void _tlbiel_pid(unsigned long pid, unsigned 
long ric)
int set;
 
asm volatile("ptesync": : :"memory");
-   for (set = 0; set < POWER9_TLB_SETS_RADIX ; set++) {
+
+   /*
+* Flush the first set of the TLB, and if we're doing a RIC_FLUSH_ALL,
+* also flush the entire Page Walk Cache.
+*/
+   __tlbiel_pid(pid, 0, ric);
+
+   if (ric == RIC_FLUSH_ALL)
+   /* For the remaining sets, just flush the TLB */
+   ric = RIC_FLUSH_TLB;
+
+   for (set = 1; set < POWER9_TLB_SETS_RADIX ; set++)
__tlbiel_pid(pid, set, ric);
-   }
+
asm volatile("ptesync": : :"memory");
asm volatile(PPC_INVALIDATE_ERAT "; isync" : : :"memory");
 }
-- 
2.7.4



[PATCH v2 1/2] powerpc/mm/radix: Optimise Page Walk Cache flush

2017-04-26 Thread Michael Ellerman
Currently we implement flushing of the page walk cache (PWC) by calling
_tlbiel_pid() with a RIC (Radix Invalidation Control) value of 1 which says to
only flush the PWC.

But _tlbiel_pid() loops over each set (congruence class) of the TLB, which is
not necessary when we're just flushing the PWC.

In fact the set argument is ignored for a PWC flush, so essentially we're just
flushing the PWC 127 extra times for no benefit.

Fix it by adding tlbiel_pwc() which just does a single flush of the PWC.

Signed-off-by: Aneesh Kumar K.V 
[mpe: Split out of combined patch, drop _ in name, rewrite change log]
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/mm/tlb-radix.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/tlb-radix.c b/arch/powerpc/mm/tlb-radix.c
index b68b5219cf45..ae2e799822bd 100644
--- a/arch/powerpc/mm/tlb-radix.c
+++ b/arch/powerpc/mm/tlb-radix.c
@@ -53,6 +53,17 @@ static inline void _tlbiel_pid(unsigned long pid, unsigned 
long ric)
asm volatile(PPC_INVALIDATE_ERAT "; isync" : : :"memory");
 }
 
+static inline void tlbiel_pwc(unsigned long pid)
+{
+   asm volatile("ptesync": : :"memory");
+
+   /* For PWC flush, we don't look at set number */
+   __tlbiel_pid(pid, 0, RIC_FLUSH_PWC);
+
+   asm volatile("ptesync": : :"memory");
+   asm volatile(PPC_INVALIDATE_ERAT "; isync" : : :"memory");
+}
+
 static inline void _tlbie_pid(unsigned long pid, unsigned long ric)
 {
unsigned long rb,rs,prs,r;
@@ -140,7 +151,7 @@ void radix__local_flush_tlb_pwc(struct mmu_gather *tlb, 
unsigned long addr)
 
pid = mm->context.id;
if (pid != MMU_NO_CONTEXT)
-   _tlbiel_pid(pid, RIC_FLUSH_PWC);
+   tlbiel_pwc(pid);
 
preempt_enable();
 }
@@ -222,7 +233,7 @@ void radix__flush_tlb_pwc(struct mmu_gather *tlb, unsigned 
long addr)
if (lock_tlbie)
raw_spin_unlock(&native_tlbie_lock);
} else
-   _tlbiel_pid(pid, RIC_FLUSH_PWC);
+   tlbiel_pwc(pid);
 no_context:
preempt_enable();
 }
-- 
2.7.4



Re: [PATCH v3 1/2] powerpc/fadump: reduce memory consumption for capture kernel

2017-04-26 Thread Michael Ellerman
Hari Bathini  writes:

> On Wednesday 26 April 2017 04:02 PM, Michael Ellerman wrote:
>> Hari Bathini  writes:
>>
>>> diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
>>> index 8ff0dd4..87edc7b 100644
>>> --- a/arch/powerpc/kernel/fadump.c
>>> +++ b/arch/powerpc/kernel/fadump.c
>>> @@ -338,6 +343,36 @@ unsigned long __init arch_reserved_kernel_pages(void)
>>> return memblock_reserved_size() / PAGE_SIZE;
>>>   }
>>>   
>>> +static void __init parse_fadump_append_params(const char *p)
>>> +{
>>> +   static char fadump_cmdline[COMMAND_LINE_SIZE] __initdata;
>>> +   char *token;
>>> +
>>> +   strlcpy(fadump_cmdline, p, COMMAND_LINE_SIZE);
>>> +   token = strchr(fadump_cmdline, ',');
>>> +   while (token) {
>>> +   *token = ' ';
>>> +   token = strchr(token, ',');
>>> +   }
>>> +
>>> +   pr_info("enforcing additional parameters (%s) passed through "
>>> +   "'fadump_append=' parameter\n", fadump_cmdline);
>>> +   parse_early_options(fadump_cmdline);
>> Using parse_early_options() means it only works for parameters defined
>> with early_param() or __setup() doesn't it?
>
> Yeah. Actually, is it better to change commandline from "$params 
> fadump_append=nr_cpus=1,numa=off"
> to "$params nr_cpus=1 numa=off" early in the boot process?

It's not better, it's more confusing for a user, because the command
line looks different to what they specified.

But it might be the only option because I don't know if there's an easy
way to trigger parsing for the non-early options.

cheers


Re: [PATCH v3 1/2] powerpc/fadump: reduce memory consumption for capture kernel

2017-04-26 Thread Hari Bathini



On Wednesday 26 April 2017 04:02 PM, Michael Ellerman wrote:

Hari Bathini  writes:


diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 8ff0dd4..87edc7b 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -338,6 +343,36 @@ unsigned long __init arch_reserved_kernel_pages(void)
return memblock_reserved_size() / PAGE_SIZE;
  }
  
+static void __init parse_fadump_append_params(const char *p)

+{
+   static char fadump_cmdline[COMMAND_LINE_SIZE] __initdata;
+   char *token;
+
+   strlcpy(fadump_cmdline, p, COMMAND_LINE_SIZE);
+   token = strchr(fadump_cmdline, ',');
+   while (token) {
+   *token = ' ';
+   token = strchr(token, ',');
+   }
+
+   pr_info("enforcing additional parameters (%s) passed through "
+   "'fadump_append=' parameter\n", fadump_cmdline);
+   parse_early_options(fadump_cmdline);

Using parse_early_options() means it only works for parameters defined
with early_param() or __setup() doesn't it?


Yeah. Actually, is it better to change commandline from "$params 
fadump_append=nr_cpus=1,numa=off"
to "$params nr_cpus=1 numa=off" early in the boot process? That way, 
parameters like udev.children-max=2 &
systemd.unit=kdump.service may also be specified in fadump_append= 
parameter?


Thanks
Hari



Re: [PATCH v3] KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller

2017-04-26 Thread Paul Mackerras
On Wed, Apr 26, 2017 at 12:07:30PM +1000, Michael Ellerman wrote:
> From: Benjamin Herrenschmidt 
> 
> This patch makes KVM capable of using the XIVE interrupt controller
> to provide the standard PAPR "XICS" style hypercalls. It is necessary
> for proper operations when the host uses XIVE natively.
> 
> This has been lightly tested on an actual system, including PCI
> pass-through with a TG3 device.
> 
> Signed-off-by: Benjamin Herrenschmidt 
> [mpe: Cleanup pr_xxx(), unsplit pr_xxx() strings, etc., fix build
>  failures by adding KVM_XIVE which depends on KVM_XICS and XIVE, and
>  adding empty stubs for the kvm_xive_xxx() routines, fixup subject]
> Signed-off-by: Michael Ellerman 

The Kconfig stuff for KVM_XIVE is wrong, you need:

diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index 6498424..c56939e 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -200,7 +200,7 @@ config KVM_XICS
 config KVM_XIVE
bool
default y
-   depends on KVM_XICS && XIVE_NATIVE
+   depends on KVM_XICS && PPC_XIVE_NATIVE
 
 source drivers/vhost/Kconfig
 
on top of what you posted.

Paul.


[PATCH] powerpc/powernv: Fix oops on P9 DD1 in cause_ipi()

2017-04-26 Thread Michael Ellerman
Recently we merged the native xive support for Power9, and then separately some
reworks for doorbell IPI support. In isolation both series were OK, but the
merged result had a bug in one case.

On P9 DD1 we use pnv_p9_dd1_cause_ipi() which tries to use doorbells, and then
falls back to the interrupt controller. However the fallback is implemented by
calling icp_ops->cause_ipi. But now that xive support is merged we might be
using xive, in which case icp_ops is not initialised, it's a xics specific
structure. This leads to an oops such as:

  Unable to handle kernel paging request for data at address 0x0028
  Oops: Kernel access of bad area, sig: 11 [#1]
  NIP pnv_p9_dd1_cause_ipi+0x74/0xe0
  LR smp_muxed_ipi_message_pass+0x54/0x70

To fix it, rather than using icp_ops which might be NULL, have both xics and
xive set smp_ops->cause_ipi, and then in the powernv code we save that as
ic_cause_ipi before overriding smp_ops->cause_ipi. For paranoia, add a WARN_ON()
to check if somehow smp_ops->cause_ipi is NULL.

Fixes: b866cc2199d6 ("powerpc: Change the doorbell IPI calling convention")
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/platforms/powernv/smp.c   | 12 
 arch/powerpc/sysdev/xics/xics-common.c |  3 +++
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/smp.c 
b/arch/powerpc/platforms/powernv/smp.c
index 951a2e230cfa..5189445db164 100644
--- a/arch/powerpc/platforms/powernv/smp.c
+++ b/arch/powerpc/platforms/powernv/smp.c
@@ -249,12 +249,15 @@ static int pnv_smp_prepare_cpu(int cpu)
return 0;
 }
 
+/* Cause IPI as setup by the interrupt controller (xics or xive) */
+static void (*ic_cause_ipi)(int cpu);
+
 static void pnv_cause_ipi(int cpu)
 {
if (doorbell_try_core_ipi(cpu))
return;
 
-   icp_ops->cause_ipi(cpu);
+   ic_cause_ipi(cpu);
 }
 
 static void pnv_p9_dd1_cause_ipi(int cpu)
@@ -269,7 +272,7 @@ static void pnv_p9_dd1_cause_ipi(int cpu)
if (cpumask_test_cpu(cpu, cpu_sibling_mask(this_cpu)))
doorbell_global_ipi(cpu);
else
-   icp_ops->cause_ipi(cpu);
+   ic_cause_ipi(cpu);
 
put_cpu();
 }
@@ -282,6 +285,9 @@ static void __init pnv_smp_probe(void)
xics_smp_probe();
 
if (cpu_has_feature(CPU_FTR_DBELL)) {
+   ic_cause_ipi = smp_ops->cause_ipi;
+   WARN_ON(!ic_cause_ipi);
+
if (cpu_has_feature(CPU_FTR_ARCH_300)) {
if (cpu_has_feature(CPU_FTR_POWER9_DD1))
smp_ops->cause_ipi = pnv_p9_dd1_cause_ipi;
@@ -290,8 +296,6 @@ static void __init pnv_smp_probe(void)
} else {
smp_ops->cause_ipi = pnv_cause_ipi;
}
-   } else {
-   smp_ops->cause_ipi = icp_ops->cause_ipi;
}
 }
 
diff --git a/arch/powerpc/sysdev/xics/xics-common.c 
b/arch/powerpc/sysdev/xics/xics-common.c
index 2eb53c12ee6e..ffe138b8b9dc 100644
--- a/arch/powerpc/sysdev/xics/xics-common.c
+++ b/arch/powerpc/sysdev/xics/xics-common.c
@@ -145,6 +145,9 @@ void __init xics_smp_probe(void)
 {
/* Register all the IPIs */
xics_request_ipi();
+
+   /* Setup cause_ipi callback based on which ICP is used */
+   smp_ops->cause_ipi = icp_ops->cause_ipi;
 }
 
 #endif /* CONFIG_SMP */
-- 
2.7.4



Re: [PATCH 2/8] selftests: lib.mk: define CLEAN macro to allow Makefiles to override clean

2017-04-26 Thread Michael Ellerman
Shuah Khan  writes:
> On 04/21/2017 11:38 PM, Michael Ellerman wrote:
>> Shuah Khan  writes:

>> This patch is a good solution to fix the warnings.
>> 
>> Acked-by: Michael Ellerman 
>
> Thanks. I plan to apply the patch with the amended changelog and your
> Ack. Please let me know if you want to see v2 with the change sent out.

I don't really mind if you send it or not.

cheers


Re: [PATCH] kallsyms: optimize kallsyms_lookup_name() for a few cases

2017-04-26 Thread Michael Ellerman
"Naveen N. Rao"  writes:
> diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
> index 6a3b249a2ae1..d134b060564f 100644
> --- a/kernel/kallsyms.c
> +++ b/kernel/kallsyms.c
> @@ -205,6 +205,12 @@ unsigned long kallsyms_lookup_name(const char *name)
>   unsigned long i;
>   unsigned int off;
>  
> + if (!name || *name == '\0')
> + return false;
> +
> + if (strnchr(name, MODULE_NAME_LEN, ':'))
> + return module_kallsyms_lookup_name(name);
> +
>   for (i = 0, off = 0; i < kallsyms_num_syms; i++) {
>   off = kallsyms_expand_symbol(off, namebuf, ARRAY_SIZE(namebuf));
... 
}
return module_kallsyms_lookup_name(name);

Is the rest of the context.

Which looks a bit odd, we already did module lookup previously?

But it's correct, because you can look up a symbol in a module without a
module prefix; it just looks in every module.

You could invert the logic, ie. check that there isn't a ":" in the name
and only in that case do the for loop, always falling back to module
lookup.

Or just add a comment explaining why we call module lookup in two places.

cheers


Re: [PATCH v3 1/2] powerpc/fadump: reduce memory consumption for capture kernel

2017-04-26 Thread Michael Ellerman
Hari Bathini  writes:

> diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
> index 8ff0dd4..87edc7b 100644
> --- a/arch/powerpc/kernel/fadump.c
> +++ b/arch/powerpc/kernel/fadump.c
> @@ -338,6 +343,36 @@ unsigned long __init arch_reserved_kernel_pages(void)
>   return memblock_reserved_size() / PAGE_SIZE;
>  }
>  
> +static void __init parse_fadump_append_params(const char *p)
> +{
> + static char fadump_cmdline[COMMAND_LINE_SIZE] __initdata;
> + char *token;
> +
> + strlcpy(fadump_cmdline, p, COMMAND_LINE_SIZE);
> + token = strchr(fadump_cmdline, ',');
> + while (token) {
> + *token = ' ';
> + token = strchr(token, ',');
> + }
> +
> + pr_info("enforcing additional parameters (%s) passed through "
> + "'fadump_append=' parameter\n", fadump_cmdline);
> + parse_early_options(fadump_cmdline);

Using parse_early_options() means it only works for parameters defined
with early_param() or __setup() doesn't it?

That might be OK but at least you need to clearly document it.

cheers


Re: [PATCH] cxl: Prevent IRQ storm

2017-04-26 Thread Frederic Barrat



On 26/04/2017 at 08:40, Alastair D'Silva wrote:

From: Alastair D'Silva 

In some situations, a faulty AFU slice may create an interrupt storm,
rendering the machine unusable. Since these interrupts are informational
only, present the interrupt once, then mask it off to prevent it from
being retriggered until the card is reset.

Signed-off-by: Alastair D'Silva 
---


Patch looks good, thanks!
It doesn't apply cleanly on the 'next' tree due to the capi2 patchset 
though, so you should probably rebase on that tree. The bits have 
changed a bit on PSL9, but the approach still works (error type reported 
in the first byte, and the corresponding masking bits are still 
right-shifted by 32).


  Fred


 drivers/misc/cxl/native.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/misc/cxl/native.c b/drivers/misc/cxl/native.c
index 7ae7105..4e8010f 100644
--- a/drivers/misc/cxl/native.c
+++ b/drivers/misc/cxl/native.c
@@ -996,7 +996,7 @@ static void native_irq_wait(struct cxl_context *ctx)
 static irqreturn_t native_slice_irq_err(int irq, void *data)
 {
struct cxl_afu *afu = data;
-   u64 fir_slice, errstat, serr, afu_debug, afu_error, dsisr;
+   u64 fir_slice, errstat, serr, afu_debug, afu_error, dsisr, irq_mask;

/*
 * slice err interrupt is only used with full PSL (no XSL)
@@ -1014,6 +1014,10 @@ static irqreturn_t native_slice_irq_err(int irq, void 
*data)
dev_crit(&afu->dev, "AFU_ERR_An: 0x%.16llx\n", afu_error);
dev_crit(&afu->dev, "PSL_DSISR_An: 0x%.16llx\n", dsisr);

+   /* mask off the IRQ so it won't retrigger until the card is reset */
+   irq_mask = (serr & 0xff80ULL) >> 32;
+   serr |= irq_mask;
+
cxl_p1n_write(afu, CXL_PSL_SERR_An, serr);

return IRQ_HANDLED;





Re: [PATCH] powerpc/64s: use ibm,tlbiel-congruence-classes-(hash|radix) dt property

2017-04-26 Thread Nicholas Piggin
On Mon, 24 Apr 2017 13:53:12 +0530
"Aneesh Kumar K.V"  wrote:

> Nicholas Piggin  writes:
> 
> > tlbiel instruction with IS!=0 on POWER7 and later Book3s CPUs invalidates
> > TLBs belonging to a specified congruence class. In order to operate on
> > the entire TLB, all congruence classes must be specified, requiring a
> > software loop.
> >
> > This dt property specifies the number of classes that must be operated
> > on. Use this to set tlbiel loop counts. If the property does not exist,
> > fall back to hard coded values based on the cpu table.
> >
> > Signed-off-by: Nicholas Piggin 
> > return 0;  
> .
> .
> 
> > diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
> > index d2f0afeae5a0..08ec2f431eff 100644
> > --- a/arch/powerpc/kernel/prom.c
> > +++ b/arch/powerpc/kernel/prom.c
> > @@ -236,8 +236,27 @@ static void __init init_mmu_slb_size(unsigned long 
> > node)
> > if (slb_size_ptr)
> > mmu_slb_size = be32_to_cpup(slb_size_ptr);
> >  }
> > +static void __init init_mmu_tlb_sets_hash(unsigned long node)
> > +{
> > +   const __be32 *ptr;
> > +
> > +   ptr = of_get_flat_dt_prop(node, "ibm,tlbiel-congruence-classes-hash", 
> > NULL);
> > +   if (ptr)
> > +   cur_cpu_spec->tlb_sets_hash = be32_to_cpup(ptr);
> > +}
> > +
> > +static void __init init_mmu_tlb_sets_radix(unsigned long node)
> > +{
> > +   const __be32 *ptr;
> > +
> > +   ptr = of_get_flat_dt_prop(node, "ibm,tlbiel-congruence-classes-radix", 
> > NULL);
> > +   if (ptr)
> > +   cur_cpu_spec->tlb_sets_radix = be32_to_cpup(ptr);
> > +}
> >  #else
> >  #define init_mmu_slb_size(node) do { } while(0)
> > +#define init_mmu_tlb_sets_hash(node) do { } while(0)
> > +#define init_mmu_tlb_sets_radix(node) do { } while(0)
> >  #endif
> >
> >  static struct feature_property {
> > @@ -385,6 +404,8 @@ static int __init early_init_dt_scan_cpus(unsigned long 
> > node,
> > check_cpu_feature_properties(node);
> > check_cpu_pa_features(node);
> > init_mmu_slb_size(node);
> > +   init_mmu_tlb_sets_hash(node);
> > +   init_mmu_tlb_sets_radix(node);
> >  
> 
> 
> I thought cpu features patch series had a generic mechanism to parse all
> these based on dt_cpu_feature_match_table[] ?

It could in theory. We could have a feature node for this set-wise
invalidation behaviour, and make a custom property of that feature
to give the number of sets to loop over.

At the moment Ben preferred each feature to be just a "binary" (present
or absent), with the accompanying metadata just being for the
compatibility bits. But this is a policy decision, and we may reconsider
it as we start adding extra cpu-features for future hardware.

For these numbers I think it's okay to add this way for now.


> > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> > index fadb75abfe37..2211cda5de90 100644
> > --- a/arch/powerpc/kvm/book3s_hv.c
> > +++ b/arch/powerpc/kvm/book3s_hv.c
> > @@ -3430,14 +3430,7 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
> >  * Work out how many sets the TLB has, for the use of
> >  * the TLB invalidation loop in book3s_hv_rmhandlers.S.
> >  */
> > -   if (kvm_is_radix(kvm))
> > -   kvm->arch.tlb_sets = POWER9_TLB_SETS_RADIX; /* 128 */
> > -   else if (cpu_has_feature(CPU_FTR_ARCH_300))
> > -   kvm->arch.tlb_sets = POWER9_TLB_SETS_HASH;  /* 256 */
> > -   else if (cpu_has_feature(CPU_FTR_ARCH_207S))
> > -   kvm->arch.tlb_sets = POWER8_TLB_SETS;   /* 512 */
> > -   else
> > -   kvm->arch.tlb_sets = POWER7_TLB_SETS;   /* 128 */
> > +   kvm->arch.tlb_sets = cur_cpu_spec->tlb_sets;  
> 
> 
> This should be based on guest mode right ? ie, we need to set
> kvm->arch.tlb_sets based on the mode kvm guest is running ?

You mean radix vs hash? Yes... Well, that's a reasonable assumption for
the behaviour of a CPU that supports guest mode != host mode, right? I'll
make the change.

> > /*
> >  * Track that we now have a HV mode VM active. This blocks secondary
> > diff --git a/arch/powerpc/kvm/book3s_hv_ras.c 
> > b/arch/powerpc/kvm/book3s_hv_ras.c
> > index 7ef0993214f3..f62798ce304b 100644
> > --- a/arch/powerpc/kvm/book3s_hv_ras.c
> > +++ b/arch/powerpc/kvm/book3s_hv_ras.c
> > @@ -87,8 +87,7 @@ static long kvmppc_realmode_mc_power7(struct kvm_vcpu 
> > *vcpu)
> >DSISR_MC_SLB_PARITY | DSISR_MC_DERAT_MULTI);
> > }
> > if (dsisr & DSISR_MC_TLB_MULTI) {
> > -   if (cur_cpu_spec && cur_cpu_spec->flush_tlb)
> > -   cur_cpu_spec->flush_tlb(TLB_INVAL_SCOPE_LPID);
> > +   machine_check_flush_tlb(TLB_INVAL_SCOPE_LPID);
> > dsisr &= ~DSISR_MC_TLB_MULTI;
> > }
> > /* Any other errors we don't understand? */
> > @@ -105,8 +104,7 @@ static long kvmppc_realmode_mc_power7(struct kvm_vcpu 
> > *vcpu)
> > reload_slb(vcpu);
> > break;

Re: stable-rc build: 0 warnings 2 failures (stable-rc/v4.4.63-29-ge636782)

2017-04-26 Thread gregkh
On Wed, Apr 26, 2017 at 03:00:19PM +1000, Michael Ellerman wrote:
> Arnd Bergmann  writes:
> 
> > On Tue, Apr 25, 2017 at 7:08 PM, Olof's autobuilder  wrote:
> >>
> >> Failed defconfigs:
> >> powerpc.pasemi_defconfig
> >> powerpc.ppc64_defconfig
> >>
> >> ---
> >>
> >> Errors:
> >>
> >> /work/build/batch/arch/powerpc/kernel/misc_64.S:72: Error: .localentry 
> >> expression for `flush_icache_range' does not evaluate to a constant
> >
> > This is a regression in 4.4-stable-rc, caused by the backport of commit
> > 8f5f525d5b83f7d7 ("powerpc/64: Fix flush_(d|i)cache_range() called from
> > modules").
> >
> > The patch was also backported into v4.9-stable-rc, but did not cause
> > problems there. The fixes tag suggests that the patch is needed on
> > every version from 3.16 up, but apparently the fix needs to be a little
> > different compared to newer kernels.
> 
> Yeah, the auto-backport failed, and then I sent a version for 4.4 which
> fixed the bug but broke other configs (big endian).
> 
> I've asked Greg to drop it for now.

It's now dropped, thanks.

greg k-h


[PATCH v3 2/2] powerpc/fadump: update documentation about 'fadump_append=' parameter

2017-04-26 Thread Hari Bathini
With the introduction of the 'fadump_append=' parameter to pass additional
parameters to the fadump (capture) kernel, update the documentation accordingly.

Signed-off-by: Hari Bathini 
---
 Documentation/powerpc/firmware-assisted-dump.txt |6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/Documentation/powerpc/firmware-assisted-dump.txt 
b/Documentation/powerpc/firmware-assisted-dump.txt
index 3007bc9..8e12380 100644
--- a/Documentation/powerpc/firmware-assisted-dump.txt
+++ b/Documentation/powerpc/firmware-assisted-dump.txt
@@ -158,7 +158,11 @@ How to enable firmware-assisted dump (fadump):
 
 1. Set config option CONFIG_FA_DUMP=y and build kernel.
 2. Boot into linux kernel with 'fadump=on' kernel cmdline option.
-3. Optionally, user can also set 'fadump_reserve_mem=' kernel cmdline
+3. A user can pass additional kernel parameters as a comma separated list
+   through 'fadump_append=' parameter, to be enforced when fadump is active.
+   This can be used to pass parameters like nr_cpus=1, numa=off to reduce
+   memory consumption during dump capture.
+4. Optionally, user can also set 'fadump_reserve_mem=' kernel cmdline
to specify size of the memory to reserve for boot memory dump
preservation.
 



[PATCH v3 1/2] powerpc/fadump: reduce memory consumption for capture kernel

2017-04-26 Thread Hari Bathini
With fadump (dump capture) kernel booting like a regular kernel, it almost
needs the same amount of memory to boot as the production kernel, which is
unwarranted for a dump capture kernel. But with no option to disable some
of the unnecessary subsystems in fadump kernel, that much memory is wasted
on fadump, depriving the production kernel of that memory.

Introduce kernel parameter 'fadump_append=' that would take regular kernel
parameters as a comma separated list, to be enforced when fadump is active.
This 'fadump_append=' parameter can be leveraged to pass parameters like
nr_cpus=1, cgroup_disable=memory and numa=off, to disable unwarranted
resources/subsystems.

Also, ensure the log "Firmware-assisted dump is active" is printed early
in the boot process to put the subsequent fadump messages in context.

Suggested-by: Michael Ellerman 
Signed-off-by: Hari Bathini 
---

Changes from v2:
* Introducing 'fadump_append=' parameter to pass additional kernel  

  parameters instead of using a handover area.


 arch/powerpc/kernel/fadump.c |   41 ++---
 1 file changed, 38 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 8ff0dd4..87edc7b 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -79,8 +79,10 @@ int __init early_init_dt_scan_fw_dump(unsigned long node,
 * dump data waiting for us.
 */
fdm_active = of_get_flat_dt_prop(node, "ibm,kernel-dump", NULL);
-   if (fdm_active)
+   if (fdm_active) {
+   pr_info("Firmware-assisted dump is active.\n");
fw_dump.dump_active = 1;
+   }
 
/* Get the sizes required to store dump data for the firmware provided
 * dump sections.
@@ -257,8 +259,12 @@ int __init fadump_reserve_mem(void)
 {
unsigned long base, size, memory_boundary;
 
-   if (!fw_dump.fadump_enabled)
+   if (!fw_dump.fadump_enabled) {
+   if (fw_dump.dump_active)
+   pr_warn("Firmware-assisted dump was active but kernel"
+   " booted with fadump disabled!\n");
return 0;
+   }
 
if (!fw_dump.fadump_supported) {
printk(KERN_INFO "Firmware-assisted dump is not supported on"
@@ -298,7 +304,6 @@ int __init fadump_reserve_mem(void)
memory_boundary = memblock_end_of_DRAM();
 
if (fw_dump.dump_active) {
-   printk(KERN_INFO "Firmware-assisted dump is active.\n");
/*
 * If last boot has crashed then reserve all the memory
 * above boot_memory_size so that we don't touch it until
@@ -338,6 +343,36 @@ unsigned long __init arch_reserved_kernel_pages(void)
return memblock_reserved_size() / PAGE_SIZE;
 }
 
+static void __init parse_fadump_append_params(const char *p)
+{
+   static char fadump_cmdline[COMMAND_LINE_SIZE] __initdata;
+   char *token;
+
+   strlcpy(fadump_cmdline, p, COMMAND_LINE_SIZE);
+   token = strchr(fadump_cmdline, ',');
+   while (token) {
+   *token = ' ';
+   token = strchr(token, ',');
+   }
+
+   pr_info("enforcing additional parameters (%s) passed through "
+   "'fadump_append=' parameter\n", fadump_cmdline);
+   parse_early_options(fadump_cmdline);
+}
+
+/* Look for fadump_append= cmdline option. */
+static int __init early_fadump_append_param(char *p)
+{
+   if (!p)
+   return 1;
+
+   if (fw_dump.dump_active)
+   parse_fadump_append_params(p);
+
+   return 0;
+}
+early_param("fadump_append", early_fadump_append_param);
+
 /* Look for fadump= cmdline option. */
 static int __init early_fadump_param(char *p)
 {



Re: [PATCH v4 2/3] powerpc/fadump: Use the correct VMCOREINFO_NOTE_SIZE for phdr

2017-04-26 Thread Dave Young
Ccing ppc list
On 04/20/17 at 07:39pm, Xunlei Pang wrote:
> vmcoreinfo_max_size stands for the size of vmcoreinfo_data; the
> correct one we should use here is vmcoreinfo_note, whose total
> size is VMCOREINFO_NOTE_SIZE.
> 
> Like explained in commit 77019967f06b ("kdump: fix exported
> size of vmcoreinfo note"), it should not affect the actual
> function, but we better fix it, also this change should be
> safe and backward compatible.
> 
> After this, we can get rid of variable vmcoreinfo_max_size,
> let's use the corresponding macros directly, fewer variables
> means more safety for vmcoreinfo operation.
> 
> Cc: Mahesh Salgaonkar 
> Cc: Hari Bathini 
> Signed-off-by: Xunlei Pang 
> ---
> v3->v4:
> -Rebased on the latest linux-next
> 
>  arch/powerpc/kernel/fadump.c | 3 +--
>  include/linux/crash_core.h   | 1 -
>  kernel/crash_core.c  | 3 +--
>  3 files changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
> index 466569e..7bd6cd0 100644
> --- a/arch/powerpc/kernel/fadump.c
> +++ b/arch/powerpc/kernel/fadump.c
> @@ -893,8 +893,7 @@ static int fadump_create_elfcore_headers(char *bufp)
>  
>   phdr->p_paddr   = fadump_relocate(paddr_vmcoreinfo_note());
>   phdr->p_offset  = phdr->p_paddr;
> - phdr->p_memsz   = vmcoreinfo_max_size;
> - phdr->p_filesz  = vmcoreinfo_max_size;
> + phdr->p_memsz   = phdr->p_filesz = VMCOREINFO_NOTE_SIZE;
>  
>   /* Increment number of program headers. */
>   (elf->e_phnum)++;
> diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
> index ba283a2..7d6bc7b 100644
> --- a/include/linux/crash_core.h
> +++ b/include/linux/crash_core.h
> @@ -55,7 +55,6 @@
>  
>  extern u32 *vmcoreinfo_note;
>  extern size_t vmcoreinfo_size;
> -extern size_t vmcoreinfo_max_size;
>  
>  Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type,
> void *data, size_t data_len);
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index 0321f04..43cdb00 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -16,7 +16,6 @@
>  /* vmcoreinfo stuff */
>  static unsigned char *vmcoreinfo_data;
>  size_t vmcoreinfo_size;
> -size_t vmcoreinfo_max_size = VMCOREINFO_BYTES;
>  u32 *vmcoreinfo_note;
>  
>  /*
> @@ -343,7 +342,7 @@ void vmcoreinfo_append_str(const char *fmt, ...)
>   r = vscnprintf(buf, sizeof(buf), fmt, args);
>   va_end(args);
>  
> - r = min(r, vmcoreinfo_max_size - vmcoreinfo_size);
> + r = min(r, VMCOREINFO_BYTES - vmcoreinfo_size);
>  
>   memcpy(&vmcoreinfo_data[vmcoreinfo_size], buf, r);
>  
> -- 
> 1.8.3.1
> 
> 
> ___
> kexec mailing list
> ke...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec

Reviewed-by: Dave Young 

Thanks
Dave