Re: [PATCH] ppc: Fix BPF JIT for ABIv2
On Tue, 2016-06-21 at 14:28 +0530, Naveen N. Rao wrote:
> On 2016/06/20 03:56PM, Thadeu Lima de Souza Cascardo wrote:
> > On Sun, Jun 19, 2016 at 11:19:14PM +0530, Naveen N. Rao wrote:
> > > On 2016/06/17 10:00AM, Thadeu Lima de Souza Cascardo wrote:
> > > >
> > > > Hi, Michael and Naveen.
> > > >
> > > > I noticed independently that there is a problem with BPF JIT and
> > > > ABIv2, and worked out the patch below before I noticed Naveen's
> > > > patchset and the latest changes in the ppc tree for a better way to
> > > > check for ABI versions.
> > > >
> > > > However, since the issue described below affects mainline and stable
> > > > kernels, would you consider applying it before merging your two
> > > > patchsets, so that we can more easily backport the fix?
> > >
> > > Hi Cascardo,
> > > Given that this has been broken on ABIv2 since forever, I didn't bother
> > > fixing it. But, I can see why this would be a good thing to have for
> > > -stable and existing distros. However, while your patch below may fix
> > > the crash you're seeing on ppc64le, it is not sufficient -- you'll need
> > > changes in bpf_jit_asm.S as well.
> >
> > Hi, Naveen.
> >
> > Any tips on how to exercise possible issues there? Or what changes you
> > think would be sufficient?
>
> The calling convention is different with ABIv2 and so we'll need changes
> in bpf_slow_path_common() and sk_negative_common().
>
> However, rather than enabling classic JIT for ppc64le, are we better off
> just disabling it?
>
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -128,7 +128,7 @@ config PPC
>  	select IRQ_FORCED_THREADING
>  	select HAVE_RCU_TABLE_FREE if SMP
>  	select HAVE_SYSCALL_TRACEPOINTS
> -	select HAVE_CBPF_JIT
> +	select HAVE_CBPF_JIT if CPU_BIG_ENDIAN
>  	select HAVE_ARCH_JUMP_LABEL
>  	select ARCH_HAVE_NMI_SAFE_CMPXCHG
>  	select ARCH_HAS_GCOV_PROFILE_ALL
>
> Michael,
> Let me know your thoughts on whether you intend to take this patch or
> Cascardo's patch for -stable before the eBPF patches. I can redo my
> patches accordingly.

Can one of you send me a proper version of this patch, with change log and
sign-off etc.

cheers

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
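The crash under discussion comes from ELFv1's use of function descriptors: on ABIv1 a function symbol points at a three-doubleword descriptor (entry address, TOC pointer, environment), while on ABIv2 it points directly at code. Below is a minimal userspace sketch of that layout; the struct and helper names are illustrative, not the kernel's (the real logic lives in arch/powerpc/net/bpf_jit_comp.c):

```c
#include <stdint.h>

/* ELF ABIv1 function descriptor: three doublewords, which is where the
 * kernel's FUNCTION_DESCR_SIZE of 24 comes from. On ABIv2 there is no
 * descriptor, so the size is 0 and the JIT image starts with code. */
struct func_descr {
    uint64_t entry; /* address of the first instruction */
    uint64_t toc;   /* TOC (r2) value for the function   */
    uint64_t env;   /* environment pointer, unused by C  */
};

/* Illustrative helper: bytes of header preceding the JITed code. */
static unsigned long function_descr_size(int abiv2)
{
    return abiv2 ? 0 : sizeof(struct func_descr);
}
```

On ABIv1 a call through the image first dereferences the descriptor; a JIT that emits a descriptor on ABIv2 (or omits it on ABIv1) ends up jumping into data, which is the kind of crash reported in this thread.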
Re: [PATCH] ppc: Fix BPF JIT for ABIv2
On Fri, 2016-06-17 at 10:00 -0300, Thadeu Lima de Souza Cascardo wrote:
> From a984dc02b6317a1d3a3c2302385adba5227be5bd Mon Sep 17 00:00:00 2001
> From: Thadeu Lima de Souza Cascardo
> Date: Wed, 15 Jun 2016 13:22:12 -0300
> Subject: [PATCH] ppc: Fix BPF JIT for ABIv2
>
> ABIv2 used for ppc64le does not use function descriptors. Without this
> patch, whenever BPF JIT is enabled, we get a crash as below.
> ...
> diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
> index 889fd19..28b89ed 100644
> --- a/arch/powerpc/net/bpf_jit.h
> +++ b/arch/powerpc/net/bpf_jit.h
> @@ -70,7 +70,7 @@ DECLARE_LOAD_FUNC(sk_load_half);
>  DECLARE_LOAD_FUNC(sk_load_byte);
>  DECLARE_LOAD_FUNC(sk_load_byte_msh);
>
> -#ifdef CONFIG_PPC64
> +#if defined(CONFIG_PPC64) && (!defined(_CALL_ELF) || _CALL_ELF != 2)
>  #define FUNCTION_DESCR_SIZE 24
>  #else
>  #define FUNCTION_DESCR_SIZE 0
> diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
> index 2d66a84..035b887 100644
> --- a/arch/powerpc/net/bpf_jit_comp.c
> +++ b/arch/powerpc/net/bpf_jit_comp.c
> @@ -664,7 +664,7 @@ void bpf_jit_compile(struct bpf_prog *fp)
>
>  	if (image) {
>  		bpf_flush_icache(code_base, code_base + (proglen/4));
> -#ifdef CONFIG_PPC64
> +#if defined(CONFIG_PPC64) && (!defined(_CALL_ELF) || _CALL_ELF != 2)
>  		/* Function descriptor nastiness: Address + TOC */
>  		((u64 *)image)[0] = (u64)code_base;
>  		((u64 *)image)[1] = local_paca->kernel_toc;

Confirmed that even with this patch we still crash:

# echo 1 > /proc/sys/net/core/bpf_jit_enable
# modprobe test_bpf
BPF filter opcode 0020 (@3) unsupported
BPF filter opcode 0020 (@2) unsupported
BPF filter opcode 0020 (@0) unsupported
Unable to handle kernel paging request for data at address 0xd54f65e8
Faulting instruction address: 0xc08765f8
cpu 0x0: Vector: 300 (Data Access) at [c34f3480]
    pc: c08765f8: skb_copy_bits+0x158/0x330
    lr: c008fb7c: bpf_slow_path_byte+0x28/0x54
    sp: c34f3700
   msr: 80010280b033
   dar: d54f65e8
 dsisr: 4000
  current = 0xc001f857d8d0
  paca    = 0xc7b8	 softe: 0	 irq_happened: 0x01
    pid   = 2993, comm = modprobe
Linux version 4.7.0-rc3-00055-g9497a1c1c5b4-dirty (mich...@ka3.ozlabs.ibm.com) () #30 SMP Wed Jun 22 15:06:58 AEST 2016
enter ? for help
[c34f3770] c008fb7c bpf_slow_path_byte+0x28/0x54
[c34f37e0] d7bb004c
[c34f3900] d5331668 test_bpf_init+0x5fc/0x7f8 [test_bpf]
[c34f3a30] c000b628 do_one_initcall+0x68/0x1d0
[c34f3af0] c09beb24 do_init_module+0x90/0x240
[c34f3b80] c01642bc load_module+0x206c/0x22f0
[c34f3d30] c01648b0 SyS_finit_module+0x120/0x180
[c34f3e30] c0009260 system_call+0x38/0x108
--- Exception: c01 (System Call) at 3fff7ffa2db4

cheers
Re: [PATCH 1/2] workqueue: Move wq_update_unbound_numa() to the beginning of CPU_ONLINE
On Tue, Jun 21, 2016 at 09:47:19PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 21, 2016 at 03:43:56PM -0400, Tejun Heo wrote:
> > On Tue, Jun 21, 2016 at 09:37:09PM +0200, Peter Zijlstra wrote:
> > > Hurm.. So I've applied it, just to get this issue sorted, but I'm not
> > > entirely sure I like it.
> > >
> > > I think I prefer ego's version because that makes it harder to get
> > > stuff to run on !active,online cpus. I think we really want to be
> > > careful what gets to run during that state.
> >
> > The original patch just did set_cpus_allowed one more time late enough
> > so that the target kthread (in most cases) doesn't have to go through
> > fallback rq selection afterwards. I don't know what the long term
> > solution is but CPU_ONLINE callbacks should be able to bind kthreads
> > to the new CPU one way or the other.
>
> Fair enough; clearly I need to stare harder. In any case, patch is on
> its way to sched/urgent.

Thanks Tejun, Peter!

--
Regards
gautham.
Re: [PATCH v6 07/11] powerpc/powernv: set power_save func after the idle states are initialized
On Wed, 2016-06-22 at 11:54 +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2016-06-08 at 11:54 -0500, Shreyas B. Prabhu wrote:
> >
> > pnv_init_idle_states discovers supported idle states from the
> > device tree and does the required initialization. Set the power_save
> > function pointer only after this initialization is done.
> >
> > Reviewed-by: Gautham R. Shenoy
> > Signed-off-by: Shreyas B. Prabhu
>
> Acked-by: Benjamin Herrenschmidt
>
> Please merge that one as-is now, no need to wait for the rest, as
> otherwise power9 crashes at boot. It doesn't need to wait for the
> rest of the series.

Acked-by: Michael Neuling

For the same reason. Without this we need powersave=off on the cmdline
on POWER9.

Mikey

> Cheers,
> Ben.
>
> > ---
> > - No changes since v1
> >
> >  arch/powerpc/platforms/powernv/idle.c  | 3 +++
> >  arch/powerpc/platforms/powernv/setup.c | 2 +-
> >  2 files changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
> > index fcc8b68..fbb09fb 100644
> > --- a/arch/powerpc/platforms/powernv/idle.c
> > +++ b/arch/powerpc/platforms/powernv/idle.c
> > @@ -285,6 +285,9 @@ static int __init pnv_init_idle_states(void)
> >  	}
> >
> >  	pnv_alloc_idle_core_states();
> > +
> > +	if (supported_cpuidle_states & OPAL_PM_NAP_ENABLED)
> > +		ppc_md.power_save = power7_idle;
> >  out_free:
> >  	kfree(flags);
> >  out:
> > diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
> > index ee6430b..8492bbb 100644
> > --- a/arch/powerpc/platforms/powernv/setup.c
> > +++ b/arch/powerpc/platforms/powernv/setup.c
> > @@ -315,7 +315,7 @@ define_machine(powernv) {
> >  	.get_proc_freq = pnv_get_proc_freq,
> >  	.progress = pnv_progress,
> >  	.machine_shutdown = pnv_shutdown,
> > -	.power_save = power7_idle,
> > +	.power_save = NULL,
> >  	.calibrate_decr = generic_calibrate_decr,
> >  #ifdef CONFIG_KEXEC
> >  	.kexec_cpu_down = pnv_kexec_cpu_down,
Re: [PATCH v13 01/16] PCI: Let pci_mmap_page_range() take resource address
On Sat, Jun 18, 2016 at 5:17 AM, Bjorn Helgaas wrote:
> On Fri, Jun 17, 2016 at 07:24:46PM -0700, Yinghai Lu wrote:
>> In 8c05cd08a7 ("PCI: fix offset check for sysfs mmapped files"), we try
>> to check the exposed value against the resource start/end in the proc
>> mmap path.
>>
>> |	start = vma->vm_pgoff;
>> |	size = ((pci_resource_len(pdev, resno) - 1) >> PAGE_SHIFT) + 1;
>> |	pci_start = (mmap_api == PCI_MMAP_PROCFS) ?
>> |			pci_resource_start(pdev, resno) >> PAGE_SHIFT : 0;
>> |	if (start >= pci_start && start < pci_start + size &&
>> |			start + nr <= pci_start + size)
>>
>> That breaks sparc, where the exposed value is the BAR value and needs
>> to be offset to the resource address.
>
> I asked this same question of the v12 patch, but I don't think you
> answered it:
>
> I'm not quite sure what you're saying here. Are you saying that sparc
> is currently broken, and this patch fixes it? If so, what exactly is
> broken? Can you give a small example of an mmap that is currently
> broken?

Yes, for sparc that path (proc mmap) is broken, but only according to the
code check. The reason the problem was not discovered is that it seems
all users (other than x86) do not use proc mmap.

In that code segment, vma->vm_pgoff is the user/BAR value >> PAGE_SHIFT,
and pci_start is resource->start >> PAGE_SHIFT. For sparc, the resource
start is different from the BAR start, aka the PCI bus address: the PCI
bus address plus an offset gives the resource start.

Thanks

Yinghai
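The range check Yinghai quotes can be sketched in userspace. The function name and the example addresses below are made up for illustration; the check itself mirrors the logic from 8c05cd08a7, where `start` is what user space passed via mmap (`vma->vm_pgoff`), already in pages:

```c
#include <stdint.h>
#include <stdbool.h>

#define PAGE_SHIFT 12

/* Sketch of the PCI_MMAP_PROCFS range check: all quantities in pages. */
static bool mmap_offset_ok(uint64_t start, uint64_t res_start,
                           uint64_t res_len, uint64_t nr_pages)
{
    uint64_t size = ((res_len - 1) >> PAGE_SHIFT) + 1;
    uint64_t pci_start = res_start >> PAGE_SHIFT;

    return start >= pci_start && start < pci_start + size &&
           start + nr_pages <= pci_start + size;
}
```

On sparc the value exposed to user space is the BAR (bus) address, not the CPU resource address, so `start` sits in a different window than `pci_start` and the check rejects a perfectly valid mmap unless the bus-to-resource offset is applied first — which is what the patch series does.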
Re: [PATCH] ppc: Fix BPF JIT for ABIv2
On Tue, 2016-06-21 at 08:45 -0700, Alexei Starovoitov wrote:
> On 6/21/16 7:47 AM, Thadeu Lima de Souza Cascardo wrote:
> > > > The calling convention is different with ABIv2 and so we'll need
> > > > changes in bpf_slow_path_common() and sk_negative_common().
> > >
> > > How big would those changes be? Do we know?
> > >
> > > How come no one reported this was broken previously? This is the
> > > first I've heard of it being broken.
> >
> > I just heard of it less than two weeks ago, and only could investigate
> > it last week, when I realized mainline was also affected.
> >
> > It looks like the little-endian support for the classic JIT was done
> > before the conversion to ABIv2. And as the JIT is disabled by default,
> > no one seems to have exercised it.
>
> it's not a surprise unfortunately. The JITs that were written before
> test_bpf.ko was developed were missing corner cases. Typical tcpdump
> would be fine, but fragmented packets, negative offsets and
> out-of-bounds wouldn't be handled correctly.
> I'd suggest to validate the stable backport with test_bpf as well.

OK thanks. I have been running selftests/net/test_bpf, but I realise now
it doesn't enable the JIT.

cheers
Re: [PATCH v3] tools/perf: Fix the mask in regs_dump__printf and print_sample_iregs
On Tuesday 21 June 2016 09:05 PM, Yury Norov wrote:
> On Tue, Jun 21, 2016 at 08:26:40PM +0530, Madhavan Srinivasan wrote:
>> When decoding the perf_regs mask in regs_dump__printf(),
>> we loop through the mask using the find_first_bit and find_next_bit
>> functions. "mask" is of type "u64", but it is sent as an
>> "unsigned long *" to the lib functions along with sizeof().
>>
>> While the existing code works fine in most cases, the logic is broken
>> when using a 32-bit perf on a 64-bit kernel (big endian). When reading
>> a u64 via (u32 *)(&mask)[0], perf (lib/find_*_bit()) assumes it gets
>> the lower 32 bits of the u64, which is wrong. The proposed fix is to
>> swap the words of the u64 to handle this case. This is _not_ an
>> endianness swap.
>>
>> Suggested-by: Yury Norov
>> Cc: Yury Norov
>> Cc: Peter Zijlstra
>> Cc: Ingo Molnar
>> Cc: Arnaldo Carvalho de Melo
>> Cc: Alexander Shishkin
>> Cc: Jiri Olsa
>> Cc: Adrian Hunter
>> Cc: Kan Liang
>> Cc: Wang Nan
>> Cc: Michael Ellerman
>> Signed-off-by: Madhavan Srinivasan
>> ---
>> Changelog v2:
>> 1) Moved the swap code to a common function
>> 2) Added more comments in the code
>>
>> Changelog v1:
>> 1) updated commit message and patch subject
>> 2) Add the fix to print_sample_iregs() in builtin-script.c
>>
>>  tools/include/linux/bitmap.h | 9 +
>
> What about include/linux/bitmap.h? I think we'd place it there first.

Wanted to handle that separately.

>>  tools/perf/builtin-script.c  | 16 +++-
>>  tools/perf/util/session.c    | 16 +++-
>>  3 files changed, 39 insertions(+), 2 deletions(-)
>>
>> diff --git a/tools/include/linux/bitmap.h b/tools/include/linux/bitmap.h
>> index 28f5493da491..79998b26eb04 100644
>> --- a/tools/include/linux/bitmap.h
>> +++ b/tools/include/linux/bitmap.h
>> @@ -2,6 +2,7 @@
>>  #define _PERF_BITOPS_H
>>
>>  #include
>> +#include
>>  #include
>>
>>  #define DECLARE_BITMAP(name,bits) \
>> @@ -22,6 +23,14 @@ void __bitmap_or(unsigned long *dst, const unsigned long *bitmap1,
>>  #define small_const_nbits(nbits) \
>>  	(__builtin_constant_p(nbits) && (nbits) <= BITS_PER_LONG)
>>
>> +static inline void bitmap_from_u64(unsigned long *_mask, u64 mask)
>
> Inline is not required. Some people don't like it. An underscored
> parameter in a function declaration is not the best idea as well. Try:
> static void bitmap_from_u64(unsigned long *bitmap, u64 mask)

Not sure why you say that. IIUC we can avoid a function call overhead;
also the rest of the functions in the file like it.

>> +{
>> +	_mask[0] = mask & ULONG_MAX;
>> +
>> +	if (sizeof(mask) > sizeof(unsigned long))
>> +		_mask[1] = mask >> 32;
>> +}
>> +
>>  static inline void bitmap_zero(unsigned long *dst, int nbits)
>>  {
>>  	if (small_const_nbits(nbits))
>> diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
>> index e3ce2f34d3ad..73928310fd91 100644
>> --- a/tools/perf/builtin-script.c
>> +++ b/tools/perf/builtin-script.c
>> @@ -412,11 +412,25 @@ static void print_sample_iregs(struct perf_sample *sample,
>>  	struct regs_dump *regs = &sample->intr_regs;
>>  	uint64_t mask = attr->sample_regs_intr;
>>  	unsigned i = 0, r;
>> +	unsigned long _mask[sizeof(mask)/sizeof(unsigned long)];
>
> If we start with it, I think we'd hide the declaration machinery as well:
>
> #define DECLARE_L64_BITMAP(__name) unsigned long __name[sizeof(u64)/sizeof(unsigned long)]
> or
> #define L64_BITMAP_SIZE (sizeof(u64)/sizeof(unsigned long))
>
> Or both :) Whatever you prefer.

ok

>>
>>  	if (!regs)
>>  		return;
>>
>> -	for_each_set_bit(r, (unsigned long *) &mask, sizeof(mask) * 8) {
>> +	/*
>> +	 * Since u64 is passed as 'unsigned long *', check
>> +	 * to see whether we need to swap words within u64.
>> +	 * Reason being, in 32 bit big endian userspace on a
>> +	 * 64bit kernel, 'unsigned long' is 32 bits.
>> +	 * When reading u64 using (u32 *)(&mask)[0] and (u32 *)(&mask)[1],
>> +	 * we will get wrong value for the mask. This is what
>> +	 * find_first_bit() and find_next_bit() is doing.
>> +	 * Issue here is "(u32 *)(&mask)[0]" gets upper 32 bits of u64,
>> +	 * but perf assumes it gets lower 32bits of u64. Hence the check
>> +	 * and swap.
>> +	 */
>> +	bitmap_from_u64(_mask, mask);
>> +	for_each_set_bit(r, _mask, sizeof(mask) * 8) {
>>  		u64 val = regs->regs[i++];
>>  		printf("%5s:0x%"PRIx64" ", perf_reg_name(r), val);
>>  	}
>> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
>> index 5214974e841a..1337b1c73f82 100644
>> --- a/tools/perf/util/session.c
>> +++ b/tools/perf/util/session.c
>> @@ -940,8 +940,22 @@ static void branch_stack__printf(struct
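The helper being debated can be exercised in plain userspace C. This is a sketch, not the final tools/include version: it splits a u64 into `unsigned long` words explicitly, so the result does not depend on how the host lays out a u64 in memory, which is exactly what casting `&mask` to `unsigned long *` gets wrong on 32-bit big-endian (there, `((uint32_t *)&mask)[0]` is the upper word, so the cast-pointer iteration visits bits 32-63 first):

```c
#include <stdint.h>
#include <limits.h>

/* Userspace sketch of the proposed bitmap_from_u64(): fill dst[] in the
 * word order find_first_bit()/find_next_bit() expect, regardless of the
 * host's endianness or word size. */
static void bitmap_from_u64(unsigned long *dst, uint64_t mask)
{
    dst[0] = mask & ULONG_MAX;          /* low bits first */

    if (sizeof(mask) > sizeof(unsigned long))
        dst[1] = mask >> 32;            /* high word, 32-bit hosts only */
}
```

On a 64-bit host the `dst[1]` branch is dead code and `dst[0]` holds the whole mask; on a 32-bit host `dst[0]`/`dst[1]` hold the low and high words, in that order, on both endiannesses.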
[PATCH v2] ibmvnic: fix to use list_for_each_safe() when delete items
Since we will remove items off the list using list_del() we need to use a
safe version of the list_for_each() macro aptly named list_for_each_safe().

Signed-off-by: Wei Yongjun
---
 drivers/net/ethernet/ibm/ibmvnic.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 864cb21..ecdb685 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -2121,7 +2121,7 @@ static void handle_error_info_rsp(union ibmvnic_crq *crq,
 				  struct ibmvnic_adapter *adapter)
 {
 	struct device *dev = &adapter->vdev->dev;
-	struct ibmvnic_error_buff *error_buff;
+	struct ibmvnic_error_buff *error_buff, *tmp;
 	unsigned long flags;
 	bool found = false;
 	int i;
@@ -2133,7 +2133,7 @@ static void handle_error_info_rsp(union ibmvnic_crq *crq,
 	}

 	spin_lock_irqsave(&adapter->error_list_lock, flags);
-	list_for_each_entry(error_buff, &adapter->errors, list)
+	list_for_each_entry_safe(error_buff, tmp, &adapter->errors, list)
 		if (error_buff->error_id == crq->request_error_rsp.error_id) {
 			found = true;
 			list_del(&error_buff->list);
@@ -3141,14 +3141,14 @@ static void handle_request_ras_comp_num_rsp(union ibmvnic_crq *crq,

 static void ibmvnic_free_inflight(struct ibmvnic_adapter *adapter)
 {
-	struct ibmvnic_inflight_cmd *inflight_cmd;
+	struct ibmvnic_inflight_cmd *inflight_cmd, *tmp1;
 	struct device *dev = &adapter->vdev->dev;
-	struct ibmvnic_error_buff *error_buff;
+	struct ibmvnic_error_buff *error_buff, *tmp2;
 	unsigned long flags;
 	unsigned long flags2;

 	spin_lock_irqsave(&adapter->inflight_lock, flags);
-	list_for_each_entry(inflight_cmd, &adapter->inflight, list) {
+	list_for_each_entry_safe(inflight_cmd, tmp1, &adapter->inflight, list) {
 		switch (inflight_cmd->crq.generic.cmd) {
 		case LOGIN:
 			dma_unmap_single(dev, adapter->login_buf_token,
@@ -3165,8 +3165,8 @@ static void ibmvnic_free_inflight(struct ibmvnic_adapter *adapter)
 			break;
 		case REQUEST_ERROR_INFO:
 			spin_lock_irqsave(&adapter->error_list_lock, flags2);
-			list_for_each_entry(error_buff, &adapter->errors,
-					    list) {
+			list_for_each_entry_safe(error_buff, tmp2,
						 &adapter->errors, list) {
 				dma_unmap_single(dev, error_buff->dma,
 						 error_buff->len,
 						 DMA_FROM_DEVICE);
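The bug class this patch fixes can be demonstrated outside the kernel. Below is a deliberately minimal userspace re-implementation of the kernel list idiom (the real macros live in include/linux/list.h; the container type is hardcoded in the macro here for brevity). The point is that `list_del()` invalidates the cursor's `->next`, so the `_safe` variant must cache the next element in `tmp` before the loop body runs:

```c
#include <stddef.h>
#include <stdlib.h>

struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }
static void list_add(struct list_head *n, struct list_head *h)
{
    n->next = h->next; n->prev = h;
    h->next->prev = n; h->next = n;
}
static void list_del(struct list_head *e)
{
    e->prev->next = e->next; e->next->prev = e->prev;
    e->next = e->prev = NULL;   /* poison, as the kernel does */
}

struct error_buff { int error_id; struct list_head list; };

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified _safe iterator: 'n' always holds the next entry, computed
 * before the body can delete 'pos'. */
#define list_for_each_entry_safe(pos, n, head, member)                       \
    for (pos = container_of((head)->next, struct error_buff, member),        \
         n = container_of(pos->member.next, struct error_buff, member);      \
         &pos->member != (head);                                             \
         pos = n, n = container_of(n->member.next, struct error_buff, member))

/* Frees every element; with the non-safe iterator this would read
 * pos->list.next after list_del() poisoned it. Returns count freed. */
static int free_all(struct list_head *head)
{
    struct error_buff *pos, *tmp;
    int n = 0;

    list_for_each_entry_safe(pos, tmp, head, list) {
        list_del(&pos->list);
        free(pos);
        n++;
    }
    return n;
}
```

This mirrors the ibmvnic change: the loop bodies call `list_del()` (and free DMA buffers), so the plain `list_for_each_entry()` would advance through freed/poisoned memory.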
Re: [RESEND][PATCH v2] powerpc: Export thread_struct.used_vr/used_vsr to user space
On Tue, Jun 21, 2016 at 02:30:06PM +0800, Simon Guo wrote:
> Hi Michael,
> On Wed, Apr 06, 2016 at 03:00:12PM +0800, Simon Guo wrote:
> > These 2 fields track whether the user process has used Altivec/VSX
> > registers or not. They are used by the kernel to set up the signal
> > frame on the user stack correctly regarding the vector part.
> >
> > CRIU (Checkpoint and Restore In Userspace) builds the signal frame
> > for the restored process. It will need this exported information to
> > set up the signal frame correctly, and CRIU will need to restore these
> > 2 fields for the restored process.
> >
> > Signed-off-by: Simon Guo
> > Reviewed-by: Laurent Dufour
> > ---
>
> Just a kind reminder per our previous discussion.
> If possible, please help pull this in during the 4.8 merge window. Some
> CRIU work items are pending on it.
>
> Have a nice day.
>
> Thanks,
> - Simon

+ linuxppc-dev list

Thanks,
- Simon (IBM LTC)
Re: [PATCH] ibmvnic: fix to use list_for_each_safe() when delete items
Hi Thomas Falcon,

Thanks for finding this. I will send a new patch including your changes.

Regards,
Yongjun Wei

On 06/22/2016 12:01 AM, Thomas Falcon wrote:
> On 06/20/2016 10:50 AM, Thomas Falcon wrote:
> > On 06/17/2016 09:53 PM, weiyj...@163.com wrote:
> > > From: Wei Yongjun
> > >
> > > Since we will remove items off the list using list_del() we need to
> > > use a safe version of the list_for_each() macro aptly named
> > > list_for_each_safe().
> > >
> > > Signed-off-by: Wei Yongjun
> > > ---
> > >  drivers/net/ethernet/ibm/ibmvnic.c | 10 +-
> > >  1 file changed, 5 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
> > > index 864cb21..0b6a922 100644
> > > --- a/drivers/net/ethernet/ibm/ibmvnic.c
> > > +++ b/drivers/net/ethernet/ibm/ibmvnic.c
> > > @@ -3141,14 +3141,14 @@ static void handle_request_ras_comp_num_rsp(union ibmvnic_crq *crq,
> > >
> > >  static void ibmvnic_free_inflight(struct ibmvnic_adapter *adapter)
> > >  {
> > > -	struct ibmvnic_inflight_cmd *inflight_cmd;
> > > +	struct ibmvnic_inflight_cmd *inflight_cmd, *tmp1;
> > >  	struct device *dev = &adapter->vdev->dev;
> > > -	struct ibmvnic_error_buff *error_buff;
> > > +	struct ibmvnic_error_buff *error_buff, *tmp2;
> > >  	unsigned long flags;
> > >  	unsigned long flags2;
> > >
> > >  	spin_lock_irqsave(&adapter->inflight_lock, flags);
> > > -	list_for_each_entry(inflight_cmd, &adapter->inflight, list) {
> > > +	list_for_each_entry_safe(inflight_cmd, tmp1, &adapter->inflight, list) {
> > >  		switch (inflight_cmd->crq.generic.cmd) {
> > >  		case LOGIN:
> > >  			dma_unmap_single(dev, adapter->login_buf_token,
> > > @@ -3165,8 +3165,8 @@ static void ibmvnic_free_inflight(struct ibmvnic_adapter *adapter)
> > >  			break;
> > >  		case REQUEST_ERROR_INFO:
> > >  			spin_lock_irqsave(&adapter->error_list_lock, flags2);
> > > -			list_for_each_entry(error_buff, &adapter->errors,
> > > -					    list) {
> > > +			list_for_each_entry_safe(error_buff, tmp2,
> > > +						 &adapter->errors, list) {
> > >  				dma_unmap_single(dev, error_buff->dma,
> > >  						 error_buff->len,
> > >  						 DMA_FROM_DEVICE);
> >
> > Thanks!
> > Acked-by: Thomas Falcon
>
> Hello, I apologize for prematurely ack'ing this. There is another
> situation where you could use list_for_each_entry_safe in the function
> handle_error_info_rsp. Could you include this in your patch, please?
>
> diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
> index 864cb21..e9968d9 100644
> --- a/drivers/net/ethernet/ibm/ibmvnic.c
> +++ b/drivers/net/ethernet/ibm/ibmvnic.c
> @@ -2121,7 +2121,7 @@ static void handle_error_info_rsp(union ibmvnic_crq *crq,
> 				  struct ibmvnic_adapter *adapter)
>  {
>  	struct device *dev = &adapter->vdev->dev;
> -	struct ibmvnic_error_buff *error_buff;
> +	struct ibmvnic_error_buff *error_buff, *tmp;
>  	unsigned long flags;
>  	bool found = false;
>  	int i;
> @@ -2133,7 +2133,7 @@ static void handle_error_info_rsp(union ibmvnic_crq *crq,
>  	}
>
>  	spin_lock_irqsave(&adapter->error_list_lock, flags);
> -	list_for_each_entry(error_buff, &adapter->errors, list)
> +	list_for_each_entry_safe(error_buff, tmp, &adapter->errors, list)
>  		if (error_buff->error_id == crq->request_error_rsp.error_id) {
>  			found = true;
>  			list_del(&error_buff->list);
Re: [PATCH] powerpc: Fix faults caused by radix patching of SLB miss handler
Michael Ellerman writes:

> As part of the Radix MMU support we added some feature sections in the
> SLB miss handler. These are intended to catch the case that we
> incorrectly take an SLB miss when Radix is enabled, and instead of
> crashing weirdly they bail out to a well defined exit path and trigger
> an oops.
>
> However the way they were written meant the bailout case was enabled by
> default until we did CPU feature patching.
>
> On powermacs the early debug prints in setup_system() can cause an SLB
> miss, which happens before code patching, and so the SLB miss handler
> would incorrectly bailout and crash during boot.
>
> Fix it by inverting the sense of the feature section, so that the code
> which is in place at boot is correct for the hash case. Once we
> determine we are using Radix - which will never happen on a powermac -
> only then do we patch in the bailout case which unconditionally jumps.
>
> Fixes: caca285e5ab4 ("powerpc/mm/radix: Use STD_MMU_64 to properly isolate hash related code")
> Reported-by: Denis Kirjanov
> Tested-by: Denis Kirjanov
> Signed-off-by: Michael Ellerman

Reviewed-by: Aneesh Kumar K.V

> ---
>  arch/powerpc/kernel/exceptions-64s.S | 7 +++----
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
> index 4c9440629128..8bcc1b457115 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -1399,11 +1399,12 @@ END_MMU_FTR_SECTION_IFCLR(MMU_FTR_RADIX)
>  	lwz	r9,PACA_EXSLB+EX_CCR(r13)	/* get saved CR */
>
>  	mtlr	r10
> -BEGIN_MMU_FTR_SECTION
> -	b	2f
> -END_MMU_FTR_SECTION_IFSET(MMU_FTR_RADIX)
>  	andi.	r10,r12,MSR_RI	/* check for unrecoverable exception */
> +BEGIN_MMU_FTR_SECTION
>  	beq-	2f
> +FTR_SECTION_ELSE
> +	b	2f
> +ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_RADIX)
>
>  	.machine	push
>  	.machine	"power4"
> --
> 2.5.0
Re: [PATCH v6 07/11] powerpc/powernv: set power_save func after the idle states are initialized
On Wed, 2016-06-08 at 11:54 -0500, Shreyas B. Prabhu wrote:
> pnv_init_idle_states discovers supported idle states from the
> device tree and does the required initialization. Set the power_save
> function pointer only after this initialization is done.
>
> Reviewed-by: Gautham R. Shenoy
> Signed-off-by: Shreyas B. Prabhu

Acked-by: Benjamin Herrenschmidt

Please merge that one as-is now, no need to wait for the rest, as
otherwise power9 crashes at boot. It doesn't need to wait for the
rest of the series.

Cheers,
Ben.

> ---
> - No changes since v1
>
>  arch/powerpc/platforms/powernv/idle.c  | 3 +++
>  arch/powerpc/platforms/powernv/setup.c | 2 +-
>  2 files changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
> index fcc8b68..fbb09fb 100644
> --- a/arch/powerpc/platforms/powernv/idle.c
> +++ b/arch/powerpc/platforms/powernv/idle.c
> @@ -285,6 +285,9 @@ static int __init pnv_init_idle_states(void)
>  	}
>
>  	pnv_alloc_idle_core_states();
> +
> +	if (supported_cpuidle_states & OPAL_PM_NAP_ENABLED)
> +		ppc_md.power_save = power7_idle;
>  out_free:
>  	kfree(flags);
>  out:
> diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
> index ee6430b..8492bbb 100644
> --- a/arch/powerpc/platforms/powernv/setup.c
> +++ b/arch/powerpc/platforms/powernv/setup.c
> @@ -315,7 +315,7 @@ define_machine(powernv) {
>  	.get_proc_freq = pnv_get_proc_freq,
>  	.progress = pnv_progress,
>  	.machine_shutdown = pnv_shutdown,
> -	.power_save = power7_idle,
> +	.power_save = NULL,
>  	.calibrate_decr = generic_calibrate_decr,
>  #ifdef CONFIG_KEXEC
>  	.kexec_cpu_down = pnv_kexec_cpu_down,
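The pattern this patch implements - start with the callback NULL and install it only once initialization has determined it is safe - can be sketched in userspace. All names below are illustrative, not the kernel's; the idea is simply that the caller falls back to a default while the pointer is still NULL:

```c
#include <stddef.h>

/* Userspace sketch: a machine-description-style callback that is left
 * NULL until init decides a deeper power-save handler is supported. */
typedef int (*power_save_fn)(void);

static int deep_idle(void)      { return 1; }  /* stands in for power7_idle() */
static int default_snooze(void) { return 0; }  /* safe fallback */

static power_save_fn power_save; /* NULL until initialized */

static int do_idle(void)
{
    /* Fall back when no power-save handler has been installed yet. */
    return power_save ? power_save() : default_snooze();
}

/* Called once the supported idle states are known (cf. the
 * OPAL_PM_NAP_ENABLED check in pnv_init_idle_states). */
static void init_idle_states(int nap_supported)
{
    if (nap_supported)
        power_save = deep_idle;
}
```

Setting the pointer eagerly (the pre-patch `.power_save = power7_idle`) means the deep handler can be entered before its state is initialized, which is the boot crash Ben describes.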
Re: [PATCH 0/6] kexec_file: Add buffer hand-over for the next kernel
On 06/20/16 at 10:44pm, Thiago Jung Bauermann wrote:
> Hello,
>
> This patch series implements a mechanism which allows the kernel to pass
> on a buffer to the kernel that will be kexec'd. This buffer is passed as
> a segment which is added to the kimage when it is being prepared by
> kexec_file_load.
>
> How the second kernel is informed of this buffer is architecture-specific.
> On PowerPC, this is done via the device tree, by checking the properties
> /chosen/linux,kexec-handover-buffer-start and
> /chosen/linux,kexec-handover-buffer-end, which is analogous to how the
> kernel finds the initrd.
>
> This feature was implemented because the Integrity Measurement
> Architecture subsystem needs to preserve its measurement list across the
> kexec reboot. This is so that IMA can implement trusted boot support on
> the OpenPower platform, because on such systems an intermediary Linux
> instance running as part of the firmware is used to boot the target
> operating system via kexec. Using this mechanism, IMA on this
> intermediary instance can hand over to the target OS the measurements of
> the components that were used to boot it.

We have CONFIG_KEXEC_VERIFY_SIG, why not verify the kernel to be loaded
instead? I feel IMA should rebuild its measurements instead of passing
them to another kernel. A kexec reboot is also a reboot. If we have to
preserve something we get from firmware we can do that, but other than
that I think it is not a good idea.

> Because there could be additional measurement events between the
> kexec_file_load call and the actual reboot, IMA needs a way to update
> the buffer with those additional events before rebooting. One can
> minimize the interval between the kexec_file_load and the reboot
> syscalls, but as small as it can be, there is always the possibility
> that the measurement list will be out of date at the time of reboot.
>
> To address this issue, this patch series also introduces
> kexec_update_segment, which allows a reboot notifier to change the
> contents of the image segment during the reboot process.
>
> There's one patch which makes kimage_load_normal_segment and
> kexec_update_segment share code. It's not much code that they can share
> though, so I'm not sure if it's worth including this patch.
>
> The last patch is not intended to be merged, it just demonstrates how
> this feature can be used.
>
> This series applies on top of v2 of the "kexec_file_load implementation
> for PowerPC" patch series at:

The kexec_file_load patches should be addressed first, no?

> http://lists.infradead.org/pipermail/kexec/2016-June/016078.html
>
> Thiago Jung Bauermann (6):
>   kexec_file: Add buffer hand-over support for the next kernel
>   powerpc: kexec_file: Add buffer hand-over support for the next kernel
>   kexec_file: Allow skipping checksum calculation for some segments.
>   kexec_file: Add mechanism to update kexec segments.
>   kexec: Share logic to copy segment page contents.
>   IMA: Demonstration code for kexec buffer passing.
>
>  arch/powerpc/include/asm/kexec.h       |   9 ++
>  arch/powerpc/kernel/kexec_elf_64.c     |  50 +++-
>  arch/powerpc/kernel/machine_kexec_64.c |  64 ++
>  arch/x86/kernel/crash.c                |   4 +-
>  arch/x86/kernel/kexec-bzimage64.c      |   6 +-
>  include/linux/ima.h                    |  11 ++
>  include/linux/kexec.h                  |  47 +++-
>  kernel/kexec_core.c                    | 205 ++---
>  kernel/kexec_file.c                    | 102 ++--
>  security/integrity/ima/ima.h           |   5 +
>  security/integrity/ima/ima_init.c      |  26 +
>  security/integrity/ima/ima_template.c  |  79 +
>  12 files changed, 547 insertions(+), 61 deletions(-)
>
> --
> 1.9.1

Thanks
Dave
Re: [PATCH v3 2/2] KVM: PPC: Exit guest upon MCE when FWNMI capability is enabled
On Monday 20 June 2016 10:48 AM, Paul Mackerras wrote:
> Hi Aravinda,
>
> On Wed, Jan 13, 2016 at 12:38:09PM +0530, Aravinda Prasad wrote:
>> Enhance KVM to cause a guest exit with the KVM_EXIT_NMI
>> exit reason upon a machine check exception (MCE) in
>> the guest address space if the KVM_CAP_PPC_FWNMI
>> capability is enabled (instead of delivering the 0x200
>> interrupt to the guest). This enables QEMU to build an error
>> log and deliver the machine check exception to the guest via
>> the guest-registered machine check handler.
>>
>> This approach simplifies the delivery of the machine
>> check exception to the guest OS compared to the earlier
>> approach of KVM directly invoking the 0x200 guest interrupt
>> vector. In the earlier approach QEMU was enhanced to
>> patch the 0x200 interrupt vector during boot. The
>> patched code at 0x200 issued a private hcall to pass
>> control to QEMU to build the error log.
>>
>> This design/approach is based on the feedback for the
>> QEMU patches to handle machine check exceptions. Details
>> of the earlier approach of handling machine check exceptions
>> in QEMU and related discussions can be found at:
>>
>> https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg00813.html
>>
>> Signed-off-by: Aravinda Prasad
>
> Are you in the process of doing a new version of this patch with the
> requested changes?

Yes, I am working (intermittently) on the new version, but have not been
able to finish and post it. I will complete it and post the new version.

Regards,
Aravinda

> Paul.

--
Regards,
Aravinda
[PATCH v3 9/9] powerpc: Add purgatory for kexec_file_load implementation.
This purgatory implementation comes from kexec-tools, almost unchanged. The only changes were that the sha256_regions global variable was renamed to sha_regions to match what kexec_file_load expects, and to use the sha256.c file from x86's purgatory to avoid adding yet another SHA-256 implementation. Also, some formatting warnings found by checkpatch.pl were fixed.

Signed-off-by: Thiago Jung Bauermann
Cc: ke...@lists.infradead.org
Cc: linux-ker...@vger.kernel.org
---
 arch/powerpc/Makefile                     |   4 +
 arch/powerpc/purgatory/.gitignore         |   2 +
 arch/powerpc/purgatory/Makefile           |  36 +++
 arch/powerpc/purgatory/console-ppc64.c    |  38 +++
 arch/powerpc/purgatory/crashdump-ppc64.h  |  42
 arch/powerpc/purgatory/crashdump_backup.c |  36 +++
 arch/powerpc/purgatory/crtsavres.S        |   5 +
 arch/powerpc/purgatory/hvCall.S           |  27 +
 arch/powerpc/purgatory/hvCall.h           |   8 ++
 arch/powerpc/purgatory/kexec-sha256.h     |  11 ++
 arch/powerpc/purgatory/ppc64_asm.h        |  20
 arch/powerpc/purgatory/printf.c           | 164 ++
 arch/powerpc/purgatory/purgatory-ppc64.c  |  41
 arch/powerpc/purgatory/purgatory-ppc64.h  |   6 ++
 arch/powerpc/purgatory/purgatory.c        |  62 +++
 arch/powerpc/purgatory/purgatory.h        |  11 ++
 arch/powerpc/purgatory/sha256.c           |   6 ++
 arch/powerpc/purgatory/sha256.h           |   1 +
 arch/powerpc/purgatory/string.S           |   1 +
 arch/powerpc/purgatory/v2wrap.S           | 134
 20 files changed, 655 insertions(+)

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 709a22a3e824..293322855cce 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -249,6 +249,7 @@ core-y += arch/powerpc/kernel/ \
 core-$(CONFIG_XMON)           += arch/powerpc/xmon/
 core-$(CONFIG_KVM)            += arch/powerpc/kvm/
 core-$(CONFIG_PERF_EVENTS)    += arch/powerpc/perf/
+core-$(CONFIG_KEXEC_FILE)     += arch/powerpc/purgatory/

 drivers-$(CONFIG_OPROFILE)    += arch/powerpc/oprofile/
@@ -370,6 +371,9 @@ archclean:
        $(Q)$(MAKE) $(clean)=$(boot)

 archprepare: checkbin
+ifeq ($(CONFIG_KEXEC_FILE),y)
+       $(Q)$(MAKE) $(build)=arch/powerpc/purgatory arch/powerpc/purgatory/kexec-purgatory.c
+endif

 # Use the file '.tmp_gas_check' for binutils tests, as gas won't output
 # to stdout and these checks are run even on install targets.
diff --git a/arch/powerpc/purgatory/.gitignore b/arch/powerpc/purgatory/.gitignore
new file mode 100644
index ..e9e66f178a6d
--- /dev/null
+++ b/arch/powerpc/purgatory/.gitignore
@@ -0,0 +1,2 @@
+kexec-purgatory.c
+purgatory.ro
diff --git a/arch/powerpc/purgatory/Makefile b/arch/powerpc/purgatory/Makefile
new file mode 100644
index ..63daf95e5703
--- /dev/null
+++ b/arch/powerpc/purgatory/Makefile
@@ -0,0 +1,36 @@
+purgatory-y := purgatory.o printf.o string.o v2wrap.o hvCall.o \
+       purgatory-ppc64.o console-ppc64.o crashdump_backup.o \
+       crtsavres.o sha256.o
+
+targets += $(purgatory-y)
+PURGATORY_OBJS = $(addprefix $(obj)/,$(purgatory-y))
+
+LDFLAGS_purgatory.ro := -e purgatory_start -r --no-undefined -nostartfiles \
+       -nostdlib -nodefaultlibs
+targets += purgatory.ro
+
+# Default KBUILD_CFLAGS can have -pg option set when FTRACE is enabled. That
+# in turn leaves some undefined symbols like __fentry__ in purgatory and not
+# sure how to relocate those. Like kexec-tools, use custom flags.
+
+KBUILD_CFLAGS := -Wall -Wstrict-prototypes -fno-strict-aliasing \
+       -fno-zero-initialized-in-bss -fno-builtin -ffreestanding \
+       -fno-PIC -fno-PIE -fno-stack-protector -fno-exceptions \
+       -msoft-float -MD -Os
+KBUILD_CFLAGS += -m$(CONFIG_WORD_SIZE)
+
+$(obj)/purgatory.ro: $(PURGATORY_OBJS) FORCE
+       $(call if_changed,ld)
+
+targets += kexec-purgatory.c
+
+CMD_BIN2C = $(objtree)/scripts/basic/bin2c
+quiet_cmd_bin2c = BIN2C $@
+      cmd_bin2c = $(CMD_BIN2C) kexec_purgatory < $< > $@
+
+$(obj)/kexec-purgatory.c: $(obj)/purgatory.ro FORCE
+       $(call if_changed,bin2c)
+       @:
+
+
+obj-$(CONFIG_KEXEC_FILE) += kexec-purgatory.o
diff --git a/arch/powerpc/purgatory/console-ppc64.c b/arch/powerpc/purgatory/console-ppc64.c
new file mode 100644
index ..3d07be0b5d08
--- /dev/null
+++ b/arch/powerpc/purgatory/console-ppc64.c
@@ -0,0 +1,38 @@
+/*
+ * kexec: Linux boots Linux
+ *
+ * Created by: Mohan Kumar M (mo...@in.ibm.com)
+ *
+ * Copyright (C) IBM Corporation, 2005. All rights reserved
+ *
+ * Code taken from kexec-tools.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation (version 2 of the License).
+ *
+ * This program is distributed in
[PATCH v3 8/9] powerpc: Add support for loading ELF kernels with kexec_file_load.
This uses all the infrastructure built up by the previous patches in the series to load an ELF vmlinux file and an initrd. It uses the flattened device tree at initial_boot_params as a base and adjusts memory reservations and its /chosen node for the next kernel.

elf64_apply_relocate_add was extended to support relative symbols. This is necessary because the module loading mechanism adjusts Elf64_Sym.st_value to point to the absolute memory address before relocation, while the kexec purgatory relocation code does that adjustment during relocation. The patch also adds relocation types used by the purgatory.

Signed-off-by: Thiago Jung Bauermann
Cc: ke...@lists.infradead.org
Cc: linux-ker...@vger.kernel.org
---
 arch/powerpc/include/asm/elf_util.h     |   1 +
 arch/powerpc/include/asm/kexec_elf_64.h |  10 +
 arch/powerpc/kernel/Makefile            |   5 +-
 arch/powerpc/kernel/elf_util_64.c       |  84 -
 arch/powerpc/kernel/kexec_elf_64.c      | 560
 arch/powerpc/kernel/machine_kexec_64.c  |  86 -
 arch/powerpc/kernel/module_64.c         |   5 +-
 7 files changed, 747 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/elf_util.h b/arch/powerpc/include/asm/elf_util.h
index 47d15515ba33..18703d56eabd 100644
--- a/arch/powerpc/include/asm/elf_util.h
+++ b/arch/powerpc/include/asm/elf_util.h
@@ -86,6 +86,7 @@ int elf64_apply_relocate_add(const struct elf_info *elf_info,
                             const char *strtab, const Elf64_Rela *rela,
                             unsigned int num_rela, void *syms_base,
                             void *loc_base, Elf64_Addr addr_base,
+                            bool relative_symbols, bool check_symbols,
                             const char *obj_name);

 #endif /* _ASM_POWERPC_ELF_UTIL_H */
diff --git a/arch/powerpc/include/asm/kexec_elf_64.h b/arch/powerpc/include/asm/kexec_elf_64.h
new file mode 100644
index ..30da6bc0ccf8
--- /dev/null
+++ b/arch/powerpc/include/asm/kexec_elf_64.h
@@ -0,0 +1,10 @@
+#ifndef __POWERPC_KEXEC_ELF_64_H__
+#define __POWERPC_KEXEC_ELF_64_H__
+
+#ifdef CONFIG_KEXEC_FILE
+
+extern struct kexec_file_ops kexec_elf64_ops;
+
+#endif /* CONFIG_KEXEC_FILE */
+
+#endif /* __POWERPC_KEXEC_ELF_64_H__ */
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 8a53fccaa053..b89a2ae1b2a0 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -110,6 +110,7 @@ obj-$(CONFIG_PCI) += pci_$(CONFIG_WORD_SIZE).o $(pci64-y) \
 obj-$(CONFIG_PCI_MSI)         += msi.o
 obj-$(CONFIG_KEXEC)           += machine_kexec.o crash.o \
                                  machine_kexec_$(CONFIG_WORD_SIZE).o
+obj-$(CONFIG_KEXEC_FILE)      += kexec_elf_$(CONFIG_WORD_SIZE).o
 obj-$(CONFIG_AUDIT)           += audit.o
 obj64-$(CONFIG_AUDIT)         += compat_audit.o
@@ -124,9 +125,11 @@ ifneq ($(CONFIG_PPC_INDIRECT_PIO),y)
 obj-y                         += iomap.o
 endif

-ifeq ($(CONFIG_MODULES)$(CONFIG_WORD_SIZE),y64)
+ifneq ($(CONFIG_MODULES)$(CONFIG_KEXEC_FILE),)
+ifeq ($(CONFIG_WORD_SIZE),64)
 obj-y                         += elf_util.o elf_util_64.o
 endif
+endif

 obj64-$(CONFIG_PPC_TRANSACTIONAL_MEM) += tm.o
diff --git a/arch/powerpc/kernel/elf_util_64.c b/arch/powerpc/kernel/elf_util_64.c
index 8e5d400ac9f2..80f209a42abd 100644
--- a/arch/powerpc/kernel/elf_util_64.c
+++ b/arch/powerpc/kernel/elf_util_64.c
@@ -74,6 +74,8 @@ static void squash_toc_save_inst(const char *name, unsigned long addr) { }
  * @syms_base:  Contents of the associated symbol table.
  * @loc_base:   Contents of the section to which relocations apply.
  * @addr_base:  The address where the section will be loaded in memory.
+ * @relative_symbols:   Are the symbols' st_value members relative?
+ * @check_symbols:      Fail if an unexpected symbol is found?
  * @obj_name:   The name of the ELF binary, for information messages.
  *
  * Applies RELA relocations to an ELF file already at its final location
@@ -84,11 +86,13 @@ int elf64_apply_relocate_add(const struct elf_info *elf_info,
                             const char *strtab, const Elf64_Rela *rela,
                             unsigned int num_rela, void *syms_base,
                             void *loc_base, Elf64_Addr addr_base,
+                            bool relative_symbols, bool check_symbols,
                             const char *obj_name)
 {
        unsigned int i;
        unsigned long *location;
        unsigned long address;
+       unsigned long sec_base;
        unsigned long value;
        const char *name;
        Elf64_Sym *sym;
@@ -121,8 +125,36 @@ int elf64_apply_relocate_add(const struct elf_info *elf_info,
                       name, (unsigned long)sym->st_value,
                       (long)rela[i].r_addend);

+               if (check_symbols) {
+                       /*
+                        * TOC symbols appear as
[PATCH v3 7/9] powerpc: Implement kexec_file_load.
Adds the basic machinery needed by kexec_file_load.

Signed-off-by: Josh Sklar
Signed-off-by: Thiago Jung Bauermann
Cc: ke...@lists.infradead.org
Cc: linux-ker...@vger.kernel.org
---
 arch/powerpc/Kconfig                   | 13 +
 arch/powerpc/include/asm/systbl.h      |  1 +
 arch/powerpc/include/asm/unistd.h      |  2 +-
 arch/powerpc/include/uapi/asm/unistd.h |  1 +
 arch/powerpc/kernel/machine_kexec_64.c | 50 ++
 5 files changed, 66 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 01f7464d9fea..3ed5770b89e4 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -457,6 +457,19 @@ config KEXEC
          interface is strongly in flux, so no good recommendation can be
          made.

+config KEXEC_FILE
+       bool "kexec file based system call"
+       select KEXEC_CORE
+       select BUILD_BIN2C
+       depends on PPC64
+       depends on CRYPTO=y
+       depends on CRYPTO_SHA256=y
+       help
+         This is a new version of the kexec system call. This call is
+         file based and takes in file descriptors as system call arguments
+         for kernel and initramfs as opposed to a list of segments as is the
+         case for the older kexec call.
+
 config CRASH_DUMP
        bool "Build a kdump crash kernel"
        depends on PPC64 || 6xx || FSL_BOOKE || (44x && !SMP)
diff --git a/arch/powerpc/include/asm/systbl.h b/arch/powerpc/include/asm/systbl.h
index 2fc5d4db503c..4b369d83fe9c 100644
--- a/arch/powerpc/include/asm/systbl.h
+++ b/arch/powerpc/include/asm/systbl.h
@@ -386,3 +386,4 @@ SYSCALL(mlock2)
 SYSCALL(copy_file_range)
 COMPAT_SYS_SPU(preadv2)
 COMPAT_SYS_SPU(pwritev2)
+SYSCALL(kexec_file_load)
diff --git a/arch/powerpc/include/asm/unistd.h b/arch/powerpc/include/asm/unistd.h
index cf12c580f6b2..a01e97d3f305 100644
--- a/arch/powerpc/include/asm/unistd.h
+++ b/arch/powerpc/include/asm/unistd.h
@@ -12,7 +12,7 @@
 #include

-#define NR_syscalls    382
+#define NR_syscalls    383

 #define __NR__exit __NR_exit
diff --git a/arch/powerpc/include/uapi/asm/unistd.h b/arch/powerpc/include/uapi/asm/unistd.h
index e9f5f41aa55a..2f26335a3c42 100644
--- a/arch/powerpc/include/uapi/asm/unistd.h
+++ b/arch/powerpc/include/uapi/asm/unistd.h
@@ -392,5 +392,6 @@
 #define __NR_copy_file_range   379
 #define __NR_preadv2           380
 #define __NR_pwritev2          381
+#define __NR_kexec_file_load   382

 #endif /* _UAPI_ASM_POWERPC_UNISTD_H_ */
diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/machine_kexec_64.c
index 50bf55135ef8..b242f2293a6e 100644
--- a/arch/powerpc/kernel/machine_kexec_64.c
+++ b/arch/powerpc/kernel/machine_kexec_64.c
@@ -31,6 +31,10 @@
 #include
 #include

+#ifdef CONFIG_KEXEC_FILE
+static struct kexec_file_ops *kexec_file_loaders[] = { };
+#endif
+
 #ifdef CONFIG_PPC_BOOK3E
 int default_machine_kexec_prepare(struct kimage *image)
 {
@@ -427,3 +431,49 @@ static int __init export_htab_values(void)
 }
 late_initcall(export_htab_values);
 #endif /* CONFIG_PPC_STD_MMU_64 */
+
+#ifdef CONFIG_KEXEC_FILE
+int arch_kexec_kernel_image_probe(struct kimage *image, void *buf,
+                                 unsigned long buf_len)
+{
+       int i, ret = -ENOEXEC;
+       struct kexec_file_ops *fops;
+
+       /* We don't support crash kernels yet. */
+       if (image->type == KEXEC_TYPE_CRASH)
+               return -ENOTSUPP;
+
+       for (i = 0; i < ARRAY_SIZE(kexec_file_loaders); i++) {
+               fops = kexec_file_loaders[i];
+               if (!fops || !fops->probe)
+                       continue;
+
+               ret = fops->probe(buf, buf_len);
+               if (!ret) {
+                       image->fops = fops;
+                       return ret;
+               }
+       }
+
+       return ret;
+}
+
+void *arch_kexec_kernel_image_load(struct kimage *image)
+{
+       if (!image->fops || !image->fops->load)
+               return ERR_PTR(-ENOEXEC);
+
+       return image->fops->load(image, image->kernel_buf,
+                                image->kernel_buf_len, image->initrd_buf,
+                                image->initrd_buf_len, image->cmdline_buf,
+                                image->cmdline_buf_len);
+}
+
+int arch_kimage_file_post_load_cleanup(struct kimage *image)
+{
+       if (!image->fops || !image->fops->cleanup)
+               return 0;
+
+       return image->fops->cleanup(image->image_loader_data);
+}
+#endif /* CONFIG_KEXEC_FILE */
--
1.9.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
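The loader dispatch in arch_kexec_kernel_image_probe above is a first-match loop over a table of ops structures. Here is a hedged, self-contained sketch of the same pattern; every name in it (file_ops, probe_image, elf_probe) is an illustrative stand-in, not a kernel type:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-in for the kernel's kexec_file_ops probe hook. */
typedef int (*probe_fn)(const void *buf, unsigned long len);

struct file_ops {
	probe_fn probe;
};

/* Reject anything that doesn't start with the ELF magic bytes. */
static int elf_probe(const void *buf, unsigned long len)
{
	const unsigned char *p = buf;

	if (len < 4)
		return -1;
	return (p[0] == 0x7f && p[1] == 'E' && p[2] == 'L' && p[3] == 'F')
		? 0 : -1;
}

static struct file_ops elf_ops = { .probe = elf_probe };
static struct file_ops *loaders[] = { &elf_ops };

/* Mirror of the dispatch loop: the first loader whose probe returns 0
 * wins; NULL entries and entries without a probe hook are skipped. */
static struct file_ops *probe_image(const void *buf, unsigned long len)
{
	size_t i;

	for (i = 0; i < sizeof(loaders) / sizeof(loaders[0]); i++) {
		if (loaders[i] && loaders[i]->probe &&
		    loaders[i]->probe(buf, len) == 0)
			return loaders[i];
	}
	return NULL;
}
```

In the patch the table is still empty (`kexec_file_loaders[] = { }`); patch 8/9 populates it with the ELF loader.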
[PATCH v3 6/9] powerpc: Add functions to read ELF files of any endianness.
A little endian kernel might need to kexec a big endian kernel (the opposite is less likely but could happen as well), so we can't just cast the buffer with the binary to ELF structs and use them as is done elsewhere. This patch adds functions which do byte-swapping as necessary when populating the ELF structs. These functions will be used in the next patch in the series.

Signed-off-by: Thiago Jung Bauermann
Cc: ke...@lists.infradead.org
Cc: linux-ker...@vger.kernel.org
---
 arch/powerpc/include/asm/elf_util.h |  19 ++
 arch/powerpc/kernel/Makefile        |   2 +-
 arch/powerpc/kernel/elf_util.c      | 476
 3 files changed, 496 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/elf_util.h b/arch/powerpc/include/asm/elf_util.h
index a012ba03282d..47d15515ba33 100644
--- a/arch/powerpc/include/asm/elf_util.h
+++ b/arch/powerpc/include/asm/elf_util.h
@@ -20,6 +20,14 @@
 #include

 struct elf_info {
+       /*
+        * Where the ELF binary contents are kept.
+        * Memory managed by the user of the struct.
+        */
+       const char *buffer;
+
+       const struct elfhdr *ehdr;
+       const struct elf_phdr *proghdrs;
        struct elf_shdr *sechdrs;

        /* Index of stubs section. */
@@ -63,6 +71,17 @@ static inline unsigned long my_r2(const struct elf_info *elf_info)
        return elf_info->sechdrs[elf_info->toc_section].sh_addr + 0x8000;
 }

+static inline bool elf_is_elf_file(const struct elfhdr *ehdr)
+{
+       return memcmp(ehdr->e_ident, ELFMAG, SELFMAG) == 0;
+}
+
+int elf_read_from_buffer(const char *buf, size_t len, struct elfhdr *ehdr,
+                        struct elf_info *elf_info);
+void elf_init_elf_info(const struct elfhdr *ehdr, struct elf_shdr *sechdrs,
+                      struct elf_info *elf_info);
+void elf_free_info(struct elf_info *elf_info);
+
 int elf64_apply_relocate_add(const struct elf_info *elf_info,
                             const char *strtab, const Elf64_Rela *rela,
                             unsigned int num_rela, void *syms_base,
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index e99f626acc85..8a53fccaa053 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -125,7 +125,7 @@ obj-y += iomap.o
 endif

 ifeq ($(CONFIG_MODULES)$(CONFIG_WORD_SIZE),y64)
-obj-y                          += elf_util_64.o
+obj-y                          += elf_util.o elf_util_64.o
 endif

 obj64-$(CONFIG_PPC_TRANSACTIONAL_MEM) += tm.o
diff --git a/arch/powerpc/kernel/elf_util.c b/arch/powerpc/kernel/elf_util.c
new file mode 100644
index ..1df4a116ad90
--- /dev/null
+++ b/arch/powerpc/kernel/elf_util.c
@@ -0,0 +1,476 @@
+/*
+ * Utility functions to work with ELF files.
+ *
+ * Copyright (C) 2016, IBM Corporation
+ *
+ * Based on kexec-tools' kexec-elf.c. Heavily modified for the
+ * kernel by Thiago Jung Bauermann .
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation (version 2 of the License).
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#define pr_fmt(fmt)    KBUILD_MODNAME ": " fmt
+
+#include
+#include
+#include
+
+#if ELF_CLASS == ELFCLASS32
+#define elf_addr_to_cpu        elf32_to_cpu
+
+#ifndef Elf_Rel
+#define Elf_Rel        Elf32_Rel
+#endif /* Elf_Rel */
+#else /* ELF_CLASS == ELFCLASS32 */
+#define elf_addr_to_cpu        elf64_to_cpu
+
+#ifndef Elf_Rel
+#define Elf_Rel        Elf64_Rel
+#endif /* Elf_Rel */
+
+static uint64_t elf64_to_cpu(const struct elfhdr *ehdr, uint64_t value)
+{
+       if (ehdr->e_ident[EI_DATA] == ELFDATA2LSB)
+               value = le64_to_cpu(value);
+       else if (ehdr->e_ident[EI_DATA] == ELFDATA2MSB)
+               value = be64_to_cpu(value);
+
+       return value;
+}
+#endif /* ELF_CLASS == ELFCLASS32 */
+
+static uint16_t elf16_to_cpu(const struct elfhdr *ehdr, uint16_t value)
+{
+       if (ehdr->e_ident[EI_DATA] == ELFDATA2LSB)
+               value = le16_to_cpu(value);
+       else if (ehdr->e_ident[EI_DATA] == ELFDATA2MSB)
+               value = be16_to_cpu(value);
+
+       return value;
+}
+
+static uint32_t elf32_to_cpu(const struct elfhdr *ehdr, uint32_t value)
+{
+       if (ehdr->e_ident[EI_DATA] == ELFDATA2LSB)
+               value = le32_to_cpu(value);
+       else if (ehdr->e_ident[EI_DATA] == ELFDATA2MSB)
+               value = be32_to_cpu(value);
+
+       return value;
+}
+
+/**
+ * elf_is_ehdr_sane - check that it is safe to use the ELF header
+ * @buf_len:   size of the buffer in which the ELF file is loaded.
+ */
+static bool
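The helpers above dispatch on the file's e_ident[EI_DATA] byte, not on the host's byte order. A standalone sketch of the same idea follows; note that the kernel code swaps an already-loaded integer with le16_to_cpu/be16_to_cpu, while this sketch reads the two bytes directly, which is equivalent and independent of host endianness (all names here are illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Same numeric values as the ELF spec's ELFDATA2LSB/ELFDATA2MSB. */
#define MY_ELFDATA2LSB 1
#define MY_ELFDATA2MSB 2

/* Read a 16-bit field from the file image given its declared byte
 * order. Byte-by-byte arithmetic sidesteps host endianness entirely. */
static uint16_t elf16_to_cpu_sketch(int ei_data, const unsigned char *p)
{
	if (ei_data == MY_ELFDATA2LSB)
		return (uint16_t)(p[0] | (p[1] << 8));
	else if (ei_data == MY_ELFDATA2MSB)
		return (uint16_t)((p[0] << 8) | p[1]);
	return 0;	/* invalid data encoding */
}
```

The same two bytes decode to different values depending on the declared encoding, which is exactly why a little endian kernel can still parse a big endian vmlinux.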
[PATCH v3 5/9] powerpc: Generalize elf64_apply_relocate_add.
When apply_relocate_add is called, modules are already loaded at their final location in memory so Elf64_Shdr.sh_addr can be used for accessing the section contents as well as the base address for relocations.

This is not the case for kexec's purgatory, because it will only be copied to its final location right before being executed. Therefore, it needs to be relocated while it is still in a temporary buffer. In this case, Elf64_Shdr.sh_addr can't be used to access the sections' contents.

This patch allows elf64_apply_relocate_add to be used when the ELF binary is not yet at its final location by adding an addr_base argument to specify the address at which the section will be loaded, and rela, loc_base and syms_base to point to the sections' contents.

Signed-off-by: Thiago Jung Bauermann
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: Torsten Duwe
Cc: ke...@lists.infradead.org
Cc: linux-ker...@vger.kernel.org
---
 arch/powerpc/include/asm/elf_util.h |  6 ++--
 arch/powerpc/kernel/elf_util_64.c   | 63 +
 arch/powerpc/kernel/module_64.c     | 17 --
 3 files changed, 61 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/include/asm/elf_util.h b/arch/powerpc/include/asm/elf_util.h
index 37372559fe62..a012ba03282d 100644
--- a/arch/powerpc/include/asm/elf_util.h
+++ b/arch/powerpc/include/asm/elf_util.h
@@ -64,7 +64,9 @@ static inline unsigned long my_r2(const struct elf_info *elf_info)
 }

 int elf64_apply_relocate_add(const struct elf_info *elf_info,
-                            const char *strtab, unsigned int symindex,
-                            unsigned int relsec, const char *obj_name);
+                            const char *strtab, const Elf64_Rela *rela,
+                            unsigned int num_rela, void *syms_base,
+                            void *loc_base, Elf64_Addr addr_base,
+                            const char *obj_name);

 #endif /* _ASM_POWERPC_ELF_UTIL_H */
diff --git a/arch/powerpc/kernel/elf_util_64.c b/arch/powerpc/kernel/elf_util_64.c
index decad2c34f38..8e5d400ac9f2 100644
--- a/arch/powerpc/kernel/elf_util_64.c
+++ b/arch/powerpc/kernel/elf_util_64.c
@@ -69,33 +69,56 @@ static void squash_toc_save_inst(const char *name, unsigned long addr) { }
  * elf64_apply_relocate_add - apply 64 bit RELA relocations
  * @elf_info:  Support information for the ELF binary being relocated.
  * @strtab:    String table for the associated symbol table.
- * @symindex:  Section header index for the associated symbol table.
- * @relsec:    Section header index for the relocations to apply.
+ * @rela:      Contents of the section with the relocations to apply.
+ * @num_rela:  Number of relocation entries in the section.
+ * @syms_base: Contents of the associated symbol table.
+ * @loc_base:  Contents of the section to which relocations apply.
+ * @addr_base: The address where the section will be loaded in memory.
  * @obj_name:  The name of the ELF binary, for information messages.
+ *
+ * Applies RELA relocations to an ELF file already at its final location
+ * in memory (in which case loc_base == addr_base), or still in a temporary
+ * buffer.
  */
 int elf64_apply_relocate_add(const struct elf_info *elf_info,
-                            const char *strtab, unsigned int symindex,
-                            unsigned int relsec, const char *obj_name)
+                            const char *strtab, const Elf64_Rela *rela,
+                            unsigned int num_rela, void *syms_base,
+                            void *loc_base, Elf64_Addr addr_base,
+                            const char *obj_name)
 {
        unsigned int i;
-       Elf64_Shdr *sechdrs = elf_info->sechdrs;
-       Elf64_Rela *rela = (void *)sechdrs[relsec].sh_addr;
-       Elf64_Sym *sym;
        unsigned long *location;
+       unsigned long address;
        unsigned long value;
+       const char *name;
+       Elf64_Sym *sym;
+
+       for (i = 0; i < num_rela; i++) {
+               /*
+                * rels[i].r_offset contains the byte offset from the beginning
+                * of section to the storage unit affected.
+                *
+                * This is the location to update in the temporary buffer where
+                * the section is currently loaded. The section will finally
+                * be loaded to a different address later, pointed to by
+                * addr_base.
+                */
+               location = loc_base + rela[i].r_offset;
+
+               /* Final address of the location. */
+               address = addr_base + rela[i].r_offset;
+
+               /* This is the symbol the relocation is referring to. */
+               sym = (Elf64_Sym *) syms_base + ELF64_R_SYM(rela[i].r_info);

-       for
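The heart of this patch is the loc_base/addr_base split: stores go through loc_base (the temporary buffer holding the section), while any address the relocation arithmetic needs comes from addr_base (where the section will eventually be loaded). A minimal sketch with a generic 64-bit absolute relocation; the struct and function names are made up for illustration and the `address` value is computed only to show the distinction (an absolute relocation doesn't need it, a PC-relative one would):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative subset of Elf64_Rela. */
struct rela_sketch {
	uint64_t r_offset;	/* offset of the slot within the section */
	int64_t  r_addend;
};

/* Patch sym_value + addend into the section image held in a temporary
 * buffer (loc_base), as if the section were loaded at addr_base.
 * Returns the final address of the patched slot. */
static uint64_t apply_abs64(void *loc_base, uint64_t addr_base,
			    const struct rela_sketch *r, uint64_t sym_value)
{
	uint64_t value = sym_value + (uint64_t)r->r_addend;
	/* Final address of the slot: uses addr_base, NOT loc_base. */
	uint64_t address = addr_base + r->r_offset;

	/* The write itself goes into the temporary buffer. */
	memcpy((char *)loc_base + r->r_offset, &value, sizeof(value));
	return address;
}
```

When the binary is already in place (the module case), loc_base and addr_base coincide and this degenerates to the old behavior.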
[PATCH v3 3/9] kexec_file: Factor out kexec_locate_mem_hole from kexec_add_buffer.
kexec_locate_mem_hole will be used by the PowerPC kexec_file_load implementation to find free memory for the purgatory stack.

Signed-off-by: Thiago Jung Bauermann
Cc: Eric Biederman
Cc: Dave Young
Cc: ke...@lists.infradead.org
Cc: linux-ker...@vger.kernel.org
---
 include/linux/kexec.h |  4
 kernel/kexec_file.c   | 66 ++-
 2 files changed, 53 insertions(+), 17 deletions(-)

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 3d91bcfc180d..4ca6f5f95d66 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -227,6 +227,10 @@ extern asmlinkage long sys_kexec_load(unsigned long entry,
                                        struct kexec_segment __user *segments,
                                        unsigned long flags);
 extern int kernel_kexec(void);
+int kexec_locate_mem_hole(struct kimage *image, unsigned long size,
+                         unsigned long align, unsigned long min_addr,
+                         unsigned long max_addr, bool top_down,
+                         unsigned long *addr);
 extern int kexec_add_buffer(struct kimage *image, char *buffer,
                            unsigned long bufsz, unsigned long memsz,
                            unsigned long buf_align, unsigned long buf_min,
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index b1f1f6402518..85a515511925 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -449,6 +449,46 @@ int __weak arch_kexec_walk_mem(struct kexec_buf *kbuf,
        return walk_system_ram_res(0, ULONG_MAX, kbuf, func);
 }

+/**
+ * kexec_locate_mem_hole - find free memory to load segment or use in purgatory
+ * @image:     kexec image being updated.
+ * @size:      Memory size.
+ * @align:     Minimum alignment needed.
+ * @min_addr:  Minimum starting address.
+ * @max_addr:  Maximum end address.
+ * @top_down:  Find the highest free memory region?
+ * @addr:      On success, will have start address of the memory region found.
+ *
+ * Return: 0 on success, negative errno on error.
+ */
+int kexec_locate_mem_hole(struct kimage *image, unsigned long size,
+                         unsigned long align, unsigned long min_addr,
+                         unsigned long max_addr, bool top_down,
+                         unsigned long *addr)
+{
+       int ret;
+       struct kexec_buf buf;
+
+       memset(&buf, 0, sizeof(struct kexec_buf));
+       buf.image = image;
+
+       buf.memsz = size;
+       buf.buf_align = align;
+       buf.buf_min = min_addr;
+       buf.buf_max = max_addr;
+       buf.top_down = top_down;
+
+       ret = arch_kexec_walk_mem(&buf, locate_mem_hole_callback);
+       if (ret != 1) {
+               /* A suitable memory range could not be found for buffer */
+               return -EADDRNOTAVAIL;
+       }
+
+       *addr = buf.mem;
+
+       return 0;
+}
+
 /*
  * Helper function for placing a buffer in a kexec segment. This assumes
  * that kexec_mutex is held.
@@ -460,8 +500,8 @@ int kexec_add_buffer(struct kimage *image, char *buffer, unsigned long bufsz,
 {

        struct kexec_segment *ksegment;
-       struct kexec_buf buf, *kbuf;
        int ret;
+       unsigned long addr, align, size;

        /* Currently adding segment this way is allowed only in file mode */
        if (!image->file_mode)
@@ -482,29 +522,21 @@ int kexec_add_buffer(struct kimage *image, char *buffer, unsigned long bufsz,
                return -EINVAL;
        }

-       memset(&buf, 0, sizeof(struct kexec_buf));
-       kbuf = &buf;
-       kbuf->image = image;
-
-       kbuf->memsz = ALIGN(memsz, PAGE_SIZE);
-       kbuf->buf_align = max(buf_align, PAGE_SIZE);
-       kbuf->buf_min = buf_min;
-       kbuf->buf_max = buf_max;
-       kbuf->top_down = top_down;
+       size = ALIGN(memsz, PAGE_SIZE);
+       align = max(buf_align, PAGE_SIZE);

        /* Walk the RAM ranges and allocate a suitable range for the buffer */
-       ret = arch_kexec_walk_mem(kbuf, locate_mem_hole_callback);
-       if (ret != 1) {
-               /* A suitable memory range could not be found for buffer */
-               return -EADDRNOTAVAIL;
-       }
+       ret = kexec_locate_mem_hole(image, size, align, buf_min, buf_max,
+                                   top_down, &addr);
+       if (ret)
+               return ret;

        /* Found a suitable memory range */
        ksegment = &image->segment[image->nr_segments];
        ksegment->kbuf = buffer;
        ksegment->bufsz = bufsz;
-       ksegment->mem = kbuf->mem;
-       ksegment->memsz = kbuf->memsz;
+       ksegment->mem = addr;
+       ksegment->memsz = size;
        image->nr_segments++;
        *load_addr = ksegment->mem;
        return 0;
--
1.9.1
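Before searching for a hole, kexec_add_buffer normalizes the request: the segment size is rounded up to a whole page and the alignment is raised to at least a page. A small self-contained sketch of that normalization; PAGE_SIZE_SKETCH and the function names are illustrative stand-ins, and align_up assumes a power-of-two alignment just like the kernel's ALIGN():

```c
#include <assert.h>

#define PAGE_SIZE_SKETCH 4096UL

/* Round x up to a power-of-two alignment a, as the kernel's ALIGN() does. */
static unsigned long align_up(unsigned long x, unsigned long a)
{
	return (x + a - 1) & ~(a - 1);
}

/* The two normalizations applied before the memory walk: page-round the
 * reserved size, and never align to less than a page. */
static void normalize_segment(unsigned long memsz, unsigned long buf_align,
			      unsigned long *size, unsigned long *align)
{
	*size = align_up(memsz, PAGE_SIZE_SKETCH);
	*align = buf_align > PAGE_SIZE_SKETCH ? buf_align : PAGE_SIZE_SKETCH;
}
```

This is why ksegment->memsz can exceed ksegment->bufsz: the reservation is page-granular even when the buffer isn't.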
[PATCH v3 4/9] powerpc: Factor out relocation code from module_64.c to elf_util_64.c.
The kexec_file_load system call needs to relocate the purgatory, so factor out the module relocation code so that it can be shared. This patch's purpose is to move the ELF relocation logic from apply_relocate_add to elf_util_64.c with as few changes as possible. The following changes were needed:

To avoid having module-specific code in a general purpose utility function, struct elf_info was created to contain the information needed for ELF binaries manipulation. my_r2, stub_for_addr and create_stub were changed to use it instead of having to receive a struct module, since they are called from elf64_apply_relocate_add. local_entry_offset and squash_toc_save_inst were only used by apply_relocate_add, so they were moved to elf_util_64.c as well.

Signed-off-by: Thiago Jung Bauermann
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: Torsten Duwe
Cc: ke...@lists.infradead.org
Cc: linux-ker...@vger.kernel.org
---
 arch/powerpc/include/asm/elf_util.h |  70
 arch/powerpc/include/asm/module.h   |  14 +-
 arch/powerpc/kernel/Makefile        |   4 +
 arch/powerpc/kernel/elf_util_64.c   | 269 +++
 arch/powerpc/kernel/module_64.c     | 312
 5 files changed, 386 insertions(+), 283 deletions(-)

diff --git a/arch/powerpc/include/asm/elf_util.h b/arch/powerpc/include/asm/elf_util.h
new file mode 100644
index ..37372559fe62
--- /dev/null
+++ b/arch/powerpc/include/asm/elf_util.h
@@ -0,0 +1,70 @@
+/*
+ * Utility functions to work with ELF files.
+ *
+ * Copyright (C) 2016, IBM Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef _ASM_POWERPC_ELF_UTIL_H
+#define _ASM_POWERPC_ELF_UTIL_H
+
+#include
+
+struct elf_info {
+       struct elf_shdr *sechdrs;
+
+       /* Index of stubs section. */
+       unsigned int stubs_section;
+       /* Index of TOC section. */
+       unsigned int toc_section;
+};
+
+#ifdef __powerpc64__
+#ifdef PPC64_ELF_ABI_v2
+
+/* An address is simply the address of the function. */
+typedef unsigned long func_desc_t;
+#else
+
+/* An address is address of the OPD entry, which contains address of fn. */
+typedef struct ppc64_opd_entry func_desc_t;
+#endif /* PPC64_ELF_ABI_v2 */
+
+/* Like PPC32, we need little trampolines to do > 24-bit jumps (into
+   the kernel itself). But on PPC64, these need to be used for every
+   jump, actually, to reset r2 (TOC+0x8000). */
+struct ppc64_stub_entry
+{
+       /* 28 byte jump instruction sequence (7 instructions). We only
+        * need 6 instructions on ABIv2 but we always allocate 7 so
+        * so we don't have to modify the trampoline load instruction. */
+       u32 jump[7];
+       /* Used by ftrace to identify stubs */
+       u32 magic;
+       /* Data for the above code */
+       func_desc_t funcdata;
+};
+#endif
+
+/* r2 is the TOC pointer: it actually points 0x8000 into the TOC (this
+   gives the value maximum span in an instruction which uses a signed
+   offset) */
+static inline unsigned long my_r2(const struct elf_info *elf_info)
+{
+       return elf_info->sechdrs[elf_info->toc_section].sh_addr + 0x8000;
+}
+
+int elf64_apply_relocate_add(const struct elf_info *elf_info,
+                            const char *strtab, unsigned int symindex,
+                            unsigned int relsec, const char *obj_name);
+
+#endif /* _ASM_POWERPC_ELF_UTIL_H */
diff --git a/arch/powerpc/include/asm/module.h b/arch/powerpc/include/asm/module.h
index cd4ffd86765f..f2073115d518 100644
--- a/arch/powerpc/include/asm/module.h
+++ b/arch/powerpc/include/asm/module.h
@@ -12,7 +12,14 @@
 #include
 #include
 #include
+#include

+/* Both low and high 16 bits are added as SIGNED additions, so if low
+   16 bits has high bit set, high 16 bits must be adjusted. These
+   macros do that (stolen from binutils). */
+#define PPC_LO(v) ((v) & 0xffff)
+#define PPC_HI(v) (((v) >> 16) & 0xffff)
+#define PPC_HA(v) PPC_HI ((v) + 0x8000)

 #ifndef __powerpc64__
 /*
@@ -33,8 +40,7 @@ struct ppc_plt_entry {

 struct mod_arch_specific {
 #ifdef __powerpc64__
-       unsigned int stubs_section;     /* Index of stubs section in module */
-       unsigned int toc_section;       /* What section is the TOC? */
+       struct elf_info elf_info;
        bool toc_fixed;                 /* Have we fixed up .TOC.? */
 #ifdef CONFIG_DYNAMIC_FTRACE
        unsigned long toc;
@@ -90,6 +96,10 @@
[PATCH v3 2/9] kexec_file: Generalize kexec_add_buffer.
Allow architectures to specify different memory walking functions for kexec_add_buffer. Intel uses iomem to track reserved memory ranges, but PowerPC uses the memblock subsystem.

Signed-off-by: Thiago Jung Bauermann
Cc: Eric Biederman
Cc: Dave Young
Cc: ke...@lists.infradead.org
Cc: linux-ker...@vger.kernel.org
---
 include/linux/kexec.h   | 19 ++-
 kernel/kexec_file.c     | 30 ++
 kernel/kexec_internal.h | 14 --
 3 files changed, 40 insertions(+), 23 deletions(-)

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index e8acb2b43dd9..3d91bcfc180d 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -146,7 +146,24 @@ struct kexec_file_ops {
        kexec_verify_sig_t *verify_sig;
 #endif
 };
-#endif
+
+/*
+ * Keeps track of buffer parameters as provided by caller for requesting
+ * memory placement of buffer.
+ */
+struct kexec_buf {
+       struct kimage *image;
+       unsigned long mem;
+       unsigned long memsz;
+       unsigned long buf_align;
+       unsigned long buf_min;
+       unsigned long buf_max;
+       bool top_down;          /* allocate from top of memory hole */
+};
+
+int __weak arch_kexec_walk_mem(struct kexec_buf *kbuf,
+                              int (*func)(u64, u64, void *));
+#endif /* CONFIG_KEXEC_FILE */

 struct kimage {
        kimage_entry_t head;
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index b6eec7527e9f..b1f1f6402518 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -428,6 +428,27 @@ static int locate_mem_hole_callback(u64 start, u64 end, void *arg)
        return locate_mem_hole_bottom_up(start, end, kbuf);
 }

+/**
+ * arch_kexec_walk_mem - call func(data) on free memory regions
+ * @kbuf:      Context info for the search. Also passed to @func.
+ * @func:      Function to call for each memory region.
+ *
+ * Return: The memory walk will stop when func returns a non-zero value
+ * and that value will be returned. If all free regions are visited without
+ * func returning non-zero, then zero will be returned.
+ */
+int __weak arch_kexec_walk_mem(struct kexec_buf *kbuf,
+                              int (*func)(u64, u64, void *))
+{
+       if (kbuf->image->type == KEXEC_TYPE_CRASH)
+               return walk_iomem_res_desc(crashk_res.desc,
+                                          IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY,
+                                          crashk_res.start, crashk_res.end,
+                                          kbuf, func);
+       else
+               return walk_system_ram_res(0, ULONG_MAX, kbuf, func);
+}
+
 /*
  * Helper function for placing a buffer in a kexec segment. This assumes
  * that kexec_mutex is held.
@@ -472,14 +493,7 @@ int kexec_add_buffer(struct kimage *image, char *buffer, unsigned long bufsz,
        kbuf->top_down = top_down;

        /* Walk the RAM ranges and allocate a suitable range for the buffer */
-       if (image->type == KEXEC_TYPE_CRASH)
-               ret = walk_iomem_res_desc(crashk_res.desc,
-                               IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY,
-                               crashk_res.start, crashk_res.end, kbuf,
-                               locate_mem_hole_callback);
-       else
-               ret = walk_system_ram_res(0, -1, kbuf,
-                                         locate_mem_hole_callback);
+       ret = arch_kexec_walk_mem(kbuf, locate_mem_hole_callback);
        if (ret != 1) {
                /* A suitable memory range could not be found for buffer */
                return -EADDRNOTAVAIL;
        }
diff --git a/kernel/kexec_internal.h b/kernel/kexec_internal.h
index eefd5bf960c2..4cef7e4706b0 100644
--- a/kernel/kexec_internal.h
+++ b/kernel/kexec_internal.h
@@ -20,20 +20,6 @@ struct kexec_sha_region {
        unsigned long len;
 };

-/*
- * Keeps track of buffer parameters as provided by caller for requesting
- * memory placement of buffer.
- */
-struct kexec_buf {
-       struct kimage *image;
-       unsigned long mem;
-       unsigned long memsz;
-       unsigned long buf_align;
-       unsigned long buf_min;
-       unsigned long buf_max;
-       bool top_down;          /* allocate from top of memory hole */
-};
-
 void kimage_file_post_load_cleanup(struct kimage *image);
 #else /* CONFIG_KEXEC_FILE */
 static inline void kimage_file_post_load_cleanup(struct kimage *image) { }
--
1.9.1
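The kernel-doc above pins down the walk contract: func is called once per free region, the walk stops at the first non-zero return and propagates that value, and zero means every region was visited. A self-contained sketch of that contract over a static range table; every name here is an illustrative stand-in for the resource-walking helpers the kernel actually uses:

```c
#include <assert.h>

/* Illustrative free-memory ranges; in the kernel these come from
 * walk_system_ram_res() or the crash kernel resource. */
struct range_sketch { unsigned long long start, end; };

/* Mirror of the documented contract: call func on each region, stop
 * early on the first non-zero return and propagate it, else return 0. */
static int walk_ranges(const struct range_sketch *r, int n,
		       int (*func)(unsigned long long, unsigned long long, void *),
		       void *data)
{
	int i, ret;

	for (i = 0; i < n; i++) {
		ret = func(r[i].start, r[i].end, data);
		if (ret)
			return ret;
	}
	return 0;
}

/* Sample callback: return 1 (stop) on the first region large enough. */
static int find_big_enough(unsigned long long start, unsigned long long end,
			   void *data)
{
	unsigned long long need = *(unsigned long long *)data;
	return (end - start + 1 >= need) ? 1 : 0;
}
```

This is exactly why kexec_add_buffer treats a return of 1 from the walk as success: locate_mem_hole_callback returns 1 once it finds a hole, which short-circuits the walk.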
[PATCH v3 1/9] kexec_file: Remove unused members from struct kexec_buf.
kexec_add_buffer uses kexec_buf.buffer and kexec_buf.bufsz to pass along its own arguments buffer and bufsz, but since they aren't used anywhere else, it's pointless.

Signed-off-by: Thiago Jung Bauermann
Cc: Eric Biederman
Cc: ke...@lists.infradead.org
Cc: linux-ker...@vger.kernel.org
Acked-by: Dave Young
---
 kernel/kexec_file.c     | 6 ++
 kernel/kexec_internal.h | 2 --
 2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 01ab82a40d22..b6eec7527e9f 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -464,8 +464,6 @@ int kexec_add_buffer(struct kimage *image, char *buffer, unsigned long bufsz,
        memset(&buf, 0, sizeof(struct kexec_buf));
        kbuf = &buf;
        kbuf->image = image;
-       kbuf->buffer = buffer;
-       kbuf->bufsz = bufsz;

        kbuf->memsz = ALIGN(memsz, PAGE_SIZE);
        kbuf->buf_align = max(buf_align, PAGE_SIZE);
@@ -489,8 +487,8 @@ int kexec_add_buffer(struct kimage *image, char *buffer, unsigned long bufsz,

        /* Found a suitable memory range */
        ksegment = &image->segment[image->nr_segments];
-       ksegment->kbuf = kbuf->buffer;
-       ksegment->bufsz = kbuf->bufsz;
+       ksegment->kbuf = buffer;
+       ksegment->bufsz = bufsz;
        ksegment->mem = kbuf->mem;
        ksegment->memsz = kbuf->memsz;
        image->nr_segments++;
diff --git a/kernel/kexec_internal.h b/kernel/kexec_internal.h
index 0a52315d9c62..eefd5bf960c2 100644
--- a/kernel/kexec_internal.h
+++ b/kernel/kexec_internal.h
@@ -26,8 +26,6 @@ struct kexec_sha_region {
  */
 struct kexec_buf {
        struct kimage *image;
-       char *buffer;
-       unsigned long bufsz;
        unsigned long mem;
        unsigned long memsz;
        unsigned long buf_align;
--
1.9.1
[PATCH v3 0/9] kexec_file_load implementation for PowerPC
Hello,

This patch series implements the kexec_file_load system call on PowerPC. This system call moves the reading of the kernel, initrd and the device tree from the userspace kexec tool to the kernel. This is needed if you want to do one or both of the following:

1. only allow loading of signed kernels.
2. "measure" (i.e., record the hashes of) the kernel, initrd, kernel command line and other boot inputs for the Integrity Measurement Architecture subsystem.

The above are the functions kexec_file_load already has built in. Yesterday I posted a set of patches which allows a third feature:

3. have IMA pass on its event log (where integrity measurements are registered) across kexec to the second kernel, so that the event history is preserved.

Because OpenPOWER uses an intermediary Linux instance as a boot loader (skiroot), feature 1 is needed to implement secure boot for the platform, while features 2 and 3 are needed to implement trusted boot.

This patch series starts by removing an x86 assumption from kexec_file: kexec_add_buffer uses iomem to find reserved memory ranges, but PowerPC uses the memblock subsystem. A hook is added so that each arch can specify how memory ranges can be found. Also, the memory-walking logic in kexec_add_buffer is useful in this implementation to find a free area for the purgatory's stack, so the next patch moves that logic to kexec_locate_mem_hole.

The kexec_file_load system call needs to apply relocations to the purgatory, but adding code for that would duplicate functionality with the module loading mechanism, which also needs to apply relocations to kernel modules. Therefore, this patch series factors out the module relocation code so that it can be shared.

One thing that is still missing is crashkernel support, which I intend to submit shortly. For now, arch_kexec_kernel_image_probe rejects crash kernels.

This code is based on kexec-tools, but with many modifications to adapt it to the kernel environment and facilities.
The exception is the purgatory, which has only minimal changes.

Changes for v3:
- Rebased series on today's powerpc/next.
- Patch "kexec_file: Generalize kexec_add_buffer.":
  - Removed most arguments from arch_kexec_walk_mem and pass kbuf explicitly.
- Patch "powerpc: Add functions to read ELF files of any endianness.":
  - Fixed whitespace issues found by checkpatch.pl.
- Patch "powerpc: Factor out relocation code from module_64.c to elf_util_64.c.":
  - Changed to use the new PPC64_ELF_ABI_v2 macro.
- Patch "powerpc: Add support for loading ELF kernels with kexec_file_load.":
  - Adapted arch_kexec_walk_mem implementation to changes in its argument list.
  - Fixed whitespace and GPL header issues found by checkpatch.pl.
- Patch "powerpc: Add purgatory for kexec_file_load implementation.":
  - Fixed whitespace and GPL header issues found by checkpatch.pl.
  - Changed to use the new PPC64_ELF_ABI_v2 macro.

Changes for v2:
- All patches: forgot to add Signed-off-by lines in v1, so added them now.
- Patch "kexec_file: Generalize kexec_add_buffer.": broke in two, one adding arch_kexec_walk_mem and the other adding kexec_locate_mem_hole.
- Patch "powerpc: Implement kexec_file_load.":
  - Moved relocation changes and the arch_kexec_walk_mem implementation to the next patch in the series.
  - Removed pr_fmt from machine_kexec_64.c, since the patch doesn't add any call to pr_debug in that file.
  - Changed arch_kexec_kernel_image_probe to reject crash kernels.
Re: [PATCH 1/2] workqueue: Move wq_update_unbound_numa() to the beginning of CPU_ONLINE
On Tue, Jun 21, 2016 at 03:43:56PM -0400, Tejun Heo wrote: > On Tue, Jun 21, 2016 at 09:37:09PM +0200, Peter Zijlstra wrote: > > Hurm.. So I've applied it, just to get this issue sorted, but I'm not > > entirely sure I like it. > > > > I think I prefer ego's version because that makes it harder to get stuff > > to run on !active,online cpus. I think we really want to be careful what > > gets to run during that state. > > The original patch just did set_cpus_allowed one more time late enough > so that the target kthread (in most cases) doesn't have to go through > fallback rq selection afterwards. I don't know what the long term > solution is but CPU_ONLINE callbacks should be able to bind kthreads > to the new CPU one way or the other. Fair enough; clearly I need to stare harder. In any case, patch is on its way to sched/urgent. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 1/2] workqueue: Move wq_update_unbound_numa() to the beginning of CPU_ONLINE
On Tue, Jun 21, 2016 at 09:37:09PM +0200, Peter Zijlstra wrote: > Hurm.. So I've applied it, just to get this issue sorted, but I'm not > entirely sure I like it. > > I think I prefer ego's version because that makes it harder to get stuff > to run on !active,online cpus. I think we really want to be careful what > gets to run during that state. The original patch just did set_cpus_allowed one more time late enough so that the target kthread (in most cases) doesn't have to go through fallback rq selection afterwards. I don't know what the long term solution is but CPU_ONLINE callbacks should be able to bind kthreads to the new CPU one way or the other. Thanks. -- tejun ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 1/2] workqueue: Move wq_update_unbound_numa() to the beginning of CPU_ONLINE
On Tue, Jun 21, 2016 at 11:36:51AM -0400, Tejun Heo wrote: > On Tue, Jun 21, 2016 at 07:42:31PM +0530, Gautham R Shenoy wrote: > > > Subject: [PATCH] sched: allow kthreads to fallback to online && !active > > > cpus > > > > > > During CPU hotplug, CPU_ONLINE callbacks are run while the CPU is > > > online but not active. A CPU_ONLINE callback may create or bind a > > > kthread so that its cpus_allowed mask only allows the CPU which is > > > being brought online. The kthread may start executing before the CPU > > > is made active and can end up in select_fallback_rq(). > > > > > > In such cases, the expected behavior is selecting the CPU which is > > > coming online; however, because select_fallback_rq() only chooses from > > > active CPUs, it determines that the task doesn't have any viable CPU > > > in its allowed mask and ends up overriding it to cpu_possible_mask. > > > > > > CPU_ONLINE callbacks should be able to put kthreads on the CPU which > > > is coming online. Update select_fallback_rq() so that it follows > > > cpu_online() rather than cpu_active() for kthreads. > > > > > > Signed-off-by: Tejun Heo> > > Reported-by: Gautham R Shenoy > > > > Hi Tejun, > > > > This patch fixes the issue on POWER. I am able to see the worker > > threads of the unbound workqueues of the newly onlined node with this. > > > > Tested-by: Gautham R. Shenoy > > Peter? Hurm.. So I've applied it, just to get this issue sorted, but I'm not entirely sure I like it. I think I prefer ego's version because that makes it harder to get stuff to run on !active,online cpus. I think we really want to be careful what gets to run during that state. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
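The one-line semantic change in the patch quoted above is which mask gates the fallback choice: kthreads may fall back to online CPUs, while everything else stays restricted to active CPUs. A standalone sketch of that selection rule, with CPU masks as plain bitmasks — all names here are hypothetical, not the scheduler's API:

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit i set => CPU i is a member of the mask. */
typedef uint32_t cpumask_t;

/* Pick the first allowed CPU that is usable for this task.
 * Kthreads may run on online (even not-yet-active) CPUs; user
 * tasks must stick to active CPUs. */
static int fallback_cpu(cpumask_t allowed, cpumask_t online,
                        cpumask_t active, bool is_kthread)
{
    cpumask_t usable = allowed & (is_kthread ? online : active);

    if (!usable)
        return -1;   /* caller would widen to cpu_possible_mask */

    for (int cpu = 0; cpu < 32; cpu++)
        if (usable & (1u << cpu))
            return cpu;
    return -1;
}
```

With CPU 2 online but not yet active, a kthread bound to CPU 2 gets CPU 2, while a user task bound the same way would fall through to the "override to cpu_possible_mask" path — exactly the scenario in the changelog.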
Re: [PATCH 7/8] dmaengine: tegra20-apb-dma: Only calculate residue if txstate exists.
On 21/06/16 17:01, Vinod Koul wrote: > On Wed, Jun 08, 2016 at 09:51:57AM +0100, Jon Hunter wrote: >> Hi Peter, >> >> On 07/06/16 18:38, Peter Griffin wrote: >>> There is no point calculating the residue if there is >>> no txstate to store the value. >>> >>> Signed-off-by: Peter Griffin>>> --- >>> drivers/dma/tegra20-apb-dma.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/drivers/dma/tegra20-apb-dma.c b/drivers/dma/tegra20-apb-dma.c >>> index 01e316f..7f4af8c 100644 >>> --- a/drivers/dma/tegra20-apb-dma.c >>> +++ b/drivers/dma/tegra20-apb-dma.c >>> @@ -814,7 +814,7 @@ static enum dma_status tegra_dma_tx_status(struct >>> dma_chan *dc, >>> unsigned int residual; >>> >>> ret = dma_cookie_status(dc, cookie, txstate); >>> - if (ret == DMA_COMPLETE) >>> + if (ret == DMA_COMPLETE || !txstate) >>> return ret; >> >> Thanks for reporting this. I agree that we should not do this, however, >> looking at the code for Tegra, I am wondering if this could change the >> actual state that is returned. Looking at dma_cookie_status() it will >> call dma_async_is_complete() which will return either DMA_COMPLETE or >> DMA_IN_PROGRESS. It could be possible that the actual state for the >> DMA transfer in the tegra driver is DMA_ERROR, so I am wondering if we >> should do something like the following ... > > This one is stopping code execution when residue is not valid. Do notice > that it check for DMA_COMPLETE OR txstate. In other cases, wit will return > 'that' state when txstate is NULL. Sorry what do you mean by "this one"? My point is that if the status is not DMA_COMPLETE, then it is possible that it could be DMA_ERROR (for tegra that is). However, dma_cookie_status will only return DMA_IN_PROGRESS or DMA_COMPLETE and so if 'txstate' is NULL we will not see the DMA_ERROR status anymore and just think it is in progress when it is actually an error. 
I do agree that the driver is broken as we are not checking for !txstate, but this also changes the behaviour a bit. Cheers Jon -- nvpublic ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
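Jon's concern can be stated in a few lines of C: a helper that only ever reports COMPLETE or IN_PROGRESS will mask a driver-private ERROR state if we return early on `!txstate`. A sketch with illustrative names, not the real dmaengine API:

```c
enum dma_status { DMA_COMPLETE, DMA_IN_PROGRESS, DMA_ERROR };

/* Models dma_cookie_status(): it only ever reports COMPLETE or
 * IN_PROGRESS; it knows nothing about the driver's error state. */
static enum dma_status cookie_status(int done)
{
    return done ? DMA_COMPLETE : DMA_IN_PROGRESS;
}

/* The early-return-on-!txstate variant: if there is nowhere to
 * store a residue, return the generic status immediately -- which
 * can hide a driver-visible DMA_ERROR held in the descriptor. */
static enum dma_status tx_status(int done, int have_txstate,
                                 enum dma_status driver_state)
{
    enum dma_status ret = cookie_status(done);

    if (ret == DMA_COMPLETE || !have_txstate)
        return ret;

    return driver_state;   /* full descriptor-lookup path */
}
```

When the descriptor actually holds `DMA_ERROR`, a `NULL` txstate caller now sees `DMA_IN_PROGRESS` — the behaviour change Jon is pointing out.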
Re: [PATCH v3] tools/perf: Fix the mask in regs_dump__printf and print_sample_iregs
On Tue, Jun 21, 2016 at 08:26:40PM +0530, Madhavan Srinivasan wrote: > When decoding the perf_regs mask in regs_dump__printf(), > we loop through the mask using find_first_bit and find_next_bit functions. > "mask" is of type "u64", but sent as a "unsigned long *" to > lib functions along with sizeof(). > > While the existing code works fine in most cases, > the logic is broken when using a 32bit perf on a 64bit kernel (Big Endian). > When reading u64 using ((u32 *)&mask)[0], perf (lib/find_*_bit()) assumes it > gets > lower 32bits of u64 which is wrong. Proposed fix is to swap the words > of the u64 to handle this case. This is _not_ an endianness swap. > > Suggested-by: Yury Norov > Cc: Yury Norov > Cc: Peter Zijlstra > Cc: Ingo Molnar > Cc: Arnaldo Carvalho de Melo > Cc: Alexander Shishkin > Cc: Jiri Olsa > Cc: Adrian Hunter > Cc: Kan Liang > Cc: Wang Nan > Cc: Michael Ellerman > Signed-off-by: Madhavan Srinivasan > --- > Changelog v2: > 1) Moved the swap code to a common function > 2) Added more comments in the code > > Changelog v1: > 1) updated commit message and patch subject > 2) Add the fix to print_sample_iregs() in builtin-script.c > > tools/include/linux/bitmap.h | 9 + What about include/linux/bitmap.h? I think we'd place it there first. > tools/perf/builtin-script.c | 16 +++- > tools/perf/util/session.c | 16 +++- > 3 files changed, 39 insertions(+), 2 deletions(-) > > diff --git a/tools/include/linux/bitmap.h b/tools/include/linux/bitmap.h > index 28f5493da491..79998b26eb04 100644 > --- a/tools/include/linux/bitmap.h > +++ b/tools/include/linux/bitmap.h > @@ -2,6 +2,7 @@ > #define _PERF_BITOPS_H > > #include > +#include > #include > > #define DECLARE_BITMAP(name,bits) \ > @@ -22,6 +23,14 @@ void __bitmap_or(unsigned long *dst, const unsigned long > *bitmap1, > #define small_const_nbits(nbits) \ > (__builtin_constant_p(nbits) && (nbits) <= BITS_PER_LONG) > > +static inline void bitmap_from_u64(unsigned long *_mask, u64 mask) Inline is not required.
Some people don't like it. An underscored parameter in a function declaration is not the best idea either. Try: static void bitmap_from_u64(unsigned long *bitmap, u64 mask) > +{ > + _mask[0] = mask & ULONG_MAX; > + > + if (sizeof(mask) > sizeof(unsigned long)) > + _mask[1] = mask >> 32; > +} > + > static inline void bitmap_zero(unsigned long *dst, int nbits) > { > if (small_const_nbits(nbits)) > diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c > index e3ce2f34d3ad..73928310fd91 100644 > --- a/tools/perf/builtin-script.c > +++ b/tools/perf/builtin-script.c > @@ -412,11 +412,25 @@ static void print_sample_iregs(struct perf_sample > *sample, > struct regs_dump *regs = &sample->intr_regs; > uint64_t mask = attr->sample_regs_intr; > unsigned i = 0, r; > + unsigned long _mask[sizeof(mask)/sizeof(unsigned long)]; If we start with it, I think we'd hide the declaration machinery as well: #define DECLARE_L64_BITMAP(__name) unsigned long __name[sizeof(u64)/sizeof(unsigned long)] or #define L64_BITMAP_SIZE (sizeof(u64)/sizeof(unsigned long)) Or both :) Whatever you prefer. > > if (!regs) > return; > > - for_each_set_bit(r, (unsigned long *)&mask, sizeof(mask) * 8) { > + /* > + * Since u64 is passed as 'unsigned long *', check > + * to see whether we need to swap words within u64. > + * Reason being, in 32 bit big endian userspace on a > + * 64bit kernel, 'unsigned long' is 32 bits. > + * When reading u64 using ((u32 *)&mask)[0] and ((u32 *)&mask)[1], > + * we will get wrong value for the mask. This is what > + * find_first_bit() and find_next_bit() is doing. > + * Issue here is "((u32 *)&mask)[0]" gets upper 32 bits of u64, > + * but perf assumes it gets lower 32bits of u64. Hence the check > + * and swap.
> + */ > + bitmap_from_u64(_mask, mask); > + for_each_set_bit(r, _mask, sizeof(mask) * 8) { > u64 val = regs->regs[i++]; > printf("%5s:0x%"PRIx64" ", perf_reg_name(r), val); > } > diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c > index 5214974e841a..1337b1c73f82 100644 > --- a/tools/perf/util/session.c > +++ b/tools/perf/util/session.c > @@ -940,8 +940,22 @@ static void branch_stack__printf(struct perf_sample > *sample) > static void regs_dump__printf(u64 mask, u64 *regs) > { > unsigned rid, i = 0; > + unsigned long _mask[sizeof(mask)/sizeof(unsigned long)]; > > - for_each_set_bit(rid, (unsigned long *)&mask, sizeof(mask) * 8) { > + /* > + * Since u64 is passed as 'unsigned long *', check
Re: [PATCH] ppc: Fix BPF JIT for ABIv2
On 6/21/16 7:47 AM, Thadeu Lima de Souza Cascardo wrote: The calling convention is different with ABIv2 and so we'll need changes in bpf_slow_path_common() and sk_negative_common(). How big would those changes be? Do we know? How come no one reported this was broken previously? This is the first I've heard of it being broken. I just heard of it less than two weeks ago, and only could investigate it last week, when I realized mainline was also affected. It looks like the little-endian support for classic JIT was done before the conversion to ABIv2. And as JIT is disabled by default, no one seems to have exercised it. It's not a surprise, unfortunately. The JITs that were written before test_bpf.ko was developed were missing corner cases. Typical tcpdump would be fine, but fragmented packets, negative offsets and out-of-bounds accesses wouldn't be handled correctly. I'd suggest validating the stable backport with test_bpf as well.
Re: [PATCH] ibmvnic: fix to use list_for_each_safe() when delete items
On 06/20/2016 10:50 AM, Thomas Falcon wrote: > On 06/17/2016 09:53 PM, weiyj...@163.com wrote: >> From: Wei Yongjun >> >> Since we will remove items off the list using list_del() we need >> to use a safe version of the list_for_each() macro aptly named >> list_for_each_safe(). >> >> Signed-off-by: Wei Yongjun >> --- >> drivers/net/ethernet/ibm/ibmvnic.c | 10 +++++----- >> 1 file changed, 5 insertions(+), 5 deletions(-) >> >> diff --git a/drivers/net/ethernet/ibm/ibmvnic.c >> b/drivers/net/ethernet/ibm/ibmvnic.c >> index 864cb21..0b6a922 100644 >> --- a/drivers/net/ethernet/ibm/ibmvnic.c >> +++ b/drivers/net/ethernet/ibm/ibmvnic.c >> @@ -3141,14 +3141,14 @@ static void handle_request_ras_comp_num_rsp(union >> ibmvnic_crq *crq, >> >> static void ibmvnic_free_inflight(struct ibmvnic_adapter *adapter) >> { >> -struct ibmvnic_inflight_cmd *inflight_cmd; >> +struct ibmvnic_inflight_cmd *inflight_cmd, *tmp1; >> struct device *dev = &adapter->vdev->dev; >> -struct ibmvnic_error_buff *error_buff; >> +struct ibmvnic_error_buff *error_buff, *tmp2; >> unsigned long flags; >> unsigned long flags2; >> >> spin_lock_irqsave(&adapter->inflight_lock, flags); >> -list_for_each_entry(inflight_cmd, &adapter->inflight, list) { >> +list_for_each_entry_safe(inflight_cmd, tmp1, &adapter->inflight, list) { >> switch (inflight_cmd->crq.generic.cmd) { >> case LOGIN: >> dma_unmap_single(dev, adapter->login_buf_token, >> @@ -3165,8 +3165,8 @@ static void ibmvnic_free_inflight(struct >> ibmvnic_adapter *adapter) >> break; >> case REQUEST_ERROR_INFO: >> spin_lock_irqsave(&adapter->error_list_lock, flags2); >> -list_for_each_entry(error_buff, &adapter->errors, >> -list) { >> +list_for_each_entry_safe(error_buff, tmp2, >> + &adapter->errors, list) { >> dma_unmap_single(dev, error_buff->dma, >> error_buff->len, >> DMA_FROM_DEVICE); >> > Thanks! > > Acked-by: Thomas Falcon Hello, I apologize for prematurely ack'ing this. There is another situation where you could use list_for_each_entry_safe in the function handle_error_info_rsp.
Could you include this in your patch, please? diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index 864cb21..e9968d9 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -2121,7 +2121,7 @@ static void handle_error_info_rsp(union ibmvnic_crq *crq, struct ibmvnic_adapter *adapter) { struct device *dev = &adapter->vdev->dev; - struct ibmvnic_error_buff *error_buff; + struct ibmvnic_error_buff *error_buff, *tmp; unsigned long flags; bool found = false; int i; @@ -2133,7 +2133,7 @@ static void handle_error_info_rsp(union ibmvnic_crq *crq, } spin_lock_irqsave(&adapter->error_list_lock, flags); - list_for_each_entry(error_buff, &adapter->errors, list) + list_for_each_entry_safe(error_buff, tmp, &adapter->errors, list) if (error_buff->error_id == crq->request_error_rsp.error_id) { found = true; list_del(&error_buff->list);
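For reference, the reason the `_safe` iterator is needed at all: the plain iterator dereferences the current node to find its successor, which is a use-after-free once `list_del()`/`free()` has run on that node. A minimal userspace model of the pattern, using a simplified singly-linked list rather than the kernel's `list_head` — all names are illustrative:

```c
#include <stdlib.h>

struct node {
    int id;
    struct node *next;
};

static struct node *push(struct node *head, int id)
{
    struct node *n = malloc(sizeof(*n));
    if (!n)
        return head;
    n->id = id;
    n->next = head;
    return n;
}

/* Delete every entry. The "safe" pattern caches the successor in a
 * second cursor BEFORE unlinking/freeing the current node -- exactly
 * why list_for_each_entry_safe() carries the extra 'tmp' argument. */
static int free_all(struct node **head)
{
    int freed = 0;
    struct node *pos = *head, *tmp;

    while (pos) {
        tmp = pos->next;   /* read next before pos is freed */
        free(pos);
        freed++;
        pos = tmp;         /* advance via the cached pointer */
    }
    *head = NULL;
    return freed;
}
```

Reading `pos->next` after `free(pos)` — the non-safe variant — is undefined behaviour, which is the bug class both hunks in this thread fix.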
Re: [PATCH 0/8] Various dmaengine cleanups
On Tue, Jun 07, 2016 at 06:38:33PM +0100, Peter Griffin wrote: > Hi Vinod, > > This series is a bunch of cleanup updates to various > dmaengine drivers, based on some of the review feeback to my fdma series. Good cleanup, Applied, thanks -- ~Vinod ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 7/8] dmaengine: tegra20-apb-dma: Only calculate residue if txstate exists.
On Wed, Jun 08, 2016 at 09:51:57AM +0100, Jon Hunter wrote: > Hi Peter, > > On 07/06/16 18:38, Peter Griffin wrote: > > There is no point calculating the residue if there is > > no txstate to store the value. > > > > Signed-off-by: Peter Griffin > > --- > > drivers/dma/tegra20-apb-dma.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/drivers/dma/tegra20-apb-dma.c b/drivers/dma/tegra20-apb-dma.c > > index 01e316f..7f4af8c 100644 > > --- a/drivers/dma/tegra20-apb-dma.c > > +++ b/drivers/dma/tegra20-apb-dma.c > > @@ -814,7 +814,7 @@ static enum dma_status tegra_dma_tx_status(struct > > dma_chan *dc, > > unsigned int residual; > > > > ret = dma_cookie_status(dc, cookie, txstate); > > - if (ret == DMA_COMPLETE) > > + if (ret == DMA_COMPLETE || !txstate) > > return ret; > > Thanks for reporting this. I agree that we should not do this, however, > looking at the code for Tegra, I am wondering if this could change the > actual state that is returned. Looking at dma_cookie_status() it will > call dma_async_is_complete() which will return either DMA_COMPLETE or > DMA_IN_PROGRESS. It could be possible that the actual state for the > DMA transfer in the tegra driver is DMA_ERROR, so I am wondering if we > should do something like the following ... This one stops code execution when the residue is not valid. Do notice that it checks for DMA_COMPLETE OR !txstate. In the other case, it will return 'that' state when txstate is NULL. I am going to apply this.
> > diff --git a/drivers/dma/tegra20-apb-dma.c b/drivers/dma/tegra20-apb-dma.c > index 01e316f73559..45edab7418d0 100644 > --- a/drivers/dma/tegra20-apb-dma.c > +++ b/drivers/dma/tegra20-apb-dma.c > @@ -822,13 +822,8 @@ static enum dma_status tegra_dma_tx_status(struct > dma_chan *dc, > /* Check on wait_ack desc status */ > list_for_each_entry(dma_desc, &tdc->free_dma_desc, node) { > if (dma_desc->txd.cookie == cookie) { > - residual = dma_desc->bytes_requested - > - (dma_desc->bytes_transferred % > - dma_desc->bytes_requested); > - dma_set_residue(txstate, residual); > ret = dma_desc->dma_status; > - spin_unlock_irqrestore(&tdc->lock, flags); > - return ret; > + goto found; > } > } > > @@ -836,17 +831,23 @@ static enum dma_status tegra_dma_tx_status(struct > dma_chan *dc, > list_for_each_entry(sg_req, &tdc->pending_sg_req, node) { > dma_desc = sg_req->dma_desc; > if (dma_desc->txd.cookie == cookie) { > - residual = dma_desc->bytes_requested - > - (dma_desc->bytes_transferred % > - dma_desc->bytes_requested); > - dma_set_residue(txstate, residual); > ret = dma_desc->dma_status; > - spin_unlock_irqrestore(&tdc->lock, flags); > - return ret; > + goto found; > } > } > > - dev_dbg(tdc2dev(tdc), "cookie %d does not found\n", cookie); > + dev_warn(tdc2dev(tdc), "cookie %d not found\n", cookie); > + spin_unlock_irqrestore(&tdc->lock, flags); > + return ret; > + > +found: > + if (txstate) { > + residual = dma_desc->bytes_requested - > + (dma_desc->bytes_transferred % > + dma_desc->bytes_requested); > + dma_set_residue(txstate, residual); > + } > + I feel this optimizes stuff, which seems okay. Feel free to send as proper patch. > spin_unlock_irqrestore(&tdc->lock, flags); > return ret; > } > > Cheers > Jon > > -- > nvpublic -- ~Vinod
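The residue computation being moved under the `if (txstate)` check is plain modular arithmetic; the modulo keeps the result sane for cyclic transfers, where `bytes_transferred` keeps growing past `bytes_requested`. A sketch — the field names mirror the driver, but the helper itself is illustrative:

```c
/* Bytes still to go for a (possibly cyclic) transfer. Taking
 * transferred modulo requested folds an ever-growing cyclic byte
 * count back into the current period before subtracting. Note the
 * driver's expression yields bytes_requested (not 0) when the count
 * is an exact multiple -- i.e. at a period boundary. */
static unsigned int dma_residue(unsigned int bytes_requested,
                                unsigned int bytes_transferred)
{
    return bytes_requested - (bytes_transferred % bytes_requested);
}
```

After the rework above, this computation runs once at the `found:` label, and only when the caller actually supplied a `txstate` to store it in.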
Re: [PATCH 1/2] workqueue: Move wq_update_unbound_numa() to the beginning of CPU_ONLINE
On Tue, Jun 21, 2016 at 07:42:31PM +0530, Gautham R Shenoy wrote: > > Subject: [PATCH] sched: allow kthreads to fallback to online && !active cpus > > > > During CPU hotplug, CPU_ONLINE callbacks are run while the CPU is > > online but not active. A CPU_ONLINE callback may create or bind a > > kthread so that its cpus_allowed mask only allows the CPU which is > > being brought online. The kthread may start executing before the CPU > > is made active and can end up in select_fallback_rq(). > > > > In such cases, the expected behavior is selecting the CPU which is > > coming online; however, because select_fallback_rq() only chooses from > > active CPUs, it determines that the task doesn't have any viable CPU > > in its allowed mask and ends up overriding it to cpu_possible_mask. > > > > CPU_ONLINE callbacks should be able to put kthreads on the CPU which > > is coming online. Update select_fallback_rq() so that it follows > > cpu_online() rather than cpu_active() for kthreads. > > > > Signed-off-by: Tejun Heo> > Reported-by: Gautham R Shenoy > > Hi Tejun, > > This patch fixes the issue on POWER. I am able to see the worker > threads of the unbound workqueues of the newly onlined node with this. > > Tested-by: Gautham R. Shenoy Peter? -- tejun ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v3] tools/perf: Fix the mask in regs_dump__printf and print_sample_iregs
When decoding the perf_regs mask in regs_dump__printf(), we loop through the mask using find_first_bit and find_next_bit functions. "mask" is of type "u64", but sent as a "unsigned long *" to lib functions along with sizeof(). While the existing code works fine in most cases, the logic is broken when using a 32bit perf on a 64bit kernel (Big Endian). When reading u64 using ((u32 *)&mask)[0], perf (lib/find_*_bit()) assumes it gets lower 32bits of u64 which is wrong. Proposed fix is to swap the words of the u64 to handle this case. This is _not_ an endianness swap. Suggested-by: Yury Norov Cc: Yury Norov Cc: Peter Zijlstra Cc: Ingo Molnar Cc: Arnaldo Carvalho de Melo Cc: Alexander Shishkin Cc: Jiri Olsa Cc: Adrian Hunter Cc: Kan Liang Cc: Wang Nan Cc: Michael Ellerman Signed-off-by: Madhavan Srinivasan --- Changelog v2: 1) Moved the swap code to a common function 2) Added more comments in the code Changelog v1: 1) updated commit message and patch subject 2) Add the fix to print_sample_iregs() in builtin-script.c tools/include/linux/bitmap.h | 9 + tools/perf/builtin-script.c | 16 +++- tools/perf/util/session.c | 16 +++- 3 files changed, 39 insertions(+), 2 deletions(-) diff --git a/tools/include/linux/bitmap.h b/tools/include/linux/bitmap.h index 28f5493da491..79998b26eb04 100644 --- a/tools/include/linux/bitmap.h +++ b/tools/include/linux/bitmap.h @@ -2,6 +2,7 @@ #define _PERF_BITOPS_H #include +#include #include #define DECLARE_BITMAP(name,bits) \ @@ -22,6 +23,14 @@ void __bitmap_or(unsigned long *dst, const unsigned long *bitmap1, #define small_const_nbits(nbits) \ (__builtin_constant_p(nbits) && (nbits) <= BITS_PER_LONG) +static inline void bitmap_from_u64(unsigned long *_mask, u64 mask) +{ + _mask[0] = mask & ULONG_MAX; + + if (sizeof(mask) > sizeof(unsigned long)) + _mask[1] = mask >> 32; +} + static inline void bitmap_zero(unsigned long *dst, int nbits) { if (small_const_nbits(nbits)) diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c index
e3ce2f34d3ad..73928310fd91 100644 --- a/tools/perf/builtin-script.c +++ b/tools/perf/builtin-script.c @@ -412,11 +412,25 @@ static void print_sample_iregs(struct perf_sample *sample, struct regs_dump *regs = &sample->intr_regs; uint64_t mask = attr->sample_regs_intr; unsigned i = 0, r; + unsigned long _mask[sizeof(mask)/sizeof(unsigned long)]; if (!regs) return; - for_each_set_bit(r, (unsigned long *)&mask, sizeof(mask) * 8) { + /* +* Since u64 is passed as 'unsigned long *', check +* to see whether we need to swap words within u64. +* Reason being, in 32 bit big endian userspace on a +* 64bit kernel, 'unsigned long' is 32 bits. +* When reading u64 using ((u32 *)&mask)[0] and ((u32 *)&mask)[1], +* we will get wrong value for the mask. This is what +* find_first_bit() and find_next_bit() is doing. +* Issue here is "((u32 *)&mask)[0]" gets upper 32 bits of u64, +* but perf assumes it gets lower 32bits of u64. Hence the check +* and swap. +*/ + bitmap_from_u64(_mask, mask); + for_each_set_bit(r, _mask, sizeof(mask) * 8) { u64 val = regs->regs[i++]; printf("%5s:0x%"PRIx64" ", perf_reg_name(r), val); } diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c index 5214974e841a..1337b1c73f82 100644 --- a/tools/perf/util/session.c +++ b/tools/perf/util/session.c @@ -940,8 +940,22 @@ static void branch_stack__printf(struct perf_sample *sample) static void regs_dump__printf(u64 mask, u64 *regs) { unsigned rid, i = 0; + unsigned long _mask[sizeof(mask)/sizeof(unsigned long)]; - for_each_set_bit(rid, (unsigned long *)&mask, sizeof(mask) * 8) { + /* +* Since u64 is passed as 'unsigned long *', check +* to see whether we need to swap words within u64. +* Reason being, in 32 bit big endian userspace on a +* 64bit kernel, 'unsigned long' is 32 bits. +* When reading u64 using ((u32 *)&mask)[0] and ((u32 *)&mask)[1], +* we will get wrong value for the mask. This is what +* find_first_bit() and find_next_bit() is doing.
+* Issue here is "((u32 *)&mask)[0]" gets upper 32 bits of u64, +* but perf assumes it gets lower 32bits of u64. Hence the check +* and swap. +*/ + bitmap_from_u64(_mask, mask); + for_each_set_bit(rid, _mask, sizeof(mask) * 8) { u64 val = regs[i++]; printf(" %-5s 0x%" PRIx64 "\n", -- 1.9.1
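The helper is easy to exercise in isolation: after `bitmap_from_u64()`, bit n of the bitmap must equal bit n of the u64 regardless of how wide `unsigned long` is. A self-contained sketch mirroring the patch's helper — the `test_bit()` below is a local illustration, not the kernel's:

```c
#include <limits.h>
#include <stdint.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Split a u64 mask into bitmap words in find_*_bit() order.
 * On LP64 the first word already holds everything; on a 32-bit
 * build the high half goes into the second word. */
static void bitmap_from_u64(unsigned long *bitmap, uint64_t mask)
{
    bitmap[0] = mask & ULONG_MAX;
    if (sizeof(mask) > sizeof(unsigned long))
        bitmap[1] = mask >> 32;
}

/* Bit n of the bitmap, numbered the way find_first_bit() numbers it. */
static int test_bit(const unsigned long *bitmap, unsigned int n)
{
    return (bitmap[n / BITS_PER_LONG] >> (n % BITS_PER_LONG)) & 1UL;
}
```

For a mask like `0x8000000000000001`, bits 0 and 63 must come back set and everything else clear — on both word sizes — which is precisely the property the naive `(unsigned long *)&mask` cast violates on 32-bit big-endian userspace.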
Re: [PATCH] ppc: Fix BPF JIT for ABIv2
On Tue, Jun 21, 2016 at 09:15:48PM +1000, Michael Ellerman wrote: > On Tue, 2016-06-21 at 14:28 +0530, Naveen N. Rao wrote: > > On 2016/06/20 03:56PM, Thadeu Lima de Souza Cascardo wrote: > > > On Sun, Jun 19, 2016 at 11:19:14PM +0530, Naveen N. Rao wrote: > > > > On 2016/06/17 10:00AM, Thadeu Lima de Souza Cascardo wrote: > > > > > > > > > > Hi, Michael and Naveen. > > > > > > > > > > I noticed independently that there is a problem with BPF JIT and > > > > > ABIv2, and > > > > > worked out the patch below before I noticed Naveen's patchset and the > > > > > latest > > > > > changes in ppc tree for a better way to check for ABI versions. > > > > > > > > > > However, since the issue described below affect mainline and stable > > > > > kernels, > > > > > would you consider applying it before merging your two patchsets, so > > > > > that we can > > > > > more easily backport the fix? > > > > > > > > Hi Cascardo, > > > > Given that this has been broken on ABIv2 since forever, I didn't bother > > > > fixing it. But, I can see why this would be a good thing to have for > > > > -stable and existing distros. However, while your patch below may fix > > > > the crash you're seeing on ppc64le, it is not sufficient -- you'll need > > > > changes in bpf_jit_asm.S as well. > > > > > > Hi, Naveen. > > > > > > Any tips on how to exercise possible issues there? Or what changes you > > > think > > > would be sufficient? > > > > The calling convention is different with ABIv2 and so we'll need changes > > in bpf_slow_path_common() and sk_negative_common(). > > How big would those changes be? Do we know? > > How come no one reported this was broken previously? This is the first I've > heard of it being broken. > I just heard of it less than two weeks ago, and only could investigate it last week, when I realized mainline was also affected. It looks like the little-endian support for classic JIT were done before the conversion to ABIv2. 
And as JIT is disabled by default, no one seems to have exercised it. > > However, rather than enabling classic JIT for ppc64le, are we better off > > just disabling it? > > > > --- a/arch/powerpc/Kconfig > > +++ b/arch/powerpc/Kconfig > > @@ -128,7 +128,7 @@ config PPC > > select IRQ_FORCED_THREADING > > select HAVE_RCU_TABLE_FREE if SMP > > select HAVE_SYSCALL_TRACEPOINTS > > - select HAVE_CBPF_JIT > > + select HAVE_CBPF_JIT if CPU_BIG_ENDIAN > > select HAVE_ARCH_JUMP_LABEL > > select ARCH_HAVE_NMI_SAFE_CMPXCHG > > select ARCH_HAS_GCOV_PROFILE_ALL > > > > > > Michael, > > Let me know your thoughts on whether you intend to take this patch or > > Cascardo's patch for -stable before the eBPF patches. I can redo my > > patches accordingly. > > This patch sounds like the best option at the moment for something we can > backport. Unless the changes to fix it are minimal. > > cheers > With my patch alone, I can successfully run a minimal 'tcpdump tcp port 22'. It correctly filters packets. But as pointed out, slow paths may not be taken. I don't have strong opinions on what to apply to stable, just that it would be nice to have something for the crash before applying all the nice changes by Naveen. Cascardo.
Re: [Qemu-ppc] [PATCH v2] powerpc/pseries: start rtasd before PCI probing
On Wed, 15 Jun 2016 22:26:41 +0200 Greg Kurz wrote: > A strange behaviour is observed when comparing PCI hotplug in QEMU, between > x86 and pseries. If you consider the following steps: > - start a VM > - add a PCI device via the QEMU monitor before the rtasd has started (for > example starting the VM in paused state, or hotplug during FW or boot > loader) > - resume the VM execution > > The x86 kernel detects the PCI device, but the pseries one does not. > > This happens because the rtasd kernel worker is currently started under > device_initcall, while PCI probing happens earlier under subsys_initcall. > > As a consequence, if we have a pending RTAS event at boot time, a message > is printed and the event is dropped. > > This patch moves all the initialization of rtasd to arch_initcall, which is > run before subsys_initcall: this way, logging_enabled is true when the RTAS > event pops up and it is not lost anymore. > > The proc fs bits stay at device_initcall because they cannot be run before > fs_initcall. > > Signed-off-by: Greg Kurz > --- > v2: - avoid behaviour change: don't create the proc entry if early init failed > I forgot to mention that Thomas had sent a Tested-by for v1, which I think is still valid for v2. > Michael, > > This was also tested under PowerVM: it doesn't fix anything there because the > HMC tells it won't honor DLPAR features as long as the RMC isn't here, which > happens later in the boot sequence. It hence seems impossible to have a > pending > RTAS event at boot time. > > It doesn't seem to break anything either, the kernel boots and hotplug works > okay once the RMC is up. > > Cheers. 
> > -- > Greg > > --- > arch/powerpc/kernel/rtasd.c | 22 +++++++++++++++++----- > 1 file changed, 17 insertions(+), 5 deletions(-) > > diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c > index e864b7c5884e..a26a02006576 100644 > --- a/arch/powerpc/kernel/rtasd.c > +++ b/arch/powerpc/kernel/rtasd.c > @@ -526,10 +526,8 @@ void rtas_cancel_event_scan(void) > } > EXPORT_SYMBOL_GPL(rtas_cancel_event_scan); > > -static int __init rtas_init(void) > +static int __init rtas_event_scan_init(void) > { > - struct proc_dir_entry *entry; > - > if (!machine_is(pseries) && !machine_is(chrp)) > return 0; > > @@ -562,13 +560,27 @@ static int __init rtas_init(void) > return -ENOMEM; > } > > + start_event_scan(); > + > + return 0; > +} > +arch_initcall(rtas_event_scan_init); > + > +static int __init rtas_init(void) > +{ > + struct proc_dir_entry *entry; > + > + if (!machine_is(pseries) && !machine_is(chrp)) > + return 0; > + > + if (!rtas_log_buf) > + return -ENODEV; > + > entry = proc_create("powerpc/rtas/error_log", S_IRUSR, NULL, > &proc_rtas_log_operations); > if (!entry) > printk(KERN_ERR "Failed to create error_log proc entry\n"); > > - start_event_scan(); > - > return 0; > } > __initcall(rtas_init); >
Re: [PATCH 1/2] workqueue: Move wq_update_unbound_numa() to the beginning of CPU_ONLINE
Hi Tejun, On Thu, Jun 16, 2016 at 03:35:04PM -0400, Tejun Heo wrote: > Hello, > > So, the issue of the initial worker not having its affinity set > correctly wasn't caused by the order of the operations. Reordering > just made set_cpus_allowed get tried one more time late enough so that it > hides the race condition most of the time. The problem is that > CPU_ONLINE callbacks are called while the cpu being onlined is online > but not active and select_fallback_rq() only considers active cpus, so > if a kthread gets scheduled in the meantime and it doesn't have any > cpu which is active in its allowed mask, its allowed mask gets reset > to cpu_possible_mask. > > Would something like the following make sense? > > Thanks. > -- 8< -- > Subject: [PATCH] sched: allow kthreads to fallback to online && !active cpus > > During CPU hotplug, CPU_ONLINE callbacks are run while the CPU is > online but not active. A CPU_ONLINE callback may create or bind a > kthread so that its cpus_allowed mask only allows the CPU which is > being brought online. The kthread may start executing before the CPU > is made active and can end up in select_fallback_rq(). > > In such cases, the expected behavior is selecting the CPU which is > coming online; however, because select_fallback_rq() only chooses from > active CPUs, it determines that the task doesn't have any viable CPU > in its allowed mask and ends up overriding it to cpu_possible_mask. > > CPU_ONLINE callbacks should be able to put kthreads on the CPU which > is coming online. Update select_fallback_rq() so that it follows > cpu_online() rather than cpu_active() for kthreads. > > Signed-off-by: Tejun Heo > Reported-by: Gautham R Shenoy Hi Tejun, This patch fixes the issue on POWER. I am able to see the worker threads of the unbound workqueues of the newly onlined node with this. Tested-by: Gautham R. Shenoy > --- > kernel/sched/core.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index 017d539..a12e3db 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -1536,7 +1536,9 @@ static int select_fallback_rq(int cpu, struct > task_struct *p) > for (;;) { > /* Any allowed, online CPU? */ > for_each_cpu(dest_cpu, tsk_cpus_allowed(p)) { > - if (!cpu_active(dest_cpu)) > + if (!(p->flags & PF_KTHREAD) && !cpu_active(dest_cpu)) > + continue; > + if (!cpu_online(dest_cpu)) > continue; > goto out; > } >
Re: [v6, 1/2] cxl: Add mechanism for delivering AFU driver specific events
> On Jun 21, 2016, at 5:34 AM, Vaibhav Jain wrote: > > Hi Ian, > > Ian Munsie writes: > >> Excerpts from Vaibhav Jain's message of 2016-06-20 14:20:16 +0530: >> >> What exactly is the use case for this API? I'd vote to drop it if we can >> do without it. > Agree with this. Functionality of this API can be merged with > cxl_set_driver_ops when called with NULL arg for cxl_afu_driver_ops. Passing a NULL arg instead of calling an 'unset' API is fine with us. I'll add that for cxlflash, I can't envision a scenario where we'll unset the driver ops for a context.
Re: [PATCH] leds: Add no-op gpio_led_register_device when LED subsystem is disabled
On 06/21/2016 01:48 PM, Andrew F. Davis wrote: On 06/21/2016 02:09 AM, Jacek Anaszewski wrote: Hi Andrew, This patch doesn't apply, please rebase onto recent LED tree. On 06/21/2016 12:13 AM, Andrew F. Davis wrote: Some systems use 'gpio_led_register_device' to make an in-memory copy of their LED device table so the original can be removed as .init.rodata. When the LED subsystem is not enabled, source in the led directory is not built and so this function may be undefined. Fix this here. Signed-off-by: Andrew F. Davis --- include/linux/leds.h | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/include/linux/leds.h b/include/linux/leds.h index d2b1306..a4a3da6 100644 --- a/include/linux/leds.h +++ b/include/linux/leds.h @@ -386,8 +386,16 @@ struct gpio_led_platform_data { unsigned long *delay_off); Currently there is some stuff here, and in fact it has been for a long time. Patch "[PATCH 12/12] leds: Only descend into leds directory when CONFIG_NEW_LEDS is set" also doesn't apply. What repository are you using? v4.7-rc4, it may not apply due to the surrounding lines being changed in the other patches which may not be applied to your tree. It is a single line change per patch so hopefully the merge conflict resolutions will be trivial. A better solution could have been getting an ack from each maintainer and having someone pull the whole series into one tree, but parts have already been picked so it may be a little late for that. OK, I resolved the issues and applied, thanks. }; +#ifdef CONFIG_NEW_LEDS struct platform_device *gpio_led_register_device( int id, const struct gpio_led_platform_data *pdata); +#else +static inline struct platform_device *gpio_led_register_device( + int id, const struct gpio_led_platform_data *pdata) +{ + return 0; +} +#endif enum cpu_led_event { CPU_LED_IDLE_START, /* CPU enters idle */ -- Best regards, Jacek Anaszewski
Re: [PATCH v5 1/6] qspinlock: powerpc support qspinlock
On 2016/06/07 05:41, Benjamin Herrenschmidt wrote: On Mon, 2016-06-06 at 17:59 +0200, Peter Zijlstra wrote: On Fri, Jun 03, 2016 at 02:33:47PM +1000, Benjamin Herrenschmidt wrote: - For the above, can you show (or describe) where the qspinlock improves things compared to our current locks. So currently PPC has a fairly straight forward test-and-set spinlock IIRC. You have this because LPAR/virt muck and lock holder preemption issues etc.. qspinlock is 1) a fair lock (like ticket locks) and 2) provides out-of-word spinning, reducing cacheline pressure. Thanks Peter. I think I understand the theory, but I'd like to see it translate into real numbers. Esp. on multi-socket x86 we saw the out-of-word spinning being a big win over our ticket locks. And fairness, brought to us by the ticket locks a long time ago, eliminated starvation issues we had, where a spinner local to the holder would 'always' win from a spinner further away. So under heavy enough local contention, the spinners on 'remote' CPUs would 'never' get to own the lock. I think our HW has tweaks to avoid that from happening with the simple locks in the underlying ll/sc implementation. In any case, what I'm asking is actual tests to verify it works as expected for us. If HW has such tweaks then there must be a performance drop as the total cpu count grows. And I got such clues from one simple benchmark test: it tests how many spin_lock/spin_unlock pairs can be done within 15 seconds on all cpus. say,

while (!done) {
	spin_lock()
	this_cpu_inc(loops)
	spin_unlock()
}

I do the test on two machines, one is using powerKVM, and the other is using pHyp. The result below shows what the sum of loops is in the end, in K form.

cpu count    | pv-qspinlock | test-set spinlock
8 (powerKVM) | 62830K       | 67340K
8 (pHyp)     | 49800K       | 59330K
32 (pHyp)    | 87580K       | 20990K

- while the cpu count grows, the lock/unlock pair ops of the test-set spinlock drop very much. This is because of cache bouncing across different physical cpus. 
So to verify how both spinlocks impact the data-cache, another simple benchmark test. Code looks like:

struct _x {
	spinlock_t lk;
	unsigned long x;
} x;

while (!this_cpu_read(stop)) {
	int i = 0xff;

	spin_lock(&x.lk);
	this_cpu_inc(loops);
	while (i--)
		READ_ONCE(x.x);
	spin_unlock(&x.lk);
}

The result below shows what the sum of loops is in the end, in K form.

cpu count | pv-qspinlock | test-set spinlock
8 (pHyp)  | 13240K       | 9780K
32 (pHyp) | 25790K       | 9700K

obviously pv-qspinlock is more cache-friendly, and has better performance than test-set spinlock. More tests are going on, I will send out new patch set with the result. HOPE *within* this week. unixbench really takes a long time. thanks xinhui pv-qspinlock tries to preserve the fairness while allowing limited lock stealing and explicitly managing which vcpus to wake. Right. While there's theory and to some extent practice on x86, it would be nice to validate the effects on POWER. Right; so that will have to be from benchmarks which I cannot help you with ;-) Precisely :-) This is what I was asking for ;-) Cheers, Ben.
Re: powerpc/kprobes: Remove kretprobe_trampoline_holder.
On Thu, 2016-31-03 at 20:10:40 UTC, Thiago Jung Bauermann wrote: > Fixes the following testsuite failure: > > $ sudo ./perf test -v kallsyms >1: vmlinux symtab matches kallsyms : > --- start --- > test child forked, pid 12489 > Using /proc/kcore for kernel object code > Looking at the vmlinux_path (8 entries long) > Using /boot/vmlinux for symbols > 0xc003d300: diff name v: .kretprobe_trampoline_holder k: > kretprobe_trampoline > Maps only in vmlinux: >c086ca38-c0879b6c 87ca38 [kernel].text.unlikely >c0879b6c-c0bf 889b6c [kernel].meminit.text >c0bf-c0c53264 c0 [kernel].init.text >c0c53264-d425 c63264 [kernel].exit.text >d425-d445 0 [libcrc32c] >d445-d462 0 [xfs] >d462-d468 0 [autofs4] >d468-d46e 0 [x_tables] >d46e-d478 0 [ip_tables] >d478-d47e 0 [rng_core] >d47e- 0 [pseries_rng] > Maps in vmlinux with a different name in kallsyms: > Maps only in kallsyms: >d000-f000 1001 [kernel.kallsyms] >f000- 3001 [kernel.kallsyms] > test child finished with -1 > end > vmlinux symtab matches kallsyms: FAILED! > > The problem is that the kretprobe_trampoline symbol looks like this: > > $ eu-readelf -s /boot/vmlinux G kretprobe_trampoline >2431: c1302368 24 NOTYPE LOCAL DEFAULT 37 > kretprobe_trampoline_holder >2432: c003d300 8 FUNCLOCAL DEFAULT1 > .kretprobe_trampoline_holder > 97543: c003d300 0 NOTYPE GLOBAL DEFAULT1 > kretprobe_trampoline > > Its type is NOTYPE, and its size is 0, and this is a problem because > symbol-elf.c:dso__load_sym skips function symbols that are not STT_FUNC > or STT_GNU_IFUNC (this is determined by elf_sym__is_function). Even > if the type is changed to STT_FUNC, when dso__load_sym calls > symbols__fixup_duplicate, the kretprobe_trampoline symbol is dropped in > favour of .kretprobe_trampoline_holder because the latter has non-zero > size (as determined by choose_best_symbol). > > With this patch, all vmlinux symbols match /proc/kallsyms and the > testcase passes. 
> > Commit c1c355ce14c0 ("x86/kprobes: Get rid of > kretprobe_trampoline_holder()") gets rid of kretprobe_trampoline_holder > altogether on x86. This commit does the same on powerpc. This change > introduces no regressions on the perf and ftracetest testsuite results. > > Cc: Ananth N Mavinakayanahalli > Cc: Michael Ellerman > Reviewed-by: Naveen N. Rao > Signed-off-by: Thiago Jung Bauermann

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/61ed9cfb1b0951a3b4b98dd8bf

cheers
Re: powerpc/powernv: Print correct PHB type names
On Tue, 2016-21-06 at 02:35:56 UTC, Gavin Shan wrote: > We're initializing "IODA1" and "IODA2" PHBs though they are IODA2 > and NPU PHBs as below kernel log indicates. > >Initializing IODA1 OPAL PHB /pciex@3fffe4070 >Initializing IODA2 OPAL PHB /pciex@3fff00040 > > This fixes the PHB names. After it's applied, we get: > >Initializing IODA2 PHB (/pciex@3fffe4070) >Initializing NPU PHB (/pciex@3fff00040) > > Signed-off-by: Gavin Shan

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/9497a1c1c5b4de2a359b6d8648

cheers
Re: [v2] powerpc: export cpu_to_core_id()
On Thu, 2016-02-06 at 11:45:14 UTC, Mauricio Faria de Oliveira wrote: > Export cpu_to_core_id(). This will be used by the lpfc driver. > > This enables topology_core_id() from <linux/topology.h> (defined > to cpu_to_core_id() in arch/powerpc/include/asm/topology.h) to be > used by (non-builtin) modules. > > That is arch-neutral, already used by eg, drivers/base/topology.c, > but it is builtin (obj-y in Makefile) thus didn't need the export. > > Since the module uses topology_core_id() and this is defined to > cpu_to_core_id(), it needs the export, otherwise: > > ERROR: "cpu_to_core_id" [drivers/scsi/lpfc/lpfc.ko] undefined! > > Signed-off-by: Mauricio Faria de Oliveira

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/f8ab481066e7246e4b272233aa

cheers
Re: powerpc/pci: Fix SRIOV not building without EEH enabled
On Fri, 2016-17-06 at 05:25:17 UTC, Russell Currey wrote: > On Book3E CPUs (and possibly other configs), it is possible to have SRIOV > (CONFIG_PCI_IOV) set without CONFIG_EEH. The SRIOV code does not check > for this, and if EEH is disabled, pci_dn.c fails to build. > > Fix this by gating the EEH-specific code in the SRIOV implementation > behind CONFIG_EEH. > > Fixes: 39218cd0 ("powerpc/eeh: EEH device for VF") > Reported-by: Michael Ellerman > Signed-off-by: Russell Currey

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/fb36e90736938d50fdaa1be7af

cheers
Re: [v7,3/3] powerpc: Load Monitor Register Tests
On Thu, 2016-09-06 at 02:31:10 UTC, Michael Neuling wrote: > From: Jack Miller > > Adds two tests. One is a simple test to ensure that the new registers > LMRR and LMSER are properly maintained. The other actually uses the > existing EBB test infrastructure to test that LMRR and LMSER behave as > documented. > > Signed-off-by: Jack Miller > Signed-off-by: Michael Neuling

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/16c19a2e983346c547501795aa

cheers
Re: [v7,2/3] powerpc: Load Monitor Register Support
On Thu, 2016-09-06 at 02:31:09 UTC, Michael Neuling wrote: > From: Jack Miller > > This enables new registers, LMRR and LMSER, that can trigger an EBB in > userspace code when a monitored load (via the new ldmx instruction) > loads memory from a monitored space. This facility is controlled by a > new FSCR bit, LM. > > This patch disables the FSCR LM control bit on task init and enables > that bit when a load monitor facility unavailable exception is taken > for using it. On context switch, this bit is then used to determine > whether the two relevant registers are saved and restored. This is > done lazily for performance reasons. > > Signed-off-by: Jack Miller > Signed-off-by: Michael Neuling

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/bd3ea317fddfd0f2044f94bed2

cheers
Re: [v7,1/3] powerpc: Improve FSCR init and context switching
On Thu, 2016-09-06 at 02:31:08 UTC, Michael Neuling wrote: > This fixes a few issues with FSCR init and switching. > ... > > Signed-off-by: Michael Neuling

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/b57bd2de8c6c9aa03f1b899edd

cheers
Re: [1/2] Fix misleading comment in early_setup_secondary
On Fri, 2016-04-03 at 05:01:48 UTC, Madhavan Srinivasan wrote: > Current comment in the early_setup_secondary() for > paca->soft_enabled update is misleading. Comment should say to > mark interrupts "disabled" instead of "enabled". > Patch to fix the typo. > > Signed-off-by: Madhavan Srinivasan

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/103b7827d977ea34c982e6a9d2

cheers
Re: [v10,01/18] PCI: Add pcibios_setup_bridge()
On Fri, 2016-20-05 at 06:41:25 UTC, Gavin Shan wrote: > Currently, PowerPC PowerNV platform utilizes ppc_md.pcibios_fixup(), > which is called once after PCI probing and resource assignment > are completed, to allocate platform required resources for PCI devices: > PE#, IO and MMIO mapping, DMA address translation (TCE) table etc. > Obviously, it's not hotplug friendly. > > This adds weak function pcibios_setup_bridge(), which is called by > pci_setup_bridge(). PowerPC PowerNV platform will reuse the function > to assign above platform required resources to newly plugged PCI devices > during PCI hotplug in subsequent patches. > > Signed-off-by: Gavin Shan > Acked-by: Bjorn Helgaas

Entire series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/d366d28cd1325f11d582ec6d4a

cheers
Re: [PATCH] leds: Add no-op gpio_led_register_device when LED subsystem is disabled
On 06/21/2016 02:09 AM, Jacek Anaszewski wrote: > Hi Andrew, > > This patch doesn't apply, please rebase onto recent LED tree. > > On 06/21/2016 12:13 AM, Andrew F. Davis wrote: >> Some systems use 'gpio_led_register_device' to make an in-memory copy of >> their LED device table so the original can be removed as .init.rodata. >> When the LED subsystem is not enabled, source in the led directory is not >> built and so this function may be undefined. Fix this here. >> >> Signed-off-by: Andrew F. Davis >> --- >> include/linux/leds.h | 8 ++++++++ >> 1 file changed, 8 insertions(+) >> >> diff --git a/include/linux/leds.h b/include/linux/leds.h >> index d2b1306..a4a3da6 100644 >> --- a/include/linux/leds.h >> +++ b/include/linux/leds.h >> @@ -386,8 +386,16 @@ struct gpio_led_platform_data { >> unsigned long *delay_off); > > Currently there is some stuff here, and in fact it has been for > a long time. > > Patch "[PATCH 12/12] leds: Only descend into leds directory when > CONFIG_NEW_LEDS is set" also doesn't apply. > What repository are you using? > v4.7-rc4, it may not apply due to the surrounding lines being changed in the other patches which may not be applied to your tree. It is a single line change per patch so hopefully the merge conflict resolutions will be trivial. A better solution could have been getting an ack from each maintainer and having someone pull the whole series into one tree, but parts have already been picked so it may be a little late for that. >> }; >> >> +#ifdef CONFIG_NEW_LEDS >> struct platform_device *gpio_led_register_device( >> int id, const struct gpio_led_platform_data *pdata); >> +#else >> +static inline struct platform_device *gpio_led_register_device( >> + int id, const struct gpio_led_platform_data *pdata) >> +{ >> + return 0; >> +} >> +#endif >> >> enum cpu_led_event { >> CPU_LED_IDLE_START, /* CPU enters idle */ >> > >
Re: [PATCH] ppc: Fix BPF JIT for ABIv2
On Tue, 2016-06-21 at 14:28 +0530, Naveen N. Rao wrote: > On 2016/06/20 03:56PM, Thadeu Lima de Souza Cascardo wrote: > > On Sun, Jun 19, 2016 at 11:19:14PM +0530, Naveen N. Rao wrote: > > > On 2016/06/17 10:00AM, Thadeu Lima de Souza Cascardo wrote: > > > > > > > > Hi, Michael and Naveen. > > > > > > > > I noticed independently that there is a problem with BPF JIT and ABIv2, > > > > and > > > > worked out the patch below before I noticed Naveen's patchset and the > > > > latest > > > > changes in ppc tree for a better way to check for ABI versions. > > > > > > > > However, since the issue described below affect mainline and stable > > > > kernels, > > > > would you consider applying it before merging your two patchsets, so > > > > that we can > > > > more easily backport the fix? > > > > > > Hi Cascardo, > > > Given that this has been broken on ABIv2 since forever, I didn't bother > > > fixing it. But, I can see why this would be a good thing to have for > > > -stable and existing distros. However, while your patch below may fix > > > the crash you're seeing on ppc64le, it is not sufficient -- you'll need > > > changes in bpf_jit_asm.S as well. > > > > Hi, Naveen. > > > > Any tips on how to exercise possible issues there? Or what changes you think > > would be sufficient? > > The calling convention is different with ABIv2 and so we'll need changes > in bpf_slow_path_common() and sk_negative_common(). How big would those changes be? Do we know? How come no one reported this was broken previously? This is the first I've heard of it being broken. > However, rather than enabling classic JIT for ppc64le, are we better off > just disabling it? 
> > --- a/arch/powerpc/Kconfig > > +++ b/arch/powerpc/Kconfig > > @@ -128,7 +128,7 @@ config PPC > > select IRQ_FORCED_THREADING > > select HAVE_RCU_TABLE_FREE if SMP > > select HAVE_SYSCALL_TRACEPOINTS > > - select HAVE_CBPF_JIT > > + select HAVE_CBPF_JIT if CPU_BIG_ENDIAN > > select HAVE_ARCH_JUMP_LABEL > > select ARCH_HAVE_NMI_SAFE_CMPXCHG > > select ARCH_HAS_GCOV_PROFILE_ALL > > > > > > Michael, > > Let me know your thoughts on whether you intend to take this patch or > > Cascardo's patch for -stable before the eBPF patches. I can redo my > > patches accordingly. This patch sounds like the best option at the moment for something we can backport. Unless the changes to fix it are minimal. cheers
Re: [6/6] ppc: ebpf/jit: Implement JIT compiler for extended BPF
On Tue, 2016-06-21 at 12:28 +0530, Naveen N. Rao wrote: > On 2016/06/21 09:38AM, Michael Ellerman wrote: > > On Sun, 2016-06-19 at 23:06 +0530, Naveen N. Rao wrote: > > > > > > #include > > > > > > in bpf_jit_comp64.c > > > > > > Can you please check if it resolves the build error? > > > > Can you? :D > > :) > Sorry, I should have explained myself better. I did actually try your > config and I was able to reproduce the build error. After the above > #include, that error went away, but I saw some vdso related errors. I > thought I was doing something wrong and needed a different setup for > that particular kernel config, which is why I requested your help in the > matter. I just didn't do a good job of putting across that message... Ah OK. Not sure why you're seeing VDSO errors? > Note to self: randconfig builds *and* more time drafting emails :) No stress. You don't need to do randconfig builds, or even build all the arch/powerpc/ configs, just try to do a reasonable set, something like - ppc64, powernv, pseries, pmac32, ppc64e. I'm happy to catch the esoteric build failures. > Do you want me to respin the patches? No that's fine, I'll fix it up here. cheers
[PATCH] powerpc: Fix faults caused by radix patching of SLB miss handler
As part of the Radix MMU support we added some feature sections in the SLB miss handler. These are intended to catch the case that we incorrectly take an SLB miss when Radix is enabled, and instead of crashing weirdly they bail out to a well defined exit path and trigger an oops.

However the way they were written meant the bailout case was enabled by default until we did CPU feature patching.

On powermacs the early debug prints in setup_system() can cause an SLB miss, which happens before code patching, and so the SLB miss handler would incorrectly bailout and crash during boot.

Fix it by inverting the sense of the feature section, so that the code which is in place at boot is correct for the hash case. Once we determine we are using Radix - which will never happen on a powermac - only then do we patch in the bailout case which unconditionally jumps.

Fixes: caca285e5ab4 ("powerpc/mm/radix: Use STD_MMU_64 to properly isolate hash related code")
Reported-by: Denis Kirjanov
Tested-by: Denis Kirjanov
Signed-off-by: Michael Ellerman
---
 arch/powerpc/kernel/exceptions-64s.S | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 4c9440629128..8bcc1b457115 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1399,11 +1399,12 @@ END_MMU_FTR_SECTION_IFCLR(MMU_FTR_RADIX)
 	lwz	r9,PACA_EXSLB+EX_CCR(r13)	/* get saved CR */
 	mtlr	r10
-BEGIN_MMU_FTR_SECTION
-	b	2f
-END_MMU_FTR_SECTION_IFSET(MMU_FTR_RADIX)
 	andi.	r10,r12,MSR_RI	/* check for unrecoverable exception */
+BEGIN_MMU_FTR_SECTION
 	beq-	2f
+FTR_SECTION_ELSE
+	b	2f
+ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_RADIX)

 	.machine	push
 	.machine	"power4"
-- 
2.5.0
Re: [v6, 1/2] cxl: Add mechanism for delivering AFU driver specific events
Hi Ian, Ian Munsie writes: > Excerpts from Vaibhav Jain's message of 2016-06-20 14:20:16 +0530: >> > +int cxl_unset_driver_ops(struct cxl_context *ctx) >> > +{ >> > + if (atomic_read(&ctx->afu_driver_events)) >> > + return -EBUSY; >> > + >> > + ctx->afu_driver_ops = NULL; >> Need a write memory barrier so that afu_driver_ops isn't possibly called >> after this store. > What situation do you think this will help? I haven't looked closely at > the last few iterations of this patch set, but if you're in a situation > where you might be racing with some code doing e.g. > > if (ctx->afu_driver_ops) > ctx->afu_driver_ops->something(); > > You have a race with or without a memory barrier. Ideally you would just > have the caller guarantee that it will only call cxl_unset_driver_ops if > no further calls to afu_driver_ops is possible, otherwise you may need > locking here which would be far from ideal. Yes, agree that wmb won't save against the race condition mentioned and this is much better handled with locking. But imho having a wmb is still better compared to having no locking for this shared variable. > > What exactly is the use case for this API? I'd vote to drop it if we can > do without it. Agree with this. Functionality of this API can be merged with cxl_set_driver_ops when called with NULL arg for cxl_afu_driver_ops. ~ Vaibhav
Re: [PATCH] powerpc/align: Use #ifdef __BIG_ENDIAN__ #else for REG_BYTE
On Tuesday, June 21, 2016 10:51:00 AM CEST Michael Ellerman wrote: > On Fri, 2016-06-17 at 12:46 +0200, Arnd Bergmann wrote: > > On Friday, June 17, 2016 1:35:35 PM CEST Daniel Axtens wrote: > > > > It would be better to fix the sparse compilation so the same endianess > > > > is set that you get when calling gcc. > > > > > > I will definitely work on a patch to sparse! I'd still like this or > > > something like it to go in though, so we can keep working on reducing > > > the sparse warning count while the sparse patch is in the works. > > > > I think you just need to fix the Makefile so it sets the right > > arguments when calling sparse. > > > > Something like the (untested) patch below, similar to how we > > already handle the word size and how some other architectures > > handle setting __BIG_ENDIAN__. > > Yep that's clearly better. I didn't know we had separate CHECKER_FLAGS. > > Daniel can you test that? > > Arnd we'll add Suggested-by: you, or send a SOB if you like? > Please use 'Suggested-by', the main work for this patch was in analysing the problem and writing the changelog, and Daniel did that. Arnd
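The untested Makefile patch Arnd refers to is not reproduced in this message; a sketch of the idea (my reconstruction, following arch/powerpc/Makefile conventions, not the actual patch) would be to pass sparse the same endianness define the compiler implies:

```make
# Sketch (untested): tell sparse the endianness gcc targets, next to the
# existing word-size flag handling in arch/powerpc/Makefile.
ifdef CONFIG_CPU_BIG_ENDIAN
CHECKFLAGS	+= -D__BIG_ENDIAN__
else
CHECKFLAGS	+= -D__LITTLE_ENDIAN__
endif
```

With this, `make C=1` runs sparse with the endianness macro matching the kernel configuration, instead of sprinkling `#ifdef __BIG_ENDIAN__ #else` pairs through the source.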
Re: [PATCH] ppc: Fix BPF JIT for ABIv2
On 2016/06/20 03:56PM, Thadeu Lima de Souza Cascardo wrote: > On Sun, Jun 19, 2016 at 11:19:14PM +0530, Naveen N. Rao wrote: > > On 2016/06/17 10:00AM, Thadeu Lima de Souza Cascardo wrote: > > > > > > Hi, Michael and Naveen. > > > > > > I noticed independently that there is a problem with BPF JIT and ABIv2, > > > and > > > worked out the patch below before I noticed Naveen's patchset and the > > > latest > > > changes in ppc tree for a better way to check for ABI versions. > > > > > > However, since the issue described below affect mainline and stable > > > kernels, > > > would you consider applying it before merging your two patchsets, so that > > > we can > > > more easily backport the fix? > > > > Hi Cascardo, > > Given that this has been broken on ABIv2 since forever, I didn't bother > > fixing it. But, I can see why this would be a good thing to have for > > -stable and existing distros. However, while your patch below may fix > > the crash you're seeing on ppc64le, it is not sufficient -- you'll need > > changes in bpf_jit_asm.S as well. > > Hi, Naveen. > > Any tips on how to exercise possible issues there? Or what changes you think > would be sufficient? The calling convention is different with ABIv2 and so we'll need changes in bpf_slow_path_common() and sk_negative_common(). However, rather than enabling classic JIT for ppc64le, are we better off just disabling it?

--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -128,7 +128,7 @@ config PPC
 	select IRQ_FORCED_THREADING
 	select HAVE_RCU_TABLE_FREE if SMP
 	select HAVE_SYSCALL_TRACEPOINTS
-	select HAVE_CBPF_JIT
+	select HAVE_CBPF_JIT if CPU_BIG_ENDIAN
 	select HAVE_ARCH_JUMP_LABEL
 	select ARCH_HAVE_NMI_SAFE_CMPXCHG
 	select ARCH_HAS_GCOV_PROFILE_ALL

Michael, Let me know your thoughts on whether you intend to take this patch or Cascardo's patch for -stable before the eBPF patches. I can redo my patches accordingly. 
- Naveen
Re: [RESEND PATCH v2 1/4] PCI: Ignore resource_alignment if PCI_PROBE_ONLY was set
On 2016/6/21 10:16, Yongji Xie wrote: On 2016/6/21 9:43, Bjorn Helgaas wrote: On Thu, Jun 02, 2016 at 01:46:48PM +0800, Yongji Xie wrote: The resource_alignment parameter releases memory resources allocated by firmware so that the kernel can reassign new resources later on. But this causes the problem that no resources can be allocated by the kernel if PCI_PROBE_ONLY was set, e.g. on the pSeries platform, because PCI_PROBE_ONLY forces the kernel to use the firmware setup and not to reassign any resources. To solve this problem, this patch ignores resource_alignment if PCI_PROBE_ONLY was set.

Signed-off-by: Yongji Xie
---
 drivers/pci/pci.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index c8b4dbd..a259394 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4761,6 +4761,12 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev)
 	spin_lock(&resource_alignment_lock);
 	p = resource_alignment_param;
 	while (*p) {
+		if (pci_has_flag(PCI_PROBE_ONLY)) {
+			printk(KERN_ERR "PCI: Ignore resource_alignment parameter: %s with PCI_PROBE_ONLY set\n",
+			       p);
+			*p = 0;
+			break;

Wouldn't it be simpler to make pci_set_resource_alignment_param() fail if PCI_PROBE_ONLY is set? I added the check here because I want to print some logs so that users could know the reason why resource_alignment doesn't work when they add this parameter. Thanks, Yongji Sorry, please ignore the previous reply. I didn't add this check in pci_set_resource_alignment_param() because PCI_PROBE_ONLY may be set after we parse "resource_alignment". And it seems that printk_once() may be better here so that we don't need to set *p = 0.
Thanks, Yongji

+		}
 		count = 0;
 		if (sscanf(p, "%d%n", &align_order, &count) == 1 &&
 				p[count] == '@') {
--
1.7.9.5
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] leds: Add no-op gpio_led_register_device when LED subsystem is disabled
Hi Andrew, This patch doesn't apply, please rebase onto the recent LED tree. On 06/21/2016 12:13 AM, Andrew F. Davis wrote: Some systems use 'gpio_led_register_device' to make an in-memory copy of their LED device table so the original can be removed as .init.rodata. When the LED subsystem is not enabled, source in the led directory is not built and so this function may be undefined. Fix this here.

Signed-off-by: Andrew F. Davis
---
 include/linux/leds.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/include/linux/leds.h b/include/linux/leds.h
index d2b1306..a4a3da6 100644
--- a/include/linux/leds.h
+++ b/include/linux/leds.h
@@ -386,8 +386,16 @@ struct gpio_led_platform_data {
 			unsigned long *delay_off);

Currently there is some stuff here, and in fact it has been for a long time. Patch "[PATCH 12/12] leds: Only descend into leds directory when CONFIG_NEW_LEDS is set" also doesn't apply. What repository are you using?

 };

+#ifdef CONFIG_NEW_LEDS
 struct platform_device *gpio_led_register_device(
 		int id, const struct gpio_led_platform_data *pdata);
+#else
+static inline struct platform_device *gpio_led_register_device(
+		int id, const struct gpio_led_platform_data *pdata)
+{
+	return 0;
+}
+#endif

 enum cpu_led_event {
 	CPU_LED_IDLE_START,	/* CPU enters idle */

-- Best regards, Jacek Anaszewski
Re: [6/6] ppc: ebpf/jit: Implement JIT compiler for extended BPF
On 2016/06/21 09:38AM, Michael Ellerman wrote: > On Sun, 2016-06-19 at 23:06 +0530, Naveen N. Rao wrote: > > On 2016/06/17 10:53PM, Michael Ellerman wrote: > > > On Tue, 2016-07-06 at 13:32:23 UTC, "Naveen N. Rao" wrote:
> > > > diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
> > > > new file mode 100644
> > > > index 000..954ff53
> > > > --- /dev/null
> > > > +++ b/arch/powerpc/net/bpf_jit_comp64.c
> > > > @@ -0,0 +1,956 @@
> > > ...
> > > > +
> > > > +static void bpf_jit_fill_ill_insns(void *area, unsigned int size)
> > > > +{
> > > > +	int *p = area;
> > > > +
> > > > +	/* Fill whole space with trap instructions */
> > > > +	while (p < (int *)((char *)area + size))
> > > > +		*p++ = BREAKPOINT_INSTRUCTION;
> > > > +}
> > >
> > > This breaks the build for some configs, presumably you're missing a header:
> > >
> > > arch/powerpc/net/bpf_jit_comp64.c:30:10: error: 'BREAKPOINT_INSTRUCTION' undeclared (first use in this function)
> > >
> > > http://kisskb.ellerman.id.au/kisskb/buildresult/12720611/
> >
> > Oops. Yes, I should have caught that. I need to add:
> >
> > #include 
> >
> > in bpf_jit_comp64.c
> >
> > Can you please check if it resolves the build error?
>
> Can you? :D

:) Sorry, I should have explained myself better. I did actually try your config and I was able to reproduce the build error. After the above #include, that error went away, but I saw some vdso-related errors. I thought I was doing something wrong and needed a different setup for that particular kernel config, which is why I requested your help in the matter. I just didn't do a good job of putting across that message... Note to self: randconfig builds *and* more time drafting emails :)

Do you want me to respin the patches?

Thanks, Naveen
Re: [RESEND PATCH v2 4/4] PCI: Add support for enforcing all MMIO BARs to be page aligned
On 2016/6/21 10:26, Bjorn Helgaas wrote: On Thu, Jun 02, 2016 at 01:46:51PM +0800, Yongji Xie wrote: When vfio passes through a PCI device whose MMIO BARs are smaller than PAGE_SIZE, the guest will not handle the mmio accesses to those BARs, which leads to mmio emulations in the host. This is because vfio will not allow passthrough of one BAR's mmio page which may be shared with other BARs. Otherwise, there would be a backdoor that a guest could use to access the BARs of another guest. To solve this issue, this patch modifies resource_alignment to support syntax where multiple devices get the same alignment. So we can use something like "pci=resource_alignment=*:*:*.*:noresize" to enforce the alignment of all MMIO BARs to be at least PAGE_SIZE, so that one BAR's mmio page would not be shared with other BARs. And we also define a macro PCIBIOS_MIN_ALIGNMENT to enable this automatically on the PPC64 platform, which can easily hit this issue because its PAGE_SIZE is 64KB. Note that this would not be applied to VFs, whose BARs are always page aligned and should never be reassigned according to the SR-IOV spec. I see that SR-IOV spec r1.1, sec 3.3.13 requires that all VF BAR resources be aligned on System Page Size, and must be sized to consume an integral number of pages. Where does it say VF BARs can't be reassigned? I thought they *could* be reassigned, as long as VFs are disabled when you do it. Oh, sorry. I made a mistake here. We can reassign VF BARs by writing the alignment to System Page Size (20h) when VFs are disabled. As you said below, VF BARs are read-only zeroes, so the normal way (writing BARs) of resource allocation wouldn't be applied to VFs. The resource allocation of VFs has been determined when we enable the SR-IOV capability. So we should not touch VF BARs here. It's useless and will release the allocated resources of VFs, which leads to a bug.
Signed-off-by: Yongji Xie
---
 Documentation/kernel-parameters.txt |  2 ++
 arch/powerpc/include/asm/pci.h      |  2 ++
 drivers/pci/pci.c                   | 68 ++++++++++++++++++++++++---------
 3 files changed, 61 insertions(+), 11 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index c4802f5..cb09503 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -3003,6 +3003,8 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 				aligned memory resources.
 				If <order of align> is not specified,
 				PAGE_SIZE is used as alignment.
+				<seg>, <bus>, <slot> and <func> can be set to
+				"*" which means match all values.
 				PCI-PCI bridge can be specified, if resource
 				windows need to be expanded.
 			noresize: Don't change the resources' sizes when
diff --git a/arch/powerpc/include/asm/pci.h b/arch/powerpc/include/asm/pci.h
index a6f3ac0..742fd34 100644
--- a/arch/powerpc/include/asm/pci.h
+++ b/arch/powerpc/include/asm/pci.h
@@ -28,6 +28,8 @@
 #define PCIBIOS_MIN_IO		0x1000
 #define PCIBIOS_MIN_MEM		0x10000000
 
+#define PCIBIOS_MIN_ALIGNMENT	PAGE_SIZE
+
 struct pci_dev;
 
 /* Values for the `which' argument to sys_pciconfig_iobase syscall. */
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 3ee13e5..664f295 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4759,7 +4759,12 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev,
 	int seg, bus, slot, func, align_order, count;
 	resource_size_t align = 0;
 	char *p;
+	bool invalid = false;
 
+#ifdef PCIBIOS_MIN_ALIGNMENT
+	align = PCIBIOS_MIN_ALIGNMENT;
+	*resize = false;
+#endif

This PCIBIOS_MIN_ALIGNMENT part should be a separate patch by itself. OK, I will. If you have PCIBIOS_MIN_ALIGNMENT enabled automatically for powerpc, do you still need the command-line argument? Other archs may benefit from this. And using the command line seems more flexible, in that we can enable/disable this feature dynamically.
 	spin_lock(&resource_alignment_lock);
 	p = resource_alignment_param;
 	while (*p) {
@@ -4776,16 +4781,49 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev,
 		} else {
 			align_order = -1;
 		}
-		if (sscanf(p, "%x:%x:%x.%x%n",
-			&seg, &bus, &slot, &func, &count) != 4) {
+		if (p[0] == '*' && p[1] == ':') {
+			seg = -1;
+			count = 1;
+		} else if (sscanf(p, "%x%n", &seg, &count) != 1 ||
+				p[count] != ':') {
+			invalid = true;
+			break;
+		}
+		p += count + 1;
+		if (*p == '*') {
+
Re: [v6, 08/11] powerpc/powernv: Add platform support for stop instruction
> > > +#define OPAL_PM_TIMEBASE_STOP		0x0002
> > > +#define OPAL_PM_LOSE_HYP_CONTEXT	0x2000
> > > +#define OPAL_PM_LOSE_FULL_CONTEXT	0x4000
> > > #define OPAL_PM_NAP_ENABLED		0x0001
> > > #define OPAL_PM_SLEEP_ENABLED		0x0002
> > > #define OPAL_PM_WINKLE_ENABLED		0x0004
> > > #define OPAL_PM_SLEEP_ENABLED_ER1	0x0008 /* with workaround */
> > > +#define OPAL_PM_STOP_INST_FAST		0x0010
> > > +#define OPAL_PM_STOP_INST_DEEP		0x0020
> > I don't see the above in skiboot yet?
> I've posted it here - http://patchwork.ozlabs.org/patch/617828/

FWIW, this is in now. https://github.com/open-power/skiboot/commit/952daa69baca407383bc900911f6c40718a0e289

> > > diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
> > > index 546540b..ae91b44 100644
> > > --- a/arch/powerpc/include/asm/paca.h
> > > +++ b/arch/powerpc/include/asm/paca.h
> > > @@ -171,6 +171,8 @@ struct paca_struct {
> > > 	/* Mask to denote subcore sibling threads */
> > > 	u8 subcore_sibling_mask;
> > > #endif
> > > +	/* Template for PSSCR with EC, ESL, TR, PSLL, MTL fields set */
> > > +	u64 thread_psscr;
> > I'm not entirely clear on why that needs to be in the paca. Could it not be global?
>
> While we use the Requested Level (RL) field of PSSCR to request a stop level, other fields in the SPR like EC, ESL, TR, PSLL, MTL can be modified by individual threads less frequently to alter the behaviour of stop. So the idea was to have a per-thread variable with all (except RL) fields of PSSCR set appropriately. Threads, at the time of entering idle, can modify the RL field in the variable and execute the stop instruction.

But we don't do any of this currently? This is set up at init in pnv_init_idle_states() and only the RL is changed in power_stop(). So it can still be a global. It could even just be a constant currently.
> .text
>
> /*
> @@ -61,8 +75,19 @@ save_sprs_to_stack:
> 	* Note all register i.e per-core, per-subcore or per-thread is saved
> 	* here since any thread in the core might wake up first
> 	*/
> +BEGIN_FTR_SECTION
> +	mfspr	r3,SPRN_PTCR
> +	std	r3,_PTCR(r1)
> +	mfspr	r3,SPRN_LMRR
> +	std	r3,_LMRR(r1)
> +	mfspr	r3,SPRN_LMSER
> +	std	r3,_LMSER(r1)
> +	mfspr	r3,SPRN_ASDR
> +	std	r3,_ASDR(r1)
> +FTR_SECTION_ELSE
> > A comment here saying that SDR1 is removed in ISA 3.0 would be helpful.
>
> Ok.

I thought we decided we didn't need LMRR and LMSER (https://lkml.org/lkml/2016/6/8/1121), and ASDR isn't actually used at all yet and is only valid for some page faults, so we don't need it here either.

> +END_MMU_FTR_SECTION_IFCLR(MMU_FTR_RADIX)
> > > +
> > > +	/* Restore per thread state */
> > > +BEGIN_FTR_SECTION
> > > +	bl	__restore_cpu_power9
> > > +
> > > +	ld	r4,_LMRR(r1)
> > > +	mtspr	SPRN_LMRR,r4
> > > +	ld	r4,_LMSER(r1)
> > > +	mtspr	SPRN_LMSER,r4
> > > +	ld	r4,_ASDR(r1)
> > > +	mtspr	SPRN_ASDR,r4
> > Should those be in __restore_cpu_power9?
> I was not sure how these registers will be used, but after speaking to Aneesh and Mikey I realized these registers will not need restoring. LMRR and LMSER are associated with the context and ASDR will be consumed before entering stop. So I'll be dropping this hunk in the next revision.

Yep.

> > > > pnv_alloc_idle_core_states();
> > > >
> > > > +	if (supported_cpuidle_states & OPAL_PM_STOP_INST_FAST)
> > > > +		for_each_possible_cpu(i) {
> > > > +
> > > > +			u64 psscr_init_val = PSSCR_ESL | PSSCR_EC |
> > > > +				PSSCR_PSLL_MASK | PSSCR_TR_MASK |
> > > > +				PSSCR_MTL_MASK;
> > > > +
> > > > +			paca[i].thread_psscr = psscr_init_val;

This seems to be the only place you set this. Why put it in the paca, why not just make it a constant?

Mikey
Re: powerpc/powernv: Exclude MSI region in extended bridge window
On Tue, Jun 21, 2016 at 02:30:48PM +1000, Michael Ellerman wrote: >On Tue, 2016-21-06 at 02:41:05 UTC, Gavin Shan wrote: >> The windows of root port and bridge behind that are extended to >> the PHB's windows to accomodate the PCI hotplug happening in >> future. The PHB's 64KB 32-bits MSI region is included in bridge's >> M32 windows (in hardware) though it's excluded in the corresponding >> resource, as the bridge's M32 windows have 1MB as their minimal >> alignment. We observed EEH error during system boot when the MSI >> region is included in bridge's M32 window. >> >> This excludes top 1MB (including 64KB 32-bits MSI region) region >> from bridge's M32 windows when extending them. > >AFAICS you added that code in "powerpc/powernv: Extend PCI bridge resources", >so >I'll squash it into that. That way there is no window of breakage. > Yeah, I guess it's the best way to go. Thanks a lot, Michael. Thanks, Gavin ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev