Re: [PATCH] ppc: Fix BPF JIT for ABIv2
On Tue, 2016-06-21 at 14:28 +0530, Naveen N. Rao wrote:
> On 2016/06/20 03:56PM, Thadeu Lima de Souza Cascardo wrote:
> > On Sun, Jun 19, 2016 at 11:19:14PM +0530, Naveen N. Rao wrote:
> > > On 2016/06/17 10:00AM, Thadeu Lima de Souza Cascardo wrote:
> > > >
> > > > Hi, Michael and Naveen.
> > > >
> > > > I noticed independently that there is a problem with BPF JIT and
> > > > ABIv2, and worked out the patch below before I noticed Naveen's
> > > > patchset and the latest changes in the ppc tree for a better way to
> > > > check for ABI versions.
> > > >
> > > > However, since the issue described below affects mainline and stable
> > > > kernels, would you consider applying it before merging your two
> > > > patchsets, so that we can more easily backport the fix?
> > >
> > > Hi Cascardo,
> > > Given that this has been broken on ABIv2 since forever, I didn't bother
> > > fixing it. But, I can see why this would be a good thing to have for
> > > -stable and existing distros. However, while your patch below may fix
> > > the crash you're seeing on ppc64le, it is not sufficient -- you'll need
> > > changes in bpf_jit_asm.S as well.
> >
> > Hi, Naveen.
> >
> > Any tips on how to exercise possible issues there? Or what changes you
> > think would be sufficient?
>
> The calling convention is different with ABIv2 and so we'll need changes
> in bpf_slow_path_common() and sk_negative_common().
>
> However, rather than enabling classic JIT for ppc64le, are we better off
> just disabling it?
>
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -128,7 +128,7 @@ config PPC
>  	select IRQ_FORCED_THREADING
>  	select HAVE_RCU_TABLE_FREE if SMP
>  	select HAVE_SYSCALL_TRACEPOINTS
> -	select HAVE_CBPF_JIT
> +	select HAVE_CBPF_JIT if CPU_BIG_ENDIAN
>  	select HAVE_ARCH_JUMP_LABEL
>  	select ARCH_HAVE_NMI_SAFE_CMPXCHG
>  	select ARCH_HAS_GCOV_PROFILE_ALL
>
> Michael,
> Let me know your thoughts on whether you intend to take this patch or
> Cascardo's patch for -stable before the eBPF patches. I can redo my
> patches accordingly.

Can one of you send me a proper version of this patch, with change log and
sign-off etc.

cheers

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
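The crash under discussion comes from ELFv1's use of function descriptors: on ABIv1 a function symbol points at a three-doubleword descriptor (entry address, TOC pointer, environment), while on ABIv2 it points directly at code. Below is a minimal userspace sketch of that layout; the struct and helper names are illustrative, not the kernel's (the real logic lives in arch/powerpc/net/bpf_jit_comp.c):

```c
#include <stdint.h>

/* ELF ABIv1 function descriptor: three doublewords, which is where the
 * kernel's FUNCTION_DESCR_SIZE of 24 comes from. On ABIv2 there is no
 * descriptor, so the size is 0 and the JIT image starts with code. */
struct func_descr {
    uint64_t entry; /* address of the first instruction */
    uint64_t toc;   /* TOC (r2) value for the function   */
    uint64_t env;   /* environment pointer, unused by C  */
};

/* Illustrative helper: bytes of header preceding the JITed code. */
static unsigned long function_descr_size(int abiv2)
{
    return abiv2 ? 0 : sizeof(struct func_descr);
}
```

On ABIv1 a call through the image first dereferences the descriptor; a JIT that emits a descriptor on ABIv2 (or omits it on ABIv1) ends up jumping into data, which is the kind of crash reported in this thread.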
Re: [PATCH] ppc: Fix BPF JIT for ABIv2
On Fri, 2016-06-17 at 10:00 -0300, Thadeu Lima de Souza Cascardo wrote:
> From a984dc02b6317a1d3a3c2302385adba5227be5bd Mon Sep 17 00:00:00 2001
> From: Thadeu Lima de Souza Cascardo
> Date: Wed, 15 Jun 2016 13:22:12 -0300
> Subject: [PATCH] ppc: Fix BPF JIT for ABIv2
>
> ABIv2 used for ppc64le does not use function descriptors. Without this
> patch, whenever BPF JIT is enabled, we get a crash as below.
> ...
> diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
> index 889fd19..28b89ed 100644
> --- a/arch/powerpc/net/bpf_jit.h
> +++ b/arch/powerpc/net/bpf_jit.h
> @@ -70,7 +70,7 @@ DECLARE_LOAD_FUNC(sk_load_half);
>  DECLARE_LOAD_FUNC(sk_load_byte);
>  DECLARE_LOAD_FUNC(sk_load_byte_msh);
>
> -#ifdef CONFIG_PPC64
> +#if defined(CONFIG_PPC64) && (!defined(_CALL_ELF) || _CALL_ELF != 2)
>  #define FUNCTION_DESCR_SIZE 24
>  #else
>  #define FUNCTION_DESCR_SIZE 0
> diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
> index 2d66a84..035b887 100644
> --- a/arch/powerpc/net/bpf_jit_comp.c
> +++ b/arch/powerpc/net/bpf_jit_comp.c
> @@ -664,7 +664,7 @@ void bpf_jit_compile(struct bpf_prog *fp)
>
>  	if (image) {
>  		bpf_flush_icache(code_base, code_base + (proglen/4));
> -#ifdef CONFIG_PPC64
> +#if defined(CONFIG_PPC64) && (!defined(_CALL_ELF) || _CALL_ELF != 2)
>  		/* Function descriptor nastiness: Address + TOC */
>  		((u64 *)image)[0] = (u64)code_base;
>  		((u64 *)image)[1] = local_paca->kernel_toc;

Confirmed that even with this patch we still crash:

# echo 1 > /proc/sys/net/core/bpf_jit_enable
# modprobe test_bpf
BPF filter opcode 0020 (@3) unsupported
BPF filter opcode 0020 (@2) unsupported
BPF filter opcode 0020 (@0) unsupported
Unable to handle kernel paging request for data at address 0xd54f65e8
Faulting instruction address: 0xc08765f8
cpu 0x0: Vector: 300 (Data Access) at [c34f3480]
    pc: c08765f8: skb_copy_bits+0x158/0x330
    lr: c008fb7c: bpf_slow_path_byte+0x28/0x54
    sp: c34f3700
   msr: 80010280b033
   dar: d54f65e8
 dsisr: 4000
  current = 0xc001f857d8d0
  paca    = 0xc7b8	 softe: 0	 irq_happened: 0x01
    pid   = 2993, comm = modprobe
Linux version 4.7.0-rc3-00055-g9497a1c1c5b4-dirty (mich...@ka3.ozlabs.ibm.com) () #30 SMP Wed Jun 22 15:06:58 AEST 2016
enter ? for help
[c34f3770] c008fb7c bpf_slow_path_byte+0x28/0x54
[c34f37e0] d7bb004c
[c34f3900] d5331668 test_bpf_init+0x5fc/0x7f8 [test_bpf]
[c34f3a30] c000b628 do_one_initcall+0x68/0x1d0
[c34f3af0] c09beb24 do_init_module+0x90/0x240
[c34f3b80] c01642bc load_module+0x206c/0x22f0
[c34f3d30] c01648b0 SyS_finit_module+0x120/0x180
[c34f3e30] c0009260 system_call+0x38/0x108
--- Exception: c01 (System Call) at 3fff7ffa2db4

cheers
Re: [PATCH 1/2] workqueue: Move wq_update_unbound_numa() to the beginning of CPU_ONLINE
On Tue, Jun 21, 2016 at 09:47:19PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 21, 2016 at 03:43:56PM -0400, Tejun Heo wrote:
> > On Tue, Jun 21, 2016 at 09:37:09PM +0200, Peter Zijlstra wrote:
> > > Hurm.. So I've applied it, just to get this issue sorted, but I'm not
> > > entirely sure I like it.
> > >
> > > I think I prefer ego's version because that makes it harder to get
> > > stuff to run on !active,online cpus. I think we really want to be
> > > careful what gets to run during that state.
> >
> > The original patch just did set_cpus_allowed one more time late enough
> > so that the target kthread (in most cases) doesn't have to go through
> > fallback rq selection afterwards. I don't know what the long term
> > solution is but CPU_ONLINE callbacks should be able to bind kthreads
> > to the new CPU one way or the other.
>
> Fair enough; clearly I need to stare harder. In any case, patch is on
> its way to sched/urgent.

Thanks Tejun, Peter!

--
Regards
gautham.
Re: [PATCH v6 07/11] powerpc/powernv: set power_save func after the idle states are initialized
On Wed, 2016-06-22 at 11:54 +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2016-06-08 at 11:54 -0500, Shreyas B. Prabhu wrote:
> >
> > pnv_init_idle_states discovers supported idle states from the
> > device tree and does the required initialization. Set the power_save
> > function pointer only after this initialization is done.
> >
> > Reviewed-by: Gautham R. Shenoy
> > Signed-off-by: Shreyas B. Prabhu
>
> Acked-by: Benjamin Herrenschmidt
>
> Please merge that one as-is now, no need to wait for the rest, as
> otherwise power9 crashes at boot. It doesn't need to wait for the
> rest of the series.

Acked-by: Michael Neuling

For the same reason. Without this we need powersave=off on the cmdline
on POWER9.

Mikey

> Cheers,
> Ben.
>
> > ---
> > - No changes since v1
> >
> >  arch/powerpc/platforms/powernv/idle.c  | 3 +++
> >  arch/powerpc/platforms/powernv/setup.c | 2 +-
> >  2 files changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
> > index fcc8b68..fbb09fb 100644
> > --- a/arch/powerpc/platforms/powernv/idle.c
> > +++ b/arch/powerpc/platforms/powernv/idle.c
> > @@ -285,6 +285,9 @@ static int __init pnv_init_idle_states(void)
> >  	}
> >
> >  	pnv_alloc_idle_core_states();
> > +
> > +	if (supported_cpuidle_states & OPAL_PM_NAP_ENABLED)
> > +		ppc_md.power_save = power7_idle;
> >  out_free:
> >  	kfree(flags);
> >  out:
> > diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
> > index ee6430b..8492bbb 100644
> > --- a/arch/powerpc/platforms/powernv/setup.c
> > +++ b/arch/powerpc/platforms/powernv/setup.c
> > @@ -315,7 +315,7 @@ define_machine(powernv) {
> >  	.get_proc_freq = pnv_get_proc_freq,
> >  	.progress = pnv_progress,
> >  	.machine_shutdown = pnv_shutdown,
> > -	.power_save = power7_idle,
> > +	.power_save = NULL,
> >  	.calibrate_decr = generic_calibrate_decr,
> >  #ifdef CONFIG_KEXEC
> >  	.kexec_cpu_down = pnv_kexec_cpu_down,
Re: [PATCH v13 01/16] PCI: Let pci_mmap_page_range() take resource address
On Sat, Jun 18, 2016 at 5:17 AM, Bjorn Helgaas wrote:
> On Fri, Jun 17, 2016 at 07:24:46PM -0700, Yinghai Lu wrote:
>> In 8c05cd08a7 ("PCI: fix offset check for sysfs mmapped files"), we try
>> to check the exposed value against the resource start/end in the proc
>> mmap path.
>>
>> |	start = vma->vm_pgoff;
>> |	size = ((pci_resource_len(pdev, resno) - 1) >> PAGE_SHIFT) + 1;
>> |	pci_start = (mmap_api == PCI_MMAP_PROCFS) ?
>> |			pci_resource_start(pdev, resno) >> PAGE_SHIFT : 0;
>> |	if (start >= pci_start && start < pci_start + size &&
>> |			start + nr <= pci_start + size)
>>
>> That breaks sparc, where the exposed value is the BAR value and needs
>> to be offset to the resource address.
>
> I asked this same question of the v12 patch, but I don't think you
> answered it:
>
> I'm not quite sure what you're saying here. Are you saying that sparc
> is currently broken, and this patch fixes it? If so, what exactly is
> broken? Can you give a small example of an mmap that is currently
> broken?

Yes, for sparc that path (proc mmap) is broken, but only according to the
code check. The reason the problem was not discovered is that it seems
all users (other than x86) do not use proc mmap.

In that code segment, vma->vm_pgoff is the user/BAR value >> PAGE_SHIFT,
and pci_start is resource->start >> PAGE_SHIFT. For sparc, the resource
start is different from the BAR start, aka the PCI bus address: the PCI
bus address plus an offset gives the resource start.

Thanks

Yinghai
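The range check Yinghai quotes can be sketched in userspace. The function name and the example addresses below are made up for illustration; the check itself mirrors the logic from 8c05cd08a7, where `start` is what user space passed via mmap (`vma->vm_pgoff`), already in pages:

```c
#include <stdint.h>
#include <stdbool.h>

#define PAGE_SHIFT 12

/* Sketch of the PCI_MMAP_PROCFS range check: all quantities in pages. */
static bool mmap_offset_ok(uint64_t start, uint64_t res_start,
                           uint64_t res_len, uint64_t nr_pages)
{
    uint64_t size = ((res_len - 1) >> PAGE_SHIFT) + 1;
    uint64_t pci_start = res_start >> PAGE_SHIFT;

    return start >= pci_start && start < pci_start + size &&
           start + nr_pages <= pci_start + size;
}
```

On sparc the value exposed to user space is the BAR (bus) address, not the CPU resource address, so `start` sits in a different window than `pci_start` and the check rejects a perfectly valid mmap unless the bus-to-resource offset is applied first — which is what the patch series does.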
Re: [PATCH] ppc: Fix BPF JIT for ABIv2
On Tue, 2016-06-21 at 08:45 -0700, Alexei Starovoitov wrote:
> On 6/21/16 7:47 AM, Thadeu Lima de Souza Cascardo wrote:
> > > > The calling convention is different with ABIv2 and so we'll need
> > > > changes in bpf_slow_path_common() and sk_negative_common().
> > >
> > > How big would those changes be? Do we know?
> > >
> > > How come no one reported this was broken previously? This is the
> > > first I've heard of it being broken.
> >
> > I just heard of it less than two weeks ago, and only could investigate
> > it last week, when I realized mainline was also affected.
> >
> > It looks like the little-endian support for the classic JIT was done
> > before the conversion to ABIv2. And as the JIT is disabled by default,
> > no one seems to have exercised it.
>
> it's not a surprise unfortunately. The JITs that were written before
> test_bpf.ko was developed were missing corner cases. Typical tcpdump
> would be fine, but fragmented packets, negative offsets and
> out-of-bounds wouldn't be handled correctly.
> I'd suggest to validate the stable backport with test_bpf as well.

OK thanks. I have been running selftests/net/test_bpf, but I realise now
it doesn't enable the JIT.

cheers
Re: [PATCH v3] tools/perf: Fix the mask in regs_dump__printf and print_sample_iregs
On Tuesday 21 June 2016 09:05 PM, Yury Norov wrote:
> On Tue, Jun 21, 2016 at 08:26:40PM +0530, Madhavan Srinivasan wrote:
>> When decoding the perf_regs mask in regs_dump__printf(),
>> we loop through the mask using the find_first_bit and find_next_bit
>> functions. "mask" is of type "u64", but it is sent as an
>> "unsigned long *" to the lib functions along with sizeof().
>>
>> While the existing code works fine in most cases, the logic is broken
>> when using a 32-bit perf on a 64-bit kernel (big endian). When reading
>> a u64 via (u32 *)(&mask)[0], perf (lib/find_*_bit()) assumes it gets
>> the lower 32 bits of the u64, which is wrong. The proposed fix is to
>> swap the words of the u64 to handle this case. This is _not_ an
>> endianness swap.
>>
>> Suggested-by: Yury Norov
>> Cc: Yury Norov
>> Cc: Peter Zijlstra
>> Cc: Ingo Molnar
>> Cc: Arnaldo Carvalho de Melo
>> Cc: Alexander Shishkin
>> Cc: Jiri Olsa
>> Cc: Adrian Hunter
>> Cc: Kan Liang
>> Cc: Wang Nan
>> Cc: Michael Ellerman
>> Signed-off-by: Madhavan Srinivasan
>> ---
>> Changelog v2:
>> 1) Moved the swap code to a common function
>> 2) Added more comments in the code
>>
>> Changelog v1:
>> 1) updated commit message and patch subject
>> 2) Add the fix to print_sample_iregs() in builtin-script.c
>>
>>  tools/include/linux/bitmap.h | 9 +
>
> What about include/linux/bitmap.h? I think we'd place it there first.

Wanted to handle that separately.

>>  tools/perf/builtin-script.c  | 16 +++-
>>  tools/perf/util/session.c    | 16 +++-
>>  3 files changed, 39 insertions(+), 2 deletions(-)
>>
>> diff --git a/tools/include/linux/bitmap.h b/tools/include/linux/bitmap.h
>> index 28f5493da491..79998b26eb04 100644
>> --- a/tools/include/linux/bitmap.h
>> +++ b/tools/include/linux/bitmap.h
>> @@ -2,6 +2,7 @@
>>  #define _PERF_BITOPS_H
>>
>>  #include
>> +#include
>>  #include
>>
>>  #define DECLARE_BITMAP(name,bits) \
>> @@ -22,6 +23,14 @@ void __bitmap_or(unsigned long *dst, const unsigned long *bitmap1,
>>  #define small_const_nbits(nbits) \
>>  	(__builtin_constant_p(nbits) && (nbits) <= BITS_PER_LONG)
>>
>> +static inline void bitmap_from_u64(unsigned long *_mask, u64 mask)
>
> Inline is not required. Some people don't like it. An underscored
> parameter in a function declaration is not the best idea as well. Try:
> static void bitmap_from_u64(unsigned long *bitmap, u64 mask)

Not sure why you say that. IIUC we can avoid a function call overhead;
also the rest of the functions in the file like it.

>> +{
>> +	_mask[0] = mask & ULONG_MAX;
>> +
>> +	if (sizeof(mask) > sizeof(unsigned long))
>> +		_mask[1] = mask >> 32;
>> +}
>> +
>>  static inline void bitmap_zero(unsigned long *dst, int nbits)
>>  {
>>  	if (small_const_nbits(nbits))
>> diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
>> index e3ce2f34d3ad..73928310fd91 100644
>> --- a/tools/perf/builtin-script.c
>> +++ b/tools/perf/builtin-script.c
>> @@ -412,11 +412,25 @@ static void print_sample_iregs(struct perf_sample *sample,
>>  	struct regs_dump *regs = &sample->intr_regs;
>>  	uint64_t mask = attr->sample_regs_intr;
>>  	unsigned i = 0, r;
>> +	unsigned long _mask[sizeof(mask)/sizeof(unsigned long)];
>
> If we start with it, I think we'd hide the declaration machinery as well:
>
> #define DECLARE_L64_BITMAP(__name) unsigned long __name[sizeof(u64)/sizeof(unsigned long)]
> or
> #define L64_BITMAP_SIZE (sizeof(u64)/sizeof(unsigned long))
>
> Or both :) Whatever you prefer.

ok

>>
>>  	if (!regs)
>>  		return;
>>
>> -	for_each_set_bit(r, (unsigned long *) &mask, sizeof(mask) * 8) {
>> +	/*
>> +	 * Since u64 is passed as 'unsigned long *', check
>> +	 * to see whether we need to swap words within u64.
>> +	 * Reason being, in 32 bit big endian userspace on a
>> +	 * 64bit kernel, 'unsigned long' is 32 bits.
>> +	 * When reading u64 using (u32 *)(&mask)[0] and (u32 *)(&mask)[1],
>> +	 * we will get wrong value for the mask. This is what
>> +	 * find_first_bit() and find_next_bit() is doing.
>> +	 * Issue here is "(u32 *)(&mask)[0]" gets upper 32 bits of u64,
>> +	 * but perf assumes it gets lower 32bits of u64. Hence the check
>> +	 * and swap.
>> +	 */
>> +	bitmap_from_u64(_mask, mask);
>> +	for_each_set_bit(r, _mask, sizeof(mask) * 8) {
>>  		u64 val = regs->regs[i++];
>>  		printf("%5s:0x%"PRIx64" ", perf_reg_name(r), val);
>>  	}
>> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
>> index 5214974e841a..1337b1c73f82 100644
>> --- a/tools/perf/util/session.c
>> +++ b/tools/perf/util/session.c
>> @@ -940,8 +940,22 @@ static void branch_stack__printf(struct
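The helper being debated can be exercised in plain userspace C. This is a sketch, not the final tools/include version: it splits a u64 into `unsigned long` words explicitly, so the result does not depend on how the host lays out a u64 in memory, which is exactly what casting `&mask` to `unsigned long *` gets wrong on 32-bit big-endian (there, `((uint32_t *)&mask)[0]` is the upper word, so the cast-pointer iteration visits bits 32-63 first):

```c
#include <stdint.h>
#include <limits.h>

/* Userspace sketch of the proposed bitmap_from_u64(): fill dst[] in the
 * word order find_first_bit()/find_next_bit() expect, regardless of the
 * host's endianness or word size. */
static void bitmap_from_u64(unsigned long *dst, uint64_t mask)
{
    dst[0] = mask & ULONG_MAX;          /* low bits first */

    if (sizeof(mask) > sizeof(unsigned long))
        dst[1] = mask >> 32;            /* high word, 32-bit hosts only */
}
```

On a 64-bit host the `dst[1]` branch is dead code and `dst[0]` holds the whole mask; on a 32-bit host `dst[0]`/`dst[1]` hold the low and high words, in that order, on both endiannesses.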
[PATCH v2] ibmvnic: fix to use list_for_each_safe() when delete items
Since we will remove items off the list using list_del() we need to use a
safe version of the list_for_each() macro aptly named list_for_each_safe().

Signed-off-by: Wei Yongjun
---
 drivers/net/ethernet/ibm/ibmvnic.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 864cb21..ecdb685 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -2121,7 +2121,7 @@ static void handle_error_info_rsp(union ibmvnic_crq *crq,
 				  struct ibmvnic_adapter *adapter)
 {
 	struct device *dev = &adapter->vdev->dev;
-	struct ibmvnic_error_buff *error_buff;
+	struct ibmvnic_error_buff *error_buff, *tmp;
 	unsigned long flags;
 	bool found = false;
 	int i;
@@ -2133,7 +2133,7 @@ static void handle_error_info_rsp(union ibmvnic_crq *crq,
 	}

 	spin_lock_irqsave(&adapter->error_list_lock, flags);
-	list_for_each_entry(error_buff, &adapter->errors, list)
+	list_for_each_entry_safe(error_buff, tmp, &adapter->errors, list)
 		if (error_buff->error_id == crq->request_error_rsp.error_id) {
 			found = true;
 			list_del(&error_buff->list);
@@ -3141,14 +3141,14 @@ static void handle_request_ras_comp_num_rsp(union ibmvnic_crq *crq,

 static void ibmvnic_free_inflight(struct ibmvnic_adapter *adapter)
 {
-	struct ibmvnic_inflight_cmd *inflight_cmd;
+	struct ibmvnic_inflight_cmd *inflight_cmd, *tmp1;
 	struct device *dev = &adapter->vdev->dev;
-	struct ibmvnic_error_buff *error_buff;
+	struct ibmvnic_error_buff *error_buff, *tmp2;
 	unsigned long flags;
 	unsigned long flags2;

 	spin_lock_irqsave(&adapter->inflight_lock, flags);
-	list_for_each_entry(inflight_cmd, &adapter->inflight, list) {
+	list_for_each_entry_safe(inflight_cmd, tmp1, &adapter->inflight, list) {
 		switch (inflight_cmd->crq.generic.cmd) {
 		case LOGIN:
 			dma_unmap_single(dev, adapter->login_buf_token,
@@ -3165,8 +3165,8 @@ static void ibmvnic_free_inflight(struct ibmvnic_adapter *adapter)
 			break;
 		case REQUEST_ERROR_INFO:
 			spin_lock_irqsave(&adapter->error_list_lock, flags2);
-			list_for_each_entry(error_buff, &adapter->errors,
-					    list) {
+			list_for_each_entry_safe(error_buff, tmp2,
						 &adapter->errors, list) {
 				dma_unmap_single(dev, error_buff->dma,
 						 error_buff->len,
 						 DMA_FROM_DEVICE);
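The bug class this patch fixes can be demonstrated outside the kernel. Below is a deliberately minimal userspace re-implementation of the kernel list idiom (the real macros live in include/linux/list.h; the container type is hardcoded in the macro here for brevity). The point is that `list_del()` invalidates the cursor's `->next`, so the `_safe` variant must cache the next element in `tmp` before the loop body runs:

```c
#include <stddef.h>
#include <stdlib.h>

struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }
static void list_add(struct list_head *n, struct list_head *h)
{
    n->next = h->next; n->prev = h;
    h->next->prev = n; h->next = n;
}
static void list_del(struct list_head *e)
{
    e->prev->next = e->next; e->next->prev = e->prev;
    e->next = e->prev = NULL;   /* poison, as the kernel does */
}

struct error_buff { int error_id; struct list_head list; };

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified _safe iterator: 'n' always holds the next entry, computed
 * before the body can delete 'pos'. */
#define list_for_each_entry_safe(pos, n, head, member)                       \
    for (pos = container_of((head)->next, struct error_buff, member),        \
         n = container_of(pos->member.next, struct error_buff, member);      \
         &pos->member != (head);                                             \
         pos = n, n = container_of(n->member.next, struct error_buff, member))

/* Frees every element; with the non-safe iterator this would read
 * pos->list.next after list_del() poisoned it. Returns count freed. */
static int free_all(struct list_head *head)
{
    struct error_buff *pos, *tmp;
    int n = 0;

    list_for_each_entry_safe(pos, tmp, head, list) {
        list_del(&pos->list);
        free(pos);
        n++;
    }
    return n;
}
```

This mirrors the ibmvnic change: the loop bodies call `list_del()` (and free DMA buffers), so the plain `list_for_each_entry()` would advance through freed/poisoned memory.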
Re: [RESEND][PATCH v2] powerpc: Export thread_struct.used_vr/used_vsr to user space
On Tue, Jun 21, 2016 at 02:30:06PM +0800, Simon Guo wrote:
> Hi Michael,
> On Wed, Apr 06, 2016 at 03:00:12PM +0800, Simon Guo wrote:
> > These 2 fields track whether the user process has used Altivec/VSX
> > registers or not. They are used by the kernel to set up the signal
> > frame on the user stack correctly regarding the vector part.
> >
> > CRIU (Checkpoint and Restore In Userspace) builds the signal frame
> > for the restored process. It will need this exported information to
> > set up the signal frame correctly, and CRIU will need to restore these
> > 2 fields for the restored process.
> >
> > Signed-off-by: Simon Guo
> > Reviewed-by: Laurent Dufour
> > ---
>
> Just a kind reminder per our previous discussion.
> If possible, please help pull this in during the 4.8 merge window. Some
> CRIU work items are pending on it.
>
> Have a nice day.
>
> Thanks,
> - Simon

+ linuxppc-dev list

Thanks,
- Simon (IBM LTC)
Re: [PATCH] ibmvnic: fix to use list_for_each_safe() when delete items
Hi Thomas Falcon,

Thanks for finding this. I will send a new patch including your changes.

Regards,
Yongjun Wei

On 06/22/2016 12:01 AM, Thomas Falcon wrote:
> On 06/20/2016 10:50 AM, Thomas Falcon wrote:
> > On 06/17/2016 09:53 PM, weiyj...@163.com wrote:
> > > From: Wei Yongjun
> > >
> > > Since we will remove items off the list using list_del() we need to
> > > use a safe version of the list_for_each() macro aptly named
> > > list_for_each_safe().
> > >
> > > Signed-off-by: Wei Yongjun
> > > ---
> > >  drivers/net/ethernet/ibm/ibmvnic.c | 10 +-
> > >  1 file changed, 5 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
> > > index 864cb21..0b6a922 100644
> > > --- a/drivers/net/ethernet/ibm/ibmvnic.c
> > > +++ b/drivers/net/ethernet/ibm/ibmvnic.c
> > > @@ -3141,14 +3141,14 @@ static void handle_request_ras_comp_num_rsp(union ibmvnic_crq *crq,
> > >
> > >  static void ibmvnic_free_inflight(struct ibmvnic_adapter *adapter)
> > >  {
> > > -	struct ibmvnic_inflight_cmd *inflight_cmd;
> > > +	struct ibmvnic_inflight_cmd *inflight_cmd, *tmp1;
> > >  	struct device *dev = &adapter->vdev->dev;
> > > -	struct ibmvnic_error_buff *error_buff;
> > > +	struct ibmvnic_error_buff *error_buff, *tmp2;
> > >  	unsigned long flags;
> > >  	unsigned long flags2;
> > >
> > >  	spin_lock_irqsave(&adapter->inflight_lock, flags);
> > > -	list_for_each_entry(inflight_cmd, &adapter->inflight, list) {
> > > +	list_for_each_entry_safe(inflight_cmd, tmp1, &adapter->inflight, list) {
> > >  		switch (inflight_cmd->crq.generic.cmd) {
> > >  		case LOGIN:
> > >  			dma_unmap_single(dev, adapter->login_buf_token,
> > > @@ -3165,8 +3165,8 @@ static void ibmvnic_free_inflight(struct ibmvnic_adapter *adapter)
> > >  			break;
> > >  		case REQUEST_ERROR_INFO:
> > >  			spin_lock_irqsave(&adapter->error_list_lock, flags2);
> > > -			list_for_each_entry(error_buff, &adapter->errors,
> > > -					    list) {
> > > +			list_for_each_entry_safe(error_buff, tmp2,
> > > +						 &adapter->errors, list) {
> > >  				dma_unmap_single(dev, error_buff->dma,
> > >  						 error_buff->len,
> > >  						 DMA_FROM_DEVICE);
> >
> > Thanks!
> > Acked-by: Thomas Falcon
>
> Hello, I apologize for prematurely ack'ing this. There is another
> situation where you could use list_for_each_entry_safe in the function
> handle_error_info_rsp. Could you include this in your patch, please?
>
> diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
> index 864cb21..e9968d9 100644
> --- a/drivers/net/ethernet/ibm/ibmvnic.c
> +++ b/drivers/net/ethernet/ibm/ibmvnic.c
> @@ -2121,7 +2121,7 @@ static void handle_error_info_rsp(union ibmvnic_crq *crq,
> 				  struct ibmvnic_adapter *adapter)
>  {
>  	struct device *dev = &adapter->vdev->dev;
> -	struct ibmvnic_error_buff *error_buff;
> +	struct ibmvnic_error_buff *error_buff, *tmp;
>  	unsigned long flags;
>  	bool found = false;
>  	int i;
> @@ -2133,7 +2133,7 @@ static void handle_error_info_rsp(union ibmvnic_crq *crq,
>  	}
>
>  	spin_lock_irqsave(&adapter->error_list_lock, flags);
> -	list_for_each_entry(error_buff, &adapter->errors, list)
> +	list_for_each_entry_safe(error_buff, tmp, &adapter->errors, list)
>  		if (error_buff->error_id == crq->request_error_rsp.error_id) {
>  			found = true;
>  			list_del(&error_buff->list);
Re: [PATCH] powerpc: Fix faults caused by radix patching of SLB miss handler
Michael Ellerman writes:

> As part of the Radix MMU support we added some feature sections in the
> SLB miss handler. These are intended to catch the case that we
> incorrectly take an SLB miss when Radix is enabled, and instead of
> crashing weirdly they bail out to a well defined exit path and trigger
> an oops.
>
> However the way they were written meant the bailout case was enabled by
> default until we did CPU feature patching.
>
> On powermacs the early debug prints in setup_system() can cause an SLB
> miss, which happens before code patching, and so the SLB miss handler
> would incorrectly bailout and crash during boot.
>
> Fix it by inverting the sense of the feature section, so that the code
> which is in place at boot is correct for the hash case. Once we
> determine we are using Radix - which will never happen on a powermac -
> only then do we patch in the bailout case which unconditionally jumps.
>
> Fixes: caca285e5ab4 ("powerpc/mm/radix: Use STD_MMU_64 to properly isolate hash related code")
> Reported-by: Denis Kirjanov
> Tested-by: Denis Kirjanov
> Signed-off-by: Michael Ellerman

Reviewed-by: Aneesh Kumar K.V

> ---
>  arch/powerpc/kernel/exceptions-64s.S | 7 +++----
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
> index 4c9440629128..8bcc1b457115 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -1399,11 +1399,12 @@ END_MMU_FTR_SECTION_IFCLR(MMU_FTR_RADIX)
>  	lwz	r9,PACA_EXSLB+EX_CCR(r13)	/* get saved CR */
>
>  	mtlr	r10
> -BEGIN_MMU_FTR_SECTION
> -	b	2f
> -END_MMU_FTR_SECTION_IFSET(MMU_FTR_RADIX)
>  	andi.	r10,r12,MSR_RI	/* check for unrecoverable exception */
> +BEGIN_MMU_FTR_SECTION
>  	beq-	2f
> +FTR_SECTION_ELSE
> +	b	2f
> +ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_RADIX)
>
>  	.machine	push
>  	.machine	"power4"
> --
> 2.5.0
Re: [PATCH v6 07/11] powerpc/powernv: set power_save func after the idle states are initialized
On Wed, 2016-06-08 at 11:54 -0500, Shreyas B. Prabhu wrote:
> pnv_init_idle_states discovers supported idle states from the
> device tree and does the required initialization. Set the power_save
> function pointer only after this initialization is done.
>
> Reviewed-by: Gautham R. Shenoy
> Signed-off-by: Shreyas B. Prabhu

Acked-by: Benjamin Herrenschmidt

Please merge that one as-is now, no need to wait for the rest, as
otherwise power9 crashes at boot. It doesn't need to wait for the
rest of the series.

Cheers,
Ben.

> ---
> - No changes since v1
>
>  arch/powerpc/platforms/powernv/idle.c  | 3 +++
>  arch/powerpc/platforms/powernv/setup.c | 2 +-
>  2 files changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
> index fcc8b68..fbb09fb 100644
> --- a/arch/powerpc/platforms/powernv/idle.c
> +++ b/arch/powerpc/platforms/powernv/idle.c
> @@ -285,6 +285,9 @@ static int __init pnv_init_idle_states(void)
>  	}
>
>  	pnv_alloc_idle_core_states();
> +
> +	if (supported_cpuidle_states & OPAL_PM_NAP_ENABLED)
> +		ppc_md.power_save = power7_idle;
>  out_free:
>  	kfree(flags);
>  out:
> diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
> index ee6430b..8492bbb 100644
> --- a/arch/powerpc/platforms/powernv/setup.c
> +++ b/arch/powerpc/platforms/powernv/setup.c
> @@ -315,7 +315,7 @@ define_machine(powernv) {
>  	.get_proc_freq = pnv_get_proc_freq,
>  	.progress = pnv_progress,
>  	.machine_shutdown = pnv_shutdown,
> -	.power_save = power7_idle,
> +	.power_save = NULL,
>  	.calibrate_decr = generic_calibrate_decr,
>  #ifdef CONFIG_KEXEC
>  	.kexec_cpu_down = pnv_kexec_cpu_down,
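The pattern this patch implements - start with the callback NULL and install it only once initialization has determined it is safe - can be sketched in userspace. All names below are illustrative, not the kernel's; the idea is simply that the caller falls back to a default while the pointer is still NULL:

```c
#include <stddef.h>

/* Userspace sketch: a machine-description-style callback that is left
 * NULL until init decides a deeper power-save handler is supported. */
typedef int (*power_save_fn)(void);

static int deep_idle(void)      { return 1; }  /* stands in for power7_idle() */
static int default_snooze(void) { return 0; }  /* safe fallback */

static power_save_fn power_save; /* NULL until initialized */

static int do_idle(void)
{
    /* Fall back when no power-save handler has been installed yet. */
    return power_save ? power_save() : default_snooze();
}

/* Called once the supported idle states are known (cf. the
 * OPAL_PM_NAP_ENABLED check in pnv_init_idle_states). */
static void init_idle_states(int nap_supported)
{
    if (nap_supported)
        power_save = deep_idle;
}
```

Setting the pointer eagerly (the pre-patch `.power_save = power7_idle`) means the deep handler can be entered before its state is initialized, which is the boot crash Ben describes.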
Re: [PATCH 0/6] kexec_file: Add buffer hand-over for the next kernel
On 06/20/16 at 10:44pm, Thiago Jung Bauermann wrote:
> Hello,
>
> This patch series implements a mechanism which allows the kernel to pass
> on a buffer to the kernel that will be kexec'd. This buffer is passed as
> a segment which is added to the kimage when it is being prepared by
> kexec_file_load.
>
> How the second kernel is informed of this buffer is architecture-specific.
> On PowerPC, this is done via the device tree, by checking the properties
> /chosen/linux,kexec-handover-buffer-start and
> /chosen/linux,kexec-handover-buffer-end, which is analogous to how the
> kernel finds the initrd.
>
> This feature was implemented because the Integrity Measurement
> Architecture subsystem needs to preserve its measurement list across the
> kexec reboot. This is so that IMA can implement trusted boot support on
> the OpenPower platform, because on such systems an intermediary Linux
> instance running as part of the firmware is used to boot the target
> operating system via kexec. Using this mechanism, IMA on this
> intermediary instance can hand over to the target OS the measurements of
> the components that were used to boot it.

We have CONFIG_KEXEC_VERIFY_SIG, why not verify the kernel to be loaded
instead? I feel IMA should rebuild its measurements instead of passing
them to another kernel. A kexec reboot is also a reboot. If we have to
preserve something we get from firmware we can do that, but other than
that I think it is not a good idea.

> Because there could be additional measurement events between the
> kexec_file_load call and the actual reboot, IMA needs a way to update
> the buffer with those additional events before rebooting. One can
> minimize the interval between the kexec_file_load and the reboot
> syscalls, but as small as it can be, there is always the possibility
> that the measurement list will be out of date at the time of reboot.
>
> To address this issue, this patch series also introduces
> kexec_update_segment, which allows a reboot notifier to change the
> contents of the image segment during the reboot process.
>
> There's one patch which makes kimage_load_normal_segment and
> kexec_update_segment share code. It's not much code that they can share
> though, so I'm not sure if it's worth including this patch.
>
> The last patch is not intended to be merged, it just demonstrates how
> this feature can be used.
>
> This series applies on top of v2 of the "kexec_file_load implementation
> for PowerPC" patch series at:

The kexec_file_load patches should be addressed first, no?

> http://lists.infradead.org/pipermail/kexec/2016-June/016078.html
>
> Thiago Jung Bauermann (6):
>   kexec_file: Add buffer hand-over support for the next kernel
>   powerpc: kexec_file: Add buffer hand-over support for the next kernel
>   kexec_file: Allow skipping checksum calculation for some segments.
>   kexec_file: Add mechanism to update kexec segments.
>   kexec: Share logic to copy segment page contents.
>   IMA: Demonstration code for kexec buffer passing.
>
>  arch/powerpc/include/asm/kexec.h       |   9 ++
>  arch/powerpc/kernel/kexec_elf_64.c     |  50 +++-
>  arch/powerpc/kernel/machine_kexec_64.c |  64 ++
>  arch/x86/kernel/crash.c                |   4 +-
>  arch/x86/kernel/kexec-bzimage64.c      |   6 +-
>  include/linux/ima.h                    |  11 ++
>  include/linux/kexec.h                  |  47 +++-
>  kernel/kexec_core.c                    | 205 ++---
>  kernel/kexec_file.c                    | 102 ++--
>  security/integrity/ima/ima.h           |   5 +
>  security/integrity/ima/ima_init.c      |  26 +
>  security/integrity/ima/ima_template.c  |  79 +
>  12 files changed, 547 insertions(+), 61 deletions(-)
>
> --
> 1.9.1

Thanks
Dave
Re: [PATCH v3 2/2] KVM: PPC: Exit guest upon MCE when FWNMI capability is enabled
On Monday 20 June 2016 10:48 AM, Paul Mackerras wrote:
> Hi Aravinda,
>
> On Wed, Jan 13, 2016 at 12:38:09PM +0530, Aravinda Prasad wrote:
>> Enhance KVM to cause a guest exit with the KVM_EXIT_NMI
>> exit reason upon a machine check exception (MCE) in
>> the guest address space if the KVM_CAP_PPC_FWNMI
>> capability is enabled (instead of delivering the 0x200
>> interrupt to the guest). This enables QEMU to build an error
>> log and deliver the machine check exception to the guest via
>> the guest-registered machine check handler.
>>
>> This approach simplifies the delivery of the machine
>> check exception to the guest OS compared to the earlier
>> approach of KVM directly invoking the 0x200 guest interrupt
>> vector. In the earlier approach QEMU was enhanced to
>> patch the 0x200 interrupt vector during boot. The
>> patched code at 0x200 issued a private hcall to pass
>> control to QEMU to build the error log.
>>
>> This design/approach is based on the feedback for the
>> QEMU patches to handle machine check exceptions. Details
>> of the earlier approach of handling machine check exceptions
>> in QEMU and related discussions can be found at:
>>
>> https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg00813.html
>>
>> Signed-off-by: Aravinda Prasad
>
> Are you in the process of doing a new version of this patch with the
> requested changes?

Yes, I am working (intermittently) on the new version, but have not been
able to finish and post it. I will complete it and post the new version.

Regards,
Aravinda

> Paul.

--
Regards,
Aravinda
[PATCH v3 9/9] powerpc: Add purgatory for kexec_file_load implementation.
This purgatory implementation comes from kexec-tools, almost unchanged. The only changes were that the sha256_regions global variable was renamed to sha_regions to match what kexec_file_load expects, and to use the sha256.c file from x86's purgatory to avoid adding yet another SHA-256 implementation. Also, some formatting warnings found by checkpatch.pl were fixed.

Signed-off-by: Thiago Jung Bauermann
Cc: ke...@lists.infradead.org
Cc: linux-ker...@vger.kernel.org
---
 arch/powerpc/Makefile                     |   4 +
 arch/powerpc/purgatory/.gitignore         |   2 +
 arch/powerpc/purgatory/Makefile           |  36 +++
 arch/powerpc/purgatory/console-ppc64.c    |  38 +++
 arch/powerpc/purgatory/crashdump-ppc64.h  |  42
 arch/powerpc/purgatory/crashdump_backup.c |  36 +++
 arch/powerpc/purgatory/crtsavres.S        |   5 +
 arch/powerpc/purgatory/hvCall.S           |  27 +
 arch/powerpc/purgatory/hvCall.h           |   8 ++
 arch/powerpc/purgatory/kexec-sha256.h     |  11 ++
 arch/powerpc/purgatory/ppc64_asm.h        |  20
 arch/powerpc/purgatory/printf.c           | 164 ++
 arch/powerpc/purgatory/purgatory-ppc64.c  |  41
 arch/powerpc/purgatory/purgatory-ppc64.h  |   6 ++
 arch/powerpc/purgatory/purgatory.c        |  62 +++
 arch/powerpc/purgatory/purgatory.h        |  11 ++
 arch/powerpc/purgatory/sha256.c           |   6 ++
 arch/powerpc/purgatory/sha256.h           |   1 +
 arch/powerpc/purgatory/string.S           |   1 +
 arch/powerpc/purgatory/v2wrap.S           | 134
 20 files changed, 655 insertions(+)

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 709a22a3e824..293322855cce 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -249,6 +249,7 @@ core-y += arch/powerpc/kernel/ \
 core-$(CONFIG_XMON)           += arch/powerpc/xmon/
 core-$(CONFIG_KVM)            += arch/powerpc/kvm/
 core-$(CONFIG_PERF_EVENTS)    += arch/powerpc/perf/
+core-$(CONFIG_KEXEC_FILE)     += arch/powerpc/purgatory/

 drivers-$(CONFIG_OPROFILE)    += arch/powerpc/oprofile/
@@ -370,6 +371,9 @@ archclean:
        $(Q)$(MAKE) $(clean)=$(boot)

 archprepare: checkbin
+ifeq ($(CONFIG_KEXEC_FILE),y)
+       $(Q)$(MAKE) $(build)=arch/powerpc/purgatory arch/powerpc/purgatory/kexec-purgatory.c
+endif

 # Use the file '.tmp_gas_check' for binutils tests, as gas won't output
 # to stdout and these checks are run even on install targets.
diff --git a/arch/powerpc/purgatory/.gitignore b/arch/powerpc/purgatory/.gitignore
new file mode 100644
index ..e9e66f178a6d
--- /dev/null
+++ b/arch/powerpc/purgatory/.gitignore
@@ -0,0 +1,2 @@
+kexec-purgatory.c
+purgatory.ro
diff --git a/arch/powerpc/purgatory/Makefile b/arch/powerpc/purgatory/Makefile
new file mode 100644
index ..63daf95e5703
--- /dev/null
+++ b/arch/powerpc/purgatory/Makefile
@@ -0,0 +1,36 @@
+purgatory-y := purgatory.o printf.o string.o v2wrap.o hvCall.o \
+       purgatory-ppc64.o console-ppc64.o crashdump_backup.o \
+       crtsavres.o sha256.o
+
+targets += $(purgatory-y)
+PURGATORY_OBJS = $(addprefix $(obj)/,$(purgatory-y))
+
+LDFLAGS_purgatory.ro := -e purgatory_start -r --no-undefined -nostartfiles \
+       -nostdlib -nodefaultlibs
+targets += purgatory.ro
+
+# Default KBUILD_CFLAGS can have -pg option set when FTRACE is enabled. That
+# in turn leaves some undefined symbols like __fentry__ in purgatory and not
+# sure how to relocate those. Like kexec-tools, use custom flags.
+
+KBUILD_CFLAGS := -Wall -Wstrict-prototypes -fno-strict-aliasing \
+       -fno-zero-initialized-in-bss -fno-builtin -ffreestanding \
+       -fno-PIC -fno-PIE -fno-stack-protector -fno-exceptions \
+       -msoft-float -MD -Os
+KBUILD_CFLAGS += -m$(CONFIG_WORD_SIZE)
+
+$(obj)/purgatory.ro: $(PURGATORY_OBJS) FORCE
+       $(call if_changed,ld)
+
+targets += kexec-purgatory.c
+
+CMD_BIN2C = $(objtree)/scripts/basic/bin2c
+quiet_cmd_bin2c = BIN2C $@
+      cmd_bin2c = $(CMD_BIN2C) kexec_purgatory < $< > $@
+
+$(obj)/kexec-purgatory.c: $(obj)/purgatory.ro FORCE
+       $(call if_changed,bin2c)
+       @:
+
+
+obj-$(CONFIG_KEXEC_FILE) += kexec-purgatory.o
diff --git a/arch/powerpc/purgatory/console-ppc64.c b/arch/powerpc/purgatory/console-ppc64.c
new file mode 100644
index ..3d07be0b5d08
--- /dev/null
+++ b/arch/powerpc/purgatory/console-ppc64.c
@@ -0,0 +1,38 @@
+/*
+ * kexec: Linux boots Linux
+ *
+ * Created by: Mohan Kumar M (mo...@in.ibm.com)
+ *
+ * Copyright (C) IBM Corporation, 2005. All rights reserved
+ *
+ * Code taken from kexec-tools.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation (version 2 of the License).
+ *
+ * This program is distributed in
[PATCH v3 8/9] powerpc: Add support for loading ELF kernels with kexec_file_load.
This uses all the infrastructure built up by the previous patches in the series to load an ELF vmlinux file and an initrd. It uses the flattened device tree at initial_boot_params as a base and adjusts memory reservations and its /chosen node for the next kernel.

elf64_apply_relocate_add was extended to support relative symbols. This is necessary because the module loading mechanism adjusts Elf64_Sym.st_value to point to the absolute memory address before relocation, while the kexec purgatory relocation code does that adjustment during relocation. The patch also adds relocation types used by the purgatory.

Signed-off-by: Thiago Jung Bauermann
Cc: ke...@lists.infradead.org
Cc: linux-ker...@vger.kernel.org
---
 arch/powerpc/include/asm/elf_util.h     |   1 +
 arch/powerpc/include/asm/kexec_elf_64.h |  10 +
 arch/powerpc/kernel/Makefile            |   5 +-
 arch/powerpc/kernel/elf_util_64.c       |  84 -
 arch/powerpc/kernel/kexec_elf_64.c      | 560
 arch/powerpc/kernel/machine_kexec_64.c  |  86 -
 arch/powerpc/kernel/module_64.c         |   5 +-
 7 files changed, 747 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/elf_util.h b/arch/powerpc/include/asm/elf_util.h
index 47d15515ba33..18703d56eabd 100644
--- a/arch/powerpc/include/asm/elf_util.h
+++ b/arch/powerpc/include/asm/elf_util.h
@@ -86,6 +86,7 @@ int elf64_apply_relocate_add(const struct elf_info *elf_info,
                             const char *strtab, const Elf64_Rela *rela,
                             unsigned int num_rela, void *syms_base,
                             void *loc_base, Elf64_Addr addr_base,
+                            bool relative_symbols, bool check_symbols,
                             const char *obj_name);

 #endif /* _ASM_POWERPC_ELF_UTIL_H */
diff --git a/arch/powerpc/include/asm/kexec_elf_64.h b/arch/powerpc/include/asm/kexec_elf_64.h
new file mode 100644
index ..30da6bc0ccf8
--- /dev/null
+++ b/arch/powerpc/include/asm/kexec_elf_64.h
@@ -0,0 +1,10 @@
+#ifndef __POWERPC_KEXEC_ELF_64_H__
+#define __POWERPC_KEXEC_ELF_64_H__
+
+#ifdef CONFIG_KEXEC_FILE
+
+extern struct kexec_file_ops kexec_elf64_ops;
+
+#endif /* CONFIG_KEXEC_FILE */
+
+#endif /* __POWERPC_KEXEC_ELF_64_H__ */
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 8a53fccaa053..b89a2ae1b2a0 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -110,6 +110,7 @@ obj-$(CONFIG_PCI) += pci_$(CONFIG_WORD_SIZE).o $(pci64-y) \
 obj-$(CONFIG_PCI_MSI)         += msi.o
 obj-$(CONFIG_KEXEC)           += machine_kexec.o crash.o \
                                  machine_kexec_$(CONFIG_WORD_SIZE).o
+obj-$(CONFIG_KEXEC_FILE)      += kexec_elf_$(CONFIG_WORD_SIZE).o
 obj-$(CONFIG_AUDIT)           += audit.o
 obj64-$(CONFIG_AUDIT)         += compat_audit.o
@@ -124,9 +125,11 @@ ifneq ($(CONFIG_PPC_INDIRECT_PIO),y)
 obj-y                         += iomap.o
 endif

-ifeq ($(CONFIG_MODULES)$(CONFIG_WORD_SIZE),y64)
+ifneq ($(CONFIG_MODULES)$(CONFIG_KEXEC_FILE),)
+ifeq ($(CONFIG_WORD_SIZE),64)
 obj-y                         += elf_util.o elf_util_64.o
 endif
+endif

 obj64-$(CONFIG_PPC_TRANSACTIONAL_MEM) += tm.o
diff --git a/arch/powerpc/kernel/elf_util_64.c b/arch/powerpc/kernel/elf_util_64.c
index 8e5d400ac9f2..80f209a42abd 100644
--- a/arch/powerpc/kernel/elf_util_64.c
+++ b/arch/powerpc/kernel/elf_util_64.c
@@ -74,6 +74,8 @@ static void squash_toc_save_inst(const char *name, unsigned long addr) { }
  * @syms_base:  Contents of the associated symbol table.
  * @loc_base:   Contents of the section to which relocations apply.
  * @addr_base:  The address where the section will be loaded in memory.
+ * @relative_symbols:   Are the symbols' st_value members relative?
+ * @check_symbols:      Fail if an unexpected symbol is found?
  * @obj_name:   The name of the ELF binary, for information messages.
  *
  * Applies RELA relocations to an ELF file already at its final location
@@ -84,11 +86,13 @@ int elf64_apply_relocate_add(const struct elf_info *elf_info,
                             const char *strtab, const Elf64_Rela *rela,
                             unsigned int num_rela, void *syms_base,
                             void *loc_base, Elf64_Addr addr_base,
+                            bool relative_symbols, bool check_symbols,
                             const char *obj_name)
 {
        unsigned int i;
        unsigned long *location;
        unsigned long address;
+       unsigned long sec_base;
        unsigned long value;
        const char *name;
        Elf64_Sym *sym;
@@ -121,8 +125,36 @@ int elf64_apply_relocate_add(const struct elf_info *elf_info,
                       name, (unsigned long)sym->st_value,
                       (long)rela[i].r_addend);

+               if (check_symbols) {
+                       /*
+                        * TOC symbols appear as
[PATCH v3 7/9] powerpc: Implement kexec_file_load.
Adds the basic machinery needed by kexec_file_load.

Signed-off-by: Josh Sklar
Signed-off-by: Thiago Jung Bauermann
Cc: ke...@lists.infradead.org
Cc: linux-ker...@vger.kernel.org
---
 arch/powerpc/Kconfig                   | 13 +
 arch/powerpc/include/asm/systbl.h      |  1 +
 arch/powerpc/include/asm/unistd.h      |  2 +-
 arch/powerpc/include/uapi/asm/unistd.h |  1 +
 arch/powerpc/kernel/machine_kexec_64.c | 50 ++
 5 files changed, 66 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 01f7464d9fea..3ed5770b89e4 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -457,6 +457,19 @@ config KEXEC
          interface is strongly in flux, so no good recommendation can be
          made.

+config KEXEC_FILE
+       bool "kexec file based system call"
+       select KEXEC_CORE
+       select BUILD_BIN2C
+       depends on PPC64
+       depends on CRYPTO=y
+       depends on CRYPTO_SHA256=y
+       help
+         This is a new version of the kexec system call. This call is
+         file based and takes in file descriptors as system call arguments
+         for kernel and initramfs as opposed to a list of segments as is the
+         case for the older kexec call.
+
 config CRASH_DUMP
        bool "Build a kdump crash kernel"
        depends on PPC64 || 6xx || FSL_BOOKE || (44x && !SMP)
diff --git a/arch/powerpc/include/asm/systbl.h b/arch/powerpc/include/asm/systbl.h
index 2fc5d4db503c..4b369d83fe9c 100644
--- a/arch/powerpc/include/asm/systbl.h
+++ b/arch/powerpc/include/asm/systbl.h
@@ -386,3 +386,4 @@ SYSCALL(mlock2)
 SYSCALL(copy_file_range)
 COMPAT_SYS_SPU(preadv2)
 COMPAT_SYS_SPU(pwritev2)
+SYSCALL(kexec_file_load)
diff --git a/arch/powerpc/include/asm/unistd.h b/arch/powerpc/include/asm/unistd.h
index cf12c580f6b2..a01e97d3f305 100644
--- a/arch/powerpc/include/asm/unistd.h
+++ b/arch/powerpc/include/asm/unistd.h
@@ -12,7 +12,7 @@
 #include

-#define NR_syscalls    382
+#define NR_syscalls    383

 #define __NR__exit __NR_exit
diff --git a/arch/powerpc/include/uapi/asm/unistd.h b/arch/powerpc/include/uapi/asm/unistd.h
index e9f5f41aa55a..2f26335a3c42 100644
--- a/arch/powerpc/include/uapi/asm/unistd.h
+++ b/arch/powerpc/include/uapi/asm/unistd.h
@@ -392,5 +392,6 @@
 #define __NR_copy_file_range   379
 #define __NR_preadv2           380
 #define __NR_pwritev2          381
+#define __NR_kexec_file_load   382

 #endif /* _UAPI_ASM_POWERPC_UNISTD_H_ */
diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/machine_kexec_64.c
index 50bf55135ef8..b242f2293a6e 100644
--- a/arch/powerpc/kernel/machine_kexec_64.c
+++ b/arch/powerpc/kernel/machine_kexec_64.c
@@ -31,6 +31,10 @@
 #include
 #include

+#ifdef CONFIG_KEXEC_FILE
+static struct kexec_file_ops *kexec_file_loaders[] = { };
+#endif
+
 #ifdef CONFIG_PPC_BOOK3E
 int default_machine_kexec_prepare(struct kimage *image)
 {
@@ -427,3 +431,49 @@ static int __init export_htab_values(void)
 }
 late_initcall(export_htab_values);
 #endif /* CONFIG_PPC_STD_MMU_64 */
+
+#ifdef CONFIG_KEXEC_FILE
+int arch_kexec_kernel_image_probe(struct kimage *image, void *buf,
+                                 unsigned long buf_len)
+{
+       int i, ret = -ENOEXEC;
+       struct kexec_file_ops *fops;
+
+       /* We don't support crash kernels yet. */
+       if (image->type == KEXEC_TYPE_CRASH)
+               return -ENOTSUPP;
+
+       for (i = 0; i < ARRAY_SIZE(kexec_file_loaders); i++) {
+               fops = kexec_file_loaders[i];
+               if (!fops || !fops->probe)
+                       continue;
+
+               ret = fops->probe(buf, buf_len);
+               if (!ret) {
+                       image->fops = fops;
+                       return ret;
+               }
+       }
+
+       return ret;
+}
+
+void *arch_kexec_kernel_image_load(struct kimage *image)
+{
+       if (!image->fops || !image->fops->load)
+               return ERR_PTR(-ENOEXEC);
+
+       return image->fops->load(image, image->kernel_buf,
+                                image->kernel_buf_len, image->initrd_buf,
+                                image->initrd_buf_len, image->cmdline_buf,
+                                image->cmdline_buf_len);
+}
+
+int arch_kimage_file_post_load_cleanup(struct kimage *image)
+{
+       if (!image->fops || !image->fops->cleanup)
+               return 0;
+
+       return image->fops->cleanup(image->image_loader_data);
+}
+#endif /* CONFIG_KEXEC_FILE */
--
1.9.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
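The loader dispatch in arch_kexec_kernel_image_probe above is a first-match loop over a table of ops structures. Here is a hedged, self-contained sketch of the same pattern; every name in it (file_ops, probe_image, elf_probe) is an illustrative stand-in, not a kernel type:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-in for the kernel's kexec_file_ops probe hook. */
typedef int (*probe_fn)(const void *buf, unsigned long len);

struct file_ops {
	probe_fn probe;
};

/* Reject anything that doesn't start with the ELF magic bytes. */
static int elf_probe(const void *buf, unsigned long len)
{
	const unsigned char *p = buf;

	if (len < 4)
		return -1;
	return (p[0] == 0x7f && p[1] == 'E' && p[2] == 'L' && p[3] == 'F')
		? 0 : -1;
}

static struct file_ops elf_ops = { .probe = elf_probe };
static struct file_ops *loaders[] = { &elf_ops };

/* Mirror of the dispatch loop: the first loader whose probe returns 0
 * wins; NULL entries and entries without a probe hook are skipped. */
static struct file_ops *probe_image(const void *buf, unsigned long len)
{
	size_t i;

	for (i = 0; i < sizeof(loaders) / sizeof(loaders[0]); i++) {
		if (loaders[i] && loaders[i]->probe &&
		    loaders[i]->probe(buf, len) == 0)
			return loaders[i];
	}
	return NULL;
}
```

In the patch the table is still empty (`kexec_file_loaders[] = { }`); patch 8/9 populates it with the ELF loader.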
[PATCH v3 6/9] powerpc: Add functions to read ELF files of any endianness.
A little endian kernel might need to kexec a big endian kernel (the opposite is less likely but could happen as well), so we can't just cast the buffer with the binary to ELF structs and use them as is done elsewhere. This patch adds functions which do byte-swapping as necessary when populating the ELF structs. These functions will be used in the next patch in the series.

Signed-off-by: Thiago Jung Bauermann
Cc: ke...@lists.infradead.org
Cc: linux-ker...@vger.kernel.org
---
 arch/powerpc/include/asm/elf_util.h |  19 ++
 arch/powerpc/kernel/Makefile        |   2 +-
 arch/powerpc/kernel/elf_util.c      | 476
 3 files changed, 496 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/elf_util.h b/arch/powerpc/include/asm/elf_util.h
index a012ba03282d..47d15515ba33 100644
--- a/arch/powerpc/include/asm/elf_util.h
+++ b/arch/powerpc/include/asm/elf_util.h
@@ -20,6 +20,14 @@
 #include

 struct elf_info {
+       /*
+        * Where the ELF binary contents are kept.
+        * Memory managed by the user of the struct.
+        */
+       const char *buffer;
+
+       const struct elfhdr *ehdr;
+       const struct elf_phdr *proghdrs;
        struct elf_shdr *sechdrs;

        /* Index of stubs section. */
@@ -63,6 +71,17 @@ static inline unsigned long my_r2(const struct elf_info *elf_info)
        return elf_info->sechdrs[elf_info->toc_section].sh_addr + 0x8000;
 }

+static inline bool elf_is_elf_file(const struct elfhdr *ehdr)
+{
+       return memcmp(ehdr->e_ident, ELFMAG, SELFMAG) == 0;
+}
+
+int elf_read_from_buffer(const char *buf, size_t len, struct elfhdr *ehdr,
+                        struct elf_info *elf_info);
+void elf_init_elf_info(const struct elfhdr *ehdr, struct elf_shdr *sechdrs,
+                      struct elf_info *elf_info);
+void elf_free_info(struct elf_info *elf_info);
+
 int elf64_apply_relocate_add(const struct elf_info *elf_info,
                             const char *strtab, const Elf64_Rela *rela,
                             unsigned int num_rela, void *syms_base,
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index e99f626acc85..8a53fccaa053 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -125,7 +125,7 @@ obj-y += iomap.o
 endif

 ifeq ($(CONFIG_MODULES)$(CONFIG_WORD_SIZE),y64)
-obj-y                          += elf_util_64.o
+obj-y                          += elf_util.o elf_util_64.o
 endif

 obj64-$(CONFIG_PPC_TRANSACTIONAL_MEM) += tm.o
diff --git a/arch/powerpc/kernel/elf_util.c b/arch/powerpc/kernel/elf_util.c
new file mode 100644
index ..1df4a116ad90
--- /dev/null
+++ b/arch/powerpc/kernel/elf_util.c
@@ -0,0 +1,476 @@
+/*
+ * Utility functions to work with ELF files.
+ *
+ * Copyright (C) 2016, IBM Corporation
+ *
+ * Based on kexec-tools' kexec-elf.c. Heavily modified for the
+ * kernel by Thiago Jung Bauermann .
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation (version 2 of the License).
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#define pr_fmt(fmt)    KBUILD_MODNAME ": " fmt
+
+#include
+#include
+#include
+
+#if ELF_CLASS == ELFCLASS32
+#define elf_addr_to_cpu        elf32_to_cpu
+
+#ifndef Elf_Rel
+#define Elf_Rel        Elf32_Rel
+#endif /* Elf_Rel */
+#else /* ELF_CLASS == ELFCLASS32 */
+#define elf_addr_to_cpu        elf64_to_cpu
+
+#ifndef Elf_Rel
+#define Elf_Rel        Elf64_Rel
+#endif /* Elf_Rel */
+
+static uint64_t elf64_to_cpu(const struct elfhdr *ehdr, uint64_t value)
+{
+       if (ehdr->e_ident[EI_DATA] == ELFDATA2LSB)
+               value = le64_to_cpu(value);
+       else if (ehdr->e_ident[EI_DATA] == ELFDATA2MSB)
+               value = be64_to_cpu(value);
+
+       return value;
+}
+#endif /* ELF_CLASS == ELFCLASS32 */
+
+static uint16_t elf16_to_cpu(const struct elfhdr *ehdr, uint16_t value)
+{
+       if (ehdr->e_ident[EI_DATA] == ELFDATA2LSB)
+               value = le16_to_cpu(value);
+       else if (ehdr->e_ident[EI_DATA] == ELFDATA2MSB)
+               value = be16_to_cpu(value);
+
+       return value;
+}
+
+static uint32_t elf32_to_cpu(const struct elfhdr *ehdr, uint32_t value)
+{
+       if (ehdr->e_ident[EI_DATA] == ELFDATA2LSB)
+               value = le32_to_cpu(value);
+       else if (ehdr->e_ident[EI_DATA] == ELFDATA2MSB)
+               value = be32_to_cpu(value);
+
+       return value;
+}
+
+/**
+ * elf_is_ehdr_sane - check that it is safe to use the ELF header
+ * @buf_len:   size of the buffer in which the ELF file is loaded.
+ */
+static bool
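The helpers above dispatch on the file's e_ident[EI_DATA] byte, not on the host's byte order. A standalone sketch of the same idea follows; note that the kernel code swaps an already-loaded integer with le16_to_cpu/be16_to_cpu, while this sketch reads the two bytes directly, which is equivalent and independent of host endianness (all names here are illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Same numeric values as the ELF spec's ELFDATA2LSB/ELFDATA2MSB. */
#define MY_ELFDATA2LSB 1
#define MY_ELFDATA2MSB 2

/* Read a 16-bit field from the file image given its declared byte
 * order. Byte-by-byte arithmetic sidesteps host endianness entirely. */
static uint16_t elf16_to_cpu_sketch(int ei_data, const unsigned char *p)
{
	if (ei_data == MY_ELFDATA2LSB)
		return (uint16_t)(p[0] | (p[1] << 8));
	else if (ei_data == MY_ELFDATA2MSB)
		return (uint16_t)((p[0] << 8) | p[1]);
	return 0;	/* invalid data encoding */
}
```

The same two bytes decode to different values depending on the declared encoding, which is exactly why a little endian kernel can still parse a big endian vmlinux.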
[PATCH v3 5/9] powerpc: Generalize elf64_apply_relocate_add.
When apply_relocate_add is called, modules are already loaded at their final location in memory so Elf64_Shdr.sh_addr can be used for accessing the section contents as well as the base address for relocations.

This is not the case for kexec's purgatory, because it will only be copied to its final location right before being executed. Therefore, it needs to be relocated while it is still in a temporary buffer. In this case, Elf64_Shdr.sh_addr can't be used to access the sections' contents.

This patch allows elf64_apply_relocate_add to be used when the ELF binary is not yet at its final location by adding an addr_base argument to specify the address at which the section will be loaded, and rela, loc_base and syms_base to point to the sections' contents.

Signed-off-by: Thiago Jung Bauermann
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: Torsten Duwe
Cc: ke...@lists.infradead.org
Cc: linux-ker...@vger.kernel.org
---
 arch/powerpc/include/asm/elf_util.h |  6 ++--
 arch/powerpc/kernel/elf_util_64.c   | 63 +
 arch/powerpc/kernel/module_64.c     | 17 --
 3 files changed, 61 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/include/asm/elf_util.h b/arch/powerpc/include/asm/elf_util.h
index 37372559fe62..a012ba03282d 100644
--- a/arch/powerpc/include/asm/elf_util.h
+++ b/arch/powerpc/include/asm/elf_util.h
@@ -64,7 +64,9 @@ static inline unsigned long my_r2(const struct elf_info *elf_info)
 }

 int elf64_apply_relocate_add(const struct elf_info *elf_info,
-                            const char *strtab, unsigned int symindex,
-                            unsigned int relsec, const char *obj_name);
+                            const char *strtab, const Elf64_Rela *rela,
+                            unsigned int num_rela, void *syms_base,
+                            void *loc_base, Elf64_Addr addr_base,
+                            const char *obj_name);

 #endif /* _ASM_POWERPC_ELF_UTIL_H */
diff --git a/arch/powerpc/kernel/elf_util_64.c b/arch/powerpc/kernel/elf_util_64.c
index decad2c34f38..8e5d400ac9f2 100644
--- a/arch/powerpc/kernel/elf_util_64.c
+++ b/arch/powerpc/kernel/elf_util_64.c
@@ -69,33 +69,56 @@ static void squash_toc_save_inst(const char *name, unsigned long addr) { }
  * elf64_apply_relocate_add - apply 64 bit RELA relocations
  * @elf_info:  Support information for the ELF binary being relocated.
  * @strtab:    String table for the associated symbol table.
- * @symindex:  Section header index for the associated symbol table.
- * @relsec:    Section header index for the relocations to apply.
+ * @rela:      Contents of the section with the relocations to apply.
+ * @num_rela:  Number of relocation entries in the section.
+ * @syms_base: Contents of the associated symbol table.
+ * @loc_base:  Contents of the section to which relocations apply.
+ * @addr_base: The address where the section will be loaded in memory.
  * @obj_name:  The name of the ELF binary, for information messages.
+ *
+ * Applies RELA relocations to an ELF file already at its final location
+ * in memory (in which case loc_base == addr_base), or still in a temporary
+ * buffer.
  */
 int elf64_apply_relocate_add(const struct elf_info *elf_info,
-                            const char *strtab, unsigned int symindex,
-                            unsigned int relsec, const char *obj_name)
+                            const char *strtab, const Elf64_Rela *rela,
+                            unsigned int num_rela, void *syms_base,
+                            void *loc_base, Elf64_Addr addr_base,
+                            const char *obj_name)
 {
        unsigned int i;
-       Elf64_Shdr *sechdrs = elf_info->sechdrs;
-       Elf64_Rela *rela = (void *)sechdrs[relsec].sh_addr;
-       Elf64_Sym *sym;
        unsigned long *location;
+       unsigned long address;
        unsigned long value;
+       const char *name;
+       Elf64_Sym *sym;
+
+       for (i = 0; i < num_rela; i++) {
+               /*
+                * rels[i].r_offset contains the byte offset from the beginning
+                * of section to the storage unit affected.
+                *
+                * This is the location to update in the temporary buffer where
+                * the section is currently loaded. The section will finally
+                * be loaded to a different address later, pointed to by
+                * addr_base.
+                */
+               location = loc_base + rela[i].r_offset;
+
+               /* Final address of the location. */
+               address = addr_base + rela[i].r_offset;
+
+               /* This is the symbol the relocation is referring to. */
+               sym = (Elf64_Sym *) syms_base + ELF64_R_SYM(rela[i].r_info);

-       for
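The heart of this patch is the loc_base/addr_base split: stores go through loc_base (the temporary buffer holding the section), while any address the relocation arithmetic needs comes from addr_base (where the section will eventually be loaded). A minimal sketch with a generic 64-bit absolute relocation; the struct and function names are made up for illustration and the `address` value is computed only to show the distinction (an absolute relocation doesn't need it, a PC-relative one would):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative subset of Elf64_Rela. */
struct rela_sketch {
	uint64_t r_offset;	/* offset of the slot within the section */
	int64_t  r_addend;
};

/* Patch sym_value + addend into the section image held in a temporary
 * buffer (loc_base), as if the section were loaded at addr_base.
 * Returns the final address of the patched slot. */
static uint64_t apply_abs64(void *loc_base, uint64_t addr_base,
			    const struct rela_sketch *r, uint64_t sym_value)
{
	uint64_t value = sym_value + (uint64_t)r->r_addend;
	/* Final address of the slot: uses addr_base, NOT loc_base. */
	uint64_t address = addr_base + r->r_offset;

	/* The write itself goes into the temporary buffer. */
	memcpy((char *)loc_base + r->r_offset, &value, sizeof(value));
	return address;
}
```

When the binary is already in place (the module case), loc_base and addr_base coincide and this degenerates to the old behavior.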
[PATCH v3 3/9] kexec_file: Factor out kexec_locate_mem_hole from kexec_add_buffer.
kexec_locate_mem_hole will be used by the PowerPC kexec_file_load implementation to find free memory for the purgatory stack.

Signed-off-by: Thiago Jung Bauermann
Cc: Eric Biederman
Cc: Dave Young
Cc: ke...@lists.infradead.org
Cc: linux-ker...@vger.kernel.org
---
 include/linux/kexec.h |  4
 kernel/kexec_file.c   | 66 ++-
 2 files changed, 53 insertions(+), 17 deletions(-)

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 3d91bcfc180d..4ca6f5f95d66 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -227,6 +227,10 @@ extern asmlinkage long sys_kexec_load(unsigned long entry,
                                        struct kexec_segment __user *segments,
                                        unsigned long flags);
 extern int kernel_kexec(void);
+int kexec_locate_mem_hole(struct kimage *image, unsigned long size,
+                         unsigned long align, unsigned long min_addr,
+                         unsigned long max_addr, bool top_down,
+                         unsigned long *addr);
 extern int kexec_add_buffer(struct kimage *image, char *buffer,
                            unsigned long bufsz, unsigned long memsz,
                            unsigned long buf_align, unsigned long buf_min,
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index b1f1f6402518..85a515511925 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -449,6 +449,46 @@ int __weak arch_kexec_walk_mem(struct kexec_buf *kbuf,
        return walk_system_ram_res(0, ULONG_MAX, kbuf, func);
 }

+/**
+ * kexec_locate_mem_hole - find free memory to load segment or use in purgatory
+ * @image:     kexec image being updated.
+ * @size:      Memory size.
+ * @align:     Minimum alignment needed.
+ * @min_addr:  Minimum starting address.
+ * @max_addr:  Maximum end address.
+ * @top_down:  Find the highest free memory region?
+ * @addr:      On success, will have start address of the memory region found.
+ *
+ * Return: 0 on success, negative errno on error.
+ */
+int kexec_locate_mem_hole(struct kimage *image, unsigned long size,
+                         unsigned long align, unsigned long min_addr,
+                         unsigned long max_addr, bool top_down,
+                         unsigned long *addr)
+{
+       int ret;
+       struct kexec_buf buf;
+
+       memset(&buf, 0, sizeof(struct kexec_buf));
+       buf.image = image;
+
+       buf.memsz = size;
+       buf.buf_align = align;
+       buf.buf_min = min_addr;
+       buf.buf_max = max_addr;
+       buf.top_down = top_down;
+
+       ret = arch_kexec_walk_mem(&buf, locate_mem_hole_callback);
+       if (ret != 1) {
+               /* A suitable memory range could not be found for buffer */
+               return -EADDRNOTAVAIL;
+       }
+
+       *addr = buf.mem;
+
+       return 0;
+}
+
 /*
  * Helper function for placing a buffer in a kexec segment. This assumes
  * that kexec_mutex is held.
@@ -460,8 +500,8 @@ int kexec_add_buffer(struct kimage *image, char *buffer, unsigned long bufsz,
 {

        struct kexec_segment *ksegment;
-       struct kexec_buf buf, *kbuf;
        int ret;
+       unsigned long addr, align, size;

        /* Currently adding segment this way is allowed only in file mode */
        if (!image->file_mode)
@@ -482,29 +522,21 @@ int kexec_add_buffer(struct kimage *image, char *buffer, unsigned long bufsz,
                return -EINVAL;
        }

-       memset(&buf, 0, sizeof(struct kexec_buf));
-       kbuf = &buf;
-       kbuf->image = image;
-
-       kbuf->memsz = ALIGN(memsz, PAGE_SIZE);
-       kbuf->buf_align = max(buf_align, PAGE_SIZE);
-       kbuf->buf_min = buf_min;
-       kbuf->buf_max = buf_max;
-       kbuf->top_down = top_down;
+       size = ALIGN(memsz, PAGE_SIZE);
+       align = max(buf_align, PAGE_SIZE);

        /* Walk the RAM ranges and allocate a suitable range for the buffer */
-       ret = arch_kexec_walk_mem(kbuf, locate_mem_hole_callback);
-       if (ret != 1) {
-               /* A suitable memory range could not be found for buffer */
-               return -EADDRNOTAVAIL;
-       }
+       ret = kexec_locate_mem_hole(image, size, align, buf_min, buf_max,
+                                   top_down, &addr);
+       if (ret)
+               return ret;

        /* Found a suitable memory range */
        ksegment = &image->segment[image->nr_segments];
        ksegment->kbuf = buffer;
        ksegment->bufsz = bufsz;
-       ksegment->mem = kbuf->mem;
-       ksegment->memsz = kbuf->memsz;
+       ksegment->mem = addr;
+       ksegment->memsz = size;
        image->nr_segments++;
        *load_addr = ksegment->mem;
        return 0;
--
1.9.1
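Before searching for a hole, kexec_add_buffer normalizes the request: the segment size is rounded up to a whole page and the alignment is raised to at least a page. A small self-contained sketch of that normalization; PAGE_SIZE_SKETCH and the function names are illustrative stand-ins, and align_up assumes a power-of-two alignment just like the kernel's ALIGN():

```c
#include <assert.h>

#define PAGE_SIZE_SKETCH 4096UL

/* Round x up to a power-of-two alignment a, as the kernel's ALIGN() does. */
static unsigned long align_up(unsigned long x, unsigned long a)
{
	return (x + a - 1) & ~(a - 1);
}

/* The two normalizations applied before the memory walk: page-round the
 * reserved size, and never align to less than a page. */
static void normalize_segment(unsigned long memsz, unsigned long buf_align,
			      unsigned long *size, unsigned long *align)
{
	*size = align_up(memsz, PAGE_SIZE_SKETCH);
	*align = buf_align > PAGE_SIZE_SKETCH ? buf_align : PAGE_SIZE_SKETCH;
}
```

This is why ksegment->memsz can exceed ksegment->bufsz: the reservation is page-granular even when the buffer isn't.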
[PATCH v3 4/9] powerpc: Factor out relocation code from module_64.c to elf_util_64.c.
The kexec_file_load system call needs to relocate the purgatory, so factor out the module relocation code so that it can be shared. This patch's purpose is to move the ELF relocation logic from apply_relocate_add to elf_util_64.c with as few changes as possible. The following changes were needed:

To avoid having module-specific code in a general purpose utility function, struct elf_info was created to contain the information needed for ELF binaries manipulation. my_r2, stub_for_addr and create_stub were changed to use it instead of having to receive a struct module, since they are called from elf64_apply_relocate_add. local_entry_offset and squash_toc_save_inst were only used by apply_relocate_add, so they were moved to elf_util_64.c as well.

Signed-off-by: Thiago Jung Bauermann
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: Torsten Duwe
Cc: ke...@lists.infradead.org
Cc: linux-ker...@vger.kernel.org
---
 arch/powerpc/include/asm/elf_util.h |  70
 arch/powerpc/include/asm/module.h   |  14 +-
 arch/powerpc/kernel/Makefile        |   4 +
 arch/powerpc/kernel/elf_util_64.c   | 269 +++
 arch/powerpc/kernel/module_64.c     | 312
 5 files changed, 386 insertions(+), 283 deletions(-)

diff --git a/arch/powerpc/include/asm/elf_util.h b/arch/powerpc/include/asm/elf_util.h
new file mode 100644
index ..37372559fe62
--- /dev/null
+++ b/arch/powerpc/include/asm/elf_util.h
@@ -0,0 +1,70 @@
+/*
+ * Utility functions to work with ELF files.
+ *
+ * Copyright (C) 2016, IBM Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef _ASM_POWERPC_ELF_UTIL_H
+#define _ASM_POWERPC_ELF_UTIL_H
+
+#include
+
+struct elf_info {
+       struct elf_shdr *sechdrs;
+
+       /* Index of stubs section. */
+       unsigned int stubs_section;
+       /* Index of TOC section. */
+       unsigned int toc_section;
+};
+
+#ifdef __powerpc64__
+#ifdef PPC64_ELF_ABI_v2
+
+/* An address is simply the address of the function. */
+typedef unsigned long func_desc_t;
+#else
+
+/* An address is address of the OPD entry, which contains address of fn. */
+typedef struct ppc64_opd_entry func_desc_t;
+#endif /* PPC64_ELF_ABI_v2 */
+
+/* Like PPC32, we need little trampolines to do > 24-bit jumps (into
+   the kernel itself). But on PPC64, these need to be used for every
+   jump, actually, to reset r2 (TOC+0x8000). */
+struct ppc64_stub_entry
+{
+       /* 28 byte jump instruction sequence (7 instructions). We only
+        * need 6 instructions on ABIv2 but we always allocate 7 so
+        * so we don't have to modify the trampoline load instruction. */
+       u32 jump[7];
+       /* Used by ftrace to identify stubs */
+       u32 magic;
+       /* Data for the above code */
+       func_desc_t funcdata;
+};
+#endif
+
+/* r2 is the TOC pointer: it actually points 0x8000 into the TOC (this
+   gives the value maximum span in an instruction which uses a signed
+   offset) */
+static inline unsigned long my_r2(const struct elf_info *elf_info)
+{
+       return elf_info->sechdrs[elf_info->toc_section].sh_addr + 0x8000;
+}
+
+int elf64_apply_relocate_add(const struct elf_info *elf_info,
+                            const char *strtab, unsigned int symindex,
+                            unsigned int relsec, const char *obj_name);
+
+#endif /* _ASM_POWERPC_ELF_UTIL_H */
diff --git a/arch/powerpc/include/asm/module.h b/arch/powerpc/include/asm/module.h
index cd4ffd86765f..f2073115d518 100644
--- a/arch/powerpc/include/asm/module.h
+++ b/arch/powerpc/include/asm/module.h
@@ -12,7 +12,14 @@
 #include
 #include
 #include
+#include

+/* Both low and high 16 bits are added as SIGNED additions, so if low
+   16 bits has high bit set, high 16 bits must be adjusted. These
+   macros do that (stolen from binutils). */
+#define PPC_LO(v) ((v) & 0xffff)
+#define PPC_HI(v) (((v) >> 16) & 0xffff)
+#define PPC_HA(v) PPC_HI ((v) + 0x8000)

 #ifndef __powerpc64__
 /*
@@ -33,8 +40,7 @@ struct ppc_plt_entry {

 struct mod_arch_specific {
 #ifdef __powerpc64__
-       unsigned int stubs_section;     /* Index of stubs section in module */
-       unsigned int toc_section;       /* What section is the TOC? */
+       struct elf_info elf_info;
        bool toc_fixed;                 /* Have we fixed up .TOC.? */
 #ifdef CONFIG_DYNAMIC_FTRACE
        unsigned long toc;
@@ -90,6 +96,10 @@
[PATCH v3 2/9] kexec_file: Generalize kexec_add_buffer.
Allow architectures to specify different memory walking functions for kexec_add_buffer. Intel uses iomem to track reserved memory ranges, but PowerPC uses the memblock subsystem.

Signed-off-by: Thiago Jung Bauermann
Cc: Eric Biederman
Cc: Dave Young
Cc: ke...@lists.infradead.org
Cc: linux-ker...@vger.kernel.org
---
 include/linux/kexec.h   | 19 ++-
 kernel/kexec_file.c     | 30 ++
 kernel/kexec_internal.h | 14 --
 3 files changed, 40 insertions(+), 23 deletions(-)

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index e8acb2b43dd9..3d91bcfc180d 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -146,7 +146,24 @@ struct kexec_file_ops {
        kexec_verify_sig_t *verify_sig;
 #endif
 };
-#endif
+
+/*
+ * Keeps track of buffer parameters as provided by caller for requesting
+ * memory placement of buffer.
+ */
+struct kexec_buf {
+       struct kimage *image;
+       unsigned long mem;
+       unsigned long memsz;
+       unsigned long buf_align;
+       unsigned long buf_min;
+       unsigned long buf_max;
+       bool top_down;          /* allocate from top of memory hole */
+};
+
+int __weak arch_kexec_walk_mem(struct kexec_buf *kbuf,
+                              int (*func)(u64, u64, void *));
+#endif /* CONFIG_KEXEC_FILE */

 struct kimage {
        kimage_entry_t head;
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index b6eec7527e9f..b1f1f6402518 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -428,6 +428,27 @@ static int locate_mem_hole_callback(u64 start, u64 end, void *arg)
        return locate_mem_hole_bottom_up(start, end, kbuf);
 }

+/**
+ * arch_kexec_walk_mem - call func(data) on free memory regions
+ * @kbuf:      Context info for the search. Also passed to @func.
+ * @func:      Function to call for each memory region.
+ *
+ * Return: The memory walk will stop when func returns a non-zero value
+ * and that value will be returned. If all free regions are visited without
+ * func returning non-zero, then zero will be returned.
+ */
+int __weak arch_kexec_walk_mem(struct kexec_buf *kbuf,
+                              int (*func)(u64, u64, void *))
+{
+       if (kbuf->image->type == KEXEC_TYPE_CRASH)
+               return walk_iomem_res_desc(crashk_res.desc,
+                                          IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY,
+                                          crashk_res.start, crashk_res.end,
+                                          kbuf, func);
+       else
+               return walk_system_ram_res(0, ULONG_MAX, kbuf, func);
+}
+
 /*
  * Helper function for placing a buffer in a kexec segment. This assumes
  * that kexec_mutex is held.
@@ -472,14 +493,7 @@ int kexec_add_buffer(struct kimage *image, char *buffer, unsigned long bufsz,
        kbuf->top_down = top_down;

        /* Walk the RAM ranges and allocate a suitable range for the buffer */
-       if (image->type == KEXEC_TYPE_CRASH)
-               ret = walk_iomem_res_desc(crashk_res.desc,
-                               IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY,
-                               crashk_res.start, crashk_res.end, kbuf,
-                               locate_mem_hole_callback);
-       else
-               ret = walk_system_ram_res(0, -1, kbuf,
-                                         locate_mem_hole_callback);
+       ret = arch_kexec_walk_mem(kbuf, locate_mem_hole_callback);
        if (ret != 1) {
                /* A suitable memory range could not be found for buffer */
                return -EADDRNOTAVAIL;
        }
diff --git a/kernel/kexec_internal.h b/kernel/kexec_internal.h
index eefd5bf960c2..4cef7e4706b0 100644
--- a/kernel/kexec_internal.h
+++ b/kernel/kexec_internal.h
@@ -20,20 +20,6 @@ struct kexec_sha_region {
        unsigned long len;
 };

-/*
- * Keeps track of buffer parameters as provided by caller for requesting
- * memory placement of buffer.
- */
-struct kexec_buf {
-       struct kimage *image;
-       unsigned long mem;
-       unsigned long memsz;
-       unsigned long buf_align;
-       unsigned long buf_min;
-       unsigned long buf_max;
-       bool top_down;          /* allocate from top of memory hole */
-};
-
 void kimage_file_post_load_cleanup(struct kimage *image);
 #else /* CONFIG_KEXEC_FILE */
 static inline void kimage_file_post_load_cleanup(struct kimage *image) { }
--
1.9.1
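The kernel-doc above pins down the walk contract: func is called once per free region, the walk stops at the first non-zero return and propagates that value, and zero means every region was visited. A self-contained sketch of that contract over a static range table; every name here is an illustrative stand-in for the resource-walking helpers the kernel actually uses:

```c
#include <assert.h>

/* Illustrative free-memory ranges; in the kernel these come from
 * walk_system_ram_res() or the crash kernel resource. */
struct range_sketch { unsigned long long start, end; };

/* Mirror of the documented contract: call func on each region, stop
 * early on the first non-zero return and propagate it, else return 0. */
static int walk_ranges(const struct range_sketch *r, int n,
		       int (*func)(unsigned long long, unsigned long long, void *),
		       void *data)
{
	int i, ret;

	for (i = 0; i < n; i++) {
		ret = func(r[i].start, r[i].end, data);
		if (ret)
			return ret;
	}
	return 0;
}

/* Sample callback: return 1 (stop) on the first region large enough. */
static int find_big_enough(unsigned long long start, unsigned long long end,
			   void *data)
{
	unsigned long long need = *(unsigned long long *)data;
	return (end - start + 1 >= need) ? 1 : 0;
}
```

This is exactly why kexec_add_buffer treats a return of 1 from the walk as success: locate_mem_hole_callback returns 1 once it finds a hole, which short-circuits the walk.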
[PATCH v3 1/9] kexec_file: Remove unused members from struct kexec_buf.
kexec_add_buffer uses kexec_buf.buffer and kexec_buf.bufsz to pass along its own arguments buffer and bufsz, but since they aren't used anywhere else, it's pointless.

Signed-off-by: Thiago Jung Bauermann
Cc: Eric Biederman
Cc: ke...@lists.infradead.org
Cc: linux-ker...@vger.kernel.org
Acked-by: Dave Young
---
 kernel/kexec_file.c     | 6 ++
 kernel/kexec_internal.h | 2 --
 2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 01ab82a40d22..b6eec7527e9f 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -464,8 +464,6 @@ int kexec_add_buffer(struct kimage *image, char *buffer, unsigned long bufsz,
        memset(&buf, 0, sizeof(struct kexec_buf));
        kbuf = &buf;
        kbuf->image = image;
-       kbuf->buffer = buffer;
-       kbuf->bufsz = bufsz;

        kbuf->memsz = ALIGN(memsz, PAGE_SIZE);
        kbuf->buf_align = max(buf_align, PAGE_SIZE);
@@ -489,8 +487,8 @@ int kexec_add_buffer(struct kimage *image, char *buffer, unsigned long bufsz,

        /* Found a suitable memory range */
        ksegment = &image->segment[image->nr_segments];
-       ksegment->kbuf = kbuf->buffer;
-       ksegment->bufsz = kbuf->bufsz;
+       ksegment->kbuf = buffer;
+       ksegment->bufsz = bufsz;
        ksegment->mem = kbuf->mem;
        ksegment->memsz = kbuf->memsz;
        image->nr_segments++;
diff --git a/kernel/kexec_internal.h b/kernel/kexec_internal.h
index 0a52315d9c62..eefd5bf960c2 100644
--- a/kernel/kexec_internal.h
+++ b/kernel/kexec_internal.h
@@ -26,8 +26,6 @@ struct kexec_sha_region {
  */
 struct kexec_buf {
        struct kimage *image;
-       char *buffer;
-       unsigned long bufsz;
        unsigned long mem;
        unsigned long memsz;
        unsigned long buf_align;
--
1.9.1
[PATCH v3 0/9] kexec_file_load implementation for PowerPC
Hello,

This patch series implements the kexec_file_load system call on PowerPC. This system call moves the reading of the kernel, initrd and the device tree from the userspace kexec tool to the kernel. This is needed if you want to do one or both of the following:

1. only allow loading of signed kernels.
2. "measure" (i.e., record the hashes of) the kernel, initrd, kernel command line and other boot inputs for the Integrity Measurement Architecture subsystem.

The above are the functions kexec_file_load already has built in. Yesterday I posted a set of patches which allows a third feature:

3. have IMA pass on its event log (where integrity measurements are registered) across kexec to the second kernel, so that the event history is preserved.

Because OpenPOWER uses an intermediary Linux instance as a boot loader (skiroot), feature 1 is needed to implement secure boot for the platform, while features 2 and 3 are needed to implement trusted boot.

This patch series starts by removing an x86 assumption from kexec_file: kexec_add_buffer uses iomem to find reserved memory ranges, but PowerPC uses the memblock subsystem. A hook is added so that each arch can specify how memory ranges can be found. Also, the memory-walking logic in kexec_add_buffer is useful in this implementation to find a free area for the purgatory's stack, so the next patch moves that logic to kexec_locate_mem_hole.

The kexec_file_load system call needs to apply relocations to the purgatory, but adding code for that would duplicate functionality with the module loading mechanism, which also needs to apply relocations to kernel modules. Therefore, this patch series factors out the module relocation code so that it can be shared.

One thing that is still missing is crashkernel support, which I intend to submit shortly. For now, arch_kexec_kernel_image_probe rejects crash kernels.

This code is based on kexec-tools, but with many modifications to adapt it to the kernel environment and facilities.
The exception is the purgatory, which has only minimal changes.

Changes for v3:
- Rebased series on today's powerpc/next.
- Patch "kexec_file: Generalize kexec_add_buffer.":
  - Removed most arguments from arch_kexec_walk_mem and pass kbuf explicitly.
- Patch "powerpc: Add functions to read ELF files of any endianness.":
  - Fixed whitespace issues found by checkpatch.pl.
- Patch "powerpc: Factor out relocation code from module_64.c to elf_util_64.c.":
  - Changed to use the new PPC64_ELF_ABI_v2 macro.
- Patch "powerpc: Add support for loading ELF kernels with kexec_file_load.":
  - Adapted arch_kexec_walk_mem implementation to changes in its argument list.
  - Fixed whitespace and GPL header issues found by checkpatch.pl.
- Patch "powerpc: Add purgatory for kexec_file_load implementation.":
  - Fixed whitespace and GPL header issues found by checkpatch.pl.
  - Changed to use the new PPC64_ELF_ABI_v2 macro.

Changes for v2:
- All patches: forgot to add Signed-off-by lines in v1, so added them now.
- Patch "kexec_file: Generalize kexec_add_buffer.": broke in two, one adding arch_kexec_walk_mem and the other adding kexec_locate_mem_hole.
- Patch "powerpc: Implement kexec_file_load.":
  - Moved relocation changes and the arch_kexec_walk_mem implementation to the next patch in the series.
  - Removed pr_fmt from machine_kexec_64.c, since the patch doesn't add any call to pr_debug in that file.
  - Changed arch_kexec_kernel_image_probe to reject crash kernels.
Re: [PATCH 1/2] workqueue: Move wq_update_unbound_numa() to the beginning of CPU_ONLINE
On Tue, Jun 21, 2016 at 03:43:56PM -0400, Tejun Heo wrote: > On Tue, Jun 21, 2016 at 09:37:09PM +0200, Peter Zijlstra wrote: > > Hurm.. So I've applied it, just to get this issue sorted, but I'm not > > entirely sure I like it. > > > > I think I prefer ego's version because that makes it harder to get stuff > > to run on !active,online cpus. I think we really want to be careful what > > gets to run during that state. > > The original patch just did set_cpus_allowed one more time late enough > so that the target kthread (in most cases) doesn't have to go through > fallback rq selection afterwards. I don't know what the long term > solution is but CPU_ONLINE callbacks should be able to bind kthreads > to the new CPU one way or the other. Fair enough; clearly I need to stare harder. In any case, patch is on its way to sched/urgent. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 1/2] workqueue: Move wq_update_unbound_numa() to the beginning of CPU_ONLINE
On Tue, Jun 21, 2016 at 09:37:09PM +0200, Peter Zijlstra wrote: > Hurm.. So I've applied it, just to get this issue sorted, but I'm not > entirely sure I like it. > > I think I prefer ego's version because that makes it harder to get stuff > to run on !active,online cpus. I think we really want to be careful what > gets to run during that state. The original patch just did set_cpus_allowed one more time late enough so that the target kthread (in most cases) doesn't have to go through fallback rq selection afterwards. I don't know what the long term solution is but CPU_ONLINE callbacks should be able to bind kthreads to the new CPU one way or the other. Thanks. -- tejun ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 1/2] workqueue: Move wq_update_unbound_numa() to the beginning of CPU_ONLINE
On Tue, Jun 21, 2016 at 11:36:51AM -0400, Tejun Heo wrote: > On Tue, Jun 21, 2016 at 07:42:31PM +0530, Gautham R Shenoy wrote: > > > Subject: [PATCH] sched: allow kthreads to fallback to online && !active > > > cpus > > > > > > During CPU hotplug, CPU_ONLINE callbacks are run while the CPU is > > > online but not active. A CPU_ONLINE callback may create or bind a > > > kthread so that its cpus_allowed mask only allows the CPU which is > > > being brought online. The kthread may start executing before the CPU > > > is made active and can end up in select_fallback_rq(). > > > > > > In such cases, the expected behavior is selecting the CPU which is > > > coming online; however, because select_fallback_rq() only chooses from > > > active CPUs, it determines that the task doesn't have any viable CPU > > > in its allowed mask and ends up overriding it to cpu_possible_mask. > > > > > > CPU_ONLINE callbacks should be able to put kthreads on the CPU which > > > is coming online. Update select_fallback_rq() so that it follows > > > cpu_online() rather than cpu_active() for kthreads. > > > > > > Signed-off-by: Tejun Heo> > > Reported-by: Gautham R Shenoy > > > > Hi Tejun, > > > > This patch fixes the issue on POWER. I am able to see the worker > > threads of the unbound workqueues of the newly onlined node with this. > > > > Tested-by: Gautham R. Shenoy > > Peter? Hurm.. So I've applied it, just to get this issue sorted, but I'm not entirely sure I like it. I think I prefer ego's version because that makes it harder to get stuff to run on !active,online cpus. I think we really want to be careful what gets to run during that state. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
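The one-line semantic change in the patch quoted above is which mask gates the fallback choice: kthreads may fall back to online CPUs, while everything else stays restricted to active CPUs. A standalone sketch of that selection rule, with CPU masks as plain bitmasks — all names here are hypothetical, not the scheduler's API:

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit i set => CPU i is a member of the mask. */
typedef uint32_t cpumask_t;

/* Pick the first allowed CPU that is usable for this task.
 * Kthreads may run on online (even not-yet-active) CPUs; user
 * tasks must stick to active CPUs. */
static int fallback_cpu(cpumask_t allowed, cpumask_t online,
                        cpumask_t active, bool is_kthread)
{
    cpumask_t usable = allowed & (is_kthread ? online : active);

    if (!usable)
        return -1;   /* caller would widen to cpu_possible_mask */

    for (int cpu = 0; cpu < 32; cpu++)
        if (usable & (1u << cpu))
            return cpu;
    return -1;
}
```

With CPU 2 online but not yet active, a kthread bound to CPU 2 gets CPU 2, while a user task bound the same way would fall through to the "override to cpu_possible_mask" path — exactly the scenario in the changelog.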
Re: [PATCH 7/8] dmaengine: tegra20-apb-dma: Only calculate residue if txstate exists.
On 21/06/16 17:01, Vinod Koul wrote: > On Wed, Jun 08, 2016 at 09:51:57AM +0100, Jon Hunter wrote: >> Hi Peter, >> >> On 07/06/16 18:38, Peter Griffin wrote: >>> There is no point calculating the residue if there is >>> no txstate to store the value. >>> >>> Signed-off-by: Peter Griffin>>> --- >>> drivers/dma/tegra20-apb-dma.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/drivers/dma/tegra20-apb-dma.c b/drivers/dma/tegra20-apb-dma.c >>> index 01e316f..7f4af8c 100644 >>> --- a/drivers/dma/tegra20-apb-dma.c >>> +++ b/drivers/dma/tegra20-apb-dma.c >>> @@ -814,7 +814,7 @@ static enum dma_status tegra_dma_tx_status(struct >>> dma_chan *dc, >>> unsigned int residual; >>> >>> ret = dma_cookie_status(dc, cookie, txstate); >>> - if (ret == DMA_COMPLETE) >>> + if (ret == DMA_COMPLETE || !txstate) >>> return ret; >> >> Thanks for reporting this. I agree that we should not do this, however, >> looking at the code for Tegra, I am wondering if this could change the >> actual state that is returned. Looking at dma_cookie_status() it will >> call dma_async_is_complete() which will return either DMA_COMPLETE or >> DMA_IN_PROGRESS. It could be possible that the actual state for the >> DMA transfer in the tegra driver is DMA_ERROR, so I am wondering if we >> should do something like the following ... > > This one is stopping code execution when residue is not valid. Do notice > that it check for DMA_COMPLETE OR txstate. In other cases, wit will return > 'that' state when txstate is NULL. Sorry what do you mean by "this one"? My point is that if the status is not DMA_COMPLETE, then it is possible that it could be DMA_ERROR (for tegra that is). However, dma_cookie_status will only return DMA_IN_PROGRESS or DMA_COMPLETE and so if 'txstate' is NULL we will not see the DMA_ERROR status anymore and just think it is in progress when it is actually an error. 
I do agree that the driver is broken as we are not checking for !txstate, but this also changes the behaviour a bit. Cheers Jon -- nvpublic ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
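Jon's concern can be stated in a few lines of C: a helper that only ever reports COMPLETE or IN_PROGRESS will mask a driver-private ERROR state if we return early on `!txstate`. A sketch with illustrative names, not the real dmaengine API:

```c
enum dma_status { DMA_COMPLETE, DMA_IN_PROGRESS, DMA_ERROR };

/* Models dma_cookie_status(): it only ever reports COMPLETE or
 * IN_PROGRESS; it knows nothing about the driver's error state. */
static enum dma_status cookie_status(int done)
{
    return done ? DMA_COMPLETE : DMA_IN_PROGRESS;
}

/* The early-return-on-!txstate variant: if there is nowhere to
 * store a residue, return the generic status immediately -- which
 * can hide a driver-visible DMA_ERROR held in the descriptor. */
static enum dma_status tx_status(int done, int have_txstate,
                                 enum dma_status driver_state)
{
    enum dma_status ret = cookie_status(done);

    if (ret == DMA_COMPLETE || !have_txstate)
        return ret;

    return driver_state;   /* full descriptor-lookup path */
}
```

When the descriptor actually holds `DMA_ERROR`, a `NULL` txstate caller now sees `DMA_IN_PROGRESS` — the behaviour change Jon is pointing out.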
Re: [PATCH v3] tools/perf: Fix the mask in regs_dump__printf and print_sample_iregs
On Tue, Jun 21, 2016 at 08:26:40PM +0530, Madhavan Srinivasan wrote: > When decoding the perf_regs mask in regs_dump__printf(), > we loop through the mask using find_first_bit and find_next_bit functions. > "mask" is of type "u64", but sent as a "unsigned long *" to > lib functions along with sizeof(). > > While the existing code works fine in most cases, > the logic is broken when using a 32bit perf on a 64bit kernel (Big Endian). > When reading u64 using ((u32 *)&mask)[0], perf (lib/find_*_bit()) assumes it > gets > lower 32bits of u64 which is wrong. Proposed fix is to swap the words > of the u64 to handle this case. This is _not_ an endianness swap. > > Suggested-by: Yury Norov > Cc: Yury Norov > Cc: Peter Zijlstra > Cc: Ingo Molnar > Cc: Arnaldo Carvalho de Melo > Cc: Alexander Shishkin > Cc: Jiri Olsa > Cc: Adrian Hunter > Cc: Kan Liang > Cc: Wang Nan > Cc: Michael Ellerman > Signed-off-by: Madhavan Srinivasan > --- > Changelog v2: > 1) Moved the swap code to a common function > 2) Added more comments in the code > > Changelog v1: > 1) updated commit message and patch subject > 2) Add the fix to print_sample_iregs() in builtin-script.c > > tools/include/linux/bitmap.h | 9 + What about include/linux/bitmap.h? I think we'd place it there first. > tools/perf/builtin-script.c | 16 +++- > tools/perf/util/session.c | 16 +++- > 3 files changed, 39 insertions(+), 2 deletions(-) > > diff --git a/tools/include/linux/bitmap.h b/tools/include/linux/bitmap.h > index 28f5493da491..79998b26eb04 100644 > --- a/tools/include/linux/bitmap.h > +++ b/tools/include/linux/bitmap.h > @@ -2,6 +2,7 @@ > #define _PERF_BITOPS_H > > #include > +#include > #include > > #define DECLARE_BITMAP(name,bits) \ > @@ -22,6 +23,14 @@ void __bitmap_or(unsigned long *dst, const unsigned long > *bitmap1, > #define small_const_nbits(nbits) \ > (__builtin_constant_p(nbits) && (nbits) <= BITS_PER_LONG) > > +static inline void bitmap_from_u64(unsigned long *_mask, u64 mask) Inline is not required.
Some people don't like it. An underscored parameter in a function declaration is not the best idea either. Try: static void bitmap_from_u64(unsigned long *bitmap, u64 mask) > +{ > + _mask[0] = mask & ULONG_MAX; > + > + if (sizeof(mask) > sizeof(unsigned long)) > + _mask[1] = mask >> 32; > +} > + > static inline void bitmap_zero(unsigned long *dst, int nbits) > { > if (small_const_nbits(nbits)) > diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c > index e3ce2f34d3ad..73928310fd91 100644 > --- a/tools/perf/builtin-script.c > +++ b/tools/perf/builtin-script.c > @@ -412,11 +412,25 @@ static void print_sample_iregs(struct perf_sample > *sample, > struct regs_dump *regs = &sample->intr_regs; > uint64_t mask = attr->sample_regs_intr; > unsigned i = 0, r; > + unsigned long _mask[sizeof(mask)/sizeof(unsigned long)]; If we start with it, I think we'd hide the declaration machinery as well: #define DECLARE_L64_BITMAP(__name) unsigned long __name[sizeof(u64)/sizeof(unsigned long)] or #define L64_BITMAP_SIZE (sizeof(u64)/sizeof(unsigned long)) Or both :) Whatever you prefer. > > if (!regs) > return; > > - for_each_set_bit(r, (unsigned long *)&mask, sizeof(mask) * 8) { > + /* > + * Since u64 is passed as 'unsigned long *', check > + * to see whether we need to swap words within u64. > + * Reason being, in 32 bit big endian userspace on a > + * 64bit kernel, 'unsigned long' is 32 bits. > + * When reading u64 using ((u32 *)&mask)[0] and ((u32 *)&mask)[1], > + * we will get wrong value for the mask. This is what > + * find_first_bit() and find_next_bit() is doing. > + * Issue here is "((u32 *)&mask)[0]" gets upper 32 bits of u64, > + * but perf assumes it gets lower 32bits of u64. Hence the check > + * and swap.
> + */ > + bitmap_from_u64(_mask, mask); > + for_each_set_bit(r, _mask, sizeof(mask) * 8) { > u64 val = regs->regs[i++]; > printf("%5s:0x%"PRIx64" ", perf_reg_name(r), val); > } > diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c > index 5214974e841a..1337b1c73f82 100644 > --- a/tools/perf/util/session.c > +++ b/tools/perf/util/session.c > @@ -940,8 +940,22 @@ static void branch_stack__printf(struct perf_sample > *sample) > static void regs_dump__printf(u64 mask, u64 *regs) > { > unsigned rid, i = 0; > + unsigned long _mask[sizeof(mask)/sizeof(unsigned long)]; > > - for_each_set_bit(rid, (unsigned long *)&mask, sizeof(mask) * 8) { > + /* > + * Since u64 is passed as 'unsigned long *', check
Re: [PATCH] ppc: Fix BPF JIT for ABIv2
On 6/21/16 7:47 AM, Thadeu Lima de Souza Cascardo wrote: The calling convention is different with ABIv2 and so we'll need changes in bpf_slow_path_common() and sk_negative_common(). How big would those changes be? Do we know? How come no one reported this was broken previously? This is the first I've heard of it being broken. I just heard of it less than two weeks ago, and only could investigate it last week, when I realized mainline was also affected. It looks like the little-endian support for classic JIT was done before the conversion to ABIv2. And as JIT is disabled by default, no one seems to have exercised it. It's not a surprise, unfortunately. The JITs that were written before test_bpf.ko was developed were missing corner cases. Typical tcpdump would be fine, but fragmented packets, negative offsets and out-of-bounds accesses wouldn't be handled correctly. I'd suggest validating the stable backport with test_bpf as well.
Re: [PATCH] ibmvnic: fix to use list_for_each_safe() when delete items
On 06/20/2016 10:50 AM, Thomas Falcon wrote: > On 06/17/2016 09:53 PM, weiyj...@163.com wrote: >> From: Wei Yongjun >> >> Since we will remove items off the list using list_del() we need >> to use a safe version of the list_for_each() macro aptly named >> list_for_each_safe(). >> >> Signed-off-by: Wei Yongjun >> --- >> drivers/net/ethernet/ibm/ibmvnic.c | 10 +++++----- >> 1 file changed, 5 insertions(+), 5 deletions(-) >> >> diff --git a/drivers/net/ethernet/ibm/ibmvnic.c >> b/drivers/net/ethernet/ibm/ibmvnic.c >> index 864cb21..0b6a922 100644 >> --- a/drivers/net/ethernet/ibm/ibmvnic.c >> +++ b/drivers/net/ethernet/ibm/ibmvnic.c >> @@ -3141,14 +3141,14 @@ static void handle_request_ras_comp_num_rsp(union >> ibmvnic_crq *crq, >> >> static void ibmvnic_free_inflight(struct ibmvnic_adapter *adapter) >> { >> -struct ibmvnic_inflight_cmd *inflight_cmd; >> +struct ibmvnic_inflight_cmd *inflight_cmd, *tmp1; >> struct device *dev = &adapter->vdev->dev; >> -struct ibmvnic_error_buff *error_buff; >> +struct ibmvnic_error_buff *error_buff, *tmp2; >> unsigned long flags; >> unsigned long flags2; >> >> spin_lock_irqsave(&adapter->inflight_lock, flags); >> -list_for_each_entry(inflight_cmd, &adapter->inflight, list) { >> +list_for_each_entry_safe(inflight_cmd, tmp1, &adapter->inflight, list) { >> switch (inflight_cmd->crq.generic.cmd) { >> case LOGIN: >> dma_unmap_single(dev, adapter->login_buf_token, >> @@ -3165,8 +3165,8 @@ static void ibmvnic_free_inflight(struct >> ibmvnic_adapter *adapter) >> break; >> case REQUEST_ERROR_INFO: >> spin_lock_irqsave(&adapter->error_list_lock, flags2); >> -list_for_each_entry(error_buff, &adapter->errors, >> -list) { >> +list_for_each_entry_safe(error_buff, tmp2, >> + &adapter->errors, list) { >> dma_unmap_single(dev, error_buff->dma, >> error_buff->len, >> DMA_FROM_DEVICE); >> > Thanks! > > Acked-by: Thomas Falcon Hello, I apologize for prematurely ack'ing this. There is another situation where you could use list_for_each_entry_safe in the function handle_error_info_rsp.
Could you include this in your patch, please? diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index 864cb21..e9968d9 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -2121,7 +2121,7 @@ static void handle_error_info_rsp(union ibmvnic_crq *crq, struct ibmvnic_adapter *adapter) { struct device *dev = &adapter->vdev->dev; - struct ibmvnic_error_buff *error_buff; + struct ibmvnic_error_buff *error_buff, *tmp; unsigned long flags; bool found = false; int i; @@ -2133,7 +2133,7 @@ static void handle_error_info_rsp(union ibmvnic_crq *crq, } spin_lock_irqsave(&adapter->error_list_lock, flags); - list_for_each_entry(error_buff, &adapter->errors, list) + list_for_each_entry_safe(error_buff, tmp, &adapter->errors, list) if (error_buff->error_id == crq->request_error_rsp.error_id) { found = true; list_del(&error_buff->list);
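For reference, the reason the `_safe` iterator is needed at all: the plain iterator dereferences the current node to find its successor, which is a use-after-free once `list_del()`/`free()` has run on that node. A minimal userspace model of the pattern, using a simplified singly-linked list rather than the kernel's `list_head` — all names are illustrative:

```c
#include <stdlib.h>

struct node {
    int id;
    struct node *next;
};

static struct node *push(struct node *head, int id)
{
    struct node *n = malloc(sizeof(*n));
    if (!n)
        return head;
    n->id = id;
    n->next = head;
    return n;
}

/* Delete every entry. The "safe" pattern caches the successor in a
 * second cursor BEFORE unlinking/freeing the current node -- exactly
 * why list_for_each_entry_safe() carries the extra 'tmp' argument. */
static int free_all(struct node **head)
{
    int freed = 0;
    struct node *pos = *head, *tmp;

    while (pos) {
        tmp = pos->next;   /* read next before pos is freed */
        free(pos);
        freed++;
        pos = tmp;         /* advance via the cached pointer */
    }
    *head = NULL;
    return freed;
}
```

Reading `pos->next` after `free(pos)` — the non-safe variant — is undefined behaviour, which is the bug class both hunks in this thread fix.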
Re: [PATCH 0/8] Various dmaengine cleanups
On Tue, Jun 07, 2016 at 06:38:33PM +0100, Peter Griffin wrote: > Hi Vinod, > > This series is a bunch of cleanup updates to various > dmaengine drivers, based on some of the review feeback to my fdma series. Good cleanup, Applied, thanks -- ~Vinod ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 7/8] dmaengine: tegra20-apb-dma: Only calculate residue if txstate exists.
On Wed, Jun 08, 2016 at 09:51:57AM +0100, Jon Hunter wrote: > Hi Peter, > > On 07/06/16 18:38, Peter Griffin wrote: > > There is no point calculating the residue if there is > > no txstate to store the value. > > > > Signed-off-by: Peter Griffin > > --- > > drivers/dma/tegra20-apb-dma.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/drivers/dma/tegra20-apb-dma.c b/drivers/dma/tegra20-apb-dma.c > > index 01e316f..7f4af8c 100644 > > --- a/drivers/dma/tegra20-apb-dma.c > > +++ b/drivers/dma/tegra20-apb-dma.c > > @@ -814,7 +814,7 @@ static enum dma_status tegra_dma_tx_status(struct > > dma_chan *dc, > > unsigned int residual; > > > > ret = dma_cookie_status(dc, cookie, txstate); > > - if (ret == DMA_COMPLETE) > > + if (ret == DMA_COMPLETE || !txstate) > > return ret; > > Thanks for reporting this. I agree that we should not do this, however, > looking at the code for Tegra, I am wondering if this could change the > actual state that is returned. Looking at dma_cookie_status() it will > call dma_async_is_complete() which will return either DMA_COMPLETE or > DMA_IN_PROGRESS. It could be possible that the actual state for the > DMA transfer in the tegra driver is DMA_ERROR, so I am wondering if we > should do something like the following ... This one stops code execution when the residue is not valid. Do notice that it checks for DMA_COMPLETE OR !txstate. In the other case, it will return 'that' state when txstate is NULL. I am going to apply this.
> > diff --git a/drivers/dma/tegra20-apb-dma.c b/drivers/dma/tegra20-apb-dma.c > index 01e316f73559..45edab7418d0 100644 > --- a/drivers/dma/tegra20-apb-dma.c > +++ b/drivers/dma/tegra20-apb-dma.c > @@ -822,13 +822,8 @@ static enum dma_status tegra_dma_tx_status(struct > dma_chan *dc, > /* Check on wait_ack desc status */ > list_for_each_entry(dma_desc, &tdc->free_dma_desc, node) { > if (dma_desc->txd.cookie == cookie) { > - residual = dma_desc->bytes_requested - > - (dma_desc->bytes_transferred % > - dma_desc->bytes_requested); > - dma_set_residue(txstate, residual); > ret = dma_desc->dma_status; > - spin_unlock_irqrestore(&tdc->lock, flags); > - return ret; > + goto found; > } > } > > @@ -836,17 +831,23 @@ static enum dma_status tegra_dma_tx_status(struct > dma_chan *dc, > list_for_each_entry(sg_req, &tdc->pending_sg_req, node) { > dma_desc = sg_req->dma_desc; > if (dma_desc->txd.cookie == cookie) { > - residual = dma_desc->bytes_requested - > - (dma_desc->bytes_transferred % > - dma_desc->bytes_requested); > - dma_set_residue(txstate, residual); > ret = dma_desc->dma_status; > - spin_unlock_irqrestore(&tdc->lock, flags); > - return ret; > + goto found; > } > } > > - dev_dbg(tdc2dev(tdc), "cookie %d does not found\n", cookie); > + dev_warn(tdc2dev(tdc), "cookie %d not found\n", cookie); > + spin_unlock_irqrestore(&tdc->lock, flags); > + return ret; > + > +found: > + if (txstate) { > + residual = dma_desc->bytes_requested - > + (dma_desc->bytes_transferred % > + dma_desc->bytes_requested); > + dma_set_residue(txstate, residual); > + } > + I feel this optimizes stuff, which seems okay. Feel free to send as proper patch. > spin_unlock_irqrestore(&tdc->lock, flags); > return ret; > } > > Cheers > Jon > > -- > nvpublic -- ~Vinod
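The residue computation being moved under the `if (txstate)` check is plain modular arithmetic; the modulo keeps the result sane for cyclic transfers, where `bytes_transferred` keeps growing past `bytes_requested`. A sketch — the field names mirror the driver, but the helper itself is illustrative:

```c
/* Bytes still to go for a (possibly cyclic) transfer. Taking
 * transferred modulo requested folds an ever-growing cyclic byte
 * count back into the current period before subtracting. Note the
 * driver's expression yields bytes_requested (not 0) when the count
 * is an exact multiple -- i.e. at a period boundary. */
static unsigned int dma_residue(unsigned int bytes_requested,
                                unsigned int bytes_transferred)
{
    return bytes_requested - (bytes_transferred % bytes_requested);
}
```

After the rework above, this computation runs once at the `found:` label, and only when the caller actually supplied a `txstate` to store it in.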
Re: [PATCH 1/2] workqueue: Move wq_update_unbound_numa() to the beginning of CPU_ONLINE
On Tue, Jun 21, 2016 at 07:42:31PM +0530, Gautham R Shenoy wrote: > > Subject: [PATCH] sched: allow kthreads to fallback to online && !active cpus > > > > During CPU hotplug, CPU_ONLINE callbacks are run while the CPU is > > online but not active. A CPU_ONLINE callback may create or bind a > > kthread so that its cpus_allowed mask only allows the CPU which is > > being brought online. The kthread may start executing before the CPU > > is made active and can end up in select_fallback_rq(). > > > > In such cases, the expected behavior is selecting the CPU which is > > coming online; however, because select_fallback_rq() only chooses from > > active CPUs, it determines that the task doesn't have any viable CPU > > in its allowed mask and ends up overriding it to cpu_possible_mask. > > > > CPU_ONLINE callbacks should be able to put kthreads on the CPU which > > is coming online. Update select_fallback_rq() so that it follows > > cpu_online() rather than cpu_active() for kthreads. > > > > Signed-off-by: Tejun Heo> > Reported-by: Gautham R Shenoy > > Hi Tejun, > > This patch fixes the issue on POWER. I am able to see the worker > threads of the unbound workqueues of the newly onlined node with this. > > Tested-by: Gautham R. Shenoy Peter? -- tejun ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v3] tools/perf: Fix the mask in regs_dump__printf and print_sample_iregs
When decoding the perf_regs mask in regs_dump__printf(), we loop through the mask using find_first_bit and find_next_bit functions. "mask" is of type "u64", but sent as a "unsigned long *" to lib functions along with sizeof(). While the existing code works fine in most cases, the logic is broken when using a 32bit perf on a 64bit kernel (Big Endian). When reading u64 using ((u32 *)&mask)[0], perf (lib/find_*_bit()) assumes it gets lower 32bits of u64 which is wrong. Proposed fix is to swap the words of the u64 to handle this case. This is _not_ an endianness swap. Suggested-by: Yury Norov Cc: Yury Norov Cc: Peter Zijlstra Cc: Ingo Molnar Cc: Arnaldo Carvalho de Melo Cc: Alexander Shishkin Cc: Jiri Olsa Cc: Adrian Hunter Cc: Kan Liang Cc: Wang Nan Cc: Michael Ellerman Signed-off-by: Madhavan Srinivasan --- Changelog v2: 1) Moved the swap code to a common function 2) Added more comments in the code Changelog v1: 1) updated commit message and patch subject 2) Add the fix to print_sample_iregs() in builtin-script.c tools/include/linux/bitmap.h | 9 + tools/perf/builtin-script.c | 16 +++- tools/perf/util/session.c | 16 +++- 3 files changed, 39 insertions(+), 2 deletions(-) diff --git a/tools/include/linux/bitmap.h b/tools/include/linux/bitmap.h index 28f5493da491..79998b26eb04 100644 --- a/tools/include/linux/bitmap.h +++ b/tools/include/linux/bitmap.h @@ -2,6 +2,7 @@ #define _PERF_BITOPS_H #include +#include #include #define DECLARE_BITMAP(name,bits) \ @@ -22,6 +23,14 @@ void __bitmap_or(unsigned long *dst, const unsigned long *bitmap1, #define small_const_nbits(nbits) \ (__builtin_constant_p(nbits) && (nbits) <= BITS_PER_LONG) +static inline void bitmap_from_u64(unsigned long *_mask, u64 mask) +{ + _mask[0] = mask & ULONG_MAX; + + if (sizeof(mask) > sizeof(unsigned long)) + _mask[1] = mask >> 32; +} + static inline void bitmap_zero(unsigned long *dst, int nbits) { if (small_const_nbits(nbits)) diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c index
e3ce2f34d3ad..73928310fd91 100644 --- a/tools/perf/builtin-script.c +++ b/tools/perf/builtin-script.c @@ -412,11 +412,25 @@ static void print_sample_iregs(struct perf_sample *sample, struct regs_dump *regs = &sample->intr_regs; uint64_t mask = attr->sample_regs_intr; unsigned i = 0, r; + unsigned long _mask[sizeof(mask)/sizeof(unsigned long)]; if (!regs) return; - for_each_set_bit(r, (unsigned long *)&mask, sizeof(mask) * 8) { + /* +* Since u64 is passed as 'unsigned long *', check +* to see whether we need to swap words within u64. +* Reason being, in 32 bit big endian userspace on a +* 64bit kernel, 'unsigned long' is 32 bits. +* When reading u64 using ((u32 *)&mask)[0] and ((u32 *)&mask)[1], +* we will get wrong value for the mask. This is what +* find_first_bit() and find_next_bit() is doing. +* Issue here is "((u32 *)&mask)[0]" gets upper 32 bits of u64, +* but perf assumes it gets lower 32bits of u64. Hence the check +* and swap. +*/ + bitmap_from_u64(_mask, mask); + for_each_set_bit(r, _mask, sizeof(mask) * 8) { u64 val = regs->regs[i++]; printf("%5s:0x%"PRIx64" ", perf_reg_name(r), val); } diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c index 5214974e841a..1337b1c73f82 100644 --- a/tools/perf/util/session.c +++ b/tools/perf/util/session.c @@ -940,8 +940,22 @@ static void branch_stack__printf(struct perf_sample *sample) static void regs_dump__printf(u64 mask, u64 *regs) { unsigned rid, i = 0; + unsigned long _mask[sizeof(mask)/sizeof(unsigned long)]; - for_each_set_bit(rid, (unsigned long *)&mask, sizeof(mask) * 8) { + /* +* Since u64 is passed as 'unsigned long *', check +* to see whether we need to swap words within u64. +* Reason being, in 32 bit big endian userspace on a +* 64bit kernel, 'unsigned long' is 32 bits. +* When reading u64 using ((u32 *)&mask)[0] and ((u32 *)&mask)[1], +* we will get wrong value for the mask. This is what +* find_first_bit() and find_next_bit() is doing.
+* Issue here is "((u32 *)&mask)[0]" gets upper 32 bits of u64, +* but perf assumes it gets lower 32bits of u64. Hence the check +* and swap. +*/ + bitmap_from_u64(_mask, mask); + for_each_set_bit(rid, _mask, sizeof(mask) * 8) { u64 val = regs[i++]; printf(" %-5s 0x%" PRIx64 "\n", -- 1.9.1
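The helper is easy to exercise in isolation: after `bitmap_from_u64()`, bit n of the bitmap must equal bit n of the u64 regardless of how wide `unsigned long` is. A self-contained sketch mirroring the patch's helper — the `test_bit()` below is a local illustration, not the kernel's:

```c
#include <limits.h>
#include <stdint.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Split a u64 mask into bitmap words in find_*_bit() order.
 * On LP64 the first word already holds everything; on a 32-bit
 * build the high half goes into the second word. */
static void bitmap_from_u64(unsigned long *bitmap, uint64_t mask)
{
    bitmap[0] = mask & ULONG_MAX;
    if (sizeof(mask) > sizeof(unsigned long))
        bitmap[1] = mask >> 32;
}

/* Bit n of the bitmap, numbered the way find_first_bit() numbers it. */
static int test_bit(const unsigned long *bitmap, unsigned int n)
{
    return (bitmap[n / BITS_PER_LONG] >> (n % BITS_PER_LONG)) & 1UL;
}
```

For a mask like `0x8000000000000001`, bits 0 and 63 must come back set and everything else clear — on both word sizes — which is precisely the property the naive `(unsigned long *)&mask` cast violates on 32-bit big-endian userspace.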
Re: [PATCH] ppc: Fix BPF JIT for ABIv2
On Tue, Jun 21, 2016 at 09:15:48PM +1000, Michael Ellerman wrote: > On Tue, 2016-06-21 at 14:28 +0530, Naveen N. Rao wrote: > > On 2016/06/20 03:56PM, Thadeu Lima de Souza Cascardo wrote: > > > On Sun, Jun 19, 2016 at 11:19:14PM +0530, Naveen N. Rao wrote: > > > > On 2016/06/17 10:00AM, Thadeu Lima de Souza Cascardo wrote: > > > > > > > > > > Hi, Michael and Naveen. > > > > > > > > > > I noticed independently that there is a problem with BPF JIT and > > > > > ABIv2, and > > > > > worked out the patch below before I noticed Naveen's patchset and the > > > > > latest > > > > > changes in ppc tree for a better way to check for ABI versions. > > > > > > > > > > However, since the issue described below affect mainline and stable > > > > > kernels, > > > > > would you consider applying it before merging your two patchsets, so > > > > > that we can > > > > > more easily backport the fix? > > > > > > > > Hi Cascardo, > > > > Given that this has been broken on ABIv2 since forever, I didn't bother > > > > fixing it. But, I can see why this would be a good thing to have for > > > > -stable and existing distros. However, while your patch below may fix > > > > the crash you're seeing on ppc64le, it is not sufficient -- you'll need > > > > changes in bpf_jit_asm.S as well. > > > > > > Hi, Naveen. > > > > > > Any tips on how to exercise possible issues there? Or what changes you > > > think > > > would be sufficient? > > > > The calling convention is different with ABIv2 and so we'll need changes > > in bpf_slow_path_common() and sk_negative_common(). > > How big would those changes be? Do we know? > > How come no one reported this was broken previously? This is the first I've > heard of it being broken. > I just heard of it less than two weeks ago, and only could investigate it last week, when I realized mainline was also affected. It looks like the little-endian support for classic JIT were done before the conversion to ABIv2. 
And as JIT is disabled by default, no one seems to have exercised it. > > However, rather than enabling classic JIT for ppc64le, are we better off > > just disabling it? > > > > --- a/arch/powerpc/Kconfig > > +++ b/arch/powerpc/Kconfig > > @@ -128,7 +128,7 @@ config PPC > > select IRQ_FORCED_THREADING > > select HAVE_RCU_TABLE_FREE if SMP > > select HAVE_SYSCALL_TRACEPOINTS > > - select HAVE_CBPF_JIT > > + select HAVE_CBPF_JIT if CPU_BIG_ENDIAN > > select HAVE_ARCH_JUMP_LABEL > > select ARCH_HAVE_NMI_SAFE_CMPXCHG > > select ARCH_HAS_GCOV_PROFILE_ALL > > > > > > Michael, > > Let me know your thoughts on whether you intend to take this patch or > > Cascardo's patch for -stable before the eBPF patches. I can redo my > > patches accordingly. > > This patch sounds like the best option at the moment for something we can > backport. Unless the changes to fix it are minimal. > > cheers > With my patch alone, I can successfully run a minimal 'tcpdump tcp port 22'. It correctly filters packets. But as pointed out, slow paths may not be taken. I don't have strong opinions on what to apply to stable, just that it would be nice to have something for the crash before applying all the nice changes by Naveen. Cascardo.
Re: [Qemu-ppc] [PATCH v2] powerpc/pseries: start rtasd before PCI probing
On Wed, 15 Jun 2016 22:26:41 +0200 Greg Kurz wrote: > A strange behaviour is observed when comparing PCI hotplug in QEMU, between > x86 and pseries. If you consider the following steps: > - start a VM > - add a PCI device via the QEMU monitor before the rtasd has started (for > example starting the VM in paused state, or hotplug during FW or boot > loader) > - resume the VM execution > > The x86 kernel detects the PCI device, but the pseries one does not. > > This happens because the rtasd kernel worker is currently started under > device_initcall, while PCI probing happens earlier under subsys_initcall. > > As a consequence, if we have a pending RTAS event at boot time, a message > is printed and the event is dropped. > > This patch moves all the initialization of rtasd to arch_initcall, which is > run before subsys_initcall: this way, logging_enabled is true when the RTAS > event pops up and it is not lost anymore. > > The proc fs bits stay at device_initcall because they cannot be run before > fs_initcall. > > Signed-off-by: Greg Kurz > --- > v2: - avoid behaviour change: don't create the proc entry if early init failed > I forgot to mention that Thomas had sent a Tested-by for v1, which I think is still valid for v2. > Michael, > > This was also tested under PowerVM: it doesn't fix anything there because the > HMC tells it won't honor DLPAR features as long as the RMC isn't here, which > happens later in the boot sequence. It hence seems impossible to have a > pending > RTAS event at boot time. > > It doesn't seem to break anything either, the kernel boots and hotplug works > okay once the RMC is up. > > Cheers. 
> > -- > Greg > > --- > arch/powerpc/kernel/rtasd.c | 22 +++++++++++++++++----- > 1 file changed, 17 insertions(+), 5 deletions(-) > > diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c > index e864b7c5884e..a26a02006576 100644 > --- a/arch/powerpc/kernel/rtasd.c > +++ b/arch/powerpc/kernel/rtasd.c > @@ -526,10 +526,8 @@ void rtas_cancel_event_scan(void) > } > EXPORT_SYMBOL_GPL(rtas_cancel_event_scan); > > -static int __init rtas_init(void) > +static int __init rtas_event_scan_init(void) > { > - struct proc_dir_entry *entry; > - > if (!machine_is(pseries) && !machine_is(chrp)) > return 0; > > @@ -562,13 +560,27 @@ static int __init rtas_init(void) > return -ENOMEM; > } > > + start_event_scan(); > + > + return 0; > +} > +arch_initcall(rtas_event_scan_init); > + > +static int __init rtas_init(void) > +{ > + struct proc_dir_entry *entry; > + > + if (!machine_is(pseries) && !machine_is(chrp)) > + return 0; > + > + if (!rtas_log_buf) > + return -ENODEV; > + > entry = proc_create("powerpc/rtas/error_log", S_IRUSR, NULL, > &proc_rtas_log_operations); > if (!entry) > printk(KERN_ERR "Failed to create error_log proc entry\n"); > > - start_event_scan(); > - > return 0; > } > __initcall(rtas_init); >
Re: [PATCH 1/2] workqueue: Move wq_update_unbound_numa() to the beginning of CPU_ONLINE
Hi Tejun, On Thu, Jun 16, 2016 at 03:35:04PM -0400, Tejun Heo wrote: > Hello, > > So, the issue of the initial worker not having its affinity set > correctly wasn't caused by the order of the operations. Reordering > just made set_cpus_allowed get tried one more time late enough so that it > hides the race condition most of the time. The problem is that > CPU_ONLINE callbacks are called while the cpu being onlined is online > but not active and select_fallback_rq() only considers active cpus, so > if a kthread gets scheduled in the meantime and it doesn't have any > cpu which is active in its allowed mask, its allowed mask gets reset > to cpu_possible_mask. > > Would something like the following make sense? > > Thanks. > -- 8< -- > Subject: [PATCH] sched: allow kthreads to fallback to online && !active cpus > > During CPU hotplug, CPU_ONLINE callbacks are run while the CPU is > online but not active. A CPU_ONLINE callback may create or bind a > kthread so that its cpus_allowed mask only allows the CPU which is > being brought online. The kthread may start executing before the CPU > is made active and can end up in select_fallback_rq(). > > In such cases, the expected behavior is selecting the CPU which is > coming online; however, because select_fallback_rq() only chooses from > active CPUs, it determines that the task doesn't have any viable CPU > in its allowed mask and ends up overriding it to cpu_possible_mask. > > CPU_ONLINE callbacks should be able to put kthreads on the CPU which > is coming online. Update select_fallback_rq() so that it follows > cpu_online() rather than cpu_active() for kthreads. > > Signed-off-by: Tejun Heo > Reported-by: Gautham R Shenoy Hi Tejun, This patch fixes the issue on POWER. I am able to see the worker threads of the unbound workqueues of the newly onlined node with this. Tested-by: Gautham R. Shenoy > --- > kernel/sched/core.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index 017d539..a12e3db 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -1536,7 +1536,9 @@ static int select_fallback_rq(int cpu, struct > task_struct *p) > for (;;) { > /* Any allowed, online CPU? */ > for_each_cpu(dest_cpu, tsk_cpus_allowed(p)) { > - if (!cpu_active(dest_cpu)) > + if (!(p->flags & PF_KTHREAD) && !cpu_active(dest_cpu)) > + continue; > + if (!cpu_online(dest_cpu)) > continue; > goto out; > } >
Re: [v6, 1/2] cxl: Add mechanism for delivering AFU driver specific events
> On Jun 21, 2016, at 5:34 AM, Vaibhav Jain wrote: > > Hi Ian, > > Ian Munsie writes: > >> Excerpts from Vaibhav Jain's message of 2016-06-20 14:20:16 +0530: >> >> What exactly is the use case for this API? I'd vote to drop it if we can >> do without it. > Agree with this. Functionality of this API can be merged with > cxl_set_driver_ops when called with NULL arg for cxl_afu_driver_ops. Passing a NULL arg instead of calling an 'unset' API is fine with us. I'll add that for cxlflash, I can't envision a scenario where we'll unset the driver ops for a context.
Re: [PATCH] leds: Add no-op gpio_led_register_device when LED subsystem is disabled
On 06/21/2016 01:48 PM, Andrew F. Davis wrote: On 06/21/2016 02:09 AM, Jacek Anaszewski wrote: Hi Andrew, This patch doesn't apply, please rebase onto recent LED tree. On 06/21/2016 12:13 AM, Andrew F. Davis wrote: Some systems use 'gpio_led_register_device' to make an in-memory copy of their LED device table so the original can be removed as .init.rodata. When the LED subsystem is not enabled, source in the led directory is not built and so this function may be undefined. Fix this here. Signed-off-by: Andrew F. Davis --- include/linux/leds.h | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/include/linux/leds.h b/include/linux/leds.h index d2b1306..a4a3da6 100644 --- a/include/linux/leds.h +++ b/include/linux/leds.h @@ -386,8 +386,16 @@ struct gpio_led_platform_data { unsigned long *delay_off); Currently there is some stuff here, and in fact it has been for a long time. Patch "[PATCH 12/12] leds: Only descend into leds directory when CONFIG_NEW_LEDS is set" also doesn't apply. What repository are you using? v4.7-rc4, it may not apply due to the surrounding lines being changed in the other patches which may not be applied to your tree. It is a single line change per patch so hopefully the merge conflict resolutions will be trivial. A better solution could have been getting an ack from each maintainer and having someone pull the whole series into one tree, but parts have already been picked so it may be a little late for that. OK, I resolved the issues and applied, thanks. }; +#ifdef CONFIG_NEW_LEDS struct platform_device *gpio_led_register_device( int id, const struct gpio_led_platform_data *pdata); +#else +static inline struct platform_device *gpio_led_register_device( + int id, const struct gpio_led_platform_data *pdata) +{ + return 0; +} +#endif enum cpu_led_event { CPU_LED_IDLE_START, /* CPU enters idle */ -- Best regards, Jacek Anaszewski
Re: [PATCH v5 1/6] qspinlock: powerpc support qspinlock
On 2016/06/07 05:41, Benjamin Herrenschmidt wrote: On Mon, 2016-06-06 at 17:59 +0200, Peter Zijlstra wrote: On Fri, Jun 03, 2016 at 02:33:47PM +1000, Benjamin Herrenschmidt wrote: - For the above, can you show (or describe) where the qspinlock improves things compared to our current locks. So currently PPC has a fairly straight forward test-and-set spinlock IIRC. You have this because LPAR/virt muck and lock holder preemption issues etc.. qspinlock is 1) a fair lock (like ticket locks) and 2) provides out-of-word spinning, reducing cacheline pressure. Thanks Peter. I think I understand the theory, but I'd like to see it translate into real numbers. Esp. on multi-socket x86 we saw the out-of-word spinning being a big win over our ticket locks. And fairness, brought to us by the ticket locks a long time ago, eliminated starvation issues we had, where a spinner local to the holder would 'always' win from a spinner further away. So under heavy enough local contention, the spinners on 'remote' CPUs would 'never' get to own the lock. I think our HW has tweaks to avoid that from happening with the simple locks in the underlying ll/sc implementation. In any case, what I'm asking is actual tests to verify it works as expected for us. If HW has such tweaks then there must be a performance drop as the total cpu count grows. And I got such clues from one simple benchmark test: it tests how many spin_lock/spin_unlock pairs can be done within 15 seconds on all cpus. say,

while (!done) {
	spin_lock()
	this_cpu_inc(loops)
	spin_unlock()
}

I do the test on two machines, one is using powerKVM, and the other is using pHyp. The result below shows what the sum of loops is in the end, in K form.

cpu count    | pv-qspinlock | test-set spinlock
8 (powerKVM) | 62830K       | 67340K
8 (pHyp)     | 49800K       | 59330K
32 (pHyp)    | 87580K       | 20990K

- while the cpu count grows, the lock/unlock pair ops of the test-set spinlock drop very much. This is because of cache bouncing across different physical cpus. 
So to verify how both spinlocks impact the data-cache, another simple benchmark test. Code looks like:

struct _x {
	spinlock_t lk;
	unsigned long x;
} x;

while (!this_cpu_read(stop)) {
	int i = 0xff;

	spin_lock(&x.lk);
	this_cpu_inc(loops);
	while (i--)
		READ_ONCE(x.x);
	spin_unlock(&x.lk);
}

The result below shows what the sum of loops is in the end, in K form.

cpu count | pv-qspinlock | test-set spinlock
8 (pHyp)  | 13240K       | 9780K
32 (pHyp) | 25790K       | 9700K

obviously pv-qspinlock is more cache-friendly, and has better performance than test-set spinlock. More tests are going on, I will send out new patch set with the result. HOPE *within* this week. unixbench really takes a long time. thanks xinhui pv-qspinlock tries to preserve the fairness while allowing limited lock stealing and explicitly managing which vcpus to wake. Right. While there's theory and to some extent practice on x86, it would be nice to validate the effects on POWER. Right; so that will have to be from benchmarks which I cannot help you with ;-) Precisely :-) This is what I was asking for ;-) Cheers, Ben.
Re: powerpc/kprobes: Remove kretprobe_trampoline_holder.
On Thu, 2016-31-03 at 20:10:40 UTC, Thiago Jung Bauermann wrote: > Fixes the following testsuite failure: > > $ sudo ./perf test -v kallsyms >1: vmlinux symtab matches kallsyms : > --- start --- > test child forked, pid 12489 > Using /proc/kcore for kernel object code > Looking at the vmlinux_path (8 entries long) > Using /boot/vmlinux for symbols > 0xc003d300: diff name v: .kretprobe_trampoline_holder k: > kretprobe_trampoline > Maps only in vmlinux: >c086ca38-c0879b6c 87ca38 [kernel].text.unlikely >c0879b6c-c0bf 889b6c [kernel].meminit.text >c0bf-c0c53264 c0 [kernel].init.text >c0c53264-d425 c63264 [kernel].exit.text >d425-d445 0 [libcrc32c] >d445-d462 0 [xfs] >d462-d468 0 [autofs4] >d468-d46e 0 [x_tables] >d46e-d478 0 [ip_tables] >d478-d47e 0 [rng_core] >d47e- 0 [pseries_rng] > Maps in vmlinux with a different name in kallsyms: > Maps only in kallsyms: >d000-f000 1001 [kernel.kallsyms] >f000- 3001 [kernel.kallsyms] > test child finished with -1 > end > vmlinux symtab matches kallsyms: FAILED! > > The problem is that the kretprobe_trampoline symbol looks like this: > > $ eu-readelf -s /boot/vmlinux G kretprobe_trampoline >2431: c1302368 24 NOTYPE LOCAL DEFAULT 37 > kretprobe_trampoline_holder >2432: c003d300 8 FUNCLOCAL DEFAULT1 > .kretprobe_trampoline_holder > 97543: c003d300 0 NOTYPE GLOBAL DEFAULT1 > kretprobe_trampoline > > Its type is NOTYPE, and its size is 0, and this is a problem because > symbol-elf.c:dso__load_sym skips function symbols that are not STT_FUNC > or STT_GNU_IFUNC (this is determined by elf_sym__is_function). Even > if the type is changed to STT_FUNC, when dso__load_sym calls > symbols__fixup_duplicate, the kretprobe_trampoline symbol is dropped in > favour of .kretprobe_trampoline_holder because the latter has non-zero > size (as determined by choose_best_symbol). > > With this patch, all vmlinux symbols match /proc/kallsyms and the > testcase passes. 
> > Commit c1c355ce14c0 ("x86/kprobes: Get rid of > kretprobe_trampoline_holder()") gets rid of kretprobe_trampoline_holder > altogether on x86. This commit does the same on powerpc. This change > introduces no regressions on the perf and ftracetest testsuite results. > > Cc: Ananth N Mavinakayanahalli > Cc: Michael Ellerman > Reviewed-by: Naveen N. Rao > Signed-off-by: Thiago Jung Bauermann

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/61ed9cfb1b0951a3b4b98dd8bf

cheers
Re: powerpc/powernv: Print correct PHB type names
On Tue, 2016-21-06 at 02:35:56 UTC, Gavin Shan wrote: > We're initializing "IODA1" and "IODA2" PHBs though they are IODA2 > and NPU PHBs as below kernel log indicates. > >Initializing IODA1 OPAL PHB /pciex@3fffe4070 >Initializing IODA2 OPAL PHB /pciex@3fff00040 > > This fixes the PHB names. After it's applied, we get: > >Initializing IODA2 PHB (/pciex@3fffe4070) >Initializing NPU PHB (/pciex@3fff00040) > > Signed-off-by: Gavin Shan

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/9497a1c1c5b4de2a359b6d8648

cheers
Re: [v2] powerpc: export cpu_to_core_id()
On Thu, 2016-02-06 at 11:45:14 UTC, Mauricio Faria de Oliveira wrote: > Export cpu_to_core_id(). This will be used by the lpfc driver. > > This enables topology_core_id() from <linux/topology.h> (defined > to cpu_to_core_id() in arch/powerpc/include/asm/topology.h) to be > used by (non-builtin) modules. > > That is arch-neutral, already used by eg, drivers/base/topology.c, > but it is builtin (obj-y in Makefile) thus didn't need the export. > > Since the module uses topology_core_id() and this is defined to > cpu_to_core_id(), it needs the export, otherwise: > > ERROR: "cpu_to_core_id" [drivers/scsi/lpfc/lpfc.ko] undefined! > > Signed-off-by: Mauricio Faria de Oliveira

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/f8ab481066e7246e4b272233aa

cheers
Re: powerpc/pci: Fix SRIOV not building without EEH enabled
On Fri, 2016-17-06 at 05:25:17 UTC, Russell Currey wrote: > On Book3E CPUs (and possibly other configs), it is possible to have SRIOV > (CONFIG_PCI_IOV) set without CONFIG_EEH. The SRIOV code does not check > for this, and if EEH is disabled, pci_dn.c fails to build. > > Fix this by gating the EEH-specific code in the SRIOV implementation > behind CONFIG_EEH. > > Fixes: 39218cd0 ("powerpc/eeh: EEH device for VF") > Reported-by: Michael Ellerman > Signed-off-by: Russell Currey

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/fb36e90736938d50fdaa1be7af

cheers
Re: [v7,3/3] powerpc: Load Monitor Register Tests
On Thu, 2016-09-06 at 02:31:10 UTC, Michael Neuling wrote: > From: Jack Miller > > Adds two tests. One is a simple test to ensure that the new registers > LMRR and LMSER are properly maintained. The other actually uses the > existing EBB test infrastructure to test that LMRR and LMSER behave as > documented. > > Signed-off-by: Jack Miller > Signed-off-by: Michael Neuling

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/16c19a2e983346c547501795aa

cheers
Re: [v7,2/3] powerpc: Load Monitor Register Support
On Thu, 2016-09-06 at 02:31:09 UTC, Michael Neuling wrote: > From: Jack Miller > > This enables new registers, LMRR and LMSER, that can trigger an EBB in > userspace code when a monitored load (via the new ldmx instruction) > loads memory from a monitored space. This facility is controlled by a > new FSCR bit, LM. > > This patch disables the FSCR LM control bit on task init and enables > that bit when a load monitor facility unavailable exception is taken > for using it. On context switch, this bit is then used to determine > whether the two relevant registers are saved and restored. This is > done lazily for performance reasons. > > Signed-off-by: Jack Miller > Signed-off-by: Michael Neuling

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/bd3ea317fddfd0f2044f94bed2

cheers
Re: [v7,1/3] powerpc: Improve FSCR init and context switching
On Thu, 2016-09-06 at 02:31:08 UTC, Michael Neuling wrote: > This fixes a few issues with FSCR init and switching. > ... > > Signed-off-by: Michael Neuling

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/b57bd2de8c6c9aa03f1b899edd

cheers
Re: [1/2] Fix misleading comment in early_setup_secondary
On Fri, 2016-04-03 at 05:01:48 UTC, Madhavan Srinivasan wrote: > Current comment in the early_setup_secondary() for > paca->soft_enabled update is misleading. Comment should say to > mark interrupts "disabled" instead of "enabled". > Patch to fix the typo. > > Signed-off-by: Madhavan Srinivasan

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/103b7827d977ea34c982e6a9d2

cheers
Re: [v10,01/18] PCI: Add pcibios_setup_bridge()
On Fri, 2016-20-05 at 06:41:25 UTC, Gavin Shan wrote: > Currently, PowerPC PowerNV platform utilizes ppc_md.pcibios_fixup(), > which is called once after PCI probing and resource assignment > are completed, to allocate platform required resources for PCI devices: > PE#, IO and MMIO mapping, DMA address translation (TCE) table etc. > Obviously, it's not hotplug friendly. > > This adds weak function pcibios_setup_bridge(), which is called by > pci_setup_bridge(). PowerPC PowerNV platform will reuse the function > to assign above platform required resources to newly plugged PCI devices > during PCI hotplug in subsequent patches. > > Signed-off-by: Gavin Shan > Acked-by: Bjorn Helgaas

Entire series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/d366d28cd1325f11d582ec6d4a

cheers
Re: [PATCH] leds: Add no-op gpio_led_register_device when LED subsystem is disabled
On 06/21/2016 02:09 AM, Jacek Anaszewski wrote: > Hi Andrew, > > This patch doesn't apply, please rebase onto recent LED tree. > > On 06/21/2016 12:13 AM, Andrew F. Davis wrote: >> Some systems use 'gpio_led_register_device' to make an in-memory copy of >> their LED device table so the original can be removed as .init.rodata. >> When the LED subsystem is not enabled, source in the led directory is not >> built and so this function may be undefined. Fix this here. >> >> Signed-off-by: Andrew F. Davis >> --- >> include/linux/leds.h | 8 ++++++++ >> 1 file changed, 8 insertions(+) >> >> diff --git a/include/linux/leds.h b/include/linux/leds.h >> index d2b1306..a4a3da6 100644 >> --- a/include/linux/leds.h >> +++ b/include/linux/leds.h >> @@ -386,8 +386,16 @@ struct gpio_led_platform_data { >> unsigned long *delay_off); > > Currently there is some stuff here, and in fact it has been for > a long time. > > Patch "[PATCH 12/12] leds: Only descend into leds directory when > CONFIG_NEW_LEDS is set" also doesn't apply. > What repository are you using? > v4.7-rc4, it may not apply due to the surrounding lines being changed in the other patches which may not be applied to your tree. It is a single line change per patch so hopefully the merge conflict resolutions will be trivial. A better solution could have been getting an ack from each maintainer and having someone pull the whole series into one tree, but parts have already been picked so it may be a little late for that. >> }; >> >> +#ifdef CONFIG_NEW_LEDS >> struct platform_device *gpio_led_register_device( >> int id, const struct gpio_led_platform_data *pdata); >> +#else >> +static inline struct platform_device *gpio_led_register_device( >> + int id, const struct gpio_led_platform_data *pdata) >> +{ >> + return 0; >> +} >> +#endif >> >> enum cpu_led_event { >> CPU_LED_IDLE_START, /* CPU enters idle */ >> > >
Re: [PATCH] ppc: Fix BPF JIT for ABIv2
On Tue, 2016-06-21 at 14:28 +0530, Naveen N. Rao wrote: > On 2016/06/20 03:56PM, Thadeu Lima de Souza Cascardo wrote: > > On Sun, Jun 19, 2016 at 11:19:14PM +0530, Naveen N. Rao wrote: > > > On 2016/06/17 10:00AM, Thadeu Lima de Souza Cascardo wrote: > > > > > > > > Hi, Michael and Naveen. > > > > > > > > I noticed independently that there is a problem with BPF JIT and ABIv2, > > > > and > > > > worked out the patch below before I noticed Naveen's patchset and the > > > > latest > > > > changes in ppc tree for a better way to check for ABI versions. > > > > > > > > However, since the issue described below affect mainline and stable > > > > kernels, > > > > would you consider applying it before merging your two patchsets, so > > > > that we can > > > > more easily backport the fix? > > > > > > Hi Cascardo, > > > Given that this has been broken on ABIv2 since forever, I didn't bother > > > fixing it. But, I can see why this would be a good thing to have for > > > -stable and existing distros. However, while your patch below may fix > > > the crash you're seeing on ppc64le, it is not sufficient -- you'll need > > > changes in bpf_jit_asm.S as well. > > > > Hi, Naveen. > > > > Any tips on how to exercise possible issues there? Or what changes you think > > would be sufficient? > > The calling convention is different with ABIv2 and so we'll need changes > in bpf_slow_path_common() and sk_negative_common(). How big would those changes be? Do we know? How come no one reported this was broken previously? This is the first I've heard of it being broken. > However, rather than enabling classic JIT for ppc64le, are we better off > just disabling it? 
> > --- a/arch/powerpc/Kconfig > > +++ b/arch/powerpc/Kconfig > > @@ -128,7 +128,7 @@ config PPC > > select IRQ_FORCED_THREADING > > select HAVE_RCU_TABLE_FREE if SMP > > select HAVE_SYSCALL_TRACEPOINTS > > - select HAVE_CBPF_JIT > > + select HAVE_CBPF_JIT if CPU_BIG_ENDIAN > > select HAVE_ARCH_JUMP_LABEL > > select ARCH_HAVE_NMI_SAFE_CMPXCHG > > select ARCH_HAS_GCOV_PROFILE_ALL > > > > > > Michael, > > Let me know your thoughts on whether you intend to take this patch or > > Cascardo's patch for -stable before the eBPF patches. I can redo my > > patches accordingly. This patch sounds like the best option at the moment for something we can backport. Unless the changes to fix it are minimal. cheers
Re: [6/6] ppc: ebpf/jit: Implement JIT compiler for extended BPF
On Tue, 2016-06-21 at 12:28 +0530, Naveen N. Rao wrote: > On 2016/06/21 09:38AM, Michael Ellerman wrote: > > On Sun, 2016-06-19 at 23:06 +0530, Naveen N. Rao wrote: > > > > > > #include > > > > > > in bpf_jit_comp64.c > > > > > > Can you please check if it resolves the build error? > > > > Can you? :D > > :) > Sorry, I should have explained myself better. I did actually try your > config and I was able to reproduce the build error. After the above > #include, that error went away, but I saw some vdso related errors. I > thought I was doing something wrong and needed a different setup for > that particular kernel config, which is why I requested your help in the > matter. I just didn't do a good job of putting across that message... Ah OK. Not sure why you're seeing VDSO errors? > Note to self: randconfig builds *and* more time drafting emails :) No stress. You don't need to do randconfig builds, or even build all the arch/powerpc/ configs, just try to do a reasonable set, something like - ppc64, powernv, pseries, pmac32, ppc64e. I'm happy to catch the esoteric build failures. > Do you want me to respin the patches? No that's fine, I'll fix it up here. cheers
[PATCH] powerpc: Fix faults caused by radix patching of SLB miss handler
As part of the Radix MMU support we added some feature sections in the SLB miss handler. These are intended to catch the case that we incorrectly take an SLB miss when Radix is enabled, and instead of crashing weirdly they bail out to a well defined exit path and trigger an oops.

However the way they were written meant the bailout case was enabled by default until we did CPU feature patching.

On powermacs the early debug prints in setup_system() can cause an SLB miss, which happens before code patching, and so the SLB miss handler would incorrectly bailout and crash during boot.

Fix it by inverting the sense of the feature section, so that the code which is in place at boot is correct for the hash case. Once we determine we are using Radix - which will never happen on a powermac - only then do we patch in the bailout case which unconditionally jumps.

Fixes: caca285e5ab4 ("powerpc/mm/radix: Use STD_MMU_64 to properly isolate hash related code")
Reported-by: Denis Kirjanov
Tested-by: Denis Kirjanov
Signed-off-by: Michael Ellerman
---
 arch/powerpc/kernel/exceptions-64s.S | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 4c9440629128..8bcc1b457115 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1399,11 +1399,12 @@ END_MMU_FTR_SECTION_IFCLR(MMU_FTR_RADIX)
 	lwz	r9,PACA_EXSLB+EX_CCR(r13)	/* get saved CR */
 	mtlr	r10
-BEGIN_MMU_FTR_SECTION
-	b	2f
-END_MMU_FTR_SECTION_IFSET(MMU_FTR_RADIX)
 	andi.	r10,r12,MSR_RI	/* check for unrecoverable exception */
+BEGIN_MMU_FTR_SECTION
 	beq-	2f
+FTR_SECTION_ELSE
+	b	2f
+ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_RADIX)

 	.machine	push
 	.machine	"power4"
-- 
2.5.0
Re: [v6, 1/2] cxl: Add mechanism for delivering AFU driver specific events
Hi Ian, Ian Munsie writes: > Excerpts from Vaibhav Jain's message of 2016-06-20 14:20:16 +0530: >> > +int cxl_unset_driver_ops(struct cxl_context *ctx) >> > +{ >> > + if (atomic_read(&ctx->afu_driver_events)) >> > + return -EBUSY; >> > + >> > + ctx->afu_driver_ops = NULL; >> Need a write memory barrier so that afu_driver_ops isn't possibly called >> after this store. > What situation do you think this will help? I haven't looked closely at > the last few iterations of this patch set, but if you're in a situation > where you might be racing with some code doing e.g. > > if (ctx->afu_driver_ops) > ctx->afu_driver_ops->something(); > > You have a race with or without a memory barrier. Ideally you would just > have the caller guarantee that it will only call cxl_unset_driver_ops if > no further calls to afu_driver_ops is possible, otherwise you may need > locking here which would be far from ideal. Yes, agree that wmb won't save against the race condition mentioned and this is much better handled with locking. But imho having a wmb is still better compared to having no locking for this shared variable. > > What exactly is the use case for this API? I'd vote to drop it if we can > do without it. Agree with this. Functionality of this API can be merged with cxl_set_driver_ops when called with NULL arg for cxl_afu_driver_ops. ~ Vaibhav
Re: [PATCH] powerpc/align: Use #ifdef __BIG_ENDIAN__ #else for REG_BYTE
On Tuesday, June 21, 2016 10:51:00 AM CEST Michael Ellerman wrote: > On Fri, 2016-06-17 at 12:46 +0200, Arnd Bergmann wrote: > > On Friday, June 17, 2016 1:35:35 PM CEST Daniel Axtens wrote: > > > > It would be better to fix the sparse compilation so the same endianess > > > > is set that you get when calling gcc. > > > > > > I will definitely work on a patch to sparse! I'd still like this or > > > something like it to go in though, so we can keep working on reducing > > > the sparse warning count while the sparse patch is in the works. > > > > I think you just need to fix the Makefile so it sets the right > > arguments when calling sparse. > > > > Something like the (untested) patch below, similar to how we > > already handle the word size and how some other architectures > > handle setting __BIG_ENDIAN__. > > Yep that's clearly better. I didn't know we had separate CHECKER_FLAGS. > > Daniel can you test that? > > Arnd we'll add Suggested-by: you, or send a SOB if you like? > Please use 'Suggested-by', the main work for this patch was in analysing the problem and writing the changelog, and Daniel did that. Arnd
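The untested Makefile patch Arnd refers to is not reproduced in this message; a sketch of the idea (my reconstruction, following arch/powerpc/Makefile conventions, not the actual patch) would be to pass sparse the same endianness define the compiler implies:

```make
# Sketch (untested): tell sparse the endianness gcc targets, next to the
# existing word-size flag handling in arch/powerpc/Makefile.
ifdef CONFIG_CPU_BIG_ENDIAN
CHECKFLAGS	+= -D__BIG_ENDIAN__
else
CHECKFLAGS	+= -D__LITTLE_ENDIAN__
endif
```

With this, `make C=1` runs sparse with the endianness macro matching the kernel configuration, instead of sprinkling `#ifdef __BIG_ENDIAN__ #else` pairs through the source.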
Re: [PATCH] ppc: Fix BPF JIT for ABIv2
On 2016/06/20 03:56PM, Thadeu Lima de Souza Cascardo wrote: > On Sun, Jun 19, 2016 at 11:19:14PM +0530, Naveen N. Rao wrote: > > On 2016/06/17 10:00AM, Thadeu Lima de Souza Cascardo wrote: > > > > > > Hi, Michael and Naveen. > > > > > > I noticed independently that there is a problem with BPF JIT and ABIv2, > > > and > > > worked out the patch below before I noticed Naveen's patchset and the > > > latest > > > changes in ppc tree for a better way to check for ABI versions. > > > > > > However, since the issue described below affect mainline and stable > > > kernels, > > > would you consider applying it before merging your two patchsets, so that > > > we can > > > more easily backport the fix? > > > > Hi Cascardo, > > Given that this has been broken on ABIv2 since forever, I didn't bother > > fixing it. But, I can see why this would be a good thing to have for > > -stable and existing distros. However, while your patch below may fix > > the crash you're seeing on ppc64le, it is not sufficient -- you'll need > > changes in bpf_jit_asm.S as well. > > Hi, Naveen. > > Any tips on how to exercise possible issues there? Or what changes you think > would be sufficient? The calling convention is different with ABIv2 and so we'll need changes in bpf_slow_path_common() and sk_negative_common(). However, rather than enabling classic JIT for ppc64le, are we better off just disabling it?

--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -128,7 +128,7 @@ config PPC
 	select IRQ_FORCED_THREADING
 	select HAVE_RCU_TABLE_FREE if SMP
 	select HAVE_SYSCALL_TRACEPOINTS
-	select HAVE_CBPF_JIT
+	select HAVE_CBPF_JIT if CPU_BIG_ENDIAN
 	select HAVE_ARCH_JUMP_LABEL
 	select ARCH_HAVE_NMI_SAFE_CMPXCHG
 	select ARCH_HAS_GCOV_PROFILE_ALL

Michael, Let me know your thoughts on whether you intend to take this patch or Cascardo's patch for -stable before the eBPF patches. I can redo my patches accordingly. 
- Naveen
Re: [RESEND PATCH v2 1/4] PCI: Ignore resource_alignment if PCI_PROBE_ONLY was set
On 2016/6/21 10:16, Yongji Xie wrote: On 2016/6/21 9:43, Bjorn Helgaas wrote: On Thu, Jun 02, 2016 at 01:46:48PM +0800, Yongji Xie wrote: The resource_alignment parameter releases memory resources allocated by firmware so that the kernel can reassign new resources later on. But this causes the problem that no resources can be allocated by the kernel if PCI_PROBE_ONLY was set, e.g. on the pSeries platform, because PCI_PROBE_ONLY forces the kernel to use the firmware setup and not to reassign any resources. To solve this problem, this patch ignores resource_alignment if PCI_PROBE_ONLY was set.

Signed-off-by: Yongji Xie
---
 drivers/pci/pci.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index c8b4dbd..a259394 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4761,6 +4761,12 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev)
 	spin_lock(&resource_alignment_lock);
 	p = resource_alignment_param;
 	while (*p) {
+		if (pci_has_flag(PCI_PROBE_ONLY)) {
+			printk(KERN_ERR "PCI: Ignore resource_alignment parameter: %s with PCI_PROBE_ONLY set\n",
+			       p);
+			*p = 0;
+			break;

Wouldn't it be simpler to make pci_set_resource_alignment_param() fail if PCI_PROBE_ONLY is set? I added the check here because I want to print some logs so that users could know the reason why resource_alignment doesn't work when they add this parameter. Thanks, Yongji Sorry, please ignore the previous reply. I didn't add this check in pci_set_resource_alignment_param() because PCI_PROBE_ONLY may be set after we parse "resource_alignment". And it seems that printk_once() may be better here so that we don't need to set *p = 0.
Thanks, Yongji

+		}
 		count = 0;
 		if (sscanf(p, "%d%n", &align_order, &count) == 1 &&
 				p[count] == '@') {
--
1.7.9.5
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] leds: Add no-op gpio_led_register_device when LED subsystem is disabled
Hi Andrew, This patch doesn't apply, please rebase onto the recent LED tree. On 06/21/2016 12:13 AM, Andrew F. Davis wrote: Some systems use 'gpio_led_register_device' to make an in-memory copy of their LED device table so the original can be removed as .init.rodata. When the LED subsystem is not enabled, source in the led directory is not built and so this function may be undefined. Fix this here.

Signed-off-by: Andrew F. Davis
---
 include/linux/leds.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/include/linux/leds.h b/include/linux/leds.h
index d2b1306..a4a3da6 100644
--- a/include/linux/leds.h
+++ b/include/linux/leds.h
@@ -386,8 +386,16 @@ struct gpio_led_platform_data {
 			unsigned long *delay_off);

Currently there is some stuff here, and in fact it has been for a long time. Patch "[PATCH 12/12] leds: Only descend into leds directory when CONFIG_NEW_LEDS is set" also doesn't apply. What repository are you using?

 };

+#ifdef CONFIG_NEW_LEDS
 struct platform_device *gpio_led_register_device(
 		int id, const struct gpio_led_platform_data *pdata);
+#else
+static inline struct platform_device *gpio_led_register_device(
+		int id, const struct gpio_led_platform_data *pdata)
+{
+	return 0;
+}
+#endif

 enum cpu_led_event {
 	CPU_LED_IDLE_START,	/* CPU enters idle */

-- Best regards, Jacek Anaszewski
Re: [6/6] ppc: ebpf/jit: Implement JIT compiler for extended BPF
On 2016/06/21 09:38AM, Michael Ellerman wrote: > On Sun, 2016-06-19 at 23:06 +0530, Naveen N. Rao wrote: > > On 2016/06/17 10:53PM, Michael Ellerman wrote: > > > On Tue, 2016-07-06 at 13:32:23 UTC, "Naveen N. Rao" wrote:
> > > > diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
> > > > new file mode 100644
> > > > index 000..954ff53
> > > > --- /dev/null
> > > > +++ b/arch/powerpc/net/bpf_jit_comp64.c
> > > > @@ -0,0 +1,956 @@
> > > ...
> > > > +
> > > > +static void bpf_jit_fill_ill_insns(void *area, unsigned int size)
> > > > +{
> > > > +	int *p = area;
> > > > +
> > > > +	/* Fill whole space with trap instructions */
> > > > +	while (p < (int *)((char *)area + size))
> > > > +		*p++ = BREAKPOINT_INSTRUCTION;
> > > > +}
> > >
> > > This breaks the build for some configs, presumably you're missing a header:
> > >
> > > arch/powerpc/net/bpf_jit_comp64.c:30:10: error: 'BREAKPOINT_INSTRUCTION' undeclared (first use in this function)
> > >
> > > http://kisskb.ellerman.id.au/kisskb/buildresult/12720611/
> >
> > Oops. Yes, I should have caught that. I need to add:
> >
> > #include 
> >
> > in bpf_jit_comp64.c
> >
> > Can you please check if it resolves the build error?
>
> Can you? :D

:) Sorry, I should have explained myself better. I did actually try your config and I was able to reproduce the build error. After the above #include, that error went away, but I saw some vdso-related errors. I thought I was doing something wrong and needed a different setup for that particular kernel config, which is why I requested your help in the matter. I just didn't do a good job of putting across that message... Note to self: randconfig builds *and* more time drafting emails :)

Do you want me to respin the patches?

Thanks, Naveen
Re: [RESEND PATCH v2 4/4] PCI: Add support for enforcing all MMIO BARs to be page aligned
On 2016/6/21 10:26, Bjorn Helgaas wrote: On Thu, Jun 02, 2016 at 01:46:51PM +0800, Yongji Xie wrote: When vfio passes through a PCI device whose MMIO BARs are smaller than PAGE_SIZE, the guest will not handle the mmio accesses to those BARs, which leads to mmio emulations in the host. This is because vfio will not allow passthrough of one BAR's mmio page which may be shared with other BARs. Otherwise, there would be a backdoor that a guest could use to access the BARs of another guest. To solve this issue, this patch modifies resource_alignment to support syntax where multiple devices get the same alignment. So we can use something like "pci=resource_alignment=*:*:*.*:noresize" to enforce the alignment of all MMIO BARs to be at least PAGE_SIZE, so that one BAR's mmio page would not be shared with other BARs. And we also define a macro PCIBIOS_MIN_ALIGNMENT to enable this automatically on the PPC64 platform, which can easily hit this issue because its PAGE_SIZE is 64KB. Note that this would not be applied to VFs, whose BARs are always page aligned and should never be reassigned according to the SR-IOV spec. I see that SR-IOV spec r1.1, sec 3.3.13 requires that all VF BAR resources be aligned on System Page Size, and must be sized to consume an integral number of pages. Where does it say VF BARs can't be reassigned? I thought they *could* be reassigned, as long as VFs are disabled when you do it. Oh, sorry. I made a mistake here. We can reassign VF BARs by writing the alignment to System Page Size (20h) when VFs are disabled. As you said below, VF BARs are read-only zeroes, so the normal way (writing BARs) of resource allocation wouldn't be applied to VFs. The resource allocation of VFs has been determined when we enable the SR-IOV capability. So we should not touch VF BARs here. It's useless and will release the allocated resources of VFs, which leads to a bug.
Signed-off-by: Yongji Xie
---
 Documentation/kernel-parameters.txt |  2 ++
 arch/powerpc/include/asm/pci.h      |  2 ++
 drivers/pci/pci.c                   | 68 ++++++++++++++++++++++++---------
 3 files changed, 61 insertions(+), 11 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index c4802f5..cb09503 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -3003,6 +3003,8 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 				aligned memory resources.
 				If <order of align> is not specified,
 				PAGE_SIZE is used as alignment.
+				<seg>, <bus>, <slot> and <func> can be set to
+				"*" which means match all values.
 				PCI-PCI bridge can be specified, if resource
 				windows need to be expanded.
 			noresize: Don't change the resources' sizes when
diff --git a/arch/powerpc/include/asm/pci.h b/arch/powerpc/include/asm/pci.h
index a6f3ac0..742fd34 100644
--- a/arch/powerpc/include/asm/pci.h
+++ b/arch/powerpc/include/asm/pci.h
@@ -28,6 +28,8 @@
 #define PCIBIOS_MIN_IO		0x1000
 #define PCIBIOS_MIN_MEM		0x10000000
 
+#define PCIBIOS_MIN_ALIGNMENT	PAGE_SIZE
+
 struct pci_dev;
 
 /* Values for the `which' argument to sys_pciconfig_iobase syscall. */
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 3ee13e5..664f295 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4759,7 +4759,12 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev,
 	int seg, bus, slot, func, align_order, count;
 	resource_size_t align = 0;
 	char *p;
+	bool invalid = false;
 
+#ifdef PCIBIOS_MIN_ALIGNMENT
+	align = PCIBIOS_MIN_ALIGNMENT;
+	*resize = false;
+#endif

This PCIBIOS_MIN_ALIGNMENT part should be a separate patch by itself. OK, I will. If you have PCIBIOS_MIN_ALIGNMENT enabled automatically for powerpc, do you still need the command-line argument? Other archs may benefit from this. And using the command line seems more flexible, in that we can enable/disable this feature dynamically.
 	spin_lock(&resource_alignment_lock);
 	p = resource_alignment_param;
 	while (*p) {
@@ -4776,16 +4781,49 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev,
 		} else {
 			align_order = -1;
 		}
-		if (sscanf(p, "%x:%x:%x.%x%n",
-			&seg, &bus, &slot, &func, &count) != 4) {
+		if (p[0] == '*' && p[1] == ':') {
+			seg = -1;
+			count = 1;
+		} else if (sscanf(p, "%x%n", &seg, &count) != 1 ||
+				p[count] != ':') {
+			invalid = true;
+			break;
+		}
+		p += count + 1;
+		if (*p == '*') {
+
Re: [v6, 08/11] powerpc/powernv: Add platform support for stop instruction
> > > +#define OPAL_PM_TIMEBASE_STOP		0x0002
> > > +#define OPAL_PM_LOSE_HYP_CONTEXT	0x2000
> > > +#define OPAL_PM_LOSE_FULL_CONTEXT	0x4000
> > > #define OPAL_PM_NAP_ENABLED		0x0001
> > > #define OPAL_PM_SLEEP_ENABLED		0x0002
> > > #define OPAL_PM_WINKLE_ENABLED		0x0004
> > > #define OPAL_PM_SLEEP_ENABLED_ER1	0x0008 /* with workaround */
> > > +#define OPAL_PM_STOP_INST_FAST		0x0010
> > > +#define OPAL_PM_STOP_INST_DEEP		0x0020
> > I don't see the above in skiboot yet?
> I've posted it here - http://patchwork.ozlabs.org/patch/617828/

FWIW, this is in now. https://github.com/open-power/skiboot/commit/952daa69baca407383bc900911f6c40718a0e289

> > > diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
> > > index 546540b..ae91b44 100644
> > > --- a/arch/powerpc/include/asm/paca.h
> > > +++ b/arch/powerpc/include/asm/paca.h
> > > @@ -171,6 +171,8 @@ struct paca_struct {
> > > 	/* Mask to denote subcore sibling threads */
> > > 	u8 subcore_sibling_mask;
> > > #endif
> > > +	/* Template for PSSCR with EC, ESL, TR, PSLL, MTL fields set */
> > > +	u64 thread_psscr;
> > I'm not entirely clear on why that needs to be in the paca. Could it not be global?
>
> While we use the Requested Level (RL) field of PSSCR to request a stop level, other fields in the SPR like EC, ESL, TR, PSLL, MTL can be modified by individual threads less frequently to alter the behaviour of stop. So the idea was to have a per-thread variable with all (except RL) fields of PSSCR set appropriately. Threads, at the time of entering idle, can modify the RL field in the variable and execute the stop instruction.

But we don't do any of this currently? This is set up at init in pnv_init_idle_states() and only the RL is changed in power_stop(). So it can still be a global. It could even just be a constant currently.
> .text
>
> /*
> @@ -61,8 +75,19 @@ save_sprs_to_stack:
> 	* Note all register i.e per-core, per-subcore or per-thread is saved
> 	* here since any thread in the core might wake up first
> 	*/
> +BEGIN_FTR_SECTION
> +	mfspr	r3,SPRN_PTCR
> +	std	r3,_PTCR(r1)
> +	mfspr	r3,SPRN_LMRR
> +	std	r3,_LMRR(r1)
> +	mfspr	r3,SPRN_LMSER
> +	std	r3,_LMSER(r1)
> +	mfspr	r3,SPRN_ASDR
> +	std	r3,_ASDR(r1)
> +FTR_SECTION_ELSE
> > A comment here saying that SDR1 is removed in ISA 3.0 would be helpful.
>
> Ok.

I thought we decided we didn't need LMRR and LMSER (https://lkml.org/lkml/2016/6/8/1121), and ASDR isn't actually used at all yet and is only valid for some page faults, so we don't need it here either.

> +END_MMU_FTR_SECTION_IFCLR(MMU_FTR_RADIX)
> > > +
> > > +	/* Restore per thread state */
> > > +BEGIN_FTR_SECTION
> > > +	bl	__restore_cpu_power9
> > > +
> > > +	ld	r4,_LMRR(r1)
> > > +	mtspr	SPRN_LMRR,r4
> > > +	ld	r4,_LMSER(r1)
> > > +	mtspr	SPRN_LMSER,r4
> > > +	ld	r4,_ASDR(r1)
> > > +	mtspr	SPRN_ASDR,r4
> > Should those be in __restore_cpu_power9?
> I was not sure how these registers will be used, but after speaking to Aneesh and Mikey I realized these registers will not need restoring. LMRR and LMSER are associated with the context and ASDR will be consumed before entering stop. So I'll be dropping this hunk in the next revision.

Yep.

> > > > pnv_alloc_idle_core_states();
> > > >
> > > > +	if (supported_cpuidle_states & OPAL_PM_STOP_INST_FAST)
> > > > +		for_each_possible_cpu(i) {
> > > > +
> > > > +			u64 psscr_init_val = PSSCR_ESL | PSSCR_EC |
> > > > +				PSSCR_PSLL_MASK | PSSCR_TR_MASK |
> > > > +				PSSCR_MTL_MASK;
> > > > +
> > > > +			paca[i].thread_psscr = psscr_init_val;

This seems to be the only place you set this. Why put it in the paca, why not just make it a constant?

Mikey
Re: powerpc/powernv: Exclude MSI region in extended bridge window
On Tue, Jun 21, 2016 at 02:30:48PM +1000, Michael Ellerman wrote: >On Tue, 2016-21-06 at 02:41:05 UTC, Gavin Shan wrote: >> The windows of root port and bridge behind that are extended to >> the PHB's windows to accomodate the PCI hotplug happening in >> future. The PHB's 64KB 32-bits MSI region is included in bridge's >> M32 windows (in hardware) though it's excluded in the corresponding >> resource, as the bridge's M32 windows have 1MB as their minimal >> alignment. We observed EEH error during system boot when the MSI >> region is included in bridge's M32 window. >> >> This excludes top 1MB (including 64KB 32-bits MSI region) region >> from bridge's M32 windows when extending them. > >AFAICS you added that code in "powerpc/powernv: Extend PCI bridge resources", >so >I'll squash it into that. That way there is no window of breakage. > Yeah, I guess it's the best way to go. Thanks a lot, Michael. Thanks, Gavin ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev