Re: [PATCH v4 1/3] powerpc/perf: Use stack siar instead of mfspr
On 18/08/21 10:45 pm, Kajol Jain wrote:

Minor optimization in the 'perf_instruction_pointer' function code by
making use of stack siar instead of mfspr.

Fixes: 75382aa72f06 ("powerpc/perf: Move code to select SIAR or pt_regs into perf_read_regs")
Signed-off-by: Kajol Jain

Tested this patch series, not seeing any '0' values.

Tested-by: Nageswara R Sastry

Example output:

# perf report -D | grep addr
0 26236879714 0x3dcc8 [0x38]: PERF_RECORD_SAMPLE(IP, 0x1): 1446/1446: 0xc0113584 period: 1 addr: 0
0 26236882500 0x3dd00 [0x38]: PERF_RECORD_SAMPLE(IP, 0x1): 1446/1446: 0xc0113584 period: 1 addr: 0
0 26236883436 0x3dd38 [0x38]: PERF_RECORD_SAMPLE(IP, 0x1): 1446/1446: 0xc0113584 period: 10 addr: 0
...

---
 arch/powerpc/perf/core-book3s.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index bb0ee716de91..1b464aad29c4 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -2260,7 +2260,7 @@ unsigned long perf_instruction_pointer(struct pt_regs *regs)
 		else
 			return regs->nip;
 	} else if (use_siar && siar_valid(regs))
-		return mfspr(SPRN_SIAR) + perf_ip_adjust(regs);
+		return siar + perf_ip_adjust(regs);
 	else if (use_siar)
 		return 0; // no valid instruction pointer
 	else

--
Thanks and Regards
R.Nageswara Sastry
Re: [PATCH v7 1/2] tty: hvc: pass DMA capable memory to put_chars()
On 2021/8/18 11:17 AM, Jiri Slaby wrote:

Hi,

On 17. 08. 21, 15:22, Xianting Tian wrote:

As well known, hvc backend can register its opertions to hvc framework. the opertions contain put_chars(), get_chars() and so on.

"operations". And there too:

Some hvc backend may do dma in its opertions. eg, put_chars() of virtio-console. But in the code of hvc framework, it may pass DMA incapable memory to put_chars() under a specific configuration, which is explained in commit c4baad5029 ("virtio-console: avoid DMA from stack"):

1. c[] is on stack, hvc_console_print():
	char c[N_OUTBUF] __ALIGNED__;
	cons_ops[index]->put_chars(vtermnos[index], c, i);

2. ch is on stack:
	static void hvc_poll_put_char(struct tty_driver *driver, int line, char ch)
	{
		struct tty_struct *tty = driver->ttys[0];
		struct hvc_struct *hp = tty->driver_data;
		int n;

		do {
			n = hp->ops->put_chars(hp->vtermno, &ch, 1);
		} while (n <= 0);
	}

Commit c4baad5029 is just the fix to avoid DMA from stack memory, which is passed to virtio-console by the hvc framework in the above code. But I think the fix is aggressive: it directly uses kmemdup() to allocate a new buffer from the kmalloc area and does a memcpy, no matter whether the memory is in the kmalloc area or not. But most importantly, this would be better fixed in the hvc framework, by changing it to never pass stack memory to the put_chars() function in the first place. Otherwise, we still face the same issue if a new hvc backend using DMA is added in the future.

We make 'char c[N_OUTBUF]' part of 'struct hvc_struct', so hp->c is no longer stack memory. We can use it in the above two cases.

In fact, you need only a single char for the poll case (hvc_poll_put_char), so hvc_struct needs to contain only c, not an array. OTOH, you need c[N_OUTBUF] in the console case (hvc_console_print), but not the whole hvc_struct. So cons_hvcs should be an array of structs composed of only the lock and the buffer.

Hum. Or maybe rethink and take care of the console case by kmemdup and be done with that case? What problem do you have with allocating 16 bytes?
It should be quite easy and really fast (lockless) in most cases. On the contrary, your solution has to take a spinlock to access the global buffer.

As I replied before, this issue can perhaps be solved just by adjusting the alignment to L1_CACHE_BYTES, or at least 16:

	#define __ALIGNED__ __attribute__((__aligned__(L1_CACHE_BYTES)))

Then c[16] won't cross a page boundary, that is to say c[16]'s physical address range is contiguous. Could you comment on this?

I submitted v8, but I found it still can't solve this issue: even if we make 'char out_buf[N_OUTBUF]' and 'char out_ch' part of 'struct hvc_struct' and use them separately, we still need a lock to protect each buffer. When we involve a lock, it impacts hvc performance.

So can we go back to the original intention of this solution, and just fix the kmemdup issue in the virtio_console driver? Another fix is to use L1_CACHE_BYTES as the alignment, since using 'sizeof(long)' as the DMA alignment is wrong. And use struct_size() to calculate the size of hvc_struct.

This ought to be in separate patches.

thanks,
[PATCH] powerpc/tau: Add 'static' storage qualifier to 'tau_work' definition
This patch prevents the following sparse warning.

arch/powerpc/kernel/tau_6xx.c:199:1: sparse: sparse: symbol 'tau_work' was not declared. Should it be static?

Reported-by: kernel test robot
Signed-off-by: Finn Thain
---
 arch/powerpc/kernel/tau_6xx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/tau_6xx.c b/arch/powerpc/kernel/tau_6xx.c
index b9a047d92ec0..8e83d19fe8fa 100644
--- a/arch/powerpc/kernel/tau_6xx.c
+++ b/arch/powerpc/kernel/tau_6xx.c
@@ -164,7 +164,7 @@ static void tau_work_func(struct work_struct *work)
 	queue_work(tau_workq, work);
 }

-DECLARE_WORK(tau_work, tau_work_func);
+static DECLARE_WORK(tau_work, tau_work_func);

 /*
  * setup the TAU
--
2.26.3
Re: [PATCH v2 61/63] powerpc: Split memset() to avoid multi-field overflow
On Wed, Aug 18, 2021 at 08:42:18AM +0200, Christophe Leroy wrote:
>
> Le 18/08/2021 à 08:05, Kees Cook a écrit :
> > In preparation for FORTIFY_SOURCE performing compile-time and run-time
> > field bounds checking for memset(), avoid intentionally writing across
> > neighboring fields.
> >
> > Instead of writing across a field boundary with memset(), move the call
> > to just the array, and add an explicit zeroing of the prior field.
> >
> > Cc: Benjamin Herrenschmidt
> > Cc: Qinglang Miao
> > Cc: "Gustavo A. R. Silva"
> > Cc: Hulk Robot
> > Cc: Wang Wensheng
> > Cc: linuxppc-dev@lists.ozlabs.org
> > Signed-off-by: Kees Cook
> > Reviewed-by: Michael Ellerman
> > Link: https://lore.kernel.org/lkml/87czqsnmw9@mpe.ellerman.id.au
> > ---
> >  drivers/macintosh/smu.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/macintosh/smu.c b/drivers/macintosh/smu.c
> > index 94fb63a7b357..59ce431da7ef 100644
> > --- a/drivers/macintosh/smu.c
> > +++ b/drivers/macintosh/smu.c
> > @@ -848,7 +848,8 @@ int smu_queue_i2c(struct smu_i2c_cmd *cmd)
> >  	cmd->read = cmd->info.devaddr & 0x01;
> >  	switch(cmd->info.type) {
> >  	case SMU_I2C_TRANSFER_SIMPLE:
> > -		memset(&cmd->info.sublen, 0, 4);
> > +		cmd->info.sublen = 0;
> > +		memset(&cmd->info.subaddr, 0, 3);
>
> subaddr[] is a table, should the & be avoided ?

It results in the same thing, but it's better form to not have the &; I will fix this.

> And while at it, why not use sizeof(subaddr) instead of 3 ?

Agreed. :)

Thanks!

--
Kees Cook
[PATCH] soc: fsl: guts: Fix a resource leak in the error handling path of 'fsl_guts_probe()'
If an error occurs after 'of_find_node_by_path()', the reference taken for 'root' will never be released and some memory will leak.

Instead of adding an error handling path and modifying all the 'return -SOMETHING' into 'goto errorpath', use 'devm_add_action_or_reset()' to release the reference when needed.

Simplify the remove function accordingly.

As an extra benefit, the 'root' global variable can now be removed as well.

Fixes: 3c0d64e867ed ("soc: fsl: guts: reuse machine name from device tree")
Signed-off-by: Christophe JAILLET
---
Compile tested only
---
 drivers/soc/fsl/guts.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/soc/fsl/guts.c b/drivers/soc/fsl/guts.c
index d5e9a5f2c087..4d9476c7b87c 100644
--- a/drivers/soc/fsl/guts.c
+++ b/drivers/soc/fsl/guts.c
@@ -28,7 +28,6 @@ struct fsl_soc_die_attr {
 static struct guts *guts;
 static struct soc_device_attribute soc_dev_attr;
 static struct soc_device *soc_dev;
-static struct device_node *root;

 /* SoC die attribute definition for QorIQ platform */
@@ -136,14 +135,23 @@ static u32 fsl_guts_get_svr(void)
 	return svr;
 }

+static void fsl_guts_put_root(void *data)
+{
+	struct device_node *root = data;
+
+	of_node_put(root);
+}
+
 static int fsl_guts_probe(struct platform_device *pdev)
 {
 	struct device_node *np = pdev->dev.of_node;
 	struct device *dev = &pdev->dev;
+	struct device_node *root;
 	struct resource *res;
 	const struct fsl_soc_die_attr *soc_die;
 	const char *machine;
 	u32 svr;
+	int ret;

 	/* Initialize guts */
 	guts = devm_kzalloc(dev, sizeof(*guts), GFP_KERNEL);
@@ -159,6 +167,10 @@ static int fsl_guts_probe(struct platform_device *pdev)

 	/* Register soc device */
 	root = of_find_node_by_path("/");
+	ret = devm_add_action_or_reset(dev, fsl_guts_put_root, root);
+	if (ret)
+		return ret;
+
 	if (of_property_read_string(root, "model", &machine))
 		of_property_read_string_index(root, "compatible", 0, &machine);
 	if (machine)
@@ -197,7 +209,7 @@ static int fsl_guts_probe(struct platform_device *pdev)

 static int
fsl_guts_remove(struct platform_device *dev)
 {
 	soc_device_unregister(soc_dev);
-	of_node_put(root);
+
 	return 0;
 }
--
2.30.2
Re: [PATCH v8 2/3] tty: hvc: pass DMA capable memory to put_chars()
Hi Xianting,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on tty/tty-testing]
[also build test WARNING on char-misc/char-misc-testing soc/for-next v5.14-rc6 next-20210818]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/0day-ci/linux/commits/Xianting-Tian/make-hvc-pass-dma-capable-memory-to-its-backend/20210818-162408
base: https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git tty-testing
config: arm64-randconfig-r025-20210818 (attached as .config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project d2b574a4dea5b718e4386bf2e26af0126e5978ce)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install arm64 cross compiling tool for clang build
        # apt-get install binutils-aarch64-linux-gnu
        # https://github.com/0day-ci/linux/commit/e1b7662dafceb07a6905b64da2f1be27498c4a46
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Xianting-Tian/make-hvc-pass-dma-capable-memory-to-its-backend/20210818-162408
        git checkout e1b7662dafceb07a6905b64da2f1be27498c4a46
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=arm64

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot

All warnings (new ones prefixed by >>):

>> drivers/tty/hvc/hvc_console.c:880:39: warning: incompatible integer to pointer conversion passing 'char' to parameter of type 'const char *'; take the address with & [-Wint-conversion]
           n = hp->ops->put_chars(hp->vtermno, hp->out_ch, 1);
                                               ^~~~~~~~~~
                                               &
   1 warning generated.
vim +880 drivers/tty/hvc/hvc_console.c

   870
   871	static void hvc_poll_put_char(struct tty_driver *driver, int line, char ch)
   872	{
   873		struct tty_struct *tty = driver->ttys[0];
   874		struct hvc_struct *hp = tty->driver_data;
   875		int n;
   876
   877		hp->out_ch = ch;
   878
   879		do {
 > 880			n = hp->ops->put_chars(hp->vtermno, hp->out_ch, 1);
   881		} while (n <= 0);
   882	}
   883	#endif
   884

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org
[PATCH v4 3/3] powerpc/perf: Fix the check for SIAR value
In case of random sampling, there can be scenarios where the Sample Instruction Address Register (SIAR) may not latch to the sampled instruction and could result in a value of 0. In these scenarios it is preferred to return regs->nip. These corner cases are seen in the previous generation (p9) also.

The patch adds a check for the SIAR value along with the regs_use_siar and siar_valid checks, so that the function will return regs->nip in case SIAR is zero.

The patch drops the code under the PPMU_P10_DD1 flag check, which handled the SIAR 0 case only for Power10 DD1.

Fixes: 2ca13a4cc56c9 ("powerpc/perf: Use regs->nip when SIAR is zero")
Signed-off-by: Kajol Jain
---
Changelog:
v3 -> v4
- Remove the use_siar variable and directly use the regs_use_siar call, as suggested by Christophe Leroy

v2 -> v3
- Drop adding a new ternary condition to check the siar value.
- Remove the siar check specific to PPMU_P10_DD1 and add it along with the common checks, as suggested by Christophe Leroy and Michael Ellerman

 arch/powerpc/perf/core-book3s.c | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 23ec89a59893..b0a589409039 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -2251,15 +2251,9 @@ unsigned long perf_misc_flags(struct pt_regs *regs)
  */
 unsigned long perf_instruction_pointer(struct pt_regs *regs)
 {
-	bool use_siar = regs_use_siar(regs);
 	unsigned long siar = mfspr(SPRN_SIAR);

-	if (ppmu && (ppmu->flags & PPMU_P10_DD1)) {
-		if (siar)
-			return siar;
-		else
-			return regs->nip;
-	} else if (use_siar && siar_valid(regs))
+	if (regs_use_siar(regs) && siar_valid(regs) && siar)
 		return siar + perf_ip_adjust(regs);
 	else
 		return regs->nip;
--
2.26.2
[PATCH v4 2/3] powerpc/perf: Drop the case of returning 0 as instruction pointer
Drop the case of returning 0 as instruction pointer, since the kernel never executes at 0 and userspace almost never does either.

Fixes: e6878835ac47 ("powerpc/perf: Sample only if SIAR-Valid bit is set in P7+")
Signed-off-by: Kajol Jain
---
 arch/powerpc/perf/core-book3s.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 1b464aad29c4..23ec89a59893 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -2261,8 +2261,6 @@ unsigned long perf_instruction_pointer(struct pt_regs *regs)
 			return regs->nip;
 	} else if (use_siar && siar_valid(regs))
 		return siar + perf_ip_adjust(regs);
-	else if (use_siar)
-		return 0; // no valid instruction pointer
 	else
 		return regs->nip;
 }
--
2.26.2
[PATCH v4 1/3] powerpc/perf: Use stack siar instead of mfspr
Minor optimization in the 'perf_instruction_pointer' function code by
making use of stack siar instead of mfspr.

Fixes: 75382aa72f06 ("powerpc/perf: Move code to select SIAR or pt_regs into perf_read_regs")
Signed-off-by: Kajol Jain
---
 arch/powerpc/perf/core-book3s.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index bb0ee716de91..1b464aad29c4 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -2260,7 +2260,7 @@ unsigned long perf_instruction_pointer(struct pt_regs *regs)
 		else
 			return regs->nip;
 	} else if (use_siar && siar_valid(regs))
-		return mfspr(SPRN_SIAR) + perf_ip_adjust(regs);
+		return siar + perf_ip_adjust(regs);
 	else if (use_siar)
 		return 0; // no valid instruction pointer
 	else
--
2.26.2
Re: [PATCH v3 3/3] powerpc/perf: Fix the check for SIAR value
On 8/18/21 6:58 PM, Christophe Leroy wrote: > > > Le 18/08/2021 à 15:19, Kajol Jain a écrit : >> Incase of random sampling, there can be scenarios where >> Sample Instruction Address Register(SIAR) may not latch >> to the sampled instruction and could result in >> the value of 0. In these scenarios it is preferred to >> return regs->nip. These corner cases are seen in the >> previous generation (p9) also. >> >> Patch adds the check for SIAR value along with use_siar and >> siar_valid checks so that the function will return regs->nip >> incase SIAR is zero. >> >> Patch drops the code under PPMU_P10_DD1 flag check >> which handles SIAR 0 case only for Power10 DD1. >> >> Fixes: 2ca13a4cc56c9 ("powerpc/perf: Use regs->nip when SIAR is zero") >> Signed-off-by: Kajol Jain >> --- >> >> Changelog: >> - Drop adding new ternary condition to check siar value. >> - Remove siar check specific for PPMU_P10_DD1 and add >> it along with common checks as suggested by Christophe Leroy >> and Michael Ellermen >> >> arch/powerpc/perf/core-book3s.c | 7 +-- >> 1 file changed, 1 insertion(+), 6 deletions(-) >> >> diff --git a/arch/powerpc/perf/core-book3s.c >> b/arch/powerpc/perf/core-book3s.c >> index 23ec89a59893..55efbba7572b 100644 >> --- a/arch/powerpc/perf/core-book3s.c >> +++ b/arch/powerpc/perf/core-book3s.c >> @@ -2254,12 +2254,7 @@ unsigned long perf_instruction_pointer(struct pt_regs >> *regs) >> bool use_siar = regs_use_siar(regs); >> unsigned long siar = mfspr(SPRN_SIAR); >> - if (ppmu && (ppmu->flags & PPMU_P10_DD1)) { >> - if (siar) >> - return siar; >> - else >> - return regs->nip; >> - } else if (use_siar && siar_valid(regs)) >> + if (use_siar && siar_valid(regs) && siar) > > You can probably now do > > + if (regs_use_siar(regs) && siar_valid(regs) && siar) > > and remove use_siar Hi Christophe, I will update it. Thanks for pointing it. Thanks, Kajol Jain > >> return siar + perf_ip_adjust(regs); >> else >> return regs->nip; >>
Re: [PATCH v2] powerpc/mm: Fix set_memory_*() against concurrent accesses
On 18/08/2021 14:05, Michael Ellerman wrote: > Laurent reported that STRICT_MODULE_RWX was causing intermittent crashes > on one of his systems: > > kernel tried to execute exec-protected page (c00804073278) - exploit > attempt? (uid: 0) > BUG: Unable to handle kernel instruction fetch > Faulting instruction address: 0xc00804073278 > Oops: Kernel access of bad area, sig: 11 [#1] > LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries > Modules linked in: drm virtio_console fuse drm_panel_orientation_quirks ... > CPU: 3 PID: 44 Comm: kworker/3:1 Not tainted 5.14.0-rc4+ #12 > Workqueue: events control_work_handler [virtio_console] > NIP: c00804073278 LR: c00804073278 CTR: c01e9de0 > REGS: c0002e4ef7e0 TRAP: 0400 Not tainted (5.14.0-rc4+) > MSR: 80004280b033 CR: 24002822 XER: > 200400cf > ... > NIP fill_queue+0xf0/0x210 [virtio_console] > LR fill_queue+0xf0/0x210 [virtio_console] > Call Trace: > fill_queue+0xb4/0x210 [virtio_console] (unreliable) > add_port+0x1a8/0x470 [virtio_console] > control_work_handler+0xbc/0x1e8 [virtio_console] > process_one_work+0x290/0x590 > worker_thread+0x88/0x620 > kthread+0x194/0x1a0 > ret_from_kernel_thread+0x5c/0x64 > > Jordan, Fabiano & Murilo were able to reproduce and identify that the > problem is caused by the call to module_enable_ro() in do_init_module(), > which happens after the module's init function has already been called. > > Our current implementation of change_page_attr() is not safe against > concurrent accesses, because it invalidates the PTE before flushing the > TLB and then installing the new PTE. That leaves a window in time where > there is no valid PTE for the page, if another CPU tries to access the > page at that time we see something like the fault above. > > We can't simply switch to set_pte_at()/flush TLB, because our hash MMU > code doesn't handle a set_pte_at() of a valid PTE. See [1]. 
> > But we do have pte_update(), which replaces the old PTE with the new, > meaning there's no window where the PTE is invalid. And the hash MMU > version hash__pte_update() deals with synchronising the hash page table > correctly. > > [1]: https://lore.kernel.org/linuxppc-dev/87y318wp9r@linux.ibm.com/ > > Fixes: 1f9ad21c3b38 ("powerpc/mm: Implement set_memory() routines") > Reported-by: Laurent Vivier > Signed-off-by: Fabiano Rosas > Signed-off-by: Michael Ellerman > --- > arch/powerpc/mm/pageattr.c | 23 ++- > 1 file changed, 10 insertions(+), 13 deletions(-) > > v2: Use pte_update(..., ~0, pte_val(pte), ...) as suggested by Fabiano, > and ptep_get() as suggested by Christophe. > > diff --git a/arch/powerpc/mm/pageattr.c b/arch/powerpc/mm/pageattr.c > index 0876216ceee6..edea388e9d3f 100644 > --- a/arch/powerpc/mm/pageattr.c > +++ b/arch/powerpc/mm/pageattr.c > @@ -18,16 +18,12 @@ > /* > * Updates the attributes of a page in three steps: > * > - * 1. invalidate the page table entry > - * 2. flush the TLB > - * 3. install the new entry with the updated attributes > - * > - * Invalidating the pte means there are situations where this will not work > - * when in theory it should. > - * For example: > - * - removing write from page whilst it is being executed > - * - setting a page read-only whilst it is being read by another CPU > + * 1. take the page_table_lock > + * 2. install the new entry with the updated attributes > + * 3. flush the TLB > * > + * This sequence is safe against concurrent updates, and also allows > updating the > + * attributes of a page currently being executed or accessed. 
>  */
> static int change_page_attr(pte_t *ptep, unsigned long addr, void *data)
> {
> @@ -36,9 +32,7 @@ static int change_page_attr(pte_t *ptep, unsigned long addr, void *data)
>
>  	spin_lock(&init_mm.page_table_lock);
>
> -	/* invalidate the PTE so it's safe to modify */
> -	pte = ptep_get_and_clear(&init_mm, addr, ptep);
> -	flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
> +	pte = ptep_get(ptep);
>
>  	/* modify the PTE bits as desired, then apply */
>  	switch (action) {
> @@ -59,11 +53,14 @@ static int change_page_attr(pte_t *ptep, unsigned long addr, void *data)
>  		break;
>  	}
>
> -	set_pte_at(&init_mm, addr, ptep, pte);
> +	pte_update(&init_mm, addr, ptep, ~0UL, pte_val(pte), 0);
>
>  	/* See ptesync comment in radix__set_pte_at() */
>  	if (radix_enabled())
>  		asm volatile("ptesync": : :"memory");
> +
> +	flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
> +
>  	spin_unlock(&init_mm.page_table_lock);
>
>  	return 0;
>
> base-commit: cbc06f051c524dcfe52ef0d1f30647828e226d30
>

Tested-by: Laurent Vivier
[PATCH 1/1] selftests/powerpc: Add memmove_64 test
While debugging an issue, we wanted to check whether the arch specific kernel memmove implementation is correct. This selftest could help test that. Suggested-by: Aneesh Kumar K.V Suggested-by: Vaibhav Jain Signed-off-by: Ritesh Harjani --- tools/testing/selftests/powerpc/Makefile | 1 + .../selftests/powerpc/memmoveloop/.gitignore | 2 + .../selftests/powerpc/memmoveloop/Makefile| 19 +++ .../powerpc/memmoveloop/asm/asm-compat.h | 0 .../powerpc/memmoveloop/asm/export.h | 4 ++ .../powerpc/memmoveloop/asm/feature-fixups.h | 0 .../selftests/powerpc/memmoveloop/asm/kasan.h | 0 .../powerpc/memmoveloop/asm/ppc_asm.h | 39 + .../powerpc/memmoveloop/asm/processor.h | 0 .../selftests/powerpc/memmoveloop/mem_64.S| 1 + .../selftests/powerpc/memmoveloop/memcpy_64.S | 1 + .../selftests/powerpc/memmoveloop/stubs.S | 8 +++ .../selftests/powerpc/memmoveloop/validate.c | 56 +++ 13 files changed, 131 insertions(+) create mode 100644 tools/testing/selftests/powerpc/memmoveloop/.gitignore create mode 100644 tools/testing/selftests/powerpc/memmoveloop/Makefile create mode 100644 tools/testing/selftests/powerpc/memmoveloop/asm/asm-compat.h create mode 100644 tools/testing/selftests/powerpc/memmoveloop/asm/export.h create mode 100644 tools/testing/selftests/powerpc/memmoveloop/asm/feature-fixups.h create mode 100644 tools/testing/selftests/powerpc/memmoveloop/asm/kasan.h create mode 100644 tools/testing/selftests/powerpc/memmoveloop/asm/ppc_asm.h create mode 100644 tools/testing/selftests/powerpc/memmoveloop/asm/processor.h create mode 12 tools/testing/selftests/powerpc/memmoveloop/mem_64.S create mode 12 tools/testing/selftests/powerpc/memmoveloop/memcpy_64.S create mode 100644 tools/testing/selftests/powerpc/memmoveloop/stubs.S create mode 100644 tools/testing/selftests/powerpc/memmoveloop/validate.c diff --git a/tools/testing/selftests/powerpc/Makefile b/tools/testing/selftests/powerpc/Makefile index 0830e63818c1..d110b3e5cbcd 100644 --- a/tools/testing/selftests/powerpc/Makefile +++ 
b/tools/testing/selftests/powerpc/Makefile @@ -16,6 +16,7 @@ export CFLAGS SUB_DIRS = alignment \ benchmarks \ cache_shape \ + memmoveloop \ copyloops\ dscr \ mm \ diff --git a/tools/testing/selftests/powerpc/memmoveloop/.gitignore b/tools/testing/selftests/powerpc/memmoveloop/.gitignore new file mode 100644 index ..56c1426013d5 --- /dev/null +++ b/tools/testing/selftests/powerpc/memmoveloop/.gitignore @@ -0,0 +1,2 @@ +# SPDX-License-Identifier: GPL-2.0-only +memmove_64 diff --git a/tools/testing/selftests/powerpc/memmoveloop/Makefile b/tools/testing/selftests/powerpc/memmoveloop/Makefile new file mode 100644 index ..d58d8c100099 --- /dev/null +++ b/tools/testing/selftests/powerpc/memmoveloop/Makefile @@ -0,0 +1,19 @@ +# SPDX-License-Identifier: GPL-2.0 +CFLAGS += -m64 +CFLAGS += -I$(CURDIR) +CFLAGS += -D SELFTEST +CFLAGS += -maltivec + +ASFLAGS = $(CFLAGS) -Wa,-mpower4 + +TEST_GEN_PROGS := memmove_64 +EXTRA_SOURCES := validate.c ../harness.c stubs.S +CPPFLAGS += -D TEST_MEMMOVE=test_memmove + +top_srcdir = ../../../../.. 
+include ../../lib.mk + +$(OUTPUT)/memmove_64: mem_64.S memcpy_64.S $(EXTRA_SOURCES) + $(CC) $(CPPFLAGS) $(CFLAGS) \ + -D TEST_MEMMOVE=test_memmove \ + -o $@ $^ diff --git a/tools/testing/selftests/powerpc/memmoveloop/asm/asm-compat.h b/tools/testing/selftests/powerpc/memmoveloop/asm/asm-compat.h new file mode 100644 index ..e69de29bb2d1 diff --git a/tools/testing/selftests/powerpc/memmoveloop/asm/export.h b/tools/testing/selftests/powerpc/memmoveloop/asm/export.h new file mode 100644 index ..e6b80d5fbd14 --- /dev/null +++ b/tools/testing/selftests/powerpc/memmoveloop/asm/export.h @@ -0,0 +1,4 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#define EXPORT_SYMBOL(x) +#define EXPORT_SYMBOL_GPL(x) +#define EXPORT_SYMBOL_KASAN(x) diff --git a/tools/testing/selftests/powerpc/memmoveloop/asm/feature-fixups.h b/tools/testing/selftests/powerpc/memmoveloop/asm/feature-fixups.h new file mode 100644 index ..e69de29bb2d1 diff --git a/tools/testing/selftests/powerpc/memmoveloop/asm/kasan.h b/tools/testing/selftests/powerpc/memmoveloop/asm/kasan.h new file mode 100644 index ..e69de29bb2d1 diff --git a/tools/testing/selftests/powerpc/memmoveloop/asm/ppc_asm.h b/tools/testing/selftests/powerpc/memmoveloop/asm/ppc_asm.h new file mode 100644 index ..117005c56e19 --- /dev/null +++ b/tools/testing/selftests/powerpc/memmoveloop/asm/ppc_asm.h @@ -0,0 +1,39 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __SELFTESTS_POWERPC_PPC_ASM_H +#define __SELFTESTS_POWERPC_PPC_ASM_H +#include + +#define CONFIG_ALTIVEC + +#define r1 1 + +#define R14 r14 +#define
Re: [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32
On Fri, Aug 13, 2021 at 04:08:13PM +1000, Nicholas Piggin wrote: > This one possibly the branches end up in predictors, whereas conditional > trap is always just speculated not to hit. Branches may also have a > throughput limit on execution whereas trap could be more (1 per cycle > vs 4 per cycle on POWER9). I thought only *taken* branches are just one per cycle? And those branches are only taken for the exceptional condition (or the case where we do not care about performance, anyway, if we do have an error most of the time ;-) ) > On typical ppc32 CPUs, maybe it's a more obvious win. As you say there > is the CFAR issue as well which makes it a problem for 64s. It would > have been nice if it could use the same code though. On 64-bit the code looks better for the no-error path as well. > Maybe one day gcc's __builtin_trap() will become smart enough around > conditional statements that it it generates better code and tries to > avoid branches. Internally *all* traps are conditional, in GCC. It also can optimise them quite well. There must be something in the kernel macros that prevents good optimisation. Segher
Re: [PATCH v2] powerpc/mm: Fix set_memory_*() against concurrent accesses
On 8/18/21 9:05 AM, Michael Ellerman wrote: Laurent reported that STRICT_MODULE_RWX was causing intermittent crashes on one of his systems: kernel tried to execute exec-protected page (c00804073278) - exploit attempt? (uid: 0) BUG: Unable to handle kernel instruction fetch Faulting instruction address: 0xc00804073278 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries Modules linked in: drm virtio_console fuse drm_panel_orientation_quirks ... CPU: 3 PID: 44 Comm: kworker/3:1 Not tainted 5.14.0-rc4+ #12 Workqueue: events control_work_handler [virtio_console] NIP: c00804073278 LR: c00804073278 CTR: c01e9de0 REGS: c0002e4ef7e0 TRAP: 0400 Not tainted (5.14.0-rc4+) MSR: 80004280b033 CR: 24002822 XER: 200400cf ... NIP fill_queue+0xf0/0x210 [virtio_console] LR fill_queue+0xf0/0x210 [virtio_console] Call Trace: fill_queue+0xb4/0x210 [virtio_console] (unreliable) add_port+0x1a8/0x470 [virtio_console] control_work_handler+0xbc/0x1e8 [virtio_console] process_one_work+0x290/0x590 worker_thread+0x88/0x620 kthread+0x194/0x1a0 ret_from_kernel_thread+0x5c/0x64 Jordan, Fabiano & Murilo were able to reproduce and identify that the problem is caused by the call to module_enable_ro() in do_init_module(), which happens after the module's init function has already been called. Our current implementation of change_page_attr() is not safe against concurrent accesses, because it invalidates the PTE before flushing the TLB and then installing the new PTE. That leaves a window in time where there is no valid PTE for the page, if another CPU tries to access the page at that time we see something like the fault above. We can't simply switch to set_pte_at()/flush TLB, because our hash MMU code doesn't handle a set_pte_at() of a valid PTE. See [1]. But we do have pte_update(), which replaces the old PTE with the new, meaning there's no window where the PTE is invalid. 
And the hash MMU version hash__pte_update() deals with synchronising the hash page table correctly. [1]: https://lore.kernel.org/linuxppc-dev/87y318wp9r@linux.ibm.com/ Fixes: 1f9ad21c3b38 ("powerpc/mm: Implement set_memory() routines") Reported-by: Laurent Vivier Signed-off-by: Fabiano Rosas Signed-off-by: Michael Ellerman Reviewed-by: Murilo Opsfelder Araújo --- arch/powerpc/mm/pageattr.c | 23 ++- 1 file changed, 10 insertions(+), 13 deletions(-) v2: Use pte_update(..., ~0, pte_val(pte), ...) as suggested by Fabiano, and ptep_get() as suggested by Christophe. diff --git a/arch/powerpc/mm/pageattr.c b/arch/powerpc/mm/pageattr.c index 0876216ceee6..edea388e9d3f 100644 --- a/arch/powerpc/mm/pageattr.c +++ b/arch/powerpc/mm/pageattr.c @@ -18,16 +18,12 @@ /* * Updates the attributes of a page in three steps: * - * 1. invalidate the page table entry - * 2. flush the TLB - * 3. install the new entry with the updated attributes - * - * Invalidating the pte means there are situations where this will not work - * when in theory it should. - * For example: - * - removing write from page whilst it is being executed - * - setting a page read-only whilst it is being read by another CPU + * 1. take the page_table_lock + * 2. install the new entry with the updated attributes + * 3. flush the TLB * + * This sequence is safe against concurrent updates, and also allows updating the + * attributes of a page currently being executed or accessed. 
 */
static int change_page_attr(pte_t *ptep, unsigned long addr, void *data)
{
@@ -36,9 +32,7 @@ static int change_page_attr(pte_t *ptep, unsigned long addr, void *data)

 	spin_lock(&init_mm.page_table_lock);

-	/* invalidate the PTE so it's safe to modify */
-	pte = ptep_get_and_clear(&init_mm, addr, ptep);
-	flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
+	pte = ptep_get(ptep);

 	/* modify the PTE bits as desired, then apply */
 	switch (action) {
@@ -59,11 +53,14 @@ static int change_page_attr(pte_t *ptep, unsigned long addr, void *data)
 		break;
 	}

-	set_pte_at(&init_mm, addr, ptep, pte);
+	pte_update(&init_mm, addr, ptep, ~0UL, pte_val(pte), 0);

 	/* See ptesync comment in radix__set_pte_at() */
 	if (radix_enabled())
 		asm volatile("ptesync": : :"memory");
+
+	flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
+
 	spin_unlock(&init_mm.page_table_lock);

 	return 0;

base-commit: cbc06f051c524dcfe52ef0d1f30647828e226d30

--
Murilo
Re: [PATCH 3/3] powerpc: move the install rule to arch/powerpc/Makefile
On Sat, Jul 31, 2021 at 5:30 AM Nick Desaulniers wrote:
>
> On Thu, Jul 29, 2021 at 7:22 AM Masahiro Yamada wrote:
> >
> > Currently, the install target in arch/powerpc/Makefile descends into
> > arch/powerpc/boot/Makefile to invoke the shell script, but there is no
> > good reason to do so.
>
> Sure, but there are more arch/ subdirs that DO invoke install.sh from
> arch/*/boot/Makefile than not:
>
> arch/*/boot/Makefile:
> - parisc
> - nios2
> - arm
> - nds32
> - sparc
> - riscv
> - s390
> - ppc (this patch)
> - x86
> - arm64

I sent patches for these architectures. Check LKML.

> arch/*/Makefile:
> - ia64
> - m68k
>
> Patch is fine, but right now the tree is a bit inconsistent.
>
> >
> > arch/powerpc/Makefile can run the shell script directly.
> >
> > Signed-off-by: Masahiro Yamada
> > ---
> >
> >  arch/powerpc/Makefile      | 3 ++-
> >  arch/powerpc/boot/Makefile | 6 --
> >  2 files changed, 2 insertions(+), 7 deletions(-)
> >
> > diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
> > index 6505d66f1193..9aaf1abbc641 100644
> > --- a/arch/powerpc/Makefile
> > +++ b/arch/powerpc/Makefile
> > @@ -407,7 +407,8 @@ endef
> >
> >  PHONY += install
> >  install:
> > -	$(Q)$(MAKE) $(build)=$(boot) install
> > +	sh -x $(srctree)/$(boot)/install.sh "$(KERNELRELEASE)" vmlinux \
> > +	System.map "$(INSTALL_PATH)"
> >
> >  archclean:
> >  	$(Q)$(MAKE) $(clean)=$(boot)
> >
> > diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
> > index 0d165bd98b61..10c0fb306f15 100644
> > --- a/arch/powerpc/boot/Makefile
> > +++ b/arch/powerpc/boot/Makefile
> > @@ -444,12 +444,6 @@ $(obj)/zImage: $(addprefix $(obj)/, $(image-y))
> >  $(obj)/zImage.initrd: $(addprefix $(obj)/, $(initrd-y))
> >  	$(Q)rm -f $@; ln $< $@
> >
> > -# Only install the vmlinux
> > -install:
> > -	sh -x $(srctree)/$(src)/install.sh "$(KERNELRELEASE)" vmlinux System.map "$(INSTALL_PATH)"
> > -
> > -PHONY += install
> > -
> >  # anything not in $(targets)
> >  clean-files += $(image-) $(initrd-) cuImage.* dtbImage.* treeImage.* \
> >  	zImage zImage.initrd zImage.chrp zImage.coff zImage.holly \
> > --
> > 2.27.0
>
> --
> Thanks,
> ~Nick Desaulniers

--
Best Regards
Masahiro Yamada
Re: [PATCH v2 0/1] cpufreq:powernv: Fix init_chip_info initialization in numa=off
On Wed, 28 Jul 2021 17:34:59 +0530, Pratik R. Sampat wrote: > v1: https://lkml.org/lkml/2021/7/26/1509 > Changelog v1-->v2: > Based on comments from Gautham, > 1. Included a #define for MAX_NR_CHIPS instead of hardcoding the > allocation. > > Pratik R. Sampat (1): > cpufreq:powernv: Fix init_chip_info initialization in numa=off > > [...] Applied to powerpc/next. [1/1] cpufreq:powernv: Fix init_chip_info initialization in numa=off https://git.kernel.org/powerpc/c/f34ee9cb2c5ac5af426fee6fa4591a34d187e696 cheers
Re: [PATCH] powerpc/64s/perf: Always use SIAR for kernel interrupts
On Wed, 21 Jul 2021 00:15:04 +1000, Nicholas Piggin wrote: > If an interrupt is taken in kernel mode, always use SIAR for it rather than > looking at regs_sipr. This prevents samples piling up around interrupt > enable (hard enable or interrupt replay via soft enable) in PMUs / modes > where the PR sample indication is not in synch with SIAR. > > This results in better sampling of interrupt entry and exit in particular. Applied to powerpc/next. [1/1] powerpc/64s/perf: Always use SIAR for kernel interrupts https://git.kernel.org/powerpc/c/cf9c615cde49fb5d2480549c8c955a0a387798d3 cheers
Re: [PATCH] cpuidle: pseries: Mark pseries_idle_probe() as __init
On Tue, 3 Aug 2021 14:15:47 -0700, Nathan Chancellor wrote: > After commit 7cbd631d4dec ("cpuidle: pseries: Fixup CEDE0 latency only > for POWER10 onwards"), pseries_idle_probe() is no longer inlined when > compiling with clang, which causes a modpost warning: > > WARNING: modpost: vmlinux.o(.text+0xc86a54): Section mismatch in > reference from the function pseries_idle_probe() to the function > .init.text:fixup_cede0_latency() > The function pseries_idle_probe() references > the function __init fixup_cede0_latency(). > This is often because pseries_idle_probe lacks a __init > annotation or the annotation of fixup_cede0_latency is wrong. > > [...] Applied to powerpc/next. [1/1] cpuidle: pseries: Mark pseries_idle_probe() as __init https://git.kernel.org/powerpc/c/d04691d373e75c83424b85c0e68e4a3f9370c10d cheers
Re: [PATCHv2 0/3] Make cache-object aware of L3 siblings by parsing "ibm,thread-groups" property
On Wed, 28 Jul 2021 23:26:04 +0530, Parth Shah wrote: > Changes from v1 -> v2: > - Based on Gautham's comments, use a separate thread_group_l3_cache_map > and modify parsing code to build cache_map for L3. This makes the > cache_map building code isolated from the parsing code. > v1 can be found at: > https://lists.ozlabs.org/pipermail/linuxppc-dev/2021-June/230680.html > > [...] Applied to powerpc/next. [1/3] powerpc/cacheinfo: Lookup cache by dt node and thread-group id https://git.kernel.org/powerpc/c/a4bec516b9c0823d7e2bb8c8928c98b535cf9adf [2/3] powerpc/cacheinfo: Remove the redundant get_shared_cpu_map() https://git.kernel.org/powerpc/c/69aa8e078545bc14d84a8b4b3cb914ac8f9f280e [3/3] powerpc/smp: Use existing L2 cache_map cpumask to find L3 cache siblings https://git.kernel.org/powerpc/c/e9ef81e1079b0c4c374fba0f9affa7129c7c913b cheers
Re: [PATCH 1/3] powerpc: remove unused zInstall target from arch/powerpc/boot/Makefile
On Thu, 29 Jul 2021 23:19:35 +0900, Masahiro Yamada wrote: > Commit c913e5f95e54 ("powerpc/boot: Don't install zImage.* from make > install") added the zInstall target to arch/powerpc/boot/Makefile, > but you cannot use it since the corresponding hook is missing in > arch/powerpc/Makefile. > > It has never worked since its addition. Nobody has complained about > it for 7 years, which means this code was unneeded. > > [...] Applied to powerpc/next. [1/3] powerpc: remove unused zInstall target from arch/powerpc/boot/Makefile https://git.kernel.org/powerpc/c/156ca4e650bfb9a4259b427069caa11b5a4df3d4 [2/3] powerpc: make the install target not depend on any build artifact https://git.kernel.org/powerpc/c/9bef456b20581e630ef9a13555ca04fed65a859d [3/3] powerpc: move the install rule to arch/powerpc/Makefile https://git.kernel.org/powerpc/c/86ff0bce2e9665c8b074930fe6caed615da070c1 cheers
Re: [PATCH v4 0/5] nvmem: nintendo-otp: Add new driver for the Wii and Wii U OTP
On Sun, 1 Aug 2021 09:38:17 +0200, Emmanuel Gil Peyrot wrote: > The OTP is a read-only memory area which contains various keys and > signatures used to decrypt, encrypt or verify various pieces of storage. > > Its size depends on the console, it is 128 bytes on the Wii and > 1024 bytes on the Wii U (split into eight 128 bytes banks). > > It can be used directly by writing into one register and reading from > the other one, without any additional synchronisation. > > [...] Patches 3-5 applied to powerpc/next. [3/5] powerpc: wii.dts: Reduce the size of the control area https://git.kernel.org/powerpc/c/b11748e693166679acc13c8a4328a71efe1d4a89 [4/5] powerpc: wii.dts: Expose the OTP on this platform https://git.kernel.org/powerpc/c/562a610b4c5119034aed300f6ae212ec7a20c4b4 [5/5] powerpc: wii_defconfig: Enable OTP by default https://git.kernel.org/powerpc/c/140a89b7bfe65e9649c4a3678f74c32556834ec1 cheers
Re: [PATCH v5] pseries: prevent free CPU ids to be reused on another node
On Thu, 29 Apr 2021 19:49:08 +0200, Laurent Dufour wrote: > When a CPU is hot added, the CPU ids are taken from the available mask from > the lower possible set. If that set of values was previously used for CPU > attached to a different node, this seems to application like if these CPUs > have migrated from a node to another one which is not expected in real > life. > > To prevent this, it is needed to record the CPU ids used for each node and > to not reuse them on another node. However, to prevent CPU hot plug to > fail, in the case the CPU ids is starved on a node, the capability to reuse > other nodes’ free CPU ids is kept. A warning is displayed in such a case > to warn the user. > > [...] Applied to powerpc/next. [1/1] pseries: prevent free CPU ids to be reused on another node https://git.kernel.org/powerpc/c/bd1dd4c5f5286df0148b5b316f37c583b8f55fa1 cheers
Re: [PATCH v5] pseries/drmem: update LMBs after LPM
On Mon, 17 May 2021 11:06:06 +0200, Laurent Dufour wrote: > After a LPM, the device tree node ibm,dynamic-reconfiguration-memory may be > updated by the hypervisor in the case the NUMA topology of the LPAR's > memory is updated. > > This is handled by the kernel, but the memory's node is not updated because > there is no way to move a memory block between nodes from the Linux kernel > point of view. > > [...] Applied to powerpc/next. [1/1] pseries/drmem: update LMBs after LPM https://git.kernel.org/powerpc/c/d144f4d5a8a804133d20ff311d7be70bcdbfaac2 cheers
Re: [PATCH v2] ppc64/numa: consider the max numa node for migratable LPAR
On Tue, 11 May 2021 09:31:36 +0200, Laurent Dufour wrote: > When a LPAR is migratable, we should consider the maximum possible NUMA > node instead of the number of NUMA nodes from the actual system. > > The DT property 'ibm,current-associativity-domains' is defining the maximum > number of nodes the LPAR can see when running on that box. But if the LPAR > is being migrated on another box, it may see up to the nodes defined by > 'ibm,max-associativity-domains'. So if a LPAR is migratable, that value > should be used. > > [...] Applied to powerpc/next. [1/1] ppc64/numa: consider the max numa node for migratable LPAR https://git.kernel.org/powerpc/c/9c7248bb8de31f51c693bfa6a6ea53b1c07e0fa8 cheers
Re: [PATCH] powerpc/kexec: fix for_each_child.cocci warning
On Tue, 3 Aug 2021 16:59:55 +0200 (CEST), Julia Lawall wrote: > for_each_node_by_type should have of_node_put() before return. > > Generated by: scripts/coccinelle/iterators/for_each_child.cocci Applied to powerpc/next. [1/1] powerpc/kexec: fix for_each_child.cocci warning https://git.kernel.org/powerpc/c/c00103abf76fd3916596afd07dd3fdeee0dca15d cheers
Re: [PATCH v2 00/32] powerpc: Add MSI IRQ domains to PCI drivers
On Thu, 1 Jul 2021 15:27:18 +0200, Cédric Le Goater wrote: > This series adds support for MSI IRQ domains on top of the XICS (P8) > and XIVE (P9/P10) IRQ domains for the PowerNV (baremetal) and pSeries > (VM) platforms. It should simplify and improve IRQ affinity of PCI > MSIs under these PowerPC platforms, specially for drivers distributing > multiple RX/TX queues on the different CPUs of the system. > > Data locality can still be improved with an interrupt controller node > per chip but this requires FW changes. It could be done under OPAL. > > [...] Patches 1-31 applied to powerpc/next. [01/32] powerpc/pseries/pci: Introduce __find_pe_total_msi() https://git.kernel.org/powerpc/c/786e5b102a0007d81579822eac23cb5bfaa0b65f [02/32] powerpc/pseries/pci: Introduce rtas_prepare_msi_irqs() https://git.kernel.org/powerpc/c/e81202007363bd694b711f307f02320b5f98edaa [03/32] powerpc/xive: Add support for IRQ domain hierarchy https://git.kernel.org/powerpc/c/14be098c5387eb93b794f299f3c3e2ddf6038ec7 [04/32] powerpc/xive: Ease debugging of xive_irq_set_affinity() https://git.kernel.org/powerpc/c/6c2ab2a5d634d4e30445ee5d52d5d1469bf74aa2 [05/32] powerpc/pseries/pci: Add MSI domains https://git.kernel.org/powerpc/c/a5f3d2c17b07e69166b93209f34a5fb8271a6810 [06/32] powerpc/xive: Drop unmask of MSIs at startup https://git.kernel.org/powerpc/c/5690bcae186084a8544b1819f0d89399268bd0cf [07/32] powerpc/xive: Remove irqd_is_started() check when setting the affinity https://git.kernel.org/powerpc/c/292145a6e598c1e6633b8f5f607706b46f552ab9 [08/32] powerpc/pseries/pci: Add a domain_free_irqs() handler https://git.kernel.org/powerpc/c/07817a578a7a79638537480b8847dc7a12f293c5 [09/32] powerpc/pseries/pci: Add a msi_free() handler to clear XIVE data https://git.kernel.org/powerpc/c/9a014f456881e947bf8cdd8c984a207097e6c096 [10/32] powerpc/pseries/pci: Add support of MSI domains to PHB hotplug https://git.kernel.org/powerpc/c/174db9e7f775ce06fc6949c9abbe758b3eb8171c [11/32] powerpc/powernv/pci: 
Introduce __pnv_pci_ioda_msi_setup() https://git.kernel.org/powerpc/c/2c50d7e99e39eba92b93210e740f3f9e5a06ba54 [12/32] powerpc/powernv/pci: Add MSI domains https://git.kernel.org/powerpc/c/0fcfe2247e75070361af2b6845030cada92cdbf8 [13/32] KVM: PPC: Book3S HV: Use the new IRQ chip to detect passthrough interrupts https://git.kernel.org/powerpc/c/ba418a0278265ad65f2f9544e743b7dbff3b994b [14/32] KVM: PPC: Book3S HV: XIVE: Change interface of passthrough interrupt routines https://git.kernel.org/powerpc/c/e5e78b15113a73d0294141d9796969fa7b10fa3c [15/32] KVM: PPC: Book3S HV: XIVE: Fix mapping of passthrough interrupts https://git.kernel.org/powerpc/c/51be9e51a8000ffc6a33083ceca9da9303ed4dc5 [16/32] powerpc/xics: Remove ICS list https://git.kernel.org/powerpc/c/298f6f952885eeb1f25461f085c6c238bcd9fc5e [17/32] powerpc/xics: Rename the map handler in a check handler https://git.kernel.org/powerpc/c/248af248a8f45461662fb633eca4adf24ae704ad [18/32] powerpc/xics: Give a name to the default XICS IRQ domain https://git.kernel.org/powerpc/c/7d14f6c60b76fa7f3f89d81a95385576ca33b483 [19/32] powerpc/xics: Add debug logging to the set_irq_affinity handlers https://git.kernel.org/powerpc/c/53b34e8db73af98fa652641bf490384dc665d0f2 [20/32] powerpc/xics: Add support for IRQ domain hierarchy https://git.kernel.org/powerpc/c/e4f0aa3b4731430ad73fb4485e97f751c7500668 [21/32] powerpc/powernv/pci: Customize the MSI EOI handler to support PHB3 https://git.kernel.org/powerpc/c/bbb25af8fbdba4acaf955e412a84eb2eea48697c [22/32] powerpc/pci: Drop XIVE restriction on MSI domains https://git.kernel.org/powerpc/c/679e30b9536eeb93bc8c9a39c0ddc77dec536f6b [23/32] powerpc/xics: Drop unmask of MSIs at startup https://git.kernel.org/powerpc/c/1e661f81a522eadfe4bc5bb1ec9fbae27c13f163 [24/32] powerpc/pseries/pci: Drop unused MSI code https://git.kernel.org/powerpc/c/3005123eea0daa18d98602ab64b2ce3ad087d849 [25/32] powerpc/powernv/pci: Drop unused MSI code 
https://git.kernel.org/powerpc/c/6d9ba6121b1cf453985d08c141970a1b44cd9cf1 [26/32] powerpc/powernv/pci: Adapt is_pnv_opal_msi() to detect passthrough interrupt https://git.kernel.org/powerpc/c/f1a377f86f51b381cfc30bf2270f8a5f81e35ee9 [27/32] powerpc/xics: Fix IRQ migration https://git.kernel.org/powerpc/c/c80198a21792ac59412871e4e6fad5041c9be8e4 [28/32] powerpc/powernv/pci: Set the IRQ chip data for P8/CXL devices https://git.kernel.org/powerpc/c/5cd69651ceeed15e021cf7d19f1b1be0a80c0c7a [29/32] powerpc/powernv/pci: Rework pnv_opal_pci_msi_eoi() https://git.kernel.org/powerpc/c/c325712b5f85e561ea89bae2ba5d0104e797e42c [30/32] KVM: PPC: Book3S HV: XICS: Fix mapping of passthrough interrupts https://git.kernel.org/powerpc/c/1753081f2d445f9157550692fcc4221cd3ff0958 [31/32]
Re: [PATCH] powerpc: Always inline radix_enabled() to fix build failure
On Wed, 4 Aug 2021 11:37:24 +1000, Jordan Niethe wrote: > This is the same as commit acdad8fb4a15 ("powerpc: Force inlining of > mmu_has_feature to fix build failure") but for radix_enabled(). The > config in the linked bugzilla causes the following build failure: > > LD .tmp_vmlinux.kallsyms1 > powerpc64-linux-ld: arch/powerpc/mm/pgtable.o: in function > `.__ptep_set_access_flags': > pgtable.c:(.text+0x17c): undefined reference to > `.radix__ptep_set_access_flags' ... > > [...] Applied to powerpc/next. [1/1] powerpc: Always inline radix_enabled() to fix build failure https://git.kernel.org/powerpc/c/27fd051dc218e5b6cb2da5dbb3f342879ff1 cheers
Re: [PATCH] powerpc/non-smp: Inconditionaly call smp_mb() on switch_mm
On Mon, 5 Jul 2021 12:00:50 +0000 (UTC), Christophe Leroy wrote: > Commit 3ccfebedd8cf ("powerpc, membarrier: Skip memory barrier in > switch_mm()") added some logic to skip the smp_mb() in > switch_mm_irqs_off() before the call to switch_mmu_context(). > > However, on non SMP smp_mb() is just a compiler barrier and doing > it inconditionaly is simpler than the logic used to check > whether the barrier is needed or not. > > [...] Applied to powerpc/next. [1/1] powerpc/non-smp: Inconditionaly call smp_mb() on switch_mm https://git.kernel.org/powerpc/c/c8a6d91005343dea0d53be0ff0620c66934dcd44 cheers
Re: [PATCH kernel v2] KVM: PPC: Use arch_get_random_seed_long instead of powernv variant
On Thu, 5 Aug 2021 17:56:49 +1000, Alexey Kardashevskiy wrote: > The powernv_get_random_long() does not work in nested KVM (which is > pseries) and produces a crash when accessing in_be64(rng->regs) in > powernv_get_random_long(). > > This replaces powernv_get_random_long with the ppc_md machine hook > wrapper. Applied to powerpc/next. [1/1] KVM: PPC: Use arch_get_random_seed_long instead of powernv variant https://git.kernel.org/powerpc/c/2ac78e0c00184a9ba53d507be7549c69a3f566b6 cheers
Re: [PATCH v8 0/5] Add support for FORM2 associativity
On Thu, 12 Aug 2021 18:52:18 +0530, Aneesh Kumar K.V wrote: > Form2 associativity adds a much more flexible NUMA topology layout > than what is provided by Form1. More details can be found in patch 7. > > $ numactl -H > ... > node distances: > node 0 1 2 3 > 0: 10 11 222 33 > 1: 44 10 55 66 > 2: 77 88 10 99 > 3: 101 121 132 10 > $ > > [...] Applied to powerpc/next. [1/5] powerpc/pseries: rename min_common_depth to primary_domain_index https://git.kernel.org/powerpc/c/7e35ef662ca05c42dbc2f401bb76d9219dd7fd02 [2/5] powerpc/pseries: Rename TYPE1_AFFINITY to FORM1_AFFINITY https://git.kernel.org/powerpc/c/0eacd06bb8adea8dd9edb0a30144166d9f227e64 [3/5] powerpc/pseries: Consolidate different NUMA distance update code paths https://git.kernel.org/powerpc/c/8ddc6448ec5a5ef50eaa581a7dec0e12a02850ff [4/5] powerpc/pseries: Add a helper for form1 cpu distance https://git.kernel.org/powerpc/c/ef31cb83d19c4c589d650747cd5a7e502be9f665 [5/5] powerpc/pseries: Add support for FORM2 associativity https://git.kernel.org/powerpc/c/1c6b5a7e74052768977855f95d6b8812f6e7772c cheers
Re: [PATCH] powerpc/configs: Disable legacy ptys on microwatt defconfig
On Thu, 5 Aug 2021 11:20:05 +1000, Anton Blanchard wrote: > We shouldn't need legacy ptys, and disabling the option improves boot > time by about 0.5 seconds. Applied to powerpc/next. [1/1] powerpc/configs: Disable legacy ptys on microwatt defconfig https://git.kernel.org/powerpc/c/9b49f979b3d560cb75ea9f1a596baf432d566798 cheers
Re: [PATCH] powerpc: Remove in_kernel_text()
On Sun, 27 Jun 2021 17:09:18 +0000 (UTC), Christophe Leroy wrote: > Last user of in_kernel_text() stopped using it with > commit 549e8152de80 ("powerpc: Make the 64-bit kernel as a > position-independent executable"). > > Generic function is_kernel_text() does the same. > > So remove it. Applied to powerpc/next. [1/1] powerpc: Remove in_kernel_text() https://git.kernel.org/powerpc/c/09ca497528dac12cbbceab8197011c875a96d053 cheers
Re: [PATCH] powerpc: use IRQF_NO_DEBUG for IPIs
On Mon, 19 Jul 2021 15:06:14 +0200, Cédric Le Goater wrote: > There is no need to use the lockup detector ("noirqdebug") for IPIs. > The ipistorm benchmark measures a ~10% improvement on high systems > when this flag is set. Applied to powerpc/next. [1/1] powerpc: use IRQF_NO_DEBUG for IPIs https://git.kernel.org/powerpc/c/17df41fec5b80b16ea4774495f1eb730e2225619 cheers
Re: [PATCH 0/2] KVM: PPC: Book3S HV: XIVE: Improve guest entries and exits
On Tue, 20 Jul 2021 15:42:07 +0200, Cédric Le Goater wrote: > The XIVE interrupt controller on P10 can automatically save and > restore the state of the interrupt registers under the internal NVP > structure representing the VCPU. This saves a costly store/load in > guest entries and exits. > > Thanks, > > [...] Applied to powerpc/next. [1/2] KVM: PPC: Book3S HV: XIVE: Add a 'flags' field https://git.kernel.org/powerpc/c/b68c6646cce5ee8caefa6333ee743f960222dcea [2/2] KVM: PPC: Book3S HV: XIVE: Add support for automatic save-restore https://git.kernel.org/powerpc/c/f5af0a978776b710f16dc99a85496b1e760bf9e0 cheers
Re: [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32
On Tue, 13 Apr 2021 16:38:09 +0000 (UTC), Christophe Leroy wrote: > powerpc BUG_ON() and WARN_ON() are based on using twnei instruction. > > For catching simple conditions like a variable having value 0, this > is efficient because it does the test and the trap at the same time. > But most conditions used with BUG_ON or WARN_ON are more complex and > forces GCC to format the condition into a 0 or 1 value in a register. > This will usually require 2 to 3 instructions. > > [...] Applied to powerpc/next. [1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32 https://git.kernel.org/powerpc/c/db87a7199229b75c9996bf78117eceb81854fce2 [2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto https://git.kernel.org/powerpc/c/1e688dd2a3d6759d416616ff07afc4bb836c4213 cheers
Re: (subset) [PATCH 00/38] Replace deprecated CPU-hotplug
On Tue, 3 Aug 2021 16:15:43 +0200, Sebastian Andrzej Siewior wrote: > This is a tree wide replacement of the deprecated CPU hotplug functions > which are only wrappers around the actual functions. > > Each patch is independent and can be picked up by the relevant maintainer. > > [...] Applied to powerpc/next. [03/38] powerpc: Replace deprecated CPU-hotplug functions. https://git.kernel.org/powerpc/c/5ae36401ca4ea2737d779ce7c267444b16530001 cheers
Re: [PATCH v2 1/2] powerpc/book3s64/radix: make tlb_single_page_flush_ceiling a debugfs entry
On Thu, 12 Aug 2021 18:58:30 +0530, Aneesh Kumar K.V wrote: > Similar to x86/s390 add a debugfs file to tune tlb_single_page_flush_ceiling. > Also add a debugfs entry for tlb_local_single_page_flush_ceiling. Applied to powerpc/next. [1/2] powerpc/book3s64/radix: make tlb_single_page_flush_ceiling a debugfs entry https://git.kernel.org/powerpc/c/3e188b1ae8807f26cc5a530a9d55f3f643fe050a [2/2] powerpc: rename powerpc_debugfs_root to arch_debugfs_dir https://git.kernel.org/powerpc/c/dbf77fed8b302e87561c7c2fc06050c88f4d3120 cheers
Re: [PATCH v3 3/3] powerpc/perf: Fix the check for SIAR value
Le 18/08/2021 à 15:19, Kajol Jain a écrit : In case of random sampling, there can be scenarios where the Sample Instruction Address Register (SIAR) may not latch to the sampled instruction and could result in the value of 0. In these scenarios it is preferred to return regs->nip. These corner cases are seen in the previous generation (p9) also. Patch adds the check for SIAR value along with use_siar and siar_valid checks so that the function will return regs->nip in case SIAR is zero. Patch drops the code under PPMU_P10_DD1 flag check which handles the SIAR 0 case only for Power10 DD1. Fixes: 2ca13a4cc56c9 ("powerpc/perf: Use regs->nip when SIAR is zero") Signed-off-by: Kajol Jain --- Changelog: - Drop adding new ternary condition to check siar value. - Remove siar check specific for PPMU_P10_DD1 and add it along with common checks as suggested by Christophe Leroy and Michael Ellerman arch/powerpc/perf/core-book3s.c | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c index 23ec89a59893..55efbba7572b 100644 --- a/arch/powerpc/perf/core-book3s.c +++ b/arch/powerpc/perf/core-book3s.c @@ -2254,12 +2254,7 @@ unsigned long perf_instruction_pointer(struct pt_regs *regs) bool use_siar = regs_use_siar(regs); unsigned long siar = mfspr(SPRN_SIAR); - if (ppmu && (ppmu->flags & PPMU_P10_DD1)) { - if (siar) - return siar; - else - return regs->nip; - } else if (use_siar && siar_valid(regs)) + if (use_siar && siar_valid(regs) && siar) You can probably now do + if (regs_use_siar(regs) && siar_valid(regs) && siar) and remove use_siar return siar + perf_ip_adjust(regs); else return regs->nip;
[PATCH v3 3/3] powerpc/perf: Fix the check for SIAR value
In case of random sampling, there can be scenarios where the Sample Instruction Address Register (SIAR) may not latch to the sampled instruction and could result in the value of 0. In these scenarios it is preferred to return regs->nip. These corner cases are seen in the previous generation (p9) also. Patch adds the check for SIAR value along with use_siar and siar_valid checks so that the function will return regs->nip in case SIAR is zero. Patch drops the code under PPMU_P10_DD1 flag check which handles the SIAR 0 case only for Power10 DD1. Fixes: 2ca13a4cc56c9 ("powerpc/perf: Use regs->nip when SIAR is zero") Signed-off-by: Kajol Jain --- Changelog: - Drop adding new ternary condition to check siar value. - Remove siar check specific for PPMU_P10_DD1 and add it along with common checks as suggested by Christophe Leroy and Michael Ellerman arch/powerpc/perf/core-book3s.c | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c index 23ec89a59893..55efbba7572b 100644 --- a/arch/powerpc/perf/core-book3s.c +++ b/arch/powerpc/perf/core-book3s.c @@ -2254,12 +2254,7 @@ unsigned long perf_instruction_pointer(struct pt_regs *regs) bool use_siar = regs_use_siar(regs); unsigned long siar = mfspr(SPRN_SIAR); - if (ppmu && (ppmu->flags & PPMU_P10_DD1)) { - if (siar) - return siar; - else - return regs->nip; - } else if (use_siar && siar_valid(regs)) + if (use_siar && siar_valid(regs) && siar) return siar + perf_ip_adjust(regs); else return regs->nip; -- 2.26.2
[PATCH v3 2/3] powerpc/perf: Drop the case of returning 0 as instruction pointer
Drop the case of returning 0 as instruction pointer since kernel never executes at 0 and userspace almost never does either. Fixes: e6878835ac47 ("powerpc/perf: Sample only if SIAR-Valid bit is set in P7+") Signed-off-by: Kajol Jain --- arch/powerpc/perf/core-book3s.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c index 1b464aad29c4..23ec89a59893 100644 --- a/arch/powerpc/perf/core-book3s.c +++ b/arch/powerpc/perf/core-book3s.c @@ -2261,8 +2261,6 @@ unsigned long perf_instruction_pointer(struct pt_regs *regs) return regs->nip; } else if (use_siar && siar_valid(regs)) return siar + perf_ip_adjust(regs); - else if (use_siar) - return 0; // no valid instruction pointer else return regs->nip; } -- 2.26.2
[PATCH v3 1/3] powerpc/perf: Use stack siar instead of mfspr
Minor optimization in the 'perf_instruction_pointer' function code by making use of stack siar instead of mfspr. Fixes: 75382aa72f06 ("powerpc/perf: Move code to select SIAR or pt_regs into perf_read_regs") Signed-off-by: Kajol Jain --- arch/powerpc/perf/core-book3s.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c index bb0ee716de91..1b464aad29c4 100644 --- a/arch/powerpc/perf/core-book3s.c +++ b/arch/powerpc/perf/core-book3s.c @@ -2260,7 +2260,7 @@ unsigned long perf_instruction_pointer(struct pt_regs *regs) else return regs->nip; } else if (use_siar && siar_valid(regs)) - return mfspr(SPRN_SIAR) + perf_ip_adjust(regs); + return siar + perf_ip_adjust(regs); else if (use_siar) return 0; // no valid instruction pointer else -- 2.26.2
Re: [PATCH v1 2/4] powerpc/64s/perf: add power_pmu_running to query whether perf is being used
> On 18-Aug-2021, at 5:11 PM, Nicholas Piggin wrote: > > Excerpts from Madhavan Srinivasan's message of August 17, 2021 11:06 pm: >> >> On 8/16/21 12:59 PM, Nicholas Piggin wrote: >>> Interrupt handling code would like to know whether perf is enabled, to >>> know whether it should enable MSR[EE] to improve PMI coverage. >>> >>> Cc: Madhavan Srinivasan >>> Cc: Athira Rajeev >>> Signed-off-by: Nicholas Piggin >>> --- >>> arch/powerpc/include/asm/hw_irq.h | 2 ++ >>> arch/powerpc/perf/core-book3s.c | 13 + >>> 2 files changed, 15 insertions(+) >>> >>> diff --git a/arch/powerpc/include/asm/hw_irq.h >>> b/arch/powerpc/include/asm/hw_irq.h >>> index 21cc571ea9c2..2d5c0d3ccbb6 100644 >>> --- a/arch/powerpc/include/asm/hw_irq.h >>> +++ b/arch/powerpc/include/asm/hw_irq.h >>> @@ -306,6 +306,8 @@ static inline bool lazy_irq_pending_nocheck(void) >>> return __lazy_irq_pending(local_paca->irq_happened); >>> } >>> >>> +bool power_pmu_running(void); >>> + >>> /* >>> * This is called by asynchronous interrupts to conditionally >>> * re-enable hard interrupts after having cleared the source >>> diff --git a/arch/powerpc/perf/core-book3s.c >>> b/arch/powerpc/perf/core-book3s.c >>> index bb0ee716de91..76114a9afb2b 100644 >>> --- a/arch/powerpc/perf/core-book3s.c >>> +++ b/arch/powerpc/perf/core-book3s.c >>> @@ -2380,6 +2380,19 @@ static void perf_event_interrupt(struct pt_regs >>> *regs) >>> perf_sample_event_took(sched_clock() - start_clock); >>> } >>> >>> +bool power_pmu_running(void) >>> +{ >>> + struct cpu_hw_events *cpuhw; >>> + >>> + /* Could this simply test local_paca->pmcregs_in_use? */ >>> + >>> + if (!ppmu) >>> + return false; >> >> >> This covers only when perf platform driver is not registered, >> but we should also check for MMCR0[32], since pmu sprs can be >> accessed via sysfs. > > In that case do they actually do anything with the PMI? I don't think it > should matter hopefully. > > But I do think a lot of this stuff could be cleaned up. 
We have > pmcs_enabled and ppc_enable_pmcs() in sysfs.c, ppc_set_pmu_inuse(), > ppc_md.enable_pmcs(), reserve_pmc_hardware(), etc and different users > call different things. We don't consistently disable either, e.g., we > never disable the H_PERFMON facility after we stop using perf even > though it says that slows down partition switch. Hi Nick, I have started looking at understanding the code path and working on it to get the code cleaned up. I will work on posting the patch set for clean up. Thanks Athira Rajeev > > I started to have a look at sorting it out but it looks like a big > job so would take a bit of time if we want to do it. > > Thanks, > Nick
Re: [PATCH v2] powerpc/mm: Fix set_memory_*() against concurrent accesses
Le 18/08/2021 à 14:05, Michael Ellerman a écrit : Laurent reported that STRICT_MODULE_RWX was causing intermittent crashes on one of his systems: kernel tried to execute exec-protected page (c00804073278) - exploit attempt? (uid: 0) BUG: Unable to handle kernel instruction fetch Faulting instruction address: 0xc00804073278 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries Modules linked in: drm virtio_console fuse drm_panel_orientation_quirks ... CPU: 3 PID: 44 Comm: kworker/3:1 Not tainted 5.14.0-rc4+ #12 Workqueue: events control_work_handler [virtio_console] NIP: c00804073278 LR: c00804073278 CTR: c01e9de0 REGS: c0002e4ef7e0 TRAP: 0400 Not tainted (5.14.0-rc4+) MSR: 80004280b033 CR: 24002822 XER: 200400cf ... NIP fill_queue+0xf0/0x210 [virtio_console] LR fill_queue+0xf0/0x210 [virtio_console] Call Trace: fill_queue+0xb4/0x210 [virtio_console] (unreliable) add_port+0x1a8/0x470 [virtio_console] control_work_handler+0xbc/0x1e8 [virtio_console] process_one_work+0x290/0x590 worker_thread+0x88/0x620 kthread+0x194/0x1a0 ret_from_kernel_thread+0x5c/0x64 Jordan, Fabiano & Murilo were able to reproduce and identify that the problem is caused by the call to module_enable_ro() in do_init_module(), which happens after the module's init function has already been called. Our current implementation of change_page_attr() is not safe against concurrent accesses, because it invalidates the PTE before flushing the TLB and then installing the new PTE. That leaves a window in time where there is no valid PTE for the page, if another CPU tries to access the page at that time we see something like the fault above. We can't simply switch to set_pte_at()/flush TLB, because our hash MMU code doesn't handle a set_pte_at() of a valid PTE. See [1]. But we do have pte_update(), which replaces the old PTE with the new, meaning there's no window where the PTE is invalid. 
And the hash MMU version hash__pte_update() deals with synchronising the hash page table correctly. [1]: https://lore.kernel.org/linuxppc-dev/87y318wp9r@linux.ibm.com/ Fixes: 1f9ad21c3b38 ("powerpc/mm: Implement set_memory() routines") Reported-by: Laurent Vivier Signed-off-by: Fabiano Rosas Signed-off-by: Michael Ellerman Reviewed-by: Christophe Leroy --- arch/powerpc/mm/pageattr.c | 23 ++++++++++-------------- 1 file changed, 10 insertions(+), 13 deletions(-) v2: Use pte_update(..., ~0, pte_val(pte), ...) as suggested by Fabiano, and ptep_get() as suggested by Christophe. diff --git a/arch/powerpc/mm/pageattr.c b/arch/powerpc/mm/pageattr.c index 0876216ceee6..edea388e9d3f 100644 --- a/arch/powerpc/mm/pageattr.c +++ b/arch/powerpc/mm/pageattr.c @@ -18,16 +18,12 @@ /* * Updates the attributes of a page in three steps: * - * 1. invalidate the page table entry - * 2. flush the TLB - * 3. install the new entry with the updated attributes - * - * Invalidating the pte means there are situations where this will not work - * when in theory it should. - * For example: - * - removing write from page whilst it is being executed - * - setting a page read-only whilst it is being read by another CPU + * 1. take the page_table_lock + * 2. install the new entry with the updated attributes + * 3. flush the TLB + * + * This sequence is safe against concurrent updates, and also allows updating the + * attributes of a page currently being executed or accessed. 
*/ static int change_page_attr(pte_t *ptep, unsigned long addr, void *data) { @@ -36,9 +32,7 @@ static int change_page_attr(pte_t *ptep, unsigned long addr, void *data) spin_lock(&init_mm.page_table_lock); - /* invalidate the PTE so it's safe to modify */ - pte = ptep_get_and_clear(&init_mm, addr, ptep); - flush_tlb_kernel_range(addr, addr + PAGE_SIZE); + pte = ptep_get(ptep); /* modify the PTE bits as desired, then apply */ switch (action) { @@ -59,11 +53,14 @@ static int change_page_attr(pte_t *ptep, unsigned long addr, void *data) break; } - set_pte_at(&init_mm, addr, ptep, pte); + pte_update(&init_mm, addr, ptep, ~0UL, pte_val(pte), 0); /* See ptesync comment in radix__set_pte_at() */ if (radix_enabled()) asm volatile("ptesync": : :"memory"); + + flush_tlb_kernel_range(addr, addr + PAGE_SIZE); + spin_unlock(&init_mm.page_table_lock); return 0; base-commit: cbc06f051c524dcfe52ef0d1f30647828e226d30
[PATCH v2] powerpc/mm: Fix set_memory_*() against concurrent accesses
Laurent reported that STRICT_MODULE_RWX was causing intermittent crashes on one of his systems: kernel tried to execute exec-protected page (c00804073278) - exploit attempt? (uid: 0) BUG: Unable to handle kernel instruction fetch Faulting instruction address: 0xc00804073278 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries Modules linked in: drm virtio_console fuse drm_panel_orientation_quirks ... CPU: 3 PID: 44 Comm: kworker/3:1 Not tainted 5.14.0-rc4+ #12 Workqueue: events control_work_handler [virtio_console] NIP: c00804073278 LR: c00804073278 CTR: c01e9de0 REGS: c0002e4ef7e0 TRAP: 0400 Not tainted (5.14.0-rc4+) MSR: 80004280b033 CR: 24002822 XER: 200400cf ... NIP fill_queue+0xf0/0x210 [virtio_console] LR fill_queue+0xf0/0x210 [virtio_console] Call Trace: fill_queue+0xb4/0x210 [virtio_console] (unreliable) add_port+0x1a8/0x470 [virtio_console] control_work_handler+0xbc/0x1e8 [virtio_console] process_one_work+0x290/0x590 worker_thread+0x88/0x620 kthread+0x194/0x1a0 ret_from_kernel_thread+0x5c/0x64 Jordan, Fabiano & Murilo were able to reproduce and identify that the problem is caused by the call to module_enable_ro() in do_init_module(), which happens after the module's init function has already been called. Our current implementation of change_page_attr() is not safe against concurrent accesses, because it invalidates the PTE before flushing the TLB and then installing the new PTE. That leaves a window in time where there is no valid PTE for the page, if another CPU tries to access the page at that time we see something like the fault above. We can't simply switch to set_pte_at()/flush TLB, because our hash MMU code doesn't handle a set_pte_at() of a valid PTE. See [1]. But we do have pte_update(), which replaces the old PTE with the new, meaning there's no window where the PTE is invalid. And the hash MMU version hash__pte_update() deals with synchronising the hash page table correctly. 
[1]: https://lore.kernel.org/linuxppc-dev/87y318wp9r@linux.ibm.com/ Fixes: 1f9ad21c3b38 ("powerpc/mm: Implement set_memory() routines") Reported-by: Laurent Vivier Signed-off-by: Fabiano Rosas Signed-off-by: Michael Ellerman --- arch/powerpc/mm/pageattr.c | 23 ++- 1 file changed, 10 insertions(+), 13 deletions(-) v2: Use pte_update(..., ~0, pte_val(pte), ...) as suggested by Fabiano, and ptep_get() as suggested by Christophe. diff --git a/arch/powerpc/mm/pageattr.c b/arch/powerpc/mm/pageattr.c index 0876216ceee6..edea388e9d3f 100644 --- a/arch/powerpc/mm/pageattr.c +++ b/arch/powerpc/mm/pageattr.c @@ -18,16 +18,12 @@ /* * Updates the attributes of a page in three steps: * - * 1. invalidate the page table entry - * 2. flush the TLB - * 3. install the new entry with the updated attributes - * - * Invalidating the pte means there are situations where this will not work - * when in theory it should. - * For example: - * - removing write from page whilst it is being executed - * - setting a page read-only whilst it is being read by another CPU + * 1. take the page_table_lock + * 2. install the new entry with the updated attributes + * 3. flush the TLB * + * This sequence is safe against concurrent updates, and also allows updating the + * attributes of a page currently being executed or accessed. 
*/ static int change_page_attr(pte_t *ptep, unsigned long addr, void *data) { @@ -36,9 +32,7 @@ static int change_page_attr(pte_t *ptep, unsigned long addr, void *data) spin_lock(&init_mm.page_table_lock); - /* invalidate the PTE so it's safe to modify */ - pte = ptep_get_and_clear(&init_mm, addr, ptep); - flush_tlb_kernel_range(addr, addr + PAGE_SIZE); + pte = ptep_get(ptep); /* modify the PTE bits as desired, then apply */ switch (action) { @@ -59,11 +53,14 @@ static int change_page_attr(pte_t *ptep, unsigned long addr, void *data) break; } - set_pte_at(&init_mm, addr, ptep, pte); + pte_update(&init_mm, addr, ptep, ~0UL, pte_val(pte), 0); /* See ptesync comment in radix__set_pte_at() */ if (radix_enabled()) asm volatile("ptesync": : :"memory"); + + flush_tlb_kernel_range(addr, addr + PAGE_SIZE); + spin_unlock(&init_mm.page_table_lock); return 0; base-commit: cbc06f051c524dcfe52ef0d1f30647828e226d30 -- 2.25.1
Re: [PATCH v1 2/4] powerpc/64s/perf: add power_pmu_running to query whether perf is being used
Excerpts from Madhavan Srinivasan's message of August 17, 2021 11:06 pm: > > On 8/16/21 12:59 PM, Nicholas Piggin wrote: >> Interrupt handling code would like to know whether perf is enabled, to >> know whether it should enable MSR[EE] to improve PMI coverage. >> >> Cc: Madhavan Srinivasan >> Cc: Athira Rajeev >> Signed-off-by: Nicholas Piggin >> --- >> arch/powerpc/include/asm/hw_irq.h | 2 ++ >> arch/powerpc/perf/core-book3s.c | 13 + >> 2 files changed, 15 insertions(+) >> >> diff --git a/arch/powerpc/include/asm/hw_irq.h >> b/arch/powerpc/include/asm/hw_irq.h >> index 21cc571ea9c2..2d5c0d3ccbb6 100644 >> --- a/arch/powerpc/include/asm/hw_irq.h >> +++ b/arch/powerpc/include/asm/hw_irq.h >> @@ -306,6 +306,8 @@ static inline bool lazy_irq_pending_nocheck(void) >> return __lazy_irq_pending(local_paca->irq_happened); >> } >> >> +bool power_pmu_running(void); >> + >> /* >>* This is called by asynchronous interrupts to conditionally >>* re-enable hard interrupts after having cleared the source >> diff --git a/arch/powerpc/perf/core-book3s.c >> b/arch/powerpc/perf/core-book3s.c >> index bb0ee716de91..76114a9afb2b 100644 >> --- a/arch/powerpc/perf/core-book3s.c >> +++ b/arch/powerpc/perf/core-book3s.c >> @@ -2380,6 +2380,19 @@ static void perf_event_interrupt(struct pt_regs *regs) >> perf_sample_event_took(sched_clock() - start_clock); >> } >> >> +bool power_pmu_running(void) >> +{ >> +struct cpu_hw_events *cpuhw; >> + >> +/* Could this simply test local_paca->pmcregs_in_use? */ >> + >> +if (!ppmu) >> +return false; > > > This covers only when perf platform driver is not registered, > but we should also check for MMCR0[32], since pmu sprs can be > accessed via sysfs. In that case do they actually do anything with the PMI? I don't think it should matter hopefully. But I do think a lot of this stuff could be cleaned up. 
We have pmcs_enabled and ppc_enable_pmcs() in sysfs.c, ppc_set_pmu_inuse(), ppc_md.enable_pmcs(), reserve_pmc_hardware(), etc and different users call different things. We don't consistently disable either, e.g., we never disable the H_PERFMON facility after we stop using perf even though it says that slows down partition switch. I started to have a look at sorting it out but it looks like a big job so would take a bit of time if we want to do it. Thanks, Nick
Re: [PATCH] scsi: ibmvfc: Stop using scsi_cmnd.tag
Hi Martin, Use scsi_cmd_to_rq(scsi_cmnd)->tag in preference to scsi_cmnd.tag. Applied to 5.15/scsi-staging and rebased for bisectability. Thanks, and sorry for the hassle. But I would still like the maintainers to have a look, as I was curious about current usage of scsi_cmnd.tag in that driver. Just to be picky, it looks like there's another scsi_cmnd tag lurking in qla1280.c, but it's sitting behind an #ifdef DEBUG_QLA1280. That driver does not even compile with DEBUG_QLA1280 set to begin with. I'll fix that up and send it as separate patches in case you want to shuffle the tag patch in earlier, which is probably not worth the effort. I've done a good few more x86 randconfigs and tried to audit the code for more references, so hopefully that's the last. Thanks
Re: [PATCH v8 2/3] tty: hvc: pass DMA capable memory to put_chars()
Hi Xianting, Thank you for the patch! Perhaps something to improve: [auto build test WARNING on tty/tty-testing] [also build test WARNING on char-misc/char-misc-testing soc/for-next v5.14-rc6 next-20210818] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch] url: https://github.com/0day-ci/linux/commits/Xianting-Tian/make-hvc-pass-dma-capable-memory-to-its-backend/20210818-162408 base: https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git tty-testing config: sh-allmodconfig (attached as .config) compiler: sh4-linux-gcc (GCC) 11.2.0 reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # https://github.com/0day-ci/linux/commit/e1b7662dafceb07a6905b64da2f1be27498c4a46 git remote add linux-review https://github.com/0day-ci/linux git fetch --no-tags linux-review Xianting-Tian/make-hvc-pass-dma-capable-memory-to-its-backend/20210818-162408 git checkout e1b7662dafceb07a6905b64da2f1be27498c4a46 # save the attached .config to linux build tree COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross ARCH=sh If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot All warnings (new ones prefixed by >>): drivers/tty/hvc/hvc_console.c: In function 'hvc_poll_put_char': >> drivers/tty/hvc/hvc_console.c:880:55: warning: passing argument 2 of >> 'hp->ops->put_chars' makes pointer from integer without a cast >> [-Wint-conversion] 880 | n = hp->ops->put_chars(hp->vtermno, hp->out_ch, 1); | ~~^~~~ | | | char drivers/tty/hvc/hvc_console.c:880:55: note: expected 'const char *' but argument is of type 'char' Kconfig warnings: (for reference only) WARNING: unmet direct dependencies detected for SND_ATMEL_SOC_PDC Depends on SOUND && !UML && SND && SND_SOC && SND_ATMEL_SOC && HAS_DMA Selected by - 
SND_ATMEL_SOC_SSC && SOUND && !UML && SND && SND_SOC && SND_ATMEL_SOC - SND_ATMEL_SOC_SSC_PDC && SOUND && !UML && SND && SND_SOC && SND_ATMEL_SOC && ATMEL_SSC vim +880 drivers/tty/hvc/hvc_console.c 870 871 static void hvc_poll_put_char(struct tty_driver *driver, int line, char ch) 872 { 873 struct tty_struct *tty = driver->ttys[0]; 874 struct hvc_struct *hp = tty->driver_data; 875 int n; 876 877 hp->out_ch = ch; 878 879 do { > 880 n = hp->ops->put_chars(hp->vtermno, hp->out_ch, 1); 881 } while (n <= 0); 882 } 883 #endif 884 --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org .config.gz Description: application/gzip
[PATCH] powerpc/32: Remove unnecessary calculations in load_up_{fpu/altivec}
No need to re-read SPRN_THREAD, we can calculate thread address from current (r2). And remove a reload of value 1 into r4 as r4 is already 1. Signed-off-by: Christophe Leroy --- arch/powerpc/kernel/fpu.S | 3 +-- arch/powerpc/kernel/vector.S | 4 +--- 2 files changed, 2 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/kernel/fpu.S b/arch/powerpc/kernel/fpu.S index 6010adcee16e..ba4afe3b5a9c 100644 --- a/arch/powerpc/kernel/fpu.S +++ b/arch/powerpc/kernel/fpu.S @@ -91,8 +91,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX) isync /* enable use of FP after return */ #ifdef CONFIG_PPC32 - mfspr r5,SPRN_SPRG_THREAD /* current task's THREAD (phys) */ - tovirt(r5, r5) + addi r5,r2,THREAD lwz r4,THREAD_FPEXC_MODE(r5) ori r9,r9,MSR_FP /* enable FP for current */ or r9,r9,r4 diff --git a/arch/powerpc/kernel/vector.S b/arch/powerpc/kernel/vector.S index fc120fac1910..ba03eedfdcd8 100644 --- a/arch/powerpc/kernel/vector.S +++ b/arch/powerpc/kernel/vector.S @@ -65,9 +65,8 @@ _GLOBAL(load_up_altivec) 1: /* enable use of VMX after return */ #ifdef CONFIG_PPC32 - mfspr r5,SPRN_SPRG_THREAD /* current task's THREAD (phys) */ + addi r5,r2,THREAD oris r9,r9,MSR_VEC@h - tovirt(r5, r5) #else ld r4,PACACURRENT(r13) addi r5,r4,THREAD /* Get THREAD */ @@ -81,7 +80,6 @@ _GLOBAL(load_up_altivec) li r4,1 stb r4,THREAD_LOAD_VEC(r5) addi r6,r5,THREAD_VRSTATE - li r4,1 li r10,VRSTATE_VSCR stw r4,THREAD_USED_VR(r5) lvx v0,r10,r6 -- 2.25.0
[PATCH v8 3/3] virtio-console: remove unnecessary kmemdup()
This reverts commit c4baad5029 ("virtio-console: avoid DMA from stack"). The hvc framework will never pass stack memory to the put_chars() function, so the call to kmemdup() is unnecessary and can be removed. Signed-off-by: Xianting Tian Reviewed-by: Shile Zhang --- drivers/char/virtio_console.c | 12 ++-- 1 file changed, 2 insertions(+), 10 deletions(-) diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c index 7eaf303a7..4ed3ffb1d 100644 --- a/drivers/char/virtio_console.c +++ b/drivers/char/virtio_console.c @@ -1117,8 +1117,6 @@ static int put_chars(u32 vtermno, const char *buf, int count) { struct port *port; struct scatterlist sg[1]; - void *data; - int ret; if (unlikely(early_put_chars)) return early_put_chars(vtermno, buf, count); @@ -1127,14 +1125,8 @@ static int put_chars(u32 vtermno, const char *buf, int count) if (!port) return -EPIPE; - data = kmemdup(buf, count, GFP_ATOMIC); - if (!data) - return -ENOMEM; - - sg_init_one(sg, data, count); - ret = __send_to_port(port, sg, 1, count, data, false); - kfree(data); - return ret; + sg_init_one(sg, buf, count); + return __send_to_port(port, sg, 1, count, (void *)buf, false); } /* -- 2.17.1
[PATCH v8 2/3] tty: hvc: pass DMA capable memory to put_chars()
As is well known, an hvc backend driver (e.g. virtio-console) can register its operations with the hvc framework. The operations can contain put_chars(), get_chars() and so on. Some hvc backends may do DMA in their operations, e.g. put_chars() of virtio-console. But in the code of the hvc framework, it may pass DMA incapable memory to put_chars() under a specific configuration, which is explained in commit c4baad5029 ("virtio-console: avoid DMA from stack"): 1, c[] is on stack, hvc_console_print(): char c[N_OUTBUF] __ALIGNED__; cons_ops[index]->put_chars(vtermnos[index], c, i); 2, ch is on stack, static void hvc_poll_put_char(,,char ch) { struct tty_struct *tty = driver->ttys[0]; struct hvc_struct *hp = tty->driver_data; int n; do { n = hp->ops->put_chars(hp->vtermno, &ch, 1); } while (n <= 0); } Commit c4baad5029 is just the fix to avoid DMA from stack memory, which is passed to virtio-console by the hvc framework in the above code. But I think the fix is aggressive: it directly uses kmemdup() to allocate a new buffer from the kmalloc area and does a memcpy, no matter whether the memory is in the kmalloc area or not. Most importantly, this had better be fixed in the hvc framework, by changing it to never pass stack memory to the put_chars() function in the first place. Otherwise, we still face the same issue if a new hvc backend using DMA is added in the future. In this patch, we make 'char out_buf[N_OUTBUF]' and 'char out_ch' part of 'struct hvc_struct', so both buffers are no longer on the stack; we can use them in the above two cases respectively. Introduce another array (cons_outbufs[]) for the buffer pointers, next to the cons_ops[] and vtermnos[] arrays. With the array, we can easily find the buffer, instead of traversing the hp list. With this patch, we can remove the fix c4baad5029. 
Signed-off-by: Xianting Tian Reviewed-by: Shile Zhang --- drivers/tty/hvc/hvc_console.c | 27 --- drivers/tty/hvc/hvc_console.h | 16 ++-- 2 files changed, 26 insertions(+), 17 deletions(-) diff --git a/drivers/tty/hvc/hvc_console.c b/drivers/tty/hvc/hvc_console.c index 5bb8c4e44..300e9c037 100644 --- a/drivers/tty/hvc/hvc_console.c +++ b/drivers/tty/hvc/hvc_console.c @@ -41,16 +41,6 @@ */ #define HVC_CLOSE_WAIT (HZ/100) /* 1/10 of a second */ -/* - * These sizes are most efficient for vio, because they are the - * native transfer size. We could make them selectable in the - * future to better deal with backends that want other buffer sizes. - */ -#define N_OUTBUF 16 -#define N_INBUF16 - -#define __ALIGNED__ __attribute__((__aligned__(L1_CACHE_BYTES))) - static struct tty_driver *hvc_driver; static struct task_struct *hvc_task; @@ -142,6 +132,7 @@ static int hvc_flush(struct hvc_struct *hp) static const struct hv_ops *cons_ops[MAX_NR_HVC_CONSOLES]; static uint32_t vtermnos[MAX_NR_HVC_CONSOLES] = {[0 ... MAX_NR_HVC_CONSOLES - 1] = -1}; +static char *cons_outbufs[MAX_NR_HVC_CONSOLES]; /* * Console APIs, NOT TTY. 
These APIs are available immediately when @@ -151,7 +142,7 @@ static uint32_t vtermnos[MAX_NR_HVC_CONSOLES] = static void hvc_console_print(struct console *co, const char *b, unsigned count) { - char c[N_OUTBUF] __ALIGNED__; + char *c; unsigned i = 0, n = 0; int r, donecr = 0, index = co->index; @@ -163,6 +154,10 @@ static void hvc_console_print(struct console *co, const char *b, if (vtermnos[index] == -1) return; + c = cons_outbufs[index]; + if (!c) + return; + while (count > 0 || i > 0) { if (count > 0 && i < sizeof(c)) { if (b[n] == '\n' && !donecr) { @@ -879,8 +874,10 @@ static void hvc_poll_put_char(struct tty_driver *driver, int line, char ch) struct hvc_struct *hp = tty->driver_data; int n; + hp->out_ch = ch; + do { - n = hp->ops->put_chars(hp->vtermno, &ch, 1); + n = hp->ops->put_chars(hp->vtermno, hp->out_ch, 1); } while (n <= 0); } #endif @@ -922,8 +919,7 @@ struct hvc_struct *hvc_alloc(uint32_t vtermno, int data, return ERR_PTR(err); } - hp = kzalloc(ALIGN(sizeof(*hp), sizeof(long)) + outbuf_size, - GFP_KERNEL); + hp = kzalloc(struct_size(hp, outbuf, outbuf_size), GFP_KERNEL); if (!hp) return ERR_PTR(-ENOMEM); @@ -931,7 +927,6 @@ struct hvc_struct *hvc_alloc(uint32_t vtermno, int data, hp->data = data; hp->ops = ops; hp->outbuf_size = outbuf_size; - hp->outbuf = &((char *)hp)[ALIGN(sizeof(*hp), sizeof(long))]; tty_port_init(&hp->port); hp->port.ops = &hvc_port_ops; @@ -964,6 +959,7 @@ struct hvc_struct *hvc_alloc(uint32_t vtermno, int data, if (i < MAX_NR_HVC_CONSOLES) { cons_ops[i] = ops; vtermnos[i] =
[PATCH v8 0/3] make hvc pass dma capable memory to its backend
Dear all, This patch series makes the hvc framework pass DMA capable memory to put_chars() of the hvc backend (e.g. virtio-console), and reverts commit c4baad5029 ("virtio-console: avoid DMA from stack"). V1 virtio-console: avoid DMA from vmalloc area https://lkml.org/lkml/2021/7/27/494 For the v1 patch, Arnd Bergmann suggests fixing the issue in the first place: make hvc pass DMA capable memory to put_chars(). The fix suggestion is included in v2. V2 [PATCH 1/2] tty: hvc: pass DMA capable memory to put_chars() https://lkml.org/lkml/2021/8/1/8 [PATCH 2/2] virtio-console: remove unnecessary kmemdup() https://lkml.org/lkml/2021/8/1/9 For the v2 patch, Arnd Bergmann suggests making the new buffer part of the hvc_struct structure, and fixing the compile issue. The fix suggestion is included in v3. V3 [PATCH v3 1/2] tty: hvc: pass DMA capable memory to put_chars() https://lkml.org/lkml/2021/8/3/1347 [PATCH v3 2/2] virtio-console: remove unnecessary kmemdup() https://lkml.org/lkml/2021/8/3/1348 For the v3 patch, Jiri Slaby suggests making 'char c[N_OUTBUF]' part of hvc_struct, making 'hp->outbuf' aligned and using struct_size() to calculate the size of hvc_struct. The fix suggestion is included in v4. V4 [PATCH v4 0/2] make hvc pass dma capable memory to its backend https://lkml.org/lkml/2021/8/5/1350 [PATCH v4 1/2] tty: hvc: pass DMA capable memory to put_chars() https://lkml.org/lkml/2021/8/5/1351 [PATCH v4 2/2] virtio-console: remove unnecessary kmemdup() https://lkml.org/lkml/2021/8/5/1352 For the v4 patch, Arnd Bergmann suggests introducing another array (cons_outbufs[]) for the buffer pointers, next to the cons_ops[] and vtermnos[] arrays. That fix is included in the v5 patch. V5 Arnd Bergmann suggests using "L1_CACHE_BYTES" as the DMA alignment; using 'sizeof(long)' as the DMA alignment is wrong. This is fixed in v6. V6 V6 contained a coding error; it is fixed in v7, which worked normally according to the test results. 
V7 For the v7 patch, Greg KH suggests adding testing and code-review developers; Jiri Slaby suggests using a lockless buffer and fixing the DMA alignment in a separate patch. The above things are fixed in v8. drivers/tty/hvc/hvc_console.c | 27 --- drivers/tty/hvc/hvc_console.h | 16 ++-- drivers/tty/hvc/hvc_console.h | 16 -- 3 files changed
[PATCH v8 1/3] tty: hvc: use correct dma alignment size
Use L1_CACHE_BYTES as the DMA alignment size; using 'sizeof(long)' is wrong. Signed-off-by: Xianting Tian Reviewed-by: Shile Zhang --- drivers/tty/hvc/hvc_console.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/tty/hvc/hvc_console.c b/drivers/tty/hvc/hvc_console.c index 5bb8c4e44..5957ab728 100644 --- a/drivers/tty/hvc/hvc_console.c +++ b/drivers/tty/hvc/hvc_console.c @@ -49,7 +49,7 @@ #define N_OUTBUF 16 #define N_INBUF 16 -#define __ALIGNED__ __attribute__((__aligned__(sizeof(long)))) +#define __ALIGNED__ __attribute__((__aligned__(L1_CACHE_BYTES))) static struct tty_driver *hvc_driver; static struct task_struct *hvc_task; -- 2.17.1
Re: [PATCH] powerpc/mm: Fix set_memory_*() against concurrent accesses
Fabiano Rosas writes: > Michael Ellerman writes: > > Hi, I already mentioned these things in private, but I'll post here so > everyone can see: > >> Because pte_update() takes the set of PTE bits to set and clear we can't >> use our existing helpers, eg. pte_wrprotect() etc. and instead have to >> open code the set of flags. We will clean that up somehow in a future >> commit. > > I tested the following on P9 and it seems to work fine. Not sure if it > works for CONFIG_PPC_8xx, though. > > > static int change_page_attr(pte_t *ptep, unsigned long addr, void *data) > { > long action = (long)data; > pte_t pte; > > spin_lock(&init_mm.page_table_lock); > - > - /* invalidate the PTE so it's safe to modify */ > - pte = ptep_get_and_clear(&init_mm, addr, ptep); > - flush_tlb_kernel_range(addr, addr + PAGE_SIZE); > + pte = *ptep; > > /* modify the PTE bits as desired, then apply */ > switch (action) { > @@ -59,11 +42,9 @@ static int change_page_attr(pte_t *ptep, unsigned long > addr, void *data) > break; > } > > - set_pte_at(&init_mm, addr, ptep, pte); > + pte_update(&init_mm, addr, ptep, ~0UL, pte_val(pte), 0); I avoided that because it's not atomic, but pte_update() is not atomic on some platforms anyway. And for now at least we still have the page_table_lock as well. So you're right that's a nicer way to do it. And I'll use ptep_get() as Christophe suggested. > + flush_tlb_kernel_range(addr, addr + PAGE_SIZE); > > - /* See ptesync comment in radix__set_pte_at() */ > - if (radix_enabled()) > - asm volatile("ptesync": : :"memory"); I didn't do that because I wanted to keep the patch minimal. We can do that as a separate patch. 
> spin_unlock(&init_mm.page_table_lock); > > return 0; > --- > > For reference, the full patch is here: > https://github.com/farosas/linux/commit/923c95c84d7081d7be9503bf5b276dd93bd17036.patch > >> >> [1]: https://lore.kernel.org/linuxppc-dev/87y318wp9r@linux.ibm.com/ >> >> Fixes: 1f9ad21c3b38 ("powerpc/mm: Implement set_memory() routines") >> Reported-by: Laurent Vivier >> Signed-off-by: Michael Ellerman >> --- > > ... > >> -set_pte_at(&init_mm, addr, ptep, pte); >> +pte_update(&init_mm, addr, ptep, clear, set, 0); >> >> /* See ptesync comment in radix__set_pte_at() */ >> if (radix_enabled()) >> asm volatile("ptesync": : :"memory"); >> + >> +flush_tlb_kernel_range(addr, addr + PAGE_SIZE); > > I think there's an optimization possible here, when relaxing access, to > skip the TLB flush. Would still need the ptesync though. Similar to what > Nick did in e5f7cb58c2b7 ("powerpc/64s/radix: do not flush TLB when > relaxing access"). That commit is specific to Radix, whereas this code needs to work on all platforms. We'd need to verify that it's safe to skip the flush on all platforms and CPU versions. What I think we can do, and would possibly be a more meaningful optimisation, is to move the TLB flush out of the loop and up into change_memory_attr(). So we just do it once for the range, rather than per page. But that too would be a separate patch. cheers
[PATCH] sched/topology: Skip updating masks for non-online nodes
From: Valentin Schneider The scheduler currently expects NUMA node distances to be stable from init onwards, and as a consequence builds the related data structures once-and-for-all at init (see sched_init_numa()). Unfortunately, on some architectures node distance is unreliable for offline nodes and may very well change upon onlining. Skip over offline nodes during sched_init_numa(). Track nodes that have been onlined at least once, and trigger a build of a node's NUMA masks when it is first onlined post-init. Cc: LKML Cc: linuxppc-dev@lists.ozlabs.org Cc: Nathan Lynch Cc: Michael Ellerman Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Valentin Schneider Cc: Gautham R Shenoy Cc: Dietmar Eggemann Cc: Mel Gorman Cc: Vincent Guittot Cc: Rik van Riel Cc: Geetika Moolchandani Cc: Laurent Dufour Reported-by: Geetika Moolchandani Signed-off-by: Srikar Dronamraju Signed-off-by: Valentin Schneider --- Changelog: [Fixed warning: no previous prototype for '__sched_domains_numa_masks_set'] kernel/sched/topology.c | 65 + 1 file changed, 65 insertions(+) diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c index b77ad49dc14f..4e8698e62f07 100644 --- a/kernel/sched/topology.c +++ b/kernel/sched/topology.c @@ -1482,6 +1482,8 @@ int sched_max_numa_distance; static int *sched_domains_numa_distance; static struct cpumask ***sched_domains_numa_masks; int __read_mostly node_reclaim_distance = RECLAIM_DISTANCE; + +static unsigned long __read_mostly *sched_numa_onlined_nodes; #endif /* @@ -1833,6 +1835,16 @@ void sched_init_numa(void) sched_domains_numa_masks[i][j] = mask; for_each_node(k) { + /* +* Distance information can be unreliable for +* offline nodes, defer building the node +* masks to its bringup. +* This relies on all unique distance values +* still being visible at init time. 
+*/ + if (!node_online(j)) + continue; + if (sched_debug() && (node_distance(j, k) != node_distance(k, j))) sched_numa_warn("Node-distance not symmetric"); @@ -1886,6 +1898,53 @@ void sched_init_numa(void) sched_max_numa_distance = sched_domains_numa_distance[nr_levels - 1]; init_numa_topology_type(); + + sched_numa_onlined_nodes = bitmap_alloc(nr_node_ids, GFP_KERNEL); + if (!sched_numa_onlined_nodes) + return; + + bitmap_zero(sched_numa_onlined_nodes, nr_node_ids); + for_each_online_node(i) + bitmap_set(sched_numa_onlined_nodes, i, 1); +} + +static void __sched_domains_numa_masks_set(unsigned int node) +{ + int i, j; + + /* +* NUMA masks are not built for offline nodes in sched_init_numa(). +* Thus, when a CPU of a never-onlined-before node gets plugged in, +* adding that new CPU to the right NUMA masks is not sufficient: the +* masks of that CPU's node must also be updated. +*/ + if (test_bit(node, sched_numa_onlined_nodes)) + return; + + bitmap_set(sched_numa_onlined_nodes, node, 1); + + for (i = 0; i < sched_domains_numa_levels; i++) { + for (j = 0; j < nr_node_ids; j++) { + if (!node_online(j) || node == j) + continue; + + if (node_distance(j, node) > sched_domains_numa_distance[i]) + continue; + + /* Add remote nodes in our masks */ + cpumask_or(sched_domains_numa_masks[i][node], + sched_domains_numa_masks[i][node], + sched_domains_numa_masks[0][j]); + } + } + + /* +* A new node has been brought up, potentially changing the topology +* classification. 
+* +* Note that this is racy vs any use of sched_numa_topology_type :/ +*/ + init_numa_topology_type(); } void sched_domains_numa_masks_set(unsigned int cpu) @@ -1893,8 +1952,14 @@ void sched_domains_numa_masks_set(unsigned int cpu) int node = cpu_to_node(cpu); int i, j; + __sched_domains_numa_masks_set(node); + for (i = 0; i < sched_domains_numa_levels; i++) { for (j = 0; j < nr_node_ids; j++) { + if (!node_online(j)) + continue; + + /* Set ourselves in the remote node's masks */ if (node_distance(j, node) <= sched_domains_numa_distance[i])
[PATCH v2] powerpc/32s: Fix random crashes by adding isync() after locking/unlocking KUEP
Commit b5efec00b671 ("powerpc/32s: Move KUEP locking/unlocking in C") removed the 'isync' instruction after adding/removing NX bit in user segments. The reasoning behind this change was that when setting the NX bit we don't mind it taking effect with delay as the kernel never executes text from userspace, and when clearing the NX bit this is to return to userspace and then the 'rfi' should synchronise the context. However, it looks like on book3s/32 having a hash page table, at least on the G3 processor, we get an unexpected fault from userspace, then this is followed by something wrong in the verification of MSR_PR at end of another interrupt. This is fixed by adding back the removed isync() following update of NX bit in user segment registers. Only do it for cores with an hash table, as 603 cores don't exhibit that problem and the two isync increase ./null_syscall selftest by 6 cycles on an MPC 832x. First problem: unexpected WARN_ON() for mysterious PROTFAULT [ 62.896426] WARNING: CPU: 0 PID: 1660 at arch/powerpc/mm/fault.c:354 do_page_fault+0x6c/0x5b0 [ 62.918111] Modules linked in: [ 62.923350] CPU: 0 PID: 1660 Comm: Xorg Not tainted 5.13.0-pmac-00028-gb3c15b60339a #40 [ 62.943476] NIP: c001b5c8 LR: c001b6f8 CTR: [ 62.954714] REGS: e2d09e40 TRAP: 0700 Not tainted (5.13.0-pmac-00028-gb3c15b60339a) [ 62.974581] MSR: 00021032 CR: 42d04f30 XER: 2000 [ 62.990009] GPR00: c000424c e2d09f00 c301b680 e2d09f40 001e 4200 00cba028 GPR08: 0800 4810 c301b680 e2d09f30 22d09f30 00c1fff0 00cba000 a7b7ba4c GPR16: 0031 a7b7b0d0 00c5c010 GPR24: a7b7b64c a7b7d2f0 0004 c1efa6c0 00cba02c 0300 e2d09f40 [ 63.075238] NIP [c001b5c8] do_page_fault+0x6c/0x5b0 [ 63.085952] LR [c001b6f8] do_page_fault+0x19c/0x5b0 [ 63.096678] Call Trace: [ 63.100075] [e2d09f00] [e2d09f04] 0xe2d09f04 (unreliable) [ 63.112359] [e2d09f30] [c000424c] DataAccess_virt+0xd4/0xe4 [ 63.125168] --- interrupt: 300 at 0xa7a261dc [ 63.134060] NIP: a7a261dc LR: a7a253bc CTR: [ 63.145302] REGS: e2d09f40 TRAP: 0300 Not 
tainted (5.13.0-pmac-00028-gb3c15b60339a) [ 63.165167] MSR: d032 CR: 228428e2 XER: 2000 [ 63.182162] DAR: 00cba02c DSISR: 4200 GPR00: a7a27448 afa6b0e0 a74c35c0 a7b7b614 001e a7b7b614 00cba028 GPR08: 00020fd9 0031 00cb9ff8 a7a273b0 220028e2 00c1fff0 00cba000 a7b7ba4c GPR16: 0031 a7b7b0d0 00c5c010 GPR24: a7b7b64c a7b7d2f0 0004 0002 001e a7b7b614 a7b7aff4 0030 [ 63.275233] NIP [a7a261dc] 0xa7a261dc [ 63.282291] LR [a7a253bc] 0xa7a253bc [ 63.289087] --- interrupt: 300 [ 63.294322] Instruction dump: [ 63.299291] 7c4a1378 810300a0 75278410 83820298 83a300a4 553b018c 551e0036 4082038c [ 63.318630] 2e1b 40920228 75280800 41820220 <0fe0> 3b60 41920214 81420594 [ 63.338503] ---[ end trace f642a84639cba377 ]--- Second problem: MSR PR is seen unset allthough the interrupt frame shows it set [ 63.666846] kernel BUG at arch/powerpc/kernel/interrupt.c:458! [ 63.680633] Oops: Exception in kernel mode, sig: 5 [#1] [ 63.692201] BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2 PowerMac [ 63.705011] Modules linked in: [ 63.710247] CPU: 0 PID: 1660 Comm: Xorg Tainted: GW 5.13.0-pmac-00028-gb3c15b60339a #40 [ 63.734553] NIP: c0011434 LR: c001629c CTR: [ 63.745796] REGS: e2d09e70 TRAP: 0700 Tainted: GW (5.13.0-pmac-00028-gb3c15b60339a) [ 63.769844] MSR: 00029032 CR: 42d09f30 XER: [ 63.786052] GPR00: e2d09f30 c301b680 e2d09f40 8344 c44d0e68 e2d09e8c GPR08: 0002 00dc228a 4000 e2d09f30 22d09f30 00c1fff0 afa6ceb4 00c26144 GPR16: 00c25fb8 00c26140 afa6ceb8 9000 00c944d8 001c 0020 GPR24: 01fb afa6d1b4 0001 a539a2a0 a530fd80 0089 [ 63.871284] NIP [c0011434] interrupt_exit_kernel_prepare+0x10/0x70 [ 63.885922] LR [c001629c] interrupt_return+0x9c/0x144 [ 63.897168] Call Trace: [ 63.900562] [e2d09f30] [c000424c] DataAccess_virt+0xd4/0xe4 (unreliable) [ 63.916773] --- interrupt: 300 at 0xa09be008 [ 63.925659] NIP: a09be008 LR: a09bdfe8 CTR: a09bdfc0 [ 63.936903] REGS: e2d09f40 TRAP: 0300 Tainted: GW
Re: [PATCH v2 61/63] powerpc: Split memset() to avoid multi-field overflow
On 18/08/2021 at 08:05, Kees Cook wrote: In preparation for FORTIFY_SOURCE performing compile-time and run-time field bounds checking for memset(), avoid intentionally writing across neighboring fields. Instead of writing across a field boundary with memset(), move the call to just the array, and an explicit zeroing of the prior field. Cc: Benjamin Herrenschmidt Cc: Qinglang Miao Cc: "Gustavo A. R. Silva" Cc: Hulk Robot Cc: Wang Wensheng Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Kees Cook Reviewed-by: Michael Ellerman Link: https://lore.kernel.org/lkml/87czqsnmw9@mpe.ellerman.id.au --- drivers/macintosh/smu.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/macintosh/smu.c b/drivers/macintosh/smu.c index 94fb63a7b357..59ce431da7ef 100644 --- a/drivers/macintosh/smu.c +++ b/drivers/macintosh/smu.c @@ -848,7 +848,8 @@ int smu_queue_i2c(struct smu_i2c_cmd *cmd) cmd->read = cmd->info.devaddr & 0x01; switch(cmd->info.type) { case SMU_I2C_TRANSFER_SIMPLE: - memset(&cmd->info.sublen, 0, 4); + cmd->info.sublen = 0; + memset(&cmd->info.subaddr, 0, 3); subaddr[] is an array, so should the & be avoided? And while at it, why not use sizeof(subaddr) instead of 3? break; case SMU_I2C_TRANSFER_COMBINED: cmd->info.devaddr &= 0xfe;
[PATCH v2 57/63] powerpc/signal32: Use struct_group() to zero spe regs
In preparation for FORTIFY_SOURCE performing compile-time and run-time field bounds checking for memset(), avoid intentionally writing across neighboring fields. Add a struct_group() for the spe registers so that memset() can correctly reason about the size:

In function 'fortify_memset_chk',
    inlined from 'restore_user_regs.part.0' at arch/powerpc/kernel/signal_32.c:539:3:
>> include/linux/fortify-string.h:195:4: error: call to '__write_overflow_field' declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Werror=attribute-warning]
     195 |	__write_overflow_field();
         |	^~~~

Cc: Michael Ellerman
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Christophe Leroy
Cc: Sudeep Holla
Cc: linuxppc-dev@lists.ozlabs.org
Reported-by: kernel test robot
Signed-off-by: Kees Cook
---
 arch/powerpc/include/asm/processor.h | 6 ++++--
 arch/powerpc/kernel/signal_32.c      | 6 +++---
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index f348e564f7dd..05dc567cb9a8 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -191,8 +191,10 @@ struct thread_struct {
 	int	used_vsr;	/* set if process has used VSX */
 #endif /* CONFIG_VSX */
 #ifdef CONFIG_SPE
-	unsigned long	evr[32];	/* upper 32-bits of SPE regs */
-	u64	acc;			/* Accumulator */
+	struct_group(spe,
+		unsigned long	evr[32];	/* upper 32-bits of SPE regs */
+		u64	acc;			/* Accumulator */
+	);
 	unsigned long	spefscr;	/* SPE & eFP status */
 	unsigned long	spefscr_last;	/* SPEFSCR value on last prctl call or trap return */
diff --git a/arch/powerpc/kernel/signal_32.c b/arch/powerpc/kernel/signal_32.c
index 0608581967f0..77b86caf5c51 100644
--- a/arch/powerpc/kernel/signal_32.c
+++ b/arch/powerpc/kernel/signal_32.c
@@ -532,11 +532,11 @@ static long restore_user_regs(struct pt_regs *regs,
 	regs_set_return_msr(regs, regs->msr & ~MSR_SPE);
 	if (msr & MSR_SPE) {
 		/* restore spe registers from the stack */
-		unsafe_copy_from_user(current->thread.evr, &sr->mc_vregs,
-				      ELF_NEVRREG * sizeof(u32), failed);
+		unsafe_copy_from_user(&current->thread.spe, &sr->mc_vregs,
+				      sizeof(current->thread.spe), failed);
 		current->thread.used_spe = true;
 	} else if (current->thread.used_spe)
-		memset(current->thread.evr, 0, ELF_NEVRREG * sizeof(u32));
+		memset(&current->thread.spe, 0, sizeof(current->thread.spe));

 	/* Always get SPEFSCR back */
 	unsafe_get_user(current->thread.spefscr,
			(u32 __user *)&sr->mc_vregs + ELF_NEVRREG, failed);
--
2.30.2
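The struct_group() idea above can be sketched in plain C11. The macro below is a simplified userspace stand-in for the kernel's struct_group() (from include/linux/stddef.h): it wraps the members in an anonymous union so they stay addressable individually while also being addressable as one named group, letting memset() use a sizeof() that exactly matches the region being zeroed. The struct and function names here are illustrative, not the kernel's.

```c
#include <assert.h>
#include <string.h>

/* Simplified stand-in for the kernel's struct_group() macro:
 * an anonymous union of an anonymous struct (members usable
 * directly) and an identical named struct (the whole group
 * usable as one object for memset()/sizeof()). */
#define struct_group(NAME, ...) \
	union { \
		struct { __VA_ARGS__ }; \
		struct { __VA_ARGS__ } NAME; \
	}

/* Illustrative mirror of the thread_struct layout in the patch. */
struct thread_regs {
	struct_group(spe,
		unsigned long evr[32];		/* upper 32 bits of SPE regs */
		unsigned long long acc;		/* accumulator */
	);
	unsigned long spefscr;	/* deliberately outside the group */
};

/* Zero exactly the grouped registers; the write now stays inside
 * one "field" (the group), which is what FORTIFY_SOURCE checks. */
static void clear_spe(struct thread_regs *t)
{
	memset(&t->spe, 0, sizeof(t->spe));
}
```

Members outside the group (spefscr here) are untouched, which is precisely the guarantee the fortified memset() wants to be able to verify at compile time.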
[PATCH v2 61/63] powerpc: Split memset() to avoid multi-field overflow
In preparation for FORTIFY_SOURCE performing compile-time and run-time field bounds checking for memset(), avoid intentionally writing across neighboring fields. Instead of writing across a field boundary with memset(), move the call to just the array, and add an explicit zeroing of the prior field.

Cc: Benjamin Herrenschmidt
Cc: Qinglang Miao
Cc: "Gustavo A. R. Silva"
Cc: Hulk Robot
Cc: Wang Wensheng
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Kees Cook
Reviewed-by: Michael Ellerman
Link: https://lore.kernel.org/lkml/87czqsnmw9@mpe.ellerman.id.au
---
 drivers/macintosh/smu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/macintosh/smu.c b/drivers/macintosh/smu.c
index 94fb63a7b357..59ce431da7ef 100644
--- a/drivers/macintosh/smu.c
+++ b/drivers/macintosh/smu.c
@@ -848,7 +848,8 @@ int smu_queue_i2c(struct smu_i2c_cmd *cmd)
 	cmd->read = cmd->info.devaddr & 0x01;
 	switch(cmd->info.type) {
 	case SMU_I2C_TRANSFER_SIMPLE:
-		memset(&cmd->info.sublen, 0, 4);
+		cmd->info.sublen = 0;
+		memset(&cmd->info.subaddr, 0, 3);
 		break;
 	case SMU_I2C_TRANSFER_COMBINED:
 		cmd->info.devaddr &= 0xfe;
--
2.30.2
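The transformation in this patch can be sketched in standalone C. The struct below is a hypothetical layout loosely mirroring smu_i2c_param (not the kernel's exact definition): a one-byte length field followed by a three-byte subaddress. The "before" form wrote 4 bytes starting at the length field, crossing into the array; the "after" form assigns each field separately so every write stays inside a single field.

```c
#include <assert.h>
#include <string.h>
#include <stdint.h>

/* Hypothetical layout mirroring the sublen/subaddr pair in
 * struct smu_i2c_param; field order and sizes are illustrative. */
struct i2c_param {
	uint8_t devaddr;
	uint8_t sublen;
	uint8_t subaddr[3];
};

/* Before: memset(&p->sublen, 0, 4) relied on sublen and subaddr
 * being adjacent and wrote across the field boundary.
 * After: one plain assignment per field, with sizeof() instead of
 * a magic length, so FORTIFY_SOURCE can bound-check each write. */
static void clear_subaddress(struct i2c_param *p)
{
	p->sublen = 0;
	memset(p->subaddr, 0, sizeof(p->subaddr));
}
```

This also addresses the review comments on the patch: passing the array itself (no &) and using sizeof(p->subaddr) rather than the literal 3.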
[PATCH v2 36/63] scsi: ibmvscsi: Avoid multi-field memset() overflow by aiming at srp
In preparation for FORTIFY_SOURCE performing compile-time and run-time field bounds checking for memset(), avoid intentionally writing across neighboring fields. Instead of writing beyond the end of evt_struct->iu.srp.cmd, target the upper union (evt_struct->iu.srp) instead, as that's what is being wiped.

Cc: Tyrel Datwyler
Cc: Michael Ellerman
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: "James E.J. Bottomley"
Cc: "Martin K. Petersen"
Cc: linux-s...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Kees Cook
Acked-by: Martin K. Petersen
Link: https://lore.kernel.org/lkml/yq135rzp79c@ca-mkp.ca.oracle.com
Acked-by: Tyrel Datwyler
Link: https://lore.kernel.org/lkml/6eae8434-e9a7-aa74-628b-b515b3695...@linux.ibm.com
---
 drivers/scsi/ibmvscsi/ibmvscsi.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvscsi.c b/drivers/scsi/ibmvscsi/ibmvscsi.c
index 50df7dd9cb91..ea8e01f49cba 100644
--- a/drivers/scsi/ibmvscsi/ibmvscsi.c
+++ b/drivers/scsi/ibmvscsi/ibmvscsi.c
@@ -1055,8 +1055,9 @@ static int ibmvscsi_queuecommand_lck(struct scsi_cmnd *cmnd,
 		return SCSI_MLQUEUE_HOST_BUSY;

 	/* Set up the actual SRP IU */
+	BUILD_BUG_ON(sizeof(evt_struct->iu.srp) != SRP_MAX_IU_LEN);
+	memset(&evt_struct->iu.srp, 0x00, sizeof(evt_struct->iu.srp));
 	srp_cmd = &evt_struct->iu.srp.cmd;
-	memset(srp_cmd, 0x00, SRP_MAX_IU_LEN);
 	srp_cmd->opcode = SRP_CMD;
 	memcpy(srp_cmd->cdb, cmnd->cmnd, sizeof(srp_cmd->cdb));
 	int_to_scsilun(lun, &srp_cmd->lun);
--
2.30.2
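The pattern here, wiping the enclosing union rather than one member with the union's size, can be sketched in userspace C. The union, size constant, and opcode below are hypothetical stand-ins for the driver's SRP IU types (not the real srp.h definitions); _Static_assert plays the role of the kernel's BUILD_BUG_ON() size guard.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the SRP IU and its size cap. */
#define SRP_MAX_IU_LEN 16

union srp_iu {
	struct {
		unsigned char opcode;
		unsigned char cdb[15];
	} cmd;				/* one member, smaller than the union */
	unsigned char raw[SRP_MAX_IU_LEN];
};

/* Compile-time guard equivalent to the patch's BUILD_BUG_ON():
 * if the union ever stops matching SRP_MAX_IU_LEN, the build fails
 * instead of the memset() silently writing the wrong amount. */
_Static_assert(sizeof(union srp_iu) == SRP_MAX_IU_LEN,
	       "srp_iu size must match SRP_MAX_IU_LEN");

/* Wipe the whole union (the object actually being initialized),
 * rather than memset()ing the smaller .cmd member with the union's
 * length, so the write stays inside the field it names. */
static void init_iu(union srp_iu *iu)
{
	memset(iu, 0x00, sizeof(*iu));
	iu->cmd.opcode = 0x02;	/* illustrative opcode, not SRP_CMD's value */
}
```

The memset() length and its target now name the same object, which is exactly the property FORTIFY_SOURCE's field bounds checking enforces.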