[PATCH v3] powerpc/kexec_file: use current CPU info while setting up FDT
kexec_file_load uses initial_boot_params in setting up the device-tree for the kernel to be loaded. Though initial_boot_params holds info about CPUs at the time of boot, it doesn't account for hot added CPUs. So, kexec'ing with kexec_file_load syscall would leave the kexec'ed kernel with inaccurate CPU info. Also, if kdump kernel is loaded with kexec_file_load syscall and the system crashes on a hot added CPU, capture kernel hangs failing to identify the boot CPU. Kernel panic - not syncing: sysrq triggered crash CPU: 24 PID: 6065 Comm: echo Kdump: loaded Not tainted 5.12.0-rc5upstream #54 Call Trace: [c000e590fac0] [c07b2400] dump_stack+0xc4/0x114 (unreliable) [c000e590fb00] [c0145290] panic+0x16c/0x41c [c000e590fba0] [c08892e0] sysrq_handle_crash+0x30/0x40 [c000e590fc00] [c0889cdc] __handle_sysrq+0xcc/0x1f0 [c000e590fca0] [c088a538] write_sysrq_trigger+0xd8/0x178 [c000e590fce0] [c05e9b7c] proc_reg_write+0x10c/0x1b0 [c000e590fd10] [c04f26d0] vfs_write+0xf0/0x330 [c000e590fd60] [c04f2aec] ksys_write+0x7c/0x140 [c000e590fdb0] [c0031ee0] system_call_exception+0x150/0x290 [c000e590fe10] [c000ca5c] system_call_common+0xec/0x278 --- interrupt: c00 at 0x7fff905b9664 NIP: 7fff905b9664 LR: 7fff905320c4 CTR: REGS: c000e590fe80 TRAP: 0c00 Not tainted (5.12.0-rc5upstream) MSR: 8280f033 CR: 28000242 XER: IRQMASK: 0 GPR00: 0004 75fedf30 7fff906a7300 0001 GPR04: 01002a7355b0 0002 0001 75fef616 GPR08: 0001 GPR12: 7fff9073a160 GPR16: GPR20: 7fff906a4ee0 0002 0001 GPR24: 7fff906a0898 0002 01002a7355b0 GPR28: 0002 7fff906a1790 01002a7355b0 0002 NIP [7fff905b9664] 0x7fff905b9664 LR [7fff905320c4] 0x7fff905320c4 --- interrupt: c00 To avoid this from happening, extract current CPU info from of_root device node and use it for setting up the fdt in kexec_file_load case. Fixes: 6ecd0163d360 ("powerpc/kexec_file: Add appropriate regions for memory reserve map") Signed-off-by: Sourabh Jain Cc: --- arch/powerpc/kexec/file_load_64.c | 98 +++ 1 file changed, 98 insertions(+) --- Changelog: v1 -> v2 - fdt should be updated regardless of kexec type - updated commit message and title v2 -> v3 - Fixed warnings reported by patchwork (https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20210416124658.718860-1-sourabhj...@linux.ibm.com/) - argument aligned to open parenthesis - declared add_node_prop and update_cpus_node function static --- diff --git a/arch/powerpc/kexec/file_load_64.c b/arch/powerpc/kexec/file_load_64.c index 02b9e4d0dc40..878f8297fbed 100644 --- a/arch/powerpc/kexec/file_load_64.c +++ b/arch/powerpc/kexec/file_load_64.c @@ -960,6 +960,99 @@ unsigned int kexec_fdt_totalsize_ppc64(struct kimage *image) return fdt_size; } +/** + * add_node_prop - Read property from device node structure and add + * them to fdt. + * @fdt: Flattened device tree of the kernel + * @node_offset: offset of the node to add a property at + * np: device node pointer + * + * Returns 0 on success, negative errno on error. + */ +static int add_node_prop(void *fdt, int node_offset, const struct device_node *np) +{ + int ret = 0; + struct property *pp; + unsigned long flags; + + if (!np) + return -EINVAL; + + raw_spin_lock_irqsave(&devtree_lock, flags); + for (pp = np->properties; pp; pp = pp->next) { + ret = fdt_setprop(fdt, node_offset, pp->name, + pp->value, pp->length); + if (ret < 0) { + pr_err("Unable to add %s property: %s\n", + pp->name, fdt_strerror(ret)); + goto out; + } + } +out: + raw_spin_unlock_irqrestore(&devtree_lock, flags); + return ret; +} + +/** + * update_cpus_node - Update cpus node of flattened device-tree using of_root + * device node. + * @fdt: Flattened device tree of the kernel. + * + * Returns 0 on success, negative errno on error. + */ +static int update_cpus_node(void *fdt) +{ + struct device_node *cpus_node, *dn; + int cpus_offset, cpus_subnode_off, ret = 0; + + cpus_offset = fdt_path_offset(fdt, "/cpus"); + if (cpus_offset == -FDT_ERR_NOTFOUND || cpus_offset > 0) { + if (cpus_offset > 0) { + ret = fdt_del_n
Re: [PATCH v2] tools: do not include scripts/Kbuild.include
On 4/16/21 6:00 AM, Masahiro Yamada wrote: Since commit d9f4ff50d2aa ("kbuild: spilt cc-option and friends to scripts/Makefile.compiler"), some kselftests fail to build. The tools/ directory opted out Kbuild, and went in a different direction. They copy any kind of files to the tools/ directory in order to do whatever they want in their world. tools/build/Build.include mimics scripts/Kbuild.include, but some tool Makefiles included the Kbuild one to import a feature that is missing in tools/build/Build.include: - Commit ec04aa3ae87b ("tools/thermal: tmon: use "-fstack-protector" only if supported") included scripts/Kbuild.include from tools/thermal/tmon/Makefile to import the cc-option macro. - Commit c2390f16fc5b ("selftests: kvm: fix for compilers that do not support -no-pie") included scripts/Kbuild.include from tools/testing/selftests/kvm/Makefile to import the try-run macro. - Commit 9cae4ace80ef ("selftests/bpf: do not ignore clang failures") included scripts/Kbuild.include from tools/testing/selftests/bpf/Makefile to import the .DELETE_ON_ERROR target. - Commit 0695f8bca93e ("selftests/powerpc: Handle Makefile for unrecognized option") included scripts/Kbuild.include from tools/testing/selftests/powerpc/pmu/ebb/Makefile to import the try-run macro. Copy what they need into tools/build/Build.include, and make them include it instead of scripts/Kbuild.include. Link: https://lore.kernel.org/lkml/86dadf33-70f7-a5ac-cb8c-64966d2f4...@linux.ibm.com/ Fixes: d9f4ff50d2aa ("kbuild: spilt cc-option and friends to scripts/Makefile.compiler") Reported-by: Janosch Frank Reported-by: Christian Borntraeger Signed-off-by: Masahiro Yamada LGTM although I see some tools Makefile directly added ".DELETE_ON_ERROR:" in their Makefile. Acked-by: Yonghong Song
Re: [PATCH 1/1] mm: Fix struct page layout on 32-bit systems
On Fri, Apr 16, 2021 at 07:08:23PM +0200, Jesper Dangaard Brouer wrote: > On Fri, 16 Apr 2021 16:27:55 +0100 > Matthew Wilcox wrote: > > > On Thu, Apr 15, 2021 at 08:08:32PM +0200, Jesper Dangaard Brouer wrote: > > > See below patch. Where I swap32 the dma address to satisfy > > > page->compound having bit zero cleared. (It is the simplest fix I could > > > come up with). > > > > I think this is slightly simpler, and as a bonus code that assumes the > > old layout won't compile. > > This is clever, I like it! When reading the code one just have to > remember 'unsigned long' size difference between 64-bit vs 32-bit. > And I assume compiler can optimize the sizeof check out then doable. I checked before/after with the replacement patch that doesn't have compiler warnings. On x86, there is zero codegen difference (objdump -dr before/after matches exactly) for both x86-32 with 32-bit dma_addr_t and x86-64. For x86-32 with 64-bit dma_addr_t, the compiler makes some different inlining decisions in page_pool_empty_ring(), page_pool_put_page() and page_pool_put_page_bulk(), but it's not clear to me that they're wrong. Function old new delta page_pool_empty_ring 387 307 -80 page_pool_put_page 604 516 -88 page_pool_put_page_bulk 690 517-173
Re: [PATCH 1/2] mm: Fix struct page layout on 32-bit systems
Replacement patch to fix compiler warning. From: "Matthew Wilcox (Oracle)" Date: Fri, 16 Apr 2021 16:34:55 -0400 Subject: [PATCH 1/2] mm: Fix struct page layout on 32-bit systems To: bro...@redhat.com Cc: linux-ker...@vger.kernel.org, linux...@kvack.org, net...@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-arm-ker...@lists.infradead.org, linux-m...@vger.kernel.org, ilias.apalodi...@linaro.org, mcr...@linux.microsoft.com, grygorii.stras...@ti.com, a...@kernel.org, h...@lst.de, linux-snps-...@lists.infradead.org, mho...@kernel.org, mgor...@suse.de 32-bit architectures which expect 8-byte alignment for 8-byte integers and need 64-bit DMA addresses (arc, arm, mips, ppc) had their struct page inadvertently expanded in 2019. When the dma_addr_t was added, it forced the alignment of the union to 8 bytes, which inserted a 4 byte gap between 'flags' and the union. Fix this by storing the dma_addr_t in one or two adjacent unsigned longs. This restores the alignment to that of an unsigned long, and also fixes a potential problem where (on a big endian platform), the bit used to denote PageTail could inadvertently get set, and a racing get_user_pages_fast() could dereference a bogus compound_head(). Fixes: c25fff7171be ("mm: add dma_addr_t to struct page") Signed-off-by: Matthew Wilcox (Oracle) --- include/linux/mm_types.h | 4 ++-- include/net/page_pool.h | 12 +++- net/core/page_pool.c | 12 +++- 3 files changed, 20 insertions(+), 8 deletions(-) diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 6613b26a8894..5aacc1c10a45 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -97,10 +97,10 @@ struct page { }; struct {/* page_pool used by netstack */ /** -* @dma_addr: might require a 64-bit value even on +* @dma_addr: might require a 64-bit value on * 32-bit architectures. */ - dma_addr_t dma_addr; + unsigned long dma_addr[2]; }; struct {/* slab, slob and slub */ union { diff --git a/include/net/page_pool.h b/include/net/page_pool.h index b5b195305346..ad6154dc206c 100644 --- a/include/net/page_pool.h +++ b/include/net/page_pool.h @@ -198,7 +198,17 @@ static inline void page_pool_recycle_direct(struct page_pool *pool, static inline dma_addr_t page_pool_get_dma_addr(struct page *page) { - return page->dma_addr; + dma_addr_t ret = page->dma_addr[0]; + if (sizeof(dma_addr_t) > sizeof(unsigned long)) + ret |= (dma_addr_t)page->dma_addr[1] << 16 << 16; + return ret; +} + +static inline void page_pool_set_dma_addr(struct page *page, dma_addr_t addr) +{ + page->dma_addr[0] = addr; + if (sizeof(dma_addr_t) > sizeof(unsigned long)) + page->dma_addr[1] = addr >> 16 >> 16; } static inline bool is_page_pool_compiled_in(void) diff --git a/net/core/page_pool.c b/net/core/page_pool.c index ad8b0707af04..f014fd8c19a6 100644 --- a/net/core/page_pool.c +++ b/net/core/page_pool.c @@ -174,8 +174,10 @@ static void page_pool_dma_sync_for_device(struct page_pool *pool, struct page *page, unsigned int dma_sync_size) { + dma_addr_t dma_addr = page_pool_get_dma_addr(page); + dma_sync_size = min(dma_sync_size, pool->p.max_len); - dma_sync_single_range_for_device(pool->p.dev, page->dma_addr, + dma_sync_single_range_for_device(pool->p.dev, dma_addr, pool->p.offset, dma_sync_size, pool->p.dma_dir); } @@ -226,7 +228,7 @@ static struct page *__page_pool_alloc_pages_slow(struct page_pool *pool, put_page(page); return NULL; } - page->dma_addr = dma; + page_pool_set_dma_addr(page, dma); if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV) page_pool_dma_sync_for_device(pool, page, pool->p.max_len); @@ -294,13 +296,13 @@ void page_pool_release_page(struct page_pool *pool, struct page *page) */ goto skip_dma_unmap; - dma = page->dma_addr; + dma = page_pool_get_dma_addr(page); - /* When page is unmapped, it cannot be returned our pool */ + /* When page is unmapped, it cannot be returned to our pool */ dma_unmap_page_attrs(pool->p.dev, dma, PAGE_SIZE << pool->p.order, pool->p.dma_dir, DMA_ATTR_SKIP_CPU_SYNC); - page->dma_addr = 0; + page_pool_set_dma_addr(page, 0); skip_dma_unmap: /* This may be the last page returned, releasing the pool, so * it is not safe to reference pool aft
Re: [PATCH v13 14/14] powerpc/64s/radix: Enable huge vmalloc mappings
Excerpts from Andrew Morton's message of April 16, 2021 4:55 am: > On Thu, 15 Apr 2021 12:23:55 +0200 Christophe Leroy > wrote: >> > + * is done. STRICT_MODULE_RWX may require extra work to support this >> > + * too. >> > + */ >> > >> > - return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END, >> > GFP_KERNEL, >> > - PAGE_KERNEL_EXEC, VM_FLUSH_RESET_PERMS, >> > NUMA_NO_NODE, >> >> >> I think you should add the following in >> >> #ifndef MODULES_VADDR >> #define MODULES_VADDR VMALLOC_START >> #define MODULES_END VMALLOC_END >> #endif >> >> And leave module_alloc() as is (just removing the enclosing #ifdef >> MODULES_VADDR and adding the >> VM_NO_HUGE_VMAP flag) >> >> This would minimise the conflits with the changes I did in powerpc/next >> reported by Stephen R. >> > > I'll drop powerpc-64s-radix-enable-huge-vmalloc-mappings.patch for now, > make life simpler. Yeah that's fine. > Nick, a redo on top of Christophe's changes in linux-next would be best > please. Will do. Thanks, Nick
Re: [PATCH 1/2] mm: Fix struct page layout on 32-bit systems
Hi "Matthew, Thank you for the patch! Perhaps something to improve: [auto build test WARNING on linux/master] [also build test WARNING on linus/master v5.12-rc7] [cannot apply to hnaz-linux-mm/master next-20210416] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch] url: https://github.com/0day-ci/linux/commits/Matthew-Wilcox-Oracle/Change-struct-page-layout-for-page_pool/20210417-070951 base: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 5e46d1b78a03d52306f21f77a4e4a144b6d31486 config: parisc-randconfig-s031-20210416 (attached as .config) compiler: hppa-linux-gcc (GCC) 9.3.0 reproduce: wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # apt-get install sparse # sparse version: v0.6.3-280-g2cd6d34e-dirty # https://github.com/0day-ci/linux/commit/898e155048088be20b2606575a24108eacc4c91b git remote add linux-review https://github.com/0day-ci/linux git fetch --no-tags linux-review Matthew-Wilcox-Oracle/Change-struct-page-layout-for-page_pool/20210417-070951 git checkout 898e155048088be20b2606575a24108eacc4c91b # save the attached .config to linux build tree COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' W=1 ARCH=parisc If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot All warnings (new ones prefixed by >>): In file included from net/core/xdp.c:15: include/net/page_pool.h: In function 'page_pool_get_dma_addr': >> include/net/page_pool.h:203:40: warning: left shift count >= width of type >> [-Wshift-count-overflow] 203 | ret |= (dma_addr_t)page->dma_addr[1] << 32; |^~ include/net/page_pool.h: In function 'page_pool_set_dma_addr': >> include/net/page_pool.h:211:28: warning: right shift count >= width of type >> [-Wshift-count-overflow] 211 | page->dma_addr[1] = addr >> 32; |^~ vim +203 include/net/page_pool.h 198 199 static inline dma_addr_t page_pool_get_dma_addr(struct page *page) 200 { 201 dma_addr_t ret = page->dma_addr[0]; 202 if (sizeof(dma_addr_t) > sizeof(unsigned long)) > 203 ret |= (dma_addr_t)page->dma_addr[1] << 32; 204 return ret; 205 } 206 207 static inline void page_pool_set_dma_addr(struct page *page, dma_addr_t addr) 208 { 209 page->dma_addr[0] = addr; 210 if (sizeof(dma_addr_t) > sizeof(unsigned long)) > 211 page->dma_addr[1] = addr >> 32; 212 } 213 --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org .config.gz Description: application/gzip
Re: [PATCH v3] powerpc: fix EDEADLOCK redefinition error in uapi/asm/errno.h
On Fri, 16 Apr 2021 at 03:41, Michael Ellerman wrote: > > Tony Ambardar writes: > > Hello Michael, > > > > The latest version of this patch addressed all feedback I'm aware of > > when submitted last September, and I've seen no further comments from > > reviewers since then. > > > > Could you please let me know where this stands and if anything further > > is needed? > > Sorry, it's still sitting in my inbox :/ > > I was going to reply to suggest we split the tools change out. The > headers under tools are usually updated by another maintainer, I think > it might even be scripted. > > Anyway I've applied your patch and done that (dropped the change to > tools/.../errno.h), which should also mean the stable backport is more > likely to work automatically. > > It will hit mainline in v5.13-rc1 and then be backported to the stable > trees. > > I don't think you actually need the tools version of the header updated > to fix your bug? In which case we can probably just wait for it to be > updated automatically when the tools headers are sync'ed with the kernel > versions. > > cheers I appreciate the follow up. My original bug was indeed with the tools header but is being patched locally, so waiting for those headers to sync with the kernel versions is fine if it simplifies things overall. Thanks, Tony
Re: [PATCH net-next v4 2/2] of: net: fix of_get_mac_addr_nvmem() for non-platform devices
On Fri, Apr 16, 2021 at 2:30 AM Michael Walle wrote: > > Am 2021-04-16 05:24, schrieb Benjamin Herrenschmidt: > > On Mon, 2021-04-12 at 19:47 +0200, Michael Walle wrote: > >> > >> /** > >> * of_get_phy_mode - Get phy mode for given device_node > >> @@ -59,15 +60,39 @@ static int of_get_mac_addr(struct device_node *np, > >> const char *name, u8 *addr) > >> static int of_get_mac_addr_nvmem(struct device_node *np, u8 *addr) > >> { > >> struct platform_device *pdev = of_find_device_by_node(np); > >> + struct nvmem_cell *cell; > >> + const void *mac; > >> + size_t len; > >> int ret; > >> > >> - if (!pdev) > >> - return -ENODEV; > >> + /* Try lookup by device first, there might be a > >> nvmem_cell_lookup > >> +* associated with a given device. > >> +*/ > >> + if (pdev) { > >> + ret = nvmem_get_mac_address(&pdev->dev, addr); > >> + put_device(&pdev->dev); > >> + return ret; > >> + } > >> + > > > > This smells like the wrong band aid :) > > > > Any struct device can contain an OF node pointer these days. > > But not all nodes might have an associated device, see DSA for example. I believe what Ben is saying and what I said earlier is going from dev -> OF node is right and OF node -> dev is wrong. If you only have an OF node, then use an of_* function. > And as the name suggests of_get_mac_address() operates on a node. So > if a driver calls of_get_mac_address() it should work on the node. What > is wrong IMHO, is that the ethernet drivers where the corresponding > board > has a nvmem_cell_lookup registered is calling of_get_mac_address(node). > It should rather call eth_get_mac_address(dev) in the first place. > > One would need to figure out if there is an actual device (with an > assiciated of_node), then call eth_get_mac_address(dev) and if there > isn't a device call of_get_mac_address(node). Yes, I think we're all in agreement. > But I don't know if that is easy to figure out. Well, one could start > with just the device where nvmem_cell_lookup is used. Then we could > drop the workaround above. Start with the ones just passing dev.of_node directly: $ git grep 'of_get_mac_address(.*of_node)' drivers/net/ethernet/aeroflex/greth.c: addr = of_get_mac_address(ofdev->dev.of_node); drivers/net/ethernet/altera/altera_tse_main.c: macaddr = of_get_mac_address(pdev->dev.of_node); drivers/net/ethernet/arc/emac_main.c: mac_addr = of_get_mac_address(dev->of_node); drivers/net/ethernet/broadcom/bgmac-bcma.c: mac = of_get_mac_address(bgmac->dev->of_node); drivers/net/ethernet/cavium/octeon/octeon_mgmt.c: mac = of_get_mac_address(pdev->dev.of_node); drivers/net/ethernet/ethoc.c: mac = of_get_mac_address(pdev->dev.of_node); drivers/net/ethernet/ezchip/nps_enet.c: mac_addr = of_get_mac_address(dev->of_node); drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c: mac_addr = of_get_mac_address(ofdev->dev.of_node); drivers/net/ethernet/marvell/pxa168_eth.c: mac_addr = of_get_mac_address(pdev->dev.of_node); drivers/net/ethernet/marvell/sky2.c:iap = of_get_mac_address(hw->pdev->dev.of_node); drivers/net/ethernet/mediatek/mtk_eth_soc.c:mac_addr = of_get_mac_address(mac->of_node); drivers/net/ethernet/microchip/lan743x_main.c: mac_addr = of_get_mac_address(pdev->dev.of_node); drivers/net/ethernet/qualcomm/qca_spi.c:mac = of_get_mac_address(spi->dev.of_node); drivers/net/ethernet/qualcomm/qca_uart.c: mac = of_get_mac_address(serdev->dev.of_node); drivers/net/ethernet/wiznet/w5100-spi.c:const void *mac = of_get_mac_address(spi->dev.of_node); drivers/net/ethernet/xilinx/xilinx_axienet_main.c: mac_addr = of_get_mac_address(pdev->dev.of_node); drivers/net/ethernet/xilinx/xilinx_emaclite.c: mac_address = of_get_mac_address(ofdev->dev.of_node); drivers/net/wireless/ralink/rt2x00/rt2x00dev.c: mac_addr = of_get_mac_address(rt2x00dev->dev->of_node); drivers/staging/octeon/ethernet.c: mac = of_get_mac_address(priv->of_node); drivers/staging/wfx/main.c: macaddr = of_get_mac_address(wdev->dev->of_node); net/ethernet/eth.c: addr = of_get_mac_address(dev->of_node); Then this will find most of the rest: git grep -W 'of_get_mac_address([a-z]*)'| grep -E '(node|np)' Rob
Re: [PATCH bpf-next 1/2] bpf: Remove bpf_jit_enable=2 debugging mode
On Thu, Apr 15, 2021 at 10:38 PM Daniel Borkmann wrote: > > On 4/15/21 11:32 AM, Jianlin Lv wrote: > > For debugging JITs, dumping the JITed image to kernel log is discouraged, > > "bpftool prog dump jited" is much better way to examine JITed dumps. > > This patch get rid of the code related to bpf_jit_enable=2 mode and > > update the proc handler of bpf_jit_enable, also added auxiliary > > information to explain how to use bpf_jit_disasm tool after this change. > > > > Signed-off-by: Jianlin Lv > [...] > > diff --git a/arch/x86/net/bpf_jit_comp32.c b/arch/x86/net/bpf_jit_comp32.c > > index 0a7a2870f111..8d36b4658076 100644 > > --- a/arch/x86/net/bpf_jit_comp32.c > > +++ b/arch/x86/net/bpf_jit_comp32.c > > @@ -2566,9 +2566,6 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog > > *prog) > > cond_resched(); > > } > > > > - if (bpf_jit_enable > 1) > > - bpf_jit_dump(prog->len, proglen, pass + 1, image); > > - > > if (image) { > > bpf_jit_binary_lock_ro(header); > > prog->bpf_func = (void *)image; > > diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c > > index c8496c1142c9..990b1720c7a4 100644 > > --- a/net/core/sysctl_net_core.c > > +++ b/net/core/sysctl_net_core.c > > @@ -273,16 +273,8 @@ static int proc_dointvec_minmax_bpf_enable(struct > > ctl_table *table, int write, > > > > tmp.data = &jit_enable; > > ret = proc_dointvec_minmax(&tmp, write, buffer, lenp, ppos); > > - if (write && !ret) { > > - if (jit_enable < 2 || > > - (jit_enable == 2 && bpf_dump_raw_ok(current_cred( { > > - *(int *)table->data = jit_enable; > > - if (jit_enable == 2) > > - pr_warn("bpf_jit_enable = 2 was set! NEVER > > use this in production, only for JIT debugging!\n"); > > - } else { > > - ret = -EPERM; > > - } > > - } > > + if (write && !ret) > > + *(int *)table->data = jit_enable; > > return ret; > > } > > > > @@ -389,7 +381,7 @@ static struct ctl_table net_core_table[] = { > > .extra2 = SYSCTL_ONE, > > # else > > .extra1 = SYSCTL_ZERO, > > - .extra2 = &two, > > + .extra2 = SYSCTL_ONE, > > # endif > > }, > > # ifdef CONFIG_HAVE_EBPF_JIT > > diff --git a/tools/bpf/bpf_jit_disasm.c b/tools/bpf/bpf_jit_disasm.c > > index c8ae95804728..efa4b17ae016 100644 > > --- a/tools/bpf/bpf_jit_disasm.c > > +++ b/tools/bpf/bpf_jit_disasm.c > > @@ -7,7 +7,7 @@ > >* > >* To get the disassembly of the JIT code, do the following: > >* > > - * 1) `echo 2 > /proc/sys/net/core/bpf_jit_enable` > > + * 1) Insert bpf_jit_dump() and recompile the kernel to output JITed > > image into log > > Hmm, if we remove bpf_jit_dump(), the next drive-by cleanup patch will be > thrown > at bpf@vger stating that bpf_jit_dump() has no in-tree users and should be > removed. > Maybe we should be removing bpf_jit_disasm.c along with it as well as > bpf_jit_dump() > itself ... I guess if it's ever needed in those rare occasions for JIT > debugging we > can resurrect it from old kernels just locally. But yeah, bpftool's jit dump > should > suffice for vast majority of use cases. > > There was a recent set for ppc32 jit which was merged into ppc tree which > will create > a merge conflict with this one [0]. So we would need a rebase and take it > maybe during > merge win once the ppc32 landed.. > >[0] > https://lore.kernel.org/bpf/cover.1616430991.git.christophe.le...@csgroup.eu/ > > >* 2) Load a BPF filter (e.g. `tcpdump -p -n -s 0 -i eth1 host > > 192.168.20.0/24`) > >* 3) Run e.g. `bpf_jit_disasm -o` to read out the last JIT code > >* > > diff --git a/tools/bpf/bpftool/feature.c b/tools/bpf/bpftool/feature.c > > index 40a88df275f9..98c7eec2923f 100644 > > --- a/tools/bpf/bpftool/feature.c > > +++ b/tools/bpf/bpftool/feature.c > > @@ -203,9 +203,6 @@ static void probe_jit_enable(void) > > case 1: > > printf("JIT compiler is enabled\n"); > > break; > > - case 2: > > - printf("JIT compiler is enabled with debugging traces > > in kernel logs\n"); > > - break; > > This would still need to be there for older kernels ... I will submit another version after ppc32 landed to remove bpf_jit_disasm.c and restore bpftool/feature.c Jianlin > > > case -1: > > printf("Unable to retrieve JIT-compiler status\n"); > > break; > > >
Re: [PATCH net-next 0/2] net: gianfar: Drop GFAR_MQ_POLLING support
Hello: This series was applied to netdev/net-next.git (refs/heads/master): On Fri, 16 Apr 2021 20:11:21 +0300 you wrote: > Drop long time obsolete "per NAPI multi-queue" support in gianfar, > and related (and undocumented) device tree properties. > > Claudiu Manoil (2): > gianfar: Drop GFAR_MQ_POLLING support > powerpc: dts: fsl: Drop obsolete fsl,rx-bit-map and fsl,tx-bit-map > properties > > [...] Here is the summary with links: - [net-next,1/2] gianfar: Drop GFAR_MQ_POLLING support https://git.kernel.org/netdev/net-next/c/8eda54c5e6c4 - [net-next,2/2] powerpc: dts: fsl: Drop obsolete fsl,rx-bit-map and fsl,tx-bit-map properties https://git.kernel.org/netdev/net-next/c/221e8c126b78 You are awesome, thank you! -- Deet-doot-dot, I am a bot. https://korg.docs.kernel.org/patchwork/pwbot.html
[PATCH 2/2] mm: Indicate pfmemalloc pages in compound_head
The net page_pool wants to use a magic value to identify page pool pages. The best place to put it is in the first word where it can be clearly a non-pointer value. That means shifting dma_addr up to alias with ->index, which means we need to find another way to indicate page_is_pfmemalloc(). Since page_pool doesn't want to set its magic value on pages which are pfmemalloc, we can use bit 1 of compound_head to indicate that the page came from the memory reserves. Signed-off-by: Matthew Wilcox (Oracle) --- include/linux/mm.h | 12 +++- include/linux/mm_types.h | 7 +++ 2 files changed, 10 insertions(+), 9 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 8ba434287387..44eab3f6d5ae 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1629,10 +1629,12 @@ struct address_space *page_mapping_file(struct page *page); static inline bool page_is_pfmemalloc(const struct page *page) { /* -* Page index cannot be this large so this must be -* a pfmemalloc page. +* This is not a tail page; compound_head of a head page is unused +* at return from the page allocator, and will be overwritten +* by callers who do not care whether the page came from the +* reserves. */ - return page->index == -1UL; + return page->compound_head & 2; } /* @@ -1641,12 +1643,12 @@ static inline bool page_is_pfmemalloc(const struct page *page) */ static inline void set_page_pfmemalloc(struct page *page) { - page->index = -1UL; + page->compound_head = 2; } static inline void clear_page_pfmemalloc(struct page *page) { - page->index = 0; + page->compound_head = 0; } /* diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 5aacc1c10a45..39f7163dcace 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -96,10 +96,9 @@ struct page { unsigned long private; }; struct {/* page_pool used by netstack */ - /** -* @dma_addr: might require a 64-bit value on -* 32-bit architectures. -*/ + unsigned long pp_magic; + unsigned long xmi; + unsigned long _pp_mapping_pad; unsigned long dma_addr[2]; }; struct {/* slab, slob and slub */ -- 2.30.2
[PATCH 1/2] mm: Fix struct page layout on 32-bit systems
32-bit architectures which expect 8-byte alignment for 8-byte integers and need 64-bit DMA addresses (arc, arm, mips, ppc) had their struct page inadvertently expanded in 2019. When the dma_addr_t was added, it forced the alignment of the union to 8 bytes, which inserted a 4 byte gap between 'flags' and the union. Fix this by storing the dma_addr_t in one or two adjacent unsigned longs. This restores the alignment to that of an unsigned long, and also fixes a potential problem where (on a big endian platform), the bit used to denote PageTail could inadvertently get set, and a racing get_user_pages_fast() could dereference a bogus compound_head(). Fixes: c25fff7171be ("mm: add dma_addr_t to struct page") Signed-off-by: Matthew Wilcox (Oracle) --- include/linux/mm_types.h | 4 ++-- include/net/page_pool.h | 12 +++- net/core/page_pool.c | 12 +++- 3 files changed, 20 insertions(+), 8 deletions(-) diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 6613b26a8894..5aacc1c10a45 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -97,10 +97,10 @@ struct page { }; struct {/* page_pool used by netstack */ /** -* @dma_addr: might require a 64-bit value even on +* @dma_addr: might require a 64-bit value on * 32-bit architectures. */ - dma_addr_t dma_addr; + unsigned long dma_addr[2]; }; struct {/* slab, slob and slub */ union { diff --git a/include/net/page_pool.h b/include/net/page_pool.h index b5b195305346..db7c7020746a 100644 --- a/include/net/page_pool.h +++ b/include/net/page_pool.h @@ -198,7 +198,17 @@ static inline void page_pool_recycle_direct(struct page_pool *pool, static inline dma_addr_t page_pool_get_dma_addr(struct page *page) { - return page->dma_addr; + dma_addr_t ret = page->dma_addr[0]; + if (sizeof(dma_addr_t) > sizeof(unsigned long)) + ret |= (dma_addr_t)page->dma_addr[1] << 32; + return ret; +} + +static inline void page_pool_set_dma_addr(struct page *page, dma_addr_t addr) +{ + page->dma_addr[0] = addr; + if (sizeof(dma_addr_t) > sizeof(unsigned long)) + page->dma_addr[1] = addr >> 32; } static inline bool is_page_pool_compiled_in(void) diff --git a/net/core/page_pool.c b/net/core/page_pool.c index ad8b0707af04..f014fd8c19a6 100644 --- a/net/core/page_pool.c +++ b/net/core/page_pool.c @@ -174,8 +174,10 @@ static void page_pool_dma_sync_for_device(struct page_pool *pool, struct page *page, unsigned int dma_sync_size) { + dma_addr_t dma_addr = page_pool_get_dma_addr(page); + dma_sync_size = min(dma_sync_size, pool->p.max_len); - dma_sync_single_range_for_device(pool->p.dev, page->dma_addr, + dma_sync_single_range_for_device(pool->p.dev, dma_addr, pool->p.offset, dma_sync_size, pool->p.dma_dir); } @@ -226,7 +228,7 @@ static struct page *__page_pool_alloc_pages_slow(struct page_pool *pool, put_page(page); return NULL; } - page->dma_addr = dma; + page_pool_set_dma_addr(page, dma); if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV) page_pool_dma_sync_for_device(pool, page, pool->p.max_len); @@ -294,13 +296,13 @@ void page_pool_release_page(struct page_pool *pool, struct page *page) */ goto skip_dma_unmap; - dma = page->dma_addr; + dma = page_pool_get_dma_addr(page); - /* When page is unmapped, it cannot be returned our pool */ + /* When page is unmapped, it cannot be returned to our pool */ dma_unmap_page_attrs(pool->p.dev, dma, PAGE_SIZE << pool->p.order, pool->p.dma_dir, DMA_ATTR_SKIP_CPU_SYNC); - page->dma_addr = 0; + page_pool_set_dma_addr(page, 0); skip_dma_unmap: /* This may be the last page returned, releasing the pool, so * it is not safe to reference pool afterwards. -- 2.30.2
[PATCH 0/2] Change struct page layout for page_pool
The first patch here fixes two bugs on ppc32, and mips32. It fixes one bug on arc and arm32 (in certain configurations). It probably makes sense to get it in ASAP through the networking tree. I'd like to see testing on those four architectures if possible? The second patch enables new functionality. It is much less urgent. I'd really like to see Mel & Michal's thoughts on it. I have only compile-tested these patches. Matthew Wilcox (Oracle) (2): mm: Fix struct page layout on 32-bit systems mm: Indicate pfmemalloc pages in compound_head include/linux/mm.h | 12 +++- include/linux/mm_types.h | 9 - include/net/page_pool.h | 12 +++- net/core/page_pool.c | 12 +++- 4 files changed, 29 insertions(+), 16 deletions(-) -- 2.30.2
Re: [PATCH 1/1] of/pci: Add IORESOURCE_MEM_64 to resource flags for 64-bit memory addresses
Hello Rob, thanks for this feedback! On Thu, 2021-04-15 at 13:59 -0500, Rob Herring wrote: > +PPC and PCI lists > > On Thu, Apr 15, 2021 at 1:01 PM Leonardo Bras wrote: > > > > Many other resource flag parsers already add this flag when the input > > has bits 24 & 25 set, so update this one to do the same. > > Many others? Looks like sparc and powerpc to me. > s390 also does that, but it look like it comes from a device-tree. > Those would be the > ones I worry about breaking. Sparc doesn't use of/address.c so it's > fine. Powerpc version of the flags code was only fixed in 2019, so I > don't think powerpc will care either. In powerpc I reach this function with this stack, while configuring a virtio-net device for a qemu/KVM pseries guest: pci_process_bridge_OF_ranges+0xac/0x2d4 pSeries_discover_phbs+0xc4/0x158 discover_phbs+0x40/0x60 do_one_initcall+0x60/0x2d0 kernel_init_freeable+0x308/0x3a8 kernel_init+0x2c/0x168 ret_from_kernel_thread+0x5c/0x70 For this, both MMIO32 and MMIO64 resources will have flags 0x200. > > I noticed both sparc and powerpc set PCI_BASE_ADDRESS_MEM_TYPE_64 in > the flags. AFAICT, that's not set anywhere outside of arch code. So > never for riscv, arm and arm64 at least. That leads me to > pci_std_update_resource() which is where the PCI code sets BARs and > just copies the flags in PCI_BASE_ADDRESS_MEM_MASK ignoring > IORESOURCE_* flags. So it seems like 64-bit is still not handled and > neither is prefetch. > I am not sure if you mean here: a) it's ok to add IORESOURCE_MEM_64 here, because it does not affect anything else, or b) it should be using PCI_BASE_ADDRESS_MEM_TYPE_64 (or IORESOURCE_MEM_64 | PCI_BASE_ADDRESS_MEM_TYPE_64) instead, since it's how it's added in powerpc/sparc, and else there is no point. Again, thanks for helping! Best regards, Leonardo Bras
[PATCH 2/2] hotplug-cpu.c: set UNISOLATE on dlpar_cpu_remove() failure
The RTAS set-indicator call, when attempting to UNISOLATE a DRC that is already UNISOLATED or CONFIGURED, returns RTAS_OK and does nothing else for both QEMU and phyp. This gives us an opportunity to use this behavior to signal the hypervisor layer when an error during device removal happens, allowing it to do a proper error handling, while not breaking QEMU/phyp implementations that don't have this support. This patch introduces this idea by unisolating all CPU DRCs that failed to be removed by dlpar_cpu_remove_by_index(), when handling the PSERIES_HP_ELOG_ID_DRC_INDEX event. This is being done for this event only because its the only CPU removal event QEMU uses, and there's no need at this moment to add this mechanism for phyp only code. Signed-off-by: Daniel Henrique Barboza --- arch/powerpc/platforms/pseries/hotplug-cpu.c | 9 - 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c index 12cbffd3c2e3..ed66895c2f51 100644 --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c @@ -802,8 +802,15 @@ int dlpar_cpu(struct pseries_hp_errorlog *hp_elog) case PSERIES_HP_ELOG_ACTION_REMOVE: if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_COUNT) rc = dlpar_cpu_remove_by_count(count); - else if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_INDEX) + else if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_INDEX) { rc = dlpar_cpu_remove_by_index(drc_index); + /* Setting the isolation state of an UNISOLATED/CONFIGURED +* device to UNISOLATE is a no-op, but the hypervison can +* use it as a hint that the cpu removal failed. +*/ + if (rc) + dlpar_unisolate_drc(drc_index); + } else rc = -EINVAL; break; -- 2.30.2
[PATCH 1/2] dlpar.c: introduce dlpar_unisolate_drc()
Next patch will execute a set-indicator call in hotplug-cpu.c. Create a dlpar_unisolate_drc() helper to avoid spreading more rtas_set_indicator() calls outside of dlpar.c. Signed-off-by: Daniel Henrique Barboza --- arch/powerpc/platforms/pseries/dlpar.c | 14 ++ arch/powerpc/platforms/pseries/pseries.h | 1 + 2 files changed, 15 insertions(+) diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c index 233503fcf8f0..3ac70790ec7a 100644 --- a/arch/powerpc/platforms/pseries/dlpar.c +++ b/arch/powerpc/platforms/pseries/dlpar.c @@ -329,6 +329,20 @@ int dlpar_release_drc(u32 drc_index) return 0; } +int dlpar_unisolate_drc(u32 drc_index) +{ + int dr_status, rc; + + rc = rtas_call(rtas_token("get-sensor-state"), 2, 2, &dr_status, + DR_ENTITY_SENSE, drc_index); + if (rc || dr_status != DR_ENTITY_PRESENT) + return -1; + + rtas_set_indicator(ISOLATION_STATE, drc_index, UNISOLATE); + + return 0; +} + int handle_dlpar_errorlog(struct pseries_hp_errorlog *hp_elog) { int rc; diff --git a/arch/powerpc/platforms/pseries/pseries.h b/arch/powerpc/platforms/pseries/pseries.h index 4fe48c04c6c2..4ea12037c920 100644 --- a/arch/powerpc/platforms/pseries/pseries.h +++ b/arch/powerpc/platforms/pseries/pseries.h @@ -55,6 +55,7 @@ extern int dlpar_attach_node(struct device_node *, struct device_node *); extern int dlpar_detach_node(struct device_node *); extern int dlpar_acquire_drc(u32 drc_index); extern int dlpar_release_drc(u32 drc_index); +extern int dlpar_unisolate_drc(u32 drc_index); void queue_hotplug_event(struct pseries_hp_errorlog *hp_errlog); int handle_dlpar_errorlog(struct pseries_hp_errorlog *hp_errlog); -- 2.30.2
[PATCH 0/2] pseries: UNISOLATE DRCs to signal device removal error
At this moment, PAPR [1] does not have a way to report errors during a device removal operation. This puts a strain in the hypervisor, which needs extra mechanisms to try to fallback and recover from an error that might have happened during the removal. The QEMU community has dealt with it during these years by either trying to preempt the error before sending the HP event or, in case of a guest side failure, reboot the guest to complete the removal process. This started to change with QEMU commit fe1831eff8a4 ("spapr_drc.c: use DRC reconfiguration to cleanup DIMM unplug state"), where a way to fallback from a memory removal error was introduced. In this case, when QEMU detects that the kernel is reconfiguring LMBs DRCs that were marked as pending removal, the entire process is reverted from the QEMU side as well. Around the same time, other discussions in the QEMU mailing discussed an alternative for other device as well. In [2] the idea of using RTAS set-indicator for this role was first introduced. The RTAS set-indicator call, when attempting to UNISOLATE a DRC that is already UNISOLATED or CONFIGURED, returns RTAS_OK and does nothing else for both QEMU and phyp. This gives us an opportunity to use this behavior to signal the hypervisor layer when a device removal happens, allowing it to do a proper error handling knowing for sure that the removal failed in the kernel. Using set-indicator to report HP errors isn't strange to PAPR, as per R1-13.5.3.4-4. of table 13.7 of [1]: "For all DR options: If this is a DR operation that involves the user insert- ing a DR entity, then if the firmware can determine that the inserted entity would cause a system disturbance, then the set-indicator RTAS call must not unisolate the entity and must return an error status which is unique to the particular error." PAPR does not make any restrictions or considerations about setting an already Unisolated/Configured DRC to 'unisolate', meaning we have a chance to use it for this purpose - signal an OS side error when attempting to remove a DR entity. To validate the design, this is being implemented only for CPUs. QEMU will use this mechanism to rollback the device removal (hotunplug) state, allowing for a better error handling mechanism. A implementation of how QEMU can do it is in [3]. When using a kernel with this series applied, together with this QEMU build, this is what happens in a common CPU removal/hotunplug error scenario (trying to remove the last online CPU): ( QEMU command line: qemu-system-ppc64 -machine pseries,accel=kvm,usb=off -smp 1,maxcpus=2,threads=1,cores=2,sockets=1 ... ) [root@localhost ~]# QEMU 5.2.92 monitor - type 'help' for more information (qemu) device_add host-spapr-cpu-core,core-id=1,id=core1 (qemu) [root@localhost ~]# echo 0 > /sys/devices/system/cpu/cpu0/online [ 77.548442][ T13] IRQ 19: no longer affine to CPU0 [ 77.548452][ T13] IRQ 20: no longer affine to CPU0 [ 77.548458][ T13] IRQ 256: no longer affine to CPU0 [ 77.548465][ T13] IRQ 258: no longer affine to CPU0 [ 77.548472][ T13] IRQ 259: no longer affine to CPU0 [ 77.548479][ T13] IRQ 260: no longer affine to CPU0 [ 77.548485][ T13] IRQ 261: no longer affine to CPU0 [ 77.548590][T0] cpu 0 (hwid 0) Ready to die... [root@localhost ~]# (qemu) (qemu) device_del core1 (qemu) [ 83.214073][ T100] pseries-hotplug-cpu: Failed to offline CPU PowerPC,POWER9, rc: -16 qemu-system-ppc64: Device hotunplug rejected by the guest for device core1 (qemu) As soon as the CPU removal fails in dlpar_cpu(), QEMU becames aware of it and is able to do error recovery. If this solution is well received, I'll push for an architecture change request internally at IBM to make this mechanism PAPR official. [1] https://openpowerfoundation.org/wp-content/uploads/2020/07/LoPAR-20200611.pdf [2] https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg06395.html [3] https://github.com/danielhb/qemu/tree/unisolate_drc_callback_v1 Daniel Henrique Barboza (2): dlpar.c: introduce dlpar_unisolate_drc() hotplug-cpu.c: set UNISOLATE on dlpar_cpu_remove() failure arch/powerpc/platforms/pseries/dlpar.c | 14 ++ arch/powerpc/platforms/pseries/hotplug-cpu.c | 9 - arch/powerpc/platforms/pseries/pseries.h | 1 + 3 files changed, 23 insertions(+), 1 deletion(-) -- 2.30.2
Re: [PATCH 1/1] of/pci: Add IORESOURCE_MEM_64 to resource flags for 64-bit memory addresses
Hello Rob, thanks for this feedback! On Thu, 2021-04-15 at 13:59 -0500, Rob Herring wrote: > +PPC and PCI lists > > On Thu, Apr 15, 2021 at 1:01 PM Leonardo Bras wrote: > > > > Many other resource flag parsers already add this flag when the input > > has bits 24 & 25 set, so update this one to do the same. > > Many others? Looks like sparc and powerpc to me. > s390 also does that, but it look like it comes from a device-tree. > Those would be the > ones I worry about breaking. Sparc doesn't use of/address.c so it's > fine. Powerpc version of the flags code was only fixed in 2019, so I > don't think powerpc will care either. In powerpc I reach this function with this stack, while configuring a virtio-net device for a qemu/KVM pseries guest: pci_process_bridge_OF_ranges+0xac/0x2d4 pSeries_discover_phbs+0xc4/0x158 discover_phbs+0x40/0x60 do_one_initcall+0x60/0x2d0 kernel_init_freeable+0x308/0x3a8 kernel_init+0x2c/0x168 ret_from_kernel_thread+0x5c/0x70 For this, both MMIO32 and MMIO64 resources will have flags 0x200. > > I noticed both sparc and powerpc set PCI_BASE_ADDRESS_MEM_TYPE_64 in > the flags. AFAICT, that's not set anywhere outside of arch code. So > never for riscv, arm and arm64 at least. That leads me to > pci_std_update_resource() which is where the PCI code sets BARs and > just copies the flags in PCI_BASE_ADDRESS_MEM_MASK ignoring > IORESOURCE_* flags. So it seems like 64-bit is still not handled and > neither is prefetch. > I am not sure if you mean here: a) it's ok to add IORESOURCE_MEM_64 here, because it does not affect anything else, or b) it should be using PCI_BASE_ADDRESS_MEM_TYPE_64 (or IORESOURCE_MEM_64 | PCI_BASE_ADDRESS_MEM_TYPE_64) instead, since it's how it's added in powerpc/sparc, and else there is no point. Again, thanks for helping! Best regards, Leonardo Bras
Re: [PATCH] powerpc/pseries: extract host bridge from pci_bus prior to bus removal
On 4/16/21 12:15 AM, Daniel Axtens wrote: > Hi Tyrel, > >> The pci_bus->bridge reference may no longer be valid after >> pci_bus_remove() resulting in passing a bad value to device_unregister() >> for the associated bridge device. >> >> Store the host_bridge reference in a separate variable prior to >> pci_bus_remove(). >> > The patch certainly seems to do what you say. I'm not really up on the > innards of PCI, so I'm struggling to figure out by what code path > pci_bus_remove() might invalidate pci_bus->bridge? A quick look at > pci_remove_bus was not very illuminating but I didn't chase down every > call it made. remove_phb_dynamic() |--> pci_remove_bus(bus) |--> device_unregister(&bus->dev) |--> put_device(dev) |--> device_release(kobj) |--> dev->class->dev_release(dev) == release_pci_bus(dev) |--> kfree(bus) We have the above call chain that takes place in the when put_device() triggers the kobject ref count to go zero. The kobject_release function in this case is device_release() which in turn calls dev->class->dev_release(dev). For a pci_bus the class is appropriately pcibus_class whose dev_release() callback points to release_pci_bus(). This in turn calls kfree() on the bus. Which means we can no longer safely dereference any fields of the pci_bus struct. -Tyrel > > Kind regards, > Daniel > >> Fixes: 7340056567e3 ("powerpc/pci: Reorder pci bus/bridge unregistration >> during PHB removal") >> Signed-off-by: Tyrel Datwyler >> --- >> arch/powerpc/platforms/pseries/pci_dlpar.c | 4 +++- >> 1 file changed, 3 insertions(+), 1 deletion(-) >> >> diff --git a/arch/powerpc/platforms/pseries/pci_dlpar.c >> b/arch/powerpc/platforms/pseries/pci_dlpar.c >> index f9ae17e8a0f4..a8f9140a24fa 100644 >> --- a/arch/powerpc/platforms/pseries/pci_dlpar.c >> +++ b/arch/powerpc/platforms/pseries/pci_dlpar.c >> @@ -50,6 +50,7 @@ EXPORT_SYMBOL_GPL(init_phb_dynamic); >> int remove_phb_dynamic(struct pci_controller *phb) >> { >> struct pci_bus *b = phb->bus; >> +struct pci_host_bridge *host_bridge = to_pci_host_bridge(b->bridge); >> struct resource *res; >> int rc, i; >> >> @@ -76,7 +77,8 @@ int remove_phb_dynamic(struct pci_controller *phb) >> /* Remove the PCI bus and unregister the bridge device from sysfs */ >> phb->bus = NULL; >> pci_remove_bus(b); >> -device_unregister(b->bridge); >> +host_bridge->bus = NULL; >> +device_unregister(&host_bridge->dev); >> >> /* Now release the IO resource */ >> if (res->flags & IORESOURCE_IO) >> -- >> 2.27.0
Re: [PATCH] powerpc/pseries: Add shutdown() to vio_driver and vio_bus
On 4/1/21 5:13 PM, Tyrel Datwyler wrote: > Currently, neither the vio_bus or vio_driver structures provide support > for a shutdown() routine. > > Add support for shutdown() by allowing drivers to provide a > implementation via function pointer in their vio_driver struct and > provide a proper implementation in the driver template for the vio_bus > that calls a vio drivers shutdown() if defined. > > In the case that no shutdown() is defined by a vio driver and a kexec is > in progress we implement a big hammer that calls remove() to ensure no > further DMA for the devices is possible. > > Signed-off-by: Tyrel Datwyler > --- Ping... any comments, problems with this approach? -Tyrel
Re: [PATCH v1 12/12] KVM: PPC: Book3S HV: Ensure MSR[HV] is always clear in guest MSR
Nicholas Piggin writes: > Rather than clear the HV bit from the MSR at guest entry, make it clear > that the hypervisor does not allow the guest to set the bit. > > The HV clear is kept in guest entry for now, but a future patch will > warn if it is set. > > Acked-by: Paul Mackerras > Signed-off-by: Nicholas Piggin Reviewed-by: Fabiano Rosas > --- > arch/powerpc/kvm/book3s_hv_builtin.c | 4 ++-- > arch/powerpc/kvm/book3s_hv_nested.c | 4 ++-- > 2 files changed, 4 insertions(+), 4 deletions(-) > > diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c > b/arch/powerpc/kvm/book3s_hv_builtin.c > index 41cb03d0bde4..7a0e33a9c980 100644 > --- a/arch/powerpc/kvm/book3s_hv_builtin.c > +++ b/arch/powerpc/kvm/book3s_hv_builtin.c > @@ -662,8 +662,8 @@ static void kvmppc_end_cede(struct kvm_vcpu *vcpu) > > void kvmppc_set_msr_hv(struct kvm_vcpu *vcpu, u64 msr) > { > - /* Guest must always run with ME enabled. */ > - msr = msr | MSR_ME; > + /* Guest must always run with ME enabled, HV disabled. */ > + msr = (msr | MSR_ME) & ~MSR_HV; > > /* >* Check for illegal transactional state bit combination > diff --git a/arch/powerpc/kvm/book3s_hv_nested.c > b/arch/powerpc/kvm/book3s_hv_nested.c > index fb03085c902b..60724f674421 100644 > --- a/arch/powerpc/kvm/book3s_hv_nested.c > +++ b/arch/powerpc/kvm/book3s_hv_nested.c > @@ -344,8 +344,8 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu) > vcpu->arch.nested_vcpu_id = l2_hv.vcpu_token; > vcpu->arch.regs = l2_regs; > > - /* Guest must always run with ME enabled. */ > - vcpu->arch.shregs.msr = vcpu->arch.regs.msr | MSR_ME; > + /* Guest must always run with ME enabled, HV disabled. */ > + vcpu->arch.shregs.msr = (vcpu->arch.regs.msr | MSR_ME) & ~MSR_HV; > > sanitise_hv_regs(vcpu, &l2_hv); > restore_hv_regs(vcpu, &l2_hv);
Re: [PATCH v1 4/7] KVM: PPC: Book3S 64: Move hcall early register setup to KVM
Nicholas Piggin writes: > System calls / hcalls have a different calling convention than > other interrupts, so there is code in the KVMTEST to massage these > into the same form as other interrupt handlers. > > Move this work into the KVM hcall handler. This means teaching KVM > a little more about the low level interrupt handler setup, PACA save > areas, etc., although that's not obviously worse than the current > approach of coming up with an entirely different interrupt register > / save convention. > > Signed-off-by: Nicholas Piggin Reviewed-by: Fabiano Rosas > --- > arch/powerpc/include/asm/exception-64s.h | 13 > arch/powerpc/kernel/exceptions-64s.S | 42 +--- > arch/powerpc/kvm/book3s_64_entry.S | 30 + > 3 files changed, 44 insertions(+), 41 deletions(-) > > diff --git a/arch/powerpc/include/asm/exception-64s.h > b/arch/powerpc/include/asm/exception-64s.h > index c1a8aac01cf9..bb6f78fcf981 100644 > --- a/arch/powerpc/include/asm/exception-64s.h > +++ b/arch/powerpc/include/asm/exception-64s.h > @@ -35,6 +35,19 @@ > /* PACA save area size in u64 units (exgen, exmc, etc) */ > #define EX_SIZE 10 > > +/* PACA save area offsets */ > +#define EX_R90 > +#define EX_R10 8 > +#define EX_R11 16 > +#define EX_R12 24 > +#define EX_R13 32 > +#define EX_DAR 40 > +#define EX_DSISR 48 > +#define EX_CCR 52 > +#define EX_CFAR 56 > +#define EX_PPR 64 > +#define EX_CTR 72 > + > /* > * maximum recursive depth of MCE exceptions > */ > diff --git a/arch/powerpc/kernel/exceptions-64s.S > b/arch/powerpc/kernel/exceptions-64s.S > index 9467fd1038f9..1bfd0d7af09e 100644 > --- a/arch/powerpc/kernel/exceptions-64s.S > +++ b/arch/powerpc/kernel/exceptions-64s.S > @@ -21,22 +21,6 @@ > #include > #include > > -/* PACA save area offsets (exgen, exmc, etc) */ > -#define EX_R90 > -#define EX_R10 8 > -#define EX_R11 16 > -#define EX_R12 24 > -#define EX_R13 32 > -#define EX_DAR 40 > -#define EX_DSISR 48 > -#define EX_CCR 52 > -#define EX_CFAR 56 > -#define EX_PPR 64 > -#define EX_CTR 72 > -.if EX_SIZE != 10 > - .error "EX_SIZE is wrong" > -.endif > - > /* > * Following are fixed section helper macros. > * > @@ -1964,29 +1948,8 @@ EXC_VIRT_END(system_call, 0x4c00, 0x100) > > #ifdef CONFIG_KVM_BOOK3S_64_HANDLER > TRAMP_REAL_BEGIN(system_call_kvm) > - /* > - * This is a hcall, so register convention is as above, with these > - * differences: > - * r13 = PACA > - * ctr = orig r13 > - * orig r10 saved in PACA > - */ > - /* > - * Save the PPR (on systems that support it) before changing to > - * HMT_MEDIUM. That allows the KVM code to save that value into the > - * guest state (it is the guest's PPR value). > - */ > -BEGIN_FTR_SECTION > - mfspr r10,SPRN_PPR > - std r10,HSTATE_PPR(r13) > -END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) > - HMT_MEDIUM > mfctr r10 > - SET_SCRATCH0(r10) > - mfcrr10 > - std r12,HSTATE_SCRATCH0(r13) > - sldir12,r10,32 > - ori r12,r12,0xc00 > + SET_SCRATCH0(r10) /* Save r13 in SCRATCH0 */ > #ifdef CONFIG_RELOCATABLE > /* >* Requires __LOAD_FAR_HANDLER beause kvmppc_hcall lives > @@ -1994,15 +1957,12 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) >*/ > __LOAD_FAR_HANDLER(r10, kvmppc_hcall) > mtctr r10 > - ld r10,PACA_EXGEN+EX_R10(r13) > bctr > #else > - ld r10,PACA_EXGEN+EX_R10(r13) > b kvmppc_hcall > #endif > #endif > > - > /** > * Interrupt 0xd00 - Trace Interrupt. > * This is a synchronous interrupt in response to instruction step or > diff --git a/arch/powerpc/kvm/book3s_64_entry.S > b/arch/powerpc/kvm/book3s_64_entry.S > index c21fa64059ef..f527e16707db 100644 > --- a/arch/powerpc/kvm/book3s_64_entry.S > +++ b/arch/powerpc/kvm/book3s_64_entry.S > @@ -14,6 +14,36 @@ > .global kvmppc_hcall > .balign IFETCH_ALIGN_BYTES > kvmppc_hcall: > + /* > + * This is a hcall, so register convention is as > + * Documentation/powerpc/papr_hcalls.rst, with these additions: > + * R13 = PACA > + * guest R13 saved in SPRN_SCRATCH0 > + * R10 = free > + * guest r10 saved in PACA_EXGEN > + * > + * This may also be a syscall from PR-KVM userspace that is to be > + * reflected to the PR guest kernel, so registers may be set up for > + * a system call rather than hcall. We don't currently clobber > + * anything here, but the 0xc00 handler has already clobbered CTR > + * and CR0, so PR-KVM can not support a guest kernel that preserves > + * those registers across its s
Re: [PATCH v1 5/7] KVM: PPC: Book3S 64: Move interrupt early register setup to KVM
Nicholas Piggin writes: > Like the earlier patch for hcalls, KVM interrupt entry requires a > different calling convention than the Linux interrupt handlers > set up. Move the code that converts from one to the other into KVM. > > Signed-off-by: Nicholas Piggin Reviewed-by: Fabiano Rosas > --- > arch/powerpc/kernel/exceptions-64s.S | 131 +-- > arch/powerpc/kvm/book3s_64_entry.S | 50 +- > 2 files changed, 71 insertions(+), 110 deletions(-) > > diff --git a/arch/powerpc/kernel/exceptions-64s.S > b/arch/powerpc/kernel/exceptions-64s.S > index 1bfd0d7af09e..cd1731642b12 100644 > --- a/arch/powerpc/kernel/exceptions-64s.S > +++ b/arch/powerpc/kernel/exceptions-64s.S > @@ -187,7 +187,6 @@ do_define_int n > .endif > .endm > > -#ifdef CONFIG_KVM_BOOK3S_64_HANDLER > /* > * All interrupts which set HSRR registers, as well as SRESET and MCE and > * syscall when invoked with "sc 1" switch to MSR[HV]=1 (HVMODE) to be taken, > @@ -220,54 +219,25 @@ do_define_int n > * to KVM to handle. > */ > > -.macro KVMTEST name > +.macro KVMTEST name handler > +#ifdef CONFIG_KVM_BOOK3S_64_HANDLER > lbz r10,HSTATE_IN_GUEST(r13) > cmpwi r10,0 > - bne \name\()_kvm > -.endm > - > -.macro GEN_KVM name > - .balign IFETCH_ALIGN_BYTES > -\name\()_kvm: > - > -BEGIN_FTR_SECTION > - ld r10,IAREA+EX_CFAR(r13) > - std r10,HSTATE_CFAR(r13) > -END_FTR_SECTION_IFSET(CPU_FTR_CFAR) > - > - ld r10,IAREA+EX_CTR(r13) > - mtctr r10 > -BEGIN_FTR_SECTION > - ld r10,IAREA+EX_PPR(r13) > - std r10,HSTATE_PPR(r13) > -END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) > - ld r11,IAREA+EX_R11(r13) > - ld r12,IAREA+EX_R12(r13) > - std r12,HSTATE_SCRATCH0(r13) > - sldir12,r9,32 > - ld r9,IAREA+EX_R9(r13) > - ld r10,IAREA+EX_R10(r13) > /* HSRR variants have the 0x2 bit added to their trap number */ > .if IHSRR_IF_HVMODE > BEGIN_FTR_SECTION > - ori r12,r12,(IVEC + 0x2) > + li r10,(IVEC + 0x2) > FTR_SECTION_ELSE > - ori r12,r12,(IVEC) > + li r10,(IVEC) > ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) > .elseif IHSRR > - ori r12,r12,(IVEC+ 0x2) > + li r10,(IVEC + 0x2) > .else > - ori r12,r12,(IVEC) > + li r10,(IVEC) > .endif > - b kvmppc_interrupt > -.endm > - > -#else > -.macro KVMTEST name > -.endm > -.macro GEN_KVM name > -.endm > + bne \handler > #endif > +.endm > > /* > * This is the BOOK3S interrupt entry code macro. > @@ -409,7 +379,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR) > DEFINE_FIXED_SYMBOL(\name\()_common_real) > \name\()_common_real: > .if IKVM_REAL > - KVMTEST \name > + KVMTEST \name kvm_interrupt > .endif > > ld r10,PACAKMSR(r13) /* get MSR value for kernel */ > @@ -432,7 +402,7 @@ DEFINE_FIXED_SYMBOL(\name\()_common_real) > DEFINE_FIXED_SYMBOL(\name\()_common_virt) > \name\()_common_virt: > .if IKVM_VIRT > - KVMTEST \name > + KVMTEST \name kvm_interrupt > 1: > .endif > .endif /* IVIRT */ > @@ -446,7 +416,7 @@ DEFINE_FIXED_SYMBOL(\name\()_common_virt) > DEFINE_FIXED_SYMBOL(\name\()_common_real) > \name\()_common_real: > .if IKVM_REAL > - KVMTEST \name > + KVMTEST \name kvm_interrupt > .endif > .endm > > @@ -967,8 +937,6 @@ EXC_COMMON_BEGIN(system_reset_common) > EXCEPTION_RESTORE_REGS > RFI_TO_USER_OR_KERNEL > > - GEN_KVM system_reset > - > > /** > * Interrupt 0x200 - Machine Check Interrupt (MCE). > @@ -1132,7 +1100,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) > /* >* Check if we are coming from guest. If yes, then run the normal >* exception handler which will take the > - * machine_check_kvm->kvmppc_interrupt branch to deliver the MC event > + * machine_check_kvm->kvm_interrupt branch to deliver the MC event >* to guest. >*/ > lbz r11,HSTATE_IN_GUEST(r13) > @@ -1203,8 +1171,6 @@ EXC_COMMON_BEGIN(machine_check_common) > bl machine_check_exception > b interrupt_return > > - GEN_KVM machine_check > - > > #ifdef CONFIG_PPC_P7_NAP > /* > @@ -1339,8 +1305,6 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX) > REST_NVGPRS(r1) > b interrupt_return > > - GEN_KVM data_access > - > > /** > * Interrupt 0x380 - Data Segment Interrupt (DSLB). > @@ -1390,8 +1354,6 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX) > bl do_bad_slb_fault > b interrupt_return > > - GEN_KVM data_access_slb > - > > /** > * Interrupt 0x400 - Instruction Storage Interrupt (ISI). > @@ -1428,8 +1390,6 @@ MMU_FTR_SECTION_ELSE > ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX) > b interrupt_return > > - GEN_KVM instruction_access > -
Re: [PATCH v2] powerpc/kexec_file: use current CPU info while setting up FDT
On 16/04/21 6:16 pm, Sourabh Jain wrote: kexec_file_load uses initial_boot_params in setting up the device-tree for the kernel to be loaded. Though initial_boot_params holds info about CPUs at the time of boot, it doesn't account for hot added CPUs. So, kexec'ing with kexec_file_load syscall would leave the kexec'ed kernel with inaccurate CPU info. Also, if kdump kernel is loaded with kexec_file_load syscall and the system crashes on a hot added CPU, capture kernel hangs failing to identify the boot CPU. Kernel panic - not syncing: sysrq triggered crash CPU: 24 PID: 6065 Comm: echo Kdump: loaded Not tainted 5.12.0-rc5upstream #54 Call Trace: [c000e590fac0] [c07b2400] dump_stack+0xc4/0x114 (unreliable) [c000e590fb00] [c0145290] panic+0x16c/0x41c [c000e590fba0] [c08892e0] sysrq_handle_crash+0x30/0x40 [c000e590fc00] [c0889cdc] __handle_sysrq+0xcc/0x1f0 [c000e590fca0] [c088a538] write_sysrq_trigger+0xd8/0x178 [c000e590fce0] [c05e9b7c] proc_reg_write+0x10c/0x1b0 [c000e590fd10] [c04f26d0] vfs_write+0xf0/0x330 [c000e590fd60] [c04f2aec] ksys_write+0x7c/0x140 [c000e590fdb0] [c0031ee0] system_call_exception+0x150/0x290 [c000e590fe10] [c000ca5c] system_call_common+0xec/0x278 --- interrupt: c00 at 0x7fff905b9664 NIP: 7fff905b9664 LR: 7fff905320c4 CTR: REGS: c000e590fe80 TRAP: 0c00 Not tainted (5.12.0-rc5upstream) MSR: 8280f033 CR: 28000242 XER: IRQMASK: 0 GPR00: 0004 75fedf30 7fff906a7300 0001 GPR04: 01002a7355b0 0002 0001 75fef616 GPR08: 0001 GPR12: 7fff9073a160 GPR16: GPR20: 7fff906a4ee0 0002 0001 GPR24: 7fff906a0898 0002 01002a7355b0 GPR28: 0002 7fff906a1790 01002a7355b0 0002 NIP [7fff905b9664] 0x7fff905b9664 LR [7fff905320c4] 0x7fff905320c4 --- interrupt: c00 To avoid this from happening, extract current CPU info from of_root device node and use it for setting up the fdt in kexec_file_load case. Fixes: 6ecd0163d360 ("powerpc/kexec_file: Add appropriate regions for memory reserve map") Missed marking sta...@vger.kernel.org on Cc for this fix.. +int add_node_prop(void *fdt, int node_offset, const struct device_node *np) +{ +int update_cpus_node(void *fdt) +{ I think the above two new functions should be marked 'static'... Thanks Hari
[PATCH net-next 2/2] powerpc: dts: fsl: Drop obsolete fsl, rx-bit-map and fsl, tx-bit-map properties
These are very old properties that were used by the "gianfar" ethernet driver. They don't have documented bindings and are obsolete. Signed-off-by: Claudiu Manoil --- arch/powerpc/boot/dts/fsl/bsc9131si-post.dtsi | 4 arch/powerpc/boot/dts/fsl/bsc9132si-post.dtsi | 4 arch/powerpc/boot/dts/fsl/c293si-post.dtsi| 4 arch/powerpc/boot/dts/fsl/p1010si-post.dtsi | 21 --- 4 files changed, 33 deletions(-) diff --git a/arch/powerpc/boot/dts/fsl/bsc9131si-post.dtsi b/arch/powerpc/boot/dts/fsl/bsc9131si-post.dtsi index 0c0efa94cfb4..2a677fd323eb 100644 --- a/arch/powerpc/boot/dts/fsl/bsc9131si-post.dtsi +++ b/arch/powerpc/boot/dts/fsl/bsc9131si-post.dtsi @@ -170,8 +170,6 @@ timer@41100 { /include/ "pq3-etsec2-0.dtsi" enet0: ethernet@b { queue-group@b { - fsl,rx-bit-map = <0xff>; - fsl,tx-bit-map = <0xff>; interrupts = <26 2 0 0 27 2 0 0 28 2 0 0>; }; }; @@ -179,8 +177,6 @@ queue-group@b { /include/ "pq3-etsec2-1.dtsi" enet1: ethernet@b1000 { queue-group@b1000 { - fsl,rx-bit-map = <0xff>; - fsl,tx-bit-map = <0xff>; interrupts = <33 2 0 0 34 2 0 0 35 2 0 0>; }; }; diff --git a/arch/powerpc/boot/dts/fsl/bsc9132si-post.dtsi b/arch/powerpc/boot/dts/fsl/bsc9132si-post.dtsi index b5f071574e83..b8e0edd1ac69 100644 --- a/arch/powerpc/boot/dts/fsl/bsc9132si-post.dtsi +++ b/arch/powerpc/boot/dts/fsl/bsc9132si-post.dtsi @@ -190,8 +190,6 @@ sec_jr3: jr@4000 { /include/ "pq3-etsec2-0.dtsi" enet0: ethernet@b { queue-group@b { - fsl,rx-bit-map = <0xff>; - fsl,tx-bit-map = <0xff>; interrupts = <26 2 0 0 27 2 0 0 28 2 0 0>; }; }; @@ -199,8 +197,6 @@ queue-group@b { /include/ "pq3-etsec2-1.dtsi" enet1: ethernet@b1000 { queue-group@b1000 { - fsl,rx-bit-map = <0xff>; - fsl,tx-bit-map = <0xff>; interrupts = <33 2 0 0 34 2 0 0 35 2 0 0>; }; }; diff --git a/arch/powerpc/boot/dts/fsl/c293si-post.dtsi b/arch/powerpc/boot/dts/fsl/c293si-post.dtsi index bd208320bff5..bec0fc36849d 100644 --- a/arch/powerpc/boot/dts/fsl/c293si-post.dtsi +++ b/arch/powerpc/boot/dts/fsl/c293si-post.dtsi @@ -171,8 +171,6 @@ jr@2000{ enet0: ethernet@b { queue-group@b { reg = <0x1 0x1000>; - fsl,rx-bit-map = <0xff>; - fsl,tx-bit-map = <0xff>; }; }; @@ -180,8 +178,6 @@ queue-group@b { enet1: ethernet@b1000 { queue-group@b1000 { reg = <0x11000 0x1000>; - fsl,rx-bit-map = <0xff>; - fsl,tx-bit-map = <0xff>; }; }; diff --git a/arch/powerpc/boot/dts/fsl/p1010si-post.dtsi b/arch/powerpc/boot/dts/fsl/p1010si-post.dtsi index 1b4aafc1f6a2..c2717f31925a 100644 --- a/arch/powerpc/boot/dts/fsl/p1010si-post.dtsi +++ b/arch/powerpc/boot/dts/fsl/p1010si-post.dtsi @@ -172,29 +172,8 @@ sdhc@2e000 { /include/ "pq3-mpic-timer-B.dtsi" /include/ "pq3-etsec2-0.dtsi" - enet0: ethernet@b { - queue-group@b { - fsl,rx-bit-map = <0xff>; - fsl,tx-bit-map = <0xff>; - }; - }; - /include/ "pq3-etsec2-1.dtsi" - enet1: ethernet@b1000 { - queue-group@b1000 { - fsl,rx-bit-map = <0xff>; - fsl,tx-bit-map = <0xff>; - }; - }; - /include/ "pq3-etsec2-2.dtsi" - enet2: ethernet@b2000 { - queue-group@b2000 { - fsl,rx-bit-map = <0xff>; - fsl,tx-bit-map = <0xff>; - }; - - }; global-utilities@e { compatible = "fsl,p1010-guts"; -- 2.25.1
[PATCH net-next 1/2] gianfar: Drop GFAR_MQ_POLLING support
Gianfar used to enable all 8 Rx queues (DMA rings) per ethernet device, even though the controller can only support 2 interrupt lines at most. This meant that multiple Rx queues would have to be grouped per NAPI poll routine, and the CPU would have to split the budget and service them in a round robin manner. The overhead of this scheme proved to outweight the potential benefits. The alternative was to introduce the "Single Queue" polling mode, supporting one Rx queue per NAPI, which became the default packet processing option and helped improve the performance of the driver. MQ_POLLING also relies on undocumeted device tree properties to specify how to map the 8 Rx and Tx queues to a given interrupt line (aka "interrupt group"). Using module parameters to enable this mode wasn't an option either. Long story short, MQ_POLLING became obsolete, now it is just dead code, and no one asked for it so far. For the Tx queues, multi-queue support (more than 1 Tx queue per CPU) could be revisited by adding tc MQPRIO support, but again, one has to consider that there are only 2 interrupt lines. So the NAPI poll routine would have to service multiple Tx rings. Signed-off-by: Claudiu Manoil --- drivers/net/ethernet/freescale/gianfar.c | 170 ++- drivers/net/ethernet/freescale/gianfar.h | 17 --- 2 files changed, 11 insertions(+), 176 deletions(-) diff --git a/drivers/net/ethernet/freescale/gianfar.c b/drivers/net/ethernet/freescale/gianfar.c index 3ec4d9fddd52..4e4c62d4061e 100644 --- a/drivers/net/ethernet/freescale/gianfar.c +++ b/drivers/net/ethernet/freescale/gianfar.c @@ -175,10 +175,7 @@ static void gfar_mac_rx_config(struct gfar_private *priv) if (priv->rx_filer_enable) { rctrl |= RCTRL_FILREN | RCTRL_PRSDEP_INIT; /* Program the RIR0 reg with the required distribution */ - if (priv->poll_mode == GFAR_SQ_POLLING) - gfar_write(®s->rir0, DEFAULT_2RXQ_RIR0); - else /* GFAR_MQ_POLLING */ - gfar_write(®s->rir0, DEFAULT_8RXQ_RIR0); + gfar_write(®s->rir0, DEFAULT_2RXQ_RIR0); } /* Restore PROMISC mode */ @@ -521,29 +518,9 @@ static int gfar_parse_group(struct device_node *np, grp->priv = priv; spin_lock_init(&grp->grplock); if (priv->mode == MQ_MG_MODE) { - u32 rxq_mask, txq_mask; - int ret; - + /* One Q per interrupt group: Q0 to G0, Q1 to G1 */ grp->rx_bit_map = (DEFAULT_MAPPING >> priv->num_grps); grp->tx_bit_map = (DEFAULT_MAPPING >> priv->num_grps); - - ret = of_property_read_u32(np, "fsl,rx-bit-map", &rxq_mask); - if (!ret) { - grp->rx_bit_map = rxq_mask ? - rxq_mask : (DEFAULT_MAPPING >> priv->num_grps); - } - - ret = of_property_read_u32(np, "fsl,tx-bit-map", &txq_mask); - if (!ret) { - grp->tx_bit_map = txq_mask ? - txq_mask : (DEFAULT_MAPPING >> priv->num_grps); - } - - if (priv->poll_mode == GFAR_SQ_POLLING) { - /* One Q per interrupt group: Q0 to G0, Q1 to G1 */ - grp->rx_bit_map = (DEFAULT_MAPPING >> priv->num_grps); - grp->tx_bit_map = (DEFAULT_MAPPING >> priv->num_grps); - } } else { grp->rx_bit_map = 0xFF; grp->tx_bit_map = 0xFF; @@ -650,18 +627,15 @@ static int gfar_of_init(struct platform_device *ofdev, struct net_device **pdev) u32 stash_len = 0; u32 stash_idx = 0; unsigned int num_tx_qs, num_rx_qs; - unsigned short mode, poll_mode; + unsigned short mode; if (!np) return -ENODEV; - if (of_device_is_compatible(np, "fsl,etsec2")) { + if (of_device_is_compatible(np, "fsl,etsec2")) mode = MQ_MG_MODE; - poll_mode = GFAR_SQ_POLLING; - } else { + else mode = SQ_SG_MODE; - poll_mode = GFAR_SQ_POLLING; - } if (mode == SQ_SG_MODE) { num_tx_qs = 1; @@ -677,22 +651,8 @@ static int gfar_of_init(struct platform_device *ofdev, struct net_device **pdev) return -EINVAL; } - if (poll_mode == GFAR_SQ_POLLING) { - num_tx_qs = num_grps; /* one txq per int group */ - num_rx_qs = num_grps; /* one rxq per int group */ - } else { /* GFAR_MQ_POLLING */ - u32 tx_queues, rx_queues; - int ret; - - /* parse the num of HW tx and rx queues */ - ret = of_property_read_u32(np, "fsl,num_tx_queues", - &tx_queues); - num_tx_q
[PATCH net-next 0/2] net: gianfar: Drop GFAR_MQ_POLLING support
Drop long time obsolete "per NAPI multi-queue" support in gianfar, and related (and undocumented) device tree properties. Claudiu Manoil (2): gianfar: Drop GFAR_MQ_POLLING support powerpc: dts: fsl: Drop obsolete fsl,rx-bit-map and fsl,tx-bit-map properties arch/powerpc/boot/dts/fsl/bsc9131si-post.dtsi | 4 - arch/powerpc/boot/dts/fsl/bsc9132si-post.dtsi | 4 - arch/powerpc/boot/dts/fsl/c293si-post.dtsi| 4 - arch/powerpc/boot/dts/fsl/p1010si-post.dtsi | 21 --- drivers/net/ethernet/freescale/gianfar.c | 170 ++ drivers/net/ethernet/freescale/gianfar.h | 17 -- 6 files changed, 11 insertions(+), 209 deletions(-) -- 2.25.1
Re: [PATCH 1/1] mm: Fix struct page layout on 32-bit systems
On Fri, 16 Apr 2021 16:27:55 +0100 Matthew Wilcox wrote: > On Thu, Apr 15, 2021 at 08:08:32PM +0200, Jesper Dangaard Brouer wrote: > > See below patch. Where I swap32 the dma address to satisfy > > page->compound having bit zero cleared. (It is the simplest fix I could > > come up with). > > I think this is slightly simpler, and as a bonus code that assumes the > old layout won't compile. This is clever, I like it! When reading the code one just have to remember 'unsigned long' size difference between 64-bit vs 32-bit. And I assume compiler can optimize the sizeof check out then doable. > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h > index 6613b26a8894..5aacc1c10a45 100644 > --- a/include/linux/mm_types.h > +++ b/include/linux/mm_types.h > @@ -97,10 +97,10 @@ struct page { > }; > struct {/* page_pool used by netstack */ > /** > - * @dma_addr: might require a 64-bit value even on > + * @dma_addr: might require a 64-bit value on >* 32-bit architectures. >*/ > - dma_addr_t dma_addr; > + unsigned long dma_addr[2]; > }; > struct {/* slab, slob and slub */ > union { > diff --git a/include/net/page_pool.h b/include/net/page_pool.h > index b5b195305346..db7c7020746a 100644 > --- a/include/net/page_pool.h > +++ b/include/net/page_pool.h > @@ -198,7 +198,17 @@ static inline void page_pool_recycle_direct(struct > page_pool *pool, > > static inline dma_addr_t page_pool_get_dma_addr(struct page *page) > { > - return page->dma_addr; > + dma_addr_t ret = page->dma_addr[0]; > + if (sizeof(dma_addr_t) > sizeof(unsigned long)) > + ret |= (dma_addr_t)page->dma_addr[1] << 32; > + return ret; > +} > + > +static inline void page_pool_set_dma_addr(struct page *page, dma_addr_t addr) > +{ > + page->dma_addr[0] = addr; > + if (sizeof(dma_addr_t) > sizeof(unsigned long)) > + page->dma_addr[1] = addr >> 32; > } > > static inline bool is_page_pool_compiled_in(void) > diff --git a/net/core/page_pool.c b/net/core/page_pool.c > index ad8b0707af04..f014fd8c19a6 100644 > --- a/net/core/page_pool.c > +++ b/net/core/page_pool.c > @@ -174,8 +174,10 @@ static void page_pool_dma_sync_for_device(struct > page_pool *pool, > struct page *page, > unsigned int dma_sync_size) > { > + dma_addr_t dma_addr = page_pool_get_dma_addr(page); > + > dma_sync_size = min(dma_sync_size, pool->p.max_len); > - dma_sync_single_range_for_device(pool->p.dev, page->dma_addr, > + dma_sync_single_range_for_device(pool->p.dev, dma_addr, >pool->p.offset, dma_sync_size, >pool->p.dma_dir); > } > @@ -226,7 +228,7 @@ static struct page *__page_pool_alloc_pages_slow(struct > page_pool *pool, > put_page(page); > return NULL; > } > - page->dma_addr = dma; > + page_pool_set_dma_addr(page, dma); > > if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV) > page_pool_dma_sync_for_device(pool, page, pool->p.max_len); > @@ -294,13 +296,13 @@ void page_pool_release_page(struct page_pool *pool, > struct page *page) >*/ > goto skip_dma_unmap; > > - dma = page->dma_addr; > + dma = page_pool_get_dma_addr(page); > > - /* When page is unmapped, it cannot be returned our pool */ > + /* When page is unmapped, it cannot be returned to our pool */ > dma_unmap_page_attrs(pool->p.dev, dma, >PAGE_SIZE << pool->p.order, pool->p.dma_dir, >DMA_ATTR_SKIP_CPU_SYNC); > - page->dma_addr = 0; > + page_pool_set_dma_addr(page, 0); > skip_dma_unmap: > /* This may be the last page returned, releasing the pool, so >* it is not safe to reference pool afterwards. > -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat LinkedIn: http://www.linkedin.com/in/brouer
Re: [PATCH 3/3] powerpc/smp: Cache CPU to chip lookup
* Gautham R Shenoy [2021-04-16 21:27:48]: > On Thu, Apr 15, 2021 at 11:21:10PM +0530, Srikar Dronamraju wrote: > > * Gautham R Shenoy [2021-04-15 22:49:21]: > > > > > > > > > > +int *chip_id_lookup_table; > > > > + > > > > #ifdef CONFIG_PPC64 > > > > int __initdata iommu_is_off; > > > > int __initdata iommu_force_on; > > > > @@ -914,13 +916,22 @@ EXPORT_SYMBOL(of_get_ibm_chip_id); > > > > int cpu_to_chip_id(int cpu) > > > > { > > > > struct device_node *np; > > > > + int ret = -1, idx; > > > > + > > > > + idx = cpu / threads_per_core; > > > > + if (chip_id_lookup_table && chip_id_lookup_table[idx] != -1) > > > > > > > > The value -1 is ambiguous since we won't be able to determine if > > > it is because we haven't yet made a of_get_ibm_chip_id() call > > > or if of_get_ibm_chip_id() call was made and it returned a -1. > > > > > > > We don't allocate chip_id_lookup_table unless cpu_to_chip_id() return > > !-1 value for the boot-cpuid. So this ensures that we dont > > unnecessarily allocate chip_id_lookup_table. Also I check for > > chip_id_lookup_table before calling cpu_to_chip_id() for other CPUs. > > So this avoids overhead of calling cpu_to_chip_id() for platforms that > > dont support it. Also its most likely that if the > > chip_id_lookup_table is initialized then of_get_ibm_chip_id() call > > would return a valid value. > > > > + Below we are only populating the lookup table, only when the > > of_get_cpu_node is valid. > > > > So I dont see any drawbacks of initializing it to -1. Do you see > any? > > > Only if other callers of cpu_to_chip_id() don't check for whether the > chip_id_lookup_table() has been allocated or not. From a code > readability point of view, it is easier to have that check this inside > cpu_to_chip_id() instead of requiring all its callers to make that > check. > I didn't understand your comment. However let me reiterate what I said earlier. We don't have control over who and when cpu_to_chip_id() gets called. If the cpu_to_chip_id() might be called for non present CPU, in which case it will return -1, Should we cache it or not? If we cache it, we will return wrong value when the CPU may turn out to be present. If we cache and retry it then having one value for initializing and another for invalid is all the same as having just 1 value for initializing and invalid. Just that we end up adding more confusing code. Atleast to me, code isnt readable if I say retry for -1 and -2 too. After few years, we ourselves will wonder why we have two values if we are checking and performing same actions. > > > > > Thus, perhaps we can initialize chip_id_lookup_table[idx] with a > > > different unique negative value. How about S32_MIN ? and check > > > chip_id_lookup_table[idx] is different here ? > > > > > > > I had initially initialized to -2, But then I thought we adding in > > more confusion than necessary and it was not solving any issues. > > > > > > -- > > Thanks and Regards > > Srikar Dronamraju -- Thanks and Regards Srikar Dronamraju
Re: [PATCH v1 3/5] mm: ptdump: Provide page size to notepage()
On 16/04/2021 16:15, Christophe Leroy wrote: Le 16/04/2021 à 17:04, Christophe Leroy a écrit : Le 16/04/2021 à 16:40, Christophe Leroy a écrit : Le 16/04/2021 à 15:00, Steven Price a écrit : On 16/04/2021 12:08, Christophe Leroy wrote: Le 16/04/2021 à 12:51, Steven Price a écrit : On 16/04/2021 11:38, Christophe Leroy wrote: Le 16/04/2021 à 11:28, Steven Price a écrit : To be honest I don't fully understand why powerpc requires the page_size - it appears to be using it purely to find "holes" in the calls to note_page(), but I haven't worked out why such holes would occur. I was indeed introduced for KASAN. We have a first commit https://github.com/torvalds/linux/commit/cabe8138 which uses page size to detect whether it is a KASAN like stuff. Then came https://github.com/torvalds/linux/commit/b00ff6d8c as a fix. I can't remember what the problem was exactly, something around the use of hugepages for kernel memory, came as part of the series https://patchwork.ozlabs.org/project/linuxppc-dev/cover/cover.1589866984.git.christophe.le...@csgroup.eu/ Ah, that's useful context. So it looks like powerpc took a different route to reducing the KASAN output to x86. Given the generic ptdump code has handling for KASAN already it should be possible to drop that from the powerpc arch code, which I think means we don't actually need to provide page size to notepage(). Hopefully that means more code to delete ;) Yes ... and no. It looks like the generic ptdump handles the case when several pgdir entries points to the same kasan_early_shadow_pte. But it doesn't take into account the powerpc case where we have regular page tables where several (if not all) PTEs are pointing to the kasan_early_shadow_page . I'm not sure I follow quite how powerpc is different here. But could you have a similar check for PTEs against kasan_early_shadow_pte as the other levels already have? I'm just worried that page_size isn't well defined in this interface and it's going to cause problems in the future. I'm trying. I reverted the two commits b00ff6d8c and cabe8138. At the moment, I don't get exactly what I expect: For linear memory I get one line for each 8M page whereas before reverting the patches I got one 16M line and one 112M line. And for KASAN shadow area I get two lines for the 2x 8M pages shadowing linear mem then I get one 4M line for each PGDIR entry pointing to kasan_early_shadow_pte. 0xf800-0xf87f 0x0700 8M huge rw present 0xf880-0xf8ff 0x0780 8M huge rw present 0xf900-0xf93f 0x0143 4M r present ... 0xfec0-0xfeff 0x0143 4M r present Any idea ? I think the different with other architectures is here: } else if (flag != st->current_flags || level != st->level || addr >= st->marker[1].start_address || pa != st->last_pa + PAGE_SIZE) { In addition to the checks everyone do, powerpc also checks "pa != st->last_pa + PAGE_SIZE". And it is definitely for that test that page_size argument add been added. By replacing that test by (pa - st->start_pa != addr - st->start_address) it works again. So we definitely don't need the real page size. Yes that should work. Thanks for figuring it out! I see that other architectures except RISCV don't dump the physical address. But even RISCV doesn't include that check. Yes not having the physical address certainly simplifies things - although I can see why that can be handy to see. The disadvantage is that user space or vmalloc()'d memory will produce a lot of output because the physical addresses are unlikely to be contiguous. And for most uses you don't need the information. That physical address dump was added by commit aaa229529244 ("powerpc/mm: Add physical address to Linux page table dump") [https://github.com/torvalds/linux/commit/aaa2295] How do other architectures deal with the problem described by the commit log of that patch ? AFAIK other architectures are "broken" in this regard. In practice I don't think it often causes an issue though. Steve
Re: [PATCH 3/3] powerpc/smp: Cache CPU to chip lookup
On Thu, Apr 15, 2021 at 11:21:10PM +0530, Srikar Dronamraju wrote: > * Gautham R Shenoy [2021-04-15 22:49:21]: > > > > > > > +int *chip_id_lookup_table; > > > + > > > #ifdef CONFIG_PPC64 > > > int __initdata iommu_is_off; > > > int __initdata iommu_force_on; > > > @@ -914,13 +916,22 @@ EXPORT_SYMBOL(of_get_ibm_chip_id); > > > int cpu_to_chip_id(int cpu) > > > { > > > struct device_node *np; > > > + int ret = -1, idx; > > > + > > > + idx = cpu / threads_per_core; > > > + if (chip_id_lookup_table && chip_id_lookup_table[idx] != -1) > > > > > The value -1 is ambiguous since we won't be able to determine if > > it is because we haven't yet made a of_get_ibm_chip_id() call > > or if of_get_ibm_chip_id() call was made and it returned a -1. > > > > We don't allocate chip_id_lookup_table unless cpu_to_chip_id() return > !-1 value for the boot-cpuid. So this ensures that we dont > unnecessarily allocate chip_id_lookup_table. Also I check for > chip_id_lookup_table before calling cpu_to_chip_id() for other CPUs. > So this avoids overhead of calling cpu_to_chip_id() for platforms that > dont support it. Also its most likely that if the > chip_id_lookup_table is initialized then of_get_ibm_chip_id() call > would return a valid value. > > + Below we are only populating the lookup table, only when the > of_get_cpu_node is valid. > > So I dont see any drawbacks of initializing it to -1. Do you see any? Only if other callers of cpu_to_chip_id() don't check for whether the chip_id_lookup_table() has been allocated or not. From a code readability point of view, it is easier to have that check this inside cpu_to_chip_id() instead of requiring all its callers to make that check. > > > Thus, perhaps we can initialize chip_id_lookup_table[idx] with a > > different unique negative value. How about S32_MIN ? and check > > chip_id_lookup_table[idx] is different here ? > > > > I had initially initialized to -2, But then I thought we adding in > more confusion than necessary and it was not solving any issues. > > > -- > Thanks and Regards > Srikar Dronamraju
Re: [PATCH 1/1] mm: Fix struct page layout on 32-bit systems
On Thu, Apr 15, 2021 at 08:08:32PM +0200, Jesper Dangaard Brouer wrote: > See below patch. Where I swap32 the dma address to satisfy > page->compound having bit zero cleared. (It is the simplest fix I could > come up with). I think this is slightly simpler, and as a bonus code that assumes the old layout won't compile. diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 6613b26a8894..5aacc1c10a45 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -97,10 +97,10 @@ struct page { }; struct {/* page_pool used by netstack */ /** -* @dma_addr: might require a 64-bit value even on +* @dma_addr: might require a 64-bit value on * 32-bit architectures. */ - dma_addr_t dma_addr; + unsigned long dma_addr[2]; }; struct {/* slab, slob and slub */ union { diff --git a/include/net/page_pool.h b/include/net/page_pool.h index b5b195305346..db7c7020746a 100644 --- a/include/net/page_pool.h +++ b/include/net/page_pool.h @@ -198,7 +198,17 @@ static inline void page_pool_recycle_direct(struct page_pool *pool, static inline dma_addr_t page_pool_get_dma_addr(struct page *page) { - return page->dma_addr; + dma_addr_t ret = page->dma_addr[0]; + if (sizeof(dma_addr_t) > sizeof(unsigned long)) + ret |= (dma_addr_t)page->dma_addr[1] << 32; + return ret; +} + +static inline void page_pool_set_dma_addr(struct page *page, dma_addr_t addr) +{ + page->dma_addr[0] = addr; + if (sizeof(dma_addr_t) > sizeof(unsigned long)) + page->dma_addr[1] = addr >> 32; } static inline bool is_page_pool_compiled_in(void) diff --git a/net/core/page_pool.c b/net/core/page_pool.c index ad8b0707af04..f014fd8c19a6 100644 --- a/net/core/page_pool.c +++ b/net/core/page_pool.c @@ -174,8 +174,10 @@ static void page_pool_dma_sync_for_device(struct page_pool *pool, struct page *page, unsigned int dma_sync_size) { + dma_addr_t dma_addr = page_pool_get_dma_addr(page); + dma_sync_size = min(dma_sync_size, pool->p.max_len); - dma_sync_single_range_for_device(pool->p.dev, page->dma_addr, + dma_sync_single_range_for_device(pool->p.dev, dma_addr, pool->p.offset, dma_sync_size, pool->p.dma_dir); } @@ -226,7 +228,7 @@ static struct page *__page_pool_alloc_pages_slow(struct page_pool *pool, put_page(page); return NULL; } - page->dma_addr = dma; + page_pool_set_dma_addr(page, dma); if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV) page_pool_dma_sync_for_device(pool, page, pool->p.max_len); @@ -294,13 +296,13 @@ void page_pool_release_page(struct page_pool *pool, struct page *page) */ goto skip_dma_unmap; - dma = page->dma_addr; + dma = page_pool_get_dma_addr(page); - /* When page is unmapped, it cannot be returned our pool */ + /* When page is unmapped, it cannot be returned to our pool */ dma_unmap_page_attrs(pool->p.dev, dma, PAGE_SIZE << pool->p.order, pool->p.dma_dir, DMA_ATTR_SKIP_CPU_SYNC); - page->dma_addr = 0; + page_pool_set_dma_addr(page, 0); skip_dma_unmap: /* This may be the last page returned, releasing the pool, so * it is not safe to reference pool afterwards.
Re: [PATCH v2] powerpc/kexec_file: use current CPU info while setting up FDT
Hi Sourabh, Thank you for the patch! Perhaps something to improve: [auto build test WARNING on powerpc/next] [also build test WARNING on linus/master v5.12-rc7 next-20210416] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch] url: https://github.com/0day-ci/linux/commits/Sourabh-Jain/powerpc-kexec_file-use-current-CPU-info-while-setting-up-FDT/20210416-204821 base: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next config: powerpc-allyesconfig (attached as .config) compiler: powerpc64-linux-gcc (GCC) 9.3.0 reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # https://github.com/0day-ci/linux/commit/c4f40243a6928fb16798b2b98c5371815b49e4cc git remote add linux-review https://github.com/0day-ci/linux git fetch --no-tags linux-review Sourabh-Jain/powerpc-kexec_file-use-current-CPU-info-while-setting-up-FDT/20210416-204821 git checkout c4f40243a6928fb16798b2b98c5371815b49e4cc # save the attached .config to linux build tree COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross W=1 ARCH=powerpc If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot All warnings (new ones prefixed by >>): >> arch/powerpc/kexec/file_load_64.c:972:5: warning: no previous prototype for >> 'add_node_prop' [-Wmissing-prototypes] 972 | int add_node_prop(void *fdt, int node_offset, const struct device_node *np) | ^ >> arch/powerpc/kexec/file_load_64.c:1003:5: warning: no previous prototype for >> 'update_cpus_node' [-Wmissing-prototypes] 1003 | int update_cpus_node(void *fdt) | ^~~~ vim +/add_node_prop +972 arch/powerpc/kexec/file_load_64.c 962 963 /** 964 * add_node_prop - Read property from device node structure and add 965 * them to fdt. 966 * @fdt:Flattened device tree of the kernel 967 * @node_offset:offset of the node to add a property at 968 * np: device node pointer 969 * 970 * Returns 0 on success, negative errno on error. 971 */ > 972 int add_node_prop(void *fdt, int node_offset, const struct device_node > *np) 973 { 974 int ret = 0; 975 struct property *pp; 976 unsigned long flags; 977 978 if (!np) 979 return -EINVAL; 980 981 raw_spin_lock_irqsave(&devtree_lock, flags); 982 for (pp = np->properties; pp; pp = pp->next) { 983 ret = fdt_setprop(fdt, node_offset, pp->name, 984pp->value, pp->length); 985 if (ret < 0) { 986 pr_err("Unable to add %s property: %s\n", 987 pp->name, fdt_strerror(ret)); 988 goto out; 989 } 990 } 991 out: 992 raw_spin_unlock_irqrestore(&devtree_lock, flags); 993 return ret; 994 } 995 996 /** 997 * update_cpus_node - Update cpus node of flattened device-tree using of_root 998 * device node. 999 * @fdt:Flattened device tree of the kernel. 1000 * 1001 * Returns 0 on success, negative errno on error. 1002 */ > 1003 int update_cpus_node(void *fdt) 1004 { 1005 struct device_node *cpus_node, *dn; 1006 int cpus_offset, cpus_subnode_off, ret = 0; 1007 1008 cpus_offset = fdt_path_offset(fdt, "/cpus"); 1009 if (cpus_offset == -FDT_ERR_NOTFOUND || cpus_offset > 0) { 1010 if (cpus_offset > 0) { 1011 ret = fdt_del_node(fdt, cpus_offset); 1012 if (ret < 0) { 1013 pr_err("Error deleting /cpus node: %s\n", 1014 fdt_strerror(ret)); 1015 return -EINVAL; 1016 } 1017 } 1018 1019 /* Add cpus node to fdt */ 1020 cpus_offset = fdt_add_subnode(fdt, fdt_path_offset(fdt, "/"), 1021"cpus"); 1022 if (cpus_offset < 0) { 1023 pr_err("Error creating /cpus node: %s\n", 1024 fdt_strerror(cpus_offset)); 1025 return -E
Re: [PATCH v1 3/5] mm: ptdump: Provide page size to notepage()
Le 16/04/2021 à 17:04, Christophe Leroy a écrit : Le 16/04/2021 à 16:40, Christophe Leroy a écrit : Le 16/04/2021 à 15:00, Steven Price a écrit : On 16/04/2021 12:08, Christophe Leroy wrote: Le 16/04/2021 à 12:51, Steven Price a écrit : On 16/04/2021 11:38, Christophe Leroy wrote: Le 16/04/2021 à 11:28, Steven Price a écrit : To be honest I don't fully understand why powerpc requires the page_size - it appears to be using it purely to find "holes" in the calls to note_page(), but I haven't worked out why such holes would occur. I was indeed introduced for KASAN. We have a first commit https://github.com/torvalds/linux/commit/cabe8138 which uses page size to detect whether it is a KASAN like stuff. Then came https://github.com/torvalds/linux/commit/b00ff6d8c as a fix. I can't remember what the problem was exactly, something around the use of hugepages for kernel memory, came as part of the series https://patchwork.ozlabs.org/project/linuxppc-dev/cover/cover.1589866984.git.christophe.le...@csgroup.eu/ Ah, that's useful context. So it looks like powerpc took a different route to reducing the KASAN output to x86. Given the generic ptdump code has handling for KASAN already it should be possible to drop that from the powerpc arch code, which I think means we don't actually need to provide page size to notepage(). Hopefully that means more code to delete ;) Yes ... and no. It looks like the generic ptdump handles the case when several pgdir entries points to the same kasan_early_shadow_pte. But it doesn't take into account the powerpc case where we have regular page tables where several (if not all) PTEs are pointing to the kasan_early_shadow_page . I'm not sure I follow quite how powerpc is different here. But could you have a similar check for PTEs against kasan_early_shadow_pte as the other levels already have? I'm just worried that page_size isn't well defined in this interface and it's going to cause problems in the future. I'm trying. I reverted the two commits b00ff6d8c and cabe8138. At the moment, I don't get exactly what I expect: For linear memory I get one line for each 8M page whereas before reverting the patches I got one 16M line and one 112M line. And for KASAN shadow area I get two lines for the 2x 8M pages shadowing linear mem then I get one 4M line for each PGDIR entry pointing to kasan_early_shadow_pte. 0xf800-0xf87f 0x0700 8M huge rw present 0xf880-0xf8ff 0x0780 8M huge rw present 0xf900-0xf93f 0x0143 4M r present ... 0xfec0-0xfeff 0x0143 4M r present Any idea ? I think the different with other architectures is here: } else if (flag != st->current_flags || level != st->level || addr >= st->marker[1].start_address || pa != st->last_pa + PAGE_SIZE) { In addition to the checks everyone do, powerpc also checks "pa != st->last_pa + PAGE_SIZE". And it is definitely for that test that page_size argument add been added. By replacing that test by (pa - st->start_pa != addr - st->start_address) it works again. So we definitely don't need the real page size. I see that other architectures except RISCV don't dump the physical address. But even RISCV doesn't include that check. That physical address dump was added by commit aaa229529244 ("powerpc/mm: Add physical address to Linux page table dump") [https://github.com/torvalds/linux/commit/aaa2295] How do other architectures deal with the problem described by the commit log of that patch ? Christophe
Re: [PATCH v1 1/1] powerpc/papr_scm: Properly handle UUID types and API
Andy Shevchenko writes: > On Thu, Apr 15, 2021 at 8:10 PM Vaibhav Jain wrote: >> >> >> Thanks for the patch Andy, >> >> Unfortunately ran into a compilation issue due to missing "#include >> " that provides definition for >> get_unaligned_le64(). Gcc reported following error: >> >> error: implicit declaration of function ‘get_unaligned_le64’ > > Right, I have not tested it (as mentioned in the comments to the patch) > >> After including the necessary header file, kernel compiled fine and I >> was able to test & verify the patch. > > Thank you very much for the testing. > > I'm not sure what the coverage of your test is. Your patch updates the way the interleaved set-cookies are populated in papr_scm which are then used to populate label entry for a namespace. I verified that the reported region setcookie hasnt changed for an nvdimm region before and after applying your patch for both BE and LE variants: # 64-bit Little endian kernel before applying the patch $ sudo cat /sys/devices/ndbus0/region0/set_cookie 0x8b6b26cbc930e2b5 # 64-bit Little endian kernel after applying your patch $ sudo cat /sys/devices/ndbus0/region0/set_cookie 0x8b6b26cbc930e2b5 # 64-bit Big endian kernel before applying your patch $ sudo cat /sys/devices/ndbus0/region0/set_cookie 0x8b6b26cbc930e2b5 # 64-bit Big endian kernel after applying your patch $ sudo cat /sys/devices/ndbus0/region0/set_cookie 0x8b6b26cbc930e2b5 > That's why I have an > additional question below. Is the byte ordering kept the same in BE > (32- and 64-bit) cases? Because I'm worrying that I might have missed > something. Libnvdimm store these cookies in label area as little endian values and based on the results above I think we are good. > > > -- > With Best Regards, > Andy Shevchenko -- Cheers ~ Vaibhav
Re: [PATCH v1 3/5] mm: ptdump: Provide page size to notepage()
Le 16/04/2021 à 16:40, Christophe Leroy a écrit : Le 16/04/2021 à 15:00, Steven Price a écrit : On 16/04/2021 12:08, Christophe Leroy wrote: Le 16/04/2021 à 12:51, Steven Price a écrit : On 16/04/2021 11:38, Christophe Leroy wrote: Le 16/04/2021 à 11:28, Steven Price a écrit : To be honest I don't fully understand why powerpc requires the page_size - it appears to be using it purely to find "holes" in the calls to note_page(), but I haven't worked out why such holes would occur. I was indeed introduced for KASAN. We have a first commit https://github.com/torvalds/linux/commit/cabe8138 which uses page size to detect whether it is a KASAN like stuff. Then came https://github.com/torvalds/linux/commit/b00ff6d8c as a fix. I can't remember what the problem was exactly, something around the use of hugepages for kernel memory, came as part of the series https://patchwork.ozlabs.org/project/linuxppc-dev/cover/cover.1589866984.git.christophe.le...@csgroup.eu/ Ah, that's useful context. So it looks like powerpc took a different route to reducing the KASAN output to x86. Given the generic ptdump code has handling for KASAN already it should be possible to drop that from the powerpc arch code, which I think means we don't actually need to provide page size to notepage(). Hopefully that means more code to delete ;) Yes ... and no. It looks like the generic ptdump handles the case when several pgdir entries points to the same kasan_early_shadow_pte. But it doesn't take into account the powerpc case where we have regular page tables where several (if not all) PTEs are pointing to the kasan_early_shadow_page . I'm not sure I follow quite how powerpc is different here. But could you have a similar check for PTEs against kasan_early_shadow_pte as the other levels already have? I'm just worried that page_size isn't well defined in this interface and it's going to cause problems in the future. I'm trying. I reverted the two commits b00ff6d8c and cabe8138. At the moment, I don't get exactly what I expect: For linear memory I get one line for each 8M page whereas before reverting the patches I got one 16M line and one 112M line. And for KASAN shadow area I get two lines for the 2x 8M pages shadowing linear mem then I get one 4M line for each PGDIR entry pointing to kasan_early_shadow_pte. 0xf800-0xf87f 0x0700 8M huge rw present 0xf880-0xf8ff 0x0780 8M huge rw present 0xf900-0xf93f 0x0143 4M r present ... 0xfec0-0xfeff 0x0143 4M r present Any idea ? I think the different with other architectures is here: } else if (flag != st->current_flags || level != st->level || addr >= st->marker[1].start_address || pa != st->last_pa + PAGE_SIZE) { In addition to the checks everyone do, powerpc also checks "pa != st->last_pa + PAGE_SIZE". And it is definitely for that test that page_size argument add been added. I see that other architectures except RISCV don't dump the physical address. But even RISCV doesn't include that check. That physical address dump was added by commit aaa229529244 ("powerpc/mm: Add physical address to Linux page table dump") [https://github.com/torvalds/linux/commit/aaa2295] How do other architectures deal with the problem described by the commit log of that patch ? Christophe
Re: [PATCH v1 3/5] mm: ptdump: Provide page size to notepage()
Le 16/04/2021 à 15:00, Steven Price a écrit : On 16/04/2021 12:08, Christophe Leroy wrote: Le 16/04/2021 à 12:51, Steven Price a écrit : On 16/04/2021 11:38, Christophe Leroy wrote: Le 16/04/2021 à 11:28, Steven Price a écrit : On 15/04/2021 18:18, Christophe Leroy wrote: In order to support large pages on powerpc, notepage() needs to know the page size of the page. Add a page_size argument to notepage(). Signed-off-by: Christophe Leroy --- arch/arm64/mm/ptdump.c | 2 +- arch/riscv/mm/ptdump.c | 2 +- arch/s390/mm/dump_pagetables.c | 3 ++- arch/x86/mm/dump_pagetables.c | 2 +- include/linux/ptdump.h | 2 +- mm/ptdump.c | 16 6 files changed, 14 insertions(+), 13 deletions(-) [...] diff --git a/mm/ptdump.c b/mm/ptdump.c index da751448d0e4..61cd16afb1c8 100644 --- a/mm/ptdump.c +++ b/mm/ptdump.c @@ -17,7 +17,7 @@ static inline int note_kasan_page_table(struct mm_walk *walk, { struct ptdump_state *st = walk->private; - st->note_page(st, addr, 4, pte_val(kasan_early_shadow_pte[0])); + st->note_page(st, addr, 4, pte_val(kasan_early_shadow_pte[0]), PAGE_SIZE); I'm not completely sure what the page_size is going to be used for, but note that KASAN presents an interesting case here. We short-cut by detecting it's a KASAN region at a high level (PGD/P4D/PUD/PMD) and instead of walking the tree down just call note_page() *once* but with level==4 because we know KASAN sets up the page table like that. However the one call actually covers a much larger region - so while PAGE_SIZE matches the level it doesn't match the region covered. AFAICT this will lead to odd results if you enable KASAN on powerpc. Hum I successfully tested it with KASAN, I now realise that I tested it with CONFIG_KASAN_VMALLOC selected. In this situation, since https://github.com/torvalds/linux/commit/af3d0a686 we don't have any common shadow page table anymore. I'll test again without CONFIG_KASAN_VMALLOC. To be honest I don't fully understand why powerpc requires the page_size - it appears to be using it purely to find "holes" in the calls to note_page(), but I haven't worked out why such holes would occur. I was indeed introduced for KASAN. We have a first commit https://github.com/torvalds/linux/commit/cabe8138 which uses page size to detect whether it is a KASAN like stuff. Then came https://github.com/torvalds/linux/commit/b00ff6d8c as a fix. I can't remember what the problem was exactly, something around the use of hugepages for kernel memory, came as part of the series https://patchwork.ozlabs.org/project/linuxppc-dev/cover/cover.1589866984.git.christophe.le...@csgroup.eu/ Ah, that's useful context. So it looks like powerpc took a different route to reducing the KASAN output to x86. Given the generic ptdump code has handling for KASAN already it should be possible to drop that from the powerpc arch code, which I think means we don't actually need to provide page size to notepage(). Hopefully that means more code to delete ;) Yes ... and no. It looks like the generic ptdump handles the case when several pgdir entries points to the same kasan_early_shadow_pte. But it doesn't take into account the powerpc case where we have regular page tables where several (if not all) PTEs are pointing to the kasan_early_shadow_page . I'm not sure I follow quite how powerpc is different here. But could you have a similar check for PTEs against kasan_early_shadow_pte as the other levels already have? I'm just worried that page_size isn't well defined in this interface and it's going to cause problems in the future. I'm trying. I reverted the two commits b00ff6d8c and cabe8138. At the moment, I don't get exactly what I expect: For linear memory I get one line for each 8M page whereas before reverting the patches I got one 16M line and one 112M line. And for KASAN shadow area I get two lines for the 2x 8M pages shadowing linear mem then I get one 4M line for each PGDIR entry pointing to kasan_early_shadow_pte. 0xf800-0xf87f 0x0700 8M hugerw present 0xf880-0xf8ff 0x0780 8M hugerw present 0xf900-0xf93f 0x0143 4M rpresent 0xf940-0xf97f 0x0143 4M rpresent 0xf980-0xf9bf 0x0143 4M rpresent 0xf9c0-0xf9ff 0x0143 4M rpresent 0xfa00-0xfa3f 0x0143 4M rpresent 0xfa40-0xfa7f 0x0143 4M rpresent 0xfa80-0xfabf 0x0143 4M rpresent 0xfac0-0xfaff 0x0143 4M rpresent 0xfb00-0xfb3f 0x0143 4M rpresent 0xfb40-0xfb7f 0x0143 4M r
Re: [PATCH] powerpc: Initialize local variable fdt to NULL in elf64_load()
On 4/16/21 2:05 AM, Michael Ellerman wrote: Daniel Axtens writes: On 4/15/21 12:14 PM, Lakshmi Ramasubramanian wrote: Sorry - missed copying device-tree and powerpc mailing lists. There are a few "goto out;" statements before the local variable "fdt" is initialized through the call to of_kexec_alloc_and_setup_fdt() in elf64_load(). This will result in an uninitialized "fdt" being passed to kvfree() in this function if there is an error before the call to of_kexec_alloc_and_setup_fdt(). Initialize the local variable "fdt" to NULL. I'm a huge fan of initialising local variables! But I'm struggling to find the code path that will lead to an uninit fdt being returned... The out label reads in part: /* Make kimage_file_post_load_cleanup free the fdt buffer for us. */ return ret ? ERR_PTR(ret) : fdt; As far as I can tell, any time we get a non-zero ret, we're going to return an error pointer rather than the uninitialised value... As Dan pointed out, the new code is in linux-next. I have copied the new one below - the function doesn't return fdt, but instead sets it in the arch specific field (please see the link to the updated elf_64.c below). https://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git/tree/arch/powerpc/kexec/elf_64.c?h=for-next (btw, it does look like we might leak fdt if we have an error after we successfully kmalloc it.) Am I missing something? Can you link to the report for the kernel test robot or from Dan? /* * Once FDT buffer has been successfully passed to kexec_add_buffer(), * the FDT buffer address is saved in image->arch.fdt. In that case, * the memory cannot be freed here in case of any other error. */ if (ret && !image->arch.fdt) kvfree(fdt); return ret ? ERR_PTR(ret) : NULL; In case of an error, the memory allocated for fdt is freed unless it has already been passed to kexec_add_buffer(). thanks, -lakshmi FWIW, I think it's worth including this patch _anyway_ because initing local variables is good practice, but I'm just not sure on the justification. Why is it good practice? It defeats -Wuninitialized. So you're guaranteed to be returning something initialised, but not necessarily initialised to the right value. In a case like this NULL seems like a safe choice, but it's still wrong. The function is meant to return a pointer to the successfully allocated fdt, or an ERR_PTR() value. NULL is neither of those. I agree there are security reasons that initialising stack variables is desirable, but I think that should be handled by the compiler, not at the source level. cheers
Re: Bogus struct page layout on 32-bit
On Fri, Apr 16, 2021 at 11:27 AM 'Grygorii Strashko' via Clang Built Linux wrote: > On 10/04/2021 11:52, Ilias Apalodimas wrote: > > +CC Grygorii for the cpsw part as Ivan's email is not valid anymore > The TI platforms am3/4/5 (cpsw) and Keystone 2 (netcp) can do only 32bit DMA > even in case of LPAE (dma-ranges are used). > Originally, as I remember, CONFIG_ARCH_DMA_ADDR_T_64BIT has not been selected > for the LPAE case > on TI platforms and the fact that it became set is the result of > multi-paltform/allXXXconfig/DMA > optimizations and unification. > (just checked - not set in 4.14) > > Probable commit 4965a68780c5 ("arch: define the ARCH_DMA_ADDR_T_64BIT config > symbol in lib/Kconfig"). I completely missed this change in the past, and I don't really agree with it either. Most 32-bit Arm platforms are in fact limited to 32-bit DMA, even when they have MMIO or RAM areas above the 4GB boundary that require LPAE. > The TI drivers have been updated, finally to accept ARCH_DMA_ADDR_T_64BIT=y > by using > things like (__force u32) for example. > > Honestly, I've done sanity check of CPSW with LPAE=y > (ARCH_DMA_ADDR_T_64BIT=y) very long time ago. This is of course a good idea, drivers should work with any combination of 32-bit or 64-bit phys_addr_t and dma_addr_t. Arnd
Re: [PATCH v2] tools: do not include scripts/Kbuild.include
On 16/04/21 15:26, Christian Borntraeger wrote: On 16.04.21 15:00, Masahiro Yamada wrote: Since commit d9f4ff50d2aa ("kbuild: spilt cc-option and friends to scripts/Makefile.compiler"), some kselftests fail to build. The tools/ directory opted out Kbuild, and went in a different direction. They copy any kind of files to the tools/ directory in order to do whatever they want in their world. tools/build/Build.include mimics scripts/Kbuild.include, but some tool Makefiles included the Kbuild one to import a feature that is missing in tools/build/Build.include: - Commit ec04aa3ae87b ("tools/thermal: tmon: use "-fstack-protector" only if supported") included scripts/Kbuild.include from tools/thermal/tmon/Makefile to import the cc-option macro. - Commit c2390f16fc5b ("selftests: kvm: fix for compilers that do not support -no-pie") included scripts/Kbuild.include from tools/testing/selftests/kvm/Makefile to import the try-run macro. - Commit 9cae4ace80ef ("selftests/bpf: do not ignore clang failures") included scripts/Kbuild.include from tools/testing/selftests/bpf/Makefile to import the .DELETE_ON_ERROR target. - Commit 0695f8bca93e ("selftests/powerpc: Handle Makefile for unrecognized option") included scripts/Kbuild.include from tools/testing/selftests/powerpc/pmu/ebb/Makefile to import the try-run macro. Copy what they need into tools/build/Build.include, and make them include it instead of scripts/Kbuild.include. Link: https://lore.kernel.org/lkml/86dadf33-70f7-a5ac-cb8c-64966d2f4...@linux.ibm.com/ Fixes: d9f4ff50d2aa ("kbuild: spilt cc-option and friends to scripts/Makefile.compiler") Reported-by: Janosch Frank Reported-by: Christian Borntraeger Signed-off-by: Masahiro Yamada looks better. Tested-by: Christian Borntraeger Thank you very much Masahiro, this look great. Paolo
Re: [PATCH v2] tools: do not include scripts/Kbuild.include
On 16.04.21 15:00, Masahiro Yamada wrote: Since commit d9f4ff50d2aa ("kbuild: spilt cc-option and friends to scripts/Makefile.compiler"), some kselftests fail to build. The tools/ directory opted out Kbuild, and went in a different direction. They copy any kind of files to the tools/ directory in order to do whatever they want in their world. tools/build/Build.include mimics scripts/Kbuild.include, but some tool Makefiles included the Kbuild one to import a feature that is missing in tools/build/Build.include: - Commit ec04aa3ae87b ("tools/thermal: tmon: use "-fstack-protector" only if supported") included scripts/Kbuild.include from tools/thermal/tmon/Makefile to import the cc-option macro. - Commit c2390f16fc5b ("selftests: kvm: fix for compilers that do not support -no-pie") included scripts/Kbuild.include from tools/testing/selftests/kvm/Makefile to import the try-run macro. - Commit 9cae4ace80ef ("selftests/bpf: do not ignore clang failures") included scripts/Kbuild.include from tools/testing/selftests/bpf/Makefile to import the .DELETE_ON_ERROR target. - Commit 0695f8bca93e ("selftests/powerpc: Handle Makefile for unrecognized option") included scripts/Kbuild.include from tools/testing/selftests/powerpc/pmu/ebb/Makefile to import the try-run macro. Copy what they need into tools/build/Build.include, and make them include it instead of scripts/Kbuild.include. Link: https://lore.kernel.org/lkml/86dadf33-70f7-a5ac-cb8c-64966d2f4...@linux.ibm.com/ Fixes: d9f4ff50d2aa ("kbuild: spilt cc-option and friends to scripts/Makefile.compiler") Reported-by: Janosch Frank Reported-by: Christian Borntraeger Signed-off-by: Masahiro Yamada looks better. Tested-by: Christian Borntraeger
[PATCH] powerpc/pseries/mce: Fix a typo in error type assignment
The error type is ICACHE and DCACHE, for case MCE_ERROR_TYPE_ICACHE. Signed-off-by: Ganesh Goudar --- arch/powerpc/platforms/pseries/ras.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c index f8b390a9d9fb..9d4ef65da7f3 100644 --- a/arch/powerpc/platforms/pseries/ras.c +++ b/arch/powerpc/platforms/pseries/ras.c @@ -699,7 +699,7 @@ static int mce_handle_err_virtmode(struct pt_regs *regs, mce_err.error_type = MCE_ERROR_TYPE_DCACHE; break; case MC_ERROR_TYPE_I_CACHE: - mce_err.error_type = MCE_ERROR_TYPE_DCACHE; + mce_err.error_type = MCE_ERROR_TYPE_ICACHE; break; case MC_ERROR_TYPE_UNKNOWN: default: -- 2.26.2
[PATCH v2] tools: do not include scripts/Kbuild.include
Since commit d9f4ff50d2aa ("kbuild: spilt cc-option and friends to scripts/Makefile.compiler"), some kselftests fail to build. The tools/ directory opted out Kbuild, and went in a different direction. They copy any kind of files to the tools/ directory in order to do whatever they want in their world. tools/build/Build.include mimics scripts/Kbuild.include, but some tool Makefiles included the Kbuild one to import a feature that is missing in tools/build/Build.include: - Commit ec04aa3ae87b ("tools/thermal: tmon: use "-fstack-protector" only if supported") included scripts/Kbuild.include from tools/thermal/tmon/Makefile to import the cc-option macro. - Commit c2390f16fc5b ("selftests: kvm: fix for compilers that do not support -no-pie") included scripts/Kbuild.include from tools/testing/selftests/kvm/Makefile to import the try-run macro. - Commit 9cae4ace80ef ("selftests/bpf: do not ignore clang failures") included scripts/Kbuild.include from tools/testing/selftests/bpf/Makefile to import the .DELETE_ON_ERROR target. - Commit 0695f8bca93e ("selftests/powerpc: Handle Makefile for unrecognized option") included scripts/Kbuild.include from tools/testing/selftests/powerpc/pmu/ebb/Makefile to import the try-run macro. Copy what they need into tools/build/Build.include, and make them include it instead of scripts/Kbuild.include. Link: https://lore.kernel.org/lkml/86dadf33-70f7-a5ac-cb8c-64966d2f4...@linux.ibm.com/ Fixes: d9f4ff50d2aa ("kbuild: spilt cc-option and friends to scripts/Makefile.compiler") Reported-by: Janosch Frank Reported-by: Christian Borntraeger Signed-off-by: Masahiro Yamada --- Changes in v2: - copy macros to tools/build/BUild.include tools/build/Build.include | 24 +++ tools/testing/selftests/bpf/Makefile | 2 +- tools/testing/selftests/kvm/Makefile | 2 +- .../selftests/powerpc/pmu/ebb/Makefile| 2 +- tools/thermal/tmon/Makefile | 2 +- 5 files changed, 28 insertions(+), 4 deletions(-) diff --git a/tools/build/Build.include b/tools/build/Build.include index 585486e40995..2cf3b1bde86e 100644 --- a/tools/build/Build.include +++ b/tools/build/Build.include @@ -100,3 +100,27 @@ cxx_flags = -Wp,-MD,$(depfile) -Wp,-MT,$@ $(CXXFLAGS) -D"BUILD_STR(s)=\#s" $(CXX ## HOSTCC C flags host_c_flags = -Wp,-MD,$(depfile) -Wp,-MT,$@ $(KBUILD_HOSTCFLAGS) -D"BUILD_STR(s)=\#s" $(HOSTCFLAGS_$(basetarget).o) $(HOSTCFLAGS_$(obj)) + +# output directory for tests below +TMPOUT = .tmp_ + +# try-run +# Usage: option = $(call try-run, $(CC)...-o "$$TMP",option-ok,otherwise) +# Exit code chooses option. "$$TMP" serves as a temporary file and is +# automatically cleaned up. +try-run = $(shell set -e; \ + TMP=$(TMPOUT)/tmp; \ + mkdir -p $(TMPOUT); \ + trap "rm -rf $(TMPOUT)" EXIT; \ + if ($(1)) >/dev/null 2>&1; \ + then echo "$(2)"; \ + else echo "$(3)"; \ + fi) + +# cc-option +# Usage: cflags-y += $(call cc-option,-march=winchip-c6,-march=i586) +cc-option = $(call try-run, \ + $(CC) -Werror $(1) -c -x c /dev/null -o "$$TMP",$(1),$(2)) + +# delete partially updated (i.e. corrupted) files on error +.DELETE_ON_ERROR: diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile index 044bfdcf5b74..17a5cdf48d37 100644 --- a/tools/testing/selftests/bpf/Makefile +++ b/tools/testing/selftests/bpf/Makefile @@ -1,5 +1,5 @@ # SPDX-License-Identifier: GPL-2.0 -include ../../../../scripts/Kbuild.include +include ../../../build/Build.include include ../../../scripts/Makefile.arch include ../../../scripts/Makefile.include diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile index a6d61f451f88..5ef141f265bd 100644 --- a/tools/testing/selftests/kvm/Makefile +++ b/tools/testing/selftests/kvm/Makefile @@ -1,5 +1,5 @@ # SPDX-License-Identifier: GPL-2.0-only -include ../../../../scripts/Kbuild.include +include ../../../build/Build.include all: diff --git a/tools/testing/selftests/powerpc/pmu/ebb/Makefile b/tools/testing/selftests/powerpc/pmu/ebb/Makefile index af3df79d8163..c5ecb4634094 100644 --- a/tools/testing/selftests/powerpc/pmu/ebb/Makefile +++ b/tools/testing/selftests/powerpc/pmu/ebb/Makefile @@ -1,5 +1,5 @@ # SPDX-License-Identifier: GPL-2.0 -include ../../../../../../scripts/Kbuild.include +include ../../../../../build/Build.include noarg: $(MAKE) -C ../../ diff --git a/tools/thermal/tmon/Makefile b/tools/thermal/tmon/Makefile index 59e417ec3e13..9db867df7679 100644 --- a/tools/thermal/tmon/Makefile +++ b/tools/thermal/tmon/Makefile @@ -1,6 +1,6 @@ # SPDX-License-Identifier: GPL-2.0 # We need this for the "cc-option" macro. -include ../../../scripts/Kbuild.include +include ../../build/Build.include VERSION = 1.0 -- 2.27.0
Re: [PATCH v1 3/5] mm: ptdump: Provide page size to notepage()
On 16/04/2021 12:08, Christophe Leroy wrote: Le 16/04/2021 à 12:51, Steven Price a écrit : On 16/04/2021 11:38, Christophe Leroy wrote: Le 16/04/2021 à 11:28, Steven Price a écrit : On 15/04/2021 18:18, Christophe Leroy wrote: In order to support large pages on powerpc, notepage() needs to know the page size of the page. Add a page_size argument to notepage(). Signed-off-by: Christophe Leroy --- arch/arm64/mm/ptdump.c | 2 +- arch/riscv/mm/ptdump.c | 2 +- arch/s390/mm/dump_pagetables.c | 3 ++- arch/x86/mm/dump_pagetables.c | 2 +- include/linux/ptdump.h | 2 +- mm/ptdump.c | 16 6 files changed, 14 insertions(+), 13 deletions(-) [...] diff --git a/mm/ptdump.c b/mm/ptdump.c index da751448d0e4..61cd16afb1c8 100644 --- a/mm/ptdump.c +++ b/mm/ptdump.c @@ -17,7 +17,7 @@ static inline int note_kasan_page_table(struct mm_walk *walk, { struct ptdump_state *st = walk->private; - st->note_page(st, addr, 4, pte_val(kasan_early_shadow_pte[0])); + st->note_page(st, addr, 4, pte_val(kasan_early_shadow_pte[0]), PAGE_SIZE); I'm not completely sure what the page_size is going to be used for, but note that KASAN presents an interesting case here. We short-cut by detecting it's a KASAN region at a high level (PGD/P4D/PUD/PMD) and instead of walking the tree down just call note_page() *once* but with level==4 because we know KASAN sets up the page table like that. However the one call actually covers a much larger region - so while PAGE_SIZE matches the level it doesn't match the region covered. AFAICT this will lead to odd results if you enable KASAN on powerpc. Hum I successfully tested it with KASAN, I now realise that I tested it with CONFIG_KASAN_VMALLOC selected. In this situation, since https://github.com/torvalds/linux/commit/af3d0a686 we don't have any common shadow page table anymore. I'll test again without CONFIG_KASAN_VMALLOC. To be honest I don't fully understand why powerpc requires the page_size - it appears to be using it purely to find "holes" in the calls to note_page(), but I haven't worked out why such holes would occur. I was indeed introduced for KASAN. We have a first commit https://github.com/torvalds/linux/commit/cabe8138 which uses page size to detect whether it is a KASAN like stuff. Then came https://github.com/torvalds/linux/commit/b00ff6d8c as a fix. I can't remember what the problem was exactly, something around the use of hugepages for kernel memory, came as part of the series https://patchwork.ozlabs.org/project/linuxppc-dev/cover/cover.1589866984.git.christophe.le...@csgroup.eu/ Ah, that's useful context. So it looks like powerpc took a different route to reducing the KASAN output to x86. Given the generic ptdump code has handling for KASAN already it should be possible to drop that from the powerpc arch code, which I think means we don't actually need to provide page size to notepage(). Hopefully that means more code to delete ;) Yes ... and no. It looks like the generic ptdump handles the case when several pgdir entries points to the same kasan_early_shadow_pte. But it doesn't take into account the powerpc case where we have regular page tables where several (if not all) PTEs are pointing to the kasan_early_shadow_page . I'm not sure I follow quite how powerpc is different here. But could you have a similar check for PTEs against kasan_early_shadow_pte as the other levels already have? I'm just worried that page_size isn't well defined in this interface and it's going to cause problems in the future. Steve
Re: [PATCH 2/2] tools: do not include scripts/Kbuild.include
On Fri, Apr 16, 2021 at 2:56 PM Christian Borntraeger wrote: > > > On 15.04.21 10:06, Christian Borntraeger wrote: > > > > On 15.04.21 09:27, Masahiro Yamada wrote: > >> Since commit d9f4ff50d2aa ("kbuild: spilt cc-option and friends to > >> scripts/Makefile.compiler"), some kselftests fail to build. > >> > >> The tools/ directory opted out Kbuild, and went in a different > >> direction. They copy any kind of files to the tools/ directory > >> in order to do whatever they want to do in their world. > >> > >> tools/build/Build.include mimics scripts/Kbuild.include, but some > >> tool Makefiles included the Kbuild one to import a feature that is > >> missing in tools/build/Build.include: > >> > >> - Commit ec04aa3ae87b ("tools/thermal: tmon: use "-fstack-protector" > >> only if supported") included scripts/Kbuild.include from > >> tools/thermal/tmon/Makefile to import the cc-option macro. > >> > >> - Commit c2390f16fc5b ("selftests: kvm: fix for compilers that do > >> not support -no-pie") included scripts/Kbuild.include from > >> tools/testing/selftests/kvm/Makefile to import the try-run macro. > >> > >> - Commit 9cae4ace80ef ("selftests/bpf: do not ignore clang > >> failures") included scripts/Kbuild.include from > >> tools/testing/selftests/bpf/Makefile to import the .DELETE_ON_ERROR > >> target. > >> > >> - Commit 0695f8bca93e ("selftests/powerpc: Handle Makefile for > >> unrecognized option") included scripts/Kbuild.include from > >> tools/testing/selftests/powerpc/pmu/ebb/Makefile to import the > >> try-run macro. > >> > >> Copy what they want there, and stop including scripts/Kbuild.include > >> from the tool Makefiles. > >> > >> Link: > >> https://lore.kernel.org/lkml/86dadf33-70f7-a5ac-cb8c-64966d2f4...@linux.ibm.com/ > >> Fixes: d9f4ff50d2aa ("kbuild: spilt cc-option and friends to > >> scripts/Makefile.compiler") > >> Reported-by: Janosch Frank > >> Reported-by: Christian Borntraeger > >> Signed-off-by: Masahiro Yamada > > > > When applying this on top of d9f4ff50d2aa ("kbuild: spilt cc-option and > > friends to scripts/Makefile.compiler") > > > > I still do get > > > > # Test Assertion Failure > > # lib/kvm_util.c:142: vm->fd >= 0 > > # pid=315635 tid=315635 - Invalid argument > > # 10x01002f4b: vm_open at kvm_util.c:142 > > # 2 (inlined by) vm_create at kvm_util.c:258 > > # 30x010015ef: test_add_max_memory_regions at > > set_memory_region_test.c:351 > > # 4 (inlined by) main at set_memory_region_test.c:397 > > # 50x03ff971abb89: ?? ??:0 > > # 60x010017ad: .annobin_abi_note.c.hot at crt1.o:? > > # KVM_CREATE_VM ioctl failed, rc: -1 errno: 22 > > not ok 7 selftests: kvm: set_memory_region_test # exit=254 > > > > and the testcase compilation does not pickup the pgste option. > > What does work is the following: > diff --git a/tools/testing/selftests/kvm/Makefile > b/tools/testing/selftests/kvm/Makefile > index a6d61f451f88..d9c6d9c2069e 100644 > --- a/tools/testing/selftests/kvm/Makefile > +++ b/tools/testing/selftests/kvm/Makefile > @@ -1,5 +1,6 @@ > # SPDX-License-Identifier: GPL-2.0-only > include ../../../../scripts/Kbuild.include > +include ../../../../scripts/Makefile.compiler > > all: > > > as it does pickup the linker option handling. Kbuild and the tools are divorced. They cannot be married unless the tools/ build system is largely refactored. That will be a tons of works (and I am not sure if it is welcome). The Kbuild refactoring should not be bothered by the tools. For now, I want them separated from each other. -- Best Regards Masahiro Yamada
[PATCH v2] powerpc/kexec_file: use current CPU info while setting up FDT
kexec_file_load uses initial_boot_params in setting up the device-tree for the kernel to be loaded. Though initial_boot_params holds info about CPUs at the time of boot, it doesn't account for hot added CPUs. So, kexec'ing with kexec_file_load syscall would leave the kexec'ed kernel with inaccurate CPU info. Also, if kdump kernel is loaded with kexec_file_load syscall and the system crashes on a hot added CPU, capture kernel hangs failing to identify the boot CPU. Kernel panic - not syncing: sysrq triggered crash CPU: 24 PID: 6065 Comm: echo Kdump: loaded Not tainted 5.12.0-rc5upstream #54 Call Trace: [c000e590fac0] [c07b2400] dump_stack+0xc4/0x114 (unreliable) [c000e590fb00] [c0145290] panic+0x16c/0x41c [c000e590fba0] [c08892e0] sysrq_handle_crash+0x30/0x40 [c000e590fc00] [c0889cdc] __handle_sysrq+0xcc/0x1f0 [c000e590fca0] [c088a538] write_sysrq_trigger+0xd8/0x178 [c000e590fce0] [c05e9b7c] proc_reg_write+0x10c/0x1b0 [c000e590fd10] [c04f26d0] vfs_write+0xf0/0x330 [c000e590fd60] [c04f2aec] ksys_write+0x7c/0x140 [c000e590fdb0] [c0031ee0] system_call_exception+0x150/0x290 [c000e590fe10] [c000ca5c] system_call_common+0xec/0x278 --- interrupt: c00 at 0x7fff905b9664 NIP: 7fff905b9664 LR: 7fff905320c4 CTR: REGS: c000e590fe80 TRAP: 0c00 Not tainted (5.12.0-rc5upstream) MSR: 8280f033 CR: 28000242 XER: IRQMASK: 0 GPR00: 0004 75fedf30 7fff906a7300 0001 GPR04: 01002a7355b0 0002 0001 75fef616 GPR08: 0001 GPR12: 7fff9073a160 GPR16: GPR20: 7fff906a4ee0 0002 0001 GPR24: 7fff906a0898 0002 01002a7355b0 GPR28: 0002 7fff906a1790 01002a7355b0 0002 NIP [7fff905b9664] 0x7fff905b9664 LR [7fff905320c4] 0x7fff905320c4 --- interrupt: c00 To avoid this from happening, extract current CPU info from of_root device node and use it for setting up the fdt in kexec_file_load case. Fixes: 6ecd0163d360 ("powerpc/kexec_file: Add appropriate regions for memory reserve map") Signed-off-by: Sourabh Jain --- arch/powerpc/kexec/file_load_64.c | 98 +++ 1 file changed, 98 insertions(+) --- Changelog: v1 -> v2 - fdt should be updated regardless of kexec type - updated commit message and title --- diff --git a/arch/powerpc/kexec/file_load_64.c b/arch/powerpc/kexec/file_load_64.c index 02b9e4d0dc40..21d2f0e172ed 100644 --- a/arch/powerpc/kexec/file_load_64.c +++ b/arch/powerpc/kexec/file_load_64.c @@ -960,6 +960,99 @@ unsigned int kexec_fdt_totalsize_ppc64(struct kimage *image) return fdt_size; } +/** + * add_node_prop - Read property from device node structure and add + * them to fdt. + * @fdt: Flattened device tree of the kernel + * @node_offset: offset of the node to add a property at + * np: device node pointer + * + * Returns 0 on success, negative errno on error. + */ +int add_node_prop(void *fdt, int node_offset, const struct device_node *np) +{ + int ret = 0; + struct property *pp; + unsigned long flags; + + if (!np) + return -EINVAL; + + raw_spin_lock_irqsave(&devtree_lock, flags); + for (pp = np->properties; pp; pp = pp->next) { + ret = fdt_setprop(fdt, node_offset, pp->name, + pp->value, pp->length); + if (ret < 0) { + pr_err("Unable to add %s property: %s\n", + pp->name, fdt_strerror(ret)); + goto out; + } + } +out: + raw_spin_unlock_irqrestore(&devtree_lock, flags); + return ret; +} + +/** + * update_cpus_node - Update cpus node of flattened device-tree using of_root + * device node. + * @fdt: Flattened device tree of the kernel. + * + * Returns 0 on success, negative errno on error. + */ +int update_cpus_node(void *fdt) +{ + struct device_node *cpus_node, *dn; + int cpus_offset, cpus_subnode_off, ret = 0; + + cpus_offset = fdt_path_offset(fdt, "/cpus"); + if (cpus_offset == -FDT_ERR_NOTFOUND || cpus_offset > 0) { + if (cpus_offset > 0) { + ret = fdt_del_node(fdt, cpus_offset); + if (ret < 0) { + pr_err("Error deleting /cpus node: %s\n", + fdt_strerror(ret)); + return -EINVAL; + } + } + +
Re: [PATCH] powerpc: Initialize local variable fdt to NULL in elf64_load()
Dan Carpenter writes: > On Fri, Apr 16, 2021 at 09:00:12AM +0200, Christophe Leroy wrote: >> Le 16/04/2021 à 08:44, Daniel Axtens a écrit : >> > > On 4/15/21 12:14 PM, Lakshmi Ramasubramanian wrote: >> > > >> > > > There are a few "goto out;" statements before the local variable "fdt" >> > > > is initialized through the call to of_kexec_alloc_and_setup_fdt() in >> > > > elf64_load(). This will result in an uninitialized "fdt" being passed >> > > > to kvfree() in this function if there is an error before the call to >> > > > of_kexec_alloc_and_setup_fdt(). >> > > > >> > > > Initialize the local variable "fdt" to NULL. >> > > > >> > I'm a huge fan of initialising local variables! But I'm struggling to >> > find the code path that will lead to an uninit fdt being returned... >> > >> > The out label reads in part: >> > >> >/* Make kimage_file_post_load_cleanup free the fdt buffer for us. */ >> >return ret ? ERR_PTR(ret) : fdt; >> > >> > As far as I can tell, any time we get a non-zero ret, we're going to >> > return an error pointer rather than the uninitialised value... >> >> I don't think GCC is smart enough to detect that. >> > > We disabled uninitialized variable checking for GCC. We disabled -Wmaybe-uninitialized, but that doesn't disable *all* uninitialized warnings does it? I wish we hadn't disabled it, it's already led to bugs slipping through. > But actually is something that has been on my mind recently. Smatch is > supposed to parse this correctly but there is a bug that affects powerpc > and I don't know how to debug it. The kbuild bot is doing cross > platform compiles but I don't have one set up on myself. Could someone > with Smatch installed test something for me? > > Or if you don't have Smatch installed then you should definitely install > it. :P > https://www.spinics.net/lists/smatch/msg00568.html I have smatch installed, and even run it sometimes ;) > Apply the patch from below and edit the path to point to the correct > directory. Then run kchecker and email me the output? That gave me: CC arch/powerpc/kernel/hw_breakpoint.o /home/michael/linux/arch/powerpc/kernel/hw_breakpoint.c: In function ‘task_bps_add’: /home/michael/linux/arch/powerpc/kernel/hw_breakpoint.c:176:16: error: passing argument 1 of ‘__smatch_about’ makes integer from pointer without a cast [-Werror=int-conversion] 176 | __smatch_about(tmp); |^~~ || |struct breakpoint * In file included from /home/michael/linux/arch/powerpc/kernel/hw_breakpoint.c:170: /home/michael/smatch/check_debug.h:4:40: note: expected ‘long int’ but argument is of type ‘struct breakpoint *’ 4 | static inline void __smatch_about(long var){} | ~^~~ /home/michael/linux/arch/powerpc/kernel/hw_breakpoint.c:180:21: error: passing argument 1 of ‘__smatch_about’ makes integer from pointer without a cast [-Werror=int-conversion] 180 | __smatch_about(tmp); | ^~~ | | | struct breakpoint * In file included from /home/michael/linux/arch/powerpc/kernel/hw_breakpoint.c:170: /home/michael/smatch/check_debug.h:4:40: note: expected ‘long int’ but argument is of type ‘struct breakpoint *’ 4 | static inline void __smatch_about(long var){} | ~^~~ cc1: all warnings being treated as errors Which looks like it didn't work. Right, needs tmp cast to long. Output below, hope it helps. Happy to test other things. cheers GEN Makefile CHECK /home/michael/linux/scripts/mod/empty.c CALL/home/michael/linux/scripts/checksyscalls.sh CALL/home/michael/linux/scripts/atomic/check-atomics.sh CHECK /home/michael/linux/arch/powerpc/kernel/vdso64/vgettimeofday.c CC arch/powerpc/kernel/hw_breakpoint.o CHECK /home/michael/linux/arch/powerpc/kernel/hw_breakpoint.c /home/michael/linux/arch/powerpc/kernel/hw_breakpoint.c:177 task_bps_add() about /home/michael/linux/arch/powerpc/kernel/hw_breakpoint.c:177 task_bps_add() implied: tmp = 's64min-(-4096),(-12),4096-s64max' /home/michael/linux/arch/powerpc/kernel/hw_breakpoint.c:177 task_bps_add() buf size: 'tmp' 0 elements, 0 bytes (rl = (-1),32) /home/michael/linux/arch/powerpc/kernel/hw_breakpoint.c:177 task_bps_add() strlen: 'tmp' characters /home/michael/linux/arch/powerpc/kernel/hw_breakpoint.c:177 task_bps_add() real absolute: tmp = 's64min-(-4096),(-12),4096-s64max' /home/michael/linux/arch/powerpc/kernel/hw_breakpoint.c:177 task_bps_add() mtag = 0 offset = 0 rl = '' /home/michael/linux/arch/powerpc/kernel/hw_breakpoint.c:177 task_bps_add() [register_smatch_extra] tmp = '4096-ptr_max,(-12)' [merged] (4096-ptr_max,(-12), 4096-ptr_max, (-12), 4096-ptr_max, (-12)) /home/michael/linux/arch/powerpc/kernel/hw_breakpoint.c:177 task_bps_add() [register_modific
[PATCH] macintosh/via-pmu: Fix build warning
Now that __fake_sleep is static, we get a warning about it being unused in some configurations: drivers/macintosh/via-pmu.c:190:12: warning: '__fake_sleep' defined but not used 190 | static int __fake_sleep; Move it inside the ifdef where it's used to avoid the warning. Fixes: 95d143923379 ("macintosh/via-pmu: Make some symbols static") Reported-by: Stephen Rothwell Signed-off-by: Michael Ellerman --- drivers/macintosh/via-pmu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/macintosh/via-pmu.c b/drivers/macintosh/via-pmu.c index 478766434919..4bdd4c45e7a7 100644 --- a/drivers/macintosh/via-pmu.c +++ b/drivers/macintosh/via-pmu.c @@ -187,7 +187,6 @@ static int query_batt_timer = BATTERY_POLLING_COUNT; static struct adb_request batt_req; static struct proc_dir_entry *proc_pmu_batt[PMU_MAX_BATTERIES]; -static int __fake_sleep; int asleep; #ifdef CONFIG_ADB @@ -1833,6 +1832,7 @@ pmu_present(void) */ static u32 save_via[8]; +static int __fake_sleep; static void save_via_state(void) -- 2.25.1
[PATCH 1/2] powerpc: fix EDEADLOCK redefinition error in uapi/asm/errno.h
From: Tony Ambardar A few archs like powerpc have different errno.h values for macros EDEADLOCK and EDEADLK. In code including both libc and linux versions of errno.h, this can result in multiple definitions of EDEADLOCK in the include chain. Definitions to the same value (e.g. seen with mips) do not raise warnings, but on powerpc there are redefinitions changing the value, which raise warnings and errors (if using "-Werror"). Guard against these redefinitions to avoid build errors like the following, first seen cross-compiling libbpf v5.8.9 for powerpc using GCC 8.4.0 with musl 1.1.24: In file included from ../../arch/powerpc/include/uapi/asm/errno.h:5, from ../../include/linux/err.h:8, from libbpf.c:29: ../../include/uapi/asm-generic/errno.h:40: error: "EDEADLOCK" redefined [-Werror] #define EDEADLOCK EDEADLK In file included from toolchain-powerpc_8540_gcc-8.4.0_musl/include/errno.h:10, from libbpf.c:26: toolchain-powerpc_8540_gcc-8.4.0_musl/include/bits/errno.h:58: note: this is the location of the previous definition #define EDEADLOCK 58 cc1: all warnings being treated as errors Cc: Stable Reported-by: Rosen Penev Signed-off-by: Tony Ambardar Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20200917135437.1238787-1-tony.ambar...@gmail.com --- arch/powerpc/include/uapi/asm/errno.h | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/powerpc/include/uapi/asm/errno.h b/arch/powerpc/include/uapi/asm/errno.h index cc79856896a1..4ba87de32be0 100644 --- a/arch/powerpc/include/uapi/asm/errno.h +++ b/arch/powerpc/include/uapi/asm/errno.h @@ -2,6 +2,7 @@ #ifndef _ASM_POWERPC_ERRNO_H #define _ASM_POWERPC_ERRNO_H +#undef EDEADLOCK #include #undef EDEADLOCK -- 2.25.1
Re: [PATCH] powerpc/kdump: fix kdump kernel hangup issue with hot add CPUs
On 16/04/21 3:03 pm, Hari Bathini wrote: On 16/04/21 12:17 pm, Sourabh Jain wrote: With the kexec_file_load system call when system crashes on the hot add CPU the capture kernel hangs and failed to collect the vmcore. Kernel panic - not syncing: sysrq triggered crash CPU: 24 PID: 6065 Comm: echo Kdump: loaded Not tainted 5.12.0-rc5upstream #54 Call Trace: [c000e590fac0] [c07b2400] dump_stack+0xc4/0x114 (unreliable) [c000e590fb00] [c0145290] panic+0x16c/0x41c [c000e590fba0] [c08892e0] sysrq_handle_crash+0x30/0x40 [c000e590fc00] [c0889cdc] __handle_sysrq+0xcc/0x1f0 [c000e590fca0] [c088a538] write_sysrq_trigger+0xd8/0x178 [c000e590fce0] [c05e9b7c] proc_reg_write+0x10c/0x1b0 [c000e590fd10] [c04f26d0] vfs_write+0xf0/0x330 [c000e590fd60] [c04f2aec] ksys_write+0x7c/0x140 [c000e590fdb0] [c0031ee0] system_call_exception+0x150/0x290 [c000e590fe10] [c000ca5c] system_call_common+0xec/0x278 --- interrupt: c00 at 0x7fff905b9664 NIP: 7fff905b9664 LR: 7fff905320c4 CTR: REGS: c000e590fe80 TRAP: 0c00 Not tainted (5.12.0-rc5upstream) MSR: 8280f033 CR: 28000242 XER: IRQMASK: 0 GPR00: 0004 75fedf30 7fff906a7300 0001 GPR04: 01002a7355b0 0002 0001 75fef616 GPR08: 0001 GPR12: 7fff9073a160 GPR16: GPR20: 7fff906a4ee0 0002 0001 GPR24: 7fff906a0898 0002 01002a7355b0 GPR28: 0002 7fff906a1790 01002a7355b0 0002 NIP [7fff905b9664] 0x7fff905b9664 LR [7fff905320c4] 0x7fff905320c4 --- interrupt: c00 I will update the commit message. /** * setup_new_fdt_ppc64 - Update the flattend device-tree of the kernel * being loaded. @@ -1020,6 +1113,13 @@ int setup_new_fdt_ppc64(const struct kimage *image, void *fdt, } } + /* Update cpus nodes information to account hotplug CPUs. */ + if (image->type == KEXEC_TYPE_CRASH) { Shouldn't this apply to regular kexec_file_load case as well? Yeah, there won't be a hang in regular kexec_file_load case but for correctness, that kernel should also not see stale CPU info in FDT? Yes better to update the fdt for both kexec and kdump. Thanks for the review Hari. - Sourabh Jain
[PATCH 2/2] powerpc/papr_scm: Fix build error due to wrong printf specifier
When I changed the rc variable to be long rather than int64_t I neglected to update the printk(), leading to a build break: arch/powerpc/platforms/pseries/papr_scm.c: In function 'papr_scm_pmem_flush': arch/powerpc/platforms/pseries/papr_scm.c:144:26: warning: format '%lld' expects argument of type 'long long int', but argument 3 has type 'long int' [-Wformat=] Fixes: 75b7c05ebf90 ("powerpc/papr_scm: Implement support for H_SCM_FLUSH hcall") Reported-by: Stephen Rothwell Signed-off-by: Michael Ellerman --- arch/powerpc/platforms/pseries/papr_scm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c index ae6f5d80d5ce..48de21902116 100644 --- a/arch/powerpc/platforms/pseries/papr_scm.c +++ b/arch/powerpc/platforms/pseries/papr_scm.c @@ -141,7 +141,7 @@ static int papr_scm_pmem_flush(struct nd_region *nd_region, } while (rc == H_BUSY); if (rc) { - dev_err(&p->pdev->dev, "flush error: %lld", rc); + dev_err(&p->pdev->dev, "flush error: %ld", rc); rc = -EIO; } else { dev_dbg(&p->pdev->dev, "flush drc 0x%x complete", p->drc_index); -- 2.25.1
[PATCH 1/2] powerpc/configs: Add PAPR_SCM to pseries_defconfig
This is a pseries only driver, it should be built by default as part of pseries_defconfig to get some build coverage. Signed-off-by: Michael Ellerman --- arch/powerpc/configs/pseries_defconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/powerpc/configs/pseries_defconfig b/arch/powerpc/configs/pseries_defconfig index 777221775c83..968095d7682c 100644 --- a/arch/powerpc/configs/pseries_defconfig +++ b/arch/powerpc/configs/pseries_defconfig @@ -41,6 +41,7 @@ CONFIG_DTL=y CONFIG_SCANLOG=m CONFIG_PPC_SMLPAR=y CONFIG_IBMEBUS=y +CONFIG_PAPR_SCM=m CONFIG_PPC_SVM=y # CONFIG_PPC_PMAC is not set CONFIG_RTAS_FLASH=m -- 2.25.1
Re: [PATCH v1 3/5] mm: ptdump: Provide page size to notepage()
Le 16/04/2021 à 12:51, Steven Price a écrit : On 16/04/2021 11:38, Christophe Leroy wrote: Le 16/04/2021 à 11:28, Steven Price a écrit : On 15/04/2021 18:18, Christophe Leroy wrote: In order to support large pages on powerpc, notepage() needs to know the page size of the page. Add a page_size argument to notepage(). Signed-off-by: Christophe Leroy --- arch/arm64/mm/ptdump.c | 2 +- arch/riscv/mm/ptdump.c | 2 +- arch/s390/mm/dump_pagetables.c | 3 ++- arch/x86/mm/dump_pagetables.c | 2 +- include/linux/ptdump.h | 2 +- mm/ptdump.c | 16 6 files changed, 14 insertions(+), 13 deletions(-) [...] diff --git a/mm/ptdump.c b/mm/ptdump.c index da751448d0e4..61cd16afb1c8 100644 --- a/mm/ptdump.c +++ b/mm/ptdump.c @@ -17,7 +17,7 @@ static inline int note_kasan_page_table(struct mm_walk *walk, { struct ptdump_state *st = walk->private; - st->note_page(st, addr, 4, pte_val(kasan_early_shadow_pte[0])); + st->note_page(st, addr, 4, pte_val(kasan_early_shadow_pte[0]), PAGE_SIZE); I'm not completely sure what the page_size is going to be used for, but note that KASAN presents an interesting case here. We short-cut by detecting it's a KASAN region at a high level (PGD/P4D/PUD/PMD) and instead of walking the tree down just call note_page() *once* but with level==4 because we know KASAN sets up the page table like that. However the one call actually covers a much larger region - so while PAGE_SIZE matches the level it doesn't match the region covered. AFAICT this will lead to odd results if you enable KASAN on powerpc. Hum I successfully tested it with KASAN, I now realise that I tested it with CONFIG_KASAN_VMALLOC selected. In this situation, since https://github.com/torvalds/linux/commit/af3d0a686 we don't have any common shadow page table anymore. I'll test again without CONFIG_KASAN_VMALLOC. To be honest I don't fully understand why powerpc requires the page_size - it appears to be using it purely to find "holes" in the calls to note_page(), but I haven't worked out why such holes would occur. I was indeed introduced for KASAN. We have a first commit https://github.com/torvalds/linux/commit/cabe8138 which uses page size to detect whether it is a KASAN like stuff. Then came https://github.com/torvalds/linux/commit/b00ff6d8c as a fix. I can't remember what the problem was exactly, something around the use of hugepages for kernel memory, came as part of the series https://patchwork.ozlabs.org/project/linuxppc-dev/cover/cover.1589866984.git.christophe.le...@csgroup.eu/ Ah, that's useful context. So it looks like powerpc took a different route to reducing the KASAN output to x86. Given the generic ptdump code has handling for KASAN already it should be possible to drop that from the powerpc arch code, which I think means we don't actually need to provide page size to notepage(). Hopefully that means more code to delete ;) Yes ... and no. It looks like the generic ptdump handles the case when several pgdir entries points to the same kasan_early_shadow_pte. But it doesn't take into account the powerpc case where we have regular page tables where several (if not all) PTEs are pointing to the kasan_early_shadow_page . Christophe
Re: [PATCH 1/1] mm: Fix struct page layout on 32-bit systems
On Fri, Apr 16, 2021 at 07:32:35AM +, David Laight wrote: > From: Matthew Wilcox > > Sent: 15 April 2021 23:22 > > > > On Thu, Apr 15, 2021 at 09:11:56PM +, David Laight wrote: > > > Isn't it possible to move the field down one long? > > > This might require an explicit zero - but this is not a common > > > code path - the extra write will be noise. > > > > Then it overlaps page->mapping. See emails passim. > > The rules on overlaps make be wonder if every 'long' > should be in its own union. That was what we used to have. It was worse. > The comments would need to say when each field is used. > It would, at least, make these errors less common. > > That doesn't solve the 64bit dma_addr though. > > Actually rather that word-swapping dma_addr on 32bit BE > could you swap over the two fields it overlays with. > That might look messy in the .h, but it doesn't require > an accessor function to do the swap - easily missed. No.
Re: linux-next: build warning after merge of the powerpc tree
Stephen Rothwell writes: > Hi all, > > After merging the powerpc tree, today's linux-next build (powerpc > allyesconfig) produced this warning: > > In file included from include/linux/device.h:15, > from arch/powerpc/include/asm/io.h:27, > from include/linux/io.h:13, > from include/linux/irq.h:20, > from arch/powerpc/include/asm/hardirq.h:6, > from include/linux/hardirq.h:11, > from include/linux/highmem.h:10, > from include/linux/bio.h:8, > from include/linux/libnvdimm.h:14, > from arch/powerpc/platforms/pseries/papr_scm.c:12: > arch/powerpc/platforms/pseries/papr_scm.c: In function 'papr_scm_pmem_flush': > arch/powerpc/platforms/pseries/papr_scm.c:144:26: warning: format '%lld' > expects argument of type 'long long int', but argument 3 has type 'long int' > [-Wformat=] > 144 | dev_err(&p->pdev->dev, "flush error: %lld", rc); > | ^~~ > include/linux/dev_printk.h:19:22: note: in definition of macro 'dev_fmt' >19 | #define dev_fmt(fmt) fmt > | ^~~ > arch/powerpc/platforms/pseries/papr_scm.c:144:3: note: in expansion of macro > 'dev_err' > 144 | dev_err(&p->pdev->dev, "flush error: %lld", rc); > | ^~~ > arch/powerpc/platforms/pseries/papr_scm.c:144:43: note: format string is > defined here > 144 | dev_err(&p->pdev->dev, "flush error: %lld", rc); > |~~~^ > | | > | long long int > |%ld > > Introduced by commit > > 75b7c05ebf90 ("powerpc/papr_scm: Implement support for H_SCM_FLUSH hcall") My bad. cheers
Re: [PATCH v1 3/5] mm: ptdump: Provide page size to notepage()
On 16/04/2021 11:38, Christophe Leroy wrote: Le 16/04/2021 à 11:28, Steven Price a écrit : On 15/04/2021 18:18, Christophe Leroy wrote: In order to support large pages on powerpc, notepage() needs to know the page size of the page. Add a page_size argument to notepage(). Signed-off-by: Christophe Leroy --- arch/arm64/mm/ptdump.c | 2 +- arch/riscv/mm/ptdump.c | 2 +- arch/s390/mm/dump_pagetables.c | 3 ++- arch/x86/mm/dump_pagetables.c | 2 +- include/linux/ptdump.h | 2 +- mm/ptdump.c | 16 6 files changed, 14 insertions(+), 13 deletions(-) [...] diff --git a/mm/ptdump.c b/mm/ptdump.c index da751448d0e4..61cd16afb1c8 100644 --- a/mm/ptdump.c +++ b/mm/ptdump.c @@ -17,7 +17,7 @@ static inline int note_kasan_page_table(struct mm_walk *walk, { struct ptdump_state *st = walk->private; - st->note_page(st, addr, 4, pte_val(kasan_early_shadow_pte[0])); + st->note_page(st, addr, 4, pte_val(kasan_early_shadow_pte[0]), PAGE_SIZE); I'm not completely sure what the page_size is going to be used for, but note that KASAN presents an interesting case here. We short-cut by detecting it's a KASAN region at a high level (PGD/P4D/PUD/PMD) and instead of walking the tree down just call note_page() *once* but with level==4 because we know KASAN sets up the page table like that. However the one call actually covers a much larger region - so while PAGE_SIZE matches the level it doesn't match the region covered. AFAICT this will lead to odd results if you enable KASAN on powerpc. Hum I successfully tested it with KASAN, I now realise that I tested it with CONFIG_KASAN_VMALLOC selected. In this situation, since https://github.com/torvalds/linux/commit/af3d0a686 we don't have any common shadow page table anymore. I'll test again without CONFIG_KASAN_VMALLOC. To be honest I don't fully understand why powerpc requires the page_size - it appears to be using it purely to find "holes" in the calls to note_page(), but I haven't worked out why such holes would occur. I was indeed introduced for KASAN. We have a first commit https://github.com/torvalds/linux/commit/cabe8138 which uses page size to detect whether it is a KASAN like stuff. Then came https://github.com/torvalds/linux/commit/b00ff6d8c as a fix. I can't remember what the problem was exactly, something around the use of hugepages for kernel memory, came as part of the series https://patchwork.ozlabs.org/project/linuxppc-dev/cover/cover.1589866984.git.christophe.le...@csgroup.eu/ Ah, that's useful context. So it looks like powerpc took a different route to reducing the KASAN output to x86. Given the generic ptdump code has handling for KASAN already it should be possible to drop that from the powerpc arch code, which I think means we don't actually need to provide page size to notepage(). Hopefully that means more code to delete ;) Steve
Re: [PATCH v3] powerpc: fix EDEADLOCK redefinition error in uapi/asm/errno.h
Tony Ambardar writes: > Hello Michael, > > The latest version of this patch addressed all feedback I'm aware of > when submitted last September, and I've seen no further comments from > reviewers since then. > > Could you please let me know where this stands and if anything further > is needed? Sorry, it's still sitting in my inbox :/ I was going to reply to suggest we split the tools change out. The headers under tools are usually updated by another maintainer, I think it might even be scripted. Anyway I've applied your patch and done that (dropped the change to tools/.../errno.h), which should also mean the stable backport is more likely to work automatically. It will hit mainline in v5.13-rc1 and then be backported to the stable trees. I don't think you actually need the tools version of the header updated to fix your bug? In which case we can probably just wait for it to be updated automatically when the tools headers are sync'ed with the kernel versions. cheers > On Thu, 17 Sept 2020 at 06:54, Tony Ambardar wrote: >> >> A few archs like powerpc have different errno.h values for macros >> EDEADLOCK and EDEADLK. In code including both libc and linux versions of >> errno.h, this can result in multiple definitions of EDEADLOCK in the >> include chain. Definitions to the same value (e.g. seen with mips) do >> not raise warnings, but on powerpc there are redefinitions changing the >> value, which raise warnings and errors (if using "-Werror"). >> >> Guard against these redefinitions to avoid build errors like the following, >> first seen cross-compiling libbpf v5.8.9 for powerpc using GCC 8.4.0 with >> musl 1.1.24: >> >> In file included from ../../arch/powerpc/include/uapi/asm/errno.h:5, >>from ../../include/linux/err.h:8, >>from libbpf.c:29: >> ../../include/uapi/asm-generic/errno.h:40: error: "EDEADLOCK" redefined >> [-Werror] >>#define EDEADLOCK EDEADLK >> >> In file included from >> toolchain-powerpc_8540_gcc-8.4.0_musl/include/errno.h:10, >>from libbpf.c:26: >> toolchain-powerpc_8540_gcc-8.4.0_musl/include/bits/errno.h:58: note: this >> is the location of the previous definition >>#define EDEADLOCK 58 >> >> cc1: all warnings being treated as errors >> >> CC: Stable >> Reported-by: Rosen Penev >> Signed-off-by: Tony Ambardar >> --- >> v1 -> v2: >> * clean up commit description formatting >> >> v2 -> v3: (per Michael Ellerman) >> * drop indeterminate 'Fixes' tags, request stable backports instead >> --- >> arch/powerpc/include/uapi/asm/errno.h | 1 + >> tools/arch/powerpc/include/uapi/asm/errno.h | 1 + >> 2 files changed, 2 insertions(+) >> >> diff --git a/arch/powerpc/include/uapi/asm/errno.h >> b/arch/powerpc/include/uapi/asm/errno.h >> index cc79856896a1..4ba87de32be0 100644 >> --- a/arch/powerpc/include/uapi/asm/errno.h >> +++ b/arch/powerpc/include/uapi/asm/errno.h >> @@ -2,6 +2,7 @@ >> #ifndef _ASM_POWERPC_ERRNO_H >> #define _ASM_POWERPC_ERRNO_H >> >> +#undef EDEADLOCK >> #include >> >> #undef EDEADLOCK >> diff --git a/tools/arch/powerpc/include/uapi/asm/errno.h >> b/tools/arch/powerpc/include/uapi/asm/errno.h >> index cc79856896a1..4ba87de32be0 100644 >> --- a/tools/arch/powerpc/include/uapi/asm/errno.h >> +++ b/tools/arch/powerpc/include/uapi/asm/errno.h >> @@ -2,6 +2,7 @@ >> #ifndef _ASM_POWERPC_ERRNO_H >> #define _ASM_POWERPC_ERRNO_H >> >> +#undef EDEADLOCK >> #include >> >> #undef EDEADLOCK >> -- >> 2.25.1 >>
Re: [PATCH v1 3/5] mm: ptdump: Provide page size to notepage()
Le 16/04/2021 à 11:28, Steven Price a écrit : On 15/04/2021 18:18, Christophe Leroy wrote: In order to support large pages on powerpc, notepage() needs to know the page size of the page. Add a page_size argument to notepage(). Signed-off-by: Christophe Leroy --- arch/arm64/mm/ptdump.c | 2 +- arch/riscv/mm/ptdump.c | 2 +- arch/s390/mm/dump_pagetables.c | 3 ++- arch/x86/mm/dump_pagetables.c | 2 +- include/linux/ptdump.h | 2 +- mm/ptdump.c | 16 6 files changed, 14 insertions(+), 13 deletions(-) [...] diff --git a/mm/ptdump.c b/mm/ptdump.c index da751448d0e4..61cd16afb1c8 100644 --- a/mm/ptdump.c +++ b/mm/ptdump.c @@ -17,7 +17,7 @@ static inline int note_kasan_page_table(struct mm_walk *walk, { struct ptdump_state *st = walk->private; - st->note_page(st, addr, 4, pte_val(kasan_early_shadow_pte[0])); + st->note_page(st, addr, 4, pte_val(kasan_early_shadow_pte[0]), PAGE_SIZE); I'm not completely sure what the page_size is going to be used for, but note that KASAN presents an interesting case here. We short-cut by detecting it's a KASAN region at a high level (PGD/P4D/PUD/PMD) and instead of walking the tree down just call note_page() *once* but with level==4 because we know KASAN sets up the page table like that. However the one call actually covers a much larger region - so while PAGE_SIZE matches the level it doesn't match the region covered. AFAICT this will lead to odd results if you enable KASAN on powerpc. Hum I successfully tested it with KASAN, I now realise that I tested it with CONFIG_KASAN_VMALLOC selected. In this situation, since https://github.com/torvalds/linux/commit/af3d0a686 we don't have any common shadow page table anymore. I'll test again without CONFIG_KASAN_VMALLOC. To be honest I don't fully understand why powerpc requires the page_size - it appears to be using it purely to find "holes" in the calls to note_page(), but I haven't worked out why such holes would occur. I was indeed introduced for KASAN. We have a first commit https://github.com/torvalds/linux/commit/cabe8138 which uses page size to detect whether it is a KASAN like stuff. Then came https://github.com/torvalds/linux/commit/b00ff6d8c as a fix. I can't remember what the problem was exactly, something around the use of hugepages for kernel memory, came as part of the series https://patchwork.ozlabs.org/project/linuxppc-dev/cover/cover.1589866984.git.christophe.le...@csgroup.eu/ Christophe
Re: [PATCH v1 1/1] powerpc/papr_scm: Properly handle UUID types and API
On 4/16/21 2:39 PM, Andy Shevchenko wrote: On Fri, Apr 16, 2021 at 01:28:21PM +0530, Aneesh Kumar K.V wrote: On 4/15/21 7:16 PM, Andy Shevchenko wrote: Parse to and export from UUID own type, before dereferencing. This also fixes wrong comment (Little Endian UUID is something else) and should fix Sparse warnings about assigning strict types to POD. Fixes: 43001c52b603 ("powerpc/papr_scm: Use ibm,unit-guid as the iset cookie") Fixes: 259a948c4ba1 ("powerpc/pseries/scm: Use a specific endian format for storing uuid from the device tree") Cc: Oliver O'Halloran Cc: Aneesh Kumar K.V Signed-off-by: Andy Shevchenko --- Not tested arch/powerpc/platforms/pseries/papr_scm.c | 13 - 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c index ae6f5d80d5ce..4366e1902890 100644 --- a/arch/powerpc/platforms/pseries/papr_scm.c +++ b/arch/powerpc/platforms/pseries/papr_scm.c @@ -1085,8 +1085,9 @@ static int papr_scm_probe(struct platform_device *pdev) u32 drc_index, metadata_size; u64 blocks, block_size; struct papr_scm_priv *p; + u8 uuid_raw[UUID_SIZE]; const char *uuid_str; - u64 uuid[2]; + uuid_t uuid; int rc; /* check we have all the required DT properties */ @@ -1129,16 +1130,18 @@ static int papr_scm_probe(struct platform_device *pdev) p->hcall_flush_required = of_property_read_bool(dn, "ibm,hcall-flush-required"); /* We just need to ensure that set cookies are unique across */ - uuid_parse(uuid_str, (uuid_t *) uuid); + uuid_parse(uuid_str, &uuid); + /* * cookie1 and cookie2 are not really little endian -* we store a little endian representation of the +* we store a raw buffer representation of the * uuid str so that we can compare this with the label * area cookie irrespective of the endian config with which * the kernel is built. */ - p->nd_set.cookie1 = cpu_to_le64(uuid[0]); - p->nd_set.cookie2 = cpu_to_le64(uuid[1]); + export_uuid(uuid_raw, &uuid); + p->nd_set.cookie1 = get_unaligned_le64(&uuid_raw[0]); + p->nd_set.cookie2 = get_unaligned_le64(&uuid_raw[8]); ok that does the equivalent of cpu_to_le64 there. So we are good. But the comment update is missing the details why we did that get_unaligned_le64. Maybe raw buffer representation is the correct term? Should we add an example in the comment. ie, /* * Historically we stored the cookie in the below format. for a uuid str 72511b67-0b3b-42fd-8d1d-5be3cae8bcaa cookie1 was 0xfd423b0b671b5172 cookie2 was 0xaabce8cae35b1d8d */ I'm fine with the comment. At least it will shed a light on the byte ordering we are expecting. Will you be sending an update? Also it will be good to list the sparse warning in the commit message? -aneesh
Re: [PATCH] powerpc/kdump: fix kdump kernel hangup issue with hot add CPUs
On 16/04/21 12:17 pm, Sourabh Jain wrote: With the kexec_file_load system call when system crashes on the hot add CPU the capture kernel hangs and failed to collect the vmcore. Kernel panic - not syncing: sysrq triggered crash CPU: 24 PID: 6065 Comm: echo Kdump: loaded Not tainted 5.12.0-rc5upstream #54 Call Trace: [c000e590fac0] [c07b2400] dump_stack+0xc4/0x114 (unreliable) [c000e590fb00] [c0145290] panic+0x16c/0x41c [c000e590fba0] [c08892e0] sysrq_handle_crash+0x30/0x40 [c000e590fc00] [c0889cdc] __handle_sysrq+0xcc/0x1f0 [c000e590fca0] [c088a538] write_sysrq_trigger+0xd8/0x178 [c000e590fce0] [c05e9b7c] proc_reg_write+0x10c/0x1b0 [c000e590fd10] [c04f26d0] vfs_write+0xf0/0x330 [c000e590fd60] [c04f2aec] ksys_write+0x7c/0x140 [c000e590fdb0] [c0031ee0] system_call_exception+0x150/0x290 [c000e590fe10] [c000ca5c] system_call_common+0xec/0x278 --- interrupt: c00 at 0x7fff905b9664 NIP: 7fff905b9664 LR: 7fff905320c4 CTR: REGS: c000e590fe80 TRAP: 0c00 Not tainted (5.12.0-rc5upstream) MSR: 8280f033 CR: 28000242 XER: IRQMASK: 0 GPR00: 0004 75fedf30 7fff906a7300 0001 GPR04: 01002a7355b0 0002 0001 75fef616 GPR08: 0001 GPR12: 7fff9073a160 GPR16: GPR20: 7fff906a4ee0 0002 0001 GPR24: 7fff906a0898 0002 01002a7355b0 GPR28: 0002 7fff906a1790 01002a7355b0 0002 NIP [7fff905b9664] 0x7fff905b9664 LR [7fff905320c4] 0x7fff905320c4 --- interrupt: c00 /** * setup_new_fdt_ppc64 - Update the flattend device-tree of the kernel * being loaded. @@ -1020,6 +1113,13 @@ int setup_new_fdt_ppc64(const struct kimage *image, void *fdt, } } + /* Update cpus nodes information to account hotplug CPUs. */ + if (image->type == KEXEC_TYPE_CRASH) { Shouldn't this apply to regular kexec_file_load case as well? Yeah, there won't be a hang in regular kexec_file_load case but for correctness, that kernel should also not see stale CPU info in FDT? Thanks Hari
Re: [PATCH v1 3/5] mm: ptdump: Provide page size to notepage()
On 15/04/2021 18:18, Christophe Leroy wrote: In order to support large pages on powerpc, notepage() needs to know the page size of the page. Add a page_size argument to notepage(). Signed-off-by: Christophe Leroy --- arch/arm64/mm/ptdump.c | 2 +- arch/riscv/mm/ptdump.c | 2 +- arch/s390/mm/dump_pagetables.c | 3 ++- arch/x86/mm/dump_pagetables.c | 2 +- include/linux/ptdump.h | 2 +- mm/ptdump.c| 16 6 files changed, 14 insertions(+), 13 deletions(-) [...] diff --git a/mm/ptdump.c b/mm/ptdump.c index da751448d0e4..61cd16afb1c8 100644 --- a/mm/ptdump.c +++ b/mm/ptdump.c @@ -17,7 +17,7 @@ static inline int note_kasan_page_table(struct mm_walk *walk, { struct ptdump_state *st = walk->private; - st->note_page(st, addr, 4, pte_val(kasan_early_shadow_pte[0])); + st->note_page(st, addr, 4, pte_val(kasan_early_shadow_pte[0]), PAGE_SIZE); I'm not completely sure what the page_size is going to be used for, but note that KASAN presents an interesting case here. We short-cut by detecting it's a KASAN region at a high level (PGD/P4D/PUD/PMD) and instead of walking the tree down just call note_page() *once* but with level==4 because we know KASAN sets up the page table like that. However the one call actually covers a much larger region - so while PAGE_SIZE matches the level it doesn't match the region covered. AFAICT this will lead to odd results if you enable KASAN on powerpc. To be honest I don't fully understand why powerpc requires the page_size - it appears to be using it purely to find "holes" in the calls to note_page(), but I haven't worked out why such holes would occur. Steve walk->action = ACTION_CONTINUE; @@ -41,7 +41,7 @@ static int ptdump_pgd_entry(pgd_t *pgd, unsigned long addr, st->effective_prot(st, 0, pgd_val(val)); if (pgd_leaf(val)) - st->note_page(st, addr, 0, pgd_val(val)); + st->note_page(st, addr, 0, pgd_val(val), PGDIR_SIZE); return 0; } @@ -62,7 +62,7 @@ static int ptdump_p4d_entry(p4d_t *p4d, unsigned long addr, st->effective_prot(st, 1, p4d_val(val)); if (p4d_leaf(val)) - st->note_page(st, addr, 1, p4d_val(val)); + st->note_page(st, addr, 1, p4d_val(val), P4D_SIZE); return 0; } @@ -83,7 +83,7 @@ static int ptdump_pud_entry(pud_t *pud, unsigned long addr, st->effective_prot(st, 2, pud_val(val)); if (pud_leaf(val)) - st->note_page(st, addr, 2, pud_val(val)); + st->note_page(st, addr, 2, pud_val(val), PUD_SIZE); return 0; } @@ -102,7 +102,7 @@ static int ptdump_pmd_entry(pmd_t *pmd, unsigned long addr, if (st->effective_prot) st->effective_prot(st, 3, pmd_val(val)); if (pmd_leaf(val)) - st->note_page(st, addr, 3, pmd_val(val)); + st->note_page(st, addr, 3, pmd_val(val), PMD_SIZE); return 0; } @@ -116,7 +116,7 @@ static int ptdump_pte_entry(pte_t *pte, unsigned long addr, if (st->effective_prot) st->effective_prot(st, 4, pte_val(val)); - st->note_page(st, addr, 4, pte_val(val)); + st->note_page(st, addr, 4, pte_val(val), PAGE_SIZE); return 0; } @@ -126,7 +126,7 @@ static int ptdump_hole(unsigned long addr, unsigned long next, { struct ptdump_state *st = walk->private; - st->note_page(st, addr, depth, 0); + st->note_page(st, addr, depth, 0, 0); return 0; } @@ -153,5 +153,5 @@ void ptdump_walk_pgd(struct ptdump_state *st, struct mm_struct *mm, pgd_t *pgd) mmap_read_unlock(mm); /* Flush out the last page */ - st->note_page(st, 0, -1, 0); + st->note_page(st, 0, -1, 0, 0); }
Re: [PATCH v1 1/1] powerpc/papr_scm: Properly handle UUID types and API
On Fri, Apr 16, 2021 at 01:28:21PM +0530, Aneesh Kumar K.V wrote: > On 4/15/21 7:16 PM, Andy Shevchenko wrote: > > Parse to and export from UUID own type, before dereferencing. > > This also fixes wrong comment (Little Endian UUID is something else) > > and should fix Sparse warnings about assigning strict types to POD. > > > > Fixes: 43001c52b603 ("powerpc/papr_scm: Use ibm,unit-guid as the iset > > cookie") > > Fixes: 259a948c4ba1 ("powerpc/pseries/scm: Use a specific endian format for > > storing uuid from the device tree") > > Cc: Oliver O'Halloran > > Cc: Aneesh Kumar K.V > > Signed-off-by: Andy Shevchenko > > --- > > Not tested > > arch/powerpc/platforms/pseries/papr_scm.c | 13 - > > 1 file changed, 8 insertions(+), 5 deletions(-) > > > > diff --git a/arch/powerpc/platforms/pseries/papr_scm.c > > b/arch/powerpc/platforms/pseries/papr_scm.c > > index ae6f5d80d5ce..4366e1902890 100644 > > --- a/arch/powerpc/platforms/pseries/papr_scm.c > > +++ b/arch/powerpc/platforms/pseries/papr_scm.c > > @@ -1085,8 +1085,9 @@ static int papr_scm_probe(struct platform_device > > *pdev) > > u32 drc_index, metadata_size; > > u64 blocks, block_size; > > struct papr_scm_priv *p; > > + u8 uuid_raw[UUID_SIZE]; > > const char *uuid_str; > > - u64 uuid[2]; > > + uuid_t uuid; > > int rc; > > /* check we have all the required DT properties */ > > @@ -1129,16 +1130,18 @@ static int papr_scm_probe(struct platform_device > > *pdev) > > p->hcall_flush_required = of_property_read_bool(dn, > > "ibm,hcall-flush-required"); > > /* We just need to ensure that set cookies are unique across */ > > - uuid_parse(uuid_str, (uuid_t *) uuid); > > + uuid_parse(uuid_str, &uuid); > > + > > /* > > * cookie1 and cookie2 are not really little endian > > -* we store a little endian representation of the > > +* we store a raw buffer representation of the > > * uuid str so that we can compare this with the label > > * area cookie irrespective of the endian config with which > > * the kernel is built. > > */ > > - p->nd_set.cookie1 = cpu_to_le64(uuid[0]); > > - p->nd_set.cookie2 = cpu_to_le64(uuid[1]); > > + export_uuid(uuid_raw, &uuid); > > + p->nd_set.cookie1 = get_unaligned_le64(&uuid_raw[0]); > > + p->nd_set.cookie2 = get_unaligned_le64(&uuid_raw[8]); > > ok that does the equivalent of cpu_to_le64 there. So we are good. But the > comment update is missing the details why we did that get_unaligned_le64. > Maybe raw buffer representation is the correct term? > Should we add an example in the comment. ie, > /* > * Historically we stored the cookie in the below format. > for a uuid str 72511b67-0b3b-42fd-8d1d-5be3cae8bcaa > cookie1 was 0xfd423b0b671b5172 cookie2 was 0xaabce8cae35b1d8d > */ I'm fine with the comment. At least it will shed a light on the byte ordering we are expecting. > > /* might be zero */ > > p->metadata_size = metadata_size; -- With Best Regards, Andy Shevchenko
Re: [PATCH] powerpc: Initialize local variable fdt to NULL in elf64_load()
Daniel Axtens writes: >> On 4/15/21 12:14 PM, Lakshmi Ramasubramanian wrote: >> >> Sorry - missed copying device-tree and powerpc mailing lists. >> >>> There are a few "goto out;" statements before the local variable "fdt" >>> is initialized through the call to of_kexec_alloc_and_setup_fdt() in >>> elf64_load(). This will result in an uninitialized "fdt" being passed >>> to kvfree() in this function if there is an error before the call to >>> of_kexec_alloc_and_setup_fdt(). >>> >>> Initialize the local variable "fdt" to NULL. >>> > I'm a huge fan of initialising local variables! But I'm struggling to > find the code path that will lead to an uninit fdt being returned... > > The out label reads in part: > > /* Make kimage_file_post_load_cleanup free the fdt buffer for us. */ > return ret ? ERR_PTR(ret) : fdt; > > As far as I can tell, any time we get a non-zero ret, we're going to > return an error pointer rather than the uninitialised value... > > (btw, it does look like we might leak fdt if we have an error after we > successfully kmalloc it.) > > Am I missing something? Can you link to the report for the kernel test > robot or from Dan? > > FWIW, I think it's worth including this patch _anyway_ because initing > local variables is good practice, but I'm just not sure on the > justification. Why is it good practice? It defeats -Wuninitialized. So you're guaranteed to be returning something initialised, but not necessarily initialised to the right value. In a case like this NULL seems like a safe choice, but it's still wrong. The function is meant to return a pointer to the successfully allocated fdt, or an ERR_PTR() value. NULL is neither of those. I agree there are security reasons that initialising stack variables is desirable, but I think that should be handled by the compiler, not at the source level. cheers
Re: [PATCH net-next v4 2/2] of: net: fix of_get_mac_addr_nvmem() for non-platform devices
Am 2021-04-16 05:24, schrieb Benjamin Herrenschmidt: On Mon, 2021-04-12 at 19:47 +0200, Michael Walle wrote: /** * of_get_phy_mode - Get phy mode for given device_node @@ -59,15 +60,39 @@ static int of_get_mac_addr(struct device_node *np, const char *name, u8 *addr) static int of_get_mac_addr_nvmem(struct device_node *np, u8 *addr) { struct platform_device *pdev = of_find_device_by_node(np); + struct nvmem_cell *cell; + const void *mac; + size_t len; int ret; - if (!pdev) - return -ENODEV; + /* Try lookup by device first, there might be a nvmem_cell_lookup +* associated with a given device. +*/ + if (pdev) { + ret = nvmem_get_mac_address(&pdev->dev, addr); + put_device(&pdev->dev); + return ret; + } + This smells like the wrong band aid :) Any struct device can contain an OF node pointer these days. But not all nodes might have an associated device, see DSA for example. And as the name suggests of_get_mac_address() operates on a node. So if a driver calls of_get_mac_address() it should work on the node. What is wrong IMHO, is that the ethernet drivers where the corresponding board has a nvmem_cell_lookup registered is calling of_get_mac_address(node). It should rather call eth_get_mac_address(dev) in the first place. One would need to figure out if there is an actual device (with an assiciated of_node), then call eth_get_mac_address(dev) and if there isn't a device call of_get_mac_address(node). But I don't know if that is easy to figure out. Well, one could start with just the device where nvmem_cell_lookup is used. Then we could drop the workaround above. This seems all backwards. I think we are dealing with bad evolution. We need to do a lookup for the device because we get passed an of_node. We should just get passed a device here... or rather stop calling of_get_mac_addr() from all those drivers and instead call eth_platform_get_mac_address() which in turns calls of_get_mac_addr(). Then the nvmem stuff gets put in eth_platform_get_mac_address(). of_get_mac_addr() becomes a low-level thingy that most drivers don't care about. The NVMEM thing is just another (optional) way how the MAC address is fetched from the device tree. Thus, if the drivers have the of_get_mac_address() call they should automatically get the NVMEM method, too. -michael
Re: [PATCH] mm: ptdump: Fix build failure
On 15/04/2021 10:31, Christophe Leroy wrote: CC mm/ptdump.o In file included from : mm/ptdump.c: In function 'ptdump_pte_entry': ././include/linux/compiler_types.h:320:38: error: call to '__compiletime_assert_207' declared with attribute error: Unsupported access size for {READ,WRITE}_ONCE(). 320 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__) | ^ ././include/linux/compiler_types.h:301:4: note: in definition of macro '__compiletime_assert' 301 |prefix ## suffix();\ |^~ ././include/linux/compiler_types.h:320:2: note: in expansion of macro '_compiletime_assert' 320 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__) | ^~~ ./include/asm-generic/rwonce.h:36:2: note: in expansion of macro 'compiletime_assert' 36 | compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long), \ | ^~ ./include/asm-generic/rwonce.h:49:2: note: in expansion of macro 'compiletime_assert_rwonce_type' 49 | compiletime_assert_rwonce_type(x);\ | ^~ mm/ptdump.c:114:14: note: in expansion of macro 'READ_ONCE' 114 | pte_t val = READ_ONCE(*pte); | ^ make[2]: *** [mm/ptdump.o] Error 1 READ_ONCE() cannot be used for reading PTEs. Use ptep_get() instead. See commit 481e980a7c19 ("mm: Allow arches to provide ptep_get()") and commit c0e1c8c22beb ("powerpc/8xx: Provide ptep_get() with 16k pages") for details. It was cargo-culted from the arm64/x86 implementations (where this happens to be safe). Fixes: 30d621f6723b ("mm: add generic ptdump") Cc: Steven Price Signed-off-by: Christophe Leroy Reviewed-by: Steven Price Thanks, Steve --- mm/ptdump.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/ptdump.c b/mm/ptdump.c index 4354c1422d57..da751448d0e4 100644 --- a/mm/ptdump.c +++ b/mm/ptdump.c @@ -111,7 +111,7 @@ static int ptdump_pte_entry(pte_t *pte, unsigned long addr, unsigned long next, struct mm_walk *walk) { struct ptdump_state *st = walk->private; - pte_t val = READ_ONCE(*pte); + pte_t val = ptep_get(pte); if (st->effective_prot) st->effective_prot(st, 4, pte_val(val));
Re: [PATCH] powerpc: Initialize local variable fdt to NULL in elf64_load()
On Fri, Apr 16, 2021 at 09:00:12AM +0200, Christophe Leroy wrote: > > > Le 16/04/2021 à 08:44, Daniel Axtens a écrit : > > Hi Lakshmi, > > > > > On 4/15/21 12:14 PM, Lakshmi Ramasubramanian wrote: > > > > > > Sorry - missed copying device-tree and powerpc mailing lists. > > > > > > > There are a few "goto out;" statements before the local variable "fdt" > > > > is initialized through the call to of_kexec_alloc_and_setup_fdt() in > > > > elf64_load(). This will result in an uninitialized "fdt" being passed > > > > to kvfree() in this function if there is an error before the call to > > > > of_kexec_alloc_and_setup_fdt(). > > > > > > > > Initialize the local variable "fdt" to NULL. > > > > > > I'm a huge fan of initialising local variables! But I'm struggling to > > find the code path that will lead to an uninit fdt being returned... > > > > The out label reads in part: > > > > /* Make kimage_file_post_load_cleanup free the fdt buffer for us. */ > > return ret ? ERR_PTR(ret) : fdt; > > > > As far as I can tell, any time we get a non-zero ret, we're going to > > return an error pointer rather than the uninitialised value... > > I don't think GCC is smart enough to detect that. > We disabled uninitialized variable checking for GCC. But actually is something that has been on my mind recently. Smatch is supposed to parse this correctly but there is a bug that affects powerpc and I don't know how to debug it. The kbuild bot is doing cross platform compiles but I don't have one set up on myself. Could someone with Smatch installed test something for me? Or if you don't have Smatch installed then you should definitely install it. :P https://www.spinics.net/lists/smatch/msg00568.html Apply the patch from below and edit the path to point to the correct directory. Then run kchecker and email me the output? ~/path/to/smatch_scripts/kchecker arch/powerpc/kernel/hw_breakpoint.c regads, dan carpenter diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c index 8fc7a14e4d71..f2dfba54e14d 100644 --- a/arch/powerpc/kernel/hw_breakpoint.c +++ b/arch/powerpc/kernel/hw_breakpoint.c @@ -167,13 +167,19 @@ static bool can_co_exist(struct breakpoint *b, struct perf_event *bp) return !(alternate_infra_bp(b, bp) && bp_addr_range_overlap(b->bp, bp)); } +#include "/home/XXX/path/to/smatch/check_debug.h" static int task_bps_add(struct perf_event *bp) { struct breakpoint *tmp; tmp = alloc_breakpoint(bp); - if (IS_ERR(tmp)) + __smatch_about(tmp); + __smatch_debug_on(); + if (IS_ERR(tmp)) { + __smatch_debug_off(); + __smatch_about(tmp); return PTR_ERR(tmp); + } list_add(&tmp->list, &task_bps); return 0;
Re: [PATCH v1 1/1] powerpc/papr_scm: Properly handle UUID types and API
On 4/15/21 7:16 PM, Andy Shevchenko wrote: Parse to and export from UUID own type, before dereferencing. This also fixes wrong comment (Little Endian UUID is something else) and should fix Sparse warnings about assigning strict types to POD. Fixes: 43001c52b603 ("powerpc/papr_scm: Use ibm,unit-guid as the iset cookie") Fixes: 259a948c4ba1 ("powerpc/pseries/scm: Use a specific endian format for storing uuid from the device tree") Cc: Oliver O'Halloran Cc: Aneesh Kumar K.V Signed-off-by: Andy Shevchenko --- Not tested arch/powerpc/platforms/pseries/papr_scm.c | 13 - 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c index ae6f5d80d5ce..4366e1902890 100644 --- a/arch/powerpc/platforms/pseries/papr_scm.c +++ b/arch/powerpc/platforms/pseries/papr_scm.c @@ -1085,8 +1085,9 @@ static int papr_scm_probe(struct platform_device *pdev) u32 drc_index, metadata_size; u64 blocks, block_size; struct papr_scm_priv *p; + u8 uuid_raw[UUID_SIZE]; const char *uuid_str; - u64 uuid[2]; + uuid_t uuid; int rc; /* check we have all the required DT properties */ @@ -1129,16 +1130,18 @@ static int papr_scm_probe(struct platform_device *pdev) p->hcall_flush_required = of_property_read_bool(dn, "ibm,hcall-flush-required"); /* We just need to ensure that set cookies are unique across */ - uuid_parse(uuid_str, (uuid_t *) uuid); + uuid_parse(uuid_str, &uuid); + /* * cookie1 and cookie2 are not really little endian -* we store a little endian representation of the +* we store a raw buffer representation of the * uuid str so that we can compare this with the label * area cookie irrespective of the endian config with which * the kernel is built. */ - p->nd_set.cookie1 = cpu_to_le64(uuid[0]); - p->nd_set.cookie2 = cpu_to_le64(uuid[1]); + export_uuid(uuid_raw, &uuid); + p->nd_set.cookie1 = get_unaligned_le64(&uuid_raw[0]); + p->nd_set.cookie2 = get_unaligned_le64(&uuid_raw[8]); ok that does the equivalent of cpu_to_le64 there. So we are good. But the comment update is missing the details why we did that get_unaligned_le64. Maybe raw buffer representation is the correct term? Should we add an example in the comment. ie, /* * Historically we stored the cookie in the below format. for a uuid str 72511b67-0b3b-42fd-8d1d-5be3cae8bcaa cookie1 was 0xfd423b0b671b5172 cookie2 was 0xaabce8cae35b1d8d */ /* might be zero */ p->metadata_size = metadata_size;
[PATCH] symbol : Make the size of the compile-related array fixed
OPPO 本电子邮件及其附件含有OPPO公司的保密信息,仅限于邮件指明的收件人使用(包含个人及群组)。禁止任何人在未经授权的情况下以任何形式使用。如果您错收了本邮件,请立即以电子邮件通知发件人并删除本邮件及其附件。 This e-mail and its attachments contain confidential information from OPPO, which is intended only for the person or entity whose address is listed above. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender by phone or email immediately and delete it! 0001-symbol-Make-the-size-of-the-compile-related-array-fi.patch Description: 0001-symbol-Make-the-size-of-the-compile-related-array-fi.patch
Re: [PATCH 1/2] powerpc/sstep: Add emulation support for ‘setb’ instruction
Sathvika Vasireddy writes: > This adds emulation support for the following instruction: >* Set Boolean (setb) > > Signed-off-by: Sathvika Vasireddy > --- > arch/powerpc/lib/sstep.c | 12 > 1 file changed, 12 insertions(+) > > diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c > index c6aebc149d14..263c613d7490 100644 > --- a/arch/powerpc/lib/sstep.c > +++ b/arch/powerpc/lib/sstep.c > @@ -1964,6 +1964,18 @@ int analyse_instr(struct instruction_op *op, const > struct pt_regs *regs, > op->val = ~(regs->gpr[rd] | regs->gpr[rb]); > goto logical_done; > > + case 128: /* setb */ > + if (!cpu_has_feature(CPU_FTR_ARCH_300)) > + goto unknown_opcode; Ok, if I've understood correctly... > + ra = ra & ~0x3; This masks off the bits of RA that are not part of BTF: ra is in [0, 31] which is [0b0, 0b1] Then ~0x3 = ~0b00011 ra = ra & 0b11100 This gives us then, ra = btf << 2; or btf = ra >> 2; Let's then check to see if your calculations read the right fields. > + if ((regs->ccr) & (1 << (31 - ra))) > + op->val = -1; > + else if ((regs->ccr) & (1 << (30 - ra))) > + op->val = 1; > + else > + op->val = 0; CR field: 76543210 bit: 0123 0123 0123 0123 0123 0123 0123 0123 normal bit #: 0.31 ibm bit #: 31.0 If btf = 0, ra = 0, check normal bits 31 and 30, which are both in CR0. CR field: 76543210 bit: 0123 0123 0123 0123 0123 0123 0123 0123 ^^ If btf = 7, ra = 0b11100 = 28, so check normal bits 31-28 and 30-28, which are 3 and 2. CR field: 76543210 bit: 0123 0123 0123 0123 0123 0123 0123 0123 ^^ If btf = 3, ra = 0b01100 = 12, for normal bits 19 and 18: CR field: 76543210 bit: 0123 0123 0123 0123 0123 0123 0123 0123 ^^ So yes, your calculations, while I struggle to follow _how_ they work, do in fact seem to work. Checkpatch does have one complaint: CHECK:UNNECESSARY_PARENTHESES: Unnecessary parentheses around 'regs->ccr' #30: FILE: arch/powerpc/lib/sstep.c:1971: + if ((regs->ccr) & (1 << (31 - ra))) I don't really mind the parenteses: I think you are safe to ignore checkpatch here unless someone else complains :) If you do end up respinning the patch, I think it would be good to make the maths a bit clearer. I think it works because a left shift of 2 is the same as multiplying by 4, but it would be easier to follow if you used a temporary variable for btf. However, I do think this is still worth adding to the kernel either way, so: Reviewed-by: Daniel Axtens Kind regards, Daniel > + goto compute_done; > + > case 154: /* prtyw */ > do_prty(regs, op, regs->gpr[rd], 32); > goto logical_done_nocc; > -- > 2.16.4
Re: [PATCH] powerpc: Initialize local variable fdt to NULL in elf64_load()
On Fri, Apr 16, 2021 at 04:44:30PM +1000, Daniel Axtens wrote: > Hi Lakshmi, > > > On 4/15/21 12:14 PM, Lakshmi Ramasubramanian wrote: > > > > Sorry - missed copying device-tree and powerpc mailing lists. > > > >> There are a few "goto out;" statements before the local variable "fdt" > >> is initialized through the call to of_kexec_alloc_and_setup_fdt() in > >> elf64_load(). This will result in an uninitialized "fdt" being passed > >> to kvfree() in this function if there is an error before the call to > >> of_kexec_alloc_and_setup_fdt(). > >> > >> Initialize the local variable "fdt" to NULL. > >> > I'm a huge fan of initialising local variables! Don't be! It just disables static checker warnings and hides bugs. The kbuild emails are archived but the email is mangled and unreadable. https://www.mail-archive.com/kbuild@lists.01.org/msg06371.html I think maybe you're not on the most recent code. In linux-next this code looks like: arch/powerpc/kexec/elf_64.c 27 static void *elf64_load(struct kimage *image, char *kernel_buf, 28 unsigned long kernel_len, char *initrd, 29 unsigned long initrd_len, char *cmdline, 30 unsigned long cmdline_len) 31 { 32 int ret; 33 unsigned long kernel_load_addr; 34 unsigned long initrd_load_addr = 0, fdt_load_addr; 35 void *fdt; 36 const void *slave_code; 37 struct elfhdr ehdr; 38 char *modified_cmdline = NULL; 39 struct kexec_elf_info elf_info; 40 struct kexec_buf kbuf = { .image = image, .buf_min = 0, 41.buf_max = ppc64_rma_size }; 42 struct kexec_buf pbuf = { .image = image, .buf_min = 0, 43.buf_max = ppc64_rma_size, .top_down = true, 44.mem = KEXEC_BUF_MEM_UNKNOWN }; 45 46 ret = kexec_build_elf_info(kernel_buf, kernel_len, &ehdr, &elf_info); 47 if (ret) 48 goto out; I really despise "goto out;" because freeing things which haven't been allocated is always dangerous. [ snip ]. 143 out: 144 kfree(modified_cmdline); 145 kexec_free_elf_info(&elf_info); There is a possibility that "elf_info" has holds uninitialized stack data if elf_read_ehdr() fails so that's probably fixing as well. kexec() is root only so this can't be exploited. 146 147 /* 148 * Once FDT buffer has been successfully passed to kexec_add_buffer(), 149 * the FDT buffer address is saved in image->arch.fdt. In that case, 150 * the memory cannot be freed here in case of any other error. 151 */ 152 if (ret && !image->arch.fdt) 153 kvfree(fdt); ^^^ Uninitialized. 154 155 return ret ? ERR_PTR(ret) : NULL; 156 } regards, dan carpenter
RE: [PATCH 1/1] mm: Fix struct page layout on 32-bit systems
From: Matthew Wilcox > Sent: 15 April 2021 23:22 > > On Thu, Apr 15, 2021 at 09:11:56PM +, David Laight wrote: > > Isn't it possible to move the field down one long? > > This might require an explicit zero - but this is not a common > > code path - the extra write will be noise. > > Then it overlaps page->mapping. See emails passim. The rules on overlaps make be wonder if every 'long' should be in its own union. The comments would need to say when each field is used. It would, at least, make these errors less common. That doesn't solve the 64bit dma_addr though. Actually rather that word-swapping dma_addr on 32bit BE could you swap over the two fields it overlays with. That might look messy in the .h, but it doesn't require an accessor function to do the swap - easily missed. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
Re: [PATCH] powerpc/pseries: extract host bridge from pci_bus prior to bus removal
Hi Tyrel, > The pci_bus->bridge reference may no longer be valid after > pci_bus_remove() resulting in passing a bad value to device_unregister() > for the associated bridge device. > > Store the host_bridge reference in a separate variable prior to > pci_bus_remove(). > The patch certainly seems to do what you say. I'm not really up on the innards of PCI, so I'm struggling to figure out by what code path pci_bus_remove() might invalidate pci_bus->bridge? A quick look at pci_remove_bus was not very illuminating but I didn't chase down every call it made. Kind regards, Daniel > Fixes: 7340056567e3 ("powerpc/pci: Reorder pci bus/bridge unregistration > during PHB removal") > Signed-off-by: Tyrel Datwyler > --- > arch/powerpc/platforms/pseries/pci_dlpar.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/arch/powerpc/platforms/pseries/pci_dlpar.c > b/arch/powerpc/platforms/pseries/pci_dlpar.c > index f9ae17e8a0f4..a8f9140a24fa 100644 > --- a/arch/powerpc/platforms/pseries/pci_dlpar.c > +++ b/arch/powerpc/platforms/pseries/pci_dlpar.c > @@ -50,6 +50,7 @@ EXPORT_SYMBOL_GPL(init_phb_dynamic); > int remove_phb_dynamic(struct pci_controller *phb) > { > struct pci_bus *b = phb->bus; > + struct pci_host_bridge *host_bridge = to_pci_host_bridge(b->bridge); > struct resource *res; > int rc, i; > > @@ -76,7 +77,8 @@ int remove_phb_dynamic(struct pci_controller *phb) > /* Remove the PCI bus and unregister the bridge device from sysfs */ > phb->bus = NULL; > pci_remove_bus(b); > - device_unregister(b->bridge); > + host_bridge->bus = NULL; > + device_unregister(&host_bridge->dev); > > /* Now release the IO resource */ > if (res->flags & IORESOURCE_IO) > -- > 2.27.0
Re: [PATCH] soc: fsl: qe: remove unused function
Le 16/04/2021 à 08:57, Daniel Axtens a écrit : Hi Jiapeng, Fix the following clang warning: You are not fixing a warning, you are removing a function in order to fix a warning ... drivers/soc/fsl/qe/qe_ic.c:234:29: warning: unused function 'qe_ic_from_irq' [-Wunused-function]. Would be wise to tell that the last users of the function where removed by commit d7c2878cfcfa ("soc: fsl: qe: remove unused qe_ic_set_* functions") https://github.com/torvalds/linux/commit/d7c2878cfcfa Reported-by: Abaci Robot Signed-off-by: Jiapeng Chong --- drivers/soc/fsl/qe/qe_ic.c | 5 - 1 file changed, 5 deletions(-) diff --git a/drivers/soc/fsl/qe/qe_ic.c b/drivers/soc/fsl/qe/qe_ic.c index 0390af9..b573712 100644 --- a/drivers/soc/fsl/qe/qe_ic.c +++ b/drivers/soc/fsl/qe/qe_ic.c @@ -231,11 +231,6 @@ static inline void qe_ic_write(__be32 __iomem *base, unsigned int reg, qe_iowrite32be(value, base + (reg >> 2)); } -static inline struct qe_ic *qe_ic_from_irq(unsigned int virq) -{ - return irq_get_chip_data(virq); -} This seems good to me. * We know that this function can't be called directly from outside the file, because it is static. * The function address isn't used as a function pointer anywhere, so that means it can't be called from outside the file that way (also it's inline, which would make using a function pointer unwise!) * There's no obvious macros in that file that might construct the name of the function in a way that is hidden from grep. All in all, I am fairly confident that the function is indeed not used. Reviewed-by: Daniel Axtens Kind regards, Daniel - static inline struct qe_ic *qe_ic_from_irq_data(struct irq_data *d) { return irq_data_get_irq_chip_data(d); -- 1.8.3.1
答复: [PATCH] symbol : Make the size of the compile-related array fixed
Thank you!Got it. I tried to use Git, but it didn't work. I'll do it next time. -邮件原件- 发件人: Daniel Axtens 发送时间: 2021年4月16日 14:19 收件人: 韩大鹏(Han Dapeng) ; Michael Ellerman ; Benjamin Herrenschmidt ; Paul Mackerras ; Heiko Carstens ; Vasily Gorbik ; Christian Borntraeger ; Thomas Gleixner ; Ingo Molnar ; Borislav Petkov ; x...@kernel.org; H. Peter Anvin ; Masahiro Yamada ; Michal Marek ; Mike Rapoport ; Pekka Enberg ; Andrew Morton ; Arseny Solokha ; 韩大鹏(Han Dapeng) ; Arvind Sankar ; Kees Cook ; Joerg Roedel ; Christian Brauner ; Kirill Tkhai ; linuxppc-dev@lists.ozlabs.org; linux-ker...@vger.kernel.org; linux-s...@vger.kernel.org; linux-kbu...@vger.kernel.org 抄送: 陈安庆(Anqing) 主题: Re: [PATCH] symbol : Make the size of the compile-related array fixed Hi, Thanks for your contribution to the kernel! I notice that your patch is sumbitted as an attachment. In future, please could you submit your patch inline, rather than as an attachment? See https://apc01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.kernel.org%2Fdoc%2Fhtml%2Fv4.15%2Fprocess%2F5.Posting.html&data=04%7C01%7Chandapeng%40oppo.com%7Cd00adeb268d34207f5bb08d9009f98b2%7Cf1905eb1c35341c5951662b4a54b5ee6%7C0%7C0%7C637541508341234090%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=8Sncu0Yyqvv3AtEZiL73vexVEpwmxmwrUjJ124ez%2BPs%3D&reserved=0 I'd recommend you use git send-email if possible: see e.g. https://apc01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.kernel.org%2Fdoc%2Fhtml%2Fv4.15%2Fprocess%2Femail-clients.html&data=04%7C01%7Chandapeng%40oppo.com%7Cd00adeb268d34207f5bb08d9009f98b2%7Cf1905eb1c35341c5951662b4a54b5ee6%7C0%7C0%7C637541508341234090%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=Ny2BFB4AE05nHHqITJkixtOwfPrB8s1KboWVLTFVXF8%3D&reserved=0 > Subject: [PATCH] symbol : Make the size of the compile-related array > fixed > > For the same code, the machine's user name, hostname, or compilation > time may cause the kernel symbol address to be inconsistent, which is > not friendly to some symbol-dependent software, such as Crash. If I understand correctly, this patch makes it easier to recompile the kernel from the same source but at a different time or on a different machine or with a different user, but still get the same symbols. Is that right? I wonder if there are other reproducible build techniques that might be simpler to apply? There is a kernel documentation page at https://apc01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.kernel.org%2Fdoc%2Fhtml%2Flatest%2Fkbuild%2Freproducible-builds.html&data=04%7C01%7Chandapeng%40oppo.com%7Cd00adeb268d34207f5bb08d9009f98b2%7Cf1905eb1c35341c5951662b4a54b5ee6%7C0%7C0%7C637541508341234090%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=s54GLV2Pue3ArPgX4OcEyhAGsanJyBy1LRESkVMOGts%3D&reserved=0 which gives exisiting techniques to override the date, user and host. Would they be sufficient to address your use case? Kind regards, Daniel > > Signed-off-by: Han Dapeng > --- > arch/powerpc/mm/nohash/kaslr_booke.c | 2 +- > arch/s390/boot/version.c | 2 +- > arch/x86/boot/compressed/kaslr.c | 2 +- > arch/x86/boot/version.c | 2 +- > init/version.c | 4 ++-- > scripts/mkcompile_h | 2 ++ > 6 files changed, 8 insertions(+), 6 deletions(-) > > diff --git a/arch/powerpc/mm/nohash/kaslr_booke.c > b/arch/powerpc/mm/nohash/kaslr_booke.c > index 4c74e8a5482b..494ef408e60c 100644 > --- a/arch/powerpc/mm/nohash/kaslr_booke.c > +++ b/arch/powerpc/mm/nohash/kaslr_booke.c > @@ -37,7 +37,7 @@ struct regions { > }; > > /* Simplified build-specific string for starting entropy. */ -static > const char build_str[] = UTS_RELEASE " (" LINUX_COMPILE_BY "@" > +static const char build_str[COMPILE_STR_MAX] = UTS_RELEASE " (" > LINUX_COMPILE_BY "@" > LINUX_COMPILE_HOST ") (" LINUX_COMPILER ") " UTS_VERSION; > > struct regions __initdata regions; > diff --git a/arch/s390/boot/version.c b/arch/s390/boot/version.c index > d32e58bdda6a..627416a27d74 100644 > --- a/arch/s390/boot/version.c > +++ b/arch/s390/boot/version.c > @@ -3,5 +3,5 @@ > #include > #include "boot.h" > > -const char kernel_version[] = UTS_RELEASE > +const char kernel_version[COMPILE_STR_MAX] = UTS_RELEASE > " (" LINUX_COMPILE_BY "@" LINUX_COMPILE_HOST ") " UTS_VERSION; diff > --git a/arch/x86/boot/compressed/kaslr.c > b/arch/x86/boot/compressed/kaslr.c > index b92fffbe761f..7b72b518a4c8 100644 > --- a/arch/x86/boot/compressed/kaslr.c > +++ b/arch/x86/boot/compressed/kaslr.c > @@ -43,7 +43,7 @@ > extern unsigned long get_cmd_line_ptr(void); > > /* Simplified build-specific string for starting entropy. */ -static > const char build_str[] = UTS_RELEASE " (" LINUX_COMPILE_BY "@" > +static const char build_str[COMPILE_STR_MAX] = UTS_RELEASE " (" > LINUX_COMPILE_BY "@
答复: [PATCH] symbol : Make the size of the compile-related array fixed
The raw text is shown below: From 540e6a6c36e6372d4f99eeb4a50c8eaa6d7989b3 Mon Sep 17 00:00:00 2001 From: Han Dapeng Date: Fri, 16 Apr 2021 10:36:38 +0800 Subject: [PATCH] symbol : Make the size of the compile-related array fixed For the same code, the machine's user name, hostname, or compilation time may cause the kernel symbol address to be inconsistent, which is not friendly to some symbol-dependent software, such as Crash. Signed-off-by: Han Dapeng --- arch/powerpc/mm/nohash/kaslr_booke.c | 2 +- arch/s390/boot/version.c | 2 +- arch/x86/boot/compressed/kaslr.c | 2 +- arch/x86/boot/version.c | 2 +- init/version.c | 4 ++-- scripts/mkcompile_h | 2 ++ 6 files changed, 8 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/mm/nohash/kaslr_booke.c b/arch/powerpc/mm/nohash/kaslr_booke.c index 4c74e8a5482b..494ef408e60c 100644 --- a/arch/powerpc/mm/nohash/kaslr_booke.c +++ b/arch/powerpc/mm/nohash/kaslr_booke.c @@ -37,7 +37,7 @@ struct regions { }; /* Simplified build-specific string for starting entropy. */ -static const char build_str[] = UTS_RELEASE " (" LINUX_COMPILE_BY "@" +static const char build_str[COMPILE_STR_MAX] = UTS_RELEASE " (" LINUX_COMPILE_BY "@" LINUX_COMPILE_HOST ") (" LINUX_COMPILER ") " UTS_VERSION; struct regions __initdata regions; diff --git a/arch/s390/boot/version.c b/arch/s390/boot/version.c index d32e58bdda6a..627416a27d74 100644 --- a/arch/s390/boot/version.c +++ b/arch/s390/boot/version.c @@ -3,5 +3,5 @@ #include #include "boot.h" -const char kernel_version[] = UTS_RELEASE +const char kernel_version[COMPILE_STR_MAX] = UTS_RELEASE " (" LINUX_COMPILE_BY "@" LINUX_COMPILE_HOST ") " UTS_VERSION; diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c index b92fffbe761f..7b72b518a4c8 100644 --- a/arch/x86/boot/compressed/kaslr.c +++ b/arch/x86/boot/compressed/kaslr.c @@ -43,7 +43,7 @@ extern unsigned long get_cmd_line_ptr(void); /* Simplified build-specific string for starting entropy. */ -static const char build_str[] = UTS_RELEASE " (" LINUX_COMPILE_BY "@" +static const char build_str[COMPILE_STR_MAX] = UTS_RELEASE " (" LINUX_COMPILE_BY "@" LINUX_COMPILE_HOST ") (" LINUX_COMPILER ") " UTS_VERSION; static unsigned long rotate_xor(unsigned long hash, const void *area, diff --git a/arch/x86/boot/version.c b/arch/x86/boot/version.c index a1aaaf6c06a6..08feaa2d7a10 100644 --- a/arch/x86/boot/version.c +++ b/arch/x86/boot/version.c @@ -14,6 +14,6 @@ #include #include -const char kernel_version[] = +const char kernel_version[COMPILE_STR_MAX] = UTS_RELEASE " (" LINUX_COMPILE_BY "@" LINUX_COMPILE_HOST ") " UTS_VERSION; diff --git a/init/version.c b/init/version.c index 92afc782b043..adfc9e91b56b 100644 --- a/init/version.c +++ b/init/version.c @@ -35,11 +35,11 @@ struct uts_namespace init_uts_ns = { EXPORT_SYMBOL_GPL(init_uts_ns); /* FIXED STRINGS! Don't touch! */ -const char linux_banner[] = +const char linux_banner[COMPILE_STR_MAX] = "Linux version " UTS_RELEASE " (" LINUX_COMPILE_BY "@" LINUX_COMPILE_HOST ") (" LINUX_COMPILER ") " UTS_VERSION "\n"; -const char linux_proc_banner[] = +const char linux_proc_banner[COMPILE_STR_MAX] = "%s version %s" " (" LINUX_COMPILE_BY "@" LINUX_COMPILE_HOST ")" " (" LINUX_COMPILER ") %s\n"; diff --git a/scripts/mkcompile_h b/scripts/mkcompile_h index 4ae735039daf..02b9d9d54da9 100755 --- a/scripts/mkcompile_h +++ b/scripts/mkcompile_h @@ -65,6 +65,8 @@ UTS_VERSION="$(echo $UTS_VERSION $CONFIG_FLAGS $TIMESTAMP | cut -b -$UTS_LEN)" LD_VERSION=$($LD -v | head -n1 | sed 's/(compatible with [^)]*)//' \ | sed 's/[[:space:]]*$//') printf '#define LINUX_COMPILER "%s"\n' "$CC_VERSION, $LD_VERSION" + + echo \#define COMPILE_STR_MAX 512 } > .tmpcompile # Only replace the real compile.h if the new one is different, -- 2.27.0 = This statement is automatically attached to our company's mail client. Our company encourages giving code back to the community, no problem, I will take responsibility for this. Thank you! -邮件原件- 发件人: Christophe Leroy 发送时间: 2021年4月16日 14:13 收件人: 韩大鹏(Han Dapeng) ; Michael Ellerman ; Benjamin Herrenschmidt ; Paul Mackerras ; Heiko Carstens ; Vasily Gorbik ; Christian Borntraeger ; Thomas Gleixner ; Ingo Molnar ; Borislav Petkov ; x...@kernel.org; H. Peter Anvin ; Masahiro Yamada ; Michal Marek ; Mike Rapoport ; Pekka Enberg ; Andrew Morton ; Arseny Solokha ; Arvind Sankar ; Kees Cook ; Joerg Roedel ; Christian Brauner ; Kirill Tkhai ; linuxppc-dev@lists.ozlabs.org; linux-ker...@vger.kernel.org; linux-s...@vger.kernel.org; linux-kbu...@vger.kernel.org 抄送: 陈安庆(Anqing) 主题: Re: [PATCH] symbol : Make the size
Re: [PATCH] powerpc: Initialize local variable fdt to NULL in elf64_load()
Le 16/04/2021 à 08:44, Daniel Axtens a écrit : Hi Lakshmi, On 4/15/21 12:14 PM, Lakshmi Ramasubramanian wrote: Sorry - missed copying device-tree and powerpc mailing lists. There are a few "goto out;" statements before the local variable "fdt" is initialized through the call to of_kexec_alloc_and_setup_fdt() in elf64_load(). This will result in an uninitialized "fdt" being passed to kvfree() in this function if there is an error before the call to of_kexec_alloc_and_setup_fdt(). Initialize the local variable "fdt" to NULL. I'm a huge fan of initialising local variables! But I'm struggling to find the code path that will lead to an uninit fdt being returned... The out label reads in part: /* Make kimage_file_post_load_cleanup free the fdt buffer for us. */ return ret ? ERR_PTR(ret) : fdt; As far as I can tell, any time we get a non-zero ret, we're going to return an error pointer rather than the uninitialised value... I don't think GCC is smart enough to detect that. (btw, it does look like we might leak fdt if we have an error after we successfully kmalloc it.) Am I missing something? Can you link to the report for the kernel test robot or from Dan? FWIW, I think it's worth including this patch _anyway_ because initing local variables is good practice, but I'm just not sure on the justification. I don't think local systematically initing local variables is a good practice at all, as it leads to bugs where you get a wrong value because of pathes where you forgot to set the correct value. If you don't init local variable at declaration and forget to set it in some pathes, the compiler will detect it and warn you. If you init the local variable with an arbitrary value at declaration and forget to set it later in some pathes, the compiler won't be able to detect it and you will go with the wrong value. Christophe Kind regards, Daniel Signed-off-by: Lakshmi Ramasubramanian Reported-by: kernel test robot Reported-by: Dan Carpenter --- arch/powerpc/kexec/elf_64.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/kexec/elf_64.c b/arch/powerpc/kexec/elf_64.c index 5a569bb51349..0051440c1f77 100644 --- a/arch/powerpc/kexec/elf_64.c +++ b/arch/powerpc/kexec/elf_64.c @@ -32,7 +32,7 @@ static void *elf64_load(struct kimage *image, char *kernel_buf, int ret; unsigned long kernel_load_addr; unsigned long initrd_load_addr = 0, fdt_load_addr; - void *fdt; + void *fdt = NULL; const void *slave_code; struct elfhdr ehdr; char *modified_cmdline = NULL; thanks, -lakshmi