Re: 5.?? regression: strace testsuite OOpses kernel on ia64
On Tue, 23 Feb 2021 18:53:21 + Sergei Trofimovich wrote:
> The crash seems to be related to sock_filter-v test from strace:
>     https://github.com/strace/strace/blob/master/tests/seccomp-filter-v.c
>
> Here is an OOps:
>
> [ 818.089904] BUG: Bad page map in process sock_filter-v pte:0001 pmd:118580001
> [ 818.089904] page:e6a429c8 refcount:1 mapcount:-1 mapping: index:0x0 pfn:0x0
> [ 818.089904] flags: 0x1000(reserved)
> [ 818.089904] raw: 1000 a0004008 a0004008
> [ 818.089904] raw: 0001fffe
> [ 818.089904] page dumped because: bad pte
> [ 818.089904] addr: vm_flags:04044011 anon_vma: mapping: index:0
> [ 818.095483] file:(null) fault:0x0 mmap:0x0 readpage:0x0
> [ 818.095483] CPU: 0 PID: 5990 Comm: sock_filter-v Not tainted 5.11.0-3-gbfa5a4929c90 #57
> [ 818.095483] Hardware name: hp server rx3600 , BIOS 04.03 04/08/2008
> [ 818.095483]
> [ 818.095483] Call Trace:
> [ 818.095483] [] show_stack+0x90/0xc0
> [ 818.095483] sp=e00118707bb0 bsp=e001187013c0
> [ 818.095483] [] dump_stack+0x120/0x160
> [ 818.095483] sp=e00118707d80 bsp=e00118701348
> [ 818.095483] [] print_bad_pte+0x300/0x3a0
> [ 818.095483] sp=e00118707d80 bsp=e001187012e0
> [ 818.099483] [] unmap_page_range+0xa90/0x11a0
> [ 818.099483] sp=e00118707d80 bsp=e00118701140
> [ 818.099483] [] unmap_vmas+0xc0/0x100
> [ 818.099483] sp=e00118707da0 bsp=e00118701108
> [ 818.099483] [] exit_mmap+0x150/0x320
> [ 818.099483] sp=e00118707da0 bsp=e001187010d8
> [ 818.099483] [] mmput+0x60/0x200
> [ 818.099483] sp=e00118707e20 bsp=e001187010b0
> [ 818.103482] [] do_exit+0x6f0/0x18a0
> [ 818.103482] sp=e00118707e20 bsp=e00118701038
> [ 818.103482] [] do_group_exit+0x90/0x2a0
> [ 818.103482] sp=e00118707e30 bsp=e00118700ff0
> [ 818.103482] [] sys_exit_group+0x20/0x40
> [ 818.103482] sp=e00118707e30 bsp=e00118700f98
> [ 818.107482] [] ia64_trace_syscall+0xf0/0x130
> [ 818.107482] sp=e00118707e30 bsp=e00118700f98
> [ 818.107482] [] ia64_ivt+0x00040720/0x400
> [ 818.107482] sp=e00118708000 bsp=e00118700f98
> [ 818.115482] Disabling lock debugging due to kernel taint
> [ 818.115482] BUG: Bad rss-counter state mm:2eec6412 type:MM_FILEPAGES val:-1
> [ 818.132256] Unable to handle kernel NULL pointer dereference (address 0068)
> [ 818.133904] sock_filter-v-X[5999]: Oops 11012296146944 [1]
> [ 818.133904] Modules linked in: acpi_ipmi ipmi_si usb_storage e1000 ipmi_devintf ipmi_msghandler rtc_efi
> [ 818.133904]
> [ 818.133904] CPU: 0 PID: 5999 Comm: sock_filter-v-X Tainted: GB 5.11.0-3-gbfa5a4929c90 #57
> [ 818.133904] Hardware name: hp server rx3600 , BIOS 04.03 04/08/2008
> [ 818.133904] psr : 121008026010 ifs : 8288 ip : []Tainted: GB (5.11.0-3-gbfa5a4929c90)
> [ 818.133904] ip is at bpf_prog_free+0x21/0xe0
> [ 818.133904] unat: pfs : 0307 rsc : 0003
> [ 818.133904] rnat: bsps: pr : 00106a5a51665965
> [ 818.133904] ldrs: ccv : 12088904 fpsr: 0009804c8a70033f
> [ 818.133904] csd : ssd :
> [ 818.133904] b0 : a00100d54080 b6 : a00100d53fe0 b7 : a001cef0
> [ 818.133904] f6 : 0ffefb0c50daa1b67f89a f7 : 0ffed8b3e4fdb0800
> [ 818.133904] f8 : 10017fbd1bc00 f9 : 1000eb95f
> [ 818.133904] f10 : 10008ade20716a6c83cc1 f11 : 1003e02b7
> [ 818.133904] r1 : a0010176b300 r2 : a0028004 r3 :
> [ 818.133904] r8 : 0008 r9 : e0011873f800 r10 : e00102c18600
> [ 818.133904] r11 : e00102c19600 r12 : e0011873f7f0 r13 : e00118738000
> [ 818.133904] r14 : 0068 r15 : a0028028 r16 :
[PATCH v2] mm: page_poison: print page info when corruption is caught
When page_poison detects page corruption, it's useful to see who freed the
page recently to guess where a write-after-free corruption happens.

After this change, the corruption report has extra page data. Example
report from real corruption (includes only the page_owner part):

    pagealloc: memory corruption
    e0014cd61d10: 11 00 00 00 00 00 00 00 30 1d d2 ff ff 0f 00 60  0..`
    e0014cd61d20: b0 1d d2 ff ff 0f 00 60 90 fe 1c 00 08 00 00 20  ...`...
    ...
    CPU: 1 PID: 220402 Comm: cc1plus Not tainted 5.12.0-rc5-00107-g9720c6f59ecf #245
    Hardware name: hp server rx3600, BIOS 04.03 04/08/2008
    ...
    Call Trace:
     [] show_stack+0x90/0xc0
     [] dump_stack+0x150/0x1c0
     [] __kernel_unpoison_pages+0x410/0x440
     [] get_page_from_freelist+0x1460/0x2ca0
     [] __alloc_pages_nodemask+0x3c0/0x660
     [] alloc_pages_vma+0xb0/0x500
     [] __handle_mm_fault+0x1230/0x1fe0
     [] handle_mm_fault+0x310/0x4e0
     [] ia64_do_page_fault+0x1f0/0xb80
     [] ia64_leave_kernel+0x0/0x270
    page_owner tracks the page as freed
    page allocated via order 0, migratetype Movable, gfp_mask 0x100dca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), pid 37, ts 8173444098740
     __reset_page_owner+0x40/0x200
     free_pcp_prepare+0x4d0/0x600
     free_unref_page+0x20/0x1c0
     __put_page+0x110/0x1a0
     migrate_pages+0x16d0/0x1dc0
     compact_zone+0xfc0/0x1aa0
     proactive_compact_node+0xd0/0x1e0
     kcompactd+0x550/0x600
     kthread+0x2c0/0x2e0
     call_payload+0x50/0x80

Here we can see that the page was freed by page migration, but something
managed to write to it afterwards.
CC: Vlastimil Babka
CC: Andrew Morton
CC: linux...@kvack.org
Signed-off-by: Sergei Trofimovich
---
Changes since v1: use the more generic dump_page(), as suggested by Vlastimil.
This should supersede the existing
mm-page_poison-print-page-owner-info-when-corruption-is-caught.patch

 mm/page_poison.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/mm/page_poison.c b/mm/page_poison.c
index 65cdf844c8ad..df03126f3b2b 100644
--- a/mm/page_poison.c
+++ b/mm/page_poison.c
@@ -2,6 +2,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -45,7 +46,7 @@ static bool single_bit_flip(unsigned char a, unsigned char b)
 	return error && !(error & (error - 1));
 }
 
-static void check_poison_mem(unsigned char *mem, size_t bytes)
+static void check_poison_mem(struct page *page, unsigned char *mem, size_t bytes)
 {
 	static DEFINE_RATELIMIT_STATE(ratelimit, 5 * HZ, 10);
 	unsigned char *start;
@@ -70,6 +71,7 @@ static void check_poison_mem(unsigned char *mem, size_t bytes)
 	print_hex_dump(KERN_ERR, "", DUMP_PREFIX_ADDRESS, 16, 1, start,
 			end - start + 1, 1);
 	dump_stack();
+	dump_page(page, "pagealloc: corrupted page details");
 }
 
 static void unpoison_page(struct page *page)
@@ -82,7 +84,7 @@ static void unpoison_page(struct page *page)
 	 * that is freed to buddy. Thus no extra check is done to
 	 * see if a page was poisoned.
 	 */
-	check_poison_mem(addr, PAGE_SIZE);
+	check_poison_mem(page, addr, PAGE_SIZE);
 	kunmap_atomic(addr);
 }
 
-- 
2.31.1
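The detection step that this report extends can be sketched in userspace C. This is a hypothetical analogue, not the kernel code: the sketch_* names are made up, rate limiting and printing are left out, and the poison byte 0xaa is assumed to match the kernel's PAGE_POISON. It finds the first and last non-poison bytes (the range that gets hex-dumped) and, roughly as check_poison_mem() appears to do, reports a lone byte differing in exactly one bit as a "single bit error" rather than "memory corruption":

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed poison byte, matching the kernel's PAGE_POISON value of 0xaa. */
#define SKETCH_POISON 0xaa

/* Same bit trick as single_bit_flip() in mm/page_poison.c. */
static bool sketch_single_bit_flip(unsigned char a, unsigned char b)
{
	unsigned char error = a ^ b;

	/* Non-zero with exactly one bit set: clearing the lowest set bit leaves 0. */
	return error && !(error & (error - 1));
}

/*
 * Find the first and last bytes that no longer hold the poison value.
 * Returns the length of the corrupted range, or 0 if the buffer is intact.
 */
static size_t sketch_corrupted_range(const unsigned char *mem, size_t bytes,
				     size_t *first, size_t *last)
{
	bool found = false;

	for (size_t i = 0; i < bytes; i++) {
		if (mem[i] == SKETCH_POISON)
			continue;
		if (!found)
			*first = i;
		*last = i;
		found = true;
	}
	return found ? *last - *first + 1 : 0;
}

/* Pick the report wording: nothing, one flipped bit, or general corruption. */
static const char *sketch_classify(const unsigned char *mem, size_t bytes)
{
	size_t first = 0, last = 0;

	if (sketch_corrupted_range(mem, bytes, &first, &last) == 0)
		return "clean";
	if (first == last && sketch_single_bit_flip(mem[first], SKETCH_POISON))
		return "single bit error";
	return "memory corruption";
}
```

For example, poisoning a buffer with memset() and then setting one byte to 0xab (0xaa ^ 0xab == 0x01) classifies as "single bit error", while 0x11 classifies as "memory corruption". The new dump_page() call only adds context to that report; it does not change detection.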
Re: [PATCH] mm: page_poison: print page owner info when corruption is caught
On Wed, Apr 07, 2021 at 02:15:50PM +0200, Vlastimil Babka wrote:
> On 4/4/21 4:17 PM, Sergei Trofimovich wrote:
> > When page_poison detects page corruption it's useful to see who
> > freed a page recently to have a guess where write-after-free
> > corruption happens.
> >
> > After this change corruption report has extra page_owner data.
> > Example report from real corruption:
> >
> >     pagealloc: memory corruption
> >     e0014cd61d10: 11 00 00 00 00 00 00 00 30 1d d2 ff ff 0f 00 60
> >     e0014cd61d20: b0 1d d2 ff ff 0f 00 60 90 fe 1c 00 08 00 00 20
> >     ...
> >     CPU: 1 PID: 220402 Comm: cc1plus Not tainted 5.12.0-rc5-00107-g9720c6f59ecf #245
> >     Hardware name: hp server rx3600, BIOS 04.03 04/08/2008
> >     ...
> >     Call Trace:
> >      [] show_stack+0x90/0xc0
> >      [] dump_stack+0x150/0x1c0
> >      [] __kernel_unpoison_pages+0x410/0x440
> >      [] get_page_from_freelist+0x1460/0x2ca0
> >      [] __alloc_pages_nodemask+0x3c0/0x660
> >      [] alloc_pages_vma+0xb0/0x500
> >      [] __handle_mm_fault+0x1230/0x1fe0
> >      [] handle_mm_fault+0x310/0x4e0
> >      [] ia64_do_page_fault+0x1f0/0xb80
> >      [] ia64_leave_kernel+0x0/0x270
> >     page_owner tracks the page as freed
> >     page allocated via order 0, migratetype Movable, gfp_mask 0x100dca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), pid 37, ts 8173444098740
> >      __reset_page_owner+0x40/0x200
> >      free_pcp_prepare+0x4d0/0x600
> >      free_unref_page+0x20/0x1c0
> >      __put_page+0x110/0x1a0
> >      migrate_pages+0x16d0/0x1dc0
> >      compact_zone+0xfc0/0x1aa0
> >      proactive_compact_node+0xd0/0x1e0
> >      kcompactd+0x550/0x600
> >      kthread+0x2c0/0x2e0
> >      call_payload+0x50/0x80
> >
> > Here we can see that page was freed by page migration but something
> > managed to write to it afterwards.
> >
> > CC: Andrew Morton
> > CC: linux...@kvack.org
> > Signed-off-by: Sergei Trofimovich
> > ---
> >  mm/page_poison.c | 6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/page_poison.c b/mm/page_poison.c
> > index 65cdf844c8ad..ef2a1eab13d7 100644
> > --- a/mm/page_poison.c
> > +++ b/mm/page_poison.c
> > @@ -4,6 +4,7 @@
> >  #include
> >  #include
> >  #include
> > +#include
> >  #include
> >  #include
> >  #include
> > @@ -45,7 +46,7 @@ static bool single_bit_flip(unsigned char a, unsigned char b)
> >  	return error && !(error & (error - 1));
> >  }
> >
> > -static void check_poison_mem(unsigned char *mem, size_t bytes)
> > +static void check_poison_mem(struct page *page, unsigned char *mem, size_t bytes)
> >  {
> >  	static DEFINE_RATELIMIT_STATE(ratelimit, 5 * HZ, 10);
> >  	unsigned char *start;
> > @@ -70,6 +71,7 @@ static void check_poison_mem(unsigned char *mem, size_t bytes)
> >  	print_hex_dump(KERN_ERR, "", DUMP_PREFIX_ADDRESS, 16, 1, start,
> >  			end - start + 1, 1);
> >  	dump_stack();
> > +	dump_page_owner(page);
>
> OK but why not a full dump_page()?

Oh, I did not know it existed! Looks even better. Will send a v2 with dump_page().

> >  }
> >
> >  static void unpoison_page(struct page *page)
> > @@ -82,7 +84,7 @@ static void unpoison_page(struct page *page)
> >  	 * that is freed to buddy. Thus no extra check is done to
> >  	 * see if a page was poisoned.
> >  	 */
> > -	check_poison_mem(addr, PAGE_SIZE);
> > +	check_poison_mem(page, addr, PAGE_SIZE);
> >  	kunmap_atomic(addr);
> >  }
> >
> >
>

-- 
Sergei
Re: [PATCH] mm: page_owner: fetch backtrace only for tracked pages
On Wed, Apr 07, 2021 at 05:49:14PM +0200, Vlastimil Babka wrote:
> On 4/1/21 11:24 PM, Sergei Trofimovich wrote:
> > Very minor optimization.
>
> I'm not entirely sure about accuracy of "only for tracked pages". Missing
> page_ext is something I'm not even sure how possible it is in practice, probably
> just an error condition (failed to be allocated?). Or did you observe this in
> practice? But anyway, the change is not wrong.

I never saw a missing 'page_ext' in practice (I also did not check for it
explicitly). I agree "optimization" is misleading; "cleanup" might be
better wording.

> > CC: Andrew Morton
> > CC: linux...@kvack.org
> > Signed-off-by: Sergei Trofimovich
>
> Acked-by: Vlastimil Babka
>
> > ---
> >  mm/page_owner.c | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/mm/page_owner.c b/mm/page_owner.c
> > index 63e4ecaba97b..7147fd34a948 100644
> > --- a/mm/page_owner.c
> > +++ b/mm/page_owner.c
> > @@ -140,14 +140,14 @@ void __reset_page_owner(struct page *page, unsigned int order)
> >  {
> >  	int i;
> >  	struct page_ext *page_ext;
> > -	depot_stack_handle_t handle = 0;
> > +	depot_stack_handle_t handle;
> >  	struct page_owner *page_owner;
> >
> > -	handle = save_stack(GFP_NOWAIT | __GFP_NOWARN);
> > -
> >  	page_ext = lookup_page_ext(page);
> >  	if (unlikely(!page_ext))
> >  		return;
> > +
> > +	handle = save_stack(GFP_NOWAIT | __GFP_NOWARN);
> >  	for (i = 0; i < (1 << order); i++) {
> >  		__clear_bit(PAGE_EXT_OWNER_ALLOCATED, &page_ext->flags);
> >  		page_owner = get_page_owner(page_ext);
> >
>

-- 
Sergei
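The reordering in this patch is the usual "cheap check before expensive work" cleanup. A minimal sketch of the same shape, assuming hypothetical stand-ins (sketch_page_ext for struct page_ext, sketch_save_stack() for save_stack()) and a call counter in place of the real unwinding and stack depot cost:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-page metadata, standing in for struct page_ext. */
struct sketch_page_ext {
	unsigned int handle;
};

/* Counts how often the costly step runs (stands in for save_stack()). */
static int sketch_save_stack_calls;

static unsigned int sketch_save_stack(void)
{
	sketch_save_stack_calls++;	/* unwinding + stack depot insert in the real code */
	return 1;			/* arbitrary non-zero handle */
}

/*
 * Patched ordering from __reset_page_owner(): do the cheap lookup and bail
 * out first, so the stack is only captured when it will actually be stored.
 */
static void sketch_reset_owner(struct sketch_page_ext *page_ext)
{
	unsigned int handle;

	if (!page_ext)		/* like lookup_page_ext() returning NULL */
		return;

	handle = sketch_save_stack();
	page_ext->handle = handle;
}
```

Because every path that reads `handle` now assigns it first, the `= 0` initializer can go, which is what the hunk does.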
Re: [PATCH 11/20] kbuild: ia64: use common install script
On Wed, 7 Apr 2021 07:34:10 +0200 Greg Kroah-Hartman wrote:
> The common scripts/install.sh script will now work for ia64, all that
> is needed is to add the compressed image type to it. So add that file
> type check and the ability to call /usr/sbin/elilo after copying the
> kernel. With that we can remove the ia64-only version of the file.
>
> Cc: linux-i...@vger.kernel.org
> Signed-off-by: Greg Kroah-Hartman

Reviewed-by: Sergei Trofimovich

> ---
>  arch/ia64/Makefile   |  2 +-
>  arch/ia64/install.sh | 40
>  scripts/install.sh   |  8 +++-
>  3 files changed, 8 insertions(+), 42 deletions(-)
>  delete mode 100644 arch/ia64/install.sh
>
> diff --git a/arch/ia64/Makefile b/arch/ia64/Makefile
> index 467b7e7f967c..19e20e99f487 100644
> --- a/arch/ia64/Makefile
> +++ b/arch/ia64/Makefile
> @@ -77,7 +77,7 @@ archheaders:
>  CLEAN_FILES += vmlinux.gz
>
>  install: vmlinux.gz
> -	sh $(srctree)/arch/ia64/install.sh $(KERNELRELEASE) $< System.map "$(INSTALL_PATH)"
> +	sh $(srctree)/scripts/install.sh $(KERNELRELEASE) $< System.map "$(INSTALL_PATH)"
>
>  define archhelp
>    echo '* compressed - Build compressed kernel image'
> diff --git a/arch/ia64/install.sh b/arch/ia64/install.sh
> deleted file mode 100644
> index 0e932f5dcd1a..
> --- a/arch/ia64/install.sh
> +++ /dev/null
> @@ -1,40 +0,0 @@
> -#!/bin/sh
> -#
> -# arch/ia64/install.sh
> -#
> -# This file is subject to the terms and conditions of the GNU General Public
> -# License. See the file "COPYING" in the main directory of this archive
> -# for more details.
> -#
> -# Copyright (C) 1995 by Linus Torvalds
> -#
> -# Adapted from code in arch/i386/boot/Makefile by H. Peter Anvin
> -#
> -# "make install" script for ia64 architecture
> -#
> -# Arguments:
> -#   $1 - kernel version
> -#   $2 - kernel image file
> -#   $3 - kernel map file
> -#   $4 - default install path (blank if root directory)
> -#
> -
> -# User may have a custom install script
> -
> -if [ -x ~/bin/${INSTALLKERNEL} ]; then exec ~/bin/${INSTALLKERNEL} "$@"; fi
> -if [ -x /sbin/${INSTALLKERNEL} ]; then exec /sbin/${INSTALLKERNEL} "$@"; fi
> -
> -# Default install - same as make zlilo
> -
> -if [ -f $4/vmlinuz ]; then
> -	mv $4/vmlinuz $4/vmlinuz.old
> -fi
> -
> -if [ -f $4/System.map ]; then
> -	mv $4/System.map $4/System.old
> -fi
> -
> -cat $2 > $4/vmlinuz
> -cp $3 $4/System.map
> -
> -test -x /usr/sbin/elilo && /usr/sbin/elilo
> diff --git a/scripts/install.sh b/scripts/install.sh
> index 73067b535ea0..b6ca2a0f0983 100644
> --- a/scripts/install.sh
> +++ b/scripts/install.sh
> @@ -52,6 +52,7 @@ if [ -x /sbin/"${INSTALLKERNEL}" ]; then exec /sbin/"${INSTALLKERNEL}" "$@"; fi
>  base=$(basename "$2")
>  if [ "$base" = "bzImage" ] ||
>     [ "$base" = "Image.gz" ] ||
> +   [ "$base" = "vmlinux.gz" ] ||
>     [ "$base" = "zImage" ] ; then
>  	# Compressed install
>  	echo "Installing compressed kernel"
> @@ -65,7 +66,7 @@ fi
>  # Some architectures name their files based on version number, and
>  # others do not. Call out the ones that do not to make it obvious.
>  case "${ARCH}" in
> -	x86)
> +	ia64 | x86)
>  		version=""
>  		;;
>  	*)
> @@ -86,6 +87,11 @@ case "${ARCH}" in
>  			echo "You have to install it yourself"
>  		fi
>  		;;
> +	ia64)
> +		if [ -x /usr/sbin/elilo ]; then
> +			/usr/sbin/elilo
> +		fi
> +		;;
>  	x86)
>  		if [ -x /sbin/lilo ]; then
>  			/sbin/lilo
> --
> 2.31.1
>

-- 
Sergei
[PATCH] ia64: drop marked broken DISCONTIGMEM and VIRTUAL_MEM_MAP
DISCONTIGMEM was marked BROKEN in 5.11. Let's remove it.
Booted SPARSEMEM successfully on rx3600.

CC: Andrew Morton
CC: linux-i...@vger.kernel.org
Signed-off-by: Sergei Trofimovich
---
 arch/ia64/Kconfig                  |  23
 arch/ia64/configs/bigsur_defconfig |   1 -
 arch/ia64/include/asm/meminit.h    |  11 --
 arch/ia64/include/asm/page.h       |  25 +---
 arch/ia64/include/asm/pgtable.h    |   5 -
 arch/ia64/kernel/Makefile          |   2 +-
 arch/ia64/kernel/ia64_ksyms.c      |  12 --
 arch/ia64/kernel/machine_kexec.c   |   2 +-
 arch/ia64/mm/Makefile              |   1 -
 arch/ia64/mm/contig.c              |   4 -
 arch/ia64/mm/discontig.c           |  21 ---
 arch/ia64/mm/fault.c               |  15 --
 arch/ia64/mm/init.c                | 213 -
 13 files changed, 4 insertions(+), 331 deletions(-)
 delete mode 100644 arch/ia64/kernel/ia64_ksyms.c

diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index 2ad7a8d29fcc..81e2b893b1e7 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -286,15 +286,6 @@ config FORCE_CPEI_RETARGET
 config ARCH_SELECT_MEMORY_MODEL
 	def_bool y
 
-config ARCH_DISCONTIGMEM_ENABLE
-	def_bool y
-	depends on BROKEN
-	help
-	  Say Y to support efficient handling of discontiguous physical memory,
-	  for architectures which are either NUMA (Non-Uniform Memory Access)
-	  or have huge holes in the physical address space for other reasons.
-	  See for more.
-
 config ARCH_FLATMEM_ENABLE
 	def_bool y
 
@@ -325,22 +316,8 @@ config NODES_SHIFT
 	  MAX_NUMNODES will be 2^(This value).
 	  If in doubt, use the default.
 
-# VIRTUAL_MEM_MAP and FLAT_NODE_MEM_MAP are functionally equivalent.
-# VIRTUAL_MEM_MAP has been retained for historical reasons.
-config VIRTUAL_MEM_MAP
-	bool "Virtual mem map"
-	depends on !SPARSEMEM && !FLATMEM
-	default y
-	help
-	  Say Y to compile the kernel with support for a virtual mem map.
-	  This code also only takes effect if a memory hole of greater than
-	  1 Gb is found during boot. You must turn this option on if you
-	  require the DISCONTIGMEM option for your machine. If you are
-	  unsure, say Y.
-
 config HOLES_IN_ZONE
 	bool
-	default y if VIRTUAL_MEM_MAP
 
 config HAVE_ARCH_NODEDATA_EXTENSION
 	def_bool y
diff --git a/arch/ia64/configs/bigsur_defconfig b/arch/ia64/configs/bigsur_defconfig
index c409756b5396..0341a67cc1bf 100644
--- a/arch/ia64/configs/bigsur_defconfig
+++ b/arch/ia64/configs/bigsur_defconfig
@@ -9,7 +9,6 @@ CONFIG_SGI_PARTITION=y
 CONFIG_SMP=y
 CONFIG_NR_CPUS=2
 CONFIG_PREEMPT=y
-# CONFIG_VIRTUAL_MEM_MAP is not set
 CONFIG_IA64_PALINFO=y
 CONFIG_EFI_VARS=y
 CONFIG_BINFMT_MISC=m
diff --git a/arch/ia64/include/asm/meminit.h b/arch/ia64/include/asm/meminit.h
index e789c0818edb..6c47a239fc26 100644
--- a/arch/ia64/include/asm/meminit.h
+++ b/arch/ia64/include/asm/meminit.h
@@ -58,15 +58,4 @@ extern int reserve_elfcorehdr(u64 *start, u64 *end);
 
 extern int register_active_ranges(u64 start, u64 len, int nid);
 
-#ifdef CONFIG_VIRTUAL_MEM_MAP
-  extern unsigned long VMALLOC_END;
-  extern struct page *vmem_map;
-  extern int create_mem_map_page_table(u64 start, u64 end, void *arg);
-  extern int vmemmap_find_next_valid_pfn(int, int);
-#else
-static inline int vmemmap_find_next_valid_pfn(int node, int i)
-{
-	return i + 1;
-}
-#endif
 #endif /* meminit_h */
diff --git a/arch/ia64/include/asm/page.h b/arch/ia64/include/asm/page.h
index b69a5499d75b..f4dc81fa7146 100644
--- a/arch/ia64/include/asm/page.h
+++ b/arch/ia64/include/asm/page.h
@@ -95,31 +95,10 @@ do {\
 
 #define virt_addr_valid(kaddr)	pfn_valid(__pa(kaddr) >> PAGE_SHIFT)
 
-#ifdef CONFIG_VIRTUAL_MEM_MAP
-extern int ia64_pfn_valid (unsigned long pfn);
-#else
-# define ia64_pfn_valid(pfn) 1
-#endif
-
-#ifdef CONFIG_VIRTUAL_MEM_MAP
-extern struct page *vmem_map;
-#ifdef CONFIG_DISCONTIGMEM
-# define page_to_pfn(page)	((unsigned long) (page - vmem_map))
-# define pfn_to_page(pfn)	(vmem_map + (pfn))
-# define __pfn_to_phys(pfn)	PFN_PHYS(pfn)
-#else
-# include
-#endif
-#else
-# include
-#endif
+#include
 
 #ifdef CONFIG_FLATMEM
-# define pfn_valid(pfn)		(((pfn) < max_mapnr) && ia64_pfn_valid(pfn))
-#elif defined(CONFIG_DISCONTIGMEM)
-extern unsigned long min_low_pfn;
-extern unsigned long max_low_pfn;
-# define pfn_valid(pfn)		(((pfn) >= min_low_pfn) && ((pfn) < max_low_pfn) && ia64_pfn_valid(pfn))
+# define pfn_valid(pfn)		((pfn) < max_mapnr)
 #endif
 
 #define page_to_phys(page)	(page_to_pfn(page) << PAGE_SHIFT)
diff --git a/arch/ia64/include/asm/pgtable.h b/arch/ia64/include/asm/pgtable.h
index 9b4efe89e62d..8994514ebe91 100644
--- a/arch/ia64/include/asm/pgtable.h
+++ b/arch/ia64/include/asm/pgtable.h
@@ -223,1
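With DISCONTIGMEM and VIRTUAL_MEM_MAP gone, the FLATMEM branch of pfn_valid() in page.h reduces to a bound check against max_mapnr, since the ia64_pfn_valid() hook only did real work under VIRTUAL_MEM_MAP. A minimal userspace sketch of that flat model, assuming PFNs start at 0 (the generic FLATMEM helpers also subtract ARCH_PFN_OFFSET, omitted here); all sketch_* names and the 16-page size are made up for illustration:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flat layout: one descriptor per page frame, PFNs starting at 0. */
struct sketch_page {
	unsigned long flags;
};

#define SKETCH_NR_PAGES 16UL

static struct sketch_page sketch_mem_map[SKETCH_NR_PAGES];
static const unsigned long sketch_max_mapnr = SKETCH_NR_PAGES;

/* pfn -> descriptor is plain array indexing into the flat mem_map. */
static struct sketch_page *sketch_pfn_to_page(unsigned long pfn)
{
	return sketch_mem_map + pfn;
}

/* descriptor -> pfn is the inverse pointer subtraction. */
static unsigned long sketch_page_to_pfn(const struct sketch_page *page)
{
	return (unsigned long)(page - sketch_mem_map);
}

/* The post-patch FLATMEM check: (pfn) < max_mapnr, with no ia64_pfn_valid() hook. */
static bool sketch_pfn_valid(unsigned long pfn)
{
	return pfn < sketch_max_mapnr;
}
```

Holes in physical memory are what VIRTUAL_MEM_MAP used to paper over; in a flat model like this they simply cost descriptor space, which is one reason SPARSEMEM is the configuration that was boot-tested.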
[PATCH] mm: page_poison: print page owner info when corruption is caught
When page_poison detects page corruption, it's useful to see who freed the
page recently to guess where a write-after-free corruption happens.

After this change, the corruption report has extra page_owner data.
Example report from real corruption:

    pagealloc: memory corruption
    e0014cd61d10: 11 00 00 00 00 00 00 00 30 1d d2 ff ff 0f 00 60
    e0014cd61d20: b0 1d d2 ff ff 0f 00 60 90 fe 1c 00 08 00 00 20
    ...
    CPU: 1 PID: 220402 Comm: cc1plus Not tainted 5.12.0-rc5-00107-g9720c6f59ecf #245
    Hardware name: hp server rx3600, BIOS 04.03 04/08/2008
    ...
    Call Trace:
     [] show_stack+0x90/0xc0
     [] dump_stack+0x150/0x1c0
     [] __kernel_unpoison_pages+0x410/0x440
     [] get_page_from_freelist+0x1460/0x2ca0
     [] __alloc_pages_nodemask+0x3c0/0x660
     [] alloc_pages_vma+0xb0/0x500
     [] __handle_mm_fault+0x1230/0x1fe0
     [] handle_mm_fault+0x310/0x4e0
     [] ia64_do_page_fault+0x1f0/0xb80
     [] ia64_leave_kernel+0x0/0x270
    page_owner tracks the page as freed
    page allocated via order 0, migratetype Movable, gfp_mask 0x100dca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), pid 37, ts 8173444098740
     __reset_page_owner+0x40/0x200
     free_pcp_prepare+0x4d0/0x600
     free_unref_page+0x20/0x1c0
     __put_page+0x110/0x1a0
     migrate_pages+0x16d0/0x1dc0
     compact_zone+0xfc0/0x1aa0
     proactive_compact_node+0xd0/0x1e0
     kcompactd+0x550/0x600
     kthread+0x2c0/0x2e0
     call_payload+0x50/0x80

Here we can see that the page was freed by page migration, but something
managed to write to it afterwards.
CC: Andrew Morton
CC: linux...@kvack.org
Signed-off-by: Sergei Trofimovich
---
 mm/page_poison.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/mm/page_poison.c b/mm/page_poison.c
index 65cdf844c8ad..ef2a1eab13d7 100644
--- a/mm/page_poison.c
+++ b/mm/page_poison.c
@@ -4,6 +4,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -45,7 +46,7 @@ static bool single_bit_flip(unsigned char a, unsigned char b)
 	return error && !(error & (error - 1));
 }
 
-static void check_poison_mem(unsigned char *mem, size_t bytes)
+static void check_poison_mem(struct page *page, unsigned char *mem, size_t bytes)
 {
 	static DEFINE_RATELIMIT_STATE(ratelimit, 5 * HZ, 10);
 	unsigned char *start;
@@ -70,6 +71,7 @@ static void check_poison_mem(unsigned char *mem, size_t bytes)
 	print_hex_dump(KERN_ERR, "", DUMP_PREFIX_ADDRESS, 16, 1, start,
 			end - start + 1, 1);
 	dump_stack();
+	dump_page_owner(page);
 }
 
 static void unpoison_page(struct page *page)
@@ -82,7 +84,7 @@ static void unpoison_page(struct page *page)
 	 * that is freed to buddy. Thus no extra check is done to
 	 * see if a page was poisoned.
 	 */
-	check_poison_mem(addr, PAGE_SIZE);
+	check_poison_mem(page, addr, PAGE_SIZE);
 	kunmap_atomic(addr);
 }
 
-- 
2.31.1
Re: [PATCH v2 3/3] hpsa: add an assert to prevent from __packed reintroduction
On Fri, 2 Apr 2021 14:40:39 + "Elliott, Robert (Servers)" wrote:
> It looks like ia64 implements atomic_t as a 64-bit value and expects atomic_t
> to be 64-bit aligned, but does nothing to ensure that.
>
> For x86, atomic_t is a 32-bit value and atomic64_t is a 64-bit value, and
> the definition of atomic64_t is overridden in a way that ensures
> 64-bit (8 byte) alignment:
>
> Generic definitions are in include/linux/types.h:
>     typedef struct {
>             int counter;
>     } atomic_t;
>
>     #define ATOMIC_INIT(i) { (i) }
>
>     #ifdef CONFIG_64BIT
>     typedef struct {
>             s64 counter;
>     } atomic64_t;
>     #endif
>
> Override in arch/x86/include/asm/atomic64_32.h:
>     typedef struct {
>             s64 __aligned(8) counter;
>     } atomic64_t;
>
> Perhaps ia64 needs to take over the definition of both atomic_t and atomic64_t
> and do the same?

I don't think it's needed. ia64 is a 64-bit arch with the expected natural
alignment for s64: alignof(s64)=8.

Also, if my understanding is correct, adding __aligned(8) would not fix the
use case of embedding locks into packed structs, even on x86_64 (or i386):

    $ cat a.c
    #include <stdio.h>
    #include <stddef.h>

    typedef struct {
        unsigned long long __attribute__((aligned(8))) l;
    } lock_t;

    struct s {
        char c;
        lock_t lock;
    } __attribute__((packed));

    int main() {
        printf ("offsetof(struct s, lock) = %zu\nsizeof(struct s) = %zu\n",
                offsetof(struct s, lock), sizeof(struct s));
    }

    $ x86_64-pc-linux-gnu-gcc a.c -o a && ./a
    offsetof(struct s, lock) = 1
    sizeof(struct s) = 9
    $ x86_64-pc-linux-gnu-gcc a.c -o a -m32 && ./a
    offsetof(struct s, lock) = 1
    sizeof(struct s) = 9

Note how the alignment of 'lock' stays at 1 byte in both cases.

The 8-byte alignment added for i386 in
    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bbf2a330d92c5afccfd17592ba9ccd50f41cf748
is only a performance optimization (not a correctness fix). Larger alignment
is preferred on i386 because alignof(s64)=4 on that target, which might make
an atomic op span cache lines and degrade performance.

-- 
Sergei
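Both halves of this argument can be checked in one place by laying out the same lock_t with and without packing. A minimal sketch, assuming GCC or Clang (for `__attribute__`) and C11 (`_Static_assert`, `_Alignof`); the struct names s_natural and s_packed are made up. The packed offsets match the program output above, while the unpacked struct shows that the 8-byte alignment request does take effect when nothing overrides it:

```c
#include <assert.h>
#include <stddef.h>

/* The lock type from the test program above: an 8-byte counter asking for 8-byte alignment. */
typedef struct {
	unsigned long long __attribute__((aligned(8))) l;
} lock_t;

/* Regular layout: the compiler pads after 'c' so 'lock' starts on an 8-byte boundary. */
struct s_natural {
	char c;
	lock_t lock;
};

/* Packed layout: packing drops member alignment to 1, overriding lock_t's request. */
struct s_packed {
	char c;
	lock_t lock;
} __attribute__((packed));

/* Compile-time checks; these hold on both x86_64 and i386 (-m32). */
_Static_assert(_Alignof(lock_t) == 8, "lock_t asks for 8-byte alignment");
_Static_assert(offsetof(struct s_natural, lock) == 8, "natural layout pads to 8");
_Static_assert(offsetof(struct s_packed, lock) == 1, "packed layout ignores the request");
```

GCC may emit a -Wpacked-not-aligned warning for s_packed, which is exactly the situation under discussion: the annotation cannot win against __packed, so the only real fix is to not embed atomics in packed structs.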
[PATCH] ia64: module: fix symbolizer crash on fdescr
I noticed the failure as a crash on ia64 when trying to symbolize all
backtraces collected by page_owner=on:

    $ cat /sys/kernel/debug/page_owner
    CPU: 1 PID: 2074 Comm: cat Not tainted 5.12.0-rc4 #226
    Hardware name: hp server rx3600, BIOS 04.03 04/08/2008
    ip is at dereference_module_function_descriptor+0x41/0x100

The crash happens in dereference_module_function_descriptor() due to a
use-after-free when dereferencing the ".opd" section header: all section
headers are already freed after the module is loaded successfully.

To keep the symbolizer working, the change stores the ".opd" address and
size after the module is relocated to its new place and before the section
headers are discarded.

To make similar errors less obscure, module_finalize() now zeroes out all
variables that are relevant only to module loading.

CC: Andrew Morton
CC: linux-i...@vger.kernel.org
Signed-off-by: Sergei Trofimovich
---
 arch/ia64/include/asm/module.h |  6 +++++-
 arch/ia64/kernel/module.c      | 29 +++++++++++++++++++++++++----
 2 files changed, 30 insertions(+), 5 deletions(-)

diff --git a/arch/ia64/include/asm/module.h b/arch/ia64/include/asm/module.h
index 5a29652e6def..7271b9c5fc76 100644
--- a/arch/ia64/include/asm/module.h
+++ b/arch/ia64/include/asm/module.h
@@ -14,16 +14,20 @@ struct elf64_shdr;			/* forward declration */
 
 struct mod_arch_specific {
+	/* Used only at module load time. */
 	struct elf64_shdr *core_plt;	/* core PLT section */
 	struct elf64_shdr *init_plt;	/* init PLT section */
 	struct elf64_shdr *got;		/* global offset table */
 	struct elf64_shdr *opd;		/* official procedure descriptors */
 	struct elf64_shdr *unwind;	/* unwind-table section */
 	unsigned long gp;		/* global-pointer for module */
+	unsigned int next_got_entry;	/* index of next available got entry */
 
+	/* Used at module run and cleanup time. */
 	void *core_unw_table;		/* core unwind-table cookie returned by unwinder */
 	void *init_unw_table;		/* init unwind-table cookie returned by unwinder */
-	unsigned int next_got_entry;	/* index of next available got entry */
+	void *opd_addr;			/* symbolize uses .opd to get to actual function */
+	unsigned long opd_size;
 };
 
 #define ARCH_SHF_SMALL SHF_IA_64_SHORT
diff --git a/arch/ia64/kernel/module.c b/arch/ia64/kernel/module.c
index 00a496cb346f..f3385fe6e37e 100644
--- a/arch/ia64/kernel/module.c
+++ b/arch/ia64/kernel/module.c
@@ -905,9 +905,31 @@ register_unwind_table (struct module *mod)
 int
 module_finalize (const Elf_Ehdr *hdr, const Elf_Shdr *sechdrs, struct module *mod)
 {
+	struct mod_arch_specific *mas = &mod->arch;
+
 	DEBUGP("%s: init: entry=%p\n", __func__, mod->init);
-	if (mod->arch.unwind)
+	if (mas->unwind)
 		register_unwind_table(mod);
+
+	/*
+	 * ".opd" was already relocated to the final destination. Store
+	 * its address for use in symbolizer.
+	 */
+	mas->opd_addr = (void *)mas->opd->sh_addr;
+	mas->opd_size = mas->opd->sh_size;
+
+	/*
+	 * Module relocation was already done at this point. Section
+	 * headers are about to be deleted. Wipe out load-time context.
+	 */
+	mas->core_plt = NULL;
+	mas->init_plt = NULL;
+	mas->got = NULL;
+	mas->opd = NULL;
+	mas->unwind = NULL;
+	mas->gp = 0;
+	mas->next_got_entry = 0;
+
 	return 0;
 }
 
@@ -926,10 +948,9 @@ module_arch_cleanup (struct module *mod)
 
 void *dereference_module_function_descriptor(struct module *mod, void *ptr)
 {
-	Elf64_Shdr *opd = mod->arch.opd;
+	struct mod_arch_specific *mas = &mod->arch;
 
-	if (ptr < (void *)opd->sh_addr ||
-			ptr >= (void *)(opd->sh_addr + opd->sh_size))
+	if (ptr < mas->opd_addr || ptr >= mas->opd_addr + mas->opd_size)
 		return ptr;
 
 	return dereference_function_descriptor(ptr);
-- 
2.31.1
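The fix follows a general pattern: copy the few fields needed at run time out of a load-time object before that object is freed, and clear the stale pointer so a later misuse fails loudly. A minimal userspace sketch of that pattern, with hypothetical sketch_* types standing in for Elf64_Shdr and struct mod_arch_specific, and integer addresses (uintptr_t) instead of void * arithmetic:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for an ELF section header that lives only during loading. */
struct sketch_shdr {
	uintptr_t sh_addr;
	size_t sh_size;
};

struct sketch_mod_arch {
	/* Load-time only: the pointed-to header is freed after loading. */
	struct sketch_shdr *opd;
	/* Run-time copies, valid for the module's whole lifetime. */
	uintptr_t opd_addr;
	size_t opd_size;
};

/* Like the patched module_finalize(): copy what is needed later, then drop the pointer. */
static void sketch_finalize(struct sketch_mod_arch *mas)
{
	mas->opd_addr = mas->opd->sh_addr;
	mas->opd_size = mas->opd->sh_size;
	/* A stale use now becomes an obvious NULL dereference, not a silent use-after-free. */
	mas->opd = NULL;
}

/* Like the range test in dereference_module_function_descriptor(), using only the copies. */
static int sketch_in_opd(const struct sketch_mod_arch *mas, uintptr_t ptr)
{
	return ptr >= mas->opd_addr && ptr < mas->opd_addr + mas->opd_size;
}
```

In use, the header can be freed right after sketch_finalize() and range checks keep working, since they never touch it again; the half-open interval matches the `ptr >= opd_addr + opd_size` rejection in the patch.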
[PATCH v2] mm: page_owner: detect page_owner recursion via task_struct
Before the change, page_owner recursion was detected by fetching a
backtrace and inspecting it for the current instruction pointer.
This has a few problems:
- it is slightly slow, as it requires an extra backtrace and a linear
  stack scan of the result
- it is too late to check whether backtrace fetching itself required
  memory allocation (ia64's unwinder requires it).

To simplify recursion tracking, let's use a page_owner recursion flag in
'struct task_struct'.

The change makes page_owner=on work on ia64 by avoiding infinite
recursion in:
  kmalloc()
  -> __set_page_owner()
  -> save_stack()
  -> unwind() [ia64-specific]
  -> build_script()
  -> kmalloc()
  -> __set_page_owner() [we short-circuit here]
  -> save_stack()
  -> unwind() [recursion]

CC: Ingo Molnar
CC: Peter Zijlstra
CC: Juri Lelli
CC: Vincent Guittot
CC: Dietmar Eggemann
CC: Steven Rostedt
CC: Ben Segall
CC: Mel Gorman
CC: Daniel Bristot de Oliveira
CC: Andrew Morton
CC: linux...@kvack.org
Signed-off-by: Sergei Trofimovich
---
Changes since v1:
- use a bit from task_struct instead of a new field
- track only one recursion depth level so far

 include/linux/sched.h |  4 ++++
 mm/page_owner.c       | 32 ++++++++++----------------------
 2 files changed, 14 insertions(+), 22 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index ef00bb22164c..00986450677c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -841,6 +841,10 @@ struct task_struct {
 	/* Stalled due to lack of memory */
 	unsigned			in_memstall:1;
 #endif
+#ifdef CONFIG_PAGE_OWNER
+	/* Used by page_owner=on to detect recursion in page tracking. */
+	unsigned			in_page_owner:1;
+#endif
 
 	unsigned long			atomic_flags; /* Flags requiring atomic access. */
 
diff --git a/mm/page_owner.c b/mm/page_owner.c
index 7147fd34a948..64b2e4c6afb7 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -97,42 +97,30 @@ static inline struct page_owner *get_page_owner(struct page_ext *page_ext)
 	return (void *)page_ext + page_owner_ops.offset;
 }
 
-static inline bool check_recursive_alloc(unsigned long *entries,
-					 unsigned int nr_entries,
-					 unsigned long ip)
-{
-	unsigned int i;
-
-	for (i = 0; i < nr_entries; i++) {
-		if (entries[i] == ip)
-			return true;
-	}
-	return false;
-}
-
 static noinline depot_stack_handle_t save_stack(gfp_t flags)
 {
 	unsigned long entries[PAGE_OWNER_STACK_DEPTH];
 	depot_stack_handle_t handle;
 	unsigned int nr_entries;
 
-	nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 2);
-
 	/*
-	 * We need to check recursion here because our request to
-	 * stackdepot could trigger memory allocation to save new
-	 * entry. New memory allocation would reach here and call
-	 * stack_depot_save_entries() again if we don't catch it. There is
-	 * still not enough memory in stackdepot so it would try to
-	 * allocate memory again and loop forever.
+	 * Avoid recursion.
+	 *
+	 * Sometimes page metadata allocation tracking requires more
+	 * memory to be allocated:
+	 * - when new stack trace is saved to stack depot
+	 * - when backtrace itself is calculated (ia64)
 	 */
-	if (check_recursive_alloc(entries, nr_entries, _RET_IP_))
+	if (current->in_page_owner)
 		return dummy_handle;
 
+	current->in_page_owner = 1;
+	nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 2);
 	handle = stack_depot_save(entries, nr_entries, flags);
 	if (!handle)
 		handle = failure_handle;
+	current->in_page_owner = 0;
 
 	return handle;
 }
 
-- 
2.31.1
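The new guard is a plain re-entrancy flag: set it before doing work that may call back into the tracker, return a dummy result if it is already set, and clear it on the way out. A minimal userspace sketch of that idea, assuming a C11 `_Thread_local` flag as a stand-in for the per-task `current->in_page_owner` bit; sketch_kmalloc() and sketch_save_stack() are hypothetical, and the counters exist only to make the behaviour observable:

```c
#include <assert.h>
#include <stdbool.h>

/* Per-thread flag standing in for current->in_page_owner (assumption: thread ~ task). */
static _Thread_local bool sketch_in_page_owner;

/* Demo counters: real handles produced vs. recursive calls short-circuited. */
static int sketch_real_handles;
static int sketch_dummy_handles;

static int sketch_save_stack(void);

/* Stand-in for an allocation made while saving a stack (e.g. by ia64's unwinder). */
static void sketch_kmalloc(void)
{
	/* Every allocation is tracked, so it re-enters the tracker. */
	sketch_save_stack();
}

/* Returns 1 for a real handle, 0 for the dummy handle used on recursion. */
static int sketch_save_stack(void)
{
	if (sketch_in_page_owner) {
		sketch_dummy_handles++;
		return 0;
	}
	sketch_in_page_owner = true;
	sketch_kmalloc();	/* unwinding / stack depot may allocate */
	sketch_real_handles++;
	sketch_in_page_owner = false;
	return 1;
}
```

Each outer call produces one real handle and exactly one dummy handle for the nested allocation, instead of recursing forever. Because only the owning task ever reads or writes its own flag, no atomics are needed, which is why the "Unserialized, strictly 'current'" bitfield fits.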
Re: [PATCH] mm: page_owner: detect page_owner recursion via task_struct
On Thu, 1 Apr 2021 17:05:19 -0700 Andrew Morton wrote: > On Thu, 1 Apr 2021 23:30:10 +0100 Sergei Trofimovich > wrote: > > > Before the change page_owner recursion was detected via fetching > > backtrace and inspecting it for current instruction pointer. > > It has a few problems: > > - it is slightly slow as it requires extra backtrace and a linear > > stack scan of the result > > - it is too late to check if backtrace fetching required memory > > allocation itself (ia64's unwinder requires it). > > > > To simplify recursion tracking let's use page_owner recursion depth > > as a counter in 'struct task_struct'. > > Seems like a better approach. > > > The change make page_owner=on work on ia64 bu avoiding infinite > > recursion in: > > kmalloc() > > -> __set_page_owner() > > -> save_stack() > > -> unwind() [ia64-specific] > > -> build_script() > > -> kmalloc() > > -> __set_page_owner() [we short-circuit here] > > -> save_stack() > > -> unwind() [recursion] > > > > ... > > > > --- a/include/linux/sched.h > > +++ b/include/linux/sched.h > > @@ -1371,6 +1371,15 @@ struct task_struct { > > struct llist_head kretprobe_instances; > > #endif > > > > +#ifdef CONFIG_PAGE_OWNER > > + /* > > +* Used by page_owner=on to detect recursion in page tracking. > > +* Is it fine to have non-atomic ops here if we ever access > > +* this variable via current->page_owner_depth? > > Yes, it is fine. This part of the comment can be removed. Cool! Will do. > > +*/ > > + unsigned int page_owner_depth; > > +#endif > > Adding to the task_struct has a cost. But I don't expect that > PAGE_OWNER is commonly used in prodction builds (correct?). Yeah, PAGE_OWNER should not be enabled for production kernels. Not having extra memory overhead (or layout disruption) is a nice benefit though. I'll switch to "Unserialized, strictly 'current'" bitfield. 
> > --- a/init/init_task.c
> > +++ b/init/init_task.c
> > @@ -213,6 +213,9 @@ struct task_struct init_task
> >  #ifdef CONFIG_SECCOMP
> >  	.seccomp	= { .filter_count = ATOMIC_INIT(0) },
> >  #endif
> > +#ifdef CONFIG_PAGE_OWNER
> > +	.page_owner_depth	= 0,
> > +#endif
> >  };
> >  EXPORT_SYMBOL(init_task);
> 
> It will be initialized to zero by the compiler.  We can omit this hunk
> entirely.
> 
> > --- a/mm/page_owner.c
> > +++ b/mm/page_owner.c
> > @@ -20,6 +20,16 @@
> >   */
> >  #define PAGE_OWNER_STACK_DEPTH (16)
> >  
> > +/*
> > + * How many reenters we allow to page_owner.
> > + *
> > + * Sometimes metadata allocation tracking requires more memory to be allocated:
> > + * - when new stack trace is saved to stack depot
> > + * - when backtrace itself is calculated (ia64)
> > + * Instead of falling to infinite recursion give it a chance to recover.
> > + */
> > +#define PAGE_OWNER_MAX_RECURSION_DEPTH (1)
> 
> So this is presently a boolean.  Is there any expectation that
> PAGE_OWNER_MAX_RECURSION_DEPTH will ever be greater than 1?  If not, we
> could use a single bit in the task_struct.  Add it to the
> "Unserialized, strictly 'current'" bitfields.  Could make it a 2-bit field if we want
> to permit PAGE_OWNER_MAX_RECURSION_DEPTH=larger.

Let's settle on depth=1. depth>1 is not trivial for other reasons I don't
completely understand.

Follow-up patch incoming.

-- 
Sergei
[PATCH] mm: page_owner: detect page_owner recursion via task_struct
Before the change, page_owner recursion was detected by fetching a
backtrace and inspecting it for the current instruction pointer.
It has a few problems:
- it is slightly slow as it requires an extra backtrace and a linear
  stack scan of the result
- it is too late to check if backtrace fetching required memory
  allocation itself (ia64's unwinder requires it).

To simplify recursion tracking let's use page_owner recursion depth
as a counter in 'struct task_struct'.

The change makes page_owner=on work on ia64 by avoiding infinite
recursion in:
  kmalloc()
  -> __set_page_owner()
  -> save_stack()
  -> unwind() [ia64-specific]
  -> build_script()
  -> kmalloc()
  -> __set_page_owner() [we short-circuit here]
  -> save_stack()
  -> unwind() [recursion]

CC: Ingo Molnar
CC: Peter Zijlstra
CC: Juri Lelli
CC: Vincent Guittot
CC: Dietmar Eggemann
CC: Steven Rostedt
CC: Ben Segall
CC: Mel Gorman
CC: Daniel Bristot de Oliveira
CC: Andrew Morton
CC: linux...@kvack.org
Signed-off-by: Sergei Trofimovich
---
 include/linux/sched.h | 9 +
 init/init_task.c      | 3 +++
 mm/page_owner.c       | 41 +
 3 files changed, 29 insertions(+), 24 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index ef00bb22164c..35771703fd89 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1371,6 +1371,15 @@ struct task_struct {
 	struct llist_head		kretprobe_instances;
 #endif
 
+#ifdef CONFIG_PAGE_OWNER
+	/*
+	 * Used by page_owner=on to detect recursion in page tracking.
+	 * Is it fine to have non-atomic ops here if we ever access
+	 * this variable via current->page_owner_depth?
+	 */
+	unsigned int page_owner_depth;
+#endif
+
 	/*
 	 * New fields for task_struct should be added above here, so that
 	 * they are included in the randomized portion of task_struct.
diff --git a/init/init_task.c b/init/init_task.c
index 3711cdaafed2..f579f2b2eca8 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -213,6 +213,9 @@ struct task_struct init_task
 #ifdef CONFIG_SECCOMP
 	.seccomp	= { .filter_count = ATOMIC_INIT(0) },
 #endif
+#ifdef CONFIG_PAGE_OWNER
+	.page_owner_depth	= 0,
+#endif
 };
 EXPORT_SYMBOL(init_task);
 
diff --git a/mm/page_owner.c b/mm/page_owner.c
index 7147fd34a948..422558605fcc 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -20,6 +20,16 @@
  */
 #define PAGE_OWNER_STACK_DEPTH (16)
 
+/*
+ * How many reenters we allow to page_owner.
+ *
+ * Sometimes metadata allocation tracking requires more memory to be allocated:
+ * - when new stack trace is saved to stack depot
+ * - when backtrace itself is calculated (ia64)
+ * Instead of falling to infinite recursion give it a chance to recover.
+ */
+#define PAGE_OWNER_MAX_RECURSION_DEPTH (1)
+
 struct page_owner {
 	unsigned short order;
 	short last_migrate_reason;
@@ -97,42 +107,25 @@ static inline struct page_owner *get_page_owner(struct page_ext *page_ext)
 	return (void *)page_ext + page_owner_ops.offset;
 }
 
-static inline bool check_recursive_alloc(unsigned long *entries,
-					 unsigned int nr_entries,
-					 unsigned long ip)
-{
-	unsigned int i;
-
-	for (i = 0; i < nr_entries; i++) {
-		if (entries[i] == ip)
-			return true;
-	}
-	return false;
-}
-
 static noinline depot_stack_handle_t save_stack(gfp_t flags)
 {
 	unsigned long entries[PAGE_OWNER_STACK_DEPTH];
 	depot_stack_handle_t handle;
 	unsigned int nr_entries;
 
-	nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 2);
-
-	/*
-	 * We need to check recursion here because our request to
-	 * stackdepot could trigger memory allocation to save new
-	 * entry. New memory allocation would reach here and call
-	 * stack_depot_save_entries() again if we don't catch it. There is
-	 * still not enough memory in stackdepot so it would try to
-	 * allocate memory again and loop forever.
-	 */
-	if (check_recursive_alloc(entries, nr_entries, _RET_IP_))
+	/* Avoid recursion. Used in stack trace generation code. */
+	if (current->page_owner_depth >= PAGE_OWNER_MAX_RECURSION_DEPTH)
 		return dummy_handle;
 
+	current->page_owner_depth++;
+
+	nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 2);
+
 	handle = stack_depot_save(entries, nr_entries, flags);
 	if (!handle)
 		handle = failure_handle;
 
+	current->page_owner_depth--;
 	return handle;
 }
 
-- 
2.31.1
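The depth-counter guard above can be modelled in userspace to see why it terminates. This is a hypothetical sketch, not kernel code: a C11 `_Thread_local` counter `owner_depth` stands in for `current->page_owner_depth`, and `fake_alloc()`/`fake_unwind()` stand in for kmalloc() and an unwinder that itself allocates (as ia64's does). The real save_stack() also calls stack_trace_save() and stack_depot_save(), which are omitted here.

```c
#include <assert.h>

/*
 * Hypothetical userspace model of the per-task recursion guard.
 * owner_depth plays the role of current->page_owner_depth.
 */
#define MAX_RECURSION_DEPTH 1

enum handle { DUMMY_HANDLE, REAL_HANDLE };

static _Thread_local unsigned int owner_depth;
static unsigned int stacks_saved;  /* completed (non-dummy) save_stack() calls */
static unsigned int dummy_returns; /* times the guard short-circuited */

static enum handle save_stack(void);

/* Models kmalloc() -> __set_page_owner() -> save_stack(). */
static void fake_alloc(void)
{
	(void)save_stack();
}

/* Models an unwinder that must allocate while walking the stack. */
static void fake_unwind(void)
{
	fake_alloc();
}

static enum handle save_stack(void)
{
	/* Re-entered from inside our own tracking: bail out early. */
	if (owner_depth >= MAX_RECURSION_DEPTH) {
		dummy_returns++;
		return DUMMY_HANDLE;
	}

	owner_depth++;
	fake_unwind();		/* may call back into save_stack() */
	stacks_saved++;
	owner_depth--;

	/* The counter is back to its value on entry. */
	assert(owner_depth < MAX_RECURSION_DEPTH);
	return REAL_HANDLE;
}
```

Calling save_stack() once yields one real handle, while the nested call made from inside the fake unwinder gets the dummy handle instead of recursing forever.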
[PATCH] mm: page_owner: fetch backtrace only for tracked pages
Very minor optimization: call save_stack() only after lookup_page_ext()
has succeeded, so no backtrace is fetched for pages without page_ext.

CC: Andrew Morton
CC: linux...@kvack.org
Signed-off-by: Sergei Trofimovich
---
 mm/page_owner.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/page_owner.c b/mm/page_owner.c
index 63e4ecaba97b..7147fd34a948 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -140,14 +140,14 @@ void __reset_page_owner(struct page *page, unsigned int order)
 {
 	int i;
 	struct page_ext *page_ext;
-	depot_stack_handle_t handle = 0;
+	depot_stack_handle_t handle;
 	struct page_owner *page_owner;
 
-	handle = save_stack(GFP_NOWAIT | __GFP_NOWARN);
-
 	page_ext = lookup_page_ext(page);
 	if (unlikely(!page_ext))
 		return;
+
+	handle = save_stack(GFP_NOWAIT | __GFP_NOWARN);
 	for (i = 0; i < (1 << order); i++) {
 		__clear_bit(PAGE_EXT_OWNER_ALLOCATED, &page_ext->flags);
 		page_owner = get_page_owner(page_ext);
-- 
2.31.1
[PATCH] mm: page_owner: use kstrtobool() to parse bool option
I tried to use page_owner=1 for a while and noticed too late that it had
no effect, unlike similar options such as init_on_alloc=1, which do work.
Let's make them consistent.

The change decreases binary size slightly:

   text	   data	    bss	    dec	    hex	filename
  12408	    321	     17	  12746	   31ca	mm/page_owner.o.before
  12320	    321	     17	  12658	   3172	mm/page_owner.o.after

CC: Andrew Morton
CC: linux...@kvack.org
Signed-off-by: Sergei Trofimovich
---
 mm/page_owner.c | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/mm/page_owner.c b/mm/page_owner.c
index d15c7c4994f5..63e4ecaba97b 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -41,13 +41,7 @@ static void init_early_allocated_pages(void);
 
 static int __init early_page_owner_param(char *buf)
 {
-	if (!buf)
-		return -EINVAL;
-
-	if (strcmp(buf, "on") == 0)
-		page_owner_enabled = true;
-
-	return 0;
+	return kstrtobool(buf, &page_owner_enabled);
 }
 early_param("page_owner", early_page_owner_param);
 
-- 
2.31.1
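The practical effect is that more spellings now enable the option. Below is a rough userspace approximation, assuming kstrtobool() at this point looks only at the first one or two characters (y/Y/1, n/N/0, on/off). `parse_bool()` is a hypothetical name, and newer kernels may accept additional spellings.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Approximates kstrtobool(): only the leading characters are inspected. */
static int parse_bool(const char *s, bool *res)
{
	assert(res != NULL);
	if (!s)
		return -EINVAL;

	switch (s[0]) {
	case 'y': case 'Y': case '1':
		*res = true;
		return 0;
	case 'n': case 'N': case '0':
		*res = false;
		return 0;
	case 'o': case 'O':
		/* "on" / "off": the second character decides. */
		switch (s[1]) {
		case 'n': case 'N':
			*res = true;
			return 0;
		case 'f': case 'F':
			*res = false;
			return 0;
		}
		break;
	}
	return -EINVAL;
}
```

With this rule page_owner=1, page_owner=y and page_owner=on all enable the feature, whereas the old strcmp() matched only the exact string "on".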
Re: [PATCH] ia64: fix user_stack_pointer() for ptrace()
On Wed, 31 Mar 2021 17:49:08 -0700 Andrew Morton wrote:

> On Wed, 31 Mar 2021 09:44:47 +0100 Sergei Trofimovich wrote:
> 
> > ia64 has two stacks:
> > - memory stack (or stack), pointed at by r12
> > - register backing store (register stack), pointed at by
> >   ar.bsp/ar.bspstore with complications around dirty
> >   register frame on CPU.
> > 
> > In https://bugs.gentoo.org/769614 Dmitry noticed that
> > PTRACE_GET_SYSCALL_INFO returns the register stack instead of
> > the memory stack.
> > 
> > The bug comes from the fact that user_stack_pointer() and
> > current_user_stack_pointer() don't return the same register:
> > 
> >     ulong user_stack_pointer(struct pt_regs *regs) { return regs->ar_bspstore; }
> >     #define current_user_stack_pointer() (current_pt_regs()->r12)
> > 
> > The change gets both back in sync.
> > 
> > I think ptrace(PTRACE_GET_SYSCALL_INFO) is the only user affected
> > by this bug on ia64.
> > 
> > The change fixes 'rt_sigreturn.gen.test' strace test where
> > it was observed initially.
> 
> I assume a cc:stable is justified here?
> 
> The bug seems to have been there for 10+ years, so there isn't a lot of
> point in looking for the Fixes: reference.

Yes, I think cc:stable is fine.

-- 
Sergei
[PATCH] ia64: fix user_stack_pointer() for ptrace()
ia64 has two stacks:
- memory stack (or stack), pointed at by r12
- register backing store (register stack), pointed at by
  ar.bsp/ar.bspstore with complications around dirty
  register frame on CPU.

In https://bugs.gentoo.org/769614 Dmitry noticed that
PTRACE_GET_SYSCALL_INFO returns the register stack instead of
the memory stack.

The bug comes from the fact that user_stack_pointer() and
current_user_stack_pointer() don't return the same register:

    ulong user_stack_pointer(struct pt_regs *regs) { return regs->ar_bspstore; }
    #define current_user_stack_pointer() (current_pt_regs()->r12)

The change gets both back in sync.

I think ptrace(PTRACE_GET_SYSCALL_INFO) is the only user affected
by this bug on ia64.

The change fixes the 'rt_sigreturn.gen.test' strace test, where
the bug was initially observed.

CC: Andrew Morton
CC: Oleg Nesterov
CC: linux-i...@vger.kernel.org
Bug: https://bugs.gentoo.org/769614
Reported-by: Dmitry V. Levin
Signed-off-by: Sergei Trofimovich
---
 arch/ia64/include/asm/ptrace.h | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/arch/ia64/include/asm/ptrace.h b/arch/ia64/include/asm/ptrace.h
index b3aa46090101..08179135905c 100644
--- a/arch/ia64/include/asm/ptrace.h
+++ b/arch/ia64/include/asm/ptrace.h
@@ -54,8 +54,7 @@
 
 static inline unsigned long user_stack_pointer(struct pt_regs *regs)
 {
-	/* FIXME: should this be bspstore + nr_dirty regs? */
-	return regs->ar_bspstore;
+	return regs->r12;
 }
 
 static inline int is_syscall_success(struct pt_regs *regs)
@@ -79,11 +78,6 @@ static inline long regs_return_value(struct pt_regs *regs)
 	unsigned long __ip = instruction_pointer(regs);			\
 	(__ip & ~3UL) + ((__ip & 3UL) << 2);				\
 })
-/*
- * Why not default?  Because user_stack_pointer() on ia64 gives register
- * stack backing store instead...
- */
-#define current_user_stack_pointer() (current_pt_regs()->r12)
 
 /* given a pointer to a task_struct, return the user's pt_regs */
 # define task_pt_regs(t)		(((struct pt_regs *) ((char *) (t) + IA64_STK_OFFSET)) - 1)
-- 
2.31.1
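The mismatch can be illustrated with a reduced model. `struct fake_pt_regs` and both accessor names are hypothetical; the struct keeps only the two fields named in the commit message and is not the real ia64 `struct pt_regs`.

```c
#include <assert.h>

/* Hypothetical reduced register frame: only the two stack pointers. */
struct fake_pt_regs {
	unsigned long r12;         /* memory stack pointer */
	unsigned long ar_bspstore; /* register backing store pointer */
};

/* Behaviour before the fix: returns the register backing store. */
static unsigned long user_stack_pointer_old(const struct fake_pt_regs *regs)
{
	assert(regs != 0);
	return regs->ar_bspstore;
}

/*
 * Behaviour after the fix: returns the memory stack pointer, i.e. the
 * same register current_user_stack_pointer() already used.
 */
static unsigned long user_stack_pointer_new(const struct fake_pt_regs *regs)
{
	assert(regs != 0);
	return regs->r12;
}
```

Any consumer that mixes the two accessors, as PTRACE_GET_SYSCALL_INFO effectively did, sees two different "stack pointers" for the same task until both read r12.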
Re: [PATCH mm v2] mm, kasan: fix for "integrate page_alloc init with HW_TAGS"
On Tue, 30 Mar 2021 18:44:09 +0200 Vlastimil Babka wrote:

> On 3/30/21 6:37 PM, Andrey Konovalov wrote:
> > My commit "integrate page_alloc init with HW_TAGS" changed the order of
> > kernel_unpoison_pages() and kernel_init_free_pages() calls. This leads
> > to complaints from the page unpoisoning code, as the poison pattern gets
> > overwritten for __GFP_ZERO allocations.
> > 
> > Fix by restoring the initial order. Also add a warning comment.
> > 
> > Reported-by: Vlastimil Babka
> > Reported-by: Sergei Trofimovich
> > Signed-off-by: Andrey Konovalov
> 
> Tested that the bug indeed occurs in -next and is fixed by this. Thanks.

Reviewed-by: Sergei Trofimovich

> > ---
> >  mm/page_alloc.c | 8 +++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> > 
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 033bd92e8398..d2c020563c0b 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -2328,6 +2328,13 @@ inline void post_alloc_hook(struct page *page, unsigned int order,
> >  	arch_alloc_page(page, order);
> >  	debug_pagealloc_map_pages(page, 1 << order);
> >  
> > +	/*
> > +	 * Page unpoisoning must happen before memory initialization.
> > +	 * Otherwise, the poison pattern will be overwritten for __GFP_ZERO
> > +	 * allocations and the page unpoisoning code will complain.
> > +	 */
> > +	kernel_unpoison_pages(page, 1 << order);
> > +
> >  	/*
> >  	 * As memory initialization might be integrated into KASAN,
> >  	 * kasan_alloc_pages and kernel_init_free_pages must be
> > @@ -2338,7 +2345,6 @@ inline void post_alloc_hook(struct page *page, unsigned int order,
> >  	if (init && !kasan_has_integrated_init())
> >  		kernel_init_free_pages(page, 1 << order);
> >  
> > -	kernel_unpoison_pages(page, 1 << order);
> >  	set_page_owner(page, order, gfp_flags);
> >  }
> >  
> > 
> 

-- 
Sergei
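The ordering constraint can be shown with a small userspace model. Assumptions: `POISON_BYTE` (0xaa) and all helper names are hypothetical, `unpoison()` only mimics the pattern check performed by the unpoisoning code, and memset() to zero stands in for kernel_init_free_pages().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define POISON_BYTE 0xaa /* hypothetical poison pattern for this sketch */

/* Freed memory is filled with the poison pattern. */
static void poison(unsigned char *buf, size_t len)
{
	memset(buf, POISON_BYTE, len);
}

/* Returns false ("complains") if any poisoned byte was modified. */
static bool unpoison(const unsigned char *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (buf[i] != POISON_BYTE)
			return false;
	return true;
}

/* Models post_alloc_hook() for a __GFP_ZERO allocation. */
static bool alloc_zeroed(unsigned char *buf, size_t len, bool unpoison_first)
{
	bool ok;

	assert(buf != NULL);
	if (unpoison_first) {
		ok = unpoison(buf, len);  /* check the pattern while it is intact */
		memset(buf, 0, len);      /* then initialize the memory */
	} else {
		memset(buf, 0, len);      /* initializing first wipes the pattern */
		ok = unpoison(buf, len);  /* so the check reports corruption */
	}
	return ok;
}
```

Zeroing first destroys the pattern, so the check fails even though nothing was corrupted, which is the false complaint the restored order avoids.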
[PATCH v2 3/3] hpsa: add an assert to prevent from __packed reintroduction
CC: linux-i...@vger.kernel.org
CC: storage...@microchip.com
CC: linux-s...@vger.kernel.org
CC: Joe Szczypek
CC: Scott Benesh
CC: Scott Teel
CC: Tomas Henzl
CC: "Martin K. Petersen"
CC: Don Brace
Reported-by: John Paul Adrian Glaubitz
Suggested-by: Don Brace
Fixes: f749d8b7a "scsi: hpsa: Correct dev cmds outstanding for retried cmds"
Signed-off-by: Sergei Trofimovich
---
 drivers/scsi/hpsa_cmd.h | 12
 1 file changed, 12 insertions(+)

diff --git a/drivers/scsi/hpsa_cmd.h b/drivers/scsi/hpsa_cmd.h
index 885b1f1fb20a..ba6a3aa8d954 100644
--- a/drivers/scsi/hpsa_cmd.h
+++ b/drivers/scsi/hpsa_cmd.h
@@ -22,6 +22,9 @@
 
 #include
 
+#include /* static_assert */
+#include /* offsetof */
+
 /* general boundary defintions */
 #define SENSEINFOBYTES          32 /* may vary between hbas */
 #define SG_ENTRIES_IN_CMD	32 /* Max SG entries excluding chain blocks */
@@ -454,6 +457,15 @@ struct CommandList {
 	atomic_t refcount; /* Must be last to avoid memset in hpsa_cmd_init() */
 } __aligned(COMMANDLIST_ALIGNMENT);
 
+/*
+ * Make sure our embedded atomic variable is aligned. Otherwise we break atomic
+ * operations on architectures that don't support unaligned atomics like IA64.
+ *
+ * The assert guards against reintroduction of unwanted __packed on
+ * struct CommandList.
+ */
+static_assert(offsetof(struct CommandList, refcount) % __alignof__(atomic_t) == 0);
+
 /* Max S/G elements in I/O accelerator command */
 #define IOACCEL1_MAXSGENTRIES           24
 #define IOACCEL2_MAXSGENTRIES		28
-- 
2.31.1
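The same compile-time guard works in plain C11. In this sketch `struct cmd_natural` and `struct cmd_packed` are hypothetical, `int` stands in for `atomic_t`, and `__attribute__((packed))` is the GCC/Clang attribute behind the kernel's `__packed`.

```c
#include <assert.h>
#include <stddef.h>

/* Natural layout: the compiler pads so refcount is aligned. */
struct cmd_natural {
	unsigned char flag;
	int refcount; /* stands in for atomic_t */
};

/* Packed layout: no padding, refcount lands at offset 1. */
struct cmd_packed {
	unsigned char flag;
	int refcount;
} __attribute__((packed));

/* Build fails if someone re-adds packing to struct cmd_natural. */
static_assert(offsetof(struct cmd_natural, refcount) % _Alignof(int) == 0,
	      "refcount must be naturally aligned");

/* The same predicate, evaluated for the packed variant. */
static int refcount_is_aligned_packed(void)
{
	return offsetof(struct cmd_packed, refcount) % _Alignof(int) == 0;
}
```

Applying the static_assert to `struct cmd_packed` instead would stop the build, which is exactly the regression the hpsa assert is meant to catch.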
[PATCH v2 2/3] hpsa: fix boot on ia64 (atomic_t alignment)
The failure was initially observed as a boot failure on an rx3600 ia64
machine with RAID bus controller: Hewlett-Packard Company Smart Array P600:

    kernel unaligned access to 0xe00105dd8b95, ip=0xa00100b87551
    kernel unaligned access to 0xe00105dd8e95, ip=0xa00100b87551
    hpsa :14:01.0: Controller reports max supported commands of 0 Using 16 instead. Ensure that firmware is up to date.
    swapper/0[1]: error during unaligned kernel access

Here the unaligned access comes from 'struct CommandList', which happens
to be packed. The change f749d8b7a ("scsi: hpsa: Correct dev cmds
outstanding for retried cmds") introduced unexpected padding and moved
the atomic_t member off its natural alignment.

This change removes the packing annotation from a struct that is not
intended to be sent to the controller as is. This restores natural
`atomic_t` alignment.

The change is tested on the same rx3600 machine.

CC: linux-i...@vger.kernel.org
CC: linux-kernel@vger.kernel.org
CC: storage...@microchip.com
CC: linux-s...@vger.kernel.org
CC: Joe Szczypek
CC: Scott Benesh
CC: Scott Teel
CC: Tomas Henzl
CC: "Martin K. Petersen"
CC: Don Brace
Reported-by: John Paul Adrian Glaubitz
Suggested-by: Don Brace
Fixes: f749d8b7a "scsi: hpsa: Correct dev cmds outstanding for retried cmds"
Signed-off-by: Sergei Trofimovich
---
 drivers/scsi/hpsa_cmd.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/hpsa_cmd.h b/drivers/scsi/hpsa_cmd.h
index 280e933d27e7..885b1f1fb20a 100644
--- a/drivers/scsi/hpsa_cmd.h
+++ b/drivers/scsi/hpsa_cmd.h
@@ -452,7 +452,7 @@ struct CommandList {
 	bool retry_pending;
 	struct hpsa_scsi_dev_t *device;
 	atomic_t refcount; /* Must be last to avoid memset in hpsa_cmd_init() */
-} __packed __aligned(COMMANDLIST_ALIGNMENT);
+} __aligned(COMMANDLIST_ALIGNMENT);
 
 /* Max S/G elements in I/O accelerator command */
 #define IOACCEL1_MAXSGENTRIES           24
-- 
2.31.1
[PATCH v2 1/3] hpsa: use __packed on individual structs, not header-wide
Some of the structs contain `atomic_t` values and are not intended to be
sent to the IO controller as is.

The change adds __packed to every struct and union in the file.
Follow-up commits will fix `atomic_t` problems.

The commit is a no-op at least on ia64:
    $ diff -u <(objdump -d -r old.o) <(objdump -d -r new.o)

CC: linux-i...@vger.kernel.org
CC: storage...@microchip.com
CC: linux-s...@vger.kernel.org
CC: Joe Szczypek
CC: Scott Benesh
CC: Scott Teel
CC: Tomas Henzl
CC: "Martin K. Petersen"
CC: Don Brace
Reported-by: John Paul Adrian Glaubitz
Suggested-by: Don Brace
Fixes: f749d8b7a "scsi: hpsa: Correct dev cmds outstanding for retried cmds"
Signed-off-by: Sergei Trofimovich
---
 drivers/scsi/hpsa_cmd.h | 68 -
 1 file changed, 34 insertions(+), 34 deletions(-)

diff --git a/drivers/scsi/hpsa_cmd.h b/drivers/scsi/hpsa_cmd.h
index d126bb877250..280e933d27e7 100644
--- a/drivers/scsi/hpsa_cmd.h
+++ b/drivers/scsi/hpsa_cmd.h
@@ -20,6 +20,8 @@
 #ifndef HPSA_CMD_H
 #define HPSA_CMD_H
 
+#include
+
 /* general boundary defintions */
 #define SENSEINFOBYTES          32 /* may vary between hbas */
 #define SG_ENTRIES_IN_CMD	32 /* Max SG entries excluding chain blocks */
@@ -200,12 +202,10 @@ union u64bit {
 		MAX_EXT_TARGETS + 1) /* + 1 is for the controller itself */
 
 /* SCSI-3 Commands */
-#pragma pack(1)
-
 #define HPSA_INQUIRY 0x12
 struct InquiryData {
 	u8 data_byte[36];
-};
+} __packed;
 
 #define HPSA_REPORT_LOG 0xc2    /* Report Logical LUNs */
 #define HPSA_REPORT_PHYS 0xc3   /* Report Physical LUNs */
@@ -221,7 +221,7 @@ struct raid_map_disk_data {
 	u8	xor_mult[2];	/**< XOR multipliers for this position,
 				 *  valid for data disks only */
 	u8	reserved[2];
-};
+} __packed;
 
 struct raid_map_data {
 	__le32	structure_size;	/* Size of entire structure in bytes */
@@ -247,14 +247,14 @@ struct raid_map_data {
 	__le16	dekindex;	/* Data encryption key index. */
 	u8	reserved[16];
 	struct raid_map_disk_data data[RAID_MAP_MAX_ENTRIES];
-};
+} __packed;
 
 struct ReportLUNdata {
 	u8 LUNListLength[4];
 	u8 extended_response_flag;
 	u8 reserved[3];
 	u8 LUN[HPSA_MAX_LUN][8];
-};
+} __packed;
 
 struct ext_report_lun_entry {
 	u8 lunid[8];
@@ -269,20 +269,20 @@ struct ext_report_lun_entry {
 	u8 lun_count; /* multi-lun device, how many luns */
 	u8 redundant_paths;
 	u32 ioaccel_handle; /* ioaccel1 only uses lower 16 bits */
-};
+} __packed;
 
 struct ReportExtendedLUNdata {
 	u8 LUNListLength[4];
 	u8 extended_response_flag;
 	u8 reserved[3];
 	struct ext_report_lun_entry LUN[HPSA_MAX_PHYS_LUN];
-};
+} __packed;
 
 struct SenseSubsystem_info {
 	u8 reserved[36];
 	u8 portname[8];
 	u8 reserved1[1108];
-};
+} __packed;
 
 /* BMIC commands */
 #define BMIC_READ 0x26
@@ -317,7 +317,7 @@ union SCSI3Addr {
 		u8 Targ:6;
 		u8 Mode:2;	/* b10 */
 	} LogUnit;
-};
+} __packed;
 
 struct PhysDevAddr {
 	u32	TargetId:24;
@@ -325,20 +325,20 @@ struct PhysDevAddr {
 	u32	Mode:2;
 	/* 2 level target device addr */
 	union SCSI3Addr	Target[2];
-};
+} __packed;
 
 struct LogDevAddr {
 	u32	VolId:30;
 	u32	Mode:2;
 	u8	reserved[4];
-};
+} __packed;
 
 union LUNAddr {
 	u8	LunAddrBytes[8];
 	union SCSI3Addr	SCSI3Lun[4];
 	struct PhysDevAddr	PhysDev;
 	struct LogDevAddr	LogDev;
-};
+} __packed;
 
 struct CommandListHeader {
 	u8	ReplyQueue;
@@ -346,7 +346,7 @@ struct CommandListHeader {
 	__le16	SGTotal;
 	__le64	tag;
 	union LUNAddr	LUN;
-};
+} __packed;
 
 struct RequestBlock {
 	u8	CDBLen;
@@ -365,18 +365,18 @@ struct RequestBlock {
 #define GET_DIR(tad) (((tad) >> 6) & 0x03)
 	u16	Timeout;
 	u8	CDB[16];
-};
+} __packed;
 
 struct ErrDescriptor {
 	__le64 Addr;
 	__le32 Len;
-};
+} __packed;
 
 struct SGDescriptor {
 	__le64 Addr;
 	__le32 Len;
 	__le32 Ext;
-};
+} __packed;
 
 union MoreErrInfo {
 	struct {
@@ -390,7 +390,8 @@ union MoreErrInfo {
 		u8 offense_num;	/* byte # of offense 0-base */
 		u32 offense_value;
 	} Invalid_Cmd;
-};
+} __packed;
+
 struct ErrorInfo {
 	u8	ScsiStatus;
 	u8	SenseLen;
@@ -398,7 +399,7 @@ struct ErrorInfo {
 	u32	ResidualCnt;
 	union MoreErrInfo	MoreErrInfo;
 	u8	SenseInfo[SENSEINFOBYTES];
-};
+} __packed;
 /* Command types */
 #define CMD_IOCTL_PEND	0x01
 #define CMD_SCSI	0x03
@@ -451,7 +452,7 @@ struct CommandList {
 	bool retry_pending;
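The "no-op" claim can be sanity-checked for a single struct: a header-wide `#pragma pack(1)` and a per-struct packed attribute should give identical layouts. This is a hypothetical sketch modelled on `struct ErrDescriptor`, with `uint64_t`/`uint32_t` standing in for `__le64`/`__le32`, assuming GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Header-wide style: everything between push and pop is packed. */
#pragma pack(push, 1)
struct err_desc_pragma {
	uint64_t addr;
	uint32_t len;
};
#pragma pack(pop)

/* Per-struct style, as the patch does with __packed. */
struct err_desc_attr {
	uint64_t addr;
	uint32_t len;
} __attribute__((packed));

/* Unpacked: the ABI may add tail padding after len. */
struct err_desc_natural {
	uint64_t addr;
	uint32_t len;
};

/* Both packing styles must agree on size and member placement. */
static_assert(sizeof(struct err_desc_pragma) == sizeof(struct err_desc_attr),
	      "pragma and attribute packing differ");
static_assert(offsetof(struct err_desc_pragma, len) ==
	      offsetof(struct err_desc_attr, len),
	      "member offsets differ");
```

The unpacked variant's size depends on how the target ABI aligns `uint64_t` (commonly 16 bytes on 64-bit targets), which is why only the packed sizes are pinned down here.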
[PATCH v3] mm: page_alloc: ignore init_on_free=1 for debug_pagealloc=1
On !ARCH_SUPPORTS_DEBUG_PAGEALLOC (like ia64) debug_pagealloc=1
implies page_poison=on:

    if (page_poisoning_enabled() ||
         (!IS_ENABLED(CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC) &&
          debug_pagealloc_enabled()))
            static_branch_enable(&_page_poisoning_enabled);

page_poison=on needs to override init_on_free=1.

Before the change it did not work as expected for the following case:
- have PAGE_POISONING=y
- have page_poison unset
- have !ARCH_SUPPORTS_DEBUG_PAGEALLOC arch (like ia64)
- have init_on_free=1
- have debug_pagealloc=1

That way we get both keys enabled:
- static_branch_enable(&init_on_free);
- static_branch_enable(&_page_poisoning_enabled);

which leads to poisoned pages returned for __GFP_ZERO pages.

After the change we execute only:
- static_branch_enable(&_page_poisoning_enabled);
and ignore init_on_free=1.

Acked-by: Vlastimil Babka
Fixes: 8db26a3d4735 ("mm, page_poison: use static key more efficiently")
Cc:
CC: Andrew Morton
CC: linux...@kvack.org
CC: David Hildenbrand
CC: Andrey Konovalov
Link: https://lkml.org/lkml/2021/3/26/443
Signed-off-by: Sergei Trofimovich
---
Change since v2:
- Added 'Fixes:' and 'CC: stable@' suggested by Vlastimil and David
- Renamed local variable to 'page_poisoning_requested' for consistency
  suggested by David
- Simplified initialization of page_poisoning_requested suggested by David
- Added 'Acked-by: Vlastimil'

 mm/page_alloc.c | 30 +-
 1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cfc72873961d..4bb3cdfc47f8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -764,32 +764,36 @@ static inline void clear_page_guard(struct zone *zone, struct page *page,
  */
 void init_mem_debugging_and_hardening(void)
 {
+	bool page_poisoning_requested = false;
+
+#ifdef CONFIG_PAGE_POISONING
+	/*
+	 * Page poisoning is debug page alloc for some arches. If
+	 * either of those options are enabled, enable poisoning.
+	 */
+	if (page_poisoning_enabled() ||
+	     (!IS_ENABLED(CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC) &&
+	      debug_pagealloc_enabled())) {
+		static_branch_enable(&_page_poisoning_enabled);
+		page_poisoning_requested = true;
+	}
+#endif
+
 	if (_init_on_alloc_enabled_early) {
-		if (page_poisoning_enabled())
+		if (page_poisoning_requested)
 			pr_info("mem auto-init: CONFIG_PAGE_POISONING is on, "
 				"will take precedence over init_on_alloc\n");
 		else
 			static_branch_enable(&init_on_alloc);
 	}
 	if (_init_on_free_enabled_early) {
-		if (page_poisoning_enabled())
+		if (page_poisoning_requested)
 			pr_info("mem auto-init: CONFIG_PAGE_POISONING is on, "
 				"will take precedence over init_on_free\n");
 		else
 			static_branch_enable(&init_on_free);
 	}
 
-#ifdef CONFIG_PAGE_POISONING
-	/*
-	 * Page poisoning is debug page alloc for some arches. If
-	 * either of those options are enabled, enable poisoning.
-	 */
-	if (page_poisoning_enabled() ||
-	     (!IS_ENABLED(CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC) &&
-	      debug_pagealloc_enabled()))
-		static_branch_enable(&_page_poisoning_enabled);
-#endif
-
 #ifdef CONFIG_DEBUG_PAGEALLOC
 	if (!debug_pagealloc_enabled())
 		return;
-- 
2.31.1
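The decision order after this patch can be modelled as a pure function. This is a hypothetical sketch that assumes CONFIG_PAGE_POISONING=y. The struct and field names are invented, and the booleans stand in for the boot options, the Kconfig symbol and the resulting static keys.

```c
#include <assert.h>
#include <stdbool.h>

/* Inputs: boot options plus the architecture capability. */
struct boot_opts {
	bool page_poison;              /* page_poison=on */
	bool debug_pagealloc;          /* debug_pagealloc=1 */
	bool arch_has_debug_pagealloc; /* CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC */
	bool init_on_alloc;            /* init_on_alloc=1 */
	bool init_on_free;             /* init_on_free=1 */
};

/* Outputs: which static keys end up enabled. */
struct enabled_keys {
	bool poisoning;
	bool init_on_alloc;
	bool init_on_free;
};

static struct enabled_keys resolve(struct boot_opts o)
{
	struct enabled_keys k = { false, false, false };

	/* Step 1 (moved up by the patch): decide on poisoning first. */
	k.poisoning = o.page_poison ||
		      (!o.arch_has_debug_pagealloc && o.debug_pagealloc);

	/* Step 2: poisoning takes precedence over both init_on_* options. */
	k.init_on_alloc = o.init_on_alloc && !k.poisoning;
	k.init_on_free = o.init_on_free && !k.poisoning;

	/* Invariant restored by the patch. */
	assert(!(k.poisoning && k.init_on_free));
	return k;
}
```

In the old order, step 2 consulted only page_poisoning_enabled(), so the debug_pagealloc-implied poisoning on ia64 was not yet visible and both keys could end up enabled.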
Re: [PATCH] ia64: tools: add generic errno.h definition
On Sat, 27 Mar 2021 10:18:18 + Sergei Trofimovich wrote:

> On Fri, Mar 12, 2021 at 07:51:35AM +0000, Sergei Trofimovich wrote:
> > Noticed a missing header when building the bpfilter helper:
> > 
> >     CC [U]  net/bpfilter/main.o
> >   In file included from /usr/include/linux/errno.h:1,
> >                    from /usr/include/bits/errno.h:26,
> >                    from /usr/include/errno.h:28,
> >                    from net/bpfilter/main.c:4:
> >   tools/include/uapi/asm/errno.h:13:10: fatal error:
> >     ../../../arch/ia64/include/uapi/asm/errno.h: No such file or directory
> >      13 | #include "../../../arch/ia64/include/uapi/asm/errno.h"
> >         |          ^
> > 
> > CC: linux-kernel@vger.kernel.org
> > CC: net...@vger.kernel.org
> > CC: b...@vger.kernel.org
> > Signed-off-by: Sergei Trofimovich
> 
> Any chance to pick it up?

An alternative (and nicer) patch is queued in -mm as:
    https://www.ozlabs.org/~akpm/mmotm/broken-out/ia64-tools-remove-inclusion-of-ia64-specific-version-of-errnoh-header.patch

-- 
Sergei
[PATCH] ia64: mca: always make IA64_MCA_DEBUG an expression
At least ia64_mca_log_sal_error_record() expects some statement:

    static void ia64_mca_log_sal_error_record(int sal_info_type)
    {
        ...
        if (irq_safe)
            IA64_MCA_DEBUG("CPU %d: SAL log contains %s error record\n",
                smp_processor_id(),
                sal_info_type < ARRAY_SIZE(rec_name) ? rec_name[sal_info_type] : "UNKNOWN");
        ...
    }

With an empty IA64_MCA_DEBUG() the `if` above is left with an empty body.
Instead of fixing all callers, the change explicitly makes IA64_MCA_DEBUG
a non-empty expression.

CC: Andrew Morton
CC: linux-i...@vger.kernel.org
Signed-off-by: Sergei Trofimovich
---
 arch/ia64/kernel/mca.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/ia64/kernel/mca.c b/arch/ia64/kernel/mca.c
index 79e76712198c..16088c645e2b 100644
--- a/arch/ia64/kernel/mca.c
+++ b/arch/ia64/kernel/mca.c
@@ -109,9 +109,9 @@
 #include "irq.h"
 
 #if defined(IA64_MCA_DEBUG_INFO)
-# define IA64_MCA_DEBUG(fmt...)	printk(fmt)
+# define IA64_MCA_DEBUG(fmt...) printk(fmt)
 #else
-# define IA64_MCA_DEBUG(fmt...)
+# define IA64_MCA_DEBUG(fmt...) do {} while (0)
 #endif
 
 #define NOTIFY_INIT(event, regs, arg, spin)				\
-- 
2.31.1
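A minimal userspace illustration of why the disabled variant should still expand to a statement. The macro and function names are hypothetical. With an empty expansion the `if` body becomes a lone `;`, which GCC flags under -Wempty-body, while `do {} while (0)` is a complete statement that also composes safely with `else`.

```c
#include <assert.h>
#include <stdio.h>

/* Counts how often the enabled variant runs, for the demo. */
static int debug_calls;

/* Enabled variant: expands to a real call. */
#define MCA_DEBUG_ON(fmt, ...)  (debug_calls++, printf(fmt, __VA_ARGS__))

/* Disabled variant as in the patch: still a single complete statement. */
#define MCA_DEBUG_OFF(fmt, ...) do {} while (0)

static int log_record(int irq_safe, int cpu)
{
	if (irq_safe)
		MCA_DEBUG_OFF("CPU %d: SAL log record\n", cpu);
	else
		MCA_DEBUG_ON("CPU %d: not irq safe\n", cpu);
	assert(debug_calls >= 0);
	return debug_calls;
}
```

The first call below takes the disabled branch and logs nothing; the second goes through the enabled macro once.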
[PATCH v2] ia64: fix EFI_DEBUG build
When local debugging is enabled via `#define EFI_DEBUG 1`, the build
fails:

    arch/ia64/kernel/efi.c:564:8: error: 'i' undeclared (first use in this function)

While at it, fixed benign string format mismatches visible only when
EFI_DEBUG is enabled:

    arch/ia64/kernel/efi.c:589:11: warning: format '%lx' expects argument of type 'long unsigned int', but argument 5 has type 'u64' {aka 'long long unsigned int'} [-Wformat=]

Fixes: 14fb42090943559 ("efi: Merge EFI system table revision and vendor checks")
CC: Ard Biesheuvel
CC: linux-...@vger.kernel.org
CC: linux-i...@vger.kernel.org
Signed-off-by: Sergei Trofimovich
---
Change since v1: mention explicitly format string change

 arch/ia64/kernel/efi.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/arch/ia64/kernel/efi.c b/arch/ia64/kernel/efi.c
index c5fe21de46a8..31149e41f9be 100644
--- a/arch/ia64/kernel/efi.c
+++ b/arch/ia64/kernel/efi.c
@@ -415,10 +415,10 @@ efi_get_pal_addr (void)
 		mask  = ~((1 << IA64_GRANULE_SHIFT) - 1);
 
 		printk(KERN_INFO "CPU %d: mapping PAL code "
-			"[0x%lx-0x%lx) into [0x%lx-0x%lx)\n",
-			smp_processor_id(), md->phys_addr,
-			md->phys_addr + efi_md_size(md),
-			vaddr & mask, (vaddr & mask) + IA64_GRANULE_SIZE);
+			"[0x%llx-0x%llx) into [0x%llx-0x%llx)\n",
+			smp_processor_id(), md->phys_addr,
+			md->phys_addr + efi_md_size(md),
+			vaddr & mask, (vaddr & mask) + IA64_GRANULE_SIZE);
 #endif
 		return __va(md->phys_addr);
 	}
@@ -560,6 +560,7 @@ efi_init (void)
 	{
 		efi_memory_desc_t *md;
 		void *p;
+		unsigned int i;
 
 		for (i = 0, p = efi_map_start; p < efi_map_end;
 		     ++i, p += efi_desc_size)
@@ -586,7 +587,7 @@ efi_init (void)
 			}
 
 			printk("mem%02d: %s "
-			       "range=[0x%016lx-0x%016lx) (%4lu%s)\n",
+			       "range=[0x%016llx-0x%016llx) (%4lu%s)\n",
 			       i, efi_md_typeattr_format(buf, sizeof(buf), md),
 			       md->phys_addr,
 			       md->phys_addr + efi_md_size(md), size, unit);
-- 
2.31.1
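A small sketch of the format fix, assuming (as the warning text says) that u64 is `unsigned long long` on this target. `u64_sketch` and `format_range()` are hypothetical names; the point is only that `%llx` is the conversion that matches an `unsigned long long` argument.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h> /* strcmp() for checking the output */

/* Stand-in for the kernel's u64 on ia64 (unsigned long long). */
typedef unsigned long long u64_sketch;

/* Formats a [start-end) range the way the fixed printk() does. */
static int format_range(char *buf, size_t len, u64_sketch start, u64_sketch size)
{
	assert(buf != NULL);
	return snprintf(buf, len, "range=[0x%016llx-0x%016llx)",
			start, start + size);
}
```

Passing a `u64_sketch` to `%lx` instead would be exactly the -Wformat mismatch quoted above.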
[PATCH v2] ia64: simplify code flow around swiotlb init
Before the change, a CONFIG_INTEL_IOMMU && !CONFIG_SWIOTLB && !CONFIG_FLATMEM
build could skip `set_max_mapnr(max_low_pfn);`: the dangling
`if (!iommu_detected)` guarded the next statement that survived
preprocessing, so set_max_mapnr() ran only when no IOMMU was detected.

CC: Andrew Morton
CC: John Paul Adrian Glaubitz
CC: linux-i...@vger.kernel.org
Signed-off-by: Sergei Trofimovich
---
Change since v1: fixed a typo in commit message.

 arch/ia64/mm/init.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index 16d0d7d22657..a63585db94fe 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -644,13 +644,16 @@ mem_init (void)
 	 * _before_ any drivers that may need the PCI DMA interface are
 	 * initialized or bootmem has been freed.
 	 */
+	do {
 #ifdef CONFIG_INTEL_IOMMU
-	detect_intel_iommu();
-	if (!iommu_detected)
+		detect_intel_iommu();
+		if (iommu_detected)
+			break;
 #endif
 #ifdef CONFIG_SWIOTLB
 	swiotlb_init(1);
 #endif
+	} while (0);
 
 #ifdef CONFIG_FLATMEM
 	BUG_ON(!mem_map);
-- 
2.31.1
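The control-flow difference can be modelled with runtime booleans standing in for the Kconfig options and the detection result. This is a hypothetical sketch: `old_sets_mapnr()` models the old code with CONFIG_INTEL_IOMMU=y, CONFIG_SWIOTLB=n and CONFIG_FLATMEM=n, and `new_flow()` models the do/while (0) shape; the flags only record which calls would have run.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Old flow with CONFIG_SWIOTLB and CONFIG_FLATMEM both off: after
 * preprocessing, "if (!iommu_detected)" guards set_max_mapnr().
 */
static bool old_sets_mapnr(bool iommu_detected)
{
	bool mapnr_set = false;

	if (!iommu_detected)
		mapnr_set = true;	/* stands in for set_max_mapnr() */
	return mapnr_set;
}

struct init_result {
	bool swiotlb_initialized;
	bool mapnr_set;
};

/* New flow: "break" only leaves the do/while (0) block. */
static struct init_result new_flow(bool have_swiotlb, bool iommu_detected)
{
	struct init_result r = { false, false };

	do {
		if (iommu_detected)
			break;		/* hardware IOMMU handles DMA */
		if (have_swiotlb)
			r.swiotlb_initialized = true;	/* swiotlb_init(1) */
	} while (0);

	r.mapnr_set = true;		/* set_max_mapnr() always runs */
	assert(!(iommu_detected && r.swiotlb_initialized));
	return r;
}
```

The do/while (0) wrapper gives `break` something to leave, so no statement after the block can accidentally become the body of a conditional.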
Re: [PATCH] mm: add page_owner_stack=off to make stack collection optional
On Sun, 21 Mar 2021 21:25:01 + Sergei Trofimovich wrote:

> On some architectures (like ia64) stack walking is slow
> and currently requires memory allocation. This causes stack
> collection for page_owner=on to fall into recursion.
> 
> This patch implements a page_owner_stack=off to allow page stats
> collection.

A more user-friendly alternative would be to have a GFP_ flag similar
to __GFP_NOLOCKDEP which would allow us to skip the recursion.
I'll prepare an alternative patch.

> Signed-off-by: Sergei Trofimovich
> ---
>  .../admin-guide/kernel-parameters.txt | 6 +
>  mm/Kconfig.debug                      | 3 ++-
>  mm/page_owner.c                       | 23 +--
>  3 files changed, 24 insertions(+), 8 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 04545725f187..3e710c4ab4df 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -3518,6 +3518,12 @@
>  			we can turn it on.
>  			on: enable the feature
>  
> +	page_owner_stack=	[KNL] Boot-time parameter option disabling stack
> +			collection of page allocation. Has effect only if
> +			"page_owner=on" is set. Useful for cases when stack
> +			collection is too slow or not feasible.
> +			off: disable the feature
> +
>  	page_poison=	[KNL] Boot-time parameter changing the state of
>  			poisoning on the buddy allocator, available with
>  			CONFIG_PAGE_POISONING=y.
> diff --git a/mm/Kconfig.debug b/mm/Kconfig.debug
> index 1e73717802f8..c1ecaf066c93 100644
> --- a/mm/Kconfig.debug
> +++ b/mm/Kconfig.debug
> @@ -57,7 +57,8 @@ config PAGE_OWNER
>  	  help to find bare alloc_page(s) leaks. Even if you include this
>  	  feature on your build, it is disabled in default. You should pass
>  	  "page_owner=on" to boot parameter in order to enable it. Eats
> -	  a fair amount of memory if enabled. See tools/vm/page_owner_sort.c
> +	  a fair amount of memory if enabled. Call chain tracking can be
> +	  disabled with "page_owner_stack=off". See tools/vm/page_owner_sort.c
>  	  for user-space helper.
>  
>  	  If unsure, say N.
> diff --git a/mm/page_owner.c b/mm/page_owner.c
> index d15c7c4994f5..2cc1113fa28d 100644
> --- a/mm/page_owner.c
> +++ b/mm/page_owner.c
> @@ -31,6 +31,7 @@ struct page_owner {
>  };
>  
>  static bool page_owner_enabled = false;
> +static bool page_owner_stack_enabled = true;
>  DEFINE_STATIC_KEY_FALSE(page_owner_inited);
>  
>  static depot_stack_handle_t dummy_handle;
> @@ -41,21 +42,26 @@ static void init_early_allocated_pages(void);
>  
>  static int __init early_page_owner_param(char *buf)
>  {
> -	if (!buf)
> -		return -EINVAL;
> -
> -	if (strcmp(buf, "on") == 0)
> -		page_owner_enabled = true;
> -
> -	return 0;
> +	return kstrtobool(buf, &page_owner_enabled);
>  }
>  early_param("page_owner", early_page_owner_param);
>  
> +static int __init early_page_owner_stack_param(char *buf)
> +{
> +	return kstrtobool(buf, &page_owner_stack_enabled);
> +}
> +early_param("page_owner_stack", early_page_owner_stack_param);
> +
>  static bool need_page_owner(void)
>  {
>  	return page_owner_enabled;
>  }
>  
> +static bool need_page_owner_stack(void)
> +{
> +	return page_owner_stack_enabled;
> +}
> +
>  static __always_inline depot_stack_handle_t create_dummy_stack(void)
>  {
>  	unsigned long entries[4];
> @@ -122,6 +128,9 @@ static noinline depot_stack_handle_t save_stack(gfp_t flags)
>  	depot_stack_handle_t handle;
>  	unsigned int nr_entries;
>  
> +	if (!need_page_owner_stack())
> +		return failure_handle;
> +
>  	nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 2);
>  
>  	/*
> -- 
> 2.31.0
> 

-- 
Sergei
[PATCH v2] mm: page_alloc: ignore init_on_free=1 for debug_pagealloc=1
On !ARCH_SUPPORTS_DEBUG_PAGEALLOC (like ia64) debug_pagealloc=1 implies page_poison=on: if (page_poisoning_enabled() || (!IS_ENABLED(CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC) && debug_pagealloc_enabled())) static_branch_enable(&_page_poisoning_enabled); page_poison=on needs to init_on_free=1. Before the change id happened too late for the following case: - have PAGE_POISONING=y - have page_poison unset - have !ARCH_SUPPORTS_DEBUG_PAGEALLOC arch (like ia64) - have init_on_free=1 - have debug_pagealloc=1 That way we get both keys enabled: - static_branch_enable(_on_free); - static_branch_enable(&_page_poisoning_enabled); which leads to poisoned pages returned for __GFP_ZERO pages. After the change we execute only: - static_branch_enable(&_page_poisoning_enabled); and ignore init_on_free=1. CC: Vlastimil Babka CC: Andrew Morton CC: linux...@kvack.org CC: David Hildenbrand CC: Andrey Konovalov Link: https://lkml.org/lkml/2021/3/26/443 Signed-off-by: Sergei Trofimovich --- mm/page_alloc.c | 30 +- 1 file changed, 17 insertions(+), 13 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index d57d9b4f7089..10a8a1d28c11 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -764,32 +764,36 @@ static inline void clear_page_guard(struct zone *zone, struct page *page, */ void init_mem_debugging_and_hardening(void) { + bool page_poison_requested = page_poisoning_enabled(); + +#ifdef CONFIG_PAGE_POISONING + /* +* Page poisoning is debug page alloc for some arches. If +* either of those options are enabled, enable poisoning. 
+*/ + if (page_poisoning_enabled() || +(!IS_ENABLED(CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC) && + debug_pagealloc_enabled())) { + static_branch_enable(&_page_poisoning_enabled); + page_poison_requested = true; + } +#endif + if (_init_on_alloc_enabled_early) { - if (page_poisoning_enabled()) + if (page_poison_requested) pr_info("mem auto-init: CONFIG_PAGE_POISONING is on, " "will take precedence over init_on_alloc\n"); else static_branch_enable(&init_on_alloc); } if (_init_on_free_enabled_early) { - if (page_poisoning_enabled()) + if (page_poison_requested) pr_info("mem auto-init: CONFIG_PAGE_POISONING is on, " "will take precedence over init_on_free\n"); else static_branch_enable(&init_on_free); } -#ifdef CONFIG_PAGE_POISONING - /* -* Page poisoning is debug page alloc for some arches. If -* either of those options are enabled, enable poisoning. -*/ - if (page_poisoning_enabled() || -(!IS_ENABLED(CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC) && - debug_pagealloc_enabled())) - static_branch_enable(&_page_poisoning_enabled); -#endif - #ifdef CONFIG_DEBUG_PAGEALLOC if (!debug_pagealloc_enabled()) return; -- 2.31.0
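To make the ordering problem concrete, here is a userspace C model of init_mem_debugging_and_hardening() before and after the v2 patch. It is a sketch under stated assumptions: the names `cfg`, `keys`, `setup_before()` and `setup_after()` are hypothetical, static keys are modelled as plain bools, and only the init_on_free / page poisoning interaction is covered (init_on_alloc and the CONFIG_DEBUG_PAGEALLOC setup are left out).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model, not kernel code: inputs stand in for
 * Kconfig options and boot parameters, outputs for two static keys. */
struct cfg {
	bool page_poisoning;     /* CONFIG_PAGE_POISONING=y */
	bool arch_debug_pgalloc; /* CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y */
	bool page_poison_on;     /* page_poison=on */
	bool debug_pagealloc;    /* debug_pagealloc=1 */
	bool init_on_free;       /* init_on_free=1 */
};

struct keys {
	bool poisoning;          /* stands in for _page_poisoning_enabled */
	bool init_on_free;       /* stands in for the init_on_free key */
};

/* Does the CONFIG_PAGE_POISONING block decide to enable poisoning? */
static bool poisoning_wanted(const struct cfg *c)
{
	return c->page_poisoning &&
	       (c->page_poison_on ||
		(!c->arch_debug_pgalloc && c->debug_pagealloc));
}

/* Order before the v2 patch: init_on_free only looks at page_poison=. */
struct keys setup_before(const struct cfg *c)
{
	struct keys k = { .poisoning = false, .init_on_free = false };

	if (c->init_on_free && !c->page_poison_on)
		k.init_on_free = true;
	if (poisoning_wanted(c))
		k.poisoning = true;
	return k;
}

/* Order after the v2 patch: poisoning is settled first and remembered. */
struct keys setup_after(const struct cfg *c)
{
	struct keys k = { .poisoning = false, .init_on_free = false };
	bool poison_requested = c->page_poison_on;

	if (poisoning_wanted(c)) {
		k.poisoning = true;
		poison_requested = true;
	}
	if (c->init_on_free && !poison_requested)
		k.init_on_free = true;

	/* The new order never enables both keys at once. */
	assert(!(k.poisoning && k.init_on_free));
	return k;
}
```

With the ia64 combination from the commit message (PAGE_POISONING=y, page_poison unset, no ARCH_SUPPORTS_DEBUG_PAGEALLOC, debug_pagealloc=1, init_on_free=1), `setup_before()` enables both keys while `setup_after()` enables only poisoning.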
Re: [PATCH] mm: page_alloc: ignore init_on_free=1 for page alloc
On Fri, 26 Mar 2021 17:25:22 + Sergei Trofimovich wrote: > On Fri, 26 Mar 2021 15:17:00 +0100 > Vlastimil Babka wrote: > > > On 3/26/21 12:26 PM, Sergei Trofimovich wrote: > > > init_on_free=1 does not guarantee that free pages contain only zero bytes. > > > > > > Some examples: > > > 1. page_poison=on takes presedence over init_on_alloc=1 / ini_on_free=1 > > > > > > > Yes, and it spits out a message that you enabled both and poisoning takes > > precedence. It was that way even before my changes IIRC, but not > > consistent. > > Yeah. I probably should not have included this case as page_poison=on actually > made my machine boot just fine. My main focus was to understand why I an > seeing > the crash on kernel with init_on_alloc=1 init_on_free=1 and most debugging > options > on. > > My apologies! I'll try to find where this extra poisoning comes from. > > Making a step back and explaining my setup: > > Initially it's an ia64 box that manages to consistently corrupt memory > on socket free; https://lkml.org/lkml/2021/2/23/653 > > To get better understanding where corruption comes from I enabled > A Lot of VM, pagealloc and slab debugging options. Full config: > > > https://dev.gentoo.org/~slyfox/configs/guppy-config-5.12.0-rc4-00016-g427684abc9fd-dirty > > I boot machine as: > > [0.00] Kernel command line: > BOOT_IMAGE=/boot/vmlinuz-5.12.0-rc4-00016-g427684abc9fd-dirty root=/dev/sda3 > ro slab_nomerge memblock=debug debug_pagealloc=1 hardened_usercopy=1 > page_owner=on page_poison=0 init_on_alloc=1 init_on_free=1 > debug_guardpage_minorder=0 > > My boot log: > > > https://dev.gentoo.org/~slyfox/bugs/ia64-boot-bug/2021-03-26-init_on_alloc-fail > > Caveats in reading boot log: > - kernel crashes too early: stack unwinder does not have working > kmalloc() yet > - kernel crashes in MCE handler: normally it should not. It's an > unrelated bug > that makes backtrace useless. I'll try to fix it later, but it will not > be fast. 
> - I added a bunch of printk()s around the crash. > > The important pernel boot failure part is: > [0.00] put_kernel_page: pmd=e001 > [0.00] pmd:(ptrval): > I added WARN_ON_ONCE(1) to __kernel_poison_pages() to get an idea of where the poisoning comes from and got it at: [0.00] [ cut here ] [0.00] WARNING: CPU: 0 PID: 0 at mm/page_poison.c:40 __kernel_poison_pages+0x1a0/0x1c0 [0.00] Modules linked in: [0.00] CPU: 0 PID: 0 Comm: swapper Not tainted 5.12.0-rc4-00016-g427684abc9fd-dirty #196 Call Trace: [0.00] [] show_stack+0x90/0xc0 [0.00] [] dump_stack+0x150/0x1c0 [0.00] [] __warn+0x180/0x220 [0.00] [] warn_slowpath_fmt+0xc0/0x100 [0.00] [] __kernel_poison_pages+0x1a0/0x1c0 [0.00] [] __free_pages_ok+0x2a0/0x10c0 [0.00] [] __free_pages_core+0x2d0/0x480 [0.00] [] memblock_free_pages+0x30/0x50 [0.00] [] memblock_free_all+0x280/0x3c0 [0.00] [] mem_init+0x70/0x2d0 [0.00] [] start_kernel+0x670/0xc20 [0.00] [] start_ap+0x760/0x780 [0.00] ---[ end trace ]--- I think I found where page_poison=on gets enabled, in init_mem_debugging_and_hardening(): void init_mem_debugging_and_hardening(void) { if (_init_on_alloc_enabled_early) { if (page_poisoning_enabled()) pr_info("mem auto-init: CONFIG_PAGE_POISONING is on, " "will take precedence over init_on_alloc\n"); else static_branch_enable(&init_on_alloc); } if (_init_on_free_enabled_early) { if (page_poisoning_enabled()) pr_info("mem auto-init: CONFIG_PAGE_POISONING is on, " "will take precedence over init_on_free\n"); else static_branch_enable(&init_on_free); } #ifdef CONFIG_PAGE_POISONING /* * Page poisoning is debug page alloc for some arches. If * either of those options are enabled, enable poisoning. */ if (page_poisoning_enabled() || (!IS_ENABLED(CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC) && debug_pagealloc_enabled())) static_branch_enable(&_page_poisoning_enabled); // <- HERE #endif ... } If I follow the code correctly, to trigger the problem one needs to: - have PAGE_POISONING=y - have page_poison=off set (or just unset) - have a
Re: [PATCH] hpsa: fix boot on ia64 (atomic_t alignment)
On Wed, 17 Mar 2021 18:28:31 +0100 John Paul Adrian Glaubitz wrote: > Hi Sergei! > > On 3/12/21 11:27 PM, Sergei Trofimovich wrote: > > The failure initially observed as boot failure on rx3600 ia64 machine > > with RAID bus controller: Hewlett-Packard Company Smart Array P600: > > > > kernel unaligned access to 0xe00105dd8b95, ip=0xa00100b87551 > > kernel unaligned access to 0xe00105dd8e95, ip=0xa00100b87551 > > hpsa :14:01.0: Controller reports max supported commands of 0 Using > > 16 instead. Ensure that firmware is up to date. > > swapper/0[1]: error during unaligned kernel access > > > > Here unaligned access comes from 'struct CommandList' that happens > > to be packed. The change f749d8b7a ("scsi: hpsa: Correct dev cmds > > outstanding for retried cmds") introduced unexpected padding and > > un-aligned atomic_t from natural alignment to something else. > > > > This change does not remove packing annotation from struct but only > > restores alignment of atomic variable. > > > > The change is tested on the same rx3600 machine. > > I just gave it a try on my RX2660 and for me, the hpsa driver won't load even > with your patch. > > Can you share your kernel configuration so I can give it a try? Sure! Here is a config from a few days ago: https://dev.gentoo.org/~slyfox/configs/guppy-config-5.12.0-rc4-00016-g427684abc9fd-dirty -- Sergei
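For context on the unaligned access: atomic operations generally require a naturally aligned operand, and a `packed` struct lets the compiler place an int-sized field at any byte offset. The sketch below shows the effect with GCC/Clang attributes (the kernel spells them `__packed` and `__aligned(x)`). The struct and field names are hypothetical; this is not the real layout of hpsa's `struct CommandList`, nor necessarily the exact way the patch restores alignment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layouts, NOT the real hpsa 'struct CommandList'. */
struct packed_bad {
	unsigned char tag;
	int refcount;                              /* offset 1: misaligned */
} __attribute__((packed));

struct packed_fixed {
	unsigned char tag;
	/* An explicit aligned() on a member overrides the packing for
	 * that member only, so it goes back to a 4-byte boundary. */
	int refcount __attribute__((aligned(4)));  /* offset 4 */
} __attribute__((packed));

void check_layouts(void)
{
	assert(offsetof(struct packed_bad, refcount) == 1);
	assert(offsetof(struct packed_fixed, refcount) == 4);
	assert(offsetof(struct packed_fixed, refcount) % _Alignof(int) == 0);
}
```

The rest of a packed struct keeps its tight layout; only the member carrying the explicit alignment moves.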
Re: [PATCH] ia64: tools: add generic errno.h definition
On Fri, Mar 12, 2021 at 07:51:35AM +, Sergei Trofimovich wrote: > Noticed missing header when build bpfilter helper: > > CC [U] net/bpfilter/main.o > In file included from /usr/include/linux/errno.h:1, >from /usr/include/bits/errno.h:26, >from /usr/include/errno.h:28, >from net/bpfilter/main.c:4: > tools/include/uapi/asm/errno.h:13:10: fatal error: > ../../../arch/ia64/include/uapi/asm/errno.h: No such file or directory > 13 | #include "../../../arch/ia64/include/uapi/asm/errno.h" > | ^ > > CC: linux-kernel@vger.kernel.org > CC: net...@vger.kernel.org > CC: b...@vger.kernel.org > Signed-off-by: Sergei Trofimovich Any chance to pick it up? Thanks! > --- > tools/arch/ia64/include/uapi/asm/errno.h | 1 + > 1 file changed, 1 insertion(+) > create mode 100644 tools/arch/ia64/include/uapi/asm/errno.h > > diff --git a/tools/arch/ia64/include/uapi/asm/errno.h > b/tools/arch/ia64/include/uapi/asm/errno.h > new file mode 100644 > index ..4c82b503d92f > --- /dev/null > +++ b/tools/arch/ia64/include/uapi/asm/errno.h > @@ -0,0 +1 @@ > +#include > -- > 2.30.2 > -- Sergei
Re: [PATCH] mm: page_alloc: ignore init_on_free=1 for page alloc
On Fri, 26 Mar 2021 15:17:00 +0100 Vlastimil Babka wrote: > On 3/26/21 12:26 PM, Sergei Trofimovich wrote: > > init_on_free=1 does not guarantee that free pages contain only zero bytes. > > > > Some examples: > > 1. page_poison=on takes presedence over init_on_alloc=1 / ini_on_free=1 > > Yes, and it spits out a message that you enabled both and poisoning takes > precedence. It was that way even before my changes IIRC, but not consistent. Yeah. I probably should not have included this case, as page_poison=on actually made my machine boot just fine. My main focus was to understand why I am seeing the crash on a kernel with init_on_alloc=1 init_on_free=1 and most debugging options on. My apologies! I'll try to find where this extra poisoning comes from. Taking a step back and explaining my setup: Initially it's an ia64 box that manages to consistently corrupt memory on socket free; https://lkml.org/lkml/2021/2/23/653 To get a better understanding of where the corruption comes from I enabled A Lot of VM, pagealloc and slab debugging options. Full config: https://dev.gentoo.org/~slyfox/configs/guppy-config-5.12.0-rc4-00016-g427684abc9fd-dirty I boot the machine as: [0.00] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.12.0-rc4-00016-g427684abc9fd-dirty root=/dev/sda3 ro slab_nomerge memblock=debug debug_pagealloc=1 hardened_usercopy=1 page_owner=on page_poison=0 init_on_alloc=1 init_on_free=1 debug_guardpage_minorder=0 My boot log: https://dev.gentoo.org/~slyfox/bugs/ia64-boot-bug/2021-03-26-init_on_alloc-fail Caveats in reading the boot log: - kernel crashes too early: stack unwinder does not have working kmalloc() yet - kernel crashes in MCE handler: normally it should not. It's an unrelated bug that makes the backtrace useless. I'll try to fix it later, but it will not be fast. - I added a bunch of printk()s around the crash.
The important kernel boot failure part is: [0.00] put_kernel_page: pmd=e001 [0.00] pmd:(ptrval): Note 1: I do not really enable page_poison at runtime and was misleading you in previous emails. (I initially assumed kernel_poison_pages() poisons pages unconditionally, but you all explained it does not.) Something else manages to poison my pmd(s?). Note 2: I have many other debugging options enabled that might trigger poisoning. > > 2. free_pages_prepare() always poisons pages: > > > >if (want_init_on_free()) > >kernel_init_free_pages(page, 1 << order); > >kernel_poison_pages(page, 1 << order > > kernel_poison_pages() includes a test if poisoning is enabled. And in that > case > want_init_on_free() shouldn't be. see init_mem_debugging_and_hardening() I completely missed that! Thank you! I will try to trace the real cause of the poisoning. > > I observed use of poisoned pages as the crash on ia64 booted with > > init_on_free=1 init_on_alloc=1 (CONFIG_PAGE_POISONING=y config). > > There pmd page contained 0x poison pages and led to early crash. > > Hm but that looks lika a sign that ia64 pmd allocation should use __GFP_ZERO > and > doesn't. It shouldn't rely on init_on_alloc or init_on_free being enabled. ia64 does use __GFP_ZERO (I even tried to add it manually to pmd_alloc_one() before I realized all _PGTABLEs imply __GFP_ZERO).
I'll provide the call chain I arrived at for completeness: - [ia64 boots] - mem_init() (defined at arch/ia64/mm/init.c) -> setup_gate() (defined at arch/ia64/mm/init.c) -> put_kernel_page() (defined at arch/ia64/mm/init.c) -> [NOTE: from now on it's all generic code, not ia64-specific] -> pmd_alloc() (defined at include/linux/mm.h) -> __pmd_alloc() (defined at mm/memory.c) -> [under #ifndef __PAGETABLE_PMD_FOLDED] pmd_alloc_one() (defined at include/asm-generic/pgalloc.h) -> pmd_alloc_one() (defined at include/asm-generic/pgalloc.h): static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr) { struct page *page; gfp_t gfp = GFP_PGTABLE_USER; if (mm == &init_mm) gfp = GFP_PGTABLE_KERNEL; page = alloc_pages(gfp, 0); if (!page) return NULL; if (!pgtable_pmd_page_ctor(page)) { __free_pages(page, 0); return NULL; } return (pmd_t *)page_address(page); } In our case it is a GFP_PGTABLE_KERNEL allocation with __GFP_ZERO, and the result is a poisoned page instead of a zeroed page. If I interpret the above correctly, it means that something (probably memalloc_free_pages() ?) leaves the initial free pages poisoned, and later alloc_pages() assumes they are memset()-zero. But I don't see why. > > The change drops the ass
Re: [PATCH] mm: page_alloc: ignore init_on_free=1 for page alloc
On Fri, 26 Mar 2021 16:00:34 +0100 Andrey Konovalov wrote: > On Fri, Mar 26, 2021 at 2:49 PM David Hildenbrand wrote: > > > > > I observed use of poisoned pages as the crash on ia64 booted with > > > init_on_free=1 init_on_alloc=1 (CONFIG_PAGE_POISONING=y config). > > > There pmd page contained 0x poison pages and led to early crash. > > > > > > The change drops the assumption that init_on_free=1 guarantees free > > > pages to contain zeros. > > > > > > Alternative would be to make interaction between runtime poisoning and > > > sanitizing options and build-time debug flags like CONFIG_PAGE_POISONING > > > more coherent. I took the simpler path. > > > > > > > I thought latest work be Vlastimil tried to tackle that. To me, it feels > > like page_poison=on and init_on_free=1 should bail out and disable one > > of both things. Having both at the same time doesn't sound helpful. > > This is exactly how it works, see init_mem_debugging_and_hardening(). > > Sergei, could you elaborate more on what kind of crash this patch is > trying to fix? Where does it happen and why? Yeah, I see I misinterpreted page_poison=on handling and misled you all. Something else poisons a page when it should have not. I'll answer in more detail to Vlastimil's email upthread and will provide more detail of the unexpected poisoning I see. -- Sergei
[PATCH] mm: page_alloc: ignore init_on_free=1 for page alloc
init_on_free=1 does not guarantee that free pages contain only zero bytes. Some examples: 1. page_poison=on takes precedence over init_on_alloc=1 / init_on_free=1 2. free_pages_prepare() always poisons pages: if (want_init_on_free()) kernel_init_free_pages(page, 1 << order); kernel_poison_pages(page, 1 << order); I observed the use of poisoned pages as a crash on ia64 booted with init_on_free=1 init_on_alloc=1 (CONFIG_PAGE_POISONING=y config). There the pmd page contained 0x poison values, which led to an early crash. The change drops the assumption that init_on_free=1 guarantees that free pages contain zeros. An alternative would be to make the interaction between runtime poisoning and sanitizing options and build-time debug flags like CONFIG_PAGE_POISONING more coherent. I took the simpler path. Tested the fix on rx3600. CC: Andrew Morton CC: linux...@kvack.org Signed-off-by: Sergei Trofimovich --- mm/page_alloc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index cfc72873961d..d57d9b4f7089 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2301,7 +2301,7 @@ inline void post_alloc_hook(struct page *page, unsigned int order, kernel_unpoison_pages(page, 1 << order); set_page_owner(page, order, gfp_flags); - if (!want_init_on_free() && want_init_on_alloc(gfp_flags)) + if (want_init_on_alloc(gfp_flags)) kernel_init_free_pages(page, 1 << order); } -- 2.31.0
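A toy model of why a __GFP_ZERO allocation can come back poisoned when the init_on_free and poisoning keys are both active (the combination the v2 patch in this thread traces to debug_pagealloc=1 on ia64). Assumptions: a 64-byte array stands in for a page, 0xaa is used as the poison byte (the value of the kernel's PAGE_POISON), and `model_free()` / `model_alloc()` are hypothetical helpers, not kernel functions. `old_check` toggles the condition this patch removes from post_alloc_hook().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 64   /* toy size; real pages are PAGE_SIZE bytes */
#define MODEL_POISON 0xaa    /* same byte value as the kernel's PAGE_POISON */

struct model_keys {
	bool init_on_free;
	bool init_on_alloc;
	bool poisoning;
};

/* Free side, in the order quoted from free_pages_prepare(). */
void model_free(unsigned char *page, const struct model_keys *k)
{
	assert(page != NULL);
	if (k->init_on_free)
		memset(page, 0, MODEL_PAGE_SIZE);
	if (k->poisoning)        /* kernel_poison_pages() checks its own key */
		memset(page, MODEL_POISON, MODEL_PAGE_SIZE);
}

/* Alloc side. want_init_on_alloc() is true for __GFP_ZERO or for
 * init_on_alloc=1; 'old_check' selects the pre-patch condition. */
void model_alloc(unsigned char *page, const struct model_keys *k,
		 bool gfp_zero, bool old_check)
{
	bool want_init_on_alloc = k->init_on_alloc || gfp_zero;
	bool clear = old_check ? (!k->init_on_free && want_init_on_alloc)
			       : want_init_on_alloc;

	assert(page != NULL);
	if (clear)
		memset(page, 0, MODEL_PAGE_SIZE);
}

bool model_all_zero(const unsigned char *page)
{
	for (size_t i = 0; i < MODEL_PAGE_SIZE; i++)
		if (page[i] != 0)
			return false;
	return true;
}
```

With `old_check = true` the allocation trusts init_on_free, skips clearing, and hands out a page full of poison bytes; with the patched condition the page is cleared on allocation.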
[PATCH] ia64: simplify code flow around swiotlb init
Before the change, a CONFIG_INTEL_IOMMU && !CONFIG_SWIOTLB && !CONFIG_FLATMEM build could skip `set_max_mapnr(max_low_pfn);` when an IOMMU is detected on the system: the dangling `if (!iommu_detected)` ends up guarding that call. CC: Andrew Morton CC: linux-i...@vger.kernel.org Signed-off-by: Sergei Trofimovich --- arch/ia64/mm/init.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c index 16d0d7d22657..a63585db94fe 100644 --- a/arch/ia64/mm/init.c +++ b/arch/ia64/mm/init.c @@ -644,13 +644,16 @@ mem_init (void) * _before_ any drivers that may need the PCI DMA interface are * initialized or bootmem has been freed. */ + do { #ifdef CONFIG_INTEL_IOMMU - detect_intel_iommu(); - if (!iommu_detected) + detect_intel_iommu(); + if (iommu_detected) + break; #endif #ifdef CONFIG_SWIOTLB swiotlb_init(1); #endif + } while (0); #ifdef CONFIG_FLATMEM BUG_ON(!mem_map); -- 2.31.0
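The hazard is a dangling `if` whose body is chosen by the preprocessor. The sketch below reproduces the before/after shape of mem_init() in plain C for the configuration named in the commit message (CONFIG_INTEL_IOMMU=y, CONFIG_SWIOTLB and CONFIG_FLATMEM unset). `fake_swiotlb_init()`, `fake_set_max_mapnr()` and the counters are hypothetical stand-ins for the real calls, and detect_intel_iommu() is replaced by a function argument.

```c
#include <assert.h>
#include <stdbool.h>

/* Configuration from the commit message. */
#define CONFIG_INTEL_IOMMU 1
/* CONFIG_SWIOTLB and CONFIG_FLATMEM intentionally left undefined. */

int swiotlb_init_calls;      /* hypothetical stand-ins for the real calls */
int set_max_mapnr_calls;

void fake_swiotlb_init(void) { swiotlb_init_calls++; }
void fake_set_max_mapnr(void) { set_max_mapnr_calls++; }

/* Shape of mem_init() before the patch. */
static void old_flow(bool iommu_detected)
{
#ifdef CONFIG_INTEL_IOMMU
	if (!iommu_detected)
#endif
#ifdef CONFIG_SWIOTLB
		fake_swiotlb_init();
#endif
#ifdef CONFIG_FLATMEM
	/* BUG_ON(!mem_map); */
#endif
	fake_set_max_mapnr();    /* silently becomes the body of the 'if' */
}

/* Shape after the patch: 'break' only leaves the do { } while (0). */
static void new_flow(bool iommu_detected)
{
	do {
#ifdef CONFIG_INTEL_IOMMU
		if (iommu_detected)
			break;
#endif
#ifdef CONFIG_SWIOTLB
		fake_swiotlb_init();
#endif
	} while (0);
#ifdef CONFIG_FLATMEM
	/* BUG_ON(!mem_map); */
#endif
	fake_set_max_mapnr();
}

void check_flows(void)
{
	set_max_mapnr_calls = 0;
	old_flow(true);          /* IOMMU found: the call is skipped */
	assert(set_max_mapnr_calls == 0);
	new_flow(true);          /* reached regardless of the IOMMU */
	assert(set_max_mapnr_calls == 1);
}
```

After preprocessing, the old shape reduces to `if (!iommu_detected) fake_set_max_mapnr();`. The `do { ... } while (0)` form keeps the `break` local to the IOMMU/SWIOTLB decision, so whatever follows is unconditional in every configuration.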
[PATCH] ia64: drop unused IA64_FW_EMU ifdef
It's a remnant of deleted hpsim emulation target removed in fc5bad037 ("ia64: remove the hpsim platform"). CC: Andrew Morton CC: linux-i...@vger.kernel.org Signed-off-by: Sergei Trofimovich --- arch/ia64/kernel/head.S | 5 - 1 file changed, 5 deletions(-) diff --git a/arch/ia64/kernel/head.S b/arch/ia64/kernel/head.S index 646a22c25edf..9cd0a2cce36e 100644 --- a/arch/ia64/kernel/head.S +++ b/arch/ia64/kernel/head.S @@ -405,11 +405,6 @@ start_ap: // This is executed by the bootstrap processor (bsp) only: -#ifdef CONFIG_IA64_FW_EMU - // initialize PAL & SAL emulator: - br.call.sptk.many rp=sys_fw_init -.ret1: -#endif br.call.sptk.many rp=start_kernel .ret2: addl r3=@ltoff(halt_msg),gp ;; -- 2.31.0
Re: [PATCH] ia64: mca: allocate early mca with GFP_ATOMIC
On Tue, 23 Mar 2021 16:15:06 +0100 John Paul Adrian Glaubitz wrote: > Hi Andrew! > > On 3/15/21 9:50 AM, Sergei Trofimovich wrote: > > The sleep warning happens at early boot right at > > secondary CPU activation bootup: > > > > smp: Bringing up secondary CPUs ... > > BUG: sleeping function called from invalid context at > > mm/page_alloc.c:4942 > > in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 0, name: > > swapper/1 > > CPU: 1 PID: 0 Comm: swapper/1 Not tainted > > 5.12.0-rc2-7-g79e228d0b611-dirty #99 > > > > Call Trace: > > [] show_stack+0x90/0xc0 > > [] dump_stack+0x150/0x1c0 > > [] ___might_sleep+0x1c0/0x2a0 > > [] __might_sleep+0xa0/0x160 > > [] __alloc_pages_nodemask+0x1a0/0x600 > > [] alloc_page_interleave+0x30/0x1c0 > > [] alloc_pages_current+0x2c0/0x340 > > [] __get_free_pages+0x30/0xa0 > > [] ia64_mca_cpu_init+0x2d0/0x3a0 > > [] cpu_init+0x8b0/0x1440 > > [] start_secondary+0x60/0x700 > > [] start_ap+0x750/0x780 > > Fixed BSP b0 value from CPU 1 > > > > As I understand interrupts are not enabled yet and system has a lot > > of memory. There is little chance to sleep and switch to GFP_ATOMIC > > should be a no-op. > > > > CC: Andrew Morton > > CC: linux-i...@vger.kernel.org > > Signed-off-by: Sergei Trofimovich > > --- > > arch/ia64/kernel/mca.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/arch/ia64/kernel/mca.c b/arch/ia64/kernel/mca.c > > index d4cae2fc69ca..adf6521525f4 100644 > > --- a/arch/ia64/kernel/mca.c > > +++ b/arch/ia64/kernel/mca.c > > @@ -1824,7 +1824,7 @@ ia64_mca_cpu_init(void *cpu_data) > > data = mca_bootmem(); > > first_time = 0; > > } else > > - data = (void *)__get_free_pages(GFP_KERNEL, > > + data = (void *)__get_free_pages(GFP_ATOMIC, > > get_order(sz)); > > if (!data) > > panic("Could not allocate MCA memory for cpu %d\n", > > > > Has this one been picked up for your tree already? 
Should be there: https://www.ozlabs.org/~akpm/mmotm/series > #NEXT_PATCHES_START mainline-later (next week, approximately) > ia64-mca-allocate-early-mca-with-gfp_atomic.patch -- Sergei
[PATCH] mm: add page_owner_stack=off to make stack collection optional
On some architectures (like ia64) stack walking is slow and currently requires memory allocation. This causes stack collection for page_owner=on to fall into recursion. This patch implements a page_owner_stack=off option that keeps page_owner statistics collection but skips stack collection. Signed-off-by: Sergei Trofimovich --- .../admin-guide/kernel-parameters.txt | 6 + mm/Kconfig.debug | 3 ++- mm/page_owner.c | 23 +-- 3 files changed, 24 insertions(+), 8 deletions(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 04545725f187..3e710c4ab4df 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -3518,6 +3518,12 @@ we can turn it on. on: enable the feature + page_owner_stack= [KNL] Boot-time parameter option disabling stack + collection of page allocation. Has effect only if + "page_owner=on" is set. Useful for cases when stack + collection is too slow or not feasible. + off: disable the feature + page_poison=[KNL] Boot-time parameter changing the state of poisoning on the buddy allocator, available with CONFIG_PAGE_POISONING=y. diff --git a/mm/Kconfig.debug b/mm/Kconfig.debug index 1e73717802f8..c1ecaf066c93 100644 --- a/mm/Kconfig.debug +++ b/mm/Kconfig.debug @@ -57,7 +57,8 @@ config PAGE_OWNER help to find bare alloc_page(s) leaks. Even if you include this feature on your build, it is disabled in default. You should pass "page_owner=on" to boot parameter in order to enable it. Eats - a fair amount of memory if enabled. See tools/vm/page_owner_sort.c + a fair amount of memory if enabled. Call chain tracking can be + disabled with "page_owner_stack=off". See tools/vm/page_owner_sort.c for user-space helper. If unsure, say N.
diff --git a/mm/page_owner.c b/mm/page_owner.c index d15c7c4994f5..2cc1113fa28d 100644 --- a/mm/page_owner.c +++ b/mm/page_owner.c @@ -31,6 +31,7 @@ struct page_owner { }; static bool page_owner_enabled = false; +static bool page_owner_stack_enabled = true; DEFINE_STATIC_KEY_FALSE(page_owner_inited); static depot_stack_handle_t dummy_handle; @@ -41,21 +42,26 @@ static void init_early_allocated_pages(void); static int __init early_page_owner_param(char *buf) { - if (!buf) - return -EINVAL; - - if (strcmp(buf, "on") == 0) - page_owner_enabled = true; - - return 0; + return kstrtobool(buf, &page_owner_enabled); } early_param("page_owner", early_page_owner_param); +static int __init early_page_owner_stack_param(char *buf) +{ + return kstrtobool(buf, &page_owner_stack_enabled); +} +early_param("page_owner_stack", early_page_owner_stack_param); + static bool need_page_owner(void) { return page_owner_enabled; } +static bool need_page_owner_stack(void) +{ + return page_owner_stack_enabled; +} + static __always_inline depot_stack_handle_t create_dummy_stack(void) { unsigned long entries[4]; @@ -122,6 +128,9 @@ static noinline depot_stack_handle_t save_stack(gfp_t flags) depot_stack_handle_t handle; unsigned int nr_entries; + if (!need_page_owner_stack()) + return failure_handle; + nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 2); /* -- 2.31.0
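Switching to kstrtobool() also changes which values page_owner= accepts. A userspace restatement of kstrtobool()'s matching rules (first character y/Y/1 or n/N/0, or o/O followed by n/N or f/F), placed next to the old exact-match parser, sketches the difference. `model_kstrtobool()` and `model_old_page_owner_param()` are hypothetical names; the kernel's own kstrtobool() lives in lib/kstrtox.c.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Userspace restatement of the kernel's kstrtobool() matching rules. */
int model_kstrtobool(const char *s, bool *res)
{
	if (!s)
		return -EINVAL;

	switch (s[0]) {
	case 'y': case 'Y': case '1':
		*res = true;
		return 0;
	case 'n': case 'N': case '0':
		*res = false;
		return 0;
	case 'o': case 'O':
		if (s[1] == 'n' || s[1] == 'N') {
			*res = true;
			return 0;
		}
		if (s[1] == 'f' || s[1] == 'F') {
			*res = false;
			return 0;
		}
		break;
	default:
		break;
	}
	return -EINVAL;
}

/* The parser the patch removes: only the exact string "on" had an effect. */
int model_old_page_owner_param(const char *buf, bool *enabled)
{
	if (!buf)
		return -EINVAL;
	if (strcmp(buf, "on") == 0)
		*enabled = true;
	return 0;
}

void check_params(void)
{
	bool old_on = false, new_on = false, stack_on = true;
	int rc;

	rc = model_old_page_owner_param("1", &old_on);   /* ignored before */
	assert(rc == 0 && !old_on);
	rc = model_kstrtobool("1", &new_on);             /* accepted now */
	assert(rc == 0 && new_on);
	rc = model_kstrtobool("off", &stack_on);         /* page_owner_stack=off */
	assert(rc == 0 && !stack_on);
}
```

Under these rules `page_owner=1`, `page_owner=y` and `page_owner=ON` enable the feature, and an unrecognised value such as `page_owner=maybe` makes the handler return -EINVAL instead of being silently ignored.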
Re: [PATCH] ia64: Ensure proper NUMA distance and possible map initialization
On Fri, 19 Mar 2021 15:47:09 +0100 John Paul Adrian Glaubitz wrote: > Hi Valentin! > > On 3/18/21 2:06 PM, Valentin Schneider wrote: > > John Paul reported a warning about bogus NUMA distance values spurred by > > commit: > > > > 620a6dc40754 ("sched/topology: Make sched_init_numa() use a set for the > > deduplicating sort") > > > > In this case, the afflicted machine comes up with a reported 256 possible > > nodes, all of which are 0 distance away from one another. This was > > previously silently ignored, but is now caught by the aforementioned > > commit. > > > > The culprit is ia64's node_possible_map which remains unchanged from its > > initialization value of NODE_MASK_ALL. In John's case, the machine doesn't > > have any SRAT nor SLIT table, but AIUI the possible map remains untouched > > regardless of what ACPI tables end up being parsed. Thus, !online && > > possible nodes remain with a bogus distance of 0 (distances \in [0, 9] are > > "reserved and have no meaning" as per the ACPI spec). > > > > Follow x86 / drivers/base/arch_numa's example and set the possible map to > > the parsed map, which in this case seems to be the online map. > > > > Link: > > http://lore.kernel.org/r/255d6b5d-194e-eb0e-ecdd-97477a534...@physik.fu-berlin.de > > Fixes: 620a6dc40754 ("sched/topology: Make sched_init_numa() use a set for > > the deduplicating sort") > > Reported-by: John Paul Adrian Glaubitz > > Signed-off-by: Valentin Schneider > > --- > > This might need an earlier Fixes: tag, but all of this is quite old and > > dusty (the git blame rabbit hole leads me to ~2008/2007) > > > > Alternatively, can we deprecate ia64 already? 
> > --- > > arch/ia64/kernel/acpi.c | 7 +-- > > 1 file changed, 5 insertions(+), 2 deletions(-) > > > > diff --git a/arch/ia64/kernel/acpi.c b/arch/ia64/kernel/acpi.c > > index a5636524af76..e2af6b172200 100644 > > --- a/arch/ia64/kernel/acpi.c > > +++ b/arch/ia64/kernel/acpi.c > > @@ -446,7 +446,8 @@ void __init acpi_numa_fixup(void) > > if (srat_num_cpus == 0) { > > node_set_online(0); > > node_cpuid[0].phys_id = hard_smp_processor_id(); > > - return; > > + slit_distance(0, 0) = LOCAL_DISTANCE; > > + goto out; > > } > > > > /* > > @@ -489,7 +490,7 @@ void __init acpi_numa_fixup(void) > > for (j = 0; j < MAX_NUMNODES; j++) > > slit_distance(i, j) = i == j ? > > LOCAL_DISTANCE : REMOTE_DISTANCE; > > - return; > > + goto out; > > } > > > > memset(numa_slit, -1, sizeof(numa_slit)); > > @@ -514,6 +515,8 @@ void __init acpi_numa_fixup(void) > > printk("\n"); > > } > > #endif > > +out: > > + node_possible_map = node_online_map; > > } > > #endif /* CONFIG_ACPI_NUMA */ > > > > > > Tested-by: John Paul Adrian Glaubitz > > Could you send this patch through Andrew Morton's tree? The ia64 port > currently > has no maintainer, so we have to use an alternative tree. > > @Sergei: Could you test/ack this patch as well? Booted successfully without problems on rx3600. Tested-by: Sergei Trofimovich -- Sergei
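Why an untouched NODE_MASK_ALL matters: the generic code derives nr_node_ids from the highest bit set in node_possible_map, so code that walks node distances (such as sched_init_numa()) considers 256 nodes even when only node 0 is online, and all but the (0, 0) entry hold the reserved distance 0. The sketch below models the node mask as a plain bitmap. `model_nodemask`, `mask_fill()`, `mask_only()` and `model_nr_node_ids()` are hypothetical helpers, and MAX_NUMNODES = 256 is taken from the NODES_SHIFT == 8 case discussed in this thread.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_MAX_NUMNODES 256                 /* NODES_SHIFT == 8 */
#define MODEL_WORDS (MODEL_MAX_NUMNODES / 64)

struct model_nodemask {
	uint64_t bits[MODEL_WORDS];
};

void mask_fill(struct model_nodemask *m)        /* like NODE_MASK_ALL */
{
	for (int w = 0; w < MODEL_WORDS; w++)
		m->bits[w] = ~UINT64_C(0);
}

void mask_only(struct model_nodemask *m, unsigned int node)
{
	for (int w = 0; w < MODEL_WORDS; w++)
		m->bits[w] = 0;
	m->bits[node / 64] |= UINT64_C(1) << (node % 64);
}

/* Highest set node + 1, the way nr_node_ids is derived from the
 * possible map (assumes a non-empty mask). */
unsigned int model_nr_node_ids(const struct model_nodemask *possible)
{
	for (int w = MODEL_WORDS - 1; w >= 0; w--)
		for (int bit = 63; bit >= 0; bit--)
			if (possible->bits[w] & (UINT64_C(1) << bit))
				return (unsigned int)(w * 64 + bit + 1);
	return 0;
}

void check_node_ids(void)
{
	struct model_nodemask possible, online;

	mask_fill(&possible);      /* ia64 default before the fix */
	mask_only(&online, 0);     /* only node 0 came up */
	assert(model_nr_node_ids(&possible) == 256);

	possible = online;         /* node_possible_map = node_online_map */
	assert(model_nr_node_ids(&possible) == 1);
}
```

In this model, the `node_possible_map = node_online_map` assignment that the patch adds at the end of acpi_numa_fixup() shrinks nr_node_ids from 256 to 1.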
Re: [PATCH 0/1] sched/topology: NUMA distance deduplication
On Wed, 17 Mar 2021 20:04:07 + Valentin Schneider wrote: > On 17/03/21 20:47, John Paul Adrian Glaubitz wrote: > > Helo Valentin! > > > > On 3/17/21 8:36 PM, Valentin Schneider wrote: > >> I see ACPI in your boot logs, so I'm guessing you have a bogus SLIT table > >> (the ACPI table with node distances). You should be able to double check > >> this with something like: > >> > >> $ acpidump > acpi.dump > >> $ acpixtract -a acpi.dump > >> $ iasl -d *.dat > >> $ cat slit.dsl > > > > There does not seem to be a SLIT table in my firmware: > > > > root@glendronach:~# acpidump > acpi.dump > > root@glendronach:~# acpixtract -a acpi.dump > > > > Intel ACPI Component Architecture > > ACPI Binary Table Extraction Utility version 20200925 > > Copyright (c) 2000 - 2020 Intel Corporation > > > > acpixtract(31194): unaligned access to 0x6f9b3925, > > ip=0x40003e91 > > SSDT -3768 bytes written (0x0EB8) - ssdt1.dat > > acpixtract(31194): unaligned access to 0x6f9b3925, > > ip=0x40003e00 > > acpixtract(31194): unaligned access to 0x6f9b3925, > > ip=0x40003e91 > > SPCR - 80 bytes written (0x0050) - spcr.dat > > acpixtract(31194): unaligned access to 0x6f9b3925, > > ip=0x40003e00 > > acpixtract(31194): unaligned access to 0x6f9b3925, > > ip=0x40003e91 > > APIC - 200 bytes written (0x00C8) - apic.dat > > SSDT -1110 bytes written (0x0456) - ssdt2.dat > > SSDT - 316 bytes written (0x013C) - ssdt3.dat > > SPMI - 80 bytes written (0x0050) - spmi.dat > > DSDT - 58726 bytes written (0xE566) - dsdt.dat > > SSDT - 312 bytes written (0x0138) - ssdt4.dat > > SSDT -2150 bytes written (0x0866) - ssdt5.dat > > SSDT - 316 bytes written (0x013C) - ssdt6.dat > > SSDT -3768 bytes written (0x0EB8) - ssdt7.dat > > FACP - 244 bytes written (0x00F4) - facp.dat > > SSDT -1203 bytes written (0x04B3) - ssdt8.dat > > CPEP - 52 bytes written (0x0034) - cpep.dat > > SSDT - 316 bytes written (0x013C) - ssdt9.dat > > DBGP - 52 bytes written (0x0034) - dbgp.dat > > SSDT -3768 bytes written (0x0EB8) - ssdt10.dat > > 
FACS - 64 bytes written (0x0040) - facs.dat > > root@glendronach:~# > > > > root@glendronach:~# ls *.dsl *.dat > > apic.dat cpep.dsl dsdt.dat facp.dsl spcr.dat spmi.dslssdt1.dat > > ssdt2.dsl ssdt4.dat ssdt5.dsl ssdt7.dat ssdt8.dsl > > apic.dsl dbgp.dat dsdt.dsl facs.dat spcr.dsl ssdt10.dat ssdt1.dsl > > ssdt3.dat ssdt4.dsl ssdt6.dat ssdt7.dsl ssdt9.dat > > cpep.dat dbgp.dsl facp.dat facs.dsl spmi.dat ssdt10.dsl ssdt2.dat > > ssdt3.dsl ssdt5.dat ssdt6.dsl ssdt8.dat ssdt9.dsl > > root@glendronach:~# > > > > Huh, then this might be some initialization fail that leaves nr_node_ids to > MAX_NUMNODES, which must be 256 in your case (NODES_SHIFT==8). Devicetree > can provide node distances, but something tells me you're not using that :-) > > >> a) Complain to your hardware vendor to have them fix the table and ship a > >>firmware fix > > > > The hardware is probably too old for the vendor to care about fixing it. > > > > Indeed, I only realized that after googling your machine > > >> b) Fix the ACPI table yourself - I've been told it's doable for *some* of > >>them, but I've never done that myself > >> c) Compile your kernel with CONFIG_NUMA=n, as AFAICT you only actually have > >>a single node > >> d) Ignore the warning > >> > >> > >> c) is clearly not ideal if you want to use a somewhat generic kernel image > >> on a wide host of machines; d) is also a bit yucky... > > > > Shouldn't the kernel be able to cope with quirky hardware? From what I > > remember in the past, > > ACPI tables used to be broken quite a lot and the kernel contained > > workarounds for such cases, > > didn't it? > > > > Technically it *is* coping with it, it's just dumping the entire NUMA > distance matrix in the process... Let me see if I can't figure out why your > system doesn't end up with nr_node_ids=1. I also poked at it a few days ago assuming it was an issue causing boot failures (it was not, it's a harmless warning). 
Looking at 'arch/ia64/**', NUMA presence is detected by SRAT ACPI tables (and generic ACPI also wants SLIT; those probably exist on large ia64 boxes?). Neither SRAT nor SLIT is present on these small machines, thus I would expect generic code to derive a single fake node. The boot log suggests we even inferred a 1-node system: > [0.04] smp: Brought up 1 node, 4 CPUs Or is it just an early bootstrap message assuming more are to come? Could it be that we initialize too little of the generic ACPI boilerplate when there is no SRAT? Somewhere around: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/ia64/kernel/acpi.c#n446 arm64 and riscv call `arch_numa_init()` and initialize numa node
[PATCH] ia64: mca: allocate early mca with GFP_ATOMIC
The sleep warning happens during early boot, right at secondary CPU bring-up: smp: Bringing up secondary CPUs ... BUG: sleeping function called from invalid context at mm/page_alloc.c:4942 in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 0, name: swapper/1 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.12.0-rc2-7-g79e228d0b611-dirty #99 Call Trace: [] show_stack+0x90/0xc0 [] dump_stack+0x150/0x1c0 [] ___might_sleep+0x1c0/0x2a0 [] __might_sleep+0xa0/0x160 [] __alloc_pages_nodemask+0x1a0/0x600 [] alloc_page_interleave+0x30/0x1c0 [] alloc_pages_current+0x2c0/0x340 [] __get_free_pages+0x30/0xa0 [] ia64_mca_cpu_init+0x2d0/0x3a0 [] cpu_init+0x8b0/0x1440 [] start_secondary+0x60/0x700 [] start_ap+0x750/0x780 Fixed BSP b0 value from CPU 1 As I understand it, interrupts are not enabled yet and the system has a lot of memory. There is little chance to sleep, so switching to GFP_ATOMIC should be a no-op. CC: Andrew Morton CC: linux-i...@vger.kernel.org Signed-off-by: Sergei Trofimovich --- arch/ia64/kernel/mca.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/ia64/kernel/mca.c b/arch/ia64/kernel/mca.c index d4cae2fc69ca..adf6521525f4 100644 --- a/arch/ia64/kernel/mca.c +++ b/arch/ia64/kernel/mca.c @@ -1824,7 +1824,7 @@ ia64_mca_cpu_init(void *cpu_data) data = mca_bootmem(); first_time = 0; } else - data = (void *)__get_free_pages(GFP_KERNEL, + data = (void *)__get_free_pages(GFP_ATOMIC, get_order(sz)); if (!data) panic("Could not allocate MCA memory for cpu %d\n", -- 2.30.2
[PATCH] ia64: fix format strings for err_inject
Fix warning with %lx / u64 mismatch:

    arch/ia64/kernel/err_inject.c: In function 'show_resources':
    arch/ia64/kernel/err_inject.c:62:22: warning: format '%lx' expects argument of type 'long unsigned int', but argument 3 has type 'u64' {aka 'long long unsigned int'}
       62 |         return sprintf(buf, "%lx\n", name[cpu]); \
          |                      ^~~

CC: linux-i...@vger.kernel.org
Signed-off-by: Sergei Trofimovich
---
 arch/ia64/kernel/err_inject.c | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/arch/ia64/kernel/err_inject.c b/arch/ia64/kernel/err_inject.c
index 8b5b8e6bc9d9..3d48f8766d78 100644
--- a/arch/ia64/kernel/err_inject.c
+++ b/arch/ia64/kernel/err_inject.c
@@ -59,7 +59,7 @@ show_##name(struct device *dev, struct device_attribute *attr,\
 		char *buf) \
 { \
 	u32 cpu=dev->id;\
-	return sprintf(buf, "%lx\n", name[cpu]);\
+	return sprintf(buf, "%llx\n", name[cpu]); \
 }
 
 #define store(name)\
@@ -86,9 +86,9 @@ store_call_start(struct device *dev, struct device_attribute *attr,
 
 #ifdef ERR_INJ_DEBUG
 	printk(KERN_DEBUG "pal_mc_err_inject for cpu%d:\n", cpu);
-	printk(KERN_DEBUG "err_type_info=%lx,\n", err_type_info[cpu]);
-	printk(KERN_DEBUG "err_struct_info=%lx,\n", err_struct_info[cpu]);
-	printk(KERN_DEBUG "err_data_buffer=%lx, %lx, %lx.\n",
+	printk(KERN_DEBUG "err_type_info=%llx,\n", err_type_info[cpu]);
+	printk(KERN_DEBUG "err_struct_info=%llx,\n", err_struct_info[cpu]);
+	printk(KERN_DEBUG "err_data_buffer=%llx, %llx, %llx.\n",
 			  err_data_buffer[cpu].data1,
 			  err_data_buffer[cpu].data2,
 			  err_data_buffer[cpu].data3);
@@ -117,8 +117,8 @@ store_call_start(struct device *dev, struct device_attribute *attr,
 
 #ifdef ERR_INJ_DEBUG
 	printk(KERN_DEBUG "Returns: status=%d,\n", (int)status[cpu]);
-	printk(KERN_DEBUG "capabilities=%lx,\n", capabilities[cpu]);
-	printk(KERN_DEBUG "resources=%lx\n", resources[cpu]);
+	printk(KERN_DEBUG "capabilities=%llx,\n", capabilities[cpu]);
+	printk(KERN_DEBUG "resources=%llx\n", resources[cpu]);
 #endif
 	return size;
 }
@@ -131,7 +131,7 @@ show_virtual_to_phys(struct device *dev, struct device_attribute *attr,
 			char *buf)
 {
 	unsigned int cpu=dev->id;
-	return sprintf(buf, "%lx\n", phys_addr[cpu]);
+	return sprintf(buf, "%llx\n", phys_addr[cpu]);
 }
 
 static ssize_t
@@ -145,7 +145,7 @@ store_virtual_to_phys(struct device *dev, struct device_attribute *attr,
 	ret = get_user_pages_fast(virt_addr, 1, FOLL_WRITE, NULL);
 	if (ret<=0) {
 #ifdef ERR_INJ_DEBUG
-		printk("Virtual address %lx is not existing.\n",virt_addr);
+		printk("Virtual address %llx is not existing.\n", virt_addr);
 #endif
 		return -EINVAL;
 	}
@@ -163,7 +163,7 @@ show_err_data_buffer(struct device *dev,
 {
 	unsigned int cpu=dev->id;
 
-	return sprintf(buf, "%lx, %lx, %lx\n",
+	return sprintf(buf, "%llx, %llx, %llx\n",
 			err_data_buffer[cpu].data1,
 			err_data_buffer[cpu].data2,
 			err_data_buffer[cpu].data3);
@@ -178,13 +178,13 @@ store_err_data_buffer(struct device *dev,
 	int ret;
 
 #ifdef ERR_INJ_DEBUG
-	printk("write err_data_buffer=[%lx,%lx,%lx] on cpu%d\n",
+	printk("write err_data_buffer=[%llx,%llx,%llx] on cpu%d\n",
 		 err_data_buffer[cpu].data1,
 		 err_data_buffer[cpu].data2,
 		 err_data_buffer[cpu].data3,
 		 cpu);
 #endif
-	ret=sscanf(buf, "%lx, %lx, %lx",
+	ret = sscanf(buf, "%llx, %llx, %llx",
 			_data_buffer[cpu].data1,
 			_data_buffer[cpu].data2,
 			_data_buffer[cpu].data3);
-- 
2.30.2
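The gist of the fix is that the conversion specifier must match the argument's type, not just its width. The compiler warning above reports `u64` as `long long unsigned int`, so `%llx` is the matching specifier, even on LP64 targets like ia64 where `long` is also 64 bits. Below is a minimal user-space sketch. It assumes a local `typedef unsigned long long u64;` that mirrors what the warning reports, and `show_value()` is a hypothetical stand-in for the `show_##name()` helpers, not kernel code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Mirrors the type reported by the warning:
 * 'u64' {aka 'long long unsigned int'}. Local assumption, not a kernel header. */
typedef unsigned long long u64;

/* Hypothetical stand-in for a sysfs show_##name() helper:
 * formats one per-CPU u64 value as hex followed by a newline. */
static int show_value(char *buf, size_t len, const u64 *values, unsigned int cpu)
{
	/* %llx matches unsigned long long exactly, so -Wformat stays quiet
	 * whether long happens to be 32 or 64 bits wide. */
	return snprintf(buf, len, "%llx\n", values[cpu]);
}
```

For example, `show_value(buf, sizeof buf, vals, 1)` with `vals[1] == 0xdeadbeefcafeULL` writes `"deadbeefcafe\n"` and returns 13.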
[PATCH] ia64: fix format string for ia64-acpi-cpu-freq
Fix warning with %lx / s64 mismatch:

      CC [M]  drivers/cpufreq/ia64-acpi-cpufreq.o
    drivers/cpufreq/ia64-acpi-cpufreq.c: In function 'processor_get_pstate':
    warning: format '%lx' expects argument of type 'long unsigned int', but argument 3 has type 's64' {aka 'long long int'} [-Wformat=]

CC: "Rafael J. Wysocki"
CC: Viresh Kumar
CC: linux...@vger.kernel.org
Signed-off-by: Sergei Trofimovich
---
 drivers/cpufreq/ia64-acpi-cpufreq.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/cpufreq/ia64-acpi-cpufreq.c b/drivers/cpufreq/ia64-acpi-cpufreq.c
index 2efe7189ccc4..c6bdc455517f 100644
--- a/drivers/cpufreq/ia64-acpi-cpufreq.c
+++ b/drivers/cpufreq/ia64-acpi-cpufreq.c
@@ -54,7 +54,7 @@ processor_set_pstate (
 	retval = ia64_pal_set_pstate((u64)value);
 
 	if (retval) {
-		pr_debug("Failed to set freq to 0x%x, with error 0x%lx\n",
+		pr_debug("Failed to set freq to 0x%x, with error 0x%llx\n",
 		        value, retval);
 		return -ENODEV;
 	}
@@ -77,7 +77,7 @@ processor_get_pstate (
 
 	if (retval)
 		pr_debug("Failed to get current freq with "
-			"error 0x%lx, idx 0x%x\n", retval, *value);
+			"error 0x%llx, idx 0x%x\n", retval, *value);
 
 	return (int)retval;
 }
-- 
2.30.2
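Here the value is signed (`s64`, reported as `long long int`), and a failing PAL call is likely to return a negative status. The following sketch shows what such a debug line would print. It assumes a local `typedef long long s64;`, and `format_pstate_error()` is a hypothetical helper. Unlike the patch, which passes `retval` straight to `%llx`, the sketch casts to `unsigned long long` so the argument matches the specifier exactly. Either way, a negative status shows up in two's-complement hex.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Mirrors the type reported by the warning:
 * 's64' {aka 'long long int'}. Local assumption, not a kernel header. */
typedef long long s64;

/* Hypothetical helper that formats a status the way the pr_debug()
 * in processor_get_pstate() does. The explicit cast gives %llx an
 * argument of exactly unsigned long long. */
static int format_pstate_error(char *buf, size_t len, s64 retval, unsigned int idx)
{
	return snprintf(buf, len, "error 0x%llx, idx 0x%x",
			(unsigned long long)retval, idx);
}
```

For example, a status of `-1` with index 3 formats as `"error 0xffffffffffffffff, idx 0x3"`.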
[PATCH] hpsa: fix boot on ia64 (atomic_t alignment)
The failure was initially observed as a boot failure on an rx3600 ia64 machine with a "RAID bus controller: Hewlett-Packard Company Smart Array P600":

    kernel unaligned access to 0xe00105dd8b95, ip=0xa00100b87551
    kernel unaligned access to 0xe00105dd8e95, ip=0xa00100b87551
    hpsa :14:01.0: Controller reports max supported commands of 0 Using 16 instead. Ensure that firmware is up to date.
    swapper/0[1]: error during unaligned kernel access

The unaligned access comes from 'struct CommandList', which happens to be packed. Commit f749d8b7a ("scsi: hpsa: Correct dev cmds outstanding for retried cmds") changed the struct layout (an 'int' field became a 'bool') and moved the atomic_t away from its natural alignment.

This change does not remove the packing annotation from the struct. It only restores the alignment of the atomic variable.

The change was tested on the same rx3600 machine.

CC: linux-i...@vger.kernel.org
CC: storage...@microchip.com
CC: linux-s...@vger.kernel.org
CC: Joe Szczypek
CC: Scott Benesh
CC: Scott Teel
CC: Tomas Henzl
CC: "Martin K. Petersen"
CC: Don Brace
Reported-by: John Paul Adrian Glaubitz
Suggested-by: Don Brace
Fixes: f749d8b7a "scsi: hpsa: Correct dev cmds outstanding for retried cmds"
Signed-off-by: Sergei Trofimovich
---
 drivers/scsi/hpsa_cmd.h | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/hpsa_cmd.h b/drivers/scsi/hpsa_cmd.h
index d126bb877250..617bdae9a7de 100644
--- a/drivers/scsi/hpsa_cmd.h
+++ b/drivers/scsi/hpsa_cmd.h
@@ -20,6 +20,9 @@
 #ifndef HPSA_CMD_H
 #define HPSA_CMD_H
 
+#include /* static_assert */
+#include /* offsetof */
+
 /* general boundary defintions */
 #define SENSEINFOBYTES 32 /* may vary between hbas */
 #define SG_ENTRIES_IN_CMD 32 /* Max SG entries excluding chain blocks */
@@ -448,11 +451,20 @@ struct CommandList {
 	 */
 	struct hpsa_scsi_dev_t *phys_disk;
 
-	bool retry_pending;
+	int retry_pending;
 	struct hpsa_scsi_dev_t *device;
 	atomic_t refcount; /* Must be last to avoid memset in hpsa_cmd_init() */
 } __aligned(COMMANDLIST_ALIGNMENT);
 
+/*
+ * Make sure our embedded atomic variable is aligned. Otherwise we break atomic
+ * operations on architectures that don't support unaligned atomics like IA64.
+ *
+ * Ideally this header should be cleaned up to only mark individual structs as
+ * packed.
+ */
+static_assert(offsetof(struct CommandList, refcount) % __alignof__(atomic_t) == 0);
+
 /* Max S/G elements in I/O accelerator command */
 #define IOACCEL1_MAXSGENTRIES 24
 #define IOACCEL2_MAXSGENTRIES 28
-- 
2.30.2
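To see why swapping `int` for `bool` matters inside a packed struct, here is a simplified sketch. The struct names and the leading `cmd_type` field are hypothetical, and `__attribute__((packed))` (a GCC/Clang extension) stands in for the header's packing. It checks the same `offsetof(...) % alignment` condition as the patch's `static_assert()`, with a plain `int` standing in for `atomic_t`.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified tail of a packed command struct
 * after the regression: a 1-byte bool shifts every later field. */
struct tail_bool {
	int cmd_type;		/* hypothetical leading field */
	bool retry_pending;	/* 1 byte */
	void *device;
	int refcount;		/* stands in for atomic_t */
} __attribute__((packed));

/* Same tail with the fix applied: a 4-byte int keeps later fields
 * at naturally aligned offsets. */
struct tail_int {
	int cmd_type;
	int retry_pending;	/* 4 bytes */
	void *device;
	int refcount;
} __attribute__((packed));

/* Compile-time check of the same shape as the patch's static_assert(). */
static_assert(offsetof(struct tail_int, refcount) % alignof(int) == 0,
	      "refcount must be naturally aligned");
```

With 8-byte pointers, `refcount` lands at offset 13 in `tail_bool` (misaligned by 1) and at offset 16 in `tail_int`. Putting the `static_assert()` on the `bool` variant would stop the build instead of failing at runtime.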
[PATCH] ia64: tools: add generic errno.h definition
Noticed a missing header when building the bpfilter helper:

      CC [U]  net/bpfilter/main.o
    In file included from /usr/include/linux/errno.h:1,
                     from /usr/include/bits/errno.h:26,
                     from /usr/include/errno.h:28,
                     from net/bpfilter/main.c:4:
    tools/include/uapi/asm/errno.h:13:10: fatal error: ../../../arch/ia64/include/uapi/asm/errno.h: No such file or directory
       13 | #include "../../../arch/ia64/include/uapi/asm/errno.h"
          |          ^

CC: linux-kernel@vger.kernel.org
CC: net...@vger.kernel.org
CC: b...@vger.kernel.org
Signed-off-by: Sergei Trofimovich
---
 tools/arch/ia64/include/uapi/asm/errno.h | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 tools/arch/ia64/include/uapi/asm/errno.h

diff --git a/tools/arch/ia64/include/uapi/asm/errno.h b/tools/arch/ia64/include/uapi/asm/errno.h
new file mode 100644
index ..4c82b503d92f
--- /dev/null
+++ b/tools/arch/ia64/include/uapi/asm/errno.h
@@ -0,0 +1 @@
+#include
-- 
2.30.2
[PATCH] docs: don't include Documentation/Kconfig twice
Before the change there were two inclusions of Documentation/Kconfig:

    lib/Kconfig.debug:source "Documentation/Kconfig"
    Kconfig: source "Documentation/Kconfig"

Kconfig also included 'source "lib/Kconfig.debug"', so the file was sourced twice. This showed up as duplicate 'make menuconfig' entries: one set in the top-level menu and another in the 'Kernel hacking' menu. The patch keeps the entries only in 'Kernel hacking'.

CC: Mauro Carvalho Chehab
CC: Jonathan Corbet
Signed-off-by: Sergei Trofimovich
---
 Kconfig | 2 --
 1 file changed, 2 deletions(-)

diff --git a/Kconfig b/Kconfig
index 745bc773f567..97ed6389c921 100644
--- a/Kconfig
+++ b/Kconfig
@@ -28,5 +28,3 @@ source "crypto/Kconfig"
 source "lib/Kconfig"
 
 source "lib/Kconfig.debug"
-
-source "Documentation/Kconfig"
-- 
2.30.1
Re: [bisected] 5.12-rc1 hpsa regression: "scsi: hpsa: Correct dev cmds outstanding for retried cmds" breaks hpsa P600
On Wed, 3 Mar 2021 00:22:36 + Sergei Trofimovich wrote:
> On Tue, 2 Mar 2021 23:31:32 +0100
> John Paul Adrian Glaubitz wrote:
> [...]
> "bisected" (cheated halfway through) and verified that reverting
> f749d8b7a9896bc6e5ffe104cc64345037e0b152 makes rx3600 boot again.
> [...]
> Looks like hpsa controller fails to initialize in bad case (could be a race?).

Also CCing the hpsa maintainer mailing lists.

Looking more closely at the suspect commit
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f749d8b7a9896bc6e5ffe104cc64345037e0b152
it roughly does the following:

@@ -448,7 +448,7 @@ struct CommandList {
 	 */
 	struct hpsa_scsi_dev_t *phys_disk;
 
-	int abort_pending;
+	bool retry_pending;
 	struct hpsa_scsi_dev_t *device;
 	atomic_t refcount; /* Must be last to avoid memset in hpsa_cmd_init() */
 } __aligned(COMMANDLIST_ALIGNMENT);
...
@@ -1151,7 +1151,10 @@ static void __enqueue_cmd_and_start_io(struct ctlr_info *h,
 {
 	dial_down_lockup_detection_during_fw_flash(h, c);
 	atomic_inc(>commands_outstanding);
-	if (c->device)
+	/*
+	 * Check to see if the command is being retried.
+	 */
+	if (c->device && !c->retry_pending)
 		atomic_inc(>device->commands_outstanding);

But I don't immediately see anything wrong with it.

-- 
Sergei
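The counting rule described in the suspect commit can be modelled in a few lines. This sketch uses hypothetical types (`dev_model`, `cmd_model`) and C11 atomics in place of the kernel's `atomic_t`. It only illustrates why a retried command must not bump the per-device counter again. Otherwise the counter stays one too high, and a reset that waits for it to reach zero hangs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, minimal model of the per-device counter; not driver code. */
struct dev_model {
	atomic_int commands_outstanding;
};

struct cmd_model {
	struct dev_model *device;
	bool retry_pending;	/* set when the driver resubmits the command */
};

/* Submission path: a driver-initiated retry was already counted on its
 * first submission, so it is not counted again. */
static void enqueue_cmd(struct cmd_model *c)
{
	if (c->device && !c->retry_pending)
		atomic_fetch_add(&c->device->commands_outstanding, 1);
}

/* Completion path (the role hpsa_cmd_resolve_events plays in the commit
 * message): exactly one decrement per command, however many retries. */
static void resolve_cmd(struct cmd_model *c)
{
	if (c->device)
		atomic_fetch_sub(&c->device->commands_outstanding, 1);
}
```

Submitting a command, marking it `retry_pending`, resubmitting it and then resolving it leaves the counter at 0. Without the `retry_pending` check it would end at 1.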
[bisected] 5.12-rc1 hpsa regression: "scsi: hpsa: Correct dev cmds outstanding for retried cmds" breaks hpsa P600
On Tue, 2 Mar 2021 23:31:32 +0100
John Paul Adrian Glaubitz wrote:

> Hi Sergei!
>
> On 3/2/21 11:26 PM, Sergei Trofimovich wrote:
> > Gave v5.12-rc1 a try today and got a similar boot failure around
> > hpsa queue initialization, but my failure is later:
> >     https://dev.gentoo.org/~slyfox/configs/guppy-dmesg-5.12-rc1
> > Maybe I get different error because I flipped on most debugging
> > kernel options :)
> >
> > Looks like 'ERROR: Invalid distance value range' while being
> > very scary are harmless. It's just a new spammy way for kernel
> > to report lack of NUMA config on the machine (no SRAT and SLIT
> > ACPI tables).
> >
> > At least I get hpsa detected on PCI bus. But I guess it's discovered
> > configuration is very wrong as I get unaligned accesses:
> >     [   19.811570] kernel unaligned access to 0xe00105dd8295, ip=0xa00100b874d1
> >
> > Bisecting now.
>
> Sounds good. I guess we should get Jens' fix for the signal regression
> merged as well as your two fixes for strace.

"bisected" (cheated halfway through) and verified that reverting
f749d8b7a9896bc6e5ffe104cc64345037e0b152 makes rx3600 boot again.

CCing authors who might be able to help us here.

    commit f749d8b7a9896bc6e5ffe104cc64345037e0b152
    Author: Don Brace
    Date:   Mon Feb 15 16:26:57 2021 -0600

        scsi: hpsa: Correct dev cmds outstanding for retried cmds

        Prevent incrementing device->commands_outstanding for ioaccel command
        retries that are driver initiated. If the command goes through the retry
        path, the device->commands_outstanding counter has already accounted for
        the number of commands outstanding to the device. Only commands going
        through function hpsa_cmd_resolve_events decrement this counter.

         - ioaccel commands go to either HBA disks or to logical volumes comprised
           of SSDs.

        The extra increment is causing device resets to hang.

         - Resets wait for all device outstanding commands to complete before
           returning.

        Replace unused field abort_pending with retry_pending. This is a
        maintenance driver so these changes have the least impact/risk.

        Link: https://lore.kernel.org/r/161342801747.29388.13045495968308188518.stgit@brunhilda
        Tested-by: Joe Szczypek
        Reviewed-by: Scott Benesh
        Reviewed-by: Scott Teel
        Reviewed-by: Tomas Henzl
        Signed-off-by: Don Brace
        Signed-off-by: Martin K. Petersen

Don, do you happen to know why this patch causes a controller init failure for device

    14:01.0 RAID bus controller: Hewlett-Packard Company Smart Array P600

?

Boot failure: https://dev.gentoo.org/~slyfox/configs/guppy-dmesg-5.12-rc1
Boot success: https://dev.gentoo.org/~slyfox/configs/guppy-dmesg-5.12-rc1-good

The only difference between the two boots is that
f749d8b7a9896bc6e5ffe104cc64345037e0b152 is reverted on top of 5.12-rc1
in the -good case.

It looks like the hpsa controller fails to initialize in the bad case
(could be a race?).

-- 
Sergei
Re: [PATCH] ia64: fix ptrace(PTRACE_SYSCALL_INFO_EXIT) sign
On Sun, 21 Feb 2021 00:25:54 + Sergei Trofimovich wrote:
> In https://bugs.gentoo.org/769614 Dmitry noticed that
> `ptrace(PTRACE_GET_SYSCALL_INFO)` does not return error sign properly.
> [...]

Andrew, would it be fine to pass it through the misc tree? Or should it
go through Oleg, as it's mostly about ptrace?

-- 
Sergei
Re: [PATCH] ia64: fix ia64_syscall_get_set_arguments() for break-based syscalls
On Sun, 21 Feb 2021 00:25:53 + Sergei Trofimovich wrote:
> In https://bugs.gentoo.org/769614 Dmitry noticed that
> `ptrace(PTRACE_GET_SYSCALL_INFO)` does not work for syscalls called
> via glibc's syscall() wrapper.
> [...]

Andrew, would it be fine to pass it through the misc tree? Or should it
go through Oleg, as it's about ptrace?

-- 
Sergei
Re: 5.11 regression: "ia64: add support for TIF_NOTIFY_SIGNAL" breaks ia64 boot
On Tue, 23 Feb 2021 08:08:30 + Sergei Trofimovich wrote: > On Mon, 22 Feb 2021 17:43:58 -0700 > Jens Axboe wrote: > > > On 2/22/21 5:41 PM, Jens Axboe wrote: > > > On 2/22/21 5:34 PM, Jens Axboe wrote: > > >> On 2/22/21 4:53 PM, Sergei Trofimovich wrote: > > >>> On Mon, 22 Feb 2021 16:34:50 -0700 > > >>> Jens Axboe wrote: > > >>> > > >>>> On 2/22/21 4:05 PM, Sergei Trofimovich wrote: > > >>>>> Hia Jens! > > >>>>> > > >>>>> Tried 5.11 on rx3600 box and noticed it has > > >>>>> a problem handling init (5.10 booted fine): > > >>>>> > > >>>>> INIT: version 2.98 booting > > >>>>> > > >>>>>OpenRC 0.42.1 is starting up Gentoo Linux (ia64) > > >>>>> > > >>>>> mkdir `/run/openrc': Read-only file system > > >>>>> mkdir `/run/openrc/starting': No such file or directory > > >>>>> mkdir `/run/openrc/started': No such file or directory > > >>>>> mkdir `/run/openrc/stopping': No such file or directory > > >>>>> mkdir `/run/openrc/inactive': No such file or directory > > >>>>> mkdir `/run/openrc/wasinactive': No such file or directory > > >>>>> mkdir `/run/openrc/failed': No such file or directory > > >>>>> mkdir `/run/openrc/hotplugged': No such file or directory > > >>>>> mkdir `/run/openrc/daemons': No such file or directory > > >>>>> mkdir `/run[ 14.595059] Kernel panic - not syncing: Attempted to > > >>>>> kill init! exitcode=0x000b > > >>>>> [ 14.599059] ---[ end Kernel panic - not syncing: Attempted to kill > > >>>>> init! exitcode=0x000b ]--- > > >>>>> > > >>>>> I suspect we build bad signal stack frame for userspace. 
> > >>>>> > > >>>>> With a bit of #define DEBUG_SIG 1 enabled the signals are SIGCHLD: > > >>>>> > > >>>>> [ 34.969771] SIG deliver (gendepends.sh:69): sig=17 > > >>>>> sp=6f6aeaa0 ip=a0040740 handler=4b4c59b6 > > >>>>> [ 34.969948] SIG deliver (init:1): sig=17 sp=6f1ccc50 > > >>>>> ip=a0040740 handler=4638b9e5 > > >>>>> [ 34.969948] SIG deliver (gendepends.sh:69): sig=17 > > >>>>> sp=6f6adf90 ip=a0040740 handler=4b4c59b6 > > >>>>> [ 34.973948] SIG deliver (init:1): sig=17 sp=6f1cc140 > > >>>>> ip=a0040740 handler=4638b9e5 > > >>>>> [ 34.973948] Kernel panic - not syncing: Attempted to kill init! > > >>>>> exitcode=0x000b > > >>>>> [ 34.973948] SIG deliver (gendepends.sh:69): sig=17 > > >>>>> sp=6f6ad480 ip=a0040740 handler=4b4c59b6 > > >>>>> [ 34.973948] ---[ end Kernel panic - not syncing: Attempted to kill > > >>>>> init! exitcode=0x000b ]--- > > >>>>> > > >>>>> Bisect points at: > > >>>>> > > >>>>> commit b269c229b0e89aedb7943c06673b56b6052cf5e5 > > >>>>> Author: Jens Axboe > > >>>>> Date: Fri Oct 9 14:49:43 2020 -0600 > > >>>>> > > >>>>> ia64: add support for TIF_NOTIFY_SIGNAL > > >>>>> > > >>>>> Wire up TIF_NOTIFY_SIGNAL handling for ia64. > > >>>>> > > >>>>> Cc: linux-i...@vger.kernel.org > > >>>>> [axboe: added fixes from Mike Rapoport ] > > >>>>> Signed-off-by: Jens Axboe > > >>>>> > > >>>>> diff --git a/arch/ia64/include/asm/thread_info.h > > >>>>> b/arch/ia64/include/asm/thread_info.h > > >>>>> index 64a1011f6812..51d20cb37706 100644 > > >>>>> --- a/arch/ia64/include/asm/thread_info.h > > >>>>> +++ b/arch/ia64/include/asm/thread_info.h > > >>>>> @@ -103,6 +103,7 @@ struct thread_info { > > >>>>> #define TIF_SYSCALL_TRACE 2 /* syscall trace active */ > > >>>>> #define TIF_SYSCALL_AUDIT 3 /* syscall auditing active */ > >
5.?? regression: strace testsuite OOpses kernel on ia64
The crash seems to be related to sock_filter-v test from strace: https://github.com/strace/strace/blob/master/tests/seccomp-filter-v.c Here is an OOps: [ 818.089904] BUG: Bad page map in process sock_filter-v pte:0001 pmd:118580001 [ 818.089904] page:e6a429c8 refcount:1 mapcount:-1 mapping: index:0x0 pfn:0x0 [ 818.089904] flags: 0x1000(reserved) [ 818.089904] raw: 1000 a0004008 a0004008 [ 818.089904] raw: 0001fffe [ 818.089904] page dumped because: bad pte [ 818.089904] addr: vm_flags:04044011 anon_vma: mapping: index:0 [ 818.095483] file:(null) fault:0x0 mmap:0x0 readpage:0x0 [ 818.095483] CPU: 0 PID: 5990 Comm: sock_filter-v Not tainted 5.11.0-3-gbfa5a4929c90 #57 [ 818.095483] Hardware name: hp server rx3600 , BIOS 04.03 04/08/2008 [ 818.095483] [ 818.095483] Call Trace: [ 818.095483] [] show_stack+0x90/0xc0 [ 818.095483] sp=e00118707bb0 bsp=e001187013c0 [ 818.095483] [] dump_stack+0x120/0x160 [ 818.095483] sp=e00118707d80 bsp=e00118701348 [ 818.095483] [] print_bad_pte+0x300/0x3a0 [ 818.095483] sp=e00118707d80 bsp=e001187012e0 [ 818.099483] [] unmap_page_range+0xa90/0x11a0 [ 818.099483] sp=e00118707d80 bsp=e00118701140 [ 818.099483] [] unmap_vmas+0xc0/0x100 [ 818.099483] sp=e00118707da0 bsp=e00118701108 [ 818.099483] [] exit_mmap+0x150/0x320 [ 818.099483] sp=e00118707da0 bsp=e001187010d8 [ 818.099483] [] mmput+0x60/0x200 [ 818.099483] sp=e00118707e20 bsp=e001187010b0 [ 818.103482] [] do_exit+0x6f0/0x18a0 [ 818.103482] sp=e00118707e20 bsp=e00118701038 [ 818.103482] [] do_group_exit+0x90/0x2a0 [ 818.103482] sp=e00118707e30 bsp=e00118700ff0 [ 818.103482] [] sys_exit_group+0x20/0x40 [ 818.103482] sp=e00118707e30 bsp=e00118700f98 [ 818.107482] [] ia64_trace_syscall+0xf0/0x130 [ 818.107482] sp=e00118707e30 bsp=e00118700f98 [ 818.107482] [] ia64_ivt+0x00040720/0x400 [ 818.107482] sp=e00118708000 bsp=e00118700f98 [ 818.115482] Disabling lock debugging due to kernel taint [ 818.115482] BUG: Bad rss-counter state mm:2eec6412 type:MM_FILEPAGES val:-1 [ 818.132256] Unable 
to handle kernel NULL pointer dereference (address 0068) [ 818.133904] sock_filter-v-X[5999]: Oops 11012296146944 [1] [ 818.133904] Modules linked in: acpi_ipmi ipmi_si usb_storage e1000 ipmi_devintf ipmi_msghandler rtc_efi [ 818.133904] [ 818.133904] CPU: 0 PID: 5999 Comm: sock_filter-v-X Tainted: GB 5.11.0-3-gbfa5a4929c90 #57 [ 818.133904] Hardware name: hp server rx3600 , BIOS 04.03 04/08/2008 [ 818.133904] psr : 121008026010 ifs : 8288 ip : []Tainted: GB (5.11.0-3-gbfa5a4929c90) [ 818.133904] ip is at bpf_prog_free+0x21/0xe0 [ 818.133904] unat: pfs : 0307 rsc : 0003 [ 818.133904] rnat: bsps: pr : 00106a5a51665965 [ 818.133904] ldrs: ccv : 12088904 fpsr: 0009804c8a70033f [ 818.133904] csd : ssd : [ 818.133904] b0 : a00100d54080 b6 : a00100d53fe0 b7 : a001cef0 [ 818.133904] f6 : 0ffefb0c50daa1b67f89a f7 : 0ffed8b3e4fdb0800 [ 818.133904] f8 : 10017fbd1bc00 f9 : 1000eb95f [ 818.133904] f10 : 10008ade20716a6c83cc1 f11 : 1003e02b7 [ 818.133904] r1 : a0010176b300 r2 : a0028004 r3 : [ 818.133904] r8 : 0008 r9 : e0011873f800 r10 : e00102c18600 [ 818.133904] r11 : e00102c19600 r12 : e0011873f7f0 r13 : e00118738000 [ 818.133904] r14 : 0068 r15 : a0028028 r16 : e5606a70 [ 818.133904] r17 : e00102c18600 r18 : e00104370748 r19 : e00102c18600 [ 818.133904] r20 : e00102c18600 r21 : e5606a78 r22 : a0010156bd28 [ 818.133904] r23 : a0010147fdf4 r24 : 4000 r25 : e00104370750 [ 818.133904] r26 : a001012f7088 r27 : a00100d53fe0 r28 : 0001 [ 818.133904] r29 : e0011873f800 r30 : e0011873f810 r31 : e0011873f808 [ 818.133904] [ 818.133904] Call Trace: [ 818.133904] [] show_stack+0x90/0xc0 [ 818.133904]
Re: [PATCH] ia64: fix ptrace(PTRACE_SYSCALL_INFO_EXIT) sign
On Sun, 21 Feb 2021 10:21:56 +0100
John Paul Adrian Glaubitz wrote:

> Hi Sergei!
>
> On 2/21/21 1:25 AM, Sergei Trofimovich wrote:
> > In https://bugs.gentoo.org/769614 Dmitry noticed that
> > `ptrace(PTRACE_GET_SYSCALL_INFO)` does not return error sign properly.
> > (...)
>
> Do these two patches unbreak gdb on ia64?

gdb was somewhat working on ia64 for Gentoo. strace was the main tool affected here. But I haven't tried anything complicated recently. Is anything specific breaking for you?

    $ uname -r
    5.10.0
    (even without the patches above)

    $ cat c.c
    int main(){}
    $ gcc c.c -o a -ggdb3
    $ gdb --quiet ./a
    Reading symbols from ./a...
    (gdb) start
    Temporary breakpoint 1 at 0x7f2: file c.c, line 1.
    Starting program: /home/slyfox/a
    Failed to read a valid object file image from memory.

    Temporary breakpoint 1, main () at c.c:1
    1       int main(){}
    (gdb) disassemble
    Dump of assembler code for function main:
       0x200807f0 <+0>:   [MII]   mov r2=r12
       0x200807f1 <+1>:           mov r14=r0;;
    => 0x200807f2 <+2>:           mov r8=r14
       0x20080800 <+16>:  [MIB]   mov r12=r2
       0x20080801 <+17>:          nop.i 0x0
       0x20080802 <+18>:          br.ret.sptk.many b0;;
    End of assembler dump.
    (gdb) break *0x20080800
    Breakpoint 2 at 0x20080800: file c.c, line 1.
    (gdb) continue
    Continuing.

    Breakpoint 2, 0x20080800 in main () at c.c:1
    1       int main(){}

Looks OK for minor stuff.

> And have you, by any chance, managed to get the hpsa driver working again?

v5.10 seems to boot off hpsa just fine without extra patches:

    14:01.0 RAID bus controller: Hewlett-Packard Company Smart Array P600
            Subsystem: Hewlett-Packard Company 3 Gb/s SAS RAID
            Kernel driver in use: hpsa

v5.11 does not boot yet. After init starts, the kernel does not see some files during boot (but I'm not sure it's a block device problem). Bisecting now to find out why.

-- 
Sergei
[PATCH] ia64: fix ia64_syscall_get_set_arguments() for break-based syscalls
In https://bugs.gentoo.org/769614 Dmitry noticed that
`ptrace(PTRACE_GET_SYSCALL_INFO)` does not work for syscalls called
via glibc's syscall() wrapper.

ia64 has two ways to call syscalls from userspace: via `break` and via
`eps` instructions.

The difference is in the stack layout:

1. `eps` creates a simple stack frame: no locals, in{0..7} == out{0..8}
2. `break` uses the userspace stack frame: there may be locals (glibc
   provides one), in{0..7} == out{0..8}.

Both work fine in the syscall handling code itself.

But `ptrace(PTRACE_GET_SYSCALL_INFO)` uses the unwind mechanism to
re-extract syscall arguments, and it does not account for locals.

The change always skips local registers. It should not change the `eps`
path, as the kernel's handler already enforces locals=0, and it fixes
the `break` path.

Tested on v5.10 on rx3600 machine (ia64 9040 CPU).

CC: Oleg Nesterov
CC: linux-i...@vger.kernel.org
CC: linux-kernel@vger.kernel.org
CC: Andrew Morton
Reported-by: Dmitry V. Levin
Bug: https://bugs.gentoo.org/769614
Signed-off-by: Sergei Trofimovich
---
 arch/ia64/kernel/ptrace.c | 24 ++--
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/arch/ia64/kernel/ptrace.c b/arch/ia64/kernel/ptrace.c
index c3490ee2daa5..e14f5653393a 100644
--- a/arch/ia64/kernel/ptrace.c
+++ b/arch/ia64/kernel/ptrace.c
@@ -2013,27 +2013,39 @@ static void syscall_get_set_args_cb(struct unw_frame_info *info, void *data)
 {
 	struct syscall_get_set_args *args = data;
 	struct pt_regs *pt = args->regs;
-	unsigned long *krbs, cfm, ndirty;
+	unsigned long *krbs, cfm, ndirty, nlocals, nouts;
 	int i, count;
 
 	if (unw_unwind_to_user(info) < 0)
 		return;
 
+	/*
+	 * We get here via a few paths:
+	 * - break instruction: cfm is shared with caller.
+	 *   syscall args are in out= regs, locals are non-empty.
+	 * - eps instruction: cfm is set by br.call
+	 *   locals don't exist.
+	 *
+	 * For both cases arguments are reachable in cfm.sof - cfm.sol.
+	 * CFM: [ ... | sor: 17..14 | sol : 13..7 | sof : 6..0 ]
+	 */
 	cfm = pt->cr_ifs;
+	nlocals = (cfm >> 7) & 0x7f; /* aka sol */
+	nouts = (cfm & 0x7f) - nlocals; /* aka sof - sol */
 	krbs = (unsigned long *)info->task + IA64_RBS_OFFSET/8;
 	ndirty = ia64_rse_num_regs(krbs, krbs + (pt->loadrs >> 19));
 
 	count = 0;
 	if (in_syscall(pt))
-		count = min_t(int, args->n, cfm & 0x7f);
+		count = min_t(int, args->n, nouts);
 
+	/* Iterate over outs. */
 	for (i = 0; i < count; i++) {
+		int j = ndirty + nlocals + i + args->i;
 		if (args->rw)
-			*ia64_rse_skip_regs(krbs, ndirty + i + args->i) =
-				args->args[i];
+			*ia64_rse_skip_regs(krbs, j) = args->args[i];
 		else
-			args->args[i] = *ia64_rse_skip_regs(krbs,
-				ndirty + i + args->i);
+			args->args[i] = *ia64_rse_skip_regs(krbs, j);
 	}
 
 	if (!args->rw) {
-- 
2.30.1
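The CFM arithmetic the patch relies on can be checked in isolation. The following small sketch uses only the bit layout quoted in the patch comment (`sol` in bits 13..7, `sof` in bits 6..0). The helper names are hypothetical, and `ndirty` is taken as a plain number rather than computed from `pt->loadrs`. `arg_slot()` returns the logical register index that the patch passes to `ia64_rse_skip_regs()`.

```c
#include <assert.h>

/* Bit layout from the patch comment:
 * CFM: [ ... | sor: 17..14 | sol : 13..7 | sof : 6..0 ] */
static unsigned long cfm_sof(unsigned long cfm) { return cfm & 0x7f; }        /* size of frame */
static unsigned long cfm_sol(unsigned long cfm) { return (cfm >> 7) & 0x7f; } /* size of locals */

/* Output registers, i.e. where the syscall arguments live (sof - sol). */
static unsigned long cfm_nouts(unsigned long cfm)
{
	return cfm_sof(cfm) - cfm_sol(cfm);
}

/* Logical register index of argument i, as in the patch:
 * skip the dirty registers, then the caller's locals, then 'first' args. */
static unsigned long arg_slot(unsigned long ndirty, unsigned long cfm,
			      unsigned long first, unsigned long i)
{
	return ndirty + cfm_sol(cfm) + i + first;
}
```

Take a `break`-based call with one local and eight outputs (sol = 1, sof = 9, so `cfm == 137`). The old code used `cfm & 0x7f == 9` as the argument count and read every argument one slot too early. For an `eps`-style frame (sol = 0, `cfm == 8`), the result is the same as before.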
[PATCH] ia64: fix ptrace(PTRACE_SYSCALL_INFO_EXIT) sign
In https://bugs.gentoo.org/769614 Dmitry noticed that
`ptrace(PTRACE_GET_SYSCALL_INFO)` does not return the error sign properly.

The bug is a mismatch between how errors are set and how they are read back:

    static inline long syscall_get_error(struct task_struct *task,
                                         struct pt_regs *regs)
    {
            return regs->r10 == -1 ? regs->r8:0;
    }

    static inline long syscall_get_return_value(struct task_struct *task,
                                                struct pt_regs *regs)
    {
            return regs->r8;
    }

    static inline void syscall_set_return_value(struct task_struct *task,
                                                struct pt_regs *regs,
                                                int error, long val)
    {
            if (error) {
                    /* error < 0, but ia64 uses > 0 return value */
                    regs->r8 = -error;
                    regs->r10 = -1;
            } else {
                    regs->r8 = val;
                    regs->r10 = 0;
            }
    }

syscall_set_return_value() stores the negated (positive) errno in r8, but
syscall_get_error() returns r8 as-is instead of negating it back.

Tested on v5.10 on rx3600 machine (ia64 9040 CPU).

CC: linux-i...@vger.kernel.org
CC: linux-kernel@vger.kernel.org
CC: Andrew Morton
Reported-by: Dmitry V. Levin
Bug: https://bugs.gentoo.org/769614
Signed-off-by: Sergei Trofimovich
---
 arch/ia64/include/asm/syscall.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/ia64/include/asm/syscall.h b/arch/ia64/include/asm/syscall.h
index 6c6f16e409a8..0d23c0049301 100644
--- a/arch/ia64/include/asm/syscall.h
+++ b/arch/ia64/include/asm/syscall.h
@@ -32,7 +32,7 @@ static inline void syscall_rollback(struct task_struct *task,
 static inline long syscall_get_error(struct task_struct *task,
 				     struct pt_regs *regs)
 {
-	return regs->r10 == -1 ? regs->r8:0;
+	return regs->r10 == -1 ? -regs->r8:0;
 }
 
 static inline long syscall_get_return_value(struct task_struct *task,
-- 
2.30.1
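The set/get round trip can be modelled with a two-register stand-in for `pt_regs`. In this sketch, `struct regs_model` and the helper names are hypothetical. The logic copies `syscall_set_return_value()` and the fixed `syscall_get_error()` from the patch: r8 holds the positive errno and r10 == -1 flags an error, so the getter has to negate r8 to give callers the usual `-errno`.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical two-register model of the ia64 syscall return convention
 * described in the patch. */
struct regs_model {
	long r8;	/* return value, or positive errno on error */
	long r10;	/* -1 on error, 0 on success */
};

/* Same logic as syscall_set_return_value(): 'error' is a negative errno. */
static void set_return(struct regs_model *regs, int error, long val)
{
	if (error) {
		regs->r8 = -error;	/* ia64 stores the errno as a positive value */
		regs->r10 = -1;
	} else {
		regs->r8 = val;
		regs->r10 = 0;
	}
}

/* Fixed syscall_get_error(): negate r8 back so callers get -errno. */
static long get_error(const struct regs_model *regs)
{
	return regs->r10 == -1 ? -regs->r8 : 0;
}
```

Setting `-ENOENT` stores `ENOENT` in r8, and `get_error()` returns `-ENOENT` again. Without the negation the round trip would yield `+ENOENT`, which is the wrong sign that `PTRACE_GET_SYSCALL_INFO` reported.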
linux-headers-5.2 and proper use of SIOCGSTAMP
Commit https://github.com/torvalds/linux/commit/0768e17073dc527ccd18ed5f96ce85f9985e9115 ("net: socket: implement 64-bit timestamps") caused some userspace breakage for existing programs:

- firefox: https://bugs.gentoo.org/689808
- qemu: https://lists.sr.ht/~philmd/qemu/%3C20190604071915.288045-1-borntraeger%40de.ibm.com%3E
- linux-atm: https://gitweb.gentoo.org/repo/gentoo.git/tree/net-dialup/linux-atm/files/linux-atm-2.5.2-linux-5.2-SIOCGSTAMP.patch?id=408621819a85bf67a73efd33a06ea371c20ea5a2

I have a question: how should a well-behaved app include the 'SIOCGSTAMP' definition so that it keeps building against both old and new linux-headers?

'man 7 socket' explicitly mentions SIOCGSTAMP and lists only #include as the needed header. Should user apps always add #include as well? Or should glibc tweak its definition of '#include ' to make it available with both old and new versions of linux headers?

I'm CCing both kernel and glibc folks, as I don't understand on which side the issue should be fixed.

Thanks!

-- 
Sergei
[RESEND, PATCH] tty/vt: fix write/write race in ioctl(KDSKBSENT) handler
The bug manifests as an attempt to access deallocated memory:

    BUG: unable to handle kernel paging request at 9c8735448000
    #PF error: [PROT] [WRITE]
    PGD 288a05067 P4D 288a05067 PUD 288a07067 PMD 7f60c2063 PTE 8007f5448161
    Oops: 0003 [#1] PREEMPT SMP
    CPU: 6 PID: 388 Comm: loadkeys Tainted: G C 5.0.0-rc6-00153-g5ded5871030e #91
    Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./H77M-D3H, BIOS F12 11/14/2013
    RIP: 0010:__memmove+0x81/0x1a0
    Code: 4c 89 4f 10 4c 89 47 18 48 8d 7f 20 73 d4 48 83 c2 20 e9 a2 00 00 00 66 90 48 89 d1 4c 8b 5c 16 f8 4c 8d 54 17 f8 48 c1 e9 03 48 a5 4d 89 1a e9 0c 01 00 00 0f 1f 40 00 48 89 d1 4c 8b 1e 49
    RSP: 0018:a1b9002d7d08 EFLAGS: 00010203
    RAX: 9c873541af43 RBX: 9c873541af43 RCX: 0c6f105cd6bf
    RDX: 637882e986b6 RSI: 9c8735447ffb RDI: 9c8735447ffb
    RBP: 9c8739cd3800 R08: 9c873b802f00 R09: f73b
    R10: b82b35f1 R11: 00505b1b004d5b1b R12:
    R13: 9c873541af3d R14: 000b R15: 000c
    FS: 7f450c390580() GS:9c873f18() knlGS:
    CS: 0010 DS: ES: CR0: 80050033
    CR2: 9c8735448000 CR3: 0007e213c002 CR4: 000606e0
    Call Trace:
     vt_do_kdgkb_ioctl+0x34d/0x440
     vt_ioctl+0xba3/0x1190
     ? __bpf_prog_run32+0x39/0x60
     ? mem_cgroup_commit_charge+0x7b/0x4e0
     tty_ioctl+0x23f/0x920
     ? preempt_count_sub+0x98/0xe0
     ? __seccomp_filter+0x67/0x600
     do_vfs_ioctl+0xa2/0x6a0
     ? syscall_trace_enter+0x192/0x2d0
     ksys_ioctl+0x3a/0x70
     __x64_sys_ioctl+0x16/0x20
     do_syscall_64+0x54/0xe0
     entry_SYSCALL_64_after_hwframe+0x49/0xbe

The bug manifests on systemd systems with multiple vtcon devices:

    # cat /sys/devices/virtual/vtconsole/vtcon0/name
    (S) dummy device
    # cat /sys/devices/virtual/vtconsole/vtcon1/name
    (M) frame buffer device

There systemd runs the 'loadkeys' tool in parallel for each vtcon instance. This causes two parallel ioctl(KDSKBSENT) calls to race into adding the same entry into the 'func_table' array at:

    drivers/tty/vt/keyboard.c:vt_do_kdgkb_ioctl()

The function has no locking around writes to 'func_table'.

The simplest reproducer is an initramfs with the following init on an 8-CPU x86_64 machine:

    #!/bin/sh
    loadkeys -q windowkeys ru4 &
    loadkeys -q windowkeys ru4 &
    loadkeys -q windowkeys ru4 &
    loadkeys -q windowkeys ru4 &
    loadkeys -q windowkeys ru4 &
    loadkeys -q windowkeys ru4 &
    loadkeys -q windowkeys ru4 &
    loadkeys -q windowkeys ru4 &
    wait

The change adds a lock on the write path only. Reads are still racy.

CC: Greg Kroah-Hartman
CC: Jiri Slaby
Link: https://lkml.org/lkml/2019/2/17/256
Signed-off-by: Sergei Trofimovich
---
 drivers/tty/vt/keyboard.c | 33 +++--
 1 file changed, 27 insertions(+), 6 deletions(-)

diff --git a/drivers/tty/vt/keyboard.c b/drivers/tty/vt/keyboard.c
index 88312c6c92cc..0617e87ab343 100644
--- a/drivers/tty/vt/keyboard.c
+++ b/drivers/tty/vt/keyboard.c
@@ -123,6 +123,7 @@ static const int NR_TYPES = ARRAY_SIZE(max_vals);
 static struct input_handler kbd_handler;
 static DEFINE_SPINLOCK(kbd_event_lock);
 static DEFINE_SPINLOCK(led_lock);
+static DEFINE_SPINLOCK(func_buf_lock); /* guard 'func_buf' and friends */
 static unsigned long key_down[BITS_TO_LONGS(KEY_CNT)];	/* keyboard key bitmap */
 static unsigned char shift_down[NR_SHIFT];		/* shift state counters.. */
 static bool dead_key_next;
@@ -1990,11 +1991,12 @@ int vt_do_kdgkb_ioctl(int cmd, struct kbsentry __user *user_kdgkb, int perm)
 	char *p;
 	u_char *q;
 	u_char __user *up;
-	int sz;
+	int sz, fnw_sz;
 	int delta;
 	char *first_free, *fj, *fnw;
 	int i, j, k;
 	int ret;
+	unsigned long flags;
 
 	if (!capable(CAP_SYS_TTY_CONFIG))
 		perm = 0;
@@ -2037,7 +2039,14 @@ int vt_do_kdgkb_ioctl(int cmd, struct kbsentry __user *user_kdgkb, int perm)
 			goto reterr;
 		}
 
+		fnw = NULL;
+		fnw_sz = 0;
+		/* race against other writers */
+		again:
+		spin_lock_irqsave(&func_buf_lock, flags);
 		q = func_table[i];
+
+		/* fj pointer to next entry after 'q' */
 		first_free = funcbufptr + (funcbufsize - funcbufleft);
 		for (j = i+1; j < MAX_NR_FUNC && !func_table[j]; j++)
 			;
@@ -2045,10 +2054,12 @@ int vt_do_kdgkb_ioctl(int cmd, struct kbsentry __user *user_kdgkb, int perm)
 			fj = func_table[j];
 		else
 			fj = first_free;
-
+		/* buffer usage increase by new entry */
 		delta = (q ? -strlen(q) : 1) + strlen(kbs->kb_string);
+
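The diff above is cut off before the allocation path, so the following is not the patch's code. It is a hypothetical userspace sketch of the pattern that the added `fnw`, `fnw_sz` and `again:` lines suggest: a spinlock (here a C11 `atomic_flag` standing in for `func_buf_lock`) guards a shared buffer, memory is never allocated while the lock is held (in the kernel a sleeping allocation must not happen under a spinlock), and after re-taking the lock the size check is repeated because another writer may have changed the buffer in between. `struct shared_buf` and `buf_append()` are made-up names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical shared buffer, loosely modelled on func_buf/func_table. */
struct shared_buf {
	atomic_flag lock;	/* stands in for func_buf_lock */
	char *data;		/* NUL-terminated contents */
	size_t len;		/* bytes used, excluding the NUL */
	size_t cap;		/* bytes allocated */
};

static void buf_lock(struct shared_buf *b)
{
	while (atomic_flag_test_and_set_explicit(&b->lock, memory_order_acquire))
		;	/* spin until the other writer is done */
}

static void buf_unlock(struct shared_buf *b)
{
	atomic_flag_clear_explicit(&b->lock, memory_order_release);
}

/* Append s to the shared buffer; returns 0 on success, -1 on ENOMEM. */
static int buf_append(struct shared_buf *b, const char *s)
{
	size_t slen = strlen(s);
	char *fresh = NULL;	/* preallocated buffer, like 'fnw' */
	size_t fresh_cap = 0;	/* its size, like 'fnw_sz' */

again:
	buf_lock(b);
	if (b->len + slen + 1 > b->cap) {
		size_t need = b->len + slen + 1;

		if (fresh_cap < need) {
			/* Never allocate under the lock: drop it, allocate,
			 * then start over since 'len' may have changed. */
			buf_unlock(b);
			free(fresh);
			fresh_cap = need * 2;
			fresh = malloc(fresh_cap);
			if (!fresh)
				return -1;
			goto again;
		}
		/* Still under the lock: move contents to the new buffer. */
		if (b->len)
			memcpy(fresh, b->data, b->len);
		char *old = b->data;
		b->data = fresh;
		b->cap = fresh_cap;
		fresh = old;	/* old buffer is freed below, unlocked */
	}
	memcpy(b->data + b->len, s, slen + 1);
	b->len += slen;
	buf_unlock(b);
	free(fresh);		/* unused preallocation or the old buffer */
	return 0;
}
```

A buffer is initialised as `struct shared_buf b = { ATOMIC_FLAG_INIT, NULL, 0, 0 };`. Freeing happens after unlocking, so the critical section stays as short as possible.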
Re: 5.0.0-rc6+: Oops at boot: RIP: 0010:__memmove+0x81/0x1a0 / vt_do_kdgkb_ioctl+0x34d/0x440 (race at reenter?)
On Mon, 18 Feb 2019 09:38:10 +0100 Greg Kroah-Hartman wrote: > On Sun, Feb 17, 2019 at 11:39:57PM +0000, Sergei Trofimovich wrote: > > [ Copying as is from https://bugzilla.kernel.org/show_bug.cgi?id=202605 > > and sending to LKML. Greg, Jiri, can you clarify mailing > > list im MAINTAINERS as well? > > https://github.com/torvalds/linux/blob/master/MAINTAINERS#L15527 > > mentions no list for tty/vt/. ] > > > > Kernel Oops > > [ 38.739241] Oops: 0003 [#1] PREEMPT SMP > > [ 38.739243] CPU: 6 PID: 388 Comm: loadkeys Tainted: G C > > 5.0.0-rc6-00153-g5ded5871030e #91 > > [ 38.739244] Hardware name: Gigabyte Technology Co., Ltd. To be filled > > by O.E.M./H77M-D3H, BIOS F12 11/14/2013 > > [ 38.739249] RIP: 0010:__memmove+0x81/0x1a0 > > happes on a fresh vanilla master kernel roughly at boot > > (before tty login prompt): > > $ uname -r > > 5.0.0-rc6-00153-g5ded5871030e > > > > The kernel page fault happens at 'loadkeys start'. > > I suspect some kind of race at reenter of vt_do_kdgkb_ioctl(KDSKBSENT): > > > > https://github.com/torvalds/linux/blob/master/drivers/tty/vt/keyboard.c#L1986 > > > > The oops trace looks similar to the following reports (no details besides > > Oops) > > https://bugzilla.kernel.org/show_bug.cgi?id=194589 > > https://bugzilla.kernel.org/show_bug.cgi?id=202111 > > > > [ 38.044921] IPv6: ADDRCONF(NETDEV_CHANGE): br0: link becomes ready > > [ 38.533196] usb 1-1.2: r8712u: CustomerID = 0x > > [ 38.533200] usb 1-1.2: r8712u: MAC Address from efuse = 00:0d:81:a9:09:90 > > [ 38.533203] usb 1-1.2: r8712u: Loading firmware from > > "rtlwifi/rtl8712u.bin" > > [ 38.51] usbcore: registered new interface driver r8712u > > [ 38.736178] BUG: unable to handle kernel paging request at > > 9c8735448000 > > [ 38.737215] #PF error: [PROT] [WRITE] > > [ 38.737216] PGD 288a05067 P4D 288a05067 PUD 288a07067 PMD 7f60c2063 PTE > > 8007f5448161 > > [ 38.739241] Oops: 0003 [#1] PREEMPT SMP > > [ 38.739243] CPU: 6 PID: 388 Comm: loadkeys Tainted: G C > > 
5.0.0-rc6-00153-g5ded5871030e #91 > > [ 38.739244] Hardware name: Gigabyte Technology Co., Ltd. To be filled by > > O.E.M./H77M-D3H, BIOS F12 11/14/2013 > > [ 38.739249] RIP: 0010:__memmove+0x81/0x1a0 > > [ 38.739251] Code: 4c 89 4f 10 4c 89 47 18 48 8d 7f 20 73 d4 48 83 c2 20 > > e9 a2 00 00 00 66 90 48 89 d1 4c 8b 5c 16 f8 4c 8d 54 17 f8 48 c1 e9 03 > > 48 a5 4d 89 1a e9 0c 01 00 00 0f 1f 40 00 48 89 d1 4c 8b 1e 49 > > [ 38.739252] RSP: 0018:a1b9002d7d08 EFLAGS: 00010203 > > [ 38.745857] RAX: 9c873541af43 RBX: 9c873541af43 RCX: > > 0c6f105cd6bf > > [ 38.745858] RDX: 637882e986b6 RSI: 9c8735447ffb RDI: > > 9c8735447ffb > > [ 38.745859] RBP: 9c8739cd3800 R08: 9c873b802f00 R09: > > f73b > > [ 38.745860] R10: b82b35f1 R11: 00505b1b004d5b1b R12: > > > > [ 38.745861] R13: 9c873541af3d R14: 000b R15: > > 000c > > [ 38.745862] FS: 7f450c390580() GS:9c873f18() > > knlGS: > > [ 38.745863] CS: 0010 DS: ES: CR0: 80050033 > > [ 38.745864] CR2: 9c8735448000 CR3: 0007e213c002 CR4: > > 000606e0 > > [ 38.745865] Call Trace: > > [ 38.745871] vt_do_kdgkb_ioctl+0x34d/0x440 > > [ 38.745875] vt_ioctl+0xba3/0x1190 > > [ 38.745879] ? __bpf_prog_run32+0x39/0x60 > > [ 38.745882] ? mem_cgroup_commit_charge+0x7b/0x4e0 > > [ 38.762583] tty_ioctl+0x23f/0x920 > > [ 38.762586] ? preempt_count_sub+0x98/0xe0 > > [ 38.762590] ? __seccomp_filter+0x67/0x600 > > [ 38.762594] do_vfs_ioctl+0xa2/0x6a0 > > [ 38.762597] ? 
syscall_trace_enter+0x192/0x2d0 > > [ 38.762599] ksys_ioctl+0x3a/0x70 > > [ 38.762601] __x64_sys_ioctl+0x16/0x20 > > [ 38.762604] do_syscall_64+0x54/0xe0 > > [ 38.772513] entry_SYSCALL_64_after_hwframe+0x49/0xbe > > [ 38.772515] RIP: 0033:0x7f450c2bb427 > > [ 38.772517] Code: 00 00 00 75 0c 48 c7 c0 ff ff ff ff 48 83 c4 18 c3 e8 > > 8d d2 01 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 b8 10 00 00 00 0f 05 > > <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 39 da 0c 00 f7 d8 64 89 01 48 > > [ 38.772518] RSP: 002b:7fffbcedd348 EFLAGS: 0246 ORIG_RAX: > > 0010 > > [ 38.772519] RAX: ffda RBX: 000b RCX: > > 7f450c2bb427 > > [ 38.772520] RDX: 0
5.0.0-rc6+: Oops at boot: RIP: 0010:__memmove+0x81/0x1a0 / vt_do_kdgkb_ioctl+0x34d/0x440 (race at reenter?)
[ Copying as is from https://bugzilla.kernel.org/show_bug.cgi?id=202605 and sending to LKML. Greg, Jiri, can you clarify the mailing list in MAINTAINERS as well? https://github.com/torvalds/linux/blob/master/MAINTAINERS#L15527 mentions no list for tty/vt/. ] Kernel Oops [ 38.739241] Oops: 0003 [#1] PREEMPT SMP [ 38.739243] CPU: 6 PID: 388 Comm: loadkeys Tainted: G C 5.0.0-rc6-00153-g5ded5871030e #91 [ 38.739244] Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./H77M-D3H, BIOS F12 11/14/2013 [ 38.739249] RIP: 0010:__memmove+0x81/0x1a0 happens on a fresh vanilla master kernel roughly at boot (before the tty login prompt): $ uname -r 5.0.0-rc6-00153-g5ded5871030e The kernel page fault happens at 'loadkeys start'. I suspect some kind of race at reenter of vt_do_kdgkb_ioctl(KDSKBSENT): https://github.com/torvalds/linux/blob/master/drivers/tty/vt/keyboard.c#L1986 The oops trace looks similar to the following reports (no details besides Oops) https://bugzilla.kernel.org/show_bug.cgi?id=194589 https://bugzilla.kernel.org/show_bug.cgi?id=202111 [ 38.044921] IPv6: ADDRCONF(NETDEV_CHANGE): br0: link becomes ready [ 38.533196] usb 1-1.2: r8712u: CustomerID = 0x [ 38.533200] usb 1-1.2: r8712u: MAC Address from efuse = 00:0d:81:a9:09:90 [ 38.533203] usb 1-1.2: r8712u: Loading firmware from "rtlwifi/rtl8712u.bin" [ 38.51] usbcore: registered new interface driver r8712u [ 38.736178] BUG: unable to handle kernel paging request at 9c8735448000 [ 38.737215] #PF error: [PROT] [WRITE] [ 38.737216] PGD 288a05067 P4D 288a05067 PUD 288a07067 PMD 7f60c2063 PTE 8007f5448161 [ 38.739241] Oops: 0003 [#1] PREEMPT SMP [ 38.739243] CPU: 6 PID: 388 Comm: loadkeys Tainted: G C 5.0.0-rc6-00153-g5ded5871030e #91 [ 38.739244] Hardware name: Gigabyte Technology Co., Ltd.
To be filled by O.E.M./H77M-D3H, BIOS F12 11/14/2013 [ 38.739249] RIP: 0010:__memmove+0x81/0x1a0 [ 38.739251] Code: 4c 89 4f 10 4c 89 47 18 48 8d 7f 20 73 d4 48 83 c2 20 e9 a2 00 00 00 66 90 48 89 d1 4c 8b 5c 16 f8 4c 8d 54 17 f8 48 c1 e9 03 48 a5 4d 89 1a e9 0c 01 00 00 0f 1f 40 00 48 89 d1 4c 8b 1e 49 [ 38.739252] RSP: 0018:a1b9002d7d08 EFLAGS: 00010203 [ 38.745857] RAX: 9c873541af43 RBX: 9c873541af43 RCX: 0c6f105cd6bf [ 38.745858] RDX: 637882e986b6 RSI: 9c8735447ffb RDI: 9c8735447ffb [ 38.745859] RBP: 9c8739cd3800 R08: 9c873b802f00 R09: f73b [ 38.745860] R10: b82b35f1 R11: 00505b1b004d5b1b R12: [ 38.745861] R13: 9c873541af3d R14: 000b R15: 000c [ 38.745862] FS: 7f450c390580() GS:9c873f18() knlGS: [ 38.745863] CS: 0010 DS: ES: CR0: 80050033 [ 38.745864] CR2: 9c8735448000 CR3: 0007e213c002 CR4: 000606e0 [ 38.745865] Call Trace: [ 38.745871] vt_do_kdgkb_ioctl+0x34d/0x440 [ 38.745875] vt_ioctl+0xba3/0x1190 [ 38.745879] ? __bpf_prog_run32+0x39/0x60 [ 38.745882] ? mem_cgroup_commit_charge+0x7b/0x4e0 [ 38.762583] tty_ioctl+0x23f/0x920 [ 38.762586] ? preempt_count_sub+0x98/0xe0 [ 38.762590] ? __seccomp_filter+0x67/0x600 [ 38.762594] do_vfs_ioctl+0xa2/0x6a0 [ 38.762597] ? 
syscall_trace_enter+0x192/0x2d0 [ 38.762599] ksys_ioctl+0x3a/0x70 [ 38.762601] __x64_sys_ioctl+0x16/0x20 [ 38.762604] do_syscall_64+0x54/0xe0 [ 38.772513] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 38.772515] RIP: 0033:0x7f450c2bb427 [ 38.772517] Code: 00 00 00 75 0c 48 c7 c0 ff ff ff ff 48 83 c4 18 c3 e8 8d d2 01 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 39 da 0c 00 f7 d8 64 89 01 48 [ 38.772518] RSP: 002b:7fffbcedd348 EFLAGS: 0246 ORIG_RAX: 0010 [ 38.772519] RAX: ffda RBX: 000b RCX: 7f450c2bb427 [ 38.772520] RDX: 7fffbcedd360 RSI: 4b49 RDI: 0003 [ 38.772521] RBP: 7fffbcedd361 R08: 7f450c389c40 R09: 55cbef2494a0 [ 38.772522] R10: R11: 0246 R12: 55cbef2412b0 [ 38.772522] R13: 7fffbcedd360 R14: 000b R15: 0003 [ 38.772525] Modules linked in: snd_hda_codec_hdmi bridge r8712u(C) stp llc snd_hda_codec_via snd_hda_codec_generic snd_hda_intel snd_hda_codec x86_pkg_temp_thermal dummy kvm_intel snd_hwdep snd_hda_core snd_pcm snd_timer kvm snd atl1c soundcore irqbypass xfs tun nf_conntrack_ftp nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 loop fuse binfmt_misc ipv6 [ 38.779196] r8712u 1-1.2:1.0 wl0: renamed from wlan0 [ 38.779240] CR2: 9c8735448000 [ 38.790894] ---[ end trace 8116e48ba19076a0 ]--- [ 38.790897] RIP: 0010:__memmove+0x81/0x1a0 [ 38.790898] Code: 4c 89 4f 10 4c 89 47 18 48 8d 7f 20 73 d4 48 83 c2 20 e9 a2 00 00 00 66 90 48 89 d1 4c 8b 5c 16 f8 4c 8d 54 17 f8 48 c1 e9 03 48 a5 4d 89 1a e9 0c 01 00 00 0f
[PATCH v2] alpha: fix page fault handling for r16-r18 targets
Fix the page fault handling code to fix up the r16-r18 registers.
Before the patch the code had an off-by-two register bug.
This bug caused overwriting of the ps,pc,gp registers instead
of fixing up the intended r16,r17,r18 (see `struct pt_regs`).

More details:

Initially Dmitry noticed a kernel bug as a failure in the strace test
suite. The test passes an unmapped userspace pointer to io_submit:

```c
#include <err.h>
#include <unistd.h>
#include <sys/mman.h>
#include <asm/unistd.h>

int main(void)
{
    unsigned long ctx = 0;
    if (syscall(__NR_io_setup, 1, &ctx))
        err(1, "io_setup");
    const size_t page_size = sysconf(_SC_PAGESIZE);
    const size_t size = page_size * 2;
    void *ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (MAP_FAILED == ptr)
        err(1, "mmap(%zu)", size);
    if (munmap(ptr, size))
        err(1, "munmap");
    /* ptr + page_size now points into unmapped memory */
    syscall(__NR_io_submit, ctx, 1, ptr + page_size);
    syscall(__NR_io_destroy, ctx);
    return 0;
}
```

Running this test causes the kernel to crash when handling the page fault:

```
Unable to handle kernel paging request at virtual address 9468
CPU 3
aio(26027): Oops 0
pc = [] ra = [] ps = Not tainted
pc is at sys_io_submit+0x108/0x200
ra is at sys_io_submit+0x6c/0x200
v0 = fc00c58e6300 t0 = fff2 t1 = 0225e000
t2 = fc01f159fef8 t3 = fc0001009640 t4 = fce0f6e0
t5 = 020001002e9e t6 = 4c41564e49452031 t7 = fc01f159c000
s0 = 0002 s1 = 0225e000 s2 =
s3 = s4 = s5 = fff2
s6 = fc00c58e6300
a0 = fc00c58e6300 a1 = a2 = 0225e000
a3 = 021ac260 a4 = 021ac1e8 a5 = 0001
t8 = 0008 t9 = 00011f8bce30 t10= 021ac440
t11= pv = fc6fd320 at =
gp = sp = 265fd174
Disabling lock debugging due to kernel taint
Trace:
[] entSys+0xa4/0xc0
```

Here `gp` has an invalid value. `gp` is overwritten by the page fault
fixup for the following instruction in the `io_submit` syscall handler:

```
__se_sys_io_submit
...
    ldq     a1,0(t1)
    bne     t0,4280 <__se_sys_io_submit+0x180>
```

After a page fault `t0` should contain -EFAULT and `a1` should be 0.
Instead `gp` was overwritten in place of `a1`.

This happens due to an off-by-two bug in `dpf_reg()` for `r16-r18`
(aka `a0-a2`).

I think the bug went unnoticed for a long time as `gp` is one of the
scratch registers. Any kernel function call would recalculate `gp`.

Dmitry tracked down the bug origin back to the 2.1.32 kernel version,
where the trap_a{0,1,2} fields were inserted into struct pt_regs. And
even before that `dpf_reg()` contained an off-by-one error.

Cc: Richard Henderson
Cc: Ivan Kokshaysky
Cc: Matt Turner
Cc: linux-al...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reported-and-reviewed-by: "Dmitry V. Levin"
Cc: sta...@vger.kernel.org # v2.1.32+
Bug: https://bugs.gentoo.org/672040
Signed-off-by: Sergei Trofimovich
---
Changes since V1:
- expanded the bug origin tracked down by Dmitry
- added Dmitry's proper email and reviewed-by tags

 arch/alpha/mm/fault.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/alpha/mm/fault.c b/arch/alpha/mm/fault.c
index d73dc473fbb9..188fc9256baf 100644
--- a/arch/alpha/mm/fault.c
+++ b/arch/alpha/mm/fault.c
@@ -78,7 +78,7 @@ __load_new_mm_context(struct mm_struct *next_mm)
 /* Macro for exception fixup code to access integer registers.  */
 #define dpf_reg(r)							\
 	(((unsigned long *)regs)[(r) <= 8 ? (r) : (r) <= 15 ? (r)-16 :	\
-				 (r) <= 18 ? (r)+8 : (r)-10])
+				 (r) <= 18 ? (r)+10 : (r)-10])
 
 asmlinkage void
 do_page_fault(unsigned long address, unsigned long mmcsr,
-- 
2.20.1
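To see which `pt_regs` slot each version of the index arithmetic hits, the two formulas can be evaluated against a table of field names. This is a hypothetical sketch: the field order below is written from memory of alpha's `arch/alpha/include/uapi/asm/ptrace.h` and should be checked against the real header, and `r9`-`r15` (negative indices, saved below `pt_regs`) are not modelled. All helper names are made up.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed field order of alpha's struct pt_regs (verify against
 * arch/alpha/include/uapi/asm/ptrace.h). */
static const char *const pt_regs_fields[] = {
	"r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7", "r8",
	"r19", "r20", "r21", "r22", "r23", "r24", "r25", "r26", "r27", "r28",
	"hae", "trap_a0", "trap_a1", "trap_a2",
	"ps", "pc", "gp", "r16", "r17", "r18",
};
#define NR_FIELDS ((int)(sizeof(pt_regs_fields) / sizeof(pt_regs_fields[0])))

/* Index arithmetic of dpf_reg() before and after the patch.
 * r9..r15 give negative indices (saved below pt_regs, not modelled). */
static int dpf_index_old(int r)
{
	return r <= 8 ? r : r <= 15 ? r - 16 : r <= 18 ? r + 8 : r - 10;
}

static int dpf_index_new(int r)
{
	return r <= 8 ? r : r <= 15 ? r - 16 : r <= 18 ? r + 10 : r - 10;
}

/* Name of the pt_regs field at index i. */
static const char *slot_name(int i)
{
	assert(i >= 0 && i < NR_FIELDS);
	return pt_regs_fields[i];
}

/* Format "rN" into buf, for comparing against slot names. */
static const char *reg_name(int r, char buf[8])
{
	snprintf(buf, 8, "r%d", r);
	return buf;
}
```

Under this assumed layout, `slot_name(dpf_index_old(17))` is `"gp"`, matching the clobbered `gp` in the oops, while `dpf_index_new()` maps every register from r0 to r28 except r9..r15 onto its own field.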
[PATCH] alpha: fix page fault handling for r16-r18 targets
Fix the page fault handling code to fix up the r16-r18 registers.
Before the patch the code had an off-by-two register bug.
This bug caused overwriting of the ps,pc,gp registers instead
of fixing up the intended r16,r17,r18 (see `struct pt_regs`).

More details:

Initially Dmitry noticed a kernel bug as a failure in the strace test
suite. The test passes an unmapped userspace pointer to io_submit:

```c
#include <err.h>
#include <unistd.h>
#include <sys/mman.h>
#include <asm/unistd.h>

int main(void)
{
    unsigned long ctx = 0;
    if (syscall(__NR_io_setup, 1, &ctx))
        err(1, "io_setup");
    const size_t page_size = sysconf(_SC_PAGESIZE);
    const size_t size = page_size * 2;
    void *ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (MAP_FAILED == ptr)
        err(1, "mmap(%zu)", size);
    if (munmap(ptr, size))
        err(1, "munmap");
    /* ptr + page_size now points into unmapped memory */
    syscall(__NR_io_submit, ctx, 1, ptr + page_size);
    syscall(__NR_io_destroy, ctx);
    return 0;
}
```

Running this test causes the kernel to crash when handling the page fault:

```
Unable to handle kernel paging request at virtual address 9468
CPU 3
aio(26027): Oops 0
pc = [] ra = [] ps = Not tainted
pc is at sys_io_submit+0x108/0x200
ra is at sys_io_submit+0x6c/0x200
v0 = fc00c58e6300 t0 = fff2 t1 = 0225e000
t2 = fc01f159fef8 t3 = fc0001009640 t4 = fce0f6e0
t5 = 020001002e9e t6 = 4c41564e49452031 t7 = fc01f159c000
s0 = 0002 s1 = 0225e000 s2 =
s3 = s4 = s5 = fff2
s6 = fc00c58e6300
a0 = fc00c58e6300 a1 = a2 = 0225e000
a3 = 021ac260 a4 = 021ac1e8 a5 = 0001
t8 = 0008 t9 = 00011f8bce30 t10= 021ac440
t11= pv = fc6fd320 at =
gp = sp = 265fd174
Disabling lock debugging due to kernel taint
Trace:
[] entSys+0xa4/0xc0
```

Here `gp` has an invalid value. `gp` is overwritten by the page fault
fixup for the following instruction in the `io_submit` syscall handler:

```
__se_sys_io_submit
...
    ldq     a1,0(t1)
    bne     t0,4280 <__se_sys_io_submit+0x180>
```

After a page fault `t0` should contain -EFAULT and `a1` should be 0.
Instead `gp` was overwritten in place of `a1`.

This happens due to an off-by-two bug in `dpf_reg()` for `r16-r18`
(aka `a0-a2`).

I think the bug went unnoticed for a long time as `gp` is one of the
scratch registers. Any kernel function call would recalculate `gp`.

CC: Dmitry V. Levin
CC: Richard Henderson
CC: Ivan Kokshaysky
CC: Matt Turner
CC: linux-al...@vger.kernel.org
CC: linux-kernel@vger.kernel.org
Reported-by: Dmitry V. Levin
Bug: https://bugs.gentoo.org/672040
Signed-off-by: Sergei Trofimovich
---
 arch/alpha/mm/fault.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/alpha/mm/fault.c b/arch/alpha/mm/fault.c
index d73dc473fbb9..188fc9256baf 100644
--- a/arch/alpha/mm/fault.c
+++ b/arch/alpha/mm/fault.c
@@ -78,7 +78,7 @@ __load_new_mm_context(struct mm_struct *next_mm)
 /* Macro for exception fixup code to access integer registers.  */
 #define dpf_reg(r)							\
 	(((unsigned long *)regs)[(r) <= 8 ? (r) : (r) <= 15 ? (r)-16 :	\
-				 (r) <= 18 ? (r)+8 : (r)-10])
+				 (r) <= 18 ? (r)+10 : (r)-10])
 
 asmlinkage void
 do_page_fault(unsigned long address, unsigned long mmcsr,
-- 
2.20.1
Re: [PATCH] ia64: enable GENERIC_HWEIGHT
On Fri, 14 Sep 2018 08:06:46 +0100 Sergei Trofimovich wrote: > Noticed on a single driver failure: > ERROR: "__sw_hweight8" [drivers/net/wireless/mediatek/mt76/mt76.ko] > undefined! > > CC: Tony Luck > CC: Fenghua Yu > CC: linux-i...@vger.kernel.org > CC: Andrew Morton > CC: linux-kernel@vger.kernel.org > Signed-off-by: Sergei Trofimovich > --- > arch/ia64/Kconfig | 4 > 1 file changed, 4 insertions(+) > > diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig > index 8b4a0c1748c0..1a71f92f0b8e 100644 > --- a/arch/ia64/Kconfig > +++ b/arch/ia64/Kconfig > @@ -576,3 +576,7 @@ config MSPEC > If you have an ia64 and you want to enable memory special > operations support (formerly known as fetchop), say Y here, > otherwise say N. > + > +config GENERIC_HWEIGHT > + bool > + default y > -- > 2.19.0 > Ping. -- Sergei
Re: [PATCH] ia64: disable SCHED_STACK_END_CHECK
On Fri, 14 Sep 2018 08:06:17 +0100 Sergei Trofimovich wrote: > SCHED_STACK_END_CHECK assumes stack grows in one direction. > ia64 is a rare case where it is not. > > As a result kernel fails at startup as: > Kernel panic - not syncing: corrupted stack end detected inside scheduler > > The error does not find a real problem: it's register backing store > is written on top of canary value. > > Disable SCHED_STACK_END_CHECK on ia64 as there is no good > place for canary without moving initial stack address. > > CC: Tony Luck > CC: Fenghua Yu > CC: linux-i...@vger.kernel.org > CC: Andrew Morton > CC: linux-kernel@vger.kernel.org > Signed-off-by: Sergei Trofimovich > --- > lib/Kconfig.debug | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug > index 4966c4fbe7f7..a097dfe38d2b 100644 > --- a/lib/Kconfig.debug > +++ b/lib/Kconfig.debug > @@ -1004,7 +1004,7 @@ config SCHEDSTATS > > config SCHED_STACK_END_CHECK > bool "Detect stack corruption on calls to schedule()" > - depends on DEBUG_KERNEL > + depends on DEBUG_KERNEL && !IA64 > default n > help > This option checks for a stack overrun on calls to schedule(). > -- > 2.19.0 > Ping. -- Sergei
[PATCH] ia64: enable GENERIC_HWEIGHT
Noticed as a build failure of a single driver:

    ERROR: "__sw_hweight8" [drivers/net/wireless/mediatek/mt76/mt76.ko] undefined!

CC: Tony Luck
CC: Fenghua Yu
CC: linux-i...@vger.kernel.org
CC: Andrew Morton
CC: linux-kernel@vger.kernel.org
Signed-off-by: Sergei Trofimovich
---
 arch/ia64/Kconfig | 4
 1 file changed, 4 insertions(+)

diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index 8b4a0c1748c0..1a71f92f0b8e 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -576,3 +576,7 @@ config MSPEC
 	  If you have an ia64 and you want to enable memory special
 	  operations support (formerly known as fetchop), say Y here,
 	  otherwise say N.
+
+config GENERIC_HWEIGHT
+	bool
+	default y
-- 
2.19.0
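`__sw_hweight8()` is the generic software population count (number of set bits in a byte) that enabling `GENERIC_HWEIGHT` builds into the kernel. For illustration only, here is a sketch of the SWAR bit counting it is based on, written from memory of `lib/hweight.c` and renamed `sw_hweight8()` so it is not mistaken for the kernel symbol:

```c
#include <assert.h>

/* Count the set bits in the low 8 bits of w, SWAR style. */
static unsigned int sw_hweight8(unsigned int w)
{
	/* Each 2-bit field now holds the count of its two bits. */
	unsigned int res = w - ((w >> 1) & 0x55);
	/* Each 4-bit field now holds the count of its four bits. */
	res = (res & 0x33) + ((res >> 2) & 0x33);
	/* Add the two nibbles and keep the 4-bit result. */
	return (res + (res >> 4)) & 0x0F;
}
```

For example, `sw_hweight8(0xa5)` counts the four set bits of `10100101`. Architectures with a popcount instruction can provide their own `hweight` helpers instead; ia64 was missing the generic fallback.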
[PATCH] ia64: disable SCHED_STACK_END_CHECK
SCHED_STACK_END_CHECK assumes the stack grows in one direction.
ia64 is a rare case where it does not.

As a result the kernel fails at startup with:

    Kernel panic - not syncing: corrupted stack end detected inside scheduler

The error does not point to a real problem: the ia64 register backing
store is written on top of the canary value.

Disable SCHED_STACK_END_CHECK on ia64 as there is no good place for
the canary without moving the initial stack address.

CC: Tony Luck
CC: Fenghua Yu
CC: linux-i...@vger.kernel.org
CC: Andrew Morton
CC: linux-kernel@vger.kernel.org
Signed-off-by: Sergei Trofimovich
---
 lib/Kconfig.debug | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 4966c4fbe7f7..a097dfe38d2b 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1004,7 +1004,7 @@ config SCHEDSTATS
 
 config SCHED_STACK_END_CHECK
 	bool "Detect stack corruption on calls to schedule()"
-	depends on DEBUG_KERNEL
+	depends on DEBUG_KERNEL && !IA64
 	default n
 	help
 	  This option checks for a stack overrun on calls to schedule().
-- 
2.19.0
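A hypothetical, greatly simplified model of why the check misfires. It assumes, following the description above, that the canary sits at the lowest word of the task's stack area, where a down-growing stack would end, while on ia64 the register backing store (RBS) grows upward from that same end and the memory stack grows down from the top. The region size and helper names are made up; `STACK_END_MAGIC` uses the value from `include/uapi/linux/magic.h`.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_END_MAGIC 0x57AC6E9DUL	/* as in include/uapi/linux/magic.h */
#define REGION_WORDS 64			/* made-up size */

/* Hypothetical model of one task's stack area on ia64. */
struct stack_region {
	unsigned long w[REGION_WORDS];
	size_t bsp;	/* next RBS slot, grows upward from w[0] */
	size_t sp;	/* memory stack top, grows downward from the end */
};

static void region_init(struct stack_region *r)
{
	r->w[0] = STACK_END_MAGIC;	/* canary where a down-growing stack ends */
	r->bsp = 0;
	r->sp = REGION_WORDS;
}

/* Spill one stacked register into the RBS. */
static void rbs_spill(struct stack_region *r, unsigned long val)
{
	assert(r->bsp < r->sp);		/* only a real overflow trips this */
	r->w[r->bsp++] = val;
}

/* Push one word onto the memory stack. */
static void mem_push(struct stack_region *r, unsigned long val)
{
	assert(r->sp > r->bsp);
	r->w[--r->sp] = val;
}

/* What SCHED_STACK_END_CHECK effectively tests. */
static int canary_intact(const struct stack_region *r)
{
	return r->w[0] == STACK_END_MAGIC;
}
```

In this model the very first RBS spill clobbers the canary while almost the whole region is still free, so the check reports corruption that is not an overflow.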
Re: [PATCH v2, simpler] ia64: fix ptrace(PTRACE_GETREGS) (unbreaks strace, gdb)
On Fri, 9 Mar 2018 23:15:55 + Sergei Trofimovich wrote:

I tried to explain the breakage mechanics of the unwinder and the gcc code generation quirks in more detail at:

https://trofi.github.io/posts/210-ptrace-and-accidental-boot-fix-on-ia64.html

Hopefully it gives a better intuition for the code changes made by both proposed patches. I personally think the v1 patch is slightly more robust.

-- 
Sergei
Re: x86_64: movntdq rarely stores bad data (movdqu works fine). Kernel bug, fried CPU or glibc bug?
On Sat, 16 Jun 2018 22:22:50 +0100 Sergei Trofimovich wrote:
> TL;DR: on master string/test-memmove glibc test fails on my machine
> and I don't know why. Other tests work fine.
> ...
> This fails:
>     loop {
>         movdqu [src++],%xmm0
>         movntdq %xmm0,[dst++]
>     }
>     sfence
> This works:
>     loop {
>         movdqu [src++],%xmm0
>         movdqu %xmm0,[dst++]
>     }
>     sfence
> ...
> If there is no obvious problems with glibc's memove() or my small test
> what can I do to rule-out/pin-down hardware or kernel problem?

Found the cause: a bad RAM module.

After I tweaked the test to allocate most of the available physical RAM,
I got a fully reproducible failure. I unplugged the RAM modules one by one
and reran the test each time. That way I narrowed it down to one bad chip.
Removing that single bad chip made string/test-memmove pass on this
machine again \o/

Sorry for the noise!

-- 
Sergei
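A minimal sketch of the "allocate most of the physical RAM" tweak, assuming Linux with glibc, where `sysconf(_SC_PHYS_PAGES)` is available as an extension. The helper names are hypothetical and this is not the attached test. A real run would pass a large fraction (say 0.8), allocate that many bytes and call the verifier in a loop while the machine is busy, since the corruption only appeared under heavy load.

```c
#define _GNU_SOURCE /* _SC_PHYS_PAGES is a glibc extension */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* `fraction` of physical RAM in bytes, rounded down to a 16-byte multiple. */
static size_t ram_fraction_bytes(double fraction)
{
    long pages = sysconf(_SC_PHYS_PAGES);
    long page_size = sysconf(_SC_PAGESIZE);
    if (pages <= 0 || page_size <= 0)
        return 0;
    double bytes = (double)pages * (double)page_size * fraction;
    return (size_t)bytes & ~(size_t)15;
}

/* Write each word's index, read everything back, count mismatches.
 * volatile keeps the compiler from folding away the read-back loop. */
static size_t fill_and_verify(volatile uint32_t *buf, size_t words)
{
    for (size_t i = 0; i < words; i++)
        buf[i] = (uint32_t)i;

    size_t bad = 0;
    for (size_t i = 0; i < words; i++)
        if (buf[i] != (uint32_t)i)
            bad++;
    return bad;
}
```

Covering nearly all of RAM matters because a single faulty cell is only hit when the test buffer happens to land on it; a small buffer can run clean for hours on broken hardware.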
x86_64: movntdq rarely stores bad data (movdqu works fine). Kernel bug, fried CPU or glibc bug?
TL;DR: on master, the string/test-memmove glibc test fails on my machine
and I don't know why. Other tests work fine.

$ elf/ld.so --inhibit-cache --library-path . string/test-memmove
        simple_memmove  __memmove_ssse3_rep  __memmove_ssse3  __memmove_sse2_unaligned  __memmove_ia32
string/test-memmove: Wrong result in function __memmove_sse2_unaligned dst "0x7084" src "0x7000" offset "43297733"

https://sourceware.org/git/?p=glibc.git;a=blob;f=string/test-memmove.c;h=64e3651ba40604e47ddf6d633f4d0aea4644f60a;hb=HEAD

Long story:

I've trimmed the __memmove_sse2_unaligned implementation down to
test-memmove-xmm-unaligned.c (attached). It's supposed to report failed
memmove attempts when they happen:

$ gcc -ggdb3 -O2 -m32 test-memmove-xmm-unaligned.c -o test-memmove-xmm-unaligned -Wall && ./test-memmove-xmm-unaligned
Bad result in memmove(dst=0xe7d44110, src=0xe7d44010, len=134217728): offset= 3786689; expected=0039C7C1( 3786689) actual=0039C7C3( 3786691) bit_mismatch=0002; iteration=1
Bad result in memmove(dst=0xe7d44110, src=0xe7d44010, len=134217728): offset= 3786689; expected=0039C7C1( 3786689) actual=0039C7C3( 3786691) bit_mismatch=0002; iteration=3
Bad result in memmove(dst=0xe7d44110, src=0xe7d44010, len=134217728): offset= 5448641; expected=005323C1( 5448641) actual=005323C3( 5448643) bit_mismatch=0002; iteration=5
Bad result in memmove(dst=0xe7d44110, src=0xe7d44010, len=134217728): offset=29022145; expected=01BAD7C1(29022145) actual=01BAD7C3(29022147) bit_mismatch=0002; iteration=9

$ gcc -ggdb3 -O2 -m64 test-memmove-xmm-unaligned.c -o test-memmove-xmm-unaligned -Wall && ./test-memmove-xmm-unaligned
Bad result in memmove(dst=0x7fa4658bf110, src=0x7fa4658bf010, len=134217728): offset=25257857; expected=01816781(25257857) actual=01816783(25257859) bit_mismatch=0002; iteration=43
Bad result in memmove(dst=0x7fa4658bf110, src=0x7fa4658bf010, len=134217728): offset=28109697; expected=01ACEB81(28109697) actual=01ACEB83(28109699) bit_mismatch=0002; iteration=112
Bad result in memmove(dst=0x7fa4658bf110, src=0x7fa4658bf010, len=134217728): offset=18257633; expected=011696E1(18257633) actual=011696E3(18257635) bit_mismatch=0002; iteration=363
Bad result in memmove(dst=0x7fa4658bf110, src=0x7fa4658bf010, len=134217728): offset=26981249; expected=019BB381(26981249) actual=019BB383(26981251) bit_mismatch=0002; iteration=437

Note that it is a single-bit corruption that happens occasionally (not on
every iteration). -m32 is far more error-prone than -m64.

The test roughly implements these two loops:

This fails:
    sfence
    loop {
        movdqu [src++],%xmm0
        movntdq %xmm0,[dst++]
    }
    sfence

This works:
    sfence
    loop {
        movdqu [src++],%xmm0
        movdqu %xmm0,[dst++]
    }
    sfence

Failures happen only on a Sandy Bridge CPU:
    Intel(R) Core(TM) i7-2700K CPU @ 3.50GHz
The kernel is 4.17.0-11928-g2837461dbe6f.

The problem is not reproducible right after a reboot. The machine has to be
heavily loaded before it starts corrupting memory. A few hours of memtest86+
do not reveal any memory failures.

I wonder if anyone else can reproduce this failure, or whether I should
start looking for a new CPU.

From the above it looks as if movntdq does not play well with XMM context
save/restore, and an 'mfence' is missing somewhere in interrupt handling.

If there are no obvious problems with glibc's memmove() or my small test,
what can I do to rule out or pin down a hardware or kernel problem?

Thanks!
-- 
Sergei

/*
  Test as:
    $ gcc -ggdb3 -O2 -m32 test-memmove-xmm-unaligned.c -o test-memmove-xmm-unaligned -Wall && ./test-memmove-xmm-unaligned
  Error example:
    Bad result in memmove(dst=0xd7cf5094, src=0xd7cf5010, len=268435456): offset= 8031729; expected=007A8DF1( 8031729) actual=007A8DF3( 8031731) bit_mismatch=0002; iteration=2
    Bad result in memmove(dst=0xd7cf5094, src=0xd7cf5010, len=268435456): offset=43626993; expected=0299B1F1(43626993) actual=0299B1F3(43626995) bit_mismatch=0002; iteration=3
    Bad result in memmove(dst=0xd7cf5094, src=0xd7cf5010, len=268435456): offset=25404913; expected=0183A5F1(25404913) actual=0183A5F3(25404915) bit_mismatch=0002; iteration=4
    ...
*/

#include <string.h>    /* memmove */
#include <stdlib.h>    /* exit */
#include <stdio.h>     /* fprintf */
#include <sys/mman.h>  /* mlock() */
#include <emmintrin.h> /* movdqu, sfence, movntdq */

typedef unsigned int u32;

static void memmove_si128u (__m128i_u * dest, __m128i_u const *src, size_t items) __attribute__((noinline));
static void memmove_si128u (__m128i_u * dest, __m128i_u const *src, size_t items)
{
    // emulate behaviour of optimised block for __memmove_sse2_unaligned:
    // sfence
    // loop(backwards) {
    //   8x movdqu  mem->%xmm{N}
    //   8x movntdq %xmm{N}->mem
    // }
    // source: https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/i386/i686/multiarch/memcpy-sse2-unaligned.S;h=9aa17de99c9c3415a9b5ac28fd9f1eb4457f916d;hb=HEAD#l244

    // ASSUME: if ((unintptr_t)dest > (unintptr_t)src) {
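The attachment is cut off above, so here is a separate, simplified sketch of the failing loop (one XMM register per iteration instead of eight) written with SSE2 intrinsics: `_mm_loadu_si128` compiles to movdqu, `_mm_stream_si128` to movntdq and `_mm_sfence` to sfence. It also shows how a value like `bit_mismatch=0002` is obtained, as the XOR of the expected and actual words. The function names are hypothetical, x86 with SSE2 is assumed, and the destination must be 16-byte aligned for movntdq.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2: _mm_loadu_si128, _mm_stream_si128, _mm_sfence */
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Copy `items` 16-byte blocks backwards, like the optimised memmove path:
 * unaligned load (movdqu) followed by a non-temporal store (movntdq).
 * `dest` must be 16-byte aligned; overlap is fine as long as dest > src.
 */
static void copy_back_nt(void *dest, const void *src, size_t items)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    _mm_sfence();
    for (size_t i = items; i-- > 0;) {
        __m128i v = _mm_loadu_si128((const __m128i *)(s + 16 * i)); /* movdqu  */
        _mm_stream_si128((__m128i *)(d + 16 * i), v);               /* movntdq */
    }
    _mm_sfence(); /* order the non-temporal stores before later accesses */
}

/* Print mismatching 32-bit words in the style of the test output. */
static size_t report_mismatches(const uint32_t *expected,
                                const uint32_t *actual, size_t words)
{
    size_t bad = 0;
    for (size_t i = 0; i < words; i++) {
        uint32_t flipped = expected[i] ^ actual[i]; /* bits that differ */
        if (flipped != 0) {
            fprintf(stderr,
                    "offset=%zu; expected=%08X actual=%08X bit_mismatch=%04X\n",
                    i, (unsigned)expected[i], (unsigned)actual[i],
                    (unsigned)flipped);
            bad++;
        }
    }
    return bad;
}
```

On healthy hardware the copy should never report a mismatch; as the follow-up in this thread found, a single flipped bit at a varying offset is more consistent with a failing RAM cell than with the instruction sequence itself.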
Re: [PATCH] modify one dead link
On Tue, 20 Mar 2018 10:54:22 -0400 Dongliang Mu wrote:

> -# hg clone http://xenbits.xensource.com/ext/ia64/xen-unstable.hg
> +# hg clone http://xenbits.xensource.com/ext/ia64/xen-unstable
>  # cd xen-unstable.hg
>  # hg clone http://xenbits.xensource.com/ext/ia64/linux-2.6.18-xen.hg

You will need to fix the 'cd' as well:
    cd xen-unstable

Otherwise looks good.

-- 
Sergei
[PATCH v2, simpler] ia64: fix ptrace(PTRACE_GETREGS) (unbreaks strace, gdb)
The strace breakage looks like this:
    ./strace: get_regs: get_regs_error: Input/output error

It happens because ia64 needs to load unwind tables to read certain
registers in 'PTRACE_GETREGS'. Unwind tables fail to load at kernel
startup due to a GCC quirk on the following code (logged as PR 84184):

    extern char __end_unwind[];
    const struct unw_table_entry *end = (struct unw_table_entry *)table_end;
    table->end = segment_base + end[-1].end_offset;

GCC does not generate correct code for this single memory reference
after constant propagation.

Two triggers are required for bad code generation:
- '__end_unwind' has lower alignment (char) than
  'struct unw_table_entry' (8).
- the symbol offset is negative.

This commit works around it by disabling inlining of init_unwind_table().
This way we avoid const-propagation of '__end_unwind' and pass the
address via a register.

Tested in ski (emulator) and on rx2600, rx3600 (real hardware).
In the case of the rx2600 it also unbreaks booting.

This patch is a lighter version of the patch
https://lkml.org/lkml/2018/2/2/914

CC: Tony Luck <tony.l...@intel.com>
CC: Fenghua Yu <fenghua...@intel.com>
CC: linux-i...@vger.kernel.org
CC: linux-kernel@vger.kernel.org
Bug: https://github.com/strace/strace/issues/33
Bug: https://gcc.gnu.org/PR84184
Reported-by: Émeric Maschino <emeric.masch...@gmail.com>
Tested-by: stanton_a...@mail.com
Signed-off-by: Sergei Trofimovich <sly...@gentoo.org>
---
 arch/ia64/kernel/unwind.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/ia64/kernel/unwind.c b/arch/ia64/kernel/unwind.c
index e04efa088902..a18190bc99a9 100644
--- a/arch/ia64/kernel/unwind.c
+++ b/arch/ia64/kernel/unwind.c
@@ -2078,7 +2078,14 @@ unw_init_from_blocked_task (struct unw_frame_info *info, struct task_struct *t)
 }
 EXPORT_SYMBOL(unw_init_from_blocked_task);
 
-static void
+/*
+ * We use 'noinline' to evade GCC bug https://gcc.gnu.org/PR84184
+ * where gcc code generator emits incorrect code when '__end_unwind'
+ * is const-propagated to 'end[-1].end_offset' and gcc generates
+ * incorrect code. The trigger there is negative offset relative
+ * to externally-defined symbol.
+ */
+noinline static void
 init_unwind_table (struct unw_table *table, const char *name, unsigned long segment_base,
 		   unsigned long gp, const void *table_start, const void *table_end)
 {
-- 
2.16.2
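The affected access pattern and the workaround can be sketched in portable C. This is a reduced, hypothetical version of init_unwind_table(): the struct layout (three 64-bit offsets) and the function name are assumptions for illustration, and it cannot reproduce PR84184 on other architectures. It only shows where the negative index appears and where the noinline attribute (which the kernel's `noinline` macro wraps) goes, so the caller's `__end_unwind` address arrives as an ordinary argument instead of a propagated constant.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of an unwind table entry: three 64-bit offsets. */
struct unw_table_entry {
    uint64_t start_offset;
    uint64_t end_offset;
    uint64_t info_offset;
};

/* Hypothetical, trimmed-down table descriptor. */
struct unw_table {
    unsigned long start;
    unsigned long end;
};

/*
 * Reduced sketch of init_unwind_table(): the end of the covered range is
 * read from the last entry through a negative index. noinline keeps the
 * caller's end-of-table symbol from being constant-propagated into here.
 */
__attribute__((noinline)) static void
init_table_sketch(struct unw_table *table, unsigned long segment_base,
                  const void *table_start, const void *table_end)
{
    const struct unw_table_entry *start = table_start;
    const struct unw_table_entry *end = table_end;

    table->start = segment_base + start[0].start_offset;
    table->end = segment_base + end[-1].end_offset; /* negative offset */
}
```

The design trade-off is the one the thread discusses: noinline changes how the compiler sees the call, while the v1 patch changes what the compiler knows about the symbol's alignment.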
[PATCH] ia64: doc: tweak whitespace for 'console=' parameter
CC: Tony Luck <tony.l...@intel.com>
CC: Fenghua Yu <fenghua...@intel.com>
CC: linux-i...@vger.kernel.org
CC: linux-kernel@vger.kernel.org
Signed-off-by: Sergei Trofimovich <sly...@gentoo.org>
---
 Documentation/ia64/serial.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/ia64/serial.txt b/Documentation/ia64/serial.txt
index 6869c73de4e2..a63d2c54329b 100644
--- a/Documentation/ia64/serial.txt
+++ b/Documentation/ia64/serial.txt
@@ -111,7 +111,7 @@ TROUBLESHOOTING SERIAL CONSOLE PROBLEMS
 
     - If you don't have an HCDP, the kernel doesn't know where
       your console lives until the driver discovers serial
-      devices. Use "console=uart, io,0x3f8" (or appropriate
+      devices. Use "console=uart,io,0x3f8" (or appropriate
       address for your machine).
 
     Kernel and init script output works fine, but no "login:" prompt:
-- 
2.16.1
Re: [PATCH] ia64: fix ptrace(PTRACE_GETREGS) (unbreaks strace, gdb)
On Fri, 2 Feb 2018 23:02:20 + Sergei Trofimovich <sly...@gentoo.org> wrote:

> On Fri, 2 Feb 2018 14:22:32 -0800
> "Luck, Tony" <tony.l...@intel.com> wrote:
> 
> > On Fri, Feb 02, 2018 at 10:12:24PM +, Sergei Trofimovich wrote:
> > > The strace breakage looks like that:
> > > ./strace: get_regs: get_regs_error: Input/output error
> > > 
> > > It happens because ia64 needs to load unwind tables
> > > to read certain registers. Unwind tables fail to load
> > > due to GCC quirk on the following code:
> > > 
> > > extern char __end_unwind[];
> > > const struct unw_table_entry *end = (struct unw_table_entry *)table_end;
> > > table->end = segment_base + end[-1].end_offset;
> > > 
> > > GCC does not generate correct code for this single memory
> > > reference after constant propagation (see https://gcc.gnu.org/PR84184).
> > 
> > I'm not seeing this ... probably because I build with
> > a pre-historic 4.3.4 version of gcc.
> > 
> > Do you know which version(s) are affected? I'm not looking
> > for an exhaustive list, just the one on which you found this
> > would be good.
> > 
> > -Tony
> 
> Original bug https://bugs.gentoo.org/518130 claims regression appeared
> around gcc-4.5. Locally am seeing the problem with gcc-6.4.0, gcc-7.2.0 and
> gcc-8 (HEAD).

Another report of the patch's positive effect: an rx2600 boots
successfully with this patch (it did not without it; my guess is an
early access fault at a bad address):
https://bugs.gentoo.org/579278#c13

Tested-by: stanton_a...@mail.com

-- 
Sergei
Re: [PATCH] ia64: fix ptrace(PTRACE_GETREGS) (unbreaks strace, gdb)
On Fri, 2 Feb 2018 14:22:32 -0800 "Luck, Tony" <tony.l...@intel.com> wrote:

> On Fri, Feb 02, 2018 at 10:12:24PM +, Sergei Trofimovich wrote:
> > The strace breakage looks like that:
> > ./strace: get_regs: get_regs_error: Input/output error
> > 
> > It happens because ia64 needs to load unwind tables
> > to read certain registers. Unwind tables fail to load
> > due to GCC quirk on the following code:
> > 
> > extern char __end_unwind[];
> > const struct unw_table_entry *end = (struct unw_table_entry *)table_end;
> > table->end = segment_base + end[-1].end_offset;
> > 
> > GCC does not generate correct code for this single memory
> > reference after constant propagation (see https://gcc.gnu.org/PR84184).
> 
> I'm not seeing this ... probably because I build with
> a pre-historic 4.3.4 version of gcc.
> 
> Do you know which version(s) are affected? I'm not looking
> for an exhaustive list, just the one on which you found this
> would be good.
> 
> -Tony

The original bug https://bugs.gentoo.org/518130 claims the regression
appeared around gcc-4.5. Locally I am seeing the problem with gcc-6.4.0,
gcc-7.2.0 and gcc-8 (HEAD).

-- 
Sergei
[PATCH] ia64: fix ptrace(PTRACE_GETREGS) (unbreaks strace, gdb)
The strace breakage looks like this:
    ./strace: get_regs: get_regs_error: Input/output error

It happens because ia64 needs to load unwind tables to read certain
registers. Unwind tables fail to load due to a GCC quirk on the
following code:

    extern char __end_unwind[];
    const struct unw_table_entry *end = (struct unw_table_entry *)table_end;
    table->end = segment_base + end[-1].end_offset;

GCC does not generate correct code for this single memory reference
after constant propagation (see https://gcc.gnu.org/PR84184).

Two triggers are required for bad code generation:
- '__end_unwind' has lower alignment (char) than
  'struct unw_table_entry' (8).
- the symbol offset is negative.

This commit works around it by fixing the alignment of '__end_unwind'.
While at it, use hidden symbols to generate shorter gp-relative
relocations.

CC: Tony Luck <tony.l...@intel.com>
CC: Fenghua Yu <fenghua...@intel.com>
CC: linux-i...@vger.kernel.org
CC: linux-kernel@vger.kernel.org
Bug: https://github.com/strace/strace/issues/33
Bug: https://gcc.gnu.org/PR84184
Reported-by: Émeric Maschino <emeric.masch...@gmail.com>
Signed-off-by: Sergei Trofimovich <sly...@gentoo.org>
---
 arch/ia64/include/asm/sections.h |  1 -
 arch/ia64/kernel/unwind.c        | 15 ++++++++++++++-
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/ia64/include/asm/sections.h b/arch/ia64/include/asm/sections.h
index f3481408594e..0fc4f1757a44 100644
--- a/arch/ia64/include/asm/sections.h
+++ b/arch/ia64/include/asm/sections.h
@@ -24,7 +24,6 @@ extern char __start_gate_mckinley_e9_patchlist[], __end_gate_mckinley_e9_patchli
 extern char __start_gate_vtop_patchlist[], __end_gate_vtop_patchlist[];
 extern char __start_gate_fsyscall_patchlist[], __end_gate_fsyscall_patchlist[];
 extern char __start_gate_brl_fsys_bubble_down_patchlist[], __end_gate_brl_fsys_bubble_down_patchlist[];
-extern char __start_unwind[], __end_unwind[];
 extern char __start_ivt_text[], __end_ivt_text[];
 
 #undef dereference_function_descriptor
diff --git a/arch/ia64/kernel/unwind.c b/arch/ia64/kernel/unwind.c
index e04efa088902..025ba6700790 100644
--- a/arch/ia64/kernel/unwind.c
+++ b/arch/ia64/kernel/unwind.c
@@ -2243,7 +2243,20 @@ __initcall(create_gate_table);
 void __init
 unw_init (void)
 {
-	extern char __gp[];
+	#define __ia64_hidden __attribute__((visibility("hidden")))
+	/*
+	 * We use hidden symbols to generate more efficient code using
+	 * gp-relative addressing.
+	 */
+	extern char __gp[] __ia64_hidden;
+	/*
+	 * Unwind tables need to have proper alignment as init_unwind_table()
+	 * uses negative offsets against '__end_unwind'.
+	 * See https://gcc.gnu.org/PR84184
+	 */
+	extern const struct unw_table_entry __start_unwind[] __ia64_hidden;
+	extern const struct unw_table_entry __end_unwind[] __ia64_hidden;
+	#undef __ia64_hidden
 	extern void unw_hash_index_t_is_too_narrow (void);
 	long i, off;
 
-- 
2.16.1
Re: [PATCH v3] ia64: fix module loading for gcc-5.4
On Sat, 8 Apr 2017 20:53:18 +0100 Sergei Trofimovich <sly...@gentoo.org> wrote:

> Starting from gcc-5.4+ gcc generates MLX
> instructions in more cases to refer local
> symbols:
>     https://gcc.gnu.org/PR60465
> 
> That caused ia64 module loader to choke
> on such instructions:
>     fuse: invalid slot number 1 for IMM64
> 
> Linux kernel used to handle only case where
> relocation pointed to slot=2 instruction in
> the bundle. That limitation was fixed in linux by
> commit 9c184a073bfd ("[IA64] Fix 2.6 kernel for the new ia64 assembler")
> See
>     http://sources.redhat.com/bugzilla/show_bug.cgi?id=1433
> 
> This change lifts the slot=2 restriction from
> linux kernel module loader.
> 
> Tested on 'fuse' and 'btrfs' kernel modules.
> 
> Cc: Markus Elfring <elfr...@users.sourceforge.net>
> Cc: H. J. Lu <hjl.to...@gmail.com>
> Cc: Tony Luck <tony.l...@intel.com>
> Cc: Fenghua Yu <fenghua...@intel.com>
> Cc: linux-i...@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: Andrew Morton <a...@linux-foundation.org>
> Bug: https://bugs.gentoo.org/601014
> Tested-by: Émeric MASCHINO <emeric.masch...@gmail.com>
> Signed-off-by: Sergei Trofimovich <sly...@gentoo.org>
> ---
> Change since v1: added 'Tested-by'
> Change since v2: checkpatched, fixed typos by found by Markus Elfring

Ping :)

-- 
Sergei
Re: [PATCH v3] ia64: fix module loading for gcc-5.4+
On Mon, 10 Apr 2017 19:23:28 +0200 SF Markus Elfring wrote:
> > -	if (slot(insn) != 2) {
> > +	if (slot(insn) != 1 && slot(insn) != 2) {
>
> +	int const s = slot(insn);
> +	if (s < 1 || s > 2) {
>
> Do run time characteristics matter for such a condition check here?

The check runs only at kernel module load time, once per relocation.
My guess would be "not critical at all". slot() is a pure arithmetic
static inline function.

You can compare the assembly output before and after your change, or
measure the difference yourself using the 'ski' emulator. That is, for
example, how I debugged and tested the patch:
http://trofi.github.io/posts/199-ia64-machine-emulation.html

-- 
Sergei
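To make the comparison concrete, here is a minimal host-side sketch (not kernel code) that checks the patch's condition against the suggested `s < 1 || s > 2` form for every value a 2-bit slot field can hold. The function names are hypothetical. The only assumption is that the slot number fits in two bits (0..3).

```c
#include <assert.h>
#include <stdbool.h>

/* Condition used in the patch: reject anything that is not slot 1 or 2. */
static bool reject_patch_form(int slot)
{
	return slot != 1 && slot != 2;
}

/* Alternative suggested in review: compute the slot once, then range-check. */
static bool reject_range_form(int slot)
{
	return slot < 1 || slot > 2;
}

/* Returns true if both forms agree for every 2-bit slot value (0..3). */
static bool forms_agree(void)
{
	for (int s = 0; s <= 3; s++)
		if (reject_patch_form(s) != reject_range_form(s))
			return false;
	return true;
}
```

The two predicates reject the same values, and slot() is a pure inline function, so a compiler may well reduce both forms to similar code. Comparing the generated assembly, as suggested above, is the way to confirm this for a particular gcc version.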
Re: ia64: fix module loading for gcc-5.4
On Sun, 9 Apr 2017 11:02:43 +0200 SF Markus Elfring wrote:
> >>> That caused ia64 module loader to choke
> >>> on such instructions:
> >>> fuse: invalid slot number 1 for IMM64
> >>
> >> Why does it matter to check such a value?
> >
> > I'm not sure I follow the question. Is your question about
> > linux kernel relocation code handler, gcc or ia64 instruction format?
>
> I am just curious if this source code could also work without
> the mentioned check.

It should work for valid code, yes. The downside of removing the check is
that a malformed relocation would go unnoticed (say, when the instruction
"address" is wrong because of an obscure toolchain bug). In that case
apply_imm64() would silently corrupt unrelated memory instead of failing
loudly.

> Would it make sense to check more than two values there?

AFAIU ia64 does not allow encoding imm64/imm60 instructions that span
slot=0 at all. ia64_patch_imm64() can handle only imm64 instructions that
span both slot 1 and slot 2 at the same time, so it can accept either a
slot=1 or a slot=2 "address". Anything else would be malformed.

-- 
Sergei
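The reply above argues that the check is what prevents silent memory corruption. The sketch below shows that guard in isolation. It assumes, as the kernel's slot() helper appears to, that an ia64 "instruction address" is a 16-byte-aligned bundle address with the slot number in its low two bits. All names are hypothetical and nothing here writes real memory.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decoding: bundles are 16 bytes, so the low 4 bits of the
 * encoded address are free; the slot number is kept in the low 2 bits. */
static uint64_t bundle_base(uint64_t insn_addr)
{
	return insn_addr & ~(uint64_t)0xf;
}

static int slot_of(uint64_t insn_addr)
{
	return (int)(insn_addr & 0x3);
}

/* An imm64/imm60 fixup may name either half of the MLX pair (slot 1 or
 * slot 2). Any other slot means a malformed relocation, which is refused
 * before a bundle is touched. Returns 1 and stores the bundle base on
 * success, 0 on rejection. */
static int check_imm64_target(uint64_t insn_addr, uint64_t *base_out)
{
	int s = slot_of(insn_addr);

	if (s != 1 && s != 2)
		return 0;	/* refuse instead of patching a wrong bundle */
	*base_out = bundle_base(insn_addr);
	return 1;
}
```

With the check in place, a slot=0 or slot=3 address is reported as an error. Without it, the patching code would go ahead and rewrite whatever bundle the bad address happened to point to.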
Re: [PATCH v3] ia64: fix module loading for gcc-5.4
On Sun, 9 Apr 2017 10:27:52 +0200 SF Markus Elfring wrote:
> > That caused ia64 module loader to choke
> > on such instructions:
> > fuse: invalid slot number 1 for IMM64
>
> Why does it matter to check such a value?

I'm not sure I follow the question. Is your question about the linux
kernel relocation code handler, gcc, or the ia64 instruction format?

-- 
Sergei
[PATCH v3] ia64: fix module loading for gcc-5.4
Starting with gcc-5.4, gcc generates MLX instructions in more cases to
refer to local symbols:
https://gcc.gnu.org/PR60465

That caused the ia64 module loader to choke on such instructions:
fuse: invalid slot number 1 for IMM64

The Linux kernel used to handle only the case where a relocation pointed
to the slot=2 instruction in the bundle. That limitation was fixed in
Linux by commit 9c184a073bfd ("[IA64] Fix 2.6 kernel for the new ia64 assembler")
See http://sources.redhat.com/bugzilla/show_bug.cgi?id=1433

This change lifts the slot=2 restriction from the Linux kernel module
loader.

Tested on 'fuse' and 'btrfs' kernel modules.

Cc: Markus Elfring <elfr...@users.sourceforge.net>
Cc: H. J. Lu <hjl.to...@gmail.com>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Fenghua Yu <fenghua...@intel.com>
Cc: linux-i...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Andrew Morton <a...@linux-foundation.org>
Bug: https://bugs.gentoo.org/601014
Tested-by: Émeric MASCHINO <emeric.masch...@gmail.com>
Signed-off-by: Sergei Trofimovich <sly...@gentoo.org>
---
Change since v1: added 'Tested-by'
Change since v2: checkpatched, fixed typos found by Markus Elfring

 arch/ia64/kernel/module.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/ia64/kernel/module.c b/arch/ia64/kernel/module.c
index 6ab0ae7d6535..d1d945c6bd05 100644
--- a/arch/ia64/kernel/module.c
+++ b/arch/ia64/kernel/module.c
@@ -153,7 +153,7 @@ slot (const struct insn *insn)
 static int
 apply_imm64 (struct module *mod, struct insn *insn, uint64_t val)
 {
-	if (slot(insn) != 2) {
+	if (slot(insn) != 1 && slot(insn) != 2) {
 		printk(KERN_ERR "%s: invalid slot number %d for IMM64\n",
 		       mod->name, slot(insn));
 		return 0;
@@ -165,7 +165,7 @@ apply_imm64 (struct module *mod, struct insn *insn, uint64_t val)
 static int
 apply_imm60 (struct module *mod, struct insn *insn, uint64_t val)
 {
-	if (slot(insn) != 2) {
+	if (slot(insn) != 1 && slot(insn) != 2) {
 		printk(KERN_ERR "%s: invalid slot number %d for IMM60\n",
 		       mod->name, slot(insn));
 		return 0;
-- 
2.12.0
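The behavioural change in this patch fits in two predicates. Below is a minimal host-side sketch of the slot check before and after the change. It includes a stand-in for the loader's diagnostic so the reported "fuse: invalid slot number 1 for IMM64" failure can be reproduced. The function names are hypothetical, and report() only mimics the printk() format string shown in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Slot check before the patch: only slot 2 was accepted. */
static bool imm64_slot_ok_before(int slot)
{
	return slot == 2;
}

/* Slot check after the patch: slot 1 or slot 2 is accepted. */
static bool imm64_slot_ok_after(int slot)
{
	return slot == 1 || slot == 2;
}

/* Mimics the loader's error message when a check fails. */
static void report(const char *mod, int slot, bool ok)
{
	if (!ok)
		printf("%s: invalid slot number %d for IMM64\n", mod, slot);
}
```

Calling `report("fuse", 1, imm64_slot_ok_before(1))` prints `fuse: invalid slot number 1 for IMM64`. The same call with `imm64_slot_ok_after(1)` prints nothing. Slots 0 and 3 are still rejected by both versions.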
[PATCH] alpha: cleanup: remove __NR_sys_epoll_*, leave __NR_epoll_*
__NR_sys_epoll_create and friends are alpha-specific while
__NR_epoll_create is a generic name for other arches.

Cc: Richard Henderson <r...@twiddle.net>
Cc: Ivan Kokshaysky <i...@jurassic.park.msu.ru>
Cc: Matt Turner <matts...@gmail.com>
Cc: linux-al...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Sergei Trofimovich <sly...@gentoo.org>
---
 arch/alpha/include/uapi/asm/unistd.h | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/arch/alpha/include/uapi/asm/unistd.h b/arch/alpha/include/uapi/asm/unistd.h
index aa33bf5aacb6..650d339a8df6 100644
--- a/arch/alpha/include/uapi/asm/unistd.h
+++ b/arch/alpha/include/uapi/asm/unistd.h
@@ -366,11 +366,6 @@
 #define __NR_epoll_create		407
 #define __NR_epoll_ctl			408
 #define __NR_epoll_wait			409
-/* Feb 2007: These three sys_epoll defines shouldn't be here but culling
- * them would break userspace apps ... we'll kill them off in 2010 :) */
-#define __NR_sys_epoll_create		__NR_epoll_create
-#define __NR_sys_epoll_ctl		__NR_epoll_ctl
-#define __NR_sys_epoll_wait		__NR_epoll_wait
 #define __NR_remap_file_pages		410
 #define __NR_set_tid_address		411
 #define __NR_restart_syscall		412
-- 
2.12.2
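Userspace that issued the raw syscall under one of the removed alpha-only aliases can use the generic names, which are available on every architecture. The sketch below assumes a Linux/glibc build environment. raw_epoll_create() is a hypothetical helper. The epoll_create1 fallback is there for architectures that use the generic syscall table (such as arm64) and never defined plain __NR_epoll_create.

```c
#define _GNU_SOURCE		/* for syscall() in <unistd.h> */
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Create an epoll instance through the raw syscall interface, spelling
 * only the generic __NR_* names (no __NR_sys_epoll_* aliases). */
static int raw_epoll_create(void)
{
#ifdef __NR_epoll_create
	return (int)syscall(__NR_epoll_create, 1);	/* size hint must be > 0 */
#else
	return (int)syscall(__NR_epoll_create1, 0);	/* no flags */
#endif
}
```

In ordinary code, the glibc wrappers epoll_create(2) and epoll_create1(2) from <sys/epoll.h> remove the need for raw syscall numbers entirely. This sketch only matters for callers that deliberately bypass libc.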
[PATCH (resend)] ia64: fix module loading for gcc-5.4
Starting with gcc-5.4, gcc generates MLX instructions in more cases to
refer to local symbols:
https://gcc.gnu.org/bugzilla/60465

That caused the ia64 module loader to choke on such instructions:
fuse: invalid slot number 1 for IMM64

The Linux kernel used to handle only the case where a relocation pointed
to the slot=2 instruction in the bundle. That limitation was fixed in
Linux by commit 9c184a073bfd650cc791956d6ca79725bb682716.
See http://sources.redhat.com/bugzilla/show_bug.cgi?id=1433

This change lifts the slot=2 restriction from the Linux kernel module
loader.

Tested on 'fuse' and 'btrfs' kernel modules.

Cc: H. J. Lu <hjl.to...@gmail.com>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Fenghua Yu <fenghua...@intel.com>
Cc: linux-i...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Andrew Morton <a...@linux-foundation.org>
Bug: https://bugs.gentoo.org/601014
Tested-by: Émeric MASCHINO <emeric.masch...@gmail.com>
Signed-off-by: Sergei Trofimovich <sly...@gentoo.org>
---
Change since v1: added 'Tested-by'

 arch/ia64/kernel/module.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/ia64/kernel/module.c b/arch/ia64/kernel/module.c
index 6ab0ae7d6535..d1d945c6bd05 100644
--- a/arch/ia64/kernel/module.c
+++ b/arch/ia64/kernel/module.c
@@ -153,7 +153,7 @@ slot (const struct insn *insn)
 static int
 apply_imm64 (struct module *mod, struct insn *insn, uint64_t val)
 {
-	if (slot(insn) != 2) {
+	if (slot(insn) != 1 && slot(insn) != 2) {
 		printk(KERN_ERR "%s: invalid slot number %d for IMM64\n",
 		       mod->name, slot(insn));
 		return 0;
@@ -165,7 +165,7 @@ apply_imm64 (struct module *mod, struct insn *insn, uint64_t val)
 static int
 apply_imm60 (struct module *mod, struct insn *insn, uint64_t val)
 {
-	if (slot(insn) != 2) {
+	if (slot(insn) != 1 && slot(insn) != 2) {
 		printk(KERN_ERR "%s: invalid slot number %d for IMM60\n",
 		       mod->name, slot(insn));
 		return 0;
-- 
2.12.0
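Some background on why only slots 1 and 2 are meaningful for an IMM64 fixup. A 128-bit ia64 bundle holds a 5-bit template followed by three 41-bit instruction slots. In an MLX bundle, the long immediate of an instruction such as movl is split between the L slot (slot 1) and the X slot (slot 2). The sketch below computes that layout. The macro and function names are illustrative only.

```c
#include <assert.h>

#define IA64_TEMPLATE_BITS	5	/* bits 0..4 select the bundle template */
#define IA64_SLOT_BITS		41	/* each of the three slots is 41 bits wide */

/* First bit of slot s (0..2) inside a 128-bit bundle. */
static int slot_first_bit(int s)
{
	return IA64_TEMPLATE_BITS + IA64_SLOT_BITS * s;
}

/* Last bit (inclusive) of slot s. */
static int slot_last_bit(int s)
{
	return slot_first_bit(s) + IA64_SLOT_BITS - 1;
}
```

Slot 0 occupies bits 5..45, slot 1 bits 46..86 and slot 2 bits 87..127, which fills the bundle exactly. The L+X pair therefore covers bits 46..127. A relocation naming either slot 1 or slot 2 refers to the same long immediate, which is consistent with the loader now accepting both slot numbers.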