Re: 5.?? regression: strace testsuite OOpses kernel on ia64

2021-04-09 Thread Sergei Trofimovich
On Tue, 23 Feb 2021 18:53:21 +
Sergei Trofimovich  wrote:

> The crash seems to be related to sock_filter-v test from strace:
> https://github.com/strace/strace/blob/master/tests/seccomp-filter-v.c
> 
> Here is an OOps:
> 
> [  818.089904] BUG: Bad page map in process sock_filter-v  pte:0001 
> pmd:118580001
> [  818.089904] page:e6a429c8 refcount:1 mapcount:-1 
> mapping: index:0x0 pfn:0x0
> [  818.089904] flags: 0x1000(reserved)
> [  818.089904] raw: 1000 a0004008 a0004008 
> 
> [  818.089904] raw:   0001fffe
> [  818.089904] page dumped because: bad pte
> [  818.089904] addr: vm_flags:04044011 
> anon_vma: mapping: index:0
> [  818.095483] file:(null) fault:0x0 mmap:0x0 readpage:0x0
> [  818.095483] CPU: 0 PID: 5990 Comm: sock_filter-v Not tainted 
> 5.11.0-3-gbfa5a4929c90 #57
> [  818.095483] Hardware name: hp server rx3600   , BIOS 04.03 
>04/08/2008
> [  818.095483]
> [  818.095483] Call Trace:
> [  818.095483]  [] show_stack+0x90/0xc0
> [  818.095483] sp=e00118707bb0 
> bsp=e001187013c0
> [  818.095483]  [] dump_stack+0x120/0x160
> [  818.095483] sp=e00118707d80 
> bsp=e00118701348
> [  818.095483]  [] print_bad_pte+0x300/0x3a0
> [  818.095483] sp=e00118707d80 
> bsp=e001187012e0
> [  818.099483]  [] unmap_page_range+0xa90/0x11a0
> [  818.099483] sp=e00118707d80 
> bsp=e00118701140
> [  818.099483]  [] unmap_vmas+0xc0/0x100
> [  818.099483] sp=e00118707da0 
> bsp=e00118701108
> [  818.099483]  [] exit_mmap+0x150/0x320
> [  818.099483] sp=e00118707da0 
> bsp=e001187010d8
> [  818.099483]  [] mmput+0x60/0x200
> [  818.099483] sp=e00118707e20 
> bsp=e001187010b0
> [  818.103482]  [] do_exit+0x6f0/0x18a0
> [  818.103482] sp=e00118707e20 
> bsp=e00118701038
> [  818.103482]  [] do_group_exit+0x90/0x2a0
> [  818.103482] sp=e00118707e30 
> bsp=e00118700ff0
> [  818.103482]  [] sys_exit_group+0x20/0x40
> [  818.103482] sp=e00118707e30 
> bsp=e00118700f98
> [  818.107482]  [] ia64_trace_syscall+0xf0/0x130
> [  818.107482] sp=e00118707e30 
> bsp=e00118700f98
> [  818.107482]  [] ia64_ivt+0x00040720/0x400
> [  818.107482] sp=e00118708000 
> bsp=e00118700f98
> [  818.115482] Disabling lock debugging due to kernel taint
> [  818.115482] BUG: Bad rss-counter state mm:2eec6412 
> type:MM_FILEPAGES val:-1
> [  818.132256] Unable to handle kernel NULL pointer dereference (address 
> 0068)
> [  818.133904] sock_filter-v-X[5999]: Oops 11012296146944 [1]
> [  818.133904] Modules linked in: acpi_ipmi ipmi_si usb_storage e1000 
> ipmi_devintf ipmi_msghandler rtc_efi
> [  818.133904]
> [  818.133904] CPU: 0 PID: 5999 Comm: sock_filter-v-X Tainted: GB 
> 5.11.0-3-gbfa5a4929c90 #57
> [  818.133904] Hardware name: hp server rx3600   , BIOS 04.03 
>04/08/2008
> [  818.133904] psr : 121008026010 ifs : 8288 ip  : 
> []Tainted: GB 
> (5.11.0-3-gbfa5a4929c90)
> [  818.133904] ip is at bpf_prog_free+0x21/0xe0
> [  818.133904] unat:  pfs : 0307 rsc : 
> 0003
> [  818.133904] rnat:  bsps:  pr  : 
> 00106a5a51665965
> [  818.133904] ldrs:  ccv : 12088904 fpsr: 
> 0009804c8a70033f
> [  818.133904] csd :  ssd : 
> [  818.133904] b0  : a00100d54080 b6  : a00100d53fe0 b7  : 
> a001cef0
> [  818.133904] f6  : 0ffefb0c50daa1b67f89a f7  : 0ffed8b3e4fdb0800
> [  818.133904] f8  : 10017fbd1bc00 f9  : 1000eb95f
> [  818.133904] f10 : 10008ade20716a6c83cc1 f11 : 1003e02b7
> [  818.133904] r1  : a0010176b300 r2  : a0028004 r3  : 
> 
> [  818.133904] r8  : 0008 r9  : e0011873f800 r10 : 
> e00102c18600
> [  818.133904] r11 : e00102c19600 r12 : e0011873f7f0 r13 : 
> e00118738000
> [  818.133904] r14 : 0068 r15 : a0028028 r16 : 

[PATCH v2] mm: page_poison: print page info when corruption is caught

2021-04-07 Thread Sergei Trofimovich
When page_poison detects page corruption it's useful to see who
freed a page recently to have a guess where write-after-free
corruption happens.

After this change corruption report has extra page data.
Example report from real corruption (includes only page_owner part):

pagealloc: memory corruption
e0014cd61d10: 11 00 00 00 00 00 00 00 30 1d d2 ff ff 0f 00 60
e0014cd61d20: b0 1d d2 ff ff 0f 00 60 90 fe 1c 00 08 00 00 20
...
CPU: 1 PID: 220402 Comm: cc1plus Not tainted 5.12.0-rc5-00107-g9720c6f59ecf #245
Hardware name: hp server rx3600, BIOS 04.03 04/08/2008
...
Call Trace:
 [] show_stack+0x90/0xc0
 [] dump_stack+0x150/0x1c0
 [] __kernel_unpoison_pages+0x410/0x440
 [] get_page_from_freelist+0x1460/0x2ca0
 [] __alloc_pages_nodemask+0x3c0/0x660
 [] alloc_pages_vma+0xb0/0x500
 [] __handle_mm_fault+0x1230/0x1fe0
 [] handle_mm_fault+0x310/0x4e0
 [] ia64_do_page_fault+0x1f0/0xb80
 [] ia64_leave_kernel+0x0/0x270
page_owner tracks the page as freed
page allocated via order 0, migratetype Movable,
  gfp_mask 0x100dca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), pid 37, ts 
8173444098740
 __reset_page_owner+0x40/0x200
 free_pcp_prepare+0x4d0/0x600
 free_unref_page+0x20/0x1c0
 __put_page+0x110/0x1a0
 migrate_pages+0x16d0/0x1dc0
 compact_zone+0xfc0/0x1aa0
 proactive_compact_node+0xd0/0x1e0
 kcompactd+0x550/0x600
 kthread+0x2c0/0x2e0
 call_payload+0x50/0x80

Here we can see that page was freed by page migration but something
managed to write to it afterwards.

CC: Vlastimil Babka 
CC: Andrew Morton 
CC: linux...@kvack.org
Signed-off-by: Sergei Trofimovich 
---
Change since v1: use more generic 'dump_page()' suggested by Vlastimil
Should supersede existing 
mm-page_poison-print-page-owner-info-when-corruption-is-caught.patch

 mm/page_poison.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/mm/page_poison.c b/mm/page_poison.c
index 65cdf844c8ad..df03126f3b2b 100644
--- a/mm/page_poison.c
+++ b/mm/page_poison.c
@@ -2,6 +2,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -45,7 +46,7 @@ static bool single_bit_flip(unsigned char a, unsigned char b)
return error && !(error & (error - 1));
 }
 
-static void check_poison_mem(unsigned char *mem, size_t bytes)
+static void check_poison_mem(struct page *page, unsigned char *mem, size_t bytes)
 {
static DEFINE_RATELIMIT_STATE(ratelimit, 5 * HZ, 10);
unsigned char *start;
@@ -70,6 +71,7 @@ static void check_poison_mem(unsigned char *mem, size_t bytes)
print_hex_dump(KERN_ERR, "", DUMP_PREFIX_ADDRESS, 16, 1, start,
end - start + 1, 1);
dump_stack();
+   dump_page(page, "pagealloc: corrupted page details");
 }
 
 static void unpoison_page(struct page *page)
@@ -82,7 +84,7 @@ static void unpoison_page(struct page *page)
 * that is freed to buddy. Thus no extra check is done to
 * see if a page was poisoned.
 */
-   check_poison_mem(addr, PAGE_SIZE);
+   check_poison_mem(page, addr, PAGE_SIZE);
kunmap_atomic(addr);
 }
 
-- 
2.31.1
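
For readers unfamiliar with the check this patch extends, here is a minimal
userspace sketch of the idea. It is not the kernel code: the POISON_BYTE
value is an assumption meant to mirror the kernel's PAGE_POISON, and
check_poison() and enum poison_result are hypothetical names. Only
single_bit_flip() follows the helper visible in the diff context above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define POISON_BYTE 0xaa	/* assumption: mirrors the kernel's PAGE_POISON */

enum poison_result { POISON_OK, POISON_SINGLE_BIT, POISON_CORRUPT };

/* True when a and b differ in exactly one bit. */
static bool single_bit_flip(unsigned char a, unsigned char b)
{
	unsigned char error = a ^ b;

	return error && !(error & (error - 1));
}

/*
 * Scan a buffer that is expected to hold only POISON_BYTE. On a mismatch
 * the first and last offending offsets are stored in *first and *last.
 */
static enum poison_result check_poison(const unsigned char *mem, size_t bytes,
				       size_t *first, size_t *last)
{
	size_t start, end;

	assert(mem && first && last);

	/* Find the first byte that no longer holds the poison pattern. */
	for (start = 0; start < bytes; start++)
		if (mem[start] != POISON_BYTE)
			break;
	if (start == bytes)
		return POISON_OK;

	/* Find the last such byte, scanning back from the end. */
	for (end = bytes - 1; end > start; end--)
		if (mem[end] != POISON_BYTE)
			break;

	*first = start;
	*last = end;
	if (start == end && single_bit_flip(mem[start], POISON_BYTE))
		return POISON_SINGLE_BIT;
	return POISON_CORRUPT;
}
```

A caller would fill a freed buffer with POISON_BYTE, call check_poison()
before reuse, and dump the [first, last] range on any result other than
POISON_OK. That is the point where the patch adds the page details.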



Re: [PATCH] mm: page_poison: print page owner info when corruption is caught

2021-04-07 Thread Sergei Trofimovich
On Wed, Apr 07, 2021 at 02:15:50PM +0200, Vlastimil Babka wrote:
> On 4/4/21 4:17 PM, Sergei Trofimovich wrote:
> > When page_poison detects page corruption it's useful to see who
> > freed a page recently to have a guess where write-after-free
> > corruption happens.
> > 
> > After this change corruption report has extra page_owner data.
> > Example report from real corruption:
> > 
> > pagealloc: memory corruption
> > e0014cd61d10: 11 00 00 00 00 00 00 00 30 1d d2 ff ff 0f 00 60
> > e0014cd61d20: b0 1d d2 ff ff 0f 00 60 90 fe 1c 00 08 00 00 20
> > ...
> > CPU: 1 PID: 220402 Comm: cc1plus Not tainted 
> > 5.12.0-rc5-00107-g9720c6f59ecf #245
> > Hardware name: hp server rx3600, BIOS 04.03 04/08/2008
> > ...
> > Call Trace:
> >  [] show_stack+0x90/0xc0
> >  [] dump_stack+0x150/0x1c0
> >  [] __kernel_unpoison_pages+0x410/0x440
> >  [] get_page_from_freelist+0x1460/0x2ca0
> >  [] __alloc_pages_nodemask+0x3c0/0x660
> >  [] alloc_pages_vma+0xb0/0x500
> >  [] __handle_mm_fault+0x1230/0x1fe0
> >  [] handle_mm_fault+0x310/0x4e0
> >  [] ia64_do_page_fault+0x1f0/0xb80
> >  [] ia64_leave_kernel+0x0/0x270
> > page_owner tracks the page as freed
> > page allocated via order 0, migratetype Movable,
> >   gfp_mask 0x100dca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), pid 37, ts 
> > 8173444098740
> >  __reset_page_owner+0x40/0x200
> >  free_pcp_prepare+0x4d0/0x600
> >  free_unref_page+0x20/0x1c0
> >  __put_page+0x110/0x1a0
> >  migrate_pages+0x16d0/0x1dc0
> >  compact_zone+0xfc0/0x1aa0
> >  proactive_compact_node+0xd0/0x1e0
> >  kcompactd+0x550/0x600
> >  kthread+0x2c0/0x2e0
> >  call_payload+0x50/0x80
> > 
> > Here we can see that page was freed by page migration but something
> > managed to write to it afterwards.
> > 
> > CC: Andrew Morton 
> > CC: linux...@kvack.org
> > Signed-off-by: Sergei Trofimovich 
> > ---
> >  mm/page_poison.c | 6 --
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> > 
> > diff --git a/mm/page_poison.c b/mm/page_poison.c
> > index 65cdf844c8ad..ef2a1eab13d7 100644
> > --- a/mm/page_poison.c
> > +++ b/mm/page_poison.c
> > @@ -4,6 +4,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -45,7 +46,7 @@ static bool single_bit_flip(unsigned char a, unsigned 
> > char b)
> > return error && !(error & (error - 1));
> >  }
> >  
> > -static void check_poison_mem(unsigned char *mem, size_t bytes)
> > +static void check_poison_mem(struct page *page, unsigned char *mem, size_t 
> > bytes)
> >  {
> > static DEFINE_RATELIMIT_STATE(ratelimit, 5 * HZ, 10);
> > unsigned char *start;
> > @@ -70,6 +71,7 @@ static void check_poison_mem(unsigned char *mem, size_t 
> > bytes)
> > print_hex_dump(KERN_ERR, "", DUMP_PREFIX_ADDRESS, 16, 1, start,
> > end - start + 1, 1);
> > dump_stack();
> > +   dump_page_owner(page);
> 
> OK but why not a full dump_page()?

Oh, I did not know it existed! Looks even better.
Will send a v2 with dump_page().

> >  }
> >  
> >  static void unpoison_page(struct page *page)
> > @@ -82,7 +84,7 @@ static void unpoison_page(struct page *page)
> >  * that is freed to buddy. Thus no extra check is done to
> >  * see if a page was poisoned.
> >  */
> > -   check_poison_mem(addr, PAGE_SIZE);
> > +   check_poison_mem(page, addr, PAGE_SIZE);
> > kunmap_atomic(addr);
> >  }
> >  
> > 
> 

-- 

  Sergei


Re: [PATCH] mm: page_owner: fetch backtrace only for tracked pages

2021-04-07 Thread Sergei Trofimovich
On Wed, Apr 07, 2021 at 05:49:14PM +0200, Vlastimil Babka wrote:
> On 4/1/21 11:24 PM, Sergei Trofimovich wrote:
> > Very minor optimization.
> 
> I'm not entirely sure about accuracy of "only for tracked pages". Missing
> page_ext is something I'm not even sure how possible it is in practice,
> probably just an error condition (failed to be allocated?). Or did you
> observe this in practice? But anyway, the change is not wrong.

Never saw missing 'page_ext' in practice (I also did not check for
it explicitly). I agree "optimization" is misleading. "cleanup"
might be a better wording.

> > CC: Andrew Morton 
> > CC: linux...@kvack.org
> > Signed-off-by: Sergei Trofimovich 
> 
> Acked-by: Vlastimil Babka 
> 
> > ---
> >  mm/page_owner.c | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> > 
> > diff --git a/mm/page_owner.c b/mm/page_owner.c
> > index 63e4ecaba97b..7147fd34a948 100644
> > --- a/mm/page_owner.c
> > +++ b/mm/page_owner.c
> > @@ -140,14 +140,14 @@ void __reset_page_owner(struct page *page, unsigned 
> > int order)
> >  {
> > int i;
> > struct page_ext *page_ext;
> > -   depot_stack_handle_t handle = 0;
> > +   depot_stack_handle_t handle;
> > struct page_owner *page_owner;
> >  
> > -   handle = save_stack(GFP_NOWAIT | __GFP_NOWARN);
> > -
> > page_ext = lookup_page_ext(page);
> > if (unlikely(!page_ext))
> > return;
> > +
> > +   handle = save_stack(GFP_NOWAIT | __GFP_NOWARN);
> > for (i = 0; i < (1 << order); i++) {
> > __clear_bit(PAGE_EXT_OWNER_ALLOCATED, &page_ext->flags);
> > page_owner = get_page_owner(page_ext);
> > 
> 

-- 

  Sergei


Re: [PATCH 11/20] kbuild: ia64: use common install script

2021-04-07 Thread Sergei Trofimovich
On Wed,  7 Apr 2021 07:34:10 +0200
Greg Kroah-Hartman  wrote:

> The common scripts/install.sh script will now work for ia64, all that
> is needed is to add the compressed image type to it.  So add that file
> type check and the ability to call /usr/sbin/elilo after copying the
> kernel.  With that we can remove the ia64-only version of the file.
> 
> Cc: linux-i...@vger.kernel.org
> Signed-off-by: Greg Kroah-Hartman 

Reviewed-by: Sergei Trofimovich 

> ---
>  arch/ia64/Makefile   |  2 +-
>  arch/ia64/install.sh | 40 
>  scripts/install.sh   |  8 +++-
>  3 files changed, 8 insertions(+), 42 deletions(-)
>  delete mode 100644 arch/ia64/install.sh
> 
> diff --git a/arch/ia64/Makefile b/arch/ia64/Makefile
> index 467b7e7f967c..19e20e99f487 100644
> --- a/arch/ia64/Makefile
> +++ b/arch/ia64/Makefile
> @@ -77,7 +77,7 @@ archheaders:
>  CLEAN_FILES += vmlinux.gz
>  
>  install: vmlinux.gz
> - sh $(srctree)/arch/ia64/install.sh $(KERNELRELEASE) $< System.map 
> "$(INSTALL_PATH)"
> + sh $(srctree)/scripts/install.sh $(KERNELRELEASE) $< System.map 
> "$(INSTALL_PATH)"
>  
>  define archhelp
>echo '* compressed - Build compressed kernel image'
> diff --git a/arch/ia64/install.sh b/arch/ia64/install.sh
> deleted file mode 100644
> index 0e932f5dcd1a..
> --- a/arch/ia64/install.sh
> +++ /dev/null
> @@ -1,40 +0,0 @@
> -#!/bin/sh
> -#
> -# arch/ia64/install.sh
> -#
> -# This file is subject to the terms and conditions of the GNU General Public
> -# License.  See the file "COPYING" in the main directory of this archive
> -# for more details.
> -#
> -# Copyright (C) 1995 by Linus Torvalds
> -#
> -# Adapted from code in arch/i386/boot/Makefile by H. Peter Anvin
> -#
> -# "make install" script for ia64 architecture
> -#
> -# Arguments:
> -#   $1 - kernel version
> -#   $2 - kernel image file
> -#   $3 - kernel map file
> -#   $4 - default install path (blank if root directory)
> -#
> -
> -# User may have a custom install script
> -
> -if [ -x ~/bin/${INSTALLKERNEL} ]; then exec ~/bin/${INSTALLKERNEL} "$@"; fi
> -if [ -x /sbin/${INSTALLKERNEL} ]; then exec /sbin/${INSTALLKERNEL} "$@"; fi
> -
> -# Default install - same as make zlilo
> -
> -if [ -f $4/vmlinuz ]; then
> - mv $4/vmlinuz $4/vmlinuz.old
> -fi
> -
> -if [ -f $4/System.map ]; then
> - mv $4/System.map $4/System.old
> -fi
> -
> -cat $2 > $4/vmlinuz
> -cp $3 $4/System.map
> -
> -test -x /usr/sbin/elilo && /usr/sbin/elilo
> diff --git a/scripts/install.sh b/scripts/install.sh
> index 73067b535ea0..b6ca2a0f0983 100644
> --- a/scripts/install.sh
> +++ b/scripts/install.sh
> @@ -52,6 +52,7 @@ if [ -x /sbin/"${INSTALLKERNEL}" ]; then exec 
> /sbin/"${INSTALLKERNEL}" "$@"; fi
>  base=$(basename "$2")
>  if [ "$base" = "bzImage" ] ||
> [ "$base" = "Image.gz" ] ||
> +   [ "$base" = "vmlinux.gz" ] ||
> [ "$base" = "zImage" ] ; then
>   # Compressed install
>   echo "Installing compressed kernel"
> @@ -65,7 +66,7 @@ fi
>  # Some architectures name their files based on version number, and
>  # others do not.  Call out the ones that do not to make it obvious.
>  case "${ARCH}" in
> - x86)
> + ia64 | x86)
>   version=""
>   ;;
>   *)
> @@ -86,6 +87,11 @@ case "${ARCH}" in
>   echo "You have to install it yourself"
>   fi
>   ;;
> + ia64)
> + if [ -x /usr/sbin/elilo ]; then
> + /usr/sbin/elilo
> + fi
> + ;;
>   x86)
>   if [ -x /sbin/lilo ]; then
>   /sbin/lilo
> -- 
> 2.31.1
> 


-- 

  Sergei


[PATCH] ia64: drop marked broken DISCONTIGMEM and VIRTUAL_MEM_MAP

2021-04-04 Thread Sergei Trofimovich
DISCONTIGMEM was marked BROKEN in 5.11. Let's remove it.

Booted SPARSEMEM successfully on rx3600.

CC: Andrew Morton 
CC: linux-i...@vger.kernel.org
Signed-off-by: Sergei Trofimovich 
---
 arch/ia64/Kconfig  |  23 
 arch/ia64/configs/bigsur_defconfig |   1 -
 arch/ia64/include/asm/meminit.h|  11 --
 arch/ia64/include/asm/page.h   |  25 +---
 arch/ia64/include/asm/pgtable.h|   5 -
 arch/ia64/kernel/Makefile  |   2 +-
 arch/ia64/kernel/ia64_ksyms.c  |  12 --
 arch/ia64/kernel/machine_kexec.c   |   2 +-
 arch/ia64/mm/Makefile  |   1 -
 arch/ia64/mm/contig.c  |   4 -
 arch/ia64/mm/discontig.c   |  21 ---
 arch/ia64/mm/fault.c   |  15 --
 arch/ia64/mm/init.c| 213 -
 13 files changed, 4 insertions(+), 331 deletions(-)
 delete mode 100644 arch/ia64/kernel/ia64_ksyms.c

diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index 2ad7a8d29fcc..81e2b893b1e7 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -286,15 +286,6 @@ config FORCE_CPEI_RETARGET
 config ARCH_SELECT_MEMORY_MODEL
def_bool y
 
-config ARCH_DISCONTIGMEM_ENABLE
-   def_bool y
-   depends on BROKEN
-   help
- Say Y to support efficient handling of discontiguous physical memory,
- for architectures which are either NUMA (Non-Uniform Memory Access)
- or have huge holes in the physical address space for other reasons.
- See  for more.
-
 config ARCH_FLATMEM_ENABLE
def_bool y
 
@@ -325,22 +316,8 @@ config NODES_SHIFT
  MAX_NUMNODES will be 2^(This value).
  If in doubt, use the default.
 
-# VIRTUAL_MEM_MAP and FLAT_NODE_MEM_MAP are functionally equivalent.
-# VIRTUAL_MEM_MAP has been retained for historical reasons.
-config VIRTUAL_MEM_MAP
-   bool "Virtual mem map"
-   depends on !SPARSEMEM && !FLATMEM
-   default y
-   help
- Say Y to compile the kernel with support for a virtual mem map.
- This code also only takes effect if a memory hole of greater than
- 1 Gb is found during boot.  You must turn this option on if you
- require the DISCONTIGMEM option for your machine. If you are
- unsure, say Y.
-
 config HOLES_IN_ZONE
bool
-   default y if VIRTUAL_MEM_MAP
 
 config HAVE_ARCH_NODEDATA_EXTENSION
def_bool y
diff --git a/arch/ia64/configs/bigsur_defconfig b/arch/ia64/configs/bigsur_defconfig
index c409756b5396..0341a67cc1bf 100644
--- a/arch/ia64/configs/bigsur_defconfig
+++ b/arch/ia64/configs/bigsur_defconfig
@@ -9,7 +9,6 @@ CONFIG_SGI_PARTITION=y
 CONFIG_SMP=y
 CONFIG_NR_CPUS=2
 CONFIG_PREEMPT=y
-# CONFIG_VIRTUAL_MEM_MAP is not set
 CONFIG_IA64_PALINFO=y
 CONFIG_EFI_VARS=y
 CONFIG_BINFMT_MISC=m
diff --git a/arch/ia64/include/asm/meminit.h b/arch/ia64/include/asm/meminit.h
index e789c0818edb..6c47a239fc26 100644
--- a/arch/ia64/include/asm/meminit.h
+++ b/arch/ia64/include/asm/meminit.h
@@ -58,15 +58,4 @@ extern int reserve_elfcorehdr(u64 *start, u64 *end);
 
 extern int register_active_ranges(u64 start, u64 len, int nid);
 
-#ifdef CONFIG_VIRTUAL_MEM_MAP
-  extern unsigned long VMALLOC_END;
-  extern struct page *vmem_map;
-  extern int create_mem_map_page_table(u64 start, u64 end, void *arg);
-  extern int vmemmap_find_next_valid_pfn(int, int);
-#else
-static inline int vmemmap_find_next_valid_pfn(int node, int i)
-{
-   return i + 1;
-}
-#endif
 #endif /* meminit_h */
diff --git a/arch/ia64/include/asm/page.h b/arch/ia64/include/asm/page.h
index b69a5499d75b..f4dc81fa7146 100644
--- a/arch/ia64/include/asm/page.h
+++ b/arch/ia64/include/asm/page.h
@@ -95,31 +95,10 @@ do {\
 
 #define virt_addr_valid(kaddr) pfn_valid(__pa(kaddr) >> PAGE_SHIFT)
 
-#ifdef CONFIG_VIRTUAL_MEM_MAP
-extern int ia64_pfn_valid (unsigned long pfn);
-#else
-# define ia64_pfn_valid(pfn) 1
-#endif
-
-#ifdef CONFIG_VIRTUAL_MEM_MAP
-extern struct page *vmem_map;
-#ifdef CONFIG_DISCONTIGMEM
-# define page_to_pfn(page) ((unsigned long) (page - vmem_map))
-# define pfn_to_page(pfn)  (vmem_map + (pfn))
-# define __pfn_to_phys(pfn)PFN_PHYS(pfn)
-#else
-# include 
-#endif
-#else
-# include 
-#endif
+#include 
 
 #ifdef CONFIG_FLATMEM
-# define pfn_valid(pfn)(((pfn) < max_mapnr) && 
ia64_pfn_valid(pfn))
-#elif defined(CONFIG_DISCONTIGMEM)
-extern unsigned long min_low_pfn;
-extern unsigned long max_low_pfn;
-# define pfn_valid(pfn)(((pfn) >= min_low_pfn) && ((pfn) < 
max_low_pfn) && ia64_pfn_valid(pfn))
+# define pfn_valid(pfn)((pfn) < max_mapnr)
 #endif
 
 #define page_to_phys(page) (page_to_pfn(page) << PAGE_SHIFT)
diff --git a/arch/ia64/include/asm/pgtable.h b/arch/ia64/include/asm/pgtable.h
index 9b4efe89e62d..8994514ebe91 100644
--- a/arch/ia64/include/asm/pgtable.h
+++ b/arch/ia64/include/asm/pgtable.h
@@ -223,1

[PATCH] mm: page_poison: print page owner info when corruption is caught

2021-04-04 Thread Sergei Trofimovich
When page_poison detects page corruption it's useful to see who
freed a page recently to have a guess where write-after-free
corruption happens.

After this change corruption report has extra page_owner data.
Example report from real corruption:

pagealloc: memory corruption
e0014cd61d10: 11 00 00 00 00 00 00 00 30 1d d2 ff ff 0f 00 60
e0014cd61d20: b0 1d d2 ff ff 0f 00 60 90 fe 1c 00 08 00 00 20
...
CPU: 1 PID: 220402 Comm: cc1plus Not tainted 5.12.0-rc5-00107-g9720c6f59ecf 
#245
Hardware name: hp server rx3600, BIOS 04.03 04/08/2008
...
Call Trace:
 [] show_stack+0x90/0xc0
 [] dump_stack+0x150/0x1c0
 [] __kernel_unpoison_pages+0x410/0x440
 [] get_page_from_freelist+0x1460/0x2ca0
 [] __alloc_pages_nodemask+0x3c0/0x660
 [] alloc_pages_vma+0xb0/0x500
 [] __handle_mm_fault+0x1230/0x1fe0
 [] handle_mm_fault+0x310/0x4e0
 [] ia64_do_page_fault+0x1f0/0xb80
 [] ia64_leave_kernel+0x0/0x270
page_owner tracks the page as freed
page allocated via order 0, migratetype Movable,
  gfp_mask 0x100dca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), pid 37, ts 
8173444098740
 __reset_page_owner+0x40/0x200
 free_pcp_prepare+0x4d0/0x600
 free_unref_page+0x20/0x1c0
 __put_page+0x110/0x1a0
 migrate_pages+0x16d0/0x1dc0
 compact_zone+0xfc0/0x1aa0
 proactive_compact_node+0xd0/0x1e0
 kcompactd+0x550/0x600
 kthread+0x2c0/0x2e0
 call_payload+0x50/0x80

Here we can see that page was freed by page migration but something
managed to write to it afterwards.

CC: Andrew Morton 
CC: linux...@kvack.org
Signed-off-by: Sergei Trofimovich 
---
 mm/page_poison.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/mm/page_poison.c b/mm/page_poison.c
index 65cdf844c8ad..ef2a1eab13d7 100644
--- a/mm/page_poison.c
+++ b/mm/page_poison.c
@@ -4,6 +4,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -45,7 +46,7 @@ static bool single_bit_flip(unsigned char a, unsigned char b)
return error && !(error & (error - 1));
 }
 
-static void check_poison_mem(unsigned char *mem, size_t bytes)
+static void check_poison_mem(struct page *page, unsigned char *mem, size_t bytes)
 {
static DEFINE_RATELIMIT_STATE(ratelimit, 5 * HZ, 10);
unsigned char *start;
@@ -70,6 +71,7 @@ static void check_poison_mem(unsigned char *mem, size_t bytes)
print_hex_dump(KERN_ERR, "", DUMP_PREFIX_ADDRESS, 16, 1, start,
end - start + 1, 1);
dump_stack();
+   dump_page_owner(page);
 }
 
 static void unpoison_page(struct page *page)
@@ -82,7 +84,7 @@ static void unpoison_page(struct page *page)
 * that is freed to buddy. Thus no extra check is done to
 * see if a page was poisoned.
 */
-   check_poison_mem(addr, PAGE_SIZE);
+   check_poison_mem(page, addr, PAGE_SIZE);
kunmap_atomic(addr);
 }
 
-- 
2.31.1



Re: [PATCH v2 3/3] hpsa: add an assert to prevent from __packed reintroduction

2021-04-03 Thread Sergei Trofimovich
On Fri, 2 Apr 2021 14:40:39 +
"Elliott, Robert (Servers)"  wrote:

> It looks like ia64 implements atomic_t as a 64-bit value and expects atomic_t
> to be 64-bit aligned, but does nothing to ensure that.
> 
> For x86, atomic_t is a 32-bit value and atomic64_t is a 64-bit value, and
> the definition of atomic64_t is overridden in a way that ensures
> 64-bit (8 byte) alignment:
> 
> Generic definitions are in include/linux/types.h:
> typedef struct {
> int counter;
> } atomic_t;
> 
> #define ATOMIC_INIT(i) { (i) }
> 
> #ifdef CONFIG_64BIT
> typedef struct {
> s64 counter;
> } atomic64_t;
> #endif
> 
> Override in arch/x86/include/asm/atomic64_32.h:
> typedef struct {
> s64 __aligned(8) counter;
> } atomic64_t;
> 
> Perhaps ia64 needs to take over the definition of both atomic_t and atomic64_t
> and do the same?

I don't think it's needed. ia64 is a 64-bit arch with expected
natural alignment for s64: alignof(s64)=8.

Also if my understanding is correct adding __aligned(8) would not fix
use case of embedding locks into packed structs even on x86_64 (or i386):

$ cat a.c
#include <stddef.h>
#include <stdio.h>

typedef struct { unsigned long long __attribute__((aligned(8))) l; } lock_t;
struct s { char c; lock_t lock; } __attribute__((packed));
int main() { printf ("offsetof(struct s, lock) = %zu\nsizeof(struct s) = %zu\n", offsetof(struct s, lock), sizeof(struct s)); }

$ x86_64-pc-linux-gnu-gcc a.c -o a && ./a
offsetof(struct s, lock) = 1
sizeof(struct s) = 9

$ x86_64-pc-linux-gnu-gcc a.c -o a -m32 && ./a
offsetof(struct s, lock) = 1
sizeof(struct s) = 9

Note how alignment of 'lock' stays 1 byte in both cases.

The 8-byte alignment added for i386 in

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bbf2a330d92c5afccfd17592ba9ccd50f41cf748
is only a performance optimization (not a correctness fix).

Larger alignment on i386 is preferred because alignof(s64)=4 on
that target, which might let an atomic op span cache lines and
degrade performance.
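
To illustrate the kind of build-time check the subject line refers to, here
is a small sketch. It is not the hpsa patch itself: lock_t, struct cmd_ok,
struct cmd_packed and LOCK_ALIGNED() are hypothetical names, and the packed
attribute assumes GCC or clang.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 8-byte lock type standing in for an atomic counter. */
typedef struct { long long counter; } lock_t;

/* Unpacked layout: the compiler keeps 'lock' at an aligned offset. */
struct cmd_ok { lock_t lock; char c; };

/* Packed layout: 'lock' lands at offset 1, as in the a.c output above. */
struct cmd_packed { char c; lock_t lock; } __attribute__((packed));

/* True when 'member' starts at a multiple of the lock size. */
#define LOCK_ALIGNED(type, member) \
	(offsetof(type, member) % sizeof(lock_t) == 0)

/* Fails the build if someone packs the struct and misaligns the lock. */
static_assert(LOCK_ALIGNED(struct cmd_ok, lock),
	      "lock must be naturally aligned");
```

The same static_assert applied to struct cmd_packed would stop the build,
which is the point: the misalignment is caught at compile time instead of
as an unaligned atomic access at run time.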

-- 

  Sergei


[PATCH] ia64: module: fix symbolizer crash on fdescr

2021-04-03 Thread Sergei Trofimovich
Noticed a crash on ia64 when trying to symbolize all
backtraces collected by page_owner=on:

$ cat /sys/kernel/debug/page_owner


CPU: 1 PID: 2074 Comm: cat Not tainted 5.12.0-rc4 #226
Hardware name: hp server rx3600, BIOS 04.03 04/08/2008
ip is at dereference_module_function_descriptor+0x41/0x100

Crash happens at dereference_module_function_descriptor() due to
use-after-free when dereferencing ".opd" section header.

All section headers are already freed after the module is loaded
successfully.

To keep symbolizer working the change stores ".opd" address
and size after module is relocated to a new place and before
section headers are discarded.

To make similar errors less obscure module_finalize() now
zeroes out all variables relevant to module loading only.

CC: Andrew Morton 
CC: linux-i...@vger.kernel.org
Signed-off-by: Sergei Trofimovich 
---
 arch/ia64/include/asm/module.h |  6 +-
 arch/ia64/kernel/module.c  | 29 +
 2 files changed, 30 insertions(+), 5 deletions(-)

diff --git a/arch/ia64/include/asm/module.h b/arch/ia64/include/asm/module.h
index 5a29652e6def..7271b9c5fc76 100644
--- a/arch/ia64/include/asm/module.h
+++ b/arch/ia64/include/asm/module.h
@@ -14,16 +14,20 @@
 struct elf64_shdr; /* forward declration */
 
 struct mod_arch_specific {
+   /* Used only at module load time. */
struct elf64_shdr *core_plt;/* core PLT section */
struct elf64_shdr *init_plt;/* init PLT section */
struct elf64_shdr *got; /* global offset table */
struct elf64_shdr *opd; /* official procedure descriptors */
struct elf64_shdr *unwind;  /* unwind-table section */
unsigned long gp;   /* global-pointer for module */
+   unsigned int next_got_entry;/* index of next available got entry */
 
+   /* Used at module run and cleanup time. */
void *core_unw_table;   /* core unwind-table cookie returned by unwinder */
void *init_unw_table;   /* init unwind-table cookie returned by unwinder */
-   unsigned int next_got_entry;/* index of next available got entry */
+   void *opd_addr; /* symbolize uses .opd to get to actual function */
+   unsigned long opd_size;
 };
 
 #define ARCH_SHF_SMALL SHF_IA_64_SHORT
diff --git a/arch/ia64/kernel/module.c b/arch/ia64/kernel/module.c
index 00a496cb346f..f3385fe6e37e 100644
--- a/arch/ia64/kernel/module.c
+++ b/arch/ia64/kernel/module.c
@@ -905,9 +905,31 @@ register_unwind_table (struct module *mod)
 int
 module_finalize (const Elf_Ehdr *hdr, const Elf_Shdr *sechdrs, struct module *mod)
 {
+   struct mod_arch_specific *mas = &mod->arch;
+
DEBUGP("%s: init: entry=%p\n", __func__, mod->init);
-   if (mod->arch.unwind)
+   if (mas->unwind)
register_unwind_table(mod);
+
+   /*
+* ".opd" was already relocated to the final destination. Store
+* its address for use in symbolizer.
+*/
+   mas->opd_addr = (void *)mas->opd->sh_addr;
+   mas->opd_size = mas->opd->sh_size;
+
+   /*
+* Module relocation was already done at this point. Section
+* headers are about to be deleted. Wipe out load-time context.
+*/
+   mas->core_plt = NULL;
+   mas->init_plt = NULL;
+   mas->got = NULL;
+   mas->opd = NULL;
+   mas->unwind = NULL;
+   mas->gp = 0;
+   mas->next_got_entry = 0;
+
return 0;
 }
 
@@ -926,10 +948,9 @@ module_arch_cleanup (struct module *mod)
 
 void *dereference_module_function_descriptor(struct module *mod, void *ptr)
 {
-   Elf64_Shdr *opd = mod->arch.opd;
+   struct mod_arch_specific *mas = &mod->arch;
 
-   if (ptr < (void *)opd->sh_addr ||
-   ptr >= (void *)(opd->sh_addr + opd->sh_size))
+   if (ptr < mas->opd_addr || ptr >= mas->opd_addr + mas->opd_size)
return ptr;
 
return dereference_function_descriptor(ptr);
-- 
2.31.1
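
The pattern used by the fix can be shown outside the kernel. The sketch
below is a simplified illustration under stated assumptions: struct
section_header, struct module_state, finalize() and in_opd() are
hypothetical stand-ins, not the ia64 module loader's types. The range check
is written with a subtraction, which is an illustration choice and differs
slightly from the expression in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an ELF section header. */
struct section_header {
	uintptr_t sh_addr;
	size_t sh_size;
};

/* Hypothetical stand-in for the per-module arch state. */
struct module_state {
	struct section_header *opd;	/* valid only while loading */
	uintptr_t opd_addr;		/* copied out for later lookups */
	size_t opd_size;
};

/* Copy what is needed at run time, then drop the load-time pointer. */
static void finalize(struct module_state *m)
{
	assert(m && m->opd);

	m->opd_addr = m->opd->sh_addr;
	m->opd_size = m->opd->sh_size;
	/* A stale use now hits NULL instead of silently reading freed memory. */
	m->opd = NULL;
}

/* Range check that depends only on the copied values. */
static int in_opd(const struct module_state *m, uintptr_t ptr)
{
	return ptr >= m->opd_addr && ptr - m->opd_addr < m->opd_size;
}
```

After finalize() the section header can be freed or overwritten without
affecting in_opd(), which mirrors why the symbolizer keeps working once the
section headers are discarded.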



[PATCH v2] mm: page_owner: detect page_owner recursion via task_struct

2021-04-02 Thread Sergei Trofimovich
Before the change page_owner recursion was detected via fetching
backtrace and inspecting it for current instruction pointer.
It has a few problems:
- it is slightly slow as it requires extra backtrace and a linear
  stack scan of the result
- it is too late to check if backtrace fetching required memory
  allocation itself (ia64's unwinder requires it).

To simplify recursion tracking let's use page_owner recursion flag
in 'struct task_struct'.

The change makes page_owner=on work on ia64 by avoiding infinite
recursion in:
  kmalloc()
  -> __set_page_owner()
  -> save_stack()
  -> unwind() [ia64-specific]
  -> build_script()
  -> kmalloc()
  -> __set_page_owner() [we short-circuit here]
  -> save_stack()
  -> unwind() [recursion]

CC: Ingo Molnar 
CC: Peter Zijlstra 
CC: Juri Lelli 
CC: Vincent Guittot 
CC: Dietmar Eggemann 
CC: Steven Rostedt 
CC: Ben Segall 
CC: Mel Gorman 
CC: Daniel Bristot de Oliveira 
CC: Andrew Morton 
CC: linux...@kvack.org
Signed-off-by: Sergei Trofimovich 
---
Change since v1:
- use bit from task_struct instead of a new field
- track only one recursion depth level so far

 include/linux/sched.h |  4 
 mm/page_owner.c   | 32 ++--
 2 files changed, 14 insertions(+), 22 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index ef00bb22164c..00986450677c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -841,6 +841,10 @@ struct task_struct {
/* Stalled due to lack of memory */
unsignedin_memstall:1;
 #endif
+#ifdef CONFIG_PAGE_OWNER
+   /* Used by page_owner=on to detect recursion in page tracking. */
+   unsignedin_page_owner:1;
+#endif
 
unsigned long   atomic_flags; /* Flags requiring atomic access. */
 
diff --git a/mm/page_owner.c b/mm/page_owner.c
index 7147fd34a948..64b2e4c6afb7 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -97,42 +97,30 @@ static inline struct page_owner *get_page_owner(struct page_ext *page_ext)
return (void *)page_ext + page_owner_ops.offset;
 }
 
-static inline bool check_recursive_alloc(unsigned long *entries,
-unsigned int nr_entries,
-unsigned long ip)
-{
-   unsigned int i;
-
-   for (i = 0; i < nr_entries; i++) {
-   if (entries[i] == ip)
-   return true;
-   }
-   return false;
-}
-
 static noinline depot_stack_handle_t save_stack(gfp_t flags)
 {
unsigned long entries[PAGE_OWNER_STACK_DEPTH];
depot_stack_handle_t handle;
unsigned int nr_entries;
 
-   nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 2);
-
/*
-* We need to check recursion here because our request to
-* stackdepot could trigger memory allocation to save new
-* entry. New memory allocation would reach here and call
-* stack_depot_save_entries() again if we don't catch it. There is
-* still not enough memory in stackdepot so it would try to
-* allocate memory again and loop forever.
+* Avoid recursion.
+*
+* Sometimes page metadata allocation tracking requires more
+* memory to be allocated:
+* - when new stack trace is saved to stack depot
+* - when backtrace itself is calculated (ia64)
 */
-   if (check_recursive_alloc(entries, nr_entries, _RET_IP_))
+   if (current->in_page_owner)
return dummy_handle;
+   current->in_page_owner = 1;
 
+   nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 2);
handle = stack_depot_save(entries, nr_entries, flags);
if (!handle)
handle = failure_handle;
 
+   current->in_page_owner = 0;
return handle;
 }
 
-- 
2.31.1
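
The guard in the patch above can be modelled in user space. This is only a
sketch under assumptions: a thread-local flag stands in for the bit in
'struct task_struct', and save_stack_sketch(), unwind_sketch() and the two
handle values are hypothetical names, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's handles; values are arbitrary. */
enum { DUMMY_HANDLE = 1, REAL_HANDLE = 2 };

/* Stand-in for the bit in 'struct task_struct': one flag per thread. */
static _Thread_local bool in_page_owner;

/* Counts how many times the pretend unwinder actually ran. */
static int unwind_calls;

static int save_stack_sketch(void);

/*
 * Models an unwinder that allocates memory (as on ia64). The nested
 * allocation is tracked again, so it re-enters save_stack_sketch().
 */
static void unwind_sketch(void)
{
    int nested;

    unwind_calls++;
    nested = save_stack_sketch();
    /* The nested call must be short-circuited by the flag. */
    assert(nested == DUMMY_HANDLE);
    (void)nested;
}

static int save_stack_sketch(void)
{
    if (in_page_owner)
        return DUMMY_HANDLE;    /* already tracking on this thread */
    in_page_owner = true;

    unwind_sketch();            /* may recurse into save_stack_sketch() */

    in_page_owner = false;
    return REAL_HANDLE;
}
```

The flag is checked before the unwinder runs, so the unwinder's own
allocation is caught. The old check could only run after the backtrace
had already been fetched.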



Re: [PATCH] mm: page_owner: detect page_owner recursion via task_struct

2021-04-02 Thread Sergei Trofimovich
On Thu, 1 Apr 2021 17:05:19 -0700
Andrew Morton  wrote:

> On Thu,  1 Apr 2021 23:30:10 +0100 Sergei Trofimovich  
> wrote:
> 
> > Before the change page_owner recursion was detected via fetching
> > backtrace and inspecting it for current instruction pointer.
> > It has a few problems:
> > - it is slightly slow as it requires extra backtrace and a linear
> >   stack scan of the result
> > - it is too late to check if backtrace fetching required memory
> >   allocation itself (ia64's unwinder requires it).
> > 
> > To simplify recursion tracking let's use page_owner recursion depth
> > as a counter in 'struct task_struct'.  
> 
> Seems like a better approach.
> 
> > The change makes page_owner=on work on ia64 by avoiding infinite
> > recursion in:
> >   kmalloc()  
> >   -> __set_page_owner()
> >   -> save_stack()
> >   -> unwind() [ia64-specific]
> >   -> build_script()
> >   -> kmalloc()
> >   -> __set_page_owner() [we short-circuit here]
> >   -> save_stack()
> >   -> unwind() [recursion]  
> > 
> > ...
> >
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -1371,6 +1371,15 @@ struct task_struct {
> > struct llist_head   kretprobe_instances;
> >  #endif
> >  
> > +#ifdef CONFIG_PAGE_OWNER
> > +   /*
> > +* Used by page_owner=on to detect recursion in page tracking.
> > +* Is it fine to have non-atomic ops here if we ever access
> > +* this variable via current->page_owner_depth?  
> 
> Yes, it is fine.  This part of the comment can be removed.

Cool! Will do.

> > +*/
> > +   unsigned int page_owner_depth;
> > +#endif  
> 
> Adding to the task_struct has a cost.  But I don't expect that
> PAGE_OWNER is commonly used in production builds (correct?).

Yeah, PAGE_OWNER should not be enabled for production kernels.

Not having extra memory overhead (or layout disruption) is a nice
benefit though. I'll switch to "Unserialized, strictly 'current'" bitfield.

> > --- a/init/init_task.c
> > +++ b/init/init_task.c
> > @@ -213,6 +213,9 @@ struct task_struct init_task
> >  #ifdef CONFIG_SECCOMP
> > .seccomp= { .filter_count = ATOMIC_INIT(0) },
> >  #endif
> > +#ifdef CONFIG_PAGE_OWNER
> > +   .page_owner_depth   = 0,
> > +#endif
> >  };
> >  EXPORT_SYMBOL(init_task);  
> 
> It will be initialized to zero by the compiler.  We can omit this hunk
> entirely.
> 
> > --- a/mm/page_owner.c
> > +++ b/mm/page_owner.c
> > @@ -20,6 +20,16 @@
> >   */
> >  #define PAGE_OWNER_STACK_DEPTH (16)
> >  
> > +/*
> > + * How many reenters we allow to page_owner.
> > + *
> > + * Sometimes metadata allocation tracking requires more memory to be 
> > allocated:
> > + * - when new stack trace is saved to stack depot
> > + * - when backtrace itself is calculated (ia64)
> > + * Instead of falling to infinite recursion give it a chance to recover.
> > + */
> > +#define PAGE_OWNER_MAX_RECURSION_DEPTH (1)  
> 
> So this is presently a boolean.  Is there any expectation that
> PAGE_OWNER_MAX_RECURSION_DEPTH will ever be greater than 1?  If not, we
> could use a single bit in the task_struct.  Add it to the
> "Unserialized, strictly 'current'" bitfields.  Could make it a 2-bit field if 
> we want
> to permit PAGE_OWNER_MAX_RECURSION_DEPTH=larger.

Let's settle on depth=1. depth>1 is not trivial for other reasons I don't
completely understand.

Follow-up patch incoming.

-- 

  Sergei


[PATCH] mm: page_owner: detect page_owner recursion via task_struct

2021-04-01 Thread Sergei Trofimovich
Before the change page_owner recursion was detected via fetching
backtrace and inspecting it for current instruction pointer.
It has a few problems:
- it is slightly slow as it requires extra backtrace and a linear
  stack scan of the result
- it is too late to check if backtrace fetching required memory
  allocation itself (ia64's unwinder requires it).

To simplify recursion tracking let's use page_owner recursion depth
as a counter in 'struct task_struct'.

The change makes page_owner=on work on ia64 by avoiding infinite
recursion in:
  kmalloc()
  -> __set_page_owner()
  -> save_stack()
  -> unwind() [ia64-specific]
  -> build_script()
  -> kmalloc()
  -> __set_page_owner() [we short-circuit here]
  -> save_stack()
  -> unwind() [recursion]

CC: Ingo Molnar 
CC: Peter Zijlstra 
CC: Juri Lelli 
CC: Vincent Guittot 
CC: Dietmar Eggemann 
CC: Steven Rostedt 
CC: Ben Segall 
CC: Mel Gorman 
CC: Daniel Bristot de Oliveira 
CC: Andrew Morton 
CC: linux...@kvack.org
Signed-off-by: Sergei Trofimovich 
---
 include/linux/sched.h |  9 +
 init/init_task.c  |  3 +++
 mm/page_owner.c   | 41 +
 3 files changed, 29 insertions(+), 24 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index ef00bb22164c..35771703fd89 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1371,6 +1371,15 @@ struct task_struct {
struct llist_head   kretprobe_instances;
 #endif
 
+#ifdef CONFIG_PAGE_OWNER
+   /*
+* Used by page_owner=on to detect recursion in page tracking.
+* Is it fine to have non-atomic ops here if we ever access
+* this variable via current->page_owner_depth?
+*/
+   unsigned int page_owner_depth;
+#endif
+
/*
 * New fields for task_struct should be added above here, so that
 * they are included in the randomized portion of task_struct.
diff --git a/init/init_task.c b/init/init_task.c
index 3711cdaafed2..f579f2b2eca8 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -213,6 +213,9 @@ struct task_struct init_task
 #ifdef CONFIG_SECCOMP
.seccomp= { .filter_count = ATOMIC_INIT(0) },
 #endif
+#ifdef CONFIG_PAGE_OWNER
+   .page_owner_depth   = 0,
+#endif
 };
 EXPORT_SYMBOL(init_task);
 
diff --git a/mm/page_owner.c b/mm/page_owner.c
index 7147fd34a948..422558605fcc 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -20,6 +20,16 @@
  */
 #define PAGE_OWNER_STACK_DEPTH (16)
 
+/*
+ * How many reenters we allow to page_owner.
+ *
+ * Sometimes metadata allocation tracking requires more memory to be allocated:
+ * - when new stack trace is saved to stack depot
+ * - when backtrace itself is calculated (ia64)
+ * Instead of falling to infinite recursion give it a chance to recover.
+ */
+#define PAGE_OWNER_MAX_RECURSION_DEPTH (1)
+
 struct page_owner {
unsigned short order;
short last_migrate_reason;
@@ -97,42 +107,25 @@ static inline struct page_owner *get_page_owner(struct 
page_ext *page_ext)
return (void *)page_ext + page_owner_ops.offset;
 }
 
-static inline bool check_recursive_alloc(unsigned long *entries,
-unsigned int nr_entries,
-unsigned long ip)
-{
-   unsigned int i;
-
-   for (i = 0; i < nr_entries; i++) {
-   if (entries[i] == ip)
-   return true;
-   }
-   return false;
-}
-
 static noinline depot_stack_handle_t save_stack(gfp_t flags)
 {
unsigned long entries[PAGE_OWNER_STACK_DEPTH];
depot_stack_handle_t handle;
unsigned int nr_entries;
 
-   nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 2);
-
-   /*
-* We need to check recursion here because our request to
-* stackdepot could trigger memory allocation to save new
-* entry. New memory allocation would reach here and call
-* stack_depot_save_entries() again if we don't catch it. There is
-* still not enough memory in stackdepot so it would try to
-* allocate memory again and loop forever.
-*/
-   if (check_recursive_alloc(entries, nr_entries, _RET_IP_))
+   /* Avoid recursion. Used in stack trace generation code. */
+   if (current->page_owner_depth >= PAGE_OWNER_MAX_RECURSION_DEPTH)
return dummy_handle;
 
+   current->page_owner_depth++;
+
+   nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 2);
+
handle = stack_depot_save(entries, nr_entries, flags);
if (!handle)
handle = failure_handle;
 
+   current->page_owner_depth--;
return handle;
 }
 
-- 
2.31.1



[PATCH] mm: page_owner: fetch backtrace only for tracked pages

2021-04-01 Thread Sergei Trofimovich
Very minor optimization.

CC: Andrew Morton 
CC: linux...@kvack.org
Signed-off-by: Sergei Trofimovich 
---
 mm/page_owner.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/page_owner.c b/mm/page_owner.c
index 63e4ecaba97b..7147fd34a948 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -140,14 +140,14 @@ void __reset_page_owner(struct page *page, unsigned int 
order)
 {
int i;
struct page_ext *page_ext;
-   depot_stack_handle_t handle = 0;
+   depot_stack_handle_t handle;
struct page_owner *page_owner;
 
-   handle = save_stack(GFP_NOWAIT | __GFP_NOWARN);
-
page_ext = lookup_page_ext(page);
if (unlikely(!page_ext))
return;
+
+   handle = save_stack(GFP_NOWAIT | __GFP_NOWARN);
for (i = 0; i < (1 << order); i++) {
__clear_bit(PAGE_EXT_OWNER_ALLOCATED, &page_ext->flags);
page_owner = get_page_owner(page_ext);
-- 
2.31.1
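
The reordering above amounts to doing the cheap lookup first and the
expensive backtrace second. A minimal user-space sketch, assuming a NULL
pointer models a failed lookup_page_ext(); all names ending in _sketch are
hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of the per-page tracking data. */
struct page_ext_sketch {
    int handle;
};

/* Counts how often the (pretend) expensive backtrace was taken. */
static int save_stack_calls;

static int save_stack_sketch(void)
{
    save_stack_calls++;
    return 42;    /* arbitrary handle value */
}

/* Returns 0 when the handle was recorded, -1 when the page is untracked. */
static int reset_owner_sketch(struct page_ext_sketch *ext)
{
    if (ext == NULL)
        return -1;    /* untracked: bail out before the expensive call */

    ext->handle = save_stack_sketch();
    return 0;
}
```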



[PATCH] mm: page_owner: use kstrtobool() to parse bool option

2021-04-01 Thread Sergei Trofimovich
I tried to use page_owner=1 for a while and noticed too late that it had
no effect, as opposed to similar options like init_on_alloc=1 (these work).

Let's make them consistent.

The change decreases binary size slightly:
   text    data     bss     dec     hex filename
  12408     321      17   12746    31ca mm/page_owner.o.before
  12320     321      17   12658    3172 mm/page_owner.o.after

CC: Andrew Morton 
CC: linux...@kvack.org
Signed-off-by: Sergei Trofimovich 
---
 mm/page_owner.c | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/mm/page_owner.c b/mm/page_owner.c
index d15c7c4994f5..63e4ecaba97b 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -41,13 +41,7 @@ static void init_early_allocated_pages(void);
 
 static int __init early_page_owner_param(char *buf)
 {
-   if (!buf)
-   return -EINVAL;
-
-   if (strcmp(buf, "on") == 0)
-   page_owner_enabled = true;
-
-   return 0;
+   return kstrtobool(buf, &page_owner_enabled);
 }
 early_param("page_owner", early_page_owner_param);
 
-- 
2.31.1
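
To illustrate why "page_owner=1" starts working after the change, here is
a simplified user-space stand-in for a boolean option parser. It is not
the kernel's kstrtobool() implementation; parse_bool_sketch() is a
hypothetical name and the accepted spellings are an assumption modelled on
the "1" / "on" forms discussed above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Simplified stand-in for a boolean option parser.
 * Accepts y/Y/1 and "on" as true, n/N/0 and "of..." as false.
 * Returns 0 on success and -EINVAL for anything else.
 */
static int parse_bool_sketch(const char *s, bool *res)
{
    if (s == NULL)
        return -EINVAL;

    switch (s[0]) {
    case 'y': case 'Y': case '1':
        *res = true;
        return 0;
    case 'n': case 'N': case '0':
        *res = false;
        return 0;
    case 'o': case 'O':
        if (s[1] == 'n' || s[1] == 'N') {
            *res = true;
            return 0;
        }
        if (s[1] == 'f' || s[1] == 'F') {
            *res = false;
            return 0;
        }
        break;
    default:
        break;
    }
    return -EINVAL;
}
```

The old handler compared against the literal "on" only, so any other
spelling was silently ignored.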



Re: [PATCH] ia64: fix user_stack_pointer() for ptrace()

2021-04-01 Thread Sergei Trofimovich
On Wed, 31 Mar 2021 17:49:08 -0700
Andrew Morton  wrote:

> On Wed, 31 Mar 2021 09:44:47 +0100 Sergei Trofimovich  
> wrote:
> 
> > ia64 has two stacks:
> > - memory stack (or stack), pointed at by r12
> > - register backing store (register stack), pointed at by
> >   ar.bsp/ar.bspstore with complications around dirty
> >   register frame on CPU.
> > 
> > In https://bugs.gentoo.org/769614 Dmitry noticed that
> > PTRACE_GET_SYSCALL_INFO returns the register stack instead of the
> > memory stack.
> > 
> > The bug comes from the fact that user_stack_pointer() and
> > current_user_stack_pointer() don't return the same register:
> > 
> >   ulong user_stack_pointer(struct pt_regs *regs) { return 
> > regs->ar_bspstore; }
> >   #define current_user_stack_pointer() (current_pt_regs()->r12)
> > 
> > The change gets both back in sync.
> > 
> > I think ptrace(PTRACE_GET_SYSCALL_INFO) is the only affected user
> > by this bug on ia64.
> > 
> > The change fixes 'rt_sigreturn.gen.test' strace test where
> > it was observed initially.
> >   
> 
> I assume a cc:stable is justified here?
> 
> The bug seems to have been there for 10+ years, so there isn't a lot of
> point in looking for the Fixes: reference.

Yes, I think cc:stable is fine.

-- 

  Sergei


[PATCH] ia64: fix user_stack_pointer() for ptrace()

2021-03-31 Thread Sergei Trofimovich
ia64 has two stacks:
- memory stack (or stack), pointed at by r12
- register backing store (register stack), pointed at by
  ar.bsp/ar.bspstore with complications around dirty
  register frame on CPU.

In https://bugs.gentoo.org/769614 Dmitry noticed that
PTRACE_GET_SYSCALL_INFO returns the register stack instead of the
memory stack.

The bug comes from the fact that user_stack_pointer() and
current_user_stack_pointer() don't return the same register:

  ulong user_stack_pointer(struct pt_regs *regs) { return regs->ar_bspstore; }
  #define current_user_stack_pointer() (current_pt_regs()->r12)

The change gets both back in sync.

I think ptrace(PTRACE_GET_SYSCALL_INFO) is the only affected user
by this bug on ia64.

The change fixes 'rt_sigreturn.gen.test' strace test where
it was observed initially.

CC: Andrew Morton 
CC: Oleg Nesterov 
CC: linux-i...@vger.kernel.org
Bug: https://bugs.gentoo.org/769614
Reported-by: Dmitry V. Levin 
Signed-off-by: Sergei Trofimovich 
---
 arch/ia64/include/asm/ptrace.h | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/arch/ia64/include/asm/ptrace.h b/arch/ia64/include/asm/ptrace.h
index b3aa46090101..08179135905c 100644
--- a/arch/ia64/include/asm/ptrace.h
+++ b/arch/ia64/include/asm/ptrace.h
@@ -54,8 +54,7 @@
 
 static inline unsigned long user_stack_pointer(struct pt_regs *regs)
 {
-   /* FIXME: should this be bspstore + nr_dirty regs? */
-   return regs->ar_bspstore;
+   return regs->r12;
 }
 
 static inline int is_syscall_success(struct pt_regs *regs)
@@ -79,11 +78,6 @@ static inline long regs_return_value(struct pt_regs *regs)
unsigned long __ip = instruction_pointer(regs); \
(__ip & ~3UL) + ((__ip & 3UL) << 2);\
 })
-/*
- * Why not default?  Because user_stack_pointer() on ia64 gives register
- * stack backing store instead...
- */
-#define current_user_stack_pointer() (current_pt_regs()->r12)
 
   /* given a pointer to a task_struct, return the user's pt_regs */
 # define task_pt_regs(t)   (((struct pt_regs *) ((char *) (t) + 
IA64_STK_OFFSET)) - 1)
-- 
2.31.1
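
The mismatch fixed above can be shown with a two-field mock. This is a
sketch only: struct pt_regs_sketch is hypothetical and carries just the
two registers discussed, while the real ia64 'struct pt_regs' is much
larger.

```c
#include <assert.h>

/* Hypothetical mock holding only the two registers discussed above. */
struct pt_regs_sketch {
    unsigned long r12;          /* memory stack pointer */
    unsigned long ar_bspstore;  /* register backing store pointer */
};

/* Behaviour before the patch: reports the register backing store. */
static unsigned long user_stack_pointer_old(const struct pt_regs_sketch *regs)
{
    return regs->ar_bspstore;
}

/* Behaviour after the patch: reports the memory stack, like r12 users. */
static unsigned long user_stack_pointer_new(const struct pt_regs_sketch *regs)
{
    return regs->r12;
}
```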



Re: [PATCH mm v2] mm, kasan: fix for "integrate page_alloc init with HW_TAGS"

2021-03-30 Thread Sergei Trofimovich
On Tue, 30 Mar 2021 18:44:09 +0200
Vlastimil Babka  wrote:

> On 3/30/21 6:37 PM, Andrey Konovalov wrote:
> > My commit "integrate page_alloc init with HW_TAGS" changed the order of
> > kernel_unpoison_pages() and kernel_init_free_pages() calls. This leads
> > to complaints from the page unpoisoning code, as the poison pattern gets
> > overwritten for __GFP_ZERO allocations.
> > 
> > Fix by restoring the initial order. Also add a warning comment.
> > 
> > Reported-by: Vlastimil Babka 
> > Reported-by: Sergei Trofimovich 
> > Signed-off-by: Andrey Konovalov   
> 
> Tested that the bug indeed occurs in -next and is fixed by this. Thanks.

Reviewed-by: Sergei Trofimovich 

> > ---
> >  mm/page_alloc.c | 8 +++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> > 
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 033bd92e8398..d2c020563c0b 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -2328,6 +2328,13 @@ inline void post_alloc_hook(struct page *page, 
> > unsigned int order,
> > arch_alloc_page(page, order);
> > debug_pagealloc_map_pages(page, 1 << order);
> >  
> > +   /*
> > +* Page unpoisoning must happen before memory initialization.
> > +* Otherwise, the poison pattern will be overwritten for __GFP_ZERO
> > +* allocations and the page unpoisoning code will complain.
> > +*/
> > +   kernel_unpoison_pages(page, 1 << order);
> > +
> > /*
> >  * As memory initialization might be integrated into KASAN,
> >  * kasan_alloc_pages and kernel_init_free_pages must be
> > @@ -2338,7 +2345,6 @@ inline void post_alloc_hook(struct page *page, 
> > unsigned int order,
> > if (init && !kasan_has_integrated_init())
> > kernel_init_free_pages(page, 1 << order);
> >  
> > -   kernel_unpoison_pages(page, 1 << order);
> > set_page_owner(page, order, gfp_flags);
> >  }
> >  
> >   
> 


-- 

  Sergei


[PATCH v2 3/3] hpsa: add an assert to prevent from __packed reintroduction

2021-03-30 Thread Sergei Trofimovich
CC: linux-i...@vger.kernel.org
CC: storage...@microchip.com
CC: linux-s...@vger.kernel.org
CC: Joe Szczypek 
CC: Scott Benesh 
CC: Scott Teel 
CC: Tomas Henzl 
CC: "Martin K. Petersen" 
CC: Don Brace 
Reported-by: John Paul Adrian Glaubitz 
Suggested-by: Don Brace 
Fixes: f749d8b7a "scsi: hpsa: Correct dev cmds outstanding for retried cmds"
Signed-off-by: Sergei Trofimovich 
---
 drivers/scsi/hpsa_cmd.h | 12 
 1 file changed, 12 insertions(+)

diff --git a/drivers/scsi/hpsa_cmd.h b/drivers/scsi/hpsa_cmd.h
index 885b1f1fb20a..ba6a3aa8d954 100644
--- a/drivers/scsi/hpsa_cmd.h
+++ b/drivers/scsi/hpsa_cmd.h
@@ -22,6 +22,9 @@
 
 #include 
 
+#include  /* static_assert */
+#include  /* offsetof */
+
 /* general boundary defintions */
 #define SENSEINFOBYTES  32 /* may vary between hbas */
 #define SG_ENTRIES_IN_CMD  32 /* Max SG entries excluding chain blocks */
@@ -454,6 +457,15 @@ struct CommandList {
atomic_t refcount; /* Must be last to avoid memset in hpsa_cmd_init() */
 } __aligned(COMMANDLIST_ALIGNMENT);
 
+/*
+ * Make sure our embedded atomic variable is aligned. Otherwise we break atomic
+ * operations on architectures that don't support unaligned atomics like IA64.
+ *
+ * The assert guards against reintroduction of unwanted __packed to
+ * the struct CommandList.
+ */
+static_assert(offsetof(struct CommandList, refcount) % __alignof__(atomic_t) 
== 0);
+
 /* Max S/G elements in I/O accelerator command */
 #define IOACCEL1_MAXSGENTRIES   24
 #define IOACCEL2_MAXSGENTRIES  28
-- 
2.31.1
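
The effect of the assert can be tried in user space with plain C11. The
sketch below assumes GCC or Clang (for __attribute__((packed))) and a
target where int needs more than 1-byte alignment; both struct names are
hypothetical and int stands in for atomic_t.

```c
#include <assert.h>    /* static_assert (C11) */
#include <stdalign.h>  /* alignof */
#include <stddef.h>    /* offsetof */

/* Naturally laid out: the compiler pads 'flag' so 'refcount' is aligned. */
struct cmd_natural {
    char flag;
    int refcount;
};

/* Packed: no padding, so 'refcount' lands at offset 1. */
struct cmd_packed {
    char flag;
    int refcount;
} __attribute__((packed));

/* Compiles: the member offset is a multiple of its alignment. */
static_assert(offsetof(struct cmd_natural, refcount) % alignof(int) == 0,
              "refcount must be naturally aligned");

/*
 * The same static_assert on struct cmd_packed would stop the build,
 * so the check is exposed as a run-time value here instead.
 */
static int packed_refcount_is_aligned(void)
{
    return offsetof(struct cmd_packed, refcount) % alignof(int) == 0;
}
```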



[PATCH v2 2/3] hpsa: fix boot on ia64 (atomic_t alignment)

2021-03-30 Thread Sergei Trofimovich
The failure was initially observed as a boot failure on an rx3600 ia64 machine
with a RAID bus controller: Hewlett-Packard Company Smart Array P600:

kernel unaligned access to 0xe00105dd8b95, ip=0xa00100b87551
kernel unaligned access to 0xe00105dd8e95, ip=0xa00100b87551
hpsa :14:01.0: Controller reports max supported commands of 0 Using 16 
instead. Ensure that firmware is up to date.
swapper/0[1]: error during unaligned kernel access

Here the unaligned access comes from 'struct CommandList', which happens
to be packed. The change f749d8b7a ("scsi: hpsa: Correct dev cmds
outstanding for retried cmds") introduced unexpected padding and
moved atomic_t away from its natural alignment.

This change removes the packing annotation from a struct that is not
intended to be sent to the controller as is. This restores natural
`atomic_t` alignment.

The change is tested on the same rx3600 machine.

CC: linux-i...@vger.kernel.org
CC: linux-kernel@vger.kernel.org
CC: storage...@microchip.com
CC: linux-s...@vger.kernel.org
CC: Joe Szczypek 
CC: Scott Benesh 
CC: Scott Teel 
CC: Tomas Henzl 
CC: "Martin K. Petersen" 
CC: Don Brace 
Reported-by: John Paul Adrian Glaubitz 
Suggested-by: Don Brace 
Fixes: f749d8b7a "scsi: hpsa: Correct dev cmds outstanding for retried cmds"
Signed-off-by: Sergei Trofimovich 
---
 drivers/scsi/hpsa_cmd.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/hpsa_cmd.h b/drivers/scsi/hpsa_cmd.h
index 280e933d27e7..885b1f1fb20a 100644
--- a/drivers/scsi/hpsa_cmd.h
+++ b/drivers/scsi/hpsa_cmd.h
@@ -452,7 +452,7 @@ struct CommandList {
bool retry_pending;
struct hpsa_scsi_dev_t *device;
atomic_t refcount; /* Must be last to avoid memset in hpsa_cmd_init() */
-} __packed __aligned(COMMANDLIST_ALIGNMENT);
+} __aligned(COMMANDLIST_ALIGNMENT);
 
 /* Max S/G elements in I/O accelerator command */
 #define IOACCEL1_MAXSGENTRIES   24
-- 
2.31.1



[PATCH v2 1/3] hpsa: use __packed on individual structs, not header-wide

2021-03-30 Thread Sergei Trofimovich
Some of the structs contain `atomic_t` values and are not intended to be
sent to IO controller as is.

The change adds __packed to every struct and union in the file.
Follow-up commits will fix `atomic_t` problems.

The commit is a no-op at least on ia64:
$ diff -u <(objdump -d -r old.o) <(objdump -d -r new.o)

CC: linux-i...@vger.kernel.org
CC: storage...@microchip.com
CC: linux-s...@vger.kernel.org
CC: Joe Szczypek 
CC: Scott Benesh 
CC: Scott Teel 
CC: Tomas Henzl 
CC: "Martin K. Petersen" 
CC: Don Brace 
Reported-by: John Paul Adrian Glaubitz 
Suggested-by: Don Brace 
Fixes: f749d8b7a "scsi: hpsa: Correct dev cmds outstanding for retried cmds"
Signed-off-by: Sergei Trofimovich 
---
 drivers/scsi/hpsa_cmd.h | 68 -
 1 file changed, 34 insertions(+), 34 deletions(-)

diff --git a/drivers/scsi/hpsa_cmd.h b/drivers/scsi/hpsa_cmd.h
index d126bb877250..280e933d27e7 100644
--- a/drivers/scsi/hpsa_cmd.h
+++ b/drivers/scsi/hpsa_cmd.h
@@ -20,6 +20,8 @@
 #ifndef HPSA_CMD_H
 #define HPSA_CMD_H
 
+#include 
+
 /* general boundary defintions */
 #define SENSEINFOBYTES  32 /* may vary between hbas */
 #define SG_ENTRIES_IN_CMD  32 /* Max SG entries excluding chain blocks */
@@ -200,12 +202,10 @@ union u64bit {
MAX_EXT_TARGETS + 1) /* + 1 is for the controller itself */
 
 /* SCSI-3 Commands */
-#pragma pack(1)
-
 #define HPSA_INQUIRY 0x12
 struct InquiryData {
u8 data_byte[36];
-};
+} __packed;
 
 #define HPSA_REPORT_LOG 0xc2/* Report Logical LUNs */
 #define HPSA_REPORT_PHYS 0xc3   /* Report Physical LUNs */
@@ -221,7 +221,7 @@ struct raid_map_disk_data {
u8xor_mult[2];/**< XOR multipliers for this position,
*  valid for data disks only */
u8reserved[2];
-};
+} __packed;
 
 struct raid_map_data {
__le32   structure_size;/* Size of entire structure in bytes */
@@ -247,14 +247,14 @@ struct raid_map_data {
__le16   dekindex;  /* Data encryption key index. */
u8reserved[16];
struct raid_map_disk_data data[RAID_MAP_MAX_ENTRIES];
-};
+} __packed;
 
 struct ReportLUNdata {
u8 LUNListLength[4];
u8 extended_response_flag;
u8 reserved[3];
u8 LUN[HPSA_MAX_LUN][8];
-};
+} __packed;
 
 struct ext_report_lun_entry {
u8 lunid[8];
@@ -269,20 +269,20 @@ struct ext_report_lun_entry {
u8 lun_count; /* multi-lun device, how many luns */
u8 redundant_paths;
u32 ioaccel_handle; /* ioaccel1 only uses lower 16 bits */
-};
+} __packed;
 
 struct ReportExtendedLUNdata {
u8 LUNListLength[4];
u8 extended_response_flag;
u8 reserved[3];
struct ext_report_lun_entry LUN[HPSA_MAX_PHYS_LUN];
-};
+} __packed;
 
 struct SenseSubsystem_info {
u8 reserved[36];
u8 portname[8];
u8 reserved1[1108];
-};
+} __packed;
 
 /* BMIC commands */
 #define BMIC_READ 0x26
@@ -317,7 +317,7 @@ union SCSI3Addr {
u8 Targ:6;
u8 Mode:2;/* b10 */
} LogUnit;
-};
+} __packed;
 
 struct PhysDevAddr {
u32 TargetId:24;
@@ -325,20 +325,20 @@ struct PhysDevAddr {
u32 Mode:2;
/* 2 level target device addr */
union SCSI3Addr  Target[2];
-};
+} __packed;
 
 struct LogDevAddr {
u32VolId:30;
u32Mode:2;
u8 reserved[4];
-};
+} __packed;
 
 union LUNAddr {
u8   LunAddrBytes[8];
union SCSI3AddrSCSI3Lun[4];
struct PhysDevAddr PhysDev;
struct LogDevAddr  LogDev;
-};
+} __packed;
 
 struct CommandListHeader {
u8  ReplyQueue;
@@ -346,7 +346,7 @@ struct CommandListHeader {
__le16  SGTotal;
__le64  tag;
union LUNAddr LUN;
-};
+} __packed;
 
 struct RequestBlock {
u8   CDBLen;
@@ -365,18 +365,18 @@ struct RequestBlock {
 #define GET_DIR(tad) (((tad) >> 6) & 0x03)
u16  Timeout;
u8   CDB[16];
-};
+} __packed;
 
 struct ErrDescriptor {
__le64 Addr;
__le32 Len;
-};
+} __packed;
 
 struct SGDescriptor {
__le64 Addr;
__le32 Len;
__le32 Ext;
-};
+} __packed;
 
 union MoreErrInfo {
struct {
@@ -390,7 +390,8 @@ union MoreErrInfo {
u8  offense_num;  /* byte # of offense 0-base */
u32 offense_value;
} Invalid_Cmd;
-};
+} __packed;
+
 struct ErrorInfo {
u8   ScsiStatus;
u8   SenseLen;
@@ -398,7 +399,7 @@ struct ErrorInfo {
u32  ResidualCnt;
union MoreErrInfo  MoreErrInfo;
u8   SenseInfo[SENSEINFOBYTES];
-};
+} __packed;
 /* Command types */
 #define CMD_IOCTL_PEND  0x01
 #define CMD_SCSI   0x03
@@ -451,7 +452,7 @@ struct CommandList {
bool retry_pending;

[PATCH v3] mm: page_alloc: ignore init_on_free=1 for debug_pagealloc=1

2021-03-29 Thread Sergei Trofimovich
On !ARCH_SUPPORTS_DEBUG_PAGEALLOC (like ia64) debug_pagealloc=1
implies page_poison=on:

if (page_poisoning_enabled() ||
 (!IS_ENABLED(CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC) &&
  debug_pagealloc_enabled()))
static_branch_enable(&_page_poisoning_enabled);

page_poison=on needs to override init_on_free=1.

Before the change it did not work as expected for the following case:
- have PAGE_POISONING=y
- have page_poison unset
- have !ARCH_SUPPORTS_DEBUG_PAGEALLOC arch (like ia64)
- have init_on_free=1
- have debug_pagealloc=1

That way we get both keys enabled:
- static_branch_enable(&init_on_free);
- static_branch_enable(&_page_poisoning_enabled);

which leads to poisoned pages returned for __GFP_ZERO pages.

After the change we execute only:
- static_branch_enable(&_page_poisoning_enabled);
and ignore init_on_free=1.

Acked-by: Vlastimil Babka 
Fixes: 8db26a3d4735 ("mm, page_poison: use static key more efficiently")
Cc: 
CC: Andrew Morton 
CC: linux...@kvack.org
CC: David Hildenbrand 
CC: Andrey Konovalov 
Link: https://lkml.org/lkml/2021/3/26/443

Signed-off-by: Sergei Trofimovich 
---
Change since v2:
- Added 'Fixes:' and 'CC: stable@' suggested by Vlastimil and David
- Renamed local variable to 'page_poisoning_requested' for
  consistency suggested by David
- Simplified initialization of page_poisoning_requested suggested
  by David
- Added 'Acked-by: Vlastimil'

 mm/page_alloc.c | 30 +-
 1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cfc72873961d..4bb3cdfc47f8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -764,32 +764,36 @@ static inline void clear_page_guard(struct zone *zone, 
struct page *page,
  */
 void init_mem_debugging_and_hardening(void)
 {
+   bool page_poisoning_requested = false;
+
+#ifdef CONFIG_PAGE_POISONING
+   /*
+* Page poisoning is debug page alloc for some arches. If
+* either of those options are enabled, enable poisoning.
+*/
+   if (page_poisoning_enabled() ||
+(!IS_ENABLED(CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC) &&
+ debug_pagealloc_enabled())) {
+   static_branch_enable(&_page_poisoning_enabled);
+   page_poisoning_requested = true;
+   }
+#endif
+
if (_init_on_alloc_enabled_early) {
-   if (page_poisoning_enabled())
+   if (page_poisoning_requested)
pr_info("mem auto-init: CONFIG_PAGE_POISONING is on, "
"will take precedence over init_on_alloc\n");
else
static_branch_enable(&init_on_alloc);
}
if (_init_on_free_enabled_early) {
-   if (page_poisoning_enabled())
+   if (page_poisoning_requested)
pr_info("mem auto-init: CONFIG_PAGE_POISONING is on, "
"will take precedence over init_on_free\n");
else
static_branch_enable(&init_on_free);
}
 
-#ifdef CONFIG_PAGE_POISONING
-   /*
-* Page poisoning is debug page alloc for some arches. If
-* either of those options are enabled, enable poisoning.
-*/
-   if (page_poisoning_enabled() ||
-(!IS_ENABLED(CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC) &&
- debug_pagealloc_enabled()))
-   static_branch_enable(&_page_poisoning_enabled);
-#endif
-
 #ifdef CONFIG_DEBUG_PAGEALLOC
if (!debug_pagealloc_enabled())
return;
-- 
2.31.1
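
The ordering problem can be reduced to a small decision function. The
sketch below assumes CONFIG_PAGE_POISONING=y and models the boot options
as plain booleans; struct boot_opts, struct keys, decide_old() and
decide_new() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the boot-time inputs. */
struct boot_opts {
    bool page_poison;                    /* page_poison=on */
    bool debug_pagealloc;                /* debug_pagealloc=1 */
    bool arch_supports_debug_pagealloc;  /* false on ia64 */
    bool init_on_free;                   /* init_on_free=1 */
};

/* Which static keys end up enabled. */
struct keys {
    bool poisoning;
    bool init_on_free;
};

static bool poisoning_requested(const struct boot_opts *o)
{
    return o->page_poison ||
           (!o->arch_supports_debug_pagealloc && o->debug_pagealloc);
}

/* Before: init_on_free looked only at the page_poison parameter. */
static struct keys decide_old(const struct boot_opts *o)
{
    struct keys k = { false, false };

    if (o->init_on_free && !o->page_poison)
        k.init_on_free = true;
    if (poisoning_requested(o))
        k.poisoning = true;
    return k;
}

/* After: poisoning is decided first and overrides init_on_free. */
static struct keys decide_new(const struct boot_opts *o)
{
    struct keys k = { false, false };

    k.poisoning = poisoning_requested(o);
    if (o->init_on_free && !k.poisoning)
        k.init_on_free = true;
    return k;
}
```

With the failing combination from the commit message (page_poison unset,
debug_pagealloc=1, init_on_free=1, no arch support), decide_old() enables
both keys while decide_new() enables only poisoning.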



Re: [PATCH] ia64: tools: add generic errno.h definition

2021-03-28 Thread Sergei Trofimovich
On Sat, 27 Mar 2021 10:18:18 +
Sergei Trofimovich  wrote:

> On Fri, Mar 12, 2021 at 07:51:35AM +0000, Sergei Trofimovich wrote:
> > Noticed a missing header when building the bpfilter helper:
> > 
> > CC [U]  net/bpfilter/main.o
> >   In file included from /usr/include/linux/errno.h:1,
> >from /usr/include/bits/errno.h:26,
> >from /usr/include/errno.h:28,
> >from net/bpfilter/main.c:4:
> >   tools/include/uapi/asm/errno.h:13:10: fatal error:
> > ../../../arch/ia64/include/uapi/asm/errno.h: No such file or directory
> >  13 | #include "../../../arch/ia64/include/uapi/asm/errno.h"
> > |  ^
> > 
> > CC: linux-kernel@vger.kernel.org
> > CC: net...@vger.kernel.org
> > CC: b...@vger.kernel.org
> > Signed-off-by: Sergei Trofimovich   
> 
> Any chance to pick it up?

Alternative (and nicer) patch is queued in -mm as:

https://www.ozlabs.org/~akpm/mmotm/broken-out/ia64-tools-remove-inclusion-of-ia64-specific-version-of-errnoh-header.patch

-- 

  Sergei


[PATCH] ia64: mca: always make IA64_MCA_DEBUG an expression

2021-03-28 Thread Sergei Trofimovich
At least ia64_mca_log_sal_error_record() expects some statement:

static void ia64_mca_log_sal_error_record(int sal_info_type)
{
...
if (irq_safe)
IA64_MCA_DEBUG("CPU %d: SAL log contains %s error record\n",
smp_processor_id(),
sal_info_type < ARRAY_SIZE(rec_name) ? rec_name[sal_info_type] 
: "UNKNOWN");
...
}

Instead of fixing all callers the change explicitly makes IA64_MCA_DEBUG
a non-empty expression.

CC: Andrew Morton 
CC: linux-i...@vger.kernel.org
Signed-off-by: Sergei Trofimovich 
---
 arch/ia64/kernel/mca.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/ia64/kernel/mca.c b/arch/ia64/kernel/mca.c
index 79e76712198c..16088c645e2b 100644
--- a/arch/ia64/kernel/mca.c
+++ b/arch/ia64/kernel/mca.c
@@ -109,9 +109,9 @@
 #include "irq.h"
 
 #if defined(IA64_MCA_DEBUG_INFO)
-# define IA64_MCA_DEBUG(fmt...)printk(fmt)
+# define IA64_MCA_DEBUG(fmt...) printk(fmt)
 #else
-# define IA64_MCA_DEBUG(fmt...)
+# define IA64_MCA_DEBUG(fmt...) do {} while (0)
 #endif
 
 #define NOTIFY_INIT(event, regs, arg, spin)\
-- 
2.31.1
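
A user-space sketch of the macro pattern, with hypothetical SKETCH_ names.
With an empty expansion the 'if' body below would be a bare ';', which
compilers may flag (for example GCC's -Wempty-body). That this warning is
what was hit in mca.c is my assumption; the commit message only says a
statement is expected.

```c
#include <assert.h>
#include <stdio.h>

/* SKETCH_DEBUG_INFO is deliberately left undefined here. */
#if defined(SKETCH_DEBUG_INFO)
# define SKETCH_DEBUG(...) printf(__VA_ARGS__)
#else
/* Expands to a real (empty) statement, safe as an if/else body. */
# define SKETCH_DEBUG(...) do {} while (0)
#endif

/* Returns 1 when the else branch ran, 0 otherwise. */
static int log_record_sketch(int irq_safe, int cpu)
{
    int took_else = 0;

    (void)cpu;    /* unused when debugging is compiled out */

    if (irq_safe)
        SKETCH_DEBUG("CPU %d: SAL log contains an error record\n", cpu);
    else
        took_else = 1;

    return took_else;
}
```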



[PATCH v2] ia64: fix EFI_DEBUG build

2021-03-28 Thread Sergei Trofimovich
When I enabled local debugging via `#define EFI_DEBUG 1` I noticed a
build failure:
arch/ia64/kernel/efi.c:564:8: error: 'i' undeclared (first use in this 
function)

While at it fixed benign string format mismatches visible only
when EFI_DEBUG is enabled:

arch/ia64/kernel/efi.c:589:11:
warning: format '%lx' expects argument of type 'long unsigned int',
but argument 5 has type 'u64' {aka 'long long unsigned int'} [-Wformat=]

Fixes: 14fb42090943559 ("efi: Merge EFI system table revision and vendor 
checks")
CC: Ard Biesheuvel 
CC: linux-...@vger.kernel.org
CC: linux-i...@vger.kernel.org
Signed-off-by: Sergei Trofimovich 
---
Change since v1: mention explicitly format string change

 arch/ia64/kernel/efi.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/arch/ia64/kernel/efi.c b/arch/ia64/kernel/efi.c
index c5fe21de46a8..31149e41f9be 100644
--- a/arch/ia64/kernel/efi.c
+++ b/arch/ia64/kernel/efi.c
@@ -415,10 +415,10 @@ efi_get_pal_addr (void)
mask  = ~((1 << IA64_GRANULE_SHIFT) - 1);
 
printk(KERN_INFO "CPU %d: mapping PAL code "
-   "[0x%lx-0x%lx) into [0x%lx-0x%lx)\n",
-   smp_processor_id(), md->phys_addr,
-   md->phys_addr + efi_md_size(md),
-   vaddr & mask, (vaddr & mask) + IA64_GRANULE_SIZE);
+   "[0x%llx-0x%llx) into [0x%llx-0x%llx)\n",
+   smp_processor_id(), md->phys_addr,
+   md->phys_addr + efi_md_size(md),
+   vaddr & mask, (vaddr & mask) + IA64_GRANULE_SIZE);
 #endif
return __va(md->phys_addr);
}
@@ -560,6 +560,7 @@ efi_init (void)
{
efi_memory_desc_t *md;
void *p;
+   unsigned int i;
 
for (i = 0, p = efi_map_start; p < efi_map_end;
 ++i, p += efi_desc_size)
@@ -586,7 +587,7 @@ efi_init (void)
}
 
printk("mem%02d: %s "
-  "range=[0x%016lx-0x%016lx) (%4lu%s)\n",
+  "range=[0x%016llx-0x%016llx) (%4lu%s)\n",
   i, efi_md_typeattr_format(buf, sizeof(buf), md),
   md->phys_addr,
   md->phys_addr + efi_md_size(md), size, unit);
-- 
2.31.1
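
The format fix can be reproduced outside the kernel. In this sketch
u64_sketch is a hypothetical typedef that mirrors the 'long long unsigned
int' type named in the warning above, and format_range() is an
illustrative helper, not kernel code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Mirrors the type from the warning: u64 is 'long long unsigned int'. */
typedef unsigned long long u64_sketch;

/* Formats a half-open address range; %llx matches unsigned long long. */
static int format_range(char *buf, size_t len, u64_sketch start,
                        u64_sketch size)
{
    return snprintf(buf, len, "[0x%llx-0x%llx)", start, start + size);
}
```

Using %lx here instead would trigger the same -Wformat warning on
compilers that check printf-style format strings.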



[PATCH v2] ia64: simplify code flow around swiotlb init

2021-03-28 Thread Sergei Trofimovich
Before the change CONFIG_INTEL_IOMMU && !CONFIG_SWIOTLB && !CONFIG_FLATMEM
could skip `set_max_mapnr(max_low_pfn);` if iommu is not present on system.

CC: Andrew Morton 
CC: John Paul Adrian Glaubitz 
CC: linux-i...@vger.kernel.org
Signed-off-by: Sergei Trofimovich 
---
Change since v1: fixed a typo in commit message.

 arch/ia64/mm/init.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index 16d0d7d22657..a63585db94fe 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -644,13 +644,16 @@ mem_init (void)
 * _before_ any drivers that may need the PCI DMA interface are
 * initialized or bootmem has been freed.
 */
+   do {
 #ifdef CONFIG_INTEL_IOMMU
-   detect_intel_iommu();
-   if (!iommu_detected)
+   detect_intel_iommu();
+   if (iommu_detected)
+   break;
 #endif
 #ifdef CONFIG_SWIOTLB
swiotlb_init(1);
 #endif
+   } while (0);
 
 #ifdef CONFIG_FLATMEM
BUG_ON(!mem_map);
-- 
2.31.1



Re: [PATCH] mm: add page_owner_stack=off to make stack collection optional

2021-03-27 Thread Sergei Trofimovich
On Sun, 21 Mar 2021 21:25:01 +
Sergei Trofimovich  wrote:

> On some architectures (like ia64) stack walking is slow
> and currently requires memory allocation. This causes stack
> collection for page_owner=on to fall into recursion.
> 
> This patch implements a page_owner_stack=off to allow page stats
> collection.

A more user-friendly alternative would be to have a GFP_ flag similar
to __GFP_NOLOCKDEP which would allow us to skip the recursion.
I'll prepare an alternative patch.

> Signed-off-by: Sergei Trofimovich 
> ---
>  .../admin-guide/kernel-parameters.txt |  6 +
>  mm/Kconfig.debug  |  3 ++-
>  mm/page_owner.c   | 23 +--
>  3 files changed, 24 insertions(+), 8 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt 
> b/Documentation/admin-guide/kernel-parameters.txt
> index 04545725f187..3e710c4ab4df 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -3518,6 +3518,12 @@
>   we can turn it on.
>   on: enable the feature
>  
> + page_owner_stack= [KNL] Boot-time parameter option disabling stack
> + collection of page allocation. Has effect only if
> + "page_owner=on" is set. Useful for cases when stack
> + collection is too slow or not feasible.
> + off: disable the feature
> +
>   page_poison=[KNL] Boot-time parameter changing the state of
>   poisoning on the buddy allocator, available with
>   CONFIG_PAGE_POISONING=y.
> diff --git a/mm/Kconfig.debug b/mm/Kconfig.debug
> index 1e73717802f8..c1ecaf066c93 100644
> --- a/mm/Kconfig.debug
> +++ b/mm/Kconfig.debug
> @@ -57,7 +57,8 @@ config PAGE_OWNER
> help to find bare alloc_page(s) leaks. Even if you include this
> feature on your build, it is disabled in default. You should pass
> "page_owner=on" to boot parameter in order to enable it. Eats
> -   a fair amount of memory if enabled. See tools/vm/page_owner_sort.c
> +   a fair amount of memory if enabled. Call chain tracking can be
> +   disabled with "page_owner_stack=off". See tools/vm/page_owner_sort.c
> for user-space helper.
>  
> If unsure, say N.
> diff --git a/mm/page_owner.c b/mm/page_owner.c
> index d15c7c4994f5..2cc1113fa28d 100644
> --- a/mm/page_owner.c
> +++ b/mm/page_owner.c
> @@ -31,6 +31,7 @@ struct page_owner {
>  };
>  
>  static bool page_owner_enabled = false;
> +static bool page_owner_stack_enabled = true;
>  DEFINE_STATIC_KEY_FALSE(page_owner_inited);
>  
>  static depot_stack_handle_t dummy_handle;
> @@ -41,21 +42,26 @@ static void init_early_allocated_pages(void);
>  
>  static int __init early_page_owner_param(char *buf)
>  {
> - if (!buf)
> - return -EINVAL;
> -
> - if (strcmp(buf, "on") == 0)
> - page_owner_enabled = true;
> -
> - return 0;
> + return kstrtobool(buf, &page_owner_enabled);
>  }
>  early_param("page_owner", early_page_owner_param);
>  
> +static int __init early_page_owner_stack_param(char *buf)
> +{
> + return kstrtobool(buf, &page_owner_stack_enabled);
> +}
> +early_param("page_owner_stack", early_page_owner_stack_param);
> +
>  static bool need_page_owner(void)
>  {
>   return page_owner_enabled;
>  }
>  
> +static bool need_page_owner_stack(void)
> +{
> + return page_owner_stack_enabled;
> +}
> +
>  static __always_inline depot_stack_handle_t create_dummy_stack(void)
>  {
>   unsigned long entries[4];
> @@ -122,6 +128,9 @@ static noinline depot_stack_handle_t save_stack(gfp_t 
> flags)
>   depot_stack_handle_t handle;
>   unsigned int nr_entries;
>  
> + if (!need_page_owner_stack())
> + return failure_handle;
> +
>   nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 2);
>  
>   /*
> -- 
> 2.31.0
> 


-- 

  Sergei


[PATCH v2] mm: page_alloc: ignore init_on_free=1 for debug_pagealloc=1

2021-03-27 Thread Sergei Trofimovich
On !ARCH_SUPPORTS_DEBUG_PAGEALLOC (like ia64) debug_pagealloc=1
implies page_poison=on:

if (page_poisoning_enabled() ||
 (!IS_ENABLED(CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC) &&
  debug_pagealloc_enabled()))
static_branch_enable(&_page_poisoning_enabled);

page_poison=on needs to override init_on_free=1.

Before the change it happened too late for the following case:
- have PAGE_POISONING=y
- have page_poison unset
- have !ARCH_SUPPORTS_DEBUG_PAGEALLOC arch (like ia64)
- have init_on_free=1
- have debug_pagealloc=1

That way we get both keys enabled:
- static_branch_enable(&init_on_free);
- static_branch_enable(&_page_poisoning_enabled);

which leads to poisoned pages returned for __GFP_ZERO pages.

After the change we execute only:
- static_branch_enable(&_page_poisoning_enabled);
and ignore init_on_free=1.

CC: Vlastimil Babka 
CC: Andrew Morton 
CC: linux...@kvack.org
CC: David Hildenbrand 
CC: Andrey Konovalov 
Link: https://lkml.org/lkml/2021/3/26/443
Signed-off-by: Sergei Trofimovich 
---
 mm/page_alloc.c | 30 +-
 1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d57d9b4f7089..10a8a1d28c11 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -764,32 +764,36 @@ static inline void clear_page_guard(struct zone *zone, 
struct page *page,
  */
 void init_mem_debugging_and_hardening(void)
 {
+   bool page_poison_requested = page_poisoning_enabled();
+
+#ifdef CONFIG_PAGE_POISONING
+   /*
+* Page poisoning is debug page alloc for some arches. If
+* either of those options are enabled, enable poisoning.
+*/
+   if (page_poisoning_enabled() ||
+(!IS_ENABLED(CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC) &&
+ debug_pagealloc_enabled())) {
+   static_branch_enable(&_page_poisoning_enabled);
+   page_poison_requested = true;
+   }
+#endif
+
if (_init_on_alloc_enabled_early) {
-   if (page_poisoning_enabled())
+   if (page_poison_requested)
pr_info("mem auto-init: CONFIG_PAGE_POISONING is on, "
"will take precedence over init_on_alloc\n");
else
static_branch_enable(&init_on_alloc);
}
if (_init_on_free_enabled_early) {
-   if (page_poisoning_enabled())
+   if (page_poison_requested)
pr_info("mem auto-init: CONFIG_PAGE_POISONING is on, "
"will take precedence over init_on_free\n");
else
static_branch_enable(&init_on_free);
}
 
-#ifdef CONFIG_PAGE_POISONING
-   /*
-* Page poisoning is debug page alloc for some arches. If
-* either of those options are enabled, enable poisoning.
-*/
-   if (page_poisoning_enabled() ||
-(!IS_ENABLED(CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC) &&
- debug_pagealloc_enabled()))
-   static_branch_enable(&_page_poisoning_enabled);
-#endif
-
 #ifdef CONFIG_DEBUG_PAGEALLOC
if (!debug_pagealloc_enabled())
return;
-- 
2.31.0
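
To make the ordering problem concrete, here is a small userspace model of the
decision logic before and after the change. It is only a sketch under the
assumption that CONFIG_PAGE_POISONING=y; struct boot_opts, struct enabled_keys,
decide_old() and decide_new() are illustrative names, not kernel symbols, and
the booleans stand in for the static keys.

```c
#include <stdbool.h>

/* Boot-time inputs; field names are illustrative, not kernel symbols. */
struct boot_opts {
	bool page_poison;              /* page_poison=on */
	bool debug_pagealloc;          /* debug_pagealloc=1 */
	bool arch_has_debug_pagealloc; /* ARCH_SUPPORTS_DEBUG_PAGEALLOC */
	bool init_on_alloc;            /* init_on_alloc=1 */
	bool init_on_free;             /* init_on_free=1 */
};

/* Which static keys end up enabled. */
struct enabled_keys {
	bool poisoning;
	bool init_on_alloc;
	bool init_on_free;
};

/* Poisoning is on if requested directly or implied by debug_pagealloc. */
static bool poison_implied(const struct boot_opts *o)
{
	return o->page_poison ||
	       (!o->arch_has_debug_pagealloc && o->debug_pagealloc);
}

/* Old order: init_on_* look at page_poison alone, poisoning comes later. */
static struct enabled_keys decide_old(const struct boot_opts *o)
{
	struct enabled_keys k = { false, false, false };

	k.init_on_alloc = o->init_on_alloc && !o->page_poison;
	k.init_on_free = o->init_on_free && !o->page_poison;
	k.poisoning = poison_implied(o);
	return k;
}

/* New order: resolve poisoning first, then let it override init_on_*. */
static struct enabled_keys decide_new(const struct boot_opts *o)
{
	struct enabled_keys k = { false, false, false };

	k.poisoning = poison_implied(o);
	k.init_on_alloc = o->init_on_alloc && !k.poisoning;
	k.init_on_free = o->init_on_free && !k.poisoning;
	return k;
}
```

Feeding the ia64 case (page_poison unset, debug_pagealloc=1, no arch support,
init_on_alloc=1, init_on_free=1) into decide_old() leaves both poisoning and
init_on_free enabled, while decide_new() leaves only poisoning.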



Re: [PATCH] mm: page_alloc: ignore init_on_free=1 for page alloc

2021-03-27 Thread Sergei Trofimovich
On Fri, 26 Mar 2021 17:25:22 +
Sergei Trofimovich  wrote:

> On Fri, 26 Mar 2021 15:17:00 +0100
> Vlastimil Babka  wrote:
> 
> > On 3/26/21 12:26 PM, Sergei Trofimovich wrote:  
> > > init_on_free=1 does not guarantee that free pages contain only zero bytes.
> > > 
> > > Some examples:
> > > 1. page_poison=on takes precedence over init_on_alloc=1 / init_on_free=1
> > 
> > Yes, and it spits out a message that you enabled both and poisoning takes
> > precedence. It was that way even before my changes IIRC, but not consistent.
> 
> Yeah. I probably should not have included this case as page_poison=on actually
> made my machine boot just fine. My main focus was to understand why I am
> seeing the crash on kernel with init_on_alloc=1 init_on_free=1 and most
> debugging options on.
> 
> My apologies! I'll try to find where this extra poisoning comes from.
> 
> Taking a step back and explaining my setup:
> 
> Initially it's an ia64 box that manages to consistently corrupt memory
> on socket free; https://lkml.org/lkml/2021/2/23/653
> 
> To get a better understanding of where the corruption comes from I enabled
> A Lot of VM, pagealloc and slab debugging options. Full config:
> 
> 
> https://dev.gentoo.org/~slyfox/configs/guppy-config-5.12.0-rc4-00016-g427684abc9fd-dirty
> 
> I boot machine as:
> 
> [0.00] Kernel command line: 
> BOOT_IMAGE=/boot/vmlinuz-5.12.0-rc4-00016-g427684abc9fd-dirty root=/dev/sda3 
> ro slab_nomerge memblock=debug debug_pagealloc=1 hardened_usercopy=1 
> page_owner=on page_poison=0 init_on_alloc=1 init_on_free=1 
> debug_guardpage_minorder=0
> 
> My boot log:
> 
> 
> https://dev.gentoo.org/~slyfox/bugs/ia64-boot-bug/2021-03-26-init_on_alloc-fail
> 
> Caveats in reading boot log:
> - kernel crashes too early: stack unwinder does not have working
>   kmalloc() yet
> - kernel crashes in MCE handler: normally it should not. It's an
>   unrelated bug that makes backtrace useless. I'll try to fix it later,
>   but it will not be fast.
> - I added a bunch of printk()s around the crash.
> 
> The important kernel boot failure part is:
>   [0.00] put_kernel_page: pmd=e001
>   [0.00] pmd:(ptrval):   
>    

I added WARN_ON_ONCE(1) to __kernel_poison_pages() to get the idea where
poisoning comes from and got it at:

[0.00] [ cut here ]
[0.00] WARNING: CPU: 0 PID: 0 at mm/page_poison.c:40 
__kernel_poison_pages+0x1a0/0x1c0
[0.00] Modules linked in:
[0.00] CPU: 0 PID: 0 Comm: swapper Not tainted 
5.12.0-rc4-00016-g427684abc9fd-dirty #196
   Call Trace:
[0.00]  [] show_stack+0x90/0xc0
[0.00]  [] dump_stack+0x150/0x1c0
[0.00]  [] __warn+0x180/0x220
[0.00]  [] warn_slowpath_fmt+0xc0/0x100
[0.00]  [] __kernel_poison_pages+0x1a0/0x1c0
[0.00]  [] __free_pages_ok+0x2a0/0x10c0
[0.00]  [] __free_pages_core+0x2d0/0x480
[0.00]  [] memblock_free_pages+0x30/0x50
[0.00]  [] memblock_free_all+0x280/0x3c0
[0.00]  [] mem_init+0x70/0x2d0
[0.00]  [] start_kernel+0x670/0xc20
[0.00]  [] start_ap+0x760/0x780
[0.00] ---[ end trace  ]---

I think I found where page_poison=on gets enabled, in
init_mem_debugging_and_hardening():

void init_mem_debugging_and_hardening(void)
{
if (_init_on_alloc_enabled_early) {
if (page_poisoning_enabled())
pr_info("mem auto-init: CONFIG_PAGE_POISONING is on, "
"will take precedence over init_on_alloc\n");
else
static_branch_enable(&init_on_alloc);
}
if (_init_on_free_enabled_early) {
if (page_poisoning_enabled())
pr_info("mem auto-init: CONFIG_PAGE_POISONING is on, "
"will take precedence over init_on_free\n");
else
static_branch_enable(&init_on_free);
}

#ifdef CONFIG_PAGE_POISONING
/*
 * Page poisoning is debug page alloc for some arches. If
 * either of those options are enabled, enable poisoning.
 */
if (page_poisoning_enabled() ||
 (!IS_ENABLED(CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC) &&
  debug_pagealloc_enabled()))
static_branch_enable(&_page_poisoning_enabled); // <- HERE
#endif
...
}

If I follow the code correctly to trigger the problem one needs to:
- have PAGE_POISONING=y
- have page_poison=off set (or just unset)
- have a

Re: [PATCH] hpsa: fix boot on ia64 (atomic_t alignment)

2021-03-27 Thread Sergei Trofimovich
On Wed, 17 Mar 2021 18:28:31 +0100
John Paul Adrian Glaubitz  wrote:

> Hi Sergei!
> 
> On 3/12/21 11:27 PM, Sergei Trofimovich wrote:
> > The failure initially observed as boot failure on rx3600 ia64 machine
> > with RAID bus controller: Hewlett-Packard Company Smart Array P600:
> > 
> > kernel unaligned access to 0xe00105dd8b95, ip=0xa00100b87551
> > kernel unaligned access to 0xe00105dd8e95, ip=0xa00100b87551
> > hpsa :14:01.0: Controller reports max supported commands of 0 Using 
> > 16 instead. Ensure that firmware is up to date.
> > swapper/0[1]: error during unaligned kernel access
> > 
> > Here unaligned access comes from 'struct CommandList' that happens
> > to be packed. The change f749d8b7a ("scsi: hpsa: Correct dev cmds
> > outstanding for retried cmds") introduced unexpected padding and
> > un-aligned atomic_t from natural alignment to something else.
> > 
> > This change does not remove packing annotation from struct but only
> > restores alignment of atomic variable.
> > 
> > The change is tested on the same rx3600 machine.  
> 
> I just gave it a try on my RX2660 and for me, the hpsa driver won't load even
> with your patch.
> 
> Can you share your kernel configuration so I can give it a try?

Sure! Here is a config from a few days ago:

https://dev.gentoo.org/~slyfox/configs/guppy-config-5.12.0-rc4-00016-g427684abc9fd-dirty
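
For context, the kind of misalignment described in the quoted patch can be
reproduced in userspace with a packed struct. This is only a sketch: the two
structs below are made-up layouts, not the real 'struct CommandList', plain
int stands in for atomic_t, and explicit padding is just one way to restore
alignment (it is not claimed to be what the patch itself does). It relies on
the GCC/Clang packed attribute.

```c
#include <stddef.h>

/* Illustrative layouts only; these are not the real 'struct CommandList'. */
struct __attribute__((packed)) cmd_misaligned {
	char header[8];
	char retry_pending; /* a one-byte field added before the counter */
	int refcount;       /* stand-in for atomic_t, now at an odd offset */
};

struct __attribute__((packed)) cmd_realigned {
	char header[8];
	char retry_pending;
	char pad[3];        /* explicit padding restores 4-byte alignment */
	int refcount;
};

/* Byte offset of the counter in the layout without padding. */
static size_t misaligned_offset(void)
{
	return offsetof(struct cmd_misaligned, refcount);
}

/* Byte offset of the counter once the padding is in place. */
static size_t realigned_offset(void)
{
	return offsetof(struct cmd_realigned, refcount);
}
```

With packing, the counter lands at offset 9 in the first layout and at offset
12 in the second. The offset only translates into an aligned address if the
struct itself starts at a suitably aligned address.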

-- 

  Sergei


Re: [PATCH] ia64: tools: add generic errno.h definition

2021-03-27 Thread Sergei Trofimovich
On Fri, Mar 12, 2021 at 07:51:35AM +, Sergei Trofimovich wrote:
> Noticed missing header when build bpfilter helper:
> 
> CC [U]  net/bpfilter/main.o
>   In file included from /usr/include/linux/errno.h:1,
>from /usr/include/bits/errno.h:26,
>from /usr/include/errno.h:28,
>from net/bpfilter/main.c:4:
>   tools/include/uapi/asm/errno.h:13:10: fatal error:
> ../../../arch/ia64/include/uapi/asm/errno.h: No such file or directory
>  13 | #include "../../../arch/ia64/include/uapi/asm/errno.h"
> |  ^
> 
> CC: linux-kernel@vger.kernel.org
> CC: net...@vger.kernel.org
> CC: b...@vger.kernel.org
> Signed-off-by: Sergei Trofimovich 

Any chance to pick it up?

Thanks!

> ---
>  tools/arch/ia64/include/uapi/asm/errno.h | 1 +
>  1 file changed, 1 insertion(+)
>  create mode 100644 tools/arch/ia64/include/uapi/asm/errno.h
> 
> diff --git a/tools/arch/ia64/include/uapi/asm/errno.h 
> b/tools/arch/ia64/include/uapi/asm/errno.h
> new file mode 100644
> index ..4c82b503d92f
> --- /dev/null
> +++ b/tools/arch/ia64/include/uapi/asm/errno.h
> @@ -0,0 +1 @@
> +#include 
> -- 
> 2.30.2
> 

-- 

  Sergei


Re: [PATCH] mm: page_alloc: ignore init_on_free=1 for page alloc

2021-03-26 Thread Sergei Trofimovich
On Fri, 26 Mar 2021 15:17:00 +0100
Vlastimil Babka  wrote:

> On 3/26/21 12:26 PM, Sergei Trofimovich wrote:
> > init_on_free=1 does not guarantee that free pages contain only zero bytes.
> > 
> > Some examples:
> > 1. page_poison=on takes precedence over init_on_alloc=1 / init_on_free=1
> 
> Yes, and it spits out a message that you enabled both and poisoning takes
> precedence. It was that way even before my changes IIRC, but not consistent.

Yeah. I probably should not have included this case as page_poison=on actually
made my machine boot just fine. My main focus was to understand why I am seeing
the crash on kernel with init_on_alloc=1 init_on_free=1 and most debugging
options on.

My apologies! I'll try to find where this extra poisoning comes from.

Taking a step back and explaining my setup:

Initially it's an ia64 box that manages to consistently corrupt memory
on socket free; https://lkml.org/lkml/2021/2/23/653

To get a better understanding of where the corruption comes from I enabled
A Lot of VM, pagealloc and slab debugging options. Full config:


https://dev.gentoo.org/~slyfox/configs/guppy-config-5.12.0-rc4-00016-g427684abc9fd-dirty

I boot machine as:

[0.00] Kernel command line: 
BOOT_IMAGE=/boot/vmlinuz-5.12.0-rc4-00016-g427684abc9fd-dirty root=/dev/sda3 ro 
slab_nomerge memblock=debug debug_pagealloc=1 hardened_usercopy=1 page_owner=on 
page_poison=0 init_on_alloc=1 init_on_free=1 debug_guardpage_minorder=0

My boot log:


https://dev.gentoo.org/~slyfox/bugs/ia64-boot-bug/2021-03-26-init_on_alloc-fail

Caveats in reading boot log:
- kernel crashes too early: stack unwinder does not have working
  kmalloc() yet
- kernel crashes in MCE handler: normally it should not. It's an
  unrelated bug that makes backtrace useless. I'll try to fix it later,
  but it will not be fast.
- I added a bunch of printk()s around the crash.

The important kernel boot failure part is:
  [0.00] put_kernel_page: pmd=e001
  [0.00] pmd:(ptrval):   
   

Note 1: I do not really enable page_poison at runtime and was misleading you
in previous emails. (I initially assumed kernel_poison_pages() poisons pages
unconditionally but you all explained it does not). Something else manages to
poison my pmd(s?).

Note 2: I have many other debugging options enabled that might trigger
poisoning. 

> > 2. free_pages_prepare() always poisons pages:
> > 
> >if (want_init_on_free())
> >kernel_init_free_pages(page, 1 << order);
> >kernel_poison_pages(page, 1 << order  
> 
> kernel_poison_pages() includes a test if poisoning is enabled. And in that
> case want_init_on_free() shouldn't be. see init_mem_debugging_and_hardening()

I completely missed that! Thank you! Will try to trace real cause of poisoning.

> > I observed use of poisoned pages as the crash on ia64 booted with
> > init_on_free=1 init_on_alloc=1 (CONFIG_PAGE_POISONING=y config).
> > There pmd page contained 0x poison pages and led to early crash.  
> 
> Hm but that looks like a sign that ia64 pmd allocation should use __GFP_ZERO
> and doesn't. It shouldn't rely on init_on_alloc or init_on_free being enabled.

ia64 does use __GFP_ZERO (I even tried to add it manually to pmd_alloc_one()
before I realized all _PGTABLEs imply __GFP_ZERO).

I'll provide the call chain I arrived at for completeness:
- [ia64 boots]
- mem_init() (defined at arch/ia64/mm/init.c)
 -> setup_gate() (defined at arch/ia64/mm/init.c)
  -> put_kernel_page() (defined at arch/ia64/mm/init.c)
   -> [NOTE: from now on it's all generic code, not ia64-specific]
    -> pmd_alloc() (defined at include/linux/mm.h)
     -> __pmd_alloc() (defined at mm/memory.c)
      -> [under #ifndef __PAGETABLE_PMD_FOLDED] pmd_alloc_one()
         (defined at include/asm-generic/pgalloc.h)
       -> pmd_alloc_one() (defined at include/asm-generic/pgalloc.h):
 
static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
{
struct page *page;
gfp_t gfp = GFP_PGTABLE_USER;

if (mm == &init_mm)
gfp = GFP_PGTABLE_KERNEL;
page = alloc_pages(gfp, 0);
if (!page)
return NULL;
if (!pgtable_pmd_page_ctor(page)) {
__free_pages(page, 0);
return NULL;
}
return (pmd_t *)page_address(page);
}

In our case it is GFP_PGTABLE_KERNEL, which includes __GFP_ZERO, and the
result is a poisoned page instead of a zeroed page.

If I interpret the above correctly it means that something (probably
memblock_free_pages()?) puts initial free pages as poisoned and later
alloc_pages() assumes they are memset()-zero. But I don't see why.

> > The change drops the ass

Re: [PATCH] mm: page_alloc: ignore init_on_free=1 for page alloc

2021-03-26 Thread Sergei Trofimovich
On Fri, 26 Mar 2021 16:00:34 +0100
Andrey Konovalov  wrote:

> On Fri, Mar 26, 2021 at 2:49 PM David Hildenbrand  wrote:
> >  
> > > I observed use of poisoned pages as the crash on ia64 booted with
> > > init_on_free=1 init_on_alloc=1 (CONFIG_PAGE_POISONING=y config).
> > > There pmd page contained 0x poison pages and led to early crash.
> > >
> > > The change drops the assumption that init_on_free=1 guarantees free
> > > pages to contain zeros.
> > >
> > > Alternative would be to make interaction between runtime poisoning and
> > > sanitizing options and build-time debug flags like CONFIG_PAGE_POISONING
> > > more coherent. I took the simpler path.
> > >  
> >
> > I thought the latest work by Vlastimil tried to tackle that. To me, it feels
> > like page_poison=on and init_on_free=1 should bail out and disable one
> > of both things. Having both at the same time doesn't sound helpful.
> 
> This is exactly how it works, see init_mem_debugging_and_hardening().
> 
> Sergei, could you elaborate more on what kind of crash this patch is
> trying to fix? Where does it happen and why?

Yeah, I see I misinterpreted page_poison=on handling and misled you all.
Something else poisons a page when it should not have. I'll answer in more
detail to Vlastimil's email upthread and will provide more detail of the
unexpected poisoning I see.

-- 

  Sergei


[PATCH] mm: page_alloc: ignore init_on_free=1 for page alloc

2021-03-26 Thread Sergei Trofimovich
init_on_free=1 does not guarantee that free pages contain only zero bytes.

Some examples:
1. page_poison=on takes precedence over init_on_alloc=1 / init_on_free=1
2. free_pages_prepare() always poisons pages:

   if (want_init_on_free())
   kernel_init_free_pages(page, 1 << order);
   kernel_poison_pages(page, 1 << order

I observed use of poisoned pages as the crash on ia64 booted with
init_on_free=1 init_on_alloc=1 (CONFIG_PAGE_POISONING=y config).
There pmd page contained 0x poison pages and led to early crash.

The change drops the assumption that init_on_free=1 guarantees free
pages to contain zeros.

Alternative would be to make interaction between runtime poisoning and
sanitizing options and build-time debug flags like CONFIG_PAGE_POISONING
more coherent. I took the simpler path.

Tested the fix on rx3600.

CC: Andrew Morton 
CC: linux...@kvack.org
Signed-off-by: Sergei Trofimovich 
---
 mm/page_alloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cfc72873961d..d57d9b4f7089 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2301,7 +2301,7 @@ inline void post_alloc_hook(struct page *page, unsigned 
int order,
kernel_unpoison_pages(page, 1 << order);
set_page_owner(page, order, gfp_flags);
 
-   if (!want_init_on_free() && want_init_on_alloc(gfp_flags))
+   if (want_init_on_alloc(gfp_flags))
kernel_init_free_pages(page, 1 << order);
 }
 
-- 
2.31.0



[PATCH] ia64: simplify code flow around swiotlb init

2021-03-25 Thread Sergei Trofimovich
Before the change CONFIG_INTEL_IOMMU && !CONFIG_SWIOTLB && !CONFIG_FLATMEM

could skip `set_max_mapnr(max_low_pfn);` is iommu is not present on system.

CC: Andrew Morton 
CC: linux-i...@vger.kernel.org
Signed-off-by: Sergei Trofimovich 
---
 arch/ia64/mm/init.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index 16d0d7d22657..a63585db94fe 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -644,13 +644,16 @@ mem_init (void)
 * _before_ any drivers that may need the PCI DMA interface are
 * initialized or bootmem has been freed.
 */
+   do {
 #ifdef CONFIG_INTEL_IOMMU
-   detect_intel_iommu();
-   if (!iommu_detected)
+   detect_intel_iommu();
+   if (iommu_detected)
+   break;
 #endif
 #ifdef CONFIG_SWIOTLB
swiotlb_init(1);
 #endif
+   } while (0);
 
 #ifdef CONFIG_FLATMEM
BUG_ON(!mem_map);
-- 
2.31.0



[PATCH] ia64: drop unused IA64_FW_EMU ifdef

2021-03-23 Thread Sergei Trofimovich
It's a remnant of deleted hpsim emulation target
removed in fc5bad037 ("ia64: remove the hpsim platform").

CC: Andrew Morton 
CC: linux-i...@vger.kernel.org
Signed-off-by: Sergei Trofimovich 
---
 arch/ia64/kernel/head.S | 5 -
 1 file changed, 5 deletions(-)

diff --git a/arch/ia64/kernel/head.S b/arch/ia64/kernel/head.S
index 646a22c25edf..9cd0a2cce36e 100644
--- a/arch/ia64/kernel/head.S
+++ b/arch/ia64/kernel/head.S
@@ -405,11 +405,6 @@ start_ap:
 
// This is executed by the bootstrap processor (bsp) only:
 
-#ifdef CONFIG_IA64_FW_EMU
-   // initialize PAL & SAL emulator:
-   br.call.sptk.many rp=sys_fw_init
-.ret1:
-#endif
br.call.sptk.many rp=start_kernel
 .ret2: addl r3=@ltoff(halt_msg),gp
;;
-- 
2.31.0



Re: [PATCH] ia64: mca: allocate early mca with GFP_ATOMIC

2021-03-23 Thread Sergei Trofimovich
On Tue, 23 Mar 2021 16:15:06 +0100
John Paul Adrian Glaubitz  wrote:

> Hi Andrew!
> 
> On 3/15/21 9:50 AM, Sergei Trofimovich wrote:
> > The sleep warning happens at early boot right at
> > secondary CPU activation bootup:
> > 
> > smp: Bringing up secondary CPUs ...
> > BUG: sleeping function called from invalid context at 
> > mm/page_alloc.c:4942
> > in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 0, name: 
> > swapper/1
> > CPU: 1 PID: 0 Comm: swapper/1 Not tainted 
> > 5.12.0-rc2-7-g79e228d0b611-dirty #99
> > 
> > Call Trace:
> >  [] show_stack+0x90/0xc0
> >  [] dump_stack+0x150/0x1c0
> >  [] ___might_sleep+0x1c0/0x2a0
> >  [] __might_sleep+0xa0/0x160
> >  [] __alloc_pages_nodemask+0x1a0/0x600
> >  [] alloc_page_interleave+0x30/0x1c0
> >  [] alloc_pages_current+0x2c0/0x340
> >  [] __get_free_pages+0x30/0xa0
> >  [] ia64_mca_cpu_init+0x2d0/0x3a0
> >  [] cpu_init+0x8b0/0x1440
> >  [] start_secondary+0x60/0x700
> >  [] start_ap+0x750/0x780
> > Fixed BSP b0 value from CPU 1
> > 
> > As I understand interrupts are not enabled yet and system has a lot
> > of memory. There is little chance to sleep and switch to GFP_ATOMIC
> > should be a no-op.
> > 
> > CC: Andrew Morton 
> > CC: linux-i...@vger.kernel.org
> > Signed-off-by: Sergei Trofimovich 
> > ---
> >  arch/ia64/kernel/mca.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/arch/ia64/kernel/mca.c b/arch/ia64/kernel/mca.c
> > index d4cae2fc69ca..adf6521525f4 100644
> > --- a/arch/ia64/kernel/mca.c
> > +++ b/arch/ia64/kernel/mca.c
> > @@ -1824,7 +1824,7 @@ ia64_mca_cpu_init(void *cpu_data)
> > data = mca_bootmem();
> > first_time = 0;
> > } else
> > -   data = (void *)__get_free_pages(GFP_KERNEL,
> > +   data = (void *)__get_free_pages(GFP_ATOMIC,
> > get_order(sz));
> > if (!data)
> > panic("Could not allocate MCA memory for cpu %d\n",
> >   
> 
> Has this one been picked up for your tree already?

Should be there: https://www.ozlabs.org/~akpm/mmotm/series

> #NEXT_PATCHES_START mainline-later (next week, approximately)
> ia64-mca-allocate-early-mca-with-gfp_atomic.patch


-- 

  Sergei


[PATCH] mm: add page_owner_stack=off to make stack collection optional

2021-03-21 Thread Sergei Trofimovich
On some architectures (like ia64) stack walking is slow
and currently requires memory allocation. This causes stack
collection for page_owner=on to fall into recursion.

This patch implements a page_owner_stack=off to allow page stats
collection.

Signed-off-by: Sergei Trofimovich 
---
 .../admin-guide/kernel-parameters.txt |  6 +
 mm/Kconfig.debug  |  3 ++-
 mm/page_owner.c   | 23 +--
 3 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 04545725f187..3e710c4ab4df 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3518,6 +3518,12 @@
we can turn it on.
on: enable the feature
 
+   page_owner_stack= [KNL] Boot-time parameter option disabling stack
+   collection of page allocation. Has effect only if
+   "page_owner=on" is set. Useful for cases when stack
+   collection is too slow or not feasible.
+   off: disable the feature
+
page_poison=[KNL] Boot-time parameter changing the state of
poisoning on the buddy allocator, available with
CONFIG_PAGE_POISONING=y.
diff --git a/mm/Kconfig.debug b/mm/Kconfig.debug
index 1e73717802f8..c1ecaf066c93 100644
--- a/mm/Kconfig.debug
+++ b/mm/Kconfig.debug
@@ -57,7 +57,8 @@ config PAGE_OWNER
  help to find bare alloc_page(s) leaks. Even if you include this
  feature on your build, it is disabled in default. You should pass
  "page_owner=on" to boot parameter in order to enable it. Eats
- a fair amount of memory if enabled. See tools/vm/page_owner_sort.c
+ a fair amount of memory if enabled. Call chain tracking can be
+ disabled with "page_owner_stack=off". See tools/vm/page_owner_sort.c
  for user-space helper.
 
  If unsure, say N.
diff --git a/mm/page_owner.c b/mm/page_owner.c
index d15c7c4994f5..2cc1113fa28d 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -31,6 +31,7 @@ struct page_owner {
 };
 
 static bool page_owner_enabled = false;
+static bool page_owner_stack_enabled = true;
 DEFINE_STATIC_KEY_FALSE(page_owner_inited);
 
 static depot_stack_handle_t dummy_handle;
@@ -41,21 +42,26 @@ static void init_early_allocated_pages(void);
 
 static int __init early_page_owner_param(char *buf)
 {
-   if (!buf)
-   return -EINVAL;
-
-   if (strcmp(buf, "on") == 0)
-   page_owner_enabled = true;
-
-   return 0;
+   return kstrtobool(buf, &page_owner_enabled);
 }
 early_param("page_owner", early_page_owner_param);
 
+static int __init early_page_owner_stack_param(char *buf)
+{
+   return kstrtobool(buf, &page_owner_stack_enabled);
+}
+early_param("page_owner_stack", early_page_owner_stack_param);
+
 static bool need_page_owner(void)
 {
return page_owner_enabled;
 }
 
+static bool need_page_owner_stack(void)
+{
+   return page_owner_stack_enabled;
+}
+
 static __always_inline depot_stack_handle_t create_dummy_stack(void)
 {
unsigned long entries[4];
@@ -122,6 +128,9 @@ static noinline depot_stack_handle_t save_stack(gfp_t flags)
depot_stack_handle_t handle;
unsigned int nr_entries;
 
+   if (!need_page_owner_stack())
+   return failure_handle;
+
nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 2);
 
/*
-- 
2.31.0



Re: [PATCH] ia64: Ensure proper NUMA distance and possible map initialization

2021-03-19 Thread Sergei Trofimovich
On Fri, 19 Mar 2021 15:47:09 +0100
John Paul Adrian Glaubitz  wrote:

> Hi Valentin!
> 
> On 3/18/21 2:06 PM, Valentin Schneider wrote:
> > John Paul reported a warning about bogus NUMA distance values spurred by
> > commit:
> > 
> >   620a6dc40754 ("sched/topology: Make sched_init_numa() use a set for the 
> > deduplicating sort")
> > 
> > In this case, the afflicted machine comes up with a reported 256 possible
> > nodes, all of which are 0 distance away from one another. This was
> > previously silently ignored, but is now caught by the aforementioned
> > commit.
> > 
> > The culprit is ia64's node_possible_map which remains unchanged from its
> > initialization value of NODE_MASK_ALL. In John's case, the machine doesn't
> > have any SRAT nor SLIT table, but AIUI the possible map remains untouched
> > regardless of what ACPI tables end up being parsed. Thus, !online &&
> > possible nodes remain with a bogus distance of 0 (distances \in [0, 9] are
> > "reserved and have no meaning" as per the ACPI spec).
> > 
> > Follow x86 / drivers/base/arch_numa's example and set the possible map to
> > the parsed map, which in this case seems to be the online map.
> > 
> > Link: 
> > http://lore.kernel.org/r/255d6b5d-194e-eb0e-ecdd-97477a534...@physik.fu-berlin.de
> > Fixes: 620a6dc40754 ("sched/topology: Make sched_init_numa() use a set for 
> > the deduplicating sort")
> > Reported-by: John Paul Adrian Glaubitz 
> > Signed-off-by: Valentin Schneider 
> > ---
> > This might need an earlier Fixes: tag, but all of this is quite old and
> > dusty (the git blame rabbit hole leads me to ~2008/2007)
> > 
> > Alternatively, can we deprecate ia64 already?
> > ---
> >  arch/ia64/kernel/acpi.c | 7 +--
> >  1 file changed, 5 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/ia64/kernel/acpi.c b/arch/ia64/kernel/acpi.c
> > index a5636524af76..e2af6b172200 100644
> > --- a/arch/ia64/kernel/acpi.c
> > +++ b/arch/ia64/kernel/acpi.c
> > @@ -446,7 +446,8 @@ void __init acpi_numa_fixup(void)
> > if (srat_num_cpus == 0) {
> > node_set_online(0);
> > node_cpuid[0].phys_id = hard_smp_processor_id();
> > -   return;
> > +   slit_distance(0, 0) = LOCAL_DISTANCE;
> > +   goto out;
> > }
> >  
> > /*
> > @@ -489,7 +490,7 @@ void __init acpi_numa_fixup(void)
> > for (j = 0; j < MAX_NUMNODES; j++)
> > slit_distance(i, j) = i == j ?
> > LOCAL_DISTANCE : REMOTE_DISTANCE;
> > -   return;
> > +   goto out;
> > }
> >  
> > memset(numa_slit, -1, sizeof(numa_slit));
> > @@ -514,6 +515,8 @@ void __init acpi_numa_fixup(void)
> > printk("\n");
> > }
> >  #endif
> > +out:
> > +   node_possible_map = node_online_map;
> >  }
> >  #endif /* CONFIG_ACPI_NUMA */
> >  
> >   
> 
> Tested-by: John Paul Adrian Glaubitz 
> 
> Could you send this patch through Andrew Morton's tree? The ia64 port 
> currently
> has no maintainer, so we have to use an alternative tree.
> 
> @Sergei: Could you test/ack this patch as well?

Booted successfully without problems on rx3600.

Tested-by: Sergei Trofimovich 


-- 

  Sergei


Re: [PATCH 0/1] sched/topology: NUMA distance deduplication

2021-03-17 Thread Sergei Trofimovich
On Wed, 17 Mar 2021 20:04:07 +
Valentin Schneider  wrote:

> On 17/03/21 20:47, John Paul Adrian Glaubitz wrote:
> > Hello Valentin!
> >
> > On 3/17/21 8:36 PM, Valentin Schneider wrote:  
> >> I see ACPI in your boot logs, so I'm guessing you have a bogus SLIT table
> >> (the ACPI table with node distances). You should be able to double check
> >> this with something like:
> >>
> >> $ acpidump > acpi.dump
> >> $ acpixtract -a acpi.dump
> >> $ iasl -d *.dat
> >> $ cat slit.dsl  
> >
> > There does not seem to be a SLIT table in my firmware:
> >
> > root@glendronach:~# acpidump > acpi.dump
> > root@glendronach:~# acpixtract -a acpi.dump
> >
> > Intel ACPI Component Architecture
> > ACPI Binary Table Extraction Utility version 20200925
> > Copyright (c) 2000 - 2020 Intel Corporation
> >
> > acpixtract(31194): unaligned access to 0x6f9b3925, 
> > ip=0x40003e91
> >   SSDT -3768 bytes written (0x0EB8) - ssdt1.dat
> > acpixtract(31194): unaligned access to 0x6f9b3925, 
> > ip=0x40003e00
> > acpixtract(31194): unaligned access to 0x6f9b3925, 
> > ip=0x40003e91
> >   SPCR -  80 bytes written (0x0050) - spcr.dat
> > acpixtract(31194): unaligned access to 0x6f9b3925, 
> > ip=0x40003e00
> > acpixtract(31194): unaligned access to 0x6f9b3925, 
> > ip=0x40003e91
> >   APIC - 200 bytes written (0x00C8) - apic.dat
> >   SSDT -1110 bytes written (0x0456) - ssdt2.dat
> >   SSDT - 316 bytes written (0x013C) - ssdt3.dat
> >   SPMI -  80 bytes written (0x0050) - spmi.dat
> >   DSDT -   58726 bytes written (0xE566) - dsdt.dat
> >   SSDT - 312 bytes written (0x0138) - ssdt4.dat
> >   SSDT -2150 bytes written (0x0866) - ssdt5.dat
> >   SSDT - 316 bytes written (0x013C) - ssdt6.dat
> >   SSDT -3768 bytes written (0x0EB8) - ssdt7.dat
> >   FACP - 244 bytes written (0x00F4) - facp.dat
> >   SSDT -1203 bytes written (0x04B3) - ssdt8.dat
> >   CPEP -  52 bytes written (0x0034) - cpep.dat
> >   SSDT - 316 bytes written (0x013C) - ssdt9.dat
> >   DBGP -  52 bytes written (0x0034) - dbgp.dat
> >   SSDT -3768 bytes written (0x0EB8) - ssdt10.dat
> >   FACS -  64 bytes written (0x0040) - facs.dat
> > root@glendronach:~#
> >
> > root@glendronach:~# ls *.dsl *.dat
> > apic.dat  cpep.dsl  dsdt.dat  facp.dsl  spcr.dat  spmi.dslssdt1.dat  
> > ssdt2.dsl  ssdt4.dat  ssdt5.dsl  ssdt7.dat  ssdt8.dsl
> > apic.dsl  dbgp.dat  dsdt.dsl  facs.dat  spcr.dsl  ssdt10.dat  ssdt1.dsl  
> > ssdt3.dat  ssdt4.dsl  ssdt6.dat  ssdt7.dsl  ssdt9.dat
> > cpep.dat  dbgp.dsl  facp.dat  facs.dsl  spmi.dat  ssdt10.dsl  ssdt2.dat  
> > ssdt3.dsl  ssdt5.dat  ssdt6.dsl  ssdt8.dat  ssdt9.dsl
> > root@glendronach:~#
> >  
> 
> Huh, then this might be some initialization fail that leaves nr_node_ids to
> MAX_NUMNODES, which must be 256 in your case (NODES_SHIFT==8). Devicetree
> can provide node distances, but something tells me you're not using that :-)
> 
> >> a) Complain to your hardware vendor to have them fix the table and ship a
> >>firmware fix  
> >
> > The hardware is probably too old for the vendor to care about fixing it.
> >  
> 
> Indeed, I only realized that after googling your machine
> 
> >> b) Fix the ACPI table yourself - I've been told it's doable for *some* of
> >>them, but I've never done that myself
> >> c) Compile your kernel with CONFIG_NUMA=n, as AFAICT you only actually have
> >>a single node
> >> d) Ignore the warning
> >>
> >>
> >> c) is clearly not ideal if you want to use a somewhat generic kernel image
> >> on a wide host of machines; d) is also a bit yucky...  
> >
> > Shouldn't the kernel be able to cope with quirky hardware? From what I 
> > remember in the past,
> > ACPI tables used to be broken quite a lot and the kernel contained 
> > workarounds for such cases,
> > didn't it?
> >  
> 
> Technically it *is* coping with it, it's just dumping the entire NUMA
> distance matrix in the process... Let me see if I can't figure out why your
> system doesn't end up with nr_node_ids=1.

I also poked at it a few days ago assuming it was an issue causing boot
failures (it was not, it's a harmless warning).

Looking at 'arch/ia64/**', NUMA presence is detected via the SRAT ACPI
table (generic ACPI code also wants SLIT; those tables probably exist on
large ia64 boxes?)

Neither SRAT nor SLIT is present on these small machines, so I would
expect generic code to derive a single fake node. The boot log suggests we
even inferred a 1-node system:

> [0.04] smp: Brought up 1 node, 4 CPUs

Or is it just an early bootstrap message assuming more are to come?

Could it be that we initialize too little of the generic ACPI boilerplate
when there is no SRAT? Somewhere around:


https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/ia64/kernel/acpi.c#n446

arm64 and riscv call `arch_numa_init()` and initialize numa node

[PATCH] ia64: mca: allocate early mca with GFP_ATOMIC

2021-03-15 Thread Sergei Trofimovich
The sleep warning happens at early boot, right when secondary CPUs
are brought up:

smp: Bringing up secondary CPUs ...
BUG: sleeping function called from invalid context at mm/page_alloc.c:4942
in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 0, name: swapper/1
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 
5.12.0-rc2-7-g79e228d0b611-dirty #99

Call Trace:
 [] show_stack+0x90/0xc0
 [] dump_stack+0x150/0x1c0
 [] ___might_sleep+0x1c0/0x2a0
 [] __might_sleep+0xa0/0x160
 [] __alloc_pages_nodemask+0x1a0/0x600
 [] alloc_page_interleave+0x30/0x1c0
 [] alloc_pages_current+0x2c0/0x340
 [] __get_free_pages+0x30/0xa0
 [] ia64_mca_cpu_init+0x2d0/0x3a0
 [] cpu_init+0x8b0/0x1440
 [] start_secondary+0x60/0x700
 [] start_ap+0x750/0x780
Fixed BSP b0 value from CPU 1

As I understand it, interrupts are not enabled yet and the system has a
lot of free memory, so the allocation is unlikely to sleep and the switch
to GFP_ATOMIC should be a no-op in practice.

CC: Andrew Morton 
CC: linux-i...@vger.kernel.org
Signed-off-by: Sergei Trofimovich 
---
 arch/ia64/kernel/mca.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/ia64/kernel/mca.c b/arch/ia64/kernel/mca.c
index d4cae2fc69ca..adf6521525f4 100644
--- a/arch/ia64/kernel/mca.c
+++ b/arch/ia64/kernel/mca.c
@@ -1824,7 +1824,7 @@ ia64_mca_cpu_init(void *cpu_data)
data = mca_bootmem();
first_time = 0;
} else
-   data = (void *)__get_free_pages(GFP_KERNEL,
+   data = (void *)__get_free_pages(GFP_ATOMIC,
get_order(sz));
if (!data)
panic("Could not allocate MCA memory for cpu %d\n",
-- 
2.30.2



[PATCH] ia64: fix format strings for err_inject

2021-03-13 Thread Sergei Trofimovich
Fix warning with %lx / u64 mismatch:

  arch/ia64/kernel/err_inject.c: In function 'show_resources':
  arch/ia64/kernel/err_inject.c:62:22: warning:
format '%lx' expects argument of type 'long unsigned int',
but argument 3 has type 'u64' {aka 'long long unsigned int'}
 62 |  return sprintf(buf, "%lx\n", name[cpu]);   \
|  ^~~
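
A minimal userspace sketch of the same mismatch, assuming a u64 typedef
of 'unsigned long long' as in the warning above. format_u64_hex() is a
hypothetical helper for illustration, not kernel code:

```c
#include <stddef.h>
#include <stdio.h>

/* Assumption: mirrors the 'long long unsigned int' u64 from the warning. */
typedef unsigned long long u64;

/*
 * Hypothetical helper: format a u64 as hex into buf.
 * "%llx" matches 'unsigned long long'. "%lx" would trigger -Wformat
 * here even on LP64 targets, where both types are 64 bits wide,
 * because the check compares types and not sizes.
 */
static int format_u64_hex(char *buf, size_t len, u64 value)
{
	return snprintf(buf, len, "%llx\n", value);
}
```

Building this with -Wall and swapping "%llx" for "%lx" should reproduce
a warning of the same shape as the one quoted above.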

CC: linux-i...@vger.kernel.org
Signed-off-by: Sergei Trofimovich 
---
 arch/ia64/kernel/err_inject.c | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/arch/ia64/kernel/err_inject.c b/arch/ia64/kernel/err_inject.c
index 8b5b8e6bc9d9..3d48f8766d78 100644
--- a/arch/ia64/kernel/err_inject.c
+++ b/arch/ia64/kernel/err_inject.c
@@ -59,7 +59,7 @@ show_##name(struct device *dev, struct device_attribute 
*attr,\
char *buf)  \
 {  \
u32 cpu=dev->id;\
-   return sprintf(buf, "%lx\n", name[cpu]);\
+   return sprintf(buf, "%llx\n", name[cpu]);   \
 }
 
 #define store(name)\
@@ -86,9 +86,9 @@ store_call_start(struct device *dev, struct device_attribute 
*attr,
 
 #ifdef ERR_INJ_DEBUG
printk(KERN_DEBUG "pal_mc_err_inject for cpu%d:\n", cpu);
-   printk(KERN_DEBUG "err_type_info=%lx,\n", err_type_info[cpu]);
-   printk(KERN_DEBUG "err_struct_info=%lx,\n", err_struct_info[cpu]);
-   printk(KERN_DEBUG "err_data_buffer=%lx, %lx, %lx.\n",
+   printk(KERN_DEBUG "err_type_info=%llx,\n", err_type_info[cpu]);
+   printk(KERN_DEBUG "err_struct_info=%llx,\n", err_struct_info[cpu]);
+   printk(KERN_DEBUG "err_data_buffer=%llx, %llx, %llx.\n",
  err_data_buffer[cpu].data1,
  err_data_buffer[cpu].data2,
  err_data_buffer[cpu].data3);
@@ -117,8 +117,8 @@ store_call_start(struct device *dev, struct 
device_attribute *attr,
 
 #ifdef ERR_INJ_DEBUG
printk(KERN_DEBUG "Returns: status=%d,\n", (int)status[cpu]);
-   printk(KERN_DEBUG "capabilities=%lx,\n", capabilities[cpu]);
-   printk(KERN_DEBUG "resources=%lx\n", resources[cpu]);
+   printk(KERN_DEBUG "capabilities=%llx,\n", capabilities[cpu]);
+   printk(KERN_DEBUG "resources=%llx\n", resources[cpu]);
 #endif
return size;
 }
@@ -131,7 +131,7 @@ show_virtual_to_phys(struct device *dev, struct 
device_attribute *attr,
char *buf)
 {
unsigned int cpu=dev->id;
-   return sprintf(buf, "%lx\n", phys_addr[cpu]);
+   return sprintf(buf, "%llx\n", phys_addr[cpu]);
 }
 
 static ssize_t
@@ -145,7 +145,7 @@ store_virtual_to_phys(struct device *dev, struct 
device_attribute *attr,
ret = get_user_pages_fast(virt_addr, 1, FOLL_WRITE, NULL);
if (ret<=0) {
 #ifdef ERR_INJ_DEBUG
-   printk("Virtual address %lx is not existing.\n",virt_addr);
+   printk("Virtual address %llx is not existing.\n", virt_addr);
 #endif
return -EINVAL;
}
@@ -163,7 +163,7 @@ show_err_data_buffer(struct device *dev,
 {
unsigned int cpu=dev->id;
 
-   return sprintf(buf, "%lx, %lx, %lx\n",
+   return sprintf(buf, "%llx, %llx, %llx\n",
err_data_buffer[cpu].data1,
err_data_buffer[cpu].data2,
err_data_buffer[cpu].data3);
@@ -178,13 +178,13 @@ store_err_data_buffer(struct device *dev,
int ret;
 
 #ifdef ERR_INJ_DEBUG
-   printk("write err_data_buffer=[%lx,%lx,%lx] on cpu%d\n",
+   printk("write err_data_buffer=[%llx,%llx,%llx] on cpu%d\n",
 err_data_buffer[cpu].data1,
 err_data_buffer[cpu].data2,
 err_data_buffer[cpu].data3,
 cpu);
 #endif
-   ret=sscanf(buf, "%lx, %lx, %lx",
+   ret = sscanf(buf, "%llx, %llx, %llx",
&err_data_buffer[cpu].data1,
&err_data_buffer[cpu].data2,
&err_data_buffer[cpu].data3);
-- 
2.30.2



[PATCH] ia64: fix format string for ia64-acpi-cpu-freq

2021-03-13 Thread Sergei Trofimovich
Fix warning with %lx / s64 mismatch:

  CC [M]  drivers/cpufreq/ia64-acpi-cpufreq.o
drivers/cpufreq/ia64-acpi-cpufreq.c: In function 'processor_get_pstate':
  warning: format '%lx' expects argument of type 'long unsigned int',
  but argument 3 has type 's64' {aka 'long long int'} [-Wformat=]

CC: "Rafael J. Wysocki" 
CC: Viresh Kumar 
CC: linux...@vger.kernel.org
Signed-off-by: Sergei Trofimovich 
---
 drivers/cpufreq/ia64-acpi-cpufreq.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/cpufreq/ia64-acpi-cpufreq.c 
b/drivers/cpufreq/ia64-acpi-cpufreq.c
index 2efe7189ccc4..c6bdc455517f 100644
--- a/drivers/cpufreq/ia64-acpi-cpufreq.c
+++ b/drivers/cpufreq/ia64-acpi-cpufreq.c
@@ -54,7 +54,7 @@ processor_set_pstate (
retval = ia64_pal_set_pstate((u64)value);
 
if (retval) {
-   pr_debug("Failed to set freq to 0x%x, with error 0x%lx\n",
+   pr_debug("Failed to set freq to 0x%x, with error 0x%llx\n",
value, retval);
return -ENODEV;
}
@@ -77,7 +77,7 @@ processor_get_pstate (
 
if (retval)
pr_debug("Failed to get current freq with "
-   "error 0x%lx, idx 0x%x\n", retval, *value);
+   "error 0x%llx, idx 0x%x\n", retval, *value);
 
return (int)retval;
 }
-- 
2.30.2



[PATCH] hpsa: fix boot on ia64 (atomic_t alignment)

2021-03-12 Thread Sergei Trofimovich
The failure was initially observed as a boot failure on an rx3600 ia64
machine with RAID bus controller: Hewlett-Packard Company Smart Array P600:

kernel unaligned access to 0xe00105dd8b95, ip=0xa00100b87551
kernel unaligned access to 0xe00105dd8e95, ip=0xa00100b87551
hpsa :14:01.0: Controller reports max supported commands of 0 Using 16 
instead. Ensure that firmware is up to date.
swapper/0[1]: error during unaligned kernel access

Here the unaligned access comes from 'struct CommandList', which happens
to be packed. The change f749d8b7a ("scsi: hpsa: Correct dev cmds
outstanding for retried cmds") replaced an 'int' field with a 'bool',
which shifted the fields after it and moved atomic_t off its natural
alignment.

This change does not remove the packing annotation from the struct; it
only restores the alignment of the atomic variable.
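
A small userspace sketch of the effect, under stated assumptions: the
two structs are hypothetical stand-ins for the tail of 'struct
CommandList' (the real struct has many more fields), int32_t stands in
for atomic_t, and sizeof(bool) is assumed to be 1:

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdbool.h>
#include <stddef.h>   /* offsetof */
#include <stdint.h>

#pragma pack(push, 1)
/* Hypothetical layout with an 'int' flag: refcount stays aligned. */
struct cmd_int {
	void *phys_disk;
	int pending;
	void *device;
	int32_t refcount;	/* stands in for atomic_t */
};

/* Same layout with a 'bool' flag: everything after it shifts. */
struct cmd_bool {
	void *phys_disk;
	bool pending;
	void *device;
	int32_t refcount;	/* stands in for atomic_t */
};
#pragma pack(pop)

/* Same style of check as the one added by the patch. */
static_assert(offsetof(struct cmd_int, refcount) % _Alignof(int32_t) == 0,
	      "refcount must be naturally aligned");

/* Returns how far off natural alignment refcount is in the bool variant. */
static size_t bool_variant_misalignment(void)
{
	return offsetof(struct cmd_bool, refcount) % _Alignof(int32_t);
}
```

Adding the equivalent static_assert for 'struct cmd_bool' should fail
to compile, which is the point of putting such a check in the header.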

The change is tested on the same rx3600 machine.

CC: linux-i...@vger.kernel.org
CC: storage...@microchip.com
CC: linux-s...@vger.kernel.org
CC: Joe Szczypek 
CC: Scott Benesh 
CC: Scott Teel 
CC: Tomas Henzl 
CC: "Martin K. Petersen" 
CC: Don Brace 
Reported-by: John Paul Adrian Glaubitz 
Suggested-by: Don Brace 
Fixes: f749d8b7a "scsi: hpsa: Correct dev cmds outstanding for retried cmds"
Signed-off-by: Sergei Trofimovich 
---
 drivers/scsi/hpsa_cmd.h | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/hpsa_cmd.h b/drivers/scsi/hpsa_cmd.h
index d126bb877250..617bdae9a7de 100644
--- a/drivers/scsi/hpsa_cmd.h
+++ b/drivers/scsi/hpsa_cmd.h
@@ -20,6 +20,9 @@
 #ifndef HPSA_CMD_H
 #define HPSA_CMD_H
 
+#include  /* static_assert */
+#include  /* offsetof */
+
 /* general boundary defintions */
 #define SENSEINFOBYTES  32 /* may vary between hbas */
 #define SG_ENTRIES_IN_CMD  32 /* Max SG entries excluding chain blocks */
@@ -448,11 +451,20 @@ struct CommandList {
 */
struct hpsa_scsi_dev_t *phys_disk;
 
-   bool retry_pending;
+   int retry_pending;
struct hpsa_scsi_dev_t *device;
atomic_t refcount; /* Must be last to avoid memset in hpsa_cmd_init() */
 } __aligned(COMMANDLIST_ALIGNMENT);
 
+/*
+ * Make sure our embedded atomic variable is aligned. Otherwise we break atomic
+ * operations on architectures that don't support unaligned atomics like IA64.
+ *
+ * Ideally this header should be cleaned up to only mark individual structs as
+ * packed.
+ */
+static_assert(offsetof(struct CommandList, refcount) % __alignof__(atomic_t) 
== 0);
+
 /* Max S/G elements in I/O accelerator command */
 #define IOACCEL1_MAXSGENTRIES   24
 #define IOACCEL2_MAXSGENTRIES  28
-- 
2.30.2



[PATCH] ia64: tools: add generic errno.h definition

2021-03-11 Thread Sergei Trofimovich
Noticed the missing header when building the bpfilter helper:

CC [U]  net/bpfilter/main.o
  In file included from /usr/include/linux/errno.h:1,
   from /usr/include/bits/errno.h:26,
   from /usr/include/errno.h:28,
   from net/bpfilter/main.c:4:
  tools/include/uapi/asm/errno.h:13:10: fatal error:
../../../arch/ia64/include/uapi/asm/errno.h: No such file or directory
 13 | #include "../../../arch/ia64/include/uapi/asm/errno.h"
|  ^

CC: linux-kernel@vger.kernel.org
CC: net...@vger.kernel.org
CC: b...@vger.kernel.org
Signed-off-by: Sergei Trofimovich 
---
 tools/arch/ia64/include/uapi/asm/errno.h | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 tools/arch/ia64/include/uapi/asm/errno.h

diff --git a/tools/arch/ia64/include/uapi/asm/errno.h 
b/tools/arch/ia64/include/uapi/asm/errno.h
new file mode 100644
index ..4c82b503d92f
--- /dev/null
+++ b/tools/arch/ia64/include/uapi/asm/errno.h
@@ -0,0 +1 @@
+#include 
-- 
2.30.2



[PATCH] docs: don't include Documentation/Kconfig twice

2021-03-07 Thread Sergei Trofimovich
Before the change there were two inclusions of Documentation/Kconfig:
lib/Kconfig.debug:source "Documentation/Kconfig"
Kconfig: source "Documentation/Kconfig"

Kconfig also included 'source "lib/Kconfig.debug"'.

Noticed as duplicate 'make menuconfig' entries: one in the top level
menu and one in the 'Kernel hacking' menu. The patch keeps only the
entry in 'Kernel hacking'.

CC: Mauro Carvalho Chehab 
CC: Jonathan Corbet 
Signed-off-by: Sergei Trofimovich 
---
 Kconfig | 2 --
 1 file changed, 2 deletions(-)

diff --git a/Kconfig b/Kconfig
index 745bc773f567..97ed6389c921 100644
--- a/Kconfig
+++ b/Kconfig
@@ -28,5 +28,3 @@ source "crypto/Kconfig"
 source "lib/Kconfig"
 
 source "lib/Kconfig.debug"
-
-source "Documentation/Kconfig"
-- 
2.30.1



Re: [bisected] 5.12-rc1 hpsa regression: "scsi: hpsa: Correct dev cmds outstanding for retried cmds" breaks hpsa P600

2021-03-03 Thread Sergei Trofimovich
On Wed, 3 Mar 2021 00:22:36 +
Sergei Trofimovich  wrote:

> On Tue, 2 Mar 2021 23:31:32 +0100
> John Paul Adrian Glaubitz  wrote:
> 
> > Hi Sergei!
> > 
> > On 3/2/21 11:26 PM, Sergei Trofimovich wrote:  
> > > Gave v5.12-rc1 a try today and got a similar boot failure around
> > > hpsa queue initialization, but my failure is later:
> > > https://dev.gentoo.org/~slyfox/configs/guppy-dmesg-5.12-rc1
> > > Maybe I get different error because I flipped on most debugging
> > > kernel options :)
> > > 
> > > Looks like 'ERROR: Invalid distance value range', while looking
> > > very scary, is harmless. It's just a new spammy way for the kernel
> > > to report lack of NUMA config on the machine (no SRAT and SLIT
> > > ACPI tables).
> > > 
> > > At least I get hpsa detected on PCI bus. But I guess its discovered
> > > configuration is very wrong as I get unaligned accesses:
> > > [   19.811570] kernel unaligned access to 0xe00105dd8295, 
> > > ip=0xa00100b874d1
> > > 
> > > Bisecting now.
> > 
> > Sounds good. I guess we should get Jens' fix for the signal regression
> > merged as well as your two fixes for strace.  
> 
> "bisected" (cheated halfway through) and verified that reverting
> f749d8b7a9896bc6e5ffe104cc64345037e0b152 makes rx3600 boot again.
> 
> CCing authors who might be able to help us here.
> 
> commit f749d8b7a9896bc6e5ffe104cc64345037e0b152
> Author: Don Brace 
> Date:   Mon Feb 15 16:26:57 2021 -0600
> 
> scsi: hpsa: Correct dev cmds outstanding for retried cmds
> 
> Prevent incrementing device->commands_outstanding for ioaccel command
> retries that are driver initiated.  If the command goes through the retry
> path, the device->commands_outstanding counter has already accounted for
> the number of commands outstanding to the device.  Only commands going
> through function hpsa_cmd_resolve_events decrement this counter.
> 
>  - ioaccel commands go to either HBA disks or to logical volumes comprised
>of SSDs.
> 
> The extra increment is causing device resets to hang.
> 
>  - Resets wait for all device outstanding commands to complete before
>returning.
> 
> Replace unused field abort_pending with retry_pending. This is a
> maintenance driver so these changes have the least impact/risk.
> 
> Link: 
> https://lore.kernel.org/r/161342801747.29388.13045495968308188518.stgit@brunhilda
> Tested-by: Joe Szczypek 
> Reviewed-by: Scott Benesh 
> Reviewed-by: Scott Teel 
> Reviewed-by: Tomas Henzl 
> Signed-off-by: Don Brace 
> Signed-off-by: Martin K. Petersen 
> 
> Don, do you happen to know why this patch caused some controller init failure
> for device
> 14:01.0 RAID bus controller: Hewlett-Packard Company Smart Array P600
> ?
> 
> Boot failure: https://dev.gentoo.org/~slyfox/configs/guppy-dmesg-5.12-rc1
> Boot success: https://dev.gentoo.org/~slyfox/configs/guppy-dmesg-5.12-rc1-good
> 
> The difference between the two boots is 
> f749d8b7a9896bc6e5ffe104cc64345037e0b152 reverted on top of 5.12-rc1
> in -good case.
> 
> Looks like hpsa controller fails to initialize in bad case (could be a race?).

Also CCing hpsa maintainer mailing lists.

Looking more into the suspect commit

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f749d8b7a9896bc6e5ffe104cc64345037e0b152
it roughly does the following:

@@ -448,7 +448,7 @@ struct CommandList {
 */
struct hpsa_scsi_dev_t *phys_disk;
 
-   int abort_pending;
+   bool retry_pending;
struct hpsa_scsi_dev_t *device;
atomic_t refcount; /* Must be last to avoid memset in hpsa_cmd_init() */
 } __aligned(COMMANDLIST_ALIGNMENT);
...
@@ -1151,7 +1151,10 @@ static void __enqueue_cmd_and_start_io(struct ctlr_info 
*h,
 {
dial_down_lockup_detection_during_fw_flash(h, c);
atomic_inc(&h->commands_outstanding);
-   if (c->device)
+   /*
+* Check to see if the command is being retried.
+*/
+   if (c->device && !c->retry_pending)
atomic_inc(&c->device->commands_outstanding);

But I don't immediately see anything wrong with it.

-- 

  Sergei


[bisected] 5.12-rc1 hpsa regression: "scsi: hpsa: Correct dev cmds outstanding for retried cmds" breaks hpsa P600

2021-03-03 Thread Sergei Trofimovich
On Tue, 2 Mar 2021 23:31:32 +0100
John Paul Adrian Glaubitz  wrote:

> Hi Sergei!
> 
> On 3/2/21 11:26 PM, Sergei Trofimovich wrote:
> > Gave v5.12-rc1 a try today and got a similar boot failure around
> > hpsa queue initialization, but my failure is later:
> > https://dev.gentoo.org/~slyfox/configs/guppy-dmesg-5.12-rc1
> > Maybe I get different error because I flipped on most debugging
> > kernel options :)
> > 
> > Looks like 'ERROR: Invalid distance value range', while looking
> > very scary, is harmless. It's just a new spammy way for the kernel
> > to report lack of NUMA config on the machine (no SRAT and SLIT
> > ACPI tables).
> > 
> > At least I get hpsa detected on PCI bus. But I guess its discovered
> > configuration is very wrong as I get unaligned accesses:
> > [   19.811570] kernel unaligned access to 0xe00105dd8295, 
> > ip=0xa00100b874d1
> > 
> > Bisecting now.  
> 
> Sounds good. I guess we should get Jens' fix for the signal regression
> merged as well as your two fixes for strace.

"bisected" (cheated halfway through) and verified that reverting
f749d8b7a9896bc6e5ffe104cc64345037e0b152 makes rx3600 boot again.

CCing authors who might be able to help us here.

commit f749d8b7a9896bc6e5ffe104cc64345037e0b152
Author: Don Brace 
Date:   Mon Feb 15 16:26:57 2021 -0600

scsi: hpsa: Correct dev cmds outstanding for retried cmds

Prevent incrementing device->commands_outstanding for ioaccel command
retries that are driver initiated.  If the command goes through the retry
path, the device->commands_outstanding counter has already accounted for
the number of commands outstanding to the device.  Only commands going
through function hpsa_cmd_resolve_events decrement this counter.

 - ioaccel commands go to either HBA disks or to logical volumes comprised
   of SSDs.

The extra increment is causing device resets to hang.

 - Resets wait for all device outstanding commands to complete before
   returning.

Replace unused field abort_pending with retry_pending. This is a
maintenance driver so these changes have the least impact/risk.

Link: 
https://lore.kernel.org/r/161342801747.29388.13045495968308188518.stgit@brunhilda
Tested-by: Joe Szczypek 
Reviewed-by: Scott Benesh 
Reviewed-by: Scott Teel 
Reviewed-by: Tomas Henzl 
Signed-off-by: Don Brace 
Signed-off-by: Martin K. Petersen 

Don, do you happen to know why this patch caused some controller init failure
for device
14:01.0 RAID bus controller: Hewlett-Packard Company Smart Array P600
?

Boot failure: https://dev.gentoo.org/~slyfox/configs/guppy-dmesg-5.12-rc1
Boot success: https://dev.gentoo.org/~slyfox/configs/guppy-dmesg-5.12-rc1-good

The difference between the two boots is 
f749d8b7a9896bc6e5ffe104cc64345037e0b152 reverted on top of 5.12-rc1
in -good case.

Looks like hpsa controller fails to initialize in bad case (could be a race?).

-- 

  Sergei


Re: [PATCH] ia64: fix ptrace(PTRACE_SYSCALL_INFO_EXIT) sign

2021-03-03 Thread Sergei Trofimovich
On Sun, 21 Feb 2021 00:25:54 +
Sergei Trofimovich  wrote:

> In https://bugs.gentoo.org/769614 Dmitry noticed that
> `ptrace(PTRACE_GET_SYSCALL_INFO)` does not return error sign properly.
> 
> The bug is a sign mismatch between the get/set error helpers:
> 
> static inline long syscall_get_error(struct task_struct *task,
>  struct pt_regs *regs)
> {
> return regs->r10 == -1 ? regs->r8:0;
> }
> 
> static inline long syscall_get_return_value(struct task_struct *task,
> struct pt_regs *regs)
> {
> return regs->r8;
> }
> 
> static inline void syscall_set_return_value(struct task_struct *task,
> struct pt_regs *regs,
> int error, long val)
> {
> if (error) {
> /* error < 0, but ia64 uses > 0 return value */
> regs->r8 = -error;
> regs->r10 = -1;
> } else {
> regs->r8 = val;
> regs->r10 = 0;
> }
> }
> 
> Tested on v5.10 on rx3600 machine (ia64 9040 CPU).
> 
> CC: linux-i...@vger.kernel.org
> CC: linux-kernel@vger.kernel.org
> CC: Andrew Morton 
> Reported-by: Dmitry V. Levin 
> Bug: https://bugs.gentoo.org/769614
> Signed-off-by: Sergei Trofimovich 
> ---
>  arch/ia64/include/asm/syscall.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/ia64/include/asm/syscall.h b/arch/ia64/include/asm/syscall.h
> index 6c6f16e409a8..0d23c0049301 100644
> --- a/arch/ia64/include/asm/syscall.h
> +++ b/arch/ia64/include/asm/syscall.h
> @@ -32,7 +32,7 @@ static inline void syscall_rollback(struct task_struct 
> *task,
>  static inline long syscall_get_error(struct task_struct *task,
>struct pt_regs *regs)
>  {
> - return regs->r10 == -1 ? regs->r8:0;
> + return regs->r10 == -1 ? -regs->r8:0;
>  }
>  
>  static inline long syscall_get_return_value(struct task_struct *task,
> -- 
> 2.30.1
> 

Andrew, would it be fine to pass it through misc tree?
Or should it go through Oleg as it's mostly about ptrace?
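
For reviewers without ia64 hardware, the round trip can be modeled in
userspace. This is only a sketch: 'struct fake_regs' and the function
names are hypothetical stand-ins for pt_regs and the helpers quoted
above, with the bodies copied from the patch:

```c
#include <errno.h>

/* Hypothetical model of the two pt_regs fields involved. */
struct fake_regs {
	long r8;	/* return value, or positive errno on failure */
	long r10;	/* -1 on failure, 0 on success */
};

/* Same logic as the quoted syscall_set_return_value(). */
static void set_return_value(struct fake_regs *regs, int error, long val)
{
	if (error) {
		/* error < 0, but ia64 stores a positive value in r8 */
		regs->r8 = -error;
		regs->r10 = -1;
	} else {
		regs->r8 = val;
		regs->r10 = 0;
	}
}

/* Pre-patch behaviour: returns the positive value stored in r8. */
static long get_error_old(const struct fake_regs *regs)
{
	return regs->r10 == -1 ? regs->r8 : 0;
}

/* Post-patch behaviour: negates r8 so the caller sees -errno again. */
static long get_error_fixed(const struct fake_regs *regs)
{
	return regs->r10 == -1 ? -regs->r8 : 0;
}
```

Setting an error of -ENOENT and reading it back gives ENOENT with the
old helper and -ENOENT with the fixed one, which is the sign flip
Dmitry reported.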

-- 

  Sergei


Re: [PATCH] ia64: fix ia64_syscall_get_set_arguments() for break-based syscalls

2021-03-03 Thread Sergei Trofimovich
On Sun, 21 Feb 2021 00:25:53 +
Sergei Trofimovich  wrote:

> In https://bugs.gentoo.org/769614 Dmitry noticed that
> `ptrace(PTRACE_GET_SYSCALL_INFO)` does not work for syscalls called
> via glibc's syscall() wrapper.
> 
> ia64 has two ways to call syscalls from userspace: via `break` and via
> `eps` instructions.
> 
> The difference is in stack layout:
> 
> 1. `eps` creates simple stack frame: no locals, in{0..7} == out{0..8}
> 2. `break` uses userspace stack frame: may be locals (glibc provides
>one), in{0..7} == out{0..8}.
> 
> Both work fine in the syscall handling code itself.
> 
> However, `ptrace(PTRACE_GET_SYSCALL_INFO)` uses the unwind mechanism to
> re-extract syscall arguments, and it does not account for locals.
> 
> The change always skips the local registers. It should not change the
> `eps` path, as the kernel's handler already enforces locals=0, and it
> fixes `break`.
> 
> Tested on v5.10 on rx3600 machine (ia64 9040 CPU).
> 
> CC: Oleg Nesterov 
> CC: linux-i...@vger.kernel.org
> CC: linux-kernel@vger.kernel.org
> CC: Andrew Morton 
> Reported-by: Dmitry V. Levin 
> Bug: https://bugs.gentoo.org/769614
> Signed-off-by: Sergei Trofimovich 
> ---
>  arch/ia64/kernel/ptrace.c | 24 ++--
>  1 file changed, 18 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/ia64/kernel/ptrace.c b/arch/ia64/kernel/ptrace.c
> index c3490ee2daa5..e14f5653393a 100644
> --- a/arch/ia64/kernel/ptrace.c
> +++ b/arch/ia64/kernel/ptrace.c
> @@ -2013,27 +2013,39 @@ static void syscall_get_set_args_cb(struct 
> unw_frame_info *info, void *data)
>  {
>   struct syscall_get_set_args *args = data;
>   struct pt_regs *pt = args->regs;
> - unsigned long *krbs, cfm, ndirty;
> + unsigned long *krbs, cfm, ndirty, nlocals, nouts;
>   int i, count;
>  
>   if (unw_unwind_to_user(info) < 0)
>   return;
>  
> + /*
> +  * We get here via a few paths:
> +  * - break instruction: cfm is shared with caller.
> +  *   syscall args are in out= regs, locals are non-empty.
> +  * - eps instruction: cfm is set by br.call
> +  *   locals don't exist.
> +  *
> +  * For both cases argguments are reachable in cfm.sof - cfm.sol.
> +  * CFM: [ ... | sor: 17..14 | sol : 13..7 | sof : 6..0 ]
> +  */
>   cfm = pt->cr_ifs;
> + nlocals = (cfm >> 7) & 0x7f; /* aka sol */
> + nouts = (cfm & 0x7f) - nlocals; /* aka sof - sol */
>   krbs = (unsigned long *)info->task + IA64_RBS_OFFSET/8;
>   ndirty = ia64_rse_num_regs(krbs, krbs + (pt->loadrs >> 19));
>  
>   count = 0;
>   if (in_syscall(pt))
> - count = min_t(int, args->n, cfm & 0x7f);
> + count = min_t(int, args->n, nouts);
>  
> + /* Iterate over outs. */
>   for (i = 0; i < count; i++) {
> + int j = ndirty + nlocals + i + args->i;
>   if (args->rw)
> - *ia64_rse_skip_regs(krbs, ndirty + i + args->i) =
> - args->args[i];
> + *ia64_rse_skip_regs(krbs, j) = args->args[i];
>   else
> - args->args[i] = *ia64_rse_skip_regs(krbs,
> - ndirty + i + args->i);
> + args->args[i] = *ia64_rse_skip_regs(krbs, j);
>   }
>  
>   if (!args->rw) {
> -- 
> 2.30.1
> 

Andrew, would it be fine to pass it through misc tree?
Or should it go through Oleg as it's about ptrace?
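
In case it helps review, here is a userspace sketch of the CFM field
extraction the patch relies on. The bit layout is taken from the
comment in the patch; 'struct cfm_fields' and decode_cfm() are
hypothetical names used only for illustration:

```c
#include <stdint.h>

/*
 * CFM layout, per the comment in the patch:
 *   [ ... | sor: 17..14 | sol: 13..7 | sof: 6..0 ]
 */
struct cfm_fields {
	unsigned int sof;	/* size of frame */
	unsigned int sol;	/* size of locals (includes input registers) */
	unsigned int nouts;	/* output registers: sof - sol */
};

/* Hypothetical helper: split a CFM value into the fields used by the patch. */
static struct cfm_fields decode_cfm(uint64_t cfm)
{
	struct cfm_fields f;

	f.sof = (unsigned int)(cfm & 0x7f);
	f.sol = (unsigned int)((cfm >> 7) & 0x7f);
	f.nouts = f.sof - f.sol;
	return f;
}
```

For a frame with sol=3 and sof=11 this yields nouts=8, and the first
syscall argument sits 3 registers past the start of the frame. With
sol=0 (the no-locals case) the old and new indexing agree.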

-- 

  Sergei


Re: 5.11 regression: "ia64: add support for TIF_NOTIFY_SIGNAL" breaks ia64 boot

2021-03-02 Thread Sergei Trofimovich
On Tue, 23 Feb 2021 08:08:30 +
Sergei Trofimovich  wrote:

> On Mon, 22 Feb 2021 17:43:58 -0700
> Jens Axboe  wrote:
> 
> > On 2/22/21 5:41 PM, Jens Axboe wrote:  
> > > On 2/22/21 5:34 PM, Jens Axboe wrote:
> > >> On 2/22/21 4:53 PM, Sergei Trofimovich wrote:
> > >>> On Mon, 22 Feb 2021 16:34:50 -0700
> > >>> Jens Axboe  wrote:
> > >>>
> > >>>> On 2/22/21 4:05 PM, Sergei Trofimovich wrote:
> > >>>>> Hia Jens!
> > >>>>>
> > >>>>> Tried 5.11 on rx3600 box and noticed it has
> > >>>>> a problem handling init (5.10 booted fine):
> > >>>>>
> > >>>>> INIT: version 2.98 booting
> > >>>>>
> > >>>>>OpenRC 0.42.1 is starting up Gentoo Linux (ia64)
> > >>>>>
> > >>>>> mkdir `/run/openrc': Read-only file system
> > >>>>> mkdir `/run/openrc/starting': No such file or directory
> > >>>>> mkdir `/run/openrc/started': No such file or directory
> > >>>>> mkdir `/run/openrc/stopping': No such file or directory
> > >>>>> mkdir `/run/openrc/inactive': No such file or directory
> > >>>>> mkdir `/run/openrc/wasinactive': No such file or directory
> > >>>>> mkdir `/run/openrc/failed': No such file or directory
> > >>>>> mkdir `/run/openrc/hotplugged': No such file or directory
> > >>>>> mkdir `/run/openrc/daemons': No such file or directory
> > >>>>> mkdir `/run[   14.595059] Kernel panic - not syncing: Attempted to 
> > >>>>> kill init! exitcode=0x000b
> > >>>>> [   14.599059] ---[ end Kernel panic - not syncing: Attempted to kill 
> > >>>>> init! exitcode=0x000b ]---
> > >>>>>
> > >>>>> I suspect we build bad signal stack frame for userspace.
> > >>>>>
> > >>>>> With a bit of #define DEBUG_SIG 1 enabled the signals are SIGCHLD:
> > >>>>>
> > >>>>> [   34.969771] SIG deliver (gendepends.sh:69): sig=17 
> > >>>>> sp=6f6aeaa0 ip=a0040740 handler=4b4c59b6
> > >>>>> [   34.969948] SIG deliver (init:1): sig=17 sp=6f1ccc50 
> > >>>>> ip=a0040740 handler=4638b9e5
> > >>>>> [   34.969948] SIG deliver (gendepends.sh:69): sig=17 
> > >>>>> sp=6f6adf90 ip=a0040740 handler=4b4c59b6
> > >>>>> [   34.973948] SIG deliver (init:1): sig=17 sp=6f1cc140 
> > >>>>> ip=a0040740 handler=4638b9e5
> > >>>>> [   34.973948] Kernel panic - not syncing: Attempted to kill init! 
> > >>>>> exitcode=0x000b
> > >>>>> [   34.973948] SIG deliver (gendepends.sh:69): sig=17 
> > >>>>> sp=6f6ad480 ip=a0040740 handler=4b4c59b6
> > >>>>> [   34.973948] ---[ end Kernel panic - not syncing: Attempted to kill 
> > >>>>> init! exitcode=0x000b ]---
> > >>>>>
> > >>>>> Bisect points at:
> > >>>>>
> > >>>>> commit b269c229b0e89aedb7943c06673b56b6052cf5e5
> > >>>>> Author: Jens Axboe 
> > >>>>> Date:   Fri Oct 9 14:49:43 2020 -0600
> > >>>>>
> > >>>>> ia64: add support for TIF_NOTIFY_SIGNAL
> > >>>>>
> > >>>>> Wire up TIF_NOTIFY_SIGNAL handling for ia64.
> > >>>>>
> > >>>>> Cc: linux-i...@vger.kernel.org
> > >>>>> [axboe: added fixes from Mike Rapoport ]
> > >>>>> Signed-off-by: Jens Axboe 
> > >>>>>
> > >>>>> diff --git a/arch/ia64/include/asm/thread_info.h 
> > >>>>> b/arch/ia64/include/asm/thread_info.h
> > >>>>> index 64a1011f6812..51d20cb37706 100644
> > >>>>> --- a/arch/ia64/include/asm/thread_info.h
> > >>>>> +++ b/arch/ia64/include/asm/thread_info.h
> > >>>>> @@ -103,6 +103,7 @@ struct thread_info {
> > >>>>>  #define TIF_SYSCALL_TRACE  2   /* syscall trace active */
> > >>>>>  #define TIF_SYSCALL_AUDIT  3   /* syscall auditing active */
> > 

5.?? regression: strace testsuite OOpses kernel on ia64

2021-02-23 Thread Sergei Trofimovich
The crash seems to be related to sock_filter-v test from strace:
https://github.com/strace/strace/blob/master/tests/seccomp-filter-v.c

Here is an OOps:

[  818.089904] BUG: Bad page map in process sock_filter-v  pte:0001 
pmd:118580001
[  818.089904] page:e6a429c8 refcount:1 mapcount:-1 
mapping: index:0x0 pfn:0x0
[  818.089904] flags: 0x1000(reserved)
[  818.089904] raw: 1000 a0004008 a0004008 

[  818.089904] raw:   0001fffe
[  818.089904] page dumped because: bad pte
[  818.089904] addr: vm_flags:04044011 
anon_vma: mapping: index:0
[  818.095483] file:(null) fault:0x0 mmap:0x0 readpage:0x0
[  818.095483] CPU: 0 PID: 5990 Comm: sock_filter-v Not tainted 
5.11.0-3-gbfa5a4929c90 #57
[  818.095483] Hardware name: hp server rx3600   , BIOS 04.03   
 04/08/2008
[  818.095483]
[  818.095483] Call Trace:
[  818.095483]  [] show_stack+0x90/0xc0
[  818.095483] sp=e00118707bb0 
bsp=e001187013c0
[  818.095483]  [] dump_stack+0x120/0x160
[  818.095483] sp=e00118707d80 
bsp=e00118701348
[  818.095483]  [] print_bad_pte+0x300/0x3a0
[  818.095483] sp=e00118707d80 
bsp=e001187012e0
[  818.099483]  [] unmap_page_range+0xa90/0x11a0
[  818.099483] sp=e00118707d80 
bsp=e00118701140
[  818.099483]  [] unmap_vmas+0xc0/0x100
[  818.099483] sp=e00118707da0 
bsp=e00118701108
[  818.099483]  [] exit_mmap+0x150/0x320
[  818.099483] sp=e00118707da0 
bsp=e001187010d8
[  818.099483]  [] mmput+0x60/0x200
[  818.099483] sp=e00118707e20 
bsp=e001187010b0
[  818.103482]  [] do_exit+0x6f0/0x18a0
[  818.103482] sp=e00118707e20 
bsp=e00118701038
[  818.103482]  [] do_group_exit+0x90/0x2a0
[  818.103482] sp=e00118707e30 
bsp=e00118700ff0
[  818.103482]  [] sys_exit_group+0x20/0x40
[  818.103482] sp=e00118707e30 
bsp=e00118700f98
[  818.107482]  [] ia64_trace_syscall+0xf0/0x130
[  818.107482] sp=e00118707e30 
bsp=e00118700f98
[  818.107482]  [] ia64_ivt+0x00040720/0x400
[  818.107482] sp=e00118708000 
bsp=e00118700f98
[  818.115482] Disabling lock debugging due to kernel taint
[  818.115482] BUG: Bad rss-counter state mm:2eec6412 type:MM_FILEPAGES 
val:-1
[  818.132256] Unable to handle kernel NULL pointer dereference (address 
0068)
[  818.133904] sock_filter-v-X[5999]: Oops 11012296146944 [1]
[  818.133904] Modules linked in: acpi_ipmi ipmi_si usb_storage e1000 
ipmi_devintf ipmi_msghandler rtc_efi
[  818.133904]
[  818.133904] CPU: 0 PID: 5999 Comm: sock_filter-v-X Tainted: GB   
  5.11.0-3-gbfa5a4929c90 #57
[  818.133904] Hardware name: hp server rx3600   , BIOS 04.03   
 04/08/2008
[  818.133904] psr : 121008026010 ifs : 8288 ip  : 
[]Tainted: GB (5.11.0-3-gbfa5a4929c90)
[  818.133904] ip is at bpf_prog_free+0x21/0xe0
[  818.133904] unat:  pfs : 0307 rsc : 
0003
[  818.133904] rnat:  bsps:  pr  : 
00106a5a51665965
[  818.133904] ldrs:  ccv : 12088904 fpsr: 
0009804c8a70033f
[  818.133904] csd :  ssd : 
[  818.133904] b0  : a00100d54080 b6  : a00100d53fe0 b7  : 
a001cef0
[  818.133904] f6  : 0ffefb0c50daa1b67f89a f7  : 0ffed8b3e4fdb0800
[  818.133904] f8  : 10017fbd1bc00 f9  : 1000eb95f
[  818.133904] f10 : 10008ade20716a6c83cc1 f11 : 1003e02b7
[  818.133904] r1  : a0010176b300 r2  : a0028004 r3  : 

[  818.133904] r8  : 0008 r9  : e0011873f800 r10 : 
e00102c18600
[  818.133904] r11 : e00102c19600 r12 : e0011873f7f0 r13 : 
e00118738000
[  818.133904] r14 : 0068 r15 : a0028028 r16 : 
e5606a70
[  818.133904] r17 : e00102c18600 r18 : e00104370748 r19 : 
e00102c18600
[  818.133904] r20 : e00102c18600 r21 : e5606a78 r22 : 
a0010156bd28
[  818.133904] r23 : a0010147fdf4 r24 : 4000 r25 : 
e00104370750
[  818.133904] r26 : a001012f7088 r27 : a00100d53fe0 r28 : 
0001
[  818.133904] r29 : e0011873f800 r30 : e0011873f810 r31 : 
e0011873f808
[  818.133904]
[  818.133904] Call Trace:
[  818.133904]  [] show_stack+0x90/0xc0
[  818.133904]

Re: [PATCH] ia64: fix ptrace(PTRACE_SYSCALL_INFO_EXIT) sign

2021-02-21 Thread Sergei Trofimovich
On Sun, 21 Feb 2021 10:21:56 +0100
John Paul Adrian Glaubitz  wrote:

> Hi Sergei!
> 
> On 2/21/21 1:25 AM, Sergei Trofimovich wrote:
> > In https://bugs.gentoo.org/769614 Dmitry noticed that
> > `ptrace(PTRACE_GET_SYSCALL_INFO)` does not return error sign properly.
> > (...)  
> 
> Do these two patches unbreak gdb on ia64?

gdb was somewhat working on ia64 for Gentoo. strace was the main
tool impacted here.

But I did not try anything complicated recently. Anything specific that
breaks for you?

$ uname -r
5.10.0
(even without the patches above)

$ cat c.c
int main(){}
$ gcc c.c -o a -ggdb3
$ gdb --quiet ./a
Reading symbols from ./a...
(gdb) start
Temporary breakpoint 1 at 0x7f2: file c.c, line 1.
Starting program: /home/slyfox/a
Failed to read a valid object file image from memory.

Temporary breakpoint 1, main () at c.c:1
1   int main(){}
(gdb) disassemble
Dump of assembler code for function main:
   0x200807f0 <+0>: [MII]   mov r2=r12
   0x200807f1 <+1>: mov r14=r0;;
=> 0x200807f2 <+2>: mov r8=r14
   0x20080800 <+16>:[MIB]   mov r12=r2
   0x20080801 <+17>:nop.i 0x0
   0x20080802 <+18>:br.ret.sptk.many b0;;
End of assembler dump.
(gdb) break *0x20080800
Breakpoint 2 at 0x20080800: file c.c, line 1.
(gdb) continue
Continuing.

Breakpoint 2, 0x20080800 in main () at c.c:1
1   int main(){}

Looks ok for minor stuff.

> And have you, by any chance, managed to get the hpsa driver working again?

v5.10 seems to boot off hpsa just fine without extra patches:
  14:01.0 RAID bus controller: Hewlett-Packard Company Smart Array P600
Subsystem: Hewlett-Packard Company 3 Gb/s SAS RAID
Kernel driver in use: hpsa

v5.11 does not boot yet. The kernel does not see some files during boot, after
init is started (but I'm not sure it's a block device problem). Bisecting why now.

-- 

  Sergei


[PATCH] ia64: fix ia64_syscall_get_set_arguments() for break-based syscalls

2021-02-20 Thread Sergei Trofimovich
In https://bugs.gentoo.org/769614 Dmitry noticed that
`ptrace(PTRACE_GET_SYSCALL_INFO)` does not work for syscalls called
via glibc's syscall() wrapper.

ia64 has two ways to call syscalls from userspace: via `break` and via
`eps` instructions.

The difference is in stack layout:

1. `eps` creates simple stack frame: no locals, in{0..7} == out{0..8}
2. `break` uses userspace stack frame: may be locals (glibc provides
   one), in{0..7} == out{0..8}.

Both work fine in the syscall handling code itself.

But `ptrace(PTRACE_GET_SYSCALL_INFO)` uses the unwind mechanism to
re-extract syscall arguments, and it does not account for locals.

The change always skips the locals registers. It should not change the `eps`
path, as the kernel's handler already enforces locals=0, and it fixes `break`.
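
The register arithmetic described above can be sketched in plain C. This is only an illustration under the bit layout quoted in the patch comment (sof in bits 6..0, sol in bits 13..7); `struct cfm_fields`, `cfm_decode()` and `syscall_arg_slot()` are hypothetical names, not kernel API.

```c
#include <stdint.h>

/* Decoded frame marker fields (hypothetical helper type). */
struct cfm_fields {
    unsigned sof;   /* size of frame:  CFM bits 6..0  */
    unsigned sol;   /* size of locals: CFM bits 13..7 */
    unsigned nouts; /* output registers: sof - sol    */
};

/* Split a CFM value into sof, sol and the derived number of outs. */
struct cfm_fields cfm_decode(uint64_t cfm)
{
    struct cfm_fields f;

    f.sof = (unsigned)(cfm & 0x7f);
    f.sol = (unsigned)((cfm >> 7) & 0x7f);
    f.nouts = f.sof - f.sol;
    return f;
}

/*
 * Slot of syscall argument 'i', counted in registers from the start of
 * the dirty partition: skip 'ndirty' registers, then the locals.
 */
unsigned long syscall_arg_slot(uint64_t cfm, unsigned long ndirty, unsigned i)
{
    return ndirty + cfm_decode(cfm).sol + i;
}
```

With sol == 0 (the `eps` case) the slot is `ndirty + i`, which is what the code computed before the change; with non-zero locals (the `break` case) the locals are skipped first.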

Tested on v5.10 on rx3600 machine (ia64 9040 CPU).

CC: Oleg Nesterov 
CC: linux-i...@vger.kernel.org
CC: linux-kernel@vger.kernel.org
CC: Andrew Morton 
Reported-by: Dmitry V. Levin 
Bug: https://bugs.gentoo.org/769614
Signed-off-by: Sergei Trofimovich 
---
 arch/ia64/kernel/ptrace.c | 24 ++--
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/arch/ia64/kernel/ptrace.c b/arch/ia64/kernel/ptrace.c
index c3490ee2daa5..e14f5653393a 100644
--- a/arch/ia64/kernel/ptrace.c
+++ b/arch/ia64/kernel/ptrace.c
@@ -2013,27 +2013,39 @@ static void syscall_get_set_args_cb(struct 
unw_frame_info *info, void *data)
 {
struct syscall_get_set_args *args = data;
struct pt_regs *pt = args->regs;
-   unsigned long *krbs, cfm, ndirty;
+   unsigned long *krbs, cfm, ndirty, nlocals, nouts;
int i, count;
 
if (unw_unwind_to_user(info) < 0)
return;
 
+   /*
+* We get here via a few paths:
+* - break instruction: cfm is shared with caller.
+*   syscall args are in out= regs, locals are non-empty.
+* - eps instruction: cfm is set by br.call
+*   locals don't exist.
+*
+* For both cases arguments are reachable in cfm.sof - cfm.sol.
+* CFM: [ ... | sor: 17..14 | sol : 13..7 | sof : 6..0 ]
+*/
cfm = pt->cr_ifs;
+   nlocals = (cfm >> 7) & 0x7f; /* aka sol */
+   nouts = (cfm & 0x7f) - nlocals; /* aka sof - sol */
krbs = (unsigned long *)info->task + IA64_RBS_OFFSET/8;
ndirty = ia64_rse_num_regs(krbs, krbs + (pt->loadrs >> 19));
 
count = 0;
if (in_syscall(pt))
-   count = min_t(int, args->n, cfm & 0x7f);
+   count = min_t(int, args->n, nouts);
 
+   /* Iterate over outs. */
for (i = 0; i < count; i++) {
+   int j = ndirty + nlocals + i + args->i;
if (args->rw)
-   *ia64_rse_skip_regs(krbs, ndirty + i + args->i) =
-   args->args[i];
+   *ia64_rse_skip_regs(krbs, j) = args->args[i];
else
-   args->args[i] = *ia64_rse_skip_regs(krbs,
-   ndirty + i + args->i);
+   args->args[i] = *ia64_rse_skip_regs(krbs, j);
}
 
if (!args->rw) {
-- 
2.30.1



[PATCH] ia64: fix ptrace(PTRACE_SYSCALL_INFO_EXIT) sign

2021-02-20 Thread Sergei Trofimovich
In https://bugs.gentoo.org/769614 Dmitry noticed that
`ptrace(PTRACE_GET_SYSCALL_INFO)` does not return error sign properly.

The bug is in mismatch between get/set errors:

static inline long syscall_get_error(struct task_struct *task,
 struct pt_regs *regs)
{
return regs->r10 == -1 ? regs->r8:0;
}

static inline long syscall_get_return_value(struct task_struct *task,
struct pt_regs *regs)
{
return regs->r8;
}

static inline void syscall_set_return_value(struct task_struct *task,
struct pt_regs *regs,
int error, long val)
{
if (error) {
/* error < 0, but ia64 uses > 0 return value */
regs->r8 = -error;
regs->r10 = -1;
} else {
regs->r8 = val;
regs->r10 = 0;
}
}
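
The mismatch can be reproduced outside the kernel with a tiny model of the two registers. This is a sketch only: `struct regs_sketch` and the function names are made up for illustration, and the bodies mirror the helpers quoted above plus the one-line fix from this patch.

```c
#include <errno.h>

/* Hypothetical stand-in for the two pt_regs fields involved. */
struct regs_sketch {
    long r8;  /* return value, or positive errno on failure */
    long r10; /* -1 on failure, 0 on success */
};

/* Mirrors syscall_set_return_value(): 'error' is a negative errno. */
void set_return_value(struct regs_sketch *regs, int error, long val)
{
    if (error) {
        regs->r8 = -error; /* stored as a positive value */
        regs->r10 = -1;
    } else {
        regs->r8 = val;
        regs->r10 = 0;
    }
}

/* Behaviour before the patch: returns the positive value unchanged. */
long get_error_old(const struct regs_sketch *regs)
{
    return regs->r10 == -1 ? regs->r8 : 0;
}

/* Behaviour after the patch: negates, so get and set agree. */
long get_error_fixed(const struct regs_sketch *regs)
{
    return regs->r10 == -1 ? -regs->r8 : 0;
}
```

Storing -EFAULT and reading it back yields EFAULT with the old getter and -EFAULT with the fixed one.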

Tested on v5.10 on rx3600 machine (ia64 9040 CPU).

CC: linux-i...@vger.kernel.org
CC: linux-kernel@vger.kernel.org
CC: Andrew Morton 
Reported-by: Dmitry V. Levin 
Bug: https://bugs.gentoo.org/769614
Signed-off-by: Sergei Trofimovich 
---
 arch/ia64/include/asm/syscall.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/ia64/include/asm/syscall.h b/arch/ia64/include/asm/syscall.h
index 6c6f16e409a8..0d23c0049301 100644
--- a/arch/ia64/include/asm/syscall.h
+++ b/arch/ia64/include/asm/syscall.h
@@ -32,7 +32,7 @@ static inline void syscall_rollback(struct task_struct *task,
 static inline long syscall_get_error(struct task_struct *task,
 struct pt_regs *regs)
 {
-   return regs->r10 == -1 ? regs->r8:0;
+   return regs->r10 == -1 ? -regs->r8:0;
 }
 
 static inline long syscall_get_return_value(struct task_struct *task,
-- 
2.30.1



linux-headers-5.2 and proper use of SIOCGSTAMP

2019-07-20 Thread Sergei Trofimovich
Commit 
https://github.com/torvalds/linux/commit/0768e17073dc527ccd18ed5f96ce85f9985e9115
("net: socket: implement 64-bit timestamps") caused a bit of userspace breakage
for existing programs:
- firefox: https://bugs.gentoo.org/689808
- qemu: 
https://lists.sr.ht/~philmd/qemu/%3C20190604071915.288045-1-borntraeger%40de.ibm.com%3E
- linux-atm: 
https://gitweb.gentoo.org/repo/gentoo.git/tree/net-dialup/linux-atm/files/linux-atm-2.5.2-linux-5.2-SIOCGSTAMP.patch?id=408621819a85bf67a73efd33a06ea371c20ea5a2

I have a question: how should a well-behaved app include the 'SIOCGSTAMP'
definition to stay buildable against both old and new linux-headers?

'man 7 socket' explicitly mentions SIOCGSTAMP and mentions only
#include 
as needed header.

Should #include  always be included by user app?
Or should glibc tweak its definition of '#include '
to make it available on both old and new versions of linux headers?

CCing both kernel and glibc folk as I don't understand on which
side the issue should be fixed.
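
One possible userspace-side workaround, as a sketch rather than an agreed answer: include the socket headers first and pull in <linux/sockios.h> only if 'SIOCGSTAMP' is still undefined. The header names are my assumption here (the archive stripped them from the text above), and 'last_packet_timestamp()' is a hypothetical helper.

```c
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/time.h>

/*
 * Assumption: older linux-headers expose SIOCGSTAMP via <sys/socket.h>,
 * while linux-headers 5.2 and later define it in <linux/sockios.h>.
 */
#ifndef SIOCGSTAMP
# include <linux/sockios.h>
#endif

/* Hypothetical helper: fetch the timestamp of the last received packet. */
int last_packet_timestamp(int sockfd, struct timeval *tv)
{
    return ioctl(sockfd, SIOCGSTAMP, tv);
}
```

The conditional keeps the source buildable against both header generations without requiring a glibc change.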

Thanks!

-- 

  Sergei


[RESEND, PATCH] tty/vt: fix write/write race in ioctl(KDSKBSENT) handler

2019-03-10 Thread Sergei Trofimovich
The bug manifests as an attempt to access deallocated memory:

BUG: unable to handle kernel paging request at 9c8735448000
#PF error: [PROT] [WRITE]
PGD 288a05067 P4D 288a05067 PUD 288a07067 PMD 7f60c2063 PTE 8007f5448161
Oops: 0003 [#1] PREEMPT SMP
CPU: 6 PID: 388 Comm: loadkeys Tainted: G C
5.0.0-rc6-00153-g5ded5871030e #91
Hardware name: Gigabyte Technology Co., Ltd. To be filled by 
O.E.M./H77M-D3H, BIOS F12 11/14/2013
RIP: 0010:__memmove+0x81/0x1a0
Code: 4c 89 4f 10 4c 89 47 18 48 8d 7f 20 73 d4 48 83 c2 20 e9 a2 00 00 00 
66 90 48 89 d1 4c 8b 5c 16 f8 4c 8d 54 17 f8 48 c1 e9 03  48 a5 4d 89 1a e9 
0c 01 00 00 0f 1f 40 00 48 89 d1 4c 8b 1e 49
RSP: 0018:a1b9002d7d08 EFLAGS: 00010203
RAX: 9c873541af43 RBX: 9c873541af43 RCX: 0c6f105cd6bf
RDX: 637882e986b6 RSI: 9c8735447ffb RDI: 9c8735447ffb
RBP: 9c8739cd3800 R08: 9c873b802f00 R09: f73b
R10: b82b35f1 R11: 00505b1b004d5b1b R12: 
R13: 9c873541af3d R14: 000b R15: 000c
FS:  7f450c390580() GS:9c873f18() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 9c8735448000 CR3: 0007e213c002 CR4: 000606e0
Call Trace:
 vt_do_kdgkb_ioctl+0x34d/0x440
 vt_ioctl+0xba3/0x1190
 ? __bpf_prog_run32+0x39/0x60
 ? mem_cgroup_commit_charge+0x7b/0x4e0
 tty_ioctl+0x23f/0x920
 ? preempt_count_sub+0x98/0xe0
 ? __seccomp_filter+0x67/0x600
 do_vfs_ioctl+0xa2/0x6a0
 ? syscall_trace_enter+0x192/0x2d0
 ksys_ioctl+0x3a/0x70
 __x64_sys_ioctl+0x16/0x20
 do_syscall_64+0x54/0xe0
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

The bug manifests on systemd systems with multiple vtcon devices:
  # cat /sys/devices/virtual/vtconsole/vtcon0/name
  (S) dummy device
  # cat /sys/devices/virtual/vtconsole/vtcon1/name
  (M) frame buffer device

There systemd runs the 'loadkeys' tool in parallel for each vtcon
instance. This causes two parallel ioctl(KDSKBSENT) calls to
race into adding the same entry into 'func_table' array at:

drivers/tty/vt/keyboard.c:vt_do_kdgkb_ioctl()

The function has no locking around writes to 'func_table'.

The simplest reproducer is an initramfs with the following
init on an 8-CPU x86_64 machine:

#!/bin/sh

loadkeys -q windowkeys ru4 &
loadkeys -q windowkeys ru4 &
loadkeys -q windowkeys ru4 &
loadkeys -q windowkeys ru4 &

loadkeys -q windowkeys ru4 &
loadkeys -q windowkeys ru4 &
loadkeys -q windowkeys ru4 &
loadkeys -q windowkeys ru4 &
wait

The change adds a lock on the write path only. Reads are still racy.
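
A userspace sketch of the write-side pattern follows. The rest of the hunk is cut off in this archive, so this is an assumption about the shape suggested by the 'again:' label and the 'fnw'/'fnw_sz' variables: take the lock, and if a bigger buffer is needed, drop the lock, allocate, and re-check from the top. All names ('table_lock', 'table_append()' and friends) are hypothetical, and a pthread mutex stands in for the spinlock.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
char *table_buf;   /* shared storage, guarded by table_lock */
size_t table_size; /* capacity of table_buf in bytes */
size_t table_used; /* bytes currently in use */

/* Append 'len' bytes to the shared buffer; returns 0, or -1 on ENOMEM. */
int table_append(const char *data, size_t len)
{
    char *fresh = NULL;    /* spare buffer, allocated outside the lock */
    size_t fresh_size = 0;

    if (len == 0)
        return 0;
again:
    pthread_mutex_lock(&table_lock);
    if (table_used + len > table_size) {
        size_t need = 2 * (table_used + len);

        if (fresh_size < table_used + len) {
            /* Spare missing or too small: allocate unlocked, re-check. */
            pthread_mutex_unlock(&table_lock);
            free(fresh);
            fresh = malloc(need);
            if (!fresh)
                return -1;
            fresh_size = need;
            goto again;
        }
        if (table_used)
            memcpy(fresh, table_buf, table_used);
        free(table_buf);
        table_buf = fresh;
        table_size = fresh_size;
        fresh = NULL;
    }
    memcpy(table_buf + table_used, data, len);
    table_used += len;
    pthread_mutex_unlock(&table_lock);
    free(fresh); /* unused spare if another writer grew the buffer first */
    return 0;
}
```

Every size check and every copy happens with the lock held, so two writers cannot both act on the same stale view of the buffer.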

CC: Greg Kroah-Hartman 
CC: Jiri Slaby 
Link: https://lkml.org/lkml/2019/2/17/256
Signed-off-by: Sergei Trofimovich 
---
 drivers/tty/vt/keyboard.c | 33 +++--
 1 file changed, 27 insertions(+), 6 deletions(-)

diff --git a/drivers/tty/vt/keyboard.c b/drivers/tty/vt/keyboard.c
index 88312c6c92cc..0617e87ab343 100644
--- a/drivers/tty/vt/keyboard.c
+++ b/drivers/tty/vt/keyboard.c
@@ -123,6 +123,7 @@ static const int NR_TYPES = ARRAY_SIZE(max_vals);
 static struct input_handler kbd_handler;
 static DEFINE_SPINLOCK(kbd_event_lock);
 static DEFINE_SPINLOCK(led_lock);
+static DEFINE_SPINLOCK(func_buf_lock); /* guard 'func_buf'  and friends */
 static unsigned long key_down[BITS_TO_LONGS(KEY_CNT)]; /* keyboard key bitmap 
*/
 static unsigned char shift_down[NR_SHIFT]; /* shift state 
counters.. */
 static bool dead_key_next;
@@ -1990,11 +1991,12 @@ int vt_do_kdgkb_ioctl(int cmd, struct kbsentry __user 
*user_kdgkb, int perm)
char *p;
u_char *q;
u_char __user *up;
-   int sz;
+   int sz, fnw_sz;
int delta;
char *first_free, *fj, *fnw;
int i, j, k;
int ret;
+   unsigned long flags;
 
if (!capable(CAP_SYS_TTY_CONFIG))
perm = 0;
@@ -2037,7 +2039,14 @@ int vt_do_kdgkb_ioctl(int cmd, struct kbsentry __user 
*user_kdgkb, int perm)
goto reterr;
}
 
+   fnw = NULL;
+   fnw_sz = 0;
+   /* race against other writers */
+   again:
+   spin_lock_irqsave(&func_buf_lock, flags);
q = func_table[i];
+
+   /* fj pointer to next entry after 'q' */
first_free = funcbufptr + (funcbufsize - funcbufleft);
for (j = i+1; j < MAX_NR_FUNC && !func_table[j]; j++)
;
@@ -2045,10 +2054,12 @@ int vt_do_kdgkb_ioctl(int cmd, struct kbsentry __user 
*user_kdgkb, int perm)
fj = func_table[j];
else
fj = first_free;
-
+   /* buffer usage increase by new entry */
delta = (q ? -strlen(q) : 1) + strlen(kbs->kb_string);
+

[PATCH] tty/vt: fix write/write race in ioctl(KDSKBSENT) handler

2019-02-25 Thread Sergei Trofimovich
The bug manifests as an attempt to access deallocated memory:

BUG: unable to handle kernel paging request at 9c8735448000
#PF error: [PROT] [WRITE]
PGD 288a05067 P4D 288a05067 PUD 288a07067 PMD 7f60c2063 PTE 8007f5448161
Oops: 0003 [#1] PREEMPT SMP
CPU: 6 PID: 388 Comm: loadkeys Tainted: G C
5.0.0-rc6-00153-g5ded5871030e #91
Hardware name: Gigabyte Technology Co., Ltd. To be filled by 
O.E.M./H77M-D3H, BIOS F12 11/14/2013
RIP: 0010:__memmove+0x81/0x1a0
Code: 4c 89 4f 10 4c 89 47 18 48 8d 7f 20 73 d4 48 83 c2 20 e9 a2 00 00 00 
66 90 48 89 d1 4c 8b 5c 16 f8 4c 8d 54 17 f8 48 c1 e9 03  48 a5 4d 89 1a e9 
0c 01 00 00 0f 1f 40 00 48 89 d1 4c 8b 1e 49
RSP: 0018:a1b9002d7d08 EFLAGS: 00010203
RAX: 9c873541af43 RBX: 9c873541af43 RCX: 0c6f105cd6bf
RDX: 637882e986b6 RSI: 9c8735447ffb RDI: 9c8735447ffb
RBP: 9c8739cd3800 R08: 9c873b802f00 R09: f73b
R10: b82b35f1 R11: 00505b1b004d5b1b R12: 
R13: 9c873541af3d R14: 000b R15: 000c
FS:  7f450c390580() GS:9c873f18() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 9c8735448000 CR3: 0007e213c002 CR4: 000606e0
Call Trace:
 vt_do_kdgkb_ioctl+0x34d/0x440
 vt_ioctl+0xba3/0x1190
 ? __bpf_prog_run32+0x39/0x60
 ? mem_cgroup_commit_charge+0x7b/0x4e0
 tty_ioctl+0x23f/0x920
 ? preempt_count_sub+0x98/0xe0
 ? __seccomp_filter+0x67/0x600
 do_vfs_ioctl+0xa2/0x6a0
 ? syscall_trace_enter+0x192/0x2d0
 ksys_ioctl+0x3a/0x70
 __x64_sys_ioctl+0x16/0x20
 do_syscall_64+0x54/0xe0
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

The bug manifests on systemd systems with multiple vtcon devices:
  # cat /sys/devices/virtual/vtconsole/vtcon0/name
  (S) dummy device
  # cat /sys/devices/virtual/vtconsole/vtcon1/name
  (M) frame buffer device

There systemd runs the 'loadkeys' tool in parallel for each vtcon
instance. This causes two parallel ioctl(KDSKBSENT) calls to
race into adding the same entry into 'func_table' array at:

drivers/tty/vt/keyboard.c:vt_do_kdgkb_ioctl()

The function has no locking around writes to 'func_table'.

The simplest reproducer is an initramfs with the following
init on an 8-CPU x86_64 machine:

#!/bin/sh

loadkeys -q windowkeys ru4 &
loadkeys -q windowkeys ru4 &
loadkeys -q windowkeys ru4 &
loadkeys -q windowkeys ru4 &

loadkeys -q windowkeys ru4 &
loadkeys -q windowkeys ru4 &
loadkeys -q windowkeys ru4 &
loadkeys -q windowkeys ru4 &
wait

The change adds a lock on the write path only. Reads are still racy.

CC: Greg Kroah-Hartman 
CC: Jiri Slaby 
Link: https://lkml.org/lkml/2019/2/17/256
Signed-off-by: Sergei Trofimovich 
---
 drivers/tty/vt/keyboard.c | 33 +++--
 1 file changed, 27 insertions(+), 6 deletions(-)

diff --git a/drivers/tty/vt/keyboard.c b/drivers/tty/vt/keyboard.c
index 88312c6c92cc..0617e87ab343 100644
--- a/drivers/tty/vt/keyboard.c
+++ b/drivers/tty/vt/keyboard.c
@@ -123,6 +123,7 @@ static const int NR_TYPES = ARRAY_SIZE(max_vals);
 static struct input_handler kbd_handler;
 static DEFINE_SPINLOCK(kbd_event_lock);
 static DEFINE_SPINLOCK(led_lock);
+static DEFINE_SPINLOCK(func_buf_lock); /* guard 'func_buf'  and friends */
 static unsigned long key_down[BITS_TO_LONGS(KEY_CNT)]; /* keyboard key bitmap 
*/
 static unsigned char shift_down[NR_SHIFT]; /* shift state 
counters.. */
 static bool dead_key_next;
@@ -1990,11 +1991,12 @@ int vt_do_kdgkb_ioctl(int cmd, struct kbsentry __user 
*user_kdgkb, int perm)
char *p;
u_char *q;
u_char __user *up;
-   int sz;
+   int sz, fnw_sz;
int delta;
char *first_free, *fj, *fnw;
int i, j, k;
int ret;
+   unsigned long flags;
 
if (!capable(CAP_SYS_TTY_CONFIG))
perm = 0;
@@ -2037,7 +2039,14 @@ int vt_do_kdgkb_ioctl(int cmd, struct kbsentry __user 
*user_kdgkb, int perm)
goto reterr;
}
 
+   fnw = NULL;
+   fnw_sz = 0;
+   /* race against other writers */
+   again:
+   spin_lock_irqsave(&func_buf_lock, flags);
q = func_table[i];
+
+   /* fj pointer to next entry after 'q' */
first_free = funcbufptr + (funcbufsize - funcbufleft);
for (j = i+1; j < MAX_NR_FUNC && !func_table[j]; j++)
;
@@ -2045,10 +2054,12 @@ int vt_do_kdgkb_ioctl(int cmd, struct kbsentry __user 
*user_kdgkb, int perm)
fj = func_table[j];
else
fj = first_free;
-
+   /* buffer usage increase by new entry */
delta = (q ? -strlen(q) : 1) + strlen(kbs->kb_string);
+

Re: 5.0.0-rc6+: Oops at boot: RIP: 0010:__memmove+0x81/0x1a0 / vt_do_kdgkb_ioctl+0x34d/0x440 (race at reenter?)

2019-02-24 Thread Sergei Trofimovich
On Mon, 18 Feb 2019 09:38:10 +0100
Greg Kroah-Hartman  wrote:

> On Sun, Feb 17, 2019 at 11:39:57PM +0000, Sergei Trofimovich wrote:
> > [ Copying as is from https://bugzilla.kernel.org/show_bug.cgi?id=202605
> >   and sending to LKML. Greg, Jiri, can you clarify mailing
> >   list in MAINTAINERS as well?
> >   https://github.com/torvalds/linux/blob/master/MAINTAINERS#L15527
> >   mentions no list for tty/vt/. ]
> > 
> > Kernel Oops
> >   [   38.739241] Oops: 0003 [#1] PREEMPT SMP
> >   [   38.739243] CPU: 6 PID: 388 Comm: loadkeys Tainted: G C
> > 5.0.0-rc6-00153-g5ded5871030e #91
> >   [   38.739244] Hardware name: Gigabyte Technology Co., Ltd. To be filled 
> > by O.E.M./H77M-D3H, BIOS F12 11/14/2013
> >   [   38.739249] RIP: 0010:__memmove+0x81/0x1a0
> > happens on a fresh vanilla master kernel roughly at boot
> > (before tty login prompt):
> >   $ uname -r
> >   5.0.0-rc6-00153-g5ded5871030e
> > 
> > The kernel page fault happens at 'loadkeys start'.
> > I suspect some kind of race at reenter of vt_do_kdgkb_ioctl(KDSKBSENT):
> > 
> > https://github.com/torvalds/linux/blob/master/drivers/tty/vt/keyboard.c#L1986
> > 
> > The oops trace looks similar to the following reports (no details besides 
> > Oops)
> > https://bugzilla.kernel.org/show_bug.cgi?id=194589
> > https://bugzilla.kernel.org/show_bug.cgi?id=202111
> > 
> > [   38.044921] IPv6: ADDRCONF(NETDEV_CHANGE): br0: link becomes ready
> > [   38.533196] usb 1-1.2: r8712u: CustomerID = 0x
> > [   38.533200] usb 1-1.2: r8712u: MAC Address from efuse = 00:0d:81:a9:09:90
> > [   38.533203] usb 1-1.2: r8712u: Loading firmware from 
> > "rtlwifi/rtl8712u.bin"
> > [   38.51] usbcore: registered new interface driver r8712u
> > [   38.736178] BUG: unable to handle kernel paging request at 
> > 9c8735448000
> > [   38.737215] #PF error: [PROT] [WRITE]
> > [   38.737216] PGD 288a05067 P4D 288a05067 PUD 288a07067 PMD 7f60c2063 PTE 
> > 8007f5448161
> > [   38.739241] Oops: 0003 [#1] PREEMPT SMP
> > [   38.739243] CPU: 6 PID: 388 Comm: loadkeys Tainted: G C
> > 5.0.0-rc6-00153-g5ded5871030e #91
> > [   38.739244] Hardware name: Gigabyte Technology Co., Ltd. To be filled by 
> > O.E.M./H77M-D3H, BIOS F12 11/14/2013
> > [   38.739249] RIP: 0010:__memmove+0x81/0x1a0
> > [   38.739251] Code: 4c 89 4f 10 4c 89 47 18 48 8d 7f 20 73 d4 48 83 c2 20 
> > e9 a2 00 00 00 66 90 48 89 d1 4c 8b 5c 16 f8 4c 8d 54 17 f8 48 c1 e9 03 
> >  48 a5 4d 89 1a e9 0c 01 00 00 0f 1f 40 00 48 89 d1 4c 8b 1e 49
> > [   38.739252] RSP: 0018:a1b9002d7d08 EFLAGS: 00010203
> > [   38.745857] RAX: 9c873541af43 RBX: 9c873541af43 RCX: 
> > 0c6f105cd6bf
> > [   38.745858] RDX: 637882e986b6 RSI: 9c8735447ffb RDI: 
> > 9c8735447ffb
> > [   38.745859] RBP: 9c8739cd3800 R08: 9c873b802f00 R09: 
> > f73b
> > [   38.745860] R10: b82b35f1 R11: 00505b1b004d5b1b R12: 
> > 
> > [   38.745861] R13: 9c873541af3d R14: 000b R15: 
> > 000c
> > [   38.745862] FS:  7f450c390580() GS:9c873f18() 
> > knlGS:
> > [   38.745863] CS:  0010 DS:  ES:  CR0: 80050033
> > [   38.745864] CR2: 9c8735448000 CR3: 0007e213c002 CR4: 
> > 000606e0
> > [   38.745865] Call Trace:
> > [   38.745871]  vt_do_kdgkb_ioctl+0x34d/0x440
> > [   38.745875]  vt_ioctl+0xba3/0x1190
> > [   38.745879]  ? __bpf_prog_run32+0x39/0x60
> > [   38.745882]  ? mem_cgroup_commit_charge+0x7b/0x4e0
> > [   38.762583]  tty_ioctl+0x23f/0x920
> > [   38.762586]  ? preempt_count_sub+0x98/0xe0
> > [   38.762590]  ? __seccomp_filter+0x67/0x600
> > [   38.762594]  do_vfs_ioctl+0xa2/0x6a0
> > [   38.762597]  ? syscall_trace_enter+0x192/0x2d0
> > [   38.762599]  ksys_ioctl+0x3a/0x70
> > [   38.762601]  __x64_sys_ioctl+0x16/0x20
> > [   38.762604]  do_syscall_64+0x54/0xe0
> > [   38.772513]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > [   38.772515] RIP: 0033:0x7f450c2bb427
> > [   38.772517] Code: 00 00 00 75 0c 48 c7 c0 ff ff ff ff 48 83 c4 18 c3 e8 
> > 8d d2 01 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 b8 10 00 00 00 0f 05 
> > <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 39 da 0c 00 f7 d8 64 89 01 48
> > [   38.772518] RSP: 002b:7fffbcedd348 EFLAGS: 0246 ORIG_RAX: 
> > 0010
> > [   38.772519] RAX: ffda RBX: 000b RCX: 
> > 7f450c2bb427
> > [   38.772520] RDX: 0

5.0.0-rc6+: Oops at boot: RIP: 0010:__memmove+0x81/0x1a0 / vt_do_kdgkb_ioctl+0x34d/0x440 (race at reenter?)

2019-02-17 Thread Sergei Trofimovich
[ Copying as is from https://bugzilla.kernel.org/show_bug.cgi?id=202605
  and sending to LKML. Greg, Jiri, can you clarify mailing
  list in MAINTAINERS as well?
  https://github.com/torvalds/linux/blob/master/MAINTAINERS#L15527
  mentions no list for tty/vt/. ]

Kernel Oops
  [   38.739241] Oops: 0003 [#1] PREEMPT SMP
  [   38.739243] CPU: 6 PID: 388 Comm: loadkeys Tainted: G C
5.0.0-rc6-00153-g5ded5871030e #91
  [   38.739244] Hardware name: Gigabyte Technology Co., Ltd. To be filled by 
O.E.M./H77M-D3H, BIOS F12 11/14/2013
  [   38.739249] RIP: 0010:__memmove+0x81/0x1a0
happens on a fresh vanilla master kernel roughly at boot
(before tty login prompt):
  $ uname -r
  5.0.0-rc6-00153-g5ded5871030e

The kernel page fault happens at 'loadkeys start'.
I suspect some kind of race at reenter of vt_do_kdgkb_ioctl(KDSKBSENT):

https://github.com/torvalds/linux/blob/master/drivers/tty/vt/keyboard.c#L1986

The oops trace looks similar to the following reports (no details besides Oops)
https://bugzilla.kernel.org/show_bug.cgi?id=194589
https://bugzilla.kernel.org/show_bug.cgi?id=202111

[   38.044921] IPv6: ADDRCONF(NETDEV_CHANGE): br0: link becomes ready
[   38.533196] usb 1-1.2: r8712u: CustomerID = 0x
[   38.533200] usb 1-1.2: r8712u: MAC Address from efuse = 00:0d:81:a9:09:90
[   38.533203] usb 1-1.2: r8712u: Loading firmware from "rtlwifi/rtl8712u.bin"
[   38.51] usbcore: registered new interface driver r8712u
[   38.736178] BUG: unable to handle kernel paging request at 9c8735448000
[   38.737215] #PF error: [PROT] [WRITE]
[   38.737216] PGD 288a05067 P4D 288a05067 PUD 288a07067 PMD 7f60c2063 PTE 
8007f5448161
[   38.739241] Oops: 0003 [#1] PREEMPT SMP
[   38.739243] CPU: 6 PID: 388 Comm: loadkeys Tainted: G C
5.0.0-rc6-00153-g5ded5871030e #91
[   38.739244] Hardware name: Gigabyte Technology Co., Ltd. To be filled by 
O.E.M./H77M-D3H, BIOS F12 11/14/2013
[   38.739249] RIP: 0010:__memmove+0x81/0x1a0
[   38.739251] Code: 4c 89 4f 10 4c 89 47 18 48 8d 7f 20 73 d4 48 83 c2 20 e9 
a2 00 00 00 66 90 48 89 d1 4c 8b 5c 16 f8 4c 8d 54 17 f8 48 c1 e9 03  48 a5 
4d 89 1a e9 0c 01 00 00 0f 1f 40 00 48 89 d1 4c 8b 1e 49
[   38.739252] RSP: 0018:a1b9002d7d08 EFLAGS: 00010203
[   38.745857] RAX: 9c873541af43 RBX: 9c873541af43 RCX: 0c6f105cd6bf
[   38.745858] RDX: 637882e986b6 RSI: 9c8735447ffb RDI: 9c8735447ffb
[   38.745859] RBP: 9c8739cd3800 R08: 9c873b802f00 R09: f73b
[   38.745860] R10: b82b35f1 R11: 00505b1b004d5b1b R12: 
[   38.745861] R13: 9c873541af3d R14: 000b R15: 000c
[   38.745862] FS:  7f450c390580() GS:9c873f18() 
knlGS:
[   38.745863] CS:  0010 DS:  ES:  CR0: 80050033
[   38.745864] CR2: 9c8735448000 CR3: 0007e213c002 CR4: 000606e0
[   38.745865] Call Trace:
[   38.745871]  vt_do_kdgkb_ioctl+0x34d/0x440
[   38.745875]  vt_ioctl+0xba3/0x1190
[   38.745879]  ? __bpf_prog_run32+0x39/0x60
[   38.745882]  ? mem_cgroup_commit_charge+0x7b/0x4e0
[   38.762583]  tty_ioctl+0x23f/0x920
[   38.762586]  ? preempt_count_sub+0x98/0xe0
[   38.762590]  ? __seccomp_filter+0x67/0x600
[   38.762594]  do_vfs_ioctl+0xa2/0x6a0
[   38.762597]  ? syscall_trace_enter+0x192/0x2d0
[   38.762599]  ksys_ioctl+0x3a/0x70
[   38.762601]  __x64_sys_ioctl+0x16/0x20
[   38.762604]  do_syscall_64+0x54/0xe0
[   38.772513]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   38.772515] RIP: 0033:0x7f450c2bb427
[   38.772517] Code: 00 00 00 75 0c 48 c7 c0 ff ff ff ff 48 83 c4 18 c3 e8 8d 
d2 01 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 b8 10 00 00 00 0f 05 <48> 3d 01 
f0 ff ff 73 01 c3 48 8b 0d 39 da 0c 00 f7 d8 64 89 01 48
[   38.772518] RSP: 002b:7fffbcedd348 EFLAGS: 0246 ORIG_RAX: 
0010
[   38.772519] RAX: ffda RBX: 000b RCX: 7f450c2bb427
[   38.772520] RDX: 7fffbcedd360 RSI: 4b49 RDI: 0003
[   38.772521] RBP: 7fffbcedd361 R08: 7f450c389c40 R09: 55cbef2494a0
[   38.772522] R10:  R11: 0246 R12: 55cbef2412b0
[   38.772522] R13: 7fffbcedd360 R14: 000b R15: 0003
[   38.772525] Modules linked in: snd_hda_codec_hdmi bridge r8712u(C) stp llc 
snd_hda_codec_via snd_hda_codec_generic snd_hda_intel snd_hda_codec 
x86_pkg_temp_thermal dummy kvm_intel snd_hwdep snd_hda_core snd_pcm snd_timer 
kvm snd atl1c soundcore irqbypass xfs tun nf_conntrack_ftp nf_conntrack 
nf_defrag_ipv6 nf_defrag_ipv4 loop fuse binfmt_misc ipv6
[   38.779196] r8712u 1-1.2:1.0 wl0: renamed from wlan0
[   38.779240] CR2: 9c8735448000
[   38.790894] ---[ end trace 8116e48ba19076a0 ]---
[   38.790897] RIP: 0010:__memmove+0x81/0x1a0
[   38.790898] Code: 4c 89 4f 10 4c 89 47 18 48 8d 7f 20 73 d4 48 83 c2 20 e9 
a2 00 00 00 66 90 48 89 d1 4c 8b 5c 16 f8 4c 8d 54 17 f8 48 c1 e9 03  48 a5 
4d 89 1a e9 0c 01 00 00 0f 

[PATCH v2] alpha: fix page fault handling for r16-r18 targets

2018-12-31 Thread Sergei Trofimovich
Fix the page fault handling code to fix up the r16-r18 registers.
Before the patch, the code had an off-by-two register index bug.
This bug caused overwriting of ps,pc,gp registers instead
of fixing intended r16,r17,r18 (see `struct pt_regs`).

More details:

Initially Dmitry noticed the kernel bug as a failure
in the strace test suite. The test passes an unmapped userspace
pointer to io_submit:

```c
#include <err.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/syscall.h>
int main(void)
{
    unsigned long ctx = 0;
    if (syscall(__NR_io_setup, 1, &ctx))
        err(1, "io_setup");
    const size_t page_size = sysconf(_SC_PAGESIZE);
    const size_t size = page_size * 2;
    void *ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (MAP_FAILED == ptr)
        err(1, "mmap(%zu)", size);
    if (munmap(ptr, size))
        err(1, "munmap");
    /* ptr now points at unmapped memory */
    syscall(__NR_io_submit, ctx, 1, ptr + page_size);
    syscall(__NR_io_destroy, ctx);
    return 0;
}
```

Running this test causes the kernel to crash when handling the page fault:

```
Unable to handle kernel paging request at virtual address 9468
CPU 3
aio(26027): Oops 0
pc = []  ra = []  ps = Not 
tainted
pc is at sys_io_submit+0x108/0x200
ra is at sys_io_submit+0x6c/0x200
v0 = fc00c58e6300  t0 = fff2  t1 = 0225e000
t2 = fc01f159fef8  t3 = fc0001009640  t4 = fce0f6e0
t5 = 020001002e9e  t6 = 4c41564e49452031  t7 = fc01f159c000
s0 = 0002  s1 = 0225e000  s2 = 
s3 =   s4 =   s5 = fff2
s6 = fc00c58e6300
a0 = fc00c58e6300  a1 =   a2 = 0225e000
a3 = 021ac260  a4 = 021ac1e8  a5 = 0001
t8 = 0008  t9 = 00011f8bce30  t10= 021ac440
t11=   pv = fc6fd320  at = 
gp =   sp = 265fd174
Disabling lock debugging due to kernel taint
Trace:
[] entSys+0xa4/0xc0
```

Here `gp` has an invalid value. `gp` is overwritten by a fixup for the
following page fault in the `io_submit` syscall handler:

```
__se_sys_io_submit
...
ldq a1,0(t1)
bne t0,4280 <__se_sys_io_submit+0x180>
```

After a page fault `t0` should contain -EFAULT and `a1` should be 0.
Instead `gp` was overwritten in place of `a1`.

This happens due to an off-by-two bug in `dpf_reg()` for `r16-r18`
(aka `a0-a2`).

I think the bug went unnoticed for a long time as `gp` is one
of the scratch registers. Any kernel function call would re-calculate `gp`.

Dmitry tracked down the bug origin back to 2.1.32 kernel version
where trap_a{0,1,2} fields were inserted into struct pt_regs.
And even before that `dpf_reg()` contained off-by-one error.
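
The index arithmetic can be checked in isolation. The sketch below assumes the field order of alpha's `struct pt_regs` as I understand it (r0-r8, r19-r28, hae, trap_a0-trap_a2, ps, pc, gp, r16-r18); the struct and function names are hypothetical mirrors, not the kernel definitions.

```c
#include <stddef.h>

/* Assumed field order of alpha's struct pt_regs (illustrative mirror). */
struct alpha_pt_regs_sketch {
    unsigned long r0, r1, r2, r3, r4, r5, r6, r7, r8;
    unsigned long r19, r20, r21, r22, r23, r24, r25, r26, r27, r28;
    unsigned long hae;
    unsigned long trap_a0, trap_a1, trap_a2;
    unsigned long ps, pc, gp;
    unsigned long r16, r17, r18;
};

/* Slot index of a named field, in units of unsigned long. */
#define SLOT(field) \
    (offsetof(struct alpha_pt_regs_sketch, field) / sizeof(unsigned long))

/* Index computed by dpf_reg() before the patch. */
long dpf_index_old(int r)
{
    return r <= 8 ? r : r <= 15 ? r - 16 : r <= 18 ? r + 8 : r - 10;
}

/* Index computed by dpf_reg() after the patch. */
long dpf_index_new(int r)
{
    return r <= 8 ? r : r <= 15 ? r - 16 : r <= 18 ? r + 10 : r - 10;
}
```

Under that layout `dpf_index_old(17)` lands on the `gp` slot, matching the corrupted `gp` in the oops above, while `dpf_index_new(17)` lands on `r17`.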

Cc: Richard Henderson 
Cc: Ivan Kokshaysky 
Cc: Matt Turner 
Cc: linux-al...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reported-and-reviewed-by: "Dmitry V. Levin" 
Cc: sta...@vger.kernel.org # v2.1.32+
Bug: https://bugs.gentoo.org/672040
Signed-off-by: Sergei Trofimovich 
---
Changes since V1:
- expanded bug origin tracked down by Dmitry
- added Dmitry's proper email and Reviewed-by tags

 arch/alpha/mm/fault.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/alpha/mm/fault.c b/arch/alpha/mm/fault.c
index d73dc473fbb9..188fc9256baf 100644
--- a/arch/alpha/mm/fault.c
+++ b/arch/alpha/mm/fault.c
@@ -78,7 +78,7 @@ __load_new_mm_context(struct mm_struct *next_mm)
 /* Macro for exception fixup code to access integer registers.  */
 #define dpf_reg(r) \
(((unsigned long *)regs)[(r) <= 8 ? (r) : (r) <= 15 ? (r)-16 :  \
-(r) <= 18 ? (r)+8 : (r)-10])
+(r) <= 18 ? (r)+10 : (r)-10])
 
 asmlinkage void
 do_page_fault(unsigned long address, unsigned long mmcsr,
-- 
2.20.1



[PATCH] alpha: fix page fault handling for r16-r18 targets

2018-12-30 Thread Sergei Trofimovich
Fix the page fault handling code to fix up the r16-r18 registers.
Before the patch, the code had an off-by-two register index bug.
This bug caused overwriting of ps,pc,gp registers instead
of fixing intended r16,r17,r18 (see `struct pt_regs`).

More details:

Initially Dmitry noticed the kernel bug as a failure
in the strace test suite. The test passes an unmapped userspace
pointer to io_submit:

```c
#include <err.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/syscall.h>
int main(void)
{
    unsigned long ctx = 0;
    if (syscall(__NR_io_setup, 1, &ctx))
        err(1, "io_setup");
    const size_t page_size = sysconf(_SC_PAGESIZE);
    const size_t size = page_size * 2;
    void *ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (MAP_FAILED == ptr)
        err(1, "mmap(%zu)", size);
    if (munmap(ptr, size))
        err(1, "munmap");
    /* ptr now points at unmapped memory */
    syscall(__NR_io_submit, ctx, 1, ptr + page_size);
    syscall(__NR_io_destroy, ctx);
    return 0;
}
```

Running this test causes the kernel to crash when handling the page fault:

```
Unable to handle kernel paging request at virtual address 9468
CPU 3
aio(26027): Oops 0
pc = []  ra = []  ps = Not tainted
pc is at sys_io_submit+0x108/0x200
ra is at sys_io_submit+0x6c/0x200
v0 = fc00c58e6300  t0 = fff2  t1 = 0225e000
t2 = fc01f159fef8  t3 = fc0001009640  t4 = fce0f6e0
t5 = 020001002e9e  t6 = 4c41564e49452031  t7 = fc01f159c000
s0 = 0002  s1 = 0225e000  s2 = 
s3 =   s4 =   s5 = fff2
s6 = fc00c58e6300
a0 = fc00c58e6300  a1 =   a2 = 0225e000
a3 = 021ac260  a4 = 021ac1e8  a5 = 0001
t8 = 0008  t9 = 00011f8bce30  t10= 021ac440
t11=   pv = fc6fd320  at = 
gp =   sp = 265fd174
Disabling lock debugging due to kernel taint
Trace:
[] entSys+0xa4/0xc0
```

Here `gp` has an invalid value. `gp` is overwritten by the fixup for the
following faulting load in the `io_submit` syscall handler:

```
__se_sys_io_submit
...
ldq a1,0(t1)
bne t0,4280 <__se_sys_io_submit+0x180>
```

After the page fault `t0` should contain -EFAULT and `a1` should be 0.
Instead `gp` was overwritten in place of `a1`.

This happens due to an off-by-two bug in `dpf_reg()` for `r16-r18`
(aka `a0-a2`).

I think the bug went unnoticed for a long time because `gp` is one
of the scratch registers: any kernel function call recalculates `gp`.
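
The effect of the off-by-two index can be sketched outside the kernel.
The snippet below is only an illustration: `struct pt_regs_sketch` is an
assumed mirror of the alpha `struct pt_regs` field order, and `old_index()`
and `new_index()` are hypothetical helper names that restate the
`dpf_reg()` index arithmetic before and after the patch.

```c
#include <stddef.h>

/* Assumed mirror of the alpha 'struct pt_regs' field order (illustration). */
struct pt_regs_sketch {
    unsigned long r0, r1, r2, r3, r4, r5, r6, r7, r8;
    unsigned long r19, r20, r21, r22, r23, r24, r25, r26, r27, r28;
    unsigned long hae;
    unsigned long trap_a0, trap_a1, trap_a2;
    unsigned long ps, pc, gp;
    unsigned long r16, r17, r18;
};

/* Index of a field when the struct is viewed as an array of unsigned long. */
#define SLOT(field) \
    ((long)(offsetof(struct pt_regs_sketch, field) / sizeof(unsigned long)))

/* dpf_reg() index arithmetic before the patch (hypothetical helper name). */
static long old_index(long r)
{
    return r <= 8 ? r : r <= 15 ? r - 16 : r <= 18 ? r + 8 : r - 10;
}

/* dpf_reg() index arithmetic after the patch (hypothetical helper name). */
static long new_index(long r)
{
    return r <= 8 ? r : r <= 15 ? r - 16 : r <= 18 ? r + 10 : r - 10;
}
```

Under that assumed layout `old_index(17)` lands on `SLOT(gp)` while
`new_index(17)` lands on `SLOT(r17)`, which is consistent with the `gp`
corruption seen in the oops.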

CC: Dmitry V. Levin 
CC: Richard Henderson 
CC: Ivan Kokshaysky 
CC: Matt Turner 
CC: linux-al...@vger.kernel.org
CC: linux-kernel@vger.kernel.org
Reported-by: Dmitry V. Levin
Bug: https://bugs.gentoo.org/672040
Signed-off-by: Sergei Trofimovich 
---
 arch/alpha/mm/fault.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/alpha/mm/fault.c b/arch/alpha/mm/fault.c
index d73dc473fbb9..188fc9256baf 100644
--- a/arch/alpha/mm/fault.c
+++ b/arch/alpha/mm/fault.c
@@ -78,7 +78,7 @@ __load_new_mm_context(struct mm_struct *next_mm)
 /* Macro for exception fixup code to access integer registers.  */
 #define dpf_reg(r) \
(((unsigned long *)regs)[(r) <= 8 ? (r) : (r) <= 15 ? (r)-16 :  \
-(r) <= 18 ? (r)+8 : (r)-10])
+(r) <= 18 ? (r)+10 : (r)-10])
 
 asmlinkage void
 do_page_fault(unsigned long address, unsigned long mmcsr,
-- 
2.20.1



Re: [PATCH] ia64: enable GENERIC_HWEIGHT

2018-10-15 Thread Sergei Trofimovich
On Fri, 14 Sep 2018 08:06:46 +0100
Sergei Trofimovich  wrote:

> Noticed on a single driver failure:
>   ERROR: "__sw_hweight8" [drivers/net/wireless/mediatek/mt76/mt76.ko] undefined!
> 
> CC: Tony Luck 
> CC: Fenghua Yu 
> CC: linux-i...@vger.kernel.org
> CC: Andrew Morton 
> CC: linux-kernel@vger.kernel.org
> Signed-off-by: Sergei Trofimovich 
> ---
>  arch/ia64/Kconfig | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
> index 8b4a0c1748c0..1a71f92f0b8e 100644
> --- a/arch/ia64/Kconfig
> +++ b/arch/ia64/Kconfig
> @@ -576,3 +576,7 @@ config MSPEC
> If you have an ia64 and you want to enable memory special
> operations support (formerly known as fetchop), say Y here,
> otherwise say N.
> +
> +config GENERIC_HWEIGHT
> + bool
> + default y
> -- 
> 2.19.0
> 

Ping.

-- 

  Sergei


Re: [PATCH] ia64: disable SCHED_STACK_END_CHECK

2018-10-15 Thread Sergei Trofimovich
On Fri, 14 Sep 2018 08:06:17 +0100
Sergei Trofimovich  wrote:

> SCHED_STACK_END_CHECK assumes that the stack grows in one direction.
> ia64 is a rare case where it does not.
> 
> As a result the kernel fails at startup with:
>   Kernel panic - not syncing: corrupted stack end detected inside scheduler
> 
> The check does not find a real problem: the register backing store
> is simply written on top of the canary value.
> 
> Disable SCHED_STACK_END_CHECK on ia64 as there is no good
> place for the canary without moving the initial stack address.
> 
> CC: Tony Luck 
> CC: Fenghua Yu 
> CC: linux-i...@vger.kernel.org
> CC: Andrew Morton 
> CC: linux-kernel@vger.kernel.org
> Signed-off-by: Sergei Trofimovich 
> ---
>  lib/Kconfig.debug | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 4966c4fbe7f7..a097dfe38d2b 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -1004,7 +1004,7 @@ config SCHEDSTATS
>  
>  config SCHED_STACK_END_CHECK
>   bool "Detect stack corruption on calls to schedule()"
> - depends on DEBUG_KERNEL
> + depends on DEBUG_KERNEL && !IA64
>   default n
>   help
> This option checks for a stack overrun on calls to schedule().
> -- 
> 2.19.0
> 

Ping.

-- 

  Sergei


[PATCH] ia64: enable GENERIC_HWEIGHT

2018-09-14 Thread Sergei Trofimovich
Noticed on a single driver failure:
  ERROR: "__sw_hweight8" [drivers/net/wireless/mediatek/mt76/mt76.ko] undefined!
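
For context, `__sw_hweight8` is a software population-count helper for
8-bit values, and the build fails because nothing provides that symbol.
A minimal sketch of such a helper follows. `hweight8_sketch` is a
hypothetical name and the body is the classic bit-parallel popcount; it is
not claimed to be the kernel's generic implementation.

```c
/* Count the set bits in the low 8 bits of w (hypothetical helper name). */
static unsigned int hweight8_sketch(unsigned int w)
{
    unsigned int res = w & 0xff;

    res = res - ((res >> 1) & 0x55);          /* sums of adjacent bit pairs */
    res = (res & 0x33) + ((res >> 2) & 0x33); /* sums per 4-bit nibble */
    return (res + (res >> 4)) & 0x0f;         /* sum of both nibbles */
}
```

For example `hweight8_sketch(0xa5)` counts the four set bits of 10100101.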

CC: Tony Luck 
CC: Fenghua Yu 
CC: linux-i...@vger.kernel.org
CC: Andrew Morton 
CC: linux-kernel@vger.kernel.org
Signed-off-by: Sergei Trofimovich 
---
 arch/ia64/Kconfig | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index 8b4a0c1748c0..1a71f92f0b8e 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -576,3 +576,7 @@ config MSPEC
  If you have an ia64 and you want to enable memory special
  operations support (formerly known as fetchop), say Y here,
  otherwise say N.
+
+config GENERIC_HWEIGHT
+   bool
+   default y
-- 
2.19.0



[PATCH] ia64: disable SCHED_STACK_END_CHECK

2018-09-14 Thread Sergei Trofimovich
SCHED_STACK_END_CHECK assumes that the stack grows in one direction.
ia64 is a rare case where it does not.

As a result the kernel fails at startup with:
  Kernel panic - not syncing: corrupted stack end detected inside scheduler

The check does not find a real problem: the register backing store
is simply written on top of the canary value.

Disable SCHED_STACK_END_CHECK on ia64 as there is no good
place for the canary without moving the initial stack address.
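
Why the canary cannot survive can be illustrated with a toy model. The
sketch below is not kernel code: the area size, the `toy_stack` type, the
`CANARY` marker value and all helper names are assumptions made for
illustration. It models one memory area in which the memory stack grows
down from the top and the register backing store grows up from the bottom,
with the canary at the low end where a grows-down-only check expects it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_WORDS 64            /* assumed size of the toy stack area */
#define CANARY    0x57AC6E9DUL  /* arbitrary marker value for the sketch */

struct toy_stack {
    unsigned long words[TOY_WORDS];
    size_t sp;  /* memory stack: grows down from the top */
    size_t bsp; /* register backing store: grows up from the bottom */
};

static void toy_init(struct toy_stack *s)
{
    s->sp = TOY_WORDS;
    s->bsp = 0;
    s->words[0] = CANARY; /* canary sits at the lowest address */
}

/* Ordinary stack push: moves away from the top, towards the canary. */
static void toy_push(struct toy_stack *s, unsigned long v)
{
    assert(s->sp > s->bsp);
    s->words[--s->sp] = v;
}

/* Register spill: written at the bottom, right where the canary lives. */
static void toy_spill(struct toy_stack *s, unsigned long v)
{
    assert(s->bsp < s->sp);
    s->words[s->bsp++] = v;
}

static bool toy_canary_intact(const struct toy_stack *s)
{
    return s->words[0] == CANARY;
}
```

In this model a single `toy_spill()` is enough to destroy the canary even
though neither stack has overflowed.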

CC: Tony Luck 
CC: Fenghua Yu 
CC: linux-i...@vger.kernel.org
CC: Andrew Morton 
CC: linux-kernel@vger.kernel.org
Signed-off-by: Sergei Trofimovich 
---
 lib/Kconfig.debug | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 4966c4fbe7f7..a097dfe38d2b 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1004,7 +1004,7 @@ config SCHEDSTATS
 
 config SCHED_STACK_END_CHECK
bool "Detect stack corruption on calls to schedule()"
-   depends on DEBUG_KERNEL
+   depends on DEBUG_KERNEL && !IA64
default n
help
  This option checks for a stack overrun on calls to schedule().
-- 
2.19.0



Re: [PATCH v2, simpler] ia64: fix ptrace(PTRACE_GETREGS) (unbreaks strace, gdb)

2018-08-04 Thread Sergei Trofimovich
On Fri,  9 Mar 2018 23:15:55 +
Sergei Trofimovich  wrote:

I tried to explain the breakage mechanics of the unwinder
and the gcc code generation quirks in more detail at:

https://trofi.github.io/posts/210-ptrace-and-accidental-boot-fix-on-ia64.html
Hopefully it gives a better intuition for the code change
made by both proposed patches.

I personally think the v1 patch is slightly more robust.

-- 

  Sergei


Re: x86_64: movdqu rarely stores bad data (movdqu works fine). Kernel bug, fried CPU or glibc bug?

2018-06-17 Thread Sergei Trofimovich
On Sat, 16 Jun 2018 22:22:50 +0100
Sergei Trofimovich  wrote:

> TL;DR: on master string/test-memmove glibc test fails on my machine
> and I don't know why. Other tests work fine.
> ...
> This fails:
>   loop {
>     movdqu [src++],%xmm0
>     movntdq %xmm0,[dst++]
>   }
>   sfence
> This works:
>   loop {
>     movdqu [src++],%xmm0
>     movdqu %xmm0,[dst++]
>   }
>   sfence
> ...
> If there are no obvious problems with glibc's memmove() or my small test,
> what can I do to rule out or pin down a hardware or kernel problem?

Found the cause: bad RAM module.

After I tweaked the test to allocate most of the available physical RAM
I got a fully reproducible failure.

I unplugged the RAM modules one by one and reran the test. That way I
narrowed it down to one bad chip. Removing the single bad chip restored
the string/test-memmove test on this machine \o/
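
The "fill a large buffer and verify it" approach can be sketched as
follows. This is an assumed reconstruction, not the actual test:
`fill_pattern` and `first_mismatch` are hypothetical helpers, and the
pattern (each 32-bit word stores its own index) is borrowed from the error
messages in the original report, where `expected` equals `offset`.

```c
#include <stddef.h>
#include <stdint.h>

/* Store each word's own index so any corrupted word is self-describing. */
static void fill_pattern(uint32_t *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = (uint32_t)i;
}

/* Return the index of the first corrupted word, or n if all words match. */
static size_t first_mismatch(const uint32_t *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (buf[i] != (uint32_t)i)
            return i;
    return n;
}
```

In a real run the buffer would cover as much physical RAM as can be
allocated, and the data would be moved with the memmove variant under test
between the fill and the check.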

Sorry for the noise!

-- 

  Sergei


pgp2Q0GGYjjHI.pgp
Description: OpenPGP digital signature


x86_64: movdqu rarely stores bad data (movdqu works fine). Kernel bug, fried CPU or glibc bug?

2018-06-16 Thread Sergei Trofimovich
TL;DR: on master string/test-memmove glibc test fails on my machine
and I don't know why. Other tests work fine.

$ elf/ld.so --inhibit-cache --library-path . string/test-memmove
simple_memmove  __memmove_ssse3_rep  __memmove_ssse3  __memmove_sse2_unaligned  __memmove_ia32
string/test-memmove: Wrong result in function __memmove_sse2_unaligned dst "0x7084" src "0x7000" offset "43297733"

https://sourceware.org/git/?p=glibc.git;a=blob;f=string/test-memmove.c;h=64e3651ba40604e47ddf6d633f4d0aea4644f60a;hb=HEAD

Long story:

I've trimmed __memmove_sse2_unaligned implementation down to
test-memmove-xmm-unaligned.c (attached). It's supposed to show
failed memmove attempts when those happen:

$ gcc -ggdb3 -O2 -m32 test-memmove-xmm-unaligned.c -o test-memmove-xmm-unaligned -Wall && ./test-memmove-xmm-unaligned
Bad result in memmove(dst=0xe7d44110, src=0xe7d44010, len=134217728): offset= 3786689; expected=0039C7C1( 3786689) actual=0039C7C3( 3786691) bit_mismatch=0002; iteration=1
Bad result in memmove(dst=0xe7d44110, src=0xe7d44010, len=134217728): offset= 3786689; expected=0039C7C1( 3786689) actual=0039C7C3( 3786691) bit_mismatch=0002; iteration=3
Bad result in memmove(dst=0xe7d44110, src=0xe7d44010, len=134217728): offset= 5448641; expected=005323C1( 5448641) actual=005323C3( 5448643) bit_mismatch=0002; iteration=5
Bad result in memmove(dst=0xe7d44110, src=0xe7d44010, len=134217728): offset=29022145; expected=01BAD7C1(29022145) actual=01BAD7C3(29022147) bit_mismatch=0002; iteration=9

$ gcc -ggdb3 -O2 -m64 test-memmove-xmm-unaligned.c -o test-memmove-xmm-unaligned -Wall && ./test-memmove-xmm-unaligned
Bad result in memmove(dst=0x7fa4658bf110, src=0x7fa4658bf010, len=134217728): offset=25257857; expected=01816781(25257857) actual=01816783(25257859) bit_mismatch=0002; iteration=43
Bad result in memmove(dst=0x7fa4658bf110, src=0x7fa4658bf010, len=134217728): offset=28109697; expected=01ACEB81(28109697) actual=01ACEB83(28109699) bit_mismatch=0002; iteration=112
Bad result in memmove(dst=0x7fa4658bf110, src=0x7fa4658bf010, len=134217728): offset=18257633; expected=011696E1(18257633) actual=011696E3(18257635) bit_mismatch=0002; iteration=363
Bad result in memmove(dst=0x7fa4658bf110, src=0x7fa4658bf010, len=134217728): offset=26981249; expected=019BB381(26981249) actual=019BB383(26981251) bit_mismatch=0002; iteration=437

Note that it is a single-bit corruption that happens occasionally (not on
every iteration).
-m32 is far more error-prone than -m64.
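
The single-bit claim can be checked mechanically from the `expected` and
`actual` values in the log. A minimal sketch with hypothetical helper
names: XOR the two words to get the mismatch mask, then test whether
exactly one bit is set in it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits that differ between the expected and the actual word. */
static uint32_t bit_mismatch(uint32_t expected, uint32_t actual)
{
    return expected ^ actual;
}

/* True if exactly one bit is set in mask. */
static bool is_single_bit(uint32_t mask)
{
    return mask != 0 && (mask & (mask - 1)) == 0;
}
```

For the first logged failure, `bit_mismatch(0x0039C7C1, 0x0039C7C3)` is
`0x2`, which matches the `bit_mismatch=0002` field in the output.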

Test example roughly implements these 2 loops:
This fails:
  sfence
  loop {
    movdqu [src++],%xmm0
    movntdq %xmm0,[dst++]
  }
  sfence
This works:
  sfence
  loop {
    movdqu [src++],%xmm0
    movdqu %xmm0,[dst++]
  }
  sfence

Failures happen only on a Sandy Bridge CPU:
Intel(R) Core(TM) i7-2700K CPU @ 3.50GHz
The kernel is 4.17.0-11928-g2837461dbe6f.

The problem is not reproducible immediately after reboot. The machine has
to be heavily loaded before it starts corrupting memory. A few hours of
memtest86+ do not reveal any memory failures.

I wonder whether anyone else can reproduce this failure or whether I
should start looking for a new CPU.

From the above it looks as if movntdq does not play well with XMM
context save/restore and there is an 'mfence' missing somewhere in
interrupt handling.

If there are no obvious problems with glibc's memmove() or my small test,
what can I do to rule out or pin down a hardware or kernel problem?

Thanks!

-- 

  Sergei
/*
  Test as:
$ gcc -ggdb3 -O2 -m32 test-memmove-xmm-unaligned.c -o test-memmove-xmm-unaligned -Wall && ./test-memmove-xmm-unaligned
  Error example:
Bad result in memmove(dst=0xd7cf5094, src=0xd7cf5010, len=268435456): offset= 8031729; expected=007A8DF1( 8031729) actual=007A8DF3( 8031731) bit_mismatch=0002; iteration=2
Bad result in memmove(dst=0xd7cf5094, src=0xd7cf5010, len=268435456): offset=43626993; expected=0299B1F1(43626993) actual=0299B1F3(43626995) bit_mismatch=0002; iteration=3
Bad result in memmove(dst=0xd7cf5094, src=0xd7cf5010, len=268435456): offset=25404913; expected=0183A5F1(25404913) actual=0183A5F3(25404915) bit_mismatch=0002; iteration=4
...
*/

#include <string.h> /* memmove */
#include <stdlib.h> /* exit */
#include <stdio.h>  /* fprintf */

#include <sys/mman.h>  /* mlock() */
#include <emmintrin.h> /* movdqu, sfence, movntdq */

typedef unsigned int u32;

static void memmove_si128u (__m128i_u * dest, __m128i_u const *src, size_t items) __attribute__((noinline));
static void memmove_si128u (__m128i_u * dest, __m128i_u const *src, size_t items)
{
// emulate behaviour of optimised block for __memmove_sse2_unaligned:
// sfence
// loop(backwards) {
//   8x movdqu  mem->%xmm{N}
//   8x movntdq %xmm{N}->mem
// }
// source: https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/i386/i686/multiarch/memcpy-sse2-unaligned.S;h=9aa17de99c9c3415a9b5ac28fd9f1eb4457f916d;hb=HEAD#l244

// ASSUME: if ((uintptr_t)dest > (uintptr_t)src) {

Re: [PATCH] modify one dead link

2018-03-20 Thread Sergei Trofimovich
On Tue, 20 Mar 2018 10:54:22 -0400
Dongliang Mu  wrote:

> -# hg clone http://xenbits.xensource.com/ext/ia64/xen-unstable.hg
> +# hg clone http://xenbits.xensource.com/ext/ia64/xen-unstable
>  # cd xen-unstable.hg
>  # hg clone http://xenbits.xensource.com/ext/ia64/linux-2.6.18-xen.hg

You will need to fix the 'cd' line as well:
cd xen-unstable
 
Otherwise looks good.

-- 

  Sergei


[PATCH v2, simpler] ia64: fix ptrace(PTRACE_GETREGS) (unbreaks strace, gdb)

2018-03-09 Thread Sergei Trofimovich
The strace breakage looks like this:
./strace: get_regs: get_regs_error: Input/output error

It happens because ia64 needs to load unwind tables
to read certain registers in 'PTRACE_GETREGS'. Unwind
tables fail to load at kernel startup due to a GCC quirk
on the following code (logged as PR 84184):

extern char __end_unwind[];
const struct unw_table_entry *end = (struct unw_table_entry *)table_end;
table->end = segment_base + end[-1].end_offset;

GCC does not generate correct code for this single memory
reference after constant propagation.
Two triggers are required for bad code generation:
- '__end_unwind' has lower alignment (char) than
  'struct unw_table_entry' (8).
- the symbol offset is negative.

This commit works around it by disabling inlining of
init_unwind_table(). This way we avoid const-propagation
of '__end_unwind' and pass the address via a register.
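
The shape of the code being worked around can be sketched in user space.
This is an illustration only: the three-field entry layout and the names
`unw_table_entry_sketch` and `table_end_address` are assumptions, and the
sketch does not reproduce the miscompilation itself. It shows an
end-of-table pointer arriving as an opaque `const void *` argument and
being read with a negative index, with `noinline` keeping the compiler
from propagating the caller's symbol address into the load.

```c
#include <stdint.h>

/* Assumed entry layout, for illustration only. */
struct unw_table_entry_sketch {
    uint64_t start_offset;
    uint64_t end_offset;
    uint64_t info_offset;
};

/*
 * noinline (a GCC/Clang attribute) makes table_end arrive as a plain
 * argument instead of being constant-propagated from the call site.
 */
__attribute__((noinline))
static uint64_t table_end_address(uint64_t segment_base, const void *table_end)
{
    const struct unw_table_entry_sketch *end = table_end;

    /* The last valid entry sits just below the end marker. */
    return segment_base + end[-1].end_offset;
}
```

Calling it with a pointer one past the last element of a table returns
`segment_base` plus the `end_offset` of the final entry.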

Tested in ski (emulator) and on rx2600 and rx3600 (real hardware).
On rx2600 it unbreaks booting.

This patch is a lighter version of patch
https://lkml.org/lkml/2018/2/2/914

CC: Tony Luck <tony.l...@intel.com>
CC: Fenghua Yu <fenghua...@intel.com>
CC: linux-i...@vger.kernel.org
CC: linux-kernel@vger.kernel.org
Bug: https://github.com/strace/strace/issues/33
Bug: https://gcc.gnu.org/PR84184
Reported-by: Émeric Maschino <emeric.masch...@gmail.com>
Tested-by: stanton_a...@mail.com
Signed-off-by: Sergei Trofimovich <sly...@gentoo.org>
---
 arch/ia64/kernel/unwind.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/ia64/kernel/unwind.c b/arch/ia64/kernel/unwind.c
index e04efa088902..a18190bc99a9 100644
--- a/arch/ia64/kernel/unwind.c
+++ b/arch/ia64/kernel/unwind.c
@@ -2078,7 +2078,14 @@ unw_init_from_blocked_task (struct unw_frame_info *info, struct task_struct *t)
 }
 EXPORT_SYMBOL(unw_init_from_blocked_task);
 
-static void
+/*
+ * We use 'noinline' to evade GCC bug https://gcc.gnu.org/PR84184
+ * where gcc code generator emits incorrect code when '__end_unwind'
+ * is const-propagated to 'end[-1].end_offset' and gcc generates
+ * incorrect code. The trigger there is negative offset relative
+ * to externally-defined symbol.
+ */
+noinline static void
 init_unwind_table (struct unw_table *table, const char *name, unsigned long segment_base,
   unsigned long gp, const void *table_start, const void *table_end)
 {
-- 
2.16.2



[PATCH] ia64: doc: tweak whitespace for 'console=' parameter

2018-02-24 Thread Sergei Trofimovich
CC: Tony Luck <tony.l...@intel.com>
CC: Fenghua Yu <fenghua...@intel.com>
CC: linux-i...@vger.kernel.org
CC: linux-kernel@vger.kernel.org
Signed-off-by: Sergei Trofimovich <sly...@gentoo.org>
---
 Documentation/ia64/serial.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/ia64/serial.txt b/Documentation/ia64/serial.txt
index 6869c73de4e2..a63d2c54329b 100644
--- a/Documentation/ia64/serial.txt
+++ b/Documentation/ia64/serial.txt
@@ -111,7 +111,7 @@ TROUBLESHOOTING SERIAL CONSOLE PROBLEMS
 
- If you don't have an HCDP, the kernel doesn't know where
  your console lives until the driver discovers serial
- devices.  Use "console=uart, io,0x3f8" (or appropriate
+ devices.  Use "console=uart,io,0x3f8" (or appropriate
  address for your machine).
 
 Kernel and init script output works fine, but no "login:" prompt:
-- 
2.16.1



Re: [PATCH] ia64: fix ptrace(PTRACE_GETREGS) (unbreaks strace, gdb)

2018-02-11 Thread Sergei Trofimovich
On Fri, 2 Feb 2018 23:02:20 +
Sergei Trofimovich <sly...@gentoo.org> wrote:

> On Fri, 2 Feb 2018 14:22:32 -0800
> "Luck, Tony" <tony.l...@intel.com> wrote:
> 
> > On Fri, Feb 02, 2018 at 10:12:24PM +, Sergei Trofimovich wrote:  
> > > The strace breakage looks like that:
> > > ./strace: get_regs: get_regs_error: Input/output error
> > > 
> > > It happens because ia64 needs to load unwind tables
> > > to read certain registers. Unwind tables fail to load
> > > due to GCC quirk on the following code:
> > > 
> > > extern char __end_unwind[];
> > > const struct unw_table_entry *end = (struct unw_table_entry *)table_end;
> > > table->end = segment_base + end[-1].end_offset;
> > > 
> > > GCC does not generate correct code for this single memory
> > > reference after constant propagation (see https://gcc.gnu.org/PR84184).   
> > >  
> > 
> > I'm not seeing this ... probably because I build with
> > a pre-historic 4.3.4 version of gcc.
> > 
> > Do you know which version(s) are affected? I'm not looking
> > for an exhaustive list, just the one on which you found this
> > would be good.
> > 
> > -Tony  
> 
> Original bug https://bugs.gentoo.org/518130 claims regression appeared
> around gcc-4.5. Locally am seeing the problem with gcc-6.4.0, gcc-7.2.0 and
> gcc-8 (HEAD).

Another report on the positive patch effect:

rx2600 boots successfully with this patch (did not without, my guess is due to
early access fault at bad address): https://bugs.gentoo.org/579278#c13

Tested-by: stanton_a...@mail.com

-- 

  Sergei


Re: [PATCH] ia64: fix ptrace(PTRACE_GETREGS) (unbreaks strace, gdb)

2018-02-02 Thread Sergei Trofimovich
On Fri, 2 Feb 2018 14:22:32 -0800
"Luck, Tony" <tony.l...@intel.com> wrote:

> On Fri, Feb 02, 2018 at 10:12:24PM +, Sergei Trofimovich wrote:
> > The strace breakage looks like that:
> > ./strace: get_regs: get_regs_error: Input/output error
> > 
> > It happens because ia64 needs to load unwind tables
> > to read certain registers. Unwind tables fail to load
> > due to GCC quirk on the following code:
> > 
> > extern char __end_unwind[];
> > const struct unw_table_entry *end = (struct unw_table_entry *)table_end;
> > table->end = segment_base + end[-1].end_offset;
> > 
> > GCC does not generate correct code for this single memory
> > reference after constant propagation (see https://gcc.gnu.org/PR84184).  
> 
> I'm not seeing this ... probably because I build with
> a pre-historic 4.3.4 version of gcc.
> 
> Do you know which version(s) are affected? I'm not looking
> for an exhaustive list, just the one on which you found this
> would be good.
> 
> -Tony

Original bug https://bugs.gentoo.org/518130 claims regression appeared
around gcc-4.5. Locally am seeing the problem with gcc-6.4.0, gcc-7.2.0 and
gcc-8 (HEAD).

-- 

  Sergei


[PATCH] ia64: fix ptrace(PTRACE_GETREGS) (unbreaks strace, gdb)

2018-02-02 Thread Sergei Trofimovich
The strace breakage looks like this:
./strace: get_regs: get_regs_error: Input/output error

It happens because ia64 needs to load unwind tables
to read certain registers. Unwind tables fail to load
due to a GCC quirk in the following code:

extern char __end_unwind[];
const struct unw_table_entry *end = (struct unw_table_entry *)table_end;
table->end = segment_base + end[-1].end_offset;

GCC does not generate correct code for this single memory
reference after constant propagation (see https://gcc.gnu.org/PR84184).
Two triggers are required for bad code generation:
- '__end_unwind' has lower alignment (char, 1) than
  'struct unw_table_entry' (8).
- the symbol offset is negative.

This commit works around it by fixing the alignment of '__end_unwind'.
While at it, use hidden symbols to generate shorter gp-relative
relocations.
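
For illustration only, here is a user-space sketch of the access pattern
described above. It does not reproduce the miscompilation; it only shows
what 'end[-1]' is expected to read once the end marker is declared with the
entry type instead of 'char'. The three-field layout is an assumption based
on the kernel's 'struct unw_table_entry'; 'demo_table', 'demo_end()' and
'last_end_offset()' are hypothetical names that do not exist in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: three 64-bit offsets per entry. */
struct unw_table_entry {
	uint64_t start_offset;
	uint64_t end_offset;
	uint64_t info_offset;
};

/* Hypothetical stand-in for the linker-provided unwind section. */
static const struct unw_table_entry demo_table[] = {
	{ 0x000, 0x100, 0x10 },
	{ 0x100, 0x240, 0x20 },
	{ 0x240, 0x300, 0x30 },
};

/*
 * One-past-the-end marker, typed as an entry (like the patched
 * '__end_unwind') so the compiler knows its real alignment.
 */
static const struct unw_table_entry *demo_end(void)
{
	return demo_table + sizeof(demo_table) / sizeof(demo_table[0]);
}

/* Same expression as in the commit message: a negative index off 'end'. */
static uint64_t last_end_offset(uint64_t segment_base,
				const struct unw_table_entry *end)
{
	return segment_base + end[-1].end_offset;
}
```

With a segment base of 0x1000 the helper should return the last entry's
end offset plus the base.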

CC: Tony Luck <tony.l...@intel.com>
CC: Fenghua Yu <fenghua...@intel.com>
CC: linux-i...@vger.kernel.org
CC: linux-kernel@vger.kernel.org
Bug: https://github.com/strace/strace/issues/33
Bug: https://gcc.gnu.org/PR84184
Reported-by: Émeric Maschino <emeric.masch...@gmail.com>
Signed-off-by: Sergei Trofimovich <sly...@gentoo.org>
---
 arch/ia64/include/asm/sections.h |  1 -
 arch/ia64/kernel/unwind.c        | 15 ++++++++++++++-
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/ia64/include/asm/sections.h b/arch/ia64/include/asm/sections.h
index f3481408594e..0fc4f1757a44 100644
--- a/arch/ia64/include/asm/sections.h
+++ b/arch/ia64/include/asm/sections.h
@@ -24,7 +24,6 @@ extern char __start_gate_mckinley_e9_patchlist[], __end_gate_mckinley_e9_patchli
 extern char __start_gate_vtop_patchlist[], __end_gate_vtop_patchlist[];
 extern char __start_gate_fsyscall_patchlist[], __end_gate_fsyscall_patchlist[];
 extern char __start_gate_brl_fsys_bubble_down_patchlist[], __end_gate_brl_fsys_bubble_down_patchlist[];
-extern char __start_unwind[], __end_unwind[];
 extern char __start_ivt_text[], __end_ivt_text[];
 
 #undef dereference_function_descriptor
diff --git a/arch/ia64/kernel/unwind.c b/arch/ia64/kernel/unwind.c
index e04efa088902..025ba6700790 100644
--- a/arch/ia64/kernel/unwind.c
+++ b/arch/ia64/kernel/unwind.c
@@ -2243,7 +2243,20 @@ __initcall(create_gate_table);
 void __init
 unw_init (void)
 {
-	extern char __gp[];
+	#define __ia64_hidden __attribute__((visibility("hidden")))
+	/*
+	 * We use hidden symbols to generate more efficient code using
+	 * gp-relative addressing.
+	 */
+	extern char __gp[] __ia64_hidden;
+	/*
+	 * Unwind tables need to have proper alignment as init_unwind_table()
+	 * uses negative offsets against '__end_unwind'.
+	 * See https://gcc.gnu.org/PR84184
+	 */
+	extern const struct unw_table_entry __start_unwind[] __ia64_hidden;
+	extern const struct unw_table_entry __end_unwind[] __ia64_hidden;
+	#undef __ia64_hidden
 	extern void unw_hash_index_t_is_too_narrow (void);
 	long i, off;
 
-- 
2.16.1



Re: [PATCH v3] ia64: fix module loading for gcc-5.4

2017-04-29 Thread Sergei Trofimovich
On Sat,  8 Apr 2017 20:53:18 +0100
Sergei Trofimovich <sly...@gentoo.org> wrote:

> Starting from gcc-5.4+ gcc generates MLX
> instructions in more cases to refer local
> symbols:
> https://gcc.gnu.org/PR60465
> 
> That caused ia64 module loader to choke
> on such instructions:
> fuse: invalid slot number 1 for IMM64
> 
> Linux kernel used to handle only case where
> relocation pointed to slot=2 instruction in
> the bundle. That limitation was fixed in linux by
> commit 9c184a073bfd ("[IA64] Fix 2.6 kernel for the new ia64 assembler")
> See http://sources.redhat.com/bugzilla/show_bug.cgi?id=1433
> 
> This change lifts the slot=2 restriction from
> linux kernel module loader.
> 
> Tested on 'fuse' and 'btrfs' kernel modules.
> 
> Cc: Markus Elfring <elfr...@users.sourceforge.net>
> Cc: H. J. Lu <hjl.to...@gmail.com>
> Cc: Tony Luck <tony.l...@intel.com>
> Cc: Fenghua Yu <fenghua...@intel.com>
> Cc: linux-i...@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: Andrew Morton <a...@linux-foundation.org>
> Bug: https://bugs.gentoo.org/601014
> Tested-by: Émeric MASCHINO <emeric.masch...@gmail.com>
> Signed-off-by: Sergei Trofimovich <sly...@gentoo.org>
> ---
> Change since v1: added 'Tested-by'
> Change since v2: checkpatched, fixed typos found by Markus Elfring

Ping :)

-- 

  Sergei


Re: [PATCH v3] ia64: fix module loading for gcc-5.4+

2017-04-10 Thread Sergei Trofimovich
On Mon, 10 Apr 2017 19:23:28 +0200
SF Markus Elfring  wrote:

> > -   if (slot(insn) != 2) {
> > +   if (slot(insn) != 1 && slot(insn) != 2) {  
> 
> + int const s = slot(insn);
> + if (s < 1 || s > 2) {
> 
> Do run time characteristics matter for such a condition check here?

It's done once at kernel module load time. My guess would be
"not critical at all".

slot() is a pure arithmetic static inline function. You can compare
assembly output before and after your change.

You can measure the difference yourself using 'ski' emulator.
That's for example how I debugged and tested the patch:
 http://trofi.github.io/posts/199-ia64-machine-emulation.html
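
As a rough sketch of why the two spellings of the check behave the same:
the snippet below assumes slot() simply returns the low two bits of the
instruction "address" (my reading of the helper in
arch/ia64/kernel/module.c). 'rejected_v3()', 'rejected_range()' and
'checks_agree()' are hypothetical names used only for this comparison.

```c
#include <assert.h>
#include <stdint.h>

struct insn;	/* opaque here; only the address bits matter */

/* Assumption: the slot number lives in the low two bits of the address. */
static inline int slot(const struct insn *insn)
{
	return (int) ((uintptr_t) insn & 0x3);
}

/* The condition used in the v3 patch. */
static int rejected_v3(int s)
{
	return s != 1 && s != 2;
}

/* The range form suggested in the review. */
static int rejected_range(int s)
{
	return s < 1 || s > 2;
}

/* Returns 1 if both forms agree for every value slot() can produce. */
static int checks_agree(void)
{
	int s;

	for (s = 0; s <= 3; s++)
		if (rejected_v3(s) != rejected_range(s))
			return 0;
	return 1;
}
```

Since slot() can only yield 0..3, both forms reject exactly 0 and 3, so the
choice between them is a matter of style rather than behaviour.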

-- 

  Sergei


Re: ia64: fix module loading for gcc-5.4

2017-04-09 Thread Sergei Trofimovich
On Sun, 9 Apr 2017 11:02:43 +0200
SF Markus Elfring  wrote:

> >>> That caused ia64 module loader to choke
> >>> on such instructions:
> >>> fuse: invalid slot number 1 for IMM64
> >>
> >> Why does it matter to check such a value?  
> > 
> > I'm not sure I follow the question. Is your question about
> > linux kernel relocation code handler, gcc or ia64 instruction format?  
> 
> I am just curious if this source code could also work without
> the mentioned check.

It should work for valid code, yes. The downside of removing the check
is that a malformed relocation (say, when the instruction "address" is
wrong due to an obscure toolchain bug) would go unnoticed. In this case
apply_imm64() would silently corrupt unrelated memory instead of
reporting an error.

> Would it make sense to check more than two values there?

AFAIU ia64 does not allow encoding imm64/imm60 instructions
that occupy slot=0 at all.

ia64_patch_imm64() can handle only imm64 instructions that
occupy both slot 1 and slot 2 of a bundle. It can accept
either the slot=1 "address" or the slot=2 "address". Anything
else would be malformed.
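
A small sketch of that rule, under the assumption that an instruction
"address" is the 16-byte-aligned bundle address with the slot number in
the low two bits. 'bundle_base()', 'slot_of()' and 'imm64_target()' are
hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * An ia64 bundle is 128 bits: a 5-bit template plus three 41-bit slots.
 * In an MLX bundle the long immediate occupies slots 1 and 2 together.
 */
static uint64_t bundle_base(uint64_t insn_addr)
{
	return insn_addr & ~(uint64_t) 0xf;
}

static unsigned int slot_of(uint64_t insn_addr)
{
	return (unsigned int) (insn_addr & 0x3);
}

/*
 * Returns 1 and stores the bundle address when 'insn_addr' may name an
 * imm64 instruction (slot 1 or slot 2); returns 0 for anything else.
 */
static int imm64_target(uint64_t insn_addr, uint64_t *bundle)
{
	unsigned int s = slot_of(insn_addr);

	if (s != 1 && s != 2)
		return 0;
	*bundle = bundle_base(insn_addr);
	return 1;
}
```

Both accepted "addresses" resolve to the same bundle, which is why either
one can be handed to the patching code.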

-- 

  Sergei


Re: [PATCH v3] ia64: fix module loading for gcc-5.4

2017-04-09 Thread Sergei Trofimovich
On Sun, 9 Apr 2017 10:27:52 +0200
SF Markus Elfring  wrote:

> > That caused ia64 module loader to choke
> > on such instructions:
> > fuse: invalid slot number 1 for IMM64  
> 
> Why does it matter to check such a value?

I'm not sure I follow the question. Is your question about
linux kernel relocation code handler, gcc or ia64 instruction
format?

-- 

  Sergei


[PATCH v3] ia64: fix module loading for gcc-5.4

2017-04-08 Thread Sergei Trofimovich
Starting from gcc-5.4, gcc generates MLX
instructions in more cases to refer to local
symbols:
https://gcc.gnu.org/PR60465

That caused ia64 module loader to choke
on such instructions:
fuse: invalid slot number 1 for IMM64

Linux kernel used to handle only case where
relocation pointed to slot=2 instruction in
the bundle. That limitation was fixed in linux by
commit 9c184a073bfd ("[IA64] Fix 2.6 kernel for the new ia64 assembler")
See http://sources.redhat.com/bugzilla/show_bug.cgi?id=1433

This change lifts the slot=2 restriction from
linux kernel module loader.

Tested on 'fuse' and 'btrfs' kernel modules.

Cc: Markus Elfring <elfr...@users.sourceforge.net>
Cc: H. J. Lu <hjl.to...@gmail.com>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Fenghua Yu <fenghua...@intel.com>
Cc: linux-i...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Andrew Morton <a...@linux-foundation.org>
Bug: https://bugs.gentoo.org/601014
Tested-by: Émeric MASCHINO <emeric.masch...@gmail.com>
Signed-off-by: Sergei Trofimovich <sly...@gentoo.org>
---
Change since v1: added 'Tested-by'
Change since v2: checkpatched, fixed typos found by Markus Elfring

 arch/ia64/kernel/module.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/ia64/kernel/module.c b/arch/ia64/kernel/module.c
index 6ab0ae7d6535..d1d945c6bd05 100644
--- a/arch/ia64/kernel/module.c
+++ b/arch/ia64/kernel/module.c
@@ -153,7 +153,7 @@ slot (const struct insn *insn)
 static int
 apply_imm64 (struct module *mod, struct insn *insn, uint64_t val)
 {
-	if (slot(insn) != 2) {
+	if (slot(insn) != 1 && slot(insn) != 2) {
 		printk(KERN_ERR "%s: invalid slot number %d for IMM64\n",
 		       mod->name, slot(insn));
 		return 0;
@@ -165,7 +165,7 @@ apply_imm64 (struct module *mod, struct insn *insn, uint64_t val)
 static int
 apply_imm60 (struct module *mod, struct insn *insn, uint64_t val)
 {
-	if (slot(insn) != 2) {
+	if (slot(insn) != 1 && slot(insn) != 2) {
 		printk(KERN_ERR "%s: invalid slot number %d for IMM60\n",
 		       mod->name, slot(insn));
 		return 0;
-- 
2.12.0



[PATCH] alpha: cleanup: remove __NR_sys_epoll_*, leave __NR_epoll_*

2017-04-08 Thread Sergei Trofimovich
__NR_sys_epoll_create and friends are alpha-specific
while __NR_epoll_create is a generic name for other
arches.

Cc: Richard Henderson <r...@twiddle.net>
Cc: Ivan Kokshaysky <i...@jurassic.park.msu.ru>
Cc: Matt Turner <matts...@gmail.com>
Cc: linux-al...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Sergei Trofimovich <sly...@gentoo.org>
---
 arch/alpha/include/uapi/asm/unistd.h | 5 -
 1 file changed, 5 deletions(-)

diff --git a/arch/alpha/include/uapi/asm/unistd.h b/arch/alpha/include/uapi/asm/unistd.h
index aa33bf5aacb6..650d339a8df6 100644
--- a/arch/alpha/include/uapi/asm/unistd.h
+++ b/arch/alpha/include/uapi/asm/unistd.h
@@ -366,11 +366,6 @@
 #define __NR_epoll_create		407
 #define __NR_epoll_ctl			408
 #define __NR_epoll_wait			409
-/* Feb 2007: These three sys_epoll defines shouldn't be here but culling
- * them would break userspace apps ... we'll kill them off in 2010 :) */
-#define __NR_sys_epoll_create		__NR_epoll_create
-#define __NR_sys_epoll_ctl		__NR_epoll_ctl
-#define __NR_sys_epoll_wait		__NR_epoll_wait
 #define __NR_remap_file_pages		410
 #define __NR_set_tid_address		411
 #define __NR_restart_syscall		412
-- 
2.12.2



[PATCH (resend)] ia64: fix module loading for gcc-5.4

2017-04-08 Thread Sergei Trofimovich
Starting from gcc-5.4+ gcc generates MLX
instructions in more cases to refer local
symbols:
https://gcc.gnu.org/bugzilla/60465

That caused ia64 module loader to choke
on such instructions:
fuse: invalid slot number 1 for IMM64

Linux kernel used to handle only case where
relocation pointed to slot=2 instruction in
the bundle. That limitation was fixed in linux
by 9c184a073bfd650cc791956d6ca79725bb682716 commit.
See http://sources.redhat.com/bugzilla/show_bug.cgi?id=1433

This change lifts the slot=2 restriction from
linux kernel module loader.

Tested on 'fuse' and 'btrfs' kernel modules.

Cc: H. J. Lu <hjl.to...@gmail.com>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Fenghua Yu <fenghua...@intel.com>
Cc: linux-i...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Andrew Morton <a...@linux-foundation.org>
Bug: https://bugs.gentoo.org/601014
Tested-by: Émeric MASCHINO <emeric.masch...@gmail.com>
Signed-off-by: Sergei Trofimovich <sly...@gentoo.org>
---
Change since v1: added 'Tested-by'

 arch/ia64/kernel/module.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/ia64/kernel/module.c b/arch/ia64/kernel/module.c
index 6ab0ae7d6535..d1d945c6bd05 100644
--- a/arch/ia64/kernel/module.c
+++ b/arch/ia64/kernel/module.c
@@ -153,7 +153,7 @@ slot (const struct insn *insn)
 static int
 apply_imm64 (struct module *mod, struct insn *insn, uint64_t val)
 {
-	if (slot(insn) != 2) {
+	if (slot(insn) != 1 && slot(insn) != 2) {
 		printk(KERN_ERR "%s: invalid slot number %d for IMM64\n",
 		       mod->name, slot(insn));
 		return 0;
@@ -165,7 +165,7 @@ apply_imm64 (struct module *mod, struct insn *insn, uint64_t val)
 static int
 apply_imm60 (struct module *mod, struct insn *insn, uint64_t val)
 {
-	if (slot(insn) != 2) {
+	if (slot(insn) != 1 && slot(insn) != 2) {
 		printk(KERN_ERR "%s: invalid slot number %d for IMM60\n",
 		       mod->name, slot(insn));
 		return 0;
-- 
2.12.0


