Re: [PATCH v3] sched/topology: fix the issue groups don't span domain->span for NUMA diameter > 2

2021-02-09 Thread Meelis Roos

I did a rudimentary benchmark on the same 8-node Sun Fire X4600-M2, on top of 
today's 5.11.0-rc7-2-ge0756cfc7d7c.

The test: building a clean kernel with make -j64 after make clean and drop_caches.

While running the clean kernel (3 tries):

real    2m38.574s
user    46m18.387s
sys     6m8.724s

real    2m37.647s
user    46m34.171s
sys     6m11.993s

real    2m37.832s
user    46m34.910s
sys     6m12.013s


While running the patched kernel:

real    2m40.072s
user    46m22.610s
sys     6m6.658s


For real time, the patched kernel seems to be 1.5s-2s slower out of 160s 
(noise?). User and system time are slightly lower, on the other hand, so it 
seems good to me.
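
As a rough sanity check on the timings above, here is a minimal sketch (not part of the original measurement; the helper name `to_seconds` and the variable names are made up for illustration) that converts the `real` values to seconds and compares the single patched run against the mean of the three clean runs:

```python
def to_seconds(stamp):
    """Convert a time(1)-style value such as '2m38.574s' to seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


# 'real' values copied from the runs listed above.
clean_real = [to_seconds(t) for t in ("2m38.574s", "2m37.647s", "2m37.832s")]
patched_real = to_seconds("2m40.072s")

clean_mean = sum(clean_real) / len(clean_real)
slowdown = patched_real - clean_mean
# Run-to-run spread of the clean kernel, as a crude noise estimate.
spread = max(clean_real) - min(clean_real)

print(f"clean mean: {clean_mean:.3f}s")
print(f"patched:    {patched_real:.3f}s")
print(f"slowdown:   {slowdown:.3f}s ({100 * slowdown / clean_mean:.2f}%)")
print(f"clean run-to-run spread: {spread:.3f}s")
```

With only one patched run, a difference of this size cannot be reliably separated from noise; several patched runs would be needed for that.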

--
Meelis Roos 


Re: [PATCH v2] sched/topology: fix the issue groups don't span domain->span for NUMA diameter > 2

2021-02-03 Thread Meelis Roos

03.02.21 13:12 Barry Song wrote:

kernel/sched/topology.c | 85 +
  1 file changed, 53 insertions(+), 32 deletions(-)

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 5d3675c7a76b..964ed89001fe 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c


This one still works on the Sun X4600-M2, on top of v5.11-rc6-55-g3aaf0a27ffc2.


Performance-wise - is there some simple benchmark to run to measure the impact? 
Compared to what - 5.10.0 or the kernel with the warning?

Drop caches and time the build of the Linux kernel with make -j64?
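
The procedure suggested above could be scripted roughly as follows. This is only a sketch under stated assumptions: the function names `benchmark_steps` and `run_benchmark` are hypothetical, it has to run as root from the top of a kernel tree (writing to `/proc/sys/vm/drop_caches` requires root), and it measures wall-clock time only, not the user/sys split that `time` reports.

```python
import subprocess
import time


def benchmark_steps(jobs=64):
    """Return the shell commands for one cold-cache build benchmark run."""
    return [
        "make clean",
        # Write back dirty pages first so that dropping caches is effective.
        "sync",
        # 3 = drop page cache plus dentries and inodes.
        "echo 3 > /proc/sys/vm/drop_caches",
        f"make -j{jobs}",
    ]


def run_benchmark(jobs=64, cwd="."):
    """Run the steps in the kernel tree at cwd; return build seconds."""
    *setup, build = benchmark_steps(jobs)
    for command in setup:
        subprocess.run(command, shell=True, check=True, cwd=cwd)
    start = time.monotonic()
    subprocess.run(build, shell=True, check=True, cwd=cwd)
    return time.monotonic() - start
```

Calling `run_benchmark()` a few times on each kernel and comparing the results gives the comparison asked about here.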

--
Meelis Roos


Re: [RFC PATCH v2] sched/topology: fix the issue groups don't span domain->span for NUMA diameter > 2

2021-01-28 Thread Meelis Roos

Tested by the below topology:
qemu-system-aarch64  -M virt -nographic \


Also works on the initial 8-node Sun Fire X4600-M2. No strange messages in 
dmesg and no problems on kernel build with make -j64.

Tested-by: Meelis Roos 


Re: 5.11-rc4+git: Shortest NUMA path spans too many nodes

2021-01-21 Thread Meelis Roos




Could you paste the output of the below?

   $ cat /sys/devices/system/node/node*/distance


10 12 12 14 14 14 14 16
12 10 14 12 14 14 12 14
12 14 10 14 12 12 14 14
14 12 14 10 12 12 14 14
14 14 12 12 10 14 12 14
14 14 12 12 14 10 14 12
14 12 14 14 12 14 10 12
16 14 14 14 14 12 12 10
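
To relate this table to the "NUMA diameter > 2" wording in the patch subject, the following sketch lists the distinct distance levels and counts hops between nodes. It assumes that a distance of 12 means a direct link between two nodes; that is an interpretation of the table, not something the firmware states. The function names are made up for illustration.

```python
from collections import deque

DISTANCES = """\
10 12 12 14 14 14 14 16
12 10 14 12 14 14 12 14
12 14 10 14 12 12 14 14
14 12 14 10 12 12 14 14
14 14 12 12 10 14 12 14
14 14 12 12 14 10 14 12
14 12 14 14 12 14 10 12
16 14 14 14 14 12 12 10
"""


def parse_distances(text):
    """Parse the sysfs distance output into a list of integer rows."""
    return [[int(v) for v in line.split()] for line in text.strip().splitlines()]


def hop_diameter(matrix, link_distance):
    """Longest shortest path in hops, treating link_distance as a direct link.

    Nodes unreachable over such links are simply ignored.
    """
    size = len(matrix)
    worst = 0
    for start in range(size):
        hops = {start: 0}
        queue = deque([start])
        while queue:  # breadth-first search from 'start'
            node = queue.popleft()
            for other in range(size):
                if other not in hops and matrix[node][other] == link_distance:
                    hops[other] = hops[node] + 1
                    queue.append(other)
        worst = max(worst, max(hops.values()))
    return worst


matrix = parse_distances(DISTANCES)
size = len(matrix)
levels = sorted({d for row in matrix for d in row})
symmetric = all(matrix[i][j] == matrix[j][i]
                for i in range(size) for j in range(size))
diameter = hop_diameter(matrix, link_distance=12)

print("distance levels:", levels)
print("symmetric:", symmetric)
print("hop diameter:", diameter)
```

Under that assumption the table has three remote distance levels (12, 14, 16), and nodes 0 and 7 are three hops apart.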



Additionally, booting your system with CONFIG_SCHED_DEBUG=y and
appending 'sched_debug' to your cmdline should yield some extra data.


[0.00] Linux version 5.11.0-rc4-00015-g45dfb8a5659a (mroos@x4600m2) 
(gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 
2.35.1) #55 SMP Thu Jan 21 19:23:10 EET 2021
[0.00] Command line: 
BOOT_IMAGE=/boot/vmlinuz-5.11.0-rc4-00015-g45dfb8a5659a root=/dev/sda1 ro quiet
[0.00] x86/fpu: x87 FPU will use FXSAVE
[0.00] BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x00099bff] usable
[0.00] BIOS-e820: [mem 0x00099c00-0x0009] reserved
[0.00] BIOS-e820: [mem 0x000e6000-0x000f] reserved
[0.00] BIOS-e820: [mem 0x0010-0xd7f9] usable
[0.00] BIOS-e820: [mem 0xd7fae000-0xd7fa] type 9
[0.00] BIOS-e820: [mem 0xd7fb-0xd7fbdfff] ACPI data
[0.00] BIOS-e820: [mem 0xd7fbe000-0xd7fe] ACPI NVS
[0.00] BIOS-e820: [mem 0xd7ff-0xd7ff] reserved
[0.00] BIOS-e820: [mem 0xdc00-0xefff] reserved
[0.00] BIOS-e820: [mem 0xfec0-0xfec00fff] reserved
[0.00] BIOS-e820: [mem 0xfee0-0xfee00fff] reserved
[0.00] BIOS-e820: [mem 0xff70-0x] reserved
[0.00] BIOS-e820: [mem 0x0001-0x002027ff] usable
[0.00] NX (Execute Disable) protection: active
[0.00] SMBIOS 2.5 present.
[0.00] DMI: Sun Microsystems Sun Fire X4600 M2/Sun Fire X4600 M2, 
BIOS 0ABIT132 12/03/2009
[0.00] tsc: Fast TSC calibration using PIT
[0.00] tsc: Detected 2293.794 MHz processor
[0.005734] e820: update [mem 0x-0x0fff] usable ==> reserved
[0.005740] e820: remove [mem 0x000a-0x000f] usable
[0.011432] AGP: No AGP bridge found
[0.011578] last_pfn = 0x2028000 max_arch_pfn = 0x4
[0.011601] MTRR default type: uncachable
[0.011604] MTRR fixed ranges enabled:
[0.011607]   0-9 write-back
[0.011610]   A-E uncachable
[0.011612]   F-F write-protect
[0.011614] MTRR variable ranges enabled:
[0.011616]   0 base  mask 8000 write-back
[0.011620]   1 base 8000 mask C000 write-back
[0.011623]   2 base C000 mask F000 write-back
[0.011626]   3 base D000 mask F800 write-back
[0.011629]   4 disabled
[0.011630]   5 disabled
[0.011632]   6 disabled
[0.011633]   7 disabled
[0.011634] TOM2: 00202800 aka 131712M
[0.012697] x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WP  UC- WT
[0.013048] e820: update [mem 0xd800-0x] usable ==> reserved
[0.013083] last_pfn = 0xd7fa0 max_arch_pfn = 0x4
[0.018157] found SMP MP-table at [mem 0x000ff780-0x000ff78f]
[0.018215] Using GB pages for direct mapping
[0.018603] ACPI: Early table checksum verification disabled
[0.018613] ACPI: RSDP 0x000F9EE0 24 (v02 SUN   )
[0.018623] ACPI: XSDT 0xD7FB0100 9C (v01 SUNX4600 M2 
0132 MSFT 0097)
[0.018635] ACPI: FACP 0xD7FB0290 F4 (v03 SUNX4600 M2 
0132 MSFT 0097)
[0.018645] ACPI BIOS Warning (bug): 32/64X length mismatch in 
FADT/Gpe0Block: 64/32 (20201113/tbfadt-564)
[0.018652] ACPI BIOS Warning (bug): 32/64X length mismatch in 
FADT/Gpe1Block: 128/64 (20201113/tbfadt-564)
[0.018658] ACPI: DSDT 0xD7FB0710 007DF7 (v01 SUNX4600 M2 
0132 INTL 20051117)
[0.018664] ACPI: FACS 0xD7FBE000 40
[0.018667] ACPI: FACS 0xD7FBE000 40
[0.018671] ACPI: APIC 0xD7FB0390 000170 (v01 SUNX4600 M2 
0132 MSFT 0097)
[0.018676] ACPI: SPCR 0xD7FB0500 50 (v01 SUNX4600 M2 
0132 MSFT 0097)
[0.018681] ACPI: MCFG 0xD7FB0550 3C (v01 SUNX4600 M2 
0132 MSFT 0097)
[0.018686] ACPI: SLIT 0xD7FB064C 6C (v01 SUNX4600 M2 
0132 MSFT 0097)
[0.018691] ACPI: SPMI 0xD7FB06C0 41 (v05 SUNOEMSPMI  
0132 MSFT 0097)
[0.018695] ACPI: OEMB 0xD7FBE040 63 (v01 SUNX4600 M2 
0132 MSFT 0097)
[0.018700] ACPI: SRAT 0xD7FB8510 0003C0 (v01 AMDFAM_F_10 
0002 AMD  0001)
[0.018705] ACPI: HPET 0xD7FB88D0 38 (v01 SUNX4600 M2 
0132 MSFT 0097)
[0.018709] ACPI: IPET 0xD7FB8910 38 (v01 SUNX4600

VGA text console corruption in 5.9.0 and 5.10-rc4

2020-11-17 Thread Meelis Roos

5.9 introduces VGA console corruption in one of my test PCs (I do not have a 
VGA console on most). The PC has an Intel D2550MUD2 board with an Atom D2550.

The symptoms include:
* missing screen updates on VT switch
* fragments of other VTs appear during scrolling (kernel compilation output on 
the visible VT1 scrolls up, and sometimes it includes 5 or so lines from a 
curses application on VT2 or its scroll-back history)
* missing up-scrolling of lines/fragments in curses applications, visible in 
make menuconfig and mc

and maybe more (these are the ones I can describe most clearly).

5.9.0 with fbcon (as packaged by Debian) does not show these symptoms.
5.9.0 and today's 5.10-rc4+git exhibit this behavior if I let them use the VGA 
text console.


$ lspci -nn
00:00.0 Host bridge [0600]: Intel Corporation Atom Processor D2xxx/N2xxx DRAM 
Controller [8086:0bf3] (rev 04)
00:02.0 VGA compatible controller [0300]: Intel Corporation Atom Processor 
D2xxx/N2xxx Integrated Graphics Controller [8086:0be2] (rev 0b)
00:1b.0 Audio device [0403]: Intel Corporation NM10/ICH7 Family High Definition 
Audio Controller [8086:27d8] (rev 02)
00:1c.0 PCI bridge [0604]: Intel Corporation NM10/ICH7 Family PCI Express Port 
1 [8086:27d0] (rev 02)
00:1d.0 USB controller [0c03]: Intel Corporation NM10/ICH7 Family USB UHCI 
Controller #1 [8086:27c8] (rev 02)
00:1d.1 USB controller [0c03]: Intel Corporation NM10/ICH7 Family USB UHCI 
Controller #2 [8086:27c9] (rev 02)
00:1d.2 USB controller [0c03]: Intel Corporation NM10/ICH7 Family USB UHCI 
Controller #3 [8086:27ca] (rev 02)
00:1d.3 USB controller [0c03]: Intel Corporation NM10/ICH7 Family USB UHCI 
Controller #4 [8086:27cb] (rev 02)
00:1d.7 USB controller [0c03]: Intel Corporation NM10/ICH7 Family USB2 EHCI 
Controller [8086:27cc] (rev 02)
00:1e.0 PCI bridge [0604]: Intel Corporation 82801 Mobile PCI Bridge 
[8086:2448] (rev e2)
00:1f.0 ISA bridge [0601]: Intel Corporation NM10 Family LPC Controller 
[8086:27bc] (rev 02)
00:1f.2 SATA controller [0106]: Intel Corporation NM10/ICH7 Family SATA 
Controller [AHCI mode] [8086:27c1] (rev 02)
00:1f.3 SMBus [0c05]: Intel Corporation NM10/ICH7 Family SMBus Controller 
[8086:27da] (rev 02)
01:00.0 Ethernet controller [0200]: Intel Corporation 82574L Gigabit Network 
Connection [8086:10d3]


Nothing interesting in dmesg, selected lines:

[0.00] Linux version 5.10.0-rc4-00067-g9c87c9f41245 (mroos@d2550) (gcc 
(Debian 10.2.0-15) 10.2.0, GNU ld (GNU Binutils for Debian) 2.35.1) #8 SMP Tue 
Nov 17 14:39:11 EET 2020
[0.00] DMI:  /D2550MUD2, BIOS MUCDT10N.86A.0075.2013.0427.1548 
04/27/2013
[0.001878] MTRR default type: uncachable
[0.001881] MTRR fixed ranges enabled:
[0.001885]   0-9 write-back
[0.001888]   A-B uncachable
[0.001891]   C-D write-protect
[0.001893]   E-F uncachable
[0.001896] MTRR variable ranges enabled:
[0.001900]   0 base 0 mask F8000 write-back
[0.001903]   1 base 07F00 mask FFF00 uncachable
[0.001907]   2 base 0FFE0 mask FFFE0 write-protect
[0.001909]   3 disabled
[0.001911]   4 disabled
[0.001913]   5 disabled
[0.001915]   6 disabled
[0.002024] x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WP  UC- WT
[0.028625] ACPI: HPET id: 0x8086a201 base: 0xfed0
[0.028636] smpboot: Allowing 4 CPUs, 0 hotplug CPUs
[0.095056] Console: colour VGA+ 80x25
[0.099221] printk: console [tty0] enabled
[0.226357] smpboot: CPU0: Intel(R) Atom(TM) CPU D2550   @ 1.86GHz (family: 
0x6, model: 0x36, stepping: 0x1)
[0.227697] smp: Bringing up secondary CPUs ...
[0.227697] x86: Booting SMP configuration:
[0.227697]  node  #0, CPUs:  #1
[0.010909] Disabled fast string operations
[0.228016]  #2
[0.010909] Disabled fast string operations
[0.231720]  #3
[0.010909] Disabled fast string operations
[0.233935] smp: Brought up 1 node, 4 CPUs
[0.233935] smpboot: Max logical packages: 1
[0.233935] smpboot: Total of 4 processors activated (14934.80 BogoMIPS)
[0.238692] PCI: MMCONFIG for domain  [bus 00-3f] at [mem 
0xe000-0xe3ff] (base 0xe000)
[0.238756] PCI: MMCONFIG at [mem 0xe000-0xe3ff] reserved in E820
[0.238824] PCI: Using configuration type 1 for base access
[0.243986] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages



Machine-specific config, from compiling current git:
#
# Automatically generated file; DO NOT EDIT.
# Linux/x86 5.10.0-rc4 Kernel Configuration
#
CONFIG_CC_VERSION_TEXT="gcc (Debian 10.2.0-15) 10.2.0"
CONFIG_CC_IS_GCC=y
CONFIG_GCC_VERSION=100200
CONFIG_LD_VERSION=23501
CONFIG_CLANG_VERSION=0
CONFIG_CC_CAN_LINK=y
CONFIG_CC_CAN_LINK_STATIC=y
CONFIG_CC_HAS_ASM_GOTO=y
CONFIG_CC_HAS_ASM_INLINE=y
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_TABLE_SORT=y
CONFIG_THREAD_INFO_IN_TASK=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
# CONFIG_COMPILE_TEST is not set
CONFI

page granularity memory corruption on alpha (5.8, 5.9)

2020-10-13 Thread Meelis Roos

I have an AlphaServer DS20E that ran 5.6.0 fine. 5.8.0 had a problem during the 
rc's - ext4 mounting failed due to corrupt data (it looked like memory 
corruption but was very deterministic). The 5.8.0 release booted fine once, but 
when 5.9-git failed again, I recompiled 5.8.0 and that failed too. The next 
5.9-git kernels booted but corrupted files - I updated the debian-ports distro 
and it broke a file list for some package or another (garbage at the end of the 
file). I tried 5.9.0-00282-g1e6d1d96461e yesterday and that fails too: I tried 
git pull and building the kernel with the newest gcc, and drivers/mfd/Makefile 
had 8192 bytes of correct contents followed by binary garbage that has some 
structure.

I also checked the debian-ports packaged 5.8.0-3-alpha-generic kernel and it 
seemed to work without corruption - perhaps something is wrong with my 
configuration (but it worked before).


Sample corruption from the Makefile: od -A d -c shows

...
0008160   -   c   o   r   e   .   o   l   m   3   5   3   3   -   c
0008176   t   r   l   b   a   n   k   .   o  \n   o   b   j   -   $   (
0008192  \0  \0  \0  \0  \0  \0  \0  \0 002  \0  \0  \0  \0  \0  \0  \0
0008208 341 242 003 001  \0  \0  \0 247 361  \a 001  \0  \0  \0
0008224  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0008240 304 277  \0  \0  \0 002  \0  \0   : 373 237   |  \0  \0  \0  \0
0008256 001  \0  \0  \0  \0  \0  \0  \0 320 345 002  \0  \0 002  \0  \0
0008272 205   9  \0 001  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0008288 033 001  \0  \0  \0  \0  \0  \0   H 330 006  \0  \0 002  \0  \0
0008304  \0 340 002  \0  \0 002  \0  \0 320 340 274 037 001  \0  \0  \0
0008320 330 340 274 037 001  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0008336 004 311  \0  \0  \0 002  \0  \0   : 373 237   |  \0  \0  \0  \0
0008352 205   9  \0 001  \0  \0  \0  \0 343 274 037 001  \0  \0  \0
0008368 320 345 002  \0  \0 002  \0  \0   @ 342 274 037 001  \0  \0  \0
0008384   ( 342 274 037 001  \0  \0  \0 210 265 003  \0  \0 002  \0  \0
0008400  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0008416   H 330 006  \0  \0 002  \0  \0 354 177 362 001  \0  \0  \0  \0

And the same in od -x (corruption starting at offset 20000 octal, i.e. 8192 
decimal) - it looks like 64-bit values with two bytes of zero.

0017660 0929 3d2b 7320 7379 6f63 2e6e 0a6f 626f
0017700 2d6a 2824 4f43 464e 4749 4d5f 4446 4c5f
0017720 334d 3335 2933 2b09 203d 6d6c 3533 
0017740 632d 726f 2e65 206f 6d6c 3533  632d
0017760 7274 626c 6e61 2e6b 0a6f 626f 2d6a 2824
002     0002   
0020020 a2e1 2003 0001  f1a7 2007 0001 
0020040        
0020060 bfc4  0200  fb3a 7c9f  
0020100 0001    e5d0 0002 0200 
0020120 3985 2000 0001     
0020140 011b    d848 0006 0200 
0020160 e000 0002 0200  e0d0 1fbc 0001 
0020200 e0d8 1fbc 0001     
0020220 c904  0200  fb3a 7c9f  
0020240 3985 2000 0001  e300 1fbc 0001 
0020260 e5d0 0002 0200  e240 1fbc 0001 
0020300 e228 1fbc 0001  b588 0003 0200 
0020320        
0020340 d848 0006 0200  7fec 01f2  
0020360 e0d8 1fbc 0001  d990 0005 0200 
0020400 e228 1fbc 0001  1350 2000 0001 
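
The dumps above show the garbage starting at byte 8192, which is exactly one 8 KiB Alpha page into the file. A minimal sketch of how one might check this for other files, given a known-good copy, follows. The function names are hypothetical and the data is a synthetic stand-in, not the real Makefile.

```python
ALPHA_PAGE_SIZE = 8192  # Alpha uses 8 KiB pages


def first_difference(good, bad):
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (a, b) in enumerate(zip(good, bad)):
        if a != b:
            return offset
    if len(good) != len(bad):
        return min(len(good), len(bad))
    return None


def page_position(offset, page_size=ALPHA_PAGE_SIZE):
    """Return (page index, offset within page) for a byte offset."""
    return divmod(offset, page_size)


# Synthetic example: a text file whose contents turn into zero bytes
# from the second page onwards.
good = b"obj-$(CONFIG_EXAMPLE) += example.o\n" * 300
bad = good[:ALPHA_PAGE_SIZE] + bytes(len(good) - ALPHA_PAGE_SIZE)

offset = first_difference(good, bad)
page, within = page_position(offset)
print(f"first bad byte at {offset} (octal {offset:o}), page {page}, +{within}")
```

An offset within the page of 0 means the damage starts on a page boundary, which matches the page-granularity pattern described in this report.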

It has custom kernel configuration:

#
# Automatically generated file; DO NOT EDIT.
# Linux/alpha 5.9.0 Kernel Configuration
#
CONFIG_CC_VERSION_TEXT="gcc (Debian 10.2.0-13) 10.2.0"
CONFIG_CC_IS_GCC=y
CONFIG_GCC_VERSION=100200
CONFIG_LD_VERSION=23501
CONFIG_CLANG_VERSION=0
CONFIG_CC_CAN_LINK=y
CONFIG_CC_CAN_LINK_STATIC=y
CONFIG_CC_HAS_ASM_GOTO=y
CONFIG_CC_HAS_ASM_INLINE=y
CONFIG_IRQ_WORK=y

#
# General setup
#
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_BUILD_SALT=""
CONFIG_DEFAULT_INIT=""
CONFIG_DEFAULT_HOSTNAME="ds20e"
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
# CONFIG_WATCH_QUEUE is not set
CONFIG_CROSS_MEMORY_ATTACH=y
# CONFIG_USELIB is not set
CONFIG_AUDIT=y
CONFIG_HAVE_ARCH_AUDITSYSCALL=y
CONFIG_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_DOMAIN_HIERARCHY=y
CONFIG_GENERIC_MSI_IRQ=y
CONFIG_GENERIC_MSI_IRQ_DOMAIN=y
# CONFIG_GENERIC_IRQ_DEBUGFS is not set
# end of IRQ subsystem

CONFIG_GENERIC_CLOCKEVENTS=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ is not set
CONFIG_HIGH_RES_TIMERS=y
# end of Timers subsystem

CONFIG_PREEMPT_NONE=y

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
# CONFIG_BSD_PROCESS_ACCT is not set
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
CONFIG_TASK_XACCT=y
CONFIG_TASK_IO_ACCOUNTING=y
# CONFIG_PSI is not set
# end of CPU/Task time and stats accounting

#
# RCU Subs

Re: gcc crashes with general protection faults in 5.9.0-rc5

2020-09-14 Thread Meelis Roos
e029f3c0 R12: 01ab53e0
[ 1513.209020] R13: 0003 R14:  R15: 7f60da91b1f8
[ 1513.209023] Modules linked in: dm_mod md_mod cpufreq_conservative 
cpufreq_userspace cpufreq_powersave pktcdvd joydev snd_hda_codec_realtek 
snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi coretemp radeon 
snd_hda_intel snd_intel_dspcfg intel_powerclamp snd_hda_codec hwmon snd_hwdep 
kvm_intel ttm snd_hda_core kvm irqbypass iTCO_wdt snd_pcm_oss tpm_infineon 
iTCO_vendor_support crc32c_intel snd_mixer_oss mei_wdt psmouse evdev pcspkr 
tpm_tis snd_pcm tpm_tis_core snd_timer e1000e snd lpc_ich tpm mfd_core rng_core 
soundcore acpi_cpufreq loop i2c_dev parport_pc lp parport ip_tables x_tables 
autofs4
[ 1513.209048] ---[ end trace 5ccb97e370c341f7 ]---
[ 1513.209051] RIP: 0010:ext4_readpage+0xa/0x50
[ 1513.209053] Code: ff a9 00 00 00 10 74 0b 66 83 bf e2 02 00 00 00 74 01 c3 31 d2 
e9 46 ea 01 00 66 0f 1f 44 00 00 41 54 49 89 f4 55 48 8b 46 18 <48> 8b 28 48 8b 
85 68 ff ff ff a9 00 00 00 10 74 1b 66 83 bd e2 02
[ 1513.209055] RSP: :96b18b09fd88 EFLAGS: 00010286
[ 1513.209057] RAX: dead0400 RBX: 96b18b09fe60 RCX: 
[ 1513.209058] RDX: 0001 RSI: c54a413fdf80 RDI: 8d8f41c3c800
[ 1513.209059] RBP: c54a413fdf80 R08: 0005 R09: 8d8f9bd61e50
[ 1513.209061] R10:  R11: 8d8f41c3c800 R12: c54a413fdf80
[ 1513.209062] R13: 0b7e R14: 8d8efa04bb18 R15: 8d8efa04bc90
[ 1513.209065] FS:  7f60e012bf00() GS:8d8f9bc0() 
knlGS:
[ 1513.209067] CS:  0010 DS:  ES:  CR0: 80050033
[ 1513.209068] CR2: 0112cc96 CR3: 0000bb72e000 CR4: 06f0

--
Meelis Roos 


Re: gcc crashes with general protection faults in 5.9.0-rc3-00091-ge28f0104343d

2020-09-11 Thread Meelis Roos

Replying to myself:

This is 5.9.0-rc3-00091-ge28f0104343d on a Lenovo t460s that has run fine up to 
5.8.0.


Now I reproduced the same problem with 5.9.0-rc3 on a HP desktop with Core2Quad 
CPU. The call trace is very similar and it's crashing gcc again while compiling 
5.9-rc4.

But it seems 5.9-rc4 cures it here as well - whatever the reason might have 
been.

Nope, the reason was nondeterminism - it happened on the Core2Quad running 
5.9-rc4 while trying to compile today's Linux from git.

--
Meelis Roos 


Re: gcc crashes with general protection faults in 5.9.0-rc3-00091-ge28f0104343d

2020-09-08 Thread Meelis Roos
a08308d03d58 EFLAGS: 00010286
[307299.392060] RAX: dead0400 RBX: a08308d03e38 RCX: 

[307299.392061] RDX: 0001 RSI: de94c0d00ec0 RDI: 
9661c786ca00
[307299.392062] RBP: de94c0d00ec0 R08: 0001 R09: 

[307299.392063] R10: 0071 R11: 9661c786ca00 R12: 
de94c0d00ec0
[307299.392064] R13: 063b R14: 96636d3aaea0 R15: 
96636d3ab018
[307299.392065] FS:  7f7871446f00() GS:966396c8() 
knlGS:
[307299.392067] CS:  0010 DS:  ES:  CR0: 80050033
[307299.392068] CR2: 00a3b1f0 CR3: 6c3e2003 CR4: 
003706e0
[307299.392069] Call Trace:
[307299.392073]  filemap_fault+0x193/0x7c0
[307299.392075]  ext4_filemap_fault+0x28/0x3a
[307299.392078]  __do_fault+0x31/0xf0
[307299.392080]  handle_mm_fault+0xf1a/0x14c0
[307299.392084]  do_user_addr_fault+0x1b3/0x3e0
[307299.392087]  exc_page_fault+0x61/0x130
[307299.392089]  ? asm_exc_page_fault+0x8/0x30
[307299.392091]  asm_exc_page_fault+0x1e/0x30
[307299.392093] RIP: 0033:0xa3b620
[307299.392096] Code: Bad RIP value.
[307299.392097] RSP: 002b:7ffe4b382018 EFLAGS: 00010202
[307299.392099] RAX: 7f786fd32980 RBX: 7f786fd32a18 RCX: 

[307299.392100] RDX: 0002 RSI: 0001 RDI: 
7f786fd4ee70
[307299.392101] RBP:  R08:  R09: 
00c0
[307299.392102] R10: 0140 R11: 002f R12: 
0001
[307299.392103] R13:  R14:  R15: 0000

--
Meelis Roos


Re: 5.9-rc4: modpost undefined symbols + relocation in read-only section `.head.text'

2020-09-08 Thread Meelis Roos

Replying to myself:


This is 5.9-rc4 git on a specific amd64 machine with Debian unstable and custom 
kernel config. 5.8 compiled and worked fine, I have seen something like this 
with different 5.9-git commits. I made sure my binutils and gcc-10 are up to 
date in Debian unstable and retried with 5.9-rc4. Still I see the same during 
build (have not tried booting it more than once after a failed boot). This only 
happens on this specific computer and is reproducible after make clean, other 
tested machines with Debian unstable toolchain are fine. Kernel config is below.

I found another Debian amd64 machine that exhibits the "relocation in read-only 
section `.head.text'" warning but no symbol errors from MODPOST.

The kernel fails to boot; grub selects the next kernel automatically, so the 
image format is probably bad.

  LDS arch/x86/boot/compressed/vmlinux.lds
  AS  arch/x86/boot/compressed/head_64.o
  VOFFSET arch/x86/boot/compressed/../voffset.h
  CC  arch/x86/boot/compressed/string.o
  CC  arch/x86/boot/compressed/cmdline.o
  CC  arch/x86/boot/compressed/error.o
  OBJCOPY arch/x86/boot/compressed/vmlinux.bin
  RELOCS  arch/x86/boot/compressed/vmlinux.relocs
  CC  arch/x86/boot/compressed/cpuflags.o
  CC  arch/x86/boot/compressed/early_serial_console.o
  CC  arch/x86/boot/compressed/kaslr.o
  CC  arch/x86/boot/compressed/kaslr_64.o
  AS  arch/x86/boot/compressed/mem_encrypt.o
  CC  arch/x86/boot/compressed/pgtable_64.o
  CC  arch/x86/boot/compressed/acpi.o
  AS  arch/x86/boot/compressed/efi_thunk_64.o
  CC  arch/x86/boot/compressed/misc.o
  LZMA    arch/x86/boot/compressed/vmlinux.bin.lzma
  MKPIGGY arch/x86/boot/compressed/piggy.S
  AS  arch/x86/boot/compressed/piggy.o
  LD  arch/x86/boot/compressed/vmlinux
ld: arch/x86/boot/compressed/head_64.o: warning: relocation in read-only 
section `.head.text'
ld: warning: creating DT_TEXTREL in a PIE
  ZOFFSET arch/x86/boot/zoffset.h
  OBJCOPY arch/x86/boot/vmlinux.bin
  AS  arch/x86/boot/header.o
  LD  arch/x86/boot/setup.elf
  OBJCOPY arch/x86/boot/setup.bin
  BUILD   arch/x86/boot/bzImage
Setup is 14460 bytes (padded to 14848 bytes).
System is 4785 kB
CRC f036c6cb
Kernel: arch/x86/boot/bzImage is ready  (#322)
^Cmake[1]: *** [scripts/Makefile.modpost:117: __modpost] Interrupt
make: *** [Makefile:1392: modules] Interrupt

Config:
#
# Automatically generated file; DO NOT EDIT.
# Linux/x86 5.9.0-rc4 Kernel Configuration
#
CONFIG_CC_VERSION_TEXT="gcc (Debian 10.2.0-6) 10.2.0"
CONFIG_CC_IS_GCC=y
CONFIG_GCC_VERSION=100200
CONFIG_LD_VERSION=23500
CONFIG_CLANG_VERSION=0
CONFIG_CC_CAN_LINK=y
CONFIG_CC_CAN_LINK_STATIC=y
CONFIG_CC_HAS_ASM_GOTO=y
CONFIG_CC_HAS_ASM_INLINE=y
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_TABLE_SORT=y
CONFIG_THREAD_INFO_IN_TASK=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_BUILD_SALT=""
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
CONFIG_HAVE_KERNEL_ZSTD=y
# CONFIG_KERNEL_GZIP is not set
# CONFIG_KERNEL_BZIP2 is not set
CONFIG_KERNEL_LZMA=y
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
# CONFIG_KERNEL_ZSTD is not set
CONFIG_DEFAULT_INIT=""
CONFIG_DEFAULT_HOSTNAME="prometheus"
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
CONFIG_WATCH_QUEUE=y
CONFIG_CROSS_MEMORY_ATTACH=y
# CONFIG_USELIB is not set
CONFIG_AUDIT=y
CONFIG_HAVE_ARCH_AUDITSYSCALL=y
CONFIG_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_GENERIC_IRQ_MIGRATION=y
CONFIG_GENERIC_IRQ_INJECTION=y
CONFIG_HARDIRQS_SW_RESEND=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_DOMAIN_HIERARCHY=y
CONFIG_GENERIC_MSI_IRQ=y
CONFIG_GENERIC_MSI_IRQ_DOMAIN=y
CONFIG_GENERIC_IRQ_MATRIX_ALLOCATOR=y
CONFIG_GENERIC_IRQ_RESERVATION_MODE=y
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
# CONFIG_GENERIC_IRQ_DEBUGFS is not set
# end of IRQ subsystem

CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_INIT=y
CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_HAVE_POSIX_CPU_TIMERS_TASK_WORK=y
CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
# CONFIG_NO_HZ_IDLE is not set
CONFIG_NO_HZ_FULL=y
CONFIG_CONTEXT_TRACKING=y
# CONFIG_CONTEXT_TRACKING_FORCE is not set
# CONFIG_NO_HZ is not set
CONFIG_HIGH_RES_TIMERS=y
# end of Timers subsystem

# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set

#
# CPU/Task time and stats accounting
#
CONFIG_VIRT_CPU_ACCOUNTING=y
CON

Re: gcc crashes with general protection faults in 5.9.0-rc3-00091-ge28f0104343d

2020-09-07 Thread Meelis Roos

Following up on yesterday's mail:


This is 5.9.0-rc3-00091-ge28f0104343d on a Lenovo t460s that has run fine up to 
5.8.0.

Today I tried reproducing my linking problem with git kernel on my laptop and 
got segmentation faults in gcc. This is probably the corresponding dmesg part:

0xdead0400 looks like some kind of poisoning.

[307299.392045] general protection fault, probably for non-canonical address 
0xdead0400:  [#1] SMP PTI


Was not reproducible in 5.9-rc4 while recompiling the kernel in a loop for 8 
hours.

--
Meelis Roos 


5.9-rc4: modpost undefined symbols + relocation in read-only section `.head.text'

2020-09-07 Thread Meelis Roos

This is 5.9-rc4 git on a specific amd64 machine with Debian unstable and custom 
kernel config. 5.8 compiled and worked fine, I have seen something like this 
with different 5.9-git commits. I made sure my binutils and gcc-10 are up to 
date in Debian unstable and retried with 5.9-rc4. Still I see the same during 
build (have not tried booting it more than once after a failed boot). This only 
happens on this specific computer and is reproducible after make clean, other 
tested machines with Debian unstable toolchain are fine. Kernel config is below.

  ...
  CC  arch/x86/boot/cpu.o
  LDS arch/x86/boot/compressed/vmlinux.lds
  AS  arch/x86/boot/compressed/kernel_info.o
  AS  arch/x86/boot/compressed/head_64.o
  VOFFSET arch/x86/boot/compressed/../voffset.h
  CC  arch/x86/boot/compressed/string.o
  CC  arch/x86/boot/compressed/cmdline.o
  CC  arch/x86/boot/compressed/error.o
  OBJCOPY arch/x86/boot/compressed/vmlinux.bin
  RELOCS  arch/x86/boot/compressed/vmlinux.relocs
  HOSTCC  arch/x86/boot/compressed/mkpiggy
  CC  arch/x86/boot/compressed/cpuflags.o
  CC  arch/x86/boot/compressed/early_serial_console.o
  CC  arch/x86/boot/compressed/kaslr.o
  CC  arch/x86/boot/compressed/kaslr_64.o
  AS  arch/x86/boot/compressed/mem_encrypt.o
  CC  arch/x86/boot/compressed/pgtable_64.o
  CC  arch/x86/boot/compressed/acpi.o
  XZKERN  arch/x86/boot/compressed/vmlinux.bin.xz
ERROR: modpost: "irq_poll_init" [drivers/scsi/lpfc/lpfc.ko] undefined!
ERROR: modpost: "irq_poll_sched" [drivers/scsi/lpfc/lpfc.ko] undefined!
ERROR: modpost: "irq_poll_complete" [drivers/scsi/lpfc/lpfc.ko] undefined!
  CC  arch/x86/boot/compressed/misc.o
make[1]: *** [scripts/Makefile.modpost:111: Module.symvers] Error 1
make[1]: *** Deleting file 'Module.symvers'
make: *** [Makefile:1392: modules] Error 2
make: *** Waiting for unfinished jobs
  MKPIGGY arch/x86/boot/compressed/piggy.S
  AS  arch/x86/boot/compressed/piggy.o
  LD  arch/x86/boot/compressed/vmlinux
ld: arch/x86/boot/compressed/head_64.o: warning: relocation in read-only 
section `.head.text'
ld: warning: creating DT_TEXTREL in a PIE
  ZOFFSET arch/x86/boot/zoffset.h
  OBJCOPY arch/x86/boot/vmlinux.bin
  AS  arch/x86/boot/header.o
  LD  arch/x86/boot/setup.elf
  OBJCOPY arch/x86/boot/setup.bin
  BUILD   arch/x86/boot/bzImage
Setup is 14396 bytes (padded to 14848 bytes).
System is 4649 kB
CRC 3b22552a
Kernel: arch/x86/boot/bzImage is ready  (#38)
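
Because the modpost errors are interleaved with other output in a parallel build, it can help to pull them out of a saved build log. The following is a small sketch; the function name `undefined_symbols` is made up, and the regular expression simply assumes the message format seen in the log above.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   ERROR: modpost: "irq_poll_init" [drivers/scsi/lpfc/lpfc.ko] undefined!
MODPOST_UNDEFINED = re.compile(
    r'^ERROR: modpost: "(?P<symbol>[^"]+)" \[(?P<module>[^\]]+)\] undefined!$'
)


def undefined_symbols(log_text):
    """Map each module to the symbols modpost reported as undefined."""
    missing = defaultdict(list)
    for line in log_text.splitlines():
        match = MODPOST_UNDEFINED.match(line.strip())
        if match:
            missing[match["module"]].append(match["symbol"])
    return dict(missing)


# Excerpt from the build output above.
log = '''\
  XZKERN  arch/x86/boot/compressed/vmlinux.bin.xz
ERROR: modpost: "irq_poll_init" [drivers/scsi/lpfc/lpfc.ko] undefined!
ERROR: modpost: "irq_poll_sched" [drivers/scsi/lpfc/lpfc.ko] undefined!
ERROR: modpost: "irq_poll_complete" [drivers/scsi/lpfc/lpfc.ko] undefined!
  CC  arch/x86/boot/compressed/misc.o
'''

result = undefined_symbols(log)
for module, symbols in result.items():
    print(module, "->", ", ".join(symbols))
```

All three symbols belong to one module and share the irq_poll_ prefix, so a reasonable next step would be to check whether the code providing them is enabled in this config.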


#
# Automatically generated file; DO NOT EDIT.
# Linux/x86 5.9.0-rc4 Kernel Configuration
#
CONFIG_CC_VERSION_TEXT="gcc (Debian 10.2.0-6) 10.2.0"
CONFIG_CC_IS_GCC=y
CONFIG_GCC_VERSION=100200
CONFIG_LD_VERSION=23500
CONFIG_CLANG_VERSION=0
CONFIG_CC_CAN_LINK=y
CONFIG_CC_CAN_LINK_STATIC=y
CONFIG_CC_HAS_ASM_GOTO=y
CONFIG_CC_HAS_ASM_INLINE=y
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_TABLE_SORT=y
CONFIG_THREAD_INFO_IN_TASK=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_BUILD_SALT=""
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
CONFIG_HAVE_KERNEL_ZSTD=y
# CONFIG_KERNEL_GZIP is not set
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
CONFIG_KERNEL_XZ=y
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
# CONFIG_KERNEL_ZSTD is not set
CONFIG_DEFAULT_INIT=""
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
# CONFIG_WATCH_QUEUE is not set
CONFIG_CROSS_MEMORY_ATTACH=y
# CONFIG_USELIB is not set
CONFIG_AUDIT=y
CONFIG_HAVE_ARCH_AUDITSYSCALL=y
CONFIG_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_GENERIC_IRQ_MIGRATION=y
CONFIG_GENERIC_IRQ_INJECTION=y
CONFIG_HARDIRQS_SW_RESEND=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_DOMAIN_HIERARCHY=y
CONFIG_GENERIC_MSI_IRQ=y
CONFIG_GENERIC_MSI_IRQ_DOMAIN=y
CONFIG_IRQ_MSI_IOMMU=y
CONFIG_GENERIC_IRQ_MATRIX_ALLOCATOR=y
CONFIG_GENERIC_IRQ_RESERVATION_MODE=y
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
# CONFIG_GENERIC_IRQ_DEBUGFS is not set
# end of IRQ subsystem

CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_INIT=y
CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_HAVE_POSIX_CPU_TIMERS_TASK_WORK=y
CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ_FULL is not set
# CONFIG_NO_HZ is not set
CONFIG_HIGH_RES_TIMERS=y
# end of Timers subsystem

CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set

#
# CPU/Tas

Re: [bisected] "mm/vmalloc: Add flag for freeing of special permsissions" corrupts memory on ia64

2019-07-08 Thread Meelis Roos

I am out of the office and don't have access to this hardware either. I
will try to find someone at Intel that does to speed this up. In the
meantime I can send you a logging patch to do some sanity checks if you
are able to run it.


I am also cut off from testing anything - it seems the air conditioning
unit in my test site has failed for good now and the earliest I can test
anything is next week.


I think I found your earlier mail, and it said 5.2-rc1 did not show the
problem. I guess this wasn't the case after further testing, but 5.1
continued to be problem free?


Yes, 5.2-rc1 was problematic in retesting, and 5.1 was OK.

I also started suspecting binutils upgrade meanwhile - I upgraded binutils
to 2.31.1-p5 in Gentoo right after booting into 5.1, but the bisection
results were finally consistent so I did not look into binutils versions
further. gcc has not changed for me recently.

--
Meelis Roos 


[bisected] "mm/vmalloc: Add flag for freeing of special permsissions" corrupts memory on ia64

2019-07-04 Thread Meelis Roos

I noticed that while 5.1 works on my HP Integrity RX2620, 5.2-rc6 crashed on 
boot nondeterministically. Bisecting it took many tries since it does not 
happen on each boot, and when it happens, the symptoms are different each time. 
But now the bisection converged to

868b104d7379e28013e9d48bdd2db25e0bdcf751 is the first bad commit
commit 868b104d7379e28013e9d48bdd2db25e0bdcf751
Author: Rick Edgecombe 
Date:   Thu Apr 25 17:11:36 2019 -0700

mm/vmalloc: Add flag for freeing of special permsissions

Add a new flag VM_FLUSH_RESET_PERMS, for enabling vfree operations to
immediately clear executable TLB entries before freeing pages, and handle
resetting permissions on the directmap. This flag is useful for any kind
of memory with elevated permissions, or where there can be related
permissions changes on the directmap. Today this is RO+X and RO memory.

Although this enables directly vfreeing non-writeable memory now,
non-writable memory cannot be freed in an interrupt because the allocation
itself is used as a node on deferred free list. So when RO memory needs to
be freed in an interrupt the code doing the vfree needs to have its own
work queue, as was the case before the deferred vfree list was added to
vmalloc.

For architectures with set_direct_map_ implementations this whole operation
can be done with one TLB flush when centralized like this. For others with
directmap permissions, currently only arm64, a backup method using
set_memory functions is used to reset the directmap. When arm64 adds
set_direct_map_ functions, this backup can be removed.

When the TLB is flushed to both remove TLB entries for the vmalloc range
mapping and the direct map permissions, the lazy purge operation could be
done to try to save a TLB flush later. However today vm_unmap_aliases
could flush a TLB range that does not include the directmap. So a helper
is added with extra parameters that can allow both the vmalloc address and
the direct mapping to be flushed during this operation. The behavior of the
normal vm_unmap_aliases function is unchanged.

Suggested-by: Dave Hansen 
Suggested-by: Andy Lutomirski 
Suggested-by: Will Deacon 
Signed-off-by: Rick Edgecombe 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Borislav Petkov 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Nadav Amit 
Cc: Rik van Riel 
Cc: Thomas Gleixner 
Link: https://lkml.kernel.org/r/20190426001143.4983-17-na...@vmware.com
Signed-off-by: Ingo Molnar 

:040000 040000 6af7c46e4736f2b80e363d7d7793253f9f279ea4 58066de53107eab0705398b5d0c407424c138a86 M  include
:040000 040000 87cf40e161342a2a1c2dd49099740dc413b32449 19a0d6f5ba799f7f1d43ee1f0aebcc46be0e96bd M  mm


The symptoms seem to be often module loading related.

One typical scenario is modprobes failing and udevd agents killed:

Jul  1 09:17:57 rx2620 kernel: udevd[421]: worker [504] 
/devices/pci:00/:00:01.0 is taking a long time
Jul  1 09:17:57 rx2620 kernel: udevd[421]: worker [495] 
/devices/pci:00/:00:01.1 is taking a long time
Jul  1 09:19:57 rx2620 kernel: udevd[421]: worker [504] 
/devices/pci:00/:00:01.0 timeout; kill it
Jul  1 09:19:57 rx2620 kernel: udevd[421]: seq 626 
'/devices/pci:00/:00:01.0' killed
Jul  1 09:19:57 rx2620 kernel: udevd[421]: worker [495] 
/devices/pci:00/:00:01.1 timeout; kill it
Jul  1 09:19:57 rx2620 kernel: udevd[421]: seq 627 
'/devices/pci:00/:00:01.1' killed
Jul  1 09:19:57 rx2620 kernel: udevd[421]: worker [495] terminated by signal 9 
(Killed)
Jul  1 09:19:57 rx2620 kernel: udevd[421]: worker [495] failed while handling 
'/devices/pci:00/:00:01.1'
Jul  1 09:19:57 rx2620 kernel: udevd[421]: worker [504] terminated by signal 9 
(Killed)
Jul  1 09:19:57 rx2620 kernel: udevd[421]: worker [504] failed while handling 
'/devices/pci:00/:00:01.0'


Or:

[   13.363452] udevd[498]: IA-64 Illegal operation fault 0 [1]
[   13.363452] Modules linked in: ehci_pci(+) e1000(+) ehci_hcd usbcore 
usb_common pata_cmd64x libata efivars
[   13.363452]
[   13.363452] CPU: 0 PID: 498 Comm: udevd Not tainted 5.2.0-rc6 #46
[   13.363452] Hardware name: hp server rx2620   , BIOS 04.29   
 11/30/2007
[   13.363452] psr : 101008026010 ifs : 8003 ip  : 
[]Not tainted (5.2.0-rc6)

Or (as mentioned in my first mail about the crash):

[   13.471600] udevd[498]: NaT consumption 2216203124768 [1]
[   13.471600] Modules linked in: L^A() ohci_hcd ehci_pci ehci_hcd usbcore 
pata_cmd64x e1000(+) usb_common libata efivars

[   13.471600] CPU: 0 PID: 498 Comm: udevd Not tainted 
5.2.0-rc6-00015-g249155c20f9b #47
[   13.473692] Hardware name: hp server rx2620   , BIOS 04.29   
 11/30

sock_prot_inuse_add unaligned access and crash on sparc64

2019-06-19 Thread Meelis Roos

Tried today's git on Sun Netra 240 (sparc64). Got a bootup crash with a custom,
machine-specific config:

[   47.760841] Kernel unaligned access at TPC[7bf124] 
sock_prot_inuse_add+0x4/0x20
[   47.856969] Unable to handle kernel paging request in mna handler
[   47.856972]  at virtual address 14ee258a
[   47.997703] current->{active_,}mm->context = 0001
[   48.073193] current->{active_,}mm->pgd = fff000133cc0c000
[   48.144105]   \|/  \|/
[   48.144105]   "@'/ .. \`@"
[   48.144105]   /_| \__/ |_\
[   48.144105]  \__U_/
[   48.337408] systemd(1): Oops [#1]
[   48.380862] CPU: 0 PID: 1 Comm: systemd Not tainted 
5.2.0-rc5-00224-gbed3c0d84e7e #8
[   48.482657] TSTATE: 004411001605 TPC: 007bf124 TNPC: 
007bf128 Y: Not tainted
[   48.611912] TPC: 
[   48.671370] g0: ff00 g1: 0200 g2: 0006 
g3: 
[   48.785748] g4: fff000133c0a5760 g5: fff000133ecc4000 g6: fff000133c0bc000 
g7: 001e
[   48.900121] o0: 14ee240a o1: 00afef30 o2:  
o3: fff000133c0a5d50
[   49.014495] o4: fff000133c0a5760 o5:  sp: fff000133c0bf061 
ret_pc: 008aac78
[   49.133456] RPC: 
[   49.196349] l0: 07feff80d5d0 l1:  l2: fff000133cd11f88 
l3: 
[   49.310725] l4:  l5:  l6:  
l7: fff100869da0
[   49.425099] i0: 008aac48 i1: 0200 i2: 0001 
i3: 
[   49.539475] i4:  i5:  i6: fff000133c0bf111 
i7: 007c05f0
[   49.653851] I7: <__sk_destruct+0x10/0x180>
[   49.707597] Call Trace:
[   49.739626]  [007c05f0] __sk_destruct+0x10/0x180
[   49.809396]  [008abb1c] unix_release_sock+0x1bc/0x260
[   49.884882]  [008abbd0] unix_release+0x10/0x40
[   49.952361]  [007ba96c] __sock_release+0x2c/0xc0
[   50.022130]  [007baa0c] sock_close+0xc/0x20
[   50.086187]  [00594a70] __fput+0x90/0x220
[   50.147944]  [0047ee80] task_work_run+0x80/0xc0
[   50.216574]  [0042e23c] do_notify_resume+0x5c/0x80
[   50.288624]  [00404b48] __handle_signal+0xc/0x30
[   50.358387] Disabling lock debugging due to kernel taint
[   50.428159] Caller[007c05f0]: __sk_destruct+0x10/0x180
[   50.504788] Caller[008abb1c]: unix_release_sock+0x1bc/0x260
[   50.587138] Caller[008abbd0]: unix_release+0x10/0x40
[   50.661483] Caller[007ba96c]: __sock_release+0x2c/0xc0
[   50.738110] Caller[007baa0c]: sock_close+0xc/0x20
[   50.809023] Caller[00594a70]: __fput+0x90/0x220
[   50.877647] Caller[0047ee80]: task_work_run+0x80/0xc0
[   50.953136] Caller[0042e23c]: do_notify_resume+0x5c/0x80
[   51.032053] Caller[00404b48]: __handle_signal+0xc/0x30
[   51.108685] Caller[fff100205934]: 0xfff100205934
[   51.178449] Instruction DUMP:
[   51.178451]  0100
[   51.217335]  0100
[   51.248217]  c40260c8
[   51.279096] 
[   51.309979]  8528b002
[   51.340858]  82004002
[   51.371742]  c4004005
[   51.402622]  9400800a
[   51.433505]  81c3e008
[   51.464385]
[   51.514706] Kernel panic - not syncing: Aiee, killing interrupt handler!
[   51.602778] Press Stop-A (L1-A) from sun keyboard or send break
[   51.602778] twice on console to return to the boot prom
[   51.749170] ---[ end Kernel panic - not syncing: Aiee, killing interrupt 
handler! ]---

Config:

#
# Automatically generated file; DO NOT EDIT.
# Linux/sparc64 5.2.0-rc5 Kernel Configuration
#

#
# Compiler: gcc (Debian 8.3.0-7) 8.3.0
#
CONFIG_CC_IS_GCC=y
CONFIG_GCC_VERSION=80300
CONFIG_CLANG_VERSION=0
CONFIG_CC_HAS_ASM_GOTO=y
CONFIG_CC_HAS_WARN_MAYBE_UNINITIALIZED=y
CONFIG_CONSTRUCTORS=y
CONFIG_IRQ_WORK=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_BUILD_SALT=""
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
CONFIG_CROSS_MEMORY_ATTACH=y
# CONFIG_USELIB is not set
# CONFIG_AUDIT is not set
CONFIG_HAVE_ARCH_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_IRQ_PREFLOW_FASTEOI=y
CONFIG_IRQ_DOMAIN=y
CONFIG_GENERIC_MSI_IRQ=y
CONFIG_SPARSE_IRQ=y
# CONFIG_GENERIC_IRQ_DEBUGFS is not set
# end of IRQ subsystem

CONFIG_ARCH_CLOCKSOURCE_DATA=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ_FULL is not set
# CONFIG_NO_HZ is not set
CONFIG_HIGH_RES_TIMERS=y
# end of Timers subsystem

CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
# CONFIG_VIRT_CPU_ACCOUNTING_GEN 

sparc64 crash around deactivate_slab

2019-06-19 Thread Meelis Roos

The same Sun V445 that gave me BPF errors had a different error with today's
git, just idling:

[   51.530195] Kernel unaligned access at TPC[58265c] 
deactivate_slab.isra.28+0xfc/0x420
[   51.675010] Unable to handle kernel paging request in mna handler
[   51.675013]  at virtual address 91d0200591d02005
[   51.828736] current->{active_,}mm->context = 0026
[   51.911239] current->{active_,}mm->pgd = fff000323d3d8000
[   51.988743]   \|/  \|/
[   51.988743]   "@'/ .. \`@"
[   51.988743]   /_| \__/ |_\
[   51.988743]  \__U_/
[   52.200013] swapper/0(0): Oops [#1]
[   52.250008] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 
5.2.0-rc5-00224-gbed3c0d84e7e #33
[   52.365015] TSTATE: 004480e01600 TPC: 0058265c TNPC: 
00582660 Y: Not tainted
[   52.506274] TPC: 
[   52.578772] g0: 0001 g1:  g2: fff000323ed26000 
g3: 
[   52.703780] g4: fff000323c0e6300 g5: fff000323ed26000 g6: fff000323c134000 
g7: 0200
[   52.828786] o0:  o1: 000c71090c88 o2: 0001 
o3: 
[   52.953792] o4: 000e o5: 000e sp: fff000323c1370b1 
ret_pc: 005825a0
[   53.083799] RPC: 
[   53.156299] l0: fff000323d378340 l1: 007f0101 l2: 0001 
l3: 000c7109baa8
[   53.281307] l4: 000f l5: 00210d00 l6: 000c71090c88 
l7: 
[   53.406312] i0: fff000323d18b1e0 i1: 00800101 i2: 91d0200591d02005 
i3: fff000323f814e80
[   53.531319] i4: fff000323f814e90 i5: 91d0200591d02005 i6: fff000323c1371c1 
i7: 00582c18
[   53.656328] I7: 
[   53.715073] Call Trace:
[   53.750076]  [00582c18] flush_cpu_slab+0x38/0x60
[   53.826333]  [004d02a8] flush_smp_call_function_queue+0x68/0x180
[   53.922593]  [0093585c] smp_call_function_client+0x1c/0x40
[   54.011341]  [004208d4] tl0_irq6+0x14/0x20
[   54.080098]  [0042c8b4] arch_cpu_idle+0x94/0xa0
[   54.155104]  [0048b118] do_idle+0x118/0x1a0
[   54.225099]  [0048b3bc] cpu_startup_entry+0x1c/0x40
[   54.305102]  [00a71984] 0xa71984
[   54.361354]  [4000] 0x4000
[   54.420106] Disabling lock debugging due to kernel taint
[   54.496362] Caller[00582c18]: flush_cpu_slab+0x38/0x60
[   54.580116] Caller[004d02a8]: 
flush_smp_call_function_queue+0x68/0x180
[   54.683873] Caller[0093585c]: smp_call_function_client+0x1c/0x40
[   54.780124] Caller[004208d4]: tl0_irq6+0x14/0x20
[   54.856378] Caller[0042c8a8]: arch_cpu_idle+0x88/0xa0
[   54.938882] Caller[0048b118]: do_idle+0x118/0x1a0
[   55.016386] Caller[0048b3bc]: cpu_startup_entry+0x1c/0x40
[   55.103889] Caller[00a71984]: 0xa71984
[   55.167641] Caller[4000]: 0x4000
[   55.233894] Instruction DUMP:
[   55.233895]  c2758000
[   55.276395]  c2062020
[   55.310146]  b410001d
[   55.343897] 
[   55.377650]  02c1c004
[   55.411401]  ee5da020
[   55.445153]  106fffdf
[   55.478904]  ba17
[   55.512655]  f85da028
[   55.546407]
[   55.601412] Kernel panic - not syncing: Aiee, killing interrupt handler!
[   55.697685] Press Stop-A (L1-A) from sun keyboard or send break
[   55.697685] twice on console to return to the boot prom
[   55.857678] ---[ end Kernel panic - not syncing: Aiee, killing interrupt 
handler! ]---

Config:

#
# Automatically generated file; DO NOT EDIT.
# Linux/sparc64 5.2.0-rc5 Kernel Configuration
#

#
# Compiler: gcc (Debian 8.3.0-7) 8.3.0
#
CONFIG_CC_IS_GCC=y
CONFIG_GCC_VERSION=80300
CONFIG_CLANG_VERSION=0
CONFIG_CC_HAS_ASM_GOTO=y
CONFIG_CC_HAS_WARN_MAYBE_UNINITIALIZED=y
CONFIG_IRQ_WORK=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_BUILD_SALT=""
CONFIG_DEFAULT_HOSTNAME="v445"
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
CONFIG_CROSS_MEMORY_ATTACH=y
# CONFIG_USELIB is not set
CONFIG_AUDIT=y
CONFIG_HAVE_ARCH_AUDITSYSCALL=y
CONFIG_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_IRQ_PREFLOW_FASTEOI=y
CONFIG_GENERIC_MSI_IRQ=y
CONFIG_SPARSE_IRQ=y
# CONFIG_GENERIC_IRQ_DEBUGFS is not set
# end of IRQ subsystem

CONFIG_ARCH_CLOCKSOURCE_DATA=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ_FULL is not set
# CONFIG_NO_HZ is not set
CONFIG_HIGH_RES_TIMERS=y
# end of Timers subsystem

CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
# CONFIG_VIRT_CPU_ACCOUNTING_GEN is not set
# CONFIG_BSD_PROCESS_ACCT is not set
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
CONFIG_TASK_XACCT=y
CONFIG_TA

Re: [PATCH] vmalloc: Don't use flush flag when no exec perm

2019-05-30 Thread Meelis Roos

The addition of VM_FLUSH_RESET_PERMS for BPF JIT allocations was
bisected to prevent boot on an UltraSparc III machine. It was found that
sometime shortly after the TLB flush this flag does on vfree of the BPF
program, the machine hung. Further investigation showed that before any of
the changes for this flag were introduced, with CONFIG_DEBUG_PAGEALLOC
configured (which does a similar TLB flush of the vmalloc range on
every vfree), this machine also hung shortly after the first vmalloc
unmap/free.

So the evidence points to there being some existing issue with the
vmalloc TLB flushes, but it's still unknown exactly why these hangs are
happening on sparc. It is also unknown when someone with this hardware
could resolve this, and in the meantime using this flag on it turns a
lurking behavior into something that prevents boot.


The sparc TLB flush issue has been bisected and is being worked on now,
so hopefully we won't need this patch:
https://marc.info/?l=linux-sparc&m=155915694304118&w=2


And the sparc64 patch that fixes CONFIG_DEBUG_PAGEALLOC also fixes booting
of the latest git kernel on Sun V445 where my problem initially happened.

--
Meelis Roos 


Re: [PATCH v2] vmalloc: Fix issues with flush flag

2019-05-20 Thread Meelis Roos

Switch VM_FLUSH_RESET_PERMS to use a regular TLB flush instead of
vm_unmap_aliases() and fix calculation of the direct map for the
CONFIG_ARCH_HAS_SET_DIRECT_MAP case.

Meelis Roos reported issues with the new VM_FLUSH_RESET_PERMS flag on a
sparc machine. On investigation some issues were noticed:

1. The calculation of the direct map address range to flush was wrong.
This could cause problems on x86 if a RO direct map alias ever got loaded
into the TLB. This shouldn't normally happen, but it could cause the
permissions to remain RO on the direct map alias, and then the page
would return from the page allocator to some other component as RO and
cause a crash.

2. Calling vm_unmap_aliases() on vfree could potentially be a lot of work to
do on a free operation. Simply flushing the TLB instead of the whole
vm_unmap_aliases() operation makes the frees faster and pushes the heavy
work to happen on allocation where it would be more expected.
In addition to the extra work, vm_unmap_aliases() takes some locks including
a long hold of vmap_purge_lock, which will make all other
VM_FLUSH_RESET_PERMS vfrees wait while the purge operation happens.

3. page_address() can have locking on some configurations, so skip calling
this when possible to further speed this up.
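
Item 1 above does not spell out the calculation itself. As a userland
illustration only (not the kernel code), one way to get a flush range that
covers the direct map alias of every page is to take the minimum start and
the maximum end over all page addresses. The names covering_range,
struct flush_range and SKETCH_PAGE_SIZE are hypothetical and do not come
from mm/vmalloc.c; a zero address stands in for a page with no direct map
address.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL /* assumed page size, for this sketch only */

/* Hypothetical result type: [start, end) covers every mapped page. */
struct flush_range {
    unsigned long start;
    unsigned long end;   /* exclusive */
    int valid;           /* 0 if no page had an address */
};

/*
 * Walk the per-page addresses and widen the range so that it covers
 * each page from its first byte to one past its last byte.
 */
static struct flush_range covering_range(const unsigned long *addrs, size_t n)
{
    struct flush_range r = { (unsigned long)-1, 0UL, 0 };

    for (size_t i = 0; i < n; i++) {
        unsigned long a = addrs[i];

        if (!a)
            continue; /* no direct map address for this page */
        if (a < r.start)
            r.start = a;
        if (a + SKETCH_PAGE_SIZE > r.end)
            r.end = a + SKETCH_PAGE_SIZE;
        r.valid = 1;
    }
    return r;
}
```

With page addresses 0x3000, 0x1000 and 0x8000 the sketch yields the range
0x1000-0x9000; a range that stopped at the last page's start address would
miss that page.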

Fixes: 868b104d7379 ("mm/vmalloc: Add flag for freeing of special permsissions")
Reported-by: Meelis Roos
Cc: Meelis Roos
Cc: Peter Zijlstra
Cc: "David S. Miller"
Cc: Dave Hansen
Cc: Borislav Petkov
Cc: Andy Lutomirski
Cc: Ingo Molnar
Cc: Nadav Amit
Signed-off-by: Rick Edgecombe
---

Changes since v1:
  - Update commit message with more detail
  - Fix flush end range on !CONFIG_ARCH_HAS_SET_DIRECT_MAP case


It does not work on my V445 where the initial problem happened.

[   46.582633] systemd[1]: Detected architecture sparc64.

Welcome to Debian GNU/Linux 10 (buster)!

[   46.759048] systemd[1]: Set hostname to .
[   46.831383] systemd[1]: Failed to bump fs.file-max, ignoring: Invalid 
argument
[   67.989695] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[   68.074706] rcu: 0-...!: (0 ticks this GP) idle=5c6/1/0x4000 
softirq=33/33 fqs=0
[   68.198443] rcu: 2-...!: (0 ticks this GP) idle=e7e/1/0x4000 
softirq=67/67 fqs=0
[   68.322198]  (detected by 1, t=5252 jiffies, g=-939, q=108)
[   68.402204]   CPU[  0]: TSTATE[80001603] TPC[0043f298] 
TNPC[0043f29c] TASK[systemd-debug-g:89]
[   68.556001]  TPC[smp_synchronize_tick_client+0x18/0x1a0] 
O7[0xfff1691c] I7[xcall_sync_tick+0x1c/0x2c] 
RPC[alloc_set_pte+0xf4/0x300]
[   68.750973]   CPU[  2]: TSTATE[80001600] TPC[0043f298] 
TNPC[0043f29c] TASK[systemd-cryptse:88]
[   68.904741]  TPC[smp_synchronize_tick_client+0x18/0x1a0] 
O7[filemap_map_pages+0x3cc/0x3e0] I7[xcall_sync_tick+0x1c/0x2c] 
RPC[handle_mm_fault+0xa0/0x180]
[   69.115991] rcu: rcu_sched kthread starved for 5252 jiffies! g-939 f0x0 
RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=3
[   69.262239] rcu: RCU grace-period kthread stack dump:
[   69.334741] rcu_sched   I010  2 0x0600
[   69.413495] Call Trace:
[   69.448501]  [0093325c] schedule+0x1c/0xc0
[   69.517253]  [00936c74] schedule_timeout+0x154/0x260
[   69.598514]  [004b65a4] rcu_gp_kthread+0x4e4/0xac0
[   69.677261]  [0047ecfc] kthread+0xfc/0x120
[   69.746018]  [004060a4] ret_from_fork+0x1c/0x2c
[   69.821014]  [] 0x0

and hangs here, software watchdog kicks in soon.

--
Meelis Roos


Re: DISCONTIGMEM is deprecated

2019-04-23 Thread Meelis Roos

ia64 (looks complicated ...)


Well as far as I can tell it was not even used 12 or so years ago on
Itanium when I worked on that stuff.


My notes say that on UP ia64 (RX2620), !NUMA was broken with both
SPARSEMEM and DISCONTIGMEM. NUMA+SPARSEMEM or !NUMA worked. Even
NUMA+DISCONTIGMEM worked, that was my config on 2-CPU RX2660.

--
Meelis Roos



5.1-rc6: UBSAN: Undefined behaviour in mm/compaction.c:1167:30

2019-04-22 Thread Meelis Roos

The warning UBSAN: Undefined behaviour in mm/compaction.c:1167:30 happened with 
5.1-rc6 on UP 32-bit P4 PC with highmem.

[   95.135408] 

[   95.135478] UBSAN: Undefined behaviour in mm/compaction.c:1167:30
[   95.135528] shift exponent 32 is too large for 32-bit type 'long unsigned 
int'
[   95.135579] CPU: 0 PID: 13 Comm: kcompactd0 Not tainted 5.1.0-rc6 #71
[   95.135626] Hardware name: MSI  MS-6547  
   /MS-6547 , BIOS 07.00T
[   95.135681] Call Trace:
[   95.135742]  dump_stack+0x16/0x1e
[   95.135791]  ubsan_epilogue+0xb/0x29
[   95.135836]  __ubsan_handle_shift_out_of_bounds.cold.14+0x20/0x6a
[   95.135887]  ? page_vma_mapped_walk+0x125/0x410
[   95.135935]  ? page_counter_cancel+0x16/0x30
[   95.135984]  compaction_alloc.cold.43+0x56/0xbc
[   95.136033]  ? free_unref_page_commit.isra.95+0x7a/0x80
[   95.136082]  migrate_pages+0x99/0x732
[   95.136127]  ? isolate_migratepages_block+0x940/0x940
[   95.136172]  ? __ClearPageMovable+0x10/0x10
[   95.136217]  compact_zone+0x7e2/0xb70
[   95.136262]  ? compaction_suitable+0x49/0x60
[   95.136306]  kcompactd_do_work+0xdb/0x1d0
[   95.136389]  ? __switch_to_asm+0x26/0x4c
[   95.136470]  kcompactd+0x4f/0x110
[   95.136550]  ? wait_woken+0x60/0x60
[   95.136630]  kthread+0xe5/0x100
[   95.136709]  ? kcompactd_do_work+0x1d0/0x1d0
[   95.136789]  ? kthread_create_worker_on_cpu+0x20/0x20
[   95.136870]  ret_from_fork+0x2e/0x38
[   95.136949] 


It is not reproducible at will - it did not happen on the next 2 reboots, so it
probably originates from an earlier version.
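
For reference, the UBSAN message means that a value of a 32-bit type was
shifted by 32, which C leaves undefined (the shift count must be smaller
than the width of the type). The snippet below is a generic sketch of
guarding such a shift; checked_shl_one is a hypothetical helper and is not
taken from mm/compaction.c, and I have not checked which expression on line
1167 triggers the warning.

```c
#include <limits.h>

/* Width of unsigned long in bits: 32 on the 32-bit x86 machine above. */
#define ULONG_BITS (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical helper: returns 1UL << shift, or 0 when the shift count
 * is not smaller than the type width, so the undefined shift is never
 * evaluated.
 */
static unsigned long checked_shl_one(unsigned int shift)
{
    if (shift >= ULONG_BITS)
        return 0;
    return 1UL << shift;
}
```

Whether returning 0 is the right fallback depends on the caller; the point
is only that the range check has to happen before the shift.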

Full dmesg and config are below.

[0.00] Linux version 5.1.0-rc6 (mroos@kukeseen) (gcc version 8.3.0 
(Debian 8.3.0-6)) #71 Mon Apr 22 01:30:01 EEST 2019
[0.00] x86/fpu: x87 FPU will use FXSAVE
[0.00] BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009fbff] usable
[0.00] BIOS-e820: [mem 0x0009fc00-0x0009] reserved
[0.00] BIOS-e820: [mem 0x000ec000-0x000f] reserved
[0.00] BIOS-e820: [mem 0x0010-0x3fff] usable
[0.00] BIOS-e820: [mem 0xfec0-0xfecf] reserved
[0.00] BIOS-e820: [mem 0xfee0-0xfeef] reserved
[0.00] BIOS-e820: [mem 0xffee-0xfff0fffe] reserved
[0.00] BIOS-e820: [mem 0xfffc-0x] reserved
[0.00] Notice: NX (Execute Disable) protection missing in CPU!
[0.00] Legacy DMI 2.3 present.
[0.00] DMI: MSI  MS-6547
 /MS-6547 , BIOS 07.00T
[0.00] tsc: Fast TSC calibration using PIT
[0.00] tsc: Detected 2000.078 MHz processor
[0.009834] e820: update [mem 0x-0x0fff] usable ==> reserved
[0.009838] e820: remove [mem 0x000a-0x000f] usable
[0.009849] last_pfn = 0x4 max_arch_pfn = 0x10
[0.009866] MTRR default type: uncachable
[0.009868] MTRR fixed ranges enabled:
[0.009871]   0-9 write-back
[0.009873]   A-B uncachable
[0.009875]   C-C7FFF write-protect
[0.009878]   C8000-E uncachable
[0.009879]   F-F write-protect
[0.009881] MTRR variable ranges enabled:
[0.009885]   0 base 0 mask FC000 write-back
[0.009886]   1 disabled
[0.009887]   2 disabled
[0.009888]   3 disabled
[0.009889]   4 disabled
[0.009890]   5 disabled
[0.009893]   6 base 0E000 mask FFC00 write-combining
[0.009895]   7 base 0E000 mask FFC00 write-combining
[0.010289] x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WC  UC- UC
[0.032447] initial memory mapped: [mem 0x-0x11bf]
[0.032502] BRK [0x11831000, 0x11831fff] PGTABLE
[0.032536] ACPI: Early table checksum verification disabled
[0.033564] ACPI BIOS Error (bug): A valid RSDP was not found 
(20190215/tbxfroot-210)
[0.033571] 140MB HIGHMEM available.
[0.033575] 883MB LOWMEM available.
[0.033578]   mapped low ram: 0 - 373fe000
[0.033581]   low ram: 0 - 373fe000
[0.033585] BRK [0x11832000, 0x11832fff] PGTABLE
[0.038164] Zone ranges:
[0.038177]   DMA  [mem 0x1000-0x00ff]
[0.038181]   Normal   [mem 0x0100-0x373fdfff]
[0.038185]   HighMem  [mem 0x373fe000-0x3fff]
[0.038188] Movable zone start for each node
[0.038190] Early memory node ranges
[0.038193]   node   0: [mem 0x1000-0x0009efff]
[0.038196]   node   0: [mem 0x0010-0x3fff]
[0.038206] Zeroed struct page in unavailable ranges: 98 pages
[0.038210] Initmem setup node 0 [

Re: CONFIG_DEBUG_VIRTUAL breaks boot on x86-32

2019-03-27 Thread Meelis Roos

You might be hitting a bug I found.
Try applying this patch:
https://marc.info/?l=linux-kernel&m=155355953012985&w=2


Unfortunately it did not change anything.

--
Meelis Roos 


Re: CONFIG_DEBUG_VIRTUAL breaks boot on x86-32

2019-03-26 Thread Meelis Roos
[   13.104639] [drm] radeon: 1 quad pipes, 1 Z pipes initialized
[   13.105883] radeon :01:00.0: WB disabled
[   13.105921] radeon :01:00.0: fence driver on ring 0 use gpu addr 
0xe000 and cpu addr 0xc1c2fb20
[   13.105942] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[   13.105948] [drm] Driver supports precise vblank timestamp query.
[   13.106006] [drm] radeon: irq initialized.
[   13.106061] [drm] Loading R300 Microcode
[   13.145700] Registered IR keymap rc-hauppauge
[   13.146003] rc rc0: Hauppauge as 
/devices/pci:00/:00:0a.2/i2c-1/1-0018/rc/rc0
[   13.146230] input: Hauppauge as 
/devices/pci:00/:00:0a.2/i2c-1/1-0018/rc/rc0/input6
[   13.152323] rc rc0: lirc_dev: driver ir_kbd_i2c registered at minor = 0, 
scancode receiver, no transmitter
[   13.205146] cx88_blackbird: cx2388x blackbird driver version 1.0.0 loaded
[   13.205174] cx8802: registering cx8802 driver, type: blackbird access: shared
[   13.205183] cx8802: subsystem: 107d:663c, board: Leadtek PVR 2000 [card=9]
[   13.205538] cx88_blackbird: cx23416 based mpeg encoder (blackbird reference 
design)
[   13.205767] cx88_blackbird: blackbird_mbox_func: blackbird:Firmware and/or 
mailbox pointer not initialized or corrupted
[   15.612593] cx88_blackbird: blackbird_load_firmware: blackbird:Firmware 
upload successful.
[   15.630492] [drm] radeon: ring at 0xE0001000
[   15.630545] [drm] ring test succeeded in 0 usecs
[   15.630875] [drm] ib test succeeded in 0 usecs
[   15.632854] [drm] Radeon Display Connectors
[   15.632867] [drm] Connector 0:
[   15.632872] [drm]   VGA-1
[   15.632877] [drm]   DDC: 0x60 0x60 0x60 0x60 0x60 0x60 0x60 0x60
[   15.632883] [drm]   Encoders:
[   15.632887] [drm] CRT1: INTERNAL_DAC1
[   15.632892] [drm] Connector 1:
[   15.632896] [drm]   DVI-I-1
[   15.632900] [drm]   HPD1
[   15.632905] [drm]   DDC: 0x64 0x64 0x64 0x64 0x64 0x64 0x64 0x64
[   15.632910] [drm]   Encoders:
[   15.632913] [drm] CRT2: INTERNAL_DAC2
[   15.632918] [drm] DFP1: INTERNAL_TMDS1
[   15.632922] [drm] Connector 2:
[   15.632925] [drm]   SVIDEO-1
[   15.632929] [drm]   Encoders:
[   15.632933] [drm] TV1: INTERNAL_DAC2
[   15.749890] [drm] fb mappable at 0xC004
[   15.749914] [drm] vram apper at 0xC000
[   15.749919] [drm] size 5242880
[   15.749923] [drm] fb depth is 24
[   15.749927] [drm]pitch is 5120
[   15.752277] fbcon: radeondrmfb (fb0) is primary device
[   15.803402] Console: switching to colour frame buffer device 160x64
[   15.930197] radeon :01:00.0: fb0: radeondrmfb frame buffer device
[   15.930273] [drm] Initialized radeon 2.50.0 20080528 for :01:00.0 on 
minor 0
[   16.272511] cx88_blackbird: blackbird_initialize_codec: blackbird:Firmware 
version is 0x02060039
[   16.284001] cx88_blackbird: registered device video1 [mpeg]
[   16.287894] modprobe (155) used greatest stack depth: 5496 bytes left
[   16.803253] Adding 2096124k swap on /dev/sda5.  Priority:-2 extents:1 
across:2096124k
[   20.717229] IPv6: ADDRCONF(NETDEV_CHANGE): enp0s18: link becomes ready
[   21.027559] systemd-udevd (100) used greatest stack depth: 4416 bytes left

--
Meelis Roos


CONFIG_DEBUG_VIRTUAL breaks boot on x86-32

2019-03-21 Thread Meelis Roos

I tried to debug another problem and turned on most debug options for memory.
The resulting kernel failed to boot.

Bisecting the configurations led to CONFIG_DEBUG_VIRTUAL - if I turned it on
in addition to some other debug options, the machine crashed with

kernel BUG at arch/x86/mm/physaddr.c:79!

Screenshot at http://kodu.ut.ee/~mroos/debug_virtual-boot-hang-1.jpg

The machine was Athlon XP with VIA KT600 chipset and 2G RAM.

--
Meelis Roos 


Re: ext4 corruption on alpha with 4.20.0-09062-gd8372ba8ce28

2019-02-20 Thread Meelis Roos

First, I found out that both the problematic alphas had memory compaction and
page migration and bounce buffers turned on, and working alphas had them off.

Next, turning off these options makes the problematic alphas work.


OK, thanks for testing! Can you narrow down whether the problem is due to
CONFIG_BOUNCE or CONFIG_MIGRATION + CONFIG_COMPACTION? These are two
completely different things so knowing where to look will help. Thanks!


Tested both.

Just CONFIG_MIGRATION + CONFIG_COMPACTION breaks the alpha.
Just CONFIG_BOUNCE has no effect in 5 tries.
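
How much "no effect in 5 tries" proves depends on how often a bad kernel
actually fails. A rough sketch, assuming each try fails independently with
the same probability (an assumption - I have not measured the failure
rate); the function names are made up for illustration:

```c
/*
 * Probability that `tries` independent runs all pass on a kernel that
 * fails each run with probability p_fail.
 */
static double prob_all_clean(double p_fail, unsigned int tries)
{
    double prob = 1.0;

    for (unsigned int i = 0; i < tries; i++)
        prob *= 1.0 - p_fail;
    return prob;
}

/*
 * Smallest number of clean runs after which the chance of a bad kernel
 * having passed all of them is at most alpha. Returns 0 for inputs
 * outside 0 < p_fail <= 1 and 0 < alpha < 1.
 */
static unsigned int tries_needed(double p_fail, double alpha)
{
    unsigned int n = 0;
    double prob = 1.0;

    if (p_fail <= 0.0 || p_fail > 1.0 || alpha <= 0.0 || alpha >= 1.0)
        return 0;
    while (prob > alpha) {
        prob *= 1.0 - p_fail;
        n++;
    }
    return n;
}
```

For example, if a bad kernel failed every second run, 5 clean tries would
leave about a 3% chance of a false "good"; with a lower failure rate, 5
tries would say much less.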

--
Meelis Roos


Re: ext4 corruption on alpha with 4.20.0-09062-gd8372ba8ce28

2019-02-19 Thread Meelis Roos

Could
https://lore.kernel.org/linux-mm/20190219123212.29838-1-lar...@axis.com/T/#u
be relevant?


Tried it, still broken.

I wrote:


But my kernel config had memory compaction (that turned on page migration) and
bounce buffers. I do not remember why I found them necessary but I will try
without them. 


First, I found out that both the problematic alphas had memory compaction and
page migration and bounce buffers turned on, and working alphas had them off.

Next, turning off these options makes the problematic alphas work.

--
Meelis Roos 


Re: ext4 corruption on alpha with 4.20.0-09062-gd8372ba8ce28

2019-02-19 Thread Meelis Roos

Thanks for information. Yeah, that makes somewhat more sense. Can you ever
see the failure if you disable CONFIG_TRANSPARENT_HUGEPAGE?

HAVE_ARCH_TRANSPARENT_HUGEPAGE [=n]

Seems there is no THP on alpha.


Because your
findings still seem to indicate that there's some problem with page
migration and Alpha (added MM list to CC).


But my kernel config had memory compaction (that turned on page migration) and
bounce buffers. I do not remember why I found them necessary but I will try
without them.

--
Meelis Roos 


Re: ext4 corruption on alpha with 4.20.0-09062-gd8372ba8ce28

2019-02-19 Thread Meelis Roos

The result of the bisection is
[88dbcbb3a4847f5e6dfeae952d3105497700c128] blkdev: avoid migration stalls for 
blkdev pages

Is that result relevant for the problem or should I continue bisecting between 
4.20.0 and the so far first bad commit?


Can you try reverting the commit and see if it makes the problem go away?


Tried reverting it on top of 5.0.0-rc6-00153-g5ded5871030e and it seems
to make the kernel work - emerge --sync succeeded.

There is more to it.

After running 5.0.0-rc6-00153-g5ded5871030e-dirty (with the revert of that
patch) successfully for a Gentoo update, I upgraded the kernel to
5.0.0-rc7-00011-gb5372fe5dc84-dirty (today's git + revert of this patch) and it
broke on rsync again:

RepoStorageException: command exited with status -6: rsync -a --link-dest 
/usr/portage --exclude=/distfiles --exclude=/local --exclude=/lost+found 
--exclude=/packages --exclude /.tmp-unverified-download-quarantine 
/usr/portage/ /usr/portage/.tmp-unverified-download-quarantine/

Nothing in dmesg.

This means the real root cause is somewhere deeper and reverting this commit
just made it less likely to happen.

--
Meelis Roos 


Re: ext4 corruption on alpha with 4.20.0-09062-gd8372ba8ce28

2019-02-18 Thread Meelis Roos

Hum, weird. I have a hard time understanding how that change could be causing
fs corruption on Alpha but OTOH it is not completely unthinkable. With this
commit we may migrate some block device pages we were not able to migrate
previously and that could be causing some unexpected issue. I'll look into
this.


To make things more interesting, it does not happen on every alpha but only on
one subarch so far:
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1889207.html
is my original bug report.

--
Meelis Roos 


Re: ext4 corruption on alpha with 4.20.0-09062-gd8372ba8ce28

2019-02-16 Thread Meelis Roos

The result of the bisection is
[88dbcbb3a4847f5e6dfeae952d3105497700c128] blkdev: avoid migration stalls for 
blkdev pages

Is that result relevant for the problem or should I continue bisecting between 
4.20.0 and the so far first bad commit?


Can you try reverting the commit and see if it makes the problem go away?


Tried reverting it on top of 5.0.0-rc6-00153-g5ded5871030e and it seems to make 
the kernel work - emerge --sync succeeded.

Unfinished further bisection has also not yielded any other bad revisions so 
far.

--
Meelis Roos


Re: ext4 corruption on alpha with 4.20.0-09062-gd8372ba8ce28

2019-02-15 Thread Meelis Roos

I have noticed ext4 filesystem corruption on two of my test alphas with 
4.20.0-09062-gd8372ba8ce28.


Retried it, still happens with 5.0.0-rc5-00358-gdf3865f8f568 - the rsync run by
emerge --sync just fails with nothing in dmesg.


Finished second round of bisecting, first round did not get me far enough so
I may still have false "goods" in my bisection history.

The command I used for bisecting was Gentoo's
emerge --sync
which sometimes failed with error -6 or -11 from rsync.
Usually the file system corruption did not happen and nothing was in dmesg,
just a file IO error from rsync.

The result of the bisection is
[88dbcbb3a4847f5e6dfeae952d3105497700c128] blkdev: avoid migration stalls for 
blkdev pages

Is that result relevant for the problem or should I continue bisecting between 
4.20.0 and the so far first bad commit?


On AlphaServer DS10:
[10749.664418] EXT4-fs error (device sda2): __ext4_iget:5052: inode #1853093: 
block 1: comm rsync: invalid block

On AlphaServer DS10L:
[ 5325.064656] EXT4-fs error (device sda2): htree_dirblock_to_tree:1007: inode 
#1191951: block 4731728: comm rm: bad entry in directory: directory entry 
overrun - offset=76, inode=417080, rec_len=61816, name_len=35, size=4096
[ 5325.069539] EXT4-fs error (device sda2): htree_dirblock_to_tree:1007: inode 
#1191951: block 4731728: comm rm: bad entry in directory: directory entry 
overrun - offset=76, inode=417080, rec_len=61816, name_len=35, size=4096
[ 5325.077351] EXT4-fs error (device sda2): ext4_empty_dir:2718: inode 
#1191951: block 4731728: comm rm: bad entry in directory: directory entry 
overrun - offset=76, inode=417080, rec_len=61816, name_len=35, size=4096

Two other alphas, PC-164 and Eiger, worked fine with the same kernel version 
(different kernel configs according to hardware).

The details:
4.20 worked fine, with gentoo emerge package update after bootup.
Next, 4.20.0-06428-g00c569b567c7 worked fine, with gentoo emerge after bootup.
Next, 4.20.0-09062-gd8372ba8ce28 booted up fine but rsync and rm during start 
of gentoo emerge errored out like above.

So the corruption _might_ have happened during bootup of previous kernel but it 
looks more likely that only the latest kernel with blk-mq introduced the 
problems. mq-deadline is in use on all the alphas.

DS10 has Symbios 53C896 SCSI (sym2 driver), DS10L has QLogic ISP1040, so they 
are different. Working Eiger and PC164 have sym2 based scsi controllers too.




--
Meelis Roos 


Undefined behaviour in drivers/gpu/drm/radeon/r200.c:480:34 - shift exponent 4096 is too large

2019-02-11 Thread Meelis Roos

Got UBSAN warning from Dell D600 running 5.0.0-rc4-00218-g12491ed354d2.
The warning did not happen on bootup but during xfce session start or console 
switch.

[   15.323113] radeon :01:00.0: putting AGP V2 device into 4x mode
[   15.323134] radeon :01:00.0: GTT: 128M 0xE000 - 0xE7FF
[   15.323142] radeon :01:00.0: VRAM: 128M 0xE800 - 
0xEFFF (32M used)
[   15.323459] [drm] Detected VRAM RAM=128M, BAR=128M
[   15.323463] [drm] RAM width 64bits DDR
[   15.323566] [TTM] Zone  kernel: Available graphics memory: 412446 kiB
[   15.323567] [TTM] Initializing pool allocator
[   15.323580] [TTM] Initializing DMA pool allocator
[   15.323609] [drm] radeon: 32M of VRAM memory ready
[   15.323611] [drm] radeon: 128M of GTT memory ready.
[   15.323621] [drm] radeon: power management initialized
[   15.331289] radeon :01:00.0: WB disabled
[   15.331296] radeon :01:00.0: fence driver on ring 0 use gpu addr 
0xe000 and cpu addr 0x712386dd
[   15.331299] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[   15.331300] [drm] Driver supports precise vblank timestamp query.
[   15.331315] [drm] radeon: irq initialized.
[   15.331317] [drm] Loading R200 Microcode
[...]
[   15.795041] [drm] radeon: ring at 0xE0001000
[   15.795073] [drm] ring test succeeded in 1 usecs
[   15.795316] [drm] ib test succeeded in 0 usecs
[   15.801857] [drm] Panel ID String: 2K077141X13
[   15.801861] [drm] Panel Size 1024x768
[   15.801938] [drm] No TV DAC info found in BIOS
[   15.802012] [drm] Radeon Display Connectors
[   15.802015] [drm] Connector 0:
[   15.802017] [drm]   VGA-1
[   15.802023] [drm]   DDC: 0x60 0x60 0x60 0x60 0x60 0x60 0x60 0x60
[   15.802024] [drm]   Encoders:
[   15.802027] [drm] CRT1: INTERNAL_DAC1
[   15.802030] [drm] Connector 1:
[   15.802031] [drm]   DVI-D-1
[   15.802033] [drm]   HPD1
[   15.802038] [drm]   DDC: 0x64 0x64 0x64 0x64 0x64 0x64 0x64 0x64
[   15.802040] [drm]   Encoders:
[   15.802042] [drm] DFP1: INTERNAL_TMDS1
[   15.802044] [drm] Connector 2:
[   15.802046] [drm]   LVDS-1
[   15.802047] [drm]   Encoders:
[   15.802049] [drm] LCD1: INTERNAL_LVDS
[   15.802051] [drm] Connector 3:
[   15.802053] [drm]   SVIDEO-1
[   15.802054] [drm]   Encoders:
[   15.802056] [drm] TV1: INTERNAL_DAC2
[   15.845987] [drm] fb mappable at 0xE804
[   15.845988] [drm] vram apper at 0xE800
[   15.845989] [drm] size 1572864
[   15.845990] [drm] fb depth is 16
[   15.845990] [drm]pitch is 2048
[   15.848183] fbcon: radeondrmfb (fb0) is primary device
[   15.892233] Console: switching to colour frame buffer device 128x48
[   15.901408] radeon :01:00.0: fb0: radeondrmfb frame buffer device
[   15.905786] [drm] Initialized radeon 2.50.0 20080528 for :01:00.0 on 
minor 0
[...]
[  447.146334] 

[  447.146347] UBSAN: Undefined behaviour in 
drivers/gpu/drm/radeon/r200.c:480:34
[  447.146351] shift exponent 4096 is too large for 32-bit type 'int'
[  447.146357] CPU: 0 PID: 386 Comm: Xorg Not tainted 
5.0.0-rc4-00218-g12491ed354d2 #7
[  447.146358] Hardware name: Dell Computer Corporation Latitude D600   
/0X2034, BIOS A16 06/29/2005
[  447.146359] Call Trace:
[  447.146375]  dump_stack+0x16/0x19
[  447.146379]  ubsan_epilogue+0xb/0x29
[  447.146381]  __ubsan_handle_shift_out_of_bounds.cold.14+0x26/0x80
[  447.146486]  ? radeon_cs_packet_next_reloc+0x3c/0x150 [radeon]
[  447.146521]  ? r100_reloc_pitch_offset+0x27/0x150 [radeon]
[  447.146551]  r200_packet0_check.cold.0+0xf/0x45 [radeon]
[  447.146592]  ? r200_copy_dma+0x430/0x430 [radeon]
[  447.146626]  r100_cs_parse_packet0+0x53/0xe0 [radeon]
[  447.146661]  r100_cs_parse+0x12e/0x440 [radeon]
[  447.146700]  ? r200_copy_dma+0x430/0x430 [radeon]
[  447.146734]  radeon_cs_ioctl+0x256/0x890 [radeon]
[  447.146743]  ? ttm_bo_init_reserved+0x338/0x390 [ttm]
[  447.146779]  ? radeon_cs_parser_init+0x550/0x550 [radeon]
[  447.146804]  drm_ioctl_kernel+0x96/0xe0 [drm]
[  447.146816]  drm_ioctl+0x25f/0x530 [drm]
[  447.146850]  ? radeon_cs_parser_init+0x550/0x550 [radeon]
[  447.146855]  ? ktime_get_mono_fast_ns+0xb6/0x1f0
[  447.146880]  radeon_drm_ioctl+0x40/0x80 [radeon]
[  447.146905]  ? radeon_pci_shutdown+0x30/0x30 [radeon]
[  447.146909]  do_vfs_ioctl+0x90/0x6c0
[  447.146913]  ? handle_mm_fault+0xa48/0xfe0
[  447.146918]  ? vm_mmap_pgoff+0x88/0xd0
[  447.146923]  ? ktime_get_ts64+0x5f/0x1e0
[  447.146925]  ksys_ioctl+0x39/0x70
[  447.146927]  sys_ioctl+0x11/0x13
[  447.146930]  do_fast_syscall_32+0x95/0x1d0
[  447.146934]  entry_SYSENTER_32+0x6b/0xbd
[  447.146936] EIP: 0xb7f937cd
[  447.146939] Code: 54 cd ff ff 85 d2 8b 98 58 cd ff ff 89 c8 74 02 89 0a 5b 5d c3 
8b 04 24 c3 8b 14 24 c3 8b 3c 24 c3 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 
90 90 90 8d 76 00 58 b8 77 00 00 00 cd 80 90 8d 76
[  447.146941] EAX: ffda EBX: 000e ECX: c0206466 EDX: 02311c40
[  447.1469

Re: ext4 corruption on alpha with 4.20.0-09062-gd8372ba8ce28

2019-02-10 Thread Meelis Roos

02.01.19 17:52 I wrote:


I have noticed ext4 filesystem corruption on two of my test alphas with 
4.20.0-09062-gd8372ba8ce28.


Retried it, still happens with 5.0.0-rc5-00358-gdf3865f8f568 - rsync of emerge 
--sync just fails with nothing in dmesg.
 

On AlphaServer DS10:
[10749.664418] EXT4-fs error (device sda2): __ext4_iget:5052: inode #1853093: 
block 1: comm rsync: invalid block

On AlphaServer DS10L:
[ 5325.064656] EXT4-fs error (device sda2): htree_dirblock_to_tree:1007: inode 
#1191951: block 4731728: comm rm: bad entry in directory: directory entry 
overrun - offset=76, inode=417080, rec_len=61816, name_len=35, size=4096
[ 5325.069539] EXT4-fs error (device sda2): htree_dirblock_to_tree:1007: inode 
#1191951: block 4731728: comm rm: bad entry in directory: directory entry 
overrun - offset=76, inode=417080, rec_len=61816, name_len=35, size=4096
[ 5325.077351] EXT4-fs error (device sda2): ext4_empty_dir:2718: inode 
#1191951: block 4731728: comm rm: bad entry in directory: directory entry 
overrun - offset=76, inode=417080, rec_len=61816, name_len=35, size=4096

Two other alphas, PC-164 and Eiger, worked fine with the same kernel version 
(different kernel configs according to hardware).

The details:
4.20 worked fine, with gentoo emerge package update after bootup.
Next, 4.20.0-06428-g00c569b567c7 worked fine, with gentoo emerge after bootup.
Next, 4.20.0-09062-gd8372ba8ce28 booted up fine but rsync and rm during start 
of gentoo emerge errored out like above.

So the corruption _might_ have happened during bootup of previous kernel but it 
looks more likely that only the latest kernel with blk-mq introduced the 
problems. mq-deadline is in use on all the alphas.

DS10 has Symbios 53C896 SCSI (sym2 driver), DS10L has QLogic ISP1040, so they 
are different. Working Eiger and PC164 have sym2 based scsi controllers too.


--
Meelis Roos 


Re: bisected: ttyS panic on pa-risc

2019-01-19 Thread Meelis Roos

The patch below was just applied to my tree, hopefully it fixes this
issue.


Yes, it cures both the HP A500 (parisc) and HP RX2620 (ia64) that I also found 
breaking meanwhile.

--
Meelis Roos 


bisected: ttyS panic on pa-risc

2019-01-10 Thread Meelis Roos

My HP 9000 A500 (pa-risc architecture) panicked in 5.0-rc1. It happened after 
printing dmesg lines about ttyS and before moving on to scsi printk-s.
I bisected it and the panic symptoms changed during that (some had backtrace, 
some had just panic).

This is one of the crashes I got:
Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
serial :00:04.0: enabling device (0146 -> 0147)
printk: console [ttyS0] disabled

:00:04.0: ttyS0 at MMIO 0xf800 (irq = 21, base_baud = 115200) 
is a 16550A
printk: console [ttyS0] enabled
printk: console [ttyS0] enabled
printk: bootconsole [ttyB0] disabled
printk: bootconsole [ttyB0] disabled
:00:04.0: ttyS1 at MMIO 0xf808 (irq = 21, base_baud = 115200) 
is a 16550A
:00:04.0: ttyS2 at MMIO 0xf810 (irq = 21, base_baud = 115200) 
is a 16550A
serial :00:05.0: enabling device (0140 -> 0143)
:00:05.0: ttyS3 at MMIO 0xf8005000 (irq = 22, base_baud = 115200) 
is a 16550A
Backtrace:
 [<40502268>] pciserial_init_ports+0x128/0x240
 [<405040b8>] pciserial_init_one+0x1e0/0x2f0
 [<404b2b8c>] pci_device_probe+0xfc/0x180
 [<40513958>] really_probe+0x268/0x3d0
 [<40513d28>] driver_probe_device+0xf8/0x100
 [<40513e54>] __driver_attach+0x124/0x130
 [<40510dc4>] bus_for_each_dev+0x9c/0xe8
 [<40513040>] driver_attach+0x28/0x38
 [<405128c0>] bus_a

Normal dmesg excerpt from working kernel before the problem:

[6.746131] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[6.771772] serial :00:04.0: enabling device (0146 -> 0147)
[6.792657] printk: console [ttyS0] disabled
[6.829825] :00:04.0: ttyS0 at MMIO 0xf800 (irq = 21, 
base_baud = 115200) is a 16550A
[6.837151] printk: console [ttyS0] enabled
[6.877768] printk: bootconsole [ttyB0] disabled
[6.904352] :00:04.0: ttyS1 at MMIO 0xf808 (irq = 21, 
base_baud = 115200) is a 16550A
[6.961051] :00:04.0: ttyS2 at MMIO 0xf810 (irq = 21, 
base_baud = 115200) is a 16550A
[6.969881] serial :00:05.0: enabling device ( -> 0003)
[7.004160] serial :00:05.0: enabling SERR and PARITY (0003 -> 0143)
[7.030298] :00:05.0: ttyS3 at MMIO 0xf8005000 (irq = 22, 
base_baud = 115200) is a 16550A
[7.041663] serial :00:05.0: Couldn't register serial port 0, irq 22, 
type 2, error -28
[7.145456] sym53c8xx :00:01.0: enabling device ( -> 0003)


Bisection leads to this commit:

6d7f677a2afa1c82d7fc7af7f9159cbffd5dc010 is the first bad commit
commit 6d7f677a2afa1c82d7fc7af7f9159cbffd5dc010
Author: Darwin Dingel 
Date:   Mon Dec 10 11:29:09 2018 +1300

serial: 8250: Rate limit serial port rx interrupts during input overruns

When a serial port gets faulty or gets flooded with inputs, its interrupt
handler starts to work double time to get the characters to the workqueue
for the tty layer to handle them. When this busy time on the serial/tty
subsystem happens during boot, where it is also busy on the userspace
trying to initialise, some processes can continuously get preempted
and will be on hold until the interrupts subside.

The fix is to backoff on processing received characters for a specified
amount of time when an input overrun is seen (received a new character
before the previous one is processed). This only stops receive and will
continue to transmit characters to serial port. After the backoff period
is done, it receive will be re-enabled. This is optional and will only
be enabled by setting 'overrun-throttle-ms' in the dts.

Signed-off-by: Darwin Dingel 
Signed-off-by: Greg Kroah-Hartman 

:04 04 4ea6cd68ededa0c9ffaa218668ffeb35557070a5 
a011db1916fbf5cfdcfff836a81e4fb5ee737003 M  drivers
:04 04 b1b1dc977965eb2db6b2cc79939446a1cf2f684d 
41322ab1c199f504cfcc5b2ca211b4638d41351c M  include


--
Meelis Roos 


ext4 corruption on alpha with 4.20.0-09062-gd8372ba8ce28

2019-01-02 Thread Meelis Roos

I have noticed ext4 filesystem corruption on two of my test alphas with 
4.20.0-09062-gd8372ba8ce28.

On AlphaServer DS10:
[10749.664418] EXT4-fs error (device sda2): __ext4_iget:5052: inode #1853093: 
block 1: comm rsync: invalid block

On AlphaServer DS10L:
[ 5325.064656] EXT4-fs error (device sda2): htree_dirblock_to_tree:1007: inode 
#1191951: block 4731728: comm rm: bad entry in directory: directory entry 
overrun - offset=76, inode=417080, rec_len=61816, name_len=35, size=4096
[ 5325.069539] EXT4-fs error (device sda2): htree_dirblock_to_tree:1007: inode 
#1191951: block 4731728: comm rm: bad entry in directory: directory entry 
overrun - offset=76, inode=417080, rec_len=61816, name_len=35, size=4096
[ 5325.077351] EXT4-fs error (device sda2): ext4_empty_dir:2718: inode 
#1191951: block 4731728: comm rm: bad entry in directory: directory entry 
overrun - offset=76, inode=417080, rec_len=61816, name_len=35, size=4096

Two other alphas, PC-164 and Eiger, worked fine with the same kernel version 
(different kernel configs according to hardware).

The details:
4.20 worked fine, with gentoo emerge package update after bootup.
Next, 4.20.0-06428-g00c569b567c7 worked fine, with gentoo emerge after bootup.
Next, 4.20.0-09062-gd8372ba8ce28 booted up fine but rsync and rm during start 
of gentoo emerge errored out like above.

So the corruption _might_ have happened during bootup of previous kernel but it 
looks more likely that only the latest kernel with blk-mq introduced the 
problems. mq-deadline is in use on all the alphas.

DS10 has Symbios 53C896 SCSI (sym2 driver), DS10L has QLogic ISP1040, so they 
are different. Working Eiger and PC164 have sym2 based scsi controllers too.

Full dmesg of DS10:

[0.00] Linux version 4.20.0-09062-gd8372ba8ce28 (mroos@ds10) (gcc 
version 7.3.0 (Gentoo 7.3.0-r3 p1.4)) #92 Sun Dec 30 01:29:49 EET 2018
[0.00] Booting GENERIC on Tsunami variation Webbrick using machine 
vector Webbrick from SRM
[0.00] Major Options: LEGACY_START VERBOSE_MCHECK MAGIC_SYSRQ
[0.00] Command line: root=/dev/sda2 console=ttyS0
[0.00] memcluster 0, usage 1, start0, end  256
[0.00] memcluster 1, usage 0, start  256, end65443
[0.00] memcluster 2, usage 1, start65443, end65536
[0.00] 2048K Bcache detected; load hit latency 20 cycles, load miss 
latency 95 cycles
[0.00] On node 0 totalpages: 65443
[0.00]   DMA zone: 448 pages used for memmap
[0.00]   DMA zone: 0 pages reserved
[0.00]   DMA zone: 65443 pages, LIFO batch:15
[0.00] pcpu-alloc: s0 r0 d32768 u32768 alloc=1*32768
[0.00] pcpu-alloc: [0] 0
[0.00] Built 1 zonelists, mobility grouping on.  Total pages: 64995
[0.00] Kernel command line: root=/dev/sda2 console=ttyS0
[0.00] Dentry cache hash table entries: 65536 (order: 6, 524288 bytes)
[0.00] Inode-cache hash table entries: 32768 (order: 5, 262144 bytes)
[0.00] Sorting __ex_table...
[0.00] Memory: 508584K/523544K available (5571K kernel code, 413K 
rwdata, 1456K rodata, 256K init, 206K bss, 14960K reserved, 0K cma-reserved)
[0.00] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[0.00] NR_IRQS: 128
[0.00] HWRPB cycle frequency bogus.  Estimated 462413354 Hz
[0.00] clocksource: rpcc: mask: 0x max_cycles: 0x, 
max_idle_ns: 4133229351 ns
[0.002929] Console: colour VGA+ 80x25
[0.021484] printk: console [ttyS0] enabled
[0.022460] Calibrating delay loop... 916.72 BogoMIPS (lpj=447488)
[0.032226] pid_max: default: 32768 minimum: 301
[0.033203] Mount-cache hash table entries: 1024 (order: 0, 8192 bytes)
[0.034179] Mountpoint-cache hash table entries: 1024 (order: 0, 8192 bytes)
[0.038085] devtmpfs: initialized
[0.040039] random: get_random_u32 called from 
bucket_table_alloc.isra.17+0xc4/0x290 with crng_init=0
[0.041015] clocksource: jiffies: mask: 0x max_cycles: 0x, 
max_idle_ns: 1866466235866741 ns
[0.041992] futex hash table entries: 256 (order: -1, 6144 bytes)
[0.043945] NET: Registered protocol family 16
[0.045898] EISA bus registered
[0.047851] random: get_random_bytes called from kcmp_cookies_init+0x2c/0x74 
with crng_init=0
[0.048828] PCI host bridge to bus :00
[0.050781] pci_bus :00: root bus resource [io  0x-0x1ff]
[0.052734] pci_bus :00: root bus resource [mem 0x-0x3fff]
[0.053710] pci_bus :00: No busn resource found for root bus, will use 
[bus 00-ff]
[0.054687] pci :00:01.0: [10b9:5237] type 00 class 0x0c0310
[0.054687] pci :00:01.0: reg 0x10: [mem 0x020b4000-0x020b4fff]
[0.054687] pci :00:07.0: [10b9:1533] type 00 class 0x060100
[0.055664] pci :00:09.0: [1011:0019] type 00 class 0x02
[0.055664] pci :00:09.0: reg 0x10: [io  0x1200-0x127f]
[0.055664] pci :00:09.0: reg 0x14: [mem 0x

Re: [PATCH v2] x86/build: fix compiler support check for CONFIG_RETPOLINE

2018-12-04 Thread Meelis Roos

05.12.18 08:27 Masahiro Yamada wrote:

The easiest fix is to move this check to the "archprepare" like commit
829fe4aa9ac1 ("x86: Allow generating user-space headers without a
compiler") did.

Link: https://lkml.org/lkml/2018/12/4/206
Fixes: 4cd24de3a098 ("x86/retpoline: Make CONFIG_RETPOLINE depend on compiler 
support")
Reported-by: Meelis Roos 
Signed-off-by: Masahiro Yamada 
---

Changes in v2:
   - Revive ifdef CONFIG_RETPOLINE surrounding the KBUILD_CFLAGS addition
   - Rephrase the commit log a bit, hoping the cause of the issue will be clearer


Works for me - first it did

scripts/kconfig/conf  --syncconfig Kconfig

and then started compiling. The #define is gone from include/linux.

Thank you!

--
Meelis Roos 


Compiling with old gcc breaks when CONFIG_RETPOLINE is off

2018-12-04 Thread Meelis Roos

Just tried 4.20-rc5 on an old K6-2 PC with gcc 5.3.1, got an error about 
non-retpoline compiler,
turned CONFIG_RETPOLINE off and retried.

To my surprise, compilation still breaks with
arch/x86/Makefile:224: *** You are building kernel with non-retpoline compiler, 
please update your compiler..  Stop.

As I read the Makefile, it should error only when CONFIG_RETPOLINE is enabled, 
but it still breaks.

$ grep -r CONFIG_RETPOLINE .config
# CONFIG_RETPOLINE is not set

$ grep -r CONFIG_RETPOLINE include/
include/generated/autoconf.h:#define CONFIG_RETPOLINE 1
include/config/auto.conf:CONFIG_RETPOLINE=y

So the headers have not been updated yet, maybe?
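
The mismatch above can be checked mechanically. A minimal sketch, assuming the usual Kconfig text formats ("CONFIG_FOO=y" when set, "# CONFIG_FOO is not set" in .config when disabled, and simply absent from auto.conf); the function name is made up for illustration:

```python
def symbol_state(text, name):
    """Return the value of a Kconfig symbol in config-style text, or None if unset."""
    for line in text.splitlines():
        line = line.strip()
        # Commented "is not set" lines start with "#", so they never match here.
        if line.startswith(name + "="):
            return line.split("=", 1)[1]
    return None


# The two states reported in this mail.
dot_config = "# CONFIG_RETPOLINE is not set\n"
auto_conf = "CONFIG_RETPOLINE=y\n"

# True when the generated file disagrees with .config.
stale = symbol_state(dot_config, "CONFIG_RETPOLINE") != symbol_state(auto_conf, "CONFIG_RETPOLINE")
```

A stale result here would be consistent with the generated files not having been refreshed since .config was changed.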

--
Meelis Roos 


insecure W+X mappings on HP DL365 G5

2018-10-31 Thread Meelis Roos

This HP DL365 G5 is the second old server where I see massive W+X mapped pages.

Is it some BIOS defect?

[0.714956] x86/mm: Found insecure W+X mapping at address 0x8ed98000
[0.715101] WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:266 
note_page+0x4c7/0x780
[0.715298] Modules linked in:
[0.715421] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 
4.19.0-11807-g310c7585e830 #9
[0.715612] Hardware name: HP ProLiant DL365 G5   , BIOS A10 05/02/2011
[0.715741] RIP: 0010:note_page+0x4c7/0x780
[0.715864] Code: fd 01 0f 87 1a 09 00 00 41 83 e5 01 0f 85 3f fc ff ff 49 8b 74 
24 18 48 c7 c7 20 72 f2 bc c6 05 13 7f e9 00 01 e8 8a bf 00 00 <0f> 0b e9 20 fc 
ff ff 45 84 ed 0f 85 2b 08 00 00 4d 85 ff 0f 85 91
[0.716141] RSP: 0018:b262c0c5be10 EFLAGS: 00010282
[0.716265] RAX:  RBX: 0161 RCX: bd06b778
[0.716393] RDX: 0001 RSI: 0082 RDI: bd4a972c
[0.716511] RBP:  R08: 02bb R09: bd4eb701
[0.716638] R10: 8ed9800bc240 R11: 00032084 R12: b262c0c5bec0
[0.716775] R13:  R14: 0002 R15: 
[0.716903] FS:  () GS:8edaaba0() 
knlGS:
[0.717085] CS:  0010 DS:  ES:  CR0: 80050033
[0.717208] CR2: b262c0e24000 CR3: 9e60a000 CR4: 06f0
[0.717343] Call Trace:
[0.717470]  ? vprintk_emit+0x18a/0x1e0
[0.717592]  ptdump_walk_pgd_level_core+0x352/0x410
[0.717720]  ? rest_init+0x1/0xcc
[0.717839]  kernel_init+0x39/0x114
[0.717960]  ? rest_init+0xcc/0xcc
[0.718085]  ret_from_fork+0x22/0x40
[0.718207] ---[ end trace 34c16f2bb7a914e2 ]---
[0.744838] x86/mm: Checked W+X mappings: FAILED, 2182367 W+X pages found.
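
For scale, a quick calculation of how much address space that page count covers, assuming the count is in 4 KiB base pages (the log line itself does not state the page size):

```python
PAGE_SIZE = 4096          # assumed x86 base page size in bytes
wx_pages = 2182367        # from the "Checked W+X mappings" line above

wx_bytes = wx_pages * PAGE_SIZE
wx_gib = wx_bytes / 2**30
print(f"{wx_gib:.2f} GiB mapped W+X")
```

Under that assumption the report covers a bit over 8 GiB, so this is not a handful of stray pages.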

--
Meelis Roos 


Re: HP DL585 warm boot fail (old)

2018-10-25 Thread Meelis Roos

Can you try the patch below?  This is extracted from the code here:
https://github.com/joyent/illumos-joyent/blob/b6a0b04d591f5b877cfe05f45e81f0e8a5cfc2b3/usr/src/uts/intel/io/pci/pci_boot.c#L1805


Thank you. Unfortunately it does not change anything noticeable.


Do you see the "disabling NMI on error" message?
Can you boot with "pci=earlydump vga=0xf07" and capture the output?

Drop the "vga=0xf07" if it doesn't work or makes the screen
unreadable.


vga= modes did not work with any LCD available there, vga=6 worked with old CRT 
only.
But I connected serial console and got full dmesg.

There is no "disabling NMI on error" in the dmesg.

This also caused(?) a working boot with the same kernel that failed before. 
Both 9600 and 115200 worked the same.

dmesg from pci=earlydump from serial console:

[0.00] Linux version 4.19.0-dirty (mroos@dl585) (gcc version 8.2.0 
(Debian 8.2.0-4)) #97 SMP Wed Oct 24 17:36:06 EEST 2018
[0.00] Command line: BOOT_IMAGE=/boot/vmlinuz-4.19.0-dirty 
root=/dev/sda1 ro ignore_loglevel pci=earlydump console=ttyS0,115200
[0.00] x86/fpu: x87 FPU will use FXSAVE
[0.00] BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009f3ff] usable
[0.00] BIOS-e820: [mem 0x0009f400-0x0009] reserved
[0.00] BIOS-e820: [mem 0x000f-0x000f] reserved
[0.00] BIOS-e820: [mem 0x0010-0xf57f67ff] usable
[0.00] BIOS-e820: [mem 0xf57f6800-0xf57f] ACPI data
[0.00] BIOS-e820: [mem 0xfdc0-0xfdc00fff] reserved
[0.00] BIOS-e820: [mem 0xfdc1-0xfdc10fff] reserved
[0.00] BIOS-e820: [mem 0xfdc2-0xfdc20fff] reserved
[0.00] BIOS-e820: [mem 0xfdc3-0xfdc30fff] reserved
[0.00] BIOS-e820: [mem 0xfec0-0xfec00fff] reserved
[0.00] BIOS-e820: [mem 0xfec1-0xfec10fff] reserved
[0.00] BIOS-e820: [mem 0xfec2-0xfec20fff] reserved
[0.00] BIOS-e820: [mem 0xfee0-0xfee0] reserved
[0.00] BIOS-e820: [mem 0xff80-0x] reserved
[0.00] BIOS-e820: [mem 0x0001-0x0003efff] usable
[0.00] debug: ignoring loglevel setting.
[0.00] NX (Execute Disable) protection: active
[0.00] SMBIOS 2.3 present.
[0.00] DMI: HP ProLiant DL585 G1, BIOS A01 02/14/2007
[0.00] tsc: Fast TSC calibration using PIT
[0.00] tsc: Detected 2196.908 MHz processor
[0.008307] e820: update [mem 0x-0x0fff] usable ==> reserved
[0.008311] e820: remove [mem 0x000a-0x000f] usable
[0.015723] AGP: No AGP bridge found
[0.015858] last_pfn = 0x3f max_arch_pfn = 0x4
[0.015865] MTRR default type: write-back
[0.015866] MTRR fixed ranges enabled:
[0.015869]   0-9 write-back
[0.015871]   A-B uncachable
[0.015873]   C-F write-back
[0.015874] MTRR variable ranges enabled:
[0.015878]   0 base 00F580 mask 80 uncachable
[0.015881]   1 base 00F600 mask FFFE00 uncachable
[0.015883]   2 base 00F800 mask FFF800 uncachable
[0.015884]   3 disabled
[0.015885]   4 disabled
[0.015886]   5 disabled
[0.015887]   6 disabled
[0.015888]   7 disabled
[0.016523] x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WP  UC- WT
[0.016767] last_pfn = 0xf57f6 max_arch_pfn = 0x4
[0.016863] Base memory trampoline at [(ptrval)] 99000 size 24576
[0.016878] BRK [0x2be401000, 0x2be401fff] PGTABLE
[0.016887] BRK [0x2be402000, 0x2be402fff] PGTABLE
[0.016891] BRK [0x2be403000, 0x2be403fff] PGTABLE
[0.016964] BRK [0x2be404000, 0x2be404fff] PGTABLE
[0.016971] BRK [0x2be405000, 0x2be405fff] PGTABLE
[0.017198] BRK [0x2be406000, 0x2be406fff] PGTABLE
[0.017209] BRK [0x2be407000, 0x2be407fff] PGTABLE
[0.017219] BRK [0x2be408000, 0x2be408fff] PGTABLE
[0.017284] BRK [0x2be409000, 0x2be409fff] PGTABLE
[0.017521] BRK [0x2be40a000, 0x2be40afff] PGTABLE
[0.017583] ACPI: Early table checksum verification disabled
[0.018039] ACPI: RSDP 0x000F4F20 24 (v02 HP)
[0.018046] ACPI: XSDT 0xF57F6C00 44 (v01 HP A01  
0002 �?   162E)
[0.018058] ACPI: FACP 0xF57F6C80 F4 (v03 HP A01  
0002 �?   162E)
[0.018074] ACPI BIOS Warning (bug): Invalid length for 
FADT/Pm1aControlBlock: 32, using default 16 (20180810/tbfadt-674)
[0.018079] ACPI BIOS Warning (bug): Invalid length for 
FADT/Pm1bControlBlock: 32, using default 16 (20180810/tbfadt-674)
[0.018085] ACPI: DSDT 0xF57F6D80 0051D5 (v01 HP DSDT 
0001 MSFT 0201)
[0.018091] ACPI: FACS 0xF57F68C0 40
[0.018094] ACPI: FACS 0xF

Re: HP DL585 warm boot fail (old)

2018-10-24 Thread Meelis Roos

Can you try the patch below?  This is extracted from the code here:
https://github.com/joyent/illumos-joyent/blob/b6a0b04d591f5b877cfe05f45e81f0e8a5cfc2b3/usr/src/uts/intel/io/pci/pci_boot.c#L1805


Thank you. Unfortunately it does not change anything noticeable.


I'm not sure why this would be only an intermittent problem, but at
least we can see if this is related.


It seems 4.19 and current git are 100% reproducers so far - I have not managed
to successfully boot either of them yet. I have seen a 4.19-rc1 era git kernel
booting at least once.

Of the kernels I have in the grub menu, the Debian packaged 4.17 with initramfs
has worked fine in my tests so far. My self-compiled kernels do not use initramfs.

--
Meelis Roos 


Re: HP DL585 warm boot fail (old)

2018-10-24 Thread Meelis Roos

Would you mind opening a report at https://bugzilla.kernel.org?  I'm
not sure if anybody will be able to do anything about this, but it's
always possible.


Submitted now, https://bugzilla.kernel.org/show_bug.cgi?id=201503



A complete dmesg log and "sudo lspci -vv" output from a successful
boot would be a good start.  And if you have a screenshot of the
failure, that would help, too.  You can use the "ignore_loglevel"
kernel parameter to make sure we see everything on the console.


Added.


Does this machine have an iLO?  If so, it may have logs that could be
useful if this is related to some sort of bus error.


Nothing in the iLO logs.

--
Meelis Roos 


Re: 32-bit PTI with THP = userspace corruption

2018-09-11 Thread Meelis Roos
>   4) Disable PTI support on 2-level paging by making it dependent
>  on CONFIG_X86_PAE. This is, imho, the least ugly option
>  because the machines that do not support PAE are most likely
>  too old to be affected my Meltdown anyway. We might also
>  too old to be affected by Meltdown anyway. We might also
> 
> Any other thoughts?

The machines where I have PAE off are the ones that have less memory. 
PAE is off just for performance reasons, not lack of PAE. PAE should be 
present on all of my affected machines anyway and current distributions 
seem to mostly assume 686 and PAE anyway for 32-bit systems.

-- 
Meelis Roos (mr...@ut.ee)  http://www.cs.ut.ee/~mroos/


rng_dev_read: Kernel memory exposure attempt detected from SLUB object 'kmalloc-64'

2018-09-10 Thread Meelis Roos
This is weekend's 4.19.0-rc2-00246-gd7b686ebf704 on a ThinkPad T460s. 
There seems to be a usercopy warning from rng_dev_read (full dmesg 
below).

[0.00] microcode: microcode updated early to revision 0xc6, date = 
2018-04-17
[0.00] Linux version 4.19.0-rc2-00246-gd7b686ebf704 (mroos@t460s) (gcc 
version 8.2.0 (Debian 8.2.0-5)) #36 SMP Sat Sep 8 16:27:54 EEST 2018
[0.00] Command line: BOOT_IMAGE=/vmlinuz-4.19.0-rc2-00246-gd7b686ebf704 
root=/dev/mapper/TP-ROOT ro
[0.00] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point 
registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x008: 'MPX bounds registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x010: 'MPX CSR'
[0.00] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[0.00] x86/fpu: xstate_offset[3]:  832, xstate_sizes[3]:   64
[0.00] x86/fpu: xstate_offset[4]:  896, xstate_sizes[4]:   64
[0.00] x86/fpu: Enabled xstate features 0x1f, context size is 960 
bytes, using 'compacted' format.
[0.00] BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009cfff] usable
[0.00] BIOS-e820: [mem 0x0009d000-0x0009] reserved
[0.00] BIOS-e820: [mem 0x000e-0x000f] reserved
[0.00] BIOS-e820: [mem 0x0010-0xb100afff] usable
[0.00] BIOS-e820: [mem 0xb100b000-0xc3ed5fff] reserved
[0.00] BIOS-e820: [mem 0xc3ed6000-0xc3ed6fff] ACPI NVS
[0.00] BIOS-e820: [mem 0xc3ed7000-0xcff75fff] reserved
[0.00] BIOS-e820: [mem 0xcff76000-0xcff77fff] ACPI NVS
[0.00] BIOS-e820: [mem 0xcff78000-0xcff78fff] reserved
[0.00] BIOS-e820: [mem 0xcff79000-0xcffc5fff] ACPI NVS
[0.00] BIOS-e820: [mem 0xcffc6000-0xcfffdfff] ACPI data
[0.00] BIOS-e820: [mem 0xcfffe000-0xd7ff] reserved
[0.00] BIOS-e820: [mem 0xd860-0xdc7f] reserved
[0.00] BIOS-e820: [mem 0xf800-0xfbff] reserved
[0.00] BIOS-e820: [mem 0xfd00-0xfe7f] reserved
[0.00] BIOS-e820: [mem 0xfec0-0xfec00fff] reserved
[0.00] BIOS-e820: [mem 0xfed0-0xfed00fff] reserved
[0.00] BIOS-e820: [mem 0xfed1-0xfed19fff] reserved
[0.00] BIOS-e820: [mem 0xfed84000-0xfed84fff] reserved
[0.00] BIOS-e820: [mem 0xfee0-0xfee00fff] reserved
[0.00] BIOS-e820: [mem 0xff80-0x] reserved
[0.00] BIOS-e820: [mem 0x0001-0x0003227f] usable
[0.00] NX (Execute Disable) protection: active
[0.00] SMBIOS 2.8 present.
[0.00] DMI: LENOVO 20F9003SMS/20F9003SMS, BIOS N1CET65W (1.33 ) 
02/16/2018
[0.00] tsc: Detected 2400.000 MHz processor
[0.002224] e820: update [mem 0x-0x0fff] usable ==> reserved
[0.002226] e820: remove [mem 0x000a-0x000f] usable
[0.002234] last_pfn = 0x322800 max_arch_pfn = 0x4
[0.002238] MTRR default type: write-back
[0.002239] MTRR fixed ranges enabled:
[0.002240]   0-9 write-back
[0.002241]   A-B uncachable
[0.002242]   C-F write-protect
[0.002242] MTRR variable ranges enabled:
[0.002244]   0 base 00E000 mask 7FE000 uncachable
[0.002245]   1 base 00DC00 mask 7FFC00 uncachable
[0.002246]   2 base 00DA00 mask 7FFE00 uncachable
[0.002246]   3 disabled
[0.002246]   4 disabled
[0.002247]   5 disabled
[0.002247]   6 disabled
[0.002248]   7 disabled
[0.002248]   8 disabled
[0.002248]   9 disabled
[0.003223] x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WP  UC- WT  
[0.003726] last_pfn = 0xb100b max_arch_pfn = 0x4
[0.011684] Scanning 1 areas for low memory corruption
[0.011688] Base memory trampoline at [(ptrval)] 97000 size 24576
[0.011691] Using GB pages for direct mapping
[0.011693] BRK [0x2422f6000, 0x2422f6fff] PGTABLE
[0.011695] BRK [0x2422f7000, 0x2422f7fff] PGTABLE
[0.011696] BRK [0x2422f8000, 0x2422f8fff] PGTABLE
[0.011724] BRK [0x2422f9000, 0x2422f9fff] PGTABLE
[0.011726] BRK [0x2422fa000, 0x2422fafff] PGTABLE
[0.011888] BRK [0x2422fb000, 0x2422fbfff] PGTABLE
[0.011917] BRK [0x2422fc000, 0x2422fcfff] PGTABLE
[0.011986] RAMDISK: [mem 0x36a31000-0x3750]
[0.011996] ACPI: Early table checksum verification disabled
[0.012029] ACPI: RSDP 0x000F0120 24 (v02 LENOVO)
[0.012033] ACPI: XSDT 0xCFFCF188 EC (v01 LENOVO TP-N1C   
 PTE

4.19-rc1: usercopy warning from rng_dev_read()

2018-09-01 Thread Meelis Roos
Some time yesterday I got this warning in dmesg.


[55255.629421] usercopy: Kernel memory exposure attempt detected from SLUB 
object 'kmalloc-64' (offset 0, size 379)!
[55255.629440] [ cut here ]
[55255.629446] kernel BUG at mm/usercopy.c:102!
[55255.629465] invalid opcode:  [#1] SMP PTI
[55255.629477] CPU: 3 PID: 1719 Comm: rngd Not tainted 4.19.0-rc1 #34
[55255.629483] Hardware name: LENOVO 20F9003SMS/20F9003SMS, BIOS N1CET65W (1.33 
) 02/16/2018
[55255.629499] RIP: 0010:usercopy_abort+0x6f/0x71
[55255.629508] Code: 0f 45 c6 48 c7 c2 2c 27 e0 bd 48 c7 c6 d5 53 df bd 51 48 
0f 45 f2 48 89 f9 41 52 48 89 c2 48 c7 c7 f8 27 e0 bd e8 0e 3c ed ff <0f> 0b 49 
89 e8 31 c9 44 89 e2 31 f6 48 c7 c7 60 27 e0 bd e8 79 ff
[55255.629516] RSP: 0018:a2394078bdb0 EFLAGS: 00010246
[55255.629527] RAX: 0065 RBX: 8d2e5464afc0 RCX: 0006
[55255.629535] RDX:  RSI: 0086 RDI: 8d2e56b95500
[55255.629541] RBP: 017b R08: bd5116c0 R09: 0065
[55255.629548] R10: be6902a0 R11: be67efad R12: 0001
[55255.629555] R13: 8d2e5464b13b R14: 017b R15: 017b
[55255.629564] FS:  7fc22d165700() GS:8d2e56b8() 
knlGS:
[55255.629572] CS:  0010 DS:  ES:  CR0: 80050033
[55255.629579] CR2: 1d6de2d36018 CR3: 000309eae004 CR4: 003606e0
[55255.629584] Call Trace:
[55255.629605]  __check_heap_object+0xd5/0x100
[55255.629615]  __check_object_size+0xf5/0x17c
[55255.629627]  rng_dev_read+0x6e/0x270
[55255.629642]  __vfs_read+0x31/0x170
[55255.629657]  vfs_read+0x85/0x130
[55255.629670]  ksys_read+0x4a/0xb0
[55255.629682]  do_syscall_64+0x4a/0xf0
[55255.629696]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[55255.629706] RIP: 0033:0x7fc22d337394
[55255.629715] Code: 84 00 00 00 00 00 41 54 55 49 89 d4 53 48 89 f5 89 fb 48 
83 ec 10 e8 8b fc ff ff 4c 89 e2 41 89 c0 48 89 ee 89 df 31 c0 0f 05 <48> 3d 00 
f0 ff ff 77 38 44 89 c7 48 89 44 24 08 e8 c7 fc ff ff 48
[55255.629723] RSP: 002b:7fc22d164e10 EFLAGS: 0246 ORIG_RAX: 

[55255.629733] RAX: ffda RBX: 0003 RCX: 7fc22d337394
[55255.629739] RDX: 09c4 RSI: 55a5a95d0b50 RDI: 0003
[55255.629746] RBP: 55a5a95d0b50 R08:  R09: 7fff68b5b080
[55255.629752] R10: 0001 R11: 0246 R12: 09c4
[55255.629759] R13: 7fff68ac3a9f R14: 7fff68ac3aa0 R15: 
[55255.629766] Modules linked in: tun ipt_MASQUERADE nf_conntrack_netlink 
iptable_nat nf_nat_ipv4 xt_addrtype iptable_filter bpfilter xt_conntrack nf_nat 
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c br_netfilter bridge stp 
llc overlay fuse cpufreq_userspace bnep iwlmvm mac80211 snd_hda_codec_hdmi 
btusb btrtl btbcm btintel iwlwifi snd_hda_codec_realtek snd_hda_codec_generic 
x86_pkg_temp_thermal intel_powerclamp coretemp uvcvideo videobuf2_vmalloc 
videobuf2_memops videobuf2_v4l2 snd_hda_intel snd_hda_codec joydev pcspkr 
videobuf2_common cfg80211 iTCO_wdt snd_hwdep iTCO_vendor_support snd_hda_core 
cdc_mbim cdc_acm cdc_wdm videodev cdc_ncm usbnet mii media bluetooth 
ecdh_generic mei_me mei intel_pch_thermal thinkpad_acpi tpm_crb tpm_tis 
tpm_tis_core pcc_cpufreq tpm ip_tables dm_crypt dm_mod dax
[55255.629913]  hid_generic rtsx_pci_sdmmc mmc_core crct10dif_pclmul e1000e 
i2c_i801 rtsx_pci mfd_core
[55255.629987] ---[ end trace 26cd21a5b2d7ec20 ]---
[55255.630022] RIP: 0010:usercopy_abort+0x6f/0x71
[55255.630046] Code: 0f 45 c6 48 c7 c2 2c 27 e0 bd 48 c7 c6 d5 53 df bd 51 48 
0f 45 f2 48 89 f9 41 52 48 89 c2 48 c7 c7 f8 27 e0 bd e8 0e 3c ed ff <0f> 0b 49 
89 e8 31 c9 44 89 e2 31 f6 48 c7 c7 60 27 e0 bd e8 79 ff
[55255.630069] RSP: 0018:a2394078bdb0 EFLAGS: 00010246
[55255.630102] RAX: 0065 RBX: 8d2e5464afc0 RCX: 0006
[55255.630134] RDX:  RSI: 0086 RDI: 8d2e56b95500
[55255.630154] RBP: 017b R08: bd5116c0 R09: 0065
[55255.630173] R10: be6902a0 R11: be67efad R12: 0001
[55255.630197] R13: 8d2e5464b13b R14: 017b R15: 017b
[55255.630218] FS:  7fc22d165700() GS:8d2e56b8() 
knlGS:
[55255.630246] CS:  0010 DS:  ES:  CR0: 80050033
[55255.630266] CR2: 1d6de2d36018 CR3: 000309eae004 CR4: 003606e0
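
For readers of the trace: the first log line reports a copy of 379 bytes starting at offset 0 of a 'kmalloc-64' object, which cannot fit in 64 bytes. The arithmetic behind that complaint can be sketched in Python as below. The helper name copy_fits_object is hypothetical and this is only an illustration of the bounds test, not the kernel's __check_heap_object() implementation, which also takes per-cache whitelisted regions into account.

```python
def copy_fits_object(object_size: int, offset: int, length: int) -> bool:
    """Return True if the byte range [offset, offset + length) lies inside
    an object of object_size bytes."""
    return offset >= 0 and length >= 0 and offset + length <= object_size


# Values from the dmesg line above: kmalloc-64, offset 0, size 379.
print(copy_fits_object(64, 0, 379))  # the reported copy
print(copy_fits_object(64, 0, 64))   # largest copy that would still fit
```

The length 379 is 0x17b, the same value that shows up in RBP, R14 and R15 of the register dump.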

Config:

#
# Automatically generated file; DO NOT EDIT.
# Linux/x86 4.19.0-rc1 Kernel Configuration
#

#
# Compiler: gcc (Debian 8.2.0-4) 8.2.0
#
CONFIG_CC_IS_GCC=y
CONFIG_GCC_VERSION=80200
CONFIG_CLANG_VERSION=0
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y
CONFIG_THREAD_INFO_IN_TASK=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_BUILD_SALT=""
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG

Re: cmpxchg.h:245:2: error: ‘asm’ operand has impossible constraints

2018-08-31 Thread Meelis Roos
> > > > 5.3.1-14, seems to be available in 
> > > > http://snapshot.debian.org/package/gcc-5/5.3.1-14/#gcc-5_5.3.1-14 - the 
> > > > whole system is a snapshot of Debian unstable when they stopped 
> > > > supporting pre-686 CPUs.
> > > 
> > > Uurgh. That's going to be a nightmare to set that one up. Let's try to 
> > > nail
> > > it on your machine then. Can you try to generate the intermediate file by
> > > invoking: make mm/slub.i ?
> > 
> > Here you are.
> 
> Looks unsuspicious. Is this an entirely new issue on 4.19-rc or can you see
> the same with older kernel versions?

4.18 was fine with the same toolchain, so this is a new issue.

-- 
Meelis Roos (mr...@linux.ee)


Re: cmpxchg.h:245:2: error: ‘asm’ operand has impossible constraints

2018-08-31 Thread Meelis Roos
> > While trying to compile v4.18-13105-gaba16dc5cf93 with gcc 5.3.1 on a 
> > 32-bit x86 configured for AMD K6:
> 
> I tried to get hold of that debian gcc 5.3.1 compiler, but no luck so far.

5.3.1-14, seems to be available in 
http://snapshot.debian.org/package/gcc-5/5.3.1-14/#gcc-5_5.3.1-14 - the 
whole system is a snapshot of Debian unstable when they stopped 
supporting pre-686 CPUs.

-- 
Meelis Roos (mr...@linux.ee)


Re: 32-bit PTI with THP = userspace corruption

2018-08-31 Thread Meelis Roos
> > I am seeing userland corruption and application crashes on multiple 
> > 32-bit machines with 4.19-rc1+git. The machines vary: PII, PIII, P4. 
> > They are all Intel. AMD Duron/Athlon/AthlonMP have been fine in my tests 
> > so far (may be configuration dependent).
> 
> Thanks for the report! I'll try to reproduce the problem tomorrow and
> investigate it. Can you please check if any of the kernel configurations
> that show the bug has CONFIG_X86_PAE set? If not, can you please test
> if enabling this option still triggers the problem?

The PAE option itself was not visible, but when I changed HIGHMEM_4G to 
HIGHMEM_64G, X86_PAE was also selected and the resulting kernel works.

Also, I verified that the old Proliants with 6G RAM already have 
HIGHMEM_64G set and they do not exhibit the problem either.

-- 
Meelis Roos (mr...@linux.ee)


Re: 32-bit PTI with THP = userspace corruption

2018-08-30 Thread Meelis Roos
> > I am seeing userland corruption and application crashes on multiple 
> > 32-bit machines with 4.19-rc1+git. The machines vary: PII, PIII, P4. 
> > They are all Intel. AMD Duron/Athlon/AthlonMP have been fine in my tests 
> > so far (may be configuration dependent).
> 
> Thanks for the report! I'll try to reproduce the problem tomorrow and
> investigate it. Can you please check if any of the kernel configurations
> that show the bug has CONFIG_X86_PAE set? If not, can you please test
> if enabling this option still triggers the problem?

Will check, but from memory there were 2 G3 HP Proliants that did 
not fit the pattern (the problem did not appear). I have more than 4G 
RAM in those and HIGHMEM_4G there, maybe that's it?

-- 
Meelis Roos (mr...@linux.ee)


32-bit PTI with THP = userspace corruption

2018-08-30 Thread Meelis Roos
I am seeing userland corruption and application crashes on multiple 
32-bit machines with 4.19-rc1+git. The machines vary: PII, PIII, P4. 
They are all Intel. AMD Duron/Athlon/AthlonMP have been fine in my tests 
so far (may be configuration dependent).

Typical problem is running aptitude in Debian unstable, doing package 
list update and seeing glibc warning about linked list corruption and 
some other corruption, causing SIGABRT-s:

corrupted double-linked list
Ouch!  Got SIGABRT, dying..

malloc_consolidate(): invalid chunk size
Ouch!  Got SIGABRT, dying..

I bisected the problem. It was tricky because it led to 32-bit bpf 
problem commit range, but that could be worked around with the patch 
that was later applied. The result is 32-bit PTI introduction commit 
(PTI was turned on on all the test machines):

7757d607c6b3186de42e1fb0210b9c5d8b70 is the first bad commit
commit 7757d607c6b3186de42e1fb0210b9c5d8b70
Author: Joerg Roedel 
Date:   Wed Jul 18 11:41:14 2018 +0200

x86/pti: Allow CONFIG_PAGE_TABLE_ISOLATION for x86_32

Allow PTI to be compiled on x86_32.

Signed-off-by: Joerg Roedel 
Signed-off-by: Thomas Gleixner 
Tested-by: Pavel Machek 
Cc: "H . Peter Anvin" 
Cc: linux...@kvack.org
Cc: Linus Torvalds 
Cc: Andy Lutomirski 
Cc: Dave Hansen 
Cc: Josh Poimboeuf 
Cc: Juergen Gross 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Jiri Kosina 
Cc: Boris Ostrovsky 
Cc: Brian Gerst 
Cc: David Laight 
Cc: Denys Vlasenko 
Cc: Eduardo Valentin 
Cc: Greg KH 
Cc: Will Deacon 
Cc: aligu...@amazon.com
Cc: daniel.gr...@iaik.tugraz.at
Cc: hu...@google.com
Cc: keesc...@google.com
Cc: Andrea Arcangeli 
Cc: Waiman Long 
Cc: "David H . Gutteridge" 
Cc: j...@8bytes.org
Link: 
https://lkml.kernel.org/r/1531906876-13451-38-git-send-email-j...@8bytes.org

:04 04 dbab9a897d534d7b14f900f0c6779b6848833892 
f0674017544bc95fafa431d1e638f994eca37b51 M  security

However, not all of my 32-bit Intel machines showed the problem, so I 
looked for correlations in kernel configs (6 working and 6 non-working) 
and found a suspect of CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y, as well as 
multiple CPU hotplug options (not turned on directly but by something 
else, I think - and not investigated further). I retested 
v4.19-rc1-95-g3f16503b7d22 with changed configuration options and found 
that it starts to work as soon as I turn 
CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS to madvise or turn off 
CONFIG_PAGE_TABLE_ISOLATION. So the combination of PTI and THP always-on 
breaks it.
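
The config comparison described above (a set of working and a set of non-working .config files) can be sketched roughly as follows. This is a minimal Python illustration, not the procedure actually used for this report. The function names parse_config and suspects are made up for the example, and treating an option that is missing from a file as "n" is a simplifying assumption.

```python
def parse_config(text: str) -> dict:
    """Map option name -> value; '# CONFIG_X is not set' is stored as 'n'."""
    suffix = " is not set"
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            options[line[2:-len(suffix)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def suspects(good: list, bad: list) -> dict:
    """Options that have one identical value in every bad config and never
    have that value in any good config (absent options count as 'n')."""
    result = {}
    for name in set().union(*good, *bad):
        bad_values = {cfg.get(name, "n") for cfg in bad}
        if len(bad_values) != 1:
            continue
        (value,) = bad_values
        if all(cfg.get(name, "n") != value for cfg in good):
            result[name] = value
    return result


good = [parse_config(
    "CONFIG_PAGE_TABLE_ISOLATION=y\n"
    "CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y\n"
    "# CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS is not set\n"
)]
bad = [parse_config(
    "CONFIG_PAGE_TABLE_ISOLATION=y\n"
    "CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y\n"
    "# CONFIG_TRANSPARENT_HUGEPAGE_MADVISE is not set\n"
)]
print(sorted(suspects(good, bad).items()))
```

With real files, each list would hold one parsed .config per machine. An option that is set the same way on both sides, like CONFIG_PAGE_TABLE_ISOLATION in the toy input, drops out, so a combination such as PTI plus THP always-on still has to be confirmed by rebuilding with one option changed at a time, as was done here.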

Here is a sample configuration that is broken:

#
# Automatically generated file; DO NOT EDIT.
# Linux/x86 4.19.0-rc1 Kernel Configuration
#

#
# Compiler: gcc (Debian 8.2.0-3) 8.2.0
#
CONFIG_CC_IS_GCC=y
CONFIG_GCC_VERSION=80200
CONFIG_CLANG_VERSION=0
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y
CONFIG_THREAD_INFO_IN_TASK=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_BUILD_SALT=""
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
# CONFIG_KERNEL_GZIP is not set
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
CONFIG_KERNEL_XZ=y
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
CONFIG_CROSS_MEMORY_ATTACH=y
# CONFIG_USELIB is not set
CONFIG_AUDIT=y
CONFIG_HAVE_ARCH_AUDITSYSCALL=y
CONFIG_AUDITSYSCALL=y
CONFIG_AUDIT_WATCH=y
CONFIG_AUDIT_TREE=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_GENERIC_IRQ_MIGRATION=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_DOMAIN_HIERARCHY=y
CONFIG_GENERIC_MSI_IRQ=y
CONFIG_GENERIC_MSI_IRQ_DOMAIN=y
CONFIG_GENERIC_IRQ_MATRIX_ALLOCATOR=y
CONFIG_GENERIC_IRQ_RESERVATION_MODE=y
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
# CONFIG_GENERIC_IRQ_DEBUGFS is not set
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_DATA=y
CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ is not set
CONFIG_HIGH_RES_TIMERS=y
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
# CONFIG_IRQ_TIME_ACCOUNTING is not set
# CONFIG_BSD_PROCESS_ACCT is not set
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
CONFIG_TASK_XACCT=y
CONFIG_TASK_IO_ACCOUNTING=y
# CONFIG_CPU_ISOLATION is not set

#
#

cmpxchg.h:245:2: error: ‘asm’ operand has impossible constraints

2018-08-26 Thread Meelis Roos
While trying to compile v4.18-13105-gaba16dc5cf93 with gcc 5.3.1 on a 
32-bit x86 configured for AMD K6:

  CC  mm/slub.o
In file included from ./arch/x86/include/asm/atomic.h:8:0,
 from ./include/linux/atomic.h:7,
 from ./arch/x86/include/asm/thread_info.h:54,
 from ./include/linux/thread_info.h:38,
 from ./arch/x86/include/asm/preempt.h:7,
 from ./include/linux/preempt.h:81,
 from ./include/linux/spinlock.h:51,
 from ./include/linux/mmzone.h:8,
 from ./include/linux/gfp.h:6,
 from ./include/linux/mm.h:10,
 from mm/slub.c:13:
mm/slub.c: In function ‘__slab_free’:
./arch/x86/include/asm/cmpxchg.h:245:2: error: ‘asm’ operand has impossible 
constraints
  asm volatile(pfx "cmpxchg%c4b %2; sete %0"   \
  ^
./arch/x86/include/asm/cmpxchg.h:254:2: note: in expansion of macro 
‘__cmpxchg_double’
  __cmpxchg_double(LOCK_PREFIX, p1, p2, o1, o2, n1, n2)
  ^
./include/asm-generic/atomic-instrumented.h:457:2: note: in expansion of macro 
‘arch_cmpxchg_double’
  arch_cmpxchg_double(__ai_p1, (p2), (o1), (o2), (n1), (n2)); \
  ^
mm/slub.c:404:7: note: in expansion of macro ‘cmpxchg_double’
   if (cmpxchg_double(&page->freelist, &page->counters,
   ^
scripts/Makefile.build:307: recipe for target 'mm/slub.o' failed
make[1]: *** [mm/slub.o] Error 1

Config:

#
# Automatically generated file; DO NOT EDIT.
# Linux/x86 4.18.0 Kernel Configuration
#

#
# Compiler: gcc (Debian 5.3.1-14) 5.3.1 20160409
#
CONFIG_CC_IS_GCC=y
CONFIG_GCC_VERSION=50301
CONFIG_CLANG_VERSION=0
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y
CONFIG_THREAD_INFO_IN_TASK=y

#
# General setup
#
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_BUILD_SALT=""
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
# CONFIG_KERNEL_GZIP is not set
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
CONFIG_KERNEL_XZ=y
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
# CONFIG_CROSS_MEMORY_ATTACH is not set
# CONFIG_USELIB is not set
CONFIG_AUDIT=y
CONFIG_HAVE_ARCH_AUDITSYSCALL=y
CONFIG_AUDITSYSCALL=y
CONFIG_AUDIT_WATCH=y
CONFIG_AUDIT_TREE=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_IRQ_DOMAIN=y
CONFIG_GENERIC_IRQ_RESERVATION_MODE=y
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
# CONFIG_GENERIC_IRQ_DEBUGFS is not set
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_DATA=y
CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ is not set
CONFIG_HIGH_RES_TIMERS=y
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
# CONFIG_IRQ_TIME_ACCOUNTING is not set
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_TASKSTATS is not set

#
# RCU Subsystem
#
CONFIG_TINY_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_SRCU=y
CONFIG_TINY_SRCU=y
# CONFIG_IKCONFIG is not set
CONFIG_LOG_BUF_SHIFT=17
CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=13
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH=y
CONFIG_CGROUPS=y
# CONFIG_MEMCG is not set
# CONFIG_BLK_CGROUP is not set
# CONFIG_CGROUP_SCHED is not set
# CONFIG_CGROUP_PIDS is not set
# CONFIG_CGROUP_RDMA is not set
# CONFIG_CGROUP_FREEZER is not set
# CONFIG_CGROUP_HUGETLB is not set
# CONFIG_CGROUP_DEVICE is not set
# CONFIG_CGROUP_CPUACCT is not set
# CONFIG_CGROUP_PERF is not set
CONFIG_CGROUP_BPF=y
# CONFIG_CGROUP_DEBUG is not set
CONFIG_SOCK_CGROUP_DATA=y
CONFIG_NAMESPACES=y
# CONFIG_UTS_NS is not set
# CONFIG_IPC_NS is not set
# CONFIG_USER_NS is not set
# CONFIG_PID_NS is not set
# CONFIG_NET_NS is not set
# CONFIG_CHECKPOINT_RESTORE is not set
# CONFIG_SCHED_AUTOGROUP is not set
# CONFIG_SYSFS_DEPRECATED is not set
# CONFIG_RELAY is not set
# CONFIG_BLK_DEV_INITRD is not set
CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
CONFIG_ANON_INODES=y
CONFIG_HAVE_UID16=y
CONFIG_SYSCTL_EXCEPTION_TRACE=y
CONFIG_HAVE_PCSPKR_PLATFORM=y
CONFIG_BPF=y
# CONFIG_EXPERT is not set
CONFIG_UID16=y
CONFIG_MULTIUSER=y
CONFIG_SGETMASK_SYSCALL=y
CONFIG_SYSFS_SYSCALL=y
CONFIG_FHANDLE=y
CONFIG_POSIX_TIMERS=y
CONFIG_PRINTK=y
CONFIG_PRINTK_NMI=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_PCSPKR_PLATFORM=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_FUTEX_PI=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIM

make *config regression: pkg-build

2018-08-19 Thread Meelis Roos
Just tried to run 'make menuconfig' on v4.18-10568-g08b5fa819970 and 
found a bad surprise:

'make *config' requires 'pkg-config'. Please install it.
make[1]: *** [scripts/kconfig/Makefile:219: scripts/kconfig/.mconf-cfg] Error 1

This is clearly a regression - I have the libncurses devel package 
installed in the default system location (as do probably 99%+ of actual 
developers) and in this case, pkg-config is useless. 
pkg-config is needed only when libraries and headers are installed in 
non-default locations, but it is bad to require installation of 
pkg-config on all the machines where make menuconfig could possibly be 
run (for example, I have a testbed of about 100 machines with 
self-hosted kernel compilation and machine-specific kernel 
configurations that occasionally need tweaking).

I notice 4.18 complained it can not find pkg-config but still worked. 
This is clearly better than now.

If we want to support developers with libraries in non-default 
locations, why not - but the common case of system include path should 
work without any trouble or warnings. For example, test if compilation 
against ncurses works, and if not retry it with pkg-config (and error 
out if it does not give working result).
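
The proposed order of checks can be sketched as below. This is only an illustration of the fallback logic in Python, not a patch for scripts/kconfig. The function choose_ncurses_flags, its two callback parameters and the default flag list are all assumptions made up for the example.

```python
from typing import Callable, List, Optional

# Assumed linker flags for ncurses installed in the default system location.
DEFAULT_FLAGS = ["-lncurses"]


def choose_ncurses_flags(
    try_build: Callable[[List[str]], bool],
    query_pkg_config: Callable[[], Optional[List[str]]],
) -> List[str]:
    """Try the plain system install first; ask pkg-config only as a fallback.

    try_build(flags) should report whether a small ncurses test program
    compiles and links with the given flags. query_pkg_config() should
    return the flags reported by pkg-config, or None if it is not installed.
    """
    if try_build(DEFAULT_FLAGS):
        return DEFAULT_FLAGS
    flags = query_pkg_config()
    if flags is not None and try_build(flags):
        return flags
    raise RuntimeError("ncurses not usable via default paths or pkg-config")


# Common case: ncurses in the default location, pkg-config not installed.
print(choose_ncurses_flags(lambda f: f == ["-lncurses"], lambda: None))
```

In this ordering a machine without pkg-config never notices its absence as long as the default include and library paths work.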

-- 
Meelis Roos (mr...@linux.ee)


ptrace compile failure with gcc-8.2 on 32-bit powerpc

2018-08-16 Thread Meelis Roos
After upgrading my distro compiler to gcc-8.2, Linux fails to compile on 
32-bit powerpc (tested with 4.17, 4.18 and v4.18-7873-gf91e654474d4).


  CC  arch/powerpc/kernel/ptrace.o
In file included from ./include/linux/bitmap.h:9,
 from ./include/linux/cpumask.h:12,
 from ./include/linux/rcupdate.h:44,
 from ./include/linux/rculist.h:11,
 from ./include/linux/pid.h:5,
 from ./include/linux/sched.h:14,
 from arch/powerpc/kernel/ptrace.c:19:
In function ‘memcpy’,
inlined from ‘user_regset_copyin’ at ./include/linux/regset.h:295:4,
inlined from ‘vr_set’ at arch/powerpc/kernel/ptrace.c:619:9:
./include/linux/string.h:345:9: error: ‘__builtin_memcpy’ offset [-527, -529] 
is out of the bounds [0, 16] of object ‘vrsave’ with type ‘union ’ 
[-Werror=array-bounds]
  return __builtin_memcpy(p, q, size);
 ^~~~
arch/powerpc/kernel/ptrace.c: In function ‘vr_set’:
arch/powerpc/kernel/ptrace.c:614:5: note: ‘vrsave’ declared here
   } vrsave;
 ^~
In file included from ./include/linux/bitmap.h:9,
 from ./include/linux/cpumask.h:12,
 from ./include/linux/rcupdate.h:44,
 from ./include/linux/rculist.h:11,
 from ./include/linux/pid.h:5,
 from ./include/linux/sched.h:14,
 from arch/powerpc/kernel/ptrace.c:19:
In function ‘memcpy’,
inlined from ‘user_regset_copyout’ at ./include/linux/regset.h:270:4,
inlined from ‘vr_get’ at arch/powerpc/kernel/ptrace.c:572:9:
./include/linux/string.h:345:9: error: ‘__builtin_memcpy’ offset [-527, -529] 
is out of the bounds [0, 16] of object ‘vrsave’ with type ‘union ’ 
[-Werror=array-bounds]
  return __builtin_memcpy(p, q, size);
 ^~~~
arch/powerpc/kernel/ptrace.c: In function ‘vr_get’:
arch/powerpc/kernel/ptrace.c:567:5: note: ‘vrsave’ declared here
   } vrsave;
 ^~
cc1: all warnings being treated as errors
make[1]: *** [scripts/Makefile.build:311: arch/powerpc/kernel/ptrace.o] Error 1


-- 
Meelis Roos (mr...@linux.ee)


apparmor unaligned accesses on sparc64 in 4.18+git

2018-08-15 Thread Meelis Roos
Just tried 4.18.0-02978-g1eb46908b35d on a sparc64 box with Debian Ports 
sparc64 unstable (apparmor packages recommended by linux-image package) 
and got the following on bootup:

[   46.315721] Kernel unaligned access at TPC[6b8b98] aa_dfa_unpack+0x38/0x620
[   46.412375] Kernel unaligned access at TPC[6b8ba8] aa_dfa_unpack+0x48/0x620
[   46.412392] Kernel unaligned access at TPC[6b8c28] aa_dfa_unpack+0xc8/0x620
[   46.698283] Kernel unaligned access at TPC[6b8ce8] aa_dfa_unpack+0x188/0x620
[   46.789536] Kernel unaligned access at TPC[6b8cfc] aa_dfa_unpack+0x19c/0x620

-- 
Meelis Roos (mr...@linux.ee)


4.18+git: undefined reference to `l1tf_vmx_mitigation'

2018-08-14 Thread Meelis Roos
Tried to compile current git (v4.18-1934-gbe718b524d8d) with AMD KVM and 
got the following linking error:

  MODPOST vmlinux.o
ld: arch/x86/kvm/x86.o: in function `kvm_get_arch_capabilities':
x86.c:(.text+0x5132): undefined reference to `l1tf_vmx_mitigation'

#
# Automatically generated file; DO NOT EDIT.
# Linux/x86 4.18.0 Kernel Configuration
#

#
# Compiler: gcc (Debian 8.2.0-3) 8.2.0
#
CONFIG_64BIT=y
CONFIG_X86_64=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_OUTPUT_FORMAT="elf64-x86-64"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_MMU=y
CONFIG_ARCH_MMAP_RND_BITS_MIN=28
CONFIG_ARCH_MMAP_RND_BITS_MAX=32
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=8
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=16
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_ARCH_HAS_FILTER_PGPROT=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_ZONE_DMA32=y
CONFIG_AUDIT_ARCH=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_X86_64_SMP=y
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_PGTABLE_LEVELS=4
CONFIG_CC_IS_GCC=y
CONFIG_GCC_VERSION=80200
CONFIG_CLANG_VERSION=0
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y
CONFIG_THREAD_INFO_IN_TASK=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
# CONFIG_KERNEL_GZIP is not set
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
CONFIG_KERNEL_XZ=y
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
CONFIG_DEFAULT_HOSTNAME="x4200m2"
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
CONFIG_CROSS_MEMORY_ATTACH=y
# CONFIG_USELIB is not set
CONFIG_AUDIT=y
CONFIG_HAVE_ARCH_AUDITSYSCALL=y
CONFIG_AUDITSYSCALL=y
CONFIG_AUDIT_WATCH=y
CONFIG_AUDIT_TREE=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_GENERIC_IRQ_MIGRATION=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_DOMAIN_HIERARCHY=y
CONFIG_GENERIC_MSI_IRQ=y
CONFIG_GENERIC_MSI_IRQ_DOMAIN=y
CONFIG_GENERIC_IRQ_MATRIX_ALLOCATOR=y
CONFIG_GENERIC_IRQ_RESERVATION_MODE=y
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
# CONFIG_GENERIC_IRQ_DEBUGFS is not set
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_DATA=y
CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ_FULL is not set
# CONFIG_NO_HZ is not set
CONFIG_HIGH_RES_TIMERS=y

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
# CONFIG_VIRT_CPU_ACCOUNTING_GEN is not set
# CONFIG_IRQ_TIME_ACCOUNTING is not set
CONFIG_BSD_PROCESS_ACCT=y
# CONFIG_BSD_PROCESS_ACCT_V3 is not set
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
CONFIG_TASK_XACCT=y
CONFIG_TASK_IO_ACCOUNTING=y
# CONFIG_CPU_ISOLATION is not set

#
# RCU Subsystem
#
CONFIG_TREE_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_SRCU=y
CONFIG_TREE_SRCU=y
CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_NEED_SEGCBLIST=y
# CONFIG_IKCONFIG is not set
CONFIG_LOG_BUF_SHIFT=18
CONFIG_LOG_CPU_MAX_BUF_SHIFT=12
CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=13
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH=y
CONFIG_ARCH_SUPPORTS_INT128=y
CONFIG_NUMA_BALANCING=y
CONFIG_NUMA_BALANCING_DEFAULT_ENABLED=y
CONFIG_CGROUPS=y
CONFIG_PAGE_COUNTER=y
CONFIG_MEMCG=y
CONFIG_MEMCG_SWAP=y
CONFIG_MEMCG_SWAP_ENABLED=y
CONFIG_BLK_CGROUP=y
# CONFIG_DEBUG_BLK_CGROUP is not set
CONFIG_CGROUP_WRITEBACK=y
CONFIG_CGROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
# CONFIG_CFS_BANDWIDTH is not set
# CONFIG_RT_GROUP_SCHED is not set
CONFIG_CGROUP_PIDS=y
# CONFIG_CGROUP_RDMA is not set
CONFIG_CGROUP_FREEZER=y
CONFIG_CGROUP_HUGETLB=y
CONFIG_CPUSETS=y
CONFIG_PROC_PID_CPUSET=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CGROUP_CPUACCT=y
CONFIG_CGROUP_PERF=y
CONFIG_CGROUP_BPF=y
# CONFIG_CGROUP_DEBUG is not set
CONFIG_SOCK_CGROUP_DATA=y
CONFIG_NAMESPACES=y
CONFIG_UTS_NS=y
CONFIG_IPC_NS=y
CONFIG_USER_NS=y
CONFIG_PID_NS=y
CONFIG_NET_NS=y
# CONFIG_SCHED_AUTOGROUP is not set
# CONFIG_SYSFS_DEPRECATED is not set
# CONFIG_RELAY is not 

Re: bisected: 4.18-rc* regression: x86-32 troubles (with timers?)

2018-07-23 Thread Meelis Roos
> >> Now this seems more relevant:
> >>
> >> mroos@rx100s2:~/linux$ nice git bisect good
> >> 24dea04767e6e5175f4750770281b0c17ac6a2fb is the first bad commit
> >> commit 24dea04767e6e5175f4750770281b0c17ac6a2fb
> >> Author: Daniel Borkmann 
> >> Date:   Fri May 4 01:08:23 2018 +0200
> >>
> >> bpf, x32: remove ld_abs/ld_ind
> >>
> >> Since LD_ABS/LD_IND instructions are now removed from the core and
> >> reimplemented through a combination of inlined BPF instructions and
> >> a slow-path helper, we can get rid of the complexity from x32 JIT.
> > 
> > This does seem much more likely than the previous bisection, given
> > that you ended up in an x86-32 specific commit (the subject says x32,
> > but that is a mistake). I also checked that systemd indeed does
> > call into bpf in a number of places, possibly for the journald socket.
> > 
> > OTOH, it's still hard to tell how that commit can have ended up
> > corrupting the clock read function in systemd. To cross-check,
> > could you try reverting that commit on the latest kernel and see
> > if it still works?
> 
> I would be curious as well about that whether revert would make it
> work. What's the value of sysctl net.core.bpf_jit_enable ? Does it
> change anything if you set it to 0 (only interpreter) or 1 (JIT
> enabled). Seems a bit strange to me that bisect ended at this commit
> given the issue you have. The JIT itself was also new in this window
> fwiw. In any case some more debug info would be great to have.

net.core.bpf_jit_enable is 1.

Since it breaks bootup, I can not easily change the value at runtime (it 
would be after the fact). Do you mean changing the 
CONFIG_BPF_JIT_ALWAYS_ON=y option?

Anyway, I started a compile of v4.18-rc5, the latest version I tested, 
with the commit in question reverted. Will see if I can test tomorrow 
morning. But I will leave tomorrow for a week and can only test further 
things if they happen to boot fine (no manual reboot possible for a 
week).

-- 
Meelis Roos (mr...@linux.ee)


Re: 4.18-rc* regression: x86-32 troubles (with timers?)

2018-07-20 Thread Meelis Roos
449
# bad: [e64d52569f6e847495091db40ab58d2d379748ef] tools: bpftool: move 
get_possible_cpus() to common code
git bisect bad e64d52569f6e847495091db40ab58d2d379748ef
# bad: [b4264c96b5cbc00c4c07deb9fbab928d43dffcf9] nfp: bpf: rewrite map 
pointers with NFP TIDs
git bisect bad b4264c96b5cbc00c4c07deb9fbab928d43dffcf9
# bad: [9816dd35ececc095f3e3be29d30d3adc755908d9] nfp: bpf: perf event output 
helpers support
git bisect bad 9816dd35ececc095f3e3be29d30d3adc755908d9
# first bad commit: [9816dd35ececc095f3e3be29d30d3adc755908d9] nfp: bpf: perf 
event output helpers support


-- 
Meelis Roos (mr...@linux.ee)


Re: 4.18-rc* regression: x86-32 troubles (with timers?)

2018-07-16 Thread Meelis Roos
> > Everything below here is 'bad', which can be an indication that you
> > misclassified one of
> > the commits above as 'good' when it should have been 'bad'. The most likely
> > explanations are that you either typed the 'git bisect good' by accident, or
> > that the failure is not 100% reliable, and it sometimes works fine even on a
> > broken kernel.
> > 
> > 0bc5fe857274133ca0 follows directly after 3a443bd6dd7c, "net/9p: correct the
> > variable name in v9fs_get_trans_by_name() comment", which is marked "good",
> > and can't really be good if 0bc5fe85727413 is bad and you are not using the
> > 'qed' driver.
> > 
> > I'd retest 3a443bd6dd7c again to see if that should have been 'bad', and
> > if it was, test v4.17-rc4, which is what the net-next tree was based on.
> 
> Yes, the same prebuilt 3a443bd6dd7c appeared to be bad when retesting 
> it. Building v4.17-rc4 now.

v4.17-rc4 seems good after 2 reboots.
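
Since the failure may not be 100% reliable, it helps to estimate how much a given number of clean reboots says about a commit. A rough Python sketch follows, under the assumption (not measured here) that a broken kernel shows the problem independently on each boot with a fixed probability. The function names and the 50% rate in the usage lines are hypothetical.

```python
import math


def miss_probability(failure_rate: float, boots: int) -> float:
    """Chance that a broken kernel survives `boots` reboots without failing."""
    return (1.0 - failure_rate) ** boots


def boots_needed(failure_rate: float, confidence: float) -> int:
    """Clean boots required before marking a commit 'good' with the given
    confidence, assuming independent boots."""
    if not 0 < failure_rate < 1 or not 0 < confidence < 1:
        raise ValueError("failure_rate and confidence must be in (0, 1)")
    return math.ceil(math.log(1 - confidence) / math.log(1 - failure_rate))


# Hypothetical example: the bug shows up on half of all boots.
print(miss_probability(0.5, 2))
print(boots_needed(0.5, 0.95))
```

Under that hypothetical rate, two clean reboots would still leave a one in four chance of marking a bad commit as good.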

-- 
Meelis Roos (mr...@ut.ee)  http://www.cs.ut.ee/~mroos/


Re: 4.18-rc* regression: x86-32 troubles (with timers?)

2018-07-16 Thread Meelis Roos
> Everything below here is 'bad', which can be an indication that you
> misclassified one of
> the commits above as 'good' when it should have been 'bad'. The most likely
> explanations are that you either typed the 'git bisect good' by accident, or
> that the failure is not 100% reliable, and it sometimes works fine even on a
> broken kernel.
> 
> 0bc5fe857274133ca0 follows directly after 3a443bd6dd7c, "net/9p: correct the
> variable name in v9fs_get_trans_by_name() comment", which is marked "good",
> and can't really be good if 0bc5fe85727413 is bad and you are not using the
> 'qed' driver.
> 
> I'd retest 3a443bd6dd7c again to see if that should have been 'bad', and
> if it was, test v4.17-rc4, which is what the net-next tree was based on.

Yes, the same prebuilt 3a443bd6dd7c appeared to be bad when retesting 
it. Building v4.17-rc4 now.

-- 
Meelis Roos (mr...@linux.ee)


Re: 4.18-rc* regression: x86-32 troubles (with timers?)

2018-07-15 Thread Meelis Roos
7104a] qed: Add support for Unified 
Fabric Port.
git bisect bad cac6f691546b9efd50c31c0db97fe50d0357104a
# bad: [27bf96e32c92599dc7523b36d6c761fc8312c8c0] qed: Remove unused data 
member 'is_mf_default'.
git bisect bad 27bf96e32c92599dc7523b36d6c761fc8312c8c0
# bad: [0bc5fe857274133ca028ebb15ff2e8549a369916] qed*: Refactor mf_mode to 
consist of bits.
git bisect bad 0bc5fe857274133ca028ebb15ff2e8549a369916
# first bad commit: [0bc5fe857274133ca028ebb15ff2e8549a369916] qed*: Refactor 
mf_mode to consist of bits.


-- 
Meelis Roos (mr...@linux.ee)


HP DL585 warm boot fail (old)

2018-07-06 Thread Meelis Roos
I have a first gen HP Proliant DL585 ("G1" but the name was not used 
back then) that boots up fine from poweron but usually fails bootup from 
warm reboot, somewhere in PCI detection (will try to photograph the 
screen some time).

I just stumbled upon an old OpenSolaris thread about the same DL585 and 
same symptoms: 
http://opensolaris-discuss.opensolaris.narkive.com/T0UTXYGZ/solaris-10-06-06-x86-hp-dl585-boot-hang-aftrer-reboot-help

Their conclusion was the following and they seem to have found a fix 
(although I have not tested any version of Solaris on this DL585 
myself):

"The hang is caused when, during PCI enumeration, a PCI-PCI bridge is 
partially disabled when the PCI command register bits which enable IO 
and memory windows are cleared."

Is this information useful in some way for debugging it?

What else besides a photo of the screen can be useful in debugging?

-- 
Meelis Roos (mr...@linux.ee)


UBSAN: Undefined behaviour in lib/percpu_counter.c:92:14

2018-07-06 Thread Meelis Roos
This is on an AMD Athlon64 X2 compiling a kernel with make -j2:

[91550.438790] 

[91550.438832] UBSAN: Undefined behaviour in lib/percpu_counter.c:92:14
[91550.438862] signed integer overflow:
[91550.43] 91550438785688 + 9223336756968817285 cannot be represented in 
type 'long long int'
[91550.438923] CPU: 0 PID: 8875 Comm: cc1 Not tainted 
4.18.0-rc3-00113-gfc36def997cf #11
[91550.438924] Hardware name: HP-Pavilion RT589AA-ABU t3709.uk/Nance, BIOS 5.02 
11/26/2006
[91550.438924] Call Trace:
[91550.438929]  
[91550.438937]  dump_stack+0x5a/0x9b
[91550.438941]  ubsan_epilogue+0x9/0x40
[91550.438944]  handle_overflow+0xf2/0x100
[91550.438946]  percpu_counter_add_batch+0xfb/0x120
[91550.438949]  cfq_completed_request+0x320/0xb00
[91550.438953]  __blk_put_request+0x15d/0x390
[91550.438957]  scsi_end_request+0x154/0x370
[91550.438960]  scsi_io_completion+0x603/0x9e0
[91550.438963]  blk_done_softirq+0xe6/0x1c0
[91550.438967]  __do_softirq+0x118/0x414
[91550.438970]  irq_exit+0xa2/0xd0
[91550.438972]  do_IRQ+0xac/0x160
[91550.438974]  common_interrupt+0xf/0xf
[91550.438976]  
[91550.438978] RIP: 0033:0x7f54e89631b7
[91550.438979] Code: 83 f9 02 48 0f 47 cf 83 c1 7c e9 a9 fa ff ff 4c 8b 41 08 
4c 89 c2 48 83 e2 f8 48 39 d3 0f 87 fb 00 00 00 48 8d 3c 11 48 8b 07 <48> 39 d0 
0f 85 35 01 00 00 48 8b 51 10 48 8b 71 18 48 39 4a 18 0f 
[91550.439006] RSP: 002b:7ffc7398a470 EFLAGS: 0287 ORIG_RAX: 
ffde
[91550.439007] RAX: 02a0 RBX: 0060 RCX: 03311380
[91550.439009] RDX: 02a0 RSI: 7f54e8c96f30 RDI: 03311620
[91550.439010] RBP: 0004 R08: 02a1 R09: 7f54e8c96cb0
[91550.439011] R10:  R11: 0001 R12: 
[91550.439012] R13: 7f54e8c96c40 R14: 02e3b010 R15: 7f54e8c96ca0
[91550.439013] 
========
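
To double-check the arithmetic in the UBSAN message, here is a small 
sketch that redoes the addition with exact integers and compares it 
against the range of a signed 64-bit 'long long'. The helper names are 
made up for illustration; it only checks the two operands as printed in 
the report and says nothing about why percpu_counter_add_batch() was 
handed them.

```python
LLONG_MAX = 2**63 - 1   # largest signed 64-bit value
LLONG_MIN = -(2**63)    # smallest signed 64-bit value


def add_overflows_s64(a, b):
    """True if a + b falls outside the signed 64-bit range."""
    total = a + b  # Python integers are arbitrary precision, so this is exact
    return total > LLONG_MAX or total < LLONG_MIN


def wrap_s64(value):
    """Two's-complement wrap of an exact integer into the signed 64-bit range."""
    value &= 2**64 - 1
    return value - 2**64 if value >= 2**63 else value


if __name__ == "__main__":
    a = 91550438785688
    b = 9223336756968817285
    print("overflows:", add_overflows_s64(a, b))
    print("wrapped result:", wrap_s64(a + b))
```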


-- 
Meelis Roos (mr...@linux.ee)


Re: 4.18-rc* regression: x86-32 troubles (with timers?)

2018-07-05 Thread Meelis Roos
> > I tried 4.18.0-rc1-00023-g9ffc59d57228 and now
> > 4.18.0-rc3-00113-gfc36def997cf on a 32-bit server and then some other
> > 32-bit machines, and got half-failed bootup - kernel and userspace come
> > up but some services fail to start, including network and
> > systemd-journald:
> >
> > systemd-journald[85]: Assertion 'clock_gettime(map_clock_id(clock_id), &ts) 
> > == 0' failed at ../src/basic/time-util.c:53, function now(). Aborting.
> >
> > I then tried multiple other machines. All x86-64 machines seem
> > unaffected, some x86-32 machines are affected (Athlon with AMD750
> > chipset, Fujitsu RX100-S2 with P4-3.4, and P4 with Intel 865 chipset),
> > some very similar x86-32 machines are unaffected. I have different
> > customized kernel configuration on them, so far I have not pinpointed
> > any configuration option to be at fault.
> >
> > All machines run Debian unstable.
> >
> > 4.17.0 was working fine.
> >
> > Will continue with bisecting between 4.17.0 and
> > 4.18.0-rc1-00023-g9ffc59d57228.
> 
> That does sound like it is related to my patches indeed. If you are not
> yet done bisecting, please checkout commit e27c49291a7f ("x86: Convert
> x86_platform_ops to timespec64") before you try anything else, that
> one is the top of the branch with my changes. If that fails, the bisection
> will be much quicker.

This commit was fine. So it's likely something else.

-- 
Meelis Roos (mr...@linux.ee)


4.18-rc* regression: x86-32 troubles (with timers?)

2018-07-04 Thread Meelis Roos
I tried 4.18.0-rc1-00023-g9ffc59d57228 and now 
4.18.0-rc3-00113-gfc36def997cf on a 32-bit server and then some other 
32-bit machines, and got half-failed bootup - kernel and userspace come 
up but some services fail to start, including network and 
systemd-journald:

systemd-journald[85]: Assertion 'clock_gettime(map_clock_id(clock_id), &ts) == 
0' failed at ../src/basic/time-util.c:53, function now(). Aborting.
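
A possible way to narrow this down on an affected machine is to call 
clock_gettime() for a few clock IDs directly and see which ones return 
an error. The sketch below is only an assumption about what might be 
worth probing; the list of clocks is mine and is not taken from the 
systemd source.

```python
import time

# Clock names to try. Some are Linux-specific and may be missing from the
# 'time' module elsewhere, hence the getattr() lookup below.
CLOCK_NAMES = ["CLOCK_REALTIME", "CLOCK_MONOTONIC", "CLOCK_BOOTTIME"]


def probe_clocks(names=CLOCK_NAMES):
    """Call clock_gettime() for each named clock; record (ok, value-or-error)."""
    results = {}
    for name in names:
        clock_id = getattr(time, name, None)
        if clock_id is None:
            results[name] = (False, "not available in this Python build")
            continue
        try:
            results[name] = (True, time.clock_gettime(clock_id))
        except OSError as exc:
            results[name] = (False, str(exc))
    return results


if __name__ == "__main__":
    for name, (ok, detail) in probe_clocks().items():
        print(f"{name}: {'ok' if ok else 'FAILED'} ({detail})")
```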

I then tried multiple other machines. All x86-64 machines seem 
unaffected, some x86-32 machines are affected (Athlon with AMD750 
chipset, Fujitsu RX100-S2 with P4-3.4, and P4 with Intel 865 chipset), 
some very similar x86-32 machines are unaffected. The machines have 
different customized kernel configurations, and so far I have not 
pinpointed any configuration option as being at fault.
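
One way to hunt for a responsible option is to diff the .config of an 
affected machine against that of an unaffected one. The sketch below 
assumes the usual "CONFIG_X=value" and "# CONFIG_X is not set" line 
formats; the function names and the sample fragments are illustrative 
only.

```python
import re

_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Map each option to its value; options marked 'is not set' map to 'n'."""
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        m = _SET.match(line)
        if m:
            opts[m.group(1)] = m.group(2)
            continue
        m = _UNSET.match(line)
        if m:
            opts[m.group(1)] = "n"
    return opts


def diff_configs(a, b):
    """Return {option: (value_in_a, value_in_b)} for options that differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


if __name__ == "__main__":
    # Made-up fragments, not the real configs of the machines above.
    affected = "CONFIG_X86_32=y\nCONFIG_HIGH_RES_TIMERS=y\n# CONFIG_NO_HZ is not set\n"
    unaffected = "CONFIG_X86_32=y\n# CONFIG_HIGH_RES_TIMERS is not set\nCONFIG_NO_HZ=y\n"
    for opt, (va, vb) in diff_configs(parse_config(affected),
                                      parse_config(unaffected)).items():
        print(f"{opt}: {va} -> {vb}")
```

An option missing from one file shows up as None on that side, which 
keeps options added or removed between kernel versions visible.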

All machines run Debian unstable.

4.17.0 was working fine.

Will continue with bisecting between 4.17.0 and 
4.18.0-rc1-00023-g9ffc59d57228.


[0.00] Linux version 4.18.0-rc3-00113-gfc36def997cf (mroos@rx100s2) 
(gcc version 7.3.0 (Debian 7.3.0-23)) #27 SMP Wed Jul 4 13:06:34 EEST 2018
[0.00] x86/fpu: x87 FPU will use FXSAVE
[0.00] BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009afff] usable
[0.00] BIOS-e820: [mem 0x0009b000-0x0009] reserved
[0.00] BIOS-e820: [mem 0x000ca000-0x000cbfff] reserved
[0.00] BIOS-e820: [mem 0x000dc000-0x000f] reserved
[0.00] BIOS-e820: [mem 0x0010-0x3ff6] usable
[0.00] BIOS-e820: [mem 0x3ff7-0x3ff79fff] ACPI data
[0.00] BIOS-e820: [mem 0x3ff7a000-0x3ff7] ACPI NVS
[0.00] BIOS-e820: [mem 0x3ff8-0x3fff] reserved
[0.00] BIOS-e820: [mem 0xfec0-0xfec0] reserved
[0.00] BIOS-e820: [mem 0xfee0-0xfee00fff] reserved
[0.00] BIOS-e820: [mem 0xff80-0xffbf] reserved
[0.00] BIOS-e820: [mem 0xfc00-0x] reserved
[0.00] Notice: NX (Execute Disable) protection missing in CPU!
[0.00] SMBIOS 2.3 present.
[0.00] DMI: FUJITSU SIEMENS PRIMERGY RX100S2/D1571/M71IXG, BIOS 6.0 
Rev. C0F2.1571 04/27/2005
[0.00] e820: update [mem 0x-0x0fff] usable ==> reserved
[0.00] e820: remove [mem 0x000a-0x000f] usable
[0.00] last_pfn = 0x3ff70 max_arch_pfn = 0x10
[0.00] MTRR default type: uncachable
[0.00] MTRR fixed ranges enabled:
[0.00]   0-9 write-back
[0.00]   A-B uncachable
[0.00]   C-C7FFF write-protect
[0.00]   C8000-D uncachable
[0.00]   E-F write-protect
[0.00] MTRR variable ranges enabled:
[0.00]   0 base 0 mask FC000 write-back
[0.00]   1 base 03FF8 mask 8 uncachable
[0.00]   2 disabled
[0.00]   3 disabled
[0.00]   4 disabled
[0.00]   5 disabled
[0.00]   6 disabled
[0.00]   7 disabled
[0.00] x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WC  UC- UC  
[0.00] total RAM covered: 1023M
[0.00] Found optimal setting for mtrr clean up
[0.00]  gran_size: 64K  chunk_size: 1M  num_reg: 2  lose cover RAM: 
0G
[0.00] found SMP MP-table at [mem 0x000f6680-0x000f668f] mapped at 
[(ptrval)]
[0.00] initial memory mapped: [mem 0x-0x04ff]
[0.00] Base memory trampoline at [(ptrval)] 97000 size 16384
[0.00] BRK [0x04d97000, 0x04d97fff] PGTABLE
[0.00] ACPI: Early table checksum verification disabled
[0.00] ACPI: RSDP 0x000F66B0 14 (v00 PTLTD )
[0.00] ACPI: RSDT 0x3FF75B79 38 (v01 PTLTDRSDT   
0604  LTP )
[0.00] ACPI: FACP 0x3FF79E69 74 (v01 INTEL  CANTWOOD 
0604 PTL  0003)
[0.00] ACPI: DSDT 0x3FF75BB1 0042B8 (v01 INTEL  CANTWOOD 
0604 MSFT 010B)
[0.00] ACPI: FACS 0x3FF7AFC0 40
[0.00] ACPI: SPCR 0x3FF79EDD 50 (v01 PTLTD  $UCRTBL$ 
0604 PTL  0001)
[0.00] ACPI: APIC 0x3FF79F2D 74 (v01 PTLTD  ? APIC   
0604  LTP )
[0.00] ACPI: BOOT 0x3FF79FA1 28 (v01 PTLTD  $SBFTBL$ 
0604  LTP 0001)
[0.00] ACPI: SSDT 0x3FF79FC9 37 (v01 PTLTD  ACPIHT   
0604  LTP 0001)
[0.00] ACPI: Local APIC address 0xfee0
[0.00] 135MB HIGHMEM available.
[0.00] 887MB LOWMEM available.
[0.00]   mapped low ram: 0 - 377fe000
[0.00]   low ram: 0 - 377fe000
[0.00] tsc: Fast TSC calibration using PIT
[0.00] BRK [0x04d98000, 0x04d98fff] PGTABLE
[0.00] Zone ranges:
[0.00]   DMA  [mem 0x1000-0x00ff]
[0.00]

4.18-rc1: Bad or missing .orc_unwind table. Disabling unwinder.

2018-06-20 Thread Meelis Roos
HP Proliant DL360 G6 displays the following on bootup with 
4.18.0-rc1-00023-g9ffc59d57228 (4.17 did not display this warning):

[0.00] WARNING: WARNING: Bad or missing .orc_unwind table.  Disabling 
unwinder.

Debian unstable, gcc 7.3.0-21, config below.

#
# Automatically generated file; DO NOT EDIT.
# Linux/x86 4.18.0-rc1 Kernel Configuration
#

#
# Compiler: gcc (Debian 7.3.0-21) 7.3.0
#
CONFIG_64BIT=y
CONFIG_X86_64=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_OUTPUT_FORMAT="elf64-x86-64"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_MMU=y
CONFIG_ARCH_MMAP_RND_BITS_MIN=28
CONFIG_ARCH_MMAP_RND_BITS_MAX=32
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=8
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=16
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_ARCH_HAS_FILTER_PGPROT=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_ZONE_DMA32=y
CONFIG_AUDIT_ARCH=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_HAVE_INTEL_TXT=y
CONFIG_X86_64_SMP=y
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_PGTABLE_LEVELS=4
CONFIG_CC_IS_GCC=y
CONFIG_GCC_VERSION=70300
CONFIG_CLANG_VERSION=0
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y
CONFIG_THREAD_INFO_IN_TASK=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
# CONFIG_KERNEL_GZIP is not set
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
CONFIG_KERNEL_XZ=y
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
CONFIG_CROSS_MEMORY_ATTACH=y
# CONFIG_USELIB is not set
CONFIG_AUDIT=y
CONFIG_HAVE_ARCH_AUDITSYSCALL=y
CONFIG_AUDITSYSCALL=y
CONFIG_AUDIT_WATCH=y
CONFIG_AUDIT_TREE=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_GENERIC_IRQ_MIGRATION=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_DOMAIN_HIERARCHY=y
CONFIG_GENERIC_MSI_IRQ=y
CONFIG_GENERIC_MSI_IRQ_DOMAIN=y
CONFIG_GENERIC_IRQ_MATRIX_ALLOCATOR=y
CONFIG_GENERIC_IRQ_RESERVATION_MODE=y
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
# CONFIG_GENERIC_IRQ_DEBUGFS is not set
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_DATA=y
CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ_FULL is not set
# CONFIG_NO_HZ is not set
CONFIG_HIGH_RES_TIMERS=y

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
# CONFIG_VIRT_CPU_ACCOUNTING_GEN is not set
# CONFIG_IRQ_TIME_ACCOUNTING is not set
# CONFIG_BSD_PROCESS_ACCT is not set
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
CONFIG_TASK_XACCT=y
CONFIG_TASK_IO_ACCOUNTING=y
# CONFIG_CPU_ISOLATION is not set

#
# RCU Subsystem
#
CONFIG_TREE_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_SRCU=y
CONFIG_TREE_SRCU=y
CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_NEED_SEGCBLIST=y
# CONFIG_IKCONFIG is not set
CONFIG_LOG_BUF_SHIFT=17
CONFIG_LOG_CPU_MAX_BUF_SHIFT=12
CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=13
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH=y
CONFIG_ARCH_SUPPORTS_INT128=y
CONFIG_NUMA_BALANCING=y
CONFIG_NUMA_BALANCING_DEFAULT_ENABLED=y
CONFIG_CGROUPS=y
CONFIG_PAGE_COUNTER=y
CONFIG_MEMCG=y
CONFIG_MEMCG_SWAP=y
# CONFIG_MEMCG_SWAP_ENABLED is not set
CONFIG_BLK_CGROUP=y
# CONFIG_DEBUG_BLK_CGROUP is not set
CONFIG_CGROUP_WRITEBACK=y
CONFIG_CGROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
CONFIG_CFS_BANDWIDTH=y
# CONFIG_RT_GROUP_SCHED is not set
CONFIG_CGROUP_PIDS=y
# CONFIG_CGROUP_RDMA is not set
CONFIG_CGROUP_FREEZER=y
CONFIG_CGROUP_HUGETLB=y
CONFIG_CPUSETS=y
CONFIG_PROC_PID_CPUSET=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CGROUP_CPUACCT=y
CONFIG_CGROUP_PERF=y
CONFIG_CGROUP_BPF=y
# CONFIG_CGROUP_DEBUG is not set
CONFIG_SOCK_CGROUP_DATA=y
CONFIG_NAMESPACES=y
CONFIG_UTS_NS=y
CONFIG_IPC_NS=y
CONFIG_USER_NS=y
CONFIG_PID_NS=y
CONFIG_NET_NS=y
# CONFIG_SCHED_AUTOGROUP is not set
# CONFIG_SYSFS_DEPRECATED is not set
# CONFIG_RELAY is n

iomi-si UBSAN warning and NULL pointer dereference

2018-06-19 Thread Meelis Roos
: 0006 R12: c0181aa8
[7.611822] R13:  R14: 8e8e3b2df240 R15: c0181260
[7.611894] FS:  7fef3a80b8c0() GS:8e8e3dd0() 
knlGS:
[7.611988] CS:  0010 DS:  ES:  CR0: 80050033
[7.612067] CR2:  CR3: 3ab1a000 CR4: 000006e0


-- 
Meelis Roos (mr...@linux.ee)


4.17.0-10146-gf0dc7f9c6dd9: hw csum failure on powerpc+sungem

2018-06-11 Thread Meelis Roos
I am seeing this on a PowerMac G4 with the sungem Ethernet driver. 4.17 
was OK, 4.17.0-10146-gf0dc7f9c6dd9 is problematic.

[  140.518664] eth0: hw csum failure
[  140.518699] CPU: 0 PID: 1237 Comm: postconf Not tainted 
4.17.0-10146-gf0dc7f9c6dd9 #83
[  140.518707] Call Trace:
[  140.518734] [effefd90] [c03d6db8] __skb_checksum_complete+0xd8/0xdc 
(unreliable)
[  140.518759] [effefdb0] [c04c1284] icmpv6_rcv+0x248/0x4ec
[  140.518775] [effefdd0] [c049a448] ip6_input_finish.constprop.0+0x11c/0x5f4
[  140.518786] [effefe10] [c049b1c0] ip6_mc_input+0xcc/0x100
[  140.518807] [effefe20] [c03e110c] __netif_receive_skb_core+0x310/0x944
[  140.518820] [effefe70] [c03e76ec] napi_gro_receive+0xd0/0xe8
[  140.518845] [effefe80] [f3e1f66c] gem_poll+0x618/0x1274 [sungem]
[  140.518856] [effeff30] [c03e6f0c] net_rx_action+0x198/0x374
[  140.518872] [effeff90] [c0501a88] __do_softirq+0x120/0x278
[  140.518890] [effeffe0] [c0036188] irq_exit+0xd8/0xdc
[  140.518908] [effefff0] [c000f478] call_do_irq+0x24/0x3c
[  140.518925] [d05a5d30] [c0007120] do_IRQ+0x74/0xf0
[  140.518941] [d05a5d50] [c0012474] ret_from_except+0x0/0x14
[  140.518960] --- interrupt: 501 at copy_page+0x40/0x90
   LR = copy_user_page+0x18/0x30
[  140.518973] [d05a5e10] [d058cd80] 0xd058cd80 (unreliable)
[  140.518989] [d05a5e20] [c00fa2bc] wp_page_copy+0xec/0x654
[  140.519002] [d05a5e60] [c00fd3a4] do_wp_page+0xa8/0x5b4
[  140.519013] [d05a5e90] [c00fe934] handle_mm_fault+0x564/0xa84
[  140.519025] [d05a5f00] [c0016230] do_page_fault+0x1bc/0x7e8
[  140.519037] [d05a5f40] [c0012300] handle_page_fault+0x14/0x40
[  140.519048] --- interrupt: 301 at 0xb78b6864
   LR = 0xb78b6c54


-- 
Meelis Roos (mr...@linux.ee)


Re: 85f1abe001 ("kthread, sched/wait: Fix kthread_parkme() .."): WARNING: CPU: 0 PID: 1 at kernel/kthread.c:486 kthread_park

2018-05-24 Thread Meelis Roos
I had the same kthread_parkme warning on many machines I tested with 
4.17.0-rc6-00158-gbee797529d7c (x86, amd64, sparc, parisc, alpha).

Your patch https://lkml.org/lkml/2018/5/4/212 fixed the problem for me.

Sorry for the off-thread response, I found your mail only on the web.

-- 
Meelis Roos (mr...@linux.ee)


Re: [PATCH v1 0/4] sparc/PCI: VGA resource and other fixes

2018-05-24 Thread Meelis Roos
> [+cc sparclinux, sorry I missed this first time around]

> >   sparc/PCI: Use dev_printk() when possible

This patch causes compile errors for me:

  CC  arch/sparc/kernel/pci.o
In file included from ./include/linux/pci.h:31:0,
 from arch/sparc/kernel/pci.c:18:
arch/sparc/kernel/pci.c: In function ‘pcibios_enable_device’:
arch/sparc/kernel/pci.c:754:53: error: ‘old_cmd’ undeclared (first use in this 
function)
   pci_info(dev, "enabling device (%04x -> %04x)\n", old_cmd, cmd);
 ^
./include/linux/device.h:1384:58: note: in definition of macro ‘dev_info’
 #define dev_info(dev, fmt, arg...) _dev_info(dev, fmt, ##arg)
  ^
arch/sparc/kernel/pci.c:754:3: note: in expansion of macro ‘pci_info’
   pci_info(dev, "enabling device (%04x -> %04x)\n", old_cmd, cmd);
   ^
arch/sparc/kernel/pci.c:754:53: note: each undeclared identifier is reported 
only once for each function it appears in
   pci_info(dev, "enabling device (%04x -> %04x)\n", old_cmd, cmd);
 ^
./include/linux/device.h:1384:58: note: in definition of macro ‘dev_info’
 #define dev_info(dev, fmt, arg...) _dev_info(dev, fmt, ##arg)
  ^
arch/sparc/kernel/pci.c:754:3: note: in expansion of macro ‘pci_info’
   pci_info(dev, "enabling device (%04x -> %04x)\n", old_cmd, cmd);
   ^
scripts/Makefile.build:312: recipe for target 'arch/sparc/kernel/pci.o' failed


-- 
Meelis Roos (mr...@linux.ee)


Re: 4.16-rc2+git: pata_serverworks: hanging ata detection thread on HP DL380G3

2018-04-03 Thread Meelis Roos
> Hello, Meelis.
> 
> Can you please verify whether the following patch fixes the problem?
> 
> Thanks.
> 
> Subject: blk-mq: Directly schedule q->timeout_work when aborting a request

Yes, this patch on top of 4.16 fixes it for me. dmesg shows CD detection 
works fast now:

[2.278383] libata version 3.00 loaded.
[2.292212] scsi host1: pata_serverworks
[2.292618] scsi host2: pata_serverworks
[2.292844] ata1: PATA max UDMA/100 cmd 0x1f0 ctl 0x3f6 bmdma 0x2000 irq 14
[2.292973] ata2: PATA max UDMA/100 cmd 0x170 ctl 0x376 bmdma 0x2008 irq 15
[...]
[2.578720] ata1.00: ATAPI: COMPAQ  CD-ROM SN-124, N104, max PIO4
[2.583705] ata1.00: configured for PIO4
[2.584526] scsi 1:0:0:0: CD-ROMCOMPAQ   CD-ROM SN-124N104 
PQ: 0 ANSI: 5
[2.812963] scsi 1:0:0:0: Attached scsi generic sg3 type 5
[...]
[3.179602] sr 1:0:0:0: [sr0] scsi3-mmc drive: 24x/24x cd/rw xa/form2 cdda 
tray
[3.179826] cdrom: Uniform CD-ROM driver Revision: 3.20
[3.180198] sr 1:0:0:0: Attached scsi CD-ROM sr0

config at the last step of bisection:

#
# Automatically generated file; DO NOT EDIT.
# Linux/x86 4.15.0-rc4 Kernel Configuration
#
# CONFIG_64BIT is not set
CONFIG_X86_32=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_OUTPUT_FORMAT="elf32-i386"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/i386_defconfig"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_MMU=y
CONFIG_ARCH_MMAP_RND_BITS_MIN=8
CONFIG_ARCH_MMAP_RND_BITS_MAX=16
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=8
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=16
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_X86_32_SMP=y
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_PGTABLE_LEVELS=3
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y
CONFIG_THREAD_INFO_IN_TASK=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
# CONFIG_KERNEL_GZIP is not set
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
CONFIG_KERNEL_XZ=y
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
CONFIG_CROSS_MEMORY_ATTACH=y
# CONFIG_USELIB is not set
CONFIG_AUDIT=y
CONFIG_HAVE_ARCH_AUDITSYSCALL=y
CONFIG_AUDITSYSCALL=y
CONFIG_AUDIT_WATCH=y
CONFIG_AUDIT_TREE=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_GENERIC_IRQ_MIGRATION=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_DOMAIN_HIERARCHY=y
CONFIG_GENERIC_MSI_IRQ=y
CONFIG_GENERIC_MSI_IRQ_DOMAIN=y
CONFIG_GENERIC_IRQ_MATRIX_ALLOCATOR=y
CONFIG_GENERIC_IRQ_RESERVATION_MODE=y
# CONFIG_IRQ_DOMAIN_DEBUG is not set
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
# CONFIG_GENERIC_IRQ_DEBUGFS is not set
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_DATA=y
CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ is not set
CONFIG_HIGH_RES_TIMERS=y

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
# CONFIG_IRQ_TIME_ACCOUNTING is not set
# CONFIG_BSD_PROCESS_ACCT is not set
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
CONFIG_TASK_XACCT=y
CONFIG_TASK_IO_ACCOUNTING=y
# CONFIG_CPU_ISOLATION is not set

#
# RCU Subsystem
#
CONFIG_TREE_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_SRCU=y
CONFIG_TREE_SRCU=y
# CONFIG_TASKS_RCU is not set
CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_NEED_SEGCBLIST=y
# CONFIG_BUILD_BIN2C is not set
# CONFIG_IKCONFIG is not set
CONFIG_LOG_BUF_SHIFT=17
CONFIG_LOG_CPU_MAX_BUF_SHIFT=12
CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=13
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH=y
CONFIG_CGROUPS=y
# CONFIG_MEMCG is not set
# CONFIG_BLK_CGROUP is not set
# CONFIG_CGROUP_SCHED is not set
# CONFIG_CGROUP_PIDS is not set
# CONFIG_CGROUP_RDMA is not set
# CONFIG

Re: 4.16-rc2+git: pata_serverworks: hanging ata detection thread on HP DL380G3

2018-03-30 Thread Meelis Roos
Added CC-s, start of the thread is at 
https://lkml.org/lkml/2018/2/26/165

> > > 4.16 git bootup on HP Proliant DL380 G3 pauses for a a minute or two and 
> > > then continues with "blocked for more than 120 seconds" message with 
> > > libata detection functions in ther stack - 
> > > async_synchronize_cookie_domain() as the last. It seems to happen during 
> > > IDE CD-ROM detection (detected before but registered as sr0 after the 
> > > warning). After detection, the eject button on the drive did not work.
> > > 
> > > 
> > > pata_serverworks is the libata driver in use.
> 
> There were no changes to pata_serverworks since 2014 and libata changes
> in v4.16 look obviously correct..
> 
> > This is still the same in 4.16.0-rc7-00062-g0b412605ef5f.
> 
> Any chance that you could bisect this issue?

Bisected to the following commit:

358f70da49d77c43f2ca11b5da584213b2add29c is the first bad commit
commit 358f70da49d77c43f2ca11b5da584213b2add29c
Author: Tejun Heo 
Date:   Tue Jan 9 08:29:50 2018 -0800

blk-mq: make blk_abort_request() trigger timeout path

With issue/complete and timeout paths now using the generation number
and state based synchronization, blk_abort_request() is the only one
which depends on REQ_ATOM_COMPLETE for arbitrating completion.

There's no reason for blk_abort_request() to be a completely separate
path.  This patch makes blk_abort_request() piggyback on the timeout
path instead of trying to terminate the request directly.

This removes the last dependency on REQ_ATOM_COMPLETE in blk-mq.

Note that this makes blk_abort_request() asynchronous - it initiates
abortion but the actual termination will happen after a short while,
even when the caller owns the request.  AFAICS, SCSI and ATA should be
fine with that and I think mtip32xx and dasd should be safe but not
completely sure.  It'd be great if people who know the drivers take a
look.

v2: - Add comment explaining the lack of synchronization around
  ->deadline update as requested by Bart.

Signed-off-by: Tejun Heo 
Cc: Asai Thambi SP 
Cc: Stefan Haberland 
Cc: Jan Hoeppner 
Cc: Bart Van Assche 
Signed-off-by: Jens Axboe 

:04 04 b5c8c2fd69850021865071f9641d54ab4fd20a15 
e2dbd2a15a6baeec1332cc1416e51d537ff5040a M  block
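
For readers unfamiliar with bisection, the idea behind it is a binary 
search over the commit history between a known-good and a known-bad 
kernel. The following is a simplified sketch over a linear list of 
commits with a hypothetical is_bad() test; the real tool also has to 
deal with merges and skipped commits.

```python
def first_bad(commits, is_bad):
    """Find the first bad commit by binary search.

    'commits' is ordered oldest to newest, commits[0] is known good and
    commits[-1] is known bad. Returns (first_bad_commit, builds_tested).
    """
    lo, hi = 0, len(commits) - 1  # lo: last known good, hi: first known bad
    tested = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tested += 1
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi], tested


if __name__ == "__main__":
    # Stand-in history: integers instead of commit hashes, bad from 357 onwards.
    history = list(range(1000))
    commit, tested = first_bad(history, lambda c: c >= 357)
    print(commit, tested)
```

Each step halves the remaining range, so about a thousand commits need 
roughly ten builds and boots.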


-- 
Meelis Roos (mr...@linux.ee)


Re: 4.16-rc2+git: pata_serverworks: hanging ata detection thread on HP DL380G3

2018-03-29 Thread Meelis Roos
> On Thursday, March 29, 2018 11:54:09 AM Meelis Roos wrote:
> > > 4.16 git bootup on HP Proliant DL380 G3 pauses for a a minute or two and 
> > > then continues with "blocked for more than 120 seconds" message with 
> > > libata detection functions in ther stack - 
> > > async_synchronize_cookie_domain() as the last. It seems to happen during 
> > > IDE CD-ROM detection (detected before but registered as sr0 after the 
> > > warning). After detection, the eject button on the drive did not work.
> > > 
> > > 
> > > pata_serverworks is the libata driver in use.
> 
> There were no changes to pata_serverworks since 2014 and libata changes
> in v4.16 look obviously correct..
> 
> > This is still the same in 4.16.0-rc7-00062-g0b412605ef5f.
> 
> Any chance that you could bisect this issue?

Yes, will do.

-- 
Meelis Roos (mr...@linux.ee)


Re: 4.16-rc2+git: pata_serverworks: hanging ata detection thread on HP DL380G3

2018-03-29 Thread Meelis Roos
> 4.16 git bootup on HP Proliant DL380 G3 pauses for a a minute or two and 
> then continues with "blocked for more than 120 seconds" message with 
> libata detection functions in ther stack - 
> async_synchronize_cookie_domain() as the last. It seems to happen during 
> IDE CD-ROM detection (detected before but registered as sr0 after the 
> warning). After detection, the eject button on the drive did not work.
> 
> 
> pata_serverworks is the libata driver in use.

This is still the same in 4.16.0-rc7-00062-g0b412605ef5f.

> [  242.652061] INFO: task kworker/u8:4:613 blocked for more than 120 seconds.
> [  242.652230]   Not tainted 4.16.0-rc2-00374-g3664ce2d9309 #36
> [  242.654171] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
> this message.
> [  242.654386] kworker/u8:4D0   613  2 0x8000
> [  242.654517] Workqueue: events_unbound async_run_entry_fn
> [  242.654637] Call Trace:
> [  242.654759]  __schedule+0x1bc/0x8d3
> [  242.654877]  ? set_next_entity+0xc1/0x39a
> [  242.654994]  schedule+0x28/0xb2
> [  242.655096]  async_synchronize_cookie_domain+0xac/0xf4
> [  242.655217]  ? __clear_rsb+0x1d/0x32
> [  242.655334]  ? wait_woken+0xb7/0xb7
> [  242.655449]  async_synchronize_cookie+0xd/0x15
> [  242.655583]  async_port_probe+0x57/0x87 [libata]
> [  242.655703]  ? __clear_rsb+0xd/0x32
> [  242.655825]  ? ata_port_probe+0x52/0x52 [libata]
> [  242.655945]  async_run_entry_fn+0x49/0x1f2
> [  242.656075]  process_one_work+0x20a/0x568
> [  242.656191]  worker_thread+0x4c/0x631
> [  242.656312]  kthread+0x140/0x1e4
> [  242.656428]  ? process_one_work+0x568/0x568
> [  242.656547]  ? kthread_create_on_node+0x23/0x23
> [  242.656667]  ret_from_fork+0x2e/0x38
> [  242.656793] INFO: task systemd-udevd:803 blocked for more than 120 seconds.
> [  242.656920]   Not tainted 4.16.0-rc2-00374-g3664ce2d9309 #36
> [  242.657039] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
> this message.
> [  242.657257] systemd-udevd   D0   803758 0x8004
> [  242.657379] Call Trace:
> [  242.657495]  __schedule+0x1bc/0x8d3
> [  242.657614]  ? kfree_skbmem+0x65/0x85
> [  242.657730]  schedule+0x28/0xb2
> [  242.657846]  async_synchronize_cookie_domain+0xac/0xf4
> [  242.657968]  ? wait_woken+0xb7/0xb7
> [  242.658082]  async_synchronize_full+0x14/0x16
> [  242.658206]  do_init_module+0x10f/0x24b
> [  242.658323]  load_module+0x29c9/0x3865
> [  242.658443]  ? kernel_read+0x50/0xa7
> [  242.658558]  SyS_finit_module+0x78/0x8d
> [  242.658681]  do_fast_syscall_32+0xc7/0x323
> [  242.658800]  entry_SYSENTER_32+0x4e/0x7c
> [  242.658916] EIP: 0xb7f0cad5
> [  242.659030] EFLAGS: 0292 CPU: 0
> [  242.659145] EAX: ffda EBX: 000d ECX: b7d03bdd EDX: 
> [  242.659265] ESI: 011ba740 EDI: 011bdb50 EBP:  ESP: bfcf1bcc
> [  242.659388]  DS: 007b ES: 007b FS:  GS: 0033 SS: 007b
> [  244.422337] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
> [  246.012767] tg3 :02:01.0 eth0: Link is up at 100 Mbps, full duplex
> [  246.012875] tg3 :02:01.0 eth0: Flow control is off for TX and off for 
> RX
> [  246.012990] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> [  316.432903] scsi 1:0:0:0: Attached scsi generic sg3 type 5
> [  316.667528] sr 1:0:0:0: [sr0] scsi3-mmc drive: 24x/24x cd/rw xa/form2 cdda 
> tray
> [  316.667571] cdrom: Uniform CD-ROM driver Revision: 3.20
> [  316.667837] sr 1:0:0:0: Attached scsi CD-ROM sr0
> [ 4097.814125] random: crng init done
> 
> 

-- 
Meelis Roos (mr...@linux.ee)


Re: 4.15-rc9 new insecure W+X mapping warning

2018-03-22 Thread Meelis Roos
> > This is Intel SE7520JR22S mainboard with 2 64-bit P4 xeons. Earlier 
> > kernels up to 4.14 have had W+X checking on but found nothing. Now I 
> > tried 4.15.0-rc9-00023-g1f07476ec143 and it gives a new W+X warning. 
> 
> This still happens in 4.15 and 4.16-rc2+.
> 
> What can I do to help resolving it?

Below is kernel_page_tables from debugfs.

> > [   10.880663] [ cut here ]
> > [   10.880755] x86/mm: Found insecure W+X mapping at address 
> > d051fb08/0x8800
> > [   10.880900] WARNING: CPU: 2 PID: 1 at arch/x86/mm/dump_pagetables.c:266 
> > note_page+0x718/0xb89
> > [   10.881035] Modules linked in:
> > [   10.881128] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 
> > 4.15.0-rc9-00023-g1f07476ec143 #104
> > [   10.881264] Hardware name: Intel  
> > /SE7520JR22S, BIOS SE7520JR22.86B.P.10.00.0087.120820051348 12/08/2005
> > [   10.881405] RIP: 0010:note_page+0x718/0xb89
> > [   10.881491] RSP: :c9013e48 EFLAGS: 00010296
> > [   10.881578] RAX: 0051 RBX: c9013ec8 RCX: 
> > 8164f938
> > [   10.881666] RDX: 0001 RSI: 0092 RDI: 
> > 82b468cc
> > [   10.881756] RBP: 0061 R08: 0177 R09: 
> > 01d7
> > [   10.881844] R10: 0720072007200720 R11: 0720072007200720 R12: 
> > 
> > [   10.881932] R13:  R14: 0001 R15: 
> > 88099000
> > [   10.882022] FS:  () GS:88003fc8() 
> > knlGS:
> > [   10.882156] CS:  0010 DS:  ES:  CR0: 80050033
> > [   10.882243] CR2:  CR3: 0200a000 CR4: 
> > 06e0
> > [   10.882331] Call Trace:
> > [   10.882423]  ptdump_walk_pgd_level_core+0x367/0x3a5
> > [   10.882511]  ptdump_walk_pgd_level_checkwx+0x10/0x3e
> > [   10.882602]  kernel_init+0x2e/0x10f
> > [   10.882688]  ? rest_init+0xb9/0xb9
> > [   10.882775]  ret_from_fork+0x35/0x40
> > [   10.882861] Code: fb ff ff 41 f7 c7 00 10 00 00 0f 85 e2 fe ff ff e9 36 
> > fd ff ff c6 05 7d 45 6f 01 01 48 89 f2 48 c7 c7 08 5b ee 81 e8 4b d6 00 00 
> > <0f> ff 48 8b 73 10 e9 bc f9 ff ff 4d 85 ed 0f 84 b9 01 00 00 41 
> > [   10.883103] ---[ end trace bc3e2cf1a1adfa39 ]---
> > [   10.896336] x86/mm: Checked W+X mappings: FAILED, 266243 W+X pages found.
> > [   10.896430] x86/mm: Checking user space page tables
> > [   10.909522] x86/mm: Checked W+X mappings: FAILED, 56 W+X pages found.

---[ User Space ]---
0x-0x800016777088T  
 pgd
---[ Kernel Space ]---
0x8000-0x8800   8T  
 pgd
---[ Low Kernel Mapping ]---
0x8800-0x88099000 612K RW GLB x 
 pte
0x88099000-0x8809b000   8K ro GLB x 
 pte
0x8809b000-0x88201428K RW GLB x 
 pte
0x8820-0x88000100  14M RW PSE GLB x 
 pmd
0x88000100-0x88000180   8M ro PSE GLB x 
 pmd
0x88000180-0x880001892000 584K ro GLB x 
 pte
0x880001892000-0x880001a01464K RW GLB x 
 pte
0x880001a0-0x880001b520001352K ro GLB x 
 pte
0x880001b52000-0x880001c0 696K RW GLB x 
 pte
0x880001c0-0x8800dfe03554M RW PSE GLB x 
 pmd
0x8800dfe0-0x8800dffe1920K RW GLB x 
 pte
0x8800dffe-0x8800e000 128K  
 pte
0x8800e000-0x8801 512M  
 pmd
0x8801-0x88030ec08428M RW PSE GLB x 
 pmd
0x88030ec0-0x88030ec12000  72K RW GLB x 
 pte
0x88030ec12000-0x88030ec1a000  32K ro GLB x 
 pte
0x88030ec1a000-0x88030ec28000  56K RW GLB x 
 pte
0x88030ec28000-0x88030ec3  32K ro GLB x 
 pte
0x88030ec3-0x88030ec37000  28K RW GLB x 
 pte
0x88030ec37000-0x88030ec4c000  84K ro GLB x 
 pte
0x88030ec4c000-0x88030ec5  16K RW GLB x 
 pte
0x88030ec5-0x88030ec54000  16K ro GLB x 
 pte
0x88030ec54000-0x88030ec82000 184K RW GLB x 
 pte
0x88030ec82000-0x88030ec84000   8K ro GLB x 
 pte
0x88030ec84000-0x88030ec92000  56K RW GLB x 
 pte
0x88030ec92000-0x88030ec97000  20K ro GLB x 
 pte
0
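
A dump like the one above can be scanned for writable and executable 
ranges mechanically. The sketch below assumes each entry is a single 
line of the form "start-end size flags... level", with "RW" marking 
writable and "x" (as opposed to "NX") marking executable; the sample 
lines use made-up full addresses for illustration and are not copied 
from this machine.

```python
def find_wx(lines):
    """Yield (address_range, size) for entries that are both writable and executable."""
    for line in lines:
        fields = line.split()
        if len(fields) < 3 or not fields[0].startswith("0x"):
            continue  # skip section headers such as '---[ Kernel Space ]---'
        flags = fields[2:]
        if "RW" in flags and "x" in flags:
            yield fields[0], fields[1]


# Illustrative sample in the assumed format.
SAMPLE = [
    "---[ Low Kernel Mapping ]---",
    "0xffff880000000000-0xffff880000099000     612K     RW     GLB x  pte",
    "0xffff880000099000-0xffff88000009b000       8K     ro     GLB x  pte",
    "0xffff88000009b000-0xffff880000200000    1428K     RW     GLB NX pte",
    "0xffff8800e0000000-0xffff880100000000     512M                   pmd",
]

if __name__ == "__main__":
    for addr_range, size in find_wx(SAMPLE):
        print(addr_range, size)
```

On a live system the input would be the lines of the kernel_page_tables 
file in debugfs.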

Re: 4.15-rc9 new insecure W+X mapping warning

2018-03-09 Thread Meelis Roos
> This is Intel SE7520JR22S mainboard with 2 64-bit P4 xeons. Earlier 
> kernels up to 4.14 have had W+X checking on but found nothing. Now I 
> tried 4.15.0-rc9-00023-g1f07476ec143 and it gives a new W+X warning. 

Actually, I was wrong about earlier kernels - I just did not have 
CONFIG_DEBUG_WX turned on before, so earlier kernels did not check it.

Recompiled 4.14 with CONFIG_DEBUG_WX=y and the problem is there. So this 
is not a Linux regression but a peculiarity with the SE7520JR22S, it 
seems.

Is there anything that Linux might be doing wrong?

> [   10.880663] [ cut here ]
> [   10.880755] x86/mm: Found insecure W+X mapping at address 
> d051fb08/0x8800
> [   10.880900] WARNING: CPU: 2 PID: 1 at arch/x86/mm/dump_pagetables.c:266 
> note_page+0x718/0xb89
> [   10.881035] Modules linked in:
> [   10.881128] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 
> 4.15.0-rc9-00023-g1f07476ec143 #104
> [   10.881264] Hardware name: Intel  
> /SE7520JR22S, BIOS SE7520JR22.86B.P.10.00.0087.120820051348 12/08/2005
> [   10.881405] RIP: 0010:note_page+0x718/0xb89
> [   10.881491] RSP: :c9013e48 EFLAGS: 00010296
> [   10.881578] RAX: 0051 RBX: c9013ec8 RCX: 
> 8164f938
> [   10.881666] RDX: 0001 RSI: 0092 RDI: 
> 82b468cc
> [   10.881756] RBP: 0061 R08: 0177 R09: 
> 01d7
> [   10.881844] R10: 0720072007200720 R11: 0720072007200720 R12: 
> 
> [   10.881932] R13:  R14: 0001 R15: 
> 88099000
> [   10.882022] FS:  () GS:88003fc8() 
> knlGS:
> [   10.882156] CS:  0010 DS:  ES:  CR0: 80050033
> [   10.882243] CR2:  CR3: 0200a000 CR4: 
> 06e0
> [   10.882331] Call Trace:
> [   10.882423]  ptdump_walk_pgd_level_core+0x367/0x3a5
> [   10.882511]  ptdump_walk_pgd_level_checkwx+0x10/0x3e
> [   10.882602]  kernel_init+0x2e/0x10f
> [   10.882688]  ? rest_init+0xb9/0xb9
> [   10.882775]  ret_from_fork+0x35/0x40
> [   10.882861] Code: fb ff ff 41 f7 c7 00 10 00 00 0f 85 e2 fe ff ff e9 36 fd 
> ff ff c6 05 7d 45 6f 01 01 48 89 f2 48 c7 c7 08 5b ee 81 e8 4b d6 00 00 <0f> 
> ff 48 8b 73 10 e9 bc f9 ff ff 4d 85 ed 0f 84 b9 01 00 00 41 
> [   10.883103] ---[ end trace bc3e2cf1a1adfa39 ]---
> [   10.896336] x86/mm: Checked W+X mappings: FAILED, 266243 W+X pages found.
> [   10.896430] x86/mm: Checking user space page tables
> [   10.909522] x86/mm: Checked W+X mappings: FAILED, 56 W+X pages found.
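
For scale, the page counts in the two "Checked W+X mappings" lines can be 
converted to sizes. A minimal sketch, assuming the counts are in 4 KiB base 
pages; wx_page_counts and pages_to_mib are hypothetical helper names, not 
anything from the kernel:

```python
import re

PAGE_SIZE_KIB = 4  # assumption: counts are in 4 KiB base pages

# Matches the summary line printed by the W+X check.
WX_RE = re.compile(r"Checked W\+X mappings: FAILED, (\d+) W\+X pages found")


def wx_page_counts(dmesg_text):
    """Return the W+X page counts reported in a dmesg dump, in order."""
    return [int(n) for n in WX_RE.findall(dmesg_text)]


def pages_to_mib(pages, page_size_kib=PAGE_SIZE_KIB):
    """Convert a page count to MiB for the given page size."""
    return pages * page_size_kib / 1024


sample = """\
[   10.896336] x86/mm: Checked W+X mappings: FAILED, 266243 W+X pages found.
[   10.896430] x86/mm: Checking user space page tables
[   10.909522] x86/mm: Checked W+X mappings: FAILED, 56 W+X pages found.
"""

for count in wx_page_counts(sample):
    print(f"{count} pages = {pages_to_mib(count):.1f} MiB")
```

Under that assumption the kernel mapping result is roughly 1 GiB of W+X 
address space, while the user space page tables account for well under 1 MiB.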

-- 
Meelis Roos (mr...@linux.ee)


UBSAN warning in nouveau_bios.c:1528:8

2018-03-01 Thread Meelis Roos
This is the first time I have tried UBSAN on this specific machine 
(onboard nforce 420 with HP BIOS on Nance mainboard). nouveau seems to 
be working fine but gives this UBSAN warning:

[7.953957] nouveau :00:0d.0: NVIDIA C61 (04c000a2)
[7.965101] nouveau :00:0d.0: bios: version 05.61.32.25.02
[7.966141] nouveau :00:0d.0: fb: 128 MiB of unknown memory type
[8.015336] [TTM] Zone  kernel: Available graphics memory: 952564 kiB
[8.015339] [TTM] Initializing pool allocator
[8.015344] [TTM] Initializing DMA pool allocator
[8.015370] nouveau :00:0d.0: DRM: VRAM: 125 MiB
[8.015372] nouveau :00:0d.0: DRM: GART: 512 MiB
[8.015377] nouveau :00:0d.0: DRM: TMDS table version 1.1
[8.015379] nouveau :00:0d.0: DRM: DCB version 3.0
[8.015382] nouveau :00:0d.0: DRM: DCB outp 00: 01000310 0023
[8.015385] nouveau :00:0d.0: DRM: DCB outp 01: 00110204 98830003
[8.015386] 

[8.015423] UBSAN: Undefined behaviour in 
drivers/gpu/drm/nouveau/nouveau_bios.c:1528:8
[8.015455] shift exponent -1 is negative
[8.015482] CPU: 1 PID: 148 Comm: systemd-udevd Not tainted 
4.16.0-rc3-00167-g97ace515f014 #1
[8.015483] Hardware name: HP-Pavilion RT589AA-ABU t3709.uk/Nance, BIOS 5.02 
11/26/2006
[8.015485] Call Trace:
[8.015496]  dump_stack+0x5a/0x99
[8.015500]  ubsan_epilogue+0x9/0x40
[8.015503]  __ubsan_handle_shift_out_of_bounds+0x124/0x160
[8.015506]  ? _dev_info+0x67/0x90
[8.015509]  ? dev_printk_emit+0x49/0x70
[8.015632]  parse_dcb_entry+0x91e/0xd90 [nouveau]
[8.015712]  ? parse_bit_M_tbl_entry+0x150/0x150 [nouveau]
[8.015791]  olddcb_outp_foreach+0x66/0xa0 [nouveau]
[8.015870]  nouveau_bios_init+0x23a/0x2250 [nouveau]
[8.015950]  ? nouveau_ttm_init+0x3a4/0x710 [nouveau]
[8.016029]  nouveau_drm_load+0x229/0xf10 [nouveau]
[8.016033]  ? sysfs_do_create_link_sd+0xa6/0x170
[8.016067]  drm_dev_register+0x1b7/0x330 [drm]
[8.016070]  ? pci_enable_device_flags+0x160/0x1f0
[8.016091]  drm_get_pci_dev+0xee/0x2e0 [drm]
[8.016172]  nouveau_drm_probe+0x1dd/0x270 [nouveau]
[8.016175]  pci_device_probe+0x113/0x1d0
[8.016178]  driver_probe_device+0x375/0x720
[8.016180]  __driver_attach+0xeb/0x150
[8.016181]  ? driver_probe_device+0x720/0x720
[8.016183]  bus_for_each_dev+0x84/0xe0
[8.016186]  bus_add_driver+0x19f/0x340
[8.016188]  driver_register+0x67/0x110
[8.016190]  ? 0xc0cfb000
[8.016193]  do_one_initcall+0x66/0x210
[8.016197]  do_init_module+0xa7/0x2a9
[8.016199]  load_module+0x2548/0x3d30
[8.016202]  ? __symbol_put+0x60/0x60
[8.016205]  ? kernel_read_file+0x21b/0x390
[8.016208]  ? kernel_read_file_from_fd+0x52/0x90
[8.016210]  SYSC_finit_module+0x124/0x150
[8.016212]  do_syscall_64+0x7a/0x1f0
[8.016214]  ? page_fault+0x2f/0x50
[8.016217]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[8.016219] RIP: 0033:0x7f2e47b82e19
[8.016220] RSP: 002b:7ffdcdc157b8 EFLAGS: 0246 ORIG_RAX: 
0139
[8.016223] RAX: ffda RBX: 5638b23c7250 RCX: 7f2e47b82e19
[8.016224] RDX:  RSI: 7f2e4788d0ed RDI: 0019
[8.016225] RBP: 7f2e4788d0ed R08:  R09: 
[8.016226] R10: 0019 R11: 0246 R12: 
[8.016227] R13: 5638b23c2ce0 R14: 0002 R15: 5638b23c7250
[8.016228] 

[8.016299] nouveau :00:0d.0: DRM: DCB conn 00: 
[8.016301] nouveau :00:0d.0: DRM: DCB conn 01: 1131
[8.016302] nouveau :00:0d.0: DRM: DCB conn 02: 0110
[8.016304] nouveau :00:0d.0: DRM: DCB conn 03: 0111
[8.016305] nouveau :00:0d.0: DRM: DCB conn 04: 0113
[8.016626] nouveau :00:0d.0: DRM: Saving VGA fonts
[8.052781] nouveau :00:0d.0: DRM: DCB type 4 not known
[8.052784] nouveau :00:0d.0: DRM: Unknown-1 has no encoders, removing
[8.053728] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[8.053729] [drm] Driver supports precise vblank timestamp query.
[8.055836] nouveau :00:0d.0: DRM: MM: using M2MF for buffer copies
[8.084488] nouveau :00:0d.0: DRM: allocated 1280x1024 fb: 0x9000, bo 
50f4b5d0
[8.084678] fbcon: nouveaufb (fb0) is primary device
[8.193959] Console: switching to colour frame buffer device 160x64
[8.195378] nouveau :00:0d.0: fb0: nouveaufb frame buffer device
[8.212083] [drm] Initialized nouveau 1.3.1 20120801 for :00:0d.0 on 
minor 0
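
A shift exponent of -1 typically comes from computing "bit position minus 
one" on a value that has no bits set. The sketch below only illustrates that 
general pattern and a guard for it; it is not a reading of 
nouveau_bios.c:1528, and ffs and lowest_bit_mask are hypothetical helpers. 
Unlike C, where such a shift is undefined behaviour, Python raises ValueError 
on a negative shift count, so the guard returns None instead:

```python
def ffs(x):
    """1-based index of the lowest set bit, 0 if no bit is set (like C ffs())."""
    return (x & -x).bit_length()


def lowest_bit_mask(x):
    """Return 1 << (ffs(x) - 1), or None when x has no bits set.

    Without the check, x == 0 would give a shift exponent of -1.
    """
    pos = ffs(x)
    if pos == 0:
        return None
    return 1 << (pos - 1)


for field in (0x1, 0x6, 0x0):
    print(hex(field), lowest_bit_mask(field))
```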

-- 
Meelis Roos (mr...@linux.ee)


Re: 4.15-rc9 new insecure W+X mapping warning

2018-02-26 Thread Meelis Roos
> This is Intel SE7520JR22S mainboard with 2 64-bit P4 xeons. Earlier 
> kernels up to 4.14 have had W+X checking on but found nothing. Now I 
> tried 4.15.0-rc9-00023-g1f07476ec143 and it gives a new W+X warning. 

This still happens in 4.15 and 4.16-rc2+.

What can I do to help resolve it?

> [   10.880663] [ cut here ]
> [   10.880755] x86/mm: Found insecure W+X mapping at address 
> d051fb08/0x8800
> [   10.880900] WARNING: CPU: 2 PID: 1 at arch/x86/mm/dump_pagetables.c:266 
> note_page+0x718/0xb89
> [   10.881035] Modules linked in:
> [   10.881128] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 
> 4.15.0-rc9-00023-g1f07476ec143 #104
> [   10.881264] Hardware name: Intel  
> /SE7520JR22S, BIOS SE7520JR22.86B.P.10.00.0087.120820051348 12/08/2005
> [   10.881405] RIP: 0010:note_page+0x718/0xb89
> [   10.881491] RSP: :c9013e48 EFLAGS: 00010296
> [   10.881578] RAX: 0051 RBX: c9013ec8 RCX: 
> 8164f938
> [   10.881666] RDX: 0001 RSI: 0092 RDI: 
> 82b468cc
> [   10.881756] RBP: 0061 R08: 0177 R09: 
> 01d7
> [   10.881844] R10: 0720072007200720 R11: 0720072007200720 R12: 
> 
> [   10.881932] R13:  R14: 0001 R15: 
> 88099000
> [   10.882022] FS:  () GS:88003fc8() 
> knlGS:
> [   10.882156] CS:  0010 DS:  ES:  CR0: 80050033
> [   10.882243] CR2:  CR3: 0200a000 CR4: 
> 06e0
> [   10.882331] Call Trace:
> [   10.882423]  ptdump_walk_pgd_level_core+0x367/0x3a5
> [   10.882511]  ptdump_walk_pgd_level_checkwx+0x10/0x3e
> [   10.882602]  kernel_init+0x2e/0x10f
> [   10.882688]  ? rest_init+0xb9/0xb9
> [   10.882775]  ret_from_fork+0x35/0x40
> [   10.882861] Code: fb ff ff 41 f7 c7 00 10 00 00 0f 85 e2 fe ff ff e9 36 fd 
> ff ff c6 05 7d 45 6f 01 01 48 89 f2 48 c7 c7 08 5b ee 81 e8 4b d6 00 00 <0f> 
> ff 48 8b 73 10 e9 bc f9 ff ff 4d 85 ed 0f 84 b9 01 00 00 41 
> [   10.883103] ---[ end trace bc3e2cf1a1adfa39 ]---
> [   10.896336] x86/mm: Checked W+X mappings: FAILED, 266243 W+X pages found.
> [   10.896430] x86/mm: Checking user space page tables
> [   10.909522] x86/mm: Checked W+X mappings: FAILED, 56 W+X pages found.

-- 
Meelis Roos (mr...@linux.ee)


4.16-rc2+git: pata_serverworks: hanging ata detection thread on HP DL380G3

2018-02-26 Thread Meelis Roos
654386] kworker/u8:4D0   613  2 0x8000
[  242.654517] Workqueue: events_unbound async_run_entry_fn
[  242.654637] Call Trace:
[  242.654759]  __schedule+0x1bc/0x8d3
[  242.654877]  ? set_next_entity+0xc1/0x39a
[  242.654994]  schedule+0x28/0xb2
[  242.655096]  async_synchronize_cookie_domain+0xac/0xf4
[  242.655217]  ? __clear_rsb+0x1d/0x32
[  242.655334]  ? wait_woken+0xb7/0xb7
[  242.655449]  async_synchronize_cookie+0xd/0x15
[  242.655583]  async_port_probe+0x57/0x87 [libata]
[  242.655703]  ? __clear_rsb+0xd/0x32
[  242.655825]  ? ata_port_probe+0x52/0x52 [libata]
[  242.655945]  async_run_entry_fn+0x49/0x1f2
[  242.656075]  process_one_work+0x20a/0x568
[  242.656191]  worker_thread+0x4c/0x631
[  242.656312]  kthread+0x140/0x1e4
[  242.656428]  ? process_one_work+0x568/0x568
[  242.656547]  ? kthread_create_on_node+0x23/0x23
[  242.656667]  ret_from_fork+0x2e/0x38
[  242.656793] INFO: task systemd-udevd:803 blocked for more than 120 seconds.
[  242.656920]   Not tainted 4.16.0-rc2-00374-g3664ce2d9309 #36
[  242.657039] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[  242.657257] systemd-udevd   D0   803758 0x8004
[  242.657379] Call Trace:
[  242.657495]  __schedule+0x1bc/0x8d3
[  242.657614]  ? kfree_skbmem+0x65/0x85
[  242.657730]  schedule+0x28/0xb2
[  242.657846]  async_synchronize_cookie_domain+0xac/0xf4
[  242.657968]  ? wait_woken+0xb7/0xb7
[  242.658082]  async_synchronize_full+0x14/0x16
[  242.658206]  do_init_module+0x10f/0x24b
[  242.658323]  load_module+0x29c9/0x3865
[  242.658443]  ? kernel_read+0x50/0xa7
[  242.658558]  SyS_finit_module+0x78/0x8d
[  242.658681]  do_fast_syscall_32+0xc7/0x323
[  242.658800]  entry_SYSENTER_32+0x4e/0x7c
[  242.658916] EIP: 0xb7f0cad5
[  242.659030] EFLAGS: 0292 CPU: 0
[  242.659145] EAX: ffda EBX: 000d ECX: b7d03bdd EDX: 
[  242.659265] ESI: 011ba740 EDI: 011bdb50 EBP:  ESP: bfcf1bcc
[  242.659388]  DS: 007b ES: 007b FS:  GS: 0033 SS: 007b
[  244.422337] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[  246.012767] tg3 :02:01.0 eth0: Link is up at 100 Mbps, full duplex
[  246.012875] tg3 :02:01.0 eth0: Flow control is off for TX and off for RX
[  246.012990] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[  316.432903] scsi 1:0:0:0: Attached scsi generic sg3 type 5
[  316.667528] sr 1:0:0:0: [sr0] scsi3-mmc drive: 24x/24x cd/rw xa/form2 cdda 
tray
[  316.667571] cdrom: Uniform CD-ROM driver Revision: 3.20
[  316.667837] sr 1:0:0:0: Attached scsi CD-ROM sr0
[ 4097.814125] random: crng init done

-- 
Meelis Roos (mr...@linux.ee)


PCI BAR allocation failures in 4.15 and 4.16-rc2+ on HP DL585

2018-02-26 Thread Meelis Roos
I added an Atto U320 SCSI card to my HP ProLiant DL585 (a G1, although 
the G naming was not used then). I have not tried earlier kernels.

In dmesg there are actually two sets of BAR assignment failures, the 
first bridge may or may not be related.

[0.353439] pci :00:03.0: BAR 15: no space for [mem size 0x0010 pref]
[0.353621] pci :00:03.0: BAR 15: failed to assign [mem size 0x0010 
pref]
[...]
[0.355801] pci :04:09.0: PCI bridge to [bus 05]
[0.355801] pci :06:0e.0: BAR 6: no space for [mem size 0x0010 pref]
[0.355916] pci :06:0e.0: BAR 6: failed to assign [mem size 0x0010 
pref]
[0.356207] pci :06:0e.1: BAR 6: no space for [mem size 0x0010 pref]
[0.356387] pci :06:0e.1: BAR 6: failed to assign [mem size 0x0010 
pref]
[0.356661] pci :04:0a.0: PCI bridge to [bus 06]
[0.356834] pci :04:0a.0:   bridge window [io  0x6000-0x6fff]
[0.357013] pci :04:0a.0:   bridge window [mem 0xf7f0-0xf7ff]
[0.357197] pci :04:0b.0: PCI bridge to [bus 07]
[0.357375] pci :04:0c.0: PCI bridge to [bus 08]
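
To see at a glance which devices are affected, the "failed to assign" lines 
can be grouped per device. A minimal sketch, assuming unwrapped dmesg lines 
in the form shown above (device addresses are kept exactly as they appear 
here); failed_bars is a hypothetical helper name:

```python
import re
from collections import defaultdict

# Captures the device address and the BAR number of each failed assignment.
FAIL_RE = re.compile(r"pci\s+(\S+): BAR (\d+): failed to assign")


def failed_bars(dmesg_text):
    """Map each PCI device to the list of BAR numbers that failed to assign."""
    failures = defaultdict(list)
    for dev, bar in FAIL_RE.findall(dmesg_text):
        failures[dev].append(int(bar))
    return dict(failures)


sample = """\
[0.353439] pci :00:03.0: BAR 15: no space for [mem size 0x0010 pref]
[0.353621] pci :00:03.0: BAR 15: failed to assign [mem size 0x0010 pref]
[0.355916] pci :06:0e.0: BAR 6: failed to assign [mem size 0x0010 pref]
[0.356387] pci :06:0e.1: BAR 6: failed to assign [mem size 0x0010 pref]
"""

for dev, bars in failed_bars(sample).items():
    print(dev, bars)
```

In Linux resource numbering BAR 6 is the expansion ROM, so the two 06:0e.x 
failures concern ROM space and not a regular memory or I/O BAR.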

lspci -vvvxxx and full dmesg are below.

00:03.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] AMD-8111 PCI (rev 07) 
(prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ 
Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- 
SERR- TAbort- 
Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [c0] HyperTransport: Slave or Primary Interface
Command: BaseUnitID=3 UnitCnt=4 MastHost- DefDir- DUL-
Link Control 0: CFlE- CST- CFE- TAbort- 
SERR- TAbort- 
SERR- TAbort- 
SERR- TAbort- 
SERR- TAbort- 
Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [a0] PCI-X bridge device
Secondary Status: 64bit+ 133MHz+ SCD- USC- SCO- SRD- Freq=100MHz
Status: Dev=00:07.0 64bit+ 133MHz+ SCD- USC- SCO- SRD-
Upstream: Capacity=14 CommitmentLimit=65535
Downstream: Capacity=2 CommitmentLimit=65535
Capabilities: [b8] HyperTransport: Interrupt Discovery and Configuration
Capabilities: [c0] HyperTransport: Slave or Primary Interface
Command: BaseUnitID=7 UnitCnt=2 MastHost- DefDir- DUL-
Link Control 0: CFlE- CST- CFE- TAbort- 
SERR- TAbort- 
SERR- TAbort- 
Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [a0] PCI-X bridge device
Secondary Status: 64bit+ 133MHz+ SCD- USC- SCO- SRD- Freq=100MHz
Status: Dev=00:08.0 64bit+ 133MHz+ SCD- USC- SCO- SRD-
Upstream: Capacity=14 CommitmentLimit=65535
Downstream: Capacity=2 CommitmentLimit=65535
Capabilities: [b8] HyperTransport: Interrupt Discovery and Configuration
Kernel modules: shpchp
00: 22 10 50 74 47 01 30 02 12 00 04 06 00 40 81 00
10: 00 00 00 00 00 00 00 00 00 03 03 40 f1 01 20 22
20: f0 ff 00 00 f1 ff 01 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 a0 00 00 00 00 00 00 00 ff 00 01 00
40: 05 00 1f 00 01 00 00 00 00 00 00 00 01 2c 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 07 b8 83 00 40 00 03 00 0e 00 ff ff 02 00 ff ff
b0: 00 00 00 00 00 00 00 00 08 00 00 80 00 00 00 06
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00:08.1 PIC: Advanced Micro Devices, Inc. [AMD] AMD-8131 PCI-X IOAPIC (rev 01) 
(prog-if 10 [IO-APIC])
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- 
SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- 
SERR- TAbort- 
SERR- TAbort- 
SERR- TAbort- 
SERR- TAbort- 
SERR- TAbort- 
SERR- TAbort- 
SERR-  [disabled]
Capabilities: [40] PCI-X non-bridge device
Command: DPERE- ERO- RBC=2048 OST=1
Status: Dev=02:06.0 64bit+ 133MHz+ SCD- USC- DC=simple 
DMMRBC=2048 DMOST=1 DMCRS=16 RSCEM- 266MHz- 533MHz-
Capabilities: [48] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA 
PME(D0-,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst- PME-Enable+ DSel=0 DScale=1 PME-
Capabilities: [50] Vital Product Data

Re: hpsa crashes on boot in 4.16-rc2-00062

2018-02-22 Thread Meelis Roos
> This happens on a HP DL360 G6 with Smart Array 410i.
> 
> Will try to bisect.
> 
> IO completion timeout could be because of some IRQ troubles?

Reverting 84676c1f21e8ff54befe985f4f14dc1edc10046b fixes it for me (as 
suggested by Laurence Oberman).

-- 
Meelis Roos (mr...@linux.ee)


hpsa crashes on boot in 4.16-rc2-00062

2018-02-22 Thread Meelis Roos
This happens on a HP DL360 G6 with Smart Array 410i.

Will try to bisect.

IO completion timeout could be because of some IRQ troubles?

(sorry, the rest has scrolled away in ilo2 textcons)

[  242.655025] Call Trace:  
[  242.655077]  ? __schedule+0x1dd/0x5e0
[  242.655130]  schedule+0x23/0x70  
[  242.655182]  schedule_timeout+0xe1/0x290 
[  242.655236]  io_schedule_timeout+0x14/0x40   
[  242.655290]  wait_for_completion_io+0xa4/0x120   
[  242.655346]  ? wake_up_q+0x70/0x70   
[  242.655401]  hpsa_scsi_do_simple_cmd+0xa7/0xf0   
[  242.655456]  hpsa_scsi_do_simple_cmd_with_retry+0x4a/0x150   
[  242.655512]  hpsa_scsi_do_inquiry+0x5d/0xc0  
[  242.655567]  hpsa_scan_start+0xf67/0x1fa0
[  242.655621]  ? sched_clock_local+0x12/0x80   
[  242.655675]  ? sched_clock_local+0x12/0x80   
[  242.655729]  ? select_idle_sibling+0x21/0x3b0
[  242.655785]  ? do_scsi_scan_host+0x2d/0x90   
[  242.655839]  do_scsi_scan_host+0x2d/0x90 
[  242.655892]  do_scan_async+0x12/0x180
[  242.655945]  async_run_entry_fn+0x2c/0x140   
[  242.656002]  process_one_work+0x1a6/0x320
[  242.656062]  worker_thread+0x26/0x3c0
[  242.656115]  ? create_worker+0x190/0x190 
[  242.656170]  kthread+0x107/0x120 
[  242.656222]  ? kthread_create_worker_on_cpu+0x70/0x70
[  242.656278]  ret_from_fork+0x35/0x40  


-- 
Meelis Roos (mr...@linux.ee)


Re: [Nouveau] 4.16-rc1: UBSAN warning in nouveau/nvkm/subdev/therm/base.c + oops in nvkm_therm_clkgate_fini

2018-02-14 Thread Meelis Roos
> Actually this was brought up to me already, there's a fix on the mailing list
> for this I reviewed a little while ago from nvidia that we should pull in:
> 
> https://patchwork.freedesktop.org/patch/203205/
> 
> Would you guys mind confirming that this patch fixes your issues?

It works on my amd64, P4 is still compiling.

[1.124987] nouveau :04:05.0: NVIDIA NV05 (20154000)
[1.161464] nouveau :04:05.0: bios: version 03.05.00.10.00
[1.161475] nouveau :04:05.0: bios: DCB table not found
[1.161535] nouveau :04:05.0: bios: DCB table not found
[1.161577] nouveau :04:05.0: bios: DCB table not found
[1.161586] nouveau :04:05.0: bios: DCB table not found
[1.344008] tsc: Refined TSC clocksource calibration: 2200.078 MHz
[1.344024] clocksource: tsc: mask: 0x max_cycles: 
0x1fb67c69f81, max_idle_ns: 440795210317 ns
[1.344037] clocksource: Switched to clocksource tsc
[1.408102] nouveau :04:05.0: tmr: unknown input clock freq
[1.409471] nouveau :04:05.0: fb: 32 MiB SDRAM
[1.414459] nouveau :04:05.0: DRM: VRAM: 31 MiB
[1.414467] nouveau :04:05.0: DRM: GART: 128 MiB
[1.414476] nouveau :04:05.0: DRM: BMP version 5.17
[1.414484] nouveau :04:05.0: DRM: No DCB data found in VBIOS
[1.415629] nouveau :04:05.0: DRM: Adaptor not initialised, running 
VBIOS init tables.
[1.415829] nouveau :04:05.0: bios: DCB table not found
[1.416125] nouveau :04:05.0: DRM: Saving VGA fonts
[1.477526] nouveau :04:05.0: DRM: No DCB data found in VBIOS
[1.478428] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[1.478438] [drm] Driver supports precise vblank timestamp query.
[1.479618] nouveau :04:05.0: DRM: MM: using M2MF for buffer copies
[1.517930] nouveau :04:05.0: DRM: allocated 1024x768 fb: 0x4000, bo 
a09f4d1f
[1.519294] nouveau :04:05.0: fb1: nouveaufb frame buffer device
[1.519313] [drm] Initialized nouveau 1.3.1 20120801 for :04:05.0 on 
minor 1


-- 
Meelis Roos (mr...@linux.ee)


apm_32.c: undefined reference to `cpuidle_poll_state_init'

2018-02-14 Thread Meelis Roos
This is 4.16-rc1+git as of today, on an IBM PC 365 that uses APM instead 
of ACPI. APM linking fails:

  MODPOST vmlinux.o
arch/x86/kernel/apm_32.o: In function `apm_init':
apm_32.c:(.init.text+0x597): undefined reference to `cpuidle_poll_state_init'
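
For a link error like this, one thing worth checking is whether the option 
that builds the file defining the missing symbol is enabled (which option 
that is has not been verified here). A minimal sketch for looking options up 
in a .config; parse_kconfig is a hypothetical helper, and the sample lines 
are taken from the config that follows:

```python
import re

SET_RE = re.compile(r"^(CONFIG_\w+)=(.*)$")
UNSET_RE = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_kconfig(text):
    """Parse .config text into {option: value}; unset options map to None."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        m = SET_RE.match(line)
        if m:
            options[m.group(1)] = m.group(2)
            continue
        m = UNSET_RE.match(line)
        if m:
            options[m.group(1)] = None
    return options


sample = """\
# CONFIG_64BIT is not set
CONFIG_X86_32=y
CONFIG_OUTPUT_FORMAT="elf32-i386"
"""

opts = parse_kconfig(sample)
for name in ("CONFIG_64BIT", "CONFIG_X86_32"):
    print(name, "=", opts.get(name))
```

An option that does not appear in the file at all is simply absent from the 
dictionary, which distinguishes "not offered in this configuration" from 
"explicitly not set".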

Config:

#
# Automatically generated file; DO NOT EDIT.
# Linux/x86 4.16.0-rc1 Kernel Configuration
#
# CONFIG_64BIT is not set
CONFIG_X86_32=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_OUTPUT_FORMAT="elf32-i386"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/i386_defconfig"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_MMU=y
CONFIG_ARCH_MMAP_RND_BITS_MIN=8
CONFIG_ARCH_MMAP_RND_BITS_MAX=16
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=8
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=16
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_X86_32_LAZY_GS=y
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_PGTABLE_LEVELS=2
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y
CONFIG_THREAD_INFO_IN_TASK=y

#
# General setup
#
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
CONFIG_KERNEL_GZIP=y
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
CONFIG_CROSS_MEMORY_ATTACH=y
# CONFIG_USELIB is not set
# CONFIG_AUDIT is not set
CONFIG_HAVE_ARCH_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_IRQ_RESERVATION_MODE=y
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_DATA=y
CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
# CONFIG_IRQ_TIME_ACCOUNTING is not set
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_TASKSTATS is not set

#
# RCU Subsystem
#
CONFIG_TINY_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_SRCU=y
CONFIG_TINY_SRCU=y
# CONFIG_TASKS_RCU is not set
# CONFIG_RCU_STALL_COMMON is not set
# CONFIG_RCU_NEED_SEGCBLIST is not set
# CONFIG_BUILD_BIN2C is not set
# CONFIG_IKCONFIG is not set
CONFIG_LOG_BUF_SHIFT=16
CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=13
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH=y
CONFIG_CGROUPS=y
# CONFIG_MEMCG is not set
# CONFIG_BLK_CGROUP is not set
# CONFIG_CGROUP_SCHED is not set
# CONFIG_CGROUP_PIDS is not set
# CONFIG_CGROUP_RDMA is not set
# CONFIG_CGROUP_FREEZER is not set
# CONFIG_CGROUP_HUGETLB is not set
# CONFIG_CGROUP_DEVICE is not set
# CONFIG_CGROUP_CPUACCT is not set
# CONFIG_CGROUP_PERF is not set
# CONFIG_CGROUP_BPF is not set
# CONFIG_SOCK_CGROUP_DATA is not set
CONFIG_NAMESPACES=y
# CONFIG_UTS_NS is not set
# CONFIG_IPC_NS is not set
# CONFIG_USER_NS is not set
# CONFIG_PID_NS is not set
# CONFIG_NET_NS is not set
# CONFIG_SCHED_AUTOGROUP is not set
# CONFIG_SYSFS_DEPRECATED is not set
# CONFIG_RELAY is not set
# CONFIG_BLK_DEV_INITRD is not set
CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
CONFIG_ANON_INODES=y
CONFIG_HAVE_UID16=y
CONFIG_SYSCTL_EXCEPTION_TRACE=y
CONFIG_HAVE_PCSPKR_PLATFORM=y
CONFIG_BPF=y
# CONFIG_EXPERT is not set
CONFIG_UID16=y
CONFIG_MULTIUSER=y
CONFIG_SGETMASK_SYSCALL=y
CONFIG_SYSFS_SYSCALL=y
# CONFIG_SYSCTL_SYSCALL is not set
CONFIG_FHANDLE=y
CONFIG_POSIX_TIMERS=y
CONFIG_PRINTK=y
CONFIG_PRINTK_NMI=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_PCSPKR_PLATFORM=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_FUTEX_PI=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_AIO=y
CONFIG_ADVISE_SYSCALLS=y
CONFIG_MEMBARRIER=y
# CONFIG_CHECKPOINT_RESTORE is not set
CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_ABSOLUTE_PERCPU is not set
CONFIG_KALLSYMS_BASE_RELATIVE=y
CONFIG_BPF_SYSCALL

Re: 4.16-rc1: UBSAN warning in nouveau/nvkm/subdev/therm/base.c + oops in nvkm_therm_clkgate_fini

2018-02-14 Thread Meelis Roos
> This is 4.16-rc1+todays git on a lowly P4 with NV5, worked fine in 4.15:

NV5 in another PC (secondary card in x86-64) made the system crash on 
boot, in nvkm_therm_clkgate_fini.

-- 
Meelis Roos (mr...@linux.ee)


4.16-rc1: UBSAN warning in nouveau/nvkm/subdev/therm/base.c + oops in nvkm_therm_clkgate_fini

2018-02-13 Thread Meelis Roos
d_intel8x0 snd_ac97_codec button rng_core ac97_bus snd_pcm snd_timer snd 
soundcore eeprom adm1031 adm1025 hwmon_vid i2c_core ip_tables x_tables ipv6 
autofs4
[7.410357] CPU: 0 PID: 125 Comm: systemd-udevd Not tainted 
4.16.0-rc1-00010-g178e834c47b0 #65
[7.410499] Hardware name:  /D850GB , BIOS 
GB85010A.86A.0078.P18.0110081719 10/08/2001
[7.410824] EIP: nvkm_therm_clkgate_fini+0x15/0x174 [nouveau]
[7.410921] EFLAGS: 00010286 CPU: 0
[7.411014] EAX: f6b3b800 EBX:  ECX: 0006 EDX: 0007
[7.411109] ESI:  EDI:  EBP: f6155858 ESP: f6155834
[7.411205]  DS: 007b ES: 007b FS:  GS: 00e0 SS: 0068
[7.411299] CR0: 80050033 CR2:  CR3: 3614b000 CR4: 06d0
[7.411395] Call Trace:
[7.411662]  ? nvkm_device_subdev+0x1b9/0x1fa [nouveau]
[7.411926]  nvkm_device_fini+0x113/0x3e9 [nouveau]
[7.412030]  ? ktime_get+0x4b/0x135
[7.412274]  ? nvkm_devinit_post+0x35/0xbf [nouveau]
[7.412536]  nvkm_device_init+0x228/0x5b0 [nouveau]
[7.412640]  ? kmem_cache_alloc+0xbd/0x12a
[7.412906]  nvkm_udevice_init+0x51/0xa9 [nouveau]
[7.413146]  nvkm_object_init+0xc8/0x442 [nouveau]
[7.413248]  ? check_preempt_wakeup+0xc2/0x1c1
[7.413602]  ? nvkm_client_child_new+0x1d/0x38 [nouveau]
[7.413956]  nvkm_ioctl_new+0x152/0x3d9 [nouveau]
[7.414055]  ? default_wake_function+0x1a/0x35
[7.414409]  ? nvif_vmm_init+0x2ce/0x2ce [nouveau]
[7.414788]  ? nvkm_udevice_rd08+0x5b/0x5b [nouveau]
[7.415150]  nvkm_ioctl+0x1c6/0x48d [nouveau]
[7.416466]  ? nvif_client_init+0xc3/0x114 [nouveau]
[7.416832]  ? nvkm_client_map+0xf/0xf [nouveau]
[7.417201]  nvkm_client_ioctl+0x1c/0x22 [nouveau]
[7.417554]  nvif_object_ioctl+0x6f/0xff [nouveau]
[7.417909]  nvif_object_init+0xd4/0x1de [nouveau]
[7.418271]  nvif_device_init+0x21/0x5c [nouveau]
[7.418536]  nouveau_cli_init+0x21f/0xe1f [nouveau]
[7.418799]  ? nouveau_drm_load+0x1d/0xe11 [nouveau]
[7.419058]  nouveau_drm_load+0x54/0xe11 [nouveau]
[7.419158]  ? kernfs_new_node+0x2b/0x8e
[7.419255]  ? kernfs_create_link+0x55/0xcd
[7.419369]  ? drm_dev_register+0x12f/0x2e0 [drm]
[7.419496]  drm_dev_register+0x168/0x2e0 [drm]
[7.419596]  ? pci_enable_device_flags+0xeb/0x15e
[7.419724]  drm_get_pci_dev+0xbf/0x230 [drm]
[7.420102]  nouveau_drm_probe+0x183/0x1ea [nouveau]
[7.420207]  pci_device_probe+0xaa/0x163
[7.420305]  driver_probe_device+0x1db/0x383
[7.420402]  __driver_attach+0x86/0xb8
[7.420497]  ? driver_probe_device+0x383/0x383
[7.420597]  bus_for_each_dev+0x4e/0x83
[7.420694]  driver_attach+0x1d/0x33
[7.420790]  ? driver_probe_device+0x383/0x383
[7.420886]  bus_add_driver+0x184/0x273
[7.420983]  driver_register+0x66/0x107
[7.421215]  ? nouveau_drm_init+0x66/0x1000 [nouveau]
[7.421322]  __pci_register_driver+0x47/0x71
[7.421555]  nouveau_drm_init+0x18a/0x1000 [nouveau]
[7.421654]  ? 0xf831a000
[7.421751]  do_one_initcall+0x4f/0x1e2
[7.421850]  ? free_unref_page_commit.isra.88+0xd5/0x176
[7.421947]  ? kvfree+0x3c/0x3e
[7.422041]  ? __vunmap+0x89/0xef
[7.422136]  ? do_init_module+0x1a/0x23f
[7.422232]  do_init_module+0x82/0x23f
[7.422329]  load_module+0x243c/0x36ae
[7.422428]  ? kernel_read+0x4c/0xa1
[7.422524]  SyS_finit_module+0x78/0x8d
[7.422624]  do_fast_syscall_32+0xc1/0x31b
[7.422722]  entry_SYSENTER_32+0x4e/0x7c
[7.422817] EIP: 0xb7ee9ad5
[7.422907] EFLAGS: 0296 CPU: 0
[7.423001] EAX: ffda EBX: 0019 ECX: b7ce0bdd EDX: 
[7.423098] ESI: 00eb6670 EDI: 00ebe610 EBP:  ESP: bff8704c
[7.423195]  DS: 007b ES: 007b FS:  GS: 0033 SS: 007b
[7.423291] Code: e9 30 ff ff ff 31 d2 b8 78 cf b0 f8 e8 ba 07 a2 c8 e9 0f 
ff ff ff 55 89 e5 57 56 53 83 ec 18 89 c3 89 d6 85 c0 0f 84 2c 01 00 00 <8b> 3b 
85 ff 0f 84 11 01 00 00 8b 47 30 85 c0 0f 84 a1 00 00 00
[7.423757] EIP: nvkm_therm_clkgate_fini+0x15/0x174 [nouveau] SS:ESP: 
0068:f6155834
[7.423899] CR2: 
[7.424033] ---[ end trace cad535783d11d7b9 ]---

-- 
Meelis Roos (mr...@linux.ee)


Re: pata-macio WARNING at dmam_alloc_coherent+0xec/0x110

2018-02-13 Thread Meelis Roos
> Does this fix your warning?
> 
> diff --git a/drivers/macintosh/macio_asic.c b/drivers/macintosh/macio_asic.c
> index 62f541f968f6..07074820a167 100644
> --- a/drivers/macintosh/macio_asic.c
> +++ b/drivers/macintosh/macio_asic.c
> @@ -375,6 +375,7 @@ static struct macio_dev * macio_add_one_device(struct 
> macio_chip *chip,
>   dev->ofdev.dev.of_node = np;
>   dev->ofdev.archdata.dma_mask = 0xUL;
>   dev->ofdev.dev.dma_mask = &dev->ofdev.archdata.dma_mask;
> + dev->ofdev.dev.coherent_dma_mask = dev->ofdev.archdata.dma_mask;
>   dev->ofdev.dev.parent = parent;
>   dev->ofdev.dev.bus = &macio_bus_type;
>   dev->ofdev.dev.release = macio_release_dev;

Yes, it does - thank you!

Tested-by: Meelis Roos 

-- 
Meelis Roos (mr...@linux.ee)


pata-macio WARNING at dmam_alloc_coherent+0xec/0x110

2018-02-12 Thread Meelis Roos
I tested 4.16-rc1 on my PowerMac G4 and got the following warning from 
macio pata driver. Since pata-macio has no recent changes, dma-mapping.h 
changes seem to be related.

[0.228408] MacIO PCI driver attached to Keylargo chipset
[1.283931] pata-macio 0.0001f000:ata-4: Activating pata-macio chipset 
KeyLargo ATA-4, Apple bus ID 2
[1.284398] WARNING: CPU: 0 PID: 1 at ./include/linux/dma-mapping.h:516 
dmam_alloc_coherent+0xec/0x110
[1.284689] Modules linked in:
[1.284797] CPU: 0 PID: 1 Comm: swapper Not tainted 4.16.0-rc1 #60
[1.284991] NIP:  c03259ec LR: c0325948 CTR: 
[1.285150] REGS: ef047c10 TRAP: 0700   Not tainted  (4.16.0-rc1)
[1.285337] MSR:  00029032   CR: 24fff228  XER: 2000
[1.285559] 
   GPR00: c0325948 ef047cc0 ef048000 ef1321b0   
ef1321bc  
   GPR08:   c04f1bd0  22fff884  
c0004c80  
   GPR16:       
c066 c05f0960 
   GPR24: 0007 c063d7a8  ef1e59ac 1020 ef1321b0 
ef135c18 014000c0 
[1.303085] NIP [c03259ec] dmam_alloc_coherent+0xec/0x110
[1.308751] LR [c0325948] dmam_alloc_coherent+0x48/0x110
[1.314511] Call Trace:
[1.320187] [ef047cc0] [c0325948] dmam_alloc_coherent+0x48/0x110 (unreliable)
[1.326133] [ef047ce0] [c0370a90] pata_macio_port_start+0x44/0xb8
[1.332110] [ef047d00] [c0355ed4] ata_host_start.part.5+0x138/0x254
[1.338100] [ef047d30] [c035c1e8] ata_host_activate+0x84/0x1a0
[1.344007] [ef047d50] [c0371214] pata_macio_common_init+0x3b0/0x608
[1.349890] [ef047db0] [c0336f9c] macio_device_probe+0x60/0x120
[1.355761] [ef047dd0] [c031868c] driver_probe_device+0x25c/0x35c
[1.361576] [ef047e00] [c031887c] __driver_attach+0xf0/0xf4
[1.367320] [ef047e20] [c0316340] bus_for_each_dev+0x80/0xc0
[1.373051] [ef047e50] [c031782c] bus_add_driver+0x144/0x258
[1.378805] [ef047e70] [c03190dc] driver_register+0x8c/0x140
[1.384580] [ef047e80] [c060ce14] pata_macio_init+0x5c/0x8c
[1.390303] [ef047ea0] [c0004aa0] do_one_initcall+0x48/0x18c
[1.396000] [ef047f00] [c05f1214] kernel_init_freeable+0x12c/0x1ec
[1.401615] [ef047f30] [c0004c98] kernel_init+0x18/0x128
[1.407208] [ef047f40] [c00122e4] ret_from_kernel_thread+0x5c/0x64
[1.412829] Instruction dump:
[1.418409] 939d 4bff6329 80010024 7fe3fb78 8361000c 83810010 7c0803a6 
83a10014
[1.424201] 83c10018 83e1001c 38210020 4e800020 <0fe0> 4b84 7fa3eb78 
3be0 
[1.430020] ---[ end trace 89c0f4a91a110769 ]---


-- 
Meelis Roos (mr...@linux.ee)


Re: 4.15: WARNING: CPU: 3 PID: 258 at kernel/irq/chip.c:244 __irq_startup+0x80/0x100

2018-01-31 Thread Meelis Roos
> > I'll do a proper fix and queue it so your museum is kept alive.

Thank you.

> Museum, space heater and ventilation system all in one? :-)

Actually, I do have a computer museum that is open for groups in Tartu, 
Estonia, at the University of Tartu, Institute of Computer Science. But 
this museum displays older stuff than P3.

In the queue for the museum, I have lots of servers and desktops and 
laptops that look too similar for presentation but are interesting for 
testing kernels. This set includes 100+ machines that are occasionally 
powered on and most test 1-2 RCs and the release kernels - I cannot 
afford to run them 24x7.

Currently, there are 30+ sparc64 machines, 30 x86 towers (mostly 
desktop, mostly 32-bit), 7 laptops, 25 x86 rack servers, 6 ia64, 2 
powerpc, 4 alpha and 5 parisc machines. At any moment, at least some of 
them are out of order but the majority are alive.

-- 
Meelis Roos (mr...@linux.ee)


Re: 4.15: WARNING: CPU: 3 PID: 258 at kernel/irq/chip.c:244 __irq_startup+0x80/0x100

2018-01-30 Thread Meelis Roos
> > Your supply of vintage hardware is amazing.

:-)

> Does the patch below fix the issue for you?

  CC  kernel/irq/autoprobe.o
kernel/irq/autoprobe.c: In function ‘probe_irq_on’:
kernel/irq/autoprobe.c:74:8: error: void value not ignored as it ought to be
if (irq_activate_and_startup(desc, IRQ_NORESEND))
^~~~
Just 
irq_activate_and_startup(desc, IRQ_NORESEND);

cures the warning and at least the first bootup was working otherwise 
too.
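
The compile error above is what GCC reports when a function declared to 
return void is used as an if condition. A minimal, self-contained sketch of 
the failing pattern and of the workaround follows. The struct, macro and 
function below are hypothetical stand-ins that only reuse the names from the 
error message; they are not the kernel's definitions, and the void return 
type is an assumption taken from the compiler output.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration, NOT the kernel definitions. */
#define IRQ_NORESEND false

struct irq_desc {
	bool started;
};

/* Assumed to return void, as the compiler error implies for the tested tree. */
static void irq_activate_and_startup(struct irq_desc *desc, bool resend)
{
	(void)resend;
	desc->started = true;
}

static void probe_one(struct irq_desc *desc)
{
	assert(desc != NULL);

	/*
	 * Rejected by GCC with "void value not ignored as it ought to be":
	 *
	 *	if (irq_activate_and_startup(desc, IRQ_NORESEND))
	 *		...
	 */

	/* Workaround from the mail: call it without testing a result. */
	irq_activate_and_startup(desc, IRQ_NORESEND);
}
```

With a void function there is no result to test, so the plain call is the 
only form that compiles.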

-- 
Meelis Roos (mr...@linux.ee)


4.15: WARNING: CPU: 3 PID: 258 at kernel/irq/chip.c:244 __irq_startup+0x80/0x100

2018-01-29 Thread Meelis Roos
Upgraded some of my older machines to v4.15 today. On a quad P3 HP 
NetServer, I get a bootup warning at kernel/irq/chip.c:244 
__irq_startup+0x80/0x100 (full dmesg below). It seems it was there 
before but I did not notice it.

Reading older kernel logs, I found that up to 
4.15.0-rc4-00041-gace52288edf0 it did not have the warning.

4.15.0-rc6 did not have the warning but had an oops with AACRAID (NULL 
dereference when battery died). 

4.15.0-rc6-dirty has the warning, dirty means my aacraid init order 
patch (submitted to linux-scsi, initialized function pointers before 
using them in error handler, does not seem related to IRQs?).

I also found it for the next 2 boots, 
4.15.0-rc9-00023-g1f07476ec143-dirty and 4.15.0-dirty. Sometimes on 
CPU0, sometimes on CPU 3.

Config is also below.

[0.00] Linux version 4.15.0-dirty (mroos@ninasarvik) (gcc version 7.2.0 
(Debian 7.2.0-20)) #89 SMP Mon Jan 29 13:18:49 EET 2018
[0.00] x86/fpu: x87 FPU will use FXSAVE
[0.00] e820: BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009dbff] usable
[0.00] BIOS-e820: [mem 0x0009dc00-0x0009] reserved
[0.00] BIOS-e820: [mem 0x000e5800-0x000f] reserved
[0.00] BIOS-e820: [mem 0x0010-0xbffe] usable
[0.00] BIOS-e820: [mem 0xbfff-0xbbff] ACPI data
[0.00] BIOS-e820: [mem 0xbc00-0xbfff] ACPI NVS
[0.00] BIOS-e820: [mem 0xfec0-0xfecf] reserved
[0.00] BIOS-e820: [mem 0xfee0-0xfeef] reserved
[0.00] BIOS-e820: [mem 0xfff8-0x] reserved
[0.00] Notice: NX (Execute Disable) protection missing in CPU!
[0.00] random: fast init done
[0.00] SMBIOS 2.3 present.
[0.00] DMI: Hewlett Packard HP NetServer/HP System Board, BIOS 4.06.46 
PW 06/25/2003
[0.00] e820: update [mem 0x-0x0fff] usable ==> reserved
[0.00] e820: remove [mem 0x000a-0x000f] usable
[0.00] e820: last_pfn = 0xbfff0 max_arch_pfn = 0x10
[0.00] MTRR default type: uncachable
[0.00] MTRR fixed ranges enabled:
[0.00]   0-9 write-back
[0.00]   A-B uncachable
[0.00]   C-C7FFF write-protect
[0.00]   C8000-E uncachable
[0.00]   F-F write-protect
[0.00] MTRR variable ranges enabled:
[0.00]   0 base 0 mask F8000 write-back
[0.00]   1 base 08000 mask FC000 write-back
[0.00]   2 disabled
[0.00]   3 disabled
[0.00]   4 disabled
[0.00]   5 disabled
[0.00]   6 disabled
[0.00]   7 disabled
[0.00] x86/PAT: PAT not supported by CPU.
[0.00] x86/PAT: Configuration [0-7]: WB  WT  UC- UC  WB  WT  UC- UC  
[0.00] found SMP MP-table at [mem 0x000f7610-0x000f761f] mapped at 
[(ptrval)]
[0.00] initial memory mapped: [mem 0x-0x01ff]
[0.00] Base memory trampoline at [(ptrval)] 99000 size 16384
[0.00] BRK [0x01d97000, 0x01d97fff] PGTABLE
[0.00] ACPI: Early table checksum verification disabled
[0.00] ACPI: RSDP 0x000F75A0 14 (v00 PTLTD )
[0.00] ACPI: RSDT 0xBFFFC11A 30 (v01 PTLTD  HWPC20C  
0001  LTP )
[0.00] ACPI: FACP 0xBAC1 74 (v01 HP LH 6000  
0001 PTL  0001)
[0.00] ACPI BIOS Warning (bug): Invalid length for 
FADT/Pm1aControlBlock: 32, using default 16 (20170831/tbfadt-708)
[0.00] ACPI BIOS Warning (bug): Invalid length for 
FADT/Pm1bControlBlock: 32, using default 16 (20170831/tbfadt-708)
[0.00] ACPI: DSDT 0xBFFFC14A 003977 (v01 HP LT 6000  
0001 MSFT 010B)
[0.00] ACPI: FACS 0xBFC0 40
[0.00] ACPI: APIC 0xBB35 A4 (v01 PTLTDAPIC   
0001  LTP )
[0.00] ACPI: BOOT 0xBBD9 27 (v01 PTLTD  $SBFTBL$ 
0001  LTP 0001)
[0.00] ACPI: Local APIC address 0xfee0
[0.00] 2183MB HIGHMEM available.
[0.00] 887MB LOWMEM available.
[0.00]   mapped low ram: 0 - 377fe000
[0.00]   low ram: 0 - 377fe000
[0.00] tsc: Fast TSC calibration using PIT
[0.00] BRK [0x01d98000, 0x01d98fff] PGTABLE
[0.00] Zone ranges:
[0.00]   DMA  [mem 0x1000-0x00ff]
[0.00]   Normal   [mem 0x0100-0x377fdfff]
[0.00]   HighMem  [mem 0x377fe000-0xbffe]
[0.00] Movable zone start for each node
[0.00] Early memory node ranges
[0.00]   node   0: [mem 0x1000-0x0009cfff]
[0.00]   node   0: [mem 0x0010-0xbffe]
[0.00] Initmem setup node 0 [mem 0x1000-0xbff

4.15-rc9 new insecure W+X mapping warning

2018-01-24 Thread Meelis Roos
.20
[   16.301660] mptctl: Registered with Fusion MPT base driver
[   16.301764] mptctl: /dev/mptctl @ (major,minor=10,220)
[   17.020409] e1000: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX
[   17.020573] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   17.535582] audit: type=1400 audit(1516811432.091:2): apparmor="STATUS" 
operation="profile_load" profile="unconfined" name="/usr/bin/man" pid=1013 
comm="apparmor_parser"
[   17.536021] audit: type=1400 audit(1516811432.091:3): apparmor="STATUS" 
operation="profile_load" profile="unconfined" name="/usr/bin/man//filter" 
pid=1013 comm="apparmor_parser"
[   17.536432] audit: type=1400 audit(1516811432.095:4): apparmor="STATUS" 
operation="profile_load" profile="unconfined" name="/usr/bin/man//groff" 
pid=1013 comm="apparmor_parser"
[   17.610155] audit: type=1400 audit(1516811432.167:5): apparmor="STATUS" 
operation="profile_load" profile="unconfined" name="/usr/sbin/ntpd" pid=1014 
comm="apparmor_parser"

-- 
Meelis Roos (mr...@linux.ee)


Re: powersaving-related hangs on T460s

2018-01-20 Thread Meelis Roos
> > > I tried 4.15-git on my Thinkpad T460s laptop. It is working but I have 
> > > problems waking up the computer after it has been idle.
> > > 
> > I seem to have found a better reproducer - when running on battery, it 
> > will hang after some minutes, with screen on. It just hangs.
> 
> And as of last Friday's git, it seems to have been fixed, so I did not 
> try to bisect it.

And as of yesterday's git, the problem is back again :(

Will see if I can bisect it this time on battery power.

-- 
Meelis Roos (mr...@linux.ee)


Re: lapic-related boot crash in 4.15-rc1

2018-01-16 Thread Meelis Roos
> I am compiling the x86/urgent pull that you suggested.

And it works.

-- 
Meelis Roos (mr...@linux.ee)


Re: lapic-related boot crash in 4.15-rc1

2018-01-16 Thread Meelis Roos
> I've reverted the commit which Dou pointed out in rc8. Can you please confirm 
> that this fixes the issue for you?

I am compiling the x86/urgent pull that you suggested.

Meanwhile the bisect finished and it came to the exact same commit by 
Dou Liyang that he sent me for the revert test. 

Reverting this patch worked on 2 of the machines, the 3rd one is compiling.

-- 
Meelis Roos (mr...@linux.ee)


Re: powersaving-related hangs on T460s

2018-01-16 Thread Meelis Roos
> > I tried 4.15-git on my Thinkpad T460s laptop. It is working but I have 
> > problems waking up the computer after it has been idle.
> > 
> I seem to have found a better reproducer - when running on battery, it 
> will hang after some minutes, with screen on. It just hangs.

And as of last Friday's git, it seems to have been fixed, so I did not 
try to bisect it.

-- 
Meelis Roos (mr...@linux.ee)


Re: lapic-related boot crash in 4.15-rc1

2018-01-15 Thread Meelis Roos
> I've reverted the commit which Dou pointed out in rc8. Can you please confirm 
> that this fixes the issue for you?

Tried rc8 on the P3, it still hangs.

-- 
Meelis Roos (mr...@linux.ee)


Re: lapic-related boot crash in 4.15-rc1

2018-01-15 Thread Meelis Roos
On Wed, 10 Jan 2018, Thomas Gleixner wrote:

> On Wed, 10 Jan 2018, Meelis Roos wrote:
> 
> > > > On 3 of my test computers, boot hangs with 4.15 git kernels. So far I 
> > > > have traced it down to 4.14.0 being good and 4.15-rc1 being bad (bisect 
> > > > is slow because the computers are somewhat remote). Also because of 
> > > > trying to find when it started, I have not tried newer than rc5 
> > > > kernels.
> > > 
> > > Please do so. We have fixes post rc5 in that area.
> > 
> > P4 was the quickest to rebuild the kernel and it is still hanging like 
> > > before with today's 4.15-rc7-00102-gcf1fb158230e.

So far I have bisected it to 4f45ed9f848f good, ae41a2a40ed4 bad. Will 
continue tomorrow.

1be2172e96e3 bad
2cd83ba5bede bad
449fcf3ab0ba bad
43ff2f4db9d0 good
313144c1bcd6 good
b18d62891aaf bad
b24591e2fcf8 good
0696d059f23c bad
023a611748fd bad
ae41a2a40ed4 bad
4f45ed9f848f good
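
The list above is the trail of a bisection between a known-good and a 
known-bad commit. As a rough illustration of why this takes only about 
log2(N) kernel builds, here is a minimal sketch of bisecting a linear 
history. It assumes the history is a plain array in which every commit from 
the first bad one onward is bad; git bisect itself works on the commit 
graph, and the names first_bad, is_bad_fn and bad_from are made up for this 
example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback: returns true if the commit at `index` tests bad. */
typedef bool (*is_bad_fn)(size_t index, void *ctx);

/*
 * Return the index of the first bad commit in (good, bad].
 * Preconditions: commit `good` tests good, commit `bad` tests bad, good < bad.
 */
static size_t first_bad(size_t good, size_t bad, is_bad_fn is_bad, void *ctx)
{
	assert(good < bad);
	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;

		if (is_bad(mid, ctx))
			bad = mid;	/* culprit is at mid or earlier */
		else
			good = mid;	/* culprit is after mid */
	}
	return bad;
}

/* Example predicate: every commit at or after *ctx is bad. */
static bool bad_from(size_t index, void *ctx)
{
	return index >= *(size_t *)ctx;
}
```

Each test halves the remaining range, which is why a slow, remote machine 
can still narrow down thousands of commits in a dozen or so boots.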

-- 
Meelis Roos (mr...@linux.ee)


Re: powersaving-related hangs on T460s

2018-01-10 Thread Meelis Roos
> I tried 4.15-git on my Thinkpad T460s laptop. It is working but I have 
> problems waking up the computer after it has been idle.
> 
> There should be no suspend (to keep network connections alive) when the 
> laptop is on AC power, even when the lid is closed. In dmesg, I have 
> seen no indication of suspend happening. It is configured to just lock 
> the screen when the lid is closed.
> 
> Normally, I have to press a key after opening the lid to unblank the 
> screen and get to the password prompt. Usually this works but sometimes 
> there is no response - the power LED is on but that is all, and holding 
> down the power button is the only way out.
> 
> Sometimes it happens overnight, sometimes it is alive in the morning. It 
> almost never happens with short 5-15 minute breaks but it can happen 
> with breaks of about an hour. There is no reliable way to reproduce the 
> problem.

I seem to have found a better reproducer - when running on battery, it 
will hang after some minutes, with screen on. It just hangs.

-- 
Meelis Roos (mr...@linux.ee)


Re: lapic-related boot crash in 4.15-rc1

2018-01-10 Thread Meelis Roos
> > P4 was the quickest to rebuild the kernel and it is still hanging like 
> > before with today's 4.15-rc7-00102-gcf1fb158230e.
> 
> I try to find a time slot for this ...

And I will try to bisect.

-- 
Meelis Roos (mr...@linux.ee)


Re: lapic-related boot crash in 4.15-rc1

2018-01-10 Thread Meelis Roos
> > On 3 of my test computers, boot hangs with 4.15 git kernels. So far I 
> > have traced it down to 4.14.0 being good and 4.15-rc1 being bad (bisect 
> > is slow because the computers are somewhat remote). Also because of 
> > trying to find when it started, I have not tried newer than rc5 
> > kernels.
> 
> Please do so. We have fixes post rc5 in that area.

P4 was the quickest to rebuild the kernel and it is still hanging like 
before with today's 4.15-rc7-00102-gcf1fb158230e.

-- 
Meelis Roos (mr...@linux.ee)


powersaving-related hangs on T460s

2018-01-10 Thread Meelis Roos
I tried 4.15-git on my Thinkpad T460s laptop. It is working but I have 
problems waking up the computer after it has been idle.

There should be no suspend (to keep network connections alive) when the 
laptop is on AC power, even when the lid is closed. In dmesg, I have 
seen no indication of suspend happening. It is configured to just lock 
the screen when the lid is closed.

Normally, I have to press a key after opening the lid to unblank the 
screen and get to the password prompt. Usually this works but sometimes 
there is no response - the power LED is on but that is all, and holding 
down the power button is the only way out.

Sometimes it happens overnight, sometimes it is alive in the morning. It 
almost never happens with short 5-15 minute breaks but it can happen 
with breaks of about an hour. There is no reliable way to reproduce the 
problem.

4.14 with the same config (modulo any new config options) was working 
fine.

Nothing in the log files afterwards. Network connection is WiFi. There 
is a USB mouse connected.

What should I check next?

-- 
Meelis Roos (mr...@linux.ee)

