Re: Regression causes a hang on boot with a Comtrol PCI card
On Fri, Mar 22, 2019 at 3:02 PM Jesse Hathaway wrote:
> > Can you boot v5.0 vanilla with "initcall_debug"?  Maybe we can narrow
> > it down to a specific quirk.
>
> yup, added the "initcall_debug" output to the ticket:
> https://bugzilla.kernel.org/show_bug.cgi?id=202927, here is the tail end
>
> [ 14.896337] NET: Registered protocol family 1
> [ 14.901314] initcall af_unix_init+0x0/0x4e returned 0 after 4866 usecs
> [ 14.908694] calling ipv6_offload_init+0x0/0x7f @ 1
> [ 14.914238] initcall ipv6_offload_init+0x0/0x7f returned 0 after 1 usecs
> [ 14.921821] calling vlan_offload_init+0x0/0x20 @ 1
> [ 14.927365] initcall vlan_offload_init+0x0/0x20 returned 0 after 0 usecs
> [ 14.934948] calling pci_apply_final_quirks+0x0/0x126 @ 1
> [ 14.941106] pci :00:1a.0: calling quirk_usb_early_handoff+0x0/0x6a0 @ 1

Bjorn, did you get a chance to look at the initcall_debug output? Does
anything obvious stand out as a possible cause of the problem?

Thanks,
Jesse Hathaway
Re: Regression causes a hang on boot with a Comtrol PCI card
> So apparently the hang happens while we're running the "final" PCI
> fixups.  This happens after all the rest of PCI is initialized.
>
> Can you boot v5.0 vanilla with "initcall_debug"?  Maybe we can narrow
> it down to a specific quirk.

yup, added the "initcall_debug" output to the ticket:
https://bugzilla.kernel.org/show_bug.cgi?id=202927, here is the tail end

[ 14.896337] NET: Registered protocol family 1
[ 14.901314] initcall af_unix_init+0x0/0x4e returned 0 after 4866 usecs
[ 14.908694] calling ipv6_offload_init+0x0/0x7f @ 1
[ 14.914238] initcall ipv6_offload_init+0x0/0x7f returned 0 after 1 usecs
[ 14.921821] calling vlan_offload_init+0x0/0x20 @ 1
[ 14.927365] initcall vlan_offload_init+0x0/0x20 returned 0 after 0 usecs
[ 14.934948] calling pci_apply_final_quirks+0x0/0x126 @ 1
[ 14.941106] pci :00:1a.0: calling quirk_usb_early_handoff+0x0/0x6a0 @ 1

thanks,
Jesse Hathaway
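For anyone reading a full initcall_debug log rather than just the tail, the same conclusion can be reached mechanically: each "calling X" line should eventually be paired with an "initcall X returned" line, and whatever is still open when the log stops is where the boot hung. Below is a rough Python sketch of that check. It assumes only the line formats visible in the excerpt above; the function name find_unfinished_initcalls and the SAMPLE list are made up for illustration, and the per-device quirk line is only reported as the last quirk that was started, since the excerpt does not show what a completed quirk looks like.

```python
import re

# Line formats assumed from the log excerpt in this message:
#   calling fn+0x0/0x7f @ 1
#   initcall fn+0x0/0x7f returned 0 after 1 usecs
#   pci <addr>: calling quirk_fn+0x0/0x6a0 @ 1
CALL_RE = re.compile(r"\]\s+(?:(?P<dev>pci \S+): )?calling\s+(?P<fn>\S+)\+0x")
RET_RE = re.compile(r"\]\s+initcall\s+(?P<fn>\S+)\+0x\S+ returned")


def find_unfinished_initcalls(lines):
    """Return (initcalls that never returned, last quirk started or None)."""
    pending = []        # initcalls seen as "calling" but not yet "returned"
    last_quirk = None   # (device, quirk function) of the most recent quirk call
    for line in lines:
        m = CALL_RE.search(line)
        if m:
            if m.group("dev"):
                last_quirk = (m.group("dev"), m.group("fn"))
            else:
                pending.append(m.group("fn"))
                last_quirk = None  # a new initcall started; older quirks are stale
            continue
        m = RET_RE.search(line)
        if m and m.group("fn") in pending:
            pending.remove(m.group("fn"))
    return pending, last_quirk


# Hypothetical sample: the tail of the log quoted above.
SAMPLE = [
    "[ 14.896337] NET: Registered protocol family 1",
    "[ 14.901314] initcall af_unix_init+0x0/0x4e returned 0 after 4866 usecs",
    "[ 14.908694] calling ipv6_offload_init+0x0/0x7f @ 1",
    "[ 14.914238] initcall ipv6_offload_init+0x0/0x7f returned 0 after 1 usecs",
    "[ 14.921821] calling vlan_offload_init+0x0/0x20 @ 1",
    "[ 14.927365] initcall vlan_offload_init+0x0/0x20 returned 0 after 0 usecs",
    "[ 14.934948] calling pci_apply_final_quirks+0x0/0x126 @ 1",
    "[ 14.941106] pci :00:1a.0: calling quirk_usb_early_handoff+0x0/0x6a0 @ 1",
]

if __name__ == "__main__":
    unfinished, quirk = find_unfinished_initcalls(SAMPLE)
    print("initcalls that never returned:", unfinished)
    print("last quirk started:", quirk)
```

On the excerpt this leaves pci_apply_final_quirks as the only open initcall, with quirk_usb_early_handoff on 00:1a.0 as the last quirk started. That matches the reading that the hang is somewhere in the final PCI fixups, though it does not by itself prove that this particular quirk is the one that hangs.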
Re: Regression causes a hang on boot with a Comtrol PCI card
> > 1302fcf0d03e (refs/bisect/bad) PCI: Configure *all* devices, not just
> > hot-added ones
> > 1c3c5eab1715 sched/core: Enable might_sleep() and smp_processor_id()
> > checks early
>
> How did you narrow it down to *two* commits, and do you have to revert
> both of them to avoid the hang?  Usually a bisection identifies a
> single commit, and the two you mention aren't related.

Sorry, I should have been more verbose about the bisection process. I
found the problem after attempting to upgrade from Linux v3.16 to v4.9.
When v4.9 hung I tried the latest kernel, v5.0, which also hung. I began
a git bisect, but found there was more than one bad commit. Here is my
current understanding:

- [x] v3.18 vanilla, 1302fcf0d03e committed, hangs
- [x] v3.18 with revert of 1302fcf0d03e, works
  . . .
- [x] v4.12 vanilla, hangs
- [x] v4.12 with revert of 1302fcf0d03e, works
- [x] v4.13 vanilla, 1c3c5eab1715 committed, hangs
- [x] v4.13 with revert of 1302fcf0d03e, hangs
- [x] v4.13 with revert of 1c3c5eab1715, hangs
- [x] v4.13 with revert of 1302fcf0d03e & 1c3c5eab1715, works
- [x] v5.0 vanilla, hangs
- [x] v5.0 with revert of 1302fcf0d03e & 1c3c5eab1715, works

> Can you collect a complete dmesg log (with a working kernel) and
> output of "sudo lspci -vvxxx"?  You can open a bug report at
> https://bugzilla.kernel.org, attach the logs there, and respond here
> with the URL.

Bug submitted along with the requested logs:
https://bugzilla.kernel.org/show_bug.cgi?id=202927

> Where does the hang happen?  Is it when we configure the Comtrol card?

The hang occurs after PCI is initialized. A snippet is below; the full
output is in the bug report:

[ 10.561971] pci :81:00.0: bridge window [mem 0xc800-0xc80f]
[ 10.569661] pci :80:01.0: PCI bridge to [bus 81-82]
[ 10.575594] pci :80:01.0: bridge window [mem 0xc800-0xc80f]
[ 10.583278] pci :80:03.0: PCI bridge to [bus 83]
[ 10.589008] NET: Registered protocol family 2
[ 10.594254] tcp_listen_portaddr_hash hash table entries: 65536 (order: 8, 1048576 bytes)
[ 10.603671] TCP established hash table entries: 524288 (order: 10, 4194304 bytes)
[ 10.612729] TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
[ 10.620446] TCP: Hash tables configured (established 524288 bind 65536)
[ 10.628124] UDP hash table entries: 65536 (order: 9, 2097152 bytes)
[ 10.635541] UDP-Lite hash table entries: 65536 (order: 9, 2097152 bytes)
[ 10.643669] NET: Registered protocol family 1

Please let me know if there is anything else I can provide. I am also
happy to test any patches.

Jesse Hathaway
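To make the reasoning behind "two bad commits" easier to check, the test matrix from this message can be written down as data and queried. This is only an illustrative Python sketch: the outcomes are copied from the checklist in this message, while the names RESULTS and minimal_working_reverts are invented here. It reports the smallest *tested* revert set that booted for each version, so for v5.0 (where single reverts were not tested) it cannot say whether one revert alone would be enough.

```python
# Commit IDs and subjects as quoted in this thread.
A = "1302fcf0d03e"  # PCI: Configure *all* devices, not just hot-added ones
B = "1c3c5eab1715"  # sched/core: Enable might_sleep() and smp_processor_id() checks early

# (kernel version, reverted commits) -> True if the kernel booted.
# Values copied from the checklist in this message; untested combinations are absent.
RESULTS = {
    ("v3.18", frozenset()): False,
    ("v3.18", frozenset({A})): True,
    ("v4.12", frozenset()): False,
    ("v4.12", frozenset({A})): True,
    ("v4.13", frozenset()): False,
    ("v4.13", frozenset({A})): False,
    ("v4.13", frozenset({B})): False,
    ("v4.13", frozenset({A, B})): True,
    ("v5.0", frozenset()): False,
    ("v5.0", frozenset({A, B})): True,
}


def minimal_working_reverts(results):
    """For each version, return the smallest tested revert set that booted."""
    best = {}
    for (version, reverts), booted in results.items():
        if not booted:
            continue
        if version not in best or len(reverts) < len(best[version]):
            best[version] = reverts
    return best


if __name__ == "__main__":
    for version, reverts in minimal_working_reverts(RESULTS).items():
        print(version, "boots with revert of", ", ".join(sorted(reverts)))
```

Read this way, the matrix says that reverting 1302fcf0d03e alone is sufficient up to v4.12, and that from v4.13 onward (where 1c3c5eab1715 is present) neither single revert tested is sufficient but both together are.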
Regression causes a hang on boot with a Comtrol PCI card
Two regressions cause Linux to hang on boot when a Comtrol PCI card is
present. If I revert the following two commits, I can boot again and the
card operates without issue:

1302fcf0d03e (refs/bisect/bad) PCI: Configure *all* devices, not just hot-added ones
1c3c5eab1715 sched/core: Enable might_sleep() and smp_processor_id() checks early

; lspci -vs 82:00.0
82:00.0 Multiport serial controller: Comtrol Corporation Device 0061
        Subsystem: Comtrol Corporation Device 0061
        Flags: 66MHz, medium devsel, IRQ 35, NUMA node 1
        Memory at c8004000 (32-bit, non-prefetchable) [size=4K]
        Memory at c800 (32-bit, non-prefetchable) [size=16K]
        Capabilities: [40] Hot-plug capable
        Capabilities: [48] Power Management version 2
        Kernel driver in use: rp2
        Kernel modules: rp2

Is it possible that the problem is that the card claims to support
Hot-plug, but does not? I would love to help fix this issue, please let
me know what other information would be helpful to provide.

; awk -f scripts/ver_linux
If some fields are empty or look unusual you may have an old version.
Compare to the current minimal requirements in Documentation/Changes.

Linux tty01 5.0.1-amd64 #1 SMP Wed Mar 13 15:43:44 UTC 2019 x86_64 GNU/Linux

GNU C                   6.3.0
GNU Make                4.1
Binutils                2.28
Util-linux              2.29.2
Mount                   2.29.2
Linux C Library         2.24
Dynamic linker (ldd)    2.24
Procps                  3.3.12
Sh-utils                8.26
Udev                    232
Modules Loaded          8021q acpi_power_meter aesni_intel aes_x86_64 ahci
                        autofs4 bonding button coretemp crc16 crc32c_generic
                        crc32c_intel crc32_pclmul crct10dif_pclmul cryptd
                        crypto_simd dca dcdbas dm_mod drm drm_kms_helper
                        ehci_hcd ehci_pci evdev ext4 fscrypto garp
                        ghash_clmulni_intel glue_helper i2c_algo_bit igb
                        intel_cstate intel_powerclamp intel_rapl
                        intel_rapl_perf intel_uncore ioatdma ipmi_devintf
                        ipmi_msghandler ipmi_si iptable_filter ip_tables
                        irqbypass iTCO_vendor_support iTCO_wdt ixgbe jbd2 kvm
                        kvm_intel libahci libata libcrc32c libphy llc lpc_ich
                        mbcache mdio megaraid_sas mei mei_me mgag200 mrp
                        mxm_wmi nf_conntrack nf_defrag_ipv4 nf_defrag_ipv6
                        pcc_cpufreq pcspkr rp2 sb_edac scsi_mod sd_mod sg snd
                        snd_pcm snd_timer soundcore stp ttm usbcore wmi
                        x86_pkg_temp_thermal xfrm_algo x_tables xt_conntrack
                        xt_tcpudp