Re: Regression causes a hang on boot with a Comtrol PCI card

2019-04-01 Thread Jesse Hathaway
On Fri, Mar 22, 2019 at 3:02 PM Jesse Hathaway  wrote:
> > Can you boot v5.0 vanilla with "initcall_debug"?  Maybe we can narrow
> > it down to a specific quirk.
>
> yup, added the "initcall_debug" output to the ticket:
> https://bugzilla.kernel.org/show_bug.cgi?id=202927, here is the tail end
>
> [   14.896337] NET: Registered protocol family 1
> [   14.901314] initcall af_unix_init+0x0/0x4e returned 0 after 4866 usecs
> [   14.908694] calling  ipv6_offload_init+0x0/0x7f @ 1
> [   14.914238] initcall ipv6_offload_init+0x0/0x7f returned 0 after 1 usecs
> [   14.921821] calling  vlan_offload_init+0x0/0x20 @ 1
> [   14.927365] initcall vlan_offload_init+0x0/0x20 returned 0 after 0 usecs
> [   14.934948] calling  pci_apply_final_quirks+0x0/0x126 @ 1
> [   14.941106] pci :00:1a.0: calling  quirk_usb_early_handoff+0x0/0x6a0 @ 
> 1

Bjorn, did you get a chance to look at the initcall_debug output for anything
obvious to you on what might be the cause of the problem?

Thanks, Jesse Hathaway


Re: Regression causes a hang on boot with a Comtrol PCI card

2019-03-22 Thread Jesse Hathaway
> So apparently the hang happens while we're running the "final" PCI
> fixups.  This happens after all the rest of PCI is initialized.
>
> Can you boot v5.0 vanilla with "initcall_debug"?  Maybe we can narrow
> it down to a specific quirk.

yup, added the "initcall_debug" output to the ticket:
https://bugzilla.kernel.org/show_bug.cgi?id=202927, here is the tail end

[   14.896337] NET: Registered protocol family 1
[   14.901314] initcall af_unix_init+0x0/0x4e returned 0 after 4866 usecs
[   14.908694] calling  ipv6_offload_init+0x0/0x7f @ 1
[   14.914238] initcall ipv6_offload_init+0x0/0x7f returned 0 after 1 usecs
[   14.921821] calling  vlan_offload_init+0x0/0x20 @ 1
[   14.927365] initcall vlan_offload_init+0x0/0x20 returned 0 after 0 usecs
[   14.934948] calling  pci_apply_final_quirks+0x0/0x126 @ 1
[   14.941106] pci :00:1a.0: calling  quirk_usb_early_handoff+0x0/0x6a0 @ 1

thanks, Jesse Hathaway


Re: Regression causes a hang on boot with a Comtrol PCI card

2019-03-14 Thread Jesse Hathaway
> > 1302fcf0d03e (refs/bisect/bad) PCI: Configure *all* devices, not just
> > hot-added ones
> > 1c3c5eab1715 sched/core: Enable might_sleep() and smp_processor_id()
> > checks early
>
> How did you narrow it down to *two* commits, and do you have to revert
> both of them to avoid the hang?  Usually a bisection identifies a
> single commit, and the two you mention aren't related.

Sorry I should have been more verbose in what the bisection process was, I
found the problem after attempting to upgrade from linux v3.16 to v4.9. When
v4.9 hung I tried the latest kernel, v5.0, which also hanged. I began a git
bisect, but found there was more than one bad commit. Here is my current
understanding:

- [x] v3.18 vanilla, 1302fcf0d03e committed, hangs
- [x] v3.18 with revert of 1302fcf0d03e, works
.
.
.
- [x] v4.12 vanilla, hangs
- [x] v4.12 with revert of 1302fcf0d03e, works

- [x] v4.13 vanilla, 1c3c5eab1715 committed, hangs
- [x] v4.13 with revert of 1302fcf0d03e, hangs
- [x] v4.13 with revert of 1c3c5eab1715, hangs
- [x] v4.13 with revert of 1302fcf0d03e & 1c3c5eab1715, works

- [x] v5.0 vanilla, hangs
- [x] v5.0 with revert of 1302fcf0d03e & 1c3c5eab1715, works

> Can you collect a complete dmesg log (with a working kernel) and
> output of "sudo lspci -vvxxx"?  You can open a bug report at
> https://bugzilla.kernel.org, attach the logs there, and respond here
> with the URL.

Bug submitted along with the requested logs,
https://bugzilla.kernel.org/show_bug.cgi?id=202927

> Where does the hang happen?  Is it when we configure the Comtrol card?

Hang occurs after PCI is initialized, snippet below, I have included the full
output in the bug report:

[   10.561971] pci :81:00.0:   bridge window [mem 0xc800-0xc80f]
[   10.569661] pci :80:01.0: PCI bridge to [bus 81-82]
[   10.575594] pci :80:01.0:   bridge window [mem 0xc800-0xc80f]
[   10.583278] pci :80:03.0: PCI bridge to [bus 83]
[   10.589008] NET: Registered protocol family 2
[   10.594254] tcp_listen_portaddr_hash hash table entries: 65536
(order: 8, 1048576 bytes)
[   10.603671] TCP established hash table entries: 524288 (order: 10,
4194304 bytes)
[   10.612729] TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
[   10.620446] TCP: Hash tables configured (established 524288 bind 65536)
[   10.628124] UDP hash table entries: 65536 (order: 9, 2097152 bytes)
[   10.635541] UDP-Lite hash table entries: 65536 (order: 9, 2097152 bytes)
[   10.643669] NET: Registered protocol family 1

Please let me know if there is anything else I can provide, I am also happy to
test any patches, Jesse Hathaway


Regression causes a hang on boot with a Comtrol PCI card

2019-03-13 Thread Jesse Hathaway
Two regressions cause Linux to hang on boot when a Comtrol PCI card is present.

If I revert the following two commits, I can boot again and the card operates
without issue:

1302fcf0d03e (refs/bisect/bad) PCI: Configure *all* devices, not just
hot-added ones
1c3c5eab1715 sched/core: Enable might_sleep() and smp_processor_id()
checks early

; lspci -vs 82:00.0
82:00.0 Multiport serial controller: Comtrol Corporation Device 0061
Subsystem: Comtrol Corporation Device 0061
Flags: 66MHz, medium devsel, IRQ 35, NUMA node 1
Memory at c8004000 (32-bit, non-prefetchable) [size=4K]
Memory at c800 (32-bit, non-prefetchable) [size=16K]
Capabilities: [40] Hot-plug capable
Capabilities: [48] Power Management version 2
Kernel driver in use: rp2
Kernel modules: rp2

Is it possible that the problem is that the card claims to support Hot-plug,
but does not?

I would love to help fix this issue, please let me know what other information
would be helpful to provide.

; awk -f scripts/ver_linux
If some fields are empty or look unusual you may have an old version.
Compare to the current minimal requirements in Documentation/Changes.

Linux tty01 5.0.1-amd64 #1 SMP Wed Mar 13 15:43:44 UTC 2019 x86_64 GNU/Linux

GNU C   6.3.0
GNU Make4.1
Binutils2.28
Util-linux  2.29.2
Mount   2.29.2
Linux C Library 2.24
Dynamic linker (ldd)2.24
Procps  3.3.12
Sh-utils8.26
Udev232
Modules Loaded  8021q acpi_power_meter aesni_intel aes_x86_64
ahci autofs4 bonding button coretemp crc16 crc32c_generic crc32c_intel
crc32_pclmul crct10dif_pclmul cryptd crypto_simd dca dcdbas dm_mod drm
drm_kms_helper ehci_hcd ehci_pci evdev ext4 fscrypto garp
ghash_clmulni_intel glue_helper i2c_algo_bit igb intel_cstate
intel_powerclamp intel_rapl intel_rapl_perf intel_uncore ioatdma
ipmi_devintf ipmi_msghandler ipmi_si iptable_filter ip_tables
irqbypass iTCO_vendor_support iTCO_wdt ixgbe jbd2 kvm kvm_intel
libahci libata libcrc32c libphy llc lpc_ich mbcache mdio megaraid_sas
mei mei_me mgag200 mrp mxm_wmi nf_conntrack nf_defrag_ipv4
nf_defrag_ipv6 pcc_cpufreq pcspkr rp2 sb_edac scsi_mod sd_mod sg snd
snd_pcm snd_timer soundcore stp ttm usbcore wmi x86_pkg_temp_thermal
xfrm_algo x_tables xt_conntrack xt_tcpudp