[Kernel-packages] [Bug 1782716] Re: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout
Seeing the same on Ubuntu 18.04.3 with the HWE kernel, 5.0.0-25-generic, on an AMD A6-9225 RADEON R4. This appears to be tied to the problems resuming the laptop from suspend (black screen, flashing cursor).

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1782716

Title: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout

Status in linux package in Ubuntu: Confirmed

Bug description:
Running the 4.17.0-5-generic kernel on a ppc64le machine with a Radeon R9 Fury GPU

0033:01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Fiji [Radeon R9 FURY / NANO Series] (rev ff)

[ 2361.958847] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=8777, last emitted seq=8778
[ 2362.080397] EEH: Frozen PHB#33-PE#0 detected
[ 2362.080470] EEH: PE location: CPU2 Slot1 (16x), PHB location: N/A
[ 2362.080568] CPU: 53 PID: 874 Comm: kworker/53:1 Not tainted 4.17.0-5-generic #6-Ubuntu
[ 2362.080575] Workqueue: events drm_sched_job_timedout [gpu_sched]
[ 2362.080577] Call Trace:
[ 2362.080584] [c000fb7078f0] [c0d275ac] dump_stack+0xb0/0xf4 (unreliable)
[ 2362.080590] [c000fb707930] [c003ba0c] eeh_dev_check_failure+0x5bc/0x5e0
[ 2362.080593] [c000fb7079e0] [c003babc] eeh_check_failure+0x8c/0xd0
[ 2362.080628] [c000fb707a20] [c0080cfa1b88] amdgpu_mm_rreg+0x280/0x2a0 [amdgpu]
[ 2362.080676] [c000fb707a70] [c0080d04cf68] gmc_v8_0_check_soft_reset+0x30/0xe0 [amdgpu]
[ 2362.080711] [c000fb707aa0] [c0080cfa1194] amdgpu_device_ip_check_soft_reset.part.1+0x8c/0x140 [amdgpu]
[ 2362.080745] [c000fb707b30] [c0080cfa649c] amdgpu_device_gpu_recover+0x854/0xa40 [amdgpu]
[ 2362.080799] [c000fb707c00] [c0080d0b97a4] amdgpu_job_timedout+0x5c/0x80 [amdgpu]
[ 2362.080805] [c000fb707c70] [c0080c8f0040] drm_sched_job_timedout+0x38/0x60 [gpu_sched]
[ 2362.080810] [c000fb707c90] [c0137928] process_one_work+0x298/0x580
[ 2362.080813] [c000fb707d20] [c0137c98] worker_thread+0x88/0x610
[ 2362.080817] [c000fb707dc0] [c0140958] kthread+0x1a8/0x1b0
[ 2362.080822] [c000fb707e30] [c000b658] ret_from_kernel_thread+0x5c/0x84
[ 2362.080827] [drm] IP block:gmc_v8_0 is hung!
[ 2362.080832] [drm] IP block:tonga_ih is hung!
[ 2362.080843] [drm] IP block:gfx_v8_0 is hung!
[ 2362.080845] EEH: Detected PCI bus error on PHB#33-PE#0
[ 2362.080847] EEH: This PCI device has failed 1 times in the last hour
[ 2362.080849] EEH: Notify device drivers to shutdown
[ 2362.080850] [drm] IP block:sdma_v3_0 is hung!
[ 2362.080856] [drm] IP block:uvd_v6_0 is hung!
[ 2362.080858] EEH: Collect temporary log
[ 2362.080866] [drm] IP block:vce_v3_0 is hung!
[ 2362.080867] [drm] GPU recovery disabled.
[ 2362.080903] EEH: of node=0033:01:00.1
[ 2362.080905] EEH: PCI device/vendor:
[ 2362.080907] EEH: PCI cmd/status register:
[ 2362.080908] EEH: PCI-E capabilities and status follow:
[ 2362.080915] EEH: PCI-E 00:
[ 2362.080920] EEH: PCI-E 10:
[ 2362.080921] EEH: PCI-E 20:
[ 2362.080922] EEH: PCI-E AER capability register set follows:
[ 2362.080928] EEH: PCI-E AER 00:
[ 2362.080933] EEH: PCI-E AER 10:
[ 2362.080938] EEH: PCI-E AER 20:
[ 2362.080940] EEH: PCI-E AER 30:
[ 2362.080941] EEH: of node=0033:01:00.0
[ 2362.080943] EEH: PCI device/vendor:
[ 2362.080945] EEH: PCI cmd/status register:
[ 2362.080945] EEH: PCI-E capabilities and status follow:
[ 2362.080951] EEH: PCI-E 00:
[ 2362.080956] EEH: PCI-E 10:
[ 2362.080957] EEH: PCI-E 20:
[ 2362.080958] EEH: PCI-E AER capability register set follows:
[ 2362.080964] EEH: PCI-E AER 00:
[ 2362.080969] EEH: PCI-E AER 10:
[ 2362.080974] EEH: PCI-E AER 20:
[ 2362.080975] EEH: PCI-E AER 30:
[ 2362.080977] PHB4 PHB#51 Diag-data (Version: 1)
[ 2362.080978] brdgCtl:0002
[ 2362.080979] RootSts:00060020 00402000 c1010008 00100107
[ 2362.080980] RootErrSts: 0020
[ 2362.080981] PhbSts: 001c 001c
[ 2362.080982] Lem:0001 0001
[ 2362.080983] PhbErr: 00c0 0080 214898000240 a0084000
[ 2362.080984] RegbErr:0090 0010 483c 0200
[ 2362.080985] PE[000] A/B:
[Kernel-packages] [Bug 1790652] Re: Oracle cosmic image does not find broadcom network device in Shape VMStandard2.1
I'm confused. Do you need verification or not? Cosmic is not specifically supported on our platform, and there are no plans at the moment to support non-LTS releases that I know of. I can certainly test this if needs be, though.

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1790652

Title: Oracle cosmic image does not find broadcom network device in Shape VMStandard2.1

Status in linux package in Ubuntu: Fix Released
Status in linux source package in Cosmic: Fix Released

Bug description:
I tried to register and boot a cosmic image to verify new changes in it and in cloud-init. The image failed to bring up networking in the initramfs, and thus failed to find iscsi root.

this could be user error.

Here is what I did to publish the image.
- use oci build tool [1]. following https://docs.cloud.oracle.com/iaas/Content/Compute/Tasks/imageimportexport.htm#ImportinganImage
- Download a livefs build from cloudware https://launchpad.net/~cloudware/+livefs/ubuntu/cosmic/cpc/
  example: livecd.ubuntu-cpc.oracle_bare_metal.img
  My image had version 20180821.1
- oci os bucket create --name=smoser-devel
- oci os object put \
    --parallel-upload-count=4 \
    --part-size=10 \
    --bucket-name=smoser-devel \
    --file=/tmp/livecd.ubuntu-cpc.oracle_bare_metal.img \
    --name=cosmic-20180821.1.img
- import the object
  $ oci compute image import from-object \
    --display-name=smoser-cosmic-20180821.1.img \
    --launch-mode=NATIVE \
    --namespace=intcanonical \
    --bucket-name=smoser-devel \
    --name=cosmic-20180821.1.img \
    --source-image-type=QCOW2

Then I launched from the web UI a VM.Standard2.1.
-- https://docs.cloud.oracle.com/iaas/Content/API/Concepts/cliconcepts.htm To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1790652/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1790652] Re: Oracle cosmic image does not find broadcom network device in Shape VMStandard2.1
Patch submitted to netdev: https://marc.info/?l=linux-netdev&m=153695411427176&w=2

--
https://bugs.launchpad.net/bugs/1790652
Status in linux package in Ubuntu: Triaged
Status in linux source package in Cosmic: Triaged
[Kernel-packages] [Bug 1790652] Re: Oracle cosmic image does not find broadcom network device in Shape VMStandard2.1
I've been able to replicate the situation with a few different distributions. It seems to only occur with VMs. When I tried 4.18.7 on a bare metal instance, there was no problem. We believe we've isolated the kernel commit that is introducing the problem to 707e7e96602675beb5e09bb994195663da6eb56d

--
https://bugs.launchpad.net/bugs/1790652
Status in linux package in Ubuntu: Triaged
Status in linux source package in Cosmic: Triaged
[Kernel-packages] [Bug 1790652] Re: Oracle cosmic image does not find broadcom network device in Shape VMStandard2.1
I haven't specifically seen that one, but I'll check in with both the Oracle Linux team and our Hypervisor team.

--
https://bugs.launchpad.net/bugs/1790652
Status in linux package in Ubuntu: Triaged
Status in linux source package in Cosmic: Triaged
[Kernel-packages] [Bug 1652348] Re: initrd dhcp fails / ignores valid response
I took a step back from doing bisecting and focussed on creating a replication scenario, which I've done successfully. ipconfig is struggling to handle things when two interfaces are present and sending out DHCP requests, even if one interface doesn't get a response.

Here's what I've done:

Using virt-manager I created a bridge, bridge1, with no IP range associated with it (I want dnsmasq on a host to handle IP). I created a second, bridge2, likewise with no IP range associated with it ready for later use.

$$$

I created an instance, named primary, with two NICs, one doing the usual NAT stuff so it has internet access. One hooked up to bridge1. I gave it two storage devices, 1 (sda) at 15Gb in size to act as local storage, 1 (sdb) 40Gb in size to be hosted over iSCSI (in hindsight, no reason for it not to be 15Gb too).

Install Ubuntu 16.04.1 LTS on the primary instance, pretty much following through with defaults, but leaving the second hard drive unused. Reboot and bring up the instance. In my case I end up with ens3 being the NATing interface, ens9 being hooked up to the bridge interface.

##
sudo apt update
sudo apt upgrade
##

Add to /etc/network/interfaces:

auto ens9
iface ens9 inet static
address 192.168.0.1/24

##
Then:
sudo apt install open-iscsi targetcli dnsmasq
##

dnsmasq config:

log-queries
log-dhcp
interface=ens9
dhcp-range=192.168.0.50,192.168.0.150,12h
dhcp-boot=script.ipxe
enable-tftp
tftp-root=/tftpd
tftp-no-fail

##
Then run targetcli and do the following commands:

backstores/iblock create uefi /dev/sdb
/iscsi create iqn.2015-02.oracle.boot:uefi
cd iqn.2015-02.oracle.boot:uefi/tpg1
luns/ create /backstores/block/uefi
portals/ create 0.0.0.0
set attribute authentication=0 demo_mode_write_protect=0 generate_node_acls=1 cache_dynamic_acls=1
exit
##

sudo mkdir /tftpd
sudo chown dnsmasq: /tftpd

##
/tftpd/script.ipxe:

#!ipxe
set initiator-iqn iqn.2015-02.oracle.boot:uefi
sanboot iscsi:192.168.0.1iqn.2015-02.oracle.boot:uefi
##

This gets the host pretty much ready to be an iscsi target for a host. The host has been patched etc, so reboot. You may want to set up ip forwarding etc on this instance.

$$$

Second host: No storage. Attach Ubuntu 16.04.1 LTS iso to the instance to boot from initially. Two NICs, first attached to bridge1. Second attached to bridge2.

Go through the installation procedure, logging in to the iscsi endpoint on 192.168.0.1, using the details above (no username/password necessary with this configuration) and install to the iSCSI target. At the end, detach the CD-ROM and ensure everything is set up to network boot.

On start-up you should see it network boot happily, everything is awesome. Do a "sudo apt update" and "sudo apt upgrade". Then reboot.

On start-up you should see the bug happening. ipconfig is sending out DHCP requests on both interfaces and failing to accept any responses it is being sent ("journalctl -xef -u dnsmasq" on primary shows it is sending them). If you remove that second NIC, you'll see that the instance is able to boot happily.
-- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1652348 Title: initrd dhcp fails / ignores valid response Status in linux package in Ubuntu: Incomplete Bug description: Between kernel versions 4.4.0-53 and 4.4.0-57 a bug has been (re?)introduced that is breaking dhcp booting in the initrd environment. This is stopping instances that use iscsi storage from being able to connect. Over serial console it outputs: IP-Config: no response after 2 secs - giving up IP-Config: ens2f0 hardware address 90:e2:ba:d1:36:38 mtu 1500 DHCP RARP IP-Config: ens2f1 hardware address 90:e2:ba:d1:36:39 mtu 1500 DHCP RARP IP-Config: no response after 3 secs - giving up with increasing delays until it fails. At which point a simple ipconfig -t dhcp -d "ens2f0" works. The console output is slightly garbled but should give you an idea: (initramfs) ipconfig -t dhcp -[ 728.379793] ixgbe :13:00.0 ens2f0: changing MTU from 1500 to 9000 d "ens2f0" IP-Config: ens2f0 hardware address 90:e2:ba:d1:36:38 mtu 1500 DHCP RARP IP-Config: ens2f0 guessed broadcast address 10.0.1.255 IP-Config: ens2f0 complete (dhcp from 169.254.169.254): addres[ 728.980448] ixgbe :13:00.0 ens2f0: detected SFP+: 3 s: 10.0.1.56broadcast: 10.0.1.255 netmask: 255.255.255.0 gateway: 10.0.1.1 [ 729.148410] ixgbe :13:00.0 ens2f0: NIC Link is Up 10 Gbps, Flow Control: RX/TX dns0 : 169.254.169.254 dns1 : 0.0.0.0
[Kernel-packages] [Bug 1652348] Re: initrd dhcp fails / ignores valid response
I'm continuing to bisect the mainline linux kernel, and also trying to see if I can create a straightforward reproducible example.

First focus on bisecting was between 4.5 and 4.6, to figure out what changed to suddenly have ipconfig working. I've tracked it down to this using bisect, and validated it afterwards:

commit 689de1d6ca95b3b5bd8ee446863bf81a4883ea25
Author: Linus Torvalds
Date: Mon May 2 12:46:42 2016 -0700

Minimal fix-up of bad hashing behavior of hash_64()

This is a fairly minimal fixup to the horribly bad behavior of hash_64() with certain input patterns.

In particular, because the multiplicative value used for the 64-bit hash was intentionally bit-sparse (so that the multiply could be done with shifts and adds on architectures without hardware multipliers), some bits did not get spread out very much. In particular, certain fairly common bit ranges in the input (roughly bits 12-20: commonly with the most information in them when you hash things like byte offsets in files or memory that have block factors that mean that the low bits are often zero) would not necessarily show up much in the result.

There's a bigger patch-series brewing to fix up things more completely, but this is the fairly minimal fix for the 64-bit hashing problem. It simply picks a much better constant multiplier, spreading the bits out a lot better.

NOTE! For 32-bit architectures, the bad old hash_64() remains the same for now, since 64-bit multiplies are expensive. The bigger hashing cleanup will replace the 32-bit case with something better.

The new constants were picked by George Spelvin who wrote that bigger cleanup series. I just picked out the constants and part of the comment from that series.

Cc: sta...@vger.kernel.org
Cc: George Spelvin
Cc: Thomas Gleixner
Signed-off-by: Linus Torvalds

Next up is tracking down what changed between 4.7 and 4.8.

-- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1652348 Title: initrd dhcp fails / ignores valid response Status in linux package in Ubuntu: Incomplete Bug description: Between kernel versions 4.4.0-53 and 4.4.0-57 a bug has been (re?)introduced that is breaking dhcp booting in the initrd environment. This is stopping instances that use iscsi storage from being able to connect. Over serial console it outputs: IP-Config: no response after 2 secs - giving up IP-Config: ens2f0 hardware address 90:e2:ba:d1:36:38 mtu 1500 DHCP RARP IP-Config: ens2f1 hardware address 90:e2:ba:d1:36:39 mtu 1500 DHCP RARP IP-Config: no response after 3 secs - giving up with increasing delays until it fails. At which point a simple ipconfig -t dhcp -d "ens2f0" works. The console output is slightly garbled but should give you an idea: (initramfs) ipconfig -t dhcp -[ 728.379793] ixgbe :13:00.0 ens2f0: changing MTU from 1500 to 9000 d "ens2f0" IP-Config: ens2f0 hardware address 90:e2:ba:d1:36:38 mtu 1500 DHCP RARP IP-Config: ens2f0 guessed broadcast address 10.0.1.255 IP-Config: ens2f0 complete (dhcp from 169.254.169.254): addres[ 728.980448] ixgbe :13:00.0 ens2f0: detected SFP+: 3 s: 10.0.1.56broadcast: 10.0.1.255 netmask: 255.255.255.0 gateway: 10.0.1.1 [ 729.148410] ixgbe :13:00.0 ens2f0: NIC Link is Up 10 Gbps, Flow Control: RX/TX dns0 : 169.254.169.254 dns1 : 0.0.0.0 rootserver: 169.254.169.254 rootpath: filename : /ipxe.efi tcpdumps show that dhcp requests are being received from the host, and responses sent, but not accepted by the host. When the ipconfig command is issued manually, an identical dhcp request and response happens, only this time it is accepted. It doesn't appear to be that the messages are being sent and received incorrectly, just silently ignored by ipconfig. I was seeing this behaviour earlier this year, which I was able to fix by specifying "ip=dhcp" as a kernel parameter. 
About a month ago that was identified as causing us other problems (long story) and we dropped it, at which point we discovered the original bug was no longer an issue. Putting "ip=dhcp" back on with this kernel no longer fixes the problem. I've compared the two initrds and effectively the only thing that has changed between the two is the kernel components. Ubuntu kernel bisect offending commit: # first bad commit: [fd4b5fa6e3487d15ede746f92601af008b2abbc0] mnt: Add a per mount namespace limit on the number of mounts Ubuntu kernel bisect offending commit submission: https://lkml.org/lkml/2016/10/5/308 To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1652348/+subscriptions
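To make the hash_64() commit message quoted in this mail a little more concrete, here is a minimal Python sketch of a multiplicative hash applied to page-aligned offsets (low 12 bits zero, information in bits 12 and up). The two multiplier values are what I recall from the kernel's include/linux/hash.h before and after that commit; treat them, and the helper names `hash_64` and `distinct_buckets` as used here, as assumptions for illustration rather than a copy of the kernel code.

```python
MASK64 = (1 << 64) - 1

# Assumed values, recalled from include/linux/hash.h; verify before relying on them.
OLD_MULTIPLIER = 0x9E37FFFFFFFC0001  # bit-sparse constant used before the commit
NEW_MULTIPLIER = 0x61C8864680B583EB  # constant introduced by the commit


def hash_64(val: int, bits: int, multiplier: int) -> int:
    """Multiplicative hash: keep the top `bits` bits of the 64-bit product."""
    return ((val * multiplier) & MASK64) >> (64 - bits)


def distinct_buckets(multiplier: int, bits: int = 8, count: int = 256) -> int:
    """Hash `count` page-aligned offsets and count how many buckets get used."""
    return len({hash_64(k << 12, bits, multiplier) for k in range(count)})


if __name__ == "__main__":
    print("old multiplier:", distinct_buckets(OLD_MULTIPLIER), "buckets used")
    print("new multiplier:", distinct_buckets(NEW_MULTIPLIER), "buckets used")
```

Under these assumptions the old multiplier should land the 256 inputs in far fewer buckets than the new one, which is the "bits did not get spread out very much" behaviour the commit describes. Why that would flip ipconfig between working and not working is not something this sketch explains.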
[Kernel-packages] [Bug 1652348] Re: initrd dhcp fails / ignores valid response
I've tried every version in the v4 series, and a few in v3. None prior to (and including) v4.0.0 will boot, none output anything on the screen to give me a clue why they're not booting.

So far:
v4.0 = won't boot
v4.1 = ipconfig bug
v4.2 = ipconfig bug
v4.3 = ipconfig bug
v4.4 = ipconfig bug
v4.5 = ipconfig bug
v4.6 = Boots
v4.7 = Boots
v4.8 = ipconfig bug
v4.9 = ipconfig bug
v4.10 = ipconfig bug

I'm getting seriously concerned that "working" is actually the aberration. It's working in just two out of ten releases.

I do have two things I should probably bisect there: 1) what changed between 4.5 and 4.6, and 2) what changed between 4.7 and 4.8.

--
https://bugs.launchpad.net/bugs/1652348
Status in linux package in Ubuntu: Incomplete
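The version table in this mail flips state twice, so a single bisect cannot cover it: each bisect run needs exactly one good/bad boundary. A small sketch of how the two ranges fall out of the table (the `RESULTS` list is copied from the mail with v4.0 left out because it never booted; the function name `transitions` is just illustrative):

```python
# Results as reported in the mail above; v4.0 omitted (it did not boot at all).
RESULTS = [
    ("v4.1", "bug"), ("v4.2", "bug"), ("v4.3", "bug"), ("v4.4", "bug"),
    ("v4.5", "bug"), ("v4.6", "boots"), ("v4.7", "boots"),
    ("v4.8", "bug"), ("v4.9", "bug"), ("v4.10", "bug"),
]


def transitions(results):
    """Return (before, after) version pairs where the observed state changes."""
    return [
        (prev_version, next_version)
        for (prev_version, prev_state), (next_version, next_state)
        in zip(results, results[1:])
        if prev_state != next_state
    ]


if __name__ == "__main__":
    for before, after in transitions(RESULTS):
        print(f"bisect between {before} and {after}")
```

Each pair is then a separate bisect: for v4.5 to v4.6 the "bad" and "good" labels have to be swapped, since the search there is for the commit that made things work.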
[Kernel-packages] [Bug 1652348] Re: initrd dhcp fails / ignores valid response
The more I look at this, the more I'm convinced *most* of the real problem lies in that ipconfig tool. Yes, various kernel changes seem to make it alternate between working and not working under the circumstances (which is bizarre), but unless something is specifically interfering with the inter-process communication, ipconfig appears to be ignoring valid dhcp responses, just based on whether you tell it "all" interfaces vs telling it a specific interface.

A small modification could be made to the initramfs-tools to have it iterate over the interfaces in the system one at a time. It would marginally slow down the boot should the relevant interface not be the first, but it would get rid of this bug entirely.

Or the initrd environment could be modified to use dhclient instead of ipconfig (dhclient appears to be in the initrd, and works perfectly fine when called in a generic fashion, though the other initramfs-tools scripts seem aware ipconfig didn't complete successfully, which I haven't looked into).

--
https://bugs.launchpad.net/bugs/1652348
Status in linux package in Ubuntu: Incomplete
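The "one interface at a time" idea from this mail can be sketched as follows. This is only the control flow, written in Python for readability; the real change would live in the initramfs-tools shell scripts. The names `configure_first_responding` and `try_dhcp` are hypothetical, and `try_dhcp` stands in for whatever runs the DHCP client against a single named interface and reports success.

```python
from typing import Callable, Iterable, Optional


def configure_first_responding(
    interfaces: Iterable[str],
    try_dhcp: Callable[[str], bool],
) -> Optional[str]:
    """Try DHCP on each interface in turn; stop at the first that succeeds.

    `try_dhcp` is a placeholder for a per-interface DHCP attempt.
    Returns the configured interface name, or None if none responded.
    """
    for name in interfaces:
        if try_dhcp(name):
            return name
    return None


if __name__ == "__main__":
    # Stub standing in for a real DHCP attempt: only ens2f0 gets an answer.
    def fake_dhcp(name: str) -> bool:
        return name == "ens2f0"

    print(configure_first_responding(["ens2f1", "ens2f0"], fake_dhcp))
```

The cost mentioned in the mail is visible here: every interface ahead of the responding one has to time out before the next is tried.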
[Kernel-packages] [Bug 1652348] Re: initrd dhcp fails / ignores valid response
My apologies for any lack of clarity. I tested against the head of ubuntu-xenial, reverting just that commit and it fixed it. I tested against the head of the mainstream kernel and it didn't (last night I tried 4.9, 4.8, 4.5, 4.4, 4.2 tags of the mainstream kernel and in every place I find the general bug in effect). I'll try some larger leaps and see if I can track it down elsewhere.

--
https://bugs.launchpad.net/bugs/1652348
Status in linux package in Ubuntu: Incomplete
[Kernel-packages] [Bug 1652348] Re: initrd dhcp fails / ignores valid response
I tried reverting that specific commit from upstream, but that didn't resolve the issue. Time for a new round of bisecting the kernel, this time using mainline.

Status in linux package in Ubuntu: Triaged
[Kernel-packages] [Bug 1652348] Re: initrd dhcp fails / ignores valid response
This seems to make no sense to me, as a layman anyway. I checked out the 4.4.0-58.79 tag, reverted that one commit and confirmed I have a booting 4.4.0-58-generic that'll happily DHCP in the initrd environment on multiple boots. It really does seem like, somehow, that commit is the source of the problems. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1652348 Title: initrd dhcp fails / ignores valid response Status in linux package in Ubuntu: Incomplete Bug description: Between kernel versions 4.4.0-53 and 4.4.0-57 a bug has been (re?)introduced that is breaking dhcp booting in the initrd environment. This is stopping instances that use iscsi storage from being able to connect. Over serial console it outputs: IP-Config: no response after 2 secs - giving up IP-Config: ens2f0 hardware address 90:e2:ba:d1:36:38 mtu 1500 DHCP RARP IP-Config: ens2f1 hardware address 90:e2:ba:d1:36:39 mtu 1500 DHCP RARP IP-Config: no response after 3 secs - giving up with increasing delays until it fails. At which point a simple ipconfig -t dhcp -d "ens2f0" works. The console output is slightly garbled but should give you an idea: (initramfs) ipconfig -t dhcp -[ 728.379793] ixgbe :13:00.0 ens2f0: changing MTU from 1500 to 9000 d "ens2f0" IP-Config: ens2f0 hardware address 90:e2:ba:d1:36:38 mtu 1500 DHCP RARP IP-Config: ens2f0 guessed broadcast address 10.0.1.255 IP-Config: ens2f0 complete (dhcp from 169.254.169.254): addres[ 728.980448] ixgbe :13:00.0 ens2f0: detected SFP+: 3 s: 10.0.1.56broadcast: 10.0.1.255 netmask: 255.255.255.0 gateway: 10.0.1.1 [ 729.148410] ixgbe :13:00.0 ens2f0: NIC Link is Up 10 Gbps, Flow Control: RX/TX dns0 : 169.254.169.254 dns1 : 0.0.0.0 rootserver: 169.254.169.254 rootpath: filename : /ipxe.efi tcpdumps show that dhcp requests are being received from the host, and responses sent, but not accepted by the host. 
When the ipconfig command is issued manually, an identical dhcp request and response happens, only this time it is accepted. It doesn't appear to be that the messages are being sent and received incorrectly, just silently ignored by ipconfig. I was seeing this behaviour earlier this year, which I was able to fix by specifying "ip=dhcp" as a kernel parameter. About a month ago that was identified as causing us other problems (long story) and we dropped it, at which point we discovered the original bug was no longer an issue. Putting "ip=dhcp" back on with this kernel no longer fixes the problem. I've compared the two initrds and effectively the only thing that has changed between the two is the kernel components. I'm going to try and track back through kernel versions to see if I can find which version the fix happened in to maybe provide some additional context. I'll also attach copies of the initrds, packet captures etc. To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1652348/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1652348] Re: initrd dhcp fails / ignores valid response
I bisected again, and again it came back to that mount point change. This seems so bizarre.

$ git bisect log
# bad: [6d4f0a79e5a307b6fd3ee3cc5bbb2fcb701b09db] UBUNTU: Ubuntu-4.4.0-57.78
# good: [db5f146d309e70067dae57798c9ea679af835aa7] UBUNTU: Ubuntu-4.4.0-53.74
git bisect start 'Ubuntu-4.4.0-57.78' 'Ubuntu-4.4.0-53.74'
# bad: [02bf412367b827aa5be05a315088ef5fdcf267ca] dmaengine: at_xdmac: fix spurious flag status for mem2mem transfers
git bisect bad 02bf412367b827aa5be05a315088ef5fdcf267ca
# bad: [1e089050b800ba7d6ba1bf5814827e6cca301ad5] smc91x: avoid self-comparison warning
git bisect bad 1e089050b800ba7d6ba1bf5814827e6cca301ad5
# bad: [d7632bdaba3dd143eac3c80bb7e2b0f62259583d] xhci: use default USB_RESUME_TIMEOUT when resuming ports.
git bisect bad d7632bdaba3dd143eac3c80bb7e2b0f62259583d
# bad: [7942010de9a2fe39e72b84e628867f4ff29a70f2] libxfs: clean up _calc_dquots_per_chunk
git bisect bad 7942010de9a2fe39e72b84e628867f4ff29a70f2
# good: [9d2524b0bdeb57f80d0279f6695a833606ad0597] UBUNTU: SAUCE: Bluetooth: decrease refcount after use
git bisect good 9d2524b0bdeb57f80d0279f6695a833606ad0597
# bad: [fd4b5fa6e3487d15ede746f92601af008b2abbc0] mnt: Add a per mount namespace limit on the number of mounts
git bisect bad fd4b5fa6e3487d15ede746f92601af008b2abbc0
# good: [f2109fe47ceb77647ef7d4f545efeba43d06fb64] videobuf2-v4l2: Verify planes array in buffer dequeueing
git bisect good f2109fe47ceb77647ef7d4f545efeba43d06fb64
# good: [d5d9494d2092a7e571dee635ca254075912355c1] thinkpad_acpi: Add support for HKEY version 0x200
git bisect good d5d9494d2092a7e571dee635ca254075912355c1
# first bad commit: [fd4b5fa6e3487d15ede746f92601af008b2abbc0] mnt: Add a per mount namespace limit on the number of mounts
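When comparing two bisect runs like the ones in this thread, it can help to reduce each log to its verdicts and its final answer. The following is only a rough sketch: it assumes the log follows the comment-line layout quoted in this bug ("# bad: [sha] subject", "# first bad commit: [sha] subject"), the function name summarize_bisect_log is made up for illustration, and SAMPLE_LOG is an abbreviated excerpt of the log above rather than the full run.

```python
import re

# Comment lines in the quoted "git bisect log" output look like:
#   # bad: [<sha>] <subject>
#   # first bad commit: [<sha>] <subject>
VERDICT_RE = re.compile(
    r"^# (good|bad|skip|first bad commit): \[([0-9a-f]{7,40})\] (.*)$"
)


def summarize_bisect_log(text):
    """Return (verdicts, first_bad) parsed from 'git bisect log' text.

    Hypothetical helper: verdicts is a list of (kind, sha, subject)
    tuples, first_bad is (sha, subject) or None if the run never
    reached a conclusion.
    """
    verdicts = []
    first_bad = None
    for raw in text.splitlines():
        match = VERDICT_RE.match(raw.strip())
        if match is None:
            continue  # ignores the "git bisect ..." replay commands
        kind, sha, subject = match.groups()
        if kind == "first bad commit":
            first_bad = (sha, subject)
        else:
            verdicts.append((kind, sha, subject))
    return verdicts, first_bad


# Abbreviated excerpt of the log quoted in this thread.
SAMPLE_LOG = """\
# bad: [6d4f0a79e5a307b6fd3ee3cc5bbb2fcb701b09db] UBUNTU: Ubuntu-4.4.0-57.78
# good: [db5f146d309e70067dae57798c9ea679af835aa7] UBUNTU: Ubuntu-4.4.0-53.74
git bisect start 'Ubuntu-4.4.0-57.78' 'Ubuntu-4.4.0-53.74'
# bad: [fd4b5fa6e3487d15ede746f92601af008b2abbc0] mnt: Add a per mount namespace limit on the number of mounts
git bisect bad fd4b5fa6e3487d15ede746f92601af008b2abbc0
# good: [d5d9494d2092a7e571dee635ca254075912355c1] thinkpad_acpi: Add support for HKEY version 0x200
git bisect good d5d9494d2092a7e571dee635ca254075912355c1
# first bad commit: [fd4b5fa6e3487d15ede746f92601af008b2abbc0] mnt: Add a per mount namespace limit on the number of mounts
"""

verdicts, first_bad = summarize_bisect_log(SAMPLE_LOG)
print(len(verdicts), "verdicts; first bad commit:", first_bad[0][:12])
```

Running the same function over two separate logs makes it easy to check whether both runs converged on the same commit.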
[Kernel-packages] [Bug 1652348] Re: initrd dhcp fails / ignores valid response
I see where I messed up.. I'll try the bisect again.
[Kernel-packages] [Bug 1652348] Re: initrd dhcp fails / ignores valid response
Okay... I can't help but think I made a mistake somewhere in the bisecting process, but it seems to have isolated fd4b5fa6e3487d15ede746f92601af008b2abbc0 as the bad commit:

$ git bisect log
# bad: [6d4f0a79e5a307b6fd3ee3cc5bbb2fcb701b09db] UBUNTU: Ubuntu-4.4.0-57.78
# good: [40a98f0e91bcc062babd017732cbf7cb20cf39fd] UBUNTU: Ubuntu-4.4.0-51.72
git bisect start 'Ubuntu-4.4.0-57.78' 'Ubuntu-4.4.0-51.72'
# bad: [cd29d2303e86529c089b1c292480c05e7a24bd16] drm/i915: Respect alternate_ddc_pin for all DDI ports
git bisect bad cd29d2303e86529c089b1c292480c05e7a24bd16
# bad: [617dec606ff9e43e64a06daef83e17da0035340a] drm/exynos: fix error handling in exynos_drm_subdrv_open
git bisect bad 617dec606ff9e43e64a06daef83e17da0035340a
# bad: [0dbd2050197ea4dd59f8957b72981cb7d2cfab1c] usb: gadget: function: u_ether: don't starve tx request queue
git bisect bad 0dbd2050197ea4dd59f8957b72981cb7d2cfab1c
# bad: [f3f9de1bd9a63b633946226ba23392ad44e2badf] i2c: core: fix NULL pointer dereference under race condition
git bisect bad f3f9de1bd9a63b633946226ba23392ad44e2badf
# good: [a0678a6643bf688bccce3c298a4a110af10988fc] ipv6: correctly add local routes when lo goes up
git bisect good a0678a6643bf688bccce3c298a4a110af10988fc
# good: [a0ae41d8ee0549161174a39d60f7316b67a87cae] Bluetooth: btusb: Add support for 0cf3:e009
git bisect good a0ae41d8ee0549161174a39d60f7316b67a87cae
# good: [d5d9494d2092a7e571dee635ca254075912355c1] thinkpad_acpi: Add support for HKEY version 0x200
git bisect good d5d9494d2092a7e571dee635ca254075912355c1
# bad: [a6e674fa25854a7dafc59555d508855ea8fe3eaa] i2c: xgene: Avoid dma_buffer overrun
git bisect bad a6e674fa25854a7dafc59555d508855ea8fe3eaa
# bad: [fd4b5fa6e3487d15ede746f92601af008b2abbc0] mnt: Add a per mount namespace limit on the number of mounts
git bisect bad fd4b5fa6e3487d15ede746f92601af008b2abbc0
# first bad commit: [fd4b5fa6e3487d15ede746f92601af008b2abbc0] mnt: Add a per mount namespace limit on the number of mounts

From a layman's perspective, it doesn't seem like that could possibly cause the bug. I guess one quick way forward, rather than repeat the whole bisecting process, is to completely reset the repository, bring it up to date, verify the bug still exists, and then revert this specific commit and see if the bug goes away.
[Kernel-packages] [Bug 1652348] Re: initrd dhcp fails / ignores valid response
I'll take a fresh look in the morning, but ran into this:

make[1]: Leaving directory '/home/ubuntu/storage/ubuntu-xenial/debian/build/build-generic/zfs/module'
Debug: module-check-generic
install -d /home/ubuntu/storage/ubuntu-xenial/debian.master/abi/4.4.0-54.76/amd64
find /home/ubuntu/storage/ubuntu-xenial/debian/build/build-generic/ -name \*.ko | \
  sed -e 's/.*\/\([^\/]*\)\.ko/\1/' | sort > /home/ubuntu/storage/ubuntu-xenial/debian.master/abi/4.4.0-54.76/amd64/generic.modules
II: Checking modules for generic...previous or current modules file missing!
   /home/ubuntu/storage/ubuntu-xenial/debian.master/abi/4.4.0-54.76/amd64/generic.modules
   /home/ubuntu/storage/ubuntu-xenial/debian.master/abi/4.4.0-54.75/amd64/generic.modules
debian/rules.d/4-checks.mk:12: recipe for target 'module-check-generic' failed
make: *** [module-check-generic] Error 1
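For anyone reading the module-check output above, the find/sed/sort pipeline only turns each built .ko path into a bare module name before the list is compared with the previous ABI's generic.modules. Below is a small sketch of that one step, reusing the sed expression from the log; the function name module_names and the example paths are made up for illustration, and Python's sorted() orders by code point, which may differ from sort(1) under some locales.

```python
import re

# Same expression as the sed call in the build log:
#   s/.*\/\([^\/]*\)\.ko/\1/
# i.e. drop everything up to the last "/" and the ".ko" suffix.
KO_RE = re.compile(r".*/([^/]*)\.ko")


def module_names(paths):
    """Return sorted module names for a list of .ko file paths."""
    # count=1 mirrors sed's default of replacing only the first match
    return sorted(KO_RE.sub(r"\1", path, count=1) for path in paths)


# Hypothetical paths, shortened from the build tree shown in the log.
example = [
    "/build/build-generic/zfs/module/zfs/zfs.ko",
    "/build/build-generic/drivers/net/ethernet/intel/ixgbe/ixgbe.ko",
]
print(module_names(example))
```

The failure in the log is about the comparison input, not this step: the check reports that the previous or current modules file is missing.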
[Kernel-packages] [Bug 1652348] Re: initrd dhcp fails / ignores valid response
I can give that a shot, following the instructions here: https://wiki.ubuntu.com/Kernel/KernelBisection#Bisecting_Ubuntu_kernel_versions
[Kernel-packages] [Bug 1652348] Re: initrd dhcp fails / ignores valid response
I should clarify, I know for certain that 4.4.0-51 is stable and reliable (and doesn't exhibit the bug). As part of our attempt to verify everything was correct with the installation we had a system run from Wednesday before Thanksgiving, all the way through to the following Monday, during which time it had an rc.local triggered reboot (so it had to be fully booted).
[Kernel-packages] [Bug 1652348] Re: initrd dhcp fails / ignores valid response
Okay.. this is interesting. It seems like the Ubuntu dev version of 4.10 is actually intermittently failing (?!) I guess the next thing to do here is keep rebooting on this version of the kernel and see how often the bug occurs vs doesn't occur, so I can get a feel for a reasonable number of times to reboot with each test kernel once I actually start bisecting. >From the dhcp server side I can't see anything different. The requests look the same. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1652348 Title: initrd dhcp fails / ignores valid response Status in linux package in Ubuntu: Incomplete Bug description: Between kernel versions 4.4.0-53 and 4.4.0-57 a bug has been (re?)introduced that is breaking dhcp booting in the initrd environment. This is stopping instances that use iscsi storage from being able to connect. Over serial console it outputs: IP-Config: no response after 2 secs - giving up IP-Config: ens2f0 hardware address 90:e2:ba:d1:36:38 mtu 1500 DHCP RARP IP-Config: ens2f1 hardware address 90:e2:ba:d1:36:39 mtu 1500 DHCP RARP IP-Config: no response after 3 secs - giving up with increasing delays until it fails. At which point a simple ipconfig -t dhcp -d "ens2f0" works. 
The console output is slightly garbled but should give you an idea: (initramfs) ipconfig -t dhcp -[ 728.379793] ixgbe :13:00.0 ens2f0: changing MTU from 1500 to 9000 d "ens2f0" IP-Config: ens2f0 hardware address 90:e2:ba:d1:36:38 mtu 1500 DHCP RARP IP-Config: ens2f0 guessed broadcast address 10.0.1.255 IP-Config: ens2f0 complete (dhcp from 169.254.169.254): addres[ 728.980448] ixgbe :13:00.0 ens2f0: detected SFP+: 3 s: 10.0.1.56broadcast: 10.0.1.255 netmask: 255.255.255.0 gateway: 10.0.1.1 [ 729.148410] ixgbe :13:00.0 ens2f0: NIC Link is Up 10 Gbps, Flow Control: RX/TX dns0 : 169.254.169.254 dns1 : 0.0.0.0 rootserver: 169.254.169.254 rootpath: filename : /ipxe.efi tcpdumps show that dhcp requests are being received from the host, and responses sent, but not accepted by the host. When the ipconfig command is issued manually, an identical dhcp request and response happens, only this time it is accepted. It doesn't appear to be that the messages are being sent and received incorrectly, just silently ignored by ipconfig. I was seeing this behaviour earlier this year, which I was able to fix by specifying "ip=dhcp" as a kernel parameter. About a month ago that was identified as causing us other problems (long story) and we dropped it, at which point we discovered the original bug was no longer an issue. Putting "ip=dhcp" back on with this kernel no longer fixes the problem. I've compared the two initrds and effectively the only thing that has changed between the two is the kernel components. I'm going to try and track back through kernel versions to see if I can find which version the fix happened in to maybe provide some additional context. I'll also attach copies of the initrds, packet captures etc. 
To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1652348/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
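The question raised in the message above, how many reboots per test kernel are enough before calling it good, can be estimated once a rough failure rate has been measured. The following is a minimal sketch, assuming each boot fails independently with a fixed probability on an affected kernel. That independence assumption, the function name `boots_needed`, and the example rates are illustrative and do not come from the bug report.

```python
import math


def boots_needed(failure_rate: float, confidence: float = 0.95) -> int:
    """Clean boots required before a test kernel can be called 'good'.

    If an affected kernel fails each boot independently with probability
    failure_rate, the chance that it survives n boots in a row is
    (1 - failure_rate) ** n. This returns the smallest n for which that
    chance is at most 1 - confidence.
    """
    if not 0.0 < failure_rate <= 1.0:
        raise ValueError("failure_rate must be in (0, 1]")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    if failure_rate == 1.0:
        # A kernel that always fails is exposed by a single boot.
        return 1
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


if __name__ == "__main__":
    # Hypothetical failure rates, purely for illustration.
    for rate in (0.5, 0.2):
        print(f"failure rate {rate:.0%}: {boots_needed(rate)} clean boots")
```

Under these assumptions, a bug that shows up on about half of all boots needs five clean boots per kernel for roughly 97% confidence, while one that shows up on one boot in five needs fourteen. A single failed boot is always enough to mark a kernel bad.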
[Kernel-packages] [Bug 1652348] Re: initrd dhcp fails / ignores valid response
Rolling that command against master fails too:

ubuntu@Beta:~/linux$ mainline-build-one afd2ff9b7e1b367172f18ba7f693dfb62bdcb2dc xenial
*** BUILDING: commit:afd2ff9b7e1b367172f18ba7f693dfb62bdcb2dc series:xenial abinum: ...
full_version<4.4.0> version<4.4.0> long abinum<040400>
fatal: 'xenial' does not appear to be a git repository
fatal: Could not read from remote repository.
Please make sure you have the correct access rights and the repository exists.
error: pathspec 'xenial/master' did not match any file(s) known to git.
Deleted branch BUILD.040400 (was 794249c).
Checking out files: 100% (33279/33279), done.
Switched to a new branch 'BUILD.040400'
vvv - build head
commit afd2ff9b7e1b367172f18ba7f693dfb62bdcb2dc
Author: Linus Torvalds
Date: Sun Jan 10 15:01:32 2016 -0800
Linux 4.4
^^^ - build head
fatal: invalid reference: xenial/master
fatal: invalid reference: xenial/master-next
fatal: invalid reference: xenial/master
fatal: invalid reference: xenial/master-next
On branch BUILD.040400
nothing to commit, working directory clean
*** checking /home/ubuntu/kteam-tools/mainline-build/adhoc/0001-DISABLE-comedi.patch (drivers/staging/comedi/drivers/das08_cs.c 47a4f33c4733880faa50f0e64a6e5c8f 79236ea0358db3c7a7a8a5f081c320b4) ...
md5sum: drivers/staging/ti-st/st_kim.c: No such file or directory
*** checking /home/ubuntu/kteam-tools/mainline-build/adhoc/0002-DISABLE-ti-st.patch (drivers/staging/ti-st/st_kim.c b41944e0c30683bdedb6a66e11098892 ) ...
md5sum: drivers/staging/hv/hv_mouse.c: No such file or directory
*** checking /home/ubuntu/kteam-tools/mainline-build/adhoc/0003-DISABLE-hyperv.patch (drivers/staging/hv/hv_mouse.c afd5524c29871a8293518f0be50a7474 ) ...
*** checking /home/ubuntu/kteam-tools/mainline-build/adhoc/0004-DISABLE-olpc.patch (drivers/staging/olpc_dcon/olpc_dcon_xo_1.c 13b325ae1aeee7f8602759057ed0d1f9 9d099e35d45e22f96c4d77694a5e6c58) ...
*** checking /home/ubuntu/kteam-tools/mainline-build/adhoc/0005-UBUNTU-olpc_dcon_xo_1-needs-delay.h.patch (drivers/staging/olpc_dcon/olpc_dcon_xo_1.c 6a0ae9f73f4878052202473bb952d6e4 9d099e35d45e22f96c4d77694a5e6c58) ...
*** checking /home/ubuntu/kteam-tools/mainline-build/adhoc/0006-UBUNTU-olpc_dcon_xo_1_5-needs-delay.h.patch (drivers/staging/olpc_dcon/olpc_dcon_xo_1_5.c 55c01b13d520fa0cdde88d8d3034f21c 37460a6a542aa92444e9114105621f18) ...
*** checking /home/ubuntu/kteam-tools/mainline-build/adhoc/0007-x86-idle-APM-requires-pm_idle-always-when-it-is-a-mo.patch (arch/x86/kernel/process.c 1ded15dd3a3cb622df182d60160ff826 73538a1ff57235e73e0342d9efa681f5) ...
md5sum: debian/rules.d/2-binary-arch.mk: No such file or directory
*** checking /home/ubuntu/kteam-tools/mainline-build/adhoc/0008-UBUNTU-packaging-do-not-fail-secure-copy-on-older-ke.patch (debian/rules.d/2-binary-arch.mk 647c141b53e037781844f0c04234526e ) ...
md5sum: arch/arm/mach-highbank/clock.c: No such file or directory
*** checking /home/ubuntu/kteam-tools/mainline-build/adhoc/0009-UBUNTU-SAUCE-highbank-export-clock-functions-for-mod.patch (arch/arm/mach-highbank/clock.c 119a926bf04eae5024a3002b626ef8bc ) ...
*** applying /home/ubuntu/kteam-tools/mainline-build/adhoc/any-0001-UBUNTU-SAUCE-add-vmlinux.strip-to-BOOT_TARGETS1-on-p.patch ...
Applying: UBUNTU: SAUCE: add vmlinux.strip to BOOT_TARGETS1 on powerpc
*** applying /home/ubuntu/kteam-tools/mainline-build/adhoc/any-0001-UBUNTU-SAUCE-tools-hv-lsvmbus-add-manual-page.patch ...
Applying: UBUNTU: SAUCE: tools/hv/lsvmbus -- add manual page
*** applying /home/ubuntu/kteam-tools/mainline-build/adhoc/yakkety-0001-disable-pie-when-gcc-has-it-enabled-by-default.patch ...
Applying: UBUNTU: SAUCE: (no-up) disable -pie when gcc has it enabled by default
fatal: Not a valid object name xenial/master-next:debian.master/changelog
dpkg-parsechangelog: warning:-(l0): found end of file where expected first heading
dpkg-parsechangelog: error: fatal error occurred while parsing -
fatal: Not a valid object name xenial/master:debian.master/changelog
dpkg-parsechangelog: warning:-(l0): found end of file where expected first heading
dpkg-parsechangelog: error: fatal error occurred while parsing -
/home/ubuntu/kteam-tools/mainline-build/mainline-build-one: line 291: debian/changelog.new: No such file or directory
mv: cannot stat 'debian/changelog.new': No such file or directory
On branch BUILD.040400
nothing to commit, working directory clean
*** using configs from Ubuntu-0 () ...
fatal: invalid reference: Ubuntu-0
fatal: invalid reference: xenial/
xenial-amd64: chroot not found (::,)
[Kernel-packages] [Bug 1652348] Re: initrd dhcp fails / ignores valid response
Gah.. okay https://wiki.ubuntu.com/KernelTeam/GitKernelBuild
[Kernel-packages] [Bug 1652348] Re: initrd dhcp fails / ignores valid response
Ahh, I see where the kteam tools stuff is supposed to come from. It's not clear if I'm supposed to go down that route and use the mainline-build-one script or not when trying to build the kernel in this case.

If I use the mainline-build-one tool:

$ mainline-build-one afd2ff9b7e1b367172f18ba7f693dfb62bdcb2dc xenial
*** BUILDING: commit:afd2ff9b7e1b367172f18ba7f693dfb62bdcb2dc series:xenial abinum: ...
full_version<4.4.0> version<4.4.0> long abinum<040400>
fatal: 'xenial' does not appear to be a git repository
fatal: Could not read from remote repository.
Please make sure you have the correct access rights and the repository exists.
error: pathspec 'xenial/master' did not match any file(s) known to git.
error: Cannot delete the branch 'BUILD.040400' which you are currently on.
fatal: A branch named 'BUILD.040400' already exists.

The only way this tool works with that syntax is to switch to the master branch, and run it from there. I'm not sure how that's supposed to work with git bisect, given bisect is setting your checked out position.
[Kernel-packages] [Bug 1652348] Re: initrd dhcp fails / ignores valid response
I'll get started on it. This might take a while to do. A couple of quick observations:

1) We haven't validated that mainline 4.4.0 actually works. I only know certain Ubuntu versions of the 4.4.0 kernel work. Given how much seems to be changing between Ubuntu releases of it, that seems a risky assumption to make. I'll start by proving that first.

2) On the wiki you linked to: "To do this, you can use the mainline-build-one script which can be found at ~kteam-tools/mainline-build/mainline-build-one." A proper link would be useful. Where is ~kteam-tools?
[Kernel-packages] [Bug 1652348] Re: initrd dhcp fails / ignores valid response
Tried and tested (the current up-to-date kernels at the time of posting):

http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.10-rc1/linux-headers-4.10.0-041000rc1-generic_4.10.0-041000rc1.201612252031_amd64.deb
http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.10-rc1/linux-image-4.10.0-041000rc1-generic_4.10.0-041000rc1.201612252031_amd64.deb

They do not appear to suffer from the bug: dhcp was able to complete happily via the startup scripts in the initrd environment, and the host booted successfully.

** Tags added: kernel-fixed-upstream
** Tags added: kernel-fixed-upstream-4.10-rc1
** Changed in: linux (Ubuntu) Status: Incomplete => Confirmed
[Kernel-packages] [Bug 1652348] Re: initrd dhcp fails / ignores valid response
I've also confirmed the bug is present all the way back in 4.4.0-21-generic, and is present in 4.8.0-34-generic from yakkety-proposed.
[Kernel-packages] [Bug 1652348] Re: initrd dhcp fails / ignores valid response
I've worked my way back through the kernels. The bug, as it was (avoided by ip=dhcp in the kernel command line), was in effect in version 4.4.0-38-generic. It was fixed in 4.4.0-42-generic.

This is the state of play so far with kernels I've tested:

linux-image-4.4.0-38-generic - Affected
linux-image-4.4.0-42-generic - Fine
linux-image-4.4.0-43-generic - Fine
linux-image-4.4.0-45-generic - Fine
linux-image-4.4.0-47-generic - Fine
linux-image-4.4.0-51-generic - Fine
linux-image-4.4.0-53-generic - Fine
linux-image-4.4.0-57-generic - Affected
linux-image-4.4.0-58-generic - Affected (kernel in proposed)
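The tested-kernel list in the message above implies two version windows in which behaviour changed. As a rough sketch, those windows can be pulled out mechanically. The `RESULTS` table below transcribes the reported outcomes for the 4.4.0-N-generic kernels, and the helper name `transitions` is made up for this illustration.

```python
from typing import List, Tuple

# (N in linux-image-4.4.0-N-generic, affected?) as reported in the message.
RESULTS: List[Tuple[int, bool]] = [
    (38, True),
    (42, False),
    (43, False),
    (45, False),
    (47, False),
    (51, False),
    (53, False),
    (57, True),
    (58, True),
]


def transitions(results: List[Tuple[int, bool]]) -> List[Tuple[int, int, str]]:
    """Return (last_old_state, first_new_state, kind) per status change.

    kind is "fixed" when an affected kernel is followed by a working one,
    and "regressed" when a working kernel is followed by an affected one.
    """
    ordered = sorted(results)
    changes = []
    for (prev_n, prev_bad), (cur_n, cur_bad) in zip(ordered, ordered[1:]):
        if prev_bad != cur_bad:
            kind = "regressed" if cur_bad else "fixed"
            changes.append((prev_n, cur_n, kind))
    return changes


if __name__ == "__main__":
    for low, high, kind in transitions(RESULTS):
        print(f"4.4.0-{low} -> 4.4.0-{high}: {kind}")
```

Each window is a candidate range to bisect. Kernels between the listed versions were not tested, so the true change could sit anywhere inside a window.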
[Kernel-packages] [Bug 1652348] Re: initrd dhcp fails / ignores valid response
apport-collect doesn't exist in initrd. I'm unable to supply the requested information.

** Changed in: linux (Ubuntu) Status: Incomplete => Confirmed
[Kernel-packages] [Bug 1652348] Re: initrd dhcp fails / ignores valid response
** Attachment added: "pcap from dhcp server side of 'ipconfig -t "dhcp" -d "ens2f0" '"
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1652348/+attachment/4795819/+files/worked.pcap
[Kernel-packages] [Bug 1652348] Re: initrd dhcp fails / ignores valid response
The checksum invalid mentioned in the pcap is interesting, but happens in both failed and successful, so I'm not sure it's relevant.

https://bugs.launchpad.net/bugs/1652348

Title:
  initrd dhcp fails / ignores valid response

Status in linux package in Ubuntu:
  Incomplete
[Kernel-packages] [Bug 1652348] Re: initrd dhcp fails / ignores valid response
** Attachment added: "pcap from dhcp server side of inird startup doing dhcp"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1652348/+attachment/4795820/+files/failed.pcap

https://bugs.launchpad.net/bugs/1652348

Title:
  initrd dhcp fails / ignores valid response

Status in linux package in Ubuntu:
  Incomplete
[Kernel-packages] [Bug 1652348] Re: initrd dhcp fails / ignores valid response
** Attachment added: "Working 4.4.0-53 initrd"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1652348/+attachment/4795794/+files/initrd.img-4.4.0-53-generic

https://bugs.launchpad.net/bugs/1652348

Title:
  initrd dhcp fails / ignores valid response

Status in linux package in Ubuntu:
  New
[Kernel-packages] [Bug 1652348] Re: initrd dhcp fails / ignores valid response
** Attachment added: "4.4.0-57 "broken" initrd"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1652348/+attachment/4795793/+files/initrd.img-4.4.0-57-generic

https://bugs.launchpad.net/bugs/1652348

Title:
  initrd dhcp fails / ignores valid response

Status in linux package in Ubuntu:
  New
[Kernel-packages] [Bug 1652348] [NEW] initrd dhcp fails / ignores valid response
Public bug reported:

Between kernel versions 4.4.0-53 and 4.4.0-57 a bug has been (re?)introduced that is breaking dhcp booting in the initrd environment. This is stopping instances that use iscsi storage from being able to connect.

Over serial console it outputs:

IP-Config: no response after 2 secs - giving up
IP-Config: ens2f0 hardware address 90:e2:ba:d1:36:38 mtu 1500 DHCP RARP
IP-Config: ens2f1 hardware address 90:e2:ba:d1:36:39 mtu 1500 DHCP RARP
IP-Config: no response after 3 secs - giving up

with increasing delays until it fails. At which point a simple ipconfig -t dhcp -d "ens2f0" works. The console output is slightly garbled but should give you an idea:

(initramfs) ipconfig -t dhcp -[ 728.379793] ixgbe :13:00.0 ens2f0: changing MTU from 1500 to 9000
d "ens2f0"
IP-Config: ens2f0 hardware address 90:e2:ba:d1:36:38 mtu 1500 DHCP RARP
IP-Config: ens2f0 guessed broadcast address 10.0.1.255
IP-Config: ens2f0 complete (dhcp from 169.254.169.254):
addres[ 728.980448] ixgbe :13:00.0 ens2f0: detected SFP+: 3
s: 10.0.1.56broadcast: 10.0.1.255 netmask: 255.255.255.0
gateway: 10.0.1.1 [ 729.148410] ixgbe :13:00.0 ens2f0: NIC Link is Up 10 Gbps, Flow Control: RX/TX
dns0 : 169.254.169.254 dns1 : 0.0.0.0
rootserver: 169.254.169.254 rootpath:
filename : /ipxe.efi

tcpdumps show that dhcp requests are being received from the host, and responses sent, but not accepted by the host. When the ipconfig command is issued manually, an identical dhcp request and response happens, only this time it is accepted.

It doesn't appear to be that the messages are being sent and received incorrectly, just silently ignored by ipconfig.

I was seeing this behaviour earlier this year, which I was able to fix by specifying "ip=dhcp" as a kernel parameter. About a month ago that was identified as causing us other problems (long story) and we dropped it, at which point we discovered the original bug was no longer an issue.

Putting "ip=dhcp" back on with this kernel no longer fixes the problem.

I've compared the two initrds and effectively the only thing that has changed between the two is the kernel components.

I'm going to try and track back through kernel versions to see if I can find which version the fix happened in to maybe provide some additional context.

I'll also attach copies of the initrds, packet captures etc.

** Affects: linux-meta (Ubuntu)
     Importance: Undecided
         Status: New

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-meta in Ubuntu.
https://bugs.launchpad.net/bugs/1652348

Title:
  initrd dhcp fails / ignores valid response

Status in linux-meta package in Ubuntu:
  New
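For triage, the IP-Config lines in the serial console output quoted in this report can be reduced to the sequence of give-up timeouts and any completed leases. The following is a minimal sketch in Python, not something from the original report: `summarize_ipconfig` is a hypothetical helper name, and the two patterns assume the message wording is exactly as it appears in the quoted output.

```python
import re

# Give-up lines as quoted in the report, e.g.
#   "IP-Config: no response after 2 secs - giving up"
GIVE_UP = re.compile(r"IP-Config: no response after (\d+) secs - giving up")

# Completed-lease lines as quoted in the report, e.g.
#   "IP-Config: ens2f0 complete (dhcp from 169.254.169.254):"
COMPLETE = re.compile(r"IP-Config: (\S+) complete \(dhcp from ([0-9.]+)\)")


def summarize_ipconfig(console_text):
    """Return the give-up timeouts (seconds) and (device, server) leases found."""
    timeouts = [int(secs) for secs in GIVE_UP.findall(console_text)]
    leases = COMPLETE.findall(console_text)
    return {"timeouts": timeouts, "leases": leases}


if __name__ == "__main__":
    failing = (
        "IP-Config: no response after 2 secs - giving up\n"
        "IP-Config: ens2f0 hardware address 90:e2:ba:d1:36:38 mtu 1500 DHCP RARP\n"
        "IP-Config: no response after 3 secs - giving up\n"
    )
    print(summarize_ipconfig(failing))
```

A boot log that only ever yields timeouts and no lease, while the server-side capture shows replies being sent, matches the symptom described in the report.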
[Kernel-packages] [Bug 1626679] Re: NVMe triggering kernel panic followed by "bad: scheduling from the idle thread!"
There isn't a kernel in proposed at the moment, but I've tested using the latest in yakkety and it seems to be working fine.

I don't have a simple replication case for the bug, unfortunately. It just seems to happen for (hand-wavey guess) 50% of boots.

So far I've got this 4.8.0-19-generic kernel to boot several times over without problem. I'll keep rebooting and rebooting the server in the background today, just in case, while I focus on other stuff.

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1626679

Title:
  NVMe triggering kernel panic followed by "bad: scheduling from the idle thread!"

Status in linux package in Ubuntu:
  Triaged

Bug description:
  On an NVMe system I'm using, Ubuntu 16.04.1 regularly seems to trigger off a kernel panic against somepart of the NVMe driver it looks like, after which the logs get filled with entries over and over again of:

  "bad: scheduling from the idle thread!"

  Here's the initial stack trace that seems to trigger off the bug:

  Sep 22 15:51:46 ubuntu kernel: [ 97.478175] [ cut here ]
  Sep 22 15:51:46 ubuntu kernel: [ 97.478185] WARNING: CPU: 13 PID: 0 at /build/linux-dcxD3m/linux-4.4.0/kernel/irq/manage.c:1438 __free_irq+0x1d2/0x280()
  Sep 22 15:51:46 ubuntu kernel: [ 97.478188] Trying to free IRQ 38 from IRQ context!
  Sep 22 15:51:46 ubuntu kernel: [ 97.478191] Modules linked in: nls_iso8859_1 ipmi_ssif intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass ioatdma mei_me sb_edac shpchp edac_core lpc_ich mei 8250_fintek ipmi_msghandler mac_hid ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr autofs4 btrfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_pclmul ixgbe crc32_pclmul dca vxlan aesni_intel ip6_udp_tunnel udp_tunnel aes_x86_64 lrw gf128mul ptp glue_helper ahci ablk_helper pps_core cryptd nvme libahci mdio wmi fjes
  Sep 22 15:51:46 ubuntu kernel: [ 97.478257] CPU: 13 PID: 0 Comm: swapper/13 Not tainted 4.4.0-31-generic #50-Ubuntu
  Sep 22 15:51:46 ubuntu kernel: [ 97.478260] Hardware name: Oracle Corporation ORACLE SERVER X5-2/ASM,MOTHERBOARD,1U, BIOS 30080100 04/13/2016
  Sep 22 15:51:46 ubuntu kernel: [ 97.478263] 0286 4fea3140a01056a3 883f7f743b10 813f1143
  Sep 22 15:51:46 ubuntu kernel: [ 97.478267] 883f7f743b58 81cb61f8 883f7f743b48 81081102
  Sep 22 15:51:46 ubuntu kernel: [ 97.478271] 0026 883f5b2ea700 0026
  Sep 22 15:51:46 ubuntu kernel: [ 97.478275] Call Trace:
  Sep 22 15:51:46 ubuntu kernel: [ 97.478277][] dump_stack+0x63/0x90
  Sep 22 15:51:46 ubuntu kernel: [ 97.478290] [] warn_slowpath_common+0x82/0xc0
  Sep 22 15:51:46 ubuntu kernel: [ 97.478294] [] warn_slowpath_fmt+0x5c/0x80
  Sep 22 15:51:46 ubuntu kernel: [ 97.478299] [] ? try_to_grab_pending+0xb3/0x160
  Sep 22 15:51:46 ubuntu kernel: [ 97.478302] [] __free_irq+0x1d2/0x280
  Sep 22 15:51:46 ubuntu kernel: [ 97.478306] [] free_irq+0x3c/0x90
  Sep 22 15:51:46 ubuntu kernel: [ 97.478314] [] nvme_suspend_queue+0x89/0xb0 [nvme]
  Sep 22 15:51:46 ubuntu kernel: [ 97.478320] [] nvme_disable_admin_queue+0x27/0x90 [nvme]
  Sep 22 15:51:46 ubuntu kernel: [ 97.478325] [] nvme_dev_disable+0x29e/0x2c0 [nvme]
  Sep 22 15:51:46 ubuntu kernel: [ 97.478330] [] ? __nvme_process_cq+0x210/0x210 [nvme]
  Sep 22 15:51:46 ubuntu kernel: [ 97.478334] [] ? dev_warn+0x6c/0x90
  Sep 22 15:51:46 ubuntu kernel: [ 97.478340] [] nvme_timeout+0x110/0x1d0 [nvme]
  Sep 22 15:51:46 ubuntu kernel: [ 97.478344] [] ? cpumask_next_and+0x2f/0x40
  Sep 22 15:51:46 ubuntu kernel: [ 97.478348] [] ? load_balance+0x18c/0x980
  Sep 22 15:51:46 ubuntu kernel: [ 97.478354] [] blk_mq_rq_timed_out+0x2f/0x70
  Sep 22 15:51:46 ubuntu kernel: [ 97.478358] [] blk_mq_check_expired+0x4e/0x80
  Sep 22 15:51:46 ubuntu kernel: [ 97.478363] [] bt_for_each+0xd8/0xe0
  Sep 22 15:51:46 ubuntu kernel: [ 97.478367] [] ? blk_mq_rq_timed_out+0x70/0x70
  Sep 22 15:51:46 ubuntu kernel: [ 97.478370] [] ? blk_mq_rq_timed_out+0x70/0x70
  Sep 22 15:51:46 ubuntu kernel: [ 97.478375] [] blk_mq_queue_tag_busy_iter+0x47/0xc0
  Sep 22 15:51:46 ubuntu kernel: [ 97.478379] [] ? blk_mq_attempt_merge+0xb0/0xb0
  Sep 22 15:51:46 ubuntu kernel: [ 97.478383] [] blk_mq_rq_timer+0x41/0xf0
  Sep 22 15:51:46 ubuntu kernel: [ 97.478389] [] call_timer_fn+0x35/0x120
  Sep 22 15:51:46 ubuntu kernel: [ 97.478393] [] ? blk_mq_attempt_merge+0xb0/0xb0
  Sep 22 15:51:46 ubuntu kernel: [ 97.478397] [] run_timer_softirq+0x23a/0x2f0
  Sep 22 15:51:46 ubuntu kernel: [ 97.478403] [] __do_softirq+0x101/0x290
  Sep 22
[Kernel-packages] [Bug 1626679] Re: NVMe triggering kernel panic followed by "bad: scheduling from the idle thread!"
gzip'd copy of the kern.log showing the error.

** Attachment added: "kern.log.gz"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1626679/+attachment/4746377/+files/kern.log.gz

https://bugs.launchpad.net/bugs/1626679

Title:
  NVMe triggering kernel panic followed by "bad: scheduling from the idle thread!"

Status in linux package in Ubuntu:
  New
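The call trace quoted in this thread is easier to read once it is reduced to its symbol names: the reliable frames run from run_timer_softirq through blk_mq_rq_timer and nvme_timeout down to free_irq, which fits the "Trying to free IRQ 38 from IRQ context!" warning. Below is a minimal sketch in Python for pulling those frames out of a kern.log excerpt. It is an illustration only, not part of the report: `Frame` and `parse_call_trace` are hypothetical names, and the pattern assumes the `symbol+0xOFF/0xSIZE [module]` frame format shown in the trace, with `?` marking frames the unwinder considers unreliable.

```python
import re
from typing import List, NamedTuple, Optional


class Frame(NamedTuple):
    symbol: str
    module: Optional[str]  # e.g. "nvme"; None for built-in kernel code
    reliable: bool         # False when the frame was prefixed with "?"


# One frame: optional "? ", then symbol+0xoffset/0xsize, then optional " [module]".
FRAME = re.compile(
    r"(\?[ \t]+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+(?:[ \t]+\[(\w+)\])?"
)


def parse_call_trace(log_text: str) -> List[Frame]:
    """Return the frames that follow the first 'Call Trace:' marker."""
    _, marker, trace = log_text.partition("Call Trace:")
    if not marker:
        return []
    return [
        Frame(symbol=m.group(2), module=m.group(3), reliable=m.group(1) is None)
        for m in FRAME.finditer(trace)
    ]


if __name__ == "__main__":
    excerpt = (
        "kernel: [ 97.478275] Call Trace:\n"
        "kernel: [ 97.478299] [] ? try_to_grab_pending+0xb3/0x160\n"
        "kernel: [ 97.478302] [] __free_irq+0x1d2/0x280\n"
        "kernel: [ 97.478314] [] nvme_suspend_queue+0x89/0xb0 [nvme]\n"
    )
    for frame in parse_call_trace(excerpt):
        if frame.reliable:
            print(frame.symbol, frame.module or "-")
```

Only text after the `Call Trace:` marker is scanned, so the `__free_irq+0x1d2/0x280()` mention in the WARNING line is not counted as a frame.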
[Kernel-packages] [Bug 1626679] [NEW] NVMe triggering kernel panic followed by "bad: scheduling from the idle thread!"
Public bug reported:

On an NVMe system I'm using, Ubuntu 16.04.1 regularly seems to trigger off a kernel panic against somepart of the NVMe driver it looks like, after which the logs get filled with entries over and over again of:

"bad: scheduling from the idle thread!"

Here's the initial stack trace that seems to trigger off the bug:

Sep 22 15:51:46 ubuntu kernel: [ 97.478175] [ cut here ]
Sep 22 15:51:46 ubuntu kernel: [ 97.478185] WARNING: CPU: 13 PID: 0 at /build/linux-dcxD3m/linux-4.4.0/kernel/irq/manage.c:1438 __free_irq+0x1d2/0x280()
Sep 22 15:51:46 ubuntu kernel: [ 97.478188] Trying to free IRQ 38 from IRQ context!
Sep 22 15:51:46 ubuntu kernel: [ 97.478191] Modules linked in: nls_iso8859_1 ipmi_ssif intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass ioatdma mei_me sb_edac shpchp edac_core lpc_ich mei 8250_fintek ipmi_msghandler mac_hid ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr autofs4 btrfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_pclmul ixgbe crc32_pclmul dca vxlan aesni_intel ip6_udp_tunnel udp_tunnel aes_x86_64 lrw gf128mul ptp glue_helper ahci ablk_helper pps_core cryptd nvme libahci mdio wmi fjes
Sep 22 15:51:46 ubuntu kernel: [ 97.478257] CPU: 13 PID: 0 Comm: swapper/13 Not tainted 4.4.0-31-generic #50-Ubuntu
Sep 22 15:51:46 ubuntu kernel: [ 97.478260] Hardware name: Oracle Corporation ORACLE SERVER X5-2/ASM,MOTHERBOARD,1U, BIOS 30080100 04/13/2016
Sep 22 15:51:46 ubuntu kernel: [ 97.478263] 0286 4fea3140a01056a3 883f7f743b10 813f1143
Sep 22 15:51:46 ubuntu kernel: [ 97.478267] 883f7f743b58 81cb61f8 883f7f743b48 81081102
Sep 22 15:51:46 ubuntu kernel: [ 97.478271] 0026 883f5b2ea700 0026
Sep 22 15:51:46 ubuntu kernel: [ 97.478275] Call Trace:
Sep 22 15:51:46 ubuntu kernel: [ 97.478277][] dump_stack+0x63/0x90
Sep 22 15:51:46 ubuntu kernel: [ 97.478290] [] warn_slowpath_common+0x82/0xc0
Sep 22 15:51:46 ubuntu kernel: [ 97.478294] [] warn_slowpath_fmt+0x5c/0x80
Sep 22 15:51:46 ubuntu kernel: [ 97.478299] [] ? try_to_grab_pending+0xb3/0x160
Sep 22 15:51:46 ubuntu kernel: [ 97.478302] [] __free_irq+0x1d2/0x280
Sep 22 15:51:46 ubuntu kernel: [ 97.478306] [] free_irq+0x3c/0x90
Sep 22 15:51:46 ubuntu kernel: [ 97.478314] [] nvme_suspend_queue+0x89/0xb0 [nvme]
Sep 22 15:51:46 ubuntu kernel: [ 97.478320] [] nvme_disable_admin_queue+0x27/0x90 [nvme]
Sep 22 15:51:46 ubuntu kernel: [ 97.478325] [] nvme_dev_disable+0x29e/0x2c0 [nvme]
Sep 22 15:51:46 ubuntu kernel: [ 97.478330] [] ? __nvme_process_cq+0x210/0x210 [nvme]
Sep 22 15:51:46 ubuntu kernel: [ 97.478334] [] ? dev_warn+0x6c/0x90
Sep 22 15:51:46 ubuntu kernel: [ 97.478340] [] nvme_timeout+0x110/0x1d0 [nvme]
Sep 22 15:51:46 ubuntu kernel: [ 97.478344] [] ? cpumask_next_and+0x2f/0x40
Sep 22 15:51:46 ubuntu kernel: [ 97.478348] [] ? load_balance+0x18c/0x980
Sep 22 15:51:46 ubuntu kernel: [ 97.478354] [] blk_mq_rq_timed_out+0x2f/0x70
Sep 22 15:51:46 ubuntu kernel: [ 97.478358] [] blk_mq_check_expired+0x4e/0x80
Sep 22 15:51:46 ubuntu kernel: [ 97.478363] [] bt_for_each+0xd8/0xe0
Sep 22 15:51:46 ubuntu kernel: [ 97.478367] [] ? blk_mq_rq_timed_out+0x70/0x70
Sep 22 15:51:46 ubuntu kernel: [ 97.478370] [] ? blk_mq_rq_timed_out+0x70/0x70
Sep 22 15:51:46 ubuntu kernel: [ 97.478375] [] blk_mq_queue_tag_busy_iter+0x47/0xc0
Sep 22 15:51:46 ubuntu kernel: [ 97.478379] [] ? blk_mq_attempt_merge+0xb0/0xb0
Sep 22 15:51:46 ubuntu kernel: [ 97.478383] [] blk_mq_rq_timer+0x41/0xf0
Sep 22 15:51:46 ubuntu kernel: [ 97.478389] [] call_timer_fn+0x35/0x120
Sep 22 15:51:46 ubuntu kernel: [ 97.478393] [] ? blk_mq_attempt_merge+0xb0/0xb0
Sep 22 15:51:46 ubuntu kernel: [ 97.478397] [] run_timer_softirq+0x23a/0x2f0
Sep 22 15:51:46 ubuntu kernel: [ 97.478403] [] __do_softirq+0x101/0x290
Sep 22 15:51:46 ubuntu kernel: [ 97.478407] [] irq_exit+0xa3/0xb0
Sep 22 15:51:46 ubuntu kernel: [ 97.478413] [] smp_apic_timer_interrupt+0x42/0x50
Sep 22 15:51:46 ubuntu kernel: [ 97.478417] [] apic_timer_interrupt+0x82/0x90
Sep 22 15:51:46 ubuntu kernel: [ 97.478419][] ? cpuidle_enter_state+0x111/0x2b0
Sep 22 15:51:46 ubuntu kernel: [ 97.478428] [] cpuidle_enter+0x17/0x20
Sep 22 15:51:46 ubuntu kernel: [ 97.478432] [] call_cpuidle+0x32/0x60
Sep 22 15:51:46 ubuntu kernel: [ 97.478436] [] ? cpuidle_select+0x13/0x20
Sep 22 15:51:46 ubuntu kernel: [ 97.478440] [] cpu_startup_entry+0x290/0x350
Sep 22 15:51:46 ubuntu kernel: [ 97.478444] [] start_secondary+0x154/0x190
Sep 22 15:51:46 ubuntu kernel: [ 97.478448] ---[ end trace 4f4c67e52b4d19ac ]---

then

Sep 22 15:51:46 ubuntu kernel: [ 97.478463] BUG: