Bug#897930: linux-image-4.9.0-6-686-pae: Random hard lockups (no panic message) on Via C7 system
Package: src:linux
Version: 4.9.88-1
Severity: normal

Dear Maintainer,

For several weeks I have been experiencing hard lockups of my system (running a Via C7 CPU). When the system locks up, there is no panic message on the console, numlock/caps lock are inoperative, and a hard reset is the only thing that can be done.

In recent days, these lockups have been happening on a near daily basis. The motherboard and RAM have been replaced, thinking it was a hardware fault, but the problem continues.

-- Package-specific info:
** Version:
Linux version 4.9.0-6-686-pae (debian-kernel@lists.debian.org) (gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1) ) #1 SMP Debian 4.9.88-1 (2018-04-29)

** Command line:
BOOT_IMAGE=/boot/vmlinuz-4.9.0-6-686-pae root=/dev/mapper/vg0-root ro quiet

** Not tainted

** Kernel log:
Unable to read kernel log; any relevant messages should be attached

** Model information

** Loaded modules:
sit tunnel4 ip_tunnel ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables xt_multiport ipt_REJECT nf_reject_ipv4 xt_tcpudp xt_conntrack iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter padlock_sha via_cputemp hwmon_vid via_rng rng_core evdev serio_raw pcspkr snd_hda_codec_via snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep sg snd_pcm shpchp snd_timer snd soundcore button parport_pc ppdev lp parport ip_tables x_tables autofs4 ext4 crc16 jbd2 crc32c_generic fscrypto ecb xts lrw gf128mul ablk_helper cryptd aes_i586 mbcache hid_generic usbhid hid dm_mod sd_mod ata_generic padlock_aes pata_via libata ehci_pci uhci_hcd ehci_hcd i2c_viapro scsi_mod usbcore usb_common via_velocity crc_ccitt thermal fan

** PCI devices:
not available

** USB devices:
Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 005 Device 002: ID 17f6:0802 Unicomp, Inc
Bus 005 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 004 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 003 Device 002: ID 051d:0002 American Power Conversion Uninterruptible Power Supply
Bus 003 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub

-- System Information:
Debian Release: 9.4
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: i386 (i686)

Kernel: Linux 4.9.0-6-686-pae (SMP w/1 CPU core)
Locale: LANG=en_AU, LC_CTYPE=en_AU (charmap=ISO-8859-1), LANGUAGE=en_AU:en_US:en_GB:en (charmap=ISO-8859-1)
Shell: /bin/sh linked to /bin/bash
Init: systemd (via /run/systemd/system)

Versions of packages linux-image-4.9.0-6-686-pae depends on:
ii  initramfs-tools [linux-initramfs-tool]  0.130
ii  kmod                                    23-2
ii  linux-base                              4.5

Versions of packages linux-image-4.9.0-6-686-pae recommends:
ii  firmware-linux-free  3.4
ii  irqbalance           1.1.0-2.3

Versions of packages linux-image-4.9.0-6-686-pae suggests:
pn  debian-kernel-handbook
ii  grub-pc                 2.02~beta3-5
pn  linux-doc-4.9

Versions of packages linux-image-4.9.0-6-686-pae is related to:
pn  firmware-amd-graphics
pn  firmware-atheros
pn  firmware-bnx2
pn  firmware-bnx2x
pn  firmware-brcm80211
pn  firmware-cavium
pn  firmware-intel-sound
pn  firmware-intelwimax
pn  firmware-ipw2x00
pn  firmware-ivtv
pn  firmware-iwlwifi
pn  firmware-libertas
pn  firmware-linux-nonfree
pn  firmware-misc-nonfree
pn  firmware-myricom
pn  firmware-netxen
pn  firmware-qlogic
pn  firmware-realtek
pn  firmware-samsung
pn  firmware-siano
pn  firmware-ti-connectivity
pn  xen-hypervisor

-- no debconf information
Re: Processed: reopening 897204, found 897204 in 4.15.17-1, tagging 897204
Hi,

2018-05-04 9:04 GMT+02:00 Salvatore Bonaccorso :
> Hi Romain,
>
> On Fri, May 04, 2018 at 08:44:32AM +0200, Romain Perier wrote:
>> Hello,
>>
>> Whoops, I did something wrong apparently, sorry.
>> So I only fixed the issue in sid (4.16), I have to wait that the
>> package is in buster before closing this bug, I guess, right ?
>
> My point was to reopen the bug, the closer is marked correctly in the
> debian/changelog file and it is actually not yet fixed, only in the
> packaging repository committed. Once 4.16.5-2 or maybe an iteration
> with an 4.16.7 import on top, will enter the archive, the BTS will
> correctly handle the closing of the bug (and with correct version, say
> we have then 4.16.7-1 at the time including your change, then the bug
> will be closed with that version on upload and archive entering
> time).

Ah, I see. I did not know that BTS closes bugs automatically (I need to continue to read documentation about it). It makes sense, yes.

>
> Hope this explains my reopening, marking yet as unfixed, tagging as
> pending.

It does, yes :)

Thanks,
Romain
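For readers new to this mechanism: the BTS closes a bug on upload because the changelog entry carries a "Closes:" marker. The sketch below is only an illustration of how such markers can be pulled out of a changelog entry. The pattern is a Python transcription of the one given in Debian Policy for debian/changelog; the sample entry, the maintainer name and the helper name closed_bugs are hypothetical, and version 4.16.7-1 is just the example version mentioned in the message above.

```python
import re

# Python transcription of the "Closes:" pattern from Debian Policy
# (debian/changelog section). Case-insensitive, accepts comma-separated lists.
CLOSES_RE = re.compile(
    r"closes:\s*(?:bug)?\#?\s?\d+(?:,\s*(?:bug)?\#?\s?\d+)*",
    re.IGNORECASE,
)
BUG_NUM_RE = re.compile(r"\d+")

# Hypothetical changelog entry, for illustration only.
ENTRY = """\
linux (4.16.7-1) unstable; urgency=medium

  * Example change description (Closes: #897204)

 -- Example Maintainer <maintainer@example.org>  Fri, 04 May 2018 09:00:00 +0200
"""


def closed_bugs(changelog_entry):
    """Return the bug numbers named in Closes: markers, in order of appearance."""
    bugs = []
    for match in CLOSES_RE.finditer(changelog_entry):
        bugs.extend(int(n) for n in BUG_NUM_RE.findall(match.group(0)))
    return bugs


if __name__ == "__main__":
    print(closed_bugs(ENTRY))
```

The bug is recorded as fixed in the version named in the entry's first line, which is why the closing only takes effect once that version actually enters the archive.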
Bug#897917: Stretch kernel 4.9.88-1 breaks startup of RPC, KDC services
Package: linux-image-4.9.0-6-amd64
Version: 4.9.88-1

Issue:
======
Kernel "linux-image-4.9.0-6-amd64," version 4.9.88-1, breaks systemd startup of RPC, Kerberos KDC services.

Description:
After upgrading to the latest Stretch kernel (4.9.88-1), RPC and KDC services time out during the boot process. This issue is being seen on a Kerberos KDC that is also an NFS client. Kerberos auth. and encryption are being used with NFS in this environment, and this KDC provides the Kerberos services for that to work. Network is functional prior to these services starting, which is proper. After the server has booted completely, I can issue `service krb5-kdc restart` and, after a short delay, the KDC service starts normally.

Not sure if this is a kernel bug, a systemd bug, or something else. Since the kernel package was the only thing that was upgraded before the issue started, I'm leaning toward the kernel.

Relevant output from /var/log/syslog:
-------------------------------------
May 4 09:03:17 systemd[1]: rpc-svcgssd.service: Start operation timed out. Terminating.
May 4 09:03:17 systemd[1]: Failed to start RPC security service for NFS server.
May 4 09:03:17 systemd[1]: rpc-svcgssd.service: Unit entered failed state.
May 4 09:03:17 systemd[1]: rpc-svcgssd.service: Failed with result 'timeout'.
May 4 09:03:17 systemd[1]: rpc-gssd.service: Start operation timed out. Terminating.
May 4 09:03:17 systemd[1]: Failed to start RPC security service for NFS client and server.
May 4 09:03:17 systemd[1]: rpc-gssd.service: Unit entered failed state.
May 4 09:03:17 systemd[1]: rpc-gssd.service: Failed with result 'timeout'.
May 4 09:03:20 systemd[1]: krb5-kdc.service: Start operation timed out. Terminating.
May 4 09:03:20 systemd[1]: Failed to start Kerberos 5 Key Distribution Center.
May 4 09:03:20 systemd[1]: krb5-kdc.service: Unit entered failed state.29s random time.
May 4 09:03:20 systemd[1]: krb5-kdc.service: Failed with result 'timeout'. random time.

Workaround:
===========
Rolling back to Stretch kernel 4.9.82-1+deb9u3 fixes the issue.

Setup:
======
1. KDC package: krb5-kdc 1.15-1+deb9u1
2. NFS package: nfs-common 1:1.3.4-2.1
3. Kernel: linux-image-4.9.0-6-amd64 4.9.88-1
4. Systemd version: 232-25+deb9u3
5. Server is a 64-bit Xen PV domU
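When comparing boots on the two kernels, it may help to list which units hit the start timeout. The following is a rough sketch, not something from the report itself: the sample lines are copied from the syslog excerpt above, while the helper name timed_out_units and the regular expression are my own assumptions about the message format.

```python
import re

# Sample lines copied from the syslog excerpt in this report.
SAMPLE = """\
May 4 09:03:17 systemd[1]: rpc-svcgssd.service: Start operation timed out. Terminating.
May 4 09:03:17 systemd[1]: Failed to start RPC security service for NFS server.
May 4 09:03:17 systemd[1]: rpc-gssd.service: Start operation timed out. Terminating.
May 4 09:03:17 systemd[1]: rpc-gssd.service: Failed with result 'timeout'.
May 4 09:03:20 systemd[1]: krb5-kdc.service: Start operation timed out. Terminating.
"""

# Assumed message shape: "systemd[PID]: <unit>.service: Start operation timed out"
TIMEOUT_RE = re.compile(
    r"systemd\[\d+\]: (?P<unit>\S+\.service): Start operation timed out"
)


def timed_out_units(text):
    """Return the units whose start timed out, in first-seen order, no repeats."""
    units = []
    for line in text.splitlines():
        match = TIMEOUT_RE.search(line)
        if match and match.group("unit") not in units:
            units.append(match.group("unit"))
    return units


if __name__ == "__main__":
    for unit in timed_out_units(SAMPLE):
        print(unit)
```

Running the same function over /var/log/syslog from a boot on 4.9.82-1+deb9u3 and one on 4.9.88-1 would show whether the set of affected units differs.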
Bug#897893: Strange kernel panics on linux-image-4.9.0-6-amd64 with mlx4_en driver
Control: tag -1 moreinfo

On Fri, 2018-05-04 at 15:35 +0300, Eugene Budanov wrote:
> Package: linux-image-4.9.0-6-amd64
> Version: 4.9.82-1+deb9u3
>
> Hi!
>
> Here's a short problem description.
>
> We have some Supermicro servers with the same configuration for all
> machines (hardware, kernels, packages, etc). A month ago, or maybe a
> bit later, all of these machines began crashing into kernel panic. I
> can't find any pattern of failure at all. But it happens very often.
> Some machines may drop into kernel panic a couple times a day! But
> usually machines crash about every 3 to 6 days. All of these machines
> have intensive network and i/o operations.
>
> I saved dmesg log from one of these machines after the crash (see the
> attachment).
>
> As far as I see, every machine probably has problems with mlx4_en or
> GRO. Also I see list_add double add => list_del corruption. Can I do
> anything to get more detailed logs? What additional information do
> you need for better problem diagnostics?

The WARNING messages show that there are out-of-tree modules (i.e. not part of the kernel package) loaded. What are those?

Ben.

--
Ben Hutchings
Every program is either trivial or else contains at least one bug
Processed: Re: Bug#897893: Strange kernel panics on linux-image-4.9.0-6-amd64 with mlx4_en driver
Processing control commands:

> tag -1 moreinfo
Bug #897893 [src:linux] Strange kernel panics on linux-image-4.9.0-6-amd64 with mlx4_en driver
Added tag(s) moreinfo.

--
897893: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=897893
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems
Processed: forcibly merging 897427 897685
Processing commands for cont...@bugs.debian.org:

> forcemerge 897427 897685
Bug #897427 [src:linux] linux-image-3.16.0-6-amd64 breaks KVM guests in libvirt
Bug #897685 [src:linux] linux-image-3.16.0-6-amd64: Unable to start multiple KVM instances with libvirt
Added indication that 897685 affects libvirt-daemon-system
Merged 897427 897685
> thanks
Stopping processing here.

Please contact me if you need assistance.

--
897427: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=897427
897685: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=897685
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems
Processed: reassign 897893 to src:linux
Processing commands for cont...@bugs.debian.org:

> reassign 897893 src:linux 4.9.82-1+deb9u3
Bug #897893 [linux-image-4.9.0-6-amd64] Strange kernel panics on linux-image-4.9.0-6-amd64 with mlx4_en driver
Bug reassigned from package 'linux-image-4.9.0-6-amd64' to 'src:linux'.
No longer marked as found in versions linux/4.9.82-1+deb9u3.
Ignoring request to alter fixed versions of bug #897893 to the same values previously set
Bug #897893 [src:linux] Strange kernel panics on linux-image-4.9.0-6-amd64 with mlx4_en driver
Marked as found in versions linux/4.9.82-1+deb9u3.
> thanks
Stopping processing here.

Please contact me if you need assistance.

--
897893: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=897893
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems
Bug#897893: Strange kernel panics on linux-image-4.9.0-6-amd64 with mlx4_en driver
Package: linux-image-4.9.0-6-amd64
Version: 4.9.82-1+deb9u3

Hi!

Here's a short problem description.

We have some Supermicro servers with the same configuration for all machines (hardware, kernels, packages, etc). A month ago, or maybe a bit later, all of these machines began crashing into kernel panic. I can't find any pattern of failure at all. But it happens very often. Some machines may drop into kernel panic a couple times a day! But usually machines crash about every 3 to 6 days. All of these machines have intensive network and i/o operations.

I saved dmesg log from one of these machines after the crash (see the attachment).

As far as I see, every machine probably has problems with mlx4_en or GRO. Also I see list_add double add => list_del corruption. Can I do anything to get more detailed logs? What additional information do you need for better problem diagnostics?

---
Best regards,
Eugene Budanov
System Administrator
«Рестрим» company

dmesg.log
Description: Binary data

lspci
Description: Binary data
Bug#897802: linux: ftbfs with GCC-8
Package: src:linux
Version: 4.15.17-1
Severity: normal
Tags: sid buster
User: debian-...@lists.debian.org
Usertags: ftbfs-gcc-8

Please keep this issue open in the bug tracker for the package it was filed for. If a fix in another package is required, please file a bug for the other package (or clone), and add a block in this package. Please keep the issue open until the package can be built in a follow-up test rebuild.

The package fails to build in a test rebuild on at least amd64 with gcc-8/g++-8, but succeeds to build with gcc-7/g++-7. The severity of this report will be raised before the buster release.

The full build log can be found at:
http://aws-logs.debian.net/2018/05/01/gcc8/linux_4.15.17-1_unstable_gcc8.log.gz
The last lines of the build log are at the end of this report.

To build with GCC 8, either set CC=gcc-8 CXX=g++-8 explicitly, or install the gcc, g++, gfortran, ... packages from experimental.

  apt-get -t=experimental install g++

Common build failures are new warnings resulting in build failures with -Werror turned on, or new/dropped symbols in Debian symbols files. For other C/C++ related build failures see the porting guide at http://gcc.gnu.org/gcc-8/porting_to.html

[...]
  WRAP    arch/x86/include/generated/asm/mm-arch-hooks.h
  CHK     include/generated/utsrelease.h
  UPD     include/generated/utsrelease.h
  HOSTLD  arch/x86/tools/relocs
  HOSTCC  scripts/genksyms/genksyms.o
  SHIPPED scripts/genksyms/parse.tab.c
  SHIPPED scripts/genksyms/lex.lex.c
  SHIPPED scripts/genksyms/parse.tab.h
  HOSTCC  scripts/genksyms/parse.tab.o
  HOSTCC  scripts/genksyms/lex.lex.o
  CC      scripts/mod/empty.o
  HOSTCC  scripts/selinux/genheaders/genheaders
  HOSTCC  scripts/mod/mk_elfconfig
  CC      scripts/mod/devicetable-offsets.s
  CHK     scripts/mod/devicetable-offsets.h
  UPD     scripts/mod/devicetable-offsets.h
  MKELF   scripts/mod/elfconfig.h
  HOSTCC  scripts/mod/modpost.o
  HOSTCC  scripts/selinux/mdp/mdp
  CC      arch/x86/purgatory/purgatory.o
  AS      arch/x86/purgatory/stack.o
  AS      arch/x86/purgatory/setup-x86_64.o
  CC      arch/x86/purgatory/sha256.o
  AS      arch/x86/purgatory/entry64.o
make[7]: *** [Makefile:46: /<>/debian/build/build_amd64_none_amd64/tools/objtool/objtool-in.o] Error 2
make[6]: *** [Makefile:63: objtool] Error 2
make[5]: *** [/<>/Makefile:1663: tools/objtool] Error 2
make[5]: *** Waiting for unfinished jobs
  CC      arch/x86/purgatory/string.o
  HOSTCC  scripts/mod/file2alias.o
  HOSTCC  scripts/kallsyms
  HOSTCC  scripts/mod/sumversion.o
  HOSTCC  scripts/conmakehash
  HOSTCC  scripts/recordmcount
  HOSTCC  scripts/sortextable
  HOSTCC  scripts/asn1_compiler
  HOSTCC  scripts/extract-cert
  LD      arch/x86/purgatory/purgatory.ro
  BIN2C   arch/x86/purgatory/kexec-purgatory.c
  HOSTLD  scripts/genksyms/genksyms
  HOSTLD  scripts/mod/modpost
make[4]: *** [Makefile:146: sub-make] Error 2
make[3]: *** [Makefile:24: __sub-make] Error 2
make[3]: Leaving directory '/<>/debian/build/build_amd64_none_amd64'
make[2]: *** [debian/rules.real:190: debian/stamps/build_amd64_none_amd64] Error 2
make[2]: Leaving directory '/<>'
make[1]: *** [debian/rules.gen:453: build-arch_amd64_none_amd64_real] Error 2
make[1]: Leaving directory '/<>'
make: *** [debian/rules:37: build-arch] Error 2
dpkg-buildpackage: error: debian/rules binary-arch subprocess returned exit status 2
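In a parallel build the error lines from nested make invocations are interleaved with unrelated compiler output, so the most deeply nested "Error" line is usually the best starting point. The sketch below is only a triage aid under that assumption. The sample lines are copied from the log tail above; the helper name deepest_make_error and the regular expression are mine, not part of any Debian tooling.

```python
import re

# Error lines copied from the tail of the build log in this report.
LOG_TAIL = """\
make[7]: *** [Makefile:46: /<>/debian/build/build_amd64_none_amd64/tools/objtool/objtool-in.o] Error 2
make[6]: *** [Makefile:63: objtool] Error 2
make[5]: *** [/<>/Makefile:1663: tools/objtool] Error 2
make[5]: *** Waiting for unfinished jobs
make[4]: *** [Makefile:146: sub-make] Error 2
make: *** [debian/rules:37: build-arch] Error 2
"""

# Matches "make[N]: *** [where] Error CODE"; the [N] part is absent at top level.
MAKE_ERROR_RE = re.compile(
    r"^make(?:\[(?P<level>\d+)\])?: \*\*\* \[(?P<where>.+)\] Error (?P<code>\d+)$"
)


def deepest_make_error(log_text):
    """Return (level, where, exit_code) for the most deeply nested make error,
    or None if the log contains no make error line."""
    best = None
    for line in log_text.splitlines():
        match = MAKE_ERROR_RE.match(line.strip())
        if not match:
            continue
        level = int(match.group("level") or 0)
        if best is None or level > best[0]:
            best = (level, match.group("where"), int(match.group("code")))
    return best


if __name__ == "__main__":
    print(deepest_make_error(LOG_TAIL))
```

For this log the result points at tools/objtool/objtool-in.o, so the compiler diagnostics for objtool in the full build log are the place to look next.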
Bug#897685: linux-image-3.16.0-6-amd64: Unable to start multiple KVM instances with libvirt
Package: src:linux
Version: 3.16.56-1
Severity: important

Dear Maintainer,

The kernel package linux-image-3.16.0-6-amd64 introduced a regression apparently preventing multiple KVM libvirt instances from running concurrently.

Here is the error message when attempting to start a second instance :

# virsh start
error: Failed to start domain
error: Cannot get interface MAC on 'vnet%d': No such device

Running this command doesn't produce any dmesg log.

It may simply be a regression affecting tun/tap interfaces only. See https://www.mail-archive.com/debian-bugs-dist@lists.debian.org/msg1602078.html

-- Package-specific info:
** Version:
Linux version 3.16.0-6-amd64 (debian-kernel@lists.debian.org) (gcc version 4.9.2 (Debian 4.9.2-10+deb8u1) ) #1 SMP Debian 3.16.56-1 (2018-04-28)

** Command line:
BOOT_IMAGE=/boot/vmlinuz-3.16.0-6-amd64 root=/dev/md2 ro rootdelay=10 elevator=deadline hugepagesz=1GB hugepages=9

** Not tainted

** Model information
sys_vendor: Supermicro
product_name: X9SRE/X9SRE-3F/X9SRi/X9SRi-3F
product_version: 0123456789
chassis_vendor: Supermicro
chassis_version: 0123456789
bios_vendor: American Megatrends Inc.
bios_version: 3.0a
board_vendor: Supermicro
board_name: X9SRE/X9SRE-3F/X9SRi/X9SRi-3F
board_version: 0123456789

** Loaded modules:
tun cpufreq_conservative binfmt_misc cpufreq_stats cpufreq_powersave cpufreq_userspace bridge stp llc quota_v2 quota_tree x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd ttm drm_kms_helper drm serio_raw pcspkr iTCO_wdt iTCO_vendor_support sb_edac joydev evdev edac_core tpm_tis tpm wmi processor mei_me thermal_sys shpchp mei lpc_ich ioatdma mfd_core button ipmi_watchdog ipmi_si ipmi_poweroff ipmi_devintf ipmi_msghandler fuse autofs4 ext4 crc16 mbcache jbd2 btrfs ohci_hcd uhci_hcd pata_via netxen_nic 3w_9xxx qlge ixgbe mdio sata_nv forcedeth via686a mptctl mptsas mptspi mptscsih mptbase dm_crypt raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid0 raid1 md_mod dm_mirror dm_region_hash dm_log dm_mod sata_via ata_piix sata_sis pata_sis sym53c8xx megaraid_sas megaraid aic7xxx scsi_transport_spi 3w_ sky2 r8169 skge e1000e e1000 via_rhine sis900 8139too e100 mii hid_generic usbhid hid sg sd_mod crc_t10dif crct10dif_generic crct10dif_pclmul crct10dif_common ehci_pci isci crc32c_intel ehci_hcd ahci libsas libahci psmouse igb libata scsi_transport_sas i2c_algo_bit i2c_i801 i2c_core dca usbcore ptp scsi_mod usb_common pps_core

** PCI devices:
00:00.0 Host bridge [0600]: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 DMI2 [8086:0e00] (rev 04)
	Subsystem: Super Micro Computer Inc Device [15d9:062b]
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-

00:01.0 PCI bridge [0604]: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 PCI Express Root Port 1a [8086:0e02] (rev 04) (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort- Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities:
	Kernel driver in use: pcieport

00:02.0 PCI bridge [0604]: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 PCI Express Root Port 2a [8086:0e04] (rev 04) (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort- Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities:
	Kernel driver in use: pcieport

00:03.0 PCI bridge [0604]: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 PCI Express Root Port 3a [8086:0e08] (rev 04) (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort- Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities:
	Kernel driver in use: pcieport

00:03.2 PCI bridge [0604]: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 PCI Express Root Port 3c [8086:0e0a] (rev 04) (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort- Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities:
	Kernel driver in use: pcieport

00:04.0 System peripheral [0880]: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 Crystal Beach DM
Processed: affects 897427
Processing commands for cont...@bugs.debian.org:

> affects 897427 libvirt-daemon-system
Bug #897427 [src:linux] linux-image-3.16.0-6-amd64 breaks KVM guests in libvirt
Added indication that 897427 affects libvirt-daemon-system
> thanks
Stopping processing here.

Please contact me if you need assistance.

--
897427: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=897427
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems
Re: Processed: reopening 897204, found 897204 in 4.15.17-1, tagging 897204
Hi Romain,

On Fri, May 04, 2018 at 08:44:32AM +0200, Romain Perier wrote:
> Hello,
>
> Whoops, I did something wrong apparently, sorry.
> So I only fixed the issue in sid (4.16), I have to wait that the
> package is in buster before closing this bug, I guess, right ?

My point was to reopen the bug, the closer is marked correctly in the debian/changelog file and it is actually not yet fixed, only in the packaging repository committed. Once 4.16.5-2 or maybe an iteration with an 4.16.7 import on top, will enter the archive, the BTS will correctly handle the closing of the bug (and with correct version, say we have then 4.16.7-1 at the time including your change, then the bug will be closed with that version on upload and archive entering time).

Hope this explains my reopening, marking yet as unfixed, tagging as pending.

Regards,
Salvatore