Bug#754354: Something holds dentry-related mutex forever in Wheezy amd64 kernel
On Sunday 28 September 2014 at 15:01:21 +0100, Ben Hutchings wrote:
> On Sat, 2014-09-27 at 19:41 +0100, Mike Crowe wrote:
> > I compiled my own version of the Debian 3.2.60-1+deb7u3 kernel with
> > CONFIG_LOCKDEP and panic on hung task enabled.
> >
> > From the crash dump:
> >
> > [25202.156175] INFO: task nfsd:3247 blocked for more than 900 seconds.
> > [25202.162565] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [25202.170432] nfsd D 88080aa0eca8 0 3247 2 0x
> > [25202.170444] 88080a8e19f0 0046 0006 8808
> > [25202.170458] 88080aa0e9c0 88080a8e1fd8 88080a8e1fd8 001d4040
> > [25202.170472] 88040e9926c0 88080aa0e9c0 8138d6da 0001a04c47dd
> > [25202.170488] Call Trace:
> > [25202.170504] [] ? __mutex_lock_common+0x236/0x379
> > [25202.170531] [] ? fh_lock_nested+0x4d/0x61 [nfsd]
> > [25202.170542] [] schedule+0x55/0x57
> > [25202.170552] [] __mutex_lock_common+0x243/0x379
> > [25202.170569] [] ? fh_lock_nested+0x4d/0x61 [nfsd]
> > [25202.170581] [] mutex_lock_nested+0x2a/0x31
> > [25202.170598] [] fh_lock_nested+0x4d/0x61 [nfsd]
> > [25202.170610] [] ? sched_clock+0x9/0xd
> > [25202.170626] [] nfsd_lookup_dentry+0x196/0x227 [nfsd]
> > [25202.170646] [] nfsd4_secinfo.part.15+0x26/0x9e [nfsd]
> > [25202.170666] [] nfsd4_secinfo+0x4d/0x5b [nfsd]
> > [25202.170688] [] nfsd4_proc_compound+0x265/0x43e [nfsd]
> > [25202.170703] [] nfsd_dispatch+0xe2/0x1c8 [nfsd]
> > [25202.170734] [] svc_process_common+0x2cf/0x4d0 [sunrpc]
> > [25202.170759] [] svc_process+0x118/0x136 [sunrpc]
> > [25202.170773] [] nfsd+0xeb/0x131 [nfsd]
> > [25202.170796] [] ? 0xa04c0fff
> > [25202.170806] [] kthread+0xa3/0xab
> > [25202.170815] [] kernel_thread_helper+0x4/0x10
> > [25202.170823] [] ? retint_restore_args+0x13/0x13
> > [25202.170830] [] ? __init_kthread_worker+0x53/0x53
> > [25202.170837] [] ? gs_change+0x13/0x13
> > [25202.170842] 1 lock held by nfsd/3247:
> > [25202.170845] #0: (&sb->s_type->i_mutex_key#13){+.+.+.}, at: [] fh_lock_nested+0x4d/0x61 [nfsd]
> > [25202.170870] Kernel panic - not syncing: hung_task: blocked tasks
[snip]
> nfsd is trying to lock two objects in the same class: specifically, it
> locks a file handle and then the file handle for the file's parent.
> It's generally safe to do this so long as they're always taken in that
> order. lockdep should complain (much more verbosely) if this is not
> done consistently.

That makes sense. So is there any clue as to why it's blocking inside the
second mutex_lock_nested?

> I'm afraid this doesn't explain what's going wrong. But if there are
> any more messages from lockdep further up the log (like, 15 minutes
> earlier), they might do.

Unfortunately not, the previous line in the log is the last message from
boot time:

[ 38.624072] vnet0: no IPv6 routers present

Is there a way I can persuade crash(8) to tell me which process currently
has the lock in question?

Do you have any advice as to any more debug stuff I should try turning on
when compiling the kernel?

Thanks for your help.

Mike.
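The ordering rule quoted above (two locks of the same class are safe as long as they are always taken in the same order) can be illustrated outside the kernel. What follows is only a toy, single-threaded sketch in Python and not lockdep's actual algorithm: the class name OrderChecker and the lock names "child" and "parent" are hypothetical, chosen purely for illustration. It records every observed "A held while taking B" pair and complains when the reverse order later appears.

```python
class OrderChecker:
    """Toy lock-order recorder (illustrative only, not how lockdep is implemented)."""

    def __init__(self):
        self.seen = set()   # (held_first, taken_second) pairs observed so far
        self.held = []      # names of locks currently held, in acquisition order

    def acquire(self, name):
        # Refuse if this acquisition reverses an order recorded earlier.
        for held in self.held:
            if (name, held) in self.seen:
                raise RuntimeError(
                    "inconsistent order: %s then %s, but %s then %s was seen before"
                    % (held, name, name, held)
                )
        # Record the new ordering facts, then take the lock.
        for held in self.held:
            self.seen.add((held, name))
        self.held.append(name)

    def release(self, name):
        self.held.remove(name)


if __name__ == "__main__":
    checker = OrderChecker()
    # Consistent order: child first, then parent.
    checker.acquire("child")
    checker.acquire("parent")
    checker.release("parent")
    checker.release("child")
    # Reversed order: the checker objects.
    checker.acquire("parent")
    try:
        checker.acquire("child")
    except RuntimeError as err:
        print("detected:", err)
```

A checker of this kind only reports inconsistent ordering. It says nothing about a lock that is simply never released, which is consistent with lockdep staying silent in the log above.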
Bug#754354: Something holds dentry-related mutex forever in Wheezy amd64 kernel
I compiled my own version of the Debian 3.2.60-1+deb7u3 kernel with
CONFIG_LOCKDEP and panic on hung task enabled.

From the crash dump:

[25202.156175] INFO: task nfsd:3247 blocked for more than 900 seconds.
[25202.162565] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[25202.170432] nfsd D 88080aa0eca8 0 3247 2 0x
[25202.170444] 88080a8e19f0 0046 0006 8808
[25202.170458] 88080aa0e9c0 88080a8e1fd8 88080a8e1fd8 001d4040
[25202.170472] 88040e9926c0 88080aa0e9c0 8138d6da 0001a04c47dd
[25202.170488] Call Trace:
[25202.170504] [] ? __mutex_lock_common+0x236/0x379
[25202.170531] [] ? fh_lock_nested+0x4d/0x61 [nfsd]
[25202.170542] [] schedule+0x55/0x57
[25202.170552] [] __mutex_lock_common+0x243/0x379
[25202.170569] [] ? fh_lock_nested+0x4d/0x61 [nfsd]
[25202.170581] [] mutex_lock_nested+0x2a/0x31
[25202.170598] [] fh_lock_nested+0x4d/0x61 [nfsd]
[25202.170610] [] ? sched_clock+0x9/0xd
[25202.170626] [] nfsd_lookup_dentry+0x196/0x227 [nfsd]
[25202.170646] [] nfsd4_secinfo.part.15+0x26/0x9e [nfsd]
[25202.170666] [] nfsd4_secinfo+0x4d/0x5b [nfsd]
[25202.170688] [] nfsd4_proc_compound+0x265/0x43e [nfsd]
[25202.170703] [] nfsd_dispatch+0xe2/0x1c8 [nfsd]
[25202.170734] [] svc_process_common+0x2cf/0x4d0 [sunrpc]
[25202.170759] [] svc_process+0x118/0x136 [sunrpc]
[25202.170773] [] nfsd+0xeb/0x131 [nfsd]
[25202.170796] [] ? 0xa04c0fff
[25202.170806] [] kthread+0xa3/0xab
[25202.170815] [] kernel_thread_helper+0x4/0x10
[25202.170823] [] ? retint_restore_args+0x13/0x13
[25202.170830] [] ? __init_kthread_worker+0x53/0x53
[25202.170837] [] ? gs_change+0x13/0x13
[25202.170842] 1 lock held by nfsd/3247:
[25202.170845] #0: (&sb->s_type->i_mutex_key#13){+.+.+.}, at: [] fh_lock_nested+0x4d/0x61 [nfsd]
[25202.170870] Kernel panic - not syncing: hung_task: blocked tasks

I'm no expert at interpreting lockdep output but I think this is saying
that nfsd is taking a nested lock and then deadlocks trying to take it
again (which presumably shouldn't happen because it is nested.)

Mike.

--
To UNSUBSCRIBE, email to debian-bugs-rc-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Bug#733059: gnome-screensaver cannot be launched by alternative gnome-session
On Thursday 31 July 2014 at 10:56:39 +0100, Simon McVittie wrote:
> On Sun, 18 May 2014 at 20:36:42 +0100, Mike Crowe wrote:
> > gnome-settings-daemon, nm-applet and (my) i3-gnome are launched correctly
> > yet gnome-screensaver is not. This appears to be due to the following line
> > in /usr/share/gnome/autostart/gnome-screensaver.desktop:
> >
> > AutostartCondition=GNOME3 if-session gnome-flashback
[snip]
> The upstream solution to this appears to have been to remove
> gnome-screensaver from the autostart directory entirely, so that
> it will be started by exactly those GNOME sessions that list it as
> a required component (notably gnome-flashback, which is part of
> src:gnome-panel, and your custom i3-gnome session).
>
> https://bugzilla.gnome.org/show_bug.cgi?id=683060
> https://git.gnome.org/browse/gnome-screensaver/commit/?id=1940dc6bc8ad5ee2c029714efb1276c05ca80bd4
>
> Could you try that, please?

Manually implementing the equivalent of that commit worked for me with my
session explicitly listing gnome-screensaver.

> gnome-screensaver is essentially dead upstream - it's part of
> gnome-session-flashback (the former gnome-session-fallback).
> Both MATE and Cinnamon appear to have either forked or replaced
> gnome-screensaver in their environments, which seems a shame...

I'm mainly interested in locking and user-switching so I tried to work out
how to just tell gdm to do that for me without luck. More digging required
I think.

Thanks for your help.

Mike.
Bug#754354: Something holds dentry-related mutex forever in Wheezy amd64 kernel
On Monday 21 July 2014 at 15:01:31 +0100, Mike Crowe wrote:
> It is possible that I'm seeing the same problem. Our AMD Opteron 4386 (16
> cores) machine is also getting stuck with lots of hung tasks.
[snip]
> PID: 4087 TASK: 88040ea63840 CPU: 2 COMMAND: "nfsd"
> #0 [8804034b9c00] __schedule at 8134f195
> #1 [8804034b9c88] __mutex_lock_common.isra.5 at 8134fb74
> #2 [8804034b9cf8] mutex_lock at 8134fa62
> #3 [8804034b9d18] fh_lock_nested.isra.6 at a043d63c [nfsd]
> #4 [8804034b9d28] nfsd_lookup_dentry at a043df1a [nfsd]
> #5 [8804034b9d98] nfsd4_secinfo.part.15 at a0447692 [nfsd]
> #6 [8804034b9dc8] nfsd4_proc_compound at a04468d6 [nfsd]
> #7 [8804034b9e18] nfsd_dispatch at a043a7cd [nfsd]
> #8 [8804034b9e48] svc_process_common at a0336c3f [sunrpc]
> #9 [8804034b9eb8] svc_process at a0337050 [sunrpc]
> #10 [8804034b9ed8] nfsd at a043a0e3 [nfsd]
> #11 [8804034b9ef8] kthread at 8105f701
> #12 [8804034b9f48] kernel_thread_helper at 813576f4
>
> ii linux-image-amd64 3.2+46
> ii nfs-kernel-server 1:1.2.6-4

That version information wasn't very useful, was it. :(

I believe that this crash was from linux-image-3.2.0-4-amd64 3.2.60-1+deb7u1.

I've just had the same failure happen with 3.2.60-1+deb7u3.

Mike.
Bug#754354: Something holds dentry-related mutex forever in Wheezy amd64 kernel
It is possible that I'm seeing the same problem. Our AMD Opteron 4386 (16
cores) machine is also getting stuck with lots of hung tasks. Although it
responds to ping, and even a KVM virtual machine running on it appears to
continue working correctly, the host itself is locked up.

This happens once a week - probably when the machine is under the most
direct CPU load and NFS load. Once the machine is in this state I can type
in a username at the login prompt but no password prompt ever appears.

I forced a crashdump and it contained hundreds of tasks with backtraces
involving a mutex_lock in walk_component or nfsd_lookup_dentry which look
similar to Alexander's:

PID: 499 TASK: 880490a29080 CPU: 11 COMMAND: "nrpe"
#0 [880454e099a8] __schedule at 8134f195
#1 [880454e09a30] __mutex_lock_common.isra.5 at 8134fb74
#2 [880454e09aa0] mutex_lock at 8134fa62
#3 [880454e09ac0] walk_component at 81103868
#4 [880454e09b30] link_path_walk at 811040c1
#5 [880454e09bc0] path_openat at 8110611d
#6 [880454e09c50] do_filp_open at 8110646d
#7 [880454e09d20] open_exec at 810fed80
#8 [880454e09d40] load_elf_binary at 81135939
#9 [880454e09e50] search_binary_handler at 810ff7fd
#10 [880454e09ea0] do_execve_common.isra.24 at 81100551
#11 [880454e09f10] sys_execve at 81014dd2
#12 [880454e09f50] stub_execve at 813559ec
    RIP: 7fcc8991ca87 RSP: 7fffe8b91ef8 RFLAGS: 0246
    RAX: 003b RBX: 0003 RCX:
    RDX: 0164d180 RSI: 7fffe8b91f10 RDI: 7fcc899bc3ad
    RBP: 0003 R8: R9: 01f2
    R10: 7fcc8a88f9d0 R11: 0246 R12: 7fffe8b91f10
    R13: 0400 R14: 0001 R15: 7fffe8b91f10
    ORIG_RAX: 003b CS: 0033 SS: 002b

and:

PID: 4087 TASK: 88040ea63840 CPU: 2 COMMAND: "nfsd"
#0 [8804034b9c00] __schedule at 8134f195
#1 [8804034b9c88] __mutex_lock_common.isra.5 at 8134fb74
#2 [8804034b9cf8] mutex_lock at 8134fa62
#3 [8804034b9d18] fh_lock_nested.isra.6 at a043d63c [nfsd]
#4 [8804034b9d28] nfsd_lookup_dentry at a043df1a [nfsd]
#5 [8804034b9d98] nfsd4_secinfo.part.15 at a0447692 [nfsd]
#6 [8804034b9dc8] nfsd4_proc_compound at a04468d6 [nfsd]
#7 [8804034b9e18] nfsd_dispatch at a043a7cd [nfsd]
#8 [8804034b9e48] svc_process_common at a0336c3f [sunrpc]
#9 [8804034b9eb8] svc_process at a0337050 [sunrpc]
#10 [8804034b9ed8] nfsd at a043a0e3 [nfsd]
#11 [8804034b9ef8] kthread at 8105f701
#12 [8804034b9f48] kernel_thread_helper at 813576f4

and:

PID: 5013 TASK: 880805c8b180 CPU: 8 COMMAND: "getty"
#0 [88080cb8b9a8] __schedule at 8134f195
#1 [88080cb8ba30] __mutex_lock_common.isra.5 at 8134fb74
#2 [88080cb8baa0] mutex_lock at 8134fa62
#3 [88080cb8bac0] walk_component at 81103868
#4 [88080cb8bb30] link_path_walk at 811040c1
#5 [88080cb8bbc0] path_openat at 8110611d
#6 [88080cb8bc50] do_filp_open at 8110646d
#7 [88080cb8bd20] open_exec at 810fed80
#8 [88080cb8bd40] load_elf_binary at 81135939
#9 [88080cb8be50] search_binary_handler at 810ff7fd
#10 [88080cb8bea0] do_execve_common.isra.24 at 81100551
#11 [88080cb8bf10] sys_execve at 81014dd2
#12 [88080cb8bf50] stub_execve at 813559ec
    RIP: 7f0d1ed74a87 RSP: 7fffab157528 RFLAGS: 0206
    RAX: 003b RBX: RCX:
    RDX: 7fffab159ee8 RSI: 7fffab157600 RDI: 00405d7c
    RBP: 0003 R8: R9:
    R10: R11: 0206 R12: 006075a0
    R13: 011da750 R14: R15:
    ORIG_RAX: 003b CS: 0033 SS: 002b

ii linux-image-amd64 3.2+46
ii nfs-kernel-server 1:1.2.6-4

Mike.
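With hundreds of blocked tasks in a dump, it can help to summarise them mechanically. The following is a minimal Python sketch, assuming the backtraces have been saved as text in the same layout as the excerpts above (for example from crash's "foreach bt"). The function names parse_backtraces and mutex_waiters are hypothetical helpers, not part of crash. It groups tasks by the function that called mutex_lock, so it shows who is waiting and where. It does not identify the lock holder.

```python
import re
from collections import defaultdict

# Task header as printed by crash, e.g.  PID: 4087 TASK: 88040ea63840 CPU: 2 COMMAND: "nfsd"
HEADER = re.compile(r'PID:\s*(\d+)\s*TASK:\s*(\S+)\s+CPU:\s*(\d+)\s+COMMAND:\s*"([^"]*)"')
# Stack frame, e.g.  #3 [8804034b9d18] fh_lock_nested.isra.6 at a043d63c [nfsd]
FRAME = re.compile(r'#(\d+)\s+\[[0-9a-f]+\]\s+(\S+)\s+at\s+[0-9a-f]+')


def parse_backtraces(text):
    """Return a list of (pid, command, [function names, innermost first])."""
    tasks = []
    headers = list(HEADER.finditer(text))
    for i, header in enumerate(headers):
        end = headers[i + 1].start() if i + 1 < len(headers) else len(text)
        body = text[header.end():end]
        funcs = [m.group(2) for m in FRAME.finditer(body)]
        tasks.append((int(header.group(1)), header.group(4), funcs))
    return tasks


def mutex_waiters(tasks):
    """Group tasks sitting in mutex_lock* by the function that called it."""
    groups = defaultdict(list)
    for pid, command, funcs in tasks:
        for idx, name in enumerate(funcs):
            if name.startswith("mutex_lock") and idx + 1 < len(funcs):
                # Drop compiler-generated suffixes such as ".isra.6".
                caller = funcs[idx + 1].split(".")[0]
                groups[caller].append((pid, command))
                break
    return dict(groups)


if __name__ == "__main__":
    sample = '''
PID: 499 TASK: 880490a29080 CPU: 11 COMMAND: "nrpe"
#0 [880454e099a8] __schedule at 8134f195
#1 [880454e09a30] __mutex_lock_common.isra.5 at 8134fb74
#2 [880454e09aa0] mutex_lock at 8134fa62
#3 [880454e09ac0] walk_component at 81103868
PID: 4087 TASK: 88040ea63840 CPU: 2 COMMAND: "nfsd"
#0 [8804034b9c00] __schedule at 8134f195
#1 [8804034b9c88] __mutex_lock_common.isra.5 at 8134fb74
#2 [8804034b9cf8] mutex_lock at 8134fa62
#3 [8804034b9d18] fh_lock_nested.isra.6 at a043d63c [nfsd]
'''
    for caller, waiters in mutex_waiters(parse_backtraces(sample)).items():
        print(caller, waiters)
```

A large group under walk_component alongside a smaller one under fh_lock_nested would match the pattern described above, but it only narrows down where to look next.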
Bug#733059: gnome-screensaver cannot be launched by alternative gnome-session
I'm not absolutely sure that this is what the original reporter meant but
I've just run into this problem when using gnome-session to launch my own
"i3-gnome" session. My session file contains:

[GNOME Session]
Name=i3 Gnome session
RequiredComponents=gnome-settings-daemon;gnome-screensaver;nm-applet;i3-gnome

gnome-settings-daemon, nm-applet and (my) i3-gnome are launched correctly
yet gnome-screensaver is not. This appears to be due to the following line
in /usr/share/gnome/autostart/gnome-screensaver.desktop:

AutostartCondition=GNOME3 if-session gnome-flashback

(I assume that modern gnome-shell launches gnome-screensaver itself now.)

If I comment the line out then gnome-screensaver is launched correctly.
Perhaps there is a way to invert this condition so that gnome-screensaver
is enabled except on GNOME3?

Thanks.

Mike.
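To make the effect of that AutostartCondition line concrete, here is a small Python sketch that reads the key from a .desktop file and evaluates it for a given session name. It is an illustration under stated assumptions and not gnome-session's implementation: the helper names autostart_condition and would_autostart are hypothetical, the session name "i3-gnome" is assumed to be what gnome-session compares against, and only the "GNOME3 if-session NAME" form quoted above is handled.

```python
import configparser


def autostart_condition(desktop_text):
    """Return the AutostartCondition value from .desktop file text, or None."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # .desktop keys are case-sensitive
    parser.read_string(desktop_text)
    return parser.get("Desktop Entry", "AutostartCondition", fallback=None)


def would_autostart(condition, session_name):
    """Evaluate the one condition form seen in this report.

    Assumption: 'GNOME3 if-session NAME' means "start only when the running
    session is called NAME". Any other form is rejected rather than guessed.
    """
    if condition is None:
        return True  # no condition: nothing prevents the autostart
    parts = condition.split()
    if len(parts) == 3 and parts[:2] == ["GNOME3", "if-session"]:
        return parts[2] == session_name
    raise ValueError("unhandled AutostartCondition: %r" % condition)


if __name__ == "__main__":
    text = (
        "[Desktop Entry]\n"
        "Name=Screensaver\n"
        "Exec=gnome-screensaver\n"
        "AutostartCondition=GNOME3 if-session gnome-flashback\n"
    )
    cond = autostart_condition(text)
    for session in ("gnome-flashback", "i3-gnome"):
        print(session, would_autostart(cond, session))
```

Under that reading, the condition is false for any session not named gnome-flashback, which matches the observation that commenting the line out lets gnome-screensaver start.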
Bug#350183: this isn't limited to network devices
On Feb 14, Mike Crowe <[EMAIL PROTECTED]> wrote:
>> I don't think this issue is limited to network devices.

On Tue, Feb 14, 2006 at 01:46:30PM +0100, Marco d'Itri wrote:
> You are showing some totally unrelated issue which is obviously a
> bug in lvm2.

I admit that the log excerpt I provided was rather confusing and did not
contain the pertinent information. The issue I was aiming to describe is
unrelated to LVM as far as I can determine. That vgchange not found error
occurs in both successful and unsuccessful boots.

I've attached two separate logs - one from a successful boot and one from
a failed boot. I hope it is clear that these two logs show sata, megaraid,
aic79xx and USB modules scanning for devices in a different order. This
means that on some boots the boot disk on megaraid is sda and on others
it's sdb.

I see no reason why this randomness could not also occur between SATA, and
on/off board SCSI devices but don't have the hardware available to prove
this at the moment.

--
Mike Crowe

LILO 22.6.1 Loading Linux...
BIOS data check successful
Bootdata ok (command line is auto BOOT_IMAGE=Linux ro root=801 console=tty0 console=ttyS0,9600n8)
Linux version 2.6.15-1-amd64-k8-smp (Debian 2.6.15-2bpo1) ([EMAIL PROTECTED]) (gcc version 3.3.5 (Debian 1:3.3.5-13)) #1 SMP Thu Jan 26 00:57:18 UTC 2006
BIOS-provided physical RAM map:
BIOS-e820: - 0009fc00 (usable)
BIOS-e820: 0009fc00 - 000a (reserved)
BIOS-e820: 000e - 0010 (reserved)
BIOS-e820: 0010 - fbff (usable)
BIOS-e820: fbff - fbfff000 (ACPI data)
BIOS-e820: fbfff000 - fc00 (ACPI NVS)
BIOS-e820: ff78 - 0001 (reserved)
BIOS-e820: 0001 - 0002 (usable)
SRAT: PXM 0 -> APIC 0 -> Node 0
SRAT: PXM 0 -> APIC 1 -> Node 0
SRAT: PXM 1 -> APIC 2 -> Node 1
SRAT: PXM 1 -> APIC 3 -> Node 1
SRAT: Node 0 PXM 0 10-fc00
SRAT: Node 1 PXM 1 1-2
SRAT: Node 0 PXM 0 10-1
SRAT: Node 0 PXM 0 0-1
Bootmem setup node 0 -0001
Bootmem setup node 1 0001-0002
ACPI: PM-Timer IO Port: 0x1008
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 15:1 APIC version 16
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
Processor #1 15:1 APIC version 16
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x02] enabled)
Processor #2 15:1 APIC version 16
ACPI: LAPIC (acpi_id[0x04] lapic_id[0x03] enabled)
Processor #3 15:1 APIC version 16
ACPI: IOAPIC (id[0x04] address[0xfec0] gsi_base[0])
IOAPIC[0]: apic_id 4, version 17, address 0xfec0, GSI 0-23
ACPI: IOAPIC (id[0x05] address[0xfebff000] gsi_base[24])
IOAPIC[1]: apic_id 5, version 17, address 0xfebff000, GSI 24-27
ACPI: IOAPIC (id[0x06] address[0xfebfe000] gsi_base[28])
IOAPIC[2]: apic_id 6, version 17, address 0xfebfe000, GSI 28-31
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
Setting APIC routing to flat
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at fc40 (gap: fc00:378)
Checking aperture...
CPU 0: aperture @ 800 size 32 MB
Aperture from northbridge cpu 0 too small (32 MB)
No AGP bridge found
Your BIOS doesn't leave a aperture memory hole
Please enable the IOMMU option in the BIOS setup
This costs you 64 MB of RAM
Mapping aperture over 65536 KB of RAM @ 800
Built 2 zonelists
Kernel command line: auto BOOT_IMAGE=Linux ro root=801 console=tty0 console=ttyS0,9600n8
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 131072 bytes)
time.c: Using 3.579545 MHz PM timer.
time.c: Detected 2191.499 MHz processor.
Console: colour VGA+ 80x50
Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes)
Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes)
Memory: 8120696k/8388608k available (1861k kernel code, 201924k reserved, 870k data, 208k init)
Calibrating delay using timer specific routine.. 4389.36 BogoMIPS (lpj=2194681)
Security Framework v1.0.0 initialized
SELinux: Disabled at boot.
Capability LSM initialized
Mount-cache hash table entries: 256
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 0(2) -> Node 0 -> Core 0
mtrr: v2.0 (20020519)
Using local APIC timer interrupts.
Detected 12.451 MHz APIC timer.
Booting processor 1/4 APIC 0x1
Initializing CPU#1
Calibrating delay using timer specific routine.. 4387.71 BogoMIPS (lpj=2193855)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 1(2) -> Node 0 -> Core 1
Dual Core AMD Opteron(tm) Processor 275 stepping 02
CPU 1: Syncing TSC
Bug#350183: this isn't limited to network devices
On Feb 14, Mike Crowe <[EMAIL PROTECTED]> wrote:
>> This means that on some boots the boot disk on megaraid is sda and
>> on others it's sdb.

On Tue, Feb 14, 2006 at 10:12:09PM +0100, Marco d'Itri wrote:
> It will happen among devices handled by different drivers.
> The solution is to use the /dev/disk/ persistent symlinks.

As long as the installer knows to do that when it writes /etc/fstab then
users won't be shocked by a system that sometimes won't boot.

Another workaround is to use yaird rather than initramfs-tools because it
only loads stuff that's necessary for boot from the initrd.

Thanks for your help.

--
Mike Crowe
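As an illustration of the persistent symlinks mentioned above, the Python sketch below lists the links in one /dev/disk/ subdirectory and shows which device node each currently resolves to. The by-uuid subdirectory is assumed here as one example, and the function names persistent_names and find_persistent_name are hypothetical. The point is that the link name stays stable across boots while the sdX node it resolves to may differ.

```python
import os


def persistent_names(by_dir="/dev/disk/by-uuid"):
    """Map each symlink name in by_dir to the device node it points at now."""
    mapping = {}
    if not os.path.isdir(by_dir):
        return mapping  # directory absent (e.g. no udev): nothing to report
    for name in sorted(os.listdir(by_dir)):
        link = os.path.join(by_dir, name)
        if os.path.islink(link):
            mapping[name] = os.path.realpath(link)
    return mapping


def find_persistent_name(device, by_dir="/dev/disk/by-uuid"):
    """Return the persistent link path for a device node, or None if not found."""
    target = os.path.realpath(device)
    for name, node in persistent_names(by_dir).items():
        if node == target:
            return os.path.join(by_dir, name)
    return None


if __name__ == "__main__":
    for name, node in persistent_names().items():
        print(name, "->", node)
```

An /etc/fstab entry written against the link path returned by find_persistent_name, rather than against /dev/sda1 or /dev/sdb1, would not depend on the order in which the drivers happened to scan.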
Bug#350183: this isn't limited to network devices
I don't think this issue is limited to network devices. I have a machine
with onboard SCSI, SATA and RAID on a separate card. Even though the
onboard SCSI and SATA have no drives connected the machine fails to boot
and drops me at a busybox shell prompt inside the initrd:

Begin: Mounting root file system ...
Begin: Running /scripts/local-top ...
/scripts/local-top/lvm: 36: vgchange: not found
Done.
/init: 1: cannot open /dev/root: No such device or address
Begin: Running /scripts/local-premount ...
Done.
Usage: modprobe [-v] [-V] [-C config-file] [-n] [-i] [-q] [-b] [-o ] [parameters...]
modprobe -r [-n] [-i] [-v] ...
modprobe -l -t [ -a ...]
mount: Cannot read /etc/fstab: No such file or directory
Begin: Running /scripts/log-bottom ...
Done.
Done.
Begin: Running /scripts/init-bottom ...
mount: Mounting /root/dev on /dev/.static/dev failed: No such file or directory
Done.
mount: Mounting /sys on /root/sys failed: No such file or directory
mount: Mounting /proc on /root/proc failed: No such file or directory
Target filesystem doesn't have /sbin/init

I also get the network interface reordering problem described in this bug
report on the machine.

I've worked around the problem for now by disabling the unused facilities
on the motherboard.

The machine is running sarge/amd64 with 2.6.15 kernel and associated
dependencies from backports.org:

ii linux-image-2.6.15-1-amd64-k8-smp 2.6.15-2bpo1 Linux kernel 2.6.15 image on AMD64 K8 SMP machines
ii udev 0.081-0bpo1 /dev/ and hotplug management daemon

--
Mike Crowe