Bug#426571: nfs-kernel-server: Frequent "can't read superblock", rpc.mountd spinning at 100% CPU
On Fri, Jun 01, 2007 at 02:48:42PM +0200, Tobias Diedrich wrote:
> I also noticed that while I had upgraded nfs-kernel-server I was
> still using an older version of libblkid1.  Maybe that upgrade fixed
> it.

I remember issues with it, yes -- those were leaks, but still...

/* Steinar */
-- 
Homepage: http://www.sesse.net/

-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]
Bug#426571: nfs-kernel-server: Frequent "can't read superblock", rpc.mountd spinning at 100% CPU
Steinar H. Gunderson wrote:
> On Tue, May 29, 2007 at 07:21:33PM +0200, Tobias Diedrich wrote:
> > After upgrading nfs-kernel-server to a new testing version,
> > nfs started behaving erratically.  Upgrading to the newest unstable
> > version did not help.
>
> Hm, that's odd. Could you please take it upstream? [EMAIL PROTECTED]
> should probably be able to help you.
>
> > [blkid_probe_all_new taking _really long_]
>
> That's a bit odd, too. :-)
>
> > blkid_probe_all_new seems to scan /proc/partitions and then do some
> > probing on the devices?
>
> Mm. I take it you don't have any really weird devices attached that
> could be responsible for the slowdown?

Not that I can think of.
However, I just upgraded the server to new hardware (faster CPU,
bigger disks) and the problem seems to be gone now...
I probably should try ltracing rpc.mountd to look at the execution
time of blkid_probe_all_new again, just in case it's still
excessive, but the faster CPU might compensate for that.

Hmm, no... Seems fine now:
14:41:50.754679 blkid_probe_all_new(0x8063810, 0x805c940, 34, 17, 0xf7ef4023) = 0
14:41:51.094040 __xstat64(3, "/home/ranma/mail", 0xffe7ef94) = 0

I also noticed that while I had upgraded nfs-kernel-server I was
still using an older version of libblkid1.  Maybe that upgrade fixed
it.

Anyway, I should be able to boot up the old components to do some
tests with them...  Maybe it's a timing problem that only shows up
on slow CPUs?  (PentiumIII 800MHz vs. Athlon64 3200+)

-- 
Tobias		PGP: http://9ac7e0bc.uguu.de
This email is made from 100% recycled bits.
Bug#426571: nfs-kernel-server: Frequent "can't read superblock", rpc.mountd spinning at 100% CPU
Tobias Diedrich wrote:
> However, I just upgraded the server to new hardware (faster CPU,
> bigger disks) and the problem seems to be gone now...
[...]
> Anyway, I should be able to boot up the old components to do some
> tests with them...  Maybe it's a timing problem that only shows up
> on slow CPUs?  (PentiumIII 800MHz vs. Athlon64 3200+)

Unfortunately (or maybe fortunately ;)), while I did boot up the old
parts, I could not reproduce the problem again (during the short
testing time).  I also tried downgrading libblkid1 back to testing,
but it still worked flawlessly.

-- 
Tobias		PGP: http://9ac7e0bc.uguu.de
This email is made from 100% recycled bits.
Bug#426571: nfs-kernel-server: Frequent "can't read superblock", rpc.mountd spinning at 100% CPU
On Tue, May 29, 2007 at 07:21:33PM +0200, Tobias Diedrich wrote:
> After upgrading nfs-kernel-server to a new testing version,
> nfs started behaving erratically.  Upgrading to the newest unstable
> version did not help.

Hm, that's odd. Could you please take it upstream? [EMAIL PROTECTED]
should probably be able to help you.

> [blkid_probe_all_new taking _really long_]

That's a bit odd, too. :-)

> blkid_probe_all_new seems to scan /proc/partitions and then do some
> probing on the devices?

Mm. I take it you don't have any really weird devices attached that
could be responsible for the slowdown?

/* Steinar */
-- 
Homepage: http://www.sesse.net/
Bug#426571: nfs-kernel-server: Frequent "can't read superblock", rpc.mountd spinning at 100% CPU
Package: nfs-kernel-server
Version: 1:1.1.0-2
Severity: important

After upgrading nfs-kernel-server to a new testing version, nfs
started behaving erratically.  Upgrading to the newest unstable
version did not help.

Mounting now often fails with "can't read superblock" and yields an
almost immediate "nfs: server nukunuku not responding, still trying"
in the kernel log.  Also, when the mount did work, accessing a
directory after some period of inactivity takes forever (again with
"nfs: server nukunuku not responding, still trying" and sometime
later "nfs: server nukunuku OK").

During the "server not responding" time (and even some time after
that, if the mount attempt times out), rpc.mountd spins at 100% CPU.
ltrace reveals that the call to blkid_probe_all_new takes ages:

[rpc.mountd is idle]
19:03:05.293900 select(1024, 0xbfef925c, 0, 0, 0) = 1
[incoming mount request]
19:03:05.301495 svc_getreqset(0xbfef925c, 0xbfef925c, 0, 0, 0
19:03:05.302757 inet_ntoa(0xf108a8c0) = "192.168.8.241"
19:03:05.305276 hosts_ctl(0x805677e, 0x8057485, 0xb7daa6a0, 0x8057485, 0xbfef8bd0) = 1
[...]
[blkid_probe_all_new taking _really long_]
19:03:06.641248 blkid_probe_all_new(0x806ba80, 0x805c940, 34, 17, 0xb7f9f023) = 0
[back from blkid_probe_all_new]
19:03:26.403212 __xstat64(3, "/home/ranma/mail", 0xbfef7914) = 0
19:03:26.404184 blkid_devno_to_devname(64771, 0, 0xbfef7914, 17, 0xb7f9f023) = 0x8063f50
19:03:26.405733 blkid_get_dev(0x806ba80, 0x8063f50, 3, 17, 0xb7f9f023) = 0x806ef00
19:03:26.426065 free(0x8063f50) =
19:03:26.426640 blkid_tag_iterate_begin(0x806ef00, 0x8063f50, 3, 17, 0xb7f9f023) = 0x806ced0
19:03:26.428163 blkid_tag_next(0x806ced0, 0xbfef7974, 0xbfef7978, 17, 0xb7f9f023) = 0
19:03:26.428751 blkid_tag_iterate_end(0x806ced0, 0xbfef7974, 0xbfef7978, 17, 0xb7f9f023) = 0x806ced0
19:03:26.429343 memset(0xbfef79ac, '\000', 16) = 0xbfef79ac
19:03:26.429878 __ctype_b_loc() = 0xb7daa690
19:03:26.430506 fwrite("uuid /ranma/mail ya.is-a-geek.or"..., 5, 1, 0x8063de8) = 1
[...]
[lather, rinse, repeat?]
19:03:27.535890 blkid_probe_all_new(0x806ba80, 0x805c940, 34, 17, 0xb7f9f023) = 0
19:03:44.935949 __xstat64(3, "/home/ranma/mail", 0xbfef7914) = 0
[...]
19:03:45.966056 blkid_probe_all_new(0x806ba80, 0x46460470, 0, 0x465c5cc5, 0) = 0
19:04:03.208417 __xstat64(3, "/home/ranma/mail", 0xbfef9074) = 0
[...]
19:04:03.09 blkid_probe_all_new(0x806ba80, 0x43377aee, 0, 0x4645bb22, 0) = 0
19:04:23.466075 __xstat64(3, "/home", 0xbfef9074) = 0
[I see a pattern]
19:04:24.462181 blkid_probe_all_new(0x806ba80, 0x43377aee, 0, 0x4645bb22, 0) = 0
19:04:41.805271 __xstat64(3, "/usr", 0xbfef9074) = 0
[This might take a while]
19:04:41.817657 blkid_probe_all_new(0x806ba80, 0x43377aee, 0, 0x4645bb22, 0) = 0
19:04:58.288851 __xstat64(3, "/var", 0xbfef9074) = 0
[...]
19:05:52.771674 blkid_probe_all_new(0x806ba80, 0x805c940, 34, 17, 0xb7f9f023) = 0
19:06:13.230946 __xstat64(3, "/home/ranma/mail", 0xbfef8f84) = 0
[...]
19:06:15.953164 select(1024, 0xbfef925c, 0, 0, 0
[finally idle again!]

So a single mount request caused rpc.mountd to spin at 100% CPU for
about three minutes.
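Since ltrace's -tt timestamps mark when each call is entered, the cost of a single blkid_probe_all_new call can be read off the gap between its line and the next one; each gap in the trace above is roughly 17-20 seconds. A minimal sketch of that arithmetic (gap_seconds is a hypothetical helper name, not part of any tool here):

```python
from datetime import datetime

def gap_seconds(earlier, later):
    """Elapsed seconds between two ltrace -tt timestamps (HH:MM:SS.ffffff)."""
    fmt = "%H:%M:%S.%f"
    return (datetime.strptime(later, fmt) - datetime.strptime(earlier, fmt)).total_seconds()

# Gap between the first blkid_probe_all_new line and the following
# __xstat64 line in the trace above: the probe kept rpc.mountd busy
# for almost 20 seconds.
print(gap_seconds("19:03:06.641248", "19:03:26.403212"))  # → 19.761964
```

The same arithmetic applied to the later trace from the faster machine (14:41:50.754679 to 14:41:51.094040) gives about 0.34 seconds, which is why that run "seems fine".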
blkid_probe_all_new seems to scan /proc/partitions and then do some
probing on the devices?  This is my /proc/partitions:

| major minor  #blocks  name
|
|    8     0  160086528 sda
|    8     1     497983 sda1
|    8     2    1461915 sda2
|    8     3    3911827 sda3
|    8     4  150416595 sda4
|    8    16  156290904 sdb
|    8    17     497983 sdb1
|    8    18    1461915 sdb2
|    8    19    3911827 sdb3
|    8    20  150416595 sdb4
|    8    32  156290904 sdc
|    8    33     497983 sdc1
|    8    34    1461915 sdc2
|    8    35    3911827 sdc3
|    8    36  150416595 sdc4
|    9     0     497856 md0
|    9     3  300833024 md3
|    9     2    3911744 md2
|    9     1    1461824 md1
|  253     0    8388608 dm-0
|  253     1    8388608 dm-1
|  253     2  146800640 dm-2
|  253     3    1048576 dm-3
|  253     4   62914560 dm-4
|  253     5   41943040 dm-5

- -- System Information:
Debian Release: lenny/sid
  APT prefers testing
  APT policy: (990, 'testing'), (500, 'unstable')
Architecture: i386 (i686)

Kernel: Linux 2.6.21.1
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages nfs-kernel-server depends on:
ii  libblkid  1.39+1.40-WIP-2006.11.14+dfsg-2  block device id library
ii  libc6     2.5-7                            GNU C Library: Shared libraries
ii  libcomer  1.39+1.40-WIP-2006.11.14+dfsg-2  common error description library
ii  libgssap  0.10-4                           A mechanism-switch gssapi library
ii  libkrb53  1.4.4-8                          MIT Kerberos runtime libraries
ii  libnfsid  0.18-0                           An nfs idmapping library
ii  librpcse  0.14-2
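For context, the enumeration step itself (reading /proc/partitions and collecting device entries) is cheap; the time must go into probing the devices afterwards. A rough sketch of what parsing that file involves, under the assumption that libblkid reads it in roughly this shape (parse_partitions is a hypothetical name, not libblkid's actual code):

```python
def parse_partitions(text):
    """Parse /proc/partitions-style text into (major, minor, blocks, name) tuples."""
    devices = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the header row, blank lines, and anything malformed
        if len(fields) != 4 or not fields[0].isdigit():
            continue
        major, minor, blocks = (int(f) for f in fields[:3])
        devices.append((major, minor, blocks, fields[3]))
    return devices

# A small excerpt of the table above, in the kernel's own layout
sample = """major minor  #blocks  name

   8     0  160086528 sda
   8     1     497983 sda1
   9     0     497856 md0
"""
print(parse_partitions(sample))
```

Run against the full table above, this yields 25 devices (three disks with four partitions each, four md arrays, six dm volumes), so a probe cost of a second or two per device would account for the observed 17-20 second calls.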