Bug#426571: nfs-kernel-server: Frequent "can't read superblock", rpc.mountd spinning at 100% CPU

2007-06-03 Thread Steinar H. Gunderson
On Fri, Jun 01, 2007 at 02:48:42PM +0200, Tobias Diedrich wrote:
> I also noticed that while I had upgraded nfs-kernel-server I was
> still using an older version of libblkid1.  Maybe that upgrade fixed
> it.

I remember issues with it, yes -- those were leaks, but still...

/* Steinar */
-- 
Homepage: http://www.sesse.net/


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]



Bug#426571: nfs-kernel-server: Frequent "can't read superblock", rpc.mountd spinning at 100% CPU

2007-06-02 Thread Tobias Diedrich
Steinar H. Gunderson wrote:
> On Tue, May 29, 2007 at 07:21:33PM +0200, Tobias Diedrich wrote:
> > After upgrading nfs-kernel-server to a new testing version,
> > nfs started behaving erratically. Upgrade to newest unstable version did
> > not help.
> 
> Hm, that's odd. Could you please take it upstream? [EMAIL PROTECTED]
> should probably be able to help you.
> 
> > [blkid_probe_all_new taking _really long_]
> 
> That's a bit odd, too. :-)
> 
> > blk_id_probe_all_new seems to scan /proc/partitions and the do some
> > probeing on the devices?
> 
> Mm. I take it you don't have any really weird devices attached that could be
> responsible for the slowdown?

Not that I can think of.
However, I just upgraded the server to new hardware (faster cpu,
bigger disks) and the problem seems to be gone now...
I probably should try ltracing rpc.mountd to look at the execution
time of blk_id_probe_all_new again, just in case it's still
excessive, but the faster CPU might compensate for that.

Hmm, no... Seems fine now:
14:41:50.754679 blkid_probe_all_new(0x8063810, 0x805c940, 34, 17,
0xf7ef4023) = 0
14:41:51.094040 __xstat64(3, "/home/ranma/mail", 0xffe7ef94) = 0

I also noticed that while I had upgraded nfs-kernel-server I was
still using an older version of libblkid1.  Maybe that upgrade fixed
it.

Anyway, I should be able to boot up the old components to do some
tests with them...  Maybe it's a timing problem that only shows up
on slow CPUs? (PentiumIII 800MHz vs. Athlon64 3200+)

-- 
Tobias  PGP: http://9ac7e0bc.uguu.de
このメールは十割再利用されたビットで作られています。


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]



Bug#426571: nfs-kernel-server: Frequent "can't read superblock", rpc.mountd spinning at 100% CPU

2007-06-02 Thread Tobias Diedrich
Tobias Diedrich wrote:
> Steinar H. Gunderson wrote:
> > On Tue, May 29, 2007 at 07:21:33PM +0200, Tobias Diedrich wrote:
> > > After upgrading nfs-kernel-server to a new testing version,
> > > nfs started behaving erratically. Upgrade to newest unstable version did
> > > not help.
> > 
> > Hm, that's odd. Could you please take it upstream? [EMAIL PROTECTED]
> > should probably be able to help you.
> > 
> > > [blkid_probe_all_new taking _really long_]
> > 
> > That's a bit odd, too. :-)
> > 
> > > blk_id_probe_all_new seems to scan /proc/partitions and the do some
> > > probeing on the devices?
> > 
> > Mm. I take it you don't have any really weird devices attached that could be
> > responsible for the slowdown?
> 
> Not that I can think of.
> However, I just upgraded the server to new hardware (faster cpu,
> bigger disks) and the problem seems to be gone now...
> I probably should try ltracing rpc.mountd to look at the execution
> time of blk_id_probe_all_new again, just in case it's still
> excessive, but the faster CPU might compensate for that.
> 
> Hmm, no... Seems fine now:
> 14:41:50.754679 blkid_probe_all_new(0x8063810, 0x805c940, 34, 17,
> 0xf7ef4023) = 0
> 14:41:51.094040 __xstat64(3, "/home/ranma/mail", 0xffe7ef94) = 0
> 
> I also noticed that while I had upgraded nfs-kernel-server I was
> still using an older version of libblkid1.  Maybe that upgrade fixed
> it.
> 
> Anyway, I should be able to boot up the old components to do some
> tests with them...  Maybe it's a timing problem that only shows up
> on slow CPUs? (PentiumIII 800MHz vs. Athlon64 3200+)

Unfortunately (or maybe fortunately ;)), while I did boot up the old
parts I could not reproduce the problem again (during the short
testing time).  I also tried downgrading libblkid1 back to testing
but it still worked flawlessly.

-- 
Tobias  PGP: http://9ac7e0bc.uguu.de
このメールは十割再利用されたビットで作られています。


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]



Bug#426571: nfs-kernel-server: Frequent "can't read superblock", rpc.mountd spinning at 100% CPU

2007-05-30 Thread Steinar H. Gunderson
On Tue, May 29, 2007 at 07:21:33PM +0200, Tobias Diedrich wrote:
> After upgrading nfs-kernel-server to a new testing version,
> nfs started behaving erratically. Upgrade to newest unstable version did
> not help.

Hm, that's odd. Could you please take it upstream? [EMAIL PROTECTED]
should probably be able to help you.

> [blkid_probe_all_new taking _really long_]

That's a bit odd, too. :-)

> blk_id_probe_all_new seems to scan /proc/partitions and the do some
> probeing on the devices?

Mm. I take it you don't have any really weird devices attached that could be
responsible for the slowdown?

/* Steinar */
-- 
Homepage: http://www.sesse.net/


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]



Bug#426571: nfs-kernel-server: Frequent "can't read superblock", rpc.mountd spinning at 100% CPU

2007-05-29 Thread Tobias Diedrich
Package: nfs-kernel-server
Version: 1:1.1.0-2
Severity: important

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1


After upgrading nfs-kernel-server to a new testing version,
nfs started behaving erratically. Upgrade to newest unstable version did
not help.

mounting now often fails with "can't read superblock" and yields an
almost immediat "nfs: server nukunuku not responding, still trying" in
the kernel log.

Also, when the mount did work, accessing a directory after some period
of inactivity takes forever (again with "nfs: server nukunuku not
responding, still trying" and sometime later "nfs: server nukunuku OK").

During the "server not responding" time (and even some time after that,
if the mount attempt times out) rpc.mountd spins at 100% CPU.
ltrace reveals that the call to blkid_probe_all_new takes ages:

[rpc.mountd is idle]
19:03:05.293900 select(1024, 0xbfef925c, 0, 0, 0)  = 1
[incoming mount request]
19:03:05.301495 svc_getreqset(0xbfef925c, 0xbfef925c, 0, 0, 0

19:03:05.302757 inet_ntoa(0xf108a8c0)  =
"192.168.8.241"
19:03:05.305276 hosts_ctl(0x805677e, 0x8057485, 0xb7daa6a0, 0x8057485,
0xbfef8bd0) = 1
[...]
[blkid_probe_all_new taking _really long_]
19:03:06.641248 blkid_probe_all_new(0x806ba80, 0x805c940, 34, 17,
0xb7f9f023) = 0
[back from blkid_probe_all_new]
19:03:26.403212 __xstat64(3, "/home/ranma/mail", 0xbfef7914)   = 0
19:03:26.404184 blkid_devno_to_devname(64771, 0, 0xbfef7914, 17,
0xb7f9f023) = 0x8063f50
19:03:26.405733 blkid_get_dev(0x806ba80, 0x8063f50, 3, 17, 0xb7f9f023) =
0x806ef00
19:03:26.426065 free(0x8063f50)=

19:03:26.426640 blkid_tag_iterate_begin(0x806ef00, 0x8063f50, 3, 17,
0xb7f9f023) = 0x806ced0
19:03:26.428163 blkid_tag_next(0x806ced0, 0xbfef7974, 0xbfef7978, 17,
0xb7f9f023) = 0
19:03:26.428751 blkid_tag_iterate_end(0x806ced0, 0xbfef7974, 0xbfef7978,
17, 0xb7f9f023) = 0x806ced0
19:03:26.429343 memset(0xbfef79ac, '\000', 16) =
0xbfef79ac
19:03:26.429878 __ctype_b_loc()=
0xb7daa690
19:03:26.430506 fwrite("uuid /ranma/mail ya.is-a-geek.or"..., 5, 1,
0x8063de8) = 1
[...]
[lather, rinse repeat?]
19:03:27.535890 blkid_probe_all_new(0x806ba80, 0x805c940, 34, 17,
0xb7f9f023) = 0
19:03:44.935949 __xstat64(3, "/home/ranma/mail", 0xbfef7914)   = 0
[...]
19:03:45.966056 blkid_probe_all_new(0x806ba80, 0x46460470, 0,
0x465c5cc5, 0) = 0
19:04:03.208417 __xstat64(3, "/home/ranma/mail", 0xbfef9074)   = 0
[...]
19:04:03.09 blkid_probe_all_new(0x806ba80, 0x43377aee, 0,
0x4645bb22, 0) = 0
19:04:23.466075 __xstat64(3, "/home", 0xbfef9074)  = 0
[I see a pattern]
19:04:24.462181 blkid_probe_all_new(0x806ba80, 0x43377aee, 0,
0x4645bb22, 0) = 0
19:04:41.805271 __xstat64(3, "/usr", 0xbfef9074)   = 0
[This might take a while]
19:04:41.817657 blkid_probe_all_new(0x806ba80, 0x43377aee, 0,
0x4645bb22, 0) = 0
19:04:58.288851 __xstat64(3, "/var", 0xbfef9074)   = 0
[...]
19:05:52.771674 blkid_probe_all_new(0x806ba80, 0x805c940, 34, 17,
0xb7f9f023) = 0
19:06:13.230946 __xstat64(3, "/home/ranma/mail", 0xbfef8f84)   = 0
[...]
19:06:15.953164 select(1024, 0xbfef925c, 0, 0, 0
[finally idle again!]

So a single mount request caused rpc.mountd to spin at 100% CPU for
about 3 minutes.

blk_id_probe_all_new seems to scan /proc/partitions and the do some
probeing on the devices?

This is my /proc/partitions:
|major minor  #blocks  name
|
|   8 0  160086528 sda
|   8 1 497983 sda1
|   8 21461915 sda2
|   8 33911827 sda3
|   8 4  150416595 sda4
|   816  156290904 sdb
|   817 497983 sdb1
|   8181461915 sdb2
|   8193911827 sdb3
|   820  150416595 sdb4
|   832  156290904 sdc
|   833 497983 sdc1
|   8341461915 sdc2
|   8353911827 sdc3
|   836  150416595 sdc4
|   9 0 497856 md0
|   9 3  300833024 md3
|   9 23911744 md2
|   9 11461824 md1
| 253 08388608 dm-0
| 253 18388608 dm-1
| 253 2  146800640 dm-2
| 253 31048576 dm-3
| 253 4   62914560 dm-4
| 253 5   41943040 dm-5

- -- System Information:
Debian Release: lenny/sid
  APT prefers testing
  APT policy: (990, 'testing'), (500, 'unstable')
Architecture: i386 (i686)

Kernel: Linux 2.6.21.1
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages nfs-kernel-server depends on:
ii  libblkid 1.39+1.40-WIP-2006.11.14+dfsg-2 block device id library
ii  libc62.5-7   GNU C Library: Shared libraries
ii  libcomer 1.39+1.40-WIP-2006.11.14+dfsg-2 common error description library
ii  libgssap 0.10-4  A mechanism-switch gssapi library
ii  libkrb53 1.4.4-8 MIT Kerberos runtime libraries
ii  libnfsid 0.18-0  An nfs idmapping library
ii  librpcse 0.14-2