autofs-5.0.1 with patches:
autofs-5.0.1-bad-cast.patch
autofs-5.0.1-check-mtab-updated.patch
autofs-5.0.1-check-user-info-return.patch
autofs-5.0.1-cmd-global-options-fix.patch
autofs-5.0.1-cmd-global-options.patch
autofs-5.0.1-code-cleanups.patch
autofs-5.0.1-conf-append-global.patch
autofs-5.0.1-configure-cleanups.patch
autofs-5.0.1-correct-hesiod-check.patch
autofs-5.0.1-disable-exports-check.patch
autofs-5.0.1-drop-default-prefix-from-config.patch
autofs-5.0.1-export-check-network-fix-2.patch
autofs-5.0.1-file-map-allow-white-space-only-line.patch
autofs-5.0.1-fix-browse-dir-create.patch
autofs-5.0.1-hosts-simple-fail.patch
autofs-5.0.1-localfs-label-check.patch
autofs-5.0.1-map-update-source-only.patch
autofs-5.0.1-network_match-fix.patch
autofs-5.0.1-null-domain-fix.patch
autofs-5.0.1-random-selection.patch
autofs-5.0.1-remove-macro-automount-8.patch
autofs-5.0.1-remove-redundant-ident-macros.patch
autofs-5.0.1-update-kernel-patches.patch
Kernel is 2.6.16.21-0.9 on SLES9 SP3 (I took the SLES10 kernel src.rpm and
built a new kernel on top of SLES9).
All maps are in LDAP. Startup is fine, with 6200+ entries in /proc/mounts
(yes, we have that many). Some hosts get into a state where they aren't
expiring mounts anymore:
$ mount -t nfs | wc
476 2856 70188
The difference I see in ps is that there's an extra thread hanging around
(wedged?):
ps axsm output:
0 2840 0000000000010201 - - - - ? 384:58 automoun
0 - 0000000000000000 fffffffe7ffbfeff 0000000000000000 0000000180000000 Tsl - 0:00 -
0 - 0000000000000000 fffffffe7ffbfeff 0000000000000000 0000000180000000 Ssl - 0:00 -
0 - 0000000000000000 fffffffe7ffbfeff 0000000000000000 0000000180000000 Ssl - 0:16 -
0 - 0000000000000000 fffffffe7ffbfeff 0000000000000000 0000000180000000 Ssl - 0:04 -
0 - 0000000000000000 fffffffe7ffbfeff 0000000000000000 0000000180000000 Ssl - 0:10 -
0 - 0000000000000000 fffffffe7ffbfeff 0000000000000000 0000000180000000 Ssl - 0:09 -
0 - 0000000000000000 fffffffe7ffbfeff 0000000000000000 0000000180000000 Ssl - 0:00 -
0 - 0000000000000000 fffffffe7ffbfeff 0000000000000000 0000000180000000 Ssl - 0:00 -
0 - 0000000000000000 fffffffe7ffbfeff 0000000000000000 0000000180000000 Ssl - 0:02 -
ps -eLF output:
root 2840 1 2840 0 9 15576 7208 2 Jul05 ? 00:00:00 automount
root 2840 1 2841 0 9 15576 7208 2 Jul05 ? 00:00:00 automount
root 2840 1 2842 0 9 15576 7208 2 Jul05 ? 00:00:16 automount
root 2840 1 2845 0 9 15576 7208 1 Jul05 ? 00:00:04 automount
root 2840 1 2848 0 9 15576 7208 0 Jul05 ? 00:00:10 automount
root 2840 1 2849 0 9 15576 7208 3 Jul05 ? 00:00:09 automount
root 2840 1 32417 0 9 15576 7208 2 Jul13 ? 00:00:00 automount
root 2840 1 2230 0 9 15576 7208 3 Jul14 ? 00:00:00 automount
root 2840 1 2235 0 9 15576 7208 1 Jul14 ? 00:00:02 automount
Hosts that are OK spawn a new thread periodically; the wedged hosts
don't. kill -USR1/-HUP doesn't seem to have any effect. Even stracing the
process while sending the signal shows that it never seems to see it
(no SIGUSR1 info in strace).
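The hex fields in the ps axsm output above are signal masks, so they can be decoded to see what happened to the signals. Here is a minimal sketch, assuming the usual column order of ps's signal format (UID PID PENDING BLOCKED IGNORED CAUGHT STAT TTY TIME COMMAND) and Linux x86_64 signal numbering. The helper names and the small name table are made up for illustration.

```python
# Signal numbers assumed for Linux on x86_64; other platforms differ.
SIGNAL_NAMES = {
    1: "SIGHUP",
    9: "SIGKILL",
    10: "SIGUSR1",
    17: "SIGCHLD",
    19: "SIGSTOP",
}


def decode_sigmask(hex_mask):
    """Return the signal numbers whose bits are set in a hex signal mask.

    Bit 0 of the mask corresponds to signal 1, bit 1 to signal 2, and so on.
    """
    mask = int(hex_mask, 16)
    return [bit + 1 for bit in range(mask.bit_length()) if mask & (1 << bit)]


def describe(hex_mask):
    """Like decode_sigmask, but with names where the table above has one."""
    return [SIGNAL_NAMES.get(n, "signal %d" % n) for n in decode_sigmask(hex_mask)]


if __name__ == "__main__":
    # Process-level pending mask from the first ps axsm line.
    print("pending:", describe("0000000000010201"))
    # Per-thread blocked mask; list the signals that are NOT blocked.
    blocked = decode_sigmask("fffffffe7ffbfeff")
    print("not blocked:", [n for n in range(1, 65) if n not in blocked])
```

Under those assumptions the process-level pending mask decodes to SIGHUP, SIGUSR1 and SIGCHLD, and the threads block everything except signals 9, 19, 32 and 33. One possible reading is that the signals do reach the process but stay pending because no thread is currently accepting them, which would also explain why strace never reports a delivery.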
Strace shows the host stuck doing nothing but futex and time calls like
this:
[pid 2840] futex(0x555555685ac4, FUTEX_WAIT, 2949131, {0, 993887000}) = -1 ETIMEDOUT (Connection timed out)
[pid 2840] futex(0x555555685a80, FUTEX_WAKE, 1) = 0
[pid 2840] clock_gettime(CLOCK_REALTIME, {1184701903, 6010000}) = 0
[pid 2840] futex(0x555555685ac4, FUTEX_WAIT, 2949133, {0, 993990000}) = -1 ETIMEDOUT (Connection timed out)
[pid 2840] futex(0x555555685a80, FUTEX_WAKE, 1) = 0
[pid 2840] clock_gettime(CLOCK_REALTIME, {1184701904, 5912000}) = 0
[pid 2840] futex(0x555555685ac4, FUTEX_WAIT, 2949135, {0, 994088000}) = -1 ETIMEDOUT (Connection timed out)
[pid 2840] futex(0x555555685a80, FUTEX_WAKE, 1) = 0
[pid 2840] clock_gettime(CLOCK_REALTIME, {1184701905, 5758000}) = 0
[pid 2840] futex(0x555555685ac4, FUTEX_WAIT, 2949137, {0, 994242000}) = -1 ETIMEDOUT (Connection timed out)
[pid 2840] futex(0x555555685a80, FUTEX_WAKE, 1) = 0
[pid 2840] clock_gettime(CLOCK_REALTIME, {1184701906, 5657000}) = 0
[pid 2840] futex(0x555555685ac4, FUTEX_WAIT, 2949139, {0, 994343000}) = -1 ETIMEDOUT (Connection timed out)
[pid 2840] futex(0x555555685a80, FUTEX_WAKE, 1) = 0
[pid 2840] clock_gettime(CLOCK_REALTIME, {1184701907, 5625000}) = 0
[pid 2840] futex(0x555555685ac4, FUTEX_WAIT, 2949141, {0, 994375000}) = -1 ETIMEDOUT (Connection timed out)
[pid 2840] futex(0x555555685a80, FUTEX_WAKE, 1) = 0
[pid 2840] clock_gettime(CLOCK_REALTIME, {1184701908, 5506000}) = 0
[pid 2840] futex(0x555555685ac4, FUTEX_WAIT, 2949143, {0, 994494000}) = -1 ETIMEDOUT (Connection timed out)
[pid 2840] futex(0x555555685a80, FUTEX_WAKE, 1) = 0
[pid 2840] clock_gettime(CLOCK_REALTIME, {1184701909, 5422000}) = 0
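To see what the loop in the strace output is waiting for, each clock_gettime result can be added to the FUTEX_WAIT timeout that follows it. The sketch below does that for a few lines copied from the trace, joined onto single lines. The function name and regular expressions are only illustrative.

```python
import re

# A few lines from the trace above, with wrapped lines joined.
STRACE_SAMPLE = """\
[pid 2840] clock_gettime(CLOCK_REALTIME, {1184701903, 6010000}) = 0
[pid 2840] futex(0x555555685ac4, FUTEX_WAIT, 2949133, {0, 993990000}) = -1 ETIMEDOUT (Connection timed out)
[pid 2840] futex(0x555555685a80, FUTEX_WAKE, 1) = 0
[pid 2840] clock_gettime(CLOCK_REALTIME, {1184701904, 5912000}) = 0
[pid 2840] futex(0x555555685ac4, FUTEX_WAIT, 2949135, {0, 994088000}) = -1 ETIMEDOUT (Connection timed out)
"""

CLOCK_RE = re.compile(r"clock_gettime\(CLOCK_REALTIME, \{(\d+), (\d+)\}\)")
WAIT_RE = re.compile(r"FUTEX_WAIT, \d+, \{(\d+), (\d+)\}\)")

NS_PER_SEC = 10**9


def wait_deadlines(text):
    """Return the absolute deadline, in nanoseconds since the epoch, of each
    FUTEX_WAIT that directly follows a clock_gettime call in the trace."""
    deadlines = []
    now_ns = None
    for line in text.splitlines():
        m = CLOCK_RE.search(line)
        if m:
            now_ns = int(m.group(1)) * NS_PER_SEC + int(m.group(2))
            continue
        m = WAIT_RE.search(line)
        if m and now_ns is not None:
            timeout_ns = int(m.group(1)) * NS_PER_SEC + int(m.group(2))
            deadlines.append(now_ns + timeout_ns)
            now_ns = None
    return deadlines


if __name__ == "__main__":
    for deadline in wait_deadlines(STRACE_SAMPLE):
        seconds, nanoseconds = divmod(deadline, NS_PER_SEC)
        print(seconds, nanoseconds)
```

For the sampled lines every deadline lands exactly on the next whole second (nanosecond part 0). That looks like a thread sleeping in a timed condition wait and ticking once per second, i.e. idle rather than spinning, though this is an inference from the trace and not something checked against the autofs source. The "Connection timed out" text is just strace's rendering of ETIMEDOUT.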
Has anyone else seen this, or does anyone have an idea what might be the
cause?
--
Mike Marion-Unix SysAdmin/Staff IT Engineer-http://www.qualcomm.com
"I've never used their tech support, but the word is that it sucks."
I believe it's composed entirely of monkeys that couldn't get the Shakespeare
gig. :-) ==> /. users talking about @home tech support
_______________________________________________
autofs mailing list
[email protected]
http://linux.kernel.org/mailman/listinfo/autofs