Some more information that might be helpful. There is a particular
code that one of our users runs. Personally, after the trouble this
code has caused us, we'd like to hand him a calculator and disable his
accounts, but sadly that's not an option. Since the time of the hang,
there has been what seems to be one Lustre-related process running
under the problem user's userid: "ll_sa_15530". A trace of this
process in its current state shows this:
Apr 30 11:29:30 cola10 kernel: ll_sa_15530 S 0000000000000000 0 15531 1 17700 18228 (L-TLB)
Apr 30 11:29:30 cola10 kernel: ffff810116c31c10 0000000000000046 ffff81013e7747a0 ffffffff80087d0e
Apr 30 11:29:30 cola10 kernel: 0000000000000007 ffff81003a76b040 ffff81012f11f0c0 000fcb5175eba398
Apr 30 11:29:30 cola10 kernel: 0000000000001407 ffff81003a76b228 0000000000000001 0000000000000068
Apr 30 11:29:30 cola10 kernel: Call Trace:
Apr 30 11:29:30 cola10 kernel: [<ffffffff80087d0e>] enqueue_task+0x41/0x56
Apr 30 11:29:30 cola10 kernel: [<ffffffff8862b7e4>] :ptlrpc:ldlm_prep_enqueue_req+0x1b4/0x2e0
Apr 30 11:29:30 cola10 kernel: [<ffffffff886e528c>] :mdc:mdc_req_avail+0x6c/0xf0
Apr 30 11:29:30 cola10 kernel: [<ffffffff886e6275>] :mdc:mdc_enter_request+0x145/0x1e0
Apr 30 11:29:30 cola10 kernel: [<ffffffff800884ed>] default_wake_function+0x0/0xe
Apr 30 11:29:30 cola10 kernel: [<ffffffff886e6410>] :mdc:mdc_intent_lookup_pack+0xd0/0xf0
Apr 30 11:29:30 cola10 kernel: [<ffffffff886e6644>] :mdc:mdc_intent_getattr_async+0x214/0x420
Apr 30 11:29:30 cola10 kernel: [<ffffffff887ae63d>] :lustre:ll_i2gids+0x5d/0x150
Apr 30 11:29:30 cola10 kernel: [<ffffffff887b94c5>] :lustre:ll_statahead_thread+0xf75/0x1810
Apr 30 11:29:30 cola10 kernel: [<ffffffff800884ed>] default_wake_function+0x0/0xe
Apr 30 11:29:30 cola10 kernel: [<ffffffff8005bfb1>] child_rip+0xa/0x11
Apr 30 11:29:30 cola10 kernel: [<ffffffff887b8550>] :lustre:ll_statahead_thread+0x0/0x1810
Apr 30 11:29:30 cola10 kernel: [<ffffffff8005bfa7>] child_rip+0x0/0x11
Is this a problem with the Lustre readahead code? If so, would this
fix it? "echo 0 > /proc/fs/lustre/llite/*/statahead_count"
Thank you so much for all your help.
-Aaron
On Apr 30, 2008, at 11:16 AM, Aaron S. Knister wrote:
I have a Lustre client that was randomly evicted early this morning.
The errors from dmesg are below. It's running InfiniBand. There
were no InfiniBand errors that I could see, and all the MDS/MGS and
OSSes said was "haven't heard from client xyz in 2277 seconds.
Evicting". The client has halfway come back and now shows this -
[EMAIL PROTECTED]:~ $ lfs df -h
UUID bytes Used Available Use% Mounted on
data-MDT0000_UUID 87.5G 6.4G 81.1G 7% /data[MDT:0]
data-OST0000_UUID 5.4T 4.9T 439.6G 92% /data[OST:0]
data-OST0001_UUID : inactive device
data-OST0002_UUID : inactive device
data-OST0003_UUID : inactive device
data-OST0004_UUID : inactive device
data-OST0005_UUID : inactive device
data-OST0006_UUID : inactive device
data-OST0007_UUID : inactive device
data-OST0008_UUID : inactive device
data-OST0009_UUID : inactive device
filesystem summary: 5.4T 4.9T 439.6G 92% /data
So it has reconnected to only one of 10 OSTs. I tried to do an
"lctl --device {device} reconnect" and it said "Error: Operation in
progress". I have no idea what went wrong, and I'm confident a reboot
would fix it, but I'd like to avoid that if possible.
Thanks in advance.
LustreError: 11-0: an error occurred while communicating with
[EMAIL PROTECTED] The mds_statfs operation failed with -107
Lustre: data-MDT0000-mdc-ffff81013037b800: Connection to service
data-MDT0000 via nid [EMAIL PROTECTED] was lost; in progress
operations using this service will wait for recovery to complete.
LustreError: 167-0: This client was evicted by data-MDT0000; in
progress operations using this service will fail.
LustreError: 22345:0:(llite_lib.c:1508:ll_statfs_internal())
mdc_statfs fails: rc = -5
LustreError: 22396:0:(client.c:519:ptlrpc_import_delay_req()) @@@
IMP_INVALID [EMAIL PROTECTED] x81717113/t0 o41->[EMAIL PROTECTED]
@o2ib:12 lens 128/272 ref 1 fl Rpc:/0/0 rc 0/0
LustreError: 22396:0:(llite_lib.c:1508:ll_statfs_internal())
mdc_statfs fails: rc = -108
LustreError: 22454:0:(client.c:519:ptlrpc_import_delay_req()) @@@
IMP_INVALID [EMAIL PROTECTED] x81717114/t0 o41->[EMAIL PROTECTED]
@o2ib:12 lens 128/272 ref 1 fl Rpc:/0/0 rc 0/0
LustreError: 22454:0:(llite_lib.c:1508:ll_statfs_internal())
mdc_statfs fails: rc = -108
LustreError: 22463:0:(client.c:519:ptlrpc_import_delay_req()) @@@
IMP_INVALID [EMAIL PROTECTED] x81717115/t0 o41->[EMAIL PROTECTED]
@o2ib:12 lens 128/272 ref 1 fl Rpc:/0/0 rc 0/0
LustreError: 22463:0:(llite_lib.c:1508:ll_statfs_internal())
mdc_statfs fails: rc = -108
LustreError: 22734:0:(client.c:519:ptlrpc_import_delay_req()) @@@
IMP_INVALID [EMAIL PROTECTED] x81717138/t0 o41->[EMAIL PROTECTED]
@o2ib:12 lens 128/272 ref 1 fl Rpc:/0/0 rc 0/0
LustreError: 22734:0:(llite_lib.c:1508:ll_statfs_internal())
mdc_statfs fails: rc = -108
LustreError: 22736:0:(client.c:519:ptlrpc_import_delay_req()) @@@
IMP_INVALID [EMAIL PROTECTED] x81717139/t0 o41->[EMAIL PROTECTED]
@o2ib:12 lens 128/272 ref 1 fl Rpc:/0/0 rc 0/0
LustreError: 22736:0:(llite_lib.c:1508:ll_statfs_internal())
mdc_statfs fails: rc = -108
LustreError: 22912:0:(client.c:519:ptlrpc_import_delay_req()) @@@
IMP_INVALID [EMAIL PROTECTED] x81717140/t0 o41->[EMAIL PROTECTED]
@o2ib:12 lens 128/272 ref 1 fl Rpc:/0/0 rc 0/0
LustreError: 22912:0:(llite_lib.c:1508:ll_statfs_internal())
mdc_statfs fails: rc = -108
LustreError: 22971:0:(client.c:519:ptlrpc_import_delay_req()) @@@
IMP_INVALID [EMAIL PROTECTED] x81717143/t0 o41->[EMAIL PROTECTED]
@o2ib:12 lens 128/272 ref 1 fl Rpc:/0/0 rc 0/0
LustreError: 22971:0:(client.c:519:ptlrpc_import_delay_req())
Skipped 2 previous similar messages
LustreError: 22971:0:(llite_lib.c:1508:ll_statfs_internal())
mdc_statfs fails: rc = -108
LustreError: 22971:0:(llite_lib.c:1508:ll_statfs_internal()) Skipped
2 previous similar messages
LustreError: 23781:0:(client.c:519:ptlrpc_import_delay_req()) @@@
IMP_INVALID [EMAIL PROTECTED] x81717144/t0 o41->[EMAIL PROTECTED]
@o2ib:12 lens 128/272 ref 1 fl Rpc:/0/0 rc 0/0
LustreError: 23781:0:(llite_lib.c:1508:ll_statfs_internal())
mdc_statfs fails: rc = -108
LustreError: 23796:0:(client.c:519:ptlrpc_import_delay_req()) @@@
IMP_INVALID [EMAIL PROTECTED] x81717156/t0 o41->[EMAIL PROTECTED]
@o2ib:12 lens 128/272 ref 1 fl Rpc:/0/0 rc 0/0
LustreError: 23827:0:(client.c:519:ptlrpc_import_delay_req()) @@@
IMP_INVALID [EMAIL PROTECTED] x81717157/t0 o41->[EMAIL PROTECTED]
@o2ib:12 lens 128/272 ref 1 fl Rpc:/0/0 rc 0/0
LustreError: 23827:0:(llite_lib.c:1508:ll_statfs_internal())
mdc_statfs fails: rc = -108
LustreError: 23827:0:(llite_lib.c:1508:ll_statfs_internal()) Skipped
1 previous similar message
LustreError: 22346:0:(client.c:519:ptlrpc_import_delay_req()) @@@
IMP_INVALID [EMAIL PROTECTED] x81717169/t0 o35->[EMAIL PROTECTED]
@o2ib:12 lens 296/896 ref 1 fl Rpc:/0/0 rc 0/0
LustreError: 22346:0:(file.c:97:ll_close_inode_openhandle()) inode
21601226 mdc close failed: rc = -108
Lustre: data-MDT0000-mdc-ffff81013037b800: Connection restored to
service data-MDT0000 using nid [EMAIL PROTECTED]
LustreError: 11-0: an error occurred while communicating with
[EMAIL PROTECTED] The ost_statfs operation failed with -107
Lustre: data-OST0001-osc-ffff81013037b800: Connection to service
data-OST0001 via nid [EMAIL PROTECTED] was lost; in progress
operations using this service will wait for recovery to complete.
LustreError: 11-0: an error occurred while communicating with
[EMAIL PROTECTED] The ost_statfs operation failed with -107
LustreError: 167-0: This client was evicted by data-OST0001; in
progress operations using this service will fail.
LustreError: 167-0: This client was evicted by data-OST0002; in
progress operations using this service will fail.
LustreError: 24093:0:(llite_lib.c:1520:ll_statfs_internal())
obd_statfs fails: rc = -5
Lustre: data-OST0000-osc-ffff81013037b800: Connection restored to
service data-OST0000 using nid [EMAIL PROTECTED]
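
For anyone reading the log above: the "rc = ..." and "failed with ..." values are negated Linux errno numbers. Below is a minimal sketch for decoding the three that appear here, assuming the standard Linux errno numbering. The LINUX_ERRNO table and the decode_rc helper are illustrative names, not part of Lustre.

```python
# Negated errno values seen in the log above, using the standard
# Linux numbering (an assumption; other platforms number these differently).
LINUX_ERRNO = {
    5: ("EIO", "Input/output error"),
    107: ("ENOTCONN", "Transport endpoint is not connected"),
    108: ("ESHUTDOWN", "Cannot send after transport endpoint shutdown"),
}


def decode_rc(rc):
    """Return 'NAME (description)' for a negated errno such as -108."""
    name, text = LINUX_ERRNO.get(abs(rc), ("UNKNOWN", "not in this table"))
    return f"{name} ({text})"


if __name__ == "__main__":
    # The three return codes that appear in the dmesg output.
    for rc in (-107, -5, -108):
        print(f"rc = {rc}: {decode_rc(rc)}")
```

Read this way, -107 means the connection to the server was gone, and the run of -108 results comes from requests issued while the import was still invalid after the eviction.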
_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
Aaron Knister
Associate Systems Analyst
Center for Ocean-Land-Atmosphere Studies
(301) 595-7000
[EMAIL PROTECTED]