Hello all,

We have a serious problem with Lustre.  For the past few days we have
been seeing lockups on the client side.  Not all clients are affected.

We are running kernel 2.6.16-54-0.2.5_lustre.1.6.4.3smp.

Statahead has already been disabled on these systems.

Some more information about the environment:

- The Lustre clients are all VMware virtual systems
- The Lustre farm itself is also made up entirely of VMware virtual systems

The errors I see are the following:

LustreError: 3420:0:(events.c:134:client_bulk_callback()) event type
0, status -5, desc ffff8100e5dca000
LustreError: 3420:0:(events.c:134:client_bulk_callback()) event type
0, status -5, desc ffff8100e519e000
LustreError: 3420:0:(events.c:134:client_bulk_callback()) event type
0, status -5, desc ffff8100e4e0a000
LustreError: 3420:0:(events.c:134:client_bulk_callback()) event type
0, status -5, desc ffff8100e86b1bc0
LustreError: 3420:0:(events.c:134:client_bulk_callback()) event type
0, status -5, desc ffff8100e79fe5c0
LustreError: 3420:0:(events.c:134:client_bulk_callback()) event type
0, status -5, desc ffff8100e70a88c0
LustreError: 3420:0:(events.c:134:client_bulk_callback()) event type
0, status -5, desc ffff8100e7081280
LustreError: 3420:0:(events.c:134:client_bulk_callback()) event type
0, status -5, desc ffff8100e6d6d5c0
LustreError: 3428:0:(client.c:975:ptlrpc_expire_one_request()) @@@
timeout (sent at 1225816920, 100s ago)  [EMAIL PROTECTED] x17940/t0
o4->[EMAIL PROTECTED]@tcp:28 lens 384/352 ref 2 fl Rpc:/
0/0 rc 0/-22
Lustre: lustre-OST0005-osc-ffff8100e8551800: Connection to service
lustre-OST0005 via nid [EMAIL PROTECTED] was lost; in progress
operations using this service will wait for recovery to complete.
Lustre: lustre-OST0005-osc-ffff8100e8551800: Connection restored to
service lustre-OST0005 using nid [EMAIL PROTECTED]
LustreError: 3602:0:(client.c:975:ptlrpc_expire_one_request()) @@@
timeout (sent at 1225816924, 100s ago)  [EMAIL PROTECTED] x19702/t0
o36->[EMAIL PROTECTED]@tcp:12 lens 1544/296 ref 1 fl
Rpc:/0/0 rc 0/-22
LustreError: 3602:0:(client.c:975:ptlrpc_expire_one_request()) Skipped
2 previous similar messages
Lustre: lustre-MDT0000-mdc-ffff8100e8551800: Connection to service
lustre-MDT0000 via nid [EMAIL PROTECTED] was lost; in progress
operations using this service will wait for recovery to complete.
Lustre: lustre-MDT0000-mdc-ffff8100e8551800: Connection restored to
service lustre-MDT0000 using nid [EMAIL PROTECTED]
LustreError: 3428:0:(client.c:975:ptlrpc_expire_one_request()) @@@
timeout (sent at 1225816953, 100s ago)  [EMAIL PROTECTED] x20560/t0
o4->[EMAIL PROTECTED]@tcp:28 lens 384/352 ref 2 fl Rpc:/
0/0 rc 0/-22
Lustre: lustre-OST0006-osc-ffff8100e8551800: Connection to service
lustre-OST0006 via nid [EMAIL PROTECTED] was lost; in progress
operations using this service will wait for recovery to complete.
Lustre: lustre-OST0006-osc-ffff8100e8551800: Connection restored to
service lustre-OST0006 using nid [EMAIL PROTECTED]
LustreError: 3602:0:(client.c:975:ptlrpc_expire_one_request()) @@@
timeout (sent at 1225817024, 100s ago)  [EMAIL PROTECTED] x19702/t0
o36->[EMAIL PROTECTED]@tcp:12 lens 1544/296 ref 1 fl
Rpc:/2/0 rc -11/-22
Lustre: lustre-MDT0000-mdc-ffff8100e8551800: Connection to service
lustre-MDT0000 via nid [EMAIL PROTECTED] was lost; in progress
operations using this service will wait for recovery to complete.
Lustre: lustre-MDT0000-mdc-ffff8100e8551800: Connection restored to
service lustre-MDT0000 using nid [EMAIL PROTECTED]
LustreError: 3428:0:(client.c:975:ptlrpc_expire_one_request()) @@@
timeout (sent at 1225817053, 100s ago)  [EMAIL PROTECTED] x20724/t0
o4->[EMAIL PROTECTED]@tcp:28 lens 384/352 ref 2 fl Rpc:/
2/0 rc -11/-22
Lustre: lustre-OST0006-osc-ffff8100e8551800: Connection to service
lustre-OST0006 via nid [EMAIL PROTECTED] was lost; in progress
operations using this service will wait for recovery to complete.
Lustre: lustre-OST0006-osc-ffff8100e8551800: Connection restored to
service lustre-OST0006 using nid [EMAIL PROTECTED]
LustreError: 3602:0:(client.c:975:ptlrpc_expire_one_request()) @@@
timeout (sent at 1225817124, 100s ago)  [EMAIL PROTECTED] x19702/t0
o36->[EMAIL PROTECTED]@tcp:12 lens 1544/296 ref 1 fl
Rpc:/2/0 rc -11/-22
Lustre: lustre-MDT0000-mdc-ffff8100e8551800: Connection to service
lustre-MDT0000 via nid [EMAIL PROTECTED] was lost; in progress
operations using this service will wait for recovery to complete.
Lustre: lustre-MDT0000-mdc-ffff8100e8551800: Connection restored to
service lustre-MDT0000 using nid [EMAIL PROTECTED]
LustreError: 3428:0:(client.c:975:ptlrpc_expire_one_request()) @@@
timeout (sent at 1225817153, 100s ago)  [EMAIL PROTECTED] x20767/t0
o4->[EMAIL PROTECTED]@tcp:28 lens 384/352 ref 2 fl Rpc:/
2/0 rc -11/-22
Lustre: lustre-OST0006-osc-ffff8100e8551800: Connection to service
lustre-OST0006 via nid [EMAIL PROTECTED] was lost; in progress
operations using this service will wait for recovery to complete.
Lustre: lustre-OST0006-osc-ffff8100e8551800: Connection restored to
service lustre-OST0006 using nid [EMAIL PROTECTED]
LustreError: 3602:0:(client.c:975:ptlrpc_expire_one_request()) @@@
timeout (sent at 1225817224, 100s ago)  [EMAIL PROTECTED] x19702/t0
o36->[EMAIL PROTECTED]@tcp:12 lens 1544/296 ref 1 fl
Rpc:/2/0 rc -11/-22
Lustre: lustre-MDT0000-mdc-ffff8100e8551800: Connection to service
lustre-MDT0000 via nid [EMAIL PROTECTED] was lost; in progress
operations using this service will wait for recovery to complete.
Lustre: lustre-MDT0000-mdc-ffff8100e8551800: Connection restored to
service lustre-MDT0000 using nid [EMAIL PROTECTED]
LustreError: 3602:0:(client.c:975:ptlrpc_expire_one_request()) @@@
timeout (sent at 1225817324, 100s ago)  [EMAIL PROTECTED] x19702/t0
o36->[EMAIL PROTECTED]@tcp:12 lens 1544/296 ref 1 fl
Rpc:/2/0 rc -11/-22
LustreError: 3602:0:(client.c:975:ptlrpc_expire_one_request()) Skipped
1 previous similar message
Lustre: lustre-MDT0000-mdc-ffff8100e8551800: Connection to service
lustre-MDT0000 via nid [EMAIL PROTECTED] was lost; in progress
operations using this service will wait for recovery to complete.
Lustre: Skipped 1 previous similar message
Lustre: lustre-MDT0000-mdc-ffff8100e8551800: Connection restored to
service lustre-MDT0000 using nid [EMAIL PROTECTED]
Lustre: Skipped 1 previous similar message
LustreError: 3602:0:(client.c:975:ptlrpc_expire_one_request()) @@@
timeout (sent at 1225817424, 100s ago)  [EMAIL PROTECTED] x19702/t0
o36->[EMAIL PROTECTED]@tcp:12 lens 1544/296 ref 1 fl
Rpc:/2/0 rc -11/-22
LustreError: 3602:0:(client.c:975:ptlrpc_expire_one_request()) Skipped
1 previous similar message
Lustre: lustre-MDT0000-mdc-ffff8100e8551800: Connection to service
lustre-MDT0000 via nid [EMAIL PROTECTED] was lost; in progress
operations using this service will wait for recovery to complete.
Lustre: Skipped 1 previous similar message
Lustre: lustre-MDT0000-mdc-ffff8100e8551800: Connection restored to
service lustre-MDT0000 using nid [EMAIL PROTECTED]
Lustre: Skipped 1 previous similar message
LustreError: 3428:0:(client.c:975:ptlrpc_expire_one_request()) @@@
timeout (sent at 1225817553, 100s ago)  [EMAIL PROTECTED] x20952/t0
o4->[EMAIL PROTECTED]@tcp:28 lens 384/352 ref 2 fl Rpc:/
2/0 rc -11/-22
LustreError: 3428:0:(client.c:975:ptlrpc_expire_one_request()) Skipped
2 previous similar messages
Lustre: lustre-OST0006-osc-ffff8100e8551800: Connection to service
lustre-OST0006 via nid [EMAIL PROTECTED] was lost; in progress
operations using this service will wait for recovery to complete.
Lustre: Skipped 2 previous similar messages
Lustre: lustre-OST0006-osc-ffff8100e8551800: Connection restored to
service lustre-OST0006 using nid [EMAIL PROTECTED]
Lustre: Skipped 2 previous similar messages
LustreError: 3420:0:(events.c:134:client_bulk_callback()) event type
0, status -5, desc ffff8100efba6800
LustreError: 3602:0:(client.c:975:ptlrpc_expire_one_request()) @@@
timeout (sent at 1225817824, 99s ago)  [EMAIL PROTECTED] x19702/t0
o36->[EMAIL PROTECTED]@tcp:12 lens 1544/296 ref 1 fl
Rpc:/2/0 rc -11/-22
LustreError: 3602:0:(client.c:975:ptlrpc_expire_one_request()) Skipped
4 previous similar messages
Lustre: lustre-MDT0000-mdc-ffff8100e8551800: Connection to service
lustre-MDT0000 via nid [EMAIL PROTECTED] was lost; in progress
operations using this service will wait for recovery to complete.
Lustre: Skipped 4 previous similar messages
Lustre: lustre-MDT0000-mdc-ffff8100e8551800: Connection restored to
service lustre-MDT0000 using nid [EMAIL PROTECTED]
Lustre: Skipped 4 previous similar messages
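
To make the pattern in a log like the one above easier to see, here is a
minimal Python sketch that tallies the request timeouts per operation and
the bulk callback errors per errno.  The opcode names (o4 = OST_WRITE,
o36 = MDS_REINT) are an assumption based on my reading of lustre_idl.h,
the errno names are the standard Linux values, and the function and
variable names are only illustrative.  SAMPLE is a short excerpt of the
log above.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Standard Linux errno values; Lustre prints them negated (e.g. status -5).
LINUX_ERRNO = {5: "EIO", 11: "EAGAIN", 22: "EINVAL"}

# ASSUMPTION: opcode names as I read them in lustre_idl.h, not verified here.
OPCODES = {4: "OST_WRITE", 36: "MDS_REINT"}

# The mail client wrapped each record over several lines, so match across
# newlines (re.S) and tolerate any whitespace between tokens.
TIMEOUT_RE = re.compile(r"timeout \(sent at (\d+), \d+s ago\).*?\bo(\d+)->", re.S)
BULK_RE = re.compile(r"client_bulk_callback\(\)\) event type\s+\d+, status (-?\d+)")

SAMPLE = """\
LustreError: 3420:0:(events.c:134:client_bulk_callback()) event type
0, status -5, desc ffff8100e5dca000
LustreError: 3428:0:(client.c:975:ptlrpc_expire_one_request()) @@@
timeout (sent at 1225816920, 100s ago)  [EMAIL PROTECTED] x17940/t0
o4->[EMAIL PROTECTED]@tcp:28 lens 384/352 ref 2 fl Rpc:/
0/0 rc 0/-22
LustreError: 3602:0:(client.c:975:ptlrpc_expire_one_request()) @@@
timeout (sent at 1225816924, 100s ago)  [EMAIL PROTECTED] x19702/t0
o36->[EMAIL PROTECTED]@tcp:12 lens 1544/296 ref 1 fl
Rpc:/0/0 rc 0/-22
"""


def errno_name(code):
    """Map a (possibly negated) errno to its Linux name, if known."""
    return LINUX_ERRNO.get(abs(code), f"errno {abs(code)}")


def summarize(log_text):
    """Count timeouts per opcode and bulk errors per errno name."""
    timeouts = Counter()
    sent_times = []
    for sent, opc in TIMEOUT_RE.findall(log_text):
        timeouts[OPCODES.get(int(opc), f"opcode {opc}")] += 1
        sent_times.append(int(sent))

    bulk_errors = Counter(
        errno_name(int(status)) for status in BULK_RE.findall(log_text)
    )

    # "sent at" values are Unix epoch seconds; report the earliest in UTC.
    first_sent = (
        datetime.fromtimestamp(min(sent_times), tz=timezone.utc)
        if sent_times
        else None
    )
    return {"timeouts": timeouts, "bulk_errors": bulk_errors, "first_sent": first_sent}


if __name__ == "__main__":
    result = summarize(SAMPLE)
    print("timeouts by operation:", dict(result["timeouts"]))
    print("bulk callback errors:", dict(result["bulk_errors"]))
    print("first timed-out request sent at:", result["first_sent"])
```

Running summarize() over the complete log instead of SAMPLE gives the
same breakdown for the whole log.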

Could somebody help me out?

Thanks in advance.

Kurt
_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
