At first glance, it looks like you're having network connectivity problems (possibly a driver issue with your NIC?).
(Check MTU settings, etc.?)

-cf

On Thu, Aug 11, 2016 at 7:43 AM, Phill Harvey-Smith <p.harvey-sm...@warwick.ac.uk> wrote:
> Hi all,
>
> I have a (fairly urgent) problem.
>
> I have been updating our cluster to Ubuntu 16.04, which has on the whole
> gone well; however, in the last part I have run across a rather serious
> error.
>
> Our frontend node has an instance of Samba that shares out the home
> directories. We have found that writing to a file on /home will cause the
> /home mount to time out and become inaccessible.
>
> The errors that are reported in the journal are like so:
>
>> Aug 11 10:06:34 buster-fe0 kernel: Lustre: Lustre: Build Version: 2.8.53_51_g3680fa1_dirty
>> Aug 11 10:06:34 buster-fe0 kernel: Lustre: Server MGS version (2.1.0.0) is much older than client. Consider upgrading server (2.8.53_51_g3680fa1_dirty)
>> Aug 11 10:06:34 buster-fe0 kernel: Lustre: Trying to mount a client with IR setting not compatible with current mgc. Force to use current mgc setting that is IR disabled.
>> Aug 11 10:06:34 buster-fe0 kernel: Lustre: Mounted home-client
>> Aug 11 10:06:34 buster-fe0 mount[4687]: mount.lustre: addmntent: Invalid argument:
>> Aug 11 10:06:34 buster-fe0 mount[4691]: mount.lustre: addmntent: Invalid argument:
>> Aug 11 10:06:34 buster-fe0 mount[4670]: mount.lustre: addmntent: Invalid argument:
>> Aug 11 10:06:35 buster-fe0 systemd[1]: Started Lustre setup.
>> Aug 11 10:06:37 buster-fe0 lustre.sh[4870]: llite.home-ffff881005c37800.create_no_open_optimization=0
>> Aug 11 13:48:12 buster-fe0 kernel: Lustre: 20221:0:(client.c:2067:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1470919685/real 1470919685] req@ffff8807e6076c00 x1542357171466016/t0(0) o55->home-MDT0000-mdc-ffff881005c37800@192.168.0.4@tcp:12/10 lens 592/224 e 0 to 1 dl 1470919692 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
>> Aug 11 13:48:12 buster-fe0 kernel: Lustre: home-MDT0000-mdc-ffff881005c37800: Connection to home-MDT0000 (at 192.168.0.4@tcp) was lost; in progress operations using this service will wait for recovery to complete
>> Aug 11 13:48:12 buster-fe0 kernel: Lustre: home-MDT0000-mdc-ffff881005c37800: Connection restored to 192.168.0.4@tcp (at 192.168.0.4@tcp)
>> Aug 11 13:48:19 buster-fe0 kernel: Lustre: 20221:0:(client.c:2067:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1470919692/real 1470919692] req@ffff8807e6076c00 x1542357171466016/t0(0) o55->home-MDT0000-mdc-ffff881005c37800@192.168.0.4@tcp:12/10 lens 592/224 e 0 to 1 dl 1470919699 ref 2 fl Rpc:X/2/ffffffff rc 0/-1
>> Aug 11 13:48:19 buster-fe0 kernel: Lustre: home-MDT0000-mdc-ffff881005c37800: Connection to home-MDT0000 (at 192.168.0.4@tcp) was lost; in progress operations using this service will wait for recovery to complete
>> Aug 11 13:48:19 buster-fe0 kernel: Lustre: home-MDT0000-mdc-ffff881005c37800: Connection restored to 192.168.0.4@tcp (at 192.168.0.4@tcp)
>> Aug 11 13:48:26 buster-fe0 kernel: Lustre: 20221:0:(client.c:2067:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1470919699/real 1470919699] req@ffff8807e6076c00 x1542357171466016/t0(0) o55->home-MDT0000-mdc-ffff881005c37800@192.168.0.4@tcp:12/10 lens 592/224 e 0 to 1 dl 1470919706 ref 2 fl Rpc:X/2/ffffffff rc 0/-1
>> Aug 11 13:48:26 buster-fe0 kernel: Lustre:
>> home-MDT0000-mdc-ffff881005c37800: Connection to home-MDT0000 (at 192.168.0.4@tcp) was lost; in progress operations using this service will wait for recovery to complete
>> Aug 11 13:48:26 buster-fe0 kernel: Lustre: home-MDT0000-mdc-ffff881005c37800: Connection restored to 192.168.0.4@tcp (at 192.168.0.4@tcp)
>> Aug 11 13:48:33 buster-fe0 kernel: Lustre: 20221:0:(client.c:2067:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1470919706/real 1470919706] req@ffff8807e6076c00 x1542357171466016/t0(0) o55->home-MDT0000-mdc-ffff881005c37800@192.168.0.4@tcp:12/10 lens 592/224 e 0 to 1 dl 1470919713 ref 2 fl Rpc:X/2/ffffffff rc 0/-1
>> Aug 11 13:48:33 buster-fe0 kernel: Lustre: home-MDT0000-mdc-ffff881005c37800: Connection to home-MDT0000 (at 192.168.0.4@tcp) was lost; in progress operations using this service will wait for recovery to complete
>> Aug 11 13:48:33 buster-fe0 kernel: Lustre: home-MDT0000-mdc-ffff881005c37800: Connection restored to 192.168.0.4@tcp (at 192.168.0.4@tcp)
>> Aug 11 13:48:40 buster-fe0 kernel: Lustre: 20221:0:(client.c:2067:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1470919713/real 1470919713] req@ffff8807e6076c00 x1542357171466016/t0(0) o55->home-MDT0000-mdc-ffff881005c37800@192.168.0.4@tcp:12/10 lens 592/224 e 0 to 1 dl 1470919720 ref 2 fl Rpc:X/2/ffffffff rc 0/-1
>> Aug 11 13:48:40 buster-fe0 kernel: Lustre: home-MDT0000-mdc-ffff881005c37800: Connection to home-MDT0000 (at 192.168.0.4@tcp) was lost; in progress operations using this service will wait for recovery to complete
>> Aug 11 13:48:40 buster-fe0 kernel: Lustre: home-MDT0000-mdc-ffff881005c37800: Connection restored to 192.168.0.4@tcp (at 192.168.0.4@tcp)
>> Aug 11 13:48:54 buster-fe0 kernel: Lustre: 20221:0:(client.c:2067:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1470919727/real 1470919727]
>> req@ffff8807e6076c00 x1542357171466016/t0(0) o55->home-MDT0000-mdc-ffff881005c37800@192.168.0.4@tcp:12/10 lens 592/224 e 0 to 1 dl 1470919734 ref 2 fl Rpc:X/2/ffffffff rc 0/-1
>> Aug 11 13:48:54 buster-fe0 kernel: Lustre: 20221:0:(client.c:2067:ptlrpc_expire_one_request()) Skipped 1 previous similar message
>> Aug 11 13:48:54 buster-fe0 kernel: Lustre: home-MDT0000-mdc-ffff881005c37800: Connection to home-MDT0000 (at 192.168.0.4@tcp) was lost; in progress operations using this service will wait for recovery to complete
>> Aug 11 13:48:54 buster-fe0 kernel: Lustre: Skipped 1 previous similar message
>> Aug 11 13:48:54 buster-fe0 kernel: Lustre: home-MDT0000-mdc-ffff881005c37800: Connection restored to 192.168.0.4@tcp (at 192.168.0.4@tcp)
>> Aug 11 13:48:54 buster-fe0 kernel: Lustre: Skipped 1 previous similar message
>> Aug 11 13:49:15 buster-fe0 kernel: Lustre: 20221:0:(client.c:2067:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1470919748/real 1470919748] req@ffff8807e6076c00 x1542357171466016/t0(0) o55->home-MDT0000-mdc-ffff881005c37800@192.168.0.4@tcp:12/10 lens 592/224 e 0 to 1 dl 1470919755 ref 2 fl Rpc:X/2/ffffffff rc 0/-1
>> Aug 11 13:49:15 buster-fe0 kernel: Lustre: 20221:0:(client.c:2067:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
>> Aug 11 13:49:15 buster-fe0 kernel: Lustre: home-MDT0000-mdc-ffff881005c37800: Connection to home-MDT0000 (at 192.168.0.4@tcp) was lost; in progress operations using this service will wait for recovery to complete
>> Aug 11 13:49:15 buster-fe0 kernel: Lustre: Skipped 2 previous similar messages
>> Aug 11 13:49:15 buster-fe0 kernel: Lustre: home-MDT0000-mdc-ffff881005c37800: Connection restored to 192.168.0.4@tcp (at 192.168.0.4@tcp)
>> Aug 11 13:49:15 buster-fe0 kernel: Lustre: Skipped 2 previous similar messages
>> Aug 11 13:49:50 buster-fe0 kernel: Lustre:
>> 20221:0:(client.c:2067:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1470919783/real 1470919783] req@ffff8807e6076c00 x1542357171466016/t0(0) o55->home-MDT0000-mdc-ffff881005c37800@192.168.0.4@tcp:12/10 lens 592/224 e 0 to 1 dl 1470919790 ref 2 fl Rpc:X/2/ffffffff rc 0/-1
>> Aug 11 13:49:50 buster-fe0 kernel: Lustre: 20221:0:(client.c:2067:ptlrpc_expire_one_request()) Skipped 4 previous similar messages
>> Aug 11 13:49:50 buster-fe0 kernel: Lustre: home-MDT0000-mdc-ffff881005c37800: Connection to home-MDT0000 (at 192.168.0.4@tcp) was lost; in progress operations using this service will wait for recovery to complete
>> Aug 11 13:49:50 buster-fe0 kernel: Lustre: Skipped 4 previous similar messages
>> Aug 11 13:49:50 buster-fe0 kernel: Lustre: home-MDT0000-mdc-ffff881005c37800: Connection restored to 192.168.0.4@tcp (at 192.168.0.4@tcp)
>> Aug 11 13:49:50 buster-fe0 kernel: Lustre: Skipped 4 previous similar messages
>> Aug 11 13:51:00 buster-fe0 kernel: Lustre: 20221:0:(client.c:2067:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1470919853/real 1470919853] req@ffff8807e6076c00 x1542357171466016/t0(0) o55->home-MDT0000-mdc-ffff881005c37800@192.168.0.4@tcp:12/10 lens 592/224 e 0 to 1 dl 1470919860 ref 2 fl Rpc:X/2/ffffffff rc 0/-1
>> Aug 11 13:51:00 buster-fe0 kernel: Lustre: 20221:0:(client.c:2067:ptlrpc_expire_one_request()) Skipped 9 previous similar messages
>> Aug 11 13:51:00 buster-fe0 kernel: Lustre: home-MDT0000-mdc-ffff881005c37800: Connection to home-MDT0000 (at 192.168.0.4@tcp) was lost; in progress operations using this service will wait for recovery to complete
>> Aug 11 13:51:00 buster-fe0 kernel: Lustre: Skipped 9 previous similar messages
>> Aug 11 13:51:00 buster-fe0 kernel: Lustre: home-MDT0000-mdc-ffff881005c37800: Connection restored to 192.168.0.4@tcp (at 192.168.0.4@tcp)
>> Aug 11 13:51:00 buster-fe0 kernel:
>> Lustre: Skipped 9 previous similar messages
>> Aug 11 13:53:13 buster-fe0 kernel: Lustre: 20221:0:(client.c:2067:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1470919986/real 1470919986] req@ffff8807e6076c00 x1542357171466016/t0(0) o55->home-MDT0000-mdc-ffff881005c37800@192.168.0.4@tcp:12/10 lens 592/224 e 0 to 1 dl 1470919993 ref 2 fl Rpc:X/2/ffffffff rc 0/-1
>> Aug 11 13:53:13 buster-fe0 kernel: Lustre: 20221:0:(client.c:2067:ptlrpc_expire_one_request()) Skipped 18 previous similar messages
>> Aug 11 13:53:13 buster-fe0 kernel: Lustre: home-MDT0000-mdc-ffff881005c37800: Connection to home-MDT0000 (at 192.168.0.4@tcp) was lost; in progress operations using this service will wait for recovery to complete
>> Aug 11 13:53:13 buster-fe0 kernel: Lustre: Skipped 18 previous similar messages
>> Aug 11 13:53:13 buster-fe0 kernel: Lustre: home-MDT0000-mdc-ffff881005c37800: Connection restored to 192.168.0.4@tcp (at 192.168.0.4@tcp)
>> Aug 11 13:53:13 buster-fe0 kernel: Lustre: Skipped 18 previous similar messages
>> Aug 11 13:55:38 buster-fe0 kernel: [<ffffffffc0c588fc>] ?
>> ll_lookup_finish_locks+0xfc/0x8a0 [lustre]
>> Aug 11 13:57:32 buster-fe0 kernel: Lustre: 20221:0:(client.c:2067:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1470920245/real 1470920245] req@ffff8807e6076c00 x1542357171466016/t0(0) o55->home-MDT0000-mdc-ffff881005c37800@192.168.0.4@tcp:12/10 lens 592/224 e 0 to 1 dl 1470920252 ref 2 fl Rpc:X/2/ffffffff rc 0/-1
>> Aug 11 13:57:32 buster-fe0 kernel: Lustre: 20221:0:(client.c:2067:ptlrpc_expire_one_request()) Skipped 36 previous similar messages
>> Aug 11 13:57:32 buster-fe0 kernel: Lustre: home-MDT0000-mdc-ffff881005c37800: Connection to home-MDT0000 (at 192.168.0.4@tcp) was lost; in progress operations using this service will wait for recovery to complete
>> Aug 11 13:57:32 buster-fe0 kernel: Lustre: Skipped 36 previous similar messages
>> Aug 11 13:57:32 buster-fe0 kernel: Lustre: home-MDT0000-mdc-ffff881005c37800: Connection restored to 192.168.0.4@tcp (at 192.168.0.4@tcp)
>> Aug 11 13:57:32 buster-fe0 kernel: Lustre: Skipped 36 previous similar messages
>> Aug 11 13:57:38 buster-fe0 kernel: [<ffffffffc0c588fc>] ? ll_lookup_finish_locks+0xfc/0x8a0 [lustre]
>> Aug 11 13:57:38 buster-fe0 kernel: [<ffffffffc0c588fc>] ? ll_lookup_finish_locks+0xfc/0x8a0 [lustre]
>> Aug 11 13:57:38 buster-fe0 kernel: [<ffffffffc0c588fc>] ? ll_lookup_finish_locks+0xfc/0x8a0 [lustre]
>> Aug 11 13:59:38 buster-fe0 kernel: [<ffffffffc0c588fc>] ? ll_lookup_finish_locks+0xfc/0x8a0 [lustre]
>> Aug 11 13:59:38 buster-fe0 kernel: [<ffffffffc0c588fc>] ? ll_lookup_finish_locks+0xfc/0x8a0 [lustre]
>> Aug 11 13:59:38 buster-fe0 kernel: [<ffffffffc0c588fc>] ?
>> ll_lookup_finish_locks+0xfc/0x8a0 [lustre]
>> Aug 11 14:06:10 buster-fe0 kernel: Lustre: 20221:0:(client.c:2067:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1470920763/real 1470920763] req@ffff8807e6076c00 x1542357171466016/t0(0) o55->home-MDT0000-mdc-ffff881005c37800@192.168.0.4@tcp:12/10 lens 592/224 e 0 to 1 dl 1470920770 ref 2 fl Rpc:X/2/ffffffff rc 0/-1
>> Aug 11 14:06:10 buster-fe0 kernel: Lustre: 20221:0:(client.c:2067:ptlrpc_expire_one_request()) Skipped 73 previous similar messages
>> Aug 11 14:06:10 buster-fe0 kernel: Lustre: home-MDT0000-mdc-ffff881005c37800: Connection to home-MDT0000 (at 192.168.0.4@tcp) was lost; in progress operations using this service will wait for recovery to complete
>> Aug 11 14:06:10 buster-fe0 kernel: Lustre: Skipped 73 previous similar messages
>> Aug 11 14:06:10 buster-fe0 kernel: Lustre: home-MDT0000-mdc-ffff881005c37800: Connection restored to 192.168.0.4@tcp (at 192.168.0.4@tcp)
>> Aug 11 14:06:10 buster-fe0 kernel: Lustre: Skipped 73 previous similar messages
>> Aug 11 14:16:12 buster-fe0 kernel: Lustre: 20221:0:(client.c:2067:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1470921365/real 1470921365] req@ffff8807e6076c00 x1542357171466016/t0(0) o55->home-MDT0000-mdc-ffff881005c37800@192.168.0.4@tcp:12/10 lens 592/224 e 0 to 1 dl 1470921372 ref 2 fl Rpc:X/2/ffffffff rc 0/-1
>> Aug 11 14:16:12 buster-fe0 kernel: Lustre: 20221:0:(client.c:2067:ptlrpc_expire_one_request()) Skipped 85 previous similar messages
>> Aug 11 14:16:12 buster-fe0 kernel: Lustre: home-MDT0000-mdc-ffff881005c37800: Connection to home-MDT0000 (at 192.168.0.4@tcp) was lost; in progress operations using this service will wait for recovery to complete
>> Aug 11 14:16:12 buster-fe0 kernel: Lustre: Skipped 85 previous similar messages
>> Aug 11 14:16:12 buster-fe0 kernel: Lustre: home-MDT0000-mdc-ffff881005c37800:
>> Connection restored to 192.168.0.4@tcp (at 192.168.0.4@tcp)
>> Aug 11 14:16:12 buster-fe0 kernel: Lustre: Skipped 85 previous similar messages
>> Aug 11 14:26:14 buster-fe0 kernel: Lustre: 20221:0:(client.c:2067:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1470921967/real 1470921967] req@ffff8807e6076c00 x1542357171466016/t0(0) o55->home-MDT0000-mdc-ffff881005c37800@192.168.0.4@tcp:12/10 lens 592/224 e 0 to 1 dl 1470921974 ref 2 fl Rpc:X/2/ffffffff rc 0/-1
>> Aug 11 14:26:14 buster-fe0 kernel: Lustre: 20221:0:(client.c:2067:ptlrpc_expire_one_request()) Skipped 85 previous similar messages
>> Aug 11 14:26:14 buster-fe0 kernel: Lustre: home-MDT0000-mdc-ffff881005c37800: Connection to home-MDT0000 (at 192.168.0.4@tcp) was lost; in progress operations using this service will wait for recovery to complete
>> Aug 11 14:26:14 buster-fe0 kernel: Lustre: Skipped 85 previous similar messages
>> Aug 11 14:26:14 buster-fe0 kernel: Lustre: home-MDT0000-mdc-ffff881005c37800: Connection restored to 192.168.0.4@tcp (at 192.168.0.4@tcp)
>> Aug 11 14:26:14 buster-fe0 kernel: Lustre: Skipped 85 previous similar messages
>> Aug 11 14:36:16 buster-fe0 kernel: Lustre: 20221:0:(client.c:2067:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1470922569/real 1470922569] req@ffff8807e6076c00 x1542357171466016/t0(0) o55->home-MDT0000-mdc-ffff881005c37800@192.168.0.4@tcp:12/10 lens 592/224 e 0 to 1 dl 1470922576 ref 2 fl Rpc:X/2/ffffffff rc 0/-1
>> Aug 11 14:36:16 buster-fe0 kernel: Lustre: 20221:0:(client.c:2067:ptlrpc_expire_one_request()) Skipped 85 previous similar messages
>> Aug 11 14:36:16 buster-fe0 kernel: Lustre: home-MDT0000-mdc-ffff881005c37800: Connection to home-MDT0000 (at 192.168.0.4@tcp) was lost; in progress operations using this service will wait for recovery to complete
>> Aug 11 14:36:16 buster-fe0 kernel: Lustre: Skipped 85 previous
>> similar messages
>> Aug 11 14:36:16 buster-fe0 kernel: Lustre: home-MDT0000-mdc-ffff881005c37800: Connection restored to 192.168.0.4@tcp (at 192.168.0.4@tcp)
>> Aug 11 14:36:16 buster-fe0 kernel: Lustre: Skipped 85 previous similar messages
>
> I also can't get the recently checked out sources to compile, but will
> post a separate query about that. :)
>
> Cheers.
>
> Phill.
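Reading the quoted journal more closely: every `ptlrpc_expire_one_request()` timeout refers to the same request, `x1542357171466016`, and in each printed entry the deadline (`dl`) is exactly 7 seconds after the send time; the client then reports the connection lost and restored, and the same request is sent again. Below is a minimal Python sketch for pulling those fields out of a saved copy of the journal. It assumes one journal entry per line (as in the unwrapped excerpt), and the regex and function names are hypothetical, written against the message format visible here rather than taken from any Lustre tool.

```python
import re

# Hypothetical pattern for the ptlrpc_expire_one_request() entries in the
# quoted journal (one entry per line). It captures the send time, the request
# xid and the deadline ("dl"), all as Unix timestamps in seconds.
TIMEOUT_RE = re.compile(
    r"\[sent (?P<sent>\d+)/real \d+\]\s+req@\S+\s+x(?P<xid>\d+)/t\d+\(\d+\)"
    r".*?\bdl (?P<dl>\d+)\b"
)


def parse_timeout(line):
    """Return (xid, sent, deadline, deadline - sent) for a timeout entry, else None."""
    m = TIMEOUT_RE.search(line)
    if m is None:
        return None
    sent, dl = int(m["sent"]), int(m["dl"])
    return m["xid"], sent, dl, dl - sent


def summarize(lines):
    """Count printed timeout entries per xid and collect the distinct timeouts seen."""
    counts, timeouts = {}, {}
    for line in lines:
        parsed = parse_timeout(line)
        if parsed is None:
            continue  # not a "Request sent has timed out" entry
        xid, _sent, _dl, timeout = parsed
        counts[xid] = counts.get(xid, 0) + 1
        timeouts.setdefault(xid, set()).add(timeout)
    return counts, timeouts
```

For example, call `summarize(open(path))` on a file holding `journalctl -k` output (the path is whatever you saved it as). Because the kernel rate-limits these messages ("Skipped N previous similar messages"), the counts only cover the entries that were actually printed.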
_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
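On the MTU suggestion: Linux exposes each interface's MTU as a decimal number in `/sys/class/net/<iface>/mtu`, so it is easy to dump on both ends of the link (buster-fe0 and the MDS at 192.168.0.4) and compare. A minimal Python sketch follows; `read_mtus()` and `mismatched()` are hypothetical helper names, and the expected value you pass to `mismatched()` has to come from how your network is actually configured, not from anything in this thread.

```python
from pathlib import Path


def read_mtus(sysfs_net="/sys/class/net"):
    """Return {interface_name: mtu} for every entry exposing a numeric mtu file.

    On Linux, /sys/class/net/<iface>/mtu holds the interface MTU as a decimal
    string. The sysfs_net parameter exists only so the function can be pointed
    at a fake directory tree (e.g. for testing).
    """
    mtus = {}
    base = Path(sysfs_net)
    if not base.is_dir():
        return mtus
    for iface in sorted(base.iterdir()):
        try:
            mtus[iface.name] = int((iface / "mtu").read_text().strip())
        except (OSError, ValueError):
            # Skip entries without a readable, numeric mtu file
            # (e.g. a plain file such as bonding_masters).
            continue
    return mtus


def mismatched(mtus, expected):
    """List interfaces (sorted by name) whose MTU differs from `expected`."""
    return [name for name, mtu in sorted(mtus.items()) if mtu != expected]


if __name__ == "__main__":
    # Print every interface's MTU on this node.
    for name, mtu in read_mtus().items():
        print(f"{name}: {mtu}")
```

Run it on both nodes and compare. If, say, 9000-byte jumbo frames are meant to be used end to end, `mismatched(read_mtus(), 9000)` lists the interfaces that disagree; the loopback interface will normally show up too and can be ignored.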