Hi Phill,

I understand that you’re running master on your clients (tag v2_8_56 was created 4 days ago) and 2.1 on the servers? Running master in production is already a challenge. Also, Lustre has never been good at cross-version compatibility. For example, it is possible to make 2.1 servers work with 2.5 clients and 2.5 servers work with 2.7 clients, though additional patches may be needed.
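To put a number on that gap, here is a minimal Python sketch that parses the version strings seen in this thread and compares their minor components. It is only an illustration, not a Lustre tool: the helper names `parse_lustre_version` and `minor_gap` are made up for this example, and the regular expression assumes the strings look like `2.1.0.0`, `v2_8_56` or `2.8.56_26_g6fad3ab`.

```python
import re


def parse_lustre_version(text):
    """Extract (major, minor, patch) from strings such as '2.1.0.0',
    'v2_8_56' or '2.8.56_26_g6fad3ab' (hypothetical helper)."""
    match = re.match(r"v?(\d+)[._](\d+)(?:[._](\d+))?", text.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def minor_gap(client, server):
    """Return how many minor versions the client is ahead of the server."""
    c_major, c_minor, _ = parse_lustre_version(client)
    s_major, s_minor, _ = parse_lustre_version(server)
    if c_major != s_major:
        raise ValueError("major versions differ; minor gap is not meaningful")
    return c_minor - s_minor


# Version strings taken from the startup log quoted in this thread.
client = "2.8.56_26_g6fad3ab"
server = "2.1.0.0"
print("client:", parse_lustre_version(client))
print("server:", parse_lustre_version(server))
print("minor versions apart:", minor_gap(client, server))
```

For comparison, the two pairings mentioned above (2.1 with 2.5, and 2.5 with 2.7) are 4 and 2 minor versions apart.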
I would say try to reduce the gap: upgrade your servers and/or try an official Lustre release on your clients…

All the best,
Stephane

> On Aug 12, 2016, at 5:37 AM, Phill Harvey-Smith <p.harvey-sm...@warwick.ac.uk> wrote:
>
> On 11/08/2016 16:10, Colin Faber wrote:
>>> First glance indicates you're having network connectivity problems,
>>> (possibly driver issue with your NIC?)
>
> I don't seem to have had any problems with any other services running on the
> cluster, and there are no messages in the journal or the /var/log files
> relating to network errors.
>
> Oddly though, when the /home filesystem hangs, the /storage and /scratch
> filesystems, also served by the same Lustre servers, continue to respond
> without problems.
>
> What does seem to have some bearing on it is that the first few writes seem
> to succeed and then it will hang. Though it was first noticed through Samba,
> it also appears to happen when logged in to the console directly.
>
>>> (Check MTU settings, etc?)
>
> Pasting as quotation as it stops thunderbird from wrapping the text.....
>
>> root@test-r710:~# ifconfig
>> eno1      Link encap:Ethernet  HWaddr 00:26:b9:84:c7:8d
>>           inet addr:192.168.1.80  Bcast:192.168.1.255  Mask:255.255.255.0
>>           inet6 addr: fe80::226:b9ff:fe84:c78d/64 Scope:Link
>>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>           RX packets:8516 errors:0 dropped:0 overruns:0 frame:0
>>           TX packets:23199 errors:0 dropped:0 overruns:0 carrier:0
>>           collisions:0 txqueuelen:1000
>>           RX bytes:5297958 (5.2 MB)  TX bytes:3222616 (3.2 MB)
>>
>> eno2      Link encap:Ethernet  HWaddr 00:26:b9:84:c7:8f
>>           inet addr:192.168.0.80  Bcast:192.168.0.255  Mask:255.255.255.0
>>           inet6 addr: fe80::226:b9ff:fe84:c78f/64 Scope:Link
>>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>           RX packets:1374513 errors:0 dropped:0 overruns:0 frame:0
>>           TX packets:168485 errors:0 dropped:0 overruns:0 carrier:0
>>           collisions:0 txqueuelen:1000
>>           RX bytes:2026863011 (2.0 GB)  TX bytes:21861558 (21.8 MB)
>>
>> eno4      Link encap:Ethernet  HWaddr 00:26:b9:84:c7:93
>>           inet addr:137.205.232.159  Bcast:137.205.232.255  Mask:255.255.255.128
>>           inet6 addr: fe80::226:b9ff:fe84:c793/64 Scope:Link
>>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>           RX packets:11483 errors:0 dropped:0 overruns:0 frame:0
>>           TX packets:10560 errors:0 dropped:0 overruns:0 carrier:0
>>           collisions:0 txqueuelen:1000
>>           RX bytes:3504764 (3.5 MB)  TX bytes:5731764 (5.7 MB)
>
>
>> root@test-r710:~# route -n
>> Kernel IP routing table
>> Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
>> 0.0.0.0         137.205.232.254 0.0.0.0         UG    0      0        0 eno4
>> 137.205.232.128 0.0.0.0         255.255.255.128 U     0      0        0 eno4
>> 192.168.0.0     0.0.0.0         255.255.255.0   U     0      0        0 eno2
>> 192.168.1.0     0.0.0.0         255.255.255.0   U     0      0        0 eno1
>
> Lustre mounts in fstab:
>> # Lustre mounted
>> 192.168.0.4@tcp0:/storage /storage lustre defaults,_netdev,flock 0 0
>> 192.168.0.4@tcp0:/home /home lustre defaults,_netdev,flock 0 0
>> 192.168.0.4@tcp0:/scratch /scratch lustre defaults,_netdev,flock 0 0
>
> I've also tried compiling the latest source and installing
> those modules:
> Lustre: Build Version: 2.8.56_26_g6fad3ab. This does not seem to have the
> problem with matlab (mentioned about a month or so ago), but still has the
> hanging problem.
>
> The Lustre startup logs in the journal are here:
>> Aug 12 12:57:10 test-r710 kernel: Lustre: Lustre: Build Version: 2.8.56_26_g6fad3ab
>> Aug 12 12:57:10 test-r710 kernel: Lustre: Server MGS version (2.1.0.0) is much older than client. Consider upgrading server (2.8.56_26_g6fad3ab)
>> Aug 12 12:57:10 test-r710 kernel: Lustre: Trying to mount a client with IR setting not compatible with current mgc. Force to use current mgc setting that is IR disabled.
>> Aug 12 12:57:10 test-r710 kernel: Lustre: Mounted home-client
>
> Cheers.
>
> Phill.
>
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
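
On the MTU check suggested earlier in the thread: a minimal Python sketch that pulls the per-interface MTU out of legacy net-tools `ifconfig` output like the paste above and reports whether the values agree. The function name `mtus_from_ifconfig` is made up for this example, the parser assumes the `MTU:<n>` layout shown in the paste, and the sample is abbreviated from it. It only looks at the client side; it says nothing about the MTU configured on the servers or switches.

```python
import re


def mtus_from_ifconfig(output):
    """Map interface name -> MTU for legacy net-tools ifconfig output
    (hypothetical helper; assumes the 'MTU:<n>' layout)."""
    mtus = {}
    current = None
    for line in output.splitlines():
        # A new interface stanza starts at column 0 with the interface name.
        if line and not line[0].isspace():
            current = line.split()[0]
        match = re.search(r"MTU:(\d+)", line)
        if match and current is not None:
            mtus[current] = int(match.group(1))
    return mtus


# Abbreviated from the ifconfig paste in this thread.
sample = """\
eno1      Link encap:Ethernet  HWaddr 00:26:b9:84:c7:8d
          inet addr:192.168.1.80  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
eno2      Link encap:Ethernet  HWaddr 00:26:b9:84:c7:8f
          inet addr:192.168.0.80  Bcast:192.168.0.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
"""

mtus = mtus_from_ifconfig(sample)
print(mtus)
print("all MTUs equal:", len(set(mtus.values())) == 1)
```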