Hi Phill,

I understand that you’re running master on your clients (tag v2_8_56 was 
created 4 days ago) and 2.1 on the servers? Running master in production is 
already a challenge. Also, Lustre has never been good at cross-version 
compatibility. At best, it is possible to make 2.1 servers work with 2.5 
clients and 2.5 servers work with 2.7 clients, and even then additional 
patches may be needed. A 2.1 to 2.8 gap is wider than either of those.

I would say try to reduce the gap: upgrade your servers and/or try an official 
Lustre release on your clients…
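
To put a number on the gap, here is a rough sketch (not a Lustre tool; the 
function names are made up for illustration) that parses the two version 
strings from your startup log and reports how many minor releases apart they 
are. It assumes the version string starts with dot- or underscore-separated 
numbers, as in the log lines you pasted.

```python
import re


def parse_lustre_version(text):
    """Return the leading numeric components of a version string as ints.

    Hypothetical helper: "2.8.56_26_g6fad3ab" -> (2, 8, 56, 26).
    The trailing git hash is ignored because it does not start with a digit.
    """
    match = re.match(r"\d+(?:[._]\d+)*", text.strip())
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in re.split(r"[._]", match.group(0)))


def minor_gap(client, server):
    """Minor-release difference (client minus server), or None if majors differ."""
    c = parse_lustre_version(client) + (0,)  # pad so a bare "2" still has a minor
    s = parse_lustre_version(server) + (0,)
    if c[0] != s[0]:
        return None
    return c[1] - s[1]


# Version strings copied from the startup log quoted below.
client = "2.8.56_26_g6fad3ab"
server = "2.1.0.0"
print(parse_lustre_version(client))  # (2, 8, 56, 26)
print(parse_lustre_version(server))  # (2, 1, 0, 0)
print(minor_gap(client, server))     # 7
```

For comparison, the pairings I mentioned above (2.1/2.5 and 2.5/2.7) are 4 
and 2 minor releases apart.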

All the best,
Stephane


> On Aug 12, 2016, at 5:37 AM, Phill Harvey-Smith 
> <p.harvey-sm...@warwick.ac.uk> wrote:
> 
> On 11/08/2016 16:10, Colin Faber wrote:
>>> First glance indicates you're having network connectivity problems,
>>> (possibly driver issue with your NIC?)
> 
> I don't seem to have had any problems with any other services running on the 
> cluster, and there are no messages in the journal or the /var/log files 
> relating to network errors.
> 
> Oddly though, when the /home filesystem hangs, the /storage and /scratch 
> filesystems, also served by the same Lustre servers, continue to respond
> without problems.
> 
> What does seem to have some bearing on it is that the first few writes seem 
> to succeed and then it hangs. Though it was first noticed through Samba, 
> it also appears to happen when logged in to the console directly.
> 
>>> (Check MTU settings, etc?)
> 
> Pasting as quotation as it stops thunderbird from wrapping the text.....
> 
>> root@test-r710:~# ifconfig
>> eno1      Link encap:Ethernet  HWaddr 00:26:b9:84:c7:8d
>>          inet addr:192.168.1.80  Bcast:192.168.1.255  Mask:255.255.255.0
>>          inet6 addr: fe80::226:b9ff:fe84:c78d/64 Scope:Link
>>          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>          RX packets:8516 errors:0 dropped:0 overruns:0 frame:0
>>          TX packets:23199 errors:0 dropped:0 overruns:0 carrier:0
>>          collisions:0 txqueuelen:1000
>>          RX bytes:5297958 (5.2 MB)  TX bytes:3222616 (3.2 MB)
>> 
>> eno2      Link encap:Ethernet  HWaddr 00:26:b9:84:c7:8f
>>          inet addr:192.168.0.80  Bcast:192.168.0.255  Mask:255.255.255.0
>>          inet6 addr: fe80::226:b9ff:fe84:c78f/64 Scope:Link
>>          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>          RX packets:1374513 errors:0 dropped:0 overruns:0 frame:0
>>          TX packets:168485 errors:0 dropped:0 overruns:0 carrier:0
>>          collisions:0 txqueuelen:1000
>>          RX bytes:2026863011 (2.0 GB)  TX bytes:21861558 (21.8 MB)
>> 
>> eno4      Link encap:Ethernet  HWaddr 00:26:b9:84:c7:93
>>          inet addr:137.205.232.159  Bcast:137.205.232.255  Mask:255.255.255.128
>>          inet6 addr: fe80::226:b9ff:fe84:c793/64 Scope:Link
>>          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>          RX packets:11483 errors:0 dropped:0 overruns:0 frame:0
>>          TX packets:10560 errors:0 dropped:0 overruns:0 carrier:0
>>          collisions:0 txqueuelen:1000
>>          RX bytes:3504764 (3.5 MB)  TX bytes:5731764 (5.7 MB)
> 
> 
>> root@test-r710:~# route -n
>> Kernel IP routing table
>> Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
>> 0.0.0.0         137.205.232.254 0.0.0.0         UG    0      0        0 eno4
>> 137.205.232.128 0.0.0.0         255.255.255.128 U     0      0        0 eno4
>> 192.168.0.0     0.0.0.0         255.255.255.0   U     0      0        0 eno2
>> 192.168.1.0     0.0.0.0         255.255.255.0   U     0      0        0 eno1
> 
> Lustre mounts in fstab :
>> # Lustre mounted
>> 192.168.0.4@tcp0:/storage       /storage        lustre  defaults,_netdev,flock 0 0
>> 192.168.0.4@tcp0:/home          /home           lustre  defaults,_netdev,flock 0 0
>> 192.168.0.4@tcp0:/scratch       /scratch        lustre  defaults,_netdev,flock 0 0
> 
> I've also tried compiling the latest source and installing those modules : 
> Lustre: Build Version: 2.8.56_26_g6fad3ab. This does not seem to have the 
> problem with matlab (mentioned about a month or so ago), but still has the 
> hanging problem.
> 
> The Lustre startup logs in the journal are here :
>> Aug 12 12:57:10 test-r710 kernel: Lustre: Lustre: Build Version: 2.8.56_26_g6fad3ab
>> Aug 12 12:57:10 test-r710 kernel: Lustre: Server MGS version (2.1.0.0) is much older than client. Consider upgrading server (2.8.56_26_g6fad3ab)
>> Aug 12 12:57:10 test-r710 kernel: Lustre: Trying to mount a client with IR setting not compatible with current mgc. Force to use current mgc setting that is IR disabled.
>> Aug 12 12:57:10 test-r710 kernel: Lustre: Mounted home-client
> 
> 
> Cheers.
> 
> Phill.
> 
> 
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

