Those are the kinds of symptoms you would see if the client can connect to 
the MDS server but not to an OSS server.  Certain operations (mount, cd, ls) 
will work if the MDS server is reachable, even if one or more OSS servers are 
not reachable.  But other operations (“ls -la”, df) require info from the OSS 
servers, so those operations would hang.
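
To narrow down which OSS the client cannot reach, something like the
following might help.  It is only a sketch: the function name failed_osts
and the regular expression are my own assumptions based on the log format
in your dmesg output, not part of any Lustre tool.

```shell
#!/bin/sh
# Hypothetical helper (not a Lustre command): read kernel log text on stdin
# and print each "<target> <server NID>" pair that appears in it, once.
# The pattern assumes names shaped like the ones in the quoted log:
#   lustre-OST0016-osc-ffff885f5fa1f000@10.52.23.5@o2ib
failed_osts() {
    # Pull out every "<fsname>-OSTxxxx-osc-<id>@<ip>@<lnd>" token.
    grep -oE '[A-Za-z0-9_]+-OST[0-9a-fA-F]{4}-osc-[0-9a-f]+@[0-9.]+@[a-z0-9]+' |
        # Drop the "-osc-<id>@" part, leaving "<target> <NID>".
        sed -E 's/^(.*-OST[0-9a-fA-F]{4})-osc-[0-9a-f]+@(.*)$/\1 \2/' |
        # Collapse repeated messages for the same target.
        sort -u
}
```

You could run it as "dmesg | failed_osts", then try "lctl ping" against
each NID it prints (10.52.23.5@o2ib in your log) and "lfs check servers"
to see which targets the client considers connected.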

--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu


> On Sep 8, 2018, at 8:33 AM, fırat yılmaz <firatyilm...@gmail.com> wrote:
> 
> Hi There,
> 
> OS: CentOS 7.4
> Lustre Version: Intel® Manager for Lustre* software 4.0.3.0
> Interconnect: Mellanox OFED, ConnectX-5
> 
> On one of my Lustre clients, the df command reports an Input/output error. 
> I am unable to see the Lustre mount points in df, but the mtab file shows 
> that Lustre is mounted.
> 
> df -h output:
> 
> df: ‘/home’: Input/output error
> df: ‘/vol1’: Input/output error
> df: ‘/cm/shared’: Input/output error
> Filesystem        Size  Used Avail Use% Mounted on
> 
>  cat /etc/mtab |grep lustre
> 
> 10.51.22.11@o2ib:10.51.22.10@o2ib:/lustre/home /home lustre rw,flock,lazystatfs 0 0
> 10.51.22.11@o2ib:10.51.22.10@o2ib:/lustre /vol1 lustre rw,flock,lazystatfs 0 0
> 10.51.22.11@o2ib:10.51.22.10@o2ib:/lustre/cmshared /cm/shared lustre rw,flock,lazystatfs 0 0
> 
> When I cd to the mount point I can reach the Lustre filesystem, and I can 
> create and delete files and folders. But when I cd to a large file and run 
> ls -lah, the response from the Lustre client freezes.
> 
> dmesg output:
> [84276.460557] Lustre: 5617:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1536408434/real 1536408489]  req@ffff882f31697800 x1610952588839712/t0(0) o8->lustre-OST0016-osc-ffff885f5fa1f000@10.52.23.5@o2ib:28/4 lens 520/544 e 0 to 1 dl 1536408714 ref 1 fl Rpc:eXN/0/ffffffff rc 0/-1
> [84276.460565] Lustre: 5617:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 910 previous similar messages
> [84386.986467] LustreError: 122750:0:(llite_lib.c:1772:ll_statfs_internal()) obd_statfs fails: rc = -5
> [84386.986471] LustreError: 122750:0:(llite_lib.c:1772:ll_statfs_internal()) Skipped 29 previous similar messages
> [84704.429967] LNet: 5429:0:(o2iblnd_cb.c:3192:kiblnd_check_conns()) Timed out tx for 10.52.23.5@o2ib: 4379575 seconds
> [84704.429970] LNet: 5429:0:(o2iblnd_cb.c:3192:kiblnd_check_conns()) Skipped 863 previous similar messages
> [84881.004949] Lustre: 5617:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1536409034/real 1536409095]  req@ffff882f2a6e5700 x1610952588854608/t0(0) o8->lustre-OST002e-osc-ffff885f5fa1f000@10.52.23.5@o2ib:28/4 lens 520/544 e 0 to 1 dl 1536409314 ref 1 fl Rpc:eXN/0/ffffffff rc 0/-1
> [84881.004957] Lustre: 5617:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 863 previous similar messages
> [85065.953686] LustreError: 123635:0:(llite_lib.c:1772:ll_statfs_internal()) obd_statfs fails: rc = -5
> [85065.953689] LustreError: 123635:0:(llite_lib.c:1772:ll_statfs_internal()) Skipped 26 previous similar messages
> 
> fstab mount options:
> lustre       flock,_netdev,x-systemd.requires=lnet.service 0 0
> 
> ib_* benchmark test results are normal.
> 
> Where should I check?
> 
> Best Regards.
> 
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

