Those are the kinds of symptoms you would see if the client can connect to the MDS but not to one or more OSS servers. Certain operations (mount, cd, ls) will work as long as the MDS is reachable, even if one or more OSS servers are not. But other operations (“ls -la”, df) need information from the OSS servers, so those operations hang.
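To see which server the client is failing to reach, one option is to group the "network error" messages in the client's dmesg output by the NID they point at. The following is only a minimal Python sketch: the function name `failed_targets_by_nid` and the regular expression are assumptions made for illustration (they are not part of any Lustre tool), and the pattern is based solely on the two log lines quoted in this thread, so it may need adjusting for other log layouts.

```python
import re
from collections import defaultdict

# Two client log lines copied from the dmesg output quoted in this thread
# (the mail client's line wraps have been joined back together).
SAMPLE_DMESG = """\
[84276.460557] Lustre: 5617:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1536408434/real 1536408489] req@ffff882f31697800 x1610952588839712/t0(0) o8->lustre-OST0016-osc-ffff885f5fa1f000@10.52.23.5@o2ib:28/4 lens 520/544 e 0 to 1 dl 1536408714 ref 1 fl Rpc:eXN/0/ffffffff rc 0/-1
[84881.004949] Lustre: 5617:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1536409034/real 1536409095] req@ffff882f2a6e5700 x1610952588854608/t0(0) o8->lustre-OST002e-osc-ffff885f5fa1f000@10.52.23.5@o2ib:28/4 lens 520/544 e 0 to 1 dl 1536409314 ref 1 fl Rpc:eXN/0/ffffffff rc 0/-1
"""

# Assumed layout: "o<opcode>-><fsname>-OSTxxxx-<client device>@<IPv4>@<lnd>:..."
TARGET_RE = re.compile(
    r"o\d+->(?P<target>\S+?-(?:OST|MDT)[0-9a-f]{4})-\S+?"
    r"@(?P<nid>\d+(?:\.\d+){3}@\w+)"
)


def failed_targets_by_nid(log_text):
    """Map each server NID to the sorted list of targets that had network errors."""
    by_nid = defaultdict(set)
    for line in log_text.splitlines():
        # Only look at the request-expiry messages that report a network error.
        if "network error" not in line:
            continue
        match = TARGET_RE.search(line)
        if match:
            by_nid[match.group("nid")].add(match.group("target"))
    return {nid: sorted(targets) for nid, targets in by_nid.items()}


if __name__ == "__main__":
    for nid, targets in failed_targets_by_nid(SAMPLE_DMESG).items():
        print(f"{nid}: {', '.join(targets)}")
```

For the two sample lines this prints `10.52.23.5@o2ib: lustre-OST0016, lustre-OST002e`, so both failing OSTs sit behind the same NID. Connectivity to that NID could then be checked from the client with `lctl ping 10.52.23.5@o2ib`.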
--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu

> On Sep 8, 2018, at 8:33 AM, fırat yılmaz <firatyilm...@gmail.com> wrote:
>
> Hi There,
>
> OS=Centos 7.4
> Lustre Version: Intel® Manager for Lustre* software 4.0.3.0
> Interconnect: Mellanox OFED, ConnectX-5
>
> On one of my Lustre clients I get an Input/output error from the df command. I am
> unable to see the Lustre mount points in df, but the mtab file shows that Lustre is
> mounted.
>
> df -h output:
>
> df: ‘/home’: Input/output error
> df: ‘/vol1’: Input/output error
> df: ‘/cm/shared’: Input/output error
> Filesystem Size Used Avail Use% Mounted on
>
> cat /etc/mtab |grep lustre
>
> 10.51.22.11@o2ib:10.51.22.10@o2ib:/lustre/home /home lustre rw,flock,lazystatfs 0 0
> 10.51.22.11@o2ib:10.51.22.10@o2ib:/lustre /vol1 lustre rw,flock,lazystatfs 0 0
> 10.51.22.11@o2ib:10.51.22.10@o2ib:/lustre/cmshared /cm/shared lustre rw,flock,lazystatfs 0 0
>
> When I cd to the mount point I can reach the Lustre filesystem, and I can
> create and delete files and folders. But when I cd to a large file and run
> ls -lah, the response from the Lustre client freezes.
>
> dmesg output:
> [84276.460557] Lustre: 5617:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1536408434/real 1536408489] req@ffff882f31697800 x1610952588839712/t0(0) o8->lustre-OST0016-osc-ffff885f5fa1f000@10.52.23.5@o2ib:28/4 lens 520/544 e 0 to 1 dl 1536408714 ref 1 fl Rpc:eXN/0/ffffffff rc 0/-1
> [84276.460565] Lustre: 5617:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 910 previous similar messages
> [84386.986467] LustreError: 122750:0:(llite_lib.c:1772:ll_statfs_internal()) obd_statfs fails: rc = -5
> [84386.986471] LustreError: 122750:0:(llite_lib.c:1772:ll_statfs_internal()) Skipped 29 previous similar messages
> [84704.429967] LNet: 5429:0:(o2iblnd_cb.c:3192:kiblnd_check_conns()) Timed out tx for 10.52.23.5@o2ib: 4379575 seconds
> [84704.429970] LNet: 5429:0:(o2iblnd_cb.c:3192:kiblnd_check_conns()) Skipped 863 previous similar messages
> [84881.004949] Lustre: 5617:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1536409034/real 1536409095] req@ffff882f2a6e5700 x1610952588854608/t0(0) o8->lustre-OST002e-osc-ffff885f5fa1f000@10.52.23.5@o2ib:28/4 lens 520/544 e 0 to 1 dl 1536409314 ref 1 fl Rpc:eXN/0/ffffffff rc 0/-1
> [84881.004957] Lustre: 5617:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 863 previous similar messages
> [85065.953686] LustreError: 123635:0:(llite_lib.c:1772:ll_statfs_internal()) obd_statfs fails: rc = -5
> [85065.953689] LustreError: 123635:0:(llite_lib.c:1772:ll_statfs_internal()) Skipped 26 previous similar messages
>
> fstab mount options:
> lustre flock,_netdev,x-systemd.requires=lnet.service 0 0
>
> ib_* benchmark tests are as usual.
>
> Where should I check?
>
> Best Regards.
>
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org