Steve,

It might be the network that LNet is running on. Have you run some bandwidth tests without LNet to check for network problems?

On Dec 11, 2016 3:37 PM, "Steve Barnet" <bar...@icecube.wisc.edu> wrote:
> Hi all,
>
> Seeing something very strange. I recently added two OSSes
> and 10 OSTs to one of our filesystems. Things look OK under
> light loads, but when we load them up, we start seeing lots
> of LNet errors.
>
> OS: Scientific Linux 6.7
> Lustre - Server: 2.8.0 Community version
> Lustre - Client: 2.5.3
>
> The errors are below. Do these narrow the range of possible
> problems?
>
>
> Dec 11 11:17:39 lfs-ex-oss-20 kernel: LNetError:
> 7732:0:(socklnd_cb.c:2509:ksocknal_check_peer_timeouts()) Total 4 stale
> ZC_REQs for peer 10.128.10.29@tcp1 detected; the oldest(ffff880f6a90e000)
> timed out 7 secs ago, resid: 0, wmem: 0
> Dec 11 11:17:39 lfs-ex-oss-20 kernel: LustreError:
> 7732:0:(events.c:447:server_bulk_callback()) event type 5, status -5,
> desc ffff8805379f8000
> Dec 11 11:17:39 lfs-ex-oss-20 kernel: LustreError:
> 7732:0:(events.c:447:server_bulk_callback()) event type 5, status -5,
> desc ffff880f375dc000
> Dec 11 11:17:39 lfs-ex-oss-20 kernel: LustreError:
> 8234:0:(ldlm_lib.c:3175:target_bulk_io()) @@@ network error on bulk READ
> req@ffff880e506263c0 x1551187318090340/t0(0)
> o3->092e941d-272a-09e3-502b-9338dbf387d3@10.128.10.29@tcp1:587/0 lens
> 488/432 e 3 to 0 dl 1481476687 ref 1 fl Interpret:/0/0 rc 0/0
> Dec 11 11:17:39 lfs-ex-oss-20 kernel: LustreError:
> 8234:0:(ldlm_lib.c:3175:target_bulk_io()) Skipped 1 previous similar
> message
> Dec 11 11:17:39 lfs-ex-oss-20 kernel: Lustre: lfs2-OST0024: Bulk IO read
> error with 092e941d-272a-09e3-502b-9338dbf387d3 (at 10.128.10.29@tcp1),
> client will retry: rc -110
> Dec 11 11:17:39 lfs-ex-oss-20 kernel: LustreError:
> 7732:0:(events.c:447:server_bulk_callback()) event type 5, status -5,
> desc ffff8804db0ce000
> Dec 11 11:17:39 lfs-ex-oss-20 kernel: LustreError:
> 7732:0:(events.c:447:server_bulk_callback()) event type 5, status -5,
> desc ffff880aa4374000
>
>
> Thanks much!
>
> Best,
>
> ---Steve
>
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
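
For the kind of bandwidth test mentioned at the top of this reply, a dedicated tool such as iperf is usually the better choice. If nothing like that is installed, a plain TCP transfer can give a rough figure that does not involve LNet at all. Below is a minimal Python sketch. The function names, the 1 MiB chunk size, and the loopback demo are illustrative assumptions, not anything taken from the setup described above.

```python
import socket
import threading
import time

CHUNK = 1 << 20  # 1 MiB per send/recv call (arbitrary choice for this sketch)


def sink(server_sock, result):
    """Accept one connection, discard the data, record bytes and seconds."""
    conn, _ = server_sock.accept()
    received = 0
    start = time.perf_counter()
    with conn:
        while True:
            data = conn.recv(CHUNK)
            if not data:  # sender closed the connection
                break
            received += len(data)
    result["received"] = received
    result["seconds"] = time.perf_counter() - start


def send_bytes(host, port, total_bytes):
    """Send total_bytes of zeros to host:port over a plain TCP socket."""
    payload = b"\0" * CHUNK
    remaining = total_bytes
    with socket.create_connection((host, port)) as sock:
        while remaining > 0:
            n = min(remaining, CHUNK)
            sock.sendall(payload[:n])
            remaining -= n


def loopback_demo(total_bytes=64 * CHUNK):
    """Run sink and sender in one process over 127.0.0.1.

    Returns (bytes_received, rate_in_MB_per_s). Loopback only shows that
    the code works; it says nothing about the real network.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    result = {}
    receiver = threading.Thread(target=sink, args=(server, result))
    receiver.start()
    send_bytes("127.0.0.1", port, total_bytes)
    receiver.join()
    server.close()

    rate = result["received"] / result["seconds"] / 1e6
    return result["received"], rate


if __name__ == "__main__":
    received, rate = loopback_demo()
    print(f"received {received} bytes at roughly {rate:.0f} MB/s")
```

To test a real path, sink() would run on one host (for example the new OSS) bound to its own address, and send_bytes() on the other (for example the client at 10.128.10.29). A single stream like this will not necessarily reproduce the heavy-load condition, so several senders in parallel may be needed before anything shows up.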