Hi Louis,

If you don’t have any particular monitoring on the servers (Prometheus,
Ganglia, etc.) you could also use sar (sysstat) or a similar tool to confirm
that the CPU is waiting on I/O. You can also check device saturation with sar
or with iostat. For instance:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.19    0.00    6.09    0.10    0.06   93.55

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     1.20    0.20    0.60     0.00     0.01    20.00     0.00    0.75    1.00    0.67   0.75   0.06
sdb               0.00   136.80    2.80   96.60     0.81     9.21   206.42     0.19    1.91   26.29    1.20   0.55   5.46
sdc               0.00   144.20   58.80  128.00     2.34    16.82   210.08     0.24    1.31    2.68    0.68   0.66  12.40
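
Those figures come from standard sysstat/iostat runs, e.g. (a minimal sketch;
the intervals and options below are only examples):

# CPU utilization, including %iowait, sampled every 5 seconds
sar -u 5

# extended per-device statistics in MB/s (the table above is from an
# invocation along these lines)
iostat -xm 5

# per-device saturation from sysstat, with device names resolved
sar -d -p 5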

Then, if you enable Lustre job stats, you can check which job is doing the
most I/O on that specific device. Last but not least, you could also work out
which specific NID is doing the heavy I/O on that OST
(/proc/fs/lustre/obdfilter/<fs>-OST0007/exports/*/stats).
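
A rough sketch of how to do that (jobid_var normally has to be set on the
clients, or persistently via the MGS; the OST name and the jobid_var value
below are only examples):

# tag RPCs with a job identifier; procname_uid is one common choice
lctl set_param jobid_var=procname_uid

# per-job I/O counters for the busy OST (run on the OSS)
lctl get_param obdfilter.*-OST0007.job_stats

# per-client (NID) read/write counters under /proc on the OSS
for f in /proc/fs/lustre/obdfilter/*-OST0007/exports/*/stats; do
    echo "== $f"
    grep -E 'read_bytes|write_bytes' "$f"
done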

Regards,

Diego


From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on behalf of 
Louis Allen <louisal...@live.co.uk>
Date: Tuesday, 29 October 2019 at 17:43
To: "Oral, H." <ora...@ornl.gov>, "Carlson, Timothy S" 
<timothy.carl...@pnnl.gov>, "lustre-discuss@lists.lustre.org" 
<lustre-discuss@lists.lustre.org>
Subject: Re: [lustre-discuss] [EXTERNAL] Re: Lustre Timeouts/Filesystem Hanging

Thanks, will take a look.

Any other areas I should be looking at? Should I be applying any Lustre tuning?

Thanks

________________________________
From: Oral, H. <ora...@ornl.gov>
Sent: Monday, October 28, 2019 7:06:41 PM
To: Louis Allen <louisal...@live.co.uk>; Carlson, Timothy S 
<timothy.carl...@pnnl.gov>; lustre-discuss@lists.lustre.org 
<lustre-discuss@lists.lustre.org>
Subject: Re: [EXTERNAL] Re: [lustre-discuss] Lustre Timeouts/Filesystem Hanging

For inspecting client side I/O, you can use Darshan.
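
For example, roughly along these lines (the library path, log location and MPI
launch are site-specific and only illustrative):

# instrument the application at run time by preloading the Darshan library
export LD_PRELOAD=/path/to/libdarshan.so
mpirun -np 64 ./my_app

# afterwards, dump the recorded per-file I/O counters from the log
darshan-parser /path/to/darshan-logs/*my_app*.darshan | less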

Thanks,

Sarp

--
Sarp Oral, PhD

National Center for Computational Sciences
Oak Ridge National Laboratory
ora...@ornl.gov
865-574-2173


On 10/28/19, 1:58 PM, "lustre-discuss on behalf of Louis Allen" 
<lustre-discuss-boun...@lists.lustre.org on behalf of louisal...@live.co.uk> 
wrote:


    Thanks for the reply, Tim.


    Are there any tools I can use to see if that is the cause?


    Could any tuning possibly help the situation?


    Thanks





    ________________________________________
    From: Carlson, Timothy S <timothy.carl...@pnnl.gov>
    Sent: Monday, 28 October 2019, 17:24
    To: Louis Allen; lustre-discuss@lists.lustre.org
    Subject: RE: Lustre Timeouts/Filesystem Hanging


    In my experience, this is almost always related to some code doing really
bad I/O. Let’s say you have a 1000-rank MPI code doing open/read 4k/close on a
few specific files on that OST. That will make for a bad day.

    The other place you can see this, and this isn’t your case, is when ZFS
refuses to give up on a disk that is failing, and your overall I/O suffers from
ZFS continuing to try to read from a disk that it should just kick out.

    Tim


    From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org>
    On Behalf Of Louis Allen
    Sent: Monday, October 28, 2019 10:16 AM
    To: lustre-discuss@lists.lustre.org
    Subject: [lustre-discuss] Lustre Timeouts/Filesystem Hanging



    Hello,



    Lustre (2.12) seems to be hanging quite frequently (5+ times a day) for us,
and one of the OSS servers (out of 4) is reporting an extremely high load
average (150+) while the CPU usage of that server is actually very low, so it
must be related to something else, possibly CPU_IO_WAIT.



    On the OSS server where we are seeing the high load averages, we can also
see multiple LustreError messages in /var/log/messages:



    Oct 28 11:22:23 pazlustreoss001 kernel: LNet: Service thread pid 2403 was 
inactive for 200.08s. The thread might be hung, or it might only be slow and 
will resume later. Dumping the stack trace
     for debugging purposes:
    Oct 28 11:22:23 pazlustreoss001 kernel: LNet: Skipped 4 previous similar 
messages

    Oct 28 11:22:23 pazlustreoss001 kernel: Pid: 2403, comm: ll_ost00_068 
3.10.0-957.10.1.el7_lustre.x86_64 #1 SMP Sun May 26 21:48:35 UTC 2019

    Oct 28 11:22:23 pazlustreoss001 kernel: Call Trace:

    Oct 28 11:22:23 pazlustreoss001 kernel: [<ffffffffc03747c5>] 
jbd2_log_wait_commit+0xc5/0x140 [jbd2]

    Oct 28 11:22:23 pazlustreoss001 kernel: [<ffffffffc0375e52>] 
jbd2_complete_transaction+0x52/0xa0 [jbd2]

    Oct 28 11:22:23 pazlustreoss001 kernel: [<ffffffffc0732da2>] 
ldiskfs_sync_file+0x2e2/0x320 [ldiskfs]

    Oct 28 11:22:23 pazlustreoss001 kernel: [<ffffffffa52760b0>] 
vfs_fsync_range+0x20/0x30

    Oct 28 11:22:23 pazlustreoss001 kernel: [<ffffffffc0c8b651>] 
osd_object_sync+0xb1/0x160 [osd_ldiskfs]

    Oct 28 11:22:23 pazlustreoss001 kernel: [<ffffffffc0ab48a7>] 
tgt_sync+0xb7/0x270 [ptlrpc]

    Oct 28 11:22:23 pazlustreoss001 kernel: [<ffffffffc0dc3731>] 
ofd_sync_hdl+0x111/0x530 [ofd]

    Oct 28 11:22:23 pazlustreoss001 kernel: [<ffffffffc0aba1da>] 
tgt_request_handle+0xaea/0x1580 [ptlrpc]

    Oct 28 11:22:23 pazlustreoss001 kernel: [<ffffffffc0a5f80b>] 
ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]

    Oct 28 11:22:23 pazlustreoss001 kernel: [<ffffffffc0a6313c>] 
ptlrpc_main+0xafc/0x1fc0 [ptlrpc]

    Oct 28 11:22:23 pazlustreoss001 kernel: [<ffffffffa50c1c71>] 
kthread+0xd1/0xe0

    Oct 28 11:22:23 pazlustreoss001 kernel: [<ffffffffa5775c37>] 
ret_from_fork_nospec_end+0x0/0x39

    Oct 28 11:22:23 pazlustreoss001 kernel: [<ffffffffffffffff>] 
0xffffffffffffffff

    Oct 28 11:22:23 pazlustreoss001 kernel: LustreError: dumping log to 
/tmp/lustre-log.1572261743.2403

    Oct 28 11:22:23 pazlustreoss001 kernel: Pid: 2292, comm: ll_ost03_043 
3.10.0-957.10.1.el7_lustre.x86_64 #1 SMP Sun May 26 21:48:35 UTC 2019

    Oct 28 11:22:23 pazlustreoss001 kernel: Call Trace:

    Oct 28 11:22:23 pazlustreoss001 kernel: [<ffffffffc03747c5>] 
jbd2_log_wait_commit+0xc5/0x140 [jbd2]

    Oct 28 11:22:23 pazlustreoss001 kernel: [<ffffffffc0375e52>] 
jbd2_complete_transaction+0x52/0xa0 [jbd2]

    Oct 28 11:22:23 pazlustreoss001 kernel: [<ffffffffc0732da2>] 
ldiskfs_sync_file+0x2e2/0x320 [ldiskfs]

    Oct 28 11:22:23 pazlustreoss001 kernel: [<ffffffffa52760b0>] 
vfs_fsync_range+0x20/0x30

    Oct 28 11:22:23 pazlustreoss001 kernel: [<ffffffffc0c8b651>] 
osd_object_sync+0xb1/0x160 [osd_ldiskfs]

    Oct 28 11:22:23 pazlustreoss001 kernel: [<ffffffffc0ab48a7>] 
tgt_sync+0xb7/0x270 [ptlrpc]

    Oct 28 11:22:23 pazlustreoss001 kernel: [<ffffffffc0dc3731>] 
ofd_sync_hdl+0x111/0x530 [ofd]

    Oct 28 11:22:23 pazlustreoss001 kernel: [<ffffffffc0aba1da>] 
tgt_request_handle+0xaea/0x1580 [ptlrpc]

    Oct 28 11:22:23 pazlustreoss001 kernel: [<ffffffffc0a5f80b>] 
ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]

    Oct 28 11:22:23 pazlustreoss001 kernel: LNet: Service thread pid 2403 
completed after 200.29s. This indicates the system was overloaded (too many 
service threads, or there were not enough hardware
     resources).

    Oct 28 11:22:23 pazlustreoss001 kernel: LNet: Skipped 48 previous similar 
messages

    Oct 28 11:22:23 pazlustreoss001 kernel: [<ffffffffc0a6313c>] 
ptlrpc_main+0xafc/0x1fc0 [ptlrpc]

    Oct 28 11:22:23 pazlustreoss001 kernel: [<ffffffffa50c1c71>] 
kthread+0xd1/0xe0

    Oct 28 11:22:23 pazlustreoss001 kernel: [<ffffffffa5775c37>] 
ret_from_fork_nospec_end+0x0/0x39

    Oct 28 11:22:23 pazlustreoss001 kernel: [<ffffffffffffffff>] 
0xffffffffffffffff










_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
