iotop can be used to debug I/O performance, and lfs and lctl get_param (e.g. lctl get_param health_check) can be used to get the Lustre health status.
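To narrow down which targets, clients and service threads are involved, a small script can tally the two message types from the log quoted below. This is a minimal sketch, assuming syslog-style kernel log lines (e.g. /var/log/messages, or saved output of journalctl -k) worded exactly as in the quoted excerpt. The script name lustre_triage.py and its functions are hypothetical, and other Lustre versions may phrase these messages differently.

```python
#!/usr/bin/env python3
"""Hypothetical triage helper: summarize Lustre messages from an OSS kernel log."""
import re
import sys
from collections import Counter

# Assumption: patterns are written against the message text quoted in this
# thread (Lustre 2.15.2); other versions may word these messages differently.
NO_TARGET_RE = re.compile(
    r"LustreError: 137-5: (?P<target>[^\s:]+): not available for connect "
    r"from (?P<nid>\S+) \(no target\)"
)
HUNG_THREAD_RE = re.compile(
    r"Lustre: (?P<thread>[^\s:]+): service thread pid (?P<pid>\d+) "
    r"was inactive for (?P<secs>[\d.]+) seconds"
)


def summarize(lines):
    """Count 'no target' refusals per (target, client NID) and list inactive service threads."""
    refusals = Counter()
    hung = []
    for line in lines:
        m = NO_TARGET_RE.search(line)
        if m:
            refusals[(m["target"], m["nid"])] += 1
            continue
        m = HUNG_THREAD_RE.search(line)
        if m:
            hung.append((m["thread"], int(m["pid"]), float(m["secs"])))
    # Report the longest-inactive threads first.
    hung.sort(key=lambda t: t[2], reverse=True)
    return refusals, hung


def main(path):
    # errors="replace" keeps the scan going if the log contains odd bytes.
    with open(path, errors="replace") as f:
        refusals, hung = summarize(f)
    print("Refused connects (no target), most frequent first:")
    for (target, nid), n in refusals.most_common():
        print(f"  {n:6d}  {target}  <-  {nid}")
    print("Inactive service threads, longest first:")
    for thread, pid, secs in hung:
        print(f"  {thread} (pid {pid}): {secs:.0f} s")


# Only run when a log file is given, e.g. python3 lustre_triage.py oss01-kernel.log
if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

For example, save the kernel log with journalctl -k > oss01-kernel.log and run python3 lustre_triage.py oss01-kernel.log. Because Lustre rate-limits repeated errors ("Skipped 330 previous similar messages"), the refusal counts are only a lower bound. The PIDs it lists can be matched against the stack traces that the watchdog dumps after each "service thread ... was inactive" message.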
* "scratch-OST0084_UUID: not available for connect from 172.23.15.246@tcp30 (no target)" may indicate a network issue, so check the network as well. The message itself also suggests that, if you are running an HA pair, you check that the target is mounted on the other server.
* To verify the health of the storage devices on hb-oss01 (the OSS where the ll_ost00_036 service thread hung), you can use smartctl.

On Mon, 18 Dec 2023 at 15:28, Strikwerda, Ger via lustre-discuss <lustre-discuss@lists.lustre.org> wrote:
>
> Dear all,
>
> Since last week we have been facing 'hanging kernel threads' causing our Lustre
> environment (Rocky 8.7/Lustre 2.15.2) to hang.
>
> errors:
>
> Dec 18 10:36:04 hb-oss01 kernel: LustreError: 137-5: scratch-OST0084_UUID: not available for connect from 172.23.15.246@tcp30 (no target). If you are running an HA pair check that the target is mounted on the other server.
> Dec 18 10:36:04 hb-oss01 kernel: LustreError: Skipped 330 previous similar messages
> Dec 18 10:36:04 hb-oss01 kernel: ptlrpc_watchdog_fire: 1 callbacks suppressed
> Dec 18 10:36:04 hb-oss01 kernel: Lustre: ll_ost00_036: service thread pid 85609 was inactive for 1062.652 seconds. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
>
> At that moment there were 231 jobs, not really high I/O. Normally we run way more
> jobs, and way more I/O.
>
> The environment is:
>
> 2 MDS
> 4 OSS
> 160 OSTs
> 250 clients
>
> The network is TCP.
>
> According to the internet, this could be caused by 'bad I/O'. Are there
> any useful things to check to isolate where this bad I/O is coming from? How
> do others pinpoint these issues?
>
> Any feedback is very welcome,
>
> --
>
> Kind regards,
>
> Ger Strikwerda
> senior expert multidisciplinary enabler
> simple solution architect
> Rijksuniversiteit Groningen
> CIT/RDMS/HPC
>
> Smitsborg
> Nettelbosje 1
> 9747 AJ Groningen
> Tel. 050 363 9276
>
> "God is hard, God is fair
> some men he gave brains, others he gave hair"
>
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org