Just to provide some extra information: you can enable the gpfsgui. This gives you a web page for the whole system, including alerts, and lets you see everything and do most tasks from the browser rather than from the command line. It's quite useful, and it also provides system health monitoring as well as some nice graphs. 🙂
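The rough recipe on the GUI node looks something like this (package and service names are from memory and vary a bit between Scale releases, so double-check the install docs for your version):

  # install the GUI and the performance-monitoring pieces it depends on
  yum install gpfs.gui gpfs.gss.pmcollector gpfs.gss.pmsensors
  # start the performance collectors, then the GUI service itself
  systemctl enable --now pmsensors pmcollector
  systemctl enable --now gpfsgui
  # the GUI is then served at https://<gui-node>/; first-login admin user
  # setup differs by release, so see the docs for that step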
Thanks
Danny

________________________________
From: gpfsug-discuss <gpfsug-discuss-boun...@gpfsug.org> on behalf of Maloney, John Daniel <malon...@illinois.edu>
Sent: 06 June 2024 11:58 PM
To: gpfsug main discussion list <gpfsug-discuss@gpfsug.org>
Subject: Re: [gpfsug-discuss] No space left on device, but plenty of quota space for inodes and blocks

Yeah; you'll want to bump that up on the home fileset, something like:

  mmchfileset cluster home --inode-limit 25000000

(that'd give you a buffer of ~4.9 million inodes)
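For reference, and going from memory of the mmchfileset docs (so verify on 4.2.3): --inode-limit also accepts an optional preallocation count, and as far as I know the change is applied online, with no daemon restart needed:

  # raise the ceiling to 25M and, optionally, preallocate 22M of them
  mmchfileset cluster home --inode-limit 25000000:22000000
  # confirm that MaxInodes/AllocInodes changed
  /usr/lpp/mmfs/bin/mmlsfileset cluster home -L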
For the ones that show 0, those are dependent filesets (not independent); their inode allocations are tracked in the parent independent fileset's inode space.

Best,
J.D. Maloney
Lead HPC Storage Engineer | Storage Enabling Technologies Group
National Center for Supercomputing Applications (NCSA)

From: gpfsug-discuss <gpfsug-discuss-boun...@gpfsug.org> on behalf of Rob Kudyba <rk3...@columbia.edu>
Date: Thursday, June 6, 2024 at 5:51 PM
To: gpfsug main discussion list <gpfsug-discuss@gpfsug.org>
Subject: Re: [gpfsug-discuss] No space left on device, but plenty of quota space for inodes and blocks

I guess I have my answer:

  /usr/lpp/mmfs/bin/mmlsfileset cluster home -L
  Filesets in file system 'cluster':
  Name  Id  RootInode  ParentId  Created                   InodeSpace  MaxInodes  AllocInodes  Comment
  home   1    1048579         0  Thu Nov 29 15:21:52 2018           1   20971520     20971520

However, on some of the other filesets the AllocInodes is 0?

  /usr/lpp/mmfs/bin/mmlsfileset cluster groupa -L -i
  Collecting fileset usage information ...
  Filesets in file system 'moto':
  Name   Id  RootInode  ParentId  Created                   InodeSpace  MaxInodes  AllocInodes  UsedInodes  Comment
  stats   8     181207         0  Fri Nov 30 12:27:25 2018           0          0            0     7628733

Yes, we realize it's old and it'll be retired at the end of 2024.

On Thu, Jun 6, 2024 at 6:15 PM Fred Stock <sto...@us.ibm.com> wrote:
You should check the inode counts for each of the filesets using the mmlsfileset command. You should also check the local disk space on all the nodes. I presume you are aware that Scale 4.2.3 has been out of support for 4 years.

Fred
Fred Stock, Spectrum Scale Development Advocacy
sto...@us.ibm.com | 720-430-8821
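For the local disk space check, a sweep across every node saves time; a sketch using mmdsh (it ships under /usr/lpp/mmfs/bin, and I believe it accepts -N all, but verify on your release):

  # check the partitions GPFS itself depends on, on every node at once;
  # a full /var in particular can trigger misleading mmfs errors
  /usr/lpp/mmfs/bin/mmdsh -N all 'df -h /var /tmp'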
From: gpfsug-discuss <gpfsug-discuss-boun...@gpfsug.org> on behalf of Rob Kudyba <rk3...@columbia.edu>
Date: Thursday, June 6, 2024 at 5:39 PM
To: gpfsug main discussion list <gpfsug-discuss@gpfsug.org>
Subject: [EXTERNAL] Re: [gpfsug-discuss] No space left on device, but plenty of quota space for inodes and blocks

> Are you seeing the issues across the whole file system or in certain areas?

Only with accounts in GPFS; local accounts and root do not get this.

> That sounds like inode exhaustion to me (and based on it not being block exhaustion as you’ve demonstrated). What does a “df -i /cluster” show you?

We bumped it up a few weeks ago:

  df -i /cluster
  Filesystem  Inodes     IUsed      IFree      IUse%  Mounted on
  cluster     276971520  154807697  122163823    56%  /cluster

> Or if this is only in a certain area you can “cd” into that directory and run a “df -i .”

As root on a login node:

  df -i
  Filesystem  Inodes     IUsed      IFree      IUse%  Mounted on
  /dev/sda2   20971520   169536     20801984     1%   /
  devtmpfs    12169978   528        12169450     1%   /dev
  tmpfs       12174353   1832       12172521     1%   /run
  tmpfs       12174353   77         12174276     1%   /dev/shm
  tmpfs       12174353   15         12174338     1%   /sys/fs/cgroup
  /dev/sda1   0          0          0            -    /boot/efi
  /dev/sda3   52428800   2887       52425913     1%   /var
  /dev/sda7   277368832  35913      277332919    1%   /local
  /dev/sda5   104857600  398        104857202    1%   /tmp
  tmpfs       12174353   1          12174352     1%   /run/user/551336
  tmpfs       12174353   1          12174352     1%   /run/user/0
  moto        276971520  154807697  122163823   56%   /cluster
  tmpfs       12174353   3          12174350     1%   /run/user/441245
  tmpfs       12174353   12         12174341     1%   /run/user/553562
  tmpfs       12174353   1          12174352     1%   /run/user/525583
  tmpfs       12174353   1          12174352     1%   /run/user/476374
  tmpfs       12174353   1          12174352     1%   /run/user/468934
  tmpfs       12174353   5          12174348     1%   /run/user/551200
  tmpfs       12174353   1          12174352     1%   /run/user/539143
  tmpfs       12174353   1          12174352     1%   /run/user/488676
  tmpfs       12174353   1          12174352     1%   /run/user/493713
  tmpfs       12174353   1          12174352     1%   /run/user/507831
  tmpfs       12174353   1          12174352     1%   /run/user/549822
  tmpfs       12174353   1          12174352     1%   /run/user/500569
  tmpfs       12174353   1          12174352     1%   /run/user/443748
  tmpfs       12174353   1          12174352     1%   /run/user/543676
  tmpfs       12174353   1          12174352     1%   /run/user/451446
  tmpfs       12174353   1          12174352     1%   /run/user/497945
  tmpfs       12174353   6          12174347     1%   /run/user/554672
  tmpfs       12174353   32         12174321     1%   /run/user/554653
  tmpfs       12174353   1          12174352     1%   /run/user/30094
  tmpfs       12174353   1          12174352     1%   /run/user/470790
  tmpfs       12174353   59         12174294     1%   /run/user/553037
  tmpfs       12174353   1          12174352     1%   /run/user/554670
  tmpfs       12174353   1          12174352     1%   /run/user/548236
  tmpfs       12174353   1          12174352     1%   /run/user/547288
  tmpfs       12174353   1          12174352     1%   /run/user/547289

> You may need to allocate more inodes to an independent inode fileset somewhere. Especially with something as old as 4.2.3 you won’t have auto-inode expansion for the filesets.

Do we have to restart any service after upping the inode count?

> Best,
> J.D. Maloney
> Lead HPC Storage Engineer | Storage Enabling Technologies Group
> National Center for Supercomputing Applications (NCSA)

Hi JD, I took an intermediate LCI workshop with you at Univ of Cincinnati!
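One caveat on the df -i numbers above: by default GPFS reports filesystem-wide inode totals even when you run df inside a fileset, so a single exhausted independent fileset will not show up there. If I remember right, per-fileset df reporting is gated by the filesetdf option (flag name per the mmchfs docs, so treat this as a sketch):

  # see whether fileset-level df reporting is enabled
  /usr/lpp/mmfs/bin/mmlsfs cluster | grep -i filesetdf
  # enable it, so df inside an independent fileset reports that fileset's own limits
  /usr/lpp/mmfs/bin/mmchfs cluster --filesetdf

Until then, mmlsfileset with -i is the dependable way to see per-fileset usage.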
From: gpfsug-discuss <gpfsug-discuss-boun...@gpfsug.org> on behalf of Rob Kudyba <rk3...@columbia.edu>
Date: Thursday, June 6, 2024 at 3:50 PM
To: gpfsug-discuss@gpfsug.org <gpfsug-discuss@gpfsug.org>
Subject: [gpfsug-discuss] No space left on device, but plenty of quota space for inodes and blocks

We are running GPFS 4.2.3 on a DDN GridScaler, and users are getting the "No space left on device" message when trying to write to a file. In /var/adm/ras/mmfs.log the only recent errors are these:

  2024-06-06_15:51:22.311-0400: mmcommon getContactNodes cluster failed. Return code -1.
  2024-06-06_15:51:22.311-0400: The previous error was detected on node x.x.x.x (headnode).
  2024-06-06_15:53:25.088-0400: mmcommon getContactNodes cluster failed. Return code -1.
  2024-06-06_15:53:25.088-0400: The previous error was detected on node x.x.x.x (headnode).

According to https://www.ibm.com/docs/en/storage-scale/5.1.9?topic=messages-6027-615, "Check the preceding messages, and consult the earlier chapters of this document. A frequent cause for such errors is lack of space in /var." We have plenty of space left there.

  /usr/lpp/mmfs/bin/mmlsdisk cluster
  disk           driver  sector  failure  holds     holds                        storage
  name           type      size    group  metadata  data  status  availability  pool
  ------------   ------  ------  -------  --------  ----  ------  ------------  ------------
  S01_MDT200_1   nsd       4096      200  Yes       No    ready   up            system
  S01_MDT201_1   nsd       4096      201  Yes       No    ready   up            system
  S01_DAT0001_1  nsd       4096      100  No        Yes   ready   up            data1
  S01_DAT0002_1  nsd       4096      101  No        Yes   ready   up            data1
  S01_DAT0003_1  nsd       4096      100  No        Yes   ready   up            data1
  S01_DAT0004_1  nsd       4096      101  No        Yes   ready   up            data1
  S01_DAT0005_1  nsd       4096      100  No        Yes   ready   up            data1
  S01_DAT0006_1  nsd       4096      101  No        Yes   ready   up            data1
  S01_DAT0007_1  nsd       4096      100  No        Yes   ready   up            data1

  /usr/lpp/mmfs/bin/mmdf headnode
  disk                disk size  failure  holds     holds             free KB             free KB
  name                    in KB    group  metadata  data       in full blocks        in fragments
  ---------------  ------------  -------  --------  -----  -------------------  ------------------
  Disks in storage pool: system (Maximum disk size allowed is 14 TB)
  S01_MDT200_1       1862270976      200  Yes       No       969134848 ( 52%)       2948720 ( 0%)
  S01_MDT201_1       1862270976      201  Yes       No       969126144 ( 52%)       2957424 ( 0%)
                   ------------                          -------------------  ------------------
  (pool total)       3724541952                            1938260992 ( 52%)       5906144 ( 0%)

  Disks in storage pool: data1 (Maximum disk size allowed is 578 TB)
  S01_DAT0007_1     77510737920      100  No        Yes    21080752128 ( 27%)     897723392 ( 1%)
  S01_DAT0005_1     77510737920      100  No        Yes    14507212800 ( 19%)     949412160 ( 1%)
  S01_DAT0001_1     77510737920      100  No        Yes    14503620608 ( 19%)     951327680 ( 1%)
  S01_DAT0003_1     77510737920      100  No        Yes    14509205504 ( 19%)     949340544 ( 1%)
  S01_DAT0002_1     77510737920      101  No        Yes    14504585216 ( 19%)     948377536 ( 1%)
  S01_DAT0004_1     77510737920      101  No        Yes    14503647232 ( 19%)     952892480 ( 1%)
  S01_DAT0006_1     77510737920      101  No        Yes    14504486912 ( 19%)     949072512 ( 1%)
                   ------------                          -------------------  ------------------
  (pool total)     542575165440                          108113510400 ( 20%)    6598146304 ( 1%)

                   ============                          ===================  ==================
  (data)           542575165440                          108113510400 ( 20%)    6598146304 ( 1%)
  (metadata)         3724541952                            1938260992 ( 52%)       5906144 ( 0%)
                   ============                          ===================  ==================
  (total)          546299707392                          110051771392 ( 22%)    6604052448 ( 1%)

  Inode Information
  -----------------
  Total number of used inodes in all Inode spaces:       154807668
  Total number of free inodes in all Inode spaces:        12964492
  Total number of allocated inodes in all Inode spaces:  167772160
  Total of Maximum number of inodes in all Inode spaces: 276971520

On the head node:

  df -h
  Filesystem         Size  Used  Avail  Use%  Mounted on
  /dev/sda4          430G  216G   215G   51%  /
  devtmpfs            47G     0    47G    0%  /dev
  tmpfs               47G     0    47G    0%  /dev/shm
  tmpfs               47G  4.1G    43G    9%  /run
  tmpfs               47G     0    47G    0%  /sys/fs/cgroup
  /dev/sda1          504M  114M   365M   24%  /boot
  /dev/sda2          100M  9.9M    90M   10%  /boot/efi
  x.x.x.:/nfs-share  430G  326G   105G   76%  /nfs-share
  cluster            506T  405T   101T   81%  /cluster
  tmpfs              9.3G     0   9.3G    0%  /run/user/443748
  tmpfs              9.3G     0   9.3G    0%  /run/user/547288
  tmpfs              9.3G     0   9.3G    0%  /run/user/551336
  tmpfs              9.3G     0   9.3G    0%  /run/user/547289

The login nodes have plenty of space in /var:

  /dev/sda3  50G  8.7G  42G  18%  /var

What else should we check? We are only at 81% on the GPFS-mounted file system, so there should be enough room to write without these errors. Any recommended service(s) that we can restart?
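Since the filesystem-wide inode numbers look healthy (56% used), one more thing worth ruling out is a single independent fileset hitting its own MaxInodes; that produces exactly this symptom of ENOSPC with apparent free space. A quick check, sketched against the mmlsfileset output format shown elsewhere in this thread:

  # list every fileset with usage collected; an independent fileset whose
  # UsedInodes has reached its MaxInodes/AllocInodes is the likely culprit
  /usr/lpp/mmfs/bin/mmlsfileset cluster -L -i

Dependent filesets show 0 for MaxInodes/AllocInodes and draw from their parent's inode space, so only the independent ones need checking.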
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org

The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 1 Midland Road London NW1 1AT