Re: [lustre-discuss] Ongoing issues with quota
Hello Andreas,

lfs df -i reports 19,204,412 inodes used. When I did the full robinhood
scan, it reported scanning 18,673,874 entries, so fairly close.

I don’t have a .lustre directory at the filesystem root.

Another interesting aspect of this particular issue is that I can run lctl
lfsck and every time I get:

layout_repaired: 1468299

But it doesn’t seem to be actually repairing anything, because if I run it
again, I’ll get the same or a similar number.

I run it like this:

lctl lfsck_start -t layout -t namespace -o -M lfsc-MDT

--
Dan Szkola
FNAL

> On Oct 10, 2023, at 10:47 AM, Andreas Dilger wrote:
>
> There is a $ROOT/.lustre/lost+found that you could check.
>
> What does "lfs df -i" report for the used inode count? Maybe it is RBH that
> is reporting the wrong count?
>
> The other alternative would be to mount the MDT filesystem directly as type
> ZFS and see what df -i and find report?
>
> Cheers, Andreas
Re: [lustre-discuss] Ongoing issues with quota
There is a $ROOT/.lustre/lost+found that you could check.

What does "lfs df -i" report for the used inode count? Maybe it is RBH that
is reporting the wrong count?

The other alternative would be to mount the MDT filesystem directly as type
ZFS and see what df -i and find report?

Cheers, Andreas

> On Oct 10, 2023, at 22:16, Daniel Szkola via lustre-discuss wrote:
>
> OK, I disabled, waited for a while, then reenabled. I still get the same
> numbers. The only thing I can think is somehow the count is correct, despite
> the huge difference. Robinhood and find show about 1.7M files, dirs, and
> links. The quota is showing a bit over 3.1M inodes used. We only have one MDS
> and MGS. Any ideas where the discrepancy may lie? Orphans? Is there a
> lost+found area in lustre?
>
> --
> Dan Szkola
> FNAL
Re: [lustre-discuss] Ongoing issues with quota
OK, I disabled, waited for a while, then reenabled. I still get the same
numbers. The only thing I can think is somehow the count is correct, despite
the huge difference. Robinhood and find show about 1.7M files, dirs, and
links. The quota is showing a bit over 3.1M inodes used. We only have one MDS
and MGS. Any ideas where the discrepancy may lie? Orphans? Is there a
lost+found area in lustre?

--
Dan Szkola
FNAL

> On Oct 10, 2023, at 8:24 AM, Daniel Szkola wrote:
>
> Hi Robert,
>
> Thanks for the response. Do you remember exactly how you did it? Did you
> bring everything down at any point? I know you can do this:
>
> lctl conf_param fsname.quota.mdt=none
>
> but is that all you did? Did you wait or bring everything down before
> reenabling? I’m worried because that allegedly just enables/disables
> enforcement and space accounting is always on. Andreas stated that quotas are
> controlled by ZFS, but there has been no quota support enabled on any of the
> ZFS volumes in our lustre filesystem.
>
> --
> Dan Szkola
> FNAL
>
>> On Oct 10, 2023, at 2:17 AM, Redl, Robert wrote:
>>
>> Dear Dan,
>>
>> I had a similar problem some time ago. We are also using ZFS for MDT and
>> OSTs. For us, the used disk space was reported wrong. The problem was fixed
>> by switching quota support off on the MGS and then on again.
>>
>> Cheers,
>> Robert
>>
>>> On 09.10.2023 at 17:55, Daniel Szkola via lustre-discuss wrote:
>>>
>>> Thanks, I will look into the ZFS quota since we are using ZFS for all
>>> storage, MDT and OSTs.
>>>
>>> In our case, there is a single MDS/MDT. I have used Robinhood and lfs find
>>> (by group) commands to verify what the numbers should apparently be.
>>>
>>> --
>>> Dan Szkola
>>> FNAL
Re: [lustre-discuss] Ongoing issues with quota
Thanks, I will look into the ZFS quota since we are using ZFS for all
storage, MDT and OSTs.

In our case, there is a single MDS/MDT. I have used Robinhood and lfs find
(by group) commands to verify what the numbers should apparently be.

--
Dan Szkola
FNAL

> On Oct 9, 2023, at 10:13 AM, Andreas Dilger wrote:
>
> The quota accounting is controlled by the backing filesystem of the OSTs and
> MDTs.
>
> For ldiskfs/ext4 you could run e2fsck to re-count all of the inode and block
> usage.
>
> For ZFS you would have to ask on the ZFS list to see if there is some way to
> re-count the quota usage.
>
> The "inode" quota is accounted from the MDTs, while the "block" quota is
> accounted from the OSTs. You might be able to use "lfs quota -v -g group"
> to see if there is one particular MDT that is returning too many inodes.
>
> Possibly if you have directories that are striped across many MDTs it would
> inflate the used inode count. For example, if every one of the 426k
> directories reported by RBH was striped across 4 MDTs then you would see the
> inode count add up to 3.6M.
>
> If that was the case, then I would really, really advise against striping
> every directory in the filesystem. That will cause problems far worse than
> just inflating the inode quota accounting.
>
> Cheers, Andreas
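The counts mentioned in this thread can be cross-checked with a little shell arithmetic. This is only a sketch using numbers copied from the posts (the Oct 4 per-OST "lfs quota -I" output and the Sep 27 quota and robinhood reports); the variable names are made up for illustration and nothing here queries Lustre.

```shell
#!/bin/sh
# Per-OST "files" values for group somegroup, copied from the
# "lfs quota -g somegroup -I <index> /lustre1" output in the thread (OST 0 to 7).
ost_files="132863 120643 190687 195034 185097 145347 169386 171552"

# Add up the per-OST counts.
ost_total=0
for n in $ost_files; do
    ost_total=$((ost_total + n))
done

# Figures from the quota and robinhood reports posted in the thread.
quota_inodes=3136761       # "files" column of the group quota report
rbh_entries=1796104        # robinhood total: files + dirs + symlinks
rbh_regular_files=1310414  # robinhood count for type "file"

# Gap between the quota's inode count and the robinhood total.
excess=$((quota_inodes - rbh_entries))

echo "files counted on OSTs:   $ost_total"
echo "regular files (RBH):     $rbh_regular_files"
echo "quota minus RBH entries: $excess"
```

Run with sh, this prints an OST total of 1310609, which is within about 200 of the 1310414 regular files robinhood reports, and a gap of 1340657 between the quota's inode count and the robinhood total. That would be consistent with the surplus sitting on the MDT side of the accounting, though the thread does not confirm a cause.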
Re: [lustre-discuss] Ongoing issues with quota
The quota accounting is controlled by the backing filesystem of the OSTs and
MDTs.

For ldiskfs/ext4 you could run e2fsck to re-count all of the inode and block
usage.

For ZFS you would have to ask on the ZFS list to see if there is some way to
re-count the quota usage.

The "inode" quota is accounted from the MDTs, while the "block" quota is
accounted from the OSTs. You might be able to use "lfs quota -v -g group"
to see if there is one particular MDT that is returning too many inodes.

Possibly if you have directories that are striped across many MDTs it would
inflate the used inode count. For example, if every one of the 426k
directories reported by RBH was striped across 4 MDTs then you would see the
inode count add up to 3.6M.

If that was the case, then I would really, really advise against striping
every directory in the filesystem. That will cause problems far worse than
just inflating the inode quota accounting.

Cheers, Andreas

> On Oct 9, 2023, at 22:33, Daniel Szkola via lustre-discuss wrote:
>
> Is there really no way to force a recount of files used by the quota? All
> indications are we have accounts where files were removed and this is not
> reflected in the used file count in the quota. The space used seems correct
> but the inodes used numbers are way high. There must be a way to clear these
> numbers and have a fresh count done.
>
> --
> Dan Szkola
> FNAL
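Andreas's striping example can be turned into a quick calculation. The sketch below assumes, purely for illustration, that each directory striped across N MDTs is charged N additional inodes (one per stripe) on top of the entry already counted; that accounting model and the variable names are assumptions, not something stated in the thread. The entry and directory counts are the robinhood figures for somegroup.

```shell
#!/bin/sh
# Robinhood counts for group somegroup, as posted in the thread.
rbh_entries=1796104   # files + dirs + symlinks
rbh_dirs=426619       # directories only

# ASSUMPTION (not confirmed in the thread): a directory striped across
# N MDTs is charged N extra inodes, one per stripe, in addition to the
# single inode already counted for it.
stripe_count=4
projected=$((rbh_entries + rbh_dirs * stripe_count))

echo "projected inode count with ${stripe_count}-way striping: $projected"
```

With a stripe count of 4 this gives 3502580, in the same range as the roughly 3.6M quoted above. Dan reports a single MDS/MDT, so this scenario may not apply to his filesystem.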
Re: [lustre-discuss] Ongoing issues with quota
Is there really no way to force a recount of files used by the quota? All
indications are we have accounts where files were removed and this is not
reflected in the used file count in the quota. The space used seems correct
but the inodes used numbers are way high. There must be a way to clear these
numbers and have a fresh count done.

--
Dan Szkola
FNAL

> On Oct 4, 2023, at 11:37 AM, Daniel Szkola via lustre-discuss wrote:
>
> Also, quotas on the OSTs don’t add up to near 3 million files either:
>
> [root@lustreclient scratch]# ssh ossnode0 lfs quota -g somegroup -I 0 /lustre1
> Disk quotas for grp somegroup (gid 9544):
>      Filesystem      kbytes   quota       limit   grace    files   quota   limit   grace
>                  1394853459       0  1913344192       -   132863       0       0       -
> [root@lustreclient scratch]# ssh ossnode0 lfs quota -g somegroup -I 1 /lustre1
> Disk quotas for grp somegroup (gid 9544):
>      Filesystem      kbytes   quota       limit   grace    files   quota   limit   grace
>                  1411579601       0  1963246413       -   120643       0       0       -
> [root@lustreclient scratch]# ssh ossnode1 lfs quota -g somegroup -I 2 /lustre1
> Disk quotas for grp somegroup (gid 9544):
>      Filesystem      kbytes   quota       limit   grace    files   quota   limit   grace
>                  1416507527       0  1789950778       -   190687       0       0       -
> [root@lustreclient scratch]# ssh ossnode1 lfs quota -g somegroup -I 3 /lustre1
> Disk quotas for grp somegroup (gid 9544):
>      Filesystem      kbytes   quota       limit   grace    files   quota   limit   grace
>                  1636465724       0  1926578117       -   195034       0       0       -
> [root@lustreclient scratch]# ssh ossnode2 lfs quota -g somegroup -I 4 /lustre1
> Disk quotas for grp somegroup (gid 9544):
>      Filesystem      kbytes   quota       limit   grace    files   quota   limit   grace
>                  2202272244       0  3020159313       -   185097       0       0       -
> [root@lustreclient scratch]# ssh ossnode2 lfs quota -g somegroup -I 5 /lustre1
> Disk quotas for grp somegroup (gid 9544):
>      Filesystem      kbytes   quota       limit   grace    files   quota   limit   grace
>                  1324770165       0  1371244768       -   145347       0       0       -
> [root@lustreclient scratch]# ssh ossnode3 lfs quota -g somegroup -I 6 /lustre1
> Disk quotas for grp somegroup (gid 9544):
>      Filesystem      kbytes   quota       limit   grace    files   quota   limit   grace
>                  2892027349       0  3221225472       -   169386       0       0       -
> [root@lustreclient scratch]# ssh ossnode3 lfs quota -g somegroup -I 7 /lustre1
> Disk quotas for grp somegroup (gid 9544):
>      Filesystem      kbytes   quota       limit   grace    files   quota   limit   grace
>                  2076201636       0  2474853207       -   171552       0       0       -
>
> --
> Dan Szkola
> FNAL
Re: [lustre-discuss] Ongoing issues with quota
Also, quotas on the OSTs don’t add up to near 3 million files either:

[root@lustreclient scratch]# ssh ossnode0 lfs quota -g somegroup -I 0 /lustre1
Disk quotas for grp somegroup (gid 9544):
     Filesystem      kbytes   quota       limit   grace    files   quota   limit   grace
                 1394853459       0  1913344192       -   132863       0       0       -
[root@lustreclient scratch]# ssh ossnode0 lfs quota -g somegroup -I 1 /lustre1
Disk quotas for grp somegroup (gid 9544):
     Filesystem      kbytes   quota       limit   grace    files   quota   limit   grace
                 1411579601       0  1963246413       -   120643       0       0       -
[root@lustreclient scratch]# ssh ossnode1 lfs quota -g somegroup -I 2 /lustre1
Disk quotas for grp somegroup (gid 9544):
     Filesystem      kbytes   quota       limit   grace    files   quota   limit   grace
                 1416507527       0  1789950778       -   190687       0       0       -
[root@lustreclient scratch]# ssh ossnode1 lfs quota -g somegroup -I 3 /lustre1
Disk quotas for grp somegroup (gid 9544):
     Filesystem      kbytes   quota       limit   grace    files   quota   limit   grace
                 1636465724       0  1926578117       -   195034       0       0       -
[root@lustreclient scratch]# ssh ossnode2 lfs quota -g somegroup -I 4 /lustre1
Disk quotas for grp somegroup (gid 9544):
     Filesystem      kbytes   quota       limit   grace    files   quota   limit   grace
                 2202272244       0  3020159313       -   185097       0       0       -
[root@lustreclient scratch]# ssh ossnode2 lfs quota -g somegroup -I 5 /lustre1
Disk quotas for grp somegroup (gid 9544):
     Filesystem      kbytes   quota       limit   grace    files   quota   limit   grace
                 1324770165       0  1371244768       -   145347       0       0       -
[root@lustreclient scratch]# ssh ossnode3 lfs quota -g somegroup -I 6 /lustre1
Disk quotas for grp somegroup (gid 9544):
     Filesystem      kbytes   quota       limit   grace    files   quota   limit   grace
                 2892027349       0  3221225472       -   169386       0       0       -
[root@lustreclient scratch]# ssh ossnode3 lfs quota -g somegroup -I 7 /lustre1
Disk quotas for grp somegroup (gid 9544):
     Filesystem      kbytes   quota       limit   grace    files   quota   limit   grace
                 2076201636       0  2474853207       -   171552       0       0       -

--
Dan Szkola
FNAL

> On Oct 4, 2023, at 8:45 AM, Daniel Szkola via lustre-discuss wrote:
>
> No combination of lfsck runs has helped with this.
>
> Again, robinhood shows 1796104 files for the group, and an 'lfs find -G gid'
> found 1796104 files as well.
>
> So why is the quota command showing over 3 million inodes used?
>
> There must be a way to force it to recount or clear all stale quota data and
> have it regenerate it?
>
> Anyone?
>
> --
> Dan Szkola
> FNAL
>
>> On Sep 27, 2023, at 9:42 AM, Daniel Szkola via lustre-discuss wrote:
>>
>> We have a lustre filesystem that we just upgraded to 2.15.3, however this
>> problem has been going on for some time.
>>
>> The quota command shows this:
>>
>> Disk quotas for grp somegroup (gid 9544):
>> Filesystem    used   quota   limit   grace     files    quota    limit    grace
>>   /lustre1  13.38T     40T     45T       -  3136761*  2621440  3670016  expired
>>
>> The group is not using nearly that many files. We have robinhood installed
>> and it shows this:
>>
>> Using config file '/etc/robinhood.d/lustre1.conf'.
>>     group,    type,   count,   volume,  spc_used, avg_size
>> somegroup, symlink,   59071,  5.12 MB, 103.16 MB,       91
>> somegroup,     dir,  426619,  5.24 GB,   5.24 GB, 12.87 KB
>> somegroup,    file, 1310414, 16.24 TB,  13.37 TB, 13.00 MB
>>
>> Total: 1796104 entries, volume: 17866508365925 bytes (16.25 TB), space used:
>> 14704924899840 bytes (13.37 TB)
>>
>> Any ideas what is wrong here?
>>
>> --
>> Dan Szkola
>> FNAL

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
Re: [lustre-discuss] Ongoing issues with quota
Hi Dan,

Ah, I see. Sorry, no idea - it's been a few years since I last used ZFS, and
I've never used the Lustre ZFS backend.

Regards,

Mark

On Wed, 4 Oct 2023, Daniel Szkola wrote:

> Hi Mark,
>
> All nodes are using ZFS. OSTs, MDT, and MGT are all ZFS-based, so there's
> really no way to fsck them. I could do a scrub, but that's not the same
> thing. Is there a Lustre/ZFS equivalent of 'tune2fs -O [^]quota' for ZFS?
>
> I'm guessing that at some point, a large number of files was removed and
> somehow quota accounting missed this. There should be a simple way to
> reconcile or regenerate what quota has recorded vs what is actually on
> disk, which I have verified two different ways.
>
> --
> Dan
>
> On Wed, 2023-10-04 at 15:01 +0100, Mark Dixon wrote:
> > Hi Dan,
> >
> > I think it gets corrected when you umount and fsck the OSTs themselves
> > (not lfsck). At least I recall seeing such messages when fsck'ing on 2.12.
> >
> > Best,
> >
> > Mark
> >
> > On Wed, 4 Oct 2023, Daniel Szkola via lustre-discuss wrote:
> > > No combination of lfsck runs has helped with this.
> > >
> > > Again, robinhood shows 1796104 files for the group, and an
> > > 'lfs find -G gid' found 1796104 files as well.
> > >
> > > So why is the quota command showing over 3 million inodes used?
> > >
> > > There must be a way to force it to recount or clear all stale quota
> > > data and have it regenerate it?
> > >
> > > Anyone?
> > >
> > > --
> > > Dan Szkola
> > > FNAL
> > >
> > > On Sep 27, 2023, at 9:42 AM, Daniel Szkola via lustre-discuss wrote:
> > > > We have a lustre filesystem that we just upgraded to 2.15.3; however,
> > > > this problem has been going on for some time.
> > > >
> > > > The quota command shows this:
> > > >
> > > > Disk quotas for grp somegroup (gid 9544):
> > > >  Filesystem    used   quota   limit   grace     files    quota    limit    grace
> > > >    /lustre1  13.38T     40T     45T       -  3136761*  2621440  3670016  expired
> > > >
> > > > The group is not using nearly that many files. We have robinhood
> > > > installed and it shows this:
> > > >
> > > > Using config file '/etc/robinhood.d/lustre1.conf'.
> > > > group,     type,      count,   volume,  spc_used,  avg_size
> > > > somegroup, symlink,   59071,  5.12 MB, 103.16 MB,        91
> > > > somegroup, dir,      426619,  5.24 GB,   5.24 GB,  12.87 KB
> > > > somegroup, file,    1310414, 16.24 TB,  13.37 TB,  13.00 MB
> > > >
> > > > Total: 1796104 entries, volume: 17866508365925 bytes (16.25 TB),
> > > > space used: 14704924899840 bytes (13.37 TB)
> > > >
> > > > Any ideas what is wrong here?
> > > >
> > > > --
> > > > Dan Szkola
> > > > FNAL
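For reference, the figures quoted in this thread can be cross-checked with a few lines of Python. This is only a sketch: the numbers are copied from the quota and Robinhood output quoted above, and the variable names are made up for illustration.

```python
# Figures copied from the lfs quota and Robinhood output quoted in this thread.
# All names are illustrative; none of them belong to a Lustre or Robinhood API.
quota_files_used = 3136761   # "files" column of the quota output (marked with *)
quota_soft_limit = 2621440   # inode "quota" column
quota_hard_limit = 3670016   # inode "limit" column

# Per-type entry counts from the Robinhood report for the same group.
robinhood_counts = {"symlink": 59071, "dir": 426619, "file": 1310414}

robinhood_total = sum(robinhood_counts.values())
unexplained = quota_files_used - robinhood_total
ratio = quota_files_used / robinhood_total

print(f"Robinhood total:      {robinhood_total}")
print(f"Quota minus scan:     {unexplained}")
print(f"Quota / scan ratio:   {ratio:.2f}")
print(f"Quota over soft limit: {quota_files_used > quota_soft_limit}")
print(f"Scan over soft limit:  {robinhood_total > quota_soft_limit}")
```

The Robinhood rows do sum to the 1796104 entries it reports, which leaves roughly 1.34 million inodes charged to the group that the scan cannot account for. If the scan count is the correct one, the group would be well under its soft limit.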
Re: [lustre-discuss] Ongoing issues with quota
Hi Mark,

All nodes are using ZFS. OSTs, MDT, and MGT are all ZFS-based, so there's
really no way to fsck them. I could do a scrub, but that's not the same
thing. Is there a Lustre/ZFS equivalent of 'tune2fs -O [^]quota' for ZFS?

I'm guessing that at some point, a large number of files was removed and
somehow quota accounting missed this. There should be a simple way to
reconcile or regenerate what quota has recorded vs what is actually on
disk, which I have verified two different ways.

--
Dan

On Wed, 2023-10-04 at 15:01 +0100, Mark Dixon wrote:
> Hi Dan,
>
> I think it gets corrected when you umount and fsck the OSTs themselves
> (not lfsck). At least I recall seeing such messages when fsck'ing on 2.12.
>
> Best,
>
> Mark
Re: [lustre-discuss] Ongoing issues with quota
Hi Dan,

I think it gets corrected when you umount and fsck the OSTs themselves
(not lfsck). At least I recall seeing such messages when fsck'ing on 2.12.

Best,

Mark

On Wed, 4 Oct 2023, Daniel Szkola via lustre-discuss wrote:

> No combination of lfsck runs has helped with this.
>
> Again, robinhood shows 1796104 files for the group, and an
> 'lfs find -G gid' found 1796104 files as well.
>
> So why is the quota command showing over 3 million inodes used?
>
> There must be a way to force it to recount or clear all stale quota data
> and have it regenerate it?
>
> Anyone?
>
> --
> Dan Szkola
> FNAL
Re: [lustre-discuss] Ongoing issues with quota
No combination of lfsck runs has helped with this.

Again, robinhood shows 1796104 files for the group, and an 'lfs find -G gid'
found 1796104 files as well.

So why is the quota command showing over 3 million inodes used?

There must be a way to force it to recount or clear all stale quota data and
have it regenerate it?

Anyone?

--
Dan Szkola
FNAL

> On Sep 27, 2023, at 9:42 AM, Daniel Szkola via lustre-discuss wrote:
>
> We have a lustre filesystem that we just upgraded to 2.15.3; however, this
> problem has been going on for some time.
>
> The quota command shows this:
>
> Disk quotas for grp somegroup (gid 9544):
>  Filesystem    used   quota   limit   grace     files    quota    limit    grace
>    /lustre1  13.38T     40T     45T       -  3136761*  2621440  3670016  expired
>
> The group is not using nearly that many files. We have robinhood installed
> and it shows this:
>
> Using config file '/etc/robinhood.d/lustre1.conf'.
> group,     type,      count,   volume,  spc_used,  avg_size
> somegroup, symlink,   59071,  5.12 MB, 103.16 MB,        91
> somegroup, dir,      426619,  5.24 GB,   5.24 GB,  12.87 KB
> somegroup, file,    1310414, 16.24 TB,  13.37 TB,  13.00 MB
>
> Total: 1796104 entries, volume: 17866508365925 bytes (16.25 TB), space used:
> 14704924899840 bytes (13.37 TB)
>
> Any ideas what is wrong here?
>
> --
> Dan Szkola
> FNAL
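An independent recount in the spirit of the 'lfs find -G gid' check can also be done with plain POSIX calls from a client mount. The following is a minimal sketch, not what Robinhood or lfs find do internally: count_inodes_for_gid is a hypothetical helper, it walks the tree with os.walk and os.lstat, and it counts each inode once so that hard links are not double-counted. On a filesystem with millions of entries it is likely to be slow.

```python
import os
import stat
from collections import Counter


def count_inodes_for_gid(root, gid):
    """Count inodes under root owned by gid, grouped by type (hypothetical helper)."""
    seen = set()        # (device, inode) pairs already counted
    counts = Counter()

    def visit(path):
        try:
            st = os.lstat(path)  # lstat so symlinks are counted, not followed
        except OSError:
            return  # entry vanished or is unreadable; skip it
        if st.st_gid != gid:
            return
        key = (st.st_dev, st.st_ino)
        if key in seen:
            return  # another hard link to an inode we already counted
        seen.add(key)
        if stat.S_ISDIR(st.st_mode):
            counts["dir"] += 1
        elif stat.S_ISLNK(st.st_mode):
            counts["symlink"] += 1
        else:
            counts["file"] += 1  # regular files and any other inode type

    visit(root)
    # os.walk does not follow symlinked directories by default.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            visit(os.path.join(dirpath, name))
    return counts
```

Called as count_inodes_for_gid("/lustre1", 9544), the per-type totals can be compared with the Robinhood rows and their sum with the "files" column of the quota output. A count obtained this way only covers entries reachable through the namespace, so it would not see orphaned objects that are still charged to the group.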