Re: [lustre-discuss] OST went back in time: no(?) hardware issue
Hi Andreas,

On 10/5/23 02:30, Andreas Dilger wrote:
> On Oct 3, 2023, at 16:22, Thomas Roth via lustre-discuss wrote:
>> Hi all,
>>
>> in our Lustre 2.12.5 system, we have "OST went back in time" after OST hardware replacement:
>> - hardware had reached EOL
>> - we set `max_create_count=0` for these OSTs, searched for and migrated off the files of these OSTs
>> - formatted the new OSTs with `--replace` and the old indices
>> - all OSTs are on ZFS
>> - set the OSTs `active=0` on our 3 MDTs
>> - moved in the new hardware, reused the old NIDs and old OST indices, mounted the OSTs
>> - set the OSTs `active=1`
>> - ran `lfsck` on all servers
>> - set `max_create_count=200` for these OSTs
>>
>> Now the "OST went back in time" messages appeared in the MDS logs.
>>
>> This doesn't quite fit the description in the manual. There were no crashes or power losses, so I cannot see which cache might have been lost. The transaction numbers quoted in the error are both large, e.g. `transno 55841088879 was previously committed, server now claims 4294992012`
>>
>> What should we do? Give `lfsck` another try?
>
> Nothing really to see here, I think. Did you delete LAST_RCVD during the replacement, so that the OST didn't know what transno was assigned to the last RPCs it sent? The still-mounted clients have a record of this transno and are surprised that it was reset. If you unmount and remount the clients, the error should go away.

No, I don't think I deleted anything during the procedure:
- The old OST was emptied (max_create_count=0) during normal Lustre operations; the last transaction should be roughly the last file being migrated away.
- Then the OST was deactivated, but only on the MDS, not on the clients.
- Then the new OST, formatted with `--replace`, was mounted and activated on the MDS. Up to this point, no errors.
- Finally, max_create_count was increased and clients could write.
- Now the MDT throws this error (nothing in the client logs).
According to the manual, what should have happened when I mounted the new OST:

> The MDS and OSS will negotiate the LAST_ID value for the replacement OST.

OK, this is about LAST_ID, wherever that lives on ZFS. About LAST_RCVD, the manual says (even in the case where the configuration files were lost and had to be recreated):

> The last_rcvd file will be recreated when the OST is first mounted using the default parameters.

So, let's see what happens once the clients remount. Eventually, then, should I also restart the MDTs?

Regards,
Thomas

> I'm not sure if the clients might try to preserve the next 55B RPCs in memory until the committed transno on the OST catches up, or if they just accept the new transno and get on with life?
>
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Principal Architect
> Whamcloud

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
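For reference, the replacement steps Thomas lists map roughly onto the following commands. This is a hedged sketch, not the poster's actual commands: the fsname `lustre1`, OST index 0, ZFS pool/dataset `ostpool/ost0`, mount point, and MGS NID `mgs@o2ib` are all placeholders; check the Lustre manual for your version before using any of it.

```shell
# On each MDS: stop new object creation on the OST being retired
lctl set_param osp.lustre1-OST0000-osc-MDT*.max_create_count=0

# On a client: find files with objects on that OST and migrate them off
lfs find /lustre1 --ost lustre1-OST0000_UUID -type f | lfs_migrate -y

# Format the replacement OST with the *old* index, flagged as a replacement
mkfs.lustre --ost --backfstype=zfs --fsname=lustre1 --index=0 --replace \
    --mgsnode=mgs@o2ib ostpool/ost0

# On each MDS: deactivate while swapping hardware, then mount and reactivate
lctl set_param osp.lustre1-OST0000-osc-MDT*.active=0
mount -t lustre ostpool/ost0 /mnt/lustre/ost0
lctl set_param osp.lustre1-OST0000-osc-MDT*.active=1

# Consistency check, then allow object creation again
lctl lfsck_start -A
lctl set_param osp.lustre1-OST0000-osc-MDT*.max_create_count=200
```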
Re: [lustre-discuss] OST went back in time: no(?) hardware issue
On Oct 3, 2023, at 16:22, Thomas Roth via lustre-discuss wrote:
>
> Hi all,
>
> in our Lustre 2.12.5 system, we have "OST went back in time" after OST hardware replacement:
> - hardware had reached EOL
> - we set `max_create_count=0` for these OSTs, searched for and migrated off the files of these OSTs
> - formatted the new OSTs with `--replace` and the old indices
> - all OSTs are on ZFS
> - set the OSTs `active=0` on our 3 MDTs
> - moved in the new hardware, reused the old NIDs and old OST indices, mounted the OSTs
> - set the OSTs `active=1`
> - ran `lfsck` on all servers
> - set `max_create_count=200` for these OSTs
>
> Now the "OST went back in time" messages appeared in the MDS logs.
>
> This doesn't quite fit the description in the manual. There were no crashes or power losses, so I cannot see which cache might have been lost. The transaction numbers quoted in the error are both large, e.g. `transno 55841088879 was previously committed, server now claims 4294992012`
>
> What should we do? Give `lfsck` another try?

Nothing really to see here, I think. Did you delete LAST_RCVD during the replacement, so that the OST didn't know what transno was assigned to the last RPCs it sent? The still-mounted clients have a record of this transno and are surprised that it was reset. If you unmount and remount the clients, the error should go away.

I'm not sure if the clients might try to preserve the next 55B RPCs in memory until the committed transno on the OST catches up, or if they just accept the new transno and get on with life?

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Whamcloud
Re: [lustre-discuss] Failing build of lustre client on Debian 12
On Oct 4, 2023, at 16:26, Jan Andersen <j...@comind.io> wrote:
> Hi, I've just successfully built the lustre 2.15.3 client on Debian 11 and need to do the same on Debian 12; however, configure fails with:
>
> checking if Linux kernel was built with CONFIG_FHANDLE in or as module... no
> configure: error: Lustre fid handling requires that CONFIG_FHANDLE is enabled in your kernel.
>
> As far as I can see, CONFIG_FHANDLE is in fact enabled - eg:
>
> root@debian12:~/lustre-release# grep CONFIG_FHANDLE /boot/config-6.1.38
> CONFIG_FHANDLE=y
>
> I've tried to figure out how configure checks for this, but the script is rather dense and I haven't penetrated it (yet). It seems to me there is an error in the way it checks. What is the best way forward, considering that I've already invested a lot of time and effort in setting up a slurm cluster with Debian 12?

You could change the AC_MSG_ERROR() to AC_MSG_WARN() or similar, if you think the check is wrong.

It would be worthwhile to check whether a patch has already been submitted to fix this on the master branch. Otherwise, getting a proper patch submitted to fix the check would be better than just ignoring the error and leaving it for the next person to fix.

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Whamcloud
Re: [lustre-discuss] Ongoing issues with quota
Also, the quotas on the OSTs don't add up to anywhere near 3 million files either:

[root@lustreclient scratch]# ssh ossnode0 lfs quota -g somegroup -I 0 /lustre1
Disk quotas for grp somegroup (gid 9544):
    Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
                1394853459      0  1913344192      -  132863      0      0      -
[root@lustreclient scratch]# ssh ossnode0 lfs quota -g somegroup -I 1 /lustre1
Disk quotas for grp somegroup (gid 9544):
    Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
                1411579601      0  1963246413      -  120643      0      0      -
[root@lustreclient scratch]# ssh ossnode1 lfs quota -g somegroup -I 2 /lustre1
Disk quotas for grp somegroup (gid 9544):
    Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
                1416507527      0  1789950778      -  190687      0      0      -
[root@lustreclient scratch]# ssh ossnode1 lfs quota -g somegroup -I 3 /lustre1
Disk quotas for grp somegroup (gid 9544):
    Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
                1636465724      0  1926578117      -  195034      0      0      -
[root@lustreclient scratch]# ssh ossnode2 lfs quota -g somegroup -I 4 /lustre1
Disk quotas for grp somegroup (gid 9544):
    Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
                2202272244      0  3020159313      -  185097      0      0      -
[root@lustreclient scratch]# ssh ossnode2 lfs quota -g somegroup -I 5 /lustre1
Disk quotas for grp somegroup (gid 9544):
    Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
                1324770165      0  1371244768      -  145347      0      0      -
[root@lustreclient scratch]# ssh ossnode3 lfs quota -g somegroup -I 6 /lustre1
Disk quotas for grp somegroup (gid 9544):
    Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
                2892027349      0  3221225472      -  169386      0      0      -
[root@lustreclient scratch]# ssh ossnode3 lfs quota -g somegroup -I 7 /lustre1
Disk quotas for grp somegroup (gid 9544):
    Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
                2076201636      0  2474853207      -  171552      0      0      -

—
Dan Szkola
FNAL

> On Oct 4, 2023, at 8:45 AM, Daniel Szkola via lustre-discuss wrote:
>
> No combination of lfsck runs has helped with this.
>
> Again, robinhood shows 1796104 files for the group, and an 'lfs find -G gid' found 1796104 files as well.
>
> So why is the quota command showing over 3 million inodes used?
>
> There must be a way to force it to recount, or to clear all stale quota data and have it regenerated?
>
> Anyone?
>
> —
> Dan Szkola
> FNAL
>
>> On Sep 27, 2023, at 9:42 AM, Daniel Szkola via lustre-discuss wrote:
>>
>> We have a lustre filesystem that we just upgraded to 2.15.3; however, this problem has been going on for some time.
>>
>> The quota command shows this:
>>
>> Disk quotas for grp somegroup (gid 9544):
>>     Filesystem    used   quota   limit   grace    files    quota    limit    grace
>>       /lustre1  13.38T     40T     45T       -  3136761* 2621440  3670016  expired
>>
>> The group is not using nearly that many files. We have robinhood installed and it shows this:
>>
>> Using config file '/etc/robinhood.d/lustre1.conf'.
>>     group,    type,   count,   volume,  spc_used, avg_size
>> somegroup, symlink,   59071,  5.12 MB, 103.16 MB,       91
>> somegroup,     dir,  426619,  5.24 GB,   5.24 GB, 12.87 KB
>> somegroup,    file, 1310414, 16.24 TB,  13.37 TB, 13.00 MB
>>
>> Total: 1796104 entries, volume: 17866508365925 bytes (16.25 TB), space used: 14704924899840 bytes (13.37 TB)
>>
>> Any ideas what is wrong here?
>>
>> —
>> Dan Szkola
>> FNAL
Re: [lustre-discuss] Ongoing issues with quota
Hi Dan,

Ah, I see. Sorry, no idea - it's been a few years since I last used ZFS, and I've never used the Lustre ZFS backend.

Regards,
Mark

On Wed, 4 Oct 2023, Daniel Szkola wrote:
> [EXTERNAL EMAIL]
>
> Hi Mark,
>
> All nodes are using ZFS. OSTs, MDT, and MGT are all ZFS-based, so there's really no way to fsck them. I could do a scrub, but that's not the same thing. Is there a Lustre/ZFS equivalent of 'tune2fs -O [^]quota' for ZFS?
>
> I'm guessing that at some point a large number of files was removed and somehow quota accounting missed this. There should be a simple way to reconcile or regenerate what quota has recorded vs what is actually on disk, which I have verified two different ways.
>
> --
> Dan
>
> On Wed, 2023-10-04 at 15:01 +0100, Mark Dixon wrote:
>> Hi Dan,
>>
>> I think it gets corrected when you umount and fsck the OSTs themselves (not lfsck). At least I recall seeing such messages when fsck'ing on 2.12.
>>
>> Best,
>>
>> Mark
>>
>> On Wed, 4 Oct 2023, Daniel Szkola via lustre-discuss wrote:
>>> [EXTERNAL EMAIL]
>>>
>>> No combination of lfsck runs has helped with this.
>>>
>>> Again, robinhood shows 1796104 files for the group, and an 'lfs find -G gid' found 1796104 files as well.
>>>
>>> So why is the quota command showing over 3 million inodes used?
>>>
>>> There must be a way to force it to recount, or to clear all stale quota data and have it regenerated?
>>>
>>> Anyone?
>>>
>>> —
>>> Dan Szkola
>>> FNAL
>>>
>>>> On Sep 27, 2023, at 9:42 AM, Daniel Szkola via lustre-discuss wrote:
>>>>
>>>> We have a lustre filesystem that we just upgraded to 2.15.3; however, this problem has been going on for some time.
>>>>
>>>> The quota command shows this:
>>>>
>>>> Disk quotas for grp somegroup (gid 9544):
>>>>     Filesystem    used   quota   limit   grace    files    quota    limit    grace
>>>>       /lustre1  13.38T     40T     45T       -  3136761* 2621440  3670016  expired
>>>>
>>>> The group is not using nearly that many files. We have robinhood installed and it shows this:
>>>>
>>>> Using config file '/etc/robinhood.d/lustre1.conf'.
>>>>     group,    type,   count,   volume,  spc_used, avg_size
>>>> somegroup, symlink,   59071,  5.12 MB, 103.16 MB,       91
>>>> somegroup,     dir,  426619,  5.24 GB,   5.24 GB, 12.87 KB
>>>> somegroup,    file, 1310414, 16.24 TB,  13.37 TB, 13.00 MB
>>>>
>>>> Total: 1796104 entries, volume: 17866508365925 bytes (16.25 TB), space used: 14704924899840 bytes (13.37 TB)
>>>>
>>>> Any ideas what is wrong here?
>>>>
>>>> —
>>>> Dan Szkola
>>>> FNAL
[lustre-discuss] Failing build of lustre client on Debian 12
Hi,

I've just successfully built the lustre 2.15.3 client on Debian 11 and need to do the same on Debian 12; however, configure fails with:

checking if Linux kernel was built with CONFIG_FHANDLE in or as module... no
configure: error: Lustre fid handling requires that CONFIG_FHANDLE is enabled in your kernel.

As far as I can see, CONFIG_FHANDLE is in fact enabled - eg:

root@debian12:~/lustre-release# grep CONFIG_FHANDLE /boot/config-6.1.38
CONFIG_FHANDLE=y

I've tried to figure out how configure checks for this, but the script is rather dense and I haven't penetrated it (yet). It seems to me there is an error in the way it checks. What is the best way forward, considering that I've already invested a lot of time and effort in setting up a slurm cluster with Debian 12?

/jan
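One thing worth checking (a hedged sketch, not a confirmed diagnosis): the configure test boils down to grepping a kernel config for CONFIG_FHANDLE, but the config it consults is the one belonging to the kernel tree it builds against (see `--with-linux` / `--with-linux-config`), which is not necessarily the same file as `/boot/config-*`. The check is simulated below on an inline fragment; on the real system, point grep at the `.config` of the tree configure actually uses, e.g. `/lib/modules/$(uname -r)/build/.config` (path illustrative).

```shell
# Simulate the essence of the configure check on an inline config fragment.
# On a real box, replace the fragment with the build tree's .config, e.g.:
#   grep '^CONFIG_FHANDLE=y' "/lib/modules/$(uname -r)/build/.config"
kcfg='CONFIG_FHANDLE=y'
if printf '%s\n' "$kcfg" | grep -q '^CONFIG_FHANDLE=y$'; then
  echo "CONFIG_FHANDLE enabled"
else
  echo "CONFIG_FHANDLE missing"
fi
```

If the build tree has no `.config` at all, the test fails even though the running kernel was built with the option enabled.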
Re: [lustre-discuss] Ongoing issues with quota
Hi Mark,

All nodes are using ZFS. OSTs, MDT, and MGT are all ZFS-based, so there's really no way to fsck them. I could do a scrub, but that's not the same thing. Is there a Lustre/ZFS equivalent of 'tune2fs -O [^]quota' for ZFS?

I'm guessing that at some point a large number of files was removed and somehow quota accounting missed this. There should be a simple way to reconcile or regenerate what quota has recorded vs what is actually on disk, which I have verified two different ways.

--
Dan

On Wed, 2023-10-04 at 15:01 +0100, Mark Dixon wrote:
> Hi Dan,
>
> I think it gets corrected when you umount and fsck the OSTs themselves (not lfsck). At least I recall seeing such messages when fsck'ing on 2.12.
>
> Best,
>
> Mark
>
> On Wed, 4 Oct 2023, Daniel Szkola via lustre-discuss wrote:
>> [EXTERNAL EMAIL]
>>
>> No combination of lfsck runs has helped with this.
>>
>> Again, robinhood shows 1796104 files for the group, and an 'lfs find -G gid' found 1796104 files as well.
>>
>> So why is the quota command showing over 3 million inodes used?
>>
>> There must be a way to force it to recount, or to clear all stale quota data and have it regenerated?
>>
>> Anyone?
>>
>> —
>> Dan Szkola
>> FNAL
>>
>>> On Sep 27, 2023, at 9:42 AM, Daniel Szkola via lustre-discuss wrote:
>>>
>>> We have a lustre filesystem that we just upgraded to 2.15.3; however, this problem has been going on for some time.
>>>
>>> The quota command shows this:
>>>
>>> Disk quotas for grp somegroup (gid 9544):
>>>     Filesystem    used   quota   limit   grace    files    quota    limit    grace
>>>       /lustre1  13.38T     40T     45T       -  3136761* 2621440  3670016  expired
>>>
>>> The group is not using nearly that many files. We have robinhood installed and it shows this:
>>>
>>> Using config file '/etc/robinhood.d/lustre1.conf'.
>>>     group,    type,   count,   volume,  spc_used, avg_size
>>> somegroup, symlink,   59071,  5.12 MB, 103.16 MB,       91
>>> somegroup,     dir,  426619,  5.24 GB,   5.24 GB, 12.87 KB
>>> somegroup,    file, 1310414, 16.24 TB,  13.37 TB, 13.00 MB
>>>
>>> Total: 1796104 entries, volume: 17866508365925 bytes (16.25 TB), space used: 14704924899840 bytes (13.37 TB)
>>>
>>> Any ideas what is wrong here?
>>>
>>> —
>>> Dan Szkola
>>> FNAL
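On ZFS there is indeed no tune2fs-style quota rebuild, but Lustre's quota slaves can be asked to re-sync with the quota master. One possible avenue (hedged: these `quota_slave` parameter names come from the osd quota-slave interface; verify with `lctl list_param` that they exist on your 2.15.3 servers before relying on them):

```shell
# On each OSS/MDS: inspect what the backend accounting currently reports
# for groups, to see which target carries the inflated count...
lctl get_param osd-zfs.*.quota_slave.acct_group

# ...and ask the quota slave to redo reintegration with the quota master
lctl set_param osd-zfs.*.quota_slave.force_reint=1
```

If the stale count lives in the master's global index rather than the slaves' accounting, this may not be enough, but comparing `acct_group` per target against the robinhood numbers should at least localize the discrepancy.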
Re: [lustre-discuss] Ongoing issues with quota
Hi Dan,

I think it gets corrected when you umount and fsck the OSTs themselves (not lfsck). At least I recall seeing such messages when fsck'ing on 2.12.

Best,

Mark

On Wed, 4 Oct 2023, Daniel Szkola via lustre-discuss wrote:
> [EXTERNAL EMAIL]
>
> No combination of lfsck runs has helped with this.
>
> Again, robinhood shows 1796104 files for the group, and an 'lfs find -G gid' found 1796104 files as well.
>
> So why is the quota command showing over 3 million inodes used?
>
> There must be a way to force it to recount, or to clear all stale quota data and have it regenerated?
>
> Anyone?
>
> —
> Dan Szkola
> FNAL
>
>> On Sep 27, 2023, at 9:42 AM, Daniel Szkola via lustre-discuss wrote:
>>
>> We have a lustre filesystem that we just upgraded to 2.15.3; however, this problem has been going on for some time.
>>
>> The quota command shows this:
>>
>> Disk quotas for grp somegroup (gid 9544):
>>     Filesystem    used   quota   limit   grace    files    quota    limit    grace
>>       /lustre1  13.38T     40T     45T       -  3136761* 2621440  3670016  expired
>>
>> The group is not using nearly that many files. We have robinhood installed and it shows this:
>>
>> Using config file '/etc/robinhood.d/lustre1.conf'.
>>     group,    type,   count,   volume,  spc_used, avg_size
>> somegroup, symlink,   59071,  5.12 MB, 103.16 MB,       91
>> somegroup,     dir,  426619,  5.24 GB,   5.24 GB, 12.87 KB
>> somegroup,    file, 1310414, 16.24 TB,  13.37 TB, 13.00 MB
>>
>> Total: 1796104 entries, volume: 17866508365925 bytes (16.25 TB), space used: 14704924899840 bytes (13.37 TB)
>>
>> Any ideas what is wrong here?
>>
>> —
>> Dan Szkola
>> FNAL
Re: [lustre-discuss] Ongoing issues with quota
No combination of lfsck runs has helped with this.

Again, robinhood shows 1796104 files for the group, and an 'lfs find -G gid' found 1796104 files as well.

So why is the quota command showing over 3 million inodes used?

There must be a way to force it to recount, or to clear all stale quota data and have it regenerated?

Anyone?

—
Dan Szkola
FNAL

> On Sep 27, 2023, at 9:42 AM, Daniel Szkola via lustre-discuss wrote:
>
> We have a lustre filesystem that we just upgraded to 2.15.3; however, this problem has been going on for some time.
>
> The quota command shows this:
>
> Disk quotas for grp somegroup (gid 9544):
>     Filesystem    used   quota   limit   grace    files    quota    limit    grace
>       /lustre1  13.38T     40T     45T       -  3136761* 2621440  3670016  expired
>
> The group is not using nearly that many files. We have robinhood installed and it shows this:
>
> Using config file '/etc/robinhood.d/lustre1.conf'.
>     group,    type,   count,   volume,  spc_used, avg_size
> somegroup, symlink,   59071,  5.12 MB, 103.16 MB,       91
> somegroup,     dir,  426619,  5.24 GB,   5.24 GB, 12.87 KB
> somegroup,    file, 1310414, 16.24 TB,  13.37 TB, 13.00 MB
>
> Total: 1796104 entries, volume: 17866508365925 bytes (16.25 TB), space used: 14704924899840 bytes (13.37 TB)
>
> Any ideas what is wrong here?
>
> —
> Dan Szkola
> FNAL