Re: [lustre-discuss] nodemap exports
Thanks Sebastien, this put all clients into the correct LNet, and nodemap.exports looks as expected. (I had to add 'options lnet lnet_peer_discovery_disabled=1' for '-o network=' to work.)
Regards
Thomas

On 6/19/24 08:31, Sebastien Buisson wrote:
Hi, I am not sure the NIDs displayed under 'exports' and in the logs are the actual NIDs used to communicate with, maybe they are just identifiers to describe the peers. If you want to restrict a client to a given LNet network for a given mount point, you can use the '-o network=' mount option, like '-o network=o2ib8' in your case.
Cheers, Sebastien.

Le 18 juin 2024 à 18:01, Thomas Roth via lustre-discuss a écrit :
OK, client 247 is the good one ;-) It is rather an LNet issue. All three clients have
  options lnet networks="o2ib8(ib0),o2ib6(ib0)"
I can mount my nodemapped Lustre, with its MGS on 10.20.6.63@o2ib8, and another Lustre on o2ib6. However, the log of the MGS at 10.20.6.63@o2ib8 remarks
  MGS: Connection restored to 15da687f-7a11-4913-9f9d-3c764c97a9cb (at 10.20.3.246@o2ib6)
  MGS: Connection restored to 059da3e0-b277-4767-8678-e83730769fb8 (at 10.20.3.248@o2ib6)
and then
  MGS: Connection restored to a0ee6b8c-48b9-46e8-ba2c-9448889c77ed (at 10.20.3.247@o2ib8)
I can see the "alien" NIDs also in
  mgs # ls /proc/fs/lustre/mgs/MGS/exports
  ... 10.20.3.246@o2ib6 10.20.3.247@o2ib8 10.20.3.248@o2ib6
The question is: why would an MGS on "o2ib8" accept connections from a client on "o2ib6"? Obviously, this would not happen if the client had only o2ib6. So the MGS is somewhat confused: the LNet connection is actually via o2ib8, but the labels and the nodemapping use o2ib6. This is boot-resistant, the MGS has these wrong NIDs stored somewhere - can I erase them and start again with correct NIDs?
Regards
Thomas

On 6/18/24 17:33, Thomas Roth via lustre-discuss wrote:
Hi all, what is the meaning of the "exports" property/parameter of a nodemap? I have
  mgs # lctl nodemap_info newclients
  ...
There are three clients:
  mgs # lctl get_param nodemap.newclients.ranges
  nodemap.newclients.ranges=
  [ { id: 13, start_nid: 10.20.3.246@o2ib8, end_nid: 10.20.3.246@o2ib8 },
    { id: 12, start_nid: 10.20.3.247@o2ib8, end_nid: 10.20.3.247@o2ib8 },
    { id: 9, start_nid: 10.20.3.248@o2ib8, end_nid: 10.20.3.248@o2ib8 } ]
This nodemap has
  nodemap.newclients.admin_nodemap=0
  nodemap.newclients.trusted_nodemap=0
  nodemap.newclients.deny_unknown=1
and
  mgs # lctl get_param nodemap.newclients.exports
  nodemap.newclients.exports=
  [ { nid: 10.20.3.247@o2ib8, uuid: 5d9964f9-81eb-4ea5-93dc-a145534f9e74 }, ]
_This_ client, 10.20.3.247, behaves differently: no access for root (ok!), no access for a regular user. The other two clients, 10.20.3.246/248, show no access for root (ok!), while a regular user sees the squashed (uid 99) directories of the top level and his own directories/files with correct uid/gid beneath. And the only difference between these clients seems to be these 'exports' (totally absent from the manual, btw).
Regards, Thomas
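For reference, a minimal sketch of the client-side setup described above. The file system names and mount points are placeholders, not taken from the thread:

  client # cat /etc/modprobe.d/lnet.conf
  options lnet networks="o2ib8(ib0),o2ib6(ib0)" lnet_peer_discovery_disabled=1

  client # mount -t lustre -o network=o2ib8 10.20.6.63@o2ib8:/fsname /lustre/fsname
  client # mount -t lustre -o network=o2ib6 <other-mgs-nid>@o2ib6:/otherfs /lustre/otherfs

As noted above, disabling LNet peer discovery was required before the '-o network=' restriction took effect.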
Re: [lustre-discuss] nodemap exports
OK, client 247 is the good one ;-)
It is rather an LNET issue. All three clients have:
> options lnet networks="o2ib8(ib0),o2ib6(ib0)"
I can mount my nodemapped Lustre, with its MGS on 10.20.6.63@o2ib8, and another Lustre on o2ib6. However, the log of the MGS at 10.20.6.63@o2ib8 remarks
> MGS: Connection restored to 15da687f-7a11-4913-9f9d-3c764c97a9cb (at 10.20.3.246@o2ib6)
> MGS: Connection restored to 059da3e0-b277-4767-8678-e83730769fb8 (at 10.20.3.248@o2ib6)
and then
> MGS: Connection restored to a0ee6b8c-48b9-46e8-ba2c-9448889c77ed (at 10.20.3.247@o2ib8)
I can see the "alien" nids also in
  mgs # ls /proc/fs/lustre/mgs/MGS/exports
  ... 10.20.3.246@o2ib6 10.20.3.247@o2ib8 10.20.3.248@o2ib6
Question is: why would an MGS on "o2ib8" accept connections from a client on "o2ib6"? Obviously, this would not happen if the client had only o2ib6. So the MGS is somewhat confused, since the LNET connection is actually via o2ib8, but the labels and the nodemapping use o2ib6.
This is boot-resistant, the MGS has these wrong nids stored somewhere - can I erase them and start again with correct nids?
Regards
Thomas

On 6/18/24 17:33, Thomas Roth via lustre-discuss wrote:
Hi all, what is the meaning of the "exports" property/parameter of a nodemap? I have
  mgs # lctl nodemap_info newclients
  ...
There are three clients:
  mgs # lctl get_param nodemap.newclients.ranges
  nodemap.newclients.ranges=
  [ { id: 13, start_nid: 10.20.3.246@o2ib8, end_nid: 10.20.3.246@o2ib8 },
    { id: 12, start_nid: 10.20.3.247@o2ib8, end_nid: 10.20.3.247@o2ib8 },
    { id: 9, start_nid: 10.20.3.248@o2ib8, end_nid: 10.20.3.248@o2ib8 } ]
This nodemap has
  nodemap.newclients.admin_nodemap=0
  nodemap.newclients.trusted_nodemap=0
  nodemap.newclients.deny_unknown=1
and
  mgs # lctl get_param nodemap.newclients.exports
  nodemap.newclients.exports=
  [ { nid: 10.20.3.247@o2ib8, uuid: 5d9964f9-81eb-4ea5-93dc-a145534f9e74 }, ]
_This_ client, 10.20.3.247, behaves differently: no access for root (ok!), no access for a regular user. The other two clients, 10.20.3.246/248, show no access for root (ok!), while a regular user sees the squashed (uid 99) directories of the top level and his own directories/files with correct uid/gid beneath. And the only difference between these clients seems to be these 'exports' (totally absent from the manual, btw).
Regards, Thomas
[lustre-discuss] nodemap exports
Hi all,
what is the meaning of the "exports" property/parameter of a nodemap? I have
  mgs # lctl nodemap_info newclients
  ...
There are three clients:
  mgs # lctl get_param nodemap.newclients.ranges
  nodemap.newclients.ranges=
  [ { id: 13, start_nid: 10.20.3.246@o2ib8, end_nid: 10.20.3.246@o2ib8 },
    { id: 12, start_nid: 10.20.3.247@o2ib8, end_nid: 10.20.3.247@o2ib8 },
    { id: 9, start_nid: 10.20.3.248@o2ib8, end_nid: 10.20.3.248@o2ib8 } ]
This nodemap has
  nodemap.newclients.admin_nodemap=0
  nodemap.newclients.trusted_nodemap=0
  nodemap.newclients.deny_unknown=1
and
  mgs # lctl get_param nodemap.newclients.exports
  nodemap.newclients.exports=
  [ { nid: 10.20.3.247@o2ib8, uuid: 5d9964f9-81eb-4ea5-93dc-a145534f9e74 }, ]
_This_ client, 10.20.3.247, behaves differently: no access for root (ok!), no access for a regular user. The other two clients, 10.20.3.246/248, show no access for root (ok!), while a regular user sees the squashed (uid 99) directories of the top level and his own directories/files with correct uid/gid beneath. And the only difference between these clients seems to be these 'exports' (totally absent from the manual, btw).
Regards, Thomas
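For context (not taken from the thread), a nodemap like "newclients" is typically built on the MGS with commands along these lines; the NID range below is only a guess at the intended syntax:

  mgs # lctl nodemap_add newclients
  mgs # lctl nodemap_add_range --name newclients --range 10.20.3.[246-248]@o2ib8
  mgs # lctl nodemap_modify --name newclients --property admin --value 0
  mgs # lctl nodemap_modify --name newclients --property trusted --value 0
  mgs # lctl nodemap_modify --name newclients --property deny_unknown --value 1
  mgs # lctl nodemap_activate 1

The 'exports' entry, by contrast, is not something you set; it appears to list the currently connected client exports whose NIDs fall into that nodemap's ranges.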
[lustre-discuss] default Nodemap: ll_close_inode_openhandle errors
Hi all,
as described before, I have a test cluster with the nodemap feature activated, and managed to get a bunch of clients to mount (by deactivating selinux), while these clients are in the "default" nodemap. I left the "default" nodemap's property `admin=0` and set `trusted=1`, and everything seemed to work.
Now I ran a compile+install benchmark on these nodes, which finished successfully everywhere and with the expected performance. However, the client logs all show a large number of the following errors:
> LustreError: 2967:0:(mdc_locks.c:1388:mdc_intent_getattr_async_interpret()) mdstest-MDT-mdc-9a0bc355f000: ldlm_cli_enqueue_fini() failed: rc = -13
> LustreError: 53842:0:(file.c:241:ll_close_inode_openhandle()) mdstest-clilmv-9a0bc355f000: inode [0x20429:0x9d8d:0x0] mdc close failed: rc = -13
Error code 13 is /* Permission denied */
Therefore I repeated the benchmark on one of the "Admin" nodes - the nodemap with both `admin=1` and `trusted=1` - and this client does not show these errors. More than coincidence?
Given that the benchmark seems to have finished successfully, I would ignore these errors. On the other hand, if inodes cannot be handled - that sounds severe?
Regards, Thomas
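A quick way to compare the two nodemaps mentioned above (names as used in the thread; run on the MGS):

  mgs # lctl get_param nodemap.default.admin_nodemap nodemap.default.trusted_nodemap nodemap.default.squash_uid
  mgs # lctl get_param nodemap.Admin.admin_nodemap nodemap.Admin.trusted_nodemap

With admin=0, root on those clients is squashed to squash_uid (99 by default); whether that fully explains the rc = -13 close errors is not settled in the thread.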
Re: [lustre-discuss] default Nodemap : clients cannot mount
OK - my bad: selinux was on. It's a bunch of test hosts = sloppy configuration = default selinux settings.
With selinux=disabled, one of these hosts mounts, and if I give it trusted=1, the users are enabled and root is squashed - all fine.
Cheers
Thomas

On 4/22/24 16:50, Thomas Roth via lustre-discuss wrote:
Hi all,
Lustre 2.15.4 test system with MDS + 2 OSS + 2 administrative clients. I activated nodemapping and put all these hosts into an "Admin" nodemap, which has the properties `admin=1` and `trusted=1` - all works fine.
Now there are a couple of other hosts which should not become administrative clients, but just standard clients => they should be / remain in the "default" nodemap. The "default" nodemap has `admin=0` and `trusted=0`, as verified by `lctl get_param nodemap.default` - and these hosts cannot mount. The error message is "l_getidentity: no such user 99".
I verified that these hosts actually are seen as "default" nodes by setting `admin=1` for one of them - mounts. Umount, lustre_rmmod, set `admin=0` again - does not mount anymore.
Atm I do not see what I overlooked, but I am certain this has worked before, where "before" would mean other hardware and perhaps Lustre 2.15.1.
*Switching* is still ok:
- Put client X into "Admin" nodemap, mount Lustre, remove client X from "Admin" nodemap, wait, try to `ls` as root - fails.
- Set the property `trusted=1` on the "default" nodemap, wait, try to `ls` as a user - works.
However, this defeats the purpose of having a usable default...
Regards, Thomas
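One way to check and disable SELinux on such test clients, matching what is described above (the sed edit only takes effect after a reboot):

  client # getenforce
  Enforcing
  client # setenforce 0
  client # sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config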
[lustre-discuss] default Nodemap : clients cannot mount
Hi all,
- Lustre 2.15.4 test system with MDS + 2 OSS + 2 administrative clients.
I activated nodemapping and put all these hosts into an "Admin" nodemap, which has the properties `admin=1` and `trusted=1` - all works fine.
Now there are a couple of other hosts which should not become administrative clients, but just standard clients => they should be / remain in the "default" nodemap. The "default" nodemap has `admin=0` and `trusted=0`, as verified by `lctl get_param nodemap.default` - and these hosts cannot mount. Error message is "l_getidentity: no such user 99".
I verified that these hosts actually are seen as "default" nodes by setting `admin=1` for one of them - mounts. Umount, lustre_rmmod, set `admin=0` again - does not mount anymore.
Atm I do not see what I overlooked, but I am certain this has worked before, where "before" would mean other hardware and perhaps Lustre 2.15.1.
*Switching* is still ok:
- Put client X into "Admin" nodemap, mount Lustre, remove client X from "Admin" nodemap, wait, try to `ls` as root - fails.
- Set the property `trusted=1` on the "default" nodemap, wait, try to `ls` as a user - works.
However, this defeats the purpose of having a usable default...
Regards, Thomas
[lustre-discuss] ldiskfs / mdt size limits
Hi all,
confused about size limits: I distinctly remember trying to format a ~19 TB disk / LV for use as an MDT, with ldiskfs, and failing to do so: the max size for the underlying ext4 is 16 TB. Knew that, had ignored that, but it was not a problem back then - I just adapted the logical volume's size.
Now I have a 24 TB disk, and neither mkfs.lustre nor Lustre itself has shown any issues with it. 'df -h' does show the 24T, 'df -ih' shows the expected 4G of inodes.
I suppose this MDS has a lot of space for directories and stuff, or for DOM. But why does it work in the first place? Does ldiskfs extend beyond all limits these days?
Regards, Thomas
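Presumably (not confirmed in the thread) the ldiskfs/ext4 "64bit" feature is the reason: with 64-bit block numbers the old 16 TB limit for 4 KB-block file systems no longer applies, and recent mke2fs/mkfs.lustre enable it automatically for devices larger than 16 TB. A quick check on the MDS - the device path is a placeholder and the feature list is only illustrative:

  mds # dumpe2fs -h /dev/vg_mdt/mdt0 | grep -i 'Filesystem features'
  Filesystem features:  has_journal ext_attr dir_index ... 64bit ... huge_file ...

If "64bit" shows up in that list, a >16 TB MDT is expected to work.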
Re: [lustre-discuss] [EXTERNAL] [BULK] MDS hardware - NVME?
Actually we had MDTs on software raid-1 *connecting two JBODs* for quite some time - it worked surprisingly well and stably. Still, personally I would prefer ZFS anytime. Nowadays all our OSTs are on ZFS, very stable. Of course, a look at all the possible ZFS parameters tells me that surely I have overlooked a crucial tuning tweak ;-)
Hmm, if you have your MDTs on a zpool of mirrors aka raid-10, wouldn't going towards raidz2 increase data safety - something you don't need if the SSDs never fail anyhow? Doesn't raidz2 protect against failure of *any* two disks, whereas in a pool of mirrors the second failure could destroy one mirror?
Regards
Thomas

On 1/9/24 20:57, Cameron Harr via lustre-discuss wrote:
Thomas,
We value management over performance and have knowingly left performance on the floor in the name of standardization, robustness, management, etc., while still maintaining our performance targets. We are a heavy ZFS-on-Linux (ZoL) shop so we never considered MD-RAID, which, IMO, is very far behind ZoL in enterprise storage features. As Jeff mentioned, we have done some tuning (and if you haven't noticed there are *a lot* of possible ZFS parameters) to further improve performance and are at a good place performance-wise.
Cameron

On 1/8/24 10:33, Jeff Johnson wrote:
Today nvme/mdraid/ldiskfs will beat nvme/zfs on MDS IOPs but you can close the gap somewhat with tuning, zfs ashift/recordsize and special allocation class vdevs. While the IOPs performance favors nvme/mdraid/ldiskfs there are tradeoffs. The snapshot/backup abilities of ZFS and the security it provides to the most critical function in a Lustre file system shouldn't be undervalued. From personal experience, I'd much rather deal with zfs in the event of a seriously jackknifed MDT than mdraid/ldiskfs, and both zfs and mdraid/ldiskfs are preferable to trying to unscramble a vendor blackbox hwraid volume. ;-) When zfs directio lands and is fully integrated into Lustre the performance differences *should* be negligible. Just my $.02 worth

On Mon, Jan 8, 2024 at 8:23 AM Thomas Roth via lustre-discuss wrote:
Hi Cameron,
did you run a performance comparison between ZFS and mdadm-raid on the MDTs? I'm currently doing some tests, and the results favor software raid, in particular when it comes to IOPS.
Regards
Thomas

On 1/5/24 19:55, Cameron Harr via lustre-discuss wrote:
This doesn't answer your question about ldiskfs on zvols, but we've been running MDTs on ZFS on NVMe in production for a couple years (and on SAS SSDs for many years prior). Our current production MDTs using NVMe consist of one zpool/node made up of 3x 2-drive mirrors, but we've been experimenting lately with using raidz3 and possibly even raidz2 for MDTs since SSDs have been pretty reliable for us.
Cameron

On 1/5/24 9:07 AM, Vicker, Darby J. (JSC-EG111)[Jacobs Technology, Inc.] via lustre-discuss wrote:
We are in the process of retiring two long-standing LFS's (about 8 years old), which we built and managed ourselves. Both use ZFS and have the MDTs on SSDs in a JBOD that require the kind of software-based management you describe, in our case ZFS pools built on multipath devices. The MDT in one is ZFS and the MDT in the other LFS is ldiskfs but uses ZFS and a zvol as you describe - we build the ldiskfs MDT on top of the zvol. Generally, this has worked well for us, with one big caveat. If you look for my posts to this list and the ZFS list you'll find more details. The short version is that we utilize ZFS snapshots and clones to do backups of the metadata.
We've run into situations where the backup process stalls, leaving a clone hanging around. We've experienced a situation a couple of times where the clone and the primary zvol get swapped, effectively rolling back our metadata to the point when the clone was created. I have tried, unsuccessfully, to recreate that in a test environment. So if you do that kind of setup, make sure you have good monitoring in place to detect if your backups/clones stall.
We've kept up with lustre and ZFS updates over the years and are currently on lustre 2.14 and ZFS 2.1. We've seen the gap between our ZFS MDT and ldiskfs performance shrink to the point where they are pretty much on par to each other now. I think our ZFS MDT performance could be better with more hardware and software tuning but our small team hasn't had the bandwidth to tackle that.
Our newest LFS is vendor provided and uses NVMe MDTs. I'm not at liberty to talk about the proprietary way those devices are managed. However, the metadata performance is SO much better than our older LFS's, for a lot of reasons, but I'd highly recommend NVMe's for your MDTs.

-----Original Message-----
From: lustre-discuss on behalf of Thomas Roth via lus
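A rough sketch of the snapshot+clone backup scheme described above, for the ldiskfs-on-zvol case; pool, dataset and mount-point names are hypothetical:

  mds # zfs snapshot mdtpool/mdt0@backup
  mds # zfs clone mdtpool/mdt0@backup mdtpool/mdt0_bck
  mds # mount -t ldiskfs -o ro /dev/zvol/mdtpool/mdt0_bck /mnt/mdt_bck
  ... copy the metadata off with tar/rsync ...
  mds # umount /mnt/mdt_bck
  mds # zfs destroy mdtpool/mdt0_bck
  mds # zfs destroy mdtpool/mdt0@backup

As Darby warns, monitor that the clone really does get destroyed afterwards; a stalled backup that leaves the clone lying around is exactly the situation that caused trouble.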
Re: [lustre-discuss] Extending Lustre file system
Yes, sorry, I meant the actual procedure of mounting the OSTs for the first time.
Last year I did that with 175 OSTs - replacements for EOL hardware. All OSTs had been formatted with a specific index, so probably creating a suitable /etc/fstab everywhere and sending a 'mount -a -t lustre' to all OSSs simultaneously would have worked. But why the hurry? Instead, I logged in to my new OSS, mounted the OSTs with 2 sec between each mount command, watched the OSS log, watched the MDS log, saw the expected log messages, proceeded to the next new OSS - all fine ;-) Such a leisurely approach takes its time, of course.
Once all OSTs were happily incorporated, we put the max_create_count (set to 0 before) to some finite value and started file migration. As long as the migration is faster than the users' file creation, the result should be evenly filled OSTs with a good mixture of files (file sizes, ages, types).
Cheers
Thomas

On 1/8/24 19:07, Andreas Dilger wrote:
The need to rebalance depends on how full the existing OSTs are. My recommendation if you know that the data will continue to grow is to add new OSTs when the existing ones are at 60-70% full, and add them in larger groups rather than one at a time.
Cheers, Andreas

On Jan 8, 2024, at 09:29, Thomas Roth via lustre-discuss wrote:
Just mount the OSTs, one by one and perhaps not if your system is heavily loaded. Follow what happens in the MDS log and the OSS log. And try to rebalance the OSTs' fill levels afterwards - very empty OSTs will attract all new files, which might be hot and direct your users' fire to your new OSS only.
Regards, Thomas

On 1/8/24 15:38, Backer via lustre-discuss wrote:
Hi, Good morning and happy new year! I have a quick question on extending a lustre file system. The extension is performed online. I am looking for any best practices or anything to watch out for while doing the file system extension. The file system extension is done adding new OSS and many OSTs within these servers. Really appreciate your help on this.
Regards,
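A minimal sketch of the two knobs mentioned above; file system name, OST index, create count and size threshold are placeholders:

  # on the MDS: keep a freshly added OST closed for new objects, open it up later
  mds # lctl set_param osp.fsname-OST0042-osc-MDT0000.max_create_count=0
  mds # lctl set_param osp.fsname-OST0042-osc-MDT0000.max_create_count=20000

  # on a client: restripe some existing files to rebalance fill levels
  client # lfs find /lustre/fsname --ost fsname-OST0001 --size +1G | lfs_migrate -y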
Re: [lustre-discuss] Extending Lustre file system
Just mount the OSTs, one by one and perhaps not if your system is heavily loaded. Follow what happens in the MDS log and the OSS log.
And try to rebalance the OSTs' fill levels afterwards - very empty OSTs will attract all new files, which might be hot and direct your users' fire to your new OSS only.
Regards, Thomas

On 1/8/24 15:38, Backer via lustre-discuss wrote:
Hi, Good morning and happy new year! I have a quick question on extending a lustre file system. The extension is performed online. I am looking for any best practices or anything to watch out for while doing the file system extension. The file system extension is done adding new OSS and many OSTs within these servers. Really appreciate your help on this.
Regards,
Re: [lustre-discuss] [EXTERNAL] [BULK] MDS hardware - NVME?
Hi Cameron,
did you run a performance comparison between ZFS and mdadm-raid on the MDTs? I'm currently doing some tests, and the results favor software raid, in particular when it comes to IOPS.
Regards
Thomas

On 1/5/24 19:55, Cameron Harr via lustre-discuss wrote:
This doesn't answer your question about ldiskfs on zvols, but we've been running MDTs on ZFS on NVMe in production for a couple years (and on SAS SSDs for many years prior). Our current production MDTs using NVMe consist of one zpool/node made up of 3x 2-drive mirrors, but we've been experimenting lately with using raidz3 and possibly even raidz2 for MDTs since SSDs have been pretty reliable for us.
Cameron

On 1/5/24 9:07 AM, Vicker, Darby J. (JSC-EG111)[Jacobs Technology, Inc.] via lustre-discuss wrote:
We are in the process of retiring two long-standing LFS's (about 8 years old), which we built and managed ourselves. Both use ZFS and have the MDTs on SSDs in a JBOD that require the kind of software-based management you describe, in our case ZFS pools built on multipath devices. The MDT in one is ZFS and the MDT in the other LFS is ldiskfs but uses ZFS and a zvol as you describe - we build the ldiskfs MDT on top of the zvol. Generally, this has worked well for us, with one big caveat. If you look for my posts to this list and the ZFS list you'll find more details. The short version is that we utilize ZFS snapshots and clones to do backups of the metadata. We've run into situations where the backup process stalls, leaving a clone hanging around. We've experienced a situation a couple of times where the clone and the primary zvol get swapped, effectively rolling back our metadata to the point when the clone was created. I have tried, unsuccessfully, to recreate that in a test environment. So if you do that kind of setup, make sure you have good monitoring in place to detect if your backups/clones stall.
We've kept up with lustre and ZFS updates over the years and are currently on lustre 2.14 and ZFS 2.1. We've seen the gap between our ZFS MDT and ldiskfs performance shrink to the point where they are pretty much on par to each other now. I think our ZFS MDT performance could be better with more hardware and software tuning but our small team hasn't had the bandwidth to tackle that.
Our newest LFS is vendor provided and uses NVMe MDTs. I'm not at liberty to talk about the proprietary way those devices are managed. However, the metadata performance is SO much better than our older LFS's, for a lot of reasons, but I'd highly recommend NVMe's for your MDTs.

-----Original Message-----
From: lustre-discuss on behalf of Thomas Roth via lustre-discuss
Reply-To: Thomas Roth
Date: Friday, January 5, 2024 at 9:03 AM
To: Lustre Diskussionsliste
Subject: [EXTERNAL] [BULK] [lustre-discuss] MDS hardware - NVME?

CAUTION: This email originated from outside of NASA. Please take care when clicking links or opening attachments. Use the "Report Message" button to report suspicious messages to the NASA SOC.

Dear all,
considering NVME storage for the next MDS. As I understand, NVME disks are bundled in software, not by a hardware raid controller. This would be done using Linux software raid, mdadm, correct?
We have some experience with ZFS, which we use on our OSTs. But I would like to stick to ldiskfs for the MDTs, and a zpool with a zvol on top which is then formatted with ldiskfs - too much voodoo...
How is this handled elsewhere? Any experiences?
The available devices are quite large. If I create a raid-10 out of 4 disks, e.g. 7 TB each, my MDT will be 14 TB - already close to the 16 TB limit. So no need for a box with lots of U.3 slots. But for MDS operations, we will still need a powerful dual-CPU system with lots of RAM. Then the NVME devices should be distributed between the CPUs? Is there a way to pinpoint this in a call for tender?
Best regards,
Thomas

Thomas Roth
GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de
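A hedged sketch of the mdadm variant being asked about; device names, file system name and MGS NID are placeholders:

  mds # mdadm --create /dev/md0 --level=10 --raid-devices=4 /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1
  mds # mkfs.lustre --mdt --fsname=fsname --index=0 --mgsnode=<mgs-nid> /dev/md0
  mds # mount -t lustre /dev/md0 /lustre/fsname-mdt0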
[lustre-discuss] MDS hardware - NVME?
Dear all,
considering NVME storage for the next MDS. As I understand, NVME disks are bundled in software, not by a hardware raid controller. This would be done using Linux software raid, mdadm, correct?
We have some experience with ZFS, which we use on our OSTs. But I would like to stick to ldiskfs for the MDTs, and a zpool with a zvol on top which is then formatted with ldiskfs - too much voodoo...
How is this handled elsewhere? Any experiences?
The available devices are quite large. If I create a raid-10 out of 4 disks, e.g. 7 TB each, my MDT will be 14 TB - already close to the 16 TB limit. So no need for a box with lots of U.3 slots. But for MDS operations, we will still need a powerful dual-CPU system with lots of RAM. Then the NVME devices should be distributed between the CPUs? Is there a way to pinpoint this in a call for tender?
Best regards,
Thomas
Re: [lustre-discuss] adding a new OST to live system
Not a problem at all. Perhaps if you manage to mount your new OST for the first time just when your MGS/MDT and your network are completely overloaded and almost unreactive, then, perhaps, there might be issues ;-)
Afterwards the new OST, being empty, will attract most of the files that are newly created. That could result in an imbalance - old, cold data vs. new, hot data. In our case, we migrate some of the old data around, such that the fill level of the OSTs becomes ~equal.
Regards, Thomas

On 12/1/23 19:18, Lana Deere via lustre-discuss wrote:
I'm looking at the manual, 14.8, Adding a New OST to a Lustre File System, and it looks straightforward. It isn't clear to me, however, whether it is OK to do this while the rest of the lustre system is live. Is it OK to add a new OST while the system is in use? Or do I need to arrange downtime for the system to do this? Thanks.
.. Lana (lana.de...@gmail.com)
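For completeness, the basic format-and-mount step for adding an OST online, roughly as in the manual section cited above, with placeholder names:

  oss # mkfs.lustre --ost --fsname=fsname --index=42 --mgsnode=<mgs-nid> /dev/sdX
  oss # mount -t lustre /dev/sdX /lustre/fsname-ost42

The max_create_count trick from the "Extending Lustre file system" thread can be combined with this to keep the new, empty OST initially closed to new files.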
Re: [lustre-discuss] OST is not mounting
So, did you do the "writeconf"? And the OST mounted afterwards? As I understand, the MGS was under the impression that this re-mounting OST was actually a new one using an old index. So, what made your repaired OST look new/different ? I would probably have mounted it locally, as an ext4 file system, if only to check that there is data still present (ok, "df" would do that, too). "tunefs.lustre --dryrun" will show other quantum numbers that _should not_ change when taking down and remounting an OST. And since "writeconf" has to be done on all targets, you have to take down your MDS anyhow - so nothing is lost by simply trying an MDS restart? Regards Thomas On 11/5/23 17:11, Backer via lustre-discuss wrote: Hi, I am new to this email list. Looking to get some help on why an OST is not getting mounted. The cluster was running healthy and the OST experienced an issue and Linux re-mounted the OST read only. After fixing the issue and rebooting the node multiple times, it wouldn't mount. When the mount is done, the mount command errors out stating that that the index is already in use. The index for the device is 33. There is no place where this index is mounted. The debug message from the MGS during the mount is attached at the end of this email. It is asking to use writeconf. After using writeconfig, the device was mounted. Looking for a couple of things here. - I am hoping that the writeconf method is the right thing to do here. - Why did OST become in this state after the write failure and was mounted RO. The write error was due to iSCSI target going offline and coming back after a few seconds later. 2000:0100:17.0:1698240468.758487:0:91492:0:(mgs_handler.c:496:mgs_target_reg()) updating fs1-OST0021, index=33 2000:0001:17.0:1698240468.758488:0:91492:0:(mgs_llog.c:4403:mgs_write_log_target()) Process entered 2000:0001:17.0:1698240468.758488:0:91492:0:(mgs_llog.c:671:mgs_set_index()) Process entered 2000:0001:17.0:1698240468.758488:0:91492:0:(mgs_llog.c:572:mgs_find_or_make_fsdb()) Process entered 2000:0001:17.0:1698240468.758489:0:91492:0:(mgs_llog.c:551:mgs_find_or_make_fsdb_nolock()) Process entered 2000:0001:17.0:1698240468.758489:0:91492:0:(mgs_llog.c:565:mgs_find_or_make_fsdb_nolock()) Process leaving (rc=0 : 0 : 0) 2000:0001:17.0:1698240468.758489:0:91492:0:(mgs_llog.c:578:mgs_find_or_make_fsdb()) Process leaving (rc=0 : 0 : 0) 2000:0202:17.0:1698240468.758490:0:91492:0:(mgs_llog.c:711:mgs_set_index()) 140-5: Server fs1-OST0021 requested index 33, but that index is already in use. Use --writeconf to force 2000:0001:17.0:1698240468.772355:0:91492:0:(mgs_llog.c:712:mgs_set_index()) Process leaving via out_up (rc=18446744073709551518 : -98 : 0xff9e) 2000:0001:17.0:1698240468.772356:0:91492:0:(mgs_llog.c:4408:mgs_write_log_target()) Process leaving (rc=18446744073709551518 : -98 : ff9e) 2000:0002:17.0:1698240468.772357:0:91492:0:(mgs_handler.c:503:mgs_target_reg()) Failed to write fs1-OST0021 log (-98) 2000:0001:17.0:1698240468.783747:0:91492:0:(mgs_handler.c:504:mgs_target_reg()) Process leaving via out (rc=18446744073709551518 : -98 : 0xff9e) ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
Re: [lustre-discuss] Lustre-Manual on lfsck - non-existing entries?
Thanks! In the end, it was a typo.
But your explanation about parameters having wandered off to debugfs has helped me find some other info, long lost from /proc and not Lustre-related ;-)
Regards
Thomas

On 10/31/23 22:34, Andreas Dilger wrote:
On Oct 31, 2023, at 13:12, Thomas Roth via lustre-discuss wrote:
Hi all, after starting an `lctl lfsck_start -A -C -o` and the oi_scrub having completed, I would check the layout scan as described in the Lustre manual, "36.4.3.3. LFSCK status of layout via procfs", by
  lctl get_param -n mdd.FSNAME-MDT_target.lfsck_layout
Doesn't work, and inspection of 'ls /sys/fs/lustre/mdd/FSNAME-MDT/' shows:
  ...
  lfsck_async_windows
  lfsck_speed_limit
  ...
as the only entries showing the string "lfsck".
  lctl lfsck_query -M FSNAME-MDT -t layout
does show some info, although it is not what the manual describes as output of the `lctl get_param` command. Issue with the manual or issue with our Lustre?

Are you perhaps running the "lctl get_param" as a non-root user? One of the wonderful quirks of the kernel is that they don't want new parameters stored in procfs, and they don't want "complex" parameters (more than one value) stored in sysfs, so by necessity this means anything "complex" needs to go into debugfs (/sys/kernel/debug), but that was changed at some point to only be accessible by root. As such, you need to be root to access any of the "complex" parameters/stats:

$ lctl get_param mdd.*.lfsck_layout
error: get_param: param_path 'mdd/*/lfsck_layout': No such file or directory
$ sudo lctl get_param mdd.*.lfsck_layout
mdd.myth-MDT.lfsck_layout=
name: lfsck_layout
magic: 0xb1732fed
version: 2
status: completed
flags:
param: all_targets
last_completed_time: 1694676243
time_since_last_completed: 4111337 seconds
latest_start_time: 1694675639
time_since_latest_start: 4111941 seconds
last_checkpoint_time: 1694676243
time_since_last_checkpoint: 4111337 seconds
latest_start_position: 12
last_checkpoint_position: 4194304
first_failure_position: 0
success_count: 6
repaired_dangling: 0
repaired_unmatched_pair: 0
repaired_multiple_referenced: 0
repaired_orphan: 0
repaired_inconsistent_owner: 0
repaired_others: 0
skipped: 0
failed_phase1: 0
failed_phase2: 0
checked_phase1: 3791402
checked_phase2: 0
run_time_phase1: 595 seconds
run_time_phase2: 8 seconds
average_speed_phase1: 6372 items/sec
average_speed_phase2: 0 objs/sec
real_time_speed_phase1: N/A
real_time_speed_phase2: N/A
current_position: N/A

$ sudo ls /sys/kernel/debug/lustre/mdd/myth-MDT/
total 0
0 changelog_current_mask  0 changelog_users  0 lfsck_namespace
0 changelog_mask          0 lfsck_layout

Getting an update to the manual to clarify this requirement would be welcome.
Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Whamcloud
[lustre-discuss] Lustre-Manual on lfsck - non-existing entries?
Hi all,
after starting an `lctl lfsck_start -A -C -o` and the oi_scrub having completed, I would check the layout scan as described in the Lustre manual, "36.4.3.3. LFSCK status of layout via procfs", by
> lctl get_param -n mdd.FSNAME-MDT_target.lfsck_layout
Doesn't work, and inspection of 'ls /sys/fs/lustre/mdd/FSNAME-MDT/' shows:
> ...
> lfsck_async_windows
> lfsck_speed_limit
> ...
as the only entries showing the string "lfsck".
> lctl lfsck_query -M FSNAME-MDT -t layout
does show some info, although it is not what the manual describes as output of the `lctl get_param` command.
Issue with the manual or issue with our Lustre?
Regards
Thomas
Re: [lustre-discuss] OST went back in time: no(?) hardware issue
Hi Andreas,

On 10/5/23 02:30, Andreas Dilger wrote:
On Oct 3, 2023, at 16:22, Thomas Roth via lustre-discuss wrote:
Hi all, in our Lustre 2.12.5 system, we have "OST went back in time" after OST hardware replacement:
- hardware had reached EOL
- we set `max_create_count=0` for these OSTs, searched for and migrated off the files of these OSTs
- formatted the new OSTs with `--replace` and the old indices - all OSTs are on ZFS
- set the OSTs `active=0` on our 3 MDTs
- moved in the new hardware, reused the old NIDs, old OST indices, mounted the OSTs
- set the OSTs `active=1`
- ran `lfsck` on all servers
- set `max_create_count=200` for these OSTs
Now the "OST went back in time" messages appeared in the MDS logs. This doesn't quite fit the description in the manual. There were no crashes or power losses. I cannot understand which cache might have been lost. The transaction numbers quoted in the error are both large, e.g. `transno 55841088879 was previously committed, server now claims 4294992012`.
What should we do? Give `lfsck` another try?

Nothing really to see here I think? Did you delete LAST_RCVD during the replacement and the OST didn't know what transno was assigned to the last RPCs it sent? The still-mounted clients have a record of this transno and are surprised that it was reset. If you unmount and remount the clients the error would go away.

No, I don't think I deleted something during the procedure.
- The old OST was emptied (max_create_count=0) in normal Lustre operations. The last transaction should be ~ the last file being moved away.
- Then the OST is deactivated, but only on the MDS, not on the clients.
- Then the new OST, formatted with '--replace', is mounted. It is activated on the MDS. Up to this point no errors.
- Finally, the max_create_count is increased, clients can write.
- Now the MDT throws this error (nothing in the client logs).
According to the manual, what should have happened when I mounted the new OST: "The MDS and OSS will negotiate the LAST_ID value for the replacement OST." Ok, this is about LAST_ID, wherever that is on ZFS. About LAST_RCVD, the manual says (even in the case when the configuration files got lost and have to be recreated): "The last_rcvd file will be recreated when the OST is first mounted using the default parameters."
So, let's see what happens once the clients remount. Eventually, then, I should also restart the MDTs?
Regards, Thomas

I'm not sure if the clients might try to preserve the next 55B RPCs in memory until the committed transno on the OST catches up, or if they just accept the new transno and get on with life?
Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Whamcloud
[lustre-discuss] OST went back in time: no(?) hardware issue
Hi all,
in our Lustre 2.12.5 system, we have "OST went back in time" after OST hardware replacement:
- hardware had reached EOL
- we set `max_create_count=0` for these OSTs, searched for and migrated off the files of these OSTs
- formatted the new OSTs with `--replace` and the old indices - all OSTs are on ZFS
- set the OSTs `active=0` on our 3 MDTs
- moved in the new hardware, reused the old NIDs, old OST indices, mounted the OSTs
- set the OSTs `active=1`
- ran `lfsck` on all servers
- set `max_create_count=200` for these OSTs
Now the "OST went back in time" messages appeared in the MDS logs.
This doesn't quite fit the description in the manual. There were no crashes or power losses. I cannot understand which cache might have been lost. The transaction numbers quoted in the error are both large, e.g.
`transno 55841088879 was previously committed, server now claims 4294992012`
What should we do? Give `lfsck` another try?
Regards, Thomas
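For reference, a hedged sketch of the '--replace' formatting step for a ZFS-backed OST as listed above; pool/dataset names, index, MGS NID and vdev layout are placeholders:

  oss # mkfs.lustre --ost --backfstype=zfs --fsname=fsname --index=0x2a --replace \
        --mgsnode=<mgs-nid> ostpool/ost42 raidz2 /dev/sda /dev/sdb /dev/sdc /dev/sdd
  oss # mount -t lustre ostpool/ost42 /lustre/ost42

With '--replace', the OST re-registers under its old index instead of requesting a new one from the MGS.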
Re: [lustre-discuss] Install instructions for Rocky 8.8
Also, I think you want to check out some 'release branch' first: your compilation gave you "2.15.58" packages - this is probably an intermediary, development version. At least I seem to remember a warning from Andreas about these multi-digit subsubversions. According to lustre.org, the current release is 2.15.3 - perhaps this works a little better.

On 25/09/2023 11.52, Jan Andersen wrote:
I'm having some trouble installing lustre - this is on Rocky 8.8. I downloaded the latest (?) source:
  git clone git://git.whamcloud.com/fs/lustre-release.git
and I managed to compile and create the RPMs:
  make rpms
I now have a directory full of rpm files:

[root@rocky8 lustre-release]# ls -1 ?*rpm
kmod-lustre-2.15.58_42_ga54a206-1.el8.x86_64.rpm
kmod-lustre-debuginfo-2.15.58_42_ga54a206-1.el8.x86_64.rpm
kmod-lustre-osd-ldiskfs-2.15.58_42_ga54a206-1.el8.x86_64.rpm
kmod-lustre-osd-ldiskfs-debuginfo-2.15.58_42_ga54a206-1.el8.x86_64.rpm
kmod-lustre-tests-2.15.58_42_ga54a206-1.el8.x86_64.rpm
kmod-lustre-tests-debuginfo-2.15.58_42_ga54a206-1.el8.x86_64.rpm
lustre-2.15.58_42_ga54a206-1.el8.x86_64.rpm
lustre-2.15.58_42_ga54a206-1.src.rpm
lustre-debuginfo-2.15.58_42_ga54a206-1.el8.x86_64.rpm
lustre-debugsource-2.15.58_42_ga54a206-1.el8.x86_64.rpm
lustre-devel-2.15.58_42_ga54a206-1.el8.x86_64.rpm
lustre-iokit-2.15.58_42_ga54a206-1.el8.x86_64.rpm
lustre-osd-ldiskfs-mount-2.15.58_42_ga54a206-1.el8.x86_64.rpm
lustre-osd-ldiskfs-mount-debuginfo-2.15.58_42_ga54a206-1.el8.x86_64.rpm
lustre-resource-agents-2.15.58_42_ga54a206-1.el8.x86_64.rpm
lustre-tests-2.15.58_42_ga54a206-1.el8.x86_64.rpm
lustre-tests-debuginfo-2.15.58_42_ga54a206-1.el8.x86_64.rpm

This is what I get when I, somewhat naively, try to simply install the lot with:

[root@rocky8 lustre-release]# dnf install ?*rpm
Last metadata expiration check: 0:12:59 ago on Mon 25 Sep 2023 09:29:52 UTC.
Error:
 Problem 1: conflicting requests
  - nothing provides ldiskfsprogs >= 1.44.3.wc1 needed by kmod-lustre-osd-ldiskfs-2.15.58_42_ga54a206-1.el8.x86_64
 Problem 2: conflicting requests
  - nothing provides ldiskfsprogs > 1.45.6 needed by lustre-osd-ldiskfs-mount-2.15.58_42_ga54a206-1.el8.x86_64
 Problem 3: package lustre-2.15.58_42_ga54a206-1.el8.x86_64 requires lustre-osd-mount, but none of the providers can be installed
  - conflicting requests
  - nothing provides ldiskfsprogs > 1.45.6 needed by lustre-osd-ldiskfs-mount-2.15.58_42_ga54a206-1.el8.x86_64
 Problem 4: package lustre-devel-2.15.58_42_ga54a206-1.el8.x86_64 requires liblnetconfig.so.4()(64bit), but none of the providers can be installed
  - package lustre-devel-2.15.58_42_ga54a206-1.el8.x86_64 requires liblustreapi.so.1()(64bit), but none of the providers can be installed
  - package lustre-devel-2.15.58_42_ga54a206-1.el8.x86_64 requires lustre = 2.15.58_42_ga54a206, but none of the providers can be installed
  - package lustre-2.15.58_42_ga54a206-1.el8.x86_64 requires lustre-osd-mount, but none of the providers can be installed
  - conflicting requests
  - nothing provides ldiskfsprogs > 1.45.6 needed by lustre-osd-ldiskfs-mount-2.15.58_42_ga54a206-1.el8.x86_64
 Problem 5: package lustre-resource-agents-2.15.58_42_ga54a206-1.el8.x86_64 requires lustre, but none of the providers can be installed
  - package lustre-2.15.58_42_ga54a206-1.el8.x86_64 requires lustre-osd-mount, but none of the providers can be installed
  - conflicting requests
  - nothing provides ldiskfsprogs > 1.45.6 needed by lustre-osd-ldiskfs-mount-2.15.58_42_ga54a206-1.el8.x86_64
 Problem 6: package lustre-tests-2.15.58_42_ga54a206-1.el8.x86_64 requires liblnetconfig.so.4()(64bit), but none of the providers can be installed
  - package lustre-tests-2.15.58_42_ga54a206-1.el8.x86_64 requires liblustreapi.so.1()(64bit), but none of the providers can be installed
  - package lustre-tests-2.15.58_42_ga54a206-1.el8.x86_64 requires lustre = 2.15.58_42_ga54a206, but none of the providers can be installed
  - package lustre-2.15.58_42_ga54a206-1.el8.x86_64 requires lustre-osd-mount, but none of the providers can be installed
  - conflicting requests
  - nothing provides ldiskfsprogs > 1.45.6 needed by lustre-osd-ldiskfs-mount-2.15.58_42_ga54a206-1.el8.x86_64

Clearly there is something I haven't done yet, but what am I doing wrong?
/jan
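The 'nothing provides ldiskfsprogs' errors usually mean that Whamcloud's patched e2fsprogs, which declares the 'ldiskfsprogs' capability, is not installed. A possible way to satisfy the dependency - the repository path is from memory, so verify it under downloads.whamcloud.com before using it:

  [root@rocky8 ~]# dnf config-manager --add-repo https://downloads.whamcloud.com/public/e2fsprogs/latest/el8/
  [root@rocky8 ~]# dnf install --nogpgcheck e2fsprogs e2fsprogs-libs
  [root@rocky8 ~]# dnf install ./kmod-lustre-osd-ldiskfs-*.rpm ./lustre-osd-ldiskfs-mount-*.rpm ./lustre-2.15*.rpm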
[lustre-discuss] 2.15 install failure
Hi all,
returning to my Lustre installations, the curious failures continue...
- Download of 2.15.3 for el8.8 from Whamcloud
- Installation of a server with Rocky 8.8 (I mean, why not, while it still exists...)
- Want an ldiskfs server, so
> dnf install lustre lustre-osd-ldiskfs-mount lustre-ldiskfs-dkms
--> Fails because the full ext4 source is not present.
I wonder whether I got the workaround from this mailing list, but it should really be in some official documentation or better not necessary at all:
- Rocky 8.8 installs with kernel 4.18.0-477.15.1, so download 'kernel-4.18.0-477.15.1.el8_8.src.rpm'
> rpm -i ./kernel-4.18.0-477.15.1.el8_8.src.rpm
> tar xJf rpmbuild/SOURCES/linux-4.18.0-477.15.1.el8_8.tar.xz
> cp -a linux-4.18.0-477.15.1.el8_8/fs/ext4/* /usr/src/kernels/4.18.0-477.15.1.el8_8.x86_64/fs/ext4/
Of course, at this stage, 'lustre-ldiskfs-dkms' is already installed, so
> dnf reinstall lustre-ldiskfs-dkms
This plainly prints out that dkms is successfully installing / compiling all the modules, then prints
> Running scriptlet: lustre-ldiskfs-dkms-2.15.3-1.el8.noarch    2/2
> Deprecated feature: REMAKE_INITRD (/var/lib/dkms/lustre-ldiskfs/2.15.3/source/dkms.conf)
> Deprecated feature: REMAKE_INITRD (/var/lib/dkms/lustre-ldiskfs/2.15.3/source/dkms.conf)
> Module lustre-ldiskfs-2.15.3 for kernel 4.18.0-477.15.1.el8_8.x86_64 (x86_64).
> Before uninstall, this module version was ACTIVE on this kernel.
> Removing any linked weak-modules
and then uninstalls all the modules. Even /var/lib/dkms/lustre-ldiskfs gets removed, so this machine is clean and pristine, just that dnf/rpm believe that lustre-ldiskfs-dkms is already installed. ;-)
(These messages printed between creation and destruction do not really indicate any kind of trouble, do they?)
Well, we all know we are dealing with computers and not with deterministic machines, so
> dnf remove lustre lustre-ldiskfs-dkms lustre-osd-ldiskfs-mount
and
> dnf install lustre-ldiskfs-dkms
(Drum roll...) Lustre modules get compiled, installed _and_ _not_ removed. ('modprobe lustre' works, 'dnf install lustre lustre-osd-ldiskfs-mount' does not create new havoc.)
I'm flabbergasted and really have no idea how I misconfigured a simple, minimal el8.8 installation into this kind of behavior.
Cheers
Thomas
Re: [lustre-discuss] Configuring LustreFS Over DRBD
Hi Shambhu,
I also think active-active is not possible here - two NIDs for the same target? - but we have been running our MDTs on top of DRBD, which works quite well. The last time I compared this setup against a mirror of storage targets, DRBD was actually a bit faster. And it might improve if you use protocol B or A instead of C.
Regards, Thomas

On 3/15/23 12:29, Shambhu Raje via lustre-discuss wrote:
I am trying to configure a clustered file system over DRBD software so that if we mount a file system just like LustreFS over a DRBD set-up in dual-primary mode, it can provide us with real-time replication of data. Can I configure a lustre file system over DRBD in RedHat 8.7? If yes, how can it be configured?
Waiting for your supporting response.
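A rough sketch of an MDT on DRBD in the classic active/passive style described above (not dual-primary); host names, devices and addresses are hypothetical:

  mds1 # cat /etc/drbd.d/mdt0.res
  resource mdt0 {
    protocol C;
    on mds1 {
      device    /dev/drbd0;
      disk      /dev/vg_mdt/mdt0;
      address   192.168.10.1:7789;
      meta-disk internal;
    }
    on mds2 {
      device    /dev/drbd0;
      disk      /dev/vg_mdt/mdt0;
      address   192.168.10.2:7789;
      meta-disk internal;
    }
  }

  mds1 # drbdadm create-md mdt0 ; drbdadm up mdt0      (same on mds2)
  mds1 # drbdadm primary --force mdt0
  mds1 # mkfs.lustre --mdt --mgs --fsname=fsname --index=0 /dev/drbd0
  mds1 # mount -t lustre /dev/drbd0 /lustre/mdt0

The target is mounted only on the current DRBD primary; on failover the other node is promoted and mounts it.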
Re: [lustre-discuss] Quota issue after OST removal
Hi Daniel,
isn't this expected: on your lustrefs-OST0001, usage seems to have hit the limit (perhaps if you do 'lfs quota -g somegroup...', it will show you by how many bytes). If one part of the distributed quota is exceeded, Lustre should report that with the *, although the total across the file system is still below the limit.
Obviously your 'somegroup' is at the quota limit on all visible OSTs, so my guess is that it would be the same on the missing two OSTs.
So, either have some data removed or increase the limit.
Best regards
Thomas

On 26.10.22 16:52, Daniel Szkola via lustre-discuss wrote:
Hello all,
We recently removed an OSS/OST node that was spontaneously shutting down so hardware testing could be performed. I have no idea how long it will be out, so I followed the procedure for permanent removal. Since then space usage is being calculated correctly, but 'lfs quota' will show groups as exceeding quota, despite being under both soft and hard limits. A verbose listing shows that all OST limits are met and I have no idea how to reset the limits now that the two OSTs on the removed OSS node are not part of the equation. Due to the heavy usage of the Lustre filesystem, no clients have been unmounted and no MDS or OST nodes have been restarted. The underlying filesystem is ZFS. Looking for ideas on how to correct this.
Example:
# lfs quota -gh somegroup -v /lustre1
Disk quotas for grp somegroup (gid ):
     Filesystem    used   quota   limit        grace    files    quota    limit  grace
       /lustre1  21.59T*    27T     30T  6d23h39m15s  2250592  2621440  3145728      -
  lustrefs-MDT_UUID      1.961G  -  1.962G  -  2250592  -  2359296  -
  lustrefs-OST_UUID      2.876T  -  2.876T  -        -  -        -  -
  lustrefs-OST0001_UUID  2.611T* -  2.611T  -        -  -        -  -
  lustrefs-OST0002_UUID  4.794T  -  4.794T  -        -  -        -  -
  lustrefs-OST0003_UUID  4.587T  -  4.587T  -        -  -        -  -
  quotactl ost4 failed.
  quotactl ost5 failed.
  lustrefs-OST0006_UUID  3.21T   -  3.21T   -        -  -        -  -
  lustrefs-OST0007_UUID  3.515T  -  3.515T  -        -  -        -  -
Total allocated inode limit: 2359296, total allocated block limit: 21.59T
Some errors happened when getting quota info. Some devices may be not working or deactivated. The data in "[]" is inaccurate.
--
Dan Szkola
FNAL
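If raising the limit is the way out, a group quota is adjusted with 'lfs setquota'; the numbers below are only an example, not a recommendation:

  # lfs setquota -g somegroup -b 27T -B 32T -i 2621440 -I 3145728 /lustre1
  # lfs quota -gh somegroup /lustre1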
[lustre-discuss] OST replacement procedure
Hi all,
about the correct procedure to replace an OST: I read the recent issues reported here by Robert Redl, the LU-15000 by Stephane and in particular his talk at LAD22.
Why is it important to _not_ reuse old OST indices? Understandable if you want to remove the OST, but not if you replace it.
In the past - I think in Lustre 1.8 - when there was no "mkfs.lustre --replace" available, over time we ended up with a long list of OSTs continually 'lctl --deactivate'd on all clients, very ugly. And we were so happy when explicit indices and '--replace' were introduced, in particular because I was terribly afraid of creating holes in the list of active OSTs ('holes' might have been a No-No in some past version?).
Nowadays, everybody wants to avoid old OST indices - with 'lctl --del-ost' a specific command for doing that comfortably is being developed. Why?
Best regards, Thomas
Re: [lustre-discuss] missing option mgsnode
You could look at what the device believes it's formatted with by
> tunefs.lustre --dryrun /dev/mapper/mpathd
When I do that here, I get something like
   checking for existing Lustre data: found
   Read previous values:
   Target:            idril-OST000e
   Index:             14
   Lustre FS:         idril
   Mount type:        zfs
   Flags:             0x2
                      (OST )
   Persistent mount opts:
   Parameters:        mgsnode=10.20.6.64@o2ib4:10.20.6.69@o2ib4
   ...
Tells you about 'mount type' and 'mgsnode'.
Regards
Thomas

On 20/07/2022 19.48, Paul Edmon via lustre-discuss wrote:
We have a filesystem that we have running Lustre 2.10.4 in HA mode using IML. One of our OST's had some disk failures and after reconstruction of the RAID set it won't remount but gives:
[root@holylfs02oss06 ~]# mount -t lustre /dev/mapper/mpathd /mnt/holylfs2-OST001f
Failed to initialize ZFS library: 256
mount.lustre: missing option mgsnode=
The weird thing is that we didn't build this with ZFS, the devices are all ldiskfs. We suspect some of the data is corrupt on the disk but we were wondering if anyone had seen this error before and if there was a solution.
-Paul Edmon-
Re: [lustre-discuss] lustre 2.15 installation on Centos 8.5.2111
In addition, you might need to use the "--nodeps" to install your self-compiled packages, cf. https://jira.whamcloud.com/browse/LU-15976 Cheers Thomas On 7/12/22 19:02, Jesse Stroik via lustre-discuss wrote: Hi Fran, I suspect the issue is a missing kmod-zfs RPM which provides those symbols. It might be the case that it was inadvertently excluded from the whamcloud repo. You could build your own RPMs on a centos system with the group 'Development Tools' installed. I'd recommend doing it as an unprivileged user and however you setup your normal build environment. Start by installing the kernel RPMs they provide, boot into that new kernel, verify that is the newest kernel you have installed, then build zfs & lustre from source. Here is an example I tested on a rocky 8.5 system, but it'll probably work similarly on a centos 8.5 system. $ git clone https://github.com/zfsonlinux/zfs.git $ cd zfs $ git checkout zfs-2.0.7 $ sh autogen.sh $ ./configure --with-spec=redhat $ make rpms At that point, you should have a set of ZFS RPMs built. Install them: $ dnf localinstall kmod-zfs-2.0.7-1.el8.x86_64.rpm kmod-zfs-devel-2.0.7-1.el8.x86_64.rpm libnvpair3-2.0.7-1.el8.x86_64.rpm libuutil3-2.0.7-1.el8.x86_64.rpm libzfs4-2.0.7-1.el8.x86_64.rpm libzfs4-devel-2.0.7-1.el8.x86_64.rpm libzpool4-2.0.7-1.el8.x86_64.rpm zfs-2.0.7-1.el8.x86_64.rpm At this point if you've built for the correct kernel, the zfs module should be loadable. Then fetch and build lustre 2.15.0. This worked for me: $ git clone git://git.whamcloud.com/fs/lustre-release.git $ cd lustre-release $ git checkout v2_15_0 $ sh autogen.sh $ ## in this example it will build with zfs support only $ ./configure --enable-server --disable-ldiskfs $ make rpms If everything succeeds, install the RPMs. I believe this would be the minimum set you might need: $ dnf localinstall kmod-lustre-2.15.0-1.el8.x86_64.rpm kmod-lustre-osd-zfs-2.15.0-1.el8.x86_64.rpm lustre-2.15.0-1.el8.x86_64.rpm lustre-osd-zfs-mount-2.15.0-1.el8.x86_64.rpm Adjust as needed. I hope you find this useful. Best, Jesse From: lustre-discuss on behalf of Bedosti Francesco Sent: Monday, July 11, 2022 11:17 AM To: lustre-discuss@lists.lustre.org Subject: [lustre-discuss] lustre 2.15 installation on Centos 8.5.2111 Hi i'm installing lustre 2.15 with ZFS backend from repository https://downloads.whamcloud.com/public/lustre/lustre-2.15.0/el8.5/server/ on a Centos 8.5 I added zfs from lustre repository without problems, but when i try to install lustre it gives me this error: yum install lustre Last metadata expiration check: 0:07:45 ago on Mon Jul 11 18:05:51 2022. 
Error: Problem: package lustre-2.15.0-1.el8.x86_64 requires lustre-osd-mount, but none of the providers can be installed - package lustre-osd-zfs-mount-2.15.0-1.el8.x86_64 requires kmod-lustre-osd-zfs, but none of the providers can be installed - conflicting requests - nothing provides ksym(__cv_broadcast) = 0x03cebd8a needed by kmod-lustre-osd-zfs-2.15.0-1.el8.x86_64 - nothing provides ksym(arc_add_prune_callback) = 0x1363912f needed by kmod-lustre-osd-zfs-2.15.0-1.el8.x86_64 - nothing provides ksym(arc_buf_size) = 0x115a75cf needed by kmod-lustre-osd-zfs-2.15.0-1.el8.x86_64 - nothing provides ksym(arc_remove_prune_callback) = 0x1ab2d851 needed by kmod-lustre-osd-zfs-2.15.0-1.el8.x86_64 - nothing provides ksym(dbuf_create_bonus) = 0x7beafc97 needed by kmod-lustre-osd-zfs-2.15.0-1.el8.x86_64 - nothing provides ksym(dbuf_read) = 0xa12ed106 needed by kmod-lustre-osd-zfs-2.15.0-1.el8.x86_64 - nothing provides ksym(dmu_assign_arcbuf_by_dbuf) = 0x26d78f55 needed by kmod-lustre-osd-zfs-2.15.0-1.el8.x86_64 - nothing provides ksym(dmu_bonus_hold) = 0x8d7deb8a needed by kmod-lustre-osd-zfs-2.15.0-1.el8.x86_64 - nothing provides ksym(dmu_buf_hold_array_by_bonus) = 0x878059c5 needed by kmod-lustre-osd-zfs-2.15.0-1.el8.x86_64 - nothing provides ksym(dmu_buf_rele) = 0x9205359f needed by kmod-lustre-osd-zfs-2.15.0-1.el8.x86_64 - nothing provides ksym(dmu_buf_rele_array) = 0x33363fa0 needed by kmod-lustre-osd-zfs-2.15.0-1.el8.x86_64 - nothing provides ksym(dmu_free_long_range) = 0x329676ab needed by kmod-lustre-osd-zfs-2.15.0-1.el8.x86_64 - nothing provides ksym(dmu_free_range) = 0x356042ae needed by kmod-lustre-osd-zfs-2.15.0-1.el8.x86_64 - nothing provides ksym(dmu_object_alloc_dnsize) = 0x72ae6b8e needed by kmod-lustre-osd-zfs-2.15.0-1.el8.x86_64 - nothing provides ksym(dmu_object_free) = 0x01514575 needed by kmod-lustre-osd-zfs-2.15.0-1.el8.x86_64 - nothing provides ksym(dmu_object_next) = 0x989708be needed by kmod-lustre-osd-zfs-2.15.0-1.el8.x86_64 - nothing provides ksym(dmu_object_set_blocksize) = 0xdbdf5ea0 needed by kmod-lustre-osd-zfs-2.15.0-1.el8.x86_64 - nothing provides ksym(dmu_objset_disown) = 0xac0fbfc0 needed by kmod-lustre-osd-zfs-2.15.0-1.el8.x86_64 - nothin
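The 'nothing provides ksym(...)' errors mean the installed kmod-zfs does not export the exact symbol versions kmod-lustre-osd-zfs was built against. A rough way to check, plus the '--nodeps' install mentioned above (package names are examples only, a sketch):
   $ rpm -q --provides kmod-zfs | grep -c 'ksym('            # how many ZFS ksyms the installed kmod exports
   $ dnf repoquery --whatprovides 'ksym(__cv_broadcast)'     # which package, if any, provides this symbol
   # rpm -ivh --nodeps kmod-lustre-osd-zfs-2.15.0-1.el8.x86_64.rpm   # last resort, skips the ksym check (cf. LU-15976)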
Re: [lustre-discuss] How to speed up Lustre
Yes, I got it. But Marion states that they switched > to a PFL arrangement, where the first 64k lives on flash OST's (mounted on our metadata servers), and the remainder of larger files lives on HDD OST's. So, how do you specify a particular OSTs (or group of OSTs) in a PFL? The OST-equivalent of the "-L mdt" part ? With SSDs and HDDs making up the OSTs, I would have guessed OST pools, but I'm only aware of a "lfs setstripe" that puts all of my file into a pool. How to put the first few kB of a file in pool A and the rest in pool B ? Cheers Thomas On 7/6/22 21:42, Andreas Dilger wrote: Thomas, where the file data is stored depends entirely on the PFL layout used for the filesystem or parent directory. For DoM files, you need to specify a DoM component, like: lfs setstripe -E 64K -L mdt -E 1G -c 1 -E 16G -c 4 -E eof -c 32 so the first 64KB will be put onto the MDT where the file is created, the remaining 1GB onto a single OST, the next 15GB striped across 4 OSTs, and the rest of the file striped across (up to) 32 OSTs. 64KB is the minimum DoM component size, but if the files are smaller (e.g. 3KB) they will only allocate space on the MDT in multiples of 4KB blocks. However, the default ldiskfs MDT formatting only leaves about 1 KB of space per inode, which would quickly run out unless DoM is restricted to specific directories with small files, or if the MDT is formatted with enough free space to accommodate this usage. This is less of an issue with ZFS MDTs, but DoM files will still consume space much more quickly and reduce the available inode count by a factor of 16-64 more quickly than without DoM. It is strongly recommended to use Lustre 2.15 with DoM to benefit from the automatic MDT space balancing, otherwise the MDT usage may become imbalanced if the admin (or users) do not actively manage the MDT selection for new user/project/job directories with "lfs mkdir -i". Cheers, Andreas On Jul 6, 2022, at 10:48, Thomas Roth via lustre-discuss mailto:lustre-discuss@lists.lustre.org>> wrote: Hi Marion, I do not fully understand how to "mount flash OSTs on a metadata server" - You have a couple of SSDs, you assemble these into on block device and format it with "mkfs.lustre --ost ..." ? And then mount it just as any other OST? - PFL then puts the first 64k on these OSTs and the rest of all files on the HDD-based OSTs? So, no magic on the MDS? I'm asking because we are considering something similar, but we would not have these flash-OSTs in the MDS-hardware but on separate OSS servers. Regards, Thomas On 23/02/2022 04.35, Marion Hakanson via lustre-discuss wrote: Hi again, kara...@aselsan.com.tr<mailto:kara...@aselsan.com.tr> said: I was thinking that DoM is built in feature and it can be enabled/disabled online for a certain directories. What do you mean by reformat to converting to DoM (or away from it). I think just Metadata target size is important. When we first turned on DoM, it's likely that our Lustre system was old enough to need to be reformatted in order to support it. Our flash storage RAID configuration also needed to be expanded, but the system was not yet in production so a reformat was no big deal at the time. So perhaps your system will not be subject to this requirement (other than expanding your MDT flash somehow). kara...@aselsan.com.tr<mailto:kara...@aselsan.com.tr> said: I also thought creating flash OST on metadata server. But I was not sure what to install on metadata server for this purpose. Can Metadata server be an OSS server at the same time? 
If it is possible I would prefer flash OST on Metadata server instead of DoM. Because Our metadata target size is small, it seems I have to do risky operations to expand size. Yes, our metadata servers are also OSS's at the same time. The flash OST's are separate volumes (and drives) from the MDT's, so less scary (:-). kara...@aselsan.com.tr<mailto:kara...@aselsan.com.tr> said: imho, because of the less RPC traffic DoM shows more performance than flash OST. Am I right? The documentation does say there that using DoM for small files will produce less RPC traffic than using OST's for small files. But as I said earlier, for us, the amount of flash needed to support DoM was a lot higher than with the flash OST approach (we have a high percentage, by number, of small files). I'll also note that we had a wish to mostly "set and forget" the layout for our Lustre filesystem. We have not figured out a way to predict or control where small files (or large ones) are going to end up, so trying to craft optimal layouts in particular directories for particular file sizes has turned out to not be feasible for us. PFL has been a win for us here, for that reason. Our conclusion was that in order to take advantage of the performance improvements of DoM, you need enough mon
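For the record: as far as I understand, each PFL component can carry its own OST pool, so a layout along these lines should put the first 64k into a flash pool and the remainder onto HDDs. Only a sketch; the pool names 'flash' and 'hdd' are invented, and the syntax should be checked against your Lustre version:
   $ lfs setstripe -E 64K -c 1 -p flash -E eof -c 4 -p hdd /lustre/somedir
   $ lfs getstripe /lustre/somedir        # shows the pool assigned to each component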
Re: [lustre-discuss] How to speed up Lustre
Hi Marion, I do not fully understand how to "mount flash OSTs on a metadata server" - You have a couple of SSDs, you assemble these into on block device and format it with "mkfs.lustre --ost ..." ? And then mount it just as any other OST? - PFL then puts the first 64k on these OSTs and the rest of all files on the HDD-based OSTs? So, no magic on the MDS? I'm asking because we are considering something similar, but we would not have these flash-OSTs in the MDS-hardware but on separate OSS servers. Regards, Thomas On 23/02/2022 04.35, Marion Hakanson via lustre-discuss wrote: Hi again, kara...@aselsan.com.tr said: I was thinking that DoM is built in feature and it can be enabled/disabled online for a certain directories. What do you mean by reformat to converting to DoM (or away from it). I think just Metadata target size is important. When we first turned on DoM, it's likely that our Lustre system was old enough to need to be reformatted in order to support it. Our flash storage RAID configuration also needed to be expanded, but the system was not yet in production so a reformat was no big deal at the time. So perhaps your system will not be subject to this requirement (other than expanding your MDT flash somehow). kara...@aselsan.com.tr said: I also thought creating flash OST on metadata server. But I was not sure what to install on metadata server for this purpose. Can Metadata server be an OSS server at the same time? If it is possible I would prefer flash OST on Metadata server instead of DoM. Because Our metadata target size is small, it seems I have to do risky operations to expand size. Yes, our metadata servers are also OSS's at the same time. The flash OST's are separate volumes (and drives) from the MDT's, so less scary (:-). kara...@aselsan.com.tr said: imho, because of the less RPC traffic DoM shows more performance than flash OST. Am I right? The documentation does say there that using DoM for small files will produce less RPC traffic than using OST's for small files. But as I said earlier, for us, the amount of flash needed to support DoM was a lot higher than with the flash OST approach (we have a high percentage, by number, of small files). I'll also note that we had a wish to mostly "set and forget" the layout for our Lustre filesystem. We have not figured out a way to predict or control where small files (or large ones) are going to end up, so trying to craft optimal layouts in particular directories for particular file sizes has turned out to not be feasible for us. PFL has been a win for us here, for that reason. Our conclusion was that in order to take advantage of the performance improvements of DoM, you need enough money for lots of flash, or you need enough staff time to manage the DoM layouts to fit into that flash. We have neither of those conditions, and we find that using PFL and flash OST's for small files is working very well for us. Regards, Marion From: =?utf-8?B?VGFuZXIgS0FSQUfDlkw=?= To: Marion Hakanson CC: "lustre-discuss@lists.lustre.org" Date: Tue, 22 Feb 2022 04:53:03 + UNCLASSIFIED Thank you for sharing your experience. I was thinking that DoM is built in feature and it can be enabled/disabled online for a certain directories. What do you mean by reformat to converting to DoM (or away from it). I think just Metadata target size is important. I also thought creating flash OST on metadata server. But I was not sure what to install on metadata server for this purpose. Can Metadata server be an OSS server at the same time? 
If it is possible I would prefer flash OST on Metadata server instead of DoM. Because Our metadata target size is small, it seems I have to do risky operations to expand size. imho, because of the less RPC traffic DoM shows more performance than flash OST. Am I right? Best Regards; From: Marion Hakanson Sent: Thursday, February 17, 2022 8:20 PM To: Taner KARAGÖL Cc: lustre-discuss@lists.lustre.org Subject: Re: [lustre-discuss] How to speed up Lustre We started with DoM on our new Lustre system a couple years ago. - Converting to DoM (or away from it) is a full-reformat operation. - DoM uses a fixed amount of metadata space (64k minimum for us) for every file, even those smaller than 64k. Basically, DoM uses a lot of flash metadata space, more than we planned for, and more than we could afford. We ended up switching to a PFL arrangement, where the first 64k lives on flash OST's (mounted on our metadata servers), and the remainder of larger files lives on HDD OST's. This is working very well for our small-file workloads, and uses less flash space than the DoM configuration did. Since you don't already have DoM in effect, it may be possible that you could add flash OST's, configure a PFL, and then use "lfs migrate" to re-layout existing files into the new OST's. Your mileage may vary, so be safe! Regards, Marion On Feb 14, 2022, at 03:32, Taner KARAGÖL via lustre-disc
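The pools referenced in such a layout are defined on the MGS with lctl. A minimal sketch with made-up filesystem, pool and OST names:
   mgs # lctl pool_new lustre.flash
   mgs # lctl pool_add lustre.flash lustre-OST000a lustre-OST000b
   mgs # lctl pool_list lustre.flash      # confirm the members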
Re: [lustre-discuss] Installing 2.15 on rhel 8.5 fails
After making sure that this can be reproduced when starting with the git repo, I have created LU-15972 for this issue. There is an unrelated issue with the ZFS variant, which however also fits the subject of this mail and which I described in detail in LU-15976. Best regards Thomas On 24/06/2022 16.24, Thomas Roth via lustre-discuss wrote: Since it seems I have now managed to create the modules, I'd like to record that here:
1. Install system with AlmaLinux 8.5 -> kernel 4.18.0-348.23.1
2. Install packages from Whamcloud (lustre-2.15.0/el8.5.2111/server/): lustre, kmod-lustre, kmod-lustre-osd-ldiskfs -> fails due to the discussed 'unknown symbols', cf. https://jira.whamcloud.com/browse/LU-15962
3. Install the corresponding dkms packages -> fails, reason not clear
4. Go to the remnant /var/lib/dkms/lustre-ldiskfs/2.15.0/build, run configure with '--with-o2ib=/usr/src/kernels/4.18.0-348.23.1.el8_5.x86_64' (that's where the kernel-devel + the extfs sources ended up in this case)
5. 'make rpms' now fails with
> make[4]: Entering directory '/var/lib/dkms/lustre-ldiskfs/2.15.0/build/lustre/utils'
> ...
> In file included from liblustreapi.c:83:
> lstddef.h:306:22: error: static declaration of ‘copy_file_range’ follows non-static declaration
> │static inline loff_t copy_file_range(int fd_in, loff_t *off_in, int fd_out,
This was already reported last year, e.g. https://www.mail-archive.com/lustre-discuss@lists.lustre.org/msg16822.html The workaround is also given there: the configure command line has '--disable-utils' (by default); somehow this still makes make go to 'utils' and fail.
6. Repeat 'configure' with the '--with-o2ib=/usr/src/kernels...' and without '--disable-utils'
7. 'make rpms' yields some installable kmod packages, and the contained modules can be loaded (I haven't set up the file system yet).
Cheers, Thomas PS: My rather spartan Alma installation needed in addition 'dnf install'
> kernel-headers-4.18.0-348.23.1.el8_5.x86_64 kernel-devel-4.18.0-348.23.1.el8_5.x86_64 dkms
> kernel-4.18.0-348.23.1.el8_5.src
> e2fsprogs-devel rpm-build kernel-rpm-macros kernel-abi-whitelists libselinux-devel libtool
On 6/22/22 21:08, Jian Yu wrote: Hi Thomas, The issue is being fixed in https://jira.whamcloud.com/browse/LU-15962. A workaround is to build Lustre with the "--with-o2ib=" configure option; the path given there is where the in-kernel Module.symvers is located. -- Best regards, Jian Yu -Original Message- From: lustre-discuss on behalf of Thomas Roth via lustre-discuss Reply-To: Thomas Roth Date: Wednesday, June 22, 2022 at 10:32 AM To: Andreas Dilger Cc: lustre-discuss Subject: Re: [lustre-discuss] Installing 2.15 on rhel 8.5 fails Hmm, but we are using the in-kernel OFED, so this makes these messages all the more mysterious. Regards, Thomas On 22/06/2022 19.12, Andreas Dilger wrote: > On Jun 22, 2022, at 10:40, Thomas Roth via lustre-discuss wrote: > > my rhel8 system is actually an Alma Linux 8.5 installation; this is the first time the compatibility with an alleged rhel8.5 software fails... > > > The system is running kernel '4.18.0-348.2.1.el8_5' > This version string can also be found in the package names in > https://downloads.whamcloud.com/public/lustre/lustre-2.15.0/el8.5.2111/server/RPMS/x86_64 > - this is usually a good sign.
> > However, installation of kmod-lustre-2.15.0-1.el8 yields the well known "depmod: WARNINGs", like >> /lib/modules/4.18.0-348.2.1.el8_lustre.x86_64/extra/lustre/net/ko2iblnd.ko needs unknown symbol __ib_alloc_pd > > > The kernel from downloads.whamcloud.com/public/lustre/lustre-2.15.0/el8.5.2111/server/RPMS/x86_64 identifies itself as "CentOS" and does not want to boot - no option either. > > > Any hints how to proceed? > > The ko2iblnd module is built against the in-kernel OFED, so if you are using MOFED you will need to rebuild the kernel modules themselves. If you don't use IB at all you can ignore these depmod messages. > > Cheers, Andreas > -- > Andreas Dilger > Lustre Principal Architect > Whamcloud > -- Thomas Roth Department: Informationstechnologie GSI Helmholtzzentrum für Schwerionenforschung GmbH Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528 Managing Directors / Geschäftsführung: Professor Dr. Paolo Giubellino, Dr. Ulrich Breuer, Jörg Blaurock Chairman of
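Condensed into commands, the sequence from the steps above looks roughly like this; the paths match the AlmaLinux 8.5 example, so adjust them to your kernel and treat this as a sketch rather than a verified recipe:
   $ cd /var/lib/dkms/lustre-ldiskfs/2.15.0/build
   $ ./configure --with-o2ib=/usr/src/kernels/4.18.0-348.23.1.el8_5.x86_64
   $ make rpms
   # dnf localinstall kmod-lustre-*.rpm lustre-*.rpm          # install the resulting packages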
Re: [lustre-discuss] Installing 2.15 on rhel 8.5 fails
Since it seems I have now managed to create the modules, I'd like to record that here:
1. Install system with AlmaLinux 8.5 -> kernel 4.18.0-348.23.1
2. Install packages from Whamcloud (lustre-2.15.0/el8.5.2111/server/): lustre, kmod-lustre, kmod-lustre-osd-ldiskfs -> fails due to the discussed 'unknown symbols', cf. https://jira.whamcloud.com/browse/LU-15962
3. Install the corresponding dkms packages -> fails, reason not clear
4. Go to the remnant /var/lib/dkms/lustre-ldiskfs/2.15.0/build, run configure with '--with-o2ib=/usr/src/kernels/4.18.0-348.23.1.el8_5.x86_64' (that's where the kernel-devel + the extfs sources ended up in this case)
5. 'make rpms' now fails with
> make[4]: Entering directory '/var/lib/dkms/lustre-ldiskfs/2.15.0/build/lustre/utils'
> ...
> In file included from liblustreapi.c:83:
> lstddef.h:306:22: error: static declaration of ‘copy_file_range’ follows non-static declaration
> │static inline loff_t copy_file_range(int fd_in, loff_t *off_in, int fd_out,
This was already reported last year, e.g. https://www.mail-archive.com/lustre-discuss@lists.lustre.org/msg16822.html The workaround is also given there: the configure command line has '--disable-utils' (by default); somehow this still makes make go to 'utils' and fail.
6. Repeat 'configure' with the '--with-o2ib=/usr/src/kernels...' and without '--disable-utils'
7. 'make rpms' yields some installable kmod packages, and the contained modules can be loaded (I haven't set up the file system yet).
Cheers, Thomas PS: My rather spartan Alma installation needed in addition 'dnf install'
> kernel-headers-4.18.0-348.23.1.el8_5.x86_64 kernel-devel-4.18.0-348.23.1.el8_5.x86_64 dkms
> kernel-4.18.0-348.23.1.el8_5.src
> e2fsprogs-devel rpm-build kernel-rpm-macros kernel-abi-whitelists libselinux-devel libtool
On 6/22/22 21:08, Jian Yu wrote: Hi Thomas, The issue is being fixed in https://jira.whamcloud.com/browse/LU-15962. A workaround is to build Lustre with the "--with-o2ib=" configure option; the path given there is where the in-kernel Module.symvers is located. -- Best regards, Jian Yu -Original Message- From: lustre-discuss on behalf of Thomas Roth via lustre-discuss Reply-To: Thomas Roth Date: Wednesday, June 22, 2022 at 10:32 AM To: Andreas Dilger Cc: lustre-discuss Subject: Re: [lustre-discuss] Installing 2.15 on rhel 8.5 fails Hmm, but we are using the in-kernel OFED, so this makes these messages all the more mysterious. Regards, Thomas On 22/06/2022 19.12, Andreas Dilger wrote: > On Jun 22, 2022, at 10:40, Thomas Roth via lustre-discuss wrote: > > my rhel8 system is actually an Alma Linux 8.5 installation; this is the first time the compatibility with an alleged rhel8.5 software fails... > > > The system is running kernel '4.18.0-348.2.1.el8_5' > This version string can also be found in the package names in > https://downloads.whamcloud.com/public/lustre/lustre-2.15.0/el8.5.2111/server/RPMS/x86_64 > - this is usually a good sign. > > However, installation of kmod-lustre-2.15.0-1.el8 yields the well known "depmod: WARNINGs", like >> /lib/modules/4.18.0-348.2.1.el8_lustre.x86_64/extra/lustre/net/ko2iblnd.ko needs unknown symbol __ib_alloc_pd > > > The kernel from downloads.whamcloud.com/public/lustre/lustre-2.15.0/el8.5.2111/server/RPMS/x86_64 identifies itself as "CentOS" and does not want to boot, so that is not an option either. > > > Any hints how to proceed? > > The ko2iblnd module is built against the in-kernel OFED, so if you are using MOFED you will need to rebuild the kernel modules themselves.
If you don't use IB at all you can ignore these depmod messages. > > Cheers, Andreas > -- > Andreas Dilger > Lustre Principal Architect > Whamcloud > -- Thomas Roth Department: Informationstechnologie GSI Helmholtzzentrum für Schwerionenforschung GmbH Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528 Managing Directors / Geschäftsführung: Professor Dr. Paolo Giubellino, Dr. Ulrich Breuer, Jörg Blaurock Chairman of the Supervisory Board / Vorsitzender des GSI-Aufsichtsrats: State Secretary / Staatssekretär Dr. Volkmar Dietz ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
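After installing self-built kmod packages, a quick sanity check that the symbols resolve and the modules actually load (a sketch):
   # depmod -a
   # modprobe -v lustre                  # watch dmesg for 'unknown symbol' complaints
   # lctl get_param version              # reports the Lustre version once the modules are up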
Re: [lustre-discuss] Installing 2.15 on rhel 8.5 fails
Hmm, but we are using the in-kernel OFED, so this makes these messages all the more mysterious. Regards, Thomas On 22/06/2022 19.12, Andreas Dilger wrote: On Jun 22, 2022, at 10:40, Thomas Roth via lustre-discuss mailto:lustre-discuss@lists.lustre.org>> wrote: my rhel8 system is actually an Alma Linux 8.5 installation, this is the first time the compatiblity to an alleged rhel8.5 software fails... The system is running kernel '4.18.0-348.2.1.el8_5' This version string can also be found in the package names in https://downloads.whamcloud.com/public/lustre/lustre-2.15.0/el8.5.2111/server/RPMS/x86_64 - this is usually a good sign. However, installation of kmod-lustre-2.15.0-1.el8 yields the well known "depmod: WARNINGs", like /lib/modules/4.18.0-348.2.1.el8_lustre.x86_64/extra/lustre/net/ko2iblnd.ko needs unknown symbol __ib_alloc_pd The kernel from downloads.whamcloud.com/public/lustre/lustre-2.15.0/el8.5.2111/server/RPMS/x86_64 identifies itself as "CentOS" and does not want to boot - no option either. Any hints how to proceed? The ko2iblnd module is built against the in-kernel OFED, so if you are using MOFED you will need to rebuild the kernel modules themselves. If you don't use IB at all you can ignore these depmod messages. Cheers, Andreas -- Andreas Dilger Lustre Principal Architect Whamcloud -- Thomas Roth Department: Informationstechnologie Location: SB3 2.291 Phone: +49-6159-71 1453 Fax: +49-6159-71 2986 GSI Helmholtzzentrum für Schwerionenforschung GmbH Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528 Managing Directors / Geschäftsführung: Professor Dr. Paolo Giubellino, Dr. Ulrich Breuer, Jörg Blaurock Chairman of the Supervisory Board / Vorsitzender des GSI-Aufsichtsrats: State Secretary / Staatssekretär Dr. Volkmar Dietz ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
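To see whether a node really runs the in-kernel OFED or has MOFED installed (in which case ko2iblnd has to be rebuilt), something like the following helps; the grep patterns are only examples:
   $ modinfo ib_core | grep ^filename    # in-kernel OFED lives under .../kernel/drivers/infiniband/, MOFED typically under extra/ or updates/
   $ rpm -qa | grep -i -e mlnx -e ofed   # any vendor OFED packages present?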
[lustre-discuss] Installing 2.15 on rhel 8.5 fails
Hi all, my rhel8 system is actually an Alma Linux 8.5 installation; this is the first time the compatibility with an alleged rhel8.5 software fails... The system is running kernel '4.18.0-348.2.1.el8_5'. This version string can also be found in the package names in https://downloads.whamcloud.com/public/lustre/lustre-2.15.0/el8.5.2111/server/RPMS/x86_64 - this is usually a good sign. However, installation of kmod-lustre-2.15.0-1.el8 yields the well-known "depmod: WARNINGs", like
> /lib/modules/4.18.0-348.2.1.el8_lustre.x86_64/extra/lustre/net/ko2iblnd.ko needs unknown symbol __ib_alloc_pd
The kernel from downloads.whamcloud.com/public/lustre/lustre-2.15.0/el8.5.2111/server/RPMS/x86_64 identifies itself as "CentOS" and does not want to boot, so that is not an option either. Any hints how to proceed? Regards, Thomas -- Thomas Roth Department: Informationstechnologie Location: SB3 2.291 Phone: +49-6159-71 1453 Fax: +49-6159-71 2986 GSI Helmholtzzentrum für Schwerionenforschung GmbH Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528 Managing Directors / Geschäftsführung: Professor Dr. Paolo Giubellino, Dr. Ulrich Breuer, Jörg Blaurock Chairman of the Supervisory Board / Vorsitzender des GSI-Aufsichtsrats: State Secretary / Staatssekretär Dr. Volkmar Dietz ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
[lustre-discuss] Building 2.15 on rhel8 fails
Hi all, I tried to install 'lustre-ldiskfs-dkms' on a rhel8.5 system, running kernel Fails, /var/lib/dkms/lustre-ldiskfs/2.15.0/build/make.log says "No targets specified and no makefile found", and in the corresponding '/var/lib/dkms/lustre-ldiskfs/2.15.0/buildconfig.log' indeed the first real error seems to be > scripts/Makefile.build:45: /var/lib/dkms/lustre-ldiskfs/2.15.0/build/build//var/lib/dkms/lustre-ldiskfs/2.15.0/build/build/Makefile: No such file or directory > make[1]: *** No rule to make target '/var/lib/dkms/lustre-ldiskfs/2.15.0/build/build//var/lib/dkms/lustre-ldiskfs/2.15.0/build/build/Makefile'. Stop. This directory tree is a bit large :-) > '/var/lib/dkms/lustre-ldiskfs/2.15.0/build/build/Makefile' does exist, though. Where could this doubling of the path come from? Btw, how to re-run dkms, in case I'd edit some stuff there? Regards Thomas -- Thomas Roth Department: Informationstechnologie Location: SB3 2.291 Phone: +49-6159-71 1453 Fax: +49-6159-71 2986 GSI Helmholtzzentrum für Schwerionenforschung GmbH Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528 Managing Directors / Geschäftsführung: Professor Dr. Paolo Giubellino, Dr. Ulrich Breuer, Jörg Blaurock Chairman of the Supervisory Board / Vorsitzender des GSI-Aufsichtsrats: State Secretary / Staatssekretär Dr. Volkmar Dietz ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
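For re-running dkms after editing something in the build tree, the usual cycle is the one below; module name and version are those used by the Lustre dkms packages, and this is a sketch rather than tested instructions:
   # dkms remove lustre-ldiskfs/2.15.0 --all
   # dkms add lustre-ldiskfs/2.15.0                    # expects the sources under /usr/src/lustre-ldiskfs-2.15.0
   # dkms build lustre-ldiskfs/2.15.0 -k $(uname -r)
   # dkms install lustre-ldiskfs/2.15.0 -k $(uname -r)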
Re: [lustre-discuss] Interoperability 2.12.7 client <-> 2.12.8 server
Hi Hans-Henrik, might be this LU-15244 - I would never have guessed so from the LU - text ;-) But I can report that the same to clients can do the same operations as before without any problems if installed with CentOS 7.9 instead of rhel8.5, again one of them with 'kmod-lustre-client-2.12.8_6', the other one with 'lustre-client-dkms-2.12.7' Regards, Thomas On 07/03/2022 09.05, Hans Henrik Happe via lustre-discuss wrote: Hi Thomas, They should work together, but there are other requirements that need to be fulfilled: https://wiki.lustre.org/Lustre_2.12.8_Changelog I guess your servers are CentOS 7.9 as required for 2.12.8. I had an issue with Rocky 8.5 and the latest kernel with 2.12.8. While RHEL 8.5 is supported there was something new after 4.18.0-348.2.1.el8_5, which caused problems. I found an LU fixing it post 2.12.8 (can't remember the number), but downgrading to 4.18.0-348.2.1.el8_5 was the quick fix. Cheers, Hans Henrik On 03.03.2022 08.40, Thomas Roth via lustre-discuss wrote: Dear all, this might be just something I forgot or did not read thoroughly, but shouldn't a 2.12.7-client work with 2.12.8 - servers? The 2.12.8-changelog has the standard disclaimer Interoperability Support: Clients & Servers: Latest 2.10.X and Latest 2.11.X I have this test cluster that I upgraded recently to 2.12.8 on the servers. The fist client I attached now is a fresh install of rhel 8.5 (Alma). I installed 'kmod-lustre-client' and `lustre-client` from https://downloads.whamcloud.com/public/lustre/lustre-2.12.8/el8.5.2111/ I copied a directory containing ~5000 files - no visible issues The next client was also installed with rhel 8.5 (Alma), but now using 'lustre-client-2.12.7-1' and 'lustre-client-dkms-2.12.7-1' from https://downloads.whamcloud.com/public/lustre/lustre-2.12.7/el8/client/RPMS/x86_64/ As on my first client, I copied a directory containing ~5000 files. The copy stalled, and the OSTs exploded in my face kernel: LustreError: 23345:0:(events.c:310:request_in_callback()) event type 2, status -103, service ost_io kernel: LustreError: 40265:0:(pack_generic.c:605:__lustre_unpack_msg()) message length 0 too small for magic/version check kernel: LustreError: 40265:0:(sec.c:2217:sptlrpc_svc_unwrap_request()) error unpacking request from 12345-10.20.2.167@o2ib6 x1726208297906176 kernel: LustreError: 23345:0:(events.c:310:request_in_callback()) event type 2, status -103, service ost_io The latter message is repeated ad infinitum. The client log blames the network: Request sent has failed due to network error Connection to was lost; in progress operations using this service will wait for recovery to complete LustreError: 181316:0:(events.c:205:client_bulk_callback()) event type 1, status -103, desc86e248d6 LustreError: 181315:0:(events.c:205:client_bulk_callback()) event type 1, status -5, desc e569130f There is also a client running Debian 9 and Lustre 2.12.6 (compiled from git) - no trouble at all. The I switched those two rhel8.5-clients: reinstalled the OS, gave the first one the 2.12.7 -packages, the second on the 2.12.8 - and the error followed: again the client running with 'lustre-client-dkms-2.12.7-1' immedeately ran into trouble, causing the same error messages in the logs. So this is not a network problem in the sense of broken hardware etc. What did I miss? Some important Jira I did not read? 
Regards Thomas ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org -- Thomas Roth Department: Informationstechnologie Location: SB3 2.291 Phone: +49-6159-71 1453 Fax: +49-6159-71 2986 GSI Helmholtzzentrum für Schwerionenforschung GmbH Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528 Managing Directors / Geschäftsführung: Professor Dr. Paolo Giubellino, Dr. Ulrich Breuer, Jörg Blaurock Chairman of the Supervisory Board / Vorsitzender des GSI-Aufsichtsrats: State Secretary / Staatssekretär Dr. Volkmar Dietz ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
[lustre-discuss] Interoperability 2.12.7 client <-> 2.12.8 server
Dear all, this might be just something I forgot or did not read thoroughly, but shouldn't a 2.12.7 client work with 2.12.8 servers? The 2.12.8 changelog has the standard disclaimer "Interoperability Support: Clients & Servers: Latest 2.10.X and Latest 2.11.X". I have this test cluster that I upgraded recently to 2.12.8 on the servers. The first client I attached now is a fresh install of rhel 8.5 (Alma). I installed 'kmod-lustre-client' and `lustre-client` from https://downloads.whamcloud.com/public/lustre/lustre-2.12.8/el8.5.2111/ I copied a directory containing ~5000 files - no visible issues. The next client was also installed with rhel 8.5 (Alma), but now using 'lustre-client-2.12.7-1' and 'lustre-client-dkms-2.12.7-1' from https://downloads.whamcloud.com/public/lustre/lustre-2.12.7/el8/client/RPMS/x86_64/ As on my first client, I copied a directory containing ~5000 files. The copy stalled, and the OSTs exploded in my face:
kernel: LustreError: 23345:0:(events.c:310:request_in_callback()) event type 2, status -103, service ost_io
kernel: LustreError: 40265:0:(pack_generic.c:605:__lustre_unpack_msg()) message length 0 too small for magic/version check
kernel: LustreError: 40265:0:(sec.c:2217:sptlrpc_svc_unwrap_request()) error unpacking request from 12345-10.20.2.167@o2ib6 x1726208297906176
kernel: LustreError: 23345:0:(events.c:310:request_in_callback()) event type 2, status -103, service ost_io
The latter message is repeated ad infinitum. The client log blames the network:
Request sent has failed due to network error
Connection to was lost; in progress operations using this service will wait for recovery to complete
LustreError: 181316:0:(events.c:205:client_bulk_callback()) event type 1, status -103, desc86e248d6
LustreError: 181315:0:(events.c:205:client_bulk_callback()) event type 1, status -5, desc e569130f
There is also a client running Debian 9 and Lustre 2.12.6 (compiled from git) - no trouble at all. Then I switched those two rhel8.5 clients: reinstalled the OS, gave the first one the 2.12.7 packages, the second one the 2.12.8 packages - and the error followed: again the client running with 'lustre-client-dkms-2.12.7-1' immediately ran into trouble, causing the same error messages in the logs. So this is not a network problem in the sense of broken hardware etc. What did I miss? Some important Jira I did not read? Regards Thomas -- Thomas Roth Department: Informationstechnologie Location: SB3 2.291 Phone: +49-6159-71 1453 Fax: +49-6159-71 2986 GSI Helmholtzzentrum für Schwerionenforschung GmbH Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528 Managing Directors / Geschäftsführung: Professor Dr. Paolo Giubellino, Dr. Ulrich Breuer, Jörg Blaurock Chairman of the Supervisory Board / Vorsitzender des GSI-Aufsichtsrats: State Secretary / Staatssekretär Dr. Volkmar Dietz ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
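When mixing client and server releases it is worth recording exactly what each node runs. A quick check on any client or server (a sketch):
   $ lctl get_param version              # Lustre version of the loaded modules
   $ rpm -qa | grep -i lustre            # shows whether the kmod or the dkms variant is installed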
[lustre-discuss] OST mount with failover MDS
Hi all, I wonder if I am seeing signs of network problems when mounting an OST: tunefs.lustre --dryrun tells me (what I know from my own format command) >Parameters: mgsnode=10.20.3.0@o2ib5:10.20.3.1@o2ib5 These are the nids for our MGS+MDT0, there are two more pairs for MDT1 and MDT2. I went step-by-step, modprobing lnet and lustre, and checking LNET by 'lnet ping' to the active MDTs, which worked fine. However, mounting such an OST (e.g. after a crash) at first prints a number of > LNet: 19444:0:(o2iblnd_cb.c:3397:kiblnd_check_conns()) Timed out tx for 10.20.3.1@o2ib5: 0 seconds and similarly for the failover partners of the other two MDS. Should it do that? Imho, LNET to a failover node _must_ fail, because LNET should not be up on the failover node, right? If I started LNET there, and some client does not get an answer quickly enough from the acting MDS, it would try the failover, LNET yes but Lustre no - that doesn't sound right. Regards, Thomas -- Thomas Roth Department: Informationstechnologie Location: SB3 2.291 Phone: +49-6159-71 1453 Fax: +49-6159-71 2986 GSI Helmholtzzentrum für Schwerionenforschung GmbH Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528 Managing Directors / Geschäftsführung: Professor Dr. Paolo Giubellino, Dr. Ulrich Breuer, Jörg Blaurock Chairman of the Supervisory Board / Vorsitzender des GSI-Aufsichtsrats: State Secretary / Staatssekretär Dr. Volkmar Dietz ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
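The timeouts against the failover NIDs can be reproduced directly with an LNet ping; using the NIDs from above (a sketch):
   # lctl ping 10.20.3.0@o2ib5           # active MGS/MDT0, should answer
   # lctl ping 10.20.3.1@o2ib5           # passive failover partner, times out as long as LNet is not started there
which matches the kiblnd_check_conns() messages seen during the mount.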
Re: [lustre-discuss] [EXTERNAL] MDT mount stuck
Hi Rick, I have not tried that yet - after some forty minutes the mount command returned, the device is mounted. I will check how it behaves after all OSTs have been mounted. Regards Thomas On 12.03.21 00:05, Mohr, Rick wrote: Thomas, Is the behavior any different if you mount with the "-o abort_recov" option to avoid the recovery phase? --Rick On 3/11/21, 11:48 AM, "lustre-discuss on behalf of Thomas Roth via lustre-discuss" wrote: Hi all, after not getting out of the ldlm_lockd - situation, we are trying a shutdown plus restart. Does not work at all, the very first mount of the restart is MGS + MDT0, of course. It is quite busy writing traces to the log Mar 11 17:21:17 lxmds19.gsi.de kernel: INFO: task mount.lustre:2948 blocked for more than 120 seconds. Mar 11 17:21:17 lxmds19.gsi.de kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Mar 11 17:21:17 lxmds19.gsi.de kernel: mount.lustreD 9616ffc5acc0 0 2948 2947 0x0082 Mar 11 17:21:17 lxmds19.gsi.de kernel: Call Trace: Mar 11 17:21:17 lxmds19.gsi.de kernel: [] schedule+0x29/0x70 Mar 11 17:21:17 lxmds19.gsi.de kernel: [] schedule_timeout+0x221/0x2d0 Mar 11 17:21:17 lxmds19.gsi.de kernel: [] ? select_task_rq_fair+0x5a6/0x760 Mar 11 17:21:17 lxmds19.gsi.de kernel: [] wait_for_completion+0xfd/0x140 Mar 11 17:21:17 lxmds19.gsi.de kernel: [] ? wake_up_state+0x20/0x20 Mar 11 17:21:17 lxmds19.gsi.de kernel: [] llog_process_or_fork+0x244/0x450 [obdclass] Mar 11 17:21:17 lxmds19.gsi.de kernel: [] llog_process+0x14/0x20 [obdclass] Mar 11 17:21:17 lxmds19.gsi.de kernel: [] class_config_parse_llog+0x125/0x350 [obdclass] Mar 11 17:21:17 lxmds19.gsi.de kernel: [] mgc_process_cfg_log+0x790/0xc40 [mgc] Mar 11 17:21:17 lxmds19.gsi.de kernel: [] mgc_process_log+0x3dc/0x8f0 [mgc] Mar 11 17:21:17 lxmds19.gsi.de kernel: [] ? config_recover_log_add+0x13f/0x280 [mgc] Mar 11 17:21:17 lxmds19.gsi.de kernel: [] ? class_config_dump_handler+0x7e0/0x7e0 [obdclass] Mar 11 17:21:17 lxmds19.gsi.de kernel: [] mgc_process_config+0x88b/0x13f0 [mgc] Mar 11 17:21:17 lxmds19.gsi.de kernel: [] lustre_process_log+0x2d8/0xad0 [obdclass] Mar 11 17:21:17 lxmds19.gsi.de kernel: [] ? libcfs_debug_msg+0x57/0x80 [libcfs] Mar 11 17:21:17 lxmds19.gsi.de kernel: [] ? lprocfs_counter_add+0xf9/0x160 [obdclass] Mar 11 17:21:17 lxmds19.gsi.de kernel: [] server_start_targets+0x13a4/0x2a20 [obdclass] Mar 11 17:21:17 lxmds19.gsi.de kernel: [] ? lustre_start_mgc+0x260/0x2510 [obdclass] Mar 11 17:21:17 lxmds19.gsi.de kernel: [] ? class_config_dump_handler+0x7e0/0x7e0 [obdclass] Mar 11 17:21:17 lxmds19.gsi.de kernel: [] server_fill_super+0x10cc/0x1890 [obdclass] Mar 11 17:21:17 lxmds19.gsi.de kernel: [] lustre_fill_super+0x468/0x960 [obdclass] Mar 11 17:21:17 lxmds19.gsi.de kernel: [] ? lustre_common_put_super+0x270/0x270 [obdclass] Mar 11 17:21:17 lxmds19.gsi.de kernel: [] mount_nodev+0x4f/0xb0 Mar 11 17:21:17 lxmds19.gsi.de kernel: [] lustre_mount+0x38/0x60 [obdclass] Mar 11 17:21:17 lxmds19.gsi.de kernel: [] mount_fs+0x3e/0x1b0 Mar 11 17:21:17 lxmds19.gsi.de kernel: [] vfs_kern_mount+0x67/0x110 Mar 11 17:21:17 lxmds19.gsi.de kernel: [] do_mount+0x1ef/0xd00 Mar 11 17:21:17 lxmds19.gsi.de kernel: [] ? __check_object_size+0x1ca/0x250 Mar 11 17:21:17 lxmds19.gsi.de kernel: [] ? kmem_cache_alloc_trace+0x3c/0x200 Mar 11 17:21:17 lxmds19.gsi.de kernel: [] SyS_mount+0x83/0xd0 Mar 11 17:21:17 lxmds19.gsi.de kernel: [] system_call_fastpath+0x25/0x2a Other than that, nothing is happening. The Lustre processes have started, but e.g. recovery_status = Inactive. 
OK, perhaps because there is nothing out there to recover besides this MDS, all other Lustre servers+clients are still stopped. Still, on previous occasions the mount would not block in this way. The device would be mounted - now it does not make it into /proc/mounts Btw, the disk device can be mounted as type ldiskfs. So it exists, and it looks definitely like a Lustre MDT on the inside. Best, Thomas -- Thomas Roth Department: Informationstechnologie Location: SB3 2.291 Phone: +49-6159-71 1453 Fax: +49-6159-71 2986 GSI Helmholtzzentrum für Schwerionenforschung GmbH Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528 Managing Directors / Geschäftsführung: Professor Dr. Paolo Giubellino, Dr. Ulrich Breuer, Jörg Blaurock Chairman of the Supervisory Board / Vorsitzender des GSI-Aufsichtsrats: State Secretary /
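For reference, the recovery-skipping mount that Rick suggests looks like this; the device path is a placeholder, and aborting recovery will evict clients with uncommitted state, so use it deliberately:
   # mount -t lustre -o abort_recov /dev/<mdt_device> /mnt/mdt0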
Re: [lustre-discuss] MDT mount stuck
And a perhaps minor observation: Comparing to previous restarts in the log files, I see the line Lustre: MGS: Connection restored to 2519f316-4f30-9698-3487-70eb31a73320 (at 0@lo) Before, it was Lustre: MGS: Connection restored to c70c1b4e-3517-5631-28b1-7163f13e7bed (at 0@lo) What is this number? A unique identifier for the MGS? Which changes between restarts? Regards, Thomas On 11/03/2021 17.47, Thomas Roth via lustre-discuss wrote: Hi all, after not getting out of the ldlm_lockd - situation, we are trying a shutdown plus restart. Does not work at all, the very first mount of the restart is MGS + MDT0, of course. It is quite busy writing traces to the log Mar 11 17:21:17 lxmds19.gsi.de kernel: INFO: task mount.lustre:2948 blocked for more than 120 seconds. Mar 11 17:21:17 lxmds19.gsi.de kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Mar 11 17:21:17 lxmds19.gsi.de kernel: mount.lustre D 9616ffc5acc0 0 2948 2947 0x0082 Mar 11 17:21:17 lxmds19.gsi.de kernel: Call Trace: Mar 11 17:21:17 lxmds19.gsi.de kernel: [] schedule+0x29/0x70 Mar 11 17:21:17 lxmds19.gsi.de kernel: [] schedule_timeout+0x221/0x2d0 Mar 11 17:21:17 lxmds19.gsi.de kernel: [] ? select_task_rq_fair+0x5a6/0x760 Mar 11 17:21:17 lxmds19.gsi.de kernel: [] wait_for_completion+0xfd/0x140 Mar 11 17:21:17 lxmds19.gsi.de kernel: [] ? wake_up_state+0x20/0x20 Mar 11 17:21:17 lxmds19.gsi.de kernel: [] llog_process_or_fork+0x244/0x450 [obdclass] Mar 11 17:21:17 lxmds19.gsi.de kernel: [] llog_process+0x14/0x20 [obdclass] Mar 11 17:21:17 lxmds19.gsi.de kernel: [] class_config_parse_llog+0x125/0x350 [obdclass] Mar 11 17:21:17 lxmds19.gsi.de kernel: [] mgc_process_cfg_log+0x790/0xc40 [mgc] Mar 11 17:21:17 lxmds19.gsi.de kernel: [] mgc_process_log+0x3dc/0x8f0 [mgc] Mar 11 17:21:17 lxmds19.gsi.de kernel: [] ? config_recover_log_add+0x13f/0x280 [mgc] Mar 11 17:21:17 lxmds19.gsi.de kernel: [] ? class_config_dump_handler+0x7e0/0x7e0 [obdclass] Mar 11 17:21:17 lxmds19.gsi.de kernel: [] mgc_process_config+0x88b/0x13f0 [mgc] Mar 11 17:21:17 lxmds19.gsi.de kernel: [] lustre_process_log+0x2d8/0xad0 [obdclass] Mar 11 17:21:17 lxmds19.gsi.de kernel: [] ? libcfs_debug_msg+0x57/0x80 [libcfs] Mar 11 17:21:17 lxmds19.gsi.de kernel: [] ? lprocfs_counter_add+0xf9/0x160 [obdclass] Mar 11 17:21:17 lxmds19.gsi.de kernel: [] server_start_targets+0x13a4/0x2a20 [obdclass] Mar 11 17:21:17 lxmds19.gsi.de kernel: [] ? lustre_start_mgc+0x260/0x2510 [obdclass] Mar 11 17:21:17 lxmds19.gsi.de kernel: [] ? class_config_dump_handler+0x7e0/0x7e0 [obdclass] Mar 11 17:21:17 lxmds19.gsi.de kernel: [] server_fill_super+0x10cc/0x1890 [obdclass] Mar 11 17:21:17 lxmds19.gsi.de kernel: [] lustre_fill_super+0x468/0x960 [obdclass] Mar 11 17:21:17 lxmds19.gsi.de kernel: [] ? lustre_common_put_super+0x270/0x270 [obdclass] Mar 11 17:21:17 lxmds19.gsi.de kernel: [] mount_nodev+0x4f/0xb0 Mar 11 17:21:17 lxmds19.gsi.de kernel: [] lustre_mount+0x38/0x60 [obdclass] Mar 11 17:21:17 lxmds19.gsi.de kernel: [] mount_fs+0x3e/0x1b0 Mar 11 17:21:17 lxmds19.gsi.de kernel: [] vfs_kern_mount+0x67/0x110 Mar 11 17:21:17 lxmds19.gsi.de kernel: [] do_mount+0x1ef/0xd00 Mar 11 17:21:17 lxmds19.gsi.de kernel: [] ? __check_object_size+0x1ca/0x250 Mar 11 17:21:17 lxmds19.gsi.de kernel: [] ? kmem_cache_alloc_trace+0x3c/0x200 Mar 11 17:21:17 lxmds19.gsi.de kernel: [] SyS_mount+0x83/0xd0 Mar 11 17:21:17 lxmds19.gsi.de kernel: [] system_call_fastpath+0x25/0x2a Other than that, nothing is happening. The Lustre processes have started, but e.g. recovery_status = Inactive. 
OK, perhaps because there is nothing out there to recover besides this MDS, all other Lustre servers+clients are still stopped. Still, on previous occasions the mount would not block in this way. The device would be mounted - now it does not make it into /proc/mounts Btw, the disk device can be mounted as type ldiskfs. So it exists, and it looks definitely like a Lustre MDT on the inside. Best, Thomas -- Thomas Roth Department: Informationstechnologie Location: SB3 2.291 Phone: +49-6159-71 1453 Fax: +49-6159-71 2986 GSI Helmholtzzentrum für Schwerionenforschung GmbH Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528 Managing Directors / Geschäftsführung: Professor Dr. Paolo Giubellino, Dr. Ulrich Breuer, Jörg Blaurock Chairman of the Supervisory Board / Vorsitzender des GSI-Aufsichtsrats: State Secretary / Staatssekretär Dr. Volkmar Dietz ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
[lustre-discuss] MDT mount stuck
Hi all, after not getting out of the ldlm_lockd - situation, we are trying a shutdown plus restart. Does not work at all, the very first mount of the restart is MGS + MDT0, of course. It is quite busy writing traces to the log Mar 11 17:21:17 lxmds19.gsi.de kernel: INFO: task mount.lustre:2948 blocked for more than 120 seconds. Mar 11 17:21:17 lxmds19.gsi.de kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Mar 11 17:21:17 lxmds19.gsi.de kernel: mount.lustreD 9616ffc5acc0 0 2948 2947 0x0082 Mar 11 17:21:17 lxmds19.gsi.de kernel: Call Trace: Mar 11 17:21:17 lxmds19.gsi.de kernel: [] schedule+0x29/0x70 Mar 11 17:21:17 lxmds19.gsi.de kernel: [] schedule_timeout+0x221/0x2d0 Mar 11 17:21:17 lxmds19.gsi.de kernel: [] ? select_task_rq_fair+0x5a6/0x760 Mar 11 17:21:17 lxmds19.gsi.de kernel: [] wait_for_completion+0xfd/0x140 Mar 11 17:21:17 lxmds19.gsi.de kernel: [] ? wake_up_state+0x20/0x20 Mar 11 17:21:17 lxmds19.gsi.de kernel: [] llog_process_or_fork+0x244/0x450 [obdclass] Mar 11 17:21:17 lxmds19.gsi.de kernel: [] llog_process+0x14/0x20 [obdclass] Mar 11 17:21:17 lxmds19.gsi.de kernel: [] class_config_parse_llog+0x125/0x350 [obdclass] Mar 11 17:21:17 lxmds19.gsi.de kernel: [] mgc_process_cfg_log+0x790/0xc40 [mgc] Mar 11 17:21:17 lxmds19.gsi.de kernel: [] mgc_process_log+0x3dc/0x8f0 [mgc] Mar 11 17:21:17 lxmds19.gsi.de kernel: [] ? config_recover_log_add+0x13f/0x280 [mgc] Mar 11 17:21:17 lxmds19.gsi.de kernel: [] ? class_config_dump_handler+0x7e0/0x7e0 [obdclass] Mar 11 17:21:17 lxmds19.gsi.de kernel: [] mgc_process_config+0x88b/0x13f0 [mgc] Mar 11 17:21:17 lxmds19.gsi.de kernel: [] lustre_process_log+0x2d8/0xad0 [obdclass] Mar 11 17:21:17 lxmds19.gsi.de kernel: [] ? libcfs_debug_msg+0x57/0x80 [libcfs] Mar 11 17:21:17 lxmds19.gsi.de kernel: [] ? lprocfs_counter_add+0xf9/0x160 [obdclass] Mar 11 17:21:17 lxmds19.gsi.de kernel: [] server_start_targets+0x13a4/0x2a20 [obdclass] Mar 11 17:21:17 lxmds19.gsi.de kernel: [] ? lustre_start_mgc+0x260/0x2510 [obdclass] Mar 11 17:21:17 lxmds19.gsi.de kernel: [] ? class_config_dump_handler+0x7e0/0x7e0 [obdclass] Mar 11 17:21:17 lxmds19.gsi.de kernel: [] server_fill_super+0x10cc/0x1890 [obdclass] Mar 11 17:21:17 lxmds19.gsi.de kernel: [] lustre_fill_super+0x468/0x960 [obdclass] Mar 11 17:21:17 lxmds19.gsi.de kernel: [] ? lustre_common_put_super+0x270/0x270 [obdclass] Mar 11 17:21:17 lxmds19.gsi.de kernel: [] mount_nodev+0x4f/0xb0 Mar 11 17:21:17 lxmds19.gsi.de kernel: [] lustre_mount+0x38/0x60 [obdclass] Mar 11 17:21:17 lxmds19.gsi.de kernel: [] mount_fs+0x3e/0x1b0 Mar 11 17:21:17 lxmds19.gsi.de kernel: [] vfs_kern_mount+0x67/0x110 Mar 11 17:21:17 lxmds19.gsi.de kernel: [] do_mount+0x1ef/0xd00 Mar 11 17:21:17 lxmds19.gsi.de kernel: [] ? __check_object_size+0x1ca/0x250 Mar 11 17:21:17 lxmds19.gsi.de kernel: [] ? kmem_cache_alloc_trace+0x3c/0x200 Mar 11 17:21:17 lxmds19.gsi.de kernel: [] SyS_mount+0x83/0xd0 Mar 11 17:21:17 lxmds19.gsi.de kernel: [] system_call_fastpath+0x25/0x2a Other than that, nothing is happening. The Lustre processes have started, but e.g. recovery_status = Inactive. OK, perhaps because there is nothing out there to recover besides this MDS, all other Lustre servers+clients are still stopped. Still, on previous occasions the mount would not block in this way. The device would be mounted - now it does not make it into /proc/mounts Btw, the disk device can be mounted as type ldiskfs. So it exists, and it looks definitely like a Lustre MDT on the inside. 
Best, Thomas -- Thomas Roth Department: Informationstechnologie Location: SB3 2.291 Phone: +49-6159-71 1453 Fax: +49-6159-71 2986 GSI Helmholtzzentrum für Schwerionenforschung GmbH Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528 Managing Directors / Geschäftsführung: Professor Dr. Paolo Giubellino, Dr. Ulrich Breuer, Jörg Blaurock Chairman of the Supervisory Board / Vorsitzender des GSI-Aufsichtsrats: State Secretary / Staatssekretär Dr. Volkmar Dietz ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
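Since the device still mounts as ldiskfs, one low-risk check while the Lustre mount hangs in llog processing is to look at the configuration logs read-only; device and names below are placeholders (a sketch):
   # mount -t ldiskfs -o ro /dev/<mdt_device> /mnt/inspect
   # ls -l /mnt/inspect/CONFIGS/         # should list the config llogs, e.g. <fsname>-MDT0000, <fsname>-client, params
   # umount /mnt/inspect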
Re: [lustre-discuss] Lustre stuck in ldlm_lockd (lock on destroyed export, lock timed out)
In addition, I noticed that those clients that do reconnect are logged as Mar 10 13:12:24 lxmds19.gsi.de kernel: Lustre: hebe-MDT: Connection restored to (at 10.20.0.41@o2ib5) MDS and MDT have this client listed (/proc/fs/lustre/.../exports/) and there is a uuid there for the client. Regards Thomas On 10.03.21 12:33, Thomas Roth via lustre-discuss wrote: Hi all, we are in a critical situation where our Lustre is rendered completely inaccessible. We are running Lustre 2.12.5 on CentOS 7.8, Whamcloud sources, MDTs on ldiskfs, OSTs on ZFS, 3 MDS. The first MDS, running MGS + MDT0, is showing ### lock callback timer expired evicting clients, and ### lock on destroyed export for the same client, as in Mar 10 09:51:54 lxmds19.gsi.de kernel: LustreError: 4779:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 450s: evicting client at 10.20.4.68@o2ib5 ns: mdt-hebe-MDT_UUID lock: 8f1ef6681b00/0xdba5480d76a73ab6 lrc: 3/0,0 mode: PR/PR res: [0x20002db4c:0x14:0x0].0x0 bits 0x13/0x0 rrc: 3 type: IBT flags: 0x6020040020 nid: 10.20.4.68@o2ib5 remote: 0x5360294b0558b867 expref: 31 pid: 6649 timeout: 4849 lvb_type: 0 Mar 10 09:51:54 lxmds19.gsi.de kernel: LustreError: 6570:0:(ldlm_lockd.c:1348:ldlm_handle_enqueue0()) ### lock on destroyed export 8f1eede9 ns: mdt-hebe-MDT_UUID lock: 8f1efbded8c0/0xdba5480d76a9e456 lrc: 3/0,0 mode: PR/PR res: [0x20002c52b:0xd92b:0x0].0x0 bits 0x13/0x0 rrc: 175 type: IBT flags: 0x5020040020 nid: 10.20.4.68@o2ib5 remote: 0x5360294b0558b875 expref: 4 pid: 6570 timeout: 0 lvb_type: 0 Eventually, there is ### lock timed out ; not entering recovery in server code, just going back to sleep Restart of the server does not help. Recovery runs through, clients show the MDS in 'lfs check mds', but any kind of access (aka 'ls') will hang. Any help is much appreciated. Regards Thomas -- Thomas Roth Department: IT Location: SB3 2.291 Phone: +49-6159-71 1453 Fax: +49-6159-71 2986 GSI Helmholtzzentrum für Schwerionenforschung GmbH Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528 Managing Directors / Geschäftsführung: Professor Dr. Paolo Giubellino, Dr. Ulrich Breuer, Jörg Blaurock Chairman of the Supervisory Board / Vorsitzender des GSI-Aufsichtsrats: State Secretary / Staatssekretär Dr. Volkmar Dietz ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
[lustre-discuss] Lustre stuck in ldlm_lockd (lock on destroyed export, lock timed out)
Hi all, we are in a critical situation where our Lustre is rendered completely inaccessible. We are running Lustre 2.12.5 on CentOS 7.8, Whamcloud sources, MDTs on ldiskfs, OSTs on ZFS, 3 MDS. The first MDS, running MGS + MDT0, is showing ### lock callback timer expired evicting clients, and ### lock on destroyed export for the same client, as in Mar 10 09:51:54 lxmds19.gsi.de kernel: LustreError: 4779:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 450s: evicting client at 10.20.4.68@o2ib5 ns: mdt-hebe-MDT_UUID lock: 8f1ef6681b00/0xdba5480d76a73ab6 lrc: 3/0,0 mode: PR/PR res: [0x20002db4c:0x14:0x0].0x0 bits 0x13/0x0 rrc: 3 type: IBT flags: 0x6020040020 nid: 10.20.4.68@o2ib5 remote: 0x5360294b0558b867 expref: 31 pid: 6649 timeout: 4849 lvb_type: 0 Mar 10 09:51:54 lxmds19.gsi.de kernel: LustreError: 6570:0:(ldlm_lockd.c:1348:ldlm_handle_enqueue0()) ### lock on destroyed export 8f1eede9 ns: mdt-hebe-MDT_UUID lock: 8f1efbded8c0/0xdba5480d76a9e456 lrc: 3/0,0 mode: PR/PR res: [0x20002c52b:0xd92b:0x0].0x0 bits 0x13/0x0 rrc: 175 type: IBT flags: 0x5020040020 nid: 10.20.4.68@o2ib5 remote: 0x5360294b0558b875 expref: 4 pid: 6570 timeout: 0 lvb_type: 0 Eventually, there is ### lock timed out ; not entering recovery in server code, just going back to sleep Restart of the server does not help. Recovery runs through, clients show the MDS in 'lfs check mds', but any kind of access (aka 'ls') will hang. Any help is much appreciated. Regards Thomas -- Thomas Roth Department: IT Location: SB3 2.291 Phone: +49-6159-71 1453 Fax: +49-6159-71 2986 GSI Helmholtzzentrum für Schwerionenforschung GmbH Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528 Managing Directors / Geschäftsführung: Professor Dr. Paolo Giubellino, Dr. Ulrich Breuer, Jörg Blaurock Chairman of the Supervisory Board / Vorsitzender des GSI-Aufsichtsrats: State Secretary / Staatssekretär Dr. Volkmar Dietz ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
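When a single client keeps triggering 'lock callback timer expired', it can sometimes be evicted by hand instead of restarting the whole MDS. A sketch only, with the NID taken from the log above and the parameter names as I remember them from 2.12, so please verify before use:
   mds # lctl get_param mdt.hebe-MDT0000.exports.*.uuid | grep 10.20.4.68    # look up the export's UUID
   mds # lctl set_param mdt.hebe-MDT0000.evict_client=<that UUID>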
[lustre-discuss] Inode quota: limits on different MDTs
Dear all, a user has hit the inode quota limit:
   Filesystem   used   quota   limit   grace    files   quota   limit   grace
   /lustre     12.76T     0k      0k       -  635978*  636078  636178  2d16h29m43s
Typical quota mathematics: 635978 > 636178, but it's distributed quota, very well. We have three MDTs; the user most probably has files and directories on only one of them. Let's check "-v":
   # lfs quota -h -v -u User /lustre
   /lustre             12.76T     0k      0k  -  635978* 636078  636178  2d16h29m43s
   lustre-MDT_UUID         0k      -  15.28G  -       0      -       1  -
   lustre-MDT0001_UUID 134.2M      -  16.04G  -  635978      -  636460  -
   lustre-MDT0002_UUID     0k      -      0k  -       0      -   17150  -
What is the meaning of column #7 in the output for each MDT? In the general result, it is the hard limit. Here it is 1 on MDT0 - everything probably needs at least one inode on the root of the fs. It seems the user's files are on MDT1, where the column reads 636460 - that is not what I set as the hard limit. And on MDT2, the column reads 17150, but no files from the user there. And the hard limit is also not the difference of these two values ;-) Best regards, Thomas -- Thomas Roth Department: IT Location: SB3 2.291 Phone: +49-6159-71 1453 Fax: +49-6159-71 2986 GSI Helmholtzzentrum für Schwerionenforschung GmbH Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528 Managing Directors / Geschäftsführung: Professor Dr. Paolo Giubellino, Dr. Ulrich Breuer, Jörg Blaurock Chairman of the Supervisory Board / Vorsitzender des GSI-Aufsichtsrats: State Secretary / Staatssekretär Dr. Volkmar Dietz ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
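Two follow-up notes, both hedged. First, if I read the quota documentation right, the per-MDT 'limit' column in 'lfs quota -v' is the amount of inode quota the quota master has granted to that particular target, not the global hard limit, which would explain the 636460 and 17150 values. Second, for completeness, the global per-user inode limits are set and checked like this (the user name is a placeholder):
   # lfs setquota -u someuser -i 636078 -I 636178 /lustre    # soft and hard inode limits
   # lfs quota -h -v -u someuser /lustre                     # per-MDT breakdown as above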