Re: [lustre-discuss] lfsck repair quota
Dear Fernando,

I'm not sure whether those files contribute to the quota, but I would assume that the ones on the OSTs consume disk quota and the ones on the MDT consume inode quota. As long as they are in the lost+found directory they are not visible to the users, but they may contain data which belonged to user files. Whether they contain useful data, and whether the files can be reconstructed completely, depends on the exact damage that e2fsck has tried to repair. A complete output of all the fsck runs could tell more, but even with that one would probably need further information, e.g. about the striping stored in the extended attributes of the files.

best regards,
Martin
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
Re: [lustre-discuss] lfsck repair quota
Are there a lot of inodes moved to lost+found by the fsck, which contribute to the occupied quota now?

----- Original Message -----
From: Fernando Pérez
To: lustre-discuss@lists.lustre.org
Sent: Tue, 16 Apr 2019 16:24:13 +0200 (CEST)
Subject: Re: [lustre-discuss] lfsck repair quota

Thank you Rick.

I followed these steps for the ldiskfs OSTs and MDT, but the quotas for all users are more corrupted than before.

I tried to run e2fsck on the ldiskfs OSTs and MDT, but the problem was that the MDT e2fsck ran very slowly (10 inodes per second for more than 100 million inodes).

According to the Lustre wiki I thought that lfsck could repair corrupted quotas: http://wiki.lustre.org/Lustre_Quota_Troubleshooting

Regards.

Fernando Pérez
Institut de Ciències del Mar (CSIC)
Departament Oceanografía Física i Tecnològica
Passeig Marítim de la Barceloneta,37-49
08003 Barcelona
Phone: (+34) 93 230 96 35

> On 16 Apr 2019, at 15:34, Mohr Jr, Richard Frank (Rick Mohr) wrote:
>
>> On Apr 15, 2019, at 10:54 AM, Fernando Perez wrote:
>>
>> Could anyone confirm that the correct way to repair wrong quotas in an
>> ldiskfs MDT is lctl lfsck_start -t layout -A?
>
> As far as I know, lfsck doesn’t repair quota info. It only fixes internal
> consistency within Lustre.
>
> Whenever I have had to repair quotas, I just follow the procedure you did
> (unmount everything, run “tune2fs -O ^quota ”, run “tune2fs -O quota
> ”, and then remount). But all my systems used ldiskfs, so I don’t know
> if the ZFS OSTs introduce any sort of complication. (Actually, I am not even
> sure if/how you can regenerate quota info for ZFS.)
>
> --
> Rick Mohr
> Senior HPC System Administrator
> National Institute for Computational Sciences
> http://www.nics.tennessee.edu
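The quota reset procedure quoted above (unmount everything, switch the quota feature off and on again with tune2fs, remount) can be sketched as a small helper. This is only a sketch: it prints the commands for review instead of running them, it assumes ldiskfs targets that are already unmounted, and the device path is a placeholder, since the thread does not name the devices.

```shell
#!/bin/sh
# Print (not run) the tune2fs commands from the procedure quoted above for
# one unmounted ldiskfs target. The device argument is a placeholder; the
# thread does not say which devices were used.
quota_reset_cmds() {
    dev=$1
    echo "tune2fs -O ^quota $dev"   # switch the quota feature off
    echo "tune2fs -O quota $dev"    # switch it on again
}

# Hypothetical device name, for illustration only.
quota_reset_cmds /dev/mapper/EXAMPLE_MDT
```

Printing rather than executing is a deliberate choice here: the commands modify the on-disk file system, so a review step before running them on each MDT and OST seems prudent.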
Re: [lustre-discuss] Command line tool to monitor Lustre I/O ?
Hello Roland,

there is a nice collection of Lustre monitoring tools on the Lustre wiki: http://wiki.lustre.org/Lustre_Monitoring_and_Statistics_Guide which also contains a couple of references. One of them is lltop, which has already been mentioned a couple of times, and that's what came to my mind as well when I read your question.

best regards,
Martin
Re: [lustre-discuss] ko2iblnd optimizations for EDR
On 11/7/18 9:44 PM, Riccardo Veraldi wrote:
> Anyway I was wondering if something different is needed for mlx5 and
> what are the suggested values in that case?
>
> Does anyone have experience with mlx5 LNET performance tunings?

Hi Riccardo,

We have recently integrated mlx5 nodes into our fabric, and we had to reduce the values to

peer_credits = 16
concurrent_sends = 16

because mlx5 doesn't support larger values for some reason. The peer_credits must have the same value in all connected lnets, even across routers (at least it used to be like this; I believe we are currently running some Lustre 2.5.x derivatives on the server side, and newer versions on the various clients).

kind regards,
Martin
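As a minimal sketch, the two values above could be turned into a module options line like this. The helper name is made up for illustration, and placing the line in a modprobe configuration file such as /etc/modprobe.d/ko2iblnd.conf is an assumption; the thread does not say where the values were set.

```shell
#!/bin/sh
# Build the modprobe options line for the ko2iblnd values quoted above.
ko2iblnd_options() {
    peer_credits=$1
    concurrent_sends=$2
    echo "options ko2iblnd peer_credits=$peer_credits concurrent_sends=$concurrent_sends"
}

# Values reported in this thread to work with mlx5.
ko2iblnd_options 16 16
```

Since peer_credits has to match on all connected nodes, the same line would presumably be deployed everywhere and picked up the next time the module is loaded.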
Re: [lustre-discuss] building lustre 2.11.50 on CentOS 7.4
problem solved: another git pull today, followed by autogen.sh and configure, has made the error go away. I assume it was LU-10752, which was fixed by a patch by James Simmons (commit 6189ae07c5161d14c9e9f863a400045f923f2301) that landed on the hpdd git 16 hours ago.

Martin

On 04/09/2018 04:55 PM, Martin Hecht wrote:
> Hi,
>
> I'm trying to build lustre 2.11 from source, with ldiskfs on CentOS 7.4.
>
> patching the kernel for ldiskfs worked fine, I have installed and booted
> the patched kernel as well as the devel-rpm, but when I run `make rpms`
> it exits with the following errors:
>
> Processing files: lustre-2.11.50-1.el7.centos.x86_64
> error: File not found: /tmp/rpmbuild-lustre-build-KfxFi3eG/BUILDROOT/lustre-2.11.50-1.x86_64/etc/init.d/lsvcgss
> error: File not found: /tmp/rpmbuild-lustre-build-KfxFi3eG/BUILDROOT/lustre-2.11.50-1.x86_64/etc/sysconfig/lsvcgss
> error: File not found: /tmp/rpmbuild-lustre-build-KfxFi3eG/BUILDROOT/lustre-2.11.50-1.x86_64/etc/request-key.d/lgssc.conf
>
> RPM build errors:
>     File not found: /tmp/rpmbuild-lustre-build-KfxFi3eG/BUILDROOT/lustre-2.11.50-1.x86_64/etc/init.d/lsvcgss
>     File not found: /tmp/rpmbuild-lustre-build-KfxFi3eG/BUILDROOT/lustre-2.11.50-1.x86_64/etc/sysconfig/lsvcgss
>     File not found: /tmp/rpmbuild-lustre-build-KfxFi3eG/BUILDROOT/lustre-2.11.50-1.x86_64/etc/request-key.d/lgssc.conf
> make: *** [rpms] Error 1
>
> just `make` works fine, so the problem is something with packaging the
> rpms. Any hints?
>
> kind regards,
> Martin

--
Dr. Martin Hecht
High Performance Computing Center Stuttgart (HLRS)
Office 0.051, HPCN Production, IT-Security
University of Stuttgart
Nobelstraße 19, 70569 Stuttgart, Germany
Tel: +49(0)711/685-65799 Fax: -55799
Mail: he...@hlrs.de
Web: http://www.hlrs.de/people/hecht/
PGP Key available at: https://www.hlrs.de/fileadmin/user_upload/Martin_Hecht.pgp
PGP Key Fingerprint: 41BB 33E9 7170 3864 D5B3 44AD 5490 010B 96C2 6E4A
[lustre-discuss] building lustre 2.11.50 on CentOS 7.4
Hi,

I'm trying to build lustre 2.11 from source, with ldiskfs on CentOS 7.4.

patching the kernel for ldiskfs worked fine, I have installed and booted the patched kernel as well as the devel-rpm, but when I run `make rpms` it exits with the following errors:

Processing files: lustre-2.11.50-1.el7.centos.x86_64
error: File not found: /tmp/rpmbuild-lustre-build-KfxFi3eG/BUILDROOT/lustre-2.11.50-1.x86_64/etc/init.d/lsvcgss
error: File not found: /tmp/rpmbuild-lustre-build-KfxFi3eG/BUILDROOT/lustre-2.11.50-1.x86_64/etc/sysconfig/lsvcgss
error: File not found: /tmp/rpmbuild-lustre-build-KfxFi3eG/BUILDROOT/lustre-2.11.50-1.x86_64/etc/request-key.d/lgssc.conf

RPM build errors:
    File not found: /tmp/rpmbuild-lustre-build-KfxFi3eG/BUILDROOT/lustre-2.11.50-1.x86_64/etc/init.d/lsvcgss
    File not found: /tmp/rpmbuild-lustre-build-KfxFi3eG/BUILDROOT/lustre-2.11.50-1.x86_64/etc/sysconfig/lsvcgss
    File not found: /tmp/rpmbuild-lustre-build-KfxFi3eG/BUILDROOT/lustre-2.11.50-1.x86_64/etc/request-key.d/lgssc.conf
make: *** [rpms] Error 1

just `make` works fine, so the problem is something with packaging the rpms. Any hints?

kind regards,
Martin
Re: [lustre-discuss] Mixed size OST's
On 03/15/2018 04:48 PM, Steve Thompson wrote:
> If I go with one OST per system (one zpool comprising 8 x 6 RAIDZ2
> vdevs), I will have a lustre f/s comprised of two 60 TB OST's and two
> 192 TB OST's (minus RAIDZ2 overhead). This is obviously a big mismatch
> between OST sizes.

Depending on how full your file system is going to be, it may be better to create more OSTs on the new OSSes, so that all OSTs are roughly the same size and you avoid trouble balancing the fill level of the OSTs.

We had a Lustre system (back in Lustre 1.8 times) with different disk sizes. We put them into pools such that each pool contained only OSTs of the same size. We balanced the users between the pools such that the larger OSTs were filled more quickly than the smaller ones, or, in other words, such that the percentage to which each OST was filled remained homogeneous across the whole file system. It worked, but this manual interaction was needed to prevent the small OSTs from reaching a critical fill level more quickly than the large ones.

Maybe the internal algorithm has been improved in the meantime, but as far as I know it is just round robin until a critical difference of fill levels is reached, and then the weighted stripe allocation impacts performance. A rough description can be found on wiki.lustre.org/Managing_Free_Space or in the corresponding chapter of the Lustre Manual. However, I'm not sure if these sections are all up to date.
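A small worked example of why plain round robin is a problem with mixed sizes, using the 60 TB and 192 TB OSTs from the question. It is only a sketch: RAIDZ2 overhead is ignored, and the 20 TB written to each OST is an arbitrary figure chosen for illustration.

```shell
#!/bin/sh
# Fill level in percent (rounded) of an OST of a given size after a given
# amount of data has been written to it. Both arguments are in TB.
fill_pct() {
    awk -v size="$1" -v used="$2" 'BEGIN { printf "%.0f\n", 100 * used / size }'
}

# Round robin places the same amount on every OST; 20 TB each is assumed here.
fill_pct 60 20     # small OST
fill_pct 192 20    # large OST
```

With equal amounts written, the small OSTs reach roughly a third of their capacity while the large ones are at about a tenth, which is the imbalance the pool setup described above was meant to even out.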
Re: [lustre-discuss] Fwd: FW: mdt mounting error
Hi Parag,

can you lctl ping 10.2.1.204@o2ib from the mgs node and from the mds now?

I have seen on the list that you were able to load the modules, but if lnet is not working on the ib, this might be the reason for the errors you are seeing.

Regards,
Martin

On 11/08/2017 09:15 AM, Parag Khuraswar wrote:
> Hi,
>
> Any resolution on this ?
>
> Regards,
> Parag.
>
> -------- Original Message --------
> Subject: [lustre-discuss] FW: mdt mounting error
> Date: 2017-11-07 15:29
> From: "Parag Khuraswar"
> To: "'Lustre discussion'"
>
> Hi,
>
> Lustre module is loaded. I am getting below errors in
> /var/log/messages while mounting mdt.
>
> Nov 7 14:10:03 mds1 kernel: LDISKFS-fs (dm-2): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
> Nov 7 14:10:03 mds1 kernel: LustreError: 4852:0:(ldlm_lib.c:483:client_obd_setup()) can't add initial connection
> Nov 7 14:10:03 mds1 kernel: LustreError: 4852:0:(obd_config.c:608:class_setup()) setup MGC10.2.1.204@o2ib failed (-2)
> Nov 7 14:10:03 mds1 kernel: LustreError: 4852:0:(obd_mount.c:202:lustre_start_simple()) MGC10.2.1.204@o2ib setup error -2
> Nov 7 14:10:03 mds1 kernel: LustreError: 4852:0:(obd_mount_server.c:1573:server_put_super()) no obd home-MDT
> Nov 7 14:10:03 mds1 kernel: LustreError: 4852:0:(obd_mount_server.c:132:server_deregister_mount()) home-MDT not registered
> Nov 7 14:10:03 mds1 kernel: Lustre: server umount home-MDT complete
> Nov 7 14:10:03 mds1 kernel: LustreError: 4852:0:(obd_mount.c:1504:lustre_fill_super()) Unable to mount (-2)
>
> Regards,
> Parag
>
> FROM: Ben Evans [mailto:bev...@cray.com]
> SENT: Wednesday, November , 2017 6:19 PM
> TO: Raj; Parag Khuraswar
> CC: Lustre discussion
> SUBJECT: Re: [lustre-discuss] mdt mounting error
>
> On the node in question Try: lsmod | grep lustre
>
> followed by: modprobe lustre
>
> I'm betting the modules aren't loaded for some reason, generally that
> reason is found in dmesg.
>
> FROM: lustre-discuss on behalf of Raj
> DATE: Wednesday, November 1, 2017 at 7:48 AM
> TO: Parag Khuraswar
> CC: Lustre discussion
> SUBJECT: Re: [lustre-discuss] mdt mounting error
>
> Parag, I have not tested two FS using a common MGT and I don't know
> whether it is supported.
>
> On Wed, Nov 1, 2017 at 6:37 AM Parag Khuraswar wrote:
>
>> Hi Raj,
>>
>> But I have two file systems,
>> And I think I can use one mgt for two filesystems. Please correct me
>> if I am wrong.
>>
>> Regards,
>> Parag
>>
>> On 2017-11-01 16:56, Raj wrote:
>>> The following can contribute to this issue:
>>> - Missing FS name in mgt creation (it must be <=9 character long):
>>>   --fsname=
>>>   mkfs.lustre --servicenode=10.2.1.204@o2ib
>>>   --servicenode=10.2.1.205@o2ib --FSNAME=HOME --mgs /dev/mapper/mpathc
>>>
>>> - verify if /mdt directory exists
>>>
>>> On Wed, Nov 1, 2017 at 6:16 AM Raj wrote:
>>>
>>> What options in mkfs.lustre did you use to format with lustre?
>>>
>>> On Wed, Nov 1, 2017 at 6:14 AM Parag Khuraswar wrote:
>>>
>>> Hi Raj,
>>>
>>> Yes, /dev/mapper/mpatha available. I could format and mount using ext4.
>>>
>>> Regards,
>>> Parag
>>>
>>> FROM: Raj [mailto:rajgau...@gmail.com]
>>> SENT: Wednesday, November , 2017 4:39 PM
>>> TO: Parag Khuraswar; Lustre discussion
>>> SUBJECT: Re: [lustre-discuss] mdt mounting error
>>>
>>> Parag,
>>> Is the device /dev/mapper/mpatha available? If not, the multipathd may
>>> not have started or the multipath configuration may not be correct.
>>>
>>> On Wed, Nov 1, 2017 at 5:18 AM Parag Khuraswar wrote:
>>>
>>> Hi,
>>>
>>> I am getting below error while mounting mdt. Mgt is mounted.
>>>
>>> Please suggest
>>>
>>> [root@mds2 ~]# mount -t lustre /dev/mapper/mpatha /mdt
>>> mount.lustre: mount /dev/mapper/mpatha at /mdt failed: No such file or directory
>>> Is the MGS specification correct?
>>> Is the filesystem name correct?
>>> If upgrading, is the copied client log valid? (see upgrade docs)
>>>
>>> Regards,
>>> Parag
Re: [lustre-discuss] ldiskfsprogs
Hi Parag,

please reply to the list or keep it in cc at least

On 10/30/2017 01:21 PM, Parag Khuraswar wrote:
> Hi Martin,
>
> The problem got resolved.
> But I am not able to see ib in 'lctl list_nids' output
> My lnet.conf file entry is 'options lnet networks=o2ib(ib0)' This file is
> not executable.
>
> Can you help ?
>
> Regards,
> Parag

your lnet is probably not configured correctly. Things to check:
- is the ib0 device there (i.e. make sure the infiniband layer works correctly)?
- does ib0 have an ip address? (lustre normally doesn't use ip over ib, but it uses the ip addresses for identifying the hosts)
- verify that you can ping the ip (with normal network ping to ensure that the connection is working)
- is the lnet module loaded?
- if not, can you load it manually with modprobe lnet?
- what is written to dmesg / syslog when it fails?
- when the module is loaded, try lctl network up

Martin
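The checklist above maps onto commands roughly as follows. This is only a sketch: the helper prints the commands in order so they can be run one at a time by hand, and the peer address is a hypothetical placeholder, not one taken from the thread.

```shell
#!/bin/sh
# Print the checklist above as commands, in order, for one interface and one
# peer IP address. Nothing is executed; run the lines by hand and read the
# output of each before moving on.
lnet_checklist() {
    iface=$1
    peer_ip=$2
    cat <<EOF
ip addr show $iface
ping -c 3 $peer_ip
lsmod | grep lnet
modprobe lnet
dmesg | tail
lctl network up
lctl list_nids
EOF
}

# ib0 is the interface from the lnet options line quoted above; the address
# is a made-up example.
lnet_checklist ib0 192.0.2.10
```

The final `lctl list_nids` is the check from the quoted message: once the network is up, the o2ib NID should be listed there.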
Re: [lustre-discuss] ldiskfsprogs
Hi,

On 10/30/2017 09:56 AM, Parag Khuraswar wrote:
> Hi,
>
> I am installing lustre cloned from github.

Hmm... there are a few Lustre-related repositories on github. I would prefer the upstream Lustre git repository managed by Intel, git://git.hpdd.intel.com, unless you are interested in specific features that are not (yet) available from there.

> After build of rpms I am trying
> to install lustre rpms.
>
> I am getting below error
>
> Requires: ldiskfsprogs >= 1.42.7.wc1
>
> But while compilation this package was not built.

ldiskfsprogs used to be called e2fsprogs. However, in my experience it is a bit more of a challenge to build these from source than the main Lustre packages. Anyhow, in Intel's Lustre git repository git://git.hpdd.intel.com there is also a branch tools/e2fsprogs.git - or you can use pre-built rpms for your OS from https://downloads.hpdd.intel.com/public/e2fsprogs/latest/

best regards,
Martin
Re: [lustre-discuss] Lustre [2.8.0] flock Functionality
Hello,

we use the flock mount option on all our Lustre systems (currently some 2.5 versions) and are not aware of any issues due to that. If your applications run on a single node (or require locks only locally) you could also try localflock.

localflock has less performance impact than the global flock. How much impact you have depends on how heavily the applications make use of locks. We have measured a few per cent on Lustre 1.8 in simple tests, and I think that the performance impact nowadays is even less, but as I said, it depends on the IO pattern.

localflock is more risky than flock, because it makes your application think that locks are there, but in fact they are not globally visible, which may lead to strange effects with parallel applications spanning several nodes. We were running localflock on one of our systems for some time and occasionally heard about such problems from a few users.

best regards,
Martin

On 03/28/2017 07:49 PM, DeWitt, Chad wrote:
> Good afternoon, All.
>
> We've encountered several programs that require flock, so we are now
> investigating enabling flock functionality. However, the Lustre manual
> includes a passage in regards to flocks which gives us pause:
>
> "Warning
> This mode affects the performance of the file being flocked and may affect
> stability, depending on the Lustre version used. Consider using a newer
> Lustre version which is more stable. If the consistent mode is enabled and
> no applications are using flock, then it has no effect."
>
> We are running Lustre 2.8.0 (servers and clients). I've looked through
> Jira, but didn't see anything that looked like a showstopper.
>
> Just curious if anyone has enabled flocks and encountered issues? Anything
> in particular to look out for?
>
> Thank you in advance,
> Chad
>
> Chad DeWitt, CISSP | HPC Storage Administrator
>
> UNC Charlotte | ITS – University Research Computing
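For illustration, the two lock modes discussed above are chosen with a client mount option. The following is only a sketch that prints the resulting mount command; the MGS NID, file system name and mount point are hypothetical placeholders.

```shell
#!/bin/sh
# Print a Lustre client mount command with the requested lock mode.
#   flock      - locks are coherent across all clients
#   localflock - locks are only visible on the local node
lustre_mount_cmd() {
    mode=$1
    source=$2
    mountpoint=$3
    case $mode in
        flock|localflock) ;;
        *) echo "unknown lock mode: $mode" >&2; return 1 ;;
    esac
    echo "mount -t lustre -o $mode $source $mountpoint"
}

# Made-up MGS NID, file system name and mount point.
lustre_mount_cmd flock 192.0.2.1@o2ib:/testfs /mnt/testfs
```

Swapping `flock` for `localflock` gives the cheaper, node-local variant, with the caveat described above for applications that span several nodes.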
Re: [lustre-discuss] many 'ksym' packages required
I have seen this, too, on SL6: the build went smoothly, but installation failed. A few months before 2.9 was tagged on master the build and install went smoothly. I'm not using zfs by the way. Unfortunately, I haven't found the time yet to investigate this more deeply.

Cheers,
Martin

On 12/20/2016 05:34 AM, Andrus, Brian Contractor wrote:
> All,
> I am running into an issue lately on rpms I build and the ones I download
> from intel, where I try to install the server zfs module
> (kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64.rpm) and it gives me a TON of errors
> about Requires: ksym(xxx)
> Example:
> Error: Package: kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64 (/kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64)
>            Requires: ksym(dmu_objset_pool) = 0xa8cb0bd0
> Error: Package: kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64 (/kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64)
>            Requires: ksym(zap_cursor_serialize) = 0x3f455060
> Error: Package: kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64 (/kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64)
>            Requires: ksym(dmu_prefetch) = 0x7947c677
> Error: Package: kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64 (/kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64)
>            Requires: ksym(dsl_prop_register) = 0xa6f021e0
> Error: Package: kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64 (/kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64)
>            Requires: ksym(dmu_objset_space) = 0x0a5a5f8f
> Error: Package: kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64 (/kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64)
>            Requires: ksym(zfs_prop_to_name) = 0xa483a8c3
> Error: Package: kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64 (/kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64)
>            Requires: ksym(txg_wait_callbacks) = 0x90f50ab1
> Error: Package: kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64 (/kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64)
>            Requires: ksym(nvlist_pack) = 0x424ac2e1
> Error: Package: kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64 (/kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64)
>            Requires: ksym(dmu_buf_rele) = 0x53e356d2
> Error: Package: kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64 (/kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64)
>            Requires: ksym(dmu_buf_hold_array_by_bonus) = 0x330ef227
> Error: Package: kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64 (/kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64)
>            Requires: ksym(dmu_objset_disown) = 0x27d01e19
> Error: Package: kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64 (/kmod-lustre-osd-zfs-2.9.0-1.el7.x86_64)
>
> Has anyone seen this before and know what the issue could be?
>
> Brian Andrus
> ITACS/Research Computing
> Naval Postgraduate School
> Monterey, California
> voice: 831-656-6238
Re: [lustre-discuss] Mounting Lustre over IB-to-Ethernet gateway
Hi Kevin,

I think your proposed lnet config line is correct and it would add tcp0. If you add a new lnet on the servers you have to reload the lnet module, which implies that you have to restart Lustre. You don't have to reboot if unloading the modules works smoothly, i.e. unmount all targets, run lustre_rmmod, and then mount the targets again. You don't have to restart the ib clients, though.

If you have clients with an interface on both networks (which could act as lnet routers) you can do without restarting the servers. You don't have to add the lnet on the servers in that case, but you just have to add the routes to the new lnet on all servers, which works in production with lctl --net tcp0 add_route client-ip@o2ib0. On the routers you need forwarding="enabled" and they need both lnets, each of them assigned to the appropriate interface (in order to configure this you have to reload the lnet module on the clients which will act as routers). On the tcp clients you would need the route across the routers in the opposite direction. However, in that scenario you wouldn't use the ib2eth gateway.

Greetings,
Martin

On 08/01/2016 01:05 PM, Kevin M. Hildebrand wrote:
> Our Lustre filesystem is currently set up to use the o2ib interface only-
> all of the servers have
> options lnet networks=o2ib0(ib0)
>
> We've just added a Mellanox IB-to-Ethernet gateway and would like to be
> able to have clients on the Ethernet side also mount Lustre. The gateway
> extends the same layer-2 IP range that's being used for IPoIB out to the
> Ethernet clients
>
> How should I go about doing this? Since the clients don't have IB, it
> doesn't appear that I can use o2ib0 to mount. Do I need to add another
> lnet network on the servers? Something like
> options lnet networks=o2ib0(ib0),tcp0(ib0)? Can I have both protocols on
> the same interface?
> And if I do have to add another lnet network, is there any way to do so
> without restarting the servers?
>
> Thanks,
> Kevin
>
> --
> Kevin Hildebrand
> University of Maryland, College Park
> Division of IT
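The router variant described above could look roughly like this in module options form. It is a sketch under stated assumptions: the role names, interface names and router NIDs are all hypothetical, and the `routes=` lines are the persistent counterpart of the `lctl ... add_route` call mentioned in the reply, not something the thread spells out.

```shell
#!/bin/sh
# Print the "options lnet" line for each role in the routed setup described
# above. Interface names and router NIDs are passed in and are hypothetical.
lnet_options() {
    role=$1
    case $role in
        router)
            # Both lnets, forwarding switched on.
            # $2 = IB interface, $3 = Ethernet interface
            echo "options lnet networks=\"o2ib0($2),tcp0($3)\" forwarding=\"enabled\"" ;;
        server)
            # Stays on o2ib0 only and reaches tcp0 through the router.
            # $2 = IB interface, $3 = router NID on o2ib0
            echo "options lnet networks=\"o2ib0($2)\" routes=\"tcp0 $3\"" ;;
        tcp-client)
            # Route in the opposite direction.
            # $2 = Ethernet interface, $3 = router NID on tcp0
            echo "options lnet networks=\"tcp0($2)\" routes=\"o2ib0 $3\"" ;;
        *)
            echo "unknown role: $role" >&2
            return 1 ;;
    esac
}

lnet_options router ib0 eth0
lnet_options server ib0 192.0.2.10@o2ib0
lnet_options tcp-client eth0 198.51.100.10@tcp0
```

On running servers the route would be added with the lctl call from the reply instead, so that the lnet module does not have to be reloaded; the options line only matters for the next module load.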
Re: [lustre-discuss] Analog of ll_recover_lost_found_objs for MDS
Hi James,

I'm not aware of a ready-to-use tool, but if you have captured the output of e2fsck you can use that as a basis for a script that puts the files back to their original location. e2fsck usually prints out the full path and the inode numbers of the files/directories which it moves to lost+found, and there they are named "#$inode" (which makes scripting a bit ugly, but if you properly escape the '#' sign and do some magic with awk, perl or the like, you can transform the log into a shell script that moves your files back to the original path). I have done this once after a file system corruption following an upgrade from 1.8 to 2.4 (which contained an ugly bug when enabling the "FID in dirent" feature).

The backup... well, it gives you the chance to go back to the state where you started *before* the e2fsck. That would be a chance to capture the output again, in case you did not store it (actually, you could do this offline, on a copy of the backup). Restoring the MDT from the backup, however, is only useful as long as you did not go into production after the e2fsck. And as I said, you still have to "repair" the restored MDT (probably by doing the same steps as you already did on the live system), but a second chance is better than no chance to go back...

The backup is also good to investigate what happened during the e2fsck (in case it did something weird) or to go in with debugfs for manual investigations (e.g. manually look up inode<->path relations).

Martin

On 07/26/2016 04:08 PM, jbellinger wrote:
> Is there, or could there be, something analogous to the OST recovery
> tool that works on the lost+found on the MDT? e2fsck went berserk.
>
> We're running 2.5.3.
>
> Thanks,
> James Bellinger
>
> Yes, we have an older (therefore somewhat inconsistent) backup of the
> mdt, so we should be able to recover most things, _in theory_. In
> practice -- we'd love to hear other people's experience about recovery
> using an inconsistent backup.
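A minimal sketch of the idea in the reply above, with one change: instead of generating a script from the log, it moves the entries directly. It assumes the e2fsck output has already been reduced to a two-column map of inode number and original path; that map format, the function name and the mount point argument are all assumptions for illustration, since the exact wording of the e2fsck messages is not shown in the thread.

```shell
#!/bin/sh
# Move entries out of lost+found back to their recorded paths.
# $1: map file with one "<inode> <original path>" pair per line (assumed
#     format, to be extracted from the captured e2fsck output beforehand)
# $2: root of the file system mounted as ldiskfs (hypothetical mount point)
restore_from_map() {
    map=$1
    root=$2
    while read -r inode path; do
        [ -n "$inode" ] && [ -n "$path" ] || continue
        src="$root/lost+found/#$inode"   # quoting keeps '#' from starting a comment
        dst="$root/$path"
        if [ ! -e "$src" ]; then
            echo "skip: $src not found" >&2
            continue
        fi
        if [ -e "$dst" ] || [ ! -d "$(dirname "$dst")" ]; then
            echo "skip: cannot place $dst" >&2
            continue
        fi
        mv -- "$src" "$dst"
    done < "$map"
}
```

The function never creates parent directories and never overwrites an existing entry; anything it cannot place is reported and left in lost+found. Trying it on a copy of the backup first, as suggested above, would be the cautious way to use it.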
Re: [lustre-discuss] luster client mount issues
Hi,

I think your client doesn't have the o2ib lnet (it should appear in the output of the lctl ping, even if you ping on the tcp lnet). In your /etc/modprobe.d/lustre.conf o2ib is associated with the ib0 interface, but your /var/log/messages talks about ib1. If it is a dual port card where just one port is used, the easiest would be to plug the cable into the other interface. (If there are two ib connections, things might become a bit more complicated. There are examples for multi rail configurations using several lnets in the Lustre manual, but maybe this goes too far.)

With the attempt to mount via tcp (or tcp0, which is the same) I think the problem is that the file system config on the mgs doesn't contain the tcp NIDs and/or the routes are not configured correctly. It seems the attempt to mount via tcp causes the client to use o2ib for the connections to the MDS and OSSes. So, I would recommend getting that working first and then looking at tcp0 at a later stage (if you need it at all - native o2ib is more performant).

Last but not least I have noticed a typo in your client mount command:

mount -t lustre 192.168.200.52@ob2:/mylustre /lustre

this should be "o2ib" here, too.

best regards,
Martin

On 07/20/2016 08:09 PM, sohamm wrote:
> Hi
>
> Any guidance/help on this is greatly appreciated.
>
> Thanks
>
> On Mon, Jul 18, 2016 at 7:25 PM, sohamm wrote:
>
>> Hi Ben
>> Both the networks have netmasks of value 255.255.255.0
>>
>> Thanks
>>
>> On Mon, Jul 18, 2016 at 10:08 AM, Ben Evans wrote:
>>
>>> What do your netmasks look like on each network?
>>>
>>> From: lustre-discuss on behalf of sohamm
>>> Date: Monday, July 18, 2016 at 1:56 AM
>>> To: "lustre-discuss@lists.lustre.org"
>>> Subject: Re: [lustre-discuss] lustre-discuss Digest, Vol 124, Issue 17
>>>
>>> Hi Thomas
>>> Below are the results of the commands you suggested.
>>>
>>> *From Client*
>>> [root@dev1 ~]# lctl ping 192.168.200.52@o2ib
>>> failed to ping 192.168.200.52@o2ib: Input/output error
>>> [root@dev1 ~]# lctl ping 192.168.111.52@tcp
>>> 12345-0@lo
>>> 12345-192.168.200.52@o2ib
>>> 12345-192.168.111.52@tcp
>>> [root@dev1 ~]# mount -t lustre 192.168.111.52@tcp:/mylustre /lustre
>>> mount.lustre: mount 192.168.111.52@tcp:/mylustre at /lustre failed: Input/output error
>>> Is the MGS running?
>>> mount: mounting 192.168.111.52@tcp:/mylustre on /lustre failed: Invalid argument
>>>
>>> cat /var/log/messages | tail
>>> Jul 18 01:37:04 dev1 user.warn kernel: [2250504.401397] ib1: multicast join failed for ff12:401b::::::, status -22
>>> Jul 18 01:37:26 dev1 user.warn kernel: [2250526.257309] LNet: No route to 12345-192.168.200.52@o2ib via (all routers down)
>>> Jul 18 01:37:36 dev1 user.warn kernel: [2250536.481862] ib1: multicast join failed for ff12:401b::::::, status -22
>>> Jul 18 01:41:53 dev1 user.warn kernel: [2250792.947299] LNet: No route to 12345-192.168.200.52@o2ib via (all routers down)
>>>
>>> *From MGS*
>>> [root@lustre_mgs01_vm03 ~]# lctl ping 192.168.111.102@tcp
>>> 12345-0@lo
>>> 12345-192.168.111.102@tcp
>>>
>>> Please let me know what else i can try. Looks like i am missing something
>>> with the ib config? Do i need router setup as part of lnet ?
>>> if i am able to ping mgs from client on the tcp network, it should still
>>> work ?
>>>
>>> Thanks
>>>
>>> On Sun, Jul 17, 2016 at 1:07 PM,
>>> To: "lustre-discuss@lists.lustre.org"
>>> Subject: [lustre-discuss] llapi_file_get_stripe() and /proc/fs/lustre/osc/entries
>>> Message-ID: <03ceaaa0-b004-ae43-eaa1-437da2a5b...@iodoctors.com>
>>> Content-Type: text/plain; charset="utf-8"; Format="flowed"
>>>
>>> I am using
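A typo like the "@ob2" in the mount command above is easy to overlook. As a rough sketch, a syntax check for the mount source could look like this; the function name is made up, and the pattern deliberately covers only a single IPv4 MGS NID on the o2ib and tcp network types that appear in this thread.

```shell
#!/bin/sh
# Rough syntax check for a mount source of the form <ipv4>@<lnet>:/<fsname>.
# Only the o2ib and tcp network types from this thread are accepted; host
# names, other LNet types and multiple MGS NIDs are deliberately not handled.
valid_mount_source() {
    printf '%s\n' "$1" |
        grep -Eq '^[0-9]+(\.[0-9]+){3}@(o2ib|tcp)[0-9]*:/[A-Za-z0-9_-]+$'
}

valid_mount_source '192.168.200.52@ob2:/mylustre'  || echo "typo in network name"
valid_mount_source '192.168.200.52@o2ib:/mylustre' && echo "looks fine"
```

This only catches spelling mistakes. Whether the client actually has an o2ib NID still has to be checked on the node itself, for example with lctl list_nids.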
Re: [lustre-discuss] rpmbuild error with lustre-2.8.0-3.10.0_327.3.1.el7_lustre.x86_64.x86_64.src.rpm
Hi Andreas,

I can't reproduce this with the latest master on a freshly installed CentOS 6.8. I have successfully built the server packages and also the client for the unpatched kernel, both without having heartbeat installed. Maybe the spec file has been fixed already. LU-5760 "LU-4707 patch breaks Lustre build" might be related.

Here is a discussion about the issue with Lustre 2.5 on el6.5 which I have found: http://comments.gmane.org/gmane.comp.file-systems.lustre.user/13961

Cheers,
Martin

On 06/29/2016 07:55 PM, Dilger, Andreas wrote:
> This is a bug in the RPM .spec file. While heartbeat is one option for HA on
> servers, it definitely should not be required. Could you please file a Jira
> ticket with details.
>
> Cheers, Andreas
>
>> On Jun 29, 2016, at 11:36, Martin Hecht <he...@hlrs.de> wrote:
>>
>> Hello,
>>
>> I have just seen that you managed to mount with a different kernel, but
>> let me come back to this error when building your own rpms for a
>> specific kernel.
>>
>> Independent if you use it or not, I believe on lustre servers you need
>> to have heartbeat installed nowadays. This is not installed by default
>> on a standard centos server, and it's a new requirement to build the
>> rpms since some 2.x release (it was optional before, and actually using
>> it is still optional). This requirement for building and installing the
>> server rpms is not mentioned in all tutorials and unfortunately the
>> absence of heartbeat is not properly detected by the configure system.
>> It would be better to fail earlier, during configure, with a clear error
>> message, rather than the error during make which you have seen here (has
>> anybody filed a lustre bug about this yet?)
>>
>> If you aim to build lustre client rpms only, you can use the rpmbuild
>> option --without servers to work around this problem, but If I didn't
>> miss anything in the discussion before you are trying to build the
>> server rpms with zfs, so --without servers is not suitable for you, but
>> mentioning it here might be helpful for others who run into the same
>> trouble.
>>
>> Martin
>>
>>> On 06/28/2016 04:55 PM, Yu Chen wrote:
>>> Hello,
>>>
>>> Trying to follow Christopher's advice to rebuild the lustre from src.rpm.
>>> However, got into this error:
>>>
>>> ...
>>>
>>> make[3]: Nothing to be done for `install-data-am'.
>>> make[3]: Leaving directory `/home/build/rpmbuild/BUILD/lustre-2.8.0/lustre'
>>> make[2]: Leaving directory `/home/build/rpmbuild/BUILD/lustre-2.8.0/lustre'
>>> make[1]: Leaving directory `/home/build/rpmbuild/BUILD/lustre-2.8.0/lustre'
>>> + :
>>> + ln -s Lustre.ha_v2 /home/build/rpmbuild/BUILDROOT/lustre-2.8.0-3.10.0_327.3.1.el7_lustre.x86_64.x86_64/etc/ha.d/resource.d/Lustre
>>> ln: failed to create symbolic link '/home/build/rpmbuild/BUILDROOT/lustre-2.8.0-3.10.0_327.3.1.el7_lustre.x86_64.x86_64/etc/ha.d/resource.d/Lustre': No such file or directory
>>> error: Bad exit status from /var/tmp/rpm-tmp.Rhg32s (%install)
>>> ..
>>>
>>> There seems someone posted to the list before about this error too, and no
>>> answers, wondering if anybody has some solutions now?
>>>
>>> Thanks in advance!
>>>
>>> Regards,
>>>
>>> Chen
Re: [lustre-discuss] rpmbuild error with lustre-2.8.0-3.10.0_327.3.1.el7_lustre.x86_64.x86_64.src.rpm
Hello,

I have just seen that you managed to mount with a different kernel, but let me come back to this error when building your own rpms for a specific kernel.

Regardless of whether you use it or not, I believe that on lustre servers you need to have heartbeat installed nowadays. It is not installed by default on a standard centos server, and it has been a requirement for building the rpms since some 2.x release (it was optional before, and actually using it is still optional). This requirement for building and installing the server rpms is not mentioned in all tutorials, and unfortunately the absence of heartbeat is not properly detected by the configure system. It would be better to fail earlier, during configure, with a clear error message, rather than with the error during make which you have seen here (has anybody filed a lustre bug about this yet?)

If you aim to build lustre client rpms only, you can use the rpmbuild option --without servers to work around this problem. If I didn't miss anything in the earlier discussion, you are trying to build the server rpms with zfs, so --without servers is not suitable for you, but mentioning it here might be helpful for others who run into the same trouble.

Martin

On 06/28/2016 04:55 PM, Yu Chen wrote:
> Hello,
>
> Trying to follow Christopher's advice to rebuild the lustre from src.rpm.
> However, got into this error:
>
> ...
> make[3]: Nothing to be done for `install-data-am'.
> make[3]: Leaving directory `/home/build/rpmbuild/BUILD/lustre-2.8.0/lustre'
> make[2]: Leaving directory `/home/build/rpmbuild/BUILD/lustre-2.8.0/lustre'
> make[1]: Leaving directory `/home/build/rpmbuild/BUILD/lustre-2.8.0/lustre'
> + :
> + ln -s Lustre.ha_v2
> /home/build/rpmbuild/BUILDROOT/lustre-2.8.0-3.10.0_327.3.1.el7_lustre.x86_64.x86_64/etc/ha.d/resource.d/Lustre
> ln: failed to create symbolic link
> '/home/build/rpmbuild/BUILDROOT/lustre-2.8.0-3.10.0_327.3.1.el7_lustre.x86_64.x86_64/etc/ha.d/resource.d/Lustre':
> No such file or directory
> error: Bad exit status from /var/tmp/rpm-tmp.Rhg32s (%install)
> ..
>
> There seems someone posted to the list before about this error too, and no
> answers, wondering if anybody has some solutions now?
>
> Thanks in advance!
>
> Regards,
>
> Chen
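For readers who only need the client packages, the --without servers workaround mentioned in this message could look roughly like the sketch below. The helper names `run` and `build_client_rpms` are made up for illustration, and the src.rpm path is whatever you downloaded; by default the helper only prints the command so it can be reviewed first.

```shell
# Print each command by default; set DRY_RUN=0 to actually execute it.
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else echo "+ $*"; fi
}

# Rebuild client-only rpms from a Lustre source rpm (hypothetical helper).
# "--without servers" is the rpmbuild option named in the message above.
build_client_rpms() {
    srpm=$1
    run rpmbuild --rebuild --without servers "$srpm"
}
```

Calling `build_client_rpms /path/to/lustre.src.rpm` prints the rpmbuild line; with `DRY_RUN=0` it runs it. This does not help when the server rpms are needed, as noted above.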
Re: [lustre-discuss] Apache via NFS via Lustre
I think whether the apache uid and gid need to be known on the MDS depends on whether you have configured mdt.group_upcall. If not, the group memberships are checked on the lustre client against its /etc/group (or ldap if that's configured).

On 03/09/2016 06:59 AM, Philippe Weill wrote:
> We use this on our lustre 2.5.3 but now your apache user uid and gid
> have to be known on mds server
>
> Le 09/03/2016 03:05, John Dubinski a écrit :
>> Hi,
>>
>> I'm wondering if there are any developments on this front.
>>
>> We also NFS export some lustre filesystems from a client to an apache
>> server so that users can link to their large datasets on their
>> personal websites. This has been working for years for us using
>> lustre 1.8.
>>
>> We recently built some new systems using lustre 2.5.3 and now this
>> functionality is broken in the same way that Eric describes -
>> symlinks to directories and files on the lustre filesystem are denied
>> by the apache server. This doesn't seem to be due to our apache
>> configuration since symlinks to files and directories in ordinary
>> (non-lustre) nfs-mounted filesystems work. Also the nfs-exported
>> filesystems behave normally - you can copy files in, as well as
>> create and delete files as you wish.
>> The only problems arise in relation to apache access.
>>
>> We've also noticed that whenever the forbidden access messages comes
>> up in the browser /var/log/messages on the lustre client spits out
>> this error consistently:
>>
>> Mar 8 19:53:14 nuexport02 kernel: LustreError:
>> 2626:0:(mdc_locks.c:918:mdc_enqueue()) ldlm_cli_enqueue: -13
>>
>> This appears to be related to file locking looking at the code...
>>
>> We have also built a test apache server with lustre client modules
>> that directly mount our lustre filesystems. symlinks to the
>> directories in the lustre fs within /var/www/html similarly return
>> the forbidden access message with the above mdc_locks error.
>>
>> We're running CentOS 6 with lustre 2.5.3 on the client and server
>> side. To repeat, direct client mounts of the lustre filesystems
>> behave normally as well as nfs-exported mounts. Only apache access
>> to symlinks of files on a lustre filesystem give trouble.
>>
>> Are there any special nfs export flags that can be set to help in
>> /etc/exports?
>>
>> Thanks for any help or insight!
>>
>> Regards,
>>
>> John
>>
>> --
>> John Dubinski
>> Canadian Institute for Theoretical Astrophysics
>> University of Toronto     phone: 416-946-7290
>> 60 St. George St.         fax: 416-946-7287
>> Toronto, Ontario          e-mail: dubin...@cita.utoronto.ca
>> CANADA M5S 3H8            url: www.cita.utoronto.ca
Re: [lustre-discuss] recovery MDT ".." directory entries (LU-5626)
Hi, comments inline...

On 11/04/2015 01:34 PM, Patrick Farrell wrote:
> Our observation at the time was that lfsck did not add the fid to the ..
> dentry unless there was already space in the appropriate location.

Ok, I might have been wrong on this point, and some manual mv by the users was involved.

On 11/04/2015 04:24 PM, Chris Hunter wrote:
> Yes I believe you want to (manually) recover the directories from
> lost+found back to ROOT on the MDT before lfsck/oi_scrub runs. I don't
> think lfsck on the MDT will impact orphan objects on the OSTs.

With lfsck phase 2, introduced in lustre 2.6, the MDT-OST consistency is checked and repaired. Chris, you wrote that you have upgraded to "lustre 2.x", so I don't know if you have lfsck II already. And I'm not sure if MDT entries in lost+found are ignored by lfsck. I just wanted to point out that you might have to be careful here, but looking at the lustre manual it turns out that you are right. The consistency checks are run when the lfsck type is set to "layout", which is a different thing from the "namespace" check used to update the FIDs.

On 11/05/2015 01:29 AM, Dilger, Andreas wrote:
> Note that newer versions of LFSCK namespace checking (2.6 or 2.7, don't
> recall offhand) will be able to return such entries from lost+found back
> into the proper parent directory in the namespace, assuming they were
> created under 2.x. Lustre stores an extra "link" xattr on each inode with
> the filename and parent directory FID for each link to the file (up to the
> available xattr space for each inode), so in case of directory corruption
> it would be possible to rebuild the directory structure just from the
> "link" xattrs on each file.

That's good to know. However, the files in this case were created with 1.8, so even if the current version after the upgrade has this "link" xattr, it doesn't help to recover from LU-5626. But your script is useful (it's pretty much the same as what I did back then, but I couldn't find my quick hack anymore...)

> In the meantime, I attached a script to LU-5626 that could be used to
> re-link files from lost+found into the right directory and filename based
> on the output from e2fsck. It is a bit rough (needs manual editing of
> pathnames), but may be useful if someone has hit this problem.

best regards,
Martin
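To make the distinction between the two lfsck types in this message concrete, here is a minimal sketch of how either check could be started with lctl. The helper names `run` and `start_lfsck` and the MDT device name in the usage note are assumptions for illustration; which types are available depends on the installed Lustre version, so check the manual for your release before running anything.

```shell
# Print each command by default; set DRY_RUN=0 to actually execute it.
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else echo "+ $*"; fi
}

# Start one lfsck type on one MDT (hypothetical helper).
#   namespace: the check used to update the FIDs
#   layout:    the MDT-OST consistency check
start_lfsck() {
    mdt=$1
    type=$2
    case $type in
        namespace|layout)
            run lctl lfsck_start -M "$mdt" -t "$type"
            ;;
        *)
            echo "unknown lfsck type: $type" >&2
            return 2
            ;;
    esac
}
```

For example, `start_lfsck testfs-MDT0000 namespace` (with a made-up device name) prints the lctl command; nothing is executed unless `DRY_RUN=0` is set.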
Re: [lustre-discuss] recovery MDT ".." directory entries (LU-5626)
On 11/04/2015 03:23 AM, Patrick Farrell wrote:
> PAF: Remember, the specific conditions are pretty tight. Created under 1.8,
> not empty (if it's empty, the .. dentry is not misplaced when moved) but also
> non-htree, then moved with dirdata enabled, and then grown to this larger
> size. How many existing (small) directories do you move and then add a bunch
> of files to? It's a pretty rare operation. We only hit it at Martin's site
> because of an automated tool they have to re-arrange user/job directories.

Well, not only because of the tool. In particular, once the tool has moved the directories, no files are added to them anymore. However, our mechanism gives the users a reason to move their data from time to time (that's not the intention of the mechanism, but that's how some users react).

But I'm not quite sure anymore whether moving the directories is really a precondition for running into LU-5626. We have run the background lfsck, which adds the FID to the existing dentries. This might be an important detail, because in our case a second '..' entry containing the FID was presumably created by lfsck (in the wrong place), and not by moving the directory. To my current understanding, the user then only has to add some files to trigger the LBUG.

A subsequent e2fsck will find not only this particular directory but all other small directories with a '..' entry in the wrong place. When e2fsck tries to fix these directories, some entries are overwritten by the FID, and these files are then moved to lost+found. If one of these first entries happens to be a small subdirectory, I believe there is a chance of running into the same issue again when you move everything back to the original location after the e2fsck and someone starts adding files to these subdirectories.

However, the preconditions are still quite narrow: small directories, not empty, created without fid, then converted by lfsck (or alternatively moved to a different place, which would also create the second '..' entry). To trigger the LBUG, files need to be added to one of these directories, and for a second occurrence of the LBUG the same conditions must hold for another subdirectory, which must have been at the very beginning of the directory.

Martin
Re: [lustre-discuss] recovery MDT ".." directory entries (LU-5626)
Hi Chris and Patrick,

I was sick last week, so I only found this conversation today, sorry.

On 10/27/2015 05:06 PM, Patrick Farrell wrote:
> If you read LU-5626 carefully, there's an explanation of the exact nature of
> the damage, and having that should let you make partial recoveries by hand.
> I'm not familiar with the ll_recover_lost_found_objs tool, but I doubt it
> would prove helpful in this instance.

There is no tool like ll_recover_lost_found_objs for the MDT. On OSTs this would be the right choice.

> Note that there's two forms to this corruption. One is if you move a
> directory which was created before dirdata was enabled, then the '..' entry
> ends up in the wrong place. This does not trouble Lustre, but fsck reports
> it as an error and will 'correct' it, which has the effect of (usually)
> overwriting one dentry in the directory when it creates a new '..' dentry in
> the correct location.
>
> I don't *think* that one causes the MDT to go read only, but I could be
> wrong. I *think* what causes the MDT to go read only is the other problem:
>
> When you have a non-htree directory (not too many items in it, all directory
> entries in a single inode) that is in the bad state described above (with the
> '..' dentry in the wrong place after being moved) and that directory has
> enough files added to it that it becomes an htree directory, the resulting
> directory is corrupted more severely. We never sorted out the precise
> details of this - I believe we chose to simply delete any directories in this
> state. (I think lfsck did it for us, but can't recall for sure.)

If I recall correctly, moving (or renaming) the corrupted directory to another place caused the MDT to go read-only; adding more files, as Patrick wrote before, is probably another trigger.

In our case we captured the full output of e2fsck, which contained the original names and the inodes. fsck moved some of the files and subdirectories of the corrupted directories to lost+found. With the information contained in the e2fsck output we could move them back from lost+found to their original place on the ldiskfs level (I parsed the e2fsck output for a pattern matching the inode numbers and created a script out of it). We had to repeat this a couple of times, because some of the subdirectories moved to lost+found were either in bad shape themselves or were damaged further when the owners later added files to them or moved them around.

So, if you have captured all your e2fsck output and you haven't yet cleaned up lost+found, you can still recover the data. lfsck would probably throw away the objects on the OSTs because it thinks they are orphan objects left over after deleting the files.

best regards,
Martin
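The "created a script out of it" step described above can be sketched as follows. This is not the original hack and not the script attached to LU-5626; it assumes you have already extracted a list of "<inode> <original path>" pairs from your own e2fsck log (that mapping format is hypothetical), and it relies on e2fsck naming lost+found entries `#<inode>`. The helper name `relink_cmds` is made up. It only prints the mv commands so they can be reviewed and edited by hand before anything is moved on the ldiskfs-mounted MDT.

```shell
# Read "<inode> <original path>" pairs on stdin and print one mv command
# per pair. Nothing is moved; review the output before running it.
# $1 is the lost+found directory of the ldiskfs-mounted target.
relink_cmds() {
    lost_found=$1
    while read -r inode path; do
        # skip empty lines
        [ -n "$inode" ] || continue
        printf 'mv "%s/#%s" "%s"\n' "$lost_found" "$inode" "$path"
    done
}
```

Usage would be something like `relink_cmds /mnt/mdt/lost+found < inode-map.txt > relink.sh`, where both file names are placeholders. Paths that contain double quotes or other shell metacharacters would need extra escaping, which this sketch does not attempt.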
Re: [lustre-discuss] Lustre 2.5.3 - OST unable to connect to MGS
Hi,

you can use ll_recover_lost_found_objs to recover the files in lost+found to their original location. I think this should be the first step.

Also these messages look a bit scary to me:

Oct 7 13:02:04 OSS50 kernel: LustreError: 0-0: Trying to start OBD Lustre-OST003b_UUID using the wrong disk <85>. Were the /dev/ assignments rearranged?
...
Oct 7 13:02:04 OSS50 kernel: LustreError: 15b-f: MGC172.16.0.251@tcp: The configuration from log 'Lustre-OST003b'failed from the MGS (-22). Make sure this client and the MGS are running compatible versions of Lustre.
Oct 7 13:02:05 OSS50 kernel: LustreError: 15c-8: MGC172.16.0.251@tcp: The configuration from log 'Lustre-OST003b' failed (-22). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.

Before actually instructing tunefs.lustre to do the writeconf, I would check the configuration, parameters etc. with --dryrun. Maybe you also have to use --erase-params and re-configure the OST. Or other CONFIG files (e.g. mountdata) got screwed up on this OST (or were moved to lost+found by the e2fsck?). If you have lost some important ones, a copy of some of the data exists on the MGT (basically, the writeconf is the mechanism which transfers it to the MGS).

It's a bit difficult to give good advice by looking at the syslog messages only. Anyhow, recovering the files from lost+found should be the first step, maybe followed by a closer look at the OST on the ldiskfs level.

regards,
Martin
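The order of steps suggested in this message (recover lost+found first, then inspect the configuration with --dryrun before any writeconf) could be sketched like this. The helper names `run` and `recover_ost` are made up, and the device and mount point are placeholders to be passed in; the commands are only printed unless `DRY_RUN=0` is set.

```shell
# Print each command by default; set DRY_RUN=0 to actually execute it.
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else echo "+ $*"; fi
}

# Hypothetical helper: recover OST objects from lost+found, then show
# what tunefs.lustre currently sees on the target without changing it.
recover_ost() {
    dev=$1
    mnt=$2
    # the OST must be mounted as ldiskfs, not as lustre, for the recovery
    run mount -t ldiskfs "$dev" "$mnt" &&
    run ll_recover_lost_found_objs -v -d "$mnt/lost+found" &&
    run umount "$mnt" &&
    # --dryrun only prints the on-disk configuration
    run tunefs.lustre --dryrun "$dev"
}
```

Only after the --dryrun output has been checked would one decide whether a writeconf or --erase-params is needed at all.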
Re: [lustre-discuss] Remove failnode parameter
Hi,

--erase-params should remove everything, but you have to set the other ones (e.g. --mgsnode) again. If you don't want to have any --failnode, just leave that one out. At least this is how it should work. If not, you could post the exact command line and the output of the tunefs command here.

Martin

On 09/27/2015 05:16 PM, Exec Unerd wrote:
> Thanks for the reply.
>
> I can't seem to get tunefs to *remove *the failnode parameter. I can
> *change *the failnode NIDs, but I can't figure out how to wholesale remove
> the param as if I'd never put it in.
>
> On Thu, Sep 24, 2015 at 1:43 AM, Martin Hecht <he...@hlrs.de> wrote:
>
>> On 09/23/2015 02:38 AM, Exec Unerd wrote:
>>> I made a typo when setting failnode/servicenode parameters, but I can't
>>> figure out how to remove the failnode parameter entirely
>>>
>>> I can change the failnode NIDs, but I can't figure out how to completely
>>> remove "failnode" from the system.
>>>
>>> Does anyone have an example of a syntax (maybe lctl?) that will eliminate
>>> the failnode parameter from the config so there's no chance it gets in
>> the
>>> way of the servicenode parameter?
>>>
>> you have set the failnode with tunefs.lustre, right? You can erase *all*
>> parameters with tunefs.lustre --erase-params and set the correct ones
>> again. You can combine several ones in one call, and I recommend to use
>> also the dry-run option before actually changing anything
>>
>> tunefs.lustre --erase-params --mgsnode=10.11.12.13@o2ib --param
>> sys.timeout=300 --failnode=10.11.12.101@o2ib --dryrun
>> /dev/mapper/some-device
>>
>> the output will be the previous values and the premanent disk data which
>> the command intents to write. If this is ok, ommit the --dryrun option.
>> BTW the file system must be unmounted to perform the tunefs command.

--
Dr. Martin Hecht
High Performance Computing Center Stuttgart (HLRS)
Office 0.051, HPCN Production, IT-Security
University of Stuttgart
Nobelstraße 19, 70569 Stuttgart, Germany
Tel: +49(0)711/685-65799 Fax: -55799
Mail: he...@hlrs.de
Web: http://www.hlrs.de/people/hecht/
PGP Key Fingerprint: 41BB 33E9 7170 3864 D5B3 44AD 5490 010B 96C2 6E4A
Re: [lustre-discuss] Multiple MGS interfaces config
On 09/27/2015 08:59 PM, Exec Unerd wrote:
>> I'm not sure if I have understood your setup correctly.
> In this case, the clients are a combination of all three: some are o2ib
> only, some tcp only, and some o2ib+tcp with tcp as failover.
>
> It sounds like I need a combination of configurations, one for the OSSes
> and one for each client type.
>
> So if I used this parameter in the OST,
> --mgsnode="172.16.10.1@o2ib0,192.168.10.1@tcp0"
>
> Then configured the modprobe.d/lustre.conf appropriately on the clients
> tcp: options lnet networks="tcp0(ixgbe1)"
> o2ib: options lnet networks="o2ib0(ib1)"
> both: options lnet networks="o2ib0(ib1),tcp0(ixgbe1)"
>
> And use these mount parameters:
> tcp: mount -v -t lustre 192.168.10.1@tcp0:/testfs /mnt/testfs
> o2ib: mount -v -t lustre 172.16.10.1@o2ib0:/testfs /mnt/testfs
> both: mount -v -t lustre 172.16.10.1@o2ib0,192.168.10.1@tcp0:/testfs

I think here it should be a colon between the two MGS nids:
mount -v -t lustre 172.16.10.1@o2ib0:192.168.10.1@tcp0:/testfs

> /mnt/testfs
>
> Everything should be happy?
>
> On Thu, Sep 24, 2015 at 9:12 AM, Martin Hecht <he...@hlrs.de> wrote:
>
>> On 09/24/2015 05:33 PM, Chris Hunter wrote:
>>> [...]
>>>>2. What's the best way to trace the TCP client interactions to see
>>>> where
>>>>it's breaking down?
>>> If lnet is running on the client, you can try "lctl ping"
>>> eg) lctl ping 172.16.10.1@o2ib
>>>
>>> I believe a lustre mount uses ipoib for initial handshake with a mds
>>> o2ib interfaces. You should make sure regular ping over ipoib is
>>> working before mounting lustre.
>> if the client and the server is on the same network, yes, it's a good
>> starting point. But it's not a prerequisite. In general you can have an
>> lnet router in-between or have different ip subnets for ipoib, so you
>> can't ping on the ipoib layer, but you can still lctl ping the whole
>> path (although you could verify that you can ip ping to the next hop at
>> least).
>>
>> We also have a case in which we tried to block ipoib completely with
>> iptables, but we still could lctl ping, even after rebooting the host
>> and ensuring that the firewall was up before loading the lnet module.
>> So, I doubt that ipoib is needed at all for establishing the o2ib
>> connection.
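The comma-versus-colon question in this message is easier to see when the mount source is assembled from its parts. As far as I understand the Lustre manual, a colon separates the NIDs of different MGS nodes and a comma separates several NIDs of the same node, while the thread reports that the comma form did not work reliably for everyone. The helper below is a sketch with a made-up name (`mount_source`); it only builds the string and takes the separator as an argument so both variants can be tried.

```shell
# Build the device part of a lustre mount command (hypothetical helper).
#   $1    separator placed between the NIDs ("," or ":")
#   $2    file system name
#   $3..  MGS NIDs in the order they should be tried
mount_source() {
    sep=$1
    fsname=$2
    shift 2
    spec=
    for nid in "$@"; do
        # append the separator only between entries, not before the first
        spec=${spec:+$spec$sep}$nid
    done
    printf '%s:/%s\n' "$spec" "$fsname"
}
```

For example, `mount -v -t lustre "$(mount_source : testfs 172.16.10.1@o2ib0 192.168.10.1@tcp0)" /mnt/testfs` gives the colon form proposed above; passing `,` instead yields the comma form from the quoted mail.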
Re: [lustre-discuss] Multiple MGS interfaces config
On 09/23/2015 02:39 AM, Exec Unerd wrote:
> My environment has both TCP and IB clients, so my Lustre config has to
> accommodate both, but I'm having a hard time figuring out the proper syntax
> for it. Theoretically, I should be able to use comma-separated interfaces
> in the mgsnode parameter like this:
>
> --mgsnode=192.168.10.1@tcp0,172.16.10.1@o2ib
> --mgsnode=192.168.10.2@tcp0,172.16.10.2@o2ib

I think this should work:
--mgsnode=192.168.10.1@tcp0 --mgsnode=172.16.10.1@o2ib --mgsnode=192.168.10.2@tcp0 --mgsnode=172.16.10.2@o2ib
At least that's how it works with a multirail ib network (where you would replace tcp0 by o2ib1). The mount command would contain all 4 nids, but if the client can't connect via tcp, it waits until it reaches a timeout and then tries the next one. If in addition the MGS has failed over to the second server, I guess it takes three timeouts until the client succeeds in connecting.

> The problem is, this doesn't work for all clients all the time ...
> randomly. It would work, then it wouldn't. Googling, I found some known
> defects saying that the comma delimiter didn't work as per the manual and
> recommending alternate syntaxes like using the colon instead of a comma. I
> know what the manuals *say*about the syntax, I'm just having trouble
> getting it to work.

I'm not sure if I have understood your setup correctly. You have ib clients and you have other hosts which are connected via tcp, right? Or do the clients have both, with the tcp network as a fallback solution in case the ib doesn't work properly (network flooded, SM crashed or the like)?

When you say it doesn't work on a particular client, can you lctl ping one of the nids in this situation? Or can you ping in the other direction, from the server to the client? And if at least one of the pings succeeds, can you suddenly mount afterwards?

> This seems to affect only the TCP clients; at least I haven't seen it
> affect any of the IB clients. It may be a comma parsing problem or
> something else.
>
> I have two questions for the group:
>
>1. Is there a known-working method for using both TCP and IB interface
>NIDs for the MGS in this manner?
>2. What's the best way to trace the TCP client interactions to see where
>it's breaking down?
>
> Versions in use:
> kernel: 2.6.32-504.23.4.el6.x86_64
> lustre: lustre-2.7.58-2.6.32_504.23.4.el6.x86_64_g051c25b.x86_64
> zfs: zfs-0.6.4-76_g87abfcb.el6.x86_64
>
> My lustre.conf contents:
> options lnet networks="o2ib0(ib1),tcp0(ixgbe1)"

ip2nets could be an alternative here, especially if not all clients have both interfaces.
Re: [lustre-discuss] Remove failnode parameter
On 09/23/2015 02:38 AM, Exec Unerd wrote:
> I made a typo when setting failnode/servicenode parameters, but I can't
> figure out how to remove the failnode parameter entirely
>
> I can change the failnode NIDs, but I can't figure out how to completely
> remove "failnode" from the system.
>
> Does anyone have an example of a syntax (maybe lctl?) that will eliminate
> the failnode parameter from the config so there's no chance it gets in the
> way of the servicenode parameter?
>

You have set the failnode with tunefs.lustre, right? You can erase *all* parameters with tunefs.lustre --erase-params and set the correct ones again. You can combine several of them in one call, and I recommend also using the dry-run option before actually changing anything:

tunefs.lustre --erase-params --mgsnode=10.11.12.13@o2ib --param sys.timeout=300 --failnode=10.11.12.101@o2ib --dryrun /dev/mapper/some-device

The output will be the previous values and the permanent disk data which the command intends to write. If this is ok, omit the --dryrun option. BTW the file system must be unmounted to perform the tunefs command.
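The dry-run-first habit recommended in this message can be wrapped in a few lines of shell. This is only a sketch: the helper name `retune` and the `TUNEFS` override variable are made up, and the options passed in are whatever you want the target to end up with (leave out --failnode if you want it gone). The target must be unmounted, as noted above.

```shell
# Allow the binary to be overridden, e.g. for testing the wrapper itself.
TUNEFS=${TUNEFS:-tunefs.lustre}

# Hypothetical helper: show what would be written, then ask before writing.
#   $1    device of the unmounted target
#   $2..  the parameters to set again after --erase-params
retune() {
    dev=$1
    shift
    # first pass: --dryrun prints the old and the intended new values
    "$TUNEFS" --erase-params "$@" --dryrun "$dev" || return 1
    printf 'Write these parameters to %s? [y/N] ' "$dev" >&2
    read -r answer
    if [ "$answer" = y ]; then
        # second pass: same options without --dryrun
        "$TUNEFS" --erase-params "$@" "$dev"
    else
        echo "nothing written" >&2
    fi
}
```

For example: `retune /dev/mapper/some-device --mgsnode=10.11.12.13@o2ib --param sys.timeout=300`.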
Re: [lustre-discuss] Multiple MGS interfaces config
On 09/24/2015 05:33 PM, Chris Hunter wrote:
> [...]
>>2. What's the best way to trace the TCP client interactions to see
>> where
>>it's breaking down?
> If lnet is running on the client, you can try "lctl ping"
> eg) lctl ping 172.16.10.1@o2ib
>
> I believe a lustre mount uses ipoib for initial handshake with a mds
> o2ib interfaces. You should make sure regular ping over ipoib is
> working before mounting lustre.

If the client and the server are on the same network, yes, it's a good starting point. But it's not a prerequisite. In general you can have an lnet router in between, or have different ip subnets for ipoib, so you can't ping on the ipoib layer, but you can still lctl ping the whole path (although you could at least verify that you can ip ping the next hop).

We also have a case in which we tried to block ipoib completely with iptables, but we could still lctl ping, even after rebooting the host and ensuring that the firewall was up before loading the lnet module. So, I doubt that ipoib is needed at all for establishing the o2ib connection.
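Since lctl ping tests the whole LNet path, a quick loop over all configured NIDs is a cheap first check from a client that refuses to mount. The sketch below uses made-up names (`ping_nids`, the `LCTL` override variable); it simply reports ok or FAIL per NID based on the exit status of `lctl ping` and returns non-zero if any NID failed.

```shell
# Allow the binary to be overridden, e.g. for testing the loop itself.
LCTL=${LCTL:-lctl}

# Hypothetical helper: lctl ping every NID given as an argument.
ping_nids() {
    failed=0
    for nid in "$@"; do
        if "$LCTL" ping "$nid" >/dev/null 2>&1; then
            echo "ok   $nid"
        else
            echo "FAIL $nid"
            failed=1
        fi
    done
    return "$failed"
}
```

For example: `ping_nids 172.16.10.1@o2ib 192.168.10.1@tcp`. Running the same loop on the server against the client NIDs checks the other direction.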
Re: [lustre-discuss] 1.8 client on 3.13.0 kernel
Hi,

A backup is always a good idea if feasible. It gives you at least the chance to go back and start over again. However, a backup of the MDT alone wouldn't help much, because as soon as you put the file system online and users start to work on their files, the content of the OSTs will change as well. Restoring the MDT backup would leave the MDT out of sync with the OSTs.

You notice some bugs immediately during the upgrade (e.g. the one with the CATALOGS file which prevents you from starting the MDT again), but some others (e.g. quota bugs or the one about the FID) pop up a few hours or days after you have started production again, and then you have to make a decision. Even if you have a full backup, it's always a trade-off whether you try to fix the problems based on the state you are in or go back and restore the backup. But even then, you have to put some measures in place which ensure that you won't run into the same problem again. In the worst case that means reinstalling the servers with the lustre version you used before. A full backup at least gives you this fallback for the worst-case scenario. It can also be useful for offline analysis in case you have to investigate what's going wrong.

In the particular case with the FID in the directory, a file-level backup of the MDT wouldn't have been of much help, because you also have to back up the extended attributes. There is a section in the lustre manual on how to do this. However, these structures must be converted (at least if you want to make use of the fid_in_dirent feature). If I'm not mistaken, the structures were ok right after the upgrade and the subsequent lfsck run, but the ldiskfs backend contained a bug which caused things to be overwritten when users started to move files somewhere else. Lustre 2.4.3 is marked as affected in LU-5626.
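For reference, the file-level backup with extended attributes mentioned above follows, as far as I remember the lustre manual, the pattern sketched below: dump the xattrs with getfattr, then archive the files with tar. The helper name `backup_mdt`, the output file names and the `GETFATTR`/`TAR` override variables are made up for this sketch; it assumes the MDT is already mounted as ldiskfs and that the output directory lies outside that mount. Please check the manual for your Lustre version before relying on it.

```shell
# Allow the binaries to be overridden, e.g. for testing the helper itself.
GETFATTR=${GETFATTR:-getfattr}
TAR=${TAR:-tar}

# Hypothetical helper: file-level backup of an ldiskfs-mounted target.
#   $1  mount point of the target (mounted with -t ldiskfs)
#   $2  existing output directory outside of that mount point
backup_mdt() {
    mnt=$1
    out=$2
    (
        cd "$mnt" || exit 1
        # save all extended attributes in a form setfattr --restore can read
        "$GETFATTR" -R -d -m '.*' -e hex -P . > "$out/ea.bak" &&
        # archive the files themselves; --sparse keeps sparse files small
        "$TAR" czf "$out/mdt.tgz" --sparse .
    )
}
```

A restore would unpack the archive and then replay the attribute dump with `setfattr --restore=ea.bak`; as explained above, this alone does not protect against the conversion bug discussed in this thread.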
regards,
Martin

On 09/11/2015 04:14 PM, Patrick Farrell wrote:
> Having an MDT backup might perhaps have allowed recovery and trying an
> improved upgrade process and/or upgrading to a version with the fixes in it.
> It's not a bad idea if practical. (And yes, the changes are MDT specific.)
>
> By the way, the fid-in-dirent bug that Martin described is fixed in the most
> recent 2.5 from Intel, but I don't think it's fixed in 2.4? Unsure.
> But I'd recommend targeting 2.5 as the destination version for an upgrade.
>
> From: lustre-discuss [lustre-discuss-boun...@lists.lustre.org] on behalf of
> Chris Hunter [chris.hun...@yale.edu]
> Sent: Friday, September 11, 2015 8:02 AM
> To: lustre-discuss@lists.lustre.org
> Subject: Re: [lustre-discuss] 1.8 client on 3.13.0 kernel
>
> Hi
> I believe FID & dirdata feature changes would only affect the MDT during
> a lustre upgrade. In hindsight/retrospective do you think a file-level
> backup/restore of the MDT would have avoided some of these issues ?
>
> thanks
> chris hunter
>
>> On 9/10/15 11:17 AM, Mohr Jr, Richard Frank (Rick Mohr) wrote:
>>> Lewis,
>>>
>>> I did an upgrade from Lustre 1.8.6 to 2.4.3 on our servers, and for the
>>> most part things went pretty good. I'll chime in on a couple of Martin's
>>> points and mention a few other things.
>>>
>>>> On Sep 10, 2015, at 9:30 AM, Martin Hecht <he...@hlrs.de> wrote:
>>>>
>>>> In any case the file systems should be clean before starting the
>>>> upgrade, so I would recommend to run e2fsck on all targets and repair
>>>> them before starting the upgrade. We did so, but unfortunately our
>>>> e2fsprogs were not really up to date and after our lustre upgrade a lot
>>>> of fixes for e2fsprogs were committed to whamclouds e2fsprogs git. So,
>>>> probably some errors on the file systems were still present, but
>>>> unnoticed when we did the upgrade.
>>> This is a very important point. While I didn't run e2fsck before the
>>> upgrade (but maybe I should have), I made sure to install the latest
>>> e2fsprogs.
>>>
>>>> Lustre 2 introduces the FID (which is something like an inode number,
>>>> where lustre 1.8 used the inode number of the underlying ldiskfs, but
>>>> with the possibility to have several MDTs in one file system a
>>>> replacement was needed). The FID is stored in the inode, but it can also
>>>> be activated that the FIDs are stored in the directory node, which makes
>>>> lookups faster, especially when there are many files in a directory.
>>>> However, there were bugs in the code that takes care about adding the
>>>> FID to the directory entry when the file system is converted from 1.8 to
>>>> 2.x. So, I would recommend to use a version in
Re: [lustre-discuss] 1.8 client on 3.13.0 kernel
a few more comments in-line On 09/10/2015 09:11 PM, Lewis Hyatt wrote: > Thanks a lot for the info, a little more optimistic :-). > > -Lewis > > On 9/10/15 11:17 AM, Mohr Jr, Richard Frank (Rick Mohr) wrote: >> Lewis, >> >> I did an upgrade from Lustre 1.8.6 to 2.4.3 on our servers, and for >> the most part things went pretty good. I’ll chime in on a couple of >> Martin’s points and mention a few other things. >> >>> On Sep 10, 2015, at 9:30 AM, Martin Hecht <he...@hlrs.de> wrote: >>> >>> In any case the file systems should be clean before starting the >>> upgrade, so I would recommend to run e2fsck on all targets and repair >>> them before starting the upgrade. We did so, but unfortunately our >>> e2fsprogs were not really up to date and after our lustre upgrade a lot >>> of fixes for e2fsprogs were committed to whamclouds e2fsprogs git. So, >>> probably some errors on the file systems were still present, but >>> unnoticed when we did the upgrade. >> >> This is a very important point. While I didn’t run e2fsck before the >> upgrade (but maybe I should have), I made sure to install the latest >> e2fsprogs. well, a version of the e2fsprogs with some important fixes was released shortly after we did the upgrade. Maybe this was just because we ran into these bugs, and the vendor escalated our tickets to whamcloud/intel >> >>> Lustre 2 introduces the FID (which is something like an inode number, >>> where lustre 1.8 used the inode number of the underlying ldiskfs, but >>> with the possibility to have several MDTs in one file system a >>> replacement was needed). The FID is stored in the inode, but it can >>> also >>> be activated that the FIDs are stored in the directory node, which >>> makes >>> lookups faster, especially when there are many files in a directory. >>> However, there were bugs in the code that takes care about adding the >>> FID to the directory entry when the file system is converted from >>> 1.8 to >>> 2.x. 
>>> So, I would recommend to use a version in which these bugs are
>>> solved. We went to 2.4.1 that time. By default this fid_in_dirent
>>> feature is not automatically enabled, however, this is the only point
>>> where a performance boost may be expected... so we took the risk to
>>> enable this... and ran into some bugs.
>>
>> Enabling fid_in_dirent prevents you from backing out of the upgrade.
>> In theory, if you upgraded to Lustre 2.x without enabling
>> fid_in_dirent, you could always revert back to Lustre 1.8. We tried
>> this on a test system, and the downgrade seemed to work. However,
>> this was a small scale test and I have never tried it on a production
>> file system. But if you want to minimize possible complications, you
>> could always leave this disabled for a while after the upgrade, and
>> then if things are going well, enable it later on.

Actually, with fid_in_dirent the FID is only added to new contents, and you have to run the oi_scrub once to convert the existing file system. That might be important to know when you decide to use this feature. On the other hand, if you don't enable fid_in_dirent, you can theoretically go back, but I think the FID is still added to regular files (not to the directory entry), and you can't read files created with lustre 2 after the downgrade. However, running lustre 2 without fid_in_dirent is possible, at least in the earlier 2.x versions; from about 2.5 onwards you would have to double check. This is sometimes called "Compatibility Mode IGIF".

Anyhow, to avoid running into the problem with the directory entries, I would also recommend either not enabling fid_in_dirent or making sure to choose a version which has all the fixes for this problem. There are different types of directories, large and small ones, which have a different structure. The issue had already been fixed for some cases, but we hit another case which was not handled correctly until we ran into that bug with our upgrade.
>> >> My only other advice is to test as much as possible prior to the >> upgrade. If you have a little test hardware, install the same Lustre >> 1.8 version you are currently running in production and then try >> upgrading that to the new Lustre version. I think preparation is the >> key. I think I spent about 2 months reading about upgrade >> procedures, talking with others who have upgraded, reading JIRA bug >> reports, and running tests on hardware. well, our vendor was preparing the upgrade for about a year and did intensive testing on several file systems and they changed the targeted lustre version several times. The problem is that some bugs are on
Re: [lustre-discuss] [HPDD-discuss] possible to read orphan ost objects on live filesystem?
On 09/11/2015 05:23 AM, Dilger, Andreas wrote:
> On 2015/09/10, 6:54 PM, "Chris Hunter" wrote:
>
>> We experienced file corruption on several OSTs. We proceeded through
>> recovery using e2fsck & ll_recover_lost_found_obj tools.
>> Following these steps, e2fsck came out clean.
>>
>> The file corruption did not impact the MDT. The files were still
>> referenced by the MDT. Accessing the file on a lustre client (ie. ls -l)
>> would report error "Cannot allocate memory"
>>
>> Following OST recovery steps, we started removing the corrupt files via
>> "unlink" command on lustre client (rm command would not remove file).
>>
>> Now dry-run e2fsck of the OST is reporting errors:
>> "deleted/unused inodes" in Pass 2 (checking directory structure),
>> "Unattached inodes" in Pass 4 (checking reference counts)
>> "free block count wrong" in Pass 5 (checking group summary information).
>>
>> Is e2fsck errors expected when unlinking files ?
> No, the "unlink" command is just avoiding the -ENOENT error that "rm" gets
> by calling "stat()" on the file before trying to unlink it. This
> shouldn't cause any errors on the OSTs, unless there is ongoing corruption
> from the back-end storage.

Chris, by "live filesystem" do you mean that you ran a read-only e2fsck on a lustre file system while it was mounted and clients were working on it? In that case it is expected that e2fsck reports some errors, because the file system contents change while e2fsck is running and the in-memory directory structure no longer matches the on-disk data. However, as Andreas points out, it might as well be a sign of ongoing corruption on the storage. Only an offline e2fsck (i.e. while the OST is unmounted and the journal has been played back) can clarify this.

regards, Martin

smime.p7s Description: S/MIME Cryptographic Signature
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
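The distinction between a check on a live target and an offline check can be sketched as a small guard. This is only an illustration, not a procedure from the thread: the function names and the device path are hypothetical, and the command is printed rather than executed. `e2fsck -fn` forces a read-only check; note that `-n` does not replay the journal itself, so the sketch assumes the target was unmounted cleanly.

```shell
#!/bin/sh
# is_mounted DEVICE [MOUNTS_FILE]
# Succeeds if DEVICE appears as a mount source in MOUNTS_FILE
# (defaults to /proc/mounts on Linux).
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}

# offline_check_cmd DEVICE [MOUNTS_FILE]
# Prints a read-only e2fsck command line, but only if DEVICE is not
# mounted; a check on a mounted target reports spurious errors.
offline_check_cmd() {
    if is_mounted "$1" "${2:-/proc/mounts}"; then
        echo "refusing: $1 is mounted, results would not be reliable" >&2
        return 1
    fi
    echo "e2fsck -fn $1"
}
```

Called as `offline_check_cmd /dev/sde`, it prints the command only when `/dev/sde` is not listed as mounted on this node; it cannot tell whether the device is mounted on a failover partner.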
Re: [lustre-discuss] 1.8 client on 3.13.0 kernel
Hi Lewis,

it's difficult to tell how much data loss was actually related to the lustre upgrade itself. We have upgraded 6 file systems and we had to do it more or less in one shot, because at that time they were using a common MGS server. All servers of one file system must be on the same level (at least for the major upgrade from 1.8 to 2.x; there is a rolling upgrade for minor versions in the lustre 2 branch now, but I have no experience with that).

In any case the file systems should be clean before starting the upgrade, so I would recommend running e2fsck on all targets and repairing them first. We did so, but unfortunately our e2fsprogs were not really up to date, and after our lustre upgrade a lot of fixes for e2fsprogs were committed to Whamcloud's e2fsprogs git. So some errors on the file systems were probably still present, but unnoticed, when we did the upgrade.

Lustre 2 introduces the FID (which is something like an inode number; lustre 1.8 used the inode number of the underlying ldiskfs, but with the possibility of having several MDTs in one file system a replacement was needed). The FID is stored in the inode, but it can also be stored in the directory entry, which makes lookups faster, especially when there are many files in a directory. However, there were bugs in the code that takes care of adding the FID to the directory entry when the file system is converted from 1.8 to 2.x. So I would recommend using a version in which these bugs are solved. We went to 2.4.1 at that time. This fid_in_dirent feature is not enabled by default; however, it is the only point where a performance boost may be expected... so we took the risk of enabling it... and ran into some bugs.

We had other file systems still on 1.8, so with the server upgrade we didn't upgrade the clients, because lustre 2 clients wouldn't have been able to mount the 1.8 file systems.
And we use quotas, and for this you need the 1.8.9 client with a patch that corrects a defect of the 1.8.9 client when it talks to 2.x servers (LU-3067). Older 1.8 clients don't support the Lustre 2 quota (which came in 2.2 or 2.4, I'm not 100% sure). BTW, quota still runs out of sync from time to time, but the limit seems to be fine now, it's just the numbers the users see: lfs quota prints numbers that are too low, and users run out of quota earlier than they expect... It's better in the latest 2.5 versions now.

Here is an unsorted(!) list of bugs we have hit during the lustre upgrade. For most of them we weren't the first ones, but I guess you could wait forever for the version in which all bugs are resolved :-)

- LU-3067: already mentioned above, a patch for 1.8.9 clients interoperating with 2.x servers (1.8.9 is needed to have quota working). Without this patch clients become unresponsive with 100% cpu load, then just hang and devices become unavailable. A reboot doesn't work, so a power cycle is needed, but after a while the problem reappeared.
- LU-4504: e2fsck noticed quota issues similar to this bug on the OSTs. Use the latest e2fsprogs and check again; then the ldiskfs backend doesn't run into this anymore.
- e2fsck noticed quota issues on the MDT, "Problem in HTREE directory inode 21685465: block #16 not referenced"; however, this could be fixed by e2fsck.
- LU-5626, MDT becomes readonly: one file system whose MDT was corrupted at an earlier stage and obviously not fully repaired LBUGged upon MDT mount and could only be mounted with the noscrub option.
- The MDT group_upcall (which can be configured with tunefs) used to be /usr/sbin/l_getgroups in lustre 1.8 and was set by default. The program is called l_getidentity now and is not configured by default anymore. You should either change it with tunefs, or put an appropriate link in place as a fallback. Anyhow, lustre 2 file systems don't use it by default anymore; they just trust the client. It also means that users/groups are not needed anymore on the lustre servers (we had local passwd/group files there so that secondary groups work properly; alternatively you could configure ldap, but without group_upcall all this is handled by the lustre client).
- LU-5626 and LU-2627: ".." directory entries were damaged by adding the FID. Once all old directories were converted and all files somehow recovered (in several consecutive attempts), the problem is gone. The number of emergency maintenances is basically limited by the depth of your directory structure. It could be repaired by running e2fsck, followed by manually moving everything back (save the log of the e2fsck, which tells you the relation between the objects in lost+found and their original path!).
- LU-4504, quota out of sync: turn off quota, run e2fsck, turn it on again. I believe that's something which must be done quite often anyhow, because there is no quotacheck anymore. It runs in the background when enabling quotas, but file systems have to be unmounted for this.

Related to quota, there is a change in the lfs
Re: [lustre-discuss] 1.8 client on 3.13.0 kernel
Hi Lewis,

Yes, for lustre 2.x you have to "upgrade" the OS, which basically means a reinstall of CentOS 6.x (because there is no upgrade path across major releases), then install the lustre packages and the lustre-patched kernel, and then the pain begins.

We had a lot of trouble when we upgraded our lustre file systems from 1.8 to 2.4. I would recommend considering a fresh install of lustre 2 on separate hardware, then migrating the data to the new file system (1.8 clients are able to mount lustre 2 file systems, but not the other way round, and for working quota support you need 1.8.9), and finally reformatting the old file system with lustre 2 and using it for testing or backups or whatever.

However, if buying new hardware is not an option, the upgrade is possible, and depending on the history of the file system it might work quite smoothly. Upgrading a freshly formatted lustre 1.8 with some artificial test data worked without any problems in our tests before we upgraded the production file systems.

Regards, Martin

On 09/08/2015 08:18 PM, Lewis Hyatt wrote:
> Thanks a lot for the response. Seems like we need to explore upgrading
> the servers. Do you happen to know how smooth that process is likely
> to be? We have lustre 1.8.8 on CentOS 5.4 there, I presume we need to
> upgrade the OS and then follow the upgrade procedure in the lustre
> manual, maybe it isn't such a big deal. Thanks again...
>
> -Lewis
>
> On 9/8/15 11:16 AM, Patrick Farrell wrote:
>> Lewis,
>>
>> My own understanding is you are out of luck - the 1.8 client cannot
>> realistically be brought forward to newer kernels. Far too many
>> changes over too long a period.
>>
>> As far as version compatibility, I believe no newer clients will talk
>> to servers running 1.8. If any will, they would be very early 2.x
>> versions, which won't support your desired kernel versions anyway.
>> >> Regards, >> Patrick >> >> >> From: lustre-discuss [lustre-discuss-boun...@lists.lustre.org] on >> behalf of Lewis Hyatt [lhy...@gmail.com] >> Sent: Tuesday, September 08, 2015 9:06 AM >> To: lustre-discuss@lists.lustre.org >> Subject: [lustre-discuss] 1.8 client on 3.13.0 kernel >> >> Hello- >> >> We have a working 1.8 lustre cluster with which we are very happy. >> The object >> and metadata servers are running one of the recommended CentOS >> distributions >> (5.4), but the clients are all Ubuntu 10.04 LTS, with kernel 2.6.32. >> It is not >> feasible for us to change on the client side to a different distro >> other than >> Ubuntu, but we are about to go to Ubuntu 14, with kernel 3.13.0, for >> reasons >> unrelated to lustre. Unfortunately it seems that lustre 1.8 cannot be >> built on >> this kernel, we can't even get through the configure process without >> a large >> number of errors. The first one we hit is this: >> >> checking for >> /lib/modules/3.13.0-63-generic/build/include/linux/autoconf.h... no >> >> But various attempts to hack around the errors as they come up have >> not led to >> much success. Is this something we can hope to achieve? I thought I >> saw some >> threads about a series of patches to support this kernel in lustre >> 1.8 but I >> haven't been able to find anything conclusive. We are really hoping >> it is >> possible to upgrade our clients without touching the lustre servers, >> as we >> don't want to disturb that production system which has been very >> reliable for >> us, and we don't have much in-house expertise with lustre or CentOS. >> We were >> able to build a newer lustre client on the 3.13 kernel, but it seems >> it is not >> willing to interact with the 1.8 servers. >> >> Thanks for any advice, much appreciated. 
>> >> -Lewis
Re: [lustre-discuss] refresh file layout error
On 09/03/2015 07:22 AM, E.S. Rosenberg wrote:
> On Wed, Sep 2, 2015 at 8:47 PM, Wahl, Edward wrote:
>
>> That would be my guess here. Any chance this is across NFS? Seen that a
>> great deal with this error, it used to cause crashes.
>>
> Strictly speaking it is not, but it may be because a part of the path the
> server 'sees'/'knows' is a symlink to the lustre filesystem which lives on
> nfs...
>

Ah, I can remember a problem we had some years ago, when users with their $HOME on NFS were accessing many files in directories on lustre via a symlink. Somehow the NAS box serving the NFS file system didn't immediately notice that the files weren't on its own file system and repeatedly had to look them up in its cache, just to find that the files were somewhere else behind a symlink.

If I recall correctly, the problem could be avoided by:
- Either accessing the file via its absolute path, or cd-ing into the directory (both via the mount point, not (!) via the symlink)
- Or making the symlink an absolute one (I'm not 100% sure, but I believe the problem only occurred with relative links pointing out of the NFS upwards across the mount point and down again into the lustre file system).

It could be something similar here. Do you have any chance to access the files via an absolute path in your setup and web server configuration?

best regards, Martin
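To find out which absolute path a symlinked location really points to, something like the following could be used. It is a minimal sketch that assumes GNU `readlink` (the `-f` option); the function name and the example path are hypothetical.

```shell
#!/bin/sh
# resolve_path PATH
# Prints the canonical absolute path of PATH with every symlink
# component resolved, i.e. the path below the real mount point.
resolve_path() {
    readlink -f "$1"
}
```

For example, `resolve_path "$HOME/data/file"` would print the path under the lustre mount point if `data` is a symlink into lustre; that resolved path is what could then go into the web server configuration.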
Re: [lustre-discuss] Convert a disk from lustre to ext4
Maybe it's too late anyhow, but I have found this thread in my unread mail:

On 09/01/2015 06:38 PM, Colin Faber wrote:
> If you're just looking to reformat the drive, then just reformat the drive:
>
> http://linux.die.net/man/8/mkfs.ext4

It's still unclear what he actually did. Maybe he formatted the disk as ldiskfs and used it as if it were ext4? Then it might even be mountable as ext4, at least if the e2fsprogs are installed. Anyhow, I would recommend backing up the data (if there is anything useful on the device already) to a different device, then reformatting the drive and restoring the files onto it.

And from an earlier mail on this topic:

On 08/24/2015 08:07 PM, E.S. Rosenberg wrote:
> What's wrong with plain ext4?
> Or XFS, btrfs etc.
>
> If you want good support on Windows you're stuck with windows filesystems
> ((ex)FAT, NTFS), though there are tools to mount extX filesystems on
> windows they aren't all that stable as far as I know (though I haven't
> looked at that for years so things may have changed).

Maybe a NAS box which exports a file system via NFS and SMB is what you are looking for. It's connected via ethernet, however, and Windows can mount it as a so-called Network Drive. Linux and Mac support both NFS and SMB more or less, depending on the OS version. If you go out to buy such a box, I would recommend looking for a solution based on a RAID system. The very low-price end is based on single disks, which might not be a good idea for long-term storage. A RAID system (except for RAID0) has built-in redundancy, so that disks may fail and you can still recover the data.

regards, Martin
Re: [lustre-discuss] [HPDD-discuss] possible to read orphan ost objects on live filesystem?
Hi Chris,

On 09/02/2015 07:18 AM, Chris Hunter wrote:
> Hi Andreas
>
> On 09/01/2015 07:22 PM, Dilger, Andreas wrote:
>> On 2015/09/01, 7:59 AM, "lustre-discuss on behalf of Chris Hunter"
>> chris.hun...@yale.edu> wrote:
>>
>>> Hi Andreas,
>>> Thanks for your help.
>>>
>>> If you have a striped lustre file with "holes" (ie. one chunk is gone
>>> due hardware failure, etc.) are the remaining file chunks considered
>>> orphan objects ?
> So when a lustre striped file has a hole (eg. missing chunk due to
> hardware failure), the remaining file chunks stay indefinitely on the
> OSTs.
> Is there a way to reclaim the space occupied by these pieces (after
> recovery of any usable data, etc.)?

These remaining chunks still belong to the file (i.e. you have the metadata entry on the MDT and you see the file when lustre is mounted). By removing the file you free up the space.

In general there are two types of inconsistencies which may occur:

Orphan objects are objects which are NOT assigned to an entry on the MDT, i.e. chunks which do not belong to any file. These can be either pre-allocated chunks or chunks left over after a corruption of the metadata on the MDT.

The other type of corruption is a file in which chunks are missing in between. This can happen when an OST gets corrupted. As long as the MDT is OK, you should be able to remove such a file. If the MDT is corrupted as well, you should first fix the MDT, and you might then only be able to unlink the file (which again might leave some orphan objects on the OSTs). lfsck should be able to remove them, depending on the lustre version you are running...

Another point: when OSTs got corrupted, after having repaired them with e2fsck you can mount them as ldiskfs, see if there are chunks in lost+found, and use the tool ll_recover_lost_found_objs to restore them to their original place. I believe these objects which e2fsck puts in lost+found are another kind of thing, usually not called "orphan objects". As I said, they can usually be recovered easily.

Martin
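The lost+found recovery step mentioned above could look roughly like this. The sketch only prints the commands so they can be reviewed first; the function name, the device and the mount point are hypothetical, and it assumes the OST is stopped and e2fsck has already been run. `ll_recover_lost_found_objs -d` takes the lost+found directory of the ldiskfs-mounted target.

```shell
#!/bin/sh
# recover_lost_found_cmds DEVICE MOUNTPOINT
# Prints (does not run) the commands to mount an OST as ldiskfs,
# move objects from lost+found back to their original place, and
# unmount the target again.
recover_lost_found_cmds() {
    dev=$1
    mnt=$2
    printf '%s\n' \
        "mount -t ldiskfs $dev $mnt" \
        "ll_recover_lost_found_objs -d $mnt/lost+found" \
        "umount $mnt"
}
```

Printing instead of executing is a deliberate choice here: on a damaged target one would normally want to look at each command (and at the contents of lost+found) before running anything.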
Re: [lustre-discuss] quota only in- but not decreasing after upgrading to Lustre 2.5.3
Hi,

it might help to disable quota using tune2fs and re-enable it again on the ext2 level on all devices, see LU-3861. (BTW you don't need the e2fsprogs mentioned in the bug, there was an official release last year in September.) You have to stop lustre for the tune2fs run, and it takes some time, because this triggers a quota check in the background (which does not produce any output on the screen).

best regards, Martin

On 07/28/2015 09:44 AM, Torsten Harenberg wrote:

a further observation: a user deleted a ~100MB file:

before:

[root@wnfg001 lustre]# lfs quota -u sandhoff /lustre
Disk quotas for user sandhoff (uid 11206):
     Filesystem     kbytes       quota       limit   grace   files   quota   limit   grace
        /lustre  811480188  1811480200  2811480200       -   61077       0       0       -

after:

[root@wnfg001 lustre]# lfs quota -u sandhoff /lustre
Disk quotas for user sandhoff (uid 11206):
     Filesystem     kbytes       quota       limit   grace   files   quota   limit   grace
        /lustre  811480188  1811480200  2811480200       -   61076       0       0       -
[root@wnfg001 lustre]#

so #files decreased by 1, but not the #kbytes.

Furthermore, the lfs quota command is pretty slow:

[root@wnfg001 lustre]# time lfs quota -u sandhoff /lustre
Disk quotas for user sandhoff (uid 11206):
     Filesystem     kbytes       quota       limit   grace   files   quota   limit   grace
        /lustre  811480188  1811480200  2811480200       -   61076       0       0       -

real    0m2.441s
user    0m0.001s
sys     0m0.004s
[root@wnfg001 lustre]#

although the system is not overloaded.

Couldn't find anything useful in dmesg:

[root@lustre2 ~]# dmesg | grep quota
VFS: Disk quotas dquot_6.5.2
LDISKFS-fs (dm-1): mounted filesystem with ordered data mode. quota=on. Opts:
[root@lustre2 ~]#

[root@lustre3 ~]# dmesg | grep quota
VFS: Disk quotas dquot_6.5.2
LDISKFS-fs (dm-6): mounted filesystem with ordered data mode. quota=on. Opts:
LDISKFS-fs (dm-4): mounted filesystem with ordered data mode. quota=on. Opts:
LDISKFS-fs (dm-7): mounted filesystem with ordered data mode. quota=on. Opts:
LDISKFS-fs (dm-1): mounted filesystem with ordered data mode. quota=on. Opts:
LDISKFS-fs (dm-9): mounted filesystem with ordered data mode. quota=on. Opts:
LDISKFS-fs (dm-13): mounted filesystem with ordered data mode. quota=on. Opts:
LDISKFS-fs (dm-12): mounted filesystem with ordered data mode. quota=on. Opts:
LDISKFS-fs (dm-10): mounted filesystem with ordered data mode. quota=on. Opts:
LDISKFS-fs (dm-6): mounted filesystem with ordered data mode. quota=on. Opts:
LDISKFS-fs (dm-4): mounted filesystem with ordered data mode. quota=on. Opts:
LDISKFS-fs (dm-7): mounted filesystem with ordered data mode. quota=on. Opts:
LDISKFS-fs (dm-1): mounted filesystem with ordered data mode. quota=on. Opts:
LDISKFS-fs (dm-9): mounted filesystem with ordered data mode. quota=on. Opts:
LDISKFS-fs (dm-13): mounted filesystem with ordered data mode. quota=on. Opts:
LDISKFS-fs (dm-12): mounted filesystem with ordered data mode. quota=on. Opts:
LDISKFS-fs (dm-10): mounted filesystem with ordered data mode. quota=on. Opts:
[root@lustre3 ~]#

[root@lustre4 ~]# dmesg | grep quota
VFS: Disk quotas dquot_6.5.2
LDISKFS-fs (dm-1): mounted filesystem with ordered data mode. quota=on. Opts:
LDISKFS-fs (dm-8): mounted filesystem with ordered data mode. quota=on. Opts:
LDISKFS-fs (dm-13): mounted filesystem with ordered data mode. quota=on. Opts:
LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. quota=on. Opts:
LDISKFS-fs (dm-3): mounted filesystem with ordered data mode. quota=on. Opts:
LDISKFS-fs (dm-14): mounted filesystem with ordered data mode. quota=on. Opts:
LDISKFS-fs (dm-2): mounted filesystem with ordered data mode. quota=on. Opts:
LDISKFS-fs (dm-1): mounted filesystem with ordered data mode. quota=on. Opts:
LDISKFS-fs (dm-8): mounted filesystem with ordered data mode. quota=on. Opts:
LDISKFS-fs (dm-13): mounted filesystem with ordered data mode. quota=on. Opts:
LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. quota=on. Opts:
LDISKFS-fs (dm-3): mounted filesystem with ordered data mode. quota=on. Opts:
LDISKFS-fs (dm-14): mounted filesystem with ordered data mode. quota=on. Opts:
LDISKFS-fs (dm-2): mounted filesystem with ordered data mode. quota=on. Opts:
[root@lustre4 ~]#

Thanks for any hint!

Best regards
Torsten
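The disable/re-enable cycle suggested at the top of this message could be prepared like this. It is a sketch under assumptions: the `-O ^quota` / `-O quota` spelling is the generic tune2fs feature syntax and is not spelled out in this message, the function name and device names are hypothetical, and the commands are only printed. All targets must be unmounted before any of them is run.

```shell
#!/bin/sh
# quota_reset_cmds DEVICE...
# Prints (does not run) the tune2fs commands that switch the quota
# feature off and on again for each ldiskfs device given.
# Assumption: Lustre is stopped and the devices are unmounted.
quota_reset_cmds() {
    for dev in "$@"; do
        echo "tune2fs -O ^quota $dev"
        echo "tune2fs -O quota $dev"
    done
}
```

For instance, `quota_reset_cmds /dev/dm-1 /dev/dm-6` prints four lines, two per device. Re-enabling is the slow part, since that is when the usage is recounted.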
Re: [lustre-discuss] trouble mounting after a tunefs
Hi John,

on the Parameters line the different nodes should not be separated by ":". Each node should be specified by a separate mgsnode=... or failover.node=... statement. I'm not sure if separating the two interfaces of each node by "," is correct here, or if this should be split again into two separate statements.

best regards, Martin

On 06/12/2015 05:07 PM, John White wrote:

Good Morning Folks,

We recently had to add TCP NIDs to an existing o2ib FS. We added the nid to the modprobe.d stuff and tossed the definition of the NID in the failnode and mgsnode params on all OSTs and the MGS + MDT. When either an o2ib or tcp client try to mount, the mount command hangs and dmesg repeats:

LustreError: 11-0: brc-MDT-mdc-881036879c00: Communicating with 10.4.250.10@o2ib, operation mds_connect failed with -11.

I fear we may have over-done the parameters, could anyone take a look here and let me know if we need to fix things up (remove params, etc)?

MGS:
Read previous values:
Target: MGS
Index: unassigned
Lustre FS:
Mount type: ldiskfs
Flags: 0x4 (MGS )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters:

MDT:
Read previous values:
Target: brc-MDT
Index: 0
Lustre FS: brc
Mount type: ldiskfs
Flags: 0x1001 (MDT no_primnode )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters: mgsnode=10.4.250.11@o2ib,10.0.250.11@tcp:10.4.250.10@o2ib,10.0.250.10@tcp failover.node=10.4.250.10@o2ib,10.0.250.10@tcp:10.4.250.11@o2ib,10.0.250.11@tcp mdt.quota_type=ug

OST(sample):
Read previous values:
Target: brc-OST0002
Index: 2
Lustre FS: brc
Mount type: ldiskfs
Flags: 0x1002 (OST no_primnode )
Persistent mount opts: errors=remount-ro
Parameters: mgsnode=10.4.250.10@o2ib,10.0.250.10@tcp:10.4.250.11@o2ib,10.0.250.11@tcp failover.node=10.4.250.12@o2ib,10.0.250.12@tcp:10.4.250.13@o2ib,10.0.250.13@tcp ost.quota_type=ug
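The suggestion at the top of this message, one statement per node instead of a ":"-separated list, can be illustrated with a small text transformation. This is only a sketch of the rewriting step; the helper name is hypothetical, and it does not touch any target. Whether the two interfaces of a node may stay comma-separated is left open, as in the reply.

```shell
#!/bin/sh
# split_nodes KEY VALUE
# Splits a ":"-separated node list into one KEY=... statement per
# node. The comma-separated NIDs of a single node are kept together.
split_nodes() {
    printf '%s\n' "$2" |
        awk -F: -v key="$1" '{ for (i = 1; i <= NF; i++) print key "=" $i }'
}
```

Applied to the MDT's mgsnode value quoted above, `split_nodes mgsnode "10.4.250.11@o2ib,10.0.250.11@tcp:10.4.250.10@o2ib,10.0.250.10@tcp"` yields one `mgsnode=` line for the `.11` node and one for the `.10` node.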
Re: [lustre-discuss] Exporting a lustre mounted directory via nfs
Hi,

I'm re-adding lustre-discuss (I mistakenly replied directly to Kurt).

It's interesting that you can't re-export the 2.5.3 system on the client which is able to export the 1.8.9. The support for the lustre 2 quotas was added to the 1.8 client somewhere between 1.8.7 and 1.8.9. There are a few more commits to the 1.8 branch in Whamcloud's git, but unfortunately there is not much activity anymore. Important fixes which haven't landed on 1.8 are in LU-3596 and LU-1126.

best regards, Martin

On 05/21/2015 07:03 PM, Kurt Strosahl wrote:

Hi,

The purpose of it was to allow access to the lustre file system to any system that can mount over nfs (also because our lustre system runs over IB, and this allows peoples desktops to get to the system). The thing is that the export of lustre 1.8.9 over the 1.8.7 client has been working for years (it was set up back in 2009 I believe, before I was on the project). It is only the lustre 2.5.3 system that mounts but is not reachable via NFS.

Today I compiled the 2.5.3 client for a new system that has IB but does not need access to the lustre file system, mounted the new lustre system, and was able to export successfully. So the problem clearly lies with some combination of the old OS (RHEL5), old client (1.8.7) and new lustre (2.5.3).

This isn't the first oddity I've encountered. Early in the testing process I discovered that the quotas in 2.5.3 are not visible to the 1.8.7 clients (but are visible to the 1.8.9 clients).

At this point I'm probably going to try building a new gateway (with new hardware and the new OS), mount the new lustre with the client I know works, and export the new area that way. I was just hoping that someone would say oh, just mount it with the derp option.

thanks, Kurt

- Original Message -
From: Martin Hecht
To: Kurt Strosahl
Sent: Thursday, May 21, 2015 12:51:41 PM
Subject: Re: [lustre-discuss] Exporting a lustre mounted directory via nfs

Hi Kurt,

some time ago we had a client re-exporting a lustre 1.8.x in rhel5. It worked quite well, but I believe you shouldn't run too many nfs clients with this construct.

What's the reason for the reexport? If your lustre is on an infiniband and you want to make it available on clients which have no IB card, lnet routing over a tcp-network might be a better option than the nfs re-export. If the reason is that you can't build the lustre client... well... the nfs reexport might be worth trying, but I don't have any experience with re-exporting a lustre 2 file system (although I think the client version is more important in this scenario).

best regards, Martin

On 05/20/2015 09:14 PM, Kurt Strosahl wrote:

Sorry, left off some important info... the system is rhel5, with client 1.8.7... the lustre file system is 2.5.3 (the system already exports a 1.8.9 lustre file system).

w/r, Kurt

- Original Message -
From: Kurt Strosahl
To: lustre-discuss@lists.lustre.org
Sent: Wednesday, May 20, 2015 2:55:39 PM
Subject: Exporting a lustre mounted directory via nfs

Good Afternoon,

I'm attempting to use nfs to export a lustre mount point on a client box (essentially acting as a gateway for systems that don't have the lustre client). I've mounted lustre, and added it to the nfs exports file (it shows up as exported) but when I try to mount the nfs point the system hangs. On the server side (the lustre gateway) I do see the test system authenticating.

w/r, Kurt
Re: [lustre-discuss] Size difference between du and quota
Hi,

a few more things which may play a role:

- as you are suspecting, the difference between used blocks and used bytes might be the reason, especially if there are many very small files, but there are more possible causes:
- some tools use 2^10 bytes and some others use 1000 bytes as kb, which might explain small discrepancies. du and ls are examples of different output. However, this cannot explain the whole difference.
- your find looks for regular files only (type f), but directories and symbolic links consume a few kb as well. If there are many symbolic links, this would be the explanation, I think.
- there are also cases in which quota can get out of sync (I don't remember the cause, but I have already seen warnings about this in the syslog of lustre servers). e2fsck on the ldiskfs level is supposed to fix this issue, but I also had cases in which I had to turn off quota by means of tune2fs on the ldiskfs level and turn it on again in order to trigger something like a background quotacheck in lustre 2. In lustre 1.8 there used to be a tool quotacheck.
- preallocated stripes on many OSTs might be an issue as well, although I don't see the discrepancy described on our file systems.
- there might also be orphan objects on the disk, i.e. stripes which are not referenced anymore on the lustre level, but which still consume disk space (not sure if these may affect quota). An online lfsck is supposed to clean them up in lustre 2. In lustre 1.8 one had to run several e2fscks on the ldiskfs level and build databases to run an lfsck, but that's not supported anymore in lustre 2.

best regards, Martin

On 05/20/2015 10:50 AM, Phill Harvey-Smith wrote:

Hi all,

One of my users is reporting a massive size difference between the figures reported by du and quota.

doing a du -hs on his directory reports:

du -hs .
529G    .

doing a lfs quota -u username /storage reports

     Filesystem     kbytes   quota   limit   grace    files   quota   limit   grace
       /storage  621775192   64000   64001       -   601284     100     110       -

Though this user does have a lot of files:

find . -type f | wc -l
581519

So I suspect that it is the typical thing that quota is reporting used blocks whilst du is reporting used bytes, which can of course be wildly different due to filesystem overhead and wasted unused space at the end of files where a block is allocated but only partially used.

Is this likely to be the case ?

I'm also not entirely sure what versions of lustre the client machines and MDS / OSS servers are running, as I didn't initially set the system up.

Cheers.

Phill.
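Two of the points above, the unit question and the entries that `find -type f` does not count, can be checked with a few lines of shell. This is a rough sketch with hypothetical helper names; it assumes the kbytes column of `lfs quota` counts units of 1024 bytes.

```shell
#!/bin/sh
# kb_to_gib KBYTES
# Converts a kbytes figure (assumed to be units of 1024 bytes) to
# GiB with one decimal, for comparison with "du -h" output.
kb_to_gib() {
    LC_ALL=C awk -v kb="$1" 'BEGIN { printf "%.1f\n", kb / 1024 / 1024 }'
}

# count_dirs_and_links DIR
# Counts directories and symbolic links below DIR (DIR included),
# i.e. entries that "find -type f" skips but that still use inodes.
count_dirs_and_links() {
    find "$1" \( -type d -o -type l \) | wc -l | tr -d ' '
}
```

With the figure from the quota output quoted above, `kb_to_gib 621775192` gives 593.0, against the 529G that du reports, so the unit convention alone does not account for the gap, which matches the remark in the reply.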
Re: [Lustre-discuss] [HPDD-discuss] Recovering a failed OST
Hi Bob,

just to make sure: have you already followed http://wiki.lustre.org/index.php/Handling_File_System_Errors, especially the steps for e2fsck linked there?

If you did *not yet* do any write operation to the damaged OST, you might want to back up the whole OST first, using dd for instance (if the underlying hardware still permits it).

If the situation described (empty O directory, lost LAST_ID entry) occurred *after* the e2fsck, and you find lots of files in lost+found when you mount the OST as ldiskfs, you can use ll_recover_lost_found_objs to put them back in the correct place (http://manpages.ubuntu.com/manpages/precise/man1/ll_recover_lost_found_objs.1.html). It is part of the Lustre distribution. Once I had to run this several times in order to restore the structure below.

Best regards,
Martin

On 05/19/2014 08:24 PM, Bob Ball wrote:

Oh, better still, as I kept looking, and the low-level panic retreated, I found this on the MDT:

    [root@lmd02 ~]# lctl get_param osc.*.prealloc_next_id
    ...
    osc.umt3-OST0025-osc.prealloc_next_id=6778336

So, unless someone tells me that I am way off base, I'm going to proceed with the assumption that this is a valid starting point, and proceed to get my file system back online.

bob

On 5/19/2014 2:05 PM, Bob Ball wrote:

Google first, ask later. I found this in the manuals:

    26.3.4 Fixing a Bad LAST_ID on an OST

The procedures there spell out pretty well what I must do, so this should be relatively straightforward. But does this comment refer to just this OST, or to all OSTs?

    Note - The file system must be stopped on all servers before performing this procedure.

So, is this the best approach to follow, allowing for the fact that there is nothing at all left on the OST, or is there a better shortcut to choosing an appropriate LAST_ID?

Thanks again,
bob

On 5/19/2014 1:50 PM, Bob Ball wrote:

I need to completely remake a failed OST. I have done this in the past, but this time the disk failed in such a way that I cannot fully get recovery information from the OST before I destroy and recreate it. In particular, I am unable to recover the LAST_ID file, but I successfully retrieved the last_rcvd and CONFIGS/* files.

    mount -t ldiskfs /dev/sde /mnt/ost
    pushd /mnt/ost
    cd O
    cd 0
    cp -p LAST_ID /root/reformat/sde

The O directory exists, but it is empty. What can I do concerning this missing LAST_ID file? I mean, I probably have something, somewhere, from some previous recovery, but that is way, way out of date.

My intent is to recreate this OST with the same index, and then put it back into production. All files were moved off the OST before reaching this state, so nothing else needs to be recovered here.

Thanks,
bob

___
HPDD-discuss mailing list
hpdd-disc...@lists.01.org
https://lists.01.org/mailman/listinfo/hpdd-discuss

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
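For anyone who wants to collect the prealloc_next_id values Bob mentions for several OSTs at once, the following is a minimal sketch that parses text in the format shown in his message. It does not call lctl itself, and the function name parse_prealloc_next_id is made up for this example. Whether the value is a safe starting point for a new LAST_ID is Bob's assumption in the thread and is not confirmed there.

```python
import re
from typing import Dict

# Line format taken from the output quoted in the thread:
#   osc.umt3-OST0025-osc.prealloc_next_id=6778336
_LINE = re.compile(r"^osc\.(?P<target>[^.\s]+)\.prealloc_next_id=(?P<value>\d+)$")


def parse_prealloc_next_id(output: str) -> Dict[str, int]:
    """Map each osc target name to its prealloc_next_id value.

    Lines that do not match the expected format are skipped.
    """
    ids: Dict[str, int] = {}
    for line in output.splitlines():
        match = _LINE.match(line.strip())
        if match:
            ids[match.group("target")] = int(match.group("value"))
    return ids


# Sample text copied from the thread, including the elided "..." line.
sample = """\
...
osc.umt3-OST0025-osc.prealloc_next_id=6778336
"""

result = parse_prealloc_next_id(sample)
print(result)
```

In practice the text would come from saving the output of the lctl get_param command shown in the thread on the MDS and passing it to the function as a string.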