Re: [lustre-discuss] Recover from broken lustre updates
Hello,

Yes. You may boot into the old Linux kernel and erase/install/downgrade to your desired Lustre version.

Best wishes,
Megan

On Tue, Jul 27, 2021 at 8:53 PM Haoyang Liu wrote:
> Hello Megan:
>
> After checking the yum log, I found the following packages were also updated:
>
> Jul 24 10:30:07 Updated: lustre-modules-2.7.19.8-3.10.0_514.2.2.el7_lustre.x86_64.x86_64
> Jul 24 10:30:07 Updated: lustre-osd-ldiskfs-mount-2.7.19.8-3.10.0_514.2.2.el7_lustre.x86_64.x86_64
> Jul 24 10:31:35 Updated: lustre-osd-ldiskfs-2.7.19.8-3.10.0_514.2.2.el7_lustre.x86_64.x86_64
> Jul 24 10:31:36 Updated: lustre-2.7.19.8-3.10.0_514.2.2.el7_lustre.x86_64.x86_64
> Jul 24 10:33:04 Installed: lustre-osd-zfs-2.7.19.8-3.10.0_514.2.2.el7_lustre.x86_64.x86_64
>
> Is it safe to 1) boot into the old kernel, 2) remove the updated lustre packages, and 3) install the old packages?
>
> Thanks,
>
> Haoyang
>
> ----- Original Message -----
> From: "Ms. Megan Larko via lustre-discuss" <lustre-discuss@lists.lustre.org>
> Sent: 2021-07-27 22:15:15 (Tuesday)
> To: "Lustre User Discussion Mailing List" <lustre-discuss@lists.lustre.org>
> Cc:
> Subject: [lustre-discuss] Recover from broken lustre updates
>
> Greetings!
>
> I've not seen a response to this post yet so I will chime in.
>
> If the only rpm change was the Linux kernel then you should be able to reboot into the previous Linux kernel. The CentOS distro, like most Linux distros, will leave the old kernel rpms in place. You may use commands like "grub2-editenv list" to see which kernel is the current default; you may change the default, etc.
>
> I would try this first.
> Cheers,
> megan
>
> -- Sent from Gmail Mobile

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
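The boot-the-old-kernel-then-downgrade sequence discussed above might look roughly like this on a CentOS 7 box. This is a sketch, not a recipe: the kernel menu index and the exact package names must be taken from your own grub config and yum log (the package list below is copied from the log quoted in this thread).

```shell
# See which kernel is the current default
grub2-editenv list

# List the boot menu entries with their indices (CentOS 7 path)
awk -F\' '/^menuentry/ {print i++ " : " $2}' /boot/grub2/grub.cfg

# Make the previous (pre-update) kernel the default, then reboot into it.
# "1" here is illustrative -- use the index of the old kernel from the list above.
grub2-set-default 1
reboot

# After booting the old kernel, roll the Lustre packages back.
# "yum history" can undo the whole transaction that pulled in the updates:
yum history list lustre
yum history undo <transaction-id>

# Or downgrade/remove the individual packages named in the yum log:
yum downgrade lustre lustre-modules lustre-osd-ldiskfs lustre-osd-ldiskfs-mount
yum remove lustre-osd-zfs   # this one was newly Installed, not Updated
```

Either way, verify with "rpm -qa | grep lustre" that the versions match the kernel you booted before remounting any targets.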
[lustre-discuss] Recover from broken lustre updates
Greetings!

I've not seen a response to this post yet so I will chime in.

If the only rpm change was the Linux kernel then you should be able to reboot into the previous Linux kernel. The CentOS distro, like most Linux distros, will leave the old kernel rpms in place. You may use commands like "grub2-editenv list" to see which kernel is the current default; you may change the default, etc.

I would try this first.

Cheers,
megan
[lustre-discuss] OST not being used
Hi!

Does the NIC on the OSS that serves OSTs 4-7 respond to an "lctl ping"? You indicated that it does respond to regular ping, ssh, etc.

I would review my /etc/lnet.conf file for the behavior of a NIC that times out. Does the conf allow for asymmetrical routing? (Is that what you wish?) Is there only one path to those OSTs, or is there a failover NIC address that did not work in this event for some reason?

The Lustre Operations Manual Section 9.1 on the lnetctl command shows how you can get more info on the NIC (lnetctl net show ...).

Good luck.
megan
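A minimal check sequence along these lines might be (the NID below is illustrative; substitute the NID of the OSS serving OSTs 4-7):

```shell
# From the client, verify LNet-level reachability of the OSS NID
# (a box can answer ICMP ping and ssh yet still fail lctl ping)
lctl ping 10.0.1.70@tcp

# Inspect the local LNet network interfaces, their state and statistics
lnetctl net show --verbose

# Show the global LNet settings in effect, including discovery
# and asymmetrical-route handling
lnetctl global show
```

If the lctl ping fails while ICMP ping succeeds, the problem is in the LNet layer (NIDs, routes, discovery settings) rather than basic IP connectivity.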
[lustre-discuss] OST not being used
Greetings Alastair!

You did not indicate which version of Lustre you are using. FYI, that can be useful in aiding you with your Lustre queries.

You show your command "lfs setstripe --stripe-index 7 myfile.dat". The Lustre Operations Manual ( https://doc.lustre.org/lustre_manual.xhtml ) Section 40.1.1 "Synopsis" indicates that stripe-index starts counting at zero. My reading of the Manual suggests that starting at zero and using a default stripe count of one might correctly put the file onto obdidx 8. Depending upon whether obdidx starts at zero or one, eight might possibly be the correct result. Did you try using a stripe-index of 6 to see if the resulting one-stripe file then lands on obdidx 7?

If the OST is not usable then the command "lctl dl" will indicate that (as does the command you used for active OST devices). Your info does seem to indicate that OST 7 is okay.

Cheers,
megan
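One way to test the zero-versus-one indexing question directly is to place two one-stripe files on adjacent indices and see where each lands (file names here are placeholders):

```shell
# Place a one-stripe file explicitly on OST index 6 and another on index 7
lfs setstripe --stripe-count 1 --stripe-index 6 test_idx6.dat
lfs setstripe --stripe-count 1 --stripe-index 7 test_idx7.dat

# Then check which obdidx each file actually landed on
lfs getstripe test_idx6.dat
lfs getstripe test_idx7.dat
```

If test_idx6.dat reports obdidx 6 and test_idx7.dat reports obdidx 7, the index is being honored as given and the earlier result points elsewhere; an off-by-one in the output would confirm the counting-origin theory.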
[lustre-discuss] Benchmarking Lustre, reduce caching
Hello,

The caching could be skewing your performance results. Try writing a file larger than the amount of memory on the LFS servers.

Another nice item is the SuperComputing IO500 (and IO50 for smaller systems). There are instructions for benchmarking storage in ways which can be compared with other results to get a good idea of the performance ability of your storage. There are also ideas on avoiding caching issues, etc. (Ref io500.org )

Disclaimer: I am not associated with either SuperComputing or the IO group.

Cheers,
megan
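A quick way to size such a test write, sketched below: make the file twice the server's physical RAM so the cache cannot hold it, and use O_DIRECT so the client page cache is bypassed as well (the Lustre mount path is a placeholder):

```shell
# Size the test file at twice this machine's physical RAM
mem_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
count=$(( mem_kb * 2 / 1024 ))   # number of 1 MiB dd blocks

# Print the dd command to run against the Lustre mount;
# oflag=direct (O_DIRECT) skips the client-side page cache
echo "dd if=/dev/zero of=/mnt/lustre/bigfile bs=1M count=${count} oflag=direct"
```

Note this only sizes against the client's RAM; for server-side cache you would use the OSS memory size instead, and tools like IOR (used by the IO500) handle this more rigorously.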
Re: [lustre-discuss] lustre-discuss Digest, Vol 182, Issue 12
Hello Tahari,

What is the result of "lctl ping 10.0.1.70@tcp_0" from the box on which you are trying to mount the Lustre file system? Does the ping succeed and then fail after 03 seconds?

If yes, you may wish to check the /etc/lnet.conf file for the Lustre LNet "discovery" setting (1 allows LNet discovery while 0 does not), and drop_asym_route (1 drops asymmetrically routed messages while 0 permits them).

I have worked with a few complex networks in which we chose to turn off LNet discovery and specify the routes via /etc/lnet.conf. On one system the asymmetrical routing (we have 16 LNet boxes between the system and the Lustre storage) seemed to be a problem, but we couldn't pin it to any particular box. On that system, disallowing asymmetrical routing seemed to help maintain LNet/Lustre connectivity.

One may check the lctl ping to narrow down net connectivity from other possibilities.

Cheers,
megan

On Mon, May 17, 2021 at 3:50 PM wrote:
> Send lustre-discuss mailing list submissions to
>     lustre-discuss@lists.lustre.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
>     http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
> or, via email, send a message with subject or body 'help' to
>     lustre-discuss-requ...@lists.lustre.org
>
> You can reach the person managing the list at
>     lustre-discuss-ow...@lists.lustre.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of lustre-discuss digest..."
>
> Today's Topics:
>
>    1. Re: problems to mount MDS and MDT (Abdeslam Tahari)
>    2. Re: problems to mount MDS and MDT (Colin Faber)
>
> --
>
> Message: 1
> Date: Mon, 17 May 2021 21:35:34 +0200
> From: Abdeslam Tahari
> To: Colin Faber
> Cc: lustre-discuss
> Subject: Re: [lustre-discuss] problems to mount MDS and MDT
> Message-ID: <bxecepen5dzzd+qxn...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Thank you Colin
>
> No, I don't have iptables or rules.
>
> firewalld is stopped; selinux is disabled as well.
> iptables -L
> Chain INPUT (policy ACCEPT)
> target prot opt source destination
>
> Chain FORWARD (policy ACCEPT)
> target prot opt source destination
>
> Chain OUTPUT (policy ACCEPT)
> target prot opt source destination
>
> Regards
>
> On Mon, 17 May 2021 at 21:29, Colin Faber wrote:
>
> > Firewall rules dealing with localhost?
> >
> > On Mon, May 17, 2021 at 11:33 AM Abdeslam Tahari via lustre-discuss <
> > lustre-discuss@lists.lustre.org> wrote:
> >
> >> Hello
> >>
> >> I have a problem mounting the Lustre MDS/MDT; it won't mount at all and
> >> there are no error messages at the console.
> >>
> >> It does not show errors or messages while mounting it.
> >>
> >> Here are some debug file logs.
> >>
> >> I specify that it is a new project that I am doing.
> >>
> >> The version and packages of Lustre installed:
> >> kmod-lustre-2.12.5-1.el7.x86_64
> >> kernel-devel-3.10.0-1127.8.2.el7_lustre.x86_64
> >> lustre-2.12.5-1.el7.x86_64
> >> lustre-resource-agents-2.12.5-1.el7.x86_64
> >> kernel-3.10.0-1160.2.1.el7_lustre.x86_64
> >> kernel-debuginfo-common-x86_64-3.10.0-1160.2.1.el7_lustre.x86_64
> >> kmod-lustre-osd-ldiskfs-2.12.5-1.el7.x86_64
> >> kernel-3.10.0-1127.8.2.el7_lustre.x86_64
> >> lustre-osd-ldiskfs-mount-2.12.5-1.el7.x86_64
> >>
> >> The system (OS): CentOS 7
> >>
> >> The kernel:
> >> Linux lustre-mds1 3.10.0-1127.8.2.el7_lustre.x86_64
> >> cat /etc/redhat-release
> >>
> >> When I mount the Lustre file system it won't show up and there are no errors:
> >>
> >> mount -t lustre /dev/sda /mds
> >>
> >> lctl dl does not show up
> >>
> >> df -h: no mount point for /dev/sda
> >>
> >> lctl dl shows this:
> >> 0 UP osd-ldiskfs lustre-MDT-osd lustre-MDT-osd_UUID 3
> >> 2 UP mgc MGC10.0.1.70@tcp 57e06c2d-5294-f034-fd95-460cee4f92b7 4
> >> 3 UP mds MDS MDS_uuid 2
> >>
> >> but unfortunately it disappears after 03 seconds
> >>
> >> lctl dl shows nothing
> >>
> >> lctl dk shows this debug output:
> >>
> >> 0020:0080:18.0:1621276062.004338:0:13403:0:(obd_config.c:1128:class_process_config()) processing cmd: cf006
> >> 0020:0080:18.0:1621276062.004341:0:13403:0:(obd_config.c:1147:class_process_config()) removing mappings for uuid MGC10.0.1.70@tcp_0
> >> 0020:0104:18.0:1621276062.004346:0:13403:0:(obd_mount.c:661:lustre_put_lsi()) put 9bbbf91d5800 1
> >> 0020:0080:18.0:1621276062.004351:0:13403:0:(genops.c:1501:class_disconnect()) disconnect: cookie 0x256dd92fc5bf929c
> >> 0020:0080:18.0:1621276062.004354:0:13403:0:(genops.c:1024:class_export_put()) final put 9bbf3e66a400/lustre-MDT-osd_UUID
> >> 0020:0100:18.0:1621276062.004361:0:13403:0:(obd_config.c:2100:class_manual_cleanup()) Manual
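The /etc/lnet.conf settings mentioned above might look something like the fragment below. This is a sketch only: the net names, NID, and gateway are illustrative, and whether you want discovery off and asymmetrical routes dropped depends on your own topology.

```yaml
# /etc/lnet.conf -- static routing with discovery disabled (illustrative values)
global:
    discovery: 0        # 0 = do not auto-discover peers; rely on config below
    drop_asym_route: 1  # 1 = drop messages that arrive via an asymmetrical route
net:
    - net type: tcp
      local NI(s):
        - interfaces:
              0: eth0
route:
    - net: o2ib0
      gateway: 10.0.1.1@tcp
```

After editing, the configuration can be loaded with "lnetctl import /etc/lnet.conf" and inspected with "lnetctl global show" and "lnetctl route show".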
[lustre-discuss] Experience with DDN AI400X
Hello!

I have no direct experience with the DDN AI400X, but as a vendor DDN has some nice value-add to the Lustre systems they build. Having worked with other DDN Lustre hardware in my career, interoperability with other Lustre mounts is usually not an issue unless the current lustre-client software on the client boxes is a very different software version or network stack. A general example: a box with lustre-client 2.10.4 is not going to be completely happy with a new 2.12.x on the Lustre network.

As far as vendor lock-in goes, in my past experience DDN support does have its own value-add to their Lustre storage product, so it is not completely vanilla. I have found the enhancements useful.

As far as your total admin control of the DDN storage product, that is probably up to the terms of the service agreement made with the purchase. My one experience with DDN on that point is that, contractually, DDN maintained the box's version level and patches, while standard Lustre tunables were fine for local admins. In one case we stumbled upon a bug; I was permitted to dig around freely but not to change anything, and I shared my findings with the DDN team. It worked out well for us.

P.S. I am not in any way employed or compensated by DDN. I'm just sharing my own experience. Smile.

Cheers,
megan
[lustre-discuss] Multiple IB Interfaces
Greetings Alastair,

Bonding is supported on InfiniBand, but I believe that it is only active/passive. I think what you might be looking for WRT avoiding data travel through the inter-cpu link is CPU "affinity", AKA CPU "pinning".

Cheers,
megan

WRT = "with regards to"
AKA = "also known as"
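CPU pinning along the lines suggested above might be sketched like this: keep a process's cores and memory on the NUMA node whose PCIe lanes attach to the InfiniBand HCA, so its traffic never crosses the inter-CPU link. Node and core numbers below are illustrative; check your own topology first.

```shell
# Show the NUMA topology: which CPUs and memory belong to which node
numactl --hardware

# Find which NUMA node the IB HCA is attached to (mlx5_0 is an example name)
cat /sys/class/infiniband/mlx5_0/device/numa_node

# Run an I/O-heavy process with its CPUs and memory pinned to node 0,
# assuming that is the node hosting the HCA
numactl --cpunodebind=0 --membind=0 ./my_io_app

# Or restrict an already-running process to cores 0-7 by PID
taskset -pc 0-7 <pid>
```

Pinning memory (--membind) matters as much as pinning cores: a thread on the right socket still pays the inter-CPU penalty if its buffers were allocated on the far node.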
Re: [lustre-discuss] MGS IP in a HA cluster
Hello Community!

WRT Mr. Sid Young's question on volumes visible to the HA partner for failover, I have experience with lustre-2.12.6 that is ZFS-backed. We have the MGS/MDS and then the OSS boxes in pairs, using a heartbeat IP crossover-cabled between the two members of the pair; there is a unique non-routeable IP assigned to each member of the HA pair. The PCS set-up defines a preferred node for a Lustre target (MDT/OST). If the preferred node of the pair is unavailable, then the zpool is acquired via pacemaker on the secondary, which is capable of hosting its own OSTs and those of its HA partner if necessary.

Our MDS HA is different. We have only one MDT, which is mounted on one and only one member of the MGS/MDS HA pair. Generally speaking, one does not want a multi-mount situation. A target should only be on one member of the pair.

For our HA pairs, "zpool list" or "zpool status" does not even display a zpool which is not active (already imported) on that particular box. Yes, the pairs may have access to the resources of the partner, but that does not mean that those resources are active/seen/visible on the secondary if they are active on the primary. If the primary is inactive, then yes, all target resources should be visible and active on the secondary member of the HA pair.

Just sharing our viewpoint/use case.

Cheers,
megan
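A pcs sketch of the preferred-node arrangement described above, for one ZFS-backed OST: the zpool import and the Lustre mount are separate resources, colocated and ordered, with a location preference for one node of the pair. All resource, pool, and node names here are made up for illustration, and the exact OCF agents available depend on your resource-agents package.

```shell
# ZFS pool as a pacemaker resource (imported on whichever node runs it)
pcs resource create ost0-zpool ocf:heartbeat:ZFS pool=ostpool0 \
    op monitor interval=30s

# The Lustre target mount, on top of the imported pool
pcs resource create ost0-mount ocf:heartbeat:Filesystem \
    device=ostpool0/ost0 directory=/lustre/ost0 fstype=lustre

# Mount must live with, and start after, the pool import
pcs constraint colocation add ost0-mount with ost0-zpool INFINITY
pcs constraint order ost0-zpool then ost0-mount

# Prefer one member of the HA pair; failover to the partner is automatic
pcs constraint location ost0-zpool prefers oss01=100
```

Because the pool is only ever imported on one node at a time, this also gives the single-mount guarantee mentioned above: the partner cannot even see the zpool until pacemaker moves it.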