Re: [lustre-discuss] Recover from broken lustre updates

2021-07-27 Thread Ms. Megan Larko via lustre-discuss
Hello,

Yes. You may boot into the old Linux kernel and then erase/install/downgrade
the Lustre packages to your desired version.
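
For example, once you are back on the old kernel, something like the
following rough sketch, mirroring the package list in the yum log you quote
below (exact names and versions are your site's, and it assumes the older
rpms are still available in your repo or yum cache):

  # roll the updated Lustre rpms back to the previously installed versions
  yum downgrade lustre lustre-modules \
      lustre-osd-ldiskfs lustre-osd-ldiskfs-mount
  # lustre-osd-zfs was newly installed rather than updated, so remove it
  yum remove lustre-osd-zfs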

Best wishes,
Megan

On Tue, Jul 27, 2021 at 8:53 PM Haoyang Liu  wrote:

> Hello Megan:
>
>
> After checking the yum log, I found the following packages were also
> updated:
>
>
> Jul 24 10:30:07 Updated:
> lustre-modules-2.7.19.8-3.10.0_514.2.2.el7_lustre.x86_64.x86_64
> Jul 24 10:30:07 Updated:
> lustre-osd-ldiskfs-mount-2.7.19.8-3.10.0_514.2.2.el7_lustre.x86_64.x86_64
> Jul 24 10:31:35 Updated:
> lustre-osd-ldiskfs-2.7.19.8-3.10.0_514.2.2.el7_lustre.x86_64.x86_64
> Jul 24 10:31:36 Updated:
> lustre-2.7.19.8-3.10.0_514.2.2.el7_lustre.x86_64.x86_64
> Jul 24 10:33:04 Installed:
> lustre-osd-zfs-2.7.19.8-3.10.0_514.2.2.el7_lustre.x86_64.x86_64
>
>
> Is it safe to 1) boot into the old kernel and 2) remove the updated lustre
> packages and 3) install the old packages?
>
>
> Thanks,
>
>
> Haoyang
>
>
> -Original Message-
> *From:* "Ms. Megan Larko via lustre-discuss" <
> lustre-discuss@lists.lustre.org>
> *Sent:* 2021-07-27 22:15:15 (Tuesday)
> *To:* "Lustre User Discussion Mailing List" <
> lustre-discuss@lists.lustre.org>
> *Cc:*
> *Subject:* [lustre-discuss] Recover from broken lustre updates
>
>
>
> Greetings!
>
> I've not seen a response to this post yet so I will chime in.
>
> If the only rpm change was the Linux kernel, then you should be able to
> reboot into the previous Linux kernel.  The CentOS distro, like most Linux
> distros, will leave the old kernel rpms in place.  You may use commands
> like "grub2-editenv list" to see which kernel is the current default; you
> may change the default, etc.
>
> I would try this first.
> Cheers,
> megan
>
> --
Sent from Gmail Mobile
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Recover from broken lustre updates

2021-07-27 Thread Ms. Megan Larko via lustre-discuss
Greetings!

I've not seen a response to this post yet so I will chime in.

If the only rpm change was the Linux kernel, then you should be able to
reboot into the previous Linux kernel.  The CentOS distro, like most Linux
distros, will leave the old kernel rpms in place.  You may use commands
like "grub2-editenv list" to see which kernel is the current default; you
may change the default, etc.
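
For example, a minimal sketch (it assumes GRUB_DEFAULT=saved in
/etc/default/grub, which is the CentOS 7 default, and that the previous
kernel entry is still present; the entry index below is a placeholder):

  grub2-editenv list                                        # show the current default (saved_entry)
  awk -F\' '/^menuentry /{print $2}' /boot/grub2/grub.cfg   # list boot entries by index (0, 1, ...)
  grub2-set-default 1                                       # e.g. select the previous kernel's entry
  reboot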

I would try this first.
Cheers,
megan
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] OST not being used

2021-06-23 Thread Ms. Megan Larko via lustre-discuss
Hi!

Does the NIC on the OSS that serves OSTs 4-7 respond to an lctl ping?  You
indicated that it does respond to regular ping, ssh, etc.  I would
review my /etc/lnet.conf file for the behavior of a NIC that times out.
Does the conf allow for asymmetrical routing?  (Is that what you wish?)  Is
there only one path to those OSTs, or is there a failover NIC address
that did not work in this event for some reason?

The Lustre Operations Manual Section 9.1 on the lnetctl command shows how
you can get more info on the NIC (e.g. lnetctl net show).
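
For example, a quick sketch (the NID below is a placeholder for the OSS
serving OSTs 4-7):

  lctl ping 10.0.1.71@tcp     # placeholder NID of the OSS in question
  lnetctl net show -v         # local NIs, their status and statistics
  lnetctl peer show           # peer NIs and their state
  lnetctl global show         # discovery, drop_asym_route and other globals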

Good luck.
megan
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] OST not being used

2021-06-21 Thread Ms. Megan Larko via lustre-discuss
Greetings Alastair!

You did not indicate which version of Lustre you are using.  FYI, that
information can be useful for aiding you in your Lustre queries.

You show your command "lfs setstripe --stripe-index 7 myfile.dat".  The
Lustre Operations Manual ( https://doc.lustre.org/lustre_manual.xhtml
) Section 40.1.1 "Synopsis" indicates that stripe-index starts counting at
zero.  My reading of the Manual indicates that starting at zero and using
a default stripe count of one might correctly put the file onto obd index
8.  Depending upon whether obdidx starts at zero or one, eight might
possibly be the correct result.  Did you try using a stripe-index of 6 to
see if the resulting one-stripe file then lands on obdidx 7?
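
For example, a quick check (the file name is a placeholder):

  lfs setstripe --stripe-index 6 --stripe-count 1 testfile.dat
  lfs getstripe testfile.dat    # the obdidx column shows which OST it landed on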

If the OST is not usable then the command "lctl dl" will indicate that (as
does the command you used for active OST devices).  Your info does seem to
indicate that OST 7 is okay.

Cheers,
megan
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Benchmarking Lustre, reduce caching

2021-05-19 Thread Ms. Megan Larko via lustre-discuss
Hello,

The caching could be skewing your performance results.   Try writing a file
larger than the amount of memory on the LFS servers.
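
For example, a rough sketch (it assumes the servers have well under 128 GiB
of RAM and that /mnt/lustre is your client mount point; adjust the sizes to
your hardware):

  # write ~128 GiB with direct I/O to limit client page-cache effects
  dd if=/dev/zero of=/mnt/lustre/bigfile bs=1M count=131072 oflag=direct
  # drop client caches before reading it back
  echo 3 > /proc/sys/vm/drop_caches
  dd if=/mnt/lustre/bigfile of=/dev/null bs=1M iflag=direct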

Another nice item is the SuperComputing IO500 (and IO50 for smaller
systems).  There are instructions for benchmarking storage in ways whose
results can be compared with other published results, to get a good idea of
the performance ability of your storage.  There are also ideas on avoiding
caching issues, etc.  (Ref: io500.org)  Disclaimer: I am not associated
with either SuperComputing or the IO group.

Cheers,
megan
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] lustre-discuss Digest, Vol 182, Issue 12

2021-05-18 Thread Ms. Megan Larko via lustre-discuss
Hello Tahari,
What is the result of "lctl ping 10.0.1.70@tcp_0" from the box on which you
are trying to mount the Lustre File System?  Is the ping successful and
then fails after 3 seconds?  If yes, you may wish to check the
/etc/lnet.conf file for the LNet settings "discovery" (1 allows LNet
discovery while 0 does not) and "drop_asym_route" (1 drops asymmetrically
routed messages while 0 accepts them).  I have worked with a few
complex networks in which we chose to turn off LNet discovery and specify
the routes via /etc/lnet.conf.  On one system the asymmetrical routing (we
have 16 LNet boxes between the system and the Lustre storage) seemed to be
a problem, but we couldn't pin it to any particular box.  On that system,
disallowing asymmetrical routing seemed to help maintain LNet/Lustre
connectivity.
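
For example, a sketch of setting those two options with lnetctl (check the
resulting YAML against your own /etc/lnet.conf, since the exact layout
varies by Lustre version):

  lnetctl set discovery 0          # turn LNet discovery off
  lnetctl set drop_asym_route 1    # drop asymmetrically routed messages
  lnetctl global show              # confirm the current values
  lnetctl export > /etc/lnet.conf  # persist the running config, routes included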

One may use lctl ping to separate network connectivity issues from other
possibilities.

Cheers,
megan

On Mon, May 17, 2021 at 3:50 PM 
wrote:

> Send lustre-discuss mailing list submissions to
> lustre-discuss@lists.lustre.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
> or, via email, send a message with subject or body 'help' to
> lustre-discuss-requ...@lists.lustre.org
>
> You can reach the person managing the list at
> lustre-discuss-ow...@lists.lustre.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of lustre-discuss digest..."
>
>
> Today's Topics:
>
>1. Re: problems to mount MDS and MDT (Abdeslam Tahari)
>2. Re: problems to mount MDS and MDT (Colin Faber)
>
>
> --
>
> Message: 1
> Date: Mon, 17 May 2021 21:35:34 +0200
> From: Abdeslam Tahari 
> To: Colin Faber 
> Cc: lustre-discuss 
> Subject: Re: [lustre-discuss] problems to mount MDS and MDT
> Message-ID:
>  bxecepen5dzzd+qxn...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Thank you Colin
>
> No i don't have iptables or rules
>
> firewalld is stopped, selinux is disabled as well
>  iptables -L
> Chain INPUT (policy ACCEPT)
> target prot opt source   destination
>
> Chain FORWARD (policy ACCEPT)
> target prot opt source   destination
>
> Chain OUTPUT (policy ACCEPT)
> target prot opt source   destination
>
>
> Regards
>
>
> Regards
>
> On Mon, May 17, 2021 at 21:29, Colin Faber  wrote:
>
> > Firewall rules dealing with localhost?
> >
> > On Mon, May 17, 2021 at 11:33 AM Abdeslam Tahari via lustre-discuss <
> > lustre-discuss@lists.lustre.org> wrote:
> >
> >> Hello
> >>
> >> I have a problem mounting the MDS/MDT Lustre target; it won't mount at all
> >> and there are no error messages at the console
> >>
> >> -it does not show errors or messages while mounting it
> >>
> >> here are some debug file logs
> >>
> >>
> >> i specify it is a new project that i am doing.
> >>
> >> the version and packages of Lustre installed:
> >> kmod-lustre-2.12.5-1.el7.x86_64
> >> kernel-devel-3.10.0-1127.8.2.el7_lustre.x86_64
> >> lustre-2.12.5-1.el7.x86_64
> >> lustre-resource-agents-2.12.5-1.el7.x86_64
> >> kernel-3.10.0-1160.2.1.el7_lustre.x86_64
> >> kernel-debuginfo-common-x86_64-3.10.0-1160.2.1.el7_lustre.x86_64
> >> kmod-lustre-osd-ldiskfs-2.12.5-1.el7.x86_64
> >> kernel-3.10.0-1127.8.2.el7_lustre.x86_64
> >> lustre-osd-ldiskfs-mount-2.12.5-1.el7.x86_64
> >>
> >>
> >>
> >> the system(os) Centos 7
> >>
> >> the kernel
> >> Linux lustre-mds1 3.10.0-1127.8.2.el7_lustre.x86_64
> >>  cat /etc/redhat-release
> >>
> >>
> >> when I mount the Lustre file-system it won't show up and there are no errors
> >>
> >> mount -t lustre /dev/sda /mds
> >>
> >> lctl dl  does not show up
> >>
> >> df -h   no mount point for /dev/sda
> >>
> >>
> >> lctl dl
> >>
> >> shows this:
> >> lctl dl
> >>   0 UP osd-ldiskfs lustre-MDT-osd lustre-MDT-osd_UUID 3
> >>   2 UP mgc MGC10.0.1.70@tcp 57e06c2d-5294-f034-fd95-460cee4f92b7 4
> >>   3 UP mds MDS MDS_uuid 2
> >>
> >>
> >> but unfortunately it disappears after 03 seconds
> >>
> >> lctl  dl shows nothing
> >>
> >> lctl dk
> >>
> >> shows this debug output
> >>
> >>
> >>
> 0020:0080:18.0:1621276062.004338:0:13403:0:(obd_config.c:1128:class_process_config())
> >> processing cmd: cf006
> >>
> 0020:0080:18.0:1621276062.004341:0:13403:0:(obd_config.c:1147:class_process_config())
> >> removing mappings for uuid MGC10.0.1.70@tcp_0
> >>
> 0020:0104:18.0:1621276062.004346:0:13403:0:(obd_mount.c:661:lustre_put_lsi())
> >> put 9bbbf91d5800 1
> >>
> 0020:0080:18.0:1621276062.004351:0:13403:0:(genops.c:1501:class_disconnect())
> >> disconnect: cookie 0x256dd92fc5bf929c
> >>
> 0020:0080:18.0:1621276062.004354:0:13403:0:(genops.c:1024:class_export_put())
> >> final put 9bbf3e66a400/lustre-MDT-osd_UUID
> >>
> 0020:0100:18.0:1621276062.004361:0:13403:0:(obd_config.c:2100:class_manual_cleanup())
> >> Manual 

[lustre-discuss] Experience with DDN AI400X

2021-03-30 Thread Ms. Megan Larko via lustre-discuss
Hello!

I have no direct experience with the DDN AI400X, but as a vendor DDN has
some nice value-add in the Lustre systems they build.  Having worked with
other DDN Lustre hardware in my career, interoperability with other Lustre
mounts is usually not an issue unless the lustre-client software on the
client boxes is a very different software version or network stack.  A
general example: a box with lustre-client 2.10.4 is not going to be
completely happy with a new 2.12.x on the Lustre network.  As far as vendor
lock-in, DDN support in my past experience does have its own value-add to
their Lustre storage product, so it is not completely vanilla.  I have found
the enhancements useful.  As far as your total admin control of the DDN
storage product, that is probably up to the terms of the service agreement
made with purchase.  My one experience with DDN on that was that,
contractually, DDN maintained the box version level and patches, while
standard Lustre tunables were fine for local admins.  In one case we did
stumble upon a bug; I was permitted to dig around freely but not to change
anything, and I shared my findings with the DDN team.  It worked out well
for us.

P.S.  I am not in any way employed or compensated by DDN.  I'm just
sharing my own experience.  Smile.

Cheers,
megan
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Multiple IB Interfaces

2021-03-09 Thread Ms. Megan Larko via lustre-discuss
Greetings Alastair,

Bonding is supported on InfiniBand, but I believe that it is only
active/passive.
I think what you might be looking for WRT avoiding data travel through the
inter-CPU link is CPU "affinity" AKA CPU "pinning".
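
For example, a small sketch (mlx5_0 and the NUMA node number are
placeholders for your HCAs and topology):

  cat /sys/class/infiniband/mlx5_0/device/numa_node   # which socket the HCA is attached to
  numactl --cpunodebind=0 --membind=0 ./my_app        # pin the job to that socket's cores and memory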

Cheers,
megan

WRT = "with regards to"
AKA = "also known as"
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] MGS IP in a HA cluster

2021-02-22 Thread Ms. Megan Larko via lustre-discuss
Hello Community!

WRT Mr. Sid Young's question on volumes visible to the HA partner for
failover, I have experience with lustre-2.12.6 that is ZFS-backed.  We have
the MGS/MDS and then the OSS boxes in pairs using a heartbeat IP
crossover-cabled between the two members of the pair; there is a unique
non-routeable IP assigned to each member of the HA pair.  The PCS set-up
defines a preferred node for each Lustre target (MDT/OST).  If the preferred
(primary) node of the pair is unavailable, then the zpool is acquired via
Pacemaker on the secondary, which is capable of hosting its own OSTs and
those of its HA partner if necessary.

Our MDS HA is different.  We have only one MDT which is mounted on one and
only one member of the MGS/MDS HA pair.

Generally speaking, one does not want a multi-mount situation.  A target
should only be mounted on one member of the pair.  For our HA pairs, "zpool
list" or "zpool status" does not even display a zpool which is not active
(already imported) on that particular box.  Yes, the pairs may have access
to the resources of the partner, but that does not mean that those
resources are active/seen/visible on the secondary if they are active on
the primary.  If the primary is inactive, then yes, all target resources
should be visible and active on the secondary member of the HA pair.
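
For example, a sketch of what we look at by hand (the pool name is a
placeholder, and in normal operation Pacemaker does the import/export for
you):

  zpool list                  # only pools already imported on this node appear
  zpool import                # pools that are exported and visible to this node
  zpool import -N ostpool-3   # manual acquire on the partner, only with the PCS resource stopped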

Just sharing our viewpoint/use case.
Cheers,
megan
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org