Re: [lustre-discuss] Recover from broken lustre updates

2021-07-27 Thread Ms. Megan Larko via lustre-discuss
Hello,

Yes. You may boot into the old linux kernel and erase/install/downgrade to
your desired Lustre version.
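
A rough sketch of what that could look like (package names/versions below are
illustrative only -- match them to the entries in your yum log):

# after rebooting into the previous kernel
yum remove lustre lustre-modules lustre-osd-ldiskfs-mount lustre-osd-ldiskfs lustre-osd-zfs
yum install lustre-<old_version> lustre-modules-<old_version> ...
# or, if the older packages are still in a configured repository:
yum downgrade lustre lustre-modules lustre-osd-ldiskfs-mount lustre-osd-ldiskfs
# (alternatively, "yum history" / "yum history undo <id>" can roll back the whole transaction)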

Best wishes,
Megan

On Tue, Jul 27, 2021 at 8:53 PM Haoyang Liu  wrote:

> Hello Megan:
>
>
> After checking the yum log, I found the following packages were also
> updated:
>
>
> Jul 24 10:30:07 Updated:
> lustre-modules-2.7.19.8-3.10.0_514.2.2.el7_lustre.x86_64.x86_64
> Jul 24 10:30:07 Updated:
> lustre-osd-ldiskfs-mount-2.7.19.8-3.10.0_514.2.2.el7_lustre.x86_64.x86_64
> Jul 24 10:31:35 Updated:
> lustre-osd-ldiskfs-2.7.19.8-3.10.0_514.2.2.el7_lustre.x86_64.x86_64
> Jul 24 10:31:36 Updated:
> lustre-2.7.19.8-3.10.0_514.2.2.el7_lustre.x86_64.x86_64
> Jul 24 10:33:04 Installed:
> lustre-osd-zfs-2.7.19.8-3.10.0_514.2.2.el7_lustre.x86_64.x86_64
>
>
> Is it safe to 1) boot into the old kernel and 2) remove the updated lustre
> packages and 3) install the old packages?
>
>
> Thanks,
>
>
> Haoyang
>
>
> -Original Message-
> *From:* "Ms. Megan Larko via lustre-discuss" <
> lustre-discuss@lists.lustre.org>
> *Sent:* 2021-07-27 22:15:15 (Tuesday)
> *To:* "Lustre User Discussion Mailing List" <
> lustre-discuss@lists.lustre.org>
> *Cc:*
> *Subject:* [lustre-discuss] Recover from broken lustre updates
>
>
>
> Greetings!
>
> I've not seen a response to this post yet so I will chime in.
>
> If the only rpm change was the linux kernel then you should be able to
> reboot into the previous linux kernel.   The CentOS distro, like most linux
> distros, will leave the old kernel rpms in place.  You may use commands
> like "grub2 editenv list" to see what kernel is the current default; you
> may change the default, etc...
>
> I would try this first.
> Cheers,
> megan
>
> --
Sent from Gmail Mobile
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Recover from broken lustre updates

2021-07-27 Thread Ms. Megan Larko via lustre-discuss
Greetings!

I've not seen a response to this post yet so I will chime in.

If the only rpm change was the linux kernel then you should be able to
reboot into the previous linux kernel.   The CentOS distro, like most linux
distros, will leave the old kernel rpms in place.  You may use commands
like "grub2 editenv list" to see what kernel is the current default; you
may change the default, etc...
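
For example (a sketch; the grub2-set-default index is an assumption -- check
your own menu entries first):

grub2-editenv list                       # show the current default kernel
awk -F\' '/^menuentry /{print $2}' /boot/grub2/grub.cfg   # list installed kernels (BIOS grub.cfg path assumed)
grub2-set-default 1                      # select the previous kernel by index
reboot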

I would try this first.
Cheers,
megan
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] OST not being used

2021-06-23 Thread Ms. Megan Larko via lustre-discuss
Hi!

Does the NIC on the OSS that serves OSTs 4-7 respond to an lctl ping?  You
indicated that it does respond to regular ping, ssh, etc.  I would
review my /etc/lnet.conf file for the behavior of a NIC that times out.
Does the conf allow for asymmetrical routing?  (Is that what you wish?)  Is
there only one path to those OSTs, or is there a failover NIC address
that did not work in this event for some reason?

The Lustre Operations Manual Section 9.1 on the lnetctl command shows how you
can get more info on the NIC ("lnetctl ... show").
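
A few commands along those lines (a sketch; run them on the client and/or on
the OSS in question, and the NID shown is hypothetical):

lnetctl net show -v      # local NIs, their status and statistics
lnetctl peer show -v     # known peers and their health values
lctl ping 10.0.0.5@o2ib  # LNet-level ping of the OSS NID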

Good luck.
megan
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] OST not being used

2021-06-21 Thread Ms. Megan Larko via lustre-discuss
Greetings Alastair!

You did not indicate which version of Lustre you are using.  FYI, that
information can be useful in aiding you in your Lustre queries.

You show your command "lfs setstripe --stripe-index 7 myfile.dat".  The
Lustre Operations Manual ( https://doc.lustre.org/lustre_manual.xhtml
) Section 40.1.1 "Synopsis" indicates that stripe-index starts counting at
zero.  My reading of the Manual indicates that starting at zero and using
a default stripe count of one might correctly put the file onto obd index
8.  Depending upon whether obdidx starts at zero or one, eight might
well be the correct result.  Did you try using a stripe-index of 6 to
see if the resulting one-stripe file then lands on obdidx 7?
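
Something like this (a sketch; the file name is just an example) would show
where the object actually lands:

lfs setstripe --stripe-index 6 -c 1 myfile.dat
lfs getstripe myfile.dat      # the obdidx column shows which OST holds the object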

If the OST is not usable then the command "lctl dl" will indicate that (as
does the command you used for active OST devices).  Your info does seem to
indicate that OST 7 is okay.

Cheers,
megan
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Benchmarking Lustre, reduce caching

2021-05-19 Thread Ms. Megan Larko via lustre-discuss
Hello,

The caching could be skewing your performance results.   Try writing a file
larger than the amount of memory on the LFS servers.
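
For example (a sketch; the size and path are assumptions -- use something
comfortably larger than the combined RAM of your servers):

dd if=/dev/zero of=/mnt/lustre/bigfile bs=1M count=400000 oflag=direct

(oflag=direct also keeps the client page cache out of the picture.)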

Another nice item is the SuperComputing IO500 (and IO50 for smaller
systems).  There are instructions for benchmarking storage in ways whose
results can be compared against others' for a good idea of the performance
ability of your storage.  There are also ideas on avoiding caching issues,
etc. (ref. io500.org).  Disclaimer:  I am not associated with either
SuperComputing or the IO500 group.

Cheers,
megan
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] lustre-discuss Digest, Vol 182, Issue 12

2021-05-18 Thread Ms. Megan Larko via lustre-discuss
Hello Tahari,
What is the result of "lctl ping 10.0.1.70@tcp_0" from the box on which you
are trying to mount the Lustre File System?  Is the ping successful and
does it then fail after 03 seconds?  If yes, you may wish to check the
/etc/lnet.conf file for the Lustre LNet settings "discovery" (1 allows LNet
discovery while 0 does not) and "drop_asym_route" (1 drops asymmetrically
routed messages while 0 accepts them).  I have worked with a few
complex networks in which we chose to turn off LNet discovery and specify
the routes via /etc/lnet.conf.  On one system the asymmetrical routing (we
have 16 LNet boxes between the system and the Lustre storage) seemed to be
a problem, but we couldn't pin it to any particular box.  On that system
disallowing asymmetrical routing seemed to help maintain LNet/Lustre
connectivity.
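
In /etc/lnet.conf those settings live under the global section; a minimal
sketch (matching what I described -- discovery off and asymmetrically routed
messages dropped) would be:

global:
    discovery: 0
    drop_asym_route: 1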

One may check the lctl ping to narrow down net connectivity from other
possibilities.

Cheers,
megan

On Mon, May 17, 2021 at 3:50 PM 
wrote:

> Send lustre-discuss mailing list submissions to
> lustre-discuss@lists.lustre.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
> or, via email, send a message with subject or body 'help' to
> lustre-discuss-requ...@lists.lustre.org
>
> You can reach the person managing the list at
> lustre-discuss-ow...@lists.lustre.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of lustre-discuss digest..."
>
>
> Today's Topics:
>
>1. Re: problems to mount MDS and MDT (Abdeslam Tahari)
>2. Re: problems to mount MDS and MDT (Colin Faber)
>
>
> --
>
> Message: 1
> Date: Mon, 17 May 2021 21:35:34 +0200
> From: Abdeslam Tahari 
> To: Colin Faber 
> Cc: lustre-discuss 
> Subject: Re: [lustre-discuss] problems to mount MDS and MDT
> Message-ID:
>  bxecepen5dzzd+qxn...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Thank you Colin
>
> No i don't have iptables or rules
>
> firewalled is stopped selinux disabled as well
>  iptables -L
> Chain INPUT (policy ACCEPT)
> target prot opt source   destination
>
> Chain FORWARD (policy ACCEPT)
> target prot opt source   destination
>
> Chain OUTPUT (policy ACCEPT)
> target prot opt source   destination
>
>
> Regards
>
>
> Regards
>
> On Mon, 17 May 2021 at 21:29, Colin Faber wrote:
>
> > Firewall rules dealing with localhost?
> >
> > On Mon, May 17, 2021 at 11:33 AM Abdeslam Tahari via lustre-discuss <
> > lustre-discuss@lists.lustre.org> wrote:
> >
> >> Hello
> >>
> >> i have a problem mounting the mds/mdt for lustre; it won't mount at all and
> >> there are no error messages at the console
> >>
> >> -it does not show errors or messages while mounting it
> >>
> >> here are some debug file logs
> >>
> >>
> >> i specify it is a new project that i am doing.
> >>
> >> the version and packages of lustre installed:
> >> kmod-lustre-2.12.5-1.el7.x86_64
> >> kernel-devel-3.10.0-1127.8.2.el7_lustre.x86_64
> >> lustre-2.12.5-1.el7.x86_64
> >> lustre-resource-agents-2.12.5-1.el7.x86_64
> >> kernel-3.10.0-1160.2.1.el7_lustre.x86_64
> >> kernel-debuginfo-common-x86_64-3.10.0-1160.2.1.el7_lustre.x86_64
> >> kmod-lustre-osd-ldiskfs-2.12.5-1.el7.x86_64
> >> kernel-3.10.0-1127.8.2.el7_lustre.x86_64
> >> lustre-osd-ldiskfs-mount-2.12.5-1.el7.x86_64
> >>
> >>
> >>
> >> the system(os) Centos 7
> >>
> >> the kernel
> >> Linux lustre-mds1 3.10.0-1127.8.2.el7_lustre.x86_64
> >>  cat /etc/redhat-release
> >>
> >>
> >> when i mount the lustre file-system it won't show up and there are no errors
> >>
> >> mount -t lustre /dev/sda /mds
> >>
> >> lctl dl  does not show up
> >>
> >> df -h   no mount point for /dev/sda
> >>
> >>
> >> lctl dl
> >>
> >> shows this:
> >> lctl dl
> >>   0 UP osd-ldiskfs lustre-MDT-osd lustre-MDT-osd_UUID 3
> >>   2 UP mgc MGC10.0.1.70@tcp 57e06c2d-5294-f034-fd95-460cee4f92b7 4
> >>   3 UP mds MDS MDS_uuid 2
> >>
> >>
> >> but unfortunately it disappears after 03 seconds
> >>
> >> lctl  dl shows nothing
> >>
> >> lctl dk
> >>
> >> shows this debug output
> >>
> >>
> >>
> 0020:0080:18.0:1621276062.004338:0:13403:0:(obd_config.c:1128:class_process_config())
> >> processing cmd: cf006
> >>
> 0020:0080:18.0:1621276062.004341:0:13403:0:(obd_config.c:1147:class_process_config())
> >> removing mappings for uuid MGC10.0.1.70@tcp_0
> >>
> 0020:0104:18.0:1621276062.004346:0:13403:0:(obd_mount.c:661:lustre_put_lsi())
> >> put 9bbbf91d5800 1
> >>
> 0020:0080:18.0:1621276062.004351:0:13403:0:(genops.c:1501:class_disconnect())
> >> disconnect: cookie 0x256dd92fc5bf929c
> >>
> 0020:0080:18.0:1621276062.004354:0:13403:0:(genops.c:1024:class_export_put())
> >> final put 9bbf3e66a400/lustre-MDT-osd_UUID
> >>
> 0020:0100:18.0:1621276062.004361:0:13403:0:(obd_config.c:2100:class_manual_cleanup())
> >> Manual cleanu

Re: [lustre-discuss] Experience with DDN AI400X

2021-04-06 Thread Ms. Megan Larko via lustre-discuss
Hello Folks,

To clarify my own issues with working with both Lustre server 2.12.5 and
LNet routers at 2.10.4, I have in my notes from October 2020 that I
received many, many lines in /var/log/messages reading:
 LNet: 8759:0 (o2iblnd_cb.c:3401:kiblnd_check_conns()) Timed out tx for  56 seconds
 which was followed by
Skipped 97 previous similar messages.

The behavior of the Lustre File System storage was a bit (noticeably)
slower when traversing LNets and clients at 2.10.4.   Now I will note that
the 2.10.4 Lustre clients were built with Mellanox version 4.3-1.0.1 and
the Lustre 2.12.5 servers are using Mellanox OFED version 4.7-1.0.0.
 These were the versions of Mellanox software applied when the boxes were
built.  I did not investigate the "Timed out tx for ..." message further; I only
noticed that it was consistent for me with 2.12.6 Lustre servers and LNet
routers at 2.10.4 (with the corresponding Mellanox OFED).  I eliminated the
obvious performance issue and messages by not using LNet routers with LFS
2.10.4/MOFED and going with an LNet router at Lustre client 2.12.2 or newer
where the Lustre client 2.12.2 is using Mellanox OFED 4.5-1.0.1.0.

That is why I made the comment that Lustre 2.10.x may not play well with
newer 2.12.x.  It could well be that the differences in the MOFED stack
are more the reason than the Lustre software itself.  Apologies if I offended.
  I'm glad other people have had better luck with Lustre 2.10.x and Lustre
2.12.x versions.

Cheers,
megan

P.S.  Sorry for delay in response; I was off for a few days.

On Fri, Apr 2, 2021 at 5:02 AM Andreas Dilger  wrote:

> On Mar 30, 2021, at 11:54, Spitz, Cory James via lustre-discuss <
> lustre-discuss@lists.lustre.org> wrote:
>
>
> Hello, Megan.
>
> I was curious why you made this comment:
> > A general example is a box with lustre-client 2.10.4 is not going to be
> completely happy with a new 2.12.x on the lustre network
> In general, I think that the two LTS release are very interoperable.  What
> incompatibility are you referring to?  Do you have a well-known LU or two
> to share?
>
>
> This could potentially relate to changes with configuring Multi-Rail LNet
> between those releases?
>
>
> On 3/30/21, 12:14 PM, "lustre-discuss on behalf of Ms. Megan Larko via
> lustre-discuss"  lustre-discuss@lists.lustre.org> wrote:
>
> Hello!
>
> I have no direct experience with the DDN AI400X, but as a vendor DDN has
> some nice value-add to the Lustre systems they build.  Having worked with
> other DDN Lustre hw in my career, interoperability with other Lustre mounts
> is usually not an issue unless the current lustre-client software on the
> client boxes is a very different software version or network stack.  A
> general example is a box with lustre-client 2.10.4 is not going to be
> completely happy with a new 2.12.x on the lustre network.  As far as vendor
> lock-in, DDN support in my past experience does have its own value-add to
> their Lustre storage product so it is not completely vanilla.  I have found
> the enhancements useful.  As far as your total admin control of the DDN
> storage product, that is probably up to the terms of the service agreement
> made with purchase.   My one experience with DDN on that is contractually
> DDN maintained the box version level and patches, standard Lustre tunables
> were fine for local admins.  In one case we did stumble upon a bug, I was
> permitted to dig around freely but not to change anything; I shared my
> findings with the DDN team.  It worked out well for us.
>
> P.S.  I am not in any way employed or compensated by DDN.I'm just
> sharing my own experience.   Smile.
>
> Cheers,
> megan
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
>
> Cheers, Andreas
> --
> Andreas Dilger
> Principal Lustre Architect
> Whamcloud
>
>
>
>
>
>
>
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Experience with DDN AI400X

2021-03-30 Thread Ms. Megan Larko via lustre-discuss
Hello!

I have no direct experience with the DDN AI400X, but as a vendor DDN has
some nice value-add to the Lustre systems they build.  Having worked with
other DDN Lustre hw in my career, interoperability with other Lustre mounts
is usually not an issue unless the current lustre-client software on the
client boxes is a very different software version or network stack.  A
general example is a box with lustre-client 2.10.4 is not going to be
completely happy with a new 2.12.x on the lustre network.  As far as vendor
lock-in, DDN support in my past experience does have its own value-add to
their Lustre storage product so it is not completely vanilla.  I have found
the enhancements useful.  As far as your total admin control of the DDN
storage product, that is probably up to the terms of the service agreement
made with purchase.   My one experience with DDN on that is contractually
DDN maintained the box version level and patches, standard Lustre tunables
were fine for local admins.  In one case we did stumble upon a bug, I was
permitted to dig around freely but not to change anything; I shared my
findings with the DDN team.  It worked out well for us.

P.S.  I am not in any way employed or compensated by DDN.I'm just
sharing my own experience.   Smile.

Cheers,
megan
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Multiple IB Interfaces

2021-03-09 Thread Ms. Megan Larko via lustre-discuss
Greetings Alastair,

Bonding is supported on InfiniBand, but I believe that it is only
active/passive (failover).
I think what you might be looking for WRT avoiding data traveling through the
inter-CPU link is CPU "affinity", AKA CPU "pinning".

Cheers,
megan

WRT = "with regards to"
AKA = "also known as"
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] MGS IP in a HA cluster

2021-02-22 Thread Ms. Megan Larko via lustre-discuss
Hello Community!

WRT Mr. Sid Young's question on volumes visible to an HA partner for failover,
I have experience with lustre-2.12.6 that is ZFS-backed.  We have the
MGS/MDS and then the OSS boxes in pairs using a heartbeat IP
crossover-cabled between the two members of the pair; there is a unique
non-routable IP assigned to each member of the HA pair.  The PCS set-up
defines a preferred node for a Lustre target (MDT/OST).  If the preferred
(primary) node of the pair is unavailable then the zpool is acquired
via pacemaker on the secondary, which is capable of hosting its own OSTs and
those of its HA partner if necessary.
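
As a very rough PCS sketch (resource-agent, pool, and node names here are
assumptions; our real configuration has more pieces, e.g. the Lustre mount
resources and ordering constraints):

pcs resource create ost01-pool ocf:heartbeat:ZFS pool=ostpool01
pcs constraint location ost01-pool prefers oss01=100
pcs constraint location ost01-pool prefers oss02=50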

Our MDS HA is different.  We have only one MDT which is mounted on one and
only one member of the MGS/MDS HA pair.

Generally speaking, one does not want a multi-mount situation.  A target
should only be mounted on one member of the pair.  For our HA pairs, "zpool
list" or "zpool status" does not even display a zpool which is not active
(already imported) on that particular box.  Yes, the pairs may have access
to the resources of the partner, but that does not mean that those
resources are active/seen/visible on the secondary if they are active on
the primary.  If the primary is inactive then yes, all target resources
should be visible and active on the secondary member of the HA pair.

Just sharing our viewpoint/use case.
Cheers,
megan
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] More issues with cur_grant_bytes

2020-12-08 Thread Ms. Megan Larko
Greetings K Hildebrand!
Running Lustre 2.12.5 on the clients I have observed the same behavior.  On
some OSTs, the osc.*.cur_grant_bytes drop to the mid-3000's!  That is
lower than the 70 numbers displayed in the posts of others on this
list.  Concurrently, the same OSTs displaying the low cur_grant_bytes
numbers displayed non-zero values in num_dirty_bytes (I think that is
the param; not at my desk.  It is in the same dir as cur_grant_bytes.)  I
did issue "lctl set_param osc.*.grant_shrink=0" on the clients.  The only
effect I observed from this setting is that the osc.*.num_dirty_bytes were
now always zero, but the cur_grant_bytes numbers were still decreasing,
although the decrease was much less than without the osc.*.grant_shrink=0
setting.

Any additional guidance is greatly appreciated.
Cheers,
megan
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] lustre-discuss Digest, Vol 177, Issue 4

2020-12-04 Thread Ms. Megan Larko
WRT Subject lnet routing issue - 2.12.5 client with 2.10.3 server:

I concur with ExecStart=/usr/sbin/lnetctl set discovery 0 in the
/usr/lib/systemd/system/lnet.service file.  I also add a line below that
lnetctl line to instantiate the desired Lustre peers.
Example:  ExecStart=/usr/sbin/lnetctl peer add --nid A.B.C.[R-Z]@tcp9 --non_mr
The --non_mr is "no multi-rail", re-enforcing the discovery setting of 0.  This
would be done on the Lustre 2.12.x (2.12.5) box.  Newer versions of Lustre are
"network greedy" (one might also say that LNet is not practicing social
distancing among networks).  As a reminder, newer Lustre 2.12.x uses
/etc/lnet.conf; a commented-out example is included in the Lustre 2.12.x
client install.

Cheers,
megan

On Tue, Dec 1, 2020 at 12:45 PM 
wrote:

> Send lustre-discuss mailing list submissions to
> lustre-discuss@lists.lustre.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
> or, via email, send a message with subject or body 'help' to
> lustre-discuss-requ...@lists.lustre.org
>
> You can reach the person managing the list at
> lustre-discuss-ow...@lists.lustre.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of lustre-discuss digest..."
>
>
> Today's Topics:
>
>1. Re: lnet routing issue - 2.12.5 client with 2.10.3 server
>   (Mark Lundie)
>
>
> --
>
> Message: 1
> Date: Tue, 1 Dec 2020 12:58:27 +
> From: Mark Lundie 
> To: "Degremont, Aurelien" , f?rat y?lmaz
> 
> Cc: "lustre-discuss@lists.lustre.org"
> 
> Subject: Re: [lustre-discuss] lnet routing issue - 2.12.5 client with
> 2.10.3 server
> Message-ID:
> <
> am6pr0102mb3112477460e5a537eb5c06f8de...@am6pr0102mb3112.eurprd01.prod.exchangelabs.com
> >
>
> Content-Type: text/plain; charset="iso-8859-3"
>
> Hi Aurélien,
>
> Many thanks! Sorry I missed that. I'll try disabling discovery as
> suggested.
>
> Thanks,
>
> Mark
> 
> From: Degremont, Aurelien 
> Sent: 01 December 2020 12:42
> To: Mark Lundie ; fırat yılmaz <
> firatyilm...@gmail.com>
> Cc: lustre-discuss@lists.lustre.org 
> Subject: Re: [lustre-discuss] lnet routing issue - 2.12.5 client with
> 2.10.3 server
>
>
> This is a known issue, see https://jira.whamcloud.com/browse/LU-11840 and
> https://jira.whamcloud.com/browse/LU-13548
>
>
>
> Aurélien
>
>
>
> From: lustre-discuss  on behalf of
> Mark Lundie 
> Date: Tuesday, 1 December 2020 at 13:16
> To: fırat yılmaz 
> Cc: "lustre-discuss@lists.lustre.org" 
> Subject: RE: [EXTERNAL] [lustre-discuss] lnet routing issue - 2.12.5 client
> with 2.10.3 server
>
>
>
> CAUTION: This email originated from outside of the organization. Do not
> click links or open attachments unless you can confirm the sender and know
> the content is safe.
>
>
>
> Hi Firat,
>
> Thanks for your reply. Apologies if I am being silly here, but there is no
> route configured for that network. We have the networks tcp (10.110.0.0/16)
> and tcp1 (10.10.0.0/16). The servers have interfaces on both, but the
> clients only have an interface on tcp1. I'm not sure why the client is
> trying to route to 10.110.0.21@tcp:
>
>
>
> client # mount /net/lustre/
>
> mount.lustre: mount hmeta1@tcp1:hmeta2@tcp1:/lustre at /net/lustre
> failed: Input/output error
>
> Is the MGS running?
>
>
>
> hmeta1 resolves to 10.10.0.91, on tcp1.
>
>
>
> Thanks,
>
>
>
> Mark
>
> 
>
> From: fırat yılmaz 
> Sent: 01 December 2020 11:55
> To: Mark Lundie 
> Cc: lustre-discuss@lists.lustre.org 
> Subject: Re: [lustre-discuss] lnet routing issue - 2.12.5 client with
> 2.10.3 server
>
>
>
> Hi Mark,
>
>
>
> [Tue Dec  1 11:07:55 2020] LNetError:
> 2127:0:(lib-move.c:1999:lnet_handle_find_routed_path()) no route to
> 10.110.0.21@tcp from 
>
>
>
> I would suggest checking  lnetctl routing show and remove the route to
> 10.110.0.21@tcp and try to mount.
>
> https://wiki.lustre.org/LNet_Router_Config_Guide
>
>
>
>
>
>
>
> On Tue, Dec 1, 2020 at 2:41 PM Mark Lundie  > wrote:
>
> Hi all,
>
>
>
> I've just run in to an issue mounting on a newly upgraded client running
> 2.12.5 with 2.10.3 servers. Just to give some background, we're about to
> replace our existing Lustre storage, but will run it concurrently with the
> replacement for a couple of months. We'll be running 2.12.5 server on the
> new MDS and OSSs and I plan to update all clients to the same version. I
> would like to avoid updating the existing servers though.
>
>
>
> The problem is this. The servers have two tcp LNET networks, tcp and tcp1,
> on separate subnets and VLANs. The clients only see tcp1 (a small number
> are also on tcp3, routed via 2 lnet routers), which has been fine until
> now. With the 2.12.5 client, however, it is trying to mount from tcp.
> 2.10.3 to 2.12.5 i

Re: [lustre-discuss] lustre-discuss Digest, Vol 175, Issue 2

2020-10-06 Thread Ms. Megan Larko
For Subject "Help mounting MDT", to Alastair:
Just to clarify, you mentioned that the MDT is ldiskfs, but are you
mounting the MDT as part of a full Lustre File System on the MDS server,
i.e. are you mounting the MDT as type lustre?
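
In other words (the device name below is hypothetical):

mount -t lustre /dev/mapper/mdt0 /mnt/mdt            # starts the MDT as a Lustre target
mount -t ldiskfs /dev/mapper/mdt0 /mnt/mdt-ldiskfs   # only mounts the backing fs for inspection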

Cheers,
megan

On Mon, Oct 5, 2020 at 4:50 PM 
wrote:

> Send lustre-discuss mailing list submissions to
> lustre-discuss@lists.lustre.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
> or, via email, send a message with subject or body 'help' to
> lustre-discuss-requ...@lists.lustre.org
>
> You can reach the person managing the list at
> lustre-discuss-ow...@lists.lustre.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of lustre-discuss digest..."
>
>
> Today's Topics:
>
>1. Help mounting MDT (Alastair Basden)
>
>
> --
>
> Message: 1
> Date: Mon, 5 Oct 2020 16:28:37 +0100 (BST)
> From: Alastair Basden 
> To: lustre-discuss@lists.lustre.org
> Subject: [lustre-discuss] Help mounting MDT
> Message-ID: 
> Content-Type: text/plain; format=flowed; charset=US-ASCII
>
> Hi all,
>
> We are having a problem mounting a ldiskfs mdt.  The mount command is
> hanging, with /var/log/messages containing:
> Oct  5 16:26:17 c6mds1 kernel: INFO: task mount.lustre:4285 blocked for
> more than 120 seconds.
> Oct  5 16:26:17 c6mds1 kernel: "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Oct  5 16:26:17 c6mds1 kernel: mount.lustreD 92cd279de2a0 0
> 4285   4284 0x0082
> Oct  5 16:26:17 c6mds1 kernel: Call Trace:
> Oct  5 16:26:17 c6mds1 kernel: [] schedule+0x29/0x70
> Oct  5 16:26:17 c6mds1 kernel: []
> schedule_timeout+0x221/0x2d0
> Oct  5 16:26:17 c6mds1 kernel: [] ?
> enqueue_task_fair+0x208/0x6c0
> Oct  5 16:26:17 c6mds1 kernel: [] ?
> sched_clock_cpu+0x85/0xc0
> Oct  5 16:26:17 c6mds1 kernel: [] ?
> check_preempt_curr+0x80/0xa0
> Oct  5 16:26:17 c6mds1 kernel: [] ?
> ttwu_do_wakeup+0x19/0xe0
> Oct  5 16:26:17 c6mds1 kernel: []
> wait_for_completion+0xfd/0x140
> Oct  5 16:26:17 c6mds1 kernel: [] ?
> wake_up_state+0x20/0x20
> Oct  5 16:26:17 c6mds1 kernel: []
> llog_process_or_fork+0x244/0x450 [obdclass]
> Oct  5 16:26:17 c6mds1 kernel: [] llog_process+0x14/0x20
> [obdclass]
> Oct  5 16:26:17 c6mds1 kernel: []
> class_config_parse_llog+0x125/0x350 [obdclass]
> Oct  5 16:26:17 c6mds1 kernel: []
> mgc_process_cfg_log+0x790/0xc40 [mgc]
> Oct  5 16:26:17 c6mds1 kernel: []
> mgc_process_log+0x3dc/0x8f0 [mgc]
> Oct  5 16:26:17 c6mds1 kernel: [] ?
> config_recover_log_add+0x13f/0x280 [mgc]
> Oct  5 16:26:17 c6mds1 kernel: [] ?
> class_config_dump_handler+0x7e0/0x7e0 [obdclass]
> Oct  5 16:26:17 c6mds1 kernel: []
> mgc_process_config+0x88b/0x13f0 [mgc]
> Oct  5 16:26:17 c6mds1 kernel: []
> lustre_process_log+0x2d8/0xad0 [obdclass]
> Oct  5 16:26:17 c6mds1 kernel: [] ?
> libcfs_debug_msg+0x57/0x80 [libcfs]
> Oct  5 16:26:17 c6mds1 kernel: [] ?
> lprocfs_counter_add+0xf9/0x160 [obdclass]
> Oct  5 16:26:17 c6mds1 kernel: []
> server_start_targets+0x13a4/0x2a20 [obdclass]
> Oct  5 16:26:17 c6mds1 kernel: [] ?
> lustre_start_mgc+0x260/0x2510 [obdclass]
> Oct  5 16:26:17 c6mds1 kernel: [] ?
> class_config_dump_handler+0x7e0/0x7e0 [obdclass]
> Oct  5 16:26:17 c6mds1 kernel: []
> server_fill_super+0x10cc/0x1890 [obdclass]
> Oct  5 16:26:17 c6mds1 kernel: []
> lustre_fill_super+0x328/0x950 [obdclass]
> Oct  5 16:26:17 c6mds1 kernel: [] ?
> lustre_common_put_super+0x270/0x270 [obdclass]
> Oct  5 16:26:17 c6mds1 kernel: [] mount_nodev+0x4f/0xb0
> Oct  5 16:26:17 c6mds1 kernel: [] lustre_mount+0x38/0x60
> [obdclass]
> Oct  5 16:26:17 c6mds1 kernel: [] mount_fs+0x3e/0x1b0
> Oct  5 16:26:17 c6mds1 kernel: []
> vfs_kern_mount+0x67/0x110
> Oct  5 16:26:17 c6mds1 kernel: [] do_mount+0x1ef/0xce0
> Oct  5 16:26:17 c6mds1 kernel: [] ?
> __check_object_size+0x1ca/0x250
> Oct  5 16:26:17 c6mds1 kernel: [] ?
> kmem_cache_alloc_trace+0x3c/0x200
> Oct  5 16:26:17 c6mds1 kernel: [] SyS_mount+0x83/0xd0
> Oct  5 16:26:17 c6mds1 kernel: []
> system_call_fastpath+0x25/0x2a
>
>
> This is Lustre 2.12.2 on CentOS 7.6
>
> Does anyone have any suggestions?
>
> Cheers,
> Alastair.
>
>
> --
>
> Subject: Digest Footer
>
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
>
> --
>
> End of lustre-discuss Digest, Vol 175, Issue 2
> **
>
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Lustre learning: lustre_exports_pending_total and lustre_exports_dirty_total

2020-07-07 Thread Ms. Megan Larko
Greetings Assembled-Knowledge!

I am in need of some education.  What do the parameters
lustre_exports_pending_total and lustre_exports_dirty_total mean?  A pointer to
the section in the Lustre Operations Manual is sufficient.  I have been
unable to discover to what these lustre parameters refer.  The reason I am
searching is because we have one shared Lustre File System (LFS), of
several, running Lustre ver 2.10.4 server that has a count in the hundreds of
thousands to millions for lustre_exports_pending_total, while our other
Lustre servers, also 2.10.4 and 2.12.2, have a zero for the value of
lustre_exports_pending_total.  I will note that the one LFS displaying
counts for lustre_exports_pending_total is our busiest file system.  I am
also wondering if the parameter lustre_exports_dirty_total is related to
the lustre_exports_pending_total parameter.
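
(My guess is that these exporter metrics are rolled up from the per-target
grant counters on the servers; if so, something like the following, run on an
OSS, should show the raw values -- treat the parameter names as an assumption
on my part:)

lctl get_param obdfilter.*.tot_pending obdfilter.*.tot_dirty obdfilter.*.tot_granted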

Any schooling in this is appreciated.
Thank you.

Cheers,
megan
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] mlx4 and mxl5 mix environment

2020-07-01 Thread Ms. Megan Larko
Awesome, thanks!   Unfortunately the password reset site is not finding my
UID.   Maybe I never had access to the Lustre wiki.  (I have so many
accounts that sometimes my head spins.)   I'm still willing to help.  Is
there a request password site?

Cheers,
megan

On Fri, Jun 26, 2020 at 8:54 PM Spitz, Cory James 
wrote:

> Megan,
>
>
>
> You wrote:
>
> PS. [I am willing to add/contribute to the
> http://wiki.lustre.org/Infiniband_Configuration_Howto but I think my
> account for wiki editing has expired (at least the one I thought I had did
> not work).
>
>
>
> Thank you for your offer!  Did you try
> http://wiki.lustre.org/Special:PasswordReset?  If that didn’t work then I
> think that you could email lustre@lists.opensfs.org.
>
>
>
> -Cory
>
>
>
>
>
>
>
> On 6/24/20, 3:33 PM, "lustre-discuss on behalf of Ms. Megan Larko" <
> lustre-discuss-boun...@lists.lustre.org on behalf of dobsonu...@gmail.com>
> wrote:
>
>
>
> On 22 Jun 2020 "guru.novice" wrote:
>
> Hi, all
> We set up a cluster using mlx4 and mlx5 drivers mixed, and all things go well.
> Later I find something in wiki
> http://wiki.lustre.org/Infiniband_Configuration_Howto and
>
> http://lists.onebuilding.org/pipermail/lustre-devel-lustre.org/2016-May/003842.html
> which was
> last edited on 2016.
> So do i need to change lnet configuration described in this page ?
> Or the problem has been resolved in new version (like 2.12.x) ?
> Anymore where can i find more details ?
>
> Any suggestions would be appreciated.
> Thanks!
>
>
>
> Hello guru.novice,
>
> Lustre 2.12.x has some nice LNet configuration abilities.  The old
> /etc/modprobe.d/ config files have been superseded by /etc/lnet.conf.   An
> install of Lustre 2.12.x provides a sample of this file (with the lines
> commented out).  Our experience has shown that not all lines are necessary;
> edit to suit.
>
>
>
> The Lustre 2.12.x has Multi-Rail (MR) on by default so Lustre will attempt
> to automatically find active and viable LNet paths to use.  This should
> have no issue with your mlx4/5 mix environment; we have some mixed IB and
> eth that work. To explicitly use MR one may set "Multi-Rail: true" in the
> "peer" NID section of the /etc/lnet.conf file.  But that was not necessary
> for us.  We used a simple /etc/lnet.conf for MR systems:
>
> File stub: /etc/lnet.conf
>
> net:
>
>- net type: o2ib0
>
>  local NI(s):
>
> - interfaces:
>
>  0: ib0
>
>   - net type: o2ib777
>
>  local NI(s):
>
> - interfaces:
>
>  0: ib0:1
>
> This allowed LNet to use any NID o2ib0 and o2ib777.
>
>
>
> Whatever is placed in the /etc/lnet.conf file is loaded into the kernel
> modules used via the Lustre starting mechanism (CentOS uses
> /usr/lib/systemd/system).  Because we are choosing _not_ to use MR on a
> different box, we explicitly defined the available routes in /etc/lnet.conf
> using the lines:
>
> route:
>
>- net: tcp
>
>  gateway: 10.10.10.101@o2ib1
>
>- net: tcp
>
>  gateway: 10.10.10.102@o2ib
>
> And so on up to 10.10.10.116@o2ib
>
>
>
>  In CentOS7, /usr/lib/systemd/system/lnet.service file is reproduced
> below.  (details: lustre-2.12.4-1 with Mellanox OFED version 4.7-1.0.0.1
> and kernel 3.10.0-957.27.2.el7)
>
> File lnet.service:
>
> [Unit]
>
> Description=lnet management
>
> Requires=network-online.target
>
> After=network-online.target openibd.service rdma.service opa.service
>
> ConditionPathExists=!/proc/sys/lnet/
>
>
>
> [Service]
>
> Type=oneshot
>
> RemainAfterExit=true
>
> ExecStart=/sbin/modprobe lnet
>
> ExecStart=/usr/sbin/lnetctl lnet configure
>
> ExecStart=/usr/sbin/lnetctl set discovery 0   <--Do NOT use this line if
> you want MR function
>
> ExecStart=/usr/sbin/lnetctl import /etc/lnet.conf  <--The file with
> router, credit and similar info
>
> ExecStart=/usr/sbin/lnetctl peer add --nid 10.10.10.[101-116]@o2ib1
> --non_mr  <--Omit --non_mr if you want to use MR
>
> ExecStop=/usr/sbin/lustre_rmmod ptlrpc
>
> ExecStop=/usr/sbin/lnetctl lnet unconfigure
>
> ExecStop=/usr/sbin/lustre_rmmod libcfs ldiskfs
>
>
>
> [Install]
>
> WantedBy=multi-user.target
>
>
>
> I hope this info can help you in the right direction.
>
>
>
> Cheers,
>
> megan
>
> PS. [I am willing to add/contribute to the
> http://wiki.lustre.org/Infiniband_Configuration_Howto but I think my
> account for wiki editing has expired (at least the one I thought I had did
> not work).
>
> Our site had issues with Multi-Rail "not socially distancing
> appropriately" from other LNet networks so in our particular case we
> disabled MR.  (An entirely different experience.) ]
>
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] mlx4 and mxl5 mix environment

2020-06-24 Thread Ms. Megan Larko
On 22 Jun 2020 "guru.novice" wrote:
Hi, all
We set up a cluster using mlx4 and mlx5 drivers mixed, and all things go well.
Later I find something in wiki
http://wiki.lustre.org/Infiniband_Configuration_Howto and
http://lists.onebuilding.org/pipermail/lustre-devel-lustre.org/2016-May/003842.html
which was
last edited on 2016.
So do i need to change lnet configuration described in this page ?
Or the problem has been resolved in new version (like 2.12.x) ?
Anymore where can i find more details ?

Any suggestions would be appreciated.
Thanks!

Hello guru.novice,
Lustre 2.12.x has some nice LNet configuration abilities.  The old
/etc/modprobe.d/ config files have been superseded by /etc/lnet.conf.   An
install of Lustre 2.12.x provides a sample of this file (with the lines
commented out).  Our experience has shown that not all lines are necessary;
edit to suit.

The Lustre 2.12.x has Multi-Rail (MR) on by default so Lustre will attempt
to automatically find active and viable LNet paths to use.  This should
have no issue with your mlx4/5 mix environment; we have some mixed IB and
eth that work. To explicitly use MR one may set "Multi-Rail: true" in the
"peer" NID section of the /etc/lnet.conf file.  But that was not necessary
for us.  We used a simple /etc/lnet.conf for MR systems:
File stub: /etc/lnet.conf
net:
   - net type: o2ib0
 local NI(s):
- interfaces:
 0: ib0
  - net type: o2ib777
 local NI(s):
- interfaces:
 0: ib0:1
This allowed LNet to use any NID o2ib0 and o2ib777.

Whatever is placed in the /etc/lnet.conf file is loaded into the kernel
modules used via the Lustre starting mechanism (CentOS uses
/usr/lib/systemd/system).  Because we are choosing _not_ to use MR on a
different box, we explicitly defined the available routes in /etc/lnet.conf
using the lines:
route:
   - net: tcp
 gateway: 10.10.10.101@o2ib1
   - net: tcp
 gateway: 10.10.10.102@o2ib
And so on up to 10.10.10.116@o2ib

 In CentOS7, /usr/lib/systemd/system/lnet.service file is reproduced
below.  (details: lustre-2.12.4-1 with Mellanox OFED version 4.7-1.0.0.1
and kernel 3.10.0-957.27.2.el7)
File lnet.service:
[Unit]
Description=lnet management
Requires=network-online.target
After=network-online.target openibd.service rdma.service opa.service
ConditionPathExists=!/proc/sys/lnet/

[Service]
Type=oneshot
RemainAfterExit=true
ExecStart=/sbin/modprobe lnet
ExecStart=/usr/sbin/lnetctl lnet configure
ExecStart=/usr/sbin/lnetctl set discovery 0   <--Do NOT use this line if you
want MR function
ExecStart=/usr/sbin/lnetctl import /etc/lnet.conf  <--The file with router,
credit and similar info
ExecStart=/usr/sbin/lnetctl peer add --nid 10.10.10.[101-116]@o2ib1
--non_mr  <--Omit --non_mr if you want to use MR
ExecStop=/usr/sbin/lustre_rmmod ptlrpc
ExecStop=/usr/sbin/lnetctl lnet unconfigure
ExecStop=/usr/sbin/lustre_rmmod libcfs ldiskfs

[Install]
WantedBy=multi-user.target

I hope this info can help you in the right direction.

Cheers,
megan
PS. [I am willing to add/contribute to the
http://wiki.lustre.org/Infiniband_Configuration_Howto but I think my
account for wiki editing has expired (at least the one I thought I had did
not work).
Our site had issues with Multi-Rail "not socially distancing appropriately"
from other LNet networks so in our particular case we disabled MR.  (An
entirely different experience.) ]
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Awaiting Lustre 2.12.5

2020-05-26 Thread Ms. Megan Larko
Greetings Folks,

I have read that LU-13131 has been addressed and landed in Lustre version
2.14.0 and 2.12.5.  As most of my current Lustre storage is a 2.12.x, I am
eager to check-out the fix landed in 2.12.5.  Is there yet any estimate of
release from the Whamcloud team?

Thanks,
megan
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] socklnd bonding

2020-05-08 Thread Ms. Megan Larko
On 7 May 2020, Amir Shehata wrote:


Hello all,

socklnd currently allows grouping of ethernet interfaces in socklnd
specific tcp bonding feature. This is superseded by the Multi-Rail feature
which provides the exact same functionality but for all LNDs.

Anyone still using the socklnd bonding, or should we remove the code?

thanks
amir

Query from megan:  What if our site implementation needs to disable MR?
Will the previous TCP/eth functionality of ksocklnd be lost?

Cheers,
m
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] lfs check *, change of behaviour from 2.7 to 2.10?

2019-04-10 Thread Ms. Megan Larko
Data point relating to the lfs check item below:
Using lustre server version 2.10.4 and lustre client version 2.10.7 the
unprivileged user still receives the message "error: check: mds status
failed" in response to "lfs check mds".   Ditto for ost check.   The "lfs
check" command responds properly if run by the root user.

megan

From the lustre-discuss:
Date: Tue, 9 Apr 2019 18:50:12 +1000
From: Andrew Elwell 
To: lustre-discuss@lists.lustre.org
Subject: [lustre-discuss] lfs check *, change of behaviour from 2.7 to
2.10?
Message-ID:

Content-Type: text/plain; charset="utf-8"

I've just noticed that 'lfs check mds / servers no longer works (2.10.0 or
greater clients) for unprivileged users, yet it worked for 2.7.x clients.

Is this by design?
(lfs quota thankfully still works as a normal user tho)


Andrew
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] LFS tuning hierarchy question

2019-01-24 Thread Ms. Megan Larko
Thank you for the information, Patrick.

On my current Lustre client all Lustre File Systems mounted (called
/mnt/foo and /mnt/bar in my example) display a connection value for
max_rpcs_in_flight = 8 for both file systems--the /mnt/foo on which the
server has max_rpcs_in_flight = 8 and also for /mnt/bar on which the Lustre
server indicates max_rpcs_in_flight = 32.

So using the Lustre 2.7.2 client default behavior all of the Lustre mounts
viewed on the client are max_rpcs_in_flight = 8.

I am assuming that I will need to set max_rpcs_in_flight to 32 on the
client, and that the client will then use 32 where the Lustre File System
server allows it and 8 on those Lustre File Systems whose servers have not
increased the default value for that parameter.

Is this correct?
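
Concretely, what I have in mind on the client is something like this (a
sketch; the "bar-*" device pattern is an assumption based on my example mount
names):

lctl set_param mdc.bar-MDT*.max_rpcs_in_flight=32
lctl get_param mdc.*.max_rpcs_in_flight    # expect foo to stay at 8 and bar to show 32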

Cheers,
megan
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] LFS tuning hierarchy question

2019-01-24 Thread Ms. Megan Larko
Halloo---  People!

I am seeking confirmation of an observed behavior in Lustre.

I have a Lustre client.   This client is running Lustre 2.7.2.  Mounted
onto this client I have /mnt/foo (Lustre server 2.7.2) and /mnt/bar (lustre
2.10.4).

Servers for /mnt/foo have max_rpcs_in_flight=8  (the default value)
Servers for /mnt/bar have max_rpcs_in_flight=32

On the Lustre client, the command "lctl get_param mdc.*.max_rpcs_in_flight"
show both file systems using max_rpcs_in_flight=8.

Is it correct that the client uses the lowest value for a Lustre tunable
presented from a Lustre file system server?  Or is it the case that the
client needs to be tuned so that it may use "up to" the maximum value of
the mounted file systems if the specific Lustre server supports that value?

Really I am wondering if it is possible to have, in this case, a
"max_rpcs_in_flight" to be 32 for the /mnt/bar Lustre File System while
still using a more-limited max_rpcs_in_flight of 8 for /mnt/foo.

TIA,
megan
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Usage for lfs setstripe -o ost_indices

2018-11-09 Thread Ms. Megan Larko
Responding to A. Dilger (orig e-mail copied below)
I am not sure what the overall objective is in specifically identifying
which OSTs to write to; it was a question from one in our user community.
I am not able to specify -o for an existing file.  I have not tried to use
liblustreapi to specify the OST layout during the write.

I concur that LU-8417 points out a very significant disadvantage to having
users employ the -o option to "lfs setstripe" and that using Lustre Pools
is a better idea for the file system. (I'm speculating that perhaps the
users themselves want to be able to create such Lustre Pool-like areas and
currently only sysadmins may create Lustre Pools.  Avoid the
middle-man/woman!  Smile!)
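
For reference, the pool route is only a few commands (a sketch; the file
system and pool names are made up, and the pool_new/pool_add steps must be
run on the MGS):

lctl pool_new megfs.fast
lctl pool_add megfs.fast OST[0-3]
lfs setstripe -p fast /mnt/megfs/projdir     # run on a client, on the target directory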

Let me get back to my users to better understand what it is that needs to
be done that causes them to wish to invoke the -o option to "lfs setstripe".

Thanks,
megan

A. Dilger wrote:
Andreas Dilger
12:51 PM (2 hours ago)


to Mohr, me, Lustre
This is https://jira.whamcloud.com/browse/LU-8417 "setstripe -o does not
work on directories", which has not been implemented yet.

That said, setting the default striping to specific OSTs on a directory is
usually not the right thing to do. That will result in OST imbalance.

Equivalent mechanisms include OST pools (which also allow a subset of OSTs
to be used, unlike -o currently does), and has the benefit of labeling
files with the pool to find them easier in the future (eg. for migrating
out of the pool).

What is the end goal that you are trying to achieve?

Cheers, Andreas
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Usage for lfs setstripe -o ost_indices

2018-11-08 Thread Ms. Megan Larko
Greetings List!

What is the correct invocation of specifying exact stripe layout in Lustre?

I am attempting to use the --ost | -o option to lfs setstripe.  The
Lustre_Operations_Manual as of 16 May 2018 Section 38.1.3 indicates that the
--ost-index option "is used to specify the exact stripe layout on the the
[sic] file system.  ost_indices is a list of OSTs referenced by their
indices and index ranges separated by commas."

A "man lfs setstripe" in Lustre 2.10.1 shows the -o or --ost-list
 may be a range separated by commas with the example of -o
1,2-4,7 (for -c 5).

The "usage" of "lfs setstripe" in Lustre 2.10.1 shows -o or --ost
.
So all cases indicate "-o" is an acceptable flag for specifying exact
stripe layout.

I have been attempting this command on a directory on a Lustre-2.10.4
storage from a Lustre 2.10.1 client and I fail with the following message:
> lfs setstripe -c 4 -S 1m  -o 1,2-4 custTest/
error on ioctl 0x4008669a for 'custTest' (3): Invalid argument
error: setstripe: create striped file 'custTest' failed: Invalid argument

Permutation "lfs setstripe -c 4 -S 1m --ost 1,2-4 custTest/" returns same
ioctl error.

I receive exactly the same error when varying the specification of -o (1-4,
1,2,3,4 etc).  I have tried using the "lctl dl" index number for the OST
desired--nope.

I noticed the ioctl is always 0x4008669a regardless of the system on which
I run the command using -o or --ost or --ost-list.

What is the correct invocation for "lfs setstripe" using -o?

Cheers,
megan
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Status LU-11188 lfs find for POSIX permissions

2018-11-08 Thread Ms. Megan Larko
Greetings List (and Developers)!

I was looking at LU-11188 regarding "Adding the capability to "lfs find" to
search/locate on permission of file, matching the find -perm behaviour of
find(1)."  The item has not been updated since the LU creation date.

The label describes it as "easy".   Not for me, unfortunately.  Is there
any interest in the community for this particular LU?  I'd like it, but
maybe that's just me.

If anyone has a status update (or opinion) I would be interested in hearing
it.

Thanks!
megan
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Lustre Size Variation after formatiing

2018-08-24 Thread Ms. Megan Larko
from ANS on 21 Aug 2018:
Thanks Rick for your detailed info.

Can you please let me know what the preferred standard benchmark tools are
that I can use for testing Lustre (like IOR), and what results I can
expect.

Thanks,
ANS.

Hi!
I have successfully used ior (version 3.1.0 currently) and fio (version
2.3) to check I/O to and from Lustre File Systems (an example ior invocation
is sketched after the list below).  The results you may obtain depend upon:
* hw (network connections, target disk speed, ...)
* the design of your benchmark query
  - Generally two cases: "hero numbers", which are the best that the
hardware/software stack can do, and "Your Use Case", which are numbers you
may achieve based upon your expected use case and educated decisions (about
striping, block size, write behavior, ...)
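
For what it's worth, a typical invocation on my side looks something like the
sketch below (the task count and file sizes are made up; size the run so the
aggregate data written exceeds your servers' combined memory):

mpirun -np 16 ior -a POSIX -t 1m -b 32g -F -e -o /mnt/lustre/ior_testfile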

So I do not know what you might expect for your Lustre File System; I know
only what I might expect from mine.  I'm not trying to be snarky here, just
honest.  My sample numbers probably would not be helpful to you.

Cheers,
megan
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Using lctl lfsck syntax issues

2018-08-10 Thread Ms. Megan Larko
Thanks, Andreas.   That worked.

EXAMPLE: on MDS
> lctl lfsck_query -t namespace -M meg1-MDT
namespace_mdts_init: 0
namespace_mdts_scanning-phase1: 0
namespace_mdts_scanning-phase2: 0
   .
   .
   .
namespace_repaired: 0

Currently the values for me are all zeroes--happy file system.

Cheers,
m

On Thu, Aug 9, 2018 at 11:02 PM, Andreas Dilger 
wrote:

> The lctl commands need to be run on the MDS.
>
> Cheers, Andreas
>
> > On Aug 9, 2018, at 11:49, Ms. Megan Larko  wrote:
> >
> > Howdy List!
> >
> > I am checking Lustre-2.10.4 (kernel 3.10.0-693 on CentOS 7.3.1611).
> > I am having trouble using lctl lfsck.  I believe I am not using the
> proper syntax.  The manual page, "man 8 lctl-lfsck-start" (or query in
> place of start) is not providing me the info I seek (or rather I just am
> not 'getting' it).
> >
> > All of my queries (shown below) have the same response.
> > > lctl lfsck_query -t namespace --device meg1-MDT
> > Fail to query LFSCK: Invalid argument
> > > lctl lfsck_query --device meg1-MDT
> > Fail to query LFSCK: Invalid argument
> > > lctl lfsck_query -M meg1-MDT
> > Fail
> > > lctl lfsck_query -M 4# lctl dl shows MDT as device num. 4
> > Fail 
> >
> > I am running this on the Lustre client.  The lctl command is from:
> > > rpm -qf /usr/sbin/lctl
> > lustre-2.10.4-1.el7.centos.x86_64
> >
> > I would like to test the lfsck_query|start|stop function.
> >
> > P.S.  Can an actual--non-dryrun--lfsck be run on a mounted active Lustre
> File System?
> >
> > Cheers,
> > megan
> >
> > ___
> > lustre-discuss mailing list
> > lustre-discuss@lists.lustre.org
> > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Using lctl lfsck syntax issues

2018-08-09 Thread Ms. Megan Larko
Howdy List!

I am checking Lustre-2.10.4 (kernel 3.10.0-693 on CentOS 7.3.1611).
I am having trouble using lctl lfsck.  I believe I am not using the proper
syntax.  The manual page, "man 8 lctl-lfsck-start" (or query in place of
start) is not providing me the info I seek (or rather I just am not
'getting' it).

All of my queries (shown below) have the same response.
> lctl lfsck_query -t namespace --device meg1-MDT
Fail to query LFSCK: Invalid argument
> lctl lfsck_query --device meg1-MDT
Fail to query LFSCK: Invalid argument
> lctl lfsck_query -M meg1-MDT
Fail
> lctl lfsck_query -M 4# lctl dl shows MDT as device num. 4
Fail 

I am running this on the Lustre client.  The lctl command is from:
> rpm -qf /usr/sbin/lctl
lustre-2.10.4-1.el7.centos.x86_64

I would like to test the lfsck_query|start|stop function.

P.S.  Can an actual--non-dryrun--lfsck be run on a mounted active Lustre
File System?

Cheers,
megan
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] ksym errors on kmod-lustre RPM after 2.10.4 build against MOFED 4.3

2018-08-09 Thread Ms. Megan Larko
Hi!

I concur with Gin Tan.

I built my successful Lustre-2.10.4 on an older linux-3.10.0-693.[2.2.x
or 17.1] kernel.
After booting into the 693 kernel, I then built MLNX_OFED_LINUX-4.3-1.0.1.0
via command:
"./mlnxofedinstall --skip-distro-check --add-kernel-support"

Then I start that version of mlnx (/etc/init.d/openibd start).

Then I build spl and zfs if the box is a Lustre server.
I go right to building Lustre if the goal is to install a client.

Lustre Client:
./configure --disable-server --disable-ldiskfs
--with-o2ib=/usr/src/ofa_kernel/default
--with-linux=/usr/src/kernels/3.10.0-693.17.1.el7.x86_64
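
For a server build on the same stack, a roughly analogous configure (the
spl/zfs source paths and versions are assumptions -- point them at whatever
you actually built) would be:

./configure --enable-server \
  --with-spl=/usr/src/spl-0.7.9 \
  --with-zfs=/usr/src/zfs-0.7.9 \
  --with-o2ib=/usr/src/ofa_kernel/default \
  --with-linux=/usr/src/kernels/3.10.0-693.17.1.el7.x86_64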

Then the usual and customary "make" and "make rpms".
I do have a kmod-lustre-* rpm.   I have no symbol errors and my test bed of
Lustre-2.10.4 is behaving nicely.

I have found that order matters.  I seem to need to do the kernel first,
then Mellanox on that kernel and build the Lustre part on the kernel with
the new MLNX active.

Cheers,
megan
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] MOFED 4.4-1.0.0.0

2018-08-04 Thread Ms. Megan Larko
Hi,

I have found that Lustre-2.10.4 works only with CentOS linux kernel
3.10.0-693.x and newer.  I discovered that Mellanox MOFED 4.3(or
4?)-1.0.1.0 (I'm not where I can verify the MOFED version number, but it is
4 and ending in "1.0.1.0") will not work with CentOS linux kernel
3.10.0-8*.  So I have a successful Lustre 2.10.4 with the "693" linux
kernel series and MOFED 4.3/4-1.0.1.0.

So I will second the statement that the software version stack is indeed
"particular".

Cheers,
megan
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] lfs find to locate files with specific permissions

2018-07-27 Thread Ms. Megan Larko
Great question!

I am using the find on those file systems, both non-Lustre and Lustre for
which there is not (yet) any Robinhood monitoring.

For those Lustre File Systems with Robinhood monitoring, yes, I can quickly
check with a simple mysql query ( SELECT name FROM NAMES LEFT JOIN ENTRIES
ON ENTRIES.id = NAMES.id WHERE ENTRIES.mode="493"  for my Robinhood
database ).  For those systems without Robinhood, linux "find  -type
d -perm 777".

I would like to use the "lfs find" for Lustre file systems not (yet) being
monitored by Robinhood.

Cheers,
megan



On Fri, Jul 27, 2018 at 1:51 PM, Shawn Hall  wrote:

> Hi Megan,
>
>
>
> Considering your recent LUG presentation on Robinhood, does this
> particular Lustre file system happen to have Robinhood watching over it?
> I’m not sure if the rbh command line tools themselves can get at that info,
> but I’ve done SQL queries in the past to get at that information.  There’s
> some octal -> decimal translation involved, but all the information you’re
> talking about is in the Robinhood database.
>
>
>
> Shawn
>
>
>
> *From: *lustre-discuss  on
> behalf of "Ms. Megan Larko" 
> *Date: *Friday, July 27, 2018 at 1:16 PM
> *To: *Lustre User Discussion Mailing List  >
> *Subject: *[lustre-discuss] lfs find to locate files with specific
> permissions
>
>
>
> Greetings List!
>
> Recently I have been looking for dirs with world write access.  I can do
> this on most file systems, including Lustre, with linux/POSIX "find
> /my/src/dir -type d -perm 777", for example.  I was going to invoke an "lfs
> find" on those file systems of type Lustre but I do not see a "-perm"
> option to the "lfs find".   Am I missing anything here?   Have I only the
> linux "find" for searching for permissions settings?
>
> FYI, the Lustre file systems are 2.7.3, 2.9.0 and 2.10.4 servers with
> clients 2.7.3 and newer.
>
> Cheers,
>
> megan
>
>
>
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] lfs find to locate files with specific permissions

2018-07-27 Thread Ms. Megan Larko
Greetings List!

Recently I have been looking for dirs with world write access.  I can do
this on most file systems, including Lustre, with linux/POSIX "find
/my/src/dir -type d -perm 777", for example.  I was going to invoke an "lfs
find" on those file systems of type Lustre but I do not see a "-perm"
option to the "lfs find".   Am I missing anything here?   Have I only the
linux "find" for searching for permissions settings?

FYI, the Lustre file systems are 2.7.3, 2.9.0 and 2.10.4 servers with
clients 2.7.3 and newer.

Cheers,
megan
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] lustre-discuss Digest, Vol 147, Issue 43

2018-07-03 Thread Ms. Megan Larko
WRT Subject: lctl ping node28@o2ib report   Input/output error

Hello Yu,

Just to check the obvious,
--  the recipient system (node28) is running lnet (an "lsmod | grep lnet"
returns the appropriate modules, for example)
--  there is nothing along the path which might be blocking Lustre port 988

Cheers,
megan

On Fri, Jun 29, 2018 at 4:19 PM, 
wrote:

> Send lustre-discuss mailing list submissions to
> lustre-discuss@lists.lustre.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
> or, via email, send a message with subject or body 'help' to
> lustre-discuss-requ...@lists.lustre.org
>
> You can reach the person managing the list at
> lustre-discuss-ow...@lists.lustre.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of lustre-discuss digest..."
>
>
> Today's Topics:
>
>1. Re: lctl ping node28@o2ib report Input/output error (Cory Spitz)
>
>
> --
>
> Message: 1
> Date: Fri, 29 Jun 2018 16:14:18 +
> From: Cory Spitz 
> To: Andreas Dilger , yu sun
> 
> Cc: "lustre-discuss@lists.lustre.org"
> 
> Subject: Re: [lustre-discuss] lctl ping node28@o2ib report
> Input/output error
> Message-ID: 
> Content-Type: text/plain; charset="utf-8"
>
> FYI, there is a helpful guide to LNet setup at
> http://wiki.lustre.org/LNet_Router_Config_Guide.  Despite the title, it
> is applicable to non-routed cases as well.
> -Cory
>
> --
>
> On 6/29/18, 1:06 AM, "lustre-discuss on behalf of Andreas Dilger" <
> lustre-discuss-boun...@lists.lustre.org on behalf of adil...@whamcloud.com>
> wrote:
>
> On Jun 28, 2018, at 21:14, yu sun  wrote:
> >
> > all server and client that fore-mentioned is using netmasks
> 255.255.255.224.  and they can ping with each other, for example:
> >
> > root@ml-gpu-ser200.nmg01:~$ ping node28
> > PING node28 (10.82.143.202) 56(84) bytes of data.
> > 64 bytes from node28 (10.82.143.202): icmp_seq=1 ttl=61 time=0.047 ms
> > 64 bytes from node28 (10.82.143.202): icmp_seq=2 ttl=61 time=0.028 ms
> >
> > --- node28 ping statistics ---
> > 2 packets transmitted, 2 received, 0% packet loss, time 999ms
> > rtt min/avg/max/mdev = 0.028/0.037/0.047/0.011 ms
> > root@ml-gpu-ser200.nmg01:~$ lctl ping node28@o2ib1
> > failed to ping 10.82.143.202@o2ib1: Input/output error
> > root@ml-gpu-ser200.nmg01:~$
> >
> >  and we also have hundreds of GPU machines with different IP
> Subnet,  they are in service and it's difficulty to change the network
> structure. so any material or document can guide me solve this by don't
> change network structure.
>
> The regular IP "ping" is being routed by an IP router, but that doesn't
> work with IB networks, AFAIK.  The IB interfaces need to be on the same
> subnet, you need to have an IB interface on each subnet configured on
> each subnet (which might get ugly if you have a large number of
> subnets)
> or you need to use LNet routers that are connected to each IB subnet to
> do the routing (each subnet would be a separate LNet network, for
> example
> 10.82.142.202@o2ib23 or whatever).
>
> The other option would be to use the IPoIB layer with socklnd (e.g.
> 10.82.142.202@tcp) but this would not run as fast as native verbs.
>
> Cheers, Andreas
>
>
> > Mohr Jr, Richard Frank (Rick Mohr)  wrote on Friday, 29 June 2018 at 3:30 PM:
> >
> > > On Jun 27, 2018, at 4:44 PM, Mohr Jr, Richard Frank (Rick Mohr) <
> rm...@utk.edu> wrote:
> > >
> > >
> > >> On Jun 27, 2018, at 3:12 AM, yu sun  wrote:
> > >>
> > >> client:
> > >> root@ml-gpu-ser200.nmg01:~$ mount -t lustre node28@o2ib1
> :node29@o2ib1:/project /mnt/lustre_data
> > >> mount.lustre: mount node28@o2ib1:node29@o2ib1:/project at
> /mnt/lustre_data failed: Input/output error
> > >> Is the MGS running?
> > >> root@ml-gpu-ser200.nmg01:~$ lctl ping node28@o2ib1
> > >> failed to ping 10.82.143.202@o2ib1: Input/output error
> > >> root@ml-gpu-ser200.nmg01:~$
> > >
> > > In your previous email, you said that you could mount lustre on
> the client ml-gpu-ser200.nmg01.  Was that not accurate, or did something
> change in the meantime?
> >
> > (Note: Received out-of-band reply from Yu stating that there was a
> typo in the previous email, and that client ml-gpu-ser200.nmg01 could not
> mount lustre.  Continuing discussion here so others on list can
> follow/benefit.)
> >
> > Yu,
> >
> > For the IPoIB addresses used on your nodes, what are the subnets
> (and netmasks) that you are using?  It looks like servers use 10.82.143.X
> and clients use 10.82.141.X.  If you are using a 255.255.0.0 netmask, you
> should be fine.  But if you are using 255.255.255.0, then you will run into
> problems.  Lustre expects that all nod

[lustre-discuss] Lustre Operations Manual PDF version

2018-05-11 Thread Ms. Megan Larko
Hi!

I am trying to get the PDF version of the Lustre Operations Manual from site
http://doc.lustre.org   and click "PDF".   The direct link is shown as
http://doc.lustre.org/lustre_manual.pdf.

Today (11 May 2018) I am getting errors that the PDF cannot be opened.

Is there an issue or should I be using a different site to get the
most-current Manual?

Cheers,
megan
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] lfs data_version $filename

2018-04-05 Thread Ms. Megan Larko
Greetings List!

What is the number from the command "lfs data_version $filename " telling
me?

I do not see "data_version" documented in lfs -h, man lfs, nor in lustre
manual..

I do know that if I have a zero-length file my robinhood scan of my Lustre
mount point indicates that "lfs get_version" failed.  As the file
in-question is zero length, I have no problem with this (other than lines
in my robinhood lustre.log file making it large).

I understand that Lustre File System uses three numbers to validate data
integrity.  How does the number returned from "lfs data_version" come into
play?  What reference number is that one?

Thank you in advance for the education.
Cheers,
megan
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] lustre-2.10.1 Data on MDT

2018-03-14 Thread Ms. Megan Larko
Hello List!

I have a very small test cluster with Lustre-2.10.1 on both client and
server.
I am testing Progressive File Layout and Data on MDT functions.

The backing file system on the Lustre targets is ZFS-0.7.6-1 and the OS is
CentOS 7.3.1611.  The storage was built from source on the cluster itself.

PFL functions as described in the lustre_manual.pdf  Chapter 19.5.
In attempting to test Data on MDT (DoM) which uses PFL, my "lfs setstripe"
command does not recognize the "-L mdt" flag.   It is not included in the
"man lfs-setstripe" for my 2.10.1 lustre and trying to use the command
example in the lustre_manual.pdf Chapter 20 2.1.1 example:
> lfs setstripe  -E 1M -L mdt -E -1 -S 4M -c -1 /mnt/L210/DoM_test
fails with "setstripe: invalid option -- 'L' ..." and prints the setstripe
help.

Sure enough "man lfs-setstripe" does not show the existence of the -L mdt
option described in the manual Chapter 20.

Has DoM landed yet?  If yes, in what lustre version?
If DoM is in lustre-2.10.1, is my syntax faulty?

P.S.  I am already looking at moving the test system to lustre-2.10.3
because of LU-9529/LU-9530 lfs migrate item.

Cheers,
megan
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] lustre mount in a heterogeneous net environment-Final: not a Lustre problem

2018-03-05 Thread Ms. Megan Larko
Hi List!

To bring closure to the question posed regarding mounting Lustre in a
heterogeneous network environment of both InfiniBand and ethernet, the
connection failed because of a network switch config item unrelated to
Lustre itself.

It ended up having nothing to do with Lustre File System options at all.

Yay!

megan
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] lustre mount in heterogeneous net environment-update

2018-02-28 Thread Ms. Megan Larko
Greetings List!

We have been continuing to dissect our LNet environment between our
lustre-2.7.0 clients and the lustre-2.7.18 servers.  We have moved from the
client node to the LNet server which bridges the InfiniBand (IB) and
ethernet networks.   As a test, we attempted to mount the ethernet Lustre
storage from the LNet hopefully taking the IB out of the equation to limit
the scope of our debugging.

On the LNet router the attempted mount of Lustre storage fails.   The LNet
command line error on the test LNet client is exactly the same as the
original client result:
mount A.B.C.D@tcp0:/lustre at /mnt/lustre failed: Input/output error  Is
the MGS running?

On the lustre servers, both the MGS/MDS and OSS we can see the error via
dmesg:
LNet: There was an unexpected network error while writing to C.D.E.F:  -110

and we see the periodic (~ every 10 to 20 minutes) in dmesg on MGS/MDS:
Lustre: MGS: Client  (at C.D.E.F@tcp) reconnecting

The "lctl pings" in various directions are still successful.

So, forget the end lustre client, we are not yet getting from MGS/MDS
sucessfully to the LNet router.
We have been looking at the contents of /sys/module/lustre.conf and we are
not seeing any differences in set values between the LNet router we are
using as a test Lustre client and the Lustre MGS/MDS server.

As much as I'd _love_ to go to Lustre-2.10.x, we are dealing with both
"appliance" style Lustre storage systems and clients tied to specific
versions of the linux kernel (for reasons other than Lustre).

Is there a key parameter which I could still be overlooking?

Cheers,
megan
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] lustre mount in heterogeneous net environment

2018-02-27 Thread Ms. Megan Larko
Hello Jeff,

Yes, I can successfully run "lctl ping" from the client to the Lustre
server and vice versa as you described in:

   - Client on ib0 lnet can `lctl ping ip.of.mds.server@tcp0`
   - MDS on tcp0 can `lctl ping ip.of.client@o2ib`

I have not yet run an iperf nor lnet selftest (lst). I can start on that
now.

Thank you,

megan

On Tue, Feb 27, 2018 at 3:37 PM, Jeff Johnson <
jeff.john...@aeoncomputing.com> wrote:

> Megan,
>
> I assume by being able to ping from server and client you mean they can
> ping each other.
>
>- Client on ib0 lnet can `lctl ping ip.of.mds.server@tcp0`
>- MDS on tcp0 can `lctl ping ip.of.client@o2ib`
>
> If so, can you verify sustained performant throughput from each end to the
> lnet router? On the tcp side you can run iperf (iperf2 or iperf3) to verify
> sustained and stable throughput on the ethernet side. You can use
> ib_send_bw from lnet router to client in a similar way to iperf.
>
> Additionally, you can run lnet_selftest engaging the MDS and client. This
> will test the lnet layer only, if the ethernet or IB layers beneath are
> wonky then lnet and lnet_selftest will not be able to tell you why, just
> that it is.
>
> lnet_selftest method.
>
>
>1. On both mds and client run `modprobe lnet_selftest`
>2. On the MDS export
>3. Save the below script on the MDS
>4. On the MDS run `export LST_SESSION=41704170
>5. Run the script.
>
> # lnet_selftest script
>
> conc=8
> export LST_SESSION=41704170
> lst new_session rw
> lst add_group clients clients.ip.addr@o2ib
> lst add_group servers mds.ip.addr@tcp
> lst add_batch bulk_rw
> lst add_test --batch bulk_rw --distribute 1:1 --concurrency ${conc} --from
> clients --to servers brw read size=1M
> lst run bulk_rw
> lst stat clients servers
>
>
> You will see performance stats reported for server and client. To stop,
> ctrl-c and then type `lst end_session`
>
> The value of LST_SESSION is arbitrary but lnet_selftest needs it so the
> background processes can be killed when the benchmark ends.
>
> If lnet_selftest fails then there is something wonky in the routing or the
> network layer (non-lustre) underneath it.
>
> Make sense?
>
> --Jeff
>
>
>
>
> On Tue, Feb 27, 2018 at 12:08 PM, Ms. Megan Larko 
> wrote:
>
>> Hello List!
>>
>> We have some 2.7.18 lustre servers using TCP.  Through some dual-homed
>> Lustre LNet routes we desire to connect some Mellanox (mlx4) InfiniBand
>> Lustre 2.7.0 clients.
>>
>> The "lctl ping" command works from both the server co-located MGS/MDS and
>> from the client.
>> The mount of the TCP lustre server share from the IB client starts and
>> then shortly thereafter fails with "Input/output error. Is the MGS
>> running?"
>>
>> The Lustre MDS at approximate 20 min. intervals from client mount request
>> /var/log/messages reports:
>> Lustre: MGS: Client  (at A.B.C.D@o2ib) reconnecting
>>
>> The IB client mount command:
>> mount -t lustre C.D.E.F@tcp0:/lustre /mnt/lustre
>>
>> Waits about a minute then returns:
>> mount.lustre C.D.E.F@tcp0:/lustre at /mnt/lustre failed:  Input/output
>> error
>> Is the MGS running?.
>>
>> The IB client /var/log/messages file contains:
>> Lustre: client.c:19349:ptlrpc_expire_one_request(()) @@@ Request sent
>> has timed out for slow reply .. -->MGCC.D.E.F@tcp was lost; in
>> progress operations using this service will fail
>> LustreError: 15c-8: MGCC.D.E.F@tcp: The configuration from log
>> 'lustre-client' failed (-5)  This may be the result of communication errors
>> between this node and the MGS, a bad configuration, or other errors.  See
>> the syslog for more information.
>> Lustre: MGCC.D.E.F@tcp: Connection restored to MGS (at C.D.E.F@tcp)
>> Lustre: Unmounted lustre-client
>> LustreError: 22939:0:(obd_mount.c:lustre_fill_super()) Unable to mount
>> (-5)
>>
>> We have not (yet) set any non-default values on the Lustre File System.
>> *  Server: Lustre 2.7.18  CentOS Linux release 7.3.1611 (Core)  kernel
>> 3.10.0-514.2.2.el7_lustre.x86_64   The server is ethernet; no IB.
>>
>> *  Client: Lustre-2.7.0  RHEL 6.8  kernel 2.6.32-696.3.2.el6.x86_64
>> The client uses Mellanox InfiniBand mlx4.
>>
>> The mount point does exist on the client.   The firewall is not an issue;
>> checked.  SELinux is disabled.
>>
>> NOTE: The server does serve the same /lustre file system to other TCP
>> Lustre clients.
>> The client does mount other /lustre_mnt from other IB servers.
>>
>> The info on http://wiki.lustre.o

[lustre-discuss] lustre mount in heterogeneous net environment

2018-02-27 Thread Ms. Megan Larko
Hello List!

We have some 2.7.18 lustre servers using TCP.  Through some dual-homed
Lustre LNet routes we desire to connect some Mellanox (mlx4) InfiniBand
Lustre 2.7.0 clients.

The "lctl ping" command works from both the server co-located MGS/MDS and
from the client.
The mount of the TCP lustre server share from the IB client starts and then
shortly thereafter fails with "Input/output error. Is the MGS running?"

The Lustre MDS at approximate 20 min. intervals from client mount request
/var/log/messages reports:
Lustre: MGS: Client  (at A.B.C.D@o2ib) reconnecting

The IB client mount command:
mount -t lustre C.D.E.F@tcp0:/lustre /mnt/lustre

Waits about a minute then returns:
mount.lustre C.D.E.F@tcp0:/lustre at /mnt/lustre failed:  Input/output error
Is the MGS running?.

The IB client /var/log/messages file contains:
Lustre: client.c:19349:ptlrpc_expire_one_request(()) @@@ Request sent has
timed out for slow reply .. -->MGCC.D.E.F@tcp was lost; in progress
operations using this service will fail
LustreError: 15c-8: MGCC.D.E.F@tcp: The configuration from log
'lustre-client' failed (-5)  This may be the result of communication errors
between this node and the MGS, a bad configuration, or other errors.  See
the syslog for more information.
Lustre: MGCC.D.E.F@tcp: Connection restored to MGS (at C.D.E.F@tcp)
Lustre: Unmounted lustre-client
LustreError: 22939:0:(obd_mount.c:lustre_fill_super()) Unable to mount (-5)

We have not (yet) set any non-default values on the Lustre File System.
*  Server: Lustre 2.7.18  CentOS Linux release 7.3.1611 (Core)  kernel
3.10.0-514.2.2.el7_lustre.x86_64   The server is ethernet; no IB.

*  Client: Lustre-2.7.0  RHEL 6.8  kernel 2.6.32-696.3.2.el6.x86_64.  The
client uses Mellanox InfiniBand mlx4.

The mount point does exist on the client.   The firewall is not an issue;
checked.  SELinux is disabled.

NOTE: The server does serve the same /lustre file system to other TCP
Lustre clients.
The client does mount other /lustre_mnt from other IB servers.

The info on
http://wiki.lustre.org/Mounting_a_Lustre_File_System_on_Client_Nodes
describes the situation exceedingly similar to ours.   I'm not sure what
Lustre settings to check if I have not explicitly set any to be different
that the default value.

Any hints would be genuinely appreciated.
Cheers,
megan
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] the .lustre/fid special directory

2018-02-05 Thread Ms. Megan Larko
Greetings Assembled Wisdom!

I have a question regarding the special directory found on a Lustre File
System mount point ".lustre/fid".

I have several Lustre File Systems currently running
2.7.19.8-3.10.0_514.2.2 on CentOS 7.3.1611 each of which has its Lustre
mountpoint, say /mnt/Dev1, /mnt/Dev2, for example.

On each of these mount points I have the special directory:
/mnt/Dev1/.lustre:
dr-x-- 2 root root 25088 Jan  1  1970 lost+found
d--x-- 2 root root 25088 Jan  1 1970 fid
drwxr-xr-x 7 root root 25088 Jan 19 21:11 ..

Yes, each Lustre mount point has the "Jan  1 1970" date on those dirs.

The command "ls -l /mnt/Dev1/.lustre/fid"  returns nothing, not even a
permission error.

I am asking because my newly upgraded version of Robinhood Policy Engine
3.1 (from 3.0), now shows an error "_set_mount_point | Error: failed to get
FID for special directory : Permission denied when I
execute a "rbh-report" command.

Other than this new error on the Robinhood server, everything else is
occurring normally.  I am seeing what I believe to be correct Robinhood
database information.

- Is this /mnt/Dev1/.lustre special directory supposed to have these
unusual dates associated with the directories?
- Is this /mnt/Dev1/.lustre special directory supposed to have the unusual
permissions associated with sub-dirs?

Could this be related to LU-10243?   (I cannot get to that bug to view any
details yet.)

Cheers,
megan
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] tar file of Lustre 2.10.x

2017-10-06 Thread Ms. Megan Larko
Greetings!

Is there any location from which I may "wget" a Lustre 2.10.1 tar file?  I
would like to build against a particular OS and network stack and I cannot
use "git" at this location.

Cheers,
megan
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Understanding Lustre Recovery

2017-08-15 Thread Ms. Megan Larko
Salutations!

I am trying to better understand Lustre recovery after having a hardware
problem force a recovery on the Lustre File System (LFS).

For the most part, the recovery from the hardware failure succeeded.  It
took a bit of time.
The system is a vendor appliance initially based on Lustre 2.1.0 with some
vendor patches.

I was concerned about the messages "...waking for gap in transno, VBR is
OFF...".   I learned in LU-7732 that the message concerns transaction
numbers in a replay request in which no CREATE occurs such that the
transaction number is subsequenlty removed from the replay list.  The
message was deemed inaccurate and unnecessary per LU-7732 and removed in
Lustre 2.9.0 and newer.  Thanks for the info.

What about the "VBR is OFF" part of the string?  The "Lustre Software
Release Operations Manual 2.x" Section 31.4.1 VBR Messages states of VBR
that "It cannot be disabled".   The "lctl get_param
obdfilter.*.recovery_status" on an OSS shows the line "VBR:  DISABLED" for
each OST on the OSS.  The MDS recovery_status also shows VBR: DISABLED.  I
noted while reading LU-5724 that James Simmons' post of recovery_status had
VBR: DISABLED as well. It seemed from the post that the VBR status was
acceptable (Mr. Simmons' question in the LU was on the IR status).

* If a LFS is not in recovery (STATUS:  COMPLETE) is VBR: DISABLED printed
because a recovery operation is not currently occuring?

Secondly, the Imperative Recovery (IR) is shown as IR: ENABLED on the MDS.
For the OSS, I see from the recovery_status output that IR is enabled, on
most--but not all--the OSTs.

* Should the OST IR value be consistent across all active OSTs in a LFS?
* Why might an OST have an IR state different from its peers?
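
For what it is worth, the quick-and-dirty way I have been eyeballing the IR
state across targets is a one-liner along these lines (sketch only):

  lctl get_param obdfilter.*.recovery_status | egrep 'recovery_status|IR:'

which prints each OST's parameter name followed by its IR line, so any target
with a different state stands out.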

P.S.  I really appreciate the clear write-ups in the LU pages.  They have
helped me significantly.

Cheers,
megan
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] lfs fid2path user access

2017-08-08 Thread Ms. Megan Larko
Hi,

In answering my own question about the command "lfs fid2path..." being run
as a non-root user:
Yes, a non-root user can successfully run the command "lfs fid2path "
per the Lustre Software Release 2.x Operations Manual section 33.4
"mount".   The box describing Lustre client options indicates that
successful runs of "lfs fid2path ..." by non-root users was implemented in
Lustre 2.3.
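
If I am reading that section correctly, the relevant client-side control is
the "user_fid2path" mount option; a sketch of the mount line (the MGS NID and
file system name below are placeholders):

  mount -t lustre -o user_fid2path mgsnode@o2ib0:/myfs /mnt/myfs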

I can't wait to get my newer Lustre File System (the wait is only days
now--yay).

Cheers,
megan
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] lfs fid2path user access

2017-08-08 Thread Ms. Megan Larko
Greetings List!

I may have posed this question previously, but I am unable to locate the
answer.

Using older Lustre 2.1.x, a user running the command "lfs fid2path ..."
fails for "Operation not permitted" even for files owned by that UID.   The
command is successful for the root user.  Conversely, the command "lfs
path2fid ..." is successful for both the user owning the file and root UID.

It seems to me that the "lfs" commands were separated from the "lctl"
commands so that unprivileged users could successfully use "lfs" queries.
Since Lustre 2.1.x is old (I'm getting new systems very very soon--smile),
has the "lfs fid2path ..." permission restriction for users been addressed
in Lustre 2.7.x and newer?

Thanks!
megan

Background detail:
An "strace" of the "lfs fid2path ..." for the user and root-user diverge at
the point ---
open("/lustre", O_RDONLY|O_NONBLOCK|O_DIRECTORY) = 3
ioctl(3, 0xc0086696, 0x798040)  = 0 # for UID=0

open("/lustre", O_RDONLY|O_NONBLOCK|O_DIRECTORY) = 3
ioctl(3, 0xc0086696, 0x798040)  = -1 EPERM (Operation not
permitted)# for UID != 0

...where /lustre is the Lustre mount point on the client.
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] lustre-discuss Digest, Vol 136, Issue 26

2017-07-28 Thread Ms. Megan Larko
Subject: LNET router (2.10.0) recommendations for  heterogeneous (mlx5,
qib) IB setup

Greetings!

I did not see an answer to the question posed in the subject line above
about heterogeneous IB environments, so I thought I would chime in.

One document I have found on the topic of heterogeneous IB environments is
http://wiki.lustre.org/Infiniband_Configuration_Howto

Generally speaking, networks like to be as homogeneous as possible.  That
said, they may not always be such.  If you are working with mlx5, you may
wish to look over LU-7124 and LU-1701 regarding the setting of
peer_credits.  In Lustre versions prior to 2.9.0 the mlx5 did not handle
peer_credits > 16 unless the map_on_demand was set to 256 (which is the
default in newer versions of Lustre, I believe).
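
For illustration only (the values below are examples, not recommendations),
those settings typically end up as a module options line for the o2ib LND,
e.g. in /etc/modprobe.d/ko2iblnd.conf:

  options ko2iblnd peer_credits=32 map_on_demand=256

with a module reload or reboot needed for the change to take effect.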

Cheers,
megan

On Tue, Jul 25, 2017 at 4:11 PM, 
wrote:

> Send lustre-discuss mailing list submissions to
> lustre-discuss@lists.lustre.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
> or, via email, send a message with subject or body 'help' to
> lustre-discuss-requ...@lists.lustre.org
>
> You can reach the person managing the list at
> lustre-discuss-ow...@lists.lustre.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of lustre-discuss digest..."
>
>
> Today's Topics:
>
>1. Re: Install issues on 2.10.0 (John Casu)
>2. How does Lustre client side caching work? (Joakim Ziegler)
>3. LNET router (2.10.0) recommendations for  heterogeneous (mlx5,
>   qib) IB setup (Nathan R.M. Crawford)
>
>
> --
>
> Message: 1
> Date: Tue, 25 Jul 2017 10:52:06 -0700
> From: John Casu 
> To: "Mannthey, Keith" , Ben Evans
> ,  "lustre-discuss@lists.lustre.org"
> 
> Subject: Re: [lustre-discuss] Install issues on 2.10.0
> Message-ID: <96d20a1a-9c15-167d-3538-50721f787...@chiraldynamics.com>
> Content-Type: text/plain; charset=utf-8; format=flowed
>
> Ok, so I assume this is actually a ZFS/SPL bug & not a lustre bug.
> Also, thanks Ben, for the ptr.
>
> many thanks,
> -john
>
> On 7/25/17 10:19 AM, Mannthey, Keith wrote:
> > Host_id is for zpool double import protection.  If a host id is set on a
> zpool (zfs does this automatically) then a HA server can't just import to
> pool (users have to use --force). This makes the system a lot safer from
> double zpool imports.  Call 'genhostid' on your Lustre servers and the
> warning will go away.
> >
> > Thanks,
> >   Keith
> >
> >
> >
> > -Original Message-
> > From: lustre-discuss [mailto:lustre-discuss-boun...@lists.lustre.org]
> On Behalf Of Ben Evans
> > Sent: Tuesday, July 25, 2017 10:13 AM
> > To: John Casu ; lustre-discuss@lists.lustre.org
> > Subject: Re: [lustre-discuss] Install issues on 2.10.0
> >
> > health_check moved to /sys/fs/lustre/ along with a bunch of other things.
> >
> > -Ben
> >
> > On 7/25/17, 12:21 PM, "lustre-discuss on behalf of John Casu"
> >  j...@chiraldynamics.com> wrote:
> >
> >> Just installed latest 2.10.0 Lustre over ZFS on a vanilla Centos
> >> 7.3.1611 system, using dkms.
> >> ZFS is 0.6.5.11 from zfsonlinux.org, installed w. yum
> >>
> >> Not a single problem during installation, but I am having issues
> >> building a lustre filesystem:
> >> 1. Building a separate mgt doesn't seem to work properly, although the
> >> mgt/mdt combo
> >> seems to work just fine.
> >> 2. I get spl_hostid not set warnings, which I've never seen before 3.
> >> /proc/fs/lustre/health_check seems to be missing.
> >>
> >> thanks,
> >> -john c
> >>
> >>
> >>
> >> -
> >> Building an mgt by itself doesn't seem to work properly:
> >>
> >>> [root@fb-lts-mds0 x86_64]# mkfs.lustre --reformat --mgs
> >>> --force-nohostid --servicenode=192.168.98.113@tcp \
> >>> --backfstype=zfs mgs/mgt
> >>>
> >>> Permanent disk data:
> >>> Target: MGS
> >>> Index:  unassigned
> >>> Lustre FS:
> >>> Mount type: zfs
> >>> Flags:  0x1064
> >>>(MGS first_time update no_primnode ) Persistent mount
> >>> opts:
> >>> Parameters: failover.node=192.168.98.113@tcp
> >>> WARNING: spl_hostid not set. ZFS has no zpool import protection
> >>> mkfs_cmd = zfs create -o canmount=off -o xattr=sa mgs/mgt
> >>> WARNING: spl_hostid not set. ZFS has no zpool import protection
> >>> Writing mgs/mgt properties
> >>>lustre:failover.node=192.168.98.113@tcp
> >>>lustre:version=1
> >>>lustre:flags=4196
> >>>lustre:index=65535
> >>>lustre:svname=MGS
> >>> [root@fb-lts-mds0 x86_64]# mount.lustre mgs/mgt /mnt/mgs
> >>> WARNING: spl_hostid not set. ZFS has no zpool import protection
> >>>
> >>> mount.lustre FATAL: unhandled/unloaded fs type 0 'ext3'
> >>
> >> If I build the combo mgt/mdt, things go a lot better:
> >>
> >>>
> >>> [root@fb-lts-mds0 x86_64]# mkfs.lustre --reformat --mgs --mdt
> >>> -

Re: [lustre-discuss] Robinhood exhausting RPC resources against 2.5.5 lustre file systems

2017-05-19 Thread Ms. Megan Larko
Greetings Jessica,

I'm not sure I am correctly understanding the behavior "robinhood activity
floods the MDT".   The robinhood program as you (and I) are using it is
consuming the MDT CHANGELOG via a reader_id which was assigned when the
CHANGELOG was enabled on the MDT.   You can check the MDS for these readers
via "lctl get_param mdd.*.changelog_users".  Each CHANGELOG reader must
either be consumed by a process or destroyed otherwise the CHANGELOG will
grow until it consumes sufficient space to stop the MDT from functioning
correctly.  So robinhood should consume and then clear the CHANGELOG via
this reader_id.  This implementation of robinhood is actually a rather
light-weight process as far as the MDS is concerned.   The load issues I
encountered were on the robinhood server itself which is a separate server
from the Lustre MGS/MDS server.
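
For anyone wanting to check this on their own MDS, a short sketch (the MDT
device name and reader id below are placeholders):

  # list registered changelog consumers and their current record index
  lctl get_param mdd.*.changelog_users

  # if a reader (say cl1) is orphaned and nothing will ever consume it,
  # deregister it so the changelog records can be purged
  lctl --device myfs-MDT0000 changelog_deregister cl1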

Just curious, have you checked for multiple reader_id's on your MDS for
this Lustre file system?

P.S. My robinhood configuration file is using nb_threads = 8, just for a
data point.

Cheers,
megan

On Thu, May 18, 2017 at 2:36 PM, Jessica Otey  wrote:

> Hi Megan,
>
> Thanks for your input. We use percona, a drop-in replacement for mysql...
> The robinhood activity floods the MDT, but it does not seem to produce any
> excessive load on the robinhood box...
>
> Anyway, FWIW...
>
> ~]# mysql --version
> mysql  Ver 14.14 Distrib 5.5.54-38.6, for Linux (x86_64) using readline 5.1
>
> Product: robinhood
> Version: 3.0-1
> Build:   2017-03-13 10:29:26
>
> Compilation switches:
> Lustre filesystems
> Lustre Version: 2.5
> Address entries by FID
> MDT Changelogs supported
>
> Database binding: MySQL
>
> RPM: robinhood-lustre-3.0-1.lustre2.5.el6.x86_64
> Lustre rpms:
>
> lustre-client-2.5.5-2.6.32_642.15.1.el6.x86_64_g22a210f.x86_64
> lustre-client-modules-2.5.5-2.6.32_642.15.1.el6.x86_64_g22a210f.x86_64
>
> On 5/18/17 11:55 AM, Ms. Megan Larko wrote:
>
> With regards to (WRT) Subject "Robinhood exhausting RPC resources against
> 2.5.5   lustre file systems", what version of robinhood and what version of
> MySQL database?   I mention this because I have been working with
> robinhood-3.0-0.rc1 and initially MySQL-5.5.32 and Lustre 2.5.42.1 on
> kernel-2.6.32-573 and had issues in which the robinhood server consumed
> more than the total amount of 32 CPU cores on the robinhood server (with
> 128 G RAM) and would functionally hang the robinhood server.   The issue
> was solved for me by changing to MySQL-5.6.35.   It was the "sort" command
> in robinhood that was not working well with the MySQL-5.5.32.
>
> Cheers,
> megan
>
>
>
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] lustre-discuss Digest, Vol 134, Issue 36

2017-05-18 Thread Ms. Megan Larko
With regards to (WRT) Subject "Robinhood exhausting RPC resources against
2.5.5   lustre file systems", what version of robinhood and what version of
MySQL database?   I mention this because I have been working with
robinhood-3.0-0.rc1 and initially MySQL-5.5.32 and Lustre 2.5.42.1 on
kernel-2.6.32-573 and had issues in which the robinhood server consumed
more than the total amount of 32 CPU cores on the robinhood server (with
128 G RAM) and would functionally hang the robinhood server.   The issue
was solved for me by changing to MySQL-5.6.35.   It was the "sort" command
in robinhood that was not working well with the MySQL-5.5.32.

Cheers,
megan

On Wed, May 17, 2017 at 2:04 PM, 
wrote:

> Send lustre-discuss mailing list submissions to
> lustre-discuss@lists.lustre.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
> or, via email, send a message with subject or body 'help' to
> lustre-discuss-requ...@lists.lustre.org
>
> You can reach the person managing the list at
> lustre-discuss-ow...@lists.lustre.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of lustre-discuss digest..."
>
>
> Today's Topics:
>
>1. Re: seclabel (Robin Humble)
>2. Re: seclabel (Sebastien Buisson)
>3. Robinhood exhausting RPC resources against 2.5.5  lustre file
>   systems (Jessica Otey)
>
>
> --
>
> Message: 1
> Date: Wed, 17 May 2017 10:16:51 -0400
> From: Robin Humble 
> To: lustre-discuss@lists.lustre.org
> Subject: Re: [lustre-discuss] seclabel
> Message-ID: <20170517141651.ga1...@trinity.cita.utoronto.ca>
> Content-Type: text/plain; charset=us-ascii
>
> I setup a couple of VMs with 2.9 clients and servers (ldiskfs) and
> unfortunately setcap/getcap still are unhappy - same as with my
> previous 2.9 clients with 2.8 servers (ZFS).
>
> hmm.
> I took a gander at the source and noticed that llite/xattr.c
> deliberately filters out 'security.capability' and returns 0/-ENODATA
> for setcap/getcap, which is indeed what strace sees. so setcap/getcap
> is never even sent to the MDS.
>
> if I remove that filter (see patch on lustre-devel) then setcap/getcap
> works ->
>
>  # df .
> Filesystem1K-blocks  Used Available Use% Mounted on
> 10.122.1.5@tcp:/test8   4797904 33992   4491480   1% /mnt/test8
>  # touch blah
>  # setcap cap_net_admin,cap_net_raw+p blah
>  # getcap blah
> blah = cap_net_admin,cap_net_raw+p
>
> and I also tested that the 'ping' binary run as unprivileged user works
> from lustre.
> success!
>
> 'b15587' is listed as the reason for the filtering.
> I don't know what that refers to.
> is it still relevant?
>
> cheers,
> robin
>
>
> --
>
> Message: 2
> Date: Wed, 17 May 2017 14:37:31 +
> From: Sebastien Buisson 
> To: Robin Humble 
> Cc: "lustre-discuss@lists.lustre.org"
> 
> Subject: Re: [lustre-discuss] seclabel
> Message-ID: 
> Content-Type: text/plain; charset="utf-8"
>
> Hi Robin,
>
> b15587 refers to the old Lustre Bugzilla tracking tool:
> https://projectlava.xyratex.com/show_bug.cgi?id=15587
>
> Reading the discussion in the ticket, supporting xattr at the time of
> Lustre 1.8 and 2.0 was causing issues on MDS side in some situations. So it
> was decided to discard security.capability xattr on Lustre client side. I
> think Andreas might have some insight, as he apparently participated in
> b15587.
>
> In any case, it is important to make clear that file capabilities, the
> feature you want to use, is completely distinct from SELinux.
> On the one hand, Capabilities are a Linux mechanism to refine permissions
> granted to privileged processes, by dividing the privileges traditionally
> associated with superuser into distinct units (known as capabilities).
> On the other hand, SELinux is the Linux implementation of Mandatory Access
> Control.
> Both Capabilities and SELinux rely on values stored into file extended
> attributes, but this is the only thing they have in common.
>
> Cheers,
> Sebastien.
>
> > On 17 May 2017 at 16:16, Robin Humble  wrote:
> >
> > I setup a couple of VMs with 2.9 clients and servers (ldiskfs) and
> > unfortunately setcap/getcap still are unhappy - same as with my
> > previous 2.9 clients with 2.8 servers (ZFS).
> >
> > hmm.
> > I took a gander at the source and noticed that llite/xattr.c
> > deliberately filters out 'security.capability' and returns 0/-ENODATA
> > for setcap/getcap, which is indeed what strace sees. so setcap/getcap
> > is never even sent to the MDS.
> >
> > if I remove that filter (see patch on lustre-devel) then setcap/getcap
> > works ->
> >
> > # df .
> > Filesystem1K-blocks  Used Available Use% Mounted on
> > 10.122.1.5@tcp:/test8   4797904 33992   4491480   1% /mnt/test8
> > # touch blah
> > # setcap cap_net_admin,cap_net_raw+p blah
> > # getcap blah
> > 

Re: [lustre-discuss] lustre-discuss Digest, Vol 134, Issue 27

2017-05-11 Thread Ms. Megan Larko
On the Subject of "ost doesn't mount", the Lustre community version 2.8.0
had a documented issue in that the SELinux context was appended at each
mount so the resulting behavior is that subsequent mounts would fail
because of multiple contexts which is an invalid configuration as noted
by  Strikwerda, Ger  in the "tunefs.lustre --print " command:

Persistent mount opts: ,errors=remount-ro,context=unconfined_u:object_r:user_tmp_t:s0,context=unconfined_u:object_r:tmp_t:s0

A Solution:
Disabling SELinux certainly works, but my own solution was to write a small
script which ran on boot, prior to the Lustre file system mount, to re-write
the parameters (saved from my initial printout of the tunefs.lustre command)
omitting the "context=.." section via 'tunefs.lustre --mountfsoptions="..."'.
This allowed me to continue using Lustre 2.8.0 with SELinux in "targeted" mode.
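
For reference, the heart of that boot-time script is just a single
tunefs.lustre call; a sketch (the device path and option string are
illustrative, and note that --mountfsoptions replaces the persistent options
rather than appending, so include any defaults you still want):

  tunefs.lustre --mountfsoptions="errors=remount-ro" /dev/mapper/my_ost_device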

FYI, this is LU-7002 and was patched 14 March 2016 in newer versions of
Lustre.

Cheers,
megan

On Thu, May 11, 2017 at 5:19 AM, 
wrote:

> Send lustre-discuss mailing list submissions to
> lustre-discuss@lists.lustre.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
> or, via email, send a message with subject or body 'help' to
> lustre-discuss-requ...@lists.lustre.org
>
> You can reach the person managing the list at
> lustre-discuss-ow...@lists.lustre.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of lustre-discuss digest..."
>
>
> Today's Topics:
>
>1. Re: ost doesn't mount (Strikwerda, Ger)
>2. Re: ost doesn't mount (Strikwerda, Ger)
>
>
> --
>
> Message: 1
> Date: Thu, 11 May 2017 10:34:28 +0200
> From: "Strikwerda, Ger" 
> To: Colin Faber 
> Cc: Lustre discussion 
> Subject: Re: [lustre-discuss] ost doesn't mount
> Message-ID:
>  ffrqh5_...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi Colin,
>
> [root@umcg-storage03 ~]# tunefs.lustre --print /dev/dm-2
> checking for existing Lustre data: found
> Reading CONFIGS/mountdata
>
>Read previous values:
> Target: umcgst08-OST0002
> Index:  2
> Lustre FS:  umcgst08
> Mount type: ldiskfs
> Flags:  0x2
>   (OST )
> Persistent mount opts:
> ,errors=remount-ro,context=unconfined_u:object_r:user_
> tmp_t:s0,context=unconfined_u:object_r:tmp_t:s0
> Parameters:  mgsnode=172.23.34.214@tcp:172.23.34.213@tcp
>
>
>
>
>Permanent disk
> data:
>
> Target:
> umcgst08-OST0002
>
> Index:
> 2
>
> Lustre FS:
> umcgst08
>
> Mount type:
> ldiskfs
>
> Flags:
> 0x2
>
>   (OST
> )
>
> Persistent mount opts:
> ,errors=remount-ro,context=unconfined_u:object_r:user_
> tmp_t:s0,context=unconfined_u:object_r:tmp_t:s0
>
> Parameters:  mgsnode=172.23.34.214@tcp:172.23.34.213@tcp
>
>
>
> exiting before disk write.
>
> So it looks like the mount options are inside the filesystem? How do we get
> rid of those SELinux options? We have no mountoptions for this filesystem
> set in /etc/fstab
>
>
>
>
>
> On Wed, May 10, 2017 at 5:59 PM, Colin Faber 
> wrote:
>
> > 22 == invalid argument.
> >
> > Unrecognized mount option "context=unconfined_u:object_r:user_tmp_t:s0"
> > or missing value
> >
> > tunefs.lustre --print 
> >
> > Do you have this option set within your mount options list?
> >
> > -cf
> >
> >
> > On Wed, May 10, 2017 at 7:19 AM, Strikwerda, Ger <
> g.j.c.strikwe...@rug.nl>
> > wrote:
> >
> >> Hi all,
> >>
> >> On a OSS with SElinux disabled we get a strange probably selinux related
> >> error when we want to mount the OST:
> >>
> >> [root@umcg-storage03 /]# cat /etc/selinux/config
> >> SELINUX=disabled
> >>
> >> # mount -t lustre /dev/dm-3 /mnt/umcgst08-01
> >>
> >> dmesg log:
> >>
> >> Lustre: Lustre: Build Version: 2.8.0-RC5--PRISTINE-2.6.32-573
> >> .12.1.el6_lustre.x86_64
> >>
> >> LDISKFS-fs (dm-3): Unrecognized mount option
> >> "context=unconfined_u:object_r:user_tmp_t:s0" or missing value
> >> LustreError: 3266:0:(osd_handler.c:6305:osd_mount())
> >> umcgst08-OST0001-osd: can't mount /dev/mapper/
> 360080e50002d407603ca52240901:
> >> -22
> >> LustreError: 3266:0:(obd_config.c:578:class_setup()) setup
> >> umcgst08-OST0001-osd failed (-22)
> >> LustreError: 3266:0:(obd_mount.c:203:lustre_start_simple())
> >> umcgst08-OST0001-osd setup error -22
> >> LustreError: 3266:0:(obd_mount_server.c:1764:server_fill_super())
> Unable
> >> to start osd on /dev/mapper/360080e50002d407603ca52240901: -22
> >> LustreError: 3266:0:(obd_mount.c:1426:lustre_fill_super()) Unable to
> >> mount  (-22)
> >>
> >> What does a -22 Unable to mount exactly means? And how can we get rid of
> >> the unrecognized mount options?
> >>
> >>
> >> --
> >>
> >> Vriendelijke groet,
> >>
> >> Ger StrikwerdaChef Special
> >> Rijksuniversiteit Groningen
> >> Centrum voor Informatie Technologie
> >> Unit Pragmatisch 

Re: [lustre-discuss] lustre-discuss Digest, Vol 130, Issue 21

2017-01-30 Thread Ms. Megan Larko
Greetings,

WRT to Subject:  Lustre Installation...
 From: Devendra Patil


For CentOS 7.2 (not using ZFS; ldiskfs backing):
Server kernel: kernel-3.10.0-327.3.1.el7_lustre.x86_64
Client kernel:  kernel-3.10.0-327.3.1.el7.x86_64

Lustre 2.8.0:
Server:  Basically, I used the RPMs that matched the kernel, such as
lustre-2.8.0-3.10.0_327.3.1.el7_lustre.x86_64.rpm
and similar files such as kernel-headers-3.10.0-327.3.1.el7_lustre.x86_64,
lustre-modules-2.8.0-3.10.0_327.3.1.el7_lustre.x86_64
lustre-osd-ldiskfs-2.8.0-3.10.0_327.3.1.el7_lustre.x86_64
lustre-osd-ldiskfs-mount-2.8.0-3.10.0_327.3.1.el7_lustre.x86_64
...and lustre-dkms, lustre-iokit, lustre-osd-zfs, lustre-osd-zfs-mount

Client is very similar, but shorter Lustre rpm list:
lustre-client-2.8.0-3.10.0_327.3.1.el7.x86_64
lustre-client-modules-2.8.0-3.10.0_327.3.1.el7.x86_64
lustre-client-dkms-2.8.0-1.el7.noarch
lustre-client-tests-2.8.0-3.10.0_327.3.1.el7.x86_64

Cheers!
megan

On Fri, Jan 27, 2017 at 7:17 AM, 
wrote:

> Send lustre-discuss mailing list submissions to
> lustre-discuss@lists.lustre.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
> or, via email, send a message with subject or body 'help' to
> lustre-discuss-requ...@lists.lustre.org
>
> You can reach the person managing the list at
> lustre-discuss-ow...@lists.lustre.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of lustre-discuss digest..."
>
>
> Today's Topics:
>
>1. Lustre Installation... (Devendra Patil)
>2. Re: Lustre Installation... (Jeff Slapp)
>
>
> --
>
> Message: 1
> Date: Fri, 27 Jan 2017 15:35:34 +0530
> From: Devendra Patil 
> To: lustre-discuss@lists.lustre.org
> Subject: [lustre-discuss] Lustre Installation...
> Message-ID:
>  eeupkb...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hello,
>
> I'm newbie to lustre. Currently I'm facing issue while installing lustre on
> CentOs 7.2. Can anyone guide me which OS to use and compatible lustre
> version for it.
>
> Thank you,
> Devendra Patil
>
> --
>
> Message: 2
> Date: Fri, 27 Jan 2017 12:17:18 +
> From: Jeff Slapp 
> To: Devendra Patil ,
> "lustre-discuss@lists.lustre.org"   <
> lustre-discuss@lists.lustre.org>
> Subject: Re: [lustre-discuss] Lustre Installation...
> Message-ID:
> <9ae4f4edcfd448018dad607386e5f...@mail-re3.datacoresoftware.com>
> Content-Type: text/plain; charset="utf-8"
>
> Good day Devendra,
>
> Below are the steps I used to install the MGS, OSS and client using ZFS.
> If you are using CentOS 7.2 you will need to adjust some of the parameters
> below to match:
>
> [INITIAL OS INSTALL]
> Using CentOS 7.3.1611 with the following roles enabled:
>File and Storage Server
>Guest Agents (if in a VM)
>Large System Performance
>Network File System Client
>Performance Tools
>Compatibility Libraries
>Development Tools
>
> [POST OS INSTALL STEPS TO BE PERFORMED ON ALL LUSTRE NODES (MGS/MGT AND
> OSS)]
> hostname [YOUR SERVER NAME] - or use nmtui to configure network interfaces
> vi /etc/yum.repos.d/lustre_server.repo
>[lustre-server]
>name=CentOS-$releasever - Lustre server
>baseurl=https://downloads.hpdd.intel.com/public/lustre/
> lustre-2.9.0/el7.3.1611/server/
>gpgcheck=0
> yum -y install http://dl.fedoraproject.org/pub/epel/7/x86_64/e/epel-
> release-7-8.noarch.rpm
> yum clean all
> kernel_version=`yum list --showduplicates kernel | grep lustre-server |
> awk '{print $2}'`
> kernel_firmware_version=`yum list --showduplicates kernel-firmware | grep
> lustre-server | awk '{print $2}'`
> yum -y install --nogpgcheck --setopt=protected_multilib=false
> kernel-${kernel_version} kernel-firmware-${kernel_firmware_version}
> kernel-devel-${kernel_version} kernel-headers-${kernel_version}
> yum clean all
> yum -y install yum-plugin-versionlock
> yum versionlock add kernel
> yum versionlock add kernel-firmware
> yum versionlock add kernel-devel
> yum versionlock add kernel-headers
> yum clean all
> yum-config-manager --disable lustre-server
> yum -y install http://download.zfsonlinux.org/epel/zfs-release.el7_3.
> noarch.rpm
> yum clean all
> yum-config-manager --disable zfs
> yum-config-manager --enable zfs-kmod
> ** reboot **
> yum -y install wget
> yum -y install rpm-build
> yum -y install kmod-zfs-devel libzfs2-devel
> yum -y install libselinux-devel libtool
> rm -f lustre-2.9.0-1.src.rpm&& wget -q https://downloads.hpdd.intel.
> com/

Re: [lustre-discuss] lustre-discuss Digest, Vol 128, Issue 11

2016-11-29 Thread Ms. Megan Larko
Follow-up to Subject: Lustre client mount fails: Request sent has timed out
for slow reply

Thank you for the suggestions.  I was able to work past this error.  I am
not certain of the exact solution.   I did stop and restart my CentOS 7.2
opensm service.  While that did not seem to change anything immediately,
upon my return to the office after Thanksgiving the next compute nodes were
successfully connected on the InfiniBand network fabric and the Lustre
(2.8.0) file system mounted quickly as I issued the command.

So guessing here:  I had to restart the opensm service and just be patient.

Cheers,
megan

On Fri, Nov 25, 2016 at 4:06 PM, 
wrote:

> Send lustre-discuss mailing list submissions to
> lustre-discuss@lists.lustre.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
> or, via email, send a message with subject or body 'help' to
> lustre-discuss-requ...@lists.lustre.org
>
> You can reach the person managing the list at
> lustre-discuss-ow...@lists.lustre.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of lustre-discuss digest..."
>
>
> Today's Topics:
>
>1. Re: Lustre client mount fails: Request sent has timed out for
>   slow reply (Dilger, Andreas)
>2. Re: Distributing locally (Dilger, Andreas)
>
>
> ------
>
> Message: 1
> Date: Fri, 25 Nov 2016 20:25:54 +
> From: "Dilger, Andreas" 
> To: "Ms. Megan Larko" 
> Cc: Lustre User Discussion Mailing List
> 
> Subject: Re: [lustre-discuss] Lustre client mount fails: Request sent
> has timed   out for slow reply
> Message-ID: <0e44cde8-d3e4-41c2-84a0-683b398ff...@intel.com>
> Content-Type: text/plain; charset="us-ascii"
>
> Possible causes in cases like this:
> - duplicate client IP addresses (used only at connect time for o2iblnd)
> - firewall rules (though unlikely to be the case for IB)
> - SELinux (this is supported in Lustre 2.7+ but can still have rules that
> prevent mounting)
>
> Sorry, I don't know anything about opensm.  Presumably you've restarted
> these clients, and
> other IB-level communications are working?
>
> Cheers, Andreas
>
> On Nov 25, 2016, at 12:05, Ms. Megan Larko  wrote:
> >
> > Greetings List!
> >
> > I have a very small HPC cluster running CentOS 7.2.  The lustre servers
> are running lustre kernel-3.10.0-327.3.1.el7_lustre.x86_64.   The clients
> are running kernel-3.10.0-327.3.1.el7.x86_64.
> >
> > I have two compute node clients successfully mounting the Lustre file
> system from the servers.  The next two compute clients will not mount
> lustre.  I have the lustre-client-2.8.0-3.10.0_327.3.1.el7.x86_64 and
> lustre-client-modules-2.8.0-3.10.0_327.3.1.el7.x86_64 rpm installed on
> all compute clients, including the next two.  My InfiniBand network is up
> and successfully pings the other systems.  I can cleanly "modprobe lustre"
> using /etc/modprobe.d/lustre.conf containing one line: options lnet
> networks="o2ib0(ib0)".  This information is the same on both Lustre client
> and server systems, all of which use ib0.
> >
> > On the next two compute clients I can successfully "lctl ping
> mds-ib@o2ib0" and successfully ping the oss similarly.  I try to mount
> the Lustre file system on the next two compute clients via the command
> "mount -t lustre A.B.C.D@o2ib0:/myLustre /myLustre where the A.B.C.D
> address exists and works as described above and the Lustre FS is "myLustre"
> and successfully mounts on the two earlier compute clients.
> >
> > This mount fails on both of my next two compute clients with the STDERR:
> >
> > mount.lustre: mount A.B.C.D@o2ib0:/myLustre /myLustre failed:
> Input/output error
> >
> > The compute client /var/log/messages file shows:
> > [date] [hostname] kernel: Lustre: 
> > 51814:0:(client.c:2063:ptlrpc_expire_one_request())
> @@@ Request sent has timed out for slow reply: [sent 1480097968/real
> 1480097992]  req@8800aa14000 x1551992831868952/t0(0)
> o250->MCGA.B.C.D@o2ib@A.B.C.D@o2ib:26:25 lens 520/544 e 0 to 1 dl
> 1480997973 ref 1 fl Rpc:XN/0/ rc 0/-1
> >
> > The above appears 2X in a row followed by:
> > [date] [hostname] kernel: LustreError: 15c-8: MGCA.B.C.D@o2ib: The
> configuration from log 'myLustre-client' failed (-5).  This may be the
> result of communication errors between this node and the MGS, a bad
> configuration, or other errors.  See the syslog for m

[lustre-discuss] Lustre client mount fails: Request sent has timed out for slow reply

2016-11-25 Thread Ms. Megan Larko
Greetings List!

I have a very small HPC cluster running CentOS 7.2.  The lustre servers are
running lustre kernel-3.10.0-327.3.1.el7_lustre.x86_64.   The clients are
running kernel-3.10.0-327.3.1.el7.x86_64.

I have two compute node clients successfully mounting the Lustre file
system from the servers.  The next two compute clients will not mount
lustre.  I have the lustre-client-2.8.0-3.10.0_327.3.1.el7.x86_64 and
lustre-client-modules-2.8.0-3.10.0_327.3.1.el7.x86_64 rpm installed on all
compute clients, including the next two.  My InfiniBand network is up and
successfully pings the other systems.  I can cleanly "modprobe lustre"
using /etc/modprobe.d/lustre.conf containing one line: options lnet
networks="o2ib0(ib0)".  This information is the same on both Lustre client
and server systems, all of which use ib0.

On the next two compute clients I can successfully "lctl ping mds-ib@o2ib0"
and successfully ping the oss similarly.  I try to mount the Lustre file
system on the next two compute clients via the command "mount -t lustre
A.B.C.D@o2ib0:/myLustre /myLustre where the A.B.C.D address exists and
works as described above and the Lustre FS is "myLustre" and successfully
mounts on the two earlier compute clients.

This mount fails on both of my next two compute clients with the STDERR:

mount.lustre: mount A.B.C.D@o2ib0:/myLustre /myLustre failed: Input/output
error

The compute client /var/log/messages file shows:
[date] [hostname] kernel: Lustre:
51814:0:(client.c:2063:ptlrpc_expire_one_request())
@@@ Request sent has timed out for slow reply: [sent 1480097968/real
1480097992]  req@8800aa14000 x1551992831868952/t0(0)
o250->MCGA.B.C.D@o2ib@A.B.C.D@o2ib:26:25 lens 520/544 e 0 to 1 dl
1480997973 ref 1 fl Rpc:XN/0/ rc 0/-1

The above appears 2X in a row followed by:
[date] [hostname] kernel: LustreError: 15c-8: MGCA.B.C.D@o2ib: The
configuration from log 'myLustre-client' failed (-5).  This may be the
result of communication errors between this node and the MGS, a bad
configuration, or other errors.  See the syslog for more information.
[date] [hostname] kernel: Lustre: Unmounted myLustre-client
[date] [hostname] kernel: LustreError:
53873:0:(obd_mount.c:1426:lustre_fill_super())
unable to mount  (-5)

As all four compute nodes are built from a single kickstart file, I do  not
understand why two compute clients can mount the /myLustre file system and
two cannot.The IB fabric on the in-kernel opensm-3.3.10-1.el7.x86_64
looks clean with no entries in the /var/log/opensm-unhealthy-ports-dump.
If I go all the way back to the last opensm start I do see a single line in
/var/log/opensm.log on the opensm server for the next compute client
stating:
subn_validate_neighbor: ERR 7518: neighbor does not point back at us (guid:
[GUID of my next compute client])

Is this last opensm error completely stopping my Lustre mount when all
other IP pings are completely successful?

TIA,
megan
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Lustre-2.8.0 tunables

2016-05-27 Thread Ms. Megan Larko
Greetings List!

I very recently installed Lustre-2.8.0 in CentOS 7.2 on bare metal in a
small test environment for evaluation.

I was going through my usual Lustre tunable process and I noticed that
while there is still a
/proc/fs/lustre/osc/{my_LustreOST-OSC_stuff}/max_rpcs_in_flight (=8), the
corresponding item of max_dirty_mb (was =32 by default) is no longer
present in Lustre-2.8.0.  There appears a new (to me) item of
max_rpcs_in_progress (=4096 by default).  Would I still need to tune
max_rpcs_in_flight now that I see this new max_rpcs_in_progress?   Is there
a presentation/paper on Lustre-2.8.0 tunables (particularly for InfiniBand
networks)?
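
For context, the way I have been querying and adjusting these on earlier
releases is along the following lines (the values are examples only, not
recommendations):

  lctl get_param osc.*.max_rpcs_in_flight
  lctl set_param osc.*.max_rpcs_in_flight=32
  lctl set_param osc.*.max_dirty_mb=128    # on releases that still expose it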

TIA,
megan
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Lustre-2.8.0 and SELinux

2016-05-27 Thread Ms. Megan Larko
Greetings List!

I have successfully installed a very small, test instantiation of
Lustre-2.8.0.
I have a question about Lustre-2.8.0 and SELinux.  I have read that this
version of Lustre is compatible with SELinux (in CentOS 7.2, in my case)
enforcing mode.  I observe that, by default, SELinux is "enforcing" in
CentOS 7.2.   I also notice that the default SELinuxType is "targeted".
The "semanage module -l" does not show a "lustre" type of listing.  What
should I be looking for to determine if CentOS 7.2 SELinux is using a
security policy for my Lustre client mounts?
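
For the basics, the things I know to look at on a client are the SELinux mode
and the context shown on the mount point, something like the following
(/mnt/lustre is a placeholder for my actual mount point):

  getenforce
  sestatus
  ls -Zd /mnt/lustre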

Also, I did read in https://jira.hpdd.intel.com/browse/LU-5560 that the
test for SELinux support on the client did not quite make it into the
lustre-2.8.0 release.  Does this mean that I don't have SELinux support for
Lustre clients or only that the test plan is not in the Lustre-2.8.0
software?


Thanks to all the developers and contributors to Lustre-2.8.0.   I look
forward to learning this updated version with the additional features.

Cheers,
megan
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] lfs_migrate question

2016-02-17 Thread Ms. Megan Larko
Greetings to One and All!

I am looking at the lfs_migrate command to move files among OST's in a file
system.
In "Lustre Software Release 2.x Operations Manual" Section 33.2.2
Description of lfs_migrate it indicates that "Because lfs_migrate is not
closely integrated with the MDS, it cannot determine whether a file is
currently open and/or in-use by other applications or nodes.  This makes it
UNSAFE (capitalized in Manual) for use on files that might be modified by
other applications, since the migrated file is only a copy of the current
file.  This results in the old file becoming an open-unlinked file and any
modifications to that file are lost."

All of the lfs_migrate examples show the command being run on an
active/mounted Lustre file system.  Is there any way in which one knows
whether a rebalanced/migrated file was in-use at the time of migration (or
that it was not in-use at the time of migration)?  On a mounted Lustre FS,
is it necessary to make the file system or directories therein read-only
for the migration activity?  Would this trait of lfs_migrate being unable
to determine whether the file scheduled to be migrated is or is not in-use
pose an issue if new OST's are added to the file system and lfs_migrate
command is issued (rather than wait for Lustre to re-balance the load over
new OSTs by attrition, as it were)?
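
For context, the sort of invocation I have in mind is the one adapted from
the manual's examples (the file system path and OST name below are
illustrative):

  # drain regular files off one OST, letting lfs_migrate restripe them elsewhere
  lfs find /myLustre --obd myLustre-OST0004 -type f | lfs_migrate -y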

TIA for the clarifications.
Still learning

megan
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] [re] issue in lnet con

2016-02-08 Thread Ms. Megan Larko
Hi Parag,

Could you please share a little more information with us?

Is this a new or existing Lustre file system?
What version of Lustre is running on which operating system (including
kernel version)?

I can't tell from your message if you have run something like a "yum
update" which updated your linux kernel to a newer number on your lustre
server such that that the kernel no longer matches the kernel number
specified in the lustre server rpms,   or if you are building a new lustre
system and have not matched the lustre server kernel with the operating
system kernel.

Or perhaps it is something else entirely.  More information would be useful.
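
A quick way to gather that information on the server would be something like
(just a sketch):

  uname -r
  rpm -qa | egrep -i 'lustre|kernel' | sort
  dmesg | tail -40    # shows the exact unknown symbol lnet is complaining about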

Cheers,
megan


Hi,

I am facing an issue while running "modprobe lnet"

FATAL: Error inserting lnet
(/lib/modules/2.6.32.504.el6_lustre/extra/kernel/net/lustre/lnet.ko):
Unknown symbol in module, or unknown parameter (see dmesg)

Regards,
Parag
+91 8308806004
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Lustre server community release for CentOS 7.x

2016-02-02 Thread Ms. Megan Larko
Howdy!

My org would like to test Lustre server for CentOS 7.1 or 7.2.  I see on
Jenkins server ( https://build.hpdd.intel.com/job/lustre-master/ ) that a
Lustre server version exists for CentOS 7.x.  Is that a community version?
Is there any community version of Lustre server available for testing on
CentOS 7.x?  Does this version work with ZFS?

TIA,
megan
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] patchless Lustre client on ZFS

2016-01-30 Thread Ms. Megan Larko
Thank you, Patrick,

I appreciate the clarification.  I wasn't sure what "patchless" really meant
in this context.

Cheers,
megan

On Sat, Jan 30, 2016 at 9:19 AM, Patrick Farrell  wrote:

> No, patchless refers to not needing a patched version of the kernel
> itself.  You'll still need the Lustre client bits you noted installed (and
> they will need to be built for your particular kernel version). Also, the
> ability to use patchless clients doesn't depend on ZFS vs ldiskfs on your
> servers, it will work either way.
>
> - Patrick
> 
> From: lustre-discuss [lustre-discuss-boun...@lists.lustre.org] on behalf
> of Ms. Megan Larko [dobsonu...@gmail.com]
> Sent: Friday, January 29, 2016 10:31 PM
> To: Lustre User Discussion Mailing List
> Subject: [lustre-discuss] patchless Lustre client on ZFS
>
> Greetings,
>
> I have been reading that if the Lustre 2.6 server is built on ZFS then a
> patchless client may be used.  So am I understanding correctly that a
> Lustre 2.6 server on ZFS may have a client successfully mount without the
> lustre-client and lustre-client-module packages installed?
>
> TIA for the clarification.
> Cheers,
> megan
>
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] patchless Lustre client on ZFS

2016-01-29 Thread Ms. Megan Larko
Greetings,

I have been reading that if the Lustre 2.6 server is built on ZFS then a
patchless client may be used.  So am I understanding correctly that a
Lustre 2.6 server on ZFS may have a client successfully mount without the
lustre-client and lustre-client-module packages installed?

TIA for the clarification.
Cheers,
megan
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Large-scale UID/GID changes via lfs

2015-10-14 Thread Ms. Megan Larko
Hello,

I have been able to successfully use "lfs lsetfacl ." to set and modify
permissions on a Lustre file system quickly with a small system because the
lfs is directed at the Lustre MDT.  It is similar, I imagine, to using "lfs
find..." to search a Lustre fs compared with a *nix "find..." command,  the
latter which must touch every stripe located on any OST.

So, how do I change a UID and/or GID over a Lustre file system?  Doing a *nix
find and chown seems to have the same detrimental performance.

>lfs lgetfacl my.file
The above returns the file ACL info.  I can change permissions and add a
group or user access/perm but I don't know how to change the "header"
information. (To see the difference in header information, one could try
"lfs lgetfact --no-head my.file" which shows the ACL info without the
header.)

>lfs lsetfacl -muser:newPerson:rwx my.file
The above adds user with those perms to the original user listed in the
header info.
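
On the UID/GID side, the closest I have come is to at least narrow the file
list with an MDS-side scan before the chown, something like the sketch below
(the UIDs are made up, and option spellings vary by release, so check "lfs
help find" first):

lfs find /mnt/lustre --uid 1001 | xargs chown 2001:2001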

This is using Lustre version 2.6.x (forgot minor number) on RHEL 6.5.

Suggestions genuinely appreciated.
Cheers,
megan
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] lustre-discuss Digest, Vol 108, Issue 11

2015-03-28 Thread Ms. Megan Larko
Hello Alex,

While not actually a Lustre solution, if this is a linux/unix system, you
might be able to add swap space onto a USB device or something similar by using
the linux mkswap command.  This is not as fast as system memory, but it can
provide some wiggle room in tight memory situations.
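
Something along these lines, for example (size and path are arbitrary):

dd if=/dev/zero of=/var/tmp/extraswap bs=1M count=4096
chmod 600 /var/tmp/extraswap
mkswap /var/tmp/extraswap
swapon /var/tmp/extraswap
swapon -s     # confirm the new swap is in use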

Just an idea...
Megan Larko

On Thu, Mar 26, 2015 at 8:00 AM, 
wrote:

> Send lustre-discuss mailing list submissions to
> lustre-discuss@lists.lustre.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
> or, via email, send a message with subject or body 'help' to
> lustre-discuss-requ...@lists.lustre.org
>
> You can reach the person managing the list at
> lustre-discuss-ow...@lists.lustre.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of lustre-discuss digest..."
>
>
> Today's Topics:
>
>1. lester on mds with low memory (Alexander Oltu)
>
>
> --
>
> Message: 1
> Date: Thu, 26 Mar 2015 11:14:47 +0100
> From: Alexander Oltu 
> To: lustre-discuss@lists.lustre.org
> Subject: [lustre-discuss] lester on mds with low memory
> Message-ID: <9b6188d0-2728-4d6c-a40a-2f22c66d3...@uib.no>
> Content-Type: text/plain; charset=us-ascii
>
> Hi,
>
> We are trying to run lester (https://github.com/ORNL-TechInt/lester) on
> MDS with low memory and about 41M files in Lustre filesystem. There is an
> extensive amount of file operations going all the time on the filesystem.
>
> This is the current memory situation on MDS:
>
> cat /proc/meminfo |egrep 'MemTotal|MemFree|Buffers|Slab'
> MemTotal:   16533620 kB
> MemFree: 1025312 kB
> Buffers:12591976 kB
> Slab:1391508 kB
>
> By dropping caches we can make maximum available memory of 1,5GB. When we
> run Lester it gets killed by OOM (System has no swap).
> I tried using UNIX IO manager and decreasing group and dir readahead, but
> there is still not enough memory to finish scan.
>
> Anyone has any experience with running lester on low memory? Will
> appreciate any suggestion. (There is no option to add physical RAM).
>
> Thanks,
> Alex.
>
> --
>
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>
>
> End of lustre-discuss Digest, Vol 108, Issue 11
> ***
>
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Lustre-discuss Digest, Vol 106, Issue 18

2015-01-14 Thread Ms. Megan Larko
Greetings,

I concur with Mr. Ball that running a Lustre file system in excess of 90%
full can be problematic.  In my personal experience numbers above 92% have
caused slow response times for users especially for write activity.
Depending upon your Lustre stripe set-up, the system takes a longer time
using the Lustre default stripe of one in its attempt to locate enough
space to store a file.   If a larger stripe number is used, the problem
does not go away but it is lessened.  I have had a few experiences in
which if one OST completely fills to 100% then nothing else may be written
anywhere on that single-mount-point Lustre file system.   Man oh man!  Have
I heard user complaints about that!   "What do you mean no more space?   A
df shows me another 800Gb (on a 100Tb file system)".

That said, I have had success with creating a folder with a specified
stripe count of two or so less than the total number of OSTs in the file
system and putting files into that striped folder until I can re-balance
the file system either by a clean-up of deleting files, moving them to tape
or some other archive system, or until I can add more OSTs (I like that
grow feature!).
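
The setup itself is only a couple of commands; a sketch, assuming a dozen
OSTs (adjust the count and path to your own system):

mkdir /mnt/lustre/wide_stripe_dir
lfs setstripe -c 10 /mnt/lustre/wide_stripe_dir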

Someone once said that files will grow to consume all available space.  I
forget the attribution.

Cheers,
megan

On Wed, Jan 14, 2015 at 3:55 PM, 
wrote:

> Send Lustre-discuss mailing list submissions to
> lustre-discuss@lists.lustre.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
> or, via email, send a message with subject or body 'help' to
> lustre-discuss-requ...@lists.lustre.org
>
> You can reach the person managing the list at
> lustre-discuss-ow...@lists.lustre.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Lustre-discuss digest..."
>
>
> Today's Topics:
>
>1. Performance dropoff for a nearly full Lustre file system
>   (Mike Selway)
>2. Re: Performance dropoff for a nearly full Lustre file system
>   (Bob Ball)
>
>
> --
>
> Message: 1
> Date: Wed, 14 Jan 2015 19:43:02 +
> From: Mike Selway 
> To: "lustre-discuss@lists.lustre.org"
> 
> Subject: [Lustre-discuss] Performance dropoff for a nearly full Lustre
> filesystem
> Message-ID:
> <5073651db6c02643b8739403be96a0e27bc...@cfwex01.americas.cray.com>
> Content-Type: text/plain; charset="us-ascii"
>
> Hello,
>I'm looking for experiences for what has been observed to
> happen (performance drop offs, severity of drops, partial/full failures,
> ...) when an operational Lustre File System has been almost "filled"...
> percentages of interest are in the range from say 80% to 99%.  Multiple
> responses appreciated.
>
> Also, comments from anyone who has implemented a Robin Hood approach,
> about how they worked to avoid performance drop offs of a "near full" file
> system by "archiving and releasing data blocks" to auto-reconstruct
> continuous data areas.
>
> Thanks!
> Mike
>
> Mike Selway | Sr. Storage Architect (TAS) | Cray Inc.
> Work +1-301-332-4116 | msel...@cray.com
> 146 Castlemaine Ct,   Castle Rock,  CO  80104|   Check out Tiered Adaptive
> Storage (TAS)!<
> http://www.cray.com/Products/Storage/Tiered-Adaptive-Storage.aspx>
>
>
> --
>
> Message: 2
> Date: Wed, 14 Jan 2015 15:55:13 -0500
> From: Bob Ball 
> To: Mike Selway , "lustre-discuss@lists.lustre.org"
> 
> Subject: Re: [Lustre-discuss] Performance dropoff for a nearly full
> Lustre file system
> Message-ID: <54b6d7b1.7020...@umich.edu>
> Content-Type: text/plain; charset="windows-1252"; Format="flowed"
>
> In my memory, it is not recommended to run Lustre more than 90% full.
>
> bob
>
> On 1/14/2015 2:43 PM, Mike Selway wrote:
> >
> > Hello,
> >
> >I'm looking for experiences for what has been observed
> > to happen (performance drop offs, severity of drops, partial/full
> > failures, ...) when an operational Lustre File System has been almost
> > "filled"

[Lustre-discuss] Fast error reporting

2014-03-08 Thread Ms. Megan Larko
Just my $0.02 here.

I am in agreement with Mr. A. Dilger.  I am a vote in favor of the present
Lustre default behavior.   The pausing of operations is a good Lustre
feature for us.   I have worked with various systems in which a network
hiccup will not crash the job. In the present Lustre behavior; the job will
just pause for a bit (a configurable number, if I recall correctly).  We
have left the default value in place.  It prevents us from having jobs fail
because of momentary (one minute or less) holds in the network traffic.

If Yao wishes it to be a shorter time to failing the job, I think he should
have the freedom to configure the value that works for him.

My opinion, YMMV.
Cheers,
megan
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] How to apply a lustre patch

2013-01-09 Thread Ms. Megan Larko
Greetings List!

I have recently experienced an issue on a lustre-2.1.2 system (kernel
2.6.32-220.17.1.el6_lustre.x86_64) in which the error on the MGS/MDS
server matches LU-1596.  On the web page there is a link (reprinted
below) to a page from Oleg Drokin containing the patches to fix the
issue.   I see all of the modified files listed.   Do I need to use
patch or quilt to apply these patches?  I don't know how to go about
applying these.  Is there a web page explaining HOWTO do this
correctly?

The page with the patch listings:
http://review.whamcloud.com/#change,3623
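
Is it as simple as something along these lines (just my guess; the patch file
name is made up and assumes the change can be saved as a plain unified diff)?

cd lustre-2.1.2
patch -p1 --dry-run < lu-1596.patch    # check it applies cleanly first
patch -p1 < lu-1596.patch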

Thank you for the help.

megan
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] problem with installing lustre and OFED

2013-01-02 Thread Ms. Megan Larko
Greetings Jason,

As you have most likely discovered, Mellanox (MLNX) needs to be built
into the lustre linux kernel to use InfiniBand.

I worked on such an issue recently.   The Whamcloud linux kernel
2.1.2-2.6.32_220.17.1.el6_lustre would not work with our Mellanox
InfiniBand (IB) drivers optimally.  We got the MLXN version 1.8.5 to
match our Mellanox hardware and had to do the dance already described
to you in this list of...
1.   downloading all of the appropriate (Whamcloud) lustre linux
kernels, header and devel rpms
2.   boot into the lustre kernel
3.   in our /usr/src/lustre-2.1.2 directory we built lustre against the
Mellanox "Module.symvers" information (which is why you see the
"Input/Output" errors on fid.ko, mdc.ko, osc.ko, lov.ko and, because of
those, on lustre.ko).   The MLNX version 1.8.5 that we needed was in the
/usr/src/ofa_kernel directory (with the Module.symvers etc.).  We used
the defaults other than the o2ib, so our command in the
/usr/src/lustre-2.1.2 directory looked like
"./configure --with-o2ib=/usr/src/ofa_kernel" (a condensed command
sketch for steps 3-5 follows after this list)
4.   next we issued "make"
5.   next we chose to run a "make rpms" command so that we could have
rpms for our system for cluster re-building
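
Condensing steps 3 through 5 into commands (same paths as above; adjust to
wherever your MLNX OFED sources actually live):

cd /usr/src/lustre-2.1.2
./configure --with-o2ib=/usr/src/ofa_kernel
make
make rpms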

We had to do this for *both* our lustre servers and lustre clients
(using the lustre-client Whamcloud kernel, headers, ...   So we had
the servers and the clients communicating properly over the MLNX ib
fabric.

In /etc/modprobe.d  we used a lustre.conf file to explicitly direct
the system to use the o2ib network when starting lustre at boot.
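
It is a one-liner of roughly this shape (interface name assumed to be ib0):

options lnet networks="o2ib0(ib0)"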

Without the above actions the ko2iblnd would not load.

Just confirming that you need to build Mellanox on servers and clients
to use MLNX IB with Lustre cluster file system.

megan
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] lo2iblnd and Mellanox IB question

2012-11-21 Thread Ms. Megan Larko
Thanks, especially to Colin and to Jeff.

Yup.  I suspected that I would have to rebuild the Lustre 2.1.2 I have
to make use of the Mellanox IB.   Colin,  I appreciate the check; I
did not have conflicting IB drivers.  Jeff, I will heed your advice
and I will start my rebuild after the (U.S.) holiday weekend.

An enjoyable weekend to one and all!
megan
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] lo2iblnd and Mellanox IB question

2012-11-20 Thread Ms. Megan Larko
Hello to Everyone!

I have a question to which I think I know the answer, but I am seeking
confirmation (re-assurance?).

I have built a RHEL 6.2 system with lustre-2.1.2.   I am using the
rpms from the Whamcloud site for linux kernel
2.6.32_220.17.1.el6_lustre.x86_64 along with the version-matching
lustre, lustre-modules, lustre-ldiskfs, and kernel-devel.   I also
have from the Whamcloud site
kernel-ib-1.8.5-2.6.32-220.17.1.el6_lustre.x86_64 and the related
kernel-ib-devel for same.

The lustre file system works properly for TCP.

I would like to use InfiniBand.   The system has a new Mellanox card
for which mlxn1 firmware and drivers were installed.   After this was
done (I cannot speak to before) the IB network will come up on boot
and copy and ping in a traditional network fashion.

Hard Part:  I would like to run the lustre file system on the IB (ib0).
I re-created the lustre network to use /etc/modprobe.d/lustre.conf
pointing to o2ib in place of tcp0.   I rebuilt the mgs/mdt and all
osts to use the IB network (the mgs/mds --failnode=[new_IB_addr] and
the osts point to mgs on IB net).   When I "modprobe lustre" to start
the system I receive error messages stating that there are
Input/Output errors on lustre modules fld.ko, fid.ko, mdc.ko, osc.ko,
lov.ko.   The lustre.ko cannot be started.   A look in
/var/log/messages reveals many "Unknown symbol" and "Disagrees about
version of symbol"  from the ko2iblnd module.

A "modprobe --dump-modversions /path/to/kernel/lo2iblnd.ko"  shows it
pointing to the Modules.symvers of the lustre kernel.

Am I correct in thinking that because of the specific Mellanox IB
hardware I have (with its own /usr/src/ofa_kernel/Module.symvers
file), that I have to build Lustre-2.1.2 from tarball to use the
"configure --with-o2ib=/usr/src/ofa_kernel"  mandating that this
system use the ofa_kernel-1.8.5  modules and not the OFED 1.8.5 from
the kernel-ib rpms  to which Lustre defaults in the Linux kernel?

Is a rebuild of lustre from source mandatory or is there a way in
which I may point to the appropriate symbols needed by the
ko2iblnd.ko?

Enjoy the Thanksgiving holiday for those U.S. readers.To everyone
else in the world, have a great weekend!

Megan Larko
Hewlett-Packard
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] lctl ping of Pacemaker IP

2012-11-04 Thread Ms. Megan Larko
Greetings,

My present solution for my corosync/pacemaker control of my Lustre
filesystem availability was to make a Linux Standards Base (LSB) Sys V
init script for my IB0 service and then I could use the corosync
primitive to control the IB network (and therefore the MGS).  Being
that I did not know how to make the corosync alias IP accessible to
LNET for a successful lctl ping required for Lustre OSS nodes to
properly communicate with the MGS/MDS, I chose to point to the real
InfiniBand ib0 IP and have corosync align that network address with the
system serving the fibre channel multipath mgs/mdt disk.   In this
way the ost disks have one and only one mgsnode (no failover because
the IB0 address fails over).
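
The corosync side ended up being nothing more exotic than an LSB resource,
roughly like this (a sketch; the resource and script names are mine):

crm configure primitive p_ib0 lsb:ib0 op monitor interval=30s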

This has been successful in my TCP test (an LSB-compliant service for
eth1).   I plan on implementing this week when the IB hardware comes
in.

Thanks for your help.  I appreciate it.

Cheers,
megan
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] lctl ping of Pacemaker IP

2012-11-01 Thread Ms. Megan Larko
Greetings!

I am working with Lustre-2.1.2 on RHEL 6.2.  First I configured it
using the standard defaults over TCP/IP.   Everything worked very
nicely using a real, static --mgsnode=a.b.c.x value which was the
actual IP of the MGS/MDS system1 node.

I am now trying to integrate it with Pacemaker-1.1.7.I believe I
have most of the set-up completed with a particular exception.  The
"lctl ping" command cannot ping the pacemaker IP alias (say a.b.c.d).
The generic ping command in RHEL 6.2 can successfully access the
interface.  The Pacemaker alias IP (for failover of the combnied
MGSMDS node with Fibre Channel multipath storage shared between both
MGS/MDS-configured machines)  works in and of itself.  I tested with
an apache service.   The Pacemaker will correctly fail over the
MGS/MDS from system1 to system2 properly.  If I go to system2 then my
Lustre file system stops because it cannot get to the alias IP number.

I did configure the lustre OSTs to use --mgsnode=a.b.c.d (a.b.c.d
representing my Pacemaker IP alias).  A tunefs.lustre confirms the
alias IP number.  The alias IP number does not appear in LNET (lctl
list_nids), and "lctl ping a.b.c.d" fails.

Should this IP alias go into the LNET data base?  If yes, how?   What
steps should I take to generate a successful "lctl ping a.b.c.d"?

Thanks for reading!
Cheers,
megan
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] Lustre Chroma screen shots

2012-10-11 Thread Ms. Megan Larko
Hello,

I was reading on the Whamcloud page about the Chroma tool for managing
Lustre.   I am preparing to install Lustre 2.1.2 onto a new
(ly-reformatted) RHEL 6.2 system.   I am curious about the value-added
of Chroma as compared to my standard CLI tool  habit.

Are there any screen shots or glossies out there about Chroma's
value-added for Lustre?

TIA,
Megan Larko
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] Fwd: Lustre Interoperability Question

2012-09-19 Thread Ms. Megan Larko
Hi,

Re-send to the proper (I hope) email list.

Megan

-- Forwarded message --
From: Ms. Megan Larko 
Date: Wed, Sep 19, 2012 at 4:24 PM
Subject: Lustre Interoperability Question
To: lustre-discuss-requ...@lists.lustre.org


Hello List!

I am looking for clarification on a Lustre interoperability question.
 There is a system currently running RHEL 5.5 and Lustre 1.8.3 to
which new hardware would like to be added.  The new hardware is not
supported under RHEL 5.x so it needs to be installed as RHEL 6.2 as a
minimum.   There is considerable doubt that one section of the old
hardware will be able to function under RHEL 6.   A version of the
Lustre Manual for 2.0 (
http://wiki.lustre.org/manual/LustreManual20_HTML/UpgradingLustre.html
) Section 16.1 indicates the possibility of heterogeneous Lustre
servers---specifically mixed 1.8 and 2.0 servers.  Section 16.1 goes
on to state that Lustre 1.8.4 must be on the client and server nodes
not upgraded to 2.0.

Further research on this issue led me to Whamcloud LU-1116 (
http://jira.whamcloud.com/browse/LU-1116 ) in which a bug (actually a
linux kernel item name change) indicates that for a RHEL 6.2 client to
support 1.8 Lustre (and 2.0 Lustre)  the clients minimum Lustre
version must be 1.8.7 (patched) or 1.8.8 (where the patch is landed).

The Question:  Can Lustre effectively serve a file system with
heterogeneous servers of RHEL 5.5 (1.8.3) and RHEL 6.2 (2.1 or better)?

If not, would a solution of RHEL 5.5 with an upgrade from 1.8.3 to
1.8.8 function  with the new hardware using RHEL 6.2 (either Lustre
1.8.8 or 2)?   Do the clients have to run Lustre 1.8.8 at a minimum?

All guidance and suggestions are appreciated.

Cheers!
Megan Larko
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] WRT "obdidx ordering in "lfs getstript"

2012-02-09 Thread Ms. Megan Larko
Greetings!

I was reading Mr. David's query about the ordering of data on a
striped luster file system.   I too am under the impression that the
data stripe of size lfs-stripesize will rotate in order from the
starting point.Following Mr. David's example, a large data set
would be written to the 2nd OST, with the next piece on the 3rd, then
0th and finally 1st before circling back around to the 2nd (assuming
OSTs 0 to 3 from the example).  In his response, Mr. Dilger stated:
"when OST free space is imbalanced  the OSTs will be selected in part
based on how full they are".   Does that refer to a starting point for
the data writes before the orderly progression?   Does that somehow
imply a "skipping over" of a "full" OST?The latter would be
revolutionary to me in my personal understanding of Lustre and cluster
file systems in general.   I thought that a single OST having
insufficient space available for writing of the data piece of "stripe
size"---or all of the data if the default Lustre stripe size of one is
used--would cause a file system full error.This error can confuse
users and novice administrators who see a file system full message
when a typical disk usage command on the client will show (often a
reasonable) percentage available on the file system as a whole.
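
For what it is worth, the commands I reach for to see the imbalance behind
that confusion are simply (mount point and file name are illustrative):

lfs df -h /mnt/lustre                     # per-OST fill levels
lfs getstripe /mnt/lustre/some_big_file   # which OSTs a given file landed on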

Have I misunderstood something here or is this skipping over a full
OST something in the newer versions of Lustre cluster filesystem?

Cheers!
megan
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Lustre-discuss Digest, Vol 72, Issue 21

2012-01-28 Thread Ms. Megan Larko
Greetings!

Wow!   Thank you Brian.   That is *exactly* the sort of instruction I
needed.   I was under the erroneous assumption that the lower of the
Bugzilla numbers would be the object of my search in the event of a
duplicate bug.   I never traced BZ 21681 as I thought it was the
"duplicate" and that the trail would be in the "original".

I appreciate the detailed info.   Thanks for teaching me how to fish.

megan

On Sat, Jan 28, 2012 at 2:00 PM,
 wrote:
> Send Lustre-discuss mailing list submissions to
>        lustre-discuss@lists.lustre.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
>        http://lists.lustre.org/mailman/listinfo/lustre-discuss
> or, via email, send a message with subject or body 'help' to
>        lustre-discuss-requ...@lists.lustre.org
>
> You can reach the person managing the list at
>        lustre-discuss-ow...@lists.lustre.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Lustre-discuss digest..."
>
>
> Today's Topics:
>
>   1. Re: landing of Lustre Bugzilla 19579 (Brian J. Murrell)
>
>
> --
>
> Message: 1
> Date: Fri, 27 Jan 2012 14:23:53 -0500
> From: "Brian J. Murrell" 
> Subject: Re: [Lustre-discuss] landing of Lustre Bugzilla 19579
> To: lustre-discuss@lists.lustre.org
> Message-ID: <4f22f9c9.8080...@whamcloud.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> On 12-01-27 02:01 PM, Ms. Megan Larko wrote:
>> Hello,
>
> Hi,
>
>> I have gone through the Lustre CHANGELOG for 1.8.x and I do not see
>> where bug 19579 has been addressed.   I see it being resolved, but in
>> which version was the fix landed please?   There are reasons why the
>> customer may not be able to go to the latest and greatest 1.8.x
>> version of Lustre.  I am looking for documentation on what version
>> addresses the message described in 19579.
>
> If you look at the resolution of bug 19579 it says that it was resolved
> as a duplicate of bug 21681.  From there I go to my clone of
> git.whamcloud.com and make sure I am on branch b1_8.
>
> I then use git log to see everything that has been committed and search
> for "b=21681" and it finds commit
> df214dd2e53f58be1f8cacdecb2fec54871a120e.  If I then use "git describe
> --contains df214dd2e53f58be1f8cacdecb2fec54871a120e", it reports
> v1_8_1_60~13 which I can interpret as having landed before v1_8_1_60 was
> tagged, or in terms of which GA release, it would be in 1.8.2.
>
> Additionally bugzilla says that 21681 landed in 1.8.2 and if I check the
> lustre/ChangeLog on my Whamcloud clone I can see that that bugzilla id
> is indeed listed in the changelog under release 1.8.2.
>
> Cheers,
> b.
>
> --
> Brian J. Murrell
> Senior Software Engineer
> Whamcloud, Inc.
>
> -- next part --
> A non-text attachment was scrubbed...
> Name: signature.asc
> Type: application/pgp-signature
> Size: 262 bytes
> Desc: OpenPGP digital signature
> Url : 
> http://lists.lustre.org/pipermail/lustre-discuss/attachments/20120127/f1da8678/attachment-0001.bin
>
> --
>
> ___
> Lustre-discuss mailing list
> Lustre-discuss@lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>
>
> End of Lustre-discuss Digest, Vol 72, Issue 21
> **
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] landing of Lustre Bugzilla 19579

2012-01-27 Thread Ms. Megan Larko
Hello,

I am looking for some concrete (printable) documentation.   I have a
client who is receiving an error message on the system which has been
indicated to me to be Lustre Bugzilla 19579.   The customer is running
Lustre 1.8.3 currently and wished to upgrade Lustre version to avoid
viewing the error message indicated by Mikhail Pershin as being
"harmless"  and "...the wrong alert and will be fixed.".

I have gone through the Lustre CHANGELOG for 1.8.x and I do not see
where bug 19579 has been addressed.   I see it being resolved, but in
which version was the fix landed please?   There are reasons why the
customer may not be able to go to the latest and greatest 1.8.x
version of Lustre.  I am looking for documentation on what version
addresses the message described in 19579.

I appreciate any assistance.

Thank you,
Megan Larko
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Pardon my stupidity: IOH?

2011-06-03 Thread Ms. Megan Larko
Excellent!   Thank you for the Wikipedia ref.   I had heard of QPI
replacing the old northbridge but I have not yet worked with any
motherboards with that technology.

Thank you!
megan

On Fri, Jun 3, 2011 at 7:12 PM, Kevin Van Maren
 wrote:
> The I/O Hub, which provides the PCI Express lanes to the processor.  See:
> http://en.wikipedia.org/wiki/Intel_X58
>
>
> Ms. Megan Larko wrote:
>>
>> Greetings,
>>
>> Please pardon my ignorance, what is this IOH to which the recent
>> thread "OSSes on dual IOH motherboards" has been referring?
>>
>> Thanks,
>> megan
>> ___
>> Lustre-discuss mailing list
>> Lustre-discuss@lists.lustre.org
>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>>
>
>
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] Pardon my stupidity: IOH?

2011-06-03 Thread Ms. Megan Larko
Greetings,

Please pardon my ignorance, what is this IOH to which the recent
thread "OSSes on dual IOH motherboards" has been referring?

Thanks,
megan
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] ost_write operation failed with -28 in 1.8.5 lustre client

2011-05-02 Thread Ms. Megan Larko
Hello,

Just one very small suggestion:   How are your inodes on your MDT?
If one runs out of inodes then a system appears to be full because no
additional inode pointers may be issued to link the data to a
location/starting point.

A "df -i" on the MDT can answers this question.

Good Luck,
megan
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] small/inexpensive cluster design

2011-04-21 Thread Ms. Megan Larko
Greetings,

I had been a part of a team that has done this twice.  Once at NASA
Goddard Space Flight Center Hydrological Sciences Branch and one more
time at the Center for Research on Environment & Water.   Both times
were successful experiences I thought.

We used commercial off-the-shelf PC hardware and managed switches to
build a beowulf-style cluster consisting of compute nodes, OSS and MDS
nodes.   The OSS and the MGS/MDS units were separate as per the
recommendation of the Lustre team.  The back-end storage OST units
were 4U boxes containing SATA disks connected to the OSS via CX4 (I
think) cables.  We used Perc6/i RAID and the corresponding MegaCLI64
s/w tool  on the OSS units to manage the disks within.

The OS was Red Hat-based CentOS 4 and upgraded before I left to CentOS
5.5.  The OST disks were formatted in the Lustre Cluster file system.

We were able to successfully export the Lustre mount-points via NFS
from the main client box.

We used the data on the Lustre file system to produce and display
Earth science images on an ordinary web interface (using a combination
of IDL proprietary imaging software and the freely available GrADS
imaging software from IGES).  We chose Lustre cluster files system for
the project because of its price point (Free/Open-Source -- FOSS) and
the fact that it performed better for our purposes than GFS and our
test of the, back then early, Gluster.

Just a data point for you.

megan
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] software raid

2011-03-29 Thread Ms. Megan Larko
Hi,

Just as a clarification/update, I have done both software and hardware
raid.   The issue with the device not coming back as the same drive
letter or position was mitigated by using the LABEL=disk5 (or whatever
string) so that the mounts are placed into position by label.   Newer
versions of software raid use the physical drive serial number (s/n)
or other unique identifying number obtained from the hardware itself.
For example, root=UUID=21c81788-30ea-4e5d-ad9b-a00a0be5ce7e.   I have
had hardware raid cards early on that were not capable of this
behavior.   Now the choice is entirely up to the administrator/user as
to preference.
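
For illustration, the two /etc/fstab styles look roughly like this (label,
UUID and mount points are made up):

LABEL=disk5                                 /mnt/disk5  ext3  defaults  0 2
UUID=21c81788-30ea-4e5d-ad9b-a00a0be5ce7e   /data       ext3  defaults  0 2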

Cheers!
megan
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] execute-only Like Ref: Bug 22376

2010-12-02 Thread Ms. Megan Larko
Greetings Andreas and Folks,

I read the info on Bug 23025.   Comment #12 at the bottom indicates:
--- Comment #12 From Sam Chan 2010-06-29 13:49:43 ---

problem is fixed in Novell bugzilla 379057.  kernel 2.6.16.60-0.60.1
and newer should resolve the problem.

i tested 2.6.16.60-0.66.1+lustre.1.8.3 and things work fine now.
_
That sounds cool, but I am seeing this on SLES10SP3 with linux
kernel 2.6.16.60-0.69.1-smp on the clients and kernel
2.6.16.60-0.69.1+lustre1.8.4-smp on the Lustre servers using lustre
1.8.4 on both sides of the conversation.

Cheers!
megan


On Thu, Dec 2, 2010 at 3:10 PM, Andreas Dilger
 wrote:
> On 2010-12-02, at 12:23, Ms. Megan Larko wrote:
>> We recently upgraded our existing Lustre system from 1.6.7.2 to 1.8.4.
>>  One of the hoped-for features is "execute-only" binaries on the
>> Lustre file system.
>
> Are you running a SLES10 kernel?  If yes, please see bug 23025.
>
>>  According to Bug 22376 (
>> https://bugzilla.lustre.org/show_bug.cgi?id=22376 )  this execute-only
>> feature was available in the patch for Lustre 1.8.2.   I had assumed
>> the patch would be incorporated upstream (i.e. to 1.8.4).   The
>> behavior I am seeing on the Lustre 1.8.4 is shown below with an a.out
>> executable file of the common "hello world" C program.
>
> I've added a simple test to bug 22376 to verify if this is working correctly 
> in current versions of Lustre.  Please CC yourself to that bug to track its 
> progress.
>
>> icecube:/mnt/lustre # ls -l a.out
>> ---x--x--x 1 root mygrp 9027 Dec  2 13:57 a.out
>> la...@icecube:/mnt/lustre> ./a.out
>> -bash: ./a.out: Permission denied
>> la...@icecube:/mnt/lustre> strace a.out
>> execve("/mnt/lustre/a.out", ["a.out"], [/* 73 vars */]) = -1 EACCES
>> (Permission denied)
>> dup(2)                                  = 3
>> fcntl(3, F_GETFL)                       = 0x8002 (flags O_RDWR|O_LARGEFILE)
>> fstat(3, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 2), ...}) = 0
>> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
>> 0) = 0x2b2df42b3000
>> lseek(3, 0, SEEK_CUR)                   = -1 ESPIPE (Illegal seek)
>> write(3, "strace: exec: Permission denied\n", 32strace: exec: Permission 
>> denied
>> ) = 32
>> close(3)                                = 0
>> munmap(0x2b2df42b3000, 4096)            = 0
>> exit_group(1)                           = ?
>> la...@icecube:/mnt/lustre> whoami
>> larko
>> la...@icecube:/mnt/lustre> cat /etc/passwd | grep larko
>> larko:x:1:1:Catherine M Larko
>> (MYGRP96090RAY):/usr/people/larko:/bin/bash  # where 1 is "mygrp"
>>
>> The execute-only does work for the root users:
>> icecube:/mnt/lustre # whoami
>> root
>> icecube:/mnt/lustre # ./a.out
>> Hello World
>> icecube:/mnt/lustre # strace ./a.out
>> execve("./a.out", ["./a.out"], [/* 69 vars */]) = 0
>> brk(0)                                  = 0x501000
>> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
>> 0) = 0x2b6427bdf000
>> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
>> 0) = 0x2b6427be
>> access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or 
>> directory)
>> open("/etc/ld.so.cache", O_RDONLY)      = 3
>> fstat(3, {st_mode=S_IFREG|0644, st_size=131880, ...}) = 0
>> mmap(NULL, 131880, PROT_READ, MAP_PRIVATE, 3, 0) = 0x2b6427be1000
>> close(3)                                = 0
>> open("/lib64/libc.so.6", O_RDONLY)      = 3
>> read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0\324\1"..., 832) = 
>> 832
>> fstat(3, {st_mode=S_IFREG|0755, st_size=1570761, ...}) = 0
>> mmap(NULL, 2355560, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3,
>> 0) = 0x2b6427ce
>> madvise(0x2b6427ce, 2355560, MADV_SEQUENTIAL|0x1) = 0
>> mprotect(0x2b6427e16000, 1048576, PROT_NONE) = 0
>> mmap(0x2b6427f16000, 20480, PROT_READ|PROT_WRITE,
>> MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x136000) = 0x2b6427f16000
>> mmap(0x2b6427f1b000, 16744, PROT_READ|PROT_WRITE,
>> MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x2b6427f1b000
>> close(3)                                = 0
>> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
>> 0) = 0x2b6427f2
>> arch_prctl(ARCH_SET_FS, 0x2b6427f206d0) = 0
>> mprotect(0x2b6427f16000, 12288, PROT_READ) = 0
>> munmap(0x2b6427be1000, 131880)          = 0
>> fstat(1, {st_mode=S_IFCHR|

[Lustre-discuss] execute-only Like Ref: Bug 22376

2010-12-02 Thread Ms. Megan Larko
Hello Group,

We recently upgraded our existing Lustre system from 1.6.7.2 to 1.8.4.
  One of the hoped-for features is "execute-only" binaries on the
Lustre file system.  According to Bug 22376 (
https://bugzilla.lustre.org/show_bug.cgi?id=22376 )  this execute-only
feature was available in the patch for Lustre 1.8.2.   I had assumed
the patch would be incorporated upstream (i.e. to 1.8.4).   The
behavior I am seeing on the Lustre 1.8.4 is shown below with an a.out
executable file of the common "hello world" C program.

icecube:/mnt/lustre # ls -l a.out
---x--x--x 1 root mygrp 9027 Dec  2 13:57 a.out
la...@icecube:/mnt/lustre> ./a.out
-bash: ./a.out: Permission denied
la...@icecube:/mnt/lustre> strace a.out
execve("/mnt/lustre/a.out", ["a.out"], [/* 73 vars */]) = -1 EACCES
(Permission denied)
dup(2)  = 3
fcntl(3, F_GETFL)   = 0x8002 (flags O_RDWR|O_LARGEFILE)
fstat(3, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 2), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
0) = 0x2b2df42b3000
lseek(3, 0, SEEK_CUR)   = -1 ESPIPE (Illegal seek)
write(3, "strace: exec: Permission denied\n", 32strace: exec: Permission denied
) = 32
close(3)= 0
munmap(0x2b2df42b3000, 4096)= 0
exit_group(1)   = ?
la...@icecube:/mnt/lustre> whoami
larko
la...@icecube:/mnt/lustre> cat /etc/passwd | grep larko
larko:x:1:1:Catherine M Larko
(MYGRP96090RAY):/usr/people/larko:/bin/bash  # where 1 is "mygrp"

The execute-only does work for the root users:
icecube:/mnt/lustre # whoami
root
icecube:/mnt/lustre # ./a.out
Hello World
icecube:/mnt/lustre # strace ./a.out
execve("./a.out", ["./a.out"], [/* 69 vars */]) = 0
brk(0)  = 0x501000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
0) = 0x2b6427bdf000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
0) = 0x2b6427be
access("/etc/ld.so.preload", R_OK)  = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)  = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=131880, ...}) = 0
mmap(NULL, 131880, PROT_READ, MAP_PRIVATE, 3, 0) = 0x2b6427be1000
close(3)= 0
open("/lib64/libc.so.6", O_RDONLY)  = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0\324\1"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1570761, ...}) = 0
mmap(NULL, 2355560, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3,
0) = 0x2b6427ce
madvise(0x2b6427ce, 2355560, MADV_SEQUENTIAL|0x1) = 0
mprotect(0x2b6427e16000, 1048576, PROT_NONE) = 0
mmap(0x2b6427f16000, 20480, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x136000) = 0x2b6427f16000
mmap(0x2b6427f1b000, 16744, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x2b6427f1b000
close(3)= 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
0) = 0x2b6427f2
arch_prctl(ARCH_SET_FS, 0x2b6427f206d0) = 0
mprotect(0x2b6427f16000, 12288, PROT_READ) = 0
munmap(0x2b6427be1000, 131880)  = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 2), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
0) = 0x2b6427be1000
write(1, "Hello World\n", 12Hello World
)   = 12
exit_group(0)   = ?

Regarding the default value of the drop cache:
la...@icecube:/mnt/lustre> cat /proc/sys/vm/drop_caches
0

If I try the other suggestion in the bugzilla URL referenced above
about sending the contents once to /dev/null there is no change in
resulting behavior.
icecube:/mnt/lustre # cat ./a.out > /dev/null
la...@icecube:/mnt/lustre> ./a.out
-bash: ./a.out: Permission denied


There are no unusual lines whatsoever on the MGS/MDT /var/log/messages file.

Any tips?   Settings??

Thank you,
Megan Larko
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] upgrade to 1.8.4 and test fallback to 1.6.7.2

2010-09-29 Thread Ms. Megan Larko
Wow.   I did not know that difference.   I like what I have read about MMP.

Thank you.
MLarko

SGI Federal

On Mon, Sep 27, 2010 at 5:10 PM, Johann Lombardi
 wrote:
> Hi Megan,
>
> On Mon, Sep 27, 2010 at 01:50:38PM -0400, Ms. Megan Larko wrote:
>> OkayI was getting errors when I attempted to use --erase-params
>> and --writeconf in 1.8.4 stating that my 1.6.7.2 parameters would have
>> to be updated (again, my "failover.node" string becomes "failnode"
>> string).  Just my personal experience so far...
>
> --failnode is actually very similar to --param="failover.node=".
> In both cases, the same parameter (i.e. PARAM_FAILNODE="failover.node=") is
> stored in the configuration logs. The only difference is that --failnode
> enables MMP automatically, that's why we recommend to use it.
> To sum up, there should not be any compatibility issue (BTW, --failnode is
> also available in 1.6, AFAIK).
>
> Cheers,
> Johann
>
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] upgrade to 1.8.4 and test fallback to 1.6.7.2

2010-09-27 Thread Ms. Megan Larko
Greetings Johann,

Thank you for your response.

On Mon, Sep 27, 2010 at 1:12 PM, Johann Lombardi
 wrote:
> On Mon, Sep 27, 2010 at 12:31:53PM -0400, Ms. Megan Larko wrote:
>> I attempted to add a new lustre tuning parameter to my system (I
>> wanted to add the *t.group_upcall=NONE.)   While the Lustre 1.8.4 read
>
> Please note that *t.group_upcall=NONE is already supported in 1.6.

Yes.  I know that *t.group_upcall=NONE is supported in 1.6.   Our site
at SGI did not have its 1.6.7.2 Lustre configured that way.   The SGI
site was using the default and I wanted to change it after upgrading
to 1.8.4.   I apologize if I was not clear.
>
>> the 1.6.7.2 tuning parameters without issue and I could add to the
>> parameters under 1.8.4 without issue, if I tried to change a parameter
>> requiring that I use the --writeconf option, I learned I had to change
>> all the parameters from the 1.6.7.2 syntax to the 1.8.4 syntax.
>> (EXAMPLE:   "failover.node" string became "failnode")    Okay.  This I
>> can do,   BUT...
>>
>> If I have to revert to 1.6.7.2 (due to a security flaw in the linux
>> kernel or something...) am I correct in assuming that the lustre 1.8.4
>> parameter strings would not be understood by the 1.6.7.2 lustre system
>> (can't have s/w reading into the future, right?   Smile)?   If that is
>
> To be clear, lustre 1.8 and 1.6 use the same string format. 1.8 just supports
> some additional parameters introduced for to the new features (e.g. OST 
> pools).
> Unknown params are supposed to be ignored when downgrading. While it works
> fine with most of the new params (like OST pools), there is unfortunately a
> bug (i.e. it does not work with at_max), see bug 20449.

Okay... I was getting errors when I attempted to use --erase-params
and --writeconf in 1.8.4 stating that my 1.6.7.2 parameters would have
to be updated (again, my "failover.node" string becomes "failnode"
string).  Just my personal experience so far...

Thank you,
MLarko
>
> Cheers,
> Johann
>
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] upgrade to 1.8.4 and test fallback to 1.6.7.2

2010-09-27 Thread Ms. Megan Larko
Greetings!

I performed an upgrade of our Lustre file system and corresponding
SuSE kernel.   The Lustre upgrade went well.  After the upgrade from
1.6.7.2 to 1.8.4 I was able to mount my Lustre volumes.   The data was
fine.   The quotas started on mount via the lustre params on the MDT
and OSTs.

I attempted to add a new lustre tuning parameter to my system (I
wanted to add the *t.group_upcall=NONE.)   While the Lustre 1.8.4 read
the 1.6.7.2 tuning parameters without issue and I could add to the
parameters under 1.8.4 without issue, if I tried to change a parameter
requiring that I use the --writeconf option, I learned I had to change
all the parameters from the 1.6.7.2 syntax to the 1.8.4 syntax.
(EXAMPLE:   "failover.node" string became "failnode").   Okay.  This I
can do,   BUT...
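
For the record, the sequence involved is roughly the following (a sketch
only; device name, NID and parameters are illustrative, and --erase-params
means every parameter you still want has to be respecified):

umount /mnt/mdt    # the target must not be mounted while rewriting parameters
tunefs.lustre --erase-params --param="mdt.quota_type=ug" \
    --failnode=10.0.0.2@tcp0 --writeconf /dev/mapper/mdt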

If I have to revert to 1.6.7.2 (due to a security flaw in the linux
kernel or something...) am I correct in assuming that the lustre 1.8.4
parameter strings would not be understood by the 1.6.7.2 lustre system
(can't have s/w reading into the future, right?   Smile)?   If that is
the case, then I should not change my lustre 1.6.7.2 parameters into
the newer 1.8.4 strings until I am certain I won't have to revert
back, or be prepared to do a full --writeconf of all *Ts to use the
older 1.6.7.2 strings after going back to the previous kernel
containing the lustre 1.6.7.2 files.

Am I correct on this understanding?

Thanks,
Megan Larko

SGI Federal
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] quick question on enabling quotas

2010-09-13 Thread Ms. Megan Larko
Hi,

Have you placed the parameters on the MDT and each and every OST yet?
For my system (/dev/mapper devices), I used...


The mdt.quota_type parameter to the MDT disk.
tunefs.lustre --param="mdt.quota_type=ug" /dev/mapper/mdt

tunefs.lustre --param="ost.quota_type=ug" /dev/mapper/ost

...for all OSTs.This made lustre quotas persistent on my
lustre-1.6.7.2 system.
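
After the tunefs parameters are in place, the quota files still need to be
built and limits set from a client, along the lines of (the user name and
the limits, in KB and inodes, are made up):

lfs quotacheck -ug /mnt/lustre
lfs setquota -u someuser -b 0 -B 104857600 -i 0 -I 100000 /mnt/lustre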

Cheers,
megan
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] More detail regarding soft lockup error

2010-08-19 Thread Ms. Megan Larko
I will add emphasis here.  Backup the MDT before doing anything at
all.   The MDT backup procedure is short and documented in the Lustre
Manual.

Megan
(MDT back-up saved my bacon)
Larko


-
Kevin said:

Message: 3
Date: Thu, 19 Aug 2010 10:58:36 -0600
From: Kevin Van Maren 
Subject: Re: [Lustre-discuss] More detail regarding soft lockup error
To: "Brian J. Murrell" 
Cc: lustre-discuss@lists.lustre.org
Message-ID: <4c6d62bc.1060...@oracle.com>
Content-Type: text/plain; charset=UTF-8; format=flowed

Andreas _always_ recommends a backup first.

Kevin


Brian J. Murrell wrote:
> On Thu, 2010-08-19 at 10:09 -0600, Andreas Dilger wrote:
>
>> If you increase the size of the MDT (via resize2fs) it will increase the 
>> number of inodes as well.
>>
>
> Andreas: what is [y]our confidence level with resize2fs and our MDT?
> Given that I don't think we regularly (if at all) test this in our QA
> cycles (although I wish we would) I personally would be a lot more
> comfortable with a backup first.  What are your thoughts?  Unnecessary?
>
> b.
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] Understanding OST recovery_duration info

2010-08-12 Thread Ms. Megan Larko
Hello,

I am looking at the status of my running Lustre 1.6.7.2_3 system
(upgrade to 1.8.4 within weeks; impetus to further my education).

The default timeout value for Lustre is 100 sec.   The default
recovery time is 2x timeout value.   So I believe our site should have
a recovery of basically 200 sec.  There are a total of 175 OSTs
mounted on approximately 60 OSSes.  Because of a hard power failure to
the facility (the power went out AND the battery backup completely
failed AND the generator was flakey)  the linux 2.6.16.60-0.42.9
SLES10SP3 system was booted from a no-power state.

Lustre worked and the file system recovered just fine.  For education,
the value for "recovery_duration" in /proc/fs/lustre/obdfilter/{ost
name}/recovery_status file is between 300 and 600.   Does this mean
that the actual recovery took between 300 and 600 seconds to
successfully complete?If yes, should the Lustre timeout default
value be higher?Is all of this moot under Lustre 1.8.4 and
adaptive timeouts?

I appreciate the time taken to enlighten me.   Smile!

Cheers!
M Larko
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] mdt backup tar --xattrs question

2010-07-20 Thread Ms. Megan Larko
Greetings Group!

I hope this will be an easy one.   To conserve steps in backing up the
metadata extended attributes of a Lustre mdt, I am looking at using a
newer version of tar combined with its --xattrs option.   (Note:
Previously I have used the mdt two-step back-up from the Lustre Manual
and it has been successful.)   If I can backup the extended attributes
via tar so that I don't have to issue both a getfattr and then a tar
command it would be convenient.
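
For reference, the two-step sequence I have been doing looks roughly like
this (from memory; the device and paths are illustrative):

mount -t ldiskfs /dev/mdtdev /mnt/mdt_snapshot
cd /mnt/mdt_snapshot
getfattr -R -d -m '.*' -e hex -P . > /backup/ea.bak
tar czf /backup/mdt_backup.tgz --sparse .
cd / && umount /mnt/mdt_snapshot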

I have GNU tar-1.23.  I was trying a version of the command indicated
in http://lists.lustre.org/pipermail/lustre-discuss/2009-June/010794.html
which supplied the --xattrs argument to the tar command.   It failed
with "Unrecognized option --xattrs". Is there something I need to
specify to get tar to understand it is to back up extended attributes?

Will this version of tar correctly obtain the extended attributes or
should I be using the lustre-tar tool instead?

TIA,
megan

SGI Federal
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] default quotas

2010-06-28 Thread Ms. Megan Larko
Hi Brian,

Well,  Lustre, or any other file system cannot make assumptions about
a 'basic default' quota because disk space quotas are so dependent
upon the size of the disk mount point and the number of users sharing
the pool---so I am assuming that is not what you mean.

If you mean setting a site-wide default quota value, then yes, I'm
following here.   The quota system on the ext3 file system does have a
-p or "preen" option that allows the operator to set quotas for one
user and then to propagate that to many other users. No, I did not
see such a tool for Lustre file system quotas.  My own usage is that I
script the initialization of quotas on Lustre filesystems.   I then
add that line to my new user procedure.  Does that better answer your
question?
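
In case it is useful, the script in question is nothing fancier than a loop
like this (user names and limits are made up):

for u in alice bob carol; do
    lfs setquota -u "$u" -b 0 -B 524288000 -i 0 -I 500000 /mnt/lustre
done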

If anyone on the List has better ideas, please do share them with us.

megan

On Mon, Jun 28, 2010 at 1:57 AM, Andrus, Brian Contractor
 wrote:
> Megan,
>
> Yes, I have read through that and followed it all, but it does not tell HOW
> to set the default quotas. All the examples are for a specific user.
>
> Brian
> ____
> From: Ms. Megan Larko [mailto:dobsonu...@gmail.com]
> Sent: Sun 6/27/2010 9:27 PM
> To: Andrus, Brian Contractor
> Cc: Lustre User Discussion Mailing List
> Subject: default quotas
>
> Hi,
>
> I've been reading about how to set-up quota allocations in the Lustre
> Operations Manual 1.8.x   Section 9.1.3.
>
> It is available on-line (although it has moved about on URL names recently).
>
> Cheers!
> megan
>
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] default quotas

2010-06-27 Thread Ms. Megan Larko
Hi,

I've been reading about how to set-up quota allocations in the Lustre
Operations Manual 1.8.x   Section 9.1.3.

It is available on-line (although it has moved about on URL names recently).

Cheers!
megan
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] Lustre quotas using a journaled quota

2010-06-25 Thread Ms. Megan Larko
Howdy!

How can I tell if an existing Lustre file system (either 1.6.7.2 or
1.8.3) is using a journaled quota system?

I have been reading that if a Lustre file system is set-up using
journaled quotas then the command "lfs quotacheck -ug /mnt/lustre"
could be avoided following an unclean (crash) unmount of Lustre.  Is
this true for 1.8.x?   Is journaling the quota information analogous
to providing an external journal for a file system in that I need only
to specify the flag and point to which the quota journal should be
written?  What are the downsides to an external journal for quota
information in Lustre?

Note:  My questions stem from reading the Lustre 1.8 Operations Manual
section 9.1.1

TIA!
megan

SGI Federal
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] File access quetsion

2010-06-24 Thread Ms. Megan Larko
Ooops!   Apologies to all, especially Robin.  My bad.

megan

On Thu, Jun 24, 2010 at 11:21 AM, Kevin Van Maren
 wrote:
> Except that in this case, Robin is a "he".
>
>
> Ms. Megan Larko wrote:
>>
>> Hi,
>>
>> I'm following-up on my own question on 0100 file access as initially
>> appeared in Lustre-discuss Digest, Vol 53, Issue 3.
>>
>> Having a legitimate need for some execute-only programs to run on the
>> cluster system, we noted the behavior that a file 0100 did execute on
>> a non-Lustre (ext3) file system but did not execute (permission
>> denied) on a Lustre 1.6.7.2_3-2.6.2.16.60_0.42.9  (SLES10SP2 kernel
>> 2.6.16.60-0.42.9-smp).     Robin Humble filed bug number 22376 on 16
>> March 2010 regarding Lustre 1.8.2 for the same behavior.   NOTE:
>> RHumble comment #2 stated that the "permission denied" error did not
>> occur for her on her 2.6.18-128 kernel with Lustre version 1.6.7.2!
>>
>> The Lustre Bugzilla for 22376 reports that the issue is fixed in 1.8.3
>> and also in the initial 2.0.0 versions of Lustre.  As RHumble did not
>> have the same behavior in Lustre 1.6..7.2, there were no references to
>> any changes to that line.
>>
>> More reason to upgrade the Lustre system.
>>
>> Cheers!
>> megan
>> ___
>> Lustre-discuss mailing list
>> Lustre-discuss@lists.lustre.org
>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>>
>
>
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] File access quetsion

2010-06-24 Thread Ms. Megan Larko
Hi,

I'm following-up on my own question on 0100 file access as initially
appeared in Lustre-discuss Digest, Vol 53, Issue 3.

Having a legitimate need for some execute-only programs to run on the
cluster system, we noted the behavior that a file 0100 did execute on
a non-Lustre (ext3) file system but did not execute (permission
denied) on a Lustre 1.6.7.2_3-2.6.2.16.60_0.42.9  (SLES10SP2 kernel
2.6.16.60-0.42.9-smp). Robin Humble filed bug number 22376 on 16
March 2010 regarding Lustre 1.8.2 for the same behavior.   NOTE:
RHumble comment #2 stated that the "permission denied" error did not
occur for her on her 2.6.18-128 kernel with Lustre version 1.6.7.2!

The Lustre Bugzilla for 22376 reports that the issue is fixed in 1.8.3
and also in the initial 2.0.0 versions of Lustre.  As RHumble did not
have the same behavior in Lustre 1.6.7.2, there were no references to
any changes to that line.

More reason to upgrade the Lustre system.

Cheers!
megan
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] SLES11, lustre 1.82 with lvm and multipathing problems

2010-06-18 Thread Ms. Megan Larko
Hi List!

Matthias wrote:
>Hi Atul,
>thanks for your reply.
>Now we use multipathing without LVM and it works fine.
>So the problem comes from LVM.
>
>Cheers,
>Matthias

I am very interested in hearing more details about this.  There is a
cluster which I wish to upgrade/clean-install to version 1.8.3 or
1.8.4 (currently 1.6.7), and I would like to use both LVM and
multipathing.  The current set-up uses multipathing but not LVM.
I was hoping to introduce LVM into the new build to enable more
efficient snapshot backups and expandability on the MDT disks.
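
What I have in mind for the MDT backup piece is roughly the sketch
below (the volume group and device names are made up, and I have not
yet run this against a live Lustre MDT):

# take an LVM snapshot of the MDT logical volume
# (ideally while the MDT is quiet or unmounted)
lvcreate -s -L 20G -n mdt_snap /dev/vg_mdt/mdt_lv

# copy the snapshot off as a device-level image, then drop it
dd if=/dev/vg_mdt/mdt_snap of=/backup/mdt_snap.img bs=1M
lvremove -f /dev/vg_mdt/mdt_snap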

Do you know what the conflicts are between multipathing and LVM?
Was the case in point multipathing and LVM on MDT volumes, or on
something else?

I appreciate your kindness in sharing your experience.

Sincerely,
megan

SGI Federal
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] A Lustre documentation suggestion

2010-06-03 Thread Ms. Megan Larko
Greetings!

I have been reading the on-line documentation for Lustre 1.8  (
http://wiki.lustre.org/manual/LustreManual18_HTML/IntroductionToLustre.html
).  There are figures for which the text in the figures is not
legible to me (for example, Figure 1-3 and Figure 1-4), even if I
attempt to enlarge the browser window (Firefox) or the font size via
Ctrl +.   Could the images in the on-line Lustre manual perhaps be
made into links on which one could click to view the figure in a
larger form, hopefully with legible text?

Just a suggestion.

megan
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] CentOS 5.4 (Rocks 5.3) and Lustre 1.8.2

2010-06-03 Thread Ms. Megan Larko
Hello,

I am hoping that someone will have a more elegant answer for you but I
will share my experience.

File systems listed in a linux /etc/fstab file are mounted early in
the boot process (to get /, /home, /opt, swap).   This usually happens
before networking is started.  So if your Lustre file system mount
points are listed in /etc/fstab then the system will try to mount them
at that early, pre-networking point of startup.   Without the network,
the Lustre file systems obviously will not be able to mount.

Our quick and dirty solution was to mount Lustre via a script in
/etc/rc.d/init.d that was called with an S## number after IB networking
had started (and the script checked for network connectivity first).
At first we had the script echo the Lustre mount point lines onto the
bottom of /etc/fstab (I'm not sure why).   Eventually we just left
the Lustre mount points out of our /etc/fstab file altogether, allowing
the /etc/rc.d/init.d script for Lustre to both start and stop the
Lustre mounts in a Sys V manner (like the other init.d scripts).

The key to stopping Lustre is to make certain that there are no active
jobs (RPCs in-flight) or Lustre/LNET will resist an umount command.
In the stop section of the script we checked the exit code of umount
and, if it failed, slept and tried the umount again.   This
latter part was really a bubble gum and paper clip approach.   We did
sometimes just have to outright kill the job or even LNET to get the
Lustre file system to unmount and not just hang indefinitely.
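
A stripped-down version of the sort of script I mean (the NID, fsname
and mount point below are made up; adjust for your site) looks roughly
like this:

#!/bin/sh
# /etc/rc.d/init.d/lustre_client
# chkconfig: 345 99 01
# description: mount/umount Lustre after IB networking is up

MGS_NID="192.168.64.5@o2ib"
MNT="/mnt/crew8"
DEV="${MGS_NID}:/crew8"

case "$1" in
  start)
    # wait until the MGS answers over LNET before trying to mount
    for i in $(seq 1 30); do
      lctl ping ${MGS_NID} >/dev/null 2>&1 && break
      sleep 2
    done
    mkdir -p ${MNT}
    mount -t lustre ${DEV} ${MNT}
    ;;
  stop)
    # retry the umount a few times; it fails while RPCs are in flight
    for i in $(seq 1 10); do
      umount ${MNT} && break
      sleep 10
    done
    ;;
  *)
    echo "Usage: $0 {start|stop}"
    exit 1
    ;;
esac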

I honestly hope that there are more elegant solutions developed by
others out there who may wish to share.

Megan
SGI Federal
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] File access question

2010-06-02 Thread Ms. Megan Larko
Hello,

I am trying to understand the way in which a file on a lustre (1.6.7)
file system is accessed.

I have a legitimate need to have an executable file execute-only; no
read permission at all.  Testing on ext3 I can do this by "chmod 110
a.out".   A user in the group is able to successfully execute the
file.   If I attempt to do the same thing on a Lustre file system I
see the error message "Permission denied".   I can gain access by
setting the g+r.  That last setting is not permissible for this
specific file.   In trying to understand how this works I selected the
on-line Lustre 1.8.x Manual (
http://wiki.lustre.org/manual/LustreManual18_HTML/IntroductionToLustre.html
).  Figure 1.5 seemed to indicate that a read operation to get the
pointers to pass back to the client is required.   Essentially because
of the separation of metadata from the file system on which the data
file actually physically resides an "execute-only"  file on a Lustre
file system is not possible.
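
For concreteness, the test I described above is essentially this
(a.out is just a trivial compiled program; the second user is any
other member of the same group):

$ gcc -o a.out hello.c
$ chmod 110 a.out     # mode --x--x---: execute-only for owner and group
(then, logged in as the other group member, in the same directory)
$ ./a.out             # executes on ext3; "Permission denied" on Lustre 1.6.7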

Is this thinking correct?

Thank you,
Megan Larko

(now with SGI Federal)
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] Unbalanced OST--for discussion purposes

2010-03-03 Thread Ms. Megan Larko
Thanks to both Brian and Andreas for the timely responses.
Brian posed the question as to whether or not the OSTs were more or
less balanced a week ago.  The answer is that I believe that they
were.   Usually all OSTs report a similar percentage of usage (between
1%  and 3% of one another).   I believe that is why this new report
piqued my curiosity.

Regarding Andreas' remark about individual OST size, yes I understand
that having larger individual OSTs can prevent any one OST from
becoming so full that the others degrade in performance (per A.
Dilger, not B. Murrel).   For that reason I personally like the option
available in newer Lustre releases (I think 1.8.x and higher) to allow
up to 16 TB in a single OST slice.  I know the previous limit was 8 TB
per OST slice as a precaution against data corruption.   (I was able to
build a larger OST slice with 1.6.7 but I was cautioned that some data
might become unreachable and/or corrupted, as Lustre had not at that
time been modified to accept the larger partition sizes which the
underlying file systems--ext4, xfs--would accept.)    The OST
formatted size of 6.3 TB fit nicely into the JBOD scheme of
evenly-sized partitions.
Thanks,
megan

On Tue, 2010-03-02 at 15:45 -0500, Ms. Megan Larko wrote:
> Hi,

Hi,

> I logged directly into the OSS (OSS4) and just ran a df (along with a
> periodic check of the log files).  I last looked about two weeks ago
> (I know it was after 17 Feb).

Is the implication that at this point the OSTs were more or less well
balanced?

> Anyway, the OST0007 is more full than
> any of the other OSTs.  The default lustre stripe (I believe that is
> set to 1) is used.Can just one file shift the size used of one OST
> that significantly?

Sure.  As an example, if one had a 1KiB file on that OST, called, let's
say, "1K_file.dat" and one did:

$ dd if=/dev/zero of=1K_file.dat bs=1G count=1024

that would overwrite the 1KiB file on that OST with a 1TiB file.
Recognizing of course that that would be 1TiB in a single object on an
OST.

> What other reasonable explanation is there for such a
> difference on one OST in comparison with the others?

Any kind of variation on the above.
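
If you want to see which files actually have objects on that OST, you
could try something like the following from a client (the /mnt/crew8
mount point is a guess at yours, and the OST UUID assumes the usual
<fsname>-OSTxxxx_UUID naming):

# list files with at least one object on OST0007
lfs find --obd crew8-OST0007_UUID /mnt/crew8

# then look at the size and striping of any suspect file
lfs getstripe -v /mnt/crew8/path/to/suspect_file
ls -lh /mnt/crew8/path/to/suspect_file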

> Could this cause
> a lustre performance hit at this point?

Not really.

b.
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] Unbalanced OST--for discussion purposes

2010-03-02 Thread Ms. Megan Larko
Hi,

I have a Lustre array (version  2.6.18-53.1.13.el5_lustre.1.6.4.3smp)
which will soon be decommissioned in favor of newer hardware.
Therefore this question is mostly for my personal intellectual
curiosity.

I logged directly into the OSS (OSS4) and just ran a df (along with a
periodic check of the log files).  I last looked about two weeks ago
(I know it was after 17 Feb).   Anyway, the OST0007 is more full than
any of the other OSTs.  The default lustre stripe (I believe that is
set to 1) is used.Can just one file shift the size used of one OST
that significantly?  What other reasonable explanation is there for such a
difference on one OST in comparison with the others?  Could this cause
a lustre performance hit at this point?

   [r...@oss4 ~]# df -h
FilesystemSize  Used Avail Use% Mounted on

/dev/sdb1 6.3T  3.6T  2.5T  60% /srv/lustre/OST/crew8-OST
/dev/sdb2 6.3T  4.1T  1.9T  69% /srv/lustre/OST/crew8-OST0001
/dev/sdc1 6.3T  3.3T  2.8T  55% /srv/lustre/OST/crew8-OST0002
/dev/sdc2 6.3T  3.3T  2.7T  56% /srv/lustre/OST/crew8-OST0003
/dev/sdd1 6.3T  3.5T  2.6T  58% /srv/lustre/OST/crew8-OST0004
/dev/sdd2 6.3T  4.1T  1.9T  69% /srv/lustre/OST/crew8-OST0005
/dev/sdi1 6.3T  3.9T  2.2T  65% /srv/lustre/OST/crew8-OST0006
/dev/sdi2 6.3T  5.0T 1015G  84%
/srv/lustre/OST/crew8-OST0007 <
/dev/sdj1 6.3T  3.4T  2.7T  56% /srv/lustre/OST/crew8-OST0008
/dev/sdj2 6.3T  3.3T  2.7T  56% /srv/lustre/OST/crew8-OST0009
/dev/sdk1 6.3T  3.4T  2.7T  56% /srv/lustre/OST/crew8-OST0010
/dev/sdk2 6.3T  3.8T  2.2T  64% /srv/lustre/OST/crew8-OST0011

Still learning
megan
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] lctl command usage

2010-01-29 Thread Ms. Megan Larko
Thank you Johann!

The key line seems to be "net o2ib" and *not* net ib0.  Thanks for
pointing out my error.
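
For the archives: the network selection only seems to stick within a
single lctl session, so on our IB-only setup the following sequence
works (o2ib being the LND name rather than the ib0 interface name):

[r...@mds1 ~]# lctl
lctl > network o2ib
lctl > interface_list
lctl > peer_list
lctl > conn_list
lctl > quit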

megan

On Fri, Jan 29, 2010 at 1:00 PM, Johann Lombardi  wrote:
> On Fri, Jan 29, 2010 at 12:30:51PM -0500, Ms. Megan Larko wrote:
>> that "network" was not run.  Am I missing something in lctl command
>> usage?
>
> # lctl
> lctl > net up
> LNET configured
> lctl > list_nids
> 10.8.0@tcp
> lctl > conn_list
> You must run the 'network' command before 'conn_list'.
> lctl > net tcp
> lctl > conn_list
> 12345-10.8.0@tcp I[2]sata17->sata18:1014 16384/654368 nonagle
> 12345-10.8.0@tcp O[1]sata17->sata18:1015 66232/87380 nonagle
> 12345-10.8.0@tcp C[0]sata17->sata18:1016 16384/87380 nonagle
> 12345-10.8.0@tcp I[0]sata17->sfire10:1020 16384/87380 nonagle
> 12345-10.8.0@tcp O[3]sata17->sfire10:1022 58440/87380 nonagle
> 12345-10.8.0@tcp C[1]sata17->sfire10:1023 16384/4194304 nonagle
> 12345-10.8.0@tcp I[0]sata17->sfire11:1014 16384/87380 nonagle
> 12345-10.8.0@tcp O[3]sata17->sfire11:1015 16384/87380 nonagle
> 12345-10.8.0@tcp C[2]sata17->sfire11:1016 58440/3246672 nonagle
>
> I don't have any ib cards on this node, but you can do the same with
> "o2ib" instead of "tcp".
>
> HTH
>
> Johann
>
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] lctl command usage

2010-01-29 Thread Ms. Megan Larko
Greetings,

I am still working on tracking my lustre timeout/reconnect issue.
The lustre version on the OSS and MGS is
2.6.18-53.1.13.el5_lustre.1.6.4.3smp.   I am seeing IMP_INVALID
messages in my log files.   I think (not certain) that I have a bad IB
cable or port in a card, but I am trying to ascertain that it is not
perhaps another issue (such as a dnsmasq GUID).  I am seeking
information using the "lctl" command.  The only network on which we
use lustre is Infiniband (ib0).   The Gb TCP network is not in the
initial tuneconf file system creation.

Am I not invoking lctl properly?  (I am using the syntax in the man
page.)  The network is up.  LNET is running.  My "lctl ping
xxx.yyy.zzz.aaa@ib0" return nicely.   I am trying to gather
information about my network and my lctl commands are informing me
that I need to run the network command before inquiring about
"interface_list".   I do run the network command and I am as yet
unable to get peer_list or conn_list information as lctl indicates
that "network" was not run.  Am I missing something in lctl command
usage?

Thanks,
megan


[r...@mds1 ~]# lctl network ib0 up
Can't parse net ib0
[r...@mds1 ~]# lctl interface_list
You must run the 'network' command before 'interface_list'.
[r...@mds1 ~]# lctl network
usage: network |up|down
[r...@mds1 ~]# lctl network up
LNET configured
[r...@mds1 ~]# lctl interface_list
You must run the 'network' command before 'interface_list'.
[r...@mds1 ~]# lctl dl
  1 UP mgc mgc192.168.64@o2ib c7135d07-19c5-abe2-2ca3-976185b80dde 5
  2 UP mdt MDS MDS_uuid 3
  8 UP lov crew8-mdtlov crew8-mdtlov_UUID 4
  9 UP mds crew8-MDT crew8-MDT_UUID 13
 10 UP osc crew8-OST-osc crew8-mdtlov_UUID 5
 11 UP osc crew8-OST0001-osc crew8-mdtlov_UUID 5
 12 UP osc crew8-OST0002-osc crew8-mdtlov_UUID 5
 13 UP osc crew8-OST0003-osc crew8-mdtlov_UUID 5
 14 UP osc crew8-OST0004-osc crew8-mdtlov_UUID 5
 15 UP osc crew8-OST0005-osc crew8-mdtlov_UUID 5
 16 UP osc crew8-OST0006-osc crew8-mdtlov_UUID 5
 17 UP osc crew8-OST0007-osc crew8-mdtlov_UUID 5
 18 UP osc crew8-OST0008-osc crew8-mdtlov_UUID 5
 19 UP osc crew8-OST0009-osc crew8-mdtlov_UUID 5
 20 UP osc crew8-OST000a-osc crew8-mdtlov_UUID 5
 21 UP osc crew8-OST000b-osc crew8-mdtlov_UUID 5
[r...@mds1 ~]# lctl interface_list
You must run the 'network' command before 'interface_list'.
[r...@mds1 ~]# lctl conn_list
You must run the 'network' command before 'conn_list'.
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


  1   2   >