Re: [lustre-discuss] Corrupted? MDT not mounting

2022-05-10 Thread Andrew Elwell via lustre-discuss
On Wed, 11 May 2022 at 04:37, Laura Hild  wrote:
> The non-dummy SRP module is in the kmod-srp package, which isn't included in 
> the Lustre repository...

Thanks Laura,
Yeah, I realised that earlier in the week, and have rebuilt the srp
module from source via mlnxofedinstall, and sure enough installing
srp-4.9-OFED.4.9.4.1.6.1.kver.3.10.0_1160.49.1.el7_lustre.x86_64.x86_64.rpm
(gotta love those short names) gives me working srp again.

Hat tip to a DDN contact here (we owe him even more beers now) for
some extra tuning parameters:

  options ib_srp cmd_sg_entries=255 indirect_sg_entries=2048 allow_ext_sg=1 ch_count=1 use_imm_data=0

I'm pleased to say that it _seems_ to be working much better. I'd
done one half of the HA pairs earlier in the week: lfsck completed,
and a full robinhood scan is done (we dropped the DB and rescanned
from fresh). I'm just bringing the other half of the pairs up to the
same software stack now.
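
To make those parameters stick across reboots, a minimal sketch
(assuming the standard modprobe.d mechanism; the filename is my
choice):

  # /etc/modprobe.d/ib_srp.conf -- applied the next time ib_srp loads
  options ib_srp cmd_sg_entries=255 indirect_sg_entries=2048 allow_ext_sg=1 ch_count=1 use_imm_data=0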

A couple of pointers for anyone caught in the same boat - things we
apparently did correctly:
* upgrade your e2fsprogs to the latest - if you're fsck'ing disks,
make sure you're not introducing more problems with a buggy old e2fsck
* tunefs.lustre --writeconf isn't too destructive (see the warnings;
you'll lose pool info, but in our case that wasn't critical) - there's
a sketch of the procedure after this list
* monitoring is good, but tbh the rate of change, and the fact that it
happened out of hours, means we likely couldn't have intervened
* so quotas are better.
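
For reference, a rough sketch of the writeconf procedure (the device
name and mount point here are illustrative, not our real ones; the
Lustre manual has the full caveats):

  # unmount every target in the filesystem first, then on each server:
  tunefs.lustre --writeconf /dev/mapper/mdt0   # regenerate the config logs
  # remount in order: MGS first, then MDT(s), then OSTs
  mount -t lustre /dev/mapper/mdt0 /mnt/mdt0

Remember the pool definitions are gone afterwards, so recreate them
with lctl pool_new / lctl pool_add if you need them.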

Thanks to those who replied on and off-list - I'm just grateful we
only had the pair of MDTs, not the 40 (!!!) that Origin's getting
(yeah, I was watching the LUG talk last night) - service isn't quite
back to users but we're getting there!

Andrew


[lustre-discuss] Lustre File Mirroring Questions

2022-05-10 Thread Brian Lambrigger
Hello Lustre-discuss,

While testing file mirroring from the FLR initiative, I created the
following setup:

1) Two nodes with an OST each: NodeA has the MGS, MDT, and OST0; NodeB has OST1.
2) Created two pools on the DFS, with OST0 in pool0 and OST1 in pool1.
3) Created a mirrored layout for the DFS root directory, so that child
files will inherit the layout (a sketch of the commands follows this list).
4) Created a file in the directory, which inherits the mirroring.
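
Concretely, commands along these lines produce that setup (the mount
point /mnt/flrtest and the exact OST names are placeholders):

  # on the MGS node:
  lctl pool_new flrtest.pool0
  lctl pool_add flrtest.pool0 flrtest-OST0000
  lctl pool_new flrtest.pool1
  lctl pool_add flrtest.pool1 flrtest-OST0001
  # default mirrored layout on the filesystem root; new files inherit it
  lfs mirror create -N -p pool0 -N -p pool1 /mnt/flrtest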


That worked well, but I had some questions as to the capabilities of the
mirrored layout:

1) With the file synced on both OSTs, I downed OST0. I was able to
read the replica on OST1, but any writes would hang as the client
attempted to contact OST0. Is this expected behavior, or do I need to
change parameters for the devices with tunefs in order to have writes
fail over to OST1?
2) It seems that resync must sync the entire file upon modification, even
if it's striped into smaller blocks and only one of those stripes is
modified. Is that correct? (My resync steps are sketched below.)
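
For question 2, what I mean by resync, as a sketch (the file path is
a placeholder):

  lfs getstripe /mnt/flrtest/file       # stale mirror components show lcme_flags: stale
  lfs mirror resync /mnt/flrtest/file   # bring stale mirrors back in sync
  lfs mirror verify /mnt/flrtest/file   # compare the mirror copies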


Here's my current OST setup as reported by tunefs:

Read previous values:
Target:     flrtest-OST
Index:      0
Lustre FS:  flrtest
Mount type: zfs
Flags:      0x2
              (OST )
Persistent mount opts:
Parameters: mgsnode=192.168.7.60@tcp1 failover.mode=failout


Permanent disk data:

Target: flrtest-OST
Index:  0
Lustre FS:  flrtest
Mount type: zfs
Flags:  0x2
  (OST )
Persistent mount opts:
Parameters: mgsnode=192.168.7.60@tcp1 failover.mode=failout
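
On question 1: if changing the failover behavior is the answer, this
is the sort of tunefs invocation I'd expect to need (the zfs dataset
name ostpool/ost0 is a placeholder):

  # on NodeA, with the OST unmounted:
  tunefs.lustre --param failover.mode=failover ostpool/ost0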


I'm a newcomer to Lustre, so thanks so much for your time and expertise.

-- 
*Brian Lambrigger* | Platform Developer
10602 Virginia Avenue | Culver City, CA 90232
O: (310) 659-8999 | M: (xxx) xxx-


Re: [lustre-discuss] Corrupted? MDT not mounting

2022-05-10 Thread Laura Hild via lustre-discuss
Hi Andrew-

The non-dummy SRP module is in the kmod-srp package, which isn't included in 
the Lustre repository.  I'm less certain than I'd like to be, as ours is a DKMS 
setup rather than kmod, and the last time I had an SRP setup was a couple years 
ago, but I suspect you may have success if you fetch the full MLNX_OFED from

  
https://content.mellanox.com/ofed/MLNX_OFED-4.9-4.1.7.0/MLNX_OFED_LINUX-4.9-4.1.7.0-rhel7.9-x86_64.tgz

and rebuild it for the _lustre kernel (mlnxofedinstall --add-kernel-support 
--kmp).  When I do that, I get modules that load successfully into the kernel 
with the kmods from the Lustre repository.
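
In case it helps, the full recipe, roughly (I'm going from memory on
the kernel-targeting flags, so double-check mlnxofedinstall --help
for your version; the kernel version is taken from the rpm name in
Andrew's message):

  tar xzf MLNX_OFED_LINUX-4.9-4.1.7.0-rhel7.9-x86_64.tgz
  cd MLNX_OFED_LINUX-4.9-4.1.7.0-rhel7.9-x86_64
  # rebuild the KMPs against the _lustre kernel; --kernel/--kernel-sources
  # should only be needed when that isn't the running kernel
  ./mlnxofedinstall --add-kernel-support --kmp \
      --kernel 3.10.0-1160.49.1.el7_lustre.x86_64 \
      --kernel-sources /usr/src/kernels/3.10.0-1160.49.1.el7_lustre.x86_64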

-Laura
