Re: [lustre-discuss] lustre-discuss Digest, Vol 223, Issue 7
Thanks for the response. I used just the defaults on my initial attempt, but yes, I was using o2ib as this is what is implemented on all the physical servers. If I need to use a different module as you indicate, how would I do that? Via /etc/modprobe.d/lnet.conf or in another file?

Regards
Sid Young
W: https://off-grid-engineering.com

> From: Michael DiDomenico
> Date: Fri, 25 Oct 2024 12:47:07 -0400
> Subject: Re: [lustre-discuss] Lustre 2.15.5 in a Virtual Machine
>
> lustre in a vm certainly works as i have many running under vmware and
> mounting lustre
>
> but i'm a little confused on your message. are you trying to bind the
> lustre client via infiniband or tcp/ip? if the latter (assumed based
> on the ens nic prefix), you need to use the ksocklnd not the kiblnd
> module
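Since the VM NIC here is a plain (virtualised) Ethernet device rather than a verbs-capable one, switching LNet to the socklnd (tcp) driver is what the suggestion above amounts to. A minimal sketch, assuming the ens224 interface name from the original post (adjust for your VM), either statically or via lnetctl:

# /etc/modprobe.d/lnet.conf
options lnet networks="tcp(ens224)"

# or configure it dynamically and save it for the lnet service
lnetctl lnet configure
lnetctl net add --net tcp --if ens224
lnetctl export --backup > /etc/lnet.conf
systemctl enable lnet

Note the servers would also need to be reachable on a tcp NID (or via an LNet router) for a tcp-only client to mount anything.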
[lustre-discuss] Lustre 2.15.5 in a Virtual Machine
G'Day all,

I'm trying to get Lustre to bind to a 100G Mellanox card shared between VMs, but it fails with the following errors in dmesg:

[  406.474952] Lustre: Lustre: Build Version: 2.15.5
[  406.604652] LNetError: 92384:0:(o2iblnd.c:2838:kiblnd_dev_failover()) Failed to bind ens224:10.140.93.72 to device(): -19
[  406.604704] LNetError: 92384:0:(o2iblnd.c:3355:kiblnd_startup()) ko2iblnd: Can't initialize device: rc = -19
[  407.655888] LNetError: 105-4: Error -100 starting up LNI o2ib
[  407.656729] LustreError: 92384:0:(events.c:639:ptlrpc_init_portals()) network initialisation failed
[  559.741846] LNetError: 92993:0:(lib-move.c:2255:lnet_handle_find_routed_path()) peer 10.140.93.42@o2ib has no available nets
[  594.480161] LNetError: 93225:0:(o2iblnd.c:2838:kiblnd_dev_failover()) Failed to bind ens224:10.140.93.72 to device(): -19
[  594.480213] LNetError: 93225:0:(o2iblnd.c:3355:kiblnd_startup()) ko2iblnd: Can't initialize device: rc = -19
[  595.498493] LNetError: 105-4: Error -100 starting up LNI o2ib
[  707.825127] LNetError: 93691:0:(o2iblnd.c:2838:kiblnd_dev_failover()) Failed to bind ens224:10.140.93.72 to device(): -19
[  707.825182] LNetError: 93691:0:(o2iblnd.c:3355:kiblnd_startup()) ko2iblnd: Can't initialize device: rc = -19
[  708.843933] LNetError: 105-4: Error -100 starting up LNI o2ib
[  789.779769] LNetError: 93930:0:(o2iblnd.c:2838:kiblnd_dev_failover()) Failed to bind ens224:10.140.93.72 to device(): -19
[  789.779820] LNetError: 93930:0:(o2iblnd.c:3355:kiblnd_startup()) ko2iblnd: Can't initialize device: rc = -19
[  790.828974] LNetError: 105-4: Error -100 starting up LNI o2ib
[root@hpc-vm-02 2.15.5]#

The VM has two network interfaces, ens192 and ens224; both are operational with TCP traffic.

/etc/modprobe.d/lnet.conf:
options lnet networks="o2ib(ens224) 10.140.93.*"

[root@hpc-vm-02 2.15.5]# lnetctl net add --net o2ib --if ens224
add:
    - net:
          errno: -100
          descr: "cannot add network: Network is down"
[root@hpc-vm-02 2.15.5]#

Any ideas where I might look? Are virtual machines even supported with Lustre? The host is VMware 7U3 on an HP DL385 with 256 cores and 512GB RAM.

Sid Young
W: https://off-grid-engineering.com
[lustre-discuss] Migrating to new OSTs
G'Day all,

I'm in the process of scoping up more HPC storage on newer hardware, and I am looking to deploy a bunch of new OSTs (JBODs with ZFS) and then phase out the older OSTs on the older hardware. Is there a comprehensive guide to doing this? I've found many different ways to migrate files to a new OST, but I also need the steps for adding the new OSTs to the MDS (I have /home and /lustre as 2 pools).

Sid Young
Translational Research Institute
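A minimal sketch of the usual "add, drain, retire" flow, assuming a file system named lustre and an old OST at index 0 (the names, indices and NID are illustrative, not taken from this thread). New OSTs are formatted against the existing MGS and mounted; old OSTs are stopped from receiving new objects and then emptied with lfs_migrate:

# on the new OSS: format and mount a new ZFS-backed OST (index must be unused)
mkfs.lustre --ost --backfstype=zfs --fsname=lustre --index=10 \
    --mgsnode=<mgs-nid> newpool/ost10
mount -t lustre newpool/ost10 /mnt/lustre-ost10

# on the MDS: stop new object creation on the old OST
lctl set_param osp.lustre-OST0000-osc-MDT0000.max_create_count=0

# on a client: move existing objects off the old OST
lfs find /lustre --ost lustre-OST0000_UUID | lfs_migrate -y

Once the old OST is empty it can be permanently deactivated (e.g. lctl conf_param lustre-OST0000.osc.active=0 on the MGS) and unmounted; verify the exact steps against the "Removing an OST" section of the Lustre manual for your release.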
[lustre-discuss] tunefs.lustre safe way to get config
G'Day all,

I need to review the IPs assigned during the initial mkfs.lustre on ten ZFS-based OSTs and two ZFS-backed MDTs.

The ZFS datasets are:
osthome0/ost0, osthome1/ost1, osthome2/ost2, osthome3/ost3,
ostlustre0/ost0, ostlustre1/ost1, ostlustre2/ost2, ostlustre3/ost3, ostlustre4/ost4, ostlustre5/ost5
and mdsthome/home, mdtlustre/lustre.

A few questions: Is it safe to use tunefs.lustre on the running system to read back the parameters only, or do I have to shut everything down and read from the unmounted filesystems? Are these the correct commands to use for the MDTs?

tunefs.lustre --print mdthome/home
tunefs.lustre --print mdtlustre/lustre

Sid Young
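For ZFS-backed targets, the configuration written at mkfs.lustre time is also stored as ZFS user properties on the dataset, so it can be read back without modifying anything. A hedged sketch using the dataset names above (read-only queries):

# safe whether or not the target is mounted
zfs get all mdtlustre/lustre | grep lustre:
zfs get all osthome0/ost0   | grep lustre:

This typically shows properties such as lustre:svname, lustre:mgsnode and lustre:failover.node, which is where the NIDs given at format time end up.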
[lustre-discuss] Lustre crash and now lockup on ls -la /lustre
Hi all,

I've been running Lustre 2.12.6 (clients are 2.12.7) on HP gear for nearly 2 years and had an odd crash requiring a reboot of all nodes. I have Lustre /home and /lustre file systems and I've been able to remount them on the clients after restarting the MGS/MDT and OSS nodes, but on any client, an ls -la on the /lustre file system locks solid. /home appears to be OK for the directories and sub-directories I tested. I am very rusty on Lustre now, but I logged into another node and ran the following:

[root@n04 ~]# lfs check osts
home-OST-osc-9f3b26547800 active.
home-OST0001-osc-9f3b26547800 active.
home-OST0002-osc-9f3b26547800 active.
home-OST0003-osc-9f3b26547800 active.
lustre-OST-osc-9efd1e392800 active.
lustre-OST0001-osc-9efd1e392800 active.
lustre-OST0002-osc-9efd1e392800 active.
lustre-OST0003-osc-9efd1e392800 active.
lustre-OST0004-osc-9efd1e392800 active.
lustre-OST0005-osc-9efd1e392800 active.

[root@n04 ~]# lfs check mds
home-MDT-mdc-9f3b26547800 active.
lustre-MDT-mdc-9efd1e392800 active.

[root@n04 ~]# lfs check servers
home-OST-osc-9f3b26547800 active.
home-OST0001-osc-9f3b26547800 active.
home-OST0002-osc-9f3b26547800 active.
home-OST0003-osc-9f3b26547800 active.
lustre-OST-osc-9efd1e392800 active.
lustre-OST0001-osc-9efd1e392800 active.
lustre-OST0002-osc-9efd1e392800 active.
lustre-OST0003-osc-9efd1e392800 active.
lustre-OST0004-osc-9efd1e392800 active.
lustre-OST0005-osc-9efd1e392800 active.
home-MDT-mdc-9f3b26547800 active.
lustre-MDT-mdc-9efd1e392800 active.
[root@n04 ~]#

[root@n04 ~]# lfs df -h
UUID                   bytes    Used  Available  Use%  Mounted on
home-MDT_UUID           4.2T  217.5G       4.0T    6%  /home[MDT:0]
home-OST_UUID          47.6T   42.5T       5.1T   90%  /home[OST:0]
home-OST0001_UUID      47.6T   44.6T       2.9T   94%  /home[OST:1]
home-OST0002_UUID      47.6T   41.9T       5.7T   88%  /home[OST:2]
home-OST0003_UUID      47.6T   42.2T       5.4T   89%  /home[OST:3]
filesystem_summary:   190.4T  171.2T      19.1T   90%  /home

UUID                   bytes    Used  Available  Use%  Mounted on
lustre-MDT_UUID         5.0T   53.8G       4.9T    2%  /lustre[MDT:0]
lustre-OST_UUID        47.6T   42.3T       5.3T   89%  /lustre[OST:0]
lustre-OST0001_UUID    47.6T   41.8T       5.8T   88%  /lustre[OST:1]
lustre-OST0002_UUID    47.6T   41.3T       6.3T   87%  /lustre[OST:2]
lustre-OST0003_UUID    47.6T   42.3T       5.3T   89%  /lustre[OST:3]
lustre-OST0004_UUID    47.6T   43.7T       3.9T   92%  /lustre[OST:4]
lustre-OST0005_UUID    47.6T   40.1T       7.4T   85%  /lustre[OST:5]
filesystem_summary:   285.5T  251.5T      34.0T   89%  /lustre
[root@n04 ~]#

Is it worth remounting everything and hoping crash recovery works, or are there specific checks I can make?

Sid Young
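When one directory tree hangs while lfs check reports everything active, the next thing usually worth looking at is the client's import state and any evictions or stuck RPCs on the client and the MDS. A hedged sketch of the kind of checks meant here (commands are generic, not from this thread):

# on the hanging client: connection state per target
lctl get_param mdc.lustre-MDT*.import | grep -E 'state|failover'
lctl get_param osc.lustre-OST*.import | grep -E 'state|inflight'

# recent evictions / timeouts
dmesg | grep -Ei 'evict|timed out|not healthy'

# dump the Lustre debug buffer while the ls is stuck
lctl dk /tmp/lustre-debug.txt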
[lustre-discuss] upgrade 2.12.6 to 2.12.7 - no lnet after reboot - SOLVED
I've managed to solve this. After checking a few nodes in the cluster, I discovered this particular node must have had a partial update, resulting in a mismatch between the kernel version (locked at the base release) and some of the kernel support packages, which were at a slightly later release, causing DKMS to not generate the required files. Normally I disable kernel updates in yum so everything stays at the same release version, and I just update packages until I'm ready for a major update cycle.

Bad node:
# yum list installed | grep kernel
abrt-addon-kerneloops.x86_64   2.1.11-60.el7.centos   @anaconda
kernel.x86_64                  3.10.0-1160.el7        @anaconda
kernel-debug-devel.x86_64      3.10.0-1160.15.2.el7   @updates
kernel-devel.x86_64            3.10.0-1160.15.2.el7   @updates
kernel-headers.x86_64          3.10.0-1160.15.2.el7   @updates
kernel-tools.x86_64            3.10.0-1160.15.2.el7   @updates
kernel-tools-libs.x86_64       3.10.0-1160.15.2.el7   @updates
#

Working node:
# yum list installed | grep kernel
abrt-addon-kerneloops.x86_64   2.1.11-60.el7.centos   @anaconda
kernel.x86_64                  3.10.0-1160.el7        @anaconda
kernel-debug-devel.x86_64      3.10.0-1160.31.1.el7   @updates
kernel-devel.x86_64            3.10.0-1160.el7        @/kernel-devel-3.10.0-1160.el7.x86_64
kernel-headers.x86_64          3.10.0-1160.el7        @anaconda
kernel-tools.x86_64            3.10.0-1160.el7        @anaconda
kernel-tools-libs.x86_64       3.10.0-1160.el7        @anaconda
#

After I removed the extraneous release packages and the Lustre packages, I updated the kernel, re-installed the kernel-headers and kernel-devel packages, then installed the (minimal) Lustre client:

# yum list installed | grep lustre
kmod-lustre-client.x86_64   2.12.7-1.el7   @/kmod-lustre-client-2.12.7-1.el7.x86_64
lustre-client.x86_64        2.12.7-1.el7   @/lustre-client-2.12.7-1.el7.x86_64
lustre-client-dkms.noarch   2.12.7-1.el7   @/lustre-client-dkms-2.12.7-1.el7.noarch
#

And all good: everything mounts and works first go, as expected :)

Sid Young
Translational Research Institute
Brisbane
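For this class of problem (kernel-devel/headers newer than the running kernel, so the DKMS build targets the wrong version), a quick way to see what DKMS actually built and to force a rebuild once the packages are consistent is something like the following. The module name/version is an assumption based on the 2.12.7 lustre-client-dkms package; check dkms status for the exact spelling on your node:

# what has been built, and for which kernels
dkms status

# rebuild the modules against the running kernel
dkms autoinstall -k $(uname -r)
# or explicitly, using the name reported by dkms status
dkms install lustre-client/2.12.7 -k $(uname -r)

# confirm modprobe will now find the modules
find /lib/modules/$(uname -r) -name 'lnet.ko*' -o -name 'lustre.ko*'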
[lustre-discuss] upgrade 2.12.6 to 2.12.7 - no lnet after reboot?
I was running 2.12.6 on an HP DL385 running standard CentOS 7.9 (3.10.0-1160.el7.x86_64) for around 6 months and decided to plan and start an upgrade cycle to 2.12.7, so I downloaded and installed the 2.12.7 CentOS release from Whamcloud using the 7.9.2009 release RPMs.

# cat /etc/centos-release
CentOS Linux release 7.9.2009 (Core)

I have tried it on one node and I now have the following error after I rebooted:

# modprobe -v lnet
modprobe: FATAL: Module lnet not found.

I suspect it's not built against the kernel, as there are 3 releases showing and there were no errors during the yum install process:

# ls -la /usr/lib/modules
drwxr-xr-x. 3 root root 4096 Mar 18  2021 3.10.0-1160.2.1.el7.x86_64
drwxr-xr-x  3 root root 4096 Nov  8 10:32 3.10.0-1160.25.1.el7.x86_64
drwxr-xr-x. 7 root root 4096 Nov  8 11:02 3.10.0-1160.el7.x86_64
#

Anyone upgraded this way? Any obvious gotchas I've missed?

Sid Young
Translational Research Institute
Brisbane
Re: [lustre-discuss] OST "D" status - only 1 OSS mounting
Thanks Andreas,

The ZFS pools became degraded, so I cold-restarted the storage and the OSTs, and everything has come back up after about 5 minutes of crash recovery. I've also worked out how to use lfs_migrate and am emptying the full OST.

Is there a tool that can specifically check an MDT and its associated OSTs?

Sid Young

On Mon, Nov 1, 2021 at 2:11 PM Andreas Dilger wrote:
> The "D" status means the OST is marked in "Degraded" mode, see the
> lfs-df(1) man page. The "lfs check osts" is only checking the client
> connection to the OSTs, but whether the MDS creates objects on those OSTs
> really depends on how the MDS is feeling about them.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Principal Architect
> Whamcloud
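On the question of a tool that checks an MDT together with its OSTs: LFSCK is the online consistency checker normally run after this kind of crash. A hedged sketch, assuming the MDT device name home-MDT0000 for this file system (the exact device name is whatever lctl dl shows on the MDS):

# on the MDS: check/repair the namespace and the MDT-to-OST object references
lctl lfsck_start -M home-MDT0000 -t namespace,layout -A

# watch progress
lctl lfsck_query -M home-MDT0000
lctl get_param mdd.home-MDT0000.lfsck_namespace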
[lustre-discuss] OST "D" status - only 1 OSS mounting
Hi all,

I have a really odd issue: only 1 OST appears to mount despite there being 4 OSTs available and ACTIVE.

[root@hpc-login-01 home]# lfs df -h
UUID                 bytes    Used  Available  Use%  Mounted on
home-MDT_UUID         4.2T   40.2G       4.1T    1%  /home[MDT:0]
home-OST_UUID        47.6T   37.8T       9.8T   80%  /home[OST:0]
home-OST0001_UUID    47.6T   47.2T     413.4G  100%  /home[OST:1] D
home-OST0002_UUID    47.6T   35.7T      11.9T   75%  /home[OST:2]
home-OST0003_UUID    47.6T   39.4T       8.2T   83%  /home[OST:3]
filesystem_summary: 190.4T  160.0T      30.3T   85%  /home

[root@hpc-login-01 home]# lfs check osts
home-OST-osc-a10c8f483800 active.
home-OST0001-osc-a10c8f483800 active.
home-OST0002-osc-a10c8f483800 active.
home-OST0003-osc-a10c8f483800 active.

Should be 191TB... df only shows one OST's worth:

10.140.93.42@o2ib:/home   48T   48T  414G  100% /home

Where should I look?

Sid Young
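The "D" flag in lfs df corresponds to the degraded flag on the OST itself, which the OSS sets (or an admin can set) while the backing storage is rebuilding. A hedged sketch of checking and clearing it on the OSS that serves home-OST0001, assuming the backing pool really is healthy again:

# on the OSS: 1 = degraded, 0 = normal
lctl get_param obdfilter.home-OST0001.degraded

# check the backing ZFS pool first
zpool status -x

# clear the flag once the pool is healthy (the flag is not persistent across an OSS restart)
lctl set_param obdfilter.home-OST0001.degraded=0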
[lustre-discuss] df shows wrong size of lustre file system (on all nodes).
I have some stability in my Lustre installation after many days of testing; however df -h now reports the /home filesystem incorrectly. After mounting /home I get:

[root@n04 ~]# df -h
10.140.90.42@tcp:/lustre  286T   59T  228T  21% /lustre
10.140.90.42@tcp:/home    191T  153T   38T  81% /home

Doing it again straight after, I get:

[root@n04 ~]# df -h
10.140.90.42@tcp:/lustre  286T   59T  228T  21% /lustre
10.140.90.42@tcp:/home     48T   40T  7.8T  84% /home

The 4 OSTs report as active and present:

[root@n04 ~]# lfs df
UUID                   1K-blocks          Used     Available  Use%  Mounted on
home-MDT_UUID         4473805696      41784064    4432019584    1%  /home[MDT:0]
home-OST_UUID        51097753600   40560842752   10536908800   80%  /home[OST:0]
home-OST0001_UUID    51097896960   42786978816    8310916096   84%  /home[OST:1]
home-OST0002_UUID    51097687040   38293322752   12804362240   75%  /home[OST:2]
home-OST0003_UUID    51097765888   42293640192    8804123648   83%  /home[OST:3]
filesystem_summary: 204391103488  163934784512   40456310784   81%  /home
[root@n04 ~]#

[root@n04 ~]# lfs osts
OBDS:
0: lustre-OST_UUID ACTIVE
1: lustre-OST0001_UUID ACTIVE
2: lustre-OST0002_UUID ACTIVE
3: lustre-OST0003_UUID ACTIVE
4: lustre-OST0004_UUID ACTIVE
5: lustre-OST0005_UUID ACTIVE
OBDS:
0: home-OST_UUID ACTIVE
1: home-OST0001_UUID ACTIVE
2: home-OST0002_UUID ACTIVE
3: home-OST0003_UUID ACTIVE
[root@n04 ~]#

Anyone seen this before? Reboots and remounts do not appear to change the value. The ZFS pools report as ONLINE and a scrub returns 0 errors.

Sid Young
[lustre-discuss] Best ways to backup a Lustre file system?
G'Day all,

Apart from rsyncing all the data on a mounted Lustre filesystem to another server, what backup systems are people using to back up Lustre?

Sid Young
M: 0458 396300
W: https://off-grid-engineering.com
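For ZFS-backed targets (as used elsewhere in these threads), one common approach besides file-level rsync is device-level backup of each target using ZFS snapshots and zfs send to a remote pool. A hedged sketch with illustrative dataset, host and snapshot names, not a complete procedure; the MDT is the critical target to capture:

# on each server, snapshot the target dataset
zfs snapshot mdthome/home@backup-2021-10-20

# full stream to a backup host/pool (received unmounted)
zfs send mdthome/home@backup-2021-10-20 | ssh backuphost zfs receive -u backuppool/mdthome-home

# later: incremental streams between snapshots
zfs send -i @backup-2021-10-20 mdthome/home@backup-2021-10-27 | ssh backuphost zfs receive -u backuppool/mdthome-home

File-level tools (rsync, or anything driven by Lustre changelogs) remain the usual way to get per-file restores; the ZFS stream gives a whole-target restore point.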
[lustre-discuss] /home remounted and running for 6 hours
Well, my saga with /home locking up was partially resolved for about 6 hours today. I rebooted the MDS and remounted the MGS, the lustre MDT and the home MDT, and after a while it all came good. I then rebooted each compute node and we were operational for about 6 hours, when it all locked up again: /lustre worked fine but /home just locked solid. I'm suspecting corruption, but I don't know how to fix it.

I have found that once I restart the MDS, I can do a remount of /home and all the D-state processes come good and we are up and running.

Is there a tool that can specifically check an individual MDT / OST etc?

Sid Young
Re: [lustre-discuss] Lustre /home lockup - more info
I tried remounting the /home Lustre file system to /mnt in read-only mode; when I try to ls the directory it locks up, but I can escape it. However, when I do a df command I get a completely wrong size (it should be around 192TB):

10.140.93.42@o2ib:/home  6.0P  4.8P  1.3P  80% /mnt

The zfs scrub is still working and all disks physically report as OK in the iLO of the two OSS servers. When the scrub finishes later today I will unmount and remount the 4 OSTs and see if the remount changes the status... updates in about 8 hours.

Sid Young
[lustre-discuss] Lustre /home lockup - how to check
My key issue is why /home locks solid when you try to use it, but /lustre is OK. The backend is ZFS, used to manage the disks presented from the HP D8000 JBODs. I'm at a loss why, after 6 months of 100% operation, this is suddenly occurring. If I do repeated "dd" tasks on /lustre it works fine; start one on /home and it locks solid.

I have started a ZFS scrub on two of the zfs pools. At 47T each it will take most of today to resolve, but that should rule out the actual storage (which is showing "NORMAL/ONLINE" and no errors).

I'm seeing a lot of these in /var/log/messages:

kernel: LustreError: 6578:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc 89cdf3b9dc00

A google search returned this:
https://wiki.lustre.org/Lustre_Resiliency:_Understanding_Lustre_Message_Loss_and_Tuning_for_Resiliency

Could it be a network issue? The nodes are running the stock CentOS 7.9 drivers; the Mellanox driver did not seem to make any difference when I originally tried it 6 months ago.

Any help appreciated :)

Sid

> From: Dennis Nelson
> Date: Mon, 11 Oct 2021 12:20:25 +0000
> Subject: Re: [lustre-discuss] Tools to check a lustre
>
> Have you tried lfs check servers on the login node?

Yes - one of the first things I did, and this is what it always reports:

]# lfs check servers
home-OST-osc-89adb7e5e000 active.
home-OST0001-osc-89adb7e5e000 active.
home-OST0002-osc-89adb7e5e000 active.
home-OST0003-osc-89adb7e5e000 active.
lustre-OST-osc-89cdd14a2000 active.
lustre-OST0001-osc-89cdd14a2000 active.
lustre-OST0002-osc-89cdd14a2000 active.
lustre-OST0003-osc-89cdd14a2000 active.
lustre-OST0004-osc-89cdd14a2000 active.
lustre-OST0005-osc-89cdd14a2000 active.
home-MDT-mdc-89adb7e5e000 active.
lustre-MDT-mdc-89cdd14a2000 active.
[root@tri-minihub-01 ~]#
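The client_bulk_callback "status -5" (an I/O error on a bulk transfer) usually points at the transport rather than the filesystem, so checking LNet end-to-end between the client and each server is a reasonable next step. A hedged sketch; substitute the real server and client NIDs for your network:

# from the hanging client
lnetctl ping 10.140.93.42@o2ib      # or: lctl ping 10.140.93.42@o2ib
lnetctl stats show                  # look at the drop / error counters
lnetctl net show -v                 # confirm the right interface backs the NID

# and the reverse direction, from the OSS/MDS back to the client's NID
lnetctl ping <client-nid>

Large bulk RPCs can fail where small pings succeed (MTU, RoCE or switch settings), so comparing MTU settings on clients, switches and servers is also worth doing.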
[lustre-discuss] Tools to check a lustre
I'm having trouble diagnosing where the problem lies in my Lustre installation. Clients are 2.12.6, and I have /home and /lustre filesystems using Lustre.

/home has 4 OSTs and /lustre is made up of 6 OSTs. lfs df shows all OSTs as ACTIVE.

The /lustre file system appears fine; I can ls into every directory.

When people log into the login node, it appears to lock up. I have shut down everything and remounted the OSTs and MDTs etc. in order with no errors reported, but I'm getting the lockup issue soon after a few people log in. The backend network is 100G Ethernet using ConnectX-5 cards and the OS is CentOS 7.9; everything was installed as RPMs and updates are disabled in yum.conf.

Two questions to start with:
Is there a command line tool to check each OST individually?
Apart from /var/log/messages, is there a Lustre-specific log I can monitor on the login node to see errors when I hit /home?

Sid Young
[lustre-discuss] eviction timeout
I'm seeing a lot of these messages:

Oct 11 11:12:09 hpc-mds-02 kernel: Lustre: lustre-MDT: Denying connection for new client b6df7eda-8ae1-617c-6ff1-406d1ffb6006 (at 10.140.90.82@tcp), waiting for 6 known clients (0 recovered, 0 in progress, and 0 evicted) to recover in 2:42

It seems to be a 3-minute timeout. Is it possible to shorten this, and even to stop this message being logged?

Sid Young
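That message is the MDT's recovery window: after a restart it waits for the previously connected clients to reconnect before admitting new ones. The window can be shortened, or recovery aborted outright if the old clients are known to be gone. A hedged sketch; the recovery_time parameter names are as I recall them, so verify them with lctl get_param on your MDS before relying on this:

# current recovery status and window
lctl get_param mdt.lustre-MDT*.recovery_status
lctl get_param mdt.lustre-MDT*.recovery_time_soft mdt.lustre-MDT*.recovery_time_hard

# shorten the window (seconds), persistently, from the MGS node
lctl set_param -P mdt.lustre-MDT*.recovery_time_hard=150

# or give up on the missing clients immediately (device name from lctl dl)
lctl --device lustre-MDT0000 abort_recovery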
[lustre-discuss] Missing OST's from 1 node only
G'Day all,

I have an odd situation where 1 compute node mounts /home and /lustre but only half the OSTs are present, while all the other nodes are fine. Not sure where to start on this one?

Good node:
[root@n02 ~]# lfs df
UUID                   1K-blocks          Used     Available  Use%  Mounted on
home-MDT_UUID         4473970688      30695424    4443273216    1%  /home[MDT:0]
home-OST_UUID        51097721856   39839794176   11257662464   78%  /home[OST:0]
home-OST0001_UUID    51097897984   40967138304   10130627584   81%  /home[OST:1]
home-OST0002_UUID    51097705472   37731089408   13366449152   74%  /home[OST:2]
home-OST0003_UUID    51097773056   41447411712    9650104320   82%  /home[OST:3]
filesystem_summary: 204391098368  159985433600   44404843520   79%  /home

UUID                   1K-blocks          Used     Available  Use%  Mounted on
lustre-MDT_UUID       5368816128      28246656    5340567424    1%  /lustre[MDT:0]
lustre-OST_UUID      51098352640   10144093184   40954257408   20%  /lustre[OST:0]
lustre-OST0001_UUID  51098497024    9584398336   41514096640   19%  /lustre[OST:1]
lustre-OST0002_UUID  51098414080   11683002368   39415409664   23%  /lustre[OST:2]
lustre-OST0003_UUID  51098514432   10475310080   40623202304   21%  /lustre[OST:3]
lustre-OST0004_UUID  51098506240   11505326080   39593178112   23%  /lustre[OST:4]
lustre-OST0005_UUID  51098429440    9272059904   41826367488   19%  /lustre[OST:5]
filesystem_summary: 306590713856   62664189952  243926511616   21%  /lustre
[root@n02 ~]#

The bad node:
[root@n04 ~]# lfs df
UUID                   1K-blocks          Used     Available  Use%  Mounted on
home-MDT_UUID         4473970688      30726400    4443242240    1%  /home[MDT:0]
home-OST0002_UUID    51097703424   37732352000   13363446784   74%  /home[OST:2]
home-OST0003_UUID    51097778176   41449634816    9646617600   82%  /home[OST:3]
filesystem_summary: 102195481600   79181986816   23010064384   78%  /home

UUID                   1K-blocks          Used     Available  Use%  Mounted on
lustre-MDT_UUID       5368816128      28246656    5340567424    1%  /lustre[MDT:0]
lustre-OST0003_UUID  51098514432   10475310080   40623202304   21%  /lustre[OST:3]
lustre-OST0004_UUID  51098511360   11505326080   39593183232   23%  /lustre[OST:4]
lustre-OST0005_UUID  51098429440    9272059904   41826367488   19%  /lustre[OST:5]
filesystem_summary: 153295455232   31252696064  122042753024   21%  /lustre
[root@n04 ~]#

Sid Young
Translational Research Institute
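When a single client is missing some OSTs, it usually means that client's OSC imports to the OSSes serving those targets never connected (often one LNet path is bad). A hedged sketch of what to look at on the bad node; device and NID names are illustrative:

# list the local Lustre devices; missing or stale OSCs show up here
lctl dl

# import state for the OSTs the client does know about
lctl get_param osc.home-OST*.import | grep -E 'state|current_connection'

# can the bad node actually reach each OSS?
lctl ping <oss1-nid>
lctl ping <oss2-nid>

# recent connection errors on the client
dmesg | grep -iE 'lustre|lnet' | tail -50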
[lustre-discuss] Converting MGS to ZFS - HA Config Question
Hi,

I am in the process of converting my pre-production cluster to use ZFS, and I have a question regarding HA config parameters. The storage node has 24 disks; I've sliced off two disks in HBA mode to act as a 960G mirror. The command is:

# mkfs.lustre --reformat --mgs --failnode 10.140.93.41@o2ib --backfstype=zfs mgspool/mgt mirror d3710M0 d3710M1

This runs successfully and I get the output below. However, I want to make sure the second MDS node can be failed over to as well using Pacemaker. So if the server I am on now is 10.140.93.42 and the other MDS is 10.140.93.41, do I need to specify the host it's on now (.42) anywhere in the config? I tried the servicenode parameter, but it refuses to have servicenode and failnode in the same command.

   Permanent disk data:
Target:     MGS
Index:      unassigned
Lustre FS:
Mount type: zfs
Flags:      0x64 (MGS first_time update )
Persistent mount opts:
Parameters: failover.node=10.140.93.41@o2ib

mkfs_cmd = zpool create -f -O canmount=off mgspool mirror d3710M0 d3710M1
mkfs_cmd = zfs create -o canmount=off mgspool/mgt xattr=sa dnodesize=auto
Writing mgspool/mgt properties
  lustre:failover.node=10.140.93.41@o2ib
  lustre:version=1
  lustre:flags=100
  lustre:index=65535
  lustre:svname=MGS
[root@hpc-mds-02]#

]# zfs list
NAME           USED  AVAIL  REFER  MOUNTPOINT
mgspool        468K   860G    96K  /mgspool
mgspool/mgt     96K   860G    96K  /mgspool/mgt

[root@hpc-mds-02 by-id]# zpool status
  pool: mgspool
 state: ONLINE
  scan: none requested
config:
        NAME         STATE     READ WRITE CKSUM
        mgspool      ONLINE       0     0     0
          mirror-0   ONLINE       0     0     0
            d3710M0  ONLINE       0     0     0
            d3710M1  ONLINE       0     0     0
errors: No known data errors
[root@hpc-mds-02#

Sid Young
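The usual way to express "this target can run on either MDS" is to drop --failnode entirely and list both nodes with --servicenode; the two options are mutually exclusive because they describe the same thing in two different ways (failnode means "the other node", servicenode means "every node that may serve the target"). A hedged sketch using the NIDs from this post:

# mkfs.lustre --reformat --mgs --backfstype=zfs \
    --servicenode 10.140.93.41@o2ib \
    --servicenode 10.140.93.42@o2ib \
    mgspool/mgt mirror d3710M0 d3710M1

Pacemaker then only has to move the zpool import and the Lustre mount between the two nodes; clients will try both NIDs.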
Re: [lustre-discuss] lustre-discuss Digest, Vol 181, Issue 22
3 things:
Can you send your /etc/lnet.conf file?
Can you also send /etc/modprobe.d/lnet.conf?
And does a systemctl restart lnet produce an error?

Sid

On Fri, Apr 30, 2021 at 6:27 AM lustre-discuss wrote:
> From: "Yau Hing Tuen, Bill"
> Date: Thu, 29 Apr 2021 15:23:51 +0800
> Subject: [lustre-discuss] Lustre client LNET problem from a novice
>
> Dear All,
>
> Need some advice on the following situation: one of my servers
> (Lustre client only) could no longer connect to the Lustre server.
> Suspecting some problem with the LNET configuration, but I am too new to
> Lustre and do not have much of a clue on how to troubleshoot it.
>
> Kernel version: Linux 5.4.0-65-generic #73-Ubuntu SMP Mon Jan 18 17:25:17 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
> Lustre version: 2.14.0 (pulled from git)
> Lustre debs built with GCC 9.3.0 on the server.
>
> Modprobe does not complete cleanly as the static lnet configuration does not work:
> # modprobe -v lustre
> insmod /lib/modules/5.4.0-65-generic/updates/kernel/net/libcfs.ko
> insmod /lib/modules/5.4.0-65-generic/updates/kernel/net/lnet.ko networks="o2ib0(ibp225s0f0)"
> insmod /lib/modules/5.4.0-65-generic/updates/kernel/fs/obdclass.ko
> insmod /lib/modules/5.4.0-65-generic/updates/kernel/fs/ptlrpc.ko
> modprobe: ERROR: could not insert 'lustre': Network is down
>
> So I resorted to trying dynamic lnet configuration:
>
> # lctl net up
> LNET configure error 100: Network is down
>
> # lnetctl net show
> net:
>     - net type: lo
>       local NI(s):
>         - nid: 0@lo
>           status: up
>
> # lnetctl net add --net o2ib0 --if ibp225s0f0
> add:
>     - net:
>           errno: -100
>           descr: "cannot add network: Network is down"
>
> Having these error messages in dmesg after the above "lnetctl net add" command:
> [265979.237735] LNet: 3893180:0:(config.c:1564:lnet_inet_enumerate()) lnet: Ignoring interface enxeeeb676d0232: it's down
> [265979.237738] LNet: 3893180:0:(config.c:1564:lnet_inet_enumerate()) Skipped 9 previous similar messages
> [265979.238395] LNetError: 3893180:0:(o2iblnd.c:2655:kiblnd_hdev_get_attr()) Invalid mr size: 0x100
> [265979.267372] LNetError: 3893180:0:(o2iblnd.c:2869:kiblnd_dev_failover()) Can't get device attributes: -22
> [265979.298129] LNetError: 3893180:0:(o2iblnd.c:3353:kiblnd_startup()) ko2iblnd: Can't initialize device: rc = -22
> [265980.353643] LNetError: 105-4: Error -100 starting up LNI o2ib
>
> Initial diagnosis:
> # ip link show ibp225s0f0
> 41: ibp225s0f0: mtu 2044 qdisc mq state UP mode DEFAULT group default qlen 256
>     link/infiniband 00:00:11:08:fe:80:00:00:00:00:00:00:0c:42:a1:03:00:79:99:1c brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
>
> # ip address show ibp225s0f0
> 41: ibp225s0f0: mtu 2044 qdisc mq state UP group default qlen 256
>     link/infiniband 00:00:11:08:fe:80:00:00:00:00:00:00:0c:42:a1:03:00:79:99:1c brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
>     inet 10.10.10.3/16 brd 10.10.255.255 scope global ibp225s0f0
>        valid_lft forever preferred_lft forever
>     inet6 fe80::e42:a103:79:991c/64 scope link
>        valid_lft forever preferred_lft forever
>
> # ifconfig ibp225s0f0
> ibp225s0f0: flags=4163 mtu 2044
>         inet 10.10.10.3 netmask 255.255.0.0 broadcast 10.10.255.255
>         inet6 fe80::e42:a103:79:991c prefixlen 64 scopeid 0x20
>         unspec 00-00-11-08-FE-80-00-00-00-00-00-00-00-00-00-00 txqueuelen 256 (UNSPEC)
>         RX packets 14363998 bytes 1440476592 (1.4 GB)
>         RX errors 0 dropped 0 overruns 0 frame 0
>         TX packets 88 bytes 6648 (6.6 KB)
>         TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
>
> # lsmod | grep ib
> ko2iblnd     233472  0
> lnet         552960  3 ko2iblnd,obdclass
> libcfs       487424  3 lnet,ko2iblnd,obdclass
> ib_umad       28672  0
> ib_ipoib     110592  0
> rdma_cm       61440  2 ko2iblnd,rdma_ucm
> ib_cm         57344  2 rdma_cm,ib_ipoib
> mlx5_ib      307200  0
> mlx_compat    65536  1 ko2iblnd
> ib_uverbs    126976  2 rdma_ucm,mlx5_ib
> ib_core      311296  9 rdma_cm,ib_ipoib,ko2iblnd,iw_cm,ib
Re: [lustre-discuss] lustre-discuss Digest, Vol 180, Issue 23
LNET on the failover node will be operational as it's a separate service; you can check it as shown below and do a "lnetctl net show":

[root@hpc-mds-02 ~]# systemctl status lnet
● lnet.service - lnet management
   Loaded: loaded (/usr/lib/systemd/system/lnet.service; disabled; vendor preset: disabled)
   Active: active (exited) since Mon 2021-03-08 15:19:07 AEST; 2 weeks 1 days ago
  Process: 25742 ExecStart=/usr/sbin/lnetctl import /etc/lnet.conf (code=exited, status=0/SUCCESS)
  Process: 25738 ExecStart=/usr/sbin/lnetctl lnet configure (code=exited, status=0/SUCCESS)
  Process: 25736 ExecStart=/sbin/modprobe lnet (code=exited, status=0/SUCCESS)
 Main PID: 25742 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/lnet.service

Mar 08 15:19:07 hpc-mds-02 systemd[1]: Starting lnet management...
Mar 08 15:19:07 hpc-mds-02 systemd[1]: Started lnet management.
[root@hpc-mds-02 ~]#

It's only the disk management that is down on the failover node.

Sid

> Imho, LNET to a failover node _must_ fail, because LNET should not be
> up on the failover node, right?
>
> If I started LNET there, and some client does not get an answer
> quickly enough from the acting MDS, it would try the failover,
> LNET yes but Lustre no - that doesn't sound right.
>
> Regards,
> Thomas
>
> --
> Thomas Roth
> Department: Informationstechnologie
> Location: SB3 2.291
> Phone: +49-6159-71 1453  Fax: +49-6159-71 2986
[lustre-discuss] LVM support
G'Day all,

I put this question to another member of the list privately, but I thought it would be good to ask the whole list: given that ZFS is supported for managing a large pool of disks in a typical OSS node, could LVM be used with a striped LV configuration? Since LVM is rock solid and most likely already managing the local file system of each node in your cluster, what impact does an LVM LV have as an OST?

Sid Young
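ldiskfs OSTs are just ext4-derived filesystems on a block device, so an LVM logical volume can back one; the question is really performance and recoverability rather than basic support. A hedged sketch of what a striped-LV OST might look like, with illustrative VG/LV names and stripe settings (not a recommendation on geometry):

# build a striped LV across 4 PVs
vgcreate vg_ost0 /dev/sdb /dev/sdc /dev/sdd /dev/sde
lvcreate --type striped -i 4 -I 256k -l 100%FREE -n ost0 vg_ost0

# format it as an ldiskfs OST
mkfs.lustre --ost --backfstype=ldiskfs --fsname=lustre --index=0 \
    --mgsnode=<mgs-nid> /dev/vg_ost0/ost0

Unlike ZFS or hardware RAID, a plain striped LV has no redundancy, so losing a single disk takes out the whole OST; LVM mirroring/RAID or an underlying RAID layer would be needed to cover that.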
[lustre-discuss] Performance over 100G ethernet
G'Day all,

What sort of transfer speeds should I expect to see from a client writing a 1G file into the Lustre storage via RoCE on a 100G ConnectX-5 (write block size is 1M)? I have done virtually no tuning yet and the MTU is showing as "active 1024".

If you have any links to share with performance benchmarks and config examples, that would be much appreciated.

Thanks

Sid Young
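Before looking at Lustre-level numbers it's usually worth checking the raw path first: "active 1024" suggests the RoCE MTU is still at its smallest setting, and the ko2iblnd defaults are fairly conservative. A hedged sketch of the kinds of checks and knobs commonly looked at; the values are illustrative, not tuned recommendations:

# raw bandwidth between client and OSS (perftest / iperf3 packages)
ib_write_bw <server>            # RDMA path
iperf3 -c <server>              # plain TCP path for comparison

# a larger Ethernet MTU on both ends and the switch usually raises the active RoCE MTU
ip link set ens2f0 mtu 9000

# /etc/modprobe.d/ko2iblnd.conf : commonly raised LNet o2ib credits
options ko2iblnd peer_credits=32 concurrent_sends=64

# then measure Lustre itself from a client
lfs setstripe -c -1 /lustre/benchdir
dd if=/dev/zero of=/lustre/benchdir/test1g bs=1M count=1024 oflag=direct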
[lustre-discuss] Solved - OSS Crash
A big thanks to Karsten. I've downgraded the kernels on two OSS nodes and one of the MDS nodes to 3.10.0-1160.2.1.el7.x86_64, placed the others in standby, and everything has run overnight with 50,000 continuous reads/writes/deletes per cycle and bulk deletes in a shell script running continuously; this morning it's all still up and running :)

Thanks everyone for your suggestions. Next challenge: RoCE over 100G ConnectX-5 cards :)

Sid Young
[lustre-discuss] OSS node crash/high CPU latency when deleting 100's of empty test files
Thanks Karsten, looks like I found it at the same time you posted... I will have a go at re-imaging with 1160.6.1 (the build updates to 1160.15.2) and re-testing. Do you know if 2.14 will be released for CentOS 7.9?

Sid

> Hi Sid,
>
> if you are using a CentOS 7.9 kernel newer than 3.10.0-1160.6.1.el7.x86_64
> then check out LU-14341 as these kernel versions cause a timer related
> regression: https://jira.whamcloud.com/browse/LU-14341
>
> We learnt this the hard way during the last couple of days and downgraded
> to kernel-3.10.0-1160.2.1.el7.x86_64 (which is the officially supported
> kernel version of lustre 2.12.6). We use ZFS. YMMV.
>
> --
> Karsten Weiss
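Since the problem stems from the kernel drifting past the release Lustre was built against, pinning the kernel packages is the usual guard. A hedged sketch of two common ways to do that on CentOS 7 (package versions here are illustrative):

# option 1: exclude kernel packages globally in /etc/yum.conf
exclude=kernel*

# option 2: lock the specific versions with the versionlock plugin
yum install yum-plugin-versionlock
yum versionlock kernel-3.10.0-1160.2.1.el7 kernel-devel-3.10.0-1160.2.1.el7 kernel-headers-3.10.0-1160.2.1.el7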
[lustre-discuss] OSS crashes - could be LU-14341
G'Day all,

Is 2.12.6 supported on CentOS 7.9? After more investigation, I believe this is the issue I am seeing:
https://jira.whamcloud.com/browse/LU-14341

If there is a patch release built for 7.9 I am really happy to test it, as it's easy to reproduce and crash the OSS nodes.

Sid Young
[lustre-discuss] OSS Nodes crashing (and an MDS crash as well)
G'Day all,

As I reported in a previous email, my OSS nodes crash soon after initiating a file creation script using "dd" in a loop and then trying to delete all the files at once. At first I thought it was related to the Mellanox 100G cards, but after rebuilding everything using just the 10G network I still get the crashes. I have a crash dump file from the MDS, which crashed during the creates; the OSS crashed when I did the deletes.

This leads me to think Lustre 2.12.6 running on CentOS 7.9 has a subtle bug somewhere. I'm not sure how to progress this - should I attempt to try 2.13?
https://downloads.whamcloud.com/public/lustre/lustre-2.13.0/el7/patchless-ldiskfs-server/RPMS/x86_64/

Or build a fresh instance on a clean build of the OS? Thoughts?

Sid Young
[lustre-discuss] OSS node crash/high CPU latency when deleting 100's of empty test files
G'Day all,

I've been doing some file create/delete testing on our new Lustre storage, which results in the OSS nodes crashing and rebooting due to high latency issues. I can reproduce it by running "dd" commands on the /lustre file system in a for loop and then doing an rm -f testfile-*.text at the end.

This results in console errors on our DL385 OSS nodes (running CentOS 7.9) which basically show a stack of mlx5_core and bnxt_en error messages (mlx5 being the Mellanox driver for the 100G ConnectX-5 cards), followed by a stack of:

"NMI watchdog: BUG: soft lockup - CPU#N stuck for XXs"

where the CPU number cycles through around 4 different CPUs and XX is typically 20-24 seconds... then the boxes reboot!

Before I log a support ticket with HPE, I'm going to try disabling the 100G cards and see if it's repeatable via the 10G interfaces on the motherboards. But before I do that: does anyone use Mellanox ConnectX-5 cards on their Lustre storage nodes with Ethernet only, and if so, which driver are you using and on which OS?

Thanks in advance!

Sid Young
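For anyone wanting to reproduce the same load pattern, here is a minimal sketch of the kind of create-then-bulk-delete loop described above (the file count, sizes and path are illustrative, not the exact script used):

#!/bin/bash
# create a few hundred files on the Lustre mount, then delete them all at once
dir=/lustre/crashtest
mkdir -p "$dir"
for i in $(seq 1 500); do
    dd if=/dev/zero of="$dir/testfile-$i.text" bs=1M count=100 status=none
done
rm -f "$dir"/testfile-*.text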
[lustre-discuss] servicenode /failnode
G'Day all,

I'm rebuilding my Lustre cluster again, and in doing so I am trying to understand the role of the --servicenode option when creating an OST. There is an example in the documentation shown as this:

[root@rh7z-oss1 system]# mkfs.lustre --ost \
> --fsname demo \
> --index 0 \
> --mgsnode 192.168.227.11@tcp1 \
> --mgsnode 192.168.227.12@tcp1 \
> --servicenode 192.168.227.21@tcp1 \
> --servicenode 192.168.227.22@tcp1 \
> /dev/dm-3

But it's not clear what the service node actually is. Am I correct in saying the service nodes are the IPs of the two OSS servers that can manage this particular OST (the HA pair)?

Sid Young
Re: [lustre-discuss] lustre-discuss Digest, Vol 179, Issue 20
Thanks for the replies. The nodes have multiple interfaces (four on the compute nodes and six on the storage nodes); ens2f0 is the 100G Mellanox ConnectX-5 card in slot 2, and they are all running 2.12.6 using the RPMs from the Lustre site.

I will remove one of the network definition files and add the "lnetctl export --backup" config to /etc/lnet.conf. I did try an export and noticed it barfs on some of the parameters, but I did not try the --backup option, so that gives me a few options to experiment with to minimise the config - just a bit of trial and error. I gather then that the lustre.conf file is not needed, just /etc/modprobe.d/lnet.conf and /etc/lnet.conf.

Sid Young

> From: "Degremont, Aurelien"
> Date: Tue, 23 Feb 2021 08:47:27 +0000
> Subject: Re: [lustre-discuss] need to always manually add network after reboot
>
> Hello
>
> If I understand correctly, you're telling that you have 2 configuration files:
>
> /etc/modprobe.d/lnet.conf
> options lnet networks=tcp
>
> [root@hpc-oss-03 ~]# cat /etc/modprobe.d/lustre.conf
> options lnet networks="tcp(ens2f0)"
> options lnet ip2nets="tcp(ens2f0) 10.140.93.*
>
> That means you are declaring the "networks" option for the "lnet" kernel
> module twice. I don't know how 'modprobe' will behave regarding that.
> If you have a very simple configuration, where your nodes only have one
> Ethernet interface "ens2f0", you only need the following line, from the 3 above:
>
> options lnet networks="tcp(ens2f0)"
>
> If this interface is the only Ethernet interface on your host, you don't
> even need a network-specific setup. By default, when loading Lustre, in the
> absence of a network configuration, Lustre will automatically set up the
> only Ethernet interface and use it for "tcp".
>
> Aurélien

> From: Angelos Ching
> Date: Tue, 23 Feb 2021 18:06:02 +0800
> Subject: Re: [lustre-discuss] need to always manually add network after reboot
>
> Hi Sid,
>
> Notice that you are using lnetctl net add to add the lnet network, which
> means you should be using a recent version of Lustre that depends on
> /etc/lnet.conf for boot time lnet configuration.
>
> You can save the current lnet configuration using the command:
> lnetctl export --backup > /etc/lnet.conf (make a backup of the original
> file first if required)
>
> On next boot, lnet.service will load your lnet configuration from the file.
>
> Or you can manually build lnet.conf, as lnetctl seems to have occasional
> problems with some of the fields exported by "lnetctl export --backup".
>
> Attaching my simple lnet.conf for your reference:
>
> # cat /etc/lnet.conf
> ip2nets:
>  - net-spec: o2ib
>    ip-range:
>      0: 10.2.8.*
>  - net-spec: tcp
>    ip-range:
>      0: 10.5.9.*
> route:
>  - net: o2ib
>    gateway: 10.5.9.25@tcp
>    hop: -1
>    priority: 0
>  - net: o2ib
>    gateway: 10.5.9.24@tcp
>    hop: -1
>    priority: 0
> global:
>    numa_range: 0
>    max_intf: 200
>    discovery: 1
>    drop_asym_route: 0
>
> Best regards,
> Angelos
[lustre-discuss] need to always manually add network after reboot
G'Day all,

I'm finding that when I reboot any node in our new HPC, I need to keep manually adding the network using:

lnetctl net add --net tcp --if ens2f0

Then I can do an lnetctl net show and see the tcp part active.

I have options in /etc/modprobe.d/lnet.conf:
options lnet networks=tcp

and:

[root@hpc-oss-03 ~]# cat /etc/modprobe.d/lustre.conf
options lnet networks="tcp(ens2f0)"
options lnet ip2nets="tcp(ens2f0) 10.140.93.*

I've read the documentation and tried to understand the correct parameters for a simple Lustre config, so this is what I worked out is needed... but I suspect it's still wrong.

Any help appreciated :)

Sid Young
[lustre-discuss] MDS using D3710 DAS - partially Solved
After some investigation, it looks like a timeout issue in the smartpqi kernel module is causing the disks to be removed soon after they are initially added, based on what is reported in "dmesg".

This issue first occurred in RHEL/CentOS 7.4 and should have been resolved by CentOS 7.7. I've emailed the maintainer of the module and he's come back to me with an offer to create a test driver to see if increasing the timeout fixes the issue. There is an existing patch, but its version is lower than the one in CentOS 7.9.

On the bright side, I've built and rebuilt the Lustre MDS and OSS config several times as I optimise the installation while running under Pacemaker, and I have been able to mount /lustre and /home on the compute nodes, so this new system is 50% of the way there :)

Sid Young

> From: Christopher Mountford
> Date: Mon, 15 Feb 2021 10:44:10 +0000
> Subject: Re: [lustre-discuss] MDS using D3710 DAS
>
> Hi Sid.
>
> We use the D3700s (and our D8000s) as JBODs with zfs providing the
> redundancy - do you have some kind of hardware RAID? If so, are your RAID
> controllers the array controllers or on the HBAs? Off the top of my head,
> if the latter, there might be an issue with multiple HBAs trying to
> assemble the same RAID array?
>
> - Chris.
[lustre-discuss] lfs check now working
After some experiments and recreating the two filesystems, I now have lfs check mds etc. working from the HPC clients :) Sorry to waste bandwidth.

Sid
[lustre-discuss] cant check MDS?
G'Day all,

I'm slowly working through various issues and thought I would run a check on the MDS node(s) after mounting a "lustre" fs and a "home" fs... but I get an odd error:

/dev/sdd   3.7T  5.6M  3.4T  1% /mdt-lustre
/dev/sdc   2.6T  5.5M  2.4T  1% /mdt-home

[root@hpc-mds-02 ~]# lfs check mds
lfs check: cannot find mounted Lustre filesystem: No such device
[root@hpc-mds-02 ~]#

What am I doing wrong?

Sid Young
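For context on that error: lfs check inspects client-side connections, so it only works on a node that has the Lustre file system mounted as a client; an MDS that is only serving targets has nothing for it to find. A hedged sketch of the server-side equivalents that list and sanity-check the locally mounted targets:

# on the MDS: list local Lustre devices (MGS, MDTs and their exports)
lctl dl

# overall health and per-MDT client count
lctl get_param health_check
lctl get_param mdt.*.num_exports

# lfs check itself can be run from any node that mounts the filesystem as a client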
Re: [lustre-discuss] MGS IP in a HA cluster
Thanks for the clarification. :)

Sid Young
M: 0458 396300
W: https://off-grid-engineering.com
W: https://z900collector.wordpress.com/restoration/ (personal)
W: https://sidyoung.com/

On Thu, Feb 18, 2021 at 4:35 PM Indivar Nair wrote:
> Hi Sid,
>
> 1.
> -- You don't need a Cluster/Virtual IP for Lustre. Only the MGT, MDT and
> OST volumes need to be failed over.
> When these volumes are failed over to the other server, all the
> components of the Lustre file system are informed about this failover, and
> they will then continue accessing these volumes using the IP of the
> failover server.
>
> 2.
> -- No. The MGS IP should also be in the same network(s) as the MDS and OSS.
>
> If you are using IML for installation and management of Lustre, then this
> should be on a different network (for example, your 10G network (or a 1G
> network)).
>
> Regards,
> Indivar Nair
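To make the "no virtual IP needed" point concrete: clients (and targets, via --mgsnode/--servicenode) are simply given both MGS NIDs and try the second if the first does not answer. A hedged sketch using the two MDS NIDs from this thread; the OST line is illustrative:

# client fstab / mount: list both possible MGS locations, colon-separated
mount -t lustre 10.140.93.41@o2ib:10.140.93.42@o2ib:/lustre /lustre

# targets formatted with both MGS NIDs and both serving OSS nodes
mkfs.lustre --ost --fsname=lustre --index=0 \
    --mgsnode=10.140.93.41@o2ib --mgsnode=10.140.93.42@o2ib \
    --servicenode=<oss1-nid> --servicenode=<oss2-nid> <pool>/<ost>

Pacemaker then only moves the storage (zpool or LV) and the Lustre mount; no IP resource is required.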
[lustre-discuss] MGS IP in a HA cluster
G'day all,

I'm trying to get my head around configuring a new Lustre 2.12.6 cluster on Centos 7.9, in particular the correct IP(s) for the MGS.

In a pacemaker based MDS cluster, when I define the IP for the HA, is that the same IP used when referencing the MGS, or is the MGS IP only specified by using the IP of both the MDS servers (assume dual MDS HA cluster here)?

And, if I have a 100G ethernet network (for RoCE) for Lustre usage and a 10G network for server access, is the MGS IP based around the 100G network or my 10G network?

Any help appreciated :)

Sid Young
___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
Re: [lustre-discuss] MDS using D3710 DAS
Hi Christopher,

Just some background: all servers are DL385's and all are running the same image of Centos 7.9. The MDS HA pair have a SAS-connected D3710, and the dual OSS HA pairs have a D8000 each, with 45 disks in each of them.

The D3710 (which has 24x 960G SSD's) seems a bit hit and miss at presenting the two LVs. I had set up a /lustre and a /home which I was going to use with ldiskfs rather than ZFS; however, I am finding that the disks MAY present to both servers after some reboots, but usually the first server to reboot sees the LVs presented and the other sees only its local internal disks, so the array appears to present the LVs to only one host most of the time. With the 4 OSS servers I see the same issue: sometimes the LVs present and sometimes they don't.

I was planning on setting up the OSTs as ldiskfs as well, but I could also go ZFS; my test bed system and my current HPC use ldiskfs.

Correct me if I am wrong, but the disks should present to both servers all the time, and using PCS I should be able to mount /lustre and /home on the first server while the disks also present on the second server - since no software is mounting them there, there should be no issues?

Sid Young

On Fri, Feb 12, 2021 at 7:27 PM Christopher Mountford wrote:
> Hi Sid,
>
> We've a similar hardware configuration - 2 MDS pairs and 1 OSS pair which
> each consist of 2 DL360 connected to a single D3700. However we are using
> Lustre on ZFS with each array split into 2 or 4 zpools (depending on the
> usage) and haven't seen any problems of this sort. Are you using ldiskfs?
>
> - Chris
>
> On Fri, Feb 12, 2021 at 03:14:58PM +1000, Sid Young wrote:
> > G'day all,
> > Is anyone using a HPe D3710 with two HPe DL380/385 servers in a MDS HA
> > configuration? If so, is your D3710 presenting LV's to both servers at
> > the same time AND are you using PCS with the Lustre PCS Resources?
> > I've just received new kit and cannot get the disks to present to the MDS
> > servers at the same time. :(
> > Sid Young
> > ___
> > lustre-discuss mailing list
> > lustre-discuss@lists.lustre.org
> > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
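One way to take Lustre and PCS out of the picture when chasing this kind of presentation issue is to check, on both nodes of the HA pair at the same time, whether the operating system can see the array LUNs at all. A rough sketch, assuming the usual SAS/multipath tooling is installed (package names vary by distro):

# run on each server in the HA pair and compare the output
lsscsi            # external array LUNs should appear on both nodes
lsblk             # shared block devices should be listed alongside the local disks
multipath -ll     # if multipathd is in use, both paths to each LUN should show up

If the LUNs only ever appear on one host, that would point at the array/controller presentation or cabling rather than anything PCS or Lustre are doing.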
[lustre-discuss] MDS using D3710 DAS
G'day all, Is anyone using a HPe D3710 with two HPeDL380/385 servers in a MDS HA Configuration? If so, is your D3710 presenting LV's to both servers at the same time AND are you using PCS with the Lustre PCS Resources? I've just received new kit and cannot get disk to present to the MDS servers at the same time. :( Sid Young ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
[lustre-discuss] Metrics Gathering into ELK stack
G'Day all,

I am about to commission a new HPC over the holiday break, and in planning I am looking at metrics gathering for the Lustre cluster, most likely into an Elastic/Kibana stack.

Are there any reliable/solid Lustre-specific metrics tools that can push data to ELK, or that can generate JSON strings of metrics I can push into more bespoke monitoring solutions? I am more interested in I/O metrics from the Lustre side of things, as I can already gather disk/CPU/memory metrics with Metricbeat as needed in the legacy HPC.

Sid Young
W: https://off-grid-engineering.com
W (personal): https://z900collector.wordpress.com/restoration/ | https://sidyoung.com/
___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
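One low-tech option, if nothing off the shelf fits, is to scrape the Lustre stats files with lctl and emit JSON lines that Filebeat or a bespoke shipper can pick up. A rough sketch, assuming it runs on an OSS; the parameter pattern is the main thing that changes per node type (e.g. mdt.*.md_stats on an MDS), and the JSON field names are simply my own choice:

#!/bin/bash
# Emit one JSON document per Lustre counter on an OSS.
for p in $(lctl list_param "obdfilter.*.stats" 2>/dev/null); do
    lctl get_param -n "$p" | awk -v target="$p" 'NR > 1 {
        # counter lines look like: <name> <samples> samples [unit] <min> <max> <sum>
        printf "{\"target\":\"%s\",\"counter\":\"%s\",\"samples\":%s}\n", target, $1, $2
    }'
done

Each output line should already be valid JSON, so it can be tailed by Filebeat with JSON decoding enabled or pushed to Elasticsearch directly.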
[lustre-discuss] Lustre via 100G Ethernet or Infiniband
With the growth of 100G ethernet, is it better to connect a lustre file server via EDR 100G Infiniband or 100G Ethernet for a 32 node HPC cluster running a typical life sciences - Genomics workload? Thoughts anyone? Sid Young ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
[lustre-discuss] RA's found
Please ignore my last email - I discovered I had the resource agent RPM but had not installed it.

Sid Young
W: https://off-grid-engineering.com
W (personal): https://z900collector.wordpress.com/restoration/ | https://sidyoung.com/
___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
[lustre-discuss] Pacemaker resource Agents
I've been trying to locate the Lustre-specific Pacemaker resource agents, but I've had no luck on GitHub where they were meant to be hosted - maybe I am looking at the wrong project?

Has anyone recently implemented an HA Lustre cluster using Pacemaker, and did you use Lustre-specific RA's?

Thanks in advance!

Sid Young
W: https://off-grid-engineering.com
W (personal): https://z900collector.wordpress.com/restoration/ | https://sidyoung.com/
___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
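In case it helps anyone finding this thread: once the resource agent package mentioned in the follow-up above is installed, a Pacemaker resource for a Lustre target is essentially a mount wrapper. A sketch with hypothetical resource names, device and mount paths (use one agent or the other, not both for the same device):

# Lustre-specific agent, if the lustre resource-agents package provides it
pcs resource create mdt0 ocf:lustre:Lustre \
    target=/dev/mapper/mdt0 mountpoint=/lustre/mdt0

# generic fallback that also works: the standard Filesystem agent with fstype=lustre
pcs resource create mdt0-fs ocf:heartbeat:Filesystem \
    device=/dev/mapper/mdt0 directory=/lustre/mdt0 fstype=lustre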
[lustre-discuss] SOLVED - new client locks up on ls /lustre
SOLVED - Rebuilt the MDT and OST disks, changed /etc/fstab to have rw flag set explicitly and rebooted everything. Clients now mount and OSTs come up as active when I run "lfs check servers". Sid Young ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
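For the archive, the sort of /etc/fstab entries this ends up with look roughly like the following - the device names, NID and mount points are placeholders, not the ones from this cluster:

# server-side target mounts (MDT/OST), mounted with the lustre type
/dev/sdb                 /mnt/mdt    lustre   rw,_netdev          0 0
/dev/sdc                 /mnt/ost0   lustre   rw,_netdev          0 0

# client mount, naming the MGS NID and filesystem name
192.0.2.10@tcp:/lustre   /lustre     lustre   rw,flock,_netdev    0 0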
[lustre-discuss] new install client locks up on ls /lustre
Hi all,

I'm newish to Lustre and I've just created a Lustre 2.12.5 cluster using the RPMs from whamcloud for Centos 7.8, with 1 MDT/MGS and 1 OSS with 3 OST's (20GB each). Everything is formatted as ldiskfs and it's running on a VMware platform as a test bed using tcp.

The MDT mounts OK, the OST's mount, and on my client I can mount the /lustre mount point (58GB) and I can ping everything via the LNet; however, as soon as I try to do an ls -l /lustre or any kind of I/O, the client locks solid till I reboot it. I've tried to work out how to run basic diagnostics to no avail, so I am stumped why I don't see a directory listing for what should be an empty 60G disk.

On the MDS I ran this:

[root@lustre-mds tests]# lctl dl
  0 UP osd-ldiskfs lustre-MDT-osd lustre-MDT-osd_UUID 10
  1 UP mgs MGS MGS 8
  2 UP mgc MGC10.140.95.118@tcp acdb253b-b7a8-a949-0bf2-eaa17dc8dca4 4
  3 UP mds MDS MDS_uuid 2
  4 UP lod lustre-MDT-mdtlov lustre-MDT-mdtlov_UUID 3
  5 UP mdt lustre-MDT lustre-MDT_UUID 12
  6 UP mdd lustre-MDD lustre-MDD_UUID 3
  7 UP qmt lustre-QMT lustre-QMT_UUID 3
  8 UP lwp lustre-MDT-lwp-MDT lustre-MDT-lwp-MDT_UUID 4
  9 UP osp lustre-OST-osc-MDT lustre-MDT-mdtlov_UUID 4
 10 UP osp lustre-OST0001-osc-MDT lustre-MDT-mdtlov_UUID 4
 11 UP osp lustre-OST0002-osc-MDT lustre-MDT-mdtlov_UUID 4
[root@lustre-mds tests]#

So it looks like everything is running; even dmesg on the client reports:

[7.998649] Lustre: Lustre: Build Version: 2.12.5
[8.016113] LNet: Added LNI 10.140.95.65@tcp [8/256/0/180]
[8.016214] LNet: Accept secure, port 988
[ 10.992285] Lustre: Mounted lustre-client

Any pointer where to look? /var/log/messages shows no errors.

Sid Young
___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
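A few low-risk checks that might narrow a hang like this down. A client that mounts fine but freezes on the first ls or I/O is quite often talking happily to the MGS/MDS but unable to reach the OSS on port 988, which the following would show without touching the directory tree. The OSS address below is a placeholder; only the MGS NID appears in the output above:

# from the client
lnetctl net show                       # confirm the tcp NI is up
lctl ping 10.140.95.118@tcp            # MGS/MDS NID from the lctl dl output above
lctl ping 192.0.2.20@tcp               # placeholder for the OSS NID
lfs df -h                              # queries the MDS and OSTs without listing files
lctl get_param osc.*.import | grep -E 'state|target'    # per-OST import state

# if it still hangs, capture the Lustre debug log afterwards
lctl dk /tmp/lustre-debug.txt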