Re: [lustre-discuss] nodemap exports

2024-06-19 Thread Thomas Roth via lustre-discuss

Thanks Sebastien,
This put all clients into the correct LNet, and nodemap.exports looks as expected.

(Had to add 'options lnet lnet_peer_discovery_disabled=1' for ‘-o 
network=’ to work).
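
For anyone searching the archive, a hedged sketch of the client-side module options this ends up with (the file name under /etc/modprobe.d/ is an assumption; the two option lines are the ones quoted in this thread):

options lnet networks="o2ib8(ib0),o2ib6(ib0)"
options lnet lnet_peer_discovery_disabled=1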


Regards
Thomas


On 6/19/24 08:31, Sebastien Buisson wrote:

Hi,

I am not sure the NIDs displayed under ‘exports’ and in the logs are the actual 
NIDs used to communicate with; maybe they are just identifiers describing the 
peers.

If you want to restrict a client to a given LNet network for a given mount point, you 
can use the ‘-o network=’ mount option, like ‘-o network=o2ib8’ in your 
case.
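
For reference, a hedged example of such a mount command, reusing the MGS NID from this thread (file system name and mount point are placeholders):

client # mount -t lustre -o network=o2ib8 10.20.6.63@o2ib8:/<fsname> /mnt/<fsname>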

Cheers,
Sebastien.


Le 18 juin 2024 à 18:01, Thomas Roth via lustre-discuss 
 a écrit :

OK, client 247 is the good one ;-)

It is rather an LNET issue.

All three clients have:


options lnet networks="o2ib8(ib0),o2ib6(ib0)"


I can mount my nodemapped Lustre, with its MGS on 10.20.6.63@o2ib8, and 
another Lustre on o2ib6.

However, the log of the MGS at 10.20.6.63@o2ib8 remarks

MGS: Connection restored to 15da687f-7a11-4913-9f9d-3c764c97a9cb (at 
10.20.3.246@o2ib6)
MGS: Connection restored to 059da3e0-b277-4767-8678-e83730769fb8 (at 
10.20.3.248@o2ib6)

and then

MGS: Connection restored to a0ee6b8c-48b9-46e8-ba2c-9448889c77ed (at 
10.20.3.247@o2ib8)



I can see the "alien" nids also in
mgs # ls /proc/fs/lustre/mgs/MGS/exports
... 10.20.3.246@o2ib6  10.20.3.247@o2ib8  10.20.3.248@o2ib6

Question is: Why would an MGS on "o2ib8" accept connections from a client on 
"o2ib6"?

Obviously, this would not happen if the client had only o2ib6. So the MGS is 
somewhat confused, since the LNET connection is actually via o2ib8, but the 
labels and the nodemapping use o2ib6.



This survives reboots - the MGS has these wrong NIDs stored somewhere. Can I 
erase them and start again with the correct NIDs?


Regards
Thomas



On 6/18/24 17:33, Thomas Roth via lustre-discuss wrote:

Hi all,
what is the meaning of the "exports" property/parameter of a nodemap?
I have
mgs ]# lctl nodemap_info newclients
...
There are three clients:
mgs # lctl get_param nodemap.newclients.ranges
nodemap.newclients.ranges=
[
  { id: 13, start_nid: 10.20.3.246@o2ib8, end_nid: 10.20.3.246@o2ib8 },
  { id: 12, start_nid: 10.20.3.247@o2ib8, end_nid: 10.20.3.247@o2ib8 },
  { id: 9, start_nid: 10.20.3.248@o2ib8, end_nid: 10.20.3.248@o2ib8 }
This nodemap has
nodemap.newclients.admin_nodemap=0
nodemap.newclients.trusted_nodemap=0
nodemap.newclients.deny_unknown=1
and
mgs # lctl get_param nodemap.newclients.exports
nodemap.newclients.exports=
[
  { nid: 10.20.3.247@o2ib8, uuid: 5d9964f9-81eb-4ea5-93dc-a145534f9e74 },
]
_This_ client, 10.20.3.247, behaves differently: No access for root (ok!), no 
access for a regular user.
The other two clients, 10.20.3.246/248, show no access for root (ok!), while a 
regular user sees the squashed (uid 99) directories at the top level and his 
own directories/files with correct uid/gid beneath.
And the only difference between these clients seems to be these 'exports' 
(totally absent from the manual, btw).
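For context, a hedged sketch of how a nodemap like this is typically created on the MGS - the values echo the output above, but the exact commands used here are an assumption:

mgs # lctl nodemap_add newclients
mgs # lctl nodemap_add_range --name newclients --range 10.20.3.[246-248]@o2ib8
mgs # lctl nodemap_modify --name newclients --property admin --value 0
mgs # lctl nodemap_modify --name newclients --property trusted --value 0
mgs # lctl nodemap_modify --name newclients --property deny_unknown --value 1
mgs # lctl nodemap_activate 1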
Regards,
Thomas


--

Thomas Roth
Department: Informationstechnologie
Location: SB3 2.291


GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de

Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528
Managing Directors / Geschäftsführung:
Professor Dr. Paolo Giubellino, Dr. Katharina Stummeyer, Jörg Blaurock
Chairman of the Supervisory Board / Vorsitzender des GSI-Aufsichtsrats:
Ministerialdirigent Dr. Volkmar Dietz

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org




--

Thomas Roth
Department: Informationstechnologie
Location: SB3 2.291


GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de

Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528
Managing Directors / Geschäftsführung:
Professor Dr. Paolo Giubellino, Dr. Katharina Stummeyer, Jörg Blaurock
Chairman of the Supervisory Board / Vorsitzender des GSI-Aufsichtsrats:
Ministerialdirigent Dr. Volkmar Dietz

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] nodemap exports

2024-06-18 Thread Thomas Roth via lustre-discuss

OK, client 247 is the good one ;-)

It is rather an LNET issue.

All three clients have:

> options lnet networks="o2ib8(ib0),o2ib6(ib0)"

I can mount my nodemapped Lustre, with its  MGS on 10.20.6.63@o2ib8, and 
another Lustre on o2ib6.

However, the log of the MGS at 10.20.6.63@o2ib8 remarks
> MGS: Connection restored to 15da687f-7a11-4913-9f9d-3c764c97a9cb (at 
10.20.3.246@o2ib6)
> MGS: Connection restored to 059da3e0-b277-4767-8678-e83730769fb8 (at 
10.20.3.248@o2ib6)
and then
> MGS: Connection restored to a0ee6b8c-48b9-46e8-ba2c-9448889c77ed (at 
10.20.3.247@o2ib8)


I can see the "alien" nids also in
mgs # ls /proc/fs/lustre/mgs/MGS/exports
... 10.20.3.246@o2ib6  10.20.3.247@o2ib8  10.20.3.248@o2ib6

Question is: Why would an MGS on "o2ib8" accept connections from a client on 
"o2ib6"?

Obviously, this would not happen if the client had only o2ib6. So the MGS is somewhat confused, since 
the LNET connection is actually via o2ib8, but the labels and the nodemapping use o2ib6.




This survives reboots - the MGS has these wrong NIDs stored somewhere. Can I erase them and start 
again with the correct NIDs?



Regards
Thomas



On 6/18/24 17:33, Thomas Roth via lustre-discuss wrote:

Hi all,

what is the meaning of the "exports" property/parameter of a nodemap?

I have

mgs ]# lctl nodemap_info newclients
...

There are three clients:

mgs # lctl get_param nodemap.newclients.ranges
nodemap.newclients.ranges=
[
  { id: 13, start_nid: 10.20.3.246@o2ib8, end_nid: 10.20.3.246@o2ib8 },
  { id: 12, start_nid: 10.20.3.247@o2ib8, end_nid: 10.20.3.247@o2ib8 },
  { id: 9, start_nid: 10.20.3.248@o2ib8, end_nid: 10.20.3.248@o2ib8 }


This nodemap has

nodemap.newclients.admin_nodemap=0
nodemap.newclients.trusted_nodemap=0
nodemap.newclients.deny_unknown=1

and

mgs # lctl get_param nodemap.newclients.exports
nodemap.newclients.exports=
[
  { nid: 10.20.3.247@o2ib8, uuid: 5d9964f9-81eb-4ea5-93dc-a145534f9e74 },
]



_This_ client, 10.20.3.247, behaves differently: No access for root (ok!), no 
access for a regular user.

The other two clients, 10.20.3.246/248, show no access for root (ok!), while a regular user sees the 
squashed (uid 99) directories at the top level and his own directories/files with correct uid/gid 
beneath.



And the only difference between these clients seems to be these 'exports' (totally absent from the 
manual, btw).



Regards,
Thomas




--

Thomas Roth
Department: Informationstechnologie
Location: SB3 2.291


GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de

Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528
Managing Directors / Geschäftsführung:
Professor Dr. Paolo Giubellino, Dr. Katharina Stummeyer, Jörg Blaurock
Chairman of the Supervisory Board / Vorsitzender des GSI-Aufsichtsrats:
Ministerialdirigent Dr. Volkmar Dietz

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] nodemap exports

2024-06-18 Thread Thomas Roth via lustre-discuss

Hi all,

what is the meaning of the "exports" property/parameter of a nodemap?

I have

mgs ]# lctl nodemap_info newclients
...

There are three clients:

mgs # lctl get_param nodemap.newclients.ranges
nodemap.newclients.ranges=
[
 { id: 13, start_nid: 10.20.3.246@o2ib8, end_nid: 10.20.3.246@o2ib8 },
 { id: 12, start_nid: 10.20.3.247@o2ib8, end_nid: 10.20.3.247@o2ib8 },
 { id: 9, start_nid: 10.20.3.248@o2ib8, end_nid: 10.20.3.248@o2ib8 }


This nodemap has

nodemap.newclients.admin_nodemap=0
nodemap.newclients.trusted_nodemap=0
nodemap.newclients.deny_unknown=1

and

mgs # lctl get_param nodemap.newclients.exports
nodemap.newclients.exports=
[
 { nid: 10.20.3.247@o2ib8, uuid: 5d9964f9-81eb-4ea5-93dc-a145534f9e74 },
]



_This_ client, 10.20.3.247, behaves differently: No access for root (ok!), no 
access for a regular user.

The other two clients, 10.20.3.246/248, show no access for root (ok!), while a regular user sees the 
squashed (uid 99) directories at the top level and his own directories/files with correct uid/gid beneath.



And the only difference between these clients seems to be these 'exports' (totally absent from the 
manual, btw).



Regards,
Thomas


--

Thomas Roth
Department: Informationstechnologie
Location: SB3 2.291


GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de

Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528
Managing Directors / Geschäftsführung:
Professor Dr. Paolo Giubellino, Dr. Katharina Stummeyer,Jörg Blaurock
Chairman of the Supervisory Board / Vorsitzender des GSI-Aufsichtsrats:
Ministerialdirigent Dr. Volkmar Dietz

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] default Nodemap: ll_close_inode_openhandle errors

2024-04-23 Thread Thomas Roth via lustre-discuss

Hi all,

as described before, I have a test cluster with the nodemap feature activated, and managed to get a bunch of clients to mount (by deactivating 
selinux), while these clients are in the "default" nodemap.

I left the "default" nodemap's property `admin=0`, set `trusted=1`, and everything 
seemed to work.

Now I ran a compile+install benchmark on these nodes, which finished 
successfully everywhere and with the expected performance.

However, the client logs all show a large number of the following errors:

> LustreError: 2967:0:(mdc_locks.c:1388:mdc_intent_getattr_async_interpret()) mdstest-MDT-mdc-9a0bc355f000: ldlm_cli_enqueue_fini() failed: 
rc = -13


> LustreError: 53842:0:(file.c:241:ll_close_inode_openhandle()) mdstest-clilmv-9a0bc355f000: inode [0x20429:0x9d8d:0x0] mdc close failed: rc 
= -13



Error code 13 is /* Permission denied */

Therefore I repeated the benchmark on one of the "Admin" nodes - the nodemap with both `admin=1` and `trusted=1` - and this client does not show these 
errors.


More than coincidence?

Given that the benchmark seems to have finished successfully, I would ignore these errors. On the other hand, if inodes cannot be handled - that 
sounds severe?
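
For anyone comparing setups, a hedged way to inspect the relevant "default" nodemap properties on the MGS (parameter names as in 2.15):

mgs # lctl get_param nodemap.default.admin_nodemap nodemap.default.trusted_nodemap
mgs # lctl get_param nodemap.default.squash_uid nodemap.default.squash_gid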


Regards,
Thomas
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] default Nodemap : clients cannot mount

2024-04-22 Thread Thomas Roth via lustre-discuss

OK - my bad: selinux was on.

It's a bunch of test hosts = sloppy configuration = default selinux settings.

With selinux=disabled, one of these hosts mounts, and if I give it trusted=1, the users are enabled, 
root is squashed - all fine.
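
For the record, a hedged sketch of the property changes involved, run on the MGS ('default' is the built-in nodemap):

mgs # lctl nodemap_modify --name default --property admin --value 0
mgs # lctl nodemap_modify --name default --property trusted --value 1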


Cheers
Thomas

On 4/22/24 16:50, Thomas Roth via lustre-discuss wrote:

Hi all,

- Lustre 2.15.4 test system  with MDS + 2 OSS + 2 administrative clients.

I activated nodemapping and put all these hosts into an "Admin" nodemap, which has the properties 
`admin=1` and `trusted=1`  - all works fine.


Now there are a couple of other hosts which should not become administrative clients, but just 
standard clients => they should be / remain in the "default" nodemap.


The "default" nodemap has `admin=0` and `trusted=0`, as verified by
`lctl get_param nodemap.default`

- and these hosts cannot mount.
Error message is "l_getidentity: no such user 99"

I verified that these hosts actually are seen as "default" nodes by setting `admin=1` for one of them 
- then it mounts.

Umount, lustre_rmmod, set `admin=0` again - and it does not mount anymore.


At the moment I do not see what I overlooked, but I am certain this has worked before, where "before" would mean 
other hardware and perhaps Lustre 2.15.1.


*Switching* is still ok:
- Put client X into "Admin" nodemap, mount Lustre, remove client X from "Admin" nodemap, wait, try to 
`ls` as root - fails.

Set the property `trusted=1` on the "default" nodemap, wait, try to `ls` as a 
user - works.


However, this defeats the purpose of having a usable default...



Regards,
Thomas





--

Thomas Roth
Department: Informationstechnologie
Location: SB3 2.291


GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de

Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528
Managing Directors / Geschäftsführung:
Professor Dr. Paolo Giubellino, Jörg Blaurock
Chairman of the Supervisory Board / Vorsitzender des GSI-Aufsichtsrats:
Ministerialdirigent Dr. Volkmar Dietz

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] default Nodemap : clients cannot mount

2024-04-22 Thread Thomas Roth via lustre-discuss

Hi all,

- Lustre 2.15.4 test system  with MDS + 2 OSS + 2 administrative clients.

I activated nodemapping and put all these hosts into an "Admin" nodemap, which has the properties 
`admin=1` and `trusted=1`  - all works fine.


Now there are a couple of other hosts which should not become administrative clients, but just 
standard clients => they should be / remain in the "default" nodemap.


The "default" nodemap has `admin=0` and `trusted=0`, as verified by
`lctl get_param nodemap.default`

- and these hosts cannot mount.
Error message is "l_getidentity: no such user 99"

I verified that these hosts actually are seen as "default" nodes by setting `admin=1` for one of them 
- then it mounts.

Umount, lustre_rmmod, set `admin=0` again - and it does not mount anymore.


At the moment I do not see what I overlooked, but I am certain this has worked before, where "before" would mean 
other hardware and perhaps Lustre 2.15.1.


*Switching* is still ok:
- Put client X into "Admin" nodemap, mount Lustre, remove client X from "Admin" nodemap, wait, try to 
`ls` as root - fails.

Set the property `trusted=1` on the "default" nodemap, wait, try to `ls` as a 
user - works.


However, this defeats the purpose of having a usable default...



Regards,
Thomas



--

Thomas Roth
Department: Informationstechnologie
Location: SB3 2.291


GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de

Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528
Managing Directors / Geschäftsführung:
Professor Dr. Paolo Giubellino, Jörg Blaurock
Chairman of the Supervisory Board / Vorsitzender des GSI-Aufsichtsrats:
Ministerialdirigent Dr. Volkmar Dietz

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] ldiskfs / mdt size limits

2024-02-02 Thread Thomas Roth via lustre-discuss

Hi all,

I'm confused about size limits:

I distinctly remember trying to format a ~19 TB disk / LV for use as an MDT 
with ldiskfs, and failing to do so: the max size for the underlying ext4 is 16 
TB.
I knew that, had ignored it, but it was not a problem back then - I just adapted the 
logical volume's size.

Now I have a 24T disk, and neither mkfs.lustre nor Lustre itself has shown any 
issues with it.
'df -h' does show the 24T, 'df -ih' shows the expected 4G of inodes.
I suppose this MDS has a lot of space for directories and stuff, or for DOM.
But why does it work in the first place? Does ldiskfs extend beyond all limits 
these days?
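
A hedged guess at the explanation, with a way to check it: the 16 TB ceiling applies to ext4 without the 64bit feature, and recent ldiskfsprogs/mkfs.lustre enable 64bit on large devices; whether that happened here can be verified on the MDS (device path is a placeholder):

mds # dumpe2fs -h /dev/<mdt_device> | grep -i features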

Regards,
Thomas

--

Thomas Roth
Department: Informationstechnologie
Location: SB3 2.291


GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de

Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528
Managing Directors / Geschäftsführung:
Professor Dr. Paolo Giubellino, Jörg Blaurock
Chairman of the Supervisory Board / Vorsitzender des GSI-Aufsichtsrats:
Ministerialdirigent Dr. Volkmar Dietz

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] [EXTERNAL] [BULK] MDS hardware - NVME?

2024-01-10 Thread Thomas Roth via lustre-discuss

Actually we had MDTs on software raid-1 *connecting two JBODs* for quite some 
time - it worked surprisingly well and stably.

Still, personally I would prefer ZFS anytime. Nowadays all our OSTs are 
on ZFS, very stable.
Of course, a look at all the possible ZFS parameters tells me that surely I 
have overlooked a crucial tuning tweak ;-)


Hmm, if you have your MDTs on a zpool of mirrors, aka raid-10, wouldn't going 
towards raidz2 increase data safety - something you don't need if the SSDs 
never fail anyway? Doesn't raidz2 protect against the failure of *any* two disks, 
whereas in a pool of mirrors a second failure could destroy one mirror?
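
For illustration, a hedged sketch of the two pool layouts being compared (pool and device names are placeholders):

zpool create mdt0pool mirror nvme0n1 nvme1n1 mirror nvme2n1 nvme3n1   # raid-10 style: fast, but a second failure inside one mirror is fatal
zpool create mdt0pool raidz2 nvme0n1 nvme1n1 nvme2n1 nvme3n1          # raidz2: survives the failure of any two of the four devices, at some IOPS cost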


Regards
Thomas

On 1/9/24 20:57, Cameron Harr via lustre-discuss wrote:

Thomas,

We value management over performance and have knowingly left performance on the 
floor in the name of standardization, robustness, management, etc; while still 
maintaining our performance targets. We are a heavy ZFS-on-Linux (ZoL) shop so 
we never considered MD-RAID, which, IMO, is very far behind ZoL in enterprise 
storage features.

As Jeff mentioned, we have done some tuning (and if you haven't noticed there 
are *a lot* of possible ZFS parameters) to further improve performance and are 
at a good place performance-wise.

Cameron

On 1/8/24 10:33, Jeff Johnson wrote:

Today nvme/mdraid/ldiskfs will beat nvme/zfs on MDS IOPs but you can
close the gap somewhat with tuning, zfs ashift/recordsize and special
allocation class vdevs. While the IOPs performance favors
nvme/mdraid/ldiskfs there are tradeoffs. The snapshot/backup abilities
of ZFS and the security it provides to the most critical function in a
Lustre file system shouldn't be undervalued. From personal experience,
I'd much rather deal with zfs in the event of a seriously jackknifed
MDT than mdraid/ldiskfs and both zfs and mdraid/ldiskfs are preferable
to trying to unscramble a vendor blackbox hwraid volume. ;-)

When zfs directio lands and is fully integrated into Lustre the
performance differences *should* be negligible.

Just my $.02 worth

On Mon, Jan 8, 2024 at 8:23 AM Thomas Roth via lustre-discuss
 wrote:

Hi Cameron,

did you run a performance comparison between ZFS and mdadm-raid on the MDTs?
I'm currently doing some tests, and the results favor software raid, in 
particular when it comes to IOPS.

Regards
Thomas

On 1/5/24 19:55, Cameron Harr via lustre-discuss wrote:

This doesn't answer your question about ldiskfs on zvols, but we've been 
running MDTs on ZFS on NVMe in production for a couple years (and on SAS SSDs 
for many years prior). Our current production MDTs using NVMe consist of one 
zpool/node made up of 3x 2-drive mirrors, but we've been experimenting lately 
with using raidz3 and possibly even raidz2 for MDTs since SSDs have been pretty 
reliable for us.

Cameron

On 1/5/24 9:07 AM, Vicker, Darby J. (JSC-EG111)[Jacobs Technology, Inc.] via 
lustre-discuss wrote:

We are in the process of retiring two long standing LFS's (about 8 years old), 
which we built and managed ourselves.  Both use ZFS and have the MDT'S on ssd's 
in a JBOD that require the kind of software-based management you describe, in 
our case ZFS pools built on multipath devices.  The MDT in one is ZFS and the 
MDT in the other LFS is ldiskfs but uses ZFS and a zvol as you describe - we 
build the ldiskfs MDT on top of the zvol.  Generally, this has worked well for 
us, with one big caveat.  If you look for my posts to this list and the ZFS 
list you'll find more details.  The short version is that we utilize ZFS 
snapshots and clones to do backups of the metadata.  We've run into situations 
where the backup process stalls, leaving a clone hanging around.  We've 
experienced a situation a couple of times where the clone and the primary zvol 
get swapped, effectively rolling back our metadata to the point when the clone 
was created.  I have tried, unsuccessfully, to recreate
that in a test environment.  So if you do that kind of setup, make sure you 
have good monitoring in place to detect if your backups/clones stall.  We've 
kept up with lustre and ZFS updates over the years and are currently on lustre 
2.14 and ZFS 2.1.  We've seen the gap between our ZFS MDT and ldiskfs 
performance shrink to the point where they are pretty much on par to each now.  
I think our ZFS MDT performance could be better with more hardware and software 
tuning but our small team hasn't had the bandwidth to tackle that.

Our newest LFS is vendor provided and uses NVMe MDT's.  I'm not at liberty to 
talk about the proprietary way those devices are managed.  However, the 
metadata performance is SO much better than our older LFS's, for a lot of 
reasons, but I'd highly recommend NVMe's for your MDT's.

-Original Message-
From: lustre-discuss mailto:lustre-discuss-boun...@lists.lustre.org>> on behalf of Thomas Roth via lus

Re: [lustre-discuss] Extending Lustre file system

2024-01-08 Thread Thomas Roth via lustre-discuss

Yes, sorry, I meant the actual procedure of mounting the OSTs for the first 
time.

Last year I did that with 175 OSTs - replacements for EOL hardware. All OSTs had been formatted with a specific index, so probably creating a suitable 
/etc/fstab everywhere and sending a 'mount -a -t lustre' to all OSS nodes simultaneously would have worked.


But why the hurry? Instead, I logged in to my new OSS, mounted the OSTs with 2 sec between each mount command, watched the OSS log, watched the MDS 
log, saw the expected log messages, and proceeded to the next OSS - all fine ;-)  Such a leisurely approach takes its time, of course.


Once all OSTs were happily incorporated, we raised the max_create_count (set to 0 before) to some finite value and started file migration. As long as the 
migration is more effective - faster - than the users' file creation, the result should be evenly filled OSTs with a good mixture of files (file 
sizes, ages, types).
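
As a hedged reminder of the knob involved - it is set on each MDS, and the OST index and final value here are placeholders:

mds # lctl set_param osp.<fsname>-OST0123*.max_create_count=0      # keep new objects off that OST
mds # lctl set_param osp.<fsname>-OST0123*.max_create_count=20000  # later: allow creates again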



Cheers
Thomas

On 1/8/24 19:07, Andreas Dilger wrote:

The need to rebalance depends on how full the existing OSTs are.  My 
recommendation if you know that the data will continue to grow is to add new 
OSTs when the existing ones are at 60-70% full, and add them in larger groups 
rather than one at a time.

Cheers, Andreas


On Jan 8, 2024, at 09:29, Thomas Roth via lustre-discuss 
 wrote:

Just mount the OSTs, one by one, and perhaps not while your system is heavily 
loaded. Follow what happens in the MDS log and the OSS log.
And try to rebalance the OSTs' fill levels afterwards - very empty OSTs will 
attract all new files, which might be hot and direct your users' fire to your 
new OSS only.

Regards,
Thomas


On 1/8/24 15:38, Backer via lustre-discuss wrote:
Hi,
Good morning and happy new year!
I have a quick question on extending a lustre file system. The extension is 
performed online. I am looking for any best practices or anything to watchout 
while doing the file system extension. The file system extension is done adding 
new OSS and many OSTs within these servers.
Really appreciate your help on this.
Regards,
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Extending Lustre file system

2024-01-08 Thread Thomas Roth via lustre-discuss

Just mount the OSTs, one by one, and perhaps not while your system is heavily 
loaded. Follow what happens in the MDS log and the OSS log.
And try to rebalance the OSTs' fill levels afterwards - very empty OSTs will 
attract all new files, which might be hot and direct your users' fire to your 
new OSS only.
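
A hedged sketch of that rebalancing step, in the style of the lfs_migrate example in the manual (file system path and OST index are placeholders):

client # lfs df -h /<fsname>
client # lfs find /<fsname> --ost <full_OST_index> --size +1G | lfs_migrate -y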

Regards,
Thomas

On 1/8/24 15:38, Backer via lustre-discuss wrote:

Hi,

Good morning and happy new year!

I have a quick question on extending a lustre file system. The extension is 
performed online. I am looking for any best practices or anything to watchout 
while doing the file system extension. The file system extension is done adding 
new OSS and many OSTs within these servers.

Really appreciate your help on this.

Regards,




___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] [EXTERNAL] [BULK] MDS hardware - NVME?

2024-01-08 Thread Thomas Roth via lustre-discuss

Hi Cameron,

did you run a performance comparison between ZFS and mdadm-raid on the MDTs?
I'm currently doing some tests, and the results favor software raid, in 
particular when it comes to IOPS.

Regards
Thomas

On 1/5/24 19:55, Cameron Harr via lustre-discuss wrote:

This doesn't answer your question about ldiskfs on zvols, but we've been 
running MDTs on ZFS on NVMe in production for a couple years (and on SAS SSDs 
for many years prior). Our current production MDTs using NVMe consist of one 
zpool/node made up of 3x 2-drive mirrors, but we've been experimenting lately 
with using raidz3 and possibly even raidz2 for MDTs since SSDs have been pretty 
reliable for us.

Cameron

On 1/5/24 9:07 AM, Vicker, Darby J. (JSC-EG111)[Jacobs Technology, Inc.] via 
lustre-discuss wrote:
We are in the process of retiring two long standing LFS's (about 8 years old), which we built and managed ourselves.  Both use ZFS and have the MDT'S on ssd's in a JBOD that require the kind of software-based management you describe, in our case ZFS pools built on multipath devices.  The MDT in one is ZFS and the MDT in the other LFS is ldiskfs but uses ZFS and a zvol as you describe - we build the ldiskfs MDT on top of the zvol.  Generally, this has worked well for us, with one big caveat.  If you look for my posts to this list and the ZFS list you'll find more details.  The short version is that we utilize ZFS snapshots and clones to do backups of the metadata.  We've run into situations where the backup process stalls, leaving a clone hanging around.  We've experienced a situation a couple of times where the clone and the primary zvol get swapped, effectively rolling back our metadata to the point when the clone was created.  I have tried, unsuccessfully, to recreate 
that in a test environment.  So if you do that kind of setup, make sure you have good monitoring in place to detect if your backups/clones stall.  We've kept up with lustre and ZFS updates over the years and are currently on lustre 2.14 and ZFS 2.1.  We've seen the gap between our ZFS MDT and ldiskfs performance shrink to the point where they are pretty much on par to each now.  I think our ZFS MDT performance could be better with more hardware and software tuning but our small team hasn't had the bandwidth to tackle that.


Our newest LFS is vendor provided and uses NVMe MDT's.  I'm not at liberty to 
talk about the proprietary way those devices are managed.  However, the 
metadata performance is SO much better than our older LFS's, for a lot of 
reasons, but I'd highly recommend NVMe's for your MDT's.

-Original Message-
From: lustre-discuss mailto:lustre-discuss-boun...@lists.lustre.org>> on behalf of Thomas Roth via lustre-discuss 
mailto:lustre-discuss@lists.lustre.org>>
Reply-To: Thomas Roth mailto:t.r...@gsi.de>>
Date: Friday, January 5, 2024 at 9:03 AM
To: Lustre Diskussionsliste mailto:lustre-discuss@lists.lustre.org>>
Subject: [EXTERNAL] [BULK] [lustre-discuss] MDS hardware - NVME?


CAUTION: This email originated from outside of NASA. Please take care when clicking links 
or opening attachments. Use the "Report Message" button to report suspicious 
messages to the NASA SOC.








Dear all,


considering NVME storage for the next MDS.


As I understand, NVME disks are bundled in software, not by a hardware raid 
controller.
This would be done using Linux software raid, mdadm, correct?


We have some experience with ZFS, which we use on our OSTs.
But I would like to stick to ldiskfs for the MDTs, and a zpool with a zvol on 
top, which is then formatted with ldiskfs, is too much voodoo...


How is this handled elsewhere? Any experiences?




The available devices are quite large. If I create a raid-10 out of 4 disks, 
e.g. 7 TB each, my MDT will be 14 TB - already close to the 16 TB limit.
So no need for a box with lots of U.3 slots.


But for MDS operations, we will still need a powerful dual-CPU system with lots 
of RAM.
Then the NVME devices should be distributed between the CPUs?
Is there a way to pinpoint this in a call for tender?




Best regards,
Thomas



Thomas Roth


GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de


Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528
Managing Directors / Geschäftsführung:
Professor Dr. Paolo Giubellino, Dr. Ulrich Breuer, Jörg Blaurock
Chairman of the Supervisory Board / Vorsitzender des GSI-Aufsichtsrats:
St

[lustre-discuss] MDS hardware - NVME?

2024-01-05 Thread Thomas Roth via lustre-discuss

Dear all,

considering NVME storage for the next MDS.

As I understand, NVME disks are bundled in software, not by a hardware raid 
controller.
This would be done using Linux software raid, mdadm, correct?
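
For illustration, a hedged sketch of that approach - device names, file system name and MGS NID are placeholders, not a recommendation:

mds # mdadm --create /dev/md0 --level=10 --raid-devices=4 /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1
mds # mkfs.lustre --mdt --fsname=<fsname> --index=0 --mgsnode=<mgs_nid> /dev/md0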

We have some experience with ZFS, which we use on our OSTs.
But I would like to stick to ldiskfs for the MDTs, and a zpool with a zvol on 
top, which is then formatted with ldiskfs, is too much voodoo...

How is this handled elsewhere? Any experiences?


The available devices are quite large. If I create a raid-10 out of 4 disks, e.g. 7 TB each, my MDT will be 14 TB - already close to the 16 TB limit. 
So no need for a box with lots of U.3 slots.


But for MDS operations, we will still need a powerful dual-CPU system with lots 
of RAM.
Then the NVME devices should be distributed between the CPUs?
Is there a way to pinpoint this in a call for tender?


Best regards,
Thomas


Thomas Roth

GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de

Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528
Managing Directors / Geschäftsführung:
Professor Dr. Paolo Giubellino, Dr. Ulrich Breuer, Jörg Blaurock
Chairman of the Supervisory Board / Vorsitzender des GSI-Aufsichtsrats:
State Secretary / Staatssekretär Dr. Volkmar Dietz


___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] adding a new OST to live system

2024-01-05 Thread Thomas Roth via lustre-discuss

Not a problem at all.

Perhaps if you manage to mount your new OST for the first time just when your MGS/MDT and your network are completely overloaded and almost 
unresponsive, then, perhaps, there might be issues ;-)


Afterwards the new OST, being empty, will attract most of the files that are newly created. That could result in an imbalance - old, cold data vs. 
new, hot data. In our case, we migrate some of the old data around, such that the fill level of the OSTs becomes ~equal.


Regards,
Thomas

On 12/1/23 19:18, Lana Deere via lustre-discuss wrote:

I'm looking at the manual, 14.8, Adding a New OST to a Lustre File
System, and it looks straightforward.  It isn;'t clear to me, however,
whether it is OK to do this while the rest of the lustre system is
live.  Is it OK to add a new OST while the system is in use?  Or do I
need to arrange downtime for the system to do this?

Thanks.

.. Lana (lana.de...@gmail.com)
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] OST is not mounting

2023-11-13 Thread Thomas Roth via lustre-discuss

So, did you do the "writeconf"? And the OST mounted afterwards?

As I understand it, the MGS was under the impression that this re-mounted 
OST was actually a new one using an old index.

So, what made your repaired OST look new/different ?
I would probably have mounted it locally, as an ext4 file system, if 
only to check that there is data still present (ok, "df" would do that, 
too).
"tunefs.lustre --dryrun"  will show other quantum numbers that _should 
not_ change when taking down and remounting an OST.


And since "writeconf" has to be done on all targets, you have to take 
down your MDS anyhow - so nothing is lost by simply trying an MDS restart?
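
For completeness, a hedged outline of the writeconf procedure as the manual describes it (device paths are placeholders; all clients and targets must be unmounted first):

mds # tunefs.lustre --writeconf /dev/<mdt_device>    # MGS/MDT first
oss # tunefs.lustre --writeconf /dev/<ost_device>    # then every OST on every OSS
(then remount in order: MGS/MDT, then OSTs, then clients)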


Regards
Thomas

On 11/5/23 17:11, Backer via lustre-discuss wrote:

Hi,

I am new to this email list. Looking to get some help on why an OST is 
not getting mounted.



The cluster was running healthy and the OST experienced an issue and 
Linux re-mounted the OST read only. After fixing the issue and rebooting 
the node multiple times, it wouldn't mount.


When the mount is done, the mount command errors out stating that that 
the index is already in use. The index for the device is 33.  There is 
no place where this index is mounted.


The debug message from the MGS during the mount is attached at the end 
of this email. It is asking to use writeconf. After using writeconfig, 
the device was mounted. Looking for a couple of things here.


- I am hoping that the writeconf method is the right thing to do here.
- Why did the OST end up in this state after the write failure, when it was 
remounted read-only? The write error was due to the iSCSI target going offline and 
coming back a few seconds later.


2000:0100:17.0:1698240468.758487:0:91492:0:(mgs_handler.c:496:mgs_target_reg())
 updating fs1-OST0021, index=33

2000:0001:17.0:1698240468.758488:0:91492:0:(mgs_llog.c:4403:mgs_write_log_target())
 Process entered

2000:0001:17.0:1698240468.758488:0:91492:0:(mgs_llog.c:671:mgs_set_index())
 Process entered

2000:0001:17.0:1698240468.758488:0:91492:0:(mgs_llog.c:572:mgs_find_or_make_fsdb())
 Process entered

2000:0001:17.0:1698240468.758489:0:91492:0:(mgs_llog.c:551:mgs_find_or_make_fsdb_nolock())
 Process entered

2000:0001:17.0:1698240468.758489:0:91492:0:(mgs_llog.c:565:mgs_find_or_make_fsdb_nolock())
 Process leaving (rc=0 : 0 : 0)

2000:0001:17.0:1698240468.758489:0:91492:0:(mgs_llog.c:578:mgs_find_or_make_fsdb())
 Process leaving (rc=0 : 0 : 0)

2000:0202:17.0:1698240468.758490:0:91492:0:(mgs_llog.c:711:mgs_set_index())
 140-5: Server fs1-OST0021 requested index 33, but that index is already in 
use. Use --writeconf to force

2000:0001:17.0:1698240468.772355:0:91492:0:(mgs_llog.c:712:mgs_set_index())
 Process leaving via out_up (rc=18446744073709551518 : -98 : 0xff9e)

2000:0001:17.0:1698240468.772356:0:91492:0:(mgs_llog.c:4408:mgs_write_log_target())
 Process leaving (rc=18446744073709551518 : -98 : ff9e)

2000:0002:17.0:1698240468.772357:0:91492:0:(mgs_handler.c:503:mgs_target_reg())
 Failed to write fs1-OST0021 log (-98)

2000:0001:17.0:1698240468.783747:0:91492:0:(mgs_handler.c:504:mgs_target_reg())
 Process leaving via out (rc=18446744073709551518 : -98 : 0xff9e)




___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Lustre-Manual on lfsck - non-existing entries?

2023-11-01 Thread Thomas Roth via lustre-discuss

Thanks!
In the end, it was a typo.

But your explanation about parameters having wandered off to debugfs has 
helped me find some other info, long lost from /proc and not 
Lustre-related ;-)


Regards
Thomas

On 10/31/23 22:34, Andreas Dilger wrote:

On Oct 31, 2023, at 13:12, Thomas Roth via lustre-discuss 
 wrote:


Hi all,

after starting an `lctl lfsck_start -A  -C -o` and the oi_scrub having completed, I would 
check the layout scan as described in the Lustre manual, "36.4.3.3. LFSCK status of 
layout via procfs", by


lctl get_param -n mdd.FSNAME-MDT_target.lfsck_layout


Doesn't work, and inspection of 'ls /sys/fs/lustre/mdd/FSNAME-MDT/' shows:

...
lfsck_async_windows
lfsck_speed_limit

...

as the only entries showing the string "lfsck".


lctl lfsck_query -M FSNAME-MDT -t layout


does show some info, although it is not what the manual describes as output of 
the `lctl get_param` command.


Issue with the manual or issue with our Lustre?


Are you perhaps running the "lctl get_param" as a non-root user?  One of the wonderful quirks of 
the kernel is that they don't want new parameters stored in procfs, and they don't want "complex" 
parameters (more than one value) stored in sysfs, so by necessity this means anything "complex" 
needs to go into debugfs (/sys/kernel/debug) but that was changed at some point to only be accessible by root.

As such, you need to be root to access any of the "complex" parameters/stats:

   $ lctl get_param mdd.*.lfsck_layout
   error: get_param: param_path 'mdd/*/lfsck_layout': No such file or directory

   $ sudo lctl get_param mdd.*.lfsck_layout
   mdd.myth-MDT.lfsck_layout=
   name: lfsck_layout
   magic: 0xb1732fed
   version: 2
   status: completed
   flags:
   param: all_targets
   last_completed_time: 1694676243
   time_since_last_completed: 4111337 seconds
   latest_start_time: 1694675639
   time_since_latest_start: 4111941 seconds
   last_checkpoint_time: 1694676243
   time_since_last_checkpoint: 4111337 seconds
   latest_start_position: 12
   last_checkpoint_position: 4194304
   first_failure_position: 0
   success_count: 6
   repaired_dangling: 0
   repaired_unmatched_pair: 0
   repaired_multiple_referenced: 0
   repaired_orphan: 0
   repaired_inconsistent_owner: 0
   repaired_others: 0
   skipped: 0
   failed_phase1: 0
   failed_phase2: 0
   checked_phase1: 3791402
   checked_phase2: 0
   run_time_phase1: 595 seconds
   run_time_phase2: 8 seconds
   average_speed_phase1: 6372 items/sec
   average_speed_phase2: 0 objs/sec
   real_time_speed_phase1: N/A
   real_time_speed_phase2: N/A
   current_position: N/A

   $ sudo ls /sys/kernel/debug/lustre/mdd/myth-MDT/
   total 0
   0 changelog_current_mask  0 changelog_users  0 lfsck_namespace
   0 changelog_mask  0 lfsck_layout

Getting an update to the manual to clarify this requirement would be welcome.

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Whamcloud








___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Lustre-Manual on lfsck - non-existing entries?

2023-10-31 Thread Thomas Roth via lustre-discuss

Hi all,

after starting an `lctl lfsck_start -A  -C -o` and the oi_scrub having completed, I would check the layout scan as described in the Lustre manual, 
"36.4.3.3. LFSCK status of layout via procfs", by


> lctl get_param -n mdd.FSNAME-MDT_target.lfsck_layout

Doesn't work, and inspection of 'ls /sys/fs/lustre/mdd/FSNAME-MDT/' shows:
> ...
> lfsck_async_windows
> lfsck_speed_limit
...

as the only entries showing the string "lfsck".

> lctl lfsck_query -M FSNAME-MDT -t layout

does show some info, although it is not what the manual describes as output of 
the `lctl get_param` command.



Issue with the manual or issue with our Lustre?


Regards
Thomas
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] OST went back in time: no(?) hardware issue

2023-10-04 Thread Thomas Roth via lustre-discuss

Hi Andreas,

On 10/5/23 02:30, Andreas Dilger wrote:

On Oct 3, 2023, at 16:22, Thomas Roth via lustre-discuss 
 wrote:


Hi all,

in our Lustre 2.12.5 system, we have "OST went back in time" after OST hardware 
replacement:
- hardware had reached EOL
- we set `max_create_count=0` for these OSTs, searched for and migrated off the 
files of these OSTs
- formatted the new OSTs with `--replace` and the old indices
- all OSTs are on ZFS
- set the OSTs `active=0` on our 3 MDTs
- moved in the new hardware, reused the old NIDs, old OST indices, mounted the 
OSTs
- set the OSTs `active=1`
- ran `lfsck` on all servers
- set `max_create_count=200` for these OSTs

Now the "OST went back in time" messages appeard in the MDS logs.

This doesn't quite fit the description in the manual. There were no crashes or 
power losses. I cannot understand how or which cache might have been lost.
The transaction numbers quoted in the error are both large, eg. `transno 
55841088879 was previously committed, server now claims 4294992012`

What should we do? Give `lfsck` another try?


Nothing really to see here I think?

Did you delete LAST_RCVD during the replacement and the OST didn't know what 
transno was assigned to the last RPCs it sent?  The still-mounted clients have 
a record of this transno and are surprised that it was reset.  If you unmount 
and remount the clients the error would go away.



No, I don't think I deleted anything during the procedure. 
- The old OST was emptied (max_create_count=0) in normal Lustre operations. 
Last transaction should be ~ last file being moved away.
- Then the OST is deactivated, but only on the MDS, not on the clients.
- Then the new OST, formatted with '--replace', is mounted. It is activated on 
the MDS. Up to this point no errors.
- Finally, the max_create_count is increased, clients can write.
- Now the MDT throws this error (nothing in the client logs).

According to the manual, what should have happened when I mounted the new OST,

The MDS and OSS will negotiate the LAST_ID value for the replacement OST.


Ok, this is about LAST_ID, wherever that is on ZFS.

About LAST_RCVD, the manual says (even in the case when the configuration files 
got lost and have to be recreated):

The last_rcvd file will be recreated when the OST is first mounted using the 
default parameters,



So, let's see what happens once the clients remount.
Eventually, then, I should also restart the MDTs?


Regards,
Thomas



I'm not sure if the clients might try to preserve the next 55B RPCs in memory 
until the committed transno on the OST catches up, or if they just accept the 
new transno and get on with life?

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Whamcloud








___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] OST went back in time: no(?) hardware issue

2023-10-03 Thread Thomas Roth via lustre-discuss

Hi all,

in our Lustre 2.12.5 system, we have "OST went back in time" after OST hardware 
replacement:
- hardware had reached EOL
- we set `max_create_count=0` for these OSTs, searched for and migrated off the 
files of these OSTs
- formatted the new OSTs with `--replace` and the old indices
- all OSTs are on ZFS
- set the OSTs `active=0` on our 3 MDTs
- moved in the new hardware, reused the old NIDs, old OST indices, mounted the 
OSTs
- set the OSTs `active=1`
- ran `lfsck` on all servers
- set `max_create_count=200` for these OSTs

Now the "OST went back in time" messages appeard in the MDS logs.

This doesn't quite fit the description in the manual. There were no crashes or 
power losses. I cannot understand how or which cache might have been lost.
The transaction numbers quoted in the error are both large, eg. `transno 
55841088879 was previously committed, server now claims 4294992012`

What should we do? Give `lfsck` another try?
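
For reference, a hedged sketch of the replacement formatting step used above (pool, dataset, vdevs, index and NID are placeholders):

oss # mkfs.lustre --ost --backfstype=zfs --fsname=<fsname> --index=<old_index> --replace \
      --mgsnode=<mgs_nid> <ostpool>/<ost_dataset> mirror <disk1> <disk2>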

Regards,
Thomas


--

Thomas Roth
Department: IT

GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de

Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528
Managing Directors / Geschäftsführung:
Professor Dr. Paolo Giubellino, Jörg Blaurock
Chairman of the Supervisory Board / Vorsitzender des GSI-Aufsichtsrats:
State Secretary / Staatssekretär Dr. Volkmar Dietz

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Install instructions for Rocky 8.8

2023-09-26 Thread Thomas Roth via lustre-discuss
Also, I think you want to check out some release branch first: your compilation gave you "2.15.58" packages - this is probably an intermediate 
development version. At least I seem to remember a warning from Andreas about these multi-digit sub-versions.

According to lustre.org, the current release is 2.15.3 - perhaps this works a 
little better.
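
A hedged sketch of checking out a release instead of master (the exact tag/branch names are assumptions - list them with 'git tag -l' and 'git branch -r'):

git clone git://git.whamcloud.com/fs/lustre-release.git
cd lustre-release
git checkout b2_15        # maintenance branch, or a release tag such as 2.15.3
sh autogen.sh && ./configure && make rpms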

On 25/09/2023 11.52, Jan Andersen wrote:

I'm having some trouble installing lustre - this is on Rocky 8.8. I downloaded 
the latest (?) source:

git clone git://git.whamcloud.com/fs/lustre-release.git

and I managed to compile and create the RPMs:

make rpms

I now have a directory full of rpm files:

[root@rocky8 lustre-release]# ls -1 ?*rpm
kmod-lustre-2.15.58_42_ga54a206-1.el8.x86_64.rpm
kmod-lustre-debuginfo-2.15.58_42_ga54a206-1.el8.x86_64.rpm
kmod-lustre-osd-ldiskfs-2.15.58_42_ga54a206-1.el8.x86_64.rpm
kmod-lustre-osd-ldiskfs-debuginfo-2.15.58_42_ga54a206-1.el8.x86_64.rpm
kmod-lustre-tests-2.15.58_42_ga54a206-1.el8.x86_64.rpm
kmod-lustre-tests-debuginfo-2.15.58_42_ga54a206-1.el8.x86_64.rpm
lustre-2.15.58_42_ga54a206-1.el8.x86_64.rpm
lustre-2.15.58_42_ga54a206-1.src.rpm
lustre-debuginfo-2.15.58_42_ga54a206-1.el8.x86_64.rpm
lustre-debugsource-2.15.58_42_ga54a206-1.el8.x86_64.rpm
lustre-devel-2.15.58_42_ga54a206-1.el8.x86_64.rpm
lustre-iokit-2.15.58_42_ga54a206-1.el8.x86_64.rpm
lustre-osd-ldiskfs-mount-2.15.58_42_ga54a206-1.el8.x86_64.rpm
lustre-osd-ldiskfs-mount-debuginfo-2.15.58_42_ga54a206-1.el8.x86_64.rpm
lustre-resource-agents-2.15.58_42_ga54a206-1.el8.x86_64.rpm
lustre-tests-2.15.58_42_ga54a206-1.el8.x86_64.rpm
lustre-tests-debuginfo-2.15.58_42_ga54a206-1.el8.x86_64.rpm

This is what I get when I, somewhat naively, try to simply install the lot with:

[root@rocky8 lustre-release]# dnf install ?*rpm
Last metadata expiration check: 0:12:59 ago on Mon 25 Sep 2023 09:29:52 UTC.
Error:
  Problem 1: conflicting requests
   - nothing provides ldiskfsprogs >= 1.44.3.wc1 needed by 
kmod-lustre-osd-ldiskfs-2.15.58_42_ga54a206-1.el8.x86_64
  Problem 2: conflicting requests
   - nothing provides ldiskfsprogs > 1.45.6 needed by 
lustre-osd-ldiskfs-mount-2.15.58_42_ga54a206-1.el8.x86_64
  Problem 3: package lustre-2.15.58_42_ga54a206-1.el8.x86_64 requires 
lustre-osd-mount, but none of the providers can be installed
   - conflicting requests
   - nothing provides ldiskfsprogs > 1.45.6 needed by 
lustre-osd-ldiskfs-mount-2.15.58_42_ga54a206-1.el8.x86_64
  Problem 4: package lustre-devel-2.15.58_42_ga54a206-1.el8.x86_64 requires 
liblnetconfig.so.4()(64bit), but none of the providers can be installed
   - package lustre-devel-2.15.58_42_ga54a206-1.el8.x86_64 requires 
liblustreapi.so.1()(64bit), but none of the providers can be installed
   - package lustre-devel-2.15.58_42_ga54a206-1.el8.x86_64 requires lustre = 
2.15.58_42_ga54a206, but none of the providers can be installed
   - package lustre-2.15.58_42_ga54a206-1.el8.x86_64 requires lustre-osd-mount, 
but none of the providers can be installed
   - conflicting requests
   - nothing provides ldiskfsprogs > 1.45.6 needed by 
lustre-osd-ldiskfs-mount-2.15.58_42_ga54a206-1.el8.x86_64
  Problem 5: package lustre-resource-agents-2.15.58_42_ga54a206-1.el8.x86_64 
requires lustre, but none of the providers can be installed
   - package lustre-2.15.58_42_ga54a206-1.el8.x86_64 requires lustre-osd-mount, 
but none of the providers can be installed
   - conflicting requests
   - nothing provides ldiskfsprogs > 1.45.6 needed by 
lustre-osd-ldiskfs-mount-2.15.58_42_ga54a206-1.el8.x86_64
  Problem 6: package lustre-tests-2.15.58_42_ga54a206-1.el8.x86_64 requires 
liblnetconfig.so.4()(64bit), but none of the providers can be installed
   - package lustre-tests-2.15.58_42_ga54a206-1.el8.x86_64 requires 
liblustreapi.so.1()(64bit), but none of the providers can be installed
   - package lustre-tests-2.15.58_42_ga54a206-1.el8.x86_64 requires lustre = 
2.15.58_42_ga54a206, but none of the providers can be installed
   - package lustre-2.15.58_42_ga54a206-1.el8.x86_64 requires lustre-osd-mount, 
but none of the providers can be installed
   - conflicting requests
   - nothing provides ldiskfsprogs > 1.45.6 needed by 
lustre-osd-ldiskfs-mount-2.15.58_42_ga54a206-1.el8.x86_64


Clearly there is something I haven't done yet, but what am I doing wrong?

/jan

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] 2.15 install failure

2023-08-04 Thread Thomas Roth via lustre-discuss

Hi all,

returning to my Lustre installations, the curious failures continue...

- Download of 2.15.3 for el8.8 from Whamcloud
- Installation of a server with rocky 8.8 (I mean, why not,  while it still 
exists...)
- Want an ldiskfs server, so
> dnf install lustre lustre-osd-ldiskfs-mount lustre-ldiskfs-dkms
 -->  Fails because the full ext4 source is not present.

I wonder whether I got the workaround from this mailing list, but it should 
really be in some official documentation or better not necessary at all:

- Rocky 8.8 installs with kernel 4.18.0-477.15.1, so download 
'kernel-4.18.0-477.15.1.el8_8.src.rpm'
> rpm -i ./kernel-4.18.0-477.15.1.el8_8.src.rpm
> tar xJf rpmbuild/SOURCES/linux-4.18.0-477.15.1.el8_8.tar.xz
> cp -a linux-4.18.0-477.15.1.el8_8/fs/ext4/* 
/usr/src/kernels/4.18.0-477.15.1.el8_8.x86_64/fs/ext4/


Of course, at this stage, 'lustre-ldiskfs-dkms' is already installed, so
> dnf reinstall lustre-ldiskfs-dkms

This plainly prints out that dkms is successfully installing / compiling all 
the modules, then prints
>  Running scriptlet: lustre-ldiskfs-dkms-2.15.3-1.el8.noarch 
  > 2/2

> Deprecated feature: REMAKE_INITRD 
(/var/lib/dkms/lustre-ldiskfs/2.15.3/source/dkms.conf)
> Deprecated feature: REMAKE_INITRD 
(/var/lib/dkms/lustre-ldiskfs/2.15.3/source/dkms.conf)
> Module lustre-ldiskfs-2.15.3 for kernel 4.18.0-477.15.1.el8_8.x86_64 (x86_64).
> Before uninstall, this module version was ACTIVE on this kernel.
> Removing any linked weak-modules

and then uninstalls all the modules.

Even /var/lib/dkms/lustre-ldiskfs gets removed, so this machine is clean and pristine - except that dnf/rpm believes that lustre-ldiskfs-dkms is 
already installed. ;-)


(These messages, printed between creation and destruction, do not really 
indicate any kind of trouble, do they?)


Well, we all know we are dealing with computers and not with deterministic 
machines, so
> dnf remove lustre lustre-ldiskfs-dkms lustre-osd-ldiskfs-mount
and
> dnf install lustre-ldiskfs-dkms

(Drum roll...) Lustre modules get compiled, installed _and_ _not_ removed.


('modprobe lustre' works, 'dnf install lustre lustre-osd-ldiskfs-mount' does 
not create new havoc)

I'm flabbergasted and really have no idea how I misconfigured a simple, minimal 
el8.8 installation into this kind of behavior.


Cheers
Thomas


--
Thomas Roth   IT-HPC-Linux
Location: SB3 2.291   Phone: 1453

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Configuring LustreFS Over DRBD

2023-03-24 Thread Thomas Roth via lustre-discuss

Hi Shambhu,

I also think active-active is not possible here - two NIDs for the same target? - but we have been running our MDTs on top of DRBD, which works quite 
well. The last time I compared this setup against a mirror of storage targets, DRBD was actually a bit faster.

And it might improve if you use protocol B or A instead of C.
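
A hedged sketch of where that protocol choice lives in a DRBD resource definition (host names, devices and addresses are placeholders):

resource r_mdt {
  net { protocol B; }   # or A; C is the fully synchronous default
  on mds1 { device /dev/drbd0; disk /dev/sdb; address 10.0.0.1:7789; meta-disk internal; }
  on mds2 { device /dev/drbd0; disk /dev/sdb; address 10.0.0.2:7789; meta-disk internal; }
}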

Regards,
Thomas

On 3/15/23 12:29, Shambhu Raje via lustre-discuss wrote:

I am trying to configure a clustered file system over DRBD software, so that
if we mount a file system like LustreFS over a DRBD set-up in dual-primary
mode, it can provide us with real-time replication of data. Can
I configure a Lustre file system over DRBD on Red Hat 8.7?
If yes, how can it be configured?


Waiting for your supporting response.


___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Quota issue after OST removal

2022-10-26 Thread Thomas Roth via lustre-discuss

Hi Daniel,

isn't this expected: on your lustrefs-OST0001, usage seems to have hit the 
limit (perhaps 'lfs quota -g somegroup ...' will show
you by how many bytes).

If one part of the distributed quota is exceeded, Lustre should report that with the * - although the total across the file system is still below the 
limit.



Obviously your 'somegroup' is at the quota limit on all visible OSTs, so my 
guess is that would be the same on the missing two OSTs.
So, either have some data removed or increase the limit.
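
If raising the limit is the way to go, a hedged example of the command (the values are only an illustration):

client # lfs setquota -g somegroup -b 30T -B 33T /lustre1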

Best regards
Thomas

On 26.10.22 16:52, Daniel Szkola via lustre-discuss wrote:

Hello all,

We recently removed an OSS/OST node that was spontaneously shutting down so
hardware testing could be performed. I have no idea how long it will be out,
so I followed the procedure for permanent removal.

Since then space usage is being calculated correctly, but 'lfs quota' will
show groups as exceeding quota, despite being under both soft and hard
limits. A verbose listing shows that all OST limits are met and I have no
idea how to reset the limits now that the two OSTs on the removed OSS node
are not part of the equation.

Due to the heavy usage of the Lustre filesystem, no clients have been
unmounted and no MDS or OST nodes have been restarted. The underlying
filesystem is ZFS.

Looking for ideas on how to correct this.

Example:

# lfs quota -gh somegroup -v /lustre1
Disk quotas for grp somegroup (gid ):
  Filesystemused   quota   limit   grace   files   quota   limit
grace
/lustre1  21.59T*27T 30T 6d23h39m15s 2250592  2621440 3145728
-
lustrefs-MDT_UUID
  1.961G   -  1.962G   - 2250592   - 2359296
-
lustrefs-OST_UUID
  2.876T   -  2.876T   -   -   -   -
-
lustrefs-OST0001_UUID
  2.611T*  -  2.611T   -   -   -   -
-
lustrefs-OST0002_UUID
  4.794T   -  4.794T   -   -   -   -
-
lustrefs-OST0003_UUID
  4.587T   -  4.587T   -   -   -   -
-
quotactl ost4 failed.
quotactl ost5 failed.
lustrefs-OST0006_UUID
   3.21T   -   3.21T   -   -   -   -
-
lustrefs-OST0007_UUID
  3.515T   -  3.515T   -   -   -   -
-
Total allocated inode limit: 2359296, total allocated block limit: 21.59T
Some errors happened when getting quota info. Some devices may be not
working or deactivated. The data in "[]" is inaccurate.

--
Dan Szkola
FNAL
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] OST replacement procedure

2022-10-26 Thread Thomas Roth via lustre-discuss

Hi all,

about the correct procedure to replace an OST:
I read the recent issues reported here by Robert Redl, the LU-15000 by Stephane 
and in particular his talk at LAD22:

Why is it important to _not_ reuse old OST indices?



Understandable if you want to remove the OST, but not replace it.

In the past - I think in Lustre 1.8 - when there was no "mkfs.lustre --replace" available, over time we ended up with a long list of OSTs continually 
'lctl --deactivate'd on all clients, very ugly.
And we were so happy when explicit indices and '--replace' were introduced, in particular because I was terribly afraid of creating holes in the list of 
active OSTs ('holes' might have been a no-no in some past version?)



Nowadays, everybody wants to avoid old OST indices - with 'lctl --del-ost', a 
specific command for doing that comfortably is being developed.
Why?


Best regards,
Thomas
--

Thomas Roth   IT-HPC-Linux
Location: SB3 2.291   Phone: 1453

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] missing option mgsnode

2022-07-22 Thread Thomas Roth via lustre-discuss

You could look at what the device believes it's formatted with by

> tunefs.lustre --dryrun /dev/mapper/mpathd

When I do that here, I get something like

checking for existing Lustre data: found

   Read previous values:
Target: idril-OST000e
Index:  14
Lustre FS:  idril
Mount type: zfs
Flags:  0x2
  (OST )
Persistent mount opts:
Parameters: mgsnode=10.20.6.64@o2ib4:10.20.6.69@o2ib4
...


Tells you about 'mount type' and 'mgsnode'.
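If the mgsnode parameter really turns out to be missing there, it can usually be written
back with tunefs.lustre - the NID below is only a placeholder for your actual MGS:

> tunefs.lustre --mgsnode=10.0.0.1@o2ib /dev/mapper/mpathd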


Regards
Thomas


On 20/07/2022 19.48, Paul Edmon via lustre-discuss wrote:
We have a filesystem running Lustre 2.10.4 in HA mode using IML.  One of our OST's had
some disk failures and after reconstruction of the RAID set it won't remount but gives:


[root@holylfs02oss06 ~]# mount -t lustre /dev/mapper/mpathd 
/mnt/holylfs2-OST001f
Failed to initialize ZFS library: 256
mount.lustre: missing option mgsnode=

The weird thing is that we didn't build this with ZFS, the devices are all ldiskfs.  We suspect some 
of the data is corrupt on the disk but we were wondering if anyone had seen this error before and if 
there was a solution.


-Paul Edmon-

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


--

Thomas Roth
Department: Informationstechnologie
Location: SB3 2.291



GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de

Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528
Managing Directors / Geschäftsführung:
Professor Dr. Paolo Giubellino, Dr. Ulrich Breuer, Jörg Blaurock
Chairman of the Supervisory Board / Vorsitzender des GSI-Aufsichtsrats:
State Secretary / Staatssekretär Dr. Volkmar Dietz

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] lustre 2.15 installation on Centos 8.5.2111

2022-07-15 Thread Thomas Roth via lustre-discuss

In addition, you might need to use the "--nodeps" option to install your self-compiled
packages, cf.
https://jira.whamcloud.com/browse/LU-15976

Cheers
Thomas

On 7/12/22 19:02, Jesse Stroik via lustre-discuss wrote:

Hi Fran,

I suspect the issue is a missing kmod-zfs RPM which provides those symbols. It 
might be the case that it was inadvertently excluded from the whamcloud repo.

You could build your own RPMs on a centos system with the group 'Development 
Tools' installed. I'd recommend doing it as an unprivileged user and however 
you setup your normal build environment.

Start by installing the kernel RPMs they provide, boot into that new kernel, verify 
that is the newest kernel you have installed, then build zfs & lustre from 
source.

Here is an example I tested on a rocky 8.5 system, but it'll probably work 
similarly on a centos 8.5 system.

$ git clone https://github.com/zfsonlinux/zfs.git
$ cd zfs
$ git checkout zfs-2.0.7
$ sh autogen.sh
$ ./configure --with-spec=redhat
$ make rpms

At that point, you should have a set of ZFS RPMs built. Install them:

$ dnf localinstall kmod-zfs-2.0.7-1.el8.x86_64.rpm 
kmod-zfs-devel-2.0.7-1.el8.x86_64.rpm libnvpair3-2.0.7-1.el8.x86_64.rpm 
libuutil3-2.0.7-1.el8.x86_64.rpm libzfs4-2.0.7-1.el8.x86_64.rpm 
libzfs4-devel-2.0.7-1.el8.x86_64.rpm libzpool4-2.0.7-1.el8.x86_64.rpm 
zfs-2.0.7-1.el8.x86_64.rpm

At this point if you've built for the correct kernel, the zfs module should be 
loadable. Then fetch and build lustre 2.15.0. This worked for me:

$ git clone git://git.whamcloud.com/fs/lustre-release.git
$ cd lustre-release
$ git checkout v2_15_0
$ sh autogen.sh
$  ## in this example it will build with zfs support only
$ ./configure --enable-server --disable-ldiskfs
$ make rpms

If everything succeeds, install the RPMs. I believe this would be the minimum 
set you might need:

$ dnf localinstall kmod-lustre-2.15.0-1.el8.x86_64.rpm 
kmod-lustre-osd-zfs-2.15.0-1.el8.x86_64.rpm lustre-2.15.0-1.el8.x86_64.rpm 
lustre-osd-zfs-mount-2.15.0-1.el8.x86_64.rpm
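
A quick sanity check after installation could be (assuming the modules were built against
the running kernel):

$ sudo modprobe lustre
$ sudo lctl get_param version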

Adjust as needed. I hope you find this useful.

Best,
Jesse





From: lustre-discuss  on behalf of Bedosti 
Francesco 
Sent: Monday, July 11, 2022 11:17 AM
To: lustre-discuss@lists.lustre.org
Subject: [lustre-discuss] lustre 2.15 installation on Centos 8.5.2111

Hi
i'm installing lustre 2.15 with ZFS backend from repository 
https://downloads.whamcloud.com/public/lustre/lustre-2.15.0/el8.5/server/ on a 
Centos 8.5

I added zfs from lustre repository without problems, but when i try to install 
lustre it gives me this error:

yum install lustre
Last metadata expiration check: 0:07:45 ago on Mon Jul 11 18:05:51 2022.
Error:
   Problem: package lustre-2.15.0-1.el8.x86_64 requires lustre-osd-mount, but 
none of the providers can be installed
- package lustre-osd-zfs-mount-2.15.0-1.el8.x86_64 requires 
kmod-lustre-osd-zfs, but none of the providers can be installed
- conflicting requests
- nothing provides ksym(__cv_broadcast) = 0x03cebd8a needed by 
kmod-lustre-osd-zfs-2.15.0-1.el8.x86_64
- nothing provides ksym(arc_add_prune_callback) = 0x1363912f needed by 
kmod-lustre-osd-zfs-2.15.0-1.el8.x86_64
- nothing provides ksym(arc_buf_size) = 0x115a75cf needed by 
kmod-lustre-osd-zfs-2.15.0-1.el8.x86_64
- nothing provides ksym(arc_remove_prune_callback) = 0x1ab2d851 needed by 
kmod-lustre-osd-zfs-2.15.0-1.el8.x86_64
- nothing provides ksym(dbuf_create_bonus) = 0x7beafc97 needed by 
kmod-lustre-osd-zfs-2.15.0-1.el8.x86_64
- nothing provides ksym(dbuf_read) = 0xa12ed106 needed by 
kmod-lustre-osd-zfs-2.15.0-1.el8.x86_64
- nothing provides ksym(dmu_assign_arcbuf_by_dbuf) = 0x26d78f55 needed by 
kmod-lustre-osd-zfs-2.15.0-1.el8.x86_64
- nothing provides ksym(dmu_bonus_hold) = 0x8d7deb8a needed by 
kmod-lustre-osd-zfs-2.15.0-1.el8.x86_64
- nothing provides ksym(dmu_buf_hold_array_by_bonus) = 0x878059c5 needed by 
kmod-lustre-osd-zfs-2.15.0-1.el8.x86_64
- nothing provides ksym(dmu_buf_rele) = 0x9205359f needed by 
kmod-lustre-osd-zfs-2.15.0-1.el8.x86_64
- nothing provides ksym(dmu_buf_rele_array) = 0x33363fa0 needed by 
kmod-lustre-osd-zfs-2.15.0-1.el8.x86_64
- nothing provides ksym(dmu_free_long_range) = 0x329676ab needed by 
kmod-lustre-osd-zfs-2.15.0-1.el8.x86_64
- nothing provides ksym(dmu_free_range) = 0x356042ae needed by 
kmod-lustre-osd-zfs-2.15.0-1.el8.x86_64
- nothing provides ksym(dmu_object_alloc_dnsize) = 0x72ae6b8e needed by 
kmod-lustre-osd-zfs-2.15.0-1.el8.x86_64
- nothing provides ksym(dmu_object_free) = 0x01514575 needed by 
kmod-lustre-osd-zfs-2.15.0-1.el8.x86_64
- nothing provides ksym(dmu_object_next) = 0x989708be needed by 
kmod-lustre-osd-zfs-2.15.0-1.el8.x86_64
- nothing provides ksym(dmu_object_set_blocksize) = 0xdbdf5ea0 needed by 
kmod-lustre-osd-zfs-2.15.0-1.el8.x86_64
- nothing provides ksym(dmu_objset_disown) = 0xac0fbfc0 needed by 
kmod-lustre-osd-zfs-2.15.0-1.el8.x86_64
- nothin

Re: [lustre-discuss] How to speed up Lustre

2022-07-06 Thread Thomas Roth via lustre-discuss

Yes, I got it.
But Marion states that they switched
> to a PFL arrangement, where the first 64k lives on flash OST's (mounted on 
our metadata servers), and the remainder of larger files lives on HDD OST's.

So, how do you specify a particular OST (or a group of OSTs) in a PFL?
The OST-equivalent of the "-L mdt" part?

With SSDs and HDDs making up the OSTs, I would have guessed OST pools, but I'm only aware of an "lfs setstripe" that puts all of my file into one pool.
How do I put the first few kB of a file into pool A and the rest into pool B?
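My untested guess: define OST pools (on the MGS) and give each PFL component its own
pool, e.g. - pool and OST names are made up here:

lctl pool_new myfs.flash
lctl pool_add myfs.flash myfs-OST0000 myfs-OST0001
# ... and similarly a 'hdd' pool for the HDD OSTs ...
lfs setstripe -E 64K -p flash -E eof -p hdd /lustre/somedir

Is that how it is meant to be done?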



Cheers
Thomas


On 7/6/22 21:42, Andreas Dilger wrote:

Thomas,
where the file data is stored depends entirely on the PFL layout used for the 
filesystem or parent directory.

For DoM files, you need to specify a DoM component, like:

 lfs setstripe -E 64K -L mdt -E 1G -c 1 -E 16G -c 4 -E eof -c 32 

so the first 64KB will be put onto the MDT where the file is created, the 
remaining 1GB onto a single OST, the next 15GB striped across 4 OSTs, and the 
rest of the file striped across (up to) 32 OSTs.
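The resulting layout on a directory can be checked afterwards with, for example:

 lfs getstripe -d /mnt/lustre/somedir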

64KB is the minimum DoM component size, but if the files are smaller (e.g. 3KB) 
they will only allocate space on the MDT in multiples of 4KB blocks.  However, 
the default ldiskfs MDT formatting only leaves about 1 KB of space per inode, 
which would quickly run out unless DoM is restricted to specific directories 
with small files, or if the MDT is formatted with enough free space to 
accommodate this usage.  This is less of an issue with ZFS MDTs, but DoM files 
will still consume space much more quickly and reduce the available inode count 
by a factor of 16-64 more quickly than without DoM.

It is strongly recommended to use Lustre 2.15 with DoM to benefit from the automatic MDT 
space balancing, otherwise the MDT usage may become imbalanced if the admin (or users) do 
not actively manage the MDT selection for new user/project/job directories with "lfs 
mkdir -i".

Cheers, Andreas

On Jul 6, 2022, at 10:48, Thomas Roth via lustre-discuss 
<lustre-discuss@lists.lustre.org> wrote:

Hi Marion,

I do not fully understand how to "mount flash OSTs on a metadata server"
- You have a couple of SSDs, you assemble these into one block device and format it with
"mkfs.lustre --ost ..." ? And then mount it just as any other OST?
- PFL then puts the first 64k on these OSTs and the rest of all files on the 
HDD-based OSTs?
So, no magic on the MDS?

I'm asking because we are considering something similar, but we would not have 
these flash-OSTs in the MDS-hardware but on separate OSS servers.


Regards,
Thomas

On 23/02/2022 04.35, Marion Hakanson via lustre-discuss wrote:
Hi again,
kara...@aselsan.com.tr said:
I was thinking that DoM is built in feature and it can be enabled/disabled
online for a certain directories. What do you mean by reformat to converting
to DoM (or away from it). I think just Metadata target size is important.
When we first turned on DoM, it's likely that our Lustre system was old
enough to need to be reformatted in order to support it.  Our flash
storage RAID configuration also needed to be expanded, but the system
was not yet in production so a reformat was no big deal at the time.
So perhaps your system will not be subject to this requirement (other
than expanding your MDT flash somehow).
kara...@aselsan.com.tr said:
I also thought creating flash OST on metadata server. But I was not sure what
to install on metadata server for this purpose. Can Metadata server be an OSS
server at the same time? If it is possible I would prefer flash OST on
Metadata server instead of DoM. Because Our metadata target size is small, it
seems I have to do risky operations to expand size.
Yes, our metadata servers are also OSS's at the same time.  The flash
OST's are separate volumes (and drives) from the MDT's, so less scary (:-).
kara...@aselsan.com.tr said:
imho, because of the less RPC traffic DoM shows more performance than flash
OST. Am I right?
The documentation does say there that using DoM for small files will produce
less RPC traffic than using OST's for small files.
But as I said earlier, for us, the amount of flash needed to support DoM
was a lot higher than with the flash OST approach (we have a high percentage,
by number, of small files).
I'll also note that we had a wish to mostly "set and forget" the layout
for our Lustre filesystem.  We have not figured out a way to predict
or control where small files (or large ones) are going to end up, so
trying to craft optimal layouts in particular directories for particular
file sizes has turned out to not be feasible for us.  PFL has been a
win for us here, for that reason.
Our conclusion was that in order to take advantage of the performance
improvements of DoM, you need enough mon

Re: [lustre-discuss] How to speed up Lustre

2022-07-06 Thread Thomas Roth via lustre-discuss

Hi Marion,

I do not fully understand how to "mount flash OSTs on a metadata server"
- You have a couple of SSDs, you assemble these into one block device and format it with "mkfs.lustre
--ost ..." ? And then mount it just as any other OST?

- PFL then puts the first 64k on these OSTs and the rest of all files on the 
HDD-based OSTs?
So, no magic on the MDS?

I'm asking because we are considering something similar, but we would not have these flash-OSTs in the 
MDS-hardware but on separate OSS servers.



Regards,
Thomas

On 23/02/2022 04.35, Marion Hakanson via lustre-discuss wrote:

Hi again,

kara...@aselsan.com.tr said:

I was thinking that DoM is built in feature and it can be enabled/disabled
online for a certain directories. What do you mean by reformat to converting
to DoM (or away from it). I think just Metadata target size is important.


When we first turned on DoM, it's likely that our Lustre system was old
enough to need to be reformatted in order to support it.  Our flash
storage RAID configuration also needed to be expanded, but the system
was not yet in production so a reformat was no big deal at the time.

So perhaps your system will not be subject to this requirement (other
than expanding your MDT flash somehow).



kara...@aselsan.com.tr said:

I also thought creating flash OST on metadata server. But I was not sure what
to install on metadata server for this purpose. Can Metadata server be an OSS
server at the same time? If it is possible I would prefer flash OST on
Metadata server instead of DoM. Because Our metadata target size is small, it
seems I have to do risky operations to expand size.


Yes, our metadata servers are also OSS's at the same time.  The flash
OST's are separate volumes (and drives) from the MDT's, so less scary (:-).



kara...@aselsan.com.tr said:

imho, because of the less RPC traffic DoM shows more performance than flash
OST. Am I right?


The documentation does say there that using DoM for small files will produce
less RPC traffic than using OST's for small files.

But as I said earlier, for us, the amount of flash needed to support DoM
was a lot higher than with the flash OST approach (we have a high percentage,
by number, of small files).

I'll also note that we had a wish to mostly "set and forget" the layout
for our Lustre filesystem.  We have not figured out a way to predict
or control where small files (or large ones) are going to end up, so
trying to craft optimal layouts in particular directories for particular
file sizes has turned out to not be feasible for us.  PFL has been a
win for us here, for that reason.

Our conclusion was that in order to take advantage of the performance
improvements of DoM, you need enough money for lots of flash, or you need
enough staff time to manage the DoM layouts to fit into that flash.

We have neither of those conditions, and we find that using PFL and
flash OST's for small files is working very well for us.

Regards,

Marion




From: Taner KARAGÖL 
To: Marion Hakanson 
CC: "lustre-discuss@lists.lustre.org" 
Date: Tue, 22 Feb 2022 04:53:03 +

UNCLASSIFIED

Thank you for sharing your experience.

I was thinking that DoM is built in feature and it can be enabled/disabled 
online for a certain directories. What do you mean by reformat to converting to 
DoM (or away from it). I think just Metadata target size is important.

I also thought creating flash OST on metadata server. But I was not sure what 
to install on metadata server for this purpose. Can Metadata server be an OSS 
server at the same time? If it is possible I would prefer flash OST on Metadata 
server instead of DoM. Because Our metadata target size is small, it seems I 
have to do risky operations to expand size.

imho, because of the less RPC traffic DoM shows more performance than flash 
OST. Am I right?

Best Regards;


From: Marion Hakanson 
Sent: Thursday, February 17, 2022 8:20 PM
To: Taner KARAGÖL 
Cc: lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] How to speed up Lustre

We started with DoM on our new Lustre system a couple years ago.
   - Converting to DoM (or away from it) is a full-reformat operation.
   - DoM uses a fixed amount of metadata space (64k minimum for us) for every 
file, even those smaller than 64k.

Basically, DoM uses a lot of flash metadata space, more than we planned for, 
and more than we could afford.

We ended up switching to a PFL arrangement, where the first 64k lives on flash 
OST's (mounted on our metadata servers), and the remainder of larger files 
lives on HDD OST's.  This is working very well for our small-file workloads, 
and uses less flash space than the DoM configuration did.

Since you don't already have DoM in effect, it may be possible that you could add flash 
OST's, configure a PFL, and then use "lfs migrate" to re-layout existing files 
into the new OST's.  Your mileage may vary, so be safe!
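As a sketch (layout and path are invented here, and files that are in active use should
not be migrated):

 lfs migrate -E 64K -p flash -E eof -p hdd /lustre/somedir/somefile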

Regards,

Marion



On Feb 14, 2022, at 03:32, Taner KARAGÖL via lustre-disc

Re: [lustre-discuss] Installing 2.15 on rhel 8.5 fails

2022-06-29 Thread Thomas Roth via lustre-discuss
After making sure that this can be reproduced when starting with the git repo, I have created LU-15972 
for this issue.


There is an unrelated issue with the ZFS-variant, which however also fits the subject of this mail, 
which I described in detail in LU-15976.


Best regards
Thomas


On 24/06/2022 16.24, Thomas Roth via lustre-discuss wrote:

Since it seems I have now managed to create the modules, I'd like to record 
that here:

1. Install system w AlmaLinux 8.5 -> kernel 4.18.0-348.23.1
2. Install packages from Whamcloud (lustre-2.15.0/el8.5.2111/server/): lustre, kmod-lustre, 
kmod-lustre-osd-ldiskfs -> fails due to the discussed 'unknown symbols', cf. 
https://jira.whamcloud.com/browse/LU-15962

3. Install the corresponding dkms-packages -> fails, reason not clear
4. Go to the remnant /var/lib/dkms/lustre-ldiskfs/2.15.0/build, run configure with 
'--with-o2ib=/usr/src/kernels/4.18.0-348.23.1.el8_5.x86_64' (that's where the kernel-devel + the extfs
sources ended up in this case)

5. 'make rpms' now fails with

 > make[4]: Entering directory 
'/var/lib/dkms/lustre-ldiskfs/2.15.0/build/lustre/utils'
 > ...
 > In file included from liblustreapi.c:83:
 > lstddef.h:306:22: error: static declaration of ‘copy_file_range’ follows 
non-static declaration
 > │static inline loff_t copy_file_range(int fd_in, loff_t *off_in, int fd_out,


This was already reported last year, e.g. 
https://www.mail-archive.com/lustre-discuss@lists.lustre.org/msg16822.html
The workaround is also given there: the configure command line has '--disable-utils' (by default), 
somehow this makes make go to 'utils' and fail.


6. Repeat 'configure' with the '--with-o2ib=/usr/src/kernels...' option and without
'--disable-utils'
7. 'make rpms' yields some installable kmod packages, the contained modules can be loaded (I haven't 
setup the file system yet).



Cheers,
Thomas


PS: My rather spartan Alma-installation needed in addition 'dnf install'
 > kernel-headers-4.18.0-348.23.1.el8_5.x86_64 
kernel-devel-4.18.0-348.23.1.el8_5.x86_64 dkms
 > kernel-4.18.0-348.23.1.el8_5.src
 > e2fsprogs-devel rpm-build kernel-rpm-macros  kernel-abi-whitelists 
libselinux-devel libtool






On 6/22/22 21:08, Jian Yu wrote:

Hi Thomas,

The issue is being fixed in https://jira.whamcloud.com/browse/LU-15962.
A workaround is to build Lustre with "--with-o2ib=" configure option.
The  is where in-kernel Module.symvers is located.

--
Best regards,
Jian Yu

-Original Message-
From: lustre-discuss  on behalf of Thomas Roth via 
lustre-discuss 

Reply-To: Thomas Roth 
Date: Wednesday, June 22, 2022 at 10:32 AM
To: Andreas Dilger 
Cc: lustre-discuss 
Subject: Re: [lustre-discuss] Installing 2.15 on rhel 8.5 fails

 Hmm, but we are using the in-kernel OFED, so this makes these messages all 
the more mysterious.
 Regards,
 Thomas
 On 22/06/2022 19.12, Andreas Dilger wrote:
 > On Jun 22, 2022, at 10:40, Thomas Roth via lustre-discuss 
<lustre-discuss@lists.lustre.org> wrote:

 >
 > my rhel8 system is actually an Alma Linux 8.5 installation, this is the first time the 
compatibility with an alleged rhel8.5 software fails...

 >
 >
 > The system is running kernel '4.18.0-348.2.1.el8_5'
 > This version string can also be found in the package names in
 > 
https://downloads.whamcloud.com/public/lustre/lustre-2.15.0/el8.5.2111/server/RPMS/x86_64
 > - this is usually a good sign.
 >
 > However, installation of kmod-lustre-2.15.0-1.el8 yields the well known 
"depmod: WARNINGs", like
 >> /lib/modules/4.18.0-348.2.1.el8_lustre.x86_64/extra/lustre/net/ko2iblnd.ko needs unknown 
symbol __ib_alloc_pd

 >
 >
 > The kernel from 
downloads.whamcloud.com/public/lustre/lustre-2.15.0/el8.5.2111/server/RPMS/x86_64 identifies itself 
as "CentOS" and does not want to boot - no option either.

 >
 >
 > Any hints how to proceed?
 >
 > The ko2iblnd module is built against the in-kernel OFED, so if you are using MOFED you will 
need to rebuild the kernel modules themselves.  If you don't use IB at all you can ignore these 
depmod messages.

 >
 > Cheers, Andreas
 > --
 > Andreas Dilger
 > Lustre Principal Architect
 > Whamcloud
 >
 --
 
 Thomas Roth
 Department: Informationstechnologie
 GSI Helmholtzzentrum für Schwerionenforschung GmbH
 Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de
 Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528
 Managing Directors / Geschäftsführung:
 Professor Dr. Paolo Giubellino, Dr. Ulrich Breuer, Jörg Blaurock
 Chairman of

Re: [lustre-discuss] Installing 2.15 on rhel 8.5 fails

2022-06-24 Thread Thomas Roth via lustre-discuss

Since it seems I have now managed to create the modules, I'd like to record 
that here:

1. Install system w AlmaLinux 8.5 -> kernel 4.18.0-348.23.1
2. Install packages from Whamcloud (lustre-2.15.0/el8.5.2111/server/): lustre, kmod-lustre, kmod-lustre-osd-ldiskfs -> fails due to the discussed 
'unknown symbols', cf. https://jira.whamcloud.com/browse/LU-15962

3. Install the corresponding dkms-packages -> fails, reason not clear
4. Go to the remnant /var/lib/dkms/lustre-ldiskfs/2.15.0/build, run configure with '--with-o2ib=/usr/src/kernels/4.18.0-348.23.1.el8_5.x86_64' (that's 
where the kernel-devel + the extfs sources ended up in this case)

5. 'make rpms' now fails with

> make[4]: Entering directory '/var/lib/dkms/lustre-ldiskfs/2.15.0/build/lustre/utils' 

> ... 

> In file included from liblustreapi.c:83: 

> lstddef.h:306:22: error: static declaration of ‘copy_file_range’ follows non-static declaration 

> │static inline loff_t copy_file_range(int fd_in, loff_t *off_in, int fd_out, 




This was already reported last year, e.g. 
https://www.mail-archive.com/lustre-discuss@lists.lustre.org/msg16822.html
The workaround is also given there: the configure command line has 
'--disable-utils' (by default), somehow this makes make go to 'utils' and fail.

6. Repeat 'configure' with the '--with-o2ib=/usr/src/kernels...' option and without
'--disable-utils'
7. 'make rpms' yields some installable kmod packages, the contained modules can 
be loaded (I haven't setup the file system yet).


Cheers,
Thomas


PS: My rather spartan Alma-installation needed in addition 'dnf install'
> kernel-headers-4.18.0-348.23.1.el8_5.x86_64 
kernel-devel-4.18.0-348.23.1.el8_5.x86_64 dkms
> kernel-4.18.0-348.23.1.el8_5.src
> e2fsprogs-devel rpm-build kernel-rpm-macros  kernel-abi-whitelists 
libselinux-devel libtool






On 6/22/22 21:08, Jian Yu wrote:

Hi Thomas,

The issue is being fixed in https://jira.whamcloud.com/browse/LU-15962.
A workaround is to build Lustre with "--with-o2ib=" configure option.
The  is where in-kernel Module.symvers is located.

--
Best regards,
Jian Yu
  


-Original Message-
From: lustre-discuss  on behalf of Thomas 
Roth via lustre-discuss 
Reply-To: Thomas Roth 
Date: Wednesday, June 22, 2022 at 10:32 AM
To: Andreas Dilger 
Cc: lustre-discuss 
Subject: Re: [lustre-discuss] Installing 2.15 on rhel 8.5 fails

 Hmm, but we are using the in-kernel OFED, so this makes these messages all 
the more mysterious.
 Regards,
 Thomas
 
 On 22/06/2022 19.12, Andreas Dilger wrote:

 > On Jun 22, 2022, at 10:40, Thomas Roth via lustre-discuss 
<lustre-discuss@lists.lustre.org> wrote:
 >
 > my rhel8 system is actually an Alma Linux 8.5 installation, this is the 
first time the compatiblity to an alleged rhel8.5 software fails...
 >
 >
 > The system is running kernel '4.18.0-348.2.1.el8_5'
 > This version string can also be found in the package names in
 > 
https://downloads.whamcloud.com/public/lustre/lustre-2.15.0/el8.5.2111/server/RPMS/x86_64
 > - this is usually a good sign.
 >
 > However, installation of kmod-lustre-2.15.0-1.el8 yields the well known 
"depmod: WARNINGs", like
 >> 
/lib/modules/4.18.0-348.2.1.el8_lustre.x86_64/extra/lustre/net/ko2iblnd.ko needs 
unknown symbol __ib_alloc_pd
 >
 >
 > The kernel from 
downloads.whamcloud.com/public/lustre/lustre-2.15.0/el8.5.2111/server/RPMS/x86_64 identifies 
itself as "CentOS" and does not want to boot - no option either.
 >
 >
 > Any hints how to proceed?
 >
 > The ko2iblnd module is built against the in-kernel OFED, so if you are 
using MOFED you will need to rebuild the kernel modules themselves.  If you don't 
use IB at all you can ignore these depmod messages.
 >
 > Cheers, Andreas
 > --
 > Andreas Dilger
 > Lustre Principal Architect
 > Whamcloud
 >
 
 
 
 
 
 
 
 
 --

 
 Thomas Roth
 Department: Informationstechnologie
 
 
 GSI Helmholtzzentrum für Schwerionenforschung GmbH

 Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de
 
 Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528

 Managing Directors / Geschäftsführung:
 Professor Dr. Paolo Giubellino, Dr. Ulrich Breuer, Jörg Blaurock
 Chairman of the Supervisory Board / Vorsitzender des GSI-Aufsichtsrats:
 State Secretary / Staatssekretär Dr. Volkmar Dietz
 
 ___

 lustre-discuss mailing list
 lustre-discuss@lists.lustre.org
 http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
 


___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Installing 2.15 on rhel 8.5 fails

2022-06-22 Thread Thomas Roth via lustre-discuss

Hmm, but we are using the in-kernel OFED, so this makes these messages all the 
more mysterious.
Regards,
Thomas

On 22/06/2022 19.12, Andreas Dilger wrote:

On Jun 22, 2022, at 10:40, Thomas Roth via lustre-discuss 
<lustre-discuss@lists.lustre.org> wrote:

my rhel8 system is actually an Alma Linux 8.5 installation, this is the first 
time the compatibility with an alleged rhel8.5 software fails...


The system is running kernel '4.18.0-348.2.1.el8_5'
This version string can also be found in the package names in
https://downloads.whamcloud.com/public/lustre/lustre-2.15.0/el8.5.2111/server/RPMS/x86_64
- this is usually a good sign.

However, installation of kmod-lustre-2.15.0-1.el8 yields the well known "depmod: 
WARNINGs", like

/lib/modules/4.18.0-348.2.1.el8_lustre.x86_64/extra/lustre/net/ko2iblnd.ko 
needs unknown symbol __ib_alloc_pd



The kernel from 
downloads.whamcloud.com/public/lustre/lustre-2.15.0/el8.5.2111/server/RPMS/x86_64 
identifies itself as "CentOS" and does not want to boot - no option either.


Any hints how to proceed?

The ko2iblnd module is built against the in-kernel OFED, so if you are using 
MOFED you will need to rebuild the kernel modules themselves.  If you don't use 
IB at all you can ignore these depmod messages.

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Whamcloud










--

Thomas Roth
Department: Informationstechnologie
Location: SB3 2.291
Phone: +49-6159-71 1453  Fax: +49-6159-71 2986


GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de

Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528
Managing Directors / Geschäftsführung:
Professor Dr. Paolo Giubellino, Dr. Ulrich Breuer, Jörg Blaurock
Chairman of the Supervisory Board / Vorsitzender des GSI-Aufsichtsrats:
State Secretary / Staatssekretär Dr. Volkmar Dietz

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Installing 2.15 on rhel 8.5 fails

2022-06-22 Thread Thomas Roth via lustre-discuss

Hi all,

my rhel8 system is actually an Alma Linux 8.5 installation, this is the first time the compatibility with
an alleged rhel8.5 software fails...



The system is running kernel '4.18.0-348.2.1.el8_5'
This version string can also be found in the package names in
https://downloads.whamcloud.com/public/lustre/lustre-2.15.0/el8.5.2111/server/RPMS/x86_64
- this is usually a good sign.

However, installation of kmod-lustre-2.15.0-1.el8 yields the well known "depmod: 
WARNINGs", like
> /lib/modules/4.18.0-348.2.1.el8_lustre.x86_64/extra/lustre/net/ko2iblnd.ko needs unknown symbol 
__ib_alloc_pd



The kernel from downloads.whamcloud.com/public/lustre/lustre-2.15.0/el8.5.2111/server/RPMS/x86_64 
identifies itself as "CentOS" and does not want to boot - no option either.



Any hints how to proceed?

Regards,
Thomas


--

Thomas Roth
Department: Informationstechnologie
Location: SB3 2.291
Phone: +49-6159-71 1453  Fax: +49-6159-71 2986


GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de

Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528
Managing Directors / Geschäftsführung:
Professor Dr. Paolo Giubellino, Dr. Ulrich Breuer, Jörg Blaurock
Chairman of the Supervisory Board / Vorsitzender des GSI-Aufsichtsrats:
State Secretary / Staatssekretär Dr. Volkmar Dietz

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Building 2.15 on rhel8 fails

2022-06-22 Thread Thomas Roth via lustre-discuss

Hi all,

I tried to install 'lustre-ldiskfs-dkms' on a rhel8.5 system, running kernel
It fails; /var/lib/dkms/lustre-ldiskfs/2.15.0/build/make.log says "No targets specified and no makefile
found", and in the corresponding '/var/lib/dkms/lustre-ldiskfs/2.15.0/buildconfig.log' indeed the 
first real error seems to be


> scripts/Makefile.build:45: 
/var/lib/dkms/lustre-ldiskfs/2.15.0/build/build//var/lib/dkms/lustre-ldiskfs/2.15.0/build/build/Makefile: 
No such file or directory
> make[1]: *** No rule to make target 
'/var/lib/dkms/lustre-ldiskfs/2.15.0/build/build//var/lib/dkms/lustre-ldiskfs/2.15.0/build/build/Makefile'. 
 Stop.



This directory tree is a bit large :-)
> '/var/lib/dkms/lustre-ldiskfs/2.15.0/build/build/Makefile'
does exist, though.

Where could this doubling of the path come from?


Btw, how to re-run dkms, in case I'd edit some stuff there?
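I assume the usual dkms cycle applies, i.e. something like

 dkms remove  lustre-ldiskfs/2.15.0 --all
 dkms build   lustre-ldiskfs/2.15.0 -k $(uname -r)
 dkms install lustre-ldiskfs/2.15.0 -k $(uname -r)

but corrections are welcome.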

Regards
Thomas




--

Thomas Roth
Department: Informationstechnologie
Location: SB3 2.291
Phone: +49-6159-71 1453  Fax: +49-6159-71 2986


GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de

Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528
Managing Directors / Geschäftsführung:
Professor Dr. Paolo Giubellino, Dr. Ulrich Breuer, Jörg Blaurock
Chairman of the Supervisory Board / Vorsitzender des GSI-Aufsichtsrats:
State Secretary / Staatssekretär Dr. Volkmar Dietz

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Interoperability 2.12.7 client <-> 2.12.8 server

2022-03-09 Thread Thomas Roth via lustre-discuss

Hi Hans-Henrik,

might be this LU-15244 - I would never have guessed so from the LU - text ;-)

But I can report that the same two clients can do the same operations as before without any problems if
installed with CentOS 7.9 instead of rhel8.5, again one of them with 'kmod-lustre-client-2.12.8_6', 
the other one with 'lustre-client-dkms-2.12.7'


Regards,
Thomas


On 07/03/2022 09.05, Hans Henrik Happe via lustre-discuss wrote:

Hi Thomas,

They should work together, but there are other requirements that need to be 
fulfilled:

https://wiki.lustre.org/Lustre_2.12.8_Changelog

I guess your servers are CentOS 7.9 as required for 2.12.8.

I had an issue with Rocky 8.5 and the latest kernel with 2.12.8. While RHEL 8.5 is supported there was 
something new after 4.18.0-348.2.1.el8_5, which caused problems. I found an LU fixing it post 2.12.8 
(can't remember the number), but downgrading to 4.18.0-348.2.1.el8_5 was the quick fix.


Cheers,
Hans Henrik

On 03.03.2022 08.40, Thomas Roth via lustre-discuss wrote:

Dear all,

this might be just something I forgot or did not read thoroughly, but shouldn't a 2.12.7-client work 
with 2.12.8 - servers?


The 2.12.8-changelog has the standard disclaimer

Interoperability Support:
   Clients & Servers: Latest 2.10.X and Latest 2.11.X




I have this test cluster that I upgraded recently to 2.12.8 on the servers.

The first client I attached now is a fresh install of rhel 8.5 (Alma).
I installed 'kmod-lustre-client' and `lustre-client` from 
https://downloads.whamcloud.com/public/lustre/lustre-2.12.8/el8.5.2111/

I copied a directory containing ~5000 files - no visible issues


The next client was also installed with rhel 8.5 (Alma), but now using 'lustre-client-2.12.7-1' and 
'lustre-client-dkms-2.12.7-1' from

https://downloads.whamcloud.com/public/lustre/lustre-2.12.7/el8/client/RPMS/x86_64/

As on my first client, I copied a directory containing ~5000 files. The copy stalled, and the OSTs 
exploded in my face


kernel: LustreError: 23345:0:(events.c:310:request_in_callback()) event type 2, status -103, 

service ost_io
kernel: LustreError: 40265:0:(pack_generic.c:605:__lustre_unpack_msg()) message length 0 too small 

for magic/version check
kernel: LustreError: 40265:0:(sec.c:2217:sptlrpc_svc_unwrap_request()) error unpacking request from 

12345-10.20.2.167@o2ib6 x1726208297906176
kernel: LustreError: 23345:0:(events.c:310:request_in_callback()) event type 2, status -103, 

service ost_io


The latter message is repeated ad infinitum.

The client log blames the network:

Request sent has failed due to network error
 Connection to was lost; in progress operations using this service will wait 
for recovery to complete


LustreError: 181316:0:(events.c:205:client_bulk_callback()) event type 1, status -103, 
desc86e248d6
LustreError: 181315:0:(events.c:205:client_bulk_callback()) event type 1, status -5, desc 

e569130f



There is also a client running Debian 9 and Lustre 2.12.6 (compiled from git) - 
no trouble at all.


Then I switched those two rhel8.5 clients: reinstalled the OS, gave the first one the 2.12.7
packages, the second one the 2.12.8 - and the error followed: again the client running with
'lustre-client-dkms-2.12.7-1' immediately ran into trouble, causing the same error messages in the
logs.

So this is not a network problem in the sense of broken hardware etc.


What did I miss?
Some important Jira I did not read?


Regards
Thomas





___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


--

Thomas Roth
Department: Informationstechnologie
Location: SB3 2.291
Phone: +49-6159-71 1453  Fax: +49-6159-71 2986


GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de

Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528
Managing Directors / Geschäftsführung:
Professor Dr. Paolo Giubellino, Dr. Ulrich Breuer, Jörg Blaurock
Chairman of the Supervisory Board / Vorsitzender des GSI-Aufsichtsrats:
State Secretary / Staatssekretär Dr. Volkmar Dietz

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Interoperability 2.12.7 client <-> 2.12.8 server

2022-03-02 Thread Thomas Roth via lustre-discuss

Dear all,

this might be just something I forgot or did not read thoroughly, but shouldn't 
a 2.12.7-client work with 2.12.8 - servers?

The 2.12.8-changelog has the standard disclaimer

Interoperability Support:
   Clients & Servers: Latest 2.10.X and Latest 2.11.X




I have this test cluster that I upgraded recently to 2.12.8 on the servers.

The first client I attached now is a fresh install of rhel 8.5 (Alma).
I installed 'kmod-lustre-client' and `lustre-client` from 
https://downloads.whamcloud.com/public/lustre/lustre-2.12.8/el8.5.2111/
I copied a directory containing ~5000 files - no visible issues


The next client was also installed with rhel 8.5 (Alma), but now using 
'lustre-client-2.12.7-1' and 'lustre-client-dkms-2.12.7-1' from
https://downloads.whamcloud.com/public/lustre/lustre-2.12.7/el8/client/RPMS/x86_64/

As on my first client, I copied a directory containing ~5000 files. The copy 
stalled, and the OSTs exploded in my face

kernel: LustreError: 23345:0:(events.c:310:request_in_callback()) event type 2, status -103, 

service ost_io
kernel: LustreError: 40265:0:(pack_generic.c:605:__lustre_unpack_msg()) message length 0 too small 

for magic/version check
kernel: LustreError: 40265:0:(sec.c:2217:sptlrpc_svc_unwrap_request()) error unpacking request from 

12345-10.20.2.167@o2ib6 x1726208297906176
kernel: LustreError: 23345:0:(events.c:310:request_in_callback()) event type 2, status -103, 

service ost_io


The latter message is repeated ad infinitum.

The client log blames the network:

Request sent has failed due to network error
 Connection to was lost; in progress operations using this service will wait 
for recovery to complete



LustreError: 181316:0:(events.c:205:client_bulk_callback()) event type 1, 
status -103, desc86e248d6
LustreError: 181315:0:(events.c:205:client_bulk_callback()) event type 1, status -5, desc 

e569130f



There is also a client running Debian 9 and Lustre 2.12.6 (compiled from git) - 
no trouble at all.


Then I switched those two rhel8.5 clients: reinstalled the OS, gave the first one the 2.12.7 packages, the second one the 2.12.8 - and the error
followed: again the client running with 'lustre-client-dkms-2.12.7-1' immediately ran into trouble, causing the same error messages in the logs.

So this is not a network problem in the sense of broken hardware etc.


What did I miss?
Some important Jira I did not read?


Regards
Thomas


--

Thomas Roth
Department: Informationstechnologie
Location: SB3 2.291
Phone: +49-6159-71 1453  Fax: +49-6159-71 2986


GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de

Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528
Managing Directors / Geschäftsführung:
Professor Dr. Paolo Giubellino, Dr. Ulrich Breuer, Jörg Blaurock
Chairman of the Supervisory Board / Vorsitzender des GSI-Aufsichtsrats:
State Secretary / Staatssekretär Dr. Volkmar Dietz

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] OST mount with failover MDS

2021-03-17 Thread Thomas Roth via lustre-discuss

Hi all,

I wonder if I am seeing signs of network problems when mounting an OST:


tunefs.lustre --dryrun tells me (what I know from my own format command)
>Parameters: mgsnode=10.20.3.0@o2ib5:10.20.3.1@o2ib5

These are the nids for our MGS+MDT0, there are two more pairs for MDT1 and MDT2.

I went step-by-step, modprobing lnet and lustre, and checking LNET by 'lnet ping' to the active MDTs, 
which worked fine.
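(Roughly:
 modprobe lnet && lctl network up
 lctl ping 10.20.3.0@o2ib5
and the same for the other active MDS NIDs.)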


However, mounting such an OST (e.g. after a crash) at first prints a number of
> LNet: 19444:0:(o2iblnd_cb.c:3397:kiblnd_check_conns()) Timed out tx for 
10.20.3.1@o2ib5: 0 seconds

and similarly for the failover partners of the other two MDS.

Should it do that?


Imho, an LNET ping to a failover node _must_ fail, because LNET should not be up on the
failover node, right?

If I started LNET there and some client did not get an answer quickly enough from the acting MDS, it
would try the failover node - LNET up, but no Lustre - and that doesn't sound right.



Regards,
Thomas

--

Thomas Roth
Department: Informationstechnologie
Location: SB3 2.291
Phone: +49-6159-71 1453  Fax: +49-6159-71 2986


GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de

Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528
Managing Directors / Geschäftsführung:
Professor Dr. Paolo Giubellino, Dr. Ulrich Breuer, Jörg Blaurock
Chairman of the Supervisory Board / Vorsitzender des GSI-Aufsichtsrats:
State Secretary / Staatssekretär Dr. Volkmar Dietz

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] [EXTERNAL] MDT mount stuck

2021-03-11 Thread Thomas Roth via lustre-discuss

Hi Rick,
I have not tried that yet - after some forty minutes the mount command returned, the device is mounted. I will check how it behaves after all OSTs 
have been mounted.


Regards
Thomas

On 12.03.21 00:05, Mohr, Rick wrote:

Thomas,

Is the behavior any different if you mount with the "-o abort_recov" option to 
avoid the recovery phase?
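
    For example (device and mount point are placeholders):
    mount -t lustre -o abort_recov /dev/<mdt_device> /mnt/<mdt_mountpoint>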

--Rick

On 3/11/21, 11:48 AM, "lustre-discuss on behalf of Thomas Roth via lustre-discuss" 
 
wrote:

 Hi all,

 after not getting out of the ldlm_lockd - situation, we are trying a 
shutdown plus restart.
 Does not work at all, the very first mount of the restart is MGS + MDT0, 
of course.

 It is quite busy writing traces to the log


 Mar 11 17:21:17 lxmds19.gsi.de kernel: INFO: task mount.lustre:2948 
blocked for more than 120 seconds.
 Mar 11 17:21:17 lxmds19.gsi.de kernel: "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables
 this message.
 Mar 11 17:21:17 lxmds19.gsi.de kernel: mount.lustreD 9616ffc5acc0  
   0  2948   2947 0x0082
 Mar 11 17:21:17 lxmds19.gsi.de kernel: Call Trace:
 Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] 
schedule+0x29/0x70
 Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] 
schedule_timeout+0x221/0x2d0
 Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] ? 
select_task_rq_fair+0x5a6/0x760
 Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] 
wait_for_completion+0xfd/0x140
 Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] ? 
wake_up_state+0x20/0x20
 Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] 
llog_process_or_fork+0x244/0x450 [obdclass]
 Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] 
llog_process+0x14/0x20 [obdclass]
 Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] 
class_config_parse_llog+0x125/0x350
 [obdclass]
 Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] 
mgc_process_cfg_log+0x790/0xc40 [mgc]
 Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] 
mgc_process_log+0x3dc/0x8f0 [mgc]
 Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] ? 
config_recover_log_add+0x13f/0x280 [mgc]
 Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] ? 
class_config_dump_handler+0x7e0/0x7e0
 [obdclass]
 Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] 
mgc_process_config+0x88b/0x13f0 [mgc]
 Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] 
lustre_process_log+0x2d8/0xad0 [obdclass]
 Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] ? 
libcfs_debug_msg+0x57/0x80 [libcfs]
 Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] ? 
lprocfs_counter_add+0xf9/0x160 [obdclass]
 Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] 
server_start_targets+0x13a4/0x2a20 [obdclass]
 Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] ? 
lustre_start_mgc+0x260/0x2510 [obdclass]
 Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] ? 
class_config_dump_handler+0x7e0/0x7e0
 [obdclass]
 Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] 
server_fill_super+0x10cc/0x1890 [obdclass]
 Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] 
lustre_fill_super+0x468/0x960 [obdclass]
 Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] ? 
lustre_common_put_super+0x270/0x270
 [obdclass]
 Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] 
mount_nodev+0x4f/0xb0
 Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] 
lustre_mount+0x38/0x60 [obdclass]
 Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] 
mount_fs+0x3e/0x1b0
 Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] 
vfs_kern_mount+0x67/0x110
 Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] 
do_mount+0x1ef/0xd00
 Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] ? 
__check_object_size+0x1ca/0x250
 Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] ? 
kmem_cache_alloc_trace+0x3c/0x200
 Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] 
SyS_mount+0x83/0xd0
 Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] 
system_call_fastpath+0x25/0x2a




 Other than that, nothing is happening.

 The Lustre processes have started, but e.g. recovery_status = Inactive.
 OK, perhaps because there is nothing out there to recover besides this 
MDS, all other Lustre
 servers+clients are still stopped.


 Still, on previous occasions the mount would not block in this way. The 
device would be mounted - now
 it does not make it into /proc/mounts

 Btw, the disk device can be mounted as type ldiskfs. So it exists, and it 
looks definitely like a
 Lustre MDT on the inside.


 Best,
 Thomas

 --
 
 Thomas Roth
 Department: Informationstechnologie
 Location: SB3 2.291
 Phone: +49-6159-71 1453  Fax: +49-6159-71 2986


 GSI Helmholtzzentrum für Schwerionenforschung GmbH
 Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de

 Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528
 Managing Directors / Geschäftsführung:
 Professor Dr. Paolo Giubellino, Dr. Ulrich Breuer, Jörg Blaurock
 Chairman of the Supervisory Board / Vorsitzender des GSI-Aufsichtsrats:
 State Secretary /

Re: [lustre-discuss] MDT mount stuck

2021-03-11 Thread Thomas Roth via lustre-discuss

And a perhaps minor observation:

Comparing to previous restarts in the log files, I see the line
Lustre: MGS: Connection restored to 2519f316-4f30-9698-3487-70eb31a73320 (at 
0@lo)

Before, it was
Lustre: MGS: Connection restored to c70c1b4e-3517-5631-28b1-7163f13e7bed (at 
0@lo)

What is this number? A unique identifier for the MGS? Which changes between 
restarts?


Regards,
Thomas


On 11/03/2021 17.47, Thomas Roth via lustre-discuss wrote:

Hi all,

after not getting out of the ldlm_lockd - situation, we are trying a shutdown 
plus restart.
Does not work at all, the very first mount of the restart is MGS + MDT0, of 
course.

It is quite busy writing traces to the log


Mar 11 17:21:17 lxmds19.gsi.de kernel: INFO: task mount.lustre:2948 blocked for 
more than 120 seconds.
Mar 11 17:21:17 lxmds19.gsi.de kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.

Mar 11 17:21:17 lxmds19.gsi.de kernel: mount.lustre    D 9616ffc5acc0 0 
 2948   2947 0x0082
Mar 11 17:21:17 lxmds19.gsi.de kernel: Call Trace:
Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] schedule+0x29/0x70
Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] 
schedule_timeout+0x221/0x2d0
Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] ? 
select_task_rq_fair+0x5a6/0x760
Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] 
wait_for_completion+0xfd/0x140
Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] ? 
wake_up_state+0x20/0x20
Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] 
llog_process_or_fork+0x244/0x450 [obdclass]
Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] 
llog_process+0x14/0x20 [obdclass]
Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] class_config_parse_llog+0x125/0x350 
[obdclass]

Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] 
mgc_process_cfg_log+0x790/0xc40 [mgc]
Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] 
mgc_process_log+0x3dc/0x8f0 [mgc]
Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] ? 
config_recover_log_add+0x13f/0x280 [mgc]
Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] ? class_config_dump_handler+0x7e0/0x7e0 
[obdclass]

Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] 
mgc_process_config+0x88b/0x13f0 [mgc]
Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] 
lustre_process_log+0x2d8/0xad0 [obdclass]
Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] ? 
libcfs_debug_msg+0x57/0x80 [libcfs]
Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] ? 
lprocfs_counter_add+0xf9/0x160 [obdclass]
Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] server_start_targets+0x13a4/0x2a20 
[obdclass]

Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] ? 
lustre_start_mgc+0x260/0x2510 [obdclass]
Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] ? class_config_dump_handler+0x7e0/0x7e0 
[obdclass]

Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] 
server_fill_super+0x10cc/0x1890 [obdclass]
Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] 
lustre_fill_super+0x468/0x960 [obdclass]
Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] ? lustre_common_put_super+0x270/0x270 
[obdclass]

Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] 
mount_nodev+0x4f/0xb0
Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] 
lustre_mount+0x38/0x60 [obdclass]
Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] mount_fs+0x3e/0x1b0
Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] 
vfs_kern_mount+0x67/0x110
Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] 
do_mount+0x1ef/0xd00
Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] ? 
__check_object_size+0x1ca/0x250
Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] ? 
kmem_cache_alloc_trace+0x3c/0x200
Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] SyS_mount+0x83/0xd0
Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] 
system_call_fastpath+0x25/0x2a




Other than that, nothing is happening.

The Lustre processes have started, but e.g. recovery_status = Inactive.
OK, perhaps because there is nothing out there to recover besides this MDS, all other Lustre 
servers+clients are still stopped.



Still, on previous occasions the mount would not block in this way. The device would be mounted - now 
it does not make it into /proc/mounts


Btw, the disk device can be mounted as type ldiskfs. So it exists, and it looks definitely like a 
Lustre MDT on the inside.



Best,
Thomas



--

Thomas Roth
Department: Informationstechnologie
Location: SB3 2.291
Phone: +49-6159-71 1453  Fax: +49-6159-71 2986


GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de

Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528
Managing Directors / Geschäftsführung:
Professor Dr. Paolo Giubellino, Dr. Ulrich Breuer, Jörg Blaurock
Chairman of the Supervisory Board / Vorsitzender des GSI-Aufsichtsrats:
State Secretary / Staatssekretär Dr. Volkmar Dietz

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] MDT mount stuck

2021-03-11 Thread Thomas Roth via lustre-discuss

Hi all,

after not getting out of the ldlm_lockd - situation, we are trying a shutdown 
plus restart.
Does not work at all, the very first mount of the restart is MGS + MDT0, of 
course.

It is quite busy writing traces to the log


Mar 11 17:21:17 lxmds19.gsi.de kernel: INFO: task mount.lustre:2948 blocked for 
more than 120 seconds.
Mar 11 17:21:17 lxmds19.gsi.de kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.

Mar 11 17:21:17 lxmds19.gsi.de kernel: mount.lustreD 9616ffc5acc0 0 
 2948   2947 0x0082
Mar 11 17:21:17 lxmds19.gsi.de kernel: Call Trace:
Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] schedule+0x29/0x70
Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] 
schedule_timeout+0x221/0x2d0
Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] ? 
select_task_rq_fair+0x5a6/0x760
Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] 
wait_for_completion+0xfd/0x140
Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] ? 
wake_up_state+0x20/0x20
Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] 
llog_process_or_fork+0x244/0x450 [obdclass]
Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] 
llog_process+0x14/0x20 [obdclass]
Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] class_config_parse_llog+0x125/0x350 
[obdclass]

Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] 
mgc_process_cfg_log+0x790/0xc40 [mgc]
Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] 
mgc_process_log+0x3dc/0x8f0 [mgc]
Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] ? 
config_recover_log_add+0x13f/0x280 [mgc]
Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] ? class_config_dump_handler+0x7e0/0x7e0 
[obdclass]

Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] 
mgc_process_config+0x88b/0x13f0 [mgc]
Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] 
lustre_process_log+0x2d8/0xad0 [obdclass]
Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] ? 
libcfs_debug_msg+0x57/0x80 [libcfs]
Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] ? 
lprocfs_counter_add+0xf9/0x160 [obdclass]
Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] 
server_start_targets+0x13a4/0x2a20 [obdclass]
Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] ? 
lustre_start_mgc+0x260/0x2510 [obdclass]
Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] ? class_config_dump_handler+0x7e0/0x7e0 
[obdclass]

Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] 
server_fill_super+0x10cc/0x1890 [obdclass]
Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] 
lustre_fill_super+0x468/0x960 [obdclass]
Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] ? lustre_common_put_super+0x270/0x270 
[obdclass]

Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] 
mount_nodev+0x4f/0xb0
Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] 
lustre_mount+0x38/0x60 [obdclass]
Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] mount_fs+0x3e/0x1b0
Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] 
vfs_kern_mount+0x67/0x110
Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] 
do_mount+0x1ef/0xd00
Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] ? 
__check_object_size+0x1ca/0x250
Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] ? 
kmem_cache_alloc_trace+0x3c/0x200
Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] SyS_mount+0x83/0xd0
Mar 11 17:21:17 lxmds19.gsi.de kernel:  [] 
system_call_fastpath+0x25/0x2a




Other than that, nothing is happening.

The Lustre processes have started, but e.g. recovery_status = Inactive.
OK, perhaps because there is nothing out there to recover besides this MDS, all other Lustre 
servers+clients are still stopped.



Still, on previous occasions the mount would not block in this way. The device would be mounted - now 
it does not make it into /proc/mounts


Btw, the disk device can be mounted as type ldiskfs. So it exists, and it looks definitely like a 
Lustre MDT on the inside.



Best,
Thomas

--

Thomas Roth
Department: Informationstechnologie
Location: SB3 2.291
Phone: +49-6159-71 1453  Fax: +49-6159-71 2986


GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de

Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528
Managing Directors / Geschäftsführung:
Professor Dr. Paolo Giubellino, Dr. Ulrich Breuer, Jörg Blaurock
Chairman of the Supervisory Board / Vorsitzender des GSI-Aufsichtsrats:
State Secretary / Staatssekretär Dr. Volkmar Dietz

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Lustre stuck in ldlm_lockd (lock on destroyed export, lock timed out)

2021-03-10 Thread Thomas Roth via lustre-discuss

In addition, I noticed that those clients that do reconnect are logged as

Mar 10 13:12:24 lxmds19.gsi.de kernel: Lustre: hebe-MDT: Connection 
restored to  (at 10.20.0.41@o2ib5)

MDS and MDT have this client listed (/proc/fs/lustre/.../exports/) and there is 
a uuid there for the client.


Regards
Thomas

On 10.03.21 12:33, Thomas Roth via lustre-discuss wrote:

Hi all,

we are in a critical situation where our Lustre is rendered completely 
inaccessible.

We are running Lustre 2.12.5 on CentOS 7.8, Whamcloud sources, MDTs on ldiskfs, 
OSTs on ZFS, 3 MDS.

The first MDS, running MGS + MDT0, is showing
### lock callback timer expired
evicting clients, and
### lock on destroyed export
for the same client, as in


Mar 10 09:51:54 lxmds19.gsi.de kernel: LustreError: 4779:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 450s: evicting 
client at 10.20.4.68@o2ib5  ns: mdt-hebe-MDT_UUID lock: 8f1ef6681b00/0xdba5480d76a73ab6 lrc: 3/0,0 mode: PR/PR res: [0x20002db4c:0x14:0x0].0x0 
bits 0x13/0x0 rrc: 3 type: IBT flags: 0x6020040020 nid: 10.20.4.68@o2ib5 remote: 0x5360294b0558b867 expref: 31 pid: 6649 timeout: 4849 lvb_type: 0


Mar 10 09:51:54 lxmds19.gsi.de kernel: LustreError: 6570:0:(ldlm_lockd.c:1348:ldlm_handle_enqueue0()) ### lock on destroyed export 8f1eede9 
ns: mdt-hebe-MDT_UUID lock: 8f1efbded8c0/0xdba5480d76a9e456 lrc: 3/0,0 mode: PR/PR res: [0x20002c52b:0xd92b:0x0].0x0 bits 0x13/0x0 rrc: 175 
type: IBT flags: 0x5020040020 nid: 10.20.4.68@o2ib5 remote: 0x5360294b0558b875 expref: 4 pid: 6570 timeout: 0 lvb_type: 0




Eventually, there is
### lock timed out ; not entering recovery in server code, just going back to 
sleep


Restart of the server does not help.
Recovery runs through, clients show the MDS in 'lfs check mds', but any kind of 
access (aka 'ls') will hang.
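(Recovery state as checked on the MDS with, e.g.:
 lctl get_param mdt.*.recovery_status )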


Any help is much appreciated.

Regards
Thomas




--

Thomas Roth
Department: IT
Location: SB3 2.291
Phone: +49-6159-71 1453  Fax: +49-6159-71 2986

GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de

Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528
Managing Directors / Geschäftsführung:
Professor Dr. Paolo Giubellino, Dr. Ulrich Breuer, Jörg Blaurock
Chairman of the Supervisory Board / Vorsitzender des GSI-Aufsichtsrats:
State Secretary / Staatssekretär Dr. Volkmar Dietz
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Lustre stuck in ldlm_lockd (lock on destroyed export, lock timed out)

2021-03-10 Thread Thomas Roth via lustre-discuss

Hi all,

we are in a critical situation where our Lustre is rendered completely 
inaccessible.

We are running Lustre 2.12.5 on CentOS 7.8, Whamcloud sources, MDTs on ldiskfs, 
OSTs on ZFS, 3 MDS.

The first MDS, running MGS + MDT0, is showing
### lock callback timer expired
evicting clients, and
### lock on destroyed export
for the same client, as in


Mar 10 09:51:54 lxmds19.gsi.de kernel: LustreError: 4779:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 450s: evicting 
client at 10.20.4.68@o2ib5  ns: mdt-hebe-MDT_UUID lock: 8f1ef6681b00/0xdba5480d76a73ab6 lrc: 3/0,0 mode: PR/PR res: [0x20002db4c:0x14:0x0].0x0 
bits 0x13/0x0 rrc: 3 type: IBT flags: 0x6020040020 nid: 10.20.4.68@o2ib5 remote: 0x5360294b0558b867 expref: 31 pid: 6649 timeout: 4849 lvb_type: 0


Mar 10 09:51:54 lxmds19.gsi.de kernel: LustreError: 6570:0:(ldlm_lockd.c:1348:ldlm_handle_enqueue0()) ### lock on destroyed export 8f1eede9 
ns: mdt-hebe-MDT_UUID lock: 8f1efbded8c0/0xdba5480d76a9e456 lrc: 3/0,0 mode: PR/PR res: [0x20002c52b:0xd92b:0x0].0x0 bits 0x13/0x0 rrc: 175 
type: IBT flags: 0x5020040020 nid: 10.20.4.68@o2ib5 remote: 0x5360294b0558b875 expref: 4 pid: 6570 timeout: 0 lvb_type: 0
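
The resource in these locks can be resolved to a path from a client, in case that helps narrow down what 
the client was working on (fsname/mount point assumed):

  client # lfs fid2path hebe "[0x20002c52b:0xd92b:0x0]"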




Eventually, there is
### lock timed out ; not entering recovery in server code, just going back to 
sleep


Restart of the server does not help.
Recovery runs through, clients show the MDS in 'lfs check mds', but any kind of 
access (e.g. 'ls') will hang.
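
For completeness: manually evicting the offending client would be the obvious next step, along the lines of 
(exact parameter path assumed, uuid taken from the exports/ directory):

  mds # lctl set_param mdt.hebe-MDT*.evict_client=<client uuid>

but given that a full server restart does not help, I do not expect much from that either.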


Any help is much appreciated.

Regards
Thomas


--

Thomas Roth
Department: IT
Location: SB3 2.291
Phone: +49-6159-71 1453  Fax: +49-6159-71 2986

GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de

Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528
Managing Directors / Geschäftsführung:
Professor Dr. Paolo Giubellino, Dr. Ulrich Breuer, Jörg Blaurock
Chairman of the Supervisory Board / Vorsitzender des GSI-Aufsichtsrats:
State Secretary / Staatssekretär Dr. Volkmar Dietz
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Inode quota: limits on different MDTs

2021-03-06 Thread Thomas Roth via lustre-discuss

Dear all,

a user has hit the inode quota limit:

 Filesystem    used   quota   limit   grace    files   quota   limit   grace
 /lustre     12.76T      0k      0k       -  635978*  636078  636178  2d16h29m43s

Typical quota mathematics: apparently 635978 > 636178 - but fine, it is distributed quota, so be it.

We have three MDTs; the user most probably has files and directories on only 
one of them.
Let's check with '-v':

# lfs quota -h -v -u User /lustre
/lustre  12.76T  0k  0k   -  635978* 636078  636178  2d16h29m43s
lustre-MDT_UUID
 0k   -  15.28G   -   0   -   1   -
lustre-MDT0001_UUID
 134.2M   -  16.04G   -  635978   -  636460   -
lustre-MDT0002_UUID
 0k   -  0k   -   0   -   17150   -



What is the meaning of column #7 in the output for each MDT?

In the general result, it is the hard limit.

Here it is 1 on MDT0 - presumably everything needs at least one inode in the root 
of the fs.
The user's files seem to be on MDT1, where the column reads 636460 - which is not 
what I set as the hard limit.
On MDT2, the column reads 17150, although the user has no files there.
And the hard limit is also not the difference of these two values ;-)
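
If it helps: this is where I would look next, assuming the quota master sits on MDT0 and the usual parameter 
names (fsname 'lustre'):

  mds0 # lctl get_param qmt.lustre-QMT0000.md-0x0.glb-usr    # global inode quota index on the quota master
  mds1 # lctl get_param osd-*.*.quota_slave.info             # quota slave status on the other MDTs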


Best regards,
Thomas



--

Thomas Roth
Department: IT
Location: SB3 2.291
Phone: +49-6159-71 1453  Fax: +49-6159-71 2986

GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de

Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528
Managing Directors / Geschäftsführung:
Professor Dr. Paolo Giubellino, Dr. Ulrich Breuer, Jörg Blaurock
Chairman of the Supervisory Board / Vorsitzender des GSI-Aufsichtsrats:
State Secretary / Staatssekretär Dr. Volkmar Dietz
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org