Re: [lustre-discuss] 9.4 client release date

2024-05-08 Thread Aurelien Degremont via lustre-discuss

2.15.5 will have that support, and it is "coming soon"; see slide 13:
 
https://www.depts.ttu.edu/hpcc/events/LUG24/slides/Day1/LUG_2024_Talk_01-Community_Release_Update.pdf


Aurélien

From: lustre-discuss  on behalf of 
Michael DiDomenico via lustre-discuss 
Sent: Wednesday, May 8, 2024 21:25
To: lustre-discuss 
Subject: [lustre-discuss] 9.4 client release date

External email: Use caution opening links or attachments


Does anyone have an idea of when the 2.15 client for RHEL 9.4 will
be released? Just curious, trying to plan some maintenance.
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] kernel threads for rpcs in flight

2024-05-03 Thread Aurelien Degremont via lustre-discuss
> This is a module parameter, since it cannot be changed at runtime.  This is 
> visible at /sys/module/libcfs/parameters/cpu_npartitions and the default 
> value depends on the number of CPU cores and NUMA configuration.  It can be 
> specified with "options libcfs cpu_npartitions=<N>" in 
> /etc/modprobe.d/lustre.conf.
> The "cpu_npartitions" module parameter controls how many groups the cores are 
> split into.  The "cpu_pattern" parameter can control the specific cores in 
> each of the CPTs, which would affect the default per-CPT ptlrpcd threads 
> location. It is possible to further use the "ptlrpcd_cpts" and 
> "ptlrpcd_per_cpt_max" parameters to control specifically which cores are used 
> for the threads.

Just a comment: tuning these parameters can be tricky.
"cpu_npartitions" is ignored in favor of "cpu_pattern", unless cpu_pattern is 
the empty string. cpu_pattern can achieve the same results as cpu_npartitions, 
but at the cost of a more complex declaration. If you just want to split your 
cores into multiple subgroups, you can use cpu_npartitions.

options libcfs cpu_pattern="" cpu_npartitions=8  # cpu_pattern must be set to 
the empty string, otherwise cpu_npartitions is ignored
or
options libcfs cpu_pattern="N" # the default, split on the NUMA nodes
or
options libcfs cpu_pattern="0[0-3] 1[4-7] 2[8-11] 3[12-15] 4[16-19] 5[20-23] 
6[24-27] 7[28-31]"  # same as cpu_npartitions=8

or an even more complex distribution; see the Lustre Manual for details.

Also check  "lctl get_param cpu_partition_table" to see your current partition 
table.
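
For example, on a client split into 2 CPTs the output looks roughly like the 
following (hypothetical output; the exact formatting and core assignment depend 
on your CPU topology):

lctl get_param cpu_partition_table
cpu_partition_table=
0	: 0 1 2 3 4 5 6 7
1	: 8 9 10 11 12 13 14 15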

Aurélien

From: lustre-discuss  on behalf of 
Andreas Dilger via lustre-discuss 
Sent: Friday, May 3, 2024 07:25
To: Anna Fuchs 
Cc: lustre 
Subject: Re: [lustre-discuss] kernel threads for rpcs in flight

External email: Use caution opening links or attachments

On May 2, 2024, at 18:10, Anna Fuchs <anna.fu...@uni-hamburg.de> wrote:
The number of ptlrpc threads per CPT is set by the "ptlrpcd_partner_group_size" 
module parameter, and defaults to 2 threads per CPT, IIRC.  I don't think that 
clients dynamically start/stop ptlrpcd threads at runtime.
When there are RPCs in the queue for any ptlrpcd it will be woken up and 
scheduled by the kernel, so it will compete with the application threads.  
IIRC, if a ptlrpcd thread is woken up and there are no RPCs in the local CPT 
queue it will try to steal RPCs from another CPT on the assumption that the 
local CPU is not generating any RPCs so it would be beneficial to offload 
threads on another CPU that *is* generating RPCs.  If the application thread is 
extremely CPU hungry, then the kernel will not schedule the ptlrpcd threads on 
those cores very often, and the "idle" core ptlrpcd threads will be able to 
run more frequently.
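
(For reference, these are module parameters of the ptlrpc module, so they can be 
inspected on a running client and set via modprobe.d; a minimal sketch, assuming 
your build exposes them under these names:

cat /sys/module/ptlrpc/parameters/ptlrpcd_partner_group_size
cat /sys/module/ptlrpc/parameters/ptlrpcd_per_cpt_max
echo "options ptlrpc ptlrpcd_per_cpt_max=2" >> /etc/modprobe.d/lustre.conf
)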

Sorry, maybe I am confusing things. I am still not sure how many threads I get.
For example, I have a 32-core AMD EPYC machine as a client and I am running a 
serial stream I/O application with a stripe count of 1, i.e. 1 OST.
I am struggling to find out how many CPU partitions I have - is it something 
on the hardware side or something configurable?
There is no file /proc/sys/lnet/cpu_partitions on my client.

This is a module parameter, since it cannot be changed at runtime.  This is 
visible at /sys/module/libcfs/parameters/cpu_npartitions and the default value 
depends on the number of CPU cores and NUMA configuration.  It can be specified 
with "options libcfs cpu_npartitions=" in /etc/modprobe.d/lustre.conf.

Assuming I had 2 CPU partitions, that would result in 4 ptlrpc threads at 
system start, right?

Correct.

Now I set  rpcs_in_flight to 1 or to 8, what effect does that have on the 
number and the activity of the threads?

Setting rpcs_in_flight has no effect on the number of ptlrpcd threads.  The 
ptlrpcd threads process RPCs asynchronously (unlike server threads) so they can 
keep many RPCs in progress.
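
(For reference, the limit being discussed is the per-OSC "max_rpcs_in_flight" 
tunable, which can be inspected and changed at runtime on the client, e.g.:

lctl get_param osc.*.max_rpcs_in_flight
lctl set_param osc.*.max_rpcs_in_flight=8
)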

With a serial stream and rpcs_in_flight=1, only one ptlrpcd thread is woken up, 
and the other 3 remain inactive/sleep/do nothing?

This depends.  There are two ptlrpcd threads for the CPT that can process the 
RPCs from the one user thread.  If they can send the RPCs quickly enough then 
the other ptlrpcd threads may not steal the RPCs from that CPT.

That said, even a single threaded userspace writer may have up to 8 RPCs in 
flight *per OST* (depending on the file striping and if IO submission allows it 
- buffered or AIO+DIO) so if there are a lot of outstanding RPCs and RPC 
generation takes a long time (e.g. compression) then it may be that all ptlrpcd 
threads will be busy.
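
(To see or control the striping that determines how many OSTs, and hence how 
many concurrent RPC streams, a file can use, e.g.:

lfs getstripe /lustre/testfile        # show the stripe count and OST objects of a file
lfs setstripe -c 1 /lustre/testdir    # new files in testdir will use a single OST
)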

That does not seem to be the case: I've applied the rpc tracing (thanks a lot 
for the hint!!), and with rpcs_in_flight set to 1 it still shows at least 3 
different threads from at least 2 different partitions for writing a 1MB file 
with ten blocks.
I don't get the relationship between these values.
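
(For reference, rpc tracing can be enabled on the client roughly like this; a 
sketch assuming the standard lctl debug tooling:

lctl set_param debug=+rpctrace    # add RPC tracing to the debug mask
lctl clear                        # empty the kernel debug buffer
# ... run the I/O ...
lctl dk /tmp/rpctrace.log         # dump the debug buffer to a file
)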

What are the opcodes from the different RPCs?  The ptlrpcd threads are only 
handling 

Re: [lustre-discuss] question on behavior of supplementary group permissions

2024-01-24 Thread Aurelien Degremont via lustre-discuss
If I remember correctly, the Lustre client sends at most 2 GIDs in its RPC, in 
addition to its effective GID; those are the file GIDs related to the operation 
you're trying to do (2 GIDs if you're doing an operation on 2 files, i.e. a 
rename).

In your case, this is just an open, so I think the client will send only its 
effective FS GID and the file GID. However, since this is an open, it is likely 
that the intent RPC does not fetch the file metadata before doing the open, so 
the client is not telling the MDT that this user is already a member of that 
file's group. The MDT has no way to know it (identity_upcall = NONE).

This is a bug/limitation of identity_upcall = NONE (which is not the standard 
deployment). The "fix" would be for the client to either send all the 
supplementary groups the user is a member of (this used to be done prior to 2.6, 
but I am not sure it is a good thing), or for the client to retrieve the file 
metadata before sending the open RPC, which would hurt open performance a lot.
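
(For comparison, the standard deployment configures an identity upcall on the 
MDS so the MDT can look up a user's full supplementary group list itself; a 
minimal sketch, assuming the default l_getidentity install path:

lctl get_param mdt.*.identity_upcall
lctl set_param mdt.*.identity_upcall=/usr/sbin/l_getidentity
)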

If you want a bit more context: 
https://review.whamcloud.com/c/fs/lustre-release/+/49539

Aurélien


From: lustre-discuss  on behalf of 
Bertschinger, Thomas Andrew Hjorth via lustre-discuss 
Sent: Wednesday, January 24, 2024 09:23
To: Kira Duwe via lustre-discuss 
Subject: [lustre-discuss] question on behavior of supplementary group permissions

External email: Use caution opening links or attachments


Hello,

We have a curious issue with supplemental group permissions. There is a set of 
files where a user has group read permission to the files via a supplemental 
group. If the user tries to open() one of these files, they get EACCES. Then, 
if the user stat()s the file (or seemingly does any operation that caches the 
inode on the client), the next open() attempt succeeds. Interactively, this 
looks like:

$ cat /lustre/problem_file
cat: /lustre/problem_file: Permission denied
$ stat /lustre/problem_file
(succeeds)
$ cat /lustre/problem_file
(succeeds)

We've only observed this on a particular client/server pair:

client kernel: 3.10.0-1160.95.1
client lustre: 2.15.3
server kernel: 4.18.0-477.21.1
server lustre: 2.15.3

We have mdt.*.identity_upcall=NONE set on every server.  Also, we cannot 
reproduce the issue with newly created files; it only appears to affect a set 
of existing files.

I have 2 questions about this. The big one is, section 41.1.2.1 of the Lustre 
manual claims:

> If there is no upcall or if there is an upcall and it fails, one 
> supplementary group at most will be added as supplied by the client.

To my reading, this suggests that the "bug" we observe above is actually the 
correct behavior. Granted, the manual is not precise about which single 
supplementary group will be supplied by the client, but the relevant group in 
this case is not the first supplemental group in the user's group list; it's in 
the middle of the list. So my question is: is the Lustre manual accurate (and, 
if so, my follow-up question is why supplemental group permissions appear to 
work for us in most cases...)?

Or is the manual wrong/out of date here?

My second question is, assuming the behavior described above is a bug, are 
there any known issues here that we could be running into?

Let me know if I can provide any more information.

Thanks,
Thomas Bertschinger
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Error messages (ex: not available for connect from 0@lo) on server boot with Lustre 2.15.3 and 2.15.4-RC1

2023-12-05 Thread Aurelien Degremont via lustre-discuss
> Now, what about the messages on "deleting orphan objects"? Are they normal as 
> well?

Yeah, this is kind of normal, and I'm even thinking we should lower the message 
verbosity...
Andreas, do you agree this could become a simple CDEBUG(D_HA, ...) instead of 
LCONSOLE(D_INFO, ...)?


Aurélien

From: lustre-discuss  on behalf of 
Audet, Martin via lustre-discuss 
Sent: Monday, December 4, 2023 20:26
To: Andreas Dilger 
Cc: lustre-discuss@lists.lustre.org 
Subject: Re: [lustre-discuss] Error messages (ex: not available for connect from 
0@lo) on server boot with Lustre 2.15.3 and 2.15.4-RC1

External email: Use caution opening links or attachments


Hello Andreas,


Thanks for your response. Happy to learn that the "errors" I was reporting 
aren't really errors.


I now understand that the 3 messages about LDISKFS were only normal messages 
resulting from mounting the file systems (I was fooled by vim showing these 
messages in red, like important error messages, but this is simply a false 
positive from its syntax highlighting rules, probably triggered by the 
"errors=" string, which is only a mount option...).

Now, what about the messages on "deleting orphan objects"? Are they normal as 
well? We boot the client VMs only after the server is ready, and we shut down 
the clients cleanly well before the vlfs Lustre server is (also cleanly) shut 
down. Is it a sign of corruption? How can this happen if the shutdowns are clean?

Thanks (and sorry for the beginner questions),

Martin


From: Andreas Dilger 
Sent: December 4, 2023 5:25 AM
To: Audet, Martin
Cc: lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] Error messages (ex: not available for connect 
from 0@lo) on server boot with Lustre 2.15.3 and 2.15.4-RC1


***Attention*** This email originated from outside of the NRC. ***Attention***

It wasn't clear from your mail which message(s) you are concerned about. These 
look like normal mount messages to me.

The "error" is pretty normal, it just means there were multiple services 
starting at once and one wasn't yet ready for the other.

 LustreError: 137-5: lustrevm-MDT_UUID: not available for connect
 from 0@lo (no target). If you are running an HA pair check that the target
 is mounted on the other server.

It probably makes sense to quiet this message right at mount time to avoid this.

Cheers, Andreas

On Dec 1, 2023, at 10:24, Audet, Martin via lustre-discuss 
 wrote:



Hello Lustre community,


Has anyone ever seen messages like these in "/var/log/messages" on a Lustre 
server?


Dec  1 11:26:30 vlfs kernel: Lustre: Lustre: Build Version: 2.15.4_RC1
Dec  1 11:26:30 vlfs kernel: LDISKFS-fs (sdd): mounted filesystem with ordered 
data mode. Opts: errors=remount-ro,no_mbcache,nodelalloc
Dec  1 11:26:30 vlfs kernel: LDISKFS-fs (sdc): mounted filesystem with ordered 
data mode. Opts: errors=remount-ro,no_mbcache,nodelalloc
Dec  1 11:26:30 vlfs kernel: LDISKFS-fs (sdb): mounted filesystem with ordered 
data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
Dec  1 11:26:36 vlfs kernel: LustreError: 137-5: lustrevm-MDT_UUID: not 
available for connect from 0@lo (no target). If you are running an HA pair 
check that the target is mounted on the other server.
Dec  1 11:26:36 vlfs kernel: Lustre: lustrevm-OST0001: Imperative Recovery not 
enabled, recovery window 300-900
Dec  1 11:26:36 vlfs kernel: Lustre: lustrevm-OST0001: deleting orphan objects 
from 0x0:227 to 0x0:513


This happens on every boot of a Lustre server named vlfs (an AlmaLinux 8.9 VM 
hosted on VMware) playing the role of both MGS and OSS (it hosts an MDT and two 
OSTs using "virtual" disks). We chose LDISKFS and not ZFS. Note that this 
happens at every boot, well before the clients (AlmaLinux 9.3 or 8.9 VMs) 
connect, and even when the clients are powered off. The network connecting the 
clients and the server is a "virtual" 10GbE network (of course there is no 
virtual IB). We also had the same messages previously with Lustre 2.15.3 using 
an AlmaLinux 8.8 server and AlmaLinux 8.8 / 9.2 clients (also using VMs). Note 
also that we compile the Lustre RPMs ourselves from the sources in the git 
repository. We also chose to use a patched kernel. Our build procedure for RPMs 
seems to work well because our real cluster runs fine on CentOS 7.9 with Lustre 
2.12.9 and IB (MOFED) networking.
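
(For reference, a minimal sketch of rebuilding the RPMs from the Lustre git 
sources; the exact configure options for a patched-kernel server build depend on 
your environment:

git clone git://git.whamcloud.com/fs/lustre-release.git
cd lustre-release
sh autogen.sh
./configure --enable-server    # point it at the patched kernel tree as needed
make rpms
)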

So, has anyone seen these messages?

Are they problematic? If yes, how do we avoid them?

We would like to make sure our small test system using VMs works well before we 
upgrade our real cluster.

Thanks in advance !

Martin Audet

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org