Re: [lustre-discuss] need to always manually add network after reboot

2021-02-23 Thread Degremont, Aurelien via lustre-discuss
Hello

If I understand correctly, you're saying that you have 2 configuration files:

/etc/modprobe.d/lnet.conf
options lnet networks=tcp

[root@hpc-oss-03 ~]# cat /etc/modprobe.d/lustre.conf
options lnet networks="tcp(ens2f0)"
options lnet ip2nets="tcp(ens2f0) 10.140.93.*

That means you are declaring the "networks" option twice for the "lnet" kernel 
module. I don't know how 'modprobe' will behave in that case.
If you have a very simple configuration, where your nodes only have one 
Ethernet interface "ens2f0", you only need the following line, out of the 3 
above:

options lnet networks="tcp(ens2f0)"

If this interface is the only Ethernet interface on your host, you don't even 
need a network-specific setup. By default, when loading Lustre in the absence 
of a network configuration, Lustre will automatically set up the only Ethernet 
interface and use it for "tcp".
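
If you prefer to keep the lnetctl-based configuration instead, a minimal sketch 
(assuming the 'lnet' systemd service and the /etc/lnet.conf file shipped with 
recent Lustre packages) is to save the running configuration once and let the 
service reload it at boot:

    lnetctl net add --net tcp --if ens2f0
    lnetctl export > /etc/lnet.conf
    systemctl enable lnet

Check what your packages actually provide before relying on those names.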

Aurélien


De : lustre-discuss  au nom de Sid 
Young via lustre-discuss 
Répondre à : Sid Young 
Date : mardi 23 février 2021 à 06:59
À : lustre-discuss 
Objet : [EXTERNAL] [lustre-discuss] need to always manually add network after 
reboot


G'Day all,
I'm finding that when I reboot any node in our new HPC, I need to keep manually 
adding the network using lnetctl net add --net tcp --if ens2f0
Then I can do an lnetctl net show and see the tcp part active...

I have options in  /etc/modprobe.d/lnet.conf
options lnet networks=tcp

and

[root@hpc-oss-03 ~]# cat /etc/modprobe.d/lustre.conf
options lnet networks="tcp(ens2f0)"
options lnet ip2nets="tcp(ens2f0) 10.140.93.*

I've read the doco and tried to understand the correct parameters for a simple 
Lustre config, so this is what I worked out is needed... but I suspect it's still 
wrong.

Any help appreciated :)



Sid Young

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] servicenode /failnode

2021-02-26 Thread Degremont, Aurelien via lustre-discuss
You're totally correct!
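
For illustration, a hedged sketch of what that means in practice (the mount 
point is hypothetical; only one node of the HA pair should mount the OST at a 
time):

    # on whichever OSS of the pair (192.168.227.21 or .22) currently owns the device
    mount -t lustre /dev/dm-3 /mnt/demo/ost0

The servicenode NIDs recorded at format time are what clients use to fail over 
between the two OSS servers.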


De : lustre-discuss  au nom de Sid 
Young via lustre-discuss 
Répondre à : Sid Young 
Date : vendredi 26 février 2021 à 00:45
À : lustre-discuss 
Objet : [EXTERNAL] [lustre-discuss] servicenode /failnode


G'Day all,

I'm rebuilding my Lustre cluster again and in doing so I am trying to 
understand the role of the --servicenode option when creating an OST. There is 
an example in the doco shown as this:

[root@rh7z-oss1 system]# mkfs.lustre --ost \
>   --fsname demo \
>   --index 0 \
>   --mgsnode 192.168.227.11@tcp1 \
>   --mgsnode 192.168.227.12@tcp1 \
>   --servicenode 192.168.227.21@tcp1 \
>   --servicenode 192.168.227.22@tcp1 \
>   /dev/dm-3

But it's not clear what the service node actually is.

Am I correct in saying the service nodes are the IPs of the two OSS servers 
that can manage this particular OST (the HA pair)?



Sid Young

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] OST mount issue

2021-04-26 Thread Degremont, Aurelien via lustre-discuss
Hello Steve,

This message appears when you are using Lustre modules built with only client 
support, i.e. with server support disabled.
This message is quite new and only appears in very recent Lustre releases. What 
Lustre version are you using? This error does not exist in 2.12.6 as far as I 
know.

Could you double-check the Lustre version and RPMs you installed on that host, 
compare with the other host and ensure they are the same?
Could you simply try 'modprobe mdt' and see if that works?

Check also: 'lctl get_param version'

Aurélien

Le 25/04/2021 15:10, « lustre-discuss au nom de Steve Thompson » 
 a écrit :


Two CentOS 7.9 hosts with kernel 3.10.0-1160.21.1.el7.x86_64, using ZFS
0.8.5 with lustre 2.12.6. One system works perfectly, whereas the second
host fails to mount the OST, with this message from the mount:

mount.lustre: mount fs1/ost1 at /mnt/fs1/ost1 failed: Invalid argument

and this confusing message in the log:

Apr 25 08:56:18 fs1 kernel: LustreError: 
27722:0:(obd_mount.c:1597:lustre_fill_super())
This is client-side-only module, cannot handle server mount.

Can someone please point me to the error? BTW, the two systems were both
kickstarted from the same repo, so as far as I can tell, they are
identical. The lustre RPMs were installed from the server repo.

Steve

--

Steve Thompson E-mail:  smt AT vgersoft DOT com
Voyager Software LLC   Web: http://www DOT vgersoft DOT com
3901 N Charles St  VSW Support: support AT vgersoft DOT com
Baltimore MD 21218
   "186,282 miles per second: it's not just a good idea, it's the law"

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org



Re: [lustre-discuss] OST mount issue

2021-04-26 Thread Degremont, Aurelien via lustre-discuss


Le 26/04/2021 09:34, « Degremont, Aurelien »  a écrit :


This message appears when you are using Lustre modules built with only 
client support, with server support disabled.
This message is quite new and only appears in very recent Lustre releases.

Actually I double-checked that and this is not true. It does exist with Lustre 
2.12.6

Just looks like you installed a client-only version of Lustre RPMs...

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] OST mount issue

2021-04-26 Thread Degremont, Aurelien via lustre-discuss
Could you provide more debugging information, like 'rpm -qa | grep lustre' on 
both hosts?
The actual mount command, etc...

There must be something different, as the result is different...



Le 26/04/2021 16:25, « Steve Thompson »  a écrit :


On Mon, 26 Apr 2021, Degremont, Aurelien wrote:

> Le 26/04/2021 09:34, « Degremont, Aurelien »  a 
écrit :
>
>This message appears when you are using Lustre modules built with
>only client support, with server support disabled. This message is
>quite new and only appears in very recent Lustre releases.
>
> Actually I double-checked that and this is not true. It does exist with 
Lustre 2.12.6
>
> Just looks like you installed a client-only version of Lustre RPMs...

Unfortunately this isn't true. I installed the RPMs on the working and
non-working systems from the same place:

[lustre-server]
name=lustre-server

baseurl=https://downloads.whamcloud.com/public/lustre/latest-release/el7/server
gpgcheck=0

Steve
--

Steve Thompson E-mail:  smt AT vgersoft DOT com
Voyager Software LLC   Web: http://www DOT vgersoft DOT com
3901 N Charles St  VSW Support: support AT vgersoft DOT com
Baltimore MD 21218
   "186,282 miles per second: it's not just a good idea, it's the law"


___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] OST mount issue

2021-04-26 Thread Degremont, Aurelien via lustre-discuss
I see that you are using the DKMS version of the Lustre modules.
I would check that; it is possible that the DKMS build triggered a client-only 
build of Lustre (missing deps, etc.).
Could you compare the list of Lustre modules built and installed by DKMS on 
the 2 working OSSes and the 4 others?
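
A hedged sketch of how to compare them (module names are the usual server-side 
ones; the path is where DKMS normally installs modules):

    dkms status | grep lustre
    find /lib/modules/$(uname -r) -name 'osd_zfs.ko*' -o -name 'mdt.ko*'

If osd_zfs and mdt are missing on the failing hosts, the DKMS build fell back 
to a client-only build there.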


Aurélien

Le 26/04/2021 18:27, « Steve Thompson »  a écrit :


On Mon, 26 Apr 2021, Degremont, Aurelien wrote:

> Could you provide more debugging information, like 'rpm -qa | grep 
lustre' on both hosts?
> The actual mount command, etc...
>
> There must be something different, as the result is different...

Yes, I believe that something must be different; I just cannot find it. I
now have six OST systems. All were installed the same way; two work fine
and four do not. The rpm list:

# rpm -qa | grep lustre
lustre-osd-zfs-mount-2.12.6-1.el7.x86_64
lustre-2.12.6-1.el7.x86_64
lustre-zfs-dkms-2.12.6-1.el7.noarch

# the mount command example:
# grep lustre /etc/fstab
fs1/ost1  /mnt/fs1/ost1   lustre defaults,_netdev  0 0

and all are the same on all six systems. I currently have ZFS 0.8.5
installed, but I have tried with ZFS 0.7.13, and the results are
the same.

Steve
--

Steve Thompson E-mail:  smt AT vgersoft DOT com
Voyager Software LLC   Web: http://www DOT vgersoft DOT com
3901 N Charles St  VSW Support: support AT vgersoft DOT com
Baltimore MD 21218
   "186,282 miles per second: it's not just a good idea, it's the law"


___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Full OST

2021-09-03 Thread Degremont, Aurelien via lustre-discuss
Hi

It could be a bug, but most of the time this is due to an open-unlinked file, 
typically a log file which is still in use: some process keeps writing to it 
until it fills the OSTs it is using.

Look for such files on your clients (use lsof). 
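
A hedged example of what to run on each client (the +L1 option lists open files 
whose link count is zero, i.e. deleted-but-still-open files, restricted here to 
the filesystem mount point from the message below):

    lsof +L1 /snap8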

Aurélien


Le 03/09/2021 09:50, « lustre-discuss au nom de Alastair Basden » 
 a 
écrit :


Hi,

We have a file system where each OST is a single SSD.

One of those is reporting as 100% full (lfs df -h /snap8):
snap8-OST004d_UUID  5.8T2.0T3.5T  37% /snap8[OST:77]
snap8-OST004e_UUID  5.8T5.5T7.5G 100% /snap8[OST:78]
snap8-OST004f_UUID  5.8T2.0T3.4T  38% /snap8[OST:79]

However, I can't find any files on it:
lfs find --ost snap8-OST004e /snap8/
returns nothing.

I guess that it has filled up, and that there is some bug or other that is
now preventing proper behaviour - but I could be wrong.

Does anyone have any suggestions?

Essentially, I'd like to find some of the files and delete or migrate
some, and thus return it to useful production.

Cheers,
Alastair.
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org



Re: [lustre-discuss] Full OST

2021-09-06 Thread Degremont, Aurelien via lustre-discuss
s provided.
>>>
>>> OPTIONS
>>>  -f, --print-fid
>>> Print the FID with the path.
>>>
>>>  -c, --print-link
>>> Print the current link number with each pathname or parent 
directory.
>>>
>>>  -l, --link=LINK
>>> If a file has multiple hard links, then print only the 
specified LINK, starting at link 0.
>>> If multiple FIDs are given, but only one pathname is needed 
for each file, use --link=0.
>>>
>>> EXAMPLES
>>>  $ lfs fid2path /mnt/testfs [0x20403:0x11f:0x0]
>>> /mnt/testfs/etc/hosts
>>>
>>>
>>> On Sep 3, 2021, at 14:51, Alastair Basden <a.g.bas...@durham.ac.uk> wrote:
>>>
>>> Hi,
>>>
>>> lctl get_param mdt.*.exports.*.open_files  returns:
>>> mdt.snap8-MDT.exports.172.18.180.21@o2ib.open_files=
>>> [0x2b90e:0x10aa:0x0]
>>> mdt.snap8-MDT.exports.172.18.180.22@o2ib.open_files=
>>> [0x2b90e:0x21b3:0x0]
>>> mdt.snap8-MDT.exports.172.18.181.19@o2ib.open_files=
>>> [0x2b90e:0x21b3:0x0]
>>> [0x2b90e:0x21b4:0x0]
>>> [0x2b90c:0x1574:0x0]
>>> [0x2b90c:0x1575:0x0]
>>> [0x2b90c:0x1576:0x0]
>>>
>>> Doesn't seem to be many open, so I don't think it's a problem of open 
files.
>>>
>>> Not sure which bit of this I need to use with lfs fid2path either...
>>>
>>> Cheers,
>>> Alastair.
>>>
>>>
>>> On Fri, 3 Sep 2021, Andreas Dilger wrote:
>>>
>>> [EXTERNAL EMAIL]
>>> You can also check "mdt.*.exports.*.open_files" on the MDTs for a list 
of FIDs open on each client, and use "lfs fid2path" to resolve them to a 
pathname.
>>>
>>> On Sep 3, 2021, at 02:09, Degremont, Aurelien via lustre-discuss 
>>> <lustre-discuss@lists.lustre.org> wrote:
>>>
>>> Hi
>>>
>>> It could be a bug, but most of the time, this is due to an 
open-unlinked file, typically a log file which is still in use and some 
processes keep writing to it until it fills the OSTs it is using.
>>>
>>> Look for such files on your clients (use lsof).
>>>
>>> Aurélien
>>>
>>>
>>> Le 03/09/2021 09:50, « lustre-discuss au nom de Alastair Basden » 
>>> <lustre-discuss-boun...@lists.lustre.org au nom de a.g.bas...@durham.ac.uk> a écrit :
>>>
>>>
>>> Hi,
>>>
>>> We have a file system where each OST is a single SSD.
>>>
>>> One of those is reporting as 100% full (lfs df -h /snap8):
>>> snap8-OST004d_UUID  5.8T2.0T3.5T  37% 
/snap8[OST:77]
>>> snap8-OST004e_UUID  5.8T5.5T7.5G 100% 
/snap8[OST:78]
>>> snap8-OST004f_UUID  5.8T2.0T3.4T  38% 
/snap8[OST:79]
>>>
>>> However, I can't find any files on it:
>>> lfs find --ost snap8-OST004e /snap8/
>>> returns nothing.
>>>
>>> I guess that it has filled up, and that there is some bug or other that 
is
>>> now preventing proper behaviour - but I could be wrong.
>>>
>>> Does anyone have any suggestions?
>>>
>>> Essentially, I'd like to find some of the files and delete or migrate
>>> some, and thus return it to useful production.
>>>
>>> Cheers,
>>> Alastair.
>>> ___
>>> lustre-discuss mailing list
>>> lustre-discuss@lists.lustre.org
>>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>>>
>>> ___
>>> lustre-discuss mailing list
>>> lustre-discuss@lists.lustre.org
>>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>>>
>>> Cheers, Andreas
>>> --
>>> Andreas Dilger
>>> Lustre Principal Architect
>>> Whamcloud

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] how to optimize write performances

2021-10-01 Thread Degremont, Aurelien via lustre-discuss
Hello

To achieve higher throughput with a single-threaded process, you should try to 
limit latencies and parallelize under the hood.
Try checking the following parameters:
- Stripe your file across multiple OSTs
- Do large I/O, multiple MB per write, to let Lustre send multiple RPCs to 
different OSTs
- Try testing with and without Direct I/O.

What is your 'dd' test command?
Clear and check rpc stats (sudo lctl set_param osc.*.rpc_stats=clear; sudo lctl 
get_param osc.*.rpc_stats). Check you are sending large RPCs (pages per rpc).
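
A hedged example of such a test (file path, stripe count and sizes are arbitrary 
choices for illustration, not a recommendation):

    lfs setstripe -c 4 -S 4M /mnt/lustre/testfile
    dd if=/dev/zero of=/mnt/lustre/testfile bs=64M count=160 oflag=direct

Repeat with and without oflag=direct and compare, then look at the rpc_stats 
output mentioned above.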

Aurélien

Le 30/09/2021 18:11, « lustre-discuss au nom de Riccardo Veraldi » 
 a écrit :


Hello,

I wanted to ask some hint on how I may increase single process
sequential write performance on Lustre.

I am using Lustre 2.12.7 on RHEL 7.9

I have a number of OSSes with SAS SSDs in raidz: 3 OSTs per OSS, and each
OST is made of 8 SSDs in raidz.

On a local test with multiple writes I can write and read from the zpool
at 7GB/s per OSS.

With Lustre/ZFS backend I can reach peak writes of 5.5GB/s per OSS which
is ok.

This anyway happens only with several multiple writes at once on the
filesystem.

A single write cannot perform more than 800MB-1GB/s

Changing the underlying hardware and moving to NVMe improves single-write
performance, but just slightly.

What is preventing a single-write pattern from performing better? They are
XTC files.

Each single SSD has a 500MB/s write capability by factory specs. So it seems
that with a single write it is not possible to take advantage of the zpool
parallelism. I also tried striping but that does not really help much.

Any hint is really appreciated.

Best

Riccardo



___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org



Re: [lustre-discuss] how to optimize write performances

2021-10-05 Thread Degremont, Aurelien via lustre-discuss
Hello

Direct I/O impacts the whole I/O path, from the client down to ZFS. Agreed, ZFS 
does not support it, but all the rest of the I/O path does.

Could you provide your fio command line?
As I said, you need to do _large I/O_ of multiple MB size. If you are just 
doing 1 MB I/O (assuming the stripe size is 1MB), your application will just 
send 1 RPC at a time to 1 OST, wait for the reply and send the next one. The 
client cache will help at the beginning, until it is full (32MB max_dirty_mb 
per OST by default). 
What about rpc_stats?
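
A hedged way to check both points (parameter names are the standard osc ones; 
the value is just an example):

    lctl get_param osc.*.rpc_stats | grep -A 12 'pages per rpc'
    lctl get_param osc.*.max_dirty_mb
    lctl set_param osc.*.max_dirty_mb=256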


Aurélien

Le 04/10/2021 18:32, « Riccardo Veraldi »  a 
écrit :


Hello Aurelien,

I am using ZFS as the Lustre backend. ZFS does not support direct I/O.

OK, Lustre does, but anyway the performance with direct I/O is worse when
using the ZFS backend, at least in my tests.

Best

Riccardo


On 10/1/21 2:22 AM, Degremont, Aurelien wrote:
> Hello
>
> To achieve higher throughput with a single threaded process, you should 
try to limit latencies and parallelize under the hood.
> Try checking the following parameters:
> - Stripe your file across multiple OSTs
> - Do large I/O, multiple MB per write, to let Lustre send multiple RPC to 
different OSTs
> - Try testing with and without Direct I/O.
>
> What is your 'dd' test command?
> Clear and check rpc stats (sudo lctl set_param osc.*.rpc_stats=clear; 
sudo lctl get_param osc.*.rpc_stats). Check you are sending large RPCs (pages 
per rpc).
>
> Aurélien
>
> Le 30/09/2021 18:11, « lustre-discuss au nom de Riccardo Veraldi » 
 a écrit :
>
>
>  Hello,
>
>  I wanted to ask some hint on how I may increase single process
>  sequential write performance on Lustre.
>
>  I am using Lustre 2.12.7 on RHEL 7.9
>
>  I have a number of OSSes with SAS SSDs in raidz. 3 OST per oss and 
each
>  OST is made by 8 SSD in raidz.
>
>  On a local test with multiple writes I can write and read from the 
zpool
>  at 7GB/s per OSS.
>
>  With Lustre/ZFS backend I can reach peak writes of 5.5GB/s per OSS 
which
>  is ok.
>
>  This anyway happens only with several multiple writes at once on the
>  filesystem.
>
>  A single write cannot perform more than 800MB-1GB/s
>
>  Changing the underlying hardware and moving to MVMe slightly improve
>  single write performance but just slightly.
>
>  What is preventing a single write pattern to perform better ? They 
are
>  XTC files.
>
>  Each single SSD has a 500MB/s write capability by factory specs. So
>  seems like that with a single write it is not possible to take 
advantage
>  of the
>
>  zpool parallelism. I tried also striping but that does not really 
help much.
>
>  Any hint is really appreciated.
>
>  Best
>
>  Riccardo
>
>
>
>  ___
>  lustre-discuss mailing list
>  lustre-discuss@lists.lustre.org
>  http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] eviction timeout

2021-10-11 Thread Degremont, Aurelien via lustre-discuss
Hello

This message appears during MDT recovery, likely after an MDS restart. The MDT 
first tries to reconnect all the clients it knew about when it stopped.
It seems all these clients have also been rebooted. To avoid this message, try 
to stop your clients before the servers.

If that is not possible, you can abort the recovery, either at start time 
(https://doc.lustre.org/lustre_manual.xhtml#lustremaint.abortRecovery) or while 
recovery is running, with the following command on the MDS host:

lctl --device lustre-MDT abort_recovery
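
You can also watch the recovery progress on the MDS with the standard parameter:

    lctl get_param mdt.*.recovery_status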


Aurélien

De : lustre-discuss  au nom de Sid 
Young via lustre-discuss 
Répondre à : Sid Young 
Date : lundi 11 octobre 2021 à 03:16
À : lustre-discuss 
Objet : [EXTERNAL] [lustre-discuss] eviction timeout



I'm seeing a lot of these messages:

Oct 11 11:12:09 hpc-mds-02 kernel: Lustre: lustre-MDT: Denying connection 
for new client b6df7eda-8ae1-617c-6ff1-406d1ffb6006 (at 10.140.90.82@tcp), 
waiting for 6 known clients (0 recovered, 0 in progress, and 0 evicted) to 
recover in 2:42

It seems to be a 3-minute timeout; is it possible to shorten this, and even not 
log this message?

Sid Young

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] unmount FS when endpoint is gone

2021-12-21 Thread Degremont, Aurelien via lustre-discuss
Hello Florin,

As the filesystem servers do not exist anymore (you deleted them previously), 
the client cannot reach them to complete the unmount process.

Try unmounting them using the '-f' flag, i.e. 'umount -f <mountpoint>'.
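
For example (the mount point is hypothetical):

    umount -f /mnt/old-lustre

If that still hangs because of the dead connections, a lazy detach 
('umount -l /mnt/old-lustre') at least removes the mount from the namespace.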


You should also reach out to AWS support and check that with them.

Aurélien



Le 21/12/2021 00:54, « lustre-discuss au nom de Florin Andrei » 
 a 
écrit :


We've created a few Lustre FS endpoints in AWS. They were mounted on a
system. The Lustre endpoints got terminated soon after that, and others
were created instead.

Now the old Lustre filesystems appear to be mounted on that node, and
there's automation trying to unmount them, resulting in a very large
number of umount processes just hanging. In dmesg I see this message
repeated many, many times:

Lustre: 919:0:(client.c:2116:ptlrpc_expire_one_request()) @@@ Request
sent has failed due to network error:

What is the recommended procedure to unmount those FSs? Just running
umount manually also hangs indefinitely. I would prefer to not reboot
that node.

--
Florin Andrei
https://florin.myip.org/
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org




Re: [lustre-discuss] build help with 5.15 kernel

2022-01-03 Thread Degremont, Aurelien via lustre-discuss
Hello Michael

Lustre 2.12.8 does not support Linux 5.15.
More recent Lustre versions support up to Linux 5.11, but not further.
See these tickets for 5.12 and 5.14 support:
https://jira.whamcloud.com/browse/LU-14651
https://jira.whamcloud.com/browse/LU-15220

It is possible to manually backport patches to support some 5.x kernels with 
2.12 but this is not trivial. 

I don't know what your current project is, but it will be much easier for you 
if you can target an older Linux kernel and focus on kernels used by major 
distros. Ubuntu 20.04 is using 5.11, for example.


Aurélien


Le 31/12/2021 01:34, « lustre-discuss au nom de Hebenstreit, Michael via 
lustre-discuss »  a écrit :


Some additional info. I extracted the make command and ran it against the 2 
kernel versions. Old kernel works, new kernel fails

$ make _module_/tmp/lustre-2.12.5/build LUSTRE_KERNEL_TEST=conftest.i 
LDFLAGS=' LD=/usr/bin/ld -m elf_x86_64' CC=gcc   -f 
/tmp/lustre-2.12.5/build/Makefile 
LUSTRE_LINUX_CONFIG=/admin/src/4.18.0-240.22.1.el8_3.crt6.x86_64.withib//.config
 
LINUXINCLUDE='-I/admin/src/4.18.0-240.22.1.el8_3.crt6.x86_64.withib//arch/x86/include
 -Iinclude -Iarch/x86/include/generated 
-I/admin/src/4.18.0-240.22.1.el8_3.crt6.x86_64.withib//include -Iinclude2 
-I/admin/src/4.18.0-240.22.1.el8_3.crt6.x86_64.withib//include/uapi 
-Iinclude/generated 
-I/admin/src/4.18.0-240.22.1.el8_3.crt6.x86_64.withib//arch/x86/include/uapi 
-Iarch/x86/include/generated/uapi 
-I/admin/src/4.18.0-240.22.1.el8_3.crt6.x86_64.withib//include/uapi 
-Iinclude/generated/uapi -include 
/admin/src/4.18.0-240.22.1.el8_3.crt6.x86_64.withib//include/linux/kconfig.h' 
-o tmp_include_depends -o scripts -o include/config/MARKER -C 
/admin/src/4.18.0-240.22.1.el8_3.crt6.x86_64.withib/ 
EXTRA_CFLAGS='-Werror-implicit-function-declaration -g 
-I/tmp/lustre-2.12.5/libcfs/include -I/tmp/lustre-2.12.5/lnet/include 
-I/tmp/lustre-2.12.5/lustre/include/uapi -I/tmp/lustre-2.12.5/lustre/include 
-Wno-format-truncation -Wno-stringop-truncation -Wno-stringop-overflow' 
M=/tmp/lustre-2.12.5/build
make: Entering directory 
'/global/panfs01/admin/src/4.18.0-240.22.1.el8_3.crt6.x86_64.withib'
  CC [M]  /tmp/lustre-2.12.5/build/conftest.o
  CPP [M] /tmp/lustre-2.12.5/build/conftest.i
make: Leaving directory 
'/global/panfs01/admin/src/4.18.0-240.22.1.el8_3.crt6.x86_64.withib'

$ touch /tmp/lustre-2.12.5/build/conftest.c
$ make _module_/tmp/lustre-2.12.5/build LUSTRE_KERNEL_TEST=conftest.i 
LDFLAGS=' LD=/usr/bin/ld -m elf_x86_64' CC=gcc   -f 
/tmp/lustre-2.12.5/build/Makefile 
LUSTRE_LINUX_CONFIG=/global/panfs01/admin/src/5.15.0-spr.bkc.pc.1.21.0.x86_64/.config
 
LINUXINCLUDE='-I/global/panfs01/admin/src/5.15.0-spr.bkc.pc.1.21.0.x86_64/arch/x86/include
 -Iinclude -Iarch/x86/include/generated 
-I/global/panfs01/admin/src/5.15.0-spr.bkc.pc.1.21.0.x86_64/include -Iinclude2 
-I/global/panfs01/admin/src/5.15.0-spr.bkc.pc.1.21.0.x86_64/include/uapi 
-Iinclude/generated 
-I/global/panfs01/admin/src/5.15.0-spr.bkc.pc.1.21.0.x86_64/arch/x86/include/uapi
 -Iarch/x86/include/generated/uapi 
-I/global/panfs01/admin/src/5.15.0-spr.bkc.pc.1.21.0.x86_64/include/uapi 
-Iinclude/generated/uapi -include 
/global/panfs01/admin/src/5.15.0-spr.bkc.pc.1.21.0.x86_64/include/linux/kconfig.h'
 -o tmp_include_depends -o scripts -o include/config/MARKER -C 
/global/panfs01/admin/src/5.15.0-spr.bkc.pc.1.21.0.x86_64 
EXTRA_CFLAGS='-Werror-implicit-function-declaration -g 
-I/tmp/lustre-2.12.5/libcfs/include -I/tmp/lustre-2.12.5/lnet/include 
-I/tmp/lustre-2.12.5/lustre/include/uapi -I/tmp/lustre-2.12.5/lustre/include 
-Wno-format-truncation -Wno-stringop-truncation -Wno-stringop-overflow' 
M=/tmp/lustre-2.12.5/build
make: Entering directory 
'/global/panfs01/admin/src/5.15.0-spr.bkc.pc.1.21.0.x86_64'
make: *** No rule to make target '_module_/tmp/lustre-2.12.5/build'.  Stop.
make: Leaving directory 
'/global/panfs01/admin/src/5.15.0-spr.bkc.pc.1.21.0.x86_64'

-Original Message-
From: lustre-discuss  On Behalf Of 
Hebenstreit, Michael via lustre-discuss
Sent: Thursday, December 30, 2021 5:07 PM
To: lustre-discuss@lists.lustre.org
Subject: [lustre-discuss] build help with 5.15 kernel

Hello

I'm trying to build the Lustre 2.12.8 client on a 5.15 kernel and already 
failing in the configure step. Looks to me like something in the build process 
has changed. The failure occurs in configure line 14390. From the log:

configure:14390: cp conftest.c build && make -d 
_module_/tmp/lustre-2.12.8/build LUSTRE_KERNEL_TEST=conftest.i LDFLAGS= 
LD=/usr/bin/ld -m elf_x86_64 CC=gcc -f
make: *** No rule to make target '_module_/tmp/lustre-2.12.8/build'.  Stop.
configure:14393: $? = 2

For some reasons the construct "make -d _module_/${PWD} .." does no

Re: [lustre-discuss] project quota problem on existing directories from filesystem created on zfs 0.7 pool

2022-01-17 Thread Degremont, Aurelien via lustre-discuss
Hi

I'm not a specialist of project quotas, but I have a more generic comment.
I see you said you upgraded to 2.14.58? Is that a version you picked on purpose?

2.14.58 is not intended for production at all. This is an alpha version of what 
will become Lustre 2.15.

If you want a production-compatible version you should use 2.12.8 or 2.14.0.
Never pick a version where the last digit is > 50; that means "development 
release".


Aurélien

Le 14/01/2022 15:12, « lustre-discuss au nom de Chen Wei via lustre-discuss » 
 a écrit :


Hi,

We have a lustre filesystem created by Lustre 2.12.8 on zfs 0.7.13 pool.
Since "Project quotas are not supported on zfs versions earlier than
0.8", it has been recently upgraded to Lustre 2.14.58 with zfs 2.0.7.
After upgrade the zfs pool to enable project quota feature and enable
project quota in Lustre, project quota works on new directory. However,
for existing directories, set project quota command fail with:

# lfs project -p 3000 andy
lfs: failed to set xattr for 'andy': No such device or address

Strace of above command:

...
open("andy", O_RDONLY|O_NOCTTY|O_NONBLOCK) = 3
ioctl(3, 0x801c581f, 0x7ffe6fa35d30)= 0
ioctl(3, 0x401c5820, 0x7ffe6fa35d30)= -1 ENXIO (No such device or 
address)
write(2, "lfs: failed to set xattr for 'an"..., 63lfs: failed to set xattr 
for 'andy': No such device or address
) = 63
close(3)= 0
exit_group(1)   = ?


Is there a way to enable project quota on existing directories?


Thanks


--
Wei Chen
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org



Re: [lustre-discuss] Compiling lustre 2.12.X with linux kernel 5.10.X and above

2022-05-12 Thread Degremont, Aurelien via lustre-discuss
Lustre 2.14.0 supports Linux kernels up to 5.4.
Lustre 2.15.0, which will be released in the coming days, supports up to Linux 
5.11 according to the ChangeLog, but supports clients up to 5.14 according to 
this ticket: https://jira.whamcloud.com/browse/LU-15220

This ticket is tracking the effort to support Linux 5.15:
https://jira.whamcloud.com/browse/LU-15420


Aurélien

Le 12/05/2022 06:23, « lustre-discuss au nom de Tung-Han Hsieh » 
 a écrit :


Dear All,

We tried to compile Lustre-2.12.6, 2.12.8, and 2.14.0 with Linux
kernel 5.10.114 and 5.15.38, the newest releases of the longterm
series of Linux kernel in Linux Kernel Archives:

https://www.kernel.org/

but all failed in configure state. When running this command in:

./configure --prefix=/opt/lustre --with-linux=/usr/src/linux-5.4.192 \
--with-o2ib=no --disable-server --enable-mpitests=no

it prompted with the error message:


==
checking for /usr/src/linux-5.10.114/include/generated/autoconf.h... yes
checking for /usr/src/linux-5.10.114/include/linux/version.h... no
checking for 
/usr/src/linux-5.10.114/include/generated/uapi/linux/version.h... yes
checking for /usr/src/linux-5.10.114/include/linux/kconfig.h... yes
checking for external module build target... configure: error: unknown; 
check config.log for details

==

The error logs in config.log are attached below.

I am wondering whether there is a plan to port Lustre to Linux kernel
version 5 ? at least the Lustre client part. Upgrading to Linux kernel
version 5 is necessary for us, because the drivers of the embedded
ethernet cards of some newly purchased hardware only available in
Linux kernel version 5.

Thanks very much for your reply in advance.

Best Regards,

T.H.Hsieh


==
configure:10681: checking for external module build target
configure:10709: cp conftest.c build && make -d 
/usr/src/lustre-2.12.6/build LUSTRE_KERNEL_TEST=conftest.i LDFLAGS= 
LD=/usr/bin/ld -m elf_x86_64 CC=gcc -f /usr/src/lustre-2.12.6/build/Makefile 
LUSTRE_LINUX_CONFIG=/usr/src/linux-5.10.114/.config LINUXINCLUDE= 
-I/usr/src/linux-5.10.114/arch/x86/include -Iinclude 
-Iarch/x86/include/generated -I/usr/src/linux-5.10.114/include -Iinclude2 
-I/usr/src/linux-5.10.114/include/uapi -Iinclude/generated 
-I/usr/src/linux-5.10.114/arch/x86/include/uapi 
-Iarch/x86/include/generated/uapi -I/usr/src/linux-5.10.114/include/uapi 
-Iinclude/generated/uapi -include 
/usr/src/linux-5.10.114/include/linux/kconfig.h -o tmp_include_depends -o 
scripts -o include/config/MARKER -C /usr/src/linux-5.10.114 
EXTRA_CFLAGS=-Werror-implicit-function-declaration -g 
-I/usr/src/lustre-2.12.6/libcfs/include -I/usr/src/lustre-2.12.6/lnet/include 
-I/usr/src/lustre-2.12.6/lustre/include/uapi 
-I/usr/src/lustre-2.12.6/lustre/include -Wno-format-truncation 
-Wno-stringop-truncation
  -Wno-stringop-overflow SUBDIRS=/usr/src/lustre-2.12.6/build
configure:10712: $? = 0
configure:10714: test -s build/conftest.i
configure:10717: $? = 1
configure: failed program was:
| /* confdefs.h */
| #define PACKAGE_NAME "Lustre"
| #define PACKAGE_TARNAME "lustre"
| #define PACKAGE_VERSION "2.12.6"
| #define PACKAGE_STRING "Lustre 2.12.6"
| #define PACKAGE_BUGREPORT "https://jira.whamcloud.com/";
| #define PACKAGE_URL ""
| #define PACKAGE "lustre"
| #define VERSION "2.12.6"
| #define STDC_HEADERS 1
| #define HAVE_SYS_TYPES_H 1
| #define HAVE_SYS_STAT_H 1
| #define HAVE_STDLIB_H 1
| #define HAVE_STRING_H 1
| #define HAVE_MEMORY_H 1
| #define HAVE_STRINGS_H 1
| #define HAVE_INTTYPES_H 1
| #define HAVE_STDINT_H 1
| #define HAVE_UNISTD_H 1
| #define HAVE_DLFCN_H 1
| #define LT_OBJDIR ".libs/"
| #define LUSTRE_MAJOR 2
| #define LUSTRE_MINOR 12
| #define LUSTRE_PATCH 6
| #define LUSTRE_FIX 0
| #define LUSTRE_VERSION_STRING "2.12.6"
| #define SIZEOF_UNSIGNED_LONG_LONG 8
| /* end confdefs.h.  */
|
| #include 
| #include 
|
| int
| main (void)
| {
|
|   ;
|   return 0;
| };
| MODULE_LICENSE("GPL");
configure:10746: cp conftest.c build && make -d 
_module_/usr/src/lustre-2.12.6/build LUSTRE_KERNEL_TEST=conftest.i LDFLAGS= 
LD=/usr/bin/ld -m elf_x86_64 CC=gcc -f /usr/src/lustre-2.12.6/build/Makefile 
LUSTRE_LINUX_CONFIG=/usr/src/linux-5.10.114/.config LINUXINCLUDE= 
-I/usr/src/linux-5.10.114/arch/x86/include -Iinclude 
-Iarch/x86/include/generated -I/usr/src/linux-5.10.114/inclu

Re: [lustre-discuss] Lustre recycle bin

2022-10-20 Thread Degremont, Aurelien via lustre-discuss
Hi François

[root@server1 ~] rm: cannot remove ‘Logs: Cannot send after transport endpoint 
shutdown
[root@server1 ~] mv: cannot move /test/lustre/structure1 to 
‘/test/lustre/structure2’: Input/output error

These 2 error messages are typical of a client eviction issue. Your client 
was evicted (likely from the MDT) and you should expect ESHUTDOWN or EIO in 
that situation. Look at the kernel logs (dmesg) from both client and server to 
try to understand why. The client usually reconnects quickly when this happens, 
and subsequent accesses should not return errors.
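
A hedged example of what to look for on the client (message wording varies 
between Lustre versions):

    dmesg | grep -iE 'evict|connection.*lost|connection restored'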

Aurélien

De : lustre-discuss  au nom de 
"Cloete, F. (Francois) via lustre-discuss" 
Répondre à : "Cloete, F. (Francois)" 
Date : jeudi 20 octobre 2022 à 08:48
À : "Spitz, Cory James" , Alastair Basden 

Cc : "lustre-discuss@lists.lustre.org" 
Objet : RE: [EXTERNAL][lustre-discuss] Lustre recycle bin



Hi Cory,
They are both running the same versions on client and mgs server.

lfs 2.12.8_6_g5457c37

Not sure if this could be related, but our Lustre environment started behaving 
strangely 2-3 weeks ago.

Also seeing the below when doing rsync of folders to new destinations.
failed: Cannot send after transport endpoint shutdown (108)


[root@server1 ~] rm: cannot remove ‘Logs: Cannot send after transport endpoint 
shutdown
[root@server1 ~] mv: cannot move /test/lustre/structure1 to 
‘/test/lustre/structure2’: Input/output error

[root@server1 ~] ~] ll
[root@server1 ~] ls: cannot open directory .: Cannot send after transport 
endpoint shutdown

Regards

From: Spitz, Cory James 
Sent: Monday, 17 October 2022 20:56
To: Cloete, F. (Francois) ; Alastair Basden 

Cc: lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] Lustre recycle bin


What version(s) are you using?  Do you have an old client and a new-ish server?

Very old client versions will disagree with the MDSes about how to clean up 
objects, resulting in orphans.

-Cory

On 10/17/22, 3:44 AM, "lustre-discuss" 
mailto:lustre-discuss-boun...@lists.lustre.org>>
 wrote:

Thank-you!

-Original Message-
From: Alastair Basden <a.g.bas...@durham.ac.uk>
Sent: Monday, 17 October 2022 10:13
To: Cloete, F. (Francois) <francois...@nedbank.co.za>
Cc: Andreas Dilger <adil...@whamcloud.com>; lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] Lustre recycle bin



Hi Francois,

We had something similar a few months back - I suspect a bug somewhere.

Basically files weren't getting removed from the OST.  Eventually, we mounted 
as ext, and removed them manually, I think.

A reboot of the file system meant that rm operations then proceeded correctly 
after that.

Cheers,
Alastair.

On Mon, 17 Oct 2022, Cloete, F. (Francois) via lustre-discuss wrote:

> [EXTERNAL EMAIL]
> Hi Andreas,
> Our OSTs still display high file-system usage after removing folders.
>
> Are there any commands that could be run to confirm if the allocated space 
> which was used by those files have been released successfully ?
>
> Thanks
> Francois
>
> From: Andreas Dilger <adil...@whamcloud.com>
> Sent: Saturday, 15 October 2022 00:20
> To: Cloete, F. (Francois) <francois...@nedbank.co.za>
> Cc: lustre-discuss@lists.lustre.org
> Subject: Re: [lustre-discuss] Lustre recycle bin
>
> There isn't a recycle bin, but filenames are deleted from the filesystem 
> quickly and the data objects are deleted i

Re: [lustre-discuss] Patched or Patch less kernel

2022-10-20 Thread Degremont, Aurelien via lustre-discuss
Hi Christopher,

As far as I know, this will only prevent you from using a few features that 
require either a recent Linux kernel or a kernel with the appropriate backports 
for them to work. Off the top of my head, I'm thinking of project quotas for 
ldiskfs (but you are using ZFS), or client-side encryption (but this is not yet 
available in 2.12). Others may know better, but with respect to a patched 
kernel, you should be pretty safe going with the stock kernel if you are using 
Lustre 2.12 and ZFS.


Aurélien

Le 19/10/2022 19:35, « lustre-discuss au nom de Mountford, Christopher J. 
(Dr.) via lustre-discuss »  a écrit :


Is there any disadvantage to using a stock distribution kernel on our 
Centos 7 lustre servers instead of the _lustre kernel provided in the lustre 
release (version 2.12.9)?

We build spl/zfs and the lustre-zfs kernel modules using dkms and 
standardized on the kernel in the lustre server release a while ago.

Kind Regards,
Christopher.
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org



Re: [lustre-discuss] Problem in confugration of MGS and MDT server in lustre.

2022-12-20 Thread Degremont, Aurelien via lustre-discuss
Hi

Yes, both parameters at the same time are valid.

Regarding the RPMs you installed: you picked a new kernel; did you reboot to 
use it? If not, you should.
You are missing these 2 packages (I'm even surprised yum did not complain about 
missing deps):
kmod-lustre-2.15.1-1.el8.x86_64.rpm
kmod-lustre-osd-ldiskfs-2.15.1-1.el8.x86_64.rpm

lustre-ldiskfs-dkms: you installed ldiskfs support as a DKMS package, but not 
Lustre as DKMS too. I recommend using the same approach for every package, so 
here, install the kmod osd package above instead of your DKMS one.

Probably the best option for you is to set up 
https://downloads.whamcloud.com/public/lustre/latest-release/el8.6/server/ as an 
additional YUM repository.
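
A hedged sketch of such a repo file (the file name is an assumption):

    # /etc/yum.repos.d/lustre-server.repo
    [lustre-server]
    name=lustre-server
    baseurl=https://downloads.whamcloud.com/public/lustre/latest-release/el8.6/server/
    gpgcheck=0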

Aurélien

De : lustre-discuss  au nom de Taner 
KARAGÖL via lustre-discuss 
Répondre à : Taner KARAGÖL 
Date : mardi 20 décembre 2022 à 07:12
À : Nick dan 
Cc : "lustre-discuss@lists.lustre.org" 
Objet : RE: [EXTERNAL][lustre-discuss] Problem in confugration of MGS and MDT 
server in lustre.



UNCLASSIFIED

Hi,

Is it valid to use two parameters at the same time ("--mgs --mdt")? I don't 
think so.

Regards,

From: lustre-discuss  On Behalf Of 
Nick dan via lustre-discuss
Sent: Tuesday, December 20, 2022 7:17 AM
To: lustre-discuss@lists.lustre.org
Subject: [lustre-discuss] Problem in confugration of MGS and MDT server in 
lustre.

Hi,
Nick here, trying to configure Lustre MGS and MDT on Red Hat 8.6, 
following https://www.lustre.org/
I successfully downloaded and installed the following packages using the yum 
install command:

yum install 
https://downloads.whamcloud.com/public/lustre/latest-release/el8.6/server/RPMS/x86_64/kernel-4.18.0-372.9.1.el8_lustre.x86_64.rpm
yum install 
https://downloads.whamcloud.com/public/lustre/latest-release/el8.6/server/RPMS/x86_64/lustre-2.15.1-1.el8.x86_64.rpm
yum install 
https://downloads.whamcloud.com/public/lustre/latest-release/el8.6/server/RPMS/x86_64/lustre-ldiskfs-dkms-2.15.1-1.el8.noarch.rpm
yum install 
https://downloads.whamcloud.com/public/lustre/latest-release/el8.6/server/RPMS/x86_64/lustre-osd-ldiskfs-mount-2.15.1-1.el8.x86_64.rpm
yum install 
https://downloads.whamcloud.com/public/e2fsprogs/latest/el8/RPMS/x86_64/e2fsprogs-1.46.2.wc5-0.el8.x86_64.rpm
yum install 
https://downloads.whamcloud.com/public/e2fsprogs/latest/el8/RPMS/x86_64/e2fsprogs-libs-1.46.2.wc5-0.el8.x86_64.rpm
yum install 
https://downloads.whamcloud.com/public/e2fsprogs/latest/el8/RPMS/x86_64/libcom_err-1.46.2.wc5-0.el8.x86_64.rpm
yum install 
https://downloads.whamcloud.com/public/e2fsprogs/latest/el8/RPMS/x86_64/libss-1.46.2.wc5-0.el8.x86_64.rpm
After this, I tried to run mkfs.lustre on my available NVMe disk (nvme1n1), 
shown in the attached screenshot (image omitted here).

I ran mkfs.lustre on it and got the following error:


mkfs.lustre --fsname=lustre  --mgs --mdt --index=0  /dev/nvme1n1

   Permanent disk data:
Target: lustre:MDT
Index:  0
Lustre FS:  lustre
Mount type: ldiskfs
Flags:  0x65
  (MDT MGS first_time update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters:

checking for existing Lustre data: not found
device size = 953869MB
formatting backing filesystem ldiskfs on /dev/nvme1n1
target name   lustre:MDT
kilobytes 976762584
options-J size=4096 -I 1024 -i 2560 -q -O 
dirdata,uninit_bg,^extents,dir_nlink,quota,project,huge_file,ea_inode,large_dir,^fast_commit,flex_bg
 -E lazy_journal_init="0",lazy_itable_init="0" -F
mkfs_cmd = mke2fs -j -b 4096 -L lustre:MDT  -J size=4096 -I 1024 -i 2560 -q 
-O 
dirdata,uninit_bg,^extents,dir_nlink,quota,project,huge_file,ea_inode,large_dir,^fast_commit,flex_bg
 -E lazy_journal_init="0",lazy_itable_init="0" -F /dev/nvme1n1 976762584k
mkfs.lustre: Unable to mount /dev/nvme1n1: No such device
Is the ldiskfs module available?
mkfs.lustre FATAL: failed to write local files
mkfs.lustre: exiting with 19 (No such device)

Can you help where am I going wrong?

Regards,

Nick



Re: [lustre-discuss] OST not freeing space for deleted files?

2023-01-12 Thread Degremont, Aurelien via lustre-discuss
Hello Daniel,

You should also check whether some user workload is triggering that load, like 
a constant stream of syncs to files on those OSTs, for example.

Aurélien

Le 11/01/2023 22:37, « lustre-discuss au nom de Daniel Szkola via 
lustre-discuss »  a écrit :


We recently had to take an OSS node that hosts two OSTs out of service to 
test the hardware as it was randomly power cycling.

I migrated all files off of the two OSTs and after some testing we brought 
the node back into service after recreating the ZFS pools
and the two OSTs. Since then it’s been mostly working fine, however we’ve 
noticed a few group quotas reporting file usage that doesn’t
seem to match what is actually on the filesystem. The inode counts seem to 
be correct, but the space used is way too high.

After lots of poking around I am seeing this on the two OSTS:

osp.lfsc-OST0004-osc-MDT.sync_changes=13802381
osp.lfsc-OST0005-osc-MDT.sync_changes=13060667

I upped the max_rpcs_in_progress and max_rpcs_in_flight for the two OSTs, 
but that just caused the numbers to dip slightly.
All other OSTs have 0 for that value. Also destroys_in_flight show similar 
numbers for the two OSTs.

Any ideas how I can remedy this?

Lustre 2.12.8
ZFS 0.7.13

—
Dan Szkola
FNAL





___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org



Re: [lustre-discuss] Lustre Client

2023-01-13 Thread Degremont, Aurelien via lustre-discuss
Did you try? :)


But the answer is yes, ‘-o ro’ is supported for client mounts.
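
For example (the MGS NID and filesystem name are hypothetical):

    mount -t lustre -o ro 10.0.0.1@tcp:/lustre /mnt/lustre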

Aurélien
De : lustre-discuss  au nom de Nick 
dan via lustre-discuss 
Répondre à : Nick dan 
Date : vendredi 13 janvier 2023 à 10:48
À : "lustre-discuss@lists.lustre.org" , 
"lustre-discuss-requ...@lists.lustre.org" 
, 
"lustre-discuss-ow...@lists.lustre.org" 
Objet : [EXTERNAL] [lustre-discuss] Lustre Client



Hi

I wanted to ask if Lustre client can be mounted as read-only client or not

Regards,
Nick Dan
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] OST not freeing space for deleted files?

2023-01-13 Thread Degremont, Aurelien via lustre-discuss
In the past, when I've seen such an issue, it was really because there were 
more threads adding new entries to that queue than the MDT was able to remove.
- Verify how many sync_in_flight you have?
- You're talking about Robinhood. Is Robinhood deleting lots of files?
- You're saying your destroy queue is not emptying, is there a steady UNLINK 
load coming to your MDT?
- Verify how many new requests is coming to your MDT

lctl set_param mdt.lfsc-MDT.md_stats=clear
sleep 10
lctl get_param mdt.lfsc-MDT.md_stats
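
For the first checks in the list above, a hedged sketch using the osp parameter 
names already mentioned in this thread:

    lctl get_param osp.*.sync_in_flight osp.*.sync_changes osp.*.destroys_in_flight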


Aurélien

Le 12/01/2023 18:38, « lustre-discuss au nom de Daniel Szkola via 
lustre-discuss »  a écrit :


I’m not seeing anything obvious. Today, the inode counts are increased and 
the group has reached their hard limit.
We have Robinhood running and the numbers there seem accurate but the quota 
numbers are still high.

I’m seeing things like this on the MDS node in dmesg:

[Wed Jan 11 11:39:07 2023] LustreError: 
39308:0:(osp_dev.c:1682:osp_iocontrol()) lfsc-OST0004-osc-MDT: unrecognized 
ioctl 0xc00866e6 by lctl
[Wed Jan 11 11:39:14 2023] LustreError: 
39314:0:(class_obd.c:465:class_handle_ioctl()) OBD ioctl : No Device -12066
[Wed Jan 11 11:39:38 2023] LustreError: 
39385:0:(class_obd.c:465:class_handle_ioctl()) OBD ioctl : No Device -12066
[Wed Jan 11 11:39:38 2023] LustreError: 
39385:0:(class_obd.c:465:class_handle_ioctl()) Skipped 1 previous similar 
message
[Wed Jan 11 12:06:12 2023] LustreError: 41360:0:(lod_dev.c:1551:lod_sync()) 
lfsc-MDT-mdtlov: can't sync ost 4: rc = -110
[Wed Jan 11 12:06:12 2023] LustreError: 41360:0:(lod_dev.c:1551:lod_sync()) 
Skipped 1 previous similar message
[Wed Jan 11 12:09:30 2023] LustreError: 41362:0:(lod_dev.c:1551:lod_sync()) 
lfsc-MDT-mdtlov: can't sync ost 4: rc = -110
[Wed Jan 11 16:18:27 2023] LustreError: 41360:0:(lod_dev.c:1551:lod_sync()) 
lfsc-MDT-mdtlov: can't sync ost 4: rc = -110

Only seeing this for OST4 though and not 5, both of which seem to be having 
the problem. So, these may be harmless.

I still don’t know why the destroys_in_flight are over 13 million and not 
decreasing. Any ideas?

—
Dan Szkola
FNAL



> On Jan 12, 2023, at 2:59 AM, Degremont, Aurelien  
wrote:
>
> Hello Daniel,
>
> You should also check if there is not some user workload that is 
triggering that load, like a constant load of SYNC to files on those OSTs by 
example.
>
> Aurélien
>
> Le 11/01/2023 22:37, « lustre-discuss au nom de Daniel Szkola via 
lustre-discuss »  a écrit :
>
>
>We recently had to take an OSS node that hosts two OSTs out of service 
to test the hardware as it was randomly power cycling.
>
>I migrated all files off of the two OSTs and after some testing we 
brought the node back into service after recreating the ZFS pools
>and the two OSTs. Since then it’s been mostly working fine, however 
we’ve noticed a few group quotas reporting file usage that doesn’t
>seem to match what is actually on the filesystem. The inode counts 
seem to be correct, but the space used is way too high.
>
>After lots of poking around I am seeing this on the two OSTS:
>
>osp.lfsc-OST0004-osc-MDT.sync_changes=13802381
>osp.lfsc-OST0005-osc-MDT.sync_changes=13060667
>
>I upped the max_rpcs_in_progress and max_rpcs_in_flight for the two 
OSTs, but that just caused the numbers to dip slightly.
>All other OSTs have 0 for that value. Also destroys_in_flight show 
similar numbers for the two OSTs.
>
>Any ideas how I can remedy this?
>
>Lustre 2.12.8
>ZFS 0.7.13
>
>—
>Dan Szkola
>FNAL
>
>
>
>
>
>___
>lustre-discuss mailing list
>lustre-discuss@lists.lustre.org
>
https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.lustre.org_listinfo.cgi_lustre-2Ddiscuss-2Dlustre.org&d=DwIGaQ&c=gRgGjJ3BkIsb5y6s49QqsA&r=e9DXjyTaQ786Tg7WH7oIVaQOA1YDRqyxHOUaYU2_LQw&m=DBtSEnlwRKd7IUYAtj21XR88qwWp8PCksiUQy7Mn0imnzYiq8OhdYUVdjx3aGoyR&s=T29TaXoWSYBTh5eRNhMflhEe2YEQu8M1CDqrp_NSNMg&e=
>

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org



Re: [lustre-discuss] Lustre Support for Postgres

2023-01-19 Thread Degremont, Aurelien via lustre-discuss
Hi Dan,

There are no a priori incompatibilities between Lustre and Postgres. Don't 
bother configuring some clients in RW and some in RO before having properly 
done your Postgres setup.

However, this is a Lustre mailing list and you're asking about Postgres setup; 
this is not the right place.
You should ask those questions in a Postgres community.

Aurélien

De : lustre-discuss  au nom de Nick 
dan via lustre-discuss 
Répondre à : Nick dan 
Date : jeudi 19 janvier 2023 à 12:23
À : "lustre-discuss@lists.lustre.org" , 
"lustre-discuss-requ...@lists.lustre.org" 
, 
"lustre-discuss-ow...@lists.lustre.org" 
Objet : [EXTERNAL] [lustre-discuss] Lustre Support for Postgres



Hi

We have mounted the same data directory on all the lustre clients and have 
started the postgres service on all clients.
Our requirement is as follows:
1 client as read-write node
Other clients as read only

We want to know if Postgres is compatible with Lustre and this case is 
achievable.
Can you please let us know if this is possible?

Regards,
Nick Dan
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Lustre with ZFS Install

2023-01-24 Thread Degremont, Aurelien via lustre-discuss
Hi

It looks like the ‘./configure’ command was not successful. Did you check it?
Also, please copy/paste terminal output as text and not as a picture.

Aurélien

De : lustre-discuss  au nom de Nick 
dan via lustre-discuss 
Répondre à : Nick dan 
Date : mardi 24 janvier 2023 à 09:31
À : "lustre-discuss@lists.lustre.org" , 
"lustre-discuss-requ...@lists.lustre.org" 
, 
"lustre-discuss-ow...@lists.lustre.org" 
Objet : [EXTERNAL] [lustre-discuss] Lustre with ZFS Install



Hi,

We are trying to use ZFS with Lustre referring to the link:
https://wiki.lustre.org/Lustre_with_ZFS_Install#Build_ZFS

We are using the following steps to do so and getting error while making rpms.

git clone 
git://git.whamcloud.com/fs/lustre-release.git
cd lustre-release/
sh ./autogen.sh
./configure --disable-ldiskfs

make rpms (when we run make rpms, we get the following error, shown below)
[root@sv01 lustre-release]# make rpms
make: *** No rule to make target 'rpms'.  Stop.
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Lustre with ZFS Install

2023-01-24 Thread Degremont, Aurelien via lustre-discuss
> configure: WARNING: GSS keyring backend requires libkeyutils

The configure command clearly says that libkeyutils should be installed.
Did you try to install it?

Under RHEL, this is probably: dnf install libkeyutils-devel


Aurélien

De : Nick dan 
Date : mardi 24 janvier 2023 à 10:41
À : "Degremont, Aurelien" , 
"lustre-discuss@lists.lustre.org" , 
"lustre-discuss-requ...@lists.lustre.org" 
, 
"lustre-discuss-ow...@lists.lustre.org" 
Objet : RE: [EXTERNAL][lustre-discuss] Lustre with ZFS Install



Hi

I have attached the text file. I have got the following error on ./configure.
configure: error: Cannot enable gss_keyring. See above for details.

Can you help with this?

On Tue, 24 Jan 2023 at 14:47, Degremont, Aurelien 
mailto:degre...@amazon.fr>> wrote:
Hi

It looks like the ‘./configure’ command was not successful. Did you check it?
Also, please copy/paste terminal output as text and not as a picture.

Aurélien

De : lustre-discuss 
mailto:lustre-discuss-boun...@lists.lustre.org>>
 au nom de Nick dan via lustre-discuss 
mailto:lustre-discuss@lists.lustre.org>>
Répondre à : Nick dan mailto:nickdan2...@gmail.com>>
Date : mardi 24 janvier 2023 à 09:31
À : "lustre-discuss@lists.lustre.org" 
mailto:lustre-discuss@lists.lustre.org>>, 
"lustre-discuss-requ...@lists.lustre.org"
 
mailto:lustre-discuss-requ...@lists.lustre.org>>,
 
"lustre-discuss-ow...@lists.lustre.org"
 
mailto:lustre-discuss-ow...@lists.lustre.org>>
Objet : [EXTERNAL] [lustre-discuss] Lustre with ZFS Install



Hi,

We are trying to use ZFS with Lustre referring to the link:
https://wiki.lustre.org/Lustre_with_ZFS_Install#Build_ZFS

We are using the following steps to do so and getting error while making rpms.

git clone 
git://git.whamcloud.com/fs/lustre-release.git
cd lustre-release/
sh ./autogen.sh
./configure --disable-ldiskfs

make rpms (When we are doing make rpms, we are getting the following error) 
(Error attached in ss below)
[root@sv01 lustre-release]# make rpms
make: *** No rule to make target 'rpms'.  Stop.
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org