[lustre-discuss] Lnet errors

2023-10-05 Thread Alastair Basden via lustre-discuss

Hi,

Lustre 2.12.2.

We are seeing lots of errors on the servers such as:
Oct  5 11:16:48 oss04 kernel: LNetError: 
6414:0:(lib-move.c:2955:lnet_resend_pending_msgs_locked()) Error sending PUT to 
12345-172.19.171.15@o2ib1: -125
Oct  5 11:16:48 oss04 kernel: LustreError: 
6414:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc 
8fe066bb9400

and
Oct  4 14:59:48 oss04 kernel: LustreError: 
6383:0:(events.c:305:request_in_callback()) event type 2, status -103, service 
ost_io

and
Oct  5 11:18:06 oss04 kernel: LustreError: 
6388:0:(events.c:305:request_in_callback()) event type 2, status -5, service 
ost_io
Oct  5 11:18:06 oss04 kernel: LNet: 
6412:0:(o2iblnd_cb.c:413:kiblnd_handle_rx()) PUT_NACK from 172.19.171.15@o2ib1

and on the clients:
m7: Oct  5 14:46:59 m7132 kernel: LustreError: 
2466:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc 
9a251fc14400

and
m7: Oct  5 11:18:34 m7086 kernel: LustreError: 
2495:0:(events.c:200:client_bulk_callback()) event type 2, status -5, desc 
9a39ad668000

Does anyone have any ideas about what could be causing this?
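
(For reference, the status values in these messages look like negated Linux 
errno codes.  A quick way to decode them, assuming python3 is on the box:

  python3 -c 'import errno,os; [print(-n, errno.errorcode[n], os.strerror(n)) for n in (5,103,125)]'
  # -5 EIO Input/output error
  # -103 ECONNABORTED Software caused connection abort
  # -125 ECANCELED Operation canceled

i.e. the statuses above are EIO, ECONNABORTED and ECANCELED respectively.)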

Thanks,
Alastair.
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Changing OST servicenode

2022-11-16 Thread Alastair Basden via lustre-discuss

Hi,

We want to change the service node of an OST.  We think this involves:
1. umount the OST
2. tunefs.lustre --erase-param failover.node \
      --servicenode=172.18.100.1@o2ib,172.17.100.1@tcp pool1/ost1

Is this all?  It's unclear from the documentation whether a writeconf is 
required (if it is, we'd need to unmount the whole file system, take it all 
down, writeconf every OST/MDT/MGT, and then mount everything again in 
order).
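
For concreteness, the full sequence as we understand it - mount point 
illustrative; pool1/ost1 is the ZFS target from above:

  umount /mnt/ost1                      # take the OST offline
  tunefs.lustre --erase-param failover.node \
      --servicenode=172.18.100.1@o2ib,172.17.100.1@tcp pool1/ost1
  mount -t lustre pool1/ost1 /mnt/ost1  # bring it back with the new NIDs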


Thanks,
Alastair.
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Lustre recycle bin

2022-10-17 Thread Alastair Basden via lustre-discuss

Hi Francois,

We had something similar a few months back - I suspect a bug somewhere.

Basically, files weren't getting removed from the OST.  Eventually, we 
mounted the OST directly as ldiskfs (ext4) and removed them manually, I think.


A reboot of the file system then allowed rm operations to proceed correctly.


Cheers,
Alastair.

On Mon, 17 Oct 2022, Cloete, F. (Francois) via lustre-discuss wrote:


Hi Andreas,
Our OSTs still display high file-system usage after removing folders.

Are there any commands that could be run to confirm whether the space 
allocated to those files has been released successfully?

Thanks
Francois

From: Andreas Dilger 
Sent: Saturday, 15 October 2022 00:20
To: Cloete, F. (Francois) 
Cc: lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] Lustre recycle bin

There isn't a recycle bin, but filenames are deleted from the filesystem 
quickly and the data objects are deleted in the background asynchronously (with 
transactions to prevent the space being leaked).  If there are a lot of files 
this may take some time; rebooting will not speed it up.
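
One way to watch the backlog drain is to query the OSP devices on the MDS - 
a sketch; exact parameter names can vary between versions:

  lctl get_param osp.*.destroys_in_flight   # destroy RPCs still outstanding
  lctl get_param osp.*.sync_changes         # changes not yet pushed to the OSTs

Once these reach zero, the freed space should be visible in "lfs df".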


On Oct 14, 2022, at 10:00, Cloete, F. (Francois) via lustre-discuss 
<lustre-discuss@lists.lustre.org> wrote:

Hi Community,
Is anyone aware of a recycle bin parameter for Lustre?

Just deleted a whole lot of files but for some reason the space is not getting 
cleared.

Server rebooted, file system unmounted, etc.

Thanks



Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Whamcloud






___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] (no subject)

2022-05-17 Thread Alastair Basden via lustre-discuss

Hi all,

We had a problem with one of our MDSes (ldiskfs) on Lustre 2.12.6, which we 
think is a bug, but we haven't been able to identify it.  Can anyone shed 
any light?  We unmounted and remounted the MDT at around 23:00.
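
For anyone hitting something similar, this is roughly the state we would try 
to capture before remounting next time (standard lctl calls; the dump path 
is illustrative):

  lctl get_param mdt.*.recovery_status             # recovery/eviction state
  lctl get_param ldlm.namespaces.mdt-*.lock_count  # locks held per namespace
  lctl dk /tmp/lustre-debug.log                    # dump the kernel debug buffer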


Client logs:
May 16 22:15:41 m8011 kernel: LustreError: 11-0: 
lustrefs8-MDT-mdc-956fb73c3800: operation ldlm_enqueue to node 
172.18.185.1@o2ib failed: rc = -107
May 16 22:15:41 m8011 kernel: Lustre: lustrefs8-MDT-mdc-956fb73c3800: 
Connection to lustrefs8-MDT (at 172.18.185.1@o2ib) was lost; in progress 
operations using this service will wait for recovery to complete
May 16 22:15:41 m8011 kernel: LustreError: Skipped 5 previous similar messages
May 16 22:15:48 m8011 kernel: Lustre: 
101710:0:(client.c:2146:ptlrpc_expire_one_request()) @@@ Request sent has timed 
out for slow reply: [sent 1652735641/real 1652735641]  req@949d8cb1de80 
x1724290358528896/t0(0) 
o101->lustrefs8-MDT-mdc-956fb73c3800@172.18.185.1@o2ib:12/10 lens 
480/568 e 4 to 1 dl 1652735748 ref 2 fl Rpc:X/0/ rc 0/-1
May 16 22:15:48 m8011 kernel: Lustre: 
101710:0:(client.c:2146:ptlrpc_expire_one_request()) Skipped 6 previous similar 
messages
May 16 23:00:15 m8011 kernel: Lustre: 
4784:0:(client.c:2146:ptlrpc_expire_one_request()) @@@ Request sent has timed out 
for slow reply: [sent 1652738408/real 1652738408]  req@94ea07314380 
x1724290358763776/t0(0) o400->MGC172.18.185.1@o2ib@172.18.185.1@o2ib:26/25 lens 
224/224 e 0 to 1 dl 1652738415 ref 1 fl Rpc:XN/0/ rc 0/-1
May 16 23:00:15 m8011 kernel: LustreError: 166-1: MGC172.18.185.1@o2ib: 
Connection to MGS (at 172.18.185.1@o2ib) was lost; in progress operations using 
this service will fail
May 16 23:00:15 m8011 kernel: Lustre: Evicted from MGS (at 
MGC172.18.185.1@o2ib_0) after server handle changed from 0xdb7c7c778c8908d6 to 
0xdb7c7cbad3be9e79
May 16 23:00:15 m8011 kernel: Lustre: MGC172.18.185.1@o2ib: Connection restored 
to MGC172.18.185.1@o2ib_0 (at 172.18.185.1@o2ib)
May 16 23:01:49 m8011 kernel: LustreError: 167-0: 
lustrefs8-MDT-mdc-956fb73c3800: This client was evicted by 
lustrefs8-MDT; in progress operations using this service will fail.
May 16 23:01:49 m8011 kernel: LustreError: 
101719:0:(vvp_io.c:1562:vvp_io_init()) lustrefs8: refresh file layout 
[0x28107:0x9b08:0x0] error -108.
May 16 23:01:49 m8011 kernel: LustreError: 
101719:0:(vvp_io.c:1562:vvp_io_init()) Skipped 3 previous similar messages
May 16 23:01:49 m8011 kernel: Lustre: lustrefs8-MDT-mdc-956fb73c3800: 
Connection restored to 172.18.185.1@o2ib (at 172.18.185.1@o2ib)



MDS server logs:
May 16 22:15:40 c8mds1 kernel: LustreError: 
10686:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired 
after 99s: evicting client at 172.18.181.11@o2ib  ns: 
mdt-lustrefs8-MDT_UUID lock: 97b3730d98c0/0xdb7c7cbad3be1c7b lrc: 3/0,0 
mode: PW/PW res: [0x29119:0x327f:0x0].0x0 bits 0x40/0x0 rrc: 201 type: IBT 
flags: 0x6020040020 nid: 172.18.181.11@o2ib remote: 0xe62e31610edfb808 
expref: 90 pid: 10707 timeout: 8482830 lvb_type: 0
May 16 22:15:40 c8mds1 kernel: LustreError: 
10712:0:(ldlm_lockd.c:1351:ldlm_handle_enqueue0()) ### lock on destroyed export 
9769eaf46c00 ns: mdt-lustrefs8-MDT_UUID lock: 
97d828635e80/0xdb7c7cbad3be1c90 lrc: 3/0,0 mode: PW/PW res: 
[0x29119:0x327f:0x0].0x0 bits 0x40/0x0 rrc: 199 type: IBT flags: 
0x5020040020 nid: 172.18.181.11@o2ib remote: 0xe62e31610edfb80f expref: 77 
pid: 10712 timeout: 0 lvb_type: 0
May 16 22:15:40 c8mds1 kernel: LustreError: 
10712:0:(ldlm_lockd.c:1351:ldlm_handle_enqueue0()) Skipped 27 previous similar 
messages
May 16 22:17:22 c8mds1 kernel: LNet: Service thread pid 10783 was inactive for 
200.73s. The thread might be hung, or it might only be slow and will resume 
later. Dumping the stack trace for debugging purposes:
May 16 22:17:22 c8mds1 kernel: LNet: Skipped 3 previous similar messages
May 16 22:17:22 c8mds1 kernel: Pid: 10783, comm: mdt01_040 
3.10.0-1160.2.1.el7_lustre.x86_64 #1 SMP Wed Dec 9 20:53:35 UTC 2020
May 16 22:17:22 c8mds1 kernel: Call Trace:
May 16 22:17:22 c8mds1 kernel: [] 
ldlm_completion_ast+0x430/0x860 [ptlrpc]
May 16 22:17:22 c8mds1 kernel: [] 
ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc]
May 16 22:17:22 c8mds1 kernel: [] 
mdt_object_local_lock+0x50b/0xb20 [mdt]
May 16 22:17:22 c8mds1 kernel: [] 
mdt_object_lock_internal+0x70/0x360 [mdt]
May 16 22:17:22 c8mds1 kernel: [] mdt_object_lock+0x20/0x30 
[mdt]
May 16 22:17:22 c8mds1 kernel: [] mdt_brw_enqueue+0x44b/0x760 
[mdt]
May 16 22:17:22 c8mds1 kernel: [] mdt_intent_brw+0x1f/0x30 
[mdt]
May 16 22:17:22 c8mds1 kernel: [] 
mdt_intent_policy+0x435/0xd80 [mdt]
May 16 22:17:22 c8mds1 kernel: [] 
ldlm_lock_enqueue+0x376/0x9b0 [ptlrpc]
May 16 22:17:22 c8mds1 kernel: [] 
ldlm_handle_enqueue0+0xa86/0x1620 [ptlrpc]
May 16 22:17:22 c8mds1 kernel: [] tgt_enqueue+0x62/0x210 
[ptlrpc]
May 16 22:17:22 c8mds1 kernel: [] 
tgt_request_handle+0xada/0x1570 [ptlrpc]
May 16 

[lustre-discuss] ZFS wobble

2022-04-28 Thread Alastair Basden via lustre-discuss

Hi,

We have OSDs on ZFS (0.7.9) / Lustre 2.12.6.

Recently, one of our JBODs had a wobble, and the disks (as presented to 
the OS) disappeared for a few seconds (and then returned).


This upset a few zpools, which became SUSPENDED.

A zpool clear on these then started the resilvering process, and zpool 
status then reported, e.g.:

errors: Permanent errors have been detected in the following files:

:<0x0>
:<0xb01>
:<0x15>
:<0x383>
cos6-ost7/ost7:/O/40400/d11/10617643
cos6-ost7/ost7:/O/40400/d21/583029


However, once the resilvering had completed, these permanent errors had 
gone.


The question, then: are these errors really permanent, or was ZFS able to 
correct them?
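
Our tentative plan, in case it is the right answer, is to scrub and see 
whether the permanent-error list stays empty - a sketch, using the pool name 
from above:

  zpool scrub cos6-ost7        # re-read and verify every block in the pool
  zpool status -v cos6-ost7    # errors should remain absent if all was repaired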


Lustre has remained fine throughout (though it obviously froze while the 
pools were suspended).


Should we be worried that there might be some under-the-hood corruption 
that will present itself when we next need to remount the OST (e.g. after a 
reboot)?  In particular, the :<0x0> file worries me a bit!


Thanks,
Alastair.
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Multiple IB Interfaces

2021-03-12 Thread Alastair Basden via lustre-discuss

Hi all,

Thanks for the replies.  The issue as I see it is with sending data from 
an OST to the client, avoiding the inter-CPU link.


So, if I have:
cpu1 - IB card 1 (10.0.0.1), nvme1 (OST1)
cpu2 - IB card 2 (10.0.0.2), nvme2 (OST2)

Both IB cards are on the same subnet.  Therefore, by default, packets will be 
routed out of the server over the preferred card, say IB card 1 (I could 
be wrong, but this is my current understanding, and it seems to be what the 
Lustre manual says).


Data coming in (being written to the OST) is not a problem.  The client 
will know the IP address of the card to which the OST is closest.   So, 
to write to OST2, it will use the 10.0.0.2 address (since this will be 
the IP address given in mkfs.lustre for that OST).


The slight complication here is pinning.  An OSS service thread may run on 
cpu1, so the data would have to traverse the inter-CPU link twice.  However, 
I am assuming that this won't happen - i.e. that the kernel or Lustre is 
clever enough to place the thread on cpu2.  As far as I am aware, this 
should just work, though please correct me if I'm wrong.  Perhaps I have to 
specify pinning manually - how does one do that with Lustre?


Reading is more problematic.  A request from a client (say 10.0.0.100) for 
data on OST2 will come in via card 2 (10.0.0.2).  A thread on CPU2 
(hopefully) will then read the data from OST2 and send it out to the 
client, 10.0.0.100.  However, here Linux will route the packet through the 
first card on this subnet, so it will go over the inter-CPU link and out of 
IB card 1.  And this will be the case even if the thread is pinned on CPU2.


The question then is whether there is a way to configure Lustre to use IB 
card 2 when sending out data from OST2.
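
One thing we are considering - a sketch only, based on the LNet SMP section 
of the manual, with illustrative interface and CPT numbers - is one LNet 
network per card, each interface bound to its local CPU partition:

  # /etc/modprobe.d/lustre.conf on the OSS
  options lnet networks="o2ib1(ib0)[0],o2ib2(ib1)[1]"

Each OST would then be registered with the NID on its local card's network, 
and clients would need both networks configured.  Has anyone tried this?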


Cheers,
Alastair.

On Wed, 10 Mar 2021, Ms. Megan Larko wrote:


[EXTERNAL EMAIL]
Greetings Alastair,

Bonding is supported on InfiniBand, but I believe that it is only 
active/passive.
I think what you might be looking for WRT avoiding data travel through the 
inter-CPU link is CPU "affinity", AKA CPU "pinning".

Cheers,
megan

WRT = "with regards to"
AKA = "also known as"


___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Multiple IB interfaces

2021-03-09 Thread Alastair Basden via lustre-discuss

Hi,

We are installing some new Lustre servers with 2 InfiniBand cards, 1 
attached to each CPU socket.  Storage is NVMe; again, some drives are 
attached to each socket.


We want to ensure that data to/from each drive uses the appropriate IB 
card and doesn't need to travel through the inter-CPU link.  Data being 
written is fairly easy, I think: we just set that OST to the appropriate IP 
address.  However, data being read may well go out the other NIC, if I 
understand correctly.


What setup do we need for this?

I think probably not bonding, as that will presumably not tie 
NIC interfaces to CPUs.  But I also see a note in the Lustre manual:


"""If the server has multiple interfaces on the same subnet, the Linux 
kernel will send all traffic using the first configured interface. This is 
a limitation of Linux, not Lustre. In this case, network interface bonding 
should be used. For more information about network interface bonding, see 
Chapter 7, Setting Up Network Interface Bonding."""


(Plus, I have no idea whether bonding is supported on InfiniBand.)
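
One generic Linux approach might be source-based policy routing, so that 
replies leave via the card that owns the source address - a sketch 
(addresses, interface names and table numbers all illustrative, and I don't 
know whether the o2ib RDMA path consults these tables):

  ip rule add from 10.0.0.1 table 101
  ip route add 10.0.0.0/24 dev ib0 src 10.0.0.1 table 101
  ip rule add from 10.0.0.2 table 102
  ip route add 10.0.0.0/24 dev ib1 src 10.0.0.2 table 102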

Thanks,
Alastair.
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org