Hi,
Lustre 2.12.2.
We are seeing lots of errors on the servers such as:
Oct 5 11:16:48 oss04 kernel: LNetError: 6414:0:(lib-move.c:2955:lnet_resend_pending_msgs_locked()) Error sending PUT to 12345-172.19.171.15@o2ib1: -125
Oct 5 11:16:48 oss04 kernel: LustreError:
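For what it's worth, a first check in this situation is basic LNet connectivity to the peer NID from the error above (a sketch; -125 is -ECANCELED):

    # from the OSS reporting the errors, ping the peer NID seen in the log:
    lctl ping 172.19.171.15@o2ib1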
Hi,
We want to change the service node of an OST. We think this involves:
1. umount the OST
2. tunefs.lustre --erase-param failover.node --servicenode=172.18.100.1@o2ib,172.17.100.1@tcp pool1/ost1
Is this all? It is unclear from the documentation whether a writeconf is required (if it is, then
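For reference, a minimal sketch of the whole sequence, assuming the target is the ZFS dataset pool1/ost1 as above and that a writeconf does turn out to be needed (the mount point is an assumption):

    # 1. unmount the OST on its current server
    umount /mnt/ost1
    # 2. replace the old failover NIDs with the new service node NIDs
    tunefs.lustre --erase-param failover.node \
        --servicenode=172.18.100.1@o2ib,172.17.100.1@tcp pool1/ost1
    # 3. only if a writeconf is required: regenerate the config logs
    #    (this must be done with the filesystem stopped)
    tunefs.lustre --writeconf pool1/ost1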
Hi Francois,
We had something similar a few months back - I suspect a bug somewhere.
Basically, files weren't getting removed from the OST. Eventually, we mounted it as ext4 and removed them manually, I think.
A reboot of the file system meant that rm operations then proceeded
correctly after
Hi all,
We had a problem with one of our MDSes (ldiskfs) on Lustre 2.12.6, which we
think is a bug, but we haven't been able to identify it. Can anyone shed
any light? We unmounted and remounted the MDT at around 23:00.
Client logs:
May 16 22:15:41 m8011 kernel: LustreError: 11-0:
Hi,
We have OSDs on ZFS (0.7.9) / Lustre 2.12.6.
Recently, one of our JBODs had a wobble, and the disks (as presented to
the OS) disappeared for a few seconds (and then returned).
This upset a few zpools, which became SUSPENDED.
A zpool clear on these then started the resilvering process, and zpool
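For anyone hitting the same thing, the recovery described comes down to something like this (pool name hypothetical):

    zpool status -v tank   # pool shows state SUSPENDED after the device loss
    zpool clear tank       # clear the errors and resume I/O; resilver starts
    zpool status tank      # watch the resilver progress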
Hi,
It turns out there is a problem with the zpool, which we think got corrupted
by a STONITH event when a disk in another pool started reporting a predicted
failure.
A zpool scrub has been done, and there are 5 files with permanent errors
(zpool status -v):
errors: Permanent errors have been
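With ZFS, "permanent errors" means those file contents cannot be repaired from within the pool; the usual approach (a sketch, pool name hypothetical) is to restore or delete each listed file and let a later scrub clear the entries:

    zpool status -v tank   # lists the files with permanent errors
    # restore each listed file from backup, or delete it, then:
    zpool scrub tank       # entries clear once the errors are gone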
Additional info: exporting the pool, importing it on another (HA) server, and
attempting to mount there hits the same problem, i.e. a kernel panic with the
trace shown below.
A writeconf does not help.
On Mon, 29 Nov 2021, Alastair Basden wrote:
Some more information
We suspect corruption on the OST caused by a STONITH event, but could be
wrong. Any tips on how to solve this manually would be great...
Thanks,
Alastair.
On Mon, 29 Nov 2021, Alastair Basden wrote:
Hi all,
Upon attempting to mount a zfs OST, we are getting:
Message from syslogd@c8oss01 at Nov 29 18:11:47 ...
kernel:LustreError: 58223:0:(lu_object.c:1267:lu_device_fini())
ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed: Refcount is 1
Message from syslogd@c8oss01 at Nov 29 18:11:47 ...
or until you get the older software replaced (or change your
clients to operate the old way).
-Cory
On 9/16/21, 3:45 AM, "lustre-discuss on behalf of Alastair Basden" wrote:
Hi all,
We mounted as ext4, removed the files, and then remounted as lustre (and
did the l
).
Cheers,
Alastair.
On Thu, 9 Sep 2021, Andreas Dilger wrote:
On Sep 8, 2021, at 04:42, Alastair Basden <a.g.bas...@durham.ac.uk> wrote:
Next step would be to unmount OST004e, run a full e2fsck, and then check lost+found
and/or a regular "find /mnt/ost -type f -size +1M" or similar to find where the
files are.
Thanks. e2fsck returns clean (on its own, with -p and with -f).
Now, the find command does return a large number
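For completeness, the inspection Andreas describes looks roughly like this (device name and mount point are assumptions; run with the OST unmounted from Lustre):

    e2fsck -f /dev/ostdev                # full check, as suggested
    mount -t ldiskfs -o ro /dev/ostdev /mnt/ost
    ls /mnt/ost/lost+found               # orphaned objects land here
    find /mnt/ost -type f -size +1M      # locate where the space has gone
    umount /mnt/ost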
dir.
I think Andreas is referring especially to directories '0', '1' and '10' in your
output.
Try looking into them; you should see multiple 'dXX' directories with objects
in them.
Aurélien
On 06/09/2021 10:12, "Alastair Basden" wrote:
ose objects.
Cheers, Andreas
On Sep 4, 2021, at 00:54, Alastair Basden wrote:
Ah, of course - has to be done on a client.
None of these files are on the dodgy OST.
Any further suggestions? Essentially we have what seems to be a full OST with
nothing on it.
Thanks,
Alastair.
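From a client, one can also cross-check what the MDT actually maps onto the suspect OST (a sketch; index 78 corresponds to OST004e in the lfs df output elsewhere in this thread):

    lfs find /snap8 --ost 78 -type f | head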
On Sat, 4 Sep 2021, at 14:51, Alastair Basden <a.g.bas...@durham.ac.uk> wrote:
Hi,
lctl get_param mdt.*.exports.*.open_files returns:
mdt.snap8-MDT.exports.172.18.180.21@o2ib.open_files=
[0x2b90e:0x10aa:0x0]
mdt.snap8-MDT.exports.172.18.180.22@o2ib.open_files=
[0x2b90e:0x21
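Those open_files entries are FIDs, and can be mapped back to paths from a client with lfs fid2path (a sketch using the first FID above; an open-unlinked file will typically return no path, which is itself telling):

    lfs fid2path /snap8 '[0x2b90e:0x10aa:0x0]'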
Aurélien
On 03/09/2021 09:50, "lustre-discuss on behalf of Alastair Basden" <lustre-discuss-boun...@lists.lustre.org on behalf of a.g.bas...@durham.ac.uk> wrote:
, this is due to an open-unlinked file, typically a log file which is still in
use; some process keeps writing to it until it fills the OSTs it is using.
Look for such files on your clients (use lsof).
Aurélien
On 03/09/2021 09:50, "lustre-discuss on behalf of Alastair Basden" wrote:
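On each client, open-but-deleted files can be spotted with lsof (a sketch):

    lsof +L1                       # files with link count 0: deleted but still open
    lsof -nP | grep '(deleted)'    # alternative spelling of the same check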
Hi,
We have a file system where each OST is a single SSD.
One of those is reporting as 100% full (lfs df -h /snap8):
snap8-OST004d_UUID    5.8T  2.0T  3.5T   37% /snap8[OST:77]
snap8-OST004e_UUID    5.8T  5.5T  7.5G  100% /snap8[OST:78]
snap8-OST004f_UUID
Hi Megan,
Thanks - yes, lctl ping responds.
In the end, we did a writeconf, and this seems to have fixed the problem,
so it was probably some earlier transient issue. I would, however, have
expected it to heal whilst online - taking the filesystem down and doing a
writeconf seems a bit drastic!
Cheers,
Hi Megan, all,
Yes, sorry, I should have said. It's 2.12.6.
A bit more detail: I can set the stripe index to 0-3 and 8-191, and it
works fine. However, when I set the stripe index to 4-7, the files all end
up on OST 8. It is a system with 192 OSTs and 24 OSSs.
These 4 OSTs are all served on
Hi,
I'm trying to specify a particular OST to be used with:
lfs setstripe --stripe-index 7 myfile.dat
However, an lfs getstripe reveals that it hasn't managed to use this OST:
myfile.dat
lmm_stripe_count:  1
lmm_stripe_size:   1048576
lmm_pattern:       raid0
lmm_layout_gen:    0
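Worth ruling out the obvious first: setstripe only takes effect at file creation, so the file must not already exist, and the placement can be verified immediately afterwards (a sketch):

    rm -f myfile.dat
    lfs setstripe --stripe-index 7 --stripe-count 1 myfile.dat
    lfs getstripe -i myfile.dat    # prints the OST index actually used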
Hi all,
Thanks for the replies. The issue as I see it is with sending data from
an OST to the client whilst avoiding the inter-CPU link.
So, if I have:
cpu1 - IB card 1 (10.0.0.1), nvme1 (OST1)
cpu2 - IB card 2 (10.0.0.2), nvme2 (OST2)
Both IB cards are on the same subnet. Therefore, by default,
Hi,
We are installing some new Lustre servers with 2 InfiniBand cards, 1
attached to each CPU socket. Storage is NVMe, again with some drives
attached to each socket.
We want to ensure that data to/from each drive uses the appropriate IB
card, and doesn't need to travel through the inter-CPU
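On 2.12 the usual building block for this is LNet multi-rail, declaring both interfaces on the same network (a sketch, assuming interface names ib0 and ib1; steering OST I/O threads to the near socket is a separate CPT/tuning question):

    lnetctl net add --net o2ib --if ib0
    lnetctl net add --net o2ib --if ib1
    lnetctl net show --net o2ib    # confirm both NIDs are configured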
Hi all,
We are having a problem mounting a ldiskfs mdt. The mount command is
hanging, with /var/log/messages containing:
Oct 5 16:26:17 c6mds1 kernel: INFO: task mount.lustre:4285 blocked for more than 120 seconds.
Oct 5 16:26:17 c6mds1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
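When mount.lustre hangs like this, the blocked-task stack traces usually show where it is stuck; they can be forced into the kernel log with sysrq (a sketch):

    echo w > /proc/sysrq-trigger   # dump stacks of all blocked (D-state) tasks
    dmesg | tail -n 100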
I had the same errors when the kmod-lustre-osd-zfs package was missing on
my system.
cheers
Pascal
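Following Pascal's hint, the quick check is whether the ZFS OSD package and kernel module are actually installed and loadable (a sketch):

    rpm -q kmod-lustre-osd-zfs
    modprobe osd_zfs && lsmod | grep osd_zfs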
On 6/2/20 1:20 AM, Alastair Basden wrote:
?
--Jeff
On Mon, Jun 1, 2020 at 4:21 PM Alastair Basden wrote:
Hi,
We have just upgraded Lustre servers from 2.12.2 on CentOS 7.6 to 2.12.3
on CentOS 7.7.
The OSSs are on top of zfs (0.7.13 as recommended), and we are using
3.10.0-1062.1.1.el7_lustre.x86_64.
After the update, Lustre will no longer mount, with messages such as:
Jun 2 00:02:44 hostname