Hi all,

We have solved the problem, so posting back here for completeness, in case it helps anyone else.

It turns out that the oi.16* files (the OI, or Object Index, files) behave like a cache that Lustre can rebuild, so the existing copies are not really needed.

So, taking hints from the user manual section on file-level backups, we mounted the MDT as ldiskfs, removed all the oi.16* files (64 of them) and a few others (lfsck_*, LFSCK, CATALOGS), and then remounted it as Lustre.
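
For anyone repeating this, here is a minimal dry-run sketch in Python of the file selection step. It only lists the top-level entries that match the names mentioned above (oi.16*, lfsck_*, LFSCK, CATALOGS) and deletes nothing. The function name, the mock directory layout, and the idea of scripting this at all are assumptions for illustration; we did the removal by hand on the ldiskfs mount.

```python
import fnmatch
import os
import tempfile

# Name patterns taken from the message above; nothing else is matched.
REBUILDABLE_PATTERNS = ("oi.16*", "lfsck_*", "LFSCK", "CATALOGS")


def find_rebuildable(mdt_root):
    """Return the sorted top-level names under mdt_root that match the patterns.

    mdt_root is assumed to be the MDT mounted as ldiskfs (hypothetical path).
    This function is read-only: it never removes or modifies anything.
    """
    return sorted(
        name
        for name in os.listdir(mdt_root)
        if any(fnmatch.fnmatchcase(name, pat) for pat in REBUILDABLE_PATTERNS)
    )


if __name__ == "__main__":
    # Mock-up of an MDT root so the sketch can run anywhere; the entry
    # names below are illustrative, not a listing from our system.
    with tempfile.TemporaryDirectory() as root:
        mock_names = [f"oi.16.{i}" for i in range(64)]
        mock_names += ["lfsck_bookmark", "LFSCK", "CATALOGS", "ROOT", "CONFIGS"]
        for name in mock_names:
            open(os.path.join(root, name), "w").close()

        candidates = find_rebuildable(root)
        print(f"{len(candidates)} entries would be removed:")
        for name in candidates:
            print("  " + name)
```

Reviewing the list before removing anything, and taking a copy of the device first if possible, seems a sensible precaution.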

After a few hours of running an lctl lfsck, all appears to be well.

Hopefully that will help someone having a future Lustre panic!

Cheers,
Alastair.

On Mon, 26 Jan 2026, Alastair Basden via lustre-discuss wrote:

Hi all,

We are wondering whether anyone can shed some light for us.

An MDT RAID controller failed, and the DRBD replica seems to be corrupted,
since we can't mount the MDT on another node (where it should have been
replicated to).

We are using Lustre 2.12.6.

Errors are (when trying to mount):

LDISKFS-fs (drbd3): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
LustreError: 114156:0:(osd_iam.c:182:iam_load_idle_blocks()) drbd3: cannot load idle blocks, blk = 1244, err = -5
LustreError: 114156:0:(osd_oi.c:324:osd_oi_table_open()) drbd3: can't open oi.16.6: rc = -5
LustreError: 114156:0:(osd_oi.c:327:osd_oi_table_open()) drbd3: expect to open total 64 OI files.
LustreError: 114156:0:(obd_config.c:559:class_setup()) setup cos8-MDT0003-osd failed (-5)
LustreError: 114156:0:(obd_mount.c:202:lustre_start_simple()) cos8-MDT0003-osd setup error -5
LustreError: 114156:0:(obd_mount_server.c:1958:server_fill_super()) Unable to start osd on /dev/drbd3: -5
LustreError: 114156:0:(obd_mount.c:1608:lustre_fill_super()) Unable to mount (-5)

We can mount as ldiskfs, and the oi.16.6 file is there, but we suspect
it is corrupted (based on the above error).

We are wondering whether replacing this file from a backup (or indeed from
the failed raid once the controller is back online) would be an option,
and allow the system to continue again, albeit with some potential data
loss of recent accesses.

The failed MDT is not the primary one.

Anyone any ideas?

Thanks,
Alastair.
_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
