Hi all,
We are wondering whether anyone can shed some light on this for us.
An MDT RAID controller failed, and the DRBD replica seems to be corrupted
as well, since we cannot mount the MDT on the other node (to which it
should have been replicated).
We are using Lustre 2.12.6.
The errors when trying to mount are:
LDISKFS-fs (drbd3): mounted filesystem with ordered data mode. Opts:
user_xattr,errors=remount-ro,no_mbcache,nodelalloc
LustreError: 114156:0:(osd_iam.c:182:iam_load_idle_blocks()) drbd3: cannot load
idle blocks, blk = 1244, err = -5
LustreError: 114156:0:(osd_oi.c:324:osd_oi_table_open()) drbd3: can't open
oi.16.6: rc = -5
LustreError: 114156:0:(osd_oi.c:327:osd_oi_table_open()) drbd3: expect to open
total 64 OI files.
LustreError: 114156:0:(obd_config.c:559:class_setup()) setup cos8-MDT0003-osd
failed (-5)
LustreError: 114156:0:(obd_mount.c:202:lustre_start_simple()) cos8-MDT0003-osd
setup error -5
LustreError: 114156:0:(obd_mount_server.c:1958:server_fill_super()) Unable to
start osd on /dev/drbd3: -5
LustreError: 114156:0:(obd_mount.c:1608:lustre_fill_super()) Unable to mount
(-5)
We can mount the device as ldiskfs, and the oi.16.6 file is there; however,
we suspect it is corrupted (based on the above error).
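For reference, we mounted it along these lines (the mount point is just an
example), and we assume a read-only e2fsck pass would also flag the
corruption without changing anything:

  # mount the underlying device read-only as ldiskfs and look at the OI file
  mount -t ldiskfs -o ro /dev/drbd3 /mnt/mdt
  ls -l /mnt/mdt/oi.16.6
  umount /mnt/mdt

  # read-only consistency check (Lustre-patched e2fsprogs); -n makes no changes
  e2fsck -fn /dev/drbd3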
We are wondering whether replacing this file from a backup (or indeed from
the failed RAID once the controller is back online) would be an option and
would allow the system to continue, albeit with potential loss of the most
recent metadata changes.
The failed MDT is not the primary one.
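Concretely, we had something along these lines in mind, though it is
untested and the backup path below is made up for illustration. We are also
unsure whether a copied OI file can ever be consistent with the current
inode numbers (since the OI maps FIDs to inode numbers), or whether we
should instead remove it and let OI scrub rebuild it on the next mount:

  # replace the suspect OI file while mounted as ldiskfs
  mount -t ldiskfs /dev/drbd3 /mnt/mdt
  cp -a /backup/mdt3/oi.16.6 /mnt/mdt/oi.16.6   # hypothetical backup path
  umount /mnt/mdt

  # remount as Lustre, then kick off an OI scrub and watch its progress
  mount -t lustre /dev/drbd3 /mnt/lustre/mdt3
  lctl lfsck_start -M cos8-MDT0003 -t scrub
  lctl get_param osd-ldiskfs.cos8-MDT0003.oi_scrub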
Does anyone have any ideas?
Thanks,
Alastair.