Hi all,

We are wondering whether anyone can shed some light on this for us.

An MDT RAID controller failed, and the DRBD replica appears to be corrupted, since we can't mount the MDT on the other node (where it should have been replicated to).

We are using Lustre 2.12.6.

These are the errors we see when trying to mount:

LDISKFS-fs (drbd3): mounted filesystem with ordered data mode. Opts: 
user_xattr,errors=remount-ro,no_mbcache,nodelalloc
LustreError: 114156:0:(osd_iam.c:182:iam_load_idle_blocks()) drbd3: cannot load 
idle blocks, blk = 1244, err = -5
LustreError: 114156:0:(osd_oi.c:324:osd_oi_table_open()) drbd3: can't open 
oi.16.6: rc = -5
LustreError: 114156:0:(osd_oi.c:327:osd_oi_table_open()) drbd3: expect to open 
total 64 OI files.
LustreError: 114156:0:(obd_config.c:559:class_setup()) setup cos8-MDT0003-osd 
failed (-5)
LustreError: 114156:0:(obd_mount.c:202:lustre_start_simple()) cos8-MDT0003-osd 
setup error -5
LustreError: 114156:0:(obd_mount_server.c:1958:server_fill_super()) Unable to 
start osd on /dev/drbd3: -5
LustreError: 114156:0:(obd_mount.c:1608:lustre_fill_super()) Unable to mount  
(-5)

We can mount the device as ldiskfs, and the oi.16.6 file is there; however, we suspect that file is corrupted, since error -5 (EIO) is raised while opening oi.16.6 in the errors above.
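
For reference, this is roughly how we are inspecting the device (the mount point is just an example from our setup, and the debugfs check is only a sanity test):

  # mount read-only so we don't touch the replica while looking around
  mount -t ldiskfs -o ro /dev/drbd3 /mnt/mdt-ldiskfs
  ls -l /mnt/mdt-ldiskfs/oi.16.6
  umount /mnt/mdt-ldiskfs

  # inode-level look at the same file without mounting it
  debugfs -c -R 'stat /oi.16.6' /dev/drbd3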

We are wondering whether replacing this file from a backup (or indeed from the failed RAID once the controller is back online) would be an option and would allow the system to continue, albeit with some potential loss of recent changes.
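
Concretely, we were picturing something along these lines (the backup path below is a placeholder, and we are assuming the copy is only consistent if it comes from the same filesystem image, e.g. the failed RAID or a device-level backup, since the OI files map FIDs to inode numbers):

  # with the MDT otherwise offline, mount read-write just for the copy
  mount -t ldiskfs /dev/drbd3 /mnt/mdt-ldiskfs
  cp -a /path/to/backup/oi.16.6 /mnt/mdt-ldiskfs/oi.16.6
  umount /mnt/mdt-ldiskfs

We also wondered whether removing the oi.16.* files and letting OI scrub rebuild them at the next mount (as the manual describes for file-level restores) would be a safer route.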

The failed MDT is not the primary one.

Does anyone have any ideas?

Thanks,
Alastair.