Hi,

It turns out the problem is with the zpool itself, which we think was corrupted by a STONITH event that occurred when a disk in another pool started reporting a predicted failure.

A zpool scrub has completed, and zpool status -v reports 5 entries with permanent errors:
errors: Permanent errors have been detected in the following files:

        cos8-ost6/ost6:<0xe>
        cos8-ost6/ost6:<0x1a>
        cos8-ost6/ost6:<0x1c>
        cos8-ost6/ost6:/
        cos8-ost6/ost6:<0x193>
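
For anyone reading along, here is a minimal sketch of how I read that list. As far as I understand it, the <0x...> entries are object numbers that ZFS could not map back to a path, while the "/" entry is the dataset's root directory. The function name parse_error_entries and the STATUS_ERRORS sample are made up for illustration; the script only parses the text pasted above and does not touch the pool. The decimal object numbers may be handy if someone wants to inspect those objects with zdb, which is an assumption on my part about how people would dig further.

```python
import re

# Sample "errors:" entries copied from the zpool status -v output above.
STATUS_ERRORS = """\
        cos8-ost6/ost6:<0xe>
        cos8-ost6/ost6:<0x1a>
        cos8-ost6/ost6:<0x1c>
        cos8-ost6/ost6:/
        cos8-ost6/ost6:<0x193>
"""

# dataset:<0xHEX> -> object number that could not be resolved to a path.
OBJECT_RE = re.compile(r"^(?P<dataset>[^:]+):<0x(?P<obj>[0-9a-fA-F]+)>$")
# dataset:/some/path -> entry that was resolved to a path inside the dataset.
PATH_RE = re.compile(r"^(?P<dataset>[^:]+):(?P<path>/.*)$")


def parse_error_entries(text):
    """Return a list of (dataset, kind, value) tuples, one per entry.

    kind is "object" (value is the object number as an int) or
    "path" (value is the path string). Unrecognised lines are skipped.
    """
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        m = OBJECT_RE.match(line)
        if m:
            entries.append((m.group("dataset"), "object", int(m.group("obj"), 16)))
            continue
        m = PATH_RE.match(line)
        if m:
            entries.append((m.group("dataset"), "path", m.group("path")))
    return entries


if __name__ == "__main__":
    for dataset, kind, value in parse_error_entries(STATUS_ERRORS):
        print(dataset, kind, value)
```

With the list above this gives four unresolved objects (0x193 is object 403 in decimal, for example) plus the root directory entry, which is the one that explains the failing ls.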

The fact that / is corrupted worries me!
If we set the canmount=on property and mount the zpool, then an ls of the mount point gives an Input/output error.

Does anyone have experience with how to repair this?

There is no hardware problem: all 12 disks in this raidz2 pool are fine. We think the STONITH must have caused it, though I thought ZFS was supposed to be immune to that!

Thanks...


On Tue, 30 Nov 2021, Tommi Tervo wrote:

Upon attempting to mount a zfs OST, we are getting:
Message from syslogd@c8oss01 at Nov 29 18:11:47 ...
 kernel:LustreError: 58223:0:(lu_object.c:1267:lu_device_fini())
ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed: Refcount is 1

Message from syslogd@c8oss01 at Nov 29 18:11:47 ...
 kernel:LustreError: 58223:0:(lu_object.c:1267:lu_device_fini()) LBUG

Hi,

Looks like LU-12675, time to upgrade to 2.12.7?

https://jira.whamcloud.com/browse/LU-12675

HTH,
-Tommi

_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
