Hi Craig,

> Has anyone seen anything like this?

Yes, we have had a similar problem a couple of times.

First, try to umount all OSTs on the affected OSS. Some OSTs will (most likely) fail to umount: umount gets stuck because of the ll_ost_io_?? thread. Note the 'broken' OSTs and, once the 'good' OSTs have finished umounting, kill the OSS (echo b > /proc/sysrq-trigger).

Afterwards, run a simple 'e2fsck -f -p' on the bad OSTs. It should complain about corrupted directories and other nice things. If it doesn't, upgrade to the latest fsck from Whamcloud. (A few months ago we had a corruption that was unfixable or not detected with the 1.8.4-sun e2fsprogs.)

> This is a recent phenomena - we are not
> sure, but we think it may be related to a particular workload. Our o2ib
> clients don't seem to have any trouble.

I don't think this issue is related to the network. It's probably just 'bad luck' that only the tcp clients hit the corrupted directories.

Regards,
Adrian
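
P.S. A rough sketch of the sequence described above, as a shell script. The function names, mount points, and device paths are hypothetical placeholders and are not taken from any real setup. The script only prints the commands unless DRY_RUN=0 is set, and a umount that hangs would block the loop, so in practice the steps may be easier to run by hand.

```shell
#!/bin/sh
# Sketch only: all mount points and devices passed in are placeholders.
# DRY_RUN=1 (the default) prints each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

# Run a command, or just print it when in dry-run mode.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Step 1: try to umount every OST on the affected OSS.
# A stuck umount blocks here; note which OSTs never finish.
umount_osts() {
    for mnt in "$@"; do
        run umount "$mnt"
    done
}

# Step 2: once the 'good' OSTs are unmounted, hard-reset the OSS.
# This reboots immediately without syncing or unmounting anything.
reset_oss() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: echo b > /proc/sysrq-trigger"
    else
        echo b > /proc/sysrq-trigger
    fi
}

# Step 3: after the reboot, force a check of each 'bad' OST device.
check_osts() {
    for dev in "$@"; do
        run e2fsck -f -p "$dev"
    done
}
```

For example, `check_osts /dev/sdx` in dry-run mode only prints the e2fsck command it would run for that (placeholder) device.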