Josh,

FWIW, I have been through a very similar issue.
My story from where you are:

I ran fsck on all the targets until it was happy. I mounted the MGS, then the 
MDT. Then I mounted all the OSTs EXCEPT the one that would cause kernel panics. 
This let me get at all of the data that did not have any objects on that OST.
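
Roughly, the sequence was like this (device names and mount points here are just placeholders, not our real ones):

mount -t lustre /dev/mgsdev /mnt/mgs
mount -t lustre /dev/mdtdev /mnt/mdt
mount -t lustre /dev/ost0000dev /mnt/ost0000   # repeat for every OST except the one that panics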

Then we took about 5 months to get a support contract, just in case.
In the meantime, I upgraded to the latest version of Lustre. Everything looked 
good, and when we had downtime with support in place, I mounted the 'bad' OST, 
which came up with no issue.

The best explanation I could come up with was that the upgrade made the difference.
Not sure if one of the fixed bugs did it or what.

But, as I said, we were able to bring up the filesystem with the exception of 
the one OST, so that could help you some.


Brian Andrus
ITACS/Research Computing
Naval Postgraduate School
Monterey, California
voice: 831-656-6238



-----Original Message-----
From: lustre-discuss [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of Josh Samuelson
Sent: Tuesday, August 11, 2015 1:42 PM
To: lustre-discuss@lists.lustre.org
Subject: [lustre-discuss] Recover crashing MDS/MDT (assertion "invalid llog tail")

Hello,

I'm experiencing MDS crashes after recovering from FS corruption caused by 
hardware RAID failures on a system I manage.

I'm hoping a recovery process similar to the one discussed in LU-6696 could work
for me: https://jira.hpdd.intel.com/browse/LU-6696

I'm trying to get the FS back to a readable state so users can copy files off 
the system, at which point the FS will likely be reformatted.

Details: Lustre servers are running 2.5.3.90 on RHEL kernel 2.6.32-431.56.1.

All was running well until a hardware failure occurred in the backing storage 
array of the MDT.  The storage array has active and standby RAID controllers.
The active controller crashed and the cache "mirrored" to the standby 
controller was lost.  The MDS+MDT/MGS kept running with the multi-path driver 
under the assumption that all issued IOs had succeeded.  :-/

Hours later the MDS kernel noticed corruption on the MDT and remounted it 
read-only:

[4018464.727774] LDISKFS-fs error (device dm-2): htree_dirblock_to_tree: bad entry in directory #1967679418: rec_len % 4 != 0 - block=983894031offset=0(0), inode=1919907631, rec_len=12139, name_len=119
[4018464.745662] Aborting journal on device dm-2-8.
[4018464.751092] LustreError: 8086:0:(osd_handler.c:860:osd_trans_commit_cb()) transaction @0xffff8803ebe4acc0 commit error: 2
[4018464.762367] LDISKFS-fs (dm-2): Remounting filesystem read-only

All Lustre clients were unmounted, followed by the server targets in the following
order: MGS/MDT (same device), then all 36 OSTs.  The umounts completed without issue.
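
In other words, roughly (mount points are placeholders):

umount /mnt/lustre        # on every client
umount /mnt/mgs_mdt       # combined MGS/MDT device
umount /mnt/ost0000       # repeated for OST indexes 0 through 35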

The MDT was mounted to recover the journal before running:
e2fsck -fp /dev/mdtdevice
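
In other words, roughly (the ldiskfs mount point is a placeholder):

mount -t ldiskfs /dev/mdtdevice /mnt/mdt_ldiskfs   # mounting replays the journal
umount /mnt/mdt_ldiskfs
e2fsck -fp /dev/mdtdevice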

No surprise, that reported: "UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY."

I created an LVM snapshot of the MDT and progressed to:
e2fsck -fy /dev/mdtdevice
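
That step was along these lines (volume group/LV names and snapshot size are placeholders):

lvcreate --snapshot --size 20G --name mdt_snap /dev/vg_mdt/lv_mdt
e2fsck -fy /dev/mdtdevice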

As expected, the above found numerous issues with files created prior to the 
MDT going RO.

After the e2fsck, mounting the MGS/MDT succeeded.  I am able to mount OST 
indexes 0 through 6, but the MDS kernel panics when OST0007 is mounted.

I've tried mounting the targets with '-o abort_recov' and also attempted 
'tunefs.lustre --writeconf' on the MDT and OSTs, mounting them in index order.
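
For completeness, that pass looked roughly like this (device paths are placeholders):

tunefs.lustre --writeconf /dev/mdtdevice
tunefs.lustre --writeconf /dev/ost0000device       # repeated for each OST
mount -t lustre -o abort_recov /dev/mdtdevice /mnt/mdt
mount -t lustre -o abort_recov /dev/ost0000device /mnt/ost0000   # then each OST in index order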

Mounting OST index 7 always panics the MDS kernel.

I increased the 'lctl set_param' debug/subsystem_debug/printk settings in hopes of 
narrowing down where the issue is, and captured messages while OST index 7 was being 
mounted (the commands are sketched after the snippet).  A small snippet prior to the LBUG:

<6>[  964.491664] Lustre: 2491:0:(osp_sync.c:949:osp_sync_llog_init()) lstrFS-OST0007-osc-MDT0000: Init llog for 7 - catid 0x10:1:0
<snip>
<6>[  964.504006] Lustre: 2493:0:(llog.c:318:llog_process_thread()) index: 63072 last_index 64767
<6>[  964.504010] Lustre: 2493:0:(llog_osd.c:551:llog_osd_next_block()) looking for log index 63072 (cur idx 32896 off 2113536)
<3>[  964.505912] LustreError: 2493:0:(llog_osd.c:638:llog_osd_next_block()) lstrFS-MDT0000-osd: invalid llog tail at log id 0x2b54:1/0 offset 4087808
<6>[  964.505923] Lustre: 2493:0:(llog.c:402:llog_process_thread()) kfreed 'buf': 8192 at ffff880617ace000.
<6>[  964.505928] Lustre: 2493:0:(llog.c:480:llog_process_or_fork()) kfreed 'lpi': 80 at ffff8806257854c0.
<6>[  964.505937] Lustre: 2493:0:(llog.c:402:llog_process_thread()) kfreed 'buf': 8192 at ffff880617aae000.
<6>[  964.505942] Lustre: 2493:0:(llog.c:480:llog_process_or_fork()) kfreed 'lpi': 80 at ffff880625a298c0.
<0>[  964.505951] LustreError: 2493:0:(osp_sync.c:874:osp_sync_thread()) ASSERTION( rc == 0 || rc == LLOG_PROC_BREAK ) failed: 0 changes, 2 in progress, 0 in flight: -22
<0>[  964.505956] LustreError: 2493:0:(osp_sync.c:874:osp_sync_thread()) LBUG
<4>[  964.505958] Pid: 2493, comm: osp-syn-7-0
<4>[  964.505960]
<4>[  964.505960] Call Trace:
<4>[  964.505988]  [<ffffffffa02e3895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
<4>[  964.505997]  [<ffffffffa02e3e97>] lbug_with_loc+0x47/0xb0 [libcfs]
<4>[  964.506007]  [<ffffffffa0e5a2e3>] osp_sync_thread+0x753/0x7d0 [osp]
<4>[  964.506011]  [<ffffffff81529e9e>] ? thread_return+0x4e/0x770
<4>[  964.506018]  [<ffffffffa0e59b90>] ? osp_sync_thread+0x0/0x7d0 [osp]
<4>[  964.506021]  [<ffffffff8109abf6>] kthread+0x96/0xa0
<4>[  964.506025]  [<ffffffff8100c20a>] child_rip+0xa/0x20
<4>[  964.506027]  [<ffffffff8109ab60>] ? kthread+0x0/0xa0
<4>[  964.506029]  [<ffffffff8100c200>] ? child_rip+0x0/0x20
<4>[  964.506030]
<6>[  964.506218] LNet: 2493:0:(linux-debug.c:142:libcfs_run_upcall()) Invoked LNET upcall /usr/lib/lustre/lnet_upcall LBUG,/builddir/build/BUILD/lustre-2.5.3.90/lustre/osp/osp_sync.c,osp_sync_thread,874
<0>[  964.506221] Kernel panic - not syncing: LBUG
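
For reference, the debug settings were raised along these lines before reproducing the mount (-1 enables all masks; the values I actually used may have been narrower):

lctl set_param debug=-1             # run on the MDS
lctl set_param subsystem_debug=-1
lctl set_param printk=-1
# then mounting OST index 7 on its OSS reproduces the LBUG above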

Is there any way to fix or remove the (bad?) llog for OST index 7, catid 0x10:1:0?
Also, how is 0x10:1:0 decoded to map back to a file on the ldiskfs?

Thank you,
-Josh
_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org