Hi all,
This is Lustre 2.8.0 community edition, combined MGS/MDT.
I was adding storage to a filesystem and mistakenly duplicated an
index for one of the OSTs at creation time. Since these OSTs were
new and no data had been written, I made the mistake of reformatting
the affected OSTs (including the first one I successfully mounted).
When I tried to remount the newly formatted OST, the MDS kernel
panicked (log attached). After a device-level backup and an e2fsck,
I can mount the MDT as ldiskfs. e2fsck did correct some orphaned
inodes, but those appear to belong to user files only, not to any
of Lustre's own metadata files.
However, the MDT/MGT still will not mount. The logs indicate
that the original definition of the duplicated OST still exists
somewhere. I checked the CONFIGS directory, and indeed there was
a file associated with the OST in question. I copied that file
out of the CONFIGS directory and attempted to mount the MDT/MGT
again, but the result was the same.
The logs read:
May 4 06:41:22 lfs4-mds kernel: Lustre: MGS: Connection restored to
MGC10.128.11.174@tcp1_0 (at 0@lo)
May 4 06:41:22 lfs4-mds kernel: LustreError:
12300:0:(genops.c:334:class_newdev()) Device lfs4-OST000e-osc-MDT0000
already exists at 22, won't add
May 4 06:41:22 lfs4-mds kernel: LustreError:
12300:0:(obd_config.c:370:class_attach()) Cannot create device
lfs4-OST000e-osc-MDT0000 of type osp : -17
May 4 06:41:22 lfs4-mds kernel: LustreError:
12300:0:(obd_config.c:1666:class_config_llog_handler())
MGC10.128.11.174@tcp1: cfg command failed: rc = -17
May 4 06:41:22 lfs4-mds kernel: Lustre: cmd=cf001
0:lfs4-OST000e-osc-MDT0000 1:osp 2:lfs4-MDT0000-mdtlov_UUID
May 4 06:41:22 lfs4-mds kernel:
May 4 06:41:22 lfs4-mds kernel: LustreError: 15c-8:
MGC10.128.11.174@tcp1: The configuration from log 'lfs4-MDT0000' failed
(-17). This may be the result of communication errors between this node
and the MGS, a bad configuration, or other errors. See the syslog for
more information.
May 4 06:41:22 lfs4-mds kernel: LustreError:
12213:0:(obd_mount_server.c:1309:server_start_targets()) failed to start
server lfs4-MDT0000: -17
May 4 06:41:22 lfs4-mds kernel: LustreError:
12213:0:(obd_mount_server.c:1798:server_fill_super()) Unable to start
targets: -17
May 4 06:41:22 lfs4-mds kernel: Lustre: Failing over lfs4-MDT0000
May 4 06:41:28 lfs4-mds kernel: Lustre:
12213:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has
timed out for slow reply: [sent 1493898082/real 1493898082]
req@ffff8803113459c0 x1566404887184424/t0(0)
o251->MGC10.128.11.174@tcp1@0@lo:26/25 lens 224/224 e 0 to 1 dl
1493898088 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
May 4 06:41:28 lfs4-mds kernel: Lustre: server umount lfs4-MDT0000 complete
May 4 06:41:28 lfs4-mds kernel: LustreError:
12213:0:(obd_mount.c:1426:lustre_fill_super()) Unable to mount (-17)
May 4 06:45:04 lfs4-mds kernel: LDISKFS-fs (sdb): mounted filesystem
with ordered data mode. quota=on. Opts:
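For anyone reading along: the return codes in these messages are negated
Linux errno values, so -17 is EEXIST. Below is a minimal sketch for pulling
them out of a log excerpt and decoding them. The errno table is hardcoded
(an assumption that the standard Linux numbering applies), and `find_rcs`
and `decode_rc` are hypothetical helper names of my own, not Lustre tools.

```python
import re

# Linux errno values that show up in the logs in this post. Hardcoded on
# purpose, since errno numbering differs between platforms.
LINUX_ERRNO = {
    2: ("ENOENT", "No such file or directory"),
    17: ("EEXIST", "File exists"),
    22: ("EINVAL", "Invalid argument"),
    98: ("EADDRINUSE", "Address already in use"),
}

# A minus sign followed by digits, not preceded by a word character or a
# slash, so device names like lfs4-OST000e and fields like "0/-1" are skipped.
RC_PATTERN = re.compile(r"(?<![\w/])(-\d+)\b")


def find_rcs(log_text):
    """Return the distinct negative return codes in log_text, sorted."""
    return sorted({int(m) for m in RC_PATTERN.findall(log_text)})


def decode_rc(rc):
    """Map a negative kernel return code such as -17 to its errno name."""
    name, text = LINUX_ERRNO.get(abs(rc), ("UNKNOWN", "not in table"))
    return f"{rc}: {name} ({text})"


if __name__ == "__main__":
    sample = (
        "class_attach()) Cannot create device lfs4-OST000e-osc-MDT0000 "
        "of type osp : -17\n"
        "class_setup()) setup lfs4-OST000e-osc-MDT0000 failed (-2)\n"
        "mgs_target_reg()) Failed to write lfs4-OST000e log (-98)\n"
    )
    for rc in find_rcs(sample):
        print(decode_rc(rc))
```

Read this way, the -17 at mount time says the osp device for OST000e is
being created a second time while the MDT configuration log is replayed.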
Again, no data was written to these OSTs. I looked into the
procedure for fixing a bad LAST_ID, but from what I was able to
piece together, the MDT does not appear to have any notion of
precreated objects on this OST yet, so I suspect the problem is
somewhere else, perhaps in mountdata.
Any ideas?
Thanks much!
Best,
---Steve
Attached log from the MDS at the time of the kernel panic:
May 3 14:20:31 lfs4-mds kernel: Lustre: MGS: Connection restored to
a5bce68d-fd2c-8bd8-c20b-e5713dc99a03 (at 10.128.11.157@tcp1)
May 3 14:20:31 lfs4-mds kernel: Lustre: Skipped 1 previous similar message
May 3 14:20:58 lfs4-mds kernel: Lustre: lfs4-MDT0000: Connection restored to
10.128.11.157@tcp1 (at 10.128.11.157@tcp1)
May 3 14:20:58 lfs4-mds kernel: Lustre: Skipped 1 previous similar message
May 3 14:21:24 lfs4-mds kernel: Lustre: lfs4-MDT0000: Connection restored to
a5bce68d-fd2c-8bd8-c20b-e5713dc99a03 (at 10.128.11.157@tcp1)
May 3 14:25:33 lfs4-mds kernel: Lustre: MGS: Connection restored to
fcadb8d1-5c7f-143e-145e-c580fa091b56 (at 10.128.11.156@tcp1)
May 3 14:25:43 lfs4-mds kernel: Lustre:
2223:0:(mgc_request.c:1680:mgc_process_recover_log()) Process recover log
lfs4-mdtir error -22
May 3 14:25:43 lfs4-mds kernel: LustreError:
5156:0:(ldlm_lib.c:462:client_obd_setup()) can't add initial connection
May 3 14:25:43 lfs4-mds kernel: LustreError:
5156:0:(osp_dev.c:1145:osp_init0()) lfs4-OST000e-osc-MDT0000: can't setup obd:
rc = -2
May 3 14:25:43 lfs4-mds kernel: LustreError:
5156:0:(obd_config.c:578:class_setup()) setup lfs4-OST000e-osc-MDT0000 failed
(-2)
May 3 14:25:43 lfs4-mds kernel: LustreError:
5156:0:(obd_config.c:1666:class_config_llog_handler()) MGC10.128.11.174@tcp1:
cfg command failed: rc = -2
May 3 14:25:43 lfs4-mds kernel: Lustre: cmd=cf003
0:lfs4-OST000e-osc-MDT0000 1:lfs4-OST000e_UUID 2:0@<0:0>
May 3 14:25:43 lfs4-mds kernel:
May 3 14:27:26 lfs4-mds kernel: Lustre: MGS: Connection restored to
lfs4-MDT0000-lwp-OST000e_UUID (at 10.128.11.156@tcp1)
May 3 14:27:26 lfs4-mds kernel: Lustre: Skipped 1 previous similar message
May 3 14:27:26 lfs4-mds kernel: LustreError: 140-5: Server lfs4-OST000e
requested index 14, but that index is already in use. Use --writeconf to force
May 3 14:27:26 lfs4-mds kernel: LustreError:
29874:0:(mgs_handler.c:460:mgs_target_reg()) Failed to write lfs4-OST000e log
(-98)
May 3 14:27:36 lfs4-mds kernel: LustreError:
5721:0:(obd_config.c:798:class_add_conn()) try to add conn on immature client
dev
May 3 14:27:36 lfs4-mds kernel: LustreError:
5721:0:(lod_lov.c:243:lod_add_device()) ASSERTION( obd->obd_lu_dev->ld_site ==
lod->lod_dt_dev.dd_lu_dev.ld_site ) failed:
May 3 14:27:36 lfs4-mds kernel: LustreError:
5721:0:(lod_lov.c:243:lod_add_device()) LBUG
May 3 14:27:36 lfs4-mds kernel: Pid: 5721, comm: llog_process_th
May 3 14:27:36 lfs4-mds kernel:
May 3 14:27:36 lfs4-mds kernel: Call Trace:
May 3 14:27:36 lfs4-mds kernel: [<ffffffffa06a1875>]
libcfs_debug_dumpstack+0x55/0x80 [libcfs]
May 3 14:27:36 lfs4-mds kernel: [<ffffffffa06a1e77>] lbug_with_loc+0x47/0xb0
[libcfs]
May 3 14:27:36 lfs4-mds kernel: [<ffffffffa11fa887>]
lod_add_device+0x1da7/0x1fe0 [lod]
May 3 14:27:36 lfs4-mds kernel: [<ffffffff8129c48e>] ? simple_strtol+0xe/0x20
May 3 14:27:36 lfs4-mds kernel: [<ffffffff8129c793>] ? vsscanf+0x2f3/0x770
May 3 14:27:36 lfs4-mds kernel: [<ffffffff8129c28c>] ?
simple_strtoull+0x2c/0x50
May 3 14:27:36 lfs4-mds kernel: [<ffffffffa11f07b9>]
lod_process_config+0x1339/0x1540 [lod]
May 3 14:27:36 lfs4-mds kernel: [<ffffffffa07e5d65>] ? keys_fill+0xd5/0x1b0
[obdclass]
May 3 14:27:36 lfs4-mds kernel: [<ffffffffa07e643b>] ?
lu_context_init+0x8b/0x160 [obdclass]
May 3 14:27:36 lfs4-mds kernel: [<ffffffffa07d7d05>]
class_process_config+0x2225/0x24c0 [obdclass]
May 3 14:27:36 lfs4-mds kernel: [<ffffffff810a185c>] ?
remove_wait_queue+0x3c/0x50
May 3 14:27:36 lfs4-mds kernel: [<ffffffffa07d984a>]
class_config_llog_handler+0xc1a/0x1d50 [obdclass]
May 3 14:27:36 lfs4-mds kernel: [<ffffffff8153a65e>] ? mutex_lock+0x1e/0x50
May 3 14:27:36 lfs4-mds kernel: [<ffffffffa079e3ca>]
llog_process_thread+0x94a/0x1040 [obdclass]
May 3 14:27:36 lfs4-mds kernel: [<ffffffffa079efc5>]
llog_process_thread_daemonize+0x45/0x70 [obdclass]
May 3 14:27:36 lfs4-mds kernel: [<ffffffffa079ef80>] ?
llog_process_thread_daemonize+0x0/0x70 [obdclass]
May 3 14:27:36 lfs4-mds kernel: [<ffffffff810a0fce>] kthread+0x9e/0xc0
May 3 14:27:36 lfs4-mds kernel: [<ffffffff8100c28a>] child_rip+0xa/0x20
May 3 14:27:36 lfs4-mds kernel: [<ffffffff810a0f30>] ? kthread+0x0/0xc0
May 3 14:27:36 lfs4-mds kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20
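
A note on the 140-5 message in the log above: the target name encodes the
OST index as four hex digits, so lfs4-OST000e is index 14. Below is a rough
sketch of that mapping, plus a check for duplicated indices in a planned
list before formatting. The helper names and the example list are
hypothetical and only for illustration.

```python
from collections import Counter


def ost_label(fsname, index):
    """Build a target label such as 'lfs4-OST000e' from fsname and index."""
    return f"{fsname}-OST{index:04x}"


def ost_index(label):
    """Recover the decimal index from a label such as 'lfs4-OST000e'."""
    return int(label.rsplit("-OST", 1)[1], 16)


def duplicated_indices(indices):
    """Return the indices that appear more than once, in ascending order."""
    return sorted(i for i, n in Counter(indices).items() if n > 1)


if __name__ == "__main__":
    print(ost_label("lfs4", 14))
    print(ost_index("lfs4-OST000e"))
    # Hypothetical plan for new OSTs in which index 14 is used twice.
    planned = [12, 13, 14, 14, 15]
    print(duplicated_indices(planned))
```

Running a check like this over the intended indices before creating the
OSTs would have flagged the duplicate that started all of this.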