Hi all,

  This is Lustre 2.8.0 community edition, combined MGS/MDT.

I was adding storage to a filesystem and mistakenly duplicated an
index for one of the OSTs at creation time. Since these OSTs were
new and no data had been written, I made the mistake of reformatting
the affected OSTs (including the first one I successfully mounted).

  When I tried to remount the newly formatted OST, the MDS kernel
panicked (log attached). After a device-level backup and an e2fsck,
I can mount the MDT as ldiskfs. e2fsck did correct some orphaned
inodes, but those appear to belong to user files only, not to any of
the Lustre metadata files themselves.

  However, the MDT/MGT still will not mount. The logs indicate
that the original definition of the duplicated OST still exists
somewhere. I checked the CONFIGS directory, and indeed there was
a file associated with the OST in question. I copied that file
out of the CONFIGS directory and attempted to mount the MDT/MGT
again, but nothing changed.

The logs read:

May  4 06:41:22 lfs4-mds kernel: Lustre: MGS: Connection restored to MGC10.128.11.174@tcp1_0 (at 0@lo)
May  4 06:41:22 lfs4-mds kernel: LustreError: 12300:0:(genops.c:334:class_newdev()) Device lfs4-OST000e-osc-MDT0000 already exists at 22, won't add
May  4 06:41:22 lfs4-mds kernel: LustreError: 12300:0:(obd_config.c:370:class_attach()) Cannot create device lfs4-OST000e-osc-MDT0000 of type osp : -17
May  4 06:41:22 lfs4-mds kernel: LustreError: 12300:0:(obd_config.c:1666:class_config_llog_handler()) MGC10.128.11.174@tcp1: cfg command failed: rc = -17
May  4 06:41:22 lfs4-mds kernel: Lustre: cmd=cf001 0:lfs4-OST000e-osc-MDT0000 1:osp 2:lfs4-MDT0000-mdtlov_UUID
May  4 06:41:22 lfs4-mds kernel:
May  4 06:41:22 lfs4-mds kernel: LustreError: 15c-8: MGC10.128.11.174@tcp1: The configuration from log 'lfs4-MDT0000' failed (-17). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
May  4 06:41:22 lfs4-mds kernel: LustreError: 12213:0:(obd_mount_server.c:1309:server_start_targets()) failed to start server lfs4-MDT0000: -17
May  4 06:41:22 lfs4-mds kernel: LustreError: 12213:0:(obd_mount_server.c:1798:server_fill_super()) Unable to start targets: -17
May  4 06:41:22 lfs4-mds kernel: Lustre: Failing over lfs4-MDT0000
May  4 06:41:28 lfs4-mds kernel: Lustre: 12213:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1493898082/real 1493898082] req@ffff8803113459c0 x1566404887184424/t0(0) o251->MGC10.128.11.174@tcp1@0@lo:26/25 lens 224/224 e 0 to 1 dl 1493898088 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
May  4 06:41:28 lfs4-mds kernel: Lustre: server umount lfs4-MDT0000 complete
May  4 06:41:28 lfs4-mds kernel: LustreError: 12213:0:(obd_mount.c:1426:lustre_fill_super()) Unable to mount (-17)
May  4 06:45:04 lfs4-mds kernel: LDISKFS-fs (sdb): mounted filesystem with ordered data mode. quota=on. Opts:
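
For anyone reading along, the negative numbers in these messages (-17 here, and -2, -22 and -98 in the attached log) look like negated Linux errno values, and the "000e" in the OST name is the index in hex (the attached log itself says lfs4-OST000e requested index 14). Below is a minimal Python sketch for decoding both. The helper names describe_rc and ost_index are my own and not part of any Lustre tool, and the errno names assume Linux numbering.

```python
import errno
import os
import re


def describe_rc(rc: int) -> str:
    """Map a negative kernel return code such as -17 to its errno name.

    Assumes Linux errno numbering; other platforms may number some
    codes (for example 98) differently.
    """
    name = errno.errorcode.get(-rc, "UNKNOWN")
    return f"{rc} = {name} ({os.strerror(-rc)})"


def ost_index(target: str) -> int:
    """Extract the OST index from a name like 'lfs4-OST000e'.

    The four characters after 'OST' are read as a hexadecimal number.
    """
    match = re.search(r"OST([0-9a-fA-F]{4})", target)
    if match is None:
        raise ValueError(f"no OST index found in {target!r}")
    return int(match.group(1), 16)


if __name__ == "__main__":
    # Return codes that appear in the logs in this message.
    for rc in (-17, -2, -22, -98):
        print(describe_rc(rc))
    # Device name taken from the class_newdev() error above.
    print(ost_index("lfs4-OST000e-osc-MDT0000"))
```

Read this way, -17 is EEXIST, which matches the "already exists at 22, won't add" message for lfs4-OST000e-osc-MDT0000.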


Again, no data was written to these OSTs. I was poking around a bit
with the procedure for fixing a bad LAST_ID. From what I was able to
piece together, it doesn't look like the MDT has any notion of
precreated objects on this OST yet, so I suspect the problem may be
something in mountdata.

Any ideas?

Thanks much!

Best,

---Steve

May  3 14:20:31 lfs4-mds kernel: Lustre: MGS: Connection restored to a5bce68d-fd2c-8bd8-c20b-e5713dc99a03 (at 10.128.11.157@tcp1)
May  3 14:20:31 lfs4-mds kernel: Lustre: Skipped 1 previous similar message
May  3 14:20:58 lfs4-mds kernel: Lustre: lfs4-MDT0000: Connection restored to 10.128.11.157@tcp1 (at 10.128.11.157@tcp1)
May  3 14:20:58 lfs4-mds kernel: Lustre: Skipped 1 previous similar message
May  3 14:21:24 lfs4-mds kernel: Lustre: lfs4-MDT0000: Connection restored to a5bce68d-fd2c-8bd8-c20b-e5713dc99a03 (at 10.128.11.157@tcp1)
May  3 14:25:33 lfs4-mds kernel: Lustre: MGS: Connection restored to fcadb8d1-5c7f-143e-145e-c580fa091b56 (at 10.128.11.156@tcp1)
May  3 14:25:43 lfs4-mds kernel: Lustre: 2223:0:(mgc_request.c:1680:mgc_process_recover_log()) Process recover log lfs4-mdtir error -22
May  3 14:25:43 lfs4-mds kernel: LustreError: 5156:0:(ldlm_lib.c:462:client_obd_setup()) can't add initial connection
May  3 14:25:43 lfs4-mds kernel: LustreError: 5156:0:(osp_dev.c:1145:osp_init0()) lfs4-OST000e-osc-MDT0000: can't setup obd: rc = -2
May  3 14:25:43 lfs4-mds kernel: LustreError: 5156:0:(obd_config.c:578:class_setup()) setup lfs4-OST000e-osc-MDT0000 failed (-2)
May  3 14:25:43 lfs4-mds kernel: LustreError: 5156:0:(obd_config.c:1666:class_config_llog_handler()) MGC10.128.11.174@tcp1: cfg command failed: rc = -2
May  3 14:25:43 lfs4-mds kernel: Lustre:    cmd=cf003 0:lfs4-OST000e-osc-MDT0000  1:lfs4-OST000e_UUID  2:0@<0:0>
May  3 14:25:43 lfs4-mds kernel:
May  3 14:27:26 lfs4-mds kernel: Lustre: MGS: Connection restored to lfs4-MDT0000-lwp-OST000e_UUID (at 10.128.11.156@tcp1)
May  3 14:27:26 lfs4-mds kernel: Lustre: Skipped 1 previous similar message
May  3 14:27:26 lfs4-mds kernel: LustreError: 140-5: Server lfs4-OST000e requested index 14, but that index is already in use. Use --writeconf to force
May  3 14:27:26 lfs4-mds kernel: LustreError: 29874:0:(mgs_handler.c:460:mgs_target_reg()) Failed to write lfs4-OST000e log (-98)
May  3 14:27:36 lfs4-mds kernel: LustreError: 5721:0:(obd_config.c:798:class_add_conn()) try to add conn on immature client dev
May  3 14:27:36 lfs4-mds kernel: LustreError: 5721:0:(lod_lov.c:243:lod_add_device()) ASSERTION( obd->obd_lu_dev->ld_site == lod->lod_dt_dev.dd_lu_dev.ld_site ) failed:
May  3 14:27:36 lfs4-mds kernel: LustreError: 5721:0:(lod_lov.c:243:lod_add_device()) LBUG
May  3 14:27:36 lfs4-mds kernel: Pid: 5721, comm: llog_process_th
May  3 14:27:36 lfs4-mds kernel:
May  3 14:27:36 lfs4-mds kernel: Call Trace:
May  3 14:27:36 lfs4-mds kernel: [<ffffffffa06a1875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
May  3 14:27:36 lfs4-mds kernel: [<ffffffffa06a1e77>] lbug_with_loc+0x47/0xb0 [libcfs]
May  3 14:27:36 lfs4-mds kernel: [<ffffffffa11fa887>] lod_add_device+0x1da7/0x1fe0 [lod]
May  3 14:27:36 lfs4-mds kernel: [<ffffffff8129c48e>] ? simple_strtol+0xe/0x20
May  3 14:27:36 lfs4-mds kernel: [<ffffffff8129c793>] ? vsscanf+0x2f3/0x770
May  3 14:27:36 lfs4-mds kernel: [<ffffffff8129c28c>] ? simple_strtoull+0x2c/0x50
May  3 14:27:36 lfs4-mds kernel: [<ffffffffa11f07b9>] lod_process_config+0x1339/0x1540 [lod]
May  3 14:27:36 lfs4-mds kernel: [<ffffffffa07e5d65>] ? keys_fill+0xd5/0x1b0 [obdclass]
May  3 14:27:36 lfs4-mds kernel: [<ffffffffa07e643b>] ? lu_context_init+0x8b/0x160 [obdclass]
May  3 14:27:36 lfs4-mds kernel: [<ffffffffa07d7d05>] class_process_config+0x2225/0x24c0 [obdclass]
May  3 14:27:36 lfs4-mds kernel: [<ffffffff810a185c>] ? remove_wait_queue+0x3c/0x50
May  3 14:27:36 lfs4-mds kernel: [<ffffffffa07d984a>] class_config_llog_handler+0xc1a/0x1d50 [obdclass]
May  3 14:27:36 lfs4-mds kernel: [<ffffffff8153a65e>] ? mutex_lock+0x1e/0x50
May  3 14:27:36 lfs4-mds kernel: [<ffffffffa079e3ca>] llog_process_thread+0x94a/0x1040 [obdclass]
May  3 14:27:36 lfs4-mds kernel: [<ffffffffa079efc5>] llog_process_thread_daemonize+0x45/0x70 [obdclass]
May  3 14:27:36 lfs4-mds kernel: [<ffffffffa079ef80>] ? llog_process_thread_daemonize+0x0/0x70 [obdclass]
May  3 14:27:36 lfs4-mds kernel: [<ffffffff810a0fce>] kthread+0x9e/0xc0
May  3 14:27:36 lfs4-mds kernel: [<ffffffff8100c28a>] child_rip+0xa/0x20
May  3 14:27:36 lfs4-mds kernel: [<ffffffff810a0f30>] ? kthread+0x0/0xc0
May  3 14:27:36 lfs4-mds kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20