Did you try doing a writeconf to regenerate the config logs for the file system?
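For reference, the writeconf procedure in the Lustre Operations Manual stops the whole file system first: clients, then the MDT, then the OSTs. It then runs `tunefs.lustre --writeconf` on the MGS/MDT and on every OST, and remounts in the order MGS/MDT, OSTs, clients. The configuration logs are rebuilt as the targets register again. The sketch below is a dry-run planner under those assumptions: it only prints the command sequence and executes nothing. `Target`, `writeconf_plan`, and the hosts, devices and mount points are hypothetical. Parameters previously set with `lctl conf_param` (and pool definitions) live in those logs and would have to be set again afterwards.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Target:
    host: str                 # server that owns the block device (hypothetical names)
    device: str               # block device backing the target
    mount_point: str          # where the target is mounted with -t lustre
    is_mgs_mdt: bool = False  # True for the combined MGS/MDT
    ost_index: int = -1       # OST index; ignored for the MGS/MDT


def writeconf_plan(targets: List[Target], mgs_nid: str, fsname: str,
                   client_mount: str) -> List[Tuple[str, str]]:
    """Return (where, command) pairs in writeconf order. Nothing is executed."""
    mdts = [t for t in targets if t.is_mgs_mdt]
    # OSTs are handled in index order, a conservative choice.
    osts = sorted((t for t in targets if not t.is_mgs_mdt), key=lambda t: t.ost_index)
    servers = mdts + osts

    # 1. Stop the file system: clients first, then the MGS/MDT, then the OSTs.
    plan: List[Tuple[str, str]] = [("clients", "umount -a -t lustre")]
    plan += [(t.host, f"umount {t.mount_point}") for t in servers]

    # 2. Erase the configuration logs (MGS/MDT first, then each OST);
    #    they are regenerated as the targets are mounted again.
    plan += [(t.host, f"tunefs.lustre --writeconf {t.device}") for t in servers]

    # 3. Restart: MGS/MDT, then OSTs, and clients last.
    plan += [(t.host, f"mount -t lustre {t.device} {t.mount_point}") for t in servers]
    plan.append(("clients", f"mount -t lustre {mgs_nid}:/{fsname} {client_mount}"))
    return plan


if __name__ == "__main__":
    # Hypothetical layout; only lfs4-mds, the MGS NID and the fsname appear in the thread.
    layout = [
        Target("lfs4-mds", "/dev/sdb", "/mnt/mdt", is_mgs_mdt=True),
        Target("oss1", "/dev/sdc", "/mnt/ost0", ost_index=0),
    ]
    for where, cmd in writeconf_plan(layout, "10.128.11.174@tcp1", "lfs4", "/mnt/lfs4"):
        print(f"{where}: {cmd}")
```

Keeping the plan as data rather than running it lets an administrator review the exact order before touching a damaged MGS.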
--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu

> On May 4, 2017, at 10:03 AM, Steve Barnet <[email protected]> wrote:
>
> Hi all,
>
> This is Lustre 2.8.0 community edition, combined MGS/MDT.
>
> I was adding storage to a filesystem and mistakenly duplicated an
> index for one of the OSTs at creation time. Since these OSTs were
> new and no data had been written, I made the mistake of reformatting
> the affected OSTs (including the first one I successfully mounted).
>
> When I tried to remount the newly formatted OST, the MDS kernel
> panicked (log attached). After a device-level backup and an e2fsck,
> I can mount the MDT as ldiskfs. e2fsck did correct some orphaned
> inodes, but those appear to be user files only, nothing from the
> Lustre metadata files themselves.
>
> However, the MDT/MGT still will not mount. The logs indicate
> that the original definition of the duplicated OST still exists
> somewhere. I checked the CONFIGS directory, and indeed there was
> a file associated with the OST in question. I copied that file
> out of the CONFIGS directory and attempted to mount the MDT/MGT
> again, but no change.
>
> The logs read:
>
> May 4 06:41:22 lfs4-mds kernel: Lustre: MGS: Connection restored to MGC10.128.11.174@tcp1_0 (at 0@lo)
> May 4 06:41:22 lfs4-mds kernel: LustreError: 12300:0:(genops.c:334:class_newdev()) Device lfs4-OST000e-osc-MDT0000 already exists at 22, won't add
> May 4 06:41:22 lfs4-mds kernel: LustreError: 12300:0:(obd_config.c:370:class_attach()) Cannot create device lfs4-OST000e-osc-MDT0000 of type osp : -17
> May 4 06:41:22 lfs4-mds kernel: LustreError: 12300:0:(obd_config.c:1666:class_config_llog_handler()) MGC10.128.11.174@tcp1: cfg command failed: rc = -17
> May 4 06:41:22 lfs4-mds kernel: Lustre: cmd=cf001 0:lfs4-OST000e-osc-MDT0000 1:osp 2:lfs4-MDT0000-mdtlov_UUID
> May 4 06:41:22 lfs4-mds kernel:
> May 4 06:41:22 lfs4-mds kernel: LustreError: 15c-8: MGC10.128.11.174@tcp1: The configuration from log 'lfs4-MDT0000' failed (-17). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
> May 4 06:41:22 lfs4-mds kernel: LustreError: 12213:0:(obd_mount_server.c:1309:server_start_targets()) failed to start server lfs4-MDT0000: -17
> May 4 06:41:22 lfs4-mds kernel: LustreError: 12213:0:(obd_mount_server.c:1798:server_fill_super()) Unable to start targets: -17
> May 4 06:41:22 lfs4-mds kernel: Lustre: Failing over lfs4-MDT0000
> May 4 06:41:28 lfs4-mds kernel: Lustre: 12213:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1493898082/real 1493898082] req@ffff8803113459c0 x1566404887184424/t0(0) o251->MGC10.128.11.174@tcp1@0@lo:26/25 lens 224/224 e 0 to 1 dl 1493898088 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
> May 4 06:41:28 lfs4-mds kernel: Lustre: server umount lfs4-MDT0000 complete
> May 4 06:41:28 lfs4-mds kernel: LustreError: 12213:0:(obd_mount.c:1426:lustre_fill_super()) Unable to mount (-17)
> May 4 06:45:04 lfs4-mds kernel: LDISKFS-fs (sdb): mounted filesystem with ordered data mode. quota=on. Opts:
>
> Again, no data was written to these. I was poking around a bit with
> the procedure for fixing a bad LAST_ID. From what I was able to
> piece together, it doesn't look like the MDT has any notion of
> precreated objects on this OST yet, so I suspect something
> in mountdata, perhaps.
>
> Any ideas?
>
> Thanks much!
>
> Best,
>
> ---Steve
>
> <mds-panic.txt>
> _______________________________________________
> lustre-discuss mailing list
> [email protected]
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
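Reading the quoted log: `class_newdev()` refuses to create `lfs4-OST000e-osc-MDT0000` because a device with that name already exists. Every later failure carries return code -17, the negated Linux errno `EEXIST` ("File exists"). That suggests, as an interpretation rather than a confirmed diagnosis, that the 'lfs4-MDT0000' configuration log still holds a record for OST index 0x000e (14) that clashes with the new one. Clearing that kind of stale record is what a writeconf does. The helper below is hypothetical (`summarize`, `DUP_RE` and `RC_RE` are illustrative names, not part of any Lustre tool), and its regular expressions are assumptions tuned to the messages quoted here.

```python
import errno
import re
from typing import Dict, List, Tuple

# Matches class_newdev() complaints such as
# "Device lfs4-OST000e-osc-MDT0000 already exists at 22, won't add".
DUP_RE = re.compile(
    r"Device (?P<fs>\w+)-OST(?P<idx>[0-9a-fA-F]{4})-osc-(?P<mdt>MDT[0-9a-fA-F]{4}) already exists"
)
# Negative return codes as they appear in these messages: "rc = -17", "(-17)", ": -17".
RC_RE = re.compile(r"(?:rc = |\(|: )-(\d+)")


def summarize(log_text: str) -> Tuple[List[Tuple[str, int, str]], Dict[int, str]]:
    """Return ([(fsname, ost_index, mdt)], {errno: symbolic name}) found in the log."""
    dups = []
    for m in DUP_RE.finditer(log_text):
        # Target names carry the index as 4 hex digits: OST000e -> 14.
        dups.append((m["fs"], int(m["idx"], 16), m["mdt"]))
    codes = {}
    for m in RC_RE.finditer(log_text):
        n = int(m.group(1))
        # Lustre reports negated Linux errno values, so -17 maps to EEXIST.
        codes[n] = errno.errorcode.get(n, "unknown")
    return dups, codes


if __name__ == "__main__":
    excerpt = """\
LustreError: 12300:0:(genops.c:334:class_newdev()) Device lfs4-OST000e-osc-MDT0000 already exists at 22, won't add
LustreError: 12300:0:(obd_config.c:1666:class_config_llog_handler()) MGC10.128.11.174@tcp1: cfg command failed: rc = -17
LustreError: 15c-8: MGC10.128.11.174@tcp1: The configuration from log 'lfs4-MDT0000' failed (-17).
"""
    print(summarize(excerpt))
```

On the three quoted lines above it reports one clashing OSP device for index 14 and a single return code, 17 (`EEXIST`).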
