On Jul 17, 2017, at 22:48, wanglu <wan...@ihep.ac.cn> wrote:
>
> Hello,
>
> One OST of our system cannot be mounted in Lustre mode after a severe disk
> error and a 5-day e2fsck. Here are the errors we got during the mount
> operation.
> #grep FID /var/log/messages
> Jul 17 20:15:21 oss04 kernel: LustreError: 13089:0:(osd_oi.c:653:osd_oi_insert()) lustre-OST0036: the FID [0x200000005:0x1:0x0] is used by two objects: 86/3303188178 48085/1708371613
> Jul 17 20:38:41 oss04 kernel: LustreError: 13988:0:(osd_oi.c:653:osd_oi_insert()) lustre-OST0036: the FID [0x200000005:0x1:0x0] is used by two objects: 86/3303188178 48086/3830163079
> Jul 17 20:49:55 oss04 kernel: LustreError: 14221:0:(osd_oi.c:653:osd_oi_insert()) lustre-OST0036: the FID [0x200000005:0x1:0x0] is used by two objects: 86/3303188178 48087/538285899
> Jul 18 11:39:25 oss04 kernel: LustreError: 31071:0:(osd_oi.c:653:osd_oi_insert()) lustre-OST0036: the FID [0x200000005:0x1:0x0] is used by two objects: 86/3303188178 48088/2468309129
> Jul 18 11:39:56 oss04 kernel: LustreError: 31170:0:(osd_oi.c:653:osd_oi_insert()) lustre-OST0036: the FID [0x200000005:0x1:0x0] is used by two objects: 86/3303188178 48089/2021195118
> Jul 18 12:04:31 oss04 kernel: LustreError: 32127:0:(osd_oi.c:653:osd_oi_insert()) lustre-OST0036: the FID [0x200000005:0x1:0x0] is used by two objects: 86/3303188178 48090/956682248
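
As an aside, messages like the ones quoted above are easier to compare once the FID and the two object pairs are pulled out of each line. The following is only a minimal Python sketch, not part of any Lustre tooling: the function names are made up for illustration, the regular expression is fitted to the message text shown in this log, and reading each pair as inode/generation is an assumption.

```python
import re
from collections import Counter

# Fitted to the osd_oi_insert() message text quoted in this thread.
# Assumption: each "A/B" pair is "<ldiskfs inode>/<inode generation>".
DUP_FID_RE = re.compile(
    r"the FID \[(?P<fid>0x[0-9a-fA-F]+:0x[0-9a-fA-F]+:0x[0-9a-fA-F]+)\] "
    r"is used by two objects: "
    r"(?P<ino_a>\d+)/(?P<gen_a>\d+) (?P<ino_b>\d+)/(?P<gen_b>\d+)"
)


def parse_duplicate_fids(lines):
    """Yield (fid, (ino_a, gen_a), (ino_b, gen_b)) for each matching line."""
    for line in lines:
        m = DUP_FID_RE.search(line)
        if m:
            yield (
                m.group("fid"),
                (int(m.group("ino_a")), int(m.group("gen_a"))),
                (int(m.group("ino_b")), int(m.group("gen_b"))),
            )


def summarize(lines):
    """Map each FID to a Counter of how often each inode number is reported."""
    summary = {}
    for fid, first, second in parse_duplicate_fids(lines):
        counts = summary.setdefault(fid, Counter())
        counts[first[0]] += 1
        counts[second[0]] += 1
    return summary


if __name__ == "__main__":
    # Two of the messages from the log above, each on a single line.
    sample = [
        "Jul 17 20:15:21 oss04 kernel: LustreError: "
        "13089:0:(osd_oi.c:653:osd_oi_insert()) lustre-OST0036: the FID "
        "[0x200000005:0x1:0x0] is used by two objects: "
        "86/3303188178 48085/1708371613",
        "Jul 18 12:04:31 oss04 kernel: LustreError: "
        "32127:0:(osd_oi.c:653:osd_oi_insert()) lustre-OST0036: the FID "
        "[0x200000005:0x1:0x0] is used by two objects: "
        "86/3303188178 48090/956682248",
    ]
    for fid, counts in summarize(sample).items():
        print(fid, dict(counts))
```

Run over the six messages, a summary like this would show inode 86 in every message and a different second inode each time (48085 through 48090).
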

The numbers printed here are ldiskfs inode numbers: 86 and, in the latest message, 48090. The FID [0x200000005:0x1:0x0] is the user quota file, so these files may be in the quota_slave directory.

> and the mount operation failed with error -17
> Jul 18 12:04:31 oss04 kernel: LustreError: 32127:0:(osd_oi.c:653:osd_oi_insert()) lustre-OST0036: the FID [0x200000005:0x1:0x0] is used by two objects: 86/3303188178 48090/956682248
> Jul 18 12:04:31 oss04 kernel: LustreError: 32127:0:(qsd_lib.c:418:qsd_qtype_init()) lustre-OST0036: can't open slave index copy [0x200000006:0x20000:0x0] -17
> Jul 18 12:04:31 oss04 kernel: LustreError: 32127:0:(obd_mount_server.c:1723:server_fill_super()) Unable to start targets: -17
> Jul 18 12:04:31 oss04 kernel: Lustre: Failing over lustre-OST0036
> Jul 18 12:04:32 oss04 kernel: Lustre: server umount lustre-OST0036 complete
>
> If you run e2fsck again, the command will claim that the inode 480xx has two
> references and move 480xxx to lost+found.
> # e2fsck -f /dev/sdn
> e2fsck 1.42.12.wc1 (15-Sep-2014)
> Pass 1: Checking inodes, blocks, and sizes
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Unattached inode 48090
> Connect to /lost+found<y>? yes
> Inode 48090 ref count is 2, should be 1. Fix<y>? yes
> Pass 5: Checking group summary information
>
> lustre-OST0036: ***** FILE SYSTEM WAS MODIFIED *****
> lustre-OST0036: 238443/549322752 files (4.4% non-contiguous), 1737885841/2197287936 blocks
>
> Is it possible to find the file corresponding to 86/3303188178 and delete it?

You could just delete the 48090 file from lost+found (or move it out of the Lustre filesystem for backup) and it should solve the problem.

> P.S. 1. in ldiskfs mode, most of the disk files are OK to read, while some
> of them are red.
> 2. there are about 240'000 objects in the OST.
> [root@oss04 d0]# df -i /lustre/ostc
> Filesystem        Inodes  IUsed     IFree IUse% Mounted on
> /dev/sdn       549322752 238443 549084309    1% /lustre/ostc
> 3. Lustre Version 2.5.3, e2fsprog version

This is an old version of Lustre and e2fsprogs; you would be much better off upgrading.

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation

_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
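
For anyone who prefers to script the "move it out for backup" suggestion rather than do it by hand, here is a rough Python sketch. It rests on assumptions that are not stated in the thread: that the OST is mounted as ldiskfs at some mount point, and that e2fsck left the reconnected inode in lost+found under the name "#<inode>" (its usual naming). The function name, the backup directory and the destination file name are all hypothetical.

```python
import os
import shutil


def move_lost_found_inode(ldiskfs_mount, inode, backup_dir):
    """Move lost+found/#<inode> out of the OST and return its new path.

    Assumptions (not from the original thread): the OST is mounted as
    ldiskfs at `ldiskfs_mount`, and e2fsck named the reconnected entry
    "#<inode>" inside lost+found.
    """
    src = os.path.join(ldiskfs_mount, "lost+found", "#%d" % inode)

    # lstat raises FileNotFoundError if the entry is not there.
    st = os.lstat(src)

    # Refuse to touch the entry unless it really is the expected inode.
    if st.st_ino != inode:
        raise ValueError(
            "%s has inode %d, expected %d" % (src, st.st_ino, inode)
        )

    os.makedirs(backup_dir, exist_ok=True)
    dst = os.path.join(backup_dir, "ost-inode-%d" % inode)  # hypothetical name
    if os.path.exists(dst):
        raise FileExistsError(dst)

    # shutil.move copies and then removes the source when the backup
    # directory is on a different filesystem.
    return shutil.move(src, dst)
```

With the mount point from the df output above, a call might look like move_lost_found_inode("/lustre/ostc", 48090, "/root/ost0036-backup"), where the backup path is only a placeholder. The inode check is there so that a typo in the number cannot move the wrong entry.
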