Hi Michael,

It seems that an infinite loop happens in chain 73. You have formatted with a 2K block size and 4K cluster size, so each chain should have 1523 or 1522 records. But at first glance I cannot figure out which block goes wrong, because the output you pasted indicates all blocks are different. So I suggest you investigate all the blocks which belong to chain 73 and try to find out whether there is a loop there.

BTW, have you backed up the metadata using o2image?
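[Editor's note: the record-count estimate above can be sanity-checked from the figures in the quoted `stat //global_bitmap` output; a quick back-of-the-envelope sketch in plain Python, no ocfs2 tooling assumed:]

```python
# Back-of-the-envelope check of the global_bitmap figures quoted below.
clusters = 2778641591        # "Clusters:" / "Bitmap Total:"
cluster_size = 4096          # "Cluster size:"
block_size = 2048            # "Block size:"
clusters_per_group = 15872   # "Clusters per Group:"
chains = 115                 # "Count:"

# Size and block count follow directly from the cluster figures.
assert clusters * cluster_size == 11381315956736              # "Size:"
assert clusters * (cluster_size // block_size) == 5557283182  # "Number of blocks:"

# Total number of block groups (the last group may be partial),
# dealt out round-robin over the chains.
groups = -(-clusters // clusters_per_group)  # ceiling division
base, extra = divmod(groups, chains)

print(f"{groups} groups -> {extra} chains with {base + 1} records, "
      f"{chains - extra} chains with {base} records")
# -> 175066 groups -> 36 chains with 1523 records, 79 chains with 1522 records
```

This agrees with the quoted listing: 1523 * 15872 = 24173056 (the "Total" of chains 0-9) and 1522 * 15872 = 24157184 (chains 105-114), so rec numbers above two million in chain 73 cannot come from a well-formed chain.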
Thanks,
Joseph

On 2016/3/24 16:40, Michael Ulbrich wrote:
> Hi Joseph,
>
> thanks a lot for your help. It is very much appreciated!
>
> I ran debugfs.ocfs2 from ocfs2-tools 1.6.4 on the mounted file system:
>
> root@s1a:~# debugfs.ocfs2 -R 'stat //global_bitmap' /dev/drbd1 > debugfs_drbd1.log 2>&1
>
> Inode: 13   Mode: 0644   Generation: 1172963971 (0x45ea0283)
> FS Generation: 1172963971 (0x45ea0283)
> CRC32: 00000000   ECC: 0000
> Type: Regular   Attr: 0x0   Flags: Valid System Allocbitmap Chain
> Dynamic Features: (0x0)
> User: 0 (root)   Group: 0 (root)   Size: 11381315956736
> Links: 1   Clusters: 2778641591
> ctime: 0x54010183 -- Sat Aug 30 00:41:07 2014
> atime: 0x54010183 -- Sat Aug 30 00:41:07 2014
> mtime: 0x54010183 -- Sat Aug 30 00:41:07 2014
> dtime: 0x0 -- Thu Jan  1 01:00:00 1970
> ctime_nsec: 0x00000000 -- 0
> atime_nsec: 0x00000000 -- 0
> mtime_nsec: 0x00000000 -- 0
> Refcount Block: 0
> Last Extblk: 0   Orphan Slot: 0
> Sub Alloc Slot: Global   Sub Alloc Bit: 7
> Bitmap Total: 2778641591   Used: 1083108631   Free: 1695532960
> Clusters per Group: 15872   Bits per Cluster: 1
> Count: 115   Next Free Rec: 115
> ##    Total       Used       Free       Block#
> 0     24173056    9429318    14743738   4533995520
> 1     24173056    9421663    14751393   4548629504
> 2     24173056    9432421    14740635   4588817408
> 3     24173056    9427533    14745523   4548692992
> 4     24173056    9433978    14739078   4508568576
> 5     24173056    9436974    14736082   4636369920
> 6     24173056    9428411    14744645   4563390464
> 7     24173056    9426950    14746106   4479459328
> 8     24173056    9428099    14744957   4548851712
> 9     24173056    9431794    14741262   4585389056
> ...
> 105   24157184    9414241    14742943   4690652160
> 106   24157184    9419715    14737469   4467999744
> 107   24157184    9411479    14745705   4431525888
> 108   24157184    9413235    14743949   4559327232
> 109   24157184    9417948    14739236   4500950016
> 110   24157184    9411013    14746171   4566691840
> 111   24157184    9421252    14735932   4522916864
> 112   24157184    9416726    14740458   4537550848
> 113   24157184    9415358    14741826   4676303872
> 114   24157184    9420448    14736736   4526662656
>
> Group Chain: 0   Parent Inode: 13   Generation: 1172963971
> CRC32: 00000000   ECC: 0000
> ##    Block#       Total   Used    Free    Contig   Size
> 0     4533995520   15872   6339    9533    3987     1984
> 1     4530344960   15872   10755   5117    5117     1984
> 2     2997109760   15872   10753   5119    5119     1984
> 3     4526694400   15872   10753   5119    5119     1984
> 4     3022663680   15872   10753   5119    5119     1984
> 5     4512092160   15872   9043    6829    2742     1984
> 6     4523043840   15872   4948    10924   9612     1984
> 7     4519393280   15872   6150    9722    5595     1984
> 8     4515742720   15872   4323    11549   6603     1984
> 9     3771028480   15872   10753   5119    5119     1984
> ...
> 1513  5523297280   15872   1       15871   15871    1984
> 1514  5526947840   15872   1       15871   15871    1984
> 1515  5530598400   15872   1       15871   15871    1984
> 1516  5534248960   15872   1       15871   15871    1984
> 1517  5537899520   15872   1       15871   15871    1984
> 1518  5541550080   15872   1       15871   15871    1984
> 1519  5545200640   15872   1       15871   15871    1984
> 1520  5548851200   15872   1       15871   15871    1984
> 1521  5552501760   15872   1       15871   15871    1984
> 1522  5556152320   15872   1       15871   15871    1984
>
> Group Chain: 1   Parent Inode: 13   Generation: 1172963971
> CRC32: 00000000   ECC: 0000
> ##    Block#       Total   Used    Free    Contig   Size
> 0     4548629504   15872   10755   5117    2496     1984
> 1     2993490944   15872   59      15813   14451    1984
> 2     2489713664   15872   10758   5114    3726     1984
> 3     3117609984   15872   3958    11914   6165     1984
> 4     2544472064   15872   10753   5119    5119     1984
> 5     3040948224   15872   10753   5119    5119     1984
> 6     2971587584   15872   10753   5119    5119     1984
> 7     4493871104   15872   8664    7208    3705     1984
> 8     4544978944   15872   8711    7161    2919     1984
> 9     4417209344   15872   3253    12619   6447     1984
> ...
> 1513  5523329024   15872   1       15871   15871    1984
> 1514  5526979584   15872   1       15871   15871    1984
> 1515  5530630144   15872   1       15871   15871    1984
> 1516  5534280704   15872   1       15871   15871    1984
> 1517  5537931264   15872   1       15871   15871    1984
> 1518  5541581824   15872   1       15871   15871    1984
> 1519  5545232384   15872   1       15871   15871    1984
> 1520  5548882944   15872   1       15871   15871    1984
> 1521  5552533504   15872   1       15871   15871    1984
> 1522  5556184064   15872   1       15871   15871    1984
>
> ... all following group chains are similarly structured up to #73 which
> looks as follows:
>
> Group Chain: 73   Parent Inode: 13   Generation: 1172963971
> CRC32: 00000000   ECC: 0000
> ##       Block#       Total   Used    Free    Contig   Size
> 0        2583263232   15872   5341    10531   5153     1984
> 1        4543613952   15872   5329    10543   5119     1984
> 2        4532662272   15872   10753   5119    5119     1984
> 3        4539963392   15872   3223    12649   7530     1984
> 4        4536312832   15872   5219    10653   5534     1984
> 5        4529011712   15872   6047    9825    3359     1984
> 6        4525361152   15872   4475    11397   5809     1984
> 7        4521710592   15872   3182    12690   5844     1984
> 8        4518060032   15872   5881    9991    5131     1984
> 9        4236966912   15872   10753   5119    5119     1984
> ...
> 2059651  4299026432   15872   4334    11538   4816     1984
> 2059652  4087293952   15872   7003    8869    2166     1984
> 2059653  4295375872   15872   6626    9246    5119     1984
> 2059654  4288074752   15872   509     15363   9662     1984
> 2059655  4291725312   15872   6151    9721    5119     1984
> 2059656  4284424192   15872   10052   5820    5119     1984
> 2059657  4277123072   15872   7383    8489    5120     1984
> 2059658  4273472512   15872   14      15858   5655     1984
> 2059659  4269821952   15872   2637    13235   7060     1984
> 2059660  4266171392   15872   10758   5114    3674     1984
> ...
>
> Assuming this would go on forever I stopped debugfs.ocfs2.
>
> With debugfs.ocfs2 from ocfs2-tools 1.8.4 I get an identical result.
>
> Please let me know if I can provide any further information and help to
> fix this issue.
>
> Thanks again + Best regards ... Michael
>
> On 03/24/2016 01:30 AM, Joseph Qi wrote:
>> Hi Michael,
>> Could you please use debugfs to check the output?
>> # debugfs.ocfs2 -R 'stat //global_bitmap' <device>
>>
>> Thanks,
>> Joseph
>>
>> On 2016/3/24 6:38, Michael Ulbrich wrote:
>>> Hi ocfs2-users,
>>>
>>> my first post to this list from yesterday probably didn't get through.
>>>
>>> Anyway, I've made some progress in the meantime and may now ask more
>>> specific questions ...
>>>
>>> I'm having issues with an 11 TB ocfs2 shared filesystem on Debian Wheezy:
>>>
>>> Linux s1a 3.2.0-4-amd64 #1 SMP Debian 3.2.54-2 x86_64 GNU/Linux
>>>
>>> the kernel modules are:
>>>
>>> modinfo ocfs2 -> version: 1.5.0
>>>
>>> using stock ocfs2-tools 1.6.4-1+deb7u1 from the distribution.
>>>
>>> As an alternative I cloned and built the latest ocfs2-tools from
>>> markfasheh's ocfs2-tools on github, which should be version 1.8.4.
>>>
>>> The filesystem runs on top of drbd, is used to roughly 40 % and suffers
>>> from read-only remounts and hanging clients since the last reboot. These
>>> may be DLM problems, but I suspect they stem from some corrupt disk
>>> structures. Before that it all ran stable for months.
>>>
>>> This situation made me want to run fsck.ocfs2 and now I wonder how to do
>>> that. The filesystem is not mounted.
>>>
>>> With the stock ocfs2-tools 1.6.4:
>>>
>>> root@s1a:~# fsck.ocfs2 -v -f /dev/drbd1 > fsck_drbd1.log 2>&1
>>> fsck.ocfs2 1.6.4
>>> Checking OCFS2 filesystem in /dev/drbd1:
>>>   Label:              ocfs2_ASSET
>>>   UUID:               6A1A0189A3F94E32B6B9A526DF9060F3
>>>   Number of blocks:   5557283182
>>>   Block size:         2048
>>>   Number of clusters: 2778641591
>>>   Cluster size:       4096
>>>   Number of slots:    16
>>>
>>> I'm checking fsck_drbd1.log and find that it is making progress in
>>>
>>> Pass 0a: Checking cluster allocation chains
>>>
>>> until it reaches "chain 73" and goes into an infinite loop filling the
>>> logfile with breathtaking speed.
>>>
>>> With the newly built ocfs2-tools 1.8.4 I get:
>>>
>>> root@s1a:~# fsck.ocfs2 -v -f /dev/drbd1 > fsck_drbd1.log 2>&1
>>> fsck.ocfs2 1.8.4
>>> Checking OCFS2 filesystem in /dev/drbd1:
>>>   Label:              ocfs2_ASSET
>>>   UUID:               6A1A0189A3F94E32B6B9A526DF9060F3
>>>   Number of blocks:   5557283182
>>>   Block size:         2048
>>>   Number of clusters: 2778641591
>>>   Cluster size:       4096
>>>   Number of slots:    16
>>>
>>> Again watching the verbose output in fsck_drbd1.log I find that this
>>> time it proceeds up to
>>>
>>> Pass 0a: Checking cluster allocation chains
>>> o2fsck_pass0:1360 | found inode alloc 13 at block 13
>>>
>>> and stays there without any further progress. I've terminated this
>>> process after waiting for more than an hour.
>>>
>>> Now I'm lost somehow ... and would very much appreciate it if anybody on
>>> this list would share his knowledge and give me a hint what to do next.
>>>
>>> What could be done to get this file system checked and repaired? Am I
>>> missing something important, or do I just have to wait a little bit
>>> longer? Is there a version of ocfs2-tools / fsck.ocfs2 which will
>>> perform as expected?
>>>
>>> I'm prepared to upgrade the kernel to 3.16.0-0.bpo.4-amd64 but shy away
>>> from taking that risk without any clue of whether that might solve my
>>> problem ...
>>>
>>> Thanks in advance ... Michael Ulbrich
>>>
>>> _______________________________________________
>>> Ocfs2-users mailing list
>>> Ocfs2-users@oss.oracle.com
>>> https://oss.oracle.com/mailman/listinfo/ocfs2-users
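[Editor's note: rec numbers running past two million in a chain that should hold roughly 1522 group descriptors mean the next-group links of chain 73 form a cycle. A minimal sketch of locating such a cycle with Floyd's tortoise-and-hare follows; the `next_group` callback is an assumption (mocked here with a small dict), a real version would read each group descriptor's next pointer from the device or via debugfs.ocfs2:]

```python
def find_cycle(start, next_group):
    """Floyd's tortoise-and-hare over a singly linked chain.

    next_group(blkno) returns the next block number, or a falsy
    value (0/None) if the chain terminates.  Returns
    (cycle_entry, cycle_length) if the chain loops, or
    (None, steps_taken) if it terminates normally.
    """
    slow = fast = start
    steps = 0
    while True:
        slow = next_group(slow)
        hop = next_group(fast)
        fast = next_group(hop) if hop else 0
        steps += 1
        if not slow or not fast:
            return None, steps       # chain terminates normally
        if slow == fast:
            break                    # pointers met inside the cycle
    # Reset one pointer to the head; advancing both one step at a
    # time makes them meet exactly at the cycle's entry block.
    slow = start
    while slow != fast:
        slow = next_group(slow)
        fast = next_group(fast)
    # Walk once around to measure the cycle length.
    length, fast = 1, next_group(slow)
    while fast != slow:
        fast = next_group(fast)
        length += 1
    return slow, length

# Hypothetical mock chain: group 40 links back to group 20.
links = {10: 20, 20: 30, 30: 40, 40: 20}
entry, length = find_cycle(10, lambda b: links.get(b, 0))
print(entry, length)  # -> 20 3
```

The same walk against the real descriptors would name the first block where chain 73 re-enters itself, which is the block Joseph suggests hunting for.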