Can you save the o2image of the volume when it is in that state? We'll need that for analysis.

On 09/16/2011 05:41 AM, Andre Nathan wrote:
> Hello
>
> For a while I had seen errors like this in the kernel logs:
>
>   OCFS2: ERROR (device drbd5): ocfs2_validate_gd_parent: Group
>   descriptor #69084874 has bad chain 126
>   File system is now read-only due to the potential of on-disk
>   corruption. Please run fsck.ocfs2 once the file system is unmounted.
>
> This always happened in the same device, and whenever it happened I ran
> fsck.ocfs2 -fy /dev/drbd5, which showed messages like these:
>
>   [GROUP_FREE_BITS] Group descriptor at block 201309696 claims to have
>   9893 free bits which is more than 9886 bits indicated by the bitmap.
>   Drop its free bit count down to the total? y
>   [CHAIN_BITS] Chain 166 in allocator inode 11 has 1264713 bits
>   marked free out of 1516032 total bits but the block groups in the
>   chain have 1264706 free out of 1516032 total. Fix this by updating
>   the chain record? y
>   [CHAIN_GROUP_BITS] Allocator inode 11 has 79407510 bits marked used
>   out of 365955414 total bits but the chains have 79407911 used out of
>   365955414 total. Fix this by updating the inode counts? y
>   [INODE_COUNT] Inode 69085510 has a link count of 0 on disk but
>   directory entry references come to 1. Update the count on disk to
>   match? y
>
> As time passed, the frequency of these issues started to increase, and
> the last time it happened, I decided to run fsck twice in a row, and was
> surprised to see it showed the same messages in both runs. It seems it
> was unable to fix the problem.
>
> I identified the files corresponding to the inodes using debugfs.ocfs2
> and copied them to a new place, and then moved the copy over the
> original file, in order to recreate the inodes. Whenever I did that for
> one inode, the error above happened and the filesystem became read-only,
> so I had to umount/mount the volume again in order to be able to write
> to it again.
>
> After doing this, I ran fsck.ocfs2 -fy again twice, and no errors were
> reported. Since then I haven't seen this problem again.
>
> I'm running kernel 2.6.35 and ocfs2-tools 1.6.4.
>
> Has anyone else seen an issue like that?
>
> Thanks
> Andre
>
>
> _______________________________________________
> Ocfs2-users mailing list
> Ocfs2-users@oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-users
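
For readers who want to see the workaround from the original report in concrete form (copy each affected file, then move the copy over the original so that the name ends up on a newly allocated inode), here is a minimal Python sketch. It is only an illustration of the copy-then-rename step, not a tool from ocfs2-tools and not a recommended repair procedure. The function name `recreate_inode` and the `.recreate` suffix are made up for this example, and the sketch assumes the copy lives in the same directory as the original.

```python
import os
import shutil
import tempfile


def recreate_inode(path):
    """Copy `path` to a temporary file in the same directory, then rename
    the copy over the original so the name refers to a new inode.

    Hypothetical helper for illustration only. Returns the new inode number.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the original so that the final
    # rename stays within a single filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".recreate")
    os.close(fd)
    try:
        # copy2 copies the data plus mode bits and timestamps;
        # ownership is NOT preserved by this call.
        shutil.copy2(path, tmp_path)
        # Rename the copy over the original name.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the partial copy if anything went wrong.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return os.stat(path).st_ino
```

Note that in the report above, doing this on the damaged volume triggered the same ocfs2_validate_gd_parent error each time and the filesystem went read-only, so a umount/mount was needed after every file. If the volume is still in the bad state, capture the o2image before attempting anything like this.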