[CentOS] Ext3 and drbd read-only remount problem.

2012-05-06 Thread Rafał Radecki
Hi all.

I have two hosts with drbd:
kmod-drbd83-8.3.8-1.el5.centos
drbd83-8.3.8-1.el5.centos
and kernel (CentOS 5.7):
2.6.18-308.4.1.el5

After a recent kernel upgrade, my ext3 filesystem on /dev/drbd0 has twice
been remounted read-only. I've checked the disks with smartctl -t long and
they are OK. There are no messages about disk problems in /var/log/messages
or dmesg. I ran fsck tonight, but 3 hours after it finished the problem
occurred again (under heavy load).

/var/log/messages:

May  6 06:22:27 srv1a kernel: EXT3-fs error (device drbd0):
htree_dirblock_to_tree: bad entry in directory #43024813: rec_len
% 4 != 0 - offset=73728, inode=1701012818, rec_len=30313, name_len=101
May  6 06:22:27 srv1a kernel: Aborting journal on device drbd0.
May  6 06:22:28 srv1a kernel: journal commit I/O error
May  6 06:22:28 srv1a kernel: ext3_abort called.
May  6 06:22:28 srv1a kernel: journal commit I/O error
May  6 06:22:28 srv1a kernel: EXT3-fs error (device drbd0):
ext3_journal_start_sb: Detected aborted journal
May  6 06:22:28 srv1a kernel: ext3_abort called.
May  6 06:22:28 srv1a kernel: EXT3-fs error (device drbd0):
ext3_journal_start_sb: Detected aborted journal
May  6 06:22:28 srv1a kernel: Remounting filesystem read-only
May  6 06:22:28 srv1a kernel: __journal_remove_journal_head: freeing
b_committed_data
May  6 06:22:28 srv1a kernel: __journal_remove_journal_head: freeing
b_committed_data
May  6 06:22:28 srv1a kernel: __journal_remove_journal_head: freeing
b_committed_data
May  6 06:22:28 srv1a kernel: journal commit I/O error
May  6 06:22:28 srv1a kernel: EXT3-fs error (device drbd0):
htree_dirblock_to_tree: bad entry in directory #43024813: rec_len
% 4 != 0 - offset=106496, inode=1701012818, rec_len=30313, name_len=101
May  6 06:22:28 srv1a kernel: EXT3-fs error (device drbd0):
htree_dirblock_to_tree: bad entry in directory #43024813: rec_len
% 4 != 0 - offset=204800, inode=1869116005, rec_len=29811, name_len=46
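
A note on the numbers in these messages: an ext3 directory entry on disk
starts with a 32-bit little-endian inode number, a 16-bit little-endian
rec_len and an 8-bit name_len. The sketch below (Python; the function name
is made up for illustration) packs the reported values back into those 7
bytes and shows them as ASCII. If readable text comes out, that would
suggest the directory block holds file data rather than directory entries.
By itself it says nothing about where the wrong data came from.

```python
import re
import struct

# Matches the tail of the "bad entry in directory" kernel message.
FIELDS = re.compile(r"inode=(\d+), rec_len=(\d+), name_len=(\d+)")


def dirent_header_as_text(message):
    """Re-pack inode/rec_len/name_len as on-disk bytes and render as ASCII.

    Hypothetical helper, not part of any tool. Non-printable bytes are
    shown as '?'.
    """
    m = FIELDS.search(message)
    if m is None:
        raise ValueError("no inode/rec_len/name_len fields found")
    inode, rec_len, name_len = (int(g) for g in m.groups())
    # '<' = little-endian, no padding: u32 inode, u16 rec_len, u8 name_len
    raw = struct.pack("<IHB", inode, rec_len, name_len)
    return "".join(chr(b) if 32 <= b < 127 else "?" for b in raw)


if __name__ == "__main__":
    lines = [
        "offset=73728, inode=1701012818, rec_len=30313, name_len=101",
        "offset=204800, inode=1869116005, rec_len=29811, name_len=46",
    ]
    for line in lines:
        print(dirent_header_as_text(line))
    # prints:
    # Receive
    # erhost.
```

For the values logged above this yields "Receive" and "erhost.", which
looks like text from a file rather than a directory entry header.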

I've found:

https://bugzilla.redhat.com/show_bug.cgi?id=494927

There are some hints there that it may be a kernel problem, so I went back to:
2.6.18-274.7.1.el5

At the moment things are OK, but I've read that the problem occurs at
random.

Any clues what to do?

Best regards,
Rafal.
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] Ext3 and drbd read-only remount problem.

2012-05-06 Thread Rafał Radecki
I have one more question regarding the kernel update to
2.6.18-308.4.1.el5 mentioned above. The extras repo offers this package:

kmod-drbd83
8.3.12
This package provides the drbd83 kernel modules built for the Linux
kernel 2.6.18-274.17.1.el5 for the i686 family of processors.

We currently have this version of kmod-drbd83 installed:

8.3.8
This package provides the drbd83 kernel modules built for the Linux
kernel 2.6.18-194.el5 for the i686 family of processors.

Should the kmod-drbd83 package match the running kernel version (the one
named in the package description), or should kmod-drbd83 stay at version
8.3.8 because we are using drbd83-8.3.8-1.el5.centos?

Best regards,
Rafal.
