Hi,
I'm experiencing the following problem while using OCFS2 on top of a DRBD
partition.
My config is the following:
Two servers with the pacemaker+corosync stack configured
Debian Lenny/Squeeze mixed:
kernel - linux-image-2.6.32-bpo.5-amd64 (2.6.32-26~bpo50+1)
kernel modules - drbd = 8.3.7 (api:88/proto:86-91) ocfs2 = 1.5.0
packages:
pacemaker = 1.0.9.1
corosync = 1.2.1-2
dlm-pcmk = 3.0.12-2
ocfs2-tools-pacemaker (contains the ocfs2_controld.pcmk binary) = 1.4.4-3
ocfs2-tools = 1.4.4-3
Kernel trace follows here:
[ 3128.804789] block drbd0: Handshake successful: Agreed network protocol version 91
[ 3128.805094] block drbd0: Peer authenticated using 20 bytes of 'sha1' HMAC
[ 3128.805176] block drbd0: conn( WFConnection -> WFReportParams )
[ 3128.805274] block drbd0: Starting asender thread (from drbd0_receiver [4776])
[ 3128.805533] block drbd0: data-integrity-alg: <not-used>
[ 3128.805626] block drbd0: drbd_sync_handshake:
[ 3128.805695] block drbd0: self B4F22E41814A97AB:ADC1DEC415E06ACD:0E1A98B5C70EAE0E:578A64518662F9CF bits:202 flags:0
[ 3128.805788] block drbd0: peer ADC1DEC415E06ACC:0000000000000000:0E1A98B5C70EAE0E:578A64518662F9CF bits:0 flags:0
[ 3128.805880] block drbd0: uuid_compare()=1 by rule 70
[ 3128.805953] block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS )
[ 3129.365716] block drbd0: conn( WFBitMapS -> SyncSource ) pdsk( Outdated -> Inconsistent )
[ 3129.365816] block drbd0: Began resync as SyncSource (will sync 808 KB [202 bits set]).
[ 3129.441670] block drbd0: Resync done (total 1 sec; paused 0 sec; 808 K/sec)
[ 3129.441746] block drbd0: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
[ 3154.019560] block drbd0: peer( Secondary -> Primary )
[ 3156.462341] dlm: got connection from 1191233546
[ 3162.458368] (5378,4):ocfs2_truncate_file:465 ERROR: bug expression: le64_to_cpu(fe->i_size) != i_size_read(inode)
[ 3162.458466] (5378,4):ocfs2_truncate_file:465 ERROR: Inode 1714687, inode i_size = 556 != di i_size = 604, i_flags = 0x1
[ 3162.458586] ------------[ cut here ]------------
[ 3162.458654] kernel BUG at /tmp/buildd/linux-2.6-2.6.32/debian/build/source_amd64_none/fs/ocfs2/file.c:465!
[ 3162.458745] invalid opcode: 0000 [#1] SMP
[ 3162.458901] last sysfs file: /sys/kernel/dlm/D9348641B1E04D0E907EFF8D978F348A/control
[ 3162.458988] CPU 4
[ 3162.459095] Modules linked in: ocfs2 jbd2 ocfs2_nodemanager quota_tree ocfs2_stack_user ocfs2_stackglue sha1_generic hmac drbd lru_cache cn dlm configfs ip_vs_rr ip_vs sctp crc32c libcrc32c nfsd exportfs nfs lockd fscache nfs_acl auth_rpcgss sunrpc ipip tunnel4 8021q garp stp xt_MARK iptable_mangle xt_tcpudp iptable_filter ip_tables x_tables coretemp w83627hf w83793 hwmon_vid loop snd_pcsp snd_pcm_oss snd_mixer_oss snd_pcm radeon ttm drm_kms_helper snd_timer drm snd i5k_amb soundcore i2c_algo_bit container i5000_edac rng_core snd_page_alloc edac_core evdev button processor ioatdma dca shpchp pci_hotplug i2c_i801 i2c_core ext3 jbd mbcache dm_mod ses enclosure sd_mod crc_t10dif sg sr_mod cdrom ata_piix ata_generic libata aacraid ehci_hcd uhci_hcd scsi_mod thermal thermal_sys usbcore e1000e nls_base [last unloaded: scsi_wait_scan]
[ 3162.462354] Pid: 5378, comm: apache2 Not tainted 2.6.32-bpo.5-amd64 #1 X7DBU
[ 3162.462354] RIP: 0010:[<ffffffffa05e006f>] [<ffffffffa05e006f>] ocfs2_setattr+0x631/0x172a [ocfs2]
[ 3162.462354] RSP: 0018:ffff8801fa71bc28 EFLAGS: 00010292
[ 3162.462354] RAX: 0000000000000081 RBX: ffff8801d5afb000 RCX: 0000000000001977
[ 3162.462354] RDX: 0000000000000000 RSI: 0000000000000092 RDI: 0000000000000246
[ 3162.462354] RBP: 0000000000000000 R08: 000000000000f71d R09: 000000000000000a
[ 3162.462354] R10: 0000000000000000 R11: ffffffff811b7371 R12: 0000000000000000
[ 3162.462354] R13: ffff8801f8fc5ec8 R14: ffff8801f8fc5ec8 R15: ffff8801f8f752a0
[ 3162.462354] FS: 00007fb03993b710(0000) GS:ffff880008d00000(0000) knlGS:0000000000000000
[ 3162.462354] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3162.462354] CR2: 00000000010ccbc8 CR3: 00000001fd16e000 CR4: 00000000000006e0
[ 3162.462354] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3162.462354] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 3162.462354] Process apache2 (pid: 5378, threadinfo ffff8801fa71a000, task ffff8801f9d0b880)
[ 3162.462354] Stack:
[ 3162.462354] 000000000000022c 000000000000025c 0000000000000001 ffff880227649000
[ 3162.462354] <0> ffff8801f8fc5b60 ffff8801fa71bd68 ffff880227649000 0000000100000292
[ 3162.462354] <0> ffff8801fa4bc800 ffff8801f8fc5b78 000000004cee87ac ffff880227649000
[ 3162.462354] Call Trace:
[ 3162.462354] [<ffffffff81051f59>] ? current_fs_time+0x1e/0x24
[ 3162.462354] [<ffffffff81100bbb>] ? notify_change+0x180/0x2c5
[ 3162.462354] [<ffffffff810ed880>] ? do_truncate+0x63/0x7e
[ 3162.462354] [<ffffffff810f5a18>] ? get_write_access+0x18/0x4b
[ 3162.462354] [<ffffffff810f7c17>] ? may_open+0x191/0x1c8
[ 3162.462354] [<ffffffff810f84fa>] ? do_filp_open+0x4bf/0x94b
[ 3162.462354] [<ffffffff810f1833>] ? cp_new_stat+0xe9/0xfc
[ 3162.462354] [<ffffffff810ecb5f>] ? do_sys_open+0x55/0xfc
[ 3162.462354] [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b
[ 3162.462354] Code: 89 fb 62 a0 65 8b 14 25 a8 e3 00 00 89 44 24 10 48 8b 43 20 48 63 d2 48 89 44 24 08 49 8b 46 68 48 89 04 24 31 c0 e8 0e 92 d1 e0 <0f> 0b eb fe 49 39 cc 48 8b 05 c3 7b f8 ff 0f 86 b1 00 00 00 a9
[ 3162.462354] RIP [<ffffffffa05e006f>] ocfs2_setattr+0x631/0x172a [ocfs2]
[ 3162.462354] RSP <ffff8801fa71bc28>
[ 3162.469653] ---[ end trace 3a74db6ea3c5066f ]---
I don't know exactly how to reproduce this bug. The kernel doesn't stall
after hitting it, but it is rather annoying and I am worried about
file system consistency.
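In case it helps with checking how many inodes are affected, here is a
minimal sketch in Python that pulls the inode number and the two sizes out
of messages like the ocfs2_truncate_file error above. The function name
find_size_mismatches and the regular expression are only illustrative
assumptions based on this one log line; this is not part of ocfs2-tools
and other kernel versions may word the message differently.

```python
import re

# Pattern modelled on the single ocfs2_truncate_file message in the trace
# above; this is an assumption, other kernels may format it differently.
MISMATCH_RE = re.compile(
    r"Inode (?P<inode>\d+),\s+inode i_size = (?P<inode_size>\d+)"
    r"\s+!=\s+di i_size = (?P<disk_size>\d+)"
)


def find_size_mismatches(log_text):
    """Return (inode, in-memory i_size, on-disk i_size) tuples from log_text."""
    # Mail clients hard-wrap long log lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    return [
        (int(m.group("inode")), int(m.group("inode_size")), int(m.group("disk_size")))
        for m in MISMATCH_RE.finditer(flat)
    ]


# Sample taken verbatim from the trace, including the mail line wrap.
sample = """[ 3162.458466] (5378,4):ocfs2_truncate_file:465 ERROR: Inode 1714687,
inode i_size = 556 != di i_size = 604, i_flags = 0x1"""

mismatches = find_size_mismatches(sample)
for inode, inode_size, disk_size in mismatches:
    print(f"inode {inode}: in-memory i_size={inode_size}, on-disk i_size={disk_size}")
```

The same function could be fed the output of dmesg or a saved kernel log to
list every inode that hit this check.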
Any help would be appreciated.
--
Yours Faithfully
Vladimir Kuklin
Network Services Specialist
JSC "SMM"
51/4 build. 1, Shepkina str.
Moscow, 129110
Russia
phone +74952296363 ext. 1514
fax +74952296365
cell +79197848963
e-mail v.kuk...@smm.ru
site http://smm.ru