Hi.

Setup:
2x Dell 2950 with Debian 5.0 x64
Kernel: 2.6.32-bpo.5-amd64 from backports
Software: DRBD + OCFS2

Two nodes are under test, both running DRBD + OCFS2. Everything was fine, but last night both of them rebooted: one hung hard, the other booted normally.

There was no workload on them at all; they were completely idle.

Where should I look for the problem?
Here is what I managed to find in the logs:


In the SSH console:

Message from sysl...@mail01 at Aug 31 02:06:37 ...
 kernel:[43263.442871] ------------[ cut here ]------------

Message from sysl...@mail01 at Aug 31 02:06:37 ...
 kernel:[43263.442946] invalid opcode: 0000 [#1] SMP

Message from sysl...@mail01 at Aug 31 02:06:37 ...
 kernel:[43263.442973] last sysfs file: /sys/fs/o2cb/interface_revision

Message from sysl...@mail01 at Aug 31 02:06:37 ...
 kernel:[43263.443831] Stack:

Message from sysl...@mail01 at Aug 31 02:06:37 ...
 kernel:[43263.444002] Call Trace:

Message from sysl...@mail01 at Aug 31 02:06:37 ...
kernel:[43263.444244] Code: 83 c3 08 48 83 3b 00 eb ec 48 83 fd 10 0f 86 89 00 00 00 48 89 ef e8 b9 e8 ff ff 48 89 c7 48 8b 00 84 c0 78 13 66 a9 00 c0 75 04 <0f> 0b eb fe 5b 5d 41 5c e9 54 59 fd ff 48 8b 4c 24 18 4c 8b 4f



In /var/log/messages:
Aug 30 14:26:01 mail01 kernel: [ 1227.315451] ocfs2_dlm: Node 1 joins domain 9A96A6832198449A9C8329D2E0C4ED7B
Aug 30 14:26:01 mail01 kernel: [ 1227.315527] ocfs2_dlm: Nodes in domain ("9A96A6832198449A9C8329D2E0C4ED7B"): 0 1

*** THE PROBLEM STARTS HERE ***

Aug 31 02:06:37 mail01 kernel: [43263.442999] CPU 1
Aug 31 02:06:37 mail01 kernel: [43263.443021] Modules linked in: drbd ocfs2 jbd2 quota_tree sha1_generic hmac lru_cache cn xt_multiport ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables ext2 loop i5k_amb snd_pcm snd_timer dcdbas snd soundcore evdev i5000_edac serio_raw snd_page_alloc psmouse edac_core pcspkr rng_core processor button shpchp pci_hotplug ext3 jbd mbcache sg sr_mod cdrom sd_mod ses crc_t10dif enclosure ata_generic ata_piix ehci_hcd uhci_hcd megaraid_sas libata scsi_mod usbcore nls_base bnx2 thermal fan thermal_sys [last unloaded: drbd]
Aug 31 02:06:37 mail01 kernel: [43263.443362] Pid: 2011, comm: slapd Not tainted 2.6.32-bpo.5-amd64 #1 PowerEdge 2950
Aug 31 02:06:37 mail01 kernel: [43263.443406] RIP: 0010:[<ffffffff810e55eb>] [<ffffffff810e55eb>] kfree+0x55/0xcb
Aug 31 02:06:37 mail01 kernel: [43263.443456] RSP: 0018:ffff88012ca85db8 EFLAGS: 00010046
Aug 31 02:06:37 mail01 kernel: [43263.443482] RAX: 0200000000080000 RBX: 0000000000000000 RCX: 0000000068a4cfe9
Aug 31 02:06:37 mail01 kernel: [43263.443511] RDX: ffff88012fc13000 RSI: 0000000000000010 RDI: ffffea0003800000
Aug 31 02:06:37 mail01 kernel: [43263.443540] RBP: ffff880100000001 R08: 0000000072b1d310 R09: 00000000d002ea4e
Aug 31 02:06:37 mail01 kernel: [43263.443569] R10: 000000008a6becc7 R11: 0000000072d84f77 R12: ffffffff812673ba
Aug 31 02:06:37 mail01 kernel: [43263.443598] R13: 0000000000000001 R14: 0000000000000010 R15: 0000000000000000
Aug 31 02:06:37 mail01 kernel: [43263.443628] FS: 000000004194a950(0063) GS:ffff880005440000(0000) knlGS:0000000000000000
Aug 31 02:06:37 mail01 kernel: [43263.443672] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Aug 31 02:06:37 mail01 kernel: [43263.443699] CR2: 00007f2b6dffd000 CR3: 000000012dfaa000 CR4: 00000000000006e0
Aug 31 02:06:37 mail01 kernel: [43263.443728] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Aug 31 02:06:37 mail01 kernel: [43263.443757] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Aug 31 02:06:37 mail01 kernel: [43263.443786] Process slapd (pid: 2011, threadinfo ffff88012ca84000, task ffff88012e421530)
Aug 31 02:06:37 mail01 kernel: [43263.443850] 0000000000000000 ffff88012fc13000 0000000000000002 ffffffff812673ba
Aug 31 02:06:37 mail01 kernel: [43263.443884] <0> ffff880100000001 0000000000000000 ffff88012fc13000 ffff88010efb5800
Aug 31 02:06:37 mail01 kernel: [43263.443935] <0> 0000000000000001 ffff880100000001 00000000000007d5 ffffffff81267dc4
Aug 31 02:06:37 mail01 kernel: [43263.444026] [<ffffffff812673ba>] ? nl_pid_hash_rehash+0xca/0xf1
Aug 31 02:06:37 mail01 kernel: [43263.444053] [<ffffffff81267dc4>] ? netlink_insert+0xbc/0x123
Aug 31 02:06:37 mail01 kernel: [43263.444081] [<ffffffff81267eca>] ? netlink_autobind+0x9f/0xbc
Aug 31 02:06:37 mail01 kernel: [43263.444108] [<ffffffff81268445>] ? netlink_bind+0x82/0x179
Aug 31 02:06:37 mail01 kernel: [43263.444136] [<ffffffff8123f2a9>] ? sys_bind+0x7a/0xb9
Aug 31 02:06:37 mail01 kernel: [43263.444162] [<ffffffff810eb2f3>] ? fd_install+0x2e/0x5a
Aug 31 02:06:37 mail01 kernel: [43263.444188] [<ffffffff8123e2a8>] ? sock_map_fd+0x57/0x64
Aug 31 02:06:37 mail01 kernel: [43263.444217] [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b
Aug 31 02:06:37 mail01 kernel: [43263.444460]  RSP <ffff88012ca85db8>
Aug 31 02:06:37 mail01 kernel: [43263.444821] ---[ end trace 381ebef00a1cbadb ]---

*** I REBOOTED THE SERVER ***

Aug 31 08:47:07 mail01 kernel: imklog 3.18.6, log source = /proc/kmsg started.
Aug 31 08:47:07 mail01 rsyslogd: [origin software="rsyslogd" swVersion="3.18.6" x-pid="1987" x-info="http://www.rsyslog.com"] restart
Aug 31 08:47:07 mail01 kernel: [ 0.000000] Initializing cgroup subsys cpuset
Aug 31 08:47:07 mail01 kernel: [    0.000000] Initializing cgroup subsys cpu



--
Best regards,
Proskurin Kirill



