strange queue_if_no_path behavior
Hi! Reposting a shorter version. I have a question regarding queue_if_no_path behavior. I tried the Red Hat 5.0 2.6.18-8.el5 kernel and more or less recent multipath-tools. I set "no_path_retry queue" in multipath.conf and tried losing all paths to a SAN device while dd-ing from /dev/zero to /dev/mapper/...

What's strange is that not only did I/O to that device get blocked, but also I/O to /tmp, /var/log/messages, etc., which reside on a local drive. When I return some paths to the SAN device, all I/O resumes, both the I/O to that device and the unexpectedly blocked I/O. Please tell me whether this is expected behavior and, if not, how we could find the source of the problem and fix it.

# ps aux | grep D
USER     PID  %CPU %MEM   VSZ  RSS TTY   STAT START TIME COMMAND
root     2872  0.0  0.0 10064  748 ?     Ds   21:58 0:00 syslogd -m 0
root     3800 24.9  0.0 63300 1592 ttyS0 D    22:01 0:22 dd if=/dev/zero of=/dev/mapper/...
root     3990  0.0  0.0 58020  476 ttyS0 D    22:02 0:00 tail -f /var/log/messages

Thanks much, Maxim.

I can't include the full sysrq output here, as the message doesn't reach the mailing list. Sometimes (maybe it depends on whether root is on LVM or not) it reports:

BUG: soft lockup detected on CPU#3!
BUG: soft lockup detected on CPU#0!
BUG: soft lockup detected on CPU#1!
BUG: soft lockup detected on CPU#2!

But I always see

 [800613c7] schedule_timeout+0x8a/0xad
 [80092de7] process_timeout+0x0/0x5
 [80060d55] io_schedule_timeout+0x4b/0x79
 [8003a9c4] blk_congestion_wait+0x66/0x80

for all processes in D state.
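As an aside, "ps aux | grep D" also matches a "D" anywhere in the line, including the COMMAND column; checking the STAT field itself is more precise. A minimal illustrative sketch (not from the original report), using sample data that mirrors the listing above:

```python
# Filter a `ps aux`-style listing down to processes in uninterruptible
# sleep (STAT starting with "D"). Sample data mirrors the report above.

SAMPLE_PS = """\
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 2872 0.0 0.0 10064 748 ? Ds 21:58 0:00 syslogd -m 0
root 3800 24.9 0.0 63300 1592 ttyS0 D 22:01 0:22 dd if=/dev/zero of=/dev/mapper/...
root 3990 0.0 0.0 58020 476 ttyS0 D 22:02 0:00 tail -f /var/log/messages
root 4100 0.0 0.0 4100 500 ? Ss 22:03 0:00 sshd
"""

def d_state_processes(ps_output: str):
    """Return (pid, command) pairs for processes whose STAT begins with 'D'."""
    result = []
    for line in ps_output.splitlines()[1:]:   # skip the header row
        fields = line.split(None, 10)         # COMMAND may contain spaces
        if len(fields) == 11 and fields[7].startswith("D"):
            result.append((int(fields[1]), fields[10]))
    return result
```

On a live system the same idea is usually expressed as "ps -eo pid,stat,comm | awk '$2 ~ /^D/'".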
syslogd       D 810075f779c8     0 15395      1         15398 15380 (NOTLB)
 810075f779c8 8100022c7750 810002667068 0009 81007fbe5080 810037d1b100
 006b91ee3bd1 14f4 81007fbe5268 0003 810037d1b100
Call Trace:
 [800613c7] schedule_timeout+0x8a/0xad
 [80092de7] process_timeout+0x0/0x5
 [80060d55] io_schedule_timeout+0x4b/0x79
 [8003a9c4] blk_congestion_wait+0x66/0x80
 [8009b666] autoremove_wake_function+0x0/0x2e
 [8004ece5] writeback_inodes+0xa8/0xd8
 [800bc61b] balance_dirty_pages_ratelimited_nr+0x183/0x1fa
 [8000fc69] generic_file_buffered_write+0x5a4/0x6d8
 [80030dd5] skb_copy_datagram_iovec+0x4f/0x237
 [8000dd98] current_fs_time+0x3b/0x40
 [80251b9e] unix_dgram_recvmsg+0x240/0x25e
 [80015d10] __generic_file_aio_write_nolock+0x36d/0x3b8
 [800b9744] __generic_file_write_nolock+0x8f/0xa8
 [800d8408] core_sys_select+0x1f9/0x265
 [8009b666] autoremove_wake_function+0x0/0x2e
 [80061622] mutex_lock+0xd/0x1d
 [800b97a5] generic_file_writev+0x48/0xa2
 [8001770b] do_sync_write+0x0/0x104
 [800d0f6c] do_readv_writev+0x176/0x295
 [8001770b] do_sync_write+0x0/0x104
 [800b1cca] audit_syscall_entry+0x14d/0x180
 [800d1115] sys_writev+0x45/0x93
 [8005b2c1] tracesys+0xd1/0xdc

klogd         S 8100757e5be8     0 15398      1         15410 15395 (NOTLB)
 8100757e5be8 81007fbe5080 80086480 000a 810037fe37a0 81007c30a7e0
 00690912dbde 0003fc7a 810037fe3988 80044d16 fffe
Call Trace:
 [80086480] enqueue_task+0x41/0x56
 [80044d16] try_to_wake_up+0x407/0x418
 [8005a534] cache_alloc_refill+0x106/0x186
 [8006135b] schedule_timeout+0x1e/0xad
 [80045be5] prepare_to_wait_exclusive+0x38/0x61
 [80250e8f] unix_wait_for_peer+0x90/0xac
 [8009b666] autoremove_wake_function+0x0/0x2e
 [80251422] unix_dgram_sendmsg+0x3de/0x4cf
 [80037264] do_sock_write+0xc4/0xce
 [8004543e] sock_aio_write+0x4f/0x5e
 [80060ab8] thread_return+0x0/0xea
 [800177d2] do_sync_write+0xc7/0x104
 [8009b666] autoremove_wake_function+0x0/0x2e
 [8009b666] autoremove_wake_function+0x0/0x2e
 [80016134] vfs_write+0xe1/0x174
 [800169b2] sys_write+0x45/0x6e
 [8005b2c1] tracesys+0xd1/0xdc

irqbalance    S 810074c05eb8     0 15410      1         15432 15398 (NOTLB)
 810074c05eb8 810074c05e58 810074c05e58 0007 81007d6bb7a0 802d1ae0
 006b8b7abaa3 0007f864 81007d6bb988 8100 810002c384e0
Call Trace:
 [80061804] do_nanosleep+0x3f/0x70
 [800587ce] hrtimer_nanosleep+0x58/0x118
 [8009d5e0] hrtimer_wakeup+0x0/0x22
 [800526e5] sys_nanosleep+0x4c/0x62
 [8005b2c1] tracesys+0xd1/0xdc

multipathd    S 810074031d48     0 15432      1         15436 15410 (NOTLB)
 810074031d48 810075f5c140 fff0 0001 81007dbd2080 810037fe9080
 00290837b6b9 00091d38 81007dbd2268 8100 0044 8100fc10
Call Trace:
 [800baea2] __rmqueue+0x4c/0xe1
 [] find_extend_vma+0x16/0x59
 [] schedule_timeout+0x1e/0xad
 [] add_wait_queue+0x24/0x34
 [] do_futex+0x1da/0xbc7
 [] enqueue_task+0x41/0x56
 [] default_wake_function+0x0/0xe
 [] wake_up_new_task+0x231/0x240
 [] sys_futex+0x101/0x123
 [] tracesys+0xd1/0xdc

multipathd    S 810074063b68     0 15436      1         15456 15432 (NOTLB)
 810074063b68 81007fbe5080 80086480 000a 81007fb2a7a0 802d1ae0
 0069091b6af3 de22 81007fb2a988 80044d16
Call Trace:
 [] enqueue_task+0x41/0x56
 [] try_to_wake_up+0x407/0x418
 [] schedule_timeout+0x1e/0xad
 [] prepare_to_wait_exclusive+0x38/0x61
 [] unix_wait_for_peer+0x90/0xac
 [] autoremove_wake_function+0x0/0x2e
 []
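For reference, the queueing policy in question is set in multipath.conf roughly like this (a minimal fragment; everything else in the file is omitted). With no_path_retry set to a number instead of "queue", multipathd gives up after that many path-checker retries and fails the queued I/O, so writers see errors instead of blocking indefinitely:

```
defaults {
        # Queue I/O forever while no paths are available
        # (the behavior reported above).
        no_path_retry   queue

        # Alternative: give up after 5 checker intervals and
        # fail outstanding I/O instead of queueing forever.
        # no_path_retry   5
}
```

If a map is already wedged with every path gone, queueing can also be switched off at run time with "dmsetup message <mapname> 0 fail_if_no_path", which fails the queued I/O and lets the blocked processes leave D state; "queue_if_no_path" re-enables it.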
2.6.12.3/2.6.13-rc3 BUG REPORT - x86_64 with hyperthreading
Hi! I tried 2.6.12.3/2.6.13-rc3 compiled for x86_64 on a Supermicro dual Xeon with hyperthreading enabled, and the kernel gets stuck when trying to initialize the second CPU.

CPU: Trace cache: 12K uops, L1 D cache: 16K
CPU: L2 cache: 1024K
using mwait in idle threads.
CPU: Physical Processor ID: 0
CPU0: Thermal monitoring enabled (TM1)
Using local APIC timer interrupts.
Detected 12.501 MHz APIC timer.
Booting processor 1/6 rip 6000 rsp 81007ff35f58
Initializing CPU#1
CPU: Trace cache: 12K uops, L1 D cache: 16K
CPU: L2 cache: 1024K
CPU: Physical Processor ID: 3
CPU1: Thermal monitoring enabled (TM1)
Intel(R) Xeon(TM) CPU 2.80GHz stepping 01
CPU 1: Syncing TSC to CPU 0.
Booting processor 2/1 rip 6000 rsp 8100032dff58
Initializing CPU#2

Booting with hyperthreading disabled is OK. Booting with hyperthreading enabled and maxcpus=1 is also OK.

Here are the board/BIOS details:

Supermicro X6DH8-XG2/X6DHE-XG2
BIOS Rev 1.2a
CPU = 4 - Intel(R) Xeon(TM) CPU 2.80GHz
DRAM Type : DDR2-400
Hyper Threading Technology Enabled

Please advise. Thanks, Maxim Kozover.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
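Not part of the original report, just an illustration: since the hang appears only when hyperthread siblings are enumerated, it can help to confirm from a working boot (HT disabled, or maxcpus=1) how logical CPUs map onto physical packages via /proc/cpuinfo. A minimal sketch; the cpuinfo excerpt is hypothetical and only mirrors the physical IDs (0 and 3) printed in the boot log above:

```python
from collections import Counter

# Hypothetical /proc/cpuinfo excerpt: two packages, two siblings each
# (i.e. hyperthreading enabled), with the same physical IDs as above.
SAMPLE_CPUINFO = """\
processor : 0
physical id : 0
siblings : 2
processor : 1
physical id : 3
siblings : 2
processor : 2
physical id : 0
siblings : 2
processor : 3
physical id : 3
siblings : 2
"""

def logical_cpus_per_package(cpuinfo: str) -> Counter:
    """Count logical CPUs per 'physical id' line; more than one logical
    CPU per package suggests hyperthread siblings are being enumerated."""
    counts = Counter()
    for line in cpuinfo.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "physical id":
            counts[int(value)] += 1
    return counts
```

On a real system the same counting can be done against the contents of /proc/cpuinfo directly.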