strange queue_if_no_path behavior

2007-06-19 Thread Maxim Kozover
Hi!
Reposting a shorter version.
I have a question regarding queue_if_no_path behavior.
I tried the Red Hat 5.0 kernel (2.6.18-8.el5) with fairly recent multipath-tools.
I set no_path_retry queue in multipath.conf and then tried losing all paths
to a SAN device while dd-ing from /dev/zero to /dev/mapper/...

What's strange is that not only I/O to that device got blocked, but also
I/O to /tmp, /var/log/messages, etc., which reside on a local drive.
When I restore some of the paths to the SAN device, all I/O resumes, both
to that device and to the files that were unexpectedly blocked.

Please tell me whether this is expected behavior and, if not, how we
could find the source of the problem and fix it.

# ps aux | grep D
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      2872  0.0  0.0  10064   748 ?        Ds   21:58   0:00 syslogd -m 0
root      3800 24.9  0.0  63300  1592 ttyS0    D    22:01   0:22 dd if=/dev/zero of=/dev/mapper/...
root      3990  0.0  0.0  58020   476 ttyS0    D    22:02   0:00 tail -f /var/log/messages
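Note that "grep D" keeps any line containing a capital D anywhere (the COMMAND header, for instance, or the grep process itself). A slightly stricter filter looks only at the STAT column, where a leading D means uninterruptible sleep. The Python sketch below assumes the standard "ps aux" column order; the function name d_state_rows is hypothetical, and the sample holds the three rows from this report plus one row marked as hypothetical for contrast.

```python
# Sketch: keep only processes whose STAT column starts with "D"
# (uninterruptible sleep), instead of grepping the whole line.


def d_state_rows(ps_aux_output: str) -> list[tuple[int, str, str]]:
    """Return (pid, stat, command) for every D-state row in `ps aux` text.

    Assumes the standard columns:
    USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    """
    rows = []
    for line in ps_aux_output.splitlines():
        # maxsplit=10 keeps the spaces inside COMMAND (e.g. "syslogd -m 0")
        fields = line.split(None, 10)
        if len(fields) < 11 or fields[0] == "USER":
            continue  # skip the header and blank or truncated lines
        pid, stat, command = int(fields[1]), fields[7], fields[10]
        if stat.startswith("D"):
            rows.append((pid, stat, command))
    return rows


SAMPLE = """\
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0  10348   680 ?        Ss   21:50   0:01 init [3]
root      2872  0.0  0.0  10064   748 ?        Ds   21:58   0:00 syslogd -m 0
root      3800 24.9  0.0  63300  1592 ttyS0    D    22:01   0:22 dd if=/dev/zero of=/dev/mapper/...
root      3990  0.0  0.0  58020   476 ttyS0    D    22:02   0:00 tail -f /var/log/messages
"""
# (the "init [3]" row above is hypothetical, included only to show filtering)

for pid, stat, command in d_state_rows(SAMPLE):
    print(pid, stat, command)
```

On a live system the input would be the text output of ps aux, for example subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout.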

Thanks much,

Maxim.

I can't include the full sysrq output here, as the message then doesn't
reach the mailing list.
Sometimes (perhaps depending on whether root is on LVM) it reports:
BUG: soft lockup detected on CPU#3!
BUG: soft lockup detected on CPU#0!
BUG: soft lockup detected on CPU#1!
BUG: soft lockup detected on CPU#2!

But for all processes in D state I always see:
 [800613c7] schedule_timeout+0x8a/0xad
 [80092de7] process_timeout+0x0/0x5
 [80060d55] io_schedule_timeout+0x4b/0x79
 [8003a9c4] blk_congestion_wait+0x66/0x80

syslogd   D 810075f779c8 0 15395  1 15398 15380 (NOTLB)
 810075f779c8 8100022c7750 810002667068 0009
 81007fbe5080 810037d1b100 006b91ee3bd1 14f4
 81007fbe5268 0003 810037d1b100 
Call Trace:
 [800613c7] schedule_timeout+0x8a/0xad
 [80092de7] process_timeout+0x0/0x5
 [80060d55] io_schedule_timeout+0x4b/0x79
 [8003a9c4] blk_congestion_wait+0x66/0x80
 [8009b666] autoremove_wake_function+0x0/0x2e
 [8004ece5] writeback_inodes+0xa8/0xd8
 [800bc61b] balance_dirty_pages_ratelimited_nr+0x183/0x1fa
 [8000fc69] generic_file_buffered_write+0x5a4/0x6d8
 [80030dd5] skb_copy_datagram_iovec+0x4f/0x237
 [8000dd98] current_fs_time+0x3b/0x40
 [80251b9e] unix_dgram_recvmsg+0x240/0x25e
 [80015d10] __generic_file_aio_write_nolock+0x36d/0x3b8
 [800b9744] __generic_file_write_nolock+0x8f/0xa8
 [800d8408] core_sys_select+0x1f9/0x265
 [8009b666] autoremove_wake_function+0x0/0x2e
 [80061622] mutex_lock+0xd/0x1d
 [800b97a5] generic_file_writev+0x48/0xa2
 [8001770b] do_sync_write+0x0/0x104
 [800d0f6c] do_readv_writev+0x176/0x295
 [8001770b] do_sync_write+0x0/0x104
 [800b1cca] audit_syscall_entry+0x14d/0x180
 [800d1115] sys_writev+0x45/0x93
 [8005b2c1] tracesys+0xd1/0xdc
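Read bottom-up, the syslogd trace shows a write to a local file entering balance_dirty_pages_ratelimited_nr(), calling writeback_inodes(), and then sleeping in blk_congestion_wait(). One possible reading, offered as an assumption rather than a confirmed diagnosis: in this kernel the dirty-page limit is system-wide, so pages stuck behind the pathless multipath device keep the system over that limit and every writer gets throttled, including writers to the local disk. The toy model below illustrates that effect only; the class name, the device names "dm-0" and "sda", the page counts and the 40% ratio are all hypothetical, and it is not the kernel's actual algorithm.

```python
# Toy model (an assumption, not kernel code) of global dirty-page throttling:
# one system-wide limit, checked by every writer regardless of target device.
from dataclasses import dataclass, field


@dataclass
class DirtyModel:
    total_pages: int
    dirty_ratio: int  # percent; a stand-in for a vm.dirty_ratio-style limit
    # device name -> pages dirty or under writeback (names are hypothetical)
    dirty: dict[str, int] = field(default_factory=dict)
    # devices whose writeback cannot complete (all paths lost, I/O queued)
    stalled: set[str] = field(default_factory=set)

    def limit(self) -> int:
        return self.total_pages * self.dirty_ratio // 100

    def total_dirty(self) -> int:
        return sum(self.dirty.values())

    def writeback(self) -> None:
        # Healthy devices flush everything; stalled devices flush nothing.
        for dev in self.dirty:
            if dev not in self.stalled:
                self.dirty[dev] = 0

    def write(self, dev: str, pages: int) -> bool:
        """Dirty `pages` on `dev`; return False if the writer would sleep."""
        self.dirty[dev] = self.dirty.get(dev, 0) + pages
        if self.total_dirty() > self.limit():
            self.writeback()  # the writer first tries to flush pages itself
            if self.total_dirty() > self.limit():
                return False  # still over the global limit: writer waits
        return True


m = DirtyModel(total_pages=1000, dirty_ratio=40)
m.stalled.add("dm-0")        # all paths to the multipath device are lost
print(m.write("dm-0", 500))  # dd to the multipath device: False
print(m.write("sda", 1))     # syslogd writing to the local disk: False
m.stalled.discard("dm-0")    # a path comes back, queued I/O can complete
print(m.write("sda", 1))     # True
```

The second call returns False even though "sda" itself is healthy, which mirrors the report's observation that local I/O blocks until some SAN paths return.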

klogd S 8100757e5be8 0 15398  1 15410 15395 (NOTLB)
 8100757e5be8 81007fbe5080 80086480 000a
 810037fe37a0 81007c30a7e0 00690912dbde 0003fc7a
 810037fe3988  80044d16 fffe
Call Trace:
 [80086480] enqueue_task+0x41/0x56
 [80044d16] try_to_wake_up+0x407/0x418
 [8005a534] cache_alloc_refill+0x106/0x186
 [8006135b] schedule_timeout+0x1e/0xad
 [80045be5] prepare_to_wait_exclusive+0x38/0x61
 [80250e8f] unix_wait_for_peer+0x90/0xac
 [8009b666] autoremove_wake_function+0x0/0x2e
 [80251422] unix_dgram_sendmsg+0x3de/0x4cf
 [80037264] do_sock_write+0xc4/0xce
 [8004543e] sock_aio_write+0x4f/0x5e
 [80060ab8] thread_return+0x0/0xea
 [800177d2] do_sync_write+0xc7/0x104
 [8009b666] autoremove_wake_function+0x0/0x2e
 [8009b666] autoremove_wake_function+0x0/0x2e
 [80016134] vfs_write+0xe1/0x174
 [800169b2] sys_write+0x45/0x6e
 [8005b2c1] tracesys+0xd1/0xdc

irqbalance S 810074c05eb8 0 15410  1 15432 15398 (NOTLB)
 810074c05eb8 810074c05e58 810074c05e58 0007
 81007d6bb7a0 802d1ae0 006b8b7abaa3 0007f864
 81007d6bb988 8100 810002c384e0
Call Trace:
 [80061804] do_nanosleep+0x3f/0x70
 [800587ce] hrtimer_nanosleep+0x58/0x118
 [8009d5e0] hrtimer_wakeup+0x0/0x22
 [800526e5] sys_nanosleep+0x4c/0x62
 [8005b2c1] tracesys+0xd1/0xdc

multipathd S 810074031d48 0 15432  1 15436 15410 (NOTLB)
 810074031d48 810075f5c140 fff0 0001
 81007dbd2080 810037fe9080 00290837b6b9 00091d38
 81007dbd2268 8100 0044 8100fc10
Call Trace:
 [] __rmqueue+0x4c/0xe1
 [] find_extend_vma+0x16/0x59
 [] schedule_timeout+0x1e/0xad
 [] add_wait_queue+0x24/0x34
 [] do_futex+0x1da/0xbc7
 [] enqueue_task+0x41/0x56
 [] default_wake_function+0x0/0xe
 [] wake_up_new_task+0x231/0x240
 [] sys_futex+0x101/0x123
 [] tracesys+0xd1/0xdc

multipathd S 810074063b68 0 15436  1 15456 15432 (NOTLB)
 810074063b68 81007fbe5080 80086480 000a
 81007fb2a7a0 802d1ae0 0069091b6af3 de22
 81007fb2a988  80044d16 
Call Trace:
 [] enqueue_task+0x41/0x56
 [] try_to_wake_up+0x407/0x418
 [] schedule_timeout+0x1e/0xad
 [] prepare_to_wait_exclusive+0x38/0x61
 [] unix_wait_for_peer+0x90/0xac
 [] autoremove_wake_function+0x0/0x2e
 [] 

2.6.12.3/2.6.13-rc3 BUG REPORT - x86_64 with hyperthreading

2005-07-18 Thread Maxim Kozover
Hi!
I tried 2.6.12.3 and 2.6.13-rc3, compiled for x86_64, on a Supermicro dual-Xeon
machine with hyperthreading enabled, and the kernel gets stuck while trying to
initialize the second CPU.
CPU: Trace cache: 12K uops, L1 D cache: 16K
CPU: L2 cache: 1024K
using mwait in idle threads.
CPU: Physical Processor ID: 0
CPU0: Thermal monitoring enabled (TM1)
Using local APIC timer interrupts.
Detected 12.501 MHz APIC timer.
Booting processor 1/6 rip 6000 rsp 81007ff35f58
Initializing CPU#1
CPU: Trace cache: 12K uops, L1 D cache: 16K
CPU: L2 cache: 1024K
CPU: Physical Processor ID: 3
CPU1: Thermal monitoring enabled (TM1)
  Intel(R) Xeon(TM) CPU 2.80GHz stepping 01
CPU 1: Syncing TSC to CPU 0.
Booting processor 2/1 rip 6000 rsp 8100032dff58
Initializing CPU#2

Booting with hyperthreading disabled is OK.
Booting with hyperthreading enabled and maxcpus=1 is also OK.
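The "Booting processor N/M" lines can be decoded to see which logical CPU the boot stopped on. The Python sketch below assumes, without confirmation from the report, that the two numbers are the logical CPU number and the APIC ID, and that each package carries two hyperthreads, so the package ID is the APIC ID shifted right by one. Under those assumptions CPU#1 (APIC ID 6) decodes to package 3, which agrees with its "Physical Processor ID: 3" line, and the hang happens at CPU#2, APIC ID 1, on package 0. The function name boot_attempts is hypothetical.

```python
import re

# Assumption: the x86_64 SMP boot code prints
# "Booting processor <logical cpu>/<APIC id> rip <addr> rsp <addr>".
BOOT_LINE = re.compile(r"Booting processor (\d+)/(\d+) rip")


def boot_attempts(log: str, siblings: int = 2) -> list[tuple[int, int, int]]:
    """Return (logical_cpu, apic_id, package) per 'Booting processor' line.

    Assumes a power-of-two number of hyperthread `siblings` per package,
    so the package ID is the APIC ID shifted right by log2(siblings).
    """
    shift = siblings.bit_length() - 1
    result = []
    for match in BOOT_LINE.finditer(log):
        cpu, apic_id = int(match.group(1)), int(match.group(2))
        result.append((cpu, apic_id, apic_id >> shift))
    return result


# Excerpt of the boot log from this report.
LOG = """\
Booting processor 1/6 rip 6000 rsp 81007ff35f58
Initializing CPU#1
CPU: Physical Processor ID: 3
Booting processor 2/1 rip 6000 rsp 8100032dff58
Initializing CPU#2
"""

for cpu, apic_id, package in boot_attempts(LOG):
    print(f"CPU#{cpu}: APIC ID {apic_id}, package {package}")
```

If the assumptions hold, the stuck CPU would be the hyperthread sibling of the boot CPU (reported as Physical Processor ID 0), which fits the observation that booting with hyperthreading disabled works.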

Here are the board/BIOS details:
Supermicro X6DH8-XG2/X6DHE-XG2 BIOS Rev 1.2a

CPU = 4 - Intel(R) Xeon(TM) CPU 2.80GHz
DRAM Type : DDR2-400
Hyper Threading Technology Enabled

Please advise.

Thanks,

Maxim Kozover.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

