Re: INFO: task blocked for more than 120 seconds.
Good day, Someone in our local LUG recently posed about this, he was seeing Jul 21 07:39:04 linux kernel: INFO: task mount:5258 blocked for more than 120 seconds. Jul 21 07:39:04 linux kernel: echo 0 /proc/sys/kernel/hung_task_timeout_secs disables this message. Jul 21 07:39:04 linux kernel: mount D 0002 0 / 5258 4842 0x0080 Jul 21 07:39:04 linux kernel: 880037b4fb28 0082 / 880037b4fa78 812d6ea0 Jul 21 07:39:04 linux kernel: 880037b4fad8 812ef2f9 / 880037b4ffd8 880037b4ffd8 He found a solution: Googling around suggests that it might be a recurrent kernel bug (as it appears to come and go between versions) and that it might be related to the method the kernel uses to idle the processor. The most power-efficient technique is to use the MWAIT instruction, which is the default on systems that support it. The least power-efficient (and most performant) is use a polling idle loop. In between is to use the HALT instruction. Since starting to boot my kernel with idle=halt (and pcie_aspm=off, FWIW, but I don't think my system was using it anyway), I haven't had any recurrance. Does that help?
Re: INFO: task blocked for more than 120 seconds.
Steven Timm a écrit : Hi Doug--I have seen the same message on some of our machines but so far it hasn't caused any real performance problems up until now. It's not so much if you are running SL5.5 but just as long as you are running some of the latest errata kernels.. we only saw it show up on SL5.3 but with the latest errata kernel. Hello, here at CC-IN2P3 we met this problem. It appears 3 or 4 release back, and was never corrected in latests updates. The problem seems to come from mainline kernel stuff (I guess backported with other features), as I saw some similar bug reports on mainline kernel lists. We also met the task blocked messages, and / or increasing of system load, and deeply locked tasks that make 'ps' or similar commands to hang. The only way we found to reduce the problem was to tune the kernel parameters (in particular parameters related to VM). But it only reduced the problem. -- Y. On Mon, 30 Aug 2010, Doug Johnson wrote: Greetings, I am seeing the following messsage when an SL5.5 (all of the most recent updates are installed) is under load writing data to an NFS disk: NOTE: It occurs for other processes than kswapd0, so I don't think that has anything to do with the issue. Aug 30 18:25:21 se kernel: INFO: task kswapd0:220 blocked for more than 120 seconds. Aug 30 18:25:21 se kernel: echo 0 /proc/sys/kernel/hung_task_timeout_secs disables this message. Aug 30 18:25:21 se kernel: kswapd0 D 810003336420 0 220 36 221 219 (L-TLB) Aug 30 18:25:21 se kernel: 810003be19e0 0046 810037c9c200 8100ae3c4000 Aug 30 18:25:21 se kernel: 0003 000a 810037f2a860 80308b60 Aug 30 18:25:21 se kernel: 0a919f5c3fe1 002d7d53 810037f2aa48 c770f5f8 Aug 30 18:25:21 se kernel: Call Trace: Aug 30 18:25:21 se kernel: [8006e1db] do_gettimeofday+0x40/0x90 Aug 30 18:25:21 se kernel: [886646e5] :nfs:nfs_wait_bit_uninterruptible+0x0/0xd Aug 30 18:25:21 se kernel: [800637ea] io_schedule+0x3f/0x67 Aug 30 18:25:21 se kernel: [886646ee] :nfs:nfs_wait_bit_uninterruptible+0x9/0xd Aug 30 18:25:21 se kernel: [80063a16] __wait_on_bit+0x40/0x6e Aug 30 18:25:21 se kernel: [886646e5] :nfs:nfs_wait_bit_uninterruptible+0x0/0xd Aug 30 18:25:21 se kernel: [80063ab0] out_of_line_wait_on_bit+0x6c/0x78 Aug 30 18:25:21 se kernel: [800a0a06] wake_bit_function+0x0/0x23 Aug 30 18:25:21 se kernel: [88668106] :nfs:nfs_wait_on_requests_locked+0x70/0xca Aug 30 18:25:21 se kernel: [88669146] :nfs:nfs_sync_inode_wait+0x60/0x1db Aug 30 18:25:21 se kernel: [8865f234] :nfs:nfs_release_page+0x2c/0x4d Aug 30 18:25:21 se kernel: [800caea8] shrink_inactive_list+0x511/0x8d8 Aug 30 18:25:21 se kernel: [800ca39b] isolate_lru_pages+0x98/0xbf Aug 30 18:25:21 se kernel: [80047e98] __pagevec_release+0x19/0x22 Aug 30 18:25:21 se kernel: [800ca876] shrink_active_list+0x4b4/0x4c4 Aug 30 18:25:21 se kernel: [800130f5] shrink_zone+0x127/0x18d Aug 30 18:25:21 se kernel: [80057b94] kswapd+0x323/0x46c Aug 30 18:25:21 se kernel: [800a09d8] autoremove_wake_function+0x0/0x2e Aug 30 18:25:21 se kernel: [800a07c0] keventd_create_kthread+0x0/0xc4 Aug 30 18:25:21 se kernel: [80057871] kswapd+0x0/0x46c Aug 30 18:25:21 se kernel: [800a07c0] keventd_create_kthread+0x0/0xc4 Aug 30 18:25:21 se kernel: [8003287b] kthread+0xfe/0x132 Aug 30 18:25:21 se kernel: [8005dfb1] child_rip+0xa/0x11 Aug 30 18:25:21 se kernel: [800a07c0] keventd_create_kthread+0x0/0xc4 Aug 30 18:25:21 se kernel: [8003277d] kthread+0x0/0x132 Aug 30 18:25:21 se kernel: [8005dfa7] child_rip+0x0/0x11 I have seen this error with both an Intel Pro1000 and a Realtek Ethernet card. I am doing work with 2 other different Universities (completely different hardware) and they have all seen this message. Prior to 5.5, this would result in the machine locking up. Now with 5.5 it appears that the load level on the machine slowly rises (I assume due to D wait state blocked processes), but the machine is somewhat responsive. Also once these messages occur, ps will hang and that session becomes unusable. I don't what this means, but a similarly configured machine with identical hardware running SL4.7 does not produce these errors and the NFS throughput is pretty darn good. Any help or pointers in some direction will be appreciated, Thanks, doug Doug Johnsonemail: drj...@pizero.colorado.edu B390, Duane Physics (303)-492-4506 Office Boulder, CO 80309 (303)-492-5119 FAX http://www.aaccchildren.org Tully, baby. Look around. It's a cage with golden
Re: INFO: task blocked for more than 120 seconds.
Hi Doug--I have seen the same message on some of our machines but so far it hasn't caused any real performance problems up until now. It's not so much if you are running SL5.5 but just as long as you are running some of the latest errata kernels.. we only saw it show up on SL5.3 but with the latest errata kernel. Steve Steve On Mon, 30 Aug 2010, Doug Johnson wrote: Greetings, I am seeing the following messsage when an SL5.5 (all of the most recent updates are installed) is under load writing data to an NFS disk: NOTE: It occurs for other processes than kswapd0, so I don't think that has anything to do with the issue. Aug 30 18:25:21 se kernel: INFO: task kswapd0:220 blocked for more than 120 seconds. Aug 30 18:25:21 se kernel: echo 0 /proc/sys/kernel/hung_task_timeout_secs disables this message. Aug 30 18:25:21 se kernel: kswapd0 D 810003336420 0 220 36 221 219 (L-TLB) Aug 30 18:25:21 se kernel: 810003be19e0 0046 810037c9c200 8100ae3c4000 Aug 30 18:25:21 se kernel: 0003 000a 810037f2a860 80308b60 Aug 30 18:25:21 se kernel: 0a919f5c3fe1 002d7d53 810037f2aa48 c770f5f8 Aug 30 18:25:21 se kernel: Call Trace: Aug 30 18:25:21 se kernel: [8006e1db] do_gettimeofday+0x40/0x90 Aug 30 18:25:21 se kernel: [886646e5] :nfs:nfs_wait_bit_uninterruptible+0x0/0xd Aug 30 18:25:21 se kernel: [800637ea] io_schedule+0x3f/0x67 Aug 30 18:25:21 se kernel: [886646ee] :nfs:nfs_wait_bit_uninterruptible+0x9/0xd Aug 30 18:25:21 se kernel: [80063a16] __wait_on_bit+0x40/0x6e Aug 30 18:25:21 se kernel: [886646e5] :nfs:nfs_wait_bit_uninterruptible+0x0/0xd Aug 30 18:25:21 se kernel: [80063ab0] out_of_line_wait_on_bit+0x6c/0x78 Aug 30 18:25:21 se kernel: [800a0a06] wake_bit_function+0x0/0x23 Aug 30 18:25:21 se kernel: [88668106] :nfs:nfs_wait_on_requests_locked+0x70/0xca Aug 30 18:25:21 se kernel: [88669146] :nfs:nfs_sync_inode_wait+0x60/0x1db Aug 30 18:25:21 se kernel: [8865f234] :nfs:nfs_release_page+0x2c/0x4d Aug 30 18:25:21 se kernel: [800caea8] shrink_inactive_list+0x511/0x8d8 Aug 30 18:25:21 se kernel: [800ca39b] isolate_lru_pages+0x98/0xbf Aug 30 18:25:21 se kernel: [80047e98] __pagevec_release+0x19/0x22 Aug 30 18:25:21 se kernel: [800ca876] shrink_active_list+0x4b4/0x4c4 Aug 30 18:25:21 se kernel: [800130f5] shrink_zone+0x127/0x18d Aug 30 18:25:21 se kernel: [80057b94] kswapd+0x323/0x46c Aug 30 18:25:21 se kernel: [800a09d8] autoremove_wake_function+0x0/0x2e Aug 30 18:25:21 se kernel: [800a07c0] keventd_create_kthread+0x0/0xc4 Aug 30 18:25:21 se kernel: [80057871] kswapd+0x0/0x46c Aug 30 18:25:21 se kernel: [800a07c0] keventd_create_kthread+0x0/0xc4 Aug 30 18:25:21 se kernel: [8003287b] kthread+0xfe/0x132 Aug 30 18:25:21 se kernel: [8005dfb1] child_rip+0xa/0x11 Aug 30 18:25:21 se kernel: [800a07c0] keventd_create_kthread+0x0/0xc4 Aug 30 18:25:21 se kernel: [8003277d] kthread+0x0/0x132 Aug 30 18:25:21 se kernel: [8005dfa7] child_rip+0x0/0x11 I have seen this error with both an Intel Pro1000 and a Realtek Ethernet card. I am doing work with 2 other different Universities (completely different hardware) and they have all seen this message. Prior to 5.5, this would result in the machine locking up. Now with 5.5 it appears that the load level on the machine slowly rises (I assume due to D wait state blocked processes), but the machine is somewhat responsive. Also once these messages occur, ps will hang and that session becomes unusable. I don't what this means, but a similarly configured machine with identical hardware running SL4.7 does not produce these errors and the NFS throughput is pretty darn good. Any help or pointers in some direction will be appreciated, Thanks, doug Doug Johnsonemail: drj...@pizero.colorado.edu B390, Duane Physics (303)-492-4506 Office Boulder, CO 80309 (303)-492-5119 FAX http://www.aaccchildren.org Tully, baby. Look around. It's a cage with golden bars. -- -- Steven C. Timm, Ph.D (630) 840-8525 t...@fnal.gov http://home.fnal.gov/~timm/ Fermilab Computing Division, Scientific Computing Facilities, Grid Facilities Department, FermiGrid Services Group, Assistant Group Leader.