Re: 2.6.24-rc4-mm1
On 11/12/2007 8:11 AM, Andrew Morton wrote:

On Tue, 11 Dec 2007 01:48:39 +1100 Reuben Farrelly <[EMAIL PROTECTED]> wrote:

On 5/12/2007 4:17 PM, Andrew Morton wrote:

Temporarily at
http://userweb.kernel.org/~akpm/2.6.24-rc4-mm1/

Will appear later at
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc4/2.6.24-rc4-mm1/

- Lots of device IDs have been removed from the e1000 driver and moved over to e1000e. So if your e1000 stops working, you forgot to set CONFIG_E1000E.

- The s390 build is still broken.

I'm seeing this most incredibly unhelpful (to debug) but fortunately reproducible problem (so far 4/4 times) on this -mm kernel. I thought this problem may have been related to another bug which I have reported (A TCP oops) but even after applying a likely fix for that I am still seeing this problem.

The machine boots up perfectly fine and runs good until I load it up. In this case I can reliably cause this to occur by pulling a 3G ISO across the GigE network from my Linux box to my PC. After maybe 50M or so, the console just displays this (ignore initial boot banner):

--
 * Starting local ... [ ok ]

This is tornado.reub.net (Linux x86_64 2.6.24-rc4-mm1) 00:24:01

tornado login: *** buffer overf
---

Yes - after displaying the 'f' in what I can only guess is the word 'overflow', the box spontaneously reboots. There is no further console output until it starts to come back up again.

The problem does not exist in 2.6.23-gentoo kernels nor in a vanilla 2.6.24-rc4-git6 (phew!), so this looks to be an -mm only problem at this stage.

I enabled a number of kernel debugging options but then I got no output at all when the machine crashed.

I'm at a bit of a loss as to which subsystem this might be coming from, so I'm not sure who to CC.

Box information is (still) up at http://www.reub.net/files/kernel/2.6.24-rc4-mm1/

hm. grepping around for "buffer overflow" doesn't turn up anything except in drivers which you won't be using on that machine.

I'd be suspecting networking, obviously. If you're feeling keen could you please grab a 2.6.24-rc4 tree and apply 2.6.24-rc4-mm1's origin.patch and git-net.patch and see if the bug is still present?

No - seems to be fine with just origin.patch and git-net.patch.

Just for good measure I then reverted git-net.patch and applied git-netdev-all.patch instead, and still wasn't able to trigger the reboot or console message, no matter how hard I tried.

I guess for now I'll sit on it, and if it appears in the next -mm it'll probably annoy me enough and inspire me to dig deeper (or, "guess" deeper, given the lack of direction as to where to even begin).

Reuben
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.24-rc4-mm1
On 5/12/2007 4:17 PM, Andrew Morton wrote:

Temporarily at
http://userweb.kernel.org/~akpm/2.6.24-rc4-mm1/

Will appear later at
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc4/2.6.24-rc4-mm1/

- Lots of device IDs have been removed from the e1000 driver and moved over to e1000e. So if your e1000 stops working, you forgot to set CONFIG_E1000E.

- The s390 build is still broken.

I'm seeing this most incredibly unhelpful (to debug) but fortunately reproducible problem (so far 4/4 times) on this -mm kernel. I thought this problem may have been related to another bug which I have reported (A TCP oops) but even after applying a likely fix for that I am still seeing this problem.

The machine boots up perfectly fine and runs good until I load it up. In this case I can reliably cause this to occur by pulling a 3G ISO across the GigE network from my Linux box to my PC. After maybe 50M or so, the console just displays this (ignore initial boot banner):

--
 * Starting local ... [ ok ]

This is tornado.reub.net (Linux x86_64 2.6.24-rc4-mm1) 00:24:01

tornado login: *** buffer overf
---

Yes - after displaying the 'f' in what I can only guess is the word 'overflow', the box spontaneously reboots. There is no further console output until it starts to come back up again.

The problem does not exist in 2.6.23-gentoo kernels nor in a vanilla 2.6.24-rc4-git6 (phew!), so this looks to be an -mm only problem at this stage.

I enabled a number of kernel debugging options but then I got no output at all when the machine crashed.

I'm at a bit of a loss as to which subsystem this might be coming from, so I'm not sure who to CC.

Box information is (still) up at http://www.reub.net/files/kernel/2.6.24-rc4-mm1/

Thanks,
Reuben
Re: 2.6.24-rc4-mm1
On 5/12/2007 4:17 PM, Andrew Morton wrote:

Temporarily at
http://userweb.kernel.org/~akpm/2.6.24-rc4-mm1/

Will appear later at
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc4/2.6.24-rc4-mm1/

- Lots of device IDs have been removed from the e1000 driver and moved over to e1000e. So if your e1000 stops working, you forgot to set CONFIG_E1000E.

This non-fatal oops, which I have just noticed, may be related to this change then - it certainly looks networking related.

WARNING: at net/ipv4/tcp_input.c:2518 tcp_fastretrans_alert()
Pid: 0, comm: swapper Not tainted 2.6.24-rc4-mm1 #1

Call Trace:
 <IRQ>
 [8046e038] tcp_fastretrans_alert+0x229/0xe63
 [80470975] tcp_ack+0xa3f/0x127d
 [804747b7] tcp_rcv_established+0x55f/0x7f8
 [8047b1aa] tcp_v4_do_rcv+0xdb/0x3a7
 [881148a8] :nf_conntrack:nf_ct_deliver_cached_events+0x75/0x99
 [88120179] :nf_conntrack_ipv4:ipv4_confirm+0x29/0x51
 [8047db71] tcp_v4_rcv+0x9be/0xaed
 [80455eaa] nf_hook_slow+0x60/0xdf
 [8045db6b] ip_local_deliver_finish+0xd3/0x253
 [8045e146] ip_local_deliver+0x3b/0x85
 [8045d7f9] ip_rcv_finish+0x119/0x3b8
 [8045e030] ip_rcv+0x231/0x30c
 [8043ef39] netif_receive_skb+0x215/0x299
 [880b82b9] :e1000e:e1000_receive_skb+0x4d/0x1db
 [880bc200] :e1000e:e1000_clean_rx_irq+0x12c/0x341
 [880ba31a] :e1000e:e1000_clean+0x306/0x58f
 [8022a16a] rebalance_domains+0xec/0x423
 [80261332] handle_edge_irq+0x97/0x13b
 [804412d3] net_rx_action+0xb8/0x11d
 [802344f8] __do_softirq+0x71/0xdd
 [8020c8fc] call_softirq+0x1c/0x30
 [8020e7a5] do_softirq+0x3d/0x8d
 [80234485] irq_exit+0x84/0x86
 [8020e89e] do_IRQ+0x7e/0xe4
 [8020a908] mwait_idle+0x0/0x58
 [8020a7f1] default_idle+0x0/0x43
 [8020bc81] ret_from_intr+0x0/0xa
 <EOI>
 [8020a950] mwait_idle+0x48/0x58
 [80209f23] enter_idle+0x22/0x24
 [8020a897] cpu_idle+0x63/0x88
 [804ada75] rest_init+0x55/0x60
 [80627b9a] start_kernel+0x2a4/0x32a
 [8062710b] _sinittext+0x10b/0x120

tornado home #

I have posted a full dmesg as well as my .config and an lspci at http://www.reub.net/files/kernel/2.6.24-rc4-mm1/ .

Thanks,
Reuben
Re: 2.6.23-rc7-mm1
On 25/09/2007 3:12 AM, J. Bruce Fields wrote:

On Mon, Sep 24, 2007 at 09:59:29AM -0700, Andrew Morton wrote:

On Tue, 25 Sep 2007 00:52:30 +1000 Reuben Farrelly <[EMAIL PROTECTED]> wrote:

On 24/09/2007 7:17 PM, Andrew Morton wrote:

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc7/2.6.23-rc7-mm1/

- New git tree git-powerpc-galak.patch added to the -mm lineup: ppc32 things, mainly (Kumar Gala <[EMAIL PROTECTED]>)

I'm observing a problem with this kernel (as well as 2.6.23-rc6-mm1) which manifests itself only in my Postfix/application mail.logs:

Sep 25 00:25:40 tornado postfix/smtp[12520]: fatal: select lock: Cannot allocate memory
Sep 25 00:25:41 tornado postfix/master[8002]: warning: process /usr/lib64/postfix/smtp pid 12520 exit status 1

This is happening frequently with processes started via 'master' (smtp, smtpd and cleanup), but it does not appear to have any noticeable operational impact apart from logging a lot of copies of this message.

The corresponding code in Postfix which triggers this is (choice of 3 files in src/master are all possibilities which all have much the same code)

Oog.

Looks like it's the "Memory shortage can result in inconsistent flocks state" patch--the error variable is being set in some cases when it shouldn't be. Does the following fix it?

That's in my git tree, not in mainline. I'll fix up my copy. And I'll spend some time today figuring out what to do about regression testing for the posix lock, flock, and lease code.

Thanks for the bug report!

--b.

diff --git a/fs/locks.c b/fs/locks.c
index a6c5917..3e8bfd2 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -740,6 +740,7 @@ static int flock_lock_file(struct file *filp, struct file_lock *request)
 		new_fl = locks_alloc_lock();
 		if (new_fl == NULL)
 			goto out;
+		error = 0;
 	}
 	for_each_lock(inode, before) {

Yes that has fixed it, thanks!

Reuben
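The one-line fix in the diff above is easier to follow with the general pattern in view. What follows is a minimal userspace sketch in C, not the kernel code: `demo_lock`, `take_lock_buggy` and `take_lock_fixed` are hypothetical names, and the assumption (the surrounding lines of `flock_lock_file()` are not shown in the thread) is that `error` was pre-loaded with a failure code before the allocation and then returned unchanged on a path that had in fact succeeded.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a lock record; NOT the kernel's struct file_lock. */
struct demo_lock {
    int owner;
};

/*
 * Buggy shape (assumed): error is pre-loaded with -ENOMEM so the
 * allocation-failure path can jump straight to "out", but nothing
 * clears it again once the allocation has succeeded.
 */
static int take_lock_buggy(int owner, struct demo_lock **out)
{
    int error = -ENOMEM;
    struct demo_lock *new_lock = malloc(sizeof(*new_lock));

    if (new_lock == NULL)
        goto out;
    new_lock->owner = owner;
    *out = new_lock;
out:
    return error;   /* still -ENOMEM even though the lock was taken */
}

/* Fixed shape: mirrors the "error = 0;" line added by the patch. */
static int take_lock_fixed(int owner, struct demo_lock **out)
{
    int error = -ENOMEM;
    struct demo_lock *new_lock = malloc(sizeof(*new_lock));

    if (new_lock == NULL)
        goto out;
    error = 0;      /* allocation succeeded, so drop the stale error */
    new_lock->owner = owner;
    *out = new_lock;
out:
    return error;
}
```

With this shape the buggy variant hands back a usable lock and an out-of-memory error at the same time, which would match a caller such as Postfix logging "Cannot allocate memory" while otherwise carrying on normally.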
Re: 2.6.23-rc7-mm1
On 24/09/2007 7:17 PM, Andrew Morton wrote:

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc7/2.6.23-rc7-mm1/

- New git tree git-powerpc-galak.patch added to the -mm lineup: ppc32 things, mainly (Kumar Gala <[EMAIL PROTECTED]>)

I'm observing a problem with this kernel (as well as 2.6.23-rc6-mm1) which manifests itself only in my Postfix/application mail.logs:

Sep 25 00:25:40 tornado postfix/smtp[12520]: fatal: select lock: Cannot allocate memory
Sep 25 00:25:41 tornado postfix/master[8002]: warning: process /usr/lib64/postfix/smtp pid 12520 exit status 1

This is happening frequently with processes started via 'master' (smtp, smtpd and cleanup), but it does not appear to have any noticeable operational impact apart from logging a lot of copies of this message.

The corresponding code in Postfix which triggers this is (choice of 3 files in src/master are all possibilities which all have much the same code)

    /*
     * The event loop, at last.
     */
    while (var_use_limit == 0 || use_count < var_use_limit || client_count > 0) {
        if (multi_server_lock != 0) {
            watchdog_stop(watchdog);
            if (myflock(vstream_fileno(multi_server_lock), INTERNAL_LOCK,
                        MYFLOCK_OP_EXCLUSIVE) < 0)
                msg_fatal("select lock: %m");
        }
        watchdog_start(watchdog);
        delay = loop ? loop(multi_server_name, multi_server_argv) : -1;
        event_loop(delay);
    }
    multi_server_exit();
}

Now I'm not convinced this is an application problem, because I'm only seeing this after running up kernel 2.6.23-rc6-mm1 or 2.6.23-rc7-mm1 and with NO changes to the application itself. Using the same application binaries it does not occur with 2.6.22 mainline. [I didn't get a lot of testing with the -mm release prior to that unfortunately due to some other breakage.]

Is there anything new in the last two or so -mm kernels which could have caused this?

I've put my .config up at http://www.reub.net/files/kernel/2.6.23-rc7-mm1.config

Thanks,
Reuben
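The Postfix loop quoted above exits through msg_fatal() whenever myflock() returns a negative value, so a small standalone check can help separate kernel behaviour from application behaviour. The following is only a sketch: `try_exclusive_flock` is a hypothetical helper, and it assumes that myflock() with MYFLOCK_OP_EXCLUSIVE ends up in a plain flock(LOCK_EX) call, which depends on how Postfix was built.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Hypothetical helper: take and release an exclusive flock() on path.
 * Returns 0 on success, or a negative errno value on failure
 * (for example -ENOMEM for the symptom reported in this thread).
 */
static int try_exclusive_flock(const char *path)
{
    int rc = 0;
    int fd = open(path, O_RDWR | O_CREAT, 0600);

    if (fd < 0)
        return -errno;

    if (flock(fd, LOCK_EX) < 0)
        rc = -errno;            /* save errno before close() can change it */
    else
        flock(fd, LOCK_UN);     /* release again; this is only a probe */

    close(fd);
    return rc;
}
```

Called in a loop against a scratch file on an affected kernel, a helper like this would be expected to return -ENOMEM at least some of the time, and 0 every time on a kernel without the regression.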
[Serial port bug?] was Re: 2.6.22-rc4-mm2
On 7/06/2007 3:03 PM, Andrew Morton wrote:

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.22-rc4/2.6.22-rc4-mm2/

- Basically a bugfixed version of 2.6.22-rc4-mm1. None of the subsystem trees were repulled, several bad patches were dropped, a few were fixed.

I've come home to find my server has locked up hard, with a panic on the screen. This time unlike others, I was able to grab a photo of it for further analysis.

http://www.reub.net/files/kernel/ serial-crash.jpg

[Note also the .config and dmesg in the same directory]

I have had this or a very similar traceback appear about 3 or 4 times now, including with a 2.6.21-gentoo kernel (based on mainline), so this bug may well be present in mainline. It is not new to this -mm release.

The bug does not occur on demand, it just seems to happen every few days without obvious warning, I haven't reported it until now as I haven't had any other information to provide other than "some panic seems to happen with a tty_write something-or-other".

The other possibly crucial piece of information on this is that I have one of my serial ports set up as a serial console. The kernel boot commands for this are:

kernel /vmlinuz-2.6.22-rc4-mm2 ro real_root=/dev/md2 console=tty0 console=ttyS0,57600 panic=30

as well as this:

# SERIAL CONSOLES
s0:12345:respawn:/sbin/agetty 57600 ttyS0 vt100

in inittab.

The other serial port is connected up to my APC UPS and is set up with apcupsd.

Reuben
Re: 2.6.22-rc1-mm1 - Call trace in slub_def.h
On 16/05/2007 1:19 PM, Andrew Morton wrote:

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.22-rc1/2.6.22-rc1-mm1/

- I found some time to look into some writeback problems in fs/fs-writeback.c. The results were ugly. There are a pile of fixes here but more work (mainly testing) needs to be done.

  There's some new debug code in there which could be very expensive if there are a lot of dirty inodes in the machine (quadratic behaviour). If the machine seems to be affected by this, the debugging may be disabled with

	echo 0 > /proc/sys/fs/inode_debug

- Added an i386 early-startup development tree, as git-newsetup.patch ("H. Peter Anvin" <[EMAIL PROTECTED]>)

- Brought back git-sas.patch (Darrick J. Wong <[EMAIL PROTECTED]>). It got lost quite some time ago.

I have just seen this on boot, with 2.6.22-rc2-mm1 on x86_64:

--
libata version 2.20 loaded.
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a report
BUG: at include/linux/slub_def.h:88 kmalloc_index()

Call Trace:
 [8034f3f9] pci_dev_put+0x12/0x14
 [80283f30] get_slab+0xb5/0x265
 [802841bc] __kmalloc+0x13/0xa3
 [8021a4aa] cache_k8_northbridges+0x80/0x116
 [8063fed2] gart_iommu_init+0x16/0x594
 [804562ac] genl_rcv+0x0/0x68
 [804548ed] netlink_kernel_create+0x15e/0x16b
 [804acc52] mutex_unlock+0x9/0xb
 [80639fad] pci_iommu_init+0x9/0x12
 [806306af] kernel_init+0x152/0x322
 [80249c7c] trace_hardirqs_on+0xc0/0x14e
 [804ae03d] trace_hardirqs_on_thunk+0x35/0x37
 [80249c7c] trace_hardirqs_on+0xc0/0x14e
 [8020a848] child_rip+0xa/0x12
 [80209f5c] restore_args+0x0/0x30
 [8063055d] kernel_init+0x0/0x322
 [8020a83e] child_rip+0x0/0x12

PCI-GART: No AMD northbridge found.
hpet0: at MMIO 0xfed0, IRQs 2, 8, 0
hpet0: 3 64-bit timers, 14318180 Hz
ACPI: RTC can wake from S4
pnp: 00:01: iomem range 0xf000-0xf3ff has been reserved
pnp: 00:01: iomem range 0xfed13000-0xfed13fff has been reserved
--

The full dmesg is at http://www.reub.net/files/kernel/2.6.22-rc1-mm1-dmesg and the config up at http://www.reub.net/files/kernel/2.6.22-rc1-mm1-config

The machine otherwise seems to run OK.

Reuben
RAID1 "out of memory" error, was Re: 2.6.21-rc5-mm4
Hi,

On 3/04/2007 3:47 PM, Andrew Morton wrote:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc5/2.6.21-rc5-mm4/
>
> - The oops in git-net.patch has been fixed, so that tree has been restored. It is huge.
>
> - Added the device-mapper development tree to the -mm lineup (Alasdair Kergon). It is a quilt tree, living at ftp://ftp.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/.
>
> - Added davidel's signalfd stuff.

Looks like some damage, or maybe intolerance to on-disk damage, to RAID-1.

md1 is the first array on the disk, and it refuses to start up on boot, or after boot.

tornado ~ # cat /proc/mdstat
Personalities : [raid1]
md1 : inactive sda1[0] sdc1[1]
      208640 blocks

md3 : active raid1 sdc3[1] sda3[0]
      20008832 blocks [2/2] [UU]
      bitmap: 0/153 pages [0KB], 64KB chunk

md5 : active raid1 sdc5[1] sda5[0]
      10008384 blocks [2/2] [UU]
      bitmap: 4/153 pages [16KB], 32KB chunk

md6 : active raid1 sdc6[1] sda6[0]
      10008384 blocks [2/2] [UU]
      bitmap: 0/153 pages [0KB], 32KB chunk

md8 : active raid1 sdc8[1] sda8[0]
      1003904 blocks [2/2] [UU]
      bitmap: 0/123 pages [0KB], 4KB chunk

md10 : active raid1 sdc10[1] sda10[0]
      119933120 blocks [2/2] [UU]
      bitmap: 1/229 pages [4KB], 256KB chunk

md2 : active raid1 sdc2[1] sda2[0]
      14544 blocks [2/2] [UU]
      bitmap: 10/191 pages [40KB], 256KB chunk

unused devices: <none>
tornado ~ #

tornado ~ # mdadm --examine /dev/sda1
/dev/sda1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : f5c2e565:5ed956c0:33b08c07:16154426
  Creation Time : Fri Feb 2 10:16:29 2007
     Raid Level : raid1
  Used Dev Size : 104320 (101.89 MiB 106.82 MB)
     Array Size : 104320 (101.89 MiB 106.82 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 1

    Update Time : Fri Apr 6 02:06:17 2007
          State : clean
Internal Bitmap : present
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : d3668aaa - correct
         Events : 0.368

      Number   Major   Minor   RaidDevice State
this     0       8        1        0      active sync   /dev/sda1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       8       33        1      active sync   /dev/sdc1

tornado ~ # mdadm --examine /dev/sdc1
/dev/sdc1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : f5c2e565:5ed956c0:33b08c07:16154426
  Creation Time : Fri Feb 2 10:16:29 2007
     Raid Level : raid1
  Used Dev Size : 104320 (101.89 MiB 106.82 MB)
     Array Size : 104320 (101.89 MiB 106.82 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 1

    Update Time : Fri Apr 6 02:06:17 2007
          State : clean
Internal Bitmap : present
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : d3668acc - correct
         Events : 0.368

      Number   Major   Minor   RaidDevice State
this     1       8       33        1      active sync   /dev/sdc1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       8       33        1      active sync   /dev/sdc1
tornado ~ #

tornado ~ # mdadm --assemble /dev/md1 /dev/sda1 /dev/sdc1
mdadm: device /dev/md1 already active - cannot assemble it
tornado ~ # mdadm --run /dev/md1
mdadm: failed to run array /dev/md1: Cannot allocate memory
tornado ~ #

and looking at a dmesg, this is logged:

md: bind<sdc1>
md: bind<sda1>
raid1: raid set md1 active with 2 out of 2 mirrors
md1: bitmap initialized from disk: read 0/1 pages, set 0 bits, status: -12
md1: failed to create bitmap (-12)
md: pers->run() failed ...

tornado ~ # uname -a
Linux tornado 2.6.21-rc5-mm4 #1 SMP Thu Apr 5 23:47:42 EST 2007 x86_64 Intel(R) Pentium(R) 4 CPU 3.00GHz GenuineIntel GNU/Linux
tornado ~ #

The last known version that worked was 2.6.21-rc3-mm1 - I haven't been testing out the -mm releases so much lately.

Also, Andrew, can you please restart posting/cc'ing your -mm announcements to the [EMAIL PROTECTED] list? Seems this stopped around about 2.6.20, it was handy.

.config is up at http://www.reub.net/files/kernel/configs/2.6.21-rc5-mm4

Thanks,
Reuben
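The "-12" in the md1 bitmap messages is a negated errno value, and on Linux errno 12 is ENOMEM, which is why mdadm reports "Cannot allocate memory" for the same failure. A minimal sketch of that mapping, using only Python's standard `errno` and `os` modules; the helper name `describe_kernel_status` is made up for illustration, and the numeric values are the ones of the platform the script runs on:

```python
import errno
import os


def describe_kernel_status(status):
    """Map a negative kernel status such as -12 to (errno name, message)."""
    code = abs(status)
    # errno.errorcode maps numeric errno values to their symbolic names.
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # The status reported by "md1: failed to create bitmap (-12)".
    print(describe_kernel_status(-12))
```

The same lookup applies to other negative codes that show up in kernel logs, with the caveat that errno numbering differs between operating systems.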
Re: 2.6.21-rc4-mm1
On 20/03/2007 3:56 PM, Andrew Morton wrote:
> Temporarily at
>
>   http://userweb.kernel.org/~akpm/2.6.21-rc4-mm1/
>
> Will appear later at
>
>   ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc4/2.6.21-rc4-mm1/
>
> - Restored the RSDL CPU scheduler (a new version thereof)

Just booted into this kernel, and hit this, which locked up the machine:

This is tornado.reub.net (Linux x86_64 2.6.21-rc4-mm1) 20:16:58

tornado login: ------------[ cut here ]------------
kernel BUG at kernel/sched.c:3505!
invalid opcode: [1] SMP
last sysfs file: devices/pci:00/:00:1f.3/i2c-adapter/i2c-0/0-002e/pwm3
CPU 1
Modules linked in: firmware_class eeprom lm85 hwmon_vid i2c_i801 8021q iptable_filter iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nfnetlink iptable_mangle ip_tables nfs lockd sunrpc ohci1394 ieee1394 usb_storage
Pid: 8250, comm: clamd Not tainted 2.6.21-rc4-mm1 #1
RIP: 0010:[<8025d2cb>] [<8025d2cb>] __sched_text_start+0x3cb/0x8b3
RSP: :8100023cfee0 EFLAGS: 00010002
RAX: 008c RBX: 810001e040e8 RCX: 000c
RDX: RSI: 008c RDI: 810001e049b8
RBP: 8100023cff70 R08: 008c R09: 810001e049a8
R10: 0034 R11: R12: 810001e03f00
R13: 0002 R14: R15: 00521b55f827
FS: 2b1dfda2ec00() GS:81000208ec40() knlGS:
CS: 0010 DS: ES: CR0: 8005003b
CR2: 2afcf000 CR3: 04ac3000 CR4: 06e0
Process clamd (pid: 8250, threadinfo 8100023ce000, task 810004c090a0)
Stack: 810004c090a0 8025fdb7 810004c090a0 7fffae43e955 810004c09248 0001023cff28 8029635d 00c5aac0 0005 2b1dfc7d6d5a 8025fdb7
Call Trace:
 [<8025fdb7>] trace_hardirqs_on_thunk+0x35/0x37
 [<8029635d>] trace_hardirqs_on+0x12a/0x15d
 [<8025fdb7>] trace_hardirqs_on_thunk+0x35/0x37
 [<8025a7e0>] retint_careful+0x12/0x2e

Code: 0f 0b eb fe 49 8b 94 24 e0 01 00 00 49 8b 84 24 d8 01 00 00
RIP [<8025d2cb>] __sched_text_start+0x3cb/0x8b3
 RSP <8100023cfee0>
BUG: spinlock lockup on CPU#0, swapper/0, 810001e03f00
BUG: spinlock lockup on CPU#1, clamd/8250, 810001e03f00

Every few minutes the last two lines would be repeated.

This kernel does not include the hotfixes (the gentoo portage ebuild for this release does not yet include them), however I am uncertain if they fix this problem or not anyway.

Also, what happened to the -mm announcements sent to [EMAIL PROTECTED]? Maybe I'm the only person to miss them :-)

Reuben
Re: [PATCH 2.6.19-rc5-mm2] cpufreq: set policy->curfreq on initialization
On 16/11/2006 6:05 AM, Mattia Dongili wrote:
> Check the correct variable and set policy->cur upon acpi-cpufreq initialization to allow the userspace governor to be used as default.
>
> Signed-off-by: Mattia Dongili <[EMAIL PROTECTED]>
> ---
>
> Reuben, could you also try if this patch fixes the BUG()?
> Thanks

It does, and all looks fine now, thanks. Sorry for not getting back about it a little earlier.

Reuben

> diff --git a/arch/i386/kernel/cpu/cpufreq/acpi-cpufreq.c b/arch/i386/kernel/cpu/cpufreq/acpi-cpufreq.c
> index 18f4715..a630f94 100644
> --- a/arch/i386/kernel/cpu/cpufreq/acpi-cpufreq.c
> +++ b/arch/i386/kernel/cpu/cpufreq/acpi-cpufreq.c
> @@ -699,14 +699,14 @@ static int acpi_cpufreq_cpu_init(struct
>  	if (result)
>  		goto err_freqfree;
>
> -	switch (data->cpu_feature) {
> +	switch (perf->control_register.space_id) {
>  	case ACPI_ADR_SPACE_SYSTEM_IO:
>  		/* Current speed is unknown and not detectable by IO port */
>  		policy->cur = acpi_cpufreq_guess_freq(data, policy->cpu);
>  		break;
>  	case ACPI_ADR_SPACE_FIXED_HARDWARE:
>  		acpi_cpufreq_driver.get = get_cur_freq_on_cpu;
> -		get_cur_freq_on_cpu(cpu);
> +		policy->cur = get_cur_freq_on_cpu(cpu);
>  		break;
>  	default:
>  		break;
Re: [linux-usb-devel] Re: 2.6.13-mm1
Hi Alan,

On 3/09/2005 3:19 a.m., Alan Stern wrote:
> On Thu, 1 Sep 2005, Andrew Morton wrote:
>> Reuben Farrelly <[EMAIL PROTECTED]> wrote:
>>> I'm also observing some USB messages logged:
>>>
>>> Sep 2 13:26:22 tornado kernel: usb 5-1: new full speed USB device using uhci_hcd and address 13
>>> Sep 2 13:26:22 tornado kernel: drivers/usb/class/usblp.c: usblp0: USB Bidirectional printer dev 13 if 0 alt 0 proto 2 vid 0x03F0 pid 0x6204
>>> Sep 2 13:26:23 tornado kernel: hub 5-0:1.0: port 1 disabled by hub (EMI?), re-enabling...
>
> This message means pretty much what it says: noise or something else caused the connection to be disabled. In theory this could be caused by a problem with the host controller, the cable, or the printer.
>
> Does this happen consistently with 2.6.13-mm1? Did it happen with 2.6.12?

It may have just been a red herring, as I haven't had the problem appear since, nor had I seen it before then. I've done multiple reboots, plug and unplugs to test since and all have been OK.

Thanks for taking the time to reply.

reuben
Re: 2.6.13-mm1: hangs during boot ...
Hi,

On 5/09/2005 4:32 a.m., James Bottomley wrote:
> On Sun, 2005-09-04 at 01:24 +1200, Reuben Farrelly wrote:
>> I am seeing it fill up my messages log as it is logging 1 or so messages each minute. I've emailed the SCSI maintainer James Bottomley twice about it but had no response either time.
>
> OK, can you try this ... it should confirm the theory if the messages go away.
>
> Thanks,
>
> James
>
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -315,7 +315,7 @@ int scsi_execute(struct scsi_device *sde
>  	req->sense = sense;
>  	req->sense_len = 0;
>  	req->timeout = timeout;
> -	req->flags |= flags | REQ_BLOCK_PC | REQ_SPECIAL;
> +	req->flags |= flags | REQ_BLOCK_PC | REQ_SPECIAL | REQ_QUIET;
>
>  	/*
>  	 * head injection *required* here otherwise quiesce won't work
> @@ -927,17 +927,20 @@ void scsi_io_completion(struct scsi_cmnd
>  			scsi_requeue_command(q, cmd);
>  			return;
>  		}
> -		printk(KERN_INFO "Device %s not ready.\n",
> -		       req->rq_disk ? req->rq_disk->disk_name : "");
> +		if (!(req->flags & REQ_QUIET))
> +			dev_printk(KERN_INFO,
> +				   &cmd->device->sdev_gendev,
> +				   "Device not ready.\n");
>  		cmd = scsi_end_request(cmd, 0, this_count, 1);
>  		return;
>  	case VOLUME_OVERFLOW:
> -		printk(KERN_INFO "Volume overflow <%d %d %d %d> CDB: ",
> -		       cmd->device->host->host_no,
> -		       (int)cmd->device->channel,
> -		       (int)cmd->device->id, (int)cmd->device->lun);
> -		__scsi_print_command(cmd->data_cmnd);
> -		scsi_print_sense("", cmd);
> +		if (!(req->flags & REQ_QUIET)) {
> +			dev_printk(KERN_INFO,
> +				   &cmd->device->sdev_gendev,
> +				   "Volume overflow, CDB: ");
> +			__scsi_print_command(cmd->data_cmnd);
> +			scsi_print_sense("", cmd);
> +		}
>  		cmd = scsi_end_request(cmd, 0, block_bytes, 1);
>  		return;
>  	default:
> @@ -954,15 +957,13 @@ void scsi_io_completion(struct scsi_cmnd
>  		return;
>  	}
>  	if (result) {
> -		if (!(req->flags & REQ_SPECIAL))
> -			printk(KERN_INFO "SCSI error : <%d %d %d %d> return code "
> -			       "= 0x%x\n", cmd->device->host->host_no,
> -			       cmd->device->channel,
> -			       cmd->device->id,
> -			       cmd->device->lun, result);
> +		if (!(req->flags & REQ_QUIET)) {
> +			dev_printk(KERN_INFO, &cmd->device->sdev_gendev,
> +				   "SCSI error: return code = 0x%x\n", result);
>
> -		if (driver_byte(result) & DRIVER_SENSE)
> -			scsi_print_sense("", cmd);
> +			if (driver_byte(result) & DRIVER_SENSE)
> +				scsi_print_sense("", cmd);
> +		}
>  		/*
>  		 * Mark a single buffer as not uptodate. Queue the remainder.
>  		 * We sometimes get this cruft in the event that a medium error

This patch fixes it, and there was no message during boot about not being ready, nor after the machine had fully booted. Great ;-)

However, I did get an oops when warm booting the kernel. I suspect this may be the oops that I get every now and then when warm rebooting, with no real pattern, and possibly isn't related to the patch. As my serial console wasn't set up at the time, I took a photo instead, at http://www.reub.net/kernel/scsi-oops.jpg

Thanks
reuben
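The effect of James's test patch can be modelled in a few lines: requests issued through scsi_execute() gain a REQ_QUIET bit, and the completion path only logs when that bit is clear. The sketch below is a userspace illustration in Python, not kernel code. The bit values, the `ReqFlag` enum and both helper functions are hypothetical names invented for this example; only the flag names and the message text come from the patch.

```python
from enum import IntFlag


class ReqFlag(IntFlag):
    # Illustrative bit values only; the real REQ_* constants are defined
    # in the kernel's block layer headers.
    BLOCK_PC = 1
    SPECIAL = 2
    QUIET = 4


def scsi_execute_flags(extra, patched):
    """Model the flags scsi_execute() puts on its request, before and after the patch."""
    flags = extra | ReqFlag.BLOCK_PC | ReqFlag.SPECIAL
    if patched:
        flags |= ReqFlag.QUIET
    return flags


def not_ready_messages(flags, patched):
    """Return what the 'not ready' completion path would log for one request."""
    if patched and flags & ReqFlag.QUIET:
        return []  # quiet requests are completed without a log line
    return ["Device not ready."]


if __name__ == "__main__":
    for patched in (False, True):
        flags = scsi_execute_flags(ReqFlag(0), patched)
        print(patched, not_ready_messages(flags, patched))
```

In this model a request that does not carry the quiet bit still produces the message after the patch, which matches the intent of only silencing internally generated commands.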
Re: 2.6.13-mm1: hangs during boot ...
Hi Peter,

On 3/09/2005 4:59 a.m., Peter Williams wrote:
> Brown, Len wrote:
>>> [ 279.662960] [<c02d5c74>] wait_for_completion+0xa4/0x110
>>
>> possibly a missing interrupt?
>>
>> CONFIG_ACPI=y
>> any difference if booted with "acpi=off" or "acpi=noirq"?
>
> Yes. In both cases, the system appears to boot normally but I'm unable to login or connect via ssh. Also there's a "device not ready" message

Are you seeing this "Device not ready" message appear over and over, or just the once? I am seeing it fill up my messages log as it is logging 1 or so messages each minute. I've emailed the SCSI maintainer James Bottomley twice about it but had no response either time. The SCSI device I have is:

Sep 3 22:14:40 tornado kernel: Vendor: SONY Model: CD-RW CRX145S Rev: 1.0b

As for the inability to log in, this bug may be relevant, given I also had that problem:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=166422

There are fixes in the pipeline for util-linux audit interaction in Fedora as well. I know because I reported those too ;)

> after the scsi initialization which I don't normally see. I've attached the scsi initialization output. The PF_NETLINK error messages after the login prompt in this output are created whenever I try to log in or connect via ssh.

The workaround is to enable audit support, but obviously a better fix is in the pipeline. I'm surprised more people aren't discovering these 'interactions' due to not having audit turned on. Does everyone build audit into their kernels?

reuben
Re: 2.6.13-mm1
Hi,

On 1/09/2005 10:58 a.m., Andrew Morton wrote:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.13/2.6.13-mm1/
>
> - Included Alan's big tty layer buffering rewrite. This breaks the build on lots of more obscure character device drivers. Patches welcome (please cc Alan).
>
> Changes since 2.6.13-rc6-mm2:
>
>  linus.patch
>  git-acpi.patch
>  git-arm.patch
>  git-cpufreq.patch
>  git-cryptodev.patch
>  git-ia64.patch
>  git-audit.patch
>  git-audit-ppc64-fix.patch
>  git-input.patch
>  git-jfs-fixup.patch
>  git-kbuild.patch
>  git-libata-all.patch
>  git-mtd.patch
>  git-netdev-all.patch
>  git-nfs.patch
>  git-ocfs2.patch
>  git-serial.patch
>  git-scsi-block.patch
>  git-scsi-iscsi.patch
>  git-scsi-misc.patch
>  git-watchdog.patch

This patch:

 netlink-log-protocol-failures.patch

is causing lots of messages like this to be logged on my console:

Sep 2 11:52:41 tornado kernel: DEBUG: Failed to load PF_NETLINK protocol 9

It seems to be caused by audit support not being enabled, as the message goes away if I rebuild with audit support :)

I'm also observing some USB messages logged:

Sep 2 13:26:22 tornado kernel: usb 5-1: new full speed USB device using uhci_hcd and address 13
Sep 2 13:26:22 tornado kernel: drivers/usb/class/usblp.c: usblp0: USB Bidirectional printer dev 13 if 0 alt 0 proto 2 vid 0x03F0 pid 0x6204
Sep 2 13:26:23 tornado kernel: hub 5-0:1.0: port 1 disabled by hub (EMI?), re-enabling...
Sep 2 13:26:23 tornado kernel: usb 5-1: USB disconnect, address 13
Sep 2 13:26:23 tornado kernel: drivers/usb/class/usblp.c: usblp0: removed
Sep 2 13:26:23 tornado kernel: usb 5-1: new full speed USB device using uhci_hcd and address 14
Sep 2 13:26:23 tornado kernel: usb 5-1: device descriptor read/64, error -71
Sep 2 13:26:23 tornado kernel: usb 5-1: device descriptor read/64, error -71
Sep 2 13:26:23 tornado kernel: usb 5-1: new full speed USB device using uhci_hcd and address 15
Sep 2 13:26:23 tornado kernel: usb 5-1: device descriptor read/all, error -71
Sep 2 13:26:23 tornado kernel: usb 5-1: new full speed USB device using uhci_hcd and address 16
Sep 2 13:26:23 tornado kernel: usb 5-1: can't set config #1, error -71
Sep 2 13:26:23 tornado kernel: usb 5-1: new full speed USB device using uhci_hcd and address 17
Sep 2 13:26:24 tornado kernel: usb 5-1: unable to read config index 0 descriptor/start
Sep 2 13:26:24 tornado kernel: usb 5-1: can't read configurations, error -71

[EMAIL PROTECTED] kernel]# lsusb
Bus 005 Device 004: ID 050d:0105 Belkin Components
Bus 005 Device 003: ID 0451:2046 Texas Instruments, Inc. TUSB2046 Hub
Bus 005 Device 001: ID :
Bus 004 Device 001: ID :
Bus 003 Device 001: ID :
Bus 002 Device 001: ID :
Bus 001 Device 001: ID :
[EMAIL PROTECTED] kernel]#

Output of lsusb -v up at http://www.reub.net/kernel/lsusb-output

reuben
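The observation that the PF_NETLINK message disappears once audit is built in fits the protocol number in the log: netlink protocol 9 is NETLINK_AUDIT in linux/netlink.h. A small sketch that pulls the number out of such a log line and names it; the table is a hand-copied subset of that header, and the function name `netlink_protocol_name` is made up for illustration:

```python
import re

# Subset of the NETLINK_* protocol numbers from linux/netlink.h.
NETLINK_PROTOCOLS = {
    0: "NETLINK_ROUTE",
    2: "NETLINK_USERSOCK",
    3: "NETLINK_FIREWALL",
    5: "NETLINK_NFLOG",
    6: "NETLINK_XFRM",
    7: "NETLINK_SELINUX",
    9: "NETLINK_AUDIT",
    11: "NETLINK_CONNECTOR",
    15: "NETLINK_KOBJECT_UEVENT",
    16: "NETLINK_GENERIC",
}

_PATTERN = re.compile(r"Failed to load PF_NETLINK protocol (\d+)")


def netlink_protocol_name(log_line):
    """Return the NETLINK_* name for the protocol in a log line, or None if absent."""
    match = _PATTERN.search(log_line)
    if match is None:
        return None
    number = int(match.group(1))
    return NETLINK_PROTOCOLS.get(number, f"unknown ({number})")


if __name__ == "__main__":
    line = "Sep 2 11:52:41 tornado kernel: DEBUG: Failed to load PF_NETLINK protocol 9"
    print(netlink_protocol_name(line))
```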
Inotify problem [was Re: 2.6.13-rc6-mm1]
Hi,

On 22/08/2005 9:10 p.m., John McCutchan wrote:
> On Sat, 2005-08-20 at 23:52 -0700, Andrew Morton wrote:
>> Reuben Farrelly <[EMAIL PROTECTED]> wrote:
>>> Hi,
>>>
>>> On 19/08/2005 11:37 a.m., Andrew Morton wrote:
>>>> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.13-rc6/2.6.13-rc6-mm1/
>>>>
>>>> - Lots of fixes, updates and cleanups all over the place.
>>>>
>>>> - If you have the right debugging options set, this kernel will generate a storm of sleeping-in-atomic-code warnings at boot, from the scsi code. It is being worked on.
>>>>
>>>> Changes since 2.6.13-rc5-mm1:
>>>>
>>>>  linus.patch
>>>
>>> Noted this in my log earlier today. Is this inotify related?
>>>
>>> Aug 21 08:33:04 tornado kernel: idr_remove called for id=2048 which is not allocated.
>>> Aug 21 08:33:04 tornado kernel: [] dump_stack+0x17/0x19
>>> Aug 21 08:33:04 tornado kernel: [] idr_remove_warning+0x1b/0x1d
>>> Aug 21 08:33:04 tornado kernel: [] sub_remove+0x88/0xea
>>> Aug 21 08:33:04 tornado kernel: [] idr_remove+0x1b/0x7f
>>> Aug 21 08:33:04 tornado kernel: [] remove_watch_no_event+0x7a/0x12e
>>> Aug 21 08:33:04 tornado kernel: [] inotify_release+0x8f/0x1af
>>> Aug 21 08:33:04 tornado kernel: [] __fput+0xaf/0x199
>>> Aug 21 08:33:04 tornado kernel: [] fput+0x22/0x3b
>>> Aug 21 08:33:04 tornado kernel: [] filp_close+0x41/0x67
>>> Aug 21 08:33:04 tornado kernel: [] sys_close+0x70/0x92
>>> Aug 21 08:33:04 tornado kernel: [] sysenter_past_esp+0x54/0x75
>>> Aug 21 08:33:04 tornado kernel: idr_remove called for id=3072 which is not allocated.
>>> Aug 21 08:33:05 tornado kernel: [] dump_stack+0x17/0x19
>>> Aug 21 08:33:05 tornado kernel: [] idr_remove_warning+0x1b/0x1d
>>> Aug 21 08:33:05 tornado kernel: [] sub_remove+0x88/0xea
>>> Aug 21 08:33:05 tornado kernel: [] idr_remove+0x1b/0x7f
>>> Aug 21 08:33:05 tornado kernel: [] remove_watch_no_event+0x7a/0x12e
>>> Aug 21 08:33:05 tornado kernel: [] inotify_release+0x8f/0x1af
>>> Aug 21 08:33:05 tornado kernel: [] __fput+0xaf/0x199
>>> Aug 21 08:33:05 tornado kernel: [] fput+0x22/0x3b
>>> Aug 21 08:33:05 tornado kernel: [] filp_close+0x41/0x67
>>> Aug 21 08:33:05 tornado kernel: [] sys_close+0x70/0x92
>>> Aug 21 08:33:05 tornado kernel: [] sysenter_past_esp+0x54/0x75
>>>
>>> This would have been triggered by using dovecot IMAP which is configured to use inotify on Maildir.
>>>
>>> I'm also seeing some userspace errors logged for dovecot:
>>>
>>> "Aug 21 04:17:22 Error: IMAP(reuben): inotify_rm_watch() failed: Invalid argument"
>>>
>>> I'll deal with those with the guy who wrote the inotify code in dovecot.
>>>
>>> I'm not so sure userspace should be able or need to cause the kernel to dump stack traces like that though?
>>
>> Yes, the stack dumps would appear to be due to an inotify bug.
>>
>> The message from dovecot is allegedly due to dovecot passing in a file descriptor which was not obtained from the inotify_init() syscall. But until we know what caused those stack dumps we cannot definitely say whether dovecot is at fault.
>
> Inotify has a check on both add and rm watch syscalls:
>
> 	/* verify that this is indeed an inotify instance */
> 	if (unlikely(filp->f_op != &inotify_fops)) {
> 		ret = -EINVAL;
> 		goto out;
> 	}
>
> This is crashing in inotify_release, which is called on close of the inotify instance. So this fd must be from an inotify instance right? I looked at the dovecot code, it looks fine wrt inotify.

Long shot, but the close-on-exec flag is set. Could this be tripping anything up?

I have also observed another problem with inotify with dovecot - so I spoke with Johannes Berg who wrote the inotify code in dovecot.
He suggested I post here to LKML since his opinion is that this to be a kernel bug. The problem I am observing is this, logged by dovecot after a period of time when a client is connected: dovecot: Aug 22 14:31:23 Error: IMAP(gilly): inotify_rm_watch() failed: Invalid argument dovecot: Aug 22 14:31:23 Error: IMAP(gilly): inotify_rm_watch() failed: Invalid argument dovecot: Aug 22 14:31:23 Error: IMAP(gilly): inotify_rm_watch() failed: Invalid argument Multiply that by about 1000 ;-) Some debugging shows this: dovecot: Aug 25 19:31:22 Warning: IMAP(gilly): removing wd 1019 from inotify fd 4 dovecot: Aug 25 19:31:22 Warning: IMAP(gilly): removing wd 1018 from inotify fd 4 dovecot: Aug 25 19:31:22 Warning: IMAP(gilly): inotify_add_watch returned 1019 dovecot: Aug 25 19:31:22 Warning: IMAP(gilly): inotify_add_watch returned 1020 dovecot: Aug 25 19:31:23 Warning: IMAP(gilly): removing wd 1020 from inotify fd 4 dovecot: Aug 25 19:31:23 Warning: IMAP(gilly): removing wd 1019 from inotify fd 4 dovecot: Aug 25 19:31:24 Warning: IMAP(gilly): inotify_add_watch returned 1020 dovecot: Aug 25 19:31:24 Warning: IMAP(gilly): inotify_add_watch returned 1021 dovecot: Aug 25 19:31:24 Warning: IMAP(gilly): removing wd 1021 from inotify fd 4 dovecot: Aug 25 19:31:24 Warning: IMAP(gilly): removing wd 1020 from inotify fd 4 dovecot: Aug 25 19:31:25 Warning: IMAP(gilly): inotify_add_watch returned 1021 dovecot: Aug 25 19:31:25 Warning: IMAP(gilly): inotify_add_watch returned 1022 dovec
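For anyone trying to reproduce the "Invalid argument" result from userspace, the following is a minimal sketch and not dovecot's code. It shows two ordinary ways inotify_rm_watch() can return EINVAL as described in the inotify_rm_watch(2) man page: the fd is valid but is not an inotify instance (the f_op check quoted above), or the watch descriptor has already been removed. The helper names, the use of /dev/null and the watch on "/" are all choices made for this illustration. It says nothing about which path, if either, the kernel bug in this thread takes.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Returns 0 on success, otherwise the errno set by inotify_rm_watch(). */
static int try_rm_watch(int fd, int wd)
{
    if (inotify_rm_watch(fd, wd) == 0)
        return 0;
    return errno;
}

/* Case 1: a valid descriptor that did not come from inotify_init().
 * Returns the errno seen, or -1 if the setup itself failed. */
int rm_watch_on_plain_fd(void)
{
    int fd = open("/dev/null", O_RDONLY);
    int err;

    if (fd < 0)
        return -1;
    err = try_rm_watch(fd, 1);
    close(fd);
    return err;
}

/* Case 2: the same watch descriptor is removed twice.
 * Returns the errno of the second removal, or -1 if setup failed. */
int rm_watch_twice(void)
{
    int fd = inotify_init();
    int wd, first, second;

    if (fd < 0)
        return -1;
    wd = inotify_add_watch(fd, "/", IN_CREATE);
    if (wd < 0) {
        close(fd);
        return -1;
    }
    first = try_rm_watch(fd, wd);   /* expected to succeed */
    second = try_rm_watch(fd, wd);  /* wd is no longer valid */
    close(fd);
    return first == 0 ? second : -1;
}
```

If both helpers report EINVAL on a kernel where dovecot still logs the error for descriptors it has only removed once, that would point away from these two userspace causes.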
Re: 2.6.13-rc6-mm2
Hi,

On 23/08/2005 4:30 p.m., Andrew Morton wrote:
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.13-rc6/2.6.13-rc6-mm2/

- Various updates. Nothing terribly noteworthy.

Yup, seems to be generally good...

Noticed this in the log earlier tonight:

Aug 23 19:44:51 tornado kernel: hub 5-0:1.0: port 1 disabled by hub (EMI?), re-enabling...
Aug 23 19:44:51 tornado kernel: usb 5-1: USB disconnect, address 2
Aug 23 19:44:51 tornado kernel: drivers/usb/class/usblp.c: usblp0: removed
Aug 23 19:44:51 tornado kernel: Unable to handle kernel NULL pointer dereference at virtual address 0004
Aug 23 19:44:51 tornado kernel: printing eip:
Aug 23 19:44:51 tornado kernel: c01ccef2
Aug 23 19:44:51 tornado kernel: *pde =
Aug 23 19:44:51 tornado kernel: Oops: [#1]
Aug 23 19:44:51 tornado kernel: SMP
Aug 23 19:44:51 tornado kernel: last sysfs file: /devices/pci:00/:00:1f.3/i2c-0/name
Aug 23 19:44:51 tornado kernel: Modules linked in: nfsd exportfs lockd eeprom sunrpc ipv6 iptable_filter binfmt_misc reiser4 zlib_deflate zlib_inflate dm_mod video thermal processor fan button ac tpm_nsc i2c_i801 sky2 e100 sr_mod
Aug 23 19:44:51 tornado kernel: CPU: 1
Aug 23 19:44:51 tornado kernel: EIP: 0060:[<c01ccef2>] Not tainted VLI
Aug 23 19:44:51 tornado kernel: EFLAGS: 00010286 (2.6.13-rc6-mm2)
Aug 23 19:44:51 tornado kernel: EIP is at _raw_spin_lock+0x7/0x73
Aug 23 19:44:51 tornado kernel: eax: ebx: ecx: c1a60658 edx: c1a63e24
Aug 23 19:44:51 tornado kernel: esi: edi: c0382400 ebp: f7c55e98 esp: f7c55e90
Aug 23 19:44:51 tornado kernel: ds: 007b es: 007b ss: 0068
Aug 23 19:44:51 tornado kernel: Process khubd (pid: 109, threadinfo=f7c54000 task=c192b030)
Aug 23 19:44:51 tornado kernel: Stack: f7c58a8c f7c55ea0 c0312219 f7c55eb0 c030feb7 f7c58ae8 f7c58a48
Aug 23 19:44:51 tornado kernel:        f7c55ec4 c0217e73 f7c58a48 f7d134ec 0040 f7c55ed0 c0217ec0 f7c58a48
Aug 23 19:44:51 tornado kernel:        f7c55edc c0217814 f7c58a48 f7c55eec c0216ad2 f7c58a48 f7c58a14 f7c55ef8
Aug 23 19:44:51 tornado kernel: Call Trace:
Aug 23 19:44:51 tornado kernel: [<c01039c3>] show_stack+0x94/0xca
Aug 23 19:44:51 tornado kernel: [<c0103b6c>] show_registers+0x15a/0x1ea
Aug 23 19:44:51 tornado kernel: [<c0103d8a>] die+0x108/0x183
Aug 23 19:44:51 tornado kernel: [<c031295a>] do_page_fault+0x1ea/0x63d
Aug 23 19:44:51 tornado kernel: [<c0103693>] error_code+0x4f/0x54
Aug 23 19:44:51 tornado kernel: [<c0312219>] _spin_lock+0x8/0xa
Aug 23 19:44:51 tornado kernel: [<c030feb7>] klist_remove+0x10/0x2c
Aug 23 19:44:51 tornado kernel: [<c0217e73>] __device_release_driver+0x41/0x65
Aug 23 19:44:51 tornado kernel: [<c0217ec0>] device_release_driver+0x29/0x39
Aug 23 19:44:51 tornado kernel: [<c0217814>] bus_remove_device+0x52/0x60
Aug 23 19:44:51 tornado kernel: [<c0216ad2>] device_del+0x2e/0x5d
Aug 23 19:44:51 tornado kernel: [<c0216b0c>] device_unregister+0xb/0x15
Aug 23 19:44:51 tornado kernel: [<c0275d67>] usb_disconnect+0x115/0x15c
Aug 23 19:44:51 tornado kernel: [<c0276b85>] hub_port_connect_change+0x54/0x399
Aug 23 19:44:51 tornado kernel: [<c027713e>] hub_events+0x274/0x3b2
Aug 23 19:44:51 tornado kernel: [<c0277296>] hub_thread+0x1a/0xdf
Aug 23 19:44:51 tornado kernel: [<c012fba7>] kthread+0x99/0x9d
Aug 23 19:44:51 tornado kernel: [<c01010b5>] kernel_thread_helper+0x5/0xb
Aug 23 19:44:51 tornado kernel: Code: 00 00 00 8b 0d a8 62 36 c0 e9 61 ff ff ff f3 90 31 c0 86 07 84 c0 0f 8e 79 ff ff ff 83 c4 18 5b 5e 5f 5d c3 55 89 e5 56 53 89 c3 <81> 78 04 ad 4e ad de 75 2d be 00 e0 ff ff 21 e6 8b 06 39 43 0c

reuben
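One possible reading of the oops above, offered as a sketch rather than a diagnosis: the bytes marked in the Code: line, 81 78 04 ad 4e ad de, decode on i386 as a compare of the 32-bit value at 4(%eax) against the immediate 0xdead4ead, and the fault address is reported as 0004. That is what one would see if _raw_spin_lock were handed a NULL lock pointer and read a debug "magic" field 4 bytes into the lock. The struct below is a hypothetical model invented for this illustration, with 32-bit fields standing in for an i386 layout. It is not copied from the kernel headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a debug spinlock on a 32-bit machine.
 * Field names and order are assumptions made for this sketch. */
struct debug_spinlock_model {
    uint32_t raw_lock;   /* the lock word itself */
    uint32_t magic;      /* sanity value checked before locking */
    uint32_t owner_cpu;
    uint32_t owner;      /* a pointer on i386, modelled as 32 bits */
};

/* Address that is read when the magic field is checked through a
 * lock located at lock_address. */
uint32_t magic_access_address(uint32_t lock_address)
{
    return lock_address +
           (uint32_t)offsetof(struct debug_spinlock_model, magic);
}

/* Decode a little-endian 32-bit immediate from instruction bytes. */
uint32_t decode_imm32(const uint8_t *p)
{
    return (uint32_t)p[0] |
           ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) |
           ((uint32_t)p[3] << 24);
}
```

With a lock address of 0 the modelled access lands at address 4, matching the fault address in the log, and the four immediate bytes ad 4e ad de decode to 0xdead4ead. If that reading is right, the question for the USB and driver-core code is why klist_remove was reached with a NULL lock during the usblp disconnect.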
Re: 2.6.13-rc6-mm1
Hi,

On 19/08/2005 11:37 a.m., Andrew Morton wrote:
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.13-rc6/2.6.13-rc6-mm1/

- Lots of fixes, updates and cleanups all over the place.
- If you have the right debugging options set, this kernel will generate a storm of sleeping-in-atomic-code warnings at boot, from the scsi code. It is being worked on.

Changes since 2.6.13-rc5-mm1: linus.patch

Noted this in my log earlier today. Is this inotify related?

Aug 21 08:33:04 tornado kernel: idr_remove called for id=2048 which is not allocated.
Aug 21 08:33:04 tornado kernel: [<c0103a00>] dump_stack+0x17/0x19
Aug 21 08:33:04 tornado kernel: [<c01c9f9a>] idr_remove_warning+0x1b/0x1d
Aug 21 08:33:04 tornado kernel: [<c01ca024>] sub_remove+0x88/0xea
Aug 21 08:33:04 tornado kernel: [<c01ca0a1>] idr_remove+0x1b/0x7f
Aug 21 08:33:04 tornado kernel: [<c018176a>] remove_watch_no_event+0x7a/0x12e
Aug 21 08:33:04 tornado kernel: [<c0181f64>] inotify_release+0x8f/0x1af
Aug 21 08:33:04 tornado kernel: [<c015ca80>] __fput+0xaf/0x199
Aug 21 08:33:04 tornado kernel: [<c015c9b8>] fput+0x22/0x3b
Aug 21 08:33:04 tornado kernel: [<c015b2ed>] filp_close+0x41/0x67
Aug 21 08:33:04 tornado kernel: [<c015b383>] sys_close+0x70/0x92
Aug 21 08:33:04 tornado kernel: [<c0102a9b>] sysenter_past_esp+0x54/0x75
Aug 21 08:33:04 tornado kernel: idr_remove called for id=3072 which is not allocated.
Aug 21 08:33:05 tornado kernel: [<c0103a00>] dump_stack+0x17/0x19
Aug 21 08:33:05 tornado kernel: [<c01c9f9a>] idr_remove_warning+0x1b/0x1d
Aug 21 08:33:05 tornado kernel: [<c01ca024>] sub_remove+0x88/0xea
Aug 21 08:33:05 tornado kernel: [<c01ca0a1>] idr_remove+0x1b/0x7f
Aug 21 08:33:05 tornado kernel: [<c018176a>] remove_watch_no_event+0x7a/0x12e
Aug 21 08:33:05 tornado kernel: [<c0181f64>] inotify_release+0x8f/0x1af
Aug 21 08:33:05 tornado kernel: [<c015ca80>] __fput+0xaf/0x199
Aug 21 08:33:05 tornado kernel: [<c015c9b8>] fput+0x22/0x3b
Aug 21 08:33:05 tornado kernel: [<c015b2ed>] filp_close+0x41/0x67
Aug 21 08:33:05 tornado kernel: [<c015b383>] sys_close+0x70/0x92
Aug 21 08:33:05 tornado kernel: [<c0102a9b>] sysenter_past_esp+0x54/0x75

This would have been triggered by using dovecot IMAP which is configured to use inotify on Maildir. I'm also seeing some userspace errors logged for dovecot:

"Aug 21 04:17:22 Error: IMAP(reuben): inotify_rm_watch() failed: Invalid argument"

I'll deal with those with the guy who wrote the inotify code in dovecot. I'm not so sure userspace should be able to, or need to, cause the kernel to dump stack traces like that though?

reuben
Re: 2.6.13-rc6-mm1
Hi,

On 21/08/2005 1:40 a.m., David Woodhouse wrote:
On Fri, 2005-08-19 at 18:36 -0700, Andrew Morton wrote:
Reuben Farrelly <[EMAIL PROTECTED]> wrote:
...
4. PAM is complaining about "PAM audit_open() failed: Protocol not supported" and I can't log in as any user including root. I would have picked this as a userspace problem, but it doesn't break with -rc5-mm1, yet reproducibly breaks with -rc6-mm1. Weird.

hm. How come you're able to use the machine then?

Machine was booting up ok, and things were being written to syslog. Rebooted into -rc5-mm1 to investigate, and of course could boot into rc6-mm1 in single user mode, test and bring services up one by one from there. Having two boxes helped too.

Is it possible to get an strace of this failure somehow?

Not sure if this is needed anymore, as I found that the problem goes away when I compile in kernel auditing. This was not required for -rc5-mm1. Is that change intended?

Sounds wrong to me, especially if 2.6.13-rc6 doesn't do that.

Hm. It sounds like you'd configured PAM to require the pam_loginuid module even though you didn't have auditing enabled in your kernel. That seems strange and wrong to me, and _is_ a userspace problem.

I haven't touched my pam config since it was installed a long time ago - it's one of those things that is too annoying to fix once broken, so I leave it alone at the system defaults ;)

I had logged this as a Fedora bug as I figured the pam_loginuid detection of the presence of auditing in the kernel is not very robust. There was a patch modified in pam-0.80-6 at the start of August which was to fix this on non audit enabled kernels, which works for anything up to and older than 2.6.12-rc5-mm1.

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=166422

It was closed 8 mins later, and the suggestion made that I take it to a pam development list instead. Redhat don't seem so interested in fixing things as a result of breakage when running an -mm kernel.

I'd also agree that it shouldn't have changed with the new kernel though -- and I can't think of anything I changed recently which would have that effect. An strace would still be useful.

Done. Posted up at http://www.reub.net/kernel/strace-login

Can you double-check that you didn't have auditing enabled in your older, working kernel?

Definitely wasn't enabled. I still have the .config that I used to build -rc5-mm1 with and my original -rc6-mm1 and it reads:

CONFIG_SYSCTL=y
# CONFIG_AUDIT is not set
CONFIG_HOTPLUG=y

Thanks for taking a look.

Reuben
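The "Protocol not supported" text in the PAM message is the standard error string for EPROTONOSUPPORT. As a rough way to check from userspace whether a running kernel offers the audit netlink protocol at all, something like the sketch below could be used. It assumes that audit_open() boils down to opening a NETLINK_AUDIT socket, which is my understanding of libaudit but is not confirmed anywhere in this thread. The enum, the function names and the choice of which errno values count as "no audit support" are all invented for this example and are not pam_loginuid's actual logic.

```c
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/netlink.h>

enum audit_probe {
    PROBE_ERROR = -1,   /* some unrelated failure */
    PROBE_ABSENT = 0,   /* kernel does not offer the audit protocol */
    PROBE_PRESENT = 1   /* an audit netlink socket could be opened */
};

/* Map an errno from socket() to a probe result. Treating these three
 * values as "audit not compiled in" is a policy choice of this sketch. */
enum audit_probe classify_audit_errno(int err)
{
    switch (err) {
    case 0:
        return PROBE_PRESENT;
    case EPROTONOSUPPORT:   /* "Protocol not supported" */
    case EAFNOSUPPORT:      /* no netlink family at all */
    case EINVAL:
        return PROBE_ABSENT;
    default:
        return PROBE_ERROR;
    }
}

/* Try to open an audit netlink socket and report what happened. */
enum audit_probe probe_kernel_audit(void)
{
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_AUDIT);

    if (fd < 0)
        return classify_audit_errno(errno);
    close(fd);
    return PROBE_PRESENT;
}
```

A login module built along these lines could skip its audit step when the probe reports PROBE_ABSENT instead of refusing the login, which is the kind of robustness the Fedora bug report above was asking for.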
Re: 2.6.13-rc6-mm1
Hi again,

On 20/08/2005 5:34 a.m., Andrew Morton wrote:
Reuben Farrelly <[EMAIL PROTECTED]> wrote:
A few new problems cropped up with this kernel..

1. NFS seems to be unstable, oopsing when shutting down:

--- devel/fs/nfsd/nfssvc.c~ingo-nfs-stuff-fix 2005-08-19 10:29:15.0 -0700
+++ devel-akpm/fs/nfsd/nfssvc.c 2005-08-19 10:30:03.0 -0700
@@ -286,7 +286,6 @@ out:
 	/* Release the thread */
 	svc_exit_thread(rqstp);
 
-	unlock_kernel();
 	/* Release module */
 	unlock_kernel();
 	module_put_and_exit(0);
_

That fixed it, thanks.

Aug 20 12:26:10 tornado kernel: Device not ready.

2. That message on the third line of the trace above: "kernel: Device not ready." is being logged every few mins or so, I believe it is my SCSI CDROM that is causing it. It also logs something similar after the SCSI driver has probed the device on boot:

Aug 20 12:24:36 tornado kernel: scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 7.0
Aug 20 12:24:36 tornado kernel:
Aug 20 12:24:36 tornado kernel: aic7880: Ultra Wide Channel A, SCSI Id=7, 16/253 SCBs
Aug 20 12:24:36 tornado kernel:
Aug 20 12:24:36 tornado kernel: Vendor: SONY Model: CD-RW CRX145S Rev: 1.0b
Aug 20 12:24:36 tornado kernel: Type: CD-ROM ANSI SCSI revision: 04
Aug 20 12:24:36 tornado kernel: target0:0:6: Beginning Domain Validation
Aug 20 12:24:36 tornado kernel: target0:0:6: Domain Validation skipping write tests
Aug 20 12:24:36 tornado kernel: target0:0:6: FAST-10 SCSI 10.0 MB/s ST (100 ns, offset 15)
Aug 20 12:24:36 tornado kernel: target0:0:6: Ending Domain Validation
Aug 20 12:24:36 tornado kernel: Device not ready.

This has been a problem for quite a few weeks now, albeit I believe, only a cosmetic one.

Is some application trying to poll the device?

I wonder if hald knows something about this and is polling.. however that message above about "Device not ready" occurs when the kernel is booting, before any userspace stuff has started up. Maybe hald is just being a bit aggressive in re-probing the drive after userspace launches. By all accounts after a week of uptime the drive certainly ought to be ready, it seems to work ok ;-)

Note the extra space after 'Device' and 'not' which implies possibly some text is missing (which would have made it more clear which device is not exactly ready).

The case sensitive strings "Device" and "not ready" appear together in scsi_lib.c and very few other places.

Is the device actually "not ready", or is it in reality ready and working? ie: what happens if you stick a CD in it?

The CD can be read, and the error messages go away. They stay away even after the CD has been ejected.

4. PAM is complaining about "PAM audit_open() failed: Protocol not supported" and I can't log in as any user including root. I would have picked this as a userspace problem, but it doesn't break with -rc5-mm1, yet reproducibly breaks with -rc6-mm1. Weird.

hm. How come you're able to use the machine then?

Machine was booting up ok, and things were being written to syslog. Rebooted into -rc5-mm1 to investigate, and of course could boot into rc6-mm1 in single user mode, test and bring services up one by one from there. Having two boxes helped too.

Is it possible to get an strace of this failure somehow?

Not sure if this is needed anymore, as I found that the problem goes away when I compile in kernel auditing. This was not required for -rc5-mm1. Is that change intended?

reuben
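To see why the one-line patch above makes the "kernel BUG at lib/kernel_lock.c:83!" on nfsd shutdown go away, here is a toy userspace model of a recursive lock tracked by a per-task depth counter. It assumes the big kernel lock in this kernel works roughly that way, with the depth starting at -1 and an unlock at negative depth being treated as a bug. Every name in it is hypothetical and it is not the kernel's implementation.

```c
#include <stdbool.h>

/* Per-task state: -1 means the task does not hold the lock. */
struct task_model {
    int lock_depth;
};

void task_model_init(struct task_model *t)
{
    t->lock_depth = -1;
}

/* Taking the lock only bumps the depth in this model. */
void model_lock_kernel(struct task_model *t)
{
    t->lock_depth++;
}

/* Returns false where the modelled kernel would hit its BUG check:
 * an unlock by a task that does not hold the lock. */
bool model_unlock_kernel(struct task_model *t)
{
    if (t->lock_depth < 0)
        return false;
    t->lock_depth--;
    return true;
}

/* Model of a thread exit path: the lock is taken once, then released
 * `unlocks` times. Returns how many of the releases were legal. */
int run_exit_path(int unlocks)
{
    struct task_model t;
    int ok = 0;
    int i;

    task_model_init(&t);
    model_lock_kernel(&t);
    for (i = 0; i < unlocks; i++) {
        if (model_unlock_kernel(&t))
            ok++;
    }
    return ok;
}
```

In this model run_exit_path(1) releases cleanly, while run_exit_path(2) has one legal release followed by one that would trip the check, which corresponds to the duplicated unlock_kernel() call that the patch removes.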
Re: 2.6.13-rc6-mm1
Hi,

On 19/08/2005 11:33 p.m., Andrew Morton wrote:
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.13-rc6/2.6.13-rc6-mm1/

- Lots of fixes, updates and cleanups all over the place.
- If you have the right debugging options set, this kernel will generate a storm of sleeping-in-atomic-code warnings at boot, from the scsi code. It is being worked on.

A few new problems cropped up with this kernel..

1. NFS seems to be unstable, oopsing when shutting down:

Aug 20 12:26:09 tornado shutdown: shutting down for system reboot
Aug 20 12:26:10 tornado init: Switching to runlevel: 6
Aug 20 12:26:10 tornado kernel: Device not ready.
Aug 20 12:26:10 tornado last message repeated 4 times
Aug 20 12:26:11 tornado smokeping[2524]: Got TERM signal, terminating.
Aug 20 12:26:16 tornado rpc.mountd: Caught signal 15, un-registering and exiting.
Aug 20 12:26:20 tornado kernel: [ cut here ]
Aug 20 12:26:20 tornado kernel: kernel BUG at lib/kernel_lock.c:83!
Aug 20 12:26:20 tornado kernel: invalid operand: [#1]
Aug 20 12:26:20 tornado kernel: SMP
Aug 20 12:26:20 tornado kernel: last sysfs file: /devices/pci:00/:00:1d.3/usb5/5-2/5-2.2/5-2.2.1/5-2.2.1.1/5-2.2.1.1:1.1/modalias
Aug 20 12:26:20 tornado kernel: Modules linked in: nfsd exportfs lockd eeprom sunrpc ipv6 iptable_filter binfmt_misc reiser4 zlib_deflate zlib_inflate dm_mod video thermal processor fan button ac i8xx_tco i2c_i801 sky2 sr_mod
Aug 20 12:26:20 tornado kernel: CPU: 1
Aug 20 12:26:20 tornado kernel: EIP: 0060:[] Not tainted VLI
Aug 20 12:26:20 tornado kernel: EFLAGS: 00010286 (2.6.13-rc6-mm1)
Aug 20 12:26:20 tornado kernel: EIP is at unlock_kernel+0x28/0x32
Aug 20 12:26:20 tornado kernel: eax: ebx: 0009 ecx: f6a23f90 edx: f6adaa50
Aug 20 12:26:20 tornado kernel: esi: f6a23f54 edi: c191d2fc ebp: f6b3ffa8 esp: f6b3ffa8
Aug 20 12:26:20 tornado kernel: ds: 007b es: 007b ss: 0068
Aug 20 12:26:20 tornado kernel: Process nfsd (pid: 2034, threadinfo=f6b3e000 task=f6adaa50)
Aug 20 12:26:20 tornado kernel: Stack: f6b3ffe4 f8e0e4c2 f8e2d648 f6b3e000 f6f9103c 00100100 00200200 f6adaa50
Aug 20 12:26:20 tornado kernel:        feff fef8 f8e0e231
Aug 20 12:26:20 tornado kernel:        c01010b5 f6f9103c 5a5a5a5a a55a5a5a
Aug 20 12:26:20 tornado kernel: Call Trace:
Aug 20 12:26:20 tornado kernel: [] show_stack+0x94/0xca
Aug 20 12:26:20 tornado kernel: [] show_registers+0x15a/0x1ea
Aug 20 12:26:20 tornado kernel: [] die+0x108/0x183
Aug 20 12:26:20 tornado kernel: [] do_trap+0x76/0xa1
Aug 20 12:26:20 tornado kernel: [] do_invalid_op+0x97/0xa1
Aug 20 12:26:20 tornado kernel: [] error_code+0x4f/0x54
Aug 20 12:26:20 tornado kernel: [] nfsd+0x291/0x341 [nfsd]
Aug 20 12:26:20 tornado kernel: [] kernel_thread_helper+0x5/0xb
Aug 20 12:26:20 tornado kernel: Code: 5e 5d c3 55 89 e5 b8 00 e0 ff ff 21 e0 8b 10 8b 42 14 85 c0 78 15 83 e8 01 89 42 14 85 c0 79 09 f0 ff 05 40 e7 36 c0 7e 39 5d c3 <0f> 0b 53 00 37 e1 32 c0 eb e1 8d 05 40 e7 36 c0 e8 fe dd ff ff
Aug 20 12:26:20 tornado kernel: [ cut here ]

2. That message on the third line of the trace above: "kernel: Device not ready." is being logged every few mins or so, I believe it is my SCSI CDROM that is causing it. It also logs something similar after the SCSI driver has probed the device on boot:

Aug 20 12:24:36 tornado kernel: scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 7.0
Aug 20 12:24:36 tornado kernel:
Aug 20 12:24:36 tornado kernel: aic7880: Ultra Wide Channel A, SCSI Id=7, 16/253 SCBs
Aug 20 12:24:36 tornado kernel:
Aug 20 12:24:36 tornado kernel: Vendor: SONY Model: CD-RW CRX145S Rev: 1.0b
Aug 20 12:24:36 tornado kernel: Type: CD-ROM ANSI SCSI revision: 04
Aug 20 12:24:36 tornado kernel: target0:0:6: Beginning Domain Validation
Aug 20 12:24:36 tornado kernel: target0:0:6: Domain Validation skipping write tests
Aug 20 12:24:36 tornado kernel: target0:0:6: FAST-10 SCSI 10.0 MB/s ST (100 ns, offset 15)
Aug 20 12:24:36 tornado kernel: target0:0:6: Ending Domain Validation
Aug 20 12:24:36 tornado kernel: Device not ready.

This has been a problem for quite a few weeks now, albeit I believe, only a cosmetic one.

3. As I have a Marvell Yukon 2 chipset, I was _delighted_ to see a new driver from Stephen Hemminger appear in the netdev tree for it. However it seems to be a bit broken: I get link up and a bit of traffic before it just stops passing traffic of any sort and requires an rmmod/modprobe to get going again. I've emailed him directly about this.

4. PAM is complaining about "PAM audit_open() failed: Protocol not supported" and I can't log in as any user including root. I would have picked this as a userspace problem, but it doesn't break with -rc5-mm1, yet reproducibly breaks with -rc6-mm1. Weird.

reuben
Re: 2.6.13-rc6-mm1
Hi, On 19/08/2005 11:33 p.m., Andrew Morton wrote: ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.13-rc6/2.6.13-rc6-mm1/ - Lots of fixes, updates and cleanups all over the place. - If you have the right debugging options set, this kernel will generate a storm of sleeping-in-atomic-code warnings at boot, from the scsi code. It is being worked on. A few new problems cropped up with this kernel.. 1. NFS seems to be unstable, oopsing when shutting down: Aug 20 12:26:09 tornado shutdown: shutting down for system reboot Aug 20 12:26:10 tornado init: Switching to runlevel: 6 Aug 20 12:26:10 tornado kernel: Device not ready. Aug 20 12:26:10 tornado last message repeated 4 times Aug 20 12:26:11 tornado smokeping[2524]: Got TERM signal, terminating. Aug 20 12:26:16 tornado rpc.mountd: Caught signal 15, un-registering and exiting. Aug 20 12:26:20 tornado kernel: [ cut here ] Aug 20 12:26:20 tornado kernel: kernel BUG at lib/kernel_lock.c:83! Aug 20 12:26:20 tornado kernel: invalid operand: [#1] Aug 20 12:26:20 tornado kernel: SMP Aug 20 12:26:20 tornado kernel: last sysfs file: /devices/pci:00/:00:1d.3/usb5/5-2/5-2.2/5-2.2.1/5-2.2.1.1/5-2.2.1.1:1.1/mod alias Aug 20 12:26:20 tornado kernel: Modules linked in: nfsd exportfs lockd eeprom sunrpc ipv6 iptable_filter binfmt_misc reiser4 zlib_de flate zlib_inflate dm_mod video thermal processor fan button ac i8xx_tco i2c_i801 sky2 sr_mod Aug 20 12:26:20 tornado kernel: CPU:1 Aug 20 12:26:20 tornado kernel: EIP:0060:[c0310845]Not tainted VLI Aug 20 12:26:20 tornado kernel: EFLAGS: 00010286 (2.6.13-rc6-mm1) Aug 20 12:26:20 tornado kernel: EIP is at unlock_kernel+0x28/0x32 Aug 20 12:26:20 tornado kernel: eax: ebx: 0009 ecx: f6a23f90 edx: f6adaa50 Aug 20 12:26:20 tornado kernel: esi: f6a23f54 edi: c191d2fc ebp: f6b3ffa8 esp: f6b3ffa8 Aug 20 12:26:20 tornado kernel: ds: 007b es: 007b ss: 0068 Aug 20 12:26:20 tornado kernel: Process nfsd (pid: 2034, threadinfo=f6b3e000 task=f6adaa50) Aug 20 12:26:20 tornado kernel: 
Stack: f6b3ffe4 f8e0e4c2 f8e2d648 f6b3e000 f6f9103c 00100100 00200200 f6adaa50 Aug 20 12:26:20 tornado kernel:feff fef8 f8e0e231 Aug 20 12:26:20 tornado kernel:c01010b5 f6f9103c 5a5a5a5a a55a5a5a Aug 20 12:26:20 tornado kernel: Call Trace: Aug 20 12:26:20 tornado kernel: [c01039c3] show_stack+0x94/0xca Aug 20 12:26:20 tornado kernel: [c0103b6c] show_registers+0x15a/0x1ea Aug 20 12:26:20 tornado kernel: [c0103d8a] die+0x108/0x183 Aug 20 12:26:20 tornado kernel: [c0310986] do_trap+0x76/0xa1 Aug 20 12:26:20 tornado kernel: [c0104090] do_invalid_op+0x97/0xa1 Aug 20 12:26:20 tornado kernel: [c0103693] error_code+0x4f/0x54 Aug 20 12:26:20 tornado kernel: [f8e0e4c2] nfsd+0x291/0x341 [nfsd] Aug 20 12:26:20 tornado kernel: [c01010b5] kernel_thread_helper+0x5/0xb Aug 20 12:26:20 tornado kernel: Code: 5e 5d c3 55 89 e5 b8 00 e0 ff ff 21 e0 8b 10 8b 42 14 85 c0 78 15 83 e8 01 89 42 14 85 c0 79 0 9 f0 ff 05 40 e7 36 c0 7e 39 5d c3 0f 0b 53 00 37 e1 32 c0 eb e1 8d 05 40 e7 36 c0 e8 fe dd ff ff Aug 20 12:26:20 tornado kernel: [ cut here ] 2. That message on the third line of the trace above: kernel: Device not ready. is being logged every few mins or so, I believe it is my SCSI CDROM that is causing it. 
It also logs something similar after the SCSI driver has probed the device on boot: Aug 20 12:24:36 tornado kernel: scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 7.0 Aug 20 12:24:36 tornado kernel: Adaptec 2940 Ultra SCSI adapter Aug 20 12:24:36 tornado kernel: aic7880: Ultra Wide Channel A, SCSI Id=7, 16/253 SCBs Aug 20 12:24:36 tornado kernel: Aug 20 12:24:36 tornado kernel: Vendor: SONY Model: CD-RW CRX145S Rev: 1.0b Aug 20 12:24:36 tornado kernel: Type: CD-ROM ANSI SCSI revision: 04 Aug 20 12:24:36 tornado kernel: target0:0:6: Beginning Domain Validation Aug 20 12:24:36 tornado kernel: target0:0:6: Domain Validation skipping write tests Aug 20 12:24:36 tornado kernel: target0:0:6: FAST-10 SCSI 10.0 MB/s ST (100 ns, offset 15) Aug 20 12:24:36 tornado kernel: target0:0:6: Ending Domain Validation Aug 20 12:24:36 tornado kernel: Device not ready. This has been a problem for quite a few weeks now, albeit I believe, only a cosmetic one. 3. As I have a Marvell Yukon 2 chipset, I was _delighted_ to see a new driver from Stephen Hemmingway appear in the netdev tree for it. However it seems to be a bit broken, I get link up and a bit of traffic before it just stops passing traffic of any sort and requires an rmmod/modprobe to get going again. I've emailed him directly about this. 4. PAM is complaining about PAM audit_open() failed: Protocol not suppor ted and I can't log in as any user including root. I would have picked this was a userspace problem, but it doesn't
Re: 2.6.13-rc6-mm1
Hi again,

On 20/08/2005 5:34 a.m., Andrew Morton wrote:
> Reuben Farrelly <[EMAIL PROTECTED]> wrote:
> > A few new problems cropped up with this kernel..
> >
> > 1. NFS seems to be unstable, oopsing when shutting down:
>
> --- devel/fs/nfsd/nfssvc.c~ingo-nfs-stuff-fix 2005-08-19 10:29:15.0 -0700
> +++ devel-akpm/fs/nfsd/nfssvc.c 2005-08-19 10:30:03.0 -0700
> @@ -286,7 +286,6 @@ out:
>   /* Release the thread */
>   svc_exit_thread(rqstp);
> - unlock_kernel();
>   /* Release module */
>   unlock_kernel();
>   module_put_and_exit(0);
> _

That fixed it, thanks.

> > Aug 20 12:26:10 tornado kernel: Device not ready.
> >
> > 2. That message on the third line of the trace above:
> >
> > kernel: Device not ready.
> >
> > is being logged every few mins or so, I believe it is my SCSI CDROM that
> > is causing it. It also logs something similar after the SCSI driver has
> > probed the device on boot:
> >
> > Aug 20 12:24:36 tornado kernel: scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 7.0
> > Aug 20 12:24:36 tornado kernel: <Adaptec 2940 Ultra SCSI adapter>
> > Aug 20 12:24:36 tornado kernel: aic7880: Ultra Wide Channel A, SCSI Id=7, 16/253 SCBs
> > Aug 20 12:24:36 tornado kernel:
> > Aug 20 12:24:36 tornado kernel: Vendor: SONY Model: CD-RW CRX145S Rev: 1.0b
> > Aug 20 12:24:36 tornado kernel: Type: CD-ROM ANSI SCSI revision: 04
> > Aug 20 12:24:36 tornado kernel: target0:0:6: Beginning Domain Validation
> > Aug 20 12:24:36 tornado kernel: target0:0:6: Domain Validation skipping write tests
> > Aug 20 12:24:36 tornado kernel: target0:0:6: FAST-10 SCSI 10.0 MB/s ST (100 ns, offset 15)
> > Aug 20 12:24:36 tornado kernel: target0:0:6: Ending Domain Validation
> > Aug 20 12:24:36 tornado kernel: Device not ready.
> >
> > This has been a problem for quite a few weeks now, albeit I believe, only
> > a cosmetic one.
>
> Is some application trying to poll the device?

I wonder if hald knows something about this and is polling. However, the "Device not ready" message above occurs while the kernel is booting, before any userspace stuff has started up. Maybe hald is just being a bit aggressive in re-probing the drive after userspace launches. By all accounts, after a week of uptime the drive certainly ought to be ready; it seems to work ok ;-)

Note the extra space after 'Device' and 'not', which implies that some text is possibly missing (text which would have made it clearer exactly which device is not ready). The case-sensitive strings "Device" and "not ready" appear together in scsi_lib.c and very few other places.

> Is the device actually not ready, or is it in reality ready and working?
> ie: what happens if you stick a CD in it?

The CD can be read, and the error messages go away. They stay away even after the CD has been ejected.

> > 4. PAM is complaining about "PAM audit_open() failed: Protocol not
> > supported" and I can't log in as any user including root. I would have
> > picked this was a userspace problem, but it doesn't break with -rc5-mm1,
> > yet reproduceably breaks with -rc6-mm1. Weird.
>
> hm. How come you're able to use the machine then?

The machine was booting up ok, and things were being written to syslog. I rebooted into -rc5-mm1 to investigate, and of course could boot into -rc6-mm1 in single user mode, test, and bring services up one by one from there. Having two boxes helped too.

> Is it possible to get an strace of this failure somehow?

Not sure if this is needed anymore, as I found that the problem goes away when I compile in kernel auditing. This was not required for -rc5-mm1. Is that change intended?

reuben
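For readers following the NFS fix above: a BUG in unlock_kernel() is what one would expect when the big kernel lock is released more often than it was taken, which fits a patch that removes a second unlock_kernel() call from the nfsd exit path. Below is a minimal sketch of that bookkeeping in Python. It is a toy model, not the kernel code: it assumes a per-task depth counter that starts at -1 and an unlock that treats a negative depth as a bug, and the class and function names are made up for illustration.

```python
class BklModel:
    """Toy model of per-task big-kernel-lock depth accounting."""

    def __init__(self):
        # Assumption: -1 means "this task does not hold the lock".
        self.lock_depth = -1

    def lock_kernel(self):
        self.lock_depth += 1

    def unlock_kernel(self):
        # Assumption: releasing a lock that is not held is treated as a bug.
        if self.lock_depth < 0:
            raise RuntimeError("kernel BUG: unlock_kernel() without lock held")
        self.lock_depth -= 1


def nfsd_exit(task, extra_unlock):
    """Mimic the exit path: one lock taken, then one or two unlocks."""
    task.lock_kernel()
    if extra_unlock:
        task.unlock_kernel()  # stands in for the stray call the patch removes
    task.unlock_kernel()      # the intended release before module exit
    return task.lock_depth
```

Calling nfsd_exit(BklModel(), extra_unlock=True) raises on the second release, while extra_unlock=False leaves the depth back at -1, which is the balanced case the patch restores.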
Re: 2.6.13-rc3-mm2
On 27/07/2005 9:45 a.m., Andrew Morton wrote:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.13-rc3/2.6.13-rc3-mm2/
>
> - Lots of fixes and updates all over the place. There are probably over 100
>   patches here which need to go into 2.6.13.
>
> - A reminder that -mm commit activity may be monitored by subscribing to the
>   mm-commits list. Do
>
>   echo subscribe mm-commits | mail [EMAIL PROTECTED]

Also seeing this during boot-up:

Adding 497972k swap on /dev/sda7. Priority:1 extents:1 across:497972k
Adding 497972k swap on /dev/sdb7. Priority:1 extents:1 across:497972k
Unable to handle kernel paging request at virtual address 00316173
 printing eip:
00316173
*pde =
Oops: [#1]
SMP
last sysfs file:
Modules linked in: binfmt_misc reiser4 zlib_deflate zlib_inflate dm_mod video thermal processor hotkey fan button ac i8xx_tco i2c_i801
CPU: 0
EIP: 0060:[<00316173>] Not tainted VLI
EFLAGS: 00010202 (2.6.13-rc3-mm2)
EIP is at 0x316173
eax: dfc05d24 ebx: dfc05d24 ecx: 00316173 edx: de87
esi: de87 edi: dfc05d2c ebp: df4e5f3c esp: df4e5f30
ds: 007b es: 007b ss: 0068
Process udev (pid: 1141, threadinfo=df4e4000 task=df24ea50)
Stack: c02135a7 dfc6f0e8 c037edf4 df4e5f54 c018b5c3 de5d2bec dfc6f0e8 dfc8b1ec
       1000 df4e5f74 c018b6fe df989030 080659b0 dfc6f0fc dfc8b1ec 1000
       c018b6b8 df4e5f94 c0157c8f df4e5fa0 080659b0 dfc8b1ec fff7
Call Trace:
 [<c0103983>] show_stack+0x94/0xca
 [<c0103b37>] show_registers+0x165/0x1f9
 [<c0103d5d>] die+0x108/0x183
 [<c0318c3a>] do_page_fault+0x1ea/0x63d
 [<c0103657>] error_code+0x4f/0x54
 [<c018b5c3>] fill_read_buffer+0x2e/0x74
 [<c018b6fe>] sysfs_read_file+0x46/0x76
 [<c0157c8f>] vfs_read+0x8a/0x146
 [<c0157fd7>] sys_read+0x3d/0x64
 [<c0102ae7>] sysenter_past_esp+0x54/0x75
Code: Bad EIP value.
<6>NET: Registered protocol family 10
IPv6 over IPv4 tunneling driver

The machine continues on booting..

reuben
Re: 2.6.13-rc3-mm2
On 28/07/2005 9:10 p.m., Andrew Morton wrote:
> Reuben Farrelly <[EMAIL PROTECTED]> wrote:
> > On 27/07/2005 9:45 a.m., Andrew Morton wrote:
> > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.13-rc3/2.6.13-rc3-mm2/
> > >
> > > - Lots of fixes and updates all over the place. There are probably over
> > >   100 patches here which need to go into 2.6.13.
> > >
> > > - A reminder that -mm commit activity may be monitored by subscribing to
> > >   the mm-commits list. Do
> > >
> > >   echo subscribe mm-commits | mail [EMAIL PROTECTED]
> >
> > Also seeing this during boot-up:
>
> This was happening in earlier -mm's was it not?

Hadn't seen it anytime recently..

> > last sysfs file:
>
> grr, I need to fix that.
>
> > [<c0103983>] show_stack+0x94/0xca
> > [<c0103b37>] show_registers+0x165/0x1f9
> > [<c0103d5d>] die+0x108/0x183
> > [<c0318c3a>] do_page_fault+0x1ea/0x63d
> > [<c0103657>] error_code+0x4f/0x54
> > [<c018b5c3>] fill_read_buffer+0x2e/0x74
> > [<c018b6fe>] sysfs_read_file+0x46/0x76
>
> some dud sysfs file.

Didn't appear after a reboot of 2.6.13-rc3-mm2, and doesn't appear with 2.6.13-rc3-mm3, so not too sure what to make of it now. Will see if it reappears (box is otherwise stable).

Thanks,
reuben
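One observation on the boot-time oops discussed in this thread, offered as speculation rather than a diagnosis: the bad EIP value 0x00316173, read as little-endian bytes, is printable ASCII. That can be a hint that a function pointer was overwritten with string data, though nothing in the thread confirms it. A small sketch of the check, in Python, with a helper name made up for illustration:

```python
import struct


def pointer_as_ascii(value):
    """Return the little-endian bytes of a 32-bit value as text, or None.

    Trailing NUL bytes are dropped; the result is None unless every
    remaining byte is printable ASCII.
    """
    raw = struct.pack("<I", value).rstrip(b"\x00")
    if raw and all(0x20 <= b < 0x7F for b in raw):
        return raw.decode("ascii")
    return None


print(pointer_as_ascii(0x00316173))  # the EIP from the oops -> sa1
print(pointer_as_ascii(0xC018B5C3))  # a normal kernel text address -> None
```

An ordinary kernel address such as the fill_read_buffer return address does not decode this way, which is what makes the EIP value stand out.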
Re: 2.6.13-rc3-mm2
Hi,

On 27/07/2005 9:45 a.m., Andrew Morton wrote:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.13-rc3/2.6.13-rc3-mm2/
>
> - Lots of fixes and updates all over the place. There are probably over 100
>   patches here which need to go into 2.6.13.
>
> - A reminder that -mm commit activity may be monitored by subscribing to the
>   mm-commits list. Do
>
>   echo subscribe mm-commits | mail [EMAIL PROTECTED]
>
> Changes since 2.6.13-rc3-mm1:

A few more warnings in this one compared to -mm1, mostly in the reiser4 code:

  LD      fs/ramfs/ramfs.o
  LD      fs/ramfs/built-in.o
  LD      fs/reiser4/built-in.o
  CC [M]  fs/reiser4/debug.o
In file included from fs/reiser4/plugin/plugin.h:26,
                 from fs/reiser4/jnode.h:19,
                 from fs/reiser4/lock.h:16,
                 from fs/reiser4/context.h:15,
                 from fs/reiser4/debug.c:32:
fs/reiser4/plugin/node/node40.h:83:5: warning: "GUESS_EXISTS" is not defined
  CC [M]  fs/reiser4/jnode.o

This appears about 20 or so times during this part of the compilation, but it never quite bombs out.

And this one:

In file included from fs/reiser4/plugin/plugin.h:26,
                 from fs/reiser4/jnode.h:19,
                 from fs/reiser4/seal.c:42:
fs/reiser4/plugin/node/node40.h:83:5: warning: "GUESS_EXISTS" is not defined
fs/reiser4/seal.c:212:5: warning: "REISER4_DEBUG_OUTPUT" is not defined
  CC [M]  fs/reiser4/dscale.o
  CC [M]  fs/reiser4/flush_queue.o

  CC      net/ipv4/netfilter/ip_conntrack_core.o
net/ipv4/netfilter/ip_conntrack_core.c:726:5: warning: "CONFIG_IP_NF_CONNTRACK_MARK" is not defined
  CC      net/ipv4/netfilter/ip_conntrack_proto_generic.o

  CC      drivers/scsi/aic7xxx/aic7xxx_core.o
In file included from drivers/scsi/aic7xxx/aic7xxx_core.c:48:
drivers/scsi/aic7xxx/aicasm/aicasm_insformat.h:46:5: warning: "BYTE_ORDER" is not defined
drivers/scsi/aic7xxx/aicasm/aicasm_insformat.h:46:19: warning: "LITTLE_ENDIAN" is not defined
drivers/scsi/aic7xxx/aicasm/aicasm_insformat.h:64:5: warning: "BYTE_ORDER" is not defined
drivers/scsi/aic7xxx/aicasm/aicasm_insformat.h:64:19: warning: "LITTLE_ENDIAN" is not defined
drivers/scsi/aic7xxx/aicasm/aicasm_insformat.h:82:5: warning: "BYTE_ORDER" is not defined
drivers/scsi/aic7xxx/aicasm/aicasm_insformat.h:82:19: warning: "LITTLE_ENDIAN" is not defined

reuben
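The warnings in this report all have the same shape (a preprocessor test on a macro that is not defined), so a build log can be summarised per macro rather than read line by line. A minimal sketch in Python, assuming the log is available as a string and that each warning follows the `file:line:col: warning: "NAME" is not defined` form reported in this thread; the function name is hypothetical:

```python
import re
from collections import Counter

# Matches lines such as:
#   fs/reiser4/seal.c:212:5: warning: "REISER4_DEBUG_OUTPUT" is not defined
WARNING_RE = re.compile(
    r'(?P<file>\S+):(?P<line>\d+):(?P<col>\d+): '
    r'warning: "(?P<macro>\w+)" is not defined'
)


def count_undefined_macros(log_text):
    """Count 'is not defined' warnings per macro name in a build log."""
    return Counter(m.group("macro") for m in WARNING_RE.finditer(log_text))


sample_log = '''\
fs/reiser4/plugin/node/node40.h:83:5: warning: "GUESS_EXISTS" is not defined
fs/reiser4/seal.c:212:5: warning: "REISER4_DEBUG_OUTPUT" is not defined
drivers/scsi/aic7xxx/aicasm/aicasm_insformat.h:46:5: warning: "BYTE_ORDER" is not defined
drivers/scsi/aic7xxx/aicasm/aicasm_insformat.h:64:5: warning: "BYTE_ORDER" is not defined
'''

for macro, count in count_undefined_macros(sample_log).most_common():
    print(macro, count)
```

Grouping by macro rather than by file makes it easier to see that a single header (node40.h here) accounts for most of the repeats.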
Re: 2.6.12-rc2-mm1
Hi again

At 12:14 a.m. 6/04/2005, Adrian Bunk wrote:
> On Tue, Apr 05, 2005 at 08:34:11PM +1200, Reuben Farrelly wrote:
> > Hi,
>
> Hi Reuben,
>
> >...
> > Hrm. Something changed between the last -mm release which compiled
> > through, and this one..
> >...
> >   LD      .tmp_vmlinux1
> > arch/i386/kernel/built-in.o(.init.text+0x1823): In function `setup_arch':
> > : undefined reference to `acpi_boot_table_init'
> > arch/i386/kernel/built-in.o(.init.text+0x1828): In function `setup_arch':
> > : undefined reference to `acpi_boot_init'
> > make: *** [.tmp_vmlinux1] Error 1
> > [EMAIL PROTECTED] linux-2.6]#
> >
> > Backing out bk-acpi.patch works around it..
>
> Please send your .config .

I have just figured out that it seems to be caused by having ACPI disabled in .config; once I re-enabled ACPI the build problem went away.

Config attached anyway. I imagine the problem is quite reproducible..

Reuben

.config
Description: Binary data
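Since the link failure turned out to depend on ACPI being disabled, it can help to check a .config for that option before building. A minimal sketch in Python, assuming the usual .config conventions (`CONFIG_FOO=y` when set, `# CONFIG_FOO is not set` when disabled); the function name is hypothetical and the sample text is invented for illustration, not taken from the attached config:

```python
def config_option_state(config_text, option):
    """Return the value of `option` in a .config, or None if unset/absent."""
    for line in config_text.splitlines():
        line = line.strip()
        # Only an exact "OPTION=" prefix counts, so CONFIG_ACPI does not
        # accidentally match CONFIG_ACPI_BOOT and similar names.
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    return None


# Invented sample: ACPI disabled in one config, enabled in the other.
acpi_off = "CONFIG_X86=y\n# CONFIG_ACPI is not set\n"
acpi_on = "CONFIG_X86=y\nCONFIG_ACPI=y\nCONFIG_ACPI_BOOT=y\n"

print(config_option_state(acpi_off, "CONFIG_ACPI"))  # None
print(config_option_state(acpi_on, "CONFIG_ACPI"))   # y
```

A None result for CONFIG_ACPI would correspond to the configuration that triggered the undefined references to acpi_boot_table_init and acpi_boot_init in this report.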
Re: 2.6.12-rc2-mm1
Hi,

Andrew Morton wrote:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc2/2.6.12-rc2-mm1/
>
> - x86 NMI handling seems to be bust in 2.6.12-rc2. Try using
>   `nmi_watchdog=0' if you experience weird crashes.
>
> - The possible kernel-timer related hangs might possibly be fixed. We
>   haven't heard yet.
>
> - Nobody said anything about the PM resume and DRI behaviour in
>   2.6.12-rc1-mm4. So it's all perfect now?
>
> - Various fixes and updates. Nothing earth-shattering.
>
> Changes since 2.6.12-rc1-mm4:
>
>  bk-acpi.patch
>  bk-agpgart.patch
>  bk-cifs.patch
>  bk-cpufreq.patch
>  bk-cryptodev.patch
>  bk-driver-core.patch
>  bk-drm.patch
>  bk-drm-via.patch
>  bk-ia64.patch
>  bk-audit.patch
>  bk-input.patch
>  bk-jfs.patch
>  bk-kbuild.patch
>  bk-mtd.patch
>  bk-netdev.patch
>  bk-nfs.patch
>  bk-ntfs.patch
>  bk-scsi.patch
>  bk-watchdog.patch
>
>  Latest versions of subsystem trees

Hrm. Something changed between the last -mm release which compiled through, and this one..

  CHK     include/linux/compile.h
  CHK     usr/initramfs_list
  GEN     .version
  CHK     include/linux/compile.h
  UPD     include/linux/compile.h
  CC      init/version.o
  LD      init/built-in.o
  LD      .tmp_vmlinux1
arch/i386/kernel/built-in.o(.init.text+0x1823): In function `setup_arch':
: undefined reference to `acpi_boot_table_init'
arch/i386/kernel/built-in.o(.init.text+0x1828): In function `setup_arch':
: undefined reference to `acpi_boot_init'
make: *** [.tmp_vmlinux1] Error 1
[EMAIL PROTECTED] linux-2.6]#

Backing out bk-acpi.patch works around it..

reuben
Re: 2.6.12-rc1-mm3
Hi Dmitry and others, At 06:41 a.m. 31/03/2005, Dmitry Torokhov wrote: On Monday 28 March 2005 06:02, Russell King wrote: > Looks like something in the input layer went bang. The code in > serport_ldisc_write_wakeup is: > >0: 8b 80 a8 09 00 00 mov0x9a8(%eax),%eax >6: 8b 40 14mov0x14(%eax),%eax >9: 8b 50 70mov0x70(%eax),%edx < >c: 85 d2 test %edx,%edx >e: 74 09 je 0x19 > > and the marked line exploded on you. The above instructions correspond > with: > > 0: struct serport *sp = (struct serport *) tty->disc_data; > 6: serio_drv_write_wakeup(sp->serio); > 9: if (serio->drv > > So, "serio" was this strange 0xf3a6cdf8 value. But why? One for the > input people I think. Reuben, could you please try the patch below? Thanks! Russell, could you please tell me if ldisc->write_wakeup (tty_wakwup) and ldisc->read are allowed to be called from an IRQ context? IOW I wonder if I can use spil_lock_bh instead of spil_lock_irqsave to protect serport flags. -- Dmitry serport.c | 98 +++--- 1 files changed, 68 insertions(+), 30 deletions(-) Index: dtor/drivers/input/serio/serport.c === --- dtor.orig/drivers/input/serio/serport.c +++ dtor/drivers/input/serio/serport.c @@ -27,11 +27,15 @@ MODULE_LICENSE("GPL"); MODULE_ALIAS_LDISC(N_MOUSE); I've done some testing this afternoon and it seems that this patch fixes the problem in -mm4. I don't even have a serial mouse/keyboard, but do have a serial PCI card onboard. The box has a USB connection to a Belkin KVM instead of directly attached input devices. I also note that it is occurring on kernel-smp-2.6.11-1.1219_FC4 - so it is probably a problem in mainline as well as -mm. 
Now I'm crashing a bit further through the shutdown, here's the stacktrace: INIT: Sending processes the TERM signal Stopping yum: Disabling nightly yum update: [ OK ] [ OK ] Stopping cups-config-daemon: [ OK ] Stopping HAL daemon: [ OK ] Stopping system message bus: [ OK ] Stopping atd: [ OK ] Stopping cups: [ OK ] Shutting down xfs: [ OK ] [ OK ] down console mouse services: [ OK ] Shutting down NFS mountd: [ OK ] Shutting down NFS daemon: nfsd: last server has exited nfsd: unexporting all filesystems RPC: error 5 connecting to server localhost RPC: failed to contact portmap (errno -5). Unable to handle kernel paging request at virtual address f2826d2c printing eip: c01337a9 *pde = Oops: [#1] SMP DEBUG_PAGEALLOC Modules linked in: nfsd exportfs md5 ipv6 lp snd_usb_audio snd_usb_lib pwc video dev usb_storage autofs4 eeprom lm85 i2c_sensor rfcomm l2cap bluetooth nfs lockd sunrpc dm_mod video button battery ac ohci1394 ieee1394 uhci_hcd ehci_hcd parpor t_serial parport_pc parport hw_random i2c_i801 i2c_core emu10k1_gp gameport snd_ emu10k1 snd_rawmidi snd_seq_device snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_ pcm snd_timer snd_page_alloc snd_util_mem snd_hwdep snd soundcore e100 mii flopp y ext3 jbd ata_piix libata sd_mod scsi_mod CPU:0 EIP:0060:[]Not tainted VLI EFLAGS: 00010087 (2.6.12-rc1-mm4) EIP is at worker_thread+0x149/0x230 eax: 0001 ebx: 0212 ecx: f7eb4018 edx: f2826d20 esi: f2826d24 edi: f7eb4000 ebp: esp: f7e83f7c ds: 007b es: 007b ss: 0068 Process events/0 (pid: 8, threadinfo=f7e83000 task=f7fefad0) Stack: f7eb4028 f7eb4010 f7eb4018 f7e83000 f2826d20 c014f4b0 0001 000f41fa 0001 f7fefad0 c011ea50 00100100 00200200 fffc f7e46f54 f7eb4000 c0133660 c0137694 Call Trace: [] cache_reap+0x0/0x240 [] default_wake_function+0x0/0x10 [] worker_thread+0x0/0x230 [] kthread+0x94/0xa0 [] kthread+0x0/0xa0 [] kernel_thread_helper+0x5/0x10 Code: 00 00 89 f8 e8 19 e3 1e 00 89 c3 8b 47 40 40 89 47 40 83 f8 03 0f 8f bd 00 00 00 8b 77 10 3b 74 24 04 74 71 8d 56 fc 89 54 24 10 
<8b> 42 0c 89 44 24 14 8b 6a 10 8b 46 04 8b 16 89 10 89 36 89 42 [ OK ] Shutting down NFS quotas: [FAILED] Shutting down NFS services: [ OK ] Stopping sshd: [ OK ] Stopping postfix: Shutting down postfix: <3>BUG: soft lockup detected on CPU#0! Pid: 3413, comm: rpc.rquotad EIP: 0060:[] CPU: 0 EIP is at _spin_lock_irqsave+0x20/0x50 EFLAGS: 0286Not tainted (2.6.12-rc1-mm4) EAX: f7eb4000 EBX: 0246 ECX: f7eb4000 EDX: c22021a0 ESI: f7eb4000 EDI: c22021a0 EBP: c01335b0 DS: 007b ES: 007b CR0: 8005003b CR2: 800147fc CR3: 37256d20 CR4: 06e0 [] __queue_work+0xc/0x50 [] run_timer_softirq+0xd7/0x1c0 [] __do_softirq+0x80/0x100 [] do_softirq+0x4b/0x50 === [] apic_timer_interrupt+0x1c/0x30 [] kfree_skbmem+0x8/0x20 [] cpufreq_governor+0x3b/0x50 [] kfree+0x62/0x90 [] kfree_skbmem+0x8/0x20 [] __kfree_skb+0xdc/0x1a0 [] netlink_recvmsg+0xf1/0x230 []
Re: 2.6.12-rc1-mm3
Hi Dmitry and others,

At 06:41 a.m. 31/03/2005, Dmitry Torokhov wrote:
> On Monday 28 March 2005 06:02, Russell King wrote:
> > Looks like something in the input layer went bang. The code in
> > serport_ldisc_write_wakeup is:
> >
> >    0:   8b 80 a8 09 00 00    mov    0x9a8(%eax),%eax
> >    6:   8b 40 14             mov    0x14(%eax),%eax
> >    9:   8b 50 70             mov    0x70(%eax),%edx
> >    c:   85 d2                test   %edx,%edx
> >    e:   74 09                je     0x19
> >
> > and the marked line exploded on you. The above instructions correspond
> > with:
> >
> >    0: struct serport *sp = (struct serport *) tty->disc_data;
> >    6: serio_drv_write_wakeup(sp->serio);
> >    9: if (serio->drv
> >
> > So, serio was this strange 0xf3a6cdf8 value. But why? One for the
> > input people I think.
>
> Reuben, could you please try the patch below? Thanks!
>
> Russell, could you please tell me if ldisc->write_wakeup (tty_wakeup)
> and ldisc->read are allowed to be called from an IRQ context? IOW I
> wonder if I can use spin_lock_bh instead of spin_lock_irqsave to
> protect serport flags.
>
> --
> Dmitry
>
>  serport.c |   98 +++---
>  1 files changed, 68 insertions(+), 30 deletions(-)
>
> Index: dtor/drivers/input/serio/serport.c
> ===
> --- dtor.orig/drivers/input/serio/serport.c
> +++ dtor/drivers/input/serio/serport.c
> @@ -27,11 +27,15 @@ MODULE_LICENSE("GPL");
>  MODULE_ALIAS_LDISC(N_MOUSE);

I've done some testing this afternoon and it seems that this patch fixes the problem in -mm4. I don't even have a serial mouse/keyboard, but I do have a serial PCI card onboard. The box has a USB connection to a Belkin KVM instead of directly attached input devices.

I also note that it is occurring on kernel-smp-2.6.11-1.1219_FC4, so it is probably a problem in mainline as well as -mm.
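For anyone following along, the three loads Russell lists can be checked against the register dump in the original -mm3 report. What follows is only a minimal C sketch, not kernel code: the displacements and the eax value come from the oops, while the names `load_step`, `effective_address` and `faulting_address` are made up for illustration, and 32-bit x86 addressing is assumed.

```c
#include <stdint.h>

/* One load in the pointer chain: `mov disp(%eax), ...` on 32-bit x86. */
struct load_step {
    uint32_t disp;          /* displacement taken from the disassembly */
    const char *expression; /* C expression Russell matched it to */
};

/* The three loads at offsets 0, 6 and 9 of serport_ldisc_write_wakeup. */
static const struct load_step serport_wakeup_steps[] = {
    { 0x9a8, "tty->disc_data" },
    { 0x14,  "sp->serio" },
    { 0x70,  "serio->drv" },
};

/* Address read by a `mov disp(%reg), ...` load (hypothetical helper). */
uint32_t effective_address(uint32_t reg, uint32_t disp)
{
    return reg + disp;
}

/* Address read by the third load, given the eax value shown in the oops. */
uint32_t faulting_address(uint32_t eax)
{
    return effective_address(eax, serport_wakeup_steps[2].disp);
}
```

With eax = 0xf3a6cdf8 this gives 0xf3a6ce68, which matches the "virtual address f3a6ce68" line of the -mm3 oops. That is consistent with the fault being the read of serio->drv through a stale serio pointer, as Russell suggests.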
Now I'm crashing a bit further through the shutdown, here's the stacktrace:

INIT: Sending processes the TERM signal
Stopping yum: Disabling nightly yum update: [ OK ] [ OK ]
Stopping cups-config-daemon: [ OK ]
Stopping HAL daemon: [ OK ]
Stopping system message bus: [ OK ]
Stopping atd: [ OK ]
Stopping cups: [ OK ]
Shutting down xfs: [ OK ]
[ OK ] down console mouse services: [ OK ]
Shutting down NFS mountd: [ OK ]
Shutting down NFS daemon: nfsd: last server has exited
nfsd: unexporting all filesystems
RPC: error 5 connecting to server localhost
RPC: failed to contact portmap (errno -5).
Unable to handle kernel paging request at virtual address f2826d2c
 printing eip:
c01337a9
*pde =
Oops: [#1]
SMP DEBUG_PAGEALLOC
Modules linked in: nfsd exportfs md5 ipv6 lp snd_usb_audio snd_usb_lib pwc
 videodev usb_storage autofs4 eeprom lm85 i2c_sensor rfcomm l2cap bluetooth
 nfs lockd sunrpc dm_mod video button battery ac ohci1394 ieee1394 uhci_hcd
 ehci_hcd parport_serial parport_pc parport hw_random i2c_i801 i2c_core
 emu10k1_gp gameport snd_emu10k1 snd_rawmidi snd_seq_device snd_ac97_codec
 snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_util_mem
 snd_hwdep snd soundcore e100 mii floppy ext3 jbd ata_piix libata sd_mod
 scsi_mod
CPU:    0
EIP:    0060:[<c01337a9>]    Not tainted VLI
EFLAGS: 00010087   (2.6.12-rc1-mm4)
EIP is at worker_thread+0x149/0x230
eax: 0001   ebx: 0212   ecx: f7eb4018   edx: f2826d20
esi: f2826d24   edi: f7eb4000   ebp:   esp: f7e83f7c
ds: 007b   es: 007b   ss: 0068
Process events/0 (pid: 8, threadinfo=f7e83000 task=f7fefad0)
Stack: f7eb4028 f7eb4010 f7eb4018 f7e83000 f2826d20 c014f4b0 0001 000f41fa
       0001 f7fefad0 c011ea50 00100100 00200200 fffc f7e46f54 f7eb4000
       c0133660 c0137694
Call Trace:
 [<c014f4b0>] cache_reap+0x0/0x240
 [<c011ea50>] default_wake_function+0x0/0x10
 [<c0133660>] worker_thread+0x0/0x230
 [<c0137694>] kthread+0x94/0xa0
 [<c0137600>] kthread+0x0/0xa0
 [<c01023f5>] kernel_thread_helper+0x5/0x10
Code: 00 00 89 f8 e8 19 e3 1e 00 89 c3 8b 47 40 40 89 47 40 83 f8 03 0f 8f bd 00 00 00 8b 77 10 3b 74 24 04 74 71 8d 56 fc 89 54 24 10 <8b> 42 0c 89 44 24 14 8b 6a 10 8b 46 04 8b 16 89 10 89 36 89 42
[ OK ]
Shutting down NFS quotas: [FAILED]
Shutting down NFS services: [ OK ]
Stopping sshd: [ OK ]
Stopping postfix: Shutting down postfix: <3>BUG: soft lockup detected on CPU#0!

Pid: 3413, comm: rpc.rquotad
EIP: 0060:[<c0321ac0>] CPU: 0
EIP is at _spin_lock_irqsave+0x20/0x50
 EFLAGS: 0286    Not tainted  (2.6.12-rc1-mm4)
EAX: f7eb4000 EBX: 0246 ECX: f7eb4000 EDX: c22021a0
ESI: f7eb4000 EDI: c22021a0 EBP: c01335b0 DS: 007b ES: 007b
CR0: 8005003b CR2: 800147fc CR3: 37256d20 CR4: 06e0
 [<c013350c>] __queue_work+0xc/0x50
 [<c012cc17>] run_timer_softirq+0xd7/0x1c0
 [<c0128950>] __do_softirq+0x80/0x100
 [<c0106adb>] do_softirq+0x4b/0x50
 ===
 [<c010511c>] apic_timer_interrupt+0x1c/0x30
 [<c02b7ed8>] kfree_skbmem+0x8/0x20
 [<c02b007b>] cpufreq_governor+0x3b/0x50
 [<c014eed2>] kfree+0x62/0x90
 [<c02b7ed8>] kfree_skbmem+0x8/0x20
 [] __kfree_skb+0xdc/0x1a0
 [] netlink_recvmsg+0xf1/0x230
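The worker_thread oops above can be sanity-checked in the same spirit as Russell's decoding. The Code: line contains `8d 56 fc` (lea -0x4(%esi),%edx) followed by the bytes `8b 42 0c` (mov 0xc(%edx),%eax), which looks like a list_entry() step followed by a read of a member at offset 0xc. The sketch below assumes that `struct work_struct` in this kernel begins with `pending`, `entry`, `func`, `data` in that order; that layout is recalled from 2.6-era workqueue.h and is not confirmed anywhere in this thread. The struct and function names with a `32` suffix or `_address` in them are hypothetical 32-bit stand-ins.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit stand-in for struct list_head (two pointers). */
struct list_head32 {
    uint32_t next;
    uint32_t prev;
};

/* Assumed start of struct work_struct on 32-bit x86 (not confirmed here). */
struct work_struct32 {
    uint32_t pending;           /* unsigned long pending      */
    struct list_head32 entry;   /* struct list_head entry     */
    uint32_t func;              /* void (*func)(void *)       */
    uint32_t data;              /* void *data                 */
};

/* list_entry(): step back from the embedded list node to its container. */
uint32_t work_from_entry(uint32_t entry_addr)
{
    return entry_addr - (uint32_t)offsetof(struct work_struct32, entry);
}

/* Address that a read of work->func would touch. */
uint32_t func_field_address(uint32_t work_addr)
{
    return work_addr + (uint32_t)offsetof(struct work_struct32, func);
}
```

Feeding in esi = f2826d24 gives f2826d20, the value in edx, and reading func from there touches f2826d2c, the faulting address in the oops. Under the stated layout assumption, that would suggest the workqueue was walking a work item whose memory had already been freed (DEBUG_PAGEALLOC is enabled), but this is a guess and not a diagnosis.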
Re: 2.6.12-rc1-mm3
Reuben Farrelly wrote:
> I'm repeatably getting this crash on shutdown in -mm3, and a few releases
> earlier (but I can't be certain it was the same crash..)
>
> Serial: 8250/16550 driver $Revision: 1.90 $ 8 ports, IRQ sharing enabled
> ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
> ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
> ttyS4 at I/O 0xa400 (irq = 16) is a 16550A
> ttyS5 at I/O 0xa408 (irq = 16) is a 16550A
>
> This _may_ be the culprit, but I'm not sure:
>
> 03:03.0 Serial controller: Timedia Technology Co Ltd PCI2S550 (Dual 16550 UART) (rev 01) (prog-if 02 [16550])
>         Subsystem: Timedia Technology Co Ltd: Unknown device 0002
>         Flags: stepping, medium devsel, IRQ 16
>         I/O ports at a400 [size=32]

Ugh. I'm an idiot, that will teach me for having two sessions to boxes running at once. Wrong info above, but the trace is still valid. Correct info follows:

ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
ttyS14 at I/O 0xb400 (irq = 217) is a 16550A
ttyS15 at I/O 0xb000 (irq = 217) is a 16550A

06:02.0 Serial controller: NetMos Technology PCI 9835 Multi-I/O Controller (rev 01) (prog-if 02 [16550])
        Subsystem: LSI Logic / Symbios Logic 2S (16C550 UART)
        Flags: medium devsel, IRQ 217
        I/O ports at b400 [size=8]
        I/O ports at b000 [size=8]
        I/O ports at ac00 [size=8]
        I/O ports at a800 [size=8]
        I/O ports at a400 [size=8]
        I/O ports at a000 [size=16]

> The board is an Intel D925XCV.
>
> Shutdown goes like this: (yes, hyperterminal sucks for the ^M characters, sorry)

<trace omitted>

reuben
Re: 2.6.12-rc1-mm3
Hi,

Andrew Morton wrote:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc1/2.6.12-rc1-mm3/
>
> - Mainly a bunch of fixes relative to 2.6.12-rc1-mm2.
>
> - Again, we'd like people who have had recent DRM and USB resume problems
>   to test and report, please.
>
> - The bk-ide-dev tree is back after a couple of weeks of difficulties.
>
> - Jeff asks that anyone who has had problems with the Silicon Image SATA
>   drivers test sata_sil-corruption--lockup-fix.patch, which is included in
>   this kernel.

I'm repeatably getting this crash on shutdown in -mm3, and a few releases earlier (but I can't be certain it was the same crash..)

Serial: 8250/16550 driver $Revision: 1.90 $ 8 ports, IRQ sharing enabled
ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
ttyS4 at I/O 0xa400 (irq = 16) is a 16550A
ttyS5 at I/O 0xa408 (irq = 16) is a 16550A

This _may_ be the culprit, but I'm not sure:

03:03.0 Serial controller: Timedia Technology Co Ltd PCI2S550 (Dual 16550 UART) (rev 01) (prog-if 02 [16550])
        Subsystem: Timedia Technology Co Ltd: Unknown device 0002
        Flags: stepping, medium devsel, IRQ 16
        I/O ports at a400 [size=32]

The board is an Intel D925XCV.

Shutdown goes like this: (yes, hyperterminal sucks for the ^M characters, sorry)

INIT: Switching^MINIT: Sending processes the TERM signal
Stopping yum: Disabling nightly yum update: [ OK ] [ OK ]
Stopping cups-config-daemon: [ OK ]
Stopping HAL daemon: [ OK ]
Stopping system message bus: [ OK ]
Stopping atd: [ OK ]
Stopping cups: [ OK ]
Shutting down xfs: [ OK ]
Shutting down console mouse services: [ OK ]
Unable to handle kernel paging request at virtual address f3a6ce68
 printing eip:
c0244109
*pde =
Oops: [#1]
SMP DEBUG_PAGEALLOC
Modules linked in: hidp hci_usb sermouse nfsd exportfs md5 ipv6 lp autofs4
 eeprom lm85 i2c_sensor rfcomm l2cap bluetooth nfs lockd sunrpc usb_storage
 pwc videodev dm_mod video button battery ac ohci1394 ieee1394 uhci_hcd
 ehci_hcd parport_serial parport_pc parport hw_random i2c_i801 i2c_core
 emu10k1_gp gameport e100 mii floppy ext3 jbd ata_piix libata sd_mod scsi_mod
CPU:    0
EIP:    0060:[<c0244109>]    Not tainted VLI
EFLAGS: 00010286   (2.6.12-rc1-mm3)
EIP is at serport_ldisc_write_wakeup+0x9/0x20
eax: f3a6cdf8   ebx: f73d7000   ecx: c038e374   edx: c0244100
esi: f73d700c   edi: f73d7000   ebp: c049e900   esp: f7568dc0
ds: 007b   es: 007b   ss: 0068
Process inputattach (pid: 2932, threadinfo=f7568000 task=f6993ac0)
Stack: c021bb08 0286 f6c31000 c0245e4a f6c31018 f73d7000 f67c1e88 cbff5c
       c021ceaa c1e46000 c1e46000 c011b739 0046 c1e46000 0001 f2c0
       f2c0 c011b8b4
Call Trace:
^M [<c021bb08>] tty_wakeup+0x48/0x70
^M [<c0245e4a>] uart_close+0xca/0x1e0
^M [<c021ceaa>] release_dev+0x14a/0x750
^M [<c011b739>] change_page_attr+0x29/0x60
^M [<c011b8b4>] kernel_map_pages+0x84/0xa0
^M [<c014cbca>] store_stackinfo+0x5a/0x90
^M [<c01664c8>] __fput+0x108/0x180
^M [<c018b59b>] inotify_inode_queue_event+0x2b/0x40
^M [<c021d97f>] tty_release+0xf/0x20
^M [<c016644a>] __fput+0x8a/0x180
^M [<c0164d7b>] filp_close+0x4b/0x70
^M [<c0125254>] put_files_struct+0x74/0x100
^M [<c012610c>] do_exit+0x11c/0x420
^M [<c012647d>] do_group_exit+0x2d/0xa0
^M [<c012f74c>] get_signal_to_deliver+0x20c/0x310
^M [<c0103deb>] do_signal+0x5b/0x140
^M [<c011ea89>] __wake_up+0x29/0x40
^M [<c021b60c>] tty_ldisc_deref+0x3c/0x70
^M [<c021c267>] tty_read+0xc7/0x130
^M [<c0243fb0>] serport_ldisc_read+0x0/0x100
^M [<c016ecd3>] sys_fstat64+0x23/0x30
^M [<c021c1a0>] tty_read+0x0/0x130
^M [<c0165547>] vfs_read+0x97/0x140
^M [<c016585c>] sys_read+0x3c/0x70
^M [<c0103efa>] do_notify_resume+0x2a/0x40
^M [<c01040be>] work_notifysig+0x13/0x25
^MCode: e8 0f b6 c5 88 4b 4b 31 d2 c1 e9 10 88 43 4a 88 4b 49 89 d0 5b c3 8d b6 00 00 00 00 8d bf 00 00 00 00 8b 80 a8 09 00 00 8b 40 14 <8b> 50 70 85 d2 74 09 8b 52 10 85 d2 74 02 ff d2 c3 90 90 90 90
^M BUG: atomic counter underflow at:
^M [<c0126386>] do_exit+0x396/0x420
^M [<c01059f6>] die+0x166/0x170
^M [<c011a7a3>] do_page_fault+0x1f3/0x6a1
^M [<c0244109>] serport_ldisc_write_wakeup+0x9/0x20
^M [<c011b36c>] __change_page_attr+0x4c/0x3f0
^M [<c011a5b0>] do_page_fault+0x0/0x6a1
^M [<c010522f>] error_code+0x4f/0x60
^M [<c0244100>] serport_ldisc_write_wakeup+0x0/0x20
^M [<c0244109>] serport_ldisc_write_wakeup+0x9/0x20
^M [<c021bb08>] tty_wakeup+0x48/0x70
^M [<c0245e4a>] uart_close+0xca/0x1e0
^M [<c021ceaa>] release_dev+0x14a/0x750
^M [<c011b739>] change_page_attr+0x29/0x60
^M [<c011b8b4>] kernel_map_pages+0x84/0xa0
^M [<c014cbca>] store_stackinfo+0x5a/0x90
^M [<c01664c8>] __fput+0x108/0x180
^M [<c018b59b>] inotify_inode_queue_event+0x2b/0x40
^M [<c021d97f>] tty_release+0xf/0x20
^M [<c016644a>] __fput+0x8a/0x180
^M [<c0164d7b>] filp_close+0x4b/0x70
^M [<c0125254>] put_files_struct+0x74/0x100
^M [<c012610c>] do_exit+0x11c/0x420
^M [<c012647d>] do_group_exit+0x2d/0xa0
^M [<c012f74c>] get_signal_to_deliver+0x20c/0x310
^M [<c0103deb>] do_signal+0x5b/0x140
^M [<c011ea89>] __wake_up+0x29/0x40
^M [<c021b60c>] tty_ldisc_deref+0x3c/0x70
^M [<c021c267>] tty_read+0xc7/0x130
^M [<c0243fb0>] serport_ldisc_read+0x0/0x100
^M [<c016ecd3>] sys_fstat64+0x23/0x30
^M [<c021c1a0>] tty_read+0x0/0x130
^M [<c0165547>] vfs_read+0x97/0x140
^M [<c016585c>] sys_read+0x3c/0x70
^M [<c0103efa>] do_notify_resume+0x2a/0x40
^M [<c01040be>] work_notifysig+0x13/0x25
^MUnable to handle kernel NULL pointer dereference at virtual address 0020
^M printing eip:
^Mc0121320
^M*pde = 0041f001
^MOops: [#2]
^MSMP
Re: 2.6.12-rc1-mm2
Hi,

Andrew Morton wrote:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc1/2.6.12-rc1-mm2/
>
> - Added David Miller's networking tree to the -mm lineup as bk-net.patch.
>
> - Added Herbert Xu's crypto development tree to the -mm lineup as
>   bk-cryptodev.patch.
>
>   -mm kernels now aggregate Linus's tree and 34 subsystem trees. Usually
>   they are pulled 3-4 hours before the release of the -mm kernel. Usually
>   it is possible to determine the latest cset from each tree by looking at
>   the first couple of lines of the relevant patch in the broken-out/
>   directory. Although sometimes it isn't there if I had to massage the diff.
>
> - There may be an x86_64 problem here, although it works for me. If it
>   fails early in boot, try reverting
>   x86_64-separate-amd-cmp-detection-from-hyper-threading.patch
>
> - There's some work here on the recent USB PM resume bugs. If you had
>   problems there, please test and be sure to cc
>   linux-usb-devel@lists.sourceforge.net in any reports.
>
> - Some fixes for the recent DRM problems.
>
> - Big DVB update
>
> - md updates
>
> - nfs4 server updates
>
> - Lots more fixes
>
> - Lots more bugs.

Fails to compile for me:

  CC [M]  fs/nfs/dir.o
  CC [M]  fs/nfs/inode.o
  CC [M]  fs/nfs/nfs4proc.o
fs/nfs/nfs4proc.c:2976: error: static declaration of 'nfs4_file_inode_operations' follows non-static declaration
fs/nfs/nfs4_fs.h:179: error: previous declaration of 'nfs4_file_inode_operations' was here
make[2]: *** [fs/nfs/nfs4proc.o] Error 1
make[1]: *** [fs/nfs] Error 2
make: *** [fs] Error 2

I needed to remove this line:

extern struct inode_operations nfs4_file_inode_operations;

from fs/nfs/nfs4_fs.h. Patch attached.

Reuben

--- fs/nfs/nfs4_fs.h    2005-03-25 11:40:51.0 +1200
+++ fs/nfs/nfs4_fs.h    2005-03-25 11:44:28.0 +1200
@@ -176,7 +176,6 @@
 extern struct dentry_operations nfs4_dentry_operations;
 extern struct inode_operations nfs4_dir_inode_operations;
-extern struct inode_operations nfs4_file_inode_operations;
 
 /* inode.c */
 extern ssize_t nfs4_getxattr(struct dentry *, const char *, void *, size_t);
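For context, the compiler message above is the C rule that an identifier already declared with external linkage cannot later be declared `static` in the same translation unit. A minimal sketch of the situation follows, assuming a cut-down, hypothetical one-member `struct inode_operations` (the real structure is much larger) and the made-up helpers `demo_getattr` and `call_file_getattr`:

```c
/* Hypothetical one-member stand-in for the kernel's struct inode_operations. */
struct inode_operations {
    int (*getattr)(void);
};

/*
 * With this declaration visible (as in nfs4_fs.h before the patch), the
 * static definition below is rejected with "static declaration of ...
 * follows non-static declaration":
 *
 *   extern struct inode_operations nfs4_file_inode_operations;
 */

static int demo_getattr(void)
{
    return 0;
}

/* File-local definition, as the error message reports for nfs4proc.c. */
static struct inode_operations nfs4_file_inode_operations = {
    .getattr = demo_getattr,
};

/* Made-up accessor so that the table is actually used. */
int call_file_getattr(void)
{
    return nfs4_file_inode_operations.getattr();
}
```

Uncommenting the extern line should reproduce the error, which is why dropping it from the header lets the build continue. Whether the header or the `static` qualifier was the intended version is not something this thread settles.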
Re: 2.6.11-mm3
At 12:42 a.m. 13/03/2005, Andrew Morton wrote:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11/2.6.11-mm3/
>
> - A new version of the "acpi poweroff fix". People who were having trouble
>   with ACPI poweroff, please test and report.
>
> - A very large update to the CFQ I/O scheduler. Treat with caution, run
>   benchmarks. Remember that the I/O scheduler can be selected on a per-disk
>   basis with
>
>       echo as > /sys/block/sda/queue/scheduler
>       echo deadline > /sys/block/sda/queue/scheduler
>       echo cfq > /sys/block/sda/queue/scheduler
>
> - video-for-linux update

Ugh, NTFS is br0ken:

  CC [M]  fs/ntfs/attrib.o
fs/ntfs/attrib.c: In function 'ntfs_attr_make_non_resident':
fs/ntfs/attrib.c:1295: warning: implicit declaration of function 'ntfs_cluster_alloc'
fs/ntfs/attrib.c:1296: error: 'DATA_ZONE' undeclared (first use in this function)
fs/ntfs/attrib.c:1296: error: (Each undeclared identifier is reported only once
fs/ntfs/attrib.c:1296: error: for each function it appears in.)
fs/ntfs/attrib.c:1296: warning: assignment makes pointer from integer without a cast
fs/ntfs/attrib.c:1435: warning: implicit declaration of function 'flush_dcache_mft_record_page'
fs/ntfs/attrib.c:1436: warning: implicit declaration of function 'mark_mft_record_dirty'
fs/ntfs/attrib.c:1443: warning: implicit declaration of function 'mark_page_accessed'
fs/ntfs/attrib.c:1521: warning: implicit declaration of function 'ntfs_cluster_free_from_rl'
make[2]: *** [fs/ntfs/attrib.o] Error 1
make[1]: *** [fs/ntfs] Error 2
make: *** [fs] Error 2

Compile goes through to completion fine if I back out bk-ntfs.patch. Using gcc-4, but this problem did not exist in -mm2.

reuben
Re: Breakage with raid in 2.6.11-rc1-mm1 [Regression in mm]
Hi,

Reuben Farrelly wrote:
> At 12:58 a.m. 15/01/2005, Andrew Morton wrote:
> > Reuben Farrelly <[EMAIL PROTECTED]> wrote:
> > >
> > > Something seems to have broken with 2.6.11-rc1-mm1, which worked ok with
> > > 2.6.10-mm3.
> > >
> > > NET: Registered protocol family 17
> > > Starting balanced_irq
> > > BIOS EDD facility v0.16 2004-Jun-25, 2 devices found
> > > md: Autodetecting RAID arrays.
> > > md: autorun ...
> > > md: ... autorun DONE.
> > > <snip>
> > > Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
> > >
> > > The system is running 5 RAID-1 partitions, and md2 is the root as per
> > > grub.conf. Problem seems to be that raid autodetection finds no raid
> > > partitions :(
> > >
> > > The two ST380013AS SATA drives are detected earlier in the boot, so I don't
> > > think that's the problem..
> >
> > hm, the only raidy thing we have in there is the below. Maybe you could
> > try reverting that?
> >
> > --- 25/drivers/md/raid5.c~raid5-overlapping-read-hack 2005-01-09 22:20:40.211246912 -0800
> > +++ 25-akpm/drivers/md/raid5.c 2005-01-09 22:20:40.216246152 -0800
> > @@ -232,6 +232,7 @@ static struct stripe_head *__find_stripe
> >  }
> >
> >  static void unplug_slaves(mddev_t *mddev);
> > +static void raid5_unplug_device(request_queue_t *q);
> >
> >  static struct stripe_head *get_active_stripe(raid5_conf_t *conf, sector_t sector,
> >                                               int pd_idx, int noblock)
>
> Ok the breakage occurred somewhere between 2.6.10-mm3 (works) and
> 2.6.11-rc1 (doesn't work), i.e. it wasn't introduced into the latest -mm
> patchset as I first thought.
>
> Are there any other patches that might be worth a try backing out?
>
> reuben

I did a full untar of the source and rebuilt my (crusty old) config file from scratch, and it seems to have come right now. Can't really explain it though... but it obviously wasn't a problem with the -mm release as I first thought.

Now running -rc1-mm1 with no problems and no other patches.

Thanks to those who helped on what turned out to be a false alarm.

reuben