Re: 2.6.24-rc4-mm1 - BUG in tcp_fragment
Ilpo Järvinen wrote:
> On Thu, 13 Dec 2007, Cedric Le Goater wrote:
>
>> I got this one while compiling on NFS.
>>
>> C.
>>
>> kernel BUG at /home/legoater/linux/2.6.24-rc4-mm1/include/net/tcp.h:1480!
>
> I'm not exactly sure what patches you have applied and which patches are
> not; with rc4-mm1 there are two patches (first one was incomplete, I
> assume you had at least that one based on your other mail) to really fix
> the issues in (__|)tcp_reset_fack_counts(...).

Yes, I only have the first patch you sent on lkml on top of 2.6.24-rc4-mm1,
attached below. I didn't see the second one on lkml?

> However, there seems to be so much breakage that I have a bit of trouble
> deciding where to start... The situation seems a bit scary :-).

My n/w environment seems to reproduce these issues quite easily. If you
need some testing, just ping me.

Cheers,

C.

> So, I might soon prepare a revert patch for most of the questionable
> TCP parts and ask Dave to apply it (and drop them fully during next
> rebase) unless I suddenly figure something out soon which explains
> all/most of the problems, then return to drawing board. ...As it seems
> that the cumulative ACK processing problem discovered later on (having
> a rather cumbersome solution with skbs only) will make part of the work
> that's currently in net-2.6.25 quite useless/duplicate effort. But thanks
> anyway for reporting these.
>

Subject: [PATCH] [TCP]: Fix fack_count miscountings (multiple places)

1) Fack_count is set incorrectly if the highest sent skb is already
sacked (the skb->prev won't return it because it's on the other list
already). These manifest as fackets_out counting errors later on; the
second-order effects are very hard to track, so it may fix all
outstanding TCP bug reports.

2) Prev == NULL check was wrong way around

3) Last skb's fack count was incorrectly skipped by the while() {} loop

Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]>
---
 include/net/tcp.h | 22 --
 1 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 9dbed0b..11a7e3e 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1337,10 +1337,20 @@ static inline struct sk_buff *tcp_send_head(struct sock *sk)
 static inline void tcp_advance_send_head(struct sock *sk, struct sk_buff *skb)
 {
 	struct sk_buff *prev = tcp_write_queue_prev(sk, skb);
+	unsigned int fc = 0;
+
+	if (prev == (struct sk_buff *)&sk->sk_write_queue)
+		prev = NULL;
+	else if (!tcp_skb_adjacent(sk, prev, skb))
+		prev = NULL;
 
-	if (prev != (struct sk_buff *)&sk->sk_write_queue)
-		TCP_SKB_CB(skb)->fack_count = TCP_SKB_CB(prev)->fack_count +
-					      tcp_skb_pcount(prev);
+	if ((prev == NULL) && !__tcp_write_queue_empty(sk, TCP_WQ_SACKED))
+		prev = __tcp_write_queue_tail(sk, TCP_WQ_SACKED);
+
+	if (prev != NULL)
+		fc = TCP_SKB_CB(prev)->fack_count + tcp_skb_pcount(prev);
+
+	TCP_SKB_CB(skb)->fack_count = fc;
 
 	sk->sk_send_head = tcp_write_queue_next(sk, skb);
 	if (sk->sk_send_head == (struct sk_buff *)&sk->sk_write_queue)
@@ -1464,7 +1474,7 @@ static inline struct sk_buff *__tcp_reset_fack_counts(struct sock *sk,
 {
 	unsigned int fc = 0;
 
-	if (prev == NULL)
+	if (prev != NULL)
 		fc = TCP_SKB_CB(*prev)->fack_count + tcp_skb_pcount(*prev);
 
 	BUG_ON((*prev != NULL) && !tcp_skb_adjacent(sk, *prev, skb));
@@ -1521,7 +1531,7 @@ static inline void tcp_reset_fack_counts(struct sock *sk, struct sk_buff *inskb)
 		skb[otherq] = prev->next;
 	}
 
-	while (skb[queue] != __tcp_write_queue_tail(sk, queue)) {
+	do {
 		/* Lazy find for the other queue */
 		if (skb[queue] == NULL) {
 			skb[queue] = tcp_write_queue_find(sk, TCP_SKB_CB(prev)->seq,
@@ -1535,7 +1545,7 @@ static inline void tcp_reset_fack_counts(struct sock *sk, struct sk_buff *inskb)
 			break;
 
 		queue ^= TCP_WQ_SACKED;
-	}
+	} while (skb[queue] != __tcp_write_queue_tail(sk, queue));
 }
 
 static inline void __tcp_insert_write_queue_after(struct sk_buff *skb,
-- 
1.5.0.6

-- 
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
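Points 2 and 3 of the changelog can be illustrated outside the kernel. The following is a minimal, hypothetical model, not the kernel code. An array stands in for the skb write queue, and the separate SACKed queue from point 1 is ignored, as is the queue alternation done by the real loop. `pcount[i]` plays the role of `tcp_skb_pcount()` and `fack[i]` the role of `fack_count`: the predecessor's count plus the predecessor's pcount, as in the patch. All function names are made up for illustration.

```c
#include <assert.h>

/*
 * Hypothetical model (not kernel code): segments live in an array
 * instead of the skb write queue.  pcount[i] stands in for
 * tcp_skb_pcount() and fack[i] for fack_count.
 */
static unsigned fack_from_prev(const unsigned fack[],
			       const unsigned pcount[], unsigned i)
{
	/* No predecessor: start from 0 (the corrected "prev != NULL" test). */
	if (i == 0)
		return 0;
	return fack[i - 1] + pcount[i - 1];
}

/* Pre-tested loop: it stops as soon as i reaches tail, so fack[tail] is
 * never recomputed, and nothing happens at all when start == tail. */
static void reset_fack_while(unsigned fack[], const unsigned pcount[],
			     unsigned start, unsigned tail)
{
	assert(start <= tail);
	unsigned i = start;
	while (i != tail) {
		fack[i] = fack_from_prev(fack, pcount, i);
		i++;
	}
}

/* Post-tested loop, like the patch's do { } while (): the body runs
 * before the tail check, so the tail segment is updated as well. */
static void reset_fack_do_while(unsigned fack[], const unsigned pcount[],
				unsigned start, unsigned tail)
{
	assert(start <= tail);
	unsigned i = start;
	do {
		fack[i] = fack_from_prev(fack, pcount, i);
	} while (i++ != tail);
}
```

Take pcount = {1, 2, 1, 3}, stale counts fack = {0, 99, 99, 99}, start = 1 and tail = 3. The while() variant leaves fack[3] at 99, while the do/while variant produces {0, 1, 3, 4}.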
Re: 2.6.24-rc4-mm1 - BUG in tcp_fragment
On Thu, 13 Dec 2007, Cedric Le Goater wrote:

> I got this one while compiling on NFS.
>
> C.
>
> kernel BUG at /home/legoater/linux/2.6.24-rc4-mm1/include/net/tcp.h:1480!

I'm not exactly sure what patches you have applied and which patches are
not; with rc4-mm1 there are two patches (first one was incomplete, I
assume you had at least that one based on your other mail) to really fix
the issues in (__|)tcp_reset_fack_counts(...).

However, there seems to be so much breakage that I have a bit of trouble
deciding where to start... The situation seems a bit scary :-).

So, I might soon prepare a revert patch for most of the questionable
TCP parts and ask Dave to apply it (and drop them fully during next
rebase) unless I suddenly figure something out soon which explains
all/most of the problems, then return to drawing board. ...As it seems
that the cumulative ACK processing problem discovered later on (having
a rather cumbersome solution with skbs only) will make part of the work
that's currently in net-2.6.25 quite useless/duplicate effort. But thanks
anyway for reporting these.

-- 
 i.
Re: 2.6.24-rc4-mm1: acpi reboots machine... solved
On Thu, Dec 13, 2007 at 09:17:18AM -0700, Bjorn Helgaas wrote:
> On Thursday 13 December 2007 12:09:23 am Borislav Petkov wrote:
> > On Wed, Dec 12, 2007 at 09:21:41AM -0700, Bjorn Helgaas wrote:
> > > On Wednesday 12 December 2007 03:11:23 am Borislav Petkov wrote:
> > > > On Tue, Dec 11, 2007 at 05:08:59PM -0700, Bjorn Helgaas wrote:
> > > > > On Tuesday 11 December 2007 01:52:55 pm Borislav Petkov wrote:
> > > > > > From what i can roughly tell so far it seems like a resource
> > > > > > conflict between acpi and the pnp requested regions in your patch
> > > > > > which result in the acpi_thermal code to read the wrong (0xff)
> > > > > > temperature value and halt the machine, but i might be wrong on
> > > > > > the details since acpi is such a big code chunk to swallow.
> > >
> > > I think Alexey is on the right track with the PCI resource allocation
> > > failure.
> >
> > Then it should be the SMBus controller, PCI id 00:1f.3, which is having
> > problems registering its io ports region 4, AFAICT.
>
> Yes, it looks like the ioport region 0x540-0x55f is described both in
> PNP and ACPI:
>
>   /sys/devices/pnp0/00:0d/resources:state = active
>   /sys/devices/pnp0/00:0d/resources:io 0x540-0x55f
>   /sys/devices/pnp0/00:0d/resources:io 0x400-0x47f
>
> 00:1f.3 SMBus: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) SMBus Controller (rev 03)
> 	Subsystem: ASUSTeK Computer Inc. Unknown device 1869
> 	Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
> 	Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
> 	Interrupt: pin B routed to IRQ 0
> 	Region 4: I/O ports at 0540 [size=32]
>
> The PCI SMBus device was enabled by a quirk, asus_hides_smbus_lpc().
>
> This quirk seems dangerous to me, and the comments above asus_hides_smbus
> allude to problems similar to what you're seeing. It's obvious that a
> lot of blood, sweat, and tears have gone into this quirk, so I'm not
> suggesting that it's time to revert it, but I would be interested in
> knowing whether the critical temperature problem goes away if we leave
> the PCI device hidden, e.g., with the following patch:
>
> Index: linux-mm/drivers/pci/quirks.c
> ===
> --- linux-mm.orig/drivers/pci/quirks.c	2007-12-13 09:11:31.0 -0700
> +++ linux-mm/drivers/pci/quirks.c	2007-12-13 09:12:27.0 -0700
> @@ -1073,12 +1073,7 @@
>  
>  	pci_read_config_word(dev, 0xF2, &val);
>  	if (val & 0x8) {
> -		pci_write_config_word(dev, 0xF2, val & (~0x8));
> -		pci_read_config_word(dev, 0xF2, &val);
> -		if (val & 0x8)
> -			printk(KERN_INFO "PCI: i801 SMBus device continues to play 'hide and seek'! 0x%x\n", val);
> -		else
> -			printk(KERN_INFO "PCI: Enabled i801 SMBus device\n");
> +		printk(KERN_INFO "PCI: Leaving i801 SMBus device hidden\n");
>  	}
>  }
>  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82801AA_0, asus_hides_smbus_lpc);

yep, this fixes it. Bootlog attached.

-- 
Regards/Gruß,
    Boris.

bootlog-smbus-hidden.bz2
Description: Binary data
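The port conflict described above can be checked mechanically. This is a minimal sketch with hypothetical helper names, not code from the kernel or from the thread. It treats ranges as inclusive, the way the PnP resources file prints them. It derives the PCI range from the base and size that lspci reports for Region 4 ("I/O ports at 0540 [size=32]").

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Inclusive I/O port range, e.g. "io 0x540-0x55f" from the PnP resources file. */
struct io_range {
	uint32_t start;
	uint32_t end;
};

/* Turn a BAR as printed by lspci (base + size) into an inclusive range. */
static struct io_range io_range_from_bar(uint32_t base, uint32_t size)
{
	assert(size > 0);
	struct io_range r = { base, base + size - 1 };
	return r;
}

/* Two inclusive ranges overlap unless one ends before the other starts. */
static bool io_ranges_overlap(struct io_range a, struct io_range b)
{
	return a.start <= b.end && b.start <= a.end;
}
```

For the values quoted above, Region 4 at 0x540 with size 32 maps to 0x540-0x55f. That coincides exactly with the first PnP range and does not touch 0x400-0x47f, which is consistent with the PCI resource allocation failure discussed in the thread.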
Re: 2.6.24-rc4-mm1: acpi reboots machine... solved
On Thursday 13 December 2007 12:09:23 am Borislav Petkov wrote:
> On Wed, Dec 12, 2007 at 09:21:41AM -0700, Bjorn Helgaas wrote:
> > On Wednesday 12 December 2007 03:11:23 am Borislav Petkov wrote:
> > > On Tue, Dec 11, 2007 at 05:08:59PM -0700, Bjorn Helgaas wrote:
> > > > On Tuesday 11 December 2007 01:52:55 pm Borislav Petkov wrote:
> > > > > From what i can roughly tell so far it seems like a resource
> > > > > conflict between acpi and the pnp requested regions in your patch
> > > > > which result in the acpi_thermal code to read the wrong (0xff)
> > > > > temperature value and halt the machine, but i might be wrong on
> > > > > the details since acpi is such a big code chunk to swallow.
> >
> > I think Alexey is on the right track with the PCI resource allocation
> > failure.
>
> Then it should be the SMBus controller, PCI id 00:1f.3, which is having
> problems registering its io ports region 4, AFAICT.

Yes, it looks like the ioport region 0x540-0x55f is described both in
PNP and ACPI:

  /sys/devices/pnp0/00:0d/resources:state = active
  /sys/devices/pnp0/00:0d/resources:io 0x540-0x55f
  /sys/devices/pnp0/00:0d/resources:io 0x400-0x47f

00:1f.3 SMBus: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) SMBus Controller (rev 03)
	Subsystem: ASUSTeK Computer Inc. Unknown device 1869
	Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Interrupt: pin B routed to IRQ 0
	Region 4: I/O ports at 0540 [size=32]

The PCI SMBus device was enabled by a quirk, asus_hides_smbus_lpc().

This quirk seems dangerous to me, and the comments above asus_hides_smbus
allude to problems similar to what you're seeing. It's obvious that a
lot of blood, sweat, and tears have gone into this quirk, so I'm not
suggesting that it's time to revert it, but I would be interested in
knowing whether the critical temperature problem goes away if we leave
the PCI device hidden, e.g., with the following patch:

Index: linux-mm/drivers/pci/quirks.c
===
--- linux-mm.orig/drivers/pci/quirks.c	2007-12-13 09:11:31.0 -0700
+++ linux-mm/drivers/pci/quirks.c	2007-12-13 09:12:27.0 -0700
@@ -1073,12 +1073,7 @@
 
 	pci_read_config_word(dev, 0xF2, &val);
 	if (val & 0x8) {
-		pci_write_config_word(dev, 0xF2, val & (~0x8));
-		pci_read_config_word(dev, 0xF2, &val);
-		if (val & 0x8)
-			printk(KERN_INFO "PCI: i801 SMBus device continues to play 'hide and seek'! 0x%x\n", val);
-		else
-			printk(KERN_INFO "PCI: Enabled i801 SMBus device\n");
+		printk(KERN_INFO "PCI: Leaving i801 SMBus device hidden\n");
 	}
 }
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82801AA_0, asus_hides_smbus_lpc);
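The lines removed by the test patch form a read, clear, re-read sequence on config offset 0xF2. If bit 0x8 is set, the quirk treats the SMBus function as hidden and tries to clear the bit. The sketch below models only that bit arithmetic and the branch that picks the log message, on plain 16-bit values. It does not touch PCI config space, and the helper and enum names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit in config word 0xF2 that the quirk treats as "SMBus hidden"
 * (value taken from the quoted patch). */
#define SMBUS_HIDDEN_BIT 0x8u

enum quirk_outcome {
	QUIRK_NOT_HIDDEN,	/* bit clear on the first read: nothing to do */
	QUIRK_ENABLED,		/* "PCI: Enabled i801 SMBus device" */
	QUIRK_STILL_HIDDEN	/* "... continues to play 'hide and seek'!" */
};

static bool smbus_hidden(uint16_t f2)
{
	return (f2 & SMBUS_HIDDEN_BIT) != 0;
}

/* Value the original quirk writes back: the same word with the bit
 * cleared.  Like the original, only meaningful when the bit is set. */
static uint16_t smbus_unhide_value(uint16_t f2)
{
	assert(smbus_hidden(f2));
	return (uint16_t)(f2 & ~SMBUS_HIDDEN_BIT);
}

/* Mirrors the removed branch: `before` is the first read of 0xF2,
 * `after` is the re-read that follows the write. */
static enum quirk_outcome classify(uint16_t before, uint16_t after)
{
	if (!smbus_hidden(before))
		return QUIRK_NOT_HIDDEN;
	return smbus_hidden(after) ? QUIRK_STILL_HIDDEN : QUIRK_ENABLED;
}
```

With the test patch applied, only the first check remains. The word is read and the bit is reported, but nothing is written back, so the device stays hidden.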
tcp_sacktag_one() WARNING (was Re: 2.6.24-rc4-mm1)
Cedric Le Goater wrote:
> Ilpo Järvinen wrote:
>> On Wed, 5 Dec 2007, Andrew Morton wrote:
>>
>>> On Thu, 06 Dec 2007 17:59:37 +1100 Reuben Farrelly <[EMAIL PROTECTED]> wrote:
>>>
>>>> This non fatal oops which I have just noticed may be related to this change
>>>> then - certainly looks networking related.
>>> yep, but it isn't e1000. It's core TCP.
>>>
>>>> WARNING: at net/ipv4/tcp_input.c:2518 tcp_fastretrans_alert()
>>>> Pid: 0, comm: swapper Not tainted 2.6.24-rc4-mm1 #1
>>> Ilpo, Reuben's kernel is talking to you ;)
>> ...Please try the patch below. Andrew, this probably fixes your problem
>> (the packets <= tp->packets_out) as well.
>
> nah. I got the WARNINGs again with this patch.

I got this new one on a 2.6.24-rc5-mm1. It looks similar?

C.

WARNING: at /home/legoater/linux/2.6.24-rc5-mm1/net/ipv4/tcp_input.c:1280 tcp_sacktag_one()
Pid: 0, comm: swapper Not tainted 2.6.24-rc5-mm1 #1

Call Trace:
 IRQ  [80410e0e] tcp_sacktag_walk+0x2bc/0x62a
 [80411711] tcp_sacktag_write_queue+0x595/0xa7c
 [8028ce66] kfree+0xd4/0xe0
 [80411e9f] tcp_ack+0x2a7/0xfc7
 [80252ca1] mark_held_locks+0x47/0x6a
 [80252e5c] trace_hardirqs_on+0xfe/0x139
 [80415d59] tcp_rcv_established+0x66a/0x76d
 [8041bd35] tcp_v4_do_rcv+0x37/0x3aa
 [8041e623] tcp_v4_rcv+0x9a9/0xa76
 [80401832] ip_local_deliver_finish+0x161/0x23c
 [80401d47] ip_local_deliver+0x72/0x77
 [8040168d] ip_rcv_finish+0x371/0x3b5
 [80401ca1] ip_rcv+0x292/0x2c6
 [803e2aae] netif_receive_skb+0x267/0x340
 [8806eff4] :tg3:tg3_poll+0x5d2/0x89e
 [803e505c] net_rx_action+0xd5/0x1ad
 [8023b0b9] __do_softirq+0x5f/0xe3
 [8020c8ec] call_softirq+0x1c/0x28
 [8020e7b9] do_softirq+0x39/0x9f
 [8023b058] irq_exit+0x4e/0x50
 [8020e900] do_IRQ+0xb7/0xd7
 [8020a892] mwait_idle+0x0/0x52
 [8020bbe6] ret_from_intr+0x0/0xf
 EOI  [8024d0cb] __atomic_notifier_call_chain+0x20/0x83
 [8020a8da] mwait_idle+0x48/0x52
 [80209e79] enter_idle+0x22/0x24
 [8020a822] cpu_idle+0xa1/0xc5
 [8021e755] start_secondary+0x3b9/0x3c5
Re: 2.6.24-rc4-mm1 - BUG in tcp_fragment
Andrew Morton wrote:
> Temporarily at
>
>   http://userweb.kernel.org/~akpm/2.6.24-rc4-mm1/
>
> Will appear later at
>
>   ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc4/2.6.24-rc4-mm1/

I got this one while compiling on NFS.

C.

kernel BUG at /home/legoater/linux/2.6.24-rc4-mm1/include/net/tcp.h:1480!
invalid opcode: [1] SMP
last sysfs file: /sys/devices/pci:00/:00:1e.0/:01:01.0/local_cpus
CPU 1
Modules linked in: autofs4 nfs lockd sunrpc tg3 sg joydev ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Pid: 0, comm: swapper Not tainted 2.6.24-rc4-mm1 #3
RIP: 0010:[80418d93]  [80418d93] tcp_fragment+0x5ee/0x6f7
RSP: 0018:810147c9f9e0  EFLAGS: 00010217
RAX: 1526c311 RBX: 8100c2ce1d00 RCX: 810143cc6aa0
RDX: 0001 RSI: 810102b37b00 RDI: 810102b37b50
RBP: 810147c9fa50 R08: 004a R09: 0001
R10: 0b50 R11: 0001 R12: 81013a575700
R13: R14: 810143cc6400 R15: 81013a575750
FS: () GS:810147c57140() knlGS:
CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b
CR2: 2ad5d294b000 CR3: bd11b000 CR4: 06e0
DR0: DR1: DR2:
DR3: DR6: 0ff0 DR7: 0400
Process swapper (pid: 0, threadinfo 810147c98000, task 810147c89040)
Stack: 810147c9fa00 05a843cc6400 810143cc6400 810147c9fa70
 8100c2ce1d50 810143cc6590 810143cc6aa0 15265421
 810143cc6400 810143cc6400 81013a575700
Call Trace:
 IRQ  [804190c7] tcp_retransmit_skb+0xd6/0x713
 [804197d4] tcp_xmit_retransmit_queue+0xd0/0x330
 [8041209b] tcp_fastretrans_alert+0xb92/0xbf2
 [80413f30] tcp_ack+0xdf3/0xfbe
 [80417295] tcp_rcv_established+0x66a/0x76d
 [8041d285] tcp_v4_do_rcv+0x37/0x3aa
 [8041fb73] tcp_v4_rcv+0x9a9/0xa76
 [80402e4e] ip_local_deliver_finish+0x161/0x23c
 [80403363] ip_local_deliver+0x72/0x77
 [80402ca9] ip_rcv_finish+0x371/0x3b5
 [804032bd] ip_rcv+0x292/0x2c6
 [803e3dcc] netif_receive_skb+0x267/0x340
 [8806eff4] :tg3:tg3_poll+0x5d2/0x89e
 [803e639d] net_rx_action+0xd5/0x1ad
 [8023b605] __do_softirq+0x5f/0xe3
 [8020c86c] call_softirq+0x1c/0x28
 [8020e739] do_softirq+0x39/0x9f
 [8023b5a4] irq_exit+0x4e/0x50
 [8020e880] do_IRQ+0xb7/0xd7
 [8020a803] mwait_idle+0x0/0x55
 [8020bb66] ret_from_intr+0x0/0xf
 EOI  [8024d623] __atomic_notifier_call_chain+0x20/0x83
 [8020a84b] mwait_idle+0x48/0x55
 [80209e79] enter_idle+0x22/0x24
 [8020a793] cpu_idle+0xa1/0xc5
 [8021dfd5] start_secondary+0x3b9/0x3c5

Code: 0f 0b eb fe 48 85 f6 74 08 8b 46 6c 3b 41 68 75 55 48 8d 41
RIP  [80418d93] tcp_fragment+0x5ee/0x6f7
 RSP 810147c9f9e0
Kernel panic - not syncing: Aiee, killing interrupt handler!
Re: 2.6.24-rc4-mm1: acpi reboots machine... solved
On Wed, Dec 12, 2007 at 09:21:41AM -0700, Bjorn Helgaas wrote:
> On Wednesday 12 December 2007 03:11:23 am Borislav Petkov wrote:
> > On Tue, Dec 11, 2007 at 05:08:59PM -0700, Bjorn Helgaas wrote:
> > > On Tuesday 11 December 2007 01:52:55 pm Borislav Petkov wrote:
> > > > From what i can roughly tell so far it seems like a resource conflict
> > > > between acpi and the pnp requested regions in your patch which result
> > > > in the acpi_thermal code to read the wrong (0xff) temperature value and
> > > > halt the machine, but i might be wrong on the details since acpi is
> > > > such a big code chunk to swallow.
> > >
> > > I don't see any obvious conflict from the log you posted. For the sake
> > > of comparison, can you post the corresponding dmesg log after you removed
> > > the patch?
> >
> > The only difference i see is that ACPI finds EC in DSDT in the working
> > kernel and in the broken case something silently fails. Please find
> > attached the 2 bootlogs and a disassembled DSDT.
>
> Thanks very much!
>
> "ACPI: EC: Look up EC in DSDT" appears in the working log, but not
> in the broken one. But I think we *do* find the EC in both cases,
> because we see "ACPI: EC: non-query interrupt received" even before
> acpi_ec_add() (which prints "ACPI: EC: GPE = 0x1c, ..."). Maybe
> the logs were collected with different log levels?

Well, hm, actually no, the only difference is that the broken log was taken
over netconsole, so the lines might appear in a different order. I'll capture
that log again on the weekend to see whether something is missing.

> I think Alexey is on the right track with the PCI resource allocation
> failure.

Then it should be the SMBus controller, PCI id 00:1f.3, which is having
problems registering its io ports region 4, AFAICT.

> On your working kernel, can you collect this:
>
>   lspci -vv > lspci
>   cat /proc/ioports > ioports
>   cat /proc/iomem > iomem
>   grep . /sys/devices/pnp*/*/resources > pnp
>   tar -jcf resources.tar.bz2 lspci ioports iomem pnp

attached.

-- 
Regards/Gruß,
    Boris.

resources.tar.bz2
Description: Binary data
Re: 2.6.24-rc4-mm1
Ilpo Järvinen wrote:
> On Wed, 5 Dec 2007, Andrew Morton wrote:
>
>> On Thu, 06 Dec 2007 17:59:37 +1100 Reuben Farrelly <[EMAIL PROTECTED]> wrote:
>>
>>> This non fatal oops which I have just noticed may be related to this change
>>> then - certainly looks networking related.
>> yep, but it isn't e1000. It's core TCP.
>>
>>> WARNING: at net/ipv4/tcp_input.c:2518 tcp_fastretrans_alert()
>>> Pid: 0, comm: swapper Not tainted 2.6.24-rc4-mm1 #1
>> Ilpo, Reuben's kernel is talking to you ;)
>
> ...Please try the patch below. Andrew, this probably fixes your problem
> (the packets <= tp->packets_out) as well.

nah. I got the WARNINGs again with this patch.

C.

> Dave, please include this one to net-2.6.25.
Re: 2.6.24-rc4-mm1
Ilpo Järvinen wrote:
> On Wed, 5 Dec 2007, David Miller wrote:
>
>> From: Reuben Farrelly <[EMAIL PROTECTED]>
>> Date: Thu, 06 Dec 2007 17:59:37 +1100
>>
>>> On 5/12/2007 4:17 PM, Andrew Morton wrote:
>>>> - Lots of device IDs have been removed from the e1000 driver and moved
>>>>   over to e1000e. So if your e1000 stops working, you forgot to set
>>>>   CONFIG_E1000E.
>>> This non fatal oops which I have just noticed may be related to this change
>>> then - certainly looks networking related.
>>>
>>> WARNING: at net/ipv4/tcp_input.c:2518 tcp_fastretrans_alert()
>>> Pid: 0, comm: swapper Not tainted 2.6.24-rc4-mm1 #1
>>>
>>> Call Trace:
>>> [] tcp_fastretrans_alert+0x229/0xe63
>>> [] tcp_ack+0xa3f/0x127d
>>> [] tcp_rcv_established+0x55f/0x7f8
>>> [] tcp_v4_do_rcv+0xdb/0x3a7
>>> [] :nf_conntrack:nf_ct_deliver_cached_events+0x75/0x99
>> No, it's from TCP assertions and changes added by Ilpo to the
>> net-2.6.25 tree recently.
>
> Yeah, this is (very likely) due to the new SACK processing (in net-2.6.25).
> I'll look at what could go wrong with the fack_count calculations, most likely
> it's the reason (I've found earlier one out-of-place retransmission
> segment in one of my test cases which already indicated that there's
> something incorrect with them but didn't have time to debug it yet).
>
> Thanks for the report. Some info about how easily you can reproduce &
> couple of sentences about the test case might be useful later on when
> evaluating the fix.

I also got plenty of these when untarring a tarball on NFS.

C.
WARNING: at /home/legoater/linux/2.6.24-rc4-mm1/net/ipv4/tcp_input.c:2518 tcp_fastretrans_alert()
Pid: 0, comm: swapper Not tainted 2.6.24-rc4-mm1 #2

Call Trace:
 [] tcp_fastretrans_alert+0xb6/0xbf2
 [] tcp_ack+0xdf3/0xfbe
 [] sk_reset_timer+0x17/0x23
 [] tcp_rcv_established+0xf3/0x76d
 [] tcp_v4_do_rcv+0x37/0x3aa
 [] tcp_v4_rcv+0x9a9/0xa76
 [] ip_local_deliver_finish+0x161/0x23c
 [] ip_local_deliver+0x72/0x77
 [] ip_rcv_finish+0x371/0x3b5
 [] ip_rcv+0x292/0x2c6
 [] netif_receive_skb+0x267/0x340
 [] :tg3:tg3_poll+0x5d2/0x89e
 [] net_rx_action+0xd5/0x1ad
 [] __do_softirq+0x5f/0xe3
 [] call_softirq+0x1c/0x28
 [] do_softirq+0x39/0x9f
 [] irq_exit+0x4e/0x50
 [] do_IRQ+0xb7/0xd7
 [] mwait_idle+0x0/0x55
 [] ret_from_intr+0x0/0xf
 [] __atomic_notifier_call_chain+0x20/0x83
 [] mwait_idle+0x48/0x55
 [] enter_idle+0x22/0x24
 [] cpu_idle+0xa1/0xc5
 [] start_secondary+0x3b9/0x3c5

WARNING: at /home/legoater/linux/2.6.24-rc4-mm1/net/ipv4/tcp_input.c:2518 tcp_fastretrans_alert()
Pid: 0, comm: swapper Not tainted 2.6.24-rc4-mm1 #2

Call Trace:
 [] tcp_fastretrans_alert+0xb6/0xbf2
 [] tcp_ack+0xdf3/0xfbe
 [] tcp_data_queue+0x5da/0xb0a
 [] tcp_rcv_established+0xf3/0x76d
 [] tcp_v4_do_rcv+0x37/0x3aa
 [] tcp_v4_rcv+0x9a9/0xa76
 [] ip_local_deliver_finish+0x161/0x23c
 [] ip_local_deliver+0x72/0x77
 [] ip_rcv_finish+0x371/0x3b5
 [] ip_rcv+0x292/0x2c6
 [] netif_receive_skb+0x267/0x340
 [] :tg3:tg3_poll+0x5d2/0x89e
 [] net_rx_action+0xd5/0x1ad
 [] __do_softirq+0x5f/0xe3
 [] call_softirq+0x1c/0x28
 [] do_softirq+0x39/0x9f
 [] irq_exit+0x4e/0x50
 [] do_IRQ+0xb7/0xd7
 [] mwait_idle+0x0/0x55
 [] ret_from_intr+0x0/0xf
 [] __atomic_notifier_call_chain+0x20/0x83
 [] mwait_idle+0x48/0x55
 [] enter_idle+0x22/0x24
 [] cpu_idle+0xa1/0xc5
 [] start_secondary+0x3b9/0x3c5

WARNING: at /home/legoater/linux/2.6.24-rc4-mm1/net/ipv4/tcp_input.c:2518 tcp_fastretrans_alert()
Pid: 0, comm: swapper Not tainted 2.6.24-rc4-mm1 #2

Call Trace:
 [] tcp_fastretrans_alert+0xb6/0xbf2
 [] tcp_ack+0xdf3/0xfbe
 [] tcp_data_queue+0x5da/0xb0a
 [] tcp_rcv_established+0xf3/0x76d
 [] tcp_v4_do_rcv+0x37/0x3aa
 [] tcp_v4_rcv+0x9a9/0xa76
 [] ip_local_deliver_finish+0x161/0x23c
 [] ip_local_deliver+0x72/0x77
 [] ip_rcv_finish+0x371/0x3b5
 [] ip_rcv+0x292/0x2c6
 [] netif_receive_skb+0x267/0x340
 [] :tg3:tg3_poll+0x5d2/0x89e
 [] net_rx_action+0xd5/0x1ad
 [] __do_softirq+0x5f/0xe3
 [] call_softirq+0x1c/0x28
 [] do_softirq+0x39/0x9f
 [] irq_exit+0x4e/0x50
 [] do_IRQ+0xb7/0xd7
 [] mwait_idle+0x0/0x55
 [] ret_from_intr+0x0/0xf
 [] __atomic_notifier_call_chain+0x20/0x83
 [] mwait_idle+0x48/0x55
 [] enter_idle+0x22/0x24
 [] cpu_idle+0xa1/0xc5
 [] start_secondary+0x3b9/0x3c5
Re: 2.6.24-rc4-mm1: acpi reboots machine... solved
On Wednesday 12 December 2007 03:11:23 am Borislav Petkov wrote:
> On Tue, Dec 11, 2007 at 05:08:59PM -0700, Bjorn Helgaas wrote:
> > On Tuesday 11 December 2007 01:52:55 pm Borislav Petkov wrote:
> > > From what i can roughly tell so far it seems like a resource conflict
> > > between acpi and the pnp requested regions in your patch which result
> > > in the acpi_thermal code to read the wrong (0xff) temperature value
> > > and halt the machine, but i might be wrong on the details since acpi
> > > is such a big code chunk to swallow.
> >
> > I don't see any obvious conflict from the log you posted. For the sake
> > of comparison, can you post the corresponding dmesg log after you removed
> > the patch?
>
> The only difference i see is that ACPI finds EC in DSDT in the working kernel
> and in the broken case something silently fails. Please find attached the 2
> bootlogs and a disassembled DSDT.

Thanks very much!

"ACPI: EC: Look up EC in DSDT" appears in the working log, but not
in the broken one. But I think we *do* find the EC in both cases,
because we see "ACPI: EC: non-query interrupt received" even before
acpi_ec_add() (which prints the "ACPI: EC: GPE = 0x1c, ..." line). Maybe
the logs were collected with different log levels?

I think Alexey is on the right track with the PCI resource allocation
failure.

On your working kernel, can you collect this:

  lspci -vv > lspci
  cat /proc/ioports > ioports
  cat /proc/iomem > iomem
  grep . /sys/devices/pnp*/*/resources > pnp
  tar -jcf resources.tar.bz2 lspci ioports iomem pnp

Bjorn
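Bjorn's request above is essentially about comparing which I/O port ranges PCI, PnP and ACPI each claim. As an illustration only, the hypothetical helpers below (`parse_ioports_line()`, `ranges_overlap()`) parse one line in the `start-end : name` form that /proc/ioports prints and test two inclusive ranges for overlap. A clash with an already claimed range is one common reason a PCI region, such as region 4 of 00:1f.3, cannot be allocated. The sketch assumes that simple format and only skips, rather than interprets, the indentation /proc/ioports uses for nested resources.

```c
#include <assert.h>
#include <stdio.h>

/* One I/O port range as printed in /proc/ioports: "start-end : name". */
struct io_range {
	unsigned long start;
	unsigned long end;   /* inclusive */
	char name[64];
};

/* Parse a line such as "0400-047f : example"; returns 1 on success.
 * The leading space in the format skips any nesting indentation, and
 * %63[^\n] takes the rest of the line as the owner's name. */
static int parse_ioports_line(const char *line, struct io_range *r)
{
	assert(line != NULL && r != NULL);
	if (sscanf(line, " %lx-%lx : %63[^\n]", &r->start, &r->end, r->name) != 3)
		return 0;
	return r->start <= r->end;
}

/* Two inclusive ranges overlap unless one ends before the other starts. */
static int ranges_overlap(const struct io_range *a, const struct io_range *b)
{
	return a->start <= b->end && b->start <= a->end;
}
```

For instance, 0400-047f and 0470-047f overlap, while 0400-047f and 0500-051f do not.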
Re: 2.6.24-rc4-mm1: acpi reboots machine... solved
Borislav Petkov wrote:
> On Tue, Dec 11, 2007 at 05:08:59PM -0700, Bjorn Helgaas wrote:
> > On Tuesday 11 December 2007 01:52:55 pm Borislav Petkov wrote:
> > > From what i can roughly tell so far it seems like an resource conflict
> > > between acpi and the pnp requested regions in your patch which result
> > > in the acpi_thermal code to read the wrong (0xff) temperature value
> > > and halt the machine, but i might be wrong on the details since acpi
> > > is such a big code chunk to swallow.
> >
> > I don't see any obvious conflict from the log you posted. For the sake
> > of comparison, can you post the corresponding dmesg log after you removed
> > the patch?
>
> The only difference i see is that ACPI finds EC in DSDT in the working kernel
> and in the broken case something silently fails. Please find attached the 2
> bootlogs and a disassembled DSDT.

This seems to be the start of trouble...

PCI: Cannot allocate resource region 4 of device :00:1f.3
Re: 2.6.24-rc4-mm1: acpi reboots machine... solved
On Tue, Dec 11, 2007 at 05:08:59PM -0700, Bjorn Helgaas wrote:
> On Tuesday 11 December 2007 01:52:55 pm Borislav Petkov wrote:
> > From what i can roughly tell so far it seems like a resource conflict
> > between acpi and the pnp requested regions in your patch which result
> > in the acpi_thermal code to read the wrong (0xff) temperature value
> > and halt the machine, but i might be wrong on the details since acpi
> > is such a big code chunk to swallow.
>
> I don't see any obvious conflict from the log you posted. For the sake
> of comparison, can you post the corresponding dmesg log after you removed
> the patch?

The only difference i see is that ACPI finds EC in DSDT in the working kernel
and in the broken case something silently fails. Please find attached the 2
bootlogs and a disassembled DSDT.

-- 
Regards/Gruß,
Boris.

[Attachment: bzip2-compressed binary data, not shown]
Re: 2.6.24-rc4-mm1
On Tue, 4 Dec 2007 21:17:01 -0800
Andrew Morton <[EMAIL PROTECTED]> wrote:

> Changes since 2.6.24-rc3-mm2:

2.6.24-rc4-mm1 brought a nice TCP oops on my x86_64 system, while I
was stress-testing the VM and watching via ssh:

general protection fault: [1] SMP
last sysfs file: /sys/devices/pci:00/:00:1c.5/:04:00.0/irq
CPU 1
Modules linked in: nfs lockd nfs_acl rfcomm l2cap bluetooth autofs4 sunrpc ipv6 acpi_cpufreq dm_multipath parport_pc e1000e parport firewire_ohci button i2c_i801 i2c_core i82975x_edac pcspkr firewire_core serio_raw edac_core rtc_cmos floppy crc_itu_t sg sr_mod cdrom pata_marvell ata_piix dm_snapshot dm_zero dm_mirror dm_mod ahci libata sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd
Pid: 2946, comm: sshd Not tainted 2.6.24-rc4-mm1 #1
RIP: 0010:[] [] __tcp_rb_insert+0x1a/0x67
RSP: 0018:810066401c88 EFLAGS: 00010202
RAX: 6b6b6b6b6b6b6b6b RBX: 810076e9f000 RCX: 81003ddc9900
RDX: 6b6b6b6b6b6b6bab RSI: 81006ed1b148 RDI: 6b6b6b6b6b6b6b5b
RBP: 81006ed1aa00 R08: 810076e9f010 R09: bef8d64e
R10: 81228926 R11: 8110b2aa R12: 810066401de8
R13: 00e0 R14: 810066401ee8 R15: 0001
FS: 7f1c2c10d780() GS:81007f801578() knlGS:
CS: 0010 DS: ES: CR0: 8005003b
CR2: 02aabfd3 CR3: 665e3000 CR4: 06e0
DR0: DR1: DR2:
DR3: DR6: 0ff0 DR7: 0400
Process sshd (pid: 2946, threadinfo 81006640, task 8100665ce000)
Stack: 81003ddc9900 81228b26 0001 810066401ee8 810574da 04e00040 00e004e0 7f1c2c797620 0246 66401d60
Call Trace:
 [] tcp_sendmsg+0x21f/0xb00
 [] sock_aio_write+0xf8/0x110
 [] do_sync_write+0xc9/0x10c
 [] file_has_perm+0x9a/0xa9
 [] autoremove_wake_function+0x0/0x2e
 [] __lock_acquire+0x50f/0xc8e
 [] lock_release_holdtime+0x27/0x48
 [] vfs_write+0xd9/0x16f
 [] sys_write+0x45/0x6e
 [] tracesys+0xdc/0xe1

Code: 44 3b 4a 1c 79 10 44 3b 4a 18 78 04 0f 0b eb fe 48 8d 50 10
RIP [] __tcp_rb_insert+0x1a/0x67
 RSP

-- 
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan
Re: 2.6.24-rc4-mm1 -- boot process hangs -- tty4 main process (2988) terminated with status 1
On Sat, 8 Dec 2007 21:29:18 -0500 "Miles Lane" <[EMAIL PROTECTED]> wrote:

> > > Dec 6 21:24:28 erratic-orbits init: tty3 main process (2991)
> > > terminated with status 1
> >
> > Boggle. We broke the vt driver?
> >
> > config, please...
>
> I sent the .config.

I didn't receive it but I found a config from you in another thread.

> Is there nothing else to follow up on? I have
> tried rebuilding about seven kernels, tweaking the options each time.
> All the kernels have failed to boot. I am currently trying with a
> "defconfig" kernel. Perhaps I will have better luck with it.

Your config instabricks my Vaio. Fiddled with it a bit but failed to
pick the problem.

Fixing regressions in -mm isn't top priority at present I'm afraid. If
the same bug is present in next -mm it'd be great if you could bisect it
down please.
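Andrew's request to bisect the regression amounts to a binary search over an ordered series of changes for the first one that breaks the boot. The sketch below shows only that search, under stated assumptions: index `good` is known to boot, index `bad` is known to fail, and once the bug appears it stays in every later change. `bisect_first_bad()` and the `demo_is_bad()` predicate are hypothetical. In practice each probe means building and booting a kernel, and tools such as git bisect manage the bookkeeping.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical predicate: nonzero if the kernel built with changes
 * [0, idx] applied shows the bug. */
typedef int (*bad_fn)(size_t idx, void *ctx);

/*
 * Find the first bad index in (good, bad], given that 'good' is known
 * good, 'bad' is known bad, and every index after the first bad one is
 * also bad.  Invariant: 'good' stays good and 'bad' stays bad, so when
 * they are adjacent, 'bad' is the culprit.
 */
static size_t bisect_first_bad(size_t good, size_t bad, bad_fn is_bad, void *ctx)
{
	assert(good < bad);
	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;

		if (is_bad(mid, ctx))
			bad = mid;
		else
			good = mid;
	}
	return bad;
}

/* Demonstration predicate: 'ctx' points at the (hypothetical) index of
 * the change that introduced the bug. */
static int demo_is_bad(size_t idx, void *ctx)
{
	return idx >= *(const size_t *)ctx;
}
```

With the culprit at index 37 of the range 0..100, the search returns 37 after seven probes, instead of up to a hundred builds tested one by one.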
Re: 2.6.24-rc4-mm1: acpi reboots machine... solved
On Tuesday 11 December 2007 01:52:55 pm Borislav Petkov wrote:
> From what i can roughly tell so far it seems like a resource conflict
> between acpi and the pnp requested regions in your patch which result
> in the acpi_thermal code to read the wrong (0xff) temperature value
> and halt the machine, but i might be wrong on the details since acpi
> is such a big code chunk to swallow.

I don't see any obvious conflict from the log you posted. For the sake
of comparison, can you post the corresponding dmesg log after you removed
the patch?

acpi_thermal_get_temperature() only evaluates _TMP, which isn't very
interesting. I wonder if there's some conflict between that AML method
and the EC driver or something. If you can also collect the DSDT, maybe
I can poke around in there and see what _TMP is really doing.

Thanks,
  Bjorn
Re: 2.6.24-rc4-mm1
On Tue, 11 Dec 2007 14:17:16 -0800 Kok, Auke wrote: > Andrew Morton wrote: > > On Tue, 11 Dec 2007 13:26:58 -0800 > > "Kok, Auke" <[EMAIL PROTECTED]> wrote: > > > >> Andrew Morton wrote: > >>> On Tue, 11 Dec 2007 08:13:52 -0800 "Martin Bligh" <[EMAIL PROTECTED]> > >>> wrote: > >>> > > - Lots of device IDs have been removed from the e1000 driver and moved > > over > > to e1000e. So if your e1000 stops working, you forgot to set > > CONFIG_E1000E. > > > > > Wouldn't it make sense to just default this to on if E1000 was on, rather > than screwing > everybody for no good reason (plus breaking all the automated testing, > etc > etc)? > Much though I love random refactoring, it is fairly painful to just keep > changing the > names of things. > >>> (cc netdev and Auke) > >>> > >>> Yes, that would be very sensible. CONFIG_E1000E should default to > >>> whatever > >>> CONFIG_E1000 was set to. > >> which is "y" for x86 and friends, ppc, arm and ia64 through 'defconfig'. > >> the > >> Kconfig files do not have defaults in them. > > > > I wouldn't be looking at defconfig files - I don't think many people use > > them. Most people use their previous config, via oldconfig. > > > > So what we want here is to give them E1000E if they had previously been > > using E1000. I don't know how one would do this in Kconfig. > > ditto. I doubt that "SELECT E1000E" would be a good idea here (maybe not even > work), and I can't think of anything else. "default E1000" in E1000E seems to work for me. --- From: Randy Dunlap <[EMAIL PROTECTED]> Make E1000E default to the same kconfig setting as E1000, at least for -mm testing. 
Signed-off-by: Randy Dunlap <[EMAIL PROTECTED]>
---
 drivers/net/Kconfig |    1 +
 1 file changed, 1 insertion(+)

--- linux-2.6.24-rc4-mm1.orig/drivers/net/Kconfig
+++ linux-2.6.24-rc4-mm1/drivers/net/Kconfig
@@ -1986,6 +1986,7 @@ config E1000_DISABLE_PACKET_SPLIT
 config E1000E
 	tristate "Intel(R) PRO/1000 PCI-Express Gigabit Ethernet support"
 	depends on PCI
+	default E1000
 	---help---
 	  This driver supports the PCI-Express Intel(R) PRO/1000 gigabit
 	  ethernet family of adapters. For PCI or PCI-X e1000 adapters,
Re: 2.6.24-rc4-mm1
Andrew Morton wrote: > On Tue, 11 Dec 2007 13:26:58 -0800 > "Kok, Auke" <[EMAIL PROTECTED]> wrote: > >> Andrew Morton wrote: >>> On Tue, 11 Dec 2007 08:13:52 -0800 "Martin Bligh" <[EMAIL PROTECTED]> wrote: >>> > - Lots of device IDs have been removed from the e1000 driver and moved > over > to e1000e. So if your e1000 stops working, you forgot to set > CONFIG_E1000E. > > Wouldn't it make sense to just default this to on if E1000 was on, rather than screwing everybody for no good reason (plus breaking all the automated testing, etc etc)? Much though I love random refactoring, it is fairly painful to just keep changing the names of things. >>> (cc netdev and Auke) >>> >>> Yes, that would be very sensible. CONFIG_E1000E should default to whatever >>> CONFIG_E1000 was set to. >> which is "y" for x86 and friends, ppc, arm and ia64 through 'defconfig'. the >> Kconfig files do not have defaults in them. > > I wouldn't be looking at defconfig files - I don't think many people use > them. Most people use their previous config, via oldconfig. > > So what we want here is to give them E1000E if they had previously been > using E1000. I don't know how one would do this in Kconfig. ditto. I doubt that "SELECT E1000E" would be a good idea here (maybe not even work), and I can't think of anything else. Auke
Re: 2.6.24-rc4-mm1
On Tue, 11 Dec 2007 13:26:58 -0800 "Kok, Auke" <[EMAIL PROTECTED]> wrote: > Andrew Morton wrote: > > On Tue, 11 Dec 2007 08:13:52 -0800 "Martin Bligh" <[EMAIL PROTECTED]> wrote: > > > >>> > >>> - Lots of device IDs have been removed from the e1000 driver and moved > >>> over > >>> to e1000e. So if your e1000 stops working, you forgot to set > >>> CONFIG_E1000E. > >>> > >>> > >> Wouldn't it make sense to just default this to on if E1000 was on, rather > >> than screwing > >> everybody for no good reason (plus breaking all the automated testing, etc > >> etc)? > >> Much though I love random refactoring, it is fairly painful to just keep > >> changing the > >> names of things. > > > > (cc netdev and Auke) > > > > Yes, that would be very sensible. CONFIG_E1000E should default to whatever > > CONFIG_E1000 was set to. > > which is "y" for x86 and friends, ppc, arm and ia64 through 'defconfig'. the > Kconfig files do not have defaults in them. I wouldn't be looking at defconfig files - I don't think many people use them. Most people use their previous config, via oldconfig. So what we want here is to give them E1000E if they had previously been using E1000. I don't know how one would do this in Kconfig.
Re: 2.6.24-rc4-mm1
Kok, Auke wrote: > Andrew Morton wrote: >> On Tue, 11 Dec 2007 08:13:52 -0800 "Martin Bligh" <[EMAIL PROTECTED]> wrote: >> - Lots of device IDs have been removed from the e1000 driver and moved over to e1000e. So if your e1000 stops working, you forgot to set CONFIG_E1000E. >>> Wouldn't it make sense to just default this to on if E1000 was on, rather >>> than screwing >>> everybody for no good reason (plus breaking all the automated testing, etc >>> etc)? >>> Much though I love random refactoring, it is fairly painful to just keep >>> changing the >>> names of things. >> (cc netdev and Auke) >> >> Yes, that would be very sensible. CONFIG_E1000E should default to whatever >> CONFIG_E1000 was set to. > > which is "y" for x86 and friends, ppc, arm and ia64 through 'defconfig'. the > Kconfig files do not have defaults in them. > > I can send a patch to adjust the defconfig files, would that be OK? I > certainly > think that would be reasonable, I dislike setting defaults through defconfig > for > network drivers myself and rather would not do that. that should read "dislike setting defaults through Kconfig ..." Auke
Re: 2.6.24-rc4-mm1
Andrew Morton wrote: > On Tue, 11 Dec 2007 08:13:52 -0800 "Martin Bligh" <[EMAIL PROTECTED]> wrote: > >>> >>> - Lots of device IDs have been removed from the e1000 driver and moved >>> over >>> to e1000e. So if your e1000 stops working, you forgot to set >>> CONFIG_E1000E. >>> >>> >> Wouldn't it make sense to just default this to on if E1000 was on, rather >> than screwing >> everybody for no good reason (plus breaking all the automated testing, etc >> etc)? >> Much though I love random refactoring, it is fairly painful to just keep >> changing the >> names of things. > > (cc netdev and Auke) > > Yes, that would be very sensible. CONFIG_E1000E should default to whatever > CONFIG_E1000 was set to. which is "y" for x86 and friends, ppc, arm and ia64 through 'defconfig'. the Kconfig files do not have defaults in them. I can send a patch to adjust the defconfig files, would that be OK? I certainly think that would be reasonable, I dislike setting defaults through defconfig for network drivers myself and rather would not do that. Auke
Re: 2.6.24-rc4-mm1: acpi reboots machine... solved
On Tue, Dec 11, 2007 at 01:00:24PM -0700, Bjorn Helgaas wrote: > On Tuesday 11 December 2007 10:44:43 am Borislav Petkov wrote: > > On Sun, Dec 09, 2007 at 10:19:47AM +0100, Borislav Petkov wrote: > > > On Sun, Dec 09, 2007 at 08:50:02AM +0100, Borislav Petkov wrote: > > > > Hi Andrew, > > > > Hi Len, > > > > > > > > after booting 2.6.24-rc4-mm1 (2.6.24-rc4-190-g94545ba, otoh, boots just > > > > fine) on my asus laptop, the machine reboots after claiming that > > > > "Critical temperature reached (255 C)." However, the degrees number > > > > is kinda hinting at 0xff all-ones field. Will try dump_stack in > > > > acpi_thermal_critical() to checkout the call path. For now here's the > > > > netconsole bootlog: > > > > > > Here's what i got so far: > > > > > > [ 50.287939] Pid: 1, comm: swapper Not tainted 2.6.24-rc4-mm1 #14 > > > [ 50.287999] [] show_trace_log_lvl+0x12/0x25 > > > [ 50.288103] [] show_trace+0xd/0x10 > > > [ 50.288202] [] dump_stack+0x57/0x5f > > > [ 50.288303] [] acpi_thermal_check+0x150/0x3bb > > > [ 50.288415] [] acpi_thermal_add+0x261/0x2cf > > > [ 50.288515] [] acpi_device_probe+0x3e/0xdb > > > [ 50.288615] [] driver_probe_device+0xaf/0x12a > > > [ 50.288717] [] __driver_attach+0x6c/0xa5 > > > [ 50.288817] [] bus_for_each_dev+0x3e/0x60 > > > [ 50.288916] [] driver_attach+0x14/0x16 > > > [ 50.289015] [] bus_add_driver+0xa6/0x1a8 > > > [ 50.289114] [] driver_register+0x42/0x47 > > > [ 50.289214] [] acpi_bus_register_driver+0x3a/0x3c > > > [ 50.289316] [] acpi_thermal_init+0x57/0x76 > > > [ 50.289424] [] kernel_init+0x138/0x280 > > > [ 50.289525] [] kernel_thread_helper+0x7/0x10 > > > [ 50.289625] === > > > [ 50.289680] ACPI: Critical trip point > > > [ 50.289736] Critical temperature reached (255 C), shutting down. 
> > > > > > so in acpi_thermal_get_temperature() called in acpi_thermal_add() the > > > tz->temperature thingy is not set properly (printk's added): > > > > > > [ 50.276607] Old temp: 4294967023 > > > [ 50.281890] Got temp: 255 > > > [ 50.282567] Old temp: 255 > > > [ 50.287882] Got temp: 255 > > > > > > What's also strange is that the tz acpi_thermal is alloc'd with kzalloc > > > and > > > there's still garbage in it after reading it in > > > acpi_thermal_get_temperature() > > > for the first time. Debugging continues... > > > > (i almost suspected that the problem might be something completely > > different.) > > well, after bisecting the rc4-mm1 tree for a whole day today, the evildoer > > turned out to be > > > > broken-out/pnp-request-ioport-and-iomem-resources-used-by-active-devices.patch. > > > > After backing this one out, mm1 boots just fine here. > > Thanks for tracking this down. I'll look into your logs and see if I > can figure out what's going on. There's another report related to that > patch here: http://lkml.org/lkml/2007/11/22/110 . Looks like a different > symptom though, so probably a different fix. From what i can roughly tell so far it seems like an resource conflict between acpi and the pnp requested regions in your patch which result in the acpi_thermal code to read the wrong (0xff) temperature value and halt the machine, but i might be wrong on the details since acpi is such a big code chunk to swallow. Anyways, this is a different issue than the one you quote above. -- Regards/Gruß, Boris.
Re: 2.6.24-rc4-mm1
* Andrew Morton <[EMAIL PROTECTED]> wrote: > > I can't see this compile failure posted anywhere: > > http://test.kernel.org/results/IBM/126049/build/debug/stderr > > > > arch/x86/vdso/vdso32/sigreturn.S: Assembler messages: > > arch/x86/vdso/vdso32/sigreturn.S:23: Error: suffix or operands invalid for > > `pop' > > arch/x86/vdso/vdso32/syscall.S:25: Error: suffix or operands invalid for > > `pop' > > make[1]: *** [arch/x86/vdso/vdso32/syscall.o] Error 1 > > make: *** [arch/x86/vdso] Error 2 > > (cc Ingo and Thomas) Roland says: | That seems like it must be a tool problem. The V=1 output would show | if those compiles missed -m32 or something. But even in the wrong | mode, this error does not make sense. The assembly code it's citing | is identical to the old arch/x86/ia32/vsyscall-syscall.S code. Ingo
Re: 2.6.24-rc4-mm1
On Tue, 11 Dec 2007 08:13:52 -0800 "Martin Bligh" <[EMAIL PROTECTED]> wrote: > > > > > > - Lots of device IDs have been removed from the e1000 driver and moved > > over > > to e1000e. So if your e1000 stops working, you forgot to set > > CONFIG_E1000E. > > > > > Wouldn't it make sense to just default this to on if E1000 was on, rather > than screwing > everybody for no good reason (plus breaking all the automated testing, etc > etc)? > Much though I love random refactoring, it is fairly painful to just keep > changing the > names of things. (cc netdev and Auke) Yes, that would be very sensible. CONFIG_E1000E should default to whatever CONFIG_E1000 was set to. > > I can't see this compile failure posted anywhere: > http://test.kernel.org/results/IBM/126049/build/debug/stderr > > arch/x86/vdso/vdso32/sigreturn.S: Assembler messages: > arch/x86/vdso/vdso32/sigreturn.S:23: Error: suffix or operands invalid for > `pop' > arch/x86/vdso/vdso32/syscall.S:25: Error: suffix or operands invalid for `pop' > make[1]: *** [arch/x86/vdso/vdso32/syscall.o] Error 1 > make: *** [arch/x86/vdso] Error 2 (cc Ingo and Thomas) > > Nor this one: > http://test.kernel.org/results/IBM/126096/build/debug/stderr > > drivers/char/hvcs.c: In function ‘hvcs_open’: > drivers/char/hvcs.c:1180: error: wrong type argument to unary exclamation mark > (cc Greg) Caused by gregkh-driver-kobject-convert-hvcs-to-use-kref-not-kobject.patch.
Re: 2.6.24-rc4-mm1: acpi reboots machine... solved
On Tuesday 11 December 2007 10:44:43 am Borislav Petkov wrote: > On Sun, Dec 09, 2007 at 10:19:47AM +0100, Borislav Petkov wrote: > > On Sun, Dec 09, 2007 at 08:50:02AM +0100, Borislav Petkov wrote: > > > Hi Andrew, > > > Hi Len, > > > > > > after booting 2.6.24-rc4-mm1 (2.6.24-rc4-190-g94545ba, otoh, boots just > > > fine) on my asus laptop, the machine reboots after claiming that > > > "Critical temperature reached (255 C)." However, the degrees number > > > is kinda hinting at 0xff all-ones field. Will try dump_stack in > > > acpi_thermal_critical() to checkout the call path. For now here's the > > > netconsole bootlog: > > > > Here's what i got so far: > > > > [ 50.287939] Pid: 1, comm: swapper Not tainted 2.6.24-rc4-mm1 #14 > > [ 50.287999] [] show_trace_log_lvl+0x12/0x25 > > [ 50.288103] [] show_trace+0xd/0x10 > > [ 50.288202] [] dump_stack+0x57/0x5f > > [ 50.288303] [] acpi_thermal_check+0x150/0x3bb > > [ 50.288415] [] acpi_thermal_add+0x261/0x2cf > > [ 50.288515] [] acpi_device_probe+0x3e/0xdb > > [ 50.288615] [] driver_probe_device+0xaf/0x12a > > [ 50.288717] [] __driver_attach+0x6c/0xa5 > > [ 50.288817] [] bus_for_each_dev+0x3e/0x60 > > [ 50.288916] [] driver_attach+0x14/0x16 > > [ 50.289015] [] bus_add_driver+0xa6/0x1a8 > > [ 50.289114] [] driver_register+0x42/0x47 > > [ 50.289214] [] acpi_bus_register_driver+0x3a/0x3c > > [ 50.289316] [] acpi_thermal_init+0x57/0x76 > > [ 50.289424] [] kernel_init+0x138/0x280 > > [ 50.289525] [] kernel_thread_helper+0x7/0x10 > > [ 50.289625] === > > [ 50.289680] ACPI: Critical trip point > > [ 50.289736] Critical temperature reached (255 C), shutting down. 
> > > > so in acpi_thermal_get_temperature() called in acpi_thermal_add() the > > tz->temperature thingy is not set properly (printk's added): > > > > [ 50.276607] Old temp: 4294967023 > > [ 50.281890] Got temp: 255 > > [ 50.282567] Old temp: 255 > > [ 50.287882] Got temp: 255 > > > > What's also strange is that the tz acpi_thermal is alloc'd with kzalloc and > > there's still garbage in it after reading it in > > acpi_thermal_get_temperature() > > for the first time. Debugging continues... > > (i almost suspected that the problem might be something completely different.) > well, after bisecting the rc4-mm1 tree for a whole day today, the evildoer > turned out to be > > broken-out/pnp-request-ioport-and-iomem-resources-used-by-active-devices.patch. > > After backing this one out, mm1 boots just fine here. Thanks for tracking this down. I'll look into your logs and see if I can figure out what's going on. There's another report related to that patch here: http://lkml.org/lkml/2007/11/22/110 . Looks like a different symptom though, so probably a different fix. Bjorn
Re: 2.6.24-rc4-mm1
I can't see this compile failure posted anywhere: http://test.kernel.org/results/IBM/126049/build/debug/stderr arch/x86/vdso/vdso32/sigreturn.S: Assembler messages: arch/x86/vdso/vdso32/sigreturn.S:23: Error: suffix or operands invalid for `pop' arch/x86/vdso/vdso32/syscall.S:25: Error: suffix or operands invalid for `pop' make[1]: *** [arch/x86/vdso/vdso32/syscall.o] Error 1 make: *** [arch/x86/vdso] Error 2 I see those on one build machine but not on another, so I thought that it was a tools issue... If so, it's a tools issue that worked fine until -mm1, which makes it a kernel problem in my mind ;-) Nor this one: http://test.kernel.org/results/IBM/126096/build/debug/stderr drivers/char/hvcs.c: In function ‘hvcs_open’: drivers/char/hvcs.c:1180: error: wrong type argument to unary exclamation mark See http://marc.info/?l=linux-kernel&m=119700448119646 for patches. Thanks, M.
Re: 2.6.24-rc4-mm1: acpi reboots machine... solved
On Sun, Dec 09, 2007 at 10:19:47AM +0100, Borislav Petkov wrote: > On Sun, Dec 09, 2007 at 08:50:02AM +0100, Borislav Petkov wrote: > > Hi Andrew, > > Hi Len, > > > > after booting 2.6.24-rc4-mm1 (2.6.24-rc4-190-g94545ba, otoh, boots just > > fine) on my asus laptop, the machine reboots after claiming that > > "Critical temperature reached (255 C)." However, the degrees number > > is kinda hinting at 0xff all-ones field. Will try dump_stack in > > acpi_thermal_critical() to checkout the call path. For now here's the > > netconsole bootlog: > > Here's what i got so far: > > [ 50.287939] Pid: 1, comm: swapper Not tainted 2.6.24-rc4-mm1 #14 > [ 50.287999] [] show_trace_log_lvl+0x12/0x25 > [ 50.288103] [] show_trace+0xd/0x10 > [ 50.288202] [] dump_stack+0x57/0x5f > [ 50.288303] [] acpi_thermal_check+0x150/0x3bb > [ 50.288415] [] acpi_thermal_add+0x261/0x2cf > [ 50.288515] [] acpi_device_probe+0x3e/0xdb > [ 50.288615] [] driver_probe_device+0xaf/0x12a > [ 50.288717] [] __driver_attach+0x6c/0xa5 > [ 50.288817] [] bus_for_each_dev+0x3e/0x60 > [ 50.288916] [] driver_attach+0x14/0x16 > [ 50.289015] [] bus_add_driver+0xa6/0x1a8 > [ 50.289114] [] driver_register+0x42/0x47 > [ 50.289214] [] acpi_bus_register_driver+0x3a/0x3c > [ 50.289316] [] acpi_thermal_init+0x57/0x76 > [ 50.289424] [] kernel_init+0x138/0x280 > [ 50.289525] [] kernel_thread_helper+0x7/0x10 > [ 50.289625] === > [ 50.289680] ACPI: Critical trip point > [ 50.289736] Critical temperature reached (255 C), shutting down. > > so in acpi_thermal_get_temperature() called in acpi_thermal_add() the > tz->temperature thingy is not set properly (printk's added): > > [ 50.276607] Old temp: 4294967023 > [ 50.281890] Got temp: 255 > [ 50.282567] Old temp: 255 > [ 50.287882] Got temp: 255 > > What's also strange is that the tz acpi_thermal is alloc'd with kzalloc and > there's still garbage in it after reading it in acpi_thermal_get_temperature() > for the first time. Debugging continues... 
(i almost suspected that the problem might be something completely different.) well, after bisecting the rc4-mm1 tree for a whole day today, the evildoer turned out to be broken-out/pnp-request-ioport-and-iomem-resources-used-by-active-devices.patch. After backing this one out, mm1 boots just fine here. -- Regards/Gruß, Boris.
Re: 2.6.24-rc4-mm1
On Tue, 11 Dec 2007 08:20:05 -0800 Martin Bligh wrote: > >- Lots of device IDs have been removed from the e1000 driver and > > moved over to e1000e. So if your e1000 stops working, you forgot > > to set CONFIG_E1000E. > > > Wouldn't it make sense to just default this to on if E1000 was on? > As far as I can see that's not true, which will screwing everybody > for no good reason (plus breaking all the automated testing, etc etc)? > Much though I love random refactoring, it is fairly painful to just > keep changing the names of things. > > > I can't see this compile failure posted anywhere: > http://test.kernel.org/results/IBM/126049/build/debug/stderr > > arch/x86/vdso/vdso32/sigreturn.S: Assembler messages: > arch/x86/vdso/vdso32/sigreturn.S:23: Error: suffix or operands invalid > for `pop' > > arch/x86/vdso/vdso32/syscall.S:25: Error: suffix or operands invalid for > `pop' > make[1]: *** [arch/x86/vdso/vdso32/syscall.o] Error 1 > make: *** [arch/x86/vdso] Error 2 I see those on one build machine but not on another, so I thought that it was a tools issue... > Nor this one: > http://test.kernel.org/results/IBM/126096/build/debug/stderr > > drivers/char/hvcs.c: In function ‘hvcs_open’: > drivers/char/hvcs.c:1180: error: wrong type argument to unary > exclamation mark See http://marc.info/?l=linux-kernel&m=119700448119646 for patches. --- ~Randy Features and documentation: http://lwn.net/Articles/260136/
Re: 2.6.24-rc4-mm1
>- Lots of device IDs have been removed from the e1000 driver and > moved over to e1000e. So if your e1000 stops working, you forgot > to set CONFIG_E1000E. Wouldn't it make sense to just default this to on if E1000 was on? As far as I can see that's not true, which will screwing everybody for no good reason (plus breaking all the automated testing, etc etc)? Much though I love random refactoring, it is fairly painful to just keep changing the names of things. I can't see this compile failure posted anywhere: http://test.kernel.org/results/IBM/126049/build/debug/stderr arch/x86/vdso/vdso32/sigreturn.S: Assembler messages: arch/x86/vdso/vdso32/sigreturn.S:23: Error: suffix or operands invalid for `pop' arch/x86/vdso/vdso32/syscall.S:25: Error: suffix or operands invalid for `pop' make[1]: *** [arch/x86/vdso/vdso32/syscall.o] Error 1 make: *** [arch/x86/vdso] Error 2 Nor this one: http://test.kernel.org/results/IBM/126096/build/debug/stderr drivers/char/hvcs.c: In function ‘hvcs_open’: drivers/char/hvcs.c:1180: error: wrong type argument to unary exclamation mark
Re: 2.6.24-rc4-mm1
On 11/12/2007 8:11 AM, Andrew Morton wrote: On Tue, 11 Dec 2007 01:48:39 +1100 Reuben Farrelly <[EMAIL PROTECTED]> wrote: On 5/12/2007 4:17 PM, Andrew Morton wrote: Temporarily at http://userweb.kernel.org/~akpm/2.6.24-rc4-mm1/ Will appear later at ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc4/2.6.24-rc4-mm1/ - Lots of device IDs have been removed from the e1000 driver and moved over to e1000e. So if your e1000 stops working, you forgot to set CONFIG_E1000E. - The s390 build is still broken. I'm seeing this most incredibly unhelpful (to debug) but fortunately reproduceable problem (so far 4/4 times) on this -mm kernel. I thought this problem may have been related to another bug which I have reported (A TCP oops) but even after applying a likely fix for that I am still seeing this problem. The machine boots up perfectly fine and runs good until I load it up. In this case I can reliably cause this to occur by pulling a 3G ISO across the GigE network from my Linux box to my PC. After maybe 50M or so, the console just displays this (ignore initial boot banner): -- * Starting local ... [ ok ] This is tornado.reub.net (Linux x86_64 2.6.24-rc4-mm1) 00:24:01 tornado login: *** buffer overf --- Yes - after displaying the 'f' in what I can only guess is the word 'overflow', the box spontaneously reboots. There is no further console output until it starts to come back up again. The problem does not exist in 2.6.23-gentoo kernels nor in a vanilla 2.6.24-rc4-git6 (phew!), so this looks to be an -mm only problem at this stage. I enabled a number of kernel debugging options but then I got no output at all when the machine crashed. I'm at a bit of a loss as to which subsystem this might be coming from, so I'm not sure who to CC. Box information is (still) up at http://www.reub.net/files/kernel/2.6.24-rc4-mm1/ hm. grepping around for "buffer overflow" doesn't turn up anything except in drivers which you won't be using on that machine. 
I'd be suspecting networking, obviously. If you're feeling keen could you please grep a 2.6.24-rc4 tree and apply 2.6.24-rc4-mm1's origin.patch and git-net.patch and see if the bug is still present? No - seems to be fine with just origin.patch and git-net.patch. Just for good measure I then reverted git-net.patch and applied git-netdev-all.patch instead, and still wasn't able to trigger the reboot or console message, no matter how hard I tried. I guess for now I'll sit on it, and if it appears in the next -mm it'll probably annoy me enough and inspire me to dig deeper (or, "guess" deeper, given the lack of direction as to where to even begin). Reuben
Re: 2.6.24-rc4-mm1: undefined reference to `compat_sys_timerfd' on sparc64
From: Andrew Morton <[EMAIL PROTECTED]> Date: Fri, 7 Dec 2007 16:08:00 -0800 > Or should this have been sys_nis_syscall()? sys_nis_syscall() was used in cases on sparc where we wanted to get a log of invocations of unimplemented syscalls, as it aided debugging and anaylsis. But the usefulness of such things I think is long gone, so what I'll likely do is kill the sys_nis_syscall stuff from the sparc ports.
Re: 2.6.24-rc4-mm1: acpi reboots machine... solved
On Sun, Dec 09, 2007 at 10:19:47AM +0100, Borislav Petkov wrote: On Sun, Dec 09, 2007 at 08:50:02AM +0100, Borislav Petkov wrote: Hi Andrew, Hi Len, after booting 2.6.24-rc4-mm1 (2.6.24-rc4-190-g94545ba, otoh, boots just fine) on my asus laptop, the machine reboots after claiming that Critical temperature reached (255 C). However, the degrees number is kinda hinting at 0xff all-ones field. Will try dump_stack in acpi_thermal_critical() to checkout the call path. For now here's the netconsole bootlog: Here's what i got so far: [ 50.287939] Pid: 1, comm: swapper Not tainted 2.6.24-rc4-mm1 #14 [ 50.287999] [c0104b65] show_trace_log_lvl+0x12/0x25 [ 50.288103] [c01053e7] show_trace+0xd/0x10 [ 50.288202] [c0105a6c] dump_stack+0x57/0x5f [ 50.288303] [c021c991] acpi_thermal_check+0x150/0x3bb [ 50.288415] [c021d4b3] acpi_thermal_add+0x261/0x2cf [ 50.288515] [c0213549] acpi_device_probe+0x3e/0xdb [ 50.288615] [c023f8f5] driver_probe_device+0xaf/0x12a [ 50.288717] [c023fa88] __driver_attach+0x6c/0xa5 [ 50.288817] [c023ee5a] bus_for_each_dev+0x3e/0x60 [ 50.288916] [c023f77d] driver_attach+0x14/0x16 [ 50.289015] [c023f5a6] bus_add_driver+0xa6/0x1a8 [ 50.289114] [c023fc53] driver_register+0x42/0x47 [ 50.289214] [c02138c2] acpi_bus_register_driver+0x3a/0x3c [ 50.289316] [c044306b] acpi_thermal_init+0x57/0x76 [ 50.289424] [c04344a7] kernel_init+0x138/0x280 [ 50.289525] [c01047df] kernel_thread_helper+0x7/0x10 [ 50.289625] === [ 50.289680] ACPI: Critical trip point [ 50.289736] Critical temperature reached (255 C), shutting down. so in acpi_thermal_get_temperature() called in acpi_thermal_add() the tz->temperature thingy is not set properly (printk's added): [ 50.276607] Old temp: 4294967023 [ 50.281890] Got temp: 255 [ 50.282567] Old temp: 255 [ 50.287882] Got temp: 255 What's also strange is that the tz acpi_thermal is alloc'd with kzalloc and there's still garbage in it after reading it in acpi_thermal_get_temperature() for the first time. Debugging continues...
(i almost suspected that the problem might be something completely different.) well, after bisecting the rc4-mm1 tree for a whole day today, the evildoer turned out to be broken-out/pnp-request-ioport-and-iomem-resources-used-by-active-devices.patch. After backing this one out, mm1 boots just fine here. -- Regards/Gruß, Boris.
Re: 2.6.24-rc4-mm1
I can't see this compile failure posted anywhere: http://test.kernel.org/results/IBM/126049/build/debug/stderr arch/x86/vdso/vdso32/sigreturn.S: Assembler messages: arch/x86/vdso/vdso32/sigreturn.S:23: Error: suffix or operands invalid for `pop' arch/x86/vdso/vdso32/syscall.S:25: Error: suffix or operands invalid for `pop' make[1]: *** [arch/x86/vdso/vdso32/syscall.o] Error 1 make: *** [arch/x86/vdso] Error 2 I see those on one build machine but not on another, so I thought that it was a tools issue... If so, it's a tools issue that worked fine until -mm1, which makes it a kernel problem in my mind ;-) Nor this one: http://test.kernel.org/results/IBM/126096/build/debug/stderr drivers/char/hvcs.c: In function ‘hvcs_open’: drivers/char/hvcs.c:1180: error: wrong type argument to unary exclamation mark See http://marc.info/?l=linux-kernel&m=119700448119646 for patches. Thanks, M.
Re: 2.6.24-rc4-mm1: acpi reboots machine... solved
On Tuesday 11 December 2007 10:44:43 am Borislav Petkov wrote: On Sun, Dec 09, 2007 at 10:19:47AM +0100, Borislav Petkov wrote: On Sun, Dec 09, 2007 at 08:50:02AM +0100, Borislav Petkov wrote: Hi Andrew, Hi Len, after booting 2.6.24-rc4-mm1 (2.6.24-rc4-190-g94545ba, otoh, boots just fine) on my asus laptop, the machine reboots after claiming that Critical temperature reached (255 C). However, the degrees number is kinda hinting at 0xff all-ones field. Will try dump_stack in acpi_thermal_critical() to checkout the call path. For now here's the netconsole bootlog: Here's what i got so far: [ 50.287939] Pid: 1, comm: swapper Not tainted 2.6.24-rc4-mm1 #14 [ 50.287999] [c0104b65] show_trace_log_lvl+0x12/0x25 [ 50.288103] [c01053e7] show_trace+0xd/0x10 [ 50.288202] [c0105a6c] dump_stack+0x57/0x5f [ 50.288303] [c021c991] acpi_thermal_check+0x150/0x3bb [ 50.288415] [c021d4b3] acpi_thermal_add+0x261/0x2cf [ 50.288515] [c0213549] acpi_device_probe+0x3e/0xdb [ 50.288615] [c023f8f5] driver_probe_device+0xaf/0x12a [ 50.288717] [c023fa88] __driver_attach+0x6c/0xa5 [ 50.288817] [c023ee5a] bus_for_each_dev+0x3e/0x60 [ 50.288916] [c023f77d] driver_attach+0x14/0x16 [ 50.289015] [c023f5a6] bus_add_driver+0xa6/0x1a8 [ 50.289114] [c023fc53] driver_register+0x42/0x47 [ 50.289214] [c02138c2] acpi_bus_register_driver+0x3a/0x3c [ 50.289316] [c044306b] acpi_thermal_init+0x57/0x76 [ 50.289424] [c04344a7] kernel_init+0x138/0x280 [ 50.289525] [c01047df] kernel_thread_helper+0x7/0x10 [ 50.289625] === [ 50.289680] ACPI: Critical trip point [ 50.289736] Critical temperature reached (255 C), shutting down. 
so in acpi_thermal_get_temperature() called in acpi_thermal_add() the tz->temperature thingy is not set properly (printk's added): [ 50.276607] Old temp: 4294967023 [ 50.281890] Got temp: 255 [ 50.282567] Old temp: 255 [ 50.287882] Got temp: 255 What's also strange is that the tz acpi_thermal is alloc'd with kzalloc and there's still garbage in it after reading it in acpi_thermal_get_temperature() for the first time. Debugging continues... (i almost suspected that the problem might be something completely different.) well, after bisecting the rc4-mm1 tree for a whole day today, the evildoer turned out to be broken-out/pnp-request-ioport-and-iomem-resources-used-by-active-devices.patch. After backing this one out, mm1 boots just fine here. Thanks for tracking this down. I'll look into your logs and see if I can figure out what's going on. There's another report related to that patch here: http://lkml.org/lkml/2007/11/22/110 . Looks like a different symptom though, so probably a different fix. Bjorn
Re: 2.6.24-rc4-mm1
On Tue, 11 Dec 2007 08:13:52 -0800 Martin Bligh [EMAIL PROTECTED] wrote: - Lots of device IDs have been removed from the e1000 driver and moved over to e1000e. So if your e1000 stops working, you forgot to set CONFIG_E1000E. Wouldn't it make sense to just default this to on if E1000 was on, rather than screwing everybody for no good reason (plus breaking all the automated testing, etc etc)? Much though I love random refactoring, it is fairly painful to just keep changing the names of things. (cc netdev and Auke) Yes, that would be very sensible. CONFIG_E1000E should default to whatever CONFIG_E1000 was set to. I can't see this compile failure posted anywhere: http://test.kernel.org/results/IBM/126049/build/debug/stderr arch/x86/vdso/vdso32/sigreturn.S: Assembler messages: arch/x86/vdso/vdso32/sigreturn.S:23: Error: suffix or operands invalid for `pop' arch/x86/vdso/vdso32/syscall.S:25: Error: suffix or operands invalid for `pop' make[1]: *** [arch/x86/vdso/vdso32/syscall.o] Error 1 make: *** [arch/x86/vdso] Error 2 (cc Ingo and Thomas) Nor this one: http://test.kernel.org/results/IBM/126096/build/debug/stderr drivers/char/hvcs.c: In function ‘hvcs_open’: drivers/char/hvcs.c:1180: error: wrong type argument to unary exclamation mark (cc Greg) Caused by gregkh-driver-kobject-convert-hvcs-to-use-kref-not-kobject.patch.
Re: 2.6.24-rc4-mm1
* Andrew Morton [EMAIL PROTECTED] wrote: I can't see this compile failure posted anywhere: http://test.kernel.org/results/IBM/126049/build/debug/stderr arch/x86/vdso/vdso32/sigreturn.S: Assembler messages: arch/x86/vdso/vdso32/sigreturn.S:23: Error: suffix or operands invalid for `pop' arch/x86/vdso/vdso32/syscall.S:25: Error: suffix or operands invalid for `pop' make[1]: *** [arch/x86/vdso/vdso32/syscall.o] Error 1 make: *** [arch/x86/vdso] Error 2 (cc Ingo and Thomas) Roland says: | That seems like it must be a tool problem. The V=1 output would show | if those compiles missed -m32 or something. But even in the wrong | mode, this error does not make sense. The assembly code it's citing | is identical to the old arch/x86/ia32/vsyscall-syscall.S code. Ingo
Re: 2.6.24-rc4-mm1: acpi reboots machine... solved
On Tue, Dec 11, 2007 at 01:00:24PM -0700, Bjorn Helgaas wrote: On Tuesday 11 December 2007 10:44:43 am Borislav Petkov wrote: On Sun, Dec 09, 2007 at 10:19:47AM +0100, Borislav Petkov wrote: On Sun, Dec 09, 2007 at 08:50:02AM +0100, Borislav Petkov wrote: Hi Andrew, Hi Len, after booting 2.6.24-rc4-mm1 (2.6.24-rc4-190-g94545ba, otoh, boots just fine) on my asus laptop, the machine reboots after claiming that Critical temperature reached (255 C). However, the degrees number is kinda hinting at 0xff all-ones field. Will try dump_stack in acpi_thermal_critical() to checkout the call path. For now here's the netconsole bootlog: Here's what i got so far: [ 50.287939] Pid: 1, comm: swapper Not tainted 2.6.24-rc4-mm1 #14 [ 50.287999] [c0104b65] show_trace_log_lvl+0x12/0x25 [ 50.288103] [c01053e7] show_trace+0xd/0x10 [ 50.288202] [c0105a6c] dump_stack+0x57/0x5f [ 50.288303] [c021c991] acpi_thermal_check+0x150/0x3bb [ 50.288415] [c021d4b3] acpi_thermal_add+0x261/0x2cf [ 50.288515] [c0213549] acpi_device_probe+0x3e/0xdb [ 50.288615] [c023f8f5] driver_probe_device+0xaf/0x12a [ 50.288717] [c023fa88] __driver_attach+0x6c/0xa5 [ 50.288817] [c023ee5a] bus_for_each_dev+0x3e/0x60 [ 50.288916] [c023f77d] driver_attach+0x14/0x16 [ 50.289015] [c023f5a6] bus_add_driver+0xa6/0x1a8 [ 50.289114] [c023fc53] driver_register+0x42/0x47 [ 50.289214] [c02138c2] acpi_bus_register_driver+0x3a/0x3c [ 50.289316] [c044306b] acpi_thermal_init+0x57/0x76 [ 50.289424] [c04344a7] kernel_init+0x138/0x280 [ 50.289525] [c01047df] kernel_thread_helper+0x7/0x10 [ 50.289625] === [ 50.289680] ACPI: Critical trip point [ 50.289736] Critical temperature reached (255 C), shutting down. 
so in acpi_thermal_get_temperature() called in acpi_thermal_add() the tz->temperature thingy is not set properly (printk's added): [ 50.276607] Old temp: 4294967023 [ 50.281890] Got temp: 255 [ 50.282567] Old temp: 255 [ 50.287882] Got temp: 255 What's also strange is that the tz acpi_thermal is alloc'd with kzalloc and there's still garbage in it after reading it in acpi_thermal_get_temperature() for the first time. Debugging continues... (i almost suspected that the problem might be something completely different.) well, after bisecting the rc4-mm1 tree for a whole day today, the evildoer turned out to be broken-out/pnp-request-ioport-and-iomem-resources-used-by-active-devices.patch. After backing this one out, mm1 boots just fine here. Thanks for tracking this down. I'll look into your logs and see if I can figure out what's going on. There's another report related to that patch here: http://lkml.org/lkml/2007/11/22/110 . Looks like a different symptom though, so probably a different fix. From what i can roughly tell so far it seems like an resource conflict between acpi and the pnp requested regions in your patch which result in the acpi_thermal code to read the wrong (0xff) temperature value and halt the machine, but i might be wrong on the details since acpi is such a big code chunk to swallow. Anyways, this is a different issue than the one you quote above. -- Regards/Gruß, Boris.
Re: 2.6.24-rc4-mm1
Andrew Morton wrote: On Tue, 11 Dec 2007 08:13:52 -0800 Martin Bligh [EMAIL PROTECTED] wrote: - Lots of device IDs have been removed from the e1000 driver and moved over to e1000e. So if your e1000 stops working, you forgot to set CONFIG_E1000E. Wouldn't it make sense to just default this to on if E1000 was on, rather than screwing everybody for no good reason (plus breaking all the automated testing, etc etc)? Much though I love random refactoring, it is fairly painful to just keep changing the names of things. (cc netdev and Auke) Yes, that would be very sensible. CONFIG_E1000E should default to whatever CONFIG_E1000 was set to. which is y for x86 and friends, ppc, arm and ia64 through 'defconfig'. the Kconfig files do not have defaults in them. I can send a patch to adjust the defconfig files, would that be OK? I certainly think that would be reasonable, I dislike setting defaults through defconfig for network drivers myself and rather would not do that. Auke
Re: 2.6.24-rc4-mm1
Kok, Auke wrote: Andrew Morton wrote: On Tue, 11 Dec 2007 08:13:52 -0800 Martin Bligh [EMAIL PROTECTED] wrote: - Lots of device IDs have been removed from the e1000 driver and moved over to e1000e. So if your e1000 stops working, you forgot to set CONFIG_E1000E. Wouldn't it make sense to just default this to on if E1000 was on, rather than screwing everybody for no good reason (plus breaking all the automated testing, etc etc)? Much though I love random refactoring, it is fairly painful to just keep changing the names of things. (cc netdev and Auke) Yes, that would be very sensible. CONFIG_E1000E should default to whatever CONFIG_E1000 was set to. which is y for x86 and friends, ppc, arm and ia64 through 'defconfig'. the Kconfig files do not have defaults in them. I can send a patch to adjust the defconfig files, would that be OK? I certainly think that would be reasonable, I dislike setting defaults through defconfig for network drivers myself and rather would not do that. that should read dislike setting defaults through Kconfig ... Auke
Re: 2.6.24-rc4-mm1
On Tue, 11 Dec 2007 13:26:58 -0800 Kok, Auke [EMAIL PROTECTED] wrote: Andrew Morton wrote: On Tue, 11 Dec 2007 08:13:52 -0800 Martin Bligh [EMAIL PROTECTED] wrote: - Lots of device IDs have been removed from the e1000 driver and moved over to e1000e. So if your e1000 stops working, you forgot to set CONFIG_E1000E. Wouldn't it make sense to just default this to on if E1000 was on, rather than screwing everybody for no good reason (plus breaking all the automated testing, etc etc)? Much though I love random refactoring, it is fairly painful to just keep changing the names of things. (cc netdev and Auke) Yes, that would be very sensible. CONFIG_E1000E should default to whatever CONFIG_E1000 was set to. which is y for x86 and friends, ppc, arm and ia64 through 'defconfig'. the Kconfig files do not have defaults in them. I wouldn't be looking at defconfig files - I don't think many people use them. Most people use their previous config, via oldconfig. So what we want here is to give them E1000E if they had previously been using E1000. I don't know how one would do this in Kconfig.
Re: 2.6.24-rc4-mm1
Andrew Morton wrote: On Tue, 11 Dec 2007 13:26:58 -0800 Kok, Auke [EMAIL PROTECTED] wrote: Andrew Morton wrote: On Tue, 11 Dec 2007 08:13:52 -0800 Martin Bligh [EMAIL PROTECTED] wrote: - Lots of device IDs have been removed from the e1000 driver and moved over to e1000e. So if your e1000 stops working, you forgot to set CONFIG_E1000E. Wouldn't it make sense to just default this to on if E1000 was on, rather than screwing everybody for no good reason (plus breaking all the automated testing, etc etc)? Much though I love random refactoring, it is fairly painful to just keep changing the names of things. (cc netdev and Auke) Yes, that would be very sensible. CONFIG_E1000E should default to whatever CONFIG_E1000 was set to. which is y for x86 and friends, ppc, arm and ia64 through 'defconfig'. the Kconfig files do not have defaults in them. I wouldn't be looking at defconfig files - I don't think many people use them. Most people use their previous config, via oldconfig. So what we want here is to give them E1000E if they had previously been using E1000. I don't know how one would do this in Kconfig. ditto. I doubt that SELECT E1000E would be a good idea here (maybe not even work), and I can't think of anything else. Auke
Re: 2.6.24-rc4-mm1
On Tue, 11 Dec 2007 14:17:16 -0800 Kok, Auke wrote: Andrew Morton wrote: On Tue, 11 Dec 2007 13:26:58 -0800 Kok, Auke [EMAIL PROTECTED] wrote: Andrew Morton wrote: On Tue, 11 Dec 2007 08:13:52 -0800 Martin Bligh [EMAIL PROTECTED] wrote: - Lots of device IDs have been removed from the e1000 driver and moved over to e1000e. So if your e1000 stops working, you forgot to set CONFIG_E1000E. Wouldn't it make sense to just default this to on if E1000 was on, rather than screwing everybody for no good reason (plus breaking all the automated testing, etc etc)? Much though I love random refactoring, it is fairly painful to just keep changing the names of things. (cc netdev and Auke) Yes, that would be very sensible. CONFIG_E1000E should default to whatever CONFIG_E1000 was set to. which is y for x86 and friends, ppc, arm and ia64 through 'defconfig'. the Kconfig files do not have defaults in them. I wouldn't be looking at defconfig files - I don't think many people use them. Most people use their previous config, via oldconfig. So what we want here is to give them E1000E if they had previously been using E1000. I don't know how one would do this in Kconfig. ditto. I doubt that SELECT E1000E would be a good idea here (maybe not even work), and I can't think of anything else. default E1000 in E1000E seems to work for me. --- From: Randy Dunlap [EMAIL PROTECTED] Make E1000E default to the same kconfig setting as E1000, at least for -mm testing. Signed-off-by: Randy Dunlap [EMAIL PROTECTED] --- drivers/net/Kconfig | 1 + 1 file changed, 1 insertion(+) --- linux-2.6.24-rc4-mm1.orig/drivers/net/Kconfig +++ linux-2.6.24-rc4-mm1/drivers/net/Kconfig @@ -1986,6 +1986,7 @@ config E1000_DISABLE_PACKET_SPLIT config E1000E tristate "Intel(R) PRO/1000 PCI-Express Gigabit Ethernet support" depends on PCI + default E1000 ---help--- This driver supports the PCI-Express Intel(R) PRO/1000 gigabit ethernet family of adapters.
For PCI or PCI-X e1000 adapters,
Re: 2.6.24-rc4-mm1: acpi reboots machine... solved
On Tuesday 11 December 2007 01:52:55 pm Borislav Petkov wrote: From what i can roughly tell so far it seems like an resource conflict between acpi and the pnp requested regions in your patch which result in the acpi_thermal code to read the wrong (0xff) temperature value and halt the machine, but i might be wrong on the details since acpi is such a big code chunk to swallow. I don't see any obvious conflict from the log you posted. For the sake of comparison, can you post the corresponding dmesg log after you removed the patch? acpi_thermal_get_temperature() only evaluates _TMP, which isn't very interesting. I wonder if there's some conflict between that AML method and the EC driver or something. If you can also collect the DSDT, maybe I can poke around in there and see what _TMP is really doing. Thanks, Bjorn
Re: 2.6.24-rc4-mm1 -- boot process hangs -- tty4 main process (2988) terminated with status 1
On Sat, 8 Dec 2007 21:29:18 -0500 Miles Lane [EMAIL PROTECTED] wrote: Dec 6 21:24:28 erratic-orbits init: tty3 main process (2991) terminated with status 1 Boggle. We broke the vt driver? config, please... I sent the .config. I didn't receive it but I found a config from you in amother thread. Is there nothing else to follow up on? I have tried rebuilding about seven kernels, tweaking the options each time. All the kernels have failed to boot. I am currently trying with a defconfig kernel. Perhaps I will have better luck with it. Your config instabricks my Vaio. Fiddled with it a bit but failed to pick the problem. Fixing regressions in -mm isn't top priority at present I'm afraid. If the same bug is present in next -mm it'd be great if you could bisect it down please.
Re: 2.6.24-rc4-mm1
On Tue, 4 Dec 2007 21:17:01 -0800 Andrew Morton [EMAIL PROTECTED] wrote: Changes since 2.6.24-rc3-mm2: 2.6.24-rc4-mm1 brought a nice TCP oops on my x86_64 system, while I was stress-testing the VM and watching via ssh: general protection fault: [1] SMP last sysfs file: /sys/devices/pci:00/:00:1c.5/:04:00.0/irq CPU 1 Modules linked in: nfs lockd nfs_acl rfcomm l2cap bluetooth autofs4 sunrpc ipv6 acpi_cpufreq dm_multipath parport_pc e1000e parport firewire_ohci button i2c_i801 i2c_core i82975x_edac pcspkr firewire_core serio_raw edac_core rtc_cmos floppy crc_itu_t sg sr_mod cdrom pata_marvell ata_piix dm_snapshot dm_zero dm_mirror dm_mod ahci libata sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd Pid: 2946, comm: sshd Not tainted 2.6.24-rc4-mm1 #1 RIP: 0010:[81227add] [81227add] __tcp_rb_insert+0x1a/0x67 RSP: 0018:810066401c88 EFLAGS: 00010202 RAX: 6b6b6b6b6b6b6b6b RBX: 810076e9f000 RCX: 81003ddc9900 RDX: 6b6b6b6b6b6b6bab RSI: 81006ed1b148 RDI: 6b6b6b6b6b6b6b5b RBP: 81006ed1aa00 R08: 810076e9f010 R09: bef8d64e R10: 81228926 R11: 8110b2aa R12: 810066401de8 R13: 00e0 R14: 810066401ee8 R15: 0001 FS: 7f1c2c10d780() GS:81007f801578() knlGS: CS: 0010 DS: ES: CR0: 8005003b CR2: 02aabfd3 CR3: 665e3000 CR4: 06e0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 Process sshd (pid: 2946, threadinfo 81006640, task 8100665ce000) Stack: 81003ddc9900 81228b26 0001 810066401ee8 810574da 04e00040 00e004e0 7f1c2c797620 0246 66401d60 Call Trace: [81228b26] tcp_sendmsg+0x21f/0xb00 [811f0435] sock_aio_write+0xf8/0x110 [810a9451] do_sync_write+0xc9/0x10c [811071d3] file_has_perm+0x9a/0xa9 [8104e29a] autoremove_wake_function+0x0/0x2e [81059db6] __lock_acquire+0x50f/0xc8e [810574da] lock_release_holdtime+0x27/0x48 [810a9c53] vfs_write+0xd9/0x16f [810aa1fd] sys_write+0x45/0x6e [8100c0ba] tracesys+0xdc/0xe1 Code: 44 3b 4a 1c 79 10 44 3b 4a 18 78 04 0f 0b eb fe 48 8d 50 10 RIP [81227add] __tcp_rb_insert+0x1a/0x67 RSP 810066401c88 -- Debugging is twice as hard as writing the code in the first 
place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it. - Brian W. Kernighan
Re: 2.6.24-rc4-mm1
On Tue, 11 Dec 2007 01:48:39 +1100 Reuben Farrelly <[EMAIL PROTECTED]> wrote: > > > On 5/12/2007 4:17 PM, Andrew Morton wrote: > > Temporarily at > > > > http://userweb.kernel.org/~akpm/2.6.24-rc4-mm1/ > > > > Will appear later at > > > > > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc4/2.6.24-rc4-mm1/ > > > > > > - Lots of device IDs have been removed from the e1000 driver and moved over > > to e1000e. So if your e1000 stops working, you forgot to set > > CONFIG_E1000E. > > > > - The s390 build is still broken. > > I'm seeing this most incredibly unhelpful (to debug) but fortunately > reproduceable problem (so far 4/4 times) on this -mm kernel. I thought this > problem may have been related to another bug which I have reported (A TCP > oops) > but even after applying a likely fix for that I am still seeing this problem. > > The machine boots up perfectly fine and runs good until I load it up. > In this case I can reliably cause this to occur by pulling a 3G ISO across the > GigE network from my Linux box to my PC. After maybe 50M or so, the console > just displays this (ignore initial boot banner): > > -- > > * Starting local ... [ > ok ] > > > This is tornado.reub.net (Linux x86_64 2.6.24-rc4-mm1) 00:24:01 > > tornado login: *** buffer overf > > --- > > Yes - after displaying the 'f' in what I can only guess is the word > 'overflow', > the box spontaneously reboots. There is no further console output until it > starts to come back up again. > > The problem does not exist in 2.6.23-gentoo kernels nor in a vanilla > 2.6.24-rc4-git6 (phew!), so this looks to be an -mm only problem at this > stage. > > I enabled a number of kernel debugging options but then I got no output at > all > when the machine crashed. > > I'm at a bit of a loss as to which subsystem this might be coming from, so > I'm > not sure who to CC. > > Box information is (still) up at > http://www.reub.net/files/kernel/2.6.24-rc4-mm1/ > hm. 
grepping around for "buffer overflow" doesn't turn up anything except in drivers which you won't be using on that machine. I'd be suspecting networking, obviously. If you're feeling keen could you please grep a 2.6.24-rc4 tree and apply 2.6.24-rc4-mm1's origin.patch and git-net.patch and see if the bug is still present?
Re: 2.6.24-rc4-mm1
On Mon, 10 Dec 2007, Ilpo Järvinen wrote: > Dave, please include this one to net-2.6.25. ... > -- > [PATCH] [TCP]: Fix fack_count miscountings (multiple places) I've better version of this coming up, so Dave please don't put this one into net-2.6.25 (noticed that both the original and the after patch code can get to an infinite loop and the new code is flawed in some rare cases still as well). I'll submit a better version soon. -- i.
Re: 2.6.24-rc4-mm1
On 5/12/2007 4:17 PM, Andrew Morton wrote: Temporarily at http://userweb.kernel.org/~akpm/2.6.24-rc4-mm1/ Will appear later at ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc4/2.6.24-rc4-mm1/ - Lots of device IDs have been removed from the e1000 driver and moved over to e1000e. So if your e1000 stops working, you forgot to set CONFIG_E1000E. - The s390 build is still broken. I'm seeing this most incredibly unhelpful (to debug) but fortunately reproduceable problem (so far 4/4 times) on this -mm kernel. I thought this problem may have been related to another bug which I have reported (A TCP oops) but even after applying a likely fix for that I am still seeing this problem. The machine boots up perfectly fine and runs good until I load it up. In this case I can reliably cause this to occur by pulling a 3G ISO across the GigE network from my Linux box to my PC. After maybe 50M or so, the console just displays this (ignore initial boot banner): -- * Starting local ... [ ok ] This is tornado.reub.net (Linux x86_64 2.6.24-rc4-mm1) 00:24:01 tornado login: *** buffer overf --- Yes - after displaying the 'f' in what I can only guess is the word 'overflow', the box spontaneously reboots. There is no further console output until it starts to come back up again. The problem does not exist in 2.6.23-gentoo kernels nor in a vanilla 2.6.24-rc4-git6 (phew!), so this looks to be an -mm only problem at this stage. I enabled a number of kernel debugging options but then I got no output at all when the machine crashed. I'm at a bit of a loss as to which subsystem this might be coming from, so I'm not sure who to CC. Box information is (still) up at http://www.reub.net/files/kernel/2.6.24-rc4-mm1/ Thanks, Reuben
Re: 2.6.24-rc4-mm1
On Wed, 5 Dec 2007, Andrew Morton wrote: > On Thu, 06 Dec 2007 17:59:37 +1100 Reuben Farrelly <[EMAIL PROTECTED]> wrote: > > > This non fatal oops which I have just noticed may be related to this change > > then > > - certainly looks networking related. > > yep, but it isn't e1000. It's core TCP. > > > WARNING: at net/ipv4/tcp_input.c:2518 tcp_fastretrans_alert() > > Pid: 0, comm: swapper Not tainted 2.6.24-rc4-mm1 #1 > > Ilpo, Reuben's kernel is talking to you ;) ...Please try the patch below. Andrew, this probably fixes your problem (the packets <= tp->packets_out) as well. Dave, please include this one to net-2.6.25. -- i. -- [PATCH] [TCP]: Fix fack_count miscountings (multiple places) 1) Fack_count is set incorrectly if the highest sent skb is already sacked (the skb->prev won't return it because it's on the other list already). These manifest as fackets_out counting error later on, the second-order effects are very hard to track, so it may fix all out-standing TCP bug reports. 2) Prev == NULL check was wrong way around 3) Last skb's fack count was incorrectly skipped while() {} loop Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]> --- include/net/tcp.h | 22 -- 1 files changed, 16 insertions(+), 6 deletions(-) diff --git a/include/net/tcp.h b/include/net/tcp.h index 9dbed0b..11a7e3e 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -1337,10 +1337,20 @@ static inline struct sk_buff *tcp_send_head(struct sock *sk) static inline void tcp_advance_send_head(struct sock *sk, struct sk_buff *skb) { struct sk_buff *prev = tcp_write_queue_prev(sk, skb); + unsigned int fc = 0; + + if (prev == (struct sk_buff *)&sk->sk_write_queue) + prev = NULL; + else if (!tcp_skb_adjacent(sk, prev, skb)) + prev = NULL; - if (prev != (struct sk_buff *)&sk->sk_write_queue) - TCP_SKB_CB(skb)->fack_count = TCP_SKB_CB(prev)->fack_count + - tcp_skb_pcount(prev); + if ((prev == NULL) && !__tcp_write_queue_empty(sk, TCP_WQ_SACKED)) + prev = __tcp_write_queue_tail(sk, TCP_WQ_SACKED); + + if (prev != NULL) + fc = TCP_SKB_CB(prev)->fack_count + tcp_skb_pcount(prev); + + TCP_SKB_CB(skb)->fack_count = fc; sk->sk_send_head = tcp_write_queue_next(sk, skb); if (sk->sk_send_head == (struct sk_buff *)&sk->sk_write_queue) @@ -1464,7 +1474,7 @@ static inline struct sk_buff *__tcp_reset_fack_counts(struct sock *sk, { unsigned int fc = 0; - if (prev == NULL) + if (prev != NULL) fc = TCP_SKB_CB(*prev)->fack_count + tcp_skb_pcount(*prev); BUG_ON((*prev != NULL) && !tcp_skb_adjacent(sk, *prev, skb)); @@ -1521,7 +1531,7 @@ static inline void tcp_reset_fack_counts(struct sock *sk, struct sk_buff *inskb) skb[otherq] = prev->next; } - while (skb[queue] != __tcp_write_queue_tail(sk, queue)) { + do { /* Lazy find for the other queue */ if (skb[queue] == NULL) { skb[queue] = tcp_write_queue_find(sk, TCP_SKB_CB(prev)->seq, @@ -1535,7 +1545,7 @@ static inline void tcp_reset_fack_counts(struct sock *sk, struct sk_buff *inskb) break; queue ^= TCP_WQ_SACKED; - } + } while (skb[queue] != __tcp_write_queue_tail(sk, queue)); } static inline void __tcp_insert_write_queue_after(struct sk_buff *skb, -- 1.5.0.6
Re: 2.6.24-rc4-mm1
On 5/12/2007 4:17 PM, Andrew Morton wrote:
> Temporarily at
>
>   http://userweb.kernel.org/~akpm/2.6.24-rc4-mm1/
>
> Will appear later at
>
>   ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc4/2.6.24-rc4-mm1/
>
> - Lots of device IDs have been removed from the e1000 driver and moved over
>   to e1000e.  So if your e1000 stops working, you forgot to set CONFIG_E1000E.
>
> - The s390 build is still broken.

I'm seeing this most incredibly unhelpful (to debug) but fortunately reproducible problem (so far 4/4 times) on this -mm kernel. I thought this problem may have been related to another bug which I have reported (A TCP oops) but even after applying a likely fix for that I am still seeing this problem.

The machine boots up perfectly fine and runs good until I load it up. In this case I can reliably cause this to occur by pulling a 3G ISO across the GigE network from my Linux box to my PC. After maybe 50M or so, the console just displays this (ignore initial boot banner):

--
 * Starting local ...                                          [ ok ]

This is tornado.reub.net (Linux x86_64 2.6.24-rc4-mm1) 00:24:01

tornado login: *** buffer overf
---

Yes - after displaying the 'f' in what I can only guess is the word 'overflow', the box spontaneously reboots. There is no further console output until it starts to come back up again.

The problem does not exist in 2.6.23-gentoo kernels nor in a vanilla 2.6.24-rc4-git6 (phew!), so this looks to be an -mm only problem at this stage.

I enabled a number of kernel debugging options but then I got no output at all when the machine crashed.

I'm at a bit of a loss as to which subsystem this might be coming from, so I'm not sure who to CC.

Box information is (still) up at http://www.reub.net/files/kernel/2.6.24-rc4-mm1/

Thanks,
Reuben

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.24-rc4-mm1
On Mon, 10 Dec 2007, Ilpo Järvinen wrote:

> Dave, please include this one to net-2.6.25.
> ...
> --
> [PATCH] [TCP]: Fix fack_count miscountings (multiple places)

I've better version of this coming up, so Dave please don't put this one into net-2.6.25 (noticed that both the original and the after patch code can get to an infinite loop and the new code is flawed in some rare cases still as well). I'll submit a better version soon.

--
 i.
Re: 2.6.24-rc4-mm1
On Tue, 11 Dec 2007 01:48:39 +1100 Reuben Farrelly <[EMAIL PROTECTED]> wrote:

> On 5/12/2007 4:17 PM, Andrew Morton wrote:
> > Temporarily at
> >
> >   http://userweb.kernel.org/~akpm/2.6.24-rc4-mm1/
> >
> > Will appear later at
> >
> >   ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc4/2.6.24-rc4-mm1/
> >
> > - Lots of device IDs have been removed from the e1000 driver and moved over
> >   to e1000e.  So if your e1000 stops working, you forgot to set CONFIG_E1000E.
> >
> > - The s390 build is still broken.
>
> I'm seeing this most incredibly unhelpful (to debug) but fortunately
> reproducible problem (so far 4/4 times) on this -mm kernel. I thought this
> problem may have been related to another bug which I have reported (A TCP
> oops) but even after applying a likely fix for that I am still seeing this
> problem.
>
> The machine boots up perfectly fine and runs good until I load it up. In this
> case I can reliably cause this to occur by pulling a 3G ISO across the GigE
> network from my Linux box to my PC. After maybe 50M or so, the console just
> displays this (ignore initial boot banner):
>
> --
>  * Starting local ...                                          [ ok ]
>
> This is tornado.reub.net (Linux x86_64 2.6.24-rc4-mm1) 00:24:01
>
> tornado login: *** buffer overf
> ---
>
> Yes - after displaying the 'f' in what I can only guess is the word
> 'overflow', the box spontaneously reboots. There is no further console output
> until it starts to come back up again.
>
> The problem does not exist in 2.6.23-gentoo kernels nor in a vanilla
> 2.6.24-rc4-git6 (phew!), so this looks to be an -mm only problem at this stage.
>
> I enabled a number of kernel debugging options but then I got no output at
> all when the machine crashed.
>
> I'm at a bit of a loss as to which subsystem this might be coming from, so I'm
> not sure who to CC.
>
> Box information is (still) up at http://www.reub.net/files/kernel/2.6.24-rc4-mm1/

hm.  grepping around for "buffer overflow" doesn't turn up anything except in drivers which you won't be using on that machine.

I'd be suspecting networking, obviously.  If you're feeling keen could you please grab a 2.6.24-rc4 tree and apply 2.6.24-rc4-mm1's origin.patch and git-net.patch and see if the bug is still present?
Re: 2.6.24-rc4-mm1
On Dec 8, 2007 6:22 AM, Luis R. Rodriguez <[EMAIL PROTECTED]> wrote:
> On Dec 6, 2007 9:12 PM, Dave Young <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > 2.6.24-rc4-mm1 build failed at drivers/net/wireless/ath5k/base.c for some
> > inline functions like this:
> > drivers/net/wireless/ath5k/base.c:292: sorry, unimplemented: inlining
> > failed in call to 'ath5k_extend_tsf': function body not available
> >
> > fix it with adjust the order of inline function body.
> >
> > Signed-off-by: Dave Young <[EMAIL PROTECTED]>
>
> Acked-by: Luis R. Rodriguez <[EMAIL PROTECTED]>

Thanks.

>
> Thanks Dave. What version of gcc were you using? I haven't run into this.

gcc 3.4.6

>
> BTW, nothing new was added in this patch, things were just shifted,
> but even that may be copyrightable. Is it fair to assume you are
> licensing these changes under the same license the file is in?

Ok, I don't care.

>
> For this file we'd usually use:
>
> Changes-licensed-under: 3-clause-BSD
>
> For future reference:
>
> http://linuxwireless.org/en/developers/Documentation/SubmittingPatches#Changes-licensed-undertag
>
> Luis
>
Re: 2.6.24-rc4-mm1
2007/12/7, Dave Young <[EMAIL PROTECTED]>:
> Hi,
>
> 2.6.24-rc4-mm1 build failed at drivers/net/wireless/ath5k/base.c for some
> inline functions like this:
> drivers/net/wireless/ath5k/base.c:292: sorry, unimplemented: inlining failed
> in call to 'ath5k_extend_tsf': function body not available
>
> fix it with adjust the order of inline function body.
>
> Signed-off-by: Dave Young <[EMAIL PROTECTED]>
>
> ---
> drivers/net/wireless/ath5k/base.c |   67 --
> 1 file changed, 29 insertions(+), 38 deletions(-)
>
> diff -upr linux/drivers/net/wireless/ath5k/base.c linux.new/drivers/net/wireless/ath5k/base.c
> --- linux/drivers/net/wireless/ath5k/base.c	2007-12-07 10:01:42.0 +0800
> +++ linux.new/drivers/net/wireless/ath5k/base.c	2007-12-07 10:01:49.0 +0800
> @@ -250,8 +250,19 @@ static int ath5k_rxbuf_setup(struct ath
>  static int ath5k_txbuf_setup(struct ath5k_softc *sc,
>  				struct ath5k_buf *bf,
>  				struct ieee80211_tx_control *ctl);
> +
>  static inline void ath5k_txbuf_free(struct ath5k_softc *sc,
> -				struct ath5k_buf *bf);
> +				struct ath5k_buf *bf)
> +{
> +	BUG_ON(!bf);
> +	if (!bf->skb)
> +		return;
> +	pci_unmap_single(sc->pdev, bf->skbaddr, bf->skb->len,
> +			PCI_DMA_TODEVICE);
> +	dev_kfree_skb(bf->skb);
> +	bf->skb = NULL;
> +}
> +
>  /* Queues setup */
>  static struct ath5k_txq *ath5k_txq_setup(struct ath5k_softc *sc,
>  				int qtype, int subtype);
> @@ -278,14 +289,29 @@ static int	ath5k_beacon_setup(struct at
>  				struct ieee80211_tx_control *ctl);
>  static void	ath5k_beacon_send(struct ath5k_softc *sc);
>  static void	ath5k_beacon_config(struct ath5k_softc *sc);
> -static inline u64 ath5k_extend_tsf(struct ath5k_hw *ah, u32 rstamp);
> +
> +static inline u64 ath5k_extend_tsf(struct ath5k_hw *ah, u32 rstamp)
> +{
> +	u64 tsf = ath5k_hw_get_tsf64(ah);
> +
> +	if ((tsf & 0x7fff) < rstamp)
> +		tsf -= 0x8000;
> +
> +	return (tsf & ~0x7fff) | rstamp;
> +}
> +
>  /* Interrupt handling */
>  static int	ath5k_init(struct ath5k_softc *sc);
>  static int	ath5k_stop_locked(struct ath5k_softc *sc);
>  static int	ath5k_stop_hw(struct ath5k_softc *sc);
>  static irqreturn_t ath5k_intr(int irq, void *dev_id);
>  static void	ath5k_tasklet_reset(unsigned long data);
> -static inline void ath5k_update_txpow(struct ath5k_softc *sc);
> +
> +static inline void ath5k_update_txpow(struct ath5k_softc *sc)
> +{
> +	ath5k_hw_set_txpower_limit(sc->ah, 0);
> +}
> +
>  static void	ath5k_calibrate(unsigned long data);
>  /* LED functions */
>  static void	ath5k_led_off(unsigned long data);
> @@ -1341,21 +1367,6 @@ err_unmap:
>  	return ret;
>  }
>
> -static inline void
> -ath5k_txbuf_free(struct ath5k_softc *sc, struct ath5k_buf *bf)
> -{
> -	BUG_ON(!bf);
> -	if (!bf->skb)
> -		return;
> -	pci_unmap_single(sc->pdev, bf->skbaddr, bf->skb->len,
> -			PCI_DMA_TODEVICE);
> -	dev_kfree_skb(bf->skb);
> -	bf->skb = NULL;
> -}
> -
> -
> -
> -
>  /**\
>  * Queues setup *
>  \**/
> @@ -2046,20 +2057,6 @@ ath5k_beacon_config(struct ath5k_softc *
>  #undef TSF_TO_TU
>  }
>
> -static inline
> -u64 ath5k_extend_tsf(struct ath5k_hw *ah, u32 rstamp)
> -{
> -	u64 tsf = ath5k_hw_get_tsf64(ah);
> -
> -	if ((tsf & 0x7fff) < rstamp)
> -		tsf -= 0x8000;
> -
> -	return (tsf & ~0x7fff) | rstamp;
> -}
> -
> -
> -
> -
>  /\
>  * Interrupt handling *
>  \/
> @@ -2295,12 +2292,6 @@ ath5k_tasklet_reset(unsigned long data)
>  	ath5k_reset(sc->hw);
>  }
>
> -static inline void
> -ath5k_update_txpow(struct ath5k_softc *sc)
> -{
> -	ath5k_hw_set_txpower_limit(sc->ah, 0);
> -}
> -
>  /*
>   * Periodically recalibrate the PHY to account
>   * for temperature/environment changes.
>

We'll change their order in the code, plz keep prototype declarations clean. I'll submit a patch asap on this.

--
GPG ID: 0xD21DB2DB
As you read this post global entropy rises. Have Fun ;-)
Nick
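The helper the patch moves, ath5k_extend_tsf(), widens a 15-bit receive timestamp using the current 64-bit TSF. A minimal user-space sketch of the same arithmetic follows; `extend_tsf()` and `rx_timestamp()` are hypothetical names, and the current TSF is passed in as a parameter instead of being read with ath5k_hw_get_tsf64(). The helper's body is also placed above its only caller, which is the ordering the patch relies on: per this thread, gcc 3.4.6 refused to inline a function whose body appeared only after the call site.

```c
#include <stdint.h>

/* Sketch of the ath5k_extend_tsf() arithmetic. Assumption: the current
 * 64-bit TSF is a parameter here rather than a hardware read. */
static inline uint64_t extend_tsf(uint64_t tsf, uint32_t rstamp)
{
	/* rstamp holds only the low 15 bits of the TSF at receive time. If
	 * the current low 15 bits are already below it, those bits wrapped
	 * after the frame was stamped, so step back one 0x8000 period. */
	if ((tsf & 0x7fff) < rstamp)
		tsf -= 0x8000;

	/* Keep the upper bits of the (adjusted) TSF, splice in rstamp. */
	return (tsf & ~(uint64_t)0x7fff) | rstamp;
}

/* Hypothetical caller, defined after the inline body it uses. */
static uint64_t rx_timestamp(uint64_t tsf_now, uint32_t rstamp)
{
	return extend_tsf(tsf_now, rstamp);
}
```

For example, with a current TSF of 0x10010 and an rstamp of 0x7ff0, the low bits (0x0010) are below rstamp, so the result is 0xfff0: a frame stamped just before the 15-bit counter wrapped.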
Re: 2.6.24-rc4-mm1: acpi reboots machine
On Sun, Dec 09, 2007 at 08:50:02AM +0100, Borislav Petkov wrote:
> Hi Andrew,
> Hi Len,
>
> after booting 2.6.24-rc4-mm1 (2.6.24-rc4-190-g94545ba, otoh, boots just
> fine) on my asus laptop, the machine reboots after claiming that
> "Critical temperature reached (255 C)." However, the degrees number
> is kinda hinting at 0xff all-ones field. Will try dump_stack in
> acpi_thermal_critical() to checkout the call path. For now here's the
> netconsole bootlog:

Here's what i got so far:

[   50.287939] Pid: 1, comm: swapper Not tainted 2.6.24-rc4-mm1 #14
[   50.287999]  [<c0104b65>] show_trace_log_lvl+0x12/0x25
[   50.288103]  [<c01053e7>] show_trace+0xd/0x10
[   50.288202]  [<c0105a6c>] dump_stack+0x57/0x5f
[   50.288303]  [<c021c991>] acpi_thermal_check+0x150/0x3bb
[   50.288415]  [<c021d4b3>] acpi_thermal_add+0x261/0x2cf
[   50.288515]  [<c0213549>] acpi_device_probe+0x3e/0xdb
[   50.288615]  [<c023f8f5>] driver_probe_device+0xaf/0x12a
[   50.288717]  [<c023fa88>] __driver_attach+0x6c/0xa5
[   50.288817]  [<c023ee5a>] bus_for_each_dev+0x3e/0x60
[   50.288916]  [<c023f77d>] driver_attach+0x14/0x16
[   50.289015]  [<c023f5a6>] bus_add_driver+0xa6/0x1a8
[   50.289114]  [<c023fc53>] driver_register+0x42/0x47
[   50.289214]  [<c02138c2>] acpi_bus_register_driver+0x3a/0x3c
[   50.289316]  [<c044306b>] acpi_thermal_init+0x57/0x76
[   50.289424]  [<c04344a7>] kernel_init+0x138/0x280
[   50.289525]  [<c01047df>] kernel_thread_helper+0x7/0x10
[   50.289625]  ===
[   50.289680] ACPI: Critical trip point
[   50.289736] Critical temperature reached (255 C), shutting down.

so in acpi_thermal_get_temperature() called in acpi_thermal_add() the tz->temperature thingy is not set properly (printk's added):

[   50.276607] Old temp: 4294967023
[   50.281890] Got temp: 255
[   50.282567] Old temp: 255
[   50.287882] Got temp: 255

What's also strange is that the tz acpi_thermal is alloc'd with kzalloc and there's still garbage in it after reading it in acpi_thermal_get_temperature() for the first time. Debugging continues...

--
Regards/Gruß,
    Boris.
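The two suspicious numbers in the debug output can be decoded with plain integer arithmetic: 255 is an all-ones byte (0xff), as suspected above, and the first "Old temp" value, 4294967023, is exactly what -273 looks like once it is stored in a 32-bit unsigned field (2^32 - 273). The check below is a stand-alone sketch with hypothetical helper names and is not tied to the ACPI thermal code; whether either observation explains the bogus reading is not established in this thread.

```c
#include <stdint.h>

/* Every bit set in an 8-bit field, read back as an unsigned value. */
static unsigned int all_ones_byte(void)
{
	return (uint8_t)~0u;
}

/* Conversion to an unsigned 32-bit type is defined modulo 2^32,
 * so a negative input wraps to 2^32 + v. */
static uint32_t as_u32(int32_t v)
{
	return (uint32_t)v;
}
```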
Re: 2.6.24-rc4-mm1: some issues on sparc64
On Sun, 09 Dec 2007 00:45:17 -0800 (PST) David Miller <[EMAIL PROTECTED]> wrote:

> From: Andrew Morton <[EMAIL PROTECTED]>
> Date: Sat, 8 Dec 2007 10:22:39 -0800
>
> > That's
> >
> > 	J_ASSERT_BH(bh, !buffer_jbddirty(bh));
> >
> > at the end of journal_unmap_buffer().
> >
> > I don't recall seeing that before and I can't think of anything we've
> > done recently which could cause it, sorry.
>
> If the per-cpu data patches are in the -mm tree that is the first
> place I would start looking at for possible cause.

They aren't.  The dust hadn't settled enough on those when Christoph shot through on vacation.
Re: 2.6.24-rc4-mm1: some issues on sparc64
From: Andrew Morton <[EMAIL PROTECTED]>
Date: Sat, 8 Dec 2007 10:22:39 -0800

> That's
>
> 	J_ASSERT_BH(bh, !buffer_jbddirty(bh));
>
> at the end of journal_unmap_buffer().
>
> I don't recall seeing that before and I can't think of anything we've
> done recently which could cause it, sorry.

If the per-cpu data patches are in the -mm tree that is the first place I would start looking at for possible cause.
Re: 2.6.24-rc4-mm1 and Very Slow PCMCIA Compact Flash
On Sat, 08 Dec 2007 20:02:54 -0700 Zan Lynx <[EMAIL PROTECTED]> wrote:

>
> On Sat, 2007-12-08 at 02:07 -0800, Andrew Morton wrote:
> > On Fri, 07 Dec 2007 22:01:33 -0700 Zan Lynx <[EMAIL PROTECTED]> wrote:
> >
> > >
> > > On Fri, 2007-12-07 at 15:22 -0800, Andrew Morton wrote:
> > > > On Fri, 07 Dec 2007 23:09:43 +
> > > > Zan Lynx <[EMAIL PROTECTED]> wrote:
> > > [cut]
> > > > > > > Now with MM kernels 2.6.24 rc1-4 the PCMCIA adapter works again, but I
> > > > > > > only get read rates of 1.6 MB/s. When it used to work in 2.6.20 I got
> > > > > > > at least 16 MB/s. The card itself is capable of 30+ in the USB-2 reader.
> [cut]
> > argh.  OK.  And Linus's current tree is OK, yes?
> >
> > In which case we should be OK for 2.6.24 and I guess we can hope like heck
> > that the dud patch doesn't leak into mainline.  Hopefully Alan will get
> > some time to look into it before 2.6.25 opens.
>
> Linus' tree is also broken.
>
> I tried a Linus 2.6.24-rc4 and it acts the same way, with a very slow
> transfer rate.

shit

> I also tried 2.6.24-rc4 with the older not-libata PATA drivers and it is
> broken.

squared.

> dmesg had a line about the CF card detected as hda,
> but /sys/block did not have hda and /dev/hda did not function.

But these drivers did work in earlier kernels, yes?  2.6.20 worked, but we don't know about intervening kernels.  Can you tell us which version(s)?

> I will try the patches you mentioned

Yes, that won't tell us anything.

> but I think I may also have to
> work backward through kernel versions until I find the last one where
> the PCMCIA hd{a,b,c,d,e} drivers worked.

That would be great - a git-bisect is often ideal.  http://www.kernel.org/doc/local/git-quick.html has details.
Re: 2.6.24-rc4-mm1 and Very Slow PCMCIA Compact Flash
On Sat, 2007-12-08 at 02:07 -0800, Andrew Morton wrote:
> On Fri, 07 Dec 2007 22:01:33 -0700 Zan Lynx <[EMAIL PROTECTED]> wrote:
>
> >
> > On Fri, 2007-12-07 at 15:22 -0800, Andrew Morton wrote:
> > > On Fri, 07 Dec 2007 23:09:43 +
> > > Zan Lynx <[EMAIL PROTECTED]> wrote:
> > [cut]
> > > > > > Now with MM kernels 2.6.24 rc1-4 the PCMCIA adapter works again, but I
> > > > > > only get read rates of 1.6 MB/s. When it used to work in 2.6.20 I got
> > > > > > at least 16 MB/s. The card itself is capable of 30+ in the USB-2 reader.
[cut]
> argh.  OK.  And Linus's current tree is OK, yes?
>
> In which case we should be OK for 2.6.24 and I guess we can hope like heck
> that the dud patch doesn't leak into mainline.  Hopefully Alan will get
> some time to look into it before 2.6.25 opens.

Linus' tree is also broken.

I tried a Linus 2.6.24-rc4 and it acts the same way, with a very slow transfer rate.

I also tried 2.6.24-rc4 with the older not-libata PATA drivers and it is broken. dmesg had a line about the CF card detected as hda, but /sys/block did not have hda and /dev/hda did not function.

I will try the patches you mentioned, but I think I may also have to work backward through kernel versions until I find the last one where the PCMCIA hd{a,b,c,d,e} drivers worked.

--
Zan Lynx <[EMAIL PROTECTED]>
Re: 2.6.24-rc4-mm1 -- boot process hangs -- tty4 main process (2988) terminated with status 1
> > Dec 6 21:24:28 erratic-orbits init: tty3 main process (2991)
> > terminated with status 1
>
> Boggle.  We broke the vt driver?
>
> config, please...

I sent the .config. Is there nothing else to follow up on?

I have tried rebuilding about seven kernels, tweaking the options each time. All the kernels have failed to boot. I am currently trying with a "defconfig" kernel. Perhaps I will have better luck with it.

Thanks,
Miles
Re: 2.6.24-rc4-mm1: some issues on sparc64
On Sat, 8 Dec 2007 19:20:28 +0100 Mariusz Kozlowski <[EMAIL PROTECTED]> wrote:

> The box is sun ultra 60 (dual sparc64). This was caught when
> system (gentoo) was emerging some package.
>
> [27006.402237] kernel BUG at fs/jbd/transaction.c:1894!

That's

	J_ASSERT_BH(bh, !buffer_jbddirty(bh));

at the end of journal_unmap_buffer().

I don't recall seeing that before and I can't think of anything we've done recently which could cause it, sorry.

> [27006.402268]               \|/ \|/
> [27006.402274]               "@'/ .. \`@"
> [27006.402279]               /_| \__/ |_\
> [27006.402285]                  \__U_/

x86 needs that.

> [27006.402298] rm(4713): Kernel bad sw trap 5 [#1]
> [27006.402538] TSTATE: 009911009605 TPC: 0053b1cc TNPC: 0053b1d0 Y: Not tainted
> [27006.402579] TPC:
> [27006.402593] g0: 0002 g1: g2: 0001 g3: f800a7d9
> [27006.402610] g4: f800b54ea460 g5: f8007f832000 g6: f800a7d9 g7: 0076d868
> [27006.402627] o0: 0072b660 o1: 0766 o2: 0002 o3: 0001
> [27006.402644] o4: 008a2940 o5: sp: f800a7d92c91 ret_pc: 0053b1c4
> [27006.402665] RPC:
> [27006.402679] l0: f800afbf4070 l1: 0069511c l2: 2000 l3:
> [27006.402696] l4: 0001 l5: f800ba4cb730 l6: f800bf1cd338 l7: 0001
> [27006.402713] i0: f800bf1cd000 i1: 000201db2708 i2: i3: 00727000
> [27006.402730] i4: 0020 i5: f800bf1cd028 i6: f800a7d92d51 i7: 00529254
> [27006.402763] I7:
> [27006.402776] Caller[00529254]: ext3_invalidatepage+0x3c/0x60
> [27006.402800] Caller[004b22fc]: do_invalidatepage+0x24/0x60
> [27006.402826] Caller[004b29c4]: truncate_complete_page+0x6c/0x80
> [27006.402849] Caller[004b2a6c]: truncate_inode_pages_range+0x94/0x440
> [27006.402872] Caller[004b2e2c]: truncate_inode_pages+0x14/0x20
> [27006.402894] Caller[00529888]: ext3_delete_inode+0x10/0x160
> [27006.402918] Caller[004e7ca0]: generic_delete_inode+0x88/0x120
> [27006.402949] Caller[004e7e60]: generic_drop_inode+0x128/0x1c0
> [27006.402971] Caller[004e75d4]: iput+0x7c/0xa0
> [27006.402992] Caller[004dd680]: do_unlinkat+0x108/0x1a0
> [27006.403024] Caller[004dd884]: sys_unlinkat+0x2c/0x60
> [27006.403047] Caller[004062d4]: linux_sparc_syscall32+0x3c/0x40
> [27006.403081] Caller[f7e7d0ec]: 0xf7e7d0f4
> [27006.403102] Instruction DUMP: 92102766 7ffbbeaf 90122260 <91d02005> 92102780 7ffbbeab 90122260 91d02005 7ffbbea8
Re: 2.6.24-rc4-mm1: some issues on sparc64
Hello,

The box is sun ultra 60 (dual sparc64). This was caught when
system (gentoo) was emerging some package.

[27006.402237] kernel BUG at fs/jbd/transaction.c:1894!
[27006.402268] \|/ \|/
[27006.402274] "@'/ .. \`@"
[27006.402279] /_| \__/ |_\
[27006.402285] \__U_/
[27006.402298] rm(4713): Kernel bad sw trap 5 [#1]
[27006.402538] TSTATE: 009911009605 TPC: 0053b1cc TNPC: 0053b1d0 Y: Not tainted
[27006.402579] TPC: <journal_invalidatepage+0x3d4/0x460>
[27006.402593] g0: 0002 g1: g2: 0001 g3: f800a7d9
[27006.402610] g4: f800b54ea460 g5: f8007f832000 g6: f800a7d9 g7: 0076d868
[27006.402627] o0: 0072b660 o1: 0766 o2: 0002 o3: 0001
[27006.402644] o4: 008a2940 o5: sp: f800a7d92c91 ret_pc: 0053b1c4
[27006.402665] RPC: <journal_invalidatepage+0x3cc/0x460>
[27006.402679] l0: f800afbf4070 l1: 0069511c l2: 2000 l3:
[27006.402696] l4: 0001 l5: f800ba4cb730 l6: f800bf1cd338 l7: 0001
[27006.402713] i0: f800bf1cd000 i1: 000201db2708 i2: i3: 00727000
[27006.402730] i4: 0020 i5: f800bf1cd028 i6: f800a7d92d51 i7: 00529254
[27006.402763] I7: <ext3_invalidatepage+0x3c/0x60>
[27006.402776] Caller[00529254]: ext3_invalidatepage+0x3c/0x60
[27006.402800] Caller[004b22fc]: do_invalidatepage+0x24/0x60
[27006.402826] Caller[004b29c4]: truncate_complete_page+0x6c/0x80
[27006.402849] Caller[004b2a6c]: truncate_inode_pages_range+0x94/0x440
[27006.402872] Caller[004b2e2c]: truncate_inode_pages+0x14/0x20
[27006.402894] Caller[00529888]: ext3_delete_inode+0x10/0x160
[27006.402918] Caller[004e7ca0]: generic_delete_inode+0x88/0x120
[27006.402949] Caller[004e7e60]: generic_drop_inode+0x128/0x1c0
[27006.402971] Caller[004e75d4]: iput+0x7c/0xa0
[27006.402992] Caller[004dd680]: do_unlinkat+0x108/0x1a0
[27006.403024] Caller[004dd884]: sys_unlinkat+0x2c/0x60
[27006.403047] Caller[004062d4]: linux_sparc_syscall32+0x3c/0x40
[27006.403081] Caller[f7e7d0ec]: 0xf7e7d0f4
[27006.403102] Instruction DUMP: 92102766 7ffbbeaf 90122260 <91d02005> 92102780 7ffbbeab 90122260 91d02005 7ffbbea8

After this happened, one of the two CPUs was consumed (in kernel space)
trying to complete I/O. The process is stuck in D state; wchan says it
was in sync_buffer(), which you can also see in 'SysRq : Show Blocked
State' below.

[27422.874858] SysRq : Show Blocked State
[27422.877086] task PC stack pid father
[27422.877143] rm D 004f8f68 0 4966 4860
[27422.877160] Call Trace:
[27422.877167]  [00692840] io_schedule+0x28/0x40
[27422.877182]  [004f8f68] sync_buffer+0x50/0x60
[27422.877198]  [00692a58] __wait_on_bit_lock+0x60/0xa0
[27422.877213]  [00692ae4] out_of_line_wait_on_bit_lock+0x4c/0x60
[27422.877228]  [004f9328] __lock_buffer+0x30/0x40
[27422.877242]  [0053b024] journal_invalidatepage+0x22c/0x460
[27422.877268]  [00529254] ext3_invalidatepage+0x3c/0x60
[27422.877297]  [004b22fc] do_invalidatepage+0x24/0x60
[27422.877316]  [004b29c4] truncate_complete_page+0x6c/0x80
[27422.877332]  [004b2a6c] truncate_inode_pages_range+0x94/0x440
[27422.877349]  [004b2e2c] truncate_inode_pages+0x14/0x20
[27422.877364]  [00529888] ext3_delete_inode+0x10/0x160
[27422.877381]  [004e7ca0] generic_delete_inode+0x88/0x120
[27422.877405]  [004e7e60] generic_drop_inode+0x128/0x1c0
[27422.877421]  [004e75d4] iput+0x7c/0xa0
[27422.877435]  [004dd680] do_unlinkat+0x108/0x1a0

The downside is that it is unclear to me how to reproduce this - it just
happens sometimes.

Also, from time to time I get warnings about tcp_fastretrans_alert(), but
they seem to do no harm.

[30014.779310] WARNING: at net/ipv4/tcp_input.c:2518 tcp_fastretrans_alert()
[30014.781630] Call Trace:
[30014.783976]  [006551c8] tcp_fastretrans_alert+0x70/0xe00
[30014.786312]  [00657c60] tcp_ack+0x988/0x10c0
[30014.788702]  [0065bd80] tcp_rcv_established+0x408/0x840
[30014.791074]  [006634dc] tcp_v4_do_rcv+0xe4/0x4a0
[30014.793440]  [0066632c] tcp_v4_rcv+0xa34/0xb20
[30014.795762]  [00643a10] ip_local_deliver+0xd8/0x2c0
[30014.798102]  [00643ed4] ip_rcv+0x2dc/0x640
[30014.800431]  [0062424c] netif_receive_skb+0x334/0x400
[30014.802762]  [00627228] process_backlog+0x90/0x140
[30014.805097]  [00626d28] net_rx_action+0x190/0x260
[30014.807462]  [00475ea8] __do_softirq+0x90/0x140
[30014.809794]  [00475fe0] do_softirq+0x88/0xa0
[30014.812134]  [0047608c] irq_exit+0x94/0xc0
[30014.814453]  [0042f53c] handler_irq+0xa4/0xc0
[30014.816800]
Re: 2.6.24-rc4-mm1 and Very Slow PCMCIA Compact Flash
On Fri, 07 Dec 2007 22:01:33 -0700 Zan Lynx <[EMAIL PROTECTED]> wrote:

>
> On Fri, 2007-12-07 at 15:22 -0800, Andrew Morton wrote:
> > On Fri, 07 Dec 2007 23:09:43 +0000
> > Zan Lynx <[EMAIL PROTECTED]> wrote:
> [cut]
> > > > > Now with MM kernels 2.6.24 rc1-4 the PCMCIA adapter works again, but I
> > > > > only get read rates of 1.6 MB/s. When it used to work in 2.6.20 I got
> > > > > at least 16 MB/s. The card itself is capable of 30+ in the USB-2
> > > > > reader.
> [cut]
> > Maybe pata_pcmcia-minor-cleanups-and-support-for-dual-channel-cards.patch?
> >
> > Could you try a `patch -R' of the below?
> >
> >
> > From: Alan Cox <[EMAIL PROTECTED]>
> >
> > Signed-off-by: Alan Cox <[EMAIL PROTECTED]>
> > Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
> > ---
> >
> >  drivers/ata/pata_pcmcia.c |   31 +++++++++++++++++--------------
> >  1 file changed, 17 insertions(+), 14 deletions(-)
> >
> > diff -puN drivers/ata/pata_pcmcia.c~pata_pcmcia-minor-cleanups-and-support-for-dual-channel-cards drivers/ata/pata_pcmcia.c
> [cut]
>
> Nope, that did not change anything. It still detects as PIO0 and still
> runs at 1.6 MB/s.

argh.

OK.  And Linus's current tree is OK, yes?

In which case we should be OK for 2.6.24 and I guess we can hope like heck
that the dud patch doesn't leak into mainline.

Hopefully Alan will get some time to look into it before 2.6.25 opens.

OK, there's a patch in Jeff's tree "pata_pcmcia: Add support for dumb 8bit
IDE emulations" which could be our guy.

I've uploaded two patches, against 2.6.24-rc4:

http://userweb.kernel.org/~akpm/zl.with.gz
	origin.patch + git-libata-all.patch

http://userweb.kernel.org/~akpm/zl.without.gz
	origin.patch + git-libata-all.patch - 5ddcddd4dfeb16a9509dad647f509828d6fee605

It would be great if you could test both.  If zl.with is bad and
zl.without is good then we know that
5ddcddd4dfeb16a9509dad647f509828d6fee605 caused this problem.

Thanks.
Re: 2.6.24-rc4-mm1: undefined reference to `compat_sys_timerfd' on sparc64
> >   LD      .tmp_vmlinux1
> > arch/sparc64/kernel/head.o: In function `sys_call_table32':
> > arch/sparc64/kernel/head.S:(.text+0x224e0): undefined reference to `compat_sys_timerfd'
> > make: *** [.tmp_vmlinux1] Error 1
>
> argh, sorry, I am soo fed up with fixing that patch.
>
> --- a/arch/sparc64/kernel/systbls.S~timerfd-v3-new-timerfd-api-sparc64-fix
> +++ a/arch/sparc64/kernel/systbls.S
> @@ -80,7 +80,7 @@ sys_call_table32:
>  	.word sys_fchmodat, sys_faccessat, compat_sys_pselect6, compat_sys_ppoll, sys_unshare
>  /*300*/	.word compat_sys_set_robust_list, compat_sys_get_robust_list, compat_sys_migrate_pages, compat_sys_mbind, compat_sys_get_mempolicy
>  	.word compat_sys_set_mempolicy, compat_sys_kexec_load, compat_sys_move_pages, sys_getcpu, compat_sys_epoll_pwait
> -/*310*/	.word compat_sys_utimensat, compat_sys_signalfd, compat_sys_timerfd, sys_eventfd, compat_sys_fallocate
> +/*310*/	.word compat_sys_utimensat, compat_sys_signalfd, sys_ni_syscall, sys_eventfd, compat_sys_fallocate
>
>  #endif /* CONFIG_COMPAT */

Ok - that helped.

Thanks,

Mariusz
Re: 2.6.24-rc4-mm1 -- boot process hangs -- tty4 main process (2988) terminated with status 1
> > Dec 6 21:24:28 erratic-orbits init: tty3 main process (2991) terminated with status 1
>
> Boggle.  We broke the vt driver?
>
> config, please...

I sent the .config. Is there nothing else to follow up on?

I have tried rebuilding about seven kernels, tweaking the options each
time. All the kernels have failed to boot. I am currently trying with a
defconfig kernel. Perhaps I will have better luck with it.

Thanks,

Miles
Re: 2.6.24-rc4-mm1 and Very Slow PCMCIA Compact Flash
On Sat, 2007-12-08 at 02:07 -0800, Andrew Morton wrote:
> On Fri, 07 Dec 2007 22:01:33 -0700 Zan Lynx <[EMAIL PROTECTED]> wrote:
> > On Fri, 2007-12-07 at 15:22 -0800, Andrew Morton wrote:
> > > On Fri, 07 Dec 2007 23:09:43 +0000
> > > Zan Lynx <[EMAIL PROTECTED]> wrote:
> > [cut]
> > > > > > Now with MM kernels 2.6.24 rc1-4 the PCMCIA adapter works again, but I
> > > > > > only get read rates of 1.6 MB/s. When it used to work in 2.6.20 I got
> > > > > > at least 16 MB/s. The card itself is capable of 30+ in the USB-2
> > > > > > reader.
[cut]
> argh.
>
> OK.  And Linus's current tree is OK, yes?
>
> In which case we should be OK for 2.6.24 and I guess we can hope like heck
> that the dud patch doesn't leak into mainline.
>
> Hopefully Alan will get some time to look into it before 2.6.25 opens.

Linus' tree is also broken. I tried a Linus 2.6.24-rc4 and it acts the
same way, with a very slow transfer rate.

I also tried 2.6.24-rc4 with the older not-libata PATA drivers and it is
broken. dmesg had a line about the CF card detected as hda, but
/sys/block did not have hda and /dev/hda did not function.

I will try the patches you mentioned, but I think I may also have to
work backward through kernel versions until I find the last one where
the PCMCIA hd{a,b,c,d,e} drivers worked.
--
Zan Lynx <[EMAIL PROTECTED]>
Re: 2.6.24-rc4-mm1 and Very Slow PCMCIA Compact Flash
On Sat, 08 Dec 2007 20:02:54 -0700 Zan Lynx <[EMAIL PROTECTED]> wrote:

> On Sat, 2007-12-08 at 02:07 -0800, Andrew Morton wrote:
> > On Fri, 07 Dec 2007 22:01:33 -0700 Zan Lynx <[EMAIL PROTECTED]> wrote:
> > > On Fri, 2007-12-07 at 15:22 -0800, Andrew Morton wrote:
> > > > On Fri, 07 Dec 2007 23:09:43 +0000
> > > > Zan Lynx <[EMAIL PROTECTED]> wrote:
> > > [cut]
> > > > > > > Now with MM kernels 2.6.24 rc1-4 the PCMCIA adapter works again, but I
> > > > > > > only get read rates of 1.6 MB/s. When it used to work in 2.6.20 I got
> > > > > > > at least 16 MB/s. The card itself is capable of 30+ in the USB-2
> > > > > > > reader.
> [cut]
> > argh.
> >
> > OK.  And Linus's current tree is OK, yes?
> >
> > In which case we should be OK for 2.6.24 and I guess we can hope like heck
> > that the dud patch doesn't leak into mainline.
> >
> > Hopefully Alan will get some time to look into it before 2.6.25 opens.
>
> Linus' tree is also broken. I tried a Linus 2.6.24-rc4 and it acts the
> same way, with a very slow transfer rate.

shit

> I also tried 2.6.24-rc4 with the older not-libata PATA drivers and it is
> broken.

squared.

> dmesg had a line about the CF card detected as hda, but /sys/block did
> not have hda and /dev/hda did not function.

But these drivers did work in earlier kernels, yes?  2.6.20 worked, but we
don't know about intervening kernels.  Can you tell us which version(s)?

> I will try the patches you mentioned

Yes, that won't tell us anything.

> but I think I may also have to work backward through kernel versions
> until I find the last one where the PCMCIA hd{a,b,c,d,e} drivers worked.

That would be great - a git-bisect is often ideal.
http://www.kernel.org/doc/local/git-quick.html has details.
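A git-bisect is a binary search over history: git checks out a commit halfway between the newest known-good and oldest known-bad points, you build and test it, and the range halves each round. Finding a regression among N candidate commits therefore takes roughly log2(N) test builds (about 10 for 1,000 commits, 14 for 10,000). The C sketch below models only that search loop, under simplifying assumptions: history is treated as a linear, numbered sequence of builds (real git bisect walks a commit graph with merges), the bug is assumed to persist once introduced, and `first_bad_build()` and `simulated_regression()` are hypothetical names, not part of git or the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Test callback: nonzero if build number idx shows the bug.  In a real
 * session this step is "build, boot, time a read from the CF card". */
typedef int (*is_bad_fn)(size_t idx, void *ctx);

/* Find the first bad build among builds 0..n-1 by binary search.
 * Assumes build 0 is known good, build n-1 is known bad, history is
 * linear, and the bug persists once introduced.  *tests receives the
 * number of builds that had to be tested. */
size_t first_bad_build(size_t n, is_bad_fn is_bad, void *ctx,
		       unsigned int *tests)
{
	size_t good = 0, bad = n - 1;   /* invariant: good is good, bad is bad */

	assert(n >= 2);
	*tests = 0;
	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;   /* strictly between */

		(*tests)++;
		if (is_bad(mid, ctx))
			bad = mid;
		else
			good = mid;
	}
	return bad;
}

/* Dry-run predicate: ctx points at the index where the bug was introduced. */
int simulated_regression(size_t idx, void *ctx)
{
	return idx >= *(const size_t *)ctx;
}
```

With git itself, the equivalent session starts with `git bisect start`, `git bisect bad v2.6.24-rc4` and `git bisect good v2.6.20`; each checked-out build is then marked with `git bisect good` or `git bisect bad` until git names the first bad commit.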
Re: 2.6.24-rc4-mm1 and Very Slow PCMCIA Compact Flash
On Fri, 2007-12-07 at 15:22 -0800, Andrew Morton wrote:
> On Fri, 07 Dec 2007 23:09:43 +0000
> Zan Lynx <[EMAIL PROTECTED]> wrote:
[cut]
> > > > Now with MM kernels 2.6.24 rc1-4 the PCMCIA adapter works again, but I
> > > > only get read rates of 1.6 MB/s. When it used to work in 2.6.20 I got
> > > > at least 16 MB/s. The card itself is capable of 30+ in the USB-2
> > > > reader.
[cut]
> Maybe pata_pcmcia-minor-cleanups-and-support-for-dual-channel-cards.patch?
>
> Could you try a `patch -R' of the below?
>
>
> From: Alan Cox <[EMAIL PROTECTED]>
>
> Signed-off-by: Alan Cox <[EMAIL PROTECTED]>
> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
> ---
>
>  drivers/ata/pata_pcmcia.c |   31 +++++++++++++++++--------------
>  1 file changed, 17 insertions(+), 14 deletions(-)
>
> diff -puN drivers/ata/pata_pcmcia.c~pata_pcmcia-minor-cleanups-and-support-for-dual-channel-cards drivers/ata/pata_pcmcia.c
[cut]

Nope, that did not change anything. It still detects as PIO0 and still
runs at 1.6 MB/s.
--
Zan Lynx <[EMAIL PROTECTED]>
Re: 2.6.24-rc4-mm1 and Very Slow PCMCIA Compact Flash
Zan Lynx wrote:
> On Fri, 2007-12-07 at 15:22 -0800, Andrew Morton wrote:
> > On Fri, 07 Dec 2007 23:09:43 +0000
> > Zan Lynx <[EMAIL PROTECTED]> wrote:
> >
> > > On Fri, 2007-12-07 at 15:02 -0800, Andrew Morton wrote:
> > > > On Fri, 07 Dec 2007 20:38:24 +0000
> > > > Zan Lynx <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > While I'm reporting problems I'll get this one out there.
> > > > >
> > > > > I normally use a USB-2 memory card reader but I also have a PCMCIA
> > > > > CompactFlash adapter that I use occasionally. During the MM series
> > > > > kernels 2.6.22 and 23 (I am pretty sure) this didn't work at all. I
> > > > > don't know about vanilla since I don't run that.
> > > > >
> > > > > Now with MM kernels 2.6.24 rc1-4 the PCMCIA adapter works again, but I
> > > > > only get read rates of 1.6 MB/s. When it used to work in 2.6.20 I got
> > > > > at least 16 MB/s. The card itself is capable of 30+ in the USB-2
> > > > > reader.
> > > [cut]
> > Oh, OK.  Hopefully the ata guys can help out with this.
> >
> > I don't know if it's actually strictly a regression?  Did libata ever
> > support that device in any earlier kernels?
>
> That could be why it didn't work for a few kernel versions. I
> reconfigured for a libata-only system a while back. And, since I usually
> use the USB-2 flash reader I didn't care much about the PCMCIA.
>
> I will try reverting that patch later tonight, in a few hours.

It looks like pata_pcmcia is always PIO mode 0:

/**
 * pcmcia_init_one - attach a PCMCIA interface
 * @pdev: pcmcia device
 *
 * Register a PCMCIA IDE interface. Such interfaces are PIO 0 and
 * shared IRQ.
 */

I assume that with old IDE this would use ide_cs.c, but I'm drawing a
blank on what modes that supports.

--
Robert Hancock    Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/
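Robert's observation can be sanity-checked with back-of-the-envelope arithmetic. In PIO mode the host moves one 16-bit word per cycle, so the ceiling is 2 bytes divided by the minimum cycle time t0: 600 ns in PIO mode 0 gives 2 B / 600 ns, about 3.3 MB/s, while 120 ns in PIO mode 4 gives about 16.7 MB/s. That is consistent with, though it does not prove, the numbers in this thread: 1.6 MB/s fits under a PIO 0 ceiling, and the 16 MB/s Zan saw with 2.6.20 is close to the PIO 4 limit. The sketch assumes the standard ATA PIO cycle times listed in the code and ignores command and interrupt overhead; the function names are illustrative only.

```c
#include <assert.h>
#include <stdio.h>

/* Minimum PIO cycle times (t0) in nanoseconds for ATA PIO modes 0..4,
 * taken from the standard ATA timing table. */
static const unsigned int pio_cycle_ns[] = { 600, 383, 240, 180, 120 };

#define PIO_MODES (sizeof(pio_cycle_ns) / sizeof(pio_cycle_ns[0]))

/* Theoretical peak PIO rate in MB/s (10^6 bytes per second).
 * One 16-bit word (2 bytes) moves per cycle, so the rate is 2 / t0 bytes
 * per ns, i.e. 2 * 1000 / t0 MB/s.  Command setup, interrupts and CPU
 * overhead are ignored, so measured throughput will be lower. */
double pio_peak_mb_per_s(unsigned int mode)
{
	assert(mode < PIO_MODES);
	return 2.0 * 1000.0 / pio_cycle_ns[mode];
}

/* Print the ceiling for each mode, for comparison with measured rates. */
void print_pio_table(void)
{
	for (unsigned int m = 0; m < PIO_MODES; m++)
		printf("PIO%u: %3u ns cycle -> %5.2f MB/s peak\n",
		       m, pio_cycle_ns[m], pio_peak_mb_per_s(m));
}
```

Calling print_pio_table() lists roughly 3.33, 5.22, 8.33, 11.11 and 16.67 MB/s for modes 0 through 4.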
Re: 2.6.24-rc4-mm1 and /proc//status Name: field
Andrew Morton <[EMAIL PROTECTED]> writes:

> On Fri, 07 Dec 2007 20:26:43 +0000
> Zan Lynx <[EMAIL PROTECTED]> wrote:
>
>> Today I noticed pgrep doesn't work. It seems the reason is a missing
>> Name: tag in the status file for a process in /proc.
>>
>> # cat /proc/1/status
>> init
>> State:	S (sleeping)
>> Tgid:	1
>> Pid:	1
>> PPid:	0
>> TracerPid:	0
>> ...etc, etc...
>>
>> This is supposed to look like:
>> # cat /proc/1/status
>> Name:	init
>> State:	S (sleeping)
>> Tgid:	1
>> Pid:	1
>> PPid:	0
>> TracerPid:	0
>> ...
>>
>
> Thanks.  Two (more) bugs in
> proc-seqfile-convert-proc_pid_status-to-properly-handle-pid-namespaces.patch

Doh!  How did I get that one confused?

Thanks.

Eric

>
> --- a/fs/proc/array.c~proc-seqfile-convert-proc_pid_status-to-properly-handle-pid-namespaces-fix-3
> +++ a/fs/proc/array.c
> @@ -98,9 +98,9 @@ static inline void task_name(struct seq_
>
>  	get_task_comm(tcomm, p);
>
> +	seq_printf(m, "Name:\t");
>  	end = m->buf + m->size;
>  	buf = m->buf + m->count;
> -	seq_printf(m, "Name:\n");
>  	name = tcomm;
>  	i = sizeof(tcomm);
>  	while (i && (buf < end)) {
> _
Re: 2.6.24-rc4-mm1: undefined reference to `compat_sys_timerfd' on sparc64
On Sat, 8 Dec 2007 01:04:55 +0100 Mariusz Kozlowski <[EMAIL PROTECTED]> wrote:

> Hello,
>
> I tried it on sun ultra 60 (dual sparc64) station. Unfortunately it
> failed to compile.
>
>   AS      arch/sparc64/lib/xor.o
>   AR      arch/sparc64/lib/lib.a
>   GEN     .version
>   CHK     include/linux/compile.h
> dnsdomainname: Unknown host
>   UPD     include/linux/compile.h
>   CC      init/version.o
>   LD      init/built-in.o
>   LD      .tmp_vmlinux1
> arch/sparc64/kernel/head.o: In function `sys_call_table32':
> arch/sparc64/kernel/head.S:(.text+0x224e0): undefined reference to `compat_sys_timerfd'
> make: *** [.tmp_vmlinux1] Error 1

argh, sorry, I am soo fed up with fixing that patch.

--- a/arch/sparc64/kernel/systbls.S~timerfd-v3-new-timerfd-api-sparc64-fix
+++ a/arch/sparc64/kernel/systbls.S
@@ -80,7 +80,7 @@ sys_call_table32:
 	.word sys_fchmodat, sys_faccessat, compat_sys_pselect6, compat_sys_ppoll, sys_unshare
 /*300*/	.word compat_sys_set_robust_list, compat_sys_get_robust_list, compat_sys_migrate_pages, compat_sys_mbind, compat_sys_get_mempolicy
 	.word compat_sys_set_mempolicy, compat_sys_kexec_load, compat_sys_move_pages, sys_getcpu, compat_sys_epoll_pwait
-/*310*/	.word compat_sys_utimensat, compat_sys_signalfd, compat_sys_timerfd, sys_eventfd, compat_sys_fallocate
+/*310*/	.word compat_sys_utimensat, compat_sys_signalfd, sys_ni_syscall, sys_eventfd, compat_sys_fallocate

 #endif /* CONFIG_COMPAT */
_

Or should this have been sys_nis_syscall()?

I should have picked this up in cross-build testing but iirc sparc64 broke
for other reasons.  Let me check on that.
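For context on the fix: sys_call_table32 is a table of handler addresses indexed by 32-bit syscall number, and the link failed because the third entry of the row commented 310 (syscall 312) named compat_sys_timerfd, which, per the undefined-reference error, nothing in that tree defined. Pointing the slot at sys_ni_syscall, the generic stub that simply returns -ENOSYS, keeps the numbering of the later entries intact while making that call fail cleanly at run time. The user-space C model below illustrates only that idea; the real table is SPARC assembly, and every toy_* name is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical handler type; real entry points take several register
 * arguments, one is enough for this model. */
typedef long (*toy_syscall_t)(long arg);

/* Same contract as the kernel's sys_ni_syscall(): always "not implemented". */
static long toy_ni_syscall(long arg)
{
	(void)arg;
	return -ENOSYS;
}

/* Stubs standing in for real entry points; return values are arbitrary. */
static long toy_utimensat(long arg) { (void)arg; return 0; }
static long toy_signalfd(long arg)  { (void)arg; return 3; }
static long toy_eventfd(long arg)   { (void)arg; return 4; }
static long toy_fallocate(long arg) { (void)arg; return 0; }

#define TOY_FIRST_NR 310u

/* Models the row of sys_call_table32 starting at 310 after the fix:
 * the slot that named compat_sys_timerfd now names the ni stub, so the
 * table no longer references an undefined symbol. */
static const toy_syscall_t toy_table[] = {
	toy_utimensat,   /* 310 */
	toy_signalfd,    /* 311 */
	toy_ni_syscall,  /* 312: previously compat_sys_timerfd */
	toy_eventfd,     /* 313 */
	toy_fallocate,   /* 314 */
};

static_assert(sizeof(toy_table) / sizeof(toy_table[0]) == 5,
	      "one table entry per syscall number 310..314");

/* Dispatch by syscall number; numbers outside the table also get -ENOSYS. */
long toy_dispatch(unsigned int nr, long arg)
{
	size_t n = sizeof(toy_table) / sizeof(toy_table[0]);

	if (nr < TOY_FIRST_NR || nr - TOY_FIRST_NR >= n)
		return -ENOSYS;
	return toy_table[nr - TOY_FIRST_NR](arg);
}
```

Replacing the entry rather than deleting it matters because the table is positional: removing a word would shift every later syscall down by one number.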
Re: 2.6.24-rc4-mm1 and /proc//status Name: field
On Fri, 07 Dec 2007 20:26:43 +0000 Zan Lynx <[EMAIL PROTECTED]> wrote:

> Today I noticed pgrep doesn't work. It seems the reason is a missing
> Name: tag in the status file for a process in /proc.
>
> # cat /proc/1/status
> init
> State:	S (sleeping)
> Tgid:	1
> Pid:	1
> PPid:	0
> TracerPid:	0
> ...etc, etc...
>
> This is supposed to look like:
> # cat /proc/1/status
> Name:	init
> State:	S (sleeping)
> Tgid:	1
> Pid:	1
> PPid:	0
> TracerPid:	0
> ...
>

Thanks.  Two (more) bugs in
proc-seqfile-convert-proc_pid_status-to-properly-handle-pid-namespaces.patch

--- a/fs/proc/array.c~proc-seqfile-convert-proc_pid_status-to-properly-handle-pid-namespaces-fix-3
+++ a/fs/proc/array.c
@@ -98,9 +98,9 @@ static inline void task_name(struct seq_

 	get_task_comm(tcomm, p);

+	seq_printf(m, "Name:\t");
 	end = m->buf + m->size;
 	buf = m->buf + m->count;
-	seq_printf(m, "Name:\n");
 	name = tcomm;
 	i = sizeof(tcomm);
 	while (i && (buf < end)) {
_
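The two bugs the patch fixes are visible in the diff: the header was printed as "Name:\n" instead of "Name:\t", and it was printed after `buf` had already captured the old end of the seq_file buffer, so the hand-copied command name was written over the header. That matches Zan's output, where /proc/1/status began with a bare "init" line. The user-space C model below reproduces the ordering effect with a hypothetical `struct toy_seq` (not the kernel's seq_file API) and omits the backslash and newline escaping that the real task_name() performs.

```c
#include <assert.h>
#include <string.h>

/* Minimal stand-in for struct seq_file: a fixed buffer plus fill level. */
struct toy_seq {
	char buf[64];
	size_t size;   /* capacity of buf */
	size_t count;  /* bytes used so far */
};

void toy_seq_init(struct toy_seq *m)
{
	m->size = sizeof(m->buf);
	m->count = 0;
	m->buf[0] = '\0';
}

/* Append a string, roughly what seq_printf(m, "%s", s) does. */
static void toy_seq_puts(struct toy_seq *m, const char *s)
{
	size_t len = strlen(s);

	assert(m->count + len < m->size);   /* keep room for the NUL */
	memcpy(m->buf + m->count, s, len + 1);
	m->count += len;
}

/* Copy the command name by hand starting at `buf`, then set count from
 * the copy pointer and end the line, as the hand-rolled loop does. */
static void copy_name_at(struct toy_seq *m, char *buf, const char *name)
{
	char *end = m->buf + m->size;

	while (*name && buf < end - 1)
		*buf++ = *name++;
	*buf = '\0';
	m->count = buf - m->buf;
	toy_seq_puts(m, "\n");
}

/* Order before the fix: position captured, then header printed, so the
 * name lands on top of the header. */
void task_name_buggy(struct toy_seq *m, const char *comm)
{
	char *buf = m->buf + m->count;

	toy_seq_puts(m, "Name:\n");
	copy_name_at(m, buf, comm);
}

/* Order after the fix: print "Name:\t" first, then capture the position. */
void task_name_fixed(struct toy_seq *m, const char *comm)
{
	toy_seq_puts(m, "Name:\t");
	copy_name_at(m, m->buf + m->count, comm);
}
```

With comm set to "init", task_name_buggy() leaves "init\n" in the buffer while task_name_fixed() produces "Name:\tinit\n", which is the line tools like pgrep were looking for according to Zan's report.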
Re: 2.6.24-rc4-mm1: undefined reference to `compat_sys_timerfd' on sparc64
Hello,

I tried it on sun ultra 60 (dual sparc64) station. Unfortunately it
failed to compile.

  AS      arch/sparc64/lib/xor.o
  AR      arch/sparc64/lib/lib.a
  GEN     .version
  CHK     include/linux/compile.h
dnsdomainname: Unknown host
  UPD     include/linux/compile.h
  CC      init/version.o
  LD      init/built-in.o
  LD      .tmp_vmlinux1
arch/sparc64/kernel/head.o: In function `sys_call_table32':
arch/sparc64/kernel/head.S:(.text+0x224e0): undefined reference to `compat_sys_timerfd'
make: *** [.tmp_vmlinux1] Error 1

Any hints?

Regards,

Mariusz

Linux sparc64 2.6.23-gentoo-r3 #4 SMP PREEMPT Sat Dec 8 00:50:12 CET 2007 sparc64 sun4u TI UltraSparc II (BlackBird) GNU/Linux

Gnu C                  4.1.1
Gnu make               3.81
binutils               2.17
util-linux             2.12r
mount                  2.12r
module-init-tools      3.2.2
e2fsprogs              1.39
Linux C Library        2.5
Dynamic linker (ldd)   2.5
Procps                 3.2.6
Net-tools              1.60
Kbd                    1.12
Sh-utils               6.4
udev                   104
Modules Loaded         sr_mod cdrom sg

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.24-rc4-mm1
# Sat Dec 8 01:01:01 2007
#
CONFIG_SPARC=y
CONFIG_SPARC64=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_64BIT=y
CONFIG_MMU=y
CONFIG_QUICKLIST=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_AUDIT_ARCH=y
CONFIG_ARCH_NO_VIRT_TO_BUS=y
CONFIG_OF=y
CONFIG_GENERIC_HARDIRQS_NO__DO_IRQ=y
CONFIG_ARCH_SUPPORTS_AOUT=y
CONFIG_SPARC64_PAGE_SIZE_8KB=y
# CONFIG_SPARC64_PAGE_SIZE_64KB is not set
# CONFIG_SPARC64_PAGE_SIZE_512KB is not set
# CONFIG_SPARC64_PAGE_SIZE_4MB is not set
CONFIG_SECCOMP=y
# CONFIG_HZ_100 is not set
CONFIG_HZ_250=y
# CONFIG_HZ_300 is not set
# CONFIG_HZ_1000 is not set
CONFIG_HZ=250
# CONFIG_HOTPLUG_CPU is not set
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y CONFIG_SYSVIPC_SYSCTL=y CONFIG_POSIX_MQUEUE=y # CONFIG_BSD_PROCESS_ACCT is not set # CONFIG_TASKSTATS is not set # CONFIG_AUDIT is not set CONFIG_IKCONFIG=y CONFIG_IKCONFIG_PROC=y CONFIG_LOG_BUF_SHIFT=18 # CONFIG_CGROUPS is not set # CONFIG_FAIR_GROUP_SCHED is not set CONFIG_SYSFS_DEPRECATED=y CONFIG_RELAY=y CONFIG_NAMESPACES=y # CONFIG_UTS_NS is not set # CONFIG_IPC_NS is not set # CONFIG_USER_NS is not set # CONFIG_PID_NS is not set # CONFIG_BLK_DEV_INITRD is not set # CONFIG_CC_OPTIMIZE_FOR_SIZE is not set CONFIG_SYSCTL=y # CONFIG_EMBEDDED is not set CONFIG_UID16=y CONFIG_SYSCTL_SYSCALL=y CONFIG_KALLSYMS=y CONFIG_KALLSYMS_ALL=y # CONFIG_KALLSYMS_EXTRA_PASS is not set CONFIG_HOTPLUG=y CONFIG_PRINTK=y CONFIG_BUG=y CONFIG_ELF_CORE=y CONFIG_BASE_FULL=y CONFIG_FUTEX=y CONFIG_ANON_INODES=y CONFIG_EPOLL=y CONFIG_SIGNALFD=y CONFIG_TIMERFD=y CONFIG_EVENTFD=y CONFIG_SHMEM=y CONFIG_VM_EVENT_COUNTERS=y CONFIG_SLUB_DEBUG=y # CONFIG_SLAB is not set CONFIG_SLUB=y # CONFIG_SLOB is not set CONFIG_PROC_PAGE_MONITOR=y CONFIG_RT_MUTEXES=y # CONFIG_TINY_SHMEM is not set CONFIG_BASE_SMALL=0 CONFIG_MODULES=y CONFIG_MODULE_UNLOAD=y CONFIG_MODULE_FORCE_UNLOAD=y # CONFIG_MODVERSIONS is not set # CONFIG_MODULE_SRCVERSION_ALL is not set CONFIG_KMOD=y CONFIG_STOP_MACHINE=y CONFIG_BLOCK=y CONFIG_BLK_DEV_IO_TRACE=y # CONFIG_BLK_DEV_BSG is not set CONFIG_BLOCK_COMPAT=y # # IO Schedulers # CONFIG_IOSCHED_NOOP=y CONFIG_IOSCHED_AS=y CONFIG_IOSCHED_DEADLINE=y CONFIG_IOSCHED_CFQ=y CONFIG_DEFAULT_AS=y # CONFIG_DEFAULT_DEADLINE is not set # CONFIG_DEFAULT_CFQ is not set # CONFIG_DEFAULT_NOOP is not set CONFIG_DEFAULT_IOSCHED="anticipatory" CONFIG_SYSVIPC_COMPAT=y CONFIG_GENERIC_HARDIRQS=y # # General machine setup # CONFIG_TICK_ONESHOT=y CONFIG_NO_HZ=y CONFIG_HIGH_RES_TIMERS=y CONFIG_GENERIC_CLOCKEVENTS_BUILD=y CONFIG_SMP=y CONFIG_NR_CPUS=4 # CONFIG_CPU_FREQ is not set CONFIG_RWSEM_XCHGADD_ALGORITHM=y CONFIG_GENERIC_FIND_NEXT_BIT=y CONFIG_GENERIC_HWEIGHT=y 
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_SELECT_MEMORY_MODEL=y
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_SPARSEMEM_DEFAULT=y
CONFIG_SELECT_MEMORY_MODEL=y
# CONFIG_FLATMEM_MANUAL is not set
# CONFIG_DISCONTIGMEM_MANUAL is not set
CONFIG_SPARSEMEM_MANUAL=y
CONFIG_SPARSEMEM=y
CONFIG_HAVE_MEMORY_PRESENT=y
# CONFIG_SPARSEMEM_STATIC is not set
CONFIG_SPARSEMEM_EXTREME=y
CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
CONFIG_SPARSEMEM_VMEMMAP=y
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_RESOURCES_64BIT=y
CONFIG_ZONE_DMA_FLAG=0
CONFIG_NR_QUICK=1
CONFIG_SBUS=y
CONFIG_SBUSCHAR=y
CONFIG_SUN_AUXIO=y
CONFIG_SUN_IO=y
# CONFIG_SUN_LDOMS is not set
CONFIG_PCI=y
CONFIG_PCI_DOMAINS=y
CONFIG_PCI_SYSCALL=y
CONFIG_ARCH_SUPPORTS_MSI=y
CONFIG_PCI_MSI=y
CONFIG_PCI_LEGACY=y
# CONFIG_PCI_DEBUG
Re: 2.6.24-rc4-mm1 and Very Slow PCMCIA Compact Flash
On Fri, 2007-12-07 at 15:22 -0800, Andrew Morton wrote:
> On Fri, 07 Dec 2007 23:09:43 +
> Zan Lynx <[EMAIL PROTECTED]> wrote:
>
> >
> > On Fri, 2007-12-07 at 15:02 -0800, Andrew Morton wrote:
> > > On Fri, 07 Dec 2007 20:38:24 +
> > > Zan Lynx <[EMAIL PROTECTED]> wrote:
> > >
> > > > While I'm reporting problems I'll get this one out there.
> > > >
> > > > I normally use a USB-2 memory card reader but I also have a PCMCIA
> > > > CompactFlash adapter that I use occasionally. During the MM series
> > > > kernels 2.6.22 and 23 (I am pretty sure) this didn't work at all. I
> > > > don't know about vanilla since I don't run that.
> > > >
> > > > Now with MM kernels 2.6.24 rc1-4 the PCMCIA adapter works again, but I
> > > > only get read rates of 1.6 MB/s. When it used to work in 2.6.20 I got
> > > > at least 16 MB/s. The card itself is capable of 30+ in the USB-2
> > > > reader.
> > >
> > > [cut]
> Oh, OK. Hopefully the ata guys can help out with this.
>
> I don't know if it actually strictly a regression? Did libata ever support
> that device in any earlier kernels?

That could be why it didn't work for a few kernel versions. I reconfigured
for a libata-only system a while back. And, since I usually use the USB-2
flash reader I didn't care much about the PCMCIA.

I will try reverting that patch later tonight, in a few hours.
--
Zan Lynx <[EMAIL PROTECTED]>
Re: 2.6.24-rc4-mm1 and excessive block IO errors
On Fri, Dec 07, 2007 at 03:05:37PM -0800, Andrew Morton wrote:
> On Fri, 07 Dec 2007 20:44:45 +
> Zan Lynx <[EMAIL PROTECTED]> wrote:
>
> > I am not sure if this problem has been addressed already. I read some
> > about the fast-fail issues and this may be related?
> >
> > On nearly all my USB block devices, I have been getting zillions of I/O
> > errors. But they aren't real, they don't appear with 2.6.23 kernels.
> >
> > I can often read and write data to the device, but these IO errors cause
> > error aborts in user space applications in many cases, making it a
> > chancy thing to run backup software, for example.
> >
> > Here is a bit of dmesg from plugging in a perfectly good USB-2 flash
> > drive.
> >
> > hub 3-0:1.0: state 7 ports 6 chg evt 0004
> > ehci_hcd :00:02.2: GetStatus port 2 status 001803 POWER sig=j CSC CONNECT
> > hub 3-0:1.0: port 2, status 0501, change 0001, 480 Mb/s
> > hub 3-0:1.0: debounce: port 2: total 100ms stable 100ms status 0x501
> > ehci_hcd :00:02.2: port 2 high speed
> > ehci_hcd :00:02.2: GetStatus port 2 status 001005 POWER sig=se0 PE CONNECT
> > usb 3-2: new high speed USB device using ehci_hcd and address 9
> > ehci_hcd :00:02.2: port 2 high speed
> > ehci_hcd :00:02.2: GetStatus port 2 status 001005 POWER sig=se0 PE CONNECT
> > usb 3-2: default language 0x0409
> > usb 3-2: uevent
> > usb 3-2: usb_probe_device
> > usb 3-2: configuration #1 chosen from 1 choice
> > usb 3-2: adding 3-2:1.0 (config #1, interface 0)
> > usb 3-2:1.0: uevent
> > libusual 3-2:1.0: usb_probe_interface
> > libusual 3-2:1.0: usb_probe_interface - got id
> > usb-storage 3-2:1.0: usb_probe_interface
> > usb-storage 3-2:1.0: usb_probe_interface - got id
> > scsi4 : SCSI emulation for USB Mass Storage devices
> > drivers/usb/core/inode.c: creating file '009'
> > usb 3-2: New USB device found, idVendor=05dc, idProduct=a400
> > usb 3-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
> > usb 3-2: Product: JUMPDRIVE
> > usb 3-2: Manufacturer: LEXAR MEDIA
> > usb 3-2: SerialNumber: 0A4EEC05201219080904
> > usb-storage: device found at 9
> > usb-storage: waiting for device to settle before scanning
> > usb-storage: device scan complete
> > scsi 4:0:0:0: Direct-Access LEXARJUMPDRIVE1000 PQ: 0 ANSI: 0 CCS
> > sd 4:0:0:0: [sdg] 2026592 512-byte hardware sectors (1038 MB)
> > sd 4:0:0:0: [sdg] Write Protect is off
> > sd 4:0:0:0: [sdg] Mode Sense: 43 00 00 00
> > sd 4:0:0:0: [sdg] Assuming drive cache: write through
> > sd 4:0:0:0: [sdg] 2026592 512-byte hardware sectors (1038 MB)
> > sd 4:0:0:0: [sdg] Write Protect is off
> > sd 4:0:0:0: [sdg] Mode Sense: 43 00 00 00
> > sd 4:0:0:0: [sdg] Assuming drive cache: write through
> > sdg: sdg1
> > sd 4:0:0:0: [sdg] Attached SCSI removable disk
> > sd 4:0:0:0: Attached scsi generic sg7 type 0
> > sd 4:0:0:0: [sdg] Result: hostbyte=0x01 driverbyte=0x00
> > end_request: I/O error, dev sdg, sector 3984
>
> Yes, this is breakage in the scsi tree. I believe that the offending patch
> has been found and I have a nasty fix somewhere in my inbox - it involves
> reverting a patch which doesn't revert properly. I haven't got onto
> looking at it yet, sorry.

Zan, check this thread http://marc.info/?t=11968982411=1=2 for unholy details.