Re: 2.6.24-rc4-mm1 - BUG in tcp_fragment

Ilpo Järvinen wrote:
> On Thu, 13 Dec 2007, Cedric Le Goater wrote:
> 
>> I got this one while compiling on NFS.
>>
>> C.
>>
>> kernel BUG at /home/legoater/linux/2.6.24-rc4-mm1/include/net/tcp.h:1480!
> 
> I'm not exactly sure what patches you have applied and which patches are 
> not, with rc4-mm1 there are two patches (first one was incomplete, I 
> assume you had at least that one based on your other mail) to really fix 
> the issues in (__|)tcp_reset_fack_counts(...). 

Yes, I only have the first patch you sent to lkml applied on top of
2.6.24-rc4-mm1; it is attached below. I didn't see the second one on lkml?

> However, there seems to be so much breakage that I have a bit trouble to 
> decide where to start... The situation seems bit scary :-).

my n/w environment seems to reproduce these issues quite easily. if you
need some testing, just ping me.

Cheers,

C. 

> So, I might soon prepare a revert patch for most of the questionable 
> TCP parts and ask Dave to apply it (and drop them fully during next 
> rebase) unless I suddenly figure something out soon which explains 
> all/most of the problems, then return to drawing board. ...As it seems 
> that the cumulative ACK processing problem discovered later on (having 
> rather cumbersome solution with skbs only) will make part of the work 
> that's currently in net-2.6.25 quite useless/duplicate effort. But thanks 
> anyway for reporting these.
> 
> 

Subject: [PATCH] [TCP]: Fix fack_count miscountings (multiple places)

1) fack_count is set incorrectly if the highest sent skb is
already sacked (skb->prev won't return it because it's already
on the other list). This shows up later as a fackets_out counting
error whose second-order effects are very hard to track, so this
may fix all outstanding TCP bug reports.

2) The prev == NULL check was the wrong way around.

3) The last skb's fack count was incorrectly skipped by the
while () {} loop.

Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]>
---
 include/net/tcp.h |   22 ++++++++++++++++------
 1 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 9dbed0b..11a7e3e 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1337,10 +1337,20 @@ static inline struct sk_buff *tcp_send_head(struct sock *sk)
 static inline void tcp_advance_send_head(struct sock *sk, struct sk_buff *skb)
 {
 	struct sk_buff *prev = tcp_write_queue_prev(sk, skb);
+	unsigned int fc = 0;
+
+	if (prev == (struct sk_buff *)&sk->sk_write_queue)
+		prev = NULL;
+	else if (!tcp_skb_adjacent(sk, prev, skb))
+		prev = NULL;
 
-	if (prev != (struct sk_buff *)&sk->sk_write_queue)
-		TCP_SKB_CB(skb)->fack_count = TCP_SKB_CB(prev)->fack_count +
-					      tcp_skb_pcount(prev);
+	if ((prev == NULL) && !__tcp_write_queue_empty(sk, TCP_WQ_SACKED))
+		prev = __tcp_write_queue_tail(sk, TCP_WQ_SACKED);
+
+	if (prev != NULL)
+		fc = TCP_SKB_CB(prev)->fack_count + tcp_skb_pcount(prev);
+
+	TCP_SKB_CB(skb)->fack_count = fc;
 
 	sk->sk_send_head = tcp_write_queue_next(sk, skb);
 	if (sk->sk_send_head == (struct sk_buff *)&sk->sk_write_queue)
@@ -1464,7 +1474,7 @@ static inline struct sk_buff *__tcp_reset_fack_counts(struct sock *sk,
 {
 	unsigned int fc = 0;
 
-	if (prev == NULL)
+	if (prev != NULL)
 		fc = TCP_SKB_CB(*prev)->fack_count + tcp_skb_pcount(*prev);
 
 	BUG_ON((*prev != NULL) && !tcp_skb_adjacent(sk, *prev, skb));
@@ -1521,7 +1531,7 @@ static inline void tcp_reset_fack_counts(struct sock *sk, struct sk_buff *inskb)
 		skb[otherq] = prev->next;
 	}
 
-	while (skb[queue] != __tcp_write_queue_tail(sk, queue)) {
+	do {
 		/* Lazy find for the other queue */
 		if (skb[queue] == NULL) {
 			skb[queue] = tcp_write_queue_find(sk, TCP_SKB_CB(prev)->seq,
@@ -1535,7 +1545,7 @@ static inline void tcp_reset_fack_counts(struct sock *sk, struct sk_buff *inskb)
 			break;
 
 		queue ^= TCP_WQ_SACKED;
-	}
+	} while (skb[queue] != __tcp_write_queue_tail(sk, queue));
 }
 
 static inline void __tcp_insert_write_queue_after(struct sk_buff *skb,
-- 
1.5.0.6
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
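
Item 3 of the patch description above (the last skb being skipped by the
while () {} loop) can be illustrated with a small userspace sketch. This is
not the kernel code: struct seg and both functions are hypothetical
stand-ins, and the real tcp_reset_fack_counts() walks two queues rather
than one list. The sketch only shows why a pre-tested loop that compares
the cursor with the tail never assigns a count to the tail itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an skb; not the kernel's struct sk_buff. */
struct seg {
	unsigned int pcount;     /* packets carried by this segment */
	unsigned int fack_count; /* packets queued ahead of this segment */
	struct seg *next;
};

/* Pre-tested loop: exits as soon as the cursor reaches the tail, so the
 * tail itself never gets a count. */
void reset_counts_while(struct seg *s, struct seg *tail, unsigned int fc)
{
	assert(s != NULL && tail != NULL);
	while (s != tail) {
		s->fack_count = fc;
		fc += s->pcount;
		s = s->next;
	}
}

/* Post-tested loop: the body runs first and the test looks at the segment
 * that was just handled, so the tail is included. */
void reset_counts_do_while(struct seg *s, struct seg *tail, unsigned int fc)
{
	struct seg *done;

	assert(s != NULL && tail != NULL);
	do {
		s->fack_count = fc;
		fc += s->pcount;
		done = s;
		s = s->next;
	} while (done != tail);
}
```

With three segments carrying 1, 2 and 3 packets, the first function leaves
the last segment's fack_count untouched, while the second sets it to 3.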

Re: 2.6.24-rc4-mm1 - BUG in tcp_fragment

2007-12-13 Thread Ilpo Järvinen

On Thu, 13 Dec 2007, Cedric Le Goater wrote:

> I got this one while compiling on NFS.
> 
> C.
> 
> kernel BUG at /home/legoater/linux/2.6.24-rc4-mm1/include/net/tcp.h:1480!

I'm not exactly sure what patches you have applied and which patches are 
not, with rc4-mm1 there are two patches (first one was incomplete, I 
assume you had at least that one based on your other mail) to really fix 
the issues in (__|)tcp_reset_fack_counts(...). However, there seems to be 
so much breakage that I have a bit trouble to decide where to start...
The situation seems bit scary :-).

So, I might soon prepare a revert patch for most of the questionable 
TCP parts and ask Dave to apply it (and drop them fully during next 
rebase) unless I suddenly figure something out soon which explains 
all/most of the problems, then return to drawing board. ...As it seems 
that the cumulative ACK processing problem discovered later on (having 
rather cumbersome solution with skbs only) will make part of the work 
that's currently in net-2.6.25 quite useless/duplicate effort. But thanks 
anyway for reporting these.

-- 
 i.

Re: 2.6.24-rc4-mm1: acpi reboots machine... solved

2007-12-13 Thread Borislav Petkov

On Thu, Dec 13, 2007 at 09:17:18AM -0700, Bjorn Helgaas wrote:
> On Thursday 13 December 2007 12:09:23 am Borislav Petkov wrote:
> > On Wed, Dec 12, 2007 at 09:21:41AM -0700, Bjorn Helgaas wrote:
> > > On Wednesday 12 December 2007 03:11:23 am Borislav Petkov wrote:
> > > > On Tue, Dec 11, 2007 at 05:08:59PM -0700, Bjorn Helgaas wrote:
> > > > > On Tuesday 11 December 2007 01:52:55 pm Borislav Petkov wrote:
> > > > > > From what i can roughly tell so far it seems like an resource conflict between acpi and
> > > > > > the pnp requested regions in your patch which result in the acpi_thermal code
> > > > > > to read the wrong (0xff) temperature value and halt the machine, but i might be
> > > > > > wrong on the details since acpi is such a big code chunk to swallow.
> > > > > 
> > > I think Alexey is on the right track with the PCI resource allocation
> > > failure.
> > 
> > Then it should be the SMBus controller, PCI id 00:1f:3, which is having problems
> > registering its io ports region 4, AFAICT.
> 
> Yes, it looks like the ioport region 0x540-0x55f is described both in
> PNP and ACPI:
> 
>   /sys/devices/pnp0/00:0d/resources:state = active
>   /sys/devices/pnp0/00:0d/resources:io 0x540-0x55f
>   /sys/devices/pnp0/00:0d/resources:io 0x400-0x47f
> 
>   00:1f.3 SMBus: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) SMBus Controller (rev 03)
>   	Subsystem: ASUSTeK Computer Inc. Unknown device 1869
>   	Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
>   	Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
>   	Interrupt: pin B routed to IRQ 0
>   	Region 4: I/O ports at 0540 [size=32]
> 
> The PCI SMBus device was enabled by a quirk, asus_hides_smbus_lpc().
> 
> This quirk seems dangerous to me, and the comments above asus_hides_smbus
> allude to problems similar to what you're seeing.  It's obvious that a
> lot of blood, sweat, and tears have gone into this quirk, so I'm not
> suggesting that it's time to revert it, but I would be interested in
> knowing whether the critical temperature problem goes away if we leave
> the PCI device hidden, e.g., with the following patch:
> 
> Index: linux-mm/drivers/pci/quirks.c
> ===================================================================
> --- linux-mm.orig/drivers/pci/quirks.c	2007-12-13 09:11:31.000000000 -0700
> +++ linux-mm/drivers/pci/quirks.c	2007-12-13 09:12:27.000000000 -0700
> @@ -1073,12 +1073,7 @@
>  
>  	pci_read_config_word(dev, 0xF2, &val);
>  	if (val & 0x8) {
> -		pci_write_config_word(dev, 0xF2, val & (~0x8));
> -		pci_read_config_word(dev, 0xF2, &val);
> -		if (val & 0x8)
> -			printk(KERN_INFO "PCI: i801 SMBus device continues to play 'hide and seek'! 0x%x\n", val);
> -		else
> -			printk(KERN_INFO "PCI: Enabled i801 SMBus device\n");
> +		printk(KERN_INFO "PCI: Leaving i801 SMBus device hidden\n");
>  	}
>  }
>  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL,	PCI_DEVICE_ID_INTEL_82801AA_0,	asus_hides_smbus_lpc);

yep, this fixes it. Bootlog attached.

-- 
Regards/Gruß,
Boris.


bootlog-smbus-hidden.bz2
Description: Binary data
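
The conflict discussed above comes down to two inclusive port ranges
covering the same addresses: the PNP device 00:0d lists io 0x540-0x55f,
and region 4 of the PCI SMBus device 00:1f.3 sits at 0x540 with size 32.
A minimal sketch of that overlap check follows. The struct and function
names are hypothetical and this is not how the kernel's resource tree is
implemented; it only shows the arithmetic.

```c
#include <assert.h>
#include <stdbool.h>

/* Inclusive I/O port range, as printed in a PNP "resources" file. */
struct io_range {
	unsigned int start;
	unsigned int end;
};

/* lspci reports a region as base + size; convert to an inclusive range. */
struct io_range range_from_base_size(unsigned int base, unsigned int size)
{
	struct io_range r;

	assert(size > 0);
	r.start = base;
	r.end = base + size - 1;
	return r;
}

/* Two inclusive ranges overlap when each one starts before the other ends. */
bool ranges_overlap(struct io_range a, struct io_range b)
{
	return a.start <= b.end && b.start <= a.end;
}
```

With the values from this thread, the PCI region becomes 0x540-0x55f and
overlaps the first PNP range exactly, while the second PNP range
(0x400-0x47f) does not overlap it.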

Re: 2.6.24-rc4-mm1 - BUG in tcp_fragment

Andrew Morton wrote:
> Temporarily at
> 
>   http://userweb.kernel.org/~akpm/2.6.24-rc4-mm1/
> 
> Will appear later at
> 
>   
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc4/2.6.24-rc4-mm1/

I got this one while compiling on NFS.

C.

kernel BUG at /home/legoater/linux/2.6.24-rc4-mm1/include/net/tcp.h:1480!
invalid opcode: 0000 [1] SMP 
last sysfs file: /sys/devices/pci0000:00/0000:00:1e.0/0000:01:01.0/local_cpus
CPU 1 
Modules linked in: autofs4 nfs lockd sunrpc tg3 sg joydev ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Pid: 0, comm: swapper Not tainted 2.6.24-rc4-mm1 #3
RIP: 0010:[<ffffffff80418d93>]  [<ffffffff80418d93>] tcp_fragment+0x5ee/0x6f7
RSP: 0018:810147c9f9e0  EFLAGS: 00010217
RAX: 1526c311 RBX: 8100c2ce1d00 RCX: 810143cc6aa0
RDX: 0001 RSI: 810102b37b00 RDI: 810102b37b50
RBP: 810147c9fa50 R08: 004a R09: 0001
R10: 0b50 R11: 0001 R12: 81013a575700
R13:  R14: 810143cc6400 R15: 81013a575750
FS:  () GS:810147c57140() knlGS:
CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
CR2: 2ad5d294b000 CR3: bd11b000 CR4: 06e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process swapper (pid: 0, threadinfo 810147c98000, task 810147c89040)
Stack:  810147c9fa00  05a843cc6400 810143cc6400
 810147c9fa70 8100c2ce1d50 810143cc6590 810143cc6aa0
 15265421 810143cc6400 810143cc6400 81013a575700
Call Trace:
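 <IRQ>  [<ffffffff804190c7>] tcp_retransmit_skb+0xd6/0x713
 [<ffffffff804197d4>] tcp_xmit_retransmit_queue+0xd0/0x330
 [<ffffffff8041209b>] tcp_fastretrans_alert+0xb92/0xbf2
 [<ffffffff80413f30>] tcp_ack+0xdf3/0xfbe
 [<ffffffff80417295>] tcp_rcv_established+0x66a/0x76d
 [<ffffffff8041d285>] tcp_v4_do_rcv+0x37/0x3aa
 [<ffffffff8041fb73>] tcp_v4_rcv+0x9a9/0xa76
 [<ffffffff80402e4e>] ip_local_deliver_finish+0x161/0x23c
 [<ffffffff80403363>] ip_local_deliver+0x72/0x77
 [<ffffffff80402ca9>] ip_rcv_finish+0x371/0x3b5
 [<ffffffff804032bd>] ip_rcv+0x292/0x2c6
 [<ffffffff803e3dcc>] netif_receive_skb+0x267/0x340
 [<ffffffff8806eff4>] :tg3:tg3_poll+0x5d2/0x89e
 [<ffffffff803e639d>] net_rx_action+0xd5/0x1ad
 [<ffffffff8023b605>] __do_softirq+0x5f/0xe3
 [<ffffffff8020c86c>] call_softirq+0x1c/0x28
 [<ffffffff8020e739>] do_softirq+0x39/0x9f
 [<ffffffff8023b5a4>] irq_exit+0x4e/0x50
 [<ffffffff8020e880>] do_IRQ+0xb7/0xd7
 [<ffffffff8020a803>] mwait_idle+0x0/0x55
 [<ffffffff8020bb66>] ret_from_intr+0x0/0xf
 <EOI>  [<ffffffff8024d623>] __atomic_notifier_call_chain+0x20/0x83
 [<ffffffff8020a84b>] mwait_idle+0x48/0x55
 [<ffffffff80209e79>] enter_idle+0x22/0x24
 [<ffffffff8020a793>] cpu_idle+0xa1/0xc5
 [<ffffffff8021dfd5>] start_secondary+0x3b9/0x3c5


Code: 0f 0b eb fe 48 85 f6 74 08 8b 46 6c 3b 41 68 75 55 48 8d 41 
RIP  [<ffffffff80418d93>] tcp_fragment+0x5ee/0x6f7
 RSP <ffff810147c9f9e0>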
Kernel panic - not syncing: Aiee, killing interrupt handler!

tcp_sacktag_one() WARNING (was Re: 2.6.24-rc4-mm1)

Cedric Le Goater wrote:
> Ilpo Järvinen wrote:
>> On Wed, 5 Dec 2007, Andrew Morton wrote:
>>
>>> On Thu, 06 Dec 2007 17:59:37 +1100 Reuben Farrelly <[EMAIL PROTECTED]> wrote:
>>>
>>>> This non fatal oops which I have just noticed may be related to this change then 
>>>> - certainly looks networking related.
>>> yep, but it isn't e1000.  It's core TCP.
>>>
>>>> WARNING: at net/ipv4/tcp_input.c:2518 tcp_fastretrans_alert()
>>>> Pid: 0, comm: swapper Not tainted 2.6.24-rc4-mm1 #1
>>> Ilpo, Reuben's kernel is talking to you ;)
>> ...Please try the patch below. Andrew, this probably fixes your problem 
>> (the packets <= tp->packets_out) as well.
> 
> nah. I got the WARNINGs again with this patch.

I got this new one on a 2.6.24-rc5-mm1. It looked similar ? 

C.

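WARNING: at /home/legoater/linux/2.6.24-rc5-mm1/net/ipv4/tcp_input.c:1280 tcp_sacktag_one()
Pid: 0, comm: swapper Not tainted 2.6.24-rc5-mm1 #1

Call Trace:
 <IRQ>  [<ffffffff80410e0e>] tcp_sacktag_walk+0x2bc/0x62a
 [<ffffffff80411711>] tcp_sacktag_write_queue+0x595/0xa7c
 [<ffffffff8028ce66>] kfree+0xd4/0xe0
 [<ffffffff80411e9f>] tcp_ack+0x2a7/0xfc7
 [<ffffffff80252ca1>] mark_held_locks+0x47/0x6a
 [<ffffffff80252e5c>] trace_hardirqs_on+0xfe/0x139
 [<ffffffff80415d59>] tcp_rcv_established+0x66a/0x76d
 [<ffffffff8041bd35>] tcp_v4_do_rcv+0x37/0x3aa
 [<ffffffff8041e623>] tcp_v4_rcv+0x9a9/0xa76
 [<ffffffff80401832>] ip_local_deliver_finish+0x161/0x23c
 [<ffffffff80401d47>] ip_local_deliver+0x72/0x77
 [<ffffffff8040168d>] ip_rcv_finish+0x371/0x3b5
 [<ffffffff80401ca1>] ip_rcv+0x292/0x2c6
 [<ffffffff803e2aae>] netif_receive_skb+0x267/0x340
 [<ffffffff8806eff4>] :tg3:tg3_poll+0x5d2/0x89e
 [<ffffffff803e505c>] net_rx_action+0xd5/0x1ad
 [<ffffffff8023b0b9>] __do_softirq+0x5f/0xe3
 [<ffffffff8020c8ec>] call_softirq+0x1c/0x28
 [<ffffffff8020e7b9>] do_softirq+0x39/0x9f
 [<ffffffff8023b058>] irq_exit+0x4e/0x50
 [<ffffffff8020e900>] do_IRQ+0xb7/0xd7
 [<ffffffff8020a892>] mwait_idle+0x0/0x52
 [<ffffffff8020bbe6>] ret_from_intr+0x0/0xf
 <EOI>  [<ffffffff8024d0cb>] __atomic_notifier_call_chain+0x20/0x83
 [<ffffffff8020a8da>] mwait_idle+0x48/0x52
 [<ffffffff80209e79>] enter_idle+0x22/0x24
 [<ffffffff8020a822>] cpu_idle+0xa1/0xc5
 [<ffffffff8021e755>] start_secondary+0x3b9/0x3c5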

Re: 2.6.24-rc4-mm1: acpi reboots machine... solved

2007-12-13 Thread Bjorn Helgaas

On Thursday 13 December 2007 12:09:23 am Borislav Petkov wrote:
> On Wed, Dec 12, 2007 at 09:21:41AM -0700, Bjorn Helgaas wrote:
> > On Wednesday 12 December 2007 03:11:23 am Borislav Petkov wrote:
> > > On Tue, Dec 11, 2007 at 05:08:59PM -0700, Bjorn Helgaas wrote:
> > > > On Tuesday 11 December 2007 01:52:55 pm Borislav Petkov wrote:
> > > > > From what i can roughly tell so far it seems like an resource conflict between acpi and
> > > > > the pnp requested regions in your patch which result in the acpi_thermal code
> > > > > to read the wrong (0xff) temperature value and halt the machine, but i might be
> > > > > wrong on the details since acpi is such a big code chunk to swallow.
> > > > 
> > I think Alexey is on the right track with the PCI resource allocation
> > failure.
> 
> Then it should be the SMBus controller, PCI id 00:1f:3, which is having problems
> registering its io ports region 4, AFAICT.

Yes, it looks like the ioport region 0x540-0x55f is described both in
PNP and ACPI:

  /sys/devices/pnp0/00:0d/resources:state = active
  /sys/devices/pnp0/00:0d/resources:io 0x540-0x55f
  /sys/devices/pnp0/00:0d/resources:io 0x400-0x47f

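  00:1f.3 SMBus: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) SMBus Controller (rev 03)
  	Subsystem: ASUSTeK Computer Inc. Unknown device 1869
  	Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
  	Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
  	Interrupt: pin B routed to IRQ 0
  	Region 4: I/O ports at 0540 [size=32]

The PCI SMBus device was enabled by a quirk, asus_hides_smbus_lpc().

This quirk seems dangerous to me, and the comments above asus_hides_smbus
allude to problems similar to what you're seeing.  It's obvious that a
lot of blood, sweat, and tears have gone into this quirk, so I'm not
suggesting that it's time to revert it, but I would be interested in
knowing whether the critical temperature problem goes away if we leave
the PCI device hidden, e.g., with the following patch:

Index: linux-mm/drivers/pci/quirks.c
===================================================================
--- linux-mm.orig/drivers/pci/quirks.c	2007-12-13 09:11:31.000000000 -0700
+++ linux-mm/drivers/pci/quirks.c	2007-12-13 09:12:27.000000000 -0700
@@ -1073,12 +1073,7 @@
 
 	pci_read_config_word(dev, 0xF2, &val);
 	if (val & 0x8) {
-		pci_write_config_word(dev, 0xF2, val & (~0x8));
-		pci_read_config_word(dev, 0xF2, &val);
-		if (val & 0x8)
-			printk(KERN_INFO "PCI: i801 SMBus device continues to play 'hide and seek'! 0x%x\n", val);
-		else
-			printk(KERN_INFO "PCI: Enabled i801 SMBus device\n");
+		printk(KERN_INFO "PCI: Leaving i801 SMBus device hidden\n");
 	}
 }
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL,	PCI_DEVICE_ID_INTEL_82801AA_0,	asus_hides_smbus_lpc);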
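
The control flow of the asus_hides_smbus_lpc() quirk discussed in this
thread (test bit 0x8 in config word 0xF2, clear it, read back to see
whether the write stuck) can be modelled in userspace as below. Everything
here is a hypothetical simulation: struct fake_lpc, the bit_sticky flag and
the helper names are assumptions made for illustration, and no real PCI
config access takes place.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HIDE_BIT 0x8 /* bit the quirk tests in config word 0xF2 */

/* Hypothetical model of the LPC bridge's 16-bit config word at 0xF2. */
struct fake_lpc {
	uint16_t reg_f2;
	bool bit_sticky; /* models hardware that refuses to clear the bit */
};

/* Simulated config write; a sticky device keeps HIDE_BIT set. */
void fake_write_f2(struct fake_lpc *d, uint16_t val)
{
	assert(d != NULL);
	if (d->bit_sticky)
		val |= HIDE_BIT;
	d->reg_f2 = val;
}

/* Shape of the original quirk: clear the bit, then read back.
 * Returns true when the SMBus device ends up visible. */
bool unhide_smbus(struct fake_lpc *d)
{
	uint16_t val;

	assert(d != NULL);
	val = d->reg_f2;
	if (!(val & HIDE_BIT))
		return true; /* was never hidden */
	fake_write_f2(d, val & (uint16_t)~HIDE_BIT);
	return !(d->reg_f2 & HIDE_BIT);
}

/* Shape of the test patch: never write, only report the current state. */
bool smbus_visible(const struct fake_lpc *d)
{
	assert(d != NULL);
	return !(d->reg_f2 & HIDE_BIT);
}
```

The read-back matters because the write is not guaranteed to take effect;
the original quirk prints its 'hide and seek' message in exactly that case.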

Re: 2.6.24-rc4-mm1: acpi reboots machine... solved

On Wed, Dec 12, 2007 at 09:21:41AM -0700, Bjorn Helgaas wrote:
> On Wednesday 12 December 2007 03:11:23 am Borislav Petkov wrote:
> > On Tue, Dec 11, 2007 at 05:08:59PM -0700, Bjorn Helgaas wrote:
> > > On Tuesday 11 December 2007 01:52:55 pm Borislav Petkov wrote:
> > > > From what i can roughly tell so far it seems like an resource conflict
> > > > between acpi and the pnp requested regions in your patch which result in
> > > > the acpi_thermal code to read the wrong (0xff) temperature value and halt
> > > > the machine, but i might be wrong on the details since acpi is such a big
> > > > code chunk to swallow.
> > > 
> > > I don't see any obvious conflict from the log you posted.  For the sake
> > > of comparison, can you post the corresponding dmesg log after you removed
> > > the patch?
> > 
> > The only difference i see is that ACPI finds EC in DSDT in the working
> > kernel and in the broken case something silently fails. Please find
> > attached the 2 bootlogs and a disassembled DSDT.
> 
> Thanks very much!
> 
> "ACPI: EC: Look up EC in DSDT" appears in the working log, but not
> in the broken one.  But I think we *do* find the EC in both cases,
> because we see "ACPI: EC: non-query interrupt received" even before
> acpi_ec_add() (which prints the "ACPI: EC: GPE = 0x1c, ...".  Maybe
> the logs were collected with different log levels?

Well, hm, actually no, the only difference is that the broken log was taken over
netconsole so the lines might appear in a different order. I'll capture that
log again on the weekend to see whether something is missing..
 
> I think Alexey is on the right track with the PCI resource allocation
> failure.

Then it should be the SMBus controller, PCI id 00:1f.3, which is having problems
registering its I/O port region 4, AFAICT.

> On your working kernel, can you collect this:
> 
>   lspci -vv > lspci
>   cat /proc/ioports > ioports
>   cat /proc/iomem > iomem
>   grep . /sys/devices/pnp*/*/resources > pnp
>   tar -jcf resources.tar.bz2 lspci ioports iomem pnp

attached.

-- 
Regards/Gruß,
Boris.


resources.tar.bz2
Description: Binary data

Re: 2.6.24-rc4-mm1

Ilpo Järvinen wrote:
> On Wed, 5 Dec 2007, Andrew Morton wrote:
> 
>> On Thu, 06 Dec 2007 17:59:37 +1100 Reuben Farrelly <[EMAIL PROTECTED]> wrote:
>>
>>> This non fatal oops which I have just noticed may be related to this change
>>> then - certainly looks networking related.
>> yep, but it isn't e1000.  It's core TCP.
>>
>>> WARNING: at net/ipv4/tcp_input.c:2518 tcp_fastretrans_alert()
>>> Pid: 0, comm: swapper Not tainted 2.6.24-rc4-mm1 #1
>> Ilpo, Reuben's kernel is talking to you ;)
> 
> ...Please try the patch below. Andrew, this probably fixes your problem 
> (the packets <= tp->packets_out) as well.

nah. I got the WARNINGs again with this patch.

C.
 
> Dave, please include this one to net-2.6.25.
> 
> 


Re: 2.6.24-rc4-mm1

Ilpo Järvinen wrote:
> On Wed, 5 Dec 2007, David Miller wrote:
> 
>> From: Reuben Farrelly <[EMAIL PROTECTED]>
>> Date: Thu, 06 Dec 2007 17:59:37 +1100
>>
>>> On 5/12/2007 4:17 PM, Andrew Morton wrote:
>>>> - Lots of device IDs have been removed from the e1000 driver and moved over
>>>>   to e1000e.  So if your e1000 stops working, you forgot to set
>>>>   CONFIG_E1000E.
>>> This non fatal oops which I have just noticed may be related to this change
>>> then - certainly looks networking related.
>>>
>>> WARNING: at net/ipv4/tcp_input.c:2518 tcp_fastretrans_alert()
>>> Pid: 0, comm: swapper Not tainted 2.6.24-rc4-mm1 #1
>>>
>>> Call Trace:
>>> [] tcp_fastretrans_alert+0x229/0xe63
>>>   [] tcp_ack+0xa3f/0x127d
>>>   [] tcp_rcv_established+0x55f/0x7f8
>>>   [] tcp_v4_do_rcv+0xdb/0x3a7
>>>   [] :nf_conntrack:nf_ct_deliver_cached_events+0x75/0x99
>> No, it's from TCP assertions and changes added by Ilpo to the
>> net-2.6.25 tree recently.
> 
> Yeah, this (very likely) due to the new SACK processing (in net-2.6.25). 
> I'll look what could go wrong with fack_count calculations, most likely 
> it's the reason (I've found earlier one out-of-place retransmission 
> segment in one of my test case which already indicated that there's 
> something incorrect with them but didn't have time to debug it yet).
> 
> Thanks for report. Some info about how easily you can reproduce & 
> couple of sentences about the test case might be useful later on when 
> evaluating the fix.

I also got plenty of these when untaring a tarball on NFS.

C. 

WARNING: at /home/legoater/linux/2.6.24-rc4-mm1/net/ipv4/tcp_input.c:2518 tcp_fastretrans_alert()
Pid: 0, comm: swapper Not tainted 2.6.24-rc4-mm1 #2

Call Trace:
   [] tcp_fastretrans_alert+0xb6/0xbf2
 [] tcp_ack+0xdf3/0xfbe
 [] sk_reset_timer+0x17/0x23
 [] tcp_rcv_established+0xf3/0x76d
 [] tcp_v4_do_rcv+0x37/0x3aa
 [] tcp_v4_rcv+0x9a9/0xa76
 [] ip_local_deliver_finish+0x161/0x23c
 [] ip_local_deliver+0x72/0x77
 [] ip_rcv_finish+0x371/0x3b5
 [] ip_rcv+0x292/0x2c6
 [] netif_receive_skb+0x267/0x340
 [] :tg3:tg3_poll+0x5d2/0x89e
 [] net_rx_action+0xd5/0x1ad
 [] __do_softirq+0x5f/0xe3
 [] call_softirq+0x1c/0x28
 [] do_softirq+0x39/0x9f
 [] irq_exit+0x4e/0x50
 [] do_IRQ+0xb7/0xd7
 [] mwait_idle+0x0/0x55
 [] ret_from_intr+0x0/0xf
   [] __atomic_notifier_call_chain+0x20/0x83
 [] mwait_idle+0x48/0x55
 [] enter_idle+0x22/0x24
 [] cpu_idle+0xa1/0xc5
 [] start_secondary+0x3b9/0x3c5

WARNING: at /home/legoater/linux/2.6.24-rc4-mm1/net/ipv4/tcp_input.c:2518 tcp_fastretrans_alert()
Pid: 0, comm: swapper Not tainted 2.6.24-rc4-mm1 #2

Call Trace:
   [] tcp_fastretrans_alert+0xb6/0xbf2
 [] tcp_ack+0xdf3/0xfbe
 [] tcp_data_queue+0x5da/0xb0a
 [] tcp_rcv_established+0xf3/0x76d
 [] tcp_v4_do_rcv+0x37/0x3aa
 [] tcp_v4_rcv+0x9a9/0xa76
 [] ip_local_deliver_finish+0x161/0x23c
 [] ip_local_deliver+0x72/0x77
 [] ip_rcv_finish+0x371/0x3b5
 [] ip_rcv+0x292/0x2c6
 [] netif_receive_skb+0x267/0x340
 [] :tg3:tg3_poll+0x5d2/0x89e
 [] net_rx_action+0xd5/0x1ad
 [] __do_softirq+0x5f/0xe3
 [] call_softirq+0x1c/0x28
 [] do_softirq+0x39/0x9f
 [] irq_exit+0x4e/0x50
 [] do_IRQ+0xb7/0xd7
 [] mwait_idle+0x0/0x55
 [] ret_from_intr+0x0/0xf
   [] __atomic_notifier_call_chain+0x20/0x83
 [] mwait_idle+0x48/0x55
 [] enter_idle+0x22/0x24
 [] cpu_idle+0xa1/0xc5
 [] start_secondary+0x3b9/0x3c5

WARNING: at /home/legoater/linux/2.6.24-rc4-mm1/net/ipv4/tcp_input.c:2518 tcp_fastretrans_alert()
Pid: 0, comm: swapper Not tainted 2.6.24-rc4-mm1 #2

Call Trace:
   [] tcp_fastretrans_alert+0xb6/0xbf2
 [] tcp_ack+0xdf3/0xfbe
 [] tcp_data_queue+0x5da/0xb0a
 [] tcp_rcv_established+0xf3/0x76d
 [] tcp_v4_do_rcv+0x37/0x3aa
 [] tcp_v4_rcv+0x9a9/0xa76
 [] ip_local_deliver_finish+0x161/0x23c
 [] ip_local_deliver+0x72/0x77
 [] ip_rcv_finish+0x371/0x3b5
 [] ip_rcv+0x292/0x2c6
 [] netif_receive_skb+0x267/0x340
 [] :tg3:tg3_poll+0x5d2/0x89e
 [] net_rx_action+0xd5/0x1ad
 [] __do_softirq+0x5f/0xe3
 [] call_softirq+0x1c/0x28
 [] do_softirq+0x39/0x9f
 [] irq_exit+0x4e/0x50
 [] do_IRQ+0xb7/0xd7
 [] mwait_idle+0x0/0x55
 [] ret_from_intr+0x0/0xf
   [] __atomic_notifier_call_chain+0x20/0x83
 [] mwait_idle+0x48/0x55
 [] enter_idle+0x22/0x24
 [] cpu_idle+0xa1/0xc5
 [] start_secondary+0x3b9/0x3c5


Re: 2.6.24-rc4-mm1: acpi reboots machine... solved

2007-12-12 Thread Bjorn Helgaas

On Wednesday 12 December 2007 03:11:23 am Borislav Petkov wrote:
> On Tue, Dec 11, 2007 at 05:08:59PM -0700, Bjorn Helgaas wrote:
> > On Tuesday 11 December 2007 01:52:55 pm Borislav Petkov wrote:
> > > From what i can roughly tell so far it seems like an resource conflict
> > > between acpi and the pnp requested regions in your patch which result in
> > > the acpi_thermal code to read the wrong (0xff) temperature value and halt
> > > the machine, but i might be wrong on the details since acpi is such a big
> > > code chunk to swallow.
> > 
> > I don't see any obvious conflict from the log you posted.  For the sake
> > of comparison, can you post the corresponding dmesg log after you removed
> > the patch?
> 
> The only difference i see is that ACPI finds EC in DSDT in the working kernel
> and in the broken case something silently fails. Please find attached the 2
> bootlogs and a disassembled DSDT.

Thanks very much!

"ACPI: EC: Look up EC in DSDT" appears in the working log, but not
in the broken one.  But I think we *do* find the EC in both cases,
because we see "ACPI: EC: non-query interrupt received" even before
acpi_ec_add() (which prints the "ACPI: EC: GPE = 0x1c, ...".  Maybe
the logs were collected with different log levels?

I think Alexey is on the right track with the PCI resource allocation
failure.  On your working kernel, can you collect this:

  lspci -vv > lspci
  cat /proc/ioports > ioports
  cat /proc/iomem > iomem
  grep . /sys/devices/pnp*/*/resources > pnp
  tar -jcf resources.tar.bz2 lspci ioports iomem pnp

Bjorn

Re: 2.6.24-rc4-mm1: acpi reboots machine... solved

2007-12-12 Thread Alexey Starikovskiy


Borislav Petkov wrote:
> On Tue, Dec 11, 2007 at 05:08:59PM -0700, Bjorn Helgaas wrote:
>> On Tuesday 11 December 2007 01:52:55 pm Borislav Petkov wrote:
>>> From what i can roughly tell so far it seems like an resource conflict
>>> between acpi and the pnp requested regions in your patch which result in
>>> the acpi_thermal code to read the wrong (0xff) temperature value and halt
>>> the machine, but i might be wrong on the details since acpi is such a big
>>> code chunk to swallow.
>>
>> I don't see any obvious conflict from the log you posted.  For the sake
>> of comparison, can you post the corresponding dmesg log after you removed
>> the patch?
>
> The only difference i see is that ACPI finds EC in DSDT in the working kernel
> and in the broken case something silently fails. Please find attached the 2
> bootlogs and a disassembled DSDT.

This seems to be the start of trouble...
   PCI: Cannot allocate resource region 4 of device :00:1f.3


Re: 2.6.24-rc4-mm1: acpi reboots machine... solved

On Tue, Dec 11, 2007 at 05:08:59PM -0700, Bjorn Helgaas wrote:
> On Tuesday 11 December 2007 01:52:55 pm Borislav Petkov wrote:
> > From what i can roughly tell so far it seems like an resource conflict
> > between acpi and the pnp requested regions in your patch which result in
> > the acpi_thermal code to read the wrong (0xff) temperature value and halt
> > the machine, but i might be wrong on the details since acpi is such a big
> > code chunk to swallow.
> 
> I don't see any obvious conflict from the log you posted.  For the sake
> of comparison, can you post the corresponding dmesg log after you removed
> the patch?

The only difference i see is that ACPI finds EC in DSDT in the working kernel
and in the broken case something silently fails. Please find attached the 2
bootlogs and a disassembled DSDT.

-- 
Regards/Gruß,
Boris.
Re: 2.6.24-rc4-mm1

2007-12-11 Thread Rik van Riel

On Tue, 4 Dec 2007 21:17:01 -0800
Andrew Morton <[EMAIL PROTECTED]> wrote:

> Changes since 2.6.24-rc3-mm2:

2.6.24-rc4-mm1 brought a nice TCP oops on my x86_64 system, while I
was stress-testing the VM and watching via ssh:

general protection fault:  [1] SMP 
last sysfs file: /sys/devices/pci:00/:00:1c.5/:04:00.0/irq
CPU 1 
Modules linked in: nfs lockd nfs_acl rfcomm l2cap bluetooth autofs4 sunrpc ipv6 
acpi_cpufreq dm_multipath parport_pc e1000e parport firewire_ohci button 
i2c_i801 i2c_core i82975x_edac pcspkr firewire_core serio_raw edac_core 
rtc_cmos floppy crc_itu_t sg sr_mod cdrom pata_marvell ata_piix dm_snapshot 
dm_zero dm_mirror dm_mod ahci libata sd_mod scsi_mod ext3 jbd mbcache uhci_hcd 
ohci_hcd ehci_hcd
Pid: 2946, comm: sshd Not tainted 2.6.24-rc4-mm1 #1
RIP: 0010:[]  [] __tcp_rb_insert+0x1a/0x67
RSP: 0018:810066401c88  EFLAGS: 00010202
RAX: 6b6b6b6b6b6b6b6b RBX: 810076e9f000 RCX: 81003ddc9900
RDX: 6b6b6b6b6b6b6bab RSI: 81006ed1b148 RDI: 6b6b6b6b6b6b6b5b
RBP: 81006ed1aa00 R08: 810076e9f010 R09: bef8d64e
R10: 81228926 R11: 8110b2aa R12: 810066401de8
R13: 00e0 R14: 810066401ee8 R15: 0001
FS:  7f1c2c10d780() GS:81007f801578() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 02aabfd3 CR3: 665e3000 CR4: 06e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process sshd (pid: 2946, threadinfo 81006640, task 8100665ce000)
Stack:  81003ddc9900 81228b26  0001
 810066401ee8 810574da 04e00040 00e004e0
 7f1c2c797620 0246 66401d60 
Call Trace:
 [] tcp_sendmsg+0x21f/0xb00
 [] sock_aio_write+0xf8/0x110
 [] do_sync_write+0xc9/0x10c
 [] file_has_perm+0x9a/0xa9
 [] autoremove_wake_function+0x0/0x2e
 [] __lock_acquire+0x50f/0xc8e
 [] lock_release_holdtime+0x27/0x48
 [] vfs_write+0xd9/0x16f
 [] sys_write+0x45/0x6e
 [] tracesys+0xdc/0xe1


Code: 44 3b 4a 1c 79 10 44 3b 4a 18 78 04 0f 0b eb fe 48 8d 50 10 
RIP  [] __tcp_rb_insert+0x1a/0x67
 RSP 


-- 
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan
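
In the register dump above, RAX is 6b6b6b6b6b6b6b6b, and RDX and RDI sit a few bytes either side of it. 0x6b is the byte the kernel's slab debugging writes over freed objects (POISON_FREE), so if slab poisoning was enabled in this build (the report does not say), the fault may be a dereference of a pointer read from freed memory. A minimal sketch of that check follows; is_freed_poison() and near_freed_poison() are hypothetical helper names for illustration, not kernel functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Byte used to overwrite freed slab objects when poisoning is on. */
#define FREED_POISON_BYTE 0x6bULL

/* The poison byte repeated across all eight bytes of a 64-bit word. */
#define FREED_POISON_WORD (FREED_POISON_BYTE * 0x0101010101010101ULL)

/* True when every byte of v is the freed-object poison byte. */
static bool is_freed_poison(uint64_t v)
{
	return v == FREED_POISON_WORD;
}

/* True when v lies within max_off bytes of the poison word, as happens
 * when a small member offset is added to a poisoned pointer. */
static bool near_freed_poison(uint64_t v, uint64_t max_off)
{
	uint64_t d = v > FREED_POISON_WORD ? v - FREED_POISON_WORD
					   : FREED_POISON_WORD - v;

	return d <= max_off;
}
```

Applied to the dump, RAX matches the pattern exactly, and RDX (...6bab) and RDI (...6b5b) fall within 0x100 bytes of it, whereas an ordinary pointer such as RBX (810076e9f000 as printed) does not.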

Re: 2.6.24-rc4-mm1 -- boot process hangs -- tty4 main process (2988) terminated with status 1

On Sat, 8 Dec 2007 21:29:18 -0500 "Miles Lane" <[EMAIL PROTECTED]> wrote:

> > > Dec  6 21:24:28 erratic-orbits init: tty3 main process (2991)
> > > terminated with status 1
> >
> > Boggle.  We broke the vt driver?
> >
> > config, please...
> 
> I sent the .config.

I didn't receive it but I found a config from you in another thread.

>  Is there nothing else to follow up on?  I have
> tried rebuilding about seven kernels, tweaking the options each time.
> All the kernels have failed to boot.   I am currently trying with a
> "defconfig" kernel.  Perhaps I will have better luck with it.

Your config instabricks my Vaio.  Fiddled with it a bit but failed to pick
the problem.  Fixing regressions in -mm isn't top priority at present I'm
afraid.  If the same bug is present in next -mm it'd be great if you could
bisect it down please.
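
For anyone who has not bisected an -mm tree before: the broken-out patches apply in series order, so the job is a binary search over how many of them are applied. A minimal Python sketch, assuming exactly one patch introduces the failure and that the tree stays broken once it is applied; `boots_ok` is a hypothetical callback standing in for "apply, build, boot, test", and the patch names are made up.

```python
def first_bad_patch(series, boots_ok):
    """Binary-search an ordered patch series for the first patch that breaks boot.

    boots_ok(n) reports whether a tree with the first n patches applied boots.
    boots_ok(0) is assumed good and boots_ok(len(series)) is assumed bad.
    """
    good, bad = 0, len(series)  # invariant: prefix `good` boots, prefix `bad` does not
    while bad - good > 1:
        mid = (good + bad) // 2
        if boots_ok(mid):
            good = mid
        else:
            bad = mid
    return series[bad - 1]  # the patch that turned a good prefix into a bad one


if __name__ == "__main__":
    series = [f"patch-{i:03d}.patch" for i in range(1, 1001)]  # made-up names
    culprit = 637  # hypothetical position of the bad patch (1-based)
    boots = []

    def boots_ok(n):
        boots.append(n)
        return n < culprit

    print(first_bad_patch(series, boots_ok), "found after", len(boots), "test boots")
```

With around a thousand patches this needs about ten build-and-boot cycles, which is why it can still take a whole day.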


Re: 2.6.24-rc4-mm1: acpi reboots machine... solved

On Tuesday 11 December 2007 01:52:55 pm Borislav Petkov wrote:
> From what i can roughly tell so far it seems like a resource conflict
> between acpi and the pnp requested regions in your patch, which results in
> the acpi_thermal code reading the wrong (0xff) temperature value and halting
> the machine, but i might be wrong on the details since acpi is such a big
> code chunk to swallow.

I don't see any obvious conflict from the log you posted.  For the sake
of comparison, can you post the corresponding dmesg log after you removed
the patch?

acpi_thermal_get_temperature() only evaluates _TMP, which isn't very
interesting.  I wonder if there's some conflict between that AML method
and the EC driver or something.

If you can also collect the DSDT, maybe I can poke around in there and
see what _TMP is really doing.

Thanks,
  Bjorn

Re: 2.6.24-rc4-mm1

On Tue, 11 Dec 2007 14:17:16 -0800 Kok, Auke wrote:

> Andrew Morton wrote:
> > On Tue, 11 Dec 2007 13:26:58 -0800
> > "Kok, Auke" <[EMAIL PROTECTED]> wrote:
> > 
> >> Andrew Morton wrote:
> >>> On Tue, 11 Dec 2007 08:13:52 -0800 "Martin Bligh" <[EMAIL PROTECTED]> wrote:
> >>>
> >>>>> - Lots of device IDs have been removed from the e1000 driver and moved
> >>>>>   over to e1000e.  So if your e1000 stops working, you forgot to set
> >>>>>   CONFIG_E1000E.
> >>>>
> >>>> Wouldn't it make sense to just default this to on if E1000 was on, rather
> >>>> than screwing everybody for no good reason (plus breaking all the
> >>>> automated testing, etc etc)?
> >>>> Much though I love random refactoring, it is fairly painful to just keep
> >>>> changing the names of things.
> >>>
> >>> (cc netdev and Auke)
> >>>
> >>> Yes, that would be very sensible.  CONFIG_E1000E should default to
> >>> whatever CONFIG_E1000 was set to.
> >>
> >> which is "y" for x86 and friends, ppc, arm and ia64 through 'defconfig'.
> >> the Kconfig files do not have defaults in them.
> > 
> > I wouldn't be looking at defconfig files - I don't think many people use
> > them.  Most people use their previous config, via oldconfig.
> > 
> > So what we want here is to give them E1000E if they had previously been
> > using E1000.  I don't know how one would do this in Kconfig.
> 
> ditto. I doubt that "SELECT E1000E" would be a good idea here (maybe not even
> work), and I can't think of anything else.

"default E1000" in E1000E seems to work for me.

---

From: Randy Dunlap <[EMAIL PROTECTED]>

Make E1000E default to the same kconfig setting as E1000,
at least for -mm testing.

Signed-off-by: Randy Dunlap <[EMAIL PROTECTED]>
---
 drivers/net/Kconfig |1 +
 1 file changed, 1 insertion(+)

--- linux-2.6.24-rc4-mm1.orig/drivers/net/Kconfig
+++ linux-2.6.24-rc4-mm1/drivers/net/Kconfig
@@ -1986,6 +1986,7 @@ config E1000_DISABLE_PACKET_SPLIT
 config E1000E
 	tristate "Intel(R) PRO/1000 PCI-Express Gigabit Ethernet support"
 	depends on PCI
+	default E1000
 	---help---
 	  This driver supports the PCI-Express Intel(R) PRO/1000 gigabit
 	  ethernet family of adapters. For PCI or PCI-X e1000 adapters,
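
A rough model of what the one-line change does, written as a Python sketch and not taken from the kconfig sources: a tristate symbol that has no value in the old .config is offered its default, here the current value of E1000, and `depends on` acts as an upper bound on the result. The function name and the n < m < y encoding are illustrative assumptions.

```python
ORDER = {"n": 0, "m": 1, "y": 2}  # tristate ordering: n < m < y


def default_for_new_symbol(default_value, depends_value):
    """Value offered for a new tristate symbol: its default, capped by its dependencies."""
    return min(default_value, depends_value, key=ORDER.__getitem__)


if __name__ == "__main__":
    for e1000 in "nmy":
        offered = default_for_new_symbol(e1000, "y")
        print(f"E1000={e1000}, PCI=y -> E1000E offered as {offered}")
```

With an interactive `make oldconfig` the user is still asked about the new symbol; the default only decides which answer is pre-selected.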

Re: 2.6.24-rc4-mm1

Andrew Morton wrote:
> On Tue, 11 Dec 2007 13:26:58 -0800
> "Kok, Auke" <[EMAIL PROTECTED]> wrote:
> 
>> Andrew Morton wrote:
>>> On Tue, 11 Dec 2007 08:13:52 -0800 "Martin Bligh" <[EMAIL PROTECTED]> wrote:
>>>
>>>>> - Lots of device IDs have been removed from the e1000 driver and moved
>>>>>   over to e1000e.  So if your e1000 stops working, you forgot to set
>>>>>   CONFIG_E1000E.
>>>>
>>>> Wouldn't it make sense to just default this to on if E1000 was on, rather
>>>> than screwing everybody for no good reason (plus breaking all the
>>>> automated testing, etc etc)?
>>>> Much though I love random refactoring, it is fairly painful to just keep
>>>> changing the names of things.
>>>
>>> (cc netdev and Auke)
>>>
>>> Yes, that would be very sensible.  CONFIG_E1000E should default to whatever
>>> CONFIG_E1000 was set to.
>> which is "y" for x86 and friends, ppc, arm and ia64 through 'defconfig'. the
>> Kconfig files do not have defaults in them.
> 
> I wouldn't be looking at defconfig files - I don't think many people use
> them.  Most people use their previous config, via oldconfig.
> 
> So what we want here is to give them E1000E if they had previously been
> using E1000.  I don't know how one would do this in Kconfig.

ditto. I doubt that "SELECT E1000E" would be a good idea here (maybe not even
work), and I can't think of anything else.

Auke

Re: 2.6.24-rc4-mm1

On Tue, 11 Dec 2007 13:26:58 -0800
"Kok, Auke" <[EMAIL PROTECTED]> wrote:

> Andrew Morton wrote:
> > On Tue, 11 Dec 2007 08:13:52 -0800 "Martin Bligh" <[EMAIL PROTECTED]> wrote:
> > 
> >>>
> >>> - Lots of device IDs have been removed from the e1000 driver and moved
> >>> over
> >>>  to e1000e.  So if your e1000 stops working, you forgot to set
> >>> CONFIG_E1000E.
> >>>
> >>>
> >> Wouldn't it make sense to just default this to on if E1000 was on, rather
> >> than screwing
> >> everybody for no good reason (plus breaking all the automated testing, etc
> >> etc)?
> >> Much though I love random refactoring, it is fairly painful to just keep
> >> changing the
> >> names of things.
> > 
> > (cc netdev and Auke)
> > 
> > Yes, that would be very sensible.  CONFIG_E1000E should default to whatever
> > CONFIG_E1000 was set to.
> 
> which is "y" for x86 and friends, ppc, arm and ia64 through 'defconfig'. the
> Kconfig files do not have defaults in them.

I wouldn't be looking at defconfig files - I don't think many people use
them.  Most people use their previous config, via oldconfig.

So what we want here is to give them E1000E if they had previously been
using E1000.  I don't know how one would do this in Kconfig.


Re: 2.6.24-rc4-mm1

Kok, Auke wrote:
> Andrew Morton wrote:
>> On Tue, 11 Dec 2007 08:13:52 -0800 "Martin Bligh" <[EMAIL PROTECTED]> wrote:
>>
>>>> - Lots of device IDs have been removed from the e1000 driver and moved
>>>>   over to e1000e.  So if your e1000 stops working, you forgot to set
>>>>   CONFIG_E1000E.
>>>
>>> Wouldn't it make sense to just default this to on if E1000 was on, rather
>>> than screwing everybody for no good reason (plus breaking all the
>>> automated testing, etc etc)?
>>> Much though I love random refactoring, it is fairly painful to just keep
>>> changing the names of things.
>> (cc netdev and Auke)
>>
>> Yes, that would be very sensible.  CONFIG_E1000E should default to whatever
>> CONFIG_E1000 was set to.
> 
> which is "y" for x86 and friends, ppc, arm and ia64 through 'defconfig'. the
> Kconfig files do not have defaults in them.
> 
> I can send a patch to adjust the defconfig files, would that be OK? I
> certainly think that would be reasonable, I dislike setting defaults through
> defconfig for network drivers myself and rather would not do that.

that should read "dislike setting defaults through Kconfig ..."

Auke

Re: 2.6.24-rc4-mm1

Andrew Morton wrote:
> On Tue, 11 Dec 2007 08:13:52 -0800 "Martin Bligh" <[EMAIL PROTECTED]> wrote:
> 
>>>
>>> - Lots of device IDs have been removed from the e1000 driver and moved
>>> over
>>>  to e1000e.  So if your e1000 stops working, you forgot to set
>>> CONFIG_E1000E.
>>>
>>>
>> Wouldn't it make sense to just default this to on if E1000 was on, rather
>> than screwing
>> everybody for no good reason (plus breaking all the automated testing, etc
>> etc)?
>> Much though I love random refactoring, it is fairly painful to just keep
>> changing the
>> names of things.
> 
> (cc netdev and Auke)
> 
> Yes, that would be very sensible.  CONFIG_E1000E should default to whatever
> CONFIG_E1000 was set to.

which is "y" for x86 and friends, ppc, arm and ia64 through 'defconfig'. the
Kconfig files do not have defaults in them.

I can send a patch to adjust the defconfig files, would that be OK? I certainly
think that would be reasonable, I dislike setting defaults through defconfig for
network drivers myself and rather would not do that.

Auke

Re: 2.6.24-rc4-mm1: acpi reboots machine... solved

On Tue, Dec 11, 2007 at 01:00:24PM -0700, Bjorn Helgaas wrote:
> On Tuesday 11 December 2007 10:44:43 am Borislav Petkov wrote:
> > On Sun, Dec 09, 2007 at 10:19:47AM +0100, Borislav Petkov wrote:
> > > On Sun, Dec 09, 2007 at 08:50:02AM +0100, Borislav Petkov wrote:
> > > > Hi Andrew,
> > > > Hi Len,
> > > > 
> > > > after booting 2.6.24-rc4-mm1 (2.6.24-rc4-190-g94545ba, otoh, boots just
> > > > fine) on my asus laptop, the machine reboots after claiming that
> > > > "Critical temperature reached (255 C)." However, the degrees number
> > > > is kinda hinting at 0xff all-ones field. Will try dump_stack in
> > > > acpi_thermal_critical() to checkout the call path. For now here's the 
> > > > netconsole bootlog:
> > > 
> > > Here's what i got so far:
> > > 
> > > [   50.287939] Pid: 1, comm: swapper Not tainted 2.6.24-rc4-mm1 #14
> > > [   50.287999]  [] show_trace_log_lvl+0x12/0x25
> > > [   50.288103]  [] show_trace+0xd/0x10
> > > [   50.288202]  [] dump_stack+0x57/0x5f
> > > [   50.288303]  [] acpi_thermal_check+0x150/0x3bb
> > > [   50.288415]  [] acpi_thermal_add+0x261/0x2cf
> > > [   50.288515]  [] acpi_device_probe+0x3e/0xdb
> > > [   50.288615]  [] driver_probe_device+0xaf/0x12a
> > > [   50.288717]  [] __driver_attach+0x6c/0xa5
> > > [   50.288817]  [] bus_for_each_dev+0x3e/0x60
> > > [   50.288916]  [] driver_attach+0x14/0x16
> > > [   50.289015]  [] bus_add_driver+0xa6/0x1a8
> > > [   50.289114]  [] driver_register+0x42/0x47
> > > [   50.289214]  [] acpi_bus_register_driver+0x3a/0x3c
> > > [   50.289316]  [] acpi_thermal_init+0x57/0x76
> > > [   50.289424]  [] kernel_init+0x138/0x280
> > > [   50.289525]  [] kernel_thread_helper+0x7/0x10
> > > [   50.289625]  ===
> > > [   50.289680] ACPI: Critical trip point
> > > [   50.289736] Critical temperature reached (255 C), shutting down.
> > > 
> > > so in acpi_thermal_get_temperature() called in acpi_thermal_add() the
> > > tz->temperature thingy is not set properly (printk's added):
> > > 
> > > [   50.276607] Old temp: 4294967023
> > > [   50.281890] Got temp: 255
> > > [   50.282567] Old temp: 255
> > > [   50.287882] Got temp: 255
> > > 
> > > What's also strange is that the tz acpi_thermal is alloc'd with kzalloc 
> > > and
> > > there's still garbage in it after reading it in 
> > > acpi_thermal_get_temperature()
> > > for the first time. Debugging continues...
> > 
> > (i almost suspected that the problem might be something completely 
> > different.)
> > well, after bisecting the rc4-mm1 tree for a whole day today, the evildoer
> > turned out to be
> > 
> > broken-out/pnp-request-ioport-and-iomem-resources-used-by-active-devices.patch.
> > 
> > After backing this one out, mm1 boots just fine here.
> 
> Thanks for tracking this down.  I'll look into your logs and see if I
> can figure out what's going on.  There's another report related to that
> patch here: http://lkml.org/lkml/2007/11/22/110 .  Looks like a different
> symptom though, so probably a different fix.

From what i can roughly tell so far it seems like a resource conflict between
acpi and the pnp requested regions in your patch, which results in the
acpi_thermal code reading the wrong (0xff) temperature value and halting the
machine, but i might be wrong on the details since acpi is such a big code
chunk to swallow. Anyways, this is a different issue than the one you quote
above.

-- 
Regards/Gruß,
Boris.

Re: 2.6.24-rc4-mm1

2007-12-11 Thread Ingo Molnar


* Andrew Morton <[EMAIL PROTECTED]> wrote:

> > I can't see this compile failure posted anywhere:
> > http://test.kernel.org/results/IBM/126049/build/debug/stderr
> > 
> > arch/x86/vdso/vdso32/sigreturn.S: Assembler messages:
> > arch/x86/vdso/vdso32/sigreturn.S:23: Error: suffix or operands invalid for `pop'
> > arch/x86/vdso/vdso32/syscall.S:25: Error: suffix or operands invalid for `pop'
> > make[1]: *** [arch/x86/vdso/vdso32/syscall.o] Error 1
> > make: *** [arch/x86/vdso] Error 2
> 
> (cc Ingo and Thomas)

Roland says:

| That seems like it must be a tool problem.  The V=1 output would show 
| if those compiles missed -m32 or something.  But even in the wrong 
| mode, this error does not make sense.  The assembly code it's citing 
| is identical to the old arch/x86/ia32/vsyscall-syscall.S code.

Ingo

Re: 2.6.24-rc4-mm1

On Tue, 11 Dec 2007 08:13:52 -0800 "Martin Bligh" <[EMAIL PROTECTED]> wrote:

> >
> >
> > - Lots of device IDs have been removed from the e1000 driver and moved
> > over
> >  to e1000e.  So if your e1000 stops working, you forgot to set
> > CONFIG_E1000E.
> >
> >
> Wouldn't it make sense to just default this to on if E1000 was on, rather
> than screwing
> everybody for no good reason (plus breaking all the automated testing, etc
> etc)?
> Much though I love random refactoring, it is fairly painful to just keep
> changing the
> names of things.

(cc netdev and Auke)

Yes, that would be very sensible.  CONFIG_E1000E should default to whatever
CONFIG_E1000 was set to.

> 
> I can't see this compile failure posted anywhere:
> http://test.kernel.org/results/IBM/126049/build/debug/stderr
> 
> arch/x86/vdso/vdso32/sigreturn.S: Assembler messages:
> arch/x86/vdso/vdso32/sigreturn.S:23: Error: suffix or operands invalid for `pop'
> arch/x86/vdso/vdso32/syscall.S:25: Error: suffix or operands invalid for `pop'
> make[1]: *** [arch/x86/vdso/vdso32/syscall.o] Error 1
> make: *** [arch/x86/vdso] Error 2

(cc Ingo and Thomas)

> 
> Nor this one:
> http://test.kernel.org/results/IBM/126096/build/debug/stderr
> 
> drivers/char/hvcs.c: In function ‘hvcs_open’:
> drivers/char/hvcs.c:1180: error: wrong type argument to unary exclamation mark
> 

(cc Greg)

Caused by gregkh-driver-kobject-convert-hvcs-to-use-kref-not-kobject.patch.

Re: 2.6.24-rc4-mm1: acpi reboots machine... solved

On Tuesday 11 December 2007 10:44:43 am Borislav Petkov wrote:
> On Sun, Dec 09, 2007 at 10:19:47AM +0100, Borislav Petkov wrote:
> > On Sun, Dec 09, 2007 at 08:50:02AM +0100, Borislav Petkov wrote:
> > > Hi Andrew,
> > > Hi Len,
> > > 
> > > after booting 2.6.24-rc4-mm1 (2.6.24-rc4-190-g94545ba, otoh, boots just
> > > fine) on my asus laptop, the machine reboots after claiming that
> > > "Critical temperature reached (255 C)." However, the degrees number
> > > is kinda hinting at 0xff all-ones field. Will try dump_stack in
> > > acpi_thermal_critical() to checkout the call path. For now here's the 
> > > netconsole bootlog:
> > 
> > Here's what i got so far:
> > 
> > [   50.287939] Pid: 1, comm: swapper Not tainted 2.6.24-rc4-mm1 #14
> > [   50.287999]  [] show_trace_log_lvl+0x12/0x25
> > [   50.288103]  [] show_trace+0xd/0x10
> > [   50.288202]  [] dump_stack+0x57/0x5f
> > [   50.288303]  [] acpi_thermal_check+0x150/0x3bb
> > [   50.288415]  [] acpi_thermal_add+0x261/0x2cf
> > [   50.288515]  [] acpi_device_probe+0x3e/0xdb
> > [   50.288615]  [] driver_probe_device+0xaf/0x12a
> > [   50.288717]  [] __driver_attach+0x6c/0xa5
> > [   50.288817]  [] bus_for_each_dev+0x3e/0x60
> > [   50.288916]  [] driver_attach+0x14/0x16
> > [   50.289015]  [] bus_add_driver+0xa6/0x1a8
> > [   50.289114]  [] driver_register+0x42/0x47
> > [   50.289214]  [] acpi_bus_register_driver+0x3a/0x3c
> > [   50.289316]  [] acpi_thermal_init+0x57/0x76
> > [   50.289424]  [] kernel_init+0x138/0x280
> > [   50.289525]  [] kernel_thread_helper+0x7/0x10
> > [   50.289625]  ===
> > [   50.289680] ACPI: Critical trip point
> > [   50.289736] Critical temperature reached (255 C), shutting down.
> > 
> > so in acpi_thermal_get_temperature() called in acpi_thermal_add() the
> > tz->temperature thingy is not set properly (printk's added):
> > 
> > [   50.276607] Old temp: 4294967023
> > [   50.281890] Got temp: 255
> > [   50.282567] Old temp: 255
> > [   50.287882] Got temp: 255
> > 
> > What's also strange is that the tz acpi_thermal is alloc'd with kzalloc and
> > there's still garbage in it after reading it in 
> > acpi_thermal_get_temperature()
> > for the first time. Debugging continues...
> 
> (i almost suspected that the problem might be something completely different.)
> well, after bisecting the rc4-mm1 tree for a whole day today, the evildoer
> turned out to be
> 
> broken-out/pnp-request-ioport-and-iomem-resources-used-by-active-devices.patch.
> 
> After backing this one out, mm1 boots just fine here.

Thanks for tracking this down.  I'll look into your logs and see if I
can figure out what's going on.  There's another report related to that
patch here: http://lkml.org/lkml/2007/11/22/110 .  Looks like a different
symptom though, so probably a different fix.

Bjorn


Re: 2.6.24-rc4-mm1


>> I can't see this compile failure posted anywhere:
>> http://test.kernel.org/results/IBM/126049/build/debug/stderr
>>
>> arch/x86/vdso/vdso32/sigreturn.S: Assembler messages:
>> arch/x86/vdso/vdso32/sigreturn.S:23: Error: suffix or operands invalid for `pop'
>> arch/x86/vdso/vdso32/syscall.S:25: Error: suffix or operands invalid for `pop'
>> make[1]: *** [arch/x86/vdso/vdso32/syscall.o] Error 1
>> make: *** [arch/x86/vdso] Error 2
>
> I see those on one build machine but not on another, so I thought
> that it was a tools issue...

If so, it's a tools issue that worked fine until -mm1, which makes
it a kernel problem in my mind ;-)

>> Nor this one:
>> http://test.kernel.org/results/IBM/126096/build/debug/stderr
>>
>> drivers/char/hvcs.c: In function ‘hvcs_open’:
>> drivers/char/hvcs.c:1180: error: wrong type argument to unary exclamation mark
>
> See http://marc.info/?l=linux-kernel=119700448119646
> for patches.

Thanks,

M.

Re: 2.6.24-rc4-mm1: acpi reboots machine... solved

On Sun, Dec 09, 2007 at 10:19:47AM +0100, Borislav Petkov wrote:
> On Sun, Dec 09, 2007 at 08:50:02AM +0100, Borislav Petkov wrote:
> > Hi Andrew,
> > Hi Len,
> > 
> > after booting 2.6.24-rc4-mm1 (2.6.24-rc4-190-g94545ba, otoh, boots just
> > fine) on my asus laptop, the machine reboots after claiming that
> > "Critical temperature reached (255 C)." However, the degrees number
> > is kinda hinting at 0xff all-ones field. Will try dump_stack in
> > acpi_thermal_critical() to checkout the call path. For now here's the 
> > netconsole bootlog:
> 
> Here's what i got so far:
> 
> [   50.287939] Pid: 1, comm: swapper Not tainted 2.6.24-rc4-mm1 #14
> [   50.287999]  [] show_trace_log_lvl+0x12/0x25
> [   50.288103]  [] show_trace+0xd/0x10
> [   50.288202]  [] dump_stack+0x57/0x5f
> [   50.288303]  [] acpi_thermal_check+0x150/0x3bb
> [   50.288415]  [] acpi_thermal_add+0x261/0x2cf
> [   50.288515]  [] acpi_device_probe+0x3e/0xdb
> [   50.288615]  [] driver_probe_device+0xaf/0x12a
> [   50.288717]  [] __driver_attach+0x6c/0xa5
> [   50.288817]  [] bus_for_each_dev+0x3e/0x60
> [   50.288916]  [] driver_attach+0x14/0x16
> [   50.289015]  [] bus_add_driver+0xa6/0x1a8
> [   50.289114]  [] driver_register+0x42/0x47
> [   50.289214]  [] acpi_bus_register_driver+0x3a/0x3c
> [   50.289316]  [] acpi_thermal_init+0x57/0x76
> [   50.289424]  [] kernel_init+0x138/0x280
> [   50.289525]  [] kernel_thread_helper+0x7/0x10
> [   50.289625]  ===
> [   50.289680] ACPI: Critical trip point
> [   50.289736] Critical temperature reached (255 C), shutting down.
> 
> so in acpi_thermal_get_temperature() called in acpi_thermal_add() the
> tz->temperature thingy is not set properly (printk's added):
> 
> [   50.276607] Old temp: 4294967023
> [   50.281890] Got temp: 255
> [   50.282567] Old temp: 255
> [   50.287882] Got temp: 255
> 
> What's also strange is that the tz acpi_thermal is alloc'd with kzalloc and
> there's still garbage in it after reading it in acpi_thermal_get_temperature()
> for the first time. Debugging continues...

(i almost suspected that the problem might be something completely different.)
well, after bisecting the rc4-mm1 tree for a whole day today, the evildoer
turned out to be

broken-out/pnp-request-ioport-and-iomem-resources-used-by-active-devices.patch.

After backing this one out, mm1 boots just fine here.
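
One possible reading of the odd "Old temp: 4294967023" value quoted above: it is -273 printed as an unsigned 32-bit number, which is what a zeroed (kzalloc'd) temperature field would give if the debug printk converted tenths of a Kelvin to Celsius before printing. That the added printk did this conversion is an assumption; the Python sketch below only checks the arithmetic, and the helper names are hypothetical.

```python
def as_signed32(value):
    """Reinterpret an unsigned 32-bit value as two's-complement signed."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def decikelvin_to_celsius(t):
    """Round-to-nearest conversion with C-style truncating division (assumed formula)."""
    delta = t - 2732  # 273.2 K expressed in tenths of a Kelvin
    adjusted = delta + 5 if delta >= 0 else delta - 5
    return int(adjusted / 10)  # int() truncates toward zero, as C division does


if __name__ == "__main__":
    print(as_signed32(4294967023))                # the "Old temp" value from the log
    print(decikelvin_to_celsius(0) & 0xFFFFFFFF)  # a zeroed field, shown as unsigned
```

If that reading is right, the first "Old temp" line is not garbage at all but the zero-initialised field, and only the 255 that follows comes from the bad read.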
-- 
Regards/Gruß,
Boris.

Re: 2.6.24-rc4-mm1

On Tue, 11 Dec 2007 08:20:05 -0800 Martin Bligh wrote:

>  >- Lots of device IDs have been removed from the e1000 driver and
>  > moved over to e1000e.  So if your e1000 stops working, you forgot
>  > to set CONFIG_E1000E.
> 
> 
> Wouldn't it make sense to just default this to on if E1000 was on?
> As far as I can see that's not true, which will screw everybody
> for no good reason (plus break all the automated testing, etc etc).
> Much though I love random refactoring, it is fairly painful to just
> keep changing the names of things.
> 
> 
> I can't see this compile failure posted anywhere:
> http://test.kernel.org/results/IBM/126049/build/debug/stderr
> 
> arch/x86/vdso/vdso32/sigreturn.S: Assembler messages:
> arch/x86/vdso/vdso32/sigreturn.S:23: Error: suffix or operands invalid for `pop'
> arch/x86/vdso/vdso32/syscall.S:25: Error: suffix or operands invalid for `pop'
> make[1]: *** [arch/x86/vdso/vdso32/syscall.o] Error 1
> make: *** [arch/x86/vdso] Error 2

I see those on one build machine but not on another, so I thought
that it was a tools issue...


> Nor this one:
> http://test.kernel.org/results/IBM/126096/build/debug/stderr
> 
> drivers/char/hvcs.c: In function ‘hvcs_open’:
> drivers/char/hvcs.c:1180: error: wrong type argument to unary exclamation mark

See http://marc.info/?l=linux-kernel=119700448119646
for patches.


---
~Randy
Features and documentation: http://lwn.net/Articles/260136/

Re: 2.6.24-rc4-mm1


> - Lots of device IDs have been removed from the e1000 driver and
>   moved over to e1000e.  So if your e1000 stops working, you forgot
>   to set CONFIG_E1000E.

Wouldn't it make sense to just default this to on if E1000 was on?
As far as I can see that's not true, which will screw everybody
for no good reason (plus break all the automated testing, etc etc).
Much though I love random refactoring, it is fairly painful to just
keep changing the names of things.

I can't see this compile failure posted anywhere:
http://test.kernel.org/results/IBM/126049/build/debug/stderr

arch/x86/vdso/vdso32/sigreturn.S: Assembler messages:
arch/x86/vdso/vdso32/sigreturn.S:23: Error: suffix or operands invalid for `pop'
arch/x86/vdso/vdso32/syscall.S:25: Error: suffix or operands invalid for `pop'
make[1]: *** [arch/x86/vdso/vdso32/syscall.o] Error 1
make: *** [arch/x86/vdso] Error 2

Nor this one:
http://test.kernel.org/results/IBM/126096/build/debug/stderr

drivers/char/hvcs.c: In function ‘hvcs_open’:
drivers/char/hvcs.c:1180: error: wrong type argument to unary exclamation mark


Re: 2.6.24-rc4-mm1

2007-12-11 Thread Reuben Farrelly




On 11/12/2007 8:11 AM, Andrew Morton wrote:
> On Tue, 11 Dec 2007 01:48:39 +1100
> Reuben Farrelly <[EMAIL PROTECTED]> wrote:
>
>> On 5/12/2007 4:17 PM, Andrew Morton wrote:
>>> Temporarily at
>>>
>>>   http://userweb.kernel.org/~akpm/2.6.24-rc4-mm1/
>>>
>>> Will appear later at
>>>
>>>   ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc4/2.6.24-rc4-mm1/
>>>
>>> - Lots of device IDs have been removed from the e1000 driver and moved over
>>>   to e1000e.  So if your e1000 stops working, you forgot to set CONFIG_E1000E.
>>>
>>> - The s390 build is still broken.
>>
>> I'm seeing this most incredibly unhelpful (to debug) but fortunately
>> reproducible problem (so far 4/4 times) on this -mm kernel.  I thought this
>> problem may have been related to another bug which I have reported (A TCP oops)
>> but even after applying a likely fix for that I am still seeing this problem.
>>
>> The machine boots up perfectly fine and runs good until I load it up.
>> In this case I can reliably cause this to occur by pulling a 3G ISO across the
>> GigE network from my Linux box to my PC.  After maybe 50M or so, the console
>> just displays this (ignore initial boot banner):
>>
>> --
>>
>>   * Starting local ... [ ok ]
>>
>> This is tornado.reub.net (Linux x86_64 2.6.24-rc4-mm1) 00:24:01
>>
>> tornado login: *** buffer overf
>>
>> ---
>>
>> Yes - after displaying the 'f' in what I can only guess is the word 'overflow',
>> the box spontaneously reboots.  There is no further console output until it
>> starts to come back up again.
>>
>> The problem does not exist in 2.6.23-gentoo kernels nor in a vanilla
>> 2.6.24-rc4-git6 (phew!), so this looks to be an -mm only problem at this stage.
>>
>> I enabled a number of kernel debugging options but then I got no output at all
>> when the machine crashed.
>>
>> I'm at a bit of a loss as to which subsystem this might be coming from, so I'm
>> not sure who to CC.
>>
>> Box information is (still) up at
>> http://www.reub.net/files/kernel/2.6.24-rc4-mm1/
>
> hm.  grepping around for "buffer overflow" doesn't turn up anything except in
> drivers which you won't be using on that machine.
>
> I'd be suspecting networking, obviously.  If you're feeling keen could you
> please grab a 2.6.24-rc4 tree and apply 2.6.24-rc4-mm1's origin.patch and
> git-net.patch and see if the bug is still present?

No - seems to be fine with just origin.patch and git-net.patch.

Just for good measure I then reverted git-net.patch and applied
git-netdev-all.patch instead, and still wasn't able to trigger the reboot or
console message, no matter how hard I tried.

I guess for now I'll sit on it, and if it appears in the next -mm it'll probably
annoy me enough and inspire me to dig deeper (or, "guess" deeper, given the lack
of direction as to where to even begin).


Reuben

Re: 2.6.24-rc4-mm1: undefined reference to `compat_sys_timerfd' on sparc64

2007-12-11 Thread David Miller

From: Andrew Morton <[EMAIL PROTECTED]>
Date: Fri, 7 Dec 2007 16:08:00 -0800

> Or should this have been sys_nis_syscall()?

sys_nis_syscall() was used in cases on sparc where we wanted
to get a log of invocations of unimplemented syscalls, as it
aided debugging and analysis.

But the usefulness of such things I think is long gone, so
what I'll likely do is kill the sys_nis_syscall stuff from the
sparc ports.

Re: 2.6.24-rc4-mm1

2007-12-11 Thread Reuben Farrelly




On 11/12/2007 8:11 AM, Andrew Morton wrote:

On Tue, 11 Dec 2007 01:48:39 +1100
Reuben Farrelly [EMAIL PROTECTED] wrote:



On 5/12/2007 4:17 PM, Andrew Morton wrote:

Temporarily at

  http://userweb.kernel.org/~akpm/2.6.24-rc4-mm1/

Will appear later at

  
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc4/2.6.24-rc4-mm1/


- Lots of device IDs have been removed from the e1000 driver and moved over
  to e1000e.  So if your e1000 stops working, you forgot to set CONFIG_E1000E.

- The s390 build is still broken.

I'm seeing this most incredibly unhelpful (to debug) but fortunately
reproduceable problem (so far 4/4 times) on this -mm kernel.  I thought this 
problem may have been related to another bug which I have reported (A TCP oops) 
but even after applying a likely fix for that I am still seeing this problem.


The machine boots up perfectly fine and runs good until I load it up.
In this case I can reliably cause this to occur by pulling a 3G ISO across the
GigE network from my Linux box to my PC.  After maybe 50M or so, the console 
just displays this (ignore initial boot banner):


--

  * Starting local ... [ ok ]


This is tornado.reub.net (Linux x86_64 2.6.24-rc4-mm1) 00:24:01

tornado login: *** buffer overf

---

Yes - after displaying the 'f' in what I can only guess is the word 'overflow',
the box spontaneously reboots.  There is no further console output until it 
starts to come back up again.


The problem does not exist in 2.6.23-gentoo kernels nor in a vanilla 
2.6.24-rc4-git6 (phew!), so this looks to be an -mm only problem at this stage.


I enabled a number of kernel debugging options but then I got no output at all 
when the machine crashed.


I'm at a bit of a loss as to which subsystem this might be coming from, so I'm 
not sure who to CC.


Box information is (still) up at 
http://www.reub.net/files/kernel/2.6.24-rc4-mm1/



hm.  grepping around for "buffer overflow" doesn't turn up anything except in
drivers which you won't be using on that machine.

I'd be suspecting networking, obviously.  If you're feeling keen could you
please grab a 2.6.24-rc4 tree and apply 2.6.24-rc4-mm1's origin.patch and
git-net.patch and see if the bug is still present?


No - seems to be fine with just origin.patch and git-net.patch.

Just for good measure I then reverted git-net.patch and applied 
git-netdev-all.patch instead, and still wasn't able to trigger the reboot or 
console message, no matter how hard I tried.


I guess for now I'll sit on it, and if it appears in the next -mm it'll probably 
annoy me enough and inspire me to dig deeper (or, guess deeper, given the lack 
of direction as to where to even begin).


Reuben

Re: 2.6.24-rc4-mm1


- Lots of device IDs have been removed from the e1000 driver and
 moved over to e1000e.  So if your e1000 stops working, you forgot
 to set CONFIG_E1000E.


Wouldn't it make sense to just default this to on if E1000 was on?
As far as I can see that's not true, which will be screwing everybody
for no good reason (plus breaking all the automated testing, etc etc)?
Much though I love random refactoring, it is fairly painful to just
keep changing the names of things.


I can't see this compile failure posted anywhere:
http://test.kernel.org/results/IBM/126049/build/debug/stderr

arch/x86/vdso/vdso32/sigreturn.S: Assembler messages:
arch/x86/vdso/vdso32/sigreturn.S:23: Error: suffix or operands invalid 
for `pop'


arch/x86/vdso/vdso32/syscall.S:25: Error: suffix or operands invalid for 
`pop'

make[1]: *** [arch/x86/vdso/vdso32/syscall.o] Error 1
make: *** [arch/x86/vdso] Error 2


Nor this one:
http://test.kernel.org/results/IBM/126096/build/debug/stderr

drivers/char/hvcs.c: In function ‘hvcs_open’:
drivers/char/hvcs.c:1180: error: wrong type argument to unary 
exclamation mark


Re: 2.6.24-rc4-mm1

On Tue, 11 Dec 2007 08:20:05 -0800 Martin Bligh wrote:

  - Lots of device IDs have been removed from the e1000 driver and
   moved over to e1000e.  So if your e1000 stops working, you forgot
   to set CONFIG_E1000E.
 
 
 Wouldn't it make sense to just default this to on if E1000 was on?
 As far as I can see that's not true, which will be screwing everybody
 for no good reason (plus breaking all the automated testing, etc etc)?
 Much though I love random refactoring, it is fairly painful to just
 keep changing the names of things.
 
 
 I can't see this compile failure posted anywhere:
 http://test.kernel.org/results/IBM/126049/build/debug/stderr
 
 arch/x86/vdso/vdso32/sigreturn.S: Assembler messages:
 arch/x86/vdso/vdso32/sigreturn.S:23: Error: suffix or operands invalid 
 for `pop'
 
 arch/x86/vdso/vdso32/syscall.S:25: Error: suffix or operands invalid for 
 `pop'
 make[1]: *** [arch/x86/vdso/vdso32/syscall.o] Error 1
 make: *** [arch/x86/vdso] Error 2

I see those on one build machine but not on another, so I thought
that it was a tools issue...


 Nor this one:
 http://test.kernel.org/results/IBM/126096/build/debug/stderr
 
 drivers/char/hvcs.c: In function ‘hvcs_open’:
 drivers/char/hvcs.c:1180: error: wrong type argument to unary 
 exclamation mark

See http://marc.info/?l=linux-kernel&m=119700448119646
for patches.


---
~Randy
Features and documentation: http://lwn.net/Articles/260136/

Re: 2.6.24-rc4-mm1: acpi reboots machine... solved

On Sun, Dec 09, 2007 at 10:19:47AM +0100, Borislav Petkov wrote:
 On Sun, Dec 09, 2007 at 08:50:02AM +0100, Borislav Petkov wrote:
  Hi Andrew,
  Hi Len,
  
  after booting 2.6.24-rc4-mm1 (2.6.24-rc4-190-g94545ba, otoh, boots just
  fine) on my asus laptop, the machine reboots after claiming that
  Critical temperature reached (255 C). However, the degrees number
  is kinda hinting at 0xff all-ones field. Will try dump_stack in
  acpi_thermal_critical() to checkout the call path. For now here's the 
  netconsole bootlog:
 
 Here's what i got so far:
 
 [   50.287939] Pid: 1, comm: swapper Not tainted 2.6.24-rc4-mm1 #14
 [   50.287999]  [c0104b65] show_trace_log_lvl+0x12/0x25
 [   50.288103]  [c01053e7] show_trace+0xd/0x10
 [   50.288202]  [c0105a6c] dump_stack+0x57/0x5f
 [   50.288303]  [c021c991] acpi_thermal_check+0x150/0x3bb
 [   50.288415]  [c021d4b3] acpi_thermal_add+0x261/0x2cf
 [   50.288515]  [c0213549] acpi_device_probe+0x3e/0xdb
 [   50.288615]  [c023f8f5] driver_probe_device+0xaf/0x12a
 [   50.288717]  [c023fa88] __driver_attach+0x6c/0xa5
 [   50.288817]  [c023ee5a] bus_for_each_dev+0x3e/0x60
 [   50.288916]  [c023f77d] driver_attach+0x14/0x16
 [   50.289015]  [c023f5a6] bus_add_driver+0xa6/0x1a8
 [   50.289114]  [c023fc53] driver_register+0x42/0x47
 [   50.289214]  [c02138c2] acpi_bus_register_driver+0x3a/0x3c
 [   50.289316]  [c044306b] acpi_thermal_init+0x57/0x76
 [   50.289424]  [c04344a7] kernel_init+0x138/0x280
 [   50.289525]  [c01047df] kernel_thread_helper+0x7/0x10
 [   50.289625]  ===
 [   50.289680] ACPI: Critical trip point
 [   50.289736] Critical temperature reached (255 C), shutting down.
 
 so in acpi_thermal_get_temperature() called in acpi_thermal_add() the
 tz-temperature thingy is not set properly (printk's added):
 
 [   50.276607] Old temp: 4294967023
 [   50.281890] Got temp: 255
 [   50.282567] Old temp: 255
 [   50.287882] Got temp: 255
 
 What's also strange is that the tz acpi_thermal is alloc'd with kzalloc and
 there's still garbage in it after reading it in acpi_thermal_get_temperature()
 for the first time. Debugging continues...

(i almost suspected that the problem might be something completely different.)
well, after bisecting the rc4-mm1 tree for a whole day today, the evildoer
turned out to be

broken-out/pnp-request-ioport-and-iomem-resources-used-by-active-devices.patch.

After backing this one out, mm1 boots just fine here.
-- 
Regards/Gruß,
Boris.

Re: 2.6.24-rc4-mm1


I can't see this compile failure posted anywhere:
http://test.kernel.org/results/IBM/126049/build/debug/stderr

arch/x86/vdso/vdso32/sigreturn.S: Assembler messages:
arch/x86/vdso/vdso32/sigreturn.S:23: Error: suffix or operands invalid 
for `pop'


arch/x86/vdso/vdso32/syscall.S:25: Error: suffix or operands invalid for 
`pop'

make[1]: *** [arch/x86/vdso/vdso32/syscall.o] Error 1
make: *** [arch/x86/vdso] Error 2


I see those on one build machine but not on another, so I thought
that it was a tools issue...


If so, it's a tools issue that worked fine until -mm1, which makes
it a kernel problem in my mind ;-)


Nor this one:
http://test.kernel.org/results/IBM/126096/build/debug/stderr

drivers/char/hvcs.c: In function ‘hvcs_open’:
drivers/char/hvcs.c:1180: error: wrong type argument to unary 
exclamation mark


See http://marc.info/?l=linux-kernel&m=119700448119646
for patches.



Thanks,

M.

Re: 2.6.24-rc4-mm1: acpi reboots machine... solved

On Tuesday 11 December 2007 10:44:43 am Borislav Petkov wrote:
 On Sun, Dec 09, 2007 at 10:19:47AM +0100, Borislav Petkov wrote:
  On Sun, Dec 09, 2007 at 08:50:02AM +0100, Borislav Petkov wrote:
   Hi Andrew,
   Hi Len,
   
   after booting 2.6.24-rc4-mm1 (2.6.24-rc4-190-g94545ba, otoh, boots just
   fine) on my asus laptop, the machine reboots after claiming that
   Critical temperature reached (255 C). However, the degrees number
   is kinda hinting at 0xff all-ones field. Will try dump_stack in
   acpi_thermal_critical() to checkout the call path. For now here's the 
   netconsole bootlog:
  
  Here's what i got so far:
  
  [   50.287939] Pid: 1, comm: swapper Not tainted 2.6.24-rc4-mm1 #14
  [   50.287999]  [c0104b65] show_trace_log_lvl+0x12/0x25
  [   50.288103]  [c01053e7] show_trace+0xd/0x10
  [   50.288202]  [c0105a6c] dump_stack+0x57/0x5f
  [   50.288303]  [c021c991] acpi_thermal_check+0x150/0x3bb
  [   50.288415]  [c021d4b3] acpi_thermal_add+0x261/0x2cf
  [   50.288515]  [c0213549] acpi_device_probe+0x3e/0xdb
  [   50.288615]  [c023f8f5] driver_probe_device+0xaf/0x12a
  [   50.288717]  [c023fa88] __driver_attach+0x6c/0xa5
  [   50.288817]  [c023ee5a] bus_for_each_dev+0x3e/0x60
  [   50.288916]  [c023f77d] driver_attach+0x14/0x16
  [   50.289015]  [c023f5a6] bus_add_driver+0xa6/0x1a8
  [   50.289114]  [c023fc53] driver_register+0x42/0x47
  [   50.289214]  [c02138c2] acpi_bus_register_driver+0x3a/0x3c
  [   50.289316]  [c044306b] acpi_thermal_init+0x57/0x76
  [   50.289424]  [c04344a7] kernel_init+0x138/0x280
  [   50.289525]  [c01047df] kernel_thread_helper+0x7/0x10
  [   50.289625]  ===
  [   50.289680] ACPI: Critical trip point
  [   50.289736] Critical temperature reached (255 C), shutting down.
  
  so in acpi_thermal_get_temperature() called in acpi_thermal_add() the
  tz-temperature thingy is not set properly (printk's added):
  
  [   50.276607] Old temp: 4294967023
  [   50.281890] Got temp: 255
  [   50.282567] Old temp: 255
  [   50.287882] Got temp: 255
  
  What's also strange is that the tz acpi_thermal is alloc'd with kzalloc and
  there's still garbage in it after reading it in 
  acpi_thermal_get_temperature()
  for the first time. Debugging continues...
 
 (i almost suspected that the problem might be something completely different.)
 well, after bisecting the rc4-mm1 tree for a whole day today, the evildoer
 turned out to be
 
 broken-out/pnp-request-ioport-and-iomem-resources-used-by-active-devices.patch.
 
 After backing this one out, mm1 boots just fine here.

Thanks for tracking this down.  I'll look into your logs and see if I
can figure out what's going on.  There's another report related to that
patch here: http://lkml.org/lkml/2007/11/22/110 .  Looks like a different
symptom though, so probably a different fix.

Bjorn


Re: 2.6.24-rc4-mm1

On Tue, 11 Dec 2007 08:13:52 -0800 Martin Bligh [EMAIL PROTECTED] wrote:

 
 
  - Lots of device IDs have been removed from the e1000 driver and moved
  over
   to e1000e.  So if your e1000 stops working, you forgot to set
  CONFIG_E1000E.
 
 
 Wouldn't it make sense to just default this to on if E1000 was on, rather
 than screwing
 everybody for no good reason (plus breaking all the automated testing, etc
 etc)?
 Much though I love random refactoring, it is fairly painful to just keep
 changing the
 names of things.

(cc netdev and Auke)

Yes, that would be very sensible.  CONFIG_E1000E should default to whatever
CONFIG_E1000 was set to.

 
 I can't see this compile failure posted anywhere:
 http://test.kernel.org/results/IBM/126049/build/debug/stderr
 
 arch/x86/vdso/vdso32/sigreturn.S: Assembler messages:
 arch/x86/vdso/vdso32/sigreturn.S:23: Error: suffix or operands invalid for 
 `pop'
 arch/x86/vdso/vdso32/syscall.S:25: Error: suffix or operands invalid for `pop'
 make[1]: *** [arch/x86/vdso/vdso32/syscall.o] Error 1
 make: *** [arch/x86/vdso] Error 2

(cc Ingo and Thomas)

 
 Nor this one:
 http://test.kernel.org/results/IBM/126096/build/debug/stderr
 
 drivers/char/hvcs.c: In function ‘hvcs_open’:
 drivers/char/hvcs.c:1180: error: wrong type argument to unary exclamation mark
 

(cc Greg)

Caused by gregkh-driver-kobject-convert-hvcs-to-use-kref-not-kobject.patch.

Re: 2.6.24-rc4-mm1

2007-12-11 Thread Ingo Molnar


* Andrew Morton [EMAIL PROTECTED] wrote:

  I can't see this compile failure posted anywhere:
  http://test.kernel.org/results/IBM/126049/build/debug/stderr
  
  arch/x86/vdso/vdso32/sigreturn.S: Assembler messages:
  arch/x86/vdso/vdso32/sigreturn.S:23: Error: suffix or operands invalid for 
  `pop'
  arch/x86/vdso/vdso32/syscall.S:25: Error: suffix or operands invalid for 
  `pop'
  make[1]: *** [arch/x86/vdso/vdso32/syscall.o] Error 1
  make: *** [arch/x86/vdso] Error 2
 
 (cc Ingo and Thomas)

Roland says:

| That seems like it must be a tool problem.  The V=1 output would show 
| if those compiles missed -m32 or something.  But even in the wrong 
| mode, this error does not make sense.  The assembly code it's citing 
| is identical to the old arch/x86/ia32/vsyscall-syscall.S code.

Ingo

Re: 2.6.24-rc4-mm1: acpi reboots machine... solved

On Tue, Dec 11, 2007 at 01:00:24PM -0700, Bjorn Helgaas wrote:
 On Tuesday 11 December 2007 10:44:43 am Borislav Petkov wrote:
  On Sun, Dec 09, 2007 at 10:19:47AM +0100, Borislav Petkov wrote:
   On Sun, Dec 09, 2007 at 08:50:02AM +0100, Borislav Petkov wrote:
Hi Andrew,
Hi Len,

after booting 2.6.24-rc4-mm1 (2.6.24-rc4-190-g94545ba, otoh, boots just
fine) on my asus laptop, the machine reboots after claiming that
Critical temperature reached (255 C). However, the degrees number
is kinda hinting at 0xff all-ones field. Will try dump_stack in
acpi_thermal_critical() to checkout the call path. For now here's the 
netconsole bootlog:
   
   Here's what i got so far:
   
   [   50.287939] Pid: 1, comm: swapper Not tainted 2.6.24-rc4-mm1 #14
   [   50.287999]  [c0104b65] show_trace_log_lvl+0x12/0x25
   [   50.288103]  [c01053e7] show_trace+0xd/0x10
   [   50.288202]  [c0105a6c] dump_stack+0x57/0x5f
   [   50.288303]  [c021c991] acpi_thermal_check+0x150/0x3bb
   [   50.288415]  [c021d4b3] acpi_thermal_add+0x261/0x2cf
   [   50.288515]  [c0213549] acpi_device_probe+0x3e/0xdb
   [   50.288615]  [c023f8f5] driver_probe_device+0xaf/0x12a
   [   50.288717]  [c023fa88] __driver_attach+0x6c/0xa5
   [   50.288817]  [c023ee5a] bus_for_each_dev+0x3e/0x60
   [   50.288916]  [c023f77d] driver_attach+0x14/0x16
   [   50.289015]  [c023f5a6] bus_add_driver+0xa6/0x1a8
   [   50.289114]  [c023fc53] driver_register+0x42/0x47
   [   50.289214]  [c02138c2] acpi_bus_register_driver+0x3a/0x3c
   [   50.289316]  [c044306b] acpi_thermal_init+0x57/0x76
   [   50.289424]  [c04344a7] kernel_init+0x138/0x280
   [   50.289525]  [c01047df] kernel_thread_helper+0x7/0x10
   [   50.289625]  ===
   [   50.289680] ACPI: Critical trip point
   [   50.289736] Critical temperature reached (255 C), shutting down.
   
   so in acpi_thermal_get_temperature() called in acpi_thermal_add() the
   tz-temperature thingy is not set properly (printk's added):
   
   [   50.276607] Old temp: 4294967023
   [   50.281890] Got temp: 255
   [   50.282567] Old temp: 255
   [   50.287882] Got temp: 255
   
   What's also strange is that the tz acpi_thermal is alloc'd with kzalloc 
   and
   there's still garbage in it after reading it in 
   acpi_thermal_get_temperature()
   for the first time. Debugging continues...
  
  (i almost suspected that the problem might be something completely 
  different.)
  well, after bisecting the rc4-mm1 tree for a whole day today, the evildoer
  turned out to be
  
  broken-out/pnp-request-ioport-and-iomem-resources-used-by-active-devices.patch.
  
  After backing this one out, mm1 boots just fine here.
 
 Thanks for tracking this down.  I'll look into your logs and see if I
 can figure out what's going on.  There's another report related to that
 patch here: http://lkml.org/lkml/2007/11/22/110 .  Looks like a different
 symptom though, so probably a different fix.

From what i can roughly tell so far it seems like a resource conflict
between acpi and the pnp requested regions in your patch which results in
the acpi_thermal code reading the wrong (0xff) temperature value and halting
the machine, but i might be wrong on the details since acpi is such a big
code chunk to swallow. Anyways, this is a different issue than the one you
quote above.

-- 
Regards/Gruß,
Boris.

Re: 2.6.24-rc4-mm1

Andrew Morton wrote:
 On Tue, 11 Dec 2007 08:13:52 -0800 Martin Bligh [EMAIL PROTECTED] wrote:
 

 - Lots of device IDs have been removed from the e1000 driver and moved
 over
  to e1000e.  So if your e1000 stops working, you forgot to set
 CONFIG_E1000E.


 Wouldn't it make sense to just default this to on if E1000 was on, rather
 than screwing
 everybody for no good reason (plus breaking all the automated testing, etc
 etc)?
 Much though I love random refactoring, it is fairly painful to just keep
 changing the
 names of things.
 
 (cc netdev and Auke)
 
 Yes, that would be very sensible.  CONFIG_E1000E should default to whatever
 CONFIG_E1000 was set to.

which is y for x86 and friends, ppc, arm and ia64 through 'defconfig'. the
Kconfig files do not have defaults in them.

I can send a patch to adjust the defconfig files, would that be OK? I certainly
think that would be reasonable, I dislike setting defaults through defconfig for
network drivers myself and rather would not do that.

Auke

Re: 2.6.24-rc4-mm1

Kok, Auke wrote:
 Andrew Morton wrote:
 On Tue, 11 Dec 2007 08:13:52 -0800 Martin Bligh [EMAIL PROTECTED] wrote:

 - Lots of device IDs have been removed from the e1000 driver and moved
 over
  to e1000e.  So if your e1000 stops working, you forgot to set
 CONFIG_E1000E.


 Wouldn't it make sense to just default this to on if E1000 was on, rather
 than screwing
 everybody for no good reason (plus breaking all the automated testing, etc
 etc)?
 Much though I love random refactoring, it is fairly painful to just keep
 changing the
 names of things.
 (cc netdev and Auke)

 Yes, that would be very sensible.  CONFIG_E1000E should default to whatever
 CONFIG_E1000 was set to.
 
 which is y for x86 and friends, ppc, arm and ia64 through 'defconfig'. the
 Kconfig files do not have defaults in them.
 
 I can send a patch to adjust the defconfig files, would that be OK? I 
 certainly
 think that would be reasonable, I dislike setting defaults through defconfig 
 for
 network drivers myself and rather would not do that.

that should read "dislike setting defaults through Kconfig" ...

Auke

Re: 2.6.24-rc4-mm1

On Tue, 11 Dec 2007 13:26:58 -0800
Kok, Auke [EMAIL PROTECTED] wrote:

 Andrew Morton wrote:
  On Tue, 11 Dec 2007 08:13:52 -0800 Martin Bligh [EMAIL PROTECTED] wrote:
  
 
  - Lots of device IDs have been removed from the e1000 driver and moved
  over
   to e1000e.  So if your e1000 stops working, you forgot to set
  CONFIG_E1000E.
 
 
  Wouldn't it make sense to just default this to on if E1000 was on, rather
  than screwing
  everybody for no good reason (plus breaking all the automated testing, etc
  etc)?
  Much though I love random refactoring, it is fairly painful to just keep
  changing the
  names of things.
  
  (cc netdev and Auke)
  
  Yes, that would be very sensible.  CONFIG_E1000E should default to whatever
  CONFIG_E1000 was set to.
 
 which is y for x86 and friends, ppc, arm and ia64 through 'defconfig'. the
 Kconfig files do not have defaults in them.

I wouldn't be looking at defconfig files - I don't think many people use
them.  Most people use their previous config, via oldconfig.

So what we want here is to give them E1000E if they had previously been
using E1000.  I don't know how one would do this in Kconfig.


Re: 2.6.24-rc4-mm1

Andrew Morton wrote:
 On Tue, 11 Dec 2007 13:26:58 -0800
 Kok, Auke [EMAIL PROTECTED] wrote:
 
 Andrew Morton wrote:
 On Tue, 11 Dec 2007 08:13:52 -0800 Martin Bligh [EMAIL PROTECTED] wrote:

 - Lots of device IDs have been removed from the e1000 driver and moved
 over
  to e1000e.  So if your e1000 stops working, you forgot to set
 CONFIG_E1000E.


 Wouldn't it make sense to just default this to on if E1000 was on, rather
 than screwing
 everybody for no good reason (plus breaking all the automated testing, etc
 etc)?
 Much though I love random refactoring, it is fairly painful to just keep
 changing the
 names of things.
 (cc netdev and Auke)

 Yes, that would be very sensible.  CONFIG_E1000E should default to whatever
 CONFIG_E1000 was set to.
 which is y for x86 and friends, ppc, arm and ia64 through 'defconfig'. the
 Kconfig files do not have defaults in them.
 
 I wouldn't be looking at defconfig files - I don't think many people use
 them.  Most people use their previous config, via oldconfig.
 
 So what we want here is to give them E1000E if they had previously been
 using E1000.  I don't know how one would do this in Kconfig.

ditto. I doubt that SELECT E1000E would be a good idea here (maybe not even
work), and I can't think of anything else.

Auke

Re: 2.6.24-rc4-mm1

On Tue, 11 Dec 2007 14:17:16 -0800 Kok, Auke wrote:

 Andrew Morton wrote:
  On Tue, 11 Dec 2007 13:26:58 -0800
  Kok, Auke [EMAIL PROTECTED] wrote:
  
  Andrew Morton wrote:
  On Tue, 11 Dec 2007 08:13:52 -0800 Martin Bligh [EMAIL PROTECTED] 
  wrote:
 
  - Lots of device IDs have been removed from the e1000 driver and moved
  over
   to e1000e.  So if your e1000 stops working, you forgot to set
  CONFIG_E1000E.
 
 
  Wouldn't it make sense to just default this to on if E1000 was on, rather
  than screwing
  everybody for no good reason (plus breaking all the automated testing, 
  etc
  etc)?
  Much though I love random refactoring, it is fairly painful to just keep
  changing the
  names of things.
  (cc netdev and Auke)
 
  Yes, that would be very sensible.  CONFIG_E1000E should default to 
  whatever
  CONFIG_E1000 was set to.
  which is y for x86 and friends, ppc, arm and ia64 through 'defconfig'. 
  the
  Kconfig files do not have defaults in them.
  
  I wouldn't be looking at defconfig files - I don't think many people use
  them.  Most people use their previous config, via oldconfig.
  
  So what we want here is to give them E1000E if they had previously been
  using E1000.  I don't know how one would do this in Kconfig.
 
 ditto. I doubt that SELECT E1000E would be a good idea here (maybe not even
 work), and I can't think of anything else.

default E1000 in E1000E seems to work for me.

---

From: Randy Dunlap [EMAIL PROTECTED]

Make E1000E default to the same kconfig setting as E1000,
at least for -mm testing.

Signed-off-by: Randy Dunlap [EMAIL PROTECTED]
---
 drivers/net/Kconfig |    1 +
 1 file changed, 1 insertion(+)

--- linux-2.6.24-rc4-mm1.orig/drivers/net/Kconfig
+++ linux-2.6.24-rc4-mm1/drivers/net/Kconfig
@@ -1986,6 +1986,7 @@ config E1000_DISABLE_PACKET_SPLIT
 config E1000E
 	tristate "Intel(R) PRO/1000 PCI-Express Gigabit Ethernet support"
 	depends on PCI
+	default E1000
 	---help---
 	  This driver supports the PCI-Express Intel(R) PRO/1000 gigabit
 	  ethernet family of adapters. For PCI or PCI-X e1000 adapters,

Re: 2.6.24-rc4-mm1: acpi reboots machine... solved

On Tuesday 11 December 2007 01:52:55 pm Borislav Petkov wrote:
 From what i can roughly tell so far it seems like a resource conflict
 between acpi and the pnp requested regions in your patch which results in
 the acpi_thermal code reading the wrong (0xff) temperature value and halting
 the machine, but i might be wrong on the details since acpi is such a big
 code chunk to swallow.

I don't see any obvious conflict from the log you posted.  For the sake
of comparison, can you post the corresponding dmesg log after you removed
the patch?

acpi_thermal_get_temperature() only evaluates _TMP, which isn't very
interesting.  I wonder if there's some conflict between that AML method
and the EC driver or something.

If you can also collect the DSDT, maybe I can poke around in there and
see what _TMP is really doing.

Thanks,
  Bjorn

Re: 2.6.24-rc4-mm1 -- boot process hangs -- tty4 main process (2988) terminated with status 1

On Sat, 8 Dec 2007 21:29:18 -0500 Miles Lane [EMAIL PROTECTED] wrote:

   Dec  6 21:24:28 erratic-orbits init: tty3 main process (2991)
   terminated with status 1
 
  Boggle.  We broke the vt driver?
 
  config, please...
 
 I sent the .config.

I didn't receive it but I found a config from you in another thread.

  Is there nothing else to follow up on?  I have
 tried rebuilding about seven kernels, tweaking the options each time.
 All the kernels have failed to boot.   I am currently trying with a
 defconfig kernel.  Perhaps I will have better luck with it.

Your config instabricks my Vaio.  Fiddled with it a bit but failed to pick
the problem.  Fixing regressions in -mm isn't top priority at present I'm
afraid.  If the same bug is present in next -mm it'd be great if you could
bisect it down please.


Re: 2.6.24-rc4-mm1

2007-12-11 Thread Rik van Riel

On Tue, 4 Dec 2007 21:17:01 -0800
Andrew Morton [EMAIL PROTECTED] wrote:

 Changes since 2.6.24-rc3-mm2:

2.6.24-rc4-mm1 brought a nice TCP oops on my x86_64 system, while I
was stress-testing the VM and watching via ssh:

general protection fault:  [1] SMP 
last sysfs file: /sys/devices/pci:00/:00:1c.5/:04:00.0/irq
CPU 1 
Modules linked in: nfs lockd nfs_acl rfcomm l2cap bluetooth autofs4 sunrpc ipv6 
acpi_cpufreq dm_multipath parport_pc e1000e parport firewire_ohci button 
i2c_i801 i2c_core i82975x_edac pcspkr firewire_core serio_raw edac_core 
rtc_cmos floppy crc_itu_t sg sr_mod cdrom pata_marvell ata_piix dm_snapshot 
dm_zero dm_mirror dm_mod ahci libata sd_mod scsi_mod ext3 jbd mbcache uhci_hcd 
ohci_hcd ehci_hcd
Pid: 2946, comm: sshd Not tainted 2.6.24-rc4-mm1 #1
RIP: 0010:[81227add]  [81227add] __tcp_rb_insert+0x1a/0x67
RSP: 0018:810066401c88  EFLAGS: 00010202
RAX: 6b6b6b6b6b6b6b6b RBX: 810076e9f000 RCX: 81003ddc9900
RDX: 6b6b6b6b6b6b6bab RSI: 81006ed1b148 RDI: 6b6b6b6b6b6b6b5b
RBP: 81006ed1aa00 R08: 810076e9f010 R09: bef8d64e
R10: 81228926 R11: 8110b2aa R12: 810066401de8
R13: 00e0 R14: 810066401ee8 R15: 0001
FS:  7f1c2c10d780() GS:81007f801578() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 02aabfd3 CR3: 665e3000 CR4: 06e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process sshd (pid: 2946, threadinfo 81006640, task 8100665ce000)
Stack:  81003ddc9900 81228b26  0001
 810066401ee8 810574da 04e00040 00e004e0
 7f1c2c797620 0246 66401d60 
Call Trace:
 [81228b26] tcp_sendmsg+0x21f/0xb00
 [811f0435] sock_aio_write+0xf8/0x110
 [810a9451] do_sync_write+0xc9/0x10c
 [811071d3] file_has_perm+0x9a/0xa9
 [8104e29a] autoremove_wake_function+0x0/0x2e
 [81059db6] __lock_acquire+0x50f/0xc8e
 [810574da] lock_release_holdtime+0x27/0x48
 [810a9c53] vfs_write+0xd9/0x16f
 [810aa1fd] sys_write+0x45/0x6e
 [8100c0ba] tracesys+0xdc/0xe1


Code: 44 3b 4a 1c 79 10 44 3b 4a 18 78 04 0f 0b eb fe 48 8d 50 10 
RIP  [81227add] __tcp_rb_insert+0x1a/0x67
 RSP 810066401c88


-- 
Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it. - Brian W. Kernighan

Re: 2.6.24-rc4-mm1

2007-12-10 Thread Andrew Morton

On Tue, 11 Dec 2007 01:48:39 +1100
Reuben Farrelly <[EMAIL PROTECTED]> wrote:

> 
> 
> On 5/12/2007 4:17 PM, Andrew Morton wrote:
> > Temporarily at
> > 
> >   http://userweb.kernel.org/~akpm/2.6.24-rc4-mm1/
> > 
> > Will appear later at
> > 
> >   
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc4/2.6.24-rc4-mm1/
> > 
> > 
> > - Lots of device IDs have been removed from the e1000 driver and moved over
> >   to e1000e.  So if your e1000 stops working, you forgot to set 
> > CONFIG_E1000E.
> > 
> > - The s390 build is still broken.
> 
> I'm seeing this most incredibly unhelpful (to debug) but fortunately
> reproducible problem (so far 4/4 times) on this -mm kernel.  I thought this 
> problem may have been related to another bug which I have reported (A TCP 
> oops) 
> but even after applying a likely fix for that I am still seeing this problem.
> 
> The machine boots up perfectly fine and runs good until I load it up.
> In this case I can reliably cause this to occur by pulling a 3G ISO across the
> GigE network from my Linux box to my PC.  After maybe 50M or so, the console 
> just displays this (ignore initial boot banner):
> 
> --
> 
>   * Starting local ... [ 
> ok ]
> 
> 
> This is tornado.reub.net (Linux x86_64 2.6.24-rc4-mm1) 00:24:01
> 
> tornado login: *** buffer overf
> 
> ---
> 
> Yes - after displaying the 'f' in what I can only guess is the word 
> 'overflow',
> the box spontaneously reboots.  There is no further console output until it 
> starts to come back up again.
> 
> The problem does not exist in 2.6.23-gentoo kernels nor in a vanilla 
> 2.6.24-rc4-git6 (phew!), so this looks to be an -mm only problem at this 
> stage.
> 
> I enabled a number of kernel debugging options but then I got no output at 
> all 
> when the machine crashed.
> 
> I'm at a bit of a loss as to which subsystem this might be coming from, so 
> I'm 
> not sure who to CC.
> 
> Box information is (still) up at 
> http://www.reub.net/files/kernel/2.6.24-rc4-mm1/
> 

hm.  grepping around for "buffer overflow" doesn't turn up anything except in
drivers which you won't be using on that machine.

I'd be suspecting networking, obviously.  If you're feeling keen could you
please grab a 2.6.24-rc4 tree, apply 2.6.24-rc4-mm1's origin.patch and
git-net.patch, and see if the bug is still present?

Re: 2.6.24-rc4-mm1

On Mon, 10 Dec 2007, Ilpo Järvinen wrote:

> Dave, please include this one to net-2.6.25.

...

> --
> [PATCH] [TCP]: Fix fack_count miscountings (multiple places)

I have a better version of this coming up, so Dave, please don't put this one
into net-2.6.25 (I noticed that both the original code and the patched code
can get into an infinite loop, and the new code is still flawed in some rare
cases as well). I'll submit a better version soon.

-- 
 i.

Re: 2.6.24-rc4-mm1

2007-12-10 Thread Reuben Farrelly

On 5/12/2007 4:17 PM, Andrew Morton wrote:
> Temporarily at
> 
>   http://userweb.kernel.org/~akpm/2.6.24-rc4-mm1/
> 
> Will appear later at
> 
>   ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc4/2.6.24-rc4-mm1/
> 
> - Lots of device IDs have been removed from the e1000 driver and moved over
>   to e1000e.  So if your e1000 stops working, you forgot to set CONFIG_E1000E.
> 
> - The s390 build is still broken.

I'm seeing this most incredibly unhelpful (to debug) but fortunately
reproducible problem (so far 4/4 times) on this -mm kernel.  I thought this
problem may have been related to another bug which I have reported (a TCP oops)
but even after applying a likely fix for that I am still seeing this problem.

The machine boots up perfectly fine and runs good until I load it up.
In this case I can reliably cause this to occur by pulling a 3G ISO across the
GigE network from my Linux box to my PC.  After maybe 50M or so, the console
just displays this (ignore initial boot banner):

--

 * Starting local ... [ ok ]

This is tornado.reub.net (Linux x86_64 2.6.24-rc4-mm1) 00:24:01

tornado login: *** buffer overf

---

Yes - after displaying the 'f' in what I can only guess is the word 'overflow',
the box spontaneously reboots.  There is no further console output until it
starts to come back up again.

The problem does not exist in 2.6.23-gentoo kernels nor in a vanilla
2.6.24-rc4-git6 (phew!), so this looks to be an -mm only problem at this stage.

I enabled a number of kernel debugging options but then I got no output at all
when the machine crashed.

I'm at a bit of a loss as to which subsystem this might be coming from, so I'm
not sure who to CC.

Box information is (still) up at
http://www.reub.net/files/kernel/2.6.24-rc4-mm1/

Thanks,
Reuben


Re: 2.6.24-rc4-mm1

On Wed, 5 Dec 2007, Andrew Morton wrote:

> On Thu, 06 Dec 2007 17:59:37 +1100 Reuben Farrelly <[EMAIL PROTECTED]> wrote:
> 
> > This non fatal oops which I have just noticed may be related to this change 
> > then 
> > - certainly looks networking related.
> 
> yep, but it isn't e1000.  It's core TCP.
> 
> > WARNING: at net/ipv4/tcp_input.c:2518 tcp_fastretrans_alert()
> > Pid: 0, comm: swapper Not tainted 2.6.24-rc4-mm1 #1
> 
> Ilpo, Reuben's kernel is talking to you ;)

...Please try the patch below. Andrew, this probably fixes your problem 
(the packets <= tp->packets_out) as well.

Dave, please include this one to net-2.6.25.


-- 
 i.

--
[PATCH] [TCP]: Fix fack_count miscountings (multiple places)

1) Fack_count is set incorrectly if the highest sent skb is
already sacked (the skb->prev won't return it because it's on
the other list already). This manifests as a fackets_out counting
error later on; the second-order effects are very hard to track,
so it may fix all outstanding TCP bug reports.

2) The prev == NULL check was the wrong way around

3) The last skb's fack count was incorrectly skipped by the while () {} loop

Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]>
---
 include/net/tcp.h |   22 ++++++++++++++++------
 1 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 9dbed0b..11a7e3e 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1337,10 +1337,20 @@ static inline struct sk_buff *tcp_send_head(struct sock *sk)
 static inline void tcp_advance_send_head(struct sock *sk, struct sk_buff *skb)
 {
 	struct sk_buff *prev = tcp_write_queue_prev(sk, skb);
+	unsigned int fc = 0;
+
+	if (prev == (struct sk_buff *)&sk->sk_write_queue)
+		prev = NULL;
+	else if (!tcp_skb_adjacent(sk, prev, skb))
+		prev = NULL;
 
-	if (prev != (struct sk_buff *)&sk->sk_write_queue)
-		TCP_SKB_CB(skb)->fack_count = TCP_SKB_CB(prev)->fack_count +
-					      tcp_skb_pcount(prev);
+	if ((prev == NULL) && !__tcp_write_queue_empty(sk, TCP_WQ_SACKED))
+		prev = __tcp_write_queue_tail(sk, TCP_WQ_SACKED);
+
+	if (prev != NULL)
+		fc = TCP_SKB_CB(prev)->fack_count + tcp_skb_pcount(prev);
+
+	TCP_SKB_CB(skb)->fack_count = fc;
 
 	sk->sk_send_head = tcp_write_queue_next(sk, skb);
 	if (sk->sk_send_head == (struct sk_buff *)&sk->sk_write_queue)
@@ -1464,7 +1474,7 @@ static inline struct sk_buff *__tcp_reset_fack_counts(struct sock *sk,
 {
 	unsigned int fc = 0;
 
-	if (prev == NULL)
+	if (prev != NULL)
 		fc = TCP_SKB_CB(*prev)->fack_count + tcp_skb_pcount(*prev);
 
 	BUG_ON((*prev != NULL) && !tcp_skb_adjacent(sk, *prev, skb));
@@ -1521,7 +1531,7 @@ static inline void tcp_reset_fack_counts(struct sock *sk, struct sk_buff *inskb)
 		skb[otherq] = prev->next;
 	}
 
-	while (skb[queue] != __tcp_write_queue_tail(sk, queue)) {
+	do {
 		/* Lazy find for the other queue */
 		if (skb[queue] == NULL) {
 			skb[queue] = tcp_write_queue_find(sk, TCP_SKB_CB(prev)->seq,
@@ -1535,7 +1545,7 @@ static inline void tcp_reset_fack_counts(struct sock *sk, struct sk_buff *inskb)
 			break;
 
 		queue ^= TCP_WQ_SACKED;
-	}
+	} while (skb[queue] != __tcp_write_queue_tail(sk, queue));
 }
 
 static inline void __tcp_insert_write_queue_after(struct sk_buff *skb,
-- 
1.5.0.6
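
Item 3 of the changelog above is easier to see on a toy list. What follows is
only a sketch under assumptions: struct node, renumber_while() and
renumber_do_while() are hypothetical stand-ins, not the kernel's sk_buff or
tcp_reset_fack_counts(), and the loop condition is simplified. It shows how a
pre-tested loop that stops at the tail never assigns the tail's count, whereas
a post-tested loop does.

```c
#include <stddef.h>

/* Hypothetical stand-in for an skb: only the fields this sketch needs. */
struct node {
	unsigned int pcount;      /* packets carried by this buffer */
	unsigned int fack_count;  /* packets queued ahead of this buffer */
	struct node *next;
};

/*
 * Pre-tested loop: the condition fails as soon as cur is the tail,
 * so the tail's fack_count is never written. Returns nodes visited.
 */
static unsigned int renumber_while(struct node *cur, struct node *tail)
{
	unsigned int fc = 0, visited = 0;

	while (cur != tail) {
		cur->fack_count = fc;
		fc += cur->pcount;
		cur = cur->next;
		visited++;
	}
	return visited;
}

/*
 * Post-tested loop: the body runs for the tail as well, and the loop
 * stops only after the tail has been processed. The list must not be
 * empty. Returns nodes visited.
 */
static unsigned int renumber_do_while(struct node *cur, struct node *tail)
{
	unsigned int fc = 0, visited = 0;
	struct node *last;

	do {
		cur->fack_count = fc;
		fc += cur->pcount;
		last = cur;
		cur = cur->next;
		visited++;
	} while (last != tail);
	return visited;
}
```

With three nodes carrying 1, 2 and 3 packets, the first loop visits two nodes
and leaves the tail untouched; the second visits all three and gives the tail
a count of 3.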

Re: 2.6.24-rc4-mm1

2007-12-09 Thread Dave Young

On Dec 8, 2007 6:22 AM, Luis R. Rodriguez <[EMAIL PROTECTED]> wrote:
> On Dec 6, 2007 9:12 PM, Dave Young <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > 2.6.24-rc4-mm1 build failed at drivers/net/wireless/ath5k/base.c for some inline functions like this:
> > drivers/net/wireless/ath5k/base.c:292: sorry, unimplemented: inlining failed in call to 'ath5k_extend_tsf': function body not available
> >
> > fix it with adjust the order of inline function body.
> >
> > Signed-off-by: Dave Young <[EMAIL PROTECTED]>
>
> Acked-by: Luis R. Rodriguez <[EMAIL PROTECTED]>

Thanks.

>
> Thanks Dave. What version of gcc were you using? I haven't run into this.

gcc 3.4.6

>
> BTW, nothing new was added in this patch, things were just shifted,
> but even that may be copyrightable. Is it fair to assume you are
> licensing these changes under the same license the file is in?

Ok, I don't care.

>
> For this file we'd usually use:
>
> Changes-licensed-under: 3-clause-BSD
>
> For future reference:
>
> http://linuxwireless.org/en/developers/Documentation/SubmittingPatches#Changes-licensed-undertag
>
>   Luis
>

Re: 2.6.24-rc4-mm1

2007-12-09 Thread Nick Kossifidis

2007/12/7, Dave Young <[EMAIL PROTECTED]>:
> Hi,
>
> 2.6.24-rc4-mm1 build failed at drivers/net/wireless/ath5k/base.c for some inline functions like this:
> drivers/net/wireless/ath5k/base.c:292: sorry, unimplemented: inlining failed in call to 'ath5k_extend_tsf': function body not available
>
> fix it with adjust the order of inline function body.
>
> Signed-off-by: Dave Young <[EMAIL PROTECTED]>
>
> ---
> drivers/net/wireless/ath5k/base.c |   67 
> --
> 1 file changed, 29 insertions(+), 38 deletions(-)
>
> diff -upr linux/drivers/net/wireless/ath5k/base.c linux.new/drivers/net/wireless/ath5k/base.c
> --- linux/drivers/net/wireless/ath5k/base.c 2007-12-07 10:01:42.0 +0800
> +++ linux.new/drivers/net/wireless/ath5k/base.c 2007-12-07 10:01:49.0 +0800
> @@ -250,8 +250,19 @@ static int ath5k_rxbuf_setup(struct ath
>  static int ath5k_txbuf_setup(struct ath5k_softc *sc,
>  				struct ath5k_buf *bf,
>  				struct ieee80211_tx_control *ctl);
> +
>  static inline void ath5k_txbuf_free(struct ath5k_softc *sc,
> -				struct ath5k_buf *bf);
> +				struct ath5k_buf *bf)
> +{
> +	BUG_ON(!bf);
> +	if (!bf->skb)
> +		return;
> +	pci_unmap_single(sc->pdev, bf->skbaddr, bf->skb->len,
> +			PCI_DMA_TODEVICE);
> +	dev_kfree_skb(bf->skb);
> +	bf->skb = NULL;
> +}
> +
>  /* Queues setup */
>  static struct  ath5k_txq *ath5k_txq_setup(struct ath5k_softc *sc,
>  				int qtype, int subtype);
> @@ -278,14 +289,29 @@ static int	ath5k_beacon_setup(struct at
>  				struct ieee80211_tx_control *ctl);
>  static void	ath5k_beacon_send(struct ath5k_softc *sc);
>  static void	ath5k_beacon_config(struct ath5k_softc *sc);
> -static inline u64 ath5k_extend_tsf(struct ath5k_hw *ah, u32 rstamp);
> +
> +static inline u64 ath5k_extend_tsf(struct ath5k_hw *ah, u32 rstamp)
> +{
> +	u64 tsf = ath5k_hw_get_tsf64(ah);
> +
> +	if ((tsf & 0x7fff) < rstamp)
> +		tsf -= 0x8000;
> +
> +	return (tsf & ~0x7fff) | rstamp;
> +}
> +
>  /* Interrupt handling */
>  static int ath5k_init(struct ath5k_softc *sc);
>  static int ath5k_stop_locked(struct ath5k_softc *sc);
>  static int ath5k_stop_hw(struct ath5k_softc *sc);
>  static irqreturn_t ath5k_intr(int irq, void *dev_id);
>  static void	ath5k_tasklet_reset(unsigned long data);
> -static inline void ath5k_update_txpow(struct ath5k_softc *sc);
> +
> +static inline void ath5k_update_txpow(struct ath5k_softc *sc)
> +{
> +	ath5k_hw_set_txpower_limit(sc->ah, 0);
> +}
> +
>  static void	ath5k_calibrate(unsigned long data);
>  /* LED functions */
>  static void	ath5k_led_off(unsigned long data);
> @@ -1341,21 +1367,6 @@ err_unmap:
>  	return ret;
>  }
>
> -static inline void
> -ath5k_txbuf_free(struct ath5k_softc *sc, struct ath5k_buf *bf)
> -{
> -	BUG_ON(!bf);
> -	if (!bf->skb)
> -		return;
> -	pci_unmap_single(sc->pdev, bf->skbaddr, bf->skb->len,
> -			PCI_DMA_TODEVICE);
> -	dev_kfree_skb(bf->skb);
> -	bf->skb = NULL;
> -}
> -
> -
> -
> -
>  /**************\
>  * Queues setup *
>  \**************/
> @@ -2046,20 +2057,6 @@ ath5k_beacon_config(struct ath5k_softc *
>  #undef TSF_TO_TU
>  }
>
> -static inline
> -u64 ath5k_extend_tsf(struct ath5k_hw *ah, u32 rstamp)
> -{
> -	u64 tsf = ath5k_hw_get_tsf64(ah);
> -
> -	if ((tsf & 0x7fff) < rstamp)
> -		tsf -= 0x8000;
> -
> -	return (tsf & ~0x7fff) | rstamp;
> -}
> -
> -
> -
> -
>  /********************\
>  * Interrupt handling *
>  \********************/
> @@ -2295,12 +2292,6 @@ ath5k_tasklet_reset(unsigned long data)
>  	ath5k_reset(sc->hw);
>  }
>
> -static inline void
> -ath5k_update_txpow(struct ath5k_softc *sc)
> -{
> -	ath5k_hw_set_txpower_limit(sc->ah, 0);
> -}
> -
>  /*
>   * Periodically recalibrate the PHY to account
>   * for temperature/environment changes.
>

We'll change the order of the function bodies in the code instead; please keep
the prototype declarations clean. I'll submit a patch for this asap.
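
For reference, a minimal sketch of the ordering issue discussed in this
thread, assuming the gcc 3.4.6 error arises because an inline function is
called before its body has been seen. The names extend_stamp() and
rx_timestamp() are hypothetical and only mirror the shape of
ath5k_extend_tsf(); the current counter value is passed in as a plain
parameter rather than read from hardware.

```c
#include <stdint.h>

/*
 * Body placed before the first caller: by the time the call site is
 * compiled, the definition is available for inlining. Splices a 15-bit
 * receive stamp into a wider running counter.
 */
static inline uint64_t extend_stamp(uint64_t now, uint32_t rstamp)
{
	/* The low 15 bits wrapped after the stamp was taken: step back one period. */
	if ((now & 0x7fff) < rstamp)
		now -= 0x8000;

	return (now & ~(uint64_t)0x7fff) | rstamp;
}

/* Caller defined after the inline body, so no forward prototype is needed. */
static uint64_t rx_timestamp(uint64_t now, uint32_t rstamp)
{
	return extend_stamp(now, rstamp);
}
```

Either approach in the thread (moving the bodies up to the prototypes, or
reordering the definitions so that callers come later) leads to this layout;
they differ only in where the bodies end up in the file.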

-- 
GPG ID: 0xD21DB2DB
As you read this post global entropy rises. Have Fun ;-)
Nick

Re: 2.6.24-rc4-mm1: acpi reboots machine

2007-12-09 Thread Borislav Petkov

On Sun, Dec 09, 2007 at 08:50:02AM +0100, Borislav Petkov wrote:
> Hi Andrew,
> Hi Len,
> 
> after booting 2.6.24-rc4-mm1 (2.6.24-rc4-190-g94545ba, otoh, boots just
> fine) on my asus laptop, the machine reboots after claiming that
> "Critical temperature reached (255 C)." However, the degrees number
> is kinda hinting at 0xff all-ones field. Will try dump_stack in
> acpi_thermal_critical() to checkout the call path. For now here's the 
> netconsole bootlog:

Here's what i got so far:

[   50.287939] Pid: 1, comm: swapper Not tainted 2.6.24-rc4-mm1 #14
[   50.287999]  [<c0104b65>] show_trace_log_lvl+0x12/0x25
[   50.288103]  [<c01053e7>] show_trace+0xd/0x10
[   50.288202]  [<c0105a6c>] dump_stack+0x57/0x5f
[   50.288303]  [<c021c991>] acpi_thermal_check+0x150/0x3bb
[   50.288415]  [<c021d4b3>] acpi_thermal_add+0x261/0x2cf
[   50.288515]  [<c0213549>] acpi_device_probe+0x3e/0xdb
[   50.288615]  [<c023f8f5>] driver_probe_device+0xaf/0x12a
[   50.288717]  [<c023fa88>] __driver_attach+0x6c/0xa5
[   50.288817]  [<c023ee5a>] bus_for_each_dev+0x3e/0x60
[   50.288916]  [<c023f77d>] driver_attach+0x14/0x16
[   50.289015]  [<c023f5a6>] bus_add_driver+0xa6/0x1a8
[   50.289114]  [<c023fc53>] driver_register+0x42/0x47
[   50.289214]  [<c02138c2>] acpi_bus_register_driver+0x3a/0x3c
[   50.289316]  [<c044306b>] acpi_thermal_init+0x57/0x76
[   50.289424]  [<c04344a7>] kernel_init+0x138/0x280
[   50.289525]  [<c01047df>] kernel_thread_helper+0x7/0x10
[   50.289625]  ===
[   50.289680] ACPI: Critical trip point
[   50.289736] Critical temperature reached (255 C), shutting down.

so in acpi_thermal_get_temperature() called in acpi_thermal_add() the
tz->temperature thingy is not set properly (printk's added):

[   50.276607] Old temp: 4294967023
[   50.281890] Got temp: 255
[   50.282567] Old temp: 255
[   50.287882] Got temp: 255

What's also strange is that the tz acpi_thermal is alloc'd with kzalloc and
there's still garbage in it after reading it in acpi_thermal_get_temperature()
for the first time. Debugging continues...
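
One possible reading of the first "Old temp" value, offered as a sketch and
not as a diagnosis: 4294967023 equals 2^32 - 273, i.e. -273 printed as an
unsigned 32-bit number. ACPI reports temperatures in tenths of a kelvin, so a
zeroed field converted to Celsius would come out as -273, which would mean the
kzalloc'd structure was not holding garbage after all. The helpers
as_signed32() and decikelvin_to_celsius() below are hypothetical, and the
rounding rule is an assumption about how such a conversion might be written.

```c
#include <stdint.h>

/* Reinterpret a value that was printed as unsigned 32-bit as a signed one. */
static int64_t as_signed32(uint32_t v)
{
	int64_t s = (int64_t)v;

	if (s > INT32_MAX)
		s -= 4294967296LL;	/* subtract 2^32 */
	return s;
}

/*
 * Assumed conversion from tenths of a kelvin to whole degrees Celsius,
 * rounding to nearest (0 C is 2732 tenths of a kelvin in this scheme).
 */
static long decikelvin_to_celsius(long dk)
{
	long d = dk - 2732;

	return d >= 0 ? (d + 5) / 10 : (d - 5) / 10;
}
```

Under these assumptions as_signed32(4294967023) and decikelvin_to_celsius(0)
both give -273. This says nothing about where the later 255 comes from.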
-- 
Regards/Gruß,
Boris.

Re: 2.6.24-rc4-mm1: some issues on sparc64

2007-12-09 Thread Andrew Morton

On Sun, 09 Dec 2007 00:45:17 -0800 (PST) David Miller <[EMAIL PROTECTED]> wrote:

> From: Andrew Morton <[EMAIL PROTECTED]>
> Date: Sat, 8 Dec 2007 10:22:39 -0800
> 
> > That's
> > 
> > J_ASSERT_BH(bh, !buffer_jbddirty(bh));
> > 
> > at the end of journal_unmap_buffer().
> > 
> > I don't recall seeing that before and I can't think of anything we've
> > done recently which could cause it, sorry.
> 
> If the per-cpu data patches are in the -mm tree that is the first
> place I would start looking at for possible cause.

They aren't.  The dust hadn't settled enough on those when Christoph shot
through on vacation.


Re: 2.6.24-rc4-mm1: some issues on sparc64

2007-12-09 Thread David Miller

From: Andrew Morton <[EMAIL PROTECTED]>
Date: Sat, 8 Dec 2007 10:22:39 -0800

> That's
> 
> J_ASSERT_BH(bh, !buffer_jbddirty(bh));
> 
> at the end of journal_unmap_buffer().
> 
> I don't recall seeing that before and I can't think of anything we've
> done recently which could cause it, sorry.

If the per-cpu data patches are in the -mm tree that is the first
place I would start looking at for possible cause.

Re: 2.6.24-rc4-mm1

2007-12-09 Thread Nick Kossifidis

2007/12/7, Dave Young [EMAIL PROTECTED]:
 Hi,

 2.6.24-rc4-mm1 build failed at drivers/net/wireless/ath5k/base.c for some 
 inline functions like this:
 drivers/net/wireless/ath5k/base.c:292: sorry, unimplemented: inlining failed 
 in call to 'ath5k_extend_tsf': function body not available

 fix it with adjust the order of inline function body.

 Signed-off-by: Dave Young [EMAIL PROTECTED]

 ---
 drivers/net/wireless/ath5k/base.c |   67 
 --
 1 file changed, 29 insertions(+), 38 deletions(-)

 diff -upr linux/drivers/net/wireless/ath5k/base.c 
 linux.new/drivers/net/wireless/ath5k/base.c
 --- linux/drivers/net/wireless/ath5k/base.c 2007-12-07 10:01:42.0 
 +0800
 +++ linux.new/drivers/net/wireless/ath5k/base.c 2007-12-07 10:01:49.0 
 +0800
 @@ -250,8 +250,19 @@ static int ath5k_rxbuf_setup(struct ath
  static int ath5k_txbuf_setup(struct ath5k_softc *sc,
 struct ath5k_buf *bf,
 struct ieee80211_tx_control *ctl);
 +
  static inline void ath5k_txbuf_free(struct ath5k_softc *sc,
 -   struct ath5k_buf *bf);
 +   struct ath5k_buf *bf)
 +{
 +   BUG_ON(!bf);
 +   if (!bf-skb)
 +   return;
 +   pci_unmap_single(sc-pdev, bf-skbaddr, bf-skb-len,
 +   PCI_DMA_TODEVICE);
 +   dev_kfree_skb(bf-skb);
 +   bf-skb = NULL;
 +}
 +
  /* Queues setup */
  static struct  ath5k_txq *ath5k_txq_setup(struct ath5k_softc *sc,
 int qtype, int subtype);
 @@ -278,14 +289,29 @@ static intath5k_beacon_setup(struct at
 struct ieee80211_tx_control *ctl);
  static voidath5k_beacon_send(struct ath5k_softc *sc);
  static voidath5k_beacon_config(struct ath5k_softc *sc);
 -static inline u64 ath5k_extend_tsf(struct ath5k_hw *ah, u32 rstamp);
 +
 +static inline u64 ath5k_extend_tsf(struct ath5k_hw *ah, u32 rstamp)
 +{
 +   u64 tsf = ath5k_hw_get_tsf64(ah);
 +
 +   if ((tsf  0x7fff)  rstamp)
 +   tsf -= 0x8000;
 +
 +   return (tsf  ~0x7fff) | rstamp;
 +}
 +
  /* Interrupt handling */
  static int ath5k_init(struct ath5k_softc *sc);
  static int ath5k_stop_locked(struct ath5k_softc *sc);
  static int ath5k_stop_hw(struct ath5k_softc *sc);
  static irqreturn_t ath5k_intr(int irq, void *dev_id);
  static voidath5k_tasklet_reset(unsigned long data);
 -static inline void ath5k_update_txpow(struct ath5k_softc *sc);
 +
 +static inline void ath5k_update_txpow(struct ath5k_softc *sc)
 +{
 +   ath5k_hw_set_txpower_limit(sc-ah, 0);
 +}
 +
  static voidath5k_calibrate(unsigned long data);
  /* LED functions */
  static voidath5k_led_off(unsigned long data);
 @@ -1341,21 +1367,6 @@ err_unmap:
 return ret;
  }

 -static inline void
 -ath5k_txbuf_free(struct ath5k_softc *sc, struct ath5k_buf *bf)
 -{
 -   BUG_ON(!bf);
 -   if (!bf-skb)
 -   return;
 -   pci_unmap_single(sc-pdev, bf-skbaddr, bf-skb-len,
 -   PCI_DMA_TODEVICE);
 -   dev_kfree_skb(bf-skb);
 -   bf-skb = NULL;
 -}
 -
 -
 -
 -
  /**\
  * Queues setup *
  \**/
 @@ -2046,20 +2057,6 @@ ath5k_beacon_config(struct ath5k_softc *
  #undef TSF_TO_TU
  }

 -static inline
 -u64 ath5k_extend_tsf(struct ath5k_hw *ah, u32 rstamp)
 -{
 -   u64 tsf = ath5k_hw_get_tsf64(ah);
 -
 -   if ((tsf  0x7fff)  rstamp)
 -   tsf -= 0x8000;
 -
 -   return (tsf  ~0x7fff) | rstamp;
 -}
 -
 -
 -
 -
  /\
  * Interrupt handling *
  \/
 @@ -2295,12 +2292,6 @@ ath5k_tasklet_reset(unsigned long data)
 ath5k_reset(sc->hw);
  }

 -static inline void
 -ath5k_update_txpow(struct ath5k_softc *sc)
 -{
 -   ath5k_hw_set_txpower_limit(sc->ah, 0);
 -}
 -
  /*
   * Periodically recalibrate the PHY to account
   * for temperature/environment changes.


We'll change their order in the code, plz keep prototype declarations
clean. I'll submit a patch asap on this.
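
For anyone checking the arithmetic in ath5k_extend_tsf() while it is being moved around, here is a minimal userspace sketch of the same computation. It assumes the 64-bit TSF is passed in as a plain argument instead of being read through ath5k_hw_get_tsf64(), and the name extend_tsf is illustrative only; it is not driver code.

```c
#include <stdint.h>

/*
 * Userspace model of the TSF extension shown in the patch.
 * tsf64 stands in for the value ath5k_hw_get_tsf64() would return;
 * taking it as a parameter is an assumption made so the logic can be
 * exercised without hardware.
 */
static inline uint64_t extend_tsf(uint64_t tsf64, uint32_t rstamp)
{
	/*
	 * If the low 15 bits of the TSF are already below the RX stamp,
	 * the counter wrapped after the frame arrived: step back one
	 * 0x8000 period.
	 */
	if ((tsf64 & 0x7fff) < rstamp)
		tsf64 -= 0x8000;

	/* Replace the low 15 bits with the hardware RX timestamp. */
	return (tsf64 & ~(uint64_t)0x7fff) | rstamp;
}
```

For example, extend_tsf(0x12348010, 0x7ff0) gives 0x12347ff0: the low 15 bits of the TSF (0x0010) are below the stamp, so the frame is attributed to the previous 0x8000 period.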

-- 
GPG ID: 0xD21DB2DB
As you read this post global entropy rises. Have Fun ;-)
Nick
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 2.6.24-rc4-mm1

2007-12-09 Thread Dave Young

On Dec 8, 2007 6:22 AM, Luis R. Rodriguez <[EMAIL PROTECTED]> wrote:
> On Dec 6, 2007 9:12 PM, Dave Young <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > 2.6.24-rc4-mm1 build failed at drivers/net/wireless/ath5k/base.c for some
> > inline functions like this:
> > drivers/net/wireless/ath5k/base.c:292: sorry, unimplemented: inlining
> > failed in call to 'ath5k_extend_tsf': function body not available
> >
> > Fix it by adjusting the order of the inline function bodies.
> >
> > Signed-off-by: Dave Young <[EMAIL PROTECTED]>
>
> Acked-by: Luis R. Rodriguez <[EMAIL PROTECTED]>

Thanks.


> Thanks Dave. What version of gcc were you using? I haven't run into this.

gcc 3.4.6
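
As far as I can tell, gcc prints "function body not available" when it is required to inline a call but has not yet seen the callee's body, which is what a bare prototype followed by a later definition produces on older compilers. A minimal sketch of the body-before-use ordering follows, assuming forced inlining via GCC's always_inline attribute; force_inline, low15 and stamp_of are hypothetical names used only for illustration.

```c
#include <stdint.h>

/*
 * Forced inlining. The macro name is hypothetical; the attribute
 * itself is a real GCC function attribute.
 */
#define force_inline inline __attribute__((always_inline))

/* Body first: the compiler has it in hand at every later call site. */
static force_inline uint32_t low15(uint64_t tsf)
{
	return (uint32_t)(tsf & 0x7fff);
}

/* A caller placed after the body can be inlined without complaint. */
static uint32_t stamp_of(uint64_t tsf)
{
	return low15(tsf);
}
```

Declaring only a prototype ahead of stamp_of() and defining low15() further down is the arrangement the patch removes.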


> BTW, nothing new was added in this patch, things were just shifted,
> but even that may be copyrightable. Is it fair to assume you are
> licensing these changes under the same license the file is in?

Ok, I don't care.


> For this file we'd usually use:
>
> Changes-licensed-under: 3-clause-BSD
>
> For future reference:
>
> http://linuxwireless.org/en/developers/Documentation/SubmittingPatches#Changes-licensed-undertag
>
>   Luis


Re: 2.6.24-rc4-mm1 and Very Slow PCMCIA Compact Flash

On Sat, 08 Dec 2007 20:02:54 -0700 Zan Lynx <[EMAIL PROTECTED]> wrote:

> 
> On Sat, 2007-12-08 at 02:07 -0800, Andrew Morton wrote:
> > On Fri, 07 Dec 2007 22:01:33 -0700 Zan Lynx <[EMAIL PROTECTED]> wrote:
> > 
> > > 
> > > On Fri, 2007-12-07 at 15:22 -0800, Andrew Morton wrote:
> > > > On Fri, 07 Dec 2007 23:09:43 +
> > > > Zan Lynx <[EMAIL PROTECTED]> wrote:
> > > [cut] 
> > > > > > > Now with MM kernels 2.6.24 rc1-4 the PCMCIA adapter works again, 
> > > > > > > but I
> > > > > > > only get read rates of 1.6 MB/s.  When it used to work in 2.6.20 
> > > > > > > I got
> > > > > > > at least 16 MB/s.  The card itself is capable of 30+ in the USB-2
> > > > > > > reader.
> [cut]
> > argh.  OK.  And Linus's current tree is OK, yes?
> > 
> > In which case we should be OK for 2.6.24 and I guess we can hope like heck
> > that the dud patch doesn't leak into mainline.  Hopefully Alan will get
> > some time to look into it before 2.6.25 opens.
> 
> Linus' tree is also broken.
> 
> I tried a Linus 2.6.24-rc4 and it acts the same way, with a very slow
> transfer rate.  

shit

> I also tried 2.6.24-rc4 with the older not-libata PATA drivers and it is
> broken.

squared.

>  dmesg had a line about the CF card detected as hda,
> but /sys/block did not have hda and /dev/hda did not function.

But these drivers did work in earlier kernels, yes? 2.6.20 worked, but
we don't know about intervening kernels.

Can you tell us which version(s)?

> I will try the patches you mentioned

Yes, that won't tell us anything.

> but I think I may also have to
> work backward through kernel versions until I find the last one where
> the PCMCIA hd{a,b,c,d,e} drivers worked.

That would be great - a git-bisect is often ideal. 
http://www.kernel.org/doc/local/git-quick.html has details.
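
The idea behind bisecting can be sketched as a binary search over an ordered list of kernels with one known-good end (2.6.20 here) and one known-bad end (2.6.24-rc4). This is only a model under the assumption that there is a single good-to-bad transition; first_bad, works_fn and works_before are hypothetical names, and git-bisect itself walks a commit graph rather than an array.

```c
#include <stddef.h>

/* Hypothetical test hook: returns nonzero if the kernel at index i works. */
typedef int (*works_fn)(size_t i, void *ctx);

/*
 * Binary search over a history ordered oldest to newest.
 * Preconditions (assumed): index `good` works, index `bad` is broken,
 * and good < bad. Returns the index of the first broken entry.
 */
static size_t first_bad(size_t good, size_t bad, works_fn works, void *ctx)
{
	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;

		if (works(mid, ctx))
			good = mid;	/* the regression is later */
		else
			bad = mid;	/* the regression is here or earlier */
	}
	return bad;
}

/* Example predicate for the sketch: every index below *ctx works. */
static int works_before(size_t i, void *ctx)
{
	return i < *(size_t *)ctx;
}
```

Each step halves the range, so about 40 candidate kernels need roughly six builds and boot tests.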



Re: 2.6.24-rc4-mm1 and Very Slow PCMCIA Compact Flash

2007-12-08 Thread Zan Lynx


On Sat, 2007-12-08 at 02:07 -0800, Andrew Morton wrote:
> On Fri, 07 Dec 2007 22:01:33 -0700 Zan Lynx <[EMAIL PROTECTED]> wrote:
> 
> > 
> > On Fri, 2007-12-07 at 15:22 -0800, Andrew Morton wrote:
> > > On Fri, 07 Dec 2007 23:09:43 +
> > > Zan Lynx <[EMAIL PROTECTED]> wrote:
> > [cut] 
> > > > > > Now with MM kernels 2.6.24 rc1-4 the PCMCIA adapter works again, 
> > > > > > but I
> > > > > > only get read rates of 1.6 MB/s.  When it used to work in 2.6.20 I 
> > > > > > got
> > > > > > at least 16 MB/s.  The card itself is capable of 30+ in the USB-2
> > > > > > reader.
[cut]
> argh.  OK.  And Linus's current tree is OK, yes?
> 
> In which case we should be OK for 2.6.24 and I guess we can hope like heck
> that the dud patch doesn't leak into mainline.  Hopefully Alan will get
> some time to look into it before 2.6.25 opens.

Linus' tree is also broken.

I tried a Linus 2.6.24-rc4 and it acts the same way, with a very slow
transfer rate.  

I also tried 2.6.24-rc4 with the older not-libata PATA drivers and it is
broken.  dmesg had a line about the CF card detected as hda,
but /sys/block did not have hda and /dev/hda did not function.

I will try the patches you mentioned, but I think I may also have to
work backward through kernel versions until I find the last one where
the PCMCIA hd{a,b,c,d,e} drivers worked.
-- 
Zan Lynx <[EMAIL PROTECTED]>



Re: 2.6.24-rc4-mm1 -- boot process hangs -- tty4 main process (2988) terminated with status 1

2007-12-08 Thread Miles Lane

> > Dec  6 21:24:28 erratic-orbits init: tty3 main process (2991)
> > terminated with status 1
>
> Boggle.  We broke the vt driver?
>
> config, please...

I sent the .config.  Is there nothing else to follow up on?  I have
tried rebuilding about seven kernels, tweaking the options each time.
All the kernels have failed to boot.   I am currently trying with a
"defconfig" kernel.  Perhaps I will have better luck with it.

Thanks,
   Miles

Re: 2.6.24-rc4-mm1: some issues on sparc64

On Sat, 8 Dec 2007 19:20:28 +0100 Mariusz Kozlowski <[EMAIL PROTECTED]> wrote:

>   The box is sun ultra 60 (dual sparc64). This was caught when
> system (gentoo) was emerging some package. 
> 
> [27006.402237] kernel BUG at fs/jbd/transaction.c:1894!

That's

J_ASSERT_BH(bh, !buffer_jbddirty(bh));

at the end of journal_unmap_buffer().

I don't recall seeing that before and I can't think of anything we've
done recently which could cause it, sorry.
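
To spell out what that assertion requires: it passes only when the journal-dirty flag is clear on the buffer, so the BUG above means the flag was still set when the check ran. A toy userspace model follows, with a made-up struct and bit position (model_bh and MODEL_JBDDIRTY are not the kernel's definitions):

```c
#include <stdbool.h>

/* Hypothetical stand-in for a buffer head: only a state word. */
struct model_bh {
	unsigned long b_state;
};

/* Hypothetical bit position; the kernel's real flag layout differs. */
#define MODEL_JBDDIRTY (1UL << 0)

/* Models buffer_jbddirty(): is the journal-dirty flag set? */
static bool model_buffer_jbddirty(const struct model_bh *bh)
{
	return (bh->b_state & MODEL_JBDDIRTY) != 0;
}

/* True when a check of the form !buffer_jbddirty(bh) would pass. */
static bool model_assert_holds(const struct model_bh *bh)
{
	return !model_buffer_jbddirty(bh);
}
```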

> [27006.402268]   \|/  \|/
> [27006.402274]   "@'/ .. \`@"
> [27006.402279]   /_| \__/ |_\
> [27006.402285]  \__U_/

x86 needs that.

> [27006.402298] rm(4713): Kernel bad sw trap 5 [#1]
> [27006.402538] TSTATE: 009911009605 TPC: 0053b1cc TNPC: 0053b1d0 Y: Not tainted
> [27006.402579] TPC: <journal_invalidatepage+0x3d4/0x460>
> [27006.402593] g0: 0002 g1:  g2: 0001 g3: f800a7d9
> [27006.402610] g4: f800b54ea460 g5: f8007f832000 g6: f800a7d9 g7: 0076d868
> [27006.402627] o0: 0072b660 o1: 0766 o2: 0002 o3: 0001
> [27006.402644] o4: 008a2940 o5:  sp: f800a7d92c91 ret_pc: 0053b1c4
> [27006.402665] RPC: <journal_invalidatepage+0x3cc/0x460>
> [27006.402679] l0: f800afbf4070 l1: 0069511c l2: 2000 l3: 
> [27006.402696] l4: 0001 l5: f800ba4cb730 l6: f800bf1cd338 l7: 0001
> [27006.402713] i0: f800bf1cd000 i1: 000201db2708 i2:  i3: 00727000
> [27006.402730] i4: 0020 i5: f800bf1cd028 i6: f800a7d92d51 i7: 00529254
> [27006.402763] I7: <ext3_invalidatepage+0x3c/0x60>
> [27006.402776] Caller[00529254]: ext3_invalidatepage+0x3c/0x60
> [27006.402800] Caller[004b22fc]: do_invalidatepage+0x24/0x60
> [27006.402826] Caller[004b29c4]: truncate_complete_page+0x6c/0x80
> [27006.402849] Caller[004b2a6c]: truncate_inode_pages_range+0x94/0x440
> [27006.402872] Caller[004b2e2c]: truncate_inode_pages+0x14/0x20
> [27006.402894] Caller[00529888]: ext3_delete_inode+0x10/0x160
> [27006.402918] Caller[004e7ca0]: generic_delete_inode+0x88/0x120
> [27006.402949] Caller[004e7e60]: generic_drop_inode+0x128/0x1c0
> [27006.402971] Caller[004e75d4]: iput+0x7c/0xa0
> [27006.402992] Caller[004dd680]: do_unlinkat+0x108/0x1a0
> [27006.403024] Caller[004dd884]: sys_unlinkat+0x2c/0x60
> [27006.403047] Caller[004062d4]: linux_sparc_syscall32+0x3c/0x40
> [27006.403081] Caller[f7e7d0ec]: 0xf7e7d0f4
> [27006.403102] Instruction DUMP: 92102766  7ffbbeaf  90122260 <91d02005> 92102780  7ffbbeab  90122260  91d02005  7ffbbea8

Re: 2.6.24-rc4-mm1: some issues on sparc64

Hello,

The box is a Sun Ultra 60 (dual sparc64). This was caught while the
system (Gentoo) was emerging a package.

[27006.402237] kernel BUG at fs/jbd/transaction.c:1894!
[27006.402268]   \|/  \|/
[27006.402274]   "@'/ .. \`@"
[27006.402279]   /_| \__/ |_\
[27006.402285]  \__U_/
[27006.402298] rm(4713): Kernel bad sw trap 5 [#1]
[27006.402538] TSTATE: 009911009605 TPC: 0053b1cc TNPC: 0053b1d0 Y: Not tainted
[27006.402579] TPC: <journal_invalidatepage+0x3d4/0x460>
[27006.402593] g0: 0002 g1:  g2: 0001 g3: f800a7d9
[27006.402610] g4: f800b54ea460 g5: f8007f832000 g6: f800a7d9 g7: 0076d868
[27006.402627] o0: 0072b660 o1: 0766 o2: 0002 o3: 0001
[27006.402644] o4: 008a2940 o5:  sp: f800a7d92c91 ret_pc: 0053b1c4
[27006.402665] RPC: <journal_invalidatepage+0x3cc/0x460>
[27006.402679] l0: f800afbf4070 l1: 0069511c l2: 2000 l3: 
[27006.402696] l4: 0001 l5: f800ba4cb730 l6: f800bf1cd338 l7: 0001
[27006.402713] i0: f800bf1cd000 i1: 000201db2708 i2:  i3: 00727000
[27006.402730] i4: 0020 i5: f800bf1cd028 i6: f800a7d92d51 i7: 00529254
[27006.402763] I7: <ext3_invalidatepage+0x3c/0x60>
[27006.402776] Caller[00529254]: ext3_invalidatepage+0x3c/0x60
[27006.402800] Caller[004b22fc]: do_invalidatepage+0x24/0x60
[27006.402826] Caller[004b29c4]: truncate_complete_page+0x6c/0x80
[27006.402849] Caller[004b2a6c]: truncate_inode_pages_range+0x94/0x440
[27006.402872] Caller[004b2e2c]: truncate_inode_pages+0x14/0x20
[27006.402894] Caller[00529888]: ext3_delete_inode+0x10/0x160
[27006.402918] Caller[004e7ca0]: generic_delete_inode+0x88/0x120
[27006.402949] Caller[004e7e60]: generic_drop_inode+0x128/0x1c0
[27006.402971] Caller[004e75d4]: iput+0x7c/0xa0
[27006.402992] Caller[004dd680]: do_unlinkat+0x108/0x1a0
[27006.403024] Caller[004dd884]: sys_unlinkat+0x2c/0x60
[27006.403047] Caller[004062d4]: linux_sparc_syscall32+0x3c/0x40
[27006.403081] Caller[f7e7d0ec]: 0xf7e7d0f4
[27006.403102] Instruction DUMP: 92102766  7ffbbeaf  90122260 <91d02005> 92102780  7ffbbeab  90122260  91d02005  7ffbbea8

After this happened, one (out of two) CPUs got consumed (in kernel space) trying
to complete I/O. The process is stuck in D state; wchan says it was in
sync_buffer(), which you can also see in 'SysRq : Show Blocked State' below.

[27422.874858] SysRq : Show Blocked State
[27422.877086]   task         PC stack   pid father
[27422.877143] rm            D 004f8f68     0  4966   4860
[27422.877160] Call Trace:
[27422.877167]  [00692840] io_schedule+0x28/0x40
[27422.877182]  [004f8f68] sync_buffer+0x50/0x60
[27422.877198]  [00692a58] __wait_on_bit_lock+0x60/0xa0
[27422.877213]  [00692ae4] out_of_line_wait_on_bit_lock+0x4c/0x60
[27422.877228]  [004f9328] __lock_buffer+0x30/0x40
[27422.877242]  [0053b024] journal_invalidatepage+0x22c/0x460
[27422.877268]  [00529254] ext3_invalidatepage+0x3c/0x60
[27422.877297]  [004b22fc] do_invalidatepage+0x24/0x60
[27422.877316]  [004b29c4] truncate_complete_page+0x6c/0x80
[27422.877332]  [004b2a6c] truncate_inode_pages_range+0x94/0x440
[27422.877349]  [004b2e2c] truncate_inode_pages+0x14/0x20
[27422.877364]  [00529888] ext3_delete_inode+0x10/0x160
[27422.877381]  [004e7ca0] generic_delete_inode+0x88/0x120
[27422.877405]  [004e7e60] generic_drop_inode+0x128/0x1c0
[27422.877421]  [004e75d4] iput+0x7c/0xa0
[27422.877435]  [004dd680] do_unlinkat+0x108/0x1a0

The downside is that it is unclear to me how to reproduce this - it just
happens sometimes.
Also, from time to time I get warnings about tcp_fastretrans_alert(), but
they seem to do no harm.

[30014.779310] WARNING: at net/ipv4/tcp_input.c:2518 tcp_fastretrans_alert()
[30014.781630] Call Trace:
[30014.783976]  [006551c8] tcp_fastretrans_alert+0x70/0xe00
[30014.786312]  [00657c60] tcp_ack+0x988/0x10c0
[30014.788702]  [0065bd80] tcp_rcv_established+0x408/0x840
[30014.791074]  [006634dc] tcp_v4_do_rcv+0xe4/0x4a0
[30014.793440]  [0066632c] tcp_v4_rcv+0xa34/0xb20
[30014.795762]  [00643a10] ip_local_deliver+0xd8/0x2c0
[30014.798102]  [00643ed4] ip_rcv+0x2dc/0x640
[30014.800431]  [0062424c] netif_receive_skb+0x334/0x400
[30014.802762]  [00627228] process_backlog+0x90/0x140
[30014.805097]  [00626d28] net_rx_action+0x190/0x260
[30014.807462]  [00475ea8] __do_softirq+0x90/0x140
[30014.809794]  [00475fe0] do_softirq+0x88/0xa0
[30014.812134]  [0047608c] irq_exit+0x94/0xc0
[30014.814453]  [0042f53c] handler_irq+0xa4/0xc0
[30014.816800]

Re: 2.6.24-rc4-mm1 and Very Slow PCMCIA Compact Flash

On Fri, 07 Dec 2007 22:01:33 -0700 Zan Lynx <[EMAIL PROTECTED]> wrote:

> 
> On Fri, 2007-12-07 at 15:22 -0800, Andrew Morton wrote:
> > On Fri, 07 Dec 2007 23:09:43 +
> > Zan Lynx <[EMAIL PROTECTED]> wrote:
> [cut] 
> > > > > Now with MM kernels 2.6.24 rc1-4 the PCMCIA adapter works again, but I
> > > > > only get read rates of 1.6 MB/s.  When it used to work in 2.6.20 I got
> > > > > at least 16 MB/s.  The card itself is capable of 30+ in the USB-2
> > > > > reader.
> [cut]
> > Maybe pata_pcmcia-minor-cleanups-and-support-for-dual-channel-cards.patch?
> > 
> > Could you try a `patch -R' of the below?
> > 
> > 
> > From: Alan Cox <[EMAIL PROTECTED]>
> > 
> > Signed-off-by: Alan Cox <[EMAIL PROTECTED]>
> > Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
> > ---
> > 
> >  drivers/ata/pata_pcmcia.c |   31 +--
> >  1 file changed, 17 insertions(+), 14 deletions(-)
> > 
> > diff -puN 
> > drivers/ata/pata_pcmcia.c~pata_pcmcia-minor-cleanups-and-support-for-dual-channel-cards
> >  drivers/ata/pata_pcmcia.c
> [cut]
> 
> Nope, that did not change anything.  It still detects as PIO0 and still
> runs at 1.6 MB/s.

argh.  OK.  And Linus's current tree is OK, yes?

In which case we should be OK for 2.6.24 and I guess we can hope like heck
that the dud patch doesn't leak into mainline.  Hopefully Alan will get
some time to look into it before 2.6.25 opens.



OK, there's a patch in Jeff's tree "pata_pcmcia: Add support for dumb 8bit
IDE emulations" which could be our guy.

I've uploaded two patches, against 2.6.24-rc4:

http://userweb.kernel.org/~akpm/zl.with.gz
origin.patch + git-libata-all.patch

http://userweb.kernel.org/~akpm/zl.without.gz
origin.patch + git-libata-all.patch - 
5ddcddd4dfeb16a9509dad647f509828d6fee605

It would be great if you could test both.  If zl.with is bad and zl.without
is good then we know that 5ddcddd4dfeb16a9509dad647f509828d6fee605 caused
this problem.

Thanks.


Re: 2.6.24-rc4-mm1: undefined reference to `compat_sys_timerfd' on sparc64


> >   LD  .tmp_vmlinux1
> > arch/sparc64/kernel/head.o: In function `sys_call_table32':
> > arch/sparc64/kernel/head.S:(.text+0x224e0): undefined reference to 
> > `compat_sys_timerfd'
> > make: *** [.tmp_vmlinux1] Error 1
> 
> argh, sorry, I am soo fed up with fixing that patch.
> 
> --- a/arch/sparc64/kernel/systbls.S~timerfd-v3-new-timerfd-api-sparc64-fix
> +++ a/arch/sparc64/kernel/systbls.S
> @@ -80,7 +80,7 @@ sys_call_table32:
>   .word sys_fchmodat, sys_faccessat, compat_sys_pselect6, 
> compat_sys_ppoll, sys_unshare
>  /*300*/  .word compat_sys_set_robust_list, compat_sys_get_robust_list, 
> compat_sys_migrate_pages, compat_sys_mbind, compat_sys_get_mempolicy
>   .word compat_sys_set_mempolicy, compat_sys_kexec_load, 
> compat_sys_move_pages, sys_getcpu, compat_sys_epoll_pwait
> -/*310*/  .word compat_sys_utimensat, compat_sys_signalfd, 
> compat_sys_timerfd, sys_eventfd, compat_sys_fallocate
> +/*310*/  .word compat_sys_utimensat, compat_sys_signalfd, 
> sys_ni_syscall, sys_eventfd, compat_sys_fallocate
>  
>  #endif /* CONFIG_COMPAT */

Ok - that helped.
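
The one-word change above works because every slot in the table must name a symbol that exists at link time, and sys_ni_syscall is the kernel's stub for calls that are not implemented (it returns -ENOSYS). A small userspace sketch of that pattern, with hypothetical names (model_table, model_dispatch, model_eventfd) that do not correspond to the sparc64 sources:

```c
#include <errno.h>
#include <stddef.h>

typedef long (*syscall_fn)(void);

/* Stub for slots with no implementation; plays the role of sys_ni_syscall. */
static long model_ni_syscall(void)
{
	return -ENOSYS;
}

/* Hypothetical implemented entry, standing in for a real handler. */
static long model_eventfd(void)
{
	return 0;
}

/*
 * Every slot names a function that exists, so the table links even
 * when a call has been dropped from the build.
 */
static const syscall_fn model_table[] = {
	model_ni_syscall,	/* slot formerly holding the timerfd entry */
	model_eventfd,
};

/* Out-of-range numbers are rejected the same way as unimplemented ones. */
static long model_dispatch(size_t nr)
{
	if (nr >= sizeof(model_table) / sizeof(model_table[0]))
		return -ENOSYS;
	return model_table[nr]();
}
```

Keeping the slot occupied also preserves the numbering of the entries that follow it.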

Thanks,

Mariusz

Re: 2.6.24-rc4-mm1 and Very Slow PCMCIA Compact Flash