2.6.24-rc8-mm1 : net tcp_input.c warnings

2008-01-20 Thread Dave Young
Please see the following kernel messages (triggered while using a qemu session).
BTW, there seems to be an e100 error message as well.

PCI: Setting latency timer of device :00:1b.0 to 64
e100: Intel(R) PRO/100 Network Driver, 3.5.23-k4-NAPI
e100: Copyright(c) 1999-2006 Intel Corporation
ACPI: PCI Interrupt :03:08.0[A] -> GSI 20 (level, low) -> IRQ 20
modprobe:2331 conflicting cache attribute efaff000-efb0 uncached<->default
e100: :03:08.0: e100_probe: Cannot map device registers, aborting.
ACPI: PCI interrupt for device :03:08.0 disabled
e100: probe of :03:08.0 failed with error -12
eth0:  setting full-duplex.
[ cut here ]
WARNING: at net/ipv4/tcp_input.c:2169 tcp_mark_head_lost+0x121/0x150()
Modules linked in: snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq 
snd_seq_device snd_pcm_oss snd_mixer_oss eeprom e100 psmouse snd_hda_intel 
snd_pcm snd_timer btusb rtc_cmos thermal bluetooth rtc_core serio_raw intel_agp 
button processor sg snd rtc_lib i2c_i801 evdev agpgart soundcore dcdbas 3c59x 
pcspkr snd_page_alloc
Pid: 0, comm: swapper Not tainted 2.6.24-rc8-mm1 #4
 [] ? printk+0x0/0x20
 [] warn_on_slowpath+0x54/0x80
 [] ? ip_finish_output+0x128/0x2e0
 [] ? ip_output+0xe7/0x100
 [] ? ip_local_out+0x18/0x20
 [] ? ip_queue_xmit+0x3dc/0x470
 [] ? _spin_unlock_irqrestore+0x5e/0x70
 [] ? check_pad_bytes+0x61/0x80
 [] tcp_mark_head_lost+0x121/0x150
 [] tcp_update_scoreboard+0x4c/0x170
 [] tcp_fastretrans_alert+0x48a/0x6b0
 [] tcp_ack+0x1b3/0x3a0
 [] tcp_rcv_established+0x3eb/0x710
 [] tcp_v4_do_rcv+0xe5/0x100
 [] tcp_v4_rcv+0x5db/0x660
 [] ? tcp_v4_rcv+0x387/0x660
 [] ? ip_local_deliver_finish+0x2d/0x1d0
 [] ip_local_deliver_finish+0x84/0x1d0
 [] ? ip_local_deliver_finish+0x2d/0x1d0
 [] ? __lock_release+0x47/0x70
 [] ip_local_deliver+0xb7/0xc0
 [] ip_rcv_finish+0xb2/0x3c0
 [] ? sock_def_readable+0x48/0xa0
 [] ? sock_queue_rcv_skb+0xb1/0x1a0
 [] ? sock_queue_rcv_skb+0xf7/0x1a0
 [] ip_rcv+0x18f/0x290
 [] ? packet_rcv_spkt+0xd0/0x130
 [] netif_receive_skb+0x2b6/0x330
 [] ? netif_receive_skb+0x127/0x330
 [] ? process_backlog+0x83/0x100
 [] process_backlog+0x8e/0x100
 [] net_rx_action+0x13c/0x230
 [] ? net_rx_action+0x59/0x230
 [] ? __do_softirq+0x6e/0x120
 [] __do_softirq+0x93/0x120
 [] do_softirq+0x7a/0x80
 [] irq_exit+0x65/0x90
 [] do_IRQ+0x41/0x80
 [] ? tick_nohz_stop_sched_tick+0x25c/0x350
 [] common_interrupt+0x2e/0x34
 [] ? mwait_idle_with_hints+0x40/0x50
 [] ? mwait_idle+0x0/0x20
 [] mwait_idle+0x12/0x20
 [] cpu_idle+0x61/0x110
 [] rest_init+0x5d/0x60
 [] start_kernel+0x1fa/0x260
 [] ? unknown_bootoption+0x0/0x130
 ===
---[ end trace 97302d8bf57718dd ]---
[ cut here ]
WARNING: at net/ipv4/tcp_input.c:2528 tcp_fastretrans_alert+0x675/0x6b0()
Modules linked in: snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq 
snd_seq_device snd_pcm_oss snd_mixer_oss eeprom e100 psmouse snd_hda_intel 
snd_pcm snd_timer btusb rtc_cmos thermal bluetooth rtc_core serio_raw intel_agp 
button processor sg snd rtc_lib i2c_i801 evdev agpgart soundcore dcdbas 3c59x 
pcspkr snd_page_alloc
Pid: 0, comm: swapper Not tainted 2.6.24-rc8-mm1 #4
 [] ? printk+0x0/0x20
 [] warn_on_slowpath+0x54/0x80
 [] ? __lock_release+0x47/0x70
 [] ? group_send_sig_info+0x74/0x80
 [] ? _spin_unlock_irqrestore+0x5e/0x70
 [] ? group_send_sig_info+0x74/0x80
 [] ? validate_chain+0x1d2/0x320
 [] ? validate_chain+0x1d2/0x320
 [] ? validate_chain+0x1d2/0x320
 [] ? validate_chain+0x1d2/0x320
 [] tcp_fastretrans_alert+0x675/0x6b0
 [] tcp_ack+0x1b3/0x3a0
 [] tcp_rcv_established+0x3eb/0x710
 [] tcp_v4_do_rcv+0xe5/0x100
 [] tcp_v4_rcv+0x5db/0x660
 [] ? tcp_v4_rcv+0x387/0x660
 [] ? ip_local_deliver_finish+0x2d/0x1d0
 [] ip_local_deliver_finish+0x84/0x1d0
 [] ? ip_local_deliver_finish+0x2d/0x1d0
 [] ? __lock_release+0x47/0x70
 [] ip_local_deliver+0xb7/0xc0
 [] ip_rcv_finish+0xb2/0x3c0
 [] ? sock_def_readable+0x48/0xa0
 [] ? sock_queue_rcv_skb+0xb1/0x1a0
 [] ? sock_queue_rcv_skb+0xf7/0x1a0
 [] ip_rcv+0x18f/0x290
 [] ? packet_rcv_spkt+0xd0/0x130
 [] netif_receive_skb+0x2b6/0x330
 [] ? netif_receive_skb+0x127/0x330
 [] ? process_backlog+0x83/0x100
 [] process_backlog+0x8e/0x100
 [] net_rx_action+0x13c/0x230
 [] ? net_rx_action+0x59/0x230
 [] ? __do_softirq+0x6e/0x120
 [] __do_softirq+0x93/0x120
 [] do_softirq+0x7a/0x80
 [] irq_exit+0x65/0x90
 [] do_IRQ+0x41/0x80
 [] ? tick_nohz_stop_sched_tick+0x25c/0x350
 [] common_interrupt+0x2e/0x34
 [] ? mwait_idle_with_hints+0x40/0x50
 [] ? mwait_idle+0x0/0x20
 [] mwait_idle+0x12/0x20
 [] cpu_idle+0x61/0x110
 [] rest_init+0x5d/0x60
 [] start_kernel+0x1fa/0x260
 [] ? unknown_bootoption+0x0/0x130
 ===
---[ end trace 97302d8bf57718dd ]---
[ cut here ]
WARNING: at net/ipv4/tcp_input.c:2169 tcp_mark_head_lost+0x121/0x150()
Modules linked in: snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq 
snd_seq_device snd_pcm_oss snd_mixer_oss eeprom e100 psmouse snd_hda_intel 
snd_pcm snd_timer btusb rtc_cmos t

Re: 2.6.24-rc8-mm1 : net tcp_input.c warnings

2008-01-21 Thread Ilpo Järvinen
On Mon, 21 Jan 2008, Dave Young wrote:

> Please see the following kernel messages (triggered while using a qemu
> session).
> BTW, there seems to be an e100 error message as well.
> 
> PCI: Setting latency timer of device :00:1b.0 to 64
> e100: Intel(R) PRO/100 Network Driver, 3.5.23-k4-NAPI
> e100: Copyright(c) 1999-2006 Intel Corporation
> ACPI: PCI Interrupt :03:08.0[A] -> GSI 20 (level, low) -> IRQ 20
> modprobe:2331 conflicting cache attribute efaff000-efb0 uncached<->default
> e100: :03:08.0: e100_probe: Cannot map device registers, aborting.
> ACPI: PCI interrupt for device :03:08.0 disabled
> e100: probe of :03:08.0 failed with error -12
> eth0:  setting full-duplex.
> [ cut here ]
> WARNING: at net/ipv4/tcp_input.c:2169 tcp_mark_head_lost+0x121/0x150()
> Modules linked in: snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq 
> snd_seq_device snd_pcm_oss snd_mixer_oss eeprom e100 psmouse snd_hda_intel 
> snd_pcm snd_timer btusb rtc_cmos thermal bluetooth rtc_core serio_raw 
> intel_agp button processor sg snd rtc_lib i2c_i801 evdev agpgart soundcore 
> dcdbas 3c59x pcspkr snd_page_alloc
> Pid: 0, comm: swapper Not tainted 2.6.24-rc8-mm1 #4
>  [] ? printk+0x0/0x20
>  [] warn_on_slowpath+0x54/0x80
>  [] ? ip_finish_output+0x128/0x2e0
>  [] ? ip_output+0xe7/0x100
>  [] ? ip_local_out+0x18/0x20
>  [] ? ip_queue_xmit+0x3dc/0x470
>  [] ? _spin_unlock_irqrestore+0x5e/0x70
>  [] ? check_pad_bytes+0x61/0x80
>  [] tcp_mark_head_lost+0x121/0x150
>  [] tcp_update_scoreboard+0x4c/0x170
>  [] tcp_fastretrans_alert+0x48a/0x6b0
>  [] tcp_ack+0x1b3/0x3a0
>  [] tcp_rcv_established+0x3eb/0x710
>  [] tcp_v4_do_rcv+0xe5/0x100
>  [] tcp_v4_rcv+0x5db/0x660

Doh, once more these S+L things... The rest are symptoms of the first
problem.

What is strange is that it hasn't shown up until now; the last TCP
changes that could be significant are from early Dec/Nov. Is
there some reason why you haven't seen this before (e.g., you haven't
tested with a similar config)? I'm a bit worried about its
reproducibility if it took this long to show up...


-- 
 i.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.24-rc8-mm1 : net tcp_input.c warnings

2008-01-21 Thread Dave Young
On Jan 22, 2008 5:14 AM, Ilpo Järvinen <[EMAIL PROTECTED]> wrote:
>
> On Mon, 21 Jan 2008, Dave Young wrote:
>
> > Please see the following kernel messages (triggered while using a qemu
> > session).
> > BTW, there seems to be an e100 error message as well.
> >
> > PCI: Setting latency timer of device :00:1b.0 to 64
> > e100: Intel(R) PRO/100 Network Driver, 3.5.23-k4-NAPI
> > e100: Copyright(c) 1999-2006 Intel Corporation
> > ACPI: PCI Interrupt :03:08.0[A] -> GSI 20 (level, low) -> IRQ 20
> > modprobe:2331 conflicting cache attribute efaff000-efb0 
> > uncached<->default
> > e100: :03:08.0: e100_probe: Cannot map device registers, aborting.
> > ACPI: PCI interrupt for device :03:08.0 disabled
> > e100: probe of :03:08.0 failed with error -12
> > eth0:  setting full-duplex.
> > [ cut here ]
> > WARNING: at net/ipv4/tcp_input.c:2169 tcp_mark_head_lost+0x121/0x150()
> > Modules linked in: snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq 
> > snd_seq_device snd_pcm_oss snd_mixer_oss eeprom e100 psmouse snd_hda_intel 
> > snd_pcm snd_timer btusb rtc_cmos thermal bluetooth rtc_core serio_raw 
> > intel_agp button processor sg snd rtc_lib i2c_i801 evdev agpgart soundcore 
> > dcdbas 3c59x pcspkr snd_page_alloc
> > Pid: 0, comm: swapper Not tainted 2.6.24-rc8-mm1 #4
> >  [] ? printk+0x0/0x20
> >  [] warn_on_slowpath+0x54/0x80
> >  [] ? ip_finish_output+0x128/0x2e0
> >  [] ? ip_output+0xe7/0x100
> >  [] ? ip_local_out+0x18/0x20
> >  [] ? ip_queue_xmit+0x3dc/0x470
> >  [] ? _spin_unlock_irqrestore+0x5e/0x70
> >  [] ? check_pad_bytes+0x61/0x80
> >  [] tcp_mark_head_lost+0x121/0x150
> >  [] tcp_update_scoreboard+0x4c/0x170
> >  [] tcp_fastretrans_alert+0x48a/0x6b0
> >  [] tcp_ack+0x1b3/0x3a0
> >  [] tcp_rcv_established+0x3eb/0x710
> >  [] tcp_v4_do_rcv+0xe5/0x100
> >  [] tcp_v4_rcv+0x5db/0x660
>
> Doh, once more these S+L things... The rest are symptoms of the first
> problem.

What is the S+L thing? Could you explain a bit?

>
> What is strange is that it hasn't shown up until now; the last TCP
> changes that could be significant are from early Dec/Nov. Is
> there some reason why you haven't seen this before (e.g., you haven't
> tested with a similar config)?

Hmm, I don't know how to answer that...

> I'm a bit worried about its
> reproducibility if it took this long to show up...
>
>
> --
>  i.
>


Re: 2.6.24-rc8-mm1 : net tcp_input.c warnings

2008-01-22 Thread Dave Young
On Jan 22, 2008 5:09 PM, Dave Young <[EMAIL PROTECTED]> wrote:
>
> On Jan 22, 2008 12:37 PM, Dave Young <[EMAIL PROTECTED]> wrote:
> >
> > On Jan 22, 2008 5:14 AM, Ilpo Järvinen <[EMAIL PROTECTED]> wrote:
> > >
> > > On Mon, 21 Jan 2008, Dave Young wrote:
> > >
> > > > Please see the following kernel messages (triggered while using a qemu
> > > > session).
> > > > BTW, there seems to be an e100 error message as well.
> > > >
> > > > PCI: Setting latency timer of device :00:1b.0 to 64
> > > > e100: Intel(R) PRO/100 Network Driver, 3.5.23-k4-NAPI
> > > > e100: Copyright(c) 1999-2006 Intel Corporation
> > > > ACPI: PCI Interrupt :03:08.0[A] -> GSI 20 (level, low) -> IRQ 20
> > > > modprobe:2331 conflicting cache attribute efaff000-efb0 
> > > > uncached<->default
> > > > e100: :03:08.0: e100_probe: Cannot map device registers, aborting.
> > > > ACPI: PCI interrupt for device :03:08.0 disabled
> > > > e100: probe of :03:08.0 failed with error -12
> > > > eth0:  setting full-duplex.
> > > > [ cut here ]
> > > > WARNING: at net/ipv4/tcp_input.c:2169 tcp_mark_head_lost+0x121/0x150()
> > > > Modules linked in: snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq 
> > > > snd_seq_device snd_pcm_oss snd_mixer_oss eeprom e100 psmouse 
> > > > snd_hda_intel snd_pcm snd_timer btusb rtc_cmos thermal bluetooth 
> > > > rtc_core serio_raw intel_agp button processor sg snd rtc_lib i2c_i801 
> > > > evdev agpgart soundcore dcdbas 3c59x pcspkr snd_page_alloc
> > > > Pid: 0, comm: swapper Not tainted 2.6.24-rc8-mm1 #4
> > > >  [] ? printk+0x0/0x20
> > > >  [] warn_on_slowpath+0x54/0x80
> > > >  [] ? ip_finish_output+0x128/0x2e0
> > > >  [] ? ip_output+0xe7/0x100
> > > >  [] ? ip_local_out+0x18/0x20
> > > >  [] ? ip_queue_xmit+0x3dc/0x470
> > > >  [] ? _spin_unlock_irqrestore+0x5e/0x70
> > > >  [] ? check_pad_bytes+0x61/0x80
> > > >  [] tcp_mark_head_lost+0x121/0x150
> > > >  [] tcp_update_scoreboard+0x4c/0x170
> > > >  [] tcp_fastretrans_alert+0x48a/0x6b0
> > > >  [] tcp_ack+0x1b3/0x3a0
> > > >  [] tcp_rcv_established+0x3eb/0x710
> > > >  [] tcp_v4_do_rcv+0xe5/0x100
> > > >  [] tcp_v4_rcv+0x5db/0x660
> > >
> > > Doh, once more these S+L things... The rest are symptoms of the first
> > > problem.
> >
> > What is the S+L thing? Could you explain a bit?
> >
> > >
> > > What is strange is that it hasn't shown up until now; the last TCP
> > > changes that could be significant are from early Dec/Nov. Is
> > > there some reason why you haven't seen this before (e.g., you haven't
> > > tested with a similar config)?
> >
> > Hmm, I don't know how to answer that...
> >
> > > I'm a bit worried about its
> > > reproducibility if it took this long to show up...
> > >
>
> It triggered again on my PC, just while using Firefox.

Maybe it is related to the e100 error; I will apply Jiri Slaby's
e100-iomap-mem-accesses patch to test.
>
> > >
> > > --
> > >  i.
> > >
> >
>


Re: 2.6.24-rc8-mm1 : net tcp_input.c warnings

2008-01-22 Thread Dave Young
On Jan 22, 2008 12:37 PM, Dave Young <[EMAIL PROTECTED]> wrote:
>
> On Jan 22, 2008 5:14 AM, Ilpo Järvinen <[EMAIL PROTECTED]> wrote:
> >
> > On Mon, 21 Jan 2008, Dave Young wrote:
> >
> > > Please see the following kernel messages (triggered while using a qemu
> > > session).
> > > BTW, there seems to be an e100 error message as well.
> > >
> > > PCI: Setting latency timer of device :00:1b.0 to 64
> > > e100: Intel(R) PRO/100 Network Driver, 3.5.23-k4-NAPI
> > > e100: Copyright(c) 1999-2006 Intel Corporation
> > > ACPI: PCI Interrupt :03:08.0[A] -> GSI 20 (level, low) -> IRQ 20
> > > modprobe:2331 conflicting cache attribute efaff000-efb0 
> > > uncached<->default
> > > e100: :03:08.0: e100_probe: Cannot map device registers, aborting.
> > > ACPI: PCI interrupt for device :03:08.0 disabled
> > > e100: probe of :03:08.0 failed with error -12
> > > eth0:  setting full-duplex.
> > > [ cut here ]
> > > WARNING: at net/ipv4/tcp_input.c:2169 tcp_mark_head_lost+0x121/0x150()
> > > Modules linked in: snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq 
> > > snd_seq_device snd_pcm_oss snd_mixer_oss eeprom e100 psmouse 
> > > snd_hda_intel snd_pcm snd_timer btusb rtc_cmos thermal bluetooth rtc_core 
> > > serio_raw intel_agp button processor sg snd rtc_lib i2c_i801 evdev 
> > > agpgart soundcore dcdbas 3c59x pcspkr snd_page_alloc
> > > Pid: 0, comm: swapper Not tainted 2.6.24-rc8-mm1 #4
> > >  [] ? printk+0x0/0x20
> > >  [] warn_on_slowpath+0x54/0x80
> > >  [] ? ip_finish_output+0x128/0x2e0
> > >  [] ? ip_output+0xe7/0x100
> > >  [] ? ip_local_out+0x18/0x20
> > >  [] ? ip_queue_xmit+0x3dc/0x470
> > >  [] ? _spin_unlock_irqrestore+0x5e/0x70
> > >  [] ? check_pad_bytes+0x61/0x80
> > >  [] tcp_mark_head_lost+0x121/0x150
> > >  [] tcp_update_scoreboard+0x4c/0x170
> > >  [] tcp_fastretrans_alert+0x48a/0x6b0
> > >  [] tcp_ack+0x1b3/0x3a0
> > >  [] tcp_rcv_established+0x3eb/0x710
> > >  [] tcp_v4_do_rcv+0xe5/0x100
> > >  [] tcp_v4_rcv+0x5db/0x660
> >
> > Doh, once more these S+L things... The rest are symptoms of the first
> > problem.
>
> What is the S+L thing? Could you explain a bit?
>
> >
> > What is strange is that it hasn't shown up until now; the last TCP
> > changes that could be significant are from early Dec/Nov. Is
> > there some reason why you haven't seen this before (e.g., you haven't
> > tested with a similar config)?
>
> Hmm, I don't know how to answer that...
>
> > I'm a bit worried about its
> > reproducibility if it took this long to show up...
> >

It triggered again on my PC, just while using Firefox.

> >
> > --
> >  i.
> >
>


Re: 2.6.24-rc8-mm1 : net tcp_input.c warnings

2008-01-22 Thread Ilpo Järvinen
On Tue, 22 Jan 2008, Dave Young wrote:

> On Jan 22, 2008 12:37 PM, Dave Young <[EMAIL PROTECTED]> wrote:
> >
> > On Jan 22, 2008 5:14 AM, Ilpo Järvinen <[EMAIL PROTECTED]> wrote:
> > >
> > > On Mon, 21 Jan 2008, Dave Young wrote:
> > >
> > > > Please see the following kernel messages (triggered while using a qemu
> > > > session).
> > > > BTW, there seems to be an e100 error message as well.
> > > >
> > > > PCI: Setting latency timer of device :00:1b.0 to 64
> > > > e100: Intel(R) PRO/100 Network Driver, 3.5.23-k4-NAPI
> > > > e100: Copyright(c) 1999-2006 Intel Corporation
> > > > ACPI: PCI Interrupt :03:08.0[A] -> GSI 20 (level, low) -> IRQ 20
> > > > modprobe:2331 conflicting cache attribute efaff000-efb0 
> > > > uncached<->default
> > > > e100: :03:08.0: e100_probe: Cannot map device registers, aborting.
> > > > ACPI: PCI interrupt for device :03:08.0 disabled
> > > > e100: probe of :03:08.0 failed with error -12
> > > > eth0:  setting full-duplex.
> > > > [ cut here ]
> > > > WARNING: at net/ipv4/tcp_input.c:2169 tcp_mark_head_lost+0x121/0x150()
> > > > Modules linked in: snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq 
> > > > snd_seq_device snd_pcm_oss snd_mixer_oss eeprom e100 psmouse 
> > > > snd_hda_intel snd_pcm snd_timer btusb rtc_cmos thermal bluetooth 
> > > > rtc_core serio_raw intel_agp button processor sg snd rtc_lib i2c_i801 
> > > > evdev agpgart soundcore dcdbas 3c59x pcspkr snd_page_alloc
> > > > Pid: 0, comm: swapper Not tainted 2.6.24-rc8-mm1 #4
> > > >  [] ? printk+0x0/0x20
> > > >  [] warn_on_slowpath+0x54/0x80
> > > >  [] ? ip_finish_output+0x128/0x2e0
> > > >  [] ? ip_output+0xe7/0x100
> > > >  [] ? ip_local_out+0x18/0x20
> > > >  [] ? ip_queue_xmit+0x3dc/0x470
> > > >  [] ? _spin_unlock_irqrestore+0x5e/0x70
> > > >  [] ? check_pad_bytes+0x61/0x80
> > > >  [] tcp_mark_head_lost+0x121/0x150
> > > >  [] tcp_update_scoreboard+0x4c/0x170
> > > >  [] tcp_fastretrans_alert+0x48a/0x6b0
> > > >  [] tcp_ack+0x1b3/0x3a0
> > > >  [] tcp_rcv_established+0x3eb/0x710
> > > >  [] tcp_v4_do_rcv+0xe5/0x100
> > > >  [] tcp_v4_rcv+0x5db/0x660
> > >
> > > Doh, once more these S+L things... The rest are symptoms of the first
> > > problem.
> >
> > What is the S+L thing? Could you explain a bit?

It means that one of the skbs is both SACKed and marked as LOST at the
same time in the counters (this might also be due to a miscount of
lost_out/sacked_out, not necessarily in the ->sacked bits). Such a state
is logically invalid because it would mean that the sender thinks the
same packet both reached the receiver and was lost in the network.
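To make the invalid per-segment state concrete, here is a minimal user-space sketch of an S+L check on one segment's scoreboard bits. The SEG_* names and their values are hypothetical stand-ins modelled on the kernel's TCPCB_SACKED_ACKED / TCPCB_LOST flags, not the kernel's own definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical scoreboard bits, modelled on the kernel's TCPCB_* flags. */
#define SEG_SACKED_ACKED   0x01  /* receiver reported it via SACK: it arrived */
#define SEG_SACKED_RETRANS 0x02  /* sender has retransmitted it */
#define SEG_LOST           0x04  /* sender believes it was lost in the network */

/* "S+L": the same segment is marked both delivered and lost,
 * which can never be a consistent view of the network. */
static bool is_s_plus_l(unsigned char sacked)
{
    return (sacked & SEG_SACKED_ACKED) && (sacked & SEG_LOST);
}
```

A retransmitted-and-lost segment is fine; only the SACKed+LOST combination is rejected.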

Traditionally, TCP has just silently "corrected" over-estimates
(sacked_out + lost_out > packets_out). I changed this a couple of releases
ago because those over-estimates are often due to bugs that should be
fixed (there have been a couple of them, but it has been very quiet on
this front for a long time, months or even half a year already; but I
might have broken something with the early Dec changes).
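A minimal sketch of the counter invariant, contrasting a silent correction with reporting the inconsistency. The struct and function names are hypothetical; correct_silently() only illustrates what such a correction could look like (an assumption, not the historical kernel code), while verify_left_out() mirrors the idea of the WARN_ON in tcp_verify_left_out().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, trimmed-down copy of the relevant tcp_sock counters. */
struct sb_counters {
    unsigned int packets_out; /* sent, not yet cumulatively ACKed */
    unsigned int sacked_out;  /* reported by the receiver via SACK */
    unsigned int lost_out;    /* marked lost by the sender */
};

/* Segments no longer considered in flight (cf. tcp_left_out()). */
static unsigned int left_out(const struct sb_counters *c)
{
    return c->sacked_out + c->lost_out;
}

/* Old approach (sketch): quietly trim sacked_out so the sum fits again.
 * Whatever bug caused the over-estimate stays hidden. */
static void correct_silently(struct sb_counters *c)
{
    if (left_out(c) > c->packets_out)
        c->sacked_out = c->packets_out > c->lost_out ?
                        c->packets_out - c->lost_out : 0;
}

/* New approach: leave the state alone and report it. */
static bool verify_left_out(const struct sb_counters *c)
{
    if (left_out(c) > c->packets_out) {
        fprintf(stderr, "left_out %u > packets_out %u\n",
                left_out(c), c->packets_out);
        return false;
    }
    return true;
}
```

With packets_out = 10, sacked_out = 6 and lost_out = 5, verify_left_out() reports the problem, whereas correct_silently() trims sacked_out to 5 and the evidence disappears.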

These problems may originate from a bug that occurred a number of ACKs
before the WARN_ON triggered, which makes them a bit tricky to track
down; those WARN_ONs serve only for alerting purposes and usually do not
point out where the bug actually occurred.

I usually just ask people to include an exhaustive verifier, which
compares the ->sacked bitmaps with the sacked_out/lost_out counters and
reports immediately when the problem shows up, rather than waiting for
the cheaper S+L check in the WARN_ON to trigger. I tried to assemble a
tracking patch from the previous efforts (hopefully I got it right after
the modifications).
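The idea of such an exhaustive verifier can be sketched in user space: recount SACKed and LOST segments from the per-buffer bits (weighted by how many segments each buffer covers, as tcp_skb_pcount() does for TSO skbs) and compare the result with the cached counters. The queue representation and all names here are hypothetical; this is not the tcp_verify_wq() from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical scoreboard bits (modelled on TCPCB_SACKED_ACKED / TCPCB_LOST). */
#define SEG_SACKED_ACKED 0x01
#define SEG_LOST         0x04

/* One sent-but-unacknowledged buffer; like the patch, only buffers
 * before the send head are considered. */
struct seg {
    unsigned int pcount;  /* segments covered, like tcp_skb_pcount() */
    unsigned char sacked; /* scoreboard bits for this buffer */
};

/* Cached counters, as kept in struct tcp_sock. */
struct counters {
    unsigned int packets_out;
    unsigned int sacked_out;
    unsigned int lost_out;
};

/* Recount from the per-buffer bits and compare with the cached counters.
 * Returns false (and prints why) as soon as an inconsistency is found. */
static bool verify_queue(const struct seg *q, size_t n,
                         const struct counters *c)
{
    unsigned int packets = 0, sacked = 0, lost = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned char bits = q[i].sacked;

        if ((bits & SEG_SACKED_ACKED) && (bits & SEG_LOST)) {
            fprintf(stderr, "buffer %zu is both SACKed and LOST\n", i);
            return false;
        }
        packets += q[i].pcount;
        if (bits & SEG_SACKED_ACKED)
            sacked += q[i].pcount;
        if (bits & SEG_LOST)
            lost += q[i].pcount;
    }

    if (packets != c->packets_out || sacked != c->sacked_out ||
        lost != c->lost_out) {
        fprintf(stderr, "counters p/s/l=%u/%u/%u, recount=%u/%u/%u\n",
                c->packets_out, c->sacked_out, c->lost_out,
                packets, sacked, lost);
        return false;
    }
    return true;
}
```

Calling a check like this after every scoreboard update catches a counter drift on the ACK that introduces it, instead of many ACKs later.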

> > > I'm a bit worried about its
> > > reproducibility if it took this long to show up...
> > >
>
> It triggered again on my PC, just while using Firefox.

...Good, then there's some chance to catch it.

-- 
 i.

[PATCH] [TCP]: debug S+L

---
 include/net/tcp.h     |    8 +++-
 net/ipv4/tcp_input.c  |    6 +++
 net/ipv4/tcp_ipv4.c   |  101 +
 net/ipv4/tcp_output.c |   21 +++---
 4 files changed, 129 insertions(+), 7 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 7de4ea3..0685035 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -272,6 +272,8 @@ DECLARE_SNMP_STAT(struct tcp_mib, tcp_statistics);
 #define TCP_ADD_STATS_BH(field, val)   SNMP_ADD_STATS_BH(tcp_statistics, field, val)
 #define TCP_ADD_STATS_USER(field, val) SNMP_ADD_STATS_USER(tcp_statistics, field, val)
 
+extern void tcp_verify_wq(struct sock *sk);
+
 extern void tcp_v4_err(struct sk_buff *skb, u32);
 
 extern void tcp_shutdown (struct sock *sk, int how);
@@ -768,7 +770,11 @@ static inline __u32 tcp_current_ssthresh(const struct sock *sk)
 }
 
 /* Use define here intentionally to get WARN_ON location shown at the caller */
-#define tcp_verify_left_out(tp)	WARN_ON(tcp_left_out(tp) > tp->packets_out)
+#define tcp_verify_left_o

Re: 2.6.24-rc8-mm1 : net tcp_input.c warnings

2008-01-22 Thread Dave Young
On Jan 22, 2008 6:47 PM, Ilpo Järvinen <[EMAIL PROTECTED]> wrote:
>
> On Tue, 22 Jan 2008, Dave Young wrote:
>
> > On Jan 22, 2008 12:37 PM, Dave Young <[EMAIL PROTECTED]> wrote:
> > >
> > > On Jan 22, 2008 5:14 AM, Ilpo Järvinen <[EMAIL PROTECTED]> wrote:
> > > >
> > > > On Mon, 21 Jan 2008, Dave Young wrote:
> > > >
> > > > > Please see the following kernel messages (triggered while using a
> > > > > qemu session).
> > > > > BTW, there seems to be an e100 error message as well.
> > > > >
> > > > > PCI: Setting latency timer of device :00:1b.0 to 64
> > > > > e100: Intel(R) PRO/100 Network Driver, 3.5.23-k4-NAPI
> > > > > e100: Copyright(c) 1999-2006 Intel Corporation
> > > > > ACPI: PCI Interrupt :03:08.0[A] -> GSI 20 (level, low) -> IRQ 20
> > > > > modprobe:2331 conflicting cache attribute efaff000-efb0 
> > > > > uncached<->default
> > > > > e100: :03:08.0: e100_probe: Cannot map device registers, aborting.
> > > > > ACPI: PCI interrupt for device :03:08.0 disabled
> > > > > e100: probe of :03:08.0 failed with error -12
> > > > > eth0:  setting full-duplex.
> > > > > [ cut here ]
> > > > > WARNING: at net/ipv4/tcp_input.c:2169 tcp_mark_head_lost+0x121/0x150()
> > > > > Modules linked in: snd_seq_dummy snd_seq_oss snd_seq_midi_event 
> > > > > snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss eeprom e100 psmouse 
> > > > > snd_hda_intel snd_pcm snd_timer btusb rtc_cmos thermal bluetooth 
> > > > > rtc_core serio_raw intel_agp button processor sg snd rtc_lib i2c_i801 
> > > > > evdev agpgart soundcore dcdbas 3c59x pcspkr snd_page_alloc
> > > > > Pid: 0, comm: swapper Not tainted 2.6.24-rc8-mm1 #4
> > > > >  [] ? printk+0x0/0x20
> > > > >  [] warn_on_slowpath+0x54/0x80
> > > > >  [] ? ip_finish_output+0x128/0x2e0
> > > > >  [] ? ip_output+0xe7/0x100
> > > > >  [] ? ip_local_out+0x18/0x20
> > > > >  [] ? ip_queue_xmit+0x3dc/0x470
> > > > >  [] ? _spin_unlock_irqrestore+0x5e/0x70
> > > > >  [] ? check_pad_bytes+0x61/0x80
> > > > >  [] tcp_mark_head_lost+0x121/0x150
> > > > >  [] tcp_update_scoreboard+0x4c/0x170
> > > > >  [] tcp_fastretrans_alert+0x48a/0x6b0
> > > > >  [] tcp_ack+0x1b3/0x3a0
> > > > >  [] tcp_rcv_established+0x3eb/0x710
> > > > >  [] tcp_v4_do_rcv+0xe5/0x100
> > > > >  [] tcp_v4_rcv+0x5db/0x660
> > > >
> > > > Doh, once more these S+L things... The rest are symptoms of the first
> > > > problem.
> > >
> > > What is the S+L thing? Could you explain a bit?
>
> It means that one of the skbs is both SACKed and marked as LOST at the
> same time in the counters (this might also be due to a miscount of
> lost_out/sacked_out, not necessarily in the ->sacked bits). Such a state
> is logically invalid because it would mean that the sender thinks the
> same packet both reached the receiver and was lost in the network.
>
> Traditionally, TCP has just silently "corrected" over-estimates
> (sacked_out + lost_out > packets_out). I changed this a couple of releases
> ago because those over-estimates are often due to bugs that should be
> fixed (there have been a couple of them, but it has been very quiet on
> this front for a long time, months or even half a year already; but I
> might have broken something with the early Dec changes).
>
> These problems may originate from a bug that occurred a number of ACKs
> before the WARN_ON triggered, which makes them a bit tricky to track
> down; those WARN_ONs serve only for alerting purposes and usually do not
> point out where the bug actually occurred.
>
> I usually just ask people to include an exhaustive verifier, which
> compares the ->sacked bitmaps with the sacked_out/lost_out counters and
> reports immediately when the problem shows up, rather than waiting for
> the cheaper S+L check in the WARN_ON to trigger. I tried to assemble a
> tracking patch from the previous efforts (hopefully I got it right after
> the modifications).
>
> > > > I'm a bit worried about its
> > > > reproducibility if it took this long to show up...
> > > >
> >
> > It triggered again on my PC, just while using Firefox.
>
> ...Good, then there's some chance to catch it.
>
> --
>  i.
>
> [PATCH] [TCP]: debug S+L

Thanks. If there are new findings, I will let you know.

>
> ---
>  include/net/tcp.h     |    8 +++-
>  net/ipv4/tcp_input.c  |    6 +++
>  net/ipv4/tcp_ipv4.c   |  101 +
>  net/ipv4/tcp_output.c |   21 +++---
>  4 files changed, 129 insertions(+), 7 deletions(-)
>
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index 7de4ea3..0685035 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -272,6 +272,8 @@ DECLARE_SNMP_STAT(struct tcp_mib, tcp_statistics);
>  #define TCP_ADD_STATS_BH(field, val)   SNMP_ADD_STATS_BH(tcp_statistics, field, val)
>  #define TCP_ADD_STATS_USER(field, val) SNMP_ADD_STATS_USER(tcp_statistics, field, val)
>
> +extern void tcp_verify_wq(struct sock *sk);
> +
>  extern void tcp_v4_err(struct sk_buff *skb, u32);
>
>  extern void

Re: 2.6.24-rc8-mm1 : net tcp_input.c warnings

2008-01-22 Thread David Miller
From: "Dave Young" <[EMAIL PROTECTED]>
Date: Wed, 23 Jan 2008 09:44:30 +0800

> On Jan 22, 2008 6:47 PM, Ilpo Järvinen <[EMAIL PROTECTED]> wrote:
> > [PATCH] [TCP]: debug S+L
> 
> Thanks. If there are new findings, I will let you know.

Thanks for helping with this bug, Dave.


Re: 2.6.24-rc8-mm1 : net tcp_input.c warnings

2008-01-22 Thread Ilpo Järvinen
On Tue, 22 Jan 2008, David Miller wrote:

> From: "Dave Young" <[EMAIL PROTECTED]>
> Date: Wed, 23 Jan 2008 09:44:30 +0800
> 
> > On Jan 22, 2008 6:47 PM, Ilpo Järvinen <[EMAIL PROTECTED]> wrote:
> > > [PATCH] [TCP]: debug S+L
> > 
> > Thanks. If there are new findings, I will let you know.
> 
> Thanks for helping with this bug, Dave.

I noticed, btw, that this thing might (is likely to) trigger spuriously at
WARN_ON(sacked != tp->sacked_out), because those won't be equal when SACK
is not enabled. If that happens too often, I'll send a fixed patch for
it; however, the fact that I print tp->rx_opt.sack_ok already allows
identification of those cases, as it's zero when SACK is not
enabled.
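A minimal sketch of why the recount spuriously disagrees without SACK, and how gating the comparison on SACK being in use avoids that. It relies on the kernel's Reno emulation, where sacked_out counts duplicate ACKs when SACK is off, so no segment ever carries a SACKed bit. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant parts of struct tcp_sock. */
struct conn {
    bool sack_ok;            /* like tp->rx_opt.sack_ok: SACK in use? */
    unsigned int sacked_out; /* with SACK: SACKed segments;
                                without SACK: duplicate ACKs seen */
};

/* `recounted` is the number of segments found with the SACKed bit set.
 * Without SACK that recount is always 0 while sacked_out may be non-zero,
 * so the comparison is only meaningful when SACK is in use. */
static bool sacked_count_consistent(const struct conn *c,
                                    unsigned int recounted)
{
    if (!c->sack_ok)
        return true; /* nothing meaningful to compare */
    return recounted == c->sacked_out;
}
```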

Just ask if you need the updated debug patch.

-- 
 i.

Re: 2.6.24-rc8-mm1 : net tcp_input.c warnings

2008-01-22 Thread Dave Young
On Jan 23, 2008 3:41 PM, Ilpo Järvinen <[EMAIL PROTECTED]> wrote:
>
> On Tue, 22 Jan 2008, David Miller wrote:
>
> > From: "Dave Young" <[EMAIL PROTECTED]>
> > Date: Wed, 23 Jan 2008 09:44:30 +0800
> >
> > > On Jan 22, 2008 6:47 PM, Ilpo Järvinen <[EMAIL PROTECTED]> wrote:
> > > > [PATCH] [TCP]: debug S+L
> > >
> > > Thanks. If there are new findings, I will let you know.
> >
> > Thanks for helping with this bug, Dave.
>
> I noticed, btw, that this thing might (is likely to) trigger spuriously at
> WARN_ON(sacked != tp->sacked_out), because those won't be equal when SACK
> is not enabled. If that happens too often, I'll send a fixed patch for
> it; however, the fact that I print tp->rx_opt.sack_ok already allows
> identification of those cases, as it's zero when SACK is not
> enabled.
>
> Just ask if you need the updated debug patch.

Thanks, please send it; I would like to get it.

>
> --
>  i.


Re: 2.6.24-rc8-mm1 : net tcp_input.c warnings

2008-01-23 Thread Ilpo Järvinen
On Wed, 23 Jan 2008, Dave Young wrote:

> On Jan 23, 2008 3:41 PM, Ilpo Järvinen <[EMAIL PROTECTED]> wrote:
> >
> > On Tue, 22 Jan 2008, David Miller wrote:
> >
> > > From: "Dave Young" <[EMAIL PROTECTED]>
> > > Date: Wed, 23 Jan 2008 09:44:30 +0800
> > >
> > > > On Jan 22, 2008 6:47 PM, Ilpo Järvinen <[EMAIL PROTECTED]> wrote:
> > > > > [PATCH] [TCP]: debug S+L
> > > >
> > > > Thanks. If there are new findings, I will let you know.
> > >
> > > Thanks for helping with this bug, Dave.
> >
> > I noticed, btw, that this thing might (is likely to) trigger spuriously at
> > WARN_ON(sacked != tp->sacked_out), because those won't be equal when SACK
> > is not enabled. If that happens too often, I'll send a fixed patch for
> > it; however, the fact that I print tp->rx_opt.sack_ok already allows
> > identification of those cases, as it's zero when SACK is not
> > enabled.
> >
> > Just ask if you need the updated debug patch.
> 
> Thanks, please send it; I would like to get it.

There you go. I fixed the non-SACK case by adding tcp_is_sack checks there,
and also added two verify calls to tcp_ack to see if there's corruption
outside of TCP.
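Bracketing ACK processing with one check on entry and one on exit helps localize the corruption: a failure on entry means the state was already bad when the ACK arrived, so the damage happened elsewhere (possibly outside TCP), while a failure only on exit points at the ACK processing itself. A minimal sketch with hypothetical names and toy handlers:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of the tcp_sock counters. */
struct counters {
    unsigned int packets_out;
    unsigned int sacked_out;
    unsigned int lost_out;
};

/* The cheap invariant: sacked_out + lost_out <= packets_out. */
static bool consistent(const struct counters *c)
{
    return c->sacked_out + c->lost_out <= c->packets_out;
}

/* Where an inconsistency was first seen, relative to one ACK. */
enum check_site { CHECK_CLEAN, CHECK_BEFORE_ACK, CHECK_AFTER_ACK };

/* Run an ACK handler between two checks. */
static enum check_site checked_ack(struct counters *c,
                                   void (*handle_ack)(struct counters *))
{
    if (!consistent(c))
        return CHECK_BEFORE_ACK; /* corrupted before this ACK */
    handle_ack(c);
    if (!consistent(c))
        return CHECK_AFTER_ACK;  /* corrupted by this ACK's processing */
    return CHECK_CLEAN;
}

/* Toy handlers for illustration only. */
static void ack_one_segment(struct counters *c)
{
    /* A cumulative ACK for one segment that was neither SACKed nor lost. */
    if (c->packets_out > c->sacked_out + c->lost_out)
        c->packets_out--;
}

static void ack_buggy(struct counters *c)
{
    /* Bug: drops every segment but forgets to clear sacked_out/lost_out. */
    c->packets_out = 0;
}
```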

-- 
 i.

[PATCH] [TCP]: debug S+L

---
 include/net/tcp.h     |    8 +++-
 net/ipv4/tcp_input.c  |   10 +
 net/ipv4/tcp_ipv4.c   |  101 +
 net/ipv4/tcp_output.c |   21 +++---
 4 files changed, 133 insertions(+), 7 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 7de4ea3..0685035 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -272,6 +272,8 @@ DECLARE_SNMP_STAT(struct tcp_mib, tcp_statistics);
 #define TCP_ADD_STATS_BH(field, val)   SNMP_ADD_STATS_BH(tcp_statistics, field, val)
 #define TCP_ADD_STATS_USER(field, val) SNMP_ADD_STATS_USER(tcp_statistics, field, val)
 
+extern void tcp_verify_wq(struct sock *sk);
+
 extern void tcp_v4_err(struct sk_buff *skb, u32);
 
 extern void tcp_shutdown (struct sock *sk, int how);
@@ -768,7 +770,11 @@ static inline __u32 tcp_current_ssthresh(const struct sock *sk)
 }
 
 /* Use define here intentionally to get WARN_ON location shown at the caller */
-#define tcp_verify_left_out(tp)	WARN_ON(tcp_left_out(tp) > tp->packets_out)
+#define tcp_verify_left_out(tp)	\
+	do { \
+		WARN_ON(tcp_left_out(tp) > tp->packets_out); \
+		tcp_verify_wq((struct sock *)tp); \
+	} while(0)
 
 extern void tcp_enter_cwr(struct sock *sk, const int set_ssthresh);
 extern __u32 tcp_init_cwnd(struct tcp_sock *tp, struct dst_entry *dst);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index fa2c85c..cdacf70 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2645,6 +2645,10 @@ static void tcp_fastretrans_alert(struct sock *sk, int pkts_acked, int flag)
 	if (do_lost || (tcp_is_fack(tp) && tcp_head_timedout(sk)))
 		tcp_update_scoreboard(sk, fast_rexmit);
 	tcp_cwnd_down(sk, flag);
+
+	WARN_ON(tcp_write_queue_head(sk) == NULL);
+	WARN_ON(!tp->packets_out);
+
 	tcp_xmit_retransmit_queue(sk);
 }
 
@@ -2848,6 +2852,8 @@ static int tcp_clean_rtx_queue(struct sock *sk, int prior_fackets)
 		tcp_clear_all_retrans_hints(tp);
 	}
 
+	tcp_verify_left_out(tp);
+
 	if (skb && (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED))
 		flag |= FLAG_SACK_RENEGING;
 
@@ -3175,6 +3181,8 @@ static int tcp_ack(struct sock *sk, struct sk_buff *skb, int flag)
 	prior_fackets = tp->fackets_out;
 	prior_in_flight = tcp_packets_in_flight(tp);
 
+	tcp_verify_left_out(tp);
+
 	if (!(flag & FLAG_SLOWPATH) && after(ack, prior_snd_una)) {
 		/* Window is constant, pure forward advance.
 		 * No more checks are required.
@@ -3237,6 +3245,8 @@ static int tcp_ack(struct sock *sk, struct sk_buff *skb, int flag)
 	if ((flag & FLAG_FORWARD_PROGRESS) || !(flag & FLAG_NOT_DUP))
 		dst_confirm(sk->sk_dst_cache);
 
+	tcp_verify_left_out(tp);
+
 	return 1;
 
 no_queue:
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 9aea88b..7e8ab40 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -108,6 +108,107 @@ struct inet_hashinfo __cacheline_aligned tcp_hashinfo = {
 	.lhash_wait  = __WAIT_QUEUE_HEAD_INITIALIZER(tcp_hashinfo.lhash_wait),
 };
 
+void tcp_print_queue(struct sock *sk)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+	struct sk_buff *skb;
+	char s[50+1];
+	char h[50+1];
+	int idx = 0;
+	int i;
+
+	tcp_for_write_queue(skb, sk) {
+		if (skb == tcp_send_head(sk))
+			break;
+
+		for (i = 0; i < tcp_skb_pcount(skb); i++) {
+			if (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED) {
+				s[idx] = 'S';
+				if (TCP_SKB_CB(skb)->sacked & TCPCB_LOST)
+  

Re: 2.6.24-rc8-mm1 : net tcp_input.c warnings

2008-01-23 Thread Ilpo Järvinen
On Wed, 23 Jan 2008, Ilpo Järvinen wrote:

> On Wed, 23 Jan 2008, Dave Young wrote:
> 
> > On Jan 23, 2008 3:41 PM, Ilpo Järvinen <[EMAIL PROTECTED]> wrote:
> > >
> > > On Tue, 22 Jan 2008, David Miller wrote:
> > >
> > > > From: "Dave Young" <[EMAIL PROTECTED]>
> > > > Date: Wed, 23 Jan 2008 09:44:30 +0800
> > > >
> > > > > On Jan 22, 2008 6:47 PM, Ilpo Järvinen <[EMAIL PROTECTED]> wrote:
> > > > > > [PATCH] [TCP]: debug S+L
> > > > >
> > > > > Thanks, If there's new findings I will let you know.
> > > >
> > > > Thanks for helping with this bug Dave.
> > >
> > > I noticed btw that that thing might (is likely to) spuriously trigger at
> > > WARN_ON(sacked != tp->sacked_out); because those won't be equal when SACK
> > > is not enabled. If that happens too often, I'll send a fixed patch for
> > > it, but since I print tp->rx_opt.sack_ok, those cases can already be
> > > identified: it is zero when SACK is not enabled.
> > >
> > > Just ask if you need the updated debug patch.
> > 
> > Thanks,  please send, I would like to get it.
> 
> There you go. I fixed the non-SACK case by adding tcp_is_sack checks there and
> also added two verify calls to tcp_ack to see if there's corruption outside of
> TCP.

There's some discussion about a problem that is very likely the same as the
one here (sorry for forgetting to cc you there; things progressed rapidly):

   http://marc.info/?t=12010717423&r=1&w=2
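A minimal userspace sketch of the non-SACK issue described above, assuming a simplified segment queue instead of the real sk_buff write queue: the sacked count is recomputed from per-segment SACKED_ACKED flags and compared with sacked_out only when SACK is in use. With NewReno, sacked_out is bumped per duplicate ACK (tcp_add_reno_sack) while no segment ever carries the SACKED_ACKED bit, so a mismatch is expected there. The struct layouts, the SEG_SACKED_ACKED value and the function names are hypothetical, not kernel definitions.

```c
#include <stdbool.h>

/* Hypothetical flag value; not the kernel's TCPCB_SACKED_ACKED. */
#define SEG_SACKED_ACKED 0x1

/* Simplified stand-in for one skb on the retransmit queue. */
struct seg {
	unsigned int flags;
	int pcount;			/* segments in this skb, like tcp_skb_pcount() */
};

/* Simplified stand-in for the relevant tcp_sock fields. */
struct conn {
	const struct seg *queue;	/* head first, up to the send head */
	int len;
	bool sack_ok;			/* plays the role of tcp_is_sack() */
	int sacked_out;			/* SACKed segments, or dupACKs with NewReno */
};

/* Recount S from the per-segment SACKED_ACKED bits. */
static int recount_sacked(const struct conn *c)
{
	int sacked = 0;

	for (int i = 0; i < c->len; i++)
		if (c->queue[i].flags & SEG_SACKED_ACKED)
			sacked += c->queue[i].pcount;
	return sacked;
}

/* True when SACK is enabled and the recount disagrees with sacked_out.
 * Without SACK the two legitimately differ, so the check is skipped. */
static bool sacked_mismatch(const struct conn *c)
{
	if (!c->sack_ok)
		return false;
	return recount_sacked(c) != c->sacked_out;
}
```

With five unSACKed segments and sacked_out = 1 (the kind of "S: 0 vs 1" state seen in the NewReno logs in this thread), sacked_mismatch() returns false when sack_ok is false and true when it is true.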


-- 
 i.

Re: 2.6.24-rc8-mm1 : net tcp_input.c warnings

2008-01-23 Thread Dave Young
On Jan 23, 2008 7:01 PM, Ilpo Järvinen <[EMAIL PROTECTED]> wrote:
> On Wed, 23 Jan 2008, Ilpo Järvinen wrote:
>
> > On Wed, 23 Jan 2008, Dave Young wrote:
> >
> > > On Jan 23, 2008 3:41 PM, Ilpo Järvinen <[EMAIL PROTECTED]> wrote:
> > > >
> > > > On Tue, 22 Jan 2008, David Miller wrote:
> > > >
> > > > > From: "Dave Young" <[EMAIL PROTECTED]>
> > > > > Date: Wed, 23 Jan 2008 09:44:30 +0800
> > > > >
> > > > > > On Jan 22, 2008 6:47 PM, Ilpo Järvinen <[EMAIL PROTECTED]> wrote:
> > > > > > > [PATCH] [TCP]: debug S+L
> > > > > >
> > > > > > Thanks, If there's new findings I will let you know.
> > > > >
> > > > > Thanks for helping with this bug Dave.
> > > >
> > > > I noticed btw that that thing might (is likely to) spuriously trigger at
> > > > WARN_ON(sacked != tp->sacked_out); because those won't be equal when SACK
> > > > is not enabled. If that happens too often, I'll send a fixed patch for
> > > > it, but since I print tp->rx_opt.sack_ok, those cases can already be
> > > > identified: it is zero when SACK is not enabled.
> > > >
> > > > Just ask if you need the updated debug patch.
> > >
> > > Thanks,  please send, I would like to get it.
> >
> > There you go. I fixed the non-SACK case by adding tcp_is_sack checks there and
> > also added two verify calls to tcp_ack to see if there's corruption outside of
> > TCP.
>
> There's some discussion about a problem that is very likely the same as the
> one here (sorry for forgetting to cc you there; things progressed rapidly):
>
>http://marc.info/?t=12010717423&r=1&w=2

Thanks.

New warning triggered with your debug patch:

ACPI: PCI Interrupt :00:1b.0[A] -> GSI 16 (level, low) -> IRQ 16
PCI: Setting latency timer of device :00:1b.0 to 64
e100: Intel(R) PRO/100 Network Driver, 3.5.23-k4-NAPI
e100: Copyright(c) 1999-2006 Intel Corporation
ACPI: PCI Interrupt :03:08.0[A] -> GSI 20 (level, low) -> IRQ 20
e100: eth1: e100_probe: addr 0xefaff000, irq 20, MAC addr 00:13:72:e7:4d:66
eth0:  setting full-duplex.
[ cut here ]
WARNING: at net/ipv4/tcp_ipv4.c:197 tcp_verify_wq+0x1b6/0x1c0()
Modules linked in: snd_seq_dummy snd_seq_oss snd_seq_midi_event
snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss eeprom e100 psmouse
snd_hda_intel snd_pcm snd_timer btusb bluetooth serio_raw snd 3c59x sg
evdev thermal soundcore rtc_cmos snd_page_alloc rtc_core rtc_lib
i2c_i801 processor button intel_agp dcdbas pcspkr agpgart
Pid: 0, comm: swapper Not tainted 2.6.24-rc8-mm1 #8
 [] ? have_callable_console+0x20/0x30
 [] warn_on_slowpath+0x54/0x80
 [] ? timer_list_show_tickdevices+0xf0/0x110
 [] ? native_sched_clock+0x85/0xe0
 [] ? put_lock_stats+0x21/0x30
 [] ? lock_release_holdtime+0x60/0x80
 [] ? check_bytes_and_report+0x24/0xc0
 [] ? check_bytes_and_report+0x24/0xc0
 [] ? check_pad_bytes+0x61/0x80
 [] tcp_verify_wq+0x1b6/0x1c0
 [] ? tcp_clean_rtx_queue+0x2d9/0x5b0
 [] tcp_add_reno_sack+0x30/0x50
 [] tcp_fastretrans_alert+0x3d2/0x700
 [] tcp_ack+0x1b3/0x3a0
 [] tcp_rcv_established+0x3eb/0x710
 [] tcp_v4_do_rcv+0xe5/0x100
 [] tcp_v4_rcv+0x5db/0x660
 [] ? tcp_v4_rcv+0x387/0x660
 [] ? ip_local_deliver_finish+0x2d/0x1d0
 [] ip_local_deliver_finish+0x84/0x1d0
 [] ? ip_local_deliver_finish+0x2d/0x1d0
 [] ? __lock_release+0x47/0x70
 [] ip_local_deliver+0xb7/0xc0
 [] ip_rcv_finish+0xb2/0x3c0
 [] ? sock_def_readable+0x48/0xa0
 [] ? sock_queue_rcv_skb+0xb1/0x1a0
 [] ? sock_queue_rcv_skb+0xf7/0x1a0
 [] ip_rcv+0x18f/0x290
 [] ? packet_rcv_spkt+0xd0/0x130
 [] netif_receive_skb+0x2b6/0x330
 [] ? netif_receive_skb+0x127/0x330
 [] ? process_backlog+0x83/0x100
 [] process_backlog+0x8e/0x100
 [] net_rx_action+0x13c/0x230
 [] ? net_rx_action+0x59/0x230
 [] ? __do_softirq+0x6e/0x120
 [] __do_softirq+0x93/0x120
 [] do_softirq+0x7a/0x80
 [] irq_exit+0x65/0x90
 [] do_IRQ+0x41/0x80
 [] ? trace_hardirqs_on+0xb9/0x130
 [] common_interrupt+0x2e/0x34
 [] ? mwait_idle_with_hints+0x40/0x50
 [] ? mwait_idle+0x0/0x20
 [] mwait_idle+0x12/0x20
 [] cpu_idle+0x61/0x110
 [] rest_init+0x5d/0x60
 [] start_kernel+0x1fa/0x260
 [] ? unknown_bootoption+0x0/0x130
 ===
---[ end trace 14b601818e6903ac ]---
P: 5 L: 0 vs 0 S: 0 vs 1 w: 2044790889-2044796616 (0)
TCP wq(s)  <
TCP wq(h) +++h+<
l0 s1 f0 p5 seq: su2044790889 hs2044795029 sn2044796616
[ cut here ]
WARNING: at net/ipv4/tcp_ipv4.c:197 tcp_verify_wq+0x1b6/0x1c0()
Modules linked in: snd_seq_dummy snd_seq_oss snd_seq_midi_event
snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss eeprom e100 psmouse
snd_hda_intel snd_pcm snd_timer btusb bluetooth serio_raw snd 3c59x sg
evdev thermal soundcore rtc_cmos snd_page_alloc rtc_core rtc_lib
i2c_i801 processor button intel_agp dcdbas pcspkr agpgart
Pid: 0, comm: swapper Not tainted 2.6.24-rc8-mm1 #8
 [] ? have_callable_console+0x20/0x30
 [] warn_on_slowpath+0x54/0x80
 [] ? generic_make_request+0x1c0/0x2e0
 [] ? printk+0x18/0x20
 [] ? tcp_print_queue+0x1a4/0x230
 [] ? vprintk+0x308/0x320
 [] tcp_verify_wq+0x1b6/0x1c

Re: 2.6.24-rc8-mm1 : net tcp_input.c warnings

2008-01-24 Thread Ilpo Järvinen
On Thu, 24 Jan 2008, Dave Young wrote:

Hi Dave (& others),

> Thanks.

Thanks a lot. At first I ignored all these because they occurred 
with newreno, but then I looked again... :-/

> New warning triggered with your debug patch:

This was probably with the earlier version I sent you, because it still 
flags this case, which is itself valid:

> P: 5 L: 0 vs 0 S: 0 vs 1 w: 2044790889-2044796616 (0)

...snip... this is still an OK state (S+L <= P):

> P: 5 L: 0 vs 0 S: 0 vs 3 w: 2044790889-2044796616 (0)
> TCP wq(s)  <
> TCP wq(h) +++h+<
> l0 s3 f0 p5 seq: su2044790889 hs2044795029 sn2044796616
> [ cut here ]
> WARNING: at net/ipv4/tcp_input.c:2169 tcp_mark_head_lost+0x122/0x150()
> Modules linked in: snd_seq_dummy snd_seq_oss snd_seq_midi_event
> snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss eeprom e100 psmouse
> snd_hda_intel snd_pcm snd_timer btusb bluetooth serio_raw snd 3c59x sg
> evdev thermal soundcore rtc_cmos snd_page_alloc rtc_core rtc_lib
> i2c_i801 processor button intel_agp dcdbas pcspkr agpgart
> Pid: 0, comm: swapper Not tainted 2.6.24-rc8-mm1 #8
>  [] ? have_callable_console+0x20/0x30
>  [] warn_on_slowpath+0x54/0x80
>  [] ? tcp_print_queue+0x1a4/0x230
>  [] ? vprintk+0x308/0x320
>  [] ? vprintk+0x308/0x320
>  [] ? vprintk+0x308/0x320
>  [] ? tcp_verify_wq+0x116/0x1c0
>  [] tcp_mark_head_lost+0x122/0x150
>  [] tcp_update_scoreboard+0x4a/0x190
>  [] tcp_fastretrans_alert+0x4da/0x700
>  [] tcp_ack+0x1b3/0x3a0
>  [] tcp_rcv_established+0x3eb/0x710
>  [] tcp_v4_do_rcv+0xe5/0x100
>  [] tcp_v4_rcv+0x5db/0x660
>  [] ? tcp_v4_rcv+0x387/0x660
>  [] ? ip_local_deliver_finish+0x2d/0x1d0
>  [] ip_local_deliver_finish+0x84/0x1d0
>  [] ? ip_local_deliver_finish+0x2d/0x1d0
>  [] ? __lock_release+0x47/0x70
>  [] ip_local_deliver+0xb7/0xc0
>  [] ip_rcv_finish+0xb2/0x3c0
>  [] ? sock_def_readable+0x48/0xa0
>  [] ? sock_queue_rcv_skb+0xb1/0x1a0
>  [] ? sock_queue_rcv_skb+0xf7/0x1a0
>  [] ip_rcv+0x18f/0x290
>  [] ? packet_rcv_spkt+0xd0/0x130
>  [] netif_receive_skb+0x2b6/0x330
>  [] ? netif_receive_skb+0x127/0x330
>  [] ? process_backlog+0x83/0x100
>  [] process_backlog+0x8e/0x100
>  [] net_rx_action+0x13c/0x230
>  [] ? net_rx_action+0x59/0x230
>  [] __do_softirq+0x93/0x120
>  [] do_softirq+0x7a/0x80
>  [] irq_exit+0x65/0x90
>  [] do_IRQ+0x41/0x80
>  [] ? trace_hardirqs_on+0xb9/0x130
>  [] common_interrupt+0x2e/0x34
>  [] ? mwait_idle_with_hints+0x40/0x50
>  [] ? mwait_idle+0x0/0x20
>  [] mwait_idle+0x12/0x20
>  [] cpu_idle+0x61/0x110
>  [] rest_init+0x5d/0x60
>  [] start_kernel+0x1fa/0x260
>  [] ? unknown_bootoption+0x0/0x130
>  ===
> ---[ end trace 14b601818e6903ac ]---
> [ cut here ]
> WARNING: at net/ipv4/tcp_ipv4.c:197 tcp_verify_wq+0x1b6/0x1c0()
> Modules linked in: snd_seq_dummy snd_seq_oss snd_seq_midi_event
> snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss eeprom e100 psmouse
> snd_hda_intel snd_pcm snd_timer btusb bluetooth serio_raw snd 3c59x sg
> evdev thermal soundcore rtc_cmos snd_page_alloc rtc_core rtc_lib
> i2c_i801 processor button intel_agp dcdbas pcspkr agpgart
> Pid: 0, comm: swapper Not tainted 2.6.24-rc8-mm1 #8
>  [] ? have_callable_console+0x20/0x30
>  [] warn_on_slowpath+0x54/0x80
>  [] ? print_oops_end_marker+0x2a/0x30
>  [] ? warn_on_slowpath+0x59/0x80
>  [] ? tcp_print_queue+0x1a4/0x230
>  [] ? vprintk+0x308/0x320
>  [] ? vprintk+0x308/0x320
>  [] tcp_verify_wq+0x1b6/0x1c0
>  [] ? tcp_verify_wq+0x116/0x1c0
>  [] tcp_mark_head_lost+0xcc/0x150
>  [] tcp_update_scoreboard+0x4a/0x190
>  [] tcp_fastretrans_alert+0x4da/0x700
>  [] tcp_ack+0x1b3/0x3a0
>  [] tcp_rcv_established+0x3eb/0x710
>  [] tcp_v4_do_rcv+0xe5/0x100
>  [] tcp_v4_rcv+0x5db/0x660
>  [] ? tcp_v4_rcv+0x387/0x660
>  [] ? ip_local_deliver_finish+0x2d/0x1d0
>  [] ip_local_deliver_finish+0x84/0x1d0
>  [] ? ip_local_deliver_finish+0x2d/0x1d0
>  [] ? __lock_release+0x47/0x70
>  [] ip_local_deliver+0xb7/0xc0
>  [] ip_rcv_finish+0xb2/0x3c0
>  [] ? sock_def_readable+0x48/0xa0
>  [] ? sock_queue_rcv_skb+0xb1/0x1a0
>  [] ? sock_queue_rcv_skb+0xf7/0x1a0
>  [] ip_rcv+0x18f/0x290
>  [] ? packet_rcv_spkt+0xd0/0x130
>  [] netif_receive_skb+0x2b6/0x330
>  [] ? netif_receive_skb+0x127/0x330
>  [] ? process_backlog+0x83/0x100
>  [] process_backlog+0x8e/0x100
>  [] net_rx_action+0x13c/0x230
>  [] ? net_rx_action+0x59/0x230
>  [] __do_softirq+0x93/0x120
>  [] do_softirq+0x7a/0x80
>  [] irq_exit+0x65/0x90
>  [] do_IRQ+0x41/0x80
>  [] ? trace_hardirqs_on+0xb9/0x130
>  [] common_interrupt+0x2e/0x34
>  [] ? mwait_idle_with_hints+0x40/0x50
>  [] ? mwait_idle+0x0/0x20
>  [] mwait_idle+0x12/0x20
>  [] cpu_idle+0x61/0x110
>  [] rest_init+0x5d/0x60
>  [] start_kernel+0x1fa/0x260
>  [] ? unknown_bootoption+0x0/0x130
>  ===
> ---[ end trace 14b601818e6903ac ]---

...But this no longer is. Even more, L: 5 is not a valid state at this 
point at all (it should only happen if we went to RTO, but that would reset 
S to zero with newreno):

> P: 5 L: 5 vs 5 S: 0 vs 3 w: 2044790
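The state check itself is simple. As a sketch, assuming the "l<lost> s<sacked> f<fackets> p<packets>" summary line printed by the debug patch carries lost_out, sacked_out, fackets_out and packets_out in that order (this matches the "P: 5 L: 0 vs 0 S: 0 vs 1" and "l0 s1 f0 p5" pairs in the logs), a log line can be checked against the same S+L <= P invariant that tcp_verify_left_out() tests through tcp_left_out(), i.e. sacked_out + lost_out. The helper name check_scoreboard_line() is hypothetical.

```c
#include <stdio.h>

/* Check one "lN sN fN pN ..." summary line from the debug output against
 * S + L <= P.  Returns 1 if the invariant holds, 0 if it is violated and
 * -1 if the line does not start with such a summary.  fackets_out is
 * parsed but not used by this particular check. */
static int check_scoreboard_line(const char *line)
{
	int lost, sacked, fackets, packets;

	if (sscanf(line, "l%d s%d f%d p%d",
		   &lost, &sacked, &fackets, &packets) != 4)
		return -1;
	return sacked + lost <= packets;
}
```

Fed "l0 s3 f0 p5 ..." it returns 1 (3 <= 5); fed "l5 s3 f0 p5 ...", the state reported together with "L: 5", it returns 0 (8 > 5).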

Re: 2.6.24-rc8-mm1 : net tcp_input.c warnings

2008-01-24 Thread Ilpo Järvinen
On Thu, 24 Jan 2008, Ilpo Järvinen wrote:

> And anyway, there were some fackets_out related 
> problems reported as well and this doesn't help with that, but I think I've 
> lost track of who was seeing it due to the large number of reports :-). Could 
> somebody refresh my memory? I currently don't have time to dig it 
> up from the archives (at least not this week).

Here's the updated debug patch for net-2.6.25/mm for tracking 
fackets_out inconsistencies (it won't work for trees which don't
include net-2.6.25, mm does of course :-)).

I hope I got it into good shape this time, avoiding spurious stack traces 
while still maintaining 100% accuracy, but it's not a simple one-liner so I 
might have missed something... :-)

I'd suggest that people trying this first apply the newreno fix from 
the previous mail, so that the already-fixed case doesn't trigger.

-- 
 i.

--
[PATCH] [TCP]: debug S+L (for net-2.6.25 / mm, incompatible with mainline)

---
 include/net/tcp.h     |    5 ++-
 net/ipv4/tcp_input.c  |   18 +++-
 net/ipv4/tcp_ipv4.c   |  127 +
 net/ipv4/tcp_output.c |   23 +++--
 4 files changed, 165 insertions(+), 8 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 7de4ea3..552aa71 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -272,6 +272,9 @@ DECLARE_SNMP_STAT(struct tcp_mib, tcp_statistics);
 #define TCP_ADD_STATS_BH(field, val)	SNMP_ADD_STATS_BH(tcp_statistics, field, val)
 #define TCP_ADD_STATS_USER(field, val)	SNMP_ADD_STATS_USER(tcp_statistics, field, val)
 
+extern void tcp_print_queue(struct sock *sk);
+extern void tcp_verify_wq(struct sock *sk);
+
 extern void tcp_v4_err(struct sk_buff *skb, u32);
 
 extern void tcp_shutdown (struct sock *sk, int how);
@@ -768,7 +771,7 @@ static inline __u32 tcp_current_ssthresh(const struct sock *sk)
 }
 
 /* Use define here intentionally to get WARN_ON location shown at the caller */
-#define tcp_verify_left_out(tp)	WARN_ON(tcp_left_out(tp) > tp->packets_out)
+#define tcp_verify_left_out(tp)	tcp_verify_wq((struct sock *)tp)
 
 extern void tcp_enter_cwr(struct sock *sk, const int set_ssthresh);
 extern __u32 tcp_init_cwnd(struct tcp_sock *tp, struct dst_entry *dst);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 19c449f..c897c93 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1426,8 +1426,10 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb,
 	int first_sack_index;
 
 	if (!tp->sacked_out) {
-		if (WARN_ON(tp->fackets_out))
+		if (WARN_ON(tp->fackets_out)) {
+			tcp_verify_left_out(tp);
 			tp->fackets_out = 0;
+		}
 		tcp_highest_sack_reset(sk);
 	}
 
@@ -2136,6 +2138,8 @@ static void tcp_mark_head_lost(struct sock *sk, int packets, int fast_rexmit)
 	struct sk_buff *skb;
 	int cnt;
 
+	tcp_verify_left_out(tp);
+
 	BUG_TRAP(packets <= tp->packets_out);
 	if (tp->lost_skb_hint) {
 		skb = tp->lost_skb_hint;
@@ -2501,6 +2505,8 @@ static void tcp_fastretrans_alert(struct sock *sk, int pkts_acked, int flag)
 				    (tcp_fackets_out(tp) > tp->reordering));
 	int fast_rexmit = 0;
 
+	tcp_verify_left_out(tp);
+
 	if (WARN_ON(!tp->packets_out && tp->sacked_out))
 		tp->sacked_out = 0;
 	if (WARN_ON(!tp->sacked_out && tp->fackets_out))
@@ -2645,6 +2651,10 @@ static void tcp_fastretrans_alert(struct sock *sk, int pkts_acked, int flag)
 	if (do_lost || (tcp_is_fack(tp) && tcp_head_timedout(sk)))
 		tcp_update_scoreboard(sk, fast_rexmit);
 	tcp_cwnd_down(sk, flag);
+
+	WARN_ON(tcp_write_queue_head(sk) == NULL);
+	WARN_ON(!tp->packets_out);
+
 	tcp_xmit_retransmit_queue(sk);
 }
 
@@ -2848,6 +2858,8 @@ static int tcp_clean_rtx_queue(struct sock *sk, int prior_fackets)
 		tcp_clear_all_retrans_hints(tp);
 	}
 
+	tcp_verify_left_out(tp);
+
 	if (skb && (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED))
 		flag |= FLAG_SACK_RENEGING;
 
@@ -3175,6 +3187,8 @@ static int tcp_ack(struct sock *sk, struct sk_buff *skb, int flag)
 	prior_fackets = tp->fackets_out;
 	prior_in_flight = tcp_packets_in_flight(tp);
 
+	tcp_verify_left_out(tp);
+
 	if (!(flag & FLAG_SLOWPATH) && after(ack, prior_snd_una)) {
 		/* Window is constant, pure forward advance.
 		 * No more checks are required.
@@ -3237,6 +3251,8 @@ static int tcp_ack(struct sock *sk, struct sk_buff *skb, int flag)
 	if ((flag & FLAG_FORWARD_PROGRESS) || !(flag & FLAG_NOT_DUP))
 		dst_confirm(sk->sk_dst_cache);
 
+	tcp_verify_left_out(tp);
+
 	return 1;
 
 no_queue:
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 9aea8
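The comment kept in include/net/tcp.h ("Use define here intentionally to get WARN_ON location shown at the caller") explains why the traces in this thread keep reporting net/ipv4/tcp_ipv4.c:197: a WARN_ON that lives inside the tcp_verify_wq() function always names that line, whichever caller triggered it, while a WARN_ON expanded directly in the caller names the caller's line. A small userspace sketch of the difference, using a hypothetical WARN_ON stand-in built on a helper function (not the kernel's real macro) and hypothetical struct and function names:

```c
#include <stdio.h>

static int last_warn_line;	/* where the most recent warning fired */

static int warn_on_impl(int cond, const char *file, int line)
{
	if (cond) {
		last_warn_line = line;
		fprintf(stderr, "WARNING: at %s:%d\n", file, line);
	}
	return cond;
}

/* Hypothetical WARN_ON stand-in: as a macro, __FILE__ and __LINE__
 * expand where it is used, and it evaluates to the condition so it can
 * sit inside if (...) like the kernel's WARN_ON(). */
#define WARN_ON(cond) warn_on_impl(!!(cond), __FILE__, __LINE__)

struct counts {
	int packets_out, sacked_out, lost_out;
};

/* Macro form: the warning names the caller's line.  The do { } while (0)
 * wrapper makes the expansion behave as one statement, e.g. after an
 * unbraced if. */
#define verify_left_out(c) \
	do { \
		WARN_ON((c)->sacked_out + (c)->lost_out > (c)->packets_out); \
	} while (0)

/* Function form: every warning names this line, whoever called it. */
static void verify_left_out_fn(const struct counts *c)
{
	WARN_ON(c->sacked_out + c->lost_out > c->packets_out);
}
```

With S = 3 and L = 5 against P = 5, verify_left_out() reports the line it was invoked from, while verify_left_out_fn() always reports its own line, which is what happens once the check is moved into a function.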

Re: 2.6.24-rc8-mm1 : net tcp_input.c warnings

2008-01-24 Thread Krishna Kumar2
Hi Ilpo,

I have tried parallel iperfs with this patch and don't get any more warnings.
I will run overnight to be sure.

thanks,

- KK

[EMAIL PROTECTED] wrote on 01/24/2008 03:24:18 PM:

> On Thu, 24 Jan 2008, Dave Young wrote:
>
> Hi Dave (& others),
>
> > Thanks.
>
> Thanks a lot. At first I ignored all these because they occurred
> with newreno, but then I looked again... :-/
>
> > New warning triggered with your debug patch:
>
> This was probably with the earlier version I sent you, because it still
> flags this case, which is itself valid:
>
> > P: 5 L: 0 vs 0 S: 0 vs 1 w: 2044790889-2044796616 (0)
>
> ...snip... this is still an OK state (S+L <= P):
>
> > P: 5 L: 0 vs 0 S: 0 vs 3 w: 2044790889-2044796616 (0)
> > TCP wq(s)  <
> > TCP wq(h) +++h+<
> > l0 s3 f0 p5 seq: su2044790889 hs2044795029 sn2044796616

Re: 2.6.24-rc8-mm1 : net tcp_input.c warnings

2008-01-24 Thread Kamalesh Babulal
On Thu, Jan 24, 2008 at 11:54:18AM +0200, Ilpo Järvinen wrote:
> On Thu, 24 Jan 2008, Dave Young wrote:
> 
> Hi Dave (& others),
> 
> > Thanks.
> 
> Thanks a lot. At first I ignored all these because they occurred 
> with newreno, but then I looked again... :-/
> 
> > New warning triggered with your debug patch:
> 
> This was probably with the earlier version I sent you, because it still 
> flags this case, which is itself valid:
> 
> > P: 5 L: 0 vs 0 S: 0 vs 1 w: 2044790889-2044796616 (0)
> 
> ...snip... this is still an OK state (S+L <= P):
> 
> > P: 5 L: 0 vs 0 S: 0 vs 3 w: 2044790889-2044796616 (0)
> > TCP wq(s)  <
> > TCP wq(h) +++h+<
> > l0 s3 f0 p5 seq: su2044790889 hs2044795029 sn2044796616
> > [ cut here ]
> > WARNING: at net/ipv4/tcp_input.c:2169 tcp_mark_head_lost+0x122/0x150()
> > Modules linked in: snd_seq_dummy snd_seq_oss snd_seq_midi_event
> > snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss eeprom e100 psmouse
> > snd_hda_intel snd_pcm snd_timer btusb bluetooth serio_raw snd 3c59x sg
> > evdev thermal soundcore rtc_cmos snd_page_alloc rtc_core rtc_lib
> > i2c_i801 processor button intel_agp dcdbas pcspkr agpgart
> > Pid: 0, comm: swapper Not tainted 2.6.24-rc8-mm1 #8
> >  [] ? have_callable_console+0x20/0x30
> >  [] warn_on_slowpath+0x54/0x80
> >  [] ? tcp_print_queue+0x1a4/0x230
> >  [] ? vprintk+0x308/0x320
.
.

.
.

> > ---[ end trace 14b601818e6903ac ]---
> 
> ...But this no longer is. Even more, L: 5 is not a valid state at this 
> point at all (it should only happen if we went to RTO, but that would reset 
> S to zero with newreno):
> 
> > P: 5 L: 5 vs 5 S: 0 vs 3 w: 2044790889-2044796616 (0)
> > TCP wq(s) l<
> > TCP wq(h) +++h+<
> > l5 s3 f0 p5 seq: su2044790889 hs2044795029 sn2044796616
> 
> Surprisingly, this was the first time the WARN_ON for left_out reported the 
> correct location. This also explains why the patch I sent to Krishna 
> didn't print anything (it never reached the printing code because I forgot 
> to add the L+S>P check to the state-checking if).
> 
> ...so please, could you (other than Denys) try this patch; it should 
> solve the issue. And Denys, could you confirm (and if necessary double 
> check) that the kernel you saw this similar problem with is the pure 
> Linus' mainline, i.e., without any net-2.6.25 or mm bits, please; if so, 
> that problem persists. And anyway, there were some fackets_out related 
> problems reported as well and this doesn't help with that, but I think I've 
> lost track of who was seeing it due to the large number of reports :-). Could 
> somebody refresh my memory? I currently don't have time to dig it 
> up from the archives (at least not this week).
> 
> 
> -- 
>  i.
> 
> --
> [PATCH] [TCP]: NewReno must count every skb while marking losses
> 
> NewReno should add cnt per skb (as with FACK) instead of depending
> on SACKED_ACKED bits, which won't be set with it at all.
> Effectively, NewReno should always exit after the first
> iteration anyway (or immediately if there's already a head in
> lost_out).
> 
> This was fixed earlier in net-2.6.25 but got reverted along with other
> stuff, and I didn't notice that this is still necessary (in fact, I
> wasn't even considering this case while trying to figure out the
> reports, because I had a different version of the code in mind than
> what was really there).
> 
> This should fix the WARN_ONs in the TCP code which, as a result of
> this, triggered multiple times at every place where we check this
> invariant.
> 
> Special thanks to Dave Young <[EMAIL PROTECTED]> and
> Krishna Kumar2 <[EMAIL PROTECTED]> for testing my debug
> patches.

Hi,

Thanks, after applying the patch the warning no longer appears.

Tested-by: Kamalesh Babulal <[EMAIL PROTECTED]>
> Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]>
> ---
>  net/ipv4/tcp_input.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 295490e..aa409a5 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -2156,7 +2156,7 @@ static void tcp_mark_head_lost(struct sock *sk, int packets, int fast_rexmit)
>   tp->lost_skb_hint = skb;
>   tp->lost_cnt_hint = cnt;
> 
> - if (tcp_is_fack(tp) ||
> + if (tcp_is_fack(tp) || tcp_is_reno(tp) ||
>   (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED))
>   cnt += tcp_skb_pcount(skb);
> 
> -- 
> 1.5.2.2
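To see why the missing tcp_is_reno() term produces exactly the "l5 s3 f0 p5" state, here is a heavily simplified userspace model of the head-marking loop. It assumes single-segment skbs, a hypothetical flag encoding, and a caller asking for packets = 1 (an assumed value for illustration), and it leaves out the high_seq and fast_rexmit conditions of the real tcp_mark_head_lost(); it is a sketch, not the kernel code. With NewReno, sacked_out = 3 comes from duplicate ACKs and no skb carries SACKED_ACKED, so under the old condition cnt never grows and every skb gets marked lost.

```c
#include <stdbool.h>

/* Hypothetical flag values, not the kernel's TCPCB_* bits. */
#define SEG_SACKED_ACKED 0x1
#define SEG_LOST         0x2

struct seg {
	unsigned int flags;
	int pcount;		/* like tcp_skb_pcount() */
};

struct conn {
	struct seg *queue;	/* retransmit queue, head first */
	int len;
	int packets_out;	/* P */
	int sacked_out;		/* S: dupACK count with NewReno */
	int lost_out;		/* L */
};

/* Walk from the head, marking skbs lost until more than 'packets'
 * segments have been counted.  count_every_skb = true models the fixed
 * condition (FACK or NewReno count each skb); false models the old
 * NewReno behaviour, where only SACKED_ACKED skbs were counted. */
static void mark_head_lost(struct conn *c, int packets, bool count_every_skb)
{
	int cnt = 0;

	for (int i = 0; i < c->len; i++) {
		struct seg *s = &c->queue[i];

		if (count_every_skb || (s->flags & SEG_SACKED_ACKED))
			cnt += s->pcount;
		if (cnt > packets)
			break;
		if (!(s->flags & (SEG_SACKED_ACKED | SEG_LOST))) {
			s->flags |= SEG_LOST;
			c->lost_out += s->pcount;
		}
	}
}

/* The S + L <= P invariant checked by tcp_verify_left_out(). */
static bool left_out_ok(const struct conn *c)
{
	return c->sacked_out + c->lost_out <= c->packets_out;
}
```

Starting from P = 5, S = 3, L = 0 with five unmarked skbs, the old condition ends with L = 5 (S + L = 8 > 5), while the fixed one marks only the head and ends with L = 1.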


-- 
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.


Re: 2.6.24-rc8-mm1 : net tcp_input.c warnings

2008-01-24 Thread Dave Young
On Jan 24, 2008 5:54 PM, Ilpo Järvinen <[EMAIL PROTECTED]> wrote:
> On Thu, 24 Jan 2008, Dave Young wrote:
>
> Hi Dave (& others),
>
> > Thanks.
>
> Thanks a lot. At first I ignored all these because they occurred
> with newreno, but then I looked again... :-/
>
> > New warning triggered with your debug patch:
>
> This was probably with the earlier version I sent you, because it still
> flags this case, which is itself valid:
>
> > P: 5 L: 0 vs 0 S: 0 vs 1 w: 2044790889-2044796616 (0)
>
> ...snip... this is still an OK state (S+L <= P):
>
> > P: 5 L: 0 vs 0 S: 0 vs 3 w: 2044790889-2044796616 (0)
> > TCP wq(s)  <
> > TCP wq(h) +++h+<
> > l0 s3 f0 p5 seq: su2044790889 hs2044795029 sn2044796616