Summary: On a multi-homed box, after turning on
/proc/sys/net/ipv4/icmp_errors_use_inbound_ifaddr, the kernel periodically
oopses while looking up the source address to use for an ICMP error sent
in response to an ipt_REJECT jump.
I'm still trying to figure out what makes this test case unique. The oops
occurs sporadically with many Fedora builds of 2.6.{18,19,20}, none of which
appear to carry any patches in this area of the kernel. Just _maybe_
it's because of a combination of dogleg routing and overloading one vlan
with multiple subnets:
# ip route list table main proto kernel scope link
10.10.36.208/28 dev eth0.1 src 10.10.36.210
10.10.36.224/27 dev eth0.1 src 10.10.36.226
10.10.1.0/24 dev eth0.5 src 10.10.1.48
10.10.2.0/24 dev eth0.9 src 10.10.2.27
10.10.10.0/24 dev eth0.8 src 10.10.10.1
Policy routing has been disabled and the problem still occurs. Only one
other route remains, the default gateway:
# ip route list table main proto boot
default via 10.10.36.209 dev eth0.1
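To make the "one vlan, multiple subnets" point concrete, here is a small Python sketch that groups the connected routes above by device and checks which prefix contains the default gateway. The ROUTES list is copied by hand from the output above; the variable names are just for illustration and nothing here comes from kernel tooling.

```python
import ipaddress
from collections import defaultdict

# (prefix, device, preferred source) copied from "ip route list" above.
ROUTES = [
    ("10.10.36.208/28", "eth0.1", "10.10.36.210"),
    ("10.10.36.224/27", "eth0.1", "10.10.36.226"),
    ("10.10.1.0/24", "eth0.5", "10.10.1.48"),
    ("10.10.2.0/24", "eth0.9", "10.10.2.27"),
    ("10.10.10.0/24", "eth0.8", "10.10.10.1"),
]
GATEWAY = ipaddress.ip_address("10.10.36.209")

# Group the connected prefixes by the vlan device that carries them.
by_dev = defaultdict(list)
for prefix, dev, _src in ROUTES:
    by_dev[dev].append(prefix)

# Devices that carry more than one subnet.
overloaded = {dev: nets for dev, nets in by_dev.items() if len(nets) > 1}

# Connected prefixes that contain the default gateway.
gw_routes = [
    (prefix, dev)
    for prefix, dev, _src in ROUTES
    if GATEWAY in ipaddress.ip_network(prefix)
]

print("devices with several subnets:", overloaded)
print("gateway reachable via:", gw_routes)
```

So eth0.1 is the only device carrying two subnets, and the gateway sits in the /28 on that same device.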
Am I on the right track here or is this a distro/build/config issue?
Here's the oops:
BUG: unable to handle kernel NULL pointer dereference at virtual address 000000a8
printing eip:
c05fe72b
*pde = 3d429067
Oops: 0000 [#1]
SMP
last sysfs file: /devices/platform/i2c-9191/name
Modules linked in: it87 hwmon_vid hwmon i2c_isa eeprom 8021q nf_nat_ftp
nf_conntrack_ftp ipt_REJECT ipt_owner ipt_ULOG xt_limit xt_state xt_multiport
iptable_filter xt_tcpudp iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack
nfnetlink ip_tables x_tables video sbs i2c_ec dock button battery asus_acpi
backlight ac parport_pc lp parport 8139too 8139cp mii pcspkr i2c_i801 i2c_i810
iTCO_wdt i2c_algo_bit i2c_core iTCO_vendor_support dm_snapshot dm_zero
dm_mirror dm_mod ata_piix libata sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd
uhci_hcd
CPU: 0
EIP: 0060:[<c05fe72b>] Not tainted VLI
EFLAGS: 00010246 (2.6.20-1.2948.fc6 #1)
EIP is at inet_select_addr+0x4/0x9f
eax: 00000000 ebx: f8b97046 ecx: 000000fd edx: 00000000
esi: 000000fd edi: 00000001 ebp: f71cd0ac esp: c078bc9c
ds: 007b es: 007b ss: 0068
Process swapper (pid: 0, ti=c078b000 task=c06fc480 task.ti=c0746000)
Stack: f8b97046 f601b130 c05fd0b6 f728b980 f728b980 f8b5adbb c05bcb6e c078bd74
00000003 00000003 00000246 00000246 00000000 f887e014 f8a611a6 f7c1ea80
f728b9a8 00000000 f727d220 f887e000 00000001 00000072 f7383800 f728b980
Call Trace:
[<f8b97046>] reject+0x0/0x4ae [ipt_REJECT]
[<c05fd0b6>] icmp_send+0x14d/0x39b
[<f8b5adbb>] nf_conntrack_free+0x18/0x20 [nf_conntrack]
[<c05bcb6e>] __kfree_skb+0xc5/0x10d
[<f8a611a6>] rtl8139_start_xmit+0xe2/0x112 [8139too]
[<c05c12fc>] dev_hard_start_xmit+0x1bc/0x21b
[<c060e141>] xfrm_decode_session+0x44/0x4b
[<c0620667>] _spin_lock_irqsave+0x9/0xd
[<c042edc9>] lock_timer_base+0x15/0x2f
[<c042eed5>] __mod_timer+0x94/0x9e
[<f8b90406>] ipt_ulog_packet+0x2f0/0x3ba [ipt_ULOG]
[<f8b97046>] reject+0x0/0x4ae [ipt_REJECT]
[<f8b9709c>] reject+0x56/0x4ae [ipt_REJECT]
[<c06205bd>] _spin_unlock_bh+0x5/0xd
[<f8b904f6>] ipt_ulog_target+0x26/0x2e [ipt_ULOG]
[<f8b97046>] reject+0x0/0x4ae [ipt_REJECT]
[<f8b3d454>] ipt_do_table+0x28c/0x2e8 [ip_tables]
[<c05d7550>] nf_iterate+0x38/0x6a
[<c05d7681>] nf_hook_slow+0x4d/0xb5
[<c05dec14>] dst_output+0x0/0x7
[<c05e0f14>] ip_queue_xmit+0x3a3/0x3f4
[<c05dec14>] dst_output+0x0/0x7
[<c0422645>] try_to_wake_up+0x3aa/0x3b4
[<c05f47f1>] tcp_v4_send_check+0x74/0xaa
[<c05ef1c3>] tcp_transmit_skb+0x6d5/0x703
[<c05eff49>] tcp_retransmit_skb+0x4b1/0x592
[<c05e8db7>] tcp_enter_loss+0x1a8/0x205
[<c05f258b>] tcp_write_timer+0x43f/0x638
[<c0620655>] _spin_unlock_irq+0x5/0x7
[<c0439d17>] hrtimer_run_queues+0x127/0x141
[<c0422add>] run_rebalance_domains+0x116/0x33e
[<c05f214c>] tcp_write_timer+0x0/0x638
[<c042e51b>] run_timer_softirq+0x101/0x164
[<c042b7b0>] __do_softirq+0x5d/0xba
[<c040615b>] do_softirq+0x59/0xb1
[<c0419d4e>] smp_apic_timer_interrupt+0x76/0x80
[<c04049b0>] apic_timer_interrupt+0x28/0x30
[<c0402d52>] default_idle+0x0/0x3e
[<c061007b>] __xfrm_policy_check+0x4c5/0x592
[<c0402d7e>] default_idle+0x2c/0x3e
[<c04023d0>] cpu_idle+0x9e/0xb7
[<c074b812>] start_kernel+0x3b6/0x3be
[<c074b25a>] unknown_bootoption+0x0/0x202
=======================
Code: eb 10 39 72 18 75 09 89 f8 33 42 14 85 c6 74 0e 8b 12 85 d2 74 08 f6 42 25 01
74 e6 31 d2 83 c4 0c 89 d0 5b 5e 5f c3 56 89 ce 53 <8b> 80 a8 00 00 00 85 c0 74
39 8b 48 0c 31 db eb 24 0f b6 41 24
EIP: [<c05fe72b>] inet_select_addr+0x4/0x9f SS:ESP 0068:c078bc9c
<0>Kernel panic - not syncing: Fatal exception in interrupt
<0>Rebooting in 60 seconds..
And a few relevant details:
# uname -a
Linux host 2.6.20-1.2948.fc6 #1 SMP Fri Apr 27 19:48:40 EDT 2007 i686 i686 i386 GNU/Linux
# lspci -tv
-[0000:00]-+-00.0 Intel Corporation 82845G/GL[Brookdale-G]/GE/PE DRAM Controller/Host-Hub Interface
           +-02.0 Intel Corporation 82845G/GL[Brookdale-G]/GE Chipset Integrated Graphics Device
           +-1d.0 Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #1
           +-1d.1 Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #2
           +-1d.2 Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #3
           +-1d.7 Intel Corporation 82801DB/DBM (ICH4/ICH4-M) USB2 EHCI Controller
           +-1e.0-[0000:01]----05.0 Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+
           +-1f.0 Intel Corporation 82801DB/DBL (ICH4/ICH4-L) LPC Interface Bridge
           +-1f.1 Intel Corporation 82801DB (ICH4) IDE Controller
           \-1f.3 Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) SMBus Controller
Rather than turn off ipt_REJECT or icmp_errors_use_inbound_ifaddr, I've
removed the system from production to try to capture the error. It's
still live and on the wire, and the oopses occur every few days to weeks.
Unfortunately, I cannot reproduce it at will. :-(
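
One more data point from the oops above: the faulting bytes on the Code: line can be decoded by hand. The Python sketch below only handles the one instruction form seen here (opcode 8b with a 32-bit displacement), and the function and variable names are mine, not from any kernel tool. It suggests the fault is a load from eax + 0xa8 with eax == 0, which would fit a NULL pointer being passed to inet_select_addr as its first argument, assuming the regparm calling convention where eax carries that argument.

```python
# Tail of the "Code:" line from the oops; <..> marks the faulting byte.
CODE = "c3 56 89 ce 53 <8b> 80 a8 00 00 00 85 c0 74 39"
EAX = 0x00000000  # eax from the register dump in the oops

# 32-bit register names in ModRM encoding order.
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_faulting_mov(code_line):
    """Decode only 'mov r32, [r32 + disp32]' (opcode 0x8b, mod == 10)."""
    tokens = code_line.split()
    start = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    raw = [int(t.strip("<>"), 16) for t in tokens[start:start + 6]]
    opcode, modrm = raw[0], raw[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    # rm == 100 would mean a SIB byte follows, which this sketch skips.
    if opcode != 0x8B or mod != 0b10 or rm == 0b100:
        raise ValueError("not a plain mov r32, [r32 + disp32]")
    disp = int.from_bytes(bytes(raw[2:6]), "little", signed=True)
    return REGS[reg], REGS[rm], disp


dst, base, disp = decode_faulting_mov(CODE)
fault_addr = (EAX + disp) & 0xFFFFFFFF
print(f"mov {dst}, [{base} + {disp:#x}] -> faults at {fault_addr:#010x}")
```

The computed address matches the 000000a8 reported at the top of the oops.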