We found a possible from-LAN DDoS bug. Since the conditions that trigger it
are not that common and hopefully are not easily achieved from outside the
LAN, and we do not yet have the time to propose a patch, we decided to
publish this report now and ask for your help with analyzing it. Thanks.
Apologies to [LKML]: the bug was tested on somewhat patched kernels (e.g.
Debian kernels), but we are fairly sure it exists in the vanilla kernel as
well, and we want to test that too, shortly.
Many Linux computers, at least those with a Realtek NIC, especially
RTL8111/8168B, hang (keyboard, mouse, or even ATA channels) when they
encounter a NETDEV WATCHDOG timeout (which happens often on low-end cards
like Realtek on a busy gigabit LAN).
Kernels: at least 2.6.32, 3.2.0, 3.2.46 are affected.
Devices: at least RTL8111/8168B always hang (rev06, 02, and likely all in
between). Some Intel cards appear to hang, but less often. Marvell and other
cards have not been tested enough.
Workaround: none known in software for tested systems
This looks possibly exploitable as a DDoS from the LAN (or even from the
Internet with a fast gateway/pipe) by generating heavy traffic in order to
hang computers.
All 8 Linux computers checked on the LAN show the watchdog timeout.
And 3 out of 6 desktop computers appeared to ALWAYS have trouble at the time
of the watchdog timeout, e.g. a hanging keyboard; they were on Realtek NICs.
The card reported by someone else was also Realtek.
One of the computers using Realtek also has ATA disk errors: ata1.01: qc
timeout (cmd 0xa0); the hard drive appears stuck until the cable is replugged.
Intel and Marvell cards were not fully observed and were less tested so far
(mostly headless).
But at least twice an Intel-based computer hung at the same time as other RTL
computers (at the time of the NETDEV watchdog timeout).
Therefore this bug is probably NOT limited to Realtek only.
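
For anyone who wants to check their own machines for the same event, here is
a minimal sketch in Python that scans kernel log lines for the NETDEV WATCHDOG
message in the form shown in the log at the end of this report. The function
and class names (find_watchdog_timeouts, WatchdogEvent) are ours and purely
illustrative, and the regular expression assumes the message wording used by
the kernels we tested; other kernel versions may format it differently.

```python
import re
from typing import Iterable, List, NamedTuple

# Matches lines such as:
# [43779.922287] NETDEV WATCHDOG: eth4 (r8169): transmit queue 0 timed out
# Assumption: the wording is the one seen on our tested kernels.
WATCHDOG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s*NETDEV WATCHDOG: (?P<iface>\S+) "
    r"\((?P<driver>[^)]+)\): transmit queue (?P<queue>\d+) timed out"
)


class WatchdogEvent(NamedTuple):
    """One watchdog timeout found in the log (hypothetical helper type)."""
    timestamp: float  # seconds since boot, as printed by the kernel
    iface: str        # e.g. "eth4"
    driver: str       # e.g. "r8169"
    queue: int        # transmit queue index


def find_watchdog_timeouts(lines: Iterable[str]) -> List[WatchdogEvent]:
    """Return every NETDEV WATCHDOG timeout found in the given log lines."""
    events = []
    for line in lines:
        match = WATCHDOG_RE.search(line)
        if match:
            events.append(
                WatchdogEvent(
                    timestamp=float(match.group("ts")),
                    iface=match.group("iface"),
                    driver=match.group("driver"),
                    queue=int(match.group("queue")),
                )
            )
    return events
```

It can be fed any iterable of log lines, for example an open file holding
saved dmesg output, and the resulting events can then be grouped by driver to
see which NICs are hit.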
No workaround for this problem exists in software.
In hardware:
- avoid the network conditions that trigger it (maybe by connecting to a slow
hub)
- maybe use other NICs
- when a 2nd card was plugged in (usb0), it instantly unhung the computer,
same as replugging the eth0 cable would, and seemed to immunize it from
hanging
Software workaround attempted:
kernel cmdline "pcie_aspm=off" changed nothing.
/proc/cmdline was:
BOOT_IMAGE=/vmlinuz-3.2.46-grsec-good.0.1.6 root=/dev/mapper/VG1-LV_root ro
quiet pcie_aspm=off
and we got the same behavior as before. The error log was:
[43779.922271] ------------[ cut here ]------------
[43779.922280] WARNING: at net/sched/sch_generic.c:255
dev_watchdog+0xeb/0x14b()
[43779.922283] Hardware name: To be filled by O.E.M.
[43779.922287] NETDEV WATCHDOG: eth4 (r8169): transmit queue 0 timed out
[43779.922289] Modules linked in: xt_owner xt_tcpudp xt_state ipt_REJECT
ipt_LOG xt_limit xt_mark iptable_raw iptable_nat nf_nat nf_conntrack_ipv4
nf_conntrack nf_defrag_ipv4 iptable_mangle iptable_filter ip_tables x_tables
acpi_cpufreq mperf cpufreq_stats cpufreq_conservative cpufreq_userspace
cpufreq_powersave parport_pc ppdev lp parport bridge stp bnep rfcomm bluetooth
rfkill binfmt_misc kvm_intel kvm fuse ext2 hwmon_vid loop tpm_tis tpm psmouse
tpm_bios serio_raw pcspkr joydev evdev i2c_i801 i915 drm_kms_helper drm
i2c_algo_bit i2c_core button video processor ext4 jbd2 mbcache crc16 cryptd
aes_x86_64 aes_generic cbc dm_crypt dm_mod sg usbhid hid sr_mod cdrom sd_mod
crc_t10dif ata_generic xhci_hcd ehci_hcd ata_piix libata r8169 thermal mii
usbcore scsi_mod usb_common fan thermal_sys [last unloaded: scsi_wait_scan]
[43779.922371] Pid: 0, comm: swapper/0 Not tainted 3.2.46-grsec-good.0.1.6 #1
[43779.922373] Call Trace:
[43779.922375] [] warn_slowpath_common+0x80/0x98
[43779.922385] [] warn_slowpath_fmt+0x41/0x43
[43779.922400] [] ? .LC4+0xf2/0x15a [r8169]
[43779.922403] [] dev_watchdog+0xeb/0x14b
[43779.922407] [] run_timer_softirq+0x273/0x3b1
[43779.922409] [] ? run_timer_softirq+0x197/0x3b1
[43779.922413] [] ? hrtimer_interrupt+0x116/0x1e4
[43779.922417] [] ? netif_tx_unlock+0x51/0x51
[43779.922421] [] __do_softirq+0x118/0x24c
[43779.922424] [] ? clockevents_program_event+0x9d/0xbe
[43779.922427] [] ? hrtimer_interrupt+0x129/0x1e4
[43779.922431] [] call_softirq+0x1c/0x30
[43779.922436] [] do_softirq+0x43/0x98
[43779.922438] [] irq_exit+0x4b/0xc5
[43779.922442] [] smp_apic_timer_interrupt+0x85/0x93
[43779.922445] [] apic_timer_interrupt+0x70/0x80
[43779.922447] [] ? sched_clock_cpu+0x4a/0xdd
[43779.922454] [] ? intel_idle+0xdf/0x119
[43779.922457] [] ? intel_idle+0xdb/0x119
[43779.922460] [] ? notifier_call_chain+0x81/0x81
[43779.922464] [] cpuidle_idle_call+0x11f/0x1fc
[43779.922468] [] cpu_idle+0xa8/0x109
[43779.922471] [] rest_init+0xd0/0xd7
[43779.922474] [] ? csum_partial_copy_generic+0x16c/0x16c
[43779.922478] [] start_kernel+0x3ed/0x3f8
[43779.922481] [] x86_64_start_reservations+0xb8/0xbc
[43779.922484] [] x86_64_start_kernel+0x101/0x110
[43779.922