hanging, and possible exploit/ddos from LAN for RTL and other cards (watchdog netdev)

2013-06-16 Thread opensou...@tigusoft.pl


We found a possible from-LAN DDoS bug. Since the conditions that trigger it are
not that common and hopefully are not easily achieved from outside the LAN, and
we do not yet have the time to propose a patch, it was decided to
publish this report now so that you can help analyze it, thanks.
Our apologies [LKML]: the bug was tested on somewhat patched kernels (e.g.
Debian kernels), but we are fairly sure it exists in the vanilla kernel as
well, and we want to test that too, shortly.


Many Linux computers, at least those with Realtek NICs (especially
RTL8111/8168B), hang (keyboard, mouse, or even ATA channels) when they
encounter a NETDEV watchdog timeout (which happens often on low-end cards
like Realtek on a busy gigabit LAN).

Kernels: at least 2.6.32, 3.2.0, 3.2.46 are affected.
Devices: at least RTL8111/8168B always hang (rev 06, rev 02, and likely all in
between). Some Intel cards appear to hang, but less often. Marvell and other
cards have not been tested enough.
Workaround: none known in software for the tested systems

This looks possibly exploitable as a DDoS from the LAN (or even from the
Internet with a fast gateway/pipe) by generating heavy traffic in order to
hang computers.



All 8 Linux computers checked on the LAN show the watchdog timeout.

3 out of 6 desktop computers appeared to ALWAYS have trouble at the time
of the watchdog timeout (e.g. a hanging keyboard); they were on Realtek NICs.
The card reported by someone else was also a Realtek.
One of the Realtek-based computers also has ATA disk errors: ata1.01: qc timeout
(cmd 0xa0); the hard drive appears stuck until the cable is replugged.

Intel and Marvell were not fully observed and were less tested so far (mostly
headless).
But at least twice an Intel-based computer hung at the same time as the other
RTL computers (at the time of the NETDEV watchdog timeout).



This bug is therefore probably NOT limited to Realtek only.

No software workaround for this problem is known.

In hardware:
- avoid the network conditions that trigger it (maybe by connecting to a slow
hub)
- maybe use other NICs
- when a 2nd card was plugged in (usb0), it instantly unhung the computer,
the same as replugging the eth0 cable would, and seemed to immunize it from
hanging


-
Software workaround attempted:
kernel cmdline "pcie_aspm=off" changed nothing.

/proc/cmdline was:
BOOT_IMAGE=/vmlinuz-3.2.46-grsec-good.0.1.6 root=/dev/mapper/VG1-LV_root ro 
quiet pcie_aspm=off

and we got the same behavior as before. The error log was:

[43779.922271] ------------[ cut here ]------------
[43779.922280] WARNING: at net/sched/sch_generic.c:255 
dev_watchdog+0xeb/0x14b()
[43779.922283] Hardware name: To be filled by O.E.M.
[43779.922287] NETDEV WATCHDOG: eth4 (r8169): transmit queue 0 timed out
[43779.922289] Modules linked in: xt_owner xt_tcpudp xt_state ipt_REJECT 
ipt_LOG xt_limit xt_mark iptable_raw iptable_nat nf_nat nf_conntrack_ipv4 
nf_conntrack nf_defrag_ipv4 iptable_mangle iptable_filter ip_tables x_tables 
acpi_cpufreq mperf cpufreq_stats cpufreq_conservative cpufreq_userspace 
cpufreq_powersave parport_pc ppdev lp parport bridge stp bnep rfcomm bluetooth 
rfkill binfmt_misc kvm_intel kvm fuse ext2 hwmon_vid loop tpm_tis tpm psmouse 
tpm_bios serio_raw pcspkr joydev evdev i2c_i801 i915 drm_kms_helper drm 
i2c_algo_bit i2c_core button video processor ext4 jbd2 mbcache crc16 cryptd 
aes_x86_64 aes_generic cbc dm_crypt dm_mod sg usbhid hid sr_mod cdrom sd_mod 
crc_t10dif ata_generic xhci_hcd ehci_hcd ata_piix libata r8169 thermal mii 
usbcore scsi_mod usb_common fan thermal_sys [last unloaded: scsi_wait_scan]
[43779.922371] Pid: 0, comm: swapper/0 Not tainted 3.2.46-grsec-good.0.1.6 #1
[43779.922373] Call Trace:
[43779.922375]  [] warn_slowpath_common+0x80/0x98
[43779.922385]  [] warn_slowpath_fmt+0x41/0x43
[43779.922400]  [] ? .LC4+0xf2/0x15a [r8169]
[43779.922403]  [] dev_watchdog+0xeb/0x14b
[43779.922407]  [] run_timer_softirq+0x273/0x3b1
[43779.922409]  [] ? run_timer_softirq+0x197/0x3b1
[43779.922413]  [] ? hrtimer_interrupt+0x116/0x1e4
[43779.922417]  [] ? netif_tx_unlock+0x51/0x51
[43779.922421]  [] __do_softirq+0x118/0x24c
[43779.922424]  [] ? clockevents_program_event+0x9d/0xbe
[43779.922427]  [] ? hrtimer_interrupt+0x129/0x1e4
[43779.922431]  [] call_softirq+0x1c/0x30
[43779.922436]  [] do_softirq+0x43/0x98
[43779.922438]  [] irq_exit+0x4b/0xc5
[43779.922442]  [] smp_apic_timer_interrupt+0x85/0x93
[43779.922445]  [] apic_timer_interrupt+0x70/0x80
[43779.922447]  [] ? sched_clock_cpu+0x4a/0xdd
[43779.922454]  [] ? intel_idle+0xdf/0x119
[43779.922457]  [] ? intel_idle+0xdb/0x119
[43779.922460]  [] ? notifier_call_chain+0x81/0x81
[43779.922464]  [] cpuidle_idle_call+0x11f/0x1fc
[43779.922468]  [] cpu_idle+0xa8/0x109
[43779.922471]  [] rest_init+0xd0/0xd7
[43779.922474]  [] ? csum_partial_copy_generic+0x16c/0x16c
[43779.922478]  [] start_kernel+0x3ed/0x3f8
[43779.922481]  [] x86_64_start_reservations+0xb8/0xbc
[43779.922484]  [] x86_64_start_kernel+0x101/0x110
[43779.922
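
The watchdog message in the log above has a fixed shape (interface, driver,
transmit queue), so it can be picked out of a saved kernel log mechanically
when comparing several machines. The following is only a minimal sketch, not
part of the original report: the function name find_watchdog_timeouts and the
SAMPLE constant are made up for illustration, and the pattern is derived from
the single message shown in this thread.

```python
import re

# Pattern based on the message seen in the log above:
#   NETDEV WATCHDOG: eth4 (r8169): transmit queue 0 timed out
# The leading "[seconds.micros]" timestamp is treated as optional.
WATCHDOG_RE = re.compile(
    r"(?:\[\s*(?P<ts>\d+\.\d+)\]\s*)?"
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)

# The one watchdog line quoted in this thread.
SAMPLE = "[43779.922287] NETDEV WATCHDOG: eth4 (r8169): transmit queue 0 timed out"


def find_watchdog_timeouts(log_text):
    """Return one dict per NETDEV watchdog timeout found in log_text."""
    events = []
    for line in log_text.splitlines():
        match = WATCHDOG_RE.search(line)
        if match is None:
            continue
        ts = match.group("ts")
        events.append({
            "timestamp": float(ts) if ts else None,
            "iface": match.group("iface"),
            "driver": match.group("driver"),
            "queue": int(match.group("queue")),
        })
    return events


if __name__ == "__main__":
    for event in find_watchdog_timeouts(SAMPLE):
        print(event)
```

Feeding it the saved output of dmesg from each affected machine would give a
per-interface list of timeouts that can be lined up against the times the
keyboard or disk hung.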

Re: hanging, and possible exploit/ddos from LAN for RTL and other cards (watchdog netdev)

2013-06-21 Thread opensou...@tigusoft.pl
  1184363   PCI-MSI-edge  fglrx[1]@PCI:4:0:0
NMI:  6  3   Non-maskable interrupts
LOC:   9558   5140   Local timer interrupts
SPU:  0  0   Spurious interrupts
PMI:  6  3   Performance monitoring interrupts
IWI:  0  0   IRQ work interrupts
RTR:  1  0   APIC ICR read retries
RES:   1401797   Rescheduling interrupts
CAL: 48 66   Function call interrupts
TLB: 66 70   TLB shootdowns
TRM:  0  0   Thermal event interrupts
THR:  0  0   Threshold APIC interrupts
MCE:  0  0   Machine check exceptions
MCP:  2  2   Machine check polls
ERR:  0
MIS:  0
===
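
The reply quoted below asks whether the r8169 IRQ is shared with other
devices. In /proc/interrupts a shared line lists several device names
separated by commas at the end of the row, so the check can be sketched as
follows. This is a hedged illustration only: find_shared_irqs is a made-up
helper name, and the SAMPLE table is invented for the example (it is NOT taken
from any of the tested machines).

```python
# Invented sample in the usual /proc/interrupts layout; values are made up.
SAMPLE = """\
           CPU0       CPU1
 16:       1201        903   IO-APIC-fasteoi   ehci_hcd:usb1, eth4
 42:       5000          0   PCI-MSI-edge      fglrx[1]@PCI:4:0:0
NMI:          6          3   Non-maskable interrupts
"""


def find_shared_irqs(interrupts_text):
    """Map IRQ number -> device names for rows that list more than one device."""
    shared = {}
    for line in interrupts_text.splitlines():
        label, sep, rest = line.partition(":")
        label = label.strip()
        if not sep or not label.isdigit():
            continue  # header row or named counters such as NMI, LOC
        tokens = rest.split()
        # Drop the per-CPU counters at the start of the row.
        while tokens and tokens[0].isdigit():
            tokens.pop(0)
        if not tokens:
            continue
        # What remains is the interrupt type followed by a comma-separated
        # device list; the first device is the last word before the first comma.
        names = " ".join(tokens).split(",")
        devices = [names[0].split()[-1]] + [n.strip() for n in names[1:]]
        if len(devices) > 1:
            shared[int(label)] = devices
    return shared


if __name__ == "__main__":
    for irq, devices in sorted(find_shared_irqs(SAMPLE).items()):
        print(irq, devices)
```

On a real machine the text would come from reading /proc/interrupts; a NIC
that appears in the result shares its interrupt line with the other devices
listed for that IRQ.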



Thanks for helping to debug and find this problem: tigusoft.pl , #grsecurity , 
Arach , Admin2501 , R.Freeman 
(yeap, we got everyone and their goldfish to join testing :)







> opensou...@tigusoft.pl  :
> [...]
> 
> > Thanks for helping to debug and find this problem: tigusoft.pl ,
> > #grsecurity , Arach , Admin2501 , R.Freeman
> > 
> > 
> > We await any instructions how to debug this further.
> 
> Please send the XID of Realtek devices you own ('dmesg | grep XID').
> r8169 support level is not the same for all chipsets and kernel.
> 
> Also please identify the motherboards and see if the r8169 irq is
> shared on some of those.
> 
> Besides the r8169 patch you added, you may test if clocksource=acpi_pm
> makes a difference (courtesy of Lance Lassetter on netdev).
> 
> [...]
> 
> > We plan to re-test this on newest kernels as well,
> 
> You will be welcome.
> 
> > but it was decided the report should be sent without delaying more since
> > at least for us it explained many of "strange" cases of linux computers
> > hanging, so it might help other people out there too.
> 
> Despite what the subject of the mail suggests, there is no "exploit"
> though. Right ?
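
For collecting the XID values requested above from several machines, the
'dmesg | grep XID' output can be reduced to just the identifier and IRQ. This
is a rough sketch under assumptions: extract_xids is a hypothetical helper,
and the SAMPLE line is invented to resemble an r8169 probe message (the exact
wording may differ between kernel versions, so only the "XID <hex>" and
optional "IRQ <n>" parts are matched).

```python
import re

# Only the "XID <8 hex digits>" part and an optional trailing "IRQ <n>" are
# matched; everything else on the line is ignored.
XID_RE = re.compile(r"XID\s+(?P<xid>[0-9a-fA-F]{8})(?:\s+IRQ\s+(?P<irq>\d+))?")

# Invented example line; the address, MAC, XID and IRQ are made up.
SAMPLE = (
    "[    1.234567] r8169 0000:02:00.0: eth0: RTL8168b/8111b at "
    "0xffffc90000c4e000, 00:11:22:33:44:55, XID 18000000 IRQ 42"
)


def extract_xids(dmesg_text):
    """Return a list of (xid, irq) tuples; irq is None when not present."""
    results = []
    for line in dmesg_text.splitlines():
        match = XID_RE.search(line)
        if match is None:
            continue
        irq = match.group("irq")
        results.append((match.group("xid").lower(), int(irq) if irq else None))
    return results


if __name__ == "__main__":
    for xid, irq in extract_xids(SAMPLE):
        print("XID", xid, "IRQ", irq)
```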
