Greetings misc@

I am facing regular and sizeable DDoS attacks, and I would like to know
how the OpenBSD community deals with these. Hints and input welcome.

The obvious first : my uplinks are not saturated, and there is plenty
of bandwidth available for my regular users. As OpenBSD alone is not
enough to mitigate such attacks (in my setup; I am sure there is a
solution), we use a proprietary product, but it has some undesirable
side effects and is not a viable long-term solution for us.

Methodology is more or less always the same :
        - massive UDP flood           :   2 Gbps / 150 Kpps -> dropped
          directly on the router, not a problem
        - moderate ICMP flood         :  10 Mbps /  12 Kpps
        - moderate IP fragments flood : 380 Mbps /  57 Kpps
        - moderate TCP RST flood      :  10 Mbps /  30 Kpps
        - massive TCP SYN flood       : 640 Mbps /   2 Mpps -> yup, that hurts

So, UDP never reaches my OpenBSD box. The SYN flood is built in a
particularly vicious way : each source IP sends exactly one SYN, but
there are millions of them (the traffic is probably spoofed, but we
cannot use uRPF as we have asymmetric traffic and routes). I tried
"set limit states" with 1M entries, and the table filled up quickly (I
also tried 5M, but the box collapses way before that). In the end, the
state table is exhausted and no traffic can pass, even for regular
users with already established connections.
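
A rough back-of-the-envelope check of why raising the state limit does not help, using the 2 Mpps figure above and assuming every SYN creates one new state. The 120 s value is an assumption (PF's default tcp.first timeout, with adaptive timeouts ignored), and the table is treated as empty at the start, so this is only a sketch of the orders of magnitude:

```python
# How long does a PF state table survive a flood in which every SYN comes
# from a new source address and therefore creates a new state?

SYN_RATE_PPS = 2_000_000      # "2 Mpps" from the figures above
TCP_FIRST_TIMEOUT_S = 120     # assumption: PF's default tcp.first timeout


def seconds_to_fill(state_limit, new_states_per_sec):
    """Time until an initially empty table reaches its limit."""
    return state_limit / new_states_per_sec


def states_needed(new_states_per_sec, timeout_s):
    """Entries needed to hold every half-open state until it expires."""
    return new_states_per_sec * timeout_s


for limit in (1_000_000, 5_000_000):
    secs = seconds_to_fill(limit, SYN_RATE_PPS)
    print(f"limit {limit:>9,}: full after {secs:.1f} s")

needed = states_needed(SYN_RATE_PPS, TCP_FIRST_TIMEOUT_S)
print(f"states to ride it out: {needed:,}")
```

Under these assumptions a 1M limit lasts 0.5 s and a 5M limit 2.5 s, while holding every half-open state until expiry would take 240 million entries, far more than the box can hold.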

I ran some experiments in a lab trying to reproduce this, with a box
roughly identical to what I have in production (but much weaker, of
course). The box collapses at 600 Kpps SYN (100% interrupts), but
handles everything very gently (less than 50% interrupts and no packet
loss) if the first rule evaluated is block drop in quick from !
<whitelisted_users>. So it seems that my bottleneck is PF here, not
the hardware. A consequence of this saturation : both my main firewall
and my backup claim MASTER ownership of the CARP (split brain
syndrome). CARP works just fine when I add the block rule, though.

Some configuration details :
        - OS  : OpenBSD 5.0/amd64 box, using GENERIC.MP
        - CPU : Intel X3460 CPU (4 cores, 2.80GHz)
        - RAM : 4GB
        - NIC : 2x Intel 82576 (2 ports each)

Each network card has the following setup : one port to the LAN, one
port to the WAN. Each pair (LAN1/LAN2 and WAN1/WAN2) is trunked using
LACP. Already bumped net.inet.ip.ifq.maxlen, as all NICs are
supported. My benchmarks did highlight two interesting things : amd64
has better performance than i386 (roughly 5-10% less interrupts, with
same rules and traffic), but the difference between GENERIC and
GENERIC.MP is insignificant.

My current idea is to hack a daemon to track established connections
(extracting them à la netstat), and inject my block rule in an anchor
(à la relayd) when needed (watching some stats from pf, with its ioctl
interface). Pros: regular users the firewall saw before the attack can
still use the service. Cons: no new users are allowed until the
removal of the rule, obviously. Better than nothing, but I welcome any
other hints :)
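
To make the daemon idea more concrete, here is a minimal sketch of its decision logic only, not a working daemon. Everything in it is hypothetical: the anchor name "ddos", the table file path, the thresholds, and the choice of sampled rate. It assumes the main ruleset evaluates an anchor "ddos" early and that <whitelisted_users> is a table as in the lab rule. One caveat: once the block rule is loaded, floods no longer create states, so the sampled rate has to be one that still sees the attack (inbound or blocked packets per second, say), otherwise the rule would be released too early.

```python
# Decision logic only: when to load the emergency block rule into an anchor
# and when to remove it. Sampling PF and running pfctl is left to the caller.

ANCHOR = "ddos"                                # hypothetical anchor name
TABLE = "whitelisted_users"                    # table from the lab rule
TABLE_FILE = "/var/run/whitelisted_users.txt"  # hypothetical path
BLOCK_RULE = f"block drop in quick from ! <{TABLE}>"


class FloodGuard:
    """Engage above engage_pps; release only after calm_samples
    consecutive samples at or below release_pps (hysteresis)."""

    def __init__(self, engage_pps, release_pps, calm_samples):
        self.engage_pps = engage_pps
        self.release_pps = release_pps
        self.calm_samples = calm_samples
        self.engaged = False
        self._calm = 0

    def update(self, pps):
        """Feed one rate sample; return 'engage', 'release' or None."""
        if not self.engaged:
            if pps >= self.engage_pps:
                self.engaged = True
                self._calm = 0
                return "engage"
            return None
        if pps <= self.release_pps:
            self._calm += 1
            if self._calm >= self.calm_samples:
                self.engaged = False
                return "release"
        else:
            self._calm = 0  # flood still going: restart the calm count
        return None


def commands(action):
    """Return (argv, stdin) pairs for an action; nothing is executed here."""
    if action == "engage":
        return [
            # refresh the table from the list of established peers
            (["pfctl", "-t", TABLE, "-T", "replace", "-f", TABLE_FILE], None),
            # load the single block rule into the anchor from stdin
            (["pfctl", "-a", ANCHOR, "-f", "-"], BLOCK_RULE + "\n"),
        ]
    if action == "release":
        return [(["pfctl", "-a", ANCHOR, "-F", "rules"], None)]
    return []
```

A caller would write the established peers to the table file, feed update() one sample per polling interval, and run whatever commands() returns. The gap between the two thresholds is there to keep the rule from flapping while an attack ramps up and down.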

One other solution may be to add boxes. I tried a carpnodes cluster,
but at 600 Kpps I got a "split brain" with both nodes claiming MASTER
for each carpnode. Maybe configuring ALTQ could help with this ? With
more boxes, I could absorb the performance impact of ALTQ.

I am willing to test any patch/suggestion you may have, of course.
Even just hints about kernel code, as I am currently messing with PF
code myself. I did compile a profiled kernel; I must now check the
results, but that will be another story.

To finish, here is the typical load on the box (errors are from
various DDoS, not related to normal use) :

Status: Enabled for 77 days 02:17:58             Debug: err

Interface Stats for trunk1            IPv4             IPv6
  Bytes In                   8885330383273                0
  Bytes Out                 72449316050298            20224
  Packets In
    Passed                     48738702875                0
    Blocked                    10152865611                0
  Packets Out
    Passed                     67293792876              281
    Blocked                     4557637133                0

State Table                          Total             Rate
  current entries                    37135
  searches                    130771929548        19632.2/s
  inserts                       4718030394          708.3/s
  removals                      4717993259          708.3/s
Source Tracking Table
  current entries                     7455
  searches                      4951426366          743.3/s
  inserts                        623672861           93.6/s
  removals                       623665406           93.6/s
Counters
  match                         5600111978          840.7/s
  bad-offset                             0            0.0/s
  fragment                         3591379            0.5/s
  short                            2500133            0.4/s
  normalize                          10968            0.0/s
  memory                             71750            0.0/s
  bad-timestamp                          0            0.0/s
  congestion                       3863476            0.6/s
  ip-option                              0            0.0/s
  proto-cksum                            0            0.0/s
  state-mismatch                   3722058            0.6/s
  state-insert                           0            0.0/s
  state-limit                            0            0.0/s
  src-limit                      234360390           35.2/s
  synproxy                     13817759263         2074.4/s
Limit Counters
  max states per rule                    0            0.0/s
  max-src-states                     89727            0.0/s
  max-src-nodes                          0            0.0/s
  max-src-conn                           0            0.0/s
  max-src-conn-rate                      0            0.0/s
  overload table insertion               0            0.0/s
  overload flush states                  0            0.0/s
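
For quick experiments, the counters above can also be read by parsing this text output rather than going through the ioctl interface. This is a substitute technique, not the ioctl approach mentioned earlier. The sketch below assumes the layout shown in this mail (an unindented section header, then indented "name total rate/s" lines) and skips the two-column interface statistics:

```python
import re

# A few lines copied from the output above, used as sample input.
SAMPLE = """\
State Table                          Total             Rate
  current entries                    37135
  searches                    130771929548        19632.2/s
  inserts                       4718030394          708.3/s
  removals                      4717993259          708.3/s
Counters
  congestion                       3863476            0.6/s
  synproxy                     13817759263         2074.4/s
"""

# indented name, at least two spaces, a total, and an optional "N.N/s" rate
LINE = re.compile(r"^\s+(\S.*?)\s{2,}(\d+)(?:\s+([\d.]+)/s)?\s*$")


def parse(text):
    """Map (section, counter) -> (total, rate per second or None)."""
    section = None
    out = {}
    for line in text.splitlines():
        if line and not line[0].isspace():
            # unindented line: a section header such as "State Table"
            section = re.split(r"\s{2,}", line.strip())[0]
            continue
        m = LINE.match(line)
        if m:
            name, total, rate = m.groups()
            out[(section, name)] = (int(total), float(rate) if rate else None)
    return out


stats = parse(SAMPLE)
print(stats[("State Table", "inserts")])
print(stats[("Counters", "synproxy")])
```

The rates printed here are averages since PF was enabled, so a monitor would need to diff the totals between two polls to see what is happening right now.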
