Joachim Schipper wrote:
Your worries about losing proxies are correct; it looks like you have
that problem mostly covered. I'm not sure it would help much with
bandwidth hogs, though - I don't have any numbers on what programs are
most often used, but something like wget certainly does respect
robots.txt.
Actually, it does. There are many attacks going on right now, as you
know, but if you put them into categories, you have tons of variations
on the user/pass value sanity-check attack, as you can now see on
SecurityFocus. Over a dozen have been released in the last three days,
and even more by now, I am sure. If you look into the archive, I saw it
start a few weeks ago, but that's irrelevant anyway.

The other is a virus that spreads the same way, or a similar one. In
that case they actually request big content page(s) on your site. By big
content I don't mean images, etc., but text. The reason is that their
virus does not process the content, and would need to be bigger to do
so; this way it stays small, and the web server sees the request as
legitimate and replies. But if you have pages with, say, 0.5 MB of text
generated from a database back end, then they hope to bring your server
down, bring your SQL back end down, and, failing that, make you waste as
much bandwidth as possible.

I noticed it first from the HUGE increase in GB transferred each day.
Just to give you a picture of the effect: so far I have logged over
300,000 sources of this virus running this type of attack on my servers,
and they pull a series of pages that are pretty big in text content,
between 150 KB minimum and 750 KB or so, excluding any other content.
Each offending source pulls that content many times a day. Just think
about it.
So, just for fun, take a conservative example: one request per hour from
each source, fetching an average page of 500 KB. The wasted transfer for
that single day is:
24 hours * 500,000 bytes * 300,000 sources =
3,600,000,000,000 bytes of wasted bandwidth per day.
Now, if you assume the load is perfectly constant, with no peaks, you
need to push this amount of data in 24 hours, so you would need:
3,600,000,000,000 bytes * 8 bits/byte = 28,800,000,000,000 bits
28,800,000,000,000 / (60 seconds * 60 minutes * 24 hours) = 333,333,333
bits/sec of capacity, just for this wasted traffic!
And this is based on only one query per hour! You get the picture of the
size of the problem. (:>
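The arithmetic above is easy to double-check with a few lines of Python;
the 300,000 sources, 500 KB page, and one-request-per-hour rate are the
figures from this thread, everything else is plain unit conversion:

```python
# Back-of-the-envelope cost of the attack described above.
SOURCES = 300_000        # distinct infected hosts observed in the logs
PAGE_BYTES = 500_000     # average size of a targeted text-heavy page
REQUESTS_PER_DAY = 24    # one request per hour per source

wasted_bytes_per_day = SOURCES * PAGE_BYTES * REQUESTS_PER_DAY
print(f"{wasted_bytes_per_day:,} bytes/day")   # 3,600,000,000,000

seconds_per_day = 24 * 60 * 60
bits_per_second = wasted_bytes_per_day * 8 / seconds_per_day
print(f"{bits_per_second:,.0f} bits/sec")      # 333,333,333
```

So even at this very low request rate, the attack alone consumes a
sustained ~333 Mbit/s link.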
So, what I put into place doesn't stop the attack, since you can't stop
the sources from coming in, but you need to separate the good from the
bad, and my reply to a bad one happens to be only 5 bytes, plus the log
entry anyway.
All this ignores protocol overhead, etc.
So, yes it's a BIG help for "bandwidth hogs"!
And don't forget that's per destination under attack! (:>
So, yes, it can become totally unmanageable if not stopped from the
start, and at a big scale.
3. DDoS GET attacks & Bandwidth suckers defense. Multiple approaches.
3.1 Good users supply data check.
So far, most/all of the variations of attacks on web sites involve
scripts trying to inject themselves into your servers. Well, you need to
do sanity checks in your code. Nothing can really protect you from that
if you don't check what you expect to receive from user input. So, I
have nothing for that; no idea how, anyway, other than maybe limiting
the size of the arguments a GET can send, but even that is a bad idea, I
think.
This is not applicable to DDoS, really - though you are otherwise right,
of course.
What I provided is a very simple way, not to remove the problem, but at
a minimum to stop servers from getting infected by all of the latest
series of SecurityFocus variations, and it also has the benefit of
pointing you to any possible infected source your server itself might
have installed on it as well. Very simple, really.
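As a rough sketch of the kind of input sanity check meant here (the
length limit and character whitelist below are my own illustrative
choices, not values from this thread, and as noted above a size limit
alone is debatable):

```python
import re

MAX_QUERY_LEN = 256                        # assumed limit; tune per application
SAFE_QUERY = re.compile(r"^[\w.+=&%-]*$")  # conservative character whitelist

def query_looks_sane(query_string: str) -> bool:
    """Reject oversized or oddly-encoded GET arguments before the app sees them."""
    if len(query_string) > MAX_QUERY_LEN:
        return False
    return bool(SAFE_QUERY.match(query_string))
```

For example, `query_looks_sane("user=bob&page=3")` passes, while a query
string carrying `<script>` markup or thousands of characters is
rejected. This is only a coarse first filter; real validation still has
to happen per-parameter in the application.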
3.2 Graylisting idea via 302 temporary return code.
This could be effective, indeed - though I am not sure it would block
many attackers.
It works like a charm in real life so far; see the numbers above for
results. It's been used successfully for a few weeks now with no bad
side effects yet, just HUGE benefits! And the servers still don't break
a sweat!
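The graylisting idea can be sketched in a few lines with Python's
standard-library HTTP server (a toy model, not the actual production
setup from this thread; the in-memory `seen` set and the plain-text
response are my own illustrative choices):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

seen = set()  # IPs that have already been answered with one redirect

class GraylistHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        ip = self.client_address[0]
        if ip not in seen:
            # First contact: answer with a temporary redirect back to the
            # same URL. Real browsers follow it; most attack scripts do
            # not, so they never cost more than this tiny reply.
            seen.add(ip)
            self.send_response(302)
            self.send_header("Location", self.path)
            self.end_headers()
            return
        # The client followed the redirect: treat it as legitimate.
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"welcome back\n")

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet
```

A browser retrying through the 302 never notices the detour, while a
dumb GET flooder only ever receives the near-empty redirect response.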
3.4 What about the compromised user computer itself, or proxy server.
Faking those headers is easily done, though; ideally, you'd want to
cross-check p0f and the headers. I'm not entirely sure it would hurt an
attacker more than it hurts you, though, and privileged code is always
scary, and doubly so when close to essentially untrusted web apps.
True, for sure. But you still need a way to tell the difference between
good and bad traffic passing through a proxy, or you lose too much.
Here, obviously, I rely on the fact that so far, yes, these headers are
fake, and trivial to fake as well, but none of the attacks so far
generate random headers. In which case it would be useless, obviously.
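Since the attack tools send the same fixed headers every time, an
exact-match fingerprint blocklist is enough; a minimal sketch (the
header values below are placeholders, not real attack signatures):

```python
# Known-bad header fingerprints (illustrative values only).
# The observation from the thread: current attack tools send the *same*
# fake headers on every request, so an exact match is enough to catch
# them; randomized headers would defeat this check.
BAD_FINGERPRINTS = {
    ("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)", "", "en"),
}

def is_known_bad(headers: dict) -> bool:
    """Compare a request's header tuple against logged attack fingerprints."""
    fingerprint = (
        headers.get("User-Agent", ""),
        headers.get("Accept-Encoding", ""),
        headers.get("Accept-Language", ""),
    )
    return fingerprint in BAD_FINGERPRINTS
```

Cross-checking against p0f, as suggested above, would catch the case
where the claimed User-Agent and the TCP fingerprint disagree, but that
is a separate, more privileged layer.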
4. What about more intelligent attacks.
You *should* consider some unconventional browsers before going too far
down this lane, though. Notably, your 1x1 image will show up quite
readably in text-mode browsers; be sure to, at least, add a 'don't
click' alt attribute.
I know about the text-mode ones and tested with Lynx to see, but I did
forget that I should add the "Do NOT CLICK HERE... Bot trap WARNING"
text, so I will do that.
Also, neither text-based browsers nor most legitimate bots will request
images.
And that was the point: allow legitimate bots, if you so choose,
obviously, and ban the bad ones!
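The trap logic described above can be sketched as follows (the
`/bot-trap` path, the ban set, and the handler shape are all my own
illustrative choices; the real setup would also list the trap path under
`Disallow:` in robots.txt so that well-behaved bots never fetch it):

```python
banned = set()  # IPs caught fetching the trap

# Hidden 1x1-pixel link: browsers and robots.txt-respecting bots never
# fetch it, text browsers show the alt text as an explicit warning, and
# bad bots that blindly follow links walk straight into it.
TRAP_URL = "/bot-trap"  # illustrative path, also disallowed in robots.txt
TRAP_HTML = (
    f'<a href="{TRAP_URL}"><img src="{TRAP_URL}.gif" width="1" height="1" '
    f'alt="Do NOT CLICK HERE... Bot trap WARNING"></a>'
)

def handle_request(ip: str, path: str) -> int:
    """Return an HTTP status code for a request, banning trap visitors."""
    if path.startswith(TRAP_URL):
        banned.add(ip)  # anything fetching the trap is treated as a bad bot
        return 403
    if ip in banned:
        return 403
    return 200
```

Once a source trips the trap, every later request from it can be
answered with the tiny refusal instead of the big database-backed pages
the attack is after.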
Best,
Daniel