Hi guys,

On Tue, Jan 27, 2015 at 06:01:13AM +0800, Yuan Long wrote:
> I am in the same fix.
> No matter what we try, the data we need to address is the real
> laptop/desktop/cellphone/server count, and that count is skewed as soon
> as there are a hundred laptops/desktops behind a router.
> 
> The best I've heard is from Willy himself: the suggestion to use
> base32+src. It comes at the cost of losing plain text and having a binary
> key to use in the acl, but it works for now. Grateful to have HAProxy in
> the first place.

There's no universal rule. Everything depends on how the site is made,
and how the bad guys are acting. For example, some sites may work very
well with a rate-limit on base32+src. That could be the case when you
want to prevent a client from mirroring a whole web site. But for sites
with very few URLs, it could be another story. Conversely, some sites
will provide lots of different links to various objects. Think for
example about a merchant's site where each photo of an object for sale is
a different URL. You wouldn't want to block users who simply click on
"next" and get 50 new photos each time.

So the first thing to do is to define how the site is supposed to work.
Next, you define what constitutes bad behaviour, and how to distinguish
intentional bad behaviour from accidental bad behaviour (eg: people who
have to hit reload several times because of a poor connection). For most
sites, you have to keep in mind that it's better to let some bad users
pass through than to block legitimate users. So you want to put the cursor
on the business side and not on the policy enforcement side.

Proxies, firewalls etc make the problem worse, but generally not by much.
You'll easily see some addresses sending 3-10 times more requests than
others because they're proxying many users. But if you realize that a valid
user may also reach that level of traffic during regular use of the site,
it's a threshold you have to accept anyway. What would be unlikely, however,
is that all the users behind a proxy suddenly browse on steroids. So setting
blocking levels 10 times higher than the average pace you normally observe
might already give very good results.
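
To illustrate with numbers, say you measure an average of about 60 requests
per minute per address at peak time; a per-source rule set at roughly 10
times that could look like this (the values below are only placeholders
derived from that assumption, certainly not a recommendation):

    backend st_per_src
        # per source address counters, rate measured over one minute
        stick-table type ip size 1m expire 10m store http_req_rate(1m)

    frontend fe_web
        bind :80
        http-request track-sc1 src table st_per_src
        # roughly 10x the pace you normally observe; tune from your own logs
        http-request deny if { sc1_http_req_rate gt 600 }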

If your site is very special and needs to enforce strict rules against
sucking or spamming (eg: forums), then you may need to identify the client
and observe cookies. But then there are even fewer generic rules; it totally
depends on the application and on the sequence used to access the site. To be
transparent on this subject, we've been involved in helping a significant
number of sites under abuse or attack at HAProxy Technologies, and it
turns out that whatever new magic tricks you find for one site are often
irrelevant to the next one. Each time you have to go back to pencil and
paper and write down the complete browsing sequence and find a few subtle
elements there.
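
If you end up going the cookie route, one possible starting point is to
track the request rate per session cookie instead of per address. The
sketch below assumes a cookie called SESSIONID, which is purely an example,
and the limits are hypothetical; everything here depends on the application:

    backend st_per_cookie
        # one entry per session cookie value
        stick-table type string len 64 size 1m expire 30m store http_req_rate(5m)

    frontend fe_web
        bind :80
        # only track requests which carry the session cookie
        http-request track-sc2 req.cook(SESSIONID) table st_per_cookie if { req.cook(SESSIONID) -m found }
        http-request deny if { sc2_http_req_rate gt 200 }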

Regards,
Willy

