Hi guys,

On Tue, Jan 27, 2015 at 06:01:13AM +0800, Yuan Long wrote:
> I am in the same fix. No matter what we try, the data to address is the
> real laptop/desktop/cellphone/server count. That count is skewed as
> soon as there are a hundred laptops/desktops behind a router.
>
> Best I heard is from Willy himself, a suggestion to use base32+src. At
> the cost of losing plain text and having a binary to use in ACLs, but
> it works for now. Grateful to have HAProxy in the first place.
There's no universal rule. Everything depends on how the site is made, and on how the bad guys are acting. For example, some sites may work very well with a rate limit on base32+src. That could be the case when you want to prevent a client from mirroring a whole web site. But for sites with very few URLs, it could be another story. Conversely, some sites will provide lots of different links to various objects. Think for example about a merchant's site where each photo of an object for sale is a different URL. You wouldn't want to block users who simply click on "next" and get 50 new photos each time.

So the first thing to do is to define how the site is supposed to work. Next, you define what bad behaviour is, and how to distinguish between intentional bad behaviour and accidental bad behaviour (e.g. people who have to hit reload several times because of a poor connection). For most sites, you have to keep in mind that it's better to let some bad users pass through than to block legitimate users. So you want to put the cursor on the business side and not on the policy-enforcement side.

Proxies, firewalls, etc. make the problem worse, but not too much in general. You'll easily see some addresses sending 3-10 times more requests than others because they're proxying many users. But if you realize that a valid user may also reach that level of traffic during regular use of the site, it's a threshold you have to accept anyway. What would be surprising, however, is for all users behind a proxy to browse on steroids at the same time. So setting blocking levels 10 times higher than the average pace you normally observe might already give very good results.

If your site is very special and needs to enforce strict rules against sucking or spamming (e.g. forums), then you may need to identify the client and observe cookies. But then there are even fewer generic rules; it totally depends on the application and the sequence used to access the site.
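As an illustration of the base32+src idea combined with a "10x the normal pace" threshold, a minimal sketch could look like the config below. The frontend name, table sizes and the rate limit of 50 requests per 10 seconds are purely illustrative placeholders; you'd derive the real numbers from the traffic you actually observe on your site:

```
frontend www
    bind :80
    # Track requests per (URL hash, source address) pair. base32+src yields
    # a binary sample; len 20 leaves room for IPv6 sources.
    stick-table type binary len 20 size 1m expire 10m store http_req_rate(10s)
    http-request track-sc0 base32+src
    # Hypothetical threshold: if regular users do ~5 requests per 10s on a
    # given URL, blocking at ~10x that (50) leaves proxied users alone.
    http-request deny if { sc0_http_req_rate gt 50 }
    default_backend servers
```

Because the key includes the URL, a proxy fronting a hundred users only trips the limit when many of them hammer the *same* URL, which matches the mirroring/sucking pattern rather than normal browsing.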
To be transparent on this subject: at HAProxy Technologies we've been involved in helping a significant number of sites under abuse or attack, and it turns out that whatever new magic trick you find for one site is often irrelevant to the next one. Each time you have to go back to pencil and paper, write down the complete browsing sequence, and find a few subtle elements there.

Regards,
Willy