On Thu, Aug 14, 2003 at 08:49:49AM -0500, Jack L. Stone wrote:
> Server Version: Apache/1.3.27 (Unix) FrontPage/5.0.2.2510 PHP/4.3.1
> The above is typical of the servers in use, and with csh shells employed,
> plus IPFW.
>
> My apologies for the length of this question, but the background seems
> necessary as brief as I can make it so the question makes sense.
>
> The problem:
> We have several servers that provide online reading of Technical articles
> and each have several hundred MB to a GB of content.
>
> When we started providing the articles 6-7 years ago, folks used browsers
> to read the articles. Now, the trend has become a more lazy approach and
> there is an increasing use of those download utilities which can be left
> unattended to download entire web sites taking several hours to do so.
> Multiply this by a number of similar downloads and there goes the
> bandwidth, denying those other normal online readers the speed needed for
> loading and browsing in the manner intended. Several hundred will be
> reading at a time and several 1000 daily.

<snip>

There is no easy solution to this, but one avenue might be to look at
bandwidth throttling in an Apache module.

One that I've used before is mod_throttle, which is in the ports
(/usr/ports/www/mod_throttle) and allows you to throttle users by IP
address to a certain number of documents and/or up to a certain transfer
limit. IIRC it's fairly limited, though, in that you can only apply per-IP
limits to _every_ virtual host - i.e. in the global httpd.conf context.
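To give a rough idea of the mod_throttle side, the policy is set per
(virtual) server in httpd.conf, roughly along these lines. The exact
arguments are from memory, so treat this as a sketch and check the
module's README before relying on it (IIRC there is also a separate
ThrottleClientIP directive for the per-client-IP limits):

    <IfModule mod_throttle.c>
        # Cap this (virtual) server at roughly 10 MB of transfer per day;
        # once the limit is reached, further requests are delayed/refused.
        ThrottlePolicy Volume 10M 1d
    </IfModule>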
A more fine-grained solution (from what I've read - I haven't tried it) is
mod_bwshare. This one isn't in the ports but can be found here:

http://www.topology.org/src/bwshare/

This module overcomes some of the shortfalls of mod_throttle and gives you
finer control over who consumes how much bandwidth over what time period.

> Now, my question: Is it possible to write a script that can constantly scan
> the Apache logs to look for certain footprints of those downloaders,
> perhaps the names, like "HTTRACK", being one I see a lot. Whenever I see
> one of those sessions, I have been able to abort them by adding a rule to
> the firewall to deny the IP address access to the server. This aborts the
> downloading, but have seen the attempts constantly continue for a day or
> two, confirming unattended downloads.
>
> Thus, if the script could spot an "offender" and then perhaps make use of
> the firewall to add a rule containing the offender's IP address and then
> flush to reset the firewall, this would at least abort the download and
> free up the bandwidth (I already have a script that restarts the firewall).
>
> Is this possible and how would I go about it....???

If you really wanted to go down this route, then I found a script someone
wrote a while back to find 'rude robots' in an httpd logfile, which you
could perhaps adapt to do dynamic filtering in conjunction with your
firewall (a very rough sketch of the ipfw side is below):

http://stein.cshl.org/~lstein/talks/perl_conference/cute_tricks/log9.html

If you have any success let me know.

-- 
Jez
http://www.munk.nu/
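For what it's worth, here is a very rough /bin/sh sketch of the log-scanning
idea - untested, and the log path, user-agent patterns and ipfw rule number
are placeholders you would need to adjust. It assumes a combined-format
access log (so the User-Agent appears in each line) and adds individual deny
rules on the fly rather than flushing the whole rule set, so it could be run
from cron every few minutes:

    #!/bin/sh
    # Rough sketch: deny further access to clients whose User-Agent
    # matches known site-downloader footprints, using ipfw.

    ACCESS_LOG="/var/log/httpd-access.log"   # combined log format assumed
    BLOCKED="/var/db/blocked_downloaders"    # IPs we have already blocked
    PATTERNS="HTTrack|WebCopier|Teleport"    # footprints to look for
    RULE=2000                                # ipfw rule number for the denies

    touch "$BLOCKED"

    # The client IP is the first field of each combined-format log line.
    egrep -i "$PATTERNS" "$ACCESS_LOG" | awk '{ print $1 }' | sort -u |
    while read ip; do
        if ! grep -qx "$ip" "$BLOCKED"; then
            echo "`date`: blocking $ip"
            ipfw add $RULE deny ip from $ip to any
            echo "$ip" >> "$BLOCKED"
        fi
    done

Note that egrep matches anywhere in the log line, so a request URL that
happens to contain one of the patterns would also trigger a block - good
enough for a sketch, but you may want to pull out just the User-Agent field
before matching.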