Good spot, raster. I've blocked the following User-Agents on the frontend load balancer, so no other VM/website can be reached by those bit suckers.
acl u-robots-bad hdr(User-Agent) -i Ahrefsbot
acl u-robots-bad hdr(User-Agent) -i Baiduspider
acl u-robots-bad hdr(User-Agent) -i Cliqzbot
acl u-robots-bad hdr(User-Agent) -i DotBot
acl u-robots-bad hdr(User-Agent) -i MJ12bot
acl u-robots-bad hdr(User-Agent) -i Semrushbot
acl u-robots-bad hdr(User-Agent) -i YandexBot

https://git.enlightenment.org/ is back online.

Cheers,
Bertrand

On 25/04/2017 22:40, Bertrand Jacquin wrote:
> Hey,
>
> Taking a look
>
> Cheers
>
> On 25/04/2017 19:51, Dave wrote:
>> Try the following apache config in your directory directive, or
>> .htaccess file:
>>
>> BrowserMatchNoCase Baiduspider botblock
>> BrowserMatchNoCase Semrushbot botblock
>> BrowserMatchNoCase Ahrefsbot botblock
>> Order Deny,Allow
>> Deny from env=botblock
>>
>> That should block those specific bots while allowing others to use
>> HTTP. It could take a few weeks before the bots realise you've made a
>> change to your robots.txt.
>>
>> Cheers,
>> davek
>>
>> In the year 2017, of the month of April, on the 26th day, Carsten
>> Haitzler wrote:
>>> I've had to disable all HTTP support for git.enlightenment.org for
>>> now, because several bots are crawling it, leaving our VM loaded with
>>> 10-20 cgit CGIs continually running git queries for history etc. I/O
>>> and system load are going through the roof as a result, causing other
>>> things like phab to crawl and begin timing out.
>>>
>>> So anyone using HTTP for cmdline git stuff is, at this moment, going
>>> to find things not working. SSH and the GIT protocol should still
>>> work. I'll keep this shut down for a few hours, hoping the bots give
>>> up.
>>>
>>> I added a robots.txt and edited the cgitrc to deny all bots from
>>> indexing git.enlightenment.org - but the bots seem to be ignoring
>>> that now that they have decided to start indexing.
>>>
>>> I am wondering if this has been the cause of our issues - being
>>> overloaded by indexer bots.
>>> FYI I counted 3 different bots indexing cgit at the same time:
>>> Baiduspider, Semrushbot, Ahrefsbot.
>>>
>>> I hope later they will start listening to robots.txt, but for now I
>>> need to keep things off until the bots give up.
>>>
>>> --
>>> ------------- Codito, ergo sum - "I code, therefore I am" --------------
>>> The Rasterman (Carsten Haitzler) [email protected]
>>>
>>> ------------------------------------------------------------------------------
>>> Check out the vibrant tech community on one of the world's most
>>> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>>> _______________________________________________
>>> enlightenment-devel mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/enlightenment-devel

--
Bertrand
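[Editor's note] ACL definitions like the ones at the top of the thread only classify traffic; they block nothing until referenced by a rule. Also worth noting: `hdr(User-Agent)` performs an exact match against the full header value, while real crawler User-Agents embed the bot name in a longer string (e.g. "Mozilla/5.0 (compatible; AhrefsBot/7.0; ...)"), which is why substring matching via `hdr_sub(User-Agent)` is the usual choice. A minimal sketch of how the deny rule could be wired up — the frontend name and bind address are illustrative, not taken from the actual enlightenment.org config:

```haproxy
# Hypothetical frontend; only the ACL names come from the thread above.
frontend ft-http
    bind :80
    # Case-insensitive substring match on the User-Agent header
    acl u-robots-bad hdr_sub(User-Agent) -i Ahrefsbot
    acl u-robots-bad hdr_sub(User-Agent) -i Baiduspider
    acl u-robots-bad hdr_sub(User-Agent) -i Semrushbot
    # Return 403 to matching crawlers before they ever reach cgit
    http-request deny if u-robots-bad
    default_backend bk-cgit
```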

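[Editor's note] For the robots.txt approach discussed above, a well-behaved crawler evaluates the file the same way Python's standard `urllib.robotparser` does. This sketch uses an illustrative blanket deny-all policy, not the actual git.enlightenment.org file, to show that a compliant bot would refuse every path:

```python
# Sketch: how a compliant crawler interprets a deny-all robots.txt.
# The rule text below is illustrative, not the site's actual file.
import urllib.robotparser

ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Under a blanket Disallow, no user agent may fetch any path.
print(rp.can_fetch("SemrushBot", "https://git.enlightenment.org/"))     # False
print(rp.can_fetch("AhrefsBot", "https://git.enlightenment.org/log/"))  # False
```

Of course, as the thread shows, this only helps against bots that actually honour robots.txt; the misbehaving ones still need to be blocked at the web server or load balancer.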