On Tue, 25 Apr 2017 22:58:14 -0700 Bertrand Jacquin <[email protected]> said:
sounds good. one day they'll find the robots.txt ... i hope. but not today. this
will indeed be good for now. turning git.e.org off entirely was an emergency
solution until i either could see these time out (they didn't after a few hrs)
or a better solution like below was put in place. :) phabricator works oh so
much better now! like i can actually use arc to apply patches/diffs as opposed
to getting 503's every time...

> Good spot, raster.
>
> I've blocked the following User-Agents on the frontend load balancer, so
> no other VM/website can be reached by those bit suckers.
>
> acl u-robots-bad hdr(User-Agent) -i Ahrefsbot
> acl u-robots-bad hdr(User-Agent) -i Baiduspider
> acl u-robots-bad hdr(User-Agent) -i Cliqzbot
> acl u-robots-bad hdr(User-Agent) -i DotBot
> acl u-robots-bad hdr(User-Agent) -i MJ12bot
> acl u-robots-bad hdr(User-Agent) -i Semrushbot
> acl u-robots-bad hdr(User-Agent) -i YandexBot
>
> https://git.enlightenment.org/ is back online.
>
> Cheers,
> Bertrand
>
> On 25/04/2017 22:40, Bertrand Jacquin wrote:
> > Hey,
> >
> > Taking a look
> >
> > Cheers
> >
> > On 25/04/2017 19:51, Dave wrote:
> >> Try the following apache config in your directory directive, or
> >> .htaccess file:
> >>
> >> BrowserMatchNoCase Baiduspider botblock
> >> BrowserMatchNoCase Semrushbot botblock
> >> BrowserMatchNoCase Ahrefsbot botblock
> >> Order Deny,Allow
> >> Deny from env=botblock
> >>
> >> This should block those specific bots while allowing others to use http.
> >> It could take a few weeks before the bots realise you've made a change
> >> to your robots.txt.
> >>
> >> Cheers,
> >> davek
> >>
> >> In the year 2017, of the month of April, on the 26th day, Carsten
> >> Haitzler wrote:
> >>> I've had to disable the whole http support for now for
> >>> git.enlightenment.org because several bots are crawling it, causing
> >>> our VM to basically be loaded with 10-20 cgit cgi's running git
> >>> queries for history etc. continually.
> >>> I/O and system load are going through the roof as a result, causing
> >>> other stuff like phab to crawl and begin timing out.
> >>>
> >>> So anyone using HTTP for doing cmdline git stuff is, at this moment,
> >>> going to find things not working. SSH and GIT protocol should still
> >>> work. I'll keep this shut down for a few hours hoping the bots give up.
> >>>
> >>> I added a robots.txt and edited the cgitrc to deny all bots from
> >>> indexing git.enlightenment.org - but the bots seem to be ignoring that
> >>> now that they have decided to start indexing.
> >>>
> >>> I am wondering if this has been the cause of our issues - being
> >>> overloaded by indexer bots. FYI I counted 3 different bots indexing
> >>> cgit at the same time: Baiduspider, Semrushbot, Ahrefsbot.
> >>>
> >>> I hope later they will start listening to robots.txt, but for now I
> >>> need to keep things off until the bots give up.
> >>>
> >>> --
> >>> ------------- Codito, ergo sum - "I code, therefore I am" --------------
> >>> The Rasterman (Carsten Haitzler) [email protected]
> >>>
> >>>
> >>> ------------------------------------------------------------------------------
> >>> Check out the vibrant tech community on one of the world's most
> >>> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> >>> _______________________________________________
> >>> enlightenment-devel mailing list
> >>> [email protected]
> >>> https://lists.sourceforge.net/lists/listinfo/enlightenment-devel
>
> --
> Bertrand

--
------------- Codito, ergo sum - "I code, therefore I am" --------------
The Rasterman (Carsten Haitzler) [email protected]
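Both blocking approaches in this thread key on the User-Agent header: the HAProxy ACLs use `-i` to ignore case, and Apache's `BrowserMatchNoCase` sets the `botblock` environment variable when a case-insensitive regular expression matches the header, after which `Deny from env=botblock` answers 403. One detail worth knowing: as an HAProxy ACL, `hdr(User-Agent)` compares the whole header value exactly, while `hdr_sub(User-Agent)` matches a substring, and crawler User-Agents typically embed the bot name in a longer string. The Python sketch below contrasts the two behaviours. The names `BAD_BOTS`, `exact_match`, `substring_match` and `status_for` are hypothetical, and the sample User-Agent strings are illustrative, not taken from the thread. Because the bot names contain no regex metacharacters, a plain substring test behaves like the `BrowserMatchNoCase` patterns here.

```python
# Hypothetical sketch of User-Agent blocking; not the configuration
# actually deployed in front of git.enlightenment.org.

# Bot names from the thread, lower-cased for case-insensitive comparison
# (the equivalent of HAProxy's -i flag or Apache's BrowserMatchNoCase).
BAD_BOTS = (
    "ahrefsbot",
    "baiduspider",
    "cliqzbot",
    "dotbot",
    "mj12bot",
    "semrushbot",
    "yandexbot",
)


def exact_match(user_agent: str) -> bool:
    """Whole-value comparison, like an HAProxy `hdr(User-Agent) -i NAME` ACL."""
    return user_agent.lower() in BAD_BOTS


def substring_match(user_agent: str) -> bool:
    """Substring comparison, like `hdr_sub(User-Agent) -i NAME`."""
    ua = user_agent.lower()
    return any(bot in ua for bot in BAD_BOTS)


def status_for(user_agent: str) -> int:
    """Hypothetical policy: 403 Forbidden for blocked crawlers, 200 otherwise."""
    return 403 if substring_match(user_agent) else 200


if __name__ == "__main__":
    # Illustrative User-Agent strings; not taken from the thread or any logs.
    samples = [
        "Mozilla/5.0 (compatible; AhrefsBot/5.2; +http://ahrefs.com/robot/)",
        "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)",
        "git/2.12.2",
    ]
    for ua in samples:
        print(f"exact={exact_match(ua)} substring={substring_match(ua)} "
              f"status={status_for(ua)} ua={ua}")
```

Substring matching catches bot names embedded in longer strings, but it can also catch an unrelated client whose User-Agent happens to contain one of the listed names, so the list is best kept specific.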
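The robots.txt that Carsten added only has an effect if a crawler chooses to consult it, which is why the bots in this thread could keep hammering cgit regardless. As a hedged illustration, Python's standard `urllib.robotparser` performs the check a well-behaved crawler would make before fetching a URL. The deny-all file contents are an assumption (the thread does not show the file actually deployed), and `DENY_ONE` and `allowed` are hypothetical names for a per-bot variant and a helper.

```python
from urllib.robotparser import RobotFileParser

# Assumption: a deny-all robots.txt like the one described in the thread.
# The file actually deployed on git.enlightenment.org is not shown.
DENY_ALL = """\
User-agent: *
Disallow: /
"""

# Hypothetical alternative: deny one named crawler, allow everyone else.
DENY_ONE = """\
User-agent: Baiduspider
Disallow: /

User-agent: *
Disallow:
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Answer the question a *compliant* crawler asks before fetching url."""
    parser = RobotFileParser()
    # Record a "last fetched" time; can_fetch() returns False while none is set.
    parser.modified()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://git.enlightenment.org/"
    for bot in ("Baiduspider", "SemrushBot", "AhrefsBot"):
        print(bot, allowed(DENY_ALL, bot, url), allowed(DENY_ONE, bot, url))
```

Nothing in this check is enforced by the server; a crawler that skips it is stopped only by filtering such as the load-balancer ACLs above.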
