Hi, I am unable to get the attached patch via mail. It would be better if you create a JIRA issue and attach the patch there.
Thank you.

On 2/15/07, Doğacan Güney <[EMAIL PROTECTED]> wrote:
> Hi,
>
> There seem to be two small bugs in lib-http's RobotRulesParser.
>
> The first is in the crawl-delay handling. The code doesn't check addRules,
> so the Nutch bot will pick up the crawl-delay value of another robot's
> entry in robots.txt. To be more concrete:
>
> User-agent: foobot
> Crawl-delay: 3600
>
> User-agent: *
> Disallow:
>
> Given such a robots.txt file, the Nutch bot will get 3600 as its
> crawl-delay value, no matter what the Nutch bot's name actually is.
>
> The second is in the main method. RobotRulesParser.main advertises its
> usage as "<robots-file> <url-file> <agent-name>+", but if you give it
> more than one agent name it refuses to run.
>
> Trivial patch attached.
>
> --
> Doğacan Güney
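For anyone following along without the patch, here is a minimal, self-contained sketch of the guarded crawl-delay handling Doğacan describes. This is not the actual Nutch source (class name, matching logic, and blank-line handling are all simplified for illustration); the point is only that a Crawl-delay line should be honored solely while addRules is true, i.e. while we are inside a User-agent block that matches our own bot.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class CrawlDelaySketch {

  // Returns the crawl delay (in ms) that applies to agentName, or -1 if none.
  public static long parseCrawlDelay(String robotsTxt, String agentName)
      throws IOException {
    BufferedReader in = new BufferedReader(new StringReader(robotsTxt));
    boolean addRules = false;  // inside a User-agent block that matches us?
    long crawlDelay = -1;
    String line;
    while ((line = in.readLine()) != null) {
      line = line.trim();
      if (line.regionMatches(true, 0, "User-agent:", 0, 11)) {
        String agent = line.substring(11).trim().toLowerCase();
        // Simplified matching: our own name or the wildcard.
        addRules = agent.equals("*")
            || agentName.toLowerCase().indexOf(agent) != -1;
      } else if (line.regionMatches(true, 0, "Crawl-delay:", 0, 12)) {
        // The fix: only take the value inside a matching block.
        if (addRules) {
          crawlDelay = Long.parseLong(line.substring(12).trim()) * 1000L;
        }
      }
    }
    return crawlDelay;
  }

  public static void main(String[] args) throws IOException {
    String robots = "User-agent: foobot\n"
        + "Crawl-delay: 3600\n"
        + "\n"
        + "User-agent: *\n"
        + "Disallow:\n";
    // Without the addRules check this would print 3600000; with it, -1,
    // since the 3600-second delay belongs to foobot, not to us.
    System.out.println(parseCrawlDelay(robots, "Nutch"));
  }
}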
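For the second bug, the fix is presumably just relaxing the argument-count check in main from an exact comparison to a lower bound, along these lines (again a sketch, since I don't have the patch in front of me):

// The usage string promises one or more agent names, so accept
// three or more arguments rather than exactly three:
if (argv.length < 3) {
  System.err.println(
      "Usage: RobotRulesParser <robots-file> <url-file> <agent-name>+");
  System.exit(-1);
}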
