I was just looking through NoRobotClient and am concerned that, in most scenarios, Droids may not actually respect robots.txt when force allow is false. Consider the following robots.txt:

    User-agent: *
    Disallow: /foo/

and the starting URI:

    http://www.example.com/foo/bar.html

In NoRobotClient.isUrlAllowed() I see the following:

    String path = uri.getPath();
    String basepath = baseURI.getPath();
    if (path.startsWith(basepath)) {
        path = path.substring(basepath.length());
        if (!path.startsWith("/")) {
            path = "/" + path;
        }
    }
    ...
    Boolean allowed = this.rules != null ? this.rules.isAllowed(path) : null;
    if (allowed == null) {
        allowed = this.wildcardRules != null ? this.wildcardRules.isAllowed(path) : null;
    }
    if (allowed == null) {
        allowed = Boolean.TRUE;
    }

The path will always be converted to /bar.html, which is then checked against rules and wildcardRules but matches neither. However, basepath (which here would be /foo) is never checked against the rules, so isUrlAllowed() incorrectly returns true, no?

robin
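To make the concern concrete, here is a minimal, self-contained Java sketch. strippedPath() copies the base-path stripping quoted above. isAllowedByPrefix() is a hypothetical, simplified prefix matcher standing in for the Droids rule sets; it is not the real rules/wildcardRules API. The sketch also assumes the base URI is http://www.example.com/foo, matching the basepath of /foo in the analysis above.

```java
import java.net.URI;
import java.util.List;

public class RobotsPathCheck {

    // Hypothetical, simplified stand-in for the Droids rule sets (NOT the real API):
    // a path is disallowed if it starts with any non-empty Disallow prefix.
    static boolean isAllowedByPrefix(List<String> disallowPrefixes, String path) {
        for (String prefix : disallowPrefixes) {
            if (!prefix.isEmpty() && path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }

    // Mirrors the base-path stripping quoted from NoRobotClient.isUrlAllowed().
    static String strippedPath(URI baseURI, URI uri) {
        String path = uri.getPath();
        String basepath = baseURI.getPath();
        if (path.startsWith(basepath)) {
            path = path.substring(basepath.length());
            if (!path.startsWith("/")) {
                path = "/" + path;
            }
        }
        return path;
    }

    public static void main(String[] args) {
        // robots.txt: "User-agent: *" / "Disallow: /foo/"
        List<String> disallow = List.of("/foo/");
        // Assumption: the client was initialised with this base URI.
        URI baseURI = URI.create("http://www.example.com/foo");
        URI uri = URI.create("http://www.example.com/foo/bar.html");

        String stripped = strippedPath(baseURI, uri);
        System.out.println(stripped);                                   // prints /bar.html
        System.out.println(isAllowedByPrefix(disallow, stripped));      // prints true (wrongly allowed)
        System.out.println(isAllowedByPrefix(disallow, uri.getPath())); // prints false
    }
}
```

Since robots.txt rules are always relative to the root of the host, matching the full uri.getPath() rather than the stripped path would be one way to avoid this, though the actual Droids fix may look different.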
