Ok; no offence intended, please: some people are doing it for glory, others are simply paid full-time by their employers (such as Yahoo, CNET, Oracle). I, for instance, clean up crap in a bank full-time. Sorry for the off-topic; robots are my part-time hobby.
Real-life crawls (thousands of websites) usually show many site-specific directives that webmasters use for particular robots, and it's hard to find documentation for them (Japanese robots, Russian robots, Chinese, American :)). I think the framework should be simple enough to plug in a specific RobotDirective instance (or implementation?); I don't see how the Nutch or BIXO approach can do that easily. Also, the fetching application (such as Nutch) could pick up additional directives from configuration files, for example "do not follow anything like *shopping_cart*" for a specific site; it could be regular expressions, anything (see the sketch at the end of this message). And I also love Wicket (off-topic again), but the "scraper" (configurator) for a robot should have a UI, and Wicket is the fastest in terms of development time.
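To make the plug-in idea a bit more concrete, here is a minimal Java sketch of how a pluggable directive handler plus a per-site regex exclusion from a config file might fit together. This is not from any existing framework; the RobotDirectiveHandler, CrawlRules and ConfiguredExclusions names (and the "x-exclude-pattern" directive) are purely illustrative.

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical hook: the framework would let callers register handlers
// for non-standard robots.txt directives (or directives coming from a
// per-site configuration file instead of robots.txt).
interface RobotDirectiveHandler {
    /** Returns true if this handler recognises the given directive name. */
    boolean canHandle(String directiveName);

    /** Applies the directive's value to the crawl rules being built. */
    void apply(String directiveName, String value, CrawlRules rules);
}

// Minimal rules object that handlers mutate; a real framework would be richer.
class CrawlRules {
    final List<Pattern> disallowed = new ArrayList<>();

    boolean isAllowed(String path) {
        for (Pattern p : disallowed) {
            if (p.matcher(path).find()) {
                return false;
            }
        }
        return true;
    }
}

// Example handler: per-site exclusions loaded from configuration,
// e.g. "do not follow anything like *shopping_cart* for this site".
class ConfiguredExclusions implements RobotDirectiveHandler {
    public boolean canHandle(String directiveName) {
        // Made-up directive name used only for this sketch.
        return "x-exclude-pattern".equalsIgnoreCase(directiveName);
    }

    public void apply(String directiveName, String value, CrawlRules rules) {
        rules.disallowed.add(Pattern.compile(value));
    }
}

public class PluggableDirectivesDemo {
    public static void main(String[] args) {
        CrawlRules rules = new CrawlRules();
        RobotDirectiveHandler handler = new ConfiguredExclusions();

        // Value would normally come from a per-site config file or robots.txt.
        if (handler.canHandle("x-exclude-pattern")) {
            handler.apply("x-exclude-pattern", ".*shopping_cart.*", rules);
        }

        System.out.println(rules.isAllowed("/products/123"));      // true
        System.out.println(rules.isAllowed("/shopping_cart/add")); // false
    }
}

The point is only that directive handling stays behind a small interface, so exotic robots.txt extensions or site-specific config rules can be plugged in without touching the core fetcher.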
