OK, no offence meant: some people do this for glory, others are simply paid
full-time by their employers (Yahoo, CNET, Oracle, and so on). I, for
instance, clean up crap full-time at a bank. Sorry for the off-topic.
Robots are my part-time hobby.

Real-life crawls (of thousands of websites) usually turn up many
robot-specific directives that webmasters aim at particular crawlers, and
documentation for them is hard to find (Japanese robots, Russian robots,
Chinese, American :))
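For example, a real-world robots.txt often mixes standard rules with per-crawler blocks and non-standard directives such as Crawl-delay (the paths below are made up for illustration):

```
User-agent: Yandex
Crawl-delay: 2

User-agent: Baiduspider
Disallow: /private/

User-agent: *
Disallow: /tmp/
```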

I think the framework should be simple enough to plug in a specific
RobotDirective instance (or implementation?) - I don't see how the Nutch or
BIXO approach can do that easily. Also, a fetching application (such as
Nutch) could take additional directives from configuration files (for
example, "do not follow anything like *shopping_cart*" for a specific
site); these could be regular expressions, anything.
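To make the idea concrete, here is a minimal sketch of what such a pluggable directive could look like. The names RobotDirective, RegexDisallowDirective, and DirectiveChain are my own illustrations, not part of Nutch, BIXO, or any existing library:

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical plug-in point: one yes/no decision per URL path.
interface RobotDirective {
    boolean isAllowed(String path);
}

// A directive loaded from per-site configuration, e.g.
// "do not follow anything like *shopping_cart*".
class RegexDisallowDirective implements RobotDirective {
    private final Pattern pattern;

    RegexDisallowDirective(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    @Override
    public boolean isAllowed(String path) {
        // Disallow any path the pattern matches anywhere.
        return !pattern.matcher(path).find();
    }
}

// Combines robots.txt-derived directives with extra configured ones:
// a URL is allowed only if every plugged-in directive allows it.
class DirectiveChain implements RobotDirective {
    private final List<RobotDirective> directives;

    DirectiveChain(List<RobotDirective> directives) {
        this.directives = directives;
    }

    @Override
    public boolean isAllowed(String path) {
        for (RobotDirective d : directives) {
            if (!d.isAllowed(path)) {
                return false;
            }
        }
        return true;
    }
}
```

With this shape, robot-specific or per-site rules become just more RobotDirective instances in the chain, without the parser needing to know about them.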
And I also love Wicket (off-topic again), but a "scraper" (configurator)
for a robot should have a UI, and Wicket is the fastest in development time.

