Hi,

On 11/27/2010 04:01 PM, Fuad Efendi wrote:
> Ok; no offence please. Some people do it for glory, others are simply
> paid full time by their employers (such as Yahoo, CNET, Oracle). I am
> cleaning up some crap full time in a bank, for instance. Sorry for the
> off-topic; robots are my part-time hobby.
>
> Real-life crawls (thousands of websites) usually show many specific
> directives that webmasters use for particular robots, and it is hard to
> find documentation for them (Japanese robots, Russian robots, Chinese,
> American :))
>
> I think the framework should be simple enough to plug in a specific
> RobotDirective instance (or implementation?). I don't see how the Nutch
> or Bixo approach can do that easily. Also, the fetching application
> (such as Nutch) could use additional directives from configuration
> files (e.g. "do not follow anything like *shopping_cart*" for a
> specific site); these could be regular expressions, anything.
See this:

  http://svn.apache.org/repos/asf/incubator/droids/trunk/droids-core/src/test/java/org/apache/droids/examples/SimpleRuntime.java

Particularly this part:

  URLFiltersFactory filtersFactory = new URLFiltersFactory();
  RegexURLFilter defaultURLFilter = new RegexURLFilter();
  defaultURLFilter.setFile("classpath:/regex-urlfilter.txt");

Does that fill your requirements?

> And I also love Wicket (off-topic again), but a "scraper"
> (configurator) for a robot should have a UI, and Wicket is the fastest
> (in development time).

Search the mailing-list archive and/or the bug tracker; if I remember
correctly, someone (sorry, I don't recall the name) built a UI
configurator.

++
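In case it helps, here is a minimal sketch of what that regex-urlfilter.txt
could contain to cover your *shopping_cart* case. I'm assuming the
Nutch-style syntax here (each line is a `+` or `-` prefix followed by a
Java regex, evaluated top to bottom, first match wins), which is the
format the Droids example appears to reuse; double-check the file shipped
with the example before relying on it:

```
# Skip anything that looks like a shopping cart (the per-site rule
# Fuad mentioned)
-shopping_cart

# Skip common non-content resources
-\.(gif|jpg|png|css|js)$

# Accept everything else
+.
```

Since the rules are plain regexes loaded from a file, a fetching
application can keep one such file per site and swap them in from
configuration without touching the crawler code.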
