Hi,

On 11/27/2010 04:01 PM, Fuad Efendi wrote:
> Ok, no offence please; some people do it for glory, others are simply
> paid full-time by their employers (such as Yahoo, CNET, Oracle). I am
> cleaning up some crap full-time in a bank, for instance. Sorry for the
> off-topic. Robots are my part-time hobby.
> 
> Real-life crawls (thousands of websites) usually reveal many specific
> directives that webmasters use for particular robots, and it is hard to
> find documentation for them (Japanese robots, Russian robots, Chinese,
> American :))
> 
> I think the framework should be simple enough to plug in a specific
> RobotDirective instance (or implementation?) - I don't see how the
> Nutch or BIXO approach can do that easily... Also, a fetching
> application (such as Nutch) could use additional directives from
> configuration files (such as "do not follow anything matching
> *shopping_cart* on a specific site"); these could be regular
> expressions, anything.

See this:
http://svn.apache.org/repos/asf/incubator/droids/trunk/droids-core/src/test/java/org/apache/droids/examples/SimpleRuntime.java

Particularly this part:

    URLFiltersFactory filtersFactory = new URLFiltersFactory();
    RegexURLFilter defaultURLFilter = new RegexURLFilter();
    defaultURLFilter.setFile("classpath:/regex-urlfilter.txt");
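For the per-site exclusions mentioned above, the regex-urlfilter.txt could look something like this - assuming Droids uses the Nutch-style format (a `+` or `-` prefix followed by a Java regex, first match wins; I haven't double-checked the Droids parser):

```text
# skip anything with shopping_cart in the URL
-.*shopping_cart.*
# accept everything else
+.
```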

Does that fit your requirements, no?
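To sketch the "pluggable per-site directive" idea without depending on the Droids API, here is a minimal standalone filter - all class and method names here are illustrative, not actual Droids classes:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Minimal sketch of a pluggable regex URL filter; illustrative only,
// not the real org.apache.droids API.
public class RegexUrlFilter {
    private final List<Pattern> excludes = new ArrayList<>();

    // e.g. addExclude(".*shopping_cart.*") to skip cart pages on a site
    public void addExclude(String regex) {
        excludes.add(Pattern.compile(regex));
    }

    // returns the URL if accepted, or null if an exclude rule matches
    public String filter(String url) {
        for (Pattern p : excludes) {
            if (p.matcher(url).matches()) {
                return null;
            }
        }
        return url;
    }

    public static void main(String[] args) {
        RegexUrlFilter f = new RegexUrlFilter();
        f.addExclude(".*shopping_cart.*");
        System.out.println(f.filter("http://example.com/shopping_cart/add")); // prints null
        System.out.println(f.filter("http://example.com/products/1"));        // prints the URL
    }
}
```

The same shape would let each site plug in its own exclusion rules, which is what a configuration-driven fetcher would need.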

> And I also love Wicket (off-topic again) but "scraper" (configurator) for a
> robot should have UI, and Wicket is fastest (development time)

Search the ML archive and/or the bug tracker; if I remember well, there
is someone (sorry, I don't remember the name) who built a UI
configurator...

++
