Hi all,

I'd like to run an intranet crawl with my own plugin on the domain www.wikicfp.com (http://www.wikicfp.com/cfp/call?conference=artificial%20intelligence&skip=1).
The problem is that Nutch doesn't find the important URLs, so it can't crawl any further:

http://www.wikicfp.com/cfp/call?conference=artificial%20intelligence&page=2
http://www.wikicfp.com/cfp/call?conference=artificial%20intelligence&page=3
http://www.wikicfp.com/cfp/call?conference=artificial%20intelligence&page=4
http://www.wikicfp.com/cfp/call?conference=artificial%20intelligence&page= ....

Any suggestions?

My nutch-site.xml:

<property>
  <name>plugin.includes</name>
  <value>my-plugin|protocol-http|parse-(html|js)|index-basic</value>
  <description> </description>
</property>

I commented out all urlfilter files (regex etc.) in conf/.

Thanks in advance.

Regards,
MyD

--
View this message in context: http://www.nabble.com/Nutch-doesn%27t-find-all-urls..-Any-suggestion--tp22599690p22599690.html
Sent from the Nutch - User mailing list archive at Nabble.com.
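
P.S. To make the URL pattern explicit, here is a minimal Python sketch that builds the listing URLs I expect the crawler to discover. It is only an illustration, not part of Nutch: the helper names listing_url and listing_urls are hypothetical, and the page range is an assumption (I only listed pages 2 to 4 above and don't know the real last page).

```python
from urllib.parse import urlencode, quote

# Base of the listing pages mentioned in this post.
BASE = "http://www.wikicfp.com/cfp/call"


def listing_url(conference, page):
    """Build one listing URL (hypothetical helper, for illustration only)."""
    # quote_via=quote encodes the space as %20, matching the URLs in this post.
    query = urlencode({"conference": conference, "page": page}, quote_via=quote)
    return f"{BASE}?{query}"


def listing_urls(conference, first_page, last_page):
    """Build the listing URLs for an inclusive page range (range is assumed)."""
    return [listing_url(conference, p) for p in range(first_page, last_page + 1)]


if __name__ == "__main__":
    # Pages 2..4 are the ones listed above; the true upper bound is unknown.
    for url in listing_urls("artificial intelligence", 2, 4):
        print(url)
```

The printed lines could be compared against the outlinks Nutch actually extracts from the first page, or, as a possible workaround, be written into the seed list so the pages are fetched even when the links are not discovered.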