Hi all,

I'd like to run an intranet crawl with my own plugin on the domain
www.wikicfp.com.
(http://www.wikicfp.com/cfp/call?conference=artificial%20intelligence&skip=1)

The problem is that Nutch doesn't find the important URLs (the links to the
following result pages), so it can't crawl any further:
(http://www.wikicfp.com/cfp/call?conference=artificial%20intelligence&page=2)
(http://www.wikicfp.com/cfp/call?conference=artificial%20intelligence&page=3)
(http://www.wikicfp.com/cfp/call?conference=artificial%20intelligence&page=4)
(http://www.wikicfp.com/cfp/call?conference=artificial%20intelligence&page=
....)
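
To make the expected pattern explicit, here is a small Python sketch that builds the URLs I would expect Nutch to discover. The helper name `page_url` is made up for illustration, and the page range 2 to 4 only mirrors the examples above; the real upper bound is unknown.

```python
from urllib.parse import urlencode, quote

# Base path taken from the URLs listed above.
BASE = "http://www.wikicfp.com/cfp/call"


def page_url(conference: str, page: int) -> str:
    """Build the URL of one result page (hypothetical helper)."""
    # quote_via=quote encodes spaces as %20, matching the URLs above;
    # the default (quote_plus) would encode them as '+'.
    query = urlencode({"conference": conference, "page": page}, quote_via=quote)
    return f"{BASE}?{query}"


if __name__ == "__main__":
    # Pages 2..4 are just the ones listed above, not a known limit.
    for page in range(2, 5):
        print(page_url("artificial intelligence", page))
```

Every one of these URLs contains `?`, `=` and `&`, which may matter for whatever URL filtering is still active.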

Any suggestions?

My nutch-site.xml:

<property>
  <name>plugin.includes</name>
  <value>my-plugin|protocol-http|parse-(html|js)|index-basic</value>
  <description>
  </description>
</property>

I commented out everything in the urlfilter files (regex etc.) in conf/.

Thanks in advance.

Regards,
MyD
