Hi,

In the plugin.includes value, change urlfilter-regex to urlfilter-(crawl|regex).
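
For reference, the edited property might look roughly like the sketch below. Everything except the urlfilter part is copied from the plugin.includes value quoted further down. Two assumptions on my side: that your install actually has a plugin directory called urlfilter-crawl (check your plugins folder), and that you put the override in conf/nutch-site.xml instead of editing conf/nutch-default.xml directly.

```xml
<!-- Sketch only: assumes a plugin directory named urlfilter-crawl exists. -->
<!-- Assumed location: conf/nutch-site.xml, which overrides nutch-default.xml. -->
<property>
  <name>plugin.includes</name>
  <!-- Only change from the quoted value: urlfilter-regex became urlfilter-(crawl|regex) -->
  <value>protocol-http|urlfilter-(crawl|regex)|parse-(text|html|js)|index-(basic|anchor)|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
```

The value is a regular expression matched against plugin directory names, so urlfilter-(crawl|regex) keeps urlfilter-regex enabled and additionally allows urlfilter-crawl.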
bhupal

Barry Haddow wrote:
>
> Hi Bhupal
>
> The plugin.includes is below - I haven't changed it at all. What should it
> be?
>
> thanks and regards,
> Barry
>
> <property>
>   <name>plugin.includes</name>
>   <value>protocol-http|urlfilter-regex|parse-(text|html|js)|index-(basic|anchor)|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
>   <description>Regular expression naming plugin directory names to
>   include. Any plugin not matching this expression is excluded.
>   In any case you need at least include the nutch-extensionpoints plugin. By
>   default Nutch includes crawling just HTML and plain text via HTTP,
>   and basic indexing and search plugins. In order to use HTTPS please enable
>   protocol-httpclient, but be aware of possible intermittent problems with the
>   underlying commons-httpclient library.
>   </description>
> </property>
>
>
> On Tuesday 29 January 2008, bhupal wrote:
>> Hi,
>>
>> Look at your conf/nutch-default.xml.
>> I think you have not added crawl-urlfilter plugin in plugin-include
>> property.
>>
>> bhupal.
>>
>> Barry Haddow wrote:
>> > Hi
>> >
>> > I'm trying to get the nutch/hadoop example from
>> > http://wiki.apache.org/nutch/NutchHadoopTutorial
>> > running.
>> >
>> > I've set up the urllist.txt and the crawl-urlfilter.xml as instructed in
>> > the tutorial, but whenever I run the crawl it either reports
>> >
>> > Generator: 0 records selected for fetching, exiting ...
>> > Stopping at depth=1 - no more URLs to fetch.
>> >
>> > or
>> >
>> > Generator: 0 records selected for fetching, exiting ...
>> > Stopping at depth=0 - no more URLs to fetch.
>> >
>> >
>> > I can't tell if the crawler has managed to fetch any data. How can I
>> > extract whatever data it has downloaded?
>> >
>> > thanks,
>> > Barry
>
>

--
View this message in context: http://www.nabble.com/Simple-crawl-fails-to-find-any-URLs-tp15143487p15156232.html
Sent from the Nutch - User mailing list archive at Nabble.com.