Hi, what you can do is remove parse-js and the other related plugins from both the nutch-site.xml file and the nutch-default.xml file. Changing nutch-default.xml is not recommended, but sometimes the change has no effect unless nutch-default.xml is edited as well.
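As a rough sketch of the kind of change meant here: the list of active plugins is normally set through the plugin.includes property, and an override in nutch-site.xml could look like the fragment below. The value shown is only an assumed example, not taken from this thread; copy the real value from the nutch-default.xml of your own Nutch version and delete just the parse-js part (and any other parse-* plugins you do not want).

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Overrides the plugin.includes value from nutch-default.xml. -->
  <property>
    <name>plugin.includes</name>
    <!-- Assumed example value: start from the value in your own
         nutch-default.xml. Here the parser group lists only text and html,
         so parse-js (and parse-msword etc.) are not loaded. -->
    <value>protocol-http|urlfilter-regex|parse-(text|html)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  </property>
</configuration>
```

The value is a regular expression matched against plugin names, so leaving a plugin out of it is what removes that plugin.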
So check which changes you need for your requirements. I am sure that once you remove parse-js it won't crawl JavaScript; also try removing other plugins such as parse-msword. I hope that will do it.

Ratnesh, V2Solutions, India

Meryl Silverburgh wrote:
>
> Hi,
>
> How can I configure nutch just crawl html links (no images, no
> javascript files, no css files)?
> And it won't record in the crawl database for non html pages links.
>
> thank you.
>

--
View this message in context: http://www.nabble.com/How-to-config-nutch-just-crawl-html-links--tf3562947.html#a9957697
Sent from the Nutch - User mailing list archive at Nabble.com.
