When I use org.apache.nutch.parse.rss.RSSParser, it works fine, and I am getting the URLs. Now I want to get their content. How do I do this? Do I need to add all the URLs to the crawldb and then run the crawl command, or is there another way?
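One common route in Nutch 1.x is to treat the extracted URLs as seeds: write them one per line into a text file inside a directory, inject that directory into the crawldb, and then fetch. Below is a minimal sketch of the first step using only the JDK. The class name `SeedWriter`, the method `writeSeeds`, and the file name `seed.txt` are hypothetical and are not part of Nutch.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class SeedWriter {
    // Hypothetical helper: writes one URL per line into seedDir/seed.txt,
    // trimming whitespace, skipping blanks, and dropping duplicates
    // while keeping the original order.
    public static Path writeSeeds(Path seedDir, List<String> urls) throws IOException {
        Files.createDirectories(seedDir);
        Set<String> unique = new LinkedHashSet<>();
        for (String u : urls) {
            String trimmed = u.trim();
            if (!trimmed.isEmpty()) {
                unique.add(trimmed);
            }
        }
        Path seedFile = seedDir.resolve("seed.txt");
        Files.write(seedFile, unique, StandardCharsets.UTF_8);
        return seedFile;
    }

    public static void main(String[] args) throws IOException {
        // Example URLs stand in for the links returned by the RSS parser.
        Path dir = Files.createTempDirectory("urls");
        Path seeds = writeSeeds(dir, List.of(
                "http://example.com/a", " http://example.com/b ", "", "http://example.com/a"));
        System.out.println(Files.readAllLines(seeds, StandardCharsets.UTF_8));
    }
}
```

The seed directory could then be injected with something like `bin/nutch inject crawl/crawldb urls` and fetched through the generate/fetch/updatedb cycle, or through the all-in-one `bin/nutch crawl` command that Nutch 1.x offered. The paths here are assumptions; check the usage message of your Nutch version.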
Hi, I want to parse a feed URL using Nutch. I tried to use the org.apache.nutch.parse.feed.FeedParser class. Its input is XML, so I put the link below into the XML: http://timesofindia.indiatimes.com/rssfeedsdefault.cms This URL contains all the RSS feeds for the newspaper. When I tried it with the Rome feed parser, it gave me all the permalinks, titles, dates, etc. But the Nutch parser does not give me anything. How can I get all the permalinks, titles, and dates from this URL?
--
View this message in context: http://www.nabble.com/How-to-Parse-Rss-Feed-URL-tp24386051p24404029.html
Sent from the Nutch - User mailing list archive at Nabble.com.
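For comparison with what Rome returned, here is a minimal sketch that pulls the permalink, title, and date out of each item using only the JDK's DOM parser. It assumes the feed is plain RSS 2.0 (`<item>` elements with `<link>`, `<title>`, and `<pubDate>` children). It is neither Nutch's FeedParser nor Rome, and the names `RssItems`, `Item`, and `parse` are hypothetical. If the page at the URL above is a list of feeds rather than a feed itself, each listed feed would need to be fetched and parsed separately.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class RssItems {
    // Hypothetical holder for the three fields asked about.
    public static final class Item {
        public final String link, title, pubDate;
        Item(String link, String title, String pubDate) {
            this.link = link;
            this.title = title;
            this.pubDate = pubDate;
        }
    }

    // First direct child element with the given tag name, or null if absent.
    static Element child(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && name.equals(n.getNodeName())) {
                return (Element) n;
            }
        }
        return null;
    }

    // Trimmed text content of an element, or "" when the element is missing.
    static String text(Element e) {
        return e == null ? "" : e.getTextContent().trim();
    }

    public static List<Item> parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Refuse DOCTYPE declarations so untrusted feeds cannot pull in external entities.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList nodes = doc.getElementsByTagName("item");
        List<Item> items = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element item = (Element) nodes.item(i);
            String link = text(child(item, "link"));
            // Fall back to <guid> unless it is marked isPermaLink="false" (RSS 2.0 default is true).
            Element guid = child(item, "guid");
            if (link.isEmpty() && guid != null
                    && !"false".equalsIgnoreCase(guid.getAttribute("isPermaLink"))) {
                link = text(guid);
            }
            items.add(new Item(link, text(child(item, "title")), text(child(item, "pubDate"))));
        }
        return items;
    }

    public static void main(String[] args) throws Exception {
        // Inline sample feed; a real run would read the fetched feed body instead.
        String sample = "<rss version=\"2.0\"><channel><title>Example</title>"
                + "<item><title>First story</title><link>http://example.com/1</link>"
                + "<pubDate>Thu, 09 Jul 2009 10:00:00 GMT</pubDate></item>"
                + "<item><title>Second story</title><guid>http://example.com/2</guid></item>"
                + "</channel></rss>";
        for (Item it : parse(sample)) {
            System.out.println(it.title + " | " + it.link + " | " + it.pubDate);
        }
    }
}
```

If this extracts the fields but the Nutch parser does not, the difference is more likely in how the feed reaches the parser (content type, plugin configuration) than in the XML itself. That is a guess worth checking against the plugin's configuration rather than a confirmed cause.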
