I am using the RSS parser. As the fetcher finds the URLs within the RSS feed, I attach the description of each item as the anchor text of the corresponding outlink.
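To make the setup concrete, here is a minimal, Nutch-independent model of it. It assumes that anchors are stored under the URL the outlink pointed to, and that a redirected page is indexed under the URL it redirected to. Every class and method name below (`RedirectAnchors`, `addOutlink`, `addRedirect`, `resolvedAnchors`, and so on) is hypothetical and is not Nutch API. The sketch shows why a lookup by the indexed URL alone comes back empty for a redirected item, and how walking the recorded redirects back to the original URL would recover the description.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical model (not Nutch API): anchors are keyed by the outlink's
 * target URL, while a redirected page is indexed under its redirect target.
 */
public class RedirectAnchors {

    // target URL -> anchor texts of links pointing at it (stands in for the link database)
    private final Map<String, List<String>> anchorsByTarget = new HashMap<>();

    // original URL -> URL it redirected to, as observed while fetching
    private final Map<String, String> redirects = new HashMap<>();

    /** Records an outlink from a feed item, using the item's description as anchor text. */
    public void addOutlink(String targetUrl, String anchor) {
        anchorsByTarget.computeIfAbsent(targetUrl, k -> new ArrayList<>()).add(anchor);
    }

    /** Records that fetching fromUrl produced a redirect to toUrl. */
    public void addRedirect(String fromUrl, String toUrl) {
        redirects.put(fromUrl, toUrl);
    }

    /** Lookup by the indexed URL only; empty when the page was reached via a redirect. */
    public List<String> naiveAnchors(String indexedUrl) {
        return anchorsByTarget.getOrDefault(indexedUrl, new ArrayList<>());
    }

    /** Follows the redirect chain from url to its final target, capped at 10 hops to avoid cycles. */
    public String finalTarget(String url) {
        String current = url;
        for (int hops = 0; hops < 10 && redirects.containsKey(current); hops++) {
            current = redirects.get(current);
        }
        return current;
    }

    /** Anchors of the indexed URL plus anchors of every URL whose redirect chain ends there. */
    public List<String> resolvedAnchors(String indexedUrl) {
        List<String> result = new ArrayList<>(naiveAnchors(indexedUrl));
        for (Map.Entry<String, List<String>> e : anchorsByTarget.entrySet()) {
            String source = e.getKey();
            if (!source.equals(indexedUrl) && finalTarget(source).equals(indexedUrl)) {
                result.addAll(e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        RedirectAnchors db = new RedirectAnchors();
        db.addOutlink("http://somecompany/8888/item.html", "Item 8888 description");
        db.addRedirect("http://somecompany/8888/item.html",
                       "http://somecompany/redirect8888/item.html");

        String indexed = "http://somecompany/redirect8888/item.html";
        System.out.println(db.naiveAnchors(indexed));    // prints []
        System.out.println(db.resolvedAnchors(indexed)); // prints [Item 8888 description]
    }
}
```

The design choice in this sketch is to resolve anchors at lookup time by walking recorded redirects, rather than rewriting the stored links. Where the equivalent hook would belong in Nutch itself is the open question of this post.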
When I get to the indexer, for any item URL that did not redirect, I can get the Inlink, read its anchor text, and associate the proper description with the URL being indexed. However, if the RSS item pointed to a URL that redirects, the inlinks are null in the indexer. How do I keep the inlinks associated with the page the item URL redirected to?

An example to clarify the question:

  RSS feed URL: http://somecompany/feed.rss
  Item URL within the feed: http://somecompany/8888/item.html
  The item URL redirects to: http://somecompany/redirect8888/item.html

I want to keep the outlink generated from parsing the RSS feed URL and have it associated as the inlink of the redirected URL. Would I need to do something in the fetcher to modify the CrawlDatum somehow? Or am I missing something?

Thanks,
Scott

--
View this message in context: http://www.nabble.com/Null-Inlinks-with-rss-redirect-tf2829892.html#a7900655
Sent from the Nutch - User mailing list archive at Nabble.com.

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
