I am using the RSS parser. As the fetcher finds the URLs within the RSS
feed, I attach each item's description to the anchor of the corresponding
outlink.

When I get to the indexer, for any item URL that did not redirect, I can get
the inlinks, read the anchor text, and associate the proper description with
the URL being indexed.

However, if an RSS item points to a URL that redirects, the inlinks are null
when I am in the indexer.
How do I keep the inlinks associated with the redirected page?

Example to clarify the question:
RSS feed URL: http://somecompany/feed.rss
Item URL within the feed: http://somecompany/8888/item.html
That item URL redirects to: http://somecompany/redirect8888/item.html

I want to keep the outlink generated from parsing the RSS feed URL
and have it associated as an inlink of the redirected URL.
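
To make the desired behaviour concrete, here is a minimal sketch in plain
Java. It does not use any Nutch classes: RedirectInlinks, resolve, and rekey
are hypothetical names, and the two maps are assumptions standing in for
whatever the crawl actually records (anchor text per outlink URL, and
redirect source to redirect target). It only shows the re-keying I am after,
not how Nutch does or should do it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Nutch: illustrates moving anchor text
// from the original item URL to the URL it finally redirects to.
public class RedirectInlinks {

    /** Follows redirect hops from url to its final target; stops if a cycle is detected. */
    static String resolve(String url, Map<String, String> redirects) {
        Set<String> seen = new HashSet<>();
        String current = url;
        // seen.add returns false when a URL repeats, which ends the loop on a cycle
        while (redirects.containsKey(current) && seen.add(current)) {
            current = redirects.get(current);
        }
        return current;
    }

    /** Returns a new map in which every anchor list is keyed by the final (post-redirect) URL. */
    static Map<String, List<String>> rekey(Map<String, List<String>> anchorsByUrl,
                                           Map<String, String> redirects) {
        Map<String, List<String>> result = new HashMap<>();
        for (Map.Entry<String, List<String>> entry : anchorsByUrl.entrySet()) {
            String target = resolve(entry.getKey(), redirects);
            // Merge anchors if several source URLs end up at the same target
            result.computeIfAbsent(target, k -> new ArrayList<>()).addAll(entry.getValue());
        }
        return result;
    }
}
```

With the example above, an anchor stored under
http://somecompany/8888/item.html would end up under
http://somecompany/redirect8888/item.html, which is the URL the indexer sees.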

Would I need to do something in the fetcher to modify the CrawlDatum
somehow? Or am I missing something somewhere?

Thanks,
Scott



