[Nutch-general] parse-rss e

ogjunk-nutch Wed, 28 Mar 2007 13:32:07 -0800

Hi,

Chris added the RSS parser plugin (parse-rss) a while back. I've never used it, so I'm not
sure what it is really for. Can somebody explain?


Normally, fetching and indexing a single web page results in a single Document
in the index. What happens when an RSS feed is encountered? If the feed is
full (i.e. it carries each item's complete content), do we treat each item as its
own page/Document? And if it's not, do we extract the item links and include
them in some future fetchlist?
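To make the two cases in the question concrete, here is a minimal Python sketch of what such a split could look like. It is not a description of what parse-rss actually does (that is the open question); the function name `split_feed`, the choice of `content:encoded` or `description` as the item body, and the 200-character threshold for calling an item "full" are all hypothetical assumptions.

```python
import xml.etree.ElementTree as ET

# Element name of the RSS content module, often used for full item bodies.
CONTENT_NS = "{http://purl.org/rss/1.0/modules/content/}encoded"


def split_feed(rss_xml, min_content_chars=200):
    """Hypothetical sketch, NOT parse-rss: split an RSS 2.0 feed into
    per-item documents (full items) and outlinks (teaser-only items)."""
    root = ET.fromstring(rss_xml)
    documents, outlinks = [], []
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        title = (item.findtext("title") or "").strip()
        # Prefer <content:encoded>, fall back to <description>.
        body = (item.findtext(CONTENT_NS) or item.findtext("description") or "").strip()
        # Assumed heuristic: a long enough body means the item is "full".
        if len(body) >= min_content_chars:
            documents.append({"url": link, "title": title, "content": body})
        elif link:
            # Teaser only: keep the link as a candidate for a future fetchlist.
            outlinks.append(link)
    return documents, outlinks


if __name__ == "__main__":
    long_text = "word " * 50
    feed = f"""<rss version="2.0"><channel><title>Example</title>
<item><title>Teaser</title><link>http://example.com/a</link><description>Short teaser.</description></item>
<item><title>Full</title><link>http://example.com/b</link><description>{long_text}</description></item>
</channel></rss>"""
    docs, links = split_feed(feed)
    print(len(docs), links)  # 1 ['http://example.com/a']
```

A length threshold is only one possible way to decide whether a feed is "full"; a real parser might instead rely on which elements are present.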

How does the link to an RSS feed make it into a fetchlist to begin with? Does
one have to include it explicitly, or does some other parser also extract links
to feeds from the HEAD > LINK element?
(http://issues.apache.org/jira/browse/NUTCH-412 ?)
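For the second question, the usual convention (often called RSS autodiscovery) is a `<link rel="alternate" type="application/rss+xml" href="...">` element in the page HEAD. Below is a minimal sketch of pulling such links out of a page with Python's standard `html.parser`. It only illustrates the idea and says nothing about what Nutch's own parsers do; `FeedLinkFinder`, `find_feed_links`, and the inclusion of the Atom MIME type are assumptions. For simplicity it accepts a matching `<link>` anywhere in the page rather than strictly inside HEAD.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types conventionally used for feed autodiscovery links (assumed set).
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Hypothetical helper: collects absolute URLs of feed <link> elements."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feed_urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values may be None.
        if tag != "link":
            return
        a = {name: (value or "") for name, value in attrs}
        rels = a.get("rel", "").lower().split()
        mime = a.get("type", "").strip().lower()
        href = a.get("href", "").strip()
        if "alternate" in rels and mime in FEED_TYPES and href:
            # Resolve relative hrefs against the page URL.
            self.feed_urls.append(urljoin(self.base_url, href))


def find_feed_links(html, base_url):
    """Return the feed URLs advertised by an HTML page."""
    finder = FeedLinkFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.feed_urls


if __name__ == "__main__":
    page = """<html><head>
<link rel="alternate" type="application/rss+xml" href="/feed.xml">
<link rel="stylesheet" href="style.css">
</head><body></body></html>"""
    print(find_feed_links(page, "http://example.com/blog/"))
```

URLs found this way would still have to be injected or added as outlinks before they could appear in a fetchlist.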

Thanks,
Otis

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share



_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
