Chris Mattmann wrote:
> Guys,
>
> Sorry to be so thick-headed, but could someone explain to me in really
> simple language what this change is requesting that is different from the
> current Nutch API? I still don't get it, sorry...
>

Currently, the RSS parser returns a single Parse object that aggregates all feed entries, so there is one big Parse per RSS feed. For example, fetching the feed http://blog.foofactory.fi/atom.xml currently creates:

parse1: Online indexing - integrating Nutch with Solr There might be times when [...] Website up I got an extra burst of energy yesterday [...] Sorted out The Fetcher performance in post 0.7.x version of Nutch [...]

The proposed change would allow the parser to return one Parse object per feed entry, so fetching the feed http://blog.foofactory.fi/atom.xml would create:

parse1: Online indexing - integrating Nutch with Solr There might be times when [...]
parse2: Website up I got an extra burst of energy yesterday [...]
parse3: Sorted out The Fetcher performance in post 0.7.x version of Nutch [...]

Each would be indexed as a separate document. Does that make sense?

Another application I can see: the zip indexer would be able to return a parse for each file in the archive.

Renaud

_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
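
To make the difference between the two behaviours described above concrete, here is a minimal Java sketch. It does not use the real Nutch classes: `FeedParseSketch`, `Entry`, `parseAggregated` and `parsePerEntry` are hypothetical names invented for illustration, the entry URLs are placeholders, and keying each per-entry parse by the entry's URL is an assumption rather than something the proposal specifies.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration only; these are NOT Nutch classes.
final class FeedParseSketch {

    /** One feed entry: its link, title and body text. */
    static final class Entry {
        final String url;
        final String title;
        final String text;

        Entry(String url, String title, String text) {
            this.url = url;
            this.title = title;
            this.text = text;
        }
    }

    /** Current behaviour: every entry is folded into one parse for the whole feed. */
    static String parseAggregated(List<Entry> entries) {
        StringBuilder sb = new StringBuilder();
        for (Entry e : entries) {
            if (sb.length() > 0) {
                sb.append('\n');
            }
            sb.append(e.title).append(' ').append(e.text);
        }
        return sb.toString();
    }

    /** Proposed behaviour: one parse per entry (keyed by entry URL, an assumption). */
    static Map<String, String> parsePerEntry(List<Entry> entries) {
        Map<String, String> parses = new LinkedHashMap<>();
        for (Entry e : entries) {
            parses.put(e.url, e.title + " " + e.text);
        }
        return parses;
    }

    public static void main(String[] args) {
        // Placeholder entry URLs; the titles and text come from the example feed.
        List<Entry> feed = new ArrayList<>();
        feed.add(new Entry("http://example.org/entry1",
                "Online indexing - integrating Nutch with Solr",
                "There might be times when [...]"));
        feed.add(new Entry("http://example.org/entry2",
                "Website up",
                "I got an extra burst of energy yesterday [...]"));
        feed.add(new Entry("http://example.org/entry3",
                "Sorted out",
                "The Fetcher performance in post 0.7.x version of Nutch [...]"));

        System.out.println("Aggregated (one document):");
        System.out.println(parseAggregated(feed));

        System.out.println("Per entry (one document each):");
        for (Map.Entry<String, String> p : parsePerEntry(feed).entrySet()) {
            System.out.println(p.getKey() + " -> " + p.getValue());
        }
    }
}
```

With the aggregated form the indexer sees a single document for the feed URL. With the per-entry form it could index each map value as its own document, and the same shape would fit the zip case, with one map entry per file in the archive.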
