Doug Cutting wrote:
> Gal Nitzan wrote:
>> IMHO the data that is needed i.e. the data that will be fetched in
>> the next fetch process is already available in the <item> element.
>> Each <item> element represents one web resource. And there is no
>> reason to go to the server and re-fetch that resource.
>
> Perhaps ProtocolOutput should change. The method:
>
>   Content getContent();
>
> could be deprecated and replaced with:
>
>   Content[] getContents();
>
> This would require changes to the indexing pipeline. I can't think of
> any severe complications, but I haven't looked closely.
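For reference, the change proposed in the quoted text could look roughly like the sketch below. The `Content` and `ProtocolOutput` classes here are simplified, hypothetical stand-ins written for illustration, not the actual Nutch classes; only the two method names come from the message above.

```java
// Hypothetical, simplified stand-in for Nutch's Content (the real class has more fields).
final class Content {
    final String url;
    final byte[] data;

    Content(String url, byte[] data) {
        this.url = url;
        this.data = data;
    }
}

// Hypothetical stand-in for ProtocolOutput, showing only the two accessors discussed.
class ProtocolOutput {
    private final Content[] contents;

    ProtocolOutput(Content... contents) {
        this.contents = contents.clone();
    }

    /** Old accessor, kept so existing callers compile: first content, or null if none. */
    @Deprecated
    Content getContent() {
        return contents.length > 0 ? contents[0] : null;
    }

    /** Proposed accessor: one Content per resource carried by the fetched response. */
    Content[] getContents() {
        return contents.clone();
    }
}

class ProtocolOutputSketch {
    public static void main(String[] args) {
        ProtocolOutput out = new ProtocolOutput(
                new Content("http://example.com/feed#item1", new byte[0]),
                new Content("http://example.com/feed#item2", new byte[0]));
        for (Content c : out.getContents()) {
            System.out.println(c.url);
        }
    }
}
```

Keeping the deprecated single-value method as a view onto the first array element is one way to let existing callers keep working while the indexing pipeline is migrated; whether that suits Nutch is an open question here.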
Since getProtocolOutput is called by Fetcher, the fetcher (actually, the
underlying protocol plugin) would need to be aware that it is fetching an
RSS feed and partially parse it in order to return an array of Contents.
I think it would make much more sense to change the parse plugins to take
a Content and return Parse[] instead of Parse.

--
Doğacan Güney

> Could something like that work?
>
> Doug

_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
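To illustrate the alternative suggested in the reply (the parser, not the protocol plugin, splits a feed into several results), here is a minimal sketch. `ItemParse` and `FeedParserSketch` are hypothetical names invented for this example and are not part of Nutch; the sketch takes raw bytes instead of a Content object and uses the JDK's DOM parser only to keep it self-contained.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical, simplified stand-in for Nutch's Parse: one result per feed item.
final class ItemParse {
    final String url;
    final String text;

    ItemParse(String url, String text) {
        this.url = url;
        this.text = text;
    }
}

// Hypothetical parser that turns one fetched feed into several parse results.
final class FeedParserSketch {

    // Returns one ItemParse per <item> element found in the feed bytes.
    static ItemParse[] parse(byte[] feedBytes) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(feedBytes));
            NodeList items = doc.getElementsByTagName("item");
            List<ItemParse> parses = new ArrayList<>();
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                parses.add(new ItemParse(childText(item, "link"),
                                         childText(item, "description")));
            }
            return parses.toArray(new ItemParse[0]);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse feed", e);
        }
    }

    // Text of the first descendant element with the given tag, or "" if absent.
    static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() > 0 ? nodes.item(0).getTextContent().trim() : "";
    }

    public static void main(String[] args) {
        String feed = "<rss><channel>"
                + "<item><link>http://example.com/a</link><description>First</description></item>"
                + "<item><link>http://example.com/b</link><description>Second</description></item>"
                + "</channel></rss>";
        for (ItemParse p : parse(feed.getBytes(StandardCharsets.UTF_8))) {
            System.out.println(p.url + " -> " + p.text);
        }
    }
}
```

With this shape the fetcher stays unaware of the feed format: it fetches one resource, and the array of results appears only at the parse step.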
