Re: [Nutch-dev] RSS-fecter and index individul-how can i realize this function

Doğacan Güney Mon, 05 Feb 2007 05:29:45 -0800

Doug Cutting wrote:
> Gal Nitzan wrote:
>> IMHO the data that is needed i.e. the data that will be fetched in 
>> the next fetch process is already available in the <item> element. 
>> Each <item> element represents one web resource. And there is no 
>> reason to go to the server and re-fetch that resource.
>
> Perhaps ProtocolOutput should change.  The method:
>
>   Content getContent();
>
> could be deprecated and replaced with:
>
>   Content[] getContents();
>
> This would require changes to the indexing pipeline.  I can't think of 
> any severe complications, but I haven't looked closely.


Since getProtocolOutput is called by Fetcher, the fetcher (actually, the 
underlying protocol plugin) would need to know that it is fetching an RSS 
feed and partially parse it in order to return an array of Contents.

I think it would make much more sense to change parse plugins to take 
content and return Parse[] instead of Parse.
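
A rough sketch of what I mean. The types here (Content, Parse, MultiParser, 
RssItemParser) are made-up stand-ins for illustration, not the real Nutch 
classes, and the regex-based item extraction is only a toy; a real plugin 
would use a proper feed parser. The point is just the shape of the contract: 
one fetched Content in, one Parse per <item> out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MultiParseSketch {

    // Hypothetical stand-in for fetched content; not Nutch's Content class.
    static final class Content {
        final String url;
        final String body;
        Content(String url, String body) { this.url = url; this.body = body; }
    }

    // Hypothetical stand-in for a parse result; not Nutch's Parse interface.
    static final class Parse {
        final String url;
        final String text;
        Parse(String url, String text) { this.url = url; this.text = text; }
    }

    // Assumed contract: a parser takes one Content and may return many Parses.
    interface MultiParser {
        Parse[] getParses(Content content);
    }

    // Toy RSS parser: emits one Parse per <item> found in the feed body.
    static final class RssItemParser implements MultiParser {
        private static final Pattern ITEM =
            Pattern.compile("<item>(.*?)</item>", Pattern.DOTALL);
        private static final Pattern LINK =
            Pattern.compile("<link>(.*?)</link>", Pattern.DOTALL);
        private static final Pattern DESC =
            Pattern.compile("<description>(.*?)</description>", Pattern.DOTALL);

        @Override
        public Parse[] getParses(Content content) {
            List<Parse> parses = new ArrayList<>();
            Matcher items = ITEM.matcher(content.body);
            while (items.find()) {
                String item = items.group(1);
                // Fall back to the feed URL when an item carries no <link>.
                String link = first(LINK, item, content.url);
                String text = first(DESC, item, "");
                parses.add(new Parse(link, text));
            }
            return parses.toArray(new Parse[0]);
        }

        // Returns the first captured group of p in s, or fallback if absent.
        private static String first(Pattern p, String s, String fallback) {
            Matcher m = p.matcher(s);
            return m.find() ? m.group(1).trim() : fallback;
        }
    }

    public static void main(String[] args) {
        String feed = "<rss><channel>"
            + "<item><link>http://example.com/a</link>"
            + "<description>First</description></item>"
            + "<item><link>http://example.com/b</link>"
            + "<description>Second</description></item>"
            + "</channel></rss>";
        Parse[] parses = new RssItemParser()
            .getParses(new Content("http://example.com/feed", feed));
        for (Parse p : parses) {
            System.out.println(p.url + " -> " + p.text);
        }
    }
}
```

With something like this, the fetcher and the protocol plugins stay unaware 
of RSS; only the parse plugin for feeds knows that one document yields 
several entries to index.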

--
Doğacan Güney
>
> Could something like that work?
>
> Doug


_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
