I am resending this message, as it apparently didn't go through. (I keep mixing up my email addresses on the mailing list...)
-----Original Message-----
Sent: Friday, February 02, 2007 10:29 AM

Using Nutch 0.8, we modified the code from the fetching/parsing steps onward. We have a different implementation of the Parse object and the OutputFormat, including an additional list of ParseData objects saved in an additional subfolder in the DFS. We also changed the indexing step a lot, so we don't use the Nutch code there.

-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED]
Sent: Friday, February 02, 2007 10:19 AM
To: nutch-dev@lucene.apache.org
Subject: Re: RSS-fecter and index individul-how can i realize this function

Gal Nitzan wrote:
> IMHO the data that is needed, i.e. the data that will be fetched in the next
> fetch process, is already available in the <item> element. Each <item> element
> represents one web resource, and there is no reason to go to the server and
> re-fetch that resource.

Perhaps ProtocolOutput should change. The method:

  Content getContent();

could be deprecated and replaced with:

  Content[] getContents();

This would require changes to the indexing pipeline. I can't think of any severe complications, but I haven't looked closely.

Could something like that work?

Doug
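To make Doug's suggestion concrete, here is a minimal sketch of what "deprecate getContent(), add getContents()" could look like. It is not Nutch code: `Content`, `ProtocolOutput`, `RssProtocolOutput` and `ProtocolOutputDemo` are hypothetical, simplified stand-ins (the real Nutch 0.8 classes carry more fields, such as metadata and protocol status). It also assumes, as a design choice, that the feed itself is kept as the first entry, followed by one `Content` per `<item>` built from the item's link and description, so the items never have to be re-fetched.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for a fetched resource (not Nutch's real Content class).
final class Content {
    private final String url;
    private final byte[] data;

    Content(String url, byte[] data) {
        this.url = url;
        this.data = data;
    }

    String getUrl() { return url; }
    byte[] getData() { return data; }
}

// Hypothetical sketch of the proposed API: one fetch may yield several Content objects.
abstract class ProtocolOutput {
    /** Proposed replacement: every resource produced by a single fetch. */
    abstract Content[] getContents();

    /** Kept for backward compatibility: returns the first resource, or null if there is none. */
    @Deprecated
    Content getContent() {
        Content[] contents = getContents();
        return contents.length > 0 ? contents[0] : null;
    }
}

// Hypothetical RSS output: the feed itself plus one Content per <item>,
// built from data already present in the feed instead of re-fetching each item.
final class RssProtocolOutput extends ProtocolOutput {
    private final Content[] contents;

    /** Each item is assumed to be {link, description}. */
    RssProtocolOutput(Content feed, List<String[]> items) {
        List<Content> all = new ArrayList<>();
        all.add(feed);
        for (String[] item : items) {
            all.add(new Content(item[0], item[1].getBytes(StandardCharsets.UTF_8)));
        }
        this.contents = all.toArray(new Content[0]);
    }

    @Override
    Content[] getContents() {
        return contents.clone(); // defensive copy so callers cannot modify our array
    }
}

class ProtocolOutputDemo {
    public static void main(String[] args) {
        Content feed = new Content("http://example.com/feed.rss", new byte[0]);
        List<String[]> items = List.of(
            new String[] {"http://example.com/a", "First item"},
            new String[] {"http://example.com/b", "Second item"});
        ProtocolOutput out = new RssProtocolOutput(feed, items);
        // An indexing step would loop over every Content instead of handling a single one.
        for (Content c : out.getContents()) {
            System.out.println(c.getUrl());
        }
    }
}
```

Keeping a deprecated `getContent()` that returns the first element lets existing callers keep working while the indexing pipeline is migrated to iterate over `getContents()`, which is where the changes Doug mentions would be needed.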