I am resending this message, as it apparently didn't go through.
(I keep mixing up my email addresses on the mailing list...)

-----Original Message-----
Sent: Friday, February 02, 2007 10:29 AM

Using Nutch 0.8, we modified the code from the fetching/parsing steps onward.
We have a different implementation of the Parse object and OutputFormat, 
which includes an additional list of ParseData objects saved in an additional 
subfolder in the DFS.
We also changed the indexing step substantially, so we don't use the Nutch 
code there.


-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED]
Sent: Friday, February 02, 2007 10:19 AM
To: nutch-dev@lucene.apache.org
Subject: Re: RSS-fecter and index individul-how can i realize this function

Gal Nitzan wrote:
> IMHO the data that is needed i.e. the data that will be fetched in the next 
> fetch process is already available in the <item> element. Each <item> element 
> represents one web resource. And there is no reason to go to the server and 
> re-fetch that resource.

Perhaps ProtocolOutput should change.  The method:

   Content getContent();

could be deprecated and replaced with:

   Content[] getContents();

This would require changes to the indexing pipeline.  I can't think of
any severe complications, but I haven't looked closely.
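
As a rough illustration of the idea (a sketch only, not actual Nutch code): 
ContentSketch and ProtocolOutputSketch below are hypothetical stand-ins for 
Content and ProtocolOutput, reduced to the minimum needed to show the old 
accessor deprecated alongside a new array-returning one. Having getContent() 
fall back to the first element is an assumption made for the example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for Nutch's Content: just a URL and the raw bytes.
final class ContentSketch {
    private final String url;
    private final byte[] data;

    ContentSketch(String url, byte[] data) {
        this.url = url;
        this.data = data;
    }

    String getUrl() { return url; }
    byte[] getData() { return data; }
}

// Hypothetical stand-in for ProtocolOutput, able to carry several contents,
// e.g. one per <item> element of a fetched RSS feed.
final class ProtocolOutputSketch {
    private final ContentSketch[] contents;

    ProtocolOutputSketch(ContentSketch... contents) {
        this.contents = contents.clone();
    }

    /** Old accessor, kept so existing callers still compile (assumption: returns the first content). */
    @Deprecated
    ContentSketch getContent() {
        return contents.length == 0 ? null : contents[0];
    }

    /** Proposed accessor: every content produced by a single fetch. */
    ContentSketch[] getContents() {
        return contents.clone();
    }
}

class GetContentsDemo {
    public static void main(String[] args) {
        // Two made-up feed items, each treated as its own web resource.
        ProtocolOutputSketch output = new ProtocolOutputSketch(
            new ContentSketch("http://example.com/item1",
                "first item".getBytes(StandardCharsets.UTF_8)),
            new ContentSketch("http://example.com/item2",
                "second item".getBytes(StandardCharsets.UTF_8)));

        // A downstream step would loop over all contents instead of one.
        for (ContentSketch c : output.getContents()) {
            System.out.println(c.getUrl() + " (" + c.getData().length + " bytes)");
        }
    }
}
```

With this shape, a feed with two items yields two contents from one fetch, 
and any step that currently expects a single Content would need a loop.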

Could something like that work?

Doug
