pages: <url, <status, contentHash, lastFetchDate, numFailures> >
Is this list of storable fields extendable by plugins?
Sure, I don't see why not.
Great. I was not sure about that.


For example, it might be interesting to monitor changes on websites and prefer more up-to-date pages in ranking.
So you'd add a lastChangedDate?
I am not sure exactly at the moment. For now, I think I would store the length of the document in one field.
That way I could calculate how much the length changed when the page is fetched again. There may be better ways to measure the size of a change, but I am not familiar with them yet.


Second, I would store a value reflecting how frequently the page changes.
If the page changes by more than 10% or 10 words in length, I would increment this value; otherwise I would decrement it.
I would use this value to influence ranking, so that frequently changing pages are preferred.
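
A minimal sketch of that heuristic, to make the idea concrete. This is not Nutch code: the names PageChangeStats, is_significant_change and update_on_refetch are hypothetical, length is assumed to be counted in whitespace-separated words, and "10% or 10 words" is read as "either threshold is exceeded".

```python
from dataclasses import dataclass


@dataclass
class PageChangeStats:
    """The two per-page values discussed above (hypothetical names)."""
    length_words: int = 0   # document length seen at the last fetch
    change_score: int = 0   # goes up on significant changes, down otherwise


def is_significant_change(old_len, new_len, rel_threshold=0.10, abs_threshold=10):
    """True if the length moved by more than 10 words or more than 10%.

    Assumption: either threshold being exceeded counts as a change.
    """
    delta = abs(new_len - old_len)
    if delta > abs_threshold:
        return True
    return old_len > 0 and delta > rel_threshold * old_len


def update_on_refetch(stats, new_text):
    """Update both stored values after the page has been fetched again."""
    # Assumption: length is measured in whitespace-separated words.
    new_len = len(new_text.split())
    if is_significant_change(stats.length_words, new_len):
        stats.change_score += 1
    else:
        stats.change_score -= 1
    stats.length_words = new_len
    return stats
```

In this sketch the score is unbounded in both directions; a real implementation would probably clamp it, and would need to decide how a first fetch (no previous length) should be treated.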


So, 2 key/value pairs should be enough.

Storing the lastChangedDate could also be interesting, though.

Matthias

