Hi Xiao,
From what I recall, Nutch already supports adaptive refetch intervals.
Or are you looking for something different?
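Roughly, an adaptive schedule lets each page's refetch interval follow how often that page changes. A fetch that finds the page modified shrinks the interval, and a fetch that finds it unchanged grows it, always within fixed bounds. In Nutch itself you enable this by setting db.fetch.schedule.class to org.apache.nutch.crawl.AdaptiveFetchSchedule. Below is a minimal, self-contained sketch of the core rule only. It is not Nutch's actual code, and the class name, rates and bounds are illustrative assumptions:

```java
/**
 * Hypothetical sketch of an adaptive refetch interval.
 * NOT Nutch's AdaptiveFetchSchedule; the rates and bounds are illustrative assumptions.
 */
public class AdaptiveIntervalSketch {
    static final double INC_RATE = 0.4;                      // grow interval 40% when page unchanged (assumed)
    static final double DEC_RATE = 0.2;                      // shrink interval 20% when page changed (assumed)
    static final double MIN_INTERVAL = 60.0;                 // lower bound, in seconds (assumed)
    static final double MAX_INTERVAL = 365.0 * 24 * 3600;    // upper bound: one year, in seconds (assumed)

    /** Returns the next refetch interval (seconds) given the current one and whether the page changed. */
    static double nextInterval(double current, boolean modified) {
        double next = modified
                ? current * (1.0 - DEC_RATE)   // page changed: come back sooner
                : current * (1.0 + INC_RATE);  // page unchanged: back off
        // Clamp so intervals never collapse to zero or grow without limit.
        return Math.max(MIN_INTERVAL, Math.min(MAX_INTERVAL, next));
    }

    public static void main(String[] args) {
        double interval = 86400.0; // start at one day
        boolean[] fetchHistory = {true, true, false, false, false};
        for (boolean modified : fetchHistory) {
            interval = nextInterval(interval, modified);
            System.out.printf("modified=%b -> next interval %.0f s%n", modified, interval);
        }
    }
}
```

With this rule, frequently updated pages drift toward short intervals and rarely changing ones toward long intervals. That gives the "two categories" behaviour without storing an explicit category field per URL.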
Regards,
-- Ken
On Oct 27, 2010, at 1:42am, xiao yang wrote:
I want to modify the crawler's schedule to make it closer to real time.
Some web pages are updated frequently, while others seldom change. My
idea is to classify URLs into two categories, which will affect each
URL's score, so I want to add a field that stores which category a URL
belongs to.
The idea is simple, but I've found it's not so easy to implement in
Nutch.
Thanks!
Xiao
--------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c w e b m i n i n g