Daniel D. wrote:
*My assumption was that since I have crawled 3 original URLs and have discovered some new URLs, next time I should see in the fetchlist my 3 original URLs + the new URLs (based on the specified urlfilter-regex). I wanted to see my original URLs re-crawled! I didn't find them in the new fetchlist, and this was my question – what am I missing here? Why are those URLs not being included in the fetchlist even though their fetch time has already passed?*

Okay, you had the default fetch interval set to 1 day, right? You need to check this, e.g. by dumping the DB (nutch readdb db -dumppageurl).

The next re-fetch time is one day after the original fetch. If you generate fetchlists before then, only newly discovered pages will be added to them, because your original pages are not yet due for re-fetching.
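If you want the originals re-fetched sooner, you can shorten the interval before generating — a sketch, assuming the db.default.fetch.interval property (value in days) from nutch-default.xml; the exact property name and the config file's root element may differ between Nutch versions, so check your own nutch-default.xml first:

```xml
<!-- nutch-site.xml override (hypothetical example; verify the property
     name and root element against the nutch-default.xml shipped with
     your Nutch version) -->
<nutch-conf>
  <property>
    <name>db.default.fetch.interval</name>
    <!-- number of days before a fetched page becomes due for re-fetch -->
    <value>1</value>
  </property>
</nutch-conf>
```

Pages already in the DB keep the next-fetch time computed at fetch time, so a shortened interval only takes effect for fetches performed after the change.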


--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers