From what I have gathered, you may want to keep multiple crawldbs for your crawls. For example, you could keep a separate crawldb for the more frequent crawls and fire off Nutch against that db with the appropriate configs for that job. I was hoping for the same mechanism, but it looks like we will need to write this ourselves.
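As a concrete sketch of the "appropriate configs for that job" idea: point each job at its own conf directory and override the refetch interval there. The directory name and value below are illustrative, not from this thread; older Nutch releases use `db.default.fetch.interval` (in days), while later releases renamed it `db.fetch.interval.default` (in seconds), so check your `nutch-default.xml` for the name your version expects.

```xml
<!-- conf-frequent/nutch-site.xml: hypothetical per-job override -->
<configuration>
  <property>
    <!-- Refetch pages daily instead of the 30-day default.
         Property name as in older releases, value in days. -->
    <name>db.default.fetch.interval</name>
    <value>1</value>
  </property>
</configuration>
```

Each job would then be launched with its own conf directory, e.g. `NUTCH_CONF_DIR=conf-frequent bin/nutch generate ...`, so the frequent crawldb is read on its own schedule.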
On 4/12/07, Arie Karhendana <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I'm a new user of Nutch. I use Nutch primarily to crawl blog and news
> sites. But I noticed that Nutch fetches pages only on some refresh
> interval (30 days by default).
>
> Blog and news sites have the unique characteristic that some of their
> pages are updated very frequently (e.g. the main page), so they have to
> be refetched often, while other pages don't need to be refreshed /
> refetched at all (e.g. the news article pages, which will eventually
> become 'obsolete').
>
> Is there any way to force-update some URLs? Can I just 're-inject' the
> URLs to set the next fetch date to 'immediately'?
>
> Thank you,
> --
> Arie Karhendana

--
"Conscious decisions by conscious minds are what make reality real"

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
