From what I have gathered, you may want to keep multiple crawldbs for your crawls. For example, you could keep a separate crawldb for the more frequent crawls and fire off Nutch against that db with the appropriate configs for that job. I was hoping for the same mechanism, but it looks like we will need to write this ourselves.
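As a concrete sketch of the "appropriate configs for that job" idea: point each job at its own conf directory and override the refetch interval there. The directory name and value below are illustrative, not from this thread; older Nutch releases use `db.default.fetch.interval` (in days), while later releases renamed it `db.fetch.interval.default` (in seconds), so check your `nutch-default.xml` for the name your version expects.

```xml
<!-- conf-frequent/nutch-site.xml: hypothetical per-job override -->
<configuration>
  <property>
    <!-- Refetch pages daily instead of the 30-day default.
         Property name as in older releases, value in days. -->
    <name>db.default.fetch.interval</name>
    <value>1</value>
  </property>
</configuration>
```

Each job would then be launched with its own conf directory, e.g. `NUTCH_CONF_DIR=conf-frequent bin/nutch generate ...`, so the frequent crawldb is read on its own schedule.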
On 4/12/07, Arie Karhendana <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I'm a new user of Nutch. I use Nutch primarily to crawl blog and news
> sites. But I noticed that Nutch fetches pages only on some refresh
> interval (30 days by default).
>
> Blog and news sites have the unique characteristic that some of their
> pages are updated very frequently (e.g. the main page), so they have to
> be refetched often, while other pages don't need to be refreshed /
> refetched at all (e.g. the news article pages, which will eventually
> become 'obsolete').
>
> Is there any way to force-update some URLs? Can I just 're-inject' the
> URLs to set the next fetch date to 'immediately'?
>
> Thank you,
> --
> Arie Karhendana

--
"Conscious decisions by conscious minds are what make reality real"

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
