Thank you for the tip, but I still can't solve my problem. Let me explain in more detail what I'm doing...

1. I created a file called 'urls.txt' and put one URL in it (e.g. http://localhost/xxx/)
2. nutch admin db -create
3. nutch inject db urls.txt
4. nutch generate db segments
5. nutch fetch segments/<latest_segment>
6. nutch updatedb db segments/<latest_segment>

After repeating steps 4-6 two or three times and creating the index, I run:

* nutch inject db new_urls.txt (new_urls.txt contains something like http://localhost/yyy/)
* nutch generate db segments
* nutch fetch segments/<latest_segment>

The fetcher still downloads URLs from http://localhost/xxx/ (along with those from http://localhost/yyy/), even though there are no links between the two sites.

I think I understand why it behaves this way: the last 'generate' takes all outgoing links from the latest segment, doesn't it? But how can I force nutch to consider only the outgoing links from the newly injected URL?

A regex-urlfilter won't solve my problem, since this is a very simple example and not a real production scenario...

Thank you in advance,
Ennio

On 1/24/06, "Håvard W. Kongsgård" <[EMAIL PROTECTED]> wrote:
> If your "old urls" have not expired(30 day) then a bin/nutch generate
> will process only the new urls.
>
> Ennio Tosi wrote:
>
> >Hi, I created an index from an injected url. My problem is that if now
> >I inject another url in the webdb, the fetcher reprocesses the
> >starting url too... Is there a way to tell nutch to only process the
> >latest injected resource?
> >
> >Thanks,
> >Ennio

_______________________________________________
Nutch-general mailing list
Nutch-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-general
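
For reference, the generate/fetch/updatedb round from steps 4-6 above could be scripted roughly as below. This is only a sketch under assumptions: the helper names `latest_segment` and `crawl_round` are made up for illustration, segment directories are assumed to have names that sort chronologically (e.g. timestamps), and the script runs in dry-run mode (it only prints the nutch commands) unless `NUTCH` is set to the real launcher.

```shell
#!/bin/sh
# Dry-run by default: prints the commands instead of executing them.
# Set NUTCH=bin/nutch (or wherever your launcher lives) to run for real.
NUTCH="${NUTCH:-echo nutch}"

# Hypothetical helper: print the newest directory under the given segments dir.
# Assumes segment names sort chronologically (e.g. timestamp names).
latest_segment() {
    ls -d "$1"/* 2>/dev/null | sort | tail -n 1
}

# Hypothetical helper: one generate/fetch/updatedb round (steps 4-6).
crawl_round() {
    $NUTCH generate db segments
    seg=$(latest_segment segments)
    $NUTCH fetch "$seg"
    $NUTCH updatedb db "$seg"
}
```

Calling `crawl_round` two or three times after the inject step would reproduce the sequence described in the message; it does not change which URLs 'generate' selects.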