sandeep pujar wrote:
> By incremental I meant after a full crawl then next
> crawls should fetch only the changed pages.

The problem with fetching only changed pages is that you first need to
know which pages have changed.  Once you do, you can load just those
pages through an inject, generate, fetch cycle and then merge the
resulting crawldb and segments with the previously fetched results.
The python script performs this type of process, but for new, unfetched
links rather than for changed pages.  You may be able to modify it to
fetch only changed pages.
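
As a rough sketch of that cycle (not the wiki script itself), something
like the following could work.  It assumes you already have some way of
getting a content digest per URL for the old and new state of each page;
that part, the directory layout, and all function names here are
hypothetical.  The bin/nutch subcommands and their argument order are
written as I understand them and should be checked against the usage
output of your Nutch version before running anything.

```python
import hashlib
from pathlib import Path

NUTCH = "bin/nutch"  # assumed path to the Nutch launcher script


def digest(content):
    """Hex SHA-256 of raw page bytes, used only as a change signature."""
    return hashlib.sha256(content).hexdigest()


def changed_urls(previous, current):
    """Return URLs whose digest differs from the last crawl.

    Both arguments map URL -> digest.  URLs missing from `previous`
    count as changed because they were never fetched.
    """
    return sorted(url for url, d in current.items()
                  if previous.get(url) != d)


def write_seed_file(urls, seed_dir):
    """Write one URL per line into seed_dir, the input for inject."""
    seed_dir = Path(seed_dir)
    seed_dir.mkdir(parents=True, exist_ok=True)
    seed_file = seed_dir / "changed.txt"
    seed_file.write_text("".join(f"{url}\n" for url in urls))
    return seed_file


def plan_cycle(main, work, seed_dir, segment, old_segments):
    """Build (but do not run) the commands for one incremental pass.

    `segment` is the directory that generate creates under
    work/segments; in practice you would look it up after generate runs.
    """
    tmp_db = f"{work}/crawldb"
    return [
        [NUTCH, "inject", tmp_db, str(seed_dir)],
        [NUTCH, "generate", tmp_db, f"{work}/segments"],
        [NUTCH, "fetch", segment],
        [NUTCH, "updatedb", tmp_db, segment],
        # Merge the small update crawl with the previous full crawl.
        [NUTCH, "mergedb", f"{main}/crawldb_merged",
         f"{main}/crawldb", tmp_db],
        [NUTCH, "mergesegs", f"{main}/segments_merged",
         *old_segments, segment],
    ]
```

Each command list can be handed to subprocess.run(cmd, check=True) in
order.  Building the plan separately from running it makes it easy to
print and review the commands first.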

Dennis Kubes
> 
> I was not clear on how I could use the python
> automation script for that.
> 
> Is there something I am missing here ?
> 
> 
> --- Dennis Kubes <[EMAIL PROTECTED]> wrote:
> 
>> You can use the python automation script found at:
>>
>> http://wiki.apache.org/nutch/Automating_Fetches_with_Python
>>
>> I almost have a new version ready.  Will post it in the next
>> couple of days to the wiki.
>>
>> Dennis Kubes
>>
>> sandeep pujar wrote:
>>> Greetings,
>>>
>>> Are there ways we can initiate incremental crawl/index
>>> using Nutch.
>>>
>>> I tried to lookup  wikis and other sources and did not
>>> find much information.
>>>
>>> Any ideas pointers,
>>>
>>> Thanks,
>>> Sandeep

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
