I was confused by this tutorial: http://wiki.apache.org/nutch/NutchTutorial
Reading this page, one might conclude that the crawl tool
can't do iterative crawling: under "3.2 Using Individual
Commands for Whole-Web Crawling" there's the sentence "This also
permits ... incremental crawling", as if the crawl command described
earlier (3.1 Using the Crawl Command) couldn't do that.

Could someone perhaps improve this part of the tutorial?
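
Going by Markus's description below, a rough sketch of the command sequence that one run of the crawl command with depth N corresponds to might help here. This is only an illustration under assumptions: it uses the Nutch 1.x individual commands (inject, generate, fetch, parse, updatedb) as listed in section 3.2, assumes the seed list is in urls/ and the crawl data goes under crawl/, leaves out link inversion and indexing, and only builds the command strings instead of running them. The segment path is a placeholder, because generate names each new segment with a timestamp.

```python
def crawl_commands(depth, seed_dir="urls", crawl_dir="crawl"):
    """Return the individual Nutch commands a crawl of the given depth
    roughly corresponds to. Directory names are assumptions."""
    crawldb = f"{crawl_dir}/crawldb"
    segments = f"{crawl_dir}/segments"
    # Placeholder: the real segment directory is a timestamp chosen by generate.
    segment = f"{segments}/<newest>"
    # Seed the crawldb once with the start URLs.
    cmds = [f"bin/nutch inject {crawldb} {seed_dir}"]
    # Each unit of depth is one crawl cycle: select URLs, fetch, parse,
    # then fold the results back into the crawldb for the next cycle.
    for _ in range(depth):
        cmds += [
            f"bin/nutch generate {crawldb} {segments}",
            f"bin/nutch fetch {segment}",
            f"bin/nutch parse {segment}",
            f"bin/nutch updatedb {crawldb} {segment}",
        ]
    return cmds


for cmd in crawl_commands(depth=2):
    print(cmd)
```

With depth=2 this prints one inject followed by two generate/fetch/parse/updatedb cycles.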

Matthias

On Thu, May 10, 2012 at 8:39 PM, Markus Jelsma
<markus.jel...@openindex.io> wrote:
>
> By default each crawl is iterative. The crawl command is nothing more than a
> wrapper around the individual crawl cycle commands, and the depth parameter
> just sets how many times a single crawl cycle is executed. If I am not
> mistaken, this is also true for older releases, certainly 1.2 and above.
>
>
> On Thu, 10 May 2012 19:31:27 +0100, Lewis John Mcgibbney 
> <lewis.mcgibb...@gmail.com> wrote:
>>
>> For the record, there is a patch pending review for Nutchgora which
>> will sort part of this for you as well.
>>
>> https://issues.apache.org/jira/browse/NUTCH-1301
>>
>> Susam Pal also contributed a patch for Nutchgora regarding incremental
>> indexing, but I can't find it just now, sorry.
>>
>> Lewis
>>
>>
>> On Thu, May 10, 2012 at 5:18 PM, Matthias Paul
>> <magethle.nu...@gmail.com> wrote:
>>>
>>> Hi all,
>>>
>>> Can the crawl command also be used for iterative crawls?
>>> In older Nutch versions this was not possible, but in 1.5 it seems to work?
>>>
>>> Thanks
>>> Matthias
>
>
> --
> Markus Jelsma - CTO - Openindex
