Re: [Nutch-general] Partial crawls.

Lyndon Maydwell Sun, 01 Jul 2007 18:11:37 -0700

I should also mention that I'm running Nutch version 0.9.

On 7/2/07, Lyndon Maydwell <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I'm new to Nutch and am wondering about seeding the database by
> running a crawl with a very shallow depth, then growing the database
> each time a periodic update script runs. I currently use two scripts,
> but I'm not sure whether the update script is actually adding
> searchable data. The initial crawl script works well, and I can
> verify this with the search app that comes with Nutch. My maintenance
> script, however, doesn't seem to add any results, although it throws
> no errors.
>
> Below are the two small scripts. Have I made any simple mistakes?
>
> -- initial crawl script << END1 --
>
> #!/bin/sh
> ./../bin/nutch crawl urls -dir crawl -depth 2 -topN 10000
>
> END1
>
> -- updater script << END2 --
>
> first="crawl"
> second="100000"
>
> ../bin/nutch generate $first/crawldb $first/segments -topN $second
>
> segment=`ls -d $first/segments/* | tail -1 | grep "[a-zA-Z0-9/]*"`
>
> ../bin/nutch fetch       $segment
>
> ../bin/nutch updatedb    $first/crawldb $segment
>
> rm -r $first/indexes
>
> ../bin/nutch invertlinks $first/linkdb  $first/segments/*
>
> ../bin/nutch index       $first/indexes $first/crawldb \
>                          $first/linkdb $first/segments/*
>
> END2
>
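
The updater script above picks the newest segment with an ls | tail | grep
pipeline, where the grep pattern matches every line and so filters nothing.
Below is a minimal sketch of a more defensive way to do that one step. The
function name latest_segment is hypothetical (it is not part of Nutch), and
the sketch assumes segment directories carry the timestamp names that
"nutch generate" gives them, so that sorted order is also chronological.

```shell
#!/bin/sh
# Hypothetical helper, not part of Nutch: print the newest segment
# directory under <crawl_dir>/segments.
# Assumption: segment names are timestamps (e.g. 20070701181137), so the
# sorted order of the glob expansion is also chronological order.
latest_segment() {
    newest=""
    for dir in "$1"/segments/*/; do
        # Skip the unexpanded pattern when no segment exists yet.
        [ -d "$dir" ] && newest=${dir%/}
    done
    # Print the result; return non-zero if nothing was found.
    [ -n "$newest" ] && printf '%s\n' "$newest"
}
```

In the updater script this could stand in for the pipeline, for example
segment=`latest_segment $first` || exit 1, so the run stops early when
generate produced no segment instead of passing an empty path to fetch.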


