Here is a URL to scripts for Nutch 0.8 and 0.9:
http://wiki.apache.org/nutch/IntranetRecrawl#head-93eea6620f57b24dbe3591c293aead539a017ec7
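
For orientation, here is a rough sketch of what one recrawl round looks like with the 0.8/0.9 command set. It is not copied from the wiki page. The variable names (NUTCH, CRAWL_DIR, TOPN, DRY_RUN), the NEWindexes/NEWindex directory names and the SEGMENT placeholder are my own assumptions, and it assumes the crawldb/linkdb/segments layout that the crawl command creates. Run each bin/nutch subcommand without arguments to check its usage for your version before relying on this.

```shell
#!/bin/bash
# Sketch only: NUTCH, CRAWL_DIR, TOPN and DRY_RUN are hypothetical names.
NUTCH=${NUTCH:-bin/nutch}
CRAWL_DIR=${CRAWL_DIR:-crawl}
TOPN=${TOPN:-1000}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = execute them

# Print a command, and execute it unless this is a dry run.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" != "1" ]; then "$@"; fi
}

# Newest segment directory (names are timestamps, so sorted order works).
latest_segment() {
  ls -d "$CRAWL_DIR"/segments/2* 2>/dev/null | tail -1
}

recrawl() {
  local segment
  run "$NUTCH" generate "$CRAWL_DIR/crawldb" "$CRAWL_DIR/segments" -topN "$TOPN" || return 1
  segment=$(latest_segment)
  if [ -z "$segment" ]; then
    # A real run cannot continue without a segment; a dry run shows a placeholder.
    [ "$DRY_RUN" = "1" ] || { echo "no segment found" >&2; return 1; }
    segment="$CRAWL_DIR/segments/SEGMENT"
  fi
  run "$NUTCH" fetch "$segment" || return 1
  run "$NUTCH" updatedb "$CRAWL_DIR/crawldb" "$segment" || return 1
  run "$NUTCH" invertlinks "$CRAWL_DIR/linkdb" -dir "$CRAWL_DIR/segments" || return 1
  # index takes an output dir, the crawldb, the linkdb and the segments.
  # The output dir must not exist yet, so build into a fresh one.
  run "$NUTCH" index "$CRAWL_DIR/NEWindexes" "$CRAWL_DIR/crawldb" \
      "$CRAWL_DIR/linkdb" "$CRAWL_DIR"/segments/* || return 1
  run "$NUTCH" dedup "$CRAWL_DIR/NEWindexes" || return 1
  run "$NUTCH" merge "$CRAWL_DIR/NEWindex" "$CRAWL_DIR/NEWindexes" || return 1
}

recrawl
```

By default this only prints the commands, so you can compare them with the wiki script first. Setting DRY_RUN=0 executes them, for example from cron. Swapping NEWindex in for the live index and cleaning up old segments is left out here.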




On Tue, May 26, 2009 at 2:07 PM, Malaviya, Sanjay X <
[email protected]> wrote:

> I found a script for maintaining the Nutch index, but it seems to be quite
> old and may be for version 0.7. If I run it I get a bunch of errors:
>
> The command bin/nutch analyze does not exist in version 0.9 or 1.0.
> Similarly, bin/nutch index requires a bunch of inputs.
> There is no crawl/tmpfile.
>
> -----------------------
> #!/bin/bash
>
>  # Set JAVA_HOME to reflect your systems java configuration
>  export JAVA_HOME=/usr/lib/j2sdk1.5-sun
>
>  # Start index updation
>  bin/nutch generate crawl.virtusa/db crawl.virtusa/segments -topN 1000
>  s=`ls -d crawl.virtusa/segments/2* | tail -1`
>  echo Segment is $s
>  bin/nutch fetch $s
>  bin/nutch updatedb crawl.virtusa/db $s
>  bin/nutch analyze crawl.virtusa/db 5
>  bin/nutch index $s
>  bin/nutch dedup crawl.virtusa/segments crawl.virtusa/tmpfile
>
>  # Merge segments to prevent too many open files exception in Lucene
>  bin/nutch mergesegs -dir crawl.virtusa/segments -i -ds
>  s=`ls -d crawl.virtusa/segments/2* | tail -1`
>  echo Merged Segment is $s
>
>  rm -rf crawl.virtusa/index
>
> -----------------------
>
>
> Sanjay
> -----Original Message-----
> From: Malaviya, Sanjay X [mailto:[email protected]]
> Sent: Tuesday, May 26, 2009 3:11 PM
> To: [email protected]
> Subject: Shell Script to maintain Nutch index
>
> Hi,
> Does anyone have a shell script to maintain the Nutch index that can be
> scheduled to run every day? This would take care of the updates happening on
> the web sites. I need it for version 0.9 or 1.0.
>
> Thanks
> Sanjay
>
>
> ------------------------------------------
> The contents of this message, together with any attachments, are intended
> only for the use of the person(s) to which they are addressed and may
> contain confidential and/or privileged information. Further, any medical
> information herein is confidential and protected by law. It is unlawful for
> unauthorized persons to use, review, copy, disclose, or disseminate
> confidential medical information. If you are not the intended recipient,
> immediately advise the sender and delete this message and any attachments.
> Any distribution, or copying of this message, or any attachment, is
> prohibited.
> ------------------------------------------
>