I found a script for maintaining the Nutch index, but it seems to be quite old
and may be for version 0.7. When I run it I get a bunch of errors.

Commands like bin/nutch analyze no longer exist in version 0.9 or 1.0.
Similarly, bin/nutch index requires a number of additional inputs, and
there is no crawl/tmpfile.
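A minimal sketch of how the analyze/index/dedup steps might map onto 0.9/1.0, assuming the usage strings of those releases (running each bin/nutch command with no arguments prints its usage, so check them on your install). There is no analyze step; instead, index needs a linkdb built by invertlinks plus the crawldb and the segments, and dedup works on index directories, so no temporary file is passed. The crawl layout (crawldb, linkdb, segments and index under one directory, where 0.7 used db), the NUTCH variable and the nutch_reindex function are hypothetical.

```shell
#!/bin/bash
# Sketch only: command syntax assumed from the Nutch 0.9/1.0 usage
# strings. Hypothetical layout: CRAWL/crawldb, CRAWL/linkdb,
# CRAWL/segments, CRAWL/index.

# Path to the nutch launcher (hypothetical default; override if needed).
NUTCH=${NUTCH:-bin/nutch}

# nutch_reindex CRAWL: rebuild CRAWL/index from every segment.
nutch_reindex() {
  local crawl=$1
  local segments=("$crawl"/segments/*)
  if [ ! -d "${segments[0]}" ]; then
    echo "no segments under $crawl/segments" >&2
    return 1
  fi
  # No 'analyze' any more; index needs a linkdb, built from all segments.
  "$NUTCH" invertlinks "$crawl/linkdb" -dir "$crawl/segments" || return 1
  # Clear leftovers from an earlier failed run before writing new output.
  rm -rf "$crawl/NEWindexes" "$crawl/NEWindex"
  # index: <output index dir> <crawldb> <linkdb> <segment> ...
  "$NUTCH" index "$crawl/NEWindexes" "$crawl/crawldb" "$crawl/linkdb" \
    "${segments[@]}" || return 1
  # dedup takes index directories; there is no temp-file argument.
  "$NUTCH" dedup "$crawl/NEWindexes" || return 1
  # Merge the part indexes into a single Lucene index.
  "$NUTCH" merge "$crawl/NEWindex" "$crawl/NEWindexes" || return 1
  # Swap in the new index only after every step has succeeded.
  rm -rf "$crawl/index" "$crawl/NEWindexes"
  mv "$crawl/NEWindex" "$crawl/index"
}
```

Used from the Nutch install directory as nutch_reindex crawl.virtusa; if any step fails, the previous index is left in place.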

-----------------------
#!/bin/bash

  # Set JAVA_HOME to reflect your system's Java configuration
  export JAVA_HOME=/usr/lib/j2sdk1.5-sun

  # Start the index update
  bin/nutch generate crawl.virtusa/db crawl.virtusa/segments -topN 1000
  s=$(ls -d crawl.virtusa/segments/2* | tail -1)
  echo "Segment is $s"
  bin/nutch fetch "$s"
  bin/nutch updatedb crawl.virtusa/db "$s"
  bin/nutch analyze crawl.virtusa/db 5
  bin/nutch index "$s"
  bin/nutch dedup crawl.virtusa/segments crawl.virtusa/tmpfile

  # Merge segments to prevent a "too many open files" exception in Lucene
  bin/nutch mergesegs -dir crawl.virtusa/segments -i -ds
  s=$(ls -d crawl.virtusa/segments/2* | tail -1)
  echo "Merged segment is $s"

  rm -rf crawl.virtusa/index

-----------------------
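A sketch of the fetch-and-merge part of the script above in 0.9/1.0 syntax, assuming the generate, fetch, updatedb and mergesegs usage strings of those releases. updatedb takes the crawldb plus the new segment, and mergesegs no longer has -i or -ds: it writes the merged segment into a separate output directory, so this version swaps that directory in only after the merge succeeds. The crawldb path (0.7 used db), the NUTCH variable, the MERGEDsegments directory and the nutch_fetch_round function are hypothetical.

```shell
#!/bin/bash
# Sketch only: syntax assumed from the Nutch 0.9/1.0 usage strings.
# Hypothetical layout: CRAWL/crawldb and CRAWL/segments.

# Path to the nutch launcher (hypothetical default; override if needed).
NUTCH=${NUTCH:-bin/nutch}

# nutch_fetch_round CRAWL [TOPN]: one generate/fetch/updatedb round,
# then merge all segments into one.
nutch_fetch_round() {
  local crawl=$1 topn=${2:-1000}
  "$NUTCH" generate "$crawl/crawldb" "$crawl/segments" -topN "$topn" || return 1
  # Segment names are timestamps, so the newest one sorts last.
  local seg
  seg=$(ls -d "$crawl"/segments/2* | tail -1)
  echo "Segment is $seg"
  "$NUTCH" fetch "$seg" || return 1
  "$NUTCH" updatedb "$crawl/crawldb" "$seg" || return 1

  # No -i/-ds any more: merge into a new directory, then swap it in.
  rm -rf "$crawl/MERGEDsegments"
  "$NUTCH" mergesegs "$crawl/MERGEDsegments" -dir "$crawl/segments" || return 1
  rm -rf "$crawl/segments"
  mv "$crawl/MERGEDsegments" "$crawl/segments"
}
```

As far as I understand the 0.9/1.0 searcher, the index refers to segments by name, so after a merge the index would have to be rebuilt from the merged segments (invertlinks, index, dedup, merge) before it is searched again.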


Sanjay
-----Original Message-----
From: Malaviya, Sanjay X [mailto:[email protected]] 
Sent: Tuesday, May 26, 2009 3:11 PM
To: [email protected]
Subject: Shell Script to maintain Nutch index

Hi,
Does anyone have a shell script to maintain a Nutch index that can be scheduled
to run every day? It would pick up the updates happening on the web
sites. I need it for version 0.9 or 1.0.
 
Thanks
Sanjay
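When a recrawl is scheduled daily, one run can still be going when the next starts. A minimal sketch of a guard, relying on mkdir being atomic so only one invocation can create the lock directory; the run_locked function and lock path are hypothetical. A run that is killed leaves the lock behind, and it then has to be removed by hand.

```shell
#!/bin/bash
# Hypothetical helper: run_locked LOCKDIR CMD [ARGS...]
# Runs CMD only if LOCKDIR could be created. mkdir either creates the
# directory or fails, so two overlapping cron runs cannot both proceed.
run_locked() {
  local lockdir=$1
  shift
  if ! mkdir "$lockdir" 2>/dev/null; then
    echo "lock $lockdir exists, previous run still active; skipping" >&2
    return 1
  fi
  "$@"
  local status=$?
  # Release the lock and pass CMD's exit status back to the caller.
  rmdir "$lockdir"
  return "$status"
}
```

A crontab entry such as 0 2 * * * cd /path/to/nutch && ./daily-recrawl.sh >> /tmp/nutch-recrawl.log 2>&1 would run a (hypothetical) daily-recrawl.sh every day at 02:00, with that script calling run_locked /tmp/nutch-recrawl.lock around the crawl steps. Since cron starts jobs with a minimal environment, setting JAVA_HOME inside the script, as the script above does, matters here.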


------------------------------------------
The contents of this message, together with any attachments, are intended only 
for the use of the person(s) to which they are addressed and may contain 
confidential and/or privileged information. Further, any medical information 
herein is confidential and protected by law. It is unlawful for unauthorized 
persons to use, review, copy, disclose, or disseminate confidential medical 
information. If you are not the intended recipient, immediately advise the 
sender and delete this message and any attachments. Any distribution, or 
copying of this message, or any attachment, is prohibited.
------------------------------------------