I found a script for maintaining the Nutch index, but it seems to be quite old and may be for version 0.7. If I run it, I get a bunch of errors.
Commands like bin/nutch analyze are not there in version 0.9 or 1.0. Similarly, bin/nutch index requires a number of inputs, and there is no crawl/tmpfile.

-----------------------
#!/bin/bash

# Set JAVA_HOME to reflect your system's java configuration
export JAVA_HOME=/usr/lib/j2sdk1.5-sun

# Start index updation
bin/nutch generate crawl.virtusa/db crawl.virtusa/segments -topN 1000
s=`ls -d crawl.virtusa/segments/2* | tail -1`
echo Segment is $s
bin/nutch fetch $s
bin/nutch updatedb crawl.virtusa/db $s
bin/nutch analyze crawl.virtusa/db 5
bin/nutch index $s
bin/nutch dedup crawl.virtusa/segments crawl.virtusa/tmpfile

# Merge segments to prevent too many open files exception in Lucene
bin/nutch mergesegs -dir crawl.virtusa/segments -i -ds
s=`ls -d crawl.virtusa/segments/2* | tail -1`
echo Merged Segment is $s

rm -rf crawl.virtusa/index
-----------------------

Sanjay

-----Original Message-----
From: Malaviya, Sanjay X [mailto:[email protected]]
Sent: Tuesday, May 26, 2009 3:11 PM
To: [email protected]
Subject: Shell Script to maintain Nutch index

Hi,

Does anyone have a shell script to maintain the Nutch index that can be scheduled to run every day? It should take care of the updates happening on the web sites. I need it for version 0.9 or 1.0.

Thanks
Sanjay

------------------------------------------
The contents of this message, together with any attachments, are intended only for the use of the person(s) to which they are addressed and may contain confidential and/or privileged information. Further, any medical information herein is confidential and protected by law. It is unlawful for unauthorized persons to use, review, copy, disclose, or disseminate confidential medical information. If you are not the intended recipient, immediately advise the sender and delete this message and any attachments. Any distribution, or copying of this message, or any attachment, is prohibited.
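For comparison, here is a rough sketch of what one update cycle might look like with the 0.9/1.0 command set, where invertlinks, index, dedup and merge take the place of the old analyze and mergesegs steps. It is untested against a real crawl and rests on assumptions: the crawl directory is assumed to use the crawldb/linkdb/segments/indexes/index layout that the 0.8+ crawl command creates (not the crawl.virtusa/db layout of the old script), and the names recrawl_cycle, run, CRAWL, TOPN, NUTCH and DRY_RUN are made up for this example. By default it only prints the commands it would run.

```shell
#!/bin/bash
# Sketch of one recrawl cycle for Nutch 0.9/1.0. Not a tested production script.
# Assumed layout: $CRAWL/crawldb, $CRAWL/linkdb, $CRAWL/segments, $CRAWL/indexes, $CRAWL/index
# By default commands are only printed; set DRY_RUN=0 to really execute them.
set -e

# Print the command in dry-run mode, otherwise execute it
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}

recrawl_cycle() {
  crawl=${CRAWL:-crawl.virtusa}   # crawl directory (name borrowed from the old script)
  topn=${TOPN:-1000}              # number of top-scoring URLs to fetch per cycle
  nutch=${NUTCH:-bin/nutch}       # path to the nutch launcher

  # Generate a new fetch list; this creates a timestamped segment directory
  run "$nutch" generate "$crawl/crawldb" "$crawl/segments" -topN "$topn"

  # Pick the newest segment (falls back to a placeholder name in dry-run mode)
  s=$(ls -d "$crawl"/segments/2* 2>/dev/null | tail -1)
  if [ -z "$s" ]; then
    if [ "${DRY_RUN:-1}" = "1" ]; then
      s="$crawl/segments/SEGMENT"
    else
      echo "No segment found under $crawl/segments" >&2
      return 1
    fi
  fi

  # Fetch the segment and fold the results back into the crawl db
  run "$nutch" fetch "$s"
  run "$nutch" updatedb "$crawl/crawldb" "$s"

  # Rebuild the link db from all segments
  run "$nutch" invertlinks "$crawl/linkdb" -dir "$crawl/segments"

  # Rebuild the index from scratch: index all segments, remove duplicates, merge
  run rm -rf "$crawl/indexes" "$crawl/index"
  run "$nutch" index "$crawl/indexes" "$crawl/crawldb" "$crawl/linkdb" "$crawl"/segments/*
  run "$nutch" dedup "$crawl/indexes"
  run "$nutch" merge "$crawl/index" "$crawl/indexes"
}

recrawl_cycle
```

To schedule it every day, the script could be run from cron in the Nutch install directory with DRY_RUN=0 set, after checking the printed commands against the usage messages that bin/nutch prints for each command in the installed version.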
