Here is a URL to scripts for Nutch 0.8 and 0.9:
http://wiki.apache.org/nutch/IntranetRecrawl#head-93eea6620f57b24dbe3591c293aead539a017ec7
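
For orientation, one daily round on 0.9 or 1.0 might look roughly like the sketch below. This is not the wiki script, only an outline under stated assumptions: the crawl directory uses the 0.8+ layout (crawldb, linkdb, segments), fetch also parses (the default as far as I know), and the names CRAWL_DIR, TOPN, NUTCH, DRY_RUN, NEWindexes and NEWindex are made up for this example. Note there is no analyze step, and index takes the crawldb, linkdb and segments as inputs.

```shell
#!/bin/bash
# Sketch of one recrawl round for Nutch 0.9/1.0 (not the wiki script).
# NUTCH, CRAWL_DIR, TOPN and DRY_RUN are assumed names for this example.

NUTCH="${NUTCH:-bin/nutch}"
CRAWL_DIR="${CRAWL_DIR:-crawl.virtusa}"
TOPN="${TOPN:-1000}"

# Print each command; execute it unless DRY_RUN is set.
run() {
  echo "+ $*"
  if [ -z "${DRY_RUN:-}" ]; then
    "$@"
  fi
}

# Newest segment directory. Segment names are timestamps, so the
# lexically last entry is the most recent one.
latest_segment() {
  ls -d "$1"/segments/2* 2>/dev/null | tail -1
}

recrawl_once() {
  local crawl="$1" seg
  # Select up to TOPN URLs that are due and create a new segment for them.
  run "$NUTCH" generate "$crawl/crawldb" "$crawl/segments" -topN "$TOPN" || return 1
  seg=$(latest_segment "$crawl")
  if [ -z "$seg" ]; then
    echo "no segment found under $crawl/segments" >&2
    return 1
  fi
  run "$NUTCH" fetch "$seg" || return 1
  run "$NUTCH" updatedb "$crawl/crawldb" "$seg" || return 1
  run "$NUTCH" invertlinks "$crawl/linkdb" -dir "$crawl/segments" || return 1
  # Build into fresh directories; leftovers from a previous run would
  # make the index job fail because its output already exists.
  run rm -rf "$crawl/NEWindexes" "$crawl/NEWindex" || return 1
  run "$NUTCH" index "$crawl/NEWindexes" "$crawl/crawldb" "$crawl/linkdb" "$crawl"/segments/2* || return 1
  run "$NUTCH" dedup "$crawl/NEWindexes" || return 1
  run "$NUTCH" merge "$crawl/NEWindex" "$crawl/NEWindexes" || return 1
}

# Only act when a crawl directory is given on the command line.
if [ "$#" -gt 0 ]; then
  recrawl_once "$1"
fi
```

Run it from the Nutch home as, for example, `DRY_RUN=1 ./recrawl.sh crawl.virtusa` (recrawl.sh is just a placeholder file name) to see the commands without executing them. The merged index ends up in NEWindex, so it still has to be moved into place as crawl.virtusa/index and the search webapp restarted before the new pages show up.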

On Tue, May 26, 2009 at 2:07 PM, Malaviya, Sanjay X <[email protected]> wrote:

> I found a script for maintaining the Nutch index, but it seems to be quite
> old and may be for version 0.7. If I run it I get a bunch of errors.
>
> A command like bin/nutch analyze is not there in version 0.9 or 1.0.
> Similarly, bin/nutch index requires a bunch of inputs.
> There is no crawl/tmpfile.
>
> -----------------------
> #!/bin/bash
>
> # Set JAVA_HOME to reflect your systems java configuration
> export JAVA_HOME=/usr/lib/j2sdk1.5-sun
>
> # Start index updation
> bin/nutch generate crawl.virtusa/db crawl.virtusa/segments -topN 1000
> s=`ls -d crawl.virtusa/segments/2* | tail -1`
> echo Segment is $s
> bin/nutch fetch $s
> bin/nutch updatedb crawl.virtusa/db $s
> bin/nutch analyze crawl.virtusa/db 5
> bin/nutch index $s
> bin/nutch dedup crawl.virtusa/segments crawl.virtusa/tmpfile
>
> # Merge segments to prevent too many open files exception in Lucene
> bin/nutch mergesegs -dir crawl.virtusa/segments -i -ds
> s=`ls -d crawl.virtusa/segments/2* | tail -1`
> echo Merged Segment is $s
>
> rm -rf crawl.virtusa/index
>
> -----------------------
>
> Sanjay
>
> -----Original Message-----
> From: Malaviya, Sanjay X [mailto:[email protected]]
> Sent: Tuesday, May 26, 2009 3:11 PM
> To: [email protected]
> Subject: Shell Script to maintain Nutch index
>
> Hi,
> Does anyone have a shell script to maintain the Nutch index that can be
> scheduled to run every day? This will take care of the updates happening on
> the web sites. I need it for version 0.9 or 1.0.
>
> Thanks
> Sanjay
>
> ------------------------------------------
> The contents of this message, together with any attachments, are intended
> only for the use of the person(s) to which they are addressed and may
> contain confidential and/or privileged information. Further, any medical
> information herein is confidential and protected by law. It is unlawful for
> unauthorized persons to use, review, copy, disclose, or disseminate
> confidential medical information. If you are not the intended recipient,
> immediately advise the sender and delete this message and any attachments.
> Any distribution, or copying of this message, or any attachment, is
> prohibited.
> ------------------------------------------
