Hi Matthew,

I am curious about one thing. How do you know you can simply drop the $depth oldest segments at the end? I haven't studied the Nutch code on this topic yet, but I thought a segment can be dropped only once you are sure that all of its content has already been crawled in some newer segment. That would have to be checked by some function or script, which to my knowledge hasn't been implemented yet.
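To make the check I have in mind concrete, here is a minimal sketch in Python. It is not Nutch code and uses no Nutch API. It assumes you have already extracted, by whatever means, the set of URLs fetched in each segment, and that segment names sort chronologically (as timestamp-style directory names would). The function name and the input mapping are hypothetical.

```python
def droppable_segments(segments):
    """Return the names of segments whose URLs all appear in newer segments.

    `segments` maps a segment name to the set of URLs fetched in it.
    Assumption: names sort chronologically (e.g. timestamp-style names).
    """
    droppable = []
    seen_in_newer = set()
    # Walk from newest to oldest, accumulating the URLs seen so far.
    for name in sorted(segments, reverse=True):
        urls = segments[name]
        # A segment is safe to drop only if newer segments cover all of it.
        if urls <= seen_in_newer:
            droppable.append(name)
        seen_in_newer |= urls
    return sorted(droppable)


# Hypothetical example data: four small crawls over pages a-d.
example = {
    "20060801": {"a", "b"},
    "20060802": {"c"},
    "20060803": {"a", "b"},
    "20060804": {"a", "c", "d"},
}
safe_to_drop = droppable_segments(example)
```

With this made-up data only the two oldest segments are fully covered by newer ones. Dropping a fixed number of the oldest segments would give the same answer here only by coincidence, which is the point of my question.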
Also, I don't think this question has been discussed in detail on the dev/user lists yet, so I just wanted to ask for your opinion. The situation could get even more complicated if people add the -topN parameter to the script (which can happen, because some might prefer crawling in ten smaller batches over two huge crawls for various technical reasons).

Anyway, never mind if you don't want to bother with my silly question :-)

Regards,
Lukas

On 8/4/06, Matthew Holt <[EMAIL PROTECTED]> wrote:
> Last email regarding this script. I found a bug in it that is sporadic
> (I think it only affected different setups). However, since it would be
> a problem sometimes, I refactored the script. I'd suggest you redownload
> the script if you are using it.
>
> Matt
>
> Matthew Holt wrote:
> > I'm currently pretty busy at work. If I have time I'll do it later.
> >
> > The version 0.8 recrawl script has a working version online now. I
> > temporarily modified it on the website yesterday when I ran into some
> > problems, but I further tested it and the actual working code is
> > modified now. So if you got it off the web site any time yesterday, I
> > would redownload the script.
> >
> > Matt
> >
> > Lourival Júnior wrote:
> >> Hi Matthew!
> >>
> >> Could you update the script to the version 0.7.2 with the same
> >> functionalities? I wrote a script that does this, but it doesn't work very
> >> well...
> >>
> >> Regards!
> >>
> >> On 8/2/06, Matthew Holt <[EMAIL PROTECTED]> wrote:
> >>>
> >>> Just letting everyone know that I updated the recrawl script on the
> >>> Wiki. It now merges the created segments, then deletes the old segs to
> >>> prevent a lot of unneeded data remaining/growing on the hard drive.
> >>> Matt
> >>>
> >>>
> >>> http://wiki.apache.org/nutch/IntranetRecrawl?action=show#head-e58e25a0b9530bb6fcdfb282fd27a207fc0aff03
> >>>
