Hi Matthew,

I am curious about one thing. How do you know you can just drop the
$depth oldest segments at the end? I haven't studied the Nutch
code on this topic yet, but I thought that a segment can be
dropped only once you are sure that all its content has already been
crawled in some newer segments (which should be checked somehow via some
function/script, which hasn't been implemented yet to my knowledge).
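
Just to illustrate the kind of check I mean, here is a rough Python
sketch. It is not Nutch code: it assumes the URLs of each segment have
already been extracted into plain sets by some other means, and the
function name and segment names are made up for the example.

```python
def droppable_segments(segments):
    """Return the names of segments fully covered by newer segments.

    `segments` is a list of (name, set_of_urls) pairs ordered from
    oldest to newest. This structure is an assumption made for the
    sketch, not something Nutch provides in this form.
    """
    droppable = []
    covered = set()  # URLs seen in segments newer than the current one
    # Walk from newest to oldest so `covered` only holds newer content.
    for name, urls in reversed(segments):
        if urls <= covered:
            droppable.append(name)
        covered |= urls
    droppable.reverse()  # report oldest first
    return droppable


# Full recrawls: each newer segment re-fetches everything older.
full = [
    ("seg1", {"a", "b"}),
    ("seg2", {"b", "c"}),
    ("seg3", {"a", "b", "c"}),
]

# Partial crawls (as with -topN): newer segments hold only a subset.
partial = [
    ("seg1", {"a", "b", "c"}),
    ("seg2", {"a"}),
    ("seg3", {"b"}),
]
```

With the `full` data the two oldest segments are reported as droppable,
but with the `partial` data none are, because URL "c" lives only in the
oldest segment. That is the case I am worried about.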

Also, I don't think this question has been discussed on the dev/user lists
in detail yet, so I just wanted to ask for your opinion. The
situation could get even more complicated if people add the -topN
parameter to the script (which can happen, because some might prefer
crawling in ten smaller batches over two huge crawls for various
technical reasons).

Anyway, never mind if you don't want to bother about my silly question :-)

Regards,
Lukas

On 8/4/06, Matthew Holt <[EMAIL PROTECTED]> wrote:
> Last email regarding this script. I found a sporadic bug in it
> (I think it only affected certain setups). However, since it would be
> a problem sometimes, I refactored the script. I'd suggest you redownload
> the script if you are using it.
>
> Matt
>
> Matthew Holt wrote:
> > I'm currently pretty busy at work. If I have time, I'll do it later.
> >
> > The version 0.8 recrawl script has a working version online now. I
> > temporarily modified it on the website yesterday when I ran into some
> > problems, but I further tested it and the actual working code is
> > modified now. So if you got it off the web site any time yesterday, I
> > would redownload the script.
> >
> > Matt
> >
> > Lourival Júnior wrote:
> >> Hi Matthew!
> >>
> >> Could you update the script for version 0.7.2 with the same
> >> functionality? I wrote a script that does this, but it doesn't work very
> >> well...
> >>
> >> Regards!
> >>
> >> On 8/2/06, Matthew Holt <[EMAIL PROTECTED]> wrote:
> >>>
> >>> Just letting everyone know that I updated the recrawl script on the
> >>> Wiki. It now merges the created segments, then deletes the old segments to
> >>> prevent a lot of unneeded data remaining/growing on the hard drive.
> >>>   Matt
> >>>
> >>>
> >>> http://wiki.apache.org/nutch/IntranetRecrawl?action=show#head-e58e25a0b9530bb6fcdfb282fd27a207fc0aff03
> >>>
> >>>
> >>
> >>
> >>
> >
>

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
