I'm about to do a pretty big crawl, and when I generate my segments it says
"jobtracker is 'local', generating exactly one partition".

My problem is that I can't bet on the crawler not crashing at some point
during the period I'm about to crawl. And from what I understand, a crash
while crawling a segment means I have to redo the whole segment. (Is this
wrong?)

So my idea was to create many segments and run a batch file that starts them
one after another; if one crashes, that's no problem, I just continue and
redo the segments that didn't get crawled. Roughly what I have in mind is
sketched below.
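Something like this, just as a sketch (it assumes the generated segments sit
under crawl/segments and that bin/nutch fetch exits non-zero when a segment
fails):

    # fetch each generated segment in turn; note failures so they can be redone later
    for seg in crawl/segments/*; do
      if ! bin/nutch fetch "$seg"; then
        echo "$seg" >> failed_segments.txt
      fi
    done

Then I'd just rerun the batch over whatever ended up in failed_segments.txt.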

From reading on the net I've realized that -numFetchers has no effect on
local jobs, and that I have to use -ndfs. But I just can't seem to get this
to work. What I would basically like to do is:

    bin/nutch generate -ndfs <nameserver:port> crawl/crawldb crawl/segments -numFetchers 10

but I have no clue what <nameserver:port> should be. The more I read about
NDFS, the more I start to doubt that it's really what I want.

Is there perhaps a way to split a segment after it's generated, just like
there's a way to merge them with mergesegs?

Why is this so hard? Have I missed something? I can't be the first person
who wants a fail-safe crawl and doesn't want to lose all the work if the
connection or the computer goes down.