Okay, I had just written you a long step-by-step email on what to do, but 
when I tried to send it my webmail session timed out and forced me to log in 
again, losing it all... I'm not happy :(
 
But straight to the point: since you're using the older 0.7 code base, you can 
use partially fetched segments. When the fetcher dies, just continue on to the 
next step as if it had completed successfully.
 
You won't have all the pages in there, but you can always set up another fetch 
list, fetch it (fully or partially), then merge the segments together and 
re-index.
 
There actually isn't much reason to generate "huge" multi-million-page fetch 
lists when you can create lots of smaller ones and merge them together. This 
allows for more of a ladder-style approach and, in some cases, reduces the 
risk of errors. On the Hadoop-based versions (0.8+), for example, a large 
fetch can fail unrecoverably or die in the parse-reduce stage.
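
To make the "lots of smaller lists" idea concrete, here is a rough Python 
sketch of the pattern. It is not Nutch code and does not use any Nutch API: 
split_into_batches, fetch_batch, merge_segments and the fetch_one callback are 
hypothetical names, and a "segment" is modelled as a plain dict of URL to page 
content. It only illustrates that a fetch which dies part-way costs you the 
rest of one small batch, not the whole crawl.

```python
from typing import Callable, Dict, Iterable, Iterator, List


def split_into_batches(urls: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `urls`, each at most `batch_size` long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(urls), batch_size):
        yield urls[start:start + batch_size]


def fetch_batch(urls: Iterable[str],
                fetch_one: Callable[[str], str]) -> Dict[str, str]:
    """Fetch URLs in order and return a (possibly partial) segment.

    If fetch_one raises, the fetch stops there and whatever was fetched
    so far is kept, mimicking a fetcher that died part-way through.
    """
    segment: Dict[str, str] = {}
    for url in urls:
        try:
            segment[url] = fetch_one(url)
        except Exception:
            break  # the fetcher "dies"; keep the partial segment
    return segment


def merge_segments(segments: Iterable[Dict[str, str]]) -> Dict[str, str]:
    """Combine several segments; later segments win on duplicate URLs."""
    merged: Dict[str, str] = {}
    for segment in segments:
        merged.update(segment)
    return merged
```

With ten URLs split into batches of four, a failure on the sixth URL loses 
only the remainder of the second batch. The third batch is still fetched, and 
the merged result is what you would then index.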
 
Hope this helps.


----- Original Message ----
From: shrinivas patwardhan <[EMAIL PROTECTED]>
To: [email protected]
Sent: Monday, January 1, 2007 11:48:52 PM
Subject: fetcher : some doubts


hello
while fetching a certain fetchlist, say about 2 - 3 million pages, there are
some errors that might cause the fetcher process to stop:
1: low disk space
2: any severe error (like an internet connection failure for a long time)
3: some more, like java heap space
these are some of the reasons that i faced ..
now the straight answer would be that i should be taking care of all those
before i start fetching ..
i agree, but in case the fetcher stops ...
Is there any way to continue fetching from where it stopped?
if not, can we all contribute towards that?
i have been through the re-fetching threads, but would that help me in this
case?
example:
if i have fetched around a million pages and some 2 million pages are still
left to be fetched, and the fetcher stops due to low disk space, is there any
way to continue from where i stopped after i organise everything (arrange
for another disk or free partition)?

Thanks & Regards
Shrinivas Patwardhan
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
