[ 
http://issues.apache.org/jira/browse/NUTCH-183?page=comments#action_12363888 ] 

Dominik Friedrich commented on NUTCH-183:
-----------------------------------------

I tested this patch, and jobs seem to deadlock when one node crashes while 
other nodes are loading map output data from it. Here are some log lines from 
the tasktracker on node B, which was trying to copy map data from the crashed 
node A:

060124 181752 task_r_27r56x 0.2212838% reduce > copy > [EMAIL PROTECTED]:50040
060124 181753 task_r_7jjqag 0.17820947% reduce > copy >
060124 181753 task_r_27r56x 0.2212838% reduce > copy > [EMAIL PROTECTED]:50040
060124 181754 task_r_7jjqag 0.17820947% reduce > copy >
060124 181754 task_r_27r56x 0.2212838% reduce > copy > [EMAIL PROTECTED]:50040
060124 181755 task_r_7jjqag 0.17820947% reduce > copy >
060124 181755 task_r_27r56x 0.2212838% reduce > copy > [EMAIL PROTECTED]:50040
060124 181756 task_r_7jjqag 0.17820947% reduce > copy >
[...]
060124 223510 task_r_27r56x 0.2212838% reduce > copy > [EMAIL PROTECTED]:50040
060124 223511 task_r_7jjqag 0.17820947% reduce > copy >
060124 223511 task_r_27r56x 0.2212838% reduce > copy > [EMAIL PROTECTED]:50040
060124 223512 task_r_7jjqag 0.17820947% reduce > copy >
060124 223512 task_r_27r56x 0.2212838% reduce > copy > [EMAIL PROTECTED]:50040
060124 223513 task_r_7jjqag 0.17820947% reduce > copy >
060124 223513 task_r_27r56x 0.2212838% reduce > copy > [EMAIL PROTECTED]:50040

Node A was removed from the jobtracker's node list, but it seems that not all 
tasks depending on that node were killed. In the excerpt above, the progress 
of both reduce tasks is unchanged between 18:17 and 22:35.
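
To make the suspected gap concrete, here is a minimal sketch of the kind of
bookkeeping that could avoid it: remember which node holds each completed map
output, and when a node is lost, hand back the affected map tasks so they can
be re-run and reducers stop copying from the dead host. The class and method
names are hypothetical and are not taken from the patch or from the Nutch
JobTracker.

```python
from collections import defaultdict


class LostNodeBookkeeping:
    """Hypothetical helper, not the Nutch JobTracker API."""

    def __init__(self):
        # node name -> ids of map tasks whose output is stored on that node
        self.outputs_by_node = defaultdict(set)
        # map tasks whose output was lost and must be executed again
        self.maps_to_rerun = set()

    def map_completed(self, node, map_id):
        """Record that the output of map_id now lives on node."""
        self.outputs_by_node[node].add(map_id)

    def node_lost(self, node):
        """Forget a crashed node and return the map tasks that must be re-run."""
        lost = self.outputs_by_node.pop(node, set())
        self.maps_to_rerun |= lost
        return lost

    def fetch_source(self, map_id):
        """Return the node a reducer should copy map_id from, or None."""
        for node, map_ids in self.outputs_by_node.items():
            if map_id in map_ids:
                return node
        return None
```

With something like this, a reduce task asking for the source of a lost map
output gets None instead of the address of node A, so it can wait for the
re-run rather than retry the copy indefinitely.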

> MapReduce has a series of problems concerning task-allocation to worker nodes
> -----------------------------------------------------------------------------
>
>          Key: NUTCH-183
>          URL: http://issues.apache.org/jira/browse/NUTCH-183
>      Project: Nutch
>         Type: Improvement
>  Environment: All
>     Reporter: Mike Cafarella
>  Attachments: jobtracker.patch
>
> The MapReduce JobTracker is not great at allocating tasks to TaskTracker
> worker nodes.
> Here are the problems:
> 1) There is no speculative execution of tasks
> 2) Reduce tasks must wait until all map tasks are completed before doing
>    any work
> 3) TaskTrackers don't distinguish between Map and Reduce jobs.  Also, the
>    number of tasks at a single node is limited to some constant.  That
>    means you can get weird deadlock problems upon machine failure.  The
>    reduces take up all the available execution slots, but they don't do
>    productive work, because they're waiting for a map task to complete.
>    Of course, that map task won't even be started until the reduce tasks
>    finish, so you can see the problem...
> 4) The JobTracker is so complicated that it's hard to fix any of these.
> The right solution is a rewrite of the JobTracker to be a lot more
> flexible in task handling.
> It has to be a lot simpler.  One way to make it simpler is to add an
> abstraction I'll call "TaskInProgress".  Jobs are broken into chunks
> called TasksInProgress.  All the TaskInProgress objects must be complete,
> somehow, before the Job is complete.
> A single TaskInProgress can be executed by one or more Tasks.
> TaskTrackers are assigned Tasks.
> If a Task fails, we report it back to the JobTracker, where the
> TaskInProgress lives.  The TIP can then decide whether to launch
> additional Tasks or not.
> Speculative execution is handled within the TIP.  It simply launches
> multiple Tasks in parallel.  The TaskTrackers have no idea that these
> Tasks are actually doing the same chunk of work.  The TIP is complete
> when any one of its Tasks is complete.
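
The TaskInProgress idea quoted above can be sketched roughly as follows. This
is only an illustration of the description, not the code in jobtracker.patch:
the method names, the task id format, and the max_attempts limit are all
assumptions made for the example.

```python
import itertools


class TaskInProgress:
    """One chunk of a job; may be executed by one or more Tasks (sketch)."""

    def __init__(self, tip_id, max_attempts=4):
        self.tip_id = tip_id
        self.max_attempts = max_attempts   # assumed cap, not from the issue
        self._counter = itertools.count()
        self.running = set()               # ids of Tasks currently executing
        self.failed = set()                # ids of Tasks that reported failure
        self.completed_by = None           # id of the first Task to succeed

    def is_complete(self):
        return self.completed_by is not None

    def launch_task(self):
        """Start another Task for this chunk; calling it twice is speculation."""
        attempts = len(self.running) + len(self.failed)
        if self.is_complete() or attempts >= self.max_attempts:
            return None
        task_id = f"{self.tip_id}_{next(self._counter)}"
        self.running.add(task_id)
        return task_id

    def task_failed(self, task_id):
        """A Task failed; the TIP decides whether to launch a replacement."""
        self.running.discard(task_id)
        self.failed.add(task_id)
        if not self.running:
            return self.launch_task()
        return None

    def task_succeeded(self, task_id):
        """Any one successful Task completes the TIP; return Tasks to kill."""
        self.running.discard(task_id)
        if self.completed_by is None:
            self.completed_by = task_id
        redundant, self.running = self.running, set()
        return redundant


def job_complete(tips):
    """A Job is complete only when every TaskInProgress is complete."""
    return all(tip.is_complete() for tip in tips)
```

In this sketch a TaskTracker only ever sees task ids; it has no way to tell
that two of them belong to the same TaskInProgress, which matches the
description of speculative execution above.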

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


