[jira] Commented: (MAPREDUCE-2162) speculative execution does not handle cases where stddev > mean well
[ https://issues.apache.org/jira/browse/MAPREDUCE-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965936#action_12965936 ] Joydeep Sen Sarma commented on MAPREDUCE-2162:
--
here's the reasoning behind capping stddev at mean/3. we speculate if:
* rate < mean - stddev, which implies
* 1/rate > 1/(mean - stddev), which implies
* 1/rate > 1/mean + (1/(mean - stddev) - 1/mean), i.e.
# projectedTime > MeanTime + Delta
where
* Delta = 1/(mean - stddev) - 1/mean
if we cap stddev at mean/3, then at the cap:
* Delta = 1/(mean - mean/3) - 1/mean = 0.5/mean = 0.5 * MeanTime
and equation _1_ becomes:
# projectedTime > MeanTime + 0.5 * MeanTime
two observations:
* by capping stddev we have converted the rate check into a meaningful check on the running time of a task: any task whose projected time exceeds 1.5 * MeanTime is guaranteed to be speculated.
* the MeanTime + 0.5 * MeanTime slack over the mean is the same as the heuristic discussed in the jira, where two rules were discussed:
** don't speculate if runningTime <= MeanTime * 0.5
** don't speculate if remainingTime < MeanTime
* if we add these two together - since runningTime + remainingTime == projectedTime - this becomes (roughly):
** speculate only if projectedTime > MeanTime + MeanTime * 0.5
so the heuristics in the jira are structurally similar to capping the stddev at mean/3. as explained earlier, the percentile selection is already (approximately) done by speculativeCap (no more than 10% of the tasks can be speculated, and tasks are sorted (by latest finish time) before speculating).
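The effect of the cap can be checked numerically. A minimal sketch in plain Python (the function name and cap parameter are illustrative, not the actual Hadoop code, which lives in the speculation path of the JobTracker):

```python
def should_speculate(rate, mean_rate, stddev, cap_fraction=1.0 / 3):
    """Speculate a task if its progress rate falls more than one
    (capped) stddev below the mean progress rate."""
    # cap stddev at mean/3 so the threshold stays meaningful when stddev > mean
    capped = min(stddev, mean_rate * cap_fraction)
    return rate < mean_rate - capped

mean = 1.0e-9     # mean progress rate
stddev = 6.0e-9   # stddev larger than mean: the pathological case

# uncapped check: mean - stddev is negative, so even a stalled
# (zero-progress) task is never speculated
assert not (0.0 < mean - stddev)

# capped check: the stalled task is speculated
assert should_speculate(0.0, mean, stddev)

# with the cap, rate < (2/3)*mean is equivalent to
# projectedTime = 1/rate > 1.5 * MeanTime
assert should_speculate(0.6 * mean, mean, stddev)      # 1/rate ~ 1.67 * MeanTime
assert not should_speculate(0.9 * mean, mean, stddev)  # 1/rate ~ 1.11 * MeanTime
```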
> speculative execution does not handle cases where stddev > mean well
>
> Key: MAPREDUCE-2162
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2162
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: Joydeep Sen Sarma
> Assignee: Joydeep Sen Sarma
>
> the new speculation code only speculates tasks whose progress rate deviates from the mean progress rate of a job by more than some multiple (typically 1.0) of stddev. stddev can be larger than mean, which means that if we ever get into a situation where this condition holds true, then a task with even 0 progress rate will not be speculated.
> it's not clear that this condition is self-correcting. if a job has thousands of tasks, then one laggard task, in spite of not being speculated for a long time, may not be able to fix the condition of stddev > mean.
> we have seen jobs where tasks have not been speculated for hours and this seems one explanation why this may have happened. here's an example job with stddev > mean:
> DataStatistics: count is 6, sum is 1.7141054797775723E-8, sumSquares is 2.9381575958035014E-16, mean is 2.8568424662959537E-9, std() is 6.388093955645905E-9
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
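The quoted DataStatistics line can be reproduced from count/sum/sumSquares, assuming the usual running-moments formula var = E[x^2] - E[x]^2 (a sketch, not the actual DataStatistics class):

```python
import math

count = 6
total = 1.7141054797775723e-8          # "sum" from the report
sum_squares = 2.9381575958035014e-16   # "sumSquares" from the report

mean = total / count
std = math.sqrt(sum_squares / count - mean * mean)

assert abs(mean - 2.8568424662959537e-9) < 1e-18   # matches reported mean
assert abs(std - 6.388093955645905e-9) < 1e-15     # matches reported std()
assert std > mean                                  # the pathological condition
assert not (0.0 < mean - std)                      # so a zero-rate task is never speculated
```

With these six samples the mean - stddev threshold is negative, which is exactly the stuck state the issue describes.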
[jira] Commented: (MAPREDUCE-2028) streaming should support MultiFileInputFormat
[ https://issues.apache.org/jira/browse/MAPREDUCE-2028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965849#action_12965849 ] Allen Wittenauer commented on MAPREDUCE-2028:
-
Actually, what should probably happen is that MultiFileWordCount's "MyInputFormat" and "MultiFileLineRecordReader" should get promoted out of examples and officially into the mapred(uce) APIs. The following appears to implement exactly what us streaming users want/need:

$HADOOP_HOME/bin/hadoop \
  jar \
  `ls $HADOOP_HOME/contrib/streaming/hadoop-*-streaming.jar` \
  -libjars `ls $HADOOP_HOME/hadoop-*-examples.jar` \
  -inputformat org.apache.hadoop.examples.MultiFileWordCount\$MyInputFormat \
  -inputreader org.apache.hadoop.examples.MultiFileWordCount\$MultiFileLineRecordReader \

> streaming should support MultiFileInputFormat
>
> Key: MAPREDUCE-2028
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2028
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: contrib/streaming
> Affects Versions: 0.20.2
> Reporter: Allen Wittenauer
> Fix For: 0.21.1, 0.22.0
>
> There should be a way to call MultiFileInputFormat from streaming without having to write Java code...
[jira] Commented: (MAPREDUCE-2156) Raid-aware FSCK
[ https://issues.apache.org/jira/browse/MAPREDUCE-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965744#action_12965744 ] Ramkumar Vadali commented on MAPREDUCE-2156:
-
+1, looks good.

> Raid-aware FSCK
>
> Key: MAPREDUCE-2156
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2156
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: contrib/raid
> Affects Versions: 0.23.0
> Reporter: Patrick Kling
> Assignee: Patrick Kling
> Fix For: 0.23.0
> Attachments: MAPREDUCE-2156.2.patch, MAPREDUCE-2156.patch
>
> Currently, FSCK reports files as corrupt even if they can be fixed using parity blocks. We need a tool that only reports files that are irreparably corrupt (i.e., files for which too many data or parity blocks belonging to the same stripe have been lost or corrupted).
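The "irreparably corrupt" test the description asks for boils down to a per-stripe count. A sketch with hypothetical names (the real contrib/raid code works against actual block and parity-file metadata):

```python
def is_irreparable(lost_data_blocks, lost_parity_blocks, parity_length):
    """A stripe can be reconstructed as long as the total number of
    missing blocks (data + parity) does not exceed the number of
    parity blocks per stripe."""
    return lost_data_blocks + lost_parity_blocks > parity_length

# XOR raid: one parity block per stripe, so a single lost block is fixable
assert not is_irreparable(1, 0, parity_length=1)
# ...but losing a data block and its parity block in the same stripe is not
assert is_irreparable(1, 1, parity_length=1)
```

A raid-aware FSCK would report a file only if some stripe in it satisfies this predicate, instead of reporting on any missing block.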
[jira] Commented: (MAPREDUCE-2162) speculative execution does not handle cases where stddev > mean well
[ https://issues.apache.org/jira/browse/MAPREDUCE-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965624#action_12965624 ] Joydeep Sen Sarma commented on MAPREDUCE-2162:
--
spent a lot of time coding and thinking about this. i am more inclined to make a simple change to cap the standardDeviation at some maximum value (say Mean/3). i did a detailed analysis that seems to suggest that doing so would be roughly equivalent to the scheme discussed above. we already have the notion of a 'speculative cap' - putting a speculative cap of 10% of the currently running tasks would be roughly equivalent to speculating the bottom 10%. (The LateComparator currently sorts speculatable tasks by remaining time (instead of progress rate). if it were to sort based on progress rate - it would be very similar to speculating the bottom 10%.) the conditions discussed here (runningTime >= mean/2 and remainingTime < mean) turn out to be roughly equivalent to capping the stddev in this way.
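The cap-plus-sort behaviour described in the comment amounts to something like the following sketch (invented names; the actual selection logic lives in the JobTracker's speculation code, and LateComparator sorts by estimated remaining time rather than progress rate):

```python
def pick_speculation_candidates(tasks, running_count, cap_fraction=0.10):
    """tasks: list of (task_id, progress_rate) pairs for speculatable tasks.
    Returns up to cap_fraction * running_count task ids, slowest first."""
    cap = max(1, int(cap_fraction * running_count))
    # sorting by progress rate (slowest first) approximates
    # "speculate the bottom 10%" of running tasks
    ranked = sorted(tasks, key=lambda t: t[1])
    return [task_id for task_id, _ in ranked[:cap]]

tasks = [("t0", 5.0), ("t1", 0.1), ("t2", 3.0), ("t3", 0.2), ("t4", 4.0)]
# with a 40% cap over 5 running tasks, the two slowest are picked
assert pick_speculation_candidates(tasks, running_count=5, cap_fraction=0.4) == ["t1", "t3"]
```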