[ https://issues.apache.org/jira/browse/MAPREDUCE-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098777#comment-13098777 ]

Harsh J commented on MAPREDUCE-2236:
------------------------------------

Well, do you want me to rebase, or do you feel there's no need to? I'm not 
targeting anything lower than trunk here, so let me know if it's relevant (I'd 
also like the hows/whys :D)

> No task may execute due to an Integer overflow possibility
> ----------------------------------------------------------
>
>                 Key: MAPREDUCE-2236
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2236
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.20.2
>         Environment: Linux, Hadoop 0.20.2
>            Reporter: Harsh J
>            Assignee: Harsh J
>            Priority: Critical
>             Fix For: 0.23.0
>
>         Attachments: MAPREDUCE-2236.r1.diff, MAPREDUCE-2236.r1.diff, 
> MAPREDUCE-2236.r2.diff
>
>
> If the maximum number of attempts is configured as Integer.MAX_VALUE, an 
> integer overflow occurs inside TaskInProgress; as a result, no task is ever 
> attempted by the cluster and the map tasks stay in the pending state forever.
> For example, here's a job driver that causes this:
> {code}
> import java.io.IOException;
> import org.apache.hadoop.fs.FSDataOutputStream;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.mapred.FileInputFormat;
> import org.apache.hadoop.mapred.JobClient;
> import org.apache.hadoop.mapred.JobConf;
> import org.apache.hadoop.mapred.TextInputFormat;
> import org.apache.hadoop.mapred.lib.IdentityMapper;
> import org.apache.hadoop.mapred.lib.NullOutputFormat;
> @SuppressWarnings("deprecation")
> public class IntegerOverflow {
>       /**
>        * @param args
>        * @throws IOException 
>        */
>       @SuppressWarnings("deprecation")
>       public static void main(String[] args) throws IOException {
>               JobConf conf = new JobConf();
>               
>               Path inputPath = new Path("ignore");
>               FileSystem fs = FileSystem.get(conf);
>               if (!fs.exists(inputPath)) {
>                       FSDataOutputStream out = fs.create(inputPath);
>                       out.writeChars("Test");
>                       out.close();
>               }
>               
>               conf.setInputFormat(TextInputFormat.class);
>               conf.setOutputFormat(NullOutputFormat.class);
>               FileInputFormat.addInputPath(conf, inputPath);
>               
>               conf.setMapperClass(IdentityMapper.class);
>               conf.setNumMapTasks(1);
>               // Problem inducing line follows.
>               conf.setMaxMapAttempts(Integer.MAX_VALUE);
>               
>               // No reducer in this test, although setMaxReduceAttempts
>               // leads to the same problem.
>               conf.setNumReduceTasks(0);
>               
>               JobClient.runJob(conf);
>       }
> }
> {code}
> The above code will not let any map task run. Additionally, a warning is 
> written to the JobTracker logs whose negative "limit" value clearly shows 
> the overflow:
> {code}
> 2010-12-30 00:59:07,836 WARN org.apache.hadoop.mapred.TaskInProgress: Exceeded limit of -2147483648 (plus 0 killed) attempts for the tip 'task_201012300058_0001_m_000000'
> {code}
> The issue lies inside the TaskInProgress class 
> (/o/a/h/mapred/TaskInProgress.java), at line 1018 (trunk), part of the 
> getTaskToRun(String taskTracker) method.
> {code}
>   public Task getTaskToRun(String taskTracker) throws IOException {   
>     // Create the 'taskid'; do not count the 'killed' tasks against the job!
>     TaskAttemptID taskid = null;
>     /* ============ THIS LINE v ====================================== */
>     if (nextTaskId < (MAX_TASK_EXECS + maxTaskAttempts + numKilledTasks)) {
>     /* ============ THIS LINE ^ ====================================== */
>       // Make sure that the attempts are unique across restarts
>       int attemptId = job.getNumRestarts() * NUM_ATTEMPTS_PER_RESTART + nextTaskId;
>       taskid = new TaskAttemptID( id, attemptId);
>       ++nextTaskId;
>     } else {
>       LOG.warn("Exceeded limit of " + (MAX_TASK_EXECS + maxTaskAttempts) +
>               " (plus " + numKilledTasks + " killed)"  + 
>               " attempts for the tip '" + getTIPId() + "'");
>       return null;
>     }
> {code}
> Since all three operands in that addition are of type int, configuring one 
> of them as Integer.MAX_VALUE makes the sum wrap around to a negative value, 
> so the condition can never be true: the method logs the warning above and 
> returns null on every call.
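> A quick way to see the wraparound (a standalone sketch; the literal 1 stands 
> in for MAX_TASK_EXECS, which is 1 in trunk, and the variable names are 
> illustrative):
> {code}
> public class OverflowDemo {
>   public static void main(String[] args) {
>     int maxTaskExecs = 1;                    // stands in for MAX_TASK_EXECS
>     int maxTaskAttempts = Integer.MAX_VALUE; // the configured attempt limit
>     int numKilledTasks = 0;
>     // int addition wraps: 1 + 2147483647 + 0 == -2147483648, the exact
>     // "limit" printed in the JobTracker warning above.
>     System.out.println(maxTaskExecs + maxTaskAttempts + numKilledTasks);
>   }
> }
> {code}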
> One solution would be to widen one of these operands to a long, so the 
> addition does not overflow?
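> A minimal sketch of that approach, assuming the surrounding fields keep 
> their current int types and only the comparison is widened (the cast 
> placement is illustrative, not the committed patch):
> {code}
>     // Casting the first operand to long promotes the whole sum to long,
>     // so a maxTaskAttempts of Integer.MAX_VALUE no longer wraps the result.
>     if (nextTaskId < ((long) MAX_TASK_EXECS + maxTaskAttempts + numKilledTasks)) {
> {code}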

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
