[ https://issues.apache.org/jira/browse/MAPREDUCE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907872#action_12907872 ]
Xing Shi commented on MAPREDUCE-1434:
-------------------------------------

Yes, it is the former. But should the committers review it first, or after I submit it?

> Dynamic add input for one job
> -----------------------------
>
>                 Key: MAPREDUCE-1434
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1434
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>    Affects Versions: 0.20.3
>            Reporter: Xing Shi
>             Fix For: 0.20.3
>
>         Attachments: dynamic_input-v1.patch
>
>
> We always have to upload the data to HDFS first; only then can we analyze it
> with Hadoop MapReduce.
> Sometimes the upload takes a long time, so if we could add input while a job
> is running, that time could be saved.
> WHAT?
> Client:
> a) hadoop job -add-input jobId inputFormat ...
>    Add the input to the job jobId.
> b) hadoop job -add-input done
>    Tell the JobTracker that the input is complete.
> c) hadoop job -add-input status jobid
>    Show how many inputs the job has.
> HOWTO?
> Mainly, I think we need to do three things:
> 1. JobClient: JobClient should support adding input to a running job; in fact,
>    JobClient generates the splits and submits them to the JobTracker.
> 2. JobTracker: the JobTracker supports addInput and appends the new tasks to
>    the original mapTasks. Because the uploaded data will be processed quickly,
>    the scheduler should also be updated to keep a map task pending until the
>    client tells the job that its input is done.
> 3. Reducer: the reducer should also update mapNums, so that it shuffles
>    correctly.
> This is the rough idea, and I will update it.
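To make the JobTracker and Reducer points of the HOWTO more concrete, here is a minimal Java sketch, assuming a single in-memory job object. DynamicInputJob, markInputDone, numInputs, obtainNewMapTask, mapCompleted, mapPhaseComplete and numMapsForShuffle are hypothetical names; they are not taken from dynamic_input-v1.patch or from Hadoop's JobInProgress. The sketch only shows the two invariants the proposal relies on: new splits are appended to the pending map tasks, and the map phase (and therefore the reducers' shuffle) cannot be treated as finished until the client has declared the input done.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Hypothetical model of a job whose input can grow while it runs.
 * Not Hadoop's JobInProgress; all names here are illustrative only.
 */
final class DynamicInputJob {
    private final List<String> allSplits = new ArrayList<>();      // every split added so far
    private final Deque<String> pendingSplits = new ArrayDeque<>(); // splits not yet handed out
    private int completedMaps = 0;
    private boolean inputDone = false;

    /** Like "hadoop job -add-input": append new splits as extra map tasks. */
    synchronized void addInput(List<String> splits) {
        if (inputDone) {
            throw new IllegalStateException("input already marked done");
        }
        allSplits.addAll(splits);
        pendingSplits.addAll(splits);
    }

    /** Like "-add-input done": no more input will arrive for this job. */
    synchronized void markInputDone() {
        inputDone = true;
    }

    /** Like "-add-input status": number of splits added so far. */
    synchronized int numInputs() {
        return allSplits.size();
    }

    /** Scheduler hook: next pending split, or null if none is queued right now. */
    synchronized String obtainNewMapTask() {
        return pendingSplits.poll();
    }

    /** Record that one previously handed-out map task has finished. */
    synchronized void mapCompleted() {
        completedMaps++;
    }

    /**
     * The map phase is complete only when the client has declared the input
     * done AND every map added so far has finished; otherwise the job waits.
     */
    synchronized boolean mapPhaseComplete() {
        return inputDone && completedMaps == allSplits.size();
    }

    /** Map count the reducers should shuffle from; final only after markInputDone(). */
    synchronized int numMapsForShuffle() {
        return allSplits.size();
    }
}
```

In this sketch, a job whose maps drain the queue before more input has been uploaded simply waits: obtainNewMapTask returns null, but mapPhaseComplete stays false until markInputDone is called, and numMapsForShuffle grows with each addInput so the reducers can track the changing number of maps.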