[jira] [Commented] (MAPREDUCE-3902) MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15412086#comment-15412086 ]

Devaraj K commented on MAPREDUCE-3902:
--------------------------------------

It has been a pending item for quite some time, I would like to take this feature forward. I will create a cloned JIRA with the details doc for this to work on and also to avoid the collision with the existing branch MR-3902.

> MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc.
> --------------------------------------
>
> Key: MAPREDUCE-3902
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3902
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: applicationmaster, mrv2
> Reporter: Arun C Murthy
> Assignee: Kannan Rajah
> Attachments: AMContainerRefactorNotes.pdf, AM_ContainerRefactor.pdf, MAPREDUCE-3902.2.patch, MAPREDUCE-3902.patch
>
> The MR AM is now in a great position to reuse containers across (map) tasks.
> This is something similar to JVM re-use we had in 0.20.x, but in a significantly better manner:
> # Consider data-locality when re-using containers
> # Consider the new shuffle - ensure that reduces fetch output of the whole container at once (i.e. all maps) : MAPREDUCE-4525

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
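The first point in the issue description, considering data-locality when re-using containers, could look roughly like the following. This is only a sketch under stated assumptions: `LocalityReuseSketch`, `IdleContainer` and `pick` are hypothetical names that do not come from the MR AM code base, and a real scheduler would probably also weigh rack-locality and how long it is worth waiting for a local container.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.Set;

class LocalityReuseSketch {
    /** Hypothetical record of an idle container and the host it runs on. */
    static final class IdleContainer {
        final String id;
        final String host;

        IdleContainer(String id, String host) {
            this.id = id;
            this.host = host;
        }
    }

    /**
     * Prefer an idle container on one of the task's preferred hosts;
     * otherwise fall back to the first idle container, if any.
     */
    static Optional<IdleContainer> pick(List<IdleContainer> idle, Set<String> preferredHosts) {
        for (IdleContainer c : idle) {
            if (preferredHosts.contains(c.host)) {
                return Optional.of(c);
            }
        }
        if (idle.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(idle.get(0));
    }

    public static void main(String[] args) {
        List<IdleContainer> idle = Arrays.asList(
                new IdleContainer("container_1", "hostA"),
                new IdleContainer("container_2", "hostB"));
        // A map task whose split lives on hostB gets the container on hostB.
        System.out.println(pick(idle, Collections.singleton("hostB")).get().id);
        // No local container available: fall back to any idle container.
        System.out.println(pick(idle, Collections.singleton("hostC")).get().id);
    }
}
```

The fallback to a non-local container is a design choice of this sketch; whether to reuse a non-local container or request a fresh local one is exactly the kind of trade-off the issue leaves open.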
[jira] [Commented] (MAPREDUCE-3902) MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14233932#comment-14233932 ]

Rohith commented on MAPREDUCE-3902:
-----------------------------------

I will update at the earliest.
[jira] [Commented] (MAPREDUCE-3902) MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14233492#comment-14233492 ]

Kannan Rajah commented on MAPREDUCE-3902:
-----------------------------------------

OK. I am going to start looking at the code changes given in this patch to understand the workflow better. I don't see a design doc per se. If there is one, that would be ideal. Let's also wait for Rohith to get back with his design spec.

> MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc.
> --------------------------------------
>
> Key: MAPREDUCE-3902
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3902
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: applicationmaster, mrv2
> Reporter: Arun C Murthy
> Assignee: Siddharth Seth
> Attachments: AMContainerRefactorNotes.pdf, AM_ContainerRefactor.pdf, MAPREDUCE-3902.2.patch, MAPREDUCE-3902.patch
>
> The MR AM is now in a great position to reuse containers across (map) tasks.
> This is something similar to JVM re-use we had in 0.20.x, but in a significantly better manner:
> # Consider data-locality when re-using containers
> # Consider the new shuffle - ensure that reduces fetch output of the whole container at once (i.e. all maps) : MAPREDUCE-4525
[jira] [Commented] (MAPREDUCE-3902) MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14233366#comment-14233366 ]

Tsuyoshi OZAWA commented on MAPREDUCE-3902:
-------------------------------------------

I think this JIRA has simply gone stale over time. I can help you if you're planning to do this.

[~rohithsharma], is your design doc different from Sid's? We may need to deal with AM restart.

cc: [~sseth] Did you stop this work because you found a design issue or something similar? If so, could you share the details with us?
[jira] [Commented] (MAPREDUCE-3902) MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14233212#comment-14233212 ]

Kannan Rajah commented on MAPREDUCE-3902:
-----------------------------------------

[~rohithsharma] Can you share the design spec and patch?
[jira] [Commented] (MAPREDUCE-3902) MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14232635#comment-14232635 ]

Rohith commented on MAPREDUCE-3902:
-----------------------------------

I wonder why this JIRA has been stale for such a long time, and would like to know the reason. I personally think this feature would be helpful in terms of container allocation latency. We have done some analysis and implemented support for JVM reuse on branch-2 without breaking existing AM functionality. We would be ready to share a prototype patch along with a design doc.
[jira] [Commented] (MAPREDUCE-3902) MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14232259#comment-14232259 ]

Kannan Rajah commented on MAPREDUCE-3902:
-----------------------------------------

Thanks [~ozawa]. Do you know why this work has been stopped since Sep 2012? I want to understand whether any other design is already in progress to address this problem. If so, I would like to contribute to it. For example, there was a post by [~seth.siddha...@gmail.com] on Tez in 2013 about how it tries to solve the container reuse problem: http://hortonworks.com/blog/re-using-containers-in-apache-tez/
[jira] [Commented] (MAPREDUCE-3902) MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14232243#comment-14232243 ]

Tsuyoshi OZAWA commented on MAPREDUCE-3902:
-------------------------------------------

[~rkannan82] thanks for pinging me. I think the work in this ticket was being done in a branch (MAPREDUCE-3902). It is based on old trunk code and is now difficult to rebase. [~sseth], what do you think?
[jira] [Commented] (MAPREDUCE-3902) MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14232232#comment-14232232 ]

Kannan Rajah commented on MAPREDUCE-3902:
-----------------------------------------

[~ozawa] Would like to check with you also since you are the assignee for some child JIRAs.
[jira] [Commented] (MAPREDUCE-3902) MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14232081#comment-14232081 ]

Kannan Rajah commented on MAPREDUCE-3902:
-----------------------------------------

[~sseth] I am going to look at JVM reuse in YARN. I came across this JIRA and see there has not been any update in a long time. Can you please provide an update?
[jira] [Commented] (MAPREDUCE-3902) MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13447943#comment-13447943 ]

Siddharth Seth commented on MAPREDUCE-3902:
-------------------------------------------

Thanks for the help with this JIRA.

bq. because MRAppMaster in the container-reuse implementation has a feature to monitor whether the running task on a container is "the last task at a machine or not", for the purpose of exiting JVMs on containers, as you know.

That will definitely be simpler to achieve with the container-reuse AM, with nodes already tracking container information. The last task on a node can be figured out relatively easily by the scheduler. It is, however, also possible with the current AM, and several bits, like the decision on when to run the combiner, should be a straightforward port to the reuse-AM. In any case, it'll be good to get the re-use AM into trunk fast. Looking forward to the updates on MAPREDUCE-4502 and MAPREDUCE-4525.

> MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc.
> --------------------------------------
>
> Key: MAPREDUCE-3902
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3902
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: applicationmaster, mrv2
> Reporter: Arun C Murthy
> Assignee: Siddharth Seth
> Attachments: MAPREDUCE-3902.2.patch, MAPREDUCE-3902.patch
>
> The MR AM is now in a great position to reuse containers across (map) tasks.
> This is something similar to JVM re-use we had in 0.20.x, but in a significantly better manner:
> # Consider data-locality when re-using containers
> # Consider the new shuffle - ensure that reduces fetch output of the whole container at once (i.e. all maps) : MAPREDUCE-4525

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-3902) MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13447292#comment-13447292 ]

Tsuyoshi OZAWA commented on MAPREDUCE-3902:
-------------------------------------------

Thanks for enumerating the remaining tasks, Siddharth. I'll support you as far as possible.

I have not yet explained the relationship between the container-reuse work and MAPREDUCE-4502, which may have been confusing. I'm sorry for the lack of explanation; here it is briefly.

I'm planning to implement MAPREDUCE-4502 and MAPREDUCE-4525 on top of the container-reuse implementation, because MRAppMaster in the container-reuse implementation has a feature to monitor whether the running task on a container is "the last task at a machine or not", for the purpose of exiting JVMs on containers, as you know. This feature is very similar to monitoring task progress per container for the purpose of starting the combiner for multi-level aggregation (MAPREDUCE-4502 and MAPREDUCE-4525).

This is not documented anywhere yet, so I'll write down my thoughts as a design note for MAPREDUCE-4502 and MAPREDUCE-4525 within the next week. I would appreciate it if you could review it.

Thanks,
Tsuyoshi
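The "last task at a machine" check discussed in this comment can be illustrated with a per-node counter. The sketch below is an assumption-laden illustration, not the MRAppMaster logic: `NodeTaskTrackerSketch`, `taskScheduled` and `taskFinished` are hypothetical names, and a real AM would also have to handle failed attempts, speculative attempts and tasks that are still unscheduled.

```java
import java.util.HashMap;
import java.util.Map;

class NodeTaskTrackerSketch {
    // Number of scheduled-but-unfinished tasks per node (hypothetical bookkeeping).
    private final Map<String, Integer> pendingPerNode = new HashMap<>();

    /** Record that a task has been scheduled on the given node. */
    void taskScheduled(String node) {
        pendingPerNode.merge(node, 1, Integer::sum);
    }

    /**
     * Record that a task finished on the given node.
     * Returns true when it was the last tracked task on that node, which is
     * the point at which a JVM could exit or a per-node combiner could start.
     */
    boolean taskFinished(String node) {
        Integer remaining = pendingPerNode.get(node);
        if (remaining == null) {
            throw new IllegalStateException("No tasks tracked on " + node);
        }
        if (remaining == 1) {
            pendingPerNode.remove(node);
            return true;
        }
        pendingPerNode.put(node, remaining - 1);
        return false;
    }

    public static void main(String[] args) {
        NodeTaskTrackerSketch tracker = new NodeTaskTrackerSketch();
        tracker.taskScheduled("hostA");
        tracker.taskScheduled("hostA");
        System.out.println(tracker.taskFinished("hostA")); // one task still pending
        System.out.println(tracker.taskFinished("hostA")); // last task on hostA
    }
}
```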
[jira] [Commented] (MAPREDUCE-3902) MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13446355#comment-13446355 ]

Siddharth Seth commented on MAPREDUCE-3902:
-------------------------------------------

You're right. There are a lot of patches that will need to go in. I am creating some of the sub-tasks that will be required before this can be considered for a merge back to trunk. I believe this will take several more weeks. MAPREDUCE-4502 doesn't necessarily need to be blocked on this, if that's something you're waiting to work on.
[jira] [Commented] (MAPREDUCE-3902) MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445786#comment-13445786 ]

Tsuyoshi OZAWA commented on MAPREDUCE-3902:
-------------------------------------------

[~sseth], I think many patches will be needed to deal with this ticket. If you have any opinion about how to advance this ticket, please let me know. My concerns are your review cost and whether my priorities are set correctly.
[jira] [Commented] (MAPREDUCE-3902) MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13437662#comment-13437662 ]

Tsuyoshi OZAWA commented on MAPREDUCE-3902:
-------------------------------------------

bq. I think a pull request against GitHub for now, and for bigger / more significant changes, separate subtasks under this JIRA.

OK.

bq. I'd like to create a separate branch for this JIRA, pull in the current set of changes with some cleanup, and then continue development. I will create a branch later this week unless anyone objects.

All right, I agree with the idea of creating a new branch for this JIRA; it will make the changes much easier to trace. I have also sent a pull request on GitHub: https://github.com/sidseth/h2-container-reuse/pull/1. Please check it out.
[jira] [Commented] (MAPREDUCE-3902) MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13436358#comment-13436358 ]

Siddharth Seth commented on MAPREDUCE-3902:
-------------------------------------------

bq. If I create some patches (e.g. fixing TODOs), should I send a pull request against your GitHub repository or attach a patch here?

I think a pull request against GitHub for now, and for bigger / more significant changes, separate subtasks under this JIRA.

bq. Do you think that it's needed to separate hadoop-mapreduce-client-app from hadoop-mapreduce-client-app2? Your prototype is under hadoop-mapreduce-client-app2 currently. This makes it difficult to rebase your code on trunk.

The intention was to be able to run the existing code as well as the modified code in the same install, with a simple config change to choose between the implementations. That makes side-by-side comparisons much easier. Once this implementation stabilizes, it can be moved back to mapreduce-client-app to replace the current implementation.

Also, there are some pretty big changes to TaskAttempt, the AM scheduling classes, etc. Given this, I'm not sure how useful a merge from trunk would be. This will have some overhead though: pulling in / factoring in JIRAs which have been fixed after the branch.

With mapreduce-client-app2 being a separate module, development could continue in the main branches. However, given that this implementation is not stable, I'd like to create a separate branch for this JIRA, pull in the current set of changes with some cleanup, and then continue development. I will create a branch later this week unless anyone objects.
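The "simple config change" for choosing between the two AM implementations could work along the following lines. This is a minimal sketch, assuming a plain `java.util.Properties` object in place of Hadoop's configuration classes; the property name `example.mr.am.impl`, its values and the `AppMaster` interface are hypothetical and are not taken from the prototype.

```java
import java.util.Properties;

class AmSelectorSketch {
    /** Hypothetical property name; not a real Hadoop configuration key. */
    static final String IMPL_KEY = "example.mr.am.impl";

    /** Hypothetical stand-in for an application master implementation. */
    interface AppMaster {
        String name();
    }

    /** Pick the implementation named in the configuration, defaulting to the existing one. */
    static AppMaster select(Properties conf) {
        String impl = conf.getProperty(IMPL_KEY, "app");
        switch (impl) {
            case "app":
                return () -> "hadoop-mapreduce-client-app";
            case "app2":
                return () -> "hadoop-mapreduce-client-app2";
            default:
                throw new IllegalArgumentException("Unknown AM implementation: " + impl);
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // Default: the existing AM.
        System.out.println(select(conf).name());
        // One property flips the same install over to the container-reuse AM.
        conf.setProperty(IMPL_KEY, "app2");
        System.out.println(select(conf).name());
    }
}
```

Keeping both modules in one install and switching with a single property is what makes the side-by-side comparison cheap: the same job can be submitted twice with only that one setting changed.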
[jira] [Commented] (MAPREDUCE-3902) MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435807#comment-13435807 ]

Tsuyoshi OZAWA commented on MAPREDUCE-3902:
-------------------------------------------

@Siddharth, I have two questions, although my work is still in progress.
# If I create some patches (e.g. fixing TODOs), should I send a pull request against your GitHub repository or attach a patch here?
# Do you think that it's needed to separate hadoop-mapreduce-client-app from hadoop-mapreduce-client-app2? Your prototype is under hadoop-mapreduce-client-app2 currently. This makes it difficult to rebase your code on trunk.
[jira] [Commented] (MAPREDUCE-3902) MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13430927#comment-13430927 ]

Tsuyoshi OZAWA commented on MAPREDUCE-3902:
-------------------------------------------

@Siddharth, thank you for sharing your progress and the design you have in mind. I'm going to fix the TODOs in your code on GitHub. If you have any further ideas about the design, please write them down here.
[jira] [Commented] (MAPREDUCE-3902) MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428412#comment-13428412 ]

Siddharth Seth commented on MAPREDUCE-3902:
---

@Tsuyoshi; I'd spoken with Vinod and others about this a while ago, and should have posted this earlier.

Adding the functionality to the AM in its current state is possible, but it would further complicate components that are already quite complicated and tough to change. The TaskAttempt state machine is currently a mix of TaskAttempt transitions and Container transitions. The RMContainerAllocator also deals with more than it should: Nodes, Containers and scheduling.

The idea was to split the functionality into separate TaskAttempt, Container and Node state machines, along with reduced functionality in the scheduler (also decoupling the RM request from AM scheduling). This would make the code cleaner and make re-use (as well as other improvements, like handling retired nodes) easier to implement.

I had worked with Vinod on the state transitions, and have been working on the implementation in bits and pieces to see how feasible it is. The code is at https://github.com/sidseth/h2-container-reuse . It's a bit of a mess at the moment, with lots of TODOs etc. scattered all over, but it is just about functional. There's no explicit re-use scheduling yet, but re-use can be tested by running a job which requires more containers than are available on the cluster (plus some config changes).

bq. the 2nd topic (combining per container) should be moved, because the change seems to be too big.

I believe this was, at least initially, meant to ensure that output from all taskAttempts in one container would be fetched only once by a reducer (without a common combiner). Either way, that could be a separate jira.
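The split described in the comment above can be illustrated with a minimal sketch. It assumes invented, heavily simplified states (`AttemptState`, `ContainerState`) and class names; none of them are taken from the MR AM or from the prototype repository, and the Node state machine is left out. The point is only that once a container tracks its own lifecycle, a BUSY -> IDLE transition lets the same container serve several task attempts.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model; the real AM state machines are far richer.
class SplitStateMachines {
    enum AttemptState { NEW, RUNNING, SUCCEEDED }
    enum ContainerState { ALLOCATED, IDLE, BUSY, STOPPED }

    // Legal container transitions. BUSY -> IDLE is what makes re-use possible.
    static final Map<ContainerState, Set<ContainerState>> CONTAINER_MOVES =
            new EnumMap<>(ContainerState.class);
    static {
        CONTAINER_MOVES.put(ContainerState.ALLOCATED,
                EnumSet.of(ContainerState.IDLE, ContainerState.STOPPED));
        CONTAINER_MOVES.put(ContainerState.IDLE,
                EnumSet.of(ContainerState.BUSY, ContainerState.STOPPED));
        CONTAINER_MOVES.put(ContainerState.BUSY,
                EnumSet.of(ContainerState.IDLE, ContainerState.STOPPED));
        CONTAINER_MOVES.put(ContainerState.STOPPED,
                EnumSet.noneOf(ContainerState.class));
    }

    // Container lifecycle, independent of any single task attempt.
    static final class Container {
        private ContainerState state = ContainerState.ALLOCATED;
        private int attemptsRun = 0;

        ContainerState state() { return state; }
        int attemptsRun() { return attemptsRun; }
        void recordCompletedAttempt() { attemptsRun++; }

        void moveTo(ContainerState next) {
            if (!CONTAINER_MOVES.get(state).contains(next)) {
                throw new IllegalStateException(state + " -> " + next);
            }
            state = next;
        }
    }

    // The attempt tracks only its own lifecycle and delegates container
    // bookkeeping to the Container object.
    static final class TaskAttempt {
        private AttemptState state = AttemptState.NEW;

        AttemptState state() { return state; }

        void runOn(Container c) {
            c.moveTo(ContainerState.BUSY);   // fails unless the container is IDLE
            state = AttemptState.RUNNING;
        }

        void succeed(Container c) {
            state = AttemptState.SUCCEEDED;
            c.recordCompletedAttempt();
            c.moveTo(ContainerState.IDLE);   // container is available again
        }
    }

    public static void main(String[] args) {
        Container container = new Container();
        container.moveTo(ContainerState.IDLE);   // container launched
        for (int i = 0; i < 3; i++) {
            TaskAttempt attempt = new TaskAttempt();
            attempt.runOn(container);
            attempt.succeed(container);
        }
        System.out.println(container.attemptsRun() + " attempts ran in one container");
    }
}
```

In this toy model a scheduler would only need to look for IDLE containers; how the actual prototype assigns work to containers is not described in the thread.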
[jira] [Commented] (MAPREDUCE-3902) MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428212#comment-13428212 ]

Tsuyoshi OZAWA commented on MAPREDUCE-3902:
---

s/should be moved/should be moved to the new ticket/
[jira] [Commented] (MAPREDUCE-3902) MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428211#comment-13428211 ]

Tsuyoshi OZAWA commented on MAPREDUCE-3902:
---

IMHO, the 2nd topic (combining per container) should be moved, because the change seems to be too big. If there are no objections, I'm going to create a new ticket to deal with the 2nd topic as a sub-task of MAPREDUCE-3902.
[jira] [Commented] (MAPREDUCE-3902) MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219552#comment-13219552 ]

Zhihong Yu commented on MAPREDUCE-3902:
---

{code}
+  private void makeContainerReuseDecision() {
+    targetMapContainers =
+        conf.getInt(MRJobConfig.MR_AM_CONTAINER_REUSE_MAX_CONTAINERS,
+            numMapTasks);
+  }
{code}
Is more logic going to be added to the above method?

{code}
+  //Key->Resource Capability
+  //Value->ResourceRequest
+  protected final Map> remoteRequestsTable =
-      new TreeMap>>();
+      new HashMap>();
{code}
The comment above doesn't seem to match the Map structure.

> MR AM should reuse containers for map tasks, there-by allowing fine-grained
> control on num-maps for users without need for CombineFileInputFormat etc.
> --
>
> Key: MAPREDUCE-3902
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3902
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: applicationmaster, mrv2
> Reporter: Arun C Murthy
> Assignee: Arun C Murthy
> Attachments: MAPREDUCE-3902.patch
>
>
> The MR AM is now in a great position to reuse containers across (map) tasks.
> This is something similar to JVM re-use we had in 0.20.x, but in a
> significantly better manner:
> # Consider data-locality when re-using containers
> # Consider the new shuffle - ensure that reduces fetch output of the whole
> container at once (i.e. all maps)
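As a rough illustration of the first snippet quoted in the review above, the sketch below reads a configured container target and falls back to the number of map tasks when nothing is set. A plain `Map<String, String>` stands in for Hadoop's `Configuration` so that the example is self-contained, and the key string is hypothetical (the patch shows only the constant's name). The clamping step is an assumption about what "more logic" might look like, not something the patch does.

```java
import java.util.HashMap;
import java.util.Map;

class ReuseDecisionSketch {
    // Hypothetical key string; the patch excerpt shows only the constant's name.
    static final String REUSE_MAX_CONTAINERS =
            "example.am.container-reuse.max-containers";

    // Stand-in for Configuration.getInt(name, defaultValue).
    static int getInt(Map<String, String> conf, String key, int defaultValue) {
        String raw = conf.get(key);
        return raw == null ? defaultValue : Integer.parseInt(raw.trim());
    }

    // Default to one container per map task, as in the quoted patch.
    static int targetMapContainers(Map<String, String> conf, int numMapTasks) {
        int configured = getInt(conf, REUSE_MAX_CONTAINERS, numMapTasks);
        // Assumed extra logic (not in the patch): ignore non-positive values
        // and never ask for more containers than there are map tasks.
        if (configured <= 0) {
            return numMapTasks;
        }
        return Math.min(configured, numMapTasks);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println("unset:  " + targetMapContainers(conf, 100));
        conf.put(REUSE_MAX_CONTAINERS, "10");
        System.out.println("capped: " + targetMapContainers(conf, 100));
    }
}
```

With the target below the number of maps, each container would have to run several maps in turn, which is the "fine-grained control on num-maps" the issue title refers to.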
[jira] [Commented] (MAPREDUCE-3902) MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13218825#comment-13218825 ]

Kang Xiao commented on MAPREDUCE-3902:
---

Container reuse will also help the RM scale, since it reduces the RM's scheduling load.
[jira] [Commented] (MAPREDUCE-3902) MR AM should reuse containers for map tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214819#comment-13214819 ]

Jay Finger commented on MAPREDUCE-3902:
---

I haven't read the patch, so forgive me if the answer is already there. Is there a cap on the amount of re-use? For example, if the container has been in use for more than 1 minute, then do not re-use it. Or, to rephrase: what prevents a cluster with a few large jobs from having hogged containers?
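The kind of cap asked about above could look like the following minimal sketch. It is purely illustrative: the thread does not say whether the patch has such a limit, and the class, method and parameter names here are invented. A container that has been held longer than the limit is simply not handed another task, so it is released and other jobs get a chance at the slot.

```java
import java.time.Duration;

// Hypothetical policy; not taken from the patch.
class ReuseCapSketch {
    // A container may be re-used only while its total hold time is within the cap.
    static boolean mayReuse(Duration heldFor, Duration maxHold) {
        return heldFor.compareTo(maxHold) <= 0;
    }

    public static void main(String[] args) {
        Duration cap = Duration.ofMinutes(1);   // the 1-minute example from the question
        System.out.println("held 30s: " + mayReuse(Duration.ofSeconds(30), cap));
        System.out.println("held 90s: " + mayReuse(Duration.ofSeconds(90), cap));
    }
}
```

A time-based cap bounds how long one job can sit on a container regardless of task length; a count-based cap (a maximum number of tasks per container) would be an alternative with different fairness behaviour.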