[jira] [Commented] (MAPREDUCE-3902) MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc.

2016-08-08 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15412086#comment-15412086
 ] 

Devaraj K commented on MAPREDUCE-3902:
--

This has been pending for quite some time, and I would like to take the feature 
forward. I will create a cloned JIRA with a detailed design doc to work on, which 
also avoids colliding with the existing MR-3902 branch.

> MR AM should reuse containers for map tasks, there-by allowing fine-grained 
> control on num-maps for users without need for CombineFileInputFormat etc.
> --
>
> Key: MAPREDUCE-3902
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3902
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: applicationmaster, mrv2
>Reporter: Arun C Murthy
>Assignee: Kannan Rajah
> Attachments: AMContainerRefactorNotes.pdf, AM_ContainerRefactor.pdf, 
> MAPREDUCE-3902.2.patch, MAPREDUCE-3902.patch
>
>
> The MR AM is now in a great position to reuse containers across (map) tasks. 
> This is something similar to JVM re-use we had in 0.20.x, but in a 
> significantly better manner:
> # Consider data-locality when re-using containers
> # Consider the new shuffle - ensure that reduces fetch output of the whole 
> container at once (i.e. all maps)  : MAPREDUCE-4525 
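The first point in the description (consider data locality when reusing containers) can be illustrated with a minimal sketch. It assumes the AM keeps a list of pending map tasks, each annotated with the hosts and racks that hold its input split. When a container finishes a task, the AM prefers a node-local map, then a rack-local one, then any pending map, and releases the container when nothing is left. `ReuseScheduler`, `PendingMap` and `pickTaskFor` are hypothetical names, not classes from the MR AM or from the patches attached to this JIRA.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical sketch: choose the next map task for a container that just finished one. */
public class ReuseScheduler {

    /** A pending map task plus the hosts and racks holding replicas of its input split. */
    public record PendingMap(String id, Set<String> hosts, Set<String> racks) {}

    private final List<PendingMap> pending = new ArrayList<>();

    public void add(PendingMap task) {
        pending.add(task);
    }

    /**
     * Removes and returns the best pending map for a container on host/rack:
     * node-local first, then rack-local, then any pending map. Returns null when
     * nothing is pending, i.e. the container should be released.
     */
    public PendingMap pickTaskFor(String host, String rack) {
        PendingMap chosen = findFirst(t -> t.hosts().contains(host));
        if (chosen == null) {
            chosen = findFirst(t -> t.racks().contains(rack));
        }
        if (chosen == null && !pending.isEmpty()) {
            chosen = pending.get(0);
        }
        if (chosen != null) {
            pending.remove(chosen);
        }
        return chosen;
    }

    private PendingMap findFirst(Predicate<PendingMap> matches) {
        for (PendingMap t : pending) {
            if (matches.test(t)) {
                return t;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ReuseScheduler s = new ReuseScheduler();
        s.add(new PendingMap("m1", Set.of("hostB"), Set.of("rack2")));
        s.add(new PendingMap("m2", Set.of("hostC"), Set.of("rack1")));
        s.add(new PendingMap("m3", Set.of("hostA"), Set.of("rack1")));
        // A container on hostA/rack1 frees up four times in a row.
        System.out.println(s.pickTaskFor("hostA", "rack1").id()); // m3 (node-local)
        System.out.println(s.pickTaskFor("hostA", "rack1").id()); // m2 (rack-local)
        System.out.println(s.pickTaskFor("hostA", "rack1").id()); // m1 (off-rack)
        System.out.println(s.pickTaskFor("hostA", "rack1"));      // null (release container)
    }
}
```

This sketch always reuses the container, even for an off-rack map; a real AM might instead prefer to release it and request a better-placed one for such tasks.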



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-3902) MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc.

2014-12-03 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14233932#comment-14233932
 ] 

Rohith commented on MAPREDUCE-3902:
---

I will update at the earliest.

> MR AM should reuse containers for map tasks, there-by allowing fine-grained 
> control on num-maps for users without need for CombineFileInputFormat etc.
> --
>
> Key: MAPREDUCE-3902
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3902
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: applicationmaster, mrv2
>Reporter: Arun C Murthy
>Assignee: Kannan Rajah
> Attachments: AMContainerRefactorNotes.pdf, AM_ContainerRefactor.pdf, 
> MAPREDUCE-3902.2.patch, MAPREDUCE-3902.patch
>
>
> The MR AM is now in a great position to reuse containers across (map) tasks. 
> This is something similar to JVM re-use we had in 0.20.x, but in a 
> significantly better manner:
> # Consider data-locality when re-using containers
> # Consider the new shuffle - ensure that reduces fetch output of the whole 
> container at once (i.e. all maps)  : MAPREDUCE-4525 





[jira] [Commented] (MAPREDUCE-3902) MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc.

2014-12-03 Thread Kannan Rajah (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14233492#comment-14233492
 ] 

Kannan Rajah commented on MAPREDUCE-3902:
-

OK. I am going to start looking at the code changes in this patch to 
understand the workflow better. I don't see a design doc per se; if there is 
one, that would be ideal. Let's also wait for Rohith to get back with his 
design spec.

> MR AM should reuse containers for map tasks, there-by allowing fine-grained 
> control on num-maps for users without need for CombineFileInputFormat etc.
> --
>
> Key: MAPREDUCE-3902
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3902
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: applicationmaster, mrv2
>Reporter: Arun C Murthy
>Assignee: Siddharth Seth
> Attachments: AMContainerRefactorNotes.pdf, AM_ContainerRefactor.pdf, 
> MAPREDUCE-3902.2.patch, MAPREDUCE-3902.patch
>
>
> The MR AM is now in a great position to reuse containers across (map) tasks. 
> This is something similar to JVM re-use we had in 0.20.x, but in a 
> significantly better manner:
> # Consider data-locality when re-using containers
> # Consider the new shuffle - ensure that reduces fetch output of the whole 
> container at once (i.e. all maps)  : MAPREDUCE-4525 





[jira] [Commented] (MAPREDUCE-3902) MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc.

2014-12-03 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14233366#comment-14233366
 ] 

Tsuyoshi OZAWA commented on MAPREDUCE-3902:
---

I think this JIRA has simply gone stale. I can help if you're planning to work 
on it.

[~rohithsharma], is your design doc different from Sid's? We may also need to 
deal with AM restart.

cc: [~sseth] Did you stop this work because you found a design issue or 
something similar? If so, could you share the details with us?

> MR AM should reuse containers for map tasks, there-by allowing fine-grained 
> control on num-maps for users without need for CombineFileInputFormat etc.
> --
>
> Key: MAPREDUCE-3902
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3902
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: applicationmaster, mrv2
>Reporter: Arun C Murthy
>Assignee: Siddharth Seth
> Attachments: AMContainerRefactorNotes.pdf, AM_ContainerRefactor.pdf, 
> MAPREDUCE-3902.2.patch, MAPREDUCE-3902.patch
>
>
> The MR AM is now in a great position to reuse containers across (map) tasks. 
> This is something similar to JVM re-use we had in 0.20.x, but in a 
> significantly better manner:
> # Consider data-locality when re-using containers
> # Consider the new shuffle - ensure that reduces fetch output of the whole 
> container at once (i.e. all maps)  : MAPREDUCE-4525 





[jira] [Commented] (MAPREDUCE-3902) MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc.

2014-12-03 Thread Kannan Rajah (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14233212#comment-14233212
 ] 

Kannan Rajah commented on MAPREDUCE-3902:
-

[~rohithsharma] Can you share the design spec and patch?

> MR AM should reuse containers for map tasks, there-by allowing fine-grained 
> control on num-maps for users without need for CombineFileInputFormat etc.
> --
>
> Key: MAPREDUCE-3902
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3902
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: applicationmaster, mrv2
>Reporter: Arun C Murthy
>Assignee: Siddharth Seth
> Attachments: AMContainerRefactorNotes.pdf, AM_ContainerRefactor.pdf, 
> MAPREDUCE-3902.2.patch, MAPREDUCE-3902.patch
>
>
> The MR AM is now in a great position to reuse containers across (map) tasks. 
> This is something similar to JVM re-use we had in 0.20.x, but in a 
> significantly better manner:
> # Consider data-locality when re-using containers
> # Consider the new shuffle - ensure that reduces fetch output of the whole 
> container at once (i.e. all maps)  : MAPREDUCE-4525 





[jira] [Commented] (MAPREDUCE-3902) MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc.

2014-12-02 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14232635#comment-14232635
 ] 

Rohith commented on MAPREDUCE-3902:
---

I wonder why this JIRA has been stale for so long and would like to know the 
reason. I personally think this feature would help reduce container allocation 
latency. We have done some analysis and implemented support for JVM reuse on 
branch-2 without breaking existing AM functionality. We are ready to share a 
prototype patch along with a design doc.
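The latency argument can be made concrete with a minimal sketch of what a reused container does: instead of exiting after one map, it keeps asking the AM for another task until none is left or a per-container cap is reached, so container launch and JVM startup are paid once for several maps. `TaskSource` is a hypothetical stand-in for whatever channel the AM would use to hand out tasks, and the cap only mirrors the spirit of the 0.20.x per-JVM task limit; none of these names come from Hadoop.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of a reused container that runs several map tasks in one JVM. */
public class ReusedContainer {

    /** Stand-in for the AM-to-container task channel (assumed, not a real Hadoop API). */
    public interface TaskSource {
        Optional<String> nextTaskFor(String containerId);
    }

    private final String containerId;
    private final int maxTasks; // <= 0 means no limit

    public ReusedContainer(String containerId, int maxTasks) {
        this.containerId = containerId;
        this.maxTasks = maxTasks;
    }

    /** Pulls tasks until the source is empty or the cap is hit; returns the task ids run. */
    public List<String> runLoop(TaskSource source) {
        List<String> ran = new ArrayList<>();
        while (maxTasks <= 0 || ran.size() < maxTasks) {
            Optional<String> next = source.nextTaskFor(containerId);
            if (next.isEmpty()) {
                break; // nothing more to do: exit so the container can be released
            }
            ran.add(next.get()); // a real container would execute the map task here
        }
        return ran;
    }

    public static void main(String[] args) {
        Deque<String> queue = new ArrayDeque<>(List.of("m1", "m2", "m3", "m4", "m5"));
        TaskSource source = id -> Optional.ofNullable(queue.poll());
        System.out.println(new ReusedContainer("c1", 3).runLoop(source)); // [m1, m2, m3]
        System.out.println(new ReusedContainer("c2", 0).runLoop(source)); // [m4, m5]
    }
}
```

Five maps here cost two container launches instead of five; the cap trades some of that saving for bounding how long any one JVM lives.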

> MR AM should reuse containers for map tasks, there-by allowing fine-grained 
> control on num-maps for users without need for CombineFileInputFormat etc.
> --
>
> Key: MAPREDUCE-3902
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3902
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: applicationmaster, mrv2
>Reporter: Arun C Murthy
>Assignee: Siddharth Seth
> Attachments: AMContainerRefactorNotes.pdf, AM_ContainerRefactor.pdf, 
> MAPREDUCE-3902.2.patch, MAPREDUCE-3902.patch
>
>
> The MR AM is now in a great position to reuse containers across (map) tasks. 
> This is something similar to JVM re-use we had in 0.20.x, but in a 
> significantly better manner:
> # Consider data-locality when re-using containers
> # Consider the new shuffle - ensure that reduces fetch output of the whole 
> container at once (i.e. all maps)  : MAPREDUCE-4525 





[jira] [Commented] (MAPREDUCE-3902) MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc.

2014-12-02 Thread Kannan Rajah (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14232259#comment-14232259
 ] 

Kannan Rajah commented on MAPREDUCE-3902:
-

Thanks [~ozawa]. Do you know why this work has been stopped since Sep 2012? I 
want to understand whether any other design is already in progress to address 
this problem. If so, I would like to contribute to it. For example, there was a 
2013 post by [~seth.siddha...@gmail.com] on how Tez tries to solve the 
container reuse problem: 
http://hortonworks.com/blog/re-using-containers-in-apache-tez/. 

> MR AM should reuse containers for map tasks, there-by allowing fine-grained 
> control on num-maps for users without need for CombineFileInputFormat etc.
> --
>
> Key: MAPREDUCE-3902
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3902
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: applicationmaster, mrv2
>Reporter: Arun C Murthy
>Assignee: Siddharth Seth
> Attachments: AMContainerRefactorNotes.pdf, AM_ContainerRefactor.pdf, 
> MAPREDUCE-3902.2.patch, MAPREDUCE-3902.patch
>
>
> The MR AM is now in a great position to reuse containers across (map) tasks. 
> This is something similar to JVM re-use we had in 0.20.x, but in a 
> significantly better manner:
> # Consider data-locality when re-using containers
> # Consider the new shuffle - ensure that reduces fetch output of the whole 
> container at once (i.e. all maps)  : MAPREDUCE-4525 





[jira] [Commented] (MAPREDUCE-3902) MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc.

2014-12-02 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14232243#comment-14232243
 ] 

Tsuyoshi OZAWA commented on MAPREDUCE-3902:
---

[~rkannan82] Thanks for pinging me. I think the work on this ticket was being 
done in a branch (MAPREDUCE-3902). It is based on old trunk code and is now 
difficult to rebase. [~sseth], what do you think?

> MR AM should reuse containers for map tasks, there-by allowing fine-grained 
> control on num-maps for users without need for CombineFileInputFormat etc.
> --
>
> Key: MAPREDUCE-3902
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3902
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: applicationmaster, mrv2
>Reporter: Arun C Murthy
>Assignee: Siddharth Seth
> Attachments: AMContainerRefactorNotes.pdf, AM_ContainerRefactor.pdf, 
> MAPREDUCE-3902.2.patch, MAPREDUCE-3902.patch
>
>
> The MR AM is now in a great position to reuse containers across (map) tasks. 
> This is something similar to JVM re-use we had in 0.20.x, but in a 
> significantly better manner:
> # Consider data-locality when re-using containers
> # Consider the new shuffle - ensure that reduces fetch output of the whole 
> container at once (i.e. all maps)  : MAPREDUCE-4525 





[jira] [Commented] (MAPREDUCE-3902) MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc.

2014-12-02 Thread Kannan Rajah (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14232232#comment-14232232
 ] 

Kannan Rajah commented on MAPREDUCE-3902:
-

[~ozawa] I would also like to check with you, since you are the assignee of 
some child JIRAs.

> MR AM should reuse containers for map tasks, there-by allowing fine-grained 
> control on num-maps for users without need for CombineFileInputFormat etc.
> --
>
> Key: MAPREDUCE-3902
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3902
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: applicationmaster, mrv2
>Reporter: Arun C Murthy
>Assignee: Siddharth Seth
> Attachments: AMContainerRefactorNotes.pdf, AM_ContainerRefactor.pdf, 
> MAPREDUCE-3902.2.patch, MAPREDUCE-3902.patch
>
>
> The MR AM is now in a great position to reuse containers across (map) tasks. 
> This is something similar to JVM re-use we had in 0.20.x, but in a 
> significantly better manner:
> # Consider data-locality when re-using containers
> # Consider the new shuffle - ensure that reduces fetch output of the whole 
> container at once (i.e. all maps)  : MAPREDUCE-4525 





[jira] [Commented] (MAPREDUCE-3902) MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc.

2014-12-02 Thread Kannan Rajah (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14232081#comment-14232081
 ] 

Kannan Rajah commented on MAPREDUCE-3902:
-

[~sseth] I am going to look at JVM reuse in YARN. I came across this JIRA and 
see that there has not been any update in a long time. Can you please provide 
an update?

> MR AM should reuse containers for map tasks, there-by allowing fine-grained 
> control on num-maps for users without need for CombineFileInputFormat etc.
> --
>
> Key: MAPREDUCE-3902
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3902
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: applicationmaster, mrv2
>Reporter: Arun C Murthy
>Assignee: Siddharth Seth
> Attachments: AMContainerRefactorNotes.pdf, AM_ContainerRefactor.pdf, 
> MAPREDUCE-3902.2.patch, MAPREDUCE-3902.patch
>
>
> The MR AM is now in a great position to reuse containers across (map) tasks. 
> This is something similar to JVM re-use we had in 0.20.x, but in a 
> significantly better manner:
> # Consider data-locality when re-using containers
> # Consider the new shuffle - ensure that reduces fetch output of the whole 
> container at once (i.e. all maps)  : MAPREDUCE-4525 





[jira] [Commented] (MAPREDUCE-3902) MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc.

2012-09-04 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13447943#comment-13447943
 ] 

Siddharth Seth commented on MAPREDUCE-3902:
---

Thanks for the help with this JIRA.
bq. because MRAppMaster in container-reuse implementation has the feature to 
monitor whether the running tasks on the containers are "the last task at a 
machine or not", for the purpose of exiting JVMs on containers, as you know.
That will definitely be simpler to achieve with the container-reuse AM, since 
nodes there already track container information; the scheduler can figure out 
the last task on a node relatively easily. It is, however, also possible with 
the current AM, and several pieces, such as the decision on when to run the 
combiner, should be a straightforward port to the reuse AM. In any case, it'll 
be good to get the reuse AM into trunk quickly. Looking forward to the updates 
on MAPREDUCE-4502 and MAPREDUCE-4525.
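The claim that the scheduler can easily tell when a node has run its last task can be sketched as simple bookkeeping. This minimal version assumes the scheduler counts pending (unassigned) maps for the job and running maps per node; a node has finished its last map once nothing is running there and nothing is left to assign. It deliberately ignores locality constraints, task failures and retries. `NodeTaskTracker` and its methods are hypothetical, not part of any AM implementation.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: detect when a node has finished its last map task of a job. */
public class NodeTaskTracker {

    private final Map<String, Integer> runningPerNode = new HashMap<>();
    private int pendingMaps;

    public NodeTaskTracker(int totalMaps) {
        this.pendingMaps = totalMaps;
    }

    /** Called when the scheduler assigns a pending map to a container on {@code node}. */
    public void assigned(String node) {
        if (pendingMaps == 0) {
            throw new IllegalStateException("no pending maps left to assign");
        }
        pendingMaps--;
        runningPerNode.merge(node, 1, Integer::sum);
    }

    /**
     * Called when a map on {@code node} completes. Returns true if that was the last map
     * this node will run for the job: nothing still running there and nothing left to assign.
     */
    public boolean completed(String node) {
        if (!runningPerNode.containsKey(node)) {
            throw new IllegalStateException("no running map on " + node);
        }
        int left = runningPerNode.merge(node, -1, Integer::sum);
        if (left == 0) {
            runningPerNode.remove(node);
        }
        return left == 0 && pendingMaps == 0;
    }

    public static void main(String[] args) {
        NodeTaskTracker t = new NodeTaskTracker(3);
        t.assigned("n1");
        t.assigned("n1");
        t.assigned("n2");
        System.out.println(t.completed("n1")); // false: another map still runs on n1
        System.out.println(t.completed("n2")); // true: n2 is done, so its JVMs could exit
        System.out.println(t.completed("n1")); // true
    }
}
```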


> MR AM should reuse containers for map tasks, there-by allowing fine-grained 
> control on num-maps for users without need for CombineFileInputFormat etc.
> --
>
> Key: MAPREDUCE-3902
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3902
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: applicationmaster, mrv2
>Reporter: Arun C Murthy
>Assignee: Siddharth Seth
> Attachments: MAPREDUCE-3902.2.patch, MAPREDUCE-3902.patch
>
>
> The MR AM is now in a great position to reuse containers across (map) tasks. 
> This is something similar to JVM re-use we had in 0.20.x, but in a 
> significantly better manner:
> # Consider data-locality when re-using containers
> # Consider the new shuffle - ensure that reduces fetch output of the whole 
> container at once (i.e. all maps)  : MAPREDUCE-4525 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-3902) MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc.

2012-09-03 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13447292#comment-13447292
 ] 

Tsuyoshi OZAWA commented on MAPREDUCE-3902:
---

Thanks for enumerating the remaining tasks, Siddharth. I'll support you as much 
as possible. 

I haven't yet explained the relationship between the container-reuse work and 
MAPREDUCE-4502, which may be confusing; I'm sorry for the lack of explanation. 
Briefly: I'm planning to implement MAPREDUCE-4502 and MAPREDUCE-4525 on top of 
the container-reuse implementation because, as you know, the MRAppMaster in 
that implementation can monitor whether the task running in a container is 
"the last task on a machine or not", so that it knows when to exit the JVMs in 
the containers. This is very similar to monitoring task progress per container 
in order to decide when to start running the combiner for multi-level 
aggregation (MAPREDUCE-4502 and MAPREDUCE-4525).

This is not documented yet, so I'll write down my thoughts as a design note for 
MAPREDUCE-4502 and MAPREDUCE-4525 within the next week. I'd very much 
appreciate it if you could review it.

Thanks,
Tsuyoshi
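The idea of tracking progress per container to drive multi-level aggregation can be sketched as follows, under stated assumptions: each map emits partial (key, count) pairs, the AM records which container ran each map, and once a container is known to get no more tasks, a sum combiner merges all of that container's map outputs into one result. A reducer would then fetch one output per container rather than one per map, which is the "fetch output of the whole container at once" point in the issue description. `ContainerCombiner` and its methods are hypothetical and this is not the MAPREDUCE-4502/4525 design.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of container-level aggregation of map outputs. */
public class ContainerCombiner {

    private final Map<String, List<Map<String, Long>>> outputsByContainer = new HashMap<>();

    /** Records the (key -> partial count) output of one map that ran in {@code containerId}. */
    public void mapCompleted(String containerId, Map<String, Long> mapOutput) {
        outputsByContainer.computeIfAbsent(containerId, c -> new ArrayList<>()).add(mapOutput);
    }

    /**
     * Called once the container has run its last task. Applies a sum combiner across all
     * of that container's map outputs and returns the merged, key-sorted result.
     */
    public Map<String, Long> lastTaskDone(String containerId) {
        Map<String, Long> combined = new TreeMap<>();
        List<Map<String, Long>> outputs = outputsByContainer.remove(containerId);
        if (outputs == null) {
            return combined; // no maps recorded for this container
        }
        for (Map<String, Long> out : outputs) {
            out.forEach((key, count) -> combined.merge(key, count, Long::sum));
        }
        return combined;
    }

    public static void main(String[] args) {
        ContainerCombiner cc = new ContainerCombiner();
        cc.mapCompleted("c1", Map.of("a", 2L, "b", 1L));
        cc.mapCompleted("c1", Map.of("a", 3L, "c", 4L));
        System.out.println(cc.lastTaskDone("c1")); // {a=5, b=1, c=4}
    }
}
```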

> MR AM should reuse containers for map tasks, there-by allowing fine-grained 
> control on num-maps for users without need for CombineFileInputFormat etc.
> --
>
> Key: MAPREDUCE-3902
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3902
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: applicationmaster, mrv2
>Reporter: Arun C Murthy
>Assignee: Siddharth Seth
> Attachments: MAPREDUCE-3902.2.patch, MAPREDUCE-3902.patch
>
>
> The MR AM is now in a great position to reuse containers across (map) tasks. 
> This is something similar to JVM re-use we had in 0.20.x, but in a 
> significantly better manner:
> # Consider data-locality when re-using containers
> # Consider the new shuffle - ensure that reduces fetch output of the whole 
> container at once (i.e. all maps)  : MAPREDUCE-4525 



[jira] [Commented] (MAPREDUCE-3902) MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc.

2012-08-31 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13446355#comment-13446355
 ] 

Siddharth Seth commented on MAPREDUCE-3902:
---

You're right. There are a lot of patches that will need to go in. I'm creating 
some of the sub-tasks required before this can be considered for a merge back 
to trunk; I believe this will take several more weeks. MAPREDUCE-4502 doesn't 
necessarily need to be blocked on this, if that's something you're waiting to 
work on.

> MR AM should reuse containers for map tasks, there-by allowing fine-grained 
> control on num-maps for users without need for CombineFileInputFormat etc.
> --
>
> Key: MAPREDUCE-3902
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3902
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: applicationmaster, mrv2
>Reporter: Arun C Murthy
>Assignee: Siddharth Seth
> Attachments: MAPREDUCE-3902.2.patch, MAPREDUCE-3902.patch
>
>
> The MR AM is now in a great position to reuse containers across (map) tasks. 
> This is something similar to JVM re-use we had in 0.20.x, but in a 
> significantly better manner:
> # Consider data-locality when re-using containers
> # Consider the new shuffle - ensure that reduces fetch output of the whole 
> container at once (i.e. all maps)  : MAPREDUCE-4525 



[jira] [Commented] (MAPREDUCE-3902) MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc.

2012-08-31 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445786#comment-13445786
 ] 

Tsuyoshi OZAWA commented on MAPREDUCE-3902:
---

[~sseth],

I think many patches will be needed to complete this ticket. If you have any 
opinion on how to move it forward, please let me know. My concerns are your 
review cost and whether I have set my priorities correctly.

> MR AM should reuse containers for map tasks, there-by allowing fine-grained 
> control on num-maps for users without need for CombineFileInputFormat etc.
> --
>
> Key: MAPREDUCE-3902
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3902
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: applicationmaster, mrv2
>Reporter: Arun C Murthy
>Assignee: Siddharth Seth
> Attachments: MAPREDUCE-3902.2.patch, MAPREDUCE-3902.patch
>
>
> The MR AM is now in a great position to reuse containers across (map) tasks. 
> This is something similar to JVM re-use we had in 0.20.x, but in a 
> significantly better manner:
> # Consider data-locality when re-using containers
> # Consider the new shuffle - ensure that reduces fetch output of the whole 
> container at once (i.e. all maps)  : MAPREDUCE-4525 



[jira] [Commented] (MAPREDUCE-3902) MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc.

2012-08-19 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13437662#comment-13437662
 ] 

Tsuyoshi OZAWA commented on MAPREDUCE-3902:
---

bq. I think a pull request against github for now, and for bigger / more 
significant changes - a separate subtasks under this jira for the changes.

Okay.

bq. I'd like to create a separate branch for this jira, pull in the current set 
of changes with some cleanup, and then continue development. Will create a 
branch later this week unless if noone objects.

All right, I agree with creating a new branch for this JIRA; it makes the 
changes much easier to trace.

I have also sent a pull request on GitHub: 
https://github.com/sidseth/h2-container-reuse/pull/1. Please check it out.

> MR AM should reuse containers for map tasks, there-by allowing fine-grained 
> control on num-maps for users without need for CombineFileInputFormat etc.
> --
>
> Key: MAPREDUCE-3902
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3902
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: applicationmaster, mrv2
>Reporter: Arun C Murthy
>Assignee: Siddharth Seth
> Attachments: MAPREDUCE-3902.2.patch, MAPREDUCE-3902.patch
>
>
> The MR AM is now in a great position to reuse containers across (map) tasks. 
> This is something similar to JVM re-use we had in 0.20.x, but in a 
> significantly better manner:
> # Consider data-locality when re-using containers
> # Consider the new shuffle - ensure that reduces fetch output of the whole 
> container at once (i.e. all maps)  : MAPREDUCE-4525 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3902) MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc.

2012-08-16 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13436358#comment-13436358
 ] 

Siddharth Seth commented on MAPREDUCE-3902:
---

bq. If I create some patches(ex. fixing TODOs or something), should I send pull 
request against your github or attach patch here?
I think a pull request against GitHub for now, and for bigger or more 
significant changes, separate subtasks under this JIRA.

bq. Do you think that it's needed to separate hadoop-mapreduce-client-app from 
hadoop-mapreduce-client-app2? Your prototype is under 
hadoop-mapreduce-client-app2 currently. This make it difficult to rebase your 
code on trunk.
The intention was to be able to run the existing code as well as the modified 
code in the same install, with a simple config change to choose between the 
implementations. That makes side-by-side comparisons much easier. Once this 
implementation stabilizes, it can be moved back to mapreduce-client-app to 
replace the current implementation. Also, there are some pretty big changes to 
TaskAttempt, the AM scheduling classes, etc.; given this, I'm not sure how 
useful a merge from trunk would be. This does have some overhead, though: JIRAs 
fixed after the branch point will need to be pulled in or factored in.

With mapreduce-client-app2 being a separate module, development could continue 
in the main branches. However, given that this implementation is not stable, 
I'd like to create a separate branch for this JIRA, pull in the current set of 
changes with some cleanup, and then continue development. I will create a 
branch later this week unless anyone objects.
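The "simple config change to choose between the implementations" described above amounts to a factory keyed on one configuration value. The minimal sketch below uses plain `java.util.Properties` to stay self-contained. The property name `example.mapreduce.am.impl`, the `AppMaster` interface and both implementation classes are invented for illustration; they are not real Hadoop properties or classes, and the actual prototype may select its implementation differently.

```java
import java.util.Properties;

/** Hypothetical sketch: pick one of two AM implementations from a single config value. */
public class AmSelector {

    /** Common interface both implementations would satisfy (invented for illustration). */
    public interface AppMaster {
        String name();
    }

    static final class CurrentAppMaster implements AppMaster {
        public String name() { return "client-app"; }
    }

    static final class ReuseAppMaster implements AppMaster {
        public String name() { return "client-app2"; }
    }

    /** Hypothetical key; not a real Hadoop property. */
    public static final String IMPL_KEY = "example.mapreduce.am.impl";

    /** Defaults to the existing implementation so the reuse AM is strictly opt-in. */
    public static AppMaster create(Properties conf) {
        String impl = conf.getProperty(IMPL_KEY, "current");
        switch (impl) {
            case "current":
                return new CurrentAppMaster();
            case "reuse":
                return new ReuseAppMaster();
            default:
                throw new IllegalArgumentException("Unknown " + IMPL_KEY + ": " + impl);
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(create(conf).name()); // client-app
        conf.setProperty(IMPL_KEY, "reuse");
        System.out.println(create(conf).name()); // client-app2
    }
}
```

Failing fast on an unknown value avoids silently running the wrong AM during side-by-side comparisons.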

> MR AM should reuse containers for map tasks, there-by allowing fine-grained 
> control on num-maps for users without need for CombineFileInputFormat etc.
> --
>
> Key: MAPREDUCE-3902
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3902
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: applicationmaster, mrv2
>Reporter: Arun C Murthy
>Assignee: Siddharth Seth
> Attachments: MAPREDUCE-3902.2.patch, MAPREDUCE-3902.patch
>
>
> The MR AM is now in a great position to reuse containers across (map) tasks. 
> This is something similar to JVM re-use we had in 0.20.x, but in a 
> significantly better manner:
> # Consider data-locality when re-using containers
> # Consider the new shuffle - ensure that reduces fetch output of the whole 
> container at once (i.e. all maps)  : MAPREDUCE-4525 





[jira] [Commented] (MAPREDUCE-3902) MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc.

2012-08-16 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435807#comment-13435807
 ] 

Tsuyoshi OZAWA commented on MAPREDUCE-3902:
---

@Siddharth,

I have two questions, although my work is still in progress.

# If I create some patches (e.g. fixing TODOs), should I send a pull request 
against your GitHub repo or attach the patch here?
# Do you think it's necessary to keep hadoop-mapreduce-client-app separate from 
hadoop-mapreduce-client-app2? Your prototype is currently under 
hadoop-mapreduce-client-app2, which makes it difficult to rebase your code on 
trunk.

> MR AM should reuse containers for map tasks, there-by allowing fine-grained 
> control on num-maps for users without need for CombineFileInputFormat etc.
> --
>
> Key: MAPREDUCE-3902
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3902
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: applicationmaster, mrv2
>Reporter: Arun C Murthy
>Assignee: Siddharth Seth
> Attachments: MAPREDUCE-3902.2.patch, MAPREDUCE-3902.patch
>
>
> The MR AM is now in a great position to reuse containers across (map) tasks. 
> This is something similar to JVM re-use we had in 0.20.x, but in a 
> significantly better manner:
> # Consider data-locality when re-using containers
> # Consider the new shuffle - ensure that reduces fetch output of the whole 
> container at once (i.e. all maps)  : MAPREDUCE-4525 





[jira] [Commented] (MAPREDUCE-3902) MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc.

2012-08-08 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13430927#comment-13430927
 ] 

Tsuyoshi OZAWA commented on MAPREDUCE-3902:
---

@Siddharth,

Thank you for sharing your progress and the design you've worked out. I'm 
going to fix the TODOs in your code on GitHub. If you have any ideas about the 
design, please write them down here.

> MR AM should reuse containers for map tasks, there-by allowing fine-grained 
> control on num-maps for users without need for CombineFileInputFormat etc.
> --
>
> Key: MAPREDUCE-3902
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3902
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: applicationmaster, mrv2
>Reporter: Arun C Murthy
>Assignee: Siddharth Seth
> Attachments: MAPREDUCE-3902.2.patch, MAPREDUCE-3902.patch
>
>
> The MR AM is now in a great position to reuse containers across (map) tasks. 
> This is something similar to JVM re-use we had in 0.20.x, but in a 
> significantly better manner:
> # Consider data-locality when re-using containers
> # Consider the new shuffle - ensure that reduces fetch output of the whole 
> container at once (i.e. all maps) 





[jira] [Commented] (MAPREDUCE-3902) MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc.

2012-08-03 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428412#comment-13428412
 ] 

Siddharth Seth commented on MAPREDUCE-3902:
---

@Tsuyoshi, I'd spoken with Vinod and others about this a while ago; I should have 
posted this earlier. Adding the functionality to the AM in its current state 
is possible, but it would further complicate some components that are already 
quite complicated and tough to change.

The TaskAttempt state machine is currently a mix of TaskAttempt transitions 
and Container transitions. The RMContainerAllocator is also dealing with more 
than it should: Nodes, Containers, as well as scheduling.

The idea was to split the functionality into separate TaskAttempt, Container 
and Node state machines, along with reduced functionality in the scheduler (also 
decoupling RM requests from AM scheduling). This would make the code cleaner 
and make re-use (as well as other improvements, like handling retired nodes) 
easier to implement.
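To make the proposed split concrete, here is a minimal sketch of a Container state machine tracked independently of task attempts, in which re-use is simply an IDLE container accepting another attempt instead of being released. The class name, states (ALLOCATED, IDLE, ...) and events (ASSIGN_TASK, ...) are hypothetical and are not taken from the prototype code.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical sketch: a container lifecycle kept separate from task-attempt state.
class ContainerStateMachine {
    enum State { ALLOCATED, LAUNCHED, RUNNING, IDLE, COMPLETED }
    enum Event { LAUNCH, ASSIGN_TASK, TASK_DONE, RELEASE }

    // Legal transitions: (current state, event) -> next state.
    private static final Map<State, Map<Event, State>> TRANSITIONS = new EnumMap<>(State.class);

    private static void allow(State from, Event on, State to) {
        TRANSITIONS.computeIfAbsent(from, s -> new EnumMap<>(Event.class)).put(on, to);
    }

    static {
        allow(State.ALLOCATED, Event.LAUNCH, State.LAUNCHED);
        allow(State.LAUNCHED, Event.ASSIGN_TASK, State.RUNNING);
        allow(State.RUNNING, Event.TASK_DONE, State.IDLE);
        // Re-use: an idle container accepts another task attempt.
        allow(State.IDLE, Event.ASSIGN_TASK, State.RUNNING);
        allow(State.ALLOCATED, Event.RELEASE, State.COMPLETED);
        allow(State.LAUNCHED, Event.RELEASE, State.COMPLETED);
        allow(State.IDLE, Event.RELEASE, State.COMPLETED);
    }

    private State state = State.ALLOCATED;
    private int tasksRun = 0;

    State state() { return state; }
    int tasksRun() { return tasksRun; }

    // Applies an event, rejecting anything not in the transition table.
    void handle(Event event) {
        State next = TRANSITIONS.getOrDefault(state, Map.of()).get(event);
        if (next == null) {
            throw new IllegalStateException("Invalid event " + event + " in state " + state);
        }
        if (event == Event.TASK_DONE) {
            tasksRun++;
        }
        state = next;
    }

    public static void main(String[] args) {
        ContainerStateMachine c = new ContainerStateMachine();
        c.handle(Event.LAUNCH);
        for (int i = 0; i < 3; i++) { // three map attempts run in one container
            c.handle(Event.ASSIGN_TASK);
            c.handle(Event.TASK_DONE);
        }
        c.handle(Event.RELEASE);
        System.out.println(c.state() + " after " + c.tasksRun() + " tasks");
    }
}
```

Running `main` prints `COMPLETED after 3 tasks`. Keeping container transitions out of the TaskAttempt machine means a task attempt only needs to know which container it was assigned, not how that container got there.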

I had worked with Vinod on the state transitions, and have been working on the 
implementation in bits and pieces to see how feasible it is. The code is at 
https://github.com/sidseth/h2-container-reuse . It's a bit of a mess at the 
moment, with lots of TODOs etc. splattered all over, but it is just about 
functional. There's no explicit re-use scheduling yet, but re-use can be 
tested by running a job that requires more containers than are available on the 
cluster (plus some config changes).

bq. the 2nd topic(combining per container) should be moved, because the change 
seems to be too big.
I believe this was, at least initially, meant to ensure that the output of all 
taskAttempts run in one container would be fetched only once by a reducer 
(without a common combiner). Either way, that could be a separate JIRA.
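A rough sketch of the fetch-once idea, assuming the AM knows which container ran each map attempt: group completed attempts by container ID so that a reducer issues one fetch per container rather than one per attempt. `MapAttempt`, `planFetches` and the IDs are hypothetical, and the shuffle-side change (serving several attempts' output in one response) is not shown.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: plan reducer fetches per container instead of per map attempt.
class PerContainerFetchPlanner {
    record MapAttempt(String attemptId, String containerId) {}

    // Groups attempt IDs under the container that ran them, preserving first-seen order.
    static Map<String, List<String>> planFetches(List<MapAttempt> completed) {
        Map<String, List<String>> byContainer = new LinkedHashMap<>();
        for (MapAttempt a : completed) {
            byContainer.computeIfAbsent(a.containerId(), k -> new ArrayList<>()).add(a.attemptId());
        }
        return byContainer;
    }

    public static void main(String[] args) {
        List<MapAttempt> done = List.of(
            new MapAttempt("m_0", "c_1"),
            new MapAttempt("m_1", "c_2"),
            new MapAttempt("m_2", "c_1"),
            new MapAttempt("m_3", "c_1"));
        Map<String, List<String>> plan = planFetches(done);
        // 4 attempts, but only 2 fetches
        System.out.println(plan.size() + " fetches: " + plan);
    }
}
```

Here `main` prints `2 fetches: {c_1=[m_0, m_2, m_3], c_2=[m_1]}`: four map attempts, but only two connections from the reducer.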





[jira] [Commented] (MAPREDUCE-3902) MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc.

2012-08-03 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428212#comment-13428212
 ] 

Tsuyoshi OZAWA commented on MAPREDUCE-3902:
---

s/should be moved/should be moved to the new ticket/





[jira] [Commented] (MAPREDUCE-3902) MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc.

2012-08-03 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428211#comment-13428211
 ] 

Tsuyoshi OZAWA commented on MAPREDUCE-3902:
---

IMHO, the 2nd topic (combining per container) should be moved, because the 
change seems too big.
If there are no objections, I'm going to create a new ticket to deal with 
the 2nd topic as a sub-task of MAPREDUCE-3902.





[jira] [Commented] (MAPREDUCE-3902) MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc.

2012-02-29 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219552#comment-13219552
 ] 

Zhihong Yu commented on MAPREDUCE-3902:
---

{code}
+  private void makeContainerReuseDecision() {
+    targetMapContainers = 
+        conf.getInt(MRJobConfig.MR_AM_CONTAINER_REUSE_MAX_CONTAINERS, 
+            numMapTasks);
+  }
{code}
Is more logic going to be added to the above method?
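For reference, the quoted line falls back to numMapTasks when the setting is absent, which means one container per map and effectively no re-use; a smaller value pushes several maps through each container in turn. A minimal sketch of that arithmetic, with a plain Map standing in for Hadoop's Configuration. The key string, the clamping guard and the helper names are assumptions, not part of the patch.

```java
import java.util.Map;

// Hypothetical sketch of the reuse decision. A plain Map stands in for
// Hadoop's Configuration, and the key string below is made up.
class ContainerReuseDecision {
    static final String MAX_CONTAINERS_KEY = "example.am.container-reuse.max-containers";

    // Mirrors conf.getInt(key, numMapTasks): with no setting, one container per map.
    static int targetMapContainers(Map<String, Integer> conf, int numMapTasks) {
        int target = conf.getOrDefault(MAX_CONTAINERS_KEY, numMapTasks);
        // Guard added for the sketch: at least one container, never more than there are maps.
        return Math.max(1, Math.min(target, numMapTasks));
    }

    // Number of maps each container must run in turn (ceiling division).
    static int mapsPerContainer(int numMapTasks, int containers) {
        return (numMapTasks + containers - 1) / containers;
    }

    public static void main(String[] args) {
        int numMaps = 100;
        int noReuse = targetMapContainers(Map.of(), numMaps);
        int capped = targetMapContainers(Map.of(MAX_CONTAINERS_KEY, 8), numMaps);
        System.out.println(noReuse + " containers x " + mapsPerContainer(numMaps, noReuse) + " map(s)");
        System.out.println(capped + " containers x up to " + mapsPerContainer(numMaps, capped) + " maps");
    }
}
```

With 100 maps this prints `100 containers x 1 map(s)` for the default and `8 containers x up to 13 maps` when the cap is 8.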
{code}
+  //Key->Resource Capability
+  //Value->ResourceRequest
+  protected final Map>
   remoteRequestsTable =
-  new TreeMap>>();
+  new HashMap>();
{code}
The comment above doesn't seem to match the Map structure.
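If the table is meant to be keyed by capability alone, a declaration whose shape matches the comment might look like the sketch below. `Capability` and `Request` are hypothetical stand-ins for YARN's Resource and ResourceRequest, and whether the patch intends extra nesting levels (e.g. by priority or host) can't be told from the quoted diff.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a request table whose shape matches the comment
// "Key->Resource Capability, Value->ResourceRequest". Capability and Request
// are stand-ins, not the real YARN classes.
class RemoteRequestsTable {
    record Capability(int memoryMb, int vcores) {}
    record Request(Capability capability, int numContainers) {}

    // Key -> resource capability, Value -> outstanding request for that capability.
    private final Map<Capability, Request> remoteRequestsTable = new HashMap<>();

    // Adds containers to the request for a capability, creating it if needed.
    void addRequest(Capability capability, int count) {
        remoteRequestsTable.merge(capability, new Request(capability, count),
            (old, add) -> new Request(capability, old.numContainers() + add.numContainers()));
    }

    int outstanding(Capability capability) {
        Request r = remoteRequestsTable.get(capability);
        return r == null ? 0 : r.numContainers();
    }

    public static void main(String[] args) {
        RemoteRequestsTable table = new RemoteRequestsTable();
        Capability small = new Capability(1024, 1);
        table.addRequest(small, 2);
        table.addRequest(small, 3);
        table.addRequest(new Capability(2048, 1), 1);
        System.out.println(table.outstanding(small)); // 5
    }
}
```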

> MR AM should reuse containers for map tasks, there-by allowing fine-grained 
> control on num-maps for users without need for CombineFileInputFormat etc.
> --
>
> Key: MAPREDUCE-3902
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3902
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: applicationmaster, mrv2
>Reporter: Arun C Murthy
>Assignee: Arun C Murthy
> Attachments: MAPREDUCE-3902.patch
>
>
> The MR AM is now in a great position to reuse containers across (map) tasks. 
> This is something similar to JVM re-use we had in 0.20.x, but in a 
> significantly better manner:
> # Consider data-locality when re-using containers
> # Consider the new shuffle - ensure that reduces fetch output of the whole 
> container at once (i.e. all maps) 





[jira] [Commented] (MAPREDUCE-3902) MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc.

2012-02-28 Thread Kang Xiao (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13218825#comment-13218825
 ] 

Kang Xiao commented on MAPREDUCE-3902:
--

Container reuse will also help the RM scale, since it reduces the RM's 
scheduling load.





[jira] [Commented] (MAPREDUCE-3902) MR AM should reuse containers for map tasks

2012-02-23 Thread Jay Finger (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214819#comment-13214819
 ] 

Jay Finger commented on MAPREDUCE-3902:
---

I haven't read the patch, so forgive me if the answer is already there.

Is there a cap on the amount of re-use? For example, a container that has been 
in use for more than 1 minute would no longer be re-used.

Or, to rephrase: what prevents a few large jobs on a cluster from hogging 
containers?
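One way to express the kind of cap asked about here, as a sketch rather than anything the patch is known to do: an idle container is given another task only while it is younger than a maximum age and has run fewer than a maximum number of attempts; otherwise it is released back to the RM. `ReuseCapPolicy`, `maxAge`, `maxTasks` and `mayReuse` are hypothetical names.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch of a reuse cap: stop re-using a container once it is
// older than maxAge or has already run maxTasks attempts, so it can be
// released and its resources handed to other jobs.
class ReuseCapPolicy {
    private final Duration maxAge;
    private final int maxTasks;

    ReuseCapPolicy(Duration maxAge, int maxTasks) {
        this.maxAge = maxAge;
        this.maxTasks = maxTasks;
    }

    // Age is measured from container launch, not from the start of the current task.
    boolean mayReuse(Instant launchedAt, int tasksRun, Instant now) {
        boolean tooOld = Duration.between(launchedAt, now).compareTo(maxAge) >= 0;
        return !tooOld && tasksRun < maxTasks;
    }

    public static void main(String[] args) {
        ReuseCapPolicy policy = new ReuseCapPolicy(Duration.ofMinutes(1), 10);
        Instant launched = Instant.parse("2012-02-23T10:00:00Z");
        System.out.println(policy.mayReuse(launched, 3, launched.plusSeconds(30)));  // true
        System.out.println(policy.mayReuse(launched, 3, launched.plusSeconds(90)));  // false: older than 1 minute
        System.out.println(policy.mayReuse(launched, 10, launched.plusSeconds(30))); // false: task cap reached
    }
}
```

A time cap bounds how long one job can hold a container regardless of task length, while a task-count cap is cheaper to reason about when map durations are uniform; combining both is a judgment call.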

> MR AM should reuse containers for map tasks
> ---
>
> Key: MAPREDUCE-3902
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3902
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: applicationmaster, mrv2
>Reporter: Arun C Murthy
>Assignee: Arun C Murthy
> Attachments: MAPREDUCE-3902.patch
>
>
> The MR AM is now in a great position to reuse containers across (map) tasks. 
> This is something similar to JVM re-use we had in 0.20.x, but in a 
> significantly better manner:
> # Consider data-locality when re-using containers
> # Consider the new shuffle - ensure that reduces fetch output of the whole 
> container at once (i.e. all maps) 
