[ https://issues.apache.org/jira/browse/MAPREDUCE-5329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13748477#comment-13748477 ]
Avner BenHanoch commented on MAPREDUCE-5329:
--------------------------------------------

Hi Siddharth,

I read some documentation about the Yarn architecture and now have a better understanding of your points. I would like to suggest a new solution.

I see 4 issues in the current implementation of AuxiliaryServices / NodeManager in Yarn:
# MAPREDUCE-5329: APPLICATION_INIT is never sent to AuxiliaryServices other than the built-in ShuffleHandler. This contradicts [the following Yarn/NodeManager documentation|http://hortonworks.com/blog/apache-hadoop-yarn-nodemanager/], which says: _"Auxiliary services are notified when an application’s first container starts on the node"_
# YARN-886: APPLICATION_STOP is inconsistent with APPLICATION_INIT
# New issue: We should consider shuffleToken to be specific to the shuffle provider
# New issue: AM should support multiple AuxiliaryServices, each with a distinct service port

For #1 & #2 we have already created JIRA issues. *I strongly suggest creating distinct JIRA issues for #3 & #4 as well*. This will allow the work to progress in parallel and let users benefit from each fix independently (without binding one fix to another).

One last comment, regarding #3: I think that perhaps we should leave shuffleToken general to all shuffle providers, for 2 reasons:
* AFAICS, shuffleToken is based on jobToken and user credentials; hence, it is not specific to the provider but to the job & user.
* On the shuffle-consumer side, the token is not specific to the shuffle-consumer; it is part of the reduceTask and hence general to all shuffle-consumers.

Hence all shuffle services can use the same shuffleToken without any problem.

Please let me know what you think.
Thanks for your help,
Avner

> APPLICATION_INIT is never sent to AuxServices other than the builtin ShuffleHandler
> -----------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5329
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5329
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am
>    Affects Versions: 2.0.4-alpha
>            Reporter: Avner BenHanoch
>
> APPLICATION_INIT is never sent to AuxServices other than the built-in ShuffleHandler. This means that 3rd party ShuffleProvider(s) will not be able to function, because APPLICATION_INIT enables the AuxiliaryService to map jobId->userId. This is needed for properly finding the MOFs of a job per reducers' requests.
> NOTE: The built-in ShuffleHandler does get APPLICATION_INIT events, due to a hard-coded expression in the Hadoop code. The current TaskAttemptImpl.java code explicitly calls: serviceData.put (ShuffleHandler.MAPREDUCE_SHUFFLE_SERVICEID, ...) and ignores any additional AuxiliaryService. As a result, only the built-in ShuffleHandler will get APPLICATION_INIT events. Any 3rd party AuxiliaryService will never get APPLICATION_INIT events.
> I think a solution can be in one of two ways:
> 1. Change TaskAttemptImpl.java to loop over all Auxiliary Services and register each of them, by calling serviceData.put (…) in a loop.
> 2. Change AuxServices.java similarly to the fix in MAPREDUCE-2668 "APPLICATION_STOP is never sent to AuxServices". This means that when the 'handle' method gets an APPLICATION_INIT event, it will demultiplex it to all Aux Services regardless of the value in event.getServiceID().
> I prefer the 2nd solution. I welcome any ideas. I can provide the needed patch for any option that people like.
> See [Pluggable Shuffle in Hadoop documentation|http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/PluggableShuffleAndPluggableSort.html]
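
To make option 2 from the issue description concrete, here is a minimal, self-contained sketch of the idea. It does not use the real Hadoop classes: AuxService, RecordingService, Registry, and the service ids "builtin_shuffle" and "third_party_shuffle" are all hypothetical stand-ins invented for illustration, and the actual AuxServices.java code may look quite different. The sketch only contrasts dispatching APPLICATION_INIT by service id with demultiplexing it to every registered service.

```java
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;

public class AuxInitDemuxSketch {

    /** Hypothetical stand-in for an auxiliary service; NOT the real YARN interface. */
    public interface AuxService {
        void initApp(String user, String appId, ByteBuffer data);
    }

    /** Toy service that records appId -> user, the mapping a shuffle provider needs. */
    public static class RecordingService implements AuxService {
        public final Map<String, String> userByApp = new LinkedHashMap<>();

        @Override
        public void initApp(String user, String appId, ByteBuffer data) {
            userByApp.put(appId, user);
        }
    }

    /** Hypothetical registry playing the role that the issue assigns to AuxServices. */
    public static class Registry {
        private final Map<String, AuxService> services = new LinkedHashMap<>();

        public void register(String serviceId, AuxService service) {
            services.put(serviceId, service);
        }

        /** Behaviour described in the issue: only the service named in the event is notified. */
        public void handleInitByServiceId(String serviceId, String user,
                                          String appId, ByteBuffer data) {
            AuxService service = services.get(serviceId);
            if (service != null) {
                service.initApp(user, appId, data);
            }
        }

        /** Option 2: ignore the service id and fan the event out to every service. */
        public void handleInitDemux(String user, String appId, ByteBuffer data) {
            for (AuxService service : services.values()) {
                // Give each service its own view so one reader cannot move another's position.
                service.initApp(user, appId, data == null ? null : data.duplicate());
            }
        }
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        RecordingService builtIn = new RecordingService();
        RecordingService thirdParty = new RecordingService();
        registry.register("builtin_shuffle", builtIn);         // hypothetical id
        registry.register("third_party_shuffle", thirdParty);  // hypothetical id

        // Dispatch by service id: the third-party service never learns about job_1.
        registry.handleInitByServiceId("builtin_shuffle", "alice", "job_1", null);
        System.out.println("by id, third party knows: " + thirdParty.userByApp.keySet());

        // Demultiplexed dispatch: both services learn about job_2.
        registry.handleInitDemux("alice", "job_2", null);
        System.out.println("demux, third party knows: " + thirdParty.userByApp.keySet());
    }
}
```

One consequence of this design, under the assumptions above, is that every service receives the same payload, which fits the argument in the comment that the shuffle token can stay general to all shuffle providers.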