[ https://issues.apache.org/jira/browse/MAPREDUCE-5329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13685513#comment-13685513 ]
Avner BenHanoch commented on MAPREDUCE-5329:
--------------------------------------------

Another comment: I looked at the code of *createCommonContainerLaunchContext* in TaskAttemptImpl.java. It appears that this method *creates a brand new serviceData* and fills it explicitly with just the built-in "ShuffleHandler", using:
{code}
// Service data
Map<String, ByteBuffer> serviceData = new HashMap<String, ByteBuffer>();
...
// Add shuffle token
LOG.info("Putting shuffle token in serviceData");
serviceData.put(ShuffleHandler.MAPREDUCE_SHUFFLE_SERVICEID,
    ShuffleHandler.serializeServiceData(jobToken));
{code}
The only external input to this code snippet is the jobToken, which is indeed an argument to the method. Hence, *it seems that an application can only determine the jobToken in the serviceData, but NOT the services in the serviceData*.

Now, if you don't want all AuxServices to get the INIT event, we can create a new conf param with the sub-list of AuxServices that should get the INIT event (the default value would be the built-in 'ShuffleHandler', instead of it being hard-coded). However, I don't think there is an AuxService that is incompatible with the INIT event, since the initApp method is part of the AuxiliaryService interface. Hence, my personal opinion is that sending INIT to all AuxServices matches the current design.

However, it is your call. I am okay with any option that works, and the best option is one without code changes - if this is possible :)

> APPLICATION_INIT is never sent to AuxServices other than the builtin ShuffleHandler
> -----------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5329
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5329
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am
>    Affects Versions: 2.0.4-alpha
>            Reporter: Avner BenHanoch
>
> APPLICATION_INIT is never sent to AuxServices other than the built-in
> ShuffleHandler.
> This means that 3rd party ShuffleProvider(s) will not be
> able to function, because APPLICATION_INIT enables the AuxiliaryService to
> map jobId->userId. This is needed for properly finding the MOFs of a job per
> reducers' requests.
> NOTE: The built-in ShuffleHandler does get APPLICATION_INIT events due to a
> hard-coded expression in the Hadoop code. The current TaskAttemptImpl.java code
> explicitly calls serviceData.put(ShuffleHandler.MAPREDUCE_SHUFFLE_SERVICEID,
> ...) and ignores any additional AuxiliaryService. As a result, only the
> built-in ShuffleHandler will get APPLICATION_INIT events. Any 3rd party
> AuxiliaryService will never get APPLICATION_INIT events.
> I think a solution can be in one of two ways:
> 1. Change TaskAttemptImpl.java to loop over all Auxiliary Services and register
> each of them, by calling serviceData.put(…) in a loop.
> 2. Change AuxServices.java similarly to the fix in MAPREDUCE-2668,
> "APPLICATION_STOP is never sent to AuxServices". This means that when the
> 'handle' method gets an APPLICATION_INIT event, it will demultiplex it to all Aux
> Services regardless of the value in event.getServiceID().
> I prefer the 2nd solution. I welcome any ideas. I can provide the
> needed patch for any option that people like.
> See [Pluggable Shuffle in Hadoop documentation|http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/PluggableShuffleAndPluggableSort.html]
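As a rough illustration of option 2 from the issue description (demultiplexing APPLICATION_INIT to all Aux Services), here is a self-contained toy sketch. It is not a patch and not the real AuxServices.java: the AuxService interface, the AuxEvent class, the RecordingService helper and the service names are all hypothetical stand-ins, and the initApp signature is simplified to plain strings. It only shows the intended behaviour, namely that the event's service ID is ignored for INIT and every registered service receives the same payload.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of option 2. All types below are hypothetical stand-ins for
// illustration only; none of them are the real YARN/MapReduce classes.
class AuxInitDemuxSketch {

    enum EventType { APPLICATION_INIT, APPLICATION_STOP }

    /** Stand-in for the auxiliary service interface (init hook only). */
    interface AuxService {
        void initApp(String user, String appId, ByteBuffer data);
    }

    /** Stand-in for the event passed to the dispatcher's handle method. */
    static final class AuxEvent {
        final EventType type;
        final String user;
        final String appId;
        final String serviceId; // deliberately ignored for APPLICATION_INIT
        final ByteBuffer data;

        AuxEvent(EventType type, String user, String appId,
                 String serviceId, ByteBuffer data) {
            this.type = type;
            this.user = user;
            this.appId = appId;
            this.serviceId = serviceId;
            this.data = data;
        }
    }

    /** Dispatcher that fans APPLICATION_INIT out to every registered service. */
    static final class AuxServices {
        private final Map<String, AuxService> services = new LinkedHashMap<>();

        void register(String name, AuxService service) {
            services.put(name, service);
        }

        void handle(AuxEvent event) {
            switch (event.type) {
                case APPLICATION_INIT:
                    for (AuxService service : services.values()) {
                        // duplicate() gives each service its own buffer position
                        service.initApp(event.user, event.appId,
                                event.data.duplicate());
                    }
                    break;
                case APPLICATION_STOP:
                    // stop handling is not modeled in this sketch
                    break;
            }
        }
    }

    /** Helper that records the appId->user mappings it was initialised with. */
    static final class RecordingService implements AuxService {
        final List<String> seen = new ArrayList<>();

        @Override
        public void initApp(String user, String appId, ByteBuffer data) {
            seen.add(appId + "->" + user);
        }
    }

    public static void main(String[] args) {
        AuxServices aux = new AuxServices();
        RecordingService builtin = new RecordingService();
        RecordingService thirdParty = new RecordingService();
        aux.register("builtin_shuffle", builtin);
        aux.register("third_party_shuffle", thirdParty);

        // The event names only the built-in service, yet both get initApp.
        aux.handle(new AuxEvent(EventType.APPLICATION_INIT, "alice", "app_1",
                "builtin_shuffle", ByteBuffer.allocate(0)));

        System.out.println("builtin saw:     " + builtin.seen);
        System.out.println("third party saw: " + thirdParty.seen);
    }
}
```

In this sketch the third-party service learns the appId->user mapping even though the event carries only the built-in service's ID, which is the behaviour the description asks for. Restricting the fan-out to a configured sub-list of services, as suggested in the comment, would only change which entries the loop visits.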