[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13748477#comment-13748477
 ] 

Avner BenHanoch commented on MAPREDUCE-5329:
--------------------------------------------

Hi Siddharth,
 
I read some documentation about the Yarn architecture and now have a better 
understanding of your points.  I would like to suggest a new solution:
 
I see 4 issues in the current implementation of AuxiliaryServices / NodeManager 
in Yarn:
# MAPREDUCE-5329: APPLICATION_INIT is never sent to AuxiliaryServices other 
than the built-in ShuffleHandler.  This is in contrast to [the following 
Yarn/NodeManager 
documentation|http://hortonworks.com/blog/apache-hadoop-yarn-nodemanager/], 
which says: _"Auxiliary services are notified when an application’s first 
container starts on the node"_
# YARN-886: APPLICATION_STOP is inconsistent with APPLICATION_INIT
# New issue: we should consider making shuffleToken specific to the shuffle 
provider
# New issue: the AM should support multiple AuxiliaryServices, each with a 
distinct service port
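To make issue #4 concrete, here is a hypothetical sketch of the AM side: instead of remembering only the built-in shuffle port, the AM keeps one port per auxiliary service, decoded from per-service metadata. The service IDs, the metadata map, and the 4-byte big-endian port encoding are all assumptions for illustration, not Hadoop's actual API.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch for issue #4: the AM tracks one port per auxiliary
// service instead of only the built-in shuffle port. Service IDs and the
// port encoding (4-byte big-endian int) are assumptions, not Hadoop's API.
public class ServicePorts {

    // Encode a port the way this sketch assumes a service advertises it.
    static ByteBuffer encodePort(int port) {
        ByteBuffer buf = ByteBuffer.allocate(4);
        buf.putInt(port);
        buf.flip(); // make the written int readable
        return buf;
    }

    // Decode every service's advertised port, keyed by service ID.
    static Map<String, Integer> decodePorts(Map<String, ByteBuffer> serviceMetaData) {
        Map<String, Integer> ports = new HashMap<>();
        for (Map.Entry<String, ByteBuffer> e : serviceMetaData.entrySet()) {
            // duplicate() so reading does not advance the caller's buffer
            ports.put(e.getKey(), e.getValue().duplicate().getInt());
        }
        return ports;
    }

    public static void main(String[] args) {
        Map<String, ByteBuffer> meta = new HashMap<>();
        meta.put("builtin_shuffle", encodePort(13562));     // example values only
        meta.put("third_party_shuffle", encodePort(13563));
        Map<String, Integer> ports = decodePorts(meta);
        System.out.println(ports.get("builtin_shuffle") + " " + ports.get("third_party_shuffle"));
    }
}
```

With a map like this, a reducer fetching from a 3rd party provider would look up that provider's own port rather than assuming the built-in one.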
 

For #1 and #2 we have already created JIRA issues.  *I strongly suggest creating 
separate JIRA issues for #3 and #4 as well*.  This will allow the work to progress 
in parallel and let users benefit from each fix independently (without 
binding one fix to the other).
  
One last comment: regarding #3, I think that perhaps we should leave shuffleToken 
general to all shuffle providers, for two reasons:
* AFAICS, shuffleToken is based on the jobToken and the user credentials; hence, 
it is specific to the job and user, not to the provider.
* On the shuffle-consumer side, the token is not specific to the 
shuffle-consumer; it is part of the reduceTask, so it is general to 
all shuffle-consumers.
Hence all shuffle services can use the same shuffleToken without any problem.
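A minimal sketch of that reasoning, assuming a plain map of service ID to service data (the service IDs and token bytes are hypothetical placeholders): since the token is derived from the job and user, the same bytes can simply be registered for every configured provider, which is also roughly what a loop over serviceData.put(...) would do.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: because the shuffle token is derived from the job
// (jobToken + user credentials), the same bytes can be handed to every
// shuffle provider. Service IDs and token bytes below are placeholders.
public class SharedShuffleToken {

    static Map<String, ByteBuffer> buildServiceData(List<String> auxServiceIds, byte[] shuffleToken) {
        Map<String, ByteBuffer> serviceData = new LinkedHashMap<>();
        for (String serviceId : auxServiceIds) {
            // Each provider gets its own read-only view of the one job-wide token.
            serviceData.put(serviceId, ByteBuffer.wrap(shuffleToken).asReadOnlyBuffer());
        }
        return serviceData;
    }

    public static void main(String[] args) {
        byte[] token = "job-token-bytes".getBytes(StandardCharsets.UTF_8); // placeholder
        Map<String, ByteBuffer> serviceData = buildServiceData(
                Arrays.asList("builtin_shuffle", "third_party_shuffle"), token);
        // Both providers see identical token contents.
        System.out.println(serviceData.get("builtin_shuffle").equals(serviceData.get("third_party_shuffle")));
    }
}
```

Handing out separate read-only views keeps one provider from disturbing the buffer position or contents seen by another, while the underlying token stays shared.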

Please let me know what you think.

Thanks for your help,
Avner
 
                
> APPLICATION_INIT is never sent to AuxServices other than the builtin 
> ShuffleHandler
> -----------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5329
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5329
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am
>    Affects Versions: 2.0.4-alpha
>            Reporter: Avner BenHanoch
>
> APPLICATION_INIT is never sent to AuxServices other than the built-in 
> ShuffleHandler.  This means that 3rd party ShuffleProvider(s) will not be 
> able to function, because APPLICATION_INIT enables the AuxiliaryService to 
> map jobId->userId. This mapping is needed to locate a job's map output files 
> (MOFs) when serving reducers' requests.
> NOTE: The built-in ShuffleHandler does get APPLICATION_INIT events due to a 
> hard-coded expression in the Hadoop code. The current TaskAttemptImpl.java code 
> explicitly calls serviceData.put(ShuffleHandler.MAPREDUCE_SHUFFLE_SERVICEID, 
> ...) and ignores any additional AuxiliaryService. As a result, only the 
> built-in ShuffleHandler will get APPLICATION_INIT events.  Any 3rd party 
> AuxiliaryService will never get APPLICATION_INIT events.
> I think a solution can take one of two forms:
> 1. Change TaskAttemptImpl.java to loop over all Auxiliary Services and register 
> each of them by calling serviceData.put(…) in a loop.
> 2. Change AuxServices.java similarly to the fix in MAPREDUCE-2668 
> "APPLICATION_STOP is never sent to AuxServices".  This means that when the 
> 'handle' method gets an APPLICATION_INIT event, it will demultiplex it to all Aux 
> Services regardless of the value of event.getServiceID().
> I prefer the 2nd solution.  Any ideas are welcome.  I can provide the 
> needed patch for whichever option people prefer.
> See [Pluggable Shuffle in Hadoop 
> documentation|http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/PluggableShuffleAndPluggableSort.html]
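The preferred 2nd solution can be sketched with a small, self-contained model. The class, event, and method names below are hypothetical and only loosely mirror AuxServices; the point is that APPLICATION_INIT is broadcast to every registered service regardless of the addressed service ID, just as APPLICATION_STOP already is, so each 3rd party provider can record the jobId->userId mapping it needs.

```java
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, self-contained model of option 2. Class and method
// names only loosely mirror Hadoop's AuxServices; they are not the real API.
public class AuxServicesDemux {

    enum EventType { APPLICATION_INIT, APPLICATION_STOP }

    static final class AuxEvent {
        final EventType type;
        final String user;
        final String appId;
        final String serviceId; // which service the event was addressed to
        final ByteBuffer data;

        AuxEvent(EventType type, String user, String appId, String serviceId, ByteBuffer data) {
            this.type = type;
            this.user = user;
            this.appId = appId;
            this.serviceId = serviceId;
            this.data = data;
        }
    }

    interface AuxService {
        void initApp(String user, String appId, ByteBuffer data);
        void stopApp(String appId);
    }

    // A shuffle provider that records appId -> user, which it needs later to
    // locate a job's map output files when reducers request them.
    static final class ShuffleProvider implements AuxService {
        final Map<String, String> appToUser = new ConcurrentHashMap<>();

        public void initApp(String user, String appId, ByteBuffer data) {
            appToUser.put(appId, user);
        }

        public void stopApp(String appId) {
            appToUser.remove(appId);
        }
    }

    static final class AuxServices {
        final Map<String, AuxService> services = new LinkedHashMap<>();

        void handle(AuxEvent event) {
            switch (event.type) {
                case APPLICATION_INIT:
                    // Option 2: ignore event.serviceId and notify every service,
                    // the same way APPLICATION_STOP is already broadcast.
                    for (AuxService service : services.values()) {
                        ByteBuffer view = event.data == null ? null : event.data.duplicate();
                        service.initApp(event.user, event.appId, view);
                    }
                    break;
                case APPLICATION_STOP:
                    for (AuxService service : services.values()) {
                        service.stopApp(event.appId);
                    }
                    break;
            }
        }
    }

    public static void main(String[] args) {
        AuxServices aux = new AuxServices();
        ShuffleProvider builtin = new ShuffleProvider();
        ShuffleProvider thirdParty = new ShuffleProvider();
        aux.services.put("builtin_shuffle", builtin);
        aux.services.put("third_party_shuffle", thirdParty);

        // The event names only the built-in service, yet both providers learn the user.
        aux.handle(new AuxEvent(EventType.APPLICATION_INIT, "alice", "app_1", "builtin_shuffle", null));
        System.out.println(thirdParty.appToUser.get("app_1")); // alice
    }
}
```

Each service receives its own duplicate() of the event data so that one service reading the buffer does not move the position seen by the next.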

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
