[jira] [Commented] (YARN-802) APPLICATION_INIT is never sent to AuxServices other than the builtin ShuffleHandler

2013-06-13 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13682488#comment-13682488
 ] 

Siddharth Seth commented on YARN-802:
-

Can the MR AM specify the service to be used via configuration, and set the 
service data accordingly?

> APPLICATION_INIT is never sent to AuxServices other than the builtin 
> ShuffleHandler
> ---
>
> Key: YARN-802
> URL: https://issues.apache.org/jira/browse/YARN-802
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications, nodemanager
>Affects Versions: 2.0.4-alpha
>Reporter: Avner BenHanoch
>
> APPLICATION_INIT is never sent to AuxServices other than the built-in 
> ShuffleHandler.  This means that 3rd party ShuffleProvider(s) will not be 
> able to function, because APPLICATION_INIT enables the AuxiliaryService to 
> map jobId->userId. This is needed for properly finding the MOFs of a job per 
> reducers' requests.
> NOTE: The built-in ShuffleHandler does get APPLICATION_INIT events because of 
> a hard-coded expression in the Hadoop code. The current TaskAttemptImpl.java 
> explicitly calls serviceData.put(ShuffleHandler.MAPREDUCE_SHUFFLE_SERVICEID, 
> ...) and ignores any additional AuxiliaryService. As a result, only the 
> built-in ShuffleHandler gets APPLICATION_INIT events; a 3rd party 
> AuxiliaryService never does.
> I see two possible solutions:
> 1. Change TaskAttemptImpl.java to loop over all auxiliary services and register 
> each of them by calling serviceData.put(…) in a loop.
> 2. Change AuxServices.java in the same way as the fix for MAPREDUCE-2668, 
> "APPLICATION_STOP is never sent to AuxServices".  This means that when the 
> 'handle' method gets an APPLICATION_INIT event, it demultiplexes it to all Aux 
> Services regardless of the value in event.getServiceID().
> I prefer the 2nd solution.  I welcome any ideas.  I can provide the 
> needed patch for whichever option people like.
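
For readers of the archive, the difference between the current targeted delivery and option 2 above can be sketched as follows. This is only an illustration under stated assumptions: AuxDispatchSketch, RecordingService, initTargeted and initBroadcast are hypothetical names, not the real NodeManager classes or methods, and the service ID strings used with it are placeholders.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for the NodeManager types; not the real Hadoop classes.
final class AuxDispatchSketch {

    /** Minimal stand-in for an auxiliary service; it records the apps it was told about. */
    static final class RecordingService {
        final List<String> initializedApps = new ArrayList<>();

        void initApp(String user, String appId) {
            initializedApps.add(user + ":" + appId);
        }
    }

    // Registered services, keyed by service ID.
    private final Map<String, RecordingService> services = new LinkedHashMap<>();

    void register(String serviceId, RecordingService service) {
        services.put(serviceId, service);
    }

    /** Behaviour described in the issue: only the service named in the event is initialized. */
    void initTargeted(String serviceId, String user, String appId) {
        RecordingService service = services.get(serviceId);
        if (service != null) {
            service.initApp(user, appId);
        }
    }

    /** Option 2 from the issue: ignore the service ID and notify every registered service. */
    void initBroadcast(String user, String appId) {
        for (RecordingService service : services.values()) {
            service.initApp(user, appId);
        }
    }
}
```

With two services registered, initTargeted reaches only the one whose ID is passed in, while initBroadcast reaches both.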

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-802) APPLICATION_INIT is never sent to AuxServices other than the builtin ShuffleHandler

2013-06-13 Thread Avner BenHanoch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13682539#comment-13682539
 ] 

Avner BenHanoch commented on YARN-802:
--

Hi Siddharth,

I am not sure I understand the question.  Are you suggesting a configuration 
that lists a subset of the AuxServices, so that only members of that list get 
the APPLICATION_INIT event?
- This is possible; however, it would not match the current behavior of the 
APPLICATION_STOP event, which is sent to ALL AuxServices.

Please elaborate.



[jira] [Commented] (YARN-802) APPLICATION_INIT is never sent to AuxServices other than the builtin ShuffleHandler

2013-06-13 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13682724#comment-13682724
 ] 

Siddharth Seth commented on YARN-802:
-

It should be possible to configure the MR AM with the shuffle service that 
needs to be used, in which case the MR AM sets up the service id correctly (in 
TaskAttemptImpl), and the NodeManager can send the init event to the correct 
service. We should probably change the stop to behave the same way. 
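
A minimal sketch of what this suggestion could look like on the AM side, assuming the service IDs come from a comma-separated configuration value. ServiceDataSketch and buildServiceData are hypothetical names, and the payload is just an opaque ByteBuffer; the only detail taken from YARN is that service data is a map from service ID to ByteBuffer.

```java
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; not part of TaskAttemptImpl or any Hadoop class.
final class ServiceDataSketch {

    /**
     * Builds a serviceData map with one entry per configured service ID.
     * configuredIds is assumed to be a comma-separated list read from job configuration.
     */
    static Map<String, ByteBuffer> buildServiceData(String configuredIds, ByteBuffer payload) {
        Map<String, ByteBuffer> serviceData = new LinkedHashMap<>();
        for (String id : configuredIds.split(",")) {
            String trimmed = id.trim();
            if (!trimmed.isEmpty()) {
                // duplicate() shares the bytes but gives each entry its own position and limit.
                serviceData.put(trimmed, payload.duplicate());
            }
        }
        return serviceData;
    }
}
```

With a single configured ID this reduces to the one-entry map the AM builds today; listing several IDs would ask the NodeManager to initialize each of them.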



[jira] [Commented] (YARN-802) APPLICATION_INIT is never sent to AuxServices other than the builtin ShuffleHandler

2013-06-13 Thread Avner BenHanoch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13683102#comment-13683102
 ] 

Avner BenHanoch commented on YARN-802:
--

This idea will force us to restart the mapred daemons any time we want to switch 
shuffle service, and will limit us to having just 1 ShuffleProvider at a time.  
I already discussed this idea in MAPREDUCE-4049 (search for the word "multiple" 
there).  Here is one paragraph from that discussion:

{quote}
It could be that a ShuffleConsumerX will be ideal for jobs of one type, while 
ShuffleConsumerY will be ideal for jobs of other type (for example Grep vs. 
TeraSort). Hence, multiple Shuffle-Consumer plugins may run in parallel in 
multiple jobs. Each of the consumers will contact its desired shuffle provider. 
Hence, all providers should be available in parallel (also, one shuffle service 
can be sensitive to type of network problems that doesn't disturb other shuffle 
services, hence, it should be possible to fallback to another shuffle on the 
fly).
{quote}



[jira] [Commented] (YARN-802) APPLICATION_INIT is never sent to AuxServices other than the builtin ShuffleHandler

2013-06-14 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13683579#comment-13683579
 ] 

Siddharth Seth commented on YARN-802:
-

With YARN, a new AM (Application) is started per job. The initApp in the NM is 
per app - so each job/app can choose which shuffle provider it wants to use. 
The shuffle service configured for an AM will be specific to a single job only.
From MAPREDUCE-4049:
bq.  A shuffle consumer instance will only contact one of the shuffle providers 
and will request its desired files only from this provider.

I'm assuming a single job will only use one shuffle provider - or do you see a 
situation where multiple shuffle providers can serve data to a single job?

In case of multiple jobs being run by a single AM - this gets more complicated, 
and we may need to initialize multiple providers.



[jira] [Commented] (YARN-802) APPLICATION_INIT is never sent to AuxServices other than the builtin ShuffleHandler

2013-06-15 Thread Avner BenHanoch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684258#comment-13684258
 ] 

Avner BenHanoch commented on YARN-802:
--

Thanks for the explanation about YARN.  Still this is not enough, for two 
reasons:

1. It is true that *usually* "A shuffle consumer instance will only contact one 
of the shuffle providers". Still, as written in the quote I pasted, "it should 
be possible to fallback to another shuffle on the fly".  This means that one 
consumer can load another consumer and act as a proxy for the real consumer, 
which will contact another provider.

2. In a single job there are multiple reducers, each with its own shuffle 
consumer instance; hence, we have *multiple shuffle consumers per job*.  It 
should be possible for each consumer to choose its preferred provider based on 
the memory/network/... conditions on its machine, regardless of other consumers 
in the same job.




[jira] [Commented] (YARN-802) APPLICATION_INIT is never sent to AuxServices other than the builtin ShuffleHandler

2013-06-16 Thread Avner BenHanoch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684624#comment-13684624
 ] 

Avner BenHanoch commented on YARN-802:
--


Hi Siddharth
In addition to what I wrote, I just noticed that perhaps we have a 
misunderstanding.
See "Implementing a Custom Shuffle" from [Hadoop documentation about Pluggable 
Shuffle|http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/PluggableShuffleAndPluggableSort.html]
{quote}
A custom shuffle implementation requires a 
org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.AuxiliaryService 
implementation class running in the NodeManagers and a 
org.apache.hadoop.mapred.ShuffleConsumerPlugin implementation class running in 
the Reducer tasks.
The default implementations provided by Hadoop can be used as references:
•   org.apache.hadoop.mapred.ShuffleHandler
•   org.apache.hadoop.mapreduce.task.reduce.Shuffle
{quote}

In this issue we are talking about the *provider side of the Shuffle*: we want 
to have additional ShuffleProvider(s) alongside the default ShuffleHandler.
All shuffle providers are AuxServices running in the NodeManager.  I see them 
as daemons like ShuffleHandler; they are not applications or jobs. If I 
understand correctly, in order to simultaneously support multiple jobs of 
multiple users that each can contact different Shuffle provider we must have 
all providers in the air in parallel.
With this, the ShuffleConsumer in each reducer will be able to request MOFs 
from its desired provider.  However, the provider must know where on disk the 
MOFs of a particular job are located.  The way to know that (see 
ShuffleHandler) is based on the mapping: jobId -> userId.  This data for this 
map arrive from the APPLICATION_INIT event.  Hence, all AuxServices that serve 
as ShuffleProviders need to get APPLICATION_INIT events.
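
The jobId -> userId bookkeeping described above could be sketched roughly like this. JobUserRegistrySketch and its methods are hypothetical names, and the directory layout returned by mapOutputDir is purely illustrative; the point is only that a provider which never sees APPLICATION_INIT has no user to build the path from.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a shuffle provider's bookkeeping; not the real ShuffleHandler.
final class JobUserRegistrySketch {

    // jobId -> userId, filled in when an init event arrives.
    private final Map<String, String> userByJob = new ConcurrentHashMap<>();

    /** Called when the provider receives an APPLICATION_INIT event for a job. */
    void onApplicationInit(String jobId, String user) {
        userByJob.put(jobId, user);
    }

    /** Called when the provider receives an APPLICATION_STOP event for a job. */
    void onApplicationStop(String jobId) {
        userByJob.remove(jobId);
    }

    /** Resolves an illustrative per-user directory for the job's map output files. */
    String mapOutputDir(String jobId) {
        String user = userByJob.get(jobId);
        if (user == null) {
            // Without an init event the provider cannot tell which user owns the job.
            throw new IllegalStateException("No APPLICATION_INIT seen for " + jobId);
        }
        return "usercache/" + user + "/appcache/" + jobId + "/output";
    }
}
```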




[jira] [Commented] (YARN-802) APPLICATION_INIT is never sent to AuxServices other than the builtin ShuffleHandler

2013-06-17 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13685110#comment-13685110
 ] 

Siddharth Seth commented on YARN-802:
-

bq. If I understand correctly, in order to simultaneously support multiple jobs 
of multiple users that each can contact different Shuffle provider we must have 
all providers in the air in parallel.
Multiple providers can be run by the NodeManager in parallel. An application 
chooses which provider(s) it wants to use when it starts a container on a 
NodeManager.

bq. This data for this map arrive from the APPLICATION_INIT event. Hence, all 
AuxServices that serve as ShuffleProviders need to get APPLICATION_INIT events.
The data in the APPLICATION_INIT event is from the startContainer request (the 
serviceData in the ContainerLaunchContext). If the application wants the INIT 
event to go to multiple providers, it can set the service data accordingly. The 
MapReduce AM hardcodes this to the default SHUFFLE_PROVIDER which is why only 
that one gets the init event.

There may be auxiliary services which are not responsible for shuffle, or are 
in general incompatible with the shuffle consumer configured by the job. I 
don't think they need to get an INIT event.
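
The routing rule described in this comment, where the keys of the container's serviceData decide which services are initialized, might be sketched as below. ServiceDataRoutingSketch and initTargets are hypothetical names, not NodeManager code; the sketch assumes that entries naming an unregistered service are simply skipped.

```java
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of service-data-driven routing; not the real AuxServices class.
final class ServiceDataRoutingSketch {

    /**
     * Returns the IDs of the registered services that would receive an init event:
     * those that the application named as keys in its serviceData.
     */
    static Set<String> initTargets(Set<String> registered, Map<String, ByteBuffer> serviceData) {
        Set<String> targets = new TreeSet<>(serviceData.keySet());
        // Keep only services that actually run on this node; others are skipped (assumption).
        targets.retainAll(registered);
        return targets;
    }
}
```

Under this rule a registered service that the application did not name gets no init event, which is the behaviour argued for above.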
