[ 
https://issues.apache.org/jira/browse/SPARK-22404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16308802#comment-16308802
 ] 

Devaraj K commented on SPARK-22404:
-----------------------------------

Thanks [~irashid] for the comment.

bq. can you provide a little more explanation for the point of this?

An unmanaged AM is an AM that is not launched and managed by the RM. The client 
creates a new application on the RM and negotiates a new attempt id. It then 
waits for the RM app state to reach YarnApplicationState.ACCEPTED, after 
which it spawns the AM in the same or another process and passes it the 
container id via the env variable Environment.CONTAINER_ID. The AM (as part of 
the same or a different process) can register with the RM using the attempt id 
obtained from the container id and proceed as normal.

This PR/JIRA provides a new configuration "spark.yarn.un-managed-am" 
(defaults to false) to enable an unmanaged AM in YARN client mode, which 
starts the Application Master service as part of the Client. It reuses the 
existing code for communication between the Application Master <-> Task 
Scheduler for container requests/allocations/launches, and eliminates the 
following:
*       Allocating and launching the Application Master container
*       Remote node/process communication between Application Master <-> Task 
Scheduler
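
For illustration, enabling this from spark-defaults.conf might look like the 
fragment below. This is only a sketch: it assumes the property name proposed 
in this PR is kept as is (it may change during review), while spark.master 
and spark.submit.deployMode are the existing standard properties.

```properties
# Run against YARN in client mode (existing Spark properties)
spark.master                 yarn
spark.submit.deployMode      client
# Property proposed in this PR; defaults to false. Name may change in review.
spark.yarn.un-managed-am     true
```

The same setting could equally be passed per application with --conf on 
spark-submit.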

bq. how much time does this save for you?
It removes the AM container scheduling and launching time, and eliminates the 
AM acting as a proxy for requesting, launching and removing executors. I can 
post comparison results here with and without the unmanaged AM.

bq. What's the downside of an unmanaged AM?
The unmanaged AM service would run as part of the Client, so the Client can 
handle anything that goes wrong with the unmanaged AM service, rather than 
the AM container being relaunched on failure.

bq. the idea makes sense, but the yarn interaction and client mode is already 
pretty complicated so I'd like good justification for this
This PR reuses most of the existing code for communication between the 
AM <-> Task Scheduler, but the communication happens in the same process. The 
Client starts the AM service in the same process when the application's state 
is ACCEPTED and proceeds as usual without disrupting the existing flow.


> Provide an option to use unmanaged AM in yarn-client mode
> ---------------------------------------------------------
>
>                 Key: SPARK-22404
>                 URL: https://issues.apache.org/jira/browse/SPARK-22404
>             Project: Spark
>          Issue Type: Improvement
>          Components: YARN
>    Affects Versions: 2.2.0
>            Reporter: Devaraj K
>
> There was an issue, SPARK-1200, to provide this option, but it was closed 
> without being fixed.
> Using an unmanaged AM in yarn-client mode would allow apps to start up 
> faster, by not requiring the container launcher AM to be launched on the 
> cluster.


