[ 
https://issues.apache.org/jira/browse/SPARK-15974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15332749#comment-15332749
 ] 

Mingyu Kim commented on SPARK-15974:
------------------------------------

I agree this is not ideal. In many cases, though, setting up a server with a 
Socket is not an unreasonable thing to do.

The alternative would be to have the Spark program pass some information to the 
Spark AM during start-up. (Having the Spark program set the port on YARN is not 
possible, as discussed on the dev@spark thread linked in the issue description.) 
This can probably be done through the use of static variables in the Spark 
program class. None of these sound particularly great to me, but here are some 
options I can think of:

- The Spark program class optionally has a Map<String, Object> initialize() 
method, which returns some named objects back to the Spark AM. "rpc-port" could 
be one of the key names supported, and we can imagine adding more keys later. 
The Spark program class will need to store some information (in the case of the 
RPC port, a Server object or Socket) as a static var for the main method to use.
- Pass something like a SettableFuture to the main method so that the Spark AM 
can wait for some initialization to be done. This means either that the command 
line args need to be augmented with this one extra thing, which is confusing, or 
that the SettableFuture needs to be passed to the Spark program class through 
some other method and then stored as a static var in the Spark program class 
for the main method to use.
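
To make the two options above concrete, here is a rough sketch of what each 
could look like from the user program's side. Nothing here is an existing Spark 
API: the initialize() hook, the "rpc-port" key handling, the class names and the 
rpcPortFuture field are all hypothetical, and java.util.concurrent's 
CompletableFuture is used as a stand-in for Guava's SettableFuture so that the 
sketch needs only the JDK.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Option 1 (hypothetical): the AM would call initialize() before main().
class InitializeStyleProgram {
    // Static holder so that main() can reuse what initialize() opened.
    private static ServerSocket serverSocket;

    static Map<String, Object> initialize() {
        try {
            serverSocket = new ServerSocket(0); // port 0: the OS picks a free port
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, Object> named = new HashMap<>();
        named.put("rpc-port", serverSocket.getLocalPort());
        return named;
    }

    public static void main(String[] args) throws IOException {
        if (serverSocket == null) {
            initialize(); // fallback when started without the hypothetical AM hook
        }
        // ... accept connections on serverSocket and run the Spark job ...
        serverSocket.close();
    }
}

// Option 2 (hypothetical): the AM hands the program a future and waits on it.
class FutureStyleProgram {
    // Static holder; the AM would have to set or read this through some side
    // channel, since main(String[]) cannot receive it directly.
    static CompletableFuture<Integer> rpcPortFuture = new CompletableFuture<>();

    public static void main(String[] args) {
        try (ServerSocket socket = new ServerSocket(0)) {
            rpcPortFuture.complete(socket.getLocalPort()); // unblocks the waiting AM
            // ... serve requests and run the Spark job ...
        } catch (IOException e) {
            rpcPortFuture.completeExceptionally(e);
        }
    }
}
```

In both variants the static field is the awkward part: it exists only to carry 
state between the AM-facing hook and main().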

Another option would be to change the way spark-submitted applications are 
written so that the class implements an interface with an explicit initialize 
method, as opposed to a class with the main method, which allows us to avoid 
playing with the static variables, but this will be a pretty big compatibility 
break for Spark.
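
For comparison, a minimal sketch of the interface-based shape, again purely 
hypothetical: InitializableSparkProgram, ServerProgram and AmSimulation are 
made-up names, and AmSimulation only imitates the order in which an AM might 
call the two methods.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.util.Collections;
import java.util.Map;

// Hypothetical replacement for the "class with a main method" contract.
interface InitializableSparkProgram {
    Map<String, Object> initialize();
    void run(String[] args);
}

class ServerProgram implements InitializableSparkProgram {
    // Instance field: no static variable is needed to share state with run().
    private ServerSocket serverSocket;

    @Override
    public Map<String, Object> initialize() {
        try {
            serverSocket = new ServerSocket(0); // port 0: the OS picks a free port
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Collections.<String, Object>singletonMap(
                "rpc-port", serverSocket.getLocalPort());
    }

    @Override
    public void run(String[] args) {
        try {
            // ... serve requests and run the Spark job ...
            serverSocket.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// Imitates the AM side: initialize first, read the port, then run the program.
class AmSimulation {
    static int launch(InitializableSparkProgram program, String[] args) {
        Map<String, Object> named = program.initialize();
        int rpcPort = (Integer) named.get("rpc-port");
        // A real AM would report rpcPort to YARN here instead of the dummy 0.
        program.run(args);
        return rpcPort;
    }
}
```

A call such as AmSimulation.launch(new ServerProgram(), args) returns the port 
that was chosen during initialize().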

> Create a socket on YARN AM start-up
> -----------------------------------
>
>                 Key: SPARK-15974
>                 URL: https://issues.apache.org/jira/browse/SPARK-15974
>             Project: Spark
>          Issue Type: New Feature
>          Components: YARN
>            Reporter: Mingyu Kim
>
> YARN provides a way for the ApplicationMaster to register an RPC port so that 
> a client outside the YARN cluster can reach the application for any RPCs, but 
> Spark’s YARN AMs simply register a dummy port number of 0. For Spark 
> programs that start up a server, this makes it hard for the submitter to 
> discover the server port securely. Spark's ApplicationMaster should 
> optionally create a ServerSocket and pass it to the Spark user program. This 
> socket initialization should be disabled by default.
> Some discussion on dev@spark thread: 
> http://apache-spark-developers-list.1001551.n3.nabble.com/Utilizing-YARN-AM-RPC-port-field-td17892.html



