[ https://issues.apache.org/jira/browse/SPARK-15974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15332749#comment-15332749 ]
Mingyu Kim commented on SPARK-15974:
------------------------------------

I agree this is not ideal. Setting up a server with a Socket is often not an unreasonable thing to do, though.

The alternative would be to have the Spark program pass some information to the Spark AM during start-up. (Having the Spark program set the port on YARN is not possible, as discussed on the thread linked above.) This can probably be done through the use of static variables in the Spark program class. None of these sounds particularly great to me, but here are some options I can think of:

- The Spark program class optionally has a Map<String, Object> initialize() method, which returns some named objects back to the Spark AM. "rpc-port" could be one of the supported key names, and we can imagine adding more keys later. The Spark program class will need to store some information (in the case of the RPC port, a Server object or Socket) as a static var for the main method to use.
- Pass something like a SettableFuture to the main method so that the Spark AM can wait for some initialization to be done. This means that either the command line args need to be augmented with this one extra thing, which is confusing, or the SettableFuture needs to be passed to the Spark program class through some other method and then stored as a static var in the Spark program class for the main method to use.

Another option would be to change the way spark-submitted applications are written so that the class implements an interface with an explicit initialize method, as opposed to being a class with a main method. This would let us avoid playing with static variables, but it would be a pretty big compatibility break for Spark.

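As a rough illustration of the first option above (an optional initialize() method returning named objects), here is a minimal sketch. Everything in it is hypothetical: Spark has no such hook, the class names InitializeHookSketch and UserProgram are made up, and the outer main method only stands in for what the Spark AM would do before registering with YARN.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.util.HashMap;
import java.util.Map;

public class InitializeHookSketch {

    /** Hypothetical spark-submitted user program. */
    static class UserProgram {
        // Kept in a static field so main() can reuse the socket opened in initialize().
        private static ServerSocket server;

        /**
         * Hypothetical hook: the AM would call this before main() and read
         * "rpc-port" from the returned map. Not an existing Spark API.
         */
        public static Map<String, Object> initialize() throws IOException {
            server = new ServerSocket(0); // port 0 asks the OS for a free ephemeral port
            Map<String, Object> named = new HashMap<>();
            named.put("rpc-port", server.getLocalPort());
            return named;
        }

        public static void main(String[] args) throws IOException {
            // A real program would accept connections here; the sketch just closes the socket.
            try (ServerSocket s = server) {
                System.out.println("User program owns a socket on port " + s.getLocalPort());
            }
        }
    }

    /** Stand-in for the Spark AM: call the hook, read the port, then run the program. */
    public static void main(String[] args) throws IOException {
        Map<String, Object> named = UserProgram.initialize();
        int rpcPort = (Integer) named.get("rpc-port");
        System.out.println("AM would register RPC port " + rpcPort + " with YARN");
        UserProgram.main(args);
    }
}
```

The static field is exactly the awkward part mentioned above: it is the only way for main to see what initialize created without changing the main method's signature.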
> Create a socket on YARN AM start-up
> -----------------------------------
>
> Key: SPARK-15974
> URL: https://issues.apache.org/jira/browse/SPARK-15974
> Project: Spark
> Issue Type: New Feature
> Components: YARN
> Reporter: Mingyu Kim
>
> YARN provides a way for the ApplicationMaster to register an RPC port so that a
> client outside the YARN cluster can reach the application for any RPCs, but
> Spark’s YARN AMs simply register a dummy port number of 0. For Spark
> programs that start up a server, this makes it hard for the submitter to
> discover the server port securely. Spark's ApplicationMaster should
> optionally create a ServerSocket and pass it to the Spark user program. This
> socket initialization should be disabled by default.
> Some discussion on dev@spark thread:
> http://apache-spark-developers-list.1001551.n3.nabble.com/Utilizing-YARN-AM-RPC-port-field-td17892.html

--
This message was sent by Atlassian JIRA (v6.3.4#6332)