Hi all,

 

YARN provides a way for an ApplicationMaster to register an RPC port so that a 
client outside the YARN cluster can reach the application for RPCs, but 
Spark’s YARN AMs simply register a dummy port number of 0. (See 
https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnRMClient.scala#L74)
 Registering a real port would be useful for long-running Spark applications 
where jobs are submitted via some form of RPC to an already-started 
SparkContext running in YARN cluster mode. Spark job server 
(https://github.com/spark-jobserver/spark-jobserver) and Livy 
(https://github.com/cloudera/hue/tree/master/apps/spark/java) are good 
open-source examples of this use case. The current workaround is to have the 
Spark AM call back to a configured URL with the port number of the RPC server, 
so that the client can then communicate with the AM.

 

Using the YARN AM RPC port allows the port number to be reported securely: 
with the AM RPC port field and a Kerberized YARN cluster, you don’t need to 
reinvent a way to verify the authenticity of the reported port number. It also 
removes the callback from the YARN cluster to the client, which means you can 
operate YARN in a low-trust environment and run other client applications 
behind a firewall.

 

I have a couple of proposals for using the YARN AM RPC port. (Note that you 
cannot simply pre-configure the port number and pass it to the Spark AM via 
configuration, because of potential port conflicts on the YARN node.)

 

· Start up an empty Jetty server during Spark AM initialization, register its 
port number when registering the AM with the RM, and pass a reference to the 
Jetty server into the Spark application (e.g. through SparkContext) so that 
the application can dynamically add servlets/resources to it.

· Have an optional static method in the main class (e.g. 
initializeRpcPort()) which sets up an RPC server and returns its port. The 
Spark AM can call this method, register the port number with the RM, and then 
continue on to invoke the main method. I don’t see this making a good API, 
though.
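
To make the two proposals a bit more concrete, here is a rough, JDK-only 
sketch in Java. It is not Spark or YARN code: the JDK's built-in 
com.sun.net.httpserver.HttpServer stands in for Jetty, and AmRpcPortSketch, 
startEmptyServer(), resolveRpcPort() and ExampleApp are hypothetical names 
made up for illustration. The sketch only shows how a port could be obtained 
without conflicts (by binding to port 0) and how an optional 
initializeRpcPort() hook could be discovered. In a real AM, the resulting port 
would presumably be passed to AMRMClient.registerApplicationMaster(host, port, 
trackingUrl) in place of the current 0.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch; none of these names exist in Spark or YARN. */
final class AmRpcPortSketch {

    /** Proposal 1: start an "empty" HTTP server (stand-in for Jetty). */
    static HttpServer startEmptyServer() {
        try {
            // Port 0 asks the OS for any free port, which avoids the
            // port conflicts a pre-configured port could hit on a shared node.
            HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Proposal 2: look for an optional "public static int initializeRpcPort()"
     * on the user's main class. Returns 0 (today's dummy value) if absent.
     */
    static int resolveRpcPort(Class<?> mainClass) {
        try {
            Method hook = mainClass.getMethod("initializeRpcPort");
            if (!Modifier.isStatic(hook.getModifiers())) {
                return 0; // an instance method is not a usable hook
            }
            Object port = hook.invoke(null);
            return (port instanceof Integer) ? (Integer) port : 0;
        } catch (NoSuchMethodException e) {
            return 0; // the hook is optional
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("initializeRpcPort() failed", e);
        }
    }

    /** Hypothetical user application that opts in to the hook. */
    static final class ExampleApp {
        static HttpServer server;

        public static int initializeRpcPort() {
            server = startEmptyServer();
            return server.getAddress().getPort();
        }
    }

    public static void main(String[] args) {
        // Proposal 1: the AM starts the server, then the app adds handlers later.
        HttpServer server = startEmptyServer();
        System.out.println("port to register: " + server.getAddress().getPort());
        server.createContext("/ping", exchange -> {
            byte[] body = "pong".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.stop(0);

        // Proposal 2: the AM asks the main class for a port, falling back to 0.
        System.out.println("hook port: " + resolveRpcPort(ExampleApp.class));
        System.out.println("no hook: " + resolveRpcPort(String.class)); // 0
        ExampleApp.server.stop(0);
    }
}
```

With the first approach the AM owns the server and the application only adds 
handlers to it. With the second the application owns the server and the AM 
only learns the port, which is part of why it feels like the weaker API.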

 

I’m curious to hear what other people think. Would this be useful for anyone? 
What do you think about the proposals? Please feel free to suggest other ideas. 
Thanks!

 

Mingyu
