Hi all,
YARN provides a way for an ApplicationMaster to register an RPC port so that a client outside the YARN cluster can reach the application over RPC, but Spark’s YARN AMs simply register a dummy port number of 0 (see https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnRMClient.scala#L74).

Registering a real RPC port would be useful for long-running Spark applications where jobs are submitted via some form of RPC to an already-started Spark context running in YARN cluster mode. Spark job server (https://github.com/spark-jobserver/spark-jobserver) and Livy (https://github.com/cloudera/hue/tree/master/apps/spark/java) are good open-source examples of these use cases. The current work-around is to have the Spark AM call back to a configured URL with the port number of its RPC server, so that the client can then communicate with the AM.

Using the YARN AM RPC port instead would make the port reporting secure: with the AM RPC port field and a Kerberized YARN cluster, you don’t need to re-invent a way to verify the authenticity of the reported port number. It would also remove the callback from the YARN cluster to the client, which means you can operate YARN in a low-trust environment and run other client applications behind a firewall.

I have a couple of proposals for using the YARN AM RPC port. (Note that you cannot simply pre-configure the port number and pass it to the Spark AM via configuration, because of potential port conflicts on the YARN node.)

· Start up an empty Jetty server during Spark AM initialization, set its port number when registering the AM with the RM, and pass a reference to the Jetty server into the Spark application (e.g. through SparkContext) so that the application can dynamically add servlets/resources to it.
· Have an optional static method in the main class (e.g. initializeRpcPort()) which optionally sets up an RPC server and returns the RPC port.
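To make the first proposal (an empty server started during AM initialization) a bit more concrete, here is a minimal sketch. It assumes the JDK's built-in com.sun.net.httpserver.HttpServer as a stand-in for Jetty, swapped in only to keep the example dependency-free. AmRegistrar, AmRpcEndpoint and AmStartupSketch are hypothetical names, not Spark or YARN APIs; AmRegistrar.register stands in for the RM registration call (YARN's AMRMClient.registerApplicationMaster takes a host, a port and a tracking URL). The point it illustrates is binding to port 0 so the OS picks a free port, then reporting that port instead of the dummy 0.

```scala
import com.sun.net.httpserver.{HttpExchange, HttpHandler, HttpServer}
import java.net.InetSocketAddress
import java.nio.charset.StandardCharsets

// Hypothetical stand-in for the RM registration call. In YARN this role is played by
// AMRMClient.registerApplicationMaster(appHostName, appHostPort, appTrackingUrl),
// where Spark currently passes 0 as the port.
trait AmRegistrar {
  def register(host: String, rpcPort: Int, trackingUrl: String): Unit
}

// Hypothetical embedded endpoint the AM would start before registering and later
// hand to the application (e.g. through SparkContext).
class AmRpcEndpoint {
  // Binding to port 0 lets the OS pick any free port, so nothing has to be
  // pre-configured and two AMs on the same node cannot collide.
  private val server: HttpServer = HttpServer.create(new InetSocketAddress(0), 0)
  server.start()

  // The port the OS actually assigned; this is the value worth reporting to the RM.
  def port: Int = server.getAddress.getPort

  // Lets the application add handlers after the AM has started and registered,
  // similar to adding servlets to a running Jetty server. `respond` maps the
  // request path to a plain-text response body.
  def addHandler(path: String)(respond: String => String): Unit = {
    server.createContext(path, new HttpHandler {
      override def handle(exchange: HttpExchange): Unit = {
        val body = respond(exchange.getRequestURI.getPath).getBytes(StandardCharsets.UTF_8)
        exchange.sendResponseHeaders(200, body.length.toLong)
        val out = exchange.getResponseBody
        try out.write(body) finally out.close()
      }
    })
  }

  def stop(): Unit = server.stop(0)
}

object AmStartupSketch {
  // Start the endpoint first, then register the real port instead of the dummy 0.
  def startAndRegister(registrar: AmRegistrar, host: String, trackingUrl: String): AmRpcEndpoint = {
    val endpoint = new AmRpcEndpoint
    registrar.register(host, endpoint.port, trackingUrl)
    endpoint
  }
}
```

Starting the server before registration is what allows the real port to go into the registration call, while handlers can still be attached afterwards, which matches the idea of an empty server that the application fills in later.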
With the second approach, the Spark AM can call initializeRpcPort(), register the returned port number with the RM, and then continue on to invoke the main method. I don’t think this makes a good API, though.

I’m curious to hear what other people think. Would this be useful for anyone? What do you think about the proposals? Please feel free to suggest other ideas.

Thanks!
Mingyu