Utilizing YARN AM RPC port field

Mingyu Kim Mon, 13 Jun 2016 17:31:09 -0700

Hi all,

YARN provides a way for an ApplicationMaster to register an RPC port so that a
client outside the YARN cluster can reach the application for any RPCs, but
Spark’s YARN AMs simply register a dummy port number of 0. (See
https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnRMClient.scala#L74)
Registering a real port would be useful for long-running Spark application use
cases, where jobs are submitted via some form of RPC to an already-started
Spark context running in YARN cluster mode. Spark job server
(https://github.com/spark-jobserver/spark-jobserver) and Livy
(https://github.com/cloudera/hue/tree/master/apps/spark/java) are good
open-source examples of these use cases. The current workaround is to have the
Spark AM call back to a configured URL with the port number of the RPC server,
so that the client knows where to reach the AM.

Using the YARN AM RPC port field lets the port number be reported in a secure
way: with the AM RPC port field and a Kerberized YARN cluster, you don’t need
to re-invent a way to verify the authenticity of the reported port number. It
also removes the callback from the YARN cluster to the client, which means you
can operate YARN in a low-trust environment and run other client applications
behind a firewall.

I have a couple of proposals for using the YARN AM RPC port. (Note that you
cannot simply pre-configure the port number and pass it to the Spark AM via
configuration, because the port may already be in use on the YARN node.)

· Start up an empty Jetty server during Spark AM initialization, set
the port number when registering the AM with the RM, and pass a reference to
the Jetty server into the Spark application (e.g. through SparkContext) so that
the application can dynamically add servlets/resources to the Jetty server.

· Have an optional static method in the main class (e.g.
initializeRpcPort()) which sets up an RPC server and returns the RPC port. The
Spark AM can call this method if it exists, register the port number with the
RM, and continue on with invoking the main method. I don’t see this making a
good API, though.
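
To make the second proposal a bit more concrete, here is a rough sketch in
plain Java of what the AM-side lookup could look like. It is only an
illustration under assumptions, not Spark or YARN code: the class names
AmRpcPortSketch and ExampleApp and the helper initializeRpcPortIfPresent are
made up, the "RPC server" is just a ServerSocket, and the registration with the
RM is replaced by a print statement. Binding to port 0 asks the OS for a free
port, which is how the sketch sidesteps the port conflict issue mentioned above.

```java
import java.io.IOException;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.net.ServerSocket;

public class AmRpcPortSketch {

    /** Hypothetical user application that opts in to the proposed hook. */
    public static class ExampleApp {
        static ServerSocket rpcSocket;

        // Hook name taken from the proposal; stands in for starting a real RPC server.
        public static int initializeRpcPort() throws IOException {
            // Port 0 lets the OS pick a free ephemeral port on this node.
            rpcSocket = new ServerSocket(0);
            return rpcSocket.getLocalPort();
        }

        public static void main(String[] args) {
            System.out.println("application main running");
        }
    }

    /**
     * Calls the optional public static int initializeRpcPort() hook if the
     * main class declares one; otherwise returns 0, the dummy value used today.
     */
    static int initializeRpcPortIfPresent(Class<?> mainClass) throws Exception {
        try {
            Method hook = mainClass.getMethod("initializeRpcPort");
            boolean usable = Modifier.isStatic(hook.getModifiers())
                    && hook.getReturnType() == int.class;
            if (!usable) {
                return 0;
            }
            return (Integer) hook.invoke(null);
        } catch (NoSuchMethodException e) {
            return 0; // the hook is optional
        }
    }

    public static void main(String[] args) throws Exception {
        int port = initializeRpcPortIfPresent(ExampleApp.class);
        // A real AM would pass this value to the RM at registration instead of 0.
        System.out.println("would register AM with rpcPort=" + port);
        ExampleApp.main(args);
        ExampleApp.rpcSocket.close();
    }
}
```

The reflection step is also why I’m not sure this is a good API: nothing at
compile time tells the application author that the hook exists or what
signature it must have.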

I’m curious to hear what other people think. Would this be useful for anyone?
What do you think about the proposals? Please feel free to suggest other ideas.
Thanks!

Mingyu
