[ https://issues.apache.org/jira/browse/SPARK-5113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-5113:
-----------------------------
    Component/s: Spark Core

> Audit and document use of hostnames and IP addresses in Spark
> -------------------------------------------------------------
>
>                 Key: SPARK-5113
>                 URL: https://issues.apache.org/jira/browse/SPARK-5113
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>            Reporter: Patrick Wendell
>            Priority: Critical
>
> Spark has multiple network components that start servers and advertise their
> network addresses to other processes.
> We should go through each of these components and make sure they have
> consistent and/or documented behavior wrt (a) what interface(s) they bind to
> and (b) what hostname they use to advertise themselves to other processes. We
> should document this clearly and explain to people what to do in different
> cases (e.g. EC2, dockerized containers, etc).
> When Spark initializes, it will search for a network interface until it finds
> one that is not a loopback address. Then it will do a reverse DNS lookup for
> a hostname associated with that interface. Then the network components will
> use that hostname to advertise the component to other processes. That
> hostname is also the one used for the akka system identifier (akka supports
> only supplying a single name which it uses both as the bind interface and as
> the actor identifier). In some cases, that hostname is used as the bind
> hostname also (e.g. I think this happens in the connection manager and
> possibly akka) - which will likely internally result in a re-resolution of
> this to an IP address. In other cases (the web UI and netty shuffle) we seem
> to bind to all interfaces.
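
The startup behavior described above (skip loopback addresses, then reverse-resolve the first remaining one) can be sketched roughly as follows. This is an illustration in Python, not Spark's actual code: the function names `reverse_lookup` and `pick_advertised_host` are hypothetical, the candidate addresses are passed in rather than enumerated from real interfaces, and falling back to the raw IP when reverse DNS returns nothing is an assumption.

```python
import ipaddress
import socket
from typing import Callable, Iterable, Optional, Tuple


def reverse_lookup(ip: str) -> Optional[str]:
    """Reverse DNS lookup; returns None when no name can be resolved."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def pick_advertised_host(
    candidate_ips: Iterable[str],
    resolver: Callable[[str], Optional[str]] = reverse_lookup,
) -> Tuple[str, str]:
    """Return (ip, advertised_name) for the first non-loopback candidate.

    Assumption: if the resolver finds no hostname, the IP itself is advertised.
    """
    for ip in candidate_ips:
        if ipaddress.ip_address(ip).is_loopback:
            continue  # loopback addresses are never advertised
        return ip, (resolver(ip) or ip)
    raise RuntimeError("no non-loopback address found")
```

Making the resolver a parameter keeps the sketch usable on machines without working reverse DNS, which is one of the cases (EC2, containers) where the advertised name and the bind address tend to diverge.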
> The best outcome would be to have three configs that can be set on each
> machine:
> {code}
> SPARK_LOCAL_IP # IP address we bind to for all services
> SPARK_INTERNAL_HOSTNAME # Hostname we advertise to remote processes within the cluster
> SPARK_EXTERNAL_HOSTNAME # Hostname we advertise to processes outside the cluster (e.g. the UI)
> {code}
> It's not clear how easily we can support that scheme while providing
> backwards compatibility. The last one (SPARK_EXTERNAL_HOSTNAME) is easy -
> it's just an alias for what is now SPARK_PUBLIC_DNS.
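
One possible way to resolve the three proposed settings, with SPARK_EXTERNAL_HOSTNAME treated as an alias for SPARK_PUBLIC_DNS, is sketched below in Python. The fallback order is an assumption and is not specified in the issue: the bind address defaults to all interfaces, the internal hostname defaults to a caller-supplied name such as the reverse-DNS result, and the external hostname defaults to the internal one. `NetworkConfig` and `resolve_network_config` are hypothetical names.

```python
from typing import Mapping, NamedTuple


class NetworkConfig(NamedTuple):
    bind_ip: str            # address all services bind to
    internal_hostname: str  # advertised to processes inside the cluster
    external_hostname: str  # advertised to processes outside (e.g. the UI)


def resolve_network_config(env: Mapping[str, str], default_host: str) -> NetworkConfig:
    """Resolve the proposed settings from an environment-like mapping.

    Empty values are treated as unset. All defaults are assumptions.
    """
    bind_ip = env.get("SPARK_LOCAL_IP") or "0.0.0.0"
    internal = env.get("SPARK_INTERNAL_HOSTNAME") or default_host
    external = (
        env.get("SPARK_EXTERNAL_HOSTNAME")
        or env.get("SPARK_PUBLIC_DNS")  # existing name, kept for compatibility
        or internal
    )
    return NetworkConfig(bind_ip, internal, external)
```

Passing `os.environ` as `env` would read the real environment; checking the new name before the old one lets existing deployments that only set SPARK_PUBLIC_DNS keep working.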