[ https://issues.apache.org/jira/browse/SPARK-5113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-5113:
-----------------------------
    Component/s: Spark Core

> Audit and document use of hostnames and IP addresses in Spark
> -------------------------------------------------------------
>
>                 Key: SPARK-5113
>                 URL: https://issues.apache.org/jira/browse/SPARK-5113
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>            Reporter: Patrick Wendell
>            Priority: Critical
>
> Spark has multiple network components that start servers and advertise their 
> network addresses to other processes.
> We should go through each of these components and make sure they have 
> consistent and/or documented behavior with respect to (a) what interface(s) 
> they bind to and (b) what hostname they use to advertise themselves to other 
> processes. We should document this clearly and explain what to do in 
> different cases (e.g. EC2, Docker containers, etc.).
> When Spark initializes, it searches the network interfaces until it finds one 
> whose address is not a loopback address, then does a reverse DNS lookup to get 
> a hostname for that interface. The network components use that hostname to 
> advertise themselves to other processes. The same hostname is also used as the 
> akka system identifier (akka accepts only a single name, which it uses both as 
> the bind interface and as the actor identifier). In some cases that hostname 
> is also used as the bind hostname (e.g. I think this happens in the connection 
> manager and possibly akka), which likely results in it being re-resolved back 
> to an IP address internally. In other cases (the web UI and netty shuffle) we 
> seem to bind to all interfaces.
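> As a rough sketch of the lookup order described above (assuming the discovery 
> logic roughly matches that description; this is not Spark's actual Utils code):
> {code}
> import java.net.{InetAddress, NetworkInterface}
> import scala.collection.JavaConverters._
> 
> // Illustrative sketch of the interface/hostname discovery described above.
> object AddressSketch {
>   def findAdvertiseHostname(): String = {
>     val default = InetAddress.getLocalHost
>     if (!default.isLoopbackAddress) {
>       // Reverse DNS lookup on the default address.
>       default.getCanonicalHostName
>     } else {
>       // Walk the interfaces until a non-loopback address turns up,
>       // then reverse-resolve it to a hostname.
>       NetworkInterface.getNetworkInterfaces.asScala
>         .flatMap(_.getInetAddresses.asScala)
>         .find(addr => !addr.isLoopbackAddress)
>         .map(_.getCanonicalHostName)
>         .getOrElse(default.getHostAddress)
>     }
>   }
> }
> {code}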
> The best outcome would be to have three configs that can be set on each 
> machine:
> {code}
> SPARK_LOCAL_IP          # IP address we bind to for all services
> SPARK_INTERNAL_HOSTNAME # Hostname we advertise to remote processes within the cluster
> SPARK_EXTERNAL_HOSTNAME # Hostname we advertise to processes outside the cluster (e.g. the UI)
> {code}
> It's not clear how easily we can support that scheme while providing 
> backwards compatibility. The last one (SPARK_EXTERNAL_HOSTNAME) is easy - 
> it's just an alias for what is now SPARK_PUBLIC_DNS.
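> As a sketch of how the proposed precedence could work (SPARK_INTERNAL_HOSTNAME 
> and SPARK_EXTERNAL_HOSTNAME are only proposals from this ticket; SPARK_LOCAL_IP 
> and SPARK_PUBLIC_DNS already exist):
> {code}
> // Illustrative only: the two *_HOSTNAME settings are proposals, not current Spark configs.
> object ProposedAddressConf {
>   private def env(name: String): Option[String] = sys.env.get(name).filter(_.nonEmpty)
> 
>   // Address every service binds to; falls back to whatever interface discovery picks.
>   def bindAddress(discovered: String): String =
>     env("SPARK_LOCAL_IP").getOrElse(discovered)
> 
>   // Hostname advertised to other processes inside the cluster.
>   def internalHostname(discovered: String): String =
>     env("SPARK_INTERNAL_HOSTNAME").getOrElse(discovered)
> 
>   // Hostname advertised outside the cluster (e.g. the UI);
>   // SPARK_EXTERNAL_HOSTNAME would simply alias today's SPARK_PUBLIC_DNS.
>   def externalHostname(internal: String): String =
>     env("SPARK_EXTERNAL_HOSTNAME")
>       .orElse(env("SPARK_PUBLIC_DNS"))
>       .getOrElse(internal)
> }
> {code}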


