Definitely a +1 for the executor binding to 0.0.0.0 instead of doing a `gethostname` and `getaddrinfo`. I am assuming these semantics would kick in only if LIBPROCESS_IP is not set, which should be the norm.
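
To make the assumption above concrete, here is a rough sketch of the bind-address choice as I understand it. It is a Python illustration only, not libprocess's actual C++ code, and the function names `choose_bind_address` and `open_listener` are made up for this example:

```python
import os
import socket


def choose_bind_address(env=None):
    """Pick the IP to bind to (hypothetical helper, not a libprocess API).

    Assumed semantics: honor LIBPROCESS_IP when it is set, otherwise bind
    to the wildcard address instead of resolving the local hostname.
    """
    env = os.environ if env is None else env
    ip = env.get("LIBPROCESS_IP")
    return ip if ip else "0.0.0.0"


def open_listener(env=None, port=0):
    """Open a listening TCP socket on the chosen address (port 0 = any)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((choose_bind_address(env), port))
    sock.listen()
    return sock
```

With this, an executor launched without LIBPROCESS_IP binds to 0.0.0.0, and one launched with it keeps today's behaviour.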

+1 for LIBPROCESS_ADVERTISE_IP and LIBPROCESS_ADVERTISE_PORT, and for the onus being on the frameworks to set these variables. I guess the framework can set LIBPROCESS_ADVERTISE_IP to the agent IP and LIBPROCESS_ADVERTISE_PORT to the host port when it specifies a port mapping. While I believe the logic of setting LIBPROCESS_ADVERTISE_IP to the agent IP could be done in the agent (it could look at the port mapping as well), when to actually set these variables (that is, whether the executors even need to advertise their IP addresses) is a decision that the frameworks should be privy to, and not one left to the agent.

On Tue, Oct 11, 2016 at 7:31 PM, haosdent <[email protected]> wrote:

> > libprocess should always bind to 0.0.0.0
> +1 for this
>
> On Wed, Oct 12, 2016 at 2:33 AM, Jie Yu <[email protected]> wrote:
>
> > Hi folks,
> >
> > I was in the process of cleaning up some tech debt related to env
> > variables in our code base. I created an epic ticket
> > <https://issues.apache.org/jira/browse/MESOS-6341> to track. I searched
> > relevant tickets filed previously, and found MESOS-3740
> > <https://issues.apache.org/jira/browse/MESOS-3740>. I did some digging
> > on how we handle LIBPROCESS_IP currently, and here are my findings:
> >
> > 1) We always set LIBPROCESS_IP in the executor environment variables:
> > https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L6793-L6796
> >
> > This is not an issue for an executor that runs on host network. However,
> > if the executor wants to run on non-host network (e.g., overlay), this
> > might be problematic, because libprocess for the executor will try to
> > bind to LIBPROCESS_IP, but the IP is not valid inside the container.
> >
> > 2) As mentioned in MESOS-3740
> > <https://issues.apache.org/jira/browse/MESOS-3740>, some users want to
> > run a Mesos framework in a Mesos container.
> > The old style framework driver assumes a 2 way communication channel
> > between the framework and the Mesos master. In order for the master to
> > reach the framework running inside a Mesos container, the framework's
> > libprocess should advertise its ip and port properly. This problem gets
> > tricky because of the networking for the Mesos container:
> >
> > 2.a) If the container uses host network, libprocess should bind to
> > 0.0.0.0, and advertise itself using the agent ip and the relevant port.
> > 2.b) If the container has a routable ip (e.g., using calico or overlay),
> > libprocess should still bind to 0.0.0.0, and advertise itself using the
> > container ip and the relevant port. Currently, it binds to agent ip
> > (which will fail), and advertises itself using agent ip and the port in
> > the container (which will fail as well).
> > 2.c) If the container has a private ip (e.g., bridge), libprocess should
> > still bind to 0.0.0.0, and advertise itself using the agent ip and
> > _mapped_ host port. Currently, it binds to agent ip (which will fail),
> > and advertises itself using agent ip and the port in the container
> > (which will fail as well).
> >
> > Therefore, the workaround
> > <https://github.com/mesosphere/mesos/commit/b9c622b53b3ffcc27911fcdcefc37a52ebe33bdd>
> > suggested in MESOS-3740
> > <https://issues.apache.org/jira/browse/MESOS-3740> is not ideal. It does
> > not consider 2.b) and 2.c).
> >
> > Libprocess now supports both LIBPROCESS_IP and LIBPROCESS_ADVERTISE_IP
> > so the bind address does not have to be the address that is being
> > advertised.
> >
> > For the 2.c) case, Mesos doesn't have a way to determine the advertise
> > port (mapped port). This information is only known to the framework
> > (which host port it'll use to serve as the mapped port for the
> > libprocess).
> >
> > Given that, I think Mesos should not blindly set LIBPROCESS_IP to agent
> > IP in executor environment variables.
> > Framework should be the one that sets LIBPROCESS_ADVERTISE_IP and
> > LIBPROCESS_ADVERTISE_PORT appropriately if it tries to launch another
> > Mesos framework so that Master can reach the new framework. If the
> > framework just wants to launch a regular container that does not depend
> > on libprocess, it should simply not set these env variables.
> >
> > Also, I think libprocess should always bind to 0.0.0.0, rather than
> > doing a hostname lookup and binding to the IP found for the hostname.
> > LIBPROCESS_ADVERTISE_IP can be used to overwrite the ip address it wants
> > to advertise to peers. If that's not specified, it'll try to do a
> > hostname lookup to guess a routable ip.
> >
> > Thoughts?
> >
> > - Jie
>
> --
> Best Regards,
> Haosdent Huang

--
Avinash Sridharan, Mesosphere
+1 (323) 702 5245
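
P.S. For reference, the three networking cases in Jie's mail (2.a to 2.c) could be handled on the framework side roughly as sketched below. This is only an illustration in Python under my own assumptions: the helper `advertise_env`, its parameters, and the network labels "host", "routable" and "bridge" are hypothetical and not part of any Mesos API. Only the two environment variable names come from the thread.

```python
def advertise_env(network, agent_ip, listen_port, container_ip=None,
                  host_port=None):
    """Build the advertise variables for a task that uses libprocess.

    Hypothetical helper. `network` is one of the made-up labels:
      "host"     -> case 2.a: agent IP and the port libprocess listens on
      "routable" -> case 2.b: container IP and the port in the container
      "bridge"   -> case 2.c: agent IP and the mapped host port
    """
    if network == "host":
        ip, port = agent_ip, listen_port
    elif network == "routable":
        ip, port = container_ip, listen_port
    elif network == "bridge":
        ip, port = agent_ip, host_port
    else:
        raise ValueError("unknown network type: %r" % (network,))

    if ip is None or port is None:
        raise ValueError("missing IP or port for network %r" % (network,))

    return {
        "LIBPROCESS_ADVERTISE_IP": ip,
        "LIBPROCESS_ADVERTISE_PORT": str(port),
    }
```

For example, `advertise_env("bridge", "10.0.0.1", 5050, host_port=31000)` would advertise the agent IP with the mapped host port, which is the piece of information only the framework has. A framework launching a regular container that does not use libprocess would simply not call this at all.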
