> > While I believe this particular logic of setting LIBPROCESS_ADVERTISE_IP to agent IP can be done in the agent (it could look at the port mapping as well)

What if there are multiple port mappings? How can the agent decide which port
should be used as LIBPROCESS_ADVERTISE_PORT?

On Tue, Oct 11, 2016 at 9:27 PM, Avinash Sridharan wrote:

> Definitely a +1 for executor binding to 0.0.0.0, instead of doing a
> `gethostname` and `getaddrinfo`. But I am assuming these semantics would
> kick in only if LIBPROCESS_IP is not set, which should be the norm.
>
> +1 for LIBPROCESS_ADVERTISE_IP and LIBPROCESS_ADVERTISE_PORT and the onus
> being on the frameworks to set these variables. I guess the framework can
> set LIBPROCESS_ADVERTISE_IP to the agent IP and
> LIBPROCESS_ADVERTISE_PORT to the host port when it specifies a
> port mapping. While I believe this particular logic of setting
> LIBPROCESS_ADVERTISE_IP to the agent IP can be done in the agent (it
> could look at the port mapping as well), when to actually set these
> variables (whether the executors even need to advertise their IP
> addresses) is a decision that the frameworks should be privy to and not
> left to the agent.
>
> On Tue, Oct 11, 2016 at 7:31 PM, haosdent wrote:
>
> > > libprocess should always bind to 0.0.0.0
> >
> > +1 for this
> >
> > On Wed, Oct 12, 2016 at 2:33 AM, Jie Yu wrote:
> >
> > > Hi folks,
> > >
> > > I was in the process of cleaning up some tech debt related to env
> > > variables in our code base. I created an epic ticket
> > > <https://issues.apache.org/jira/browse/MESOS-6341> to track it. I
> > > searched for relevant tickets filed previously, and found MESOS-3740
> > > <https://issues.apache.org/jira/browse/MESOS-3740>. I did some digging
> > > on how we handle LIBPROCESS_IP currently, and here are my findings:
> > >
> > > 1) We always set LIBPROCESS_IP in the executor environment variables:
> > > https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L6793-L6796
> > >
> > > This is not an issue for an executor that runs on host network.
> > > However, if the executor wants to run on a non-host network (e.g.,
> > > overlay), this might be problematic, because libprocess for the
> > > executor will try to bind to LIBPROCESS_IP, but the IP is not valid
> > > inside the container.
> > >
> > > 2) As mentioned in MESOS-3740
> > > <https://issues.apache.org/jira/browse/MESOS-3740>, some users want to
> > > run a Mesos framework in a Mesos container. The old-style framework
> > > driver assumes a two-way communication channel between the framework
> > > and the Mesos master. In order for the master to reach the framework
> > > running inside a Mesos container, the framework's libprocess should
> > > advertise its IP and port properly. This problem gets tricky because of
> > > the networking for the Mesos container:
> > >
> > > 2.a) If the container uses the host network, libprocess should bind to
> > > 0.0.0.0 and advertise itself using the agent IP and the relevant port.
> > > 2.b) If the container has a routable IP (e.g., using calico or
> > > overlay), libprocess should still bind to 0.0.0.0 and advertise itself
> > > using the container IP and the relevant port. Currently, it binds to
> > > the agent IP (which will fail) and advertises itself using the agent IP
> > > and the port in the container (which will fail as well).
> > > 2.c) If the container has a private IP (e.g., bridge), libprocess
> > > should still bind to 0.0.0.0 and advertise itself using the agent IP
> > > and the _mapped_ host port. Currently, it binds to the agent IP (which
> > > will fail) and advertises itself using the agent IP and the port in the
> > > container (which will fail as well).
> > >
> > > Therefore, the workaround
> > > <https://github.com/mesosphere/mesos/commit/b9c622b53b3ffcc27911fcdcefc37a52ebe33bdd>
> > > suggested in MESOS-3740
> > > <https://issues.apache.org/jira/browse/MESOS-3740> is not ideal.
> > > It does not consider 2.b) and 2.c).
> > >
> > > Libprocess now supports both LIBPROCESS_IP and LIBPROCESS_ADVERTISE_IP,
> > > so the bind address does not have to be the address that is being
> > > advertised.
> > >
> > > For the 2.c) case, Mesos doesn't have a way to determine the advertise
> > > port (mapped port). This information is only known to the framework
> > > (which host port it'll use to serve as the mapped port for the
> > > libprocess).
> > >
> > > Given that, I think Mesos should not blindly set LIBPROCESS_IP to the
> > > agent IP in executor environment variables. The framework should be the
> > > one that sets LIBPROCESS_ADVERTISE_IP and LIBPROCESS_ADVERTISE_PORT
> > > appropriately if it tries to launch another Mesos framework, so that
> > > the master can reach the new framework. If the framework just wants to
> > > launch a regular container that does not depend on libprocess, it
> > > should simply not set these env variables.
> > >
> > > Also, I think libprocess should always bind to 0.0.0.0, rather than
> > > doing a hostname lookup and binding to the IP found for the hostname.
> > > LIBPROCESS_ADVERTISE_IP can be used to override the IP address it wants
> > > to advertise to peers. If that's not specified, it'll try to do a
> > > hostname lookup to guess a routable IP.
> > >
> > > Thoughts?
> > >
> > > - Jie
> >
> > --
> > Best Regards,
> > Haosdent Huang
>
> --
> Avinash Sridharan, Mesosphere
> +1 (323) 702 5245
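
To make cases 2.a) to 2.c) concrete, here is a rough sketch of how a framework
might pick the environment variables for a nested framework. It is only an
illustration in Python, not Mesos or libprocess code: `choose_libprocess_env`
and all of its parameters are hypothetical names, and setting LIBPROCESS_IP to
0.0.0.0 explicitly is an assumption that stands in for the "always bind to
0.0.0.0" behavior proposed in this thread.

```python
from typing import Dict, Optional


def choose_libprocess_env(
    network: str,
    agent_ip: str,
    container_ip: Optional[str] = None,
    libprocess_port: Optional[int] = None,
    mapped_host_port: Optional[int] = None,
) -> Dict[str, str]:
    """Hypothetical helper: env variables for a libprocess-based task.

    network is "host" (2.a), "routable" (2.b) or "bridge" (2.c).
    """
    # Assumption: bind to all interfaces in every case.
    env = {"LIBPROCESS_IP": "0.0.0.0"}

    if network == "host":
        # 2.a) Advertise the agent IP and the port libprocess listens on.
        env["LIBPROCESS_ADVERTISE_IP"] = agent_ip
        port = libprocess_port
    elif network == "routable":
        # 2.b) Advertise the container's own routable IP.
        if container_ip is None:
            raise ValueError("routable network requires container_ip")
        env["LIBPROCESS_ADVERTISE_IP"] = container_ip
        port = libprocess_port
    elif network == "bridge":
        # 2.c) Advertise the agent IP and the *mapped* host port. Only the
        # caller knows which mapping belongs to libprocess, so it must be
        # passed in explicitly.
        if mapped_host_port is None:
            raise ValueError("bridge network requires mapped_host_port")
        env["LIBPROCESS_ADVERTISE_IP"] = agent_ip
        port = mapped_host_port
    else:
        raise ValueError("unknown network type: " + network)

    # Leave the advertise port unset if the caller did not pin one.
    if port is not None:
        env["LIBPROCESS_ADVERTISE_PORT"] = str(port)
    return env


if __name__ == "__main__":
    # Bridge-mode example with made-up addresses and ports.
    print(choose_libprocess_env("bridge", "10.0.0.5", mapped_host_port=31005))
```

Note that in the bridge case the sketch refuses to guess: with several port
mappings there is no way to infer which host port fronts libprocess, which is
why the mapped port has to come from whoever defined the mappings.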
