>
> While I believe this particular logic of setting LIBPROCESS_ADVERTISE_IP
> to agent IP can be done in the agent (it could look at the port mapping
> as well)


What if there are multiple port mappings? How can the agent decide which
port to use as LIBPROCESS_ADVERTISE_PORT?
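
To make the ambiguity concrete, here is a minimal sketch in Python. It is
not Mesos code: `PortMapping`, `pick_advertise_port`, and the idea that the
agent might know the libprocess port inside the container are all
assumptions made for illustration. With a single mapping the choice is
obvious. With several, the agent can only decide if it somehow knows which
container port libprocess listens on.

```python
from typing import NamedTuple, Optional, Sequence


class PortMapping(NamedTuple):
    """Hypothetical stand-in for one host-port to container-port mapping."""
    host_port: int
    container_port: int


def pick_advertise_port(
    mappings: Sequence[PortMapping],
    libprocess_port: Optional[int] = None,
) -> Optional[int]:
    """Return the host port to advertise, or None if the choice is ambiguous.

    libprocess_port is the port libprocess listens on inside the container,
    if the caller happens to know it (an assumption; the agent may not).
    """
    if libprocess_port is not None:
        # Keep only the mappings that target the libprocess port.
        candidates = [m.host_port for m in mappings
                      if m.container_port == libprocess_port]
    else:
        # Without that knowledge, every mapping is a candidate.
        candidates = [m.host_port for m in mappings]

    # Exactly one candidate is the only unambiguous case.
    return candidates[0] if len(candidates) == 1 else None
```

For example, given mappings for both a web port and the libprocess port,
`pick_advertise_port(mappings)` returns `None`, while passing the libprocess
container port as the second argument resolves it. The framework already
has that information, which is the argument for letting it set the variable.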

On Tue, Oct 11, 2016 at 9:27 PM, Avinash Sridharan <[email protected]>
wrote:

> Definitely a +1 for executor binding to 0.0.0.0, instead of doing a
> `gethostname` and `getaddrinfo`. But I am assuming these semantics would
> kick in only if LIBPROCESS_IP is not set, which should be the norm.
>
> +1 for LIBPROCESS_ADVERTISE_IP and LIBPROCESS_ADVERTISE_PORT and the onus
> being on the frameworks to set these variables. I guess the framework can
> set the LIBPROCESS_ADVERTISE_IP to the agent IP and
> LIBPROCESS_ADVERTISE_PORT to the host port when it specifies a
> port-mapping. While I believe this particular logic of
> setting LIBPROCESS_ADVERTISE_IP to agent IP can be done in the agent (it
> could look at the port mapping as well), when to actually set these
> variables (whether the executors even need to advertise their IP addresses)
> is a decision that the frameworks should be privy to and not left to the
> agent.
>
> On Tue, Oct 11, 2016 at 7:31 PM, haosdent <[email protected]> wrote:
>
> > > libprocess should always bind to 0.0.0.0
> > + 1 for this
> >
> > On Wed, Oct 12, 2016 at 2:33 AM, Jie Yu <[email protected]> wrote:
> >
> > > Hi folks,
> > >
> > > I was in the process of cleaning up some tech debt related to env
> > > variables in our code base. I created an epic ticket
> > > <https://issues.apache.org/jira/browse/MESOS-6341> to track it. I
> > > searched for relevant tickets filed previously, and found MESOS-3740
> > > <https://issues.apache.org/jira/browse/MESOS-3740>. I did some digging
> > > on how we handle LIBPROCESS_IP currently, and here are my findings:
> > >
> > > 1) We always set LIBPROCESS_IP in the executor environment variables:
> > > https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L6793-L6796
> > >
> > > This is not an issue for an executor that runs on the host network.
> > > However, if the executor wants to run on a non-host network (e.g.,
> > > overlay), this might be problematic, because libprocess for the
> > > executor will try to bind to LIBPROCESS_IP, but the IP is not valid
> > > inside the container.
> > >
> > > 2) As mentioned in MESOS-3740
> > > <https://issues.apache.org/jira/browse/MESOS-3740>, some users want to
> > > run a Mesos framework in a Mesos container. The old style framework
> > > driver assumes a 2 way communication channel between the framework and
> > > the Mesos master. In order for the master to reach the framework
> > > running inside a Mesos container, the framework's libprocess should
> > > advertise its ip and port properly. This problem gets tricky because
> > > the right behavior depends on the networking for the Mesos container:
> > >
> > > 2.a) If the container uses host network, libprocess should bind to
> > > 0.0.0.0, and advertise itself using the agent ip and the relevant port
> > > 2.b) If the container has a routable ip (e.g., using calico or
> > > overlay), libprocess should still bind to 0.0.0.0, and advertise itself
> > > using the container ip and the relevant port. Currently, it binds to
> > > agent ip (which will fail), and advertises itself using agent ip and
> > > the port in the container (which will fail as well)
> > > 2.c) If the container has a private ip (e.g., bridge), libprocess
> > > should still bind to 0.0.0.0, and advertise itself using the agent ip
> > > and _mapped_ host port. Currently, it binds to agent ip (which will
> > > fail), and advertises itself using agent ip and the port in the
> > > container (which will fail as well)
> > >
> > > Therefore, the workaround
> > > <https://github.com/mesosphere/mesos/commit/b9c622b53b3ffcc27911fcdcefc37a52ebe33bdd>
> > > suggested in MESOS-3740
> > > <https://issues.apache.org/jira/browse/MESOS-3740> is not ideal. It
> > > does not consider 2.b) and 2.c).
> > >
> > > Libprocess now supports both LIBPROCESS_IP and LIBPROCESS_ADVERTISE_IP
> > > so the bind address does not have to be the address that is being
> > > advertised.
> > >
> > > For the 2.c) case, Mesos doesn't have a way to determine the advertise
> > > port (mapped port). This information is only known to the framework
> > > (which host port it'll use to serve as the mapped port for the
> > > libprocess).
> > >
> > > Given that, I think Mesos should not blindly set LIBPROCESS_IP to agent
> > > IP in executor environment variables. The framework should be the one
> > > that sets LIBPROCESS_ADVERTISE_IP and LIBPROCESS_ADVERTISE_PORT
> > > appropriately if it tries to launch another Mesos framework so that the
> > > master can reach the new framework. If the framework just wants to
> > > launch a regular container that does not depend on libprocess, it
> > > should simply not set these env variables.
> > >
> > > Also, I think libprocess should always bind to 0.0.0.0, rather than
> > > doing a hostname lookup and binding to the IP found for the hostname.
> > > LIBPROCESS_ADVERTISE_IP can be used to override the ip address it
> > > wants to advertise to peers. If that's not specified, it'll try to do a
> > > hostname lookup to guess a routable ip.
> > >
> > > Thoughts?
> > > - Jie
> > >
> >
> >
> >
> > --
> > Best Regards,
> > Haosdent Huang
> >
>
>
>
> --
> Avinash Sridharan, Mesosphere
> +1 (323) 702 5245
>
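
For readers following the thread, the bind/advertise split proposed in the
original message can be sketched roughly as follows. This is an
illustrative Python sketch, not libprocess code: `resolve_addresses` and
`hostname_lookup` are hypothetical names, and advertising the bind address
when LIBPROCESS_IP is set but LIBPROCESS_ADVERTISE_IP is not is an
assumption rather than something the thread settles.

```python
import socket
from typing import Callable, Mapping, Tuple

Address = Tuple[str, int]


def hostname_lookup() -> str:
    """Guess a routable IP by resolving the local hostname."""
    return socket.gethostbyname(socket.gethostname())


def resolve_addresses(
    env: Mapping[str, str],
    bound_port: int,
    lookup: Callable[[], str] = hostname_lookup,
) -> Tuple[Address, Address]:
    """Return ((bind_ip, bind_port), (advertise_ip, advertise_port))."""
    # Bind to all interfaces unless LIBPROCESS_IP is explicitly set.
    bind_ip = env.get("LIBPROCESS_IP", "0.0.0.0")

    advertise_ip = env.get("LIBPROCESS_ADVERTISE_IP")
    if advertise_ip is None:
        # Assumption: reuse an explicit bind IP, otherwise fall back to
        # a hostname lookup to guess a routable address.
        advertise_ip = bind_ip if bind_ip != "0.0.0.0" else lookup()

    # The advertised port defaults to the port actually bound.
    advertise_port = int(env.get("LIBPROCESS_ADVERTISE_PORT", bound_port))

    return (bind_ip, bound_port), (advertise_ip, advertise_port)
```

In the bridge-network case (2.c), a framework would pass the agent IP as
LIBPROCESS_ADVERTISE_IP and the mapped host port as
LIBPROCESS_ADVERTISE_PORT, while the process inside the container still
binds to 0.0.0.0 on its own port.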
