acecile5555555 created MESOS-10190:
--------------------------------------
Summary: libprocess fails with "Failed to obtain the IP address
for <uuid>" when using CNI on some hosts
Key: MESOS-10190
URL: https://issues.apache.org/jira/browse/MESOS-10190
Project: Mesos
Issue Type: Bug
Components: executor
Affects Versions: 1.9.0
Reporter: acecile5555555
Hello,
We deployed CNI support, and 3 of our hosts (all identical) fail to start
containers with CNI enabled. The log shows:
{noformat}
E0917 16:58:11.481551 16770 process.cpp:1153] EXIT with status 1: Failed to
obtain the IP address for '7c4beac7-5385-4dfa-845a-beb01e13c77c'; the DNS
service may not be able to resolve it: Name or service not known
{noformat}
So I tried forcing LIBPROCESS_IP through an environment variable, but I saw
that Mesos overwrites it. I then rebuilt Mesos with additional debugging, and
here is the log:
{noformat}
Overwriting environment variable 'LIBPROCESS_IP' from '10.99.50.3' to '0.0.0.0'
E0917 16:34:49.779429 31428 process.cpp:1153] EXIT with status 1: Failed to
obtain the IP address for 'de65bbd8-b237-4884-ba87-7e13cb85078f'; the DNS
service may not be able to resolve it: Name or service not known
{noformat}
According to the code, it is expected to be set to 0.0.0.0 (MESOS-5127). So I
tried to understand why libprocess attempts to resolve a container run UUID
instead of the hostname. Here is the libprocess code:
{noformat}
// Resolve the hostname if ip is 0.0.0.0 in case we actually have
// a valid external IP address. Note that we need only one IP
// address, so that other processes can send and receive and
// don't get confused as to whom they are sending to.
if (__address__.ip.isAny()) {
  char hostname[512];

  if (gethostname(hostname, sizeof(hostname)) < 0) {
    PLOG(FATAL) << "Failed to initialize, gethostname";
  }

  // Lookup an IP address of local hostname, taking the first result.
  Try<net::IP> ip = net::getIP(hostname, __address__.ip.family());

  if (ip.isError()) {
    EXIT(EXIT_FAILURE)
      << "Failed to obtain the IP address for '" << hostname << "';"
      << " the DNS service may not be able to resolve it: " << ip.error();
  }

  __address__.ip = ip.get();
}
{noformat}
Well, this is actually perfectly fine, except that "gethostname" returns the
container UUID instead of a hostname that resolves to a valid host IP address.
How is that even possible?
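For anyone who wants to check this outside of Mesos, here is a minimal standalone sketch of the same two steps: read the hostname, then resolve it and take the first result. It is not libprocess code. The helper names localHostname and firstAddress are made up for illustration, and using getaddrinfo in place of net::getIP is an assumption about how that lookup behaves.

```cpp
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <optional>
#include <string>

// Hypothetical helper: returns what gethostname(2) reports in the current
// namespace, or nullopt if the call fails.
std::optional<std::string> localHostname() {
  char buffer[512];
  if (gethostname(buffer, sizeof(buffer)) != 0) {
    return std::nullopt;
  }
  // gethostname is not guaranteed to NUL-terminate a truncated name.
  buffer[sizeof(buffer) - 1] = '\0';
  return std::string(buffer);
}

// Hypothetical helper: resolves `name` for the given address family
// (AF_INET or AF_INET6) and returns the first result as text, or nullopt
// if the name cannot be resolved.
std::optional<std::string> firstAddress(const std::string& name, int family) {
  addrinfo hints{};
  hints.ai_family = family;
  hints.ai_socktype = SOCK_STREAM;

  addrinfo* results = nullptr;
  if (getaddrinfo(name.c_str(), nullptr, &hints, &results) != 0 ||
      results == nullptr) {
    return std::nullopt;
  }

  // Pick the raw address field that matches the family of the first result.
  const void* raw = nullptr;
  if (results->ai_family == AF_INET) {
    raw = &reinterpret_cast<sockaddr_in*>(results->ai_addr)->sin_addr;
  } else if (results->ai_family == AF_INET6) {
    raw = &reinterpret_cast<sockaddr_in6*>(results->ai_addr)->sin6_addr;
  }

  char text[INET6_ADDRSTRLEN] = {0};
  std::optional<std::string> address;
  if (raw != nullptr &&
      inet_ntop(results->ai_family, raw, text, sizeof(text)) != nullptr) {
    address = std::string(text);
  }

  freeaddrinfo(results);
  return address;
}
```

Calling firstAddress(*localHostname(), AF_INET) from inside the failing container should show whether the name reported there (the UUID, in my case) can be resolved at all. An empty result would correspond to the "Name or service not known" error above.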
Any help would be greatly appreciated.
Regards, Adam.