[ https://issues.apache.org/jira/browse/MESOS-10190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201746#comment-17201746 ]
Benjamin Mahler commented on MESOS-10190:
-----------------------------------------

cc [~qianzhang]

> libprocess fails with "Failed to obtain the IP address for <uuid>" when using
> CNI on some hosts
> -----------------------------------------------------------------------------------------------
>
>                 Key: MESOS-10190
>                 URL: https://issues.apache.org/jira/browse/MESOS-10190
>             Project: Mesos
>          Issue Type: Bug
>          Components: executor
>    Affects Versions: 1.9.0
>            Reporter: acecile5555555
>            Priority: Major
>
> Hello,
>
> We deployed CNI support and 3 of our hosts (all the same) are failing to
> start containers with CNI enabled. The log file is:
> {noformat}
> E0917 16:58:11.481551 16770 process.cpp:1153] EXIT with status 1: Failed to
> obtain the IP address for '7c4beac7-5385-4dfa-845a-beb01e13c77c'; the DNS
> service may not be able to resolve it: Name or service not known{noformat}
> So I tried forcing LIBPROCESS_IP through an environment variable, but I saw
> that Mesos overwrites it. I then rebuilt Mesos with additional debugging, and
> here is the log:
> {noformat}
> Overwriting environment variable 'LIBPROCESS_IP' from '10.99.50.3' to
> '0.0.0.0'
> E0917 16:34:49.779429 31428 process.cpp:1153] EXIT with status 1: Failed to
> obtain the IP address for 'de65bbd8-b237-4884-ba87-7e13cb85078f'; the DNS
> service may not be able to resolve it: Name or service not known{noformat}
> According to the code, it is expected to be set to 0.0.0.0 (MESOS-5127). So I
> tried to understand why libprocess attempts to resolve a container run UUID
> instead of the hostname. Here is the libprocess code:
>
> {noformat}
>   // Resolve the hostname if ip is 0.0.0.0 in case we actually have
>   // a valid external IP address. Note that we need only one IP
>   // address, so that other processes can send and receive and
>   // don't get confused as to whom they are sending to.
>   if (__address__.ip.isAny()) {
>     char hostname[512];
>
>     if (gethostname(hostname, sizeof(hostname)) < 0) {
>       PLOG(FATAL) << "Failed to initialize, gethostname";
>     }
>
>     // Lookup an IP address of local hostname, taking the first result.
>     Try<net::IP> ip = net::getIP(hostname, __address__.ip.family());
>
>     if (ip.isError()) {
>       EXIT(EXIT_FAILURE)
>         << "Failed to obtain the IP address for '" << hostname << "';"
>         << " the DNS service may not be able to resolve it: " << ip.error();
>     }
>
>     __address__.ip = ip.get();
>   }
> {noformat}
>
> This code looks perfectly fine, except that "gethostname" returns the
> container UUID instead of a valid host IP address. How is that even possible?
>
> Any help would be greatly appreciated.
> Regards, Adam.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
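The lookup quoted in the report can be reproduced outside libprocess to check what a given host or container actually resolves. The following is only a minimal sketch, not libprocess code: `localHostname` and `firstIPv4` are hypothetical helper names, and it assumes the lookup ends up in `getaddrinfo`, which the "Name or service not known" text in the log suggests but the report does not confirm.

```cpp
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <string>

// Hypothetical helper (not part of libprocess): returns the hostname as seen
// by the calling process, or an empty string if gethostname() fails.
std::string localHostname()
{
  char hostname[512] = {0};
  if (gethostname(hostname, sizeof(hostname) - 1) < 0) {
    return "";
  }
  return hostname;
}

// Hypothetical helper (not part of libprocess): resolves `name` to its first
// IPv4 address. On failure it returns an empty string and stores the resolver
// message (from gai_strerror) in `error`.
std::string firstIPv4(const std::string& name, std::string* error)
{
  addrinfo hints = {};
  hints.ai_family = AF_INET;       // IPv4 only, to keep the sketch short.
  hints.ai_socktype = SOCK_STREAM; // Avoid one result per socket type.

  addrinfo* result = nullptr;
  const int rc = getaddrinfo(name.c_str(), nullptr, &hints, &result);
  if (rc != 0) {
    *error = gai_strerror(rc);
    return "";
  }

  // Take the first result, as the quoted libprocess comment describes.
  char buffer[INET_ADDRSTRLEN] = {0};
  const sockaddr_in* addr =
    reinterpret_cast<const sockaddr_in*>(result->ai_addr);
  const char* text =
    inet_ntop(AF_INET, &addr->sin_addr, buffer, sizeof(buffer));

  std::string ip = (text != nullptr) ? text : "";
  freeaddrinfo(result);
  return ip;
}
```

Calling `firstIPv4(localHostname(), &error)` from inside an affected container and again on the host itself should show whether the two see different hostnames, and whether the container's name resolves at all.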