On Tuesday, January 27, 2015 at 2:53:46 PM UTC-8, John Bollinger wrote:
>
>
> As far as I can tell, it is a design characteristic of the current hosts 
> file format that it associates each address with exactly one canonical 
> name, and each canonical name with exactly one address.  This is a bit 
> ticklish, though, because there seems to be no canonical reference for the 
> file format itself.  Nevertheless, the Linux manpage for it says it has *one 
> line* per IP address, and that "[f]or each host *a single line* should be 
> present [...]" (emphasis added).
>

I've done some investigation into various implementations of DNS resolvers 
and can say that the documentation (or at least this interpretation 
thereof) is inaccurate.  getaddrinfo(3), which is used on all platforms 
including Windows to resolve hostnames, provides a linked list of results 
and can optionally provide the canonical hostname.  When a hostname is 
listed in the hosts file multiple times, getaddrinfo(3) provides all the IP 
addresses that are listed for that host, just like it would if it had to 
fetch that information from DNS.  There are, however, some differences 
between platforms as to how canonical names are determined if a host is 
listed as the canonical name for one IP and an alias for another.  
Explanation of this will take some doing, so bear with me here.

If you have the following hosts file:

1.1.1.1    host
1.1.2.2    host
1.1.1.1    other

Doing a lookup on "other" will return a linked list with one element, which 
contains the IP address 1.1.1.1 and the canonical name "other".  This is 
true on all platforms I tested (i.e. Linux, FreeBSD, OpenBSD, OS X/Darwin, 
Windows).  Doing a lookup on "host" will return a linked list with two 
elements, the first with IP address 1.1.1.1 and canonical name "host", and 
the second with IP address 1.1.2.2.  This is where things differ.  
According to the documentation for all platforms, only the first item in 
the list is supposed to have the canonical name.  Despite this, the BSD 
variants (i.e. FreeBSD, OpenBSD, and OS X/Darwin) fill in the canonical 
name on all elements of the list.  So on Linux and Windows, the second 
element has no canonical name provided, but on the BSDs "host" is listed 
for the second element.
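
A rough way to reproduce this kind of observation is sketched below, in Python rather than C for brevity. It relies on `socket.getaddrinfo` with the `AI_CANONNAME` flag, which wraps the platform's own getaddrinfo(3). The helper names `summarize` and `lookup` are made up for this illustration, and restricting the query to IPv4 and `SOCK_STREAM` is an assumption made only to keep the list short; it is not necessarily how the tests described here were run.

```python
import socket

def summarize(results):
    """Reduce getaddrinfo() tuples to (ip, canonical_name_or_None) pairs."""
    summary = []
    for family, socktype, proto, canonname, sockaddr in results:
        # Python reports a missing canonical name as an empty string.
        summary.append((sockaddr[0], canonname or None))
    return summary

def lookup(name):
    """Resolve `name` and report each address with its canonical name."""
    results = socket.getaddrinfo(
        name, None,
        family=socket.AF_INET,        # assumption: IPv4 only
        type=socket.SOCK_STREAM,      # avoid one entry per socket type
        flags=socket.AI_CANONNAME,    # ask for the canonical name
    )
    return summarize(results)

# Example use (output depends on the local hosts file and platform):
# print(lookup("host"))
```

Running `lookup("host")` and `lookup("other")` against each of the hosts files in this message should show which list elements carry a canonical name on a given platform.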

If you have this hosts file:

1.1.1.1    host
1.1.2.2    host    other
1.1.1.1    other

Things are a bit different.  The results for looking up "host" are the same 
on all platforms as they were in the previous example, but when looking up 
"other" things vary.  On Linux, the first element has an IP address of 
"1.1.2.2" and a canonical name of "other" and the second element has an IP 
address of "1.1.1.1" and no canonical name.  On FreeBSD and OpenBSD, the 
first element has an IP address of "1.1.2.2" and a canonical name of "host" 
and the second element has an IP address of "1.1.1.1" and a canonical name 
of "other".  On OS X, the order gets switched up; the first element has an 
IP address of "1.1.1.1" and a canonical name of "other" and the second 
element has an IP address of "1.1.2.2" and a canonical name of "other".  On 
Windows, the first element has an IP address of "1.1.1.1" and a canonical 
name of "other" and there is no second element because I guess as far as 
Windows is concerned hostnames can be either canonical or aliases but not 
both, and canonical takes precedence.

If you have this hosts file:

1.1.1.1    host    other
1.1.2.2    host
1.1.1.1    other

Things get weird when looking up "other".  Windows stands by its stance of 
"canonical and alias are mutually exclusive" and provides a single element 
containing "1.1.1.1" and "other".  Linux provides two elements with IP 
address "1.1.1.1", the first 
with a canonical name of "other" and the second without.  FreeBSD and 
OpenBSD provide two elements with IP address "1.1.1.1", the first with 
canonical name "host" and the second with canonical name "other".  OS 
X/Darwin, however, does something weird.  It provides three elements, the 
first two with IP address "1.1.1.1" and canonical name "other", and the 
third with IP address "1.1.2.2" and canonical name "other".

What?

As far as I can tell, OS X/Darwin's outlook is, "Well, the canonical name 
was something else sometimes, so I went ahead and resolved that for you 
too.  Also, I didn't make any way for you to tell which items had the 
different canonical name.  You're welcome!"  This behavior continues if you 
use this hosts file:

1.1.1.1    host    other
1.1.2.2    host

In this case, when resolving "other", every platform except for OS X/Darwin 
provides one element with an IP address of "1.1.1.1" and a canonical name 
of "host".  By contrast, OS X/Darwin provides two elements, the first with 
IP address "1.1.1.1" and canonical name "host" and the second with IP 
address "1.1.2.2" and canonical name "host".


So yeah.  I do think that the host type should support specifying multiple 
IPs for the same hostname, because every resolver implementation I can 
track down seems to support that (with the possible exception of Solaris, 
which I can check tomorrow, though I very much doubt that it'll prove an 
outlier).  This makes sense as the hosts file is something of a poor man's 
lightning-fast DNS server.  It may be worth also putting some logic in to 
detect cases where a hostname is specified as both a canonical name and an 
alias and throwing a warning or an error.  Also to detect cases (at least 
on OS X) where an alias is specified for some but not all of the IP 
addresses listed for its canonical name.
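
The two checks suggested above could look roughly like the following Python sketch. It is not Puppet code: `parse_hosts` and `find_conflicts` are hypothetical names, and the parsing assumes the common "address, canonical name, aliases" line layout with `#` comments.

```python
from collections import defaultdict

def parse_hosts(text):
    """Return (ip, canonical, aliases) for each non-empty, non-comment line."""
    entries = []
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2:
            entries.append((fields[0], fields[1], fields[2:]))
    return entries

def find_conflicts(text):
    """Return (names used as both canonical and alias,
               (canonical, alias) pairs missing from some of the canonical's IPs)."""
    entries = parse_hosts(text)
    canonical = {name for _, name, _ in entries}
    aliases = {alias for _, _, als in entries for alias in als}
    both = sorted(canonical & aliases)

    ips_for = defaultdict(set)    # canonical name -> every IP it is listed for
    alias_ips = defaultdict(set)  # (canonical, alias) -> IPs carrying that alias
    for ip, name, als in entries:
        ips_for[name].add(ip)
        for alias in als:
            alias_ips[(name, alias)].add(ip)
    partial = sorted(pair for pair, ips in alias_ips.items()
                     if ips != ips_for[pair[0]])
    return both, partial
```

For the last hosts file above (`host other` on 1.1.1.1 and plain `host` on 1.1.2.2), this reports no name in both roles but flags `("host", "other")` as an alias covering only some of the canonical name's addresses, which is the case where OS X/Darwin diverged.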

 

> Indeed, though the type's documentation merely says that a Host resource 
> represents a "host entry", the longtime design demonstrates that it more 
> specifically represents a mapping from a canonical hostname to properties 
> of that hostname including a network address.  It is implicit in the 
> historic use of hostname alone as namevar that duplicate canonical names 
> cannot be modeled.  That these entries are typically recorded in /etc/hosts 
> (on some systems) is in fact a function of the provider and of the 'target' 
> property, so really the format and allowed usage of particular host files 
> in particular contexts can be only weak guidance for whether the model is 
> appropriate.
>

By contrast, in Chef the namevar is the IP address 
<https://github.com/customink-webops/hostsfile/>.  The fact that the 
canonical hostname is the namevar here is merely a choice that was made 
when the type was designed, and I am of the opinion that the design is 
flawed.
 

> Objection, Your Honor!  Describing the issue as flawed parsing assumes 
> that the files being parsed are correct, and that they are (intended to be) 
> supported by Puppet, but the validity of both assertions is unclear.  To be 
> sure, Host files containing more than one record bearing the same canonical 
> name do not comply with Puppet's model for host entries.  It is 
> unsurprising that Puppet does not handle such files well, but that could as 
> easily be ascribed to invalid/incompatible files as to flawed parsing.
>

This is true, and doesn't really matter for a system entirely provisioned 
by/with Puppet, but if one is attempting to Puppetize infrastructure that 
already exists, this may well be a problem, one with no good solution at 
present other than to fall back to managing the hosts file with a file 
resource.  This is suboptimal.
 

> Additionally, perhaps it would be better to deprecate the Host resource in 
> favor of something different, maybe a "HostEntry", that is not burdened 
> with the same limitations.
>

I actually suggested this very thing in my most recent comment on the JIRA 
issue 
<https://tickets.puppetlabs.com/browse/PUP-3901?focusedCommentId=133116&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-133116>,
 
albeit as part of a way to enable relationships without requiring collectors.
