Re: [HACKERS] Hostnames in pg_hba.conf

Mark Mielke Thu, 11 Feb 2010 07:37:26 -0800

On 02/11/2010 08:13 AM, Bart Samwel wrote:

ISSUE #1: Performance / caching
At present, I've simply not added caching. The reasoning for this is as follows:
(a) getaddrinfo doesn't tell us about expiry, so when do you refresh?
(b) If you put the cache in the postmaster, it will not work for exec-based backends as opposed to fork-based backends, since those read pg_hba.conf every time they are exec'ed.
(c) If you put this in the postmaster, the postmaster will have to update the cache every once in a while, which may be slow and which may prevent new connections while the cache update takes place.
(d) Outdated cache entries may inexplicably and without any logging choose the wrong rule for some clients. Big aargh: people will start using this to specify 'deny' rules based on host names.
If you COULD get expiry info out of getaddrinfo you could potentially store this info in a table or something like that, and have it updated by the backends? But that's way over my head for now. ISTM that this stuff may better be handled by a locally-running caching DNS server, if people have performance issues with the lack of caching. These local caching DNS servers can also handle expiry correctly, etcetera.
We should of course still take care to look up a given hostname only once for each connection request.

You should cache for some minimal amount of time or some minimal number of records - even if it's just one minute, and even if it's a fixed length LRU sorted list. This would deal with situations where a new connection is raised several times a second (some types of load). For connections raised once a minute or less, the benefit of caching is far less. But, this can be a feature tagged on later if necessary and doesn't need to gate the feature.
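A minimal sketch of the kind of cache described above: a fixed-length LRU list with a fixed expiry time. It is written in Python for brevity rather than as backend C, and the names HostCache, system_resolver, ttl and max_entries are made up for illustration; nothing here is from the patch under discussion.

```python
import socket
import time
from collections import OrderedDict


class HostCache:
    """Fixed-size LRU cache of hostname -> addresses with a fixed TTL."""

    def __init__(self, resolver, ttl=60.0, max_entries=64, clock=time.monotonic):
        self.resolver = resolver      # callable: hostname -> list of address strings
        self.ttl = ttl                # seconds an entry stays valid (assumed: one minute)
        self.max_entries = max_entries
        self.clock = clock            # injectable so expiry can be exercised in tests
        self.entries = OrderedDict()  # hostname -> (expiry time, addresses)

    def lookup(self, hostname):
        now = self.clock()
        hit = self.entries.get(hostname)
        if hit is not None and hit[0] > now:
            self.entries.move_to_end(hostname)    # mark as most recently used
            return hit[1]
        addresses = self.resolver(hostname)       # miss or expired: resolve again
        self.entries[hostname] = (now + self.ttl, addresses)
        self.entries.move_to_end(hostname)
        while len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)      # evict the least recently used entry
        return addresses


def system_resolver(hostname):
    """Resolve via getaddrinfo and return the distinct addresses as strings."""
    infos = socket.getaddrinfo(hostname, None, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})
```

Something like HostCache(system_resolver).lookup("foo.example.com") would then hit the resolver at most once a minute per name. Since getaddrinfo gives no expiry information, the fixed TTL is a guess, and a stale entry can still pick the wrong rule until it expires.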

Many UNIX/Linux boxes have some sort of built-in cache, sometimes persistent, sometimes shared. On my Linux box, I have nscd - "name service cache daemon" - which should be able to cache these sorts of lookups. I believe it is used for things as common as mapping uid to username in output of "/bin/ls -l", so it does need to be pretty fast.

The difference between in process cache and something like "nscd" is the inter-process communication required to use "nscd".

ISSUE #2: Reverse lookup?
There was a suggestion on the TODO list on the wiki, which basically said that maybe we could use reverse lookup to find "the" hostname and then check for that hostname in the list. I think that won't work, since IPs can go by many names and may not support reverse lookup for some hostnames (/etc/hosts anybody?). Furthermore, due to the top-to-bottom processing of pg_hba.conf, you CANNOT SKIP entries that might possibly match. For instance, if the third line is for host "foo.example.com" and the fifth line is for "bar.example.com", both lines may apply to the same IP, and you still HAVE to check the first one, even if reverse lookup turns up the second host name. So it doesn't save you any lookups, it just costs an extra one.

I don't see a need to do a reverse lookup. Reverse lookups are sometimes done as a verification check, in the sense that it's cheap to get a map from NAME -> IP, but sometimes it is much harder to get the reverse map from IP -> NAME. However, it's not a reliable check as many legitimate users have trouble getting a reverse map from IP -> NAME. It also doesn't save anything as IP -> NAME lookups are a completely different set of name servers, and these name servers are not always optimized for speed as IP -> NAME lookups are less common than NAME -> IP. Finally, if one finds a map from IP -> NAME, that doesn't prove that a map from NAME -> IP exists, so using *any* results from IP -> NAME is questionable.


I think reverse lookups are unnecessary and undesirable.
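To make the forward-only approach concrete, here is a small sketch (Python, hypothetical names, and rules reduced to a bare list of hostnames in file order, which is a simplification of real pg_hba.conf entries). It walks the rules top to bottom, resolves each name NAME -> IP, compares against the client address, and remembers results so that a given hostname is looked up only once per connection request.

```python
import ipaddress


def first_matching_rule(rules, client_ip, resolve):
    """Return the index of the first rule that matches client_ip, or None.

    rules   -- hostnames in file order (simplified stand-in for pg_hba lines)
    resolve -- callable: hostname -> iterable of address strings (forward lookup)
    """
    client = ipaddress.ip_address(client_ip)
    seen = {}  # per-connection memo: hostname -> set of resolved addresses
    for index, hostname in enumerate(rules):
        if hostname not in seen:
            seen[hostname] = {ipaddress.ip_address(a) for a in resolve(hostname)}
        if client in seen[hostname]:
            return index  # top-to-bottom: the first match wins
    return None
```

No reverse lookup appears anywhere: every rule above the eventual match has to be checked regardless, so an IP -> NAME query could only add a lookup, not remove one.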

ISSUE #3: Multiple hostnames?
Currently, a pg_hba entry lists an IP / netmask combination. I would suggest allowing lists of hostnames in the entries, so that you can at least mimic the "match multiple hosts by a single rule". Any reason not to do this?

I'm mixed. In some situations, I've wanted to put multiple IP/netmask. I would say that if multiple names are supported, then multiple IP/netmask should be supported. But, this does make the lines unwieldy beyond two or three. This direction leans towards the capability to define "host classes", where a rule allows a host class, and the host class can have a list of hostnames.


Two other aspects I don't see mentioned:

1) What will you do for hostnames that have multiple IP addresses? Will you accept all IP addresses as being valid?
2) What will you do if they specify a hostname and a netmask? This seems like a convenient way of saying "everybody on the same subnet as NAME."
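One possible answer to both questions, sketched in Python with made-up names (host_rule_matches, prefix_len): accept any address the name resolves to, and if a netmask is given, apply it to each resolved address in turn. This is only one interpretation; the questions above are left open.

```python
import ipaddress


def host_rule_matches(client_ip, resolved_addresses, prefix_len=None):
    """Check a client against every address a hostname resolved to.

    With prefix_len=None the client must equal one of the addresses.
    With a prefix length, the client must fall in the subnet of one of them,
    i.e. "everybody on the same subnet as NAME".
    """
    client = ipaddress.ip_address(client_ip)
    for text in resolved_addresses:
        addr = ipaddress.ip_address(text)
        if addr.version != client.version:
            continue  # never compare IPv4 against IPv6
        if prefix_len is None:
            if client == addr:
                return True
        else:
            # strict=False masks off the host bits of the resolved address
            network = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
            if client in network:
                return True
    return False
```

For example, host_rule_matches("192.0.2.200", ["192.0.2.10"], prefix_len=24) is true, because 192.0.2.200 lies in 192.0.2.0/24, the subnet derived from the resolved address.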


Cheers,
mark

--
Mark Mielke<[email protected]>
