Re: [HACKERS] Hostnames in pg_hba.conf
On Fri, Feb 12, 2010 at 02:31, Mark Mielke m...@mark.mielke.cc wrote: But once there, it seems clear that packing hostnames or netmasks onto one line is just ugly and hard to manage. I'd like to see this extended to any of the many ways to allow hostnames to be specified one per line. For example: set tool_servers { 127.0.0.1/32 ::1/128 1.2.3.4/32 1.2.3.5/32 } host DATABASE USER $tool_servers md5 The above features easy parsing capability. Of course, then I'll ask for the ability to simplify specifying multiple databases: set databases { db1 db2 } set users { user1 user2 } host $databases $users $tool_servers md5 Sorry... :-) Definitely sounds useful! But I do now see that this is entirely orthogonal to what I was trying to do -- which means I don't have to do anything about it. :-) I think wildcards are interesting, but I have yet to see an actual use case other than it's cool and very generalized. In my mind (tell me if I'm wrong), the most common type of PostgreSQL authentication setup is within a local network within an organization. There, you either authorize an entire subnet (the entire server park or all client PCs) or you authorize specific hosts (single IP address). The wildcard case is for replacing the first case, but for that case, subnets are usually just fine. I'm trying to target the second case here. The user case would be an organization with nodes all over the IP space, that wants to manage configuration from a single place. DNS would be that single place of choice. If moves trust from trust the netmasks to be kept up-to-date to trust that DNS will be kept up-to-date. Since DNS has important reasons to be up-to-date, it's a pretty safe bet that DNS is equal or more up-to-date than pg_hba.conf hard coded netmasks. It makes sense, but it can be a later use case. It doesn't have to be in version 1. DNS is preferred to subnets in that regard, definitely. But again, that points to the per-hostname route, and it's not a use case for the wildcard route (unless people explicitly choose to organize their DNS hierarchy so that they can use it for PostgreSQL authorization -- doubtful.) Cheers, Bart
Re: [HACKERS] Avoiding bad prepared-statement plans.
Hi Robert, On Tue, Feb 9, 2010 at 17:43, Robert Haas robertmh...@gmail.com wrote: On Tue, Feb 9, 2010 at 7:08 AM, Jeroen Vermeulen j...@xs4all.nl wrote: = Projected-cost threshold = If a prepared statement takes parameters, and the generic plan has a high projected cost, re-plan each EXECUTE individually with all its parameter values bound. It may or may not help, but unless the planner is vastly over-pessimistic, re-planning isn't going to dominate execution time for these cases anyway. How high is high? Perhaps this could be based on a (configurable?) ratio of observed planning time and projected execution time. I mean, if planning it the first time took 30 ms and projected execution time is 1 ms, then by all means NEVER re-plan. But if planning the first time took 1 ms and resulted in a projected execution time of 50 ms, then it's relatively cheap to re-plan every time (cost increase per execution is 1/50 = 2%), and the potential gains are much greater (taking a chunk out of 50 ms adds up quickly). Cheers, Bart
Re: [HACKERS] Avoiding bad prepared-statement plans.
On Thu, Feb 11, 2010 at 13:25, Pavel Stehule pavel.steh...@gmail.comwrote: 2010/2/11 Bart Samwel b...@samwel.tk: Perhaps this could be based on a (configurable?) ratio of observed planning time and projected execution time. I mean, if planning it the first time took 30 ms and projected execution time is 1 ms, then by all means NEVER re-plan. But if planning the first time took 1 ms and resulted in a projected execution time of 50 ms, then it's relatively cheap to re-plan every time (cost increase per execution is 1/50 = 2%), and the potential gains are much greater (taking a chunk out of 50 ms adds up quickly). It could be a good idea. I don't belive to sophisticate methods. There can be a very simply solution. The could be a limit for price. More expensive queries can be replaned every time when the price will be over limit. I guess the required complexity depends on how variable planning costs are. If planning is typically = 2 ms, then a hard limit on estimated price is useful and can be set as low as (the equivalent of) 15 ms. However, if planning costs can be 50 ms, then the lowest reasonable fixed limit is quite a bit larger than that -- and that does not solve the problem reported earlier in this thread, where a query takes 30 ms using a generic plan and 1 ms using a specialized plan. Anyhow, I have no clue how much time the planner takes. Can anybody provide any statistics in that regard? Cheers, Bart
Re: [HACKERS] Avoiding bad prepared-statement plans.
On Thu, Feb 11, 2010 at 13:41, Robert Haas robertmh...@gmail.com wrote: On Thu, Feb 11, 2010 at 7:39 AM, Bart Samwel b...@samwel.tk wrote: Anyhow, I have no clue how much time the planner takes. Can anybody provide any statistics in that regard? It depends a great deal on the query, which is one of the things that makes implementing this rather challenging. But I guess you can probably expect it to be on the same order for the same query in generic form and with filled-in parameters? Because that's the underlying assumption of the ratio criterion -- that re-planning with filled-in parameters takes about as much time as the initial planning run took. Cheers, Bart
[HACKERS] Hostnames in pg_hba.conf
Hi there, I've been working on a patch to add hostname support to pg_hba.conf. It's not ready for public display yet, but I would just like to run a couple of issues / discussion points past everybody. ISSUE #1: Performance / caching At present, I've simply not added caching. The reasoning for this is as follows: (a) getaddrinfo doesn't tell us about expiry, so when do you refresh? (b) If you put the cache in the postmaster, it will not work for exec-based backends as opposed to fork-based backends, since those read pg_hba.conf every time they are exec'ed. (c) If you put this in the postmaster, the postmaster will have to update the cache every once in a while, which may be slow and which may prevent new connections while the cache update takes place. (d) Outdated cache entries may inexplicably and without any logging choose the wrong rule for some clients. Big aargh: people will start using this to specify 'deny' rules based on host names. If you COULD get expiry info out of getaddrinfo you could potentially store this info in a table or something like that, and have it updated by the backends? But that's way over my head for now. ISTM that this stuff may better be handled by a locally-running caching DNS server, if people have performance issues with the lack of caching. These local caching DNS servers can also handle expiry correctly, etcetera. We should of course still take care to look up a given hostname only once for each connection request. ISSUE #2: Reverse lookup? There was a suggestion on the TODO list on the wiki, which basically said that maybe we could use reverse lookup to find the hostname and then check for that hostname in the list. I think that won't work, since IPs can go by many names and may not support reverse lookup for some hostnames (/etc/hosts anybody?). Furthermore, due to the top-to-bottom processing of pg_hba.conf, you CANNOT SKIP entries that might possibly match. For instance, if the third line is for host foo.example.com and the fifth line is for bar.example.com, both lines may apply to the same IP, and you still HAVE to check the first one, even if reverse lookup turns up the second host name. So it doesn't save you any lookups, it just costs an extra one. ISSUE #3: Multiple hostnames? Currently, a pg_hba entry lists an IP / netmask combination. I would suggest allowing lists of hostnames in the entries, so that you can at least mimic the match multiple hosts by a single rule. Any reason not to do this? Comments / bright ideas are welcome, especially regarding issue #1. Cheers, Bart
Re: [HACKERS] Hostnames in pg_hba.conf
On Thu, Feb 11, 2010 at 16:36, Mark Mielke m...@mark.mielke.cc wrote: On 02/11/2010 08:13 AM, Bart Samwel wrote: ISSUE #2: Reverse lookup? There was a suggestion on the TODO list on the wiki, which basically said that maybe we could use reverse lookup to find the hostname and then check for that hostname in the list. I think that won't work, since IPs can go by many names and may not support reverse lookup for some hostnames (/etc/hosts anybody?). Furthermore, due to the top-to-bottom processing of pg_hba.conf, you CANNOT SKIP entries that might possibly match. For instance, if the third line is for host foo.example.com and the fifth line is for bar.example.com, both lines may apply to the same IP, and you still HAVE to check the first one, even if reverse lookup turns up the second host name. So it doesn't save you any lookups, it just costs an extra one. I don't see a need to do a reverse lookup. Reverse lookups are sometimes done as a verification check, in the sense that it's cheap to get a map from NAME - IP, but sometimes it is much harder to get the reverse map from IP - NAME. However, it's not a reliable check as many legitimate users have trouble getting a reverse map from IP - NAME. It also doesn't same anything as IP - NAME lookups are a completely different set of name servers, and these name servers are not always optimized for speed as IP - NAME lookups are less common than NAME - IP. Finally, if one finds a map from IP - NAME, that doesn't prove that a map from NAME - IP exists, so using *any* results from IP - NAME is questionable. I think reverse lookups are unnecessary and undesirable. ISSUE #3: Multiple hostnames? Currently, a pg_hba entry lists an IP / netmask combination. I would suggest allowing lists of hostnames in the entries, so that you can at least mimic the match multiple hosts by a single rule. Any reason not to do this? I'm mixed. In some situations, I've wanted to put multiple IP/netmask. I would say that if multiple names are supported, then multiple IP/netmask should be supported. But, this does make the lines unwieldy beyond two or three. This direction leans towards the capability to define host classes, where the rules allows the host class, and the host class can have a list of hostnames. Yes, but before you know it people will ask for being able to specify multiple host classes. :-) Quite simply put, with a single subnet you can allow multiple hosts in. Allowing only a single hostname is a step backward from that, so adding support for multiple hostnames could be useful if somebody is replacing subnets with hostname-based configuration. Two other aspects I don't see mentioned: 1) What will you do for hostnames that have multiple IP addresses? Will you accept all IP addresses as being valid? Yes, all addresses returned by (pg_)getaddrinfo will be considered valid. Most importantly, this ensures that if a host has an IPv4 and an IPv6 address they are both accepted. Plus, if there are multiple addresses, we have no clue of figuring out which address is the address. :-) 2) What will you do if they specify a hostname and a netmask? This seems like a convenient way of saying everybody on the same subnet as NAME. Not supported. Either an IP address / netmask combo, or a hostname, but not both. I wouldn't want to recommend hardcoding something such as netmasks (which are definitely subnet dependent) in combination with something as volatile as a host name -- move it to a different subnet, and you might allow a whole bigger subnet than you intended. If they want to specify a netmask, then they should just use hardcoded IPs as well. Cheers, Bart
Re: [HACKERS] Hostnames in pg_hba.conf
On Thu, Feb 11, 2010 at 17:21, Tom Lane t...@sss.pgh.pa.us wrote: Bart Samwel b...@samwel.tk writes: I've been working on a patch to add hostname support to pg_hba.conf. Have you read the previous discussions about that? Yes, mostly. The previous discussions included all sorts of complex stuff such as wildcards. Personally, I'd think that in the cases where you'd want wildcards, then you should use IP / netmask configuration, because that's a way better indicator of something that comes from the same source network entity. For instance, wildcards are nice for all our own servers, except that you'd normally use IP / netmaks to indicate your own server subnet. The way I see it, hostname based configuration should be plain and simple. You suggested in one of the earlier discussions that it should not be much more than removing the AI_NUMERICHOST hint in the lookup. My current solution is slightly more involved, since it performs the by-hostname lookup at check time, not at pg_hba.conf read time -- but there is not much more complexity involved. If there is a case for more complexity, then we will hear the actual use cases after this basic support is added, I guess. Cheers, Bart
Re: [HACKERS] Hostnames in pg_hba.conf
On Thu, Feb 11, 2010 at 23:01, Mark Mielke m...@mark.mielke.cc wrote: On 02/11/2010 04:54 PM, Bart Samwel wrote: ISSUE #3: Multiple hostnames? Currently, a pg_hba entry lists an IP / netmask combination. I would suggest allowing lists of hostnames in the entries, so that you can at least mimic the match multiple hosts by a single rule. Any reason not to do this? I'm mixed. In some situations, I've wanted to put multiple IP/netmask. I would say that if multiple names are supported, then multiple IP/netmask should be supported. But, this does make the lines unwieldy beyond two or three. This direction leans towards the capability to define host classes, where the rules allows the host class, and the host class can have a list of hostnames. Yes, but before you know it people will ask for being able to specify multiple host classes. :-) Quite simply put, with a single subnet you can allow multiple hosts in. Allowing only a single hostname is a step backward from that, so adding support for multiple hostnames could be useful if somebody is replacing subnets with hostname-based configuration. This implies two aspects which may not be true: 1) All hosts that I want to allow belong to the same subnet. 2) If I trust one host on the subnet, then I trust all hosts on the subnet. While the above two points are often true, they are not universally true. I don't think we're talking about the same thing here. I wasn't suggesting doing hostname-plus-netmask. NO! I was suggesting that where a lazy sysadmin would previously configure by subnet, they might switch to more fine-grained hostname-based configuration ONLY IF it doesn't require duplicating every line in pg_hba.conf for every host in the subnet. 2) What will you do if they specify a hostname and a netmask? This seems like a convenient way of saying everybody on the same subnet as NAME. Not supported. Either an IP address / netmask combo, or a hostname, but not both. I wouldn't want to recommend hardcoding something such as netmasks (which are definitely subnet dependent) in combination with something as volatile as a host name -- move it to a different subnet, and you might allow a whole bigger subnet than you intended. If they want to specify a netmask, then they should just use hardcoded IPs as well. Ah yes, I recall this from a previous thread. I think I also disagreed on the other thread. :-) I thought of a use for reverse lookup - it would allow wild card hostnames. Still, that's an advanced feature that might be for later... :-) I think wildcards are interesting, but I have yet to see an actual use case other than it's cool and very generalized. In my mind (tell me if I'm wrong), the most common type of PostgreSQL authentication setup is within a local network within an organization. There, you either authorize an entire subnet (the entire server park or all client PCs) or you authorize specific hosts (single IP address). The wildcard case is for replacing the first case, but for that case, subnets are usually just fine. I'm trying to target the second case here. Cheers, Bart