Re: [HACKERS] SCRAM protocol documentation
On 11/08/17 15:00, Michael Paquier wrote: On Fri, Aug 11, 2017 at 9:31 PM, Álvaro Hernández Tortosa <a...@8kdata.com> wrote: On 11/08/17 13:18, Michael Paquier wrote: On Fri, Aug 11, 2017 at 3:50 PM, Álvaro Hernández Tortosa <a...@8kdata.com> wrote: Relatedly, the SCRAM specification doesn't appear to allow omitting the user name in this manner. Why don't we just send the actual user name, even though it's redundant with the startup message? The problem is where a username includes characters such as a comma or '=', which can be avoided if the string is in UTF-8, as the username is prepared with SASLprep before being used in the SASL exchange, but we currently have no way to be sure that the string is actually in UTF-8. If at some point we decide that only things using UTF-8 are good to be used during authentication, using the username in the exchange messages instead of the one in the startup packet would be fine and actually better IMO in the long term. Please note that the specification says that both the username and the password must be encoded in UTF-8, so we are not completely compliant here. If there is something to address, that would be this part. The reason why the username is ignored, unless I'm wrong, is not exactly that it is already sent. It is that Postgres does not restrict usernames to be UTF-8 only, while SCRAM does. As such, if a username is not UTF-8, it cannot be sent reliably over SCRAM. That's basically the point I was making. Note that I would not be against Postgres forcing strings to be in UTF-8. Now things are fuzzy because of the lack of restrictions. I'm +1 for that. But I guess that involves a protocol change, and that's a completely different can of worms. If there's a clear meaning about ignoring the user here, why not settle on something like "*"? It's not going to change the world sending a few bytes less on initialization, but I guess it doesn't hurt either... I am not sure either that '*' would be all that helpful.
Requiring that things are in UTF-8 would be more compliant with the original RFC. But we really don't need to send the username, since Postgres already knows it (and that accommodates non-UTF-8 usernames). So why bother? Just sending something like "*" (which is UTF-8 and produces the same value under SASLprep) should be enough. I think the idea of ignoring the username is pretty neat, but maybe a "standard" like "send me an asterisk here" could be even better than leaving it empty. Personally I don't see much difference between the two, so I'd rather leave things as they are now. Strictly speaking the RFC assumes that the username is at least 1 character. I understand this was precisely Peter's original comment. Álvaro -- Álvaro Hernández Tortosa --- <8K>data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
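To make the comma/'=' problem concrete: RFC 5802 requires the username to travel as a "saslname", with ',' and '=' escaped on the wire. A minimal sketch of that escaping in Python (hypothetical helper names; this is not the libpq or pgjdbc code):

```python
def encode_saslname(username: str) -> str:
    """Escape a username per RFC 5802: '=' -> '=3D', ',' -> '=2C'.

    '=' must be replaced first, so the '=' characters introduced by
    escaping commas are not themselves re-escaped.
    """
    return username.replace("=", "=3D").replace(",", "=2C")

def decode_saslname(wire: str) -> str:
    """Reverse the escaping; order matters here too."""
    return wire.replace("=2C", ",").replace("=3D", "=")
```

Note that "*" passes through unchanged, which is part of why it is a convenient placeholder.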
Re: [HACKERS] SCRAM protocol documentation
On 11/08/17 13:18, Michael Paquier wrote: On Fri, Aug 11, 2017 at 3:50 PM, Álvaro Hernández Tortosa <a...@8kdata.com> wrote: On 11/08/17 03:57, Peter Eisentraut wrote: The SCRAM protocol documentation (https://www.postgresql.org/docs/devel/static/sasl-authentication.html) states "To avoid confusion, the client should use pg_same_as_startup_message as the username in the client-first-message." However, the client implementation in libpq doesn't actually do that, it sends an empty string for the user name. I find no other reference to "pg_same_as_startup_message" in the sources. Should the documentation be updated? Yes, definitely. I think that we should mention that the server uses the username of the startup packet and ignores the data potentially provided by the frontend in client-first-message. But the documentation already says so: "The username that was already sent in the startup message is used instead." Relatedly, the SCRAM specification doesn't appear to allow omitting the user name in this manner. Why don't we just send the actual user name, even though it's redundant with the startup message? The problem is where a username includes characters such as a comma or '=', which can be avoided if the string is in UTF-8, as the username is prepared with SASLprep before being used in the SASL exchange, but we currently have no way to be sure that the string is actually in UTF-8. If at some point we decide that only things using UTF-8 are good to be used during authentication, using the username in the exchange messages instead of the one in the startup packet would be fine and actually better IMO in the long term. Please note that the specification says that both the username and the password must be encoded in UTF-8, so we are not completely compliant here. If there is something to address, that would be this part. The reason why the username is ignored, unless I'm wrong, is not exactly that it is already sent.
It is that Postgres does not restrict usernames to be UTF-8 only, while SCRAM does. As such, if a username is not UTF-8, it cannot be sent reliably over SCRAM. If there's a clear meaning about ignoring the user here, why not settle on something like "*"? It's not going to change the world sending a few bytes less on initialization, but I guess it doesn't hurt either... I am not sure either that '*' would be all that helpful. Requiring that things are in UTF-8 would be more compliant with the original RFC. But we really don't need to send the username, since Postgres already knows it (and that accommodates non-UTF-8 usernames). So why bother? Just sending something like "*" (which is UTF-8 and produces the same value under SASLprep) should be enough. I think the idea of ignoring the username is pretty neat, but maybe a "standard" like "send me an asterisk here" could be even better than leaving it empty. Álvaro -- Álvaro Hernández Tortosa --- <8K>data
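For reference, this is roughly the shape of the client-first-message being discussed (per RFC 5802), where the GS2 header "n,," means no channel binding and no authzid. A sketch, not the actual libpq implementation:

```python
import base64
import os

def client_first_message(username: str = "") -> str:
    """Build a SCRAM client-first-message without channel binding.

    PostgreSQL ignores the n= field and uses the startup-packet
    username instead, which is why libpq can send it empty and
    pgjdbc can send "*".
    """
    # A fresh random nonce, base64-encoded (the alphabet contains no commas).
    nonce = base64.b64encode(os.urandom(18)).decode("ascii")
    return f"n,,n={username},r={nonce}"
```

With an empty username this yields "n,,n=,r=...", and with the asterisk convention "n,,n=*,r=...".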
Re: [HACKERS] SCRAM protocol documentation
On 11/08/17 03:57, Peter Eisentraut wrote: The SCRAM protocol documentation (https://www.postgresql.org/docs/devel/static/sasl-authentication.html) states "To avoid confusion, the client should use pg_same_as_startup_message as the username in the client-first-message." However, the client implementation in libpq doesn't actually do that, it sends an empty string for the user name. I find no other reference to "pg_same_as_startup_message" in the sources. Should the documentation be updated? Relatedly, the SCRAM specification doesn't appear to allow omitting the user name in this manner. Why don't we just send the actual user name, even though it's redundant with the startup message? Hi Peter. You are absolutely right, I was also surprised by this when I was doing the JDBC implementation. Actually I chose to send an asterisk ("*"), see https://github.com/pgjdbc/pgjdbc/pull/842/files#diff-c52128420a3882543ffa20a48964abe4R88, as it is shorter than the username (likely). I don't like the empty string either, and actually the library built for the JDBC and used in pgjdbc does explicitly disallow the use of an empty username. If there's a clear meaning about ignoring the user here, why not settle on something like the "*"? It's not going to change the world sending a few bytes less on initialization, but I guess it doesn't hurt either... Álvaro -- Álvaro Hernández Tortosa --- <8K>data
Re: [HACKERS] Authentication mechanisms categorization
On 14/07/17 13:29, Michael Paquier wrote: On Fri, Jul 14, 2017 at 12:16 PM, Álvaro Hernández Tortosa <a...@8kdata.com> wrote: If the parameter authmethod would rather be "authmethods", i.e., a list, I think it would be significantly more flexible. Yes, but the handling of a list becomes messier if there are some other connection parameters that are dependent on the authentication method. Say a list made of scram-sha-256 and scram-sha-3 as methods is sent, and a parameter named saslchannel listing scram-sha-256-plus is used; this becomes unusable with scram-sha-3. Using individual names for a parameter makes interactions with other parameters easier to handle and less bug-prone. That's also, by experience, more flexible for the application. I agree that with a list of methods and all the values already existing for sslmode, this might be more than enough, especially if the channel binding SCRAM mechanisms would get a different authmethod than their non-channel binding partners (like scram-sha-256-plus). This makes the list argument for the authmethods, in my opinion, stronger. For the channel binding patch, I have actually implemented saslname to enforce the name of the SASL mechanism to use (SCRAM-SHA-256 or its -PLUS) as well as saslchannelbinding to enforce the channel binding type. That's very handy, and at the end I saw that having a list does not add much value in a feature that should be as simple as possible, as the client will make one match at the end for authentication, and let the client know if it failed or succeeded (yes, I hate sslmode=prefer, which does up to two attempts at once). But that's as far as my opinion stands.
It is not possible to know the future, but we cannot discard the possibility that a future authentication method, say hoge, could also support scram-sha-256, in which case using a list like "authmethods=hoge,sasl authmechanisms=scram-sha-256" would mean that scram-sha-256 needs to be enforced for both things, but the dependency handling makes things unnecessarily complicated in libpq. My argument here is crazy though. Hi Michael. I'm mostly convinced by the power of all the parameters that already exist, given that you added both saslname and saslchannelbinding to the already existing sslmode. That's great, and allows very fine-grained choosing of the auth method. So it would be great if (non-libpq) driver implementations would expose the same parameter names to the users. I will study this for JDBC. My only fear is that this could become very complicated for the user, and could end up looking like the authentication algorithm lists for SSL, which are very hard to digest for the non-expert. Handling a list of auth methods spread across three parameters seems to me to go a bit in this direction. A categorization like the one proposed, while maybe difficult to do initially, and maybe to maintain too, is precisely meant to take this burden off the user and expose a simpler setting for them. Álvaro -- Álvaro Hernández Tortosa --- <8K>data
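The list-parameter idea boils down to a simple client-side check: accept the server's requested authentication method only if it appears in the user-supplied list. A hypothetical sketch (the authmethods parameter is the proposal under discussion, not an existing libpq option):

```python
def method_allowed(server_method: str, authmethods: str) -> bool:
    """Return True if the method the server requested is in the
    user's comma-separated allow-list, e.g.
    authmethods='scram-sha-256,scram-sha-256-plus'."""
    allowed = {m.strip() for m in authmethods.split(",") if m.strip()}
    return server_method in allowed
```

The downgrade-attack case discussed in the thread is exactly the rejected branch: a rogue server asking for cleartext when the list only names SCRAM.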
Re: [HACKERS] Authentication mechanisms categorization
On 14/07/17 11:09, Michael Paquier wrote: On Sat, Jul 8, 2017 at 2:19 PM, Álvaro Hernández Tortosa <a...@8kdata.com> wrote: There has been some prior discussion, that we recently continued at pgday.ru, about what to do if a client wants to use a "strong" authentication mechanism but a rogue server forces the client to use a weaker authentication mechanism. This is the case if the client expects SCRAM to be used but a rogue server just replies with AuthenticationCleartextPassword, for example. Client drivers will authenticate using this latter mechanism, transparently (at least the pgjdbc implementation does this, and I believe libpq also). This somehow defeats the purpose of some mechanisms like SCRAM. Yeah :( It was discussed to add a parameter to the driver like "SCRAM-only", but I think this may not be ideal. "SCRAM-only" means that code needs to be written to explicitly prevent every other authentication mechanism, which is far from ideal. Much worse, it defeats using other auth mechanisms that might be OK for the user. Also, this doesn't consider whether SCRAM is good without channel binding. I think it would be better to make a categorization of authentication mechanisms and then have an agreement among libpq and drivers to set a minimum level of security based on the user's request. Some initial ideas are: - Three security levels: Basic, Medium, Advanced. - Prevents MITM / does not. - Given X possible attacks, a matrix of which mechanisms avoid which attacks (something similar to the table comparing the possible effects of the different isolation levels). This is not trivial: for example, SCRAM may be OK without channel binding in the presence of SSL, but without SSL channel binding is a must to prevent MITM. Similarly, are other auth mechanisms like Kerberos (I'm not an expert here) as "safe" as SCRAM with or without channel binding? The use of channel binding is linked to SSL, which is already controlled by sslmode. Ah, yes. As of today.
Maybe tomorrow there's a channel binding mechanism that does not make use of SSL. But this is also unlikely. Users won't get trapped in this area by using "require" instead of the default of "prefer". I would love to see the default value actually changed from "prefer" to "require" here. "prefer" as a default is a trap for users. There were discussions about that not long ago, but they went nowhere. I believe this should be discussed to find a common agreement, to be implemented by libpq and all the drivers, including a single naming scheme for the parameter and possible values. Opinions? I think that we don't need to have anything complicated here: let's have at least a connection parameter, and perhaps an environment variable, which enforces the type of the authentication method to use: scram-sha-256, md5, etc. I don't think that there is any need to support a list of methods; any application could just enforce the parameter to a different value if the previous one failed. Categorization is something that can lose value over time; something considered strong now could be weak in 10 years. By supporting only a list of dedicated names, users have the same flexibility, and we just need to switch the default we consider safe. Controlling SSL is already a separate and existing parameter, so I don't think that it should be part of this scheme. Having documentation giving a recommended combination, say "authmethod=scram-sha-256 sslmode=require", would be enough IMO. If the parameter authmethod would rather be "authmethods", i.e., a list, I think it would be significantly more flexible. I agree that with a list of methods and all the values already existing for sslmode, this might be more than enough, especially if the channel binding SCRAM mechanisms would get a different authmethod than their non-channel binding partners (like scram-sha-256-plus). This makes the list argument for the authmethods, in my opinion, stronger.
Álvaro -- Álvaro Hernández Tortosa --- <8K>data
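Michael's alternative to a list parameter, retrying at the application level with a different enforced method each time, could look roughly like this (hypothetical connect callable and parameter name; the real proposal is a libpq connection parameter such as authmethod):

```python
def connect_with_fallback(connect, methods):
    """Try each enforced auth method in order of preference.

    'connect' is assumed to raise ConnectionError when the server
    does not offer (or fails) the enforced method.
    """
    last_error = None
    for method in methods:
        try:
            return connect(authmethod=method)
        except ConnectionError as exc:
            last_error = exc
    raise last_error
```

This keeps libpq simple, at the cost of pushing the preference logic (and extra connection attempts) into every application.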
Re: [HACKERS] Authentification method on client side checking
On 09/07/17 18:47, Victor Drobny wrote: Hello, Despite the addition of SCRAM authentication to PostgreSQL 10, a MITM attack can be performed by saying that the server supports, for example, only md5 authentication. The possible solution for it is checking the authentication method on the client side and rejecting connections that could be unsafe. The PostgreSQL server can require unencrypted password passing, md5, scram, gss or sspi authentication. Hi Victor. Precisely yesterday I initiated a similar thread: https://www.postgresql.org/message-id/d4098ef4-2910-c8bf-f1e3-f178ba77c381%408kdata.com I think that a) the mere auth mechanism is not enough (channel binding or not, SSL or not, greatly change the effective security obtained) and b) maybe a categorization is a better way of specifying a connection's security requirements. What's your opinion on this? Any answer should also be coordinated among the drivers. Álvaro -- Álvaro Hernández Tortosa --- <8K>data
[HACKERS] Authentication mechanisms categorization
There has been some prior discussion, that we recently continued at pgday.ru, about what to do if a client wants to use a "strong" authentication mechanism but a rogue server forces the client to use a weaker authentication mechanism. This is the case if the client expects SCRAM to be used but a rogue server just replies with AuthenticationCleartextPassword, for example. Client drivers will authenticate using this latter mechanism, transparently (at least the pgjdbc implementation does this, and I believe libpq also). This somehow defeats the purpose of some mechanisms like SCRAM. It was discussed to add a parameter to the driver like "SCRAM-only", but I think this may not be ideal. "SCRAM-only" means that code needs to be written to explicitly prevent every other authentication mechanism, which is far from ideal. Much worse, it defeats using other auth mechanisms that might be OK for the user. Also, this doesn't consider whether SCRAM is good without channel binding. I think it would be better to make a categorization of authentication mechanisms and then have an agreement among libpq and drivers to set a minimum level of security based on the user's request. Some initial ideas are: - Three security levels: Basic, Medium, Advanced. - Prevents MITM / does not. - Given X possible attacks, a matrix of which mechanisms avoid which attacks (something similar to the table comparing the possible effects of the different isolation levels). This is not trivial: for example, SCRAM may be OK without channel binding in the presence of SSL, but without SSL channel binding is a must to prevent MITM. Similarly, are other auth mechanisms like Kerberos (I'm not an expert here) as "safe" as SCRAM with or without channel binding? I believe this should be discussed to find a common agreement, to be implemented by libpq and all the drivers, including a single naming scheme for the parameter and possible values. Opinions?
Álvaro -- Álvaro Hernández Tortosa --- <8K>data
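The proposed mechanisms-versus-attacks matrix could start as something like the following sketch. The classification values here are illustrative assumptions for the sake of the example, not an agreed categorization:

```python
# Illustrative (assumed) security properties per auth mechanism. The
# proposal is that a client could map a user-requested level onto the
# set of mechanisms it is willing to accept.
MECHANISMS = {
    "password":           {"hides_password": False, "prevents_mitm": False},
    "md5":                {"hides_password": True,  "prevents_mitm": False},
    "scram-sha-256":      {"hides_password": True,  "prevents_mitm": False},
    "scram-sha-256-plus": {"hides_password": True,  "prevents_mitm": True},
}

def acceptable(mechanism: str, require_mitm_protection: bool) -> bool:
    """Reject unknown mechanisms; otherwise check the matrix."""
    props = MECHANISMS.get(mechanism)
    if props is None:
        return False
    return props["prevents_mitm"] or not require_mitm_protection
```

The hard part, as noted above, is that the right-hand values are not constant: SCRAM without channel binding changes category depending on whether SSL is in use.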
Re: [HACKERS] SCRAM auth and Pgpool-II
On 06/07/17 04:03, Tatsuo Ishii wrote: Hi PostgreSQL hackers, I would like to hear ideas on how Pgpool-II can deal with SCRAM auth, which will be in PostgreSQL 10. For those who are not familiar with Pgpool-II[1], it is an external OSS project to provide some additional features to PostgreSQL, including load balancing and automatic failover. Pgpool-II works as a proxy between PostgreSQL clients and PostgreSQL server(s). When a client wants to connect to PostgreSQL and SCRAM auth is enabled, it sends the user name to the server. Then the server sends information including a salt to the client. The client computes a "ClientProof" using the salt and other information, and sends it to the server[2]. For Pgpool-II, things would go as follows: 1) the client sends the user name to Pgpool-II. 2) Pgpool-II forwards it to the PostgreSQL servers. 3) Each PostgreSQL server sends its own salt to Pgpool-II. 4) Pgpool-II is confused because there are multiple salts, each with a different value. The client only accepts a single salt, obviously. So my question is, is there any solution or workaround for problem #4? Someone at PGCon 2017 suggested that the problem could be avoided if the auth method between the client and Pgpool-II is "trust" (which means no auth). But this does not seem to be the best solution to me because it would weaken security. I suspect this problem may not be specific to Pgpool-II. Any middleware which handles multiple PostgreSQL servers could have a similar problem. Any suggestion would be appreciated. Thanks in advance. [1] https://pgpool.net [2] https://tools.ietf.org/html/rfc5802#section-3 Hi Tatsuo. There's definitely an important concern here that should be addressed: how poolers/proxies/middleware/etc can deal with SCRAM, specifically in the context of channel binding. If there is to be a single connection from client to PostgreSQL server, intercepted by pgpool to perform the magic foo, then channel binding is, indeed, designed to defeat this.
If, however, pgpool or the middleware manages two separate connections (client<->pool and pool<->PG) then there is some light here. One SCRAM feature not implemented as of today is the authzid (authorization identity: see https://tools.ietf.org/html/rfc5802#page-10, SCRAM attribute "a" and https://tools.ietf.org/html/rfc5801). Authzid is basically "I want to authenticate as user X and once authenticated, consider I'm user Y". With authzid, a pool/proxy may have a common user name with its own SCRAM credentials to authenticate with the backend PostgreSQL, and pass the authzid with the real username (the one provided on the client<->pool connection). This would require: a) That authzid is implemented in PostgreSQL. b) A mechanism in PG to name which user(s) Y are allowed to be authenticated by user X. This is similar, but not identical, to the current SET ROLE. From a SCRAM protocol perspective, this is very simple, just an extra attribute. Part b) may need more discussion. I believe this could be of help to your case and other middleware. Álvaro -- Álvaro Hernández Tortosa --- <8K>data
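On the wire, the authzid rides in the GS2 header of the client-first-message as the "a=" attribute (RFC 5802), so the pooler scenario above is just a different header. A sketch, assuming RFC 5802's '='/',' escaping applies to the authzid value:

```python
def gs2_header(authzid=None):
    """GS2 header without channel binding: 'n,,' or 'n,a=<authzid>,'.

    The authzid value must have '=' and ',' escaped per RFC 5802
    ('=' first, so escape sequences are not double-escaped).
    """
    if authzid is None:
        return "n,,"
    escaped = authzid.replace("=", "=3D").replace(",", "=2C")
    return f"n,a={escaped},"
```

A pooler would authenticate with its own credentials ("n=pooluser") while carrying the end user in "a=", e.g. gs2_header("realuser").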
Re: [HACKERS] Using postgres planner as standalone component
On 01/07/17 22:48, Ricky Stevens wrote: Hi, For one of my personal projects I am interested in using the PostgreSQL planner as a standalone library. However, I would like to run this as an embedded library instead of actually creating anything on disk. I've realized that postgres has several pg_operator, pg_class etc. tables which it uses for query planning purposes. Is there any PostgreSQL component interface whose implementation could be overridden to not actually try to read these tables from disk but instead read them from a custom memory region that is managed by my code? Thanks! Maybe you'd like to consider gporca https://github.com/greenplum-db/gporca as an alternative. You may also want to look at Calcite https://calcite.apache.org/docs/ if you were more into the Java world. Álvaro -- Álvaro Hernández Tortosa --- <8K>data
Re: [JDBC] [HACKERS] Channel binding support for SCRAM-SHA-256
On 20/06/17 06:11, Michael Paquier wrote: On Tue, Jun 6, 2017 at 3:40 PM, Michael Paquier <michael.paqu...@gmail.com> wrote: As far as I can see, there are a couple of things that I still need to work on to make people happy: - Rework the generic APIs for TLS finish and endpoint so that any implementation can use channel binding without inducing any extra code footprint to be-secure.c and fe-secure.c. - Implement endpoint, as Alvaro is saying for JDBC that would be nicer. - Have a couple of tests for channel binding to allow people to test the feature easily. Those will be in src/test/ssl/. It would be nice as well to be able to enforce the channel binding type on the libpq side, which is useful at least for testing. So we are going to need an environment variable for this purpose, and a connection parameter. Okay, here we go. Attached is a set of four patches: - 0001 is some refactoring for the SSL tests so that other test suites in src/test/ssl can take advantage of the connection routines. There is nothing fancy here. - 0002 is the implementation of tls-unique as channel binding. This has been largely reworked since the last submission; I have found on the way a couple of bugs and some correctness issues. - 0003 is a patch to add the connection parameters saslname and saslchannelbinding. With support for more SASL mechanisms (PG10 has SCRAM-SHA-256, I am adding SCRAM-SHA-256-PLUS here), saslname can be used to enforce on the client side the value of the SASL mechanism chosen. saslchannelbinding does the same for the channel binding name. This is very useful for testing, and a set of tests is added in src/test/ssl/ for tls-unique and the SASL mechanisms. The tests cover many scenarios, like downgrade attacks for example. - 0004 is the implementation of tls-server-end-point, as Alvaro has asked. Per RFC 5929, the binding data needs to be a hash of the server certificate. If the signature algorithm of the certificate is MD5 or SHA-1, then SHA-256 is used.
Other signature algorithms like SHA-384 or SHA-512 are used as-is to hash the data. The hashed data is then encoded in base64 and sent to the server for verification. Tests using saslchannelbinding have been added as well. It took me a while to find out that OBJ_find_sigid_algs(X509_get_signature_nid(X509*)) needs to be used to find out the algorithm of a certificate with OpenSSL. With the tests directly in the patch, things are easy to run. With PG10 stabilization work, of course I don't expect much feedback :) But this set of patches looks like the direction we want to go so that JDBC and libpq users can take advantage of channel binding with SCRAM. This is awesome, Michael. In the coming weeks, and once my PR for pgjdbc has been merged, I will work towards another patch to implement channel binding. Should be reasonably easy now, thanks to this. Appreciated! Álvaro -- Álvaro Hernández Tortosa --- <8K>data
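The tls-server-end-point rule in 0004 (hash the DER-encoded server certificate with its signature hash algorithm, upgrading MD5 and SHA-1 to SHA-256 per RFC 5929) can be sketched independently of OpenSSL like this:

```python
import base64
import hashlib

def server_endpoint_cb_data(cert_der: bytes, sig_hash_name: str) -> str:
    """Compute tls-server-end-point channel binding data (RFC 5929).

    cert_der is the certificate exactly as it appears in the server's
    Certificate message; sig_hash_name is the hash of its signature
    algorithm. MD5 and SHA-1 are replaced by SHA-256; the digest is
    base64-encoded for the SCRAM exchange.
    """
    name = sig_hash_name.lower()
    if name in ("md5", "sha1", "sha-1"):
        name = "sha256"
    digest = hashlib.new(name, cert_der).digest()
    return base64.b64encode(digest).decode("ascii")
```

In the actual patch the equivalent of sig_hash_name comes from OBJ_find_sigid_algs(X509_get_signature_nid(...)), as described above.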
Re: [HACKERS] SCRAM in the PG 10 release notes
On 11/05/17 09:20, Heikki Linnakangas wrote: On 05/11/2017 07:03 AM, Michael Paquier wrote: On Thu, May 11, 2017 at 11:50 AM, Bruce Momjian <br...@momjian.us> wrote: I have added this as an open item because we will have to wait to see where we are with driver support as the release gets closer. As Postgres ODBC now has a hard dependency on libpq, no action is needed there. At least this makes one driver worth mentioning. FWIW, I wrote a patch for the Go driver at https://github.com/lib/pq to implement SCRAM. It's awaiting review. I updated the List of Drivers in the Wiki. I added a few drivers that were missing, like the ODBC driver and the pgtclng driver, as well as a Go and a Rust driver that I'm aware of. I reformatted it, and added a column to indicate whether each driver uses libpq or not. https://wiki.postgresql.org/wiki/List_of_drivers There is a similar list in our docs: https://www.postgresql.org/docs/devel/static/external-interfaces.html Should we update the list in the docs, adding every driver we know of? Or curate the list somehow, adding only the more popular drivers? Or perhaps add a link to the Wiki page from the docs? We can use this list in the Wiki to track which drivers implement SCRAM. I have just submitted a PR to add SCRAM support to pgjdbc: https://github.com/pgjdbc/pgjdbc/pull/842 Thread on the pgjdbc list: https://www.postgresql.org/message-id/95dea232-aeeb-d619-e917-abf32b44ef8a%408kdata.com Hopefully it will be merged soon, and the list of drivers can then be updated with this. Álvaro -- Álvaro Hernández Tortosa --- <8K>data
Re: [JDBC] [HACKERS] Channel binding support for SCRAM-SHA-256
On 01/06/17 17:50, Stephen Frost wrote: Robert, * Robert Haas (robertmh...@gmail.com) wrote: On Wed, May 31, 2017 at 7:59 PM, Stephen Frost <sfr...@snowman.net> wrote: If your comments regarding the requirement that we have interoperability testing of this feature before accepting it were intended to mean that we need to make sure that the JDBC driver is able to work with the chosen solution (or, perhaps, that we support multiple options, one of which works with the JDBC changes), and that we review the other TLS libraries with the goal of making sure that what we agree on in core should work with those also, then that's great and exactly what I'd like to see also. OK, great. That's what I mean (mostly - see below). Glad to hear it. If your comments regarding the requirement that we have interoperability testing of this feature before accepting it were intended to mean that we must have a complete alternative TLS solution, while that would actually play a great deal to a goal I've had for a while to have PG support multiple TLS implementations (LibNSS being top-of-mind, at least for me, as I've mentioned a couple times), I can't agree that we should require that before accepting channel binding as a feature. To do so would be akin to requiring that we support multiple TLS implementations before we agreed to support client-side certificates, in my opinion. I don't think that we need to have a committed patch allowing multiple SSL implementations before we accept a patch for channel binding, but I think it might be a good idea for someone to hack either libpq or the server enough to make it work with some other SSL implementation (Windows SSL would probably be the best candidate) and test that what we think will work for channel binding actually does work. It wouldn't need to be a committable patch, but it should be enough to make a successful connection in the laboratory.
Now, there might also be other ways besides that of testing that interoperability is possible, so don't take me as being of the view that someone has to necessarily do exactly that thing that I just said. But I do think that we probably should do more than say "hey, we've got this undocumented SSL API, and this other Windows API, and it looks like they do about the same thing, so we're good". There's sometimes a big difference between what sounds like it should work on paper and what actually does work. I certainly wouldn't object to someone working on this, but I feel like it's a good deal more work than perhaps you're realizing (and I tend to think trying to use the Windows SSL implementation would increase the level of effort, not minimize it). Perhaps it wouldn't be too bad to write a one-off piece of code which just connects and then returns the channel binding information on each side and then one could hand-check that what's returned matches what it's supposed to based on the RFC, but if it doesn't, then what? In the specific case we're talking about, there's two approaches in the RFC and it seems like we should support both because at least our current JDBC implementation only works with one, and ideally we can allow for that to be extended to other methods, but I wouldn't want to implement a method that only works on Windows SSL because that implementation, for whatever reason, doesn't follow either of the methods available in the RFC. To make things even more complicated, I think the RFC is not helping very much, as the definition is not very clear itself: " Description: The first TLS Finished message sent (note: the Finished struct, not the TLS record layer message containing it) in the most recent TLS handshake of the TLS connection being bound to (note: TLS connection, not session, so that the channel binding is specific to each connection regardless of whether session resumption is used). 
If TLS renegotiation takes place before the channel binding operation, then the first TLS Finished message sent of the latest/inner-most TLS connection is used. Note that for full TLS handshakes, the first Finished message is sent by the client, while for abbreviated TLS handshakes (session resumption), the first Finished message is sent by the server. " https://tools.ietf.org/html/rfc5929#section-3.1 If you read further, it becomes even less clear what it is, and if there are re-negotiations, it gets more complicated. tls-server-end-point is somewhat better specified: " The hash of the TLS server's certificate [RFC5280] as it appears, octet for octet, in the server's Certificate message. " Álvaro -- Álvaro Hernández Tortosa --- <8K>data
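Whichever channel binding type ends up being used, the binding data lands in the same place: the "c=" attribute of the SCRAM client-final-message, which is base64 of the GS2 header concatenated with the raw binding data (RFC 5802). A sketch:

```python
import base64

def gs2_cbind_input(cb_name: str, cb_data: bytes) -> str:
    """Build the c= attribute value for the client-final-message:
    base64(gs2-header || channel-binding-data).

    The header 'p=<type>,,' announces which binding type produced
    cb_data, e.g. 'tls-unique' or 'tls-server-end-point'.
    """
    header = f"p={cb_name},,".encode("ascii")
    return base64.b64encode(header + cb_data).decode("ascii")
```

Because the AuthMessage (and hence the ClientProof) covers this attribute, a MITM that cannot reproduce the binding data of the real TLS connection cannot forge a valid proof.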
Re: [JDBC] [HACKERS] Channel binding support for SCRAM-SHA-256
On 01/06/17 18:11, Bruce Momjian wrote: On Wed, May 31, 2017 at 09:37:02AM -0400, Robert Haas wrote: On Tue, May 30, 2017 at 11:49 PM, Stephen Frost <sfr...@snowman.net> wrote: ... and I don't believe that we should be asking the implementors of channel binding to also implement support for multiple TLS libraries in PostgreSQL in order to test that their RFC-following (at least, as far as they can tell) implementation actually works. You're of course free to believe what you wish, but that sounds short-sighted to me. If we implement channel binding and it turns out not to be interoperable with other SSL implementations, then what? We can't change it later without breaking compatibility with our own prior implementation. Note that Álvaro Hernández Tortosa said about two hours before you sent this email that it doesn't seem possible to implement something comparable in Java's standard SSL stack. If that's the case, adopting this implementation is dooming everyone who connects to the database server using JDBC to be unable to use channel binding. And that's a large percentage of our user base. Just to step back, exactly how does channel binding work? Is each side of the SSL connection hashing the password hash with the shared SSL session secret in some way that each side knows the other end knows the password hash, but not disclosing the secret or password hash? Is there some other way JDBC can get that information? In short, channel binding augments SCRAM (it could also apply to other authentication methods, I guess) by adding out-of-band data to the mix of data to be exchanged. A normal SCRAM exchange involves a user name, a unique nonce, a salt and some other information. If you add into the mix out-of-band information that is known only by the client and the server, you can avoid MITM attacks. It's not as simple as "hashing" the password, though. SCRAM involves 4 steps (2 messages each way) with somewhat complex hashing of data in the last 2 steps. 
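To give an idea of that "somewhat complex hashing", here is a minimal sketch of the SCRAM-SHA-256 key derivation and proof computation from RFC 5802, using only the Python standard library; the password, salt, iteration count and auth message below are made-up example values, not anything a real server would send:

```python
import hashlib
import hmac

def hmac_sha256(key: bytes, msg: bytes) -> bytes:
    return hmac.new(key, msg, hashlib.sha256).digest()

# Made-up inputs; a real exchange gets salt and iterations from the server.
password = b"secret"
salt = b"0123456789abcdef"
iterations = 4096
auth_message = b"n=usr,r=cnonce,r=cnoncesnonce,s=c2FsdA==,i=4096,c=biws,r=cnoncesnonce"

salted_password = hashlib.pbkdf2_hmac("sha256", password, salt, iterations)
client_key = hmac_sha256(salted_password, b"Client Key")
stored_key = hashlib.sha256(client_key).digest()   # this is what the server stores
client_signature = hmac_sha256(stored_key, auth_message)
client_proof = bytes(a ^ b for a, b in zip(client_key, client_signature))

# Server-side verification: XOR the proof with the signature to recover
# ClientKey, hash it, and compare against the stored key.
recovered_key = bytes(a ^ b for a, b in zip(client_proof, client_signature))
assert hashlib.sha256(recovered_key).digest() == stored_key
```

Note that the server never sees the password or even ClientKey directly, only the proof, which is why the scheme is more than "hashing the password".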
There are basically 2 channel binding options, that is, 2 possible data pieces that could be taken out of band of the SCRAM authentication and thrown into this mix: - tls-unique. It's like a unique identifier for each client-server SSL connection. It has to be obtained from the SSL channel, and there doesn't seem to be a very consistent way of getting it (in OpenSSL it is an undocumented API, it's not available in the normal Java stack, etc.). - tls-server-end-point. Channel binding data is just a hash of an SSL certificate, as is. As such, it seems to be easier to support across multiple OSs/SSL stacks. However, the SCRAM RFC states that if channel binding is implemented, at least tls-unique must be implemented, with others being optional (there is as of today a third one, but it only applies to telnet). So in my opinion, I'd implement both of the above, for maximum flexibility, and add a field to the required SCRAM authentication FE/BE message with a list (aka CSV) of the channel binding mechanisms supported by this server. This, I believe, covers support for several channel binding mechanisms and could be extended in the future should other mechanisms appear. Álvaro -- Álvaro Hernández Tortosa --- <8K>data
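To make the "thrown into this mix" part concrete: in SCRAM, the channel binding data travels in the c= attribute of the client-final-message, as the base64 of the GS2 header concatenated with the raw binding bytes (RFC 5802, section 7). A minimal sketch, with made-up bytes standing in for the real tls-unique data of a connection:

```python
import base64

def channel_binding_attr(cb_name: str, cb_data: bytes) -> str:
    # gs2-header for a client that requires channel binding: "p=<name>,,"
    gs2_header = ("p=%s,," % cb_name).encode("ascii")
    return base64.b64encode(gs2_header + cb_data).decode("ascii")

# Made-up bytes standing in for real tls-unique data.
c_attr = channel_binding_attr("tls-unique", b"\x01\x02\x03\x04")
print("c=" + c_attr)
```

Since the c= value is part of the auth message that both sides sign, a MITM relaying two different TLS connections ends up with mismatching binding data and the exchange fails.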
Re: [JDBC] [HACKERS] Channel binding support for SCRAM-SHA-256
On 31/05/17 03:39, Michael Paquier wrote: On Tue, May 30, 2017 at 5:59 PM, Robert Haas <robertmh...@gmail.com> wrote: That sounds like undue optimism to me. Unless somebody's tested that Michael's proposed implementation, which uses undocumented OpenSSL APIs, actually interoperates properly with a SCRAM + channel binding implementation based on some other underlying SSL implementation, we can't really know that it's going to work. It's not like we're calling SSL_do_the_right_thing_for_channel_binding_thing_per_rfc5929(). We're calling SSL_do_something_undocumented() and hoping that something_undocumented == the_right_thing_for_channel_binding_thing_per_rfc5929. Could be true, but without actual interoperability testing it sounds pretty speculative to me. Hm? Python has been using that stuff for a number of years now, for the same goal of providing channel binding for SSL authentication. You can look at this thread: https://bugs.python.org/issue12551 So qualifying that method as speculative sounds quite harsh to me. Actually this Python patch seems to ultimately rely on the same OpenSSL functions as your implementation. I don't have anything against implementing tls-unique, especially as, per the RFC, it is mandatory (if channel binding support is provided). However, given the use of undocumented APIs, and given that at least the Java driver would have a very difficult time implementing it (basically replacing Java's standard SSL stack with a completely external one, BouncyCastle, which adds load, size and complexity), I would rather focus the efforts on tls-server-end-point, which only needs to access the certificate data, something that most APIs/frameworks/languages seem to cover much more naturally than tls-unique. Both are equally safe and effective, so having support for both seems sensible to me. 
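For reference, the Python work discussed above ended up as ssl.SSLSocket.get_channel_binding() in the standard library, wrapping the same undocumented OpenSSL calls. A sketch of what retrieving tls-unique looks like there (the TLS socket setup itself is assumed, not shown):

```python
import ssl

def tls_unique_of(tls_sock: ssl.SSLSocket):
    # Returns the tls-unique channel binding data of the established
    # connection, or None if the handshake has not completed yet.
    return tls_sock.get_channel_binding("tls-unique")

# The stdlib advertises which binding types it supports:
print(ssl.CHANNEL_BINDING_TYPES)
```

The contrast with Java, where no equivalent accessor exists in the standard SSL stack, is exactly the asymmetry being argued about in this thread.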
Álvaro -- Álvaro Hernández Tortosa --- <8K>data
Re: [HACKERS] Channel binding support for SCRAM-SHA-256
[cross-posting to pgjdbc] On 25/05/17 17:17, Michael Paquier wrote: Hi all, Please find attached a patch to add support for channel binding for SCRAM, to mitigate MITM attacks using this protocol. Per RFC5802 (https://tools.ietf.org/html/rfc5802), servers supporting channel binding need to add support for tls-unique, and this is what this patch does. This is awesome, Michael :) Thank you! You have prevented me from eventually writing the patch and then having the PostgreSQL source tainted with Java-to-C "transpiled" code ^_^ As defined in RFC5056 (exactly here => https://tools.ietf.org/html/rfc5056#section-4.1), servers can use the TLS finish message to determine if the client is actually the same as the one who initiated the connection, to eliminate the risk of MITM attacks. OpenSSL offers two undocumented APIs for this purpose: - SSL_get_finished() to get the latest TLS finish message that the client has sent. - SSL_get_peer_finished(), to get the latest TLS finish message expected by the server. So basically what is needed here is saving the latest message generated by the client in libpq using get_finished(), sending it to the server using the SASL exchange messages (the value sent with the final client message), and then comparing it with what the server expected. The connection is successful if what the client has sent and what the server expects match. There is also clear documentation about what is expected from the client and the server in terms of how both should behave using the first client message: https://tools.ietf.org/html/rfc5802#section-6. So I have tried to follow it, reviews are welcome particularly regarding that. The implementation is done in such a way that channel binding is used in the context of an SSL connection, which is what the RFCs expect from an implementation. After a deeper analysis, I have some concerns/comments here: - tls-unique, as you mentioned, uses two undocumented APIs. This raises a small flag about the stability and future of those APIs. 
- More importantly, some drivers (not libpq-based) may have unexpected difficulties implementing tls-unique. In particular, there is no API in Java to get the finished message. I expect other languages may have similar limitations. For Java I see two options: * Also implement tls-server-end-point, which relies instead on the server certificate. This information seems to be exposed as part of the Java APIs: https://docs.oracle.com/javase/8/docs/api/java/security/cert/Certificate.html#getEncoded-- * Use the BouncyCastle library (http://bouncycastle.org/), which explicitly supports tls-unique (https://www.bouncycastle.org/docs/pkixdocs1.5on/org/bouncycastle/est/TLSUniqueProvider.html#getTLSUnique() , https://www.bouncycastle.org/docs/tlsdocs1.5on/org/bouncycastle/tls/ChannelBinding.html#tls_unique ). This would require, however, non-trivial changes to the driver and, I expect, a lot of extra effort. - It seems to me that tls-unique might be a bit fragile. In particular, it requires the server to be aware of TLS renegotiations and avoid performing one while the SCRAM authentication is being performed. I don't know if this is already guaranteed by the current patch, but it seems to me it requires complex interactions between levels of abstraction that are quite separate (SSL and SCRAM). This is explained by the RFC: " server applications MUST NOT request TLS renegotiation during phases of the application protocol during which application-layer authentication occurs " (https://tools.ietf.org/html/rfc5929#section-3.1) Based on these facts, I'd suggest either implementing tls-server-end-point only, or implementing both tls-server-end-point and tls-unique. The latter would require a more complete protocol message to advertise separately SCRAM mechanisms and channel binding names. 
One such structure could be the one I suggested earlier: https://www.postgresql.org/message-id/df8c6e27-4d8e-5281-96e5-131a4e638...@8kdata.com Before even that, the server needs to send to the client the list of SASL mechanisms that are supported. This adds SCRAM-SHA-256-PLUS to the list if the server has been built with SSL support. And, I'd say, also the list of supported channel binding names. Álvaro -- Álvaro Hernández Tortosa --- <8K>data
Re: [HACKERS] Letting the client choose the protocol to use during a SASL exchange
My only desire would be to have a final spec and implement the full parser now, not have to change it in the future. We already know today all the requirements, so please pick one and I will follow it :) On Apr 13, 2017 13:47, "Heikki Linnakangas" <hlinn...@iki.fi> wrote: > On 04/13/2017 02:35 PM, Álvaro Hernández Tortosa wrote: > >> On 13/04/17 13:24, Heikki Linnakangas wrote: >> >>> Right, when we get channel binding, the server will list >>> "SCRAM-SHA-256" and "SCRAM-SHA-256-PLUS" as the list of mechanisms. >>> And if we get channel binding using something else than tls-unique, >>> then those will be added as extra mechanisms, too, e.g. >>> "SCRAM-SHA-256-PLUS tls-awesome". >>> >> >> And how about supporting different SCRAM mechanisms with different >> possible channel bindings? Separate by space too? So given a field, is >> the first item the SCRAM mechanism, and all the remaining the channel >> binding methods? I.e.: >> >> SCRAM-SHA-256-PLUS tls-awesome tls-awesome2 tls-awesome3\0... >> > > I think we're going in circles.. Yeah, we could do that. Or they could be > listed as separate mechanisms: > > SCRAM-SHA-256-PLUS\0 > SCRAM-SHA-256-PLUS tls-awesome\0 > SCRAM-SHA-256-PLUS tls-awesome2\0 > SCRAM-SHA-256-PLUS tls-awesome3\0 > \0 > > One alternative is that you could list extra channel bindings that are > supported by all the mechanisms, as separate entries. So the list could be: > > binding tls-unique\0 > binding tls-awesome\0 > binding tls-awesome2\0 > binding tls-awesome3\0 > SCRAM-SHA-256-PLUS\0 > SCRAM-SHA-512-PLUS\0 > \0 > > which would mean that those bindings are supported by all the mechanisms > that follow. I think this would achieve the same thing as your proposed > separate field, with the current proposed protocol. > > But again, I'm 99% sure we won't need it, and we don't need to decide the > exact syntax for channel bindings yet. We have the flexibility now, so we > can cross the bridge when we get there. > > - Heikki > >
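Whatever syntax is finally picked, the wire format Heikki describes (null-terminated names, an empty string as terminator) is straightforward to parse. A sketch of a client-side parser under the space-separated convention discussed above; the mechanism and binding names here are the hypothetical ones from the example, not real registered mechanisms:

```python
def parse_sasl_mechanisms(body: bytes):
    """Parse a null-terminated list of mechanism entries, where an entry
    may be 'MECH' or 'MECH binding-name' per the proposal above."""
    mechanisms = []
    for entry in body.split(b"\x00"):
        # The list ends at the empty name (trailing \x00\x00 on the wire).
        if not entry:
            break
        parts = entry.decode("utf-8").split(" ")
        mechanisms.append((parts[0], parts[1:]))  # (mechanism, binding names)
    return mechanisms

body = b"SCRAM-SHA-256\x00SCRAM-SHA-256-PLUS tls-awesome\x00\x00"
print(parse_sasl_mechanisms(body))
```

Under this model a client simply picks the first entry it supports, in the server's order of preference.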
Re: [HACKERS] Letting the client choose the protocol to use during a SASL exchange
On 13/04/17 04:54, Michael Paquier wrote: On Thu, Apr 13, 2017 at 6:37 AM, Álvaro Hernández Tortosa <a...@8kdata.com> wrote: By looking at them, and unless I'm missing something, I don't see how the extra information for the future implementation of channel binding would be added (without changing the protocol). Relevant part is: The message body is a list of SASL authentication mechanisms, in the server's order of preference. A zero byte is required as terminator after the last authentication mechanism name. For each mechanism, there is the following: String Name of a SASL authentication mechanism. How do you plan to implement it, in future versions, without modifying the AuthenticationSASL message? Or is it OK to add new fields to a message in future PostgreSQL versions, without considering that a protocol change? I don't quite understand the complaint here, it is perfectly fine to add as many null-terminated names as you want with this model. The patches would make the server just send one mechanism name now, but nothing prevents the addition of more. I think I explained in my previous reply, but just in case: there are two lists here: SCRAM mechanisms and channel binding mechanisms. They are orthogonal; you could pick them separately (only with the -PLUS variants, of course). Both (SCRAM mechanisms and channel binding mechanisms) have to be advertised by the server. Álvaro -- Álvaro Hernández Tortosa --- <8K>data
Re: [HACKERS] Letting the client choose the protocol to use during a SASL exchange
On 13/04/17 13:24, Heikki Linnakangas wrote: On 04/13/2017 05:54 AM, Michael Paquier wrote: On Thu, Apr 13, 2017 at 6:37 AM, Álvaro Hernández Tortosa <a...@8kdata.com> wrote: By looking at them, and unless I'm missing something, I don't see how the extra information for the future implementation of channel binding would be added (without changing the protocol). Relevant part is: The message body is a list of SASL authentication mechanisms, in the server's order of preference. A zero byte is required as terminator after the last authentication mechanism name. For each mechanism, there is the following: String Name of a SASL authentication mechanism. How do you plan to implement it, in future versions, without modifying the AuthenticationSASL message? Or is it OK to add new fields to a message in future PostgreSQL versions, without considering that a protocol change? I don't quite understand the complaint here, it is perfectly fine to add as many null-terminated names as you want with this model. The patches would make the server just send one mechanism name now, but nothing prevents the addition of more. Right, when we get channel binding, the server will list "SCRAM-SHA-256" and "SCRAM-SHA-256-PLUS" as the list of mechanisms. And if we get channel binding using something else than tls-unique, then those will be added as extra mechanisms, too, e.g. "SCRAM-SHA-256-PLUS tls-awesome". And how about supporting different SCRAM mechanisms with different possible channel bindings? Separated by space too? So given a field, is the first item the SCRAM mechanism, and all the remaining ones the channel binding methods? I.e.: SCRAM-SHA-256-PLUS tls-awesome tls-awesome2 tls-awesome3\0... Please note that if this is the solution chosen: - A lot of parsing and convention is required (the first arg is the SCRAM mechanism, successive ones are channel binding names; tls-unique is always "implied", etc.) - Channel binding names will be repeated for every SCRAM mechanism with "-PLUS". 
This is not needed, as SCRAM mechanisms and channel binding are separate things. - Channel binding names will not be present on others, making the parser even more complex. All this vs., again, stating SCRAM mechanisms on one list and channel binding on another list, which is to me much more KISS. But... anyway, if this is the decision made, at least I think this should be documented now, because client parsers need to be designed one way or another. Thanks, Álvaro -- Álvaro Hernández Tortosa --- <8K>data
Re: [HACKERS] Letting the client choose the protocol to use during a SASL exchange
On 12/04/17 19:34, Heikki Linnakangas wrote: On 04/11/2017 02:32 PM, Álvaro Hernández Tortosa wrote: So I still see your proposal more awkward and less clear, mixing things that are separate. But again, your choice :) So, here's my more full-fledged proposal. The first patch refactors libpq code, by moving the responsibility of reading the GSS/SSPI/SASL/MD5 specific data from the authentication request packet, from the enormous switch-case construct in PQConnectPoll(), into pg_fe_sendauth(). This isn't strictly necessary, but I think it's useful cleanup anyway, and now that there's a bit more structure in the AuthenticationSASL message, the old way was getting awkward. The second patch contains the protocol changes, and adds the documentation for it. - Heikki Hi Heikki. Thanks for the patches :) By looking at them, and unless I'm missing something, I don't see how the extra information for the future implementation of channel binding would be added (without changing the protocol). Relevant part is: The message body is a list of SASL authentication mechanisms, in the server's order of preference. A zero byte is required as terminator after the last authentication mechanism name. For each mechanism, there is the following: String Name of a SASL authentication mechanism. How do you plan to implement it, in future versions, without modifying the AuthenticationSASL message? Or is it OK to add new fields to a message in future PostgreSQL versions, without considering that a protocol change? On a side note, I'd mention that the list of SASL authentication mechanisms contains valid IANA Registry SCRAM names (https://www.iana.org/assignments/sasl-mechanisms/sasl-mechanisms.xhtml#scram) for SCRAM authentication messages (making it clearer what values would be expected there from the client). I hope to start testing this from Java the coming weekend. I will keep you posted. 
Álvaro -- Álvaro Hernández Tortosa --- <8K>data
Re: [HACKERS] Some thoughts about SCRAM implementation
On 12/04/17 18:38, Robert Haas wrote: On Tue, Apr 11, 2017 at 9:20 AM, Álvaro Hernández Tortosa <a...@8kdata.com> wrote: LOL. Do you really want a half-baked Java programmer writing a patch in C for PostgreSQL? I once tried that and Magnus said my code was Java code that happened to compile with a C compiler ^_^ Having said that, I take the bait, I like challenges and putting my words behind my code :) Excellent, because that's how stuff gets done around here. Saying that you want something and hoping other people will do it is fine, but being willing to put some effort into it is a lot more likely to get it done. Not to be harsh, but showing up 3 days after feature freeze to complain that a feature pending commit for 14 months is missing something that you really need isn't really the right way to make something happen. I'm pretty sure that the lack of channel binding was discussed quite a bit earlier than now, so I think there was adequate opportunity to protest, contribute a patch, etc. It's not that I don't have sympathy for the way you feel about this: seeing features you care about fall out of a release sucks, and I've experienced a lot of that suckage quite recently, so the pain is fresh in my mind. But it's something we all have to live with for the overall good of the product. We used to not have a firm feature freeze, and that was way worse. Hi Robert. Not harsh, no worries, but please note a couple of things, let me give you a bit of context about my recent comments: - I haven't complained about not having channel binding support. I was just giving my own recommendations for the PostgreSQL community, in the belief that contributing this opinion could let -hackers make a more informed decision about whether or not to include a given feature. - I haven't said either that I need it. I don't use SCRAM, much less with channel binding. Rather than looking after myself here, I'm trying to put myself in the shoes of potential users and speak up for them. 
I might be wrong, of course, but this is the only role I'm trying to play here. - I know how PostgreSQL release cycles work and I am not meaning to hack them anyway. I just thought it worth raising that something I believe might be requested by enterprise users was, on the other hand, a minor addition to the feature, and thus had a really high value-to-effort ratio. Indeed, I bet adding channel binding is an order of magnitude easier than adding saslprep (which is optional too, btw). Ultimately I'm happy with SCRAM in PG 10, whether it has cbinding or not, as it is a great addition and makes PG10 even better software. - Even though I don't really care about SCRAM, and without having any prior knowledge about SCRAM, I volunteered some time ago to study SCRAM, give a lightning talk about SCRAM and later write a client implementation for the jdbc driver. And I have already devoted a very fair amount of time to doing so, and will keep doing that until all code is done. Code WIP is here FYI: https://github.com/ahachete/scram. So it's not that I haven't already put my code behind my words. - Given all that, I still want to volunteer to not only do the client jdbc part and consequently help debug the server implementation (as I believe it is so far the only non-libpq implementation), but also to stand by my words by writing channel binding code for the PostgreSQL server. This is a huge effort for me, I only did C programming in the year 2001. But still, I want to help from my limited capabilities as much as I can. - Having thoroughly studied the RFC and companion documentation, I thought I was in a position to give some recommendations that other hackers may not know about (unless a deep study of SCRAM had been done). That's it: recommendations, ideas. In this case, I think it is abundantly clear that SCRAM without channel binding is still a good feature. I agree. I may have exaggerated before, downplaying SCRAM without channel binding. 
I think it is great. Period. I also still think channel binding is very small code addition yet provides even better value. I apologize for not picking my previous words more carefully. One piece of evidence for that is that the standard uses the suffix -PLUS to denote the availability of channel binding. That clearly conveys that channel binding has value, but also that not having it does not make the whole thing useless. Otherwise, they would have required it to be part of every implementation, or they would have made you add -CRAPPY if you didn't have it. The discussion elsewhere on this thread has adequately underlined the value of what we've already got, so I won't further belabor the point here. Furthermore, I think that the state of this feature as it currently exists in the tree is actually kind of concerning. There are currently four open items pertaining to SCRAM at least two of which look to m
Re: [HACKERS] Some thoughts about SCRAM implementation
On 12/04/17 18:09, Tom Lane wrote: Heikki Linnakangas <hlinn...@iki.fi> writes: On 04/12/2017 06:26 PM, Bruce Momjian wrote: How does it do that? Good question, crypto magic? I don't know the details, but the basic idea is that you extract a blob of data that uniquely identifies the TLS connection. Using some OpenSSL functions, in this case. I think it's a hash of some of the TLS handshake messages that were used when the TLS connection was established (that's what "tls-unique" means). That data is then incorporated in the hash calculations of the SCRAM authentication. If the client and the server are not speaking over the same TLS connection, they will use different values for the TLS data, and the SCRAM computations will not match, and you get an authentication failure. I believe the above is not correct. Channel binding data is *not* used for any hash computations. It is simply a byte array that is received as an extra user parameter, and the server then obtains it on its own (from its own TLS data) and does a byte comparison. That's what the RFCs say about it. ... which the user can't tell apart from having fat-fingered the password, I suppose? Doesn't sound terribly friendly. A report of a certificate mismatch is far more likely to lead people to realize there's a MITM. So given what I said before, that won't happen. Indeed, the SCRAM RFC contains a specific error code for this: "channel-bindings-dont-match". Álvaro -- Álvaro Hernández Tortosa --- <8K>data
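The server-side check described above really is just a byte comparison; a minimal sketch of what it amounts to (the constant-time comparison is my own precaution here, not something the RFC mandates, and the exception type is purely illustrative):

```python
import hmac

def check_channel_binding(client_cb: bytes, server_cb: bytes) -> None:
    # RFC 5802: on a mismatch the server fails the exchange with the
    # server-error-value "channel-bindings-dont-match".
    if not hmac.compare_digest(client_cb, server_cb):
        raise PermissionError("e=channel-bindings-dont-match")

check_channel_binding(b"\xaa\xbb", b"\xaa\xbb")  # matching data: no error
```

With that dedicated error value, a client can report a binding mismatch distinctly from a bad password, which addresses the friendliness concern raised above.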
Re: [HACKERS] Some thoughts about SCRAM implementation
On 11/04/17 15:18, Heikki Linnakangas wrote: On 04/11/2017 04:09 PM, Álvaro Hernández Tortosa wrote: But I will conserve my remaining courage (thanks Michael!) credits for future threads ;) I have stated my opinion clearly, I will now go back to the client library. Once you're done with the client library, feel free to post a patch to implement channel binding. If it really is trivial, we can discuss it. Most likely, it won't go into v10, but someone's got to write the patch for v11 anyway, and until there's a patch on the table to discuss, there's no point in arguing. Yes, I'm trying to cajole you into writing the patch for v11, so that I don't have to :-). LOL. Do you really want a half-baked Java programmer writing a patch in C for PostgreSQL? I once tried that and Magnus said my code was Java code that happened to compile with a C compiler ^_^ Having said that, I take the bait, I like challenges and putting my words behind my code :) Álvaro -- Álvaro Hernández Tortosa --- <8K>data
Re: [HACKERS] Some thoughts about SCRAM implementation
On 11/04/17 15:03, Magnus Hagander wrote: On Tue, Apr 11, 2017 at 2:53 PM, Álvaro Hernández Tortosa <a...@8kdata.com <mailto:a...@8kdata.com>> wrote: On 10/04/17 20:32, Andres Freund wrote: On 2017-04-10 20:28:27 +0200, Álvaro Hernández Tortosa wrote: On 10/04/17 13:02, Heikki Linnakangas wrote: On 04/10/2017 12:39 PM, Álvaro Hernández Tortosa wrote: - I think channel binding support should be added. SCRAM brings security improvements over md5 and other simpler digest algorithms. But where it really shines is together with channel binding. This is the only method to prevent MITM attacks. Implementing it should not be very difficult. There are several possible channel binding mechanisms, but the mandatory and probably more typical one is 'tls-unique' which basically means getting the byte array from the TLSfinish() message and comparing it with the same data sent by the client. That's more or less all it takes to implement it. So I would go for it. We missed the boat for PostgreSQL 10. You're right that it probably wouldn't be difficult to implement, but until there's a concrete patch to discuss, that's a moot point. Really? That's a real shame I know we're very late in the CF cycle but, again, this would be a real shame. That can equally be said about a lot of features. If we don't stop at some point... Also, we're not late in the CF cycle, the CF cycle for v10 is over. It's not like the non-existence of channel binding removes previously existing features, or makes SCRAM useless. Greetings, Andres Freund I know this is a lost battle. But please bear with me for a minute. Let's put ourselves in the shoes of potential users. Why would anyone want to use SCRAM? What for? The hashing mechanism is better, no question. And it brings some added benefits, true. So it's "better". But the real gain comes from using channel binding, which avoids impersonation, MITM attacks. This is the deal breaker. 
SCRAM without channel binding is like Coke Zero without caffeine and mixed with water. Don't get me wrong, the work behind it is great. I think you are seriously undervaluing SCRAM without channel binding. I'm not. If I didn't appreciate its value, I wouldn't be writing a client library for the jdbc driver. It improves a lot of things over our current md5 method beyond just using a stronger hashing algorithm. Sure, channel binding is great. But that's not a dealbreaker, or even close to it. I think otherwise. It is close to a dealbreaker. And it requires so few extra lines of code. But just a bit more is needed to make it really a big announcement and provide real value to (I guess, mostly but very interesting) enterprise customers, for which MITM and impersonation are big things. The good news is that adding channel binding is like inverse Pareto: 20% of extra effort (I bet significantly less) leads to 80% of the improvement. I would expect most enterprise customers who care about MITM protection are already using either TLS or ipsec to cover that already. They benefit from the other parts of SCRAM, but they've already solved those problems. Enterprises use checklists. And discard solutions if they don't have "checks" on all the items. One of those is, in my opinion, SCRAM with channel binding. I don't want this to happen to PG, especially when it's so easy to implement. But I will conserve my remaining courage (thanks Michael!) credits for future threads ;) I have stated my opinion clearly, I will now go back to the client library. Thanks, Álvaro -- Álvaro Hernández Tortosa --- <8K>data
Re: [HACKERS] Some thoughts about SCRAM implementation
On 10/04/17 20:32, Andres Freund wrote: On 2017-04-10 20:28:27 +0200, Álvaro Hernández Tortosa wrote: On 10/04/17 13:02, Heikki Linnakangas wrote: On 04/10/2017 12:39 PM, Álvaro Hernández Tortosa wrote: - I think channel binding support should be added. SCRAM brings security improvements over md5 and other simpler digest algorithms. But where it really shines is together with channel binding. This is the only method to prevent MITM attacks. Implementing it should not be very difficult. There are several possible channel binding mechanisms, but the mandatory and probably more typical one is 'tls-unique' which basically means getting the byte array from the TLSfinish() message and comparing it with the same data sent by the client. That's more or less all it takes to implement it. So I would go for it. We missed the boat for PostgreSQL 10. You're right that it probably wouldn't be difficult to implement, but until there's a concrete patch to discuss, that's a moot point. Really? That's a real shame I know we're very late in the CF cycle but, again, this would be a real shame. That can equally be said about a lot of features. If we don't stop at some point... Also, we're not late in the CF cycle, the CF cycle for v10 is over. It's not like the non-existence of channel binding removes previously existing features, or makes SCRAM useless. Greetings, Andres Freund I know this is a lost battle. But please bear with me for a minute. Let's put ourselves in the shoes of potential users. Why would anyone want to use SCRAM? What for? The hashing mechanism is better, no question. And it brings some added benefits, true. So it's "better". But the real gain comes from using channel binding, which avoids impersonation, MITM attacks. This is the deal breaker. SCRAM without channel binding is like Coke Zero without caffeine and mixed with water. Don't get me wrong, the work behind it is great. 
But just a bit more is needed to make it really a big announcement and provide real value to (I guess, mostly but very interesting) enterprise customers, for whom MITM and impersonation are big things. The good news is that adding channel binding is like an inverse Pareto: 20% of extra effort (I bet significantly less) leads to 80% improvement. So the v10 CF is over, and we're in the testing phase. Can't we consider this a "missing feature bug"? ^_^ Álvaro -- Álvaro Hernández Tortosa --- <8K>data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Letting the client choose the protocol to use during a SASL exchange
On 11/04/17 13:21, Heikki Linnakangas wrote: On 04/11/2017 01:39 PM, Álvaro Hernández Tortosa wrote: The fact that you null terminate them (fine with me) doesn't change my reasoning. How do you separate multiple channel binding methods? And do you realize that you will be repeating the channel binding methods without reason? A contrived but legal, possible example: Field1: SCRAM-SHA-256\0 SCRAM-SHA-512\0 SCRAM-SHA-999\0 SCRAM-SHA-256-PLUS tls-unique tls-awesome yet-another-tls\0 SCRAM-SHA-512-PLUS tls-unique tls-awesome yet-another-tls\0 SCRAM-SHA-999-PLUS tls-unique tls-awesome yet-another-tls\0 I was actually thinking of: SCRAM-SHA-256\0 SCRAM-SHA-512\0 SCRAM-SHA-999\0 SCRAM-SHA-256-PLUS\0 SCRAM-SHA-256-PLUS tls-awesome\0 SCRAM-SHA-256-PLUS yet-another-tls\0 SCRAM-SHA-512-PLUS\0 SCRAM-SHA-512-PLUS tls-awesome\0 SCRAM-SHA-512-PLUS yet-another-tls\0 SCRAM-SHA-999-PLUS\0 SCRAM-SHA-999-PLUS tls-awesome\0 SCRAM-SHA-999-PLUS yet-another-tls\0 In practice, I don't foresee us having this many options, so the verbosity won't be an issue. Parsing this is very straightforward. That's maybe slightly better since, I agree, verbosity is not an issue. But parsing (parsers, and validators) is still more complex (if the suffix is -PLUS, you need to split by space and check another field with another format, based on another lookup table of IANA registry names, and so forth). Vs: this field is for SCRAM names, this field is for channel binding names. Done. Let me exemplify. In Java-ish syntax, your type would be something like: List<Pair<ScramMechanism,ChannelBindingType>> from which you need to extract individual ScramMechanisms and unique(ChannelBindingType). My proposal would have two lists: List<ScramMechanism> and List<ChannelBindingType>, which is exactly what you need. So I still see your proposal as more awkward and less clear, mixing things that are separate.
But again, your choice :) vs Field 1: SCRAM-SHA-256,SCRAM-SHA-256-PLUS,SCRAM-SHA-1-PLUS,SCRAM-SHA-512-PLUS (simple CSV) Field 2: tls-unique (String) What if tls-unique is only supported with SCRAM-SHA-256-PLUS, while SCRAM-SHA-512-PLUS requires tls-awesome? It can't happen. The RFC clearly states that they are orthogonal. It is left to implementations to support one or the other, but there is no reason to limit the applicability of a given binding method to a given SCRAM mechanism (or vice versa). Well, if tls-unique is found to be insecure, a future SCRAM-SHA-512-PLUS spec might well define that the default for that mechanism is tls-unique-new-and-secure rather than tls-unique. Maybe even forbid using tls-unique altogether. I don't think that's likely, but this is all about future-proofing, so I'd rather keep it flexible. If it were insecure, I'd immediately stop it from being advertised, and problem solved. Nothing would change (under my proposal). -- Álvaro Hernández Tortosa --- <8K>data
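To make the trade-off concrete, here is a rough sketch (in Python, not project code) of what parsing each of the two advertisement formats under discussion might look like. The mechanism and channel binding names are illustrative, taken from the contrived examples above:

```python
# Sketch (illustrative, not PostgreSQL code): parsing the two proposed
# AuthenticationSASL advertisement formats from the discussion above.

def parse_combined(payload: bytes):
    """Format A: null-terminated entries, each a mechanism name optionally
    followed by space-separated channel binding types."""
    pairs = []
    for entry in payload.split(b"\x00"):
        if not entry:          # trailing empty piece after the last null
            continue
        parts = entry.decode("utf-8").split(" ")
        pairs.append((parts[0], parts[1:]))   # (mechanism, [cb types])
    return pairs

def parse_two_fields(mechs: bytes, cb_types: bytes):
    """Format B: one CSV field of mechanisms, one CSV field of binding types."""
    return (mechs.decode("utf-8").split(","),
            cb_types.decode("utf-8").split(","))

combined = (b"SCRAM-SHA-256\x00SCRAM-SHA-256-PLUS\x00"
            b"SCRAM-SHA-256-PLUS tls-awesome\x00")
print(parse_combined(combined))

mechanisms, bindings = parse_two_fields(
    b"SCRAM-SHA-256,SCRAM-SHA-256-PLUS", b"tls-unique")
print(mechanisms, bindings)
```

Either way the code is short; the disagreement in the thread is about which shape is clearer and easier to validate, not about raw parsing difficulty.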
Re: [HACKERS] Letting the client choose the protocol to use during a SASL exchange
On 11/04/17 12:23, Heikki Linnakangas wrote: On 04/11/2017 11:55 AM, Álvaro Hernández Tortosa wrote: On 11/04/17 08:50, Heikki Linnakangas wrote: Oh, I see. According to the SCRAM RFC, "tls-unique" is used by default. I don't see us implementing anything else any time soon. PostgreSQL doesn't support any other "channel type" than TLS, and tls-unique is the only one that makes sense for TLS. If we do need it after all, the server could advertise the additional channel binding types as additional SASL mechanisms in the AuthenticationSASL message, maybe something like: "SCRAM-SHA-256" "SCRAM-SHA-256-PLUS" (for tls-unique) "SCRAM-SHA-256-PLUS ssh-unique" (for hypothetical ssh channel binding) The same trick can be used to advertise any other SASL mechanism specific options, if needed in the future. Why not add an extra field to the message? This scheme has in my opinion some disadvantages: - You assume no extensibility. Maybe Postgres will implement other mechanisms for channel binding. Maybe not. But why limit ourselves? - Apart from tls-unique there are others today, like tls-server-end-point and who knows if in the future TLS 1.x comes with something like 'tls-unique-1.x' - Why have to parse the string (separated by spaces) when you could use different fields and have no parsing at all? I don't think an option separated by space is any more difficult to parse than a separate field. I'm envisioning that the "parsing" would be simply: if (strcmp(sasl_mechanism, "SCRAM-SHA-256") == 0) { ... } else if (strcmp(sasl_mechanism, "SCRAM-SHA-256-PLUS") == 0) { ... } else if (strcmp(sasl_mechanism, "SCRAM-SHA-256-PLUS tls-awesome") == 0) { ... } This can be extended for more complicated options, if necessary. Although if we find that we need a dozen different options, I think we've done something wrong. - How do you advertise different SCRAM mechanisms with different channel binding types? And a mix of SCRAM mechanisms with and without channel binding? 
If I got it right, with your proposal it would be something like: Field 1: SCRAM-SHA-256,SCRAM-SHA-256-PLUS tls-unique,SCRAM-SHA-1-PLUS tls-unique,SCRAM-SHA-512-PLUS tls-unique (which is basically a CSV of pairs where the right part of the pair might be empty; too much IMHO for a single field) Yes, except that in my proposal, the list is not a comma-separated string, but a list of null-terminated strings, similar to how some other lists of options in the FE/BE protocol are transmitted. The fact that you null terminate them (fine with me) doesn't change my reasoning. How do you separate multiple channel binding methods? And do you realize that you will be repeating the channel binding methods without reason? A contrived but legal, possible example: Field1: SCRAM-SHA-256\0 SCRAM-SHA-512\0 SCRAM-SHA-999\0 SCRAM-SHA-256-PLUS tls-unique tls-awesome yet-another-tls\0 SCRAM-SHA-512-PLUS tls-unique tls-awesome yet-another-tls\0 SCRAM-SHA-999-PLUS tls-unique tls-awesome yet-another-tls\0 vs Field 1: SCRAM-SHA-256 SCRAM-SHA-512 SCRAM-SHA-999 SCRAM-SHA-256-PLUS SCRAM-SHA-512-PLUS SCRAM-SHA-999-PLUS\0 Field 2: tls-unique tls-awesome yet-another-tls\0 Parsing and validation of the first option is significantly more complex than the second option. vs Field 1: SCRAM-SHA-256,SCRAM-SHA-256-PLUS,SCRAM-SHA-1-PLUS,SCRAM-SHA-512-PLUS (simple CSV) Field 2: tls-unique (String) What if tls-unique is only supported with SCRAM-SHA-256-PLUS, while SCRAM-SHA-512-PLUS requires tls-awesome? It can't happen. The RFC clearly states that they are orthogonal. It is left to implementations to support one or the other, but there is no reason to limit the applicability of a given binding method to a given SCRAM mechanism (or vice versa). What about possible other options you might need to tack on to mechanisms? This seems less flexible. I'm not saying that either of those cases is very likely, but I don't think it's very likely we'll need extra options or different channel binding types any time soon, anyway.
As I said, channel binding and SCRAM mechanisms are orthogonal. On the flip side, a contrived parsing mechanism with optional fields, many of which would be repeated all the time, seems far from ideal. And far less clear. Is there any reason not to use two fields? AuthenticationSASL is a new message, I guess we can freely choose its format, right? Yes, we can choose the format freely. Cool. In summary, I think the simple list of mechanism names is best, because: * It's simple, and doesn't have any extra fields or options that are not needed right now. I think it is too complicated and mixes things that are orthogonal. Since the protocol is hard to extend, adding an extra field for the channel binding mechanisms -even if unused as of PG 10- is a good thing. If you need to change the protocol later, th
Re: [HACKERS] Letting the client choose the protocol to use during a SASL exchange
On 11/04/17 08:50, Heikki Linnakangas wrote: On 04/10/2017 11:03 PM, Álvaro Hernández Tortosa wrote: Channel binding actually needs to specify three things: - The mechanism, which is indeed suffixed "-PLUS". - The channel binding name, which is described here: https://tools.ietf.org/html/rfc5056. Types are also IANA-registered (see https://www.iana.org/assignments/channel-binding-types/channel-binding-types.xhtml). SCRAM mandates implementing 'tls-unique', but other channel binding types could be supported (like 'tls-server-end-point', for example). - The channel binding data, which is channel-binding-mechanism dependent, and is sent as part of the client last message. What I'm talking about here is the second one, the channel binding type (name). Oh, I see. According to the SCRAM RFC, "tls-unique" is used by default. I don't see us implementing anything else any time soon. PostgreSQL doesn't support any other "channel type" than TLS, and tls-unique is the only one that makes sense for TLS. If we do need it after all, the server could advertise the additional channel binding types as additional SASL mechanisms in the AuthenticationSASL message, maybe something like: "SCRAM-SHA-256" "SCRAM-SHA-256-PLUS" (for tls-unique) "SCRAM-SHA-256-PLUS ssh-unique" (for hypothetical ssh channel binding) The same trick can be used to advertise any other SASL mechanism specific options, if needed in the future. Why not add an extra field to the message? This scheme has, in my opinion, some disadvantages: - You assume no extensibility. Maybe Postgres will implement other mechanisms for channel binding. Maybe not. But why limit ourselves? - Apart from tls-unique there are others today, like tls-server-end-point, and who knows if in the future TLS 1.x comes with something like 'tls-unique-1.x' - Why parse a string (separated by spaces) when you could use different fields and need no parsing at all?
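For readers following along, a minimal sketch of how the channel binding data (the third item above) actually travels in a SCRAM exchange per RFC 5802: the client's gs2 header names the binding type, and the client-final message carries a c= attribute that is the base64 of the gs2 header concatenated with the binding data. The finished-message bytes below are a placeholder; in a real client they would come from the TLS layer (e.g. Python's ssl.SSLSocket.get_channel_binding("tls-unique")):

```python
import base64

# Sketch: how tls-unique data is carried in SCRAM (RFC 5802).
gs2_header = b"p=tls-unique,,"          # client requires channel binding
finished_bytes = b"\x01\x02\x03\x04"    # placeholder for TLS Finished data

cbind_input = gs2_header + finished_bytes
c_attr = "c=" + base64.b64encode(cbind_input).decode("ascii")
print(c_attr)

# Without channel binding, the gs2 header is "n,," and c= is just its base64:
print("c=" + base64.b64encode(b"n,,").decode("ascii"))   # prints "c=biws"
```

The "c=biws" value is the constant seen in non-channel-binding SCRAM traces; the server rejects the proof if the binding data it extracts from its own TLS layer doesn't match, which is what defeats a MITM.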
- How do you advertise different SCRAM mechanisms with different channel binding types? And a mix of SCRAM mechanisms with and without channel binding? If I got it right, with your proposal it would be something like: Field 1: SCRAM-SHA-256,SCRAM-SHA-256-PLUS tls-unique,SCRAM-SHA-1-PLUS tls-unique,SCRAM-SHA-512-PLUS tls-unique (which is basically a CSV of pairs where the right part of the pair might be empty; too much IMHO for a single field) vs Field 1: SCRAM-SHA-256,SCRAM-SHA-256-PLUS,SCRAM-SHA-1-PLUS,SCRAM-SHA-512-PLUS (simple CSV) Field 2: tls-unique (String) Is there any reason not to use two fields? AuthenticationSASL is a new message, I guess we can freely choose its format, right? I'm not sure I follow. The username is sent from client to server, and currently, the server will ignore it. If you're writing a client library, it can send whatever it wants. (Although again I would recommend an empty string, to avoid confusion. Sending the same username as in the startup packet, as long as it's in UTF-8, seems reasonable too.) OK, understood. I will then not let the SCRAM implementation I'm writing allow an empty string as the user name, but have the pgjdbc glue code send "ignore" as the user name, or something like that ;P Hmm, so the SASL library you're using doesn't like sending an empty string as the username? Now that I look at RFC 5802 more closely, it says: If the preparation of the username fails or results in an empty string, the server SHOULD abort the authentication exchange. Perhaps we should reserve a magic user name to mean "same as startup message", in addition to or instead of the empty string. We actually discussed that already at [1], but we forgot about it since. That works. Please let me know which "magic constant" is chosen ;P Thanks, Álvaro -- Álvaro Hernández Tortosa --- <8K>data
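To illustrate what is actually at stake with the username field, here is a sketch of building a client-first-message per RFC 5802, once with an empty username (the case the RFC frowns on) and once with a username needing the =2C/=3D escaping the spec requires. The nonce value is just an example:

```python
# Sketch: RFC 5802 client-first-message construction. An empty username
# produces "n=," which a strict server SHOULD reject, hence the discussion
# above about a placeholder or magic constant.
def client_first_message(username: str, nonce: str) -> str:
    # Escape "=" before "," so the "=2C" sequences are not double-escaped.
    saslname = username.replace("=", "=3D").replace(",", "=2C")
    return "n,," + "n=" + saslname + ",r=" + nonce

print(client_first_message("", "rOprNGfwEbeRWgbNEkqO"))
print(client_first_message("odd,user", "rOprNGfwEbeRWgbNEkqO"))
```

The escaping is also why PostgreSQL can't simply forward arbitrary startup-packet usernames: the SCRAM encoding rules assume UTF-8 input, which Postgres does not guarantee.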
Re: [HACKERS] Letting the client choose the protocol to use during a SASL exchange
On 10/04/17 21:41, Heikki Linnakangas wrote: On 04/10/2017 09:33 PM, Álvaro Hernández Tortosa wrote: Thanks for posting the patched HTML. In my opinion, all looks good except that: - I will add an extra String (a CSV) to the AuthenticationSASL message for channel binding names, so that the message format can remain unchanged when channel binding is implemented. It can be empty. Note that SCRAM-SHA-256 with channel binding has a different SASL mechanism name, SCRAM-SHA-256-PLUS. No need for a separate flag or string for channel binding. When support for channel binding is added to the server, it will advertise two SASL mechanisms in the AuthenticationSASL message, SCRAM-SHA-256 and SCRAM-SHA-256-PLUS. (Or just SCRAM-SHA-256-PLUS, if channel binding is required). Channel binding actually needs to specify three things: - The mechanism, which is indeed suffixed "-PLUS". - The channel binding name, which is described here: https://tools.ietf.org/html/rfc5056. Types are also IANA-registered (see https://www.iana.org/assignments/channel-binding-types/channel-binding-types.xhtml). SCRAM mandates implementing 'tls-unique', but other channel binding types could be supported (like 'tls-server-end-point', for example). - The channel binding data, which is channel-binding-mechanism dependent, and is sent as part of the client last message. What I'm talking about here is the second one, the channel binding type (name). - If the username used is the one sent in the startup message, rather than leaving it empty in the client-first-message, I would force it to be the same as the one used in the startup message. The problem with that is that the SCRAM spec dictates that the username must be encoded in UTF-8, but PostgreSQL supports non-UTF-8 usernames. Or did you mean that, if the username is sent, it must match the one in the startup packet, but an empty string would always be allowed? That would be reasonable.
Otherwise we may confuse some client implementations, which would probably consider that an error; for one, my implementation would currently throw an error if the username is empty, and I think that's correct. I'm not sure I follow. The username is sent from client to server, and currently, the server will ignore it. If you're writing a client library, it can send whatever it wants. (Although again I would recommend an empty string, to avoid confusion. Sending the same username as in the startup packet, as long as it's in UTF-8, seems reasonable too.) OK, understood. I will then not let the SCRAM implementation I'm writing allow an empty string as the user name, but have the pgjdbc glue code send "ignore" as the user name, or something like that ;P Thanks for reviewing this! I'll start hacking on code changes to go with these docs. Thanks for writing the code :) Álvaro -- Álvaro Hernández Tortosa --- <8K>data
Re: [HACKERS] Letting the client choose the protocol to use during a SASL exchange
On 10/04/17 14:57, Heikki Linnakangas wrote: On 04/07/2017 01:13 AM, Michael Paquier wrote: On Fri, Apr 7, 2017 at 5:15 AM, Álvaro Hernández Tortosa <a...@8kdata.com> wrote: I don't see it. The message AuthenticationSASL.String could contain a CSV of the SCRAM protocols supported. This is especially important to support channel binding (which is just another protocol name for this matter), which is the really enhanced security mechanism of SCRAM. Since this message is sent regardless, and the client replies with PasswordMessage, no extra round trip is required. However, PasswordMessage needs to also include a field with the name of the selected protocol (it is the client who picks). Or a different message would need to be created, but no extra round trips beyond those required by SCRAM itself (4 messages for SCRAM + 1 extra for the server to tell the client it needs to use SCRAM). Yes, it seems to me that the list of protocols to send should be done by sendAuthRequest(). Then the client parses the received string, and sends an extra 'p' message with its choice before sending the first SCRAM message. So there is no need for any extra round trips. I started writing down the protocol docs, based on the above idea. See attached. The AuthenticationSASL message now contains a list of mechanisms. Does that seem clear to you? If so, I'll change the code to match the attached docs. I added two new message formats to the docs, SASLResponse and SASLInitialResponse. Those use the same type byte as PasswordMessage, 'p', but I decided to describe them as separate messages for documentation purposes, since the content is completely different depending on whether the message is sent as part of SASL, GSS, md5, or password authentication. IOW, this is not a change in the implementation, only in the way it's documented. While working on this, and reading the RFCs more carefully, I noticed one detail we should change, to be spec-compliant.
The SASL spec specifies that a SASL authentication exchange consists of challenge-response pairs. There must be a response to each challenge. If the last message in the authentication mechanism (SCRAM in this case) goes in the server->client direction, then that message must be sent as "additional data" in the server->client message that tells the client that the authentication was successful. That's AuthenticationOK in the PostgreSQL protocol. In the current implementation, the server-final-message is sent as an AuthenticationSASLContinue message, and the client doesn't respond to that. We should change that, so that the server-final-message is sent as "additional data" in the AuthenticationOK message. The attached docs patch describes that, rather than what the current implementation does. (For your convenience, I built the HTML docs with this patch, and put them up at http://hlinnaka.iki.fi/temp/scram-wip-docs/protocol.html for viewing) - Heikki Thanks for posting the patched HTML. In my opinion, all looks good except that: - I will add an extra String (a CSV) to the AuthenticationSASL message for channel binding names, so that the message format can remain unchanged when channel binding is implemented. It can be empty. - If the username used is the one sent in the startup message, rather than leaving it empty in the client-first-message, I would force it to be the same as the one used in the startup message. Otherwise we may confuse some client implementations, which would probably consider that an error; for one, my implementation would currently throw an error if the username is empty, and I think that's correct. Álvaro -- Álvaro Hernández Tortosa --- <8K>data
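A minimal sketch of the mechanism-list encoding the docs describe for AuthenticationSASL, a series of null-terminated strings ended by an empty string (an extra null byte). The field layout here is illustrative, not a byte-exact reproduction of the full message:

```python
# Sketch: encoding/decoding an AuthenticationSASL-style mechanism list as
# null-terminated strings terminated by an empty string.
def encode_mechanism_list(mechs):
    return b"".join(m.encode("utf-8") + b"\x00" for m in mechs) + b"\x00"

def decode_mechanism_list(buf: bytes):
    mechs, start = [], 0
    while True:
        end = buf.index(b"\x00", start)
        if end == start:            # empty string: list terminator
            return mechs
        mechs.append(buf[start:end].decode("utf-8"))
        start = end + 1

wire = encode_mechanism_list(["SCRAM-SHA-256", "SCRAM-SHA-256-PLUS"])
print(decode_mechanism_list(wire))
```

This is the same convention other FE/BE messages use for string lists, which is part of Heikki's argument for preferring it over a new CSV field.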
Re: [HACKERS] Some thoughts about SCRAM implementation
On 10/04/17 13:02, Heikki Linnakangas wrote: On 04/10/2017 12:39 PM, Álvaro Hernández Tortosa wrote: - I think channel binding support should be added. SCRAM brings security improvements over md5 and other simpler digest algorithms. But where it really shines is together with channel binding. This is the only method to prevent MITM attacks. Implementing it should not be very difficult. There are several possible channel binding mechanisms, but the mandatory and probably most typical one is 'tls-unique', which basically means getting the byte array from the TLS Finished message and comparing it with the same data sent by the client. That's more or less all it takes to implement it. So I would go for it. We missed the boat for PostgreSQL 10. You're right that it probably wouldn't be difficult to implement, but until there's a concrete patch to discuss, that's a moot point. Really? That's a real shame. I know we're very late in the CF cycle but, again, this would be a real shame. - One SCRAM parameter, i, the iteration count, is important, and in my opinion should be configurable. AFAIK it is currently hard coded at 4096, which is the minimum value accepted by the RFC. But it should be at least 15K (the RFC says), and given that it affects the computing time of the authentication, it should be configurable. It's also now specified per user, which I think is too much granularity (it should suffice to say this group of users all require this iteration count). Yeah, a GUC might be a good idea. Cool. The iteration count is a part of the stored SCRAM verifier, so it's determined when you set the user's password. That's why it's per-user. Sure. But I think per-user is too much granularity. And if it is not, it should be a parameter exposed to the CREATE USER command, such that it can be (effectively) set per user. I maintain the best approach could be the suggested pg_scram.conf with the format (or a similar one) that I proposed in the OP.
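For context on why the iteration count matters: RFC 5802's Hi(password, salt, i) is PBKDF2 with HMAC as the PRF, so the cost of deriving SaltedPassword scales linearly with i. A quick sketch (the password and iteration values are just examples):

```python
import hashlib
import os
import time

# Sketch: Hi() from RFC 5802 is PBKDF2-HMAC, so the per-authentication cost
# grows linearly with the iteration count i carried in the SCRAM verifier.
salt = os.urandom(16)

def salted_password(password: str, salt: bytes, iterations: int) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                               salt, iterations)

for i in (4096, 15000):
    t0 = time.perf_counter()
    sp = salted_password("pencil", salt, i)
    print(f"i={i}: {time.perf_counter() - t0:.4f}s, "
          f"{len(sp)}-byte SaltedPassword")
```

Because the server stores the verifier derived with a specific i, changing the count only takes effect when a password is (re)set, which is the coupling behind the per-user behavior discussed above.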
- The SCRAM RFC states that the server should advertise supported mechanisms. I see discussion going into not advertising them. I think it should be done, I don't see reasons not to do it, and it will become compliant with the RFC. The SASL spec [RFC4422] says that mechanism negotiation is protocol specific. The SCRAM spec [RFC5802] prescribes how negotiation of channel binding is done, and the current implementation is compliant with that. (Which is very straightforward when neither the client nor the server supports channel binding). Yeah. But what I'm saying is that we might want to add an extra String to the AuthenticationSASL message for channel binding, so that the message format need not be changed when channel binding is added. But the channel binding method name needs to be negotiated, and so there needs to be room for it. The negotiation of the mechanism is being discussed on the "Letting the client choose the protocol to use during a SASL exchange" thread. I'm just writing a more concrete proposal based on your suggestion of sending a list of SASL mechanisms in the AuthenticationSASL message. Stay tuned! Yepp, I will reply there, thanks :) - I don't see why the proposed scram mechanism names for pg_hba.conf are not following the IANA registry standard ( https://www.iana.org/assignments/sasl-mechanisms/sasl-mechanisms.xhtml#scram), which is uppercase (like in SCRAM-SHA-256). All the existing authentication mechanisms are spelled in lowercase. And uppercase is ugly. It's a matter of taste, for sure. And I agree with you. It's ugly. But it's standard. I'd say let's favor standardization vs our own taste. - SCRAM also supports the concept of authzid, which is kind of what pg_ident does: authenticate the user as the user sent, but log in as the user specified here. It could also be supported. Yeah, it might be handy for something like pgbouncer, which could then have one userid+password to authenticate, but then switch to another user.
But the SCRAM part of that seems to be just a small part of the overall feature. In any case, this is clearly Postgres 11 material. Again, it should be a tiny change, just an extra attribute sent over the message. But I guess time was over... just saying... ;) * The nonce length is not specified by the RFC. I see typical implementations use 24 chars for the client and 18 for the server. Current code uses 10. I think it should not hurt making it at least 16 or 18. Wouldn't hurt, I guess. IIRC I checked some other implementations, when I picked 10, but I don't remember which ones anymore. Got a reference for 24/18? First reference is the RFC example itself (non-mandatory, of course). But then I saw many followed this. As a quick example, GNU SASL defines: #define SNONCE_ENTROPY_BYTES 18 https://www.gnu.org/software/gsasl/coverage/lib/scram/server.c.gcov.frameset.html * It seems to me that the errors defined by the
[HACKERS] Some thoughts about SCRAM implementation
Hi! There's some ongoing discussion about SCRAM (like this thread https://www.postgresql.org/message-id/243d8c11-6149-a4bb-0909-136992f74b23%40iki.fi) but I wanted to open a new thread that covers these topics and other, more general ones. Here are some thoughts based on my perspective on SCRAM, which I gained thanks to studying it as part of the effort to implement SCRAM in the JDBC driver. FYI, some very early code can be found here: https://github.com/ahachete/scram. Although there's still a lot to do, I'm making great progress and I expect to have a working version within 2 weeks. So here's what I think: - Work so far is great. Thanks Heikki, Michael and others for the effort involved. I love having SCRAM support in Postgres! - I think channel binding support should be added. SCRAM brings security improvements over md5 and other simpler digest algorithms. But where it really shines is together with channel binding. This is the only method to prevent MITM attacks. Implementing it should not be very difficult. There are several possible channel binding mechanisms, but the mandatory and probably most typical one is 'tls-unique', which basically means getting the byte array from the TLS Finished message and comparing it with the same data sent by the client. That's more or less all it takes to implement it. So I would go for it. - One SCRAM parameter, i, the iteration count, is important, and in my opinion should be configurable. AFAIK it is currently hard coded at 4096, which is the minimum value accepted by the RFC. But it should be at least 15K (the RFC says), and given that it affects the computing time of the authentication, it should be configurable. It's also now specified per user, which I think is too much granularity (it should suffice to say this group of users all require this iteration count). - The SCRAM RFC states that the server should advertise supported mechanisms. I see discussion going into not advertising them.
I think it should be done, I don't see reasons not to do it, and it will become compliant with the RFC. - I don't see why the proposed scram mechanism names for pg_hba.conf are not following the IANA registry standard ( https://www.iana.org/assignments/sasl-mechanisms/sasl-mechanisms.xhtml#scram), which is uppercase (like in SCRAM-SHA-256). - SCRAM also supports the concept of authzid, which is kind of what pg_ident does: authenticate the user as the user sent, but log in as the user specified here. It could also be supported. Based on these comments, I'd like to propose the following changes: * Implement channel binding. * Add a GUC for the channel binding technique to use. It could be named "scram_channel_binding_name" or "sasl_channel_binding_name", for example. The only supported value as of today would probably be 'tls-unique'. * AuthenticationSASL backend message: - Make the String field a CSV listing one or more supported IANA SCRAM authentication mechanisms. This info comes from pg_scram.conf (see below). - Add another String to specify the variety of channel binding supported (if any), like 'tls-unique'. This info comes from the GUC. * Treat the configuration of scram in pg_hba.conf, which seems complicated, similarly to how pg_ident.conf is treated. Basically, in pg_hba.conf specify a "scram=scram_entry_name" value and then look in a pg_scram.conf file for an entry with that name. The pg_scram.conf file might have the following structure:

name        | scram_mechanisms                 | i
scramlight  | SCRAM-SHA-1                      | 4096
scramstrong | SCRAM-SHA-256,SCRAM-SHA-256-PLUS | 16384

* The nonce length is not specified by the RFC. I see typical implementations use 24 chars for the client and 18 for the server. Current code uses 10. I think it should not hurt to make it at least 16 or 18. * It seems to me that the errors defined by the RFC, sent on the server-final-message if there were errors, are never sent with the current implementation.
I think they should be sent as per the standard, and the exchange should proceed until the last stage even if errors were detected earlier (to conform with the RFC). Thoughts? Álvaro -- Álvaro Hernández Tortosa --- <8K>data
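To ground the points above about the verifier, the iteration count and the proof exchange, here is a self-contained sketch of the RFC 5802 crypto: the client derives ClientProof from the password, and the server, holding only StoredKey, verifies it. The salt, nonces, and AuthMessage contents here are illustrative placeholders, not a real trace:

```python
import base64
import hashlib
import hmac
import os

# Sketch of the RFC 5802 proof computation and server-side verification.
def h(data): return hashlib.sha256(data).digest()
def hmac_(key, msg): return hmac.new(key, msg, hashlib.sha256).digest()
def xor(a, b): return bytes(x ^ y for x, y in zip(a, b))

password, salt, i = b"pencil", os.urandom(16), 4096
# AuthMessage = client-first-bare "," server-first "," client-final-sans-proof
auth_message = b"n=,r=cnonce,r=cnoncesnonce,s=...,i=4096,c=biws,r=cnoncesnonce"

# Client side: SaltedPassword -> ClientKey -> StoredKey -> ClientProof
salted = hashlib.pbkdf2_hmac("sha256", password, salt, i)
client_key = hmac_(salted, b"Client Key")
stored_key = h(client_key)
client_proof = xor(client_key, hmac_(stored_key, auth_message))
print("p=" + base64.b64encode(client_proof).decode("ascii"))

# Server side: recover ClientKey from the proof, check it hashes to StoredKey
recovered = xor(client_proof, hmac_(stored_key, auth_message))
print("authenticated:", h(recovered) == stored_key)
```

RFC 7677 publishes a full SCRAM-SHA-256 example exchange (user "user", password "pencil") that can be used to cross-check an implementation against known values.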
[HACKERS] Some thoughts about SCRAM implementation
Hi! There's some ongoing discussion about SCRAM (like this thread: https://www.postgresql.org/message-id/243d8c11-6149-a4bb-0909-136992f74b23%40iki.fi) but I wanted to open a new thread that covers these topics and other, more general ones. Here are some thoughts based on my perspective on SCRAM, which I gained thanks to studying it as part of the effort to implement SCRAM in the JDBC driver. FYI, some very early code can be found here: https://github.com/ahachete/scram. Although there's still a lot to do, I'm making good progress and I expect to have a working version within 2 weeks.

So here's what I think:

- Work so far is great. Thanks Heikki, Michael and others for the effort involved. I love having SCRAM support in Postgres!

- I think channel binding support should be added. SCRAM brings security improvements over md5 and other simpler digest algorithms, but where it really shines is together with channel binding: this is the only method to prevent MITM attacks. Implementing it should not be very difficult. There are several possible channel binding mechanisms, but the mandatory and probably most typical one is 'tls-unique', which basically means getting the byte array from the TLS Finished message and comparing it with the same data sent by the client. That's more or less all it takes to implement it. So I would go for it.

- One SCRAM parameter, i, the iteration count, is important and in my opinion should be configurable. AFAIK it is currently hard-coded at 4096, which is the minimum value accepted by the RFC. But the RFC says it should be at least 15K, and given that it affects the computing time of the authentication, it should be configurable. It's also now specified per user, which I think is too much granularity (it should suffice to say "this group of users all require this iteration count").

- The SCRAM RFC states that the server should advertise supported mechanisms. I see discussion going towards not advertising them. I think it should be done: I don't see reasons not to do it, and it would be compliant with the RFC.

- I don't see why the proposed scram mechanism names for pg_hba.conf do not follow the IANA registry standard (https://www.iana.org/assignments/sasl-mechanisms/sasl-mechanisms.xhtml#scram), which is uppercase (as in SCRAM-SHA-256).

- SCRAM also supports the concept of authzid, which is kind of what pg_ident does: authenticate the user as the user sent, but log in as the user specified here. It could also be supported.

Based on these comments, I'd like to propose the following changes:

* Implement channel binding.

* Add a GUC for the channel binding technique to use. It could be named "scram_channel_binding_name" or "sasl_channel_binding_name", for example, and the only supported value as of today would be 'tls-unique'.

* AuthenticationSASL backend message:
  - Make the String field a CSV listing one or more supported IANA SCRAM authentication mechanisms. This info comes from pg_scram.conf (see below).
  - Add another String to specify the variety of channel binding supported (if any), like 'tls-unique'. This info comes from the GUC.

* Treat the configuration of scram in pg_hba.conf, which seems complicated, similarly to how pg_ident.conf is treated. Basically, in pg_hba.conf specify a "scram=scram_entry_name" value and then look in a pg_scram.conf file for an entry with that name. The pg_scram.conf file might have the following structure:

name        | scram_mechanisms                 | i
scramlight  | SCRAM-SHA-1                      | 4096
scramstrong | SCRAM-SHA-256,SCRAM-SHA-256-PLUS | 16384

* The nonce length is not specified by the RFC. I see typical implementations use 24 chars for the client and 18 for the server. Current code uses 10. I think it should not hurt to make it at least 16 or 18.

* It seems to me that the errors defined by the RFC, sent in the server-final-message when there are errors, are never sent with the current implementation. I think they should be sent as per the standard, and the exchange should also proceed until the last stage even if errors were detected earlier (to conform with the RFC).

Thoughts?

Álvaro -- Álvaro Hernández Tortosa --- <8K>data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
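To illustrate why the iteration count matters operationally: SCRAM's SaltedPassword is Hi(Normalize(password), salt, i), and for SCRAM-SHA-256 Hi is PBKDF2 with HMAC-SHA-256. A minimal JDK-only sketch follows (the salt and password values are made up for the example); raising i raises the CPU cost of every authentication roughly linearly, which is the trade-off behind a hard-coded 4096 versus a configurable 15K+.

```java
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;
import java.nio.charset.StandardCharsets;

public class ScramSaltedPassword {
    // SaltedPassword := Hi(password, salt, i); for SCRAM-SHA-256 this is
    // PBKDF2 with HMAC-SHA-256, producing a 32-byte (256-bit) key.
    static byte[] saltedPassword(String password, byte[] salt, int iterations) {
        try {
            PBEKeySpec spec = new PBEKeySpec(password.toCharArray(), salt, iterations, 256);
            return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                                   .generateSecret(spec).getEncoded();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        byte[] salt = "example-salt".getBytes(StandardCharsets.UTF_8); // made-up salt
        for (int i : new int[] {4096, 65536}) {
            long t0 = System.nanoTime();
            saltedPassword("pencil", salt, i);
            System.out.printf("i=%d took %d microseconds%n", i, (System.nanoTime() - t0) / 1000);
        }
    }
}
```

Running it shows the higher iteration count taking proportionally longer, which is exactly why a per-installation (or per-group) knob, rather than a constant, seems desirable.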
Re: [HACKERS] SCRAM authentication, take three
On 07/04/17 11:05, Magnus Hagander wrote: On Fri, Apr 7, 2017 at 9:59 AM, Heikki Linnakangas <hlinn...@iki.fi <mailto:hlinn...@iki.fi>> wrote: On 04/07/2017 10:38 AM, Magnus Hagander wrote:

So here's a wild idea. What if we just call it "sha256"? Does the user actually care about it being scram, or is scram just an implementation detail for them? That way when the next one shows up, it'll be sha512 or whatever. It happens to use scram under the hood, but does the user have to, or does the user want to, care about that? (One could argue the same way that the user shouldn't have to or want to care about the hashing algorithm -- but if that's the case then we should only have one entry, it would be "scram", and the system would decide from there. And I think this discussion already indicates we don't think this is enough.)

I think the "SCRAM" part is more important than "SHA-256", so -1 on that.

If that is the important part, then I agree :) I am not entirely sure that the scram part *is* more important though.

I agree it is much more important. Needed, I'd say. "SHA-256" could refer to other mechanisms that simply hash the value (maybe with a salt, or not) with that hash algorithm. SCRAM is a different beast, with much more functionality than that. So yes, it matters a lot :)

I think most users will be a lot more comfortable with "sha256" than "scram" though. But I guess that says using scram-sha-256 is the correct way.

I don't like UPPERCASE, but the RFC links to the IANA registry where SCRAM methods are all uppercase and with dashes: SCRAM-SHA-256 and SCRAM-SHA-256-PLUS. I'd use those names, they are standardized.

The main argument against using just "scram" is that it's misleading, because we implement SCRAM-SHA-256, rather than SCRAM-SHA-1, which was the first SCRAM mechanism, commonly called just SCRAM.
As long as that's the only SCRAM variant we have, that's not too bad, but it becomes more confusing if we ever implement SCRAM-SHA-512 or SCRAM-something-else in the future. That's the point Noah made, and it's a fair point, but the question is whether we consider that to be more important than having a short name for what we have now.

Yeah, I agree we should be prepared for the future. And having "scram" and "scram-sha-512" would definitely fall under confusing.

The channel binding aspect is actually more important to think about right now, as that is something we will hopefully implement in the next release or two. In [1], Michael wrote: There is also the channel binding to think about... So we could have a list of keywords perhaps associated with SASL? Imagine for example:

sasl $algo,$channel_binding

Giving potentially:

sasl scram_sha256
sasl scram_sha256,channel
sasl scram_sha512
sasl scram_sha512,channel

In the case of the patch of this thread just the first entry would make sense; once channel binding support is added, a second keyword/option could be added. And there are of course other methods that could replace SCRAM...

It should also be possible to somehow specify "use channel binding, if the client supports it". Is that really a type of authentication? We already have the idea of authentication method options, used for most other things except md5 which doesn't have any. So it could be "sha256 channelbind=on", "sha256 channelbind=off" or "sha256 channelbind=negotiate" or something like that?

Technically, the channel-binding variant is a separate SASL mechanism, i.e. it has a separate name, SCRAM-SHA-256-PLUS. I'm not sure if users/admins think of it that way. I bet they don't.

Probably. But let's not underestimate channel binding: it is the "greatest" feature of SCRAM, and where security really shines. I'd encourage the use of channel binding and would definitely make it explicit. As for the options, there's no way to negotiate; the client picks.
It could still be three-valued: on, off, only-channel-binding (or however you want to call them). That will only change what mechanisms the server will be advertising to clients. Álvaro -- Álvaro Hernández Tortosa --- <8K>data
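The three-valued setting could translate directly into which mechanisms the server advertises, which is all the server-side logic it would require. A minimal sketch, with the setting name and values entirely hypothetical (modeled on the on/off/only wording above):

```java
import java.util.Arrays;
import java.util.List;

public class SaslAdvertiser {
    // Hypothetical three-valued channel-binding setting, per the discussion.
    enum ChannelBinding { OFF, ON, ONLY }

    // Mechanisms the server would advertise in AuthenticationSASL, most
    // preferred first; the -PLUS variant only makes sense over TLS.
    static List<String> advertised(ChannelBinding cb, boolean tlsInUse) {
        if (!tlsInUse || cb == ChannelBinding.OFF) {
            return Arrays.asList("SCRAM-SHA-256");
        }
        if (cb == ChannelBinding.ONLY) {
            return Arrays.asList("SCRAM-SHA-256-PLUS");
        }
        return Arrays.asList("SCRAM-SHA-256-PLUS", "SCRAM-SHA-256"); // ON: offer both
    }

    public static void main(String[] args) {
        for (ChannelBinding cb : ChannelBinding.values()) {
            System.out.println(cb + " over TLS -> " + advertised(cb, true));
        }
    }
}
```

With ONLY, a client that cannot do channel binding simply finds no mechanism it supports and the exchange fails up front, which matches "that will only change what mechanisms the server will be advertising".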
Re: [HACKERS] Letting the client choose the protocol to use during a SASL exchange
On 06/04/17 22:05, Tom Lane wrote: Simon Riggs <si...@2ndquadrant.com> writes:

How would we provide the list of protocols? Surely the protocol is defined by pg_hba.conf, which makes it dependent upon username, database and ip range. If the list were accurate, it would allow an attacker to discover how best to attack. If the list were inaccurate it would just be an annoyance. At minimum, providing the list of protocols means an extra round trip to the server. Yeah, that's a problem.

I don't see it. The message AuthenticationSASL.String could contain a CSV of the SCRAM protocols supported. This is especially important to support channel binding (which is just another protocol name for this matter), which is the real security enhancement of SCRAM. Since this message is sent regardless, and the client replies with PasswordMessage, no extra round trip is required. However, PasswordMessage also needs to include a field with the name of the selected protocol (it is the client who picks). Or a different message would need to be created, but with no more round trips than those required by SCRAM itself (4 messages for SCRAM + 1 extra for the server to tell the client it needs to use SCRAM).

ISTM that if you have a valid role to connect to then you'll also know what authentication mechanism to use so you should be able to specify the mechanism in your connection message and save the extra trip. I do not buy that in the least. It has never been the case before now that clients know in advance what the auth challenge method will be. If we put that requirement on them for SCRAM, we're just going to be exporting a lot of pain and end-user-visible inconsistency. Perhaps we could turn this around: have the client send (in the connection request packet) a list of auth protocols it thinks it is able to handle.

Per the SCRAM RFC, it is the server who advertises and the client who picks.
Regards, Álvaro -- Álvaro Hernández Tortosa --- <8K>data
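If AuthenticationSASL carried such a CSV, the client-side pick could be as simple as the following sketch. Note the CSV format is the proposal above, not anything PostgreSQL actually sends today, and the method name is made up for the example:

```java
public class SaslMechanismPick {
    // Parse a (proposed, hypothetical) CSV mechanism list from an
    // AuthenticationSASL message and pick the client's preferred mechanism:
    // the server advertises, the client picks.
    static String pick(String advertisedCsv, boolean wantChannelBinding) {
        String fallback = null;
        for (String m : advertisedCsv.split(",")) {
            m = m.trim();
            if (wantChannelBinding && m.equals("SCRAM-SHA-256-PLUS")) {
                return m; // channel binding available and wanted: take it
            }
            if (m.equals("SCRAM-SHA-256")) {
                fallback = m;
            }
        }
        return fallback; // may be null: no mutually supported mechanism
    }

    public static void main(String[] args) {
        System.out.println(pick("SCRAM-SHA-256,SCRAM-SHA-256-PLUS", true));
        System.out.println(pick("SCRAM-SHA-256,SCRAM-SHA-256-PLUS", false));
    }
}
```

The picked name would then ride along in the client's first SASL response (the message otherwise reused as PasswordMessage), so no extra round trip is added.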
Re: [HACKERS] Re: Letting the client choose the protocol to use during a SASL exchange
On 06/04/17 19:05, Heikki Linnakangas wrote: On 04/06/2017 08:13 AM, Noah Misch wrote: If any SCRAM open item is a beta blocker, it's this one. (But SASLprep is also in or near that status.) Post-beta wire protocol changes are bad, considering beta is normally the time for projects like pgjdbc and npgsql to start adapting to such changes. [Action required within three days. This is a generic notification.] The above-described topic is currently a PostgreSQL 10 open item. I will work on this next week. I haven't given it much thought yet, but I think it's going to be pretty straightforward. It won't require much code yet, as we only support one SASL mechanism. We just need to ensure that we don't paint ourselves into a corner with the protocol.

I think this could easily be extended from the current message. SCRAM does not force any concrete way of negotiating the protocols supported. The current SASL message sends the only SCRAM method supported. I think it should be enough to make this a CSV, which for a single value is the same as we have today (so no code change required, only documentation). The other message that needs to be changed is the password one, the first time the client sends the SCRAM "client-first-message", which needs to contain the algorithm selected. As of today, that could be a constant (at least in the code), plus validating its value. So I guess code changes would be minimal. I could be wrong, of course. I'm working myself on the Java (pgjdbc) implementation, and I will hopefully have a prototype by next week to try it.

Álvaro -- Álvaro Hernández Tortosa --- <8K>data
Re: [HACKERS] Implementing full UTF-8 support (aka supporting 0x00)
On 03/08/16 21:42, Geoff Winkless wrote: On 3 August 2016 at 20:36, Álvaro Hernández Tortosa <a...@8kdata.com> wrote: Isn't the correct syntax something like: select E'\uc080', U&'\c080'; ? It is a single-character, 16-bit unicode sequence (see https://www.postgresql.org/docs/current/static/sql-syntax-lexical.html).

No, what you've done there is created the three-byte utf8 sequence \xec8280:

# select U&'\c080'::bytea;
  bytea
----------
 \xec8280

It's not a UCS2 c080, it's utf8 c080. Geoff

Yes, you're absolutely right ^_^

Álvaro -- Álvaro Hernández Tortosa --- 8Kdata
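Geoff's correction can be verified with any UTF-8 encoder: \uc080 denotes the single code point U+C080 (the Hangul syllable 삀), whose UTF-8 encoding is the three bytes EC 82 80, not the two raw bytes C0 80. A quick check in Java:

```java
import java.nio.charset.StandardCharsets;

public class Utf8C080 {
    // "\uc080" is one BMP code point, U+C080; encoding it as UTF-8 yields
    // the three-byte sequence EC 82 80 (1110xxxx 10xxxxxx 10xxxxxx layout),
    // not the two-byte overlong sequence C0 80.
    static byte[] utf8OfC080() {
        return "\uc080".getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        for (byte b : utf8OfC080()) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println();
    }
}
```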
Re: [HACKERS] Implementing full UTF-8 support (aka supporting 0x00)
On 03/08/16 21:31, Geoff Winkless wrote: On 3 August 2016 at 20:13, Álvaro Hernández Tortosa <a...@8kdata.com> wrote: Yet they are accepted by Postgres (as if Postgres supported Modified UTF-8 intentionally). The character in psql does not render as a nul but as this symbol: "삀".

Not accepted as valid utf8:

# select E'\xc0\x80';
ERROR:  invalid byte sequence for encoding "UTF8": 0xc0 0x80

You would need a "modified utf8" encoding, I think. Geoff

Isn't the correct syntax something like: select E'\uc080', U&'\c080'; ? It is a single-character, 16-bit unicode sequence (see https://www.postgresql.org/docs/current/static/sql-syntax-lexical.html).

Álvaro -- Álvaro Hernández Tortosa --- 8Kdata
Re: [HACKERS] Implementing full UTF-8 support (aka supporting 0x00)
On 03/08/16 20:14, Álvaro Hernández Tortosa wrote: On 03/08/16 17:47, Kevin Grittner wrote: On Wed, Aug 3, 2016 at 9:54 AM, Álvaro Hernández Tortosa <a...@8kdata.com> wrote: What would it take to support it? Would it be of any value to support "Modified UTF-8"? https://en.wikipedia.org/wiki/UTF-8#Modified_UTF-8 That's nice, but I don't think so. The problem is that you cannot predict how people will send you data, like when importing from other databases. I guess it may work if Postgres implemented such a UTF-8 variant and the drivers did too, but that would still require an encoding conversion (i.e., parsing every string) to change the 0x00, which seems like a serious performance hit. It could be worse than nothing, though! Thanks, Álvaro

It may indeed work. According to https://en.wikipedia.org/wiki/UTF-8#Codepage_layout the encoding used in Modified UTF-8 is an (otherwise) invalid UTF-8 code point. In short, the 0x00 nul is represented (overlong encoding) by the two-byte, one-character sequence 0xC0 0x80. These two bytes are invalid UTF-8 and so should not appear in an otherwise valid UTF-8 string. Yet they are accepted by Postgres (as if Postgres supported Modified UTF-8 intentionally). The character in psql does not render as a nul but as this symbol: "삀".

Given that this works, the process would look like this:

- Parse all input data looking for bytes with hex value 0x00. If they appear in the string, they are the null byte.
- Replace that byte with the two bytes 0xC0 0x80.
- Reverse the operation when reading.

This is OK but of course a performance hit (searching for 0x00 and then augmenting the byte[] or whatever data structure to account for the extra byte). A little bit of a PITA, but I guess better than fixing it all :)

Álvaro -- Álvaro Hernández Tortosa --- 8Kdata
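The replace/reverse steps above fit in a few lines of Java (incidentally, this C0 80 escaping of embedded nuls is exactly what Java's own DataOutputStream.writeUTF "modified UTF-8" does); the class and method names here are made up for the example:

```java
import java.io.ByteArrayOutputStream;

public class ModifiedUtf8Nul {
    // Writing path: escape each embedded 0x00 byte as the overlong
    // two-byte sequence C0 80, which never occurs in valid UTF-8.
    static byte[] escapeNul(byte[] utf8) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(utf8.length);
        for (byte b : utf8) {
            if (b == 0x00) {
                out.write(0xC0);
                out.write(0x80);
            } else {
                out.write(b);
            }
        }
        return out.toByteArray();
    }

    // Reading path: reverse the operation, turning C0 80 back into 0x00.
    static byte[] unescapeNul(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(data.length);
        for (int i = 0; i < data.length; i++) {
            if (data[i] == (byte) 0xC0 && i + 1 < data.length && data[i + 1] == (byte) 0x80) {
                out.write(0x00);
                i++; // consume the second byte of the pair
            } else {
                out.write(data[i]);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] original = {'a', 0x00, 'b'};
        byte[] escaped = escapeNul(original);
        System.out.println("escaped length: " + escaped.length); // one extra byte per nul
    }
}
```

The cost sketched in the email is visible here: every write scans the whole string, and each nul grows the buffer by one byte.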
Re: [HACKERS] Implementing full UTF-8 support (aka supporting 0x00)
On 03/08/16 18:35, Geoff Winkless wrote: On 3 August 2016 at 15:54, Álvaro Hernández Tortosa <a...@8kdata.com> wrote: Given that 0x00 is a perfectly legal UTF-8 character, I conclude we're strictly non-compliant. It's perhaps worth mentioning that 0x00 is valid ASCII too, and PostgreSQL has never stored that either. Then yes, it could also be a problem. But as of today I believe solving the problem for UTF-8 would solve the great majority of these embedded NUL problems. Álvaro -- Álvaro Hernández Tortosa --- 8Kdata
Re: [HACKERS] Implementing full UTF-8 support (aka supporting 0x00)
On 03/08/16 17:47, Kevin Grittner wrote: On Wed, Aug 3, 2016 at 9:54 AM, Álvaro Hernández Tortosa <a...@8kdata.com> wrote: What would it take to support it? Would it be of any value to support "Modified UTF-8"? https://en.wikipedia.org/wiki/UTF-8#Modified_UTF-8 That's nice, but I don't think so. The problem is that you cannot predict how people would send you data, like when importing from other databases. I guess it may work if Postgres would implement such UTF-8 variant and also the drivers, but that would still require an encoding conversion (i.e., parsing every string) to change the 0x00, which seems like a serious performance hit. It could be worse than nothing, though! Thanks, Álvaro -- Álvaro Hernández Tortosa --- 8Kdata
Re: [HACKERS] Implementing full UTF-8 support (aka supporting 0x00)
On 03/08/16 17:23, Tom Lane wrote: Álvaro Hernández Tortosa <a...@8kdata.com> writes: As has been previously discussed (see https://www.postgresql.org/message-id/BAY7-F17FFE0E324AB3B642C547E96890%40phx.gbl for instance) varlena fields cannot accept the literal 0x00 value. Yup. What would it take to support it?

One key reason why that's hard is that datatype input and output functions use nul-terminated C strings as the representation of the text form of any datatype. We can't readily change that API without breaking huge amounts of code, much of it not under the core project's control. There may be other places where nul-terminated strings would be a hazard (mumble fgets mumble), but offhand that API seems like the major problem so far as the backend is concerned. There would be a slew of client-side problems as well. For example this would assuredly break psql and pg_dump, along with every other client that supposes that it can treat PQgetvalue() as returning a nul-terminated string. This end of it would possibly be even worse than fixing the backend, because so little of the affected code is under our control. In short, the problem is not with having an embedded nul in a stored text value. The problem is the reams of code that suppose that the text representation of any data value is a nul-terminated C string. regards, tom lane

Wow. That seems like a daunting task. I guess, then, that even implementing a new datatype based on bytea, but one that would use the text IO functions to show up as text (not send/recv), would not work either, right? Thanks for the input, Álvaro -- Álvaro Hernández Tortosa --- 8Kdata
[HACKERS] Implementing full UTF-8 support (aka supporting 0x00)
Hi list. As has been previously discussed (see https://www.postgresql.org/message-id/BAY7-F17FFE0E324AB3B642C547E96890%40phx.gbl for instance) varlena fields cannot accept the literal 0x00 value. Sure, you can use bytea, but this is hardly a good solution. The problem seems to be hitting some use cases, like:

- People migrating data from other databases (apart from PostgreSQL, I don't know of any other database which suffers the same problem).
- People using drivers which use UTF-8 or equivalent encodings by default (Java, for example).

Given that 0x00 is a perfectly legal UTF-8 character, I conclude we're strictly non-compliant. And given the general Postgres policy regarding standards compliance, and the people being hit by this, I think it should be addressed. Especially since all the usual fixes are a real PITA (re-parsing and re-generating strings, which is very expensive, or dropping data). What would it take to support it? Isn't the varlena header propagated everywhere, which could help infer the real length of the string? Any pointers or suggestions would be welcome. Thanks, Álvaro -- Álvaro Hernández Tortosa --- 8Kdata
Re: [HACKERS] Protocol buffer support for Postgres
On 24/06/16 14:23, Flavius Anton wrote: On Thu, Jun 23, 2016 at 2:54 PM, Flavius Anton <f.v.an...@gmail.com> wrote: On Thu, Jun 23, 2016 at 1:50 PM, Greg Stark <st...@mit.edu> wrote: On Thu, Jun 23, 2016 at 8:15 PM, Flavius Anton <f.v.an...@gmail.com> wrote: I'd love to talk more about this. I thought quite a bit about this a few years ago but never really picked it up to work on. Any other thoughts on this? My guess is that it might be an important addition to Postgres that can attract even more users, but I am not sure if there's enough interest from the community. If I want to pick this task, how should I move forward? Do I need to write a design document or similar, or should I come up with a patch that implements a draft prototype? I am new to this community and don't know the code yet, so I'd appreciate some guidance from an older, more experienced member.

Other than protobuf, there are also other serialization formats that might be worth considering. Not too long ago I suggested working specifically on serialization formats for the json/jsonb types: https://www.postgresql.org/message-id/56CB8A62.40100%408kdata.com I believe this effort is in the same boat. Álvaro -- Álvaro Hernández Tortosa --- 8Kdata
Re: [HACKERS] 10.0
On 15/05/16 14:42, Magnus Hagander wrote: On Sun, May 15, 2016 at 2:29 PM, Álvaro Hernández Tortosa <a...@8kdata.com <mailto:a...@8kdata.com>> wrote: On 14/05/16 20:02, Petr Jelinek wrote: +1 for going with 10.0 after 9.6 and 11.0 afterwards, etc. It will hopefully both end these discussions and remove the confusion the current versioning scheme has (I too have heard way too many times about people using postgres8 or postgres9). Even worse: I've been told that a company was using "PostgreSQL 8.5" ^_^ That's not necessarily the version number's fault. That's them using an alpha version... (Yes, I've run into a customer just a couple of years ago that was still on 8.5 alpha.) It was their fault, obviously. They were not using the alpha version; they were using 8.3 but thought it was 8.5 (and yes, it's terrible that they provide information without checking it). Anyway, while not being the version number's fault, having one less number may have helped here and probably in other cases too. Álvaro -- Álvaro Hernández Tortosa --- 8Kdata
Re: [HACKERS] 10.0
On 14/05/16 20:02, Petr Jelinek wrote: +1 for going with 10.0 after 9.6 and 11.0 afterwards, etc. It will hopefully both end these discussions and remove the confusion the current versioning scheme has (I too have heard way too many times about people using postgres8 or postgres9). Even worse: I've been told that a company was using "PostgreSQL 8.5" ^_^ Álvaro -- Álvaro Hernández Tortosa --- 8Kdata
Re: [HACKERS] 10.0
On 14/05/16 02:00, Tom Lane wrote: [...] I don't think this is about version number inflation, but actually more the opposite. What you're calling the major number is really a marketing number. There is not a technical distinction between major releases where we choose to bump the first number and those where we choose to bump the second. It's all about marketing. So to me, merging those numbers would be an anti-marketing move. I think it's a good move: it would be more honest and transparent about what the numbers mean, not less so.

If having two "major" numbers is a marketing game, and if it works in such a way, I'd immediately say let's keep it. Decisions like the one debated here should be driven more by what is going to help user adoption than by -hackers' personal taste. BTW, none of these approaches seems dishonest to me. Having said that, I believe having a single major number is more effective marketing. Non major-major versions may make the release look like a "probably not worth it" upgrade. People may hold their breath until a major-major upgrade, especially if people in forums support this idea by saying things like: "10.0 will come with amazing features, because the version is bumped from 9.6". So +1 to calling the next version 10.0 and the one after that 11.0. Álvaro -- Álvaro Hernández Tortosa --- 8Kdata
Re: [HACKERS] about google summer of code 2016
On 23/03/16 01:56, Amit Langote wrote: On 2016/03/23 9:19, Álvaro Hernández Tortosa wrote: - Regarding GSoC: it looks to me that we failed to submit in time. Is this what happened, or we weren't selected? If the former (and no criticism here, just realizing a fact) what can we do next year to avoid this happening again? Is anyone "appointed" to take care of it? See Thom's message here: http://www.postgresql.org/message-id/CAA-aLv6i3jh1H-5UHb8jSB0gMwA9sg_cqw3=MwddVzr=pxa...@mail.gmail.com Thanks, Amit OK, read the thread, thanks for the info :) Álvaro -- Álvaro Hernández Tortosa --- 8Kdata
Re: [HACKERS] about google summer of code 2016
On 22/02/16 23:23, Álvaro Hernández Tortosa wrote: On 22/02/16 05:10, Tom Lane wrote: Heikki Linnakangas <hlinn...@iki.fi> writes: On 19/02/16 10:10, Álvaro Hernández Tortosa wrote: Oleg and I discussed recently that a really good addition to a GSoC item would be to study whether it's convenient to have a binary serialization format for jsonb over the wire. Seems a bit risky for a GSoC project. We don't know if a different serialization format will be a win, or whether we want to do it in the end, until the benchmarking is done. It's also not clear what we're trying to achieve with the serialization format: smaller on-the-wire size, faster serialization in the server, faster parsing in the client, or what? Another variable is that your answers might depend on what format you assume the client is trying to convert from/to. (It's presumably not text JSON, but then what is it?) As I mentioned before, there are many well-known JSON serialization formats, like:

- http://ubjson.org/
- http://cbor.io/
- http://msgpack.org/
- BSON (ok, let's skip that one hehehe)
- http://wiki.fasterxml.com/SmileFormatSpec

Having said that, I'm not sure that risk is a blocking factor here. History says that a large fraction of our GSoC projects don't result in a commit to core PG. As long as we're clear that "success" in this project isn't measured by getting a feature committed, it doesn't seem riskier than any other one. Maybe it's even less risky, because there's less of the success condition that's not under the GSoC student's control.

I wanted to bring an update here. It looks like someone did the expected benchmark "for us" :) https://eng.uber.com/trip-data-squeeze/ (thanks Alam for the link). While this is Uber's own test, I think the conclusions are quite significant: an encoding like MessagePack + zlib requires only 14% of the size and encodes+decodes in 76% of the time of JSON.
There are of course other contenders that trade slightly slower decoding and bigger size for better encoding times. But there are very interesting numbers in this benchmark. MessagePack, CBOR and UJSON (all + zlib) look like really good options. So now that we have this data I would like to ask these questions to the community:

- Is this enough, or do we need to perform our own, different benchmarks?
- If this is enough, and given that we weren't selected for GSoC, is there interest in the community to work on this nonetheless?
- Regarding GSoC: it looks to me that we failed to submit in time. Is this what happened, or we weren't selected? If the former (and no criticism here, just realizing a fact) what can we do next year to avoid this happening again? Is anyone "appointed" to take care of it?

Álvaro -- Álvaro Hernández Tortosa --- 8Kdata
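The zlib half of those encoder+zlib combinations is easy to try with the JDK alone; a minimal sketch follows (the JSON payload is invented, and the savings depend heavily on how repetitive the data is, so this shows the mechanism rather than reproducing Uber's numbers):

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

public class JsonZlib {
    // Compress a JSON text with zlib (DEFLATE); a binary encoding such as
    // MessagePack or CBOR would first shrink the text before this step.
    static byte[] zlib(String json) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(json.getBytes(StandardCharsets.UTF_8));
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < 100; i++) { // made-up, repetitive trip-like records
            sb.append("{\"trip_id\":").append(i).append(",\"lat\":0.0,\"lon\":0.0},");
        }
        sb.setCharAt(sb.length() - 1, ']');
        String json = sb.toString();
        System.out.println("raw: " + json.length() + " bytes, zlib: " + zlib(json).length + " bytes");
    }
}
```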
Re: [HACKERS] Soliciting Feedback on Improving Server-Side Programming Documentation
I started a similar thread with probably similar concerns: http://www.postgresql.org/message-id/56d1a6aa.6080...@8kdata.com I believe this effort should be done. I added to my TODO list to compile a list of used functions in a selection of picked extensions to use that as a starting point of an "API". Regards, Álvaro -- Álvaro Hernández Tortosa --- 8Kdata On 15/03/16 13:02, Corey Huinker wrote: Over the past few months, I've been familiarizing myself with postgres server side programming in C. My attempts to educate myself were slow and halting. The existing server side programming documentation has some examples, but those examples didn't show me how do what I wanted to do, and my research-via-google was highly circular, almost always pointing back to the documentation I had already found lacking, or a copy of it. Most of what I have learned I have culled from asking people on IRC, or bugging people I've met through user groups and PgConf. In all cases, people have been extremely helpful. However, this method is inefficient, because we're using two people's time, one of whom has to tolerate my incessant questions and slow learning pace. Furthermore, the helpful suggestions I received boiled down to: 1. The function/macro/var you're looking for is PG_FOO, git grep PG_FOO 2. Look in blah.c which does something like what you're trying to do 3. The comments in blah.h do a good job of listing and explaining this macro or that #1 git grep is a helpful reflex for discovering examples on my own, but it requires that I have a term to search on in the first place, and too often I don't know what I don't know. #2 is the gold standard in terms of correctness (the code had to have worked at least up to the last checkin date), and in terms of discoverability it often gave me names of new macros to search for, coding patterns, etc. However, I was always left with the questions: How would I have figured this out on my own? How is the next person going to figure it out? 
Why doesn't anybody document this? #3 Often answers the last question in #2: It *is* documented, but that documentation is not easily discoverable by conventional means. So what I'd like to do is migrate some of the helpful information in the header files into pages of web-searchable documentation, and also to revamp the existing documentation to be more relevant. Along the way, I collected a list of things I wished I'd had from the start:

* A list of all the GETARG_* macros. It would have been especially great if this were in table form: Your Parameter Is A / Use This Macro / Which Gives This Result Type / Working Example.
* A list/table of the DatumGet* macros. I'm aware that many of them overlap/duplicate others. That'd be good to know too.
* The table at http://www.postgresql.org/docs/9.1/static/errcodes-appendix.html has the numeric codes and PL/PGSQL constants enumerated. It'd be nice if it had the C #define as well.
* The SPI documentation mentions most/all of the SPI functions, but I couldn't find documentation on the SPI variables like SPI_processed and SPI_tuptable.
* Examples and explanation of how PG_TRY()/PG_CATCH work. How to add context callbacks.
* Direct function calls.
* A comparison of the two modes of writing SRF functions (Materialize vs multi-call).
* Less explanation of how to write V0-style functions. That was called the "old style" back in version 7.1. Why is that information up front in the documentation when so much else is sequestered in header files?

Some of these things may seem obvious/trivial to you. I would argue that they're only obvious in retrospect, and the more obvious-to-you things we robustly document, the quicker we accumulate programmers who are capable of agreeing that it's obvious, and that's good for the community. I'm aware that some of these APIs change frequently. In those cases, I suggest that we make note of that on the same page.
Because I'm still going through the learning curve, I'm probably the least qualified to write the actual documentation. However, I have a clear memory of what was hard to learn and I have the motivation to make it easier on the next person. That makes me a good focal point for gathering, formatting, and submitting the documentation in patch form. I'm volunteering to do so. What I need from you is: * Citations of existing documentation in header files that could/should be exposed in our more formal documentation. * Explanations of any of the things above, which I can then reformat into proposed documentation. * A willingness to review the proposed new documentation * Reasoned explanations for why this is a fool's errand You supply the expertise, I'll write the patch. Thanks in advance.
Re: [HACKERS] New competition from Microsoft?
On 07/03/16 23:32, Joshua D. Drake wrote: On 03/07/2016 01:43 PM, Josh berkus wrote: All, http://blogs.microsoft.com/?p=67248 Once SQL Server is available on Linux, we're going to see more people using it as an alternative to PostgreSQL. Especially since they're picking up a lot of our better features, like R support. Yes but:

1. MSSQL is not going to magically run well on Linux. It was never designed to run on a *NIX platform and it long left the parts that would (Sybase) in the dust.

I'd be surprised if it did *not* run well on Linux. Linux is an easy platform to develop for, it's fast and reliable, and they have plenty of time (until release).

2. This is actually good for us. It validates our primary deployment platform. Why run a closed-source MSSQL when you can get open source PostgreSQL on the same major platform?

SQL Server seems to have some nice features that PostgreSQL doesn't have. Encryption support and Hekaton (columnar-store "shared buffers") come to my mind. It's definitely not a toy database, even less so on Linux.

3. Competition is good when we have it, our community steps up.

Absolutely, competition is very welcome! Álvaro -- Álvaro Hernández Tortosa --- 8Kdata
Re: [HACKERS] How can we expand PostgreSQL ecosystem?
On 07/03/16 11:54, José Luis Tallón wrote: On 03/07/2016 07:30 AM, Tsunakawa, Takayuki wrote: From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Mark Kirkwood For cloud - in particular Openstack (which I am working with ATM), the biggest thing would be: - multi-master replication or failing that: - self-managing single-master failover (voting/quorum etc). Hmm, consul (optionally in a dockerized setup)? https://www.consul.io/ There are already HA solutions based on consensus, like Patroni: https://github.com/zalando/patroni so that operators can essentially 'set and forget'. We currently use MySQL + Galera (multi master) and MongoDB (self managing single master) and the convenience and simplicity is just so important (Openstack is a huge, complex collection of services - hand holding of any one service is pretty much a non-starter). Yes, I was also asked whether PostgreSQL has any optional functionality like Galera Cluster for MySQL. He was planning a scalable PaaS service which performs heavy reads and writes. Demand exists. AFAIK, Galera has its own set of drawbacks. Right, some of them are explained here: https://aphyr.com/posts/327-jepsen-mariadb-galera-cluster Regards, Álvaro -- Álvaro Hernández Tortosa --- 8Kdata
Re: [HACKERS] PostgreSQL extension API? Documentation?
On 27/02/16 15:43, Tom Lane wrote: Chapman Flack <c...@anastigmatix.net> writes: On 02/27/16 08:37, Álvaro Hernández Tortosa wrote: In other words: what is the API surface exposed by PostgreSQL to extension developers? The assumption is that no PostgreSQL code should be modified, just adding your own and calling existing functions. That's an excellent question that repeatedly comes up, in particular because of the difference between the way the MSVC linker works on Windows, and the way most other linkers work on other platforms. Yeah. It would be a fine thing to have a document defining what we consider to be the exposed API for extensions. In most cases we could not actually stop extension developers from relying on stuff outside the defined API, and I don't particularly feel a need to try. But it would be clear to all concerned that if you rely on something not in the API, it's your problem if we remove it or whack it around in some future release. On the other side, it would be clearer to core-code developers which changes should be avoided because they would cause pain to extension authors. Unfortunately, it would be a lot of work to develop such a thing, and no one has wanted to take it on. Why would it be so much work? Creating a function list, and maybe documenting those functions, doesn't sound like a daunting task. I wouldn't mind volunteering for this work, but I guess I would need some help to understand and identify the candidate parts of the API. If anyone could help me here, please let me know. Álvaro -- Álvaro Hernández Tortosa --- 8Kdata -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL extension API? Documentation?
On 27/02/16 15:10, Chapman Flack wrote: On 02/27/16 08:37, Álvaro Hernández Tortosa wrote: In other words: what is the API surface exposed by PostgreSQL to extension developers? The assumption is that no PostgreSQL code should be modified, just adding your own and calling existing functions. That's an excellent question that repeatedly comes up, in particular because of the difference between the way the MSVC linker works on Windows, and the way most other linkers work on other platforms. The issue there is ... on most non-Windows platforms, there are only the general C rules to think about: if a symbol is static (or auto of course) it is not visible to extensions, but otherwise it is. For MSVC, in contrast, symbols need to have a certain decoration (look for PGDLLIMPORT in various PostgreSQL .h files) for an MSVC-built extension to be able to see it, otherwise it isn't accessible. Well, that's not quite right. It turns out (and it may have taken some work on the build process to make it turn out this way) ... *functions* are accessible from MSVC (as long as they would be accessible under normal C rules) whether or not they have PGDLLIMPORT. It's just data symbols/variables that have to have PGDLLIMPORT or they aren't available on Windows/MSVC. And *that* arrangement is the result of a long thread in 2014 that unfolded after discovering that what was really happening in MSVC *before* that was that MSVC would silently pretend to link your non-PGDLLIMPORT data symbols, and then give you the wrong data. http://www.postgresql.org/message-id/flat/52fab90b.6020...@2ndquadrant.com In that long thread, there are a few messages in the middle that probably give the closest current answer to your API question. 
Craig Ringer has consistently favored making other platforms work more like Windows/MSVC, so that the PGDLLIMPORT business would serve to limit and more clearly define the API surface: http://www.postgresql.org/message-id/52ef1468.6080...@2ndquadrant.com Andres Freund had the pragmatic reply: http://www.postgresql.org/message-id/20140203103701.ga1...@awork2.anarazel.de I think that'd be an exercise in futility. ... We'd break countless extensions people have written. ... we'd need to have a really separate API layer ... doesn't seem likely to arrive anytime soon, if ever. which was ultimately concurred in by Tom, and Craig too: http://www.postgresql.org/message-id/29286.1391436...@sss.pgh.pa.us http://www.postgresql.org/message-id/52efa654.8010...@2ndquadrant.com Andres characterized it as "We have a (mostly) proper API. Just not an internal/external API split." http://www.postgresql.org/message-id/20140203142514.gd1...@awork2.anarazel.de -Chap Hi Chapman. Thank you very much for your detailed message and all the references. They were very appropriate. However, I still lack a list of functions that might be callable (I understand not even those labeled with PGDLLIMPORT are all good candidates, and some good candidates might not be labeled as such) from an extension point of view. Have you come across such a list in any of these threads? I haven't been able to find it. Thanks for your input! Álvaro -- Álvaro Hernández Tortosa --- 8Kdata -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
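Chapman's PGDLLIMPORT explanation suggests one mechanical way to approximate the list Álvaro is asking for, at least for data symbols: scan the installed headers for PGDLLIMPORT-decorated declarations. A rough sketch follows; the regex and the sample header fragment are illustrative assumptions, not the project's official API definition.

```python
import re

# Matches declarations like: extern PGDLLIMPORT int NBuffers;
PGDLLIMPORT_RE = re.compile(
    r'extern\s+PGDLLIMPORT\s+[A-Za-z_][\w\s\*]*?(\w+)\s*(?:\[[^\]]*\])?\s*;'
)

def exported_data_symbols(header_text):
    """Return the names of PGDLLIMPORT-decorated variables in a header."""
    return PGDLLIMPORT_RE.findall(header_text)

# A fabricated header fragment, loosely modeled on src/include/*.h
sample = """
extern PGDLLIMPORT int NBuffers;
extern PGDLLIMPORT bool enableFsync;
extern int not_visible_to_msvc_extensions;
"""

print(exported_data_symbols(sample))  # ['NBuffers', 'enableFsync']
```

Running this over every .h under src/include would yield the Windows-visible data symbols; as the thread notes, functions are visible regardless, so this only captures one half of the de facto surface.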
Re: [HACKERS] PostgreSQL extension API? Documentation?
On 27/02/16 15:01, Fabrízio de Royes Mello wrote: On Sat, Feb 27, 2016 at 10:37 AM, Álvaro Hernández Tortosa <a...@8kdata.com <mailto:a...@8kdata.com>> wrote: > > > Hi. > > I have a newbie question for extension development. Extensions provide an entry point, and are dynamically linked to PostgreSQL. But what APIs/functions are really available for extensions to call? > > The most obvious API is SPI. You could also implement hooks. Of course, functions, types, aggregates, whatever. But can an extension call other "internal" PostgreSQL functions? Is there any restriction on what an extension could --or should-- call? Is there any specific API, or any documentation which states what is available to use? > > In other words: what is the API surface exposed by PostgreSQL to extension developers? The assumption is that no PostgreSQL code should be modified, just adding your own and calling existing functions. > I don't know what kind of problem you want to solve, but maybe you should ask yourself: Good point. I don't know. More precisely: no specific problem as of today. But if I knew the whole "exposed API" I could think more clearly about what problems could be solved. In other words: I see it's not clear what an extension could "extend". And knowing that would help extension developers create new solutions. 1) Do I need to change some current PostgreSQL behavior? If that means not changing current code, it might well be an option. 2) Do I need to add a new feature to PostgreSQL without changing the current behavior? Writing a C extension, you can access a lot of internal code if it's exposed through the .h headers. For example, some time ago I was thinking of writing an extension to show more internal information about autovacuum (internal queue, etc., something like pg_stat_autovacuum). But nowadays that is impossible without changing the core, because some internal structures are not exposed, so we would need to define an internal API to expose this kind of information. 
So, calling any code exposed by the headers is OK for an extension? Is then the set of all .h files the "exposed API"? Or are there some functions among those that should never be called? So depending on what problem you want to solve, you can write an extension to do that. Then unfortunately the short answer is "it depends". Hope we can find a more general answer :) Thanks for your opinion! Álvaro -- Álvaro Hernández Tortosa --- 8Kdata
[HACKERS] PostgreSQL extension API? Documentation?
Hi. I have a newbie question for extension development. Extensions provide an entry point, and are dynamically linked to PostgreSQL. But what APIs/functions are really available for extensions to call? The most obvious API is SPI. You could also implement hooks. Of course, functions, types, aggregates, whatever. But can an extension call other "internal" PostgreSQL functions? Is there any restriction on what an extension could --or should-- call? Is there any specific API, or any documentation which states what is available to use? In other words: what is the API surface exposed by PostgreSQL to extension developers? The assumption is that no PostgreSQL code should be modified, just adding your own and calling existing functions. Thanks, Álvaro -- Álvaro Hernández Tortosa --- 8Kdata -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] The plan for FDW-based sharding
On 27/02/16 09:19, Konstantin Knizhnik wrote: On 02/27/2016 06:54 AM, Robert Haas wrote: [...] So maybe the goal for the GTM isn't to provide true serializability across the cluster but some lesser degree of transaction isolation. But then exactly which serialization anomalies are we trying to prevent, and why is it OK to prevent those and not others? Absolutely agree. There are theoretical discussions regarding CAP and different distributed levels of isolation. But in practice people want to solve their tasks. Most PostgreSQL users are on the default isolation level, read committed, although there are a lot of "wonderful" anomalies with it. Serializable transactions in Oracle actually violate the fundamental serializability rule, and still Oracle is one of the most popular databases in the world... There was an isolation bug in Postgres-XL which didn't prevent commercial customers from using it... I think this might be a dangerous line of thought. While I agree PostgreSQL should definitely look at the market and answer questions that (current and prospective) users may ask, and be more practical than idealist, easily ditching isolation guarantees might not be a good thing. That Oracle is the leader despite its isolation problems, or that most people run PostgreSQL under read committed, is not a good argument to cut corners and settle for bare-minimum (if any) isolation guarantees. First, because PostgreSQL has always been trusted and understood as a system with *strong* guarantees (whatever that means). Second, because what we may perceive as OK from the market might change soon. From my observations, while I agree with you that most people "don't care" or, worse, "don't realize", that is rapidly changing. More and more people are becoming aware of the problems of distributed systems and the significant consequences they may have for them. A lot of them have been illustrated in the famous Jepsen posts. 
As an example, and a good one given that you have mentioned Galera before, is this one: https://aphyr.com/posts/327-jepsen-mariadb-galera-cluster which demonstrates how Galera fails to provide Snapshot Isolation even in a healthy state, despite their claims. As of today, I would expect any distributed system to clearly state its guarantees in the documentation. And then adhere to them, for instance proving it with tests such as Jepsen. So I do not say that discussing all these theoretical questions is not needed, such as formally proving the correctness of distributed algorithms. I would like to see work move forward here, so I really appreciate all your work. I cannot give an opinion on whether the DTM API is good or not, but I agree with Robert that a good technical discussion on these issues is a good, and a needed, starting point. Feedback may also help you avoid pitfalls that might otherwise go unnoticed until tons of code are implemented. Academic approaches are sometimes "very academic", but studying them doesn't hurt either :) Álvaro -- Álvaro Hernández Tortosa --- 8Kdata -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] about google summer of code 2016
On 22/02/16 23:34, Tom Lane wrote: Álvaro Hernández Tortosa <a...@8kdata.com> writes: On 22/02/16 05:10, Tom Lane wrote: Another variable is that your answers might depend on what format you assume the client is trying to convert from/to. (It's presumably not text JSON, but then what is it?) As I mentioned before, there are many well-known JSON serialization formats, like: - http://ubjson.org/ - http://cbor.io/ - http://msgpack.org/ - BSON (ok, let's skip that one hehehe) - http://wiki.fasterxml.com/SmileFormatSpec Ah, the great thing about standards is there are so many to choose from :-( So I guess part of the GSoC project would have to be figuring out which one of these would make the most sense for us to adopt. regards, tom lane Yes. And unless I'm mistaken, there's an int16 to identify the data format. Apart from the chosen format, others could be offered as alternatives using different data format codes. Or variants (like compressed text JSON). This may be better suited for a subsequent GSoC project, of course. Álvaro -- Álvaro Hernández Tortosa --- 8Kdata -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] about google summer of code 2016
On 22/02/16 05:10, Tom Lane wrote: Heikki Linnakangas <hlinn...@iki.fi> writes: On 19/02/16 10:10, Álvaro Hernández Tortosa wrote: Oleg and I discussed recently that a really good addition to a GSoC item would be to study whether it's convenient to have a binary serialization format for jsonb over the wire. Seems a bit risky for a GSoC project. We don't know if a different serialization format will be a win, or whether we want to do it in the end, until the benchmarking is done. It's also not clear what we're trying to achieve with the serialization format: smaller on-the-wire size, faster serialization in the server, faster parsing in the client, or what? Another variable is that your answers might depend on what format you assume the client is trying to convert from/to. (It's presumably not text JSON, but then what is it?) As I mentioned before, there are many well-known JSON serialization formats, like: - http://ubjson.org/ - http://cbor.io/ - http://msgpack.org/ - BSON (ok, let's skip that one hehehe) - http://wiki.fasterxml.com/SmileFormatSpec Having said that, I'm not sure that risk is a blocking factor here. History says that a large fraction of our GSoC projects don't result in a commit to core PG. As long as we're clear that "success" in this project isn't measured by getting a feature committed, it doesn't seem riskier than any other one. Maybe it's even less risky, because there's less of the success condition that's not under the GSoC student's control. Agreed :) Álvaro -- Álvaro Hernández Tortosa --- 8Kdata -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] about google summer of code 2016
On 21/02/16 21:15, Heikki Linnakangas wrote: On 19/02/16 10:10, Álvaro Hernández Tortosa wrote: Oleg and I discussed recently that a really good addition to a GSoC item would be to study whether it's convenient to have a binary serialization format for jsonb over the wire. Some argue this should be benchmarked first. So the scope for this project would be to benchmark and analyze the potential improvements and then agree on which format jsonb could be serialized to (apart from the current on-disk format, there are many JSON or nested k-v formats that could be used for sending over the wire). Seems a bit risky for a GSoC project. We don't know if a different serialization format will be a win, Compared to the current serialization (text), it is hard to believe there will be no wins. or whether we want to do it in the end, until the benchmarking is done. It's also not clear what we're trying to achieve with the serialization format: smaller on-the-wire size, faster serialization in the server, faster parsing in the client, or what? Probably all of them (it would be ideal if it could be selectable). Some may favor small on-the-wire size (which several serialization formats can reduce significantly), others faster decoding (de-serialization takes significant execution time). Of course, all this should be tested and benchmarked first, but we're not alone here. This is a frequent request from many, at least among Java users, where it has been discussed many times. Especially if the wire format adheres to a well-known (or even standard) format, so that the receiving side and the drivers could expose an API based on that format, which is one of the other big pain points today on the client side. I think it fits very well for a GSoC! :) Álvaro -- Álvaro Hernández Tortosa --- 8Kdata -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
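To make Heikki's "smaller on the wire, faster to parse" question concrete, here is a toy length-prefixed binary encoding of a JSON-like value, in the spirit of CBOR or MessagePack. The format is invented purely for illustration; it is not jsonb's on-disk format nor any of the standards listed above, and as the thread says, whether binary actually wins over text depends on the data and needs benchmarking.

```python
import json
import struct

def encode(value):
    """Toy binary JSON: a 1-byte type tag plus a fixed-size or
    length-prefixed payload. Lengths are a single byte here, so strings
    and containers are capped at 255 entries; real formats use
    variable-length integers instead."""
    if value is None:
        return b'N'
    if isinstance(value, bool):            # must precede the int check
        return b'T' if value else b'F'
    if isinstance(value, int):
        return b'I' + struct.pack('>q', value)   # 8-byte signed int
    if isinstance(value, str):
        raw = value.encode('utf-8')
        return b'S' + bytes([len(raw)]) + raw
    if isinstance(value, list):
        return b'A' + bytes([len(value)]) + b''.join(encode(v) for v in value)
    if isinstance(value, dict):
        return (b'O' + bytes([len(value)]) +
                b''.join(encode(k) + encode(v) for k, v in value.items()))
    raise TypeError("unsupported type: %r" % type(value))

doc = {"name": "jsonb", "tags": ["binary", "wire"], "version": "9.4"}
text_len = len(json.dumps(doc))
binary_len = len(encode(doc))
print(text_len, binary_len)   # which one wins depends on the data
```

The decoding side never needs a character-level lexer: it reads a tag byte and jumps, which is where binary formats tend to win even when the byte counts are similar.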
Re: [HACKERS] about google summer of code 2016
Hi. Oleg and I discussed recently that a really good addition to a GSoC item would be to study whether it's convenient to have a binary serialization format for jsonb over the wire. Some argue this should be benchmarked first. So the scope for this project would be to benchmark and analyze the potential improvements and then agree on which format jsonb could be serialized to (apart from the current on-disk format, there are many JSON or nested k-v formats that could be used for sending over the wire). I would like to mentor this project with Oleg. Thanks, Álvaro -- Álvaro Hernández Tortosa --- 8Kdata On 17/02/16 08:40, Amit Langote wrote: Hi Shubham, On 2016/02/17 16:27, Shubham Barai wrote: Hello everyone, I am currently pursuing my bachelor of engineering in computer science at Maharashtra Institute of Technology, Pune, India. I am very excited about contributing to postgres through the google summer of code program. Is postgres applying for gsoc 2016 as a mentoring organization? I think it is. Track this page for updates: http://www.postgresql.org/developer/summerofcode/ You can contact one of the people listed on that page for the latest. I didn't find one for 2016, but here is the PostgreSQL wiki page for last year's GSoC: https://wiki.postgresql.org/wiki/GSoC_2015#Project_Ideas Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] bugs and bug tracking
On 13/10/15 04:40, Tom Lane wrote: I'm with Robert on the idea that commit log entries need to be limited-width. I personally format them to 75 characters, so that git_changelog's output is less than 80 characters. regards, tom lane A little off-topic, but precisely if we're trying to make the commits/bug-tracking/whatever system more user-friendly for non-hacker users too, I'd adhere to the 50/72 "standard" for commit messages, which seems to be quite widespread: http://chris.beams.io/posts/git-commit/#seven-rules (50 chars for the commit summary, 72-char line wrapping) Álvaro -- Álvaro Hernández Tortosa --- 8Kdata -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
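The 50/72 convention being discussed is mechanical enough to lint for. A minimal checker sketch follows; the limits are parameters, so Tom's preferred 75 works just as well, and the function name and message strings are mine.

```python
def check_commit_message(msg, summary_limit=50, body_limit=72):
    """Return a list of 50/72-style violations for a commit message."""
    problems = []
    lines = msg.splitlines()
    if not lines:
        return ["empty message"]
    if len(lines[0]) > summary_limit:
        problems.append("summary longer than %d chars" % summary_limit)
    if len(lines) > 1 and lines[1].strip():
        problems.append("missing blank line after summary")
    for i, line in enumerate(lines[2:], start=3):
        if len(line) > body_limit:
            problems.append("line %d longer than %d chars" % (i, body_limit))
    return problems

msg = "Fix OID wraparound probe loop\n\nKeep searching for a free OID."
print(check_commit_message(msg))                    # []
print(check_commit_message("x" * 60))               # summary too long
```

A hook like this could enforce whichever width the project settles on without anyone having to count characters by hand.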
Re: [HACKERS] bugs and bug tracking
On 13/10/15 16:24, Andres Freund wrote: On 2015-10-13 16:21:54 +0200, Álvaro Hernández Tortosa wrote: On 13/10/15 04:40, Tom Lane wrote: I'm with Robert on the idea that commit log entries need to be limited-width. I personally format them to 75 characters, so that git_changelog's output is less than 80 characters. regards, tom lane A little off-topic, but precisely if we're trying to make the commits/bug-tracking/whatever system more user-friendly for non-hacker users too, I'd adhere to the 50/72 "standard" for commit messages, which seems to be quite widespread: http://chris.beams.io/posts/git-commit/#seven-rules (50 chars for the commit summary, 72-char line wrapping) -1 - imo 50 chars too often makes the commit summary too unspecific, requiring to read much more. I'm not a strong advocate of 50 chars anyway, but if people are getting used to it, and probably tools too, I'd try to stick to it. And I believe you should be able to describe a commit in 50 chars. But we shouldn't, of course, deviate and start yet another thread on this, so it's all up to you :) Álvaro -- Álvaro Hernández Tortosa --- 8Kdata -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] bugs and bug tracking
On 13/10/15 17:39, Joshua D. Drake wrote: On 10/13/2015 08:15 AM, Tom Lane wrote: Andres Freund <and...@anarazel.de> writes: On 2015-10-13 16:21:54 +0200, Álvaro Hernández Tortosa wrote: (50 chars for the commit summary, 72 chars line wrapping) -1 - imo 50 chars too often makes the commit summary too unspecific, requiring to read much more. I agree --- I have a hard enough time writing a good summary in 75 characters. 50 would be awful. The idea of writing a commit message that is useful in a number of characters that is less than half a tweet sounds unbearable. The idea of trying to discern what the hell a commit actually is in a number of characters that is less than half a tweet sounds completely ridiculous. -1 on that particular aspect. jD I'm writing a YC application and they ask you to summarize your whole project idea in less than 50 chars. So I guess that a commit message can be summarized under 50 chars too ^_^ We even do this with Java commits, and hey, you know, if you include a JavaStyleCamelCaseUnnecesarilyVerboseClassName in this summary you're screwed! But it seems there's clear agreement on *not* restricting it to 50, so I have nothing else to add :) Best, Álvaro -- Álvaro Hernández Tortosa --- 8Kdata -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fwd: [GENERAL] 4B row limit for CLOB tables
On 25/04/15 06:39, Jim Nasby wrote: On 4/24/15 7:11 PM, Álvaro Hernández Tortosa wrote: On 24/04/15 05:24, Tom Lane wrote: ... TBH, I've got very little enthusiasm for fixing this given the number of reports of trouble from the field, which so far as I recall is zero. Álvaro's case came up through intentionally trying to create an unreasonable number of tables, not from real usage. This thread likewise appears to contain lots of speculation and no reports of anyone hitting a problem in practice. It is certainly true that this was a very synthetic case. I envision, however, certain use cases where we may hit a very large number of tables: The original case has NOTHING to do with the number of tables and everything to do with the number of toasted values a table can have. If you have to toast 4B attributes in a single relation it will fail. In reality, if you get anywhere close to that things will fall apart due to OID conflicts. This case isn't nearly as insane as 4B tables. A table storing 10 text fields, each of which is 2K, would hit this limit with only 400M rows. If my math is right that's only 8TB; certainly not anything insane space-wise or rowcount-wise. Perhaps it's still not worth fixing, but I think it's definitely worth documenting. They are definitely different problems, but caused by similar symptoms: an OID wrapping around, or not even wrapping: just trying to find an unused one. If fixed, we should probably look at both at the same time. It's worth documenting, but also, as I said, maybe worth fixing, so that if three years from now these problems really show up, the solution is already in production (rather than in a patching state). Regards, Álvaro -- Álvaro Hernández Tortosa --- 8Kdata -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
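Jim's arithmetic is easy to verify: with one TOAST OID consumed per out-of-line value, a table with 10 toasted columns exhausts the 2^32 OID space at roughly 400M rows, and at ~2 KB per value that is only about 8 TB. A quick check (the figures follow his example, not measurements):

```python
OID_SPACE = 2**32            # distinct TOAST OIDs a single table can address
toasted_columns = 10         # Jim's example: 10 text fields per row
value_size = 2 * 1024        # ~2 KB per toasted value

max_rows = OID_SPACE // toasted_columns
total_bytes = max_rows * toasted_columns * value_size

print(max_rows)                 # 429496729 -> "only 400M rows"
print(total_bytes / 1024**4)    # ~8.0 -> about 8 TB
```

So the limit is reachable with a table size that is unusual but by no means absurd, which is the point of his "worth documenting" remark.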
Re: [HACKERS] Fwd: [GENERAL] 4B row limit for CLOB tables
On 24/04/15 05:24, Tom Lane wrote: Stephen Frost sfr...@snowman.net writes: * Bruce Momjian (br...@momjian.us) wrote: On Sun, Feb 1, 2015 at 03:54:03PM +0100, Álvaro Hernández Tortosa wrote: The problem here is that performance degrades exponentially, or worse. Speaking here from experience, we already tested this for a very similar case (table creation, where two OIDs are consumed from a global sequence when inserting into pg_class). Have a look at http://www.slideshare.net/nosys/billion-tables-project-nycpug-2013, slides 43-45. We tested this scenario there and showed that table creations per second dropped from 10K to a few per second and then to a few per day. In the graphs you can't even tell more tables were being created. At around 8K tables short of the theoretical limit of 4B OIDs consumed, the process basically stopped (doing more insertions). We don't report the maximum number of tables per database, or the maximum number of TOAST values. Agreed? For my 2c, this limitation is a surprise to users and therefore we should add documentation to point out that it exists, unless we're going to actually fix it (which is certainly what I'd prefer to see...). TBH, I've got very little enthusiasm for fixing this given the number of reports of trouble from the field, which so far as I recall is zero. Álvaro's case came up through intentionally trying to create an unreasonable number of tables, not from real usage. This thread likewise appears to contain lots of speculation and no reports of anyone hitting a problem in practice. It is certainly true that this was a very synthetic case. I envision, however, certain use cases where we may hit a very large number of tables: - Massive multitenancy - Aggressive partitioning - Massive multitenancy with aggressive partitioning - Software with dynamically generated tables, like those created by ToroDB (https://github.com/torodb/torodb). 
In ToroDB we generate tables depending only on the input data, so we may end up having as many as required by the data source. For example, a general-purpose JSON datastore may generate several tables per document inserted. Certainly this is likely to become an issue at some point in the future, but I'm not finding it very compelling to worry about now. By the time it does become an issue, we may have additional considerations or use cases that should inform a solution; which seems to me to be a good argument not to try to fix it in advance of real problems. Perhaps. I understand this argument, and it makes sense. However, on the other side, given the long time it may take from patch to commit, and then from released version to companies finally using it in production, I'd rather try to fix it soon, as there are already reports and use cases that may hit it, rather than wait three years until it explodes in our faces. After all, 640KB of RAM is enough, right? So maybe 2B tables is not that far over the horizon. Who knows. Regards, Álvaro -- Álvaro Hernández Tortosa --- 8Kdata -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Repeatable read and serializable transactions see data committed after tx start
On 24/03/15 20:56, Bruce Momjian wrote: On Fri, Mar 20, 2015 at 04:43:42PM -0400, Bruce Momjian wrote: On Sat, Nov 8, 2014 at 09:53:18PM +0100, Álvaro Hernández Tortosa wrote: On 07/11/14 22:02, Greg Sabino Mullane wrote: Kevin Grittner wrote: I think most people have always assumed that BEGIN starts the transaction and that is the point at which the snapshot is obtained. But there is so much evidence to the contrary. Not only does the *name* of the command (BEGIN or START) imply a start, but pg_stat_activity shows the connection idle in transaction after the command (and before a snapshot is acquired) Er...I think we are arguing the same thing here. So no contrary needed? :) So do we agree to fix the docs? ^_^ Doc patch attached. Patch applied. Thanks for the report. Awesome! Thanks! :) Álvaro -- Álvaro Hernández Tortosa --- 8Kdata -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] reducing our reliance on MD5
On 11/02/15 02:30, Tom Lane wrote: [...] I think it would be wise to take two steps back and think about what the threat model is here, and what we actually need to improve. Offhand I can remember two distinct things we might wish to have more protection against: * scraping of passwords off the wire protocol (but is that still a threat in an SSL world?). Better salting practice would do more than replacing the algorithm as such for this, IMO. We might consider it our problem or not, but in general terms man-in-the-middle (MITM) attacks, which are easy to mount in many scenarios, are a scraping problem. In particular, I have seen tons of developers turn off SSL validation during development and not turn it back on for production, leaving servers vulnerable to password scraping under MITM attacks. So I would always consider hashing anyway. SCRAM seems to be a good solution here. Regards, Álvaro -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fwd: [GENERAL] 4B row limit for CLOB tables
On 31/01/15 14:55, Roger Pack wrote: [...] Now, the reality is that GetNewOidWithIndex() is going to keep incrementing the global OID counter until it finds an OID that isn't in the toast table. That means that if you actually get anywhere close to using 4B OIDs you're going to become extremely unhappy with the performance of toasting new data. OK, so system stability doesn't degrade per se when it wraps (since they all use that GetNewOid method or similar?): good to know. So basically when it gets near 4B TOASTed rows it may have to wrap that counter and search for an unused number, and for each number it queries the TOAST table to see if it's already used, degrading performance. The problem here is that performance degrades exponentially, or worse. Speaking here from experience, we already tested this for a very similar case (table creation, where two OIDs are consumed from a global sequence when inserting into pg_class). Have a look at http://www.slideshare.net/nosys/billion-tables-project-nycpug-2013, slides 43-45. We tested this scenario there and showed that table creations per second dropped from 10K to a few per second and then to a few per day. In the graphs you can't even tell more tables were being created. At around 8K tables short of the theoretical limit of 4B OIDs consumed, the process basically stopped (doing more insertions). Hope that this information helps. Best regards, Álvaro -- Álvaro Hernández Tortosa --- 8Kdata -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
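The degradation described above can be sketched with a toy model of GetNewOidWithIndex: a counter that increments until it finds an unused value. Once a long contiguous run of OIDs is taken, every allocation has to walk the whole run. This model deliberately ignores real-world details (the shared OID counter, index probes against the TOAST table, concurrency); it only illustrates the probe-count blowup.

```python
def get_new_oid(counter, used, oid_space):
    """Advance counter until an unused OID is found; return (oid, probes)."""
    probes = 0
    while True:
        counter = (counter + 1) % oid_space
        probes += 1
        if counter not in used:
            return counter, probes

oid_space = 1_000_000
# Pretend the first 99.9% of the OID space is already consumed,
# as one contiguous run starting at 0.
used = set(range(999_000))

oid, probes = get_new_oid(0, used, oid_space)
print(oid, probes)   # 999000 999000: one allocation cost ~a million probes
```

Each probe in the real system is an index lookup, which is why allocations fall from thousands per second to a few per day as the space fills.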
Re: [HACKERS] Commitfest problems
On 12/12/14 20:43, Josh Berkus wrote: On 12/12/2014 11:35 AM, Alvaro Herrera wrote: Uh, really? Last I looked at the numbers from SPI treasurer reports, they are not impressive enough to hire a full-time engineer, let alone a senior one. The Linux Foundation has managed to pay for Linus Torvalds somehow, so it does sound possible. We have a number of companies making money all over the globe, at least. I think Álvaro (Herrera) got to the real point when he suggested funding a developer. Or three, as was also suggested. But to really nail it down, I'd say not only for the CFM. I think, overall, the PostgreSQL community would need to seriously consider raising funds and paying with them for full-time development by some senior developers. That would also allow development to be oriented more toward what the community really wants, rather than what other companies or individuals work on (which is absolutely great, of course). You're looking at this wrong. We have that amount of money in the account based on zero fundraising whatsoever, which we don't do because we don't spend the money. We get roughly $20,000 per year just by putting up a donate link, and not even promoting it. So, what this would take is: 1) a candidate who is currently a known major committer I think it would be even better to sell this approach as a long-term strategy, not tied to any particular candidate. Sure, a known major committer is for sure a good selling point; but a well-communicated strategy for long-term, foundation-like fundraising to improve PostgreSQL in certain ways is the way to go. 2) clear goals for what this person would spend their time doing +1. That may be the Core Team based on feedback/input from the whole list, or something like that. 3) buy-in from the Core Team, the committers, and the general hackers community (including buy-in to the idea of favorable publicity for funding supporters) +1 4) an organizing committee with the time to deal with managing foundation funds +1. 
Absolutely necessary; otherwise funding will not work. If we had those four things, the fundraising part would be easy. I speak as someone who used to raise $600,000 per year for a non-profit in individual gifts alone. I know it sounds difficult, and surely it is, but I believe the PostgreSQL community should be able to raise, globally, some millions per year to stably and permanently fund this community-guided development and have our best developers devoted 100% to PostgreSQL. Regards, Álvaro -- Álvaro Hernández Tortosa --- 8Kdata -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Repeatable read and serializable transactions see data committed after tx start
On 06/11/14 15:00, Kevin Grittner wrote: Álvaro Hernández Tortosa a...@8kdata.com wrote: There have been two comments which seem to state that changing this may introduce some performance problems and some limitations when you need to take out some locks. I still believe, however, that the current behavior is confusing for the user. Sure, one option is to patch the documentation, as I was suggesting. Yeah, I thought that's what we were talking about, and in that regard I agree that the docs could be more clear. I'm not quite sure what to say, or where, to fix that, but I can see how someone could be confused and have the expectation that once they have run BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE the transaction will not see the work of transactions committing after that. The fact that this is possible is implied, if one reads carefully and thinks about it, by the statement right near the start of the Transaction Isolation section which says any concurrent execution of a set of Serializable transactions is guaranteed to produce the same effect as running them one at a time in some order. As Robert pointed out, this is not necessarily the commit order or the transaction start order. It is entirely possible that if you have serializable transactions T1 and T2, where T1 executes BEGIN first (and even runs a query before T2 executes BEGIN) and T1 commits first, that T2 will appear to have run first because it will look at a set of data which T1 modifies and not see the changes. If T1 were to *also* look at a set of data which T2 modifies, then one of the transactions would be rolled back with a serialization failure, to prevent a cycle in the apparent order of execution; so the requirements of the standard (and of most software which is attempting to handle race conditions) are satisfied.
For many popular benchmarks (and I suspect most common workloads) this provides the necessary protections with better performance than is possible using blocking to provide the required guarantees.[1] Yes, you're right in that the "any concurrent execution..." phrase implicitly means that the snapshot may not be taken at BEGIN or SET TRANSACTION time, but it's definitely not clear enough for the average user. This may apply to the serializable case, but it doesn't apply to repeatable read, where the docs read "The Repeatable Read isolation level only sees data committed before the transaction began; it never sees either uncommitted data or changes committed during transaction execution by concurrent transactions". The first part is confusing, as we discussed; the second part is even more confusing, as it says "during transaction execution", and doesn't the transaction (not the snapshot) begin at BEGIN time? Surprisingly, the language is much clearer in the SET TRANSACTION doc page [2]. At any rate, the language in that section is a little fuzzy on the concept of the start of the transaction. Perhaps it would be enough to change language like:

| sees a snapshot as of the start of the transaction, not as of the
| start of the current query within the transaction.

to:

| sees a snapshot as of the start of the first query within the
| transaction, not as of the start of the current query within the
| transaction.

Would that have prevented the confusion here? I think that definitely helps. But it may be better to make it even clearer, more explicit, and to offer a solution for users who would like the snapshot to be taken at BEGIN time, like suggesting a SELECT 1 query. But what about creating a flag to the BEGIN and SET TRANSACTION commands, called IMMEDIATE FREEZE (or something similar), which applies only to REPEATABLE READ and SERIALIZABLE?
If this flag is set (it could be off by default, with the default configurable via a GUC parameter), the freeze happens at the time the flag is given (with BEGIN or SET TRANSACTION). This would be a backwards-compatible change, and would provide the option of freezing without the nasty hack of having to do a SELECT 1 prior to your real queries; everything would of course be well documented. What is the use case where you are having a problem? This seems like an odd solution, so it would be helpful to know what problem it is attempting to solve. I don't have a particular use case. I just came across the issue and thought the documentation and behavior weren't consistent. So the first aim is not to have users surprised (in a bad way). But I see a clear use case: users who might want to open a (repeatable read | serializable) transaction to have their view of the database frozen, to perform any later operation on that frozen view. Sure, that comes at a penalty, but I find that potentially interesting too. Regards, Álvaro [1] http://www.postgresql.org/docs/9.4/static/transaction-iso.html [2] http://www.postgresql.org/docs/9.4/static/sql-set-transaction.html -- Álvaro Hernández Tortosa
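Kevin's T1/T2 scenario, described earlier in this message, can be sketched as a timeline of two psql sessions. This is a sketch only: it assumes a pre-existing table d(v int) with at least one row and requires a live server; the comments paraphrase Kevin's apparent-order argument.

```sql
-- Sketch of Kevin's T1/T2 example; time runs downward, and T1 and T2
-- are two concurrent psql sessions.

-- T1:
BEGIN ISOLATION LEVEL SERIALIZABLE;
-- T2:
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT v FROM d;            -- T2's snapshot is taken here, before T1 commits
-- T1:
UPDATE d SET v = v + 1;
COMMIT;                     -- T1 began first and commits first
-- T2:
SELECT v FROM d;            -- still does not see T1's update
COMMIT;                     -- succeeds: only one read-write dependency
                            -- (T2 read what T1 later overwrote), so the
                            -- apparent order of execution is T2, then T1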
Re: [HACKERS] Repeatable read and serializable transactions see data committed after tx start
On 07/11/14 22:02, Greg Sabino Mullane wrote: -BEGIN PGP SIGNED MESSAGE- Hash: RIPEMD160 Kevin Grittner wrote: I think most people have always assumed that BEGIN starts the transaction and that is the point at which the snapshot is obtained. But there is so much evidence to the contrary. Not only does the *name* of the command (BEGIN or START) imply a start, but pg_stat_activity shows the connection idle in transaction after the command (and before a snapshot is acquired) Er... I think we are arguing the same thing here. So no contrary needed? :) So do we agree to fix the docs? ^_^ Why? This fix might not deal with the bigger issues that I discussed, like that the later-to-start and later-to-acquire-a-snapshot transaction might logically be first in the apparent order of execution. You can't fix that without a lot of blocking -- which most of us don't want. Right, which is why the suggestion of a user-controllable switch that defaults to the current behavior seems an excellent compromise. I also think so. It's backwards-compatible and opt-in. It also makes the documentation very clear, as there is a specific option for this. Depending on *why* they think this is important, they might need to be acquiring various locks to prevent behavior they don't want, in which case having acquired a snapshot at BEGIN would be exactly the *wrong* thing to do. The exact nature of the problem we're trying to solve here does matter. I cannot speak to the OP, but I also do not think we should try and figure out every possible scenario people may have. Certainly the long-standing documentation bug may have caused some unexpected or unwanted behavior, so let's start by fixing that. +1 Tom Lane wrote: Another thing that I think hasn't been mentioned in this thread is that we used to have severe problems with client libraries that like to issue BEGIN and then go idle until they have something to do. Which, for some reason, is a prevalent behavior.
I'm not advocating changing the default behavior, but I would not want to see bad client libraries used as a reason for any change we make. Clients should not be doing this, period, and there is no reason for us to support that. If the IMMEDIATE FREEZE mode is not the default, as I suggested, it shouldn't introduce any problems with existing code. Regards, Álvaro
Re: [HACKERS] Repeatable read and serializable transactions see data committed after tx start
On 06/11/14 02:06, Jim Nasby wrote: On 11/5/14, 6:04 PM, Álvaro Hernández Tortosa wrote: On 05/11/14 17:46, Jim Nasby wrote: On 11/4/14, 6:11 PM, Álvaro Hernández Tortosa wrote: Should we then improve the docs to state this more clearly? Any objection to doing this? If we go that route we should also mention that now() will no longer be doing what you probably hope it would (AFAIK it's driven by BEGIN and not the first snapshot). If I understand you correctly, you mean that if we add a note to the documentation stating that the transaction really freezes when you do the first query, people would expect now() to be also frozen when the first query is done, which is not what happens (it's frozen at tx start). Then, yes, you're right, probably *both* the isolation levels and the now() function documentation should be patched to become more precise. Bingo. Hrm, is there anything else that differs between the two? Perhaps we should change how now() works, but I'm worried about what that might do to existing applications... Perhaps; I also believe it might not be good for existing applications, but it definitely has a different freeze behavior, which seems inconsistent too. Yeah, I'd really rather fix it... There have been two comments which seem to state that changing this may introduce some performance problems and some limitations when you need to take out some locks. I still believe, however, that the current behavior is confusing for the user. Sure, one option is to patch the documentation, as I was suggesting. But what about creating a flag to the BEGIN and SET TRANSACTION commands, called IMMEDIATE FREEZE (or something similar), which applies only to REPEATABLE READ and SERIALIZABLE? If this flag is set (it could be off by default, with the default configurable via a GUC parameter), the freeze happens at the time the flag is given (with BEGIN or SET TRANSACTION).
This would be a backwards-compatible change, and would provide the option of freezing without the nasty hack of having to do a SELECT 1 prior to your real queries; everything would of course be well documented. What do you think? Best regards, Álvaro -- Álvaro Hernández Tortosa --- 8Kdata
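The proposed flag might look like this. This is purely hypothetical syntax: IMMEDIATE FREEZE does not exist in any PostgreSQL release, and the sketch only illustrates the intent of the proposal.

```sql
-- Hypothetical sketch of the proposed flag; none of this is real syntax.
BEGIN ISOLATION LEVEL REPEATABLE READ IMMEDIATE FREEZE;
-- The snapshot would be taken by BEGIN itself, so data committed by
-- concurrent sessions after this point would be invisible even to the
-- first query, with no need for a dummy SELECT 1.
```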
Re: [HACKERS] Repeatable read and serializable transactions see data committed after tx start
On 05/11/14 17:46, Jim Nasby wrote: On 11/4/14, 6:11 PM, Álvaro Hernández Tortosa wrote: Should we then improve the docs to state this more clearly? Any objection to doing this? If we go that route we should also mention that now() will no longer be doing what you probably hope it would (AFAIK it's driven by BEGIN and not the first snapshot). If I understand you correctly, you mean that if we add a note to the documentation stating that the transaction really freezes when you do the first query, people would expect now() to be also frozen when the first query is done, which is not what happens (it's frozen at tx start). Then, yes, you're right, probably *both* the isolation levels and the now() function documentation should be patched to become more precise. Perhaps we should change how now() works, but I'm worried about what that might do to existing applications... Perhaps; I also believe it might not be good for existing applications, but it definitely has a different freeze behavior, which seems inconsistent too. Thanks, Álvaro -- Álvaro Hernández Tortosa --- 8Kdata
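The difference Jim points out can be observed directly in a live session: now() (an alias of transaction_timestamp()) is fixed at BEGIN time, while statement_timestamp() advances per statement. A sketch (needs a running server; timings illustrative):

```sql
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT pg_sleep(5);                        -- note: this also takes the snapshot
SELECT now() = transaction_timestamp(),    -- true: now() was frozen at BEGIN
       now() = statement_timestamp();      -- false: ~5 seconds have passed
COMMIT;
```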
Re: [HACKERS] Repeatable read and serializable transactions see data committed after tx start
On 06/11/14 00:42, Robert Haas wrote: On Mon, Nov 3, 2014 at 2:14 PM, Álvaro Hernández Tortosa a...@8kdata.com wrote: Given a transaction started with BEGIN (REPEATABLE READ | SERIALIZABLE), if a concurrent session commits some data before *any* query within the first transaction, that committed data is seen by the transaction. This is not what I'd expect. I think the problem is with your expectation, not the behavior. But my expectation is derived from the documentation: "The Repeatable Read isolation level only sees data committed before the transaction began". In PostgreSQL you will see data committed after a BEGIN ... (REPEATABLE READ | SERIALIZABLE) statement (only before the first query). And it's reasonable to think that the transaction begins when you issue a BEGIN statement. It's also reasonable to think this way because:

- now() is frozen at BEGIN time, as Nasby pointed out
- pg_stat_activity says that the transaction is started, as Kevin mentioned

So if the behavior is different from what the documentation says and what other external indicators may point out, I think at least the documentation should be clear about this precise behavior, to avoid confusing users. Serializable means that the transactions execute in such a fashion that their parallel execution is equivalent to some serial order of execution. It doesn't say that it must be equivalent to any particular order that you might imagine, such as the order in which the transactions commit, or, as you propose, the order in which they begin. Generally, I think that's a good thing, because transaction isolation is expensive: even at repeatable read, but for the need to provide transaction isolation, there would be no such thing as bloat. The repeatable read and serializable levels add further overhead of various kinds.
We could provide a super-duper serializable level that provides even tighter guarantees, but (1) I can't imagine many people are keen to make the cost of serialization even higher than it already is and (2) if you really want that behavior, just do some trivial operation sufficient to cause a snapshot to be taken immediately after the BEGIN. I'm not really asking for a new isolation level, just that either BEGIN really freezes (for repeatable read and serializable) or, if that's expensive and not going to happen, that the documentation clearly states the fact that the freeze starts at first-query time, and that if you need to freeze before your first real query, you should do a dummy one first (like SELECT 1). Also, if this early freeze is a performance hit (and for that reason BEGIN is not going to be changed to freeze), then that should also be pointed out in the documentation, so that users who freeze early with SELECT 1-type queries understand it. Regards, Álvaro -- Álvaro Hernández Tortosa --- 8Kdata
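Robert's "trivial operation" workaround, as discussed throughout this thread, in sketch form (for a live psql session):

```sql
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT 1;   -- dummy query whose only purpose is to force the snapshot now
-- From this point on, rows committed by concurrent sessions are invisible
-- to this transaction, which is the behavior one might expect of BEGIN.
```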
Re: [HACKERS] Repeatable read and serializable transactions see data committed after tx start
On 04/11/14 09:07, Craig Ringer wrote: On 11/04/2014 07:31 AM, Álvaro Hernández Tortosa wrote: Thank you for your comment, Tom. However, I think this behavior, as seen from a user perspective, is not the expected one. That may be the case, but I think it's the SQL-standard behaviour, so we can't really mess with it. The spec requires SET TRANSACTION ISOLATION, and you can't implement that if you take a snapshot at BEGIN. It's true that the standard mandates SET TRANSACTION rather than setting the isolation level with the BEGIN statement, and in any case you can raise/lower the isolation level with SET regardless of what the session or the BEGIN command said. However, is it really a problem to take a snapshot at BEGIN time, only if the tx is started with BEGIN ... (REPEATABLE READ | SERIALIZABLE)? AFAIK, and I may be missing some internal details here, the worst that can happen is that you took one extra, unnecessary snapshot. I don't see that as a huge problem. The standard (SQL-92) says that a transaction is initiated when a transaction-initiating SQL-statement is executed. To be fair, that sounds to me more like a SELECT than a BEGIN, but I may be wrong. If it is still the intended behavior, I think it should be clearly documented as such, and a recommendation such as "issue a 'SELECT 1' right after BEGIN to freeze the data before any query of your own", or a similar comment, should be added. Again, as I said in my email, the documentation clearly says "only sees data committed before the transaction began". And this is clearly not the real behavior. It's more of a difference in when the transaction begins. Arguably, BEGIN says "I intend to begin a new transaction with the next query" rather than "immediately begin executing a new transaction". This concept could be clearer in the docs.
If this is really how it should behave, I'd +1000 to make it clearer in the docs, and to explicitly suggest that the user perform a query early after BEGIN, discarding the results (like doing a SELECT 1), if they want the state frozen and time may elapse between BEGIN and the real queries to be executed. Sure, there are, that was the link I pointed out, but I found no explicit mention of the fact that I'm raising here. I'm sure it's documented *somewhere*, in that I remember reading about this detail in the docs, but I can't find _where_ in the docs. It doesn't seem to be in: http://www.postgresql.org/docs/current/static/transaction-iso.html where I'd expect. Yepp, there's no mention there. In any case, we simply cannot take the snapshot at BEGIN time, because it's permitted to: SET TRANSACTION ISOLATION LEVEL READ COMMITTED; in a DB that has default serializable isolation or has a SET SESSION CHARACTERISTICS isolation mode of serializable. Note that SET TRANSACTION is SQL-standard. As I said, AFAIK it shouldn't matter a lot to take the snapshot at BEGIN. The worst that can happen is that you end up in read committed and you need to take more snapshots, one per query. AFAIK deferring the snapshot is consistent with other RDBMSes that use snapshots, too. I tried Oracle and SQL Server. SQL Server seems to behave like PostgreSQL, but just because it locks the table if it is accessed in a serializable transaction, so it definitely waits until the select to lock it. However, Oracle behaved as I expected: data is frozen at BEGIN time. I haven't tested others. The docs of that command allude to, but don't explicitly state, the behaviour you mention. http://www.postgresql.org/docs/current/static/sql-set-transaction.html Should we then improve the docs to state this more clearly? Any objection to doing this?
Regards, Álvaro -- Álvaro Hernández Tortosa --- 8Kdata
[HACKERS] Repeatable read and serializable transactions see data committed after tx start
Hi! Given a transaction started with BEGIN (REPEATABLE READ | SERIALIZABLE), if a concurrent session commits some data before *any* query within the first transaction, that committed data is seen by the transaction. This is not what I'd expect. Specifically, the documentation states that: "The Repeatable Read isolation level only sees data committed before the transaction began" [1]. IMHO, from a user perspective the transaction begins when the BEGIN command is issued. If I really want to see a frozen view of the database before any real SELECT, I have to issue another query like SELECT 1. This seems odd to me. I understand the tx snapshot may be deferred until real execution for performance reasons, but it is confusing from a user perspective. Is this really expected, or is it a bug? Am I having a bad day and missing some point here? ^_^ Regards, Álvaro [1] http://www.postgresql.org/docs/devel/static/transaction-iso.html

P.S. In case it wasn't clear what I meant, here's an example:

    Session 1                                   Session 2
    ---------                                   ---------
    CREATE TABLE i (i integer);
    BEGIN ISOLATION LEVEL REPEATABLE READ;
                                                INSERT INTO i VALUES (1);
    SELECT i FROM i;
    -- returns 1 row, value 1
    -- should return empty set
                                                INSERT INTO i VALUES (2);
    SELECT i FROM i;
    -- returns 1 row, value 1
    -- returns, as it should, the same as the previous query

In the first select, I'd have expected to have no rows. If a SELECT 1 is issued after BEGIN, there are no rows found. -- Álvaro Hernández Tortosa --- 8Kdata
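For contrast, the same example with the SELECT 1 workaround applied (a sketch for two live psql sessions, assuming the same table i as in the example):

```sql
-- Session 1                                -- Session 2
CREATE TABLE i (i integer);
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT 1;          -- snapshot is taken here, right after BEGIN
                                            INSERT INTO i VALUES (1);
SELECT i FROM i;   -- returns no rows, as originally expected
COMMIT;
```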
Re: [HACKERS] Repeatable read and serializable transactions see data committed after tx start
On 03/11/14 22:19, Tom Lane wrote: Álvaro Hernández Tortosa a...@8kdata.com writes: IMHO, from a user perspective the transaction begins when the BEGIN command is issued. If I really want to see a frozen view of the database before any real SELECT, I have to issue another query like SELECT 1. This seems odd to me. I understand the tx snapshot may be deferred until real execution for performance reasons, but it is confusing from a user perspective. Is this really expected, or is it a bug? Am I having a bad day and missing some point here? ^_^ It's expected. Without this behavior, you could not take out any locks before freezing the transaction snapshot, which would be a bad thing. Thank you for your comment, Tom. However, I think this behavior, as seen from a user perspective, is not the expected one. There may be some internal reasons for it, but for the user executing the transaction, it's normal to expect the freezing to happen right after the BEGIN, rather than *after* the first query. If it is still the intended behavior, I think it should be clearly documented as such, and a recommendation such as "issue a 'SELECT 1' right after BEGIN to freeze the data before any query of your own", or a similar comment, should be added. Again, as I said in my email, the documentation clearly says "only sees data committed before the transaction began". And this is clearly not the real behavior. I think there are some examples in the concurrency control chapter of the manual. Sure, there are, that was the link I pointed out, but I found no explicit mention of the fact that I'm raising here. Regards, Álvaro -- Álvaro Hernández Tortosa --- 8Kdata
Re: [HACKERS] PL/pgSQL 2
On 04/09/14 18:02, Craig Ringer wrote: On 09/04/2014 06:48 AM, Joshua D. Drake wrote: On 09/03/2014 11:48 AM, Robert Haas wrote: Anyway, to get back around to the topic of PL/SQL compatibility specifically, if you care about that issue, pick one thing that exists in PL/SQL but not in PL/pgsql and try to do something about it. Maybe it'll be something that EnterpiseDB has already done something about, in which case, if your patch gets committed, Advanced Server will lose a bit of distinction as compared with PostgreSQL. Or maybe it'll be something that EnterpriseDB hasn't done anything about, and then everybody comes out strictly ahead. What I think you shouldn't do (although you're free to ignore me) is continue thinking of Oracle compatibility as one monolithic thing, because it isn't, or to pursue of a course of trying to get the PostgreSQL community to slavishly follow Oracle, because I think you'll fail, and even if you succeed I don't think the results will actually be positive for PostgreSQL. Well put Robert. Indeed, especially with reference to the size and scope of Oracle. Its XML library alone is huge. At best it's reasonable to hope for compatibility with a limited subset of PL/SQL - and really, we're a good way there already, with most of what's missing being down to missing core server features or things PostgreSQL just does differently. True Oracle compatibility (for procedures) pretty much requires an embedded JVM with a rich class library. Since PL/Java seems to be dying a slow death by neglect and disinterest I don't think it's likely anyone would be tackling compatibility with the embedded JVM features anytime soon. There are a few things I would like to see, like secure session variables in PL/PgSQL. Mostly, though, I think talk of Oracle compatibility seems to be something that comes up before the speaker has really understood what that would mean, and the sheer scope of the endeavour. 
It's not going from 50% compatible to 80% compatible, it's going from 5% compatible to 7% compatible. The most used 5%, maybe, but still... Getting that 5% of what is most used would be a great gain. Maybe the speaker is misled about the size of the endeavour, but quite sure about what that market's needs are ;) Cheers, Álvaro
Re: [HACKERS] PL/pgSQL 2
On 03/09/14 20:48, Robert Haas wrote: On Tue, Sep 2, 2014 at 5:47 PM, Álvaro Hernández Tortosa a...@nosys.es wrote: Yeah, we differ there. I think having an Oracle compatibility layer in PostgreSQL would be the-next-big-thing we could have. Oracle has an orders-of-magnitude bigger user base than Postgres; having the ability to attract them would bring us many many more users which, in turn, would benefit us all very significantly. It would be my #1 priority to do in Postgres (but yes, I know -guess- how hard and what resources that would require). But dreaming is free :) There are a number of reasons why this isn't really practical. First, Oracle compatibility isn't one feature. The compatibility layer that exists in EnterpriseDB's Advanced Server product consists of many different changes to many different parts of the system. A few of those changes are simple syntax compatibility, where we do the exact same thing PostgreSQL does but with different syntax, but a lot of them are functional enhancements. Even within SPL, there's a whole bunch of different changes to a whole bunch of different areas, and most of those are functional enhancements rather than just tweaking syntax. So, if you tried to implement a new, Oracle-compatible PL, you'd find that you don't have one or a small number of changes to make, but a long series of features ranging from small to very large. You'd also find that adding a new PL, without changing any other parts of the server, only bridges a small part of the compatibility gap. Hi Robert, thanks for the insights here. Understood that the compatibility layer is not a single thing, and it's surely a very long and involved task to build such compatibility parts. However, I don't see anything bad in having one or some parts of it. For example, having a PL that is similar (maybe only in syntax) is a good thing.
Sure, there are surely lots of things that can't be done simply, tons of functions not available and so on, but that alone would mean Oracle users would both feel more comfortable and find their current code easier to port. That would already be a lot. Second, if you did manage to develop something which was significantly more compatible with Oracle than PostgreSQL or PL/pgsql is today, you'd probably find that the community wouldn't accept it. It's almost misleading to think of Oracle as a database; it's an enormous software ecosystem with facilities for doing just about everything under the sun, and many of those things more than one way. For example, in 9.4, EnterpriseDB will be releasing a UTL_HTTP package that contains many of the same interfaces that are present in Oracle. The interface decisions made by Oracle Corporation are reasonable in view of their architecture, but I am quite sure that this community would not want, for example, to return long text values as SETOF VARCHAR(2000) rather than TEXT, just because Oracle does that. And rightly so: I wouldn't want PostgreSQL to follow any other product that slavishly whether I worked at EnterpriseDB or not. This kind of thing crops up over and over again, and it only works to say that PostgreSQL should choose the Oracle behavior in every case if you believe that the primary mission of PostgreSQL should be to copy Oracle, and I don't. I also don't think it's a bad thing that Advanced Server makes different decisions than PostgreSQL in some cases. A further problem is that, in this particular case, you'd probably hear the argument from PostgreSQL hackers that they really don't want to be burdened with maintaining an HTTP client in the core server when the same thing could be done from an extension, and that's a valid argument, too. It is also valid for EnterpriseDB to make a different decision for itself, based on business priorities. I wouldn't follow those routes just for the sake of perfect compatibility.
I agree, and I'd never push for those. In the light of all these things, I'd never expect perfect, scientific compatibility, but a best-effort, well-documented approach. Now, none of that is to say that we wouldn't do well to give a little more thought to Oracle compatibility than we do. We've either made or narrowly avoided a number of decisions over the years which introduced - or threatened to introduce - minor, pointless incompatibilities with other database products, Oracle included. That really doesn't benefit anyone. To take another example, I've been complaining about the fact that PostgreSQL 8.3+ requires far more typecasts in stored procedures than any other database I'm aware of for years, probably since before I joined EnterpriseDB. And I still think we're kidding ourselves to think that we've got that right when nobody else is doing something similar. I don't think the community should reverse that decision to benefit EnterpriseDB, or to be compatible with Oracle: I think the community should reverse that decision because it's stupid
Re: [HACKERS] PL/pgSQL 2
On 03/09/14 15:24, Joshua D. Drake wrote: On 09/02/2014 04:01 PM, Álvaro Hernández Tortosa wrote: It's not copying. It's easing a path for others to migrate and come to Postgres. I'm interested in why you are more interested in MSSQL. My reasons for being interested in Oracle are: - It has more users (biggest and above all, the main reason: we could attract more) - Postgres is perceived as similar to Oracle (so migration is likely to be easier) That's all I want. Grow the Postgres user base by attracting Oracle users :) I find that we have more opportunity to replace MSSQL than Oracle. Obviously it depends on a lot of things, but my goal is the same as yours, just with a different database. Honestly, I don't care whether it's MSSQL or Oracle. What I want is to attract more users, get Postgres out of where it is and appeal to even more users. In that regard, Oracle or MSSQL doesn't matter to me. That's why, if you have some time, I'd love to hear why you think there is more opportunity to replace MSSQL. We may continue that privately, as it is a little bit off-topic. Thanks! Álvaro
Re: [HACKERS] PL/pgSQL 2
On 02/09/14 05:24, Craig Ringer wrote: I couldn't disagree more. If we were to implement anything, it'd be PL/PSM (http://en.wikipedia.org/wiki/SQL/PSM). I'm sure it's as bizarre and quirky as anything else the SQL committee has brought forth, but it's at least a standard(ish) language. So we'd choose a bizarre and quirky language instead of anything better just because it's standard. I'm sure current and prospective users will surely prefer a bizarre and quirky language that is standard approved, rather than a modern, comfortable, easy-to-use one that is not embodied by the ISO. No doubt ^_^ Creating a new language when there are already many existing contenders is absolutely nonsensical. Other than PL/PSM the only thing that'd make any sense would be to *pick a suitable existing language* like Lua or JavaScript and bless it as a supported, always-available, in-core language runtime that's compiled in by default. That is in my opinion a way more sensible choice. To bless PL/JavaScript as an in-core language would be a very wise choice. Álvaro
Re: [HACKERS] PL/pgSQL 2
On 02/09/14 06:40, Tom Lane wrote: Craig Ringer cr...@2ndquadrant.com writes: If someone came up with a convincing PL/SQL compatibility layer then it'd be worth considering adopting - when it was ready. But of course, anyone who does the work for that is quite likely to want to sell it to cashed-up Oracle users looking to save a few hundred grand on per-CPU licensing. As a case in point, EDB have spent quite a few man-years on their Oracle compatibility layer; and it's still not a terribly exact match, according to my colleagues who have looked at it. So that is a tarbaby I don't personally care to touch ... even ignoring the fact that cutting off EDB's air supply wouldn't be a good thing for the community to do. regards, tom lane OK, so this compatibility layer is tough. Knew that already ;) But on the other hand, the syntax is similar to plpgsql, right? So what about just having a compatible syntax? It would be the first step toward that compatibility layer, which could -or could not- be a long-term goal for postgres (having the whole layer). I don't buy that having that would cut EDB's air supply. They're doing great, and they know how to take care of themselves, I'm sure ;) Besides that, competition is always positive, and I'm sure they'd benefit more than they'd be harmed by postgres having that layer. If we are to have another plpgsql-like language (like plpgsql2) and we could design it so it would attract many many users (let's not forget that Oracle may have around two orders of magnitude more users than pg), that would benefit us all greatly. Even if not perfect. Even if it is a longer project which spans more than one release. But just having the syntax (or most of it, maybe avoiding some complex unimplemented postgres features, if that required a huge effort) is a big win. For 9.4, we have the media already saying Postgres has NoSQL capabilities (which is only partially true).
For x.y we could have the media saying Postgres adds Oracle compatibility (which would be only partially true). But that brings a lot of users to postgres, and that helps us all. It could also serve as motivation to implement those missing in-core features that Oracle has. If, on the other hand, we give up on attracting Oracle users at a moment when non-Oracle databases are fighting for them, and we lose here, well, let's at least have a very compelling, attractive, in-core, blessed language. Even disliking it myself, PL/JavaScript would be my #1 candidate there. My 4 (already) cents, Álvaro
Re: [HACKERS] PL/pgSQL 2
On 02/09/14 11:34, Mark Kirkwood wrote: On 02/09/14 21:25, Álvaro Hernández Tortosa wrote: On 02/09/14 05:24, Craig Ringer wrote: I couldn't disagree more. If we were to implement anything, it'd be PL/PSM (http://en.wikipedia.org/wiki/SQL/PSM). I'm sure it's as bizarre and quirky as anything else the SQL committee has brought forth, but it's at least a standard(ish) language. So we'd choose a bizarre and quirky language instead of anything better just because it's standard. I'm sure current and prospective users will prefer a bizarre and quirky language that is standard-approved over a modern, comfortable, easy-to-use one that is not blessed by ISO. No doubt ^_^ Well, there is the risk that by randomly adding new syntax to PL/pgSQL we turn it into a bizarre and quirky *non-standard* language. Part of the attraction of PL/pgsql is that it is Ada-like - if we break that too much then...well...that would be bad. So I think a careful balance is needed, to add new features that keep the spirit of the original language. I agree, and I don't think I have suggested adding new syntax to pl/pgsql. But having its syntax similar to Ada is IMHO not a good thing. I'm sure few prospective postgres users would be drawn to it. They are drawn to JavaScript, Python, Scala or Ruby, to name a few, but definitely not Ada. Regards, Álvaro
Re: [HACKERS] PL/pgSQL 2
On 02/09/14 11:31, Pavel Stehule wrote: 2014-09-02 11:25 GMT+02:00 Álvaro Hernández Tortosa a...@nosys.es mailto:a...@nosys.es: On 02/09/14 05:24, Craig Ringer wrote: I couldn't disagree more. If we were to implement anything, it'd be PL/PSM (http://en.wikipedia.org/wiki/SQL/PSM). I'm sure it's as bizarre and quirky as anything else the SQL committee has brought forth, but it's at least a standard(ish) language. So we'd choose a bizarre and quirky language instead of anything better just because it's standard. I'm sure current and prospective users will prefer a bizarre and quirky language that is standard-approved over a modern, comfortable, easy-to-use one that is not blessed by ISO. No doubt ^_^ SQL/PSM is used in DB2, Sybase Anywhere, MySQL, That's a much better argument than "it's standard" :))) Still, I think postgres is in a position to attract more Oracle users than DB2+Sybase+MySQL users Álvaro
Re: [HACKERS] PL/pgSQL 2
On 02/09/14 11:44, Pavel Stehule wrote: For 9.4, we have the media already saying Postgres has NoSQL capabilities (which is only partially true). For x.y we could have the media saying Postgres adds Oracle compatibility (which would be only partially true). But that brings a lot of users to postgres, and that helps us all. A partial truth can also make a lot of people hate postgres. False promises are wrong Then let's stop talking about postgres being NoSQL. NoSQL is basically schema-less (really bad name) plus infinite scalability (which basically means transparent sharding). We very clearly fail to provide the latter... Álvaro
Re: [HACKERS] PL/pgSQL 2
On 02/09/14 11:56, Pavel Stehule wrote: 2014-09-02 11:50 GMT+02:00 Álvaro Hernández Tortosa a...@nosys.es mailto:a...@nosys.es: On 02/09/14 11:31, Pavel Stehule wrote: 2014-09-02 11:25 GMT+02:00 Álvaro Hernández Tortosa a...@nosys.es mailto:a...@nosys.es: On 02/09/14 05:24, Craig Ringer wrote: I couldn't disagree more. If we were to implement anything, it'd be PL/PSM (http://en.wikipedia.org/wiki/SQL/PSM). I'm sure it's as bizarre and quirky as anything else the SQL committee has brought forth, but it's at least a standard(ish) language. So we'd choose a bizarre and quirky language instead of anything better just because it's standard. I'm sure current and prospective users will prefer a bizarre and quirky language that is standard-approved over a modern, comfortable, easy-to-use one that is not blessed by ISO. No doubt ^_^ SQL/PSM is used in DB2, Sybase Anywhere, MySQL, That's a much better argument than "it's standard" :))) Still, I think postgres is in a position to attract more Oracle users than DB2+Sybase+MySQL users Not all can be happy :) We can implement SQL/PSM in conformity with ANSI SQL. But we cannot implement PL/SQL even 20% compatible with Oracle - aggregates, pipe functions, collections - without rewriting a lot of code. I remember a lot of projects that promised Oracle compatibility based on Firebird -- all are dead. Now the situation is a little bit different - there is big pressure for migration from Oracle, but Oracle is too big a monster. OK. Thanks for all the info I was missing about this complexity; I see that it goes well beyond the syntax issue. However, I'd insist that this should IMHO be a big priority, and I'd set it as a long-term goal. Even better if it could have a phased approach, which would make a lot of people happier (targeting the most used functionality first). I'm sure pushing us to implement those missing features would be really good, too.
In the meantime, having another language (probably not plpgsql2) that is modern and appealing to many users would be a very nice win. Regards, Álvaro
Re: [HACKERS] PL/pgSQL 2
On 02/09/14 12:46, Marko Tiikkaja wrote: On 9/2/14 11:40 AM, Álvaro Hernández Tortosa wrote: If we are to have another plpgsql-like language (like plpgsql2) and we could design it so it would attract many many users (let's not forget that Oracle may have around two orders of magnitude more users than pg), that would benefit us all greatly. Even if not perfect. Even if it is a longer project which spans more than one release. But just having the syntax (or most of it, maybe avoiding some complex unimplemented postgres features, if that required a huge effort) is a big win. Have you looked at http://www.postgresql.org/docs/9.3/static/plpgsql-porting.html already? Precisely: this page shows examples of things that could be done at the language level that would make it way easier to port from PL/SQL (if you don't use the unsupported stuff). At least for those, if the syntax were exactly the same, it could make things much more comfortable (I'm not aiming for a zero-effort port, at least not in the first place, but to make it 60-80% easier than it is now). As far as I can tell, that already *is* the case as far as the language goes. It seems to me that most of the stuff that's different between the two are things that are out of the control of the language (no autonomous transactions, function source code in a literal, etc.) Maybe it would be interesting to analyze: - What is impossible to have right now in postgres - What can be implemented in a different way, but would still work in postgres - What could be somehow emulated And then adapt the syntax as much as possible to aim for the greatest compatibility possible. For 9.4, we have the media already saying Postgres has NoSQL capabilities (which is only partially true). For x.y we could have the media saying Postgres adds Oracle compatibility (which would be only partially true). But that brings a lot of users to postgres, and that helps us all. This would be a horrible, horrible lie.
Certainly no more horrible than today's "PostgreSQL has NoSQL". Despite that, I'm not saying I'd lie. I'd say what the media would say, which is completely different. If, on the other hand, we give up on attracting Oracle users at a moment when non-Oracle databases are fighting for them, and we lose here, well, let's at least have a very compelling, attractive, in-core, blessed language. Even disliking it myself, PL/JavaScript would be my #1 candidate there. The best part about PL/PgSQL is the seamless integration with SQL. You can put an SQL expression pretty much anywhere. How well would that work if the framework was Javascript instead of the Ada-like body that both PL/SQL and PL/PgSQL implement? SQL integration is a must in a PL/* language, that's for sure. But leveraging a well-known language, its tooling and, especially, its external libraries/ecosystem is a much bigger win. Especially since all the languages that I know of are capable (with more or less effort) of integrating SQL. So maybe JavaScript with a way of integrating SQL would be preferable IMO. Regards, Álvaro
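To make the trade-off above concrete, here is a hedged sketch contrasting the two styles. The table and function names are hypothetical; plv8 is the existing PL/JavaScript extension, in which SQL goes through an explicit call such as plv8.execute() rather than being embedded directly:

```sql
-- PL/pgSQL: SQL statements embed directly in the procedural body
CREATE OR REPLACE FUNCTION active_user_count() RETURNS bigint AS $$
DECLARE
  cnt bigint;
BEGIN
  SELECT count(*) INTO cnt FROM users WHERE active;  -- inline SQL
  RETURN cnt;
END;
$$ LANGUAGE plpgsql;

-- PL/JavaScript (plv8): SQL is run via an explicit API call
CREATE OR REPLACE FUNCTION active_user_count_js() RETURNS bigint AS $$
  var rows = plv8.execute("SELECT count(*) AS cnt FROM users WHERE active");
  return rows[0].cnt;
$$ LANGUAGE plv8;
```

The second form trades the seamless embedding Marko mentions for access to the JavaScript ecosystem, which is exactly the tension being discussed.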
Re: [HACKERS] PL/pgSQL 2
On 02/09/14 17:03, Hannu Krosing wrote: On 09/02/2014 11:52 AM, Álvaro Hernández Tortosa wrote: On 02/09/14 11:44, Pavel Stehule wrote: For 9.4, we have the media already saying Postgres has NoSQL capabilities (which is only partially true). For x.y we could have the media saying Postgres adds Oracle compatibility (which would be only partially true). But that brings a lot of users to postgres, and that helps us all. A partial truth can also make a lot of people hate postgres. False promises are wrong Then let's stop talking about postgres being NoSQL. NoSQL is basically schema-less (really bad name) plus infinite scalability (which basically means transparent sharding). We very clearly fail to provide the latter... Have you ever tried any of the real NoSQL products' version of infinite scalability? Yes, and they are absolutely not infinite, and they suck in many other places. But they scale beyond one node, transparently, something that postgres doesn't. And regardless, this is what people are buying, whether we like it or not. We are no worse than most if you use just the unstructured part (which is what the NoSQL crowd provides) and something like pl/proxy for scaling. We are definitely worse. This is the problem: we only look at our own belly button (if this expression exists in English). All the NoSQL products scale *easily*, *transparently* beyond one node. Postgres doesn't. I'm not saying they don't suck at many many other things, or that some of them may be a worse solution than the problem. But although JSON/JSONB in pg is awesome, it's far, far away from what we need to compete against NoSQL in these regards. Ask anyone not in the postgres world to use pl/proxy for scaling and they will run away to mongo/whatever. Talk about HA... and the discussion is over :( I know how hard these problems are in the general, transactional approach that postgres takes, and that NoSQL does this for very simple, non-ACID cases, but they do.
Hence, we cannot claim NoSQL compliance just because we have jsonb. Unfortunately :( (Surely we do have many other strengths, but let's not say that we have NoSQL capabilities, because we don't, while others -better or worse- do). Regards, Álvaro
Re: [HACKERS] PL/pgSQL 2
On 02/09/14 18:20, Joel Jacobson wrote: On Tue, Sep 2, 2014 at 6:09 PM, Kevin Grittner kgri...@ymail.com wrote: Joel Jacobson j...@trustly.com wrote: Sorry for being unclear, I didn't mean to suggest the main concern is updating *all* rows. The main concern is when you have a rather complex UPDATE WHERE clause, aiming to update exactly one row. Some of the expressions might be assertions, to just double-verify the values and to make it stand out that you are checking those expressions. These are two different problems which probably need two different solutions. Making the default behavior of a set-based command be that it throws an error if the resulting set is not exactly one row doesn't seem like the right solution to either one of them. I see your point. Basically, we have two types of applications where PL/pgSQL is commonly used. a) OLTP applications where you typically operate on one row for each UPDATE command. b) Data warehousing applications where you process multiple rows in each UPDATE command. Both have different desired default behaviours for the set-based commands used in PL/pgSQL. I think both are important enough to motivate a nice syntax for both use cases. If we cannot change the default behaviour of UPDATE, then I vote for the earlier proposed STRICT UPDATE syntax. I see both use cases, but I think the SQL defaults are set operations. For this particular point, if there were just a syntax change rather than a new language, in the line of the ONE ROW UPDATE syntax, I'd say UNIQUE UPDATE. This {ONE ROW, UNIQUE} syntax may also enforce having a WHERE clause. I find both better than STRICT, which does not do what you may think it does. Regards, Álvaro
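For reference, the one-row guarantee being debated can already be enforced in today's PL/pgSQL with GET DIAGNOSTICS and an explicit check. A minimal sketch (the table and column names are hypothetical; the pattern itself is standard PL/pgSQL):

```sql
CREATE OR REPLACE FUNCTION set_order_state(p_order_id bigint, p_state text)
RETURNS void AS $$
DECLARE
  rows_updated bigint;
BEGIN
  UPDATE orders SET state = p_state WHERE order_id = p_order_id;
  -- ROW_COUNT holds the number of rows affected by the last command
  GET DIAGNOSTICS rows_updated = ROW_COUNT;
  IF rows_updated <> 1 THEN
    RAISE EXCEPTION 'expected to update exactly 1 row, updated %', rows_updated;
  END IF;
END;
$$ LANGUAGE plpgsql;
```

The proposals in the thread (STRICT UPDATE, ONE ROW UPDATE, UNIQUE UPDATE) would collapse this boilerplate into a single declarative keyword.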
Re: [HACKERS] PL/pgSQL 2
On 02/09/14 18:33, Hannu Krosing wrote: On 09/02/2014 06:27 PM, Joel Jacobson wrote: On Tue, Sep 2, 2014 at 6:11 PM, Álvaro Hernández Tortosa a...@nosys.es wrote: We are definitely worse. This is the problem: we only look at our own belly button (if this expression exists in English). All the NoSQL products scale *easily*, *transparently* beyond one node. Postgres doesn't. I'm not saying they don't suck at many many other things, or that some of them may be a worse solution than the problem. But although JSON/JSONB in pg is awesome, it's far, far away from what we need to compete against NoSQL in these regards. So the discussion started out with a desire to improve PL/pgSQL. Now somehow NoSQL and JSON are discussed in the same thread. Interesting. Godwin's Law never fails :-) http://en.wikipedia.org/wiki/Godwin's_law Not to mention completely unsubstantiated claims about *all* NoSQL scaling *easily* and *transparently* beyond one node :) Honestly, this is off-topic and we can argue forever, but regardless of whether all of them do or not, what's certain is that Postgres doesn't have horizontal scalability. Period. And this is what we should look at. And we can't claim we're NoSQL until we have (easy, transparent) horizontal scalability. Best, Álvaro
Re: [HACKERS] PL/pgSQL 2
On 02/09/14 23:11, David Johnston wrote: On Tue, Sep 2, 2014 at 4:48 PM, Joshua D. Drake j...@commandprompt.com mailto:j...@commandprompt.com wrote: On 09/02/2014 09:48 AM, Bruce Momjian wrote: As a case in point, EDB have spent quite a few man-years on their Oracle compatibility layer; and it's still not a terribly exact match, according to my colleagues who have looked at it. So that is a tarbaby I don't personally care to touch ... even ignoring the fact that cutting off EDB's air supply wouldn't be a good thing for the community to do. What any commercial entity and the Community do are mutually exclusive, and we cannot and should not determine what features we will support based on any commercial endeavor. From where I sit the mutually exclusive argument doesn't seem to be true - and in fact is something I think would be bad if it were. We shouldn't be afraid to add features to core that vendors are offering, but at the same time the fact that the Oracle compatibility aspects are commercial instead of in-core is a plus to help ensure that there are people making a decent living off PostgreSQL and thus invested in its future Definitely we shouldn't be afraid to add any feature to core, if we (as a community) like it and can do it. And for sure, commercial versions and consultancy companies need to make a living, and we should take care of them all (that includes myself -my company-, of course). But there is plenty of space for all, especially with an Oracle compatibility layer. That would attract many, many users to postgres, and we all (including EDB, of course) would immediately benefit from it. Of course the community too. Plus, competition is never bad: it's the key to progress. Even if it stole some business from EDB, having to compete with PostgreSQL would push them to improve and differentiate, becoming better. So I don't see any problem there. Of course, that's only my view :) Best, Álvaro
Re: [HACKERS] PL/pgSQL 2
On 02/09/14 23:34, Joshua D. Drake wrote: On 09/02/2014 02:11 PM, David Johnston wrote: On Tue, Sep 2, 2014 at 4:48 PM, Joshua D. Drake j...@commandprompt.com mailto:j...@commandprompt.com wrote: On 09/02/2014 09:48 AM, Bruce Momjian wrote: As a case in point, EDB have spent quite a few man-years on their Oracle compatibility layer; and it's still not a terribly exact match, according to my colleagues who have looked at it. So that is a tarbaby I don't personally care to touch ... even ignoring the fact that cutting off EDB's air supply wouldn't be a good thing for the community to do. What any commercial entity and the Community do are mutually exclusive, and we cannot and should not determine what features we will support based on any commercial endeavor. From where I sit the mutually exclusive argument doesn't seem to be true - and in fact is something I think would be bad if it were. We shouldn't be afraid to add features to core that vendors are offering, but at the same time the fact that the Oracle compatibility aspects are commercial instead of in-core is a plus to help ensure that there are people making a decent living off PostgreSQL and thus are invested in Far more people make a very good living off of PostgreSQL than *any* commercial variant. I stand by what I said. It is not the responsibility or the care of the community what a commercial vendor does or does not do with their fork, except possibly to implement the open source equivalent where it makes sense or where licensing may not be followed. (Read: I don't care about oracle compatibility) Yeah, we differ there. I think having an Oracle compatibility layer in PostgreSQL would be the next big thing we could have. Oracle has an orders-of-magnitude bigger user base than postgres; having the ability to attract them would bring us many, many more users which, in turn, would benefit us all very significantly.
It would be my #1 priority to do in postgres (but yes, I know -guess- how hard and what resources that would require). But dreaming is free :) Álvaro
Re: [HACKERS] PL/pgSQL 2
On 03/09/14 00:41, Joshua D. Drake wrote: On 09/02/2014 02:47 PM, Álvaro Hernández Tortosa wrote: Yeah, we differ there. I think having an Oracle compatibility layer in PostgreSQL would be the next big thing we could have. Oracle has an orders-of-magnitude bigger user base than postgres; having the ability to attract them would bring us many, many more users which, in turn, would benefit us all very significantly. It would be my #1 priority to do in postgres (but yes, I know -guess- how hard and what resources that would require). But dreaming is free :) Oracle compatibility certainly has merit, I just don't see it as useful for core. I would be far more interested in MSSQL compatibility honestly. That said, Postgres itself is a rockstar and I think we can make our own case without having to copy others. It's not copying. It's easing a path for others to migrate and come to Postgres. I'm interested in why you are more interested in MSSQL. My reasons for being interested in Oracle are: - It has more users (biggest and, above all, the main reason: we could attract more) - Postgres is perceived as similar to Oracle (so migration is likely to be easier) That's all I want. Grow the postgres userbase by attracting Oracle users :) Álvaro
Re: [HACKERS] Tips/advice for implementing integrated RESTful HTTP API
On 02/09/14 04:47, Dobes Vandermeer wrote: Same idea as PgBouncer or PgPool. The advantage over hacking PgBouncer/PgPool for the job is that Tomcat can already do a lot of what you want using built-in, pre-existing functionality. Connection pool management, low-level REST-style HTTP processing, JSON handling etc. are all done for you. Yeah, those are nice conveniences but I still think installing Java and getting something to run on startup is a bit more of a hurdle. Better to make life easier up front by having a simple standalone proxy you can compile and run with just whatever is already available on a typical AWS ubuntu environment. If instead of Tomcat you use Jetty, you can embed the whole app+Jetty+dependencies in a single executable JAR, which eases deployment a lot. Installing a JVM in an Ubuntu environment is just one apt-get, and it's even easier if you use CloudFormation for automation. I don't think it's a bad choice at all... you get most of the functionality you want already there, as Craig said, and it's lightweight. Hope it helps, Álvaro
Re: [HACKERS] PL/pgSQL 2
On 01/09/14 14:27, Joel Jacobson wrote: On Mon, Sep 1, 2014 at 1:30 PM, Pavel Stehule pavel.steh...@gmail.com wrote: I agree with Andres - it is not good for plpgsql or for plpgsql users. The benefit must be significant for 90% of users. ... An official implementation of plpgsql2 can be a very wrong and dangerous signal - so we should not do it. Do you argue the introduction of plpgsql2 would hurt the users of plpgsql in some way? How? If you have X% who continue to happily use plpgsql, and (100-X)% who find they can use plpgsql2 in their project, for new functions or all functions (for a new project), then you have made (100-X)% of the users happier than they would be if they were forced to use plpgsql and suffer from its problems. It *would* be a problem if you had to choose between writing all functions in either plpgsql or plpgsql2, but thanks to postgres support for different pl-languages and mixing different languages in the same project, I cannot see the problem. What is clear, from my non-hacker, casual hackers-ml-reader perspective, is that there is room for new language features or even a new in-core language. I find Joel's reasoning quite clear about the general concept of improving on plpgsql, although the precise changes may not be big enough to justify a whole new version. But if there are enough changes, and breaking compatibility with the current plpgsql is a major concern, I fail to buy the other arguments for why doing plpgsql2 is a bad thing. The comparisons with Python/Perl are very misleading, as they have nothing to do with Postgres, and the case is obviously different. What I can add is that, if Postgres is to devote resources to a new language, I would plan it with a broader scope. What would attract the most users? Would it bring non-postgres users to Postgres? What could be one of the killer features of any next version? My trivial answer to most of these questions is: PL/SQL.
I don't know in detail how complex this would be to get into Postgres (well, EDB probably knows), but if I had to choose a new language, this is it. So my questions would rather be: - Is it feasible (resources, time, interest) to implement PL/SQL in Postgres? - Does it support all the requested new features Joel and others mentioned in this thread as desires for the new language? - If the answer to the previous question is no, could those unsupported features be implemented as a compatible superset of PL/SQL? Sorry if this sounds too unconventional for this list, but this is what IMVHO many users would be most pleased with. My 2 cents, Álvaro
Re: [HACKERS] PL/pgSQL 2
On 01/09/14 20:42, Tom Lane wrote: Álvaro Hernández Tortosa a...@nosys.es writes: What I can add is that, if Postgres is to devote resources to a new language, I would plan it with a broader scope. What would attract the most users? Would it bring non-postgres users to Postgres? What could be one of the killer features of any next version? My trivial answer to most of these questions is: PL/SQL. By that I suppose you mean I wish it would act just like Oracle. The problem with such a wish is that a lot of the incompatibilities with Oracle are functions of the core SQL engine, not of the PL. plpgsql already is about as close to PL/SQL as it's possible to get without changing core Postgres behavior --- or at least, that was the original design desire, and I don't think that it's failed in any large degree. regards, tom lane It's true that some of the incompatibilities are in the core engine, internal functions and so on, and that the plpgsql design goal was to achieve similarity. But similarity is not code compatibility, and afaik plpgsql is not code compatible with PL/SQL. Having 1:1 code compatibility, if possible, is a very good first step, to be followed by the core functionalities you mention. If postgres were going for a new language, why not implement one which, besides having the other suggested functionality, also has 1:1 PL/SQL code compatibility? I'm sure it's no trivial task, but it is a highly desirable one. Regards, Álvaro
Re: [HACKERS] PL/pgSQL 2
On 01/09/14 21:08, Pavel Stehule wrote: 2014-09-01 20:58 GMT+02:00 Álvaro Hernández Tortosa a...@nosys.es mailto:a...@nosys.es: On 01/09/14 20:42, Tom Lane wrote: Álvaro Hernández Tortosa a...@nosys.es mailto:a...@nosys.es writes: What I can add is that, if Postgres is to devote resources to a new language, I would plan it with a broader scope. What would attract the most users? Would it bring non-postgres users to Postgres? What could be one of the killer features of any next version? My trivial answer to most of these questions is: PL/SQL. By that I suppose you mean I wish it would act just like Oracle. The problem with such a wish is that a lot of the incompatibilities with Oracle are functions of the core SQL engine, not of the PL. plpgsql already is about as close to PL/SQL as it's possible to get without changing core Postgres behavior --- or at least, that was the original design desire, and I don't think that it's failed in any large degree. regards, tom lane It's true that some of the incompatibilities are in the core engine, internal functions and so on, and that the plpgsql design goal was to achieve similarity. But similarity is not code compatibility, and afaik plpgsql is not code compatible with PL/SQL. Having 1:1 code compatibility, if possible, is a very good first step, to be followed by the core functionalities you mention. If postgres were going for a new language, why not implement one which, besides having the other suggested functionality, also has 1:1 PL/SQL code compatibility? I'm sure it's no trivial task, but it is a highly desirable one. It is a false expectation - the language is only one part... and plpgsql isn't too far off. There are different systems of modules, different systems of custom aggregates, and mainly, with PL/SQL, there is the very complex dbms_ library. This library is maybe more complex than the current Postgres base. OK. Understood. Full compatibility may be a longer-term goal.
But why is it bad to have the same syntax at a language -not library- level? It is a task for a commercial project --- not all Postgres users need an Oracle compatibility layer. Certainly not all users need that layer. But I'm sure few would complain about having it. Besides that, why do you say it is meant for a commercial project? If it is because postgres should not listen to users willing to migrate from Oracle -- then we're screwed, losing the biggest opportunity (of attracting a large crowd of users) of recent times. If it is because it's too complex... well, I don't think the postgres community (as a whole) has fewer resources than commercial projects. Next, I am sure, it is in contradiction to Joel's proposal. That's not my business ;P No, really: if there is a new version of a language, which modifies the current syntax of plpgsql, and if plpgsql is already very similar to PL/SQL, why not, rather than coming up with a new syntax, use an already existing one? One that many, many more users already know? Regards, Álvaro
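To make the syntax-level argument concrete, here is a hedged, simplified sketch of the kind of surface differences the porting guide describes; the function is purely illustrative:

```sql
-- Oracle PL/SQL: RETURN clause, IS keyword, body given directly
CREATE OR REPLACE FUNCTION add_one(n IN NUMBER) RETURN NUMBER IS
BEGIN
  RETURN n + 1;
END;
/

-- Closest PL/pgSQL equivalent: RETURNS clause, body as a dollar-quoted
-- string literal, and an explicit LANGUAGE clause
CREATE OR REPLACE FUNCTION add_one(n numeric) RETURNS numeric AS $$
BEGIN
  RETURN n + 1;
END;
$$ LANGUAGE plpgsql;
```

The procedural bodies are nearly identical; it is the function-definition boilerplate, type names, and the library/module ecosystem around them that diverge, which is where the real porting cost lies.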
Re: [HACKERS] PL/pgSQL 2
On 01/09/14 21:52, Joel Jacobson wrote: On Mon, Sep 1, 2014 at 8:34 PM, Álvaro Hernández Tortosa a...@nosys.es wrote: What I can add is that, if Postgres is to devote resources to a new language, I would plan it with a broader scope. What would attract the most users? Would it bring non-postgres users to Postgres? What could be one of the killer features of any next version? My trivial answer to most of these questions is: PL/SQL. I don't know in detail how complex this would be to get into Postgres (well, EDB probably knows), but if I had to choose a new language, this is it. So my questions would rather be: Interesting visionary ideas. For me personally, I have no Oracle functions to port to Postgres, so all differences between PL/SQL and PL/pgSQL would just be an unnecessary extra amount of work in the process of porting existing code into a new language, be it PL/SQL or PL/pgSQL 2. That said, if PL/SQL wouldn't suffer from the problems I have with PL/pgSQL today, I wouldn't see a problem writing new code in the language, but then I would probably never manage to port all existing code to PL/SQL and I would be stuck with a mix of code in two languages instead of one. With PL/pgSQL 2 on the other hand, it would be feasible to eventually port all my existing code, as most of it would be compatible without changes and the rest would be easy to make compatible. I guess it's a question of whether it's more important to recruit Oracle users, or to improve the life of existing PL/pgSQL Postgres users. I agree that for you, unfortunately, plpgsql2 would be better than PL/SQL. However, I believe a bigger majority of users as a whole would benefit from this. If anyone is willing to look at the market numbers, the number of PL/SQL users compared to plpgsql users is really huge. That would surely attract many more users to postgres, which would ultimately greatly help us all (to the detriment of you and a few others, sure, but in a way that I'm sure is manageable).
Of course, I'd +1 any attempt to build a superset of PL/SQL that would also try to implement the requests that you and others may have about this PL, should they make sense. Regards, Álvaro -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PL/pgSQL 2
On 01/09/14 23:31, Marko Tiikkaja wrote: On 2014-09-01 11:11 PM, Álvaro Hernández Tortosa wrote: No, really: if there is a new version of a language which modifies the current syntax of plpgsql, and if plpgsql is already very similar to PL/SQL, why not use an already existing syntax rather than coming up with a new one? One that many, many more users already know than know plpgsql? The point isn't to create a new language just for the sake of creating a new one. It's to fix the problems PL/pgSQL has. If we're just going to trade the problems in PL/pgSQL for another set of problems implemented by PL/SQL, we're just worse off in the end. Agreed. But if we could solve them by leveraging a syntax that happens to be both similar to that of plpgsql (exactly as similar as plpgsql2 would be) and already known by a very large group of users, we would be far better off. If there are unresolved problems in the current PL/SQL implementation, doing a superset of it may make sense. Regards, Álvaro
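For context on the kind of PL/pgSQL problem being referred to (a sketch; the `users` table and its columns are hypothetical), one frequently cited example is `SELECT ... INTO` silently tolerating zero or multiple result rows unless `STRICT` is spelled out:

```sql
DO $$
DECLARE
  v_name text;
BEGIN
  -- Plain INTO: with no matching row, v_name is silently set to NULL;
  -- with several matching rows, only the first one returned is used.
  SELECT name INTO v_name FROM users WHERE id = 42;

  -- INTO STRICT: raises NO_DATA_FOUND or TOO_MANY_ROWS unless exactly
  -- one row matches. Critics argued this stricter behavior should have
  -- been the default.
  SELECT name INTO STRICT v_name FROM users WHERE id = 42;
END;
$$;
```

Complaints of this sort, where the safe behavior is opt-in rather than the default, are the kind of thing a successor language could change only by breaking backward compatibility.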
Re: [HACKERS] PL/pgSQL 2
On 01/09/14 23:46, David G Johnston wrote: Álvaro Hernández Tortosa wrote: On 01/09/14 21:52, Joel Jacobson wrote: On Mon, Sep 1, 2014 at 8:34 PM, Álvaro Hernández Tortosa <aht@> wrote: What I can add is that, if Postgres is to devote resources to a new language, I would plan it with a broader scope. What would attract the most users? Would it bring non-Postgres users to Postgres? What could be one of the killer features of any next version? My trivial answer to most of these questions is: PL/SQL. I don't know in detail how complex this would be to get into Postgres (well, EDB probably knows), but if I had to choose a new language, this is it. So my questions would rather be: Interesting visionary ideas. For me personally, I have no Oracle functions to port to Postgres, so all differences between PL/SQL and PL/pgSQL would just be an unnecessary extra amount of work in the process of porting existing code into a new language, be it PL/SQL or PL/pgSQL 2. That said, if PL/SQL didn't suffer from the problems I have with PL/pgSQL today, I wouldn't see a problem writing new code in the language, but then I would probably never manage to port all existing code to PL/SQL and I would be stuck with a mix of code in two languages instead of one. With PL/pgSQL 2, on the other hand, it would be feasible to eventually port all my existing code, as most of it would be compatible without changes and the rest would be easy to make compatible. I guess it's a question of whether it's more important to recruit Oracle users, or more important to improve the life of existing PL/pgSQL Postgres users. I agree that for you, unfortunately, plpgsql2 would be better than PL/SQL. However, I believe that, as a whole, a larger majority of users would benefit from this. Is it even legal for us to create PL/SQL? Honestly, I don't know. But I don't think so. And if it isn't, so be it. Let's have Oracle sue us (who?)
Beyond that, in all likelihood, having both a version two of the pl/pgsql language and the pl/SQL language would be a desirable outcome for, say, a 10.0 release. If 10.0 is to follow 9.5, I'd agree, although I'd also agree to anything earlier, if that were possible. The former simply because languages by their very nature are evolutionary, and at some point the lost productivity of suppressing such evolution in the name of backward compatibility will be deemed undesirable. It may be desirable to simply call the new language pl/elephant instead of pl/pgsql2, but the fundamental reason for evolving a language in order to incorporate newly acquired knowledge is unarguable. Though in this case the entire language/extension mechanism should be considered, and not just the specific procedural-SQL language we are dealing with here. The goal of adding PL/SQL would be to increase the user base of the project and hopefully attract new blood to the development team, in order to maximize long-term survivability and increase the pace of innovation. We would be unable to introduce substantial evolution to this language for that very reason, and so a different language is likely to be needed, eventually. That's what I think: increase the user base at a moment when, for many reasons we all know, many want to migrate off of Oracle. Let's bring them here before they move to something else, something backed by piles of money and great marketing capabilities. Álvaro