Re: [HACKERS] SCRAM protocol documentation

2017-08-11 Thread Álvaro Hernández Tortosa



On 11/08/17 15:00, Michael Paquier wrote:

On Fri, Aug 11, 2017 at 9:31 PM, Álvaro Hernández Tortosa
<a...@8kdata.com> wrote:

On 11/08/17 13:18, Michael Paquier wrote:

On Fri, Aug 11, 2017 at 3:50 PM, Álvaro Hernández Tortosa
<a...@8kdata.com> wrote:

Relatedly, the SCRAM specification doesn't appear to allow omitting the
user name in this manner.  Why don't we just send the actual user name,
even though it's redundant with the startup message?

The problem is where a username includes characters such as a comma or '=',
which can be avoided if the string is in UTF-8, as the username is
prepared with SASLprep before being used in the SASL exchange, but we
have no way now to be sure that the string is actually in UTF-8.
If at some point we decide that only things using UTF-8 are good to be
used during authentication, using the username in the exchange
messages instead of the one in the startup packet would be fine and
actually better IMO in the long term. Please note that the
specification says that both the username and the password must be
encoded in UTF-8, so we are not completely compliant here. If there is
something to address, that would be this part.

 The reason why the username is ignored, unless I'm wrong, is not exactly
that it is already sent. It is that Postgres does not restrict usernames to
be UTF-8 only, while SCRAM does. As such, if a username is not UTF-8, it
cannot be sent reliably over SCRAM.

That's basically the point I was making. Note that I would not be
against Postgres forcing strings to be in UTF-8. Now things are fuzzy
because of the lack of restrictions.


I'm +1 for that. But I guess that involves a protocol change,
and that's a completely different can of worms.





  If there's a clear meaning about ignoring the user here, why not settle
on something like the "*"? It's not going to change the world sending a
few bytes less on initialization, but I guess it doesn't hurt either...

I am not sure either that '*' would be that much helpful. Requiring
that things are in UTF-8 would be more compliant with the original
RFC.

 But we really don't need to send the username, since Postgres already
knows it (and that accommodates non-UTF-8 usernames). So why bother?
Just sending something like "*" (which is UTF-8 and produces the same value
under SASLprep) should be enough. I think the idea of ignoring the username
is pretty neat, but maybe a "standard" like "send me an asterisk here" could
be even better than leaving it empty.

Personally I don't see much difference between both, so I'd rather
leave things as they are now.


Strictly speaking the RFC assumes that the username is at least 1 
character. I understand this was precisely Peter's original comment.



Álvaro

--

Álvaro Hernández Tortosa


---
<8K>data



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SCRAM protocol documentation

2017-08-11 Thread Álvaro Hernández Tortosa



On 11/08/17 13:18, Michael Paquier wrote:

On Fri, Aug 11, 2017 at 3:50 PM, Álvaro Hernández Tortosa
<a...@8kdata.com> wrote:

On 11/08/17 03:57, Peter Eisentraut wrote:

The SCRAM protocol documentation
(https://www.postgresql.org/docs/devel/static/sasl-authentication.html)
states

"To avoid confusion, the client should use pg_same_as_startup_message as
the username in the client-first-message."

However, the client implementation in libpq doesn't actually do that, it
sends an empty string for the user name.  I find no other reference to
"pg_same_as_startup_message" in the sources.  Should the documentation
be updated?

Yes, definitely. I think that we should mention that the server uses
the username of the startup packet and ignores the data sent by the
frontend potentially provided in client-first-message.


But the documentation already says so:

"The username that was already sent in the startup message is used instead."




Relatedly, the SCRAM specification doesn't appear to allow omitting the
user name in this manner.  Why don't we just send the actual user name,
even though it's redundant with the startup message?

The problem is where a username includes characters such as a comma or '=',
which can be avoided if the string is in UTF-8, as the username is
prepared with SASLprep before being used in the SASL exchange, but we
have no way now to be sure that the string is actually in UTF-8.
If at some point we decide that only things using UTF-8 are good to be
used during authentication, using the username in the exchange
messages instead of the one in the startup packet would be fine and
actually better IMO in the long term. Please note that the
specification says that both the username and the password must be
encoded in UTF-8, so we are not completely compliant here. If there is
something to address, that would be this part.


The reason why the username is ignored, unless I'm wrong, is not 
exactly that it is already sent. It is that Postgres does not restrict 
usernames to be UTF-8 only, while SCRAM does. As such, if a username 
is not UTF-8, it cannot be sent reliably over SCRAM.
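A quick illustration of the encoding problem (an assumed example, not PostgreSQL code): a role name stored in a non-UTF-8 server encoding is just a byte string that may not decode as UTF-8, so it cannot be carried in a SCRAM message as RFC 5802 requires:

```python
# Illustration: LATIN1 bytes for a role name such as "André" are not
# valid UTF-8, so they cannot be placed in a SCRAM exchange message.
name_latin1 = "Andr\xe9".encode("latin-1")  # b'Andr\xe9', valid LATIN1
try:
    name_latin1.decode("utf-8")
    is_utf8 = True
except UnicodeDecodeError:
    is_utf8 = False
assert not is_utf8  # the LATIN1 bytes are not valid UTF-8
```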





 If there's a clear meaning about ignoring the user here, why not settle
on something like the "*"? It's not going to change the world sending a few
bytes less on initialization, but I guess it doesn't hurt either...

I am not sure either that '*' would be that much helpful. Requiring
that things are in UTF-8 would be more compliant with the original
RFC.


But we really don't need to send the username, since Postgres 
already knows it (and that accommodates non-UTF-8 usernames). So why 
bother? Just sending something like "*" (which is UTF-8 and produces the 
same value under SASLprep) should be enough. I think the idea of 
ignoring the username is pretty neat, but maybe a "standard" like "send 
me an asterisk here" could be even better than leaving it empty.



Álvaro






Re: [HACKERS] SCRAM protocol documentation

2017-08-11 Thread Álvaro Hernández Tortosa



On 11/08/17 03:57, Peter Eisentraut wrote:

The SCRAM protocol documentation
(https://www.postgresql.org/docs/devel/static/sasl-authentication.html)
states

"To avoid confusion, the client should use pg_same_as_startup_message as
the username in the client-first-message."

However, the client implementation in libpq doesn't actually do that, it
sends an empty string for the user name.  I find no other reference to
"pg_same_as_startup_message" in the sources.  Should the documentation
be updated?

Relatedly, the SCRAM specification doesn't appear to allow omitting the
user name in this manner.  Why don't we just send the actual user name,
even though it's redundant with the startup message?



Hi Peter.

You are absolutely right, I was also surprised by this when I was 
doing the JDBC implementation. Actually I chose to send an asterisk 
("*"), see 
https://github.com/pgjdbc/pgjdbc/pull/842/files#diff-c52128420a3882543ffa20a48964abe4R88, 
as it is (likely) shorter than the username.


I don't like the empty string either, and actually the library 
built for JDBC and used in pgjdbc explicitly disallows the use 
of an empty username.


If there's a clear meaning about ignoring the user here, why not 
settle on something like the "*"? It's not going to change the world 
sending a few bytes less on initialization, but I guess it doesn't hurt 
either...
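For illustration, a client-first-message built with "*" as the (ignored) username might look like the sketch below (per RFC 5802; this is not pgjdbc's actual code). "*" is unchanged by SASLprep and needs no escaping:

```python
# Sketch of a SCRAM client-first-message per RFC 5802, sending "*" as
# the username the server will ignore.
import base64
import os

def client_first_message(username):
    # gs2-header "n,," = no channel binding, no authzid
    nonce = base64.b64encode(os.urandom(18)).decode("ascii")
    # RFC 5802: "=" and "," in usernames are escaped as =3D and =2C
    escaped = username.replace("=", "=3D").replace(",", "=2C")
    return "n,,n={},r={}".format(escaped, nonce)

print(client_first_message("*"))  # e.g. n,,n=*,r=<random nonce>
```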



Álvaro






Re: [HACKERS] Authentication mechanisms categorization

2017-07-19 Thread Álvaro Hernández Tortosa



On 14/07/17 13:29, Michael Paquier wrote:

On Fri, Jul 14, 2017 at 12:16 PM, Álvaro Hernández Tortosa
<a...@8kdata.com> wrote:

 If the parameter authmethod were instead "authmethods", i.e. a list,
I think it would be significantly more flexible.

Yes, but the handling of a list becomes messier if there are some
other connection parameters that are dependent on the authentication
method. Say a list made of scram-sha-256 and scram-sha-3 is sent as the
methods, and a parameter named saslchannel listing scram-sha-256-plus
is used; this becomes unusable with scram-sha-3. Using individual names
for a parameter makes interactions with other parameters easier to
handle and less bug-prone. That's also by experience more flexible for
the application.


 I agree with a list of methods and all the values already existing for
sslmode; this might be more than enough, especially if the channel binding
SCRAM mechanisms would get a different authmethod than their non-channel
binding partners (like scram-sha-256-plus). This makes the list argument for
the authmethods, in my opinion, stronger.

For the channel binding patch, I have actually implemented saslname to
enforce the name of the SASL mechanism name to use (SCRAM-SHA-256 or
its -PLUS) as well as saslchannelbinding to enforce the channel
binding type. That's very handy, and at the end I saw that having a
list does not add much value in a feature that should be as simple as
possible as the client will use one match at the end for
authentication, and let the client know if it failed or succeeded (yes
I hate sslmode=prefer which does up to two attempts at once). But
that's as far as my opinion stands.

It is not possible to know the future, but we cannot discard as well
the fact that a future authentication method, say hoge could as well
support scram-sha-256, in which case using a list like
"authmethods=hoge,sasl authmechanisms=scram-sha-256" would mean that
scram-sha-256 needs to be enforced for both things, but the dependency
handling makes things unnecessarily complicated in libpq. My argument
here is crazy though.


Hi Michael.

I'm mostly convinced by the power of all the parameters that 
already exist, given that you added both saslname and saslchannelbinding 
to the already existing sslmode. That's great, and allows very 
fine-grained selection of the auth method. So it would be great if 
(non-libpq) driver implementations would expose the same parameter names 
to the users. I will study this for JDBC.
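For illustration, this is the kind of connection string those parameters would enable (host, database, and user are hypothetical; saslname and saslchannelbinding come from the patch under discussion and are not in a released libpq):

```shell
# Hypothetical: enforce SCRAM with channel binding over SSL
psql "host=db.example.com dbname=app user=alice \
      sslmode=require \
      saslname=SCRAM-SHA-256-PLUS \
      saslchannelbinding=tls-unique"
```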


My only fear is that this could become very complicated for the 
user, and could end up looking like the algorithm lists for SSL, which 
are very hard to digest for the non-expert. Handling lists of auth 
methods spread across three parameters seems to me to go a bit in this 
direction. A categorization like the one proposed, while maybe difficult 
to do initially, and maybe to maintain too, is precisely meant to take 
this burden off the user and expose a simpler setting.


Álvaro







Re: [HACKERS] Authentication mechanisms categorization

2017-07-14 Thread Álvaro Hernández Tortosa



On 14/07/17 11:09, Michael Paquier wrote:

On Sat, Jul 8, 2017 at 2:19 PM, Álvaro Hernández Tortosa <a...@8kdata.com> 
wrote:

 There has been some prior discussion, that we recently continued at
pgday.ru, about what to do if a client wants to use a "strong"
authentication mechanism but a rogue server forces the client to use a
weaker authentication mechanism. This is the case if the client expects
SCRAM to be used but a rogue server just replies with
AuthenticationCleartextPassword, for example. Client drivers will
authenticate using this latter mechanism, transparently (at least pgjdbc
implementation does this, and I believe libpq also). This somehow defeats
the purpose of some mechanisms like SCRAM.

Yeah :(


 It was discussed to add a parameter to the driver like "SCRAM-only", but
I think this may not be ideal. "SCRAM-only" means that code needs to be
written to prevent every other authentication mechanism, explicitly, which
is far from ideal. Much worse, it defeats using other auth mechanisms that
might be OK for the user. Also, this doesn't consider whether SCRAM is good
without channel binding.

 I think it would be better to make a categorization of authentication
mechanisms and then have an agreement among libpq and drivers to set a
minimum level of security based on the user's request. Some initial ideas
are:

- Three security levels: Basic, Medium, Advanced.
- Prevents MITM / does not.
- Given this X possible attacks, a matrix of which mechanisms avoid which
attacks (something similar to the table comparing the possible effects of
the different isolation levels).

 This is not trivial: for example, SCRAM may be OK without channel
binding in the presence of SSL, but without SSL channel binding is a must to
prevent MITM. Similarly, are other auth mechanisms like Kerberos (I'm not an
expert here) as "safe" as SCRAM with or without channel binding?

The use of channel binding is linked to SSL, which gets already
controlled by sslmode.


Hm, yes. As of today. Maybe tomorrow there's a channel binding 
mechanism that does not make use of SSL. But this is also unlikely.



  Users won't get trapped in this area by using
"require" instead of the default of "prefer". I would love to see the
default value changed actually from "prefer" to "require" here.
"prefer" as a default is a trap for users. There were discussions
about that not long ago but this gave nothing.


 I believe this should be discussed, and a common agreement found, to be
implemented by libpq and all the drivers, including a single naming scheme
for the parameter and possible values. Opinions?

I think that we don't need to have anything complicated here: let's
have at least a connection parameter, and perhaps an environment
variable which enforces the type of the authentication method to use:
scram-sha-256, md5, etc. I don't think that there is any need to
support a list of methods, any application could just enforce the
parameter to a different value if the previous one failed.

Categorization is something that can lose value over the ages,
something considered as strong now could be weak in 10 years. By
supporting only a list of dedicated names users have the same
flexibility, and we just need to switch the default we consider safe.
Controlling SSL is already a separate and existing parameter, so I
don't think that it should be part of this scheme. Having
documentation giving a recommended combination, say
"authmethod=scram-sha-256 sslmode=require" would be enough IMO.


If the parameter authmethod were instead "authmethods", i.e. a 
list, I think it would be significantly more flexible.


I agree with a list of methods and all the values already existing 
for sslmode; this might be more than enough, especially if the channel 
binding SCRAM mechanisms would get a different authmethod than their 
non-channel binding partners (like scram-sha-256-plus). This makes the 
list argument for the authmethods, in my opinion, stronger.
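A minimal sketch of what client-side enforcement with such an "authmethods" allow-list could look like (the parameter name and mapping are illustrative, not libpq's actual implementation; the numeric codes are the AuthenticationRequest codes from the frontend/backend protocol):

```python
# Subset of AuthenticationRequest codes from the PostgreSQL protocol
AUTH_CODES = {3: "password", 5: "md5", 7: "gss", 9: "sspi", 10: "sasl"}

def check_auth_allowed(server_code, allowed):
    """Reject any method the server requests that the user did not allow."""
    method = AUTH_CODES.get(server_code)
    return method is not None and method in allowed

# A rogue server downgrading to cleartext password is rejected:
assert not check_auth_allowed(3, ["sasl"])
assert check_auth_allowed(10, ["sasl", "md5"])
```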



Álvaro








Re: [HACKERS] Authentification method on client side checking

2017-07-09 Thread Álvaro Hernández Tortosa



On 09/07/17 18:47, Victor Drobny wrote:

Hello,

Despite the addition of SCRAM authentication to PostgreSQL 10, a MITM 
attack can be performed by saying that the server supports, for 
example, only md5 authentication. The possible solution for it is 
checking the authentication method on the client side and rejecting 
connections that could be unsafe.


A PostgreSQL server can require unencrypted password passing, md5, 
scram, gss or sspi authentication.


Hi Victor.

Precisely yesterday I initiated a similar thread: 
https://www.postgresql.org/message-id/d4098ef4-2910-c8bf-f1e3-f178ba77c381%408kdata.com


I think that a) the mere auth mechanism is not enough (channel 
binding or not, SSL or not, greatly change the effective security 
obtained) and b) maybe a categorization is a better way of specifying a 
connection's security requirements.


What's your opinion on this? Any answer should also be coordinated 
among the drivers.



Álvaro







[HACKERS] Authentication mechanisms categorization

2017-07-07 Thread Álvaro Hernández Tortosa


There has been some prior discussion, that we recently continued at 
pgday.ru, about what to do if a client wants to use a "strong" 
authentication mechanism but a rogue server forces the client to use a 
weaker authentication mechanism. This is the case if the client expects 
SCRAM to be used but a rogue server just replies with 
AuthenticationCleartextPassword, for example. Client drivers will 
authenticate using this latter mechanism, transparently (at least pgjdbc 
implementation does this, and I believe libpq also). This somehow 
defeats the purpose of some mechanisms like SCRAM.


It was discussed to add a parameter to the driver like 
"SCRAM-only", but I think this may not be ideal. "SCRAM-only" means that 
code needs to be written to prevent every other authentication 
mechanism, explicitly, which is far from ideal. Much worse, it defeats 
using other auth mechanisms that might be OK for the user. Also, this 
doesn't consider whether SCRAM is good without channel binding.


I think it would be better to make a categorization of 
authentication mechanisms and then have an agreement among libpq and 
drivers to set a minimum level of security based on the user's request. 
Some initial ideas are:


- Three security levels: Basic, Medium, Advanced.
- Prevents MITM / does not.
- Given this X possible attacks, a matrix of which mechanisms avoid 
which attacks (something similar to the table comparing the possible 
effects of the different isolation levels).


This is not trivial: for example, SCRAM may be OK without channel 
binding in the presence of SSL, but without SSL channel binding is a 
must to prevent MITM. Similarly, are other auth mechanisms like Kerberos 
(I'm not an expert here) as "safe" as SCRAM with or without channel 
binding?


I believe this should be discussed, and a common agreement found, 
to be implemented by libpq and all the drivers, including a single 
naming scheme for the parameter and possible values. Opinions?



Álvaro






Re: [HACKERS] SCRAM auth and Pgpool-II

2017-07-07 Thread Álvaro Hernández Tortosa



On 06/07/17 04:03, Tatsuo Ishii wrote:

Hi PostgreSQL hackers,

I would like to hear ideas how Pgpool-II can deal with SCRAM auth
which will be in PostgreSQL 10.

For those who are not familiar with Pgpool-II[1], it is an external
OSS project to provide some additional features to PostgreSQL,
including load balancing and automatic failover. Pgpool-II works as a
proxy between PostgreSQL client and PostgreSQL server(s).

When a client wants to connect to PostgreSQL and SCRAM auth is
enabled, it sends the user name to the server. Then the server sends
information including a salt to the client. The client computes a
"ClientProof" using the salt and other information, and sends it to
the server[2].

For Pgpool-II, things would go as follows:

1) The client sends the user name to Pgpool-II.
2) Pgpool-II forwards it to PostgreSQL servers.
3) Each PostgreSQL server sends their own salt to Pgpool-II.
4) Pgpool-II is confused because there are multiple salts, each with a
different value. The client obviously only accepts a single salt.
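The impossibility in step 4 can be sketched briefly: the ClientProof is derived from SaltedPassword = PBKDF2(password, salt, i) per RFC 5802, so a proof computed against one server's salt is useless for another server holding a different salt for the same role (values below are illustrative):

```python
# Distinct salts yield distinct salted keys, hence distinct proofs.
import hashlib

password = b"secret"
salt_a = b"A" * 16  # salt stored by server A
salt_b = b"B" * 16  # salt stored by server B
salted_a = hashlib.pbkdf2_hmac("sha256", password, salt_a, 4096)
salted_b = hashlib.pbkdf2_hmac("sha256", password, salt_b, 4096)
assert salted_a != salted_b  # a proof for A cannot satisfy B
```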

So my question is, is there any solution or workaround for problem
#4? Someone at PGCon 2017 suggested that the problem could be avoided
if the auth method between the client and Pgpool-II is "trust" (which
means no auth). But this does not seem to be the best solution to me
because it would weaken security.

I suspect this problem may not be specific to Pgpool-II. Any
middleware which handles multiple PostgreSQL servers could have a
similar problem.

Any suggestion would be appreciated. Thanks in advance.

[1] https://pgpool.net
[2] https://tools.ietf.org/html/rfc5802#section-3



Hi Tatsuo.

There's definitely an important concern here that should be 
addressed: how poolers/proxies/middleware/etc can deal with SCRAM, 
specifically in the context of channel binding.


If there is to be a single connection from client to PostgreSQL 
server, intercepted by pgpool to perform the magic foo, then channel 
binding is, indeed, designed to defeat this. If, however, pgpool or the 
middleware manages two separate connections (client<->pool and 
pool<->PG) then there is some light here.


One SCRAM feature not implemented as of today is the authzid 
(authorization identity: see 
https://tools.ietf.org/html/rfc5802#page-10, SCRAM attribute "a" and 
https://tools.ietf.org/html/rfc5801). Authzid is basically "I want to 
authenticate as user X and once authenticated, consider I'm user Y". 
With authzid, a pool/proxy may have a common user name with its own 
SCRAM credentials to authenticate with the backend PostgreSQL, and pass 
the authzid with the real username (the one provided on the 
client<->pool connection).
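A sketch of the gs2-header carrying an authzid (RFC 5802 attribute "a"), as described above: the pool authenticates as its own user while asking to act as the end user's role. The function and names are illustrative only:

```python
# Build the SCRAM gs2-header, optionally with an authzid ("a" attribute).
def gs2_header(authzid=None):
    if authzid is None:
        return "n,,"  # no channel binding, no authzid
    # "=" and "," must be escaped as =3D and =2C (RFC 5802)
    escaped = authzid.replace("=", "=3D").replace(",", "=2C")
    return "n,a={},".format(escaped)

assert gs2_header() == "n,,"
assert gs2_header("real_user") == "n,a=real_user,"
```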


This would require:

a) That authzid is implemented in PostgreSQL.
b) A mechanism in PG to name which user(s) Y are allowed to be 
authenticated by user X. This is similar, but not identical, to the 
current SET ROLE.


From a SCRAM protocol perspective, this is very simple, just an 
extra attribute. Part b) may need more discussion.


I believe this could be of help to your case and other middleware.


Álvaro







Re: [HACKERS] Using postgres planner as standalone component

2017-07-02 Thread Álvaro Hernández Tortosa



On 01/07/17 22:48, Ricky Stevens wrote:

Hi,

For one of my personal projects I am interested in using the 
PostgreSQL planner as a standalone library. However, I would like to 
run this as an embedded library instead of actually creating anything 
on disk.


I've realized that postgres has several pg_operator, pg_class etc. 
tables which it uses for query planning purposes. Is there any 
PostgreSQL component interface whose implementation could be 
overridden to not actually try to read these tables from disk but 
instead read it from a custom memory region that is managed by my code.


Thanks!



Maybe you'd like to consider gporca 
https://github.com/greenplum-db/gporca as an alternative. You may also 
want to look at Calcite https://calcite.apache.org/docs/ if you were 
more into the Java world.



Álvaro






Re: [JDBC] [HACKERS] Channel binding support for SCRAM-SHA-256

2017-06-20 Thread Álvaro Hernández Tortosa



On 20/06/17 06:11, Michael Paquier wrote:

On Tue, Jun 6, 2017 at 3:40 PM, Michael Paquier
<michael.paqu...@gmail.com> wrote:

As far as I can see, there are a couple of things that I still need to
work on to make people happy:
- Rework the generic APIs for TLS finish and endpoint so as any
implementation can use channel binding without inducing any extra code
footprint to be-secure.c and fe-secure.c.
- Implement endpoint, as Alvaro is saying for JDBC that would be nicer.
- Have a couple of tests for channel binding to allow people to test
the feature easily. Those will be in src/test/ssl/. It would be nice
as well to be able to enforce the channel binding type on libpq-side,
which is useful at least for testing. So we are going to need an
environment variable for this purpose, and a connection parameter.

Okay, here we go. Attached is a set of four patches:
- 0001 is some refactoring for the SSL tests so as other test suite in
src/test/ssl can take advantage of the connection routines. There is
nothing fancy here.
- 0002 is the implementation of tls-unique as channel binding. This
has been largely reworked since last submission, I have found on the
way a couple of bugs and some correctness issues.
- 0003 is a patch to add as connection parameters saslname and
saslchannelbinding. With support of more SASL mechanisms (PG10 has
SCRAM-SHA-256, I am adding SCRAM-SHA-256-PLUS here), saslname can be
used to enforce on the client-side the value of the SASL mechanism
chosen. saslchannelbinding does the same for the channel binding name.
This is very useful for testing, and a set of tests are added in
src/test/ssl/ for tls-unique and the SASL mechanisms. The tests cover
many scenarios, like downgrade attacks for example.
- 0004 is the implementation of tls-server-end-point, as Alvaro has
asked. Per RFC 5929, the binding data needs to be a hash of the server
certificate. If the signature algorithm of the certificate is MD5 or
SHA-1, then SHA-256 is used. Other signature algos like SHA-384 or 512
are used to hash the data. The hashed data is then encoded in base64
and sent to the server for verification. Tests using saslchannelname
have been added as well. It took me a while to find out that
OBJ_find_sigid_algs(X509_get_signature_nid(X509*)) needs to be used to
find out the algorithm of a certificate with OpenSSL.
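A sketch of the tls-server-end-point computation described in 0004 (RFC 5929), assuming the certificate's DER bytes and its signature hash name have already been extracted elsewhere (e.g. via OBJ_find_sigid_algs in OpenSSL, as noted above):

```python
# Hash the server certificate's DER bytes with its signature hash,
# substituting SHA-256 when that hash is MD5 or SHA-1 (RFC 5929 rule).
import base64
import hashlib

def tls_server_end_point(cert_der, sig_hash):
    algo = sig_hash.lower()
    if algo in ("md5", "sha1"):
        algo = "sha256"  # RFC 5929 upgrade rule
    digest = hashlib.new(algo, cert_der).digest()
    return base64.b64encode(digest)  # sent base64-encoded, per the patch

fake_cert = b"\x30\x82..."  # stand-in for real DER certificate bytes
assert tls_server_end_point(fake_cert, "sha1") == tls_server_end_point(fake_cert, "sha256")
```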

With the tests directly in the patch, things are easy to run. With
PG10 stabilization work, of course I don't expect much feedback :)
But this set of patches looks like the direction we want to go so as
JDBC and libpq users can take advantage of channel binding with SCRAM.


This is awesome, Michael.

In the coming weeks, and once my PR for pgjdbc has been added, I 
will work towards another patch to implement channel binding. Should be 
reasonably easy now, thanks to this.


Appreciated!

Álvaro








Re: [HACKERS] SCRAM in the PG 10 release notes

2017-06-11 Thread Álvaro Hernández Tortosa



On 11/05/17 09:20, Heikki Linnakangas wrote:

On 05/11/2017 07:03 AM, Michael Paquier wrote:
On Thu, May 11, 2017 at 11:50 AM, Bruce Momjian <br...@momjian.us> 
wrote:

I have added this as an open item because we will have to wait to see
where we are with driver support as the release gets closer.


As Postgres ODBC now has a hard dependency on libpq, no action is
taken from there. At least this makes one driver worth mentioning.


FWIW, I wrote a patch for the Go driver at https://github.com/lib/pq, 
to implement SCRAM. It's awaiting review.


I updated the List of Drivers in the Wiki. I added a few drivers that 
were missing, like the ODBC driver, and the pgtclng driver, as well as 
a Go and Rust driver that I'm aware of. I reformatted it, and added a 
column to indicate whether each driver uses libpq or not.


https://wiki.postgresql.org/wiki/List_of_drivers

There is a similar list in our docs:

https://www.postgresql.org/docs/devel/static/external-interfaces.html

Should we update the list in the docs, adding every driver we know of? 
Or curate the list somehow, adding only more popular drivers? Or 
perhaps add a link to the Wiki page from the docs?


We can use this list in the Wiki to track which drivers implement SCRAM.




I have just submitted a PR to add SCRAM support for pgjdbc: 
https://github.com/pgjdbc/pgjdbc/pull/842


Thread on the pgjdbc-list: 
https://www.postgresql.org/message-id/95dea232-aeeb-d619-e917-abf32b44ef8a%408kdata.com


Hopefully it will be merged soon, and the list of drivers could 
then be updated with this.



Álvaro






Re: [JDBC] [HACKERS] Channel binding support for SCRAM-SHA-256

2017-06-01 Thread Álvaro Hernández Tortosa



On 01/06/17 17:50, Stephen Frost wrote:

Robert,

* Robert Haas (robertmh...@gmail.com) wrote:

On Wed, May 31, 2017 at 7:59 PM, Stephen Frost <sfr...@snowman.net> wrote:

If your comments regarding the requirement that we have interoperability
testing of this feature before accepting it were intended to mean that
we need to make sure that the JDBC driver is able to work with the
chosen solution (or, perhaps, that we support multuple options, one of
which works with the JDBC changes), and that we review the other TLS
libraries with the goal of making sure that what we agree on in core
should work with those also, then that's great and exactly what I'd like
to see also.

OK, great.  That's what I mean (mostly - see below).

Glad to hear it.


If your comments regarding the requirement that we have interoperability
testing of this feature before accepting it were intended to mean that
we must have a complete alternative TLS solution, while that would
actually play a great deal to a goal I've had for a while to have PG
support multiple TLS implementations (LibNSS being top-of-mind, at least
for me, as I've mentioned a couple times), I can't agree that we should
require that before accepting channel binding as a feature.  To do so
would be akin to requiring that we support multiple TLS implementations
before we agreed to support client-side certificates, in my opinion.

I don't think that we need to have a committed patch allowing multiple
SSL implementations before we accept a patch for channel binding, but
I think it might be a good idea for someone to hack either libpq or
the server enough to make it work with some other SSL implementation
(Windows SSL would probably be the best candidate) and test that what
we think will work for channel binding actually does work.  It
wouldn't need to be a committable patch, but it should be enough to
make a successful connection in the laboratory.  Now, there might also
be other ways besides that of testing that interoperability is
possible, so don't take me as being of the view that someone has to
necessarily do exactly that thing that I just said.  But I do think
that we probably should do more than say "hey, we've got this
undocumented SSL API, and this other Windows API, and it looks like
they do about the same thing, so we're good".  There's sometimes a big
difference between what sounds like it should work on paper and what
actually does work.

I certainly wouldn't object to someone working on this, but I feel like
it's a good deal more work than perhaps you're realizing (and I tend to
think trying to use the Windows SSL implementation would increase the
level of effort, not minimize it).  Perhaps it wouldn't be too bad to
write a one-off piece of code which just connects and then returns the
channel binding information on each side and then one could hand-check
that what's returned matches what it's supposed to based on the RFC, but
if it doesn't, then what?  In the specific case we're talking about,
there's two approaches in the RFC and it seems like we should support
both because at least our current JDBC implementation only works with
one, and ideally we can allow for that to be extended to other methods,
but I wouldn't want to implement a method that only works on Windows SSL
because that implementation, for whatever reason, doesn't follow either
of the methods available in the RFC.


To make things even more complicated, I think the RFC is not 
helping very much here, as the definition itself is not very clear:


"
Description: The first TLS Finished message sent (note: the Finished
   struct, not the TLS record layer message containing it) in the most
   recent TLS handshake of the TLS connection being bound to (note: TLS
   connection, not session, so that the channel binding is specific to
   each connection regardless of whether session resumption is used).
   If TLS renegotiation takes place before the channel binding
   operation, then the first TLS Finished message sent of the latest/
   inner-most TLS connection is used.  Note that for full TLS
   handshakes, the first Finished message is sent by the client, while
   for abbreviated TLS handshakes (session resumption), the first
   Finished message is sent by the server.
"
https://tools.ietf.org/html/rfc5929#section-3.1

If you read further, it becomes even less clear what it is, and 
if there are re-negotiations, it gets more complicated. Server-end-point 
is somewhat better specified:


"
The hash of the TLS server's certificate [RFC5280] as it
   appears, octet for octet, in the server's Certificate message.
"
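As a rough illustration of that definition (not code from any patch in this thread), the tls-server-end-point channel binding data can be computed from the DER bytes of the server's certificate. Per RFC 5929, the hash algorithm is the one from the certificate's signature, except MD5 and SHA-1 are replaced by SHA-256. The `fake_der` bytes below are a placeholder, not a real certificate:

```python
import hashlib

def tls_server_end_point_cb_data(der_cert, sig_hash="sha256"):
    """Channel binding data per RFC 5929 section 4: a hash of the server's
    certificate exactly as sent in the TLS Certificate message (DER bytes).
    The certificate's own signature hash is used, except that MD5 and
    SHA-1 are upgraded to SHA-256."""
    if sig_hash in ("md5", "sha1"):
        sig_hash = "sha256"
    return hashlib.new(sig_hash, der_cert).digest()

# Placeholder bytes standing in for a real DER-encoded certificate.
fake_der = b"\x30\x82\x01\x0a placeholder certificate bytes"
cb = tls_server_end_point_cb_data(fake_der, "sha1")
print(len(cb))  # 32 (SHA-256 digest length)
```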


Álvaro



--

Álvaro Hernández Tortosa


---
<8K>data



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [JDBC] [HACKERS] Channel binding support for SCRAM-SHA-256

2017-06-01 Thread Álvaro Hernández Tortosa



On 01/06/17 18:11, Bruce Momjian wrote:

On Wed, May 31, 2017 at 09:37:02AM -0400, Robert Haas wrote:

On Tue, May 30, 2017 at 11:49 PM, Stephen Frost <sfr...@snowman.net> wrote:

... and I don't believe that we should be asking the
implementors of channel binding to also implement support for multiple
TLS libraries in PostgreSQL in order to test that their RFC-following
(at least, as far as they can tell) implementation actually works.

You're of course free to believe what you wish, but that sounds
short-sighted to me.  If we implement channel binding and it turns out
not to be interoperable with other SSL implementations, then what?  We
can't change it later without breaking compatibility with our own
prior implementation.  Note that Álvaro Hernández Tortosa said about
two hours before you sent this email that it doesn't seem possible to
implement something comparable in Java's standard SSL stack.  If
that's the case, adopting this implementation is dooming everyone who
connects to the database server using JDBC to be unable to use channel
binding.  And that's a large percentage of our user base.

Just to step back, exactly how does channel binding work?  Is each side
of the SSL connection hashing the password hash with the shared SSL
session secret in some way that each side knows the other end knows
the password hash, but not disclosing the secret or password hash?  Is
there some other way JDBC can get that information?



In short, channel binding augments SCRAM (it could also augment other 
authentication methods, I guess) by adding out-of-band data to the mix 
of data to be exchanged. A normal SCRAM exchange involves a user name, 
a unique nonce, a salt and some other information. If you add into the 
mix out-of-band information that is uniquely known only by the client 
and the server, you can prevent MITM attacks. It's not as simple as 
"hashing" the password, though: SCRAM involves 4 steps (2 messages each 
way) with somewhat complex hashing of data in the last 2 steps.
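To make that last-two-steps hashing concrete, here is a rough sketch of the RFC 5802 proof computation and server-side verification, using only the Python standard library. The password, salt and AuthMessage values are illustrative, not a captured exchange, and a real implementation also needs SASLprep, nonce handling and base64 framing:

```python
import hashlib
import hmac

def hi(password, salt, iterations):
    # RFC 5802's Hi() is PBKDF2 with HMAC-SHA-256 and dkLen = 32
    return hashlib.pbkdf2_hmac("sha256", password, salt, iterations)

def client_proof(password, salt, iterations, auth_message):
    salted = hi(password, salt, iterations)
    client_key = hmac.new(salted, b"Client Key", hashlib.sha256).digest()
    stored_key = hashlib.sha256(client_key).digest()
    client_sig = hmac.new(stored_key, auth_message, hashlib.sha256).digest()
    # ClientProof = ClientKey XOR ClientSignature
    return bytes(a ^ b for a, b in zip(client_key, client_sig))

def server_verifies(stored_key, auth_message, proof):
    # The server recovers ClientKey from the proof and re-hashes it;
    # it never needs the cleartext password.
    client_sig = hmac.new(stored_key, auth_message, hashlib.sha256).digest()
    recovered_key = bytes(a ^ b for a, b in zip(proof, client_sig))
    return hashlib.sha256(recovered_key).digest() == stored_key

# Illustrative values only (not a real exchange):
pw, salt, iters = b"pencil", b"salt0123", 4096
auth_msg = b"n=user,r=cnonce,r=cnoncesnonce,s=...,i=4096,c=biws,r=cnoncesnonce"
proof = client_proof(pw, salt, iters, auth_msg)
salted = hi(pw, salt, iters)
stored = hashlib.sha256(
    hmac.new(salted, b"Client Key", hashlib.sha256).digest()).digest()
print(server_verifies(stored, auth_msg, proof))  # True
```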


There are basically 2 channel binding options, that is, 2 possible 
pieces of data that could be taken out-of-band of the SCRAM 
authentication and thrown into this mix:


- tls-unique. It's like a unique identifier for each client-server SSL 
connection. It must be obtained from the SSL channel, and there doesn't 
seem to be a consistent way of getting it (in OpenSSL it is an 
undocumented API; it's not available in the standard Java stack, etc.).
- tls-server-end-point. The channel binding data is just a hash of the 
server's SSL certificate, as-is. As such, it seems easier to support 
across multiple OSs/SSL stacks.
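Whichever option is used, the off-band data enters the SCRAM exchange the same way: RFC 5802 defines the c= attribute of the client-final-message as the base64 of the GS2 header plus the raw channel binding data. A minimal sketch (attribute construction only; the cb_data bytes here are placeholders):

```python
import base64

def gs2_header(cb_name):
    # "p=<name>,," when channel binding is used; "n,," means the client
    # does not support it ("y,," exists too, for a client that supports
    # it but believes the server does not).
    return ("p=%s,," % cb_name).encode() if cb_name else b"n,,"

def channel_binding_attr(cb_name, cb_data=b""):
    # c= attribute of the client-final-message:
    # base64(gs2-header || cbind-data)
    return "c=" + base64.b64encode(gs2_header(cb_name) + cb_data).decode()

print(channel_binding_attr(None))  # c=biws  (base64 of "n,,")
print(channel_binding_attr("tls-unique", b"\x01\x02"))
```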


However, the SCRAM RFC states that if channel binding is 
implemented, at least tls-unique must be supported, with others being 
optional (there is as of today a third one, but it only applies to 
telnet). So in my opinion, I'd implement both of the above, for maximum 
flexibility, and add a field to the relevant SCRAM authentication FE/BE 
message with a list (aka CSV) of the channel binding mechanisms 
supported by the server. This, I believe, covers support for several 
channel binding mechanisms and could be extended in the future should 
other mechanisms appear.



Álvaro

--

Álvaro Hernández Tortosa


---
<8K>data





Re: [JDBC] [HACKERS] Channel binding support for SCRAM-SHA-256

2017-05-30 Thread Álvaro Hernández Tortosa



On 31/05/17 03:39, Michael Paquier wrote:

On Tue, May 30, 2017 at 5:59 PM, Robert Haas <robertmh...@gmail.com> wrote:

That sounds like undue optimism to me.  Unless somebody's tested that
Michael's proposed implementation, which uses undocumented OpenSSL
APIs, actually interoperates properly with a SCRAM + channel binding
implementation based on some other underlying SSL implementation, we
can't really know that it's going to work.  It's not like we're
calling SSL_do_the_right_thing_for_channel_binding_thing_per_rfc5929().
We're calling SSL_do_something_undocumented() and hoping that
something_undocumented ==
the_right_thing_for_channel_binding_thing_per_rfc5929.  Could be true,
but without actual interoperability testing it sounds pretty
speculative to me.

Hm? Python has been using that stuff for a number of years now, for
the same goal of providing channel binding for SSL authentication. You
can look at this thread:
https://bugs.python.org/issue12551
So qualifying that method as speculative sounds like quite a harsh word to me.


Actually this Python patch seems to ultimately rely on the same 
OpenSSL functions as your implementation.
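For reference, the result of that Python work is exposed in CPython's standard ssl module, which sits on top of the same OpenSSL Finished-message calls. A small sketch of the API surface (no connection is made here; on a connected socket the helper below would return the raw tls-unique bytes):

```python
import ssl

# CPython advertises the channel binding types it supports; as of the
# bug 12551 work this is just tls-unique.
print(ssl.CHANNEL_BINDING_TYPES)

def tls_unique_of(sock):
    """Given a connected ssl.SSLSocket, return the Finished-message bytes
    that RFC 5929 defines as the tls-unique channel binding data."""
    data = sock.get_channel_binding("tls-unique")
    if data is None:
        raise RuntimeError("TLS handshake not completed yet")
    return data
```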


I don't have anything against implementing tls-unique, especially as 
per the RFC it is mandatory (if channel binding support is provided at 
all). However, given the use of an undocumented API, and given that at 
least the Java driver would have a very difficult time implementing it 
(basically replacing Java's standard SSL stack with a completely 
external one, BouncyCastle, which adds load, size and complexity), I 
would rather focus the efforts on tls-server-end-point, which only 
needs access to the certificate data, something that most 
APIs/frameworks/languages seem to cover much more naturally than 
tls-unique.


Both are equally safe and effective and so having support for both 
seems sensible to me.



Álvaro

--

Álvaro Hernández Tortosa


---
<8K>data





Re: [HACKERS] Channel binding support for SCRAM-SHA-256

2017-05-27 Thread Álvaro Hernández Tortosa


[cross-posting to pgjdbc]


On 25/05/17 17:17, Michael Paquier wrote:

Hi all,

Please find attached a patch to add support for channel binding for
SCRAM, to mitigate MITM attacks using this protocol. Per RFC5802
(https://tools.ietf.org/html/rfc5802), servers supporting channel
binding need to add support for tls-unique, and this is what this
patch does.


This is awesome, Michael :) Thank you! You have saved me from 
eventually writing the patch myself and tainting the PostgreSQL source 
with Java-to-C "transpiled" code ^_^


As defined in RFC5056 (exactly here =>
https://tools.ietf.org/html/rfc5056#section-4.1), servers can use the
TLS finish message to determine if the client is actually the same as
the one who initiated the connection, to eliminate the risk of MITM
attacks. OpenSSL offers two undocumented APIs for this purpose:
- SSL_get_finished() to get the latest TLS finish message that the
client has sent.
- SSL_get_peer_finished(), to get the latest TLS finish message
expected by the server.
So basically what is needed here is saving the latest message
generated by client in libpq using get_finished(), send it to the
server using the SASL exchange messages (value sent with the final
client message), and then compare it with what the server expected.
Connection is successful if what the client has sent and what the
server expects match.

There is also a clear documentation about what is expected from the
client and the server in terms of how both should behave using the
first client message https://tools.ietf.org/html/rfc5802#section-6. So
I have tried to follow it, reviews are welcome particularly regarding
that. The implementation is done in such a way that channel binding is
used in the context of an SSL connection, which is what the RFCs
expect from an implementation.


After a deeper analysis, I have some concerns/comments here:

- tls-unique, as you mentioned, uses two undocumented APIs. This raises 
a small flag about the stability and future of those APIs.


- More importantly, some drivers (not libpq-based) may have unexpected 
difficulties implementing tls-unique. In particular, there is no API in 
Java to get the Finished message. I expect other languages may have 
similar limitations. For Java I see two options:
* Also implement tls-server-end-point, which rather relies on the 
server certificate. This information seems to be exposed as part of the 
Java APIs: 
https://docs.oracle.com/javase/8/docs/api/java/security/cert/Certificate.html#getEncoded--
* Use the BouncyCastle library (http://bouncycastle.org/), which 
explicitly supports tls-unique 
(https://www.bouncycastle.org/docs/pkixdocs1.5on/org/bouncycastle/est/TLSUniqueProvider.html#getTLSUnique() 
, 
https://www.bouncycastle.org/docs/tlsdocs1.5on/org/bouncycastle/tls/ChannelBinding.html#tls_unique 
). This would require, however, non-trivial changes to the driver and, I 
expect, a lot of extra effort.


- It seems to me that tls-unique might be a bit fragile. In particular, 
it requires the server to be aware of TLS renegotiations and to avoid 
performing one while the SCRAM authentication is being performed. I 
don't know if this is already guaranteed by the current patch, but it 
seems to me it requires complex interactions between levels of 
abstraction that are quite separate (SSL and SCRAM). This is explained 
by the RFC:


"
server applications MUST NOT request TLS renegotiation during phases of 
the application protocol during which application-layer authentication 
occurs

"
(https://tools.ietf.org/html/rfc5929#section-3.1)


Based on these facts, I'd suggest to either implement 
tls-server-end-point alone, or implement both tls-server-end-point and 
tls-unique. The latter would require a more complete protocol message to 
advertise SCRAM mechanisms and channel binding names separately. One 
such structure could be the one I suggested earlier: 
https://www.postgresql.org/message-id/df8c6e27-4d8e-5281-96e5-131a4e638...@8kdata.com





Before even that, the server needs to send to the client the list of
SASL mechanisms that are supported. This adds SCRAM-SHA-256-PLUS if
the server has been built with SSL support to the list.


And I'd say the list of channel binding names supported.


    Álvaro


--

Álvaro Hernández Tortosa


---
<8K>data





Re: [HACKERS] Letting the client choose the protocol to use during a SASL exchange

2017-04-13 Thread Álvaro Hernández Tortosa
My only desire would be to have a final spec and implement the full parser
now, and not have to change it in the future. We already know all the
requirements today, so please pick one and I will follow it :)



On Apr 13, 2017 13:47, "Heikki Linnakangas" <hlinn...@iki.fi> wrote:

> On 04/13/2017 02:35 PM, Álvaro Hernández Tortosa wrote:
>
>> On 13/04/17 13:24, Heikki Linnakangas wrote:
>>
>>> Right, when we get channel binding, the server will list
>>> "SCRAM-SHA-256" and "SCRAM-SHA-256-PLUS" as the list of mechanisms.
>>> And if we get channel binding using something else than tls-unique,
>>> then those will be added as extra mechanisms, too, e.g.
>>> "SCRAM-SHA-256-PLUS tls-awesome".
>>>
>>
>>  And how about supporting different SCRAM mechanisms with different
>> possible channel bindings? Separate by space too? So given a field, is
>> the first item the SCRAM mechanism, and all the remaning the channel
>> binding methods? I.e.:
>>
>> SCRAM-SHA-256-PLUS tls-awesome tls-awesome2 tls-awesome3\0...
>>
>
> I think we're going in circles.. Yeah, we could do that. Or they could be
> listed as separate mechanisms:
>
> SCRAM-SHA-256-PLUS\0
> SCRAM-SHA-256-PLUS tls-awesome\0
> SCRAM-SHA-256-PLUS tls-awesome2\0
> SCRAM-SHA-256-PLUS tls-awesome3\0
> \0
>
> One alternative is that you could list extra channel bindings that are
> supported by all the mechanisms, as separate entries. So the list could be:
>
> binding tls-unique\0
> binding tls-awesome\0
> binding tls-awesome2\0
> binding tls-awesome3\0
> SCRAM-SHA-256-PLUS\0
> SCRAM-SHA-512-PLUS\0
> \0
>
> which would mean that those bindings are supported by all the mechanisms
> that follow. I think this would achieve the same thing as your proposed
> separate field, with the current proposed protocol.
>
> But again, I'm 99% sure we won't need it, and we don't need to decide the
> exact syntax for channel bindings yet. We have the flexibility now, so we
> can cross the bridge when we get there.
>
> - Heikki
>
>


Re: [HACKERS] Letting the client choose the protocol to use during a SASL exchange

2017-04-13 Thread Álvaro Hernández Tortosa



On 13/04/17 04:54, Michael Paquier wrote:

On Thu, Apr 13, 2017 at 6:37 AM, Álvaro Hernández Tortosa
<a...@8kdata.com> wrote:

 By looking at them, and unless I'm missing something, I don't see
how the extra information for the future implementation of channel binding
would be added (without changing the protocol). Relevant part is:

The message body is a list of SASL authentication mechanisms, in the
server's order of preference. A zero byte is required as terminator after
the last authentication mechanism name. For each mechanism, there is the
following:



 String



 Name of a SASL authentication mechanism.




 How do you plan to implement it, in future versions, without modifying
the AuthenticationSASL message? Or is it OK to add new fields to a message
in future PostgreSQL versions, without that being considered a protocol change?

I don't quite understand the complaint here; it is perfectly fine to
add as many null-terminated names as you want with this model. The
patches would make the server just send one mechanism name now, but
nothing prevents the addition of more.
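To illustrate what that model asks of a client, here is a sketch of parsing the proposed AuthenticationSASL body: a sequence of NUL-terminated mechanism names, ended by an empty name (an extra NUL). The wire bytes below are a hypothetical example, not captured traffic:

```python
def parse_sasl_mechanisms(body):
    """Parse a list of NUL-terminated mechanism names, terminated by an
    empty name (i.e. a final extra NUL byte)."""
    mechanisms = []
    pos = 0
    while True:
        end = body.index(b"\x00", pos)
        name = body[pos:end].decode("utf-8")
        pos = end + 1
        if name == "":          # empty name terminates the list
            return mechanisms
        mechanisms.append(name)

wire = b"SCRAM-SHA-256-PLUS\x00SCRAM-SHA-256\x00\x00"
print(parse_sasl_mechanisms(wire))  # ['SCRAM-SHA-256-PLUS', 'SCRAM-SHA-256']
```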


I think I explained this in my previous reply, but just in case: 
there are two lists here, SCRAM mechanisms and channel binding 
mechanisms. They are orthogonal; you can pick them separately (channel 
binding only with the -PLUS variants, of course). Both lists (SCRAM and 
channel binding mechanisms) have to be advertised by the server.



Álvaro

--

Álvaro Hernández Tortosa


---
<8K>data





Re: [HACKERS] Letting the client choose the protocol to use during a SASL exchange

2017-04-13 Thread Álvaro Hernández Tortosa



On 13/04/17 13:24, Heikki Linnakangas wrote:

On 04/13/2017 05:54 AM, Michael Paquier wrote:

On Thu, Apr 13, 2017 at 6:37 AM, Álvaro Hernández Tortosa
<a...@8kdata.com> wrote:
By looking at them, and unless I'm missing something, I don't see
how the extra information for the future implementation of channel binding
would be added (without changing the protocol). Relevant part is:

The message body is a list of SASL authentication mechanisms, in the
server's order of preference. A zero byte is required as terminator after
the last authentication mechanism name. For each mechanism, there is the
following:



String



Name of a SASL authentication mechanism.




How do you plan to implement it, in future versions, without modifying
the AuthenticationSASL message? Or is it OK to add new fields to a message
in future PostgreSQL versions, without that being considered a protocol
change?


I don't quite understand the complaint here; it is perfectly fine to
add as many null-terminated names as you want with this model. The
patches would make the server just send one mechanism name now, but
nothing prevents the addition of more.


Right, when we get channel binding, the server will list 
"SCRAM-SHA-256" and "SCRAM-SHA-256-PLUS" as the list of mechanisms. 
And if we get channel binding using something else than tls-unique, 
then those will be added as extra mechanisms, too, e.g. 
"SCRAM-SHA-256-PLUS tls-awesome".


And how about supporting different SCRAM mechanisms with different 
possible channel bindings? Also separated by spaces? So given a field, 
is the first item the SCRAM mechanism, and all the remaining ones the 
channel binding methods? I.e.:


SCRAM-SHA-256-PLUS tls-awesome tls-awesome2 tls-awesome3\0...

Please note that if this is the solution chosen:

- A lot of parsing and convention is required (the first item is the 
SCRAM mechanism, successive ones are channel binding names; tls-unique 
is always "implied", etc.)
- Channel binding names will be repeated for every SCRAM mechanism with 
"-PLUS". This is not needed; SCRAM mechanisms and channel bindings are 
separate things.
- Channel binding names will not be present on the others, making the 
parser even more complex.


All this vs, again, stating SCRAM mechanisms in one list and 
channel bindings in another list, which is to me much more KISS. But... 
anyway, if this is the decision made, at least I think this should be 
documented now, because client parsers need to be designed one way or 
another.



Thanks,

Álvaro


--

Álvaro Hernández Tortosa


---
<8K>data





Re: [HACKERS] Letting the client choose the protocol to use during a SASL exchange

2017-04-12 Thread Álvaro Hernández Tortosa



On 12/04/17 19:34, Heikki Linnakangas wrote:

On 04/11/2017 02:32 PM, Álvaro Hernández Tortosa wrote:

 So I still see your proposal as more awkward and less clear, mixing
things that are separate. But again, your choice  :)


So, here's my more full-fledged proposal.

The first patch refactors libpq code, by moving the responsibility of 
reading the GSS/SSPI/SASL/MD5 specific data from the authentication 
request packet, from the enormous switch-case construct in 
PQConnectPoll(), into pg_fe_sendauth(). This isn't strictly necessary, 
but I think it's useful cleanup anyway, and now that there's a bit 
more structure in the AuthenticationSASL message, the old way was 
getting awkward.


The second patch contains the protocol changes, and adds the 
documentation for it.


- Heikki



Hi Heikki.

Thanks for the patches :)

By looking at them, and unless I'm missing something, I don't 
see how the extra information for the future implementation of channel 
binding would be added (without changing the protocol). The relevant part is:


The message body is a list of SASL authentication mechanisms, in the
server's order of preference. A zero byte is required as terminator after
the last authentication mechanism name. For each mechanism, there is the
following:



String



Name of a SASL authentication mechanism.






How do you plan to implement it, in future versions, without 
modifying the AuthenticationSASL message? Or is it OK to add new fields 
to a message in future PostgreSQL versions, without that being 
considered a protocol change?


On a side note, I'd mention in the documentation that the list of SASL 
authentication mechanisms contains valid IANA Registry SCRAM names 
(https://www.iana.org/assignments/sasl-mechanisms/sasl-mechanisms.xhtml#scram) 
for SCRAM authentication messages, making it clearer what values the 
client should expect there.


I hope to start testing this from Java the coming weekend. I will 
keep you posted.



Álvaro


--

Álvaro Hernández Tortosa


---
<8K>data





Re: [HACKERS] Some thoughts about SCRAM implementation

2017-04-12 Thread Álvaro Hernández Tortosa



On 12/04/17 18:38, Robert Haas wrote:

On Tue, Apr 11, 2017 at 9:20 AM, Álvaro Hernández Tortosa
<a...@8kdata.com> wrote:

 LOL. Do you really want a half-baked Java programmer writing a patch in
C for PostgreSQL? I once tried that and Magnus said my code was Java code
that happened to compile with a C compiler ^_^

 Having said that, I take the bait, I like challenges and putting my
words behind my code :)

Excellent, because that's how stuff gets done around here.  Saying
that you want something and hoping other people will do it is fine,
but being willing to put some effort into it is a lot more likely to get
it done.  Not to be harsh, but showing up 3 days after feature freeze
to complain that a feature pending commit for 14 months is missing
something that you really need isn't really the right way to make
something happen.  I'm pretty sure that the lack of channel binding
was discussed quite a bit earlier than now, so I think there was
adequate opportunity to protest, contribute a patch, etc.  It's not
that I don't have sympathy for the way you feel about this: seeing
features you care about fall out of a release sucks, and I've
experienced a lot of that suckage quite recently, so the pain is fresh
in my mind.  But it's something we all have to live with for the
overall good of the product.  We used to not have a firm feature
freeze, and that was way worse.


Hi Robert. Not harsh, no worries, but please note a couple of 
things, let me give you a bit of context about my recent comments:


- I haven't complained about not having channel binding support. I was 
just giving my own recommendations to the PostgreSQL community, in the 
belief that contributing this opinion could help -hackers make a more 
informed decision about whether or not to include a given feature.


- I haven't said either that I need it. I don't use SCRAM, much less 
with channel binding. Rather than looking after myself here, I'm trying 
to put myself in the shoes of potential users and speak up for them. I 
might be wrong, of course, but this is the only role I'm trying to play 
here.


- I know how PostgreSQL release cycles work and am not meaning to bend 
them in any way. I just thought it worth raising that something I 
believe might be requested by enterprise users was, on the other hand, 
a minor addition to the feature, and thus had a really high 
value-to-effort ratio. Indeed, I bet adding channel binding is an order 
of magnitude easier than adding saslprep (which is optional too, btw). 
Ultimately I'm happy with SCRAM in PG 10, whether it has channel 
binding or not, as it is a great addition and makes PG 10 an even 
better piece of software.


- Even though I don't personally need SCRAM, and without having any 
prior knowledge of it, I volunteered some time ago to study SCRAM, give 
a lightning talk about it, and later write a client implementation for 
the JDBC driver. I have already devoted a very fair amount of time to 
doing so, and will keep doing so until all the code is done. The WIP 
code is here, FYI: https://github.com/ahachete/scram. So it's not that 
I haven't already put my code behind my words.


- Given all that, I still want to volunteer not only to do the client 
JDBC part and consequently help debug the server implementation (as I 
believe it is so far the only non-libpq implementation), but also to 
try to stand by my words by writing the channel binding code for the 
PostgreSQL server. This is a huge effort for me; I last did C 
programming back in 2001. But still, I want to help, within my limited 
capabilities, as much as I can.


- Having thoroughly studied the RFC and companion documentation, I 
thought I was in a position to offer some recommendations that other 
hackers might not know about (unless a deep study of SCRAM had been 
done). That's it: recommendations, ideas.





In this case, I think it is abundantly clear that SCRAM without
channel binding is still a good feature.


I agree. I may have exaggerated before, downplaying SCRAM without 
channel binding. I think it is great. Period. I also still think 
channel binding is a very small code addition that provides even better 
value. I apologize for not picking my previous words more carefully.



  One piece of evidence for
that is that the standard uses the suffix -PLUS to denote the
availability of channel binding.  That clearly conveys that channel
binding has value, but also that not having it does not make the whole
thing useless.  Otherwise, they would have required it to be part of
every implementation, or they would have made you add -CRAPPY if you
didn't have it.  The discussion elsewhere on this thread has
adequately underlined the value of what we've already got, so I won't
further belabor the point here.

Furthermore, I think that the state of this feature as it currently
exists in the tree is actually kind of concerning.  There are
currently four open items pertaining to SCRAM, at least two of which
look to m

Re: [HACKERS] Some thoughts about SCRAM implementation

2017-04-12 Thread Álvaro Hernández Tortosa



On 12/04/17 18:09, Tom Lane wrote:

Heikki Linnakangas <hlinn...@iki.fi> writes:

On 04/12/2017 06:26 PM, Bruce Momjian wrote:

How does it do that?

Good question, crypto magic? I don't know the details, but the basic
idea is that you extract a blob of data that uniquely identifies the TLS
connection. Using some OpenSSL functions, in this case. I think it's a
hash of some of the TLS handshake messages that were used when the TLS
connection was established (that's what "tls-unique" means). That data
is then incorporated in the hash calculations of the SCRAM
authentication. If the client and the server are not speaking over the
same TLS connection, they will use different values for the TLS data,
and the SCRAM computations will not match, and you get an authentication
failure.


I believe the above is not correct. The channel binding data is *not* 
used for any hash computations. It is simply a byte array that is 
received as an extra parameter from the user, and the server then 
obtains it its own way (from its own TLS data) and does a byte 
comparison. That's what the RFCs say about it.

... which the user can't tell apart from having fat-fingered the password,
I suppose?  Doesn't sound terribly friendly.  A report of a certificate
mismatch is far more likely to lead people to realize there's a MITM.


So given what I said before, that won't happen. Indeed, the SCRAM 
RFC contains a specific error code for this: "channel-bindings-dont-match".
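A minimal sketch of that server-side check (names are illustrative, not from any patch): the server compares the channel binding data it derived from its own TLS layer against what the client sent, and on mismatch can report the RFC 5802 server-error value in an e= attribute rather than a generic authentication failure:

```python
import hmac

def check_channel_binding(expected, received):
    """Compare the server-derived channel binding data against the bytes
    the client sent in the exchange. On mismatch, RFC 5802 defines the
    server-error value "channel-bindings-dont-match" for the e= attribute."""
    if not hmac.compare_digest(expected, received):  # constant-time compare
        return "e=channel-bindings-dont-match"
    return None  # binding data matches; authentication may proceed

print(check_channel_binding(b"\xaa\xbb", b"\xaa\xbb"))  # None
print(check_channel_binding(b"\xaa\xbb", b"\xaa\xcc"))
```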



Álvaro


--

Álvaro Hernández Tortosa


---
<8K>data





Re: [HACKERS] Some thoughts about SCRAM implementation

2017-04-11 Thread Álvaro Hernández Tortosa



On 11/04/17 15:18, Heikki Linnakangas wrote:

On 04/11/2017 04:09 PM, Álvaro Hernández Tortosa wrote:

 But I will conserve my remaining courage (thanks Michael!) credits
for future threads ;) I have stated my opinion clearly, I will now go
back to the client library.


Once you're done with the client library, feel free to post a patch to 
implement channel binding. If it really is trivial, we can discuss it. 
Most likely, it won't go into v10, but someone's got to write the 
patch for v11 anyway, and until there's a patch on the table to 
discuss, there's no point in arguing. Yes, I'm trying to cajole you 
into writing the patch for v11, so that I don't have to :-).


LOL. Do you really want a half-baked Java programmer writing a 
patch in C for PostgreSQL? I once tried that and Magnus said my code was 
Java code that happened to compile with a C compiler ^_^


Having said that, I take the bait, I like challenges and putting my 
words behind my code :)



Álvaro

--

Álvaro Hernández Tortosa


---
<8K>data





Re: [HACKERS] Some thoughts about SCRAM implementation

2017-04-11 Thread Álvaro Hernández Tortosa



On 11/04/17 15:03, Magnus Hagander wrote:
On Tue, Apr 11, 2017 at 2:53 PM, Álvaro Hernández Tortosa 
<a...@8kdata.com <mailto:a...@8kdata.com>> wrote:




On 10/04/17 20:32, Andres Freund wrote:

On 2017-04-10 20:28:27 +0200, Álvaro Hernández Tortosa wrote:


On 10/04/17 13:02, Heikki Linnakangas wrote:

On 04/10/2017 12:39 PM, Álvaro Hernández Tortosa wrote:

- I think channel binding support should be added. SCRAM brings
security improvements over md5 and other simpler digest
algorithms. But where it really shines is together with channel
binding. This is the only method to prevent MITM attacks.
Implementing it should not be very difficult. There are several
possible channel binding mechanisms, but the mandatory and
probably most typical one is 'tls-unique', which basically means
getting the byte array from the TLS Finished message and comparing
it with the same data sent by the client. That's more or less all
it takes to implement it. So I would go for it.

We missed the boat for PostgreSQL 10. You're right that it
probably wouldn't be difficult to implement, but until there's
a concrete patch to discuss, that's a moot point.

Really? That's a real shame. I know we're very late in the
CF cycle but, again, this would be a real shame.

That can equally be said about a lot of features. If we don't
stop at some point... Also, we're not late in the CF cycle;
the CF cycle for v10 is over. It's not like the non-existence
of channel binding removes previously existing features, or
makes SCRAM useless.

Greetings,

Andres Freund


I know this is a lost battle. But please bear with me for a
minute.

Let's put ourselves in the shoes of potential users. Why would
anyone want to use SCRAM? What for? The hashing mechanism is
better, no question. And it brings some added benefits, true. So
it's "better". But the real gain comes from using channel binding,
which avoids impersonation and MITM attacks. This is the deal
breaker. SCRAM without channel binding is like Coke Zero without
caffeine and mixed with water. Don't get me wrong, the work behind
it is great.



I think you are seriously undervaluing the SCRAM without channel binding.


I'm not. If I didn't appreciate its value, I wouldn't be writing 
a client library for the JDBC driver.



It improves a lot of things over our current md5 method beyond just 
using a stronger hashing algorithm.


Sure, channel binding is great. But that's not a dealbreaker, or even 
close to it.


I think otherwise. It is close to a dealbreaker. And it requires 
so few extra lines of code...




But just a bit more is needed to make it really a big
announcement and provide real value to (I guess, mostly but very
interesting) enterprise customers, for which MITM and
impersonation are big things. The good news is that adding channel
binding is like an inverse Pareto: 20% of extra effort (I bet
significantly less) leads to an 80% improvement.


I would expect most enterprise customers who care about MITM 
protection are already using either TLS or IPsec to cover that. 
They would benefit from the other parts of SCRAM, but they've 
already solved those problems.


Enterprises use checklists. And they discard solutions that don't 
have "checks" on all the items. One of those items is, in my opinion, SCRAM 
with channel binding. I don't want this to happen to PG, especially when 
it's so easy to implement.



But I will conserve my remaining "courage credits" (thanks Michael!) 
for future threads ;) I have stated my opinion clearly, and I will now go 
back to the client library.



    Thanks,

    Álvaro

--

Álvaro Hernández Tortosa


---
<8K>data



Re: [HACKERS] Some thoughts about SCRAM implementation

2017-04-11 Thread Álvaro Hernández Tortosa



On 10/04/17 20:32, Andres Freund wrote:

On 2017-04-10 20:28:27 +0200, Álvaro Hernández Tortosa wrote:


On 10/04/17 13:02, Heikki Linnakangas wrote:

On 04/10/2017 12:39 PM, Álvaro Hernández Tortosa wrote:

- I think channel binding support should be added. SCRAM brings security
improvements over md5 and other simpler digest algorithms. But where it
really shines is together with channel binding. This is the only method
to prevent MITM attacks. Implementing it should not be very difficult.
There are several possible channel binding mechanisms, but the mandatory
and probably most typical one is 'tls-unique', which basically means
getting the byte array from the TLS Finished message and comparing it
with the same data sent by the client. That's more or less all it takes
to implement it. So I would go for it.

We missed the boat for PostgreSQL 10. You're right that it probably
wouldn't be difficult to implement, but until there's a concrete patch
to discuss, that's a moot point.

 Really? That's a real shame. I know we're very late in the CF cycle
but, again, this would be a real shame.

That can equally be said about a lot of features.  If we don't
stop at some point... Also, we're not late in the CF cycle, the CF cycle
for v10 is over.  It's not like the non-existence of channel binding
removes previously existing features, or makes SCRAM useless.


Greetings,

Andres Freund


I know this is a lost battle. But please bear with me for a minute.

Let's put ourselves in the shoes of potential users. Why would 
anyone want to use SCRAM? What for? The hashing mechanism is better, no 
question. And it brings some added benefits, true. So it's "better". But the 
real gain comes from using channel binding, which avoids impersonation and 
MITM attacks. This is the deal breaker. SCRAM without channel binding is 
like Coke Zero without caffeine and mixed with water. Don't get me 
wrong, the work behind it is great.


But just a bit more is needed to make it really a big announcement 
and provide real value to (I guess, mostly but very interesting) 
enterprise customers, for which MITM and impersonation are big things. 
The good news is that adding channel binding is like an inverse Pareto: 
20% extra effort (I bet significantly less) leads to an 80% improvement.


So CF v10 is over, and we're in the testing phase. Can't we consider 
this a "missing feature bug"? ^_^



    Álvaro

--

Álvaro Hernández Tortosa


---
<8K>data



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Letting the client choose the protocol to use during a SASL exchange

2017-04-11 Thread Álvaro Hernández Tortosa



On 11/04/17 13:21, Heikki Linnakangas wrote:

On 04/11/2017 01:39 PM, Álvaro Hernández Tortosa wrote:

The fact that you null terminate them (fine with me) doesn't change
my reasoning. How do you separate multiple channel binding methods? And
do you realize that you will be repeating the channel binding methods
without reason? A contrived but legal, possible example:

Field1:
SCRAM-SHA-256\0
SCRAM-SHA-512\0
SCRAM-SHA-999\0
SCRAM-SHA-256-PLUS tls-unique tls-awesome yet-another-tls\0
SCRAM-SHA-512-PLUS tls-unique tls-awesome yet-another-tls\0
SCRAM-SHA-999-PLUS tls-unique tls-awesome yet-another-tls\0


I was actually thinking of:

SCRAM-SHA-256\0
SCRAM-SHA-512\0
SCRAM-SHA-999\0
SCRAM-SHA-256-PLUS\0
SCRAM-SHA-256-PLUS tls-awesome\0
SCRAM-SHA-256-PLUS yet-another-tls\0
SCRAM-SHA-512-PLUS\0
SCRAM-SHA-512-PLUS tls-awesome\0
SCRAM-SHA-512-PLUS yet-another-tls\0
SCRAM-SHA-999-PLUS\0
SCRAM-SHA-999-PLUS tls-awesome\0
SCRAM-SHA-999-PLUS yet-another-tls\0

In practice, I don't foresee us having this many options, so the 
verbosity won't be an issue. Parsing this is very straightforward.


That's maybe slightly better, since -I agree- verbosity is not an 
issue. But parsing (parsers, and validators) are still more complex (you 
need to check that if suffix is -PLUS you need to split by space and 
find another field with another format based on another lookup table of 
IANA registry names and so forth). Vs: this field is for SCRAM names, 
this field is for channel binding names. Done.


Let me exemplify. In Java-ish syntax, your type would be something 
like:


List<Pair<ScramMechanism,ChannelBindingType>>, from which you need to 
individually extract the ScramMechanisms and the unique ChannelBindingTypes.


My proposal would have two lists:

List<ScramMechanism>
List<ChannelBindingType>

which is exactly what you need.

So I still see your proposal as more awkward and less clear, mixing 
things that are separate. But again, your choice  :)





vs

Field 1:
SCRAM-SHA-256,SCRAM-SHA-256-PLUS,SCRAM-SHA-1-PLUS,SCRAM-SHA-512-PLUS
(simple CSV)
Field 2: tls-unique (String)


What if tls-unique is only supported with SCRAM-SHA-256-PLUS, while
SCRAM-SHA-512-PLUS requires tls-awesome?


 It can't happen. The RFC clearly states that they are orthogonal.
It is left to the implementations to support one or the other, but there
is no reason to limit the applicability of a given binding method to a
given SCRAM mechanism (or vice versa).


Well, if tls-unique is found to be insecure, a future 
SCRAM-SHA-512-PLUS spec might well define that the default for that 
mechanism is tls-unique-new-and-secure rather than tls-unique. Maybe 
even forbid using tls-unique altogether. I don't think that's likely, 
but this is all about future-proofing, so I'd rather keep it flexible.


If it were insecure, I'd immediately stop it from being 
advertised, and problem solved. Nothing would change (under my proposal).


--

Álvaro Hernández Tortosa


---
<8K>data





Re: [HACKERS] Letting the client choose the protocol to use during a SASL exchange

2017-04-11 Thread Álvaro Hernández Tortosa



On 11/04/17 12:23, Heikki Linnakangas wrote:

On 04/11/2017 11:55 AM, Álvaro Hernández Tortosa wrote:

On 11/04/17 08:50, Heikki Linnakangas wrote:

Oh, I see. According to the SCRAM RFC, "tls-unique" is used by
default. I don't see us implementing anything else any time soon.
PostgreSQL doesn't support any other "channel type" than TLS, and
tls-unique is the only one that makes sense for TLS.

If we do need it after all, the server could advertise the additional
channel binding types as additional SASL mechanisms in the
AuthenticationSASL message, maybe something like:

"SCRAM-SHA-256"
"SCRAM-SHA-256-PLUS" (for tls-unique)
"SCRAM-SHA-256-PLUS ssh-unique" (for hypothetical ssh channel binding)

The same trick can be used to advertise any other SASL mechanism
specific options, if needed in the future.


 Why not add an extra field to the message? This scheme has in my
opinion some disadvantages:

- You assume no extensibility. Maybe Postgres will implement other
mechanisms for channel binding. Maybe not. But why limit ourselves?
- Apart from tls-unique there are others today, like
tls-server-end-point and who knows if in the future TLS 1.x comes with
something like 'tls-unique-1.x'
- Why have to parse the string (separated by spaces) when you could use
different fields and have no parsing at all?


I don't think an option separated by space is any more difficult to 
parse than a separate field. I'm envisioning that the "parsing" would 
be simply:


if (strcmp(sasl_mechanism, "SCRAM-SHA-256") == 0) { ... }
else if (strcmp(sasl_mechanism, "SCRAM-SHA-256-PLUS") == 0) { ... }
else if (strcmp(sasl_mechanism, "SCRAM-SHA-256-PLUS tls-awesome") == 0) { ... }


This can be extended for more complicated options, if necessary. 
Although if we find that we need a dozen different options, I think 
we've done something wrong.



- How do you advertise different SCRAM mechanisms with different channel
binding types? And a mix of SCRAM mechanisms with and without channel
binding? If I got it right, with your proposal it would be something 
like:


Field 1: SCRAM-SHA-256,SCRAM-SHA-256-PLUS tls-unique,SCRAM-SHA-1-PLUS
tls-unique,SCRAM-SHA-512-PLUS tls-unique

(which is basically a CSV of pairs where the right part of the pair
might be empty; too much IMHO for a single field)


Yes, except that in my proposal, the list is not a comma-separated 
string, but a list of null-terminated strings, similar to how some 
other lists of options in the FE/BE protocol are transmitted.


The fact that you null terminate them (fine with me) doesn't change 
my reasoning. How do you separate multiple channel binding methods? And 
do you realize that you will be repeating the channel binding methods 
without reason? A contrived but legal, possible example:


Field1:
SCRAM-SHA-256\0
SCRAM-SHA-512\0
SCRAM-SHA-999\0
SCRAM-SHA-256-PLUS tls-unique tls-awesome yet-another-tls\0
SCRAM-SHA-512-PLUS tls-unique tls-awesome yet-another-tls\0
SCRAM-SHA-999-PLUS tls-unique tls-awesome yet-another-tls\0

vs

Field 1:
SCRAM-SHA-256 SCRAM-SHA-512 SCRAM-SHA-999 SCRAM-SHA-256-PLUS 
SCRAM-SHA-512-PLUS SCRAM-SHA-999-PLUS\0


Field 2:
tls-unique tls-awesome yet-another-tls\0


Parsing and validation of the first option is significantly more 
complex than the second option.





vs

Field 1:
SCRAM-SHA-256,SCRAM-SHA-256-PLUS,SCRAM-SHA-1-PLUS,SCRAM-SHA-512-PLUS
(simple CSV)
Field 2: tls-unique (String)


What if tls-unique is only supported with SCRAM-SHA-256-PLUS, while 
SCRAM-SHA-512-PLUS requires tls-awesome? 


It can't happen. The RFC clearly states that they are orthogonal. 
It is left to the implementations to support one or the other, but there 
is no reason to limit the applicability of a given binding method to a 
given SCRAM mechanism (or vice versa).



What about possible other options you might need to tack on to 
mechanisms? This seems less flexible. I'm not saying that either of 
those cases is very likely, but I don't think it's very likely we'll 
need extra options or different channel binding types any time soon, 
anyway.


As I said, channel binding and SCRAM mechanisms are orthogonal.

On the flip side, a contrived parsing mechanism with optional 
fields, many of which would be repeated all the time, seems far from 
ideal. And far less clear.






Is there any reason not to use two fields? AuthenticationSASL is a
new message, I guess we can freely choose its format, right?


Yes, we can choose the format freely.


Cool.



In summary, I think the simple list of mechanism names is best, because:

* It's simple, and doesn't have any extra fields or options that are 
not needed right now.


I think it is too complicated and mixes things that are 
orthogonal. Since the protocol is hard to extend, adding an extra field for 
the channel binding mechanisms -even if unused as of PG 10- is a good 
thing. If you need to change the protocol later, th

Re: [HACKERS] Letting the client choose the protocol to use during a SASL exchange

2017-04-11 Thread Álvaro Hernández Tortosa



On 11/04/17 08:50, Heikki Linnakangas wrote:

On 04/10/2017 11:03 PM, Álvaro Hernández Tortosa wrote:

 Channel binding needs to specify actually three things:
- The mechanism, which is indeed suffixed "-PLUS".
- The channel binding name, which is described here:
https://tools.ietf.org/html/rfc5056. Types are also IANA-registered (see
https://www.iana.org/assignments/channel-binding-types/channel-binding-types.xhtml). 


SCRAM mandates to implement 'tls-unique', but other channel binding
types could be supported (like 'tls-server-end-point' for example).
- The channel binding data, which is channel binding mechanism
dependent, and is sent as part of the client last message.

 What I'm talking about here is the second one, the channel binding
type (name).


Oh, I see. According to the SCRAM RFC, "tls-unique" is used by 
default. I don't see us implementing anything else any time soon. 
PostgreSQL doesn't support any other "channel type" than TLS, and 
tls-unique is the only one that makes sense for TLS.


If we do need it after all, the server could advertise the additional 
channel binding types as additional SASL mechanisms in the 
AuthenticationSASL message, maybe something like:


"SCRAM-SHA-256"
"SCRAM-SHA-256-PLUS" (for tls-unique)
"SCRAM-SHA-256-PLUS ssh-unique" (for hypothetical ssh channel binding)

The same trick can be used to advertise any other SASL mechanism 
specific options, if needed in the future.


Why not add an extra field to the message? This scheme has in my 
opinion some disadvantages:


- You assume no extensibility. Maybe Postgres will implement other 
mechanisms for channel binding. Maybe not. But why limit ourselves?
- Apart from tls-unique there are others today, like 
tls-server-end-point and who knows if in the future TLS 1.x comes with 
something like 'tls-unique-1.x'
- Why have to parse the string (separated by spaces) when you could use 
different fields and have no parsing at all?
- How do you advertise different SCRAM mechanisms with different channel 
binding types? And a mix of SCRAM mechanisms with and without channel 
binding? If I got it right, with your proposal it would be something like:


Field 1: SCRAM-SHA-256,SCRAM-SHA-256-PLUS tls-unique,SCRAM-SHA-1-PLUS 
tls-unique,SCRAM-SHA-512-PLUS tls-unique


(which is basically a CSV of pairs where the right part of the pair 
might be empty; too much IMHO for a single field)


vs

Field 1: 
SCRAM-SHA-256,SCRAM-SHA-256-PLUS,SCRAM-SHA-1-PLUS,SCRAM-SHA-512-PLUS 
(simple CSV)

Field 2: tls-unique (String)

Is there any reason not to use two fields? AuthenticationSASL is a 
new message, I guess we can freely choose its format, right?







I'm not sure I follow. The username is sent from client to server, and
currently, the server will ignore it. If you're writing a client
library, it can send whatever it wants. (Although again I would
recommend an empty string, to avoid confusion. Sending the same
username as in the startup packet, as long as it's in UTF-8, seems
reasonable too.)


 OK, understood. I will then not let the SCRAM implementation I'm
writing allow an empty string as the user name, but have the pgjdbc
glue code send "ignore" as the user name, or something like that ;P


Hmm, so the SASL library you're using doesn't like sending an empty 
string as the username? Now that I look at RFC5802 more closely, it says:



If the preparation of the username fails or results in an empty
string, the server SHOULD abort the authentication exchange.


Perhaps we should reserve a magic user name to mean "same as startup 
message", in addition or instead of the empty string. We actually 
discussed that already at [1], but we forgot about it since.


That works. Please let me know what is the "magic constant" chosen ;P


Thanks,

Álvaro


--

Álvaro Hernández Tortosa


---
<8K>data





Re: [HACKERS] Letting the client choose the protocol to use during a SASL exchange

2017-04-10 Thread Álvaro Hernández Tortosa



On 10/04/17 21:41, Heikki Linnakangas wrote:

On 04/10/2017 09:33 PM, Álvaro Hernández Tortosa wrote:

 Thanks for posting the patched HTML. In my opinion, all looks good
except that:

- I will add an extra String (a CSV) to the AuthenticationSASL message for
channel binding names, so that the message format can remain unchanged
when channel binding is implemented. It can be empty.


Note that SCRAM-SHA-256 with channel binding has a different SASL 
mechanism name, SCRAM-SHA-256-PLUS. No need for a separate flag or 
string for channel binding. When support for channel binding is added 
to the server, it will advertise two SASL mechanisms in the 
AuthenticationSASL message, SCRAM-SHA-256 and SCRAM-SHA-256-PLUS. (Or 
just SCRAM-SHA-256-PLUS, if channel binding is required).


Channel binding needs to specify actually three things:
- The mechanism, which is indeed suffixed "-PLUS".
- The channel binding name, which is described here: 
https://tools.ietf.org/html/rfc5056. Types are also IANA-registered (see 
https://www.iana.org/assignments/channel-binding-types/channel-binding-types.xhtml). 
SCRAM mandates to implement 'tls-unique', but other channel binding 
types could be supported (like 'tls-server-end-point' for example).
- The channel binding data, which is channel binding mechanism 
dependent, and is sent as part of the client last message.


What I'm talking about here is the second one, the channel binding 
type (name).





- If the username used is the one sent in the startup message, rather
than leaving it empty in the client-first-message, I would force it to
be the same as the one used in the startup message.


The problem with that is that the SCRAM spec dictates that the 
username must be encoded in UTF-8, but PostgreSQL supports non-UTF-8 
usernames.


Or did you mean that, if the username is sent, it must match the one 
in the startup packet, but an empty string would always be allowed? 
That would be reasonable.



Otherwise we may confuse
some client implementations, which would probably consider that an
error; for one, my implementation would currently throw an error if
the username is empty, and I think that's correct.


I'm not sure I follow. The username is sent from client to server, and 
currently, the server will ignore it. If you're writing a client 
library, it can send whatever it wants. (Although again I would 
recommend an empty string, to avoid confusion. Sending the same 
username as in the startup packet, as long as it's in UTF-8, seems 
reasonable too.)


OK, understood. I will then not let the SCRAM implementation I'm 
writing allow an empty string as the user name, but have the pgjdbc 
glue code send "ignore" as the user name, or something like that ;P
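For reference, this is roughly what the client-first-message would look like in a minimal sketch (hypothetical helper, not pgjdbc code); note the n= attribute that carries the -possibly empty- user name, and the =2C/=3D escaping that RFC 5802 requires:

```java
public class ClientFirstMessage {
    // Sketch: a client-first-message without channel binding ("n,," GS2
    // header), where n= may be empty if the server takes the user name
    // from the startup packet instead.
    public static String build(String username, String clientNonce) {
        // RFC 5802: '=' and ',' in a saslname must be escaped as =3D and =2C
        // ('=' first, so the escapes themselves are not re-escaped).
        String saslName = username.replace("=", "=3D").replace(",", "=2C");
        return "n,,n=" + saslName + ",r=" + clientNonce;
    }
}
```

For example, build("", nonce) yields "n,,n=,r=..." with an empty user name.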


Thanks for reviewing this! I'll start hacking on code changes to go 
with these docs.


Thanks for writing the code :)


    Álvaro


--

Álvaro Hernández Tortosa


---
<8K>data





Re: [HACKERS] Letting the client choose the protocol to use during a SASL exchange

2017-04-10 Thread Álvaro Hernández Tortosa



On 10/04/17 14:57, Heikki Linnakangas wrote:

On 04/07/2017 01:13 AM, Michael Paquier wrote:
On Fri, Apr 7, 2017 at 5:15 AM, Álvaro Hernández Tortosa 
<a...@8kdata.com> wrote:
I don't see it. The message AuthenticationSASL.String could contain a
CSV of the SCRAM protocols supported. This is especially important to
support channel binding (which is just another protocol name for this
matter), which is the really enhanced security mechanism of SCRAM.
Since this message is sent regardless, and the client replies with
PasswordMessage, no extra round trip is required. However,
PasswordMessage needs to also include a field with the name of the
selected protocol (it is the client who picks). Or a different message
would need to be created, but with no more round trips than those
required by SCRAM itself (4 messages for SCRAM + 1 extra for the
server to tell the client it needs to use SCRAM).


Yes, it seems to me that the list of protocols to send should be done
by sendAuthRequest(). Then the client parses the received string, and
sends an extra 'p' message with its choice before sending the first
SCRAM message. So there is no need for any extra round trips.


I started writing down the protocol docs, based on the above idea. See 
attached. The AuthenticationSASL message now contains a list of 
mechanisms.
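As a sketch of the client side (assuming, as with other FE/BE list-valued messages, that the mechanism names arrive as null-terminated strings with an empty string terminating the list; names here are hypothetical), parsing could look like:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class MechanismListParser {
    // Parse the body of a hypothetical AuthenticationSASL message: a
    // sequence of null-terminated mechanism names, ended by an extra
    // null byte (i.e. an empty string).
    public static List<String> parse(byte[] body) {
        List<String> result = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < body.length; i++) {
            if (body[i] == 0) {
                if (i == start) {
                    break; // empty string: end of the list
                }
                result.add(new String(body, start, i - start, StandardCharsets.UTF_8));
                start = i + 1;
            }
        }
        return result;
    }
}
```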


Does that seem clear to you? If so, I'll change the code to match the 
attached docs.


I added two new message formats to the docs, SASLResponse and 
SASLInitialResponse. Those use the same type byte as PasswordMessage, 
'p', but I decided to describe them as separate messages for 
documentation purposes, since the content is completely different 
depending on whether the message is sent as part of SASL, GSS, md5, or 
password authentication. IOW, this is not a change in the 
implementation, only in the way it's documented.



While working on this, and reading the RFCs more carefully, I noticed 
one detail we should change to be spec-compliant. The SASL spec 
specifies that a SASL authentication exchange consists of 
challenge-response pairs. There must be a response to each challenge. 
If the last message in the authentication mechanism (SCRAM in this 
case) goes in the server->client direction, then that message must be 
sent as "additional data" in the server->client message that tells the 
client that the authentication was successful. That's AuthenticationOK 
in the PostgreSQL protocol. In the current implementation, the 
server-final-message is sent as an AuthenticationSASLContinue message, 
and the client doesn't respond to that.


We should change that, so that the server-final-message is sent as 
"additional data" in the AuthenticationOK message. The attached docs 
patch describes that, rather than what the current implementation does.
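On the client side, handling that amounts to verifying the v= attribute of the server-final-message wherever it arrives, whether in AuthenticationSASLContinue or as AuthenticationOK additional data. A hedged sketch (hypothetical names):

```java
import java.util.Arrays;
import java.util.Base64;

public class ServerFinalMessage {
    // Parse "v=<base64 ServerSignature>" (RFC 5802 server-final-message)
    // and compare it against the signature the client computed itself.
    public static boolean verify(String serverFinal, byte[] expectedSignature) {
        if (!serverFinal.startsWith("v=")) {
            throw new IllegalArgumentException("expected v= attribute, got: " + serverFinal);
        }
        byte[] received = Base64.getDecoder().decode(serverFinal.substring(2));
        return Arrays.equals(received, expectedSignature);
    }
}
```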


(For your convenience, I built the HTML docs with this patch, and put 
them up at http://hlinnaka.iki.fi/temp/scram-wip-docs/protocol.html 
for viewing)


- Heikki




Thanks for posting the patched HTML. In my opinion, all looks good 
except that:


- I will add an extra String (a CSV) to the AuthenticationSASL message for 
channel binding names, so that the message format can remain unchanged 
when channel binding is implemented. It can be empty.


- If the username used is the one sent in the startup message, rather 
than leaving it empty in the client-first-message, I would force it to 
be the same as the one used in the startup message. Otherwise we may confuse 
some client implementations, which would probably consider that an 
error; for one, my implementation would currently throw an error if 
the username is empty, and I think that's correct.



    Álvaro

--

Álvaro Hernández Tortosa


---
<8K>data



Re: [HACKERS] Some thoughts about SCRAM implementation

2017-04-10 Thread Álvaro Hernández Tortosa



On 10/04/17 13:02, Heikki Linnakangas wrote:

On 04/10/2017 12:39 PM, Álvaro Hernández Tortosa wrote:

- I think channel binding support should be added. SCRAM brings security
improvements over md5 and other simpler digest algorithms. But where it
really shines is together with channel binding. This is the only method
to prevent MITM attacks. Implementing it should not be very difficult.
There are several possible channel binding mechanisms, but the mandatory
and probably most typical one is 'tls-unique', which basically means
getting the byte array from the TLS Finished message and comparing it
with the same data sent by the client. That's more or less all it takes
to implement it. So I would go for it.


We missed the boat for PostgreSQL 10. You're right that it probably 
wouldn't be difficult to implement, but until there's a concrete patch 
to discuss, that's a moot point.


Really? That's a real shame. I know we're very late in the CF 
cycle but, again, this would be a real shame.





- One SCRAM parameter, i, the iteration count, is important, and in my
opinion should be configurable. AFAIK it is currently hard-coded at
4096, which is the minimum value accepted by the RFC. But it should be
at least 15K (the RFC says), and given that it affects the computing time
of the authentication, it should be configurable. It's also now specified
per user, which I think is too much granularity (it should suffice to say
this group of users all require this iteration count).


Yeah, a GUC might be a good idea.


Cool.



The iteration count is a part of the stored SCRAM verifier, so it's 
determined when you set the user's password. That's why it's per-user.


Sure. But I think per-user is too much granularity. And if it is 
not, it should be a parameter exposed to the CREATE USER command, so 
that it can be (effectively) set per user. I maintain that the best approach 
could be the suggested pg_scram.conf, with the format (or a similar one) 
that I proposed in the OP.





- The SCRAM RFC states that the server should advertise supported
mechanisms. I see the discussion going toward not advertising them. I
think it should be done; I don't see reasons not to do it, and it will
become compliant with the RFC.


The SASL spec [RFC4422] says that mechanism negotiation is protocol 
specific. The SCRAM spec [RFC5802] prescribes how negotiation of 
channel-binding is done, and the current implementation is compliant 
with that. (Which is very straightforward when neither the client nor 
the server supports channel binding).


Yeah. But what I'm saying is that we might want to add an extra 
String to the AuthenticationSASL message for channel binding, so that the 
message format need not be changed when channel binding is added. 
But the channel binding method name needs to be negotiated, so there 
needs to be room for it.




The negotiation of the mechanism is being discussed one the "Letting 
the client choose the protocol to use during a SASL exchange" thread. 
I'm just writing a more concrete proposal based on your suggestion of 
sending a list of SASL mechanisms in the AuthenticationSASL message. 
Stay tuned!


Yepp, I will reply there, thanks :)




- I don't see why proposed scram mechanism names for pg_hba.conf are not
following the IANA Registry standard (
https://www.iana.org/assignments/sasl-mechanisms/sasl-mechanisms.xhtml#scram), 


which is uppercase (like in SCRAM-SHA-256).


All the existing authentication mechanisms are spelled in lowercase. 
And uppercase is ugly. It's a matter of taste, for sure.


And I agree with you. It's ugly. But it's standard. I'd say let's 
favor standardization vs our own taste.





- SCRAM also supports the concept of authzid, which is kind of what
pg_ident does: authenticate as the user sent, but log in as the
user specified here. It could also be supported.


Yeah, it might be handy for something like pgbouncer, which could then 
have one userid+password to authenticate, but then switch to another 
user. But the SCRAM part of that seems to be just a small part of the 
overall feature. In any case, this is clearly Postgres 11 material.


Again, it should be a tiny change, just an extra attribute sent 
over the message. But I guess time was up... just saying... ;)






* The nonce length is not specified by the RFC. I see typical
implementations use 24 chars for the client and 18 for the server.
Current code uses 10. I think it would not hurt to make it at least 16
or 18.


Wouldn't hurt, I guess. IIRC I checked some other implementations, 
when I picked 10, but I don't remember which ones anymore. Got a 
reference for 24/18?


First reference is the RFC example itself (non-mandatory, of 
course). But then I saw many followed it. As a quick example, GNU SASL 
defines:


#define SNONCE_ENTROPY_BYTES 18
https://www.gnu.org/software/gsasl/coverage/lib/scram/server.c.gcov.frameset.html




* It seems to me that the errors defined by the

[HACKERS] Some thoughts about SCRAM implementation

2017-04-10 Thread Álvaro Hernández Tortosa


Hi!

There's some ongoing discussion about SCRAM (like this thread 
https://www.postgresql.org/message-id/243d8c11-6149-a4bb-0909-136992f74b23%40iki.fi) 
but I wanted to open a new thread that covers these topics and other, 
more general ones. Here are some thoughts based on my perspective on 
SCRAM, which I gained thanks to studying it as part of the effort to 
implement SCRAM in the JDBC driver. FYI, some very early code can be 
found here: https://github.com/ahachete/scram. Although there's still a 
lot to do, I'm making great progress and I expect to have a working 
version within 2 weeks.


So here's what I think:

- Work so far is great. Thanks Heikki, Michael and others for the effort 
involved. I love to have SCRAM support in Postgres!


- I think channel binding support should be added. SCRAM brings security 
improvements over md5 and other simpler digest algorithms. But where it 
really shines is together with channel binding. This is the only method 
to prevent MITM attacks. Implementing it should not be very difficult. 
There are several possible channel binding mechanisms, but the mandatory 
and probably most typical one is 'tls-unique', which basically means 
getting the byte array from the TLS Finished message and comparing it 
with the same data sent by the client. That's more or less all it takes 
to implement it. So I would go for it.
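For illustration, the client-side part is mostly assembling the c= attribute of the client-final-message: the base64 of the GS2 header (e.g. "p=tls-unique,,") concatenated with the raw TLS Finished bytes. A sketch in Java (hypothetical names; obtaining the Finished bytes from the TLS stack is left out):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class ChannelBinding {
    // RFC 5802: c=base64(cbind-input), where cbind-input is the GS2 header
    // followed by the channel binding data; for tls-unique that data is the
    // raw bytes of the first TLS Finished message on the connection.
    public static String cbindInput(String gs2Header, byte[] tlsFinished) {
        byte[] header = gs2Header.getBytes(StandardCharsets.UTF_8);
        byte[] all = new byte[header.length + tlsFinished.length];
        System.arraycopy(header, 0, all, 0, header.length);
        System.arraycopy(tlsFinished, 0, all, header.length, tlsFinished.length);
        return "c=" + Base64.getEncoder().encodeToString(all);
    }
}
```

The server then rebuilds the same value from its own view of the TLS Finished message and compares.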


- One SCRAM parameter, i, the iteration count, is important, and in my 
opinion should be configurable. AFAIK it is currently hard-coded at 
4096, which is the minimum value accepted by the RFC. But it should be 
at least 15K (the RFC says), and given that it affects the computing time 
of the authentication, it should be configurable. It's also now specified 
per user, which I think is too much granularity (it should suffice to say 
this group of users all require this iteration count).
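Note that Hi() from RFC 5802 is just PBKDF2, so the cost of a higher i grows linearly and is easy to measure. For SCRAM-SHA-256 the JDK can compute the SaltedPassword out of the box; a sketch (not the server's implementation):

```java
import java.security.GeneralSecurityException;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

public class ScramHi {
    // SCRAM's Hi(str, salt, i) is PBKDF2 (RFC 5802); for SCRAM-SHA-256
    // (RFC 7677) the PRF is HMAC-SHA-256, so the JDK's built-in
    // PBKDF2WithHmacSHA256 computes the 32-byte SaltedPassword directly.
    public static byte[] hi(char[] password, byte[] salt, int iterations) {
        try {
            PBEKeySpec spec = new PBEKeySpec(password, salt, iterations, 256);
            return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                    .generateSecret(spec).getEncoded();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Timing hi() with i=4096 vs i=16384 shows directly what a configurable iteration count would cost at authentication time.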


- The SCRAM RFC states that the server should advertise supported 
mechanisms. I see the discussion going toward not advertising them. I 
think it should be done; I don't see reasons not to do it, and it will 
become compliant with the RFC.


- I don't see why proposed scram mechanism names for pg_hba.conf are not 
following the IANA Registry standard ( 
https://www.iana.org/assignments/sasl-mechanisms/sasl-mechanisms.xhtml#scram), 
which is uppercase (like in SCRAM-SHA-256).


- SCRAM also supports the concept of authzid, which is kind of what 
pg_ident does: authenticate as the user sent, but log in as the 
user specified here. It could also be supported.



Based on these comments, I'd like to propose the following changes:

* Implement channel binding.

* Add a GUC for the channel binding technique to use. It could be named 
"scram_channel_binding_name" or "sasl_channel_binding_name", for 
example. The only supported value as of today would probably be 'tls-unique'.


* AuthenticationSASL backend message:
- Make the String field a CSV listing one or more supported 
IANA SCRAM authentication mechanisms. This info comes from 
pg_scram.conf (see below).
    - Add another String to specify the variety of channel binding 
supported (if any), like 'tls-unique'. This info comes from the GUC.


* Treat the configuration of SCRAM in pg_hba.conf, which seems 
complicated, similarly to how pg_ident.conf is treated. Basically, in 
pg_hba.conf specify a "scram=scram_entry_name" value and then look in a 
pg_scram.conf file for an entry with that name. The pg_scram.conf file 
might have the following structure:


name        | scram_mechanisms                 | i
scramlight  | SCRAM-SHA-1                      | 4096
scramstrong | SCRAM-SHA-256,SCRAM-SHA-256-PLUS | 16384


* The nonce length is not specified by the RFC. I see typical 
implementations use 24 chars for the client and 18 for the server. 
Current code uses 10. I think it would not hurt to make it at least 16 
or 18.
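For illustration, generating a nonce with, say, 18 bytes of entropy is a one-liner; base64 output satisfies RFC 5802's requirement of printable characters excluding ',' (a sketch, not the current server code):

```java
import java.security.SecureRandom;
import java.util.Base64;

public class NonceGenerator {
    // RFC 5802 only requires the nonce to be printable ASCII excluding
    // ','; base64-encoding random bytes satisfies that, and 18 bytes of
    // entropy encode to exactly 24 characters.
    public static String nonce(int entropyBytes) {
        byte[] raw = new byte[entropyBytes];
        new SecureRandom().nextBytes(raw);
        return Base64.getEncoder().encodeToString(raw);
    }
}
```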


* It seems to me that the errors defined by the RFC, sent in the 
server-final-message when there were errors, are never sent by the 
current implementation. I think they should be sent as per the standard, 
and the exchange should also proceed until the last stage even if errors 
were detected earlier (to conform with the RFC).



    Thoughts?

    Álvaro


--

Álvaro Hernández Tortosa


---
<8K>data



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Some thoughts about SCRAM implementation

2017-04-10 Thread Álvaro Hernández Tortosa


Hi!

There's some ongoing discussion about SCRAM (like this thread 
https://www.postgresql.org/message-id/243d8c11-6149-a4bb-0909-136992f74b23%40iki.fi) 
but I wanted to open a new thread that covers these topics and other, 
more general ones. Here are some thoughts based on my perspective on 
SCRAM, which I gained thanks to studying it as part of the effort to 
implement SCRAM in the JDBC driver. FYI, some very early code can be 
found here: https://github.com/ahachete/scram. Although there's still a 
lot to do, I'm making good progress and I expect to have a working 
version within 2 weeks.


So here's what I think:

- Work so far is great. Thanks Heikki, Michael and others for the effort 
involved. I'm thrilled to have SCRAM support in Postgres!


- I think channel binding support should be added. SCRAM brings security 
improvements over md5 and other simpler digest algorithms. But where it 
really shines is together with channel binding: this is the only method 
to prevent MITM attacks. Implementing it should not be very difficult. 
There are several possible channel binding mechanisms, but the mandatory 
and probably most typical one is 'tls-unique', which basically means 
getting the byte array from the TLS Finished message and comparing it 
with the same data sent by the client. That's more or less all it takes 
to implement it. So I would go for it.
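To make the comparison concrete, here is a minimal sketch (in Python, purely illustrative) of how a client would package the 'tls-unique' data into the c= attribute of the SCRAM client-final-message, per RFC 5802's cbind-input rule. The `tls_finished` bytes are a hypothetical stand-in; a real client would pull them from its TLS library:

```python
import base64

def make_cbind_input(cb_name: str, cb_data: bytes) -> bytes:
    # RFC 5802: cbind-input = gs2-header [ cb-data ];
    # the gs2-header of a client that uses channel binding is "p=<name>,,".
    gs2_header = "p={},,".format(cb_name).encode("ascii")
    return gs2_header + cb_data

# Hypothetical stand-in for the bytes of the TLS Finished message,
# which is what 'tls-unique' binds to.
tls_finished = bytes(range(12))

# The client base64-encodes this and sends it as the c= attribute of the
# client-final-message; the server recomputes the value from its own view
# of the TLS channel and compares. A MITM terminates TLS itself, so its
# Finished bytes differ and the comparison fails.
c_attr = base64.b64encode(make_cbind_input("tls-unique", tls_finished)).decode("ascii")
```

The server-side check is just the same computation plus an equality test, which is why the implementation effort is mostly in extracting the Finished message from the TLS stack.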


- One SCRAM parameter, i, the iteration count, is important and in my 
opinion should be configurable. AFAIK it is currently hard-coded at 
4096, which is the minimum value accepted by the RFC. But the RFC says 
it should be at least 15K, and given that it affects the computing time 
of the authentication, it should be configurable. It is also now 
specified per user, which I think is too much granularity (it should 
suffice to say that this group of users all require this iteration count).
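The reason i matters is that it drives the PBKDF2 key stretching in SaltedPassword, so it is a direct security/latency trade-off. A small sketch of the RFC 5802 Hi() step (the salt and password here are made-up values, not test vectors):

```python
import hashlib

def salted_password(password: str, salt: bytes, iterations: int) -> bytes:
    # RFC 5802: SaltedPassword := Hi(Normalize(password), salt, i), where
    # Hi() is PBKDF2 with HMAC as the PRF (HMAC-SHA-256 for SCRAM-SHA-256).
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)

salt = b"0123456789abcdef"          # hypothetical per-user salt
p_4096 = salted_password("secret", salt, 4096)
p_16384 = salted_password("secret", salt, 16384)
```

Quadrupling i roughly quadruples the work an attacker must spend per password guess, at the cost of the same factor in authentication latency, which is exactly why a one-size-fits-all hard-coded value is awkward.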


- The SCRAM RFC states that the server should advertise the supported 
mechanisms. I see the discussion leaning towards not advertising them. I 
think it should be done: I don't see any reason not to, and it would 
make us compliant with the RFC.


- I don't see why the proposed scram mechanism names for pg_hba.conf do 
not follow the IANA registry standard ( 
https://www.iana.org/assignments/sasl-mechanisms/sasl-mechanisms.xhtml#scram), 
which is uppercase (as in SCRAM-SHA-256).


- SCRAM also supports the concept of authzid, which is kind of what 
pg_ident does: authenticate as the user sent, but log in as the user 
specified there. It could also be supported.



Based on these comments, I'd like to propose the following changes:

* Implement channel binding.

* Add a GUC for the channel binding technique to use. It could be named 
"scram_channel_binding_name" or "sasl_channel_binding_name", for 
example. The only supported value as of today would probably be 'tls-unique'.


* AuthenticationSASL backend message:
    - Make the String field a CSV listing one or more supported IANA 
SCRAM authentication mechanisms. This info comes from pg_scram.conf (see 
below).
    - Add another String to specify the variety of channel binding 
supported (if any), like 'tls-unique'. This info comes from the GUC.
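Since per the SASL model the server advertises and the client picks, the client side of this proposal reduces to a small selection step. A sketch under the assumption that the String field is a plain CSV (the mechanism names are the IANA-registered ones):

```python
def pick_mechanism(advertised_csv: str, client_preference: list) -> str:
    # The server advertises a CSV such as "SCRAM-SHA-256,SCRAM-SHA-256-PLUS";
    # it is the client that picks, here by walking its own preference list
    # (most preferred first).
    advertised = {m.strip() for m in advertised_csv.split(",") if m.strip()}
    for mech in client_preference:
        if mech in advertised:
            return mech
    raise ValueError("no mutually supported SASL mechanism")

choice = pick_mechanism("SCRAM-SHA-256,SCRAM-SHA-256-PLUS",
                        ["SCRAM-SHA-256-PLUS", "SCRAM-SHA-256"])
```

A client that supports channel binding would naturally prefer the -PLUS variant, which is why advertising both lets security-capable clients upgrade themselves.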


* Treat the configuration of scram in pg_hba.conf, which seems 
complicated, similarly to how pg_ident.conf is treated. Basically, in 
pg_hba.conf specify a "scram=scram_entry_name" value and then look in a 
pg_scram.conf file for an entry with that name. The pg_scram.conf might 
have the following structure:


name        | scram_mechanisms                 | i
scramlight  | SCRAM-SHA-1                      | 4096
scramstrong | SCRAM-SHA-256,SCRAM-SHA-256-PLUS | 16384
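A hypothetical parser for that layout (the file format here is just the sketch above, not anything PostgreSQL actually ships) shows how little machinery the proposal needs:

```python
def parse_pg_scram_conf(text: str) -> dict:
    # Hypothetical parser for the pg_scram.conf layout sketched above:
    # a header row "name|scram_mechanisms|i" followed by one entry per line.
    entries = {}
    lines = [l for l in text.splitlines() if l.strip()]
    for line in lines[1:]:                      # skip the header row
        name, mechs, i = (field.strip() for field in line.split("|"))
        entries[name] = {"mechanisms": mechs.split(","), "iterations": int(i)}
    return entries

conf = parse_pg_scram_conf(
    "name|scram_mechanisms|i\n"
    "scramlight|SCRAM-SHA-1|4096\n"
    "scramstrong|SCRAM-SHA-256,SCRAM-SHA-256-PLUS|16384\n"
)
```

Each pg_hba.conf line would then only need to carry the entry name, keeping the per-line syntax simple while grouping mechanism lists and iteration counts by policy.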


* The nonce length is not specified by the RFC. I see typical 
implementations use 24 chars for the client and 18 for the server. 
Current code uses 10. Making it at least 16 or 18 should not hurt.
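RFC 5802 only constrains the nonce's alphabet (printable ASCII excluding the comma), so the length is a quality-of-implementation choice. A minimal generator matching those constraints:

```python
import secrets

# RFC 5802: the nonce is a sequence of printable ASCII characters
# excluding ',' (%x21-2B / %x2D-7E); length is left to the implementation.
NONCE_CHARS = "".join(chr(c) for c in range(0x21, 0x7f) if chr(c) != ",")

def make_nonce(length: int = 18) -> str:
    return "".join(secrets.choice(NONCE_CHARS) for _ in range(length))

nonce = make_nonce()
```

Using a CSPRNG (here `secrets`) matters more than the exact length, but a longer nonce further reduces the chance of replaying a captured exchange.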


* It seems to me that the errors defined by the RFC, to be sent in the 
server-final-message when something fails, are never sent by the current 
implementation. I think they should be sent as per the standard, and the 
exchange should also proceed until the last stage even if errors were 
detected earlier (to conform with the RFC).
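For reference, the server-final-message is either v=<base64 ServerSignature> on success or e=<server-error-value> on failure. A sketch of that last step, using a subset of the error tokens RFC 5802 defines:

```python
# A subset of the server-error-value tokens defined in RFC 5802, section 7.
SCRAM_ERRORS = {
    "invalid-proof",
    "invalid-encoding",
    "unknown-user",
    "channel-bindings-dont-match",
    "channel-binding-not-supported",
    "other-error",
}

def server_final_message(verifier_b64=None, error=None) -> str:
    # server-final-message carries v=<base64 ServerSignature> on success,
    # or e=<server-error-value> on failure.
    if error is not None:
        if error not in SCRAM_ERRORS:
            error = "other-error"   # RFC catch-all for unrecognized failures
        return "e=" + error
    return "v=" + verifier_b64

msg = server_final_message(error="invalid-proof")
```

Completing the exchange and reporting e=invalid-proof, rather than aborting early, is what lets a conforming client distinguish protocol errors from authentication failures.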



    Thoughts?

    Álvaro


--

Álvaro Hernández Tortosa


---
8Kdata



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SCRAM authentication, take three

2017-04-07 Thread Álvaro Hernández Tortosa



On 07/04/17 11:05, Magnus Hagander wrote:
On Fri, Apr 7, 2017 at 9:59 AM, Heikki Linnakangas <hlinn...@iki.fi 
<mailto:hlinn...@iki.fi>> wrote:


On 04/07/2017 10:38 AM, Magnus Hagander wrote:

So here's a wild idea. What if we just call it "sha256"? Does
the user
actually care about it being scram, or is scram just an
implementation
detail for them? That way when the next one shows up, it'll be
sha512 or
whatever. It happens to use scram under the hood, but does the
user have to
or does the user want to care about that?

(One could argue the same way that the user shouldn't have to
or want to
care about the hashing algorithm -- but if that's the case
then we should
only have one entry, it would be "scram", and the system would
decide from
there. And I think this discussion already indicates we don't
think this is
enough)


I think the "SCRAM" part is more important than "SHA-256", so -1
on that.


If that is the important part, then I agree :) I am not entirely sure 
that the scram part *is* more important though.


I agree it is much more important. Needed, I'd say. "SHA-256" could 
refer to other mechanisms that just simply hash the value (maybe with a 
salt, or not) with that hash algorithm. SCRAM is a different beast, with 
much more functionality than that. So yes, it matters a lot :)




I think most users will be a lot more comfortable with "sha256" than 
"scram" though. But I guess that says using scram-sha-256 is the 
correct way.


I don't like UPPERCASE, but the RFC links to the IANA registry 
where SCRAM methods are all uppercase and with dashes: SCRAM-SHA-256 and 
SCRAM-SHA-256-PLUS. I'd use those names, they are standardized.





The main against using just "scram" is that it's misleading,
because we implement SCRAM-SHA-256, rather than SCRAM-SHA-1, which
was the first SCRAM mechanism, commonly called just SCRAM. As long
as that's the only SCRAM variant we have, that's not too bad, but
it becomes more confusing if we ever implement SCRAM-SHA-512 or
SCRAM-something-else in the future. That's the point Noah made,
and it's a fair point, but the question is whether we consider
that to be more important than having a short name for what we
have now.


Yeah, I agree we should be prepared for the future. And having "scram" 
and "scram-sha-512" would definitely fall under confusing.


The channel binding aspect is actually more important to think
about right
now, as that we will hopefully implement in the next release
or two.

In [1], Michael wrote:

There is also the channel binding to think about... So we
could have a
list of keywords perhaps associated with SASL? Imagine for
example:
sasl $algo,$channel_binding
Giving potentially:
sasl scram_sha256
sasl scram_sha256,channel
sasl scram_sha512
sasl scram_sha512,channel
In the case of the patch of this thread just the first
entry would
make sense, once channel binding support is added a second
keyword/option could be added. And there are of course
other methods
that could replace SCRAM..


It should also be possible to somehow specify "use channel
binding, if the
client supports it".


Is that really a type of authentication? We already have the idea of
authentication method options, used for most other things except
md5 which
doesn't have any. So it could be "sha256 channelbind=on", "sha256
channelbind=off" or "sha256 channelbind=negotiate" or something
like that?


> Technically, the channel-binding variant is a separate SASL 
> mechanism, i.e. it has a separate name, SCRAM-SHA-256-PLUS. I'm not 
> sure if users/admins think of it that way.



I bet they don't.


Probably. But let's not underestimate channel binding: it is the 
"greatest" feature of SCRAM, and where security really shines. I'd 
encourage the use of channel binding and would definitely make it explicit.


As for the options, there's no way to negotiate; the client picks. 
It could still be three-valued: on, off, only-channel-binding (or 
however you want to call them). That will only change which mechanisms 
the server advertises to clients.




Álvaro



--

Álvaro Hernández Tortosa


---
8Kdata



Re: [HACKERS] Letting the client choose the protocol to use during a SASL exchange

2017-04-06 Thread Álvaro Hernández Tortosa



On 06/04/17 22:05, Tom Lane wrote:

Simon Riggs <si...@2ndquadrant.com> writes:

How would we provide the list of protocols? Surely the protocol is
defined by pg_hba.conf, which makes it dependent upon username,
database and ip range. If the list were accurate, it would allow an
attacker to discover how best to attack. If the list were inaccurate
it would just be an annoyance.
At minimum, providing the list of protocols means an extra round trip
to the server.

Yeah, that's a problem.


I don't see it. The message AuthenticationSASL.String could contain 
a CSV of the SCRAM protocols supported. This is especially important to 
support channel binding (which is just another protocol name for this 
matter), which is the really enhanced security mechanism of SCRAM. Since 
this message is sent regardless, and the client replies with 
PasswordMessage, no extra round trip is required. However, 
PasswordMessage also needs to include a field with the name of the 
selected protocol (it is the client who picks). Or a different message 
would need to be created, but with no more extra round trips than those 
required by SCRAM itself (4 messages for SCRAM + 1 extra for the server 
to tell the client it needs to use SCRAM).





ISTM that if you have a valid role to connect to then you'll also know
what authentication mechanism to use so you should be able to specify
the mechanism in your connection message and save the extra trip.

I do not buy that in the least.  It has never been the case before now
that clients know in advance what the auth challenge method will be.
If we put that requirement on them for SCRAM, we're just going to be
exporting a lot of pain and end-user-visible inconsistency.

Perhaps we could turn this around: have the client send (in the connection
request packet) a list of auth protocols it thinks it is able to handle.


Per the SCRAM RFC, it is the server who advertises and the client 
who picks.



Regards,

Álvaro


--

Álvaro Hernández Tortosa


---
8Kdata





Re: [HACKERS] Re: Letting the client choose the protocol to use during a SASL exchange

2017-04-06 Thread Álvaro Hernández Tortosa



On 06/04/17 19:05, Heikki Linnakangas wrote:

On 04/06/2017 08:13 AM, Noah Misch wrote:
If any SCRAM open item is a beta blocker, it's this one.  (But 
SASLprep is

also in or near that status.)  Post-beta wire protocol changes are bad,
considering beta is normally the time for projects like pgjdbc and 
npgsql to

start adapting to such changes.

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.


I will work on this next week. I haven't given it much thought yet, 
but I think it's going to be pretty straightforward. It won't require 
much code yet, as we only support one SASL mechanism. We just need to 
ensure that we don't paint ourselves in the corner with the protocol.





I think this could easily be extended from the current message. SCRAM 
does not force any concrete way of negotiating the supported protocols. 
The current SASL message sends the only SCRAM method supported. I think 
it should be enough to make this a CSV, which for a single value is the 
same as we have today (so no code change required, only documentation). 
The other message that needs to be changed is the password one, the 
first time the client sends the SCRAM "client-first-message", which 
needs to contain the selected algorithm. As of today, that could be a 
constant (at least in the code), and the server would validate its value.


So I guess code changes would be minimal. I could be wrong, of course.

I'm working myself on Java's (pgjdbc) implementation, and I will 
hopefully have a prototype by next week to try it.



Álvaro

--

Álvaro Hernández Tortosa


---
8Kdata





Re: [HACKERS] Implementing full UTF-8 support (aka supporting 0x00)

2016-08-03 Thread Álvaro Hernández Tortosa



On 03/08/16 21:42, Geoff Winkless wrote:

On 3 August 2016 at 20:36, Álvaro Hernández Tortosa <a...@8kdata.com> wrote:

 Isn't the correct syntax something like:

select E'\uc080', U&'\c080';

?

 It is a single character, 16 bit unicode sequence (see
https://www.postgresql.org/docs/current/static/sql-syntax-lexical.html).

No, what you've done there is created the three-byte utf8 sequence \xec8280

# select U&'\c080'::bytea;
  bytea
----------
 \xec8280

It's not a UCS2 c080, it's utf8 c080.

Geoff


Yes, you're absolutely right ^_^

    Álvaro


--

Álvaro Hernández Tortosa


---
8Kdata





Re: [HACKERS] Implementing full UTF-8 support (aka supporting 0x00)

2016-08-03 Thread Álvaro Hernández Tortosa



On 03/08/16 21:31, Geoff Winkless wrote:

On 3 August 2016 at 20:13, Álvaro Hernández Tortosa <a...@8kdata.com> wrote:

Yet they are accepted by Postgres
(as if Postgres supported Modified UTF-8 intentionally). The character
in psql does not render as a nul but as this symbol: "삀".

Not accepted as valid utf8:

# select E'\xc0\x80';
ERROR:  invalid byte sequence for encoding "UTF8": 0xc0 0x80

You would need a "modified utf8" encoding, I think.

Geoff


Isn't the correct syntax something like:

select E'\uc080', U&'\c080';

?

It is a single character, 16 bit unicode sequence (see 
https://www.postgresql.org/docs/current/static/sql-syntax-lexical.html).



Álvaro

--

Álvaro Hernández Tortosa


---
8Kdata





Re: [HACKERS] Implementing full UTF-8 support (aka supporting 0x00)

2016-08-03 Thread Álvaro Hernández Tortosa



On 03/08/16 20:14, Álvaro Hernández Tortosa wrote:



On 03/08/16 17:47, Kevin Grittner wrote:
On Wed, Aug 3, 2016 at 9:54 AM, Álvaro Hernández Tortosa 
<a...@8kdata.com> wrote:



 What would it take to support it?

Would it be of any value to support "Modified UTF-8"?

https://en.wikipedia.org/wiki/UTF-8#Modified_UTF-8



That's nice, but I don't think so.

The problem is that you cannot predict how people would send you 
data, like when importing from other databases. I guess it may work if 
Postgres would implement such UTF-8 variant and also the drivers, but 
that would still require an encoding conversion (i.e., parsing every 
string) to change the 0x00, which seems like a serious performance hit.


It could be worse than nothing, though!

Thanks,

Álvaro



It may indeed work.

According to https://en.wikipedia.org/wiki/UTF-8#Codepage_layout, 
the encoding used in Modified UTF-8 is an (otherwise) invalid UTF-8 byte 
sequence. In short, the \u0000 nul is represented (overlong encoding) by 
the two bytes 0xc0 0x80. These two bytes are invalid UTF-8, so they 
should not appear in an otherwise valid UTF-8 string. Yet they are 
accepted by Postgres (as if Postgres supported Modified UTF-8 
intentionally). The character in psql does not render as a nul but as 
this symbol: "삀".


Given that this works, the process would look like this:

- Parse all input data looking for bytes with hex value 0x00. If they 
appear in the string, they are the null byte.

- Replace that byte with the two bytes 0xc0 0x80.
- Reverse the operation when reading.

This is OK, but of course a performance hit (searching for 0x00 and 
then growing the byte[] or whatever data structure to account for the 
extra byte). A little bit of a PITA, but I guess better than fixing it 
all :)
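A minimal sketch of that replace-and-reverse step (in Python for brevity; a real implementation would live in the server's or driver's encoding-conversion code):

```python
def to_modified_utf8(data: bytes) -> bytes:
    # Replace each NUL byte with the overlong two-byte sequence C0 80,
    # as Java's "Modified UTF-8" does. Since strict UTF-8 decoders reject
    # C0 80, the sequence cannot occur in valid UTF-8 input, so the
    # mapping is unambiguous for valid-UTF-8 source data.
    return data.replace(b"\x00", b"\xc0\x80")

def from_modified_utf8(data: bytes) -> bytes:
    return data.replace(b"\xc0\x80", b"\x00")

raw = b"abc\x00def"
enc = to_modified_utf8(raw)
```

The round trip is lossless, but as noted above it costs a scan (and a possible reallocation) over every string in both directions.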



Álvaro


--

Álvaro Hernández Tortosa


---
8Kdata





Re: [HACKERS] Implementing full UTF-8 support (aka supporting 0x00)

2016-08-03 Thread Álvaro Hernández Tortosa



On 03/08/16 18:35, Geoff Winkless wrote:

On 3 August 2016 at 15:54, Álvaro Hernández Tortosa <a...@8kdata.com> wrote:

 Given that 0x00 is a perfectly legal UTF-8 character, I conclude we're
strictly non-compliant.

It's perhaps worth mentioning that 0x00 is valid ASCII too, and
PostgreSQL has never stored that either.


Then yes, it could also be a problem. But as of today I believe 
solving the problem for UTF-8 would solve the great majority of these 
embedded NUL problems.


Álvaro


--

Álvaro Hernández Tortosa


---
8Kdata





Re: [HACKERS] Implementing full UTF-8 support (aka supporting 0x00)

2016-08-03 Thread Álvaro Hernández Tortosa



On 03/08/16 17:47, Kevin Grittner wrote:

On Wed, Aug 3, 2016 at 9:54 AM, Álvaro Hernández Tortosa <a...@8kdata.com> 
wrote:


 What would it take to support it?

Would it be of any value to support "Modified UTF-8"?

https://en.wikipedia.org/wiki/UTF-8#Modified_UTF-8



That's nice, but I don't think so.

The problem is that you cannot predict how people would send you 
data, like when importing from other databases. I guess it may work if 
Postgres would implement such UTF-8 variant and also the drivers, but 
that would still require an encoding conversion (i.e., parsing every 
string) to change the 0x00, which seems like a serious performance hit.


It could be worse than nothing, though!

Thanks,

Álvaro

--

Álvaro Hernández Tortosa


---
8Kdata





Re: [HACKERS] Implementing full UTF-8 support (aka supporting 0x00)

2016-08-03 Thread Álvaro Hernández Tortosa



On 03/08/16 17:23, Tom Lane wrote:

Álvaro Hernández Tortosa <a...@8kdata.com> writes:

  As has been previously discussed (see
https://www.postgresql.org/message-id/BAY7-F17FFE0E324AB3B642C547E96890%40phx.gbl
for instance) varlena fields cannot accept the literal 0x00 value.

Yup.


  What would it take to support it?

One key reason why that's hard is that datatype input and output
functions use nul-terminated C strings as the representation of the
text form of any datatype.  We can't readily change that API without
breaking huge amounts of code, much of it not under the core project's
control.

There may be other places where nul-terminated strings would be a hazard
(mumble fgets mumble), but offhand that API seems like the major problem
so far as the backend is concerned.

There would be a slew of client-side problems as well.  For example this
would assuredly break psql and pg_dump, along with every other client that
supposes that it can treat PQgetvalue() as returning a nul-terminated
string.  This end of it would possibly be even worse than fixing the
backend, because so little of the affected code is under our control.

In short, the problem is not with having an embedded nul in a stored
text value.  The problem is the reams of code that suppose that the
text representation of any data value is a nul-terminated C string.

regards, tom lane


Wow. That seems like a daunting task.

I guess, then, that even implementing a new datatype based on bytea, 
but using the text IO functions to show up as text (not send/recv), 
would not work either, right?


Thanks for the input,

Álvaro


--

Álvaro Hernández Tortosa


---
8Kdata





[HACKERS] Implementing full UTF-8 support (aka supporting 0x00)

2016-08-03 Thread Álvaro Hernández Tortosa


Hi list.

As has been previously discussed (see 
https://www.postgresql.org/message-id/BAY7-F17FFE0E324AB3B642C547E96890%40phx.gbl 
for instance) varlena fields cannot accept the literal 0x00 value. Sure, 
you can use bytea, but this is hardly a good solution. The problem seems 
to be hitting some use cases, like:


- People migrating data from other databases (apart from PostgreSQL, I 
don't know of any other database which suffers the same problem).
- People using drivers which use UTF-8 or equivalent encodings by 
default (Java for example)


Given that 0x00 is a perfectly legal UTF-8 character, I conclude 
we're strictly non-compliant. And given the general Postgres policy 
regarding standards compliance and the people being hit by this, I think 
it should be addressed. Especially since all the usual fixes are a real 
PITA (re-parsing, re-generating strings, which is very expensive, or 
dropping data).


What would it take to support it? Isn't the varlena header 
propagated everywhere, which could help infer the real length of the 
string? Any pointers or suggestions would be welcome.


Thanks,

Álvaro


--

Álvaro Hernández Tortosa


---
8Kdata





Re: [HACKERS] Protocol buffer support for Postgres

2016-06-24 Thread Álvaro Hernández Tortosa



On 24/06/16 14:23, Flavius Anton wrote:

On Thu, Jun 23, 2016 at 2:54 PM, Flavius Anton <f.v.an...@gmail.com> wrote:

On Thu, Jun 23, 2016 at 1:50 PM, Greg Stark <st...@mit.edu> wrote:

On Thu, Jun 23, 2016 at 8:15 PM, Flavius Anton <f.v.an...@gmail.com> wrote:

I'd love to talk more about this.

I thought quite a bit about this a few years ago but never really
picked up it to work on.

Any other thoughts on this? My guess is that it might be an important
addition to Postgres that can attract even more users, but I am not
sure if there's enough interest from the community. If I want to pick
this task, how should I move forward? Do I need to write a design
document or similar or should I come up with a patch that implements a
draft prototype? I am new to this community and don't know the code
yet, so I'd appreciate some guidance from an older, more experienced
member.




Other than protobuf, there are also other serialization formats 
that might be worth considering. Not too long ago I suggested working 
specifically on serialization formats for the json/jsonb types: 
https://www.postgresql.org/message-id/56CB8A62.40100%408kdata.com I 
believe this effort is in the same boat.


Álvaro

--

Álvaro Hernández Tortosa


---
8Kdata





Re: [HACKERS] 10.0

2016-05-15 Thread Álvaro Hernández Tortosa



On 15/05/16 14:42, Magnus Hagander wrote:



On Sun, May 15, 2016 at 2:29 PM, Álvaro Hernández Tortosa 
<a...@8kdata.com <mailto:a...@8kdata.com>> wrote:




On 14/05/16 20:02, Petr Jelinek wrote:

+1 for going with 10.0 after 9.6 and 11.0 afterwards, etc.

It will hopefully both end these discussions and remove the
confusion the current versioning scheme has (I too heard way
to many times about people using postgres8 or postgres9).


Even worse: I've been told that a company was using
"PostgreSQL 8.5" ^_^


That's not necessarily the version numbers fault. That's them using an 
alpha version.. (Yes, I've run into a customer just a couple of years 
ago that were still on 8.5 alpha)





It was their fault, obviously. They were not using the alpha 
version; they were using 8.3 but thought it was 8.5 (and yes, it's 
terrible that they provided information without checking it). 
Anyway, even though it's not the version number's fault, having one less 
number might have helped here and probably in other cases too.


Álvaro

--
Álvaro Hernández Tortosa


---
8Kdata



Re: [HACKERS] 10.0

2016-05-15 Thread Álvaro Hernández Tortosa



On 14/05/16 20:02, Petr Jelinek wrote:

+1 for going with 10.0 after 9.6 and 11.0 afterwards, etc.

It will hopefully both end these discussions and remove the confusion 
the current versioning scheme has (I too have heard way too many times 
about people using postgres8 or postgres9).


Even worse: I've been told that a company was using "PostgreSQL 
8.5" ^_^



Álvaro

--
Álvaro Hernández Tortosa


---
8Kdata





Re: [HACKERS] 10.0

2016-05-14 Thread Álvaro Hernández Tortosa



On 14/05/16 02:00, Tom Lane wrote:

[...]

I don't think this is about version number inflation, but actually more
the opposite.  What you're calling the major number is really a marketing
number.  There is not a technical distinction between major releases where
we choose to bump the first number and those where we choose to bump the
second.  It's all about marketing.  So to me, merging those numbers would
be an anti-marketing move.  I think it's a good move: it would be more
honest and transparent about what the numbers mean, not less so.


If having two "major" numbers is a marketing game, and if it works 
in such a way, I'd immediately say let's keep it. Decisions like the one 
debated here should be driven more by what is going to help user 
adoption than by -hackers' personal taste. BTW, neither of these 
approaches seems dishonest to me.


Having said that, I believe having a single major number is more 
effective marketing. Non major-major versions may make the release look 
like a "probably not worth it" upgrade. People may hold their breath 
until a major-major upgrade, especially if people support this idea in 
forums by saying things like: "10.0 will come with amazing features, 
because the version is bumped from 9.6".


So +1 to call 10.0 the next version and 11.0 the one after that.

    Álvaro

--
Álvaro Hernández Tortosa


---
8Kdata





Re: [HACKERS] about google summer of code 2016

2016-03-22 Thread Álvaro Hernández Tortosa



On 23/03/16 01:56, Amit Langote wrote:

On 2016/03/23 9:19, Álvaro Hernández Tortosa wrote:

- Regarding GSoC: it looks to me that we failed to submit in time. Is this
what happened, or we weren't selected? If the former (and no criticism
here, just realizing a fact) what can we do next year to avoid this
happening again? Is anyone "appointed" to take care of it?

See Thom's message here:

http://www.postgresql.org/message-id/CAA-aLv6i3jh1H-5UHb8jSB0gMwA9sg_cqw3=MwddVzr=pxa...@mail.gmail.com

Thanks,
Amit




OK, read the thread, thanks for the info :)

Álvaro

--
Álvaro Hernández Tortosa


---
8Kdata





Re: [HACKERS] about google summer of code 2016

2016-03-22 Thread Álvaro Hernández Tortosa



On 22/02/16 23:23, Álvaro Hernández Tortosa wrote:



On 22/02/16 05:10, Tom Lane wrote:

Heikki Linnakangas <hlinn...@iki.fi> writes:

On 19/02/16 10:10, Álvaro Hernández Tortosa wrote:

Oleg and I discussed recently that a really good addition to a GSoC
item would be to study whether it's convenient to have a binary
serialization format for jsonb over the wire.

Seems a bit risky for a GSoC project. We don't know if a different
serialization format will be a win, or whether we want to do it in the
end, until the benchmarking is done. It's also not clear what we're
trying to achieve with the serialization format: smaller on-the-wire
size, faster serialization in the server, faster parsing in the client,
or what?

Another variable is that your answers might depend on what format you
assume the client is trying to convert from/to.  (It's presumably not
text JSON, but then what is it?)


As I mentioned before, there are many well-known JSON 
serialization formats, like:


- http://ubjson.org/
- http://cbor.io/
- http://msgpack.org/
- BSON (ok, let's skip that one hehehe)
- http://wiki.fasterxml.com/SmileFormatSpec



Having said that, I'm not sure that risk is a blocking factor here.
History says that a large fraction of our GSoC projects don't result
in a commit to core PG.  As long as we're clear that "success" in this
project isn't measured by getting a feature committed, it doesn't seem
riskier than any other one.  Maybe it's even less risky, because there's
less of the success condition that's not under the GSoC student's 
control.




I wanted to bring an update here. It looks like someone did the 
expected benchmark "for us" :)


https://eng.uber.com/trip-data-squeeze/ (thanks Alam for the link)

While this is Uber's own test, I think the conclusions are quite 
significant: an encoding like MessagePack + zlib requires only 14% of 
the size and encodes+decodes in 76% of the time of JSON. There are of 
course other contenders that trade better encoding times for slightly 
slower decoding and bigger size. But there are very interesting numbers 
in this benchmark. MessagePack, CBOR and UJSON (all + zlib) look like 
really good options.
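The kind of measurement that post makes can be sketched with nothing but the standard library; this compares plain JSON against JSON+zlib only (MessagePack/CBOR need third-party libraries), on a made-up document, so the numbers are illustrative rather than Uber's:

```python
import json
import zlib

# A hypothetical jsonb-ish document; not the data set from the Uber post.
doc = {"trip_id": 12345,
       "status": "completed",
       "coords": [[40.7128, -74.0060]] * 50}

plain = json.dumps(doc, separators=(",", ":")).encode("utf-8")
packed = zlib.compress(plain, 6)          # level 6: default speed/size trade-off
ratio = len(packed) / len(plain)          # < 1.0 means compression helped
```

The same harness, pointed at real jsonb payloads and extended with the other codecs, would be the "our own benchmark" option discussed below.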


So now that we have this data I would like to ask these questions 
to the community:


- Is this enough, or do we need to perform our own, different benchmarks?

- If this is enough, and given that we weren't elected for GSoC, is 
there interest in the community to work on this nonetheless?


- Regarding GSoC: it looks to me that we failed to submit in time. Is 
this what happened, or we weren't selected? If the former (and no 
criticism here, just realizing a fact) what can we do next year to avoid 
this happening again? Is anyone "appointed" to take care of it?



    Álvaro

--
Álvaro Hernández Tortosa


---
8Kdata





Re: [HACKERS] Soliciting Feedback on Improving Server-Side Programming Documentation

2016-03-15 Thread Álvaro Hernández Tortosa


I started a similar thread with probably similar concerns: 
http://www.postgresql.org/message-id/56d1a6aa.6080...@8kdata.com


I believe this effort should be done. I added to my TODO list to 
compile a list of used functions in a selection of picked extensions to 
use that as a starting point of an "API".


Regards,

Álvaro


--
Álvaro Hernández Tortosa


---
8Kdata




On 15/03/16 13:02, Corey Huinker wrote:
Over the past few months, I've been familiarizing myself with postgres 
server side programming in C.


My attempts to educate myself were slow and halting. The existing 
server-side programming documentation has some examples, but those 
examples didn't show me how to do what I wanted to do, and my 
research-via-Google was highly circular, almost always pointing back 
to the documentation I had already found lacking, or a copy of it.


Most of what I have learned I have culled from asking people on IRC, 
or bugging people I've met through user groups and PgConf. In all 
cases, people have been extremely helpful. However, this method is 
inefficient, because we're using two people's time, one of whom has to 
tolerate my incessant questions and slow learning pace.


Furthermore, the helpful suggestions I received boiled down to:
1. The function/macro/var you're looking for is PG_FOO, git grep PG_FOO
2. Look in blah.c which does something like what you're trying to do
3. The comments in blah.h do a good job of listing and explaining this 
macro or that


#1 git grep is a helpful reflex for discovering examples on my own, 
but it requires that I have a term to search on in the first place, 
and too often I don't know what I don't know.


#2 is the gold standard in terms of correctness (the code had to have 
worked at least up to the last checkin date), and in terms of 
discoverability it often gave me names of new macros to search for, 
coding patterns, etc. However, I was always left with the questions: 
How would I have figured this out on my own? How is the next person 
going to figure it out? Why doesn't anybody document this?


#3 Often answers the last question in #2: It *is* documented, but that 
documentation is not easily discoverable by conventional means.


So what I'd like to do is migrate some of the helpful information in 
the header files into pages of web searchable documentation, and also 
to revamp the existing documentation to be more relevant.


Along the way, I collected a list of things I wished I'd had from the 
start:


  * A list of all the GETARG_* macros. It would have been especially
great if this were in table form:   Your Parameter Is A / Use This
Macro / Which Gives This Result Type / Working example.
  * A list/table of the DatumGet* macros. I'm aware that many of them
overlap/duplicate others. That'd be good to know too.
  * The table at
http://www.postgresql.org/docs/9.1/static/errcodes-appendix.html
has the numeric codes and PL/PGSQL constants enumerated. It'd be
nice if it had the C #define as well
  * The SPI documentation mentions most/all of the SPI functions, but
I couldn't find documentation on the SPI variables like
SPI_processed and SPI_tuptable.
  * Examples and explanation of how PG_TRY()/PG_CATCH work. How to add
context callbacks.
  * Direct Function Calls
  * A comparison of the two modes of writing SRF functions
(Materialize vs multi-call)
  * Less explanation of how to write V0-style functions. That was
called the "old style" back in version 7.1. Why is that
information up front in the documentation when so much else is
sequestered in header files?

Some of these things may seem obvious/trivial to you. I would argue 
that they're only obvious in retrospect, and the more obvious-to-you 
things we robustly document, the quicker we accumulate programmers who 
are capable of agreeing that it's obvious, and that's good for the 
community.


I'm aware that some of these APIs change frequently. In those cases, I 
suggest that we make note of that on the same page.


Because I'm still going through the learning curve, I'm probably the 
least qualified to write the actual documentation. However, I have a 
clear memory of what was hard to learn and I have the motivation to 
make it easier on the next person. That makes me a good focal point 
for gathering, formatting, and submitting the documentation in patch 
form. I'm volunteering to do so. What I need from you is:


  * Citations of existing documentation in header files that
could/should be exposed in our more formal documentation.
  * Explanations of any of the things above, which I can then reformat
into proposed documentation.
  * A willingness to review the proposed new documentation
  * Reasoned explanations for why this is a fool's errand

You supply the expertise, I'll write the patch.

Thanks in advance.




Re: [HACKERS] New competition from Microsoft?

2016-03-07 Thread Álvaro Hernández Tortosa



On 07/03/16 23:32, Joshua D. Drake wrote:

On 03/07/2016 01:43 PM, Josh berkus wrote:

All,

http://blogs.microsoft.com/?p=67248

Once SQL Server is available on Linux, we're going to see more people
using it as an alternative to PostgreSQL.  Especially since they're
picking up a lot of our better features, like R support.



Yes but:

1. MSSQL is not going to magically run well on Linux. It was never 
designed to run on a *NIX platform and it long left the parts that 
would (Sybase) in the dust.


I'd be surprised if it did *not* run well on Linux. Linux is an 
easy platform to develop for, it's fast and reliable, and they have 
plenty of time (till release).


2. This is actually good for us. It validates our primary deployment 
platform. Why run a closed source MSSQL when you can get Open Source 
PostgreSQL on the same platform major platform?


SQL Server seems to have some nice features that PostgreSQL doesn't 
have. Encryption support and Hekaton (the in-memory OLTP engine) come to 
mind. It's definitely not a toy database, even less so on Linux.




3. Competition is good when we have it, our community steps up.



Absolutely, competition is very welcome!

Álvaro


--
Álvaro Hernández Tortosa


---
8Kdata





Re: [HACKERS] How can we expand PostgreSQL ecosystem?

2016-03-07 Thread Álvaro Hernández Tortosa



On 07/03/16 11:54, José Luis Tallón wrote:

On 03/07/2016 07:30 AM, Tsunakawa, Takayuki wrote:

From: pgsql-hackers-ow...@postgresql.org
[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Mark Kirkwood
For cloud - in particular Openstack (which I am working with ATM), the
biggest thing would be:

- multi-master replication

or failing that:

- self managing single master failover (voting/quorum etc)


Hmm consul (optionally in a dockerized setup) ?
https://www.consul.io/



There are already HA solutions based on consensus, like Patroni: 
https://github.com/zalando/patroni





so that operators can essentially 'set and forget'. We currently use
MySQL + Galera (multi-master) and MongoDB (self-managing single master)
and the convenience and simplicity is just so important (Openstack is a
huge complex collection of services - hand holding of any one service is
pretty much a non starter).
Yes, I was also asked whether PostgreSQL has any optional 
functionality like Galera Cluster for MySQL.  He was planning a 
scalable PaaS service which performs heavy reads and writes. Demand 
exists.


AFAIK, Galera has its own set of drawbacks


Right, some of them are explained here: 
https://aphyr.com/posts/327-jepsen-mariadb-galera-cluster


Regards,

Álvaro


--
Álvaro Hernández Tortosa


---
8Kdata





Re: [HACKERS] PostgreSQL extension API? Documentation?

2016-02-27 Thread Álvaro Hernández Tortosa



On 27/02/16 15:43, Tom Lane wrote:

Chapman Flack <c...@anastigmatix.net> writes:

On 02/27/16 08:37, Álvaro Hernández Tortosa wrote:

In other words: what is the API surface exposed by PostgreSQL to
extension developers? The assumption is that no PostgreSQL code should be
modified, just adding your own and calling existing functions.

That's an excellent question that repeatedly comes up, in particular
because of the difference between the way the MSVC linker works on Windows,
and the way most other linkers work on other platforms.

Yeah.  It would be a fine thing to have a document defining what we
consider to be the exposed API for extensions.  In most cases we could
not actually stop extension developers from relying on stuff outside the
defined API, and I don't particularly feel a need to try.  But it would be
clear to all concerned that if you rely on something not in the API, it's
your problem if we remove it or whack it around in some future release.
On the other side, it would be clearer to core-code developers which
changes should be avoided because they would cause pain to extension
authors.

Unfortunately, it would be a lot of work to develop such a thing, and no
one has wanted to take it on.


Why would it be so much work? Creating a list of functions, and 
maybe documenting them, doesn't sound like a daunting task.


I wouldn't mind volunteering for this work, but I guess I would 
need some help to understand and identify the candidate parts of the 
API. If anyone could help me here, please let me know.


Álvaro


--
Álvaro Hernández Tortosa


---
8Kdata





Re: [HACKERS] PostgreSQL extension API? Documentation?

2016-02-27 Thread Álvaro Hernández Tortosa



On 27/02/16 15:10, Chapman Flack wrote:

On 02/27/16 08:37, Álvaro Hernández Tortosa wrote:

 In other words: what is the API surface exposed by PostgreSQL to
extension developers? The assumption is that no PostgreSQL code should be
modified, just adding your own and calling existing functions.

That's an excellent question that repeatedly comes up, in particular
because of the difference between the way the MSVC linker works on Windows,
and the way most other linkers work on other platforms.

The issue there is ... on most non-Windows platforms, there are only the
general C rules to think about: if a symbol is static (or auto of course)
it is not visible to extensions, but otherwise it is.

For MSVC, in contrast, symbols need to have a certain decoration
(look for PGDLLIMPORT in various PostgreSQL .h files) for an MSVC-built
extension to be able to see it, otherwise it isn't accessible.

Well, that's not quite right. It turns out (and it may have taken some
work on the build process to make it turn out this way) ... *functions*
are accessible from MSVC (as long as they would be accessible under
normal C rules) whether or not they have PGDLLIMPORT. It's just
data symbols/variables that have to have PGDLLIMPORT or they aren't
available on Windows/MSVC.

And *that* arrangement is the result of a long thread in 2014 that
unfolded after discovering that what was really happening in MSVC
*before* that was that MSVC would silently pretend to link your
non-PGDLLIMPORT data symbols, and then give you the wrong data.

http://www.postgresql.org/message-id/flat/52fab90b.6020...@2ndquadrant.com

In that long thread, there are a few messages in the middle that probably
give the closest current answer to your API question. Craig Ringer has
consistently favored making other platforms work more like Windows/MSVC,
so that the PGDLLIMPORT business would serve to limit and more clearly
define the API surface:

http://www.postgresql.org/message-id/52ef1468.6080...@2ndquadrant.com

Andres Freund had the pragmatic reply:

http://www.postgresql.org/message-id/20140203103701.ga1...@awork2.anarazel.de


I think that'd be an exercise in futility. ... We'd break countless
extensions people have written. ... we'd need to have a really
separate API layer ... doesn't seem likely to arrive anytime soon,
if ever.

which was ultimately concurred in by Tom, and Craig too:

http://www.postgresql.org/message-id/29286.1391436...@sss.pgh.pa.us
http://www.postgresql.org/message-id/52efa654.8010...@2ndquadrant.com

Andres characterized it as "We have a (mostly) proper API. Just not
an internal/external API split."

http://www.postgresql.org/message-id/20140203142514.gd1...@awork2.anarazel.de

-Chap


Hi Chapman.

Thank you very much for your detailed message and all the 
references. They were very appropriate.


However, I still lack a list of functions that may safely be called 
(I understand that not even all those labeled with PGDLLIMPORT are good 
candidates, and some good candidates might not be labeled as such) from 
an extension's point of view. Have you come across such a list in any of 
these threads? I haven't been able to find one.


Thanks for your input!

Álvaro


--
Álvaro Hernández Tortosa


---
8Kdata





Re: [HACKERS] PostgreSQL extension API? Documentation?

2016-02-27 Thread Álvaro Hernández Tortosa



On 27/02/16 15:01, Fabrízio de Royes Mello wrote:



On Sat, Feb 27, 2016 at 10:37 AM, Álvaro Hernández Tortosa 
<a...@8kdata.com <mailto:a...@8kdata.com>> wrote:

>
>
> Hi.
>
> I have a newbie question for extension development. Extensions 
provide an entry point, and are dynamically linked to PostgreSQL. But 
what APIs/functions are really available for extensions to call?

>
> The most obvious API is SPI. You could also implement hooks. Of 
course, functions, types, aggregates, whatever. But can an extension 
call other "internal" PostgreSQL functions? Is there any restriction 
on what an extension could --or should-- call? Is there any specific 
API, or any documentation which states what is available to use?

>
> In other words: what is the API surface exposed by PostgreSQL to 
extension developers? The assumption is that no PostgreSQL code should 
be modified, just adding your own and calling existing functions.

>

I don't know what kind of problem you want to solve, but maybe you 
should ask yourself:


Good point. I don't know. More precisely: no specific problem as of 
today. But if I knew the whole "exposed API" I could think more clearly 
about what problems could be solved.


In other words: I see it's not clear what an extension can 
"extend". Knowing that would help extension developers create new 
solutions.




1) Do I need to change some current PostgreSQL behavior?


If that means not changing current code, it might well be an option.



2) Do I need to add a new feature to PostgreSQL without changing the 
current behavior?


Writing a C extension, you can access a lot of internal code, 
provided it is exposed through .h headers. For example, some time ago I 
was thinking of writing an extension to show more internal information 
about autovacuum (internal queue, etc... something like 
pg_stat_autovacuum). But nowadays that is impossible without changing 
the core, because some internal structures are not exposed, so we would 
need to define an internal API to expose this kind of information.


So, is calling any code exposed by the headers OK for an extension? 
Is the set of all .h files then the "exposed API"? Or are some of those 
functions ones that should never be called?




So depending on what problem you want to solve, you can write an 
extension to do that. Unfortunately, the short answer is "it depends".


    Hope that we can find a more general answer :) Thanks for your opinion!

Álvaro



--
Álvaro Hernández Tortosa


---
8Kdata



[HACKERS] PostgreSQL extension API? Documentation?

2016-02-27 Thread Álvaro Hernández Tortosa


Hi.

I have a newbie question for extension development. Extensions 
provide an entry point, and are dynamically linked to PostgreSQL. But 
what APIs/functions are really available for extensions to call?


The most obvious API is SPI. You could also implement hooks. Of 
course, functions, types, aggregates, whatever. But can an extension 
call other "internal" PostgreSQL functions? Is there any restriction on 
what an extension could --or should-- call? Is there any specific API, 
or any documentation which states what is available to use?


In other words: what is the API surface exposed by PostgreSQL to 
extension developers? The assumption is that no PostgreSQL code should 
be modified, just adding your own and calling existing functions.


Thanks,

Álvaro


--
Álvaro Hernández Tortosa


---
8Kdata





Re: [HACKERS] The plan for FDW-based sharding

2016-02-27 Thread Álvaro Hernández Tortosa



On 27/02/16 09:19, Konstantin Knizhnik wrote:

On 02/27/2016 06:54 AM, Robert Haas wrote:


[...]



So maybe the goal for the GTM isn't to provide true serializability

across the cluster but some lesser degree of transaction isolation.
But then exactly which serialization anomalies are we trying to
prevent, and why is it OK to prevent those and not others?

Absolutely agree. There is some theoretical discussion regarding CAP 
and different distributed isolation levels.
But in practice people want to solve their tasks. Most PostgreSQL 
users are using the default isolation level, read committed, although 
there are a lot of "wonderful" anomalies with it.
Serializable transactions in Oracle actually violate the fundamental 
serializability rule, and still Oracle is one of the most popular 
databases in the world...
There was an isolation bug in Postgres-XL which didn't prevent 
commercial customers from using it...


I think this might be a dangerous line of thought. While I agree 
PostgreSQL should definitely look at the market and answer the questions 
that (current and prospective) users may ask, and be more practical than 
idealistic, easily ditching isolation guarantees might not be a good thing.


 That Oracle is the leader despite its isolation problems, or that 
most people run PostgreSQL under read committed, is not a good argument 
to cut corners and go for bare-minimum (if any) isolation guarantees. 
First, because PostgreSQL has always been trusted and understood as a 
system with *strong* guarantees (whatever that means). Second, because 
what we perceive as OK from the market might change soon. From my 
observations, while I agree with you that most people "don't care" or, 
worse, "don't realize", this is rapidly changing. More and more people 
are becoming aware of the problems of distributed systems and the 
significant consequences those problems may have for them.


A lot of them have been illustrated in the famous Jepsen posts. A 
good example, given that you have mentioned Galera before, is this one: 
https://aphyr.com/posts/327-jepsen-mariadb-galera-cluster 
which demonstrates how Galera fails to provide Snapshot Isolation, even 
in a healthy state --despite their claims.


As of today, I would expect any distributed system to clearly state 
its guarantees in the documentation. And then adhere to them, for 
instance proving it with tests such as Jepsen.




So I do not say that discussing all these theoretical questions is not 
needed, such as formally proving the correctness of the distributed 
algorithm.


I would like to see this work move forward, so I really appreciate 
all your effort here. I cannot give an opinion on whether the DTM API is 
good or not, but I agree with Robert that a good technical discussion of 
these issues is a good, and needed, starting point. Feedback may also 
help you avoid pitfalls that might otherwise go unnoticed until tons of 
code are implemented.


Academic approaches are sometimes "very academic", but studying 
them doesn't hurt either :)



Álvaro


--
Álvaro Hernández Tortosa


---
8Kdata





Re: [HACKERS] about google summer of code 2016

2016-02-22 Thread Álvaro Hernández Tortosa



On 22/02/16 23:34, Tom Lane wrote:

Álvaro Hernández Tortosa <a...@8kdata.com> writes:

On 22/02/16 05:10, Tom Lane wrote:

Another variable is that your answers might depend on what format you
assume the client is trying to convert from/to.  (It's presumably not
text JSON, but then what is it?)

  As I mentioned before, there are many well-known JSON serialization
formats, like:
- http://ubjson.org/
- http://cbor.io/
- http://msgpack.org/
- BSON (ok, let's skip that one hehehe)
- http://wiki.fasterxml.com/SmileFormatSpec

Ah, the great thing about standards is there are so many to choose from :-(

So I guess part of the GSoC project would have to be figuring out which
one of these would make the most sense for us to adopt.

regards, tom lane


Yes.

And unless I'm mistaken, there's an int16 in the protocol to 
identify the data format. So apart from the chosen format, others may be 
provided as alternatives under different format codes, or alternatives 
like compressed text JSON. This may be better suited for a later GSoC 
project, of course.


Álvaro


--
Álvaro Hernández Tortosa


---
8Kdata





Re: [HACKERS] about google summer of code 2016

2016-02-22 Thread Álvaro Hernández Tortosa



On 22/02/16 05:10, Tom Lane wrote:

Heikki Linnakangas <hlinn...@iki.fi> writes:

On 19/02/16 10:10, Álvaro Hernández Tortosa wrote:

Oleg and I discussed recently that a really good addition to a GSoC
item would be to study whether it's convenient to have a binary
serialization format for jsonb over the wire.

Seems a bit risky for a GSoC project. We don't know if a different
serialization format will be a win, or whether we want to do it in the
end, until the benchmarking is done. It's also not clear what we're
trying to achieve with the serialization format: smaller on-the-wire
size, faster serialization in the server, faster parsing in the client,
or what?

Another variable is that your answers might depend on what format you
assume the client is trying to convert from/to.  (It's presumably not
text JSON, but then what is it?)


As I mentioned before, there are many well-known JSON serialization 
formats, like:


- http://ubjson.org/
- http://cbor.io/
- http://msgpack.org/
- BSON (ok, let's skip that one hehehe)
- http://wiki.fasterxml.com/SmileFormatSpec



Having said that, I'm not sure that risk is a blocking factor here.
History says that a large fraction of our GSoC projects don't result
in a commit to core PG.  As long as we're clear that "success" in this
project isn't measured by getting a feature committed, it doesn't seem
riskier than any other one.  Maybe it's even less risky, because there's
less of the success condition that's not under the GSoC student's control.


Agreed :)

Álvaro


--
Álvaro Hernández Tortosa


---
8Kdata





Re: [HACKERS] about google summer of code 2016

2016-02-22 Thread Álvaro Hernández Tortosa



On 21/02/16 21:15, Heikki Linnakangas wrote:

On 19/02/16 10:10, Álvaro Hernández Tortosa wrote:
  Oleg and I discussed recently that a really good addition to a 
GSoC

item would be to study whether it's convenient to have a binary
serialization format for jsonb over the wire. Some argue this should be
benchmarked first. So the scope for this project would be to benchmark
and analyze the potential improvements and then agree on which format
jsonb could be serialized to (apart from the current on-disk format,
there are many json or nested k-v formats that could be used for sending
over the wire).


Seems a bit risky for a GSoC project. We don't know if a different 
serialization format will be a win, 


Compared to the current (text) serialization, it is hard to believe 
there would be no wins.


or whether we want to do it in the end, until the benchmarking is 
done. It's also not clear what we're trying to achieve with the 
serialization format: smaller on-the-wire size, faster serialization 
in the server, faster parsing in the client, or what?


Probably all of them (it would be ideal if it were selectable). 
Some may favor small on-the-wire size (the gains can be significant with 
several serialization formats), others faster decoding 
(de-serialization takes significant execution time). Of course, all 
this should be tested and benchmarked first, but we're not alone here.


This is a significant request from many, at least from the Java 
users, among whom it has been discussed many times. Especially if the 
wire format adheres to a well-known (or even standard) format, so that 
the receiving side and the drivers could expose an API based on that 
format --one of the other big pains today on this side.


I think it fits very well for a GSoC! :)

Álvaro


--
Álvaro Hernández Tortosa


---
8Kdata





Re: [HACKERS] about google summer of code 2016

2016-02-19 Thread Álvaro Hernández Tortosa


Hi.

Oleg and I discussed recently that a really good addition to a GSoC 
item would be to study whether it's convenient to have a binary 
serialization format for jsonb over the wire. Some argue this should be 
benchmarked first. So the scope for this project would be to benchmark 
and analyze the potential improvements and then agree on which format 
jsonb could be serialized to (apart from the current on-disk format, 
there are many json or nested k-v formats that could be used for sending 
over the wire).


I would like to mentor this project with Oleg.

Thanks,

Álvaro


--
Álvaro Hernández Tortosa


---
8Kdata




On 17/02/16 08:40, Amit Langote wrote:

Hi Shubham,

On 2016/02/17 16:27, Shubham Barai wrote:

Hello everyone,

I am currently pursuing my bachelor of engineering in computer science
at Maharashtra
Institute of Technology, Pune ,India. I am very excited about contributing
to postgres through google summer of code program.

Is Postgres applying for GSoC 2016 as a mentoring organization?

I think it is.  Track this page for updates:
http://www.postgresql.org/developer/summerofcode/

You can contact one of the people listed on that page for the latest.

I didn't find one for 2016, but here is the PostgreSQL wiki page for last
year's GSoC: https://wiki.postgresql.org/wiki/GSoC_2015#Project_Ideas

Thanks,
Amit










Re: [HACKERS] bugs and bug tracking

2015-10-13 Thread Álvaro Hernández Tortosa


On 13/10/15 04:40, Tom Lane wrote:
I'm with Robert on the idea that commit log entries need to be 
limited-width. I personally format them to 75 characters, so that 
git_changelog's output is less than 80 characters. regards, tom lane 


A little bit off-topic, but precisely if we're trying to make the 
commits/bug-tracking/whatever system more user-friendly also for 
non-hacker users, I'd adhere to the 50/72 "standard" for commit 
messages, which seems to be quite widespread: 
http://chris.beams.io/posts/git-commit/#seven-rules


(50 chars for the commit summary, 72 chars line wrapping)
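As an illustration only (no such tool exists in the Postgres tree, as far as I know), the 50/72 convention is simple enough to check mechanically:

```python
def check_commit_message(message, summary_max=50, body_max=72):
    """Return a list of 50/72-rule violations for a commit message."""
    lines = message.splitlines()
    problems = []
    if not lines:
        return ["empty message"]
    # Rule 1: short summary line.
    if len(lines[0]) > summary_max:
        problems.append("summary is %d chars (max %d)"
                        % (len(lines[0]), summary_max))
    # Rule 2: blank line separating summary from body.
    if len(lines) > 1 and lines[1].strip():
        problems.append("missing blank line after summary")
    # Rule 3: wrap body lines at 72 characters.
    for n, line in enumerate(lines[2:], start=3):
        if len(line) > body_max:
            problems.append("line %d is %d chars (max %d)"
                            % (n, len(line), body_max))
    return problems

msg = ("Fix OID wraparound in toast value allocation\n\n"
       "Previously the allocator could spin forever looking for a free OID.")
print(check_commit_message(msg))  # → []
```

A hook like this could run from `commit-msg`; the function name and limits are hypothetical, chosen just to mirror the 50/72 rule being discussed.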

Álvaro


--
Álvaro Hernández Tortosa


---
8Kdata





Re: [HACKERS] bugs and bug tracking

2015-10-13 Thread Álvaro Hernández Tortosa


On 13/10/15 16:24, Andres Freund wrote:

On 2015-10-13 16:21:54 +0200, Álvaro Hernández Tortosa wrote:

On 13/10/15 04:40, Tom Lane wrote:

I'm with Robert on the idea that commit log entries need to be
limited-width. I personally format them to 75 characters, so that
git_changelog's output is less than 80 characters. regards, tom lane

 A little bit off-topic, but precisely if we're trying to make the
commits/bug-tracking/whatever system more user-friendly also for non-hacker
users, I'd adhere to the 50/72 "standard" for commit messages, which seems
to be quite widespread: http://chris.beams.io/posts/git-commit/#seven-rules

(50 chars for the commit summary, 72 chars line wrapping)

-1 - imo 50 chars too often makes the commit summary too unspecific,
requiring to read much more.

I'm not a strong advocate of 50 chars anyway, but if people are 
getting used to it, and probably tools too, I'd try to stick to it. And 
I believe you should be able to describe a commit in 50 chars. But of 
course we shouldn't deviate and start yet another thread on this, so 
it's all up to you :)


Álvaro


--
Álvaro Hernández Tortosa


---
8Kdata





Re: [HACKERS] bugs and bug tracking

2015-10-13 Thread Álvaro Hernández Tortosa


On 13/10/15 17:39, Joshua D. Drake wrote:

On 10/13/2015 08:15 AM, Tom Lane wrote:

Andres Freund <and...@anarazel.de> writes:

On 2015-10-13 16:21:54 +0200, Álvaro Hernández Tortosa wrote:

(50 chars for the commit summary, 72 chars line wrapping)



-1 - imo 50 chars too often makes the commit summary too unspecific,
requiring to read much more.


I agree --- I have a hard enough time writing a good summary in 75
characters.  50 would be awful.


The idea of writing a commit message that is useful in a number of 
characters that is less than half a tweet sounds unbearable. The idea 
of trying to discern what the hell a commit actually is in a number of 
characters that is less than half a tweet sounds completely ridiculous.


-1 on that particular aspect.

jD



I'm writing a YC application, and they ask you to summarize your 
whole project idea in less than 50 chars. So I guess that a commit 
message can be summarized in under 50 chars too ^_^

We even do this with Java commits, and hey, you know, if you include a 
JavaStyleCamelCaseUnnecessarilyVerboseClassName in the summary you're 
screwed!


But it seems there's clear agreement on *not* restricting it to 50, 
so I have nothing else to add :)


Best,

Álvaro

--
Álvaro Hernández Tortosa


---
8Kdata





Re: [HACKERS] Fwd: [GENERAL] 4B row limit for CLOB tables

2015-04-26 Thread Álvaro Hernández Tortosa


On 25/04/15 06:39, Jim Nasby wrote:

On 4/24/15 7:11 PM, Álvaro Hernández Tortosa wrote:

On 24/04/15 05:24, Tom Lane wrote:

...

TBH, I've got very little enthusiasm for fixing this given the number
of reports of trouble from the field, which so far as I recall is zero.
Álvaro's case came up through intentionally trying to create an
unreasonable number of tables, not from real usage.  This thread likewise
appears to contain lots of speculation and no reports of anyone hitting
a problem in practice.


 It is certainly true that this was a very synthetic case. I
envision, however, certain use cases where we may hit a very large
number of tables:


The original case has NOTHING to do with the number of tables and 
everything to do with the number of toasted values a table can have. 
If you have to toast 4B attributes in a single relation it will fail. 
In reality, if you get anywhere close to that things will fall apart 
due to OID conflicts.


This case isn't nearly as insane as 4B tables. A table storing 10 text 
fields each of which is 2K would hit this limit with only 400M rows. 
If my math is right that's only 8TB; certainly not anything insane 
space-wise or rowcount-wise.
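That arithmetic is easy to double-check (assumptions: one OID per toasted value, ~2 KB per value, and an OID space of 2^32):

```python
# Back-of-the-envelope check of the toast-OID exhaustion example:
# 10 toasted text fields of ~2 KB each, one toast OID per value.
rows = 400_000_000
toasted_fields_per_row = 10
value_size = 2 * 1024          # ~2 KB per toasted value

toast_values = rows * toasted_fields_per_row
total_bytes = toast_values * value_size

print(toast_values)            # 4000000000 -- near the 2**32 OID ceiling
print(total_bytes / 10**12)    # ~8.2 TB of toasted data

# 4B values fit under 2**32 (~4.29B), but allocation stalls long before
# the hard limit, as the earlier table-creation experiment showed.
assert toast_values < 2**32
```

So roughly 8 TB of toasted data in one relation is indeed enough to approach the limit, consistent with the figure above.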


Perhaps it's still not fixing, but I think it's definitely worth 
documenting.


They are definitely different problems, but with similar symptoms: 
an OID wrapping around, or not even that: just trying to find an unused 
one. If fixed, we should probably look at both at the same time.


It's worth documenting, but also, as I said, maybe fixing them, so 
that if three years from now they really show up, the solution is 
already in production (rather than still being patched).


Regards,

Álvaro


--
Álvaro Hernández Tortosa


---
8Kdata





Re: [HACKERS] Fwd: [GENERAL] 4B row limit for CLOB tables

2015-04-24 Thread Álvaro Hernández Tortosa


On 24/04/15 05:24, Tom Lane wrote:

Stephen Frost sfr...@snowman.net writes:

* Bruce Momjian (br...@momjian.us) wrote:

On Sun, Feb  1, 2015 at 03:54:03PM +0100, Álvaro Hernández Tortosa wrote:

The problem here is that performance degrades exponentially, or
worse. Speaking from experience here: we already tested this for a
very similar case (table creation, where two OIDs are consumed from
a global sequence when inserting into pg_class). Have a look at
http://www.slideshare.net/nosys/billion-tables-project-nycpug-2013,
slides 43-45. We tested this scenario there and showed that table
creations per second dropped from 10K to a few per second, and then
to a few per day. In the graphs you can't even tell that more tables
were being created. At around 8K tables short of the theoretical
limit of 4B consumed OIDs, the process basically stopped making
further insertions.

We don't report the maximum number of tables per database, or the
maximum number of TOAST values.  Agreed?

For my 2c, this limitation is a surprise to users and therefore we
should add documentation to point out that it exists, unless we're going
to actually fix it (which is certainly what I'd prefer to see...).

TBH, I've got very little enthusiasm for fixing this given the number
of reports of trouble from the field, which so far as I recall is zero.
Álvaro's case came up through intentionally trying to create an
unreasonable number of tables, not from real usage.  This thread likewise
appears to contain lots of speculation and no reports of anyone hitting
a problem in practice.


It is certainly true that this was a very synthetic case. I 
envision, however, certain use cases where we may hit a very large 
number of tables:


- Massive multitenancy
- Aggressive partitioning
- Massive multitenancy with aggressive partitioning
- Software dynamically generated tables, like those created by ToroDB 
(https://github.com/torodb/torodb). In ToroDB we generate tables 
depending only on the input data, so we may end up having as many as 
required by the datasource. For example, a general purpose json 
datastore may generate several tables per document inserted.




Certainly this is likely to become an issue at some point in the future,
but I'm not finding it very compelling to worry about now.  By the time
it does become an issue, we may have additional considerations or use
cases that should inform a solution; which seems to me to be a good
argument not to try to fix it in advance of real problems.  Perhaps,


I understand this argument, and it makes sense. On the other hand, 
however, given the long time it may take from patch to commit, and then 
from a released version to companies finally using it in production, I'd 
rather try to fix it soon, as there are already reports and use cases 
that may hit it, rather than wait three years until it explodes in our 
faces. After all, 640KB of RAM is enough for anyone, right? So maybe 2B 
tables is not that far over the horizon. Who knows.


Regards,

Álvaro


--
Álvaro Hernández Tortosa


---
8Kdata





Re: [HACKERS] Repeatable read and serializable transactions see data committed after tx start

2015-03-24 Thread Álvaro Hernández Tortosa


On 24/03/15 20:56, Bruce Momjian wrote:

On Fri, Mar 20, 2015 at 04:43:42PM -0400, Bruce Momjian wrote:

On Sat, Nov  8, 2014 at 09:53:18PM +0100, Álvaro Hernández Tortosa wrote:

On 07/11/14 22:02, Greg Sabino Mullane wrote:

Kevin Grittner wrote:

I think most people have always assumed that
BEGIN starts the transaction and that is the point at
which the snapshot is obtained.

But there is so much evidence to the contrary.  Not only does the
*name* of the command (BEGIN or START) imply a start, but
pg_stat_activity shows the connection idle in transaction after
the command (and before a snapshot is acquired)

Er...I think we are arguing the same thing here. So no contrary
needed? :)

 So do we agree to fix the docs? ^_^

Doc patch attached.

Patch applied.  Thanks for the report.




Awesome! Thanks! :)

Álvaro


--
Álvaro Hernández Tortosa


---
8Kdata





Re: [HACKERS] reducing our reliance on MD5

2015-02-11 Thread Álvaro Hernández Tortosa


On 11/02/15 02:30, Tom Lane wrote:

[...]


I think it would be wise to take two steps back and think about what
the threat model is here, and what we actually need to improve.
Offhand I can remember two distinct things we might wish to have more
protection against:

* scraping of passwords off the wire protocol (but is that still
a threat in an SSL world?).  Better salting practice would do more
than replacing the algorithm as such for this, IMO.


MITM.

We might consider it our problem or not, but in general terms 
man-in-the-middle attacks, which are easy to mount in many scenarios, 
are a scraping problem. In particular, I have seen tons of developers 
turn off SSL validation during development and never turn it back on for 
production, leaving servers vulnerable to password scraping under MITM 
attacks. So I would always consider hashing anyway.


SCRAM seems to be a good solution anyway.
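A minimal challenge-response sketch shows why this helps against wire scraping (illustrative only; real SCRAM additionally uses per-user salts, iteration counts, channel binding, and mutual authentication): the password itself never crosses the wire, so a MITM that records the exchange sees only a nonce and a one-time proof.

```python
# Toy challenge-response sketch (NOT the SCRAM protocol; salting, iteration
# counts and mutual auth are omitted for brevity).
import hashlib
import hmac
import os

password = b"secret"
stored_key = hashlib.sha256(password).digest()  # server stores a derived key

nonce = os.urandom(16)                          # per-session server challenge
client_proof = hmac.new(stored_key, nonce, hashlib.sha256).digest()

# The server recomputes the proof from its stored key and the nonce.
server_check = hmac.new(stored_key, nonce, hashlib.sha256).digest()
print(hmac.compare_digest(client_proof, server_check))  # True
print(password in nonce + client_proof)   # False: password never on the wire
```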

Regards,

Álvaro






Re: [HACKERS] Fwd: [GENERAL] 4B row limit for CLOB tables

2015-02-01 Thread Álvaro Hernández Tortosa


On 31/01/15 14:55, Roger Pack wrote:

[...]

Now, the reality is that GetNewOidWithIndex() is going to keep
incrementing the global OID counter until it finds an OID that isn't in
the toast table. That means that if you actually get anywhere close to
using 4B OIDs you're going to become extremely unhappy with the
performance of toasting new data.
OK, so system stability doesn't degrade per se when it wraps (since
they all use that GetNewOid method or similar?). Good to know.

So basically when it gets near 4B TOAST'ed rows it may have to wrap that
counter and search for unused number, and for each number it's
querying the TOAST table to see if it's already used, degrading
performance.



The problem here is that performance degrades exponentially, or 
worse. Speaking from experience here: we already tested this for a very 
similar case (table creation, where two OIDs are consumed from a global 
sequence when inserting into pg_class). Have a look at 
http://www.slideshare.net/nosys/billion-tables-project-nycpug-2013, 
slides 43-45. We tested this scenario there and showed that table 
creations per second dropped from 10K to a few per second, and then to a 
few per day. In the graphs you can't even tell that more tables were 
being created. At around 8K tables short of the theoretical limit of 4B 
consumed OIDs, the process basically stopped making further insertions.
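The allocation loop behind this degradation can be modeled with a toy sketch (illustrative only, not PostgreSQL code): a global counter is advanced until it lands on an unused OID, so the fuller the OID space, the more probes each single allocation needs.

```python
# Toy model of GetNewOidWithIndex-style allocation (illustrative only):
# probe the counter forward until an unused OID is found.
def allocate(counter, used, oid_space):
    """Advance the counter until a free OID is found; return (oid, probes)."""
    probes = 0
    while True:
        counter = (counter + 1) % oid_space
        probes += 1
        if counter not in used:
            used.add(counter)
            return counter, probes

oid_space = 10_000
_, fast = allocate(0, set(), oid_space)              # empty space: 1 probe
_, slow = allocate(0, set(range(9_990)), oid_space)  # nearly full: ~10K probes
print(fast, slow)   # 1 9990
```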


Hope that this information helps.

Best regards,

Álvaro


--
Álvaro Hernández Tortosa


---
8Kdata





Re: [HACKERS] Commitfest problems

2014-12-13 Thread Álvaro Hernández Tortosa


On 12/12/14 20:43, Josh Berkus wrote:

On 12/12/2014 11:35 AM, Alvaro Herrera wrote:

Uh, really?  Last I looked at the numbers from SPI treasurer reports,
they are not impressive enough to hire a full-time engineer, let alone a
senior one.

The Linux Foundation has managed to pay for Linus Torvalds somehow, so
it does sound possible.  We have a number of companies making money all
over the globe, at least.


I think Álvaro (Herrera) got to the real point when he suggested 
funding a developer. Or three, as was also suggested. But to really nail it 
down, I'd say not only for the CFM role. I think, overall, the PostgreSQL 
community really needs to seriously consider raising funds and paying 
with them for full-time development by some senior developers.


That would also allow development to be oriented more toward what 
the community really wants, rather than what companies or individuals 
happen to work on (which is absolutely great too, of course).



You're looking at this wrong.  We have that amount of money in the
account based on zero fundraising whatsoever, which we don't do because
we don't spend the money.  We get roughly $20,000 per year just by
putting up a donate link, and not even promoting it.

So, what this would take is:

1) a candidate who is currently a known major committer


I think it would be even better to sell this approach as a 
long-term strategy, not tied to any particular candidate. Sure, a 
known major committer is certainly a good selling point; but a 
well-communicated, long-term, foundation-like fundraising strategy to 
improve PostgreSQL in certain ways is the way to go.


2) clear goals for what this person would spend their time doing
+1. That may be decided by the Core Team based on feedback/input from 
the whole list, or something like that.


3) buy-in from the Core Team, the committers, and the general hackers
community (including buy-in to the idea of favorable publicity for
funding supporters)

+1


4) an organizing committee with the time to deal with managing
foundation funds

+1. Absolutely necessary, otherwise funding will not work


If we had those four things, the fundraising part would be easy.  I
speak as someone who used to raise $600,000 per year for a non-profit in
individual gifts alone.


I know it sounds difficult, and surely it is, but I believe the 
PostgreSQL community should be able to raise, globally, a few million 
per year to stably and permanently fund this community-guided 
development and have our best developers devoted 100% to PostgreSQL.



Regards,

Álvaro


--
Álvaro Hernández Tortosa


---
8Kdata





Re: [HACKERS] Repeatable read and serializable transactions see data committed after tx start

2014-11-08 Thread Álvaro Hernández Tortosa


On 06/11/14 15:00, Kevin Grittner wrote:

Álvaro Hernández Tortosa a...@8kdata.com wrote:


 There have been two comments which seem to state that changing this
may introduce some performance problems and some limitations when you
need to take out some locks. I still believe, however, that the current
behavior is confusing for the user. Sure, one option is to patch the
documentation, as I was suggesting.

Yeah, I thought that's what we were talking about, and in that
regard I agree that the docs could be more clear.  I'm not quite
sure what to say where to fix that, but I can see how someone could
be confused and have the expectation that once they have run BEGIN
TRANSACTION ISOLATION LEVEL SERIALIZABLE the transaction will not
see the work of transactions committing after that.  The fact that
this is possible is implied, if one reads carefully and thinks
about it, by the statement right near the start of the Transaction
Isolation section which says any concurrent execution of a set of
Serializable transactions is guaranteed to produce the same effect
as running them one at a time in some order.  As Robert pointed
out, this is not necessarily the commit order or the transaction
start order.

It is entirely possible that if you have serializable transactions
T1 and T2, where T1 executes BEGIN first (and even runs a query
before T2 executes BEGIN) and T1 commits first, that T2 will
appear to have run first because it will look at a set of data
which T1 modifies and not see the changes.  If T1 were to *also*
look at a set of data which T2 modifies, then one of the
transactions would be rolled back with a serialization failure, to
prevent a cycle in the apparent order of execution; so the
requirements of the standard (and of most software which is
attempting to handle race conditions) is satisfied.  For many
popular benchmarks (and I suspect most common workloads) this
provides the necessary protections with better performance than is
possible using blocking to provide the required guarantees.[1]
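The T1/T2 timeline above can be sketched with a toy snapshot model (illustrative only, not PostgreSQL internals): T2's snapshot predates T1's commit, so T2 does not see T1's write, and the apparent serial order is T2 then T1, regardless of the commit order.

```python
# Toy snapshot model of the T1/T2 example (illustrative only, not
# PostgreSQL internals): visibility depends on the snapshot, not on
# wall-clock begin/commit order.
history = []   # committed versions as (commit_seq, key, value)
seq = 0

def commit(key, value):
    global seq
    seq += 1
    history.append((seq, key, value))

def read(key, snapshot, default=None):
    """Return the latest version of key visible to the given snapshot."""
    visible = [v for s, k, v in history if k == key and s <= snapshot]
    return visible[-1] if visible else default

commit("x", 0)                 # initial committed state
t2_snapshot = seq              # T2 runs its first query before T1 commits
commit("x", 1)                 # T1 modifies x and commits first
print(read("x", t2_snapshot))  # 0: T2 appears to have run before T1
print(read("x", seq))          # 1: a later snapshot sees T1's commit
```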


Yes, you're right that the any concurrent execution... phrase 
implicitly means the snapshot may not be taken at BEGIN or SET 
TRANSACTION time, but it's definitely not clear enough for the average 
user. And while that may apply to the serializable case, it doesn't apply 
to repeatable read, where the docs read  The Repeatable Read isolation 
level only sees data committed before the transaction began; it never 
sees either uncommitted data or changes committed during transaction 
execution by concurrent transactions. The first part is confusing, as 
we discussed; the second part is even more confusing, as it says during 
transaction execution -and doesn't the transaction, not the snapshot, 
begin at BEGIN time?


Surprisingly, the language is much clearer on the SET TRANSACTION 
doc page [2].




At any rate, the language in that section is a little fuzzy on the
concept of the start of the transaction.  Perhaps it would be
enough to change language like:

| sees a snapshot as of the start of the transaction, not as of the
| start of the current query within the transaction.

to:

| sees a snapshot as of the start of the first query within the
| transaction, not as of the start of the current query within the
| transaction.

Would that have prevented the confusion here?


I think that definitely helps. But it may be better to make it even 
clearer, more explicit. And we could offer a solution for users who want 
the snapshot taken at BEGIN time, such as suggesting they issue a 
SELECT 1 query.



 But what about adding a flag to the BEGIN and SET TRANSACTION
commands, called IMMEDIATE FREEZE (or something similar), which
applies only to REPEATABLE READ and SERIALIZABLE? If this flag is set
(it may be off by default, but of course the default may be
configurable via a GUC parameter), the freeze happens at BEGIN or SET
TRANSACTION time. This would be a backwards-compatible change, while
providing the option of freezing without the nasty hack of having to
do a SELECT 1 prior to your real queries, and everything would of
course be well documented.

What is the use case where you are having a problem?  This seems
like an odd solution, so it would be helpful to know what problem
it is attempting to solve.


I don't have a particular use case. I just came across the issue 
and thought the documentation and the behavior weren't consistent. So the 
first aim is not to have users surprised (in a bad way). But I do see a 
clear use case: users who might want to open a (repeatable read | 
serializable) transaction to have their view of the database frozen, and 
then perform any later operations on that frozen view. Sure, that comes 
at a penalty, but I find it potentially interesting too.


Regards,

Álvaro



[1] http://www.postgresql.org/docs/9.4/static/transaction-iso.html
[2] http://www.postgresql.org/docs/9.4/static/sql-set-transaction.html

--
Álvaro Hernández Tortosa

Re: [HACKERS] Repeatable read and serializable transactions see data committed after tx start

2014-11-08 Thread Álvaro Hernández Tortosa


On 07/11/14 22:02, Greg Sabino Mullane wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: RIPEMD160


Kevin Grittner wrote:

I think most people have always assumed that
BEGIN starts the transaction and that is the point at
which the snapshot is obtained.

But there is so much evidence to the contrary.  Not only does the
*name* of the command (BEGIN or START) imply a start, but
pg_stat_activity shows the connection idle in transaction after
the command (and before a snapshot is acquired)

Er...I think we are arguing the same thing here. So no contrary
needed? :)


So do we agree to fix the docs? ^_^




Why?  This fix might not deal with the bigger issues that I
discussed, like that the later-to-start and
later-to-acquire-a-snapshot transaction might logically be first in
the apparent order of execution.  You can't fix that without a
lot of blocking -- that most of us don't want.

Right, which is why the suggestion of a user-controllable switch,
that defaults to the current behavior, seems an excellent compromise.


I also think so. It's backwards-compatible and opt-in. It also 
makes the documentation very clear, as there is a specific option for this.





Depending on *why* they think this is important, they might need to
be acquiring various locks to prevent behavior they don't want, in which case
having acquired a snapshot at BEGIN would be exactly the *wrong*
thing to do.  The exact nature of the problem we're trying to solve
here does matter.

I cannot speak to the OP, but I also do not think we should try and
figure out every possible scenario people may have. Certainly the
long-standing documentation bug may have caused some unexpected or
unwanted behavior, so let's start by fixing that.


+1


Tom Lane wrote:

Another thing that I think hasn't been mentioned in this thread is
that we used to have severe problems with client libraries that like
to issue BEGIN and then go idle until they have something to do.
Which, for some reason, is a prevalent behavior.

I'm not advocating changing the default behavior, but I would not want
to see bad client libraries used a reason for any change we make. Clients
should not be doing this, period, and there is no reason for us to
support that.


If the IMMEDIATE FREEZE mode is not the default, as I suggested, 
it shouldn't introduce any problems with existing code.


Regards,

Álvaro





Re: [HACKERS] Repeatable read and serializable transactions see data committed after tx start

2014-11-06 Thread Álvaro Hernández Tortosa


On 06/11/14 02:06, Jim Nasby wrote:

On 11/5/14, 6:04 PM, Álvaro Hernández Tortosa wrote:


On 05/11/14 17:46, Jim Nasby wrote:

On 11/4/14, 6:11 PM, Álvaro Hernández Tortosa wrote:
 Should we then improve the docs to state this more clearly? Any 
objection to doing this?


If we go that route we should also mention that now() will no longer 
be doing what you probably hope it would (AFAIK it's driven by BEGIN 
and not the first snapshot).


 If I understand you correctly, you mean that if we add a note to 
the documentation stating that the transaction really freezes when 
you do the first query, people would expect now() to also be frozen 
at the first query, which is not what happens (it's frozen at tx 
start). Then, yes, you're right, probably *both* the isolation 
levels and the now() function documentation should be patched to 
become more precise.


Bingo.

Hrm, is there anything else that differs between the two?

Perhaps we should change how now() works, but I'm worried about what 
that might do to existing applications...


 Perhaps; I also believe it might not be good for existing 
applications, but now() definitely has a different freeze behavior, 
which seems inconsistent too.


Yeah, I'd really rather fix it...


There have been two comments which seem to state that changing this 
may introduce some performance problems and some limitations when you 
need to take out some locks. I still believe, however, that the current 
behavior is confusing for the user. Sure, one option is to patch the 
documentation, as I was suggesting.


But what about adding a flag to the BEGIN and SET TRANSACTION 
commands, called IMMEDIATE FREEZE (or something similar), which 
applies only to REPEATABLE READ and SERIALIZABLE? If this flag is set 
(it may be off by default, but of course the default may be 
configurable via a GUC parameter), the freeze happens at BEGIN or SET 
TRANSACTION time. This would be a backwards-compatible change, while 
providing the option of freezing without the nasty hack of having to 
do a SELECT 1 prior to your real queries, and everything would of 
course be well documented.


What do you think?


Best regards,

Álvaro

--
Álvaro Hernández Tortosa


---
8Kdata





Re: [HACKERS] Repeatable read and serializable transactions see data committed after tx start

2014-11-05 Thread Álvaro Hernández Tortosa


On 05/11/14 17:46, Jim Nasby wrote:

On 11/4/14, 6:11 PM, Álvaro Hernández Tortosa wrote:
 Should we then improve the docs to state this more clearly? Any 
objection to doing this?


If we go that route we should also mention that now() will no longer 
be doing what you probably hope it would (AFAIK it's driven by BEGIN 
and not the first snapshot).


If I understand you correctly, you mean that if we add a note to 
the documentation stating that the transaction really freezes when you 
do the first query, people would expect now() to also be frozen at the 
first query, which is not what happens (it's frozen at tx start). Then, 
yes, you're right, probably *both* the isolation levels and the now() 
function documentation should be patched to become more precise.
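The inconsistency can be sketched with a toy transaction model (illustrative only, not PostgreSQL internals): now() is fixed when the transaction starts, while the data snapshot is fixed at the first query, so the two freeze points can observe different moments.

```python
# Toy model of the two different freeze points (illustrative only):
# now() is frozen at BEGIN; the data snapshot is frozen at the first query.
import time

class Tx:
    def __init__(self, clock=time.time):
        self.begin_time = clock()   # now() value, frozen at BEGIN
        self.snapshot = None        # data snapshot, frozen lazily

    def now(self):
        return self.begin_time      # same value for the whole transaction

    def first_query(self, commit_seq):
        if self.snapshot is None:
            self.snapshot = commit_seq   # frozen only at the first query
        return self.snapshot

tx = Tx()                           # BEGIN
commit_seq = 10                     # concurrent sessions commit meanwhile
snap = tx.first_query(commit_seq)
print(snap)                         # 10: includes commits made after BEGIN
print(tx.now() == tx.begin_time)    # True: now() ignores the first query
```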




Perhaps we should change how now() works, but I'm worried about what 
that might do to existing applications...


Perhaps; I also believe it might not be good for existing 
applications, but now() definitely has a different freeze behavior, which 
seems inconsistent too.


Thanks,

Álvaro

--
Álvaro Hernández Tortosa


---
8Kdata





Re: [HACKERS] Repeatable read and serializable transactions see data committed after tx start

2014-11-05 Thread Álvaro Hernández Tortosa


On 06/11/14 00:42, Robert Haas wrote:

On Mon, Nov 3, 2014 at 2:14 PM, Álvaro Hernández Tortosa a...@8kdata.com 
wrote:

 Given a transaction started with BEGIN (REPEATABLE READ |
SERIALIZABLE), if a concurrent session commits some data before *any* query
within the first transaction, that committed data is seen by the
transaction. This is not what I'd expect.

I think the problem is with your expectation, not the behavior.


But my expectation is derived from the documentation:

The Repeatable Read isolation level only sees data committed before the 
transaction began;


In PostgreSQL you will see data committed after a BEGIN ... 
(REPEATABLE READ | SERIALIZABLE) statement (but only before the first 
query). And it's reasonable to think that the transaction begins when you 
issue a BEGIN statement. It's also reasonable to think this way because:


- now() is frozen at BEGIN time, as Nasby pointed out
- pg_stat_activity says that the transaction is started, as Kevin mentioned

So if the behavior differs from what the documentation says 
and from what other external indicators point out, I think at least 
the documentation should be clear about this precise behavior, to avoid 
confusing users.



Serializable means that the transactions execute in such a fashion
that their parallel execution is equivalent to some serial order of
execution.  It doesn't say that it must be equivalent to any
particular order that you might imagine, such as the order in which
the transactions commit, or, as you propose, the order in which they
begin.  Generally, I think that's a good thing, because transaction
isolation is expensive: even at repeatable read, but for the need to
provide transaction isolation, there would be no such thing as bloat.
The repeatable read and serializable levels add further overhead of
various kinds.  We could provide a super-duper serializable level that
provides even tighter guarantees, but (1) I can't imagine many people
are keen to make the cost of serialization even higher than it already
is and (2) if you really want that behavior, just do some trivial
operation sufficient to cause a snapshot to be taken immediately after
the BEGIN.



I'm not really asking for a new isolation level, just that either 
BEGIN really freezes (for repeatable read and serializable) or, if that's 
expensive and not going to happen, that the documentation clearly state 
that the snapshot is taken at first-query time, and that if you need to 
freeze before your first real query, you should issue a dummy one instead 
(like SELECT 1). Also, if this early freeze is a performance hit -and 
for that reason BEGIN is not going to be changed to freeze- then that 
should also be pointed out in the documentation, so that users who 
freeze early with SELECT 1-type queries understand the cost.


Regards,

Álvaro

--
Álvaro Hernández Tortosa


---
8Kdata





Re: [HACKERS] Repeatable read and serializable transactions see data committed after tx start

2014-11-04 Thread Álvaro Hernández Tortosa


On 04/11/14 09:07, Craig Ringer wrote:

On 11/04/2014 07:31 AM, Álvaro Hernández Tortosa wrote:

 Thank you for your comment, Tom. However, I think this behavior, as
seen from a user perspective, is not the expected one.

That may be the case, but I think it's the SQL-standard behaviour, so we
can't really mess with it.

The spec requires SET TRANSACTION ISOLATION, and you can't implement
that if you take a snapshot at BEGIN.


It's true that the standard mandates SET TRANSACTION rather than 
setting the isolation level with the BEGIN statement, and in any case 
you can raise/lower the isolation level with SET regardless of what the 
session or the BEGIN command said. However, is it really a problem to 
take a snapshot at BEGIN time --only if the tx is started with BEGIN 
... (REPEATABLE READ | SERIALIZABLE)? AFAIK, and I may be missing some 
internal details here, the worst that can happen is that you take one 
extra, unnecessary snapshot. I don't see that as a huge problem.


The standard (SQL-92) says that a transaction is initiated when a 
transaction-initiating SQL statement is executed. To be fair, that 
sounds to me more like a SELECT than a BEGIN, but I may be wrong.



 If it is still the intended behavior, I think it should be clearly
documented as such, and a recommendation along the lines of issue a
'SELECT 1' right after BEGIN to freeze the data before any of your own
queries, or a similar comment, should be added. Again, as I said in my
email, the documentation clearly says that it only sees data committed
before the transaction began. And this is clearly not the real behavior.

It's more of a difference in when the transaction begins.

Arguably, BEGIN says I intend to begin a new transaction with the
next query rather than immediately begin executing a new transaction.

This concept could be clearer in the docs.


If this is really how it should behave, I'd +1000 making it 
clearer in the docs, and explicitly suggesting that users perform a 
query right after BEGIN, discarding the results (like doing a SELECT 1), 
if they want the state frozen and some time may elapse between BEGIN and 
the real queries to be executed.





 Sure, there are; that was the link I pointed out, but I found no
explicit mention of the issue I'm raising here.

I'm sure it's documented *somewhere*, in that I remember reading about
this detail in the docs, but I can't find _where_ in the docs.

It doesn't seem to be in:

http://www.postgresql.org/docs/current/static/transaction-iso.html

where I'd expect.


Yepp, there's no mention there.



In any case, we simply cannot take the snapshot at BEGIN time, because
it's permitted to:

SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

in a DB that has default serializable isolation or has a SET SESSION
CHARACTERISTICS isolation mode of serializable. Note that SET
TRANSACTION is SQL-standard.


As I said, AFAIK it shouldn't matter much to take the snapshot at 
BEGIN. The worst that can happen is that you end up in read committed 
and need to take more snapshots, one per query.




AFAIK deferring the snapshot that's consistent with other RDBMSes that
use snapshots, too.


I tried Oracle and SQL Server. SQL Server seems to behave like 
PostgreSQL, but only because it locks the table when it is accessed in a 
serializable transaction, so it definitely waits until the SELECT to lock 
it. However, Oracle behaved as I expected: data is frozen at BEGIN time. 
I haven't tested others.





The docs of that command allude to, but doesn't explicitly state, the
behaviour you mention.

http://www.postgresql.org/docs/current/static/sql-set-transaction.html




Should we then improve the docs to state this more clearly? Any 
objection to doing this?


Regards,

Álvaro


--
Álvaro Hernández Tortosa


---
8Kdata





[HACKERS] Repeatable read and serializable transactions see data committed after tx start

2014-11-03 Thread Álvaro Hernández Tortosa

Hi!

Given a transaction started with BEGIN (REPEATABLE READ | 
SERIALIZABLE), if a concurrent session commits some data before *any* 
query within the first transaction, that committed data is seen by the 
transaction. This is not what I'd expect. Specifically, the 
documentation states that:


The Repeatable Read isolation level only sees data committed before the 
transaction began; [1]


IMHO, from a user perspective the transaction begins when the BEGIN 
command is issued. If I really want to see a frozen view of the 
database before any real SELECT, I have to issue another query like 
SELECT 1. This seems odd to me. I understand the tx snapshot may be 
deferred until real execution for performance reasons, but it is 
confusing from a user perspective. Is this really expected, or is it a 
bug? Am I having a bad day and missing some point here? ^_^


Regards,

Álvaro


[1] http://www.postgresql.org/docs/devel/static/transaction-iso.html


P.S. In case it wasn't clear what I meant, here's an example:


-- Setup:
CREATE TABLE i (i integer);

Session 1: BEGIN ISOLATION LEVEL REPEATABLE READ;
Session 2: INSERT INTO i VALUES (1);
Session 1: SELECT i FROM i; -- returns 1 row, value 1; should return empty set
Session 2: INSERT INTO i VALUES (2);
Session 1: SELECT i FROM i; -- returns 1 row, value 1; returns, as it should, the same as the previous query


In the first select, I'd have expected to have no rows. If a SELECT 1 
is issued after BEGIN, there are no rows found.


--
Álvaro Hernández Tortosa


---
8Kdata



Re: [HACKERS] Repeatable read and serializable transactions see data committed after tx start

2014-11-03 Thread Álvaro Hernández Tortosa


On 03/11/14 22:19, Tom Lane wrote:

Álvaro Hernández Tortosa a...@8kdata.com writes:

  IMHO, from a user perspective the transaction begins when the BEGIN
command is issued. If I really want to see a frozen view of the
database before any real SELECT, I have to issue another query like
SELECT 1. This seems odd to me. I understand tx snapshot may be
deferred until real execution for performance reasons, but it is
confusing from a user perspective. Is this really expected, or is it a
bug? Am I having a bad day and missing some point here? ^_^

It's expected.  Without this behavior, you could not take out any locks
before freezing the transaction snapshot, which would be a bad thing.


Thank you for your comment, Tom. However, I think this behavior, as 
seen from a user perspective, is not the expected one. There may be 
internal reasons for it, but for the user executing the 
transaction, it's natural to expect the freezing to happen right after 
the BEGIN, rather than *after* the first query.


If it is still the intended behavior, I think it should be clearly 
documented as such, and a recommendation, such as issuing a 'SELECT 1' 
right after BEGIN to freeze the snapshot before running one's own 
queries, should be added. Again, as I said in my email, the documentation 
clearly says that the transaction only sees data committed before it 
began, and this is clearly not the real behavior.
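
A minimal sketch of the workaround being discussed; since the snapshot is 
taken at the first statement that needs one, any cheap query issued right 
after BEGIN freezes it (the table `i` is reused from the earlier example):

```sql
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT 1;  -- forces the transaction snapshot to be taken here
-- From this point on, the transaction sees only data committed
-- before the SELECT 1, regardless of concurrent commits.
SELECT i FROM i;
COMMIT;
```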




I think there are some examples in the concurrency control chapter
of the manual.


Sure, there are, that was the link I pointed out, but I found no 
explicit mention of the issue that I'm raising here.



Regards,

Álvaro

--
Álvaro Hernández Tortosa


---
8Kdata



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PL/pgSQL 2

2014-09-16 Thread Álvaro Hernández Tortosa


On 04/09/14 18:02, Craig Ringer wrote:

On 09/04/2014 06:48 AM, Joshua D. Drake wrote:

On 09/03/2014 11:48 AM, Robert Haas wrote:


Anyway, to get back around to the topic of PL/SQL compatibility
specifically, if you care about that issue, pick one thing that exists
in PL/SQL but not in PL/pgsql and try to do something about it.  Maybe
it'll be something that EnterpiseDB has already done something about,
in which case, if your patch gets committed, Advanced Server will lose
a bit of distinction as compared with PostgreSQL.  Or maybe it'll be
something that EnterpriseDB hasn't done anything about, and then
everybody comes out strictly ahead.  What I think you shouldn't do
(although you're free to ignore me) is continue thinking of Oracle
compatibility as one monolithic thing, because it isn't, or to pursue
of a course of trying to get the PostgreSQL community to slavishly
follow Oracle, because I think you'll fail, and even if you succeed I
don't think the results will actually be positive for PostgreSQL.

Well put Robert.

Indeed, especially with reference to the size and scope of Oracle. Its
XML library alone is huge.

At best it's reasonable to hope for compatibility with a limited subset
of PL/SQL - and really, we're a good way there already, with most of
what's missing being down to missing core server features or things
PostgreSQL just does differently.

True Oracle compatibility (for procedures) pretty much requires an
embedded JVM with a rich class library. Since PL/Java seems to be dying
a slow death by neglect and disinterest I don't think it's likely anyone
would be tackling compatibility with the embedded JVM features anytime soon.

There are a few things I would like to see, like secure session
variables in PL/PgSQL. Mostly, though, I think talk of Oracle
compatibility seems to be something that comes up before the speaker
has really understood what that would mean, and the sheer scope of the
endeavour.

It's not going from 50% compatible to 80% compatible, it's going from 5%
compatible to 7% compatible. The most used 5% maybe, but still...



Getting that 5% of what is most used would be a great gain. Maybe 
the speaker is misled about the size of the endeavour, but is quite sure 
about what that market's needs are ;)


Cheers,

Álvaro






Re: [HACKERS] PL/pgSQL 2

2014-09-16 Thread Álvaro Hernández Tortosa


On 03/09/14 20:48, Robert Haas wrote:

On Tue, Sep 2, 2014 at 5:47 PM, Álvaro Hernández Tortosa a...@nosys.es wrote:

 Yeah, we differ there. I think having an Oracle compatibility layer in
PostgreSQL would be the-next-big-thing we could have. Oracle has an
orders-of-magnitude bigger user base than postgres has; and having the ability to
attract them would bring us many many more users which, in turn, would
benefit us all very significantly.

 It would be my #1 priority to do in postgres (but yes, I know -guess-
how hard and what resources that would require). But dreaming is free :)

There are a number of reasons why this isn't really practical.

First, Oracle compatibility isn't one feature.  The compatibility
layer that exists in EnterpriseDB's Advanced Server product consists
of many different changes to many different parts of the system.  A
few of those changes are simple syntax compatibility, where we do the
exact same thing PostgreSQL does but with different syntax, but a lot
of them are functional enhancements.  Even within SPL, there's a whole
bunch of different changes to a whole bunch of different areas, and
most of those are functional enhancements rather than just tweaking
syntax.  So, if you tried to implement a new, Oracle-compatible PL,
you'd find that you don't have one or a small number of changes to
make, but a long series of features ranging from small to very large.
You'd also find that adding a new PL, without changing any other parts
of the server, only bridges a small part of the compatibility gap.


Hi Robert, thanks for the insights here. Understood that the 
compatibility layer is not a single thing, and that building such 
compatibility pieces is surely a very long and involved task.


However, I don't see anything bad in having one or some parts of 
it. For example, having a PL that is similar, maybe only in syntax, is a 
good thing. Sure, there are lots of things that can't be done simply, 
tons of functions not available and so on, but that alone would mean 
Oracle users would feel more comfortable and would find their current 
code easier to port. That would already be a lot.




Second, if you did manage to develop something which was significantly
more compatible with Oracle than PostgreSQL or PL/pgsql is today,
you'd probably find that the community wouldn't accept it.  It's
almost misleading to think of Oracle as a database; it's an enormous
software ecosystem with facilities for doing just about everything
under the sun, and many of those things more than one way.  For
example, in 9.4, EnterpriseDB will be releasing a UTL_HTTP package
that contains many of the same interfaces that are present in Oracle.
The interface decisions made by Oracle Corporation are reasonable in
view of their architecture, but I am quite sure that this community
would not want, for example, to return long text values as SETOF
VARCHAR(2000) rather than TEXT, just because Oracle does that.  And
rightly so: I wouldn't want PostgreSQL to follow any other product
that slavishly whether I worked at EnterpriseDB or not.  This kind of
thing crops up over and over again, and it only works to say that
PostgreSQL should choose the Oracle behavior in every case if you
believe that the primary mission of PostgreSQL should be to copy
Oracle, and I don't.  I also don't think it's a bad thing that
Advanced Server makes different decisions than PostgreSQL in some
cases.  A further problem is that, in this particular case, you'd
probably hear the argument from PostgreSQL hackers that they really
don't want to be burdened with maintaining an HTTP client in the core
server when the same thing could be done from an extension, and that's
a valid argument, too.  It is also valid for EnterpriseDB to make a
different decision for itself, based on business priorities.


I wouldn't follow those routes just to achieve perfect 
compatibility; I agree, and I'd never push for those. In the light of 
all these things, I'd never expect perfect, scientific compatibility, 
but a best-effort, well-documented one.




Now, none of that is to say that we wouldn't do well to give a little
more thought to Oracle compatibility than we do.  We've either made or
narrowly avoided a number of decisions over the years which introduced
- or threatened to introduce - minor, pointless incompatibilities with
other database products, Oracle included.  That really doesn't benefit
anyone.  To take another example, I've been complaining about the fact
that PostgreSQL 8.3+ requires far more typecasts in stored procedures
than any other database I'm aware of for years, probably since before
I joined EnterpriseDB.  And I still think we're kidding ourselves to
think that we've got that right when nobody else is doing something
similar.  I don't think the community should reverse that decision to
benefit EnterpriseDB, or to be compatible with Oracle: I think the
community should reverse that decision because it's stupid

Re: [HACKERS] PL/pgSQL 2

2014-09-03 Thread Álvaro Hernández Tortosa


On 03/09/14 15:24, Joshua D. Drake wrote:


On 09/02/2014 04:01 PM, Álvaro Hernández Tortosa wrote:


 It's not copying. It's easing a path for others to migrate and
come to Postgres.

 I'm interested why you are more interested in MSSQL. My reasons for
being interested in Oracle are:

- It has more users (biggest and above all, the main reason: we could
attract more)
- Postgres is perceived as similar to Oracle (so migration is likely
to be easier)

 That's all I want: grow the postgres user base by attracting Oracle 
users :)


I find that we have more opportunity to replace MSSQL than Oracle. 
Obviously it depends on a lot of things but my goal is as yours, just 
with a different database.



Honestly, I don't care whether it's MSSQL or Oracle. What I want is to 
attract more users, get Postgres out of where it is and appeal to even 
more users. In that regard, Oracle or MSSQL doesn't matter to me.


That's why, if you have some time, I'd love to hear why you 
think there is more opportunity to replace MSSQL. We may continue that 
privately, as it is a little bit off-topic.


Thanks!

Álvaro





Re: [HACKERS] PL/pgSQL 2

2014-09-02 Thread Álvaro Hernández Tortosa


On 02/09/14 05:24, Craig Ringer wrote:

I couldn't disagree more.

If we were to implement anything, it'd be PL/PSM
(http://en.wikipedia.org/wiki/SQL/PSM). I'm sure it's as bizarre and
quirky as anything else the SQL committee has brought forth, but it's at
least a standard(ish) language.
So we'd choose a bizarre and quirky language instead of anything 
better just because it's standard. Current and prospective users will 
surely prefer a bizarre and quirky language that is standard-approved 
over a modern, comfortable, easy-to-use one that is not blessed by the 
ISO. No doubt ^_^




Creating a new language when there are already many existing contenders
is absolutely nonsensical. Other than PL/PSM the only thing that'd make
any sense would be to *pick a suitable existing language* like Lua or
JavaScript and bless it as a supported, always-available, in-core
language runtime that's compiled in by default.


That is, in my opinion, a far more sensible choice. Blessing 
PL/JavaScript as an in-core language would be a very wise choice.


Álvaro





Re: [HACKERS] PL/pgSQL 2

2014-09-02 Thread Álvaro Hernández Tortosa


On 02/09/14 06:40, Tom Lane wrote:

Craig Ringer cr...@2ndquadrant.com writes:

If someone came up with a convincing PL/SQL compatibility layer then
it'd be worth considering adopting - when it was ready. But of course,
anyone who does the work for that is quite likely to want to sell it to
cashed-up Oracle users looking to save a few hundred grand on per-CPU
licensing.

As a case in point, EDB have spent quite a few man-years on their Oracle
compatibility layer; and it's still not a terribly exact match, according
to my colleagues who have looked at it.  So that is a tarbaby I don't
personally care to touch ... even ignoring the fact that cutting off
EDB's air supply wouldn't be a good thing for the community to do.

regards, tom lane




OK, so this compatibility layer is tough. Knew that already ;) But 
on the other hand, the syntax is similar to plpgsql, right? So what 
about just having a compatible syntax? It would be the first step towards 
that compatibility layer, which could, or could not, be a long-term goal 
for postgres (having the whole layer).


I don't buy that having that would cut off EDB's air supply. They're 
doing great, and they know how to take care of themselves, I'm sure ;) 
Besides, competition is always positive, and I'm sure they'd be 
more benefited than harmed by postgres having that layer.


If we are to have another plpgsql-like language (like plpgsql2), and 
we could design it so it would attract many many users (let's not forget 
that Oracle may have around two orders of magnitude more users than pg), 
that would benefit us all greatly. Even if not perfect. Even if it is a 
longer project which spans more than one release. But just having the 
syntax (or most of it, maybe avoiding some complex unimplemented 
postgres features, if those required a huge effort) would be a big win.


For 9.4, we already have the media saying Postgres has NoSQL 
capabilities (which is only partially true). For x.y we could have the 
media saying Postgres adds Oracle compatibility (which would also be only 
partially true). But that would bring a lot of users to postgres, and that 
helps us all.


And it could also serve as a motivation to implement those 
missing in-core features that Oracle has.


If, on the other hand, we give up on attracting Oracle users, at a 
moment when non-Oracle databases are fighting for them, and we lose 
here, well, let's at least have a very compelling, attractive, 
in-core, blessed language. Even disliking it myself, PL/JavaScript 
would be my #1 candidate there.


My 4 (already) cents,

Álvaro




Re: [HACKERS] PL/pgSQL 2

2014-09-02 Thread Álvaro Hernández Tortosa


On 02/09/14 11:34, Mark Kirkwood wrote:

On 02/09/14 21:25, Álvaro Hernández Tortosa wrote:


On 02/09/14 05:24, Craig Ringer wrote:

I couldn't disagree more.

If we were to implement anything, it'd be PL/PSM
(http://en.wikipedia.org/wiki/SQL/PSM). I'm sure it's as bizarre and
quirky as anything else the SQL committee has brought forth, but 
it's at

least a standard(ish) language.

 So we'd choose a bizarre and quirky language instead of anything
better just because it's standard. I'm sure current and prospective
users will surely prefer a bizarre and quirky language that is standard
approved, rather than a modern, comfortable, easy-to-use, that is not
embodied by the ISO. No doubt ^_^



Well there is the risk that by randomly adding new syntax to PL/pgSQL 
we turn it in a bizarre and quirky *non standard* language. Part of 
the attraction of PL/pgsql is that it is Ada like - if we break that 
too much then...well...that would be bad. So I think a careful balance 
is needed, to add new features that keep the spirit of the original 
language.




I agree. I don't think I have suggested adding new syntax to pl/pgsql. 
But having its syntax similar to Ada is IMHO not a good thing. Few 
prospective postgres users would be attracted by that. They are 
attracted by JavaScript, Python, Scala or Ruby, to name a few, but 
definitely not Ada.


Regards,

Álvaro




Re: [HACKERS] PL/pgSQL 2

2014-09-02 Thread Álvaro Hernández Tortosa


On 02/09/14 11:31, Pavel Stehule wrote:




2014-09-02 11:25 GMT+02:00 Álvaro Hernández Tortosa a...@nosys.es 
mailto:a...@nosys.es:



On 02/09/14 05:24, Craig Ringer wrote:

I couldn't disagree more.

If we were to implement anything, it'd be PL/PSM
(http://en.wikipedia.org/wiki/SQL/PSM). I'm sure it's as
bizarre and
quirky as anything else the SQL committee has brought forth,
but it's at
least a standard(ish) language.

So we'd choose a bizarre and quirky language instead of
anything better just because it's standard. I'm sure current and
prospective users will surely prefer a bizarre and quirky language
that is standard approved, rather than a modern, comfortable,
easy-to-use, that is not embodied by the ISO. No doubt ^_^


SQL/PSM is used in DB2, Sybase Anywhere, MySQL,


That's a far better argument than just "it's standard" :)))

Still, I think postgres is in a position to attract more 
Oracle users than DB2+Sybase+MySQL users


Álvaro



Re: [HACKERS] PL/pgSQL 2

2014-09-02 Thread Álvaro Hernández Tortosa


On 02/09/14 11:44, Pavel Stehule wrote:






For 9.4, we have the media already saying Postgres has NoSQL
capabilities (which is only partially true). For x.y we could
have the media saying Postgres adds Oracle compatibility (which
would be only partially true). But that brings a lot of users to
postgres, and that helps us all.


A partial truth can make a lot of people hate postgres too. 
False promises are wrong


Then let's stop talking about postgres being NoSQL. NoSQL is 
basically schema-less (a really bad name) plus infinite scalability 
(which basically means transparent sharding). We very clearly fail to 
provide the latter...


Álvaro


Re: [HACKERS] PL/pgSQL 2

2014-09-02 Thread Álvaro Hernández Tortosa


On 02/09/14 11:56, Pavel Stehule wrote:




2014-09-02 11:50 GMT+02:00 Álvaro Hernández Tortosa a...@nosys.es 
mailto:a...@nosys.es:



On 02/09/14 11:31, Pavel Stehule wrote:




2014-09-02 11:25 GMT+02:00 Álvaro Hernández Tortosa a...@nosys.es
mailto:a...@nosys.es:


On 02/09/14 05:24, Craig Ringer wrote:

I couldn't disagree more.

If we were to implement anything, it'd be PL/PSM
(http://en.wikipedia.org/wiki/SQL/PSM). I'm sure it's as
bizarre and
quirky as anything else the SQL committee has brought
forth, but it's at
least a standard(ish) language.

So we'd choose a bizarre and quirky language instead of
anything better just because it's standard. I'm sure current
and prospective users will surely prefer a bizarre and quirky
language that is standard approved, rather than a modern,
comfortable, easy-to-use, that is not embodied by the ISO. No
doubt ^_^


SQL/PSM is used in DB2, Sybase Anywhere, MySQL,


That's a far better argument than just "it's standard" :)))

Still, I think postgres is in the position of attracting more
Oracle than DB2+Sybase+MySQL users


Not all can be happy :)

We can implement SQL/PSM in conformity with ANSI SQL. But we cannot 
implement PL/SQL to be even 20% compatible with Oracle (aggregates, 
pipelined functions, collections) without rewriting a lot of code.


I remember lots of projects that promised compatibility with Oracle 
based on Firebird -- all are dead. Now the situation is a little bit 
different - there is big pressure to migrate from Oracle, but Oracle 
is too big a monster.



OK. Thanks for all the info I was missing about this complexity; I 
see that it goes well beyond the syntax thing.


However, I'd insist that this should IMHO be a big priority, and 
I'd set it as a long-term goal. Even better if it could have a phased 
approach; that would make a lot of people happier (targeting the most 
used functionality first). I'm sure pushing ourselves to implement those 
missing features would also be really good.


In the meantime, having another language (probably not plpgsql2) 
that is modern and appealing to many users would be a very nice win.


Regards,

Álvaro



Re: [HACKERS] PL/pgSQL 2

2014-09-02 Thread Álvaro Hernández Tortosa


On 02/09/14 12:46, Marko Tiikkaja wrote:

On 9/2/14 11:40 AM, Álvaro Hernández Tortosa wrote:
  If we are to have another plpgsql-like language (like plpgsql2) 
and

we could design it so it would attract many many users (let's not forget
that Oracle may have around two orders of magnitude more users than pg),
that would benefit us all greatly. Even if not perfect. Even if it is a
longer project which spans more than one release. But just having the
syntax (or most of it, maybe avoiding some complex unimplemented
postgres features, if that required a huge effort) is a big win.


Have you looked at 
http://www.postgresql.org/docs/9.3/static/plpgsql-porting.html already? 


This very page shows examples of things that could be done at a 
language level that would make it much easier to port from PL/SQL (if 
you don't use that unsupported stuff). At least for those, if the syntax 
were exactly the same, it could make things much more comfortable (I'm 
not aiming for a zero-effort port, at least in the first place, but to 
make it 80% or 60% easier than now).



 As far as I can tell, that already *is* the case as far as the 
language goes.  It seems to me that most of the stuff that's different 
between the two are things that are out of the control of the language 
(no autonomous transactions, function source code in a literal etc.)


Maybe it would be interesting to analyze:

- What is impossible to have right now in postgres
- What can be implemented in a different way, but would work in 
postgres

- What could be somehow emulated

And adapt the syntax as much as possible, aiming for the greatest 
compatibility possible.





  For 9.4, we have the media already saying Postgres has NoSQL
capabilities (which is only partially true). For x.y we could have the
media saying Postgres adds Oracle compatibility (which would be only
partially true). But that brings a lot of users to postgres, and that
helps us all.


This would be a horrible, horrible lie.


Certainly not more horrible than today's "PostgreSQL has NoSQL". 
Despite that, I'm not saying I'd lie; I'd say what the media would say, 
which is completely different.





  If on the other hand we resign from attracting Oracle users, in a
moment where non-Oracle databases are fighting for them. and we lose
here well, let's at least have a very compelling, attractive,
in-core, blessed, language. Even disliking it myself, PL/JavaScript
would be my #1 candidate there.


The best part about PL/PgSQL is the seamless integration with SQL.  
You can put an SQL expression pretty much anywhere.  How well would 
that work if the framework was Javascript instead of the ADA-like 
body that both PL/SQL and PL/PgSQL implement?


SQL integration is a must in a PL/* language, that's for sure. But 
leveraging a well-known language, tooling, and, especially, external 
libraries/ecosystem is a much bigger win. Especially since all the 
languages that I know of are capable (with more or less effort) of 
integrating SQL. So maybe JavaScript with a way of integrating SQL would 
be preferable IMO.
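
As a hedged illustration of the "seamless integration" being discussed, 
a PL/pgSQL body can embed SQL statements directly; this minimal sketch 
assumes a hypothetical `users` table with `id` and `active` columns:

```sql
-- SQL appears as first-class statements in the function body;
-- SELECT ... INTO moves a query result into a PL variable with no glue code.
CREATE FUNCTION active_users_since(min_id integer) RETURNS bigint AS $$
DECLARE
    n bigint;
BEGIN
    SELECT count(*) INTO n
      FROM users
     WHERE id >= min_id AND active;
    RETURN n;
END;
$$ LANGUAGE plpgsql;
```

In a JavaScript PL, by contrast, the same query would typically go through 
an executor call (e.g. plv8's `plv8.execute()`), so SQL shows up as strings 
rather than as part of the language itself.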


Regards,

Álvaro




Re: [HACKERS] PL/pgSQL 2

2014-09-02 Thread Álvaro Hernández Tortosa


On 02/09/14 17:03, Hannu Krosing wrote:

On 09/02/2014 11:52 AM, Álvaro Hernández Tortosa wrote:


On 02/09/14 11:44, Pavel Stehule wrote:






For 9.4, we have the media already saying Postgres has
NoSQL capabilities (which is only partially true). For x.y we
could have the media saying Postgres adds Oracle compatibility
(which would be only partially true). But that brings a lot of
users to postgres, and that helps us all.


A partial truth can make a lot of people hate postgres too. 
False promises are wrong


Then let's stop talking about postgres being NoSQL. NoSQL is 
basically schema-less (really bad name) plus infinite scalability 
(which basically means transparent sharding). We fail to provide the 
latter very clearly...
Have you ever tried any real NoSQL product's version of "infinite 
scalability"?
Yes, and they are absolutely not infinite, and they suck in many 
other places. But they scale beyond one node, transparently, something 
that postgres doesn't. And regardless, this is what people are buying, 
whether we like it or not.




We are no worse than most if you use just the unstructured part (which 
is what the NoSQL crowd provides) and something like pl/proxy for scaling.


We are definitely worse. This is the problem: we only look at our 
own navel. All NoSQL systems scale *easily* and *transparently* beyond 
one node. Postgres doesn't. I'm not saying they don't suck at many many 
other things, or that some of them may be a worse solution than the 
problem. But even though JSON/JSONB in pg is awesome, it's far, far away 
from what we need to compete against NoSQL in these regards.


Ask anyone not in the postgres world to use pl/proxy for scaling 
and they will run away to mongo/whatever. Talk about HA... and the 
discussion is over :( I know how hard these problems are under the 
general, transactional approach that postgres takes, and that NoSQL does 
this only for very simple, non-ACID cases, but it does it. Hence, we 
cannot claim NoSQL compliance just because we have jsonb. Unfortunately :( 
(Surely we have many other virtues, but let's not say that we have NoSQL 
capabilities, because we don't, while others, better or worse, do.)


Regards,


Álvaro



Re: [HACKERS] PL/pgSQL 2

2014-09-02 Thread Álvaro Hernández Tortosa


On 02/09/14 18:20, Joel Jacobson wrote:

On Tue, Sep 2, 2014 at 6:09 PM, Kevin Grittner kgri...@ymail.com wrote:

Joel Jacobson j...@trustly.com wrote:


Sorry for being unclear, I didn't mean to suggest the main concern is
updating *all* rows.
The main concern is when you have a rather complex UPDATE WHERE clause,
aiming to update exactly one row. Some of the expressions might be
assertions, to just double-verify the values and to make it stand-out
you are checking those expressions.


These are two different problems which probably need two different
solutions.  Making the default behavior of a set-based command that
it throw an error if the resulting set is not exactly one row
doesn't seem like the right solution to either one of them.

I see your point.
Basically, we have two types of applications where PL/pgSQL is commonly used.
a) OLTP applications where you typically operate on one row for each
UPDATE command.
b) Data warehouseing applications where you process multiple rows in
each UPDATE command.

Both have different desired default behaviours of the different
set-based commands used in PL/pgSQL.
I think both are important enough to motivate a nice syntax for both use-cases.
If we cannot change the default behaviour of UPDATE, then I vote for
the eariler proposed STRICT UPDATE syntax.


I see both use cases, but I think SQL's default is set 
operations. For this particular point, if there were just a syntax 
change rather than a new language, along the lines of the ONE ROW UPDATE 
syntax, I'd say UNIQUE UPDATE. This {ONE ROW, UNIQUE} syntax might also 
enforce having a WHERE clause. I find both better than STRICT, which 
does not do what you may think it does.
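
For reference, the behavior being debated can already be enforced by hand 
in today's PL/pgSQL using GET DIAGNOSTICS; a minimal sketch, where the 
`accounts` table and its columns are hypothetical:

```sql
-- Update that must affect exactly one row, or raise an error.
CREATE FUNCTION debit(in_id integer, amount numeric) RETURNS void AS $$
DECLARE
    n bigint;
BEGIN
    UPDATE accounts SET balance = balance - amount WHERE id = in_id;
    GET DIAGNOSTICS n = ROW_COUNT;  -- rows affected by the last statement
    IF n <> 1 THEN
        RAISE EXCEPTION 'expected to update exactly 1 row, updated %', n;
    END IF;
END;
$$ LANGUAGE plpgsql;
```

The proposals in this thread are essentially about making this check 
implicit in the UPDATE syntax rather than three extra statements.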


Regards,

Álvaro




Re: [HACKERS] PL/pgSQL 2

2014-09-02 Thread Álvaro Hernández Tortosa


On 02/09/14 18:33, Hannu Krosing wrote:

On 09/02/2014 06:27 PM, Joel Jacobson wrote:

On Tue, Sep 2, 2014 at 6:11 PM, Álvaro Hernández Tortosa a...@nosys.es wrote:

 We are definitely worse. This is the problem, we only look to our own
belly bottom (if this expression exists in English). All NoSQL scale
*easily*, *transparently* beyond one node. Postgres doesn't. I'm not saying
they don't suck at many many other things, or that some of them may be worse
solution than the problem. But despite JSON/JSONB in pg is awesome, it's far
far away from what we need to compete against NoSQL in these regards.

So the discussion started out with a desire to improve PL/pgSQL. Now
somehow NoSQL and JSON is discussed in the same thread. Interesting.
Godwin's Law never fails :-)
http://en.wikipedia.org/wiki/Godwin's_law

Not to mention completely unsubstantiated claims about *all* NoSQL
scaling *easily* and *transparently* beyond one node :)



Honestly, this is off-topic and we could argue forever, but 
regardless of whether all of them do or not, what's sure is that Postgres 
doesn't have horizontal scalability. Period.


And this is what we should look at. And we can't claim we're NoSQL 
until we have (easy, transparent) horizontal scalability.


Best,

Álvaro





Re: [HACKERS] PL/pgSQL 2

2014-09-02 Thread Álvaro Hernández Tortosa


On 02/09/14 23:11, David Johnston wrote:
On Tue, Sep 2, 2014 at 4:48 PM, Joshua D. Drake j...@commandprompt.com 
mailto:j...@commandprompt.comwrote:



On 09/02/2014 09:48 AM, Bruce Momjian wrote:

As a case in point, EDB have spent quite a few man-years
on their Oracle
compatibility layer; and it's still not a terribly exact
match, according
to my colleagues who have looked at it.  So that is a
tarbaby I don't
personally care to touch ... even ignoring the fact that
cutting off
EDB's air supply wouldn't be a good thing for the
community to do.


What any commercial entity and the Community do are mutually
exclusive and we can not and should not determine what features we
will support based on any commercial endeavor.


From where I sit the mutually exclusive argument doesn't seem to be 
true - and in fact is something I think would be bad if it were.  We 
shouldn't be afraid to add features to core that vendors are offering 
but at the same time the fact that the Oracle compatibility aspects 
are commercial instead of in-core is a plus to help ensure that there 
are people making a decent living off PostgreSQL and thus are invested 
in its future


Definitely, we shouldn't be afraid to add any feature to core if we 
(as a community) like it and can do it. And for sure, commercial 
versions and consultancy companies need to make a living, and we should 
care about them all (that includes myself, my company, of course). But 
there is plenty of space for all, especially with an Oracle compatibility 
layer. That would attract many many many users to postgres, and we all 
(including EDB, of course) would immediately benefit from it. Of course 
the community too.


Plus, competition is never bad: it's the key to progress. Even if 
it stole some business from EDB, having to compete with PostgreSQL 
would push them to improve and differentiate, becoming better. So I 
don't see any problem there.


Of course, that's only my view :)

Best,

Álvaro



Re: [HACKERS] PL/pgSQL 2

2014-09-02 Thread Álvaro Hernández Tortosa


On 02/09/14 23:34, Joshua D. Drake wrote:


On 09/02/2014 02:11 PM, David Johnston wrote:

On Tue, Sep 2, 2014 at 4:48 PM, Joshua D. Drake j...@commandprompt.com
mailto:j...@commandprompt.comwrote:


On 09/02/2014 09:48 AM, Bruce Momjian wrote:

As a case in point, EDB have spent quite a few man-years on
their Oracle
compatibility layer; and it's still not a terribly exact
match, according
to my colleagues who have looked at it.  So that is a
tarbaby I don't
personally care to touch ... even ignoring the fact that
cutting off
EDB's air supply wouldn't be a good thing for the community
to do.


What any commercial entity and the Community do are mutually
exclusive and we can not and should not determine what features we
will support based on any commercial endeavor.


From where I sit the mutually exclusive argument doesn't seem to be
true - and in fact is something I think would be bad if it were.  We
shouldn't be afraid to add features to core that vendors are offering
but at the same time the fact that the Oracle compatibility aspects are
commercial instead of in-core is a plus to help ensure that there are
people making a decent living off PostgreSQL and thus are invested in


Far more people make a very good living off of PostgreSQL than *any* 
commercial variant. I stand by what I said. It is not the 
responsibility, or the concern, of the community what a commercial vendor 
does or does not do with their fork, except possibly to implement the 
open source equivalent where it makes sense or where licensing may not 
be followed. (Read: I don't care about oracle compatibility)



Yeah, we differ there. I think having an Oracle compatibility layer 
in PostgreSQL would be the-next-big-thing we could have. Oracle has an 
orders-of-magnitude bigger user base than postgres; and having the 
ability to attract them would bring us many, many more users which, in 
turn, would benefit us all very significantly.


It would be my #1 priority to do in postgres (but yes, I know 
-guess- how hard and what resources that would require). But dreaming is 
free :)


Álvaro



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PL/pgSQL 2

2014-09-02 Thread Álvaro Hernández Tortosa


On 03/09/14 00:41, Joshua D. Drake wrote:


On 09/02/2014 02:47 PM, Álvaro Hernández Tortosa wrote:


 Yeah, we differ there. I think having an Oracle compatibility layer
in PostgreSQL would be the-next-big-thing we could have. Oracle has an
orders-of-magnitude bigger user base than postgres; and having the
ability to attract them would bring us many, many more users which, in
turn, would benefit us all very significantly.

 It would be my #1 priority to do in postgres (but yes, I know
-guess- how hard and what resources that would require). But dreaming is
free :)


Oracle compatibility certainly has merit, I just don't see it as 
useful for core. I would be far more interested in MSSQL compatibility 
honestly. That said, Postgres itself is a rockstar and I think we can 
make our own case without having to copy others.


It's not copying. It's easing a path for others to migrate and 
come to Postgres.


I'm curious why you are more interested in MSSQL. My reasons for 
being interested in Oracle are:


- It has more users (the biggest reason and, above all, the main one: we could 
attract more)
- Postgres is perceived as similar to Oracle (so migration is likely 
to be easier)


That's all I want. Grow postgres userbase, attracting Oracle users :)

Álvaro




Re: [HACKERS] Tips/advice for implementing integrated RESTful HTTP API

2014-09-02 Thread Álvaro Hernández Tortosa


On 02/09/14 04:47, Dobes Vandermeer wrote:


Same idea as PgBouncer or PgPool. The advantage over hacking
PgBouncer/PgPool for the job is that Tomcat can already do a lot
of what
you want using built-in, pre-existing functionality. Connection pool
management, low level REST-style HTTP processing, JSON handling
etc are
all done for you.


Yeah, those are nice conveniences but I still think installing Java 
and getting something to run on startup is a bit more of a hurdle. 
Better to make life easier up front by having a simple standalone proxy 
you can compile and run with just whatever is already available on a 
typical AWS Ubuntu environment.


If instead of Tomcat you use Jetty, you can embed the whole 
app+Jetty+dependencies in a single executable JAR, which eases 
deployment a lot. Installing a JVM in an Ubuntu environment is just one 
apt-get, and it's even easier if you use CloudFormation for automation. I 
don't think it's a bad choice at all... you get most of the functionality 
you want already there, as Craig said, and it's lightweight.


Hope it helps,

Álvaro



Re: [HACKERS] PL/pgSQL 2

2014-09-01 Thread Álvaro Hernández Tortosa


On 01/09/14 14:27, Joel Jacobson wrote:

On Mon, Sep 1, 2014 at 1:30 PM, Pavel Stehule pavel.steh...@gmail.com wrote:

I agree with Andres - it is not good for plpgsql or for plpgsql users.
The benefit must be significant for 90% of users.

...

An official implementation of plpgsql2 could send a very wrong and dangerous signal -
so we should not do it.

Do you argue the introduction of plpgsql2 would hurt the users of
plpgsql in some way? How?

If you have X% who continue to happily use plpgsql, and (100-X%) who
find they can use plpgsql2 in their project, for new functions or all
functions (for a new project), then you have made (100-X)% of the
users more happy, than they would be if they were forced to use
plpgsql and suffer from its problems.

It *would* be a problem if you had to choose between writing all
functions in either plpgsql or plpgsql2, but thanks to postgres support
for different pl-languages and mixing different languages in the same
project, I cannot see the problem.


What is clear, from my non-hacker, casual hackers-ml-reader 
perspective, is that there is room for new language features, or even for 
a new in-core language. I find Joel's reasoning quite clear about the 
general concepts of improving on plpgsql, although the precise changes 
may not be big enough on their own to justify a new version. But if there are 
enough changes, and breaking compatibility with the current plpgsql is a 
major concern, I fail to buy the other arguments for why doing plpgsql2 is a 
bad thing. The comparisons with Python/Perl are very misleading, as they 
have nothing to do with Postgres, and the case is obviously different.


What I can add is that, if Postgres is to devote resources to a new 
language, I would plan it with a broader scope. What would attract most 
users? Would it bring non-postgres users to Postgres? What could be one 
of the killer features of any next version? My trivial answer to most of 
these questions is: PL/SQL. I don't know in detail how complex this would 
be to get into Postgres (well, EDB probably knows), but if I had to choose 
a new language, this is it. So my questions would rather be:


- Is it feasible (resources, time, interest) to implement PL/SQL in 
Postgres?
- Would it support all the new features Joel and others 
mentioned in this thread as desired for the new language?
- If the answer to the previous question is no, could those unsupported 
features be implemented as a compatible superset of PL/SQL?


Sorry if this sounds too unconventional for this list, but this is 
what IMVHO many users would be more pleased with.


My 2 cents,

Álvaro




Re: [HACKERS] PL/pgSQL 2

2014-09-01 Thread Álvaro Hernández Tortosa


On 01/09/14 20:42, Tom Lane wrote:

Álvaro Hernández Tortosa a...@nosys.es writes:

  What I can add is that, if Postgres is to devote resources to a new
language, I would plan it with a broader scope. What would attract most
users? Would it bring non postgres users to Postgres? What could be one
of the killer features of any next version? My trivial answer to most of
these questions is: PL/SQL.

By that I suppose you mean "I wish it would act just like Oracle".
The problem with such a wish is that a lot of the incompatibilities
with Oracle are functions of the core SQL engine, not of the PL.
plpgsql already is about as close to PL/SQL as it's possible to get
without changing core Postgres behavior --- or at least, that was
the original design desire, and I don't think that it's failed in
any large degree.

regards, tom lane


It's true that some of the incompatibilities are in the core engine, 
internal functions and so on, and that the plpgsql design goal was to 
achieve similarity. But similarity is not code compatibility, and, 
afaik, plpgsql is not code compatible with PL/SQL. Having 1:1 code 
compatibility, if possible, would be a very good first step, to be 
followed by the core functionalities you mention.
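

To make that concrete, here is an illustrative sketch (the procedure name 
and table are invented for this example) of how the same trivial routine is 
declared in Oracle's PL/SQL versus today's PL/pgSQL; even before any 
engine-level differences, the source code is not interchangeable:

```sql
-- Oracle PL/SQL: procedure header, IN mode, VARCHAR2, no LANGUAGE clause
CREATE OR REPLACE PROCEDURE log_event(p_msg IN VARCHAR2) IS
BEGIN
  INSERT INTO events(msg) VALUES (p_msg);
END log_event;
/

-- PL/pgSQL equivalent: a function returning void (no CREATE PROCEDURE in
-- Postgres), dollar-quoted body, explicit LANGUAGE clause, text for VARCHAR2
CREATE OR REPLACE FUNCTION log_event(p_msg text) RETURNS void AS $$
BEGIN
  INSERT INTO events(msg) VALUES (p_msg);
END;
$$ LANGUAGE plpgsql;
```

Every one of those header differences has to be rewritten by hand when 
porting, which is exactly the gap a 1:1-compatible language would close.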


If postgres were going for a new language, why not implement one 
which, besides the other suggested functionality, also has 1:1 PL/SQL 
code compatibility? I'm sure it's no trivial task, but it is a highly desirable one.


Regards,

Álvaro





Re: [HACKERS] PL/pgSQL 2

2014-09-01 Thread Álvaro Hernández Tortosa


On 01/09/14 21:08, Pavel Stehule wrote:




2014-09-01 20:58 GMT+02:00 Álvaro Hernández Tortosa a...@nosys.es:



On 01/09/14 20:42, Tom Lane wrote:

Álvaro Hernández Tortosa a...@nosys.es writes:

  What I can add is that, if Postgres is to devote
resources to a new
language, I would plan it with a broader scope. What would
attract most
users? Would it bring non postgres users to Postgres? What
could be one
of the killer features of any next version? My trivial
answer to most of
these questions is: PL/SQL.

By that I suppose you mean "I wish it would act just like Oracle".
The problem with such a wish is that a lot of the
incompatibilities
with Oracle are functions of the core SQL engine, not of the PL.
plpgsql already is about as close to PL/SQL as it's possible
to get
without changing core Postgres behavior --- or at least, that was
the original design desire, and I don't think that it's failed in
any large degree.

regards, tom lane


It's true that some of the incompatibilities are the core
engine, internal functions and so on, and that the plpgsql design
goal was to achieve similarity. But similarity is not code
compatibility, and, afaik, plpgsql is not code compatible with
PL/SQL. Having 1:1 code compatibility, if possible, would be a very good
first step, to be followed by the core functionalities you mention.

If postgres were going for a new language, why not implement
one which, having the other suggested functionality, also has 1:1
PL/SQL code compatibility? I'm sure it's no trivial task, but one
highly desirable.


It is a false expectation - the language is only one part... and plpgsql 
isn't too far off. There are different systems of modules, different systems 
of custom aggregates, and above all PL/SQL comes with the very complex 
dbms_ library. This library is maybe more complex than the current Postgres base.


OK. Understood. Full compatibility may be a longer-term goal. But 
why is it bad to have the same syntax at a language -not library- level?




It is a task for a commercial project --- not all Postgres users need an 
Oracle compatibility layer.


Certainly not all users need that layer. But I'm sure few would 
complain about having it.


Besides that, why do you say it is meant for a commercial project? 
If it is because postgres should not listen to users willing to migrate 
from Oracle, then we're screwed, losing the biggest opportunity (of 
attracting a large crowd of users) of recent times. If it is because 
it's too complex... well, I don't think the postgres community (as a 
whole) has fewer resources than commercial projects.



Next, I am sure, so it is in contradiction to Joel proposal.


That's not my business ;P

No, really: if there is to be a new version of the language that 
modifies the current syntax of plpgsql, and plpgsql is already very 
similar to PL/SQL, why not, rather than coming up with a new syntax, use 
an already existing one? One that many, many more users than plpgsql 
has, already know?


Regards,

Álvaro




Re: [HACKERS] PL/pgSQL 2

2014-09-01 Thread Álvaro Hernández Tortosa


On 01/09/14 21:52, Joel Jacobson wrote:

On Mon, Sep 1, 2014 at 8:34 PM, Álvaro Hernández Tortosa a...@nosys.es wrote:

 What I can add is that, if Postgres is to devote resources to a new
language, I would plan it with a broader scope. What would attract most
users? Would it bring non postgres users to Postgres? What could be one of
the killer features of any next version? My trivial answer to most of these
questions is: PL/SQL. I don't know with detail how complex this is to get in
Postgres (well, EDB probably knows), but if I had to chose a new language,
this is it. So my questions would rather be:

Interesting visionary ideas.

For me personally, I have no Oracle functions to port to Postgres, so
all differences
between PL/SQL and PL/pgSQL would just be an unnecessary extra amount of work
in the process of porting existing code into a new language, be it
PL/SQL or PL/pgSQL 2.

That said, if PL/SQL wouldn't suffer from the problems I have with
PL/pgSQL today,
I wouldn't see a problem writing new code in the language, but then I
would probably
never manage to port all existing code to PL/SQL and I would be stuck
with a mix of code
in two languages instead of one. With PL/pgSQL 2 on the other hand, it
would be feasible
to eventually port all my existing code, as most of it would be
compatible without changes
and the rest would be easy to make compatible.

I guess it's a question of if it's more important to recruit Oracle users,
or if it's more important to improve the life of existing PL/pgSQL
Postgres users.


I agree that for you, unfortunately, plpgsql2 would be better than 
PL/SQL. However, I believe that, as a whole, a larger majority of users 
would benefit from this.


If anyone is willing to look at the market numbers, the number of 
PL/SQL users compared to that of plpgsql is really huge. That would 
surely attract many more users to postgres, which would ultimately 
greatly help us all (to the detriment of you and a few others, sure, but 
in a way that I'm sure is manageable).


Of course, I'd +1 any attempt to build a super-set of PL/SQL that 
would also try to implement the requests that you and others may have 
for this PL, where they make sense.


Regards,

Álvaro




Re: [HACKERS] PL/pgSQL 2

2014-09-01 Thread Álvaro Hernández Tortosa


On 01/09/14 23:31, Marko Tiikkaja wrote:

On 2014-09-01 11:11 PM, Álvaro Hernández Tortosa wrote:

  No, really: if there is a new version of a language, which
modifies the current syntax of plpgsql; if plpgsql is already very
similar to PL/SQL: why not rather than coming up with a new syntax use
an already existing one? One that many, many more users than plpgsql,
already know?


The point isn't to create a new language just for the sake of creating 
a new one.  It's to fix the problems PL/PgSQL has.  If we're just 
going to trade the problems in PL/PgSQL with another set of problems 
implemented by PL/SQL, we're just worse off in the end.


Agreed. But if we can solve them --only if we could-- by leveraging 
a syntax that happens to be:


- Similar to that of plpgsql (just as plpgsql2 would be 
similar to plpgsql)

- Already known by a large, very large, group of users

then we would be way better off. If there are unresolved problems in the 
current PL/SQL implementation, doing a superset of it may make sense.
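

As one concrete example of the kind of pitfall under discussion (my own 
illustrative pick, not necessarily the ones Marko has in mind; the variable 
and table names are invented): in today's PL/pgSQL, SELECT ... INTO silently 
keeps just the first row when the query returns several, unless STRICT is 
written out:

```sql
-- Without STRICT: if several rows match, v_id silently gets the first
-- one and no error is raised -- a classic source of hidden bugs.
SELECT id INTO v_id FROM users WHERE name = p_name;

-- With STRICT: raises TOO_MANY_ROWS (or NO_DATA_FOUND) unless exactly
-- one row is returned -- arguably the safer default a successor
-- language could adopt.
SELECT id INTO STRICT v_id FROM users WHERE name = p_name;
```

Whether such defaults are fixed in a plpgsql2 or inherited from PL/SQL's 
behavior is exactly the trade-off being argued here.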


Regards,


Álvaro





Re: [HACKERS] PL/pgSQL 2

2014-09-01 Thread Álvaro Hernández Tortosa


On 01/09/14 23:46, David G Johnston wrote:

Álvaro Hernández Tortosa wrote

On 01/09/14 21:52, Joel Jacobson wrote:

On Mon, Sep 1, 2014 at 8:34 PM, Álvaro Hernández Tortosa <aht@> wrote:

  What I can add is that, if Postgres is to devote resources to a new
language, I would plan it with a broader scope. What would attract most
users? Would it bring non postgres users to Postgres? What could be one
of
the killer features of any next version? My trivial answer to most of
these
questions is: PL/SQL. I don't know with detail how complex this is to
get in
Postgres (well, EDB probably knows), but if I had to chose a new
language,
this is it. So my questions would rather be:

Interesting visionary ideas.

For me personally, I have no Oracle functions to port to Postgres, so
all differences
between PL/SQL and PL/pgSQL would just be unnecessary extra amount of
work
in the process of porting existing code into a new language, be it
PL/SQL or PL/pgSQL 2.

That said, if PL/SQL wouldn't suffer from the problems I have with
PL/pgSQL today,
I wouldn't see a problem writing new code in the language, but then I
would probably
never manage to port all existing code to PL/SQL and I would be stuck
with a mix of code
in two languages instead of one. With PL/pgSQL 2 on the other hand, it
would be feasible
to eventually port all my existing code, as most of it would be
compatible without changes
and the rest would easy to make compatible.

I guess it's a question of if it's more important to recruit Oracle
users,
or if it's more important to improve the life of existing PL/pgSQL
Postgres users.

  I agree that for you, unfortunately, plpgsql2 would be better than
PL/SQL. However, I believe as a whole a bigger majority of users would
be benefited from this.

Is it even legal for us to create PL/SQL?

Honestly, I don't know. But I don't think so. And if it is, so be it. 
Let's have Oracle sue us (who?)


Beyond that, in all likelihood having both a version two of the pl/pgsql
language and the pl/SQL language would be a desirable outcome for, say, a
10.0 release.

If 10.0 is to follow 9.5, I'd agree. Although I'd also agree to 
any earlier release, if that were possible.


The former simply because languages by their very nature are evolutionary
and at some point the lost productivity of suppressing such evolution in the
name of backward compatibility will be deemed undesirable.  It may be
desirable to simply call the new language pl/elephant instead of pl/pgsql2 but
the fundamental reason for evolving a language in order to incorporate newly
acquired knowledge is unarguable. Though in this case the entire
language/extension mechanism should be considered and not just the specific
procedural-SQL language we are dealing with here.

The goal of adding PL/SQL would be to increase the user base of the project
and hopefully attract new blood to the development team in order to maximize
long-term survivability and increase the pace of innovation.  We would be
unable to introduce substantial evolution to this language for that very
reason and so a different language is likely to be needed - eventually.


That's what I think. Increasing the user base, at a moment when, 
for many reasons, we all know many want to migrate off of Oracle... let's 
bring them here before they move to something else that is also backed by 
piles of money and great marketing capabilities.


Álvaro



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

