Re: [HACKERS] RETURNING PRIMARY KEY syntax extension
David G Johnston wrote:
> Ian Barwick wrote:
>> Hi,
>>
>> The JDBC API provides the getGeneratedKeys() method as a way of retrieving
>> primary key values without the need to explicitly specify the primary key
>> column(s). This is a widely-used feature; however, the implementation has
>> significant performance drawbacks.
>>
>> Currently this feature is implemented in the JDBC driver by appending
>> "RETURNING *" to the supplied statement. However, this means all columns of
>> affected rows will be returned to the client, which causes significant
>> performance problems, particularly on wide tables. To mitigate this, it
>> would be desirable to enable the JDBC driver to request only the primary
>> key value(s).

Seems like a good idea.

    ERROR: Relation does not have any primary key(s)

"Relation does not have a primary key." or "Relation has no primary key." (preferred)

By definition it cannot have more than one, so it must have none. It could have multiple unique constraints, but I do not believe they are considered if not tagged as primary.

Also, I did see where you account for auto-updatable views, but what about complex views with INSTEAD OF triggers? These can still be the target of DML queries but are not guaranteed (though they can possibly) to return a well-defined primary key. At worst an explicit error about the view itself, not the apparent lack of primary key, should be emitted.

David J.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/RETURNING-PRIMARY-KEY-syntax-extension-tp5806462p5806464.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RETURNING PRIMARY KEY syntax extension
On 09/06/14 14:47, David G Johnston wrote:
> Ian Barwick wrote:
>> The JDBC API provides the getGeneratedKeys() method as a way of retrieving
>> primary key values without the need to explicitly specify the primary key
>> column(s). [...]
>
> Seems like a good idea.
>
>> ERROR: Relation does not have any primary key(s)
>
> "Relation does not have a primary key." or "Relation has no primary key."
> (preferred) By definition it cannot have more than one so it must have none.

Ah yes, amazing what a fresh pair of eyes does :). The plural is the vestige of an earlier iteration which said something about the relation not having any primary key column(s). Will fix, thanks.

Regards

Ian Barwick

--
Ian Barwick                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates
On Sat, Jun 7, 2014 at 1:28 AM, Andres Freund <and...@2ndquadrant.com> wrote:
> On 2014-06-06 15:44:25 -0400, Tom Lane wrote:
>> I figured it'd be easy enough to get a better estimate by adding another
>> counter to count just LIVE and INSERT_IN_PROGRESS tuples (thus effectively
>> assuming that in-progress inserts and deletes will both commit). I did
>> that, and found that it helped Tim's test case not at all :-(. A bit of
>> sleuthing revealed that HeapTupleSatisfiesVacuum actually returns
>> INSERT_IN_PROGRESS for any tuple whose xmin isn't committed, regardless
>> of whether the transaction has since marked it for deletion:
>>
>>     /*
>>      * It'd be possible to discern between INSERT/DELETE in progress
>>      * here by looking at xmax - but that doesn't seem beneficial for
>>      * the majority of callers and even detrimental for some. We'd
>>      * rather have callers look at/wait for xmin than xmax. It's
>>      * always correct to return INSERT_IN_PROGRESS because that's
>>      * what's happening from the view of other backends.
>>      */
>>     return HEAPTUPLE_INSERT_IN_PROGRESS;
>
> That's only been the case since a couple of days ago. I really wasn't sure
> whether to go that way or discern the two cases. That changed in the wake of:
> http://www.postgresql.org/message-id/20140530143150.GA11051@localhost

Won't this change impact the calculation of the number of live rows for ANALYZE? (acquire_sample_rows() considers the HEAPTUPLE_DELETE_IN_PROGRESS tuples as live rows for tuples updated by transactions other than the current transaction.)

Even if we think that the estimates are okay, the below comment in acquire_sample_rows() doesn't seem to suggest it:

    /*
     * We count delete-in-progress rows as still live, using
     * the same reasoning given above; but we don't bother to
     * include them in the sample.
     * ..
     */

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] RETURNING PRIMARY KEY syntax extension
On Monday, June 9, 2014, Ian Barwick <i...@2ndquadrant.com> wrote:
> On 09/06/14 14:47, David G Johnston wrote:
>> Ian Barwick wrote:
>>> The JDBC API provides the getGeneratedKeys() method as a way of
>>> retrieving primary key values without the need to explicitly specify
>>> the primary key column(s). [...]

ISTM that having a non-null returning-clause variable when no RETURNING is present in the command makes things more complicated and introduces unnecessary checks in the not-uncommon case of multiple non-RETURNING commands being issued in series. returningList was able to be null, and so should returningClause. Then, if non-null, first check for the easy column listing and then check for the more expensive PK lookup request.

Then again, the extra RETURNING checks may just amount to noise.

David J.
Re: [HACKERS] RETURNING PRIMARY KEY syntax extension
On 09/06/14 17:47, David G Johnston wrote:
> Ian Barwick wrote:
>> The JDBC API provides the getGeneratedKeys() method as a way of retrieving
>> primary key values without the need to explicitly specify the primary key
>> column(s). [...]
>
> Seems like a good idea.
>
>> ERROR: Relation does not have any primary key(s)
>
> "Relation does not have a primary key." or "Relation has no primary key."
> (preferred) By definition it cannot have more than one so it must have
> none. It could have multiple unique constraints but I do not believe they
> are considered if not tagged as primary.

From memory, all unique keys can be considered 'candidate primary keys', but only one can be designated 'the PRIMARY KEY'.

I also like your preferred error message, and to the full extent of my decidedly Non-Authority, I hereby authorise it! :-)

Cheers,
Gavin
[HACKERS] why postgresql define NTUP_PER_BUCKET as 10, not other numbers smaller
When I read the source code for the hash join, I was confused that PostgreSQL defines the NTUP_PER_BUCKET value as 10. Since this value is used to estimate the tuple count in one bucket, would it be better if we had a smaller value? I have not done any experiments, but it seems that we could achieve fewer hash collisions and better performance if we decreased the value.

So could anyone explain to me why we define NTUP_PER_BUCKET as 10? Is there a specific situation where we would get worse performance or other trouble if we set NTUP_PER_BUCKET to 1 or 2?

Thanks very much.
Re: [HACKERS] pg_basebackup failed to back up large file
On Wednesday, June 4, 2014, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Magnus Hagander <mag...@hagander.net> writes:
>> On Tue, Jun 3, 2014 at 6:38 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> Another thought is we could make pg_basebackup simply skip any files
>>> that exceed RELSEG_SIZE, on the principle that you don't really
>>> need/want enormous log files to get copied anyhow. We'd still need the
>>> pax extension if the user had configured large RELSEG_SIZE, but having
>>> a compatible tar could be documented as a requirement of doing that.
>>
>> I think going all the way to pax is the proper long-term thing to do, at
>> least if we can confirm it works in the main tar implementations. For
>> backpatchable that seems more reasonable. It doesn't work today, and we
>> just need to document that it doesn't, with larger RELSEG_SIZE. And then
>> fix it properly for the future.
>
> Agreed, this would be a reasonable quick fix that we could replace in 9.5
> or later.

Fujii, are you going to be able to work on this with the now expanded scope? :)

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] Allowing NOT IN to use ANTI joins
On Mon, Jun 09, 2014 at 12:36:30AM +1200, David Rowley wrote:
> Currently pull_up_sublinks_qual_recurse only changes the plan for NOT
> EXISTS queries and leaves NOT IN alone. The reason for this is because
> the values returned by a subquery in the IN clause could have NULLs.

Awesome. I've had a brief look at the patch and, other than a line of extraneous whitespace, it looks sane. Since it only tests NOT IN queries, I don't think there are any issues with it slowing down simple queries.

I also note you can't prove "id+1" not null. At first I thought you might be able to prove this not null if the operator/function was strict, but then I realised that strict only means "the output is null if an input is null", not "the output is null only if an input is null". Pity.

Nice work.

--
Martijn van Oosterhout <klep...@svana.org>   http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he
> does not attach much importance to his own thoughts.
  -- Arthur Schopenhauer
Re: [HACKERS] Allowing NOT IN to use ANTI joins
On Sun, Jun 8, 2014 at 3:36 PM, David Rowley <dgrowle...@gmail.com> wrote:
> Currently pull_up_sublinks_qual_recurse only changes the plan for NOT
> EXISTS queries and leaves NOT IN alone. The reason for this is because
> the values returned by a subquery in the IN clause could have NULLs.

I believe the reason why this hasn't been done yet is that the plan becomes invalid when another backend modifies the nullability of the column. To get it to replan, you'd have to introduce a dependency on the NOT NULL constraint, but that's impossible for now because there's no pg_constraint entry for NOT NULLs.

The only way to consistently guarantee nullability is through primary key constraints. Fortunately that addresses most of the use cases of NOT IN(), in my experience.

See the comment in check_functional_grouping:

     * Currently we only check to see if the rel has a primary key that is a
     * subset of the grouping_columns.  We could also use plain unique
     * constraints if all their columns are known not null, but there's a
     * problem: we need to be able to represent the not-null-ness as part of
     * the constraints added to *constraintDeps.  FIXME whenever not-null
     * constraints get represented in pg_constraint.

The behavior you want seems somewhat similar to check_functional_grouping; maybe you could unify it with your targetListIsGuaranteedNotToHaveNulls at some level. (PS: that's one ugly function name :)

Regards,
Marti
Re: [HACKERS] RETURNING PRIMARY KEY syntax extension
On 06/09/2014 09:06 AM, Gavin Flower wrote:
> From memory all unique keys can be considered 'candidate primary keys',
> but only one can be designated 'the PRIMARY KEY'.

Almost. Candidate keys are also NOT NULL.

--
Vik
[HACKERS] Supporting Windows SChannel as OpenSSL replacement
Hi,

I've been looking at Windows' native SSL implementation, the SChannel API. It would be nice to support that as a replacement for OpenSSL on Windows. Currently, we bundle the OpenSSL library in the PostgreSQL installers, which is annoying because whenever OpenSSL puts out a new release that fixes vulnerabilities, we need to do a security release of PostgreSQL on Windows. I was reminded of this recently wrt. psqlODBC, which bundles libpq and OpenSSL as well. It's particularly annoying for psqlODBC and other client applications, as people typically update them less diligently than their servers.

I think that we should keep the user-visible behavior the same, i.e. the libpq connection options, locations of the certificate files etc. would all be the same regardless of which SSL implementation is used. Using the Windows SChannel API might make it possible to integrate better with Windows' own certificate store etc., but I don't really know much about that stuff, so for starters I'd like to just use it as a drop-in replacement for OpenSSL.

Thoughts? While we're at it, we'll probably want to refactor things so that it's easy to support other SSL implementations too, like gnutls.

- Heikki
Re: [HACKERS] Allowing NOT IN to use ANTI joins
On 06/08/2014 02:36 PM, David Rowley wrote:
> +		if (!get_attnotnull(tle->resorigtbl, tle->resorigcol))
> +			return false;

As Marti says, you can't do this because NOT NULL doesn't have an OID to attach a dependency to. You'll have to restrict this test to primary keys only for now.

--
Vik
Re: [HACKERS] Supporting Windows SChannel as OpenSSL replacement
On Monday, June 9, 2014, Heikki Linnakangas <hlinnakan...@vmware.com> wrote:
> I've been looking at Windows' native SSL implementation, the SChannel
> API. It would be nice to support that as a replacement for OpenSSL on
> Windows. [...]
>
> Thoughts? While we're at it, we'll probably want to refactor things so
> that it's easy to support other SSL implementations too, like gnutls.

It's a project that many have started, and nobody has finished :) I'm definitely interested in working on such a thing, but I've been unable to carve out enough time recently.

One problem is, as you say, that we're exposing OpenSSL too much. For one thing, we *cannot* keep the current interface, because it returns OpenSSL-internal data structures. Those functions will need to be deprecated and replaced with something else.

Also, my memory says that SChannel doesn't support the key file format that we use now, which makes a much bigger break with the supported platforms. That may have changed of course - have you researched that part?
The main other entries I've been looking at are NSS and gnutls, both of which can speak our current file formats. I think the right thing is to start with those and thereby make it more pluggable, and only after that tackle SChannel. But I do think it would be good to have them all.

It's also a question of whether we can accept supporting a different set of libraries on the server vs. on the client. It's really on the client that it's a bigger problem, but in the end I think we want to have symmetrical support. But it might be worth doing just the client side initially, and then moving to the server. I think in general the client side is actually likely to be *harder* than the server side.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] Supporting Windows SChannel as OpenSSL replacement
On 06/09/2014 02:53 PM, Magnus Hagander wrote:
> Also, my memory says that SChannel doesn't support the key file format
> that we use now, which makes a much bigger break with the supported
> platforms. That may have changed of course - have you researched that
> part?

A quick web search turned up a few discussion-forum threads with a recipe for this (e.g. https://stackoverflow.com/questions/1231178/load-an-x509-pem-file-into-windows-cryptoapi). There's no direct "read this file" function, but there are low-level functions that can decode the file format once it's read into memory. So it seems possible to make it work.

> It's also a question of if we can accept supporting a different set of
> libraries on the server vs on the client. [...]

Once we've modified the client to support multiple libraries, it's probably not much extra effort to do the same to the server. I wouldn't like to support different libraries in client and server, if only because it would be more complicated to have separate ./configure options for client and server.

- Heikki
Re: [HACKERS] RETURNING PRIMARY KEY syntax extension
David G Johnston <david.g.johns...@gmail.com> wrote:

>> ERROR: Relation does not have any primary key(s)
>
> "Relation does not have a primary key." or "Relation has no primary key."
> (preferred)

Project style says that the primary message should not capitalize the first word, nor should it end in a period. Detail and hints should be in sentence style, but not the message itself.

http://www.postgresql.org/docs/9.3/interactive/error-style-guide.html#AEN100914

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Supporting Windows SChannel as OpenSSL replacement
On 2014-06-09 13:53:15 +0200, Magnus Hagander wrote:
> The main other entries I've been looking at are NSS and gnutls, both of
> which can speak our current file formats. I think the right thing is to
> start with those and thereby make it more pluggable, and only after that
> tackle schannel. But I do think it would be good to have them all.

I think NSS makes a great deal of sense - the advantages of supporting gnutls are much less clear to me. Maybe it's little enough additional code that that doesn't matter much, but we imo shouldn't focus on it.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Supporting Windows SChannel as OpenSSL replacement
On Mon, Jun 09, 2014 at 02:45:08PM +0300, Heikki Linnakangas wrote:
> Thoughts? While we're at it, we'll probably want to refactor things so
> that it's easy to support other SSL implementations too, like gnutls.

One project that is proud to support several SSL implementations is curl: http://curl.haxx.se/

Git: https://github.com/bagder/curl.git
Implementations: https://github.com/bagder/curl/tree/master/lib/vtls

List from vtls.c:
- OpenSSL
- GnuTLS
- NSS
- QsoSSL
- GSKit
- PolarSSL
- CyaSSL
- Schannel SSPI
- SecureTransport (Darwin)

We cannot reuse the code directly, but it seems usable as a reference for the various gotchas that need to be solved.

--
marko
[HACKERS] [bug fix] Memory leak in dblink
Hello,

I've fixed and tested a memory leak bug in dblink. Could you review and commit this? I'll add this to the CommitFest shortly.

[Problem]
A user reported a problem on the pgsql-jp ML that he encountered an out-of-memory error when he ran the following function on 32-bit PostgreSQL 9.3:

CREATE OR REPLACE FUNCTION aaa( character varying) RETURNS character varying AS
$BODY$
DECLARE
...
BEGIN
PERFORM (SELECT DBLINK_CONNECT('conn','dbname=DB-B user=postgres'));
DELETE FROM tbl0010 dba
WHERE EXISTS (
    SELECT tbl0010_cd FROM tbl0010
    INNER JOIN (
        SELECT * FROM DBLINK ('conn','
            SELECT tbl0411_cd FROM tbl0411
            INNER JOIN(
...

The above query calls dblink() hundreds of thousands of times. You can reproduce the problem with a simpler query like this:

CREATE TABLE mytable (col int);
INSERT INTO mytable ...;  /* insert many rows */
SELECT * FROM mytable
WHERE EXISTS (SELECT * FROM dblink(
    'con', 'SELECT * FROM mytable WHERE col = ' || col) t(col));

[Cause]
Hundreds of thousands of the following same line were output in the server log:

    dblink temporary context: 8192 total in 1 blocks; 8176 free (0 chunks); 16 used

Each dblink function call creates an instance of this memory context, but it fails to delete it. This bug seems to have been introduced in 9.2.0 by this performance improvement:

    Improve efficiency of dblink by using libpq's new single-row processing
    mode (Kyotaro Horiguchi, Marko Kreen)

Regards
MauMau

dblink_memleak.patch
Description: Binary data
Re: [HACKERS] Supporting Windows SChannel as OpenSSL replacement
On 06/09/2014 01:45 PM, Heikki Linnakangas wrote:
> Thoughts? While we're at it, we'll probably want to refactor things so
> that it's easy to support other SSL implementations too, like gnutls.

There was a patch set for this from Martijn van Oosterhout which was quite complete.

http://www.postgresql.org/message-id/20060504134807.gk4...@svana.org

I am interested in dropping the dependency on OpenSSL, if only to fix the situation with Debian, libreadline and OpenSSL[1].

Notes

1. They now compile against libedit and change to using libreadline at runtime. This does not work perfectly, though, since libreadline supports some features which libedit does not, which can only be checked for at compile time.

Andreas
Re: [HACKERS] [bug fix] Memory leak in dblink
On Mon, Jun 9, 2014 at 10:07 AM, MauMau <maumau...@gmail.com> wrote:
> Hello,
>
> I've fixed and tested a memory leak bug in dblink. Could you review and
> commit this? I'll add this to the CommitFest shortly.

I think there is no need to add it to the commitfest, because it's a bugfix and not a new feature. Or am I missing something?

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
Timbira: http://www.timbira.com.br
Blog sobre TI: http://fabriziomello.blogspot.com
Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
Twitter: http://twitter.com/fabriziomello
Re: [HACKERS] RETURNING PRIMARY KEY syntax extension
On 06/09/2014 06:58 AM, Ian Barwick wrote:
> Hi,
>
> The JDBC API provides the getGeneratedKeys() method as a way of retrieving
> primary key values without the need to explicitly specify the primary key
> column(s).

Is it defined by the standard to return _only_ generated primary keys, and not, for example, generated alternate keys?

Cheers

--
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ
Re: [HACKERS] Supporting Windows SChannel as OpenSSL replacement
On Mon, Jun 9, 2014 at 3:19 PM, Andreas Karlsson <andr...@proxel.se> wrote:
> On 06/09/2014 01:45 PM, Heikki Linnakangas wrote:
>> Thoughts? While we're at it, we'll probably want to refactor things so
>> that it's easy to support other SSL implementations too, like gnutls.
>
> There was a patch set for this from Martijn van Oosterhout which was
> quite complete.
>
> http://www.postgresql.org/message-id/20060504134807.gk4...@svana.org

A lot has, unfortunately, changed since 2006. It might be a good starting point. But actively starting from the point of "let's try to support multiple libraries" rather than "let's try to support gnutls" is probably also important.

> I am interested in dropping the dependency on OpenSSL, if only to fix the
> situation with Debian, libreadline and OpenSSL[1].

That's one of the many reasons, yes :)

At some point we should design a new API, so that we can deprecate the old one. Even if we don't have the code ready, we need to get rid of PQgetssl() and replace it with something else. I'm thinking probably a function that returns both a void pointer and an enum that tells you which library is actually in use. And a boolean just saying SSL on/off, because that's what a lot of clients are interested in, and they don't care about more than that. Obviously, we also have to do something about PQinitOpenSSL().

Unfortunately, I think it's too late to do that for 9.4 - otherwise it would've been good to have a whole cycle of deprecation on it...

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] Supporting Windows SChannel as OpenSSL replacement
On Mon, Jun 9, 2014 at 3:02 PM, Marko Kreen <mark...@gmail.com> wrote:
> One project that is proud to support several SSL implementations is curl:
> http://curl.haxx.se/
> [...]
> We cannot reuse the code directly, but it seems usable as a reference for
> the various gotchas that need to be solved.

I did actually talk to Daniel at some point about turning that into a generalized library, and/or getting him interested in helping out with it. I can't remember where that ended up - I'll see if I can poke his interest :)

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] [bug fix] Memory leak in dblink
From: Fabrízio de Royes Mello <fabriziome...@gmail.com>
> I think there is no need to add it to the commitfest, because it's a
> bugfix and not a new feature. Or am I missing something?

The CommitFest app has a "bug fix" option in the list of topic choices. I suppose the reason is that if a bug fix is only posted to pgsql-hackers and/or pgsql-bugs, it might be forgotten.

Regards
MauMau
Re: [HACKERS] Allowing NOT IN to use ANTI joins
Marti Raudsepp <ma...@juffo.org> writes:
> On Sun, Jun 8, 2014 at 3:36 PM, David Rowley <dgrowle...@gmail.com> wrote:
>> Currently pull_up_sublinks_qual_recurse only changes the plan for NOT
>> EXISTS queries and leaves NOT IN alone. The reason for this is because
>> the values returned by a subquery in the IN clause could have NULLs.
>
> I believe the reason why this hasn't been done yet, is that the plan
> becomes invalid when another backend modifies the nullability of the
> column. To get it to replan, you'd have to introduce a dependency on the
> NOT NULL constraint, but it's impossible for now because there's no
> pg_constraint entry for NOT NULLs.

I don't believe this is an issue, because we are only talking about a *plan* depending on the NOT NULL condition. ALTER TABLE DROP NOT NULL would result in a relcache inval event against the table, which would result in invalidating all cached plans mentioning the table.

I forget exactly what context we were discussing needing a NOT NULL constraint's OID for, but it would have to be something where the dependency was longer-lived than a plan; perhaps semantics of a view? The existing comparable case is that a view containing ungrouped variable references is allowed if the GROUP BY includes a primary key, which means the semantic validity of the view depends on the pkey.

			regards, tom lane
Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates
On Fri, Jun 6, 2014 at 3:44 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> It did not use to blow this question off: back around 8.3 you got
> DELETE_IN_PROGRESS if the tuple had a delete pending. I think we need
> less laziness + fuzzy thinking here.
>
> Maybe we should have a separate HEAPTUPLE_INSERT_AND_DELETE_IN_PROGRESS
> result code? Is it *really* the case that callers other than VACUUM
> itself are okay with failing to make this distinction?

I think that would be a good idea for conceptual clarity if nothing else. If callers are OK with it, then they can treat both of those codes alike; but then at least there's clear evidence as to the author's intent.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Supporting Windows SChannel as OpenSSL replacement
Heikki Linnakangas <hlinnakan...@vmware.com> writes:
> I've been looking at Windows' native SSL implementation, the SChannel
> API. It would be nice to support that as a replacement for OpenSSL on
> Windows. Currently, we bundle the OpenSSL library in the PostgreSQL
> installers, which is annoying because whenever OpenSSL puts out a new
> release that fixes vulnerabilities, we need to do a security release of
> PostgreSQL on Windows.

Does SChannel have a better security track record than OpenSSL? Or is the point here just that we can define it as not our problem when a vulnerability surfaces?

I'm doubtful that we can ignore security issues affecting PG just because somebody else is responsible for shipping the fix, and thus am concerned that if we support N different SSL libraries, we will need to keep track of N sets of vulnerabilities instead of just one.

			regards, tom lane
Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates
On 2014-06-09 10:14:32 -0400, Robert Haas wrote: On Fri, Jun 6, 2014 at 3:44 PM, Tom Lane t...@sss.pgh.pa.us wrote: It did not use to blow this question off: back around 8.3 you got DELETE_IN_PROGRESS if the tuple had a delete pending. I think we need less laziness + fuzzy thinking here. Maybe we should have a separate HEAPTUPLE_INSERT_AND_DELETE_IN_PROGRESS result code? Is it *really* the case that callers other than VACUUM itself are okay with failing to make this distinction? I think that would be a good idea for conceptual clarity if nothing else. If callers are OK with it, then they can treat both of those codes alike; but then at least there's clear evidence as to the author's intent. I am happy to introduce the code for that. But it'd have to be >=9.4 or 9.4? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Supporting Windows SChannel as OpenSSL replacement
Hi, On 2014-06-09 10:18:40 -0400, Tom Lane wrote: Does SChannel have a better security track record than OpenSSL? Or is the point here just that we can define it as not our problem when a vulnerability surfaces? Well, it's patched as part of the OS - so no new PG binaries have to be released when it's buggy. I'm doubtful that we can ignore security issues affecting PG just because somebody else is responsible for shipping the fix, and thus am concerned that if we support N different SSL libraries, we will need to keep track of N sets of vulnerabilities instead of just one. In most of the cases where such an issue exists it'll primarily affect binary distributions that include the ssl library - and those will only pick one anyway. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allowing join removals for more join types
On Mon, Jun 2, 2014 at 11:42 AM, Tom Lane t...@sss.pgh.pa.us wrote: Stephen Frost sfr...@snowman.net writes: * Tom Lane (t...@sss.pgh.pa.us) wrote: TBH I think that trying to do anything at all for inner joins is probably a bad idea. The cases where the optimization could succeed are so narrow that it's unlikely to be worth adding cycles to every query to check. I agree that we don't want to add too many cycles to trivial queries but I don't think it's at all fair to say that FK-check joins are a narrow use-case and avoiding that join could be a very nice win. [ thinks for a bit... ] OK, I'd been thinking that to avoid a join the otherwise-unreferenced table would have to have a join column that is both unique and the referencing side of an FK to the other table's join column. But after consuming more caffeine I see I got that backwards and it would need to be the *referenced* side of the FK, which is indeed a whole lot more plausible case. Back when I did web development, this came up for me all the time. I'd create a fact table with lots of id columns referencing dimension tables, and then make a view over it that joined to all of those tables so that it was easy, when reporting, to select whatever bits of information were needed. But the problem was that if the report didn't need all the columns, it still had to pay the cost of computing them, which eventually got to be problematic. That was what inspired me to develop the patch for LEFT JOIN removal, but to really solve the problems I had back then, removing INNER joins as well would have been essential. So, while I do agree that we have to keep the planning cost under control, I'm quite positive about the general concept. I think a lot of users will benefit. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates
Andres Freund and...@2ndquadrant.com writes: On 2014-06-09 10:14:32 -0400, Robert Haas wrote: I think that would be a good idea for conceptual clarity if nothing else. If callers are OK with it, then they can treat both of those codes alike; but then at least there's clear evidence as to the author's intent. I am happy to introduce the code for that. But it'd have to be >=9.4 or 9.4? We need a solution that can be back-patched, unless you're prepared to revert what you already did to HTSV in the back branches. Having said that, it's not clear to me that we couldn't change HTSV's API in the back branches. What third-party code would be likely to be depending on it? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Supporting Windows SChannel as OpenSSL replacement
From: Heikki Linnakangas hlinnakan...@vmware.com Thoughts? While we're at it, we'll probably want to refactor things so that it's easy to support other SSL implementations too, like gnutls. That may be good because it provides users with choices. But I wonder if it is worth the complexity and maintainability of PostgreSQL code. * Are SChannel and other libraries more secure than OpenSSL? IIRC, recently I read in the news that GnuTLS had a vulnerability. OpenSSL is probably the most widely used library, and many people are getting more interested in its quality. I expect the quality will improve thanks to the help from The Linux foundation and other organizations/researchers. * Do other libraries get support from commercial vendor product support? For example, Safenet Inc., the famous HSM (hardware security module) vendor, supports OpenSSL to access the private key stored in its HSM product. Intel offered AES-NI implementation code to OpenSSL community. I guess OpenSSL will continue to be the most functional and obtain the widest adoption and support. Regards MauMau -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bug fix] Memory leak in dblink
On 06/09/2014 08:58 AM, MauMau wrote: From: Fabrízio de Royes Mello fabriziome...@gmail.com I think there is no need to add it to the commitfest, because it's a bugfix and not a new feature. Or am I missing something? The CommitFest app has an option bug fix in the list of topic choices. I suppose the reason is that if the bug fix is only posted to pgsql-hackers and/or pgsql-bugs, it might be forgotten. Yes, please add it to the commitfest and I'll claim it. Thanks, Joe -- Joe Conway credativ LLC: http://www.credativ.us Linux, PostgreSQL, and general Open Source Training, Service, Consulting, 24x7 Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Supporting Windows SChannel as OpenSSL replacement
On Mon, Jun 09, 2014 at 03:35:23PM +0200, Magnus Hagander wrote: On Mon, Jun 9, 2014 at 3:19 PM, Andreas Karlsson andr...@proxel.se wrote: On 06/09/2014 01:45 PM, Heikki Linnakangas wrote: There was a patch set for this from Martijn van Oosterhout which was quite complete. http://www.postgresql.org/message-id/20060504134807.gk4...@svana.org Wow, blast from the past. A lot has, unfortunately, changed since 2006. It might be a good starting point. But also actively starting from the point of "let's try to support multiple libraries" rather than "let's try to support gnutls" is probably also important. The patch did provide an API. The idea was that there were a number of functions which would need to be defined to support an SSL library. Each library would then have a wrapper which wrapped the library and based on the results of configure it compiled the right file into the backend. These functions were: extern void pgtls_initialize(void); extern void pgtls_destroy(void); extern int pgtls_open_server(Port *); extern void pgtls_close(Port *); extern ssize_t pgtls_read(Port *port, void *ptr, size_t len); extern ssize_t pgtls_write(Port *port, void *ptr, size_t len); Which should be easy enough to support for any library. These days you'd need to add support for verifying certificates, but I don't think that that would be difficult (unless the actual certificate formats are different). No switching after compile time, that would just lead to useless overhead. At some point we should design a new API, so that we can deprecate the old one. Even if we don't have the code ready, we need to get rid of PQgetssl(), and replace it with something else. I'm thinking probably a function that returns both a void pointer and an enum that tells you which library is actually in use. And a boolean just saying ssl on/off, because that's what a lot of clients are interested in and they don't care about more than that. Obviously, we also have to do something about PQinitOpenSSL(). 
Yeah, I think this was one of the more controversial parts. Support in the backend was primarily moving code around and renaming functions, fairly straightforward. Even error handling was not so hard (I found the gnutls handling of errors much easier than openssl). One tricky part is that programs like to use libpq for the authentication, and then they hijack the connection using PQgetssl(). The way I dealt with this is defining a new state "passthrough" where the caller would get a few function pointers to read/write/check the connection. Then the callers would not need to know what library libpq was compiled with. And libpq would know the connection was hijacked and refuse to act anymore. I don't think everyone was pleased with this, but no real alternative was presented (other than requiring people hijacking the connection to do the hard work). For information about which library was in use there was PQgettlsinfo() which returned a PGresult with information about the library and connection. I believe since then new functions have been added to libpq to retrieve info about certificates, so that might need a rethink also. Basically, I think these last two points are the hard parts to get agreement (assuming there's agreement to do anything at all about the problem) and without nailing those down first whoever picks this up will be in for a lot of work. Have a nice day, -- Martijn van Oosterhout klep...@svana.org http://svana.org/kleptog/ He who writes carelessly confesses thereby at the very outset that he does not attach much importance to his own thoughts. -- Arthur Schopenhauer
Re: [HACKERS] Supporting Windows SChannel as OpenSSL replacement
On 06/09/2014 05:22 PM, Andres Freund wrote: Hi, On 2014-06-09 10:18:40 -0400, Tom Lane wrote: Does SChannel have a better security track record than OpenSSL? Or is the point here just that we can define it as not our problem when a vulnerability surfaces? Well, it's patched as part of the OS - so no new PG binaries have to be released when it's buggy. Right. I have no idea what SChannel's track record is, but when there's a vulnerability in the native SSL implementation in Windows, you better upgrade anyway, regardless of PostgreSQL. So when we rely on that, we don't put any extra burden on users. And we won't need to release new binaries just to update the DLL included in it. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates
On 2014-06-09 10:30:43 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-06-09 10:14:32 -0400, Robert Haas wrote: I think that would be a good idea for conceptual clarity if nothing else. If callers are OK with it, then they can treat both of those codes alike; but then at least there's clear evidence as to the author's intent. I am happy to introduce the code for that. But it'd have to be >=9.4 or 9.4? We need a solution that can be back-patched, unless you're prepared to revert what you already did to HTSV in the back branches. Well, I think reverting surely wouldn't be the right cure. It fixes a somewhat nasty bug. I'd certainly be prepared to add the two lines necessary to make it return DELETE_IN_PROGRESS after trying to understand Kevin's email about predicate.c and going through the other callers another time. I did ask about what people think is the more appropriate return value. And the only person that had voiced an opinion was Alvaro agreeing that it's more correct to return INSERT_IN_PROGRESS... Don't make this about me insisting on anything. Having said that, it's not clear to me that we couldn't change HTSV's API in the back branches. What third-party code would be likely to be depending on it? I'm not sure. I could imagine tools like pg_repack depending on it (it doesn't). I started to search for external users of the function and have only found a bunch of forks of postgres so far. Several of which I've never heard of before. I think it's more reasonable to return DELETE_IN_PROGRESS in existing branches and then go with the new return value in 9.5 or maybe 9.4. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RETURNING PRIMARY KEY syntax extension
Ian Barwick i...@2ndquadrant.com writes: [ RETURNING PRIMARY KEY ] It looks to me like this is coded to have the expansion of the primary key done at parse time, which seems like fundamentally the wrong thing. Consider a view or rule containing this clause; the pkey might be different by the time execution rolls around. It'd be better probably if the rewriter or planner did the expansion (and threw the error for no-primary-key, if necessary). Alternatively, we could do it like this and consider that the view is dependent on the primary key constraint, but that seems inflexible. BTW, it seems like your representation of the clause was rather poorly chosen: it forces changing a whole lot of code that otherwise would not need to be changed. I'd have left returningList alone and put the returningPrimaryKey flag someplace else. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] NUMA packaging and patch
On Sun, Jun 8, 2014 at 5:45 PM, Kevin Grittner kgri...@ymail.com wrote: I ran into a situation where a machine with 4 NUMA memory nodes and 40 cores had performance problems due to NUMA. The problems were worst right after they rebooted the OS and warmed the cache by running a script of queries to read all tables. These were all run on a single connection. As it turned out, the size of the database was just over one-quarter of the size of RAM, and with default NUMA policies both the OS cache for the database and the PostgreSQL shared memory allocation were placed on a single NUMA segment, so access to the CPU package managing that segment became a bottleneck. On top of that, processes which happened to run on the CPU package which had all the cached data had to allocate memory for local use on more distant memory because there was none left in the more local memory. Through normal operations, things eventually tended to shift around and get better (after several hours of heavy use with substandard performance). I ran some benchmarks and found that even in long-running tests, spreading these allocations among the memory segments showed about a 2% benefit in a read-only load. The biggest difference I saw in a long-running read-write load was about a 20% hit for unbalanced allocations, but I only saw that once. I talked to someone at PGCon who managed to engineer much worse performance hits for an unbalanced load, although the circumstances were fairly artificial. Still, fixing this seems like something worth doing if further benchmarks confirm benefits at this level. By default, the OS cache and buffers are allocated in the memory node with the shortest distance from the CPU a process is running on. This is determined by the cpuset associated with the process which reads or writes the disk page. Typically a NUMA machine starts with a single cpuset with a policy specifying this behavior. 
Fixing this aspect of things seems like an issue for packagers, although we should probably document it for those running from their own source builds. To set an alternate policy for PostgreSQL, you first need to find or create the location for cpuset specification, which uses a filesystem in a way similar to the /proc directory. On a machine with more than one memory node, the appropriate filesystem is probably already mounted, although different distributions use different filesystem names and mount locations. I will illustrate the process on my Ubuntu machine. Even though it has only one memory node (and so, this makes no difference), I have it handy at the moment to confirm the commands as I put them into the email. # Sysadmin must create the root cpuset if not already done. (On a # system with NUMA memory, this will probably already be mounted.) # Location and options can vary by distro. sudo mkdir /dev/cpuset sudo mount -t cpuset none /dev/cpuset # Sysadmin must create a cpuset for postgres and configure # resources. This will normally be all cores and all RAM. This is # where we specify that this cpuset will spread pages among its # memory nodes. sudo mkdir /dev/cpuset/postgres sudo /bin/bash -c "echo 0-3 >/dev/cpuset/postgres/cpus" sudo /bin/bash -c "echo 0 >/dev/cpuset/postgres/mems" sudo /bin/bash -c "echo 1 >/dev/cpuset/postgres/memory_spread_page" # Sysadmin must grant permissions to the desired setting(s). # This could be by user or group. sudo chown postgres /dev/cpuset/postgres/tasks # The pid of postmaster or an ancestor process must be written to # the tasks file of the cpuset. This can be a shell from which # pg_ctl is run, at least for bash shells. It could also be # written by the postmaster itself, essentially as an extra pid # file. Possible snippet from a service script: echo $$ >/dev/cpuset/postgres/tasks pg_ctl start ... 
Where the OS cache is larger than shared_buffers, the above is probably more important than the attached patch, which causes the main shared memory segment to be spread among all available memory nodes. This patch only compiles in the relevant code if configure is run using the --with-libnuma option, in which case a dependency on the numa library is created. It is v3 to avoid confusion with earlier versions I have shared with a few people off-list. (The only difference from v2 is fixing bitrot.) I'll add it to the next CF. Hm, your patch seems to boil down to interleave_memory(start, size, numa_all_nodes_ptr) inside PGSharedMemoryCreate(). I've read your email a couple of times and am a little hazy around a couple of points, in particular: the above is probably more important than the attached patch. So I have a couple of questions: *) There is a lot of advice floating around (for example here: http://frosty-postgres.blogspot.com/2012/08/postgresql-numa-and-zone-reclaim-mode.html) to instruct operators to disable zone_reclaim. Will your changes invalidate any of that
Re: [HACKERS] Supporting Windows SChannel as OpenSSL replacement
On Mon, Jun 9, 2014 at 10:40 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Right. I have no idea what SChannel's track record is, but when there's a vulnerability in the native SSL implementation in Windows, you better upgrade anyway, regardless of PostgreSQL. So when we rely on that, we don't put any extra burden on users. And we won't need to release new binaries just to update the DLL included in it. Right, heartily agreed. It wouldn't surprise me if there are lots of Windows machines out there that have 4 or 5 copies of OpenSSL on them, each provided by a different installer for some other piece of software that happens to depend on OpenSSL. When OpenSSL then has a security vulnerability, you're not safe until all of the people who produce those installers produce new versions and you upgrade to all of those new versions. In practice, I'm sure that an enormous amount slips through the cracks here. Relying on something that is part of the OS and updated by the OS vendor seems like less work for both packagers (who have to prepare the updates) and users (who have to apply them). Of course there may be cases where the OS implementation sucks badly or otherwise can't be relied upon, and then we'll just have to live with shipping copies of things. But avoiding it sounds better, if someone's volunteering to do the work. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] performance regression in 9.2/9.3
On 05/06/14 23:09, Linos wrote: On 05/06/14 19:39, Tom Lane wrote: Merlin Moncure mmonc...@gmail.com writes: On Thu, Jun 5, 2014 at 9:54 AM, Linos i...@linos.es wrote: What I don't understand is why the statistics have this bad information, all my tests are done on a database just restored and analyzed. Can I do something to improve the quality of my database statistics and let the planner do better choices? Maybe increase the statistics target of the columns involved? By that I meant row count estimates coming out of the joins are way off. This is pushing the planner into making bad choices. The most pervasive problem I see is that the row count estimate boils down to '1' at some juncture causing the server to favor nestloop/index scan when something like a hash join would likely be more appropriate. There's some fairly wacko stuff going on in this example, like why is the inner HashAggregate costed so much higher by 9.3 than 8.4, when the inputs are basically the same? And why does 9.3 fail to suppress the SubqueryScan on ven, when 8.4 does get rid of it? And why is the final output rows estimate so much higher in 9.3? That one is actually higher than the product of the two nestloop inputs, which looks like possibly a bug. I think what's happening is that 9.3 is picking what it knows to be a less than optimal join method so that it can sort the output by means of the ordered scan Index Scan using referencia_key on modelo mo, and thereby avoid an explicit sort of what it thinks would be 42512461 rows. With a closer-to-reality estimate there, it would have gone for a plan more similar to 8.4's, ie, hash joins and then an explicit sort. 
There is a lot going on in this plan that we haven't been told about; for instance at least one of the query's tables seems to actually be a view, and some other ones appear to be inheritance trees with partitioning constraints, and I'm suspicious that some of the aggregates might be user-defined functions with higher than normal costs. I'd like to see a self-contained test case, by which I mean full details about the table/view schemas; it's not clear whether the actual data is very important here. regards, tom lane Query 2 doesn't use any view and you can find the schema here: http://pastebin.com/Nkv7FwRr Query 1 uses 5 views: ticket_cabecera, ticket_linea, reserva_cabecera, reserva_linea and tarifa_proveedor_modelo_precio. I have factored out the first four with the same result as before; you can find the new query and the new plan here: http://pastebin.com/7u2Dkyxp http://explain.depesz.com/s/2V9d Actually the execution time is worse than before. About the last view: if I change the join from tarifa_proveedor_modelo_precio to tarifa_modelo_precio (a table with nearly the same structure as the view) the query is executed much faster, but I get a similar time changing the (MIN(cab.time_stamp_recepcion)::DATE = ) to (WHERE cab.time_stamp_recepcion::date = ) in the ent subquery that never was a view. Anyway I included tarifa_modelo_precio in the query1 schema file for reference and you can find the plan using tarifa_modelo_precio instead of the view tarifa_proveedor_modelo_precio here: http://explain.depesz.com/s/4gV query1 schema file: http://pastebin.com/JpqM87dr Regards, Miguel Angel. Hello, Is this information enough? I could try to assemble a complete test case but I have very little time right now because I am trying to meet a very difficult deadline. I will do so ASAP if needed. Regards, Miguel Angel. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] updated emacs configuration
Noah Misch n...@leadboat.com writes: On Wed, Aug 07, 2013 at 07:57:53AM -0400, Peter Eisentraut wrote: Did anyone have any outstanding concerns about this latest version? I thought it looked ready to commit. After upgrading to GNU Emacs 23.4.1 from a version predating directory-local variables, I saw switch/case indentation go on the fritz. My hooks were issuing (c-set-style "postgresql"), but .dir-locals.el set it back to plain "bsd" style. The most-reasonable fix I found was to locally add c-file-style to ignored-local-variables. It seems pretty ugly for this patch to be overriding our own .dir-locals.el, especially in a way with as many potential side effects as that. Why can't we fix .dir-locals.el to play nice with this? Is there a way for it to check for whether the "postgresql" style is defined? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] performance regression in 9.2/9.3
On Mon, Jun 9, 2014 at 9:51 AM, Linos i...@linos.es wrote: Hello, Is this information enough? I could try to assemble a complete test case but I have very little time right now because I am trying to meet a very difficult deadline. I will do ASAP if needed. It is not -- it was enough to diagnose a potential problem but not the solution. Tom was pretty clear: "I'd like to see a self-contained test case, by which I mean full details about the table/view schemas; it's not clear whether the actual data is very important here." merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] performance regression in 9.2/9.3
On 09/06/14 16:55, Merlin Moncure wrote: On Mon, Jun 9, 2014 at 9:51 AM, Linos i...@linos.es wrote: Hello, Is this information enough? I could try to assemble a complete test case but I have very little time right now because I am trying to meet a very difficult deadline. I will do ASAP if needed. It is not -- it was enough to diagnose a potential problem but not the solution. Tom was pretty clear: I'd like to see a self-contained test case, by which I mean full details about the table/view schemas; it's not clear whether the actual data is very important here.. merlin Merlin, the table/view schemas are attached to the email I replied to; that is the information I was asking about. Tom said he wanted full details about the table/view schemas, and those details are attached to the original email I replied to. Miguel Angel. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates
Andres Freund and...@2ndquadrant.com wrote: Well, I think reverting surely wouldn't be the right cure. It fixes a somewhat nasty bug. I'd certainly be prepared to add the two lines necessary to make it return DELETE_IN_PROGRESS after trying to understand Kevin's email about predicate.c and going through the other callers another time. I'm not actually sure yet whether the current state of affairs causes a problem for the serializable transaction isolation level implementation. The most important thing to me is that whatever is implemented is accurately documented in code comments so I can make any necessary adjustments to code in predicate.c -- or possibly determine that I *do* need some change to HTSV. Right now the HTSV embedded code comments suggest that the enum names and comments don't accurately describe the conditions under which they are returned, but I can't find anything else which does, short of reverse-engineering that from some fairly complex code. Perhaps it would be good if you could provide a concise description of the conditions under which value could currently be returned on this (or the related) thread before we talk about what changes might be needed? Maybe this is clear to others involved in the discussion, but I am not confident that I fully understand what gets returned under what conditions. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] why postgresql define NTUP_PER_BUCKET as 10, not other numbers smaller
On Mon, Jun 9, 2014 at 4:06 AM, b8flowerfire b8flowerf...@gmail.com wrote: When I read the source code about the hashjoin, I was very confused that postgresql defines the NTUP_PER_BUCKET value as 10. Since this value is used to estimate the tuple count in one bucket, is it better if we have a smaller value? I have not done any experiments, but it seems that we could achieve fewer hash collisions and better performance if we decrease the value. So could anyone explain to me why we define NTUP_PER_BUCKET as 10? Is there a specific situation where we would get worse performance or other trouble if we set NTUP_PER_BUCKET to 1 or 2? This has come up before. Basically, the problem is that if you reduce NTUP_PER_BUCKET, the bucket array gets larger, which might reduce the amount of space available for tuples to the point where the hash join overflows to multiple batches. That will be more expensive than performing the hash join with a higher NTUP_PER_BUCKET. But having said that, I think the current situation is pretty bad, too. NTUP_PER_BUCKET is basically the anticipated load factor for the hash table, and everything I've ever read about hash table design says you want that to be something like 1, or 0.25. So 10 seems really high. But I'm not sure exactly what to do to fix the problem, because there are indeed cases where we will be worse off if we just lower the value categorically. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Supporting Windows SChannel as OpenSSL replacement
On Mon, Jun 9, 2014 at 4:39 PM, Martijn van Oosterhout klep...@svana.org wrote: On Mon, Jun 09, 2014 at 03:35:23PM +0200, Magnus Hagander wrote: On Mon, Jun 9, 2014 at 3:19 PM, Andreas Karlsson andr...@proxel.se wrote: On 06/09/2014 01:45 PM, Heikki Linnakangas wrote: There was a patch set for this from Martijn van Oosterhout which was quite complete. http://www.postgresql.org/message-id/20060504134807.gk4...@svana.org Wow, blast from the past. That's fun, isn't it? :) A lot has, unfortunately, changed since 2006. It might be a good starting point. But actively starting from the point of "let's try to support multiple libraries" rather than "let's try to support gnutls" is probably also important. The patch did provide an API. The idea was that there were a number of functions which would need to be defined to support an SSL library. Each library would then have a wrapper which wrapped the library and based on the results of configure it compiled the right file into the backend. These functions were: extern void pgtls_initialize(void); extern void pgtls_destroy(void); extern int pgtls_open_server(Port *); extern void pgtls_close(Port *); extern ssize_t pgtls_read(Port *port, void *ptr, size_t len); extern ssize_t pgtls_write(Port *port, void *ptr, size_t len); Which should be easy enough to support for any library. These days you'd need to add support for verifying certificates, but I don't think that that would be difficult (unless the actual certificate formats are different). The two difficult points I think are the async support (libpq) and the windows socket emulation support (backend). Do those really work there? In particular the win32 stuff - though I guess that's less critical since we can actually do hackish things there, unlike in libpq. But the example there is that we can't have the library use recv()/send(), instead having it work through our own functions. No switching after compile time, that would just lead to useless overhead. 
Yes, I absolutely think we don't need to support more than one library at runtime. At some point we should design a new API, so that we can deprecate the old one. Even if we don't have the code ready, we need to get rid of PQgetssl(), and replace it with something else. I'm thinking probably a function that returns both a void pointer and an enum that tells you which library is actually in use. And a boolean just saying ssl on/off, because that's what a lot of clients are interested in and they don't care about more than that. Obviously, we also have to do something about PQinitOpenSSL(). Yeah, I think this was one of the more controversial parts. Support in the backend was primarily moving code around and renaming functions, fairly straightforward. Even error handling was not so hard (I found the gnutls handling of errors much easier than openssl). Yeah, the backend is easier, but also less important from the original reason. For the patching reason, it's of course just as important. One tricky part is that programs like to use libpq for the authentication, and then they hijack the connection using PQgetssl(). Is there *anybody* other than odbc that does that? Do we actually need a published API for that, or just a hack for pgodbc? The way I dealt with this is defining a new state "passthrough" where the caller would get a few function pointers to read/write/check the connection. Then the callers would not need to know what library libpq was compiled with. And libpq would know the connection was hijacked and refuse to act anymore. I don't think everyone was pleased with this, but no real alternative was presented (other than requiring people hijacking the connection to do the hard work). For information about which library was in use there was PQgettlsinfo() which returned a PGresult with information about the library and connection. I believe since then new functions have been added to libpq to retrieve info about certificates, so that might need a rethink also. Not really, no. 
We return the OpenSSL structure and have the client use that one directly. That's quite horrible already :) Basically, I think these last two points are the hard parts to get agreement (assuming there's agreement to do anything at all about the problem) and without nailing those down first whoever picks this up will be in for a lot of work. Agreed. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
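The shape of Martijn's six-function wrapper API is easy to model. Below is a Python sketch of the dispatch idea only (the real proposal selects one C file per library at configure time; the class names and the in-memory "null" backend here are invented purely for illustration):

```python
class TLSBackend:
    """Interface mirroring the pgtls_* functions quoted above: one
    concrete implementation per SSL library, and the build selects
    exactly one, so callers never see library specifics."""
    def initialize(self): raise NotImplementedError
    def destroy(self): raise NotImplementedError
    def open_server(self, port): raise NotImplementedError
    def close(self, port): raise NotImplementedError
    def read(self, port, length): raise NotImplementedError
    def write(self, port, data): raise NotImplementedError

class NullBackend(TLSBackend):
    """A do-nothing stand-in (hypothetical) that just loops written
    bytes back, showing that callers only ever touch the generic
    interface."""
    def initialize(self): self.buf = b""
    def destroy(self): self.buf = b""
    def open_server(self, port): return 0
    def close(self, port): pass
    def write(self, port, data):
        self.buf += data
        return len(data)
    def read(self, port, length):
        out, self.buf = self.buf[:length], self.buf[length:]
        return out

# In the C patch, "configure" picks which file implements the pgtls_*
# symbols; here the selection is just an assignment.
backend = NullBackend()
backend.initialize()
backend.write(None, b"hello")
print(backend.read(None, 5))
```

The point of the design is exactly what Martijn describes: the read/write/open/close surface is small and library-neutral, and everything library-specific stays behind it.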
Re: [HACKERS] why postgresql define NTUP_PER_BUCKET as 10, not other numbers smaller
Robert Haas robertmh...@gmail.com writes: This has come up before. Basically, the problem is that if you reduce NTUP_PER_BUCKET, the bucket array gets larger, which might reduce the amount of space available for tuples to the point where the hash join overflows to multiple batches. That will be more expensive than performing the hash join with a higher NTUP_PER_BUCKET. But having said that, I think the current situation is pretty bad, too. NTUP_PER_BUCKET is basically the anticipated load factor for the hash table, and everything I've ever read about hash table design says you want that to be something like 1, or 0.25. So 10 seems really high. But I'm not sure exactly what to do to fix the problem, because there are indeed cases where we will be worse off if we just lower the value categorically. Keep in mind that that standard advice is meant for all-in-memory cases, not for cases where the alternative to running with longer hash chains is dumping tuples out to disk and reading them back. I'm quite prepared to believe that we should change NTUP_PER_BUCKET ... but appealing to standard advice isn't a good basis for arguing that. Actual performance measurements (in both batched and unbatched cases) would be a suitable basis for proposing a change. regards, tom lane
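Tom's caution about batching can be made concrete with back-of-the-envelope arithmetic (a hypothetical sketch: the sizes, the 8-byte bucket pointer, and the simplified "buckets and tuples share work_mem" model are assumptions for illustration, not the planner's actual cost model). Shrinking NTUP_PER_BUCKET grows the bucket array, which can tip a one-batch hash join into multiple batches:

```python
def n_batches(n_tuples, tuple_bytes, work_mem_bytes, ntup_per_bucket,
              pointer_bytes=8):
    """Toy model: the bucket array (one pointer per bucket) and the
    tuples share work_mem; whatever the bucket array eats is no longer
    available for tuples, so a bigger array can force more batches."""
    n_buckets = max(1, n_tuples // ntup_per_bucket)
    tuple_space = work_mem_bytes - n_buckets * pointer_bytes
    if tuple_space <= 0:
        raise ValueError("bucket array alone exceeds work_mem")
    tuples_per_batch = tuple_space // tuple_bytes
    return -(-n_tuples // tuples_per_batch)  # ceiling division

work_mem = 64 * 1024 * 1024  # hypothetical 64MB setting
print(n_batches(1_000_000, 64, work_mem, 10))  # fits in one batch
print(n_batches(1_000_000, 64, work_mem, 1))   # bucket array forces two
```

In this made-up configuration the inner relation just fits at NTUP_PER_BUCKET = 10, while dropping to 1 spends 8MB on bucket pointers and pushes the join to a second batch, the exact trade-off being discussed.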
Re: [HACKERS] Supporting Windows SChannel as OpenSSL replacement
On Mon, Jun 09, 2014 at 11:39:17PM +0900, MauMau wrote: From: Heikki Linnakangas hlinnakan...@vmware.com Thoughts? While we're at it, we'll probably want to refactor things so that it's easy to support other SSL implementations too, like gnutls. That may be good because it provides users with choices. But I wonder if it is worth the complexity and maintainability of PostgreSQL code. The complexity is very low. SSL is a standard protocol and so all libraries offer the same functionality. We're not really doing anything complex. * Are SChannel and other libraries more secure than OpenSSL? IIRC, recently I read in the news that GnuTLS had a vulnerability. OpenSSL is probably the most widely used library, and many people are getting more interested in its quality. I expect the quality will improve thanks to the help from The Linux Foundation and other organizations/researchers. Does that matter? What's wrong with letting people choose? OpenVPN these days supports multiple SSL libraries, because PolarSSL (for example) has been vetted for a higher security level than OpenSSL. * Do other libraries get support from commercial vendor product support? For example, Safenet Inc., the famous HSM (hardware security module) vendor, supports OpenSSL to access the private key stored in its HSM product. Intel offered AES-NI implementation code to the OpenSSL community. I guess OpenSSL will continue to be the most functional and obtain the widest adoption and support. And the crappiest license. I think it's silly for PostgreSQL to dictate what SSL library users must use, when there are so many possibilities. We also support libedit for, in my opinion, worse reasons. Have a nice day, -- Martijn van Oosterhout klep...@svana.org http://svana.org/kleptog/ He who writes carelessly confesses thereby at the very outset that he does not attach much importance to his own thoughts. -- Arthur Schopenhauer
Re: [HACKERS] performance regression in 9.2/9.3
On Mon, Jun 9, 2014 at 10:00 AM, Linos i...@linos.es wrote: On 09/06/14 16:55, Merlin Moncure wrote: On Mon, Jun 9, 2014 at 9:51 AM, Linos i...@linos.es wrote: Hello, Is this information enough? I could try to assemble a complete test case but I have very little time right now because I am trying to meet a very difficult deadline. I will do ASAP if needed. It is not -- it was enough to diagnose a potential problem but not the solution. Tom was pretty clear: I'd like to see a self-contained test case, by which I mean full details about the table/view schemas; it's not clear whether the actual data is very important here.. merlin Merlin, in the email I replied to are attached the table/view schemas, I was referring to this information as enough or not. Tom said full details about the table/view schemas and these details are attached to the original email I replied to. A self contained test case would generally imply a precise sequence of steps (possibly with supplied data, or some manipulations via generate_series) that would reproduce the issue locally. Since data may not be required, you might be able to get away with a 'schema only dump', but you'd need to make sure to include necessary statistics (mostly what you'd need is in pg_statistic which you'd have to join against pg_class, pg_attribute and pg_namespace). Ideally, you'd be able to restore your schema only dump on a blank database with autovacuum disabled, hack in your statistics, and verify your query produced the same plan. Then (and only then) you could tar up your schema only file, the statistics data, and the query to update the data, and your query with the bad plan which you've triple checked matched your problem condition's plan, and send it to Tom. There might be some things I've missed but getting a blank database to reproduce your problem with a minimum number of steps is key. 
merlin
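Merlin's recipe for carrying statistics along with a schema-only dump boils down to one catalog join. Here is a sketch of such a query (assembled in Python only so it can be shown and checked as code; the catalog and column names, pg_statistic's starelid and staattnum joined to pg_class, pg_attribute, and pg_namespace, are the real ones, but exactly which columns to dump and how to re-insert them is left open, as in Merlin's description):

```python
def stats_dump_query(schema="public"):
    """Build a query listing per-column statistics for one schema,
    joining pg_statistic back to human-readable table/column names."""
    return f"""
    SELECT n.nspname, c.relname, a.attname, s.*
      FROM pg_statistic s
      JOIN pg_class c     ON c.oid = s.starelid
      JOIN pg_attribute a ON a.attrelid = s.starelid
                         AND a.attnum   = s.staattnum
      JOIN pg_namespace n ON n.oid = c.relnamespace
     WHERE n.nspname = '{schema}'
    """

print(stats_dump_query())
```

Run against the problem database, the output is the statistics payload you would hack back into the blank, autovacuum-disabled restore before re-checking the plan.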
Re: [HACKERS] performance regression in 9.2/9.3
On 09/06/14 17:30, Merlin Moncure wrote: On Mon, Jun 9, 2014 at 10:00 AM, Linos i...@linos.es wrote: On 09/06/14 16:55, Merlin Moncure wrote: On Mon, Jun 9, 2014 at 9:51 AM, Linos i...@linos.es wrote: Hello, Is this information enough? I could try to assemble a complete test case but I have very little time right now because I am trying to meet a very difficult deadline. I will do ASAP if needed. It is not -- it was enough to diagnose a potential problem but not the solution. Tom was pretty clear: I'd like to see a self-contained test case, by which I mean full details about the table/view schemas; it's not clear whether the actual data is very important here.. merlin Merlin, in the email I replied to are attached the table/view schemas, I was referring to this information as enough or not. Tom said full details about the table/view schemas and these details are attached to the original email I replied to. A self contained test case would generally imply a precise sequence of steps (possibly with supplied data, or some manipulations via generate_series) that would reproduce the issue locally. Since data may not be required, you might be able to get away with a 'schema only dump', but you'd need to make sure to include necessary statistics (mostly what you'd need is in pg_statistic which you'd have to join against pg_class, pg_attribute and pg_namespace). Ideally, you'd be able to restore your schema only dump on a blank database with autovacuum disabled, hack in your statistics, and verify your query produced the same plan. Then (and only then) you could tar up your schema only file, the statistics data, and the query to update the data, and your query with the bad plan which you've triple checked matched your problem condition's plan, and send it to Tom. There might be some things I've missed but getting a blank database to reproduce your problem with a minimum number of steps is key. 
merlin oh I understand now, sorry for the misunderstanding, I will prepare the complete test case ASAP, thank you for the explanation Merlin. Miguel Angel. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates
On 2014-06-09 08:00:52 -0700, Kevin Grittner wrote: I'm not actually sure yet whether the current state of affairs causes a problem for the serializable transaction isolation level implementation. I'd replied in the other thread before noticing you'd replied here... From what I understand right now it's not affected at all. I tried to make things a bit clearer there - but I am not sure I've succeeded. I'm certainly willing to explain things further if you can tell me which area is unclear. Right now the HTSV embedded code comments suggest that the enum names and comments don't accurately describe the conditions under which they are returned, but I can't find anything else which does, short of reverse-engineering that from some fairly complex code. Not really. You could even argue the current code more correctly represents the (old) comments: HEAPTUPLE_INSERT_IN_PROGRESS, /* inserting xact is still in progress */ HEAPTUPLE_DELETE_IN_PROGRESS /* deleting xact is still in progress */ the current code will return INSERT_IN_PROGRESS even if the tuple has *also* been deleted in another xact... I think the problem here is that there's simply no way to really represent that case accurately with the current API. Perhaps it would be good if you could provide a concise description of the conditions under which each value could currently be returned on this (or the related) thread before we talk about what changes might be needed? Maybe this is clear to others involved in the discussion, but I am not confident that I fully understand what gets returned under what conditions. HEAPTUPLE_DEAD, /* tuple is dead and deletable */ 1) xmin has committed, xmax has committed and wasn't just a locker. Xmax precedes OldestXmin. HEAPTUPLE_LIVE, /* tuple is live (committed, no deleter) */ 1) xmin has committed, xmax unset 2) xmin has committed, xmax is locked only. Status of xmax is irrelevant 3) xmin has committed, xmax has aborted. 
HEAPTUPLE_RECENTLY_DEAD, /* tuple is dead, but not deletable yet */ 1) xmin has committed, xmax has committed and wasn't only a locker. But xmax doesn't precede OldestXmin. HEAPTUPLE_INSERT_IN_PROGRESS, /* inserting xact is still in progress */ new: 1) xmin is in progress, xmin is the current backend, xmax is invalid 2) xmin is in progress, xmin is the current backend, xmax only a locker 3) xmin is in progress, xmin is the current backend, xmax aborted 4) xmin is in progress, xmin is *not* current backend, xmax is irrelevant old: 1) xmin is in progress, xmax is invalid 2) xmin is in progress, xmax is only a locker HEAPTUPLE_DELETE_IN_PROGRESS /* deleting xact is still in progress */ new: 1) xmin has committed, xmax is in progress, xmax is not just a locker 2) xmin is in progress, xmin is the current backend, xmax is not just a locker and in progress. old: 1) xmin has committed, xmax is in progress, xmax is not just a locker 2) xmin is in progress, xmax is set and not just a locker Note that the 2) case here never checked xmax's status. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
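Andres's decision table can be encoded compactly. The following is a toy Python model of the "new" rules only (an illustration of the enumerated conditions, not the backend's HeapTupleSatisfiesVacuum code; locker handling and the OldestXmin comparison are reduced to booleans):

```python
def htsv(xmin, xmax, xmin_is_me=False, xmax_is_locker=False,
         xmax_before_oldest_xmin=False):
    """xmin is 'committed' or 'in_progress'; xmax is None (unset),
    'committed', 'aborted', or 'in_progress'."""
    if xmin == "in_progress":
        # new DELETE_IN_PROGRESS rule 2): our own xact both inserted
        # the tuple and is deleting it
        if xmin_is_me and xmax == "in_progress" and not xmax_is_locker:
            return "DELETE_IN_PROGRESS"
        return "INSERT_IN_PROGRESS"   # new rules 1)-4)
    # from here on xmin has committed
    if xmax is None or xmax_is_locker or xmax == "aborted":
        return "LIVE"
    if xmax == "committed":
        return "DEAD" if xmax_before_oldest_xmin else "RECENTLY_DEAD"
    return "DELETE_IN_PROGRESS"       # committed xmin, in-progress deleter
```

Note how the model makes the contested case explicit: an in-progress xmin from *another* backend yields INSERT_IN_PROGRESS regardless of xmax, which is exactly the "tuple also deleted in another xact" situation the API cannot express.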
Re: [HACKERS] NUMA packaging and patch
Merlin Moncure mmonc...@gmail.com wrote: On Sun, Jun 8, 2014 at 5:45 PM, Kevin Grittner kgri...@ymail.com wrote: Hm, your patch seems to boil down to interleave_memory(start, size, numa_all_nodes_ptr) inside PGSharedMemoryCreate(). That's the functional part -- the rest is about not breaking the builds for environments which are not NUMA-aware. I've read your email a couple of times and am a little hazy around a couple of points, in particular: the above is probably more important than the attached patch. So I have a couple of questions: *) There is a lot of advice floating around (for example here: http://frosty-postgres.blogspot.com/2012/08/postgresql-numa-and-zone-reclaim-mode.html ) to instruct operators to disable zone_reclaim. Will your changes invalidate any of that advice? I expect that it will make the need for that far less acute, although it is probably still best to disable zone_reclaim (based on the documented conditions under which disabling it makes sense). *) is there any downside to enabling --with-libnuma if you have support? Not that I can see. There are two additional system calls on postmaster start-up. I don't expect the time those take to be significant. Do you expect packagers will enable it generally? I suspect so. Why not just always build it in (if configure allows it) and rely on a GUC if there is some kind of tradeoff (and if there is one, what kinds of things are you looking for to manage it)? If a build is done on a machine with the NUMA library, and the executable is deployed on a machine without it, the postmaster will get an error on the missing library. I talked about this briefly with Tom in Ottawa, and he thought that it would be up to packagers to create a dependency on the library if they build PostgreSQL using the --with-libnuma option. The reason to require the option is so that a build is not created which won't run on target machines if a packager does nothing to deal with NUMA. 
*) The bash script above, what problem does the 'alternate policy' solve? By default, all OS buffers and cache are located in the memory node closest to the process which does the read or write which first causes it to be used. For something like the cp command, that probably makes sense. For something like PostgreSQL it can lead to unbalanced placement of shared resources (like pages in shared tables and indexes). *) What kinds of improvements (even if in very general terms) will we see from better numa management? Are there further optimizations possible? When I spread both OS cache and PostgreSQL shared memory, I got about 2% better performance overall for a read-only load on a 4 node system which started with everything on one node. I used pgbench and picked a scale which put the database size at about 25% of machine memory before I initialized the database, so that one memory node was 100% filled with minimal spill to the other nodes. The run times between the two cases had very minimal overlap. The balanced memory usage had more consistent results; the unbalanced load had more variable performance timings, with a rare run showing better than all the balanced times. I didn't spend as much time with read/write benchmarks but those seemed overall worse for the unbalanced load, and one outlier on the bad side was about 20% below the (again, pretty tightly clustered) times for the balanced load. These tests were designed to try to create a pretty bad case for the unbalanced load in a default cpuset configuration and just an unlucky sizing of the working set relative to a memory node size. At PGCon I had a discussion over lunch with someone who saw far worse performance from unbalanced memory, but he carefully engineered a really bad case by using one cpuset to force all data into one node, and then another cpuset to force PostgreSQL to run only on cores from which access to that node was relatively slow. 
If I remember correctly, he saw about 20% of the throughput that way versus using the same cores with balanced memory usage. He conceded that this was a pretty artificial case, and you would have to be *trying* to hurt performance to set things up that way, but he wanted to establish a worst case so that he had a hard bounding of what the maximum possible benefit from balancing load might be. There is definitely a need for more benchmarks and benchmarks on more environments, but my preliminary tests all looked favorable to the combination of this patch and the cpuset changes. I would have posted this months ago if I had found enough time to do more benchmarks and put together a nice presentation of the results, but I figured it was a good idea to put this in front of people even with only preliminary results, so that if others were interested in doing so they could see what results they got in their environments or with workloads I had not considered. I will note that given the wide differences I saw between run times with the
Re: [HACKERS] NUMA packaging and patch
On 2014-06-09 08:59:03 -0700, Kevin Grittner wrote: *) There is a lot of advice floating around (for example here: http://frosty-postgres.blogspot.com/2012/08/postgresql-numa-and-zone-reclaim-mode.html ) to instruct operators to disable zone_reclaim. Will your changes invalidate any of that advice? I expect that it will make the need for that far less acute, although it is probably still best to disable zone_reclaim (based on the documented conditions under which disabling it makes sense). I think it'll still be important unless you're running an OLTP workload (i.e. minimal per backend allocations) and your entire workload fits into shared buffers. What zone_reclaim > 0 essentially does is to never allocate memory from remote nodes. I.e. it will throw away all numa node local OS cache to satisfy a memory allocation (including pagefaults). I honestly wouldn't expect this to make a huge difference *wrt* zone_reclaim_mode. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] performance regression in 9.2/9.3
Linos i...@linos.es writes: On 05/06/14 19:39, Tom Lane wrote: I'd like to see a self-contained test case, by which I mean full details about the table/view schemas; it's not clear whether the actual data is very important here. query1 schema file: http://pastebin.com/JpqM87dr Sorry about the delay on getting back to this. I downloaded the above schema file and tried to run the originally given query with it, and it failed because the query refers to a couple of tienda columns that don't exist anywhere in this schema. When you submit an updated version, please make sure that all the moving parts match ;-). regards, tom lane
Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates
Andres Freund and...@2ndquadrant.com wrote: On 2014-06-09 08:00:52 -0700, Kevin Grittner wrote: I tried to make things a bit clearer there - but I am not sure I've succeeded. I'm certainly willing to explain things further if you can tell me which area is unclear. Thanks! IMO, something like this should be included in the code comments. HEAPTUPLE_INSERT_IN_PROGRESS, /* inserting xact is still in progress */ HEAPTUPLE_DELETE_IN_PROGRESS /* deleting xact is still in progress */ the current code will return INSERT_IN_PROGRESS even if the tuple has *also* been deleted in another xact... I think the problem here is that there's simply no way to really represent that case accurately with the current API. For purposes of predicate.c, if the "also deleted" activity might be rolled back without rolling back the insert, INSERT_IN_PROGRESS is the only correct value. If they will either both commit or neither will commit, predicate.c would be more efficient if HEAPTUPLE_RECENTLY_DEAD was returned, but I think HEAPTUPLE_INSERT_IN_PROGRESS would be OK from a correctness PoV. Perhaps it would be good if you could provide a concise description of the conditions under which each value could currently be returned on this (or the related) thread before we talk about what changes might be needed? Maybe this is clear to others involved in the discussion, but I am not confident that I fully understand what gets returned under what conditions. HEAPTUPLE_DEAD, /* tuple is dead and deletable */ 1) xmin has committed, xmax has committed and wasn't just a locker. Xmax precedes OldestXmin. Perfect. HEAPTUPLE_LIVE, /* tuple is live (committed, no deleter) */ 1) xmin has committed, xmax unset 2) xmin has committed, xmax is locked only. Status of xmax is irrelevant 3) xmin has committed, xmax has aborted. Perfect. HEAPTUPLE_RECENTLY_DEAD, /* tuple is dead, but not deletable yet */ 1) xmin has committed, xmax has committed and wasn't only a locker. But xmax doesn't precede OldestXmin. 
For my purposes, it would be better if this also included: 2) xmin is in progress, xmax matches (or includes) xmin ... but that would be only a performance tweak. HEAPTUPLE_INSERT_IN_PROGRESS, /* inserting xact is still in progress */ new: 1) xmin is in progress, xmin is the current backend, xmax is invalid 2) xmin is in progress, xmin is the current backend, xmax only a locker 3) xmin is in progress, xmin is the current backend, xmax aborted 4) xmin is in progress, xmin is *not* current backend, xmax is irrelevant old: 1) xmin is in progress, xmax is invalid 2) xmin is in progress, xmax is only a locker I think this is OK from a correctness PoV. There may be an opportunity to optimize. I will look more closely at whether it seems likely to matter much, and what sort of change in the return conditions or in predicate.c might be needed. HEAPTUPLE_DELETE_IN_PROGRESS /* deleting xact is still in progress */ new: 1) xmin has committed, xmax is in progress, xmax is not just a locker 2) xmin is in progress, xmin is the current backend, xmax is not just a locker and in progress. I'm not clear on how 2) could happen unless xmax is the current backend or a subtransaction thereof. Could you clarify? old: 1) xmin has committed, xmax is in progress, xmax is not just a locker 2) xmin is in progress, xmax is set and not just a locker Note that the 2) case here never checked xmax's status. Again, I'm not sure how 2) could happen unless they involve the same top-level transaction. What am I missing? -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates
On 2014-06-09 09:45:12 -0700, Kevin Grittner wrote: Andres Freund and...@2ndquadrant.com wrote: HEAPTUPLE_INSERT_IN_PROGRESS, /* inserting xact is still in progress */ HEAPTUPLE_DELETE_IN_PROGRESS /* deleting xact is still in progress */ the current code will return INSERT_IN_PROGRESS even if the tuple has *also* been deleted in another xact... I think the problem here is that there's simply no way to really represent that case accurately with the current API. For purposes of predicate.c, if the also deleted activity might be rolled back without rolling back the insert, INSERT_IN_PROGRESS is the only correct value. If they will either both commit or neither will commit, predicate.c would be more efficient if HEAPTUPLE_RECENTLY_DEAD was returned, but I HEAPTUPLE_INSERT_IN_PROGRESS would be OK from a correctness PoV. That's basically the argument for the new behaviour. But I am not sure, given predicate.c's coding, how HEAPTUPLE_DELETE_IN_PROGRESS could cause problems. Could you elaborate, since that's the contentious point with Tom? Since 'both in progress' can only happen if xmin and xmax are the same toplevel xid and you resolve subxids to toplevel xids I think it should currently be safe either way? HEAPTUPLE_RECENTLY_DEAD, /* tuple is dead, but not deletable yet */ 1) xmin has committed, xmax has committed and wasn't only a locker. But xmax doesn't precede OldestXmin. For my purposes, it would be better if this also included: 2) xmin is in progress, xmax matches (or includes) xmin ... but that would be only a performance tweak. I don't see that happening as there's several callers for which it is important to know whether the xacts are still alive or not. HEAPTUPLE_DELETE_IN_PROGRESS /* deleting xact is still in progress */ new: 1) xmin has committed, xmax is in progress, xmax is not just a locker 2) xmin is in progress, xmin is the current backend, xmax is not just a locker and in progress. 
I'm not clear on how 2) could happen unless xmax is the current backend or a subtransaction thereof. Could you clarify? old: 1) xmin has committed, xmax is in progress, xmax is not just a locker 2) xmin is in progress, xmax is set and not just a locker Note that the 2) case here never checked xmax's status. Again, I'm not sure how 2) could happen unless they involve the same top-level transaction. What am I missing? Right, both can only happen if the tuple is created and deleted in the same backend. Is that in contradiction to something you see? Andres
Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal and patch
I'm having trouble setting max_async_io_prefetchers in postgresql.conf. It says it cannot be changed. Is that fixed at initdb time? (sounds like a bad idea if it is) On Sun, Jun 8, 2014 at 11:12 PM, johnlumby johnlu...@hotmail.com wrote: updated version of patch compatible with git head of 140608, (adjusted proc oid and a couple of minor fixes)
Re: [HACKERS] NUMA packaging and patch
Andres Freund and...@2ndquadrant.com wrote: On 2014-06-09 08:59:03 -0700, Kevin Grittner wrote: *) There is a lot of advice floating around (for example here: http://frosty-postgres.blogspot.com/2012/08/postgresql-numa-and-zone-reclaim-mode.html ) to instruct operators to disable zone_reclaim. Will your changes invalidate any of that advice? I expect that it will make the need for that far less acute, although it is probably still best to disable zone_reclaim (based on the documented conditions under which disabling it makes sense). I think it'll still be important unless you're running an OLTP workload (i.e. minimal per backend allocations) and your entire workload fits into shared buffers. What zone_reclaim > 0 essentially does is to never allocate memory from remote nodes. I.e. it will throw away all numa node local OS cache to satisfy a memory allocation (including pagefaults). I don't think that cpuset spreading of OS buffers and cache, and the patch to spread shared memory, will make too much difference unless the working set is fully cached. Where I have seen the biggest problems is when the active set is larger than one memory node but smaller than total machine RAM. I would agree that unless this patch is providing benefit for such a fully-cached load, it won't make any difference regarding the need for zone_reclaim_mode. Where the data is heavily cached, zone_reclaim > 0 might discard some cached pages to allow, say, a RAM sort to be done in faster memory (for the current process's core), so it might be a wash or even make zone_reclaim > 0 a win. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates
Andres Freund and...@2ndquadrant.com writes: On 2014-06-09 09:45:12 -0700, Kevin Grittner wrote: For purposes of predicate.c, if the also deleted activity might be rolled back without rolling back the insert, INSERT_IN_PROGRESS is the only correct value. That's basically the argument for the new behaviour. But I am not sure, given predicate.c's coding, how HEAPTUPLE_DELETE_IN_PROGRESS could cause problems. Could you elaborate, since that's the contentious point with Tom? Since 'both in progress' can only happen if xmin and xmax are the same toplevel xid and you resolve subxids to toplevel xids I think it should currently be safe either way? Assuming that Kevin's describing his needs correctly, we could resolve this by inventing HEAPTUPLE_INSERT_AND_DELETE_IN_PROGRESS. VACUUM could assume that that's a probably-dead tuple, while SSI could do something different. I'm not sure if it's worth special-casing xmin == xmax, where the tuple is guaranteed to never be of interest to any other transaction? The reason this stuff is not too carefully spec'd is that when HTSV was written, there was no expectation that there was any correctness issue around which of these cases was returned. I wonder whether SSI should be using HTSV at all. regards, tom lane
Re: [HACKERS] RETURNING PRIMARY KEY syntax extension
On 09/06/14 23:42, Vik Fearing wrote: On 06/09/2014 09:06 AM, Gavin Flower wrote: From memory all unique keys can be considered 'candidate primary keys', but only one can be designated 'the PRIMARY KEY'. Almost. Candidate keys are also NOT NULL. Yeah, obviously! (Except, I did actually forget that - me bad.) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RETURNING PRIMARY KEY syntax extension
On 2014-06-09 13:42:22 +0200, Vik Fearing wrote: On 06/09/2014 09:06 AM, Gavin Flower wrote: From memory all unique keys can be considered 'candidate primary keys', but only one can be designated 'the PRIMARY KEY'. Almost. Candidate keys are also NOT NULL. The list actually is a bit longer. They also cannot be partial. There's generally also the restriction that for some contexts - like e.g. foreign keys - primary/candidate keys may not be deferrable. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Supporting Windows SChannel as OpenSSL replacement
On 06/09/2014 06:03 PM, Magnus Hagander wrote: One tricky part is that programs like to use libpq for the authentication, and then they hijack the connection using PGgetssl(). Is there *anybody* other than odbc that does that? Do we actually need a published API for that, or just a hack for pgodbc? I wish psqlODBC would stop doing that. It's kind of cool that it supports compiling without libpq, but it's really quite a mess. I think we should modify psqlODBC to use the libpq API like most people do. - Heikki
Re: [HACKERS] Supporting Windows SChannel as OpenSSL replacement
On Mon, Jun 9, 2014 at 7:45 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 06/09/2014 06:03 PM, Magnus Hagander wrote: One tricky part is that programs like to use libpq for the authentication, and then they hijack the connection using PGgetssl(). Is there *anybody* other than odbc that does that? Do we actually need a published API for that, or just a hack for pgodbc? I wish psqlODBC would stop doing that. It's kind of cool that it supports compiling without libpq, but it's really quite a mess. I think we should modify psqlODBC to use the libpq API like most people do. This was, I believe, done at one point, and then reverted. I think that was because libpq didn't actually have all the features required either for the current or for planned features of the ODBC driver. That situation might be very different now though; there's more functionality available in libpq. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] updated emacs configuration
On Mon, Jun 09, 2014 at 10:52:34AM -0400, Tom Lane wrote: Noah Misch n...@leadboat.com writes: On Wed, Aug 07, 2013 at 07:57:53AM -0400, Peter Eisentraut wrote: Did anyone have any outstanding concerns about this latest version? I thought it looked ready to commit. After upgrading to GNU Emacs 23.4.1 from a version predating directory-local variables, I saw switch/case indentation go on the fritz. My hooks were issuing (c-set-style "postgresql"), but .dir-locals.el set it back to plain bsd style. The most-reasonable fix I found was to locally add c-file-style to ignored-local-variables. It seems pretty ugly for this patch to be overriding our own .dir-locals.el, especially in a way with as many potential side effects as that. Why can't we fix .dir-locals.el to play nice with this? Is there a way for it to check whether the postgresql style is defined? Nontrivial material in .dir-locals.el will tend to fire unsafe-local-variable warnings. Ugly situation, yes. What are these many potential side effects you have in mind? -- Noah Misch EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates
Andres Freund and...@2ndquadrant.com wrote: On 2014-06-09 09:45:12 -0700, Kevin Grittner wrote: I am not sure, given predicate.c's coding, how HEAPTUPLE_DELETE_IN_PROGRESS could cause problems. Could you elaborate, since that's the contentious point with Tom? Since 'both in progress' can only happen if xmin and xmax are the same toplevel xid and you resolve subxids to toplevel xids I think it should currently be safe either way? The only way that it could be a problem is if the DELETE is in a subtransaction which might get rolled back without rolling back the INSERT. If we ignore the conflict because we assume the INSERT will be negated by the DELETE, and that doesn't happen, we would get false negatives which would compromise correctness. If we assume that the DELETE might not happen when the DELETE is not in a separate subtransaction we might get a false positive, which would only be a performance hit. If we know either is possible and have a way to check in predicate.c, it's fine to check it there. HEAPTUPLE_RECENTLY_DEAD, /* tuple is dead, but not deletable yet */ 1) xmin has committed, xmax has committed and wasn't only a locker. But xmax doesn't precede OldestXmin. For my purposes, it would be better if this also included: 2) xmin is in progress, xmax matches (or includes) xmin ... but that would be only a performance tweak. I don't see that happening as there's several callers for which it is important to know whether the xacts are still alive or not. OK HEAPTUPLE_DELETE_IN_PROGRESS /* deleting xact is still in progress */ new: 1) xmin has committed, xmax is in progress, xmax is not just a locker 2) xmin is in progress, xmin is the current backend, xmax is not just a locker and in progress. I'm not clear on how 2) could happen unless xmax is the current backend or a subtransaction thereof. Could you clarify? 
old: 1) xmin has committed, xmax is in progress, xmax is not just a locker 2) xmin is in progress, xmax is set and not just a locker Note that the 2) case here never checked xmax's status. Again, I'm not sure how 2) could happen unless they involve the same top-level transaction. What am I missing? Right, both can only happen if the tuple is created and deleted in the same backend. Is that in contradiction to something you see? Well, you're making a big point that the status of xmax was not checked in the old code. If xmax is the same as xmin and xmin is in progress, the additional check seems redundant -- unless I'm missing something. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] performance regression in 9.2/9.3
On 09/06/14 18:31, Tom Lane wrote: Linos i...@linos.es writes: On 05/06/14 19:39, Tom Lane wrote: I'd like to see a self-contained test case, by which I mean full details about the table/view schemas; it's not clear whether the actual data is very important here. query1 schema file: http://pastebin.com/JpqM87dr Sorry about the delay on getting back to this. I downloaded the above schema file and tried to run the originally given query with it, and it failed because the query refers to a couple of tienda columns that don't exist anywhere in this schema. When you submit an updated version, please make sure that all the moving parts match ;-). regards, tom lane Tom, are you trying with the modified query 1 I posted in the email where you found the schema link? I changed it a little to remove 4 views; these views were where the tienda columns were. Here you can find the modified query and the new explain without these views: http://pastebin.com/7u2Dkyxp http://explain.depesz.com/s/2V9d Anyway, Merlin told me how to create a more complete self-contained case without data; I will try to do it ASAP. I am really busy right now trying to meet a deadline, but I will find some time to create this test case. Thank you Tom. Regards, Miguel Angel.
Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates
Tom Lane t...@sss.pgh.pa.us wrote: The reason this stuff is not too carefully spec'd is that when HTSV was written, there was no expectation that there was any correctness issue around which of these cases was returned. I wonder whether SSI should be using HTSV at all. That's certainly a reasonable question. When I was writing the SSI code, I was blissfully unaware of HTSV and had coded up a way to check this which seemed to work for all tests we had. Jeff Davis, reviewing the code, was concerned that such separate code was more likely to miss something or to break as visibility handling changed. He argued that HTSV was basically checking for the same things I was, and that a redundant version which did the check differently was a bad idea. Here is where it was discussed during development: http://www.postgresql.org/message-id/flat/1296499247.11513.777.camel@jdavis#1296499247.11513.777.camel@jdavis -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates
Kevin Grittner wrote: Andres Freund and...@2ndquadrant.com wrote: On 2014-06-09 09:45:12 -0700, Kevin Grittner wrote: old: 1) xmin has committed, xmax is in progress, xmax is not just a locker 2) xmin is in progress, xmax is set and not just a locker Note that the 2) case here never checked xmax's status. Again, I'm not sure how 2) could happen unless they involve the same top-level transaction. What am I missing? Right, both can only happen if the tuple is created and deleted in the same backend. Is that in contradiction to something you see? Well, you're making a big point that the status of xmax was not checked in the old code. If xmax is the same as xmin and xmin is in progress, the additional check seems redundant -- unless I'm missing something. IIRC the case that prompted the fix was a CREATE INDEX in the same transaction that created a tuple which was later deleted by an aborted subtransaction. If the creating transaction is not this backend, that's fine. But if the creating transaction IsCurrentTransactionId() then we need to be careful about aborted subxacts: if a tuple was deleted by an aborted subxact then it's still visible to this transaction. Xmax must be checked in this case. Note that the difference is pretty specific to CREATE INDEX. Vacuuming doesn't have to worry about such cases, mainly because you can't run vacuum in a transaction. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates
Tom Lane t...@sss.pgh.pa.us wrote: Assuming that Kevin's describing his needs correctly, we could resolve this by inventing HEAPTUPLE_INSERT_AND_DELETE_IN_PROGRESS. VACUUM could assume that that's a probably-dead tuple, while SSI could do something different. That could work. On the other hand, the old code, where such a transaction showed as HEAPTUPLE_DELETE_IN_PROGRESS, might still work for predicate.c as long as it's clear that this return code sometimes means insert and delete are both in progress and the insert might commit without committing the delete. Those conditions could be identified within predicate.c; although it seems like that would involve duplicating some work which was already in HTSV, with the usual costs and risks of duplicate code. I'm not sure if it's worth special-casing xmin == xmax, where the tuple is guaranteed to never be of interest to any other transaction? That could be checked in predicate.c. I see no reason to create a separate return code for it if it's not of interest for other callers. It could even be left as a later optimization, since a false positive serialization failure doesn't compromise correctness, but it seems like it is probably easy enough to cover to just do so now. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
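The safe-side reasoning in this subthread can be sketched as a toy decision function. This is hypothetical illustrative code, not actual predicate.c logic; it only encodes the trade-off stated above: a false positive costs performance, a false negative breaks serializability, and the xmin == xmax top-level case is never visible to any other transaction.

```python
def must_flag_rw_conflict(insert_in_progress: bool,
                          delete_in_progress: bool,
                          delete_in_subxact: bool) -> bool:
    """Toy model of the conservative rule discussed above.

    Skipping a conflict is only safe when the delete is certain to
    share the insert's fate (same top-level transaction). A delete
    sitting in a subtransaction may abort alone, leaving the insert
    live, so the tuple must still be treated as a live insert.
    """
    if not insert_in_progress:
        return False  # insert already resolved; normal rules apply
    if delete_in_progress and not delete_in_subxact:
        # xmin == xmax at top level: the tuple can never become
        # visible to any other transaction, so no conflict is needed
        return False
    # no delete, or a delete that might independently roll back:
    # treat the tuple as insert-in-progress (at worst a false positive)
    return True
```

The asymmetry is the point: erring toward `True` produces only spurious serialization failures, never missed ones.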
Re: [HACKERS] RETURNING PRIMARY KEY syntax extension
On 6/9/14, 8:35 AM, Hannu Krosing wrote: On 06/09/2014 06:58 AM, Ian Barwick wrote: Hi, The JDBC API provides the getGeneratedKeys() method as a way of retrieving primary key values without the need to explicitly specify the primary key column(s). Is it defined by the standard to return _only_ generated primary keys, and not, for example, generated alternate keys? I was wondering that myself. I think it's certainly reasonable to expect someone would want RETURNING SEQUENCE VALUES, which would return the value of every column that owned a sequence (ie: ALTER SEQUENCE ... OWNED BY). ISTM that would certainly handle the performance aspect of this, and it sounds more in line with what I'd expect getGeneratedKeys() to do. -- Jim C. Nasby, Data Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net
Re: [HACKERS] Allowing NOT IN to use ANTI joins
On Sun, Jun 8, 2014 at 5:36 AM, David Rowley dgrowle...@gmail.com wrote: Currently pull_up_sublinks_qual_recurse only changes the plan for NOT EXISTS queries and leaves NOT IN alone. The reason for this is that the values returned by a subquery in the IN clause could have NULLs. A simple example of this (without a subquery) is: select 1 where 3 not in (1, 2, null); This returns 0 rows because 3 <> NULL is unknown. The attached patch allows an ANTI-join plan to be generated in cases like: CREATE TABLE a (id INT PRIMARY KEY, b_id INT NOT NULL); CREATE TABLE b (id INT NOT NULL); SELECT * FROM a WHERE b_id NOT IN(SELECT id FROM b); To generate a plan like: QUERY PLAN: Hash Anti Join (cost=64.00..137.13 rows=1070 width=8) Hash Cond: (a.b_id = b.id) -> Seq Scan on a (cost=0.00..31.40 rows=2140 width=8) -> Hash (cost=34.00..34.00 rows=2400 width=4) -> Seq Scan on b (cost=0.00..34.00 rows=2400 width=4) I think this will be great; I've run into this problem often from applications I have no control over. I thought a more complete, but probably much harder, solution would be to add some metadata to the hash anti-join infrastructure that tells it: if you find any nulls in the outer scan, stop running without returning any rows. I think that should work because the outer rel already has to run completely before any rows can be returned. But what I can't figure out is: would that change obviate the need for your change? Once we can correctly deal with nulls in a NOT IN list through a hash anti join, is there a cost estimation advantage to being able to prove that the null can't occur? (And of course if you have code that works, while I have vague notions of what might be, then my notion probably does not block your code.) Cheers, Jeff
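Why NULLs block the anti-join rewrite can be seen by modelling SQL's three-valued NOT IN outside the database. This is an illustrative sketch (None standing in for NULL), not planner code:

```python
def sql_not_in(value, values):
    """SQL semantics of `value NOT IN (...)` under three-valued logic.

    Returns True, False, or None (unknown). Any NULL in the list
    makes a would-be TRUE result unknown, because NULL might have
    equalled `value`.
    """
    if value is None:
        return None
    if any(v == value for v in values if v is not None):
        return False  # definite match, so NOT IN is definitely false
    saw_null = any(v is None for v in values)
    return None if saw_null else True

# A WHERE clause keeps a row only when the predicate is True, so
# `SELECT 1 WHERE 3 NOT IN (1, 2, NULL)` returns zero rows: the
# predicate is unknown, not true.
```

With the NOT NULL constraint on `b.id`, the unknown case is provably impossible, which is exactly what lets the patch substitute an ANTI join.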
Re: [HACKERS] Allowing NOT IN to use ANTI joins
Jeff Janes jeff.ja...@gmail.com writes: On Sun, Jun 8, 2014 at 5:36 AM, David Rowley dgrowle...@gmail.com wrote: The attached patch allows an ANTI-join plan to be generated in cases like: CREATE TABLE a (id INT PRIMARY KEY, b_id INT NOT NULL); CREATE TABLE b (id INT NOT NULL); SELECT * FROM a WHERE b_id NOT IN(SELECT id FROM b); I think this will be great, I've run into this problem often from applications I have no control over. I thought a more complete, but probably much harder, solution would be to add some metadata to the hash anti-join infrastructure that tells it If you find any nulls in the outer scan, stop running without returning any rows. I think that should work because the outer rel already has to run completely before any rows can be returned. Huh? The point of an antijoin (or indeed most join methods) is that we *don't* have to examine the whole inner input to make a decision. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] cancelling statement due to user request error occurs but the transaction has committed.
Hi Amit, Thank you for your response. There can be a similar observation if the server goes down (power outage or the like) after committing a transaction: the client will see a broken connection, so he can misunderstand that as well. I think for such corner cases, the client needs to reconfirm his action's results with the database before concluding anything. I see. Now I understand that a ProcessInterrupts error (ProcDie, QueryCancel, ClientLost, ...) does not mean that the query has been rolled back. Regards, Naoya --- Naoya Anzai Engineering Department NEC Solution Innovators, Ltd. E-Mail: anzai-na...@mxu.nes.nec.co.jp ---
Re: [HACKERS] updated emacs configuration
On Sun, 2014-06-08 at 21:55 -0400, Noah Misch wrote: After upgrading to GNU Emacs 23.4.1 from a version predating directory-local variables, I saw switch/case indentation go on the fritz. My hooks were issuing (c-set-style "postgresql"), but .dir-locals.el set it back to plain bsd style. I'd consider just getting rid of the (c-file-style . "bsd") setting in .dir-locals.el and putting the actual interesting settings from the style in there. Another alternative is to change emacs.samples to use hack-local-variables-hook instead, as described here: http://www.emacswiki.org/emacs/LocalVariables
Re: [HACKERS] tests for client programs
On Thu, 2014-06-05 at 21:57 -0400, Noah Misch wrote: I recommend TMPDIR => 1 instead of setting DIR. I originally decided against doing that, because 1) I don't know if all systems would have enough space in their regular temporary directory for the kinds of things we put there. Using the build directory seems safer. 2) One debugging method is to set CLEANUP to false and then manually inspect the data directory left behind. (In the future, this might be exposed via a command-line option.) This would become more cumbersome and error-prone if we used TMPDIR. This temporary directory is used for Unix sockets, so path length limitations will be a problem: http://www.postgresql.org/message-id/20140329172224.ga170...@tornado.leadboat.com That, however, is a good argument for doing it the other way. Maybe we need two temporary directories.
Re: [HACKERS] RETURNING PRIMARY KEY syntax extension
A definite +1 on this feature. A while ago I got partway through hacking the hibernate postgres dialect to make it issue a RETURNING clause to spit out the primary key before I realised that the driver was already doing a RETURNING * internally. On 10 June 2014 05:53, Jim Nasby j...@nasby.net wrote: I was wondering that myself. I think it's certainly reasonable to expect someone would want RETURNING SEQUENCE VALUES, which would return the value of every column that owned a sequence (ie: ALTER SEQUENCE ... OWNED BY). ISTM that would certainly handle the performance aspect of this, and it sounds more in line with what I'd expect getGeneratedKeys() to do. Keep in mind that not all generated keys come from sequences. Many people have custom key generator functions, including UUIDs and other exotic things like Instagram's setup [1]. RETURNING GENERATED KEYS perhaps, but then how do we determine that? Any column that was filled with a default value? But that's potentially returning far more values than the user will want - I bet 99% of users just want their generated primary key. The spec is a bit vague [2]: Retrieves any auto-generated keys created as a result of executing this Statement object. If this Statement object did not generate any keys, an empty ResultSet object is returned. Note: If the columns which represent the auto-generated keys were not specified, the JDBC driver implementation will determine the columns which best represent the auto-generated keys. The second paragraph refers to [3] and [4], where the application can specify which columns it's after. Given that there's a mechanism to specify which keys the application wants returned in the driver, and the driver in that case can just issue a RETURNING clause with a column list, my gut feel would be to just support returning primary keys, as that will handle most cases of e.g. middleware like ORMs fetching that without needing to know the specific column names.
Cheers Tom [1] http://instagram-engineering.tumblr.com/post/10853187575/sharding-ids-at-instagram [2] http://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#getGeneratedKeys() [3] http://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#execute(java.lang.String,%20int[]) [4] http://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#execute(java.lang.String,%20java.lang.String[])
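The driver behaviour under discussion - appending RETURNING * today versus a requested column list - can be sketched as follows. This is a naive string append for illustration only; a real driver must parse the statement and quote identifiers properly:

```python
def with_returning(dml: str, key_columns=None) -> str:
    """Rewrite a DML statement the way the thread describes: the JDBC
    driver currently appends RETURNING * to the supplied statement;
    a column-aware variant would append only the requested key
    columns, avoiding the transfer of every column of wide tables.
    """
    cols = ", ".join(key_columns) if key_columns else "*"
    return f"{dml} RETURNING {cols}"

# Current behaviour: every column of each affected row comes back.
wide = with_returning("INSERT INTO t (v) VALUES (1)")
# Column-aware variant: only the generated key is returned.
narrow = with_returning("INSERT INTO t (v) VALUES (1)", ["id"])
```

The proposed RETURNING PRIMARY KEY syntax would push the column choice into the server, so the driver need not know the key column names at all.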
Re: [HACKERS] Allowing NOT IN to use ANTI joins
On Monday, June 9, 2014, Tom Lane t...@sss.pgh.pa.us wrote: Jeff Janes jeff.ja...@gmail.com writes: On Sun, Jun 8, 2014 at 5:36 AM, David Rowley dgrowle...@gmail.com wrote: The attached patch allows an ANTI-join plan to be generated in cases like: CREATE TABLE a (id INT PRIMARY KEY, b_id INT NOT NULL); CREATE TABLE b (id INT NOT NULL); SELECT * FROM a WHERE b_id NOT IN(SELECT id FROM b); I think this will be great, I've run into this problem often from applications I have no control over. I thought a more complete, but probably much harder, solution would be to add some metadata to the hash anti-join infrastructure that tells it: if you find any nulls in the outer scan, stop running without returning any rows. I think that should work because the outer rel already has to run completely before any rows can be returned. Huh? The point of an antijoin (or indeed most join methods) is that we *don't* have to examine the whole inner input to make a decision. But all hash join methods need to examine the entire *outer* input, no? Have I screwed up my terminology here? If you are using NOT IN, then once you find a NULL in the outer input (if the outer input is the in-list; clearly you can't reverse the two inputs in this case), you don't even need to finish reading the outer input, nor start reading the inner input, because all rows are automatically excluded by the weird semantics of NOT IN. Cheers, Jeff
Re: [HACKERS] tests for client programs
On Mon, Jun 09, 2014 at 09:12:27PM -0400, Peter Eisentraut wrote: On Thu, 2014-06-05 at 21:57 -0400, Noah Misch wrote: I recommend TMPDIR => 1 instead of setting DIR. I originally decided against doing that, because 1) I don't know if all systems would have enough space in their regular temporary directory for the kinds of things we put there. Using the build directory seems safer. 2) One debugging method is to set CLEANUP to false and then manually inspect the data directory left behind. (In the future, this might be exposed via a command-line option.) This would become more cumbersome and error-prone if we used TMPDIR. This temporary directory is used for Unix sockets, so path length limitations will be a problem: http://www.postgresql.org/message-id/20140329172224.ga170...@tornado.leadboat.com That, however, is a good argument for doing it the other way. Maybe we need two temporary directories. Two temporary directories sounds fair, given those constraints. -- Noah Misch EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Allowing NOT IN to use ANTI joins
Jeff Janes jeff.ja...@gmail.com writes: On Monday, June 9, 2014, Tom Lane t...@sss.pgh.pa.us wrote: Huh? The point of an antijoin (or indeed most join methods) is that we *don't* have to examine the whole inner input to make a decision. But all hash join methods needs to examine the entire *outer* input, no? Have I screwed up my terminology here? I think you're confusing inner and outer inputs --- the sub-select inside the NOT IN is the inner input according to the way I think about it. But I had assumed that was what you meant already. If you are using NOT IN, then once you find a NULL in the outer input (if the outer input is the in-list: clearly you can't reverse the two inputs in this case), you don't even need to finish reading the outer input, nor start reading the inner input, because all rows are automatically excluded by the weird semantics of NOT IN. The point I'm trying to make is that the goal of most join types is to give an answer without having necessarily read all of either input. For instance, if we tried to do this with a mergejoin it wouldn't work reliably: it might suppose that it could accept an outer row on the basis of no match in a higher-order sort column before it'd reached any nulls appearing in lower-order sort columns. You might be right that we could hot-wire the hash join case in particular, but I'm failing to see the point of expending lots of extra effort in order to deliver a useless answer faster. If there are NULLs in the inner input, then NOT IN is simply the wrong query to make, and giving an empty output in a relatively short amount of time isn't going to help clue the user in on that. (If the SQL standard would let us do so, I'd be arguing for throwing an error if we found a NULL.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bug fix] Memory leak in dblink
On Mon, Jun 9, 2014 at 6:37 PM, MauMau maumau...@gmail.com wrote: Hello, I've fixed and tested a memory leak bug in dblink. Could you review and commit this? I'll add this to the CommitFest shortly. Is there a need to free the memory context in PG_CATCH()? In the error path, the memory belonging to this temporary context will be freed anyway when the transaction context is deleted, since the temporary context will be in its hierarchy. I think such handling can be useful in case we use PG_CATCH() to suppress the error. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
[HACKERS] Re: why postgresql define NTUP_PER_BUCKET as 10, not other numbers smaller
Robert Haas wrote: On Mon, Jun 9, 2014 at 4:06 AM, b8flowerfire <b8flowerfire@...> wrote: This has come up before. Basically, the problem is that if you reduce NTUP_PER_BUCKET, the bucket array gets larger, which might reduce the amount of space available for tuples to the point where the hash join overflows to multiple batches. That will be more expensive than performing the hash join with a higher NTUP_PER_BUCKET. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company Thanks for the explanation. But I don't think it is very convincing. Simply reducing the value of NTUP_PER_BUCKET will enlarge the pointer array and reduce the tuples in one batch. But is that effect significant to the performance? The utilization of work_mem, I think, is determined by the ratio of the size of the pointer to the size of the tuple. Let's assume the size of a tuple is 28 bytes, which is very reasonable because it's the sum of the size of HJTUPLE_OVERHEAD (at least 8 bytes), the size of MinimalTupleData (at least 10 bytes) and the content of a tuple (assume 10 bytes). And the size of a pointer is 4 bytes. So, if NTUP_PER_BUCKET is set to 10, about (28 * 10) / (28 * 10 + 4) of the work_mem is used to store tuples. If NTUP_PER_BUCKET is set to 1, about 28 / (28 + 4) of the work_mem is used to store tuples, reduced to about 90% of the original. As a result, changing the value of NTUP_PER_BUCKET to 1 may increase the number of batches by only about 10%. So is that enough to affect the performance? Or maybe I cannot do the calculation simply in this way. Besides, we have larger main memory now. If we set work_mem larger, the more-batches effect introduced by the smaller NTUP_PER_BUCKET value may be reduced, couldn't it? I have read the earlier discussion about NTUP_PER_BUCKET. It seems that if we change NTUP_PER_BUCKET to 50 or even larger, the performance wouldn't be much worse, because every tuple in the chain of a bucket has a hash value, so having more tuples in a bucket simply adds some comparisons of two integers. So, symmetrically, if we make it smaller, might we not gain much either? Is that one of the reasons it is defined as 10? After all, it is my first time to discuss in an open source community. Thank you all for the reply and the discussion. Thanks.
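The back-of-envelope figures in the email check out arithmetically. The 28-byte tuple and 4-byte pointer are the email's own assumptions, and the model below is the same simple one it uses (pointer overhead amortized over NTUP_PER_BUCKET tuples):

```python
TUPLE_BYTES = 28   # assumed: HJTUPLE_OVERHEAD + MinimalTupleData + payload
POINTER_BYTES = 4  # one bucket-array pointer per NTUP_PER_BUCKET tuples

def tuple_fraction(ntup_per_bucket: int) -> float:
    """Fraction of hash-table memory holding tuples rather than
    bucket pointers, under the email's simple model."""
    tuple_mem = TUPLE_BYTES * ntup_per_bucket
    return tuple_mem / (tuple_mem + POINTER_BYTES)

# NTUP_PER_BUCKET = 10 gives ~0.986; = 1 gives 0.875. Usable tuple
# space drops by roughly 11%, so under this model the batch count
# would rise by a similar margin.
```

Whether that roughly 10% extra batching outweighs shorter bucket chains is exactly the trade-off the thread is debating; the model ignores chain-traversal cost entirely.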