Re: [HACKERS] Proposal : REINDEX SCHEMA
On Tue, Dec 9, 2014 at 4:44 PM, Michael Paquier michael.paqu...@gmail.com wrote:
> On Tue, Dec 9, 2014 at 10:10 AM, Michael Paquier michael.paqu...@gmail.com wrote:
>> On Tue, Dec 2, 2014 at 3:42 PM, Michael Paquier michael.paqu...@gmail.com wrote:
>>> Adding on top of that a couple of things cleaned up, like docs and typos, and I got the patch attached. Let's have a committer have a look at it now; I am marking that as Ready for Committer.
>> For the archives, this has been committed as fe263d1. Thanks Simon for looking and for the final push. And sorry that I didn't spot the issue with the TAP tests when reviewing; check-world passed, but my dev VM was missing the necessary Perl packages.
> While re-looking at that, I just found that when selecting the relations to be reindexed for a schema we ignore materialized views, as the key scan is done using only 'r' as relkind. The patch attached fixes that.

Here is an updated patch doing as well the following:
- The regression test checking if the user has permissions on the schema was broken.
- Silence the NOTICE messages of REINDEX by having client_min_messages set to WARNING (thoughts about having a plpgsql function doing consistency checks of relfilenode before and after reindex?)
--
Michael

From 402afad6c124d2b74a5a82e36e017d2dedb0186d Mon Sep 17 00:00:00 2001
From: Michael Paquier michael@otacoo.com
Date: Tue, 9 Dec 2014 16:40:39 +0900
Subject: [PATCH] Fix a couple of bugs in REINDEX SCHEMA

The following issues are fixed:
- The key scan used a filter on relation relkind, but that's not actually necessary as a filter is applied when building the list of OIDs reindexed.
- The regression test checking permissions on the reindexed schema was broken.
- Upgrade client_min_messages to 'warning' per complaints from jaguarundi and leech, as the table reindex ordering is not entirely guaranteed (we may as well use here a plpgsql function that checks if the relfilenode has changed for the reindexed relations).
---
 src/backend/commands/indexcmds.c           |  8 ++--
 src/test/regress/expected/create_index.out | 21 +
 src/test/regress/sql/create_index.sql      | 10 ++
 3 files changed, 17 insertions(+), 22 deletions(-)

diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index a3e8a15..9b07216 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -1867,16 +1867,12 @@ ReindexObject(const char *objectName, ReindexObjectType objectKind)
 	 */
 	if (objectKind == REINDEX_OBJECT_SCHEMA)
 	{
-		scan_keys = palloc(sizeof(ScanKeyData) * 2);
+		scan_keys = palloc(sizeof(ScanKeyData));
 		ScanKeyInit(&scan_keys[0],
 					Anum_pg_class_relnamespace,
 					BTEqualStrategyNumber, F_OIDEQ,
 					ObjectIdGetDatum(objectOid));
-		ScanKeyInit(&scan_keys[1],
-					Anum_pg_class_relkind,
-					BTEqualStrategyNumber, F_CHAREQ,
-					'r');
-		num_keys = 2;
+		num_keys = 1;
 	}
 	else
 		num_keys = 0;
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index ebac939..418b0ec 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2834,31 +2834,28 @@ explain (costs off)
 --
 -- REINDEX SCHEMA
 --
+SET client_min_messages TO 'warning';
 REINDEX SCHEMA schema_to_reindex; -- failure, schema does not exist
 ERROR:  schema "schema_to_reindex" does not exist
 CREATE SCHEMA schema_to_reindex;
-CREATE TABLE schema_to_reindex.table1(col1 SERIAL PRIMARY KEY);
-CREATE TABLE schema_to_reindex.table2(col1 SERIAL PRIMARY KEY, col2 VARCHAR(100) NOT NULL);
-CREATE INDEX ON schema_to_reindex.table2(col2);
+CREATE TABLE schema_to_reindex.table(col1 SERIAL PRIMARY KEY);
+CREATE MATERIALIZED VIEW schema_to_reindex.matview AS SELECT col1 FROM schema_to_reindex.table;
+CREATE INDEX ON schema_to_reindex.matview(col1);
 REINDEX SCHEMA schema_to_reindex;
-NOTICE:  table "schema_to_reindex.table1" was reindexed
-NOTICE:  table "schema_to_reindex.table2" was reindexed
 BEGIN;
 REINDEX SCHEMA schema_to_reindex; -- failure, cannot run in a transaction
 ERROR:  REINDEX SCHEMA cannot run inside a transaction block
 END;
+RESET client_min_messages;
 -- Failure for unauthorized user
-CREATE ROLE reindexuser login;
+CREATE ROLE user_reindex login;
 SET SESSION ROLE user_reindex;
-ERROR:  role "user_reindex" does not exist
 REINDEX SCHEMA schema_to_reindex;
-NOTICE:  table "schema_to_reindex.table1" was reindexed
-NOTICE:  table "schema_to_reindex.table2" was reindexed
+ERROR:  must be owner of schema schema_to_reindex
 -- Clean up
 RESET ROLE;
 DROP ROLE user_reindex;
-ERROR:  role "user_reindex" does not exist
 DROP SCHEMA schema_to_reindex CASCADE;
 NOTICE:  drop cascades to 2 other objects
-DETAIL:  drop cascades to table schema_to_reindex.table1
-drop cascades to table schema_to_reindex.table2
+DETAIL:  drop cascades to table schema_to_reindex.table
+drop cascades to materialized view schema_to_reindex.matview
diff --git a/src/test/regress/sql/create_index.sql
[HACKERS] committsdesc.c not ignored in contrib/pg_xlogdump
Hi all,

As mentioned in $subject, we are missing an entry in pg_xlogdump's .gitignore. Patch attached.

Regards,
--
Michael

diff --git a/contrib/pg_xlogdump/.gitignore b/contrib/pg_xlogdump/.gitignore
index 37974e0..16cf749 100644
--- a/contrib/pg_xlogdump/.gitignore
+++ b/contrib/pg_xlogdump/.gitignore
@@ -2,6 +2,7 @@
 # Source files copied from src/backend/access/
 /brindesc.c
 /clogdesc.c
+/committsdesc.c
 /dbasedesc.c
 /gindesc.c
 /gistdesc.c

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [v9.5] Custom Plan API
On 7 December 2014 at 08:21, Kouhei Kaigai kai...@ak.jp.nec.com wrote:
> Please wait for a few days. The ctidscan module is not adjusted for the latest interface yet.

I am in many ways a patient man. At this point it is 12 days since my request for a working example.

Feedback I am receiving is that the API is unusable. That could be because it is impenetrable, or because it is unusable. I'm not sure it matters which. We need a working example to ensure that the API meets the needs of a wide section of users and, if it does not, to give other users a chance to request changes to the API so that it becomes usable.

The window for such feedback is approaching zero very quickly now and we need action.

Thanks
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On partitioning
On Tue, Dec 9, 2014 at 12:59 PM, Amit Kapila amit.kapil...@gmail.com wrote:
> On Tue, Dec 9, 2014 at 8:08 AM, Amit Langote langote_amit...@lab.ntt.co.jp wrote:
>> From: Robert Haas [mailto:robertmh...@gmail.com]
>>> On Sat, Dec 6, 2014 at 2:59 AM, Amit Kapila amit.kapil...@gmail.com wrote:
>>>>> I guess you could list or hash partition on multiple columns, too.
>>>> How would you distinguish values in a list partition for multiple columns? I mean for a range partition, we are sure there will be one value for each column, but for a list it could be multiple values and not fixed for each partition, so I think it will not be easy to support a multi-column partition key for list partitions.
>>> I don't understand. If you want to range partition on columns (a, b), you say that, say, tuples with (a, b) values less than (100, 200) go here and the rest go elsewhere. For list partitioning, you say that, say, tuples with (a, b) values of EXACTLY (100, 200) go here and the rest go elsewhere. I'm not sure how useful that is, but it's not illogical.
>> In case of list partitioning, 100 and 200 would respectively be one of the values in the lists of allowed values for a and b. I thought his concern is whether this list of values for each column in the partkey is as convenient to store and manipulate as range partvalues.
> Yeah, and also how would the user specify the values. As an example, assume that a table is partitioned on monthly_salary, so the partition definition would look like:
>
> PARTITION BY LIST(monthly_salary)
> (
> PARTITION salary_less_than_thousand VALUES(300, 900),
> PARTITION salary_less_than_two_thousand VALUES (500, 1000, 1500),
> ...
> )
>
> Now if the user wants to define a multi-column partition based on monthly_salary and annual_salary, how do we want him to specify the values? Basically, how do we distinguish which values belong to the first column key and which ones belong to the second column key?

Perhaps you are talking about syntactic difficulties that I totally missed in my other reply to this mail?
Can we represent the same data by using a subpartitioning scheme instead? ISTM the semantics would remain the same.
...
PARTITION BY (monthly_salary) SUBPARTITION BY (annual_salary)?

Thanks,
Amit

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Scaling shared buffer eviction
On Tue, Nov 11, 2014 at 3:00 PM, Andres Freund and...@2ndquadrant.com wrote:
> On 2014-11-11 09:29:22 +, Thom Brown wrote:
>> On 26 September 2014 12:40, Amit Kapila amit.kapil...@gmail.com wrote:
>>> On Tue, Sep 23, 2014 at 10:31 AM, Robert Haas robertmh...@gmail.com wrote:
>>>> But this gets at another point: the way we're benchmarking this right now, we're really conflating the effects of three different things:
>>>> 1. Changing the locking regimen around the freelist and clocksweep.
>>>> 2. Adding a bgreclaimer process.
>>>> 3. Raising the number of buffer locking partitions.
>>> First of all, thanks for committing part 1 of these changes; it seems you are planning to commit part 3 based on the results of the tests Andres is planning to do, and for the remaining part (part 2), today
>> Were parts 2 and 3 committed in the end?
> 3 was committed. 2 wasn't, because it's not yet clear how beneficial it is generally.

As shown upthread, this patch (as it stands today) is dependent on another patch (wait-free LW_SHARED acquisition) which is still not committed, and some more work is needed to clearly show the gain from this patch, so I have marked it as Returned with Feedback.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Proposal : REINDEX SCHEMA
On 9 December 2014 at 17:17, Michael Paquier michael.paqu...@gmail.com wrote:
> While re-looking at that, I just found that when selecting the relations to be reindexed for a schema we ignore materialized views, as the key scan is done using only 'r' as relkind. The patch attached fixes that.
> Here is an updated patch doing as well the following:
> - The regression test checking if the user has permissions on the schema was broken.
> - Silence the NOTICE messages of REINDEX by having client_min_messages set to WARNING (thoughts about having a plpgsql function doing consistency checks of relfilenode before and after reindex?)

ISTM that REINDEX is not consistent with VACUUM, ANALYZE or CLUSTER in the way it issues NOTICE messages. I'm inclined to simply remove the NOTICE messages, except when a REINDEX ... VERBOSE is requested.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal : REINDEX SCHEMA
On Tuesday, December 9, 2014, Simon Riggs si...@2ndquadrant.com wrote:
> On 9 December 2014 at 17:17, Michael Paquier michael.paqu...@gmail.com wrote:
>> While re-looking at that, I just found that when selecting the relations to be reindexed for a schema we ignore materialized views, as the key scan is done using only 'r' as relkind. The patch attached fixes that.
>> Here is an updated patch doing as well the following:
>> - The regression test checking if the user has permissions on the schema was broken.
>> - Silence the NOTICE messages of REINDEX by having client_min_messages set to WARNING (thoughts about having a plpgsql function doing consistency checks of relfilenode before and after reindex?)
> ISTM that REINDEX is not consistent with VACUUM, ANALYZE or CLUSTER in the way it issues NOTICE messages. I'm inclined to simply remove the NOTICE messages, except when a REINDEX ... VERBOSE is requested.

+1 to removing the NOTICE messages except when VERBOSE is specified. It would output a lot of messages if there are many tables in the schema. If nobody objects, I will work on this.

Regards,
--
Sawada Masahiko
Re: [HACKERS] [v9.5] Custom Plan API
Simon,

The sample code is here: https://github.com/kaigai/ctidscan

The code itself and its regression tests show how it works and how it interacts with the core backend. However, its source code comments are not updated and the SGML documentation is not ready yet, because of my schedule in the earlier half of December. I will try to add the above stuff for a contrib module patch, but it will take a few more days.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com

> -----Original Message-----
> From: Simon Riggs [mailto:si...@2ndquadrant.com]
> Sent: Tuesday, December 09, 2014 12:24 AM
> To: Kaigai Kouhei(海外 浩平)
> Cc: Robert Haas; Thom Brown; Kohei KaiGai; Tom Lane; Alvaro Herrera; Shigeru Hanada; Stephen Frost; Andres Freund; PgHacker; Jim Mlodgenski; Peter Eisentraut
> Subject: Re: [HACKERS] [v9.5] Custom Plan API
>
> On 7 December 2014 at 08:21, Kouhei Kaigai kai...@ak.jp.nec.com wrote:
>> Please wait for a few days. The ctidscan module is not adjusted for the latest interface yet.
>
> I am in many ways a patient man. At this point it is 12 days since my request for a working example.
>
> Feedback I am receiving is that the API is unusable. That could be because it is impenetrable, or because it is unusable. I'm not sure it matters which. We need a working example to ensure that the API meets the needs of a wide section of users and, if it does not, to give other users a chance to request changes to the API so that it becomes usable.
>
> The window for such feedback is approaching zero very quickly now and we need action.
>
> Thanks
> --
> Simon Riggs http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal : REINDEX SCHEMA
On Tue, Dec 9, 2014 at 6:00 PM, Simon Riggs si...@2ndquadrant.com wrote:
> On 9 December 2014 at 17:17, Michael Paquier michael.paqu...@gmail.com wrote:
>> While re-looking at that, I just found that when selecting the relations to be reindexed for a schema we ignore materialized views, as the key scan is done using only 'r' as relkind. The patch attached fixes that.
>> Here is an updated patch doing as well the following:
>> - The regression test checking if the user has permissions on the schema was broken.
>> - Silence the NOTICE messages of REINDEX by having client_min_messages set to WARNING (thoughts about having a plpgsql function doing consistency checks of relfilenode before and after reindex?)
> ISTM that REINDEX is not consistent with VACUUM, ANALYZE or CLUSTER in the way it issues NOTICE messages. I'm inclined to simply remove the NOTICE messages, except when a REINDEX ... VERBOSE is requested.

OK. Perhaps that's not worth mentioning in the release notes, but some users may be used to the old behavior. What about the other issues (the incorrect regression test for permissions, and the materialized views)?
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Casting issues with domains
On 08/12/2014 16:18, Tom Lane wrote:
> Thomas Reiss thomas.re...@dalibo.com writes:
>> postgres=# explain select * from test2 where a='toto';
>>                        QUERY PLAN
>> --------------------------------------------------------
>>  Seq Scan on test1  (cost=0.00..1693.00 rows=500 width=5)
>>    Filter: (((a)::tstdom)::text = 'toto'::text)
>> (2 rows)
>> As you can see, a is cast to tstdom and then again to text. These casts prevent the optimizer from choosing an index scan to retrieve the data. The casts are however strictly equivalent and should not prevent the optimizer from using indexes.
> No, they are not equivalent. The optimizer can't simply drop the cast-to-domain, because that cast might result in a runtime error due to a domain CHECK constraint violation. (This is true even if no such constraint exists at planning time, unfortunately. If we had a mechanism to force replanning at ALTER DOMAIN ADD CONSTRAINT, maybe the no-constraints case could be handled better, but we don't; and adding one would also imply adding more locks around domain usage, so it's not all that attractive to do it.)
> The short answer is that SQL domains are not zero-cost type aliases. Perhaps there would be value in having a feature that *is* a zero-cost alias, but it wouldn't be a domain.

I agree regarding the feature for zero-cost aliases. It would ease access to the catalogs via the information_schema, for example. Thanks for your answer. There's some room for improvement for sure, but it is not as easy as it seems.

Regards,
Thomas

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal : REINDEX SCHEMA
On Tue, Dec 9, 2014 at 6:43 PM, Michael Paquier michael.paqu...@gmail.com wrote:
> OK. Perhaps that's not worth mentioning in the release notes, but some users may be used to the old behavior. What about the other issues (regression test for permissions incorrect and matviews)?

Here is in any case an updated patch fixing all those things, rebased on latest HEAD (ae4e688).
Thanks,
--
Michael

From 4cc862ed78b3537438e257aa24a5d0ee1479eb24 Mon Sep 17 00:00:00 2001
From: Michael Paquier michael@otacoo.com
Date: Tue, 9 Dec 2014 16:40:39 +0900
Subject: [PATCH] Fix two bugs in REINDEX SCHEMA: regression and matviews

The following issues are fixed:
- The key scan used a filter on relation relkind, but that's not actually necessary as a filter is applied when building the list of OIDs reindexed.
- The regression test checking permissions on the reindexed schema was broken.
---
 src/backend/commands/indexcmds.c           |  8 ++--
 src/test/regress/expected/create_index.out | 15 +++
 src/test/regress/sql/create_index.sql      |  8
 3 files changed, 13 insertions(+), 18 deletions(-)

diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index cf4de72..6711a66 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -1867,16 +1867,12 @@ ReindexObject(const char *objectName, ReindexObjectType objectKind)
 	 */
 	if (objectKind == REINDEX_OBJECT_SCHEMA)
 	{
-		scan_keys = palloc(sizeof(ScanKeyData) * 2);
+		scan_keys = palloc(sizeof(ScanKeyData));
 		ScanKeyInit(&scan_keys[0],
 					Anum_pg_class_relnamespace,
 					BTEqualStrategyNumber, F_OIDEQ,
 					ObjectIdGetDatum(objectOid));
-		ScanKeyInit(&scan_keys[1],
-					Anum_pg_class_relkind,
-					BTEqualStrategyNumber, F_CHAREQ,
-					'r');
-		num_keys = 2;
+		num_keys = 1;
 	}
 	else
 		num_keys = 0;
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index eba14e2..0d0a5ca 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2837,24 +2837,23 @@ explain (costs off)
 REINDEX SCHEMA schema_to_reindex; -- failure, schema does not exist
 ERROR:  schema "schema_to_reindex" does not exist
 CREATE SCHEMA schema_to_reindex;
-CREATE TABLE schema_to_reindex.table1(col1 SERIAL PRIMARY KEY);
-CREATE TABLE schema_to_reindex.table2(col1 SERIAL PRIMARY KEY, col2 VARCHAR(100) NOT NULL);
-CREATE INDEX ON schema_to_reindex.table2(col2);
+CREATE TABLE schema_to_reindex.table(col1 SERIAL PRIMARY KEY);
+CREATE MATERIALIZED VIEW schema_to_reindex.matview AS SELECT col1 FROM schema_to_reindex.table;
+CREATE INDEX ON schema_to_reindex.matview(col1);
 REINDEX SCHEMA schema_to_reindex;
 BEGIN;
 REINDEX SCHEMA schema_to_reindex; -- failure, cannot run in a transaction
 ERROR:  REINDEX SCHEMA cannot run inside a transaction block
 END;
 -- Failure for unauthorized user
-CREATE ROLE reindexuser login;
+CREATE ROLE user_reindex login;
 SET SESSION ROLE user_reindex;
-ERROR:  role "user_reindex" does not exist
 REINDEX SCHEMA schema_to_reindex;
+ERROR:  must be owner of schema schema_to_reindex
 -- Clean up
 RESET ROLE;
 DROP ROLE user_reindex;
-ERROR:  role "user_reindex" does not exist
 DROP SCHEMA schema_to_reindex CASCADE;
 NOTICE:  drop cascades to 2 other objects
-DETAIL:  drop cascades to table schema_to_reindex.table1
-drop cascades to table schema_to_reindex.table2
+DETAIL:  drop cascades to table schema_to_reindex.table
+drop cascades to materialized view schema_to_reindex.matview
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 1cd57da..5b6c9e6 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -970,16 +970,16 @@ explain (costs off)
 --
 REINDEX SCHEMA schema_to_reindex; -- failure, schema does not exist
 CREATE SCHEMA schema_to_reindex;
-CREATE TABLE schema_to_reindex.table1(col1 SERIAL PRIMARY KEY);
-CREATE TABLE schema_to_reindex.table2(col1 SERIAL PRIMARY KEY, col2 VARCHAR(100) NOT NULL);
-CREATE INDEX ON schema_to_reindex.table2(col2);
+CREATE TABLE schema_to_reindex.table(col1 SERIAL PRIMARY KEY);
+CREATE MATERIALIZED VIEW schema_to_reindex.matview AS SELECT col1 FROM schema_to_reindex.table;
+CREATE INDEX ON schema_to_reindex.matview(col1);
 REINDEX SCHEMA schema_to_reindex;
 BEGIN;
 REINDEX SCHEMA schema_to_reindex; -- failure, cannot run in a transaction
 END;
 -- Failure for unauthorized user
-CREATE ROLE reindexuser login;
+CREATE ROLE user_reindex login;
 SET SESSION ROLE user_reindex;
 REINDEX SCHEMA schema_to_reindex;
--
2.2.0

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] committsdesc.c not ignored in contrib/pg_xlogdump
Michael Paquier wrote:
> Hi all,
> As mentioned in $subject, we are missing an entry in pg_xlogdump's .gitignore.

Thanks, pushed.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On partitioning
Josh Berkus wrote:
> Hi, Pardon me for jumping into this late. In general, I like Alvaro's approach.

Please don't call this Alvaro's approach, as I'm not involved in this anymore. Amit Langote has taken ownership of it now. While some resemblance to what I originally proposed might remain, I haven't kept track of how this has evolved and this might be a totally different thing now. Or not. Anyway, I just wanted to comment on a single point:

> 6. Unique Index Problem: Cannot create a unique index across multiple partitions, which prevents the partitioned table from being FK'd. Not Addressed (but could be addressed in the future)

I think it's unlikely that we will ever create a unique index that spans all the partitions, actually. Even if there are some wild ideas on how to implement such a thing, the number of difficult issues that no one knows how to attack seems too large. I would perhaps be thinking in allowing foreign keys to be defined on column sets that are prefixed by partition keys; unique indexes must exist on all partitions on the same columns, including the partition keys. (Perhaps make an extra exception that if a partition allows a single value for the partition column, that column need not be part of the unique index.)

> 10. Scaling Problem: Inheritance partitioning becomes prohibitively slow for the planner at somewhere between 100 and 500 partitions depending on various factors. No idea?

At least it was my intention to make the system scale to a huge number of partitions, but this requires some forward thinking (such as avoiding loading the index list of all of them, or even opening all of them at the planner stage), and I think that would be defeated if we want to keep all the generality of the inheritance-based approach.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On partitioning
Amit Kapila wrote:
> On Tue, Dec 9, 2014 at 1:42 AM, Robert Haas robertmh...@gmail.com wrote:
>> On Mon, Dec 8, 2014 at 2:56 PM, Andres Freund and...@2ndquadrant.com wrote:
>>>> I don't think that's mutually exclusive with the idea of partitions-as-tables. I mean, you can add code to the ALTER TABLE path that says if (i_am_not_the_partitioning_root) ereport(ERROR, ...) wherever you want.
>>> That'll be a lot of places you'll need to touch. More fundamentally: Why should we name something a table that's not one?
>> Well, I'm not convinced that it isn't one. And adding a new relkind will involve a bunch of code churn, too. But I don't much care to pre-litigate this: when someone has got a patch, we can either agree that the approach is OK or argue that it is problematic because X. I think we need to hammer down the design in broad strokes first, and I'm not sure we're totally there yet.
> That's right; I think at this point defining the top-level behaviour/design is very important to proceed, and we can decide about the better implementation approach afterwards (maybe once the initial patch is ready, because it might not be a major work to do it either way). So here's where we are on this point till now, as per my understanding: I think that direct operations should be prohibited on partitions, you think that they should be allowed, and Andres thinks that it might be better to allow direct operations on partitions for reads.

FWIW in my original proposal I was rejecting some things that after further consideration turn out to be possible to allow; for instance directly referencing individual partitions in COPY. We could allow something like

    COPY lineitems PARTITION FOR VALUE '2000-01-01' TO STDOUT

or maybe

    COPY PARTITION FOR VALUE '2000-01-01' ON TABLE lineitems TO STDOUT

and this would emit the whole partition for year 2000 of table lineitems, and only that (the value is just computed on the fly to fit the partitioning constraints for that individual partition).
Then pg_dump would be able to dump each and every partition separately. In a similar way we could have COPY FROM allow input into individual partitions so that such a dump can be restored in parallel for each partition.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Small TRUNCATE glitch
Bruce Momjian br...@momjian.us writes:
> Added to TODO:
> o Clear table counters on TRUNCATE
>   http://archives.postgresql.org/pgsql-hackers/2008-04/msg00169.php

Hello,

Attached is a WIP patch for this TODO.

From 97665ef1ca7d1847e90d4dfab38562135f01fb2b Mon Sep 17 00:00:00 2001
From: Alex Shulgin a...@commandprompt.com
Date: Tue, 9 Dec 2014 16:35:14 +0300
Subject: [PATCH] WIP: track TRUNCATEs in pgstat transaction stats.

The n_live_tup and n_dead_tup counters need to be set to 0 after a TRUNCATE on the relation. We can't issue a special message to the stats collector because the xact might be later aborted, so we track the fact that the relation was truncated during the xact (and reset this xact's insert/update/delete counters). When the xact is committed, we use the `truncated' flag to reset the n_live_tup and n_dead_tup counters.
---
 src/backend/commands/tablecmds.c |  2 ++
 src/backend/postmaster/pgstat.c  | 70
 src/include/pgstat.h             |  3 ++
 3 files changed, 68 insertions(+), 7 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
new file mode 100644
index 1e737a0..192d033
*** a/src/backend/commands/tablecmds.c
--- b/src/backend/commands/tablecmds.c
*** ExecuteTruncate(TruncateStmt *stmt)
*** 1224,1229 ****
--- 1224,1231 ----
  		 */
  		reindex_relation(heap_relid, REINDEX_REL_PROCESS_TOAST);
  	}
+ 
+ 	pgstat_count_heap_truncate(rel);
  }
  
  /*
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
new file mode 100644
index c7f41a5..7ff66b5
*** a/src/backend/postmaster/pgstat.c
--- b/src/backend/postmaster/pgstat.c
*** typedef struct TwoPhasePgStatRecord
*** 200,208 ****
  	PgStat_Counter tuples_updated;	/* tuples updated in xact */
  	PgStat_Counter tuples_deleted;	/* tuples deleted in xact */
  	Oid			t_id;			/* table's OID */
! 	bool		t_shared;		/* is it a shared catalog? */
  } TwoPhasePgStatRecord;
  
  /*
   * Info about current snapshot of stats file
   */
--- 200,211 ----
  	PgStat_Counter tuples_updated;	/* tuples updated in xact */
  	PgStat_Counter tuples_deleted;	/* tuples deleted in xact */
  	Oid			t_id;			/* table's OID */
! 	char		t_flags;		/* see TWOPHASE_PGSTAT_RECORD_*_FLAGs */
  } TwoPhasePgStatRecord;
  
+ #define TWOPHASE_PGSTAT_RECORD_SHARED_FLAG	0x01	/* is it a shared catalog? */
+ #define TWOPHASE_PGSTAT_RECORD_TRUNC_FLAG	0x02	/* was the relation truncated? */
+ 
  /*
   * Info about current snapshot of stats file
   */
*** pgstat_count_heap_delete(Relation rel)
*** 1864,1869 ****
--- 1867,1896 ----
  }
  
  /*
+  * pgstat_count_heap_truncate - update tuple counters due to truncate
+  */
+ void
+ pgstat_count_heap_truncate(Relation rel)
+ {
+ 	PgStat_TableStatus *pgstat_info = rel->pgstat_info;
+ 
+ 	if (pgstat_info != NULL)
+ 	{
+ 		/* We have to log the effect at the proper transactional level */
+ 		int			nest_level = GetCurrentTransactionNestLevel();
+ 
+ 		if (pgstat_info->trans == NULL ||
+ 			pgstat_info->trans->nest_level != nest_level)
+ 			add_tabstat_xact_level(pgstat_info, nest_level);
+ 
+ 		pgstat_info->trans->tuples_inserted = 0;
+ 		pgstat_info->trans->tuples_updated = 0;
+ 		pgstat_info->trans->tuples_deleted = 0;
+ 		pgstat_info->trans->truncated = true;
+ 	}
+ }
+ 
+ /*
   * pgstat_update_heap_dead_tuples - update dead-tuples count
   *
   * The semantics of this are that we are reporting the nontransactional
*** AtEOXact_PgStat(bool isCommit)
*** 1927,1932 ****
--- 1954,1961 ----
  			tabstat->t_counts.t_tuples_deleted += trans->tuples_deleted;
  			if (isCommit)
  			{
+ 				tabstat->t_counts.t_truncated = trans->truncated;
+ 
  				/* insert adds a live tuple, delete removes one */
  				tabstat->t_counts.t_delta_live_tuples +=
  					trans->tuples_inserted - trans->tuples_deleted;
*** AtEOSubXact_PgStat(bool isCommit, int ne
*** 1991,1999 ****
  		{
  			if (trans->upper && trans->upper->nest_level == nestDepth - 1)
  			{
! 				trans->upper->tuples_inserted += trans->tuples_inserted;
! 				trans->upper->tuples_updated += trans->tuples_updated;
! 				trans->upper->tuples_deleted += trans->tuples_deleted;
  				tabstat->trans = trans->upper;
  				pfree(trans);
  			}
--- 2020,2039 ----
  		{
  			if (trans->upper && trans->upper->nest_level == nestDepth - 1)
  			{
! 				if (trans->truncated)
! 				{
! 					trans->upper->truncated = true;
! 					/* replace upper xact stats with ours */
! 					trans->upper->tuples_inserted = trans->tuples_inserted;
! 					trans->upper->tuples_updated = trans->tuples_updated;
! 					trans->upper->tuples_deleted = trans->tuples_deleted;
! 				}
! 				else
! 				{
! 					trans->upper->tuples_inserted += trans->tuples_inserted;
! 					trans->upper->tuples_updated += trans->tuples_updated;
! 					trans->upper->tuples_deleted += trans->tuples_deleted;
! 				}
  				tabstat->trans = trans->upper;
  				pfree(trans);
  			}
***
Re: [HACKERS] No documentation on record *= record?
Jim Nasby jim.na...@bluetreble.com wrote:
> There doesn't seem to be documentation on *= (or search isn't finding it). Is this intentional?

http://www.postgresql.org/docs/9.4/static/functions-comparisons.html#COMPOSITE-TYPE-COMPARISON

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Seq Scan
On Tue, Dec 9, 2014 at 12:46 AM, Amit Kapila amit.kapil...@gmail.com wrote:
>> I agree with this. For a first version, I think it's OK to start a worker up for a particular sequential scan and have it help with that sequential scan until the scan is completed, and then exit. It should not, as the present version of the patch does, assign a fixed block range to each worker; instead, workers should allocate a block or chunk of blocks to work on until no blocks remain. That way, even if every worker but one gets stuck, the rest of the scan can still finish.
> I will check on this point and see if it is feasible to do something along those lines. Basically, currently at the Executor initialization phase we set the scan limits and then during the Executor run phase use heap_getnext to fetch the tuples accordingly, but doing it dynamically means at the ExecutorRun phase we need to reset the scan limit for which page/pages to scan; still I have to check if there is any problem with such an idea. Do you have any different idea in mind?

Hmm. Well, it looks like there are basically two choices: you can either (as you propose) deal with this above the level of the heap_beginscan/heap_getnext API, by scanning one or a few pages at a time and then resetting the scan to a new starting page via heap_setscanlimits; or alternatively, you can add a callback to HeapScanDescData that, if non-NULL, will be invoked to get the next block number to scan. I'm not entirely sure which is better.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] alter user set local_preload_libraries.
On Mon, Dec 8, 2014 at 9:18 PM, Tom Lane t...@sss.pgh.pa.us wrote: Barring someone committing to spend the time to improve that situation (time that would be poorly invested IMO), I don't think that we want to open up ignore_system_indexes as USERSET, or do anything else to encourage its use. If we're intent on removing PGC_BACKEND then I'd be okay with reclassifying ignore_system_indexes as SUSET; but I'm not exactly convinced that we should be trying to get rid of PGC_BACKEND. Well, if you want to discourage its use, I think there's an argument that marking it as SUSET would be more restrictive than what we have at present, since it would altogether prohibit non-superuser use. I'm not wedded to the idea of getting rid of PGC_BACKEND, but I do like it. Peter's survey of the landscape seems to show that there's very little left in that category and the stuff that is there is somewhat uninspiring. And simplifying things is always nice. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal : REINDEX SCHEMA
Simon Riggs si...@2ndquadrant.com writes: ISTM that REINDEX is not consistent with VACUUM, ANALYZE or CLUSTER in the way it issues NOTICE messages. I'm inclined to simply remove the NOTICE messages, except when a REINDEX ... VERBOSE is requested. My recollection is that those other commands do issue messages always, but at some DEBUGn log level when not VERBOSE. Ideally REINDEX would adopt that same behavior. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal : REINDEX SCHEMA
On Tue, Dec 9, 2014 at 11:59 PM, Tom Lane t...@sss.pgh.pa.us wrote: Simon Riggs si...@2ndquadrant.com writes: ISTM that REINDEX is not consistent with VACUUM, ANALYZE or CLUSTER in the way it issues NOTICE messages. I'm inclined to simply remove the NOTICE messages, except when a REINDEX ... VERBOSE is requested. My recollection is that those other commands do issue messages always, but at some DEBUGn log level when not VERBOSE. Ideally REINDEX would adopt that same behavior. So it seems that the REINDEX command should issue messages at INFO level when the VERBOSE option is specified; if it is not specified, it should issue messages at DEBUG2. Regards, --- Sawada Masahiko
Re: [HACKERS] Proposal : REINDEX SCHEMA
On Tue, Dec 9, 2014 at 10:21 AM, Michael Paquier michael.paqu...@gmail.com wrote: On Tue, Dec 9, 2014 at 6:43 PM, Michael Paquier michael.paqu...@gmail.com wrote: OK. Perhaps that's not worth mentioning in the release notes, but some users may be used to the old behavior. What about the other issues (the incorrect regression test for permissions, and matviews)? Here is in any case an updated patch fixing all those things, rebased on latest HEAD (ae4e688). The patch is fine: - No compiler warnings - Regression tests added properly and running OK There are no changes in the docs. Maybe we could mention matviews in the REINDEX SCHEMA docs; what do you think? Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog: http://fabriziomello.github.io Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello Github: http://github.com/fabriziomello
[HACKERS] logical column ordering
So I've been updating my very old patch to allow logical and physical column reordering. Here's a WIP first cut for examination. There are plenty of rough edges here; most importantly there is no UI at all for column reordering other than direct UPDATEs of pg_attribute, which most likely falls afoul of cache invalidation problems. For now I'm focusing on correct processing when columns are moved logically; I haven't yet started to see how to move columns physically, but I think that part is much more localized than the logical one. Just as a reminder, this effort is an implementation of ideas that have been discussed previously; in particular, see these threads: http://www.postgresql.org/message-id/20414.1166719...@sss.pgh.pa.us (2006) http://www.postgresql.org/message-id/6843.1172126...@sss.pgh.pa.us (2007) http://www.postgresql.org/message-id/23035.1227659...@sss.pgh.pa.us (2008) To recap, this is based on the idea of having three numbers for each attribute rather than a single attnum; the first of these is attnum (a number that uniquely identifies an attribute since its inception, and which may or may not have any relationship to its storage position or to the place it expands to through user interaction). The second is attphysnum, which indicates where it is stored in the physical structure. The third is attlognum, which indicates where it expands in *, where its values must be placed in COPY or VALUES lists, etc. --- the logical position as the user sees it. The first thing where this matters is tuple descriptor expansion in parse analysis; at this stage, things such as * (in select *) are turned into a target list, which must be sorted according to attlognum. To achieve this I added a new routine to tupledescs, TupleDescGetSortedAttrs(), which computes a new Attribute array and caches it in the TupleDesc for later uses; this array points to the same elements in the normal attribute list but is ordered by attlognum. 
Additionally there are a number of places that iterate on such target lists and use the iterator as the attribute number; those were modified to have a separate attribute number as attnum within the loop. Another place that needs tweaking is heapam.c, which must construct a physical tuple from Datum/nulls arrays (heap_form_tuple). In some cases the input arrays are sorted in logical column order. I have opted to add a flag that indicates whether the array is in logical order; if it is, the routines compute the correct physical order. (Actually, as I mentioned above, I still haven't made any effort to make sure they work in the case that attnum differs from attphysnum, but these should be reasonably contained changes.) The part where I stopped just before sending the current state is this error message: alvherre=# select * from quux where (a,c) in ( select a,c from quux ); select * from quux where (a,c) in ( select a,c from quux ); ERROR: failed to find unique expression in subplan tlist I'm going to see about it while I get feedback on the rest of this patch; in particular, extra test cases that fail to work when columns have been moved around are welcome, so that I can add them to the regression tests. What I have now is the basics I'm building as I go along. The regression tests show examples of some logical column renumbering (which can be done after the table already contains some data) but none of physical column renumbering (which can only be done when the table is completely empty). My hunch is that the sample foo, bar, baz, quux tables should present plenty of opportunities to display brokenness in the planner and executor. PS: Phil Currier allegedly had a patch back in 2007-2008 that did this, or something very similar ... though he never posted a single bit of it, and then he vanished without a trace. If he's still available it would be nice to see his WIP patch, even if outdated, as it might serve as inspiration and let us know what other places need tweaking. 
-- Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training Services

diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 18ae318..54473be 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -2368,6 +2368,9 @@ get_attnum_pk_pos(int *pkattnums, int pknumatts, int key)
 	return -1;
 }
 
+/*
+ * FIXME this probably needs to be tweaked.
+ */
 static HeapTuple
 get_tuple_of_interest(Relation rel, int *pkattnums, int pknumatts, char **src_pkattvals)
 {
diff --git a/contrib/spi/timetravel.c b/contrib/spi/timetravel.c
index a37cbee..d7e6e50 100644
--- a/contrib/spi/timetravel.c
+++ b/contrib/spi/timetravel.c
@@ -314,6 +314,7 @@ timetravel(PG_FUNCTION_ARGS)
 	Oid		   *ctypes;
 	char		sql[8192];
 	char		separ = ' ';
+	Form_pg_attribute *attrs;
 
 	/* allocate ctypes for preparation */
 	ctypes = (Oid *) palloc(natts * sizeof(Oid));
@@ -322,10 +323,11 @@ timetravel(PG_FUNCTION_ARGS)
 	 * Construct query: INSERT INTO _relation_
Re: [HACKERS] On partitioning
On 12/09/2014 12:17 AM, Amit Langote wrote: Now if user wants to define multi-column Partition based on monthly_salary and annual_salary, how do we want him to specify the values. Basically how to distinguish which values belong to first column key and which one's belong to second column key. Perhaps you are talking about syntactic difficulties that I totally missed in my other reply to this mail? Can we represent the same data by rather using a subpartitioning scheme? ISTM, semantics would remain the same. ... PARTITION BY (monthly_salary) SUBPARTITION BY (annual_salary)? ... or just use arrays. PARTITION BY LIST ( monthly_salary, annual_salary ) PARTITION salary_small VALUES ({[300,400],[5000,6000]}) ) ... but that raises the question of how partition by list over two columns (or more) would even work. You'd need a*b partitions, and the user would be pretty much certain to miss a few value combinations. Maybe we should just restrict list partitioning to a single column for a first release, and wait and see if people ask for more? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] moving from contrib to bin
Here are the contrib programs: oid2name pg_archivecleanup pg_standby pg_test_fsync pg_test_timing pg_upgrade pg_xlogdump pgbench vacuumlo The proposal would basically be to mv contrib/$x src/bin/$x and also move the reference pages in the documentation. +1 Considering that all of the above have been around for a while, it's kind of silly that they're still in contrib. Mostly that just guarantees that nobody will use them, even when it's appropriate. The one exception I might make above is pg_standby. What do we need this for today, exactly? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] moving from contrib to bin
On Mon, Dec 8, 2014 at 10:26 PM, Peter Eisentraut pete...@gmx.net wrote: Let's take another crack at moving stuff out of contrib. Nobody likes contrib. The task is only finding something that most people like better. I like contrib fine. It's a great place for things to live that are not quite baked enough for core, but still worth distributing. oid2name pg_archivecleanup pg_standby pg_test_fsync pg_test_timing pg_upgrade pg_xlogdump pgbench vacuumlo The proposal would basically be to mv contrib/$x src/bin/$x and also move the reference pages in the documentation. I think pg_archivecleanup, pg_upgrade, and possibly pgbench have enough general utility to justify putting them in src/bin, but not the rest. oid2name, pg_standby, and vacuumlo are probably closer to being candidates for removal than promotion in my book. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Testing DDL deparsing support
On Mon, Dec 8, 2014 at 12:43:36PM -0500, Robert Haas wrote: On Sat, Dec 6, 2014 at 10:43 PM, Bruce Momjian br...@momjian.us wrote: This causes creation DDL to be checked if it is used in the regression database, but what about ALTER and DROP? pg_dump doesn't issue those, except in special cases like inheritance. The proposed testing mechanism should cover any ALTER commands that are in the regression tests, provided that those objects are not subsequently dropped -- because if the ALTER commands aren't replayed properly, then the later pg_dump won't produce the same output. There probably are some gaps in our current regression tests in this area, but that's probably a good thing to fix regardless of this. OK, I understand now that the ALTERs are being passed to the slave and we then can test that against pg_dump --- sounds good. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Re: [HACKERS] WIP: multivariate statistics / proof of concept
On 8.12.2014 02:01, Michael Paquier wrote: On Sun, Nov 16, 2014 at 3:35 AM, Tomas Vondra t...@fuzzy.cz wrote: Thanks for the link. I've been looking for a good dataset with such data, and this one is by far the best one. The current version of the patch supports only data types passed by value (i.e. no varlena types - text, ), which means it's impossible to build multivariate stats on some of the interesting columns (state, city, ...). I guess it's time to start working on removing this limitation. Tomas, what's your status on this patch? Are you planning to make it more complicated than it is? For now I have switched it to a Needs Review state because even your first version did not get advanced review (that's quite big btw). I guess that we should switch it to the next CF. Hello Michael, I agree with moving the patch to the next CF - I'm working on the patch, but I will take a bit more time to submit a new version and I can do that in the next CF. regards Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] advance local xmin more aggressively
On Mon, Dec 8, 2014 at 9:31 AM, Robert Haas robertmh...@gmail.com wrote: On Mon, Dec 8, 2014 at 4:56 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: I don't immediately see the problem either, but I have to say that grovelling through all the resource owners seems ugly anyway. Resource owners are not meant to be traversed like that. And there could be a lot of them, and you have to visit every one of them. That could get expensive if there are a lot of resource owners. 1. I don't really see why resource owners shouldn't be traversed like that. They are clearly intended to form a hierarchy, and there's existing code that recurses through the hierarchy from a given level downward. What's ugly about that? Here's a patch. I looked at the issue of tracking parent-less resource owners a bit more closely. It turns out that resource owners are always created in TopMemoryContext since they should only be freed explicitly (cf. resowner.c). I was a bit worried about that, because it would be bad to keep a list of all parent-less resource owners if list elements could vanish in a context reset. But that doesn't seem to be a concern. So I stole the nextchild links of parent-less resource owners, which are not used for anything currently, to keep a list of such resource owners. It occurred to me that there's probably not much value in recomputing xmin when the active snapshot stack is non-empty. It's not impossible that a PL/pgsql function could close a cursor with an old xmin and then do lots of other work (or just sleep for a long time) before returning to the top-level, but it is pretty unlikely. So the attached patch only considers recomputing the advertised xmin when the active snapshot stack is empty. That'll happen at the end of each statement, which seems soon enough. Upthread, I suggested keeping a tally of the number of snapshots with the advertised xmin and recomputing the xmin to advertise only when it reaches 0. 
This patch doesn't implement that optimization, but it does have code that aborts the traversal of the resource owner hierarchy as soon as we see an xmin that will preclude advancing our advertised xmin. Releasing N resource owners could therefore cost O(N^2) in the worst case, but note that releasing N resource owners is *already* an O(N^2) operation in the worst case, because the list of children of a particular parent resource owner is singly linked, and thus deleting a resource owner is O(N). It's been that way for an awfully long time without anyone complaining, probably because (a) it's not particularly common to have large numbers of cursors open simultaneously and (b) even if you do have that, the constant factor is pretty low. Following Heikki's previous suggestion, the patch contains checks to make sure that we find exactly the number of registered snapshots that we expect to find. We could consider demoting these to asserts or something, but this is more likely to catch bugs if there are any. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

diff --git a/src/backend/utils/resowner/resowner.c b/src/backend/utils/resowner/resowner.c
index 0955bcc..186fa91 100644
--- a/src/backend/utils/resowner/resowner.c
+++ b/src/backend/utils/resowner/resowner.c
@@ -113,6 +113,7 @@ typedef struct ResourceOwnerData
 ResourceOwner CurrentResourceOwner = NULL;
 ResourceOwner CurTransactionResourceOwner = NULL;
 ResourceOwner TopTransactionResourceOwner = NULL;
+ResourceOwner FirstRootResourceOwner = NULL;
 
 /*
  * List of add-on callbacks for resource releasing
@@ -167,6 +168,11 @@ ResourceOwnerCreate(ResourceOwner parent, const char *name)
 		owner->nextchild = parent->firstchild;
 		parent->firstchild = owner;
 	}
+	else
+	{
+		owner->nextchild = FirstRootResourceOwner;
+		FirstRootResourceOwner = owner;
+	}
 
 	return owner;
 }
@@ -442,6 +448,8 @@ ResourceOwnerDelete(ResourceOwner owner)
 	 * the owner tree.  Better a leak than a crash.
 	 */
 	ResourceOwnerNewParent(owner, NULL);
+	Assert(FirstRootResourceOwner == owner);
+	FirstRootResourceOwner = owner->nextchild;
 
 	/* And free the object. */
 	if (owner->buffers)
@@ -502,6 +510,14 @@ ResourceOwnerNewParent(ResourceOwner owner,
 			}
 		}
 	}
+	else
+	{
+		ResourceOwner *link = &FirstRootResourceOwner;
+
+		while (*link != owner)
+			link = &((*link)->nextchild);
+		*link = owner->nextchild;
+	}
 
 	if (newparent)
 	{
@@ -513,7 +529,8 @@ ResourceOwnerNewParent(ResourceOwner owner,
 	else
 	{
 		owner->parent = NULL;
-		owner->nextchild = NULL;
+		owner->nextchild = FirstRootResourceOwner;
+		FirstRootResourceOwner = owner;
 	}
 }
 
@@ -1161,6 +1178,50 @@ ResourceOwnerForgetSnapshot(ResourceOwner owner, Snapshot snapshot)
 }
 
 /*
+ * Invoke a caller-supplied function on every snapshot registered with this
+ * resource owner or any descendent resource owner.  The callback can abort
+ * the traversal by returning true.
+ */
+bool
+ResourceOwnerWalkSnapshots(ResourceOwner owner,
+
Re: [HACKERS] moving from contrib to bin
Peter Eisentraut wrote: Here are the contrib programs: oid2name pg_archivecleanup pg_standby pg_test_fsync pg_test_timing pg_upgrade pg_xlogdump pgbench vacuumlo The proposal would basically be to mv contrib/$x src/bin/$x and also move the reference pages in the documentation. Maybe it makes sense to have a distinction between client programs and server programs. Can we have src/sbin/ and move stuff that involves the server side in there? I think that'd be pg_xlogdump, pg_archivecleanup, pg_upgrade, pg_test_timing, pg_test_fsync. (If we were feeling bold we could also move pg_resetxlog, pg_controldata and initdb there.) (For pg_upgrade you also need to do something about pg_upgrade_support, which is good because that is one very ugly crock.) I agree that oid2name and vacuumlo need to be in a better state to deserve their promotion to src/bin, if we keep them at all. In any case I support the move out of contrib of everything except vacuumlo and oid2name, for reasons already stated by others in the thread. -- Álvaro Herrera    http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] Small TRUNCATE glitch
Alex Shulgin a...@commandprompt.com writes: Bruce Momjian br...@momjian.us writes: Added to TODO: o Clear table counters on TRUNCATE http://archives.postgresql.org/pgsql-hackers/2008-04/msg00169.php Hello, Attached is a WIP patch for this TODO. This part went as an attachment, which wasn't my intent: It does the trick by tracking whether a TRUNCATE command was issued under a (sub)transaction and uses this knowledge to reset the live/dead tuple counters later if the transaction was committed. Testing in simple cases shows that this clears the counters correctly, including use of savepoints. The 2PC part requires extending the bool flag to fit the trunc flag; is this approach sane? Given that a 2PC transaction should survive server restart, it's reasonable to expect it to also survive the upgrade, so I see no clean way of adding another bool field to the TwoPhasePgStatRecord struct (unless someone would like to add checks on record length, etc.). I'm going to add some regression tests, but I'm not sure what would be the best location for them. The truncate.sql file seems like the natural choice, but stats are not updated in real time, so I'd need to borrow some tricks from stats.sql -- or would it be better to put the new tests in stats.sql itself? -- Alex
Re: [HACKERS] moving from contrib to bin
On Tue, Dec 9, 2014 at 06:10:02PM -0300, Alvaro Herrera wrote: (For pg_upgrade you also need to do something about pg_upgrade_support, which is good because that is one very ugly crock.) FYI, pg_upgrade_support was segregated from pg_upgrade only because we wanted separate binary and shared object build/install targets. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSSAPI, SSPI - include_realm default
On 12/5/14 1:06 PM, Stephen Frost wrote: I suggest we also backpatch some documentation suggesting that people manually change the include_realm parameter (perhaps also with a note saying that the default will change in 9.5). I'll work on a patch for back-branches if everyone is alright with this patch against master. I don't think backpatching this is necessary or appropriate. First of all, this isn't even released, and it might very well change again later. The right time to publicly notify about this change is not before when 9.5 is released. Also, it's not like people keep re-reading the old documentation in order to get updated advice. It might very well be confusing if stable documentation changes because of future events. Users who are interested in knowing about changes in future releases should read the release notes of those future releases. My comment that include_realm is supported back to 8.4 was because there is an expectation that a pg_hba.conf file can be used unchanged across several major releases. So when 9.5 comes out and people update their pg_hba.conf files for 9.5, those files will still work in old releases. But the time to do those updates is then, not now. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSSAPI, SSPI - include_realm default
On Dec 9, 2014 10:52 PM, Peter Eisentraut pete...@gmx.net wrote: On 12/5/14 1:06 PM, Stephen Frost wrote: I suggest we also backpatch some documentation suggesting that people manually change the include_realm parameter (perhaps also with a note saying that the default will change in 9.5). I'll work on a patch for back-branches if everyone is alright with this patch against master. I don't think backpatching this is necessary or appropriate. First of all, this isn't even released, and it might very well change again later. The right time to publicly notify about this change is not before when 9.5 is released. Also, it's not like people keep re-reading the old documentation in order to get updated advice. It might very well be confusing if stable documentation changes because of future events. Users who are interested in knowing about changes in future releases should read the release notes of those future releases. My comment that include_realm is supported back to 8.4 was because there is an expectation that a pg_hba.conf file can be used unchanged across several major releases. So when 9.5 comes out and people update their pg_hba.conf files for 9.5, those files will still work in old releases. But the time to do those updates is then, not now. I thought the idea was to backpatch documentation saying it's a good idea to change this value to x because of y. Not actually referring to the upcoming change directly. And I still think that part is a good idea, as it helps people avoid potential security pitfalls. So not really a backpatch as so, rather a separate patch for the back branches. (and people definitely reread the docs - since they deploy new systems on the existing versions...) /Magnus
[HACKERS] operator does not exist: character varying[] character[]
Is there any particular reason we don't allow comparing char and varchar arrays? If not I'll submit a patch. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] operator does not exist: character varying[] character[]
Jim Nasby jim.na...@bluetreble.com writes: On 12/9/14, 4:19 PM, Jim Nasby wrote: Is there any particular reason we don't allow comparing char and varchar arrays? If not I'll submit a patch. We're also missing operators on text and varchar arrays. Adding operators would be an incorrect fix. regards, tom lane
Re: [HACKERS] On partitioning
On 12/8/14, 5:19 PM, Josh Berkus wrote: On 12/08/2014 02:12 PM, Jim Nasby wrote: On 12/8/14, 12:26 PM, Josh Berkus wrote: 4. Creation Locking Problem high probability of lock pile-ups whenever a new partition is created on demand due to multiple backends trying to create the partition at the same time. Not Addressed? Do users actually try and create new partitions during DML? That sounds doomed to failure in pretty much any system... There is no question that it would be easier for users to create partitions on demand automatically. Particularly if you're partitioning by something other than time. For a particular case, consider users on RDS, which has no cron jobs for creating new partitons; it's on demand or manually. It's quite possible that there is no good way to work out the locking for on-demand partitions though, but *if* we're going to have a 2nd partition system, I think it's important to at least discuss the problems with on-demand creation. Yeah, we should discuss it. Perhaps the right answer here may be our own job scheduler, something a lot of folks want anyway. 11. Hash Partitioning Some users would prefer to partition into a fixed number of hash-allocated partitions. Not Addressed. Though, you should be able to do that in either system if you bother to define your own hash in a BEFORE trigger... That doesn't do you any good with the SELECT query, unless you change your middleware to add a hash(column) to every query. Which would be really hard to do for joins. A. COPY/ETL then attach In inheritance partitioning, you can easily build a partition outside the master and then attach it, allowing for minimal disturbance of concurrent users. Could be addressed in the future. How much of the desire for this is because our current row routing solutions are very slow? I suspect that's the biggest reason, and hopefully Alvaro's proposal mostly eliminates it. That doesn't always work, though. 
In some cases the partition is being built using some fairly complex logic (think of partitions which are based on matviews) and there's no fast way to create the new data. Again, this is an acceptable casualty of an improved design, but if it will be so, we should consciously decide that. Is there an example you can give here? If the scheme is that complicated I'm failing to see how you're supposed to do things like partition elimination. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSSAPI, SSPI - include_realm default
* Peter Eisentraut (pete...@gmx.net) wrote: On 12/5/14 1:06 PM, Stephen Frost wrote: I suggest we also backpatch some documentation suggesting that people manually change the include_realm parameter (perhaps also with a note saying that the default will change in 9.5). I'll work on a patch for back-branches if everyone is alright with this patch against master. I don't think backpatching this is necessary or appropriate. Sorry if that wasn't clear but the idea was to *just* backpatch the documentation comments, not to change the default in back-branches. First of all, this isn't even released, and it might very well change again later. The right time to publicly notify about this change is not before when 9.5 is released. Also, it's not like people keep re-reading the old documentation in order to get updated advice. It might very well be confusing if stable documentation changes because of future events. Users who are interested in knowing about changes in future releases should read the release notes of those future releases. My comment that include_realm is supported back to 8.4 was because there is an expectation that a pg_hba.conf file can be used unchanged across several major releases. So when 9.5 comes out and people update their pg_hba.conf files for 9.5, those files will still work in old releases. But the time to do those updates is then, not now. The back-branches are being patched to discourage using the default because it's not a secure approach. New users start using PG all the time and so changing the existing documentation is worthwhile to ensure those new users understand. A note in the release notes for whichever minor release the change to the documentation shows up in would be a good way to make existing users aware of the change and hopefully encourage them to review their configuration. If we don't agree that the change should be made then we can discuss that, but everyone commenting so far has agreed on the change. 
Thanks, Stephen
Re: [HACKERS] operator does not exist: character varying[] character[]
On 12/9/14, 4:19 PM, Jim Nasby wrote: Is there any particular reason we don't allow comparing char and varchar arrays? If not I'll submit a patch. We're also missing operators on text and varchar arrays. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] GSSAPI, SSPI - include_realm default
* Magnus Hagander (mag...@hagander.net) wrote: On Dec 9, 2014 10:52 PM, Peter Eisentraut pete...@gmx.net wrote: On 12/5/14 1:06 PM, Stephen Frost wrote: I suggest we also backpatch some documentation suggesting that people manually change the include_realm parameter (perhaps also with a note saying that the default will change in 9.5). I'll work on a patch for back-branches if everyone is alright with this patch against master. I don't think backpatching this is necessary or appropriate. First of all, this isn't even released, and it might very well change again later. The right time to publicly notify about this change is not before when 9.5 is released. Also, it's not like people keep re-reading the old documentation in order to get updated advice. It might very well be confusing if stable documentation changes because of future events. Users who are interested in knowing about changes in future releases should read the release notes of those future releases. My comment that include_realm is supported back to 8.4 was because there is an expectation that a pg_hba.conf file can be used unchanged across several major releases. So when 9.5 comes out and people update their pg_hba.conf files for 9.5, those files will still work in old releases. But the time to do those updates is then, not now. I thought the idea was to backpatch documentation saying it's a good idea to change this value to x because of y. Not actually referring to the upcoming change directly. And I still think that part is a good idea, as it helps people avoid potential security pitfalls. I agree with this but I don't really see why we wouldn't say hey, this is going to change in 9.5. Peter's argument sounds like he'd rather we not make any changes to the existing documentation, and I don't agree with that, and if we're making changes then, imv, we might as well comment that the default is changed in 9.5. So not really a backpatch as so, rather a separate patch for the back branches. 
(and people definitely reread the docs - since they deploy new systems on the existing versions...) Yes, I was going to write a different patch for the back-branches, apologies if that wasn't clear. I'll see about drafting something up soon, as there doesn't seem to be any argument about the substance of the proposed patch for master. Thanks, Stephen
Re: [HACKERS] logical column ordering
On 12/9/14, 11:41 AM, Alvaro Herrera wrote: I'm going to see about it while I get feedback on the rest of this patch; in particular, extra test cases that fail to work when columns have been moved around are welcome, so that I can add them to the regress test. What I have now is the basics I'm building as I go along. The regression tests show examples of some logical column renumbering (which can be done after the table already contains some data) but none of physical column renumbering (which can only be done when the table is completely empty.) My hunch is that the sample foo, bar, baz, quux tables should present plenty of opportunities to display brokenness in the planner and executor. The ideal case would be to do something like randomizing logical and physical ordering as tables are created throughout the entire test suite (presumably as an option). That should work for physical ordering, but presumably it would pointlessly blow up on logical ordering because the expected output is hard-coded. Perhaps instead of randomizing logical ordering we could force that to be the same ordering in which fields were supplied and actually randomize attnum? In particular, I'm thinking that in DefineRelation we can randomize stmt->tableElts before merging in inheritance attributes. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] operator does not exist: character varying[] = character[]
On 12/9/14, 4:30 PM, Tom Lane wrote: Jim Nasby jim.na...@bluetreble.com writes: On 12/9/14, 4:19 PM, Jim Nasby wrote: Is there any particular reason we don't allow comparing char and varchar arrays? If not I'll submit a patch. We're also missing operators on text and varchar arrays. Adding operators would be an incorrect fix. Right, I'm assuming this is a problem somewhere else (haven't looked into it yet). I just wanted confirmation that this is unexpected before I try and fix it. I'll take your silence on that point as confirmation that this is a bug. :) -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] logical column ordering
On 12/09/2014 09:41 AM, Alvaro Herrera wrote: The first thing where this matters is tuple descriptor expansion in parse analysis; at this stage, things such as * (in select *) are turned into a target list, which must be sorted according to attlognum. To achieve this I added a new routine to tupledescs, The two other major cases are: INSERT INTO table SELECT|VALUES ... COPY table FROM|TO ... ... although copy should just be a subclass of SELECT. Question on COPY, though: there's reasons why people would want COPY to dump in either physical or logical order. If you're doing COPY to create CSV files for output, then you want the columns in logical order. If you're doing COPY for pg_dump, then you want them in physical order for faster dump/reload. So we're almost certainly going to need to have an option for COPY. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Proposal : REINDEX SCHEMA
On Wed, Dec 10, 2014 at 1:37 AM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: On Tue, Dec 9, 2014 at 10:21 AM, Michael Paquier michael.paqu...@gmail.com wrote: On Tue, Dec 9, 2014 at 6:43 PM, Michael Paquier michael.paqu...@gmail.com wrote: OK. Perhaps that's not worth mentioning in the release notes, but some users may be used to the old behavior. What about the other issues (regression test for permissions incorrect and matviews)? Here is in any case an updated patch to fix all those things rebased on latest HEAD (ae4e688). The patch is fine: - No compiler warnings - Regression tests added properly and run OK There are no changes in the docs. Maybe we can mention matviews in the REINDEX SCHEMA docs, what do you think? Current documentation says that all the indexes in the schema are reindexed; only matviews and plain relations can have indexes, so that's implicitly covered IMO. If in the future we add support for another relkind that can have indexes, we won't need to update the docs either. -- Michael
Re: [HACKERS] Elusive segfault with 9.3.5 query cancel
On 5 Dec 2014, at 22:41, Jim Nasby jim.na...@bluetreble.com wrote: Perhaps we should also officially recommend production servers be set up to create core files. AFAIK the only downside is the time it would take to write a core that's huge because of shared buffers, but perhaps there's some way to avoid writing those? (That means the core won't help if the bug is due to something in a buffer, but that seems unlikely enough that the tradeoff is worth it...) Good idea. It seems the madvise() system call (with MADV_DONTDUMP) is exactly what's needed to avoid dumping shared buffers.
Re: [HACKERS] logical column ordering
On 12/09/2014 06:19 PM, Josh Berkus wrote: On 12/09/2014 09:41 AM, Alvaro Herrera wrote: The first thing where this matters is tuple descriptor expansion in parse analysis; at this stage, things such as * (in select *) are turned into a target list, which must be sorted according to attlognum. To achieve this I added a new routine to tupledescs, The two other major cases are: INSERT INTO table SELECT|VALUES ... COPY table FROM|TO ... ... although copy should just be a subclass of SELECT. Question on COPY, though: there's reasons why people would want COPY to dump in either physical or logical order. If you're doing COPY to create CSV files for output, then you want the columns in logical order. If you're doing COPY for pg_dump, then you want them in physical order for faster dump/reload. So we're almost certainly going to need to have an option for COPY. I seriously doubt it, although I could be wrong. Unless someone can show a significant performance gain from using physical order, which would be a bit of a surprise to me, I would just stick with logical ordering as the default. cheers andrew
Re: [HACKERS] group locking: incomplete patch, just for discussion
On Sun, Nov 23, 2014 at 3:59 PM, Andres Freund and...@2ndquadrant.com wrote: The heavyweight locking issue is really killing me, though. I don't really understand why you're not content with just detecting deadlocks for now. Everything else seems like bells and whistles for later. I don't think it's OK for a parallel query to just randomly deadlock. I think that should be the end goal, yes. It's OK for parallel query to excuse itself politely and retire from the field, but I don't think it's OK for it to say, oh, sure, I can parallelize that for you, and then fail by deadlocking with itself. But I also think that right now it doesn't necessarily need to be the focus immediately. I am in strong agreement with Robert that unprincipled deadlocks are not okay. Now, what that actually means in practice might be kind of hard to delineate. Take my ON CONFLICT UPDATE patch. The locking there is optimistic, and so in theory there are risks of lock starvation [1]. There is no axiomatic reason why a single backend cannot be starved of a lock in the event of a perfect storm of conflicts. Similarly, the Lehman and Yao paper makes what can only be called a plausibility argument around how livelocks are not a concern in practice [2] when an indefinite number of move-right operations are required after a page split; having written a bunch of theorems around the correctness of other aspects of their algorithm, the argument here is more or less: "Come on, that'll never happen!" Concurrency is hard. If you want another example of this kind of thing, then there's how Oracle deals with READ COMMITTED row conflicts -- the statement is rolled back and restarted. Who is to say with certainty that it'll make any progress the second time? Or the third or fourth? Note that the optimistic locking stuff was actually my solution to the unprincipled deadlocks problem, a solution that, as I say, also implies the theoretical risk of lock starvation (but not livelock).
I think I would also have found it acceptable if there was a solution that reduced the original risk, the risk of *deadlocks*, to merely a theoretical one, as opposed to a very real one (that could be shown with a simple testcase). This is a topic that I think will be much easier to approach if some demonstration of actual parallel users would be available. Doesn't have to be committable or such, but I think it'll help to shape how this has to look. I think unrecognized deadlocks will make things too annoying to use, but I don't think it needs to be guaranteed deadlock-free or such. I couldn't agree more. The greater point about livelock, which I believe applies here as well, is that there are different standards by which we might consider that unprincipled deadlocks won't happen. Basically, it seems inevitable that a practical standard sometimes must be applied. If a well-informed person, say myself or Andres, cannot make any given practical implementation deadlock (or alternatively be starved for locks, if we're talking about an optimistic locking work-around to unprincipled deadlocks that introduces that risk), then I guess that means that there are no unprincipled deadlocks/livelocks, and I for one will probably find that acceptable given enough testing. I realize that that might sound terribly loose (I probably forgot a few caveats), but what it boils down to is that we *don't* have the tools to analyze certain types of problems like this, or at least I don't think we do, because no one does. The closest thing we have is some well-written custom stress testing, and lots of scrutiny. Of course I'd prefer a solution that is provably correct from first principles (in other words, I'd prefer to use algorithms that are in some formal sense non-blocking [3] or something along those lines), but that will often be cost-prohibitive or otherwise elusive when designing concurrent algorithms. I think I agree with you both.
[1] https://wiki.postgresql.org/wiki/UPSERT#Theoretical_lock_starvation_hazards [2] http://www.cs.cornell.edu/courses/cs4411/2009sp/blink.pdf , Section 6.4 Livelock [3] https://en.wikipedia.org/wiki/Non-blocking_algorithm -- Peter Geoghegan
[HACKERS] thinko in convertToJsonb()
The call: reserveFromBuffer(buffer, sizeof(VARHDRSZ)) is assuming that the size of the varlena header is the same as the size of the type used to return that size, which happens to be true today, but someone could easily change that macro to: #define VARHDRSZ ((int64) sizeof(int32)) And you'd want to reserve sizeof(int32), not sizeof(int64), in convertToJsonb. Perhaps the code really meant to say: reserveFromBuffer(buffer, VARHDRSZ) mark
Re: [HACKERS] logical column ordering
Andrew Dunstan wrote: On 12/09/2014 06:19 PM, Josh Berkus wrote: On 12/09/2014 09:41 AM, Alvaro Herrera wrote: The first thing where this matters is tuple descriptor expansion in parse analysis; at this stage, things such as * (in select *) are turned into a target list, which must be sorted according to attlognum. To achieve this I added a new routine to tupledescs, The two other major cases are: INSERT INTO table SELECT|VALUES ... COPY table FROM|TO ... Yes, both are covered. ... although copy should just be a subclass of SELECT. It is not. There's one part of COPY that goes through SELECT processing, but only when the table being copied is a subselect. Normal COPY does not use the same code path. Question on COPY, though: there's reasons why people would want COPY to dump in either physical or logical order. If you're doing COPY to create CSV files for output, then you want the columns in logical order. If you're doing COPY for pg_dump, then you want them in physical order for faster dump/reload. So we're almost certainly going to need to have an option for COPY. I seriously doubt it, although I could be wrong. Unless someone can show a significant performance gain from using physical order, which would be a bit of a surprise to me, I would just stick with logical ordering as the default. Well, we have an optimization that avoids a projection step IIRC by using the physical tlist instead of having to build a tailored one. I guess the reason that's there is because somebody did measure an improvement. Maybe it *is* worth having as an option for pg_dump ... -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] Small TRUNCATE glitch
Alex Shulgin wrote: The 2PC part requires extending bool flag to fit the trunc flag, is this approach sane? Given that 2PC transaction should survive server restart, it's reasonable to expect it to also survive the upgrade, so I see no clean way of adding another bool field to the TwoPhasePgStatRecord struct (unless some would like to add checks on record length, etc.). I don't think we need to have 2PC files survive a pg_upgrade. It seems perfectly okay to remove them from the new cluster at some appropriate time, *if* they are copied from the old cluster at all (I don't think they should be.) -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
There is an interesting thread about strcoll() overhead over on -general: http://www.postgresql.org/message-id/cab25xexnondrmc1_cy3jvmb0tmydm38ef9q2d7xla0rbncj...@mail.gmail.com My guess was that this person experienced a rather unexpected downside of spilling to disk when sorting on a text attribute: System throughput becomes very CPU bound, because tapesort tends to result in more comparisons [1]. With abbreviated keys, tapesort can actually compete with quicksort in certain cases [2]. Tapesorts of text attributes are especially bad on released versions of Postgres, and will perform very little actual I/O. In all seriousness, I wonder if we should add a release note item stating that when using Postgres 9.5, due to the abbreviated key optimization, external sorts can be much more I/O bound than in previous releases... [1] http://www.postgresql.org/message-id/20140806035512.ga91...@tornado.leadboat.com [2] http://www.postgresql.org/message-id/CAM3SWZQiGvGhMB4TMbEWoNjO17=ysb5b5y5mgqjsanq4uwt...@mail.gmail.com -- Peter Geoghegan
Re: [HACKERS] inherit support for foreign tables
Hi Ashutosh, Thanks for the review! (2014/11/28 18:14), Ashutosh Bapat wrote: On Thu, Nov 27, 2014 at 3:52 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp mailto:fujita.ets...@lab.ntt.co.jp wrote: (2014/11/17 17:55), Ashutosh Bapat wrote: Here are my review comments for patch fdw-inh-3.patch. Tests --- 1. It seems like you have copied from testcase inherit.sql to postgres_fdw testcase. That's a good thing, but it makes the test quite long. May be we should have two tests in postgres_fdw contrib module, one for simple cases, and other for inheritance. What do you say? IMO, the test is not so time-consuming, so it doesn't seem worth splitting it into two. I am not worried about the timing but I am worried about the length of the file and hence ease of debugging in case we find any issues there. We will leave that for the commiter to decide. OK Documentation 1. The change in ddl.sgml -We will refer to the child tables as partitions, though they -are in every way normal productnamePostgreSQL/ tables. +We will refer to the child tables as partitions, though we assume +that they are normal productnamePostgreSQL/ tables. adds phrase we assume that, which confuses the intention behind the sentence. The original text is intended to highlight the equivalence between partition and normal table, where as the addition esp. the word assume weakens that equivalence. Instead now we have to highlight the equivalence between partition and normal or foreign table. The wording similar to though they are either normal or foreign tables should be used there. You are right, but I feel that there is something out of place in saying that there (5.10.2. Implementing Partitioning) because the procedure there has been written based on normal tables only. Put another way, if we say that, I think we'd need to add more docs, describing the syntax and/or the corresponding examples for foreign-table cases. But I'd like to leave that for another patch. 
So, how about the wording we assume *here* that, instead of we assume that, as I added the following notice in the previous section (5.10.1. Overview)? @@ -2650,7 +2669,10 @@ VALUES ('Albany', NULL, NULL, 'NY'); table of a single parent table. The parent table itself is normally empty; it exists just to represent the entire data set. You should be familiar with inheritance (see xref linkend=ddl-inherit) before -attempting to set up partitioning. +attempting to set up partitioning. (The setup and management of +partitioned tables illustrated in this section assume that each +partition is a normal table. However, you can do that in a similar way +for cases where some or all partitions are foreign tables.) This looks ok, though, I would like to see final version of the document. But I think, we will leave that for committer to handle. OK 2. The wording some kind of optimization gives vague picture. May be it can be worded as Since the constraints are assumed to be true, they are used in constraint-based query optimization like constraint exclusion for partitioned tables.. +Those constraints are used in some kind of query optimization such +as constraint exclusion for partitioned tables (see +xref linkend=ddl-partitioning). Will follow your revision. Done. Code --- 1. In the following change +/* * acquire_inherited_sample_rows -- acquire sample rows from inheritance tree * * This has the same API as acquire_sample_rows, except that rows are * collected from all inheritance children as well as the specified table. - * We fail and return zero if there are no inheritance children. + * We fail and return zero if there are no inheritance children or there are + * inheritance children that foreign tables. The addition should be there are inheritance children that *are all *foreign tables. Note the addition are all. Sorry, I incorrectly wrote the comment. 
What I tried to write is We fail and return zero if there are no inheritance children or if we are not in VAC_MODE_SINGLE case and inheritance tree contains at least one foreign table.. You might want to use English description of VAC_MODE_SINGLE instead of that macro in the comment, so that reader doesn't have to look up VAC_MODE_SINGLE. But I think, we will leave this for the committer. I corrected the comments and
Re: [HACKERS] [COMMITTERS] pgsql: Keep track of transaction commit timestamps
On Thu, Dec 4, 2014 at 9:26 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 12/04/2014 01:47 PM, Petr Jelinek wrote: On 04/12/14 12:26, Heikki Linnakangas wrote: On 12/04/2014 01:16 PM, Petr Jelinek wrote: On 04/12/14 10:42, Heikki Linnakangas wrote: On 12/03/2014 04:54 PM, Alvaro Herrera wrote: ir commit timestamp directly as they commit, or an external transaction c Sorry, I'm late to the party, but here's some random comments on this after a quick review: * The whole concept of a node ID seems to be undocumented, and unused. No-one calls CommitTsSetDefaultNodeId(). The whole SET_TS record type is dead code too, when a non-default node_id is not set. Please remove, or explain how all that's supposed to work. That's API meant to be used by extensions, maybe it would be preferable if there was SQL API exposed also? (It might also make more sense once replication identifiers are submitted.) Maybe. Hard to tell without any documentation or comments on how it's supposed to work. It's unacceptable in its current state. I would prefer to remove it now, and add it back later together with the code that actually uses it. I don't like having unused internal functions and APIs sitting in the source tree in the hope that someone will find them useful in the future. It's more likely that it will bitrot, or not actually work as intended, when someone later tries to use it. The thing is I already have an extension for 9.4 where I would use this API for conflict detection if it was available, so it's not about hoping for somebody to find it useful. There was discussion about this during the review process because the objection was raised already then. Yeah, it was raised. I don't think it was really addressed. There was lengthy discussion on whether to include LSN, node id, and/or some other information, but there was no discussion on: * What is a node ID? * How is it used? * Who assigns it? etc. None of those things are explained in the documentation or the code comments.
The node ID sounds suspiciously like the replication identifiers that have been proposed a couple of times. I don't remember if I like replication identifiers or not, but I'd rather discuss that issue explicitly and separately. I don't want the concept of a replication/node ID to fly under the radar like this. What would the SQL API look like, and what would it be used for? It would probably mirror the C one, from my POV it's not needed but maybe some replication solution would prefer to use this without having to write C components... I can't comment on that, because without any documentation, I don't even know what the C API is. BTW, why is it OK that the node-ID of a commit is logged as a separate WAL record, after the commit record? That means that it's possible that a transaction commits, but you crash just before writing the SETTS record, so that after replay the transaction appears committed but with the default node ID. (Maybe that's OK given the intended use case, but it's hard to tell without any documentation. See a pattern here? ;-) ) Could it be possible to get a refresh on that before it bitrots too much or we'll simply forget about it? In the current state, there is no documentation explaining what the point of the nodeid stuff is, how to use it, and what use cases it is aimed at. The API in committs.c should be part of it as well. If that's not done, I think that it would be fair to remove this portion and simply WAL-log the commit timestamp when its GUC is activated. Thanks, -- Michael
Re: [HACKERS] On partitioning
On Tue, Dec 9, 2014 at 11:44 PM, Josh Berkus j...@agliodbs.com wrote: On 12/09/2014 12:17 AM, Amit Langote wrote: Now if a user wants to define a multi-column Partition based on monthly_salary and annual_salary, how do we want him to specify the values? Basically, how to distinguish which values belong to the first column key and which ones belong to the second column key. Perhaps you are talking about syntactic difficulties that I totally missed in my other reply to this mail? Can we represent the same data by rather using a subpartitioning scheme? ISTM, semantics would remain the same. ... PARTITION BY (monthly_salary) SUBPARTITION BY (annual_salary)? Using SUBPARTITION is not the answer for multi-column partitioning. I think if we have to support it for List partitioning then something along the lines of what Josh has mentioned below could work out, but I don't think it is important to support multi-column partitioning for List at this stage. ... or just use arrays. PARTITION BY LIST ( monthly_salary, annual_salary ) PARTITION salary_small VALUES ({[300,400],[5000,6000]}) ) ... but that begs the question of how partition by list over two columns (or more) would even work? You'd need an a*b number of partitions, and the user would be pretty much certain to miss a few value combinations. Maybe we should just restrict list partitioning to a single column for a first release, and wait and see if people ask for more? I also think we should not support multi-column list partitioning in the first release. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] On partitioning
On Tue, Dec 9, 2014 at 7:21 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Amit Kapila wrote: On Tue, Dec 9, 2014 at 1:42 AM, Robert Haas robertmh...@gmail.com wrote: On Mon, Dec 8, 2014 at 2:56 PM, Andres Freund and...@2ndquadrant.com wrote: I don't think that's mutually exclusive with the idea of partitions-as-tables. I mean, you can add code to the ALTER TABLE path that says if (i_am_not_the_partitioning_root) ereport(ERROR, ...) wherever you want. That'll be a lot of places you'll need to touch. More fundamentally: Why should we name something a table that's not one? Well, I'm not convinced that it isn't one. And adding a new relkind will involve a bunch of code churn, too. But I don't much care to pre-litigate this: when someone has got a patch, we can either agree that the approach is OK or argue that it is problematic because X. I think we need to hammer down the design in broad strokes first, and I'm not sure we're totally there yet. That's right. I think at this point defining the top-level behaviour/design is very important to proceed; we can decide about the better implementation approach afterwards (maybe once the initial patch is ready, because it might not be a major work to do it either way). So here's where we are on this point till now as per my understanding: I think that direct operations should be prohibited on partitions, you think that they should be allowed, and Andres thinks that it might be better to allow direct operations on partitions for Read. FWIW in my original proposal I was rejecting some things that after further consideration turn out to be possible to allow; for instance directly referencing individual partitions in COPY.
We could allow something like COPY lineitems PARTITION FOR VALUE '2000-01-01' TO STDOUT or maybe COPY PARTITION FOR VALUE '2000-01-01' ON TABLE lineitems TO STDOUT or COPY [TABLE] lineitems PARTITION FOR VALUE '2000-01-01' TO STDOUT COPY [TABLE] lineitems PARTITION part_1,part_2, TO STDOUT I think we should try to support operations on partitions via the main table wherever it is required. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] thinko in convertToJsonb()
On Tue, Dec 9, 2014 at 11:11 AM, Mark Dilger m...@port25.com wrote: The call: reserveFromBuffer(buffer, sizeof(VARHDRSZ)) is assuming that the size of varlena header is the same size as the type used to return that size, which happens to be so, but someone could easily change that macro to: #define VARHDRSZ ((int64) sizeof(int32)) And you'd want to reserve sizeof(int32), not sizeof(int64) in convertToJsonb. Perhaps the code really meant to say: reserveFromBuffer(buffer, VARHDRSZ) Good catch! The code is indeed incorrect. Attached is a one-line patch addressing that, I guess someone is going to pick up that sooner or later. -- Michael 20141210_jsonb_varlena_header.patch Description: Binary data
Re: [HACKERS] logical column ordering
Josh Berkus j...@agliodbs.com writes: Question on COPY, though: there's reasons why people would want COPY to dump in either physical or logical order. If you're doing COPY to create CSV files for output, then you want the columns in logical order. If you're doing COPY for pg_dump, then you want them in physical order for faster dump/reload. So we're almost certainly going to need to have an option for COPY. This is complete nonsense, Josh, or at least it is until you can provide some solid evidence to believe that column ordering would make any noticeable performance difference in COPY. I know of no reason to believe that the existing user-defined-column-ordering option makes any difference. regards, tom lane
Re: [HACKERS] logical column ordering
Alvaro Herrera alvhe...@2ndquadrant.com writes: Andrew Dunstan wrote: I seriously doubt it, although I could be wrong. Unless someone can show a significant performance gain from using physical order, which would be a bit of a surprise to me, I would just stick with logical ordering as the default. Well, we have an optimization that avoids a projection step IIRC by using the physical tlist instead of having to build a tailored one. I guess the reason that's there is because somebody did measure an improvement. Maybe it *is* worth having as an option for pg_dump ... The physical tlist thing is there because it's demonstrable that ExecProject() takes nonzero time. COPY does not go through ExecProject though. What's more, it already has code to deal with a user-specified column order, and nobody's ever claimed that that code imposes a measurable performance overhead. regards, tom lane
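For reference, the existing user-specified column ordering that Tom mentions is just COPY's optional column list; a minimal sketch against a hypothetical table:

```sql
CREATE TABLE items (id int, name text, price numeric);

-- Default: columns come out in catalog (attnum) order.
COPY items TO STDOUT;

-- User-specified order: the server reorders columns on the way out,
-- regardless of their physical order in the heap.
COPY items (name, price, id) TO STDOUT;
```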
Re: [HACKERS] inherit support for foreign tables
We haven't heard anything from Horiguchi-san and Hanada-san for almost a week. So, I am fine marking it as ready for committer. What do you say? On Wed, Dec 10, 2014 at 8:48 AM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: Hi Ashutosh, Thanks for the review! (2014/11/28 18:14), Ashutosh Bapat wrote: On Thu, Nov 27, 2014 at 3:52 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: (2014/11/17 17:55), Ashutosh Bapat wrote: Here are my review comments for patch fdw-inh-3.patch. Tests --- 1. It seems like you have copied from testcase inherit.sql to postgres_fdw testcase. That's a good thing, but it makes the test quite long. Maybe we should have two tests in the postgres_fdw contrib module, one for simple cases and another for inheritance. What do you say? IMO, the test is not so time-consuming, so it doesn't seem worth splitting it into two. I am not worried about the timing, but I am worried about the length of the file and hence the ease of debugging in case we find any issues there. We will leave that for the committer to decide. OK Documentation 1. The change in ddl.sgml -We will refer to the child tables as partitions, though they -are in every way normal productnamePostgreSQL/ tables. +We will refer to the child tables as partitions, though we assume +that they are normal productnamePostgreSQL/ tables. adds the phrase we assume that, which confuses the intention behind the sentence. The original text is intended to highlight the equivalence between partition and normal table, whereas the addition, esp. the word assume, weakens that equivalence. Instead now we have to highlight the equivalence between a partition and a normal or foreign table. Wording similar to though they are either normal or foreign tables should be used there. You are right, but I feel that there is something out of place in saying that there (5.10.2. Implementing Partitioning) because the procedure there has been written based on normal tables only.
Put another way, if we say that, I think we'd need to add more docs, describing the syntax and/or the corresponding examples for foreign-table cases. But I'd like to leave that for another patch. So, how about the wording we assume *here* that, instead of we assume that, as I added the following notice in the previous section (5.10.1. Overview)? @@ -2650,7 +2669,10 @@ VALUES ('Albany', NULL, NULL, 'NY'); table of a single parent table. The parent table itself is normally empty; it exists just to represent the entire data set. You should be familiar with inheritance (see xref linkend=ddl-inherit) before -attempting to set up partitioning. +attempting to set up partitioning. (The setup and management of +partitioned tables illustrated in this section assume that each +partition is a normal table. However, you can do that in a similar way +for cases where some or all partitions are foreign tables.) This looks ok, though, I would like to see final version of the document. But I think, we will leave that for committer to handle. OK 2. The wording some kind of optimization gives vague picture. May be it can be worded as Since the constraints are assumed to be true, they are used in constraint-based query optimization like constraint exclusion for partitioned tables.. +Those constraints are used in some kind of query optimization such +as constraint exclusion for partitioned tables (see +xref linkend=ddl-partitioning). Will follow your revision. Done. Code --- 1. In the following change +/* * acquire_inherited_sample_rows -- acquire sample rows from inheritance tree * * This has the same API as acquire_sample_rows, except that rows are * collected from all inheritance children as well as the specified table. - * We fail and return zero if there are no inheritance children. + * We fail and return zero if there are no inheritance children or there are + * inheritance children that foreign tables. 
The addition should be "there are inheritance children that *are all* foreign tables". Note the addition "are all". Sorry, I incorrectly wrote the comment. What I tried to write is We fail and return zero if there are no inheritance children or if we are
Re: [HACKERS] On partitioning
On Wed, Dec 10, 2014 at 12:33 PM, Amit Kapila amit.kapil...@gmail.com wrote: On Tue, Dec 9, 2014 at 11:44 PM, Josh Berkus j...@agliodbs.com wrote: On 12/09/2014 12:17 AM, Amit Langote wrote: Now if a user wants to define a multi-column partition based on monthly_salary and annual_salary, how do we want them to specify the values? Basically, how do we distinguish which values belong to the first column key and which ones belong to the second column key? Perhaps you are talking about syntactic difficulties that I totally missed in my other reply to this mail? Can we represent the same data by rather using a subpartitioning scheme? ISTM, the semantics would remain the same. ... PARTITION BY (monthly_salary) SUBPARTITION BY (annual_salary)? Using SUBPARTITION is not the answer for multi-column partitioning; I think if we have to support it for list partitioning then something along the lines of what Josh has mentioned below could work out, but I don't think it is important to support multi-column partitioning for list at this stage. Yeah, I realize multicolumn list partitioning and list-list composite partitioning are different things in many respects. And given how awkward multicolumn list partitioning is looking to implement, I also think we should only allow a single column in a list partition key. ... or just use arrays. PARTITION BY LIST ( monthly_salary, annual_salary ) PARTITION salary_small VALUES ({[300,400],[5000,6000]}) ... but that begs the question of how partitioning by list over two columns (or more) would even work? You'd need a*b partitions, and the user would be pretty much certain to miss a few value combinations. Maybe we should just restrict list partitioning to a single column for a first release, and wait and see if people ask for more? I also think we should not support multi-column list partitioning in the first release. Yes.
Thanks, Amit
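For concreteness, the single-column-key-per-level alternative discussed above could be sketched with the proposed syntax (hypothetical, nothing here is committed syntax):

```sql
-- Hypothetical syntax from the proposal: instead of a two-column
-- list key, each level of partitioning gets a single-column key.
CREATE TABLE salaries (emp_id int, monthly_salary int, annual_salary int)
    PARTITION BY LIST (monthly_salary)
    SUBPARTITION BY LIST (annual_salary);
```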
Re: [HACKERS] On partitioning
On Wed, Dec 10, 2014 at 12:46 PM, Amit Kapila amit.kapil...@gmail.com wrote: On Tue, Dec 9, 2014 at 7:21 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Amit Kapila wrote: On Tue, Dec 9, 2014 at 1:42 AM, Robert Haas robertmh...@gmail.com wrote: On Mon, Dec 8, 2014 at 2:56 PM, Andres Freund and...@2ndquadrant.com wrote: I don't think that's mutually exclusive with the idea of partitions-as-tables. I mean, you can add code to the ALTER TABLE path that says if (i_am_not_the_partitioning_root) ereport(ERROR, ...) wherever you want. That'll be a lot of places you'll need to touch. More fundamentally: Why should we name something a table that's not one? Well, I'm not convinced that it isn't one. And adding a new relkind will involve a bunch of code churn, too. But I don't much care to pre-litigate this: when someone has got a patch, we can either agree that the approach is OK or argue that it is problematic because X. I think we need to hammer down the design in broad strokes first, and I'm not sure we're totally there yet. That's right, I think at this point defining the top-level behaviour/design is very important to proceed; we can decide about the better implementation approach afterwards (maybe once the initial patch is ready, because it might not be major work to do it either way). So here's where we are on this point so far, as per my understanding: I think that direct operations should be prohibited on partitions, you think that they should be allowed, and Andres thinks that it might be better to allow direct operations on partitions for reads. FWIW in my original proposal I was rejecting some things that after further consideration turn out to be possible to allow; for instance directly referencing individual partitions in COPY.
We could allow something like:
  COPY lineitems PARTITION FOR VALUE '2000-01-01' TO STDOUT
or maybe:
  COPY PARTITION FOR VALUE '2000-01-01' ON TABLE lineitems TO STDOUT
or:
  COPY [TABLE] lineitems PARTITION FOR VALUE '2000-01-01' TO STDOUT
  COPY [TABLE] lineitems PARTITION part_1, part_2, ... TO STDOUT
I think we should try to support operations on partitions via the main table wherever it is required. We could also allow explicitly naming a partition:
  COPY [TABLE] lineitems PARTITION lineitems_2001 TO STDOUT;
Thanks, Amit