Re: CLUSTER on partitioned index
On 2022-Apr-26, Michael Paquier wrote: > On Sat, Apr 16, 2022 at 08:58:50PM +0900, Michael Paquier wrote: > > Well, I am a bit annoyed that we don't actually check that a CLUSTER > > command does not block when doing a CLUSTER on a partitioned table > > while a lock is held on one of its partitions. So, attached is a > > proposal of patch to improve the test coverage in this area. > > This was the last reason why this was listed as an open item, so, > hearing nothing, I have applied this patch to add those extra tests, > and switched the item as fixed. Thank you! -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
Re: CLUSTER on partitioned index
On Sat, Apr 16, 2022 at 08:58:50PM +0900, Michael Paquier wrote: > Well, I am a bit annoyed that we don't actually check that a CLUSTER > command does not block when doing a CLUSTER on a partitioned table > while a lock is held on one of its partitions. So, attached is a > proposal of patch to improve the test coverage in this area. While on > it, I have added a test with a normal table. You can see the > difference once you remove the ACL check added recently in > get_tables_to_cluster_partitioned(). What do you think? This was the last reason why this was listed as an open item, so, hearing nothing, I have applied this patch to add those extra tests, and switched the item as fixed. -- Michael signature.asc Description: PGP signature
Re: CLUSTER on partitioned index
On Thu, Apr 14, 2022 at 10:37:06PM +0200, Alvaro Herrera wrote: > Thanks for the patch -- I have pushed it now, with some wording changes > and renaming the role to regress_* to avoid buildfarm's ire. Cool, thanks. > Michaël in addition proposes an isolation test. I'm not sure; is it > worth the additional test run time? It doesn't seem a critical issue. > But if anybody feels like contributing one, step right ahead. Well, I am a bit annoyed that we don't actually check that a CLUSTER command does not block when doing a CLUSTER on a partitioned table while a lock is held on one of its partitions. So, attached is a proposal of patch to improve the test coverage in this area. While on it, I have added a test with a normal table. You can see the difference once you remove the ACL check added recently in get_tables_to_cluster_partitioned(). What do you think? -- Michael From fdf5347bf4853f19341a8d67a655cae9fece15f0 Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Sat, 16 Apr 2022 20:52:03 +0900 Subject: [PATCH] Add isolation tests for CLUSTER with partitions --- .../expected/cluster-conflict-partition.out | 35 ++ .../isolation/expected/cluster-conflict.out | 19 ++ src/test/isolation/isolation_schedule | 2 + .../specs/cluster-conflict-partition.spec | 37 +++ .../isolation/specs/cluster-conflict.spec | 30 +++ 5 files changed, 123 insertions(+) create mode 100644 src/test/isolation/expected/cluster-conflict-partition.out create mode 100644 src/test/isolation/expected/cluster-conflict.out create mode 100644 src/test/isolation/specs/cluster-conflict-partition.spec create mode 100644 src/test/isolation/specs/cluster-conflict.spec diff --git a/src/test/isolation/expected/cluster-conflict-partition.out b/src/test/isolation/expected/cluster-conflict-partition.out new file mode 100644 index 00..7acb675c97 --- /dev/null +++ b/src/test/isolation/expected/cluster-conflict-partition.out @@ -0,0 +1,35 @@ +Parsed test spec with 2 sessions + +starting permutation: s1_begin s1_lock_parent s2_auth s2_cluster s1_commit s2_reset +step s1_begin: BEGIN; +step s1_lock_parent: LOCK cluster_part_tab IN SHARE UPDATE EXCLUSIVE MODE; +step s2_auth: SET ROLE regress_cluster_part; +step s2_cluster: CLUSTER cluster_part_tab USING cluster_part_ind; +step s1_commit: COMMIT; +step s2_cluster: <... completed> +step s2_reset: RESET ROLE; + +starting permutation: s1_begin s2_auth s1_lock_parent s2_cluster s1_commit s2_reset +step s1_begin: BEGIN; +step s2_auth: SET ROLE regress_cluster_part; +step s1_lock_parent: LOCK cluster_part_tab IN SHARE UPDATE EXCLUSIVE MODE; +step s2_cluster: CLUSTER cluster_part_tab USING cluster_part_ind; +step s1_commit: COMMIT; +step s2_cluster: <... completed> +step s2_reset: RESET ROLE; + +starting permutation: s1_begin s1_lock_child s2_auth s2_cluster s1_commit s2_reset +step s1_begin: BEGIN; +step s1_lock_child: LOCK cluster_part_tab1 IN SHARE UPDATE EXCLUSIVE MODE; +step s2_auth: SET ROLE regress_cluster_part; +step s2_cluster: CLUSTER cluster_part_tab USING cluster_part_ind; +step s1_commit: COMMIT; +step s2_reset: RESET ROLE; + +starting permutation: s1_begin s2_auth s1_lock_child s2_cluster s1_commit s2_reset +step s1_begin: BEGIN; +step s2_auth: SET ROLE regress_cluster_part; +step s1_lock_child: LOCK cluster_part_tab1 IN SHARE UPDATE EXCLUSIVE MODE; +step s2_cluster: CLUSTER cluster_part_tab USING cluster_part_ind; +step s1_commit: COMMIT; +step s2_reset: RESET ROLE; diff --git a/src/test/isolation/expected/cluster-conflict.out b/src/test/isolation/expected/cluster-conflict.out new file mode 100644 index 00..614d8f9d15 --- /dev/null +++ b/src/test/isolation/expected/cluster-conflict.out @@ -0,0 +1,19 @@ +Parsed test spec with 2 sessions + +starting permutation: s1_begin s1_lock s2_auth s2_cluster s1_commit s2_reset +step s1_begin: BEGIN; +step s1_lock: LOCK cluster_tab IN SHARE UPDATE EXCLUSIVE MODE; +step s2_auth: SET ROLE regress_cluster_conflict; +step s2_cluster: CLUSTER cluster_tab USING cluster_ind; +step s1_commit: COMMIT; +step s2_cluster: <... completed> +step s2_reset: RESET ROLE; + +starting permutation: s1_begin s2_auth s1_lock s2_cluster s1_commit s2_reset +step s1_begin: BEGIN; +step s2_auth: SET ROLE regress_cluster_conflict; +step s1_lock: LOCK cluster_tab IN SHARE UPDATE EXCLUSIVE MODE; +step s2_cluster: CLUSTER cluster_tab USING cluster_ind; +step s1_commit: COMMIT; +step s2_cluster: <... completed> +step s2_reset: RESET ROLE; diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule index 607760386e..529a4cbd4d 100644 --- a/src/test/isolation/isolation_schedule +++ b/src/test/isolation/isolation_schedule @@ -102,6 +102,8 @@ test: partition-key-update-2 test: partition-key-update-3 test: partition-key-update-4 test: plpgsql-toast +test: cluster-conflict +test: cluster-conflict-partition test: truncate-conflict test: serializable-parallel
Re: CLUSTER on partitioned index
Thanks for the patch -- I have pushed it now, with some wording changes and renaming the role to regress_* to avoid buildfarm's ire. Michaël in addition proposes an isolation test. I'm not sure; is it worth the additional test run time? It doesn't seem a critical issue. But if anybody feels like contributing one, step right ahead. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Re: CLUSTER on partitioned index
On Wed, Apr 13, 2022 at 05:52:14AM -0500, Justin Pryzby wrote: > Are you sure? The ownership re-check in cluster_rel() occurs after acquiring > locks. Yep, you are right. However, the SQL test does not check for this blocking scenario. In short, removing the new ACL check in get_tables_to_cluster_partitioned() makes the test behave the same way. Could you implement an isolation check to make sure that the difference is visible? The SQL check looks useful in itself, either way. -- Michael signature.asc Description: PGP signature
Re: CLUSTER on partitioned index
On Wed, Apr 13, 2022 at 03:50:15PM +0900, Michael Paquier wrote: > > > That dates to a556549d7 (see also cbe24a6dd8 for an earlier commit in > > CLUSTER > > itself). The reason was to avoid blocking if an unprivileged user runs > > VACUUM > > FULL which would try to lock things (including shared catalogs) before > > checking > > if they have permission to vacuum them. That commit also initially checks > > the > > owner of the partitioned table, and then re-checking owner of partitions > > later > > on. > > > > A similar issue exists here. But 1) catalog tables are not partitioned, > > and, > > 2) ownership of a partitioned table is checked immediately. So the problem > > can > > only occur if a user who owns a partitioned table but doesn't own all its > > partitions tries to cluster it, and it blocks behind another session. > > Fixing > > this is probably a good idea, but seems improbable that it would avoid a > > DOS. > > Catalogs are out of the picture as you say and I would not worry about > them becoming somewhat partitioned even in the far future. Are you > saying that it is possible for a user kicking a CLUSTER command on a > partitioned table who has no ownership on some of the partitions to > do some blocking table_open() calls if the permission check is not > done in get_tables_to_cluster_partitioned()? Hence, this user could > block the access to such partitions? I am not sure that we need to > add any new ownership checks here as CLUOPT_RECHECK gets added to the > parameters in cluster() before calling cluster_multiple_rels(), then > we do a mix of try_relation_open() with a skip when we are not the > owner anymore. So this logic looks sound to me. In short, you don't > need this extra check, and the test proposed in 0002 keeps the same > behavior. Are you sure? The ownership re-check in cluster_rel() occurs after acquiring locks. s1: postgres=# CREATE TABLE p(i int) PARTITION BY LIST (i); postgres=# CREATE TABLE p1 PARTITION OF p FOR VALUES IN (1); postgres=# CREATE TABLE p2 PARTITION OF p FOR VALUES IN (2); postgres=# CREATE INDEX ON p (i); postgres=# CREATE ROLE po WITH LOGIN; postgres=# ALTER TABLE p OWNER TO po; postgres=# begin; SELECT FROM p1; s2: postgres=> SET client_min_messages =debug; postgres=> CLUSTER VERBOSE p USING p_i_idx ; LOG: process 26058 still waiting for AccessExclusiveLock on relation 39577 of database 5 after 1000.105 ms postgres=> SELECT 39577::regclass; regclass | p1
Re: CLUSTER on partitioned index
On Mon, Apr 11, 2022 at 09:06:09AM -0500, Justin Pryzby wrote: > On Sat, Apr 02, 2022 at 07:21:11PM +0200, Alvaro Herrera wrote: >> 1. in VACUUM FULL we only process partitions that are owned by the >> invoking user. We don't have this test in the new code. I'm not sure >> why do we do that there; is it worth doing the same here? I think that adding a test is a good idea for such things. Perhaps we could have an isolation test, but what Justin is proposing seems good enough to me for this goal. > That dates to a556549d7 (see also cbe24a6dd8 for an earlier commit in CLUSTER > itself). The reason was to avoid blocking if an unprivileged user runs VACUUM > FULL which would try to lock things (including shared catalogs) before > checking > if they have permission to vacuum them. That commit also initially checks the > owner of the partitioned table, and then re-checking owner of partitions later > on. > > A similar issue exists here. But 1) catalog tables are not partitioned, and, > 2) ownership of a partitioned table is checked immediately. So the problem > can > only occur if a user who owns a partitioned table but doesn't own all its > partitions tries to cluster it, and it blocks behind another session. Fixing > this is probably a good idea, but seems improbable that it would avoid a DOS. Catalogs are out of the picture as you say and I would not worry about them becoming somewhat partitioned even in the far future. Are you saying that it is possible for a user kicking a CLUSTER command on a partitioned table who has no ownership on some of the partitions to do some blocking table_open() calls if the permission check is not done in get_tables_to_cluster_partitioned()? Hence, this user could block the access to such partitions? I am not sure that we need to add any new ownership checks here as CLUOPT_RECHECK gets added to the parameters in cluster() before calling cluster_multiple_rels(), then we do a mix of try_relation_open() with a skip when we are not the owner anymore. So this logic looks sound to me. In short, you don't need this extra check, and the test proposed in 0002 keeps the same behavior. >> 2. We should silently skip a partition that's a foreign table, I >> suppose. > > I think it's not needed, since the loop over index children doesn't see a > child > index on the foreign table? Hmm. That may be a sign to add an assertion, at least, or something based on RELKIND_HAS_STORAGE(). I was wondering what 0001 was doing here as that's a separate issue, but it looked fine so I have applied it. + /* Use a permanent memory context for the result list */ + old_context = MemoryContextSwitchTo(cluster_context); + rtc = (RelToCluster *) palloc(sizeof(RelToCluster)); Independently of the extra ownership check, the memory context manipulation has to be fixed and the code shoudl switch to RelToCluster only when saving an item. +CREATE ROLE ptnowner; Roles that are created in the regression tests need to be prefixed with "regress_", or some buildfarm members will complain. FWIW, I enforce -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS in all my dev builds. I have added an open item for now, but the whole looks straight-forward to me. -- Michael signature.asc Description: PGP signature
Re: CLUSTER on partitioned index
On Mon, Apr 11, 2022 at 7:06 AM Justin Pryzby wrote: > On Sat, Apr 02, 2022 at 07:21:11PM +0200, Alvaro Herrera wrote: > > Small things here. > > > 1. in VACUUM FULL we only process partitions that are owned by the > > invoking user. We don't have this test in the new code. I'm not sure > > why do we do that there; is it worth doing the same here? > > That dates to a556549d7 (see also cbe24a6dd8 for an earlier commit in > CLUSTER > itself). The reason was to avoid blocking if an unprivileged user runs > VACUUM > FULL which would try to lock things (including shared catalogs) before > checking > if they have permission to vacuum them. That commit also initially checks > the > owner of the partitioned table, and then re-checking owner of partitions > later > on. > > A similar issue exists here. But 1) catalog tables are not partitioned, > and, > 2) ownership of a partitioned table is checked immediately. So the > problem can > only occur if a user who owns a partitioned table but doesn't own all its > partitions tries to cluster it, and it blocks behind another session. > Fixing > this is probably a good idea, but seems improbable that it would avoid a > DOS. > > > 2. We should silently skip a partition that's a foreign table, I > > suppose. > > I think it's not needed, since the loop over index children doesn't see a > child > index on the foreign table. ? > > > 3. We do mark the index on the partitions as indisclustered AFAICS (we > > claim that the partitioned table's index is not marked, which is > > accurate). So users doing unadorned CLUSTER afterwards will get the > > partitions clustered too, once they cluster the partitioned table. If > > they don't want this, they would have to ALTER TABLE to remove the > > marking. How likely is that this will be a problem? Maybe documenting > > this point is enough. > > It seems at least as likely that someone would *want* the partitions to be > marked clustered as that someone would want them to be unchanged. > > The cluster mark accurately reflects having been clustered. It seems > unlikely > that a user would want something else to be clustered later by "cluster;". > Since clustering on a partitioned table wasn't supported before, nothing > weird > will happen to someone who upgrades to v15 unless they elect to use the new > feature. As this seems to be POLA, it doesn't even need to be > documented. ? > Hi, For v13-0002-cluster-early-ownership-check-of-partitions.patch : only for it to fails ownership check anyway to fails -> to fail Cheers
Re: CLUSTER on partitioned index
On Sat, Apr 02, 2022 at 07:21:11PM +0200, Alvaro Herrera wrote: > Small things here. > 1. in VACUUM FULL we only process partitions that are owned by the > invoking user. We don't have this test in the new code. I'm not sure > why do we do that there; is it worth doing the same here? That dates to a556549d7 (see also cbe24a6dd8 for an earlier commit in CLUSTER itself). The reason was to avoid blocking if an unprivileged user runs VACUUM FULL which would try to lock things (including shared catalogs) before checking if they have permission to vacuum them. That commit also initially checks the owner of the partitioned table, and then re-checking owner of partitions later on. A similar issue exists here. But 1) catalog tables are not partitioned, and, 2) ownership of a partitioned table is checked immediately. So the problem can only occur if a user who owns a partitioned table but doesn't own all its partitions tries to cluster it, and it blocks behind another session. Fixing this is probably a good idea, but seems improbable that it would avoid a DOS. > 2. We should silently skip a partition that's a foreign table, I > suppose. I think it's not needed, since the loop over index children doesn't see a child index on the foreign table. ? > 3. We do mark the index on the partitions as indisclustered AFAICS (we > claim that the partitioned table's index is not marked, which is > accurate). So users doing unadorned CLUSTER afterwards will get the > partitions clustered too, once they cluster the partitioned table. If > they don't want this, they would have to ALTER TABLE to remove the > marking. How likely is that this will be a problem? Maybe documenting > this point is enough. It seems at least as likely that someone would *want* the partitions to be marked clustered as that someone would want them to be unchanged. The cluster mark accurately reflects having been clustered. It seems unlikely that a user would want something else to be clustered later by "cluster;". Since clustering on a partitioned table wasn't supported before, nothing weird will happen to someone who upgrades to v15 unless they elect to use the new feature. As this seems to be POLA, it doesn't even need to be documented. ? >From 19c02209dfbcbf73494e1e6d3ca0db50f64dc5fd Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Fri, 24 Sep 2021 14:18:52 -0500 Subject: [PATCH v13 1/2] Recheck arg is not needed since 2011 It was added at: a4dde3bff36dac1ac0b699becad6f103d861a874 And not used since: 7e2f906201c8bb95f7fb17e56b8740c38bda5441 --- src/backend/commands/cluster.c | 6 +++--- src/backend/commands/tablecmds.c | 2 +- src/include/commands/cluster.h | 2 +- 3 files changed, 5 insertions(+), 5 deletions(-) diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c index 322d6bb2f18..0f0a6e9f018 100644 --- a/src/backend/commands/cluster.c +++ b/src/backend/commands/cluster.c @@ -232,7 +232,7 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel) if (rel != NULL) { Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE); - check_index_is_clusterable(rel, indexOid, true, AccessShareLock); + check_index_is_clusterable(rel, indexOid, AccessShareLock); rtcs = get_tables_to_cluster_partitioned(cluster_context, indexOid); /* close relation, releasing lock on parent table */ @@ -434,7 +434,7 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params) /* Check heap and index are valid to cluster on */ if (OidIsValid(indexOid)) - check_index_is_clusterable(OldHeap, indexOid, recheck, AccessExclusiveLock); + check_index_is_clusterable(OldHeap, indexOid, AccessExclusiveLock); /* * Quietly ignore the request if this is a materialized view which has not @@ -480,7 +480,7 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params) * protection here. */ void -check_index_is_clusterable(Relation OldHeap, Oid indexOid, bool recheck, LOCKMODE lockmode) +check_index_is_clusterable(Relation OldHeap, Oid indexOid, LOCKMODE lockmode) { Relation OldIndex; diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 90edd0bb97d..1d7db41d172 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -14055,7 +14055,7 @@ ATExecClusterOn(Relation rel, const char *indexName, LOCKMODE lockmode) indexName, RelationGetRelationName(rel; /* Check index is valid to cluster on */ - check_index_is_clusterable(rel, indexOid, false, lockmode); + check_index_is_clusterable(rel, indexOid, lockmode); /* And do the work */ mark_index_clustered(rel, indexOid, false); diff --git a/src/include/commands/cluster.h b/src/include/commands/cluster.h index 3c279f6210a..df8e73af409 100644 --- a/src/include/commands/cluster.h +++ b/src/include/commands/cluster.h @@ -34,7 +34,7 @@ typedef struct ClusterParams extern void cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel); extern void cluster_rel(Oid
Re: CLUSTER on partitioned index
On Sat, Apr 02, 2022 at 07:11:47PM +0200, Alvaro Herrera wrote: > Thanks, pushed. Thank you for revisiting it, and thanks to Zhihong Yu for earlier review. I'll look into your outstanding questions later this week. -- Justin
Re: CLUSTER on partitioned index
Small things here. 1. in VACUUM FULL we only process partitions that are owned by the invoking user. We don't have this test in the new code. I'm not sure why do we do that there; is it worth doing the same here? 2. We should silently skip a partition that's a foreign table, I suppose. 3. We do mark the index on the partitions as indisclustered AFAICS (we claim that the partitioned table's index is not marked, which is accurate). So users doing unadorned CLUSTER afterwards will get the partitions clustered too, once they cluster the partitioned table. If they don't want this, they would have to ALTER TABLE to remove the marking. How likely is that this will be a problem? Maybe documenting this point is enough. -- Álvaro HerreraBreisgau, Deutschland — https://www.EnterpriseDB.com/ Voy a acabar con todos los humanos / con los humanos yo acabaré voy a acabar con todos (bis) / con todos los humanos acabaré ¡acabaré! (Bender)
Re: CLUSTER on partitioned index
Thanks, pushed. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "La fuerza no está en los medios físicos sino que reside en una voluntad indomable" (Gandhi)
Re: CLUSTER on partitioned index
On Thu, Mar 31, 2022 at 12:54:36PM +0200, Alvaro Herrera wrote: > I realized after posting that we used to allow clustering toast tables, > but after my changes we no longer do. (Justin's version had a > RELKIND_HAS_STORAGE test here instead, which seemed a little too lax.) I > don't know why we allowed it and I don't know of anyone who has ever > used that feature and we don't have any test coverage for it, but I > don't have any reason to explicitly disallow it either. So I propose to > continue to allow it: Good catch. My daily vacuum script would've discovered that they're no longer supported, as it tests for (among other things) c.relkind IN ('r','t'). That clusters tables that have an indisclustered set and vacuums various others. (BTW, it's the same script that discovered in 2019 that clustering on expressional indexes had been broken by the heapam changes). I think the response should be to add a test case, which could be 0001 or 00099.
Re: CLUSTER on partitioned index
On Wed, Mar 30, 2022 at 10:51:43PM +0200, Alvaro Herrera wrote: > On 2022-Feb-23, Justin Pryzby wrote: > > > I hope that Alvaro will comment on the simplified patches. If multiple > > people > > think the patch isn't worth it, feel free to close it. But I don't see how > > complexity could be the reason. > > I gave your patch a look and it seems a reasonable thing to do. Maybe > not terribly useful in most cases, but there may be some cases for which > it is. I found some part of it a bit repetitive, so I moved things > around a bit. What do think about this? Thanks for looking at it. The changes to finish_heap_swap() and get_tables_to_cluster() are superfluous, right ? I think this comment is worth preserving (it'd be okay if it lived in the commit message). -* Expand partitioned relations for CLUSTER (the corresponding -* thing for VACUUM FULL happens in and around expand_vacuum_rel() + if (rel != NULL) In this case, maybe it should Assert() that it's relkind=p (mostly for purposes of self-documentation). +partition of the specified partitioned index (which must be specified). This is my own language, but now seems repetitive. I think the parenthetical part should be a separate sentance: "For partitioned indexes, the index may not be omitted.". Otherwise looks ok. diff --git a/doc/src/sgml/ref/cluster.sgml b/doc/src/sgml/ref/cluster.sgml index b3463ae5c46..fbc090cd0b0 100644 --- a/doc/src/sgml/ref/cluster.sgml +++ b/doc/src/sgml/ref/cluster.sgml @@ -199,7 +199,8 @@ CLUSTER [VERBOSE] Clustering a partitioned table clusters each of its partitions using the -partition of the specified partitioned index (which must be specified). +partition of the specified partitioned index. When clustering a +partitioned table, the index may not be omitted. diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c index 8417cbdb67f..412147f05bc 100644 --- a/src/backend/commands/cluster.c +++ b/src/backend/commands/cluster.c @@ -231,6 +231,7 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel) params.options |= CLUOPT_RECHECK; if (rel != NULL) { + Assert (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE); check_index_is_clusterable(rel, indexOid, true, AccessShareLock); rtcs = get_tables_to_cluster_partitioned(cluster_context, indexOid); @@ -451,6 +452,7 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params) } Assert(OldHeap->rd_rel->relkind == RELKIND_RELATION || + OldHeap->rd_rel->relkind == RELKIND_TOASTVALUE || OldHeap->rd_rel->relkind == RELKIND_MATVIEW); /* diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out index 3f2758d13f6..6cf18c8d321 100644 --- a/src/test/regress/expected/cluster.out +++ b/src/test/regress/expected/cluster.out @@ -305,6 +305,8 @@ WHERE pg_class.oid=indexrelid - (0 rows) +-- Verify that toast is clusterable +CLUSTER pg_toast.pg_toast_826 USING pg_toast_826_index; -- Verify that clustering all tables does in fact cluster the right ones CREATE USER regress_clstr_user; CREATE TABLE clstr_1 (a INT PRIMARY KEY); diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql index 74118993a82..ae27c35f65d 100644 --- a/src/test/regress/sql/cluster.sql +++ b/src/test/regress/sql/cluster.sql @@ -104,6 +104,9 @@ WHERE pg_class.oid=indexrelid AND pg_class_2.relname = 'clstr_tst' AND indisclustered; +-- Verify that toast is clusterable +CLUSTER pg_toast.pg_toast_826 USING pg_toast_826_index; + -- Verify that clustering all tables does in fact cluster the right ones CREATE USER regress_clstr_user; CREATE TABLE clstr_1 (a INT PRIMARY KEY);
Re: CLUSTER on partitioned index
On Thu, Mar 31, 2022 at 6:54 AM Alvaro Herrera wrote: > I realized after posting that we used to allow clustering toast tables, > but after my changes we no longer do. (Justin's version had a > RELKIND_HAS_STORAGE test here instead, which seemed a little too lax.) I > don't know why we allowed it and I don't know of anyone who has ever > used that feature and we don't have any test coverage for it, but I > don't have any reason to explicitly disallow it either. So I propose to > continue to allow it: I think that's probably a good decision. It's certainly useful to have a way to force a rewrite of a TOAST table, although a lot of people who would benefit from that operation probably don't know that they need it, or don't know that they need just that, and end up rewriting both the main table and the TOAST table. Whether it's useful to be able to run CLUSTER specifically rather than VACUUM FULL on the TOAST table is less clear, but I don't think we're likely to save anything by forbidding it. Maybe we should consider adding a test, though. -- Robert Haas EDB: http://www.enterprisedb.com
Re: CLUSTER on partitioned index
I realized after posting that we used to allow clustering toast tables, but after my changes we no longer do. (Justin's version had a RELKIND_HAS_STORAGE test here instead, which seemed a little too lax.) I don't know why we allowed it and I don't know of anyone who has ever used that feature and we don't have any test coverage for it, but I don't have any reason to explicitly disallow it either. So I propose to continue to allow it: >From 05ba6124422fb7c2fd19575e905e444ba3eef1e5 Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Thu, 31 Mar 2022 12:49:57 +0200 Subject: [PATCH] allow to cluster toast tables --- src/backend/commands/cluster.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c index 8417cbdb67..b391d7c434 100644 --- a/src/backend/commands/cluster.c +++ b/src/backend/commands/cluster.c @@ -451,7 +451,8 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params) } Assert(OldHeap->rd_rel->relkind == RELKIND_RELATION || - OldHeap->rd_rel->relkind == RELKIND_MATVIEW); + OldHeap->rd_rel->relkind == RELKIND_MATVIEW || + OldHeap->rd_rel->relkind == RELKIND_TOASTVALUE); /* * All predicate locks on the tuples or pages are about to be made -- 2.30.2 -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "Java is clearly an example of money oriented programming" (A. Stepanov)
Re: CLUSTER on partitioned index
On 2022-Feb-23, Justin Pryzby wrote: > I hope that Alvaro will comment on the simplified patches. If multiple people > think the patch isn't worth it, feel free to close it. But I don't see how > complexity could be the reason. I gave your patch a look and it seems a reasonable thing to do. Maybe not terribly useful in most cases, but there may be some cases for which it is. I found some part of it a bit repetitive, so I moved things around a bit. What do think about this? -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "After a quick R of TFM, all I can say is HOLY CR** THAT IS COOL! PostgreSQL was amazing when I first started using it at 7.2, and I'm continually astounded by learning new features and techniques made available by the continuing work of the development team." Berend Tober, http://archives.postgresql.org/pgsql-hackers/2007-08/msg01009.php diff --git a/doc/src/sgml/ref/cluster.sgml b/doc/src/sgml/ref/cluster.sgml index 86f5fdc469..b3463ae5c4 100644 --- a/doc/src/sgml/ref/cluster.sgml +++ b/doc/src/sgml/ref/cluster.sgml @@ -196,6 +196,12 @@ CLUSTER [VERBOSE] in the pg_stat_progress_cluster view. See for details. + + +Clustering a partitioned table clusters each of its partitions using the +partition of the specified partitioned index (which must be specified). + + diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c index 02a7e94bf9..8417cbdb67 100644 --- a/src/backend/commands/cluster.c +++ b/src/backend/commands/cluster.c @@ -32,7 +32,9 @@ #include "catalog/index.h" #include "catalog/namespace.h" #include "catalog/objectaccess.h" +#include "catalog/partition.h" #include "catalog/pg_am.h" +#include "catalog/pg_inherits.h" #include "catalog/toasting.h" #include "commands/cluster.h" #include "commands/defrem.h" @@ -73,6 +75,9 @@ static void copy_table_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex, bool verbose, bool *pSwapToastByContent, TransactionId *pFreezeXid, MultiXactId *pCutoffMulti); static List *get_tables_to_cluster(MemoryContext cluster_context); +static List *get_tables_to_cluster_partitioned(MemoryContext cluster_context, + Oid indexOid); +static void cluster_multiple_rels(List *rvs, ClusterParams *params); /*--- @@ -105,6 +110,10 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel) ListCell *lc; ClusterParams params = {0}; bool verbose = false; + Relation rel = NULL; + Oid indexOid = InvalidOid; + MemoryContext cluster_context; + List *rtcs; /* Parse option list */ foreach(lc, stmt->params) @@ -126,11 +135,13 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel) if (stmt->relation != NULL) { /* This is the single-relation case. */ - Oid tableOid, - indexOid = InvalidOid; - Relation rel; + Oid tableOid; - /* Find, lock, and check permissions on the table */ + /* + * Find, lock, and check permissions on the table. We obtain + * AccessExclusiveLock right away to avoid lock-upgrade hazard in the + * single-transaction case. + */ tableOid = RangeVarGetRelidExtended(stmt->relation, AccessExclusiveLock, 0, @@ -146,14 +157,6 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel) (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("cannot cluster temporary tables of other sessions"))); - /* - * Reject clustering a partitioned table. - */ - if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) - ereport(ERROR, - (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), - errmsg("cannot cluster a partitioned table"))); - if (stmt->indexname == NULL) { ListCell *index; @@ -188,71 +191,100 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel) stmt->indexname, stmt->relation->relname))); } - /* close relation, keep lock till commit */ - table_close(rel, NoLock); + if (rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE) + { + /* close relation, keep lock till commit */ + table_close(rel, NoLock); - /* Do the job. */ - cluster_rel(tableOid, indexOid, ¶ms); + /* Do the job. */ + cluster_rel(tableOid, indexOid, ¶ms); + + return; + } + } + + /* + * By here, we know we are in a multi-table situation. In order to avoid + * holding locks for too long, we want to process each table in its own + * transaction. This forces us to disallow running inside a user + * transaction block. + */ + PreventInTransactionBlock(isTopLevel, "CLUSTER"); + + /* Also, we need a memory context to hold our list of relations */ + cluster_context = AllocSetContextCreate(PortalContext, + "Cluster", + ALLOCSET_DEFAULT_SIZES); + + /* + * Either we're processing a partitioned table, or we were not given any + * table name at all. In either case, obtain a list of relations to + * process. + * + *
Re: CLUSTER on partitioned index
On Tue, Jul 20, 2021 at 08:27:02PM -0400, Alvaro Herrera wrote: > I have to wonder if there really *is* a use case for CLUSTER in the > first place on regular tables, let alone on partitioned tables, which > are likely to be large and thus take a lot of time. What justifies > spending so much time on this implementation? My impression is that > CLUSTER is pretty much a fringe command nowadays, because of the access > exclusive lock required. > > Does anybody actually use it? I hope that Alvaro will comment on the simplified patches. If multiple people think the patch isn't worth it, feel free to close it. But I don't see how complexity could be the reason. -- Justin
Re: CLUSTER on partitioned index
On Fri, Dec 03, 2021 at 10:16:24AM +0900, Michael Paquier wrote: > On Thu, Sep 23, 2021 at 06:56:26PM -0500, Justin Pryzby wrote: > > On Thu, Sep 23, 2021 at 08:18:41PM +0200, Matthias van de Meent wrote: > >> Note: The following review is based on the assumption that this v11 > >> revision was meant to contain only one patch. I put this up as a note, > >> because it seemed quite limited when compared to earlier versions of > >> the patchset. > > > > Alvaro's critique was that the patchset was too complicated for what was > > claimed to be a marginal feature. My response was to rearrange the > > patchset to > > its minimal form, supporting CLUSTER without marking the index as clustered. > > > > My goal is to present a minimal patch and avoid any nonessential complexity. > > FWIW, my opinion on the matter is similar to Alvaro's, and an extra > read of the patch gives me the same impression. Let's see if others > have an opinion on the matter. You and Alvaro thought the patch was too complicated for its value, so I reduced the scope to its essential form. CLUSTER was claimed to be of marginal utility, since the table is locked. But locking one partition at a time would be less disruptive than locking an equivalent non-partitioned table. There's only about a dozen, other remaining restrictions/limitations on partitioned tables, (AMs, triggers, identity, generated, exclusion, CIC, FREEZE). Since it's supported to VACUUM (including VACUUM FULL) and REINDEX a partitioned table, I'm still suprised there's much hesitation to support CLUSTER (which is used by vacuum full). > > Thanks for looking. I'm going to see about updating comments based on > > corresponding parts of vacuum and on this message itself. > > It doesn't feel right to just discard the patch at this stage, and it > needs an update, so I have moved it to the next CF for now, waiting on > author. If this does not really move on, my suggestion is to discard > the patch at the end of next CF, aka 2022-01. This includes minor updates based on Mathias review (commit message and test case). -- Justin >From cbeb98a016a05738007c98ef0a3828bdedf83d24 Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Sun, 7 Jun 2020 16:58:42 -0500 Subject: [PATCH v11] Implement CLUSTER of partitioned table.. VACUUM (including vacuum full) has recursed into partitioned tables since partitioning were introduced in v10 (3c3bb9933). See expand_vacuum_rel(). For VACUUM FULL, vacuum_rel() calls cluster_rel() for each partition. This patch is to make cluster() do all the same stuff before calling cluster_rel(). For now, partitioned indexes cannot be marked clustered, so clustering requires specification of a partitioned index on the partitioned table. See also a556549d7 and 19de0ab23. --- doc/src/sgml/ref/cluster.sgml | 6 + src/backend/commands/cluster.c| 177 -- src/bin/psql/tab-complete.c | 1 + src/include/commands/cluster.h| 1 + src/test/regress/expected/cluster.out | 45 ++- src/test/regress/sql/cluster.sql | 21 ++- 6 files changed, 204 insertions(+), 47 deletions(-) diff --git a/doc/src/sgml/ref/cluster.sgml b/doc/src/sgml/ref/cluster.sgml index 86f5fdc469b..b3463ae5c46 100644 --- a/doc/src/sgml/ref/cluster.sgml +++ b/doc/src/sgml/ref/cluster.sgml @@ -196,6 +196,12 @@ CLUSTER [VERBOSE] in the pg_stat_progress_cluster view. See for details. + + +Clustering a partitioned table clusters each of its partitions using the +partition of the specified partitioned index (which must be specified). + + diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c index 61853e6dec4..196c7f7eeb2 100644 --- a/src/backend/commands/cluster.c +++ b/src/backend/commands/cluster.c @@ -32,7 +32,9 @@ #include "catalog/index.h" #include "catalog/namespace.h" #include "catalog/objectaccess.h" +#include "catalog/partition.h" #include "catalog/pg_am.h" +#include "catalog/pg_inherits.h" #include "catalog/toasting.h" #include "commands/cluster.h" #include "commands/defrem.h" @@ -73,6 +75,9 @@ static void copy_table_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex, bool verbose, bool *pSwapToastByContent, TransactionId *pFreezeXid, MultiXactId *pCutoffMulti); static List *get_tables_to_cluster(MemoryContext cluster_context); +static List *get_tables_to_cluster_partitioned(MemoryContext cluster_context, + Oid indexOid); +static void cluster_multiple_rels(List *rvs, int options); /*--- @@ -131,6 +136,7 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel) Relation rel; /* Find, lock, and check permissions on the table */ + /* Obtain AEL now to avoid lock-upgrade hazard in the single-transaction case */ tableOid = RangeVarGetRelidExtended(stmt->relation, AccessExclusiveLock, 0, @@ -146,14 +152,6 @@ clus
Re: CLUSTER on partitioned index
On Thu, Sep 23, 2021 at 06:56:26PM -0500, Justin Pryzby wrote: > On Thu, Sep 23, 2021 at 08:18:41PM +0200, Matthias van de Meent wrote: >> Note: The following review is based on the assumption that this v11 >> revision was meant to contain only one patch. I put this up as a note, >> because it seemed quite limited when compared to earlier versions of >> the patchset. > > Alvaro's critique was that the patchset was too complicated for what was > claimed to be a marginal feature. My response was to rearrange the patchset > to > its minimal form, supporting CLUSTER without marking the index as clustered. > > My goal is to present a minimal patch and avoid any nonessential complexity. FWIW, my opinion on the matter is similar to Alvaro's, and an extra read of the patch gives me the same impression. Let's see if others have an opinion on the matter. > Thanks for looking. I'm going to see about updating comments based on > corresponding parts of vacuum and on this message itself. It doesn't feel right to just discard the patch at this stage, and it needs an update, so I have moved it to the next CF for now, waiting on author. If this does not really move on, my suggestion is to discard the patch at the end of next CF, aka 2022-01. -- Michael signature.asc Description: PGP signature
Re: CLUSTER on partitioned index
On Thu, Sep 23, 2021 at 08:18:41PM +0200, Matthias van de Meent wrote: > On Sun, 12 Sept 2021 at 22:10, Justin Pryzby wrote: > > > > On Tue, Jul 20, 2021 at 08:27:02PM -0400, Alvaro Herrera wrote: > > > I have to wonder if there really *is* a use case for CLUSTER in the > > > first place on regular tables, let alone on partitioned tables, which > > > are likely to be large and thus take a lot of time. > > > > The cluster now is done one partition at a time, so it might take a long > > time, > > but doesn't lock the whole partition heirarchy. Same as VACUUM (since v10) > > and > > (since v14) REINDEX. > > Note: The following review is based on the assumption that this v11 > revision was meant to contain only one patch. I put this up as a note, > because it seemed quite limited when compared to earlier versions of > the patchset. Alvaro's critique was that the patchset was too complicated for what was claimed to be a marginal feature. My response was to rearrange the patchset to its minimal form, supporting CLUSTER without marking the index as clustered. > I noticed that you store the result of find_all_inheritors(..., > NoLock) in get_tables_to_cluster_partitioned, without taking care of > potential concurrent partition hierarchy changes that the comment on > find_all_inheritors warns against or documenting why it is safe, which > sounds dangerous in case someone wants to adapt the code. One problem > I can think of is that only storing reloid and indoid is not > necessarily safe, as they might be reused by drop+create table running > parallel to the CLUSTER command. The parallel code in vacuum is expand_vacuum_rel(), which is where the corresponding things happens for vacuum full. This patch is to make cluster() do all the same stuff before calling cluster_rel(). What VACUUM tries to do is to avoid erroring if a partition is dropped while cluster is running. cluster_rel() does the same thing by calling cluster_multiple_rels() ,which uses CLUOPT_RECHECK. If the OIDs wrapped around, I think existing vacuum could accidentally process a new table with the same OID as a dropped partition. I think cluster would *normally* catch that case and error in check_index_is_clusterable(): | Check that index is in fact an index on the given relation Arguably VACUUM FULL could call cluster() (not cluster_rel()) and pass the partitioned table rather than first expanding it. But non-full vacuum needs to expand partitioned tables anyway. > Apart from that, I think it would be useful (though not strictly > necessary for this patch) if you could adapt the current CLUSTER > progress reporting to report the progress for the whole partition > hierarchy, instead of a new progress report for each relation, as was > the case in earlier versions of the patchset. Yea, but this is already true for VACUUM FULL (which uses CLUSTER and supports partitioned tables since v10) and REINDEX. See also https://postgr.es/m/20210216064214.gi28...@telsasoft.com My goal is to present a minimal patch and avoid any nonessential complexity. > The v11 patch seems quite incomplete when compared to previous > discussions, or at the very least is very limited (no ALTER TABLE > clustering commands for partitioned tables, but `CLUSTER ptable USING > pindex` is supported). If v11 is the new proposed direction for ptable > clustering, could you also document these limitations in the > cluster.sgml and alter_table.sgml docs? You said it's less complete, but it's is due to deliberate reduction in scope. cluster.sgml says: +Clustering a partitioned table clusters each of its partitions using the +partition of the specified partitioned index (which must be specified). The ALTER restriction hasn't changed, so I didn't touch the documentation. I am still curious myself to know if this is the direction the patch should move. > > [ v11-0001-Implement-CLUSTER-of-partitioned-table.patch ] > > > diff --git a/src/test/regress/expected/cluster.out > > b/src/test/regress/expected/cluster.out > > ... > > +ALTER TABLE clstrpart SET WITHOUT CLUSTER; > > +ERROR: cannot mark index clustered in partitioned table > > This error message does not seem to match my expectation as a user: I > am not trying to mark an index as clustered, and for a normal table > "SET WITHOUT CLUSTER" does not fail for unclustered tables. I think > that behaviour of normal unclustered tables should be shared here as > well. At the very least, the error message should be changed. This is the pre-existing behavior. > > -DROP TABLE clstrpart; > > I believe that this cleanup should not be fully removed, but moved to > before '-- Test CLUSTER with external tuplesorting', as the table is > not used after that line. You're right - this was from when the patchset handled CLUSTER ON. Leaving the index allows testing in pg_dump - a large part of the complexity of the elided patches is to handle restoring a partitioned index, without violating the rule that partitions of an c
Re: CLUSTER on partitioned index
On Sun, 12 Sept 2021 at 22:10, Justin Pryzby wrote: > > On Tue, Jul 20, 2021 at 08:27:02PM -0400, Alvaro Herrera wrote: > > I have to wonder if there really *is* a use case for CLUSTER in the > > first place on regular tables, let alone on partitioned tables, which > > are likely to be large and thus take a lot of time. > > The cluster now is done one partition at a time, so it might take a long time, > but doesn't lock the whole partition heirarchy. Same as VACUUM (since v10) > and > (since v14) REINDEX. Note: The following review is based on the assumption that this v11 revision was meant to contain only one patch. I put this up as a note, because it seemed quite limited when compared to earlier versions of the patchset. I noticed that you store the result of find_all_inheritors(..., NoLock) in get_tables_to_cluster_partitioned, without taking care of potential concurrent partition hierarchy changes that the comment on find_all_inheritors warns against or documenting why it is safe, which sounds dangerous in case someone wants to adapt the code. One problem I can think of is that only storing reloid and indoid is not necessarily safe, as they might be reused by drop+create table running parallel to the CLUSTER command. Apart from that, I think it would be useful (though not strictly necessary for this patch) if you could adapt the current CLUSTER progress reporting to report the progress for the whole partition hierarchy, instead of a new progress report for each relation, as was the case in earlier versions of the patchset. The v11 patch seems quite incomplete when compared to previous discussions, or at the very least is very limited (no ALTER TABLE clustering commands for partitioned tables, but `CLUSTER ptable USING pindex` is supported). If v11 is the new proposed direction for ptable clustering, could you also document these limitations in the cluster.sgml and alter_table.sgml docs? > [ v11-0001-Implement-CLUSTER-of-partitioned-table.patch ] > diff --git a/src/test/regress/expected/cluster.out > b/src/test/regress/expected/cluster.out > ... > +ALTER TABLE clstrpart SET WITHOUT CLUSTER; > +ERROR: cannot mark index clustered in partitioned table This error message does not seem to match my expectation as a user: I am not trying to mark an index as clustered, and for a normal table "SET WITHOUT CLUSTER" does not fail for unclustered tables. I think that behaviour of normal unclustered tables should be shared here as well. At the very least, the error message should be changed. > ALTER TABLE clstrpart CLUSTER ON clstrpart_idx; > ERROR: cannot mark index clustered in partitioned table A "HINT: use the CLUSTER command to cluster partitioned tables" (or equivalent) should be added if we decide to keep the clustering APIs of ALTER TABLE disabled for partitioned tables, as CLUSTER is now implemented for partitioned tables. > -DROP TABLE clstrpart; I believe that this cleanup should not be fully removed, but moved to before '-- Test CLUSTER with external tuplesorting', as the table is not used after that line. Kind regards, Matthias van de Meent
Re: CLUSTER on partitioned index
On Tue, Jul 20, 2021 at 08:27:02PM -0400, Alvaro Herrera wrote: > I have to wonder if there really *is* a use case for CLUSTER in the > first place on regular tables, let alone on partitioned tables, which > are likely to be large and thus take a lot of time. The cluster now is done one partition at a time, so it might take a long time, but doesn't lock the whole partition heirarchy. Same as VACUUM (since v10) and (since v14) REINDEX. The patch series would be simpler if partitioned indexes weren't allowed to be marked CLUSTERED ON. Then, "USING " would be required, which is a step forward from not supporting cluster on partitioned index at all. As attached. It's arguably true that the follow-up patches supporting indisclustered on partitioned indexes aren't worth the trouble. For sure CLUSTER is useful, see eg. https://github.com/bucardo/check_postgres/issues/29 It's sometimes important that the table is clustered to allow index scan to work well (or be chosen at all). If a table is scanned by an index, and isn't well-clustered, then a larger fraction (multiple) of the table will be read than what's optimal. That requires more IO, and more cache space. A year ago, I partitioned one of our previously-unpartitioned tables, and ended up clustering the partitions on their partition key (and indexed column) using \gexec. This was preferable to doing INSERT .. SELECT .. ORDER BY, which would've made the initial process slower - maybe unjustifiably slower for some customers. Cluster (using \gexec) was something I was able to do afterward, for completeness, since I expect the partitions to be mostly-clustered automatically, so it was bothering me that the existing data was unordered, and that it might behave differently in the future. > What justifies spending so much time on this implementation? Actually, I don't use partitioned indexes at all here, so this is not for us.. I worked on this after Adger asked about CIC on partitioned tables (for which I have a patch in the queue). Isn't it worth supporting that (or should we include an example about how to use format() with %I and \gexec) ? VACUUM [FULL] has recursed into partitions since v10 (f0e44751d). REINDEX supports partitioned tables in v14 (a6642b3ae). Partitioned indexes exist since v11 (as you well know), so it's somewhat odd that CLUSTER isn't supported, and seems increasingly weird as decreasing number of DDL commands are not supported. Supporting DDL on partitioned tables supports the idea that the physical partitions can be seen as an implementation detail by the DBA, which I understand was the intent since v10. You're right that I wouldn't plan to *routinely* re-cluster a partitioned table. Rather, I'd cluster only its "recent" *partitions*, and leave the old ones alone. Or cluster the partitions, a single time, once they're no longer recent. I don't think the feature is marginal just because I don't use it routinely. > My impression is that CLUSTER is pretty much a fringe command nowadays, > because of the access exclusive lock required. A step forward would be to integrate something like pg_repack/reorg/squeeze. I used pg_repack --index until v12 got REINDEX CONCURRENTLY. The goal there was to improve index scans on some large, append-only partitions where the planner gave an index scan, but performance was poor (now, we use BRIN so it works well without reindex). I tested that this would still be an issue by creating a non-brin index for a single day's table (even with v13 deduplication and v12 TID tiebreak). As I see it, support for partitioned cluster is orthogonal to an online/concurrent cluster, which is a job for another patch. -- Justin >From d2e7daf9c05cd3c20c60f5e35c0d6b2612d75d1b Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Sun, 7 Jun 2020 16:58:42 -0500 Subject: [PATCH v11] Implement CLUSTER of partitioned table.. For now, partitioned indexes cannot be marked clustered, so clustering requires specification of a partitioned index on the partitioned table. VACUUM (including vacuum full) has recursed into partitione tables since partitioning were introduced in v10 (3c3bb9933). See expand_vacuum_rel(). See also a556549d7 and 19de0ab23. --- doc/src/sgml/ref/cluster.sgml | 6 + src/backend/commands/cluster.c| 170 +++--- src/bin/psql/tab-complete.c | 1 + src/include/commands/cluster.h| 1 + src/test/regress/expected/cluster.out | 46 ++- src/test/regress/sql/cluster.sql | 22 +++- 6 files changed, 197 insertions(+), 49 deletions(-) diff --git a/doc/src/sgml/ref/cluster.sgml b/doc/src/sgml/ref/cluster.sgml index 86f5fdc469..b3463ae5c4 100644 --- a/doc/src/sgml/ref/cluster.sgml +++ b/doc/src/sgml/ref/cluster.sgml @@ -196,6 +196,12 @@ CLUSTER [VERBOSE] in the pg_stat_progress_cluster vie
Re: CLUSTER on partitioned index
On Tue, 2021-07-20 at 20:27 -0400, Alvaro Herrera wrote: > I have to wonder if there really *is* a use case for CLUSTER in the > first place on regular tables, let alone on partitioned tables, which > are likely to be large and thus take a lot of time. What justifies > spending so much time on this implementation? My impression is that > CLUSTER is pretty much a fringe command nowadays, because of the access > exclusive lock required. > > Does anybody actually use it? I see is used in the field occasionally, as it can really drastically improve performance. But I admit is is not frequently used. In a data warehouse, which is updated only occasionally, running CLUSTER after an update can make a lot of sense. I personally think that it is enough to be able to cluster the table partiton by partition. Yours, Laurenz Albe
Re: CLUSTER on partitioned index
On Tue, Jul 20, 2021 at 08:27:02PM -0400, Alvaro Herrera wrote: > I have to wonder if there really *is* a use case for CLUSTER in the > first place on regular tables, let alone on partitioned tables, which > are likely to be large and thus take a lot of time. What justifies > spending so much time on this implementation? My impression is that > CLUSTER is pretty much a fringe command nowadays, because of the access > exclusive lock required. > > Does anybody actually use it? Yeah, I am not getting really excited about doing anything here either. I thought for some time about the interactions with indisclustered and partitioned tables, but anything I could come up with felt clunky. -- Michael signature.asc Description: PGP signature
Re: CLUSTER on partitioned index
I have to wonder if there really *is* a use case for CLUSTER in the first place on regular tables, let alone on partitioned tables, which are likely to be large and thus take a lot of time. What justifies spending so much time on this implementation? My impression is that CLUSTER is pretty much a fringe command nowadays, because of the access exclusive lock required. Does anybody actually use it? -- Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/ "Ninguna manada de bestias tiene una voz tan horrible como la humana" (Orual)
Re: CLUSTER on partitioned index
Hi, For v10-0002-Implement-CLUSTER-of-partitioned-table.patch : or that an partitioned index was previously set clustered. 'an partitioned index' -> a partitioned index + * Return a List of tables and associated index, where each index is a associated index -> associated indices For cluster(): - rel = table_open(tableOid, NoLock); + rel = table_open(tableOid, ShareUpdateExclusiveLock); Considering the comment preceding cluster() (forced to acquire exclusive locks on all the tables), maybe add a comment explaining why it is safe to take ShareUpdateExclusiveLock. +cluster_multiple_rels(List *rvs, int options) I think the multiple in the method name is not needed since the relation is in plural. Cheers On Fri, Apr 2, 2021 at 1:03 PM Justin Pryzby wrote: > @cfbot: rebased >
Re: CLUSTER on partitioned index
@cfbot: rebased >From 686cd8fc644f1f86d81d7748b66feddd634c4dc8 Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Thu, 26 Nov 2020 14:37:08 -0600 Subject: [PATCH v10 1/8] pg_dump: make CLUSTER ON a separate dump object.. ..since it needs to be restored after any child indexes are restored *and attached*. The order needs to be: 1) restore child and parent index (order doesn't matter); 2) attach child index; 3) set cluster on child and parent index (order doesn't matter); --- src/bin/pg_dump/pg_dump.c | 86 ++ src/bin/pg_dump/pg_dump.h | 8 src/bin/pg_dump/pg_dump_sort.c | 8 3 files changed, 82 insertions(+), 20 deletions(-) diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index 25717ce0e6..9a3044fd8c 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -215,6 +215,7 @@ static void dumpSequence(Archive *fout, const TableInfo *tbinfo); static void dumpSequenceData(Archive *fout, const TableDataInfo *tdinfo); static void dumpIndex(Archive *fout, const IndxInfo *indxinfo); static void dumpIndexAttach(Archive *fout, const IndexAttachInfo *attachinfo); +static void dumpIndexClusterOn(Archive *fout, const IndexClusterInfo *clusterinfo); static void dumpStatisticsExt(Archive *fout, const StatsExtInfo *statsextinfo); static void dumpConstraint(Archive *fout, const ConstraintInfo *coninfo); static void dumpTableConstraintComment(Archive *fout, const ConstraintInfo *coninfo); @@ -7232,6 +7233,11 @@ getIndexes(Archive *fout, TableInfo tblinfo[], int numTables) i_inddependcollversions; int ntups; + int ncluster = 0; + IndexClusterInfo *clusterinfo; + clusterinfo = (IndexClusterInfo *) + pg_malloc0(numTables * sizeof(IndexClusterInfo)); + for (i = 0; i < numTables; i++) { TableInfo *tbinfo = &tblinfo[i]; @@ -7611,6 +7617,27 @@ getIndexes(Archive *fout, TableInfo tblinfo[], int numTables) /* Plain secondary index */ indxinfo[j].indexconstraint = 0; } + + /* Record each table's CLUSTERed index, if any */ + if (indxinfo[j].indisclustered) + { +IndxInfo *index = &indxinfo[j]; +IndexClusterInfo *cluster = &clusterinfo[ncluster]; + +cluster->dobj.objType = DO_INDEX_CLUSTER_ON; +cluster->dobj.catId.tableoid = 0; +cluster->dobj.catId.oid = 0; +AssignDumpId(&cluster->dobj); +cluster->dobj.name = pg_strdup(index->dobj.name); +cluster->dobj.namespace = index->indextable->dobj.namespace; +cluster->index = index; +cluster->indextable = &tblinfo[i]; + +/* The CLUSTER ON depends on its index.. */ +addObjectDependency(&cluster->dobj, index->dobj.dumpId); + +ncluster++; + } } PQclear(res); @@ -10472,6 +10499,9 @@ dumpDumpableObject(Archive *fout, const DumpableObject *dobj) case DO_SUBSCRIPTION: dumpSubscription(fout, (const SubscriptionInfo *) dobj); break; + case DO_INDEX_CLUSTER_ON: + dumpIndexClusterOn(fout, (IndexClusterInfo *) dobj); + break; case DO_PRE_DATA_BOUNDARY: case DO_POST_DATA_BOUNDARY: /* never dumped, nothing to do */ @@ -16721,6 +16751,41 @@ getAttrName(int attrnum, const TableInfo *tblInfo) return NULL;/* keep compiler quiet */ } +/* + * dumpIndexClusterOn + * record that the index is clustered. + */ +static void +dumpIndexClusterOn(Archive *fout, const IndexClusterInfo *clusterinfo) +{ + DumpOptions *dopt = fout->dopt; + TableInfo *tbinfo = clusterinfo->indextable; + char *qindxname; + PQExpBuffer q; + + if (dopt->dataOnly) + return; + + q = createPQExpBuffer(); + qindxname = pg_strdup(fmtId(clusterinfo->dobj.name)); + + /* index name is not qualified in this syntax */ + appendPQExpBuffer(q, "\nALTER TABLE %s CLUSTER ON %s;\n", + fmtQualifiedDumpable(tbinfo), qindxname); + + if (clusterinfo->dobj.dump & DUMP_COMPONENT_DEFINITION) + ArchiveEntry(fout, clusterinfo->dobj.catId, clusterinfo->dobj.dumpId, + ARCHIVE_OPTS(.tag = clusterinfo->dobj.name, + .namespace = tbinfo->dobj.namespace->dobj.name, + .owner = tbinfo->rolname, + .description = "INDEX CLUSTER ON", + .section = SECTION_POST_DATA, + .createStmt = q->data)); + + destroyPQExpBuffer(q); + free(qindxname); +} + /* * dumpIndex * write out to fout a user-defined index @@ -16775,16 +16840,6 @@ dumpIndex(Archive *fout, const IndxInfo *indxinfo) * similar code in dumpConstraint! */ - /* If the index is clustered, we need to record that. */ - if (indxinfo->indisclustered) - { - appendPQExpBuffer(q, "\nALTER TABLE %s CLUSTER", - fmtQualifiedDumpable(tbinfo)); - /* index name is not qualified in this syntax */ - appendPQExpBuffer(q, " ON %s;\n", - qindxname); - } - /* * If the index has any statistics on some of its columns, generate * the associated ALTER INDEX queries. @@ -17111,16 +17166,6 @@ dumpConstraint(Archive *fout, const ConstraintInfo *coninfo) * similar code in dumpIndex! */ - /* If the index i
Re: CLUSTER on partitioned index
On Wed, Feb 10, 2021 at 02:04:58PM -0600, Justin Pryzby wrote: > On Sat, Feb 06, 2021 at 08:45:49AM -0600, Justin Pryzby wrote: > > On Mon, Jan 18, 2021 at 12:34:59PM -0600, Justin Pryzby wrote: > > > On Sat, Nov 28, 2020 at 08:03:02PM -0600, Justin Pryzby wrote: > > > > On Sun, Nov 15, 2020 at 07:53:35PM -0600, Justin Pryzby wrote: > > > > > On Wed, Nov 04, 2020 at 08:23:56PM -0600, Justin Pryzby wrote: > > > > > > On Tue, Oct 27, 2020 at 07:33:12PM -0500, Justin Pryzby wrote: > > > > > > > I'm attaching a counter-proposal to your catalog change, which > > > > > > > preserves > > > > > > > indisclustered on children of clustered, partitioned indexes, and > > > > > > > invalidates > > > > > > > indisclustered when attaching unclustered indexes. > > > > > > > > > > > > ..and now propagates CLUSTER ON to child indexes. > > > > > > > > > > > > I left this as separate patches to show what I mean and what's new > > > > > > while we > > > > > > discuss it. > > > > > > > > > > This fixes some omissions in the previous patch and error in its test > > > > > cases. > > > > > > > > > > CLUSTER ON recurses to children, since I think a clustered parent > > > > > index means > > > > > that all its child indexes are clustered. "SET WITHOUT CLUSTER" > > > > > doesn't have > > > > > to recurse to children, but I did it like that for consistency and it > > > > > avoids > > > > > the need to special case InvalidOid. > > > > > > > > The previous patch failed pg_upgrade when restoring a clustered, parent > > > > index, > > > > since it's marked INVALID until indexes have been built on all child > > > > tables, so > > > > CLUSTER ON was rejected on invalid index. > > > > > > > > So I think CLUSTER ON needs to be a separate pg_dump object, to allow > > > > attaching > > > > the child index (thereby making the parent "valid") to happen before SET > > > > CLUSTER on the parent index. > > > > > > Rebased on b5913f612 and now a3dc92600. > > > > This resolves ORDER BY test failure with COLLATE "C". > > It occured to me that progress reporting should expose this. > > I did this in the style of pg_stat_progress_create_index, adding columns > partitions_total and partitions_done showing the overall progress. The > progress > of individual partitions is also visible in {blocks,tuples}_{done,total}. > This seems odd, but that's how the index view behaves. Rebased on 8a8f4d8ede288c2a29105f4708e22ce7f3526149. This also resolves an issue in the last patch which would've broken progress reporting of vacuum full. And take the suggestion to move memory context switching outside the loop. -- Justin >From 3a39edbed1595efc230aac1de09fc276cc7ca7f4 Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Thu, 26 Nov 2020 14:37:08 -0600 Subject: [PATCH v9 1/8] pg_dump: make CLUSTER ON a separate dump object.. ..since it needs to be restored after any child indexes are restored *and attached*. The order needs to be: 1) restore child and parent index (order doesn't matter); 2) attach child index; 3) set cluster on child and parent index (order doesn't matter); --- src/bin/pg_dump/pg_dump.c | 86 ++ src/bin/pg_dump/pg_dump.h | 8 src/bin/pg_dump/pg_dump_sort.c | 8 3 files changed, 82 insertions(+), 20 deletions(-) diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index eb988d7eb4..e93d2eb828 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -208,6 +208,7 @@ static void dumpSequence(Archive *fout, const TableInfo *tbinfo); static void dumpSequenceData(Archive *fout, const TableDataInfo *tdinfo); static void dumpIndex(Archive *fout, const IndxInfo *indxinfo); static void dumpIndexAttach(Archive *fout, const IndexAttachInfo *attachinfo); +static void dumpIndexClusterOn(Archive *fout, const IndexClusterInfo *clusterinfo); static void dumpStatisticsExt(Archive *fout, const StatsExtInfo *statsextinfo); static void dumpConstraint(Archive *fout, const ConstraintInfo *coninfo); static void dumpTableConstraintComment(Archive *fout, const ConstraintInfo *coninfo); @@ -7092,6 +7093,11 @@ getIndexes(Archive *fout, TableInfo tblinfo[], int numTables) i_inddependcollversions; int ntups; + int ncluster = 0; + IndexClusterInfo *clusterinfo; + clusterinfo = (IndexClusterInfo *) + pg_malloc0(numTables * sizeof(IndexClusterInfo)); + for (i = 0; i < numTables; i++) { TableInfo *tbinfo = &tblinfo[i]; @@ -7471,6 +7477,27 @@ getIndexes(Archive *fout, TableInfo tblinfo[], int numTables) /* Plain secondary index */ indxinfo[j].indexconstraint = 0; } + + /* Record each table's CLUSTERed index, if any */ + if (indxinfo[j].indisclustered) + { +IndxInfo *index = &indxinfo[j]; +IndexClusterInfo *cluster = &clusterinfo[ncluster]; + +cluster->dobj.objType = DO_INDEX_CLUSTER_ON; +cluster->dobj.catId.tableoid = 0; +cluster->dobj.catId.oid = 0; +AssignDumpId(&cluster->dobj); +cluster->dobj.nam
Re: CLUSTER on partitioned index
On Sat, Feb 06, 2021 at 08:45:49AM -0600, Justin Pryzby wrote: > On Mon, Jan 18, 2021 at 12:34:59PM -0600, Justin Pryzby wrote: > > On Sat, Nov 28, 2020 at 08:03:02PM -0600, Justin Pryzby wrote: > > > On Sun, Nov 15, 2020 at 07:53:35PM -0600, Justin Pryzby wrote: > > > > On Wed, Nov 04, 2020 at 08:23:56PM -0600, Justin Pryzby wrote: > > > > > On Tue, Oct 27, 2020 at 07:33:12PM -0500, Justin Pryzby wrote: > > > > > > I'm attaching a counter-proposal to your catalog change, which > > > > > > preserves > > > > > > indisclustered on children of clustered, partitioned indexes, and > > > > > > invalidates > > > > > > indisclustered when attaching unclustered indexes. > > > > > > > > > > ..and now propagates CLUSTER ON to child indexes. > > > > > > > > > > I left this as separate patches to show what I mean and what's new > > > > > while we > > > > > discuss it. > > > > > > > > This fixes some omissions in the previous patch and error in its test > > > > cases. > > > > > > > > CLUSTER ON recurses to children, since I think a clustered parent index > > > > means > > > > that all its child indexes are clustered. "SET WITHOUT CLUSTER" > > > > doesn't have > > > > to recurse to children, but I did it like that for consistency and it > > > > avoids > > > > the need to special case InvalidOid. > > > > > > The previous patch failed pg_upgrade when restoring a clustered, parent > > > index, > > > since it's marked INVALID until indexes have been built on all child > > > tables, so > > > CLUSTER ON was rejected on invalid index. > > > > > > So I think CLUSTER ON needs to be a separate pg_dump object, to allow > > > attaching > > > the child index (thereby making the parent "valid") to happen before SET > > > CLUSTER on the parent index. > > > > Rebased on b5913f612 and now a3dc92600. > > This resolves ORDER BY test failure with COLLATE "C". It occured to me that progress reporting should expose this. I did this in the style of pg_stat_progress_create_index, adding columns partitions_total and partitions_done showing the overall progress. The progress of individual partitions is also visible in {blocks,tuples}_{done,total}. This seems odd, but that's how the index view behaves. -- Justin >From 1614cfa411dabd74faa64dc805f52405e2572239 Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Thu, 26 Nov 2020 14:37:08 -0600 Subject: [PATCH v8 1/8] pg_dump: make CLUSTER ON a separate dump object.. ..since it needs to be restored after any child indexes are restored *and attached*. The order needs to be: 1) restore child and parent index (order doesn't matter); 2) attach child index; 3) set cluster on child and parent index (order doesn't matter); --- src/bin/pg_dump/pg_dump.c | 86 ++ src/bin/pg_dump/pg_dump.h | 8 src/bin/pg_dump/pg_dump_sort.c | 8 3 files changed, 82 insertions(+), 20 deletions(-) diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index eb988d7eb4..e93d2eb828 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -208,6 +208,7 @@ static void dumpSequence(Archive *fout, const TableInfo *tbinfo); static void dumpSequenceData(Archive *fout, const TableDataInfo *tdinfo); static void dumpIndex(Archive *fout, const IndxInfo *indxinfo); static void dumpIndexAttach(Archive *fout, const IndexAttachInfo *attachinfo); +static void dumpIndexClusterOn(Archive *fout, const IndexClusterInfo *clusterinfo); static void dumpStatisticsExt(Archive *fout, const StatsExtInfo *statsextinfo); static void dumpConstraint(Archive *fout, const ConstraintInfo *coninfo); static void dumpTableConstraintComment(Archive *fout, const ConstraintInfo *coninfo); @@ -7092,6 +7093,11 @@ getIndexes(Archive *fout, TableInfo tblinfo[], int numTables) i_inddependcollversions; int ntups; + int ncluster = 0; + IndexClusterInfo *clusterinfo; + clusterinfo = (IndexClusterInfo *) + pg_malloc0(numTables * sizeof(IndexClusterInfo)); + for (i = 0; i < numTables; i++) { TableInfo *tbinfo = &tblinfo[i]; @@ -7471,6 +7477,27 @@ getIndexes(Archive *fout, TableInfo tblinfo[], int numTables) /* Plain secondary index */ indxinfo[j].indexconstraint = 0; } + + /* Record each table's CLUSTERed index, if any */ + if (indxinfo[j].indisclustered) + { +IndxInfo *index = &indxinfo[j]; +IndexClusterInfo *cluster = &clusterinfo[ncluster]; + +cluster->dobj.objType = DO_INDEX_CLUSTER_ON; +cluster->dobj.catId.tableoid = 0; +cluster->dobj.catId.oid = 0; +AssignDumpId(&cluster->dobj); +cluster->dobj.name = pg_strdup(index->dobj.name); +cluster->dobj.namespace = index->indextable->dobj.namespace; +cluster->index = index; +cluster->indextable = &tblinfo[i]; + +/* The CLUSTER ON depends on its index.. */ +addObjectDependency(&cluster->dobj, index->dobj.dumpId); + +ncluster++; + } } PQclear(res); @@ -10323,6 +10350,9 @@ dumpDumpableObject(Archive *fout, const Dumpab
Re: CLUSTER on partitioned index
Hi, For v7-0002-Implement-CLUSTER-of-partitioned-table.patch: +* We have to build the list in a different memory context so it will +* survive the cross-transaction processing +*/ + old_context = MemoryContextSwitchTo(cluster_context); cluster_context is not modified within the loop. Can the memory context switching code be moved outside the loop ? Cheers On Sat, Feb 6, 2021 at 6:46 AM Justin Pryzby wrote: > On Mon, Jan 18, 2021 at 12:34:59PM -0600, Justin Pryzby wrote: > > On Sat, Nov 28, 2020 at 08:03:02PM -0600, Justin Pryzby wrote: > > > On Sun, Nov 15, 2020 at 07:53:35PM -0600, Justin Pryzby wrote: > > > > On Wed, Nov 04, 2020 at 08:23:56PM -0600, Justin Pryzby wrote: > > > > > On Tue, Oct 27, 2020 at 07:33:12PM -0500, Justin Pryzby wrote: > > > > > > I'm attaching a counter-proposal to your catalog change, which > preserves > > > > > > indisclustered on children of clustered, partitioned indexes, > and invalidates > > > > > > indisclustered when attaching unclustered indexes. > > > > > > > > > > ..and now propagates CLUSTER ON to child indexes. > > > > > > > > > > I left this as separate patches to show what I mean and what's new > while we > > > > > discuss it. > > > > > > > > This fixes some omissions in the previous patch and error in its > test cases. > > > > > > > > CLUSTER ON recurses to children, since I think a clustered parent > index means > > > > that all its child indexes are clustered. "SET WITHOUT CLUSTER" > doesn't have > > > > to recurse to children, but I did it like that for consistency and > it avoids > > > > the need to special case InvalidOid. > > > > > > The previous patch failed pg_upgrade when restoring a clustered, > parent index, > > > since it's marked INVALID until indexes have been built on all child > tables, so > > > CLUSTER ON was rejected on invalid index. > > > > > > So I think CLUSTER ON needs to be a separate pg_dump object, to allow > attaching > > > the child index (thereby making the parent "valid") to happen before > SET > > > CLUSTER on the parent index. > > > > Rebased on b5913f612 and now a3dc92600. > > This resolves ORDER BY test failure with COLLATE "C". > > -- > Justin >
Re: CLUSTER on partitioned index
On Mon, Jan 18, 2021 at 12:34:59PM -0600, Justin Pryzby wrote: > On Sat, Nov 28, 2020 at 08:03:02PM -0600, Justin Pryzby wrote: > > On Sun, Nov 15, 2020 at 07:53:35PM -0600, Justin Pryzby wrote: > > > On Wed, Nov 04, 2020 at 08:23:56PM -0600, Justin Pryzby wrote: > > > > On Tue, Oct 27, 2020 at 07:33:12PM -0500, Justin Pryzby wrote: > > > > > I'm attaching a counter-proposal to your catalog change, which > > > > > preserves > > > > > indisclustered on children of clustered, partitioned indexes, and > > > > > invalidates > > > > > indisclustered when attaching unclustered indexes. > > > > > > > > ..and now propagates CLUSTER ON to child indexes. > > > > > > > > I left this as separate patches to show what I mean and what's new > > > > while we > > > > discuss it. > > > > > > This fixes some omissions in the previous patch and error in its test > > > cases. > > > > > > CLUSTER ON recurses to children, since I think a clustered parent index > > > means > > > that all its child indexes are clustered. "SET WITHOUT CLUSTER" doesn't > > > have > > > to recurse to children, but I did it like that for consistency and it > > > avoids > > > the need to special case InvalidOid. > > > > The previous patch failed pg_upgrade when restoring a clustered, parent > > index, > > since it's marked INVALID until indexes have been built on all child > > tables, so > > CLUSTER ON was rejected on invalid index. > > > > So I think CLUSTER ON needs to be a separate pg_dump object, to allow > > attaching > > the child index (thereby making the parent "valid") to happen before SET > > CLUSTER on the parent index. > > Rebased on b5913f612 and now a3dc92600. This resolves ORDER BY test failure with COLLATE "C". -- Justin >From 1101d6b8a44f5cc170816f305476750352dc06de Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Thu, 26 Nov 2020 14:37:08 -0600 Subject: [PATCH v7 1/7] pg_dump: make CLUSTER ON a separate dump object.. ..since it needs to be restored after any child indexes are restored *and attached*. The order needs to be: 1) restore child and parent index (order doesn't matter); 2) attach child index; 3) set cluster on child and parent index (order doesn't matter); --- src/bin/pg_dump/pg_dump.c | 86 ++ src/bin/pg_dump/pg_dump.h | 8 src/bin/pg_dump/pg_dump_sort.c | 8 3 files changed, 82 insertions(+), 20 deletions(-) diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index d99b61e621..8dc8a42964 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -208,6 +208,7 @@ static void dumpSequence(Archive *fout, TableInfo *tbinfo); static void dumpSequenceData(Archive *fout, TableDataInfo *tdinfo); static void dumpIndex(Archive *fout, IndxInfo *indxinfo); static void dumpIndexAttach(Archive *fout, IndexAttachInfo *attachinfo); +static void dumpIndexClusterOn(Archive *fout, IndexClusterInfo *clusterinfo); static void dumpStatisticsExt(Archive *fout, StatsExtInfo *statsextinfo); static void dumpConstraint(Archive *fout, ConstraintInfo *coninfo); static void dumpTableConstraintComment(Archive *fout, ConstraintInfo *coninfo); @@ -7092,6 +7093,11 @@ getIndexes(Archive *fout, TableInfo tblinfo[], int numTables) i_inddependcollversions; int ntups; + int ncluster = 0; + IndexClusterInfo *clusterinfo; + clusterinfo = (IndexClusterInfo *) + pg_malloc0(numTables * sizeof(IndexClusterInfo)); + for (i = 0; i < numTables; i++) { TableInfo *tbinfo = &tblinfo[i]; @@ -7471,6 +7477,27 @@ getIndexes(Archive *fout, TableInfo tblinfo[], int numTables) /* Plain secondary index */ indxinfo[j].indexconstraint = 0; } + + /* Record each table's CLUSTERed index, if any */ + if (indxinfo[j].indisclustered) + { +IndxInfo *index = &indxinfo[j]; +IndexClusterInfo *cluster = &clusterinfo[ncluster]; + +cluster->dobj.objType = DO_INDEX_CLUSTER_ON; +cluster->dobj.catId.tableoid = 0; +cluster->dobj.catId.oid = 0; +AssignDumpId(&cluster->dobj); +cluster->dobj.name = pg_strdup(index->dobj.name); +cluster->dobj.namespace = index->indextable->dobj.namespace; +cluster->index = index; +cluster->indextable = &tblinfo[i]; + +/* The CLUSTER ON depends on its index.. */ +addObjectDependency(&cluster->dobj, index->dobj.dumpId); + +ncluster++; + } } PQclear(res); @@ -10323,6 +10350,9 @@ dumpDumpableObject(Archive *fout, DumpableObject *dobj) case DO_SUBSCRIPTION: dumpSubscription(fout, (SubscriptionInfo *) dobj); break; + case DO_INDEX_CLUSTER_ON: + dumpIndexClusterOn(fout, (IndexClusterInfo *) dobj); + break; case DO_PRE_DATA_BOUNDARY: case DO_POST_DATA_BOUNDARY: /* never dumped, nothing to do */ @@ -16543,6 +16573,41 @@ getAttrName(int attrnum, TableInfo *tblInfo) return NULL;/* keep compiler quiet */ } +/* + * dumpIndexClusterOn + * record that the index is clustered. + */ +static void +dumpIndexClusterOn(Archive *fout, In
Re: CLUSTER on partitioned index
On Sat, Nov 28, 2020 at 08:03:02PM -0600, Justin Pryzby wrote: > On Sun, Nov 15, 2020 at 07:53:35PM -0600, Justin Pryzby wrote: > > On Wed, Nov 04, 2020 at 08:23:56PM -0600, Justin Pryzby wrote: > > > On Tue, Oct 27, 2020 at 07:33:12PM -0500, Justin Pryzby wrote: > > > > I'm attaching a counter-proposal to your catalog change, which preserves > > > > indisclustered on children of clustered, partitioned indexes, and > > > > invalidates > > > > indisclustered when attaching unclustered indexes. > > > > > > ..and now propagates CLUSTER ON to child indexes. > > > > > > I left this as separate patches to show what I mean and what's new while > > > we > > > discuss it. > > > > This fixes some omissions in the previous patch and error in its test cases. > > > > CLUSTER ON recurses to children, since I think a clustered parent index > > means > > that all its child indexes are clustered. "SET WITHOUT CLUSTER" doesn't > > have > > to recurse to children, but I did it like that for consistency and it avoids > > the need to special case InvalidOid. > > The previous patch failed pg_upgrade when restoring a clustered, parent index, > since it's marked INVALID until indexes have been built on all child tables, > so > CLUSTER ON was rejected on invalid index. > > So I think CLUSTER ON needs to be a separate pg_dump object, to allow > attaching > the child index (thereby making the parent "valid") to happen before SET > CLUSTER on the parent index. Rebased on b5913f612 and now a3dc92600. This patch is intertwined with the tablespace patch: not only will it get rebase conflict, but will also need to test the functionality of CLUSTER (TABLESPACE a) partitioned_table; -- Justin >From cb817c479d2a2d4ae94cfa8d215e369b5d206738 Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Thu, 26 Nov 2020 14:37:08 -0600 Subject: [PATCH v6 1/7] pg_dump: make CLUSTER ON a separate dump object.. ..since it needs to be restored after any child indexes are restored *and attached*. The order needs to be: 1) restore child and parent index (order doesn't matter); 2) attach child index; 3) set cluster on child and parent index (order doesn't matter); --- src/bin/pg_dump/pg_dump.c | 86 ++ src/bin/pg_dump/pg_dump.h | 8 src/bin/pg_dump/pg_dump_sort.c | 8 3 files changed, 82 insertions(+), 20 deletions(-) diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index 798d14580e..e6526392e5 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -208,6 +208,7 @@ static void dumpSequence(Archive *fout, TableInfo *tbinfo); static void dumpSequenceData(Archive *fout, TableDataInfo *tdinfo); static void dumpIndex(Archive *fout, IndxInfo *indxinfo); static void dumpIndexAttach(Archive *fout, IndexAttachInfo *attachinfo); +static void dumpIndexClusterOn(Archive *fout, IndexClusterInfo *clusterinfo); static void dumpStatisticsExt(Archive *fout, StatsExtInfo *statsextinfo); static void dumpConstraint(Archive *fout, ConstraintInfo *coninfo); static void dumpTableConstraintComment(Archive *fout, ConstraintInfo *coninfo); @@ -7092,6 +7093,11 @@ getIndexes(Archive *fout, TableInfo tblinfo[], int numTables) i_inddependcollversions; int ntups; + int ncluster = 0; + IndexClusterInfo *clusterinfo; + clusterinfo = (IndexClusterInfo *) + pg_malloc0(numTables * sizeof(IndexClusterInfo)); + for (i = 0; i < numTables; i++) { TableInfo *tbinfo = &tblinfo[i]; @@ -7471,6 +7477,27 @@ getIndexes(Archive *fout, TableInfo tblinfo[], int numTables) /* Plain secondary index */ indxinfo[j].indexconstraint = 0; } + + /* Record each table's CLUSTERed index, if any */ + if (indxinfo[j].indisclustered) + { +IndxInfo *index = &indxinfo[j]; +IndexClusterInfo *cluster = &clusterinfo[ncluster]; + +cluster->dobj.objType = DO_INDEX_CLUSTER_ON; +cluster->dobj.catId.tableoid = 0; +cluster->dobj.catId.oid = 0; +AssignDumpId(&cluster->dobj); +cluster->dobj.name = pg_strdup(index->dobj.name); +cluster->dobj.namespace = index->indextable->dobj.namespace; +cluster->index = index; +cluster->indextable = &tblinfo[i]; + +/* The CLUSTER ON depends on its index.. */ +addObjectDependency(&cluster->dobj, index->dobj.dumpId); + +ncluster++; + } } PQclear(res); @@ -10296,6 +10323,9 @@ dumpDumpableObject(Archive *fout, DumpableObject *dobj) case DO_SUBSCRIPTION: dumpSubscription(fout, (SubscriptionInfo *) dobj); break; + case DO_INDEX_CLUSTER_ON: + dumpIndexClusterOn(fout, (IndexClusterInfo *) dobj); + break; case DO_PRE_DATA_BOUNDARY: case DO_POST_DATA_BOUNDARY: /* never dumped, nothing to do */ @@ -16516,6 +16546,41 @@ getAttrName(int attrnum, TableInfo *tblInfo) return NULL;/* keep compiler quiet */ } +/* + * dumpIndexClusterOn + * record that the index is clustered. + */ +static void +dumpIndexClusterOn(Archive *fout, IndexClusterInfo *clusterinfo) +{ + Du
Re: CLUSTER on partitioned index
On Sun, Nov 15, 2020 at 07:53:35PM -0600, Justin Pryzby wrote: > On Wed, Nov 04, 2020 at 08:23:56PM -0600, Justin Pryzby wrote: > > On Tue, Oct 27, 2020 at 07:33:12PM -0500, Justin Pryzby wrote: > > > I'm attaching a counter-proposal to your catalog change, which preserves > > > indisclustered on children of clustered, partitioned indexes, and > > > invalidates > > > indisclustered when attaching unclustered indexes. > > > > ..and now propagates CLUSTER ON to child indexes. > > > > I left this as separate patches to show what I mean and what's new while we > > discuss it. > > This fixes some omissions in the previous patch and error in its test cases. > > CLUSTER ON recurses to children, since I think a clustered parent index means > that all its child indexes are clustered. "SET WITHOUT CLUSTER" doesn't have > to recurse to children, but I did it like that for consistency and it avoids > the need to special case InvalidOid. The previous patch failed pg_upgrade when restoring a clustered, parent index, since it's marked INVALID until indexes have been built on all child tables, so CLUSTER ON was rejected on invalid index. So I think CLUSTER ON needs to be a separate pg_dump object, to allow attaching the child index (thereby making the parent "valid") to happen before SET CLUSTER on the parent index. -- Justin >From 2bcb70391a15b605c090f585668161079aa2b0b5 Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Thu, 26 Nov 2020 14:37:08 -0600 Subject: [PATCH v5 1/7] pg_dump: make CLUSTER ON a separate dump object.. ..since it needs to be restored after any child indexes are restored *and attached*. The order needs to be: 1) restore child and parent index (order doesn't matter); 2) attach child index; 3) set cluster on child and parent index (order doesn't matter); --- src/bin/pg_dump/pg_dump.c | 86 ++ src/bin/pg_dump/pg_dump.h | 8 src/bin/pg_dump/pg_dump_sort.c | 28 ++- 3 files changed, 91 insertions(+), 31 deletions(-) diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index dc1d41dd8d..ad990f5f8d 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -207,6 +207,7 @@ static void dumpSequence(Archive *fout, TableInfo *tbinfo); static void dumpSequenceData(Archive *fout, TableDataInfo *tdinfo); static void dumpIndex(Archive *fout, IndxInfo *indxinfo); static void dumpIndexAttach(Archive *fout, IndexAttachInfo *attachinfo); +static void dumpIndexClusterOn(Archive *fout, IndexClusterInfo *clusterinfo); static void dumpStatisticsExt(Archive *fout, StatsExtInfo *statsextinfo); static void dumpConstraint(Archive *fout, ConstraintInfo *coninfo); static void dumpTableConstraintComment(Archive *fout, ConstraintInfo *coninfo); @@ -7036,6 +7037,11 @@ getIndexes(Archive *fout, TableInfo tblinfo[], int numTables) i_inddependcollversions; int ntups; + int ncluster = 0; + IndexClusterInfo *clusterinfo; + clusterinfo = (IndexClusterInfo *) + pg_malloc0(numTables * sizeof(IndexClusterInfo)); + for (i = 0; i < numTables; i++) { TableInfo *tbinfo = &tblinfo[i]; @@ -7415,6 +7421,27 @@ getIndexes(Archive *fout, TableInfo tblinfo[], int numTables) /* Plain secondary index */ indxinfo[j].indexconstraint = 0; } + + /* Record each table's CLUSTERed index, if any */ + if (indxinfo[j].indisclustered) + { +IndxInfo *index = &indxinfo[j]; +IndexClusterInfo *cluster = &clusterinfo[ncluster]; + +cluster->dobj.objType = DO_INDEX_CLUSTER_ON; +cluster->dobj.catId.tableoid = 0; +cluster->dobj.catId.oid = 0; +AssignDumpId(&cluster->dobj); +cluster->dobj.name = pg_strdup(index->dobj.name); +cluster->dobj.namespace = index->indextable->dobj.namespace; +cluster->index = index; +cluster->indextable = &tblinfo[i]; + +/* The CLUSTER ON depends on its index.. */ +addObjectDependency(&cluster->dobj, index->dobj.dumpId); + +ncluster++; + } } PQclear(res); @@ -10221,6 +10248,9 @@ dumpDumpableObject(Archive *fout, DumpableObject *dobj) case DO_SUBSCRIPTION: dumpSubscription(fout, (SubscriptionInfo *) dobj); break; + case DO_INDEX_CLUSTER_ON: + dumpIndexClusterOn(fout, (IndexClusterInfo *) dobj); + break; case DO_PRE_DATA_BOUNDARY: case DO_POST_DATA_BOUNDARY: /* never dumped, nothing to do */ @@ -16408,6 +16438,41 @@ getAttrName(int attrnum, TableInfo *tblInfo) return NULL;/* keep compiler quiet */ } +/* + * dumpIndexClusterOn + * record that the index is clustered. + */ +static void +dumpIndexClusterOn(Archive *fout, IndexClusterInfo *clusterinfo) +{ + DumpOptions *dopt = fout->dopt; + TableInfo *tbinfo = clusterinfo->indextable; + char *qindxname; + PQExpBuffer q; + + if (dopt->dataOnly) + return; + + q = createPQExpBuffer(); + qindxname = pg_strdup(fmtId(clusterinfo->dobj.name)); + + /* index name is not qualified in this syntax */ + appendPQExpBuffer(q, "\nALTER TABLE %s CLUSTER ON %s;\n", + f
Re: CLUSTER on partitioned index
On Wed, Nov 04, 2020 at 08:23:56PM -0600, Justin Pryzby wrote: > On Tue, Oct 27, 2020 at 07:33:12PM -0500, Justin Pryzby wrote: > > I'm attaching a counter-proposal to your catalog change, which preserves > > indisclustered on children of clustered, partitioned indexes, and > > invalidates > > indisclustered when attaching unclustered indexes. > > ..and now propagates CLUSTER ON to child indexes. > > I left this as separate patches to show what I mean and what's new while we > discuss it. This fixes some omissions in the previous patch and error in its test cases. CLUSTER ON recurses to children, since I think a clustered parent index means that all its child indexes are clustered. "SET WITHOUT CLUSTER" doesn't have to recurse to children, but I did it like that for consistency and it avoids the need to special case InvalidOid. -- Justin >From 75690aacef0d294e2667bd1091cf647dc9f5d187 Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Sun, 7 Jun 2020 16:58:42 -0500 Subject: [PATCH v4 1/5] Implement CLUSTER of partitioned table.. This requires either specification of a partitioned index on which to cluster, or that an partitioned index was previously set clustered. --- doc/src/sgml/ref/cluster.sgml | 6 + src/backend/commands/cluster.c| 167 +++--- src/bin/psql/tab-complete.c | 1 + src/include/nodes/parsenodes.h| 5 +- src/test/regress/expected/cluster.out | 58 - src/test/regress/sql/cluster.sql | 24 +++- 6 files changed, 208 insertions(+), 53 deletions(-) diff --git a/doc/src/sgml/ref/cluster.sgml b/doc/src/sgml/ref/cluster.sgml index b9450e7366..0476cfff72 100644 --- a/doc/src/sgml/ref/cluster.sgml +++ b/doc/src/sgml/ref/cluster.sgml @@ -172,6 +172,12 @@ CLUSTER [VERBOSE] are periodically reclustered. + +Clustering a partitioned table clusters each of its partitions using the +index partition of the given partitioned index or (if not specified) the +partitioned index marked as clustered. + + diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c index 04d12a7ece..391e018bbd 100644 --- a/src/backend/commands/cluster.c +++ b/src/backend/commands/cluster.c @@ -32,7 +32,9 @@ #include "catalog/index.h" #include "catalog/namespace.h" #include "catalog/objectaccess.h" +#include "catalog/partition.h" #include "catalog/pg_am.h" +#include "catalog/pg_inherits.h" #include "catalog/toasting.h" #include "commands/cluster.h" #include "commands/progress.h" @@ -72,6 +74,9 @@ static void copy_table_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex, bool verbose, bool *pSwapToastByContent, TransactionId *pFreezeXid, MultiXactId *pCutoffMulti); static List *get_tables_to_cluster(MemoryContext cluster_context); +static List *get_tables_to_cluster_partitioned(MemoryContext cluster_context, + Oid indexOid); +static void cluster_multiple_rels(List *rvs, int options); /*--- @@ -113,7 +118,7 @@ cluster(ClusterStmt *stmt, bool isTopLevel) AccessExclusiveLock, 0, RangeVarCallbackOwnsTable, NULL); - rel = table_open(tableOid, NoLock); + rel = table_open(tableOid, ShareUpdateExclusiveLock); /* * Reject clustering a remote temp table ... their local buffer @@ -124,14 +129,6 @@ cluster(ClusterStmt *stmt, bool isTopLevel) (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("cannot cluster temporary tables of other sessions"))); - /* - * Reject clustering a partitioned table. - */ - if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) - ereport(ERROR, - (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), - errmsg("cannot cluster a partitioned table"))); - if (stmt->indexname == NULL) { ListCell *index; @@ -169,8 +166,32 @@ cluster(ClusterStmt *stmt, bool isTopLevel) /* close relation, keep lock till commit */ table_close(rel, NoLock); - /* Do the job. */ - cluster_rel(tableOid, indexOid, stmt->options); + if (rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE) + { + /* Do the job. */ + cluster_rel(tableOid, indexOid, stmt->options); + } + else + { + List *rvs; + MemoryContext cluster_context; + + /* Refuse to hold strong locks in a user transaction */ + PreventInTransactionBlock(isTopLevel, "CLUSTER"); + + cluster_context = AllocSetContextCreate(PortalContext, +"Cluster", +ALLOCSET_DEFAULT_SIZES); + + rvs = get_tables_to_cluster_partitioned(cluster_context, indexOid); + cluster_multiple_rels(rvs, stmt->options); + + /* Start a new transaction for the cleanup work. */ + StartTransactionCommand(); + + /* Clean up working storage */ + MemoryContextDelete(cluster_context); + } } else { @@ -180,7 +201,6 @@ cluster(ClusterStmt *stmt, bool isTopLevel) */ MemoryContext cluster_context; List *rvs; - ListCell *rv; /* * We ca
Re: CLUSTER on partitioned index
@cfbot: rebased On Tue, Oct 27, 2020 at 07:33:12PM -0500, Justin Pryzby wrote: > I'm attaching a counter-proposal to your catalog change, which preserves > indisclustered on children of clustered, partitioned indexes, and invalidates > indisclustered when attaching unclustered indexes. ..and now propagates CLUSTER ON to child indexes. I left this as separate patches to show what I mean and what's new while we discuss it. -- Justin >From 521b91b2af5688e88145714db1e990b803ea Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Sun, 7 Jun 2020 16:58:42 -0500 Subject: [PATCH v3 1/3] Implement CLUSTER of partitioned table.. This requires either specification of a partitioned index on which to cluster, or that an partitioned index was previously set clustered. --- doc/src/sgml/ref/cluster.sgml | 6 + src/backend/commands/cluster.c| 167 +++--- src/bin/psql/tab-complete.c | 1 + src/include/nodes/parsenodes.h| 5 +- src/test/regress/expected/cluster.out | 58 - src/test/regress/sql/cluster.sql | 24 +++- 6 files changed, 208 insertions(+), 53 deletions(-) diff --git a/doc/src/sgml/ref/cluster.sgml b/doc/src/sgml/ref/cluster.sgml index b9450e7366..0476cfff72 100644 --- a/doc/src/sgml/ref/cluster.sgml +++ b/doc/src/sgml/ref/cluster.sgml @@ -172,6 +172,12 @@ CLUSTER [VERBOSE] are periodically reclustered. + +Clustering a partitioned table clusters each of its partitions using the +index partition of the given partitioned index or (if not specified) the +partitioned index marked as clustered. + + diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c index 04d12a7ece..391e018bbd 100644 --- a/src/backend/commands/cluster.c +++ b/src/backend/commands/cluster.c @@ -32,7 +32,9 @@ #include "catalog/index.h" #include "catalog/namespace.h" #include "catalog/objectaccess.h" +#include "catalog/partition.h" #include "catalog/pg_am.h" +#include "catalog/pg_inherits.h" #include "catalog/toasting.h" #include "commands/cluster.h" #include "commands/progress.h" @@ -72,6 +74,9 @@ static void copy_table_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex, bool verbose, bool *pSwapToastByContent, TransactionId *pFreezeXid, MultiXactId *pCutoffMulti); static List *get_tables_to_cluster(MemoryContext cluster_context); +static List *get_tables_to_cluster_partitioned(MemoryContext cluster_context, + Oid indexOid); +static void cluster_multiple_rels(List *rvs, int options); /*--- @@ -113,7 +118,7 @@ cluster(ClusterStmt *stmt, bool isTopLevel) AccessExclusiveLock, 0, RangeVarCallbackOwnsTable, NULL); - rel = table_open(tableOid, NoLock); + rel = table_open(tableOid, ShareUpdateExclusiveLock); /* * Reject clustering a remote temp table ... their local buffer @@ -124,14 +129,6 @@ cluster(ClusterStmt *stmt, bool isTopLevel) (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("cannot cluster temporary tables of other sessions"))); - /* - * Reject clustering a partitioned table. - */ - if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) - ereport(ERROR, - (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), - errmsg("cannot cluster a partitioned table"))); - if (stmt->indexname == NULL) { ListCell *index; @@ -169,8 +166,32 @@ cluster(ClusterStmt *stmt, bool isTopLevel) /* close relation, keep lock till commit */ table_close(rel, NoLock); - /* Do the job. */ - cluster_rel(tableOid, indexOid, stmt->options); + if (rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE) + { + /* Do the job. */ + cluster_rel(tableOid, indexOid, stmt->options); + } + else + { + List *rvs; + MemoryContext cluster_context; + + /* Refuse to hold strong locks in a user transaction */ + PreventInTransactionBlock(isTopLevel, "CLUSTER"); + + cluster_context = AllocSetContextCreate(PortalContext, +"Cluster", +ALLOCSET_DEFAULT_SIZES); + + rvs = get_tables_to_cluster_partitioned(cluster_context, indexOid); + cluster_multiple_rels(rvs, stmt->options); + + /* Start a new transaction for the cleanup work. */ + StartTransactionCommand(); + + /* Clean up working storage */ + MemoryContextDelete(cluster_context); + } } else { @@ -180,7 +201,6 @@ cluster(ClusterStmt *stmt, bool isTopLevel) */ MemoryContext cluster_context; List *rvs; - ListCell *rv; /* * We cannot run this form of CLUSTER inside a user transaction block; @@ -204,25 +224,7 @@ cluster(ClusterStmt *stmt, bool isTopLevel) */ rvs = get_tables_to_cluster(cluster_context); - /* Commit to get out of starting transaction */ - PopActiveSnapshot(); - CommitTransactionCommand(); - - /* Ok, now that we've got them all, cluster them one by one */ - foreach(rv, rvs) - { - RelToCluster *rvtc = (RelToCl
CLUSTER on partitioned index
Forking this thread, since the existing CFs have been closed. https://www.postgresql.org/message-id/flat/20200914143102.GX18552%40telsasoft.com#58b1056488451f8594b0f0ba40996afd On Tue, Oct 06, 2020 at 01:38:23PM +0900, Michael Paquier wrote: > On Mon, Oct 05, 2020 at 10:07:33PM -0500, Justin Pryzby wrote: > > Honestly, I think you're over-thinking and over-engineering indisclustered. > > > > If "clusteredness" was something we offered to maintain across DML, I think > > that might be important to provide stronger guarantees. As it is now, I > > don't > > think this patch is worth changing the catalog definition. > > Well, this use case is new because we are discussing the relationship > of indisclustered across multiple transactions for multiple indexes, > so I'd rather have this discussion than not, and I have learnt > the hard way with REINDEX that we should care a lot about the > consistency of partition trees at any step of the operation. indisclustered is only used as a default for "CLUSTER" (without USING). The worst thing that can happen if it's "inconsistent" is that "CLUSTER;" clusters a table on the "old" clustered index (that it was already clustered on), which is what would've happened before running some command which was interrupted. > Let's > imagine a simple example here, take this partition tree: p (parent), > and two partitions p1 and p2. p has two partitioned indexes i and j, > indexes also present in p1 and p2 as i1, i2, j1 and j2. Let's assume > that the user has done a CLUSTER on p USING i that completes, meaning > that i, i1 and i2 have indisclustered set. Now let's assume that the > user does a CLUSTER on p USING j this time, and that this command > fails while processing p2, meaning that indisclustered is set for j1, > i2, and perhaps i or j depending on what the patch does. I think the state of "indisclustered" at that point is not critical. The command failed, and the user can re-run it, or ALTER..SET CLUSTER. Actually, I think the only inconsistent state is if two indexes are both marked indisclustered. I'm attaching a counter-proposal to your catalog change, which preserves indisclustered on children of clustered, partitioned indexes, and invalidates indisclustered when attaching unclustered indexes. Also, I noticed that CREATE TABLE (LIKE.. INCLUDING INDEXES) doesn't preserve indisclustered, but I can't say that's an issue. -- Justin >From dd4588352f99186f28fc666c497f85a87ac11da2 Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Sun, 7 Jun 2020 16:58:42 -0500 Subject: [PATCH v1 1/3] Implement CLUSTER of partitioned table.. This requires either specification of a partitioned index on which to cluster, or that an partitioned index was previously set clustered. TODO: handle DB-WIDE "CLUSTER;" for partitioned tables new partitions need to inherit indisclustered ? --- doc/src/sgml/ref/cluster.sgml | 6 + src/backend/commands/cluster.c| 169 +++--- src/bin/psql/tab-complete.c | 1 + src/include/nodes/parsenodes.h| 5 +- src/test/regress/expected/cluster.out | 58 - src/test/regress/sql/cluster.sql | 24 +++- 6 files changed, 209 insertions(+), 54 deletions(-) diff --git a/doc/src/sgml/ref/cluster.sgml b/doc/src/sgml/ref/cluster.sgml index b9450e7366..0476cfff72 100644 --- a/doc/src/sgml/ref/cluster.sgml +++ b/doc/src/sgml/ref/cluster.sgml @@ -172,6 +172,12 @@ CLUSTER [VERBOSE] are periodically reclustered. + +Clustering a partitioned table clusters each of its partitions using the +index partition of the given partitioned index or (if not specified) the +partitioned index marked as clustered. + + diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c index 0d647e912c..1db8382a27 100644 --- a/src/backend/commands/cluster.c +++ b/src/backend/commands/cluster.c @@ -32,7 +32,9 @@ #include "catalog/index.h" #include "catalog/namespace.h" #include "catalog/objectaccess.h" +#include "catalog/partition.h" #include "catalog/pg_am.h" +#include "catalog/pg_inherits.h" #include "catalog/toasting.h" #include "commands/cluster.h" #include "commands/progress.h" @@ -75,6 +77,9 @@ static void copy_table_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex, TransactionId *pFreezeXid, MultiXactId *pCutoffMulti); static List *get_tables_to_cluster(MemoryContext cluster_context); +static List *get_tables_to_cluster_partitioned(MemoryContext cluster_context, + Oid indexOid); +static void cluster_multiple_rels(List *rvs, bool isTopLevel, int options); /*--- @@ -116,7 +121,7 @@ cluster(ClusterStmt *stmt, bool isTopLevel) AccessExclusiveLock, 0, RangeVarCallbackOwnsTable, NULL); - rel = table_open(tableOid, NoLock); + rel = table_open(tableOid, ShareUpdateExclusiveLock); /* * Reject clustering a remote temp