On Sat, Jun 06, 2020 at 09:23:32AM -0500, Justin Pryzby wrote: > On Wed, Jun 03, 2020 at 08:22:29PM +0800, 李杰(慎追) wrote: > > Partitioning is necessary for very large tables. > > However, I found that postgresql does not support create index > > concurrently on partitioned tables. > > The document show that we need to create an index on each partition > > individually and then finally create the partitioned index > > non-concurrently. > > This is undoubtedly a complex operation for DBA, especially when there are > > many partitions. > > > Therefore, I wonder why pg does not support concurrent index creation on > > partitioned tables? > > What are the difficulties of this function? > > If I want to implement it, what should I pay attention? > > Maybe I'm wrong, but I don't think there's any known difficulty - just that > nobody did it yet.
I said that but I was actually thinking about the code for "REINDEX CONCURRENTLY" (which should also handle partitioned tables). I looked at CIC now and came up with the attached. All that's needed to allow this case is to close the relation before recursing to partitions - it needs to be closed before calling CommitTransactionCommand(). There's probably a better way to write this, but I can't see that there's anything complicated about handling partitioned tables. -- Justin
>From de5f151805946b6999aa25aaddd9ebaaf91ccff2 Mon Sep 17 00:00:00 2001 From: Justin Pryzby <pryz...@telsasoft.com> Date: Sat, 6 Jun 2020 17:42:23 -0500 Subject: [PATCH v1] Allow CREATE INDEX CONCURRENTLY on partitioned table --- doc/src/sgml/ref/create_index.sgml | 9 ----- src/backend/commands/indexcmds.c | 54 +++++++++++++++++------------- 2 files changed, 31 insertions(+), 32 deletions(-) diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml index ff87b2d28f..e1298b8523 100644 --- a/doc/src/sgml/ref/create_index.sgml +++ b/doc/src/sgml/ref/create_index.sgml @@ -657,15 +657,6 @@ Indexes: cannot. </para> - <para> - Concurrent builds for indexes on partitioned tables are currently not - supported. However, you may concurrently build the index on each - partition individually and then finally create the partitioned index - non-concurrently in order to reduce the time where writes to the - partitioned table will be locked out. In this case, building the - partitioned index is a metadata only operation. - </para> - </refsect2> </refsect1> diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c index 2baca12c5f..d2809c85a0 100644 --- a/src/backend/commands/indexcmds.c +++ b/src/backend/commands/indexcmds.c @@ -652,17 +652,6 @@ DefineIndex(Oid relationId, partitioned = rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE; if (partitioned) { - /* - * Note: we check 'stmt->concurrent' rather than 'concurrent', so that - * the error is thrown also for temporary tables. Seems better to be - * consistent, even though we could do it on temporary table because - * we're not actually doing it concurrently. - */ - if (stmt->concurrent) - ereport(ERROR, - (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), - errmsg("cannot create index on partitioned table \"%s\" concurrently", - RelationGetRelationName(rel)))); if (stmt->excludeOpNames) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), @@ -1139,15 +1128,23 @@ DefineIndex(Oid relationId, CreateComments(indexRelationId, RelationRelationId, 0, stmt->idxcomment); + /* save lockrelid and locktag for below */ + heaprelid = rel->rd_lockInfo.lockRelId; + SET_LOCKTAG_RELATION(heaplocktag, heaprelid.dbId, heaprelid.relId); + if (partitioned) { + char *relname = pstrdup(RelationGetRelationName(rel)); + /* * Unless caller specified to skip this step (via ONLY), process each * partition to make sure they all contain a corresponding index. * * If we're called internally (no stmt->relation), recurse always. */ - if (!stmt->relation || stmt->relation->inh) + if (stmt->relation && !stmt->relation->inh) + table_close(rel, NoLock); + else { PartitionDesc partdesc = RelationGetPartitionDesc(rel); int nparts = partdesc->nparts; @@ -1166,6 +1163,8 @@ DefineIndex(Oid relationId, for (i = 0; i < numberOfKeyAttributes; i++) opfamOids[i] = get_opclass_family(classObjectId[i]); + table_close(rel, NoLock); + /* * For each partition, scan all existing indexes; if one matches * our index definition and is not already attached to some other @@ -1196,9 +1195,9 @@ DefineIndex(Oid relationId, ereport(ERROR, (errcode(ERRCODE_WRONG_OBJECT_TYPE), errmsg("cannot create unique index on partitioned table \"%s\"", - RelationGetRelationName(rel)), + relname), errdetail("Table \"%s\" contains partitions that are foreign tables.", - RelationGetRelationName(rel)))); + relname))); table_close(childrel, lockmode); continue; @@ -1365,21 +1364,35 @@ DefineIndex(Oid relationId, } } + /* + * CIC needs to mark a partitioned table as VALID, which itself + * requires setting READY, which is unset for CIC (even though + * it's meaningless for an index without storage). + */ + if (concurrent) + { + if (ActiveSnapshotSet()) + PopActiveSnapshot(); + CommitTransactionCommand(); + StartTransactionCommand(); + index_set_state_flags(indexRelationId, INDEX_CREATE_SET_READY); + + CommandCounterIncrement(); + index_set_state_flags(indexRelationId, INDEX_CREATE_SET_VALID); + } + /* * Indexes on partitioned tables are not themselves built, so we're * done here. */ - table_close(rel, NoLock); if (!OidIsValid(parentIndexId)) pgstat_progress_end_command(); return address; } + table_close(rel, NoLock); if (!concurrent) { - /* Close the heap and we're done, in the non-concurrent case */ - table_close(rel, NoLock); - /* If this is the top-level index, we're done. */ if (!OidIsValid(parentIndexId)) pgstat_progress_end_command(); @@ -1387,11 +1400,6 @@ DefineIndex(Oid relationId, return address; } - /* save lockrelid and locktag for below, then close rel */ - heaprelid = rel->rd_lockInfo.lockRelId; - SET_LOCKTAG_RELATION(heaplocktag, heaprelid.dbId, heaprelid.relId); - table_close(rel, NoLock); - /* * For a concurrent build, it's important to make the catalog entries * visible to other transactions before we start to build the index. That -- 2.17.0