On 01.03.2017 13:53, Maksim Milyutin wrote:
Hi hackers!

As I've understood from thread [1] the main issue of creating local indexes for partitions is supporting REINDEX and DROP INDEX operations on parent partitioned tables. Furthermore Robert Haas mentioned the problem of creating index on key that is represented in partitions with single value (or primitive interval) [1] i.e. under the list-partitioning or range-partitioning with unit interval.

I would like to propose the following solution:

1. Create index for hierarchy of partitioned tables and partitions recursively. Don't create relfilenode for indexes on parents, only entries in catalog (much like the partitioned table's storage elimination in [2]). Abstract index for partitioned tables is only for the reference on indexes of child tables to perform REINDEX and DROP INDEX operations.

2. Specify created indexes in pg_depend table so that indexes of child tables depend on corresponding indexes of parent tables with type of dependency DEPENDENCY_NORMAL so that index could be removed separately for partitions and recursively/separately for partitioned tables.

3. REINDEX on index of partitioned table would perform this operation on existing indexes of corresponding partitions. In this case it is necessary to consider such operations as REINDEX SCHEMA | DATABASE | SYSTEM so that partitions' indexes wouldn't be re-indexed multiple times in a row.

Any thoughts?

[1] https://www.postgresql.org/message-id/CA+TgmoZUwj=qynak+f7xef4w_e2g3xxdmnsnzmzjuinhrco...@mail.gmail.com
[2] https://www.postgresql.org/message-id/2b0d42f2-3a53-763b-c9c2-47139e4b1c2e%40lab.ntt.co.jp
I want to present the first version of patches that implement local indexes for partitioned tables and discuss some technical details of that implementation.
1. I have added a new relkind for local indexes named RELKIND_LOCAL_INDEX (literal 'l').
This was done because physical storage is created in the 'heap_create' function, and storage creation has to be suppressed there, as is already done for partitioned tables. Since the information that the index belongs to a partitioned table is not available in 'heap_create' (the pg_index entry for the index has not been created yet), I chose the least painful way: a dedicated relkind for indexes on partitioned tables. I expect that the new relkind will have to be handled in many other places in the source code, so I'm ready to consider other proposals on this point.
2. My implementation doesn't support concurrent creation of a local index (CREATE INDEX CONCURRENTLY). As I understand it, this operation involves nontrivial manipulation of snapshots, and I don't know how to implement concurrent creation of multiple indexes. On this point I ask the community for help.
3. As I noted earlier, the pg_depend table is used for cascaded deletion of indexes on a partitioned table and its children. I also use pg_depend to determine the relationship between parent and child indexes when REINDEX recurses to child indexes.
Perhaps pg_depend is not a good way to determine the relationship between parent and child indexes, because the kind of this relationship is not defined there. I could propose adding to the pg_index table a field of type 'oidvector' that lists the OIDs of the indexes dependent on the current local index.
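To make the alternative in point 3 concrete, here is a minimal, self-contained sketch in plain C (not PostgreSQL code) of how leaf indexes could be enumerated if every local index entry carried an explicit list of child OIDs, in the spirit of the proposed 'oidvector' field. All names (DemoOid, DemoIndexEntry, demo_find_entry, demo_count_leaf_indexes, DEMO_MAX_CHILDREN), the is_local flag and the OID values are assumptions made for illustration; none of them is part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_CHILDREN 8

typedef unsigned int DemoOid;

/* Hypothetical stand-in for a pg_index row that stores its child index OIDs. */
typedef struct DemoIndexEntry
{
    DemoOid indexrelid;
    bool    is_local;                     /* true for an index on a partitioned table */
    size_t  nchildren;
    DemoOid children[DEMO_MAX_CHILDREN];  /* plays the role of the 'oidvector' field */
} DemoIndexEntry;

/* Linear lookup by OID; returns NULL if the OID is unknown. */
const DemoIndexEntry *
demo_find_entry(const DemoIndexEntry *entries, size_t nentries, DemoOid oid)
{
    size_t i;

    for (i = 0; i < nentries; i++)
        if (entries[i].indexrelid == oid)
            return &entries[i];
    return NULL;
}

/* Count the real (non-local) indexes reachable from 'oid' through child lists. */
size_t
demo_count_leaf_indexes(const DemoIndexEntry *entries, size_t nentries,
                        DemoOid oid)
{
    const DemoIndexEntry *entry = demo_find_entry(entries, nentries, oid);
    size_t count = 0;
    size_t i;

    if (entry == NULL)
        return 0;
    if (!entry->is_local)
        return 1;               /* a real index is its own single leaf */

    for (i = 0; i < entry->nchildren; i++)
        count += demo_count_leaf_indexes(entries, nentries,
                                         entry->children[i]);
    return count;
}

/* Index tree shaped like the one in the attached test script; OIDs are made up. */
static const DemoIndexEntry demo_entries[] = {
    {1, true, 2, {2, 3}},       /* test_b_idx */
    {2, false, 0, {0}},         /* test_0_b_idx */
    {3, true, 2, {4, 5}},       /* test_1_b_idx */
    {4, false, 0, {0}},         /* test_1_0_b_idx */
    {5, false, 0, {0}},         /* test_1_1_b_idx */
};

size_t
demo_example_leaf_count(DemoOid root)
{
    return demo_count_leaf_indexes(demo_entries, 5, root);
}
```

With such a field, the parent/child relationship is read from the index entry itself instead of being inferred from generic dependency records. The price would be keeping the list up to date whenever a child index is created or dropped.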
At this stage I want to discuss only the technical details of the local index implementation. The problems of merging existing partition indexes into a local index tree, of determining global uniqueness of a field through a local index, and of syntax are ones I want to raise later.
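For point 1 above, the storage decision can be pictured with a small stand-alone sketch. It only mimics the shape of the relkind switch in 'heap_create'; the function name demo_relkind_needs_storage and the DEMO_ constants are hypothetical, and the letter used for partitioned tables is an assumption, since the patch only shows 'r', 'i', 'l', 'S', 't' and 'v'.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative relkind letters; only some of them appear in the patch. */
#define DEMO_RELKIND_RELATION          'r'
#define DEMO_RELKIND_INDEX             'i'
#define DEMO_RELKIND_LOCAL_INDEX       'l'
#define DEMO_RELKIND_VIEW              'v'
#define DEMO_RELKIND_PARTITIONED_TABLE 'p'   /* assumed value, not taken from the patch */

/*
 * Decide from the relkind alone whether a relation gets physical storage.
 * No catalog lookup is needed, which is the reason for a separate relkind.
 */
bool
demo_relkind_needs_storage(char relkind)
{
    switch (relkind)
    {
        case DEMO_RELKIND_VIEW:
        case DEMO_RELKIND_PARTITIONED_TABLE:
        case DEMO_RELKIND_LOCAL_INDEX:
            return false;       /* catalog entry only, no relfilenode */
        default:
            return true;
    }
}
```

The design choice this illustrates is that the caller does not have to know anything about the parent table: the relkind passed in is sufficient to skip storage creation.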
CC welcome!

--
Maksim Milyutin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index cc5ac8b..bec3983 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -154,7 +154,8 @@ index_open(Oid relationId, LOCKMODE lockmode)
 
     r = relation_open(relationId, lockmode);
 
-    if (r->rd_rel->relkind != RELKIND_INDEX)
+    if (r->rd_rel->relkind != RELKIND_INDEX &&
+        r->rd_rel->relkind != RELKIND_LOCAL_INDEX)
         ereport(ERROR,
                 (errcode(ERRCODE_WRONG_OBJECT_TYPE),
                  errmsg("\"%s\" is not an index",
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index fc088b2..26a10e9 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -1107,7 +1107,8 @@ doDeletion(const ObjectAddress *object, int flags)
             {
                 char        relKind = get_rel_relkind(object->objectId);
 
-                if (relKind == RELKIND_INDEX)
+                if (relKind == RELKIND_INDEX ||
+                    relKind == RELKIND_LOCAL_INDEX)
                 {
                     bool        concurrent = ((flags & PERFORM_DELETION_CONCURRENTLY) != 0);
 
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 36917c8..91ac740 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -293,6 +293,7 @@ heap_create(const char *relname,
         case RELKIND_COMPOSITE_TYPE:
         case RELKIND_FOREIGN_TABLE:
         case RELKIND_PARTITIONED_TABLE:
+        case RELKIND_LOCAL_INDEX:
             create_storage = false;
 
             /*
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 7924c30..36452a9 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -41,6 +41,7 @@
 #include "catalog/pg_collation.h"
 #include "catalog/pg_constraint.h"
 #include "catalog/pg_constraint_fn.h"
+#include "catalog/pg_depend.h"
 #include "catalog/pg_operator.h"
 #include "catalog/pg_opclass.h"
 #include "catalog/pg_tablespace.h"
@@ -726,6 +727,7 @@ index_create(Relation heapRelation,
     Oid         namespaceId;
     int         i;
     char        relpersistence;
+    char        index_relkind;
 
     is_exclusion = (indexInfo->ii_ExclusionOps != NULL);
 
@@ -843,6 +845,10 @@ index_create(Relation heapRelation,
         }
     }
 
+    index_relkind =
+        (heapRelation->rd_rel->relkind != RELKIND_PARTITIONED_TABLE) ?
+        RELKIND_INDEX : RELKIND_LOCAL_INDEX;
+
     /*
      * create the index relation's relcache entry and physical disk file. (If
      * we fail further down, it's the smgr's responsibility to remove the disk
@@ -854,7 +860,7 @@ index_create(Relation heapRelation,
                                 indexRelationId,
                                 relFileNode,
                                 indexTupDesc,
-                                RELKIND_INDEX,
+                                index_relkind,
                                 relpersistence,
                                 shared_relation,
                                 mapped_relation,
@@ -1548,10 +1554,14 @@ index_drop(Oid indexId, bool concurrent)
         TransferPredicateLocksToHeapRelation(userIndexRelation);
     }
 
-    /*
-     * Schedule physical removal of the files
-     */
-    RelationDropStorage(userIndexRelation);
+    if (userIndexRelation->rd_rel->relkind != RELKIND_LOCAL_INDEX)
+    {
+
+        /*
+         * Schedule physical removal of the files
+         */
+        RelationDropStorage(userIndexRelation);
+    }
 
     /*
      * Close and flush the index's relcache entry, to ensure relcache doesn't
@@ -3300,6 +3310,109 @@ IndexGetRelation(Oid indexId, bool missing_ok)
 }
 
 /*
+ * Find all leaf indexes included into local index with 'indexId' oid and lock
+ * all dependent indexes and respective relations.
+ *
+ * Search is performed in pg_depend table since all indexes belonging to child
+ * tables depends on index from parent table.
+ *
+ * indexId: the oid of local index whose leaf indexes need to find
+ * result: list of result leaf indexes
+ * depRel: already opened pg_depend relation
+ * indexLockmode: lockmode for indexes' locks
+ * heapLockmode: lockmode for relations' locks
+ */
+static void
+findDepedentLeafIndexes(Oid indexId, List **result, Relation depRel,
+                        LOCKMODE indexLockmode, LOCKMODE heapLockmode)
+{
+    ScanKeyData key[3];
+    int         nkeys;
+    SysScanDesc scan;
+    HeapTuple   tup;
+    List       *localSubIndexIds = NIL;
+    ListCell   *lc;
+
+    ScanKeyInit(&key[0],
+                Anum_pg_depend_refclassid,
+                BTEqualStrategyNumber, F_OIDEQ,
+                ObjectIdGetDatum(RelationRelationId));
+    ScanKeyInit(&key[1],
+                Anum_pg_depend_refobjid,
+                BTEqualStrategyNumber, F_OIDEQ,
+                ObjectIdGetDatum(indexId));
+    nkeys = 2;
+
+    scan = systable_beginscan(depRel, DependReferenceIndexId, true,
+                              NULL, nkeys, key);
+
+    while (HeapTupleIsValid(tup = systable_getnext(scan)))
+    {
+        Form_pg_depend foundDep = (Form_pg_depend) GETSTRUCT(tup);
+        Relation    index;
+
+        if (foundDep->classid != RelationRelationId)
+            continue;
+
+        /* Open and lock child index */
+        index = relation_open(foundDep->objid, indexLockmode);
+
+        /* Lock relation */
+        LockRelationOid(IndexGetRelation(index->rd_id, false), heapLockmode);
+
+        if (index->rd_rel->relkind == RELKIND_INDEX)
+            *result = lappend_oid(*result, index->rd_id);
+        else if (index->rd_rel->relkind == RELKIND_LOCAL_INDEX)
+            localSubIndexIds = lappend_oid(localSubIndexIds, index->rd_id);
+
+        relation_close(index, NoLock);
+    }
+
+    systable_endscan(scan);
+
+    /* Iterate thorugh local subindexes to extract their leaf indexes */
+    foreach(lc, localSubIndexIds)
+    {
+        findDepedentLeafIndexes(lfirst_oid(lc), result, depRel, indexLockmode,
+                                heapLockmode);
+    }
+}
+
+/*
+ * Reindex all real indexes included into local index with 'parent_index_id' oid
+ */
+static void
+reindex_local_index(Oid parent_index_id, bool skip_constraint_checks,
+                    char persistence, int options)
+{
+    List       *leaf_indexes = NIL;
+    ListCell   *lc;
+    Relation    deprel;
+
+    /*
+     * We open pg_depend just once and passing the Relation pointer down to all
+     * the recursive searching of leaf indexes steps.
+     */
+    deprel = heap_open(DependRelationId, AccessShareLock);
+
+    /*
+     * Extract all leaf indexes, and lock all indexes belonging with parent
+     * local index using AccessExclusive lock and corresponding relations using
+     * Share lock
+     */
+    findDepedentLeafIndexes(parent_index_id, &leaf_indexes, deprel,
+                            AccessExclusiveLock, ShareLock);
+
+    foreach(lc, leaf_indexes)
+    {
+        reindex_index(lfirst_oid(lc), skip_constraint_checks, persistence,
+                      options);
+    }
+
+    heap_close(deprel, AccessShareLock);
+}
+
+/*
  * reindex_index - This routine is used to recreate a single index
  */
 void
@@ -3338,6 +3451,19 @@ reindex_index(Oid indexId, bool skip_constraint_checks, char persistence,
                  errmsg("cannot reindex temporary tables of other sessions")));
 
     /*
+     * Reindex local index belonging to partitioned table
+     */
+    if (heapRelation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+    {
+        index_close(iRel, NoLock);
+        heap_close(heapRelation, NoLock);
+
+        reindex_local_index(indexId, skipped_constraint, persistence, options);
+
+        return;
+    }
+
+    /*
      * Also check for active uses of the index in the current transaction; we
      * don't want to reindex underneath an open indexscan.
      */
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 9618032..8bbe3d0 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -23,7 +23,9 @@
 #include "catalog/catalog.h"
 #include "catalog/index.h"
 #include "catalog/indexing.h"
+#include "catalog/partition.h"
 #include "catalog/pg_am.h"
+#include "catalog/pg_inherits_fn.h"
 #include "catalog/pg_opclass.h"
 #include "catalog/pg_opfamily.h"
 #include "catalog/pg_tablespace.h"
@@ -283,6 +285,30 @@ CheckIndexCompatible(Oid oldId,
     return ret;
 }
 
+#define PUSH_REL_PARTITION_OIDS(rel, part_oids, rel_index_oid, \
+                                parent_index_oids) \
+    do\
+    {\
+        if (RelationGetPartitionDesc((rel)))\
+        {\
+            int i;\
+            for (i = 0; i < (rel)->rd_partdesc->nparts; ++i)\
+            {\
+                (part_oids) = lcons_oid((rel)->rd_partdesc->oids[i],\
+                                        (part_oids));\
+                (parent_index_oids) = lcons_oid((rel_index_oid),\
+                                                (parent_index_oids));\
+            }\
+        }\
+    } while(0)
+
+#define POP_REL_PARTITION_OIDS(part_oids, parent_index_oids) \
+    do\
+    {\
+        (part_oids) = list_delete_first((part_oids));\
+        (parent_index_oids) = list_delete_first((parent_index_oids));\
+    } while(0)
+
 /*
  * DefineIndex
  *      Creates a new index.
@@ -372,7 +398,8 @@ DefineIndex(Oid relationId,
     namespaceId = RelationGetNamespace(rel);
 
     if (rel->rd_rel->relkind != RELKIND_RELATION &&
-        rel->rd_rel->relkind != RELKIND_MATVIEW)
+        rel->rd_rel->relkind != RELKIND_MATVIEW &&
+        rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
     {
         if (rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
 
@@ -384,11 +411,6 @@ DefineIndex(Oid relationId,
                     (errcode(ERRCODE_WRONG_OBJECT_TYPE),
                      errmsg("cannot create index on foreign table \"%s\"",
                             RelationGetRelationName(rel))));
-        else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-            ereport(ERROR,
-                    (errcode(ERRCODE_WRONG_OBJECT_TYPE),
-                     errmsg("cannot create index on partitioned table \"%s\"",
-                            RelationGetRelationName(rel))));
         else
             ereport(ERROR,
                     (errcode(ERRCODE_WRONG_OBJECT_TYPE),
@@ -396,6 +418,12 @@ DefineIndex(Oid relationId,
                             RelationGetRelationName(rel))));
     }
 
+    if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE && stmt->concurrent)
+        ereport(ERROR,
+                (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+                 errmsg("cannot create local index on partitioned table \"%s\" concurrently",
+                        RelationGetRelationName(rel))));
+
     /*
      * Don't try to CREATE INDEX on temp tables of other backends.
      */
@@ -660,7 +688,8 @@ DefineIndex(Oid relationId,
                      coloptions, reloptions, stmt->primary,
                      stmt->isconstraint, stmt->deferrable, stmt->initdeferred,
                      allowSystemTableMods,
-                     skip_build || stmt->concurrent,
+                     skip_build || stmt->concurrent ||
+                     rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE,
                      stmt->concurrent, !check_rights,
                      stmt->if_not_exists);
 
@@ -677,8 +706,89 @@ DefineIndex(Oid relationId,
         CreateComments(indexRelationId, RelationRelationId, 0,
                        stmt->idxcomment);
 
+    /*
+     * Create local index on partitioned table that comes down to creating of
+     * indexes on child relations using depth-first traversal
+     */
+    if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+    {
+
+        List       *part_oids = NIL,          /* stack for concerned partion oids */
+                   *parent_index_oids = NIL;  /* stack for corresponding parent
+                                                 index oids */
+
+        Assert(!stmt->concurrent);
+
+        /*
+         * Initially push child oids of current relation and related
+         * parent index oids
+         */
+        PUSH_REL_PARTITION_OIDS(rel, part_oids, indexRelationId,
+                                parent_index_oids);
+
+        while (list_length(part_oids) > 0)
+        {
+            Relation        childrel;
+            Oid             parent_index_oid,
+                            child_index_oid;
+            ObjectAddress   index_address,
+                            parent_index_address;
+            char           *child_index_name;
+
+            /* Extract top child relation and related parent index oid from stacks */
+            childrel = relation_open(linitial_oid(part_oids), lockmode);
+            parent_index_oid = linitial_oid(parent_index_oids);
+
+            /* Choose name for child index */
+            child_index_name =
+                ChooseIndexName(RelationGetRelationName(childrel),
+                                namespaceId, indexColNames,
+                                stmt->excludeOpNames, stmt->primary,
+                                stmt->isconstraint);
+
+            /* Create index for child node */
+            child_index_oid =
+                index_create(childrel, child_index_name, InvalidOid,
+                             InvalidOid, indexInfo, indexColNames,
+                             accessMethodId, tablespaceId,
+                             collationObjectId, classObjectId,
+                             coloptions, reloptions, stmt->primary,
+                             stmt->isconstraint, stmt->deferrable,
+                             stmt->initdeferred, allowSystemTableMods,
+                             skip_build || childrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE,
+                             stmt->concurrent, !check_rights,
+                             stmt->if_not_exists);
+
+            /* Pop current reloid and related parent index oid from stacks */
+            POP_REL_PARTITION_OIDS(part_oids, parent_index_oids);
+
+            /*
+             * Push new childs of current child relation and
+             * related parent indexes to stacks
+             */
+            PUSH_REL_PARTITION_OIDS(childrel, part_oids, child_index_oid,
+                                    parent_index_oids);
+
+            /* Release relcache entry from childrel */
+            relation_close(childrel, NoLock);
+
+            /*
+             * Add entry to pg_depend to specify dependancy child index
+             * from parent one
+             */
+            ObjectAddressSet(index_address, RelationRelationId,
+                             child_index_oid);
+            ObjectAddressSet(parent_index_address, RelationRelationId,
+                             parent_index_oid);
+            recordDependencyOn(&index_address, &parent_index_address,
+                               DEPENDENCY_NORMAL);
+        }
+    }
+
+
     if (!stmt->concurrent)
     {
+
         /* Close the heap and we're done, in the non-concurrent case */
         heap_close(rel, NoLock);
         return address;
@@ -1800,7 +1910,7 @@ RangeVarCallbackForReindexIndex(const RangeVar *relation,
     relkind = get_rel_relkind(relId);
     if (!relkind)
         return;
-    if (relkind != RELKIND_INDEX)
+    if (relkind != RELKIND_INDEX && relkind != RELKIND_LOCAL_INDEX)
         ereport(ERROR,
                 (errcode(ERRCODE_WRONG_OBJECT_TYPE),
                  errmsg("\"%s\" is not an index", relation->relname)));
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 3b28e8c..da92211 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1113,9 +1113,13 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid,
      * It chooses RELKIND_RELATION for both regular and partitioned tables.
      * That means we must be careful before giving the wrong type error when
      * the relation is RELKIND_PARTITIONED_TABLE.
+     *
+     * Similar statements hold for RELKIND_LOCAL_INDEX and RELKIND_INDEX.
      */
     if (classform->relkind == RELKIND_PARTITIONED_TABLE)
         expected_relkind = RELKIND_RELATION;
+    else if (classform->relkind == RELKIND_LOCAL_INDEX)
+        expected_relkind = RELKIND_INDEX;
     else
         expected_relkind = classform->relkind;
 
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index d1d493e..21af6fa 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -159,6 +159,8 @@ DESCR("");
 
 #define       RELKIND_RELATION        'r'       /* ordinary table */
 #define       RELKIND_INDEX           'i'       /* secondary index */
+#define       RELKIND_LOCAL_INDEX     'l'       /* local index for
+                                                   partitioned table */
 #define       RELKIND_SEQUENCE        'S'       /* sequence object */
 #define       RELKIND_TOASTVALUE      't'       /* for out-of-line values */
 #define       RELKIND_VIEW            'v'       /* view */
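The depth-first traversal that the patch performs in DefineIndex with two parallel lists can be sketched in isolation as follows. This is a simplified, stand-alone model and not the patch itself: relations are plain array positions instead of OIDs, fixed-size arrays replace the List API, and every name (DemoRel, demo_push_partitions, demo_create_local_index, demo_example) is hypothetical.

```c
#include <assert.h>

#define DEMO_MAX_RELS  16
#define DEMO_MAX_PARTS 4

/* Hypothetical stand-in for a relation together with its partition list. */
typedef struct DemoRel
{
    int nparts;
    int parts[DEMO_MAX_PARTS];  /* positions of the partitions in the rels array */
} DemoRel;

/* Push every partition of 'rel' and, in parallel, 'rel' as the parent index owner. */
int
demo_push_partitions(const DemoRel *rels, int rel,
                     int *rel_stack, int *parent_stack, int *top)
{
    int i;

    for (i = 0; i < rels[rel].nparts; i++)
    {
        if (*top == DEMO_MAX_RELS)
            return -1;          /* stacks are full */
        rel_stack[*top] = rels[rel].parts[i];
        parent_stack[*top] = rel;
        (*top)++;
    }
    return 0;
}

/*
 * "Create" an index on 'root' and on every relation below it.  For each
 * relation r, index_parent[r] receives the relation whose index is the
 * parent of r's index (-1 for the root).  Returns the number of indexes
 * created, or -1 if the fixed-size stacks overflow.
 */
int
demo_create_local_index(const DemoRel *rels, int root, int *index_parent)
{
    int rel_stack[DEMO_MAX_RELS];
    int parent_stack[DEMO_MAX_RELS];
    int top = 0;
    int created = 1;            /* the index on the root itself */

    index_parent[root] = -1;
    if (demo_push_partitions(rels, root, rel_stack, parent_stack, &top) < 0)
        return -1;

    while (top > 0)
    {
        int child;
        int parent;

        /* Pop the top relation together with its parent index owner */
        top--;
        child = rel_stack[top];
        parent = parent_stack[top];

        /* Stands in for index_create() plus recordDependencyOn() */
        index_parent[child] = parent;
        created++;

        /* A sub-partitioned child contributes its own partitions */
        if (demo_push_partitions(rels, child, rel_stack, parent_stack, &top) < 0)
            return -1;
    }
    return created;
}

/* Partition tree from the attached test script: test -> test_0, test_1 -> test_1_0, test_1_1. */
int
demo_example(int *index_parent)
{
    static const DemoRel rels[5] = {
        {2, {1, 2}},            /* 0: test */
        {0, {0}},               /* 1: test_0 */
        {2, {3, 4}},            /* 2: test_1 */
        {0, {0}},               /* 3: test_1_0 */
        {0, {0}},               /* 4: test_1_1 */
    };

    return demo_create_local_index(rels, 0, index_parent);
}
```

Keeping the parent index identifier on a second stack next to each pending relation means the dependency on the parent index can be recorded at the moment the child index is created, without any later lookup.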
local_index.sql
Description: application/sql
create table test (a int, b int) partition by range (a);
create table test_0 partition of test for values from (0) to (2);
create table test_1 partition of test for values from (2) to (4) partition by list (a);
create table test_1_0 partition of test_1 for values in (2);
create table test_1_1 partition of test_1 for values in (3);

-- Test recursive simple index creation --
create index on test (b);
select indexrelid::regclass, indrelid::regclass from pg_index where indrelid::regclass::text like 'test%' order by indexrelid::regclass::text;
   indexrelid   | indrelid
----------------+----------
 test_0_b_idx   | test_0
 test_1_0_b_idx | test_1_0
 test_1_1_b_idx | test_1_1
 test_1_b_idx   | test_1
 test_b_idx     | test
(5 rows)

select objid::regclass, refobjid::regclass from pg_depend where refobjid::regclass::text like 'test%_idx' order by objid::regclass::text;
     objid      |   refobjid
----------------+--------------
 test_0_b_idx   | test_b_idx
 test_1_0_b_idx | test_1_b_idx
 test_1_1_b_idx | test_1_b_idx
 test_1_b_idx   | test_b_idx
(4 rows)

-- Test index usage in SELECT query --
insert into test select i%4, i from generate_series(1, 1000) i;
set enable_seqscan to off;
analyze test;
explain (costs off) select * from test where a=1 and b=100;
                  QUERY PLAN
-----------------------------------------------
 Append
   ->  Index Scan using test_0_b_idx on test_0
         Index Cond: (b = 100)
         Filter: (a = 1)
(4 rows)

-- Test recursive index dropping --
drop index test_b_idx;
ERROR:  cannot drop relation test_b_idx because other objects depend on it
DETAIL:  relation test_1_b_idx depends on relation test_b_idx
index test_1_1_b_idx depends on relation test_1_b_idx
index test_1_0_b_idx depends on relation test_1_b_idx
index test_0_b_idx depends on relation test_b_idx
HINT:  Use DROP ... CASCADE to drop the dependent objects too.

drop index test_0_b_idx;
select indexrelid::regclass, indrelid::regclass from pg_index where indrelid::regclass::text like 'test%' order by indexrelid::regclass::text;
   indexrelid   | indrelid
----------------+----------
 test_1_0_b_idx | test_1_0
 test_1_1_b_idx | test_1_1
 test_1_b_idx   | test_1
 test_b_idx     | test
(4 rows)

select objid::regclass, refobjid::regclass from pg_depend where refobjid::regclass::text like 'test%_idx' order by objid::regclass::text;
     objid      |   refobjid
----------------+--------------
 test_1_0_b_idx | test_1_b_idx
 test_1_1_b_idx | test_1_b_idx
 test_1_b_idx   | test_b_idx
(3 rows)

drop index test_b_idx cascade;
NOTICE:  drop cascades to 3 other objects
DETAIL:  drop cascades to relation test_1_b_idx
drop cascades to index test_1_1_b_idx
drop cascades to index test_1_0_b_idx
select indexrelid::regclass, indrelid::regclass from pg_index where indrelid::regclass::text like 'test%' order by indexrelid::regclass::text;
 indexrelid | indrelid
------------+----------
(0 rows)

select objid::regclass, refobjid::regclass from pg_depend where refobjid::regclass::text like 'test%_idx' order by objid::regclass::text;
 objid | refobjid
-------+----------
(0 rows)

-- Test creating of naming index --
create index local_idx_on_test on test (b);
select indexrelid::regclass, indrelid::regclass from pg_index where indrelid::regclass::text like 'test%' order by indexrelid::regclass::text;
    indexrelid     | indrelid
-------------------+----------
 local_idx_on_test | test
 test_0_b_idx      | test_0
 test_1_0_b_idx    | test_1_0
 test_1_1_b_idx    | test_1_1
 test_1_b_idx      | test_1
(5 rows)

-- Test reindex --
reindex index local_idx_on_test;
reindex table test;
reindex schema public;

drop table test cascade;