Re: [HACKERS] Proposal: Local indexes for partitioned table
Alvaro Herrera writes: > Tom Lane wrote: >> It might work to build the new key in a context that's initially a >> child of CurrentMemoryContext, then reparent it to be a child of >> CacheMemoryContext when done. > That's another way (than the PG_TRY block), but I think it's more > complicated with no gain. I disagree: PG_TRY blocks are pretty expensive. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: Local indexes for partitioned table
> Robert Haas writes: > > We could do that, but the motivation for the current system was to > > avoid leaking memory in a long-lived context. Yeah, my approach here is to use a CATCH block that deletes the memory context just created, thus avoiding a long-lived leak. Tom Lane wrote: > Another key point is to avoid leaving a corrupted relcache entry behind > if you fail partway through. Sure ... in the code as I have it we only assign the local variable to the relcache entry if everything is succesful. So no relcache corruption should result. > It might work to build the new key in a context that's initially a > child of CurrentMemoryContext, then reparent it to be a child of > CacheMemoryContext when done. That's another way (than the PG_TRY block), but I think it's more complicated with no gain. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: Local indexes for partitioned table
Robert Haas writes: > On Fri, Oct 27, 2017 at 7:27 PM, Alvaro Herrera > wrote: >> I noticed that RelationBuildPartitionKey is generating a partition key >> in a temp context, then creating a private context and copying the key >> into that. That seems leftover from some previous iteration of some >> other patch; I think it's pretty reasonable to create the new context >> right from the start and allocate the key there directly instead. Then >> there's no need for copy_partition_key at all. > We could do that, but the motivation for the current system was to > avoid leaking memory in a long-lived context. I think the same > technique is used elsewhere for similar reasons. I admit I haven't > checked whether there would actually be a leak here if we did it as > you propose, but I wouldn't find it at all surprising. Another key point is to avoid leaving a corrupted relcache entry behind if you fail partway through. I think that the current coding is RelationBuildPartitionKey is safe, although it's far too undercommented for my taste. (The ordering of the last few steps is *critical*.) It might work to build the new key in a context that's initially a child of CurrentMemoryContext, then reparent it to be a child of CacheMemoryContext when done. But you'd have to look at whether that would end up wasting long-lived memory, if the build process generates any cruft in the same context. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: Local indexes for partitioned table
On Fri, Oct 27, 2017 at 7:27 PM, Alvaro Herrera wrote: > I noticed that RelationBuildPartitionKey is generating a partition key > in a temp context, then creating a private context and copying the key > into that. That seems leftover from some previous iteration of some > other patch; I think it's pretty reasonable to create the new context > right from the start and allocate the key there directly instead. Then > there's no need for copy_partition_key at all. We could do that, but the motivation for the current system was to avoid leaking memory in a long-lived context. I think the same technique is used elsewhere for similar reasons. I admit I haven't checked whether there would actually be a leak here if we did it as you propose, but I wouldn't find it at all surprising. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: Local indexes for partitioned table
Alvaro Herrera wrote: > I'm now working on the ability to build unique indexes (and unique > constraints) on top of this. So I think there's not a lot of additional code required to support unique indexes with the restrictions mentioned; proof-of-concept (with several holes still) attached. As before, this is not finished, as there a few things that are wrong (such as generateClonedIndexStmt taking a RangeVar), and others that I don't understand (such as why is rd_partkey NULL for partitioned partitions when DefineIndex cascades to them and the columns are checked). I noticed that RelationBuildPartitionKey is generating a partition key in a temp context, then creating a private context and copying the key into that. That seems leftover from some previous iteration of some other patch; I think it's pretty reasonable to create the new context right from the start and allocate the key there directly instead. Then there's no need for copy_partition_key at all. Anyway, here is a preliminary submission before I close shop for the week. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >From 37e683e352df5f3e6c110d94881abe888e36953e Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Mon, 16 Oct 2017 12:01:12 +0200 Subject: [PATCH v2 1/7] Tweak index_create/index_constraint_create APIs I was annoyed by index_create and index_constraint_create respective APIs. It's too easy to get confused when adding a new behavioral flag given the large number of boolean flags they already have. Turning them into a flags bitmask makes that code easier to read, as in the attached patch. I split the flags in two -- one set for index_create directly and another related to constraints. index_create() itself receives both, and then passes down the constraint one to index_constraint_create. It is shorter in LOC terms to create a single set mixing all the values, but it seemed to make more sense this way. Also, I chose not to turn the booleans 'allow_sytem_table_mods' and 'is_internal' into flag bits because of the different way they interact with this code. --- src/backend/catalog/index.c | 101 +++ src/backend/catalog/toasting.c | 3 +- src/backend/commands/indexcmds.c | 33 + src/backend/commands/tablecmds.c | 13 +++-- src/include/catalog/index.h | 29 ++- 5 files changed, 100 insertions(+), 79 deletions(-) diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c index c7b2f031f0..17faeffada 100644 --- a/src/backend/catalog/index.c +++ b/src/backend/catalog/index.c @@ -680,19 +680,25 @@ UpdateIndexRelation(Oid indexoid, * classObjectId: array of index opclass OIDs, one per index column * coloptions: array of per-index-column indoption settings * reloptions: AM-specific options - * isprimary: index is a PRIMARY KEY - * isconstraint: index is owned by PRIMARY KEY, UNIQUE, or EXCLUSION constraint - * deferrable: constraint is DEFERRABLE - * initdeferred: constraint is INITIALLY DEFERRED + * flags: bitmask that can include any combination of these bits: + * INDEX_CREATE_IS_PRIMARY + * the index is a primary key + * INDEX_CREATE_ADD_CONSTRAINT: + * invoke index_constraint_create also + * INDEX_CREATE_SKIP_BUILD: + * skip the index_build() step for the moment; caller must do it + * later (typically via reindex_index()) + * INDEX_CREATE_CONCURRENT: + * do not lock the table against writers. The index will be + * marked "invalid" and the caller must take additional steps + * to fix it up. + * INDEX_CREATE_IF_NOT_EXISTS: + * do not throw an error if a relation with the same name + * already exists. + * constr_flags: flags passed to index_constraint_create + * (only if INDEX_CREATE_ADD_CONSTRAINT is set) * allow_system_table_mods: allow table to be a system catalog - * skip_build: true to skip the index_build() step for the moment; caller - * must do it later (typically via reindex_index()) - * concurrent: if true, do not lock the table against writers. The index - * will be marked "invalid" and the caller must take additional steps - * to fix it up. * is_internal: if true, post creation hook for new index - * if_not_exists: if true, do not throw an error if a relation with - * the same name already exists. * * Returns the OID of the created index. */ @@ -709,15 +715,10 @@ index_create(Relation heapRelation, Oid *classObjectId, int16 *coloptions, Datum reloptions, -bool isprimary, -bool isconstraint, -
Re: [HACKERS] Proposal: Local indexes for partitioned table
On 2017/10/24 1:15, Alvaro Herrera wrote: > Robert Haas wrote: >> On Mon, Oct 23, 2017 at 11:12 AM, Alvaro Herrera >> wrote: >>> I started with Maksim's submitted code, and developed according to the >>> ideas discussed in this thread. Attached is a very WIP patch series for >>> this feature. Nice! >>> Many things remain to be done before this is committable: pg_dump >>> support needs to be written. ALTER INDEX ATTACH/DETACH not yet >>> implemented. No REINDEX support yet. Docs not updated (but see the >>> regression test as a guide for how this is supposed to work; see patch >>> 0005). CREATE INDEX CONCURRENTLY not done yet. >>> >>> I'm now working on the ability to build unique indexes (and unique >>> constraints) on top of this. >> >> Cool. Are you planning to do that by (a) only allowing the special >> case where the partition key columns/expressions are included in the >> indexed columns/expressions, (b) trying to make every insert to any >> index check all of the indexes for uniqueness conflicts, or (c) >> implementing global indexes? Because (b) sounds complex - think about >> attach operations, for example - and (c) sounds super-hard. I'd >> suggest doing (a) first, just on the basis of complexity. > > Yes, I think (a) is a valuable thing to have -- not planning on doing > (c) at all because I fear it'll be a huge time sink. I'm not sure about > (b), but it's not currently on my plan. +1 to proceeding with (a) first. >> I hope that you don't get so involved in making this unique index >> stuff work that we don't get the cascading index feature, at least, >> committed to v11. That's already a considerable step forward in terms >> of ease of use, and I'd really like to have it. > > Absolutely -- I do plan to get this one finished regardless of unique > indexes. +1 Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: Local indexes for partitioned table
Robert Haas wrote: > On Mon, Oct 23, 2017 at 11:12 AM, Alvaro Herrera > wrote: > > I started with Maksim's submitted code, and developed according to the > > ideas discussed in this thread. Attached is a very WIP patch series for > > this feature. > > > > Many things remain to be done before this is committable: pg_dump > > support needs to be written. ALTER INDEX ATTACH/DETACH not yet > > implemented. No REINDEX support yet. Docs not updated (but see the > > regression test as a guide for how this is supposed to work; see patch > > 0005). CREATE INDEX CONCURRENTLY not done yet. > > > > I'm now working on the ability to build unique indexes (and unique > > constraints) on top of this. > > Cool. Are you planning to do that by (a) only allowing the special > case where the partition key columns/expressions are included in the > indexed columns/expressions, (b) trying to make every insert to any > index check all of the indexes for uniqueness conflicts, or (c) > implementing global indexes? Because (b) sounds complex - think about > attach operations, for example - and (c) sounds super-hard. I'd > suggest doing (a) first, just on the basis of complexity. Yes, I think (a) is a valuable thing to have -- not planning on doing (c) at all because I fear it'll be a huge time sink. I'm not sure about (b), but it's not currently on my plan. > I hope that you don't get so involved in making this unique index > stuff work that we don't get the cascading index feature, at least, > committed to v11. That's already a considerable step forward in terms > of ease of use, and I'd really like to have it. Absolutely -- I do plan to get this one finished regardless of unique indexes. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: Local indexes for partitioned table
On Mon, Oct 23, 2017 at 11:12 AM, Alvaro Herrera wrote: > I started with Maksim's submitted code, and developed according to the > ideas discussed in this thread. Attached is a very WIP patch series for > this feature. > > Many things remain to be done before this is committable: pg_dump > support needs to be written. ALTER INDEX ATTACH/DETACH not yet > implemented. No REINDEX support yet. Docs not updated (but see the > regression test as a guide for how this is supposed to work; see patch > 0005). CREATE INDEX CONCURRENTLY not done yet. > > I'm now working on the ability to build unique indexes (and unique > constraints) on top of this. Cool. Are you planning to do that by (a) only allowing the special case where the partition key columns/expressions are included in the indexed columns/expressions, (b) trying to make every insert to any index check all of the indexes for uniqueness conflicts, or (c) implementing global indexes? Because (b) sounds complex - think about attach operations, for example - and (c) sounds super-hard. I'd suggest doing (a) first, just on the basis of complexity. I hope that you don't get so involved in making this unique index stuff work that we don't get the cascading index feature, at least, committed to v11. That's already a considerable step forward in terms of ease of use, and I'd really like to have it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: Local indexes for partitioned table
Hello I started with Maksim's submitted code, and developed according to the ideas discussed in this thread. Attached is a very WIP patch series for this feature. Many things remain to be done before this is committable: pg_dump support needs to be written. ALTER INDEX ATTACH/DETACH not yet implemented. No REINDEX support yet. Docs not updated (but see the regression test as a guide for how this is supposed to work; see patch 0005). CREATE INDEX CONCURRENTLY not done yet. I'm now working on the ability to build unique indexes (and unique constraints) on top of this. The docs have not been updated yet, but the new regression test file illustrates how this works. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >From 8e8bf2f587188859eba1f61a44ee9ee08d960f51 Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Mon, 16 Oct 2017 12:01:12 +0200 Subject: [PATCH v1 1/5] Tweak index_create/index_constraint_create APIs I was annoyed by index_create and index_constraint_create respective APIs. It's too easy to get confused when adding a new behavioral flag given the large number of boolean flags they already have. Turning them into a flags bitmask makes that code easier to read, as in the attached patch. I split the flags in two -- one set for index_create directly and another related to constraints. index_create() itself receives both, and then passes down the constraint one to index_constraint_create. It is shorter in LOC terms to create a single set mixing all the values, but it seemed to make more sense this way. Also, I chose not to turn the booleans 'allow_sytem_table_mods' and 'is_internal' into flag bits because of the different way they interact with this code. --- src/backend/catalog/index.c | 101 +++ src/backend/catalog/toasting.c | 3 +- src/backend/commands/indexcmds.c | 33 + src/backend/commands/tablecmds.c | 13 +++-- src/include/catalog/index.h | 29 ++- 5 files changed, 100 insertions(+), 79 deletions(-) diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c index c7b2f031f0..17faeffada 100644 --- a/src/backend/catalog/index.c +++ b/src/backend/catalog/index.c @@ -680,19 +680,25 @@ UpdateIndexRelation(Oid indexoid, * classObjectId: array of index opclass OIDs, one per index column * coloptions: array of per-index-column indoption settings * reloptions: AM-specific options - * isprimary: index is a PRIMARY KEY - * isconstraint: index is owned by PRIMARY KEY, UNIQUE, or EXCLUSION constraint - * deferrable: constraint is DEFERRABLE - * initdeferred: constraint is INITIALLY DEFERRED + * flags: bitmask that can include any combination of these bits: + * INDEX_CREATE_IS_PRIMARY + * the index is a primary key + * INDEX_CREATE_ADD_CONSTRAINT: + * invoke index_constraint_create also + * INDEX_CREATE_SKIP_BUILD: + * skip the index_build() step for the moment; caller must do it + * later (typically via reindex_index()) + * INDEX_CREATE_CONCURRENT: + * do not lock the table against writers. The index will be + * marked "invalid" and the caller must take additional steps + * to fix it up. + * INDEX_CREATE_IF_NOT_EXISTS: + * do not throw an error if a relation with the same name + * already exists. + * constr_flags: flags passed to index_constraint_create + * (only if INDEX_CREATE_ADD_CONSTRAINT is set) * allow_system_table_mods: allow table to be a system catalog - * skip_build: true to skip the index_build() step for the moment; caller - * must do it later (typically via reindex_index()) - * concurrent: if true, do not lock the table against writers. The index - * will be marked "invalid" and the caller must take additional steps - * to fix it up. * is_internal: if true, post creation hook for new index - * if_not_exists: if true, do not throw an error if a relation with - * the same name already exists. * * Returns the OID of the created index. */ @@ -709,15 +715,10 @@ index_create(Relation heapRelation, Oid *classObjectId, int16 *coloptions, Datum reloptions, -bool isprimary, -bool isconstraint, -bool deferrable, -bool initdeferred, +uint16 flags, +uint16 constr_flags, bool allow_system_table_mods, -bool skip_build, -bool concurrent, -bool is_internal, -bool if_not_exis
Re: [HACKERS] Proposal: Local indexes for partitioned table
On Tue, Oct 10, 2017 at 11:22 AM, Alvaro Herrera wrote: > Hmm ... yeah, ATTACH and DETACH sound acceptable to me. On DETACH, the > abstract index should be marked indisvalid=false unless a substitute > index already exists; and on ATTACH when indisvalid=false we verify that > all local indexes exist, and if so we can flip indisvalid. That way, we > can continue to rely on the parent index always having all its children > when the flag is set. True. I don't see an immediate use for that guarantee, actually, because the index can't be "scanned" either way. But it might be useful someday, if for example we wanted to try plan large numbers of children symmetrically rather than (as we do currently) planning each one individually, or if we've got an idea about making foreign keys work. I think you could ignore this for now, though, if you want. > I'm not clear on a syntax that creates the main index and hopes to later > have the sub-indexes created. Another approach is to do it the other > way around, i.e. create the children first, then once they're all in > place create the main one normally, which merely verifies that all the > requisite children exist. This is related to what I proposed to occur > when a regular table is joined as a partition of the partitioned table: > we run a verification that an index matching the parent's abstract > indexes exists, and if not we raise an error. (Alternatively we could > allow the case, and mark the abstract index as indisvalid=false, but > that seems to violate POLA). It's not much different than pg_catalog.binary_upgrade_create_empty_extension but I don't really care which way pg_dump emits it. >> Another thing that would let you do is CREATE INDEX CONCURRENTLY >> replacement_concrete_index; ALTER INDEX abstract_index DETACH >> PARTITION old_concrete_index, ATTACH PARTITION >> replacement_concrete_index; DROP INDEX CONCURRENTLY >> old_concrete_index, which seems like a thing someone might want to do. > > Yeah, this is a point I explicitly mentioned, and this proposal seems > like a good way. Cool. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: Local indexes for partitioned table
Robert Haas wrote: > On Fri, Oct 6, 2017 at 12:37 PM, Alvaro Herrera > wrote: > > 2. create one index for each existing partition. These would be > >identical to what would happen if you created the index directly on > >each partition, except that there is an additional dependency to the > >parent's abstract index. > > One thing I'm a bit worried about is how to name these subordinate > indexes. They have to have names because that's how pg_class works, > and those names can't all be the same, again because that's how > pg_class works. There's no problem right away when you first create > the partitioned index, because you can just pick names out of a hat > using whatever name-generating algorithm seems best. However, when > you dump-and-restore (including but not limited to the pg_upgrade > case) you've got to preserve those names. If you just generate a new > name that may or may not be the same as the old one, then it may > collide with a user-specified name that only occurs later in the dump. > Also, you'll have trouble if the user has applied a COMMENT or a > SECURITY LABEL to the index because that command works by name, or if > the user has a reference to the index name inside a function or > whatever. > > These are pretty annoying corner-case bugs because they're not likely > to come up very often. Most people won't notice or care if the index > name changes. But I don't think it's acceptable to just ignore the > problem. I agree it's a problem that needs to be addressed directly. > An idea I had was to treat the abstract index - to use your > term - sort of the way we treat an extension. Normally, when you > create an index on a partitioned table, it cascades, but for dump and > restore purpose, we tag on some syntax that says "well, don't actually > create the subordinate indexes, i'll tell you about those later". > Then for each subordinate index we issue a separate CREATE INDEX > command followed by ALTER INDEX abstract_index ATTACH PARTITION > concrete_index or something of that sort. That means you can't > absolutely count on the parent index to have all of the children it's > supposed to have but maybe that's OK. Hmm ... yeah, ATTACH and DETACH sound acceptable to me. On DETACH, the abstract index should be marked indisvalid=false unless a substitute index already exists; and on ATTACH when indisvalid=false we verify that all local indexes exist, and if so we can flip indisvalid. That way, we can continue to rely on the parent index always having all its children when the flag is set. I'm not clear on a syntax that creates the main index and hopes to later have the sub-indexes created. Another approach is to do it the other way around, i.e. create the children first, then once they're all in place create the main one normally, which merely verifies that all the requisite children exist. This is related to what I proposed to occur when a regular table is joined as a partition of the partitioned table: we run a verification that an index matching the parent's abstract indexes exists, and if not we raise an error. (Alternatively we could allow the case, and mark the abstract index as indisvalid=false, but that seems to violate POLA). > Another thing that would let you do is CREATE INDEX CONCURRENTLY > replacement_concrete_index; ALTER INDEX abstract_index DETACH > PARTITION old_concrete_index, ATTACH PARTITION > replacement_concrete_index; DROP INDEX CONCURRENTLY > old_concrete_index, which seems like a thing someone might want to do. Yeah, this is a point I explicitly mentioned, and this proposal seems like a good way. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: Local indexes for partitioned table
On Sat, Oct 7, 2017 at 7:04 PM, Robert Haas wrote: > On Fri, Oct 6, 2017 at 12:37 PM, Alvaro Herrera > wrote: >> 2. create one index for each existing partition. These would be >>identical to what would happen if you created the index directly on >>each partition, except that there is an additional dependency to the >>parent's abstract index. > > One thing I'm a bit worried about is how to name these subordinate > indexes. They have to have names because that's how pg_class works, > and those names can't all be the same, again because that's how > pg_class works. There's no problem right away when you first create > the partitioned index, because you can just pick names out of a hat > using whatever name-generating algorithm seems best. However, when > you dump-and-restore (including but not limited to the pg_upgrade > case) you've got to preserve those names. If you just generate a new > name that may or may not be the same as the old one, then it may > collide with a user-specified name that only occurs later in the dump. > Also, you'll have trouble if the user has applied a COMMENT or a > SECURITY LABEL to the index because that command works by name, or if > the user has a reference to the index name inside a function or > whatever. > > These are pretty annoying corner-case bugs because they're not likely > to come up very often. Most people won't notice or care if the index > name changes. But I don't think it's acceptable to just ignore the > problem. An idea I had was to treat the abstract index - to use your > term - sort of the way we treat an extension. Normally, when you > create an index on a partitioned table, it cascades, but for dump and > restore purpose, we tag on some syntax that says "well, don't actually > create the subordinate indexes, i'll tell you about those later". > Then for each subordinate index we issue a separate CREATE INDEX > command followed by ALTER INDEX abstract_index ATTACH PARTITION > concrete_index or something of that sort. That means you can't > absolutely count on the parent index to have all of the children it's > supposed to have but maybe that's OK. +1. How about CREATE INDEX ... PARTITION OF ... FOR TABLE ...? to create the index and attach it? -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: Local indexes for partitioned table
07.10.17 16:34, Robert Haas wrote: On Fri, Oct 6, 2017 at 12:37 PM, Alvaro Herrera wrote: One thing I'm a bit worried about is how to name these subordinate indexes. They have to have names because that's how pg_class works, and those names can't all be the same, again because that's how pg_class works. There's no problem right away when you first create the partitioned index, because you can just pick names out of a hat using whatever name-generating algorithm seems best. However, when you dump-and-restore (including but not limited to the pg_upgrade case) you've got to preserve those names. If you just generate a new name that may or may not be the same as the old one, then it may collide with a user-specified name that only occurs later in the dump. Also, you'll have trouble if the user has applied a COMMENT or a SECURITY LABEL to the index because that command works by name, or if the user has a reference to the index name inside a function or whatever. These are pretty annoying corner-case bugs because they're not likely to come up very often. Most people won't notice or care if the index name changes. But I don't think it's acceptable to just ignore the problem. An idea I had was to treat the abstract index - to use your term - sort of the way we treat an extension. Normally, when you create an index on a partitioned table, it cascades, but for dump and restore purpose, we tag on some syntax that says "well, don't actually create the subordinate indexes, i'll tell you about those later". Then for each subordinate index we issue a separate CREATE INDEX command followed by ALTER INDEX abstract_index ATTACH PARTITION concrete_index or something of that sort. That means you can't absolutely count on the parent index to have all of the children it's supposed to have but maybe that's OK. AFAICS, the main problem with naming is generating new unique names for subordinate indexes on the stage of migrating data scheme (pg_dump, pg_upgrade, etc). And we cannot specify these names in the 'CREATE INDEX partitioned_index' statement therefore we have to regenerate their. In this case I propose to restore index names' hierarchy *bottom-up*, i.e. first of all create indexes for the leaf partitions and then create ones for parents up to root explicitly specifying names. When creating index on parent table we have to check is there exist any index on child table that could be child index (identical criteria). If so, not generate new index but implicitly attach that index into parent one. If we have incomplete index hierarchy, e.g. we dropped some indexes of partitions previously, then recreating of parent's index would regenerate (not attach) indexes for those partitions. We could drop those odd generated indexes after building of parent's index. This decision is not straightforward but provides to consider 'CREATE INDEX paritioned_table' statement as a cascade operation. As a result, we can specify name for each concrete index while recreating a whole hierarchy. -- Regards, Maksim Milyutin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: Local indexes for partitioned table
On Fri, Oct 6, 2017 at 12:37 PM, Alvaro Herrera wrote: > 2. create one index for each existing partition. These would be >identical to what would happen if you created the index directly on >each partition, except that there is an additional dependency to the >parent's abstract index. One thing I'm a bit worried about is how to name these subordinate indexes. They have to have names because that's how pg_class works, and those names can't all be the same, again because that's how pg_class works. There's no problem right away when you first create the partitioned index, because you can just pick names out of a hat using whatever name-generating algorithm seems best. However, when you dump-and-restore (including but not limited to the pg_upgrade case) you've got to preserve those names. If you just generate a new name that may or may not be the same as the old one, then it may collide with a user-specified name that only occurs later in the dump. Also, you'll have trouble if the user has applied a COMMENT or a SECURITY LABEL to the index because that command works by name, or if the user has a reference to the index name inside a function or whatever. These are pretty annoying corner-case bugs because they're not likely to come up very often. Most people won't notice or care if the index name changes. But I don't think it's acceptable to just ignore the problem. An idea I had was to treat the abstract index - to use your term - sort of the way we treat an extension. Normally, when you create an index on a partitioned table, it cascades, but for dump and restore purpose, we tag on some syntax that says "well, don't actually create the subordinate indexes, i'll tell you about those later". Then for each subordinate index we issue a separate CREATE INDEX command followed by ALTER INDEX abstract_index ATTACH PARTITION concrete_index or something of that sort. That means you can't absolutely count on the parent index to have all of the children it's supposed to have but maybe that's OK. Another thing that would let you do is CREATE INDEX CONCURRENTLY replacement_concrete_index; ALTER INDEX abstract_index DETACH PARTITION old_concrete_index, ATTACH PARTITION replacement_concrete_index; DROP INDEX CONCURRENTLY old_concrete_index, which seems like a thing someone might want to do. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: Local indexes for partitioned table
Hi! On 06.10.2017 19:37, Alvaro Herrera wrote: As you propose, IMO this new feature would use the standard index creation syntax: CREATE [UNIQUE] INDEX someindex ON parted_table (a, b); This command currently throws an error. We'd have it do two things: 1. create catalog rows in pg_class and pg_index for the main table, indicating the existance of this partitioned index. These would not point to an actual index (since the main table itself is empty), but instead to an "abstract" index. This abstract index can be used by various operations; see below. Robert Haas proposed[1] to use the name "partitioned index" (instead of abstract) that have to be reflected in 'relkind' field of pg_class as the RELKIND_PARTITIONED_INDEX value. I propose that an index declared UNIQUE throws an error for now. We can implement uniqueness (for the case where the indexed columns match the partitioning key) later, once we sort out all the issues here first. I think unique indexes would be very useful even with that limitation, but let's have it as a separate project. Yes, global uniqueness through local unique indexes causes further work related with foreign keys on partitioned table, full support of INSERT OF CONFLICT, etc. It make sense to implement after the current stage of work. I think using pg_depend as the mechanism to link partition indexes to parent is a bad idea. How about pg_inherits instead? Seems more appropriate. Greg Stark proposed[2] to use new pg_index field of oid type that refers to the parent pg_index item. AFAIC pg_inherits also makes sense but semantically it deals with inheriting tables. IMHO the using of this catalog table to define relation between partitioned table and partitions looks like a hack to make use of constraint exclusion logic for partition pruning. Creating hierachy-wide indexes for existing partitioned tables is likely to take a long time, so we must include CONCURRENTLY as an option. This will need some transaction machinery support in order to ensure that each partition gets its index created at some point in a long chain of transactions, and that the whole thing is marked valid only at the end. Also, if the creation is interrupted (because of a crash or a regular shutdown), it'll be useful to be able to continue rather than being forced to start from scratch. This option was very difficult for me. I would be interested to see the implementation. During ALTER TABLE ... ATTACH PARTITION, we check that an indexing satisfying the abstract index exist (and we create a pg_inherit link). If not, the command is aborted. We could create necessary index for partition, not abort ALTER TABLE ... ATTACH PARTITION statement. We need to come up with some way to generate names for each partition index. I think the calling 'ChooseIndexName(RelationGetRelationName(childrel), namespaceId, indexColNames, ...)' resolves this problem. I am going to work on this now. It will be great! I think this project is difficult for me on the part of integration described above functionality with the legacy postgres code. Also IMO this project is very important because it opens the way for such feature as global uniqueness of fields of partitioned tables. And any protraction in implementation is bad. I would like to review and test your intermediate results. 1. https://www.postgresql.org/message-id/CA%2BTgmoY5UOUnW%3DMcwT7xUB_2W5dAkvOg5kD20Spx5gF-Ad47cA%40mail.gmail.com 2. https://www.postgresql.org/message-id/CAM-w4HOVftuv5RVi3a%2BsRV6nBpg204w7%3DL8MwPXVvYBFo1uM1Q%40mail.gmail.com -- Regards, Maksim Milyutin
Re: [HACKERS] Proposal: Local indexes for partitioned table
Hello I've been thinking about this issue too. I think your patch is not ambitious enough. Here's my brain dump on the issue, to revive discussion. As you propose, IMO this new feature would use the standard index creation syntax: CREATE [UNIQUE] INDEX someindex ON parted_table (a, b); This command currently throws an error. We'd have it do two things: 1. create catalog rows in pg_class and pg_index for the main table, indicating the existance of this partitioned index. These would not point to an actual index (since the main table itself is empty), but instead to an "abstract" index. This abstract index can be used by various operations; see below. 2. create one index for each existing partition. These would be identical to what would happen if you created the index directly on each partition, except that there is an additional dependency to the parent's abstract index. If partitions are themselves partitioned, we would recursively apply the indexes to the sub-partitions by doing (1) above for the partitioned partition. Once the index has been created for all existing partitions, the hierarchy-wide index becomes valid and can be used normally by the planner/executor. I think we could use the pg_index.indisvalid property for the abstract index for this. I propose that an index declared UNIQUE throws an error for now. We can implement uniqueness (for the case where the indexed columns match the partitioning key) later, once we sort out all the issues here first. I think unique indexes would be very useful even with that limitation, but let's have it as a separate project. I think using pg_depend as the mechanism to link partition indexes to parent is a bad idea. How about pg_inherits instead? Seems more appropriate. Creating hierachy-wide indexes for existing partitioned tables is likely to take a long time, so we must include CONCURRENTLY as an option. This will need some transaction machinery support in order to ensure that each partition gets its index created at some point in a long chain of transactions, and that the whole thing is marked valid only at the end. Also, if the creation is interrupted (because of a crash or a regular shutdown), it'll be useful to be able to continue rather than being forced to start from scratch. When a new partition is added, indexes satisfying the partitioned table's abstract indexes are created automatically. During ALTER TABLE ... ATTACH PARTITION, we check that an indexing satisfying the abstract index exist (and we create a pg_inherit link). If not, the command is aborted. REINDEX is easily supported (just reindex each partition's index individually), but that command is blocking, which is not good. For concurrent operation for tables not partitioned, a typical pattern to avoid blocking the table is to suggest creation of an identical index using CREATE INDEX CONCURRENTLY, then drop the original one. That's not going to work with these partitioned indexes, which is going to be a problem. I don't have any great ideas about this part yet. We need to come up with some way to generate names for each partition index. I am going to work on this now. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: Local indexes for partitioned table
10.08.17 23:01, Robert Haas wrote: On Wed, Apr 19, 2017 at 5:25 AM, Maksim Milyutin wrote: Ok, thanks for the feedback. Then I'll use a new relkind for local partitioned index in further development. Any update on this? I'll continue to work soon. Sorry for so long delay. -- Regards, Maksim Milyutin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: Local indexes for partitioned table
On Wed, Apr 19, 2017 at 5:25 AM, Maksim Milyutin wrote: > Ok, thanks for the feedback. Then I'll use a new relkind for local > partitioned index in further development. Any update on this? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: Local indexes for partitioned table
On 19.04.2017 11:42, Ashutosh Bapat wrote: On Tue, Apr 18, 2017 at 4:43 PM, Maksim Milyutin wrote: Local partitioned indexes can be recognized through the check on the relkind of table to which the index refers. Something like this: heap = relation_open(IndexGetRelation(indexid, false), heapLockmode); if (heap->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) /* indexid is local index on partitioned table */ An index on partitioned table can be global index (yet to be implemented) or a local index. We can not differentiate between those just by looking at the relation on which they are built. We could to refine the criteria for the local partitioned index later encapsulating it in a macro, e.g., adding a new flag from pg_index that differentiate the type of index on partitioned table. Thеsе cases must be caught. But as much as partitioned tables doesn't participate in query plans their indexes are unaccessible by executor. Reindex operation is overloaded with my patch. A global index would have storage for a partitioned table whereas a local index wouldn't have any storage for a partitioned table. I agree with Amit that we need new relkinds for local as well as global indexes. Ok, thanks for the feedback. Then I'll use a new relkind for local partitioned index in further development. -- Maksim Milyutin Postgres Professional: http://www.postgrespro.com Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: Local indexes for partitioned table
On Tue, Apr 18, 2017 at 4:43 PM, Maksim Milyutin wrote: > On 18.04.2017 13:08, Amit Langote wrote: >> >> Hi, >> > > Hi, Amit! > >> On 2017/04/17 23:00, Maksim Milyutin wrote: >>> >>> >>> Ok, thanks for the note. >>> >>> But I want to discuss the relevancy of introduction of a new relkind for >>> partitioned index. I could to change the control flow in partitioned >>> index >>> creation (specify conditional statement in the 'index_create' routine in >>> attached patch) and not enter to the 'heap_create' routine. This case >>> releases us from integrating new relkind into different places of >>> Postgres >>> code. But we have to copy-paste some specific code from 'heap_create' >>> function, e.g., definition of relfilenode and tablespaceid for the new >>> index and perhaps something more when 'heap_create' routine will be >>> extended. >> >> >> I may be missing something, but isn't it that a new relkind will be needed >> anyway? How does the rest of the code distinguish such index objects once >> they are created? > > > Local partitioned indexes can be recognized through the check on the relkind > of table to which the index refers. Something like this: > > heap = relation_open(IndexGetRelation(indexid, false), heapLockmode); > if (heap->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) > /* indexid is local index on partitioned table */ An index on partitioned table can be global index (yet to be implemented) or a local index. We can not differentiate between those just by looking at the relation on which they are built. > >> Is it possible that some other code may try to access >> the storage for an index whose indrelid is a partitioned table? >> > > Thеsе cases must be caught. But as much as partitioned tables doesn't > participate in query plans their indexes are unaccessible by executor. > Reindex operation is overloaded with my patch. > A global index would have storage for a partitioned table whereas a local index wouldn't have any storage for a partitioned table. I agree with Amit that we need new relkinds for local as well as global indexes. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: Local indexes for partitioned table
On 18.04.2017 13:08, Amit Langote wrote: Hi, Hi, Amit! On 2017/04/17 23:00, Maksim Milyutin wrote: Ok, thanks for the note. But I want to discuss the relevancy of introduction of a new relkind for partitioned index. I could to change the control flow in partitioned index creation (specify conditional statement in the 'index_create' routine in attached patch) and not enter to the 'heap_create' routine. This case releases us from integrating new relkind into different places of Postgres code. But we have to copy-paste some specific code from 'heap_create' function, e.g., definition of relfilenode and tablespaceid for the new index and perhaps something more when 'heap_create' routine will be extended. I may be missing something, but isn't it that a new relkind will be needed anyway? How does the rest of the code distinguish such index objects once they are created? Local partitioned indexes can be recognized through the check on the relkind of table to which the index refers. Something like this: heap = relation_open(IndexGetRelation(indexid, false), heapLockmode); if (heap->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) /* indexid is local index on partitioned table */ Is it possible that some other code may try to access the storage for an index whose indrelid is a partitioned table? Thеsе cases must be caught. But as much as partitioned tables doesn't participate in query plans their indexes are unaccessible by executor. Reindex operation is overloaded with my patch. -- Maksim Milyutin Postgres Professional: http://www.postgrespro.com Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: Local indexes for partitioned table
Hi, On 2017/04/17 23:00, Maksim Milyutin wrote: > On 10.04.2017 14:20, Robert Haas wrote: >> On Tue, Apr 4, 2017 at 12:10 PM, Maksim Milyutin >> wrote: >>> 1. I have added a new relkind for local indexes named RELKIND_LOCAL_INDEX >>> (literal 'l'). >> >> Seems like it should maybe be RELKIND_PARTITIONED_INDEX. There's >> nothing particularly "local" about it. I suppose what you're going >> for is that it's not global, but in a way it *is* global to the >> partitioning hierarchy. That's the point. It's just that it's >> partitioned. >> > > Ok, thanks for the note. > > But I want to discuss the relevancy of introduction of a new relkind for > partitioned index. I could to change the control flow in partitioned index > creation (specify conditional statement in the 'index_create' routine in > attached patch) and not enter to the 'heap_create' routine. This case > releases us from integrating new relkind into different places of Postgres > code. But we have to copy-paste some specific code from 'heap_create' > function, e.g., definition of relfilenode and tablespaceid for the new > index and perhaps something more when 'heap_create' routine will be extended. I may be missing something, but isn't it that a new relkind will be needed anyway? How does the rest of the code distinguish such index objects once they are created? Is it possible that some other code may try to access the storage for an index whose indrelid is a partitioned table? Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: Local indexes for partitioned table
On 10.04.2017 14:20, Robert Haas wrote: On Tue, Apr 4, 2017 at 12:10 PM, Maksim Milyutin wrote: 1. I have added a new relkind for local indexes named RELKIND_LOCAL_INDEX (literal 'l'). Seems like it should maybe be RELKIND_PARTITIONED_INDEX. There's nothing particularly "local" about it. I suppose what you're going for is that it's not global, but in a way it *is* global to the partitioning hierarchy. That's the point. It's just that it's partitioned. Ok, thanks for the note. But I want to discuss the relevancy of introduction of a new relkind for partitioned index. I could to change the control flow in partitioned index creation (specify conditional statement in the 'index_create' routine in attached patch) and not enter to the 'heap_create' routine. This case releases us from integrating new relkind into different places of Postgres code. But we have to copy-paste some specific code from 'heap_create' function, e.g., definition of relfilenode and tablespaceid for the new index and perhaps something more when 'heap_create' routine will be extended. What do you think about this way? -- Maksim Milyutin Postgres Professional: http://www.postgrespro.com Russian Postgres Company diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c index 2328b92..9c15bc9 100644 --- a/src/backend/catalog/index.c +++ b/src/backend/catalog/index.c @@ -41,6 +41,7 @@ #include "catalog/pg_collation.h" #include "catalog/pg_constraint.h" #include "catalog/pg_constraint_fn.h" +#include "catalog/pg_depend.h" #include "catalog/pg_operator.h" #include "catalog/pg_opclass.h" #include "catalog/pg_tablespace.h" @@ -849,17 +850,33 @@ index_create(Relation heapRelation, * we fail further down, it's the smgr's responsibility to remove the disk * file again.) */ - indexRelation = heap_create(indexRelationName, -namespaceId, -tableSpaceId, -indexRelationId, -relFileNode, -indexTupDesc, -RELKIND_INDEX, -relpersistence, -shared_relation, -mapped_relation, -allow_system_table_mods); + if (heapRelation->rd_rel->relkind != RELKIND_PARTITIONED_TABLE) + indexRelation = heap_create(indexRelationName, + namespaceId, + tableSpaceId, + indexRelationId, + relFileNode, + indexTupDesc, + RELKIND_INDEX, + relpersistence, + shared_relation, + mapped_relation, + allow_system_table_mods); + else + indexRelation = + RelationBuildLocalRelation(indexRelationName, + namespaceId, + indexTupDesc, + indexRelationId, + (OidIsValid(relFileNode)) ? + relFileNode : indexRelationId, + (tableSpaceId == MyDatabaseTableSpace) ? + InvalidOid : tableSpaceId, + shared_relation, + mapped_relation, + relpersistence, + RELKIND_INDEX); + Assert(indexRelationId == RelationGetRelid(indexRelation)); @@ -1549,10 +1566,14 @@ index_drop(Oid indexId, bool concurrent) TransferPredicateLocksToHeapRelation(userIndexRelation); } - /* - * Schedule physical removal of the files - */ - RelationDropStorage(userIndexRelation); + if (userHeapRelation->rd_rel->relkind != RELKIND_PARTITIONED_TABLE) + { + + /* + * Schedule physical removal of the files + */ + RelationDropStorage(userIndexRelation); + } /* * Close and flush the index's relcache entry, to ensure relcache doesn't @@ -3294,6 +3315,111 @@ IndexGetRelation(Oid indexId, bool missing_ok) } /* + * Find all leaf indexes included into local index with 'indexId' oid and lock + * all dependent indexes and respective relations. + * + * Search is performed in pg_depend table since all indexes belonging to child + * tables depends on index from parent table. + * + * indexId: the oid of local index whose leaf indexes need to find + * result: list of result leaf indexes + * depRel: already opened pg_depend relation + * indexLockmode: lockmode for indexes' locks + * heapLockmode: lockmode for relations' locks + */ +static void +findDepedentLeafIndexes(Oid indexId, List **result, Relation depRel, + LOCKMODE indexLockmode, LOCKMODE heapLockmode) +{ + ScanKeyData key[3]; + int nkeys; + SysScanDesc scan; + HeapTuple tup; + List *localSubIndexIds = NIL; + ListCell *lc; + + ScanKeyInit(&key[0], +Anum_pg_depend_refclassid, +BTEqualStrategyNumber, F_OIDEQ, +ObjectIdGetDatum(RelationRelationId)); + ScanKeyInit(&key[1], +Anum_pg_depend_refobjid, +BTEqualStrategyNumber, F_OIDEQ, +ObjectIdGetDatum(indexId)); + nkeys = 2; + + scan = systable_beginscan(depRel, DependReferenceIndexId, true, + NULL, nkeys, key); + + while (HeapTupleIsValid(tup = systable_getnext(scan))) + { + Form_pg_depend foundDep = (Form_pg_depend) GETSTRUCT(tup); + Relation index, + heap; + + if (foundDep->classid != RelationRelationId) + continue; + + /* Open and lock c
Re: [HACKERS] Proposal: Local indexes for partitioned table
On 10.04.2017 13:46, Greg Stark wrote: On 4 April 2017 at 17:10, Maksim Milyutin wrote: 3. As I noticed early pg_depend table is used for cascade deleting indexes on partitioned table and its children. I also use pg_depend to determine relationship between parent and child indexes when reindex executes recursively on child indexes. Perhaps, it's not good way to use pg_depend to determine the relationship between parent and child indexes because the kind of this relationship is not defined. I could propose to add into pg_index table specific field of 'oidvector' type that specify oids of dependent indexes for the current local index. Alternately you could have an single oid in pg_index on each of the children that specifies which local index is its parent. That would probably require a new index on that column so you could look up all the children efficiently. I think it would behave more sensibly when you're adding or removing a partition, especially if you want to add many partitions in parallel using multiple transactions. An oidvector of children would effectively mean you could only be doing one partition creation or deletion at a time. Thanks for your comment. Your approach sounds better than mine. I'll try it. -- Maksim Milyutin Postgres Professional: http://www.postgrespro.com Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: Local indexes for partitioned table
On Tue, Apr 4, 2017 at 12:10 PM, Maksim Milyutin wrote: > 1. I have added a new relkind for local indexes named RELKIND_LOCAL_INDEX > (literal 'l'). Seems like it should maybe be RELKIND_PARTITIONED_INDEX. There's nothing particularly "local" about it. I suppose what you're going for is that it's not global, but in a way it *is* global to the partitioning hierarchy. That's the point. It's just that it's partitioned. > Perhaps, it's not good way to use pg_depend to determine the relationship > between parent and child indexes because the kind of this relationship is > not defined. I could propose to add into pg_index table specific field of > 'oidvector' type that specify oids of dependent indexes for the current > local index. I agree with Greg's comment on this point. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: Local indexes for partitioned table
On 4 April 2017 at 17:10, Maksim Milyutin wrote: > > 3. As I noticed early pg_depend table is used for cascade deleting indexes > on partitioned table and its children. I also use pg_depend to determine > relationship between parent and child indexes when reindex executes > recursively on child indexes. > > Perhaps, it's not good way to use pg_depend to determine the relationship > between parent and child indexes because the kind of this relationship is > not defined. I could propose to add into pg_index table specific field of > 'oidvector' type that specify oids of dependent indexes for the current > local index. Alternately you could have an single oid in pg_index on each of the children that specifies which local index is its parent. That would probably require a new index on that column so you could look up all the children efficiently. I think it would behave more sensibly when you're adding or removing a partition, especially if you want to add many partitions in parallel using multiple transactions. An oidvector of children would effectively mean you could only be doing one partition creation or deletion at a time. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: Local indexes for partitioned table
On 01.03.2017 13:53, Maksim Milyutin wrote: Hi hackers! As I've understood from thread [1] the main issue of creating local indexes for partitions is supporting REINDEX and DROP INDEX operations on parent partitioned tables. Furthermore Robert Haas mentioned the problem of creating index on key that is represented in partitions with single value (or primitive interval) [1] i.e. under the list-partitioning or range-partitioning with unit interval. I would like to propose the following solution: 1. Create index for hierarchy of partitioned tables and partitions recursively. Don't create relfilenode for indexes on parents, only entries in catalog (much like the partitioned table's storage elimination in [2]). Abstract index for partitioned tables is only for the reference on indexes of child tables to perform REINDEX and DROP INDEX operations. 2. Specify created indexes in pg_depend table so that indexes of child tables depend on corresponding indexes of parent tables with type of dependency DEPENDENCY_NORMAL so that index could be removed separately for partitions and recursively/separately for partitioned tables. 3. REINDEX on index of partitioned table would perform this operation on existing indexes of corresponding partitions. In this case it is necessary to consider such operations as REINDEX SCHEMA | DATABASE | SYSTEM so that partitions' indexes wouldn't be re-indexed multiple times in a row. Any thoughts? 1. https://www.postgresql.org/message-id/CA+TgmoZUwj=qynak+f7xef4w_e2g3xxdmnsnzmzjuinhrco...@mail.gmail.com 2. https://www.postgresql.org/message-id/2b0d42f2-3a53-763b-c9c2-47139e4b1c2e%40lab.ntt.co.jp I want to present the first version of patches that implement local indexes for partitioned tables and discuss some technical details of that implementation. 1. I have added a new relkind for local indexes named RELKIND_LOCAL_INDEX (literal 'l'). This was done because physical storage is created in the 'heap_create' function and we need to revoke the creating storage as with partitioned tables. Since information that this index belongs to partitioned tables is not available in 'heap_create' function (pg_index entry on the index is not created yet) I chose the least painful way - added a specific relkind for index on partitioned table. I suppose that this act will require the integrating new relkind to different places of source code so I'm ready to consider another proposals on this point. 2. My implementation doesn't support the concurrent creating of local index (CREATE INDEX CONCURRENTLY). As I understand, this operation involves nontrivial manipulation with snapshots and I don't know how to implement concurrent creating of multiple indexes. In this point I ask help from community. 3. As I noticed early pg_depend table is used for cascade deleting indexes on partitioned table and its children. I also use pg_depend to determine relationship between parent and child indexes when reindex executes recursively on child indexes. Perhaps, it's not good way to use pg_depend to determine the relationship between parent and child indexes because the kind of this relationship is not defined. I could propose to add into pg_index table specific field of 'oidvector' type that specify oids of dependent indexes for the current local index. On this stage I want to discuss only technical details of local indexes' implementation. The problems related to merging existing indexes of partitions within local index tree, determination uniqueness of field in global sense through local index and syntax notes I want to arise later. CC welcome! -- Maksim Milyutin Postgres Professional: http://www.postgrespro.com Russian Postgres Company diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c index cc5ac8b..bec3983 100644 --- a/src/backend/access/index/indexam.c +++ b/src/backend/access/index/indexam.c @@ -154,7 +154,8 @@ index_open(Oid relationId, LOCKMODE lockmode) r = relation_open(relationId, lockmode); - if (r->rd_rel->relkind != RELKIND_INDEX) + if (r->rd_rel->relkind != RELKIND_INDEX && + r->rd_rel->relkind != RELKIND_LOCAL_INDEX) ereport(ERROR, (errcode(ERRCODE_WRONG_OBJECT_TYPE), errmsg("\"%s\" is not an index", diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c index fc088b2..26a10e9 100644 --- a/src/backend/catalog/dependency.c +++ b/src/backend/catalog/dependency.c @@ -1107,7 +1107,8 @@ doDeletion(const ObjectAddress *object, int flags) { char relKind = get_rel_relkind(object->objectId); -if (relKind == RELKIND_INDEX) +if (relKind == RELKIND_INDEX || + relKind == RELKIND_LOCAL_INDEX) { bool concurrent = ((flags & PERFORM_DELETION_CONCURRENTLY) != 0); diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c index 36917c8..91ac740 100644 --- a/src/backend/catalog/heap.c +++ b/src/backend/catalog/heap.c @@ -293,6 +293,7 @@ heap_create(c
Re: [HACKERS] Proposal: Local indexes for partitioned table
On 02.03.2017 11:41, Robert Haas wrote: Sounds generally good. One thing to keep in mind is that - in this system - a UNIQUE index on the parent doesn't actually guarantee uniqueness across the whole partitioning hierarchy unless it so happens that the index columns or expressions are the same as the partitioning columns or expressions. That's a little a counterintuitive, and people have already been complaining that a partitioned table + partitions doesn't look enough like a plain table. However, I'm not sure there's a better alternative, because somebody might want partition-wise unique indexes even though that doesn't guarantee global uniqueness. So I think if someday we have global indexes, then we can plan to use some other syntax for that, like CREATE GLOBAL [ UNIQUE ] INDEX. Yes, I absolutely agree with your message that cross-partition uniqueness is guaranteed through global index on partitioned table apart from the case when the index key are the same as partitioning key (or index comprises partitioning key in general). Thanks for your comment. I'll try to propose the first patches as soon as possible. -- Maksim Milyutin Postgres Professional: http://www.postgrespro.com Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: Local indexes for partitioned table
On Wed, Mar 1, 2017 at 4:23 PM, Maksim Milyutin wrote: > As I've understood from thread [1] the main issue of creating local indexes > for partitions is supporting REINDEX and DROP INDEX operations on parent > partitioned tables. Furthermore Robert Haas mentioned the problem of > creating index on key that is represented in partitions with single value > (or primitive interval) [1] i.e. under the list-partitioning or > range-partitioning with unit interval. > > I would like to propose the following solution: > > 1. Create index for hierarchy of partitioned tables and partitions > recursively. Don't create relfilenode for indexes on parents, only entries > in catalog (much like the partitioned table's storage elimination in [2]). > Abstract index for partitioned tables is only for the reference on indexes > of child tables to perform REINDEX and DROP INDEX operations. > > 2. Specify created indexes in pg_depend table so that indexes of child > tables depend on corresponding indexes of parent tables with type of > dependency DEPENDENCY_NORMAL so that index could be removed separately for > partitions and recursively/separately for partitioned tables. > > 3. REINDEX on index of partitioned table would perform this operation on > existing indexes of corresponding partitions. In this case it is necessary > to consider such operations as REINDEX SCHEMA | DATABASE | SYSTEM so that > partitions' indexes wouldn't be re-indexed multiple times in a row. > > Any thoughts? Sounds generally good. One thing to keep in mind is that - in this system - a UNIQUE index on the parent doesn't actually guarantee uniqueness across the whole partitioning hierarchy unless it so happens that the index columns or expressions are the same as the partitioning columns or expressions. That's a little a counterintuitive, and people have already been complaining that a partitioned table + partitions doesn't look enough like a plain table. However, I'm not sure there's a better alternative, because somebody might want partition-wise unique indexes even though that doesn't guarantee global uniqueness. So I think if someday we have global indexes, then we can plan to use some other syntax for that, like CREATE GLOBAL [ UNIQUE ] INDEX. But, of course, that's just my opinion. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Proposal: Local indexes for partitioned table
Hi hackers! As I've understood from thread [1] the main issue of creating local indexes for partitions is supporting REINDEX and DROP INDEX operations on parent partitioned tables. Furthermore Robert Haas mentioned the problem of creating index on key that is represented in partitions with single value (or primitive interval) [1] i.e. under the list-partitioning or range-partitioning with unit interval. I would like to propose the following solution: 1. Create index for hierarchy of partitioned tables and partitions recursively. Don't create relfilenode for indexes on parents, only entries in catalog (much like the partitioned table's storage elimination in [2]). Abstract index for partitioned tables is only for the reference on indexes of child tables to perform REINDEX and DROP INDEX operations. 2. Specify created indexes in pg_depend table so that indexes of child tables depend on corresponding indexes of parent tables with type of dependency DEPENDENCY_NORMAL so that index could be removed separately for partitions and recursively/separately for partitioned tables. 3. REINDEX on index of partitioned table would perform this operation on existing indexes of corresponding partitions. In this case it is necessary to consider such operations as REINDEX SCHEMA | DATABASE | SYSTEM so that partitions' indexes wouldn't be re-indexed multiple times in a row. Any thoughts? 1. https://www.postgresql.org/message-id/CA+TgmoZUwj=qynak+f7xef4w_e2g3xxdmnsnzmzjuinhrco...@mail.gmail.com 2. https://www.postgresql.org/message-id/2b0d42f2-3a53-763b-c9c2-47139e4b1c2e%40lab.ntt.co.jp -- Maksim Milyutin Postgres Professional: http://www.postgrespro.com Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers