Re: [HACKERS] INSERT/SELECT and excessive foreign key checks
On 21-aug-2007, at 10:55, Alvaro Herrera wrote:

> It might go in if it's correct. If you have an answer to all the
> objections then there's no reason not to include it. But I must admit I
> didn't understand what your answer to the above objection was; care to
> rephrase?

sorry, egg on my face, testing error. the revised patch doesn't catch the case of another backend deleting referenced tuples. I'll work on it some more.

Lodewijk

---(end of broadcast)---
TIP 6: explain analyze is your friend
Re: [HACKERS] INSERT/SELECT and excessive foreign key checks
Lodewijk Vöge escribió:
> On 19-aug-2007, at 12:38, Tom Lane wrote:
>
>> An additional problem with your proposal is that it fails to consider
>> other changes that might be happening concurrently -- eg, what if some
>> other backend deletes a source row after you copy it, and commits before
>> you do?
>
> then the patch indeed failed, but when I change it to check those carried
> over FKs also once, it catches it correctly.
>
> are there other such issues? or is this kind of optimization not going in
> no matter what?

It might go in if it's correct. If you have an answer to all the objections then there's no reason not to include it. But I must admit I didn't understand what your answer to the above objection was; care to rephrase?

-- 
Alvaro Herrera  http://www.amazon.com/gp/registry/CTMLCN8V17R4
"On the other flipper, one wrong move and we're Fatal Exceptions"
(T.U.X.: Term Unit X - http://www.thelinuxreview.com/TUX/)
Re: [HACKERS] INSERT/SELECT and excessive foreign key checks
On 19-aug-2007, at 12:38, Tom Lane wrote:

> An additional problem with your proposal is that it fails to consider
> other changes that might be happening concurrently -- eg, what if some
> other backend deletes a source row after you copy it, and commits before
> you do?

then the patch indeed failed, but when I change it to check those carried over FKs also once, it catches it correctly.

are there other such issues? or is this kind of optimization not going in no matter what?

Lodewijk
Re: [HACKERS] INSERT/SELECT and excessive foreign key checks
On 19-aug-2007, at 12:38, Tom Lane wrote:

> "Hack" is the right word. People keep proposing variants of the idea
> that the executor should optimize updates on the basis of examining
> the query tree to see whether columns changed or not, and they're always
> wrong. You don't know what else might have been done to the row by
> BEFORE triggers.

but that's something it can check for. if there are BEFORE triggers on the table, don't do it.

> An additional problem with your proposal is that it fails to consider
> other changes that might be happening concurrently -- eg, what if some
> other backend deletes a source row after you copy it, and commits before
> you do? There would be an interval with no committed row having that FK
> value, and no one holding a row lock on the referenced PK row, so some
> third transaction could delete the PK row.

so if it checks those FKs being carried over also only once, that would close that hole, right?

it would just be nice to not have to disable triggers altogether in this case. there is a person twiddling his/her thumbs while all this checking and re-checking is going on.

Lodewijk
Re: [HACKERS] INSERT/SELECT and excessive foreign key checks
James Mansion wrote:
> I was wondering whether one could try to identify what might be termed
> 'enum tables' that exist to provide lookups. There are perhaps three main
> types of table that are the target of a foreign key lookup:
>
> 1) tables that map to program language enumerations: typically small
> (less than a hundred rows) and changing very infrequently.

8.3's enum types will help with these, and they get you out of the FK game altogether.

cheers

andrew
Re: [HACKERS] INSERT/SELECT and excessive foreign key checks
"Webb Sprague" <[EMAIL PROTECTED]> writes:

> Is there a different potential hack for marking a table read-only,
> turning it on and off with a function()? In a hackish vein, use a
> trigger to enforce this, and maybe a rule that can do the
> optimization?

I think several people already have something like this in mind for the next release, for several different motivations. It is (as most things are in Postgres) a bit trickier than it appears, since even if you block subsequent writes, the table's contents still "change" from the point of view of clients when their snapshots change.

What's needed is a two-phase command which first starts blocking writes to the table, then vacuums it, waiting on each page until every tuple in the entire table can be frozen. At that point the contents are truly static and the table can be marked as such. That would enable a number of optimizations:

- The table can be moved to a read-only medium.
- Index scans can be index-only scans.
- The statistics could gather information such as min/max for each column, and the planner could trust this data. That would allow constraint exclusion to kick in for partitions even if you're not querying on the partition key. It also allows us to exclude the parent table of the inheritance tree.
- FK checks could rely on a share table lock instead of row locks and aggressively cache which key values are found, even across transactions.

But this all relies on a user-visible operation to switch the table from read-write to read-only and back again. It cannot switch the behaviour transparently, because switching it back to read-write requires taking a lock and notifying everyone to dump their old plans and caches. Bruce's idea had the merit that it could be made transparent, but I think as a result it has too few optimizations that would be safe to do.

As a DBA I don't think I would have been too upset at the idea that if I manually marked a table read-only it would enable various optimizations. Especially if I was told I could mark it read-write whenever I felt like, make my changes, and then set it back to read-only. It does have the "one more knob" nature, though.

-- 
Gregory Stark
EnterpriseDB  http://www.enterprisedb.com
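The two-phase switch described in that message is easier to see as code. Here is a minimal Python sketch (invented names and semantics, not a real PostgreSQL feature) of a table that first blocks writes, then "freezes" each page, and only afterwards advertises itself as safe for static-table optimizations:

```python
class Table:
    """Toy model of the proposed read-only switch; states are
    read_write -> freezing -> read_only."""

    def __init__(self, pages):
        self.pages = pages              # page number -> list of tuples
        self.state = "read_write"
        self.frozen = set()

    def write(self, page, value):
        if self.state != "read_write":
            raise PermissionError("table is read-only")
        self.pages[page].append(value)

    def set_read_only(self):
        self.state = "freezing"         # phase 1: block further writes
        for page in self.pages:         # phase 2: freeze page by page
            self.frozen.add(page)
        self.state = "read_only"        # contents are now truly static

    def can_use_static_optimizations(self):
        # e.g. index-only scans, trusted min/max stats, cached FK keys
        return self.state == "read_only" and self.frozen == set(self.pages)

t = Table({0: [1, 2], 1: [3]})
t.set_read_only()
print(t.can_use_static_optimizations())   # True
```

The point of the explicit two-phase operation is the same as in the email: only after every tuple is frozen can other backends treat the contents as immutable.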
Re: [HACKERS] INSERT/SELECT and excessive foreign key checks
> ... People keep proposing variants of the idea
> that the executor should optimize updates on the basis of examining
> the query tree to see whether columns changed or not, and they're always
> wrong. You don't know what else might have been done to the row by
> BEFORE triggers.

Is there a different potential hack for marking a table read-only, turning it on and off with a function()? In a hackish vein, use a trigger to enforce this, and maybe a rule that can do the optimization?

I wouldn't be the one writing this, so maybe it is ridiculous and I apologize for contributing to list-noise, but perhaps it would be useful for those tables that change so rarely. Of course, it would be a "contrib" package and nothing for the core.

-w
Re: [HACKERS] INSERT/SELECT and excessive foreign key checks
Lodewijk Voege <[EMAIL PROTECTED]> writes:
> I hacked up a patch that handles these two cases:

"Hack" is the right word. People keep proposing variants of the idea that the executor should optimize updates on the basis of examining the query tree to see whether columns changed or not, and they're always wrong. You don't know what else might have been done to the row by BEFORE triggers.

An additional problem with your proposal is that it fails to consider other changes that might be happening concurrently -- eg, what if some other backend deletes a source row after you copy it, and commits before you do? There would be an interval with no committed row having that FK value, and no one holding a row lock on the referenced PK row, so some third transaction could delete the PK row.

There are some implementation problems in this patch too, but it's not necessary to delve into such details when the basic premise is unworkable. Sorry...

regards, tom lane
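The concurrency objection above is easiest to follow as a timeline. This is a hedged Python model (sets standing in for committed tuples and row locks; none of this is PostgreSQL code) of the window where a third transaction can delete the referenced PK row:

```python
# Committed state visible to other transactions.
committed_fk_rows = {("src", 100)}   # one source row referencing PK 100
committed_pk_rows = {100}
pk_row_locks = set()                 # row locks normally protect in-flight FKs

# Backend A: INSERT/SELECT copies the source row; under the proposed
# optimization it performs no FK check, so it takes no PK row lock.
a_uncommitted = {("copy", 100)}

# Backend B: deletes the source row and commits before A does.
committed_fk_rows.discard(("src", 100))

# The window: no committed row references PK 100 and nobody locks it...
assert not any(fk == 100 for (_, fk) in committed_fk_rows)
assert 100 not in pk_row_locks

# ...so backend C can legitimately delete the PK row.
committed_pk_rows.discard(100)

# Backend A commits: its copy now references a vanished PK.
committed_fk_rows |= a_uncommitted
dangling = [(r, fk) for (r, fk) in committed_fk_rows
            if fk not in committed_pk_rows]
print(dangling)        # [('copy', 100)] -- referential integrity broken
```

With the FK check in place, backend A would hold a share lock on PK row 100 until commit, and backend C's delete would block instead of succeeding.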
Re: [HACKERS] INSERT/SELECT and excessive foreign key checks
Andrew Dunstan wrote:
> Could we achieve the same thing in a more general way by having a per-FK
> tiny (say 10?) LRU cache of values checked? Then it wouldn't only be
> restricted to constant expressions. Of course, then the trigger would need
> to keep state, so it might well be too complex (e.g. what if there are
> concurrent inserts?)

I was wondering whether one could try to identify what might be termed 'enum tables' that exist to provide lookups. There are perhaps three main types of table that are the target of a foreign key lookup:

1) tables that map to program language enumerations: typically small (less than a hundred rows) and changing very infrequently.

2) tables that hold quasi-static reference data where rows are 'never' deleted (they may be amended, perhaps to identify that they are logically inactivated, but are still needed for reference lookup from existing rows elsewhere) - typically customer definitions, product definitions, site definitions and that sort of thing that is often regarded as 'static data' by a user application session but which may change.

3) master records in master/detail relationships such as order/orderline.

If you can have mechanisms that reflect the likelihood of an update and optimise accordingly, then hopefully performance in real-world applications can be improved.

In the case of 1), for example, we might reasonably have a single logical read/write lock that controls access to ALL such tables in a schema, and a single 'update generation count'. The lock would effectively provide repeatable-read stability across all of the tables (a multi-table table lock) while in place, and the generation count (which can be a tran id) indicates to caching processes when the cache is stale. This means that, unlike normal MVCC, the readers will block a writer, but in this case we expect the write to happen only during application release.

In the case of 2), we can't use the cross-table lock, and the tables may be large, so the suggested LRU cache per table (with a table-level read/write lock again) may be most effective, but we may elect to regard a read lock as allowing any operation that doesn't invalidate the primary key.

And in the case of 3) we don't do anything special at all.

I certainly think that anything which can materially reduce lookups in case 1) and hopefully 2) will encourage good database design and declarative referential integrity - in some classes of high-performance application the cost is too high to be done inline with an update, which is a shame.

James
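The 'update generation count' scheme for case 1) can be sketched in a few lines of Python (all names invented for illustration): each client stamps its lookup cache with the generation it was filled under, and any write to any enum table bumps the shared counter, invalidating every cache at once:

```python
class EnumTableRegistry:
    """All 'enum tables' share one generation counter; any write
    bumps it, signalling that cached lookups are stale."""

    def __init__(self):
        self.generation = 0
        self.tables = {}                # table name -> {key: row}

    def write(self, table, key, row):
        self.tables.setdefault(table, {})[key] = row
        self.generation += 1            # invalidate all caches

class CachingClient:
    """Caches lookups; drops everything when the generation moves."""

    def __init__(self, registry):
        self.registry = registry
        self.cache = {}
        self.cached_generation = -1

    def lookup(self, table, key):
        if self.cached_generation != self.registry.generation:
            self.cache.clear()          # stale: discard and restamp
            self.cached_generation = self.registry.generation
        if (table, key) not in self.cache:
            self.cache[(table, key)] = self.registry.tables[table][key]
        return self.cache[(table, key)]

reg = EnumTableRegistry()
reg.write("status", 1, "active")
client = CachingClient(reg)
print(client.lookup("status", 1))       # "active" (fetched, then cached)
reg.write("status", 2, "closed")        # bumps the generation
print(client.lookup("status", 2))       # cache invalidated, refetched
```

Since enum tables change only at application-release time, the cost of the occasional full cache flush is negligible against the saved per-row FK lookups.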
Re: [HACKERS] INSERT/SELECT and excessive foreign key checks
"Andrew Dunstan" <[EMAIL PROTECTED]> writes:

> Could we achieve the same thing in a more general way by having a per-FK
> tiny (say 10?) LRU cache of values checked? Then it wouldn't only be
> restricted to constant expressions. Of course, then the trigger would need
> to keep state, so it might well be too complex (e.g. what if there are
> concurrent inserts?)

The danger here is actually that the same transaction (or command, in the case of non-deferred constraints) later deletes the same record. eg, something like:

  INSERT INTO tab VALUES (1)     -> queues check for FK value 1
  DELETE FROM tab WHERE fk = 1
  DELETE FROM othertab WHERE id = 1
  INSERT INTO tab VALUES (1)     -> skips queuing check because 1 is in cache

Now when the triggers fire, the first check is skipped because the record that fired it is no longer valid, and the second record never queued a check at all.

You could do it when it comes time to actually perform the check, though: keep a cache of values you've already checked. But I'm not sure how much that will save.

I was also thinking of having a cache which simply avoided the SPI query, by keeping the tid and xmin found previously. Then we could look up the actual record directly instead of going through the executor. As long as the record is still there with the same xmin, we know the value we cached for it is still valid. This would help the OLTP case where you're performing many small transactions which refer to the same records over and over again.

-- 
Gregory Stark
EnterpriseDB  http://www.enterprisedb.com
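The distinction between caching at queue time and caching at check time can be made concrete with a small Python model (plain lists and sets, not the trigger machinery): the first variant misses the violation in the sequence above, while the second still catches it:

```python
def queue_time_cache(events, referenced):
    """Unsafe: an insert whose value is 'in cache' never queues a
    check, so a violation created in between goes unnoticed."""
    queued, cache = [], set()
    for op, value in events:
        if op == "insert" and value not in cache:
            cache.add(value)
            queued.append(value)
        elif op == "delete_fk":            # deleting the inserting row
            queued.remove(value)           # drops its queued check
        elif op == "delete_pk":
            referenced.discard(value)      # referenced row goes away
    return [v for v in queued if v not in referenced]

def check_time_cache(events, referenced):
    """Safer: every insert queues an event; the cache only dedups
    the actual lookups once the checks finally run."""
    queued = []
    for op, value in events:
        if op == "insert":
            queued.append(value)
        elif op == "delete_fk":
            queued.remove(value)
        elif op == "delete_pk":
            referenced.discard(value)
    verified, violations = set(), []
    for v in queued:                       # one lookup per distinct value
        if v in verified:
            continue
        verified.add(v)
        if v not in referenced:
            violations.append(v)
    return violations

events = [("insert", 1), ("delete_fk", 1), ("delete_pk", 1), ("insert", 1)]
print(queue_time_cache(events, {1}))   # [] -- violation missed
print(check_time_cache(events, {1}))   # [1] -- violation caught
```

The check-time variant is safe because by the time the deferred checks run, all of them see the same state, so "already verified in this pass" really does imply the value is present.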
Re: [HACKERS] INSERT/SELECT and excessive foreign key checks
Gregory Stark wrote:
> "Lodewijk Voege" <[EMAIL PROTECTED]> writes:
>
>> I hacked up a patch that handles these two cases:
>> - for such an INSERT/SELECT, check constant FKs only once.
>
> This sounds like a clever idea. It seems the abstraction violation is
> worth it to me.

Could we achieve the same thing in a more general way by having a per-FK tiny (say 10?) LRU cache of values checked? Then it wouldn't only be restricted to constant expressions. Of course, then the trigger would need to keep state, so it might well be too complex (e.g. what if there are concurrent inserts?)

cheers

andrew
Re: [HACKERS] INSERT/SELECT and excessive foreign key checks
"Lodewijk Voege" <[EMAIL PROTECTED]> writes:

> I hacked up a patch that handles these two cases:
> - for such an INSERT/SELECT, check constant FKs only once.

This sounds like a clever idea. It seems the abstraction violation is worth it to me.

> - for an INSERT/SELECT from/to the same table, don't check FKs that are
>   carried over as is at all. (it'd be nice to extend this to fields that
>   have the same FK constraint, rather than the current patch's strict
>   same-table, same-field condition)

I'm not sure this is safe. It means you're not locking the target row. If someone else comes along and deletes both the original source record and the target record before you've gotten around to inserting the new record...

-- 
Gregory Stark
EnterpriseDB  http://www.enterprisedb.com
[HACKERS] INSERT/SELECT and excessive foreign key checks
hello,

I'm working on an application that once in a while needs to shuffle some data around with a query like:

  INSERT INTO foo (some_field, bar_FK, baz_FK)
  SELECT some_field, 12345 AS bar_FK, baz_FK
  FROM foo
  WHERE bar_FK = 123

ie. copy a block of data from a table into itself while setting one foreign key to a constant and carrying some other foreign keys over as is.

the foreign key checks for this case appear very suboptimal: for every inserted row, both the constant foreign key as well as the keys being carried over are checked. some of these blocks of data being copied are large, the largest block being (currently) 1.6M rows. the table has three such foreign keys, which from reading the code means 4.8M trigger events get queued up, where it only needs to do exactly one as far as I can tell.

I hacked up a patch that handles these two cases:
- for such an INSERT/SELECT, check constant FKs only once.
- for an INSERT/SELECT from/to the same table, don't check FKs that are carried over as is at all. (it'd be nice to extend this to fields that have the same FK constraint, rather than the current patch's strict same-table, same-field condition)

the difference for this application is pretty dramatic. on my development system, copying 1.6M rows in this manner with the table having 3 FKs on it:
- 8.3 from CVS with 128MB of shared memory takes just under 6 minutes. the postgres process peaks at 552MB virtual, 527MB RSS and 135MB shared.
- 8.3 with patch and 128MB of shared memory takes 42 seconds. the process peaks at 160MB virtual, 138MB RSS and 135MB shared.

so, is this optimization generally useful and correct? and if so, what do I do to get a version of the patch in the tree? it's against CVS HEAD (well, against the HEAD of the SVN mirror of CVS) and it passes the regression tests. but I have no clue whether I'm trampling over abstraction barriers, etc. comments appreciated.
thanks,
Lodewijk

Index: src/include/commands/trigger.h
===================================================================
--- src/include/commands/trigger.h	(revision 28781)
+++ src/include/commands/trigger.h	(working copy)
@@ -157,6 +157,7 @@
 /*
  * in utils/adt/ri_triggers.c
  */
+extern bool RI_FKey_all_attrs_covered(Trigger *trigger, Relation fk_rel, int16 *attnums, int num_attnums);
 extern bool RI_FKey_keyequal_upd_pk(Trigger *trigger, Relation pk_rel,
 			HeapTuple old_row, HeapTuple new_row);
 extern bool RI_FKey_keyequal_upd_fk(Trigger *trigger, Relation fk_rel,
Index: src/backend/executor/execMain.c
===================================================================
--- src/backend/executor/execMain.c	(revision 28781)
+++ src/backend/executor/execMain.c	(working copy)
@@ -39,6 +39,7 @@
 #include "catalog/heap.h"
 #include "catalog/namespace.h"
 #include "catalog/toasting.h"
+#include "catalog/pg_trigger.h"
 #include "commands/tablespace.h"
 #include "commands/trigger.h"
 #include "executor/execdebug.h"
@@ -65,6 +66,9 @@
 /* decls for local routines only used within this module */
 static void InitPlan(QueryDesc *queryDesc, int eflags);
+static void FilterFKChecks(EState *estate, int (*condition)(TargetEntry *te));
+static void FilterFKChecksForFieldsCarriedOver(EState *estate);
+static void FilterFKChecksForConstantFields(EState *estate);
 static void initResultRelInfo(ResultRelInfo *resultRelInfo,
 			Relation resultRelationDesc,
 			Index resultRelationIndex,
@@ -821,6 +825,9 @@
 	queryDesc->tupDesc = tupType;
 	queryDesc->planstate = planstate;
 
+	if (operation == CMD_INSERT)
+		FilterFKChecksForFieldsCarriedOver(estate);
+
 	/*
 	 * If doing SELECT INTO, initialize the "into" relation. We must wait
 	 * till now so we have the "clean" result tuple type to create the new
@@ -832,7 +839,76 @@
 		OpenIntoRel(queryDesc);
 }
 
+static void
+FilterFKChecks(EState *estate, int (*condition)(TargetEntry *te))
+{
+	TriggerDesc *trigdesc;
+	Trigger    *trigger;
+	ListCell   *em;
+	Relation	rel;
+	int16	   *attrs;
+	int			i, num_attrs;
+
+	if (estate->es_plannedstmt == NULL ||
+		estate->es_num_result_relations > 1 ||
+		estate->es_result_relations == NULL ||
+		estate->es_plannedstmt->planTree->type == T_ValuesScan)
+		return;
+	trigdesc = estate->es_result_relations[0].ri_TrigDesc;
+	if (trigdesc == NULL)
+		return;
+
+	num_attrs = 0;
+	attrs = palloc(
+		list_length(estate->es_plannedstmt->planTree->targetlist) * sizeof(int16));
+	foreach(em, estate->es_plannedstmt->planTree->targetlist)
+	{
+		TargetEntry *te = (TargetEntry *) lfirst(em);
+
+		if (condition(te))
+			attrs[num_attrs++] = te->resno;
+	}
+
+	rel
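The 4.8M-versus-one arithmetic in the report above comes down to deduplicating identical FK lookups. A toy Python model (not the patch itself; column names echo the example query) shows the effect of checking each distinct (column, value) pair once -- the patch goes further still by skipping carried-over columns entirely:

```python
def count_fk_checks(rows, fk_columns, dedup=False):
    """Count FK lookups for inserting `rows`. With dedup=True, each
    distinct (column, value) pair is verified only once -- the effect
    of checking a constant FK a single time."""
    checks = 0
    seen = set()
    for row in rows:
        for col in fk_columns:
            key = (col, row[col])
            if dedup and key in seen:
                continue
            seen.add(key)
            checks += 1
    return checks

# bar_fk is the constant 12345; baz_fk cycles through ten values.
rows = [{"bar_fk": 12345, "baz_fk": i % 10} for i in range(1000)]
print(count_fk_checks(rows, ["bar_fk", "baz_fk"]))              # 2000
print(count_fk_checks(rows, ["bar_fk", "baz_fk"], dedup=True))  # 11
```

For the constant column the saving is total (one check instead of one per row); for a varying column it degrades gracefully to one check per distinct value.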