Re: [HACKERS] REVIEW: Optimize referential integrity checks (todo item)
On Wed, Jun 20, 2012 at 2:19 AM, Tom Lane wrote: > I've marked this patch committed, although in the end there was nothing > left of it ;-) > Thank you, Dean and Tom! I'm sorry for not participating in this thread, I've been away for the past five weeks and have much catching up to do.
Re: [HACKERS] REVIEW: Optimize referential integrity checks (todo item)
Dean Rasheed writes: > On 12 February 2012 02:06, Vik Reykja wrote: >> I decided to take a crack at the todo item created from the following post: >> http://archives.postgresql.org/pgsql-performance/2005-10/msg00458.php > Here's my review of this patch. I've marked this patch committed, although in the end there was nothing left of it ;-). After teaching the trigger skip logic that old PK nulls or new FK nulls mean the constraint needn't be checked, there is no case where ri_KeysEqual is called on data that is not known to be all-non-null on one side or the other. So it doesn't matter whether we use plain equality or is-not-distinct logic. We could have applied these changes anyway but I didn't see much value, since as previously noted there would be some cases where the comparison got microscopically slower. I did add a comment about the point in case anybody revisits the code again in future. > There's also a separate question about whether the RI trigger > functions need to be doing these key comparisons, given that they are > done earlier when the triggers are queued > (http://archives.postgresql.org/pgsql-performance/2005-10/msg00459.php), > but the savings to be made there are likely to be smaller than the > savings this patch makes by not queuing the triggers at all. I haven't looked into this question. It would only matter for PK updates (since the FK-side triggers make no such comparisons), so it's probably not going to make much difference in typical workloads. Still, if anybody wants to investigate ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REVIEW: Optimize referential integrity checks (todo item)
Dean Rasheed writes: > I think that the patch already covers the most common use case (in my > experience) but we may as well get as much out of it as we can while > we're here. Yeah. The cases involving nulls are probably really rather unlikely altogether, but it seems a tad silly to fix only some of them when we can fix all of them for marginally more effort. > Are you planning to tackle this, or should I move the patch back to > "waiting on author" to give Vik Reykja a chance to update it? It doesn't look like much else is "ready for committer" yet, so I think I'll keep hacking on this one. The whole of ri_triggers is looking a bit old and creaky to me; for instance the SET DEFAULT triggers believe that they can't cache plans involving DEFAULT, which was fixed years ago (I'm pretty sure that the plancache level should take care of that automatically, since an ALTER TABLE ... DEFAULT command would force a relcache flush). regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REVIEW: Optimize referential integrity checks (todo item)
On 17 June 2012 18:48, Tom Lane wrote: > Gurjeet Singh writes: >> On Sat, Jun 16, 2012 at 9:50 AM, Dean Rasheed >> wrote: >> I find it interesting that 'actual time' for top level 'Update on fk_table' >> is always higher in patched versions, and yet the 'Total runtime' is lower >> for the patched versions. I would've expected 'Total runtime' to be >> proportional to the increase in top-level row-source's 'actual time'. >> Even the time consumed by Seq scans is higher in patched version, so I >> think the patch's affect on performance needs to be evaluated. > > AFAICS, the only way that the given patch could possibly make anything > slower is that if the old value of some tested attribute is NULL, the > comparison routines used to fall out immediately; now, they will do an > additional SPI_getbinval call to extract the new value before making > any decision. So that would account for some small increase in the > ModifyTable runtime in cases where there are a lot of null keys in FK > rows being updated, which accurately describes Dean's test case, if not > so much the real world. I don't have a big problem with it, since the > point of the patch is to possibly save a great deal more work in exactly > these cases. > > It strikes me though that we are still leaving some money on the table. > The SQL spec says clearly that no RI action need be taken when a null > PK key value is updated to non-null, and I think this is right because > there cannot possibly be any FK rows that are considered to match the > old value. (Note that both the spec and our FK code treat the RI > equality operators as strict, even if the underlying functions aren't > really.) So we ought to have asymmetric logic in there when making > checks on PK rows, such that null->non-null is not considered an > interesting change. If done properly this would remove the above- > described slowdown in the PK case. > Yeah, that makes sense. > Conversely, if an FK value is changed from non-null to null, that is > either always OK (if MATCH SIMPLE, or if MATCH FULL and all the FK > columns went to null) or a certain failure (if MATCH FULL and we > have a mix of nulls and non-nulls). There's no need to queue a > trigger event in the "always OK" cases, so I think we need some > asymmetric logic in the FK case as well. > Makes sense too. I think that the patch already covers the most common use case (in my experience) but we may as well get as much out of it as we can while we're here. Are you planning to tackle this, or should I move the patch back to "waiting on author" to give Vik Reykja a chance to update it? Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REVIEW: Optimize referential integrity checks (todo item)
On 17 June 2012 18:30, Kevin Grittner wrote: > Gurjeet Singh wrote: >> Dean Rasheed wrote: > >> in HEAD: >>> ... (actual time=1390.037..1390.037 rows=0 loops=1) >>> Trigger for constraint fk_table_e_fkey: time=210.184 calls=9 >>> Total runtime: 1607.626 ms > >>> With this patch: >>> ... (actual time=1489.640..1489.640 rows=0 loops=1) >>> [no triggers fired] >>> Total runtime: 1489.679 ms > >>> for every row: >>> ... (actual time=1565.148..1565.148 rows=0 loops=1) >>> Trigger for constraint fk_table_e_fkey: time=705.962 calls=10 >>> Total runtime: 2279.408 ms > >>> with this patch >>> ... (actual time=1962.755..1962.755 rows=0 loops=1) >>> Trigger for constraint fk_table_e_fkey: time=257.845 calls=1 >>> Total runtime: 2221.912 ms > >> I find it interesting that 'actual time' for top level 'Update on >> fk_table' is always higher in patched versions, and yet the 'Total >> runtime' is lower for the patched versions. I would've expected >> 'Total runtime' to be proportional to the increase in top-level >> row-source's 'actual time'. > > I figured that the trigger time was counted separately. It seems to > add up pretty well that way. I guess the question is whether there > is a case where the increase in seqscan time is *not* compensated by > less time in the triggers. > > -Kevin I wouldn't read too much into the individual timings I posted above, since I get massive variations between runs. If I repeat it enough times, I can convince myself that the update times excluding trigger execution are unchanged on average, but the trigger execution times (which are indeed counted separately) are a real savings. Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REVIEW: Optimize referential integrity checks (todo item)
"Kevin Grittner" writes: > I figured that the trigger time was counted separately. Yeah, it is. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REVIEW: Optimize referential integrity checks (todo item)
Gurjeet Singh writes: > On Sat, Jun 16, 2012 at 9:50 AM, Dean Rasheed wrote: > I find it interesting that 'actual time' for top level 'Update on fk_table' > is always higher in patched versions, and yet the 'Total runtime' is lower > for the patched versions. I would've expected 'Total runtime' to be > proportional to the increase in top-level row-source's 'actual time'. > Even the time consumed by Seq scans is higher in patched version, so I > think the patch's affect on performance needs to be evaluated. AFAICS, the only way that the given patch could possibly make anything slower is that if the old value of some tested attribute is NULL, the comparison routines used to fall out immediately; now, they will do an additional SPI_getbinval call to extract the new value before making any decision. So that would account for some small increase in the ModifyTable runtime in cases where there are a lot of null keys in FK rows being updated, which accurately describes Dean's test case, if not so much the real world. I don't have a big problem with it, since the point of the patch is to possibly save a great deal more work in exactly these cases. It strikes me though that we are still leaving some money on the table. The SQL spec says clearly that no RI action need be taken when a null PK key value is updated to non-null, and I think this is right because there cannot possibly be any FK rows that are considered to match the old value. (Note that both the spec and our FK code treat the RI equality operators as strict, even if the underlying functions aren't really.) So we ought to have asymmetric logic in there when making checks on PK rows, such that null->non-null is not considered an interesting change. If done properly this would remove the above- described slowdown in the PK case. Conversely, if an FK value is changed from non-null to null, that is either always OK (if MATCH SIMPLE, or if MATCH FULL and all the FK columns went to null) or a certain failure (if MATCH FULL and we have a mix of nulls and non-nulls). There's no need to queue a trigger event in the "always OK" cases, so I think we need some asymmetric logic in the FK case as well. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REVIEW: Optimize referential integrity checks (todo item)
Gurjeet Singh wrote: > Dean Rasheed wrote: > in HEAD: >> ... (actual time=1390.037..1390.037 rows=0 loops=1) >> Trigger for constraint fk_table_e_fkey: time=210.184 calls=9 >> Total runtime: 1607.626 ms >> With this patch: >> ... (actual time=1489.640..1489.640 rows=0 loops=1) >> [no triggers fired] >> Total runtime: 1489.679 ms >> for every row: >> ... (actual time=1565.148..1565.148 rows=0 loops=1) >> Trigger for constraint fk_table_e_fkey: time=705.962 calls=10 >> Total runtime: 2279.408 ms >> with this patch >> ... (actual time=1962.755..1962.755 rows=0 loops=1) >> Trigger for constraint fk_table_e_fkey: time=257.845 calls=1 >> Total runtime: 2221.912 ms > I find it interesting that 'actual time' for top level 'Update on > fk_table' is always higher in patched versions, and yet the 'Total > runtime' is lower for the patched versions. I would've expected > 'Total runtime' to be proportional to the increase in top-level > row-source's 'actual time'. I figured that the trigger time was counted separately. It seems to add up pretty well that way. I guess the question is whether there is a case where the increase in seqscan time is *not* compensated by less time in the triggers. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REVIEW: Optimize referential integrity checks (todo item)
On Sat, Jun 16, 2012 at 9:50 AM, Dean Rasheed wrote: Then in HEAD: > EXPLAIN ANALYSE UPDATE fk_table SET b=b+1, c=c+1, d=d+1; > > QUERY PLAN > > --- > Update on fk_table (cost=0.00..2300.00 rows=10 width=26) (actual > time=1390.037..1390.037 rows=0 loops=1) > -> Seq Scan on fk_table (cost=0.00..2300.00 rows=10 width=26) > (actual time=0.010..60.841 rows=10 loops=1) > Trigger for constraint fk_table_e_fkey: time=210.184 calls=9 > Total runtime: 1607.626 ms > (4 rows) > > So the RI trigger is fired 9 times, for the unchanged NULL FK rows. > > With this patch, the RI trigger is not fired at all: > EXPLAIN ANALYSE UPDATE fk_table SET b=b+1, c=c+1, d=d+1; > > QUERY PLAN > > --- > Update on fk_table (cost=0.00..2300.00 rows=10 width=26) (actual > time=1489.640..1489.640 rows=0 loops=1) > -> Seq Scan on fk_table (cost=0.00..2300.00 rows=10 width=26) > (actual time=0.010..66.328 rows=10 loops=1) > Total runtime: 1489.679 ms > (3 rows) > > > Similarly, if I update the FK column in HEAD the RI trigger is fired > for every row: > EXPLAIN ANALYSE UPDATE fk_table SET e=e-1; > > QUERY PLAN > > --- > Update on fk_table (cost=0.00..1800.00 rows=10 width=26) (actual > time=1565.148..1565.148 rows=0 loops=1) > -> Seq Scan on fk_table (cost=0.00..1800.00 rows=10 width=26) > (actual time=0.010..42.725 rows=10 loops=1) > Trigger for constraint fk_table_e_fkey: time=705.962 calls=10 > Total runtime: 2279.408 ms > (4 rows) > > whereas with this patch it is only fired for the non-NULL FK rows that > are changing: > EXPLAIN ANALYSE UPDATE fk_table SET e=e-1; > > QUERY PLAN > > --- > Update on fk_table (cost=0.00..5393.45 rows=299636 width=26) (actual > time=1962.755..1962.755 rows=0 loops=1) > -> Seq Scan on fk_table (cost=0.00..5393.45 rows=299636 width=26) > (actual time=0.023..52.850 rows=10 loops=1) > Trigger for constraint fk_table_e_fkey: time=257.845 calls=1 > Total runtime: 2221.912 ms > (4 rows) > I find it interesting that 'actual time' for top level 'Update on fk_table' is always higher in patched versions, and yet the 'Total runtime' is lower for the patched versions. I would've expected 'Total runtime' to be proportional to the increase in top-level row-source's 'actual time'. Even the time consumed by Seq scans is higher in patched version, so I think the patch's affect on performance needs to be evaluated. Best regards, -- Gurjeet Singh EnterpriseDB Corporation The Enterprise PostgreSQL Company
Re: [HACKERS] REVIEW: Optimize referential integrity checks (todo item)
On 16 June 2012 18:04, Tom Lane wrote: > Dean Rasheed writes: >> BTW, I had no problems applying both the original patch and Chetan >> Suttraway's version. The only difference between the patches seems to >> be that the original is in context format, and Chetan Suttraway's is >> in unified format. > >> Which format do hackers actually prefer? The wiki page >> http://wiki.postgresql.org/wiki/Working_with_Git#Context_diffs_with_Git >> suggests context format, but then the linked example >> http://wiki.postgresql.org/wiki/Creating_Clean_Patches is in unified >> format. Do people care, or are both formats OK? > > Some people find one or the other more readable. (I'm in the camp that > says unified format is great for isolated single-line changes and > utterly unreadable for anything more complex, but apparently there are > people who prefer it.) > > For detailed review/commit purposes, it doesn't matter that much as long > as the patch applies cleanly, since it's easy to apply it and then get > a diff in the other format if you prefer reading the other. However, > if you're just hoping people will eyeball the patch in email and comment > on it, readability matters. If the patch requires manual fixup in order > to get it to apply anymore, readability is also a concern, since you're > dependent on the committer not misinterpreting the hunks he has to patch > in by hand. > OK thanks, that's good to know. I tend to find context format easier to read for large patches, but that's a highly subjective thing. Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REVIEW: Optimize referential integrity checks (todo item)
Dean Rasheed writes: > BTW, I had no problems applying both the original patch and Chetan > Suttraway's version. The only difference between the patches seems to > be that the original is in context format, and Chetan Suttraway's is > in unified format. > Which format do hackers actually prefer? The wiki page > http://wiki.postgresql.org/wiki/Working_with_Git#Context_diffs_with_Git > suggests context format, but then the linked example > http://wiki.postgresql.org/wiki/Creating_Clean_Patches is in unified > format. Do people care, or are both formats OK? Some people find one or the other more readable. (I'm in the camp that says unified format is great for isolated single-line changes and utterly unreadable for anything more complex, but apparently there are people who prefer it.) For detailed review/commit purposes, it doesn't matter that much as long as the patch applies cleanly, since it's easy to apply it and then get a diff in the other format if you prefer reading the other. However, if you're just hoping people will eyeball the patch in email and comment on it, readability matters. If the patch requires manual fixup in order to get it to apply anymore, readability is also a concern, since you're dependent on the committer not misinterpreting the hunks he has to patch in by hand. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] REVIEW: Optimize referential integrity checks (todo item)
On 12 February 2012 02:06, Vik Reykja wrote: > I decided to take a crack at the todo item created from the following post: > http://archives.postgresql.org/pgsql-performance/2005-10/msg00458.php > > The attached patch makes the desired changes in both code and function > naming. > > It seemed quite easy to do but wasn't marked as easy on the todo, so I'm > wondering if I've missed something. All regression tests pass. > Here's my review of this patch. Basic stuff: * Patch applies OK (some offsets). BTW, I had no problems applying both the original patch and Chetan Suttraway's version. The only difference between the patches seems to be that the original is in context format, and Chetan Suttraway's is in unified format. Which format do hackers actually prefer? The wiki page http://wiki.postgresql.org/wiki/Working_with_Git#Context_diffs_with_Git suggests context format, but then the linked example http://wiki.postgresql.org/wiki/Creating_Clean_Patches is in unified format. Do people care, or are both formats OK? * Compiles cleanly with no warnings. * Regression tests pass. The regression tests have not been updated. I think that's fair enough - I don't see a way to test this in a regression test - but I did some testing (see below) to confirm the expected behaviour. * No doc changes needed. What it does: - The primary benefit this patch offers is to prevent unnecessary queuing of RI triggers in the case of updates to a FK table where the old and new FK values are both NULL. It does this by effectively replacing the existing key equality checks with IS [NOT] DISTINCT FROM checks. This seems like a worthwhile optimisation, because I think that it is fairly common to have NULLs in FKs columns, and then update some other column. The patch also prevents unnecessary queuing of RI triggers when the PK table is updated, and the old and new values are both NULL, but that seems like a much less common case. I've looked over the code changes fairly closely, and I believe that this is a safe change. Technically, I think the changes to ri_OneKeyEqual() and ri_AllKeysUnequal() are unnecessary, since they can only be called from the trigger functions in the case where all the old values are non-NULL, hence the new versions end up behaving the same. However, I think it makes sense for all these functions to be consistent. Talking of consistency, I wonder if RI_FKey_keyequal_upd_pk() and RI_FKey_keyequal_upd_fk() ought to be renamed to RI_FKey_keyunchanged_upd_pk() and RI_FKey_keyunchanged_upd_pk()? Testing: I tested using the following tables: CREATE TABLE pk_table ( a int PRIMARY KEY ); INSERT INTO pk_table SELECT * FROM generate_series(1,10); CREATE TABLE fk_table ( a int PRIMARY KEY, b int, c int, d int, e int REFERENCES pk_table(a) ); INSERT INTO fk_table SELECT i, i*2, i*3, i*4, CASE WHEN i%10 = 0 THEN i END FROM generate_series(1,10) g(i); (i.e., FK populated in 10% of rows) Then in HEAD: EXPLAIN ANALYSE UPDATE fk_table SET b=b+1, c=c+1, d=d+1; QUERY PLAN --- Update on fk_table (cost=0.00..2300.00 rows=10 width=26) (actual time=1390.037..1390.037 rows=0 loops=1) -> Seq Scan on fk_table (cost=0.00..2300.00 rows=10 width=26) (actual time=0.010..60.841 rows=10 loops=1) Trigger for constraint fk_table_e_fkey: time=210.184 calls=9 Total runtime: 1607.626 ms (4 rows) So the RI trigger is fired 9 times, for the unchanged NULL FK rows. With this patch, the RI trigger is not fired at all: EXPLAIN ANALYSE UPDATE fk_table SET b=b+1, c=c+1, d=d+1; QUERY PLAN --- Update on fk_table (cost=0.00..2300.00 rows=10 width=26) (actual time=1489.640..1489.640 rows=0 loops=1) -> Seq Scan on fk_table (cost=0.00..2300.00 rows=10 width=26) (actual time=0.010..66.328 rows=10 loops=1) Total runtime: 1489.679 ms (3 rows) Similarly, if I update the FK column in HEAD the RI trigger is fired for every row: EXPLAIN ANALYSE UPDATE fk_table SET e=e-1; QUERY PLAN --- Update on fk_table (cost=0.00..1800.00 rows=10 width=26) (actual time=1565.148..1565.148 rows=0 loops=1) -> Seq Scan on fk_table (cost=0.00..1800.00 rows=10 width=26) (actual time=0.010..42.725 rows=10 loops=1) Trigger for constraint fk_table_e_fkey: time=705.962 calls=10 Total runtime: 2279.408 ms (4 rows) whereas with this patch it is only fired for the non-NULL FK rows that are changing: EXPLAIN ANALYSE UPDATE fk_tab