subject:"\[HACKERS\] REVIEW\: Optimize referential integrity checks \(todo item\)"

Re: [HACKERS] REVIEW: Optimize referential integrity checks (todo item)

2012-06-26 Thread Vik Reykja

On Wed, Jun 20, 2012 at 2:19 AM, Tom Lane  wrote:

> I've marked this patch committed, although in the end there was nothing
> left of it ;-)
>

Thank you, Dean and Tom!

I'm sorry for not participating in this thread, I've been away for the past
five weeks and have much catching up to do.

Re: [HACKERS] REVIEW: Optimize referential integrity checks (todo item)

2012-06-19 Thread Tom Lane

Dean Rasheed  writes:
> On 12 February 2012 02:06, Vik Reykja  wrote:
>> I decided to take a crack at the todo item created from the following post:
>> http://archives.postgresql.org/pgsql-performance/2005-10/msg00458.php

> Here's my review of this patch.

I've marked this patch committed, although in the end there was nothing
left of it ;-).  After teaching the trigger skip logic that old PK nulls
or new FK nulls mean the constraint needn't be checked, there is no case
where ri_KeysEqual is called on data that is not known to be
all-non-null on one side or the other.  So it doesn't matter whether
we use plain equality or is-not-distinct logic.  We could have applied
these changes anyway but I didn't see much value, since as previously
noted there would be some cases where the comparison got microscopically
slower.  I did add a comment about the point in case anybody revisits
the code again in future.

> There's also a separate question about whether the RI trigger
> functions need to be doing these key comparisons, given that they are
> done earlier when the triggers are queued
> (http://archives.postgresql.org/pgsql-performance/2005-10/msg00459.php),
> but the savings to be made there are likely to be smaller than the
> savings this patch makes by not queuing the triggers at all.

I haven't looked into this question.  It would only matter for PK
updates (since the FK-side triggers make no such comparisons), so it's
probably not going to make much difference in typical workloads.
Still, if anybody wants to investigate ...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] REVIEW: Optimize referential integrity checks (todo item)

2012-06-18 Thread Tom Lane

Dean Rasheed  writes:
> I think that the patch already covers the most common use case (in my
> experience) but we may as well get as much out of it as we can while
> we're here.

Yeah.  The cases involving nulls are probably really rather unlikely
altogether, but it seems a tad silly to fix only some of them when
we can fix all of them for marginally more effort.

> Are you planning to tackle this, or should I move the patch back to
> "waiting on author" to give Vik Reykja a chance to update it?

It doesn't look like much else is "ready for committer" yet, so I think
I'll keep hacking on this one.  The whole of ri_triggers is looking a
bit old and creaky to me; for instance the SET DEFAULT triggers believe
that they can't cache plans involving DEFAULT, which was fixed years
ago (I'm pretty sure that the plancache level should take care of that
automatically, since an ALTER TABLE ... DEFAULT command would force a
relcache flush).

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] REVIEW: Optimize referential integrity checks (todo item)

2012-06-18 Thread Dean Rasheed

On 17 June 2012 18:48, Tom Lane  wrote:
> Gurjeet Singh  writes:
>> On Sat, Jun 16, 2012 at 9:50 AM, Dean Rasheed 
>> wrote:
>> I find it interesting that 'actual time' for top level 'Update on fk_table'
>> is always higher in patched versions, and yet the 'Total runtime' is lower
>> for the patched versions. I would've expected 'Total runtime' to be
>> proportional to the increase in top-level row-source's 'actual time'.
>> Even the time consumed by Seq scans is higher in patched version, so I
>> think the patch's affect on performance needs to be evaluated.
>
> AFAICS, the only way that the given patch could possibly make anything
> slower is that if the old value of some tested attribute is NULL, the
> comparison routines used to fall out immediately; now, they will do an
> additional SPI_getbinval call to extract the new value before making
> any decision.  So that would account for some small increase in the
> ModifyTable runtime in cases where there are a lot of null keys in FK
> rows being updated, which accurately describes Dean's test case, if not
> so much the real world.  I don't have a big problem with it, since the
> point of the patch is to possibly save a great deal more work in exactly
> these cases.
>
> It strikes me though that we are still leaving some money on the table.
> The SQL spec says clearly that no RI action need be taken when a null
> PK key value is updated to non-null, and I think this is right because
> there cannot possibly be any FK rows that are considered to match the
> old value.  (Note that both the spec and our FK code treat the RI
> equality operators as strict, even if the underlying functions aren't
> really.)  So we ought to have asymmetric logic in there when making
> checks on PK rows, such that null->non-null is not considered an
> interesting change.  If done properly this would remove the above-
> described slowdown in the PK case.
>

Yeah, that makes sense.

> Conversely, if an FK value is changed from non-null to null, that is
> either always OK (if MATCH SIMPLE, or if MATCH FULL and all the FK
> columns went to null) or a certain failure (if MATCH FULL and we
> have a mix of nulls and non-nulls).  There's no need to queue a
> trigger event in the "always OK" cases, so I think we need some
> asymmetric logic in the FK case as well.
>

Makes sense too.
I think that the patch already covers the most common use case (in my
experience) but we may as well get as much out of it as we can while
we're here.

Are you planning to tackle this, or should I move the patch back to
"waiting on author" to give Vik Reykja a chance to update it?

Regards,
Dean

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] REVIEW: Optimize referential integrity checks (todo item)

2012-06-18 Thread Dean Rasheed

On 17 June 2012 18:30, Kevin Grittner  wrote:
> Gurjeet Singh  wrote:
>> Dean Rasheed wrote:
>
>> in HEAD:
>>> ... (actual time=1390.037..1390.037 rows=0 loops=1)
>>> Trigger for constraint fk_table_e_fkey: time=210.184 calls=9
>>> Total runtime: 1607.626 ms
>
>>> With this patch:
>>> ... (actual time=1489.640..1489.640 rows=0 loops=1)
>>> [no triggers fired]
>>> Total runtime: 1489.679 ms
>
>>> for every row:
>>> ... (actual time=1565.148..1565.148 rows=0 loops=1)
>>> Trigger for constraint fk_table_e_fkey: time=705.962 calls=10
>>> Total runtime: 2279.408 ms
>
>>> with this patch
>>> ... (actual time=1962.755..1962.755 rows=0 loops=1)
>>> Trigger for constraint fk_table_e_fkey: time=257.845 calls=1
>>> Total runtime: 2221.912 ms
>
>> I find it interesting that 'actual time' for top level 'Update on
>> fk_table' is always higher in patched versions, and yet the 'Total
>> runtime' is lower for the patched versions. I would've expected
>> 'Total runtime' to be proportional to the increase in top-level
>> row-source's 'actual time'.
>
> I figured that the trigger time was counted separately.  It seems to
> add up pretty well that way.  I guess the question is whether there
> is a case where the increase in seqscan time is *not* compensated by
> less time in the triggers.
>
> -Kevin

I wouldn't read too much into the individual timings I posted above,
since I get massive variations between runs. If I repeat it enough
times, I can convince myself that the update times excluding trigger
execution are unchanged on average, but the trigger execution times
(which are indeed counted separately) are a real savings.

Regards,
Dean

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] REVIEW: Optimize referential integrity checks (todo item)

2012-06-17 Thread Tom Lane

"Kevin Grittner"  writes:
> I figured that the trigger time was counted separately.

Yeah, it is.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] REVIEW: Optimize referential integrity checks (todo item)

2012-06-17 Thread Tom Lane

Gurjeet Singh  writes:
> On Sat, Jun 16, 2012 at 9:50 AM, Dean Rasheed wrote:
> I find it interesting that 'actual time' for top level 'Update on fk_table'
> is always higher in patched versions, and yet the 'Total runtime' is lower
> for the patched versions. I would've expected 'Total runtime' to be
> proportional to the increase in top-level row-source's 'actual time'.
> Even the time consumed by Seq scans is higher in patched version, so I
> think the patch's affect on performance needs to be evaluated.

AFAICS, the only way that the given patch could possibly make anything
slower is that if the old value of some tested attribute is NULL, the
comparison routines used to fall out immediately; now, they will do an
additional SPI_getbinval call to extract the new value before making
any decision.  So that would account for some small increase in the
ModifyTable runtime in cases where there are a lot of null keys in FK
rows being updated, which accurately describes Dean's test case, if not
so much the real world.  I don't have a big problem with it, since the
point of the patch is to possibly save a great deal more work in exactly
these cases.

It strikes me though that we are still leaving some money on the table.
The SQL spec says clearly that no RI action need be taken when a null
PK key value is updated to non-null, and I think this is right because
there cannot possibly be any FK rows that are considered to match the
old value.  (Note that both the spec and our FK code treat the RI
equality operators as strict, even if the underlying functions aren't
really.)  So we ought to have asymmetric logic in there when making
checks on PK rows, such that null->non-null is not considered an
interesting change.  If done properly this would remove the above-
described slowdown in the PK case.

Conversely, if an FK value is changed from non-null to null, that is
either always OK (if MATCH SIMPLE, or if MATCH FULL and all the FK
columns went to null) or a certain failure (if MATCH FULL and we
have a mix of nulls and non-nulls).  There's no need to queue a
trigger event in the "always OK" cases, so I think we need some
asymmetric logic in the FK case as well.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] REVIEW: Optimize referential integrity checks (todo item)

2012-06-17 Thread Kevin Grittner

Gurjeet Singh  wrote:
> Dean Rasheed wrote:
 
> in HEAD:
>> ... (actual time=1390.037..1390.037 rows=0 loops=1)
>> Trigger for constraint fk_table_e_fkey: time=210.184 calls=9
>> Total runtime: 1607.626 ms
 
>> With this patch:
>> ... (actual time=1489.640..1489.640 rows=0 loops=1)
>> [no triggers fired]
>> Total runtime: 1489.679 ms
 
>> for every row:
>> ... (actual time=1565.148..1565.148 rows=0 loops=1)
>> Trigger for constraint fk_table_e_fkey: time=705.962 calls=10
>> Total runtime: 2279.408 ms
 
>> with this patch
>> ... (actual time=1962.755..1962.755 rows=0 loops=1)
>> Trigger for constraint fk_table_e_fkey: time=257.845 calls=1
>> Total runtime: 2221.912 ms
 
> I find it interesting that 'actual time' for top level 'Update on
> fk_table' is always higher in patched versions, and yet the 'Total
> runtime' is lower for the patched versions. I would've expected
> 'Total runtime' to be proportional to the increase in top-level
> row-source's 'actual time'.
 
I figured that the trigger time was counted separately.  It seems to
add up pretty well that way.  I guess the question is whether there
is a case where the increase in seqscan time is *not* compensated by
less time in the triggers.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] REVIEW: Optimize referential integrity checks (todo item)

2012-06-17 Thread Gurjeet Singh

On Sat, Jun 16, 2012 at 9:50 AM, Dean Rasheed wrote:

Then in HEAD:
> EXPLAIN ANALYSE UPDATE fk_table SET b=b+1, c=c+1, d=d+1;
>
>  QUERY PLAN
>
> ---
>  Update on fk_table  (cost=0.00..2300.00 rows=10 width=26) (actual
> time=1390.037..1390.037 rows=0 loops=1)
>   ->  Seq Scan on fk_table  (cost=0.00..2300.00 rows=10 width=26)
> (actual time=0.010..60.841 rows=10 loops=1)
>  Trigger for constraint fk_table_e_fkey: time=210.184 calls=9
>  Total runtime: 1607.626 ms
> (4 rows)
>
> So the RI trigger is fired 9 times, for the unchanged NULL FK rows.
>
> With this patch, the RI trigger is not fired at all:
> EXPLAIN ANALYSE UPDATE fk_table SET b=b+1, c=c+1, d=d+1;
>
>  QUERY PLAN
>
> ---
>  Update on fk_table  (cost=0.00..2300.00 rows=10 width=26) (actual
> time=1489.640..1489.640 rows=0 loops=1)
>   ->  Seq Scan on fk_table  (cost=0.00..2300.00 rows=10 width=26)
> (actual time=0.010..66.328 rows=10 loops=1)
>  Total runtime: 1489.679 ms
> (3 rows)
>
>
> Similarly, if I update the FK column in HEAD the RI trigger is fired
> for every row:
> EXPLAIN ANALYSE UPDATE fk_table SET e=e-1;
>
>  QUERY PLAN
>
> ---
>  Update on fk_table  (cost=0.00..1800.00 rows=10 width=26) (actual
> time=1565.148..1565.148 rows=0 loops=1)
>   ->  Seq Scan on fk_table  (cost=0.00..1800.00 rows=10 width=26)
> (actual time=0.010..42.725 rows=10 loops=1)
>  Trigger for constraint fk_table_e_fkey: time=705.962 calls=10
>  Total runtime: 2279.408 ms
> (4 rows)
>
> whereas with this patch it is only fired for the non-NULL FK rows that
> are changing:
> EXPLAIN ANALYSE UPDATE fk_table SET e=e-1;
>
>  QUERY PLAN
>
> ---
>  Update on fk_table  (cost=0.00..5393.45 rows=299636 width=26) (actual
> time=1962.755..1962.755 rows=0 loops=1)
>   ->  Seq Scan on fk_table  (cost=0.00..5393.45 rows=299636 width=26)
> (actual time=0.023..52.850 rows=10 loops=1)
>  Trigger for constraint fk_table_e_fkey: time=257.845 calls=1
>  Total runtime: 2221.912 ms
> (4 rows)
>

I find it interesting that 'actual time' for top level 'Update on fk_table'
is always higher in patched versions, and yet the 'Total runtime' is lower
for the patched versions. I would've expected 'Total runtime' to be
proportional to the increase in top-level row-source's 'actual time'.

Even the time consumed by Seq scans is higher in patched version, so I
think the patch's affect on performance needs to be evaluated.

Best regards,
-- 
Gurjeet Singh
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Re: [HACKERS] REVIEW: Optimize referential integrity checks (todo item)

2012-06-16 Thread Dean Rasheed

On 16 June 2012 18:04, Tom Lane  wrote:
> Dean Rasheed  writes:
>> BTW, I had no problems applying both the original patch and Chetan
>> Suttraway's version. The only difference between the patches seems to
>> be that the original is in context format, and Chetan Suttraway's is
>> in unified format.
>
>> Which format do hackers actually prefer? The wiki page
>> http://wiki.postgresql.org/wiki/Working_with_Git#Context_diffs_with_Git
>> suggests context format, but then the linked example
>> http://wiki.postgresql.org/wiki/Creating_Clean_Patches is in unified
>> format. Do people care, or are both formats OK?
>
> Some people find one or the other more readable.  (I'm in the camp that
> says unified format is great for isolated single-line changes and
> utterly unreadable for anything more complex, but apparently there are
> people who prefer it.)
>
> For detailed review/commit purposes, it doesn't matter that much as long
> as the patch applies cleanly, since it's easy to apply it and then get
> a diff in the other format if you prefer reading the other.  However,
> if you're just hoping people will eyeball the patch in email and comment
> on it, readability matters.  If the patch requires manual fixup in order
> to get it to apply anymore, readability is also a concern, since you're
> dependent on the committer not misinterpreting the hunks he has to patch
> in by hand.
>

OK thanks, that's good to know.
I tend to find context format easier to read for large patches, but
that's a highly subjective thing.

Regards,
Dean

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] REVIEW: Optimize referential integrity checks (todo item)

2012-06-16 Thread Tom Lane

Dean Rasheed  writes:
> BTW, I had no problems applying both the original patch and Chetan
> Suttraway's version. The only difference between the patches seems to
> be that the original is in context format, and Chetan Suttraway's is
> in unified format.

> Which format do hackers actually prefer? The wiki page
> http://wiki.postgresql.org/wiki/Working_with_Git#Context_diffs_with_Git
> suggests context format, but then the linked example
> http://wiki.postgresql.org/wiki/Creating_Clean_Patches is in unified
> format. Do people care, or are both formats OK?

Some people find one or the other more readable.  (I'm in the camp that
says unified format is great for isolated single-line changes and
utterly unreadable for anything more complex, but apparently there are
people who prefer it.)

For detailed review/commit purposes, it doesn't matter that much as long
as the patch applies cleanly, since it's easy to apply it and then get
a diff in the other format if you prefer reading the other.  However,
if you're just hoping people will eyeball the patch in email and comment
on it, readability matters.  If the patch requires manual fixup in order
to get it to apply anymore, readability is also a concern, since you're
dependent on the committer not misinterpreting the hunks he has to patch
in by hand.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] REVIEW: Optimize referential integrity checks (todo item)

2012-06-16 Thread Dean Rasheed

On 12 February 2012 02:06, Vik Reykja  wrote:
> I decided to take a crack at the todo item created from the following post:
> http://archives.postgresql.org/pgsql-performance/2005-10/msg00458.php
>
> The attached patch makes the desired changes in both code and function
> naming.
>
> It seemed quite easy to do but wasn't marked as easy on the todo, so I'm
> wondering if I've missed something.  All regression tests pass.
>

Here's my review of this patch.

Basic stuff:

* Patch applies OK (some offsets).

BTW, I had no problems applying both the original patch and Chetan
Suttraway's version. The only difference between the patches seems to
be that the original is in context format, and Chetan Suttraway's is
in unified format.

Which format do hackers actually prefer? The wiki page
http://wiki.postgresql.org/wiki/Working_with_Git#Context_diffs_with_Git
suggests context format, but then the linked example
http://wiki.postgresql.org/wiki/Creating_Clean_Patches is in unified
format. Do people care, or are both formats OK?

* Compiles cleanly with no warnings.

* Regression tests pass.

The regression tests have not been updated. I think that's fair enough
- I don't see a way to test this in a regression test - but I did some
testing (see below) to confirm the expected behaviour.

* No doc changes needed.

What it does:
-

The primary benefit this patch offers is to prevent unnecessary
queuing of RI triggers in the case of updates to a FK table where the
old and new FK values are both NULL. It does this by effectively
replacing the existing key equality checks with IS [NOT] DISTINCT FROM
checks.

This seems like a worthwhile optimisation, because I think that it is
fairly common to have NULLs in FKs columns, and then update some other
column.

The patch also prevents unnecessary queuing of RI triggers when the PK
table is updated, and the old and new values are both NULL, but that
seems like a much less common case.

I've looked over the code changes fairly closely, and I believe that
this is a safe change.

Technically, I think the changes to ri_OneKeyEqual() and
ri_AllKeysUnequal() are unnecessary, since they can only be called
from the trigger functions in the case where all the old values are
non-NULL, hence the new versions end up behaving the same. However, I
think it makes sense for all these functions to be consistent.

Talking of consistency, I wonder if RI_FKey_keyequal_upd_pk() and
RI_FKey_keyequal_upd_fk() ought to be renamed to
RI_FKey_keyunchanged_upd_pk() and RI_FKey_keyunchanged_upd_pk()?

Testing:

I tested using the following tables:

CREATE TABLE pk_table
(
  a int PRIMARY KEY
);

INSERT INTO pk_table
  SELECT * FROM generate_series(1,10);

CREATE TABLE fk_table
(
  a int PRIMARY KEY,
  b int,
  c int,
  d int,
  e int REFERENCES pk_table(a)
);

INSERT INTO fk_table
  SELECT i, i*2, i*3, i*4, CASE WHEN i%10 = 0 THEN i END
FROM generate_series(1,10) g(i);

(i.e., FK populated in 10% of rows)

Then in HEAD:
EXPLAIN ANALYSE UPDATE fk_table SET b=b+1, c=c+1, d=d+1;

  QUERY PLAN
---
 Update on fk_table  (cost=0.00..2300.00 rows=10 width=26) (actual
time=1390.037..1390.037 rows=0 loops=1)
   ->  Seq Scan on fk_table  (cost=0.00..2300.00 rows=10 width=26)
(actual time=0.010..60.841 rows=10 loops=1)
 Trigger for constraint fk_table_e_fkey: time=210.184 calls=9
 Total runtime: 1607.626 ms
(4 rows)

So the RI trigger is fired 9 times, for the unchanged NULL FK rows.

With this patch, the RI trigger is not fired at all:
EXPLAIN ANALYSE UPDATE fk_table SET b=b+1, c=c+1, d=d+1;

  QUERY PLAN
---
 Update on fk_table  (cost=0.00..2300.00 rows=10 width=26) (actual
time=1489.640..1489.640 rows=0 loops=1)
   ->  Seq Scan on fk_table  (cost=0.00..2300.00 rows=10 width=26)
(actual time=0.010..66.328 rows=10 loops=1)
 Total runtime: 1489.679 ms
(3 rows)

Similarly, if I update the FK column in HEAD the RI trigger is fired
for every row:
EXPLAIN ANALYSE UPDATE fk_table SET e=e-1;

  QUERY PLAN
---
 Update on fk_table  (cost=0.00..1800.00 rows=10 width=26) (actual
time=1565.148..1565.148 rows=0 loops=1)
   ->  Seq Scan on fk_table  (cost=0.00..1800.00 rows=10 width=26)
(actual time=0.010..42.725 rows=10 loops=1)
 Trigger for constraint fk_table_e_fkey: time=705.962 calls=10
 Total runtime: 2279.408 ms
(4 rows)

whereas with this patch it is only fired for the non-NULL FK rows that
are changing:
EXPLAIN ANALYSE UPDATE fk_tab

Re: [HACKERS] REVIEW: Optimize referential integrity checks (todo item)

Re: [HACKERS] REVIEW: Optimize referential integrity checks (todo item)

Re: [HACKERS] REVIEW: Optimize referential integrity checks (todo item)

Re: [HACKERS] REVIEW: Optimize referential integrity checks (todo item)

Re: [HACKERS] REVIEW: Optimize referential integrity checks (todo item)

Re: [HACKERS] REVIEW: Optimize referential integrity checks (todo item)

Re: [HACKERS] REVIEW: Optimize referential integrity checks (todo item)

Re: [HACKERS] REVIEW: Optimize referential integrity checks (todo item)

Re: [HACKERS] REVIEW: Optimize referential integrity checks (todo item)

Re: [HACKERS] REVIEW: Optimize referential integrity checks (todo item)

Re: [HACKERS] REVIEW: Optimize referential integrity checks (todo item)

[HACKERS] REVIEW: Optimize referential integrity checks (todo item)

12 matches

Site Navigation

Mail list logo

Footer information