Re: [HACKERS] INSERT/SELECT and excessive foreign key checks
On 21-aug-2007, at 10:55, Alvaro Herrera wrote:

> It might go in if it's correct. If you have an answer to all the
> objections then there's no reason not to include it. But I must admit I
> didn't understand what your answer to the above objection was; care to
> rephrase?

sorry, egg on my face, testing error. the revised patch doesn't catch the case of another backend deleting referenced tuples. I'll work on it some more.

Lodewijk

---(end of broadcast)---
TIP 6: explain analyze is your friend
Re: [HACKERS] INSERT/SELECT and excessive foreign key checks
Lodewijk Vöge escribió:
> On 19-aug-2007, at 12:38, Tom Lane wrote:
>
>> An additional problem with your proposal is that it fails to consider
>> other changes that might be happening concurrently -- eg, what if some
>> other backend deletes a source row after you copy it, and commits before
>> you do?
>
> then the patch indeed failed, but when I change it to check those carried
> over FKs also once, it catches it correctly.
>
> are there other such issues? or is this kind of optimization not going in
> no matter what?

It might go in if it's correct. If you have an answer to all the objections then there's no reason not to include it. But I must admit I didn't understand what your answer to the above objection was; care to rephrase?

-- 
Alvaro Herrera  http://www.amazon.com/gp/registry/CTMLCN8V17R4
"On the other flipper, one wrong move and we're Fatal Exceptions"
(T.U.X.: Term Unit X - http://www.thelinuxreview.com/TUX/)
Re: [HACKERS] INSERT/SELECT and excessive foreign key checks
On 19-aug-2007, at 12:38, Tom Lane wrote:

> An additional problem with your proposal is that it fails to consider
> other changes that might be happening concurrently -- eg, what if some
> other backend deletes a source row after you copy it, and commits before
> you do?

then the patch indeed failed, but when I change it to check those carried over FKs also once, it catches it correctly.

are there other such issues? or is this kind of optimization not going in no matter what?

Lodewijk
Re: [HACKERS] INSERT/SELECT and excessive foreign key checks
On 19-aug-2007, at 12:38, Tom Lane wrote:

> "Hack" is the right word. People keep proposing variants of the idea
> that the executor should optimize updates on the basis of examining
> the query tree to see whether columns changed or not, and they're always
> wrong. You don't know what else might have been done to the row by
> BEFORE triggers.

but that's something it can check for. if there are BEFORE triggers on the table, don't do it.

> An additional problem with your proposal is that it fails to consider
> other changes that might be happening concurrently -- eg, what if some
> other backend deletes a source row after you copy it, and commits before
> you do? There would be an interval with no committed row having that FK
> value, and no one holding a row lock on the referenced PK row, so some
> third transaction could delete the PK row.

so if it checks those FKs being carried over also only once, that would close that hole, right?

it would just be nice to not have to disable triggers altogether in this case. there is a person twiddling his/her thumbs while all this checking and re-checking is going on.

Lodewijk
Re: [HACKERS] INSERT/SELECT and excessive foreign key checks
James Mansion wrote:
> I was wondering whether one could try to identify what might be termed
> 'enum tables' that exist to provide lookups. There are perhaps three main
> types of table that are the target of a foreign key lookup:
>
> 1) tables that map to program language enumerations: typically small
> (less than a hundred rows) and changing very infrequently.

8.3's enum types will help with these, and they get you out of the FK game altogether.

cheers

andrew
Re: [HACKERS] INSERT/SELECT and excessive foreign key checks
"Webb Sprague" <[EMAIL PROTECTED]> writes:

> Is there a different potential hack for marking a table read-only,
> turning it on and off with a function()? In a hackish vein, use a
> trigger to enforce this, and maybe a rule that can do the
> optimization?

I think several people already have something like this in mind for the next release, for several different motivations. It is (as most things are in Postgres) a bit trickier than it appears, since even if you block subsequent writes, the table's contents still "change" from the point of view of clients when their snapshots change.

What's needed is a two-phase command which first starts blocking writes to the table, then vacuums it, waiting on each page until every tuple in the entire table can be frozen. At that point the contents are truly static and the table can be marked as such. That would enable a number of optimizations:

- The table can be moved to a read-only medium.
- Index scans can be index-only scans.
- The statistics could gather information such as min/max for each column, and the planner could trust this data. That would allow constraint exclusion to kick in for partitions even if you're not querying on the partition key. It also allows us to exclude the parent table of the inheritance tree.
- FK checks could rely on a share table lock instead of row locks and aggressively cache which key values are found, even across transactions.

But this all relies on a user-visible operation to switch the table from read-write to read-only and back again. It cannot switch the behaviour transparently, because switching it back to read-write requires taking a lock and notifying everyone to dump their old plans and caches. Bruce's idea had the merit that it could be made transparent, but I think as a result it has too few optimizations that would be safe to do.

As a DBA I don't think I would have been too upset at the idea that if I manually marked a table read-only it would enable various optimizations. Especially if I was told I could mark it read-write whenever I felt like, make my changes, and then set it back to read-only. It does have the "one more knob" nature, though.

-- 
Gregory Stark
EnterpriseDB  http://www.enterprisedb.com
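The two-phase switch described in that message is easier to see as code. Here is a minimal Python sketch (invented names and semantics, not a real PostgreSQL feature) of a table that first blocks writes, then "freezes" each page, and only afterwards advertises itself as safe for static-table optimizations:

```python
class Table:
    """Toy model of the proposed read-only switch; states are
    read_write -> freezing -> read_only."""

    def __init__(self, pages):
        self.pages = pages              # page number -> list of tuples
        self.state = "read_write"
        self.frozen = set()

    def write(self, page, value):
        if self.state != "read_write":
            raise PermissionError("table is read-only")
        self.pages[page].append(value)

    def set_read_only(self):
        self.state = "freezing"         # phase 1: block further writes
        for page in self.pages:         # phase 2: freeze page by page
            self.frozen.add(page)
        self.state = "read_only"        # contents are now truly static

    def can_use_static_optimizations(self):
        # e.g. index-only scans, trusted min/max stats, cached FK keys
        return self.state == "read_only" and self.frozen == set(self.pages)

t = Table({0: [1, 2], 1: [3]})
t.set_read_only()
print(t.can_use_static_optimizations())   # True
```

The point of the explicit two-phase operation is the same as in the email: only after every tuple is frozen can other backends treat the contents as immutable.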
Re: [HACKERS] INSERT/SELECT and excessive foreign key checks
> ... People keep proposing variants of the idea
> that the executor should optimize updates on the basis of examining
> the query tree to see whether columns changed or not, and they're always
> wrong. You don't know what else might have been done to the row by
> BEFORE triggers.

Is there a different potential hack for marking a table read-only, turning it on and off with a function()? In a hackish vein, use a trigger to enforce this, and maybe a rule that can do the optimization?

I wouldn't be the one writing this, so maybe it is ridiculous and I apologize for contributing to list-noise, but perhaps it would be useful for those tables that change so rarely. Of course, it would be a "contrib" package and nothing for the core.

-w
Re: [HACKERS] INSERT/SELECT and excessive foreign key checks
Lodewijk Voege <[EMAIL PROTECTED]> writes:
> I hacked up a patch that handles these two cases:

"Hack" is the right word. People keep proposing variants of the idea that the executor should optimize updates on the basis of examining the query tree to see whether columns changed or not, and they're always wrong. You don't know what else might have been done to the row by BEFORE triggers.

An additional problem with your proposal is that it fails to consider other changes that might be happening concurrently -- eg, what if some other backend deletes a source row after you copy it, and commits before you do? There would be an interval with no committed row having that FK value, and no one holding a row lock on the referenced PK row, so some third transaction could delete the PK row.

There are some implementation problems in this patch too, but it's not necessary to delve into such details when the basic premise is unworkable. Sorry...

regards, tom lane
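The concurrency objection above is easiest to follow as a timeline. This is a hedged Python model (sets standing in for committed tuples and row locks; none of this is PostgreSQL code) of the window where a third transaction can delete the referenced PK row:

```python
# Committed state visible to other transactions.
committed_fk_rows = {("src", 100)}   # one source row referencing PK 100
committed_pk_rows = {100}
pk_row_locks = set()                 # row locks normally protect in-flight FKs

# Backend A: INSERT/SELECT copies the source row; under the proposed
# optimization it performs no FK check, so it takes no PK row lock.
a_uncommitted = {("copy", 100)}

# Backend B: deletes the source row and commits before A does.
committed_fk_rows.discard(("src", 100))

# The window: no committed row references PK 100 and nobody locks it...
assert not any(fk == 100 for (_, fk) in committed_fk_rows)
assert 100 not in pk_row_locks

# ...so backend C can legitimately delete the PK row.
committed_pk_rows.discard(100)

# Backend A commits: its copy now references a vanished PK.
committed_fk_rows |= a_uncommitted
dangling = [(r, fk) for (r, fk) in committed_fk_rows
            if fk not in committed_pk_rows]
print(dangling)        # [('copy', 100)] -- referential integrity broken
```

With the FK check in place, backend A would hold a share lock on PK row 100 until commit, and backend C's delete would block instead of succeeding.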
Re: [HACKERS] INSERT/SELECT and excessive foreign key checks
Andrew Dunstan wrote:
> Could we achieve the same thing in a more general way by having a per-FK
> tiny (say 10?) LRU cache of values checked? Then it wouldn't only be
> restricted to constant expressions. Of course, then the trigger would need
> to keep state, so it might well be too complex (e.g. what if there are
> concurrent inserts?)

I was wondering whether one could try to identify what might be termed 'enum tables' that exist to provide lookups. There are perhaps three main types of table that are the target of a foreign key lookup:

1) tables that map to program language enumerations: typically small (less than a hundred rows) and changing very infrequently.

2) tables that hold quasi-static reference data where rows are 'never' deleted (they may be amended, perhaps to identify that they are logically inactivated, but are still needed for reference lookup from existing rows elsewhere) - typically customer definitions, product definitions, site definitions and that sort of thing that is often regarded as 'static data' by a user application session but which may change.

3) master records in master/detail relationships such as order/orderline.

If you can have mechanisms that reflect the likelihood of an update and optimise accordingly, then hopefully performance in real-world applications can be improved.

In the case of 1), for example, we might reasonably have a single logical read/write lock that controls access to ALL such tables in a schema, and a single 'update generation count'. The lock would effectively provide repeatable-read stability across all of the tables (a multi-table table lock) while in place, and the generation count (which can be a tran id) indicates to caching processes when the cache is stale. This means that, unlike normal MVCC, the readers will block a writer, but in this case we expect the write to happen only during application release.

In the case of 2), we can't use the cross-table lock, and the tables may be large, so the suggested LRU cache per table (with a table-level read/write lock again) may be most effective, but we may elect to regard a read lock as allowing any operation that doesn't invalidate the primary key.

And in the case of 3) we don't do anything special at all.

I certainly think that anything which can materially reduce lookups in case 1) and hopefully 2) will encourage good database design and declarative referential integrity - in some classes of high-performance application the cost is too high to be done inline with an update, which is a shame.

James
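The 'update generation count' scheme for case 1) can be sketched in a few lines of Python (all names invented for illustration): each client stamps its lookup cache with the generation it was filled under, and any write to any enum table bumps the shared counter, invalidating every cache at once:

```python
class EnumTableRegistry:
    """All 'enum tables' share one generation counter; any write
    bumps it, signalling that cached lookups are stale."""

    def __init__(self):
        self.generation = 0
        self.tables = {}                # table name -> {key: row}

    def write(self, table, key, row):
        self.tables.setdefault(table, {})[key] = row
        self.generation += 1            # invalidate all caches

class CachingClient:
    """Caches lookups; drops everything when the generation moves."""

    def __init__(self, registry):
        self.registry = registry
        self.cache = {}
        self.cached_generation = -1

    def lookup(self, table, key):
        if self.cached_generation != self.registry.generation:
            self.cache.clear()          # stale: discard and restamp
            self.cached_generation = self.registry.generation
        if (table, key) not in self.cache:
            self.cache[(table, key)] = self.registry.tables[table][key]
        return self.cache[(table, key)]

reg = EnumTableRegistry()
reg.write("status", 1, "active")
client = CachingClient(reg)
print(client.lookup("status", 1))       # "active" (fetched, then cached)
reg.write("status", 2, "closed")        # bumps the generation
print(client.lookup("status", 2))       # cache invalidated, refetched
```

Since enum tables change only at application-release time, the cost of the occasional full cache flush is negligible against the saved per-row FK lookups.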
Re: [HACKERS] INSERT/SELECT and excessive foreign key checks
"Andrew Dunstan" <[EMAIL PROTECTED]> writes:

> Could we achieve the same thing in a more general way by having a per-FK
> tiny (say 10?) LRU cache of values checked? Then it wouldn't only be
> restricted to constant expressions. Of course, then the trigger would need
> to keep state, so it might well be too complex (e.g. what if there are
> concurrent inserts?)

The danger here is actually that the same transaction (or command, in the case of non-deferred constraints) later deletes the same record. eg, something like:

  INSERT INTO tab VALUES (1)     -> queues check for FK value 1
  DELETE FROM tab WHERE fk = 1
  DELETE FROM othertab WHERE id = 1
  INSERT INTO tab VALUES (1)     -> skips queuing check because 1 is in cache

Now when the triggers fire, the first check is skipped because the record that fired it is no longer valid, and the second record never queued a check at all.

You could do it when it comes time to actually perform the check, though: keep a cache of values you've already checked. But I'm not sure how much that will save.

I was also thinking of having a cache which simply avoided the SPI query, by keeping the tid and xmin found previously. Then we could look up the actual record directly instead of going through the executor. As long as the record is still there with the same xmin, we know the value we cached for it is still valid. This would help the OLTP case where you're performing many small transactions which refer to the same records over and over again.

-- 
Gregory Stark
EnterpriseDB  http://www.enterprisedb.com
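The distinction between caching at queue time and caching at check time can be made concrete with a small Python model (plain lists and sets, not the trigger machinery): the first variant misses the violation in the sequence above, while the second still catches it:

```python
def queue_time_cache(events, referenced):
    """Unsafe: an insert whose value is 'in cache' never queues a
    check, so a violation created in between goes unnoticed."""
    queued, cache = [], set()
    for op, value in events:
        if op == "insert" and value not in cache:
            cache.add(value)
            queued.append(value)
        elif op == "delete_fk":            # deleting the inserting row
            queued.remove(value)           # drops its queued check
        elif op == "delete_pk":
            referenced.discard(value)      # referenced row goes away
    return [v for v in queued if v not in referenced]

def check_time_cache(events, referenced):
    """Safer: every insert queues an event; the cache only dedups
    the actual lookups once the checks finally run."""
    queued = []
    for op, value in events:
        if op == "insert":
            queued.append(value)
        elif op == "delete_fk":
            queued.remove(value)
        elif op == "delete_pk":
            referenced.discard(value)
    verified, violations = set(), []
    for v in queued:                       # one lookup per distinct value
        if v in verified:
            continue
        verified.add(v)
        if v not in referenced:
            violations.append(v)
    return violations

events = [("insert", 1), ("delete_fk", 1), ("delete_pk", 1), ("insert", 1)]
print(queue_time_cache(events, {1}))   # [] -- violation missed
print(check_time_cache(events, {1}))   # [1] -- violation caught
```

The check-time variant is safe because by the time the deferred checks run, all of them see the same state, so "already verified in this pass" really does imply the value is present.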
Re: [HACKERS] INSERT/SELECT and excessive foreign key checks
Gregory Stark wrote:
> "Lodewijk Voege" <[EMAIL PROTECTED]> writes:
>
>> I hacked up a patch that handles these two cases:
>> - for such an INSERT/SELECT, check constant FKs only once.
>
> This sounds like a clever idea. It seems the abstraction violation is
> worth it to me.

Could we achieve the same thing in a more general way by having a per-FK tiny (say 10?) LRU cache of values checked? Then it wouldn't only be restricted to constant expressions. Of course, then the trigger would need to keep state, so it might well be too complex (e.g. what if there are concurrent inserts?)

cheers

andrew
Re: [HACKERS] INSERT/SELECT and excessive foreign key checks
"Lodewijk Voege" <[EMAIL PROTECTED]> writes:

> I hacked up a patch that handles these two cases:
> - for such an INSERT/SELECT, check constant FKs only once.

This sounds like a clever idea. It seems the abstraction violation is worth it to me.

> - for an INSERT/SELECT from/to the same table, don't check FKs that are
>   carried over as is at all. (it'd be nice to extend this to fields that
>   have the same FK constraint, rather than the current patch's strict
>   same-table, same-field condition)

I'm not sure this is safe. It means you're not locking the target row. If someone else comes along and deletes both the original source record and the target record before you've gotten around to inserting the new record...

-- 
Gregory Stark
EnterpriseDB  http://www.enterprisedb.com
[HACKERS] INSERT/SELECT and excessive foreign key checks
hello,

I'm working on an application that once in a while needs to shuffle some data around with a query like:

  INSERT INTO foo (some_field, bar_FK, baz_FK)
  SELECT some_field, 12345 AS bar_FK, baz_FK
  FROM foo
  WHERE bar_FK = 123

ie. copy a block of data from a table into itself while setting one foreign key to a constant and carrying some other foreign keys over as is.

the foreign key checks for this case appear very suboptimal: for every inserted row, both the constant foreign key as well as the keys being carried over are checked. some of these blocks of data being copied are large, the largest block being (currently) 1.6M rows. the table has three such foreign keys, which from reading the code means 4.8M trigger events get queued up, where it only needs to do exactly one as far as I can tell.

I hacked up a patch that handles these two cases:
- for such an INSERT/SELECT, check constant FKs only once.
- for an INSERT/SELECT from/to the same table, don't check FKs that are carried over as is at all. (it'd be nice to extend this to fields that have the same FK constraint, rather than the current patch's strict same-table, same-field condition)

the difference for this application is pretty dramatic. on my development system, copying 1.6M rows in this manner with the table having 3 FKs on it:
- 8.3 from CVS with 128MB of shared memory takes just under 6 minutes. the postgres process peaks at 552MB virtual, 527MB RSS and 135MB shared.
- 8.3 with patch and 128MB of shared memory takes 42 seconds. the process peaks at 160MB virtual, 138MB RSS and 135MB shared.

so, is this optimization generally useful and correct? and if so, what do I do to get a version of the patch in the tree? it's against CVS HEAD (well, against the HEAD of the SVN mirror of CVS) and it passes the regression tests. but I have no clue whether I'm trampling over abstraction barriers, etc. comments appreciated.
thanks,
Lodewijk

Index: src/include/commands/trigger.h
===================================================================
--- src/include/commands/trigger.h	(revision 28781)
+++ src/include/commands/trigger.h	(working copy)
@@ -157,6 +157,7 @@
 /*
  * in utils/adt/ri_triggers.c
  */
+extern bool RI_FKey_all_attrs_covered(Trigger *trigger, Relation fk_rel, int16 *attnums, int num_attnums);
 extern bool RI_FKey_keyequal_upd_pk(Trigger *trigger, Relation pk_rel,
 			HeapTuple old_row, HeapTuple new_row);
 extern bool RI_FKey_keyequal_upd_fk(Trigger *trigger, Relation fk_rel,
Index: src/backend/executor/execMain.c
===================================================================
--- src/backend/executor/execMain.c	(revision 28781)
+++ src/backend/executor/execMain.c	(working copy)
@@ -39,6 +39,7 @@
 #include "catalog/heap.h"
 #include "catalog/namespace.h"
 #include "catalog/toasting.h"
+#include "catalog/pg_trigger.h"
 #include "commands/tablespace.h"
 #include "commands/trigger.h"
 #include "executor/execdebug.h"
@@ -65,6 +66,9 @@
 /* decls for local routines only used within this module */
 static void InitPlan(QueryDesc *queryDesc, int eflags);
+static void FilterFKChecks(EState *estate, int (*condition)(TargetEntry *te));
+static void FilterFKChecksForFieldsCarriedOver(EState *estate);
+static void FilterFKChecksForConstantFields(EState *estate);
 static void initResultRelInfo(ResultRelInfo *resultRelInfo,
 			Relation resultRelationDesc,
 			Index resultRelationIndex,
@@ -821,6 +825,9 @@
 	queryDesc->tupDesc = tupType;
 	queryDesc->planstate = planstate;
 
+	if (operation == CMD_INSERT)
+		FilterFKChecksForFieldsCarriedOver(estate);
+
 	/*
 	 * If doing SELECT INTO, initialize the "into" relation. We must wait
 	 * till now so we have the "clean" result tuple type to create the new
@@ -832,7 +839,76 @@
 		OpenIntoRel(queryDesc);
 }
 
+static void
+FilterFKChecks(EState *estate, int (*condition)(TargetEntry *te))
+{
+	TriggerDesc *trigdesc;
+	Trigger    *trigger;
+	ListCell   *em;
+	Relation	rel;
+	int16	   *attrs;
+	int			i, num_attrs;
+
+	if (estate->es_plannedstmt == NULL ||
+		estate->es_num_result_relations > 1 ||
+		estate->es_result_relations == NULL ||
+		estate->es_plannedstmt->planTree->type == T_ValuesScan)
+		return;
+	trigdesc = estate->es_result_relations[0].ri_TrigDesc;
+	if (trigdesc == NULL)
+		return;
+
+	num_attrs = 0;
+	attrs = palloc(
+		list_length(estate->es_plannedstmt->planTree->targetlist) * sizeof(int16));
+	foreach(em, estate->es_plannedstmt->planTree->targetlist)
+	{
+		TargetEntry *te = (TargetEntry *) lfirst(em);
+
+		if (condition(te))
+			attrs[num_attrs++] = te->resno;
+	}
+
+	rel
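The 4.8M-versus-one arithmetic in the report above comes down to deduplicating identical FK lookups. A toy Python model (not the patch itself; column names echo the example query) shows the effect of checking each distinct (column, value) pair once -- the patch goes further still by skipping carried-over columns entirely:

```python
def count_fk_checks(rows, fk_columns, dedup=False):
    """Count FK lookups for inserting `rows`. With dedup=True, each
    distinct (column, value) pair is verified only once -- the effect
    of checking a constant FK a single time."""
    checks = 0
    seen = set()
    for row in rows:
        for col in fk_columns:
            key = (col, row[col])
            if dedup and key in seen:
                continue
            seen.add(key)
            checks += 1
    return checks

# bar_fk is the constant 12345; baz_fk cycles through ten values.
rows = [{"bar_fk": 12345, "baz_fk": i % 10} for i in range(1000)]
print(count_fk_checks(rows, ["bar_fk", "baz_fk"]))              # 2000
print(count_fk_checks(rows, ["bar_fk", "baz_fk"], dedup=True))  # 11
```

For the constant column the saving is total (one check instead of one per row); for a varying column it degrades gracefully to one check per distinct value.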