On 10 June 2013 07:06, Noah Misch <n...@leadboat.com> wrote: > On Sun, Jun 09, 2013 at 10:51:43AM +0100, Simon Riggs wrote: >> On 9 June 2013 02:12, Noah Misch <n...@leadboat.com> wrote: >> > On Sat, Jun 08, 2013 at 08:20:42PM -0400, Robert Haas wrote: >> >> On Sat, Jun 8, 2013 at 5:41 PM, Noah Misch <n...@leadboat.com> wrote: >> >> > Likewise; I don't see why we couldn't perform an optimistic check ASAP >> >> > and >> >> > schedule a final after-statement check when an early check fails. That >> >> > changes performance characteristics without changing semantics. >> >> >> >> ...this seems like it might have some promise; but what if the action >> >> we're performing isn't idempotent? And how do we know? >> > >> > The action discussed so far is RI_FKey_check_ins(). It acquires a KEY >> > SHARE >> > lock (idempotent by nature) on a row that it finds using B-tree equality >> > (presumed IMMUTABLE, thus idempotent). RI_FKey_check_upd() is nearly the >> > same >> > action, so the same argument holds. Before treating any other operation in >> > the same way, one would need to conduct similar analysis. >> >> As long as we are talking about FKs only, then this approach can work. >> All we are doing is applying the locks slightly earlier than before. >> Once locked they will prevent any later violations, so we are safe >> from anybody except *ourselves* from making changes that would >> invalidate the earlier check. Trouble is, there are various ways I >> can see that as possible, so making a check early doesn't allow you to >> avoid making the check later as well. > > This UPDATE or DELETE that invalidates the check by modifying the PK row will > fire the usual RI_FKey_*_{upd,del} trigger on the PK table. That will (a) > fail the transaction, (b) CASCADE to delete the new FK row, or (c) update the > new FK row's key column to NULL/DEFAULT. If (a) happens we're of course fine. > If (b) or (c) happens, the FK's AFTER check already today becomes a no-op due > to the visibility test in RI_FKey_check(). Consequently, I don't think later > actions of the SQL statement can put us in a position to need a second check. > >> AFAICS there are weird cases where changing the way FKs execute will >> change the way complex trigger applications will execute. I don't see >> a way to avoid that other than "do nothing". Currently, we execute the >> checks following the normal order of execution rules for triggers. >> Every idea we've had so far changes that in some way; variously in >> major or minor ways, but changed nonetheless. > > I've tried to envision a trigger-creates-missing-references scenario that > would notice the difference. The trigger in question would, I'm presuming, be > an AFTER ROW INSERT trigger named such that it fires before the FK trigger. > The initial optimistic FK check would fail, so we would queue a traditional FK > AFTER trigger. From that point, the scenario proceeds exactly as it proceeds > today. Could you detail a problem scenario you have in mind? > >> Even the approach of deferring checks to allow them to be applied in a >> batch mean we might change the way applications execute in detail. >> However, since the only possible change there would be to decrease the >> number of self-induced failures that seems OK. >> >> So the question is how much change do we want to introduce? I'll guess >> "not much", rather than "lots" or "none". > > The batch would need to fire at the trigger firing position of the *last* > queue entry it covers. If you run a final FK check earlier than that, other > AFTER triggers that expect to run before the FK check and affect its outcome > may not yet have run. In contrast, an FK check running later than usual is > mostly fine; whether a tuple has been locked does not affect the semantics of > later ordinary actions in the transaction. (I say "ordinary" to exclude a > function like pgrowlocks() that makes a point to discern. Also note that the > reasoning about timing only applies to definitive FK checks that can throw > errors; a tentative, soft-failure check is acceptable anytime after the new > row is in place.) > > One can, however, at least construct problem cases. When a query calls > functions that perform non-transactional actions, changing the timing of an > ERROR with respect to those calls changes application behavior. Taking locks > in a different order affects the incidence of deadlocks. Does compatibility > to that degree have much value? I'd be happy to file those in the "not much > change" category. > >> Proposal: Have a WHEN clause that accumulates values to be checked in >> a hash table up to work_mem in size, allowing us to eliminate the most >> common duplicates (just not *all* duplicates). If the value isn't a >> duplicate (or at least the first seen tuple with that value), we will >> queue up a check for later. That approach gives us *exactly* what we >> have now and works with the two common cases: i) few, mostly >> duplicated values, ii) many values, but clustered together. Then apply >> changes in batches at end of statement. > > I'm still fine with this proposal, but it does not dramatically sidestep these > sorts of tricky situations. Suppose a COPY inserts rows with fkcol=1 at TIDs > (0,3), (0,5) and (0,6). You queue the (0,3) check and omit the rest. This > works great so long as (0,3) still satisfies SnapshotSelf by the time the > check actually happens. If (0,3) is dead to SnapshotSelf, the check should > instead happen at the timing of (0,5). If that's dead, at the timing of > (0,6). If that too is dead, the check must not happen at all. By the time > you store enough logistics data to rigorously mirror current behavior, you've > recreated today's trigger queue.
Your earlier comments argue that it is OK to make an early check. The above seems to argue the opposite, not sure. IIUYC we can do this: * search hash table for a value, if found, skip check and continue * if entry in hash not found make an immediate FK check * if the check passes, store value in hash table, if it fits * if check does not pass or value doesn't fit, queue up an after trigger queue entry except we want to batch things a little, so same algo just with a little batching. * search hash table for a value, if found, skip check and continue * if entry in hash not found add to next batch of checks and continue * when batch full make immediate FK checks for whole batch in one SQL stmt * if a check passes, store value in hash table, if it fits * if check does not pass or value doesn't fit, queue up an after trigger queue entry * when executing queue, use batches to reduce number of SQL stmts Which hopefully is the amalgam of all the good ideas so far -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers