On Fri, 15 Aug 2003, Tom Lane wrote:

> Stephan Szabo <[EMAIL PROTECTED]> writes:
> > On Fri, 15 Aug 2003, Christopher Kings-Lynne wrote:
> >> I can also attest to the horrendously long time it takes to restore the ADD
> >> FOREIGN KEY section...
>
> > That really needs to be rewritten to do a single check over the table
> > rather than running the constraint for every row.  I keep meaning to get
> > around to it and never actually do. :(  I'm not sure that in practice
> > you'll get a better plan at restore time depending on what the default
> > statistics give you.
>
> In simple cases I think that the creation of indexes would be enough to
> get you a passable plan --- CREATE INDEX does update pg_class.reltuples,
> so the planner will know how big the tables are, and for single-column
> primary keys the existence of a unique index is enough to cue the
> planner that the column is unique, even without any ANALYZE stats.
> Those are the biggest levers on the plan choice.
>
> This assumes that pg_dump always dumps CREATE INDEX before ADD FOREIGN
> KEY; I'm not certain if there's anything to enforce that at the
> moment...
>
> I assume what you have in mind is to replace
> validateForeignKeyConstraint() with something that does a join of the
> two tables via an SPI command.  But supposing that we want to keep the
> present ability to report (one of) the failing key values, it seems
> like the query has to look like
>       SELECT keycolumns FROM referencing_table WHERE
>       keycolumns NOT IN (SELECT refcols FROM referenced_table);
> which is only gonna do the right thing for one of the MATCH styles
> (not sure which, offhand ... actually it may not do the right thing
> for any match style if there are nulls in referenced_table ...).

Yes, in practice, you'd have to put IS NOT NULL checks in the subselect,
which is fine for the two match types we support since a referenced row
with a NULL isn't a choice for a referenced row for those.  I think MATCH
PARTIAL might have to fall back to the repeated check unless we can make
the query work which would be harder because you only want to compare the
columns for a particular row where the keycolumn case is not null and I
can't think of a query for that that'd be particularly clean and likely to
be fast, then again I don't think the constraint would be either. :(

It'd probably be:
MATCH unspecified:
 SELECT keycolumns FROM referencing_table WHERE
  (keycolumns) NOT IN (SELECT refcols FROM referenced_table
    WHERE refcol1 IS NOT NULL AND ... )
 AND keycolumn1 IS NOT NULL AND ...;

MATCH FULL: (something like, I haven't tried it)
 SELECT keycolumns FROM referencing_table WHERE
  ((keycolumns) NOT IN (SELECT refcols FROM referenced_table
   WHERE refcol1 IS NOT NULL AND ...)
  AND
   (keycolumn1 IS NOT NULL AND ...)
  )
  OR ((keycolumn1 IS NOT NULL)!=(keycolumn2 IS NOT NULL) OR ...)

> How would you make it work for all the MATCH styles?  And will it
> really be all that efficient?  (NOT IN is a lot more circumscribed
> than IN.)

I'm not really sure yet.  Limited tests seem to show that it'll probably
be as fast if not faster for all reasonable cases, but I'd want to
generate a much larger random data set and actually put it in to make a
fair comparison (maybe temporarily with a set to allow people to try both
cases on real world data).  One other advantage here is that we don't need
to get row locks while checking this if we've already gotten the exclusive
table locks on both tables involved.  I'm not sure if we do that currently
though.



---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
      subscribe-nomail command to [EMAIL PROTECTED] so that your
      message can get through to the mailing list cleanly

Reply via email to