Kaiting Chen:
I'd like to propose a change to PostgreSQL to allow the creation of a foreign
key constraint referencing a superset of uniquely constrained columns.

+1

Tom Lane:
TBH, I think this is a fundamentally bad idea and should be rejected
outright.  It fuzzes the semantics of the FK relationship, and I'm
not convinced that there are legitimate use-cases.  Your example
schema could easily be dismissed as bad design that should be done
some other way.

I had to add quite a few unique constraints on a superset of already uniquely constrained columns in the past, just to be able to support FKs to those columns. I think those cases most often come up when dealing with slightly denormalized schemas, e.g. for efficiency.

One other use-case I had recently was along the following lines, in abstract terms:

CREATE TABLE classes (class INT PRIMARY KEY, ...);

CREATE TABLE instances (
  instance INT PRIMARY KEY,
  class INT REFERENCES classes,
  ...
);

Think about classes and instances as in OOP. So the table classes contains some definitions for different types of object and the table instances realizes them into concrete objects.

Now, assume you have some property of a class that is best modeled as a table like this:

CREATE TABLE classes_prop (
  property INT PRIMARY KEY,
  class INT REFERENCES classes,
  ...
);

Now, assume you need to store data for each of those classes_prop rows for each instance. You'd do the following:

CREATE TABLE instances_prop (
  instance INT REFERENCES instances,
  property INT REFERENCES classes_prop,
  ...
);

However, this does not ensure that the instance and the property you're referencing in instances_prop are actually from the same class, so you add a class column:

CREATE TABLE instances_prop (
  instance INT,
  class INT,
  property INT,
  FOREIGN KEY (instance, class) REFERENCES instances (instance, class),
  FOREIGN KEY (property, class) REFERENCES classes_prop (property, class),
  ...
);

But this won't work without first creating UNIQUE constraints on those supersets of the PK columns.
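
For completeness, this is what the workaround looks like today; a sketch against the schema above (the extra indexes these create are exactly the overhead I'd like to avoid):

-- Widen the unique constraints so the composite FKs have something to reference.
-- Each of these builds an additional index that is largely redundant with the PK.
ALTER TABLE instances ADD UNIQUE (instance, class);
ALTER TABLE classes_prop ADD UNIQUE (property, class);
-- With these in place, the two FOREIGN KEY clauses above are accepted as written.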

For one example of where the semantics get fuzzy, it's not
very clear how the extra-baggage columns ought to participate in
CASCADE updates.  Currently, if we have
    CREATE TABLE foo (a integer PRIMARY KEY, b integer);
then an update that changes only foo.b doesn't need to update
referencing tables, and I think we even have optimizations that
assume that if no unique-key columns are touched then RI checks
need not be made.  But if you did
    CREATE TABLE bar (x integer, y integer,
                      FOREIGN KEY (x, y) REFERENCES foo(a, b) ON UPDATE CASCADE);
then perhaps you expect bar.y to be updated ... or maybe you don't?

In all use-cases I have had so far, I would expect bar.y to be updated, too.

I think it would not even be possible to NOT update bar.y, because the FK would then not match anymore. foo.a is the PK, so the value in bar.x already forces bar.y to be the same as foo.b at all times.

bar.y is a little bit like a generated value in that sense: it should always match foo.b. I think it would be great if we could actually go a step further, too: on an update of bar.x to a new value, if a row with foo.a = bar.x exists, I would like bar.y to be set automatically to that row's foo.b. Otherwise those kinds of updates always have to either query foo beforehand, or add a trigger to do the same.
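
Today I end up doing that with a trigger. A minimal sketch for the foo/bar example above (function and trigger names are made up):

CREATE FUNCTION bar_sync_y() RETURNS trigger AS $$
BEGIN
  -- Copy the current foo.b for the new bar.x into bar.y.
  -- If there is no matching foo row, y is set to NULL here and the FK
  -- constraint decides what happens next.
  SELECT b INTO NEW.y FROM foo WHERE a = NEW.x;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER bar_sync_y
  BEFORE INSERT OR UPDATE OF x ON bar
  FOR EACH ROW
  EXECUTE FUNCTION bar_sync_y();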

In the classes/instances example above, when updating instances_prop.property to a new value, instances_prop.class would be updated automatically to match classes_prop.class. This would fail when the class is different from the class required by the FK to instances, though, providing exactly the safeguard that this constraint was supposed to provide, without incurring additional overhead in update statements.

In the foo/bar example above, which is just a bit of denormalization, this automatic update would also be helpful - because rejecting the update on the grounds that the columns don't match doesn't make sense here.

Another example is that I think the idea is only well-defined when
the subset column(s) are a primary key, or at least all marked NOT NULL.
Otherwise they're not as unique as you're claiming.

I fail to see why. My understanding is that rows with NULL values in the referenced table can't participate in FK matches anyway, and that both MATCH SIMPLE and MATCH FULL don't require a match when any/all of the columns in the referencing table are NULL. MATCH PARTIAL is not implemented, so I can't tell whether the semantics would be different there.

I'm not sure whether a FK on a superset of unique columns would be useful with MATCH SIMPLE. Maybe it could be forced to be MATCH FULL, if MATCH SIMPLE is indeed not well-defined.
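
To illustrate the referencing-side behavior I mean, a small sketch (foo2/bar2 and the explicit UNIQUE (a, b) are just for the demonstration):

CREATE TABLE foo2 (a integer, b integer, UNIQUE (a, b));

CREATE TABLE bar2 (
  x integer,
  y integer,
  FOREIGN KEY (x, y) REFERENCES foo2 (a, b) MATCH FULL
);

-- MATCH FULL: inserting (1, NULL) into bar2 is rejected, while (NULL, NULL) is
-- allowed without a matching foo2 row.
-- With the default MATCH SIMPLE, both rows would be allowed without consulting foo2.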

It's also unclear to me how this ought to interact with the
information_schema views concerning foreign keys.  We generally
feel that we don't want to present any non-SQL-compatible data
in information_schema, for fear that it will confuse applications
that expect to see SQL-spec behavior there.  So do we leave such
FKs out of the views altogether, or show only the columns involving
the associated unique constraint?  Neither answer seems pleasant.

Instead of tweaking FKs, maybe it would be possible to define a UNIQUE constraint re-using an existing index that already guarantees uniqueness on a subset of the columns? This would allow creating those FK relationships by declaring another unique constraint - without the overhead of creating yet another index.

So roughly something like this:

ALTER TABLE foo ADD UNIQUE (a, b) USING INDEX foo_pk;

This should also give consistent output in the information_schema views?
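
For comparison, the USING INDEX form that exists today only accepts a unique index on exactly the constrained columns, so it doesn't help here (index and constraint names are just for illustration):

CREATE UNIQUE INDEX foo_a_b_idx ON foo (a, b);
ALTER TABLE foo ADD CONSTRAINT foo_a_b_key UNIQUE USING INDEX foo_a_b_idx;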

Best

Wolfgang

