At 09:50 PM 9/29/2008, Richard Broersma wrote:
On Mon, Sep 29, 2008 at 7:48 PM, Steve Midgley <[EMAIL PROTECTED]> wrote:

> In my specific case it turns out I only had duplicates, but there could have
> been n-plicates, so your code is still correct for my use-case (though I
> didn't say that in my OP).

Ya there are a lot of neat queries that you can construct.  If you
have a good background in math and set theory (which I don't have) you
can develop all sorts of powerful analysis queries.

On a side note, I thought that I should mention that unwanted
duplicates are an example where some ~have gotten bitten~ with a
purely surrogate key approach.  To make matters worse, sometimes one
user updates part of one duplicate while another user updates different
field(s) on another duplicate.  Then once the designer discovers
the duplicate problem, she/he has to figure out some way of merging
these non-exact duplicates.  So even if the designer has no intention
of implementing natural primary/foreign keys, he/she will still
benefit from a natural key consideration in that a strategy can be
designed to prevent getting bitten by duplicated data.

I only mention this because db designers get bitten by this all the
time.  Well at least the ones that subscribe to www.utteraccess.com
get bitten.  From what I've seen not one day has gone by without
someone posting a question to this site about how to both find and
remove all but one of the duplicates.
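
For illustration only, here is a rough sketch of the "find and remove all but one" cleanup described above. It assumes a hypothetical student table whose natural key is (last_name, first_name, birth_date); none of these names come from the thread. It uses Python's built-in sqlite3 module so that it runs standalone, but the GROUP BY / HAVING and MIN(id) queries are plain SQL and should carry over to PostgreSQL largely unchanged.

```python
import sqlite3

# Hypothetical schema and data, assumed for illustration (not from the thread).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (
        id INTEGER PRIMARY KEY,   -- surrogate key only, nothing guards the natural key
        last_name TEXT,
        first_name TEXT,
        birth_date TEXT
    );
    INSERT INTO student (last_name, first_name, birth_date) VALUES
        ('Smith', 'Ann', '2001-04-02'),
        ('Smith', 'Ann', '2001-04-02'),
        ('Smith', 'Ann', '2001-04-02'),
        ('Jones', 'Bob', '2000-11-30');
""")

# Step 1: find every natural-key group that has more than one row.
dupes = conn.execute("""
    SELECT last_name, first_name, birth_date, COUNT(*)
    FROM student
    GROUP BY last_name, first_name, birth_date
    HAVING COUNT(*) > 1
""").fetchall()

# Step 2: keep the lowest id in each group and delete the rest.
conn.execute("""
    DELETE FROM student
    WHERE id NOT IN (
        SELECT MIN(id)
        FROM student
        GROUP BY last_name, first_name, birth_date
    )
""")
conn.commit()

remaining = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(dupes, remaining)
```

Note that this only handles exact duplicates. The non-exact duplicates mentioned earlier, where different rows were updated in different fields, still need a merge rule decided by hand before anything is deleted.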

Truly. I have worked with some school districts around the US and this duplicate record problem is more than theoretical. Some of the gnarliest, dirtiest, n-plicate data I've ever seen comes out of the US public education system.

More generally where I have seen a need for natural keys, I've always taken the "best of both worlds" approach. So I always stick an integer/serial PK into any table - why not - they're cheap and sometimes are handy. And then for tables along the lines of your description, I add a compound unique index which serves the business rule of "no dupes along these lines."
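
To make that concrete, here is a minimal sketch of the approach, assuming a hypothetical student table with (last_name, first_name, birth_date) as the natural key. The table, the index name and the add_student helper are all made up for illustration. It uses Python's built-in sqlite3 module so it runs standalone; in PostgreSQL the id column would be declared serial, and the CREATE UNIQUE INDEX statement would look the same.

```python
import sqlite3

# Hypothetical schema, assumed for illustration (not from the thread).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (
        id INTEGER PRIMARY KEY,          -- surrogate key (serial in PostgreSQL)
        last_name TEXT NOT NULL,
        first_name TEXT NOT NULL,
        birth_date TEXT NOT NULL
    );
    -- Compound unique index enforcing the "no dupes along these lines" rule.
    CREATE UNIQUE INDEX student_natural_key
        ON student (last_name, first_name, birth_date);
""")

def add_student(last_name, first_name, birth_date):
    """Insert a row; return False if the natural key already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO student (last_name, first_name, birth_date) "
                "VALUES (?, ?, ?)",
                (last_name, first_name, birth_date),
            )
        return True
    except sqlite3.IntegrityError:
        # The unique index rejected a duplicate natural key.
        return False

first = add_student("Smith", "Ann", "2001-04-02")
second = add_student("Smith", "Ann", "2001-04-02")   # same natural key
other = add_student("Jones", "Bob", "2000-11-30")
row_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(first, second, other, row_count)
```

The columns are declared NOT NULL on purpose: in both SQLite and PostgreSQL a unique index treats NULLs as distinct from each other, so a nullable column in the natural key would let duplicates slip through.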

Am I following your point? Any reason why using serial PK's with "compound natural unique indices" is better/worse than just using natural PK's?

Steve


--
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql
