Re: [HACKERS] Surrogate keys (Was: enums)

Dann Corbit Thu, 19 Jan 2006 11:24:15 -0800


> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:pgsql-hackers-
> [EMAIL PROTECTED] On Behalf Of Josh Berkus
> Sent: Thursday, January 19, 2006 10:09 AM
> To: Martijn van Oosterhout
> Cc: [email protected]
> Subject: Re: [HACKERS] Surrogate keys (Was: enums)
> 
> Martjin,
> 
> > In any of these either misspellings, changes of names, ownership or
> > even structure over time render the obvious useless as keys. There
are
> > techniques for detecting and reducing duplication but the point is
that
> > for any of these duplicates *can* be valid data.
> 
> Please point me out where, in the writings of E.F. Codd or in the SQL
> Standard, it says that keys have to be immutable for the life of the
row.


Only do that for data that you care about.  If you think that the data
has no value, there is no need to have a way to identify a row.

> Duplicate *values* can be valid data.  Duplicate *tuples* show some
> serious flaws in your database design.  If you have a personnel
> directory on which you've not bothered to define any unique
constraints
> other than the ID column, then you can't match your data to reality.
If
> you have two rows with the same first and last name, you don't know if
> they are two different people or the same person, duplicated.  Which
> will be a big problem come paycheck time.
> 
> Per E.F. Codd, each tuple is a *unique* predicate (or key) comprising
a
> set of values definining a *unique* data entity.  i.e. "The employeee
> named "John" "Little" at extension "4531".  There is nothing anywhere
> said about keys never changing.
> 
> This is Databases 101 material.  Really!

I give it an 'F.'

When the data changes, the problems generated are not just due to
repercussions related to the child and parent tables related through the
primary key.

Someone has an invoice, and they call in with a question. A combination
of their name and address was used as a primary key.  They moved, and
sent in a forwarding address.  The DBA was smart enough to design the
database to cascade results, so that there are no orphan records and we
have not compromised the structure of the database.
The customer calls in with a question about an old invoice.
"We have no record of that transaction."

I was a DBA for a database for a company with many millions of customers
worldwide (e.g. the product registration table was 24 GB).

Their design had natural keys in it.  It caused dozens of problems,
every single day.

I content that most people are not smart enough to decide when a natural
key is a good idea.  The engineers that designed the database were
probably pretty smart, since it sort of worked and had thousands of
tables and hundreds of millions of rows in it.  But one bad decision on
a natural key will cause literally millions of dollars of damage.

The primary defense I have heard so far is that the Oids are hard to
understand.  They are nothing in comparison with understanding what to
do when you have 25 changes to primary keys on various tables every
single day.

Once you get used to Oids, I find it hard to believe that any
intelligent person finds them confusing.  Confusion resulting from
having primary keys that are a moving target?  Now that's confusion for
you.

IMO-YMMV.

I think it is time for me to give it a rest, though.  My experience may
be very atypical and I feel strangely passionate about it.

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

               http://archives.postgresql.org

Re: [HACKERS] Surrogate keys (Was: enums)

Reply via email to