"Peter Normann" <[EMAIL PROTECTED]> wrote on 06/16/2005 02:15:34 PM:

> Sorry guys,

> I just learned that ctrl-return would send the email you are composing, 
so I
> got cut off short, so let me try again :-/

> It appears to be usual practice from what I can see in this list to use 
a
> primary key that actually contain somewhat meaningful data.

> I have always used primary keys solely for one purpose only: To identify 
a
> table row uniquely, using auto incremental integers. Actually, if I were 
to
> use a unique number in an application, I would add a column to the table 
for
> just that. I would never use the primary key. The table may contain 
foreign
> keys and those may hold an entirely different value than this primary - 
even
> though the other table would share a one to one relationship.

> I have always felt, that if you would rely on a primary key for holding
> somewhat meaningful data, somewhere down the road it could spell 
trouble.

> Anyway, I just wanted to hear if anybody would share their thoughts on 
any
> advantages or drawbacks as to having the primary key contain meaningful
> data.

> Peter Normann

Primary Keys (PKs) have special significance in the realm of database 
theory and operation. A PK value or tuple (for multi-column PKs) will 
uniquely identify each and every row of data within a table. Because of 
this unique distinction, many database engines (including MySQL's InnoDB 
engine) will store their data in PK order (called clustering). Many (if 
not most) engines use the PK as one half of the "pointers" table used to 
store data in the actual files of the database.  Because pointers are 
usually stored as offsets (a fixed size) the size of the PK is strongly 
influenced on the choice of the column(s) that construct the PK. If no PK 
is declared for a table, some databases will use EVERY column (all of 
them) and hash them together to make the pointers table. That means you 
can have duplicate rows in your database and if you wanted to update just 
one of those duplicates, you couldn't.

The reason that many of us use "real" data in our primary keys is because 
the PK is also an index. It's two uses for the price of one: data 
integrity, faster lookups. 

However, it's more common to assign auto-inc numbers to rows so that it 
becomes possible to refer to those rows with a numeric value rather than 
to duplicate the actual "unique data". 

Numbers take up either 2,4, or 8 bytes and are compared MUCH faster than 
string values. Those facts form part of the basis of the theory of 
database normalization. When you are trying to normalize a database, you 
want to replace commonly used values with a reference of those values. 
That way the values are stored only once (less space = faster searches). 
However, it would still be "correct" (in a textbook-kind-of-way) to use 
the full "unique data" tuple for referencing a row in one table from 
another ,a foreign key(FK). Under most circumstances, tables are arranged 
in a parent-child relationship where the parent can have several children 
but each child can have only one parent. That means that the FK stored on 
the child table must uniquely identify a single row of the parent table. 
That leaves us with only two options: use the auto-inc value or use the 
"unique data" tuple. It takes much less space (usually) to store just the 
auto-inc value. 


Using a numeric PK to store, sort, and lookup records will be faster than 
using the data itself so most people declare their auto-inc columns as the 
PK of their table and assign a UNIQUE index to their actual data. That way 
their data values are both indexed and protected against duplication. 

Would changing the "meaningful" data (the unique tuple) break a FK (if it 
was the value used)? Maybe. In MySQL, you can declare FKs for cascading 
updates. That means that if I had was using the tuple {STEVE,989984} for a 
FK and STEVE wants to be called STEVEN then updating the PK to {STEVEN, 
989984) would automatically update the FKs on the tables that used it. If 
you didn't have cascading updates, and you wanted to change the value, you 
would need to:

a) start a transaction
b) change all child tables to point to the new FK
c) change the PK on the parent table
d) commit the transaction

By wrapping the process within a transactional boundary, we prevent the 
database from becoming "inconsistent" (changed in one place but not in 
another). So while it's possible to use "real" data as FK tuples, it's 
takes more administration if you need to change the PK value it 
references. That is another argument in favor of the auto-inc value being 
the PK. Because it has no significance to the data (except to identify a 
row) changing significant fields will not break relational integrity.

I know I have rambled but you asked a very "wide" question and it's late 
in my work day. Thanks for bearing with me.

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine

Reply via email to