"Kenneth Wagner" <[EMAIL PROTECTED]> wrote on 12/21/2005 
04:27:53 PM:

> Hi Rhino,
> 
> Excellent question. Felt as you do, initially.
> 
> Here's what changed my mind.
> 
> Integer keys are fast. And small. Hence, they take very little RAM 
space.
> 
> They are contiguous. A missing PK is easy to find. There's a gap in the 
> number sequence.
> Can't do this with the part description. No way to tell if a record is 
> missing.
> 
> Example: The system gets hung up or crashes and a reboot is needed.
> How to test the integrity of the parts table. I.e., anything missing? 
Check 
> the PK for
> continuity is a good place to start. With a timestamp I would even know 
the 
> date
> where the file got truncated. Example. It's Dec 20th. The highest date 
in 
> the file is
> Dec 1st at rec# 1203023. That's where the analysis would begin. Other 
files 
> that
> didn't get truncated but have the related key # in them would tip me off 
as 
> to how
> much is missing. Like an order file.
> 
> Speed. Especially where related files are concerned. Foreign keys. Links 
on 
> integer
> fields are faster, smaller and more efficient. Keys remain smaller and 
> faster.
> 
> Activity testing: Let's say I do some statistical testing.  Like how 
many 
> new parts
> per month on average. Easy to do with the integer PK. Even easier if it 
has 
> a timestamp.
> Then if the average suddenly drops or increases I would want to know 
why. Or 
> modify
> my DB tables or coding. Note that the timestamp does not have to be in 
your 
> example
> table. It could be in an insert/update table that just tracks what has 
been 
> added or updated
> by PK, timestamp, activity type and updatedbyuserID.
> 
> So, there's 2 cents worth.
> 
> Wondering how relevant this is?
> 
> HTH,
> 
> Ken Wagner
> 
> 
> 
> ----- Original Message ----- 
> From: "Rhino" <[EMAIL PROTECTED]>
> To: "mysql" <mysql@lists.mysql.com>
> Sent: Wednesday, December 21, 2005 2:54 PM
> Subject: Reason for Auto-increment primary keys?
> 
> 
> > One technique that I see a lot on this mailing list is people putting 
> > auto-incremented integer primary keys on their tables.
> >
> > Maybe I'm just "old school" but I've always thought that you should 
choose 
> > a primary key based on data that is actually in the table whenever 
> > possible, rather than generating a new value out of thin air.
> >
> > The only exception that comes to mind is things like ID numbers; for 
> > example, it is better to use an internally-generated integer for an 
> > employee number than it is to use an employee's name. Even the 
combination 
> > of first name and last name is not necessarily unique - I could cite a 

> > real life example -and, of course, people can change their names. That 

> > makes names less desireable than a generated value when you are trying 
to 
> > uniquely indentify such entities. In such a case, a nice, reasonable 
short 
> > integer is easier.
> >
> > I just found this rather good definition of primary keys at 
> > http://www.utexas.edu/its/windows/database/datamodeling/dm/keys.html. 
The 
> > relevant bit says that a primary key must have:
> > - a non-null value for each instance of the entity
> > - a value that is unique for each instance of an entity
> > - a value that must not change or become null during the life of the 
each 
> > instance of the entity
> >
> > That article makes the same basic remarks about name vs. ID but makes 
the 
> > point that it is more commonly the case that table designers will use 
> > something like a social security number - an _externally_ generated 
> > number - to distinguish between employees rather than an 
> > internally-generated number.
> >
> > But the trend in this mailing list is toward using generated values as 

> > primary keys in virtually EVERY table, even when good primary keys can 
be 
> > found in the (non-generated) data already existing in the table.
> >
> > Now, I haven't done anything remotely resembling a quantified analysis 
so 
> > maybe I'm wildly exaggerating this trend. But I do seem to recall a 
lot of 
> > table descriptions with auto-generated keys and I don't think they 
were 
> > all a name vs. ID scenario....
> >
> > Has anyone else noticed a similar trend?
> >
> > If this trend is real, it doesn't seem like a very good trend to me. 
For 
> > example, if you were keeping track of parts in a warehouse, why would 
> > anyone make a table that looked like this:
> > ID (autogenerated PK)     PART_NO    PART_DESCRIPTION
> > 1                                   A01             Widget
> > 2                                    B03            Grapple Grommet
> > 3                                    A02            Snow Shovel
> > 4                                    D11            Whisk
> > 5                                    C04            Duct Tape
> >
> > when this table is simpler:
> >
> > PART_NO (PK)   PART_DESCRIPTION
> > A01                 Widget
> > B03                Grapple Grommet
> > A02                Snow Shovel
> > D11                Whisk
> > C04                Duct Tape
> >
> > Would anyone care to convince me that the first version of the table 
is 
> > "better" than the second version in some way?
> >
> > I just want to be sure that no one has come along with some new and 
> > compelling reason to autogenerate keys when perfectly good keys can be 

> > found within the data already. I don't mind being "old school" but I 
don't 
> > want to be "out to lunch" :-)
> >
> >
> > Rhino
> >
> >
> >
> > -- 
> > No virus found in this outgoing message.
> > Checked by AVG Free Edition.
> > Version: 7.1.371 / Virus Database: 267.14.2/208 - Release Date: 
20/12/2005
> >

I agree with every point that Kenneth just made. Integers compare 5 to 50 
times faster than strings (depending on the length of the string) and 
usually take up much less room. That means that more index items can fit 
into memory and you are less likely to cause memory paging during an index 
operation. 

I frequently define both an auto_inc field and a PK on other values. I use 
the auto_inc field for FK relationships (due to the already mentioned 
reasons) but the PK is there to preserve my data integrity. 

Basically, the heavy use of auto_increment is a practical compromise of 
form vs. speed.

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine

Reply via email to