"Peter Normann" <[EMAIL PROTECTED]> wrote on 06/16/2005 02:15:34 PM:
> Sorry guys, > I just learned that ctrl-return would send the email you are composing, so I > got cut off short, so let me try again :-/ > It appears to be usual practice from what I can see in this list to use a > primary key that actually contain somewhat meaningful data. > I have always used primary keys solely for one purpose only: To identify a > table row uniquely, using auto incremental integers. Actually, if I were to > use a unique number in an application, I would add a column to the table for > just that. I would never use the primary key. The table may contain foreign > keys and those may hold an entirely different value than this primary - even > though the other table would share a one to one relationship. > I have always felt, that if you would rely on a primary key for holding > somewhat meaningful data, somewhere down the road it could spell trouble. > Anyway, I just wanted to hear if anybody would share their thoughts on any > advantages or drawbacks as to having the primary key contain meaningful > data. > Peter Normann Primary Keys (PKs) have special significance in the realm of database theory and operation. A PK value or tuple (for multi-column PKs) will uniquely identify each and every row of data within a table. Because of this unique distinction, many database engines (including MySQL's InnoDB engine) will store their data in PK order (called clustering). Many (if not most) engines use the PK as one half of the "pointers" table used to store data in the actual files of the database. Because pointers are usually stored as offsets (a fixed size) the size of the PK is strongly influenced on the choice of the column(s) that construct the PK. If no PK is declared for a table, some databases will use EVERY column (all of them) and hash them together to make the pointers table. That means you can have duplicate rows in your database and if you wanted to update just one of those duplicates, you couldn't. The reason that many of us use "real" data in our primary keys is because the PK is also an index. It's two uses for the price of one: data integrity, faster lookups. However, it's more common to assign auto-inc numbers to rows so that it becomes possible to refer to those rows with a numeric value rather than to duplicate the actual "unique data". Numbers take up either 2,4, or 8 bytes and are compared MUCH faster than string values. Those facts form part of the basis of the theory of database normalization. When you are trying to normalize a database, you want to replace commonly used values with a reference of those values. That way the values are stored only once (less space = faster searches). However, it would still be "correct" (in a textbook-kind-of-way) to use the full "unique data" tuple for referencing a row in one table from another ,a foreign key(FK). Under most circumstances, tables are arranged in a parent-child relationship where the parent can have several children but each child can have only one parent. That means that the FK stored on the child table must uniquely identify a single row of the parent table. That leaves us with only two options: use the auto-inc value or use the "unique data" tuple. It takes much less space (usually) to store just the auto-inc value. Using a numeric PK to store, sort, and lookup records will be faster than using the data itself so most people declare their auto-inc columns as the PK of their table and assign a UNIQUE index to their actual data. That way their data values are both indexed and protected against duplication. Would changing the "meaningful" data (the unique tuple) break a FK (if it was the value used)? Maybe. In MySQL, you can declare FKs for cascading updates. That means that if I had was using the tuple {STEVE,989984} for a FK and STEVE wants to be called STEVEN then updating the PK to {STEVEN, 989984) would automatically update the FKs on the tables that used it. If you didn't have cascading updates, and you wanted to change the value, you would need to: a) start a transaction b) change all child tables to point to the new FK c) change the PK on the parent table d) commit the transaction By wrapping the process within a transactional boundary, we prevent the database from becoming "inconsistent" (changed in one place but not in another). So while it's possible to use "real" data as FK tuples, it's takes more administration if you need to change the PK value it references. That is another argument in favor of the auto-inc value being the PK. Because it has no significance to the data (except to identify a row) changing significant fields will not break relational integrity. I know I have rambled but you asked a very "wide" question and it's late in my work day. Thanks for bearing with me. Shawn Green Database Administrator Unimin Corporation - Spruce Pine