On 2001-06-27 at 00:24 +0200, [EMAIL PROTECTED] wrote:

> On Tue, Jun 26, 2001 at 05:32:43PM -0400, Michael Bilow wrote:
> > On 2001-06-26 at 21:45 +0200, [EMAIL PROTECTED] wrote:
> > 
> > > Why would this matter? Do databases assume that records with primary
> > > keys "near" one another will often be used together?
> > 
> > Yes, this is why they are called "primary keys."  Traditionally, database
> > engines would try to entry-sequence records by primary key, and there
> > remains an expectation that access by primary key will always be the
> > fastest and most efficient mechanism for accessing a table.
> 
> It seems strange to me that locality would be important in this
> case. The assumption that record number 5 and record number 6 are
> inherently linked more than record 5 and record 8793 are would
> certainly hold for some databases, but that it should be true in the
> general case (or even just often enough that it matters)?
> 
> I can only see the usefulness for binary search, but there you would
> presumably build index tables anyway so actual location of data
> doesn't matter.

The main reason why primary key access is expected to be more efficient is
because experience has shown that databases tend to made up of two flavors
of table: tables which are read frequently and written infrequently, which
are usually searched on the same key, and tables which are inserted to
frequently and read not too much more often than they are written.

An example is something like an order entry system where orders are
created in an orders table for customers in a customers table to sell
items that are in an items table.  The items table will be written very
rarely, only when new items are introduced, but will be read frequently.  
Although there might be occasional need to search the items table on some
key other than the primary key, such as a description field, the vast
majority of accesses from the point of view of the database engine will be
to resolve references from other tables and these will all be done by
primary key.  For example, whenever an order is viewed, the orders table
references to items by primary key will have to be resolved through the
items table.  Because of this, optimizing for primary key will usually
result in an order of magnitude performance improvement.  The customers
table may be modified more frequently than the items table, but if there
are regular customers then the customers table will still be modified much
less frequently than the orders table.

The orders table, in turn, is mostly being modified by insertion
operations.  There might be occasions to modify an order record, say to
notate than an order has been shipped or that part of an order is
backordered, but the basic common operation regarding an order table will
be to either insert a new order or to locate all orders associated with
some other entity, such as a customer.  Looking up all orders for a
customer will require resolving through a secondary index on the orders
table, but those references will themselves resolve back to primary keys
in the orders table.  So the end result is that all database accesses are
eventually going to become a search by primary key, and optimizing for
that is invariably a huge win.

-- Mike



_______________________________________________
JBoss-user mailing list
[EMAIL PROTECTED]
http://lists.sourceforge.net/lists/listinfo/jboss-user

Reply via email to