Re: More normalization

Rob Thu, 06 Jun 2002 02:43:31 -0700

I think what you're getting at is that you're trying to consider both
'contactInfo' and 'entity' to be superclasses, with subclasses like
'phoneContactInfo', 'addressContactInfo', etc. and you'd like other tables
to have fields which reference the superclass. Inheritance relationships are
a common problem in relational databases. (In fact, these representational
problems are not specific to databases, but to any data store which needs to
cope with object-oriented data.) There are at least three common ways of
approaching the subclassing problem, each with its own stengths and
weaknesses:

1) Keep one table named after the superclass, add a 'type' field, and have
lots of redundant fields.
In your case, you'd have one table called 'contactInfo' and one table called
'entities', each with a type field and then lots and lots of fields: the
union of the fields in your current tables. Joins between the superclasses
are trivial: just use a single join table as you would for any many-to-many
join. Joining just a specific subclass (like joining against just the
'postal' contact info) requires including the type field in the join.
This approach is by far the simplest and often provides the best speed in
relational databases, particularly when you only care about joins to the
superclass, however you could potentially waste a lot of storage and it is
certainly far from being 'normalized'.
(This seems like a good time to point out that while normalization of your
data model is almost always a good idea, full normalization of the RDBMS
implementation of that data model is seldom crucial and is often sacrificed
in favor of speed and/or storage optimizations.)
This design occurs most often as an entity in the data model migrates from
being a single type with a few variations to a set of distinct subtypes.

2) Keep one table for each of the subclasses, duplicating the superclass
fields in each of those tables.
This seems to be the situation you've described: tables like 'address',
'phone', and 'email'. If there were fields common to all subclasses, you
would add them to each and every table; i.e if you wanted to add a
'date_entered' field to all contact information, then you'd add it to each
table. One 'invisible' field which always needs to be duplicated is the
primary key: if you consider all three tables members of a common
superclass, then that superclass (with no actual representation in the
database) must have a primary key, and of course that primary key must be
unique across all three tables. You can make this implicitly true by making
the primary key a combination of the particular subtype (which is stored
implicitly in each table) and a type-unique identifier (a common
auto-increment field). Joins to the superclass are accomplished by
referencing not just the row's id in the table, but also the table type
(it's a two-column primary key), joining against all the subclass tables
with checks against type. Joins to the subclass are trivial joins against
the row id.
This system has very different performance characteristics from the first
approach: there is very little wasted space (beyond the extra column in
superclass foreign keys), however processing is substantially more complex
(multiple left joins against constant comparision and row comparisions and
overall nullity checks...) although indices will still provide equivalent
algorithmic performance, assuming your number of subclasses is bounded. This
technique is most appropriate when most of the joins are to specific
subclasses.
While this is a very popular approach, I've always felt it was messy and
complicated to require everyone who references the superclass to store
information that exists nowhere else in the database. At the very least,
good documentation of your data model (with strong emphasis on the fact that
these separate tables are in fact subclasses of an invisible superclass, and
their common fields must be unioned to create the virtual superclass table)
is essential.

3) Keep a superclass table and multiple subclass tables.
In this case, all the information common to all subclasses is stored in a
single table, like 'contactInfo', including some kind of primary key. There
is then an additional table for each subclass with subclass-specific info:
the 'phone' table would have area code, number, and extension, etc. There is
a one-to-one relationship between each row in the subclass table and each
row in the superclass table, and this is most commonly implemented by having
the subclass table duplicate the superclass primary key, usually using it as
its own primary key in the subclass domain. (Whether each superclass must be
related to a row in one of the subclass tables is up to you, depending on
whether or not you consider the superclass 'pure virtual' or not.)
Joins against the superclass (and all superclass-type fields) are trivial by
joining against the superclass table on the primary key, and joins against
subclasses are joins against both the subclass and superclass tables (both
on the same primary key)- you get some fields from each.
Inexperienced object-oriented designers get nervous about the need to join
to the superclass, but then display all the subclass fields (downcast
joins). In truth, this is somewhat nonsensical in an object-oriented system,
but you can still use multiple joins against all the subclass tables to make
it work.
Even if you don't have any commmon information in the superclass, if you
really consider these separate things to inherit from some common base then
this technique is useful just for keeping the primary keys in each table
unique across the superclass.
This technique remains more efficient than the first one in terms of
storage, and is comparable to the second one in terms of efficiency. It is,
however, certainly the cleanest, making joins to both superclass and
subclass simple from other tables.

Note of course that all three approaches are logically equivalent- general
relational operations can transform any one into any other; the only
differences are in access speed, storage space, and maintainability.

Did I answer your original question at all?

-rob

On 5/6/02 at 9:42 am, Craig Berry <[EMAIL PROTECTED]> wrote:

> Here's a design question similar to the current threads on normalization
> issues.
> 
> Suppose I have three types of contact information I want to keep track of:
> Postal address, phone, and email.  The design is that any given entity
> that has contacts info associated may have zero or more of each.
> 
> I also have many types of entities which may have contact info.
> Individual people, businesses, departments within businesses, and so
> forth.  These differ sufficiently that each gets its own table.
> 
> The only way I can see to handle the contactable-entity-to-contacts
> mappings is to have M times N association tables, where M is the number of
> contact types and N the number of contactable entity types.  So you'd have
> tables like PERSON_EMAIL, BUSINESS_POSTAL, and so forth, each just
> relating 1 of the first part to 0..N of the second.
> 
> However, this design leads to an explosion of those 'join tables'.  If I
> have four contact types (say I add 'url') and five contactable entity
> types, I end up with 20 join tables!  I can obviously do this, but I can't
> help thinking I'm missing some way to simplify this design.
> 
> So...any suggestions?
> 
> <topical check food>query sql</topical check food>
> 
> -- 
>    |   Craig Berry - http://www.cine.net/~cberry/
>  --*--  "When resolving office conflicts, remember the wisdom of
>    |   Mahatma Gandhi: If enough peasants die horribly, someone
>        will probably notice." -- The Onion
> 
> 
> 
> 
> ---------------------------------------------------------------------
> Before posting, please check:
>    http://www.mysql.com/manual.php   (the manual)
>    http://lists.mysql.com/           (the list archive)
> 
> To request this thread, e-mail <[EMAIL PROTECTED]>
> To unsubscribe, e-mail <[EMAIL PROTECTED]>
> Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
> 
> 

---------------------------------------------------------------------
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/           (the list archive)

To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php

Re: More normalization

Reply via email to