Re: Join Question

Bruce Feist Thu, 27 Mar 2003 18:56:34 -0800

alx wrote:

On Fri, 2003-03-28 at 01:39, Bruce Feist wrote:

Usually it's best to work with normalized tables, which would make this trivial. tbl_reports isn't normalized, since it has a simulated array of persons in it. Could it be split into two tables:

i'm interested on how to normalize a table... can you suggest me some reads ?

A reasonable, but too abridged, summary is at
http://www.tomlauren.com/docs/databaseDesign.html#denormalization .  A
more complete description is at
http://www.15seconds.com/issue/020522.htm .  And, essentially *any* book
on database design worthy of the name will devote a good chunk of space
to it.

And, I'll give a description right here.  I can't help it; I used to
teach database administration <g>.

There are several "normal forms", which are degrees of normalization.
 The goal of normalizing is to minimize weird stuff called "anomalies",
which I'll describe in a moment.  The most common target is "third
normal form", which I'll describe in another moment.

Anomalies occur when database operations which seem correct result in
logically corrupt databases.  For instance, let's imagine database
representing a school, where all information on a teacher is carried in
a "course" table instead of being in its own table.  The course table
might contain teacher name, teacher salary, date hired, course name,
classroom number, and the time of day that the course meets, for
example.  Note:  This is a *bad* design!  We will make it even worse by
adding room for up to 30 student names, for students taking the class.

Now, let's say that all courses a teacher teaches are cancelled, and the
teacher is going to be assigned new courses instead.  We delete the
course rows... and suddenly we've lost all records of the teacher, too!
 The teacher data should have been put in a different table as part of
the normalization process.  This is a "delete anomaly".

Similarly, with the above structure, what happens if the teacher's name
was misspelled when first entered, so we updated a course row to reflect
the correct spelling.  Unless we do that to *all* course rows for the
teacher, we now have an inconsistency -- it looks as though there are
two different teachers, because their name is entered in two different
ways.  Redundancy leads to corruption; this anomaly was an "update anomaly".

As we go to stricter normal forms, we increase the stability of our data
by making it less prone to corruption through anomalies.  However, we
tend to increase complexity and decrease performance as well (not
always!), so there's a tradeoff.  Third normal form is a usual
compromise, in which the following conditions are met:

1)  Every table has a primary (unique) key, which may consist of one or
more fields.  In the above example, we might try to make the combination
of course time and teacher name the primary key; this would work as long
as we didn't have two teachers with the same name.  (It would be much
better to have a unique Teacher ID.)
2)  Every non-key field in a table depends on the *entire* key.  (In the
above example, we would be violating this -- date hired does not depend
on the course time, just on the teacher name.)
3)  Non-key fields depend only on key fields, nothing else.  In other
words, if you know the key, that should be enough to identify the value
of any other field in the table.  This means that there can be no
repeating fields in the table.  Our inclusion of student names would
violate this; the name of a student depends on more than the teacher's
name and time.  (In fact, it violates the second condition as well...
it's unrelated to the teacher's name and time!)

Here is an alternative design consisting of  more tables, which does not
violate these rules:

TEACHER
teacher_id (primary key)
teacher_name
date_hired
salary

COURSE
course_id (primary key)
course_description

PRESENTATION (a teacher teaching a course)
teacher_id
course_id
time_given
(primary key is combination of teacher_id, course_id, and time_given)

STUDENT
student_id (primary key)
student name
matriculation_date

ATTENDEE (a student taking a course)
teacher_id
course_id
time_given
student_id
(primary key is combination of teacher_id, course_id, time_given, and
student_id)

Normalization is part of the process of "logical design"; this part of
the process is independant of the RDBMS being used to implement the
system.  After logical design comes "physical design", in which you take
advantage of the specific facilities offered by your RDBMS to implement
it efficiently.  For instance, new indexes would be assigned here to
speed queries, and table types would be determined.

Bruce Feist


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Re: Join Question

Reply via email to