On Fri, 2003-03-28 at 01:39, Bruce Feist wrote:
Usually it's best to work with normalized tables, which would make thisi'm interested on how to normalize a table... can you suggest me some
trivial. tbl_reports isn't normalized, since it has a simulated array
of persons in it. Could it be split into two tables:
reads ?
A reasonable, but too abridged, summary is at http://www.tomlauren.com/docs/databaseDesign.html#denormalization . A more complete description is at http://www.15seconds.com/issue/020522.htm . And, essentially *any* book on database design worthy of the name will devote a good chunk of space to it.
And, I'll give a description right here. I can't help it; I used to teach database administration <g>.
There are several "normal forms", which are degrees of normalization. The goal of normalizing is to minimize weird stuff called "anomalies", which I'll describe in a moment. The most common target is "third normal form", which I'll describe in another moment.
Anomalies occur when database operations which seem correct result in logically corrupt databases. For instance, let's imagine database representing a school, where all information on a teacher is carried in a "course" table instead of being in its own table. The course table might contain teacher name, teacher salary, date hired, course name, classroom number, and the time of day that the course meets, for example. Note: This is a *bad* design! We will make it even worse by adding room for up to 30 student names, for students taking the class.
Now, let's say that all courses a teacher teaches are cancelled, and the teacher is going to be assigned new courses instead. We delete the course rows... and suddenly we've lost all records of the teacher, too! The teacher data should have been put in a different table as part of the normalization process. This is a "delete anomaly".
Similarly, with the above structure, what happens if the teacher's name was misspelled when first entered, so we updated a course row to reflect the correct spelling. Unless we do that to *all* course rows for the teacher, we now have an inconsistency -- it looks as though there are two different teachers, because their name is entered in two different ways. Redundancy leads to corruption; this anomaly was an "update anomaly".
As we go to stricter normal forms, we increase the stability of our data by making it less prone to corruption through anomalies. However, we tend to increase complexity and decrease performance as well (not always!), so there's a tradeoff. Third normal form is a usual compromise, in which the following conditions are met:
1) Every table has a primary (unique) key, which may consist of one or more fields. In the above example, we might try to make the combination of course time and teacher name the primary key; this would work as long as we didn't have two teachers with the same name. (It would be much better to have a unique Teacher ID.) 2) Every non-key field in a table depends on the *entire* key. (In the above example, we would be violating this -- date hired does not depend on the course time, just on the teacher name.) 3) Non-key fields depend only on key fields, nothing else. In other words, if you know the key, that should be enough to identify the value of any other field in the table. This means that there can be no repeating fields in the table. Our inclusion of student names would violate this; the name of a student depends on more than the teacher's name and time. (In fact, it violates the second condition as well... it's unrelated to the teacher's name and time!)
Here is an alternative design consisting of more tables, which does not violate these rules:
TEACHER teacher_id (primary key) teacher_name date_hired salary
COURSE course_id (primary key) course_description
PRESENTATION (a teacher teaching a course) teacher_id course_id time_given (primary key is combination of teacher_id, course_id, and time_given)
STUDENT student_id (primary key) student name matriculation_date
ATTENDEE (a student taking a course) teacher_id course_id time_given student_id (primary key is combination of teacher_id, course_id, time_given, and student_id)
Normalization is part of the process of "logical design"; this part of the process is independant of the RDBMS being used to implement the system. After logical design comes "physical design", in which you take advantage of the specific facilities offered by your RDBMS to implement it efficiently. For instance, new indexes would be assigned here to speed queries, and table types would be determined.
Bruce Feist
-- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]