James Tu wrote:
Hi:

Let's say I want to store the following information.

Unique ID - INT(10) autoincrement
First Name - VARCHAR (25)
Last Name - VARCHAR (25)
Age - INT(3)

In general 'age' is a bad column, because you need to know what year the data was entered to calculate the current age. It is often better to store year of birth or date of birth. This may not be relevant to your application, I just wanted to mention it.

Date - DATETIME
Activity - VARCHAR(100)
Data - TEXT


I would be basing my queries on all columns _except_ the Data column. I.e. I would be using WHERE's with all except the Data column.

You are not telling us how much data you are planning to maintain. How big will the "Data" column be, on average, and how many rows/persons are we talking about? Hundreds, thousands or millions?

My question is...which design would perform better?

(Design A) Put all in one table...index all the columns that I will use WHERE with.
-TABLE_ALL-
Unique ID - INT(10) autoincrement
First Name - VARCHAR (25)
Last Name - VARCHAR (25)
Age - INT(3)
Date - DATETIME
Activity - VARCHAR(100)
Data - TEXT

Indices - Unique ID, First Name, Last Name, Age, Date, Activity

You will probably not need to index all columns. If you have few rows, you don't need indexes at all, except for the primary key on the unique ID. A primary key automatically works as an index.

I would probably start with only the primary key, and add indexes only when I find that some queries are too slow.

SELECT First_Name, Last_Name, Data
FROM TABLE_ALL
WHERE
Activity = 'draw' AND Age > 24;



(Design B) Put the Data in its own separate table.
-TABLE_A-
Unique ID - INT(10) autoincrement
First Name - VARCHAR (25)
Last Name - VARCHAR (25)
Age - INT(3)
Date - DATETIME
Activity - VARCHAR(100)
Data_ID - INT(10)

Indices - Unique ID, First Name, Last Name, Age, Date, Activity

-TABLE_B-
Data_ID - INT(10)
Data - TEXT

Index - Data_ID

This will be faster if your Data column is relatively big (several K on average, I don't know. depends on your HW, of course).

I would suggest using the unique ID from TABLE_A as a primary key in TABLE_B, and drop Data_ID from TABLE_A.

If there are millions of rows I would normalize these tables to the extreme, something like this:

Person: P_Id,Born
FName: FN_Id,FirstName
LName: LN_Id,LastName
FN_P: FN_Id,P_id
LN_P: LN_Id,P_id
Activity: A_Id,Activity
Act_P: A_id,P_Id
Data:P_Id,Data

FN_P and LN_P are so-called "link tables", linking names to persons in a many-to-many relation. Even further normalization would have been achieved with an additional counter column. It would be used in these tables to maintain the order of the names when a person have multiple first names or last names, so that you would have one FName row for each unique name, "Mary Jane" would be split in "Mary" and "Jane".

You could query this schema like this:

SELECT FirstName,LastName,Data
  FROM Person,FName,LName,Data,Activity,FN_P,LN_P,Act_P
  WHERE
    Person.P_Id = Data.P_Id AND
    Person.P_Id = FN_P.P_Id AND
    Person.P_Id = LN_P.P_Id AND
    Person.P_Id = Act_P.P_Id AND
    FName.FN_Id = FN_P.FN_Id AND
    LName.LN_Id = LN_P.LN_Id AND
    Activity.A_Id = Act_P.A_Id AND
    Activity = 'draw' and Born < year(now()) - 24

...or with more explicit formulated joins, like this:

SELECT FirstName,LastName,Data
  FROM Person
  NATURAL JOIN Act_P NATURAL JOIN Activity
  INNER JOIN FN_P ON FN_P.P_Id=Person.P_Id NATURAL JOIN FName
  INNER JOIN LN_P ON LN_P.P_Id=Person.P_Id NATURAL JOIN LName,
  LEFT JOIN Data ON Data.P_Id = Person.P_Id
  WHERE
    Activity = 'draw' and Born < year(now()) - 24

The NATURAL JOINS are joins based on columns with the same name in the two joined tables, see the manual. The LEFT JOIN is used in this case because some Persons may not have a corresponding row in the Data table, in this case the Data column of the result table will contain NULL. If you used an INNER join in place of the LEFT join in this case, Persons without a Data record would be omitted from the result.

SELECT TABLE_A.First_Name, TABLE_A.Last_Name, TABLE_B.Data
FROM TABLE_A, TABLE_B
WHERE
Activity = 'draw' AND Age > 24 AND TABLE_A.Data_ID = TABLE_B.Data_ID;
(Aside: Would this query give me the same results as the above query?)

Yes, I think so, if all rows in TABLE_A have a corresponding row in TABLE_B.

--
Roger


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to