Re: Query performance...two table design options

Roger Baklund Fri, 27 May 2005 14:51:41 -0700

James Tu wrote:

Hi:


Let's say I want to store the following information.

Unique ID - INT(10) autoincrement
First Name - VARCHAR (25)
Last Name - VARCHAR (25)
Age - INT(3)

In general 'age' is a bad column, because you need to know what year thedata was entered to calculate the current age. It is often better tostore year of birth or date of birth. This may not be relevant to yourapplication, I just wanted to mention it.

Date - DATETIME
Activity - VARCHAR(100)
Data - TEXT
I would be basing my queries on all columns _except_ the Data column. I.e. Iwould be using WHERE's with all except the Data column.

You are not telling us how much data you are planning to maintain. Howbig will the "Data" column be, on average, and how many rows/persons arewe talking about? Hundreds, thousands or millions?

My question is...which design would perform better?

(Design A) Put all in one table...index all the columns that I will useWHERE with.

-TABLE_ALL-
Unique ID - INT(10) autoincrement
First Name - VARCHAR (25)
Last Name - VARCHAR (25)
Age - INT(3)
Date - DATETIME
Activity - VARCHAR(100)
Data - TEXT

Indices - Unique ID, First Name, Last Name, Age, Date, Activity

You will probably not need to index all columns. If you have few rows,you don't need indexes at all, except for the primary key on the uniqueID. A primary key automatically works as an index.

I would probably start with only the primary key, and add indexes onlywhen I find that some queries are too slow.

SELECT First_Name, Last_Name, Data
FROM TABLE_ALL
WHERE
Activity = 'draw' AND Age > 24;



(Design B) Put the Data in its own separate table.
-TABLE_A-
Unique ID - INT(10) autoincrement
First Name - VARCHAR (25)
Last Name - VARCHAR (25)
Age - INT(3)
Date - DATETIME
Activity - VARCHAR(100)
Data_ID - INT(10)

Indices - Unique ID, First Name, Last Name, Age, Date, Activity

-TABLE_B-
Data_ID - INT(10)
Data - TEXT

Index - Data_ID

This will be faster if your Data column is relatively big (several K onaverage, I don't know. depends on your HW, of course).

I would suggest using the unique ID from TABLE_A as a primary key inTABLE_B, and drop Data_ID from TABLE_A.

If there are millions of rows I would normalize these tables to theextreme, something like this:


Person: P_Id,Born
FName: FN_Id,FirstName
LName: LN_Id,LastName
FN_P: FN_Id,P_id
LN_P: LN_Id,P_id
Activity: A_Id,Activity
Act_P: A_id,P_Id
Data:P_Id,Data

FN_P and LN_P are so-called "link tables", linking names to persons in amany-to-many relation. Even further normalization would have beenachieved with an additional counter column. It would be used in thesetables to maintain the order of the names when a person have multiplefirst names or last names, so that you would have one FName row for eachunique name, "Mary Jane" would be split in "Mary" and "Jane".


You could query this schema like this:

SELECT FirstName,LastName,Data
  FROM Person,FName,LName,Data,Activity,FN_P,LN_P,Act_P
  WHERE
    Person.P_Id = Data.P_Id AND
    Person.P_Id = FN_P.P_Id AND
    Person.P_Id = LN_P.P_Id AND
    Person.P_Id = Act_P.P_Id AND
    FName.FN_Id = FN_P.FN_Id AND
    LName.LN_Id = LN_P.LN_Id AND
    Activity.A_Id = Act_P.A_Id AND
    Activity = 'draw' and Born < year(now()) - 24

...or with more explicit formulated joins, like this:

SELECT FirstName,LastName,Data
  FROM Person
  NATURAL JOIN Act_P NATURAL JOIN Activity
  INNER JOIN FN_P ON FN_P.P_Id=Person.P_Id NATURAL JOIN FName
  INNER JOIN LN_P ON LN_P.P_Id=Person.P_Id NATURAL JOIN LName,
  LEFT JOIN Data ON Data.P_Id = Person.P_Id
  WHERE
    Activity = 'draw' and Born < year(now()) - 24

The NATURAL JOINS are joins based on columns with the same name in thetwo joined tables, see the manual. The LEFT JOIN is used in this casebecause some Persons may not have a corresponding row in the Data table,in this case the Data column of the result table will contain NULL. Ifyou used an INNER join in place of the LEFT join in this case, Personswithout a Data record would be omitted from the result.

SELECT TABLE_A.First_Name, TABLE_A.Last_Name, TABLE_B.Data
FROM TABLE_A, TABLE_B
WHERE
Activity = 'draw' AND Age > 24 AND TABLE_A.Data_ID = TABLE_B.Data_ID;
(Aside: Would this query give me the same results as the above query?)


Yes, I think so, if all rows in TABLE_A have a corresponding row in TABLE_B.

--
Roger


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Re: Query performance...two table design options

Reply via email to