On Apr 7, 2005, at 5:20 PM, Marilyn Davis wrote:

Hi MySQL people.

I'm trying to understand the pros and cons of the "set" column type.

Here is an excerpt from the article:

http://dev.mysql.com/tech-resources/articles/mysql-set-datatype.html


Why You Shouldn't Use SET

"Third, using a set means your data is not normalized. In our above
example, we are tracking a person's interests for a hypothetical
dating site. In a normalized schema, there should be three tables: one
for the person, one for all possible interests, and one that links a
person to their particular interests."

The "above example" is:

  CREATE TABLE set_test(
    rowid INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    myset SET('Travel','Sports','Dancing','Fine Dining')
  );

I guess I'm not understanding the point here.  What practical
advantage is there to creating three tables instead of using a set
column type?

Thank you for any help.

Marilyn Davis

A slightly more realistic example might be

  CREATE TABLE member
  (  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
     name CHAR(30),
     interests SET('Travel','Sports','Dancing','Fine Dining')
  );

with some rows like

  id  name    interests
   1  Joe     Travel, Sports
   2  Sue     Dancing, Fine Dining

The 3 table alternative (the normalized version) would be

  CREATE TABLE member
  (  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
     name CHAR(30),
     INDEX (name)
  );

  CREATE TABLE interest
  (  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
     category CHAR(30),
     INDEX (category)
  );

  CREATE TABLE member_interests
  (  member_id INT UNSIGNED NOT NULL,
     interest_id INT UNSIGNED NOT NULL,
     PRIMARY KEY (member_id, interest_id),
     INDEX (interest_id)
  );

Some rows:

member
  id  name
   1  Joe
   2  Sue

interest
  id  category
   1  Travel
   2  Fine Dining
   3  Dancing
   4  Sports

member_interests
  member_id  interest_id
     1          1
     1          4
     2          2
     2          3

Despite the apparent increase in complexity of the 3-table version, it has a number of advantages. Here are a few:

* Retrieving  the list of interests
  SET:
    DESCRIBE member interests;

  3 TABLES:
    SELECT category FROM interest;

In the SET case, you have to parse the output to find the interests, while in the 3 TABLES case you simply get a list of interests.

* Adding a new interest
SET:
ALTER TABLE member CHANGE interests interests SET('Travel','Sports','Dancing','Fine Dining', 'Literature');


  3 TABLES:
    INSERT INTO interest (category) VALUES ('Literature');

In the SET case, you have to change the table structure! Mysql makes a temporary table with the new column definition, copies all the data from the member table to this temporary table, then replaces member with the temporary table. You cannot edit (INSERT, DELETE, UPDATE) the member table till this is done. (See <http://dev.mysql.com/doc/mysql/en/alter-table.html>) Also, while mysql should correctly convert the interests data to the new column definition, you should make a backup first, just in case. In the 3 TABLE case, on the other hand, this is a simple, quick insert.

* Finding members with a particular interest:

  SET:
    SELECT id, name FROM member WHERE FIND_IN_SET('Sports', interests);

  3 TABLES:
    SELECT m.id, m.name
    FROM member m
    JOIN member_interest mi ON m.id = mi.member_id
    JOIN interest i ON mi.interest_id = i.id
    WHERE i.category = 'Sports';

In the SET case, the query *looks* simpler, but mysql cannot use an index on member.interests to narrow the search. That is, mysql has to examine every single row of the member table (a full table scan), looking at the value in the interests column to see if 'Sports' is in there. That's slow, and it gets worse as the table grows. In the 3 TABLE case, however, the index on interest.category will be used to find precisely those rows with the correct interest, without looking at any other rows. That's fast, and scales better as the table grows.

If your table is small and the load on your server is low, the SET type may work fine. If this is a table of a 100 or so friends and relatives, for example, to be used by you (and perhaps your spouse), the limitations of the SET type may not be a problem. As the table gets larger and/or the number of users goes up, however, those limitations will kill you.

Michael


-- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]



Reply via email to