See my remarks interspersed below.

Rhino

----- Original Message ----- 
From: "Andy B" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, October 28, 2004 5:09 PM
Subject: Re: column choices for certain data


> Hi...
>
> Thanks for the info/help here.

No problem; that's what the list is for ;-)

> I understand everything up to the part where
> it talks about referencing more than 1 table to each other and the foreign
> keys and all that stuff...
>
> basically I got lost. Not because you explained it wrong but because I
> haven't got the foggiest clue what keys/table linking is or how it
works...
>
Fair enough. I don't know what you know or don't know so I just made an
assumption and guessed your experience was more extensive than it is.

> Is it possible we can start with a simpler example with
linking/keys/foreign
> keys and stuff so maybe I can follow it easier?? And of course if not
valid
> for the list maybe we can take further talking to private email?? (sorry
I'm
> very new at this stuff)...
>
Actually, the concept is not that hard. I'll try to explain it a little more
clearly.

The whole idea of both the SET datatype and the lookup table I showed you is
to limit the values which can be stored in a given column to specific
values. If you just defined your business type column as char(50) or
something like that, you could put *any* SINGLE value you liked into the
column as long as it didn't exceed 50 characters; if you added 'not null' to
the column description, you would additionally be forced to put *something*
in the column, even if that was a single blank. Other than those limitations
though (and possibly some limitations with regards to specific characters,
depending on the code page, such as accented letters or whatever), you could
supply any SINGLE value you want.

In your case, you want to limit the valid values for the business type to a
few specific values. Also, you want to be able to store MULTIPLE business
types for the same business. Now, I'm not precisely sure what values you are
envisioning for this business type but let's say, for argument's sake, that
it is a short description of the business, like pool hall or dry cleaner.
You may have a slightly different idea of business type in mind; if so,
substitute the values that you envision for my examples. Whatever the values
you want, you've led me to believe that a given business could have more
than one business type so I'm picturing a business that has a pool hall but
also takes in dry cleaning as one example. (By the way, if I'm
misunderstanding this and you really only want a business type column to
contain ONE value for any given row, then you don't really need SET at all.
Remember, the idea of SET is that you could store up to 64 values in the
same column for a given row, so that you could, in theory, say that Smitty's
is a combination pool hall and dry cleaner (plus up to 62 other things if
the owner has some additional sidelines, like selling gardening supplies or
magazines).

Okay then, let's get Codd's objections to this out of the way first. If you
built a table using SET and a company had just these two business types, the
row would look like this:
registration_number    business_name    business_location    business_type
1                               Smitty's                123 Main Street
pool hall, dry cleaner

If the business had 5 sidelines, the row would look like this:

registration_number    business_name    business_location    business_type
1                               Smitty's                123 Main Street
pool hall, dry cleaner, gardening supplies, magazines, gun repair [forgive
the line wrapping; all 5 values should appear in the same line/row]

E.F. Codd, the guy who came up with the theory upon which all relational
databases, like DB2 and MySQL, are based, would object strongly to this
because his theory said that you should NEVER have more than one value in a
column of a given row. The use of SET clearly violates that rule. (That may
be why I've never seen it in another relational database, like DB2).

Instead, we are going to satisfy Codd's theory by putting the business types
in a separate table, the Business_types table. That table is going to have
one row for each business type for each business. I've shown you that table
earlier; here it is again:

registration_number    business_type
1                                pool hall
1                                dry cleaner
2                                restaurant

As you can see, it indicates that the business whose registration number is
1 is carrying on two different business types, pool hall and dry cleaner.
The business whose registration number if 2 is carrying on only one business
type, restaurant.

We could stop there and probably satisfy your requirements quite nicely with
just the Businesses and Business_types tables. These two tables would let us
record any number of businesses in the directory and they would allow us to
record any number of business types for each of those businesses, not just a
maximum of 64 business types. This is already better than using 'SET' since
we no longer have the limitation of 64. This design also satisfies Codd's
rules in his relational theory, we no longer have multiple values in a
single column of a single row of a table. However, we have the ability to
record as many business types for a given business as we like so we haven't
lost any information.

Now, a quick discussion of primary keys. Codd said that every table needs a
primary key: a value that makes each row of a table unique from all of the
other rows in the table. In other words, if you know the primary key of a
row, you can be absolutely certain that only one row (or none) in the table
will ever have that specific key value. For instance, a driver's licence
number should normally be unique within your state, province or country,
depending on how driver's licences work in your area. Therefore, if you live
in Alabama and your driver's licence number is K999456PL, there should never
be another driver from Alabama who has that same driver's licence number. If
we record driver's licence information in a database table, that driver's
licence number should be unique and identify only a single row of the data,
never two or more rows. We would likely make the primary key of that table
the driver's licence number. If we do that in our table definition, it will
be IMPOSSIBLE to ever add a new row to the table that has the same driver's
licence number as an existing row.

The primary key is a very important decision for any table. In our case, the
primary key of the Businesses table is likely going to be a number that we
assign when we enroll people in our business directory; perhaps we assign
these numbers sequentially. That's what I envisioned when I described the
Businesses table: the first business in the directory gets the number 1, the
second gets the number 2, and so on. [That's just for simplicity sake; in
reality, numbers could be assigned randomly or by dividing the owner's shoe
size by his IQ, or however you like. The primary key would not HAVE to be a
number though; if you hated the concept of treating people like numbers, you
could use the business name as the primary key although it would severely
complicate your life. What happens if Smitty's opens a second location and
it was supposed to be registered separately! In that case, you'd have to
make the primary key something other than just the business name; you might
have to combine business name with location or something like that. Well,
let's not complicate our lives too much unless we have to; let's assume that
we give a registration number of our own choosing to each member of our
directory and that is our primary key.]

The Business_types table also needs a primary key. However, it can't be the
registration_number by itself; if it were, we'd be limited to having only
one business type for each business, which is contrary to what we are trying
to accomplish. In this case, the primary key we choose is the COMBINATION of
registration_number and business_type. That ensures that we can record as
many business types as we like for each registration_number but it ensures
that we don't record the same business type repeatedly for a given
registration_number. For example, we couldn't have two (or more) rows that
say that the business with registration_number 1 is a pool hall.

Are you with me so far? I've thrown a lot at you in a few paragraphs so take
a breather, read it a few times and try to digest it, then think about
reading on....

Okay, assuming you've digested things so far, let me explain the lookup
table, Business_Types_Lookup. That table is designed to limit the specific
values which are acceptable for the business_type column in the
Business_Types table. If we don't use the lookup table, we can put ANY value
we like in the business_type column in the Business_Types table (as long as
it is not too large for the column or has any characters that violate the
code page). That may be perfectly acceptable for you, in which case you can
probably stop reading. However, I'm going to assume that you want to limit
the specific values which can be used for business_type.

You could think of the Business_Type_Lookup table as a sort of
spell-checker; if it is turned on, then the only values which can go in the
business_type column in the Business_Types table are values that are in the
business_type column of the lookup table. Therefore, if the lookup table
listed only the values 'pool hall', 'dry cleaner' and 'restaurant', those
are the only values that could ever be put in the business_type column in
the Business_Types table.

Limiting the values that a column can have is a very common necessity in a
database. For instance, you might need to store a customer's two letter
state code in the database. In that case, you want to allow NY and NH but
you don't want to allow ZY or BB since those aren't valid state codes. In
that case, a lookup table containing the valid state codes would ensure that
no one could enter any state code except one of the ones that was in the
state code lookup table.

That's exactly what we are doing with business types: we store all of the
acceptable values in the lookup table, then "turn on our spell-checker", if
I can put it that way.

The lookup table, like any other table, needs a primary key. In my lookup
table for business types, I'm imagining two columns but some lookup tables
will only have one column, the one containing the codes themselves. In my
case, I'm imagining that the codes may be a bit terse so I have a second
column that lets you explain in more detail what the code means, just like a
state code lookup table would probably have a column for the two letter
state code and a second column that explains that AL is Alabama or AK is
Alaska or whatever. In other words, the second column is just a comment that
helps you understand the data better.

In choosing the primary key for a state code lookup table, I would choose
the two character state code. THAT is the thing that needs to be unique so
that is the primary key. By the same reasoning, the primary key of the
Business_Types_Lookup table is the business_type; I never want to have two
rows in the lookup table that have a business type of 'pool hall'.

Okay, we're almost done. The only thing left to explain is the idea of a
foreign key.

A foreign key is a column in one table that gets its value from a column in
ANOTHER table. In this case, our foreign key is the business_type column in
the Business_Types table. That's why it contains a FOREIGN KEY clause that
refers back to the lookup table's primary key. As soon as we define that
column to be a foreign key, we ensure that the database itself will always
ensure that every value put into the business_type column of the
Business_Types table is one that comes from the business_type column of the
lookup table. In other words, once our three tables are created and the
lookup table has been populated with the desired values for business_type,
we can be sure that any value in the business_type column of the
Business_Types table is one that comes from the lookup table. If the lookup
table contains only the values 'foo', 'bar' and 'squeak', then only those
three business types can appear in the Business_Types table. This, in
effect, is the activation of the 'spell-checker'.

The beauty of this is that the enforcement of these values is done entirely
by the database; you don't have to write any application code to do it. If
you try to insert a value in the Business_Type table that isn't in the
lookup table, the DATABASE detects this and refuses with a message that
tells you what you tried to do wrong.

Do you follow all that? By designing the tables this way, we have ensured
that we can record any number of business types for a given table but that
only valid business types can be recorded. We control what types are valid
and can have as many valid types as we like. That's why I think lookup
tables are a better approach than the 'SET' datatype.

Oh, one small thing. Since MySQL isn't fully mature yet, foreign keys only
get enforced between tables that use the InnoDB engine; therefore, each of
our tables should have 'TYPE=InnoDB' in the definition. (I think I forgot
the 'Type=InnoDB' in some of the original table definitions.)

> I want to try and take this 1 step at a time until I get it...
>
Well, I see I've blathered on for quite a few paragraphs here. Some of these
concepts may be new to you and may require some time to digest. Think about
them and re-read as necessary, then ask me if you need further
clarification.

> let me know how I/we should go on from here. (the other side note) is I
> eventually have to work this db system into a php driven application (not
> like it matters on this list but...)
>
> anyways let me know how to continue with the matter...
>
I think its best to have these conversations on the list in most
circumstances; that ensures that everyone benefits from the discussion: you
and anyone else who researches this topic later in the archives. Otherwise,
you are the only one benefitting.

> tnx for the help...
>
My pleasure.

Rhino

> ----- Original Message ----- 
> From: "Rhino" <[EMAIL PROTECTED]>
> To: "Andy B" <[EMAIL PROTECTED]>
> Sent: Thursday, October 28, 2004 4:00 PM
> Subject: Re: column choices for certain data
>
>
> >
> > ----- Original Message ----- 
> > From: "Andy B" <[EMAIL PROTECTED]>
> > To: <[EMAIL PROTECTED]>
> > Sent: Thursday, October 28, 2004 1:15 PM
> > Subject: column choices for certain data
> >
> >
> >> Hi...
> >> I have a db that I'm writing. It's for a business directory and one of
> >> the
> >> fields/columns in the table needs to have a list of business types in
it
> >> (i.e. retail, auto, computer and so on). Since there may be more than
one
> >> category that a business fits under I was wondering if "SET" is the
best
> >> choice for that??
> >>
> > I wouldn't use SET if I were you.
> >
> > I have never used the 'SET' column type in MySQL and had to look it up
in
> > the manual to see what it did. However, I've worked with relational
> > databases for 20+ years (mostly DB2) and it doesn't have a 'SET' column
> > type
> > in its repertoire; I've gotten used to doing things without 'SET' so
maybe
> > I'm just being stodgy ;-)
> >
> > The chief advantage of 'SET', as far as I can tell from the manual, is
> > that
> > it lets you control the specific values which can be in a column without
> > having to write application lookups to verify that the value you are
> > supplying is one that is valid for the 'SET' column. Therefore, if you
had
> > only 3 business types, sole proprietorship, partnership, and
corporation,
> > you could put those 3 values in the set and be sure that those are the
> > only
> > 3 values that would ever be allowed in the column. That's fine as far as
> > it
> > goes and is a very useful thing.
> >
> > However, on the negative side, there is a fixed maximum of 64 values in
> > the
> > set. While that may be sufficient for your immediate needs, I don't thin
k
> > you can be certain that it will be sufficient for your long term needs.
> > For
> > example, if this is an eclectic business that combines a lot of lines of
> > business, you may find that it sells groceries, operates a dry cleaner,
> > contains a movie theatre, and umpteen other things all under the same
> > business name. You may find that 64 values isn't enough once you start
> > making the set include all the different functions of the business.
> >
> > The second negative is that I don't think 'SET' is a datatype found in
> > most
> > other databases. Therefore, if you eventually port this table over to
> > another database, you may have to rework the design somewhat to get the
> > same
> > effect, which could be a pain.
> >
> > The third negative is that putting multiple values in a single column of
a
> > single row violates Codd's Rules, which are the foundation of all
> > relational
> > databases. Codd is probably rolling in his grave at the mere thought of
> > doing this ;-)
> >
> > Therefore, let me suggest this, which should give you the same benefits
> > without the 64 value limitation while being portable to other databases:
> > store the business type in a separate table, even if there is only one
> > possible value for business type for most rows in your directory.
> >
> > For example, create one table to hold the basic information about your
> > business:
> >
> > create table businesses
> > (registration_number int not null,
> > business_name char(50) not null,
> > business_location char(100) not null,
> > [etc.]
> > primary key(registration_number));
> >
> > Sample Contents:
> > registration_number    business_name    business_location
> > 1                               Smitty's                123 Main Street
> > 2                               Bob's                     456 Park
Street
> >
> > create table business_types
> > (registration_number,
> > business_type char(20) not null,
> > primary key(registration_number,business_types)
> > foreign key(business_type) references
> > business_types_lookup(business_type))TYPE=InnoDB;
> >
> > Sample Contents:
> > registration_number    business_type
> > 1                                pool hall
> > 1                                dry cleaner
> > 2                                restaurant
> >
> > create table business_types_lookup
> > (business_type char(20) not null,
> > business_type_description char(200) not null,
> > primary key(business_type));
> >
> > Sample Contents:
> > business_type    business_type_description
> > pool hall            gambling establishment or other den of iniquity
> > restaurant         eating establishment that can serve alcohol
> >
> > Do you see how this works?
> >
> > Every time you add a new business to your database, you add one row to
the
> > Businesses table, assigning a registration number to the business and
> > recording company name, location, etc. Then, for each of the possible
> > business types that apply to that business, you add a row to the
> > Business_Types table. For example, Smitty's is both a pool hall and a
dry
> > cleaner so you add one row for each business type, linking it back to
the
> > Businesses table via the registration number that uniquely identifies
one
> > business. Any given business can have as many business types as you
like,
> > not just a maximum of 64.
> >
> > The third table is not strictly necessary but it is generally a good
idea;
> > it is a lookup table that makes sure that only valid business types get
> > chosen in the Business_Types table. You create one row in
> > Business_Types_Lookup for each type of business that you think is valid
> > for
> > your purposes. You can have as many business_types as you like in the
> > Business_Types_Lookup table, not just a maximum of 64. The 'foreign key'
> > designation on the business_type column of the Business_Types table
> > ensures
> > that ONLY values from the lookup table are acceptable in Business_Types;
> > if
> > you don't have 'shoe shine stand' in the lookups table, you can't use it
> > in
> > the Business_Types table either. (Since foreign keys are only enforced
in
> > InnoDB tables, you have to specify Type=InnoDB when you define the
> > Business_Types table.)
> >
> > Is this clear? If not, let me know and I'll try to clarify it for you.
> >
> > Rhino
> >
> >
> >
>
>
> -- 
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]
>


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to