See my remarks interspersed below. Rhino
----- Original Message ----- From: "Andy B" <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Sent: Thursday, October 28, 2004 5:09 PM Subject: Re: column choices for certain data > Hi... > > Thanks for the info/help here. No problem; that's what the list is for ;-) > I understand everything up to the part where > it talks about referencing more than 1 table to each other and the foreign > keys and all that stuff... > > basically I got lost. Not because you explained it wrong but because I > haven't got the foggiest clue what keys/table linking is or how it works... > Fair enough. I don't know what you know or don't know so I just made an assumption and guessed your experience was more extensive than it is. > Is it possible we can start with a simpler example with linking/keys/foreign > keys and stuff so maybe I can follow it easier?? And of course if not valid > for the list maybe we can take further talking to private email?? (sorry I'm > very new at this stuff)... > Actually, the concept is not that hard. I'll try to explain it a little more clearly. The whole idea of both the SET datatype and the lookup table I showed you is to limit the values which can be stored in a given column to specific values. If you just defined your business type column as char(50) or something like that, you could put *any* SINGLE value you liked into the column as long as it didn't exceed 50 characters; if you added 'not null' to the column description, you would additionally be forced to put *something* in the column, even if that was a single blank. Other than those limitations though (and possibly some limitations with regards to specific characters, depending on the code page, such as accented letters or whatever), you could supply any SINGLE value you want. In your case, you want to limit the valid values for the business type to a few specific values. Also, you want to be able to store MULTIPLE business types for the same business. Now, I'm not precisely sure what values you are envisioning for this business type but let's say, for argument's sake, that it is a short description of the business, like pool hall or dry cleaner. You may have a slightly different idea of business type in mind; if so, substitute the values that you envision for my examples. Whatever the values you want, you've led me to believe that a given business could have more than one business type so I'm picturing a business that has a pool hall but also takes in dry cleaning as one example. (By the way, if I'm misunderstanding this and you really only want a business type column to contain ONE value for any given row, then you don't really need SET at all. Remember, the idea of SET is that you could store up to 64 values in the same column for a given row, so that you could, in theory, say that Smitty's is a combination pool hall and dry cleaner (plus up to 62 other things if the owner has some additional sidelines, like selling gardening supplies or magazines). Okay then, let's get Codd's objections to this out of the way first. If you built a table using SET and a company had just these two business types, the row would look like this: registration_number business_name business_location business_type 1 Smitty's 123 Main Street pool hall, dry cleaner If the business had 5 sidelines, the row would look like this: registration_number business_name business_location business_type 1 Smitty's 123 Main Street pool hall, dry cleaner, gardening supplies, magazines, gun repair [forgive the line wrapping; all 5 values should appear in the same line/row] E.F. Codd, the guy who came up with the theory upon which all relational databases, like DB2 and MySQL, are based, would object strongly to this because his theory said that you should NEVER have more than one value in a column of a given row. The use of SET clearly violates that rule. (That may be why I've never seen it in another relational database, like DB2). Instead, we are going to satisfy Codd's theory by putting the business types in a separate table, the Business_types table. That table is going to have one row for each business type for each business. I've shown you that table earlier; here it is again: registration_number business_type 1 pool hall 1 dry cleaner 2 restaurant As you can see, it indicates that the business whose registration number is 1 is carrying on two different business types, pool hall and dry cleaner. The business whose registration number if 2 is carrying on only one business type, restaurant. We could stop there and probably satisfy your requirements quite nicely with just the Businesses and Business_types tables. These two tables would let us record any number of businesses in the directory and they would allow us to record any number of business types for each of those businesses, not just a maximum of 64 business types. This is already better than using 'SET' since we no longer have the limitation of 64. This design also satisfies Codd's rules in his relational theory, we no longer have multiple values in a single column of a single row of a table. However, we have the ability to record as many business types for a given business as we like so we haven't lost any information. Now, a quick discussion of primary keys. Codd said that every table needs a primary key: a value that makes each row of a table unique from all of the other rows in the table. In other words, if you know the primary key of a row, you can be absolutely certain that only one row (or none) in the table will ever have that specific key value. For instance, a driver's licence number should normally be unique within your state, province or country, depending on how driver's licences work in your area. Therefore, if you live in Alabama and your driver's licence number is K999456PL, there should never be another driver from Alabama who has that same driver's licence number. If we record driver's licence information in a database table, that driver's licence number should be unique and identify only a single row of the data, never two or more rows. We would likely make the primary key of that table the driver's licence number. If we do that in our table definition, it will be IMPOSSIBLE to ever add a new row to the table that has the same driver's licence number as an existing row. The primary key is a very important decision for any table. In our case, the primary key of the Businesses table is likely going to be a number that we assign when we enroll people in our business directory; perhaps we assign these numbers sequentially. That's what I envisioned when I described the Businesses table: the first business in the directory gets the number 1, the second gets the number 2, and so on. [That's just for simplicity sake; in reality, numbers could be assigned randomly or by dividing the owner's shoe size by his IQ, or however you like. The primary key would not HAVE to be a number though; if you hated the concept of treating people like numbers, you could use the business name as the primary key although it would severely complicate your life. What happens if Smitty's opens a second location and it was supposed to be registered separately! In that case, you'd have to make the primary key something other than just the business name; you might have to combine business name with location or something like that. Well, let's not complicate our lives too much unless we have to; let's assume that we give a registration number of our own choosing to each member of our directory and that is our primary key.] The Business_types table also needs a primary key. However, it can't be the registration_number by itself; if it were, we'd be limited to having only one business type for each business, which is contrary to what we are trying to accomplish. In this case, the primary key we choose is the COMBINATION of registration_number and business_type. That ensures that we can record as many business types as we like for each registration_number but it ensures that we don't record the same business type repeatedly for a given registration_number. For example, we couldn't have two (or more) rows that say that the business with registration_number 1 is a pool hall. Are you with me so far? I've thrown a lot at you in a few paragraphs so take a breather, read it a few times and try to digest it, then think about reading on.... Okay, assuming you've digested things so far, let me explain the lookup table, Business_Types_Lookup. That table is designed to limit the specific values which are acceptable for the business_type column in the Business_Types table. If we don't use the lookup table, we can put ANY value we like in the business_type column in the Business_Types table (as long as it is not too large for the column or has any characters that violate the code page). That may be perfectly acceptable for you, in which case you can probably stop reading. However, I'm going to assume that you want to limit the specific values which can be used for business_type. You could think of the Business_Type_Lookup table as a sort of spell-checker; if it is turned on, then the only values which can go in the business_type column in the Business_Types table are values that are in the business_type column of the lookup table. Therefore, if the lookup table listed only the values 'pool hall', 'dry cleaner' and 'restaurant', those are the only values that could ever be put in the business_type column in the Business_Types table. Limiting the values that a column can have is a very common necessity in a database. For instance, you might need to store a customer's two letter state code in the database. In that case, you want to allow NY and NH but you don't want to allow ZY or BB since those aren't valid state codes. In that case, a lookup table containing the valid state codes would ensure that no one could enter any state code except one of the ones that was in the state code lookup table. That's exactly what we are doing with business types: we store all of the acceptable values in the lookup table, then "turn on our spell-checker", if I can put it that way. The lookup table, like any other table, needs a primary key. In my lookup table for business types, I'm imagining two columns but some lookup tables will only have one column, the one containing the codes themselves. In my case, I'm imagining that the codes may be a bit terse so I have a second column that lets you explain in more detail what the code means, just like a state code lookup table would probably have a column for the two letter state code and a second column that explains that AL is Alabama or AK is Alaska or whatever. In other words, the second column is just a comment that helps you understand the data better. In choosing the primary key for a state code lookup table, I would choose the two character state code. THAT is the thing that needs to be unique so that is the primary key. By the same reasoning, the primary key of the Business_Types_Lookup table is the business_type; I never want to have two rows in the lookup table that have a business type of 'pool hall'. Okay, we're almost done. The only thing left to explain is the idea of a foreign key. A foreign key is a column in one table that gets its value from a column in ANOTHER table. In this case, our foreign key is the business_type column in the Business_Types table. That's why it contains a FOREIGN KEY clause that refers back to the lookup table's primary key. As soon as we define that column to be a foreign key, we ensure that the database itself will always ensure that every value put into the business_type column of the Business_Types table is one that comes from the business_type column of the lookup table. In other words, once our three tables are created and the lookup table has been populated with the desired values for business_type, we can be sure that any value in the business_type column of the Business_Types table is one that comes from the lookup table. If the lookup table contains only the values 'foo', 'bar' and 'squeak', then only those three business types can appear in the Business_Types table. This, in effect, is the activation of the 'spell-checker'. The beauty of this is that the enforcement of these values is done entirely by the database; you don't have to write any application code to do it. If you try to insert a value in the Business_Type table that isn't in the lookup table, the DATABASE detects this and refuses with a message that tells you what you tried to do wrong. Do you follow all that? By designing the tables this way, we have ensured that we can record any number of business types for a given table but that only valid business types can be recorded. We control what types are valid and can have as many valid types as we like. That's why I think lookup tables are a better approach than the 'SET' datatype. Oh, one small thing. Since MySQL isn't fully mature yet, foreign keys only get enforced between tables that use the InnoDB engine; therefore, each of our tables should have 'TYPE=InnoDB' in the definition. (I think I forgot the 'Type=InnoDB' in some of the original table definitions.) > I want to try and take this 1 step at a time until I get it... > Well, I see I've blathered on for quite a few paragraphs here. Some of these concepts may be new to you and may require some time to digest. Think about them and re-read as necessary, then ask me if you need further clarification. > let me know how I/we should go on from here. (the other side note) is I > eventually have to work this db system into a php driven application (not > like it matters on this list but...) > > anyways let me know how to continue with the matter... > I think its best to have these conversations on the list in most circumstances; that ensures that everyone benefits from the discussion: you and anyone else who researches this topic later in the archives. Otherwise, you are the only one benefitting. > tnx for the help... > My pleasure. Rhino > ----- Original Message ----- > From: "Rhino" <[EMAIL PROTECTED]> > To: "Andy B" <[EMAIL PROTECTED]> > Sent: Thursday, October 28, 2004 4:00 PM > Subject: Re: column choices for certain data > > > > > > ----- Original Message ----- > > From: "Andy B" <[EMAIL PROTECTED]> > > To: <[EMAIL PROTECTED]> > > Sent: Thursday, October 28, 2004 1:15 PM > > Subject: column choices for certain data > > > > > >> Hi... > >> I have a db that I'm writing. It's for a business directory and one of > >> the > >> fields/columns in the table needs to have a list of business types in it > >> (i.e. retail, auto, computer and so on). Since there may be more than one > >> category that a business fits under I was wondering if "SET" is the best > >> choice for that?? > >> > > I wouldn't use SET if I were you. > > > > I have never used the 'SET' column type in MySQL and had to look it up in > > the manual to see what it did. However, I've worked with relational > > databases for 20+ years (mostly DB2) and it doesn't have a 'SET' column > > type > > in its repertoire; I've gotten used to doing things without 'SET' so maybe > > I'm just being stodgy ;-) > > > > The chief advantage of 'SET', as far as I can tell from the manual, is > > that > > it lets you control the specific values which can be in a column without > > having to write application lookups to verify that the value you are > > supplying is one that is valid for the 'SET' column. Therefore, if you had > > only 3 business types, sole proprietorship, partnership, and corporation, > > you could put those 3 values in the set and be sure that those are the > > only > > 3 values that would ever be allowed in the column. That's fine as far as > > it > > goes and is a very useful thing. > > > > However, on the negative side, there is a fixed maximum of 64 values in > > the > > set. While that may be sufficient for your immediate needs, I don't thin k > > you can be certain that it will be sufficient for your long term needs. > > For > > example, if this is an eclectic business that combines a lot of lines of > > business, you may find that it sells groceries, operates a dry cleaner, > > contains a movie theatre, and umpteen other things all under the same > > business name. You may find that 64 values isn't enough once you start > > making the set include all the different functions of the business. > > > > The second negative is that I don't think 'SET' is a datatype found in > > most > > other databases. Therefore, if you eventually port this table over to > > another database, you may have to rework the design somewhat to get the > > same > > effect, which could be a pain. > > > > The third negative is that putting multiple values in a single column of a > > single row violates Codd's Rules, which are the foundation of all > > relational > > databases. Codd is probably rolling in his grave at the mere thought of > > doing this ;-) > > > > Therefore, let me suggest this, which should give you the same benefits > > without the 64 value limitation while being portable to other databases: > > store the business type in a separate table, even if there is only one > > possible value for business type for most rows in your directory. > > > > For example, create one table to hold the basic information about your > > business: > > > > create table businesses > > (registration_number int not null, > > business_name char(50) not null, > > business_location char(100) not null, > > [etc.] > > primary key(registration_number)); > > > > Sample Contents: > > registration_number business_name business_location > > 1 Smitty's 123 Main Street > > 2 Bob's 456 Park Street > > > > create table business_types > > (registration_number, > > business_type char(20) not null, > > primary key(registration_number,business_types) > > foreign key(business_type) references > > business_types_lookup(business_type))TYPE=InnoDB; > > > > Sample Contents: > > registration_number business_type > > 1 pool hall > > 1 dry cleaner > > 2 restaurant > > > > create table business_types_lookup > > (business_type char(20) not null, > > business_type_description char(200) not null, > > primary key(business_type)); > > > > Sample Contents: > > business_type business_type_description > > pool hall gambling establishment or other den of iniquity > > restaurant eating establishment that can serve alcohol > > > > Do you see how this works? > > > > Every time you add a new business to your database, you add one row to the > > Businesses table, assigning a registration number to the business and > > recording company name, location, etc. Then, for each of the possible > > business types that apply to that business, you add a row to the > > Business_Types table. For example, Smitty's is both a pool hall and a dry > > cleaner so you add one row for each business type, linking it back to the > > Businesses table via the registration number that uniquely identifies one > > business. Any given business can have as many business types as you like, > > not just a maximum of 64. > > > > The third table is not strictly necessary but it is generally a good idea; > > it is a lookup table that makes sure that only valid business types get > > chosen in the Business_Types table. You create one row in > > Business_Types_Lookup for each type of business that you think is valid > > for > > your purposes. You can have as many business_types as you like in the > > Business_Types_Lookup table, not just a maximum of 64. The 'foreign key' > > designation on the business_type column of the Business_Types table > > ensures > > that ONLY values from the lookup table are acceptable in Business_Types; > > if > > you don't have 'shoe shine stand' in the lookups table, you can't use it > > in > > the Business_Types table either. (Since foreign keys are only enforced in > > InnoDB tables, you have to specify Type=InnoDB when you define the > > Business_Types table.) > > > > Is this clear? If not, let me know and I'll try to clarify it for you. > > > > Rhino > > > > > > > > > -- > MySQL General Mailing List > For list archives: http://lists.mysql.com/mysql > To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED] > -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]