Re: Normalization question
Thanks for you responses! This list has proven to be for useful to pick pick the brains of other DBA's. I think we are probably going to go the route of splitting the different pieces of data into seperate columns, while keeping the original product_id, since we have a lot of third party applications that reference that column for data identification purposes, although not to specifically find the information that is overloaded within the field. Thanks!
Re: Normalization question
Rod Heyd [EMAIL PROTECTED] wrote on 12/09/2005 11:01:38 AM: Hi Everyone, I've got a little debate that maybe you can all help me settle. I have three tables that I need to join, but there are performance problems with the joins due to some misunderstandings of my predecessor about what's needed to join the tables efficiently. Here's the setup. t1 has a unique key defined on it, call it command_id, t1 has a 1 to many relationship with t2. t2 has t1's command_id as a foreign key constraint, plus a set of one or more instrument_id's. Each command results in one or more instruments taking data. The commanding elements defined here are then set to our instrument to aquire the data. When the data comes back it is organized into t3 by command_id and instrument_id. So the primary key on t3 looks like this: command_id_instrument_id. So, now I need to write a query that looks at what was commanded in t1 and t2 and then look for any missing data that has not yet been received. So, I've got a query that looks something like this: SELECT stuff FROM t1 JOIN t2 ON t1.command_id = t2.command_id LEFT JOIN t3 ON t3.data_id = concat(t1.command_id,'_',t2.instrument_id) Now, I think everyone is going to see immediately that the join on t3 is going to have absolutely horrible performance, the join condition here can't take advantage of any indexes since the string function pretty much destroys any hope of that. To make matters worse, the left join is a total killer. So my suggestion to solve the performance bottleneck is to add two columns to t3, command_id and instrument_id, and create a combined index on the command_id and instrument_id columns. the join condition on t3 then becomes: LEFT JOIN t3 ON t3.command_id = t1.command_id AND t3.instrument_id = t2.instrument_id This works beautifully! The performance of this new join condition is about 480 times faster than the original. Here's the rub. Since there is a unique data_id that already exists which combines the information in both command_id and instrument_id keys, I'm being told by our seasoned software developer that I am violating classic database design rules against redundant data. In my opinion, this is a minor violation at best. We have a good reason for wanting to identify each data segment with our originally defined data_id, but this is not strictly a requirement on the database, it's more of an agreed upon convention that we are going to use to identify the data segments and distribute them to our consumers. From a database stand point, the only requirement is that the data_id be unique. It could be anything as far as the database is concerned, as long as the data_id remains unique, it doesn't matter that it may be overloaded with some duplicate information. Any more experienced DBA's than I have an opinion on this? Thanks! You are not duplicating data by referring to objects by their id values. The false-normalization through the use of the composite key on t3 was a mistake. By obscuring the actual relationships between t1, t2, and t3, your original designer broke one of the cardinal rules of designing a relational database. Each foreign key should point to at least (and usually at most) one row on a single table. His foreign key pointed to any row on either table. That was a bad design decision that created serious performance bottlenecks. If you have the chance to fix his design of t3 to keep the separate columns for each FK, please do it. If you want to keep the composite key for historical reasons, I don't see why that can't happen but you should stop using it as the primary key for the table. The drawback to composite keys is that they could possibly refer to more than one thing. Assume for a moment that you are creating a composite key out of two text-type fields. Also imagine that you are compositing those fields with a _ character (as in your example). This pair ('testme_a', 'secondhalf') and this pair ('testme', 'a_secondhalf') would both evaluate to 'testme_a_secondhalf'. We lose our 1:1 correspondence between key and parent. Obviously, my example was contrived and using just numbers can avoid this kind of behavior but it illustrated why it is just not correct to construct foreign keys. Foreign keys are SUPPOSED to contain values that actually match something in the other table. Using them does not violate the no duplicates rule your friend describes and can be used to enforce relational integrity. Just so you have it on record (IMHO): You are correct and the other DBA should read up about modern relational databases and modern theories of normalization. Shawn Green Database Administrator Unimin Corporation - Spruce Pine
Re: Normalization question
Rod Heyd wrote: Hi Everyone, I've got a little debate that maybe you can all help me settle. I have three tables that I need to join, but there are performance problems with the joins due to some misunderstandings of my predecessor about what's needed to join the tables efficiently. Here's the setup. t1 has a unique key defined on it, call it command_id, t1 has a 1 to many relationship with t2. t2 has t1's command_id as a foreign key constraint, plus a set of one or more instrument_id's. Each command results in one or more instruments taking data. The commanding elements defined here are then set to our instrument to aquire the data. When the data comes back it is organized into t3 by command_id and instrument_id. So the primary key on t3 looks like this: command_id_instrument_id. Yuck! So, now I need to write a query that looks at what was commanded in t1 and t2 and then look for any missing data that has not yet been received. So, I've got a query that looks something like this: SELECT stuff FROM t1 JOIN t2 ON t1.command_id = t2.command_id LEFT JOIN t3 ON t3.data_id = concat(t1.command_id,'_',t2.instrument_id) Now, I think everyone is going to see immediately that the join on t3 is going to have absolutely horrible performance, the join condition here can't take advantage of any indexes since the string function pretty much destroys any hope of that. To make matters worse, the left join is a total killer. I think the LEFT JOIN isn't that big a deal, if the proper index is there and usable. You can easily find out by comparing the speed of your query below against the speed of the same query without LEFT. So my suggestion to solve the performance bottleneck is to add two columns to t3, command_id and instrument_id, and create a combined index on the command_id and instrument_id columns. Yes, exactly. The combined index should be UNIQUE. the join condition on t3 then becomes: LEFT JOIN t3 ON t3.command_id = t1.command_id AND t3.instrument_id = t2.instrument_id This works beautifully! The performance of this new join condition is about 480 times faster than the original. Yes, this is the way this should be done. Here's the rub. Since there is a unique data_id that already exists which combines the information in both command_id and instrument_id keys, I'm being told by our seasoned software developer that I am violating classic database design rules against redundant data. It is true that you now have redundant data. Clearly, you do not need both the two new columns, command_id and instrument_id, and the old column, command_id_instrument_id. Redundant data is a violation of classic database design rules, so one or the other has to go if you want to follow the rules. Which should go? Well, the old column, command_id_instrument_id, combines the answer to two questions, Which command? and Which instrument?, into one column. That is also also a violation of the classic database design rules, and a really bad idea. It leads to precisely the sort of problem you are trying to fix. It also makes it difficult to find the rows in t3 which belong to a particular command, or to a particular instrument. Those queries require string matching, and the latter could not use an index. If you want to follow the rules, drop the old column. In my opinion, this is a minor violation at best. We have a good reason for wanting to identify each data segment with our originally defined data_id, but this is not strictly a requirement on the database, it's more of an agreed upon convention that we are going to use to identify the data segments and distribute them to our consumers. From a database stand point, the only requirement is that the data_id be unique. It could be anything as far as the database is concerned, as long as the data_id remains unique, it doesn't matter that it may be overloaded with some duplicate information. Any more experienced DBA's than I have an opinion on this? The strictly correct solution is to replace the old, broken column with your two new columns. You can either make the combined index on the two new columns the PRIMARY KEY, or you can make an AUTO_INCREMENT primary key and define the combination of the two columns as UNIQUE. It is then trivial to SELECT CONCAT(command_id, '_', instrument_id) ... when you want to display the t3 id using the agreed upon convention. In short, your seasoned software developer is right to want to follow the rules, but the rules dictate replacing the old column with the two new columns. Thanks! Michael -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
RE: normalization question
Hi! I don't understand problem. Car can belong to only one Dealer, and one Dealer can have many Cars, therefore DealerID should be in Car table. Also Car can be only one Model, but there are many Cars the same Model. It looks quite normal to put ModelID into Car table, and I don't see any possible problems. Of course if there are any special requirements for the database, they have to be incorporated into the model. Am I missing your point? Artem -Original Message- From: [EMAIL PROTECTED] [mailto:speters;metromls.com] Sent: Monday, October 21, 2002 5:43 PM To: [EMAIL PROTECTED] Subject: normalization question I'm working on a design for a database that is leaving me with some normalization issues that I havent had to deal with before. For example, if im trying to store information about car dealerships I end up with the following situation: I want to store the cars in stock at each physical dealership, with certain attributes. So, I have a table Dealership as follows: Dealer_ID Name Address Etc. Then there is a Car_Make table (for things like Chevy, Hyundai): Car_Make_ID Make_Name ... And a Model table (for things like Elantra, Neon, etc): Model: Model_ID Car_Make_ID Model_Name ... And a table to describe the cars themselves: (one row per car) Car: VIN_number(vehicle identification number, what the DMV uses) Model_ID other attributes ... Now, to associate a specific car with a dealer, I want to put Dealer_ID in the Cars table, but then the Cars table has multiple parent tables, specifically Model Dealership I'm sure that this is far from normal, and will lead to programming problems, but i dont see what to do about it. Any advice would be greatly appreciated. thanks, sean peters [EMAIL PROTECTED] sql, query - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php Attention: Any views expressed in this message are those of the individual sender, except where the message states otherwise and the sender is authorized to state them to be the views of any such entity. The information contained in this message and or attachments is intended only for the person or entity to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is prohibited. If you received this in error, please contact the sender and delete the material from any system and destroy any copies. - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
Re: normalization question
Except in the case that cars aren't held at the dealers and any one car can be purchased from a range of dealers. In which case you will need a seperate table like, Dealer_Car_Map (dealer_id,car_id). Ric. - Original Message - From: Artem Koltsov [EMAIL PROTECTED] To: [EMAIL PROTECTED]; [EMAIL PROTECTED] Sent: Monday, October 21, 2002 9:17 PM Subject: RE: normalization question Hi! I don't understand problem. Car can belong to only one Dealer, and one Dealer can have many Cars, therefore DealerID should be in Car table. Also Car can be only one Model, but there are many Cars the same Model. It looks quite normal to put ModelID into Car table, and I don't see any possible problems. Of course if there are any special requirements for the database, they have to be incorporated into the model. Am I missing your point? Artem -Original Message- From: [EMAIL PROTECTED] [mailto:speters;metromls.com] Sent: Monday, October 21, 2002 5:43 PM To: [EMAIL PROTECTED] Subject: normalization question I'm working on a design for a database that is leaving me with some normalization issues that I havent had to deal with before. For example, if im trying to store information about car dealerships I end up with the following situation: I want to store the cars in stock at each physical dealership, with certain attributes. So, I have a table Dealership as follows: Dealer_ID Name Address Etc. Then there is a Car_Make table (for things like Chevy, Hyundai): Car_Make_ID Make_Name ... And a Model table (for things like Elantra, Neon, etc): Model: Model_ID Car_Make_ID Model_Name ... And a table to describe the cars themselves: (one row per car) Car: VIN_number(vehicle identification number, what the DMV uses) Model_ID other attributes ... Now, to associate a specific car with a dealer, I want to put Dealer_ID in the Cars table, but then the Cars table has multiple parent tables, specifically Model Dealership I'm sure that this is far from normal, and will lead to programming problems, but i dont see what to do about it. Any advice would be greatly appreciated. thanks, sean peters [EMAIL PROTECTED] sql, query - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php Attention: Any views expressed in this message are those of the individual sender, except where the message states otherwise and the sender is authorized to state them to be the views of any such entity. The information contained in this message and or attachments is intended only for the person or entity to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is prohibited. If you received this in error, please contact the sender and delete the material from any system and destroy any copies. - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
RE: Normalization question
I have a question about the setup of the tables in my database. In my members table, I store a lot of info about demographics, such as marital status, income, etc. The way I'm doing this is the enum type. Is it better to use this, or would it be better to create separate tables for each demographic I want to store? Advantages of this would be no additional programming required when I want to list the # of members who have selected each demographic...plus I could add new items any time I wanted (i.e. I want to add widowed to the list of marital statuses). What is the correct way to do this? -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] Sent: Tuesday, June 04, 2002 6:24 PM To: Daren Cotter Subject: Re: Normalization question Your message cannot be posted because it appears to be either spam or simply off topic to our filter. To bypass the filter you must include one of the following words in your message: sql,query If you just reply to this message, and include the entire text of it in the reply, your reply will go through. However, you should first review the text of the message to make sure it has something to do with MySQL. Just typing the word MySQL once will be sufficient, for example. You have written the following: I have a question about the setup of the tables in my database. In my members table, I store a lot of info about demographics, such as marital status, income, etc. The way I'm doing this is the enum type. Is it better to use this, or would it be better to create separate tables for each demographic I want to store? Advantages of this would be no additional programming required when I want to list the # of members who have selected each demographic...plus I could add new items any time I wanted (i.e. I want to add widowed to the list of marital statuses). What is the correct way to do this? - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
RE: Normalization question
I won't purport to tell you the 'correct' way to do it but I will tell you how I would do it. When I have information (marital status) I create a 'domain table'. I have 2 rules for most of my domain tables. 1: they end in the word Type. (maritalStatusType) 2: they have 3 fields: The PK (maritalStatusTypeID) description varChar(100) endDate date So in this case maritalStatusTypeID description endDate 1 Married null 2 Single null 3 Divorcednull 4 Widowed null 5 Shacked Up 2002-06-03 Now, in your members table you store the maritalStatusTypeID as a FK. then a simply select member.*, maritalStatusType.description from member left join maritalStatusType on member.maritalStatusTypeID = maritalStatusType.maritalStatusTypeID If you use the domain table to populate a select box (which I do) then the endDate provides a mechanism whereby you can stop if from appearing in the select box without deleting the record. (because, as we all know, deleting a record from a domain table would leave orphaned FKs in the members table and orphans are bad mojo.) Anyhow, this has worked for me in the past. =C= * * Cal Evans * Journeyman Programmer * Techno-Mage * http://www.calevans.com * -Original Message- From: Daren Cotter [mailto:[EMAIL PROTECTED]] Sent: Tuesday, June 04, 2002 6:38 PM To: [EMAIL PROTECTED] Subject: RE: Normalization question I have a question about the setup of the tables in my database. In my members table, I store a lot of info about demographics, such as marital status, income, etc. The way I'm doing this is the enum type. Is it better to use this, or would it be better to create separate tables for each demographic I want to store? Advantages of this would be no additional programming required when I want to list the # of members who have selected each demographic...plus I could add new items any time I wanted (i.e. I want to add widowed to the list of marital statuses). What is the correct way to do this? -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] Sent: Tuesday, June 04, 2002 6:24 PM To: Daren Cotter Subject: Re: Normalization question Your message cannot be posted because it appears to be either spam or simply off topic to our filter. To bypass the filter you must include one of the following words in your message: sql,query If you just reply to this message, and include the entire text of it in the reply, your reply will go through. However, you should first review the text of the message to make sure it has something to do with MySQL. Just typing the word MySQL once will be sufficient, for example. You have written the following: I have a question about the setup of the tables in my database. In my members table, I store a lot of info about demographics, such as marital status, income, etc. The way I'm doing this is the enum type. Is it better to use this, or would it be better to create separate tables for each demographic I want to store? Advantages of this would be no additional programming required when I want to list the # of members who have selected each demographic...plus I could add new items any time I wanted (i.e. I want to add widowed to the list of marital statuses). What is the correct way to do this? - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
RE: normalization question
The data wouldn't need to be stored for any longer than two to three months, so that shouldn't be a problem...what about my query to get all members that have not read the mailing, is that possible (assuming I don't use the reverted logic you were talking about). Daren Cotter CEO, InboxDollars.com http://www.inboxdollars.com (507) 382-0435 -Original Message- From: Rene Churchill [mailto:[EMAIL PROTECTED]] Sent: Sunday, August 19, 2001 7:34 AM To: Daren Cotter Cc: [EMAIL PROTECTED] Subject: Re: normalization question Hi Daren, How long do you need to store this information? The normalized table that you describe is simple and easy to index. Having several million rows in the table won't bother MySQL. So is it sufficient to keep the data around for a month and then prune it out of the database? Long term records could be dumped out into text logfiles for safe keeping. Another possiblity is to invert your storage logic. By that, I mean insert a row in the cross-reference table for every user that has NOT read the email. Then when they read it, remove that row from the table. This will automatically prune down the table size as more and more of your readers check their email. I would HIGHLY suggest using the DELAYED option on that size of an insert. You may not want to go this logic inversion route because of the havoc it will play with other queries. For example, since any user that does NOT have a matching row in the cross-reference table is assumed to have read the message, any new user joining you will automatically be assumed to have read all messages to date. Rene Daren Cotter wrote: I have a table which needs to store which emails each member has read and had their account credited for. Currently, I have a field read_array longtext in my main members table, and each time a member reads an email, it simply concats to the end of this array. For example, they read email 288, their read array is: 288 They read 298, the read_array is: 288||298 Using this method (which is far from normalization, I know), I have been able to accomplish the two major things I need to be able to accomplish: 1) Insure that no member can get credit for reading the same mailing twice (check their current read_array, using ereg()) 2) Run a query to see how many members have read each mailing (WHERE read_array LIKE '%mail_id%') I know that both of these tasks would be easier if I created a new table, and stored the member_id, mail_id, and date, and the table would then be normalized...however, I send approximately one new mailing per day, and this would mean 100,000 new rows in this table every day. Can somone offer me expert advice as to which method is better? Using an array that will grow to unlimited size, or using a table that is going to grow much larger every day? Also, as a deterrant to switching to the normalized method...I frequently run queries to do the following: *) After I send a mailing, a week later, I send the mailing again to all members who did not confirm reading the email. My Query is like: select member_id, (more) FROM members WHERE read_array not like '%mail_id%' Is there any possible way I could accompish this task in one query if I had this process normalized? -- Rene Churchillhttp://www.vtwebwizard.com Internet Consulting 802-244-5151 Specializing in Web Programming - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php