Rod Heyd <[EMAIL PROTECTED]> wrote on 12/09/2005 11:01:38 AM: > Hi Everyone, > > I've got a little debate that maybe you can all help me settle. > > I have three tables that I need to join, but there are performance problems > with the joins due to some misunderstandings of my predecessor about what's > needed to join the tables efficiently. > > Here's the setup. > > t1 has a unique key defined on it, call it command_id, t1 has a 1 to many > relationship with t2. > t2 has t1's command_id as a foreign key constraint, plus a set of one or > more instrument_id's. > Each command results in one or more instruments taking data. > > The commanding elements defined here are then set to our instrument to > aquire the data. > When the data comes back it is organized into t3 by command_id and > instrument_id. > So the primary key on t3 looks like this: command_id_instrument_id. > > So, now I need to write a query that looks at what was commanded in t1 and > t2 and then look for > any missing data that has not yet been received. > > So, I've got a query that looks something like this: > > SELECT stuff > > FROM > t1 > JOIN > t2 > ON > t1.command_id = t2.command_id > > LEFT JOIN > t3 > ON > t3.data_id = concat(t1.command_id,'_',t2.instrument_id) > > Now, I think everyone is going to see immediately that the join on t3 is > going to have absolutely horrible performance, > the join condition here can't take advantage of any indexes since the string > function pretty much destroys any hope of that. To make matters worse, the > left join is a total killer. > > > So my suggestion to solve the performance bottleneck is to add two columns > to t3, > command_id and instrument_id, and create a combined index on the command_id > and instrument_id > columns. > > the join condition on t3 then becomes: > > LEFT JOIN > t3 > ON > t3.command_id = t1.command_id > AND > t3.instrument_id = t2.instrument_id > > This works beautifully! The performance of this new join condition is about > 480 times faster than the original. > > Here's the rub. Since there is a unique data_id that already exists which > combines the information in both > command_id and instrument_id keys, I'm being told by our seasoned software > developer that I am violating "classic" database design rules against > redundant data. > > In my opinion, this is a minor "violation" at best. We have a good reason > for wanting to identify each data segment with our originally defined > data_id, but this is not strictly a requirement on the database, it's more > of an agreed upon convention that we are going to use to identify the data > segments and distribute them to our consumers. From a database stand point, > the only requirement is that the data_id be unique. It could be anything as > far as the database is concerned, as long as the data_id remains unique, it > doesn't matter that it may be overloaded with some "duplicate" information. > Any more experienced DBA's than I have an opinion on this? > > Thanks!
You are not duplicating data by referring to objects by their id values. The false-normalization through the use of the composite key on t3 was a mistake. By obscuring the actual relationships between t1, t2, and t3, your original designer broke one of the cardinal rules of designing a relational database. Each foreign key should point to at least (and usually at most) one row on a single table. His foreign key pointed to any row on either table. That was a bad design decision that created serious performance bottlenecks. If you have the chance to fix his design of t3 to keep the separate columns for each FK, please do it. If you want to keep the composite key for historical reasons, I don't see why that can't happen but you should stop using it as the primary key for the table. The drawback to composite keys is that they could possibly refer to more than one thing. Assume for a moment that you are creating a composite key out of two text-type fields. Also imagine that you are compositing those fields with a _ character (as in your example). This pair ('testme_a', 'secondhalf') and this pair ('testme', 'a_secondhalf') would both evaluate to 'testme_a_secondhalf'. We lose our 1:1 correspondence between key and parent. Obviously, my example was contrived and using just numbers can avoid this kind of behavior but it illustrated why it is just not correct to construct foreign keys. Foreign keys are SUPPOSED to contain values that actually match something in the other table. Using them does not violate the "no duplicates" rule your friend describes and can be used to enforce relational integrity. Just so you have it on record (IMHO): You are correct and the other DBA should read up about modern relational databases and modern theories of normalization. Shawn Green Database Administrator Unimin Corporation - Spruce Pine