Hi Everyone, I've got a little debate that maybe you can all help me settle.
I have three tables that I need to join, but the joins perform poorly because my predecessor misunderstood what's needed to join the tables efficiently. Here's the setup. t1 has a unique key defined on it, call it command_id. t1 has a one-to-many relationship with t2: t2 has t1's command_id as a foreign key constraint, plus a set of one or more instrument_ids. Each command results in one or more instruments taking data. The commanding elements defined here are sent to our instruments to acquire the data. When the data comes back, it is organized into t3 by command_id and instrument_id, so the primary key on t3 looks like this: command_id_instrument_id.

Now I need to write a query that looks at what was commanded in t1 and t2 and then checks for any data that has not yet been received. I've got a query that looks something like this:

SELECT stuff
FROM t1
JOIN t2 ON t1.command_id = t2.command_id
LEFT JOIN t3 ON t3.data_id = concat(t1.command_id, '_', t2.instrument_id)

I think everyone is going to see immediately that the join on t3 is going to have absolutely horrible performance. The join condition can't take advantage of any indexes, since wrapping the keys in a string function destroys any hope of that. To make matters worse, the left join is a total killer.

My suggestion to solve the performance bottleneck is to add two columns to t3, command_id and instrument_id, and create a combined index on those two columns (see the sketch at the end of this post). The join condition on t3 then becomes:

LEFT JOIN t3 ON t3.command_id = t1.command_id AND t3.instrument_id = t2.instrument_id

This works beautifully! The new join condition is about 480 times faster than the original.

Here's the rub. Since a unique data_id already exists that combines the information in the command_id and instrument_id keys, our seasoned software developer tells me I am violating "classic" database design rules against redundant data. In my opinion, this is a minor "violation" at best. We have a good reason for wanting to identify each data segment with our originally defined data_id, but that is not strictly a requirement on the database; it's more of an agreed-upon convention we use to identify the data segments and distribute them to our consumers. From a database standpoint, the only requirement is that data_id be unique. It could be anything as far as the database is concerned; as long as it remains unique, it doesn't matter that it may be overloaded with some "duplicate" information.

Do any DBAs more experienced than I am have an opinion on this? Thanks!
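P.S. For concreteness, here is roughly the change I'm proposing. This is just a sketch: the column types, the index name, and the IS NULL filter for spotting missing data are my own illustration, not something already in our schema.

ALTER TABLE t3 ADD COLUMN command_id INT NOT NULL;
ALTER TABLE t3 ADD COLUMN instrument_id INT NOT NULL;
CREATE INDEX idx_t3_cmd_instr ON t3 (command_id, instrument_id);

-- Commanded (command_id, instrument_id) pairs with no data received yet:
SELECT t1.command_id, t2.instrument_id
FROM t1
JOIN t2 ON t2.command_id = t1.command_id
LEFT JOIN t3 ON t3.command_id = t1.command_id
            AND t3.instrument_id = t2.instrument_id
WHERE t3.command_id IS NULL;

With the composite index in place, the optimizer can do an index lookup on t3 for each commanded pair instead of building a string on the fly for every row.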