Rod Heyd wrote:
Hi Everyone,

I've got a little debate that maybe you can all help me settle.

I have three tables that I need to join, but there are performance problems
with the joins due to some misunderstandings of my predecessor about what's
needed to join the tables efficiently.

Here's the setup.

t1 has a unique key defined on it, call it command_id, t1 has a 1 to many
relationship with t2.
t2 has t1's command_id as a foreign key constraint, plus a set of one or
more instrument_id's.
Each command results in one or more instruments taking data.

The commanding elements defined here are then set to our instrument to aquire
the data. When the data comes back it is organized into t3 by command_id and instrument_id. So the primary key on t3 looks like this:
command_id_instrument_id.

Yuck!

So, now I need to write a query that looks at what was commanded in t1 and t2
and then look for any missing data that has not yet been received.

So, I've got a query that looks something like this:

SELECT  stuff
FROM t1 JOIN t2
ON t1.command_id = t2.command_id
LEFT JOIN t3
ON t3.data_id = concat(t1.command_id,'_',t2.instrument_id)

Now, I think everyone is going to see immediately that the join on t3 is
going to have absolutely horrible performance,
the join condition here can't take advantage of any indexes since the string
function pretty much destroys any hope of that. To make matters worse, the
left join is a total killer.

I think the LEFT JOIN isn't that big a deal, if the proper index is there and usable. You can easily find out by comparing the speed of your query below against the speed of the same query without "LEFT".

So my suggestion to solve the performance bottleneck is to add two columns to
t3, command_id and instrument_id, and create a combined index on the
command_id and instrument_id columns.

Yes, exactly.  The combined index should be UNIQUE.

the join condition on t3 then becomes:

LEFT JOIN t3
ON t3.command_id = t1.command_id AND t3.instrument_id = t2.instrument_id

This works beautifully!  The performance of this new join condition is about
480 times faster than the original.

Yes, this is the way this should be done.

Here's the rub. Since there is a unique data_id that already exists which combines the information in both command_id and instrument_id keys, I'm being
told by our seasoned software developer that I am violating "classic"
database design rules against redundant data.

It is true that you now have redundant data. Clearly, you do not need both the two new columns, command_id and instrument_id, and the old column, command_id_instrument_id. Redundant data is a violation of "classic" database design rules, so one or the other has to go if you want to follow the rules.

Which should go? Well, the old column, command_id_instrument_id, combines the answer to two questions, "Which command?" and "Which instrument?", into one column. That is also also a violation of the "classic" database design rules, and a really bad idea. It leads to precisely the sort of problem you are trying to fix. It also makes it difficult to find the rows in t3 which belong to a particular command, or to a particular instrument. Those queries require string matching, and the latter could not use an index. If you want to follow the rules, drop the old column.

In my opinion, this is a minor "violation" at best.  We have a good reason
for wanting to identify each data segment with our originally defined
data_id, but this is not strictly a requirement on the database, it's more
of an agreed upon convention that we are going to use to identify the data
segments and distribute them to our consumers.  From a database stand point,
the only requirement is that the data_id be unique.  It could be anything as
far as the database is concerned, as long as the data_id remains unique, it
doesn't matter that it may be overloaded with some "duplicate" information.
Any more experienced DBA's than I have an opinion on this?

The strictly correct solution is to replace the old, broken column with your two new columns. You can either make the combined index on the two new columns the PRIMARY KEY, or you can make an AUTO_INCREMENT primary key and define the combination of the two columns as UNIQUE. It is then trivial to

  SELECT CONCAT(command_id, '_', instrument_id) ...

when you want to display the t3 id using the agreed upon convention.

In short, your seasoned software developer is right to want to follow the rules, but the rules dictate replacing the old column with the two new columns.

Thanks!

Michael

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to