Hi Everyone, I've got a little debate that maybe you can all help me settle.
I have three tables that I need to join, but the joins perform poorly because my predecessor misunderstood what's needed to join the tables efficiently. Here's the setup. t1 has a unique key defined on it, call it command_id. t1 has a one-to-many relationship with t2: t2 has t1's command_id as a foreign key constraint, plus a set of one or more instrument_ids. Each command results in one or more instruments taking data. The commanding elements defined here are sent to our instruments to acquire the data. When the data comes back, it is organized into t3 by command_id and instrument_id, so the primary key on t3 looks like this: command_id_instrument_id.

Now I need to write a query that looks at what was commanded in t1 and t2 and then checks for any data that has not yet been received. I've got a query that looks something like this:

SELECT stuff
FROM t1
JOIN t2 ON t1.command_id = t2.command_id
LEFT JOIN t3 ON t3.data_id = concat(t1.command_id, '_', t2.instrument_id)

I think everyone is going to see immediately that the join on t3 is going to have absolutely horrible performance. The join condition can't take advantage of any indexes, since wrapping the keys in a string function destroys any hope of that. To make matters worse, the left join is a total killer.

My suggestion to solve the performance bottleneck is to add two columns to t3, command_id and instrument_id, and create a combined index on those two columns (see the sketch at the end of this post). The join condition on t3 then becomes:

LEFT JOIN t3 ON t3.command_id = t1.command_id AND t3.instrument_id = t2.instrument_id

This works beautifully! The new join condition is about 480 times faster than the original.

Here's the rub. Since a unique data_id already exists that combines the information in the command_id and instrument_id keys, our seasoned software developer tells me I am violating "classic" database design rules against redundant data. In my opinion, this is a minor "violation" at best. We have a good reason for wanting to identify each data segment with our originally defined data_id, but that is not strictly a requirement on the database; it's more of an agreed-upon convention we use to identify the data segments and distribute them to our consumers. From a database standpoint, the only requirement is that data_id be unique. It could be anything as far as the database is concerned; as long as it remains unique, it doesn't matter that it may be overloaded with some "duplicate" information.

Do any DBAs more experienced than I am have an opinion on this? Thanks!
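P.S. For concreteness, here is roughly the change I'm proposing. This is just a sketch: the column types, the index name, and the IS NULL filter for spotting missing data are my own illustration, not something already in our schema.

ALTER TABLE t3 ADD COLUMN command_id INT NOT NULL;
ALTER TABLE t3 ADD COLUMN instrument_id INT NOT NULL;
CREATE INDEX idx_t3_cmd_instr ON t3 (command_id, instrument_id);

-- Commanded (command_id, instrument_id) pairs with no data received yet:
SELECT t1.command_id, t2.instrument_id
FROM t1
JOIN t2 ON t2.command_id = t1.command_id
LEFT JOIN t3 ON t3.command_id = t1.command_id
            AND t3.instrument_id = t2.instrument_id
WHERE t3.command_id IS NULL;

With the composite index in place, the optimizer can do an index lookup on t3 for each commanded pair instead of building a string on the fly for every row.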