subject:"RE\: normalization question"

Re: Normalization question

2005-12-10 Thread Rod Heyd

Thanks for you responses!  This list has proven to be for useful to pick
pick the brains of other DBA's.

I think we are probably going to go the route of splitting the different
pieces of data into seperate columns, while keeping the original product_id,
since we have a lot of third party applications that reference that column
for data identification purposes, although not to specifically find the
information that is overloaded within the field.

Thanks!

Re: Normalization question

2005-12-09 Thread SGreen

Rod Heyd [EMAIL PROTECTED] wrote on 12/09/2005 11:01:38 AM:

 Hi Everyone,
 
 I've got a little debate that maybe you can all help me settle.
 
 I have three tables that I need to join, but there are performance 
problems
 with the joins due to some misunderstandings of my predecessor about 
what's
 needed to join the tables efficiently.
 
 Here's the setup.
 
 t1 has a unique key defined on it, call it command_id, t1 has a 1 to 
many
 relationship with t2.
 t2 has t1's command_id as a foreign key constraint, plus a set of one or
 more instrument_id's.
 Each command results in one or more instruments taking data.
 
 The commanding elements defined here are then set to our instrument to
 aquire the data.
 When the data comes back it is organized into t3 by command_id and
 instrument_id.
 So the primary key on t3 looks like this: command_id_instrument_id.
 
 So, now I need to write a query that looks at what was commanded in t1 
and
 t2 and then look for
 any missing data that has not yet been received.
 
 So, I've got a query that looks something like this:
 
 SELECT  stuff
 
 FROM
 t1
 JOIN
 t2
 ON
t1.command_id = t2.command_id
 
 LEFT JOIN
   t3
 ON
 t3.data_id = concat(t1.command_id,'_',t2.instrument_id)
 
 Now, I think everyone is going to see immediately that the join on t3 is
 going to have absolutely horrible performance,
 the join condition here can't take advantage of any indexes since the 
string
 function pretty much destroys any hope of that. To make matters worse, 
the
 left join is a total killer.
 
 
 So my suggestion to solve the performance bottleneck is to add two 
columns
 to t3,
 command_id and instrument_id, and create a combined index on the 
command_id
 and instrument_id
 columns.
 
 the join condition on t3 then becomes:
 
 LEFT JOIN
  t3
 ON
 t3.command_id = t1.command_id
 AND
 t3.instrument_id = t2.instrument_id
 
 This works beautifully!  The performance of this new join condition is 
about
 480 times faster than the original.
 
 Here's the rub.  Since there is a unique data_id that already exists 
which
 combines the information in both
 command_id and instrument_id keys, I'm being told by our seasoned 
software
 developer that I am violating classic database design rules against
 redundant data.
 
 In my opinion, this is a minor violation at best.  We have a good 
reason
 for wanting to identify each data segment with our originally defined
 data_id, but this is not strictly a requirement on the database, it's 
more
 of an agreed upon convention that we are going to use to identify the 
data
 segments and distribute them to our consumers.  From a database stand 
point,
 the only requirement is that the data_id be unique.  It could be 
anything as
 far as the database is concerned, as long as the data_id remains unique, 
it
 doesn't matter that it may be overloaded with some duplicate 
information.
 Any more experienced DBA's than I have an opinion on this?
 
 Thanks!

You are not duplicating data by referring to objects by their id values. 
The false-normalization through the use of the composite key on t3 was a 
mistake. By obscuring the actual relationships between t1, t2, and t3, 
your original designer broke one of the cardinal rules of designing a 
relational database. Each foreign key should point to at least  (and 
usually at most) one row on a single table. His foreign key pointed to any 
row on either table. That was a bad design decision that created serious 
performance bottlenecks.  If you have the chance to fix his design of t3 
to keep the separate columns for each FK, please do it. If you want to 
keep the composite key for historical reasons, I don't see why that can't 
happen but you should stop using it as the primary key for the table. 

The drawback to composite keys is that they could possibly refer to more 
than one thing.  Assume for a moment that you are creating a composite key 
out of two text-type fields. Also imagine that you are compositing those 
fields with a _ character (as in your example). This pair ('testme_a', 
'secondhalf') and this pair ('testme', 'a_secondhalf') would both evaluate 
to 'testme_a_secondhalf'. We lose our 1:1 correspondence between key and 
parent.

Obviously, my example was contrived and using just numbers can avoid this 
kind of behavior but it illustrated why it is just not correct to 
construct foreign keys. Foreign keys are SUPPOSED to contain values that 
actually match something in the other table. Using them does not violate 
the no duplicates rule your friend describes and can be used to enforce 
relational integrity.

Just so you have it on record (IMHO): You are correct and the other DBA 
should read up about modern relational databases and modern theories of 
normalization.

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine

Re: Normalization question

2005-12-09 Thread Michael Stassen


Rod Heyd wrote:

Hi Everyone,

I've got a little debate that maybe you can all help me settle.

I have three tables that I need to join, but there are performance problems
with the joins due to some misunderstandings of my predecessor about what's
needed to join the tables efficiently.

Here's the setup.

t1 has a unique key defined on it, call it command_id, t1 has a 1 to many
relationship with t2.
t2 has t1's command_id as a foreign key constraint, plus a set of one or
more instrument_id's.
Each command results in one or more instruments taking data.

The commanding elements defined here are then set to our instrument to aquire
the data. When the data comes back it is organized into t3 by command_id and 
instrument_id. So the primary key on t3 looks like this:

command_id_instrument_id.


Yuck!


So, now I need to write a query that looks at what was commanded in t1 and t2
and then look for any missing data that has not yet been received.

So, I've got a query that looks something like this:

SELECT  stuff
FROM t1 JOIN t2
ON t1.command_id = t2.command_id
LEFT JOIN t3
ON t3.data_id = concat(t1.command_id,'_',t2.instrument_id)

Now, I think everyone is going to see immediately that the join on t3 is
going to have absolutely horrible performance,
the join condition here can't take advantage of any indexes since the string
function pretty much destroys any hope of that. To make matters worse, the
left join is a total killer.


I think the LEFT JOIN isn't that big a deal, if the proper index is there and 
usable.  You can easily find out by comparing the speed of your query below 
against the speed of the same query without LEFT.



So my suggestion to solve the performance bottleneck is to add two columns to
t3, command_id and instrument_id, and create a combined index on the
command_id and instrument_id columns.


Yes, exactly.  The combined index should be UNIQUE.


the join condition on t3 then becomes:

LEFT JOIN t3
ON t3.command_id = t1.command_id AND t3.instrument_id = t2.instrument_id

This works beautifully!  The performance of this new join condition is about
480 times faster than the original.


Yes, this is the way this should be done.

Here's the rub. Since there is a unique data_id that already exists which 
combines the information in both command_id and instrument_id keys, I'm being

told by our seasoned software developer that I am violating classic
database design rules against redundant data.


It is true that you now have redundant data.  Clearly, you do not need both the 
two new columns, command_id and instrument_id, and the old column, 
command_id_instrument_id.  Redundant data is a violation of classic database 
design rules, so one or the other has to go if you want to follow the rules.


Which should go?  Well, the old column, command_id_instrument_id, combines the 
answer to two questions, Which command? and Which instrument?, into one 
column.  That is also also a violation of the classic database design rules, 
and a really bad idea.  It leads to precisely the sort of problem you are trying 
to fix.  It also makes it difficult to find the rows in t3 which belong to a 
particular command, or to a particular instrument.  Those queries require string 
matching, and the latter could not use an index.  If you want to follow the 
rules, drop the old column.



In my opinion, this is a minor violation at best.  We have a good reason
for wanting to identify each data segment with our originally defined
data_id, but this is not strictly a requirement on the database, it's more
of an agreed upon convention that we are going to use to identify the data
segments and distribute them to our consumers.  From a database stand point,
the only requirement is that the data_id be unique.  It could be anything as
far as the database is concerned, as long as the data_id remains unique, it
doesn't matter that it may be overloaded with some duplicate information.
Any more experienced DBA's than I have an opinion on this?


The strictly correct solution is to replace the old, broken column with your two 
new columns.  You can either make the combined index on the two new columns the 
PRIMARY KEY, or you can make an AUTO_INCREMENT primary key and define the 
combination of the two columns as UNIQUE.  It is then trivial to


  SELECT CONCAT(command_id, '_', instrument_id) ...

when you want to display the t3 id using the agreed upon convention.

In short, your seasoned software developer is right to want to follow the rules, 
but the rules dictate replacing the old column with the two new columns.



Thanks!


Michael

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]

RE: normalization question

2002-10-21 Thread Artem Koltsov

Hi!
I don't understand problem. Car can belong to only one Dealer, and one Dealer can have 
many Cars, therefore DealerID should be in Car table. Also Car can be only one Model, 
but there are many Cars the same Model. It looks quite normal to put ModelID into Car 
table, and I don't see any possible problems. Of course if there are any special 
requirements for the database, they have to be incorporated into the model.
Am I missing your point?

Artem

-Original Message-
From: [EMAIL PROTECTED] [mailto:speters;metromls.com]
Sent: Monday, October 21, 2002 5:43 PM
To: [EMAIL PROTECTED]
Subject: normalization question


I'm working on a design for a database that is leaving me with some
normalization issues that I havent had to deal with before. For example, if
im trying to store
information about car dealerships I end up with the following situation:

I want to store the cars in stock at each physical dealership, with certain
attributes.

So, I have a table Dealership as follows:
Dealer_ID
Name
Address
Etc.

Then there is a Car_Make table (for things like Chevy, Hyundai):
Car_Make_ID
Make_Name
...

And a Model table (for things like Elantra, Neon, etc):
Model:
Model_ID
Car_Make_ID
Model_Name
...

And a table to describe the cars themselves: (one row per car)
Car:
VIN_number(vehicle identification number, what the DMV uses)
Model_ID
other attributes
...

Now, to associate a specific car with a dealer, I want to put Dealer_ID in
the Cars table, but then the Cars table has multiple parent tables,
specifically
Model  Dealership

I'm sure that this is far from normal, and will lead to programming
problems,
but i dont see what to do about it.

Any advice would be greatly appreciated.

 thanks,
sean peters
[EMAIL PROTECTED]


sql, query



-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php

 
Attention:
Any views expressed in this message are those of the individual sender, except where 
the message states otherwise and the sender is authorized to state them to be the 
views of any such entity.

The information contained in this message and or attachments is intended only for the 
person or entity to which it is addressed and may contain confidential and/or 
privileged material. Any review, retransmission, dissemination or other use of, or 
taking of any action in reliance upon, this information by persons or entities other 
than the intended recipient is prohibited. If you received this in error, please 
contact the sender and delete the material from any system and destroy any copies.

-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php

Re: normalization question

2002-10-21 Thread Richard Clarke

Except in the case that cars aren't held at the dealers and any one car can
be purchased from a range of dealers.
In which case you will need a seperate table like,
Dealer_Car_Map (dealer_id,car_id).

Ric.

- Original Message -
From: Artem Koltsov [EMAIL PROTECTED]
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Monday, October 21, 2002 9:17 PM
Subject: RE: normalization question

Hi!
I don't understand problem. Car can belong to only one Dealer, and one
Dealer can have many Cars, therefore DealerID should be in Car table. Also
Car can be only one Model, but there are many Cars the same Model. It looks
quite normal to put ModelID into Car table, and I don't see any possible
problems. Of course if there are any special requirements for the database,
they have to be incorporated into the model.
Am I missing your point?

Artem

-Original Message-
From: [EMAIL PROTECTED] [mailto:speters;metromls.com]
Sent: Monday, October 21, 2002 5:43 PM
To: [EMAIL PROTECTED]
Subject: normalization question

I'm working on a design for a database that is leaving me with some
normalization issues that I havent had to deal with before. For example, if
im trying to store
information about car dealerships I end up with the following situation:

I want to store the cars in stock at each physical dealership, with certain
attributes.

So, I have a table Dealership as follows:
Dealer_ID
Name
Address
Etc.

Then there is a Car_Make table (for things like Chevy, Hyundai):
Car_Make_ID
Make_Name
...

And a Model table (for things like Elantra, Neon, etc):
Model:
Model_ID
Car_Make_ID
Model_Name
...

And a table to describe the cars themselves: (one row per car)
Car:
VIN_number(vehicle identification number, what the DMV uses)
Model_ID
other attributes
...

Now, to associate a specific car with a dealer, I want to put Dealer_ID in
the Cars table, but then the Cars table has multiple parent tables,
specifically
Model  Dealership

I'm sure that this is far from normal, and will lead to programming
problems,
but i dont see what to do about it.

Any advice would be greatly appreciated.

 thanks,
sean peters
[EMAIL PROTECTED]

sql, query

-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail
[EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php

Attention:
Any views expressed in this message are those of the individual sender,
except where the message states otherwise and the sender is authorized to
state them to be the views of any such entity.

The information contained in this message and or attachments is intended
only for the person or entity to which it is addressed and may contain
confidential and/or privileged material. Any review, retransmission,
dissemination or other use of, or taking of any action in reliance upon,
this information by persons or entities other than the intended recipient is
prohibited. If you received this in error, please contact the sender and
delete the material from any system and destroy any copies.

-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php

-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php

RE: Normalization question

2002-06-04 Thread Daren Cotter



I have a question about the setup of the tables in my database.

In my members table, I store a lot of info about demographics, such as
marital status, income, etc. The way I'm doing this is the enum type. Is
it better to use this, or would it be better to create separate tables
for each demographic I want to store? Advantages of this would be no
additional programming required when I want to list the # of members who
have selected each demographic...plus I could add new items any time I
wanted (i.e. I want to add widowed to the list of marital statuses).

What is the correct way to do this?




-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] 
Sent: Tuesday, June 04, 2002 6:24 PM
To: Daren Cotter
Subject: Re: Normalization question 


Your message cannot be posted because it appears to be either spam or
simply off topic to our filter. To bypass the filter you must include
one of the following words in your message:

sql,query

If you just reply to this message, and include the entire text of it in
the reply, your reply will go through. However, you should first review
the text of the message to make sure it has something to do with MySQL.
Just typing the word MySQL once will be sufficient, for example.

You have written the following:

I have a question about the setup of the tables in my database.

In my members table, I store a lot of info about demographics, such as
marital status, income, etc. The way I'm doing this is the enum type. Is
it better to use this, or would it be better to create separate tables
for each demographic I want to store? Advantages of this would be no
additional programming required when I want to list the # of members who
have selected each demographic...plus I could add new items any time I
wanted (i.e. I want to add widowed to the list of marital statuses).

What is the correct way to do this?



-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php

RE: Normalization question

2002-06-04 Thread Cal Evans


I won't purport to tell you the 'correct' way to do it but I will tell you
how I would do it.

When I have information (marital status) I create a 'domain table'. I have 2
rules for most of my domain tables.

1: they end in the word Type. (maritalStatusType)
2: they have 3 fields:
 The PK (maritalStatusTypeID)
 description varChar(100)
 endDate date

So in this case
maritalStatusTypeID description endDate
1   Married null
2   Single  null
3   Divorcednull
4   Widowed null
5   Shacked Up  2002-06-03

Now, in your members table you store the maritalStatusTypeID as a FK.

then a simply

select member.*,
   maritalStatusType.description
  from member left join maritalStatusType on member.maritalStatusTypeID =
maritalStatusType.maritalStatusTypeID


If you use the domain table to populate a select box (which I do) then the
endDate provides a mechanism whereby you can stop if from appearing in the
select box without deleting the record. (because, as we all know, deleting a
record from a domain table would leave orphaned FKs in the members table and
orphans are bad mojo.)

Anyhow, this has worked for me in the past.

=C=


*
* Cal Evans
* Journeyman Programmer
* Techno-Mage
* http://www.calevans.com
*


-Original Message-
From: Daren Cotter [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, June 04, 2002 6:38 PM
To: [EMAIL PROTECTED]
Subject: RE: Normalization question



I have a question about the setup of the tables in my database.

In my members table, I store a lot of info about demographics, such as
marital status, income, etc. The way I'm doing this is the enum type. Is
it better to use this, or would it be better to create separate tables
for each demographic I want to store? Advantages of this would be no
additional programming required when I want to list the # of members who
have selected each demographic...plus I could add new items any time I
wanted (i.e. I want to add widowed to the list of marital statuses).

What is the correct way to do this?




-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, June 04, 2002 6:24 PM
To: Daren Cotter
Subject: Re: Normalization question


Your message cannot be posted because it appears to be either spam or
simply off topic to our filter. To bypass the filter you must include
one of the following words in your message:

sql,query

If you just reply to this message, and include the entire text of it in
the reply, your reply will go through. However, you should first review
the text of the message to make sure it has something to do with MySQL.
Just typing the word MySQL once will be sufficient, for example.

You have written the following:

I have a question about the setup of the tables in my database.

In my members table, I store a lot of info about demographics, such as
marital status, income, etc. The way I'm doing this is the enum type. Is
it better to use this, or would it be better to create separate tables
for each demographic I want to store? Advantages of this would be no
additional programming required when I want to list the # of members who
have selected each demographic...plus I could add new items any time I
wanted (i.e. I want to add widowed to the list of marital statuses).

What is the correct way to do this?



-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php



-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php

RE: normalization question

2001-08-19 Thread Daren Cotter


The data wouldn't need to be stored for any longer than two to three months,
so that shouldn't be a problem...what about my query to get all members that
have not read the mailing, is that possible (assuming I don't use the
reverted logic you were talking about).



Daren Cotter
CEO, InboxDollars.com
http://www.inboxdollars.com
(507) 382-0435

-Original Message-
From: Rene Churchill [mailto:[EMAIL PROTECTED]]
Sent: Sunday, August 19, 2001 7:34 AM
To: Daren Cotter
Cc: [EMAIL PROTECTED]
Subject: Re: normalization question



Hi Daren,

How long do you need to store this information?  The normalized table
that you describe is simple and easy to index.  Having several million
rows in the table won't bother MySQL.  So is it sufficient to keep the
data around for a month and then prune it out of the database?  Long
term records could be dumped out into text logfiles for safe keeping.

Another possiblity is to invert your storage logic.  By that, I mean
insert a row in the cross-reference table for every user that has
NOT read the email.  Then when they read it, remove that row from
the table.  This will automatically prune down the table size as
more and more of your readers check their email.

I would HIGHLY suggest using the DELAYED option on that size of an
insert.

You may not want to go this logic inversion route because of the havoc
it will play with other queries.  For example, since any user that
does NOT have a matching row in the cross-reference table is assumed
to have read the message, any new user joining you will automatically
be assumed to have read all messages to date.

Rene

Daren Cotter wrote:
 I have a table which needs to store which emails each member has read and
 had their account credited for. Currently, I have a field read_array
 longtext in my main members table, and each time a member reads an email,
 it simply concats to the end of this array. For example, they read email
 288, their read array is:

 288

 They read 298, the read_array is:

 288||298

 Using this method (which is far from normalization, I know), I have been
 able to accomplish the two major things I need to be able to accomplish:

 1) Insure that no member can get credit for reading the same mailing twice
 (check their current read_array, using ereg())
 2) Run a query to see how many members have read each mailing (WHERE
 read_array LIKE '%mail_id%')

 I know that both of these tasks would be easier if I created a new table,
 and stored the member_id, mail_id, and date, and the table would then be
 normalized...however, I send approximately one new mailing per day, and
this
 would mean 100,000 new rows in this table every day. Can somone offer me
 expert advice as to which method is better? Using an array that will
grow
 to unlimited size, or using a table that is going to grow much larger
every
 day?

 Also, as a deterrant to switching to the normalized method...I frequently
 run queries to do the following:

 *) After I send a mailing, a week later, I send the mailing again to all
 members who did not confirm reading the email. My Query is like:
 select member_id, (more) FROM members WHERE read_array not like
'%mail_id%'

 Is there any possible way I could accompish this task in one query if I
had
 this process normalized?

--
Rene Churchillhttp://www.vtwebwizard.com
Internet Consulting   802-244-5151
Specializing in Web Programming

-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail
[EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php


-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php

Re: Normalization question

Re: Normalization question

Re: Normalization question

RE: normalization question

Re: normalization question

RE: Normalization question

RE: Normalization question

RE: normalization question

8 matches

Site Navigation

Mail list logo

Footer information