Re: many fields or many tables? (Understanding DB design)

2004-10-19 Thread SGreen
You have already gone a long way to describing your table structure by 
describing your data elements and their relationships to each other.  Let 
me try to summarize you descriptions and see if I can show you how to 
translate your text descriptions into table descriptions.

1. There are things called projects. 
2. Some projects have sub-projects. (I will assume that there is at most 
1 parent project per sub-project)
3. Some projects contain groups.
4. All projects contain people.
5. Some people assigned to projects also belong to one or more groups.


Let's do the easy one now:

CREATE TABLE project (
id int auto_increment primary key,
Name varchar(20) not null,
...other project related fields and indexes...
)

There are three ways to define tables for sub-projects. One is 
self-referential. This supports only one parent per sub-project but can 
extend into many layers of sub-sub projects.

CREATE TABLE project (
id int auto_increment primary key,
parentproject_id int not null default(0),
Name varchar(20) not null,
...other project related fields and indexes...
)

One is explicitly parent-child. The disadvantage to this method is if you 
need to reference a project/sub-project you need to check or decide 
between two different values that reside on two different table. 

CREATE TABLE project (
id int auto_increment primary key,
Name varchar(20) not null,
...other project related fields and indexes...
)

CREATE TABLE subproject (
id int auto_increment primary key,
Name varchar(20) not null,
...other project related fields and indexes...
)

The third is semi-self-referential in that all of the projects and 
subprojects are listed in the same table but their relationship (parent to 
child) is maintained in a third table. This method supports sub-projects 
that can be children of multiple projects. The danger here is you can 
possibly create a circular reference (A is a parent of B. B is a parent of 
C. C is a parent of A)

CREATE TABLE project (
id int auto_increment primary key,
Name varchar(20) not null,
...other project related fields and indexes...
)
CREATE TABLE subprojects (
project_id int not null,
subproject_id int not null,
... declare keys and indexes here...
)

In the table subprojects, both columns get the ID values from two 
different project records.

We need to look at #4 before we get to #3. Because a project must have 
people, that implies that there is a person thing in your system. Create 
a table to store information about a person

CREATE TABLE person (
id int auto_increment primary key,
First Name varchar(20) not null,
Last Name varchar(20) not null,
... more person-related fields and indexes ...
)

Now, #4 also states that each project can have 0 or more people assigned 
to it. What it didn't say, but should have, was whether each person can be 
assigned to more than one project. I know that most people do work on more 
than one project at a time or will be assigned to a new project after the 
old one is over. You have a many-to-many relationship between your people 
and your projects

create table people_projects (
id int auto_increment primary key,
person_id int not null,
project_id int not null,
UNIQUE(person_id, project_id)
...other indexes as needed...
)

By declaring that the COMBINATION of the values person_id and project_id 
must be UNIQUE, you guarantee that nobody is assigned to the same project 
more than once.

Looking at #3...I get the impression from the description that each group 
is specific to a single project and that each project can have zero or 
more groups (a project could just have people that aren't in any groups or 
no groups at all).

CREATE TABLE projectgroup (
id int auto_increment primary key,
project_id int not null,
name varchar(20) not null,
...other fields and keys as necessary...
)

Number 5 is an interesting relationship. It is declaring a relationonship 
on a relationship. The people assigned to a project  objects are on the 
people_projects table, not the person table. We need to equate those 
people to one or more groups. Again we are in a many-to-many situation and 
model it this way:

CREATE TABLE peopleproject_projectgroups (
id int auto_increment primary key,
peopleproject_id int not null,
projectgroup_id int not null,
UNIQUE (peopleproject_id, group_id)
... other indexes...
)

You may have noticed that while we have related a person assigned to a 
project to a project group however there is nothing in our data 
definitions that will prevent you from assigning a person assigned to one 
project to a group assigned to a different project. That bit if business 
rule enforcement must come from your application. The database can do a 
lot but it won't do everything.

Does this help you get started?

Shawn Green
Database 

Re: many fields or many tables? (Understanding DB design)

2004-10-19 Thread Timothy Luoma
On Oct 19, 2004, at 10:17 AM, [EMAIL PROTECTED] wrote:
You have already gone a long way to describing your table structure by 
describing your data elements and their relationships to each other. 
 Let me try to summarize you descriptions and see if I can show you 
how to translate your text descriptions into table descriptions.
Ok, I'm going to just go through and make sure I'm following your 
translation (I've never been good at foreign languages, and SQL is 
apparently no different ;-)

1. There are things called projects.
Yes... (FPP and WW) although of course all of the projects fall under 
one meta-project (TiM), but I guess that's taken care of by the fact 
that this DB will contain only information about that one meta-project.

2. Some projects have sub-projects. (I will assume that there is at 
most 1 parent project per sub-project)
Yes. (I am thinking here of FPP1 and FPP2 and FPP3).  So far WW has 
only one sub-project (WW1) but we expect there will be more 
eventually.

3. Some projects contain groups.
I would have said (to use your wording) some sub-projects contain 
groups... otherwise I think I'm confused what the difference is 
between a sub-project and a group.

I'm thinking of it this way:
fpp -- fpp1 -- fpp11
-- fpp12
-- fpp13
-- fpp14
-- fpp2 -- fpp21
-- fpp22
-- fpp23
are you saying that I ought to be thinking of it this way
fpp -- fpp1
-- fpp11
-- fpp12
-- fpp13
-- fpp14
-- fpp2
-- fpp21
-- fpp22
-- fpp23

4. All projects contain people.
Yes.
5. Some people assigned to projects also belong to one or more groups.
Here's where it starts to get fuzzy.
You have to be in one of FPP1 or FPP2 or FPP3 or WW1 (we'll call it WW1 
even though there isn't a WW2 yet).

You can be in FPP1 (or FPP2 or FPP3) and WW1.
If you are in FPP11 you can't be in FPP12 or FPP13 or FPP14 or FPP2 or 
FPP21 or FPP22 or FPP23.  Those final groups are exclusive and required 
(if you are in FPP1 then you must be in one and only one of FPP11 or 
FPP12 or FPP13 or FPP14).

I've been thinking about it like students at a university (DB) which 
has several colleges (projects), and each college has graduates of a 
particular class (sub-project).

The university has no current students or alumni who aren't from a 
particular college and a particular year.

Some classes (sub-projects) are broken down into further groups...
We need to look at #4 before we get to #3. Because a project must have 
people, that implies that there is a person thing in your system. 
Create a table to store information about a person

CREATE TABLE person (
        id int auto_increment primary key,
        First Name varchar(20) not null,
        Last Name varchar(20) not null,
... more person-related fields and indexes ...
)
Now should that table have all the information about persons (name, 
address, email.) ?

That was what I was originally thinking needed to be spread out into 
separate tables.


Now, #4 also states that each project can have 0 or more people 
assigned to it. What it didn't say, but should have, was whether each 
person can be assigned to more than one project. I know that most 
people do work on more than one project at a time or will be assigned 
to a new project after the old one is over. You have a many-to-many 
relationship between your people and your projects
It is possible that someone could be in only one project (ever) or that 
they might be involved in more than one.  We want to leave the door 
open for the 2nd option, although it will probably be more rare that 
someone is in more than 1 (however, we already have some and will no 
doubt have others).


By declaring that the COMBINATION of the values person_id and 
project_id must be UNIQUE, you guarantee that nobody is assigned to 
the same project more than once.
Ah, so I would assume I could do the same for sub-sub-projects (FPP11, 
FPP12, etc)?


Looking at #3...I get the impression from the description that each 
group is specific to a single project and that each project can have 
zero or more groups (a project could just have people that aren't in 
any groups or no groups at all).
The projects (FPP and WW) will always have sub-projects.  People have 
to belong to at least one project (FPP) and only one sub-project of 
that project (FPP1 or FPP2 or FPP3).  Further division beyond that is 
possible *and* if possible it is mandatory that everyone is in one and 
only one.


You may have noticed that while we have related a person assigned to 
a project to a project group however there is nothing in our data 
definitions that will prevent you from assigning a person assigned to 
one project to a group assigned to a different project. That bit if 
business rule enforcement must come from your application. The 
database can 

Re: many fields or many tables? (Understanding DB design)

2004-10-19 Thread SGreen
Timothy Luoma [EMAIL PROTECTED] wrote on 10/19/2004 11:11:12 AM:

 
 On Oct 19, 2004, at 10:17 AM, [EMAIL PROTECTED] wrote:
 
  You have already gone a long way to describing your table structure by 

  describing your data elements and their relationships to each other. 
   Let me try to summarize you descriptions and see if I can show you 
  how to translate your text descriptions into table descriptions.
 
 Ok, I'm going to just go through and make sure I'm following your 
 translation (I've never been good at foreign languages, and SQL is 
 apparently no different ;-)
 
  1. There are things called projects.
 
 Yes... (FPP and WW) although of course all of the projects fall under 
 one meta-project (TiM), but I guess that's taken care of by the fact 
 that this DB will contain only information about that one meta-project.
 
  2. Some projects have sub-projects. (I will assume that there is at 
  most 1 parent project per sub-project)
 
 Yes. (I am thinking here of FPP1 and FPP2 and FPP3).  So far WW has 
 only one sub-project (WW1) but we expect there will be more 
 eventually.


But in reality, the only difference in a project and a sub-project is that 
there is a parent to a sub-project. That's why I didn't differentiate 
between them later on.

 
  3. Some projects contain groups.
 
 I would have said (to use your wording) some sub-projects contain 
 groups... otherwise I think I'm confused what the difference is 
 between a sub-project and a group.
 
 I'm thinking of it this way:
 
 fpp -- fpp1-- fpp11
 -- fpp12
 -- fpp13
 -- fpp14
-- fpp2   -- fpp21
 -- fpp22
 -- fpp23
 
 are you saying that I ought to be thinking of it this way
 
 fpp -- fpp1
-- fpp11
-- fpp12
-- fpp13
-- fpp14
-- fpp2
-- fpp21
-- fpp22
-- fpp23
 

No, you had it right the first time. Remember, the only difference between 
a project and a sub-project is whether or not it has a parent project. It 
has nothing to do with how it's stored in the database.

 
  4. All projects contain people.
 
 Yes.
 
  5. Some people assigned to projects also belong to one or more groups.
 
 Here's where it starts to get fuzzy.
 
 You have to be in one of FPP1 or FPP2 or FPP3 or WW1 (we'll call it WW1 
 even though there isn't a WW2 yet).
 
 You can be in FPP1 (or FPP2 or FPP3) and WW1.
 
 If you are in FPP11 you can't be in FPP12 or FPP13 or FPP14 or FPP2 or 
 FPP21 or FPP22 or FPP23.  Those final groups are exclusive and required 
 (if you are in FPP1 then you must be in one and only one of FPP11 or 
 FPP12 or FPP13 or FPP14).

This type of mutual exclusion is currently beyond the realm of MySQL DDL. 
You will have to enforce that business rule in your application code, not 
with the database design.

 
 I've been thinking about it like students at a university (DB) which 
 has several colleges (projects), and each college has graduates of a 
 particular class (sub-project).

I agree but I see it more as a school(university) containing other schools 
(colleges) with each school having zero or more classes (freshman, 
sophomore, etc). 

In this design the top-most school is the university, it has no parent 
schools. The next tier schools are colleges, each of them is the child to 
a university. 

A university-type school has no child classes. Each college-type school 
as 4 classes: freshman, sophomore, junior, senior.

You have one master project, FPP. There are child projects: FPP1 and 
FPP2. Both FPP1 and FPP2 are children to FPP.  FPP1 is the parent to the 
following groups: FPP11,FPP12,FPP13,FPP14. FPP2 is the parent to the 
following groups: FPP21,FPP22,FPP23

However, a project is a project and a group is still a group. The 
only difference between those and sub-projects and sub-groups are whether 
or not they get parents.

 
 The university has no current students or alumni who aren't from a 
 particular college and a particular year.
 
 Some classes (sub-projects) are broken down into further groups...
 
  We need to look at #4 before we get to #3. Because a project must have 

  people, that implies that there is a person thing in your system. 
  Create a table to store information about a person
 
  CREATE TABLE person (
  id int auto_increment primary key,
  First Name varchar(20) not null,
  Last Name varchar(20) not null,
  ... more person-related fields and indexes ...
  )
 
 Now should that table have all the information about persons (name, 
 address, email.) ?


This is the tip of the person iceberg. You can create as complex a 
structure as you need in order to completely document a person in your 
system. Analyze your need for person information by describing them just 
as you described your project planning needs. However, all of the 
information can be summarized by that one ID from this one table, 
understand?

 That was what I was originally thinking needed to be spread out into 
 separate tables.


Re: many fields or many tables? (Understanding DB design)

2004-10-18 Thread SGreen
There are numerous advantages to going with the multiple table database 
you described. That is what we call a normalized data structure. Try 
searching again for terms like normalized, normalizing, and normal 
form for additional background. Add the terms tutorial, or overview 
to find web sites that have more explanations than theory. Keep following 
links until you either hit a brick wall or a breakthrough. 

The whole practice of normalization was pioneered in the seventies by 
Boyce Codd. Even if he didn't invent everything about it (well, he almost 
did!), he is considered one of the primary fathers of relational 
databases. 

Come back to the list if you have any specific questions. Normalization is 
a tool/technique that all new database designers should incorporate into 
their designs as early as possible. Failure to do so can lead to some 
horrible design choices that make your database both hard to maintain and 
impossible to expand. However, as important as it is, many people have a 
hard time understanding the concepts involved. Please come back to the 
list if you run into problems.

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine



Timothy Luoma [EMAIL PROTECTED] wrote on 10/18/2004 02:02:31 PM:

 
 I have been tinkering with MySQL long enough to suit what modest needs 
 I have had, but now I need to setup a new DB that is going to have more 
 information in it, and I want to make sure that I am doing it the most 
 efficient way for the long term.
 
 Surprisingly, I have not been able to find a good resource for this, 
 which means that I am either asking Google the wrong question, or this 
 is something that everyone else seems to understand.  If the former, 
 please accept my apology and feel free to point me to any FM that I 
 might RT.
 
 The DB that I need to design is fairly simple, and will contain things 
 like: Last Name, First Name, Middle Name, Preferred Name, Street 
 Address, City, State, Zip, Home Phone, Work Phone, Cell Phone, Birthday 
 (month and date, only a few gave years), Denomination, Church Name, 
 email address (most have only 1, a few have 2), Program Name, 
 Graduation Year, Group Name, and AIM Screen Name.
 
 For the vast majority of the people in the DB, I have all the above 
 information above.  The plan is to be able to pull out either all the 
 information into one really big printout, or be able to ask for things 
 like just names  email, or just names  phone numbers or just names  
 people from a specific program  a specific year, etc.
 
 My first inclination was just to make one table with all of the 
 information as fields to that one table (assigning everyone a 
 UniqueID).
 
 Then I wondered if I should break it out into several tables, i.e. a 
 table for name with fields First, Middle, Preferred, Last and a table 
 for address with fields street, city, state, zip
 
 The advantage to the first (one big table) seemed to be easier to 
 create, but I wondered if it would be difficult to add things later 
 (i.e. if I need an entire new column or something like work address 
 [street, city, state, zip])?
 
 The advantage to the second (many tables) seemed to be that it would be 
 easier to add/remove things... but it would take effort to make sure 
 that everything tied back to the proper ID.
 
 Is there a general consensus over which is better?  Are there other 
 +/- that I am not seeing in doing in one way over the other?  Is there 
 a disadvantage to having a field like email2 or birthyear which 
 would be blank for the majority of the participants?
 
 Hopefully I'm not asking a question that's too broad to answer or comes 
 down to just personal preference.  I'm hoping there's a good definitive 
 reason to do one or ther other.
 
 Thanks for your time or any FAQ pointers you might share.
 
 TjL
 
 
 -- 
 MySQL General Mailing List
 For list archives: http://lists.mysql.com/mysql
 To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
 


Re: many fields or many tables? (Understanding DB design)

2004-10-18 Thread Brian
Timothy, 

Definately follow the advice that Shawn gave you.  Doing it this way
will make it easy to have any number of emails per person without have
to know how many beforehand
.  Here's an example below:

Table USERS:
userid=15
fname='Timothy'
lname='Luoma'

Table EMAILS:
userid=15
email='[EMAIL PROTECTED]'

userid=15
email='[EMAIL PROTECTED]'

Now to get your name you'd do:
select fname, lname from USERS where userid=15;
This would return:
'Timothy' 'Luoma'

Or to get the emails:
select email from emails where userid=15;
This would return:
'[EMAIL PROTECTED]'
'[EMAIL PROTECTED]'

Or to combine it together:
select a.fname, a.lname, b.email from users a, emails b where a.userid=b.userid;
This would return:
'Timothy' 'Luoma' '[EMAIL PROTECTED]'
'Timothy' 'Luoma' '[EMAIL PROTECTED]'
 

Basically what I'm trying to say with my example is you have one
table(USERS) that holds a key(userid) which you can use to access all
the other tables.



On Mon, 18 Oct 2004 14:16:12 -0400, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
 There are numerous advantages to going with the multiple table database
 you described. That is what we call a normalized data structure. Try
 searching again for terms like normalized, normalizing, and normal
 form for additional background. Add the terms tutorial, or overview
 to find web sites that have more explanations than theory. Keep following
 links until you either hit a brick wall or a breakthrough.
 
 The whole practice of normalization was pioneered in the seventies by
 Boyce Codd. Even if he didn't invent everything about it (well, he almost
 did!), he is considered one of the primary fathers of relational
 databases.
 
 Come back to the list if you have any specific questions. Normalization is
 a tool/technique that all new database designers should incorporate into
 their designs as early as possible. Failure to do so can lead to some
 horrible design choices that make your database both hard to maintain and
 impossible to expand. However, as important as it is, many people have a
 hard time understanding the concepts involved. Please come back to the
 list if you run into problems.
 
 Shawn Green
 Database Administrator
 Unimin Corporation - Spruce Pine
 
 Timothy Luoma [EMAIL PROTECTED] wrote on 10/18/2004 02:02:31 PM:
 
 
 
 
  I have been tinkering with MySQL long enough to suit what modest needs
  I have had, but now I need to setup a new DB that is going to have more
  information in it, and I want to make sure that I am doing it the most
  efficient way for the long term.
 
  Surprisingly, I have not been able to find a good resource for this,
  which means that I am either asking Google the wrong question, or this
  is something that everyone else seems to understand.  If the former,
  please accept my apology and feel free to point me to any FM that I
  might RT.
 
  The DB that I need to design is fairly simple, and will contain things
  like: Last Name, First Name, Middle Name, Preferred Name, Street
  Address, City, State, Zip, Home Phone, Work Phone, Cell Phone, Birthday
  (month and date, only a few gave years), Denomination, Church Name,
  email address (most have only 1, a few have 2), Program Name,
  Graduation Year, Group Name, and AIM Screen Name.
 
  For the vast majority of the people in the DB, I have all the above
  information above.  The plan is to be able to pull out either all the
  information into one really big printout, or be able to ask for things
  like just names  email, or just names  phone numbers or just names 
  people from a specific program  a specific year, etc.
 
  My first inclination was just to make one table with all of the
  information as fields to that one table (assigning everyone a
  UniqueID).
 
  Then I wondered if I should break it out into several tables, i.e. a
  table for name with fields First, Middle, Preferred, Last and a table
  for address with fields street, city, state, zip
 
  The advantage to the first (one big table) seemed to be easier to
  create, but I wondered if it would be difficult to add things later
  (i.e. if I need an entire new column or something like work address
  [street, city, state, zip])?
 
  The advantage to the second (many tables) seemed to be that it would be
  easier to add/remove things... but it would take effort to make sure
  that everything tied back to the proper ID.
 
  Is there a general consensus over which is better?  Are there other
  +/- that I am not seeing in doing in one way over the other?  Is there
  a disadvantage to having a field like email2 or birthyear which
  would be blank for the majority of the participants?
 
  Hopefully I'm not asking a question that's too broad to answer or comes
  down to just personal preference.  I'm hoping there's a good definitive
  reason to do one or ther other.
 
  Thanks for your time or any FAQ pointers you might share.
 
  TjL
 
 
  --
  MySQL General Mailing List
  For list archives: http://lists.mysql.com/mysql
  To unsubscribe:  

Re: many fields or many tables? (Understanding DB design)

2004-10-18 Thread Timothy Luoma
Ok, this makes a lot of sense now.  (As usual, what seems like more 
work initially pays off in the end.)

Here's a specific question.
The parent project is called TiM.  We will, at times, want to pull 
out information for *everyone*.  But more often we will want to pull 
out information just from sub-projects.  One of these is FPP and 
another is WW.

WW right now has only taken place once and did not have any further 
sub-divisions.

FPP will span about 5 years, and be divided into 3 groups (and possibly 
more after the initial pilot project).

The people who are in the first group of FPP participants are referred 
to as FPP1, the second are FPP2 (FPP3 has not started yet).

FPP1 involves 28 people broken down into 4 groups (FPP11, FPP12, FPP13, 
FPP14).

FPP2 involves 19 people broken down into 3 groups (FPP21, FPP22, FPP23).
I want to be able to match all those who are in the DB (i.e. any TiM 
participant), or just those who are in FPP, or just those who are in 
FPP1, or just those who are in FPP11 (or FPP23, etc).

I'm having trouble figuring out how to be that specific and that broad 
at the same time.  Just not quite getting my head wrapped around it.

TjL
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]


Re: many fields or many tables? (Understanding DB design)

2004-10-18 Thread Brian
On Mon, 18 Oct 2004 17:49:22 -0400, Timothy Luoma [EMAIL PROTECTED] wrote:
 
 Ok, this makes a lot of sense now.  (As usual, what seems like more
 work initially pays off in the end.)
 
 Here's a specific question.
 
 The parent project is called TiM.  We will, at times, want to pull
 out information for *everyone*.  But more often we will want to pull
 out information just from sub-projects.  One of these is FPP and
 another is WW.
 
 WW right now has only taken place once and did not have any further
 sub-divisions.
 
 FPP will span about 5 years, and be divided into 3 groups (and possibly
 more after the initial pilot project).
 
 The people who are in the first group of FPP participants are referred
 to as FPP1, the second are FPP2 (FPP3 has not started yet).
 
 FPP1 involves 28 people broken down into 4 groups (FPP11, FPP12, FPP13,
 FPP14).
 
 FPP2 involves 19 people broken down into 3 groups (FPP21, FPP22, FPP23).
 
 I want to be able to match all those who are in the DB (i.e. any TiM
 participant), or just those who are in FPP, or just those who are in
 FPP1, or just those who are in FPP11 (or FPP23, etc).
 
 I'm having trouble figuring out how to be that specific and that broad
 at the same time.  Just not quite getting my head wrapped around it.
 
 TjL
 
 --
 MySQL General Mailing List
 For list archives: http://lists.mysql.com/mysql
 To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
 
 

Here is a way that'll allow you to have a single participant in more
than one group, any number of groups in a sublevel, and any number of 
levels deep:

table USERS:
userid=15

userid=16

table groups_to_users:
userid=15
ugid=10

userid=15
ugid=11

userid=16
ugid=12


table group_defs:
ugid=10
level=0
id='FPP1'

ugid=10
level=1
id='1'

ugid=11
level=0
id='FPP2'

ugid=11
level=1
id='1'

ugid=11
level=2
id='2'

ugid=12
level=0
id='WW'

The above means user 15 is part of:
FPP1-1 and FPP2-1-2
User 16 is part of:
WW

Now of course there is a better solution if you don't need that much
flexibility as this example would require multiple inserts to add
someone to a single group and processing after a select to put it
together.

To get a raw list of groups you'd use:
select ugid, level, id from group_defs group by ugid, level;
On the above table this should give you:
10 0 FPP1
10 1 1
11 0 FPP2
11 1 1
11 2 2
12 0 WW

To get a formatted list of group names use:
select ugid, group_concat(id order by level) from group_defs group by ugid;
This will give you:
10 FPP11
11 FPP212
12 WW

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]