Re: hosted application data design

SGreen Thu, 02 Sep 2004 07:01:01 -0700

No, you aren't asking too much of MySQL in most of your questions but in 
another question, you are. Let me explain.


PRIMARY KEYs are nothing more than UNIQUE INDEXES on a table. How many 
columns it takes to uniquely identify any row of information depends on 
your data structure. Many people use an auto_incrementing INT (or BIGINT) 
column to artificially generate uniqueness. That means that their PRIMARY 
KEYS only require one column. Others have the requirements that certain 
combinations of columns cannot appear more than once in a table. They 
designate all of those columns as their PRIMARY KEY.  Here's an example 
that combines elements of both (for the sake of this sample, this is NOT a 
normalized data structure but it will be highly indexed)

CREATE TABLE us_buildings (
        ID int not null auto_increment.
        State  varchar(30) not null,
        City varchar(30) not null,
        County varchar(30) not null,
        Road varchar(45) not null,
        AddressNum int not null,
        AddressSuffix varchar(15)
        PRIMARY KEY (AddressNum, Road, City, State, AddressSuffix, County)
        UNIQUE (ID),
        Key(Road, City, State, AddressNum),
        Key(State, County, City) 
        Key(County, State, City)
)

This table is structured to ensure that any single structure in the US can 
be uniquely identified by it's position (AddressNum + AddressSuffix) along 
a road. The multiple-column PRIMARY KEY is used to prevent duplication and 
also serves as an index. The UNIQUE index on the ID column prevents anyone 
from changing the column into a duplicate value. That makes this column a 
good column to reference from another table (maybe a table like "resident" 
or "surveylog") without needing to duplicate all of the textual 
information it takes to identify the structure. The additional indexes are 
defined so that common searches on combinations of certain values will 
execute quickly. In a properly designed data structure, each element of 
that table would have their own table and this "fact table" would be using 
only the ID values of the other tables. That change (normalization) will 
make a HUGE difference in both storage requirements and index performance 
of this example as the database engine will retrieve numeric values 
directly from the index without the need for an additional seek (or two) 
to read the value from the table data and you will have nothing but 
numeric indexes on this table.

In your case, you can still use an auto_incrementing ID value as a 
reference value and your PRIMARY KEY can be composed of a combination of 
(client_ID, client_msg_num) but you will have to mange your per-client 
message numbers manually. It would be rather difficult to build into any 
database a management system that automates the process of creating an 
auto_incrementing value based on the unique combination of values in one 
or several columns.

Just before you insert a new message, you would need to check for the 
MAX(client_msg_num) for the client_id creating the message and assign the 
new message the next larger number. You would use transactions (InnoDB) or 
table locking (all table types)  to prevent a second instance of your 
application from doing the exact same thing at the exact same time (if you 
don't serialize your access to your message numbers, you could 
accidentally issue duplicate numbers). Some DBAs would design a second 
table that stores the most recently used message number for each client_ID 
(I have used both techniques). Which method works best for you will depend 
on your benchmark testing.

I hope you see that MySQL does most of the work for what you want to do 
but not quite all of it.

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine



Justin French <[EMAIL PROTECTED]> wrote on 09/02/2004 12:31:57 
AM:

> Hi all,
> 
> I'm designing a hosted application that many clients will access.  I've 
> got all sites running off one PHP code base quite nicely, and now I'm 
> hoping to reduce the multiple databases down to just one shared DB.
> 
> (Note: at this stage, all accounts are fictional, so there's no legacy 
> data to worry about).
> 
> So instead of each account have it's own database and autonomous 
> tables, I'd like all accounts to share one database.  This will:
> 
> - make updates easy
> - allow each account to be "group aware"
> - allow a parent website easy access to all account data
> 
> I've added client_id columns to all tables, and things are working 
> fine, with the following exception...
> 
> I'm worried about the primary keys (IDs) for each table growing to an 
> unmanageable size pretty quick... 50 accounts * 500 forum posts = 25000 
> IDs pretty quickly.  50 sites * 5000 members = 250,000 members.
> 
> I'm sure this isn't much of a technical/performance issue (with proper 
> indexing), but it's just strikes me as a little strange that the first 
> news post for an account could have an ID of '502846', rather than '1'.
> 
> So, I was hoping I might be missing something in regards to primary 
> keys...
> 
> Can a primary key (or unique key) be established on two columns (eg 
> primary key = client_id + post id)?
> Can an auto-increment column only increment in relation to the client 
> id?
> 
> Or am I expecting *way* too much out of MySQL and relational data 
> design?
> 
> ---
> Justin French
> http://indent.com.au
> 
> 
> -- 
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]
>

Re: hosted application data design

Reply via email to