"Jigal van Hemert" <[EMAIL PROTECTED]> wrote on 04/26/2005 10:35:06 AM:

> From: "Harald Fuchs"
> 
> > > id INT(11) - accountID
> > > name VARCHAR(32) - parameter name
> > > value INT(11) - parameter value
> >
> > > Other tables contain string, datetime, etc. parameters.
> >
> > > Since most searches are made for a value (or range) of one or more
> > > parameters, a usable primary key is:
> > > name-value-id
> >
> > That's a horrible denormalization.  If one named parameter can hold
> > only one INT value for one account id, then (id, name) could be a
> > primary key; otherwise, you'd need a surrogate primary key.
> 
> And what if you want to find the IDs which have a certain value in a
> parameter? Then you're searching for name and value and only need the id as
> the glue for joins.
> 
> The situation is that I have to store a variable (and changing) number of
> parameters for a large number of IDs. The traditional construction of a
> column for each parameter is not usable anymore:
> - modifying the table structure for new parameters requires a lot of time
> - all unused parameters will still take space
> - design of index(es) is virtually impossible because searches are made on
> various combinations of parameters
> The parameter table solution is rather simple, but has some performance
> complications once you have 26,000,000 records for 475,000 accounts in a
> 5.3 GB database...
> 
> How would *you* normalize this situation? All suggestions are welcome!
> 
> Regards, Jigal.
> 

I think what tripped us up (and raised the red flags) was that you called 
your index on (name, value, id) a PRIMARY KEY. However, the unique 
combination of values in your parameters table should be just (id, name), 
right? The id-name combination uniquely identifies a parameter for an 
account (each account cannot have the same parameter more than once, can 
it?). So your PK should be on (id, name), and a regular index could be 
placed on (id, name, value) as a covering index.
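
In DDL terms, that would look something like this (the table name is my 
invention and InnoDB is a guess; adjust both to your actual schema):

CREATE TABLE int_params (
    id    INT NOT NULL,          -- accountID
    name  VARCHAR(32) NOT NULL,  -- parameter name
    value INT NOT NULL,          -- parameter value
    PRIMARY KEY (id, name),      -- one value per parameter per account
    INDEX idx_covering (id, name, value)  -- optional; see my caveat below
) ENGINE=InnoDB;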

However, that covering index would be a complete duplicate of your table, 
and based on the numbers you give us it probably won't fit into memory. 
You would end up paging through an index just to read the 'value' column, 
merely to avoid one extra trip to disk to fetch the value after a PK 
match. I don't think the extra effort (and the disk space and working 
memory consumed) would actually help your retrieval speed. If it were me, 
I would not index that table beyond the PK on just (id, name). But I would 
still test it both ways just to be sure.
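
For example, a search of the shape you describe (parameter name plus a 
value range; the names here are made up) could be timed and EXPLAINed both 
with and without the extra index:

EXPLAIN SELECT id
FROM int_params
WHERE name = 'credit_limit'
  AND value BETWEEN 100 AND 200;

If the optimizer never picks the covering index, or the query is not 
measurably faster with it, you can drop the index and reclaim the space.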

I had a situation similar to yours and chose a similar solution. My data 
storage challenge was to make millions of laboratory reports accessible 
for statistical analysis. Each report can contain any number of analyses 
(from 2 to 60) depending on what tests were requested when the sample was 
submitted to the lab. I created a table for reports and a separate table 
for the test-result pairs (fortunately for me, all of my results were 
numeric or could be represented with numbers). The two tables are linked 
through a FK (I am also using InnoDB). Not only is this saving space, but 
the design is flexible because I do not need to change my tables every 
time the lab comes up with a new analysis it can perform (for the exact 
same reasons you described above).
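
In outline (the column names here are invented for the example; my real 
tables carry more detail), the two tables look like:

CREATE TABLE reports (
    report_id   INT NOT NULL AUTO_INCREMENT,
    sample_date DATE,                -- plus the other per-report columns
    PRIMARY KEY (report_id)
) ENGINE=InnoDB;

CREATE TABLE results (
    report_id INT NOT NULL,          -- which report this result belongs to
    test      VARCHAR(32) NOT NULL,  -- name of the analysis performed
    result    DOUBLE NOT NULL,       -- all of my results are numeric
    PRIMARY KEY (report_id, test),
    FOREIGN KEY (report_id) REFERENCES reports (report_id)
) ENGINE=InnoDB;

A new analysis is just a new 'test' value in the results table; no ALTER 
TABLE required.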

Don't get discouraged, OK? ;-)
Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine
