Going forward, you may think about redesigning your tables if you can. Your structure is very "flat", databases are designed for quick searching "vertically", meaning finding specific rows. You really want to keep your columns to a minimum if you can so you can take full advantage of indexes.
If your data really doesn't change all that often (or not at all), then I think you should create a "Preferred Products" table. This is sort of what you do in datawarehousing, design for accessibility instead of maintainability.
On May 16, 2005, at 6:04 PM, Rod Heyd wrote:
I have a question about joins. My situation is as follows:
I have 5 tables identical in structure. Each table represents essentially
the same data, however, the data in each table represents a different
"version." The processing involved in generating the values stored in each
table may have changed slightly, and therefore the values calculated may
vary as well.
On top of this structure is a need to identify a particular version of a record as the "preferred" version.
The version tables look something like this:
Tables Version_(1-5)
----------------------------------------------------------------------- -------------------------------
ID |PRODUCT_ID |DATA1 |DATA2 |DATA3 |.........|DATA_N |
----------------------------------------------------------------------- -------------------------------
ID is a unique value across all 5 version tables
PRODUCT_ID is unique to a given version table and identifies the source
"product", but is not unique in all 5 tables, ie, given a
PRODUCT_ID=27 all 5 version tables will have a single record
with PRODUCT_ID=27.
DATA1...DATA_N These are columns with calculated values.
Now in order to identify which version of a given PRODUCT_ID is "preferred,"
we have created a map table that looks similar to this:
Table Preferred_Map
------------------------------------------------- MAP_ID |ID |PRODUCT_ID | -------------------------------------------------
MAP_ID is an auto_increment primary key
ID is the unique ID from the Version tables (unique value)
PRODUCT_ID is the PRODUCT_ID from the tables (unique value)
So to find information about records that are preferred, we have queries that look like this:
SELECT Preferred_Map.PRODUCT_ID, DATA1, DATA2, DATA3
FROM Preferred_Map INNER JOIN Version_1 USING(ID,PRODUCT_ID)
WHERE DATA1='x' AND DATA2='y' AND DATA3='z';
We run this same query for each of the 5 version tables and union them
together. However the performance is terrible compared with what we
get by running the query on a single version table without the join. These
queries run anywhere from 6 to 10 times slower with the join than without it,
which is pretty much unacceptable for our application.
Running explain on the above query yields this:
*************************** 1. row *************************** id: 1 select_type: SIMPLE table: Preferred_Map type: ALL possible_keys: ID,PRODUCT_ID key: NULL key_len: NULL ref: NULL rows: 45191 Extra: *************************** 2. row *************************** id: 1 select_type: SIMPLE table: Version_1 type: eq_ref possible_keys: PRIMARY,PRODUCT_ID key: PRIMARY key_len: 4 ref: HiRISE_Test.Preferred_Map.ID rows: 1 Extra: Using where
The bottle neck appears to be the full table scan on the Preferred_Map table. Is there a way to optimize this somehow, to prevent the full table scan? Or are we better off creating a "Preferred Products" table with copies of all the records that are preferred? At the moment, it appears that the latter option is better than trying to do this with a bunch of unions and joins.
Thanks for your help!
-Rod
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]
-- Brent Baisley Systems Architect Landover Associates, Inc. Search & Advisory Services for Advanced Technology Environments p: 212.759.6400/800.759.0577
-- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]