Re: Improving the performance of joins

Brent Baisley Tue, 17 May 2005 06:14:21 -0700

You should have a compound index in all your tables on ID+PRODUCT_ID, since that is what you are joining on. Ideally, you should also have an index on the DATA column that you will be using in the WHERE clause. But your structure does really lend itself to this.

Going forward, you may think about redesigning your tables if you can. Your structure is very "flat", databases are designed for quick searching "vertically", meaning finding specific rows. You really want to keep your columns to a minimum if you can so you can take full advantage of indexes.

If your data really doesn't change all that often (or not at all), then I think you should create a "Preferred Products" table. This is sort of what you do in datawarehousing, design for accessibility instead of maintainability.


On May 16, 2005, at 6:04 PM, Rod Heyd wrote:

I have a question about joins.  My situation is as follows:
I have 5 tables identical in structure. Each table represents essentially the same data, however, the data in each table represents a different "version." The processing involved in generating the values stored in each table may have changed slightly, and therefore the values calculated may vary as well.
On top of this structure is a need to identify a particular version of
a record as the "preferred" version.
The version tables look something like this:
Tables Version_(1-5)
----------------------------------------------------------------------- ------------------------------- ID |PRODUCT_ID |DATA1 |DATA2 |DATA3 |.........|DATA_N | ----------------------------------------------------------------------- -------------------------------
ID                    is a unique value across all 5 version tables
PRODUCT_ID is unique to a given version table and identifies the source "product", but is not unique in all 5 tables, ie, given a PRODUCT_ID=27 all 5 version tables will have a single record with PRODUCT_ID=27.
DATA1...DATA_N  These are columns with calculated values.
Now in order to identify which version of a given PRODUCT_ID is "preferred," we have created a map table that looks similar to this:
Table Preferred_Map
-------------------------------------------------
MAP_ID  |ID     |PRODUCT_ID     |
-------------------------------------------------
MAP_ID is an auto_increment primary key
ID is the unique ID from the Version tables (unique value)
PRODUCT_ID is the PRODUCT_ID from the tables (unique value)
So to find information about records that are preferred, we have
queries that look like this:
SELECT
        Preferred_Map.PRODUCT_ID, DATA1, DATA2, DATA3
FROM
        Preferred_Map INNER JOIN Version_1 USING(ID,PRODUCT_ID)
WHERE
        DATA1='x' AND DATA2='y' AND DATA3='z';
We run this same query for each of the 5 version tables and union them together. However the performance is terrible compared with what we get by running the query on a single version table without the join. These queries run anywhere from 6 to 10 times slower with the join than without it, which is pretty much unacceptable for our application.
Running explain on the above query yields this:
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: Preferred_Map
type: ALL
possible_keys: ID,PRODUCT_ID
key: NULL
key_len: NULL
ref: NULL
rows: 45191
Extra:
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: Version_1
type: eq_ref
possible_keys: PRIMARY,PRODUCT_ID
key: PRIMARY
key_len: 4
ref: HiRISE_Test.Preferred_Map.ID
rows: 1
Extra: Using where
The bottle neck appears to be the full table scan on the
Preferred_Map table.  Is there a way to optimize this somehow, to
prevent the full table scan?  Or are we better off creating a
"Preferred Products" table with copies of all the records that are
preferred?  At the moment, it appears that the latter option is
better than trying to do this with a bunch of unions and joins.
Thanks for your help!
-Rod
-- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]

--
Brent Baisley
Systems Architect
Landover Associates, Inc.
Search & Advisory Services for Advanced Technology Environments
p: 212.759.6400/800.759.0577


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Re: Improving the performance of joins

Reply via email to