Re: INNER JOIN across multiple tables appear very slow.

sam wun Sun, 23 Jan 2005 18:06:04 -0800

Hi Michael,

Thank you for being so kind to explain the problem to me.

Michael Stassen wrote:

sam wun wrote:
sam wun wrote:
Hi,
Can anyone tell me how to optimize the following sql statement?
$sql_1 = "SELECT t.prodcode 'Product Code', t.prodname 'Product Name', ROUND(avg(t.salesvolume),2) 'PastSales Quantity', ROUND(avg(tt.salesvolume),2) 'Sales Quantity', ROUND(avg(t.netsales),2) 'PastSales', ROUND(avg(tt.netsales),2) 'Sales', Round(((avg(t.netsales) - avg(tt.netsales))/avg(tt.netsales))*100,2) 'SalesDiff', Round(((avg(t.salesvolume) - avg(tt.salesvolume))/avg(tt.salesvolume))*100,2) 'QtyDiff' FROM customer c INNER JOIN transaction t ON c.custcode = t.custcode AND date(t.date) >= '$past_date_period_startdate' AND date(t.date) <= '$past_date_period_enddate' AND c.salescode = '$salescode_param' INNER JOIN transaction tt ON c.custcode = tt.custcode AND date(tt.date) >= '$input_date_period_startdate' AND date(tt.date) <= '$input_date_period_enddate' AND c.salescode = '$salescode_param' AND t.prodcode = tt.prodcode INNER JOIN inventory i ON i.prodcode = t.prodcode AND i.prodcode = tt.prodcode AND i.basename = '$basename_param' WHERE i.prodname is not NULL AND i.basename is not NULL GROUP BY c.salescode, i.basename, t.prodcode ORDER BY SalesDiff desc;
Thanks
Sam
I see several potential problems:
1) The biggest fixable problem is the use of the date() function. You should never use a function of a column value in a WHERE clause, if you can avoid it, because that prevents the use of an index on that column to help choose rows. In this case, the date() function isn't even needed, so even if there is no index on t.date, or the optimizer doesn't choose to use it, you are still forcing 4 unnecessary function calls per row.
In other words, change
  AND date(t.date) >= '$past_date_period_startdate'
  AND date(t.date) <= '$past_date_period_enddate'
to
  AND t.date >= '$past_date_period_startdate'
  AND t.date <= '$past_date_period_enddate'
or the equivalent
  AND t.date BETWEEN '$past_date_period_startdate'
                 AND '$past_date_period_enddate'

Yes, I am totally agree with this change. I m still new to mysql/sql statement. I never realised the date field can be used for comparison without using date.

2) In general, you should'nt put things in the JOIN condition which are not part of the JOIN criteria. For example, the condition "c.salescode = '$salescode_param'" is a restriction on which rows in c to consider, not a condition of the JOIN to t or tt, so it belongs in the WHERE clause.

3) You have a complex GROUP BY based on a column from each table, and an ORDER BY based on a calculated value. That rules out using an index to do the grouping or ordering. Hence the dreaded "Using temporary; Using filesort" in your explain output. This appears unavoidable in your case.

This clears my question about how to use JOIN. The "using temporary" may be caused by the clause "create temporary table" in other sql statement that created before execute the sql statement as written above. I was trying to break down the entire operation into sub-queires so that it produce sub-recordset faster and then JOIN with these recordset later. I m not sure whether using temporary tables will also speed thing up. I will drop the idea of using temporary table if that doesn't help much. I guess I can change the group by with only one column rather than 3 columns, and the result still the same since salescode and basename are input by the user, so this query only display the recordset of the product(code/name) base on value of salescode and the basename.

4) You have redundant conditions on your JOIN to i. Since t.prodcode = tt.prodcode for every row of your results (because you require this in the join of t to tt), the two conditions "i.prodcode = t.prodcode" and "i.prodcode = tt.prodcode" are equivalent. The optimizer is probably smart enough to notice that, but it will have less work to do if you pick one and delete the other.

5) Because you require "i.basename = '$basename_param'", i.basename cannot be NULL, so there is no need for the conditrion "i.basename is not NULL".

6) Are you sure you need "i.prodname is not NULL"? If you can be certain that all rows in inventory with non NULL prodcode and basename have non NULL prodnames, you could leave that out. I can't assume you don't need it based only on the rest of your query, so I'll leave it in, for now.

You are right, as I have made a changed to my table definition, basename and prodcname will never be NULL, its defult value is UNKNOWN.

Based on the above, I'd suggest you change your query to:
SELECT t.prodcode 'Product Code', t.prodname 'Product Name', ROUND(avg(t.salesvolume),2) 'PastSales Quantity', ROUND(avg(tt.salesvolume),2) 'Sales Quantity', ROUND(avg(t.netsales),2) 'PastSales', ROUND(avg(tt.netsales),2) 'Sales', ROUND(((avg(t.netsales)-avg(tt.netsales))/avg(tt.netsales))*100,2) AS 'SalesDiff', ROUND(((avg(t.salesvolume)-avg(tt.salesvolume))/avg(tt.salesvolume))*100,2)
   AS 'QtyDiff'
 FROM customer c
 JOIN transaction t
    ON c.custcode = t.custcode
 JOIN transaction tt
    ON c.custcode = tt.custcode
    AND t.prodcode = tt.prodcode
 JOIN inventory i
    ON i.prodcode = t.prodcode
 WHERE c.salescode = '$salescode_param'
   AND t.date BETWEEN '$past_date_period_startdate'
                  AND '$past_date_period_enddate'
   AND tt.date BETWEEN '$input_date_period_startdate'
                   AND '$input_date_period_enddate'
   AND i.basename = '$basename_param'
   AND i.prodname is not NULL
 GROUP BY c.salescode, i.basename, t.prodcode
 ORDER BY SalesDiff desc;
Hi, here is the result from the explain command on the query:
<snip> +-------+--------+-------------------+----------+---------+---------------------+------+----------------------------------------------+

| table | type | possible_keys | key | key_len | ref | rows | Extra | +-------+--------+-------------------+----------+---------+---------------------+------+----------------------------------------------+

| i | ALL | PRIMARY,basename | NULL | NULL | NULL | 2491 | Using where; Using temporary; Using filesort | | t | ref | custcode,prodcode | prodcode | 32 | datacube.i.prodcode | 30 | Using where | | c | eq_ref | PRIMARY,custcode | PRIMARY | 32 | datacube.t.custcode | 1 | | | tt | ref | custcode,prodcode | prodcode | 32 | datacube.i.prodcode | 30 | Using where | +-------+--------+-------------------+----------+---------+---------------------+------+----------------------------------------------+
4 rows in set (0.00 sec)
sam
I spent a while trying to make sense of this before I realized that this EXPLAIN doesn't really help with your original question, as you've changed the qery. Still, I can make a couple of comments.

1) The JOIN condition for t and c includes "c.custcode = t.custcode", yet EXPLAIN says the optimizer is choosing the PRIMARY index over the custcode index in the customer table. This suggests to me that you have redundant indexes, which causes the optimizer to waste time considering useless index choices (and slows INSERTs). If you need help cleaning up your indexes, send the output of
  SHOW INDEX FROM tablename
for each of your tables.

2) Since the goal is to choose transactions for a particular customer during a specific date range, your query might benefit from a multi-column index in table transaction on (custcode, date). If you do add that multi-column index, it will also act as a single column index on custcode, so you could then drop any existing single column index on custcode (in transaction). It is also possible a multi-column index on (prodcode, date) could help, depending on your actual data.

I've been trying to do that, but I never have idea how to add multi-column indexing across multi-tables. It certainly will help alot if multi-column index can be implemented.


Thanks again for your help
Sam.

Michael

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Re: INNER JOIN across multiple tables appear very slow.

Reply via email to