Hi Michael,
Thank you for being so kind to explain the problem to me.
Michael Stassen wrote:
Yes, I am totally agree with this change. I m still new to mysql/sql statement. I never realised the date field can be used for comparison without using date.
sam wun wrote:
sam wun wrote:
Hi,
Can anyone tell me how to optimize the following sql statement?
$sql_1 =
"SELECT t.prodcode 'Product Code',
t.prodname 'Product Name',
ROUND(avg(t.salesvolume),2) 'PastSales Quantity',
ROUND(avg(tt.salesvolume),2) 'Sales Quantity',
ROUND(avg(t.netsales),2) 'PastSales',
ROUND(avg(tt.netsales),2) 'Sales',
Round(((avg(t.netsales) - avg(tt.netsales))/avg(tt.netsales))*100,2) 'SalesDiff',
Round(((avg(t.salesvolume) - avg(tt.salesvolume))/avg(tt.salesvolume))*100,2) 'QtyDiff'
FROM customer c
INNER JOIN transaction t
ON c.custcode = t.custcode
AND date(t.date) >= '$past_date_period_startdate'
AND date(t.date) <= '$past_date_period_enddate'
AND c.salescode = '$salescode_param'
INNER JOIN transaction tt
ON c.custcode = tt.custcode
AND date(tt.date) >= '$input_date_period_startdate'
AND date(tt.date) <= '$input_date_period_enddate'
AND c.salescode = '$salescode_param'
AND t.prodcode = tt.prodcode
INNER JOIN inventory i
ON i.prodcode = t.prodcode
AND i.prodcode = tt.prodcode
AND i.basename = '$basename_param'
WHERE i.prodname is not NULL
AND i.basename is not NULL
GROUP BY c.salescode, i.basename, t.prodcode
ORDER BY SalesDiff desc;
Thanks Sam
I see several potential problems:
1) The biggest fixable problem is the use of the date() function. You should never use a function of a column value in a WHERE clause, if you can avoid it, because that prevents the use of an index on that column to help choose rows. In this case, the date() function isn't even needed, so even if there is no index on t.date, or the optimizer doesn't choose to use it, you are still forcing 4 unnecessary function calls per row.
In other words, change
AND date(t.date) >= '$past_date_period_startdate' AND date(t.date) <= '$past_date_period_enddate'
to
AND t.date >= '$past_date_period_startdate' AND t.date <= '$past_date_period_enddate'
or the equivalent
AND t.date BETWEEN '$past_date_period_startdate' AND '$past_date_period_enddate'
2) In general, you should'nt put things in the JOIN condition which are not part of the JOIN criteria. For example, the condition "c.salescode = '$salescode_param'" is a restriction on which rows in c to consider, not a condition of the JOIN to t or tt, so it belongs in the WHERE clause.This clears my question about how to use JOIN.
3) You have a complex GROUP BY based on a column from each table, and an ORDER BY based on a calculated value. That rules out using an index to do the grouping or ordering. Hence the dreaded "Using temporary; Using filesort" in your explain output. This appears unavoidable in your case.
The "using temporary" may be caused by the clause "create temporary table" in other sql statement that created before execute the sql statement as written above.
I was trying to break down the entire operation into sub-queires so that it produce sub-recordset faster and then JOIN with these recordset later.
I m not sure whether using temporary tables will also speed thing up. I will drop the idea of using temporary table if that doesn't help much.
I guess I can change the group by with only one column rather than 3 columns, and the result still the same since salescode and basename are input by the user, so this query only display the recordset of the product(code/name) base on value of salescode and the basename.
4) You have redundant conditions on your JOIN to i. Since t.prodcode = tt.prodcode for every row of your results (because you require this in the join of t to tt), the two conditions "i.prodcode = t.prodcode" and "i.prodcode = tt.prodcode" are equivalent. The optimizer is probably smart enough to notice that, but it will have less work to do if you pick one and delete the other.You are right, as I have made a changed to my table definition, basename and prodcname will never be NULL, its defult value is UNKNOWN.
5) Because you require "i.basename = '$basename_param'", i.basename cannot be NULL, so there is no need for the conditrion "i.basename is not NULL".
6) Are you sure you need "i.prodname is not NULL"? If you can be certain that all rows in inventory with non NULL prodcode and basename have non NULL prodnames, you could leave that out. I can't assume you don't need it based only on the rest of your query, so I'll leave it in, for now.
I remembed I made slight change on indexing of the tables after posting the previous message, here is the current indexing situiation of all the tables:Based on the above, I'd suggest you change your query to:
SELECT
t.prodcode 'Product Code',
t.prodname 'Product Name',
ROUND(avg(t.salesvolume),2) 'PastSales Quantity',
ROUND(avg(tt.salesvolume),2) 'Sales Quantity',
ROUND(avg(t.netsales),2) 'PastSales',
ROUND(avg(tt.netsales),2) 'Sales',
ROUND(((avg(t.netsales)-avg(tt.netsales))/avg(tt.netsales))*100,2)
AS 'SalesDiff',
ROUND(((avg(t.salesvolume)-avg(tt.salesvolume))/avg(tt.salesvolume))*100,2)
AS 'QtyDiff' FROM customer c JOIN transaction t ON c.custcode = t.custcode JOIN transaction tt ON c.custcode = tt.custcode AND t.prodcode = tt.prodcode JOIN inventory i ON i.prodcode = t.prodcode WHERE c.salescode = '$salescode_param' AND t.date BETWEEN '$past_date_period_startdate' AND '$past_date_period_enddate' AND tt.date BETWEEN '$input_date_period_startdate' AND '$input_date_period_enddate' AND i.basename = '$basename_param' AND i.prodname is not NULL GROUP BY c.salescode, i.basename, t.prodcode ORDER BY SalesDiff desc;
Hi, here is the result from the explain command on the query:
<snip>
+-------+--------+-------------------+----------+---------+---------------------+------+----------------------------------------------+
| table | type | possible_keys | key | key_len | ref | rows | Extra |
+-------+--------+-------------------+----------+---------+---------------------+------+----------------------------------------------+
| i | ALL | PRIMARY,basename | NULL | NULL | NULL | 2491 | Using where; Using temporary; Using filesort |
| t | ref | custcode,prodcode | prodcode | 32 | datacube.i.prodcode | 30 | Using where |
| c | eq_ref | PRIMARY,custcode | PRIMARY | 32 | datacube.t.custcode | 1 | |
| tt | ref | custcode,prodcode | prodcode | 32 | datacube.i.prodcode | 30 | Using where |
+-------+--------+-------------------+----------+---------+---------------------+------+----------------------------------------------+
4 rows in set (0.00 sec)
sam
I spent a while trying to make sense of this before I realized that this EXPLAIN doesn't really help with your original question, as you've changed the qery. Still, I can make a couple of comments.
1) The JOIN condition for t and c includes "c.custcode = t.custcode", yet EXPLAIN says the optimizer is choosing the PRIMARY index over the custcode index in the customer table. This suggests to me that you have redundant indexes, which causes the optimizer to waste time considering useless index choices (and slows INSERTs). If you need help cleaning up your indexes, send the output of
SHOW INDEX FROM tablename
for each of your tables.
mysql> SHOW INDEX FROM customer;
+----------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+----------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| customer | 0 | PRIMARY | 1 | custcode | A | 458 | NULL | NULL | | BTREE | |
| customer | 1 | salescode | 1 | salescode | A | 28 | NULL | NULL | | BTREE | |
| customer | 1 | custcode | 1 | custcode | A | 458 | NULL | NULL | | BTREE | |
+----------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
3 rows in set (0.00 sec)
mysql> SHOW INDEX FROM transaction;
+-------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| transaction | 0 | PRIMARY | 1 | transcode | A | 161763 | NULL | NULL | | BTREE | |
| transaction | 1 | custcode | 1 | custcode | A | 364 | NULL | NULL | | BTREE | |
| transaction | 1 | prodcode | 1 | prodcode | A | 5392 | NULL | NULL | | BTREE | |
| transaction | 1 | date | 1 | date | A | 1234 | NULL | NULL | | BTREE | |
| transaction | 1 | netsales | 1 | netsales | A | 17973 | NULL | NULL | | BTREE | |
| transaction | 1 | salesvolume | 1 | salesvolume | A | 304 | NULL | NULL | | BTREE | |
+-------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
6 rows in set (0.24 sec)
mysql> SHOW INDEX FROM inventory;
+-----------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-----------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| inventory | 0 | PRIMARY | 1 | prodcode | A | 2044 | NULL | NULL | | BTREE | |
| inventory | 1 | basename | 1 | basename | A | 28 | NULL | NULL | YES | BTREE | |
| inventory | 1 | vendorname | 1 | vendorname | A | 24 | NULL | NULL | YES | BTREE | |
+-----------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
3 rows in set (0.01 sec)
2) Since the goal is to choose transactions for a particular customer during a specific date range, your query might benefit from a multi-column index in table transaction on (custcode, date). If you do add that multi-column index, it will also act as a single column index on custcode, so you could then drop any existing single column index on custcode (in transaction). It is also possible a multi-column index on (prodcode, date) could help, depending on your actual data.I've been trying to do that, but I never have idea how to add multi-column indexing across multi-tables. It certainly will help alot if multi-column index can be implemented.
Thanks again for your help Sam.
Michael
-- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]