Well, it seems you've helped solve my problem, Shawn...here we go:
first off, sorry about the text wrapping and the sloppy sql...I do get
lazy sometimes...I am a programmer :P
you asked how fast this runs:
SELECT ind_first_name
, ind_last_name
, fam_phone
, max(iat_date) max_date
FROM tms_individual_account_transactions
, tms_families
, tms_individuals
WHERE fam_key = ind_fam_key
and ind_key = iat_ind_key
and iat_date <= '2000-01-01'
GROUP BY ind_key
ORDER BY ind_last_name, ind_first_name;
when I remove the group by and order by...to do that, I also have to
remove the max function, like so:
SELECT ind_first_name
, ind_last_name
, fam_phone
FROM tms_individual_account_transactions
, tms_families
, tms_individuals
WHERE fam_key = ind_fam_key
and ind_key = iat_ind_key
and iat_date <= '2000-01-01';
nice and quick result, about 3 seconds...
now, this query:
SELECT ind_first_name
, ind_last_name
, fam_phone
, max(iat_date) max_date
FROM tms_individual_account_transactions
, tms_families
, tms_individuals
WHERE fam_key = ind_fam_key
and ind_key = iat_ind_key
and iat_date <= '2000-01-01'
GROUP BY 2,1,3;
takes over 120 seconds...some improvement, but not much...
now, in reading your suggestion to create the temp table, I realized
that my previous explain plan that I posted wasn't using the index on
the iat_date field; rather, it was using the index on the iat_ind_key
field; so, I wondered what would happen if I "forced" using the
index_iat_date index...like so:
select ind_first_name, ind_last_name, fam_phone, max(iat_date) max_date
from tms_individual_account_transactions force index (index_iat_date),
tms_families, tms_individuals where
fam_key = ind_fam_key and
ind_key = iat_ind_key and
iat_date <= '2005-01-01'
group by ind_key
order by ind_last_name, ind_first_name
and now my query that was taking 120+ seconds, was now taking around 5
seconds...
and an explain plan to prove it:
+--------+--------+---------------+--------+---------+--------+--------+--------+
| table | type | possible_keys | key | key_len | ref | rows |
Extra |
+--------+--------+---------------+--------+---------+--------+--------+--------+
| tms_individual_account_transactions| range | index_iat_date|
index_iat_date| 3 | [NULL] | 49695 | Using where; Using
temporary; Using filesort|
| tms_individuals| eq_ref | PRIMARY,index_ind_fam_key| PRIMARY| 4
| tms_individual_account_transactions.iat_ind_key| 1 | |
| tms_families| eq_ref | PRIMARY | PRIMARY| 4 |
tms_individuals.ind_fam_key| 1 | |
+--------+--------+---------------+--------+---------+--------+--------+--------+
now, I'm not a DBA or a MySQL guru, so I won't try to comprehend "why"
it works so much better...I'm more of a "tinker until it works kind of a
guy"...maybe this whole scenario of mine might be handy for MySQL
optimizer developers to take note of?
Anyway, I appreciate your comments/suggestions, Shawn...it got me to
have a second look at what I was doing and try some different
things...exactly what I needed!
Thanks!
Kevin
[EMAIL PROTECTED] wrote:
Kevin Weslowski <[EMAIL PROTECTED]> wrote on 05/26/2005 02:20:46 AM:
> Hi,
>
> I've got a query that I believe should be giving me better performance
> than it is, since I've put indexes on all the appropriate columns...I'll
> include as much info as I can for anyone wishing to help me optimize the
> query/tables/indicies...thanks in advance for any assistance...
>
> the query:
>
> select ind_first_name, ind_last_name, fam_phone, max(iat_date) max_date
> from tms_individual_account_transactions, tms_families, tms_individuals
> where
> fam_key = ind_fam_key and
> ind_key = iat_ind_key and
> iat_date <= '2000-01-01'
> group by ind_key
> order by ind_last_name, ind_first_name;
>
> it takes ~150 seconds on a PIII 667 MHz, with 640MB RAM...and returns
> ~700 rows...
>
> MySQL server version: 4.0.18
>
> tms_families: ~1000 records
> tms_individuals: ~1700 records
> tms_individual_account_transactions: ~100000 records
>
> the explain plan:
>
> +--------+--------+---------------+--------+---------+--------
> +--------+--------+
> | table | type | possible_keys | key | key_len | ref | rows |
> Extra |
> +--------+--------+---------------+--------+---------+--------
> +--------+--------+
> | tms_families| ALL | PRIMARY | [NULL] | [NULL] | [NULL] | 993
> | Using temporary; Using filesort|
> | tms_individuals| ref | PRIMARY,index_ind_fam_key|
> index_ind_fam_key| 4 | tms_families.fam_key| 1 | |
> | tms_individual_account_transactions| ref |
> index_iat_ind_key,index_iat_date| index_iat_ind_key| 4 |
> tms_individuals.ind_key| 79 | Using where|
> +--------+--------+---------------+--------+---------+--------
> +--------+--------+
>
>
>
> now, I know what's killing the query is the max() and group by, but I
> still think I should get better performance??
>
> now, the table definitions (sorry for the overkill, but there's
> referential integrity I needed to maintain with extraneous tables):
>
> CREATE TABLE `tms_marriage_statuses` (
> `mst_key` int(11) unsigned NOT NULL default '0',
> `mst_description` varchar(15) NOT NULL default '',
> PRIMARY KEY (`mst_key`),
> KEY `unique_mst_description` (`mst_description`)
> ) TYPE=InnoDB;
>
> CREATE TABLE `tms_families` (
> `fam_key` int(11) unsigned NOT NULL default '0',
> `fam_name` varchar(50) NOT NULL default '',
> `fam_sortname` varchar(25) NOT NULL default '',
> `fam_address` varchar(100) default '',
> `fam_city` varchar(50) default '',
> `fam_prov_state` char(2) default '',
> `fam_postal_zip_code` varchar(10) default '',
> `fam_phone` varchar(14) default '',
> `fam_update_user` varchar(15) default '',
> `fam_update_date` timestamp(14) NOT NULL,
> `fam_cmts_family_no` int(11) unsigned default '0',
> `fam_admin_comments` varchar(255) default NULL,
> PRIMARY KEY (`fam_key`),
> UNIQUE KEY `unique_fam_name` (`fam_name`),
> KEY `index_fam_cmts_family_no` (`fam_cmts_family_no`)
> ) TYPE=InnoDB;
>
> CREATE TABLE `tms_individuals` (
> `ind_key` int(11) unsigned NOT NULL default '0',
> `ind_fam_key` int(11) unsigned NOT NULL default '0',
> `ind_last_name` varchar(30) NOT NULL default '',
> `ind_first_name` varchar(20) NOT NULL default '',
> `ind_middle_name` varchar(20) default NULL,
> `ind_name_title` varchar(6) default NULL,
> `ind_name_suffix` char(3) default NULL,
> `ind_marriage_status` int(11) unsigned NOT NULL default '0',
> `ind_sex` char(1) NOT NULL default '',
> `ind_join_date` date default NULL,
> `ind_birth_date` date default NULL,
> `ind_envelope_number` varchar(10) default NULL,
> `ind_update_user` varchar(15) default NULL,
> `ind_update_date` timestamp(14) NOT NULL,
> `ind_admin_comments` varchar(255) default NULL,
> `ind_cmts_member_no` int(11) unsigned default NULL,
> `ind_deceased` char(1) NOT NULL default 'N',
> `ind_member` char(1) NOT NULL default 'Y',
> PRIMARY KEY (`ind_key`),
> KEY `index_ind_fam_key` (`ind_fam_key`),
> KEY `index_ind_marriage_status` (`ind_marriage_status`),
> KEY `index_ind_cmts_member_no` (`ind_cmts_member_no`),
> CONSTRAINT `tms_individuals_ibfk_1` FOREIGN KEY (`ind_fam_key`)
> REFERENCES `tms_families` (`fam_key`) ON DELETE CASCADE ON UPDATE
CASCADE,
> CONSTRAINT `tms_individuals_ibfk_2` FOREIGN KEY
> (`ind_marriage_status`) REFERENCES `tms_marriage_statuses` (`mst_key`)
> ) TYPE=InnoDB;
>
> CREATE TABLE `tms_accounts` (
> `acc_key` int(11) unsigned NOT NULL default '0',
> `acc_number` int(11) unsigned NOT NULL default '0',
> `acc_description` varchar(50) NOT NULL default '',
> `acc_update_user` varchar(15) default '',
> `acc_update_date` timestamp(14) NOT NULL,
> PRIMARY KEY (`acc_key`),
> UNIQUE KEY `unique_acc_number` (`acc_number`)
> ) TYPE=InnoDB;
>
> CREATE TABLE `tms_individual_account_transactions` (
> `iat_key` int(11) unsigned NOT NULL default '0',
> `iat_ind_key` int(11) unsigned NOT NULL default '0',
> `iat_acc_key` int(11) unsigned NOT NULL default '0',
> `iat_date` date NOT NULL default '0000-00-00',
> `iat_amount` decimal(10,2) NOT NULL default '0.00',
> `iat_update_user` varchar(15) default '',
> `iat_update_date` timestamp(14) NOT NULL,
> `iat_comments` varchar(100) default NULL,
> `iat_post_datetime` datetime default NULL,
> PRIMARY KEY (`iat_key`),
> KEY `index_iat_ind_key` (`iat_ind_key`),
> KEY `index_iat_acc_key` (`iat_acc_key`),
> KEY `index_iat_date` (`iat_date`),
> CONSTRAINT `fk_iat_acc_key` FOREIGN KEY (`iat_acc_key`) REFERENCES
> `tms_accounts` (`acc_key`),
> CONSTRAINT `fk_iat_ind_key` FOREIGN KEY (`iat_ind_key`) REFERENCES
> `tms_individuals` (`ind_key`) ON DELETE CASCADE ON UPDATE CASCADE
> ) TYPE=InnoDB;
>
>
>
>
> I think that's all the pertinent info...if I missed anything else that
> would be helpful in solving this, let me know...
>
> Thanks all!
>
> Kevin
>
THANK YOU! THANK YOU! THANK YOU! for posting your SHOW CREATE TABLE
Statements and the EXPLAIN plan for the query. If we could get everyone
else to do just that, this list would be a much nicer place to lurk. 8-D
Here is your original query vertically reformatted (The "vertical"
format survives message wrapping better):
SELECT ind_first_name
, ind_last_name
, fam_phone
, max(iat_date) max_date
FROM tms_individual_account_transactions
, tms_families
, tms_individuals
WHERE fam_key = ind_fam_key
and ind_key = iat_ind_key
and iat_date <= '2000-01-01'
GROUP BY ind_key
ORDER BY ind_last_name, ind_first_name;
IF you run it without the GROUP BY and ORDER BY clauses how fast is it?
If you changed your query to say:
SELECT ind_first_name
, ind_last_name
, fam_phone
, max(iat_date) max_date
FROM tms_individual_account_transactions
, tms_families
, tms_individuals
WHERE fam_key = ind_fam_key
and ind_key = iat_ind_key
and iat_date <= '2000-01-01'
GROUP BY 2,1,3;
you will be grouping on all of your non-aggregated columns (most RDBMS
systems would have rejected your query on the fact that you weren't
doing just that), not some column that is not even part of the results,
AND as a side effect of the GROUP BY, you get sorting for free. That's
why I am grouping on ind_last_name and ind_first_name (to duplicate your
original ORDER BY) which leaves fam_phone listed last (because it needed
to be there for this to be complete).
It's my pet peeve to leave queries in this format (implicit INNER
JOINS). I firmly believe that the comma separated format is just too
easily abused. Here is my updated query reformatted to use explicit JOIN
statements:
SELECT ind_first_name
, ind_last_name
, fam_phone
, max(iat_date) max_date
FROM tms_individual_account_transactions
INNER JOIN tms_families
ON fam_key = ind_fam_key
INNER JOIN tms_individuals
ON ind_key = iat_ind_key
WHERE iat_date <= '2000-01-01'
GROUP BY 2,1,3;
Another thing you might consider is to pre-compute a table of
MAX(idat_date) values for each iat_ind_key for the dates <='2000-01-01'
then JOINing the other two tables to that in order to lookup the textual
information. Assuming there is an average of 10 transactions per
individual this will speed up your JOINs by a factor of about 10 (it
will take appx 1/10 the time to create the final report)
CREATE TEMPORARY TABLE tmpMaxDates (KEY (iat_ind_key))
SELECT iat_ind_key, max(iat_date) max_date
FROM tms_individual_account_transactions
WHERE iat_date <= '2001-01-01'
GROUP BY 1;
SELECT ind_first_name
, ind_last_name
, fam_phone
, max_date
FROM tmpMaxDates
INNER JOIN tms_families
ON fam_key = ind_fam_key
INNER JOIN tms_individuals
ON ind_key = iat_ind_key
ORDER BY ind_last_name, ind_first_name;
DROP TEMPORARY TABLE tmpMaxDates;
This technique will probably reduce your total time to a few seconds or
less (The longest part of that will be to build tmpMaxDates). Let us
know what works and what doesn't, OK?
Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]