Thanks for your answers. I found out, that MySQL behaves very strange in this situation. I think this is a bug or important missing feature.
I would like to see how other DBMS behave in this situation, which I would think is a common problem - whenever you want to join one column of a table with several columns of another table. MySQL uses an index_merge when I query for one specific athlete: SELECT COUNT(*) FROM matches m WHERE (m.team1_player_id = 808884 OR m.team1_partner_id = 808884 OR m.team2_player_id = 808884 OR m.team2_partner_id = 808884) id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE m index_merge team1_player_id_idx,team1_partner_id_idx,team2_player_id_idx,team2_partner_i d_idx team1_player_id_idx,team1_partner_id_idx,team2_player_id_idx,team2_partner_i d_idx 5,5,5,5 NULL 153 Using union(team1_player_id_idx,team1_partner_id_idx,team2_player_id_idx,team2_par tner_id_idx); Using where But it doesn't use the index merge when joining like I originally tried, although logically it should be able to use it, shouldn't it? Of course, adding USE INDEX FOR JOIN (team1_player_id_idx, team1_partner_id_idx, team2_player_id_idx, team2_partner_id_idx) cannot convince MySQL to use the indexes. SELECT a.id, COUNT(*) FROM athletes a FROM athletes a LEFT JOIN matches m USE INDEX FOR JOIN (team1_player_id_idx, team1_partner_id_idx, team2_player_id_idx, team2_partner_id_idx) ON ( m.team1_player_id = a.id OR m.team1_partner_id = a.id OR m.team2_player_id = a.id OR m.team2_partner_id = a.id ) WHERE a.gender = 'female' GROUP BY a.id Then Michael Dykman said, MySQL is restricted to one index per table per query. So I thought, maybe I can help MySQL when I add a compound index for all players. So it could use the one index to resolve the join. ALTER TABLE `matches` ADD INDEX `players_idx` ( `team1_player_id` , `team1_partner_id` , `team2_player_id` , `team2_partner_id` ) ; Explain now gives me: id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE a ref gender_index_idx gender_index_idx 1 const 2193 Using where; Using temporary; Using filesort 1 SIMPLE m index team1_player_id_idx,team1_partner_id_idx,team2_pla... players_idx 20 NULL 46664 Using index Hm, what does this mean? It uses the index (players_idx) but still evaluates all 46664 rows? This does not make sense to me. Anyways, the query is still extremely slow (2400 rows in 3 min 5 sec). But with IGNORE INDEX (players_idx) the query seems to be infinite (over 15 min). So now it seems that UNIONS are the only options. But they are also much slower than it should be. You version took 10 seconds. My modified version with UNION ALL instead of DISTINCT takes 5.2 seconds - only showing athletes with min. 1 match. SELECT *, COUNT(am.id) AS number_matches FROM ( SELECT a.id AS athlete_id, a.first_name, a.last_name, a.gender, m.* FROM athletes a INNER JOIN matches m ON (m.team1_player_id = a.id) UNION ALL SELECT a.id AS athlete_id, a.first_name, a.last_name, a.gender, m.* FROM athletes a INNER JOIN matches m ON (m.team1_partner_id = a.id) UNION ALL SELECT a.id AS athlete_id, a.first_name, a.last_name, a.gender, m.* FROM athletes a INNER JOIN matches m ON (m.team2_player_id = a.id) UNION ALL SELECT a.id AS athlete_id, a.first_name, a.last_name, a.gender, m.* FROM athletes a INNER JOIN matches m ON (m.team2_partner_id = a.id) ) AS am GROUP BY am.athlete_id ORDER BY number_matches DESC But then I could also use the idea I had in the first post, which still needs 3.8 seconds. SELECT *, COUNT(am.id) AS number_matches FROM ( ( SELECT team1_player_id AS player_id, teammatch_id, match_type, team1_score, team2_score, team1_points, team2_points, no_fight FROM matches WHERE team1_player_id IS NOT NULL # reduced the query from 4.1 to 3.8 sec ) UNION ALL ( SELECT team1_partner_id, teammatch_id, match_type, team1_score, team2_score, team1_points, team2_points, no_fight FROM matches WHERE team1_partner_id IS NOT NULL ) UNION ALL ( SELECT team2_player_id, teammatch_id, match_type, team1_score, team2_score, team1_points, team2_points, no_fight FROM matches WHERE team2_player_id IS NOT NULL ) UNION ALL ( SELECT team2_partner_id, teammatch_id, match_type, team1_score, team2_score, team1_points, team2_points, no_fight FROM matches WHERE team2_partner_id IS NOT NULL ) ) AS am INNER JOIN athletes a ON (am.player_id = a.id) GROUP BY a.id ORDER BY number_matches DESC Changing the database schema doesn't seem to be usefull. Regarding my schema is already normalized and over-normalization generally decreases performances. I guess I would then run into other problems. Greetings Tobias -----Ursprüngliche Nachricht----- Von: Michael Dykman [mailto:mdyk...@gmail.com] Gesendet: Dienstag, 5. Januar 2010 17:14 An: mysql@lists.mysql.com Cc: Tobias Schultze Betreff: Re: Join with OR-condition and Indexes What it comes down to is that MySQL can only use 1 index per table per query. The moment your query includes OR examining different columns, a full table scan is the only option. One typical way to implement this is to use UNIONS as Mr. Green suggested: SELECT aid, count(*) FROM ( SELECT a.id FROM athletes a LEFT JOIN matches m ON (m.team1_player_id = a.id) UNION SELECT a.id FROM athletes a LEFT JOIN matches m ON (m.team1_partner_id = a.id) UNION SELECT a.id FROM athletes a LEFT JOIN matches m ON (m.team2_player_id = a.id) UNION SELECT a.id FROM athletes a LEFT JOIN matches m ON (m.team2_partner_id = a.id) ) AS tmp GROUP BY aid On Tue, Jan 5, 2010 at 9:18 AM, Shawn Green <shawn.gr...@sun.com> wrote: > Tobias Schultze wrote: >> >> Hello, >> >> >> I'm working on an application for my bachelor thesis. >> >> I'm having a performance problem with a SQL-Query in MySQL5. >> >> I hoped someone can easily enlighten me in this issue. >> >> >> The schema: >> >> >> CREATE TABLE IF NOT EXISTS `athletes` ( >> >> `id` int(11) NOT NULL AUTO_INCREMENT, >> >> `last_name` varchar(20) NOT NULL, >> >> `first_name` varchar(20) NOT NULL, >> >> `gender` enum('male','female') NOT NULL, >> >> `birthday` date NOT NULL, >> >> `country` char(2) NOT NULL, >> >> `club_id` int(11) NOT NULL, >> >> `is_active` tinyint(1) NOT NULL, >> >> PRIMARY KEY (`id`), >> >> KEY `gender_index_idx` (`gender`), >> >> KEY `is_active_index_idx` (`is_active`), >> >> KEY `club_id_idx` (`club_id`) >> >> ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COMMENT='Sportler bzw Mitglieder des >> Verbandes'; >> >> >> -- -------------------------------------------------------- >> >> >> CREATE TABLE IF NOT EXISTS `matches` ( >> >> `id` int(11) NOT NULL AUTO_INCREMENT, >> >> `teammatch_id` int(11) NOT NULL, >> >> `match_type` varchar(10) NOT NULL, >> >> `team1_player_id` int(11) DEFAULT NULL, >> >> `team1_partner_id` int(11) DEFAULT NULL, >> >> `team2_player_id` int(11) DEFAULT NULL, >> >> `team2_partner_id` int(11) DEFAULT NULL, >> >> `team1_score` tinyint(3) unsigned DEFAULT NULL, >> >> `team2_score` tinyint(3) unsigned DEFAULT NULL, >> >> PRIMARY KEY (`id`), >> >> KEY `teammatch_id_idx` (`teammatch_id`), >> >> KEY `team1_player_id_idx` (`team1_player_id`), >> >> KEY `team1_partner_id_idx` (`team1_partner_id`), >> >> KEY `team2_player_id_idx` (`team2_player_id`), >> >> KEY `team2_partner_id_idx` (`team2_partner_id`) >> >> ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COMMENT='Spiele zwischen zwei oder >> vier Sportlern (Einzel und Doppel)' AUTO_INCREMENT=46665 ; >> >> >> >> I want to get all matches for each athlete and calculate statistics such >> as >> number of matches etc. >> >> The basic very simplified query is like >> >> >> SELECT a.id, COUNT(*) >> >> FROM athletes a >> >> LEFT JOIN matches m ON ( >> m.team1_player_id = a.id OR >> m.team1_partner_id = a.id OR >> m.team2_player_id = a.id OR >> m.team2_partner_id = a.id >> >> ) >> >> WHERE a.gender = 'female' >> >> GROUP BY a.id >> >> >> >> Now the problem is, that mysql uses a full table scan to retrieve the >> matches for an athlete, so the execution takes many seconds or even worse. >> >> An athlete can be referenced in any of the m.team1_player_id OR >> m.team1_partner_id OR m.team2_player_id OR m.team2_partner_id. (That >> allows >> doubles matches.) >> >> Why is an full table scan necessary although there is an index on each of >> these fields? So an index exists for each OR-part of the join condition... >> >> >> Here the execution plan: >> >> >> >> id >> select_type >> table >> type >> possible_keys >> key >> key_len >> ref >> rows >> Extra >> >> 1 >> >> SIMPLE >> >> a >> >> ref >> >> gender_index_idx >> >> gender_index_idx >> >> 1 >> >> const >> >> 2193 >> >> Using where; Using temporary; Using filesort >> >> >> 1 >> >> SIMPLE >> >> m >> >> ALL >> >> team1_player_id_idx,team1_partner_id_idx,team2_pla... >> >> NULL >> >> NULL >> >> NULL >> >> 46664 >> >> >> >> >> Joining on each fields like the following is very fast and uses the index >> but of course doesn't give me the expected result. >> >> >> FROM athletes a >> >> LEFT JOIN matches m ON (a.id = m.team1_player_id) >> >> LEFT JOIN matches m2 ON (a.id = m2.team2_player_id) >> >> >> >> Maybe I need to do a workaround using a UNION? >> >> But this doesn't help either: (It takes 76 seconds) >> >> >> FROM athletes a >> >> LEFT JOIN ( >> >> ( >> >> SELECT team1_player_id AS player_id, teammatch_id, match_type, >> team1_score, >> team2_score, team1_points, team2_points, no_fight >> FROM matches >> >> ) >> UNION >> ( >> >> SELECT team1_partner_id, teammatch_id, match_type, team1_score, >> team2_score, >> team1_points, team2_points, no_fight >> FROM matches >> >> ) >> >> UNION >> ( >> >> SELECT team2_player_id, teammatch_id, match_type, team1_score, >> team2_score, >> team1_points, team2_points, no_fight >> FROM matches >> >> ) >> >> UNION >> ( >> >> SELECT team2_partner_id, teammatch_id, match_type, team1_score, >> team2_score, >> team1_points, team2_points, no_fight >> FROM matches >> >> ) >> >> ) m ON (a.id = m.player_id) >> >> >> >> >> I hope someone can help me with this. >> >> Thanks in advance. >> >> > > I think the problem here is that you have an ordered vector table (main1, > partner1, main2, partner2) which requires you to optionally match on 4 > separate columns to make your JOIN condition because the athlete you want to > locate could be in any of those 4 places in the match record. > > What would be faster is to normalize your match data into two tables. The > first table would be just match information (id, date, location, etc) and > another table of match-athlete pairs (match_id, athlete_id) > > alternatively you would have to use a query like > ( > select... > FROM athletes > INNER JOIN matches > ON ... matches.member1 > ) UNION ( > select... > FROM athletes > INNER JOIN matches > ON ... matches.partner1 > ) UNION ( > select... > FROM athletes > INNER JOIN matches > ON ... matches.member2 > ) UNION ( > select... > FROM athletes > INNER JOIN matches > ON ... matches.partner2 > ) > > in order to use the indexes for the JOINS. > > -- > Shawn Green, MySQL Senior Support Engineer > Sun Microsystems, Inc. > Office: Blountville, TN > > > > -- > MySQL General Mailing List > For list archives: http://lists.mysql.com/mysql > To unsubscribe: http://lists.mysql.com/mysql?unsub=mdyk...@gmail.com > > -- - michael dykman - mdyk...@gmail.com May the Source be with you. -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/mysql?unsub=arch...@jab.org