Re: Join with OR-condition and Indexes

Michael Dykman Tue, 05 Jan 2010 08:14:49 -0800

What it comes down to is that MySQL can only use 1 index per table per
query. The moment your query includes OR examining different columns,
a full table scan is the only option.


One typical way to implement this is to use UNIONS as Mr. Green suggested:

SELECT aid, count(*) FROM
(
SELECT a.id
FROM athletes a
LEFT JOIN matches m ON (m.team1_player_id = a.id)
    UNION
SELECT a.id
FROM athletes a
LEFT JOIN matches m ON (m.team1_partner_id = a.id)
    UNION
SELECT a.id
FROM athletes a
LEFT JOIN matches m ON (m.team2_player_id = a.id)
    UNION
SELECT a.id
FROM athletes a
LEFT JOIN matches m ON (m.team2_partner_id = a.id)
) AS tmp GROUP BY aid


On Tue, Jan 5, 2010 at 9:18 AM, Shawn Green <shawn.gr...@sun.com> wrote:
> Tobias Schultze wrote:
>>
>> Hello,
>>
>>
>> I'm working on an application for my bachelor thesis.
>>
>> I'm having a performance problem with a SQL-Query in MySQL5.
>>
>> I hoped someone can easily enlighten me in this issue.
>>
>>
>> The schema:
>>
>>
>> CREATE TABLE IF NOT EXISTS `athletes` (
>>
>>  `id` int(11) NOT NULL AUTO_INCREMENT,
>>
>>  `last_name` varchar(20) NOT NULL,
>>
>>  `first_name` varchar(20) NOT NULL,
>>
>>  `gender` enum('male','female') NOT NULL,
>>
>>  `birthday` date NOT NULL,
>>
>>  `country` char(2) NOT NULL,
>>
>>  `club_id` int(11) NOT NULL,
>>
>>  `is_active` tinyint(1) NOT NULL,
>>
>>  PRIMARY KEY (`id`),
>>
>>  KEY `gender_index_idx` (`gender`),
>>
>>  KEY `is_active_index_idx` (`is_active`),
>>
>>  KEY `club_id_idx` (`club_id`)
>>
>> ) ENGINE=MyISAM  DEFAULT CHARSET=utf8 COMMENT='Sportler bzw Mitglieder des
>> Verbandes';
>>
>>
>> -- --------------------------------------------------------
>>
>>
>> CREATE TABLE IF NOT EXISTS `matches` (
>>
>>  `id` int(11) NOT NULL AUTO_INCREMENT,
>>
>>  `teammatch_id` int(11) NOT NULL,
>>
>>  `match_type` varchar(10) NOT NULL,
>>
>>  `team1_player_id` int(11) DEFAULT NULL,
>>
>>  `team1_partner_id` int(11) DEFAULT NULL,
>>
>>  `team2_player_id` int(11) DEFAULT NULL,
>>
>>  `team2_partner_id` int(11) DEFAULT NULL,
>>
>>  `team1_score` tinyint(3) unsigned DEFAULT NULL,
>>
>>  `team2_score` tinyint(3) unsigned DEFAULT NULL,
>>
>>  PRIMARY KEY (`id`),
>>
>>  KEY `teammatch_id_idx` (`teammatch_id`),
>>
>>  KEY `team1_player_id_idx` (`team1_player_id`),
>>
>>  KEY `team1_partner_id_idx` (`team1_partner_id`),
>>
>>  KEY `team2_player_id_idx` (`team2_player_id`),
>>
>>  KEY `team2_partner_id_idx` (`team2_partner_id`)
>>
>> ) ENGINE=MyISAM  DEFAULT CHARSET=utf8 COMMENT='Spiele zwischen zwei oder
>> vier Sportlern (Einzel und Doppel)' AUTO_INCREMENT=46665 ;
>>
>>
>>
>> I want to get all matches for each athlete and calculate statistics such
>> as
>> number of matches etc.
>>
>> The basic very simplified query is like
>>
>>
>> SELECT a.id, COUNT(*)
>>
>> FROM athletes a
>>
>> LEFT JOIN matches m ON (
>>            m.team1_player_id = a.id OR
>>            m.team1_partner_id = a.id OR
>>            m.team2_player_id = a.id OR
>>            m.team2_partner_id = a.id
>>
>> )
>>
>> WHERE a.gender = 'female'
>>
>> GROUP BY a.id
>>
>>
>>
>> Now the problem is, that mysql uses a full table scan to retrieve the
>> matches for an athlete, so the execution takes many seconds or even worse.
>>
>> An athlete can be referenced in any of the m.team1_player_id OR
>> m.team1_partner_id OR m.team2_player_id OR m.team2_partner_id. (That
>> allows
>> doubles matches.)
>>
>> Why is an full table scan necessary although there is an index on each of
>> these fields? So an index exists for each OR-part of the join condition...
>>
>>
>> Here the execution plan:
>>
>>
>>
>> id
>> select_type
>> table
>> type
>> possible_keys
>> key
>> key_len
>> ref
>> rows
>> Extra
>>
>> 1
>>
>> SIMPLE
>>
>> a
>>
>> ref
>>
>> gender_index_idx
>>
>> gender_index_idx
>>
>> 1
>>
>> const
>>
>> 2193
>>
>> Using where; Using temporary; Using filesort
>>
>>
>> 1
>>
>> SIMPLE
>>
>> m
>>
>> ALL
>>
>> team1_player_id_idx,team1_partner_id_idx,team2_pla...
>>
>> NULL
>>
>> NULL
>>
>> NULL
>>
>> 46664
>>
>>
>>
>>
>> Joining on each fields like the following is very fast and uses the index
>> but of course doesn't give me the expected result.
>>
>>
>> FROM athletes a
>>
>> LEFT JOIN matches m ON (a.id = m.team1_player_id)
>>
>> LEFT JOIN matches m2 ON (a.id = m2.team2_player_id)
>>
>>
>>
>> Maybe I need to do a workaround using a UNION?
>>
>> But this doesn't help either: (It takes 76 seconds)
>>
>>
>> FROM athletes a
>>
>> LEFT JOIN (
>>
>> (
>>
>> SELECT team1_player_id AS player_id, teammatch_id, match_type,
>> team1_score,
>> team2_score, team1_points, team2_points, no_fight
>> FROM matches
>>
>> )
>> UNION
>> (
>>
>> SELECT team1_partner_id, teammatch_id, match_type, team1_score,
>> team2_score,
>> team1_points, team2_points, no_fight
>> FROM matches
>>
>> )
>>
>> UNION
>> (
>>
>> SELECT team2_player_id, teammatch_id, match_type, team1_score,
>> team2_score,
>> team1_points, team2_points, no_fight
>> FROM matches
>>
>> )
>>
>> UNION
>> (
>>
>> SELECT team2_partner_id, teammatch_id, match_type, team1_score,
>> team2_score,
>> team1_points, team2_points, no_fight
>> FROM matches
>>
>> )
>>
>> ) m ON (a.id = m.player_id)
>>
>>
>>
>>
>> I hope someone can help me with this.
>>
>> Thanks in advance.
>>
>>
>
> I think the problem here is that you have an ordered vector table (main1,
> partner1, main2, partner2) which requires you to optionally match on 4
> separate columns to make your JOIN condition because the athlete you want to
> locate could be in any of those 4 places in the match record.
>
> What would be faster is to normalize your match data into two tables. The
> first table would be just match information (id, date, location, etc) and
> another table of match-athlete pairs (match_id, athlete_id)
>
> alternatively you would have to use a query like
> (
> select...
> FROM athletes
> INNER JOIN matches
>  ON ... matches.member1
> ) UNION (
> select...
> FROM athletes
> INNER JOIN matches
>  ON ... matches.partner1
> ) UNION (
> select...
> FROM athletes
> INNER JOIN matches
>  ON ... matches.member2
> ) UNION (
> select...
> FROM athletes
> INNER JOIN matches
>  ON ... matches.partner2
> )
>
> in order to use the indexes for the JOINS.
>
> --
> Shawn Green, MySQL Senior Support Engineer
> Sun Microsystems, Inc.
> Office: Blountville, TN
>
>
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:    http://lists.mysql.com/mysql?unsub=mdyk...@gmail.com
>
>



-- 
 - michael dykman
 - mdyk...@gmail.com

 May the Source be with you.

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/mysql?unsub=arch...@jab.org

Re: Join with OR-condition and Indexes

Reply via email to