Benjamin, When MySQL does a join, it appears that it considers one table as the primary table and one table as the dependent table. It then selects rows from the primary table and then, for each selected row, it fetches the corresponding rows from the dependent table.
For an inner join, MySQL can pick either of the two tables as the primary table. So, it picks the larger table, sorts it into the order needed for the GROUP BY, then fetches rows from the smaller table. In your case, the smaller table fits nicely in main memory, so the query is reasonably fast. For a left join, MySQL needs to use the first table as the primary table. In your case, the smaller table is the primary table, and you are using all of its rows. Since the larger table does not fit in main memory, MySQL reads randomly from it. Since there are many records in the child table for each parent row, and since they are spread around, MySQL is reading a large fraction of the child table for each of the 67 parent rows for which there are children. That will be slow. One could imagine the query optimizer sorting the large table for the left join, as it does for the inner join. I seem to recall that DB2, for example, sometimes does this. However, a query optimizer is, by nature, a package of compromises, and MySQL doesn't do it that way. So what can you do? Here are some thoughts: (1) Give MySQL enough main memory to hold all of the child table. -- This is clearly easiest, if you are able to do it, as it requires no database or SQL changes. However, it may not be possible. (2) Index the child table on (p_id, c_id). If you are selecting more columns from child in your actual query, include them, too. -- MySQL should satisfy the query using the index, rather than the data. The index is in order by p_id, so it will only need to be read once. -- However, you need to maintain this index. (3) Use a union to do the left join. SELECT PARENT.p_id, 0 FROM parent LEFT JOIN child ON (parent.p_id = child.p_id) WHERE child.p_id is null UNION ALL SELECT parent.p_id, COUNT(child.c_id) FROM parent JOIN child ON (parent.p_id = child.p_id) GROUP BY parent.p_id; -- The first SELECT should be satisfied using only the index on child.p_id, so it should be fast. It will give you the parents with no children. -- However, it does require rewriting your SQL. Also, if you want the result ordered by parent.p_id, you get to do a second ORDER BY on the result of the UNION. (4) Copy the child rows to a temporary table ordered by child.p_id, then do the left-join query. (5) Arrange to maintain the child table so that the rows are approximately ordered by child.p_id. -- This takes some work, but it might speed up other queries, if you frequently need to select all of the children for a particular parent. HTH Bill > Date: Thu, 22 Jan 2004 20:09:42 +0100 > From: Benjamin PERNOT > To: [EMAIL PROTECTED] > Subject: JOIN 10 times quicker than LEFT JOIN on big tables and simple queries? > Here is my problem: > I have 2 tables, a parent table and a child table. The parent table has got 113 > rows, the child table has got 3 000 000 rows. > parent: > ------------------- > | p_id | name | > ------------------- > | 1 | A | > | 2 | B | > | ... | ... | > | 112 | C | > | 113 | D | > ------------------- > child: > ------------------ > | c_id | p_id | > ------------------ > | 1 | 1 | > | 2 | 56 | > | ... | ... | > |2999999| 2 | > |3000000| 56 | > ------------------ > I want to get a list of all the parents (even the parents without child) with > the number of children they've got. I use a LEFT JOIN in order to retrieve all > the parents without exception : > SELECT parent.p_id, COUNT(child.c_id) FROM parent LEFT JOIN child ON (parent. > p_id = child.p_id) GROUP BY parent.p_id; > This query takes 140 seconds to be executed and I got 70 results. > Now if I use a basic JOIN like that: > SELECT parent.p_id, COUNT(child.c_id) FROM parent JOIN child ON (parent.p_id = > child.p_id) GROUP BY parent.p_id; > The query takes now 13 seconds to finish!! But now I got only 67 results because > the basic JOIN does not include the parents without children. > What I don't understand is why the JOIN is far much quicker than the LEFT JOIN > whereas the only difference is that the LEFT JOIN includes the parents without > children? Any explanations? > Here are the EXPLAIN for the 2 cases : > LEFT JOIN case : > -------------------------------------------------------------------------- ----- > table type possible_keys key key_len ref rows Extra > parent index NULL PRIMARY 4 NULL 113 Using index > child ref p_id p_id 5 parent.p_id 40694 > -------------------------------------------------------------------------- ----- > JOIN case: > -------------------------------------------------------------------------- ----- > table type possible_keys key key_len ref rows Extra > child ALL p_id NULL NULL NULL 3000000 Using > temporary; Using filesort > parent eq_ref PRIMARY PRIMARY 4 child.p_id 1 Using index > -------------------------------------------------------------------------- ----- > I'm using MySQL 4.0.13 and MyISAM tables. I'm using keys and Indexes. > Thank you very much. > Benjamin -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]