Benjamin,

When MySQL does a join, it appears that it considers one table as the
primary
table and one table as the dependent table.  It then selects rows from the
primary table and then, for each selected row, it fetches the corresponding
rows from the dependent table.

For an inner join, MySQL can pick either of the two tables as the primary
table.
So, it picks the larger table, sorts it into the order needed for the GROUP
BY, then
fetches rows from the smaller table.  In your case, the smaller table fits
nicely
in main memory, so the query is reasonably fast.

For a left join, MySQL needs to use the first table as the primary table.
In your
case, the smaller table is the primary table, and you are using all of its
rows.
Since the larger table does not fit in main memory, MySQL reads randomly
from
it.  Since there are many records in the child table for each parent row,
and since
they are spread around, MySQL is reading a large fraction of the child table
for
each of the 67 parent rows for which there are children.  That will be slow.

One could imagine the query optimizer sorting the large table for the left
join, as
it does for the inner join.  I seem to recall that DB2, for example,
sometimes does
this.  However, a query optimizer is, by nature, a package of compromises,
and
MySQL doesn't do it that way.

So what can you do?  Here are some thoughts:

(1)  Give MySQL enough main memory to hold all of the child table.

-- This is clearly easiest, if you are able to do it, as it requires no
database or SQL changes.  However, it may not be possible.

(2)  Index the child table on (p_id, c_id).  If you are selecting more
columns
from child in your actual query, include them, too.

-- MySQL should satisfy the query using the index, rather than the data.
The
index is in order by p_id, so it will only need to be read once.

-- However, you need to maintain this index.

(3)  Use a union to do the left join.

  SELECT PARENT.p_id, 0
    FROM parent LEFT JOIN child ON (parent.p_id = child.p_id)
    WHERE child.p_id is null
  UNION ALL
  SELECT parent.p_id, COUNT(child.c_id)
    FROM parent JOIN child ON (parent.p_id = child.p_id)
    GROUP BY parent.p_id;

-- The first SELECT should be satisfied using only the index on child.p_id,
so it should be fast.  It will give you the parents with no children.

-- However, it does require rewriting your SQL.  Also, if you want the
result ordered by parent.p_id, you get to do a second ORDER BY on
the result of the UNION.

(4)  Copy the child rows to a temporary table ordered by child.p_id, then
do the left-join query.

(5)  Arrange to maintain the child table so that the rows are approximately
ordered by child.p_id.

-- This takes some work, but it might speed up other queries, if you
frequently
need to select all of the children for a particular parent.


HTH

Bill

> Date: Thu, 22 Jan 2004 20:09:42 +0100
> From: Benjamin PERNOT
> To: [EMAIL PROTECTED]
> Subject: JOIN 10 times quicker than LEFT JOIN on big tables and simple
queries?

> Here is my problem:
> I have 2 tables, a parent table and a child table. The parent table has
got 113
> rows, the child table has got 3 000 000 rows.

> parent:
> -------------------
> | p_id  |   name  |
> -------------------
> |  1    |     A   |
> |  2    |     B   |
> |  ...  |    ...  |
> |  112  |     C   |
> |  113  |     D   |
> -------------------

> child:
> ------------------
> | c_id  |  p_id  |
> ------------------
> |  1    |    1   |
> |  2    |    56  |
> |  ...  |   ...  |
> |2999999|    2   |
> |3000000|    56  |
> ------------------

> I want to get a list of all the parents (even the parents without child)
with
> the number of children they've got. I use a LEFT JOIN in order to retrieve
all
> the parents without exception :

> SELECT parent.p_id, COUNT(child.c_id) FROM parent LEFT JOIN child ON
(parent.
> p_id = child.p_id) GROUP BY parent.p_id;

> This query takes 140 seconds to be executed and I got 70 results.
> Now if I use a basic JOIN like that:

> SELECT parent.p_id, COUNT(child.c_id) FROM parent JOIN child ON
(parent.p_id =
> child.p_id) GROUP BY parent.p_id;

> The query takes now 13 seconds to finish!! But now I got only 67 results
because
> the basic JOIN does not include the parents without children.

> What I don't understand is why the JOIN is far much quicker than the LEFT
JOIN
> whereas the only difference is that the LEFT JOIN includes the parents
without
> children? Any explanations?

> Here are the EXPLAIN for the 2 cases :
> LEFT JOIN case :
> --------------------------------------------------------------------------
-----
> table  type   possible_keys key      key_len ref         rows  Extra
> parent index  NULL          PRIMARY  4       NULL        113   Using index
> child  ref    p_id          p_id     5       parent.p_id 40694
> --------------------------------------------------------------------------
-----

> JOIN case:
> --------------------------------------------------------------------------
-----
> table  type   possible_keys  key     key_len  ref        rows    Extra
> child  ALL    p_id           NULL    NULL     NULL       3000000 Using
> temporary; Using filesort
> parent eq_ref PRIMARY        PRIMARY 4        child.p_id 1       Using
index
> --------------------------------------------------------------------------
-----


> I'm using MySQL 4.0.13 and MyISAM tables. I'm using keys and Indexes.
> Thank you very much.

> Benjamin



-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to