Brent B. Powers wrote:

I've created and populated two simple tables:

create table a (id serial, d integer)
insert a (d) values (1)
insert a (d) values (2)
insert a (d) values (3)
insert a (d) values (4)

create table a (id serial, a integer)
insert b (a) values (1)
insert b (a) values (1)
insert b (a) values (4)
insert b (a) values (4)

Now, I'd like the id's of a that are not referred to in b... thus:

mysql>   select a.id
   ->     from a
   ->left join b on a.id = b.a
   -> group by b.a
   ->   having count(b.a) = 0;
+----+
| id |
+----+
|  2 |
+----+
1 rows in set (0.00 sec)

That's not right.... check all of the counts:
mysql>    select a.id, count(b.a)
   ->      from a
   -> left join b on a.id = b.a
   ->  group by b.a;
+----+------------+
| id | count(b.a) |
+----+------------+
|  2 |          0 |
|  1 |          2 |
|  4 |          2 |
+----+------------+
3 rows in set (0.00 sec)

Still not good.. where's 3?
mysql> select * from b;
+----+------+
| id | a    |
+----+------+
|  1 |    1 |
|  2 |    1 |
|  3 |    4 |
|  4 |    4 |
+----+------+
4 rows in set (0.00 sec)

Can anyone tell me why I don't see a.id = 3 having a reference count of 0?

Cheers

Because you are grouping on b.a, and there is no 3 in b. Table b is on the right side of the LEFT JOIN. When b doesn't have a row for the JOIN, it gets a made up row with all values set to NULL. Consider:


  mysql> SELECT b.a, COUNT(b.a)
      -> FROM a LEFT JOIN b ON a.id = b.a
      -> GROUP BY b.a;
  +------+------------+
  | a    | COUNT(b.a) |
  +------+------------+
  | NULL |          0 |
  |    1 |          2 |
  |    4 |          2 |
  +------+------------+
  3 rows in set (0.00 sec)

See? When you GROUP BY b.a there are 3 groups, 1, 4, and everything that was missing (the NULL group).

Many DBs wouldn't even allow your query, because you tried to select a column, a.id, not referenced in your GROUP BY. MySQL allows this, but the manual warns against selecting a column whose values are not unique per group. a.id does not have unique values per group, because all unreferenced a.ids are thrown together in the NULL b.a group. In that case, you simply get *one* of them for that group (the first one found - hence 2, not 3).

You could fix your query by grouping on a.id, which is the thing you wanted anyway.

  mysql> SELECT a.id, COUNT(b.a)
      -> FROM a LEFT JOIN b ON a.id = b.a
      -> GROUP BY a.id;
  +----+------------+
  | id | COUNT(b.a) |
  +----+------------+
  |  1 |          2 |
  |  2 |          0 |
  |  3 |          0 |
  |  4 |          2 |
  +----+------------+
  4 rows in set (0.00 sec)

Hence,

  mysql> SELECT a.id, COUNT(b.a)
      -> FROM a LEFT JOIN b ON a.id = b.a
      -> GROUP BY a.id
      -> HAVING COUNT(b.a) = 0;
  +----+------------+
  | id | COUNT(b.a) |
  +----+------------+
  |  2 |          0 |
  |  3 |          0 |
  +----+------------+
  2 rows in set (0.00 sec)

This is inefficient, though. You are counting the number of references for every a.id, then filtering out the rows with non-zero counts in post-processing using the HAVING clause. You are relying on COUNT() to return zero for NULL, but you really only need to find the NULLs to have what you want. This query is usually better:

  mysql> SELECT a.id
      -> FROM a LEFT JOIN b ON a.id = b.a
      -> WHERE b.a IS NULL;
  +----+
  | id |
  +----+
  |  2 |
  |  3 |
  +----+
  2 rows in set (0.01 sec)

Why? Because the optimizer, seeing the WHERE b.a IS NULL, should be smart enough to stop looking for matches in b for a particular a.id as soon as it finds the first one <http://dev.mysql.com/doc/mysql/en/LEFT_JOIN_optimization.html>.

Michael

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]



Reply via email to