Re: Hiding columns used in GROUP BY and HAVING clauses

Baron Schwartz Mon, 18 Jun 2007 09:37:27 -0700


Dan Nelson wrote:

In the last episode (Jun 18), Edward Kay said:

From: Dan Nelson [mailto:[EMAIL PROTECTED]
<snip>

At the moment, I have this and it works:

        select * from contact_address
        group by primary_entity_id
        having count(primary_entity_id) = 1
        and is_primary = 0;

This is fine except I want to use the result in a sub-query.
Since it returns two columns this doesn't work:

        update contact_address set is_primary = 1 where address_id in (
                select * from contact_address
                group by primary_entity_id
                having count(primary_entity_id) = 1
                and is_primary = 0
        );

Normally, I'd only return the address_id in the sub-SELECT,
but I need the is_primary column for the HAVING clause.

I did some tests, and it looks like you can use aggregate
functions in your HAVING clause without actually selecting the
column.  So "HAVING COUNT(primary_entity_id) = 1" should work
even if you only select address_id.

Yes, that is true and it does work.

What doesn't work however, is the extra 'AND is_primary = 0' HAVING
clause.

Couldn't you move that up into a WHERE clause (still in the
subquery)? It's just a regular field comparison, so it doesn't have
to be in the HAVING clause.

No, because that's not the same thing. Imagine the following data:

address_id   primary_entity_id   is_primary
-------------------------------------------
  1            293                 0
  2            293                 1

With my query above, the sub-query result set would be empty since
count(primary_entity_id) = 2.

If I move the is_primary = 0 requirement to a WHERE clause, then the
first row would be incorrectly updated since the group function would
only be working on the sub-set of data (where is_primary = 0).

It's a subtle but important difference.


Yes, that complicates things a bit...  That does mean you'll need to
pull another column in your subquery.  I guess you could strip that
column out with a subquery :) Something like:

update contact_address set is_primary = 1 where address_id in (select address_id from (

    select address_id, primary_entity_id from contact_address
      group by primary_entity_id
      having count(primary_entity_id) = 1

and primary_entity_id = 0));

This is fine, but as I wrote earlier, MySQL does not do well with IN()subqueries, and nesting a subquery in a dependent FROM clause is probably evenharder for it to optimize well, since that will end up being an un-indexedtemporary table. It is probably better to write this as a JOIN. I gave anexample a few days ago.


Cheers
Baron

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Re: Hiding columns used in GROUP BY and HAVING clauses

Reply via email to