RE: Hiding columns used in GROUP BY and HAVING clauses

Edward Kay Tue, 19 Jun 2007 01:44:29 -0700

> -----Original Message-----
> From: Baron Schwartz [mailto:[EMAIL PROTECTED]


> Dan Nelson wrote:
> > In the last episode (Jun 18), Edward Kay said:
> >> From: Dan Nelson [mailto:[EMAIL PROTECTED]
> >> <snip>
> >>>>>> At the moment, I have this and it works:
> >>>>>>
> >>>>>>        select * from contact_address
> >>>>>>        group by primary_entity_id
> >>>>>>        having count(primary_entity_id) = 1
> >>>>>>        and is_primary = 0;
> >>>>>>
> >>>>>> This is fine except I want to use the result in a sub-query.
> >>>>>> Since it returns two columns this doesn't work:
> >>>>>>
> >>>>>>        update contact_address set is_primary = 1 where 
> address_id in (
> >>>>>>                select * from contact_address
> >>>>>>                group by primary_entity_id
> >>>>>>                having count(primary_entity_id) = 1
> >>>>>>                and is_primary = 0
> >>>>>>        );
> >>>>>>
> >>>>>> Normally, I'd only return the address_id in the sub-SELECT,
> >>>>>> but I need the is_primary column for the HAVING clause.
> >>>>> I did some tests, and it looks like you can use aggregate
> >>>>> functions in your HAVING clause without actually selecting the
> >>>>> column.  So "HAVING COUNT(primary_entity_id) = 1" should work
> >>>>> even if you only select address_id.
> >>>> Yes, that is true and it does work.
> >>>>
> >>>> What doesn't work however, is the extra 'AND is_primary = 0' HAVING
> >>>> clause.
> >>> Couldn't you move that up into a WHERE clause (still in the
> >>> subquery)? It's just a regular field comparison, so it doesn't have
> >>> to be in the HAVING clause.
> >> No, because that's not the same thing. Imagine the following data:
> >>
> >> address_id   primary_entity_id   is_primary
> >> -------------------------------------------
> >>   1            293                 0
> >>   2            293                 1
> >>
> >> With my query above, the sub-query result set would be empty since
> >> count(primary_entity_id) = 2.
> >>
> >> If I move the is_primary = 0 requirement to a WHERE clause, then the
> >> first row would be incorrectly updated since the group function would
> >> only be working on the sub-set of data (where is_primary = 0).
> >>
> >> It's a subtle but important difference.
> > 
> > Yes, that complicates things a bit...  That does mean you'll need to
> > pull another column in your subquery.  I guess you could strip that
> > column out with a subquery :) Something like:
> > 
> > update contact_address set is_primary = 1 where address_id in ( 
> >   select address_id from (
> >     select address_id, primary_entity_id from contact_address
> >       group by primary_entity_id
> >       having count(primary_entity_id) = 1
> >       and primary_entity_id = 0 
> >   )
> > ); 
> > 
> 
> This is fine, but as I wrote earlier, MySQL does not do well with IN() 
> subqueries, and nesting a subquery in a dependent FROM clause is 
> probably even 
> harder for it to optimize well, since that will end up being an 
> un-indexed 
> temporary table.  It is probably better to write this as a JOIN.  
> I gave an 
> example a few days ago.

Thanks Baron and Dan for your comments, you were both really helpful.

Edward

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

RE: Hiding columns used in GROUP BY and HAVING clauses

Reply via email to