Re: Query with group by

Rhino Thu, 16 Sep 2004 08:11:10 -0700

I agree that Michael's solution with the temporary tables is the best I have
seen so far.


I am running MySQL 4.0.15 so I don't have any subquery capability. As a
result, I went to DB2 to come up with one solution that does give the right
answer via a subquery:

select content, location, version, date
from versions
where concat(location, char(date)) in
(select concat(location, char(max(date)))
from versions
where content = 'ALPHA'
group by location);

I don't know if this will work in MySQL 4.1.x though. You may want to give
it a try if you have 4.1.x. By the way, I'm not convinced that this is the
*best* solution using a subquery; it's just the first one I could think of.
I don't have all day to spend on this ;-)

One other thought: is it really necessary to keep track of all of the
different versions you have had at each location? I can't think of a lot of
cases where I'd really care that I had version 8 of the ALPHA in Tokyo last
week or last year. Your query would be simpler - and the volume of data
would be less - if you only kept track of the current version for each
content at each location.

Rhino

----- Original Message ----- 
From: "Michael Stassen" <[EMAIL PROTECTED]>
To: "Jose Miguel Pérez" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Thursday, September 16, 2004 10:47 AM
Subject: Re: Query with group by


>
> Jose Miguel Pérez wrote:
>
> > Hi Vincent!
> >
> >>i'm trying to built a query that report me the latest tuple for a given
> >>field.
> >>I explain myself a bit. Let's say i have a table such this :
> >
> >     There is no simple solution to your problem. Strictly speaking, it's
> > forbidden to name a column in the SELECT clause which is NOT on the
GROUP BY
> > clause. (Read chapter: "13.9.3 GROUP BY with Hidden Fields" from the
MySQL
> > Documentation.). Anyway, MySQL is relaxed from ANSI SQL in such a way
it's
> > permitted to include a column which is not in the GROUP BY clause
(hidden
> > columns). The values returned from such a query are "unpredictable".
> >
> >     I started saying there is no _simple_ solution. However, you can
think a
> > little bit and your problem goes away. I can think of two solutions to
your
> > problem which neither involves a subquery.
> >
> >     You are lucky in that you know for sure what your last value is (you
do
> > have a "date" field"). You already know the value of using MAX(date) in
the
> > SELECT. You can concatenate the date column with the version, and return
a
> > "SUBSTRING" from that operation, something like this:
> >
> >         SELECT SUBSTRING_INDEX(MAX(CONCAT(date, ' ', version)), ' ', -1)
AS
> > correct_version,
> >                 location, version AS wrong_version
> >         FROM cities
> >         WHERE content='ALPHA'
> >         GROUP BY location
>
> This is the MAX-CONCAT trick.  It works, but it's inefficient, as it has
to
> do a full table scan, with calculations done on each row.  An index on
date
> cannot be used in this case, because we are searching for the maximum
> CONCAT(date, ' ', version), rather than the maximum date.
>
> >     You will get the following result set: (Notice the "correct_version"
> > column).
> >
> > +-----------------+----------+---------------+
> > | correct_version | location | wrong_version |
> > +-----------------+----------+---------------+
> > | 10              | NEW-YORK |            11 |
> > | 10              | PARIS    |            10 |
> > | 11              | TOKYO    |            10 |
> > +-----------------+----------+---------------+
> >
> >     There is another option, however, and this one is what I like most.
In
> > two words, you can join the table with itself, like this:
> >
> >         SELECT c1.date, c1.location, c1.version
> >         FROM cities c1
> >         LEFT JOIN cities c2
> >             ON c1.location=c2.location AND c1.content=c2.content
> >                 AND c2.date>c1.date
> >         WHERE c2.id IS NULL AND c1.content = 'ALPHA'
> >
> >     What we have done here is first LEFT JOIN the table (cities in my
> > example) with itself so that we have in the left part those rows which
have
> > the maximum date and with NULL on the right as the value of c2. We then
> > select those saying "WHERE c2.id IS NULL". The last operation is
selecting
> > the rows for an 'ALPHA' content.
>
> This will work, but it is also somewhat inefficient.  The LEFT JOIN is
> creating numerous extra, unwanted rows, only to throw them away with the
> WHERE c2.id IS NULL.  Assuming n rows for a particular location value, you
> are creating 1 + (n * (n - 1)/2) rows {optimized down to n rows, if id is
> defined NOT NULL} to find the one row you want for that group.  That's no
> big deal for this small sample table, but it may not scale well.
>
> >     Notice that with this version we also eliminate the GROUP BY, since
all
> > rows returned from the join are unique. The result set returned with the
> > last example is this:
> >
> > +------------+----------+---------+
> > | date       | location | version |
> > +------------+----------+---------+
> > | 2004-09-16 | PARIS    |      10 |
> > | 2004-09-16 | NEW-YORK |      10 |
> > | 2004-09-15 | TOKYO    |      11 |
> > +------------+----------+---------+
>
> The most efficient way is probably to use a temporary table.
>
>    CREATE TEMPORARY TABLE max_dates
>    SELECT location, MAX(date) AS max_date
>    FROM temp
>    WHERE content = 'ALPHA'
>    GROUP BY location;
>
>    SELECT t.*
>    FROM temp t, max_dates m
>    WHERE t.location = m.location
>    AND t.date = m.max_date;
>
>    DROP TABLE max_dates;
>
> The manual describes the MAX-CONCAT trick, the temporary table solution,
and
> a subquery solution for 4.1+
> <http://dev.mysql.com/doc/mysql/en/example-Maximum-column-group-row.html>.
>
> >     Cheers,
> >     Jose Miguel.
>
> Michael
>
> -- 
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]
>


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Re: Query with group by

Reply via email to