Hi Michael!

    Yes, you're right, thanks for extending and clarifying my message.
However, I'm not confident about your comments on the inefficiency for the
following query:

> >         SELECT c1.date, c1.location, c1.version
> >         FROM cities c1
> >         LEFT JOIN cities c2
> >             ON c1.location=c2.location AND c1.content=c2.content
> >                 AND c2.date>c1.date
> >         WHERE c2.id IS NULL AND c1.content = 'ALPHA'
> >
>
> This will work, but it is also somewhat inefficient.  The LEFT JOIN is
> creating numerous extra, unwanted rows, only to throw them away with the
> WHERE c2.id IS NULL.  Assuming n rows for a particular location value, you

    [...] and then [...]

> The most efficient way is probably to use a temporary table.
>
>    CREATE TEMPORARY TABLE max_dates
>    SELECT location, MAX(date) AS max_date
>    FROM temp
>    WHERE content = 'ALPHA'
>    GROUP BY location;
>
>    SELECT t.*
>    FROM temp t, max_dates m
>    WHERE t.location = m.location
>    AND t.date = m.max_date;
>
>    DROP TABLE max_dates;

    I don't think the temporary table is such an efficient way of doing
this. Pardon me, I'm provably wrong, but let me explain to see if I think
correctly. First, I assume as true this table have an index on "location",
"content" and "date", apart from the PK on ID. Given that, on my query we
are using the keys at full, I mean, although you say "the left join is
creating numerous extra, unwanted rows", this is not true. We could apply
the standard algebra here, but the real world query optimizers are smart
enough to not retrieve unwanted data. (What about joining four or more
tables! Multiply then).

    Your query is creating a temporary table, doing a full scan of it
(thanks to the MAX(date) function), etc. If you do a EXPLAIN SELECT for your
query, you'll notice there is an Extra of: "Using where; Using temporary;
Using FILESORT". Reading the MySQL documentation, one can see "If you want
to make your queries as fast as possible, you should look out for Extra
values of Using filesort and Using temporary.". (Chapter 7.2.1 EXPLAIN
Syntax).

    If I'm not wrong, maybe the first LEFT JOIN is worse from a mathematical
point of view, but the temporary one may be is the worst from a practical
perspective.

    And you'll see I'm very cautious because I'm not such a SQL guru, but
I'd like to know other opinions.

    Anyway, I don't know if one can program an agregate UDF called something
like EXTERNAL_MAX(...) or something, so that we could do like:

        SELECT EXTERNAL_MAX(date, version)  ---> i.e: Returns the "version"
value for the row with MAX(date).

    This, for sure, will be the best solution. ;-)

    Cheers,
    Jose Miguel.


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to