Re: ranking album plays by city, help me beat Oracle

Benjamin Pflugmann Thu, 22 Feb 2001 02:14:08 -0800
Hello.

On Tue, Feb 20, 2001 at 03:27:08PM -0800, [EMAIL PROTECTED] wrote:
> > Huh? You got it 7 times faster by only extracting that query from the
> > for loop and changing it to a group by? Or did you do anything else?
> 
> Sorry there were some other unnecessary queries that were mistakenly
> in the measurement..

Ah, okay. Never mind, but good to know.

> Here is the missing EXPLAIN.
> 
> mysql> EXPLAIN SELECT same_album_id, SUM(td_num) FROM album_by_metro
>     -> WHERE metro = 'Atlanta' AND rpt_dt >= '2001-01-01' AND
>     -> rpt_dt <= '2001-02-14'
>     -> GROUP BY same_album_id HAVING SUM(td_num) > 500
>     -> ;
> +----------------+-------+----------------------------+---------------------+---
> ------+------+------+-----------------------------+
> | table          | type  | possible_keys              | key                 | ke
> y_len | ref  | rows | Extra                       |
> +----------------+-------+----------------------------+---------------------+---
> ------+------+------+-----------------------------+
> | album_by_metro | range | rpt_dt,album_by_metro_idx2 | album_by_metro_idx2 |
>   103 | NULL | 8925 | where used; Using temporary |
> +----------------+-------+----------------------------+---------------------+---
> ------+------+------+-----------------------------+
> 1 row in set (0.00 sec)

"Using temporary" is not good (see also http://www.mysql.com/doc/E/X/EXPLAIN.html).
Hm. I don't know, why a temporary table is used here. Two suggestions:
Make a special key for that query (metro,rpt_dt,same_album_id,td_sum).
And second, force in-memory tables with SELECT SQL_SMALL_RESULT
same_album_id, ...

Could you post the EXPLAIN and the speed measurement after each of the
changes, please.

> Sorry I tried the below suggested query but it it is much slower at
> 1min 30 secs probably because it ranks all albums for all metros
> rather than just working on the rank of one album over the metros.

Hm. Sorry, I don't understand your argument. By the for loop, your
query does the same, doesn't it (selecting the rank of one album in
each metro is the same as selecting for all metros the rank of one
album)?

But, of course, yes, the time is way off.

> mysql> EXPLAIN SELECT metro, same_album_id, SUM(td_num) AS played
>     -> FROM album_by_metro
>     -> WHERE rpt_dt >= '2001-01-01' AND rpt_dt <= '2001-02-14'
>     -> GROUP BY metro, same_album_id
>     -> HAVING SUM(td_num) > 1000
>     -> ORDER BY metro, same_album_id, played DESC;
> +----------------+-------+---------------+--------+---------+------+--------+---
> -------------------------------------------------------+
> | table          | type  | possible_keys | key    | key_len | ref  | rows   | Ex
> tra                                                    |
> +----------------+-------+---------------+--------+---------+------+--------+---
> -------------------------------------------------------+
> | album_by_metro | index | rpt_dt        | rpt_dt |     161 | NULL | 716943 | wh
> ere used; Using index; Using temporary; Using filesort |
> +----------------+-------+---------------+--------+---------+------+--------+---
> ------------------------------------------------------+

Hm. rpt_dt used as I guessed and the number of rows seems reasonable
(about 100 times as much as with the query above). "using index" is
fine (key_len indicates that not only rpt_dt is read from the key, but
also all the other column values). But "using temporary" and "using
filesort" is considered bad.

We could use a better key, leaving out the column "region", e.g.
(rpt_dt,metro,same_album_id,td_sum), but I guess, this will give only
a small improvement (would you try it please?).

The use of a temporary table is obligatory when GROUP BY and ORDER BY
differ. One could change ORDER BY to "metro, same_album_id" and do the
sorting on client side, but I doubt this would be a better solution.

I guess the real problem is that the temporary table is created on
disk, and this would be *slow*. One can force in-memory temporary
tables by using SELECT SQL_SMALL_RESULT metro... (the temporary table
should be no bigger than, hm, about 500*100*160 Bytes = 8MB... I hope
that is acceptable with your hardware.)

Would you please try it?

We probably should bother about the filesort, because with the GROUP
BY (and the suggested index above), we already have indirectly sorted
on (metro,same_album_id) and only need to sort on the last column
(played).

I am really curious, whether my arguments hold, and therefore what
your test results are. Thank you in advance.

> Can this faster approach below which in a loop queries the # of
> albums with sums be optimized to be a single query. ?

I believe no, my query above was the best try I could give this.

> Thanks for all your help & interest.

Your are welcome.

Bye,

        Benjamin.


---------------------------------------------------------------------
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/           (the list archive)

To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
Re: ranking album plays by city, help me beat Oracle

Reply via email to