This is the latest approach we're using now. It turns out that the query to get Total 
Plays for an Album per City can be taken out of the City for loop & done only once as 
a GROUP BY. 
The worst case time for MySQL is now 16 seconds compared to Oracle's 8 seconds. On 
most cases MySQL is now faster than Oracle but once in a while it's slower. 

 $query = "SELECT metro, sum(td_num) FROM album_by_metro
                  WHERE same_album_id = $album
                  AND rpt_dt >= '$BeginDate' AND rpt_dt <= '$EndDate'
       GROUP BY metro"; 
$start = time(); 
 $result = mysql_query($query) or die ("Can't get $query" . mysql_error()); 

 while ($row = mysql_fetch_array($result)) 
 { 
  $metro = $row[0]; 
  $metros[] = $metro; 
  $V_SUMS[$metro] = $row[1];
 }
// while 
 mysql_free_result($result); 
// 
 foreach ($metros as $metro)
 {
  $V_SUM = $V_SUMS[$metro]; 
  $query = "SELECT same_album_id, SUM(td_num) FROM album_by_metro
    WHERE  metro = '$metro' AND rpt_dt >= '$BeginDate' AND rpt_dt <= '$EndDate'
      GROUP BY same_album_id HAVING SUM(td_num) > $V_SUM";
  
  $result = mysql_query($query, $connection);
  if ($result)
   $rank = mysql_num_rows($result)+1;
  else
   $rank = 0; 
 $ranks[$metro] = $rank; 
 }


Below is everything you asked for except the Oracle stored procedure which I can't 
send.
thanks for your help,
- Sam.
mysql> describe album_by_metro;
+---------------+--------------+------+-----+------------+-------+--------------
--------+
| Field         | Type         | Null | Key | Default    | Extra | Privileges
        |
+---------------+--------------+------+-----+------------+-------+--------------
--------+
| rpt_dt        | date         |      | MUL | 0000-00-00 |       | select,insert
,update |
| region        | varchar(50)  |      |     |            |       | select,insert
,update |
| metro         | varchar(100) |      | MUL |            |       | select,insert
,update |
| same_album_id | int(11)      |      | MUL | 0          |       | select,insert
,update |
| td_num        | int(11)      |      |     | 0          |       | select,insert
,update |
 
mysql> EXPLAIN SELECT same_album_id, SUM(td_num) FROM album_by_metro
    WHERE  metro = 'Atlanta' AND rpt_dt >= '2001-01-01' AND
     rpt_dt <= '2001-02-14'
      GROUP BY same_album_id HAVING SUM(td_num) > 500
mysql> show index from album_by_metro;
+----------------+------------+---------------------+--------------+------------
---+-----------+-------------+----------+--------+---------+
| Table          | Non_unique | Key_name            | Seq_in_index | Column_name
   | Collation | Cardinality | Sub_part | Packed | Comment |
+----------------+------------+---------------------+--------------+------------
---+-----------+-------------+----------+--------+---------+
| album_by_metro |          1 | rpt_dt              |            1 | rpt_dt
   | A         |        NULL |     NULL | NULL   |         |
| album_by_metro |          1 | rpt_dt              |            2 | region
   | A         |        NULL |     NULL | NULL   |         |
| album_by_metro |          1 | rpt_dt              |            3 | metro
   | A         |        NULL |     NULL | NULL   |         |
| album_by_metro |          1 | rpt_dt              |            4 | same_album_
id | A         |        NULL |     NULL | NULL   |         |
| album_by_metro |          1 | rpt_dt              |            5 | td_num
   | A         |        NULL |     NULL | NULL   |         |
| album_by_metro |          1 | album_by_metro_idx2 |            1 | metro
   | A         |        NULL |     NULL | NULL   |         |
| album_by_metro |          1 | album_by_metro_idx2 |            2 | rpt_dt
   | A         |        NULL |     NULL | NULL   |         |
| album_by_metro |          1 | album_by_metro_idx1 |            1 | same_album_
id | A         |        NULL |     NULL | NULL   |         |
| album_by_metro |          1 | album_by_metro_idx1 |            2 | rpt_dt
   | A         |        NULL |     NULL | NULL   |         |
+----------------+------------+---------------------+--------------+------------
---+-----------+-------------+----------+--------+---------+
 
  Benjamin Pflugmann <[EMAIL PROTECTED]> wrote: 
Hi.

On Thu, Feb 15, 2001 at 09:25:56PM -0800, [EMAIL PROTECTED] wrote:
> 
> > Could you please post the result of "EXPLAIN" for both queries and "SHOW INDEX 
>FROM album_by_city".
> 
> 
> I haven't used EXPLAIN before.

Just put it before the SELECT in your query. I don't know whether this
will work directly in your program (php?), you can do it another way,
too. Let's take your first query:

SELECT SUM(total_plays) FROM album_by_city WHERE album_id = $album_id
AND city = $this_city AND date >= $StartDate AND date <= $EndDate;

You have to replace all variables by some real values. (best if you
simply put an additional echo in your php(?) program).

Assume $album_id=10, $this_city="New York", $StartDate="2001-01-01"
and $EndDate="2001-02-14" would be valid values (replace them by
whatever correct values are), then your select looks like

SELECT SUM(total_plays) FROM album_by_city WHERE album_id=10 AND
city="New York" AND date >= "2001-01-01" AND date <= "2001-02-14";

You can execute this in the mysql command line client. And then simply
add EXPLAIN to it:

EXPLAIN SELECT SUM(total_plays) FROM album_by_city WHERE album_id=10 AND
city="New York" AND date >= "2001-01-01" AND date <= "2001-02-14";

And do the same for the second query.

> There are indexes on each column.

That's not enough. To use index best with the above query, you would
need an combined index on (album_id,city,date) or maybe even
(album_id,city,date,total_plays), which maybe could avoid to read the
data file at all.

So, please post the result of SHOW INDEX FROM album_by_city, which
enables us to give better help (instead of guessing around).

> There are about 100 cities or regions that are used for ranking.

Hm. 200 queries, half a second per query does not seem too bad with
only a single column index.

> > Additionally, I would be interested, if you used the same method with
> 
> > Oracle or another. If so, would you please post it, too.
> 
> I didn't write the Oracle queries but they used sub selects and all
> kinds of not available in MySQL optimizations.

Again, if you are not forbidden to do so, please post it. Maybe it
took an approach I am not thinking of. It would render it unnecessary
that we have to start from scratch.

To be true, I am a bit nerved. I quite concretely asked for
information to be able to help, and you somehow did not bother to give
specific answers and I had to ask the same questions again.

I would appreciate, if you take the time to gather the information and
post what I asked for, even if you do not see immediate need, as I
have perfectly my reasons. Please do your part to enable us to do
ours. Thank you.

Bye,

Benjamin.


> 
> Thanks for your reply.
> 
> - Sam.
> Benjamin Pflugmann 
wrote: 
> Hi.
> 
> Could you please post the result of "EXPLAIN" for both queries and
> "SHOW INDEX FROM album_by_city".
> 
> And, for how much cities does this gets executed?
> 
> Additionally, I would be interested, if you used the same method with
> Oracle or another. If so, would you please post it, too.
> 
> Bye,
> 
> Benjamin.
> 
> 
> On Thu, Feb 15, 2001 at 10:47:16AM -0800, [EMAIL PROTECTED] wrote:
> > 
> > I'm moving over a music play database from Oracle to MySQL. So far MySQL is faster 
>for all reports except this one where Oracle takes 8 seconds & MySQL will take 108 
>with my current method.
> [...]
> > I want to report the ranking of an album's plays (against all other albums ) for 
>every city over a time period.
> > National album ranking for "Beatles #1" between 1/1/2001 and 2/14/2001 :
> > New York #1
> > Chicago #2
> > Boston #1
> > Miami #3
> > 
> > Right now the current approach involves looping over every city & making 2 
>queries. First query to get the total # of plays for this album for the time period 
>in that city. Next query to figure out how many other albums for that city & for that 
>time have total play sums that are greater, which is the Rank.
> > 
> > foreach $cities as $this_city
> > {
> > $this_album_total_plays = "select sum(total_plays) from album_by_city
> > where album_id = $album_id
> > and city = $this_city
> > and date >= $StartDate and date <= $EndDate";
> > 
> > $rank = "select album_id, sum(total_plays) from album_by_city
> > where city = $this_city
> > and date >= $StartDate and date <= $EndDate
> > group by album_id
> > having sum(total_plays) > $this_album_total_plays";
> > echo "$this_city $rank";
> > }
> > Is there a way to do this with fewer queries or perhaps temporary tables for 
>greater speed ?


---------------------------------
Do You Yahoo!?
- Get personalized email addresses from Yahoo! Mail Personal Address  - only $35 a 
year!

Reply via email to