I think Roger was actually on the right track with his initial suggestion that this is a groupwise maximum problem as described in the manual page he referenced. Try this:

  CREATE TEMPORARY TABLE topten (k1 CHAR(1), total_amt int);

  LOCK TABLES Z AS x READ, Z AS y READ;

  INSERT INTO topten
  SELECT x.k1, x.total_amt
  FROM Z x JOIN Z y ON x.k1 = y.k1
  GROUP BY x.k1, x.total_amt
  HAVING SUM(y.total_amt > x.total_amt) < 10
  ORDER BY x.k1, x.total_amt DESC;

  SELECT x.*
  FROM Z AS x
  JOIN topten ON x.k1=topten.k1 AND x.total_amt = topten.total_amt
  ORDER BY x.k1, x.total_amt DESC;

  UNLOCK TABLES;
  DROP TABLE topten;

If you want the top N, you only have to change the 10 to N in the HAVING clause. It's still a bit expensive, but you only join the table to itself once regardless of N, so it should scale a bit better.

Michael

Michael Stassen wrote:


Michael Stassen wrote:

Don't bother. This is a very expensive solution. You get nearly a Cartesian product on each JOIN. I've got a 40 row test table with 20 values in each of 2 groups. The top 3 version of this examines 2302 rows to produce the 3 values for each of the 2 groups. The top 10 version has been running for several minutes...


It just finished:

+------+-------+--------+-------+--------+-------+-------+---------+--------+-------+-------+

| g | first | second | third | fourth | fifth | sixth | seventh | eighth | ninth | tenth |
+------+-------+--------+-------+--------+-------+-------+---------+--------+-------+-------+


| a | 392 | 339 | 332 | 330 | 304 | 279 | 271 | 250 | 183 | 179 |
| b | 390 | 338 | 302 | 273 | 272 | 268 | 245 | 215 | 211 | 189 |
+------+-------+--------+-------+--------+-------+-------+---------+--------+-------+-------+


2 rows in set (7 min 41.06 sec)

Nearly 8 minutes to get the top 10 for two 20-row groups. This definitely doesn't scale.

Michael



-- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]



Reply via email to