Ray wrote:
> an alternative to the order by rand() with large record sets is to
> pick a random starting point "limit $randPoint, 30"  don't know if
> its a viable solution to your situation, but it limits you to 2
> querys (row count, fetch) rather then the 30 (fetch 1 x 30)

Thanks! I did see this suggested on another forum. However when I tried
it, I found that EXPLAIN wasn't very encouraging. Using this minimal
table:

CREATE TABLE visible_pics (
  pic_id int(10) unsigned NOT NULL default '0',
  doc_id int(10) unsigned NOT NULL default '0',
  PRIMARY KEY  (pic_id),
  KEY doc_id (doc_id)
) TYPE=MyISAM;

mysql> explain select * from visible_pics limit 10000,1;
+--------------+------+---------------+------+---------+------+-------+-------+
| table        | type | possible_keys | key  | key_len | ref  | rows  |
Extra |
+--------------+------+---------------+------+---------+------+-------+-------+
| visible_pics | ALL  | NULL          | NULL |    NULL | NULL | 19633
|       |
+--------------+------+---------------+------+---------+------+-------+-------+
1 row in set (0.00 sec)

mysql> explain select * from visible_pics order by pic_id limit 10000,1;
+--------------+-------+---------------+---------+---------+------+-------+-------+
| table        | type  | possible_keys | key     | key_len | ref  |
rows  | Extra |
+--------------+-------+---------------+---------+---------+------+-------+-------+
| visible_pics | index | NULL          | PRIMARY |       4 | NULL |
19633 |       |
+--------------+-------+---------------+---------+---------+------+-------+-------+
1 row in set (0.00 sec)

In both cases, the number of rows which will be scanned is close to the
total number of rows. I included the second EXPLAIN to see if using
pic_id (the primary key) would make any difference. It actually seems to
actually be faster without using the index, in my trivial tests:

mysql> select * from visible_pics order by pic_id limit 10000,1;
+--------+--------+
| pic_id | doc_id |
+--------+--------+
|  11669 |    258 |
+--------+--------+
1 row in set (0.09 sec)

mysql> select * from visible_pics order by pic_id limit 10100,1;
+--------+--------+
| pic_id | doc_id |
+--------+--------+
|  11771 |    258 |
+--------+--------+
1 row in set (0.08 sec)

mysql> select * from visible_pics limit 10100,1;
+--------+--------+
| pic_id | doc_id |
+--------+--------+
|  11750 |    258 |
+--------+--------+
1 row in set (0.02 sec)

mysql> select * from visible_pics limit 12100,1;
+--------+--------+
| pic_id | doc_id |
+--------+--------+
|  14085 |    269 |
+--------+--------+
1 row in set (0.02 sec)

mysql> select * from visible_pics limit 900,1;
+--------+--------+
| pic_id | doc_id |
+--------+--------+
|   1100 |     53 |
+--------+--------+
1 row in set (0.01 sec)

mysql> select * from visible_pics limit 18000,1;
+--------+--------+
| pic_id | doc_id |
+--------+--------+
|  20343 |    387 |
+--------+--------+
1 row in set (0.03 sec)

mysql> 
mysql> select * from visible_pics order by pic_id limit 12000,1;
+--------+--------+
| pic_id | doc_id |
+--------+--------+
|  13857 |    325 |
+--------+--------+
1 row in set (0.10 sec)

The last one was just to confirm that there wasn't some kind of disk
caching going on that affected the results. The query without using the
index was definitely faster. If the average query is about 0.05 second,
and you do 30 of them, then that would give about 1.5 seconds for the
whole thing. This is in fact worse than just doing the "ORDER BY RAND()
LIMIT 30" on the same table:

mysql> select * from visible_pics order by rand() limit 30;
+--------+--------+
| pic_id | doc_id |
+--------+--------+
|   4149 |     98 |
|   5030 |    148 |
|   1911 |     69 |
|   4258 |    105 |
|  14131 |    170 |
|  17047 |    165 |
|  12643 |    319 |
|  14271 |    180 |
|   1815 |     69 |
|  12768 |    260 |
|   8118 |    164 |
|   2339 |     87 |
|   3058 |     63 |
|   2573 |     46 |
|  11511 |    230 |
|  16939 |    335 |
|   7749 |    113 |
|   6921 |    164 |
|   2106 |     79 |
|   3609 |     91 |
|  12513 |    259 |
|  18169 |    234 |
|  19173 |    372 |
|  11912 |    305 |
|   2026 |     69 |
|   7697 |    222 |
|  20834 |    447 |
|    977 |     53 |
|   1638 |     24 |
|  13986 |    308 |
+--------+--------+
30 rows in set (0.22 sec)

This isn't as simple as it appears at first, however - this is merely
the query to get 30 random pic_id's. I then have to do 30 more queries
to get the "real" records in the separate table, so that I can build the
HTML page with filenames, captions etc.

Thanks again - this is a good one to know about (for anyone else out
there who is encountering the same issues). But the above tests were on
a very minimal table, with no "where" clause, because the table was
specially prepared to only contain the relevant records in the first
place.

I am still wondering if I should post my original question about whether
"ORDER BY RAND()" will be optimized anytime soon to the "internals"
list... I don't want to needlessly cross-post, especially if the same
MySQL developers read this list...

Thanks again,

-Neil

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to