Ray wrote: > an alternative to the order by rand() with large record sets is to > pick a random starting point "limit $randPoint, 30" don't know if > its a viable solution to your situation, but it limits you to 2 > querys (row count, fetch) rather then the 30 (fetch 1 x 30)
Thanks! I did see this suggested on another forum. However when I tried it, I found that EXPLAIN wasn't very encouraging. Using this minimal table: CREATE TABLE visible_pics ( pic_id int(10) unsigned NOT NULL default '0', doc_id int(10) unsigned NOT NULL default '0', PRIMARY KEY (pic_id), KEY doc_id (doc_id) ) TYPE=MyISAM; mysql> explain select * from visible_pics limit 10000,1; +--------------+------+---------------+------+---------+------+-------+-------+ | table | type | possible_keys | key | key_len | ref | rows | Extra | +--------------+------+---------------+------+---------+------+-------+-------+ | visible_pics | ALL | NULL | NULL | NULL | NULL | 19633 | | +--------------+------+---------------+------+---------+------+-------+-------+ 1 row in set (0.00 sec) mysql> explain select * from visible_pics order by pic_id limit 10000,1; +--------------+-------+---------------+---------+---------+------+-------+-------+ | table | type | possible_keys | key | key_len | ref | rows | Extra | +--------------+-------+---------------+---------+---------+------+-------+-------+ | visible_pics | index | NULL | PRIMARY | 4 | NULL | 19633 | | +--------------+-------+---------------+---------+---------+------+-------+-------+ 1 row in set (0.00 sec) In both cases, the number of rows which will be scanned is close to the total number of rows. I included the second EXPLAIN to see if using pic_id (the primary key) would make any difference. It actually seems to actually be faster without using the index, in my trivial tests: mysql> select * from visible_pics order by pic_id limit 10000,1; +--------+--------+ | pic_id | doc_id | +--------+--------+ | 11669 | 258 | +--------+--------+ 1 row in set (0.09 sec) mysql> select * from visible_pics order by pic_id limit 10100,1; +--------+--------+ | pic_id | doc_id | +--------+--------+ | 11771 | 258 | +--------+--------+ 1 row in set (0.08 sec) mysql> select * from visible_pics limit 10100,1; +--------+--------+ | pic_id | doc_id | +--------+--------+ | 11750 | 258 | +--------+--------+ 1 row in set (0.02 sec) mysql> select * from visible_pics limit 12100,1; +--------+--------+ | pic_id | doc_id | +--------+--------+ | 14085 | 269 | +--------+--------+ 1 row in set (0.02 sec) mysql> select * from visible_pics limit 900,1; +--------+--------+ | pic_id | doc_id | +--------+--------+ | 1100 | 53 | +--------+--------+ 1 row in set (0.01 sec) mysql> select * from visible_pics limit 18000,1; +--------+--------+ | pic_id | doc_id | +--------+--------+ | 20343 | 387 | +--------+--------+ 1 row in set (0.03 sec) mysql> mysql> select * from visible_pics order by pic_id limit 12000,1; +--------+--------+ | pic_id | doc_id | +--------+--------+ | 13857 | 325 | +--------+--------+ 1 row in set (0.10 sec) The last one was just to confirm that there wasn't some kind of disk caching going on that affected the results. The query without using the index was definitely faster. If the average query is about 0.05 second, and you do 30 of them, then that would give about 1.5 seconds for the whole thing. This is in fact worse than just doing the "ORDER BY RAND() LIMIT 30" on the same table: mysql> select * from visible_pics order by rand() limit 30; +--------+--------+ | pic_id | doc_id | +--------+--------+ | 4149 | 98 | | 5030 | 148 | | 1911 | 69 | | 4258 | 105 | | 14131 | 170 | | 17047 | 165 | | 12643 | 319 | | 14271 | 180 | | 1815 | 69 | | 12768 | 260 | | 8118 | 164 | | 2339 | 87 | | 3058 | 63 | | 2573 | 46 | | 11511 | 230 | | 16939 | 335 | | 7749 | 113 | | 6921 | 164 | | 2106 | 79 | | 3609 | 91 | | 12513 | 259 | | 18169 | 234 | | 19173 | 372 | | 11912 | 305 | | 2026 | 69 | | 7697 | 222 | | 20834 | 447 | | 977 | 53 | | 1638 | 24 | | 13986 | 308 | +--------+--------+ 30 rows in set (0.22 sec) This isn't as simple as it appears at first, however - this is merely the query to get 30 random pic_id's. I then have to do 30 more queries to get the "real" records in the separate table, so that I can build the HTML page with filenames, captions etc. Thanks again - this is a good one to know about (for anyone else out there who is encountering the same issues). But the above tests were on a very minimal table, with no "where" clause, because the table was specially prepared to only contain the relevant records in the first place. I am still wondering if I should post my original question about whether "ORDER BY RAND()" will be optimized anytime soon to the "internals" list... I don't want to needlessly cross-post, especially if the same MySQL developers read this list... Thanks again, -Neil -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]