Hi all,

I am using MySQL 4.0.x to run a community website which has (among other
things) over 19,000 pictures. There is a page that selects 30 random
thumbnails. I have noticed that the performance of "ORDER BY RAND()" on
this table has a significant impact on performace. I have all the
relevant indexes defined, and I have researched this issue on the Web.
It seems that other people have also encountered a performance hit while
using ORDER BY RAND(). The reason appears to be that when you do EXPLAIN
on a query using this, MySQL reports "Using temporary; Using filesort",
which is the worst possible result. Also, the number of rows reported is
pretty much the entire set. So, presumably, the current implementation
of ORDER BY RAND() means that MySQL has to traverse the entire table,
regardless of other indexes.

There are, of course, other ways to get around this, but they are all
more complex than simply using ORDER BY RAND(). I think that selecting a
random number of records from a table is something that a lot of
websites would like to be able to do, and so as datasets get larger it
would be nice to see this function scale well. For anyone who has a
website with a large archive of data, the ability to present a random
selection of this data is very useful.

I would like to know if anyone knows if the MySQL team is aware of this
problem, and if so whether they are planning on improving it at any
point. I ask mainly because if I am told that "yes, it'll be much better
in version X" then I can live with the couple of seconds that it takes
currently, knowing that this will be better down the line. However if I
am advised that this is a fundamentally hard problem for whatever
reason, then I will put the effort into reworking my tables to use an
alternative solution.

The only real solution that I can see which is fast is to make another
table which contains just the unique IDs of the pictures that are
visible (there are others which are not publicly visible, and which
shouldn't be included in the random query, so making a separate table
with the appropriate subset makes sense for performance). This new table
will have a primary key which is a numeric "sequence" field. Every
record will have its own sequence number, going from 1 up to the number
of records. Then, instead of doing one query with ORDER BY RAND() LIMIT
30, I can instead do 30 queries, each with a different random sequence
(generated from Perl), which will look up the unique sequence number.
Since this is a primary key, it will be very fast, so that doing 30
queries will not have a big performance impact. However this scheme
requires that the sequences in the new table be kept very consistent -
for example, if a picture is removed from the sequence then the sequence
numbers above that record have to be updated. This introduces potential
for error, but it is a possible solution. I don't want to implement it,
obviously, if ORDER BY RAND() is slated for improvement.

Thanks for any ideas or insights...

-Neil Gunton

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to