>From: Scott Haneda [mailto:[EMAIL PROTECTED]
>Sent: Saturday, May 17, 2008 5:32 PM
>To: mysql@lists.mysql.com
>Subject: improving random record selection
>
>I posted this a month or so ago, and was helped a little, but I am now
>back.
>
>Currently I use select x, y, z from images where (condition) order by
>rand() limit 1;
>
>As most know, it is slow, depending on the record set, and what I
>compare it to, it can be from one order of magnitude slower, to several.
>
>I have cobbled together a solution, but it on occasion, returns an
>empty result set, which is causing me grief. I should mention, there
>are holes in my id column, and I am needing to select a set based on a
>condition.
>
>The below sql I do not fully understand either, if someone could step
>me through it, I would certainly appreciate it, though my main goal is
>to figure out why I get an empty set on occasion.
>
>               $sql = "
>               SELECT storage_path, image_md5, t.id
>               FROM images AS t
>               JOIN
>                       (SELECT CEIL(MAX(id)*RAND()) AS id FROM images) AS x
>ON (t.id >=
>x.id)
>                        AND (t.approved = 1) AND (t.ip_address !=
>'$exclude_ip') LIMIT 1";
>
>* I could almost live with the slow speed of an order by rand() but I
>find it has a less than even distribution.  While it indeed may be
>very random, I am looking for a pretty flat response in distribution
>of returned records over time.
[JS] You say you want a "flat" distribution; by that I think you mean that
the probability of selecting any given record is the same. If you have gaps
in your data, I can't think of any way to do that other than be assigning a
unique and sequential ID to each record. If you ever delete a record, you'd
have to renumber the remaining ones. Then you'd pick off a random value for
this unique ID.

At first glance, this seems to be the only way to avoid sampling errors.

If there were some way of setting a cursor to an arbitrary record, that
would work very well; but you don't want to be stepping sequentially through
(on average) half or your records.
>--
>Scott
>[EMAIL PROTECTED]
>
>
>--
>MySQL General Mailing List
>For list archives: http://lists.mysql.com/mysql
>To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]
>infoshop.com





-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to