>-----Original Message-----
>From: Rob Wultsch [mailto:[EMAIL PROTECTED]
>Sent: Saturday, May 17, 2008 6:47 PM
>To: Scott Haneda
>Cc: mysql@lists.mysql.com
>Subject: Re: improving random record selection
>
>On Sat, May 17, 2008 at 2:32 PM, Scott Haneda <[EMAIL PROTECTED]>
>wrote:
>>                $sql = "
>>                SELECT storage_path, image_md5, t.id
>>                FROM images AS t
>>                JOIN
>>                        (SELECT CEIL(MAX(id)*RAND()) AS id FROM images)
>AS x
>> ON (t.id >= x.id)
>>                         AND (t.approved = 1) AND (t.ip_address !=
>> '$exclude_ip') LIMIT 1";
>I am going to reformat (whitespace only) your query a bit to start out
>with:
>SELECT storage_path, image_md5, t.id
>FROM images AS t
>      JOIN  (
>             SELECT CEIL(
>                                   MAX(id)*RAND()
>                               ) AS id
>              FROM images
>      ) AS x ON (t.id >= x.id)
>                      AND (t.approved = 1)
>                      AND (t.ip_address != '$exclude_ip')
>LIMIT 1
>
>I am going to break this up a bit:
>SELECT storage_path, image_md5, t.id
>FROM images AS t
>JOIN ....
>This should be mostly self explanatory. t.id specifies the table
>because id is ambiguous (x.id is created later on). Please note that I
>never use JOIN by itself. I would make this an INNER JOIN.
>
>SELECT CEIL(
>                        MAX(id)*RAND()
>                     ) AS id
>FROM images AS x
>
>MAX(id) find the largest id that currently exists. This value is then
>multiplied by whatever rand returns, which would be a between 0 and 1.
>The result of the multiplication is then rounded up, and aliased as
>id. The the table (of one row) is then aliased as x. So you now have
>x.id which is a random number between 0 and the largest id value that
>currently exists.
>
>ON (t.id >= x.id)
>      AND (t.approved = 1)
>      AND (t.ip_address != '$exclude_ip')
>Finally we have your JOIN condition. It says, for the table aliased as
>t, the id must be great than or equal to x.id (which was explained
>above). This will eliminate some portion of the images table from the
>possibility of being selected. Next all rows in the same table where
>approved is not equal to 1 should be removed. Finally all rows that
>fail t.ip_address != '$exclude_ip' get excluded.
>
>LIMIT 1
>Only return one row.
>
>Problems:
>1.  You should be using:
>      AND (t.approved = 1)
>      AND (t.ip_address != '$exclude_ip')
>in the subquery. If x.id is larger than the largest row that fits
>those conditions you will get no results.
>2. There is no ORDER BY clause. There is nothing telling MySQL use the
>t.id which is next largest value above x.id. MySQL will probably pick
>out the right row, because they are probably stored in order. You
>probably can get away with not having the ORDER BY clause, and it will
>cost you extra cycles. How many extra cycles depends on how out of
>order the table is. You can reorder the row by id using:
>ALTER TABLE images ORDER BY  id;
>3. If the holes in your data are not distributed equally...
>
>Suggested new query:
>SELECT storage_path, image_md5, t.id
>FROM images AS t
>      INNER JOIN  (
>            SELECT CEIL(
>                                  MAX(id)*RAND()
>                                 ) AS id
>            FROM images
>            WHERE x.approved = 1
>                      AND x.ip_address != '$exclude_ip'
>      ) AS x ON (t.id >= x.id)
>ORDER BY t.id ASC
>LIMIT 1
>
[JS] I might not understand what this is doing, but I think it will
preferentially sample the ids that are at the end of a gap.
>--
>Rob Wultsch
>[EMAIL PROTECTED]
>wultsch (aim)
>
>--
>MySQL General Mailing List
>For list archives: http://lists.mysql.com/mysql
>To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]
>infoshop.com





-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to