>-----Original Message----- >From: Rob Wultsch [mailto:[EMAIL PROTECTED] >Sent: Saturday, May 17, 2008 6:47 PM >To: Scott Haneda >Cc: mysql@lists.mysql.com >Subject: Re: improving random record selection > >On Sat, May 17, 2008 at 2:32 PM, Scott Haneda <[EMAIL PROTECTED]> >wrote: >> $sql = " >> SELECT storage_path, image_md5, t.id >> FROM images AS t >> JOIN >> (SELECT CEIL(MAX(id)*RAND()) AS id FROM images) >AS x >> ON (t.id >= x.id) >> AND (t.approved = 1) AND (t.ip_address != >> '$exclude_ip') LIMIT 1"; >I am going to reformat (whitespace only) your query a bit to start out >with: >SELECT storage_path, image_md5, t.id >FROM images AS t > JOIN ( > SELECT CEIL( > MAX(id)*RAND() > ) AS id > FROM images > ) AS x ON (t.id >= x.id) > AND (t.approved = 1) > AND (t.ip_address != '$exclude_ip') >LIMIT 1 > >I am going to break this up a bit: >SELECT storage_path, image_md5, t.id >FROM images AS t >JOIN .... >This should be mostly self explanatory. t.id specifies the table >because id is ambiguous (x.id is created later on). Please note that I >never use JOIN by itself. I would make this an INNER JOIN. > >SELECT CEIL( > MAX(id)*RAND() > ) AS id >FROM images AS x > >MAX(id) find the largest id that currently exists. This value is then >multiplied by whatever rand returns, which would be a between 0 and 1. >The result of the multiplication is then rounded up, and aliased as >id. The the table (of one row) is then aliased as x. So you now have >x.id which is a random number between 0 and the largest id value that >currently exists. > >ON (t.id >= x.id) > AND (t.approved = 1) > AND (t.ip_address != '$exclude_ip') >Finally we have your JOIN condition. It says, for the table aliased as >t, the id must be great than or equal to x.id (which was explained >above). This will eliminate some portion of the images table from the >possibility of being selected. Next all rows in the same table where >approved is not equal to 1 should be removed. Finally all rows that >fail t.ip_address != '$exclude_ip' get excluded. > >LIMIT 1 >Only return one row. > >Problems: >1. You should be using: > AND (t.approved = 1) > AND (t.ip_address != '$exclude_ip') >in the subquery. If x.id is larger than the largest row that fits >those conditions you will get no results. >2. There is no ORDER BY clause. There is nothing telling MySQL use the >t.id which is next largest value above x.id. MySQL will probably pick >out the right row, because they are probably stored in order. You >probably can get away with not having the ORDER BY clause, and it will >cost you extra cycles. How many extra cycles depends on how out of >order the table is. You can reorder the row by id using: >ALTER TABLE images ORDER BY id; >3. If the holes in your data are not distributed equally... > >Suggested new query: >SELECT storage_path, image_md5, t.id >FROM images AS t > INNER JOIN ( > SELECT CEIL( > MAX(id)*RAND() > ) AS id > FROM images > WHERE x.approved = 1 > AND x.ip_address != '$exclude_ip' > ) AS x ON (t.id >= x.id) >ORDER BY t.id ASC >LIMIT 1 > [JS] I might not understand what this is doing, but I think it will preferentially sample the ids that are at the end of a gap. >-- >Rob Wultsch >[EMAIL PROTECTED] >wultsch (aim) > >-- >MySQL General Mailing List >For list archives: http://lists.mysql.com/mysql >To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED] >infoshop.com
-- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]