Re: Efficiently finding a random record

Michael Stassen Sun, 15 May 2005 08:50:22 -0700

You are both missing something fundamental: functions are evaluated for each and every row. Your query,

  SELECT * FROM history WHERE id = CAST(RAND() * 2355008 AS UNSIGNED);

gets a random number for *each row*, and returns the row if that row's random number is the same as its id. You are rolling the dice on each row. Because the odds of a match are 1/2355008 and you are rolling the dice 2355008 times, the *expected* number of matches is 1. That is, *on average*, this will return one row, but the outcome on any single running of this query will vary. (This also requires a full table scan, making it slow.)

You want to choose 1 random number, then choose the row with that id. Assuming there are no gaps (not a good assumption in usual practice), this can be accomplished with a user variable and two statements:

  SET @rand_id = CAST(1 + 2355008*RAND() AS UNSIGNED);
  SELECT * FROM history WHERE id = @rand_id;

Because ensuring that there are no gaps in your ids usually requires monkeying with the ids in ways that are considered bad practice, a query that doesn't make that assumption would be better. One solution would be to modify the query to allow several rows to match and then take the first one (using LIMIT). For example, if the selected random id is missing, we take the next id we find, like this:

  SELECT @rand_id:= CAST( 1 + MAX(id)*RAND() AS UNSIGNED) FROM history;
  SELECT * FROM history WHERE id >= @rand_id LIMIT 1;

Both queries should use the index on id and return quickly.

Michael

Gary Huntress wrote:

There's definitely more going on here that I don't understand. Consider this simple query and result:

mysql> select * from history where id = CAST( rand() * 2355008 as unsigned);
+---------+--------+------------+-------+--------+
| id      | symbol | thedate    | close | volume |
+---------+--------+------------+-------+--------+
| 1187308 | MAN    | 2003-05-23 | 33.13 | 444800 |
| 1398426 | NHP    | 2003-08-26 | 14.69 | 215200 |
| 2176684 | UMPQ   | 2004-01-27 | 20.74 |  43500 |
+---------+--------+------------+-------+--------+
3 rows in set (2.67 sec)
ID is an autoincrement field, and there are 2355008 total unique records, yet even though the where clause is a test for equivalence I get three result records. I think I reasonably expected to get 1 and only 1 record.
Puzzled,
Gary
----- Original Message ----- From: <[EMAIL PROTECTED]>
To: "Gary Huntress" <[EMAIL PROTECTED]>
Cc: <mysql@lists.mysql.com>
Sent: Sunday, May 15, 2005 6:42 AM
Subject: Re: Efficiently finding a random record
Hi, i did the test and agree with you. even with Select ID from history where id = 1+CAST( rand() * 199999 as UNSIGNED) limit 1;

when testing : select CAST( rand() * 199999 as UNSIGNED) this always returns a unique value.
So you don't need limit 1.
The workaround i found is : Select ID from history where id >= 1+CAST( rand() * 199999 as UNSIGNED) limit 1;
here limit 1 is usefull, and there no empty set.
Mathias
Selon Gary Huntress <[EMAIL PROTECTED]>:
As it happens, I found this post useful tonight for an almost identical
situation.
I have a table with exactly 200,000 rows, the first column is an
autoincrement ID field.  I am confident that all IDs are consecutive and
there are no holes.
When I do:
Select ID from history where id = CAST( rand() * 199999 + 1 as UNSIGNED) limit 1;

shouldn't it always return a single record? The CAST id should always be in the range from 1 to 200000.

About 50% of the time I get an empty recordset and I'm puzzled. (could be lack of sleep though)

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Re: Efficiently finding a random record

Reply via email to