Re: Efficiently finding a random record
Michael Stassen wrote:

> For example, if the selected random id is missing, we take the next id
> we find, like this:
>
>   SELECT @rand_id := CAST(1 + MAX(id)*RAND() AS UNSIGNED) FROM history;
>   SELECT * FROM history WHERE id >= @rand_id ORDER BY id LIMIT 1;

That will have a possibly undesired effect. Records that have gaps in the ids before them will be twice, three times, etc. (depending on the size of the gap) as likely to be selected as records with no preceding gaps.

--
Keith Ivey [EMAIL PROTECTED]
Smokefree DC
http://www.smokefreedc.org
Washington, DC

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]
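Keith's gap bias is easy to see in a quick simulation. The sketch below is not from the thread; the id set is illustrative. It applies the "take the next existing id" rule to ids 1, 5, 6, 7, 8, where ids 2-4 are missing, so id 5 absorbs the probability mass of the gap before it:

```python
import random

# Ids with a gap: 2, 3, and 4 are missing.
ids = [1, 5, 6, 7, 8]
max_id = max(ids)

def next_existing(rand_id, ids):
    """Smallest existing id >= rand_id (the 'next id we find')."""
    return min((i for i in ids if i >= rand_id), default=None)

random.seed(42)
counts = {i: 0 for i in ids}
for _ in range(100_000):
    rand_id = random.randint(1, max_id)   # the CAST(1 + MAX(id)*RAND()) step
    chosen = next_existing(rand_id, ids)
    if chosen is not None:
        counts[chosen] += 1

# Id 5 follows a gap of three missing ids, so it is chosen roughly
# four times as often as id 6, which has no preceding gap.
print(counts[5] / counts[6])
```

The ratio printed is close to 4, matching Keith's "twice, three times, etc., depending on the size of the gap."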
Re: Efficiently finding a random record
Michael Stassen wrote:

>> For example, if the selected random id is missing, we take the next id
>> we find, like this:
>>
>>   SELECT @rand_id := CAST(1 + MAX(id)*RAND() AS UNSIGNED) FROM history;
>>   SELECT * FROM history WHERE id >= @rand_id ORDER BY id LIMIT 1;
>
> That will have a possibly undesired effect. Records that have gaps in
> the ids before them will be twice, three times, etc. (depending on the
> size of the gap) as likely to be selected as records with no preceding
> gaps.

Replace MAX with COUNT and the WHERE clause with an OFFSET, and the gap problem should go away...

-philip
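Philip's COUNT-plus-OFFSET fix can be sketched as follows. SQLite is used here only to keep the demo self-contained (the queries work the same way in MySQL), and the table contents are illustrative:

```python
import random
import sqlite3

# A tiny "history" table with gappy ids: 2-4 are missing.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE history (id INTEGER PRIMARY KEY, symbol TEXT)")
conn.executemany("INSERT INTO history (id, symbol) VALUES (?, ?)",
                 [(i, f"SYM{i}") for i in (1, 5, 6, 7, 8)])

# 1) COUNT the rows, 2) pick a random offset, 3) fetch that one row.
(row_count,) = conn.execute("SELECT COUNT(*) FROM history").fetchone()
offset = random.randrange(row_count)          # 0 .. row_count - 1
row = conn.execute("SELECT * FROM history LIMIT 1 OFFSET ?",
                   (offset,)).fetchone()
print(row)
```

Because the offset ranges over row positions rather than id values, every row is equally likely regardless of gaps; the trade-off is that a large OFFSET still makes the server step through that many rows.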
Re: Efficiently finding a random record
Hi,

I did the test and agree with you. Even with

  Select ID from history where id = 1+CAST( rand() * 19 as UNSIGNED) limit 1;

when testing

  select CAST( rand() * 19 as UNSIGNED)

this always returns a single value, so you don't need limit 1. The workaround I found is:

  Select ID from history where id = 1+CAST( rand() * 19 as UNSIGNED) limit 1;

Here limit 1 is useful, and there is no empty set.

Mathias

Selon Gary Huntress [EMAIL PROTECTED]:

> As it happens, I found this post useful tonight for an almost identical
> situation. I have a table with exactly 200,000 rows; the first column is
> an autoincrement ID field. I am confident that all IDs are consecutive
> and there are no holes. When I do:
>
>   Select ID from history where id = CAST( rand() * 19 + 1 as UNSIGNED) limit 1;
>
> shouldn't it always return a single record? The CAST id should always be
> in the range from 1 to 20. About 50% of the time I get an empty recordset
> and I'm puzzled. (could be lack of sleep though)
>
> Gary
Re: Efficiently finding a random record
There's definitely more going on here that I don't understand. Consider this simple query and result:

mysql> select * from history where id = CAST( rand() * 2355008 as unsigned);
+---------+--------+------------+-------+--------+
| id      | symbol | thedate    | close | volume |
+---------+--------+------------+-------+--------+
| 1187308 | MAN    | 2003-05-23 | 33.13 | 444800 |
| 1398426 | NHP    | 2003-08-26 | 14.69 | 215200 |
| 2176684 | UMPQ   | 2004-01-27 | 20.74 |  43500 |
+---------+--------+------------+-------+--------+
3 rows in set (2.67 sec)

ID is an autoincrement field, and there are 2355008 total unique records, yet even though the where clause is a test for equivalence I get three result records. I think I reasonably expected to get 1 and only 1 record.

Puzzled,
Gary

----- Original Message ----- From: [EMAIL PROTECTED] To: Gary Huntress [EMAIL PROTECTED] Cc: mysql@lists.mysql.com Sent: Sunday, May 15, 2005 6:42 AM Subject: Re: Efficiently finding a random record

> Hi, I did the test and agree with you.
Re: Efficiently finding a random record
You are both missing something fundamental: functions are evaluated for each and every row. Your query,

  SELECT * FROM history WHERE id = CAST(RAND() * 2355008 AS UNSIGNED);

gets a random number for *each row*, and returns the row if that row's random number is the same as its id. You are rolling the dice on each row. Because the odds of a match are 1/2355008 and you are rolling the dice 2355008 times, the *expected* number of matches is 1. That is, *on average* this will return one row, but the outcome of any single run of this query will vary. (This also requires a full table scan, which makes it slow.)

You want to choose 1 random number first, then choose the row with that id. Assuming there are no gaps (not a good assumption in usual practice), this can be accomplished with a user variable and two statements:

  SET @rand_id = CAST(1 + 2355008*RAND() AS UNSIGNED);
  SELECT * FROM history WHERE id = @rand_id;

Because ensuring that there are no gaps in your ids usually requires monkeying with the ids in ways that are considered bad practice, a query that doesn't make that assumption would be better. One solution is to modify the query to allow several rows to match and then take the first one (using LIMIT). For example, if the selected random id is missing, we take the next id we find, like this:

  SELECT @rand_id := CAST( 1 + MAX(id)*RAND() AS UNSIGNED) FROM history;
  SELECT * FROM history WHERE id >= @rand_id ORDER BY id LIMIT 1;

Both queries should use the index on id and return quickly.

Michael

Gary Huntress wrote:

> There's definitely more going on here that I don't understand.
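Michael's two-step approach can be sketched end to end. The demo below uses SQLite (which has no RAND() or user variables, so the random id is drawn in the application, standing in for the SET @rand_id step) and illustrative table contents:

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE history (id INTEGER PRIMARY KEY, symbol TEXT)")
conn.executemany("INSERT INTO history VALUES (?, ?)",
                 [(i, f"SYM{i}") for i in (1, 5, 6, 7, 8)])  # gappy ids

# Step 1: choose ONE random id, outside the WHERE clause, so the
# random function is not re-evaluated per row.
(max_id,) = conn.execute("SELECT MAX(id) FROM history").fetchone()
rand_id = random.randint(1, max_id)

# Step 2: "if the selected id is missing, take the next id we find."
# The >= range plus ORDER BY LIMIT 1 uses the primary-key index.
row = conn.execute(
    "SELECT * FROM history WHERE id >= ? ORDER BY id LIMIT 1",
    (rand_id,)).fetchone()
print(row)
```

This always returns exactly one row without scanning the table; the price, as Keith notes, is that rows after a gap are proportionally more likely to be picked.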
Re: Efficiently finding a random record
Eric Bergen wrote:

> in() can take millions of arguments. Up to max packet size. Try it :)

True, but several past threads have suggested that performance drops dramatically when the size of the IN list gets too large. As IN is equivalent to an equality check for each value in the list, separated by ORs, this is not surprising.

Dan Bolser wrote:

> I heard about a 255 'in' limit. When you say 'thousands of numbers' do
> you mean in the IN or in the column?
Re: Efficiently finding a random record
As it happens, I found this post useful tonight for an almost identical situation. I have a table with exactly 200,000 rows; the first column is an autoincrement ID field. I am confident that all IDs are consecutive and there are no holes. When I do:

  Select ID from history where id = CAST( rand() * 19 + 1 as UNSIGNED) limit 1;

shouldn't it always return a single record? The CAST id should always be in the range from 1 to 20. About 50% of the time I get an empty recordset and I'm puzzled. (could be lack of sleep though)

Gary

----- Original Message ----- From: Eric Bergen [EMAIL PROTECTED] To: Dan Bolser [EMAIL PROTECTED] Cc: Philip Hallstrom [EMAIL PROTECTED]; Brian Dunning [EMAIL PROTECTED]; mysql@lists.mysql.com Sent: Friday, May 13, 2005 7:48 PM Subject: Re: Efficiently finding a random record

> in() can take millions of arguments. Up to max packet size. Try it :)
Efficiently finding a random record
I have a db of about 300,000 records, and when I try to find one random record like this:

  select * from table order by rand() limit 1;

it can take several minutes. My Sherlock Holmes instincts tell me that what I'm doing is somehow inefficient. What is the primary culprit here?
Re: Efficiently finding a random record
At 12:54 PM 5/13/05, Brian Dunning wrote:

> I have a db of about 300,000 records and when I try to find one random
> record like this: select * from table order by rand() limit 1; it can
> take several minutes. My Sherlock Holmes instincts tell me that what
> I'm doing is somehow inefficient. What is the primary culprit here?

Sherlock Holmes would have checked the archives first! There was a discussion about this a couple of weeks ago - Apr. 26.
Re: Efficiently finding a random record
> I have a db of about 300,000 records and when I try to find one random
> record like this: select * from table order by rand() limit 1; it can
> take several minutes. My Sherlock Holmes instincts tell me that what
> I'm doing is somehow inefficient. What is the primary culprit here?

The culprit is that mysql has to assign a random value (via rand()) to each of the 300,000 rows, then order all 300,000 rows by that random value and return the first one. So even though you're just retrieving one value, you're processing 300,000 rows.

You'd be better off doing something like this in your application:

  row_count  = get row count;
  random_row = get random number from 0 to row_count - 1;
  result     = db query LIMIT 1 OFFSET random_row

or something like that...

-philip
Re: Efficiently finding a random record
Even better is if you have an integer primary key (think auto_increment) and use in(). So if you want 10 random rows, in your app generate 20 or so random numbers and do something like

  select col1 from t where x in (1, 5, 3, 2...) limit 10

check num rows, and if you don't get enough, generate more random numbers and try again. in() is blazing fast even with thousands of numbers, so don't be afraid to kick a few extra in.

-Eric

Philip Hallstrom wrote:

> The culprit is that mysql has to assign a random value (via rand()) to
> each of the 300,000 rows, then order all 300,000 rows by that random
> value and return the first one. So even though you're just retrieving
> one value, you're processing 300,000 rows.
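Eric's over-generate-and-top-up loop might look like the sketch below. SQLite stands in for MySQL, and the table name `t`, column names, and row counts are illustrative, not from the thread:

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER PRIMARY KEY, col1 TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(i, f"row{i}") for i in range(1, 301)])

wanted = 10
rows, seen = [], set()
while len(rows) < wanted:
    need = wanted - len(rows)
    # Over-generate: ask for ~2x as many candidate ids as still needed,
    # since some may fall into gaps and match nothing.
    candidates = random.sample(range(1, 301), 2 * need)
    marks = ",".join("?" * len(candidates))
    for row in conn.execute(
            f"SELECT x, col1 FROM t WHERE x IN ({marks}) LIMIT ?",
            (*candidates, need)):
        if row[0] not in seen:        # skip ids fetched in an earlier pass
            seen.add(row[0])
            rows.append(row)
print(len(rows))
```

Each IN() probe is an indexed primary-key lookup, which is why the approach stays fast even with generous over-generation; the loop simply repeats until enough distinct rows have been collected.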
Re: Efficiently finding a random record
On Fri, 13 May 2005, Eric Bergen wrote:

> Even better is if you have an integer primary key (think auto_increment)
> and use in(). So if you want 10 random rows in your app generate 20 or
> so random numbers and do something like select col1 from t where x in
> (1, 5, 3, 2...) limit 10, check num rows and if you don't get enough
> generate more random numbers and try again. in() is blazing fast even
> with thousands of numbers so don't be afraid to kick a few extra in.

I heard about a 255 'in' limit. When you say 'thousands of numbers' do you mean in the IN or in the column?
Re: Efficiently finding a random record
in() can take millions of arguments. Up to max packet size. Try it :)

Dan Bolser wrote:

> I heard about a 255 'in' limit. When you say 'thousands of numbers' do
> you mean in the IN or in the column?