Re: Efficiently finding a random record

2005-05-16 Thread Keith Ivey
Michael Stassen wrote:
For example, if the selected random id is 
missing, we take the next id we find, like this:

  SELECT @rand_id:= CAST( 1 + MAX(id)*RAND() AS UNSIGNED) FROM history;
  SELECT * FROM history WHERE id >= @rand_id LIMIT 1;
That will have a possibly undesired effect: records preceded by a gap in the IDs 
will be twice, three times, etc. (depending on the size of the gap) as likely 
to be selected as records with no preceding gap.

--
Keith Ivey [EMAIL PROTECTED]
Smokefree DC
http://www.smokefreedc.org
Washington, DC
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]


Re: Efficiently finding a random record

2005-05-16 Thread Philip Hallstrom
Michael Stassen wrote:
For example, if the selected random id is missing, we take the next id we 
find, like this:

  SELECT @rand_id:= CAST( 1 + MAX(id)*RAND() AS UNSIGNED) FROM history;
  SELECT * FROM history WHERE id >= @rand_id LIMIT 1;
That will have a possibly undesired effect: records preceded by a gap in the 
IDs will be twice, three times, etc. (depending on the size of the gap) as 
likely to be selected as records with no preceding gap.
Replace MAX(id) with COUNT(*) and the WHERE clause with an OFFSET, and the gap 
problem should go away...

-philip
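A sketch of Philip's COUNT-plus-OFFSET variant, run against SQLite from Python so it is self-contained (table and data are made up; in MySQL the offset would likewise be computed client-side, since LIMIT won't accept a user variable):

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE history (id INTEGER PRIMARY KEY, symbol TEXT)")
# Gaps on purpose: ids 3..9 are missing.
cur.executemany("INSERT INTO history VALUES (?, ?)",
                [(1, "MAN"), (2, "NHP"), (10, "UMPQ")])

# COUNT(*) instead of MAX(id): gaps no longer matter.
(n,) = cur.execute("SELECT COUNT(*) FROM history").fetchone()
offset = random.randrange(n)  # uniform over 0 .. n-1
row = cur.execute("SELECT * FROM history ORDER BY id LIMIT 1 OFFSET ?",
                  (offset,)).fetchone()
print(row)
```

Every row now has probability exactly 1/n, though OFFSET still walks the index, so the cost is O(offset) rather than O(1).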


Re: Efficiently finding a random record

2005-05-15 Thread mfatene
Hi,
i did the test and agree with you. even with
Select ID from history where id = 1+CAST( rand() * 19 as UNSIGNED) limit 1;

when testing : select CAST( rand() * 19 as UNSIGNED) this always returns a
single value.

So you don't need limit 1.

The workaround i found is :
Select ID from history where id >= 1+CAST( rand() * 19 as UNSIGNED) limit 1;

here limit 1 is useful, and there is no empty set.

Mathias


Selon Gary Huntress [EMAIL PROTECTED]:

 As it happens, I found this post useful tonight for an almost identical
 situation.

 I have a table with exactly 200,000 rows, the first column is an
 autoincrement ID field.  I am confident that all IDs are consecutive and
 there are no holes.

 When I do:

 Select ID from history where id = CAST( rand() * 19 + 1 as UNSIGNED)
 limit 1;

 shouldn't it always return a single record?  The CAST id should always be in
 the range from 1 to 20.

 About 50% of the time I get an empty recordset and I'm puzzled.  (could be
 lack of sleep though)


 Gary



Re: Efficiently finding a random record

2005-05-15 Thread Gary Huntress
There's definitely more going on here that I don't understand.   Consider 
this simple query and result:

mysql> select * from history where id = CAST( rand() * 2355008 as unsigned);
+---------+--------+------------+-------+--------+
| id      | symbol | thedate    | close | volume |
+---------+--------+------------+-------+--------+
| 1187308 | MAN    | 2003-05-23 | 33.13 | 444800 |
| 1398426 | NHP    | 2003-08-26 | 14.69 | 215200 |
| 2176684 | UMPQ   | 2004-01-27 | 20.74 |  43500 |
+---------+--------+------------+-------+--------+
3 rows in set (2.67 sec)
ID is an autoincrement field, and there are 2355008 total unique records, 
yet even though the where clause is a test for equivalence I get three 
result records.  I think I reasonably expected to get 1 and only 1 record.

Puzzled,
Gary
- Original Message - 
From: [EMAIL PROTECTED]
To: Gary Huntress [EMAIL PROTECTED]
Cc: mysql@lists.mysql.com
Sent: Sunday, May 15, 2005 6:42 AM
Subject: Re: Efficiently finding a random record


Hi,
i did the test and agree with you. even with
Select ID from history where id = 1+CAST( rand() * 19 as UNSIGNED) 
limit 1;

when testing : select CAST( rand() * 19 as UNSIGNED) this always
returns a single value.

So you don't need limit 1.
The workaround i found is :
Select ID from history where id >= 1+CAST( rand() * 19 as UNSIGNED)
limit 1;

here limit 1 is useful, and there is no empty set.
Mathias


Re: Efficiently finding a random record

2005-05-15 Thread Michael Stassen
You are both missing something fundamental: functions are evaluated for 
each and every row.  Your query,

  SELECT * FROM history WHERE id = CAST(RAND() * 2355008 AS UNSIGNED);
gets a random number for *each row*, and returns the row if that row's 
random number is the same as its id.  You are rolling the dice on each 
row.  Because the odds of a match are 1/2355008 and you are rolling the 
dice 2355008 times, the *expected* number of matches is 1.  That is, *on 
average*, this will return one row, but the outcome on any single 
running of this query will vary.  (This also requires a full table scan, 
making it slow.)
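Michael's "expected number of matches is 1" claim can be simulated outside MySQL (a sketch with a scaled-down table of 1,000 rows standing in for 2,355,008; the arithmetic is the same):

```python
import random

random.seed(7)
N = 1_000         # stand-in for the 2,355,008-row table
trials = 1_000

match_counts = []
for _ in range(trials):
    # WHERE id = CAST(RAND()*N AS UNSIGNED): every row rolls its own number.
    matches = sum(1 for row_id in range(1, N + 1)
                  if int(random.random() * N) == row_id)
    match_counts.append(matches)

avg = sum(match_counts) / trials
empty = sum(1 for m in match_counts if m == 0) / trials
print(round(avg, 2), round(empty, 2))  # avg near 1, empty near 1/e ≈ 0.37
```

The fraction of empty result sets converges to (1 - 1/N)^N ≈ 1/e, which matches Gary's "about 50% of the time" impression reasonably well.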

You want to choose 1 random number, then choose the row with that id. 
Assuming there are no gaps (not a good assumption in usual practice), 
this can be accomplished with a user variable and two statements:

  SET @rand_id = CAST(1 + 2355008*RAND() AS UNSIGNED);
  SELECT * FROM history WHERE id = @rand_id;
Because ensuring that there are no gaps in your ids usually requires 
monkeying with the ids in ways that are considered bad practice, a query 
that doesn't make that assumption would be better.  One solution would 
be to modify the query to allow several rows to match and then take the 
first one (using LIMIT).  For example, if the selected random id is 
missing, we take the next id we find, like this:

  SELECT @rand_id:= CAST( 1 + MAX(id)*RAND() AS UNSIGNED) FROM history;
  SELECT * FROM history WHERE id >= @rand_id LIMIT 1;
Both queries should use the index on id and return quickly.
Michael
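Michael's two statements, ported to SQLite for a self-contained check (SQLite lacks user variables and its RANDOM() differs from RAND(), so the random id is computed in Python; the gap-ridden id set is made up):

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE history (id INTEGER PRIMARY KEY, symbol TEXT)")
cur.executemany("INSERT INTO history VALUES (?, ?)",
                [(1, "MAN"), (4, "NHP"), (9, "UMPQ")])  # deliberate gaps

for _ in range(1_000):
    # SELECT @rand_id := CAST(1 + MAX(id)*RAND() AS UNSIGNED) FROM history;
    (max_id,) = cur.execute("SELECT MAX(id) FROM history").fetchone()
    rand_id = int(1 + max_id * random.random())
    # SELECT * FROM history WHERE id >= @rand_id LIMIT 1;
    # (ORDER BY id added to make "the next id we find" explicit)
    row = cur.execute("SELECT * FROM history WHERE id >= ? ORDER BY id LIMIT 1",
                      (rand_id,)).fetchone()
    assert row is not None  # never an empty set, even with gaps
```

Unlike the per-row RAND() query, this never comes back empty; the price is the bias toward ids that sit just after gaps, as noted elsewhere in the thread.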


Re: Efficiently finding a random record

2005-05-15 Thread Michael Stassen
Eric Bergen wrote:
in() can take millions of arguments. Up to max packet size. Try it :)
True, but several past threads have suggested that performance drops 
dramatically when the size of the IN list gets too large.  As IN is 
equivalent to an equality check for each value in the list, separated by 
ORs, this is not surprising.



Re: Efficiently finding a random record

2005-05-14 Thread Gary Huntress
As it happens, I found this post useful tonight for an almost identical 
situation.

I have a table with exactly 200,000 rows, the first column is an 
autoincrement ID field.  I am confident that all IDs are consecutive and 
there are no holes.

When I do:
   Select ID from history where id = CAST( rand() * 19 + 1 as UNSIGNED) 
limit 1;

shouldn't it always return a single record?  The CAST id should always be in 
the range from 1 to 20.

About 50% of the time I get an empty recordset and I'm puzzled.  (could be 
lack of sleep though)

Gary


Efficiently finding a random record

2005-05-13 Thread Brian Dunning
I have a db of about 300,000 records and when I try to find one
random record like this:

select * from table order by rand() limit 1;

it can take several minutes. My Sherlock Holmes instincts tell me
that what I'm doing is somehow inefficient. What is the primary
culprit here?



Re: Efficiently finding a random record

2005-05-13 Thread Frank Bax
At 12:54 PM 5/13/05, Brian Dunning wrote:
I have a db of about 300,000 records and when I try to find one
random record like this:
select * from table order by rand() limit 1;
it can take several minutes. My Sherlock Holmes instincts tell me
that what I'm doing is somehow inefficient. What is the primary
culprit here?

Sherlock Holmes would have checked the archives first!  There was a 
discussion about this a couple of weeks ago (Apr. 26).



Re: Efficiently finding a random record

2005-05-13 Thread Philip Hallstrom
I have a db of about 300,000 records and when I try to find one random record 
like this:

select * from table order by rand() limit 1;
it can take several minutes. My Sherlock Holmes instincts tell me that what 
I'm doing is somehow inefficient. What is the primary culprit here?
The culprit is that mysql has to assign a random value (via rand()) to 
each of the 300,000 rows, then order all 300,000 rows by that random value 
and return the first one.

So even though you're just retrieving one value, you're processing 300,000 
rows.

You'd be better off doing something like this in your application..
row_count = get row count;
random_row = get random number from 0 to row_count - 1;
result = db query LIMIT 1 OFFSET random_row
or something like that...
-philip


Re: Efficiently finding a random record

2005-05-13 Thread Eric Bergen
Even better is if you have an integer primary key (think auto_increment) 
and use in()

So if you want 10 random rows in your app generate 20 or so random 
numbers and do something like
select col1 from t where x in (1, 5, 3, 2...) limit 10

check num rows and if you don't get enough generate more random numbers 
and try again.

in() is blazing fast even with thousands of numbers so don't be afraid 
to kick a few extra in.

-Eric
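Eric's oversample-and-IN() recipe, sketched against SQLite from Python (the 2x oversampling and the retry loop follow his description; table t and columns x/col1 are from his example, the data is made up):

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (x INTEGER PRIMARY KEY, col1 TEXT)")
# 1000 ids with holes at every multiple of 3.
cur.executemany("INSERT INTO t VALUES (?, ?)",
                [(i, f"row{i}") for i in range(1, 1001) if i % 3 != 0])

want, max_x = 10, 1000
rows = []
while len(rows) < want:
    # Generate ~2x as many candidate ids as rows needed...
    cand = random.sample(range(1, max_x + 1), want * 2)
    ph = ",".join("?" * len(cand))
    # ...filter them through IN(), and keep at most `want` hits.
    rows = cur.execute(f"SELECT x, col1 FROM t WHERE x IN ({ph}) LIMIT {want}",
                       cand).fetchall()

print(len(rows))  # exactly 10 rows
```

random.sample keeps the candidates distinct, so no row can appear twice; if too many candidates fall into holes, the loop simply redraws, as Eric suggests.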


Re: Efficiently finding a random record

2005-05-13 Thread Dan Bolser
On Fri, 13 May 2005, Eric Bergen wrote:

Even better is if you have an integer primary key (think auto_increment) 
and use in()

So if you want 10 random rows in your app generate 20 or so random 
numbers and do something like
select col1 from t where x in (1, 5, 3, 2...) limit 10

check num rows and if you don't get enough generate more random numbers 
and try again.

in() is blazing fast even with thousands of numbers so don't be afraid 
to kick a few extra in.

I heard about a 255 'in' limit. When you say 'thousands of numbers' do you
mean in the IN or in the column?



-Eric





Re: Efficiently finding a random record

2005-05-13 Thread Eric Bergen
in() can take millions of arguments. Up to max packet size. Try it :)
Dan Bolser wrote:
I heard about a 255 'in' limit. When you say 'thousands of numbers' do you
mean in the IN or in the column?
 

