Re: ORDER BY RAND() Too Slow! Alternatives?

2001-02-11 Thread Jeffrey D. Wheelhouse

At 11:39 PM 2/10/2001 -0800, Stephen Waits wrote:
Never mind on the "it doesn't work on my system" more like it didn't
work on my brain :)  Works fine.

Oh, phew.

Theoretically it could be as fast as Carsten's method couldn't it?  If
it hit a record on the first shot?  Otherwise it's pounding through an
index O(random-nearest_id) where his does it O(1).  And could it
potentially loop infinitely?

Based on my admittedly pathetic understanding of B-trees and database 
indexes, I *think* Carsten's approach is O(lg n) on the number of rows.  My 
approach is O(M*n) on the number of rows, where M is a pretty lightweight 
access to nab the key.  The "LIMIT $rand, 1" approach is O(D*n/2) on the 
number of rows over time, but D is a nasty I/O hit to slurp the whole row 
into the resultset.

The only case where Carsten's approach and mine would converge would be if 
you were using a query where no index could be applied.  Then they'd both 
be stuck at O(N) on the number of rows.

I am curious whether "(@rand:=@rand-1)+id=id" can be optimized to remove 
the table reference (id) without having the query optimizer decide it only 
needs to run once.  That might shave a good bit off of M.

In a case like this, it would be handy to have a ROW() function that tracks 
the running counter being used to generate the "X rows in set." 
statistic.  But such a thing would probably be of limited utility.

At 11:28PM 2/10/2001 -0800, Stephen Waits wrote:
Carsten's approach is one of those "duh" things I don't understand why I
hadn't thought of it.?

Likewise.  It's a good reminder that clever solutions don't always come 
from linear thinking.  Thanks Carsten!

Jeff


-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




Re: ORDER BY RAND() Too Slow! Alternatives?

2001-02-10 Thread Jeffrey D. Wheelhouse


Could you do something like:

CREATE TEMPORARY TABLE temptable (
   pk INTEGER,
   rand INTEGER
);
INSERT INTO temptable SELECT yourpk,Rand() FROM yourtable;
SELECT yourtable.* FROM yourtable,temptable WHERE pk=yourpk ORDER BY rand;
DROP TABLE temptable;

That might be quicker than your current approach.

Jeff

At 12:12 PM 2/10/2001 -0800, Stephen Waits wrote:

Hi there,

In the quest to get a random row from a table, "order by rand()" has
proven too inefficient and slow.  It's slow because MySQL apparently
selects ALL rows into memory, then randomly shuffles ALL of them, then
gives you the first one - very inefficient.  There are a few other ways
I've thrown around but none are "elegant".

One is, if a table has an id # column, like "id int unsigned not null
auto_increment", I could do this:

select max(id) from table;
$random_number = ...
select * from table where id=$random_number;

This is very fast (assuming the id field is a unique index).  But it has
the problem that if records have been deleted I might get a 0-row
response.  It also does not work if I want to limit to a particular
category, for instance "where category='women'" or something.

I could do this too:

select count(*) from table;
$random_number = ...
select * from table limit $random_number,1;

This has the benefit of always working but the speed, though faster than
the "order by rand()" method, remains unacceptable.  The speed seems
linear with regard to the size of $random_number; which is probably
obvious to you.

So I've experimented with several other things:

select * from table where limit rand(),1;
select * from table where id=(mod(floor(rand()*4294967296),count(*))+1);
.. and it only gets uglier from -- these are all not accepted by MySQL.

MySQL does not allow for subqueries which is another way it could possibly
be accomplished.  In the end, I'll just use what works, no matter the
speed.

BUT, I'd love to hear what other people have done to solve this problem!

Thanks,
Steve


-
Before posting, please check:
http://www.mysql.com/manual.php   (the manual)
http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php


-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




RE: ORDER BY RAND() Too Slow! Alternatives?

2001-02-10 Thread Carsten H. Pedersen

 Hi there,

 In the quest to get a random row from a table, "order by rand()" has
 proven too inefficient and slow.  It's slow because MySQL apparently
 selects ALL rows into memory, then randomly shuffles ALL of them, then
 gives you the first one - very inefficient.  There are a few other ways
 I've thrown around but none are "elegant".

 One is, if a table has an id # column, like "id int unsigned not null
 auto_increment", I could do this:

 select max(id) from table;
 $random_number = ...
 select * from table where id=$random_number;

How about
select * from table
where id$random_number
order by id
limit 1;

(note that I'm using '' rather than '='). This should always work,
and be pretty fast. There is a caveat, tho': this won't work if
you need "exact randomness", i.e. certain records will have a
better chance of being selected than others. This gets worse,
the larger "holes" are in sets of deleted id's.

/ Carsten
--
Carsten H. Pedersen
keeper and maintainer of the bitbybit.dk MySQL FAQ
http://www.bitbybit.dk/mysqlfaq



 This is very fast (assuming the id field is a unique index).  But it has
 the problem that if records have been deleted I might get a 0-row
 response.  It also does not work if I want to limit to a particular
 category, for instance "where category='women'" or something.

 I could do this too:

 select count(*) from table;
 $random_number = ...
 select * from table limit $random_number,1;

 This has the benefit of always working but the speed, though faster than
 the "order by rand()" method, remains unacceptable.  The speed seems
 linear with regard to the size of $random_number; which is probably
 obvious to you.

 So I've experimented with several other things:

 select * from table where limit rand(),1;
 select * from table where id=(mod(floor(rand()*4294967296),count(*))+1);
 .. and it only gets uglier from -- these are all not accepted by MySQL.

 MySQL does not allow for subqueries which is another way it could possibly
 be accomplished.  In the end, I'll just use what works, no matter the
 speed.

 BUT, I'd love to hear what other people have done to solve this problem!

 Thanks,
 Steve


 -
 Before posting, please check:
http://www.mysql.com/manual.php   (the manual)
http://lists.mysql.com/   (the list archive)

 To request this thread, e-mail [EMAIL PROTECTED]
 To unsubscribe, e-mail
 [EMAIL PROTECTED]
 Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php



-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




RE: ORDER BY RAND() Too Slow! Alternatives?

2001-02-10 Thread Robert Barrington

?php

$query = "SELECT col1, col2 FROM the_table ORDER BY RAND() LIMIT 1";
$result = mysql_query($query) or die("could not query");
$row = mysql_fetch_array($result);
print $row[col1];
print "P";
print $row[col2];

?

Robert B. Barrington

GetMart Commercial Ecom: Web Administrator
http://weddinginlasvegas.com/ 
http://getmart.com/ 
[EMAIL PROTECTED]
Vegas Vista Productions
3172 North Rainbow Boulevard
Suite 326
Las Vegas, Nevada 89108-4534
Telephone: (702)656-1027
Facsimile: (702)656-1608

-Original Message-
From: Stephen Waits [mailto:[EMAIL PROTECTED]]
Sent: Saturday, February 10, 2001 12:13 PM
To: [EMAIL PROTECTED]
Subject: ORDER BY RAND() Too Slow! Alternatives?


Hi there,

In the quest to get a random row from a table, "order by rand()" has
proven too inefficient and slow.  It's slow because MySQL apparently
selects ALL rows into memory, then randomly shuffles ALL of them, then
gives you the first one - very inefficient.  There are a few other ways
I've thrown around but none are "elegant".

One is, if a table has an id # column, like "id int unsigned not null
auto_increment", I could do this:

select max(id) from table;
$random_number = ...
select * from table where id=$random_number;

This is very fast (assuming the id field is a unique index).  But it has
the problem that if records have been deleted I might get a 0-row
response.  It also does not work if I want to limit to a particular
category, for instance "where category='women'" or something.

I could do this too:

select count(*) from table;
$random_number = ...
select * from table limit $random_number,1;

This has the benefit of always working but the speed, though faster than
the "order by rand()" method, remains unacceptable.  The speed seems
linear with regard to the size of $random_number; which is probably
obvious to you.

So I've experimented with several other things:

select * from table where limit rand(),1;
select * from table where id=(mod(floor(rand()*4294967296),count(*))+1);
.. and it only gets uglier from -- these are all not accepted by MySQL.

MySQL does not allow for subqueries which is another way it could possibly
be accomplished.  In the end, I'll just use what works, no matter the
speed.

BUT, I'd love to hear what other people have done to solve this problem!

Thanks,
Steve


-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php


-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




Re: ORDER BY RAND() Too Slow! Alternatives?

2001-02-10 Thread Stephen Waits



"Jeffrey D. Wheelhouse" wrote:
 
 SELECT @lines:=COUNT(id) FROM table;
 SET @rand=CEILING(RAND()*@lines);
 SELECT * FROM table WHERE (@rand:=@rand-1)+id=id;

Never mind on the "it doesn't work on my system" more like it didn't
work on my brain :)  Works fine.  And now that I ponder it a bit more
and I think I understand what it's doing I see the performance
implications.

Theoretically it could be as fast as Carsten's method couldn't it?  If
it hit a record on the first shot?  Otherwise it's pounding through an
index O(random-nearest_id) where his does it O(1).  And could it
potentially loop infinitely?

--Steve

-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php