Re: ORDER BY RAND() Too Slow! Alternatives?

2001-02-11 Thread Jeffrey D. Wheelhouse

At 11:39 PM 2/10/2001 -0800, Stephen Waits wrote:
>Never mind on the "it doesn't work on my system" more like it didn't
>work on my brain :)  Works fine.

Oh, phew.

>Theoretically it could be as fast as Carsten's method couldn't it?  If
>it hit a record on the first shot?  Otherwise it's pounding through an
>index O(random-nearest_id) where his does it O(1).  And could it
>potentially loop infinitely?

Based on my admittedly pathetic understanding of B-trees and database 
indexes, I *think* Carsten's approach is O(lg n) on the number of rows.  My 
approach is O(M*n) on the number of rows, where M is a pretty lightweight 
access to nab the key.  The "LIMIT $rand, 1" approach is O(D*n/2) on the 
number of rows over time, but D is a nasty I/O hit to slurp the whole row 
into the resultset.

The only case where Carsten's approach and mine would converge would be if 
you were using a query where no index could be applied.  Then they'd both 
be stuck at O(N) on the number of rows.

I am curious whether "(@rand:=@rand-1)+id=id" can be optimized to remove 
the table reference (id) without having the query optimizer decide it only 
needs to run once.  That might shave a good bit off of M.

In a case like this, it would be handy to have a ROW() function that tracks 
the running counter being used to generate the "X rows in set." 
statistic.  But such a thing would probably be of limited utility.

At 11:28PM 2/10/2001 -0800, Stephen Waits wrote:
>Carsten's approach is one of those "duh" things I don't understand why I
>hadn't thought of it.?

Likewise.  It's a good reminder that clever solutions don't always come 
from linear thinking.  Thanks Carsten!

Jeff


-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




Re: ORDER BY RAND() Too Slow! Alternatives?

2001-02-10 Thread Stephen Waits



"Jeffrey D. Wheelhouse" wrote:
> 
> SELECT @lines:=COUNT(id) FROM table;
> SET @rand=CEILING(RAND()*@lines);
> SELECT * FROM table WHERE (@rand:=@rand-1)+id=id;

Never mind on the "it doesn't work on my system" more like it didn't
work on my brain :)  Works fine.  And now that I ponder it a bit more
and I think I understand what it's doing I see the performance
implications.

Theoretically it could be as fast as Carsten's method couldn't it?  If
it hit a record on the first shot?  Otherwise it's pounding through an
index O(random-nearest_id) where his does it O(1).  And could it
potentially loop infinitely?

--Steve

-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




Re: ORDER BY RAND() Too Slow! Alternatives?

2001-02-10 Thread Stephen Waits


"Jeffrey D. Wheelhouse" wrote:
> 
> Here's another approach.  I'm curious about the performance implications:
> 
> SELECT @lines:=COUNT(id) FROM table;
> SET @rand=CEILING(RAND()*@lines);
> SELECT * FROM table WHERE (@rand:=@rand-1)+id=id;
> 
> This *should* give each row an equal chance, but it's a rather nasty
> sit-'n-spin loop.  Instead of being linear with the size of the random
> number, it's linear with the size of the table, but the constant multiplier
> is much smaller.  For all but the largest and/or simplest tables, that
> should be a win.

This is broken for me - but, I've learned you can actually set variables
in SQL which just opened a HUGE door for me :)  I'm going to study more
about this ASAP.  ***Thanks for the suggestion.***  I get the following:

mysql> select * from table where (@rand:=@rand-1)+id=id;
ERROR 1064: You have an error in your SQL syntax near 'table where
(@rand:=@rand-1)+id=id' at line 1

> By my measurements, this is a good 3x faster than the LIMIT $rand, 1
> approach on my test table.  But if "id" is indexed, Carsten's no-calc
> approach still blows it away.

Carsten's approach is one of those "duh" things I don't understand why I
hadn't thought of it.  It's fast, and very closely approximates what I
want to do.

> Is it possible to do a "fair" match without incurring at least one full
> pass through the table?

This is "the question" isn't it.  At a minimum, you must know the # of
rows you have to choose from; depending on your indexes this may not
require a full pass through.  But then you must (ideally) randomly
access any single row in that set.

So maybe it cannot be done under MySQL..  What about Oracle, MS SQL, and
Postgres?

Thanks,
Steve

-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




RE: ORDER BY RAND() Too Slow! Alternatives?

2001-02-10 Thread Jeffrey D. Wheelhouse


Hmm, no reading comprehension points for me.  I managed to read Steve's 
message the first time without realizing that he only wanted one row.

Here's another approach.  I'm curious about the performance implications:

SELECT @lines:=COUNT(id) FROM table;
SET @rand=CEILING(RAND()*@lines);
SELECT * FROM table WHERE (@rand:=@rand-1)+id=id;

This *should* give each row an equal chance, but it's a rather nasty 
sit-'n-spin loop.  Instead of being linear with the size of the random 
number, it's linear with the size of the table, but the constant multiplier 
is much smaller.  For all but the largest and/or simplest tables, that 
should be a win.

By my measurements, this is a good 3x faster than the LIMIT $rand, 1 
approach on my test table.  But if "id" is indexed, Carsten's no-calc 
approach still blows it away.

Is it possible to do a "fair" match without incurring at least one full 
pass through the table?

Jeff

At 10:11 PM 2/10/2001 +0100, Carsten H. Pedersen wrote:
> > Hi there,
> >
> > In the quest to get a random row from a table, "order by rand()" has
> > proven too inefficient and slow.  It's slow because MySQL apparently
> > selects ALL rows into memory, then randomly shuffles ALL of them, then
> > gives you the first one - very inefficient.  There are a few other ways
> > I've thrown around but none are "elegant".
> >
> > One is, if a table has an id # column, like "id int unsigned not null
> > auto_increment", I could do this:
> >
> > select max(id) from table;
> > $random_number = ...
> > select * from table where id=$random_number;
>
>How about
> select * from table
> where id>$random_number
> order by id
> limit 1;
>
>(note that I'm using '>' rather than '='). This should always work,
>and be pretty fast. There is a caveat, tho': this won't work if
>you need "exact randomness", i.e. certain records will have a
>better chance of being selected than others. This gets worse,
>the larger "holes" are in sets of deleted id's.
>
>/ Carsten
>--
>Carsten H. Pedersen
>keeper and maintainer of the bitbybit.dk MySQL FAQ
>http://www.bitbybit.dk/mysqlfaq
>
>
> >
> > This is very fast (assuming the id field is a unique index).  But it has
> > the problem that if records have been deleted I might get a 0-row
> > response.  It also does not work if I want to limit to a particular
> > category, for instance "where category='women'" or something.
> >
> > I could do this too:
> >
> > select count(*) from table;
> > $random_number = ...
> > select * from table limit $random_number,1;
> >
> > This has the benefit of always working but the speed, though faster than
> > the "order by rand()" method, remains unacceptable.  The speed seems
> > linear with regard to the size of $random_number; which is probably
> > obvious to you.
> >
> > So I've experimented with several other things:
> >
> > select * from table where limit rand(),1;
> > select * from table where id=(mod(floor(rand()*4294967296),count(*))+1);
> > .. and it only gets uglier from -- these are all not accepted by MySQL.
> >
> > MySQL does not allow for subqueries which is another way it could possibly
> > be accomplished.  In the end, I'll just use what works, no matter the
> > speed.
> >
> > BUT, I'd love to hear what other people have done to solve this problem!
> >
> > Thanks,
> > Steve
> >
> >
> > -
> > Before posting, please check:
> >http://www.mysql.com/manual.php   (the manual)
> >http://lists.mysql.com/   (the list archive)
> >
> > To request this thread, e-mail <[EMAIL PROTECTED]>
> > To unsubscribe, e-mail
> > <[EMAIL PROTECTED]>
> > Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
> >
>
>
>-
>Before posting, please check:
>http://www.mysql.com/manual.php   (the manual)
>http://lists.mysql.com/   (the list archive)
>
>To request this thread, e-mail <[EMAIL PROTECTED]>
>To unsubscribe, e-mail <[EMAIL PROTECTED]>
>Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php


-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




RE: ORDER BY RAND() Too Slow! Alternatives?

2001-02-10 Thread Robert Barrington

";
print $row[col2];

?>

Robert B. Barrington

GetMart Commercial Ecom: Web Administrator
http://weddinginlasvegas.com/ 
http://getmart.com/ 
[EMAIL PROTECTED]
Vegas Vista Productions
3172 North Rainbow Boulevard
Suite 326
Las Vegas, Nevada 89108-4534
Telephone: (702)656-1027
Facsimile: (702)656-1608

-Original Message-
From: Stephen Waits [mailto:[EMAIL PROTECTED]]
Sent: Saturday, February 10, 2001 12:13 PM
To: [EMAIL PROTECTED]
Subject: ORDER BY RAND() Too Slow! Alternatives?


Hi there,

In the quest to get a random row from a table, "order by rand()" has
proven too inefficient and slow.  It's slow because MySQL apparently
selects ALL rows into memory, then randomly shuffles ALL of them, then
gives you the first one - very inefficient.  There are a few other ways
I've thrown around but none are "elegant".

One is, if a table has an id # column, like "id int unsigned not null
auto_increment", I could do this:

select max(id) from table;
$random_number = ...
select * from table where id=$random_number;

This is very fast (assuming the id field is a unique index).  But it has
the problem that if records have been deleted I might get a 0-row
response.  It also does not work if I want to limit to a particular
category, for instance "where category='women'" or something.

I could do this too:

select count(*) from table;
$random_number = ...
select * from table limit $random_number,1;

This has the benefit of always working but the speed, though faster than
the "order by rand()" method, remains unacceptable.  The speed seems
linear with regard to the size of $random_number; which is probably
obvious to you.

So I've experimented with several other things:

select * from table where limit rand(),1;
select * from table where id=(mod(floor(rand()*4294967296),count(*))+1);
.. and it only gets uglier from -- these are all not accepted by MySQL.

MySQL does not allow for subqueries which is another way it could possibly
be accomplished.  In the end, I'll just use what works, no matter the
speed.

BUT, I'd love to hear what other people have done to solve this problem!

Thanks,
Steve


-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php


-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




RE: ORDER BY RAND() Too Slow! Alternatives?

2001-02-10 Thread Carsten H. Pedersen

> Hi there,
>
> In the quest to get a random row from a table, "order by rand()" has
> proven too inefficient and slow.  It's slow because MySQL apparently
> selects ALL rows into memory, then randomly shuffles ALL of them, then
> gives you the first one - very inefficient.  There are a few other ways
> I've thrown around but none are "elegant".
>
> One is, if a table has an id # column, like "id int unsigned not null
> auto_increment", I could do this:
>
> select max(id) from table;
> $random_number = ...
> select * from table where id=$random_number;

How about
select * from table
where id>$random_number
order by id
limit 1;

(note that I'm using '>' rather than '='). This should always work,
and be pretty fast. There is a caveat, tho': this won't work if
you need "exact randomness", i.e. certain records will have a
better chance of being selected than others. This gets worse,
the larger "holes" are in sets of deleted id's.

/ Carsten
--
Carsten H. Pedersen
keeper and maintainer of the bitbybit.dk MySQL FAQ
http://www.bitbybit.dk/mysqlfaq


>
> This is very fast (assuming the id field is a unique index).  But it has
> the problem that if records have been deleted I might get a 0-row
> response.  It also does not work if I want to limit to a particular
> category, for instance "where category='women'" or something.
>
> I could do this too:
>
> select count(*) from table;
> $random_number = ...
> select * from table limit $random_number,1;
>
> This has the benefit of always working but the speed, though faster than
> the "order by rand()" method, remains unacceptable.  The speed seems
> linear with regard to the size of $random_number; which is probably
> obvious to you.
>
> So I've experimented with several other things:
>
> select * from table where limit rand(),1;
> select * from table where id=(mod(floor(rand()*4294967296),count(*))+1);
> .. and it only gets uglier from -- these are all not accepted by MySQL.
>
> MySQL does not allow for subqueries which is another way it could possibly
> be accomplished.  In the end, I'll just use what works, no matter the
> speed.
>
> BUT, I'd love to hear what other people have done to solve this problem!
>
> Thanks,
> Steve
>
>
> -
> Before posting, please check:
>http://www.mysql.com/manual.php   (the manual)
>http://lists.mysql.com/   (the list archive)
>
> To request this thread, e-mail <[EMAIL PROTECTED]>
> To unsubscribe, e-mail
> <[EMAIL PROTECTED]>
> Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
>


-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




Re: ORDER BY RAND() Too Slow! Alternatives?

2001-02-10 Thread Jeffrey D. Wheelhouse


Could you do something like:

CREATE TEMPORARY TABLE temptable (
   pk INTEGER,
   rand INTEGER
);
INSERT INTO temptable SELECT yourpk,Rand() FROM yourtable;
SELECT yourtable.* FROM yourtable,temptable WHERE pk=yourpk ORDER BY rand;
DROP TABLE temptable;

That might be quicker than your current approach.

Jeff

At 12:12 PM 2/10/2001 -0800, Stephen Waits wrote:

>Hi there,
>
>In the quest to get a random row from a table, "order by rand()" has
>proven too inefficient and slow.  It's slow because MySQL apparently
>selects ALL rows into memory, then randomly shuffles ALL of them, then
>gives you the first one - very inefficient.  There are a few other ways
>I've thrown around but none are "elegant".
>
>One is, if a table has an id # column, like "id int unsigned not null
>auto_increment", I could do this:
>
>select max(id) from table;
>$random_number = ...
>select * from table where id=$random_number;
>
>This is very fast (assuming the id field is a unique index).  But it has
>the problem that if records have been deleted I might get a 0-row
>response.  It also does not work if I want to limit to a particular
>category, for instance "where category='women'" or something.
>
>I could do this too:
>
>select count(*) from table;
>$random_number = ...
>select * from table limit $random_number,1;
>
>This has the benefit of always working but the speed, though faster than
>the "order by rand()" method, remains unacceptable.  The speed seems
>linear with regard to the size of $random_number; which is probably
>obvious to you.
>
>So I've experimented with several other things:
>
>select * from table where limit rand(),1;
>select * from table where id=(mod(floor(rand()*4294967296),count(*))+1);
>.. and it only gets uglier from -- these are all not accepted by MySQL.
>
>MySQL does not allow for subqueries which is another way it could possibly
>be accomplished.  In the end, I'll just use what works, no matter the
>speed.
>
>BUT, I'd love to hear what other people have done to solve this problem!
>
>Thanks,
>Steve
>
>
>-
>Before posting, please check:
>http://www.mysql.com/manual.php   (the manual)
>http://lists.mysql.com/   (the list archive)
>
>To request this thread, e-mail <[EMAIL PROTECTED]>
>To unsubscribe, e-mail <[EMAIL PROTECTED]>
>Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php


-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php