Re: Select x random rows from DB

2011-02-21 Thread Phlip
On Feb 20, 2:19 pm, galago  wrote:

> What is the best way, to select X random rows from DB? I know that
> method: .all().order_by('?')[:X] is not good idea.
> What methods do you use?

  order_by('RAND()')

That might use the same seed each time.

To create, for example, a rotating home page with different content
each day, but the same after each page refresh, take today's date, MD5
hash it, and stick the hash between the () on RAND().

We used to use that with MySQL in a performance-sensitive environment,
and nobody complained about it...

-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-users@googlegroups.com.
To unsubscribe from this group, send email to 
django-users+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en.



Re: Select x random rows from DB

2011-02-21 Thread Christophe Pettus
If you know this is going to be an important part of the application, it might 
make sense to have a random primary key, such as a UUID.

--
-- Christophe Pettus
  x...@thebuild.com

-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-users@googlegroups.com.
To unsubscribe from this group, send email to 
django-users+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en.



Re: Select x random rows from DB

2011-02-21 Thread Cal Leeming [Simplicity Media Ltd]
Please see my previous post with the django snippet for more info on this
method (as it uses this same principle).

On Mon, Feb 21, 2011 at 10:06 PM, Mikhail Korobov wrote:

> This is the function for getting 1 item that works even if some rows
> were deleted that works times faster than order_by('?') even for not-
> so-big datasets at least on mysql:
>
> def get_random_item(model, max_id=None):
>if max_id is None:
>max_id = model.objects.aggregate(Max('id')).values()[0]
>min_id = math.ceil(max_id*random.random())
>return model.objects.filter(id__gte=min_id)[0]
>
> It assumes that almost all records are still in DB and ids are more or
> less successive because otherwise the distribution won't be uniform.
> If there are a lot of items in DB then it should be almost safe to
> call this several times (and re-call if the same row is obtained).
>
> On 22 фев, 02:10, galago  wrote:
> > I need to do this in my tagcloud.
>
> --
> You received this message because you are subscribed to the Google Groups
> "Django users" group.
> To post to this group, send email to django-users@googlegroups.com.
> To unsubscribe from this group, send email to
> django-users+unsubscr...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/django-users?hl=en.
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-users@googlegroups.com.
To unsubscribe from this group, send email to 
django-users+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en.



Re: Select x random rows from DB

2011-02-21 Thread Mikhail Korobov
Stackoverflow: 
http://stackoverflow.com/questions/962619/how-to-pull-a-random-record-using-djangos-orm/971671#971671

On 22 фев, 03:06, Mikhail Korobov  wrote:
> This is the function for getting 1 item that works even if some rows
> were deleted that works times faster than order_by('?') even for not-
> so-big datasets at least on mysql:
>
> def get_random_item(model, max_id=None):
>     if max_id is None:
>         max_id = model.objects.aggregate(Max('id')).values()[0]
>     min_id = math.ceil(max_id*random.random())
>     return model.objects.filter(id__gte=min_id)[0]
>
> It assumes that almost all records are still in DB and ids are more or
> less successive because otherwise the distribution won't be uniform.
> If there are a lot of items in DB then it should be almost safe to
> call this several times (and re-call if the same row is obtained).
>
> On 22 фев, 02:10, galago  wrote:
>
>
>
>
>
>
>
> > I need to do this in my tagcloud.

-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-users@googlegroups.com.
To unsubscribe from this group, send email to 
django-users+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en.



Re: Select x random rows from DB

2011-02-21 Thread Mikhail Korobov
This is the function for getting 1 item that works even if some rows
were deleted that works times faster than order_by('?') even for not-
so-big datasets at least on mysql:

def get_random_item(model, max_id=None):
if max_id is None:
max_id = model.objects.aggregate(Max('id')).values()[0]
min_id = math.ceil(max_id*random.random())
return model.objects.filter(id__gte=min_id)[0]

It assumes that almost all records are still in DB and ids are more or
less successive because otherwise the distribution won't be uniform.
If there are a lot of items in DB then it should be almost safe to
call this several times (and re-call if the same row is obtained).

On 22 фев, 02:10, galago  wrote:
> I need to do this in my tagcloud.

-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-users@googlegroups.com.
To unsubscribe from this group, send email to 
django-users+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en.



Re: Select x random rows from DB

2011-02-21 Thread galago
I need to do this in my tagcloud.

-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-users@googlegroups.com.
To unsubscribe from this group, send email to 
django-users+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en.



Re: Select x random rows from DB

2011-02-21 Thread Christophe Pettus

On Feb 21, 2011, at 10:34 AM, Eric Chamberlain wrote:
> If you have lots of rows, this query is really slow as the db must build a 
> new table for the ORDER BY.

You can do better if you have a guaranteed ordinal on the rows; otherwise, it 
has to do a full table scan no matter what.

--
-- Christophe Pettus
   x...@thebuild.com

-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-users@googlegroups.com.
To unsubscribe from this group, send email to 
django-users+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en.



Re: Select x random rows from DB

2011-02-21 Thread Eric Chamberlain

On Feb 20, 2011, at 4:30 PM, Christophe Pettus wrote:

> 
> On Feb 20, 2011, at 2:19 PM, galago wrote:
> 
>> What is the best way, to select X random rows from DB? I know that method: 
>> .all().order_by('?')[:X] is not good idea.
> 
> The best way is to push it onto the DB, using a raw query:
> 
>   random_results = Table.objects.raw("SELECT * FROM table ORDER BY 
> random() LIMIT X")

If you have lots of rows, this query is really slow as the db must build a new 
table for the ORDER BY.

--
Eric Chamberlain, Founder
RF.com - http://RF.com/







-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-users@googlegroups.com.
To unsubscribe from this group, send email to 
django-users+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en.



Re: Select x random rows from DB

2011-02-21 Thread Cal Leeming [Simplicity Media Ltd]
Chris, please don't tell someone "this is the best way", when the solution
is dependant on so many other factors (as stated in my original reply). Our
answers strongly influence other peoples decisions, and "best practice"
answers should only ever be given if you are 100% sure it is correct, and
have personally experimented with every other possible method.

Please don't take offence to this email, I would expect someone to say the
same to me if I had made a similar mistake.


> On 21 Feb 2011 00:31, "Christophe Pettus"  wrote:
>
>
> On Feb 20, 2011, at 2:19 PM, galago wrote:
>
> > What is the best way, to select X random rows from DB...
>
> The best way is to push it onto the DB, using a raw query:
>
>random_results = Table.objects.raw("SELECT * FROM table ORDER BY
> random() LIMIT X")
>
> --
> -- Christophe Pettus
>   x...@thebuild.com
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "Django users" group
>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-users@googlegroups.com.
To unsubscribe from this group, send email to 
django-users+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en.



Re: Select x random rows from DB

2011-02-20 Thread Christophe Pettus

On Feb 20, 2011, at 2:19 PM, galago wrote:

> What is the best way, to select X random rows from DB? I know that method: 
> .all().order_by('?')[:X] is not good idea.

The best way is to push it onto the DB, using a raw query:

random_results = Table.objects.raw("SELECT * FROM table ORDER BY 
random() LIMIT X")

--
-- Christophe Pettus
   x...@thebuild.com

-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-users@googlegroups.com.
To unsubscribe from this group, send email to 
django-users+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en.



Re: Select x random rows from DB

2011-02-20 Thread Cal Leeming [Simplicity Media Ltd]
Also, take strong notice of this part:

EXAMPLE:>>> _t = time.time(); x = map(lambda x: x,
Post.objects.filter(id__gte=40, id__lt=400500).all()); print "Took
%ss"%(time.time() - _t)Took 0.0467309951782s>>> _t =
time.time(); _res = map(lambda x: x,
Post.objects.all()[40:400500]); print "Took %ss"%(time.time() -
_t)Took 1.05785298347s>>>



On Sun, Feb 20, 2011 at 10:35 PM, Cal Leeming [Simplicity Media Ltd] <
cal.leem...@simplicitymedialtd.co.uk> wrote:

> Also, this snippet isn't directly related, but it does show the methods we
> used to paginate through 50 million rows, with query times of below 0.5
> seconds :) It uses the same method that was in my first suggestion (the ID
> max thing)
>
> http://djangosnippets.org/snippets/2277/
>
> Cal
>
> On Sun, Feb 20, 2011 at 10:26 PM, <delegb...@dudupay.com> wrote:
>
>> That would have been my approach if I were to do that. So, we are now two
>> seeking the good way to get it done.
>> Hello People,
>> Galago and I have the same challenge, please help us.
>>
>> Sent from my BlackBerry wireless device from MTN
>> --
>> *From: * galago <prog...@gmail.com>
>> *Sender: * django-users@googlegroups.com
>> *Date: *Sun, 20 Feb 2011 14:24:37 -0800 (PST)
>> *To: *<django-users@googlegroups.com>
>> *ReplyTo: * django-users@googlegroups.com
>> *Subject: *Re: Select x random rows from DB
>>
>> If i knew what is a good idea, I wouldn't aks here:D
>> I only know - that described method is DB killer :)
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Django users" group.
>> To post to this group, send email to django-users@googlegroups.com.
>> To unsubscribe from this group, send email to
>> django-users+unsubscr...@googlegroups.com.
>> For more options, visit this group at
>> http://groups.google.com/group/django-users?hl=en.
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Django users" group.
>> To post to this group, send email to django-users@googlegroups.com.
>> To unsubscribe from this group, send email to
>> django-users+unsubscr...@googlegroups.com.
>> For more options, visit this group at
>> http://groups.google.com/group/django-users?hl=en.
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-users@googlegroups.com.
To unsubscribe from this group, send email to 
django-users+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en.



Re: Select x random rows from DB

2011-02-20 Thread Cal Leeming [Simplicity Media Ltd]
Also, this snippet isn't directly related, but it does show the methods we
used to paginate through 50 million rows, with query times of below 0.5
seconds :) It uses the same method that was in my first suggestion (the ID
max thing)

http://djangosnippets.org/snippets/2277/

Cal

On Sun, Feb 20, 2011 at 10:26 PM, <delegb...@dudupay.com> wrote:

> That would have been my approach if I were to do that. So, we are now two
> seeking the good way to get it done.
> Hello People,
> Galago and I have the same challenge, please help us.
>
> Sent from my BlackBerry wireless device from MTN
> --
> *From: * galago <prog...@gmail.com>
> *Sender: * django-users@googlegroups.com
> *Date: *Sun, 20 Feb 2011 14:24:37 -0800 (PST)
> *To: *<django-users@googlegroups.com>
> *ReplyTo: * django-users@googlegroups.com
> *Subject: *Re: Select x random rows from DB
>
> If i knew what is a good idea, I wouldn't aks here:D
> I only know - that described method is DB killer :)
>
> --
> You received this message because you are subscribed to the Google Groups
> "Django users" group.
> To post to this group, send email to django-users@googlegroups.com.
> To unsubscribe from this group, send email to
> django-users+unsubscr...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/django-users?hl=en.
>
> --
> You received this message because you are subscribed to the Google Groups
> "Django users" group.
> To post to this group, send email to django-users@googlegroups.com.
> To unsubscribe from this group, send email to
> django-users+unsubscr...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/django-users?hl=en.
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-users@googlegroups.com.
To unsubscribe from this group, send email to 
django-users+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en.



Re: Select x random rows from DB

2011-02-20 Thread delegbede
That would have been my approach if I were to do that. So, we are now two 
seeking the good way to get it done. 
Hello People,
Galago and I have the same challenge, please help us. 
Sent from my BlackBerry wireless device from MTN

-Original Message-
From: galago <prog...@gmail.com>
Sender: django-users@googlegroups.com
Date: Sun, 20 Feb 2011 14:24:37 
To: <django-users@googlegroups.com>
Reply-To: django-users@googlegroups.com
Subject: Re: Select x random rows from DB

If i knew what is a good idea, I wouldn't aks here:D
I only know - that described method is DB killer :)

-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-users@googlegroups.com.
To unsubscribe from this group, send email to 
django-users+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en.


-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-users@googlegroups.com.
To unsubscribe from this group, send email to 
django-users+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en.



Re: Select x random rows from DB

2011-02-20 Thread Cal Leeming [Simplicity Media Ltd]
Hi Galago,

There are several ways to do this, and it really depends on the size of your
database, and what your storage policies are.

Here are some of the various methods we have used:

   - Use ID max, then random.randint().
   Only works if you *never* delete rows from the database, and instead have
   a field called 'is_deleted', which is set to 1 each time. Great if you have
   a database with more than 100 thousand rows.
   POC: from django.db.models import Max,Count,Q,F; total_items =
   model.aggregate(Max('id'))['id__max']; random.randint(0,total_items)

   - Use .count(), then random.randint()
   Count is slow as hell depending on how many rows you have.

   - Use ORDER BY RAND() inside the SQL itself, although from what I
   remember, this wasn't great.

It really does just come down to how well your database has been designed,
how many rows you have, and what you are attempting to achieve. I would
suggest creating a script which benchmarks all these different methods
against the data you have, and see which comes out top, that's what we do,
the results are sometimes surprising :)

Hope this helps.

Cal


On Sun, Feb 20, 2011 at 10:19 PM, galago  wrote:

> What is the best way, to select X random rows from DB? I know that
> method: .all().order_by('?')[:X] is not good idea.
> What methods do you use?
>
> --
> You received this message because you are subscribed to the Google Groups
> "Django users" group.
> To post to this group, send email to django-users@googlegroups.com.
> To unsubscribe from this group, send email to
> django-users+unsubscr...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/django-users?hl=en.
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-users@googlegroups.com.
To unsubscribe from this group, send email to 
django-users+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en.



Re: Select x random rows from DB

2011-02-20 Thread delegbede
Knowing what is not a good idea means you know what the good idea is, 
logically. 
So, Galago, what's the good idea???
Sent from my BlackBerry wireless device from MTN

-Original Message-
From: galago 
Sender: django-users@googlegroups.com
Date: Sun, 20 Feb 2011 14:19:14 
To: 
Reply-To: django-users@googlegroups.com
Subject: Select x random rows from DB

What is the best way, to select X random rows from DB? I know that 
method: .all().order_by('?')[:X] is not good idea.
What methods do you use?

-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-users@googlegroups.com.
To unsubscribe from this group, send email to 
django-users+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en.


-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-users@googlegroups.com.
To unsubscribe from this group, send email to 
django-users+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en.



Re: Select x random rows from DB

2011-02-20 Thread galago
If i knew what is a good idea, I wouldn't aks here:D
I only know - that described method is DB killer :)

-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-users@googlegroups.com.
To unsubscribe from this group, send email to 
django-users+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en.