Re: speeding up iterating over query set

2011-07-12 Thread Andre Terra
Have you looked at Haystack?

http://haystacksearch.org/


Cheers,
André

On Tue, Jul 12, 2011 at 10:49 AM, Michel30  wrote:

> I have tried and I think I have it mostly working: it returns ALL
> unique docid's.
>
> What is left is that my code is part of a search function. Originally
> I got normalized keywords from a user and used those Q-objects to look
> for keywords in a selected set of columns.
>
> I still have to figure out how to get that into the SQL part..
>
> On Jul 12, 3:29 pm, bruno desthuilliers
>  wrote:
> > On Jul 12, 12:26 pm, Michel30  wrote:
> >
> > > Hi guys,
> >
> > > I've been trying your suggestions but I'm afraid I'm stretching the
> > > limits of my Python/Django abilities ;-)
> >
> > > Bruno got it right: what I want is a queryset of "model" with distinct
> > > docid having the highest version number, sorted by revisiondate.
> >
> > (snip)
> > > My code does this, but the loop that selects the distinct docid's is
> > > what makes it terribly slow...
> >
> > Then why don't you just try the solution(s) I posted ?
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-users@googlegroups.com.
To unsubscribe from this group, send email to 
django-users+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en.



Re: speeding up iterating over query set

2011-07-12 Thread bruno desthuilliers
On Jul 12, 3:49 pm, Michel30  wrote:
> I have tried and I think I have it mostly working: it returns ALL
> unique docid's.
>
> What is left is that my code is part of a search function. Originally
> I got normalized keywords from a user and used those Q-objects to look
> for keywords in a selected set of columns.
>
> I still have to figure out how to get that into the SQL part..

With the two-step solution I suggested (a raw SQL query to retrieve the
latest revisions, then a "normal" ORM query) you may not have to "get
that into the SQL part" - just apply it in the second, "normal" ORM query.




Re: speeding up iterating over query set

2011-07-12 Thread Michel30
I have tried it and I think I have it mostly working: it returns ALL
unique docids.

What is left is that my code is part of a search function. Originally
I took normalized keywords from the user and built Q objects from them
to look for the keywords in a selected set of columns.

I still have to figure out how to get that into the SQL part..

On Jul 12, 3:29 pm, bruno desthuilliers
 wrote:
> On Jul 12, 12:26 pm, Michel30  wrote:
>
> > Hi guys,
>
> > I've been trying your suggestions but I'm afraid I'm stretching the
> > limits of my Python/Django abilities ;-)
>
> > Bruno got it right: what I want is a queryset of "model" with distinct
> > docid having the highest version number, sorted by revisiondate.
>
> (snip)
> > My code does this, but the loop that selects the distinct docid's is
> > what makes it terribly slow...
>
> Then why don't you just try the solution(s) I posted ?




Re: speeding up iterating over query set

2011-07-12 Thread bruno desthuilliers
On Jul 12, 12:26 pm, Michel30  wrote:
> Hi guys,
>
> I've been trying your suggestions but I'm afraid I'm stretching the
> limits of my Python/Django abilities ;-)
>
> Bruno got it right: what I want is a queryset of "model" with distinct
> docid having the highest version number, sorted by revisiondate.
>
(snip)
> My code does this, but the loop that selects the distinct docid's is
> what makes it terribly slow...

Then why don't you just try the solution(s) I posted?




Re: speeding up iterating over query set

2011-07-12 Thread Michel30
Hi guys,

I've been trying your suggestions but I'm afraid I'm stretching the
limits of my Python/Django abilities ;-)

Bruno got it right: what I want is a queryset of "model" with distinct
docid having the highest version number, sorted by revisiondate.

I have the following result from my
found_entries = model.objects.filter((Q-objects), obsolete=0).order_by('-version','docid') :

+----+-------+---------+--------------+
| pk | docid | version | revisiondate |
+----+-------+---------+--------------+
|  1 |     1 |       1 | 2000-02-10   |
|  2 |     2 |       1 | 2000-02-11   |
|  3 |     3 |       1 | 2000-02-12   |
|  4 |     3 |       3 | 2000-02-13   |
|  5 |     2 |       3 | 2000-02-14   |
|  6 |     1 |       3 | 2000-02-15   |
+----+-------+---------+--------------+

Then I want to retrieve only these results, sorted on revisiondate:

+----+-------+---------+--------------+
| pk | docid | version | revisiondate |
+----+-------+---------+--------------+
|  6 |     1 |       3 | 2000-02-15   |
|  5 |     2 |       3 | 2000-02-14   |
|  4 |     3 |       3 | 2000-02-13   |
+----+-------+---------+--------------+

My code does this, but the loop that selects the distinct docid's is
what makes it terribly slow...

Hope this clarifies it.







Re: speeding up iterating over query set

2011-07-11 Thread bruno desthuilliers
On Jul 11, 15:57, Michel30  wrote:
> Hi all,
>
> I have a basic search function that uses Q objects.
> After profiling it I found that the actual (mysql) database query
> finishes in fractions of seconds but the iterating after this can take
> up to 50 seconds per 10.000 results.
>
> I have been trying to speed it up but I have had not much results..
>
> My query is this one:
>
>        found_entries = model.objects.filter((Q-objects),
> obsolete=0).order_by('-version','docid')
>
> So far so good, but then I need a dictionary to retrieve only unique
> 'documentid's'.
>
>     rev_dict = {}
>
> This is the part that hurts:
>
>     for d in found_entries:
>         rev_dict[d.documentid] = d



> And then some more sorting and filtering:
>
>     filtered_entries = rev_dict.values()
>     filtered_entries.sort(key=lambda d: d.revisiondate, reverse=True)
>
> Does anyone have some better ideas to achieve this?

Ok, so what you want is a queryset of "model" with distinct docid
having the highest version number, sorted by revisiondate? The
cleanest solution would be to first write the appropriate SQL query,
then find out how to express it using Django. FWIW, the raw SQL query
might look like this (MySQL):

mysql> select f1.pk, f1.docid, f1.version, f1.revisiondate
    -> from foo f1
    -> where f1.version=(select max(f2.version) from foo f2
    ->                   where f2.docid=f1.docid)
    -> order by revisiondate desc;
+----+-------+---------+--------------+
| pk | docid | version | revisiondate |
+----+-------+---------+--------------+
|  7 |     3 |       2 | 2000-02-12   |
|  6 |     1 |       2 | 2000-02-11   |
|  5 |     2 |       3 | 2000-02-10   |
+----+-------+---------+--------------+
3 rows in set (0.00 sec)

NB: table definition and data being:

mysql> explain foo;
+--------------+---------+------+-----+---------+----------------+
| Field        | Type    | Null | Key | Default | Extra          |
+--------------+---------+------+-----+---------+----------------+
| pk           | int(11) | NO   | PRI | NULL    | auto_increment |
| docid        | int(11) | NO   |     | NULL    |                |
| version      | int(11) | NO   |     | NULL    |                |
| revisiondate | date    | NO   |     | NULL    |                |
+--------------+---------+------+-----+---------+----------------+

mysql> select * from foo;
+----+-------+---------+--------------+
| pk | docid | version | revisiondate |
+----+-------+---------+--------------+
|  1 |     1 |       1 | 2000-01-01   |
|  2 |     2 |       1 | 2000-01-02   |
|  3 |     3 |       1 | 2000-01-02   |
|  4 |     2 |       2 | 2000-02-10   |
|  5 |     2 |       3 | 2000-02-10   |
|  6 |     1 |       2 | 2000-02-11   |
|  7 |     3 |       2 | 2000-02-12   |
+----+-------+---------+--------------+
7 rows in set (0.00 sec)
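
For anyone wanting to check this outside MySQL, the correlated-subquery
query and the sample data above can be reproduced on an in-memory SQLite
database. This is only a sketch for verification; table and column names
mirror the example above:

```python
import sqlite3

# Recreate the sample "foo" table from the example above in SQLite.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE foo (
    pk INTEGER PRIMARY KEY,
    docid INTEGER NOT NULL,
    version INTEGER NOT NULL,
    revisiondate TEXT NOT NULL)""")
conn.executemany(
    "INSERT INTO foo VALUES (?, ?, ?, ?)",
    [(1, 1, 1, '2000-01-01'), (2, 2, 1, '2000-01-02'), (3, 3, 1, '2000-01-02'),
     (4, 2, 2, '2000-02-10'), (5, 2, 3, '2000-02-10'),
     (6, 1, 2, '2000-02-11'), (7, 3, 2, '2000-02-12')])

# Latest revision per docid via the correlated subquery, newest first.
rows = conn.execute("""
    SELECT f1.pk, f1.docid, f1.version, f1.revisiondate
    FROM foo f1
    WHERE f1.version = (SELECT MAX(f2.version) FROM foo f2
                        WHERE f2.docid = f1.docid)
    ORDER BY f1.revisiondate DESC""").fetchall()
# rows: [(7, 3, 2, '2000-02-12'), (6, 1, 2, '2000-02-11'), (5, 2, 3, '2000-02-10')]
```

The result matches the MySQL output shown above.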


NB: Adding indexes on docid and/or version might help (or hurt)
performance, depending on your data set.

Now you just have to learn how to use Django's ORM F, Max and extra.
Or, for a simpler (but less portable) solution, use a raw SQL query to
retrieve only the distinct pks, then a second query to build the whole
queryset, ie:

from django.db import connection

def get_latest_revisions():
    cursor = connection.cursor()
    cursor.execute("""
        SELECT DISTINCT f1.pk FROM foo AS f1
        WHERE f1.version = (SELECT max(f2.version) FROM foo f2
                            WHERE f2.docid = f1.docid)
        """)
    ids = [row[0] for row in cursor]  # or is it "row['pk']" ???
    cursor.close()
    return model.objects.filter(pk__in=ids).order_by("-revision_date")

This might possibly be done using
manager.raw(your_sql_here).values_list("pk", flat=True) to build the
pk list, but I have never used RawQuerySets so far and the docs don't
say whether RawQuerySet supports values_list, so you'll have to try it
yourself.

HTH




Re: speeding up iterating over query set

2011-07-11 Thread Andre Terra
You seem to be hauling the same data from one side to the other, and I can't
see why.

Try:

MyModel.objects.values_list('id', flat=True)

You can use that to build a dictionary if you add a second field, or
use a list comprehension, which I believe is faster than a for loop:
dict(MyModel.objects.values_list('id', 'docid'))

but I honestly don't understand what you are trying to do. Maybe if you
share more of the model logic and explain what you are trying to achieve we
can provide you with a way to get there, instead of trying to improve the
way you decided to go about it.
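
As a non-Django illustration of that dict(...) one-liner: dict() accepts
any iterable of 2-item pairs, so a plain list of (id, docid) tuples (the
numbers below are made up) behaves the same way:

```python
# dict() over (id, docid) pairs, as values_list('id', 'docid') would yield;
# the numbers are made up for illustration.
pairs = [(1, 10), (2, 20), (3, 10)]
mapping = dict(pairs)
# mapping == {1: 10, 2: 20, 3: 10}
```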


Cheers,
André Terra


On Mon, Jul 11, 2011 at 10:57 AM, Michel30  wrote:

> Hi all,
>
> I have a basic search function that uses Q objects.
> After profiling it I found that the actual (mysql) database query
> finishes in fractions of seconds but the iterating after this can take
> up to 50 seconds per 10.000 results.
>
> I have been trying to speed it up but I have had not much results..
>
> My query is this one:
>
>   found_entries = model.objects.filter((Q-objects),
> obsolete=0).order_by('-version','docid')
>
> So far so good, but then I need a dictionary to retrieve only unique
> 'documentid's'.
>
>rev_dict = {}
>
> This is the part that hurts:
>
>for d in found_entries:
>rev_dict[d.documentid] = d
>
> And then some more sorting and filtering:
>
>filtered_entries = rev_dict.values()
>filtered_entries.sort(key=lambda d: d.revisiondate, reverse=True)
>
> Does anyone have some better ideas to achieve this?
>
> Thanks
>




Re: speeding up iterating over query set

2011-07-11 Thread Michel30
I haven't tried this approach yet; I'll give it a go and let you know
my mileage.

Thanks

On Jul 11, 4:06 pm, "Cal Leeming [Simplicity Media Ltd]"
 wrote:
> On Mon, Jul 11, 2011 at 3:04 PM, Cal Leeming [Simplicity Media Ltd] <
>
>
>
> cal.leem...@simplicitymedialtd.co.uk> wrote:
>
> > On Mon, Jul 11, 2011 at 2:57 PM, Michel30  wrote:
>
> >> Hi all,
>
> >> I have a basic search function that uses Q objects.
> >> After profiling it I found that the actual (mysql) database query
> >> finishes in fractions of seconds but the iterating after this can take
> >> up to 50 seconds per 10.000 results.
>
> >> I have been trying to speed it up but I have had not much results..
>
> >> My query is this one:
>
> >>       found_entries = model.objects.filter((Q-objects),
> >> obsolete=0).order_by('-version','docid')
>
> >> So far so good, but then I need a dictionary to retrieve only unique
> >> 'documentid's'.
>
> > You could do:
>
> > # grab all results
> > _res =
> > model.objects.filter((Q-objects),obsolete=0).order_by('-version','docid').values()
> > # re-map them into (id, obj)
> > _res = map(lambda x: [x.docid, x], _res)
> > # wrap in a dict(), which uses index position 0 as the key, and index
> > position 1 as the value
> > _res = dict(_res)
>
> Just tested the same principle, and it seems to work fine. It uses last
> object found as the final choice if dups are found.
>
> >>> _res = [ [1,2], [1,3], [2,4], [2,5] ]
> >>> map(lambda x: [x[0], x[1]], _res)
>
> [[1, 2], [1, 3], [2, 4], [2, 5]]>>> map(lambda x: [x[0], x[1]], _res)
>
> [[1, 2], [1, 3], [2, 4], [2, 5]]
>
> >>> dict(map(lambda x: [x[0], x[1]], _res))
> {1: 3, 2: 5}
>
> > Let me know if this give you the results you need.
>
> >>    rev_dict = {}
>
> >> This is the part that hurts:
>
> >>    for d in found_entries:
> >>        rev_dict[d.documentid] = d
>
> >> And then some more sorting and filtering:
>
> >>    filtered_entries = rev_dict.values()
> >>    filtered_entries.sort(key=lambda d: d.revisiondate, reverse=True)
>
> >> Does anyone have some better ideas to achieve this?
>
> >> Thanks
>




Re: speeding up iterating over query set

2011-07-11 Thread Cal Leeming [Simplicity Media Ltd]
On Mon, Jul 11, 2011 at 3:06 PM, Cal Leeming [Simplicity Media Ltd] <
cal.leem...@simplicitymedialtd.co.uk> wrote:

>
>
> On Mon, Jul 11, 2011 at 3:04 PM, Cal Leeming [Simplicity Media Ltd] <
> cal.leem...@simplicitymedialtd.co.uk> wrote:
>
>>
>>
>> On Mon, Jul 11, 2011 at 2:57 PM, Michel30  wrote:
>>
>>> Hi all,
>>>
>>> I have a basic search function that uses Q objects.
>>> After profiling it I found that the actual (mysql) database query
>>> finishes in fractions of seconds but the iterating after this can take
>>> up to 50 seconds per 10.000 results.
>>>
>>> I have been trying to speed it up but I have had not much results..
>>>
>>> My query is this one:
>>>
>>>   found_entries = model.objects.filter((Q-objects),
>>> obsolete=0).order_by('-version','docid')
>>>
>>> So far so good, but then I need a dictionary to retrieve only unique
>>> 'documentid's'.
>>>
>>
>> You could do:
>>
>> # grab all results
>> _res =
>> model.objects.filter((Q-objects),obsolete=0).order_by('-version','docid').values()
>> # re-map them into (id, obj)
>> _res = map(lambda x: [x.docid, x], _res)
>>
> Sorry, small adjustment:

# re-map them into (id, obj), model(**x) will remap the values into new ORM
objects
_res = map(lambda x: [x.get('docid'), model(**x)], _res)


# wrap in a dict(), which uses index position 0 as the key, and index
>> position 1 as the value
>> _res = dict(_res)
>>
>
> Just tested the same principle, and it seems to work fine. It uses last
> object found as the final choice if dups are found.
>
> >>> _res = [ [1,2], [1,3], [2,4], [2,5] ]
> >>> map(lambda x: [x[0], x[1]], _res)
> [[1, 2], [1, 3], [2, 4], [2, 5]]
> >>> map(lambda x: [x[0], x[1]], _res)
> [[1, 2], [1, 3], [2, 4], [2, 5]]
> >>> dict(map(lambda x: [x[0], x[1]], _res))
> {1: 3, 2: 5}
> >>>
>
>
>
>
>>
>> Let me know if this give you the results you need.
>>
>>
>>>rev_dict = {}
>>>
>>> This is the part that hurts:
>>>
>>>for d in found_entries:
>>>rev_dict[d.documentid] = d
>>>
>>> And then some more sorting and filtering:
>>>
>>>filtered_entries = rev_dict.values()
>>>filtered_entries.sort(key=lambda d: d.revisiondate, reverse=True)
>>>
>>> Does anyone have some better ideas to achieve this?
>>>
>>> Thanks
>>>




Re: speeding up iterating over query set

2011-07-11 Thread Cal Leeming [Simplicity Media Ltd]
On Mon, Jul 11, 2011 at 3:04 PM, Cal Leeming [Simplicity Media Ltd] <
cal.leem...@simplicitymedialtd.co.uk> wrote:

>
>
> On Mon, Jul 11, 2011 at 2:57 PM, Michel30  wrote:
>
>> Hi all,
>>
>> I have a basic search function that uses Q objects.
>> After profiling it I found that the actual (mysql) database query
>> finishes in fractions of seconds but the iterating after this can take
>> up to 50 seconds per 10.000 results.
>>
>> I have been trying to speed it up but I have had not much results..
>>
>> My query is this one:
>>
>>   found_entries = model.objects.filter((Q-objects),
>> obsolete=0).order_by('-version','docid')
>>
>> So far so good, but then I need a dictionary to retrieve only unique
>> 'documentid's'.
>>
>
> You could do:
>
> # grab all results
> _res =
> model.objects.filter((Q-objects),obsolete=0).order_by('-version','docid').values()
> # re-map them into (id, obj)
> _res = map(lambda x: [x.docid, x], _res)
> # wrap in a dict(), which uses index position 0 as the key, and index
> position 1 as the value
> _res = dict(_res)
>

Just tested the same principle, and it seems to work fine. It uses the
last object found as the final choice if duplicates are found.

>>> _res = [ [1,2], [1,3], [2,4], [2,5] ]
>>> map(lambda x: [x[0], x[1]], _res)
[[1, 2], [1, 3], [2, 4], [2, 5]]
>>> map(lambda x: [x[0], x[1]], _res)
[[1, 2], [1, 3], [2, 4], [2, 5]]
>>> dict(map(lambda x: [x[0], x[1]], _res))
{1: 3, 2: 5}
>>>




>
> Let me know if this give you the results you need.
>
>
>>rev_dict = {}
>>
>> This is the part that hurts:
>>
>>for d in found_entries:
>>rev_dict[d.documentid] = d
>>
>> And then some more sorting and filtering:
>>
>>filtered_entries = rev_dict.values()
>>filtered_entries.sort(key=lambda d: d.revisiondate, reverse=True)
>>
>> Does anyone have some better ideas to achieve this?
>>
>> Thanks
>>




Re: speeding up iterating over query set

2011-07-11 Thread Cal Leeming [Simplicity Media Ltd]
On Mon, Jul 11, 2011 at 2:57 PM, Michel30  wrote:

> Hi all,
>
> I have a basic search function that uses Q objects.
> After profiling it I found that the actual (mysql) database query
> finishes in fractions of seconds but the iterating after this can take
> up to 50 seconds per 10.000 results.
>
> I have been trying to speed it up but I have had not much results..
>
> My query is this one:
>
>   found_entries = model.objects.filter((Q-objects),
> obsolete=0).order_by('-version','docid')
>
> So far so good, but then I need a dictionary to retrieve only unique
> 'documentid's'.
>

You could do:

# grab all results
_res =
model.objects.filter((Q-objects),obsolete=0).order_by('-version','docid').values()
# re-map them into (id, obj)
_res = map(lambda x: [x.docid, x], _res)
# wrap in a dict(), which uses index position 0 as the key, and index
position 1 as the value
_res = dict(_res)

Let me know if this gives you the results you need.
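
To see the shape of that pipeline without a database, here is a sketch on
plain dicts (the kind .values() yields); the docid/version field names are
illustrative. Note that dict() keeps the last pair per key when a key
repeats:

```python
# Sketch of the map/dict dedup pipeline on plain dicts standing in for
# .values() rows (no Django needed); field names are illustrative.
rows = [
    {'docid': 1, 'version': 3},
    {'docid': 2, 'version': 3},
    {'docid': 1, 'version': 1},
]
pairs = map(lambda x: [x['docid'], x], rows)  # re-map into (docid, row) pairs
dedup = dict(pairs)                           # last row per docid wins
# dedup == {1: {'docid': 1, 'version': 1}, 2: {'docid': 2, 'version': 3}}
```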


>rev_dict = {}
>
> This is the part that hurts:
>
>for d in found_entries:
>rev_dict[d.documentid] = d
>
> And then some more sorting and filtering:
>
>filtered_entries = rev_dict.values()
>filtered_entries.sort(key=lambda d: d.revisiondate, reverse=True)
>
> Does anyone have some better ideas to achieve this?
>
> Thanks
>




speeding up iterating over query set

2011-07-11 Thread Michel30
Hi all,

I have a basic search function that uses Q objects.
After profiling it I found that the actual (MySQL) database query
finishes in a fraction of a second, but iterating over the results
afterwards can take up to 50 seconds per 10,000 results.

I have been trying to speed it up, but without much result so far.

My query is this one:

    found_entries = model.objects.filter((Q-objects),
        obsolete=0).order_by('-version','docid')

So far so good, but then I need a dictionary to retrieve only unique
documentids.

    rev_dict = {}

This is the part that hurts:

    for d in found_entries:
        rev_dict[d.documentid] = d

And then some more sorting and filtering:

    filtered_entries = rev_dict.values()
    filtered_entries.sort(key=lambda d: d.revisiondate, reverse=True)
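
For reference, the same three steps can be run on plain dicts standing in
for model instances (the data below is made up); with the plain
assignment, the last row seen for each documentid is the one kept:

```python
# The dedup-and-sort steps above on plain dicts (illustrative data).
found_entries = [  # as returned ordered by -version, then docid
    {'documentid': 1, 'version': 3, 'revisiondate': '2000-02-15'},
    {'documentid': 2, 'version': 3, 'revisiondate': '2000-02-14'},
    {'documentid': 1, 'version': 1, 'revisiondate': '2000-02-10'},
]
rev_dict = {}
for d in found_entries:
    rev_dict[d['documentid']] = d  # later rows overwrite earlier ones
filtered_entries = sorted(rev_dict.values(),
                          key=lambda d: d['revisiondate'], reverse=True)
# one entry per documentid, sorted newest revisiondate first
```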

Does anyone have some better ideas to achieve this?

Thanks
