You should rewrite the query into a form that doesn't require distinct. In 
general, when you see a query that has joins and DISTINCT, that should be 
an alarm bell that something isn't written correctly in the query. 
Unfortunately Django's ORM generates such queries, and that isn't easy to 
fix as there are databases that like DISTINCT more than rewriting the query 
to use subqueries.

In any case, you should check how to write the query without the need of 
distinct. Something like this should work:

self.items = BibliographicRecord.objects.
listable_objects().filter( 
        authored__researcher__in=researchers, 
        )
self.items = BibliographicRecord.objects.
listable_objects().filter( 
        pk__in=self.items.values_list('pk') 
        )

But maybe you can push the __in to deeper into the authored__researches 
lookup...

 - Anssi

On Saturday, November 2, 2013 4:50:42 PM UTC+2, Daniele Procida wrote:
>
> On Fri, Nov 1, 2013, Javier Guerra Giraldez <jav...@guerrag.com<javascript:>> 
> wrote: 
>
> >have you tried eliminating the second IN relationship?  something like 
> > 
> >entities = entity.get_descendants() 
> > 
> >items = BibliographicRecord.objects.filter 
> >(authored__researcher__person__member_of__entity__in=entities).distinct() 
>
> Indeed I have, but in that form it takes around 1770ms, compared to around 
> 1540ms in the original form. What I actually do is: 
>
>     # breaking apart the queries allows the use of values_lists 
>     entities = self.entity.get_descendants( 
>         include_self=True 
>         ).values_list('id', flat=True) 
>
>     # and the set() here is about 230ms faster than putting a distinct() 
> on 
>     # the first query 
>     researchers = set(Researcher.objects.filter( 
>         person__entities__in=entities 
>         ).values_list('person', flat=True)) 
>
>     self.items = BibliographicRecord.objects.listable_objects().filter( 
>         authored__researcher__in=researchers, 
>         ).distinct() 
>
> I think that's partly because this way the SELECT doesn't have to grab all 
> the fields of publications_bibliographicrecord. 
>
> But, the real killer is the combination of ordering (in the queryset or on 
> the model, it doesn't matter) with the distinct() - as soon as one is 
> removed from the equation, the execution time drops to around 250ms. 
>
> That's for 55000 BibliographicRecords created by that last operation 
> (before distinct() is applied; distinct() reduces them to 28000). 
>
> That seems excessive to me. 
>
> BibliographicRecord has a custom primary key, and its id fields look like 
> "d9ce7e2f-663e-4fc6-8448-b214c6915aed:web-of-science". Could that be 
> implicated in performance? 
>
> Daniele 
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-users+unsubscr...@googlegroups.com.
To post to this group, send email to django-users@googlegroups.com.
Visit this group at http://groups.google.com/group/django-users.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-users/8208cc22-cc94-4fe6-9245-709bdc42647f%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Reply via email to