Re: Project optimisation stage: Advice to boost speed of database queries across tables?
I also read a good article. As it turns out I'm using a lot of inner joins: http://www.caktusgroup.com/blog/2009/09/ This was quite handy. My process is usually to build it to be as fast as possible in raw SQL, then go back in and try to recreate it with the ORM. "I've read your description and it doesn't mention how often your data changes. If it's not too often, you might want to consider simply caching the whole results or parts of it. You'll then have to add a bit of code to regenerate the cache every time the data changes." Yes, actually, in some of the scenarios the data is quite static; however, I'm doing a lot of stuff with lat/long coordinates now, so I have a bunch of keys in the returned results dict which are all dynamic, per-click values and need to be recalculated a lot. Note: I have taken your advice on board for some of my views, Philippe. Cheers Sam_W On Fri, Jan 8, 2010 at 12:18 AM, Philippe Raoult wrote: > Hi Sam, > > I've read your description and it doesn't mention how often your data > changes. If it's not too often, you might want to consider simply > caching the whole results or parts of it. You'll then have to add a > bit of code to regenerate the cache every time the data changes. It > looks to me like you have one query for selecting which entries to > print, then many repetitive queries to find information about each > entry. If you had a per-entry cache then you would probably be down to > one main query plus a lot of cache lookups. > > Also you were on the right track when using addresses = > Address.objects.filter(directory__in=directories). I've often used > that kind of queries to reduce (1 + many queries) down to (2 + lots of > dictionnary operations) > > Hope that helps. > > Regards, > Philippe > > > On 7 jan, 11:39, Sam Walters wrote: >> Thanks Koen >> I had suspected such things would exist but couldnt find them. >> I will take a look at this. Looks like a neat addon which would >> greatly help my project turnaround time.
>> >> cheers >> Sam >> >> On Wed, Jan 6, 2010 at 11:07 PM, koenb wrote: >> >> > On 31 dec 2009, 01:56, Sam Walters wrote: >> >> Thanks for the replies. >> >> >> Yes, there is the option of going to raw MySQL. However the project >> >> requirements mean i can't use raw SQL. (portability, readability) >> >> From what i can see using django's db API i have to execute the >> >> queries 500 times. >> >> I am very familiar with the query documentation and i know that >> >> select_related will prevent foward facing foreign keys translating to >> >> an individual sql queries which hit the db and would slow it down. >> >> >> Fact is even when i dont use 'select_related' the major performance >> >> problem occurs with the 'many-to-many' and 'reverse foreign' keys >> >> (some 75% of the performance penalty for my package method is with >> >> these) and only 20% can be solved by select_related. >> >> >> To be specific about how the multiplicities unfold: >> >> >> search_querySet is a Directory.objects.filter(... >> >> >> for s in search_querySet: >> >> address_info = s.address_set.all() #.select_related(depth=2) - >> >> yes i can/will put select related here but it really does not help >> >> that much 20% tops >> >> #address_info is usually 2-3 rows from an address table >> >> for a in address_info:#.select_related(depth=2): >> >> if a.addresstype.adrtype == 'Physical' and >> >> a.addradmin.addr_enabled == True: >> >> #further reduction in the number of rows which we need to >> >> get values from. >> >> related_phone=a.phone_set.all() >> >> related_email=s.email_set.all() >> >> #phones and emails are a small number of rows 2-3 tops >> >> >> It is these lines which produce the performance hit. >> >> I cant see a way of using django's query language to avoid having to >> >> descend into each of the 500 'directory' objects because of the >> >> necessity to get all rows from the related tables in phones an emails >> >> and to inspect the type of 'address' object. 
>> >> >> thanks for the ideas. will continue testing and looking for answers >> >> >> -Sam >> >> >> On Thu, Dec 31, 2009 at 2:41 AM, Nick Arnett >> >> wrote: >> >> >> > On Wed, Dec 30, 2009 at 7:15 AM, Adam Playford >> >> > wrote: >> >> >> >> I'm not an expert on this, but a few thoughts. >> >> >> >> First, if I'm reading your message right, it sounds like your problem >> >> >> probably isn't with the query, but with how many times you're running >> >> >> it. >> >> >> > I'll echo that... the problem is not the database - the queries are as >> >> > good >> >> > as it gets. The problem is running them repeatedly. >> >> >> > If all else fails, I'd replace those queries that execute 500 times >> >> > with raw >> >> > SQL that uses the IN operator to get the required rows.
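Philippe's per-entry cache suggestion, quoted above, can be sketched roughly as follows. This is a minimal illustration, not code from the thread: the function names are made up, and a plain dict stands in for Django's cache framework (django.core.cache offers the same get/set/delete shape, plus timeouts).

```python
# Minimal sketch of a per-entry cache: the static per-entry data is
# computed once and reused; dynamic per-click values (e.g. lat/long
# derived keys) would stay outside the cache and be computed per request.
_entry_cache = {}  # stand-in for django.core.cache

def get_entry_details(entry_id, fetch_from_db):
    """Return cached details for one directory entry, computing on a miss."""
    if entry_id not in _entry_cache:
        _entry_cache[entry_id] = fetch_from_db(entry_id)
    return _entry_cache[entry_id]

def invalidate_entry(entry_id):
    """Call this from every write path that touches the entry's rows."""
    _entry_cache.pop(entry_id, None)

# Usage: the expensive per-entry queries run only on the first request.
calls = []
def fake_db_fetch(entry_id):
    calls.append(entry_id)  # record each simulated database hit
    return {"id": entry_id, "addresses": ["..."]}

first = get_entry_details(42, fake_db_fetch)
second = get_entry_details(42, fake_db_fetch)  # served from the cache
```

The invalidation hook is the part Philippe warns about: if any write path forgets to call it, stale results get served.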
Re: Project optimisation stage: Advice to boost speed of database queries across tables?
Hi Sam, I've read your description and it doesn't mention how often your data changes. If it's not too often, you might want to consider simply caching the whole results or parts of it. You'll then have to add a bit of code to regenerate the cache every time the data changes. It looks to me like you have one query for selecting which entries to print, then many repetitive queries to find information about each entry. If you had a per-entry cache then you would probably be down to one main query plus a lot of cache lookups. Also, you were on the right track when using addresses = Address.objects.filter(directory__in=directories). I've often used that kind of query to reduce (1 + many queries) down to (2 + lots of dictionary operations). Hope that helps. Regards, Philippe On 7 jan, 11:39, Sam Walters wrote: > Thanks Koen > I had suspected such things would exist but couldnt find them. > I will take a look at this. Looks like a neat addon which would > greatly help my project turnaround time. > > cheers > Sam > > On Wed, Jan 6, 2010 at 11:07 PM, koenb wrote: > > > On 31 dec 2009, 01:56, Sam Walters wrote: > >> Thanks for the replies. > > >> Yes, there is the option of going to raw MySQL. However the project > >> requirements mean i can't use raw SQL. (portability, readability) > >> From what i can see using django's db API i have to execute the > >> queries 500 times. > >> I am very familiar with the query documentation and i know that > >> select_related will prevent foward facing foreign keys translating to > >> an individual sql queries which hit the db and would slow it down. > > >> Fact is even when i dont use 'select_related' the major performance > >> problem occurs with the 'many-to-many' and 'reverse foreign' keys > >> (some 75% of the performance penalty for my package method is with > >> these) and only 20% can be solved by select_related. > > >> To be specific about how the multiplicities unfold: > > >> search_querySet is a Directory.objects.filter(...
> > >> for s in search_querySet: > >> address_info = s.address_set.all() #.select_related(depth=2) - > >> yes i can/will put select related here but it really does not help > >> that much 20% tops > >> #address_info is usually 2-3 rows from an address table > >> for a in address_info:#.select_related(depth=2): > >> if a.addresstype.adrtype == 'Physical' and > >> a.addradmin.addr_enabled == True: > >> #further reduction in the number of rows which we need to > >> get values from. > >> related_phone=a.phone_set.all() > >> related_email=s.email_set.all() > >> #phones and emails are a small number of rows 2-3 tops > > >> It is these lines which produce the performance hit. > >> I cant see a way of using django's query language to avoid having to > >> descend into each of the 500 'directory' objects because of the > >> necessity to get all rows from the related tables in phones an emails > >> and to inspect the type of 'address' object. > > >> thanks for the ideas. will continue testing and looking for answers > > >> -Sam > > >> On Thu, Dec 31, 2009 at 2:41 AM, Nick Arnett wrote: > > >> > On Wed, Dec 30, 2009 at 7:15 AM, Adam Playford > >> > wrote: > > >> >> I'm not an expert on this, but a few thoughts. > > >> >> First, if I'm reading your message right, it sounds like your problem > >> >> probably isn't with the query, but with how many times you're running > >> >> it. > > >> > I'll echo that... the problem is not the database - the queries are as > >> > good > >> > as it gets. The problem is running them repeatedly. > > >> > If all else fails, I'd replace those queries that execute 500 times with > >> > raw > >> > SQL that uses the IN operator to get the required rows. > > >> > E.g.: SELECT `common_addresstype`.`id`, `common_addresstype`.`adrtype` > >> > FROM > >> > `common_addresstype` WHERE `common_addresstype`.`id` IN (1,6,8,52,173) > > >> > I imagine there's an ORM query that will do the same thing, but I know > >> > MySQL > >> > far better than I know Django. 
> > >> > Nick > > > You can take a look at apps like django-selectreverse [1] or django- > > batch-select [2] for some ideas to make this kind of optimisations > > easier. > > > Koen > > > [1]http://code.google.com/p/django-selectreverse/ > > [2]http://github.com/lilspikey/django-batch-select > > > -- > > You received this message because you are subscribed to the Google Groups > > "Django users" group. > > To post to this group, send email to django-us...@googlegroups.com. > > To unsubscribe from this group, send email to > > django-users+unsubscr...@googlegroups.com. > > For more options, visit this group > > athttp://groups.google.com/group/django-users?hl=en.
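The "(2 + lots of dictionary operations)" pattern Philippe describes can be sketched in plain Python. The row dicts below are made-up stand-ins for what Directory.objects.filter(...) and Address.objects.filter(directory__in=...) would return; only the grouping logic is the point.

```python
from collections import defaultdict

# Query 1 (simulated): the directory entries to display.
directories = [{"id": 1, "name": "Alpha"}, {"id": 2, "name": "Beta"}]

# Query 2 (simulated): ALL related addresses fetched in one go via
# Address.objects.filter(directory__in=...), instead of one query per entry.
addresses = [
    {"directory_id": 1, "adrtype": "Physical"},
    {"directory_id": 1, "adrtype": "Postal"},
    {"directory_id": 2, "adrtype": "Physical"},
]

# Dictionary operations: group the children by their parent's id once...
by_directory = defaultdict(list)
for addr in addresses:
    by_directory[addr["directory_id"]].append(addr)

# ...then each entry's addresses are a cheap dict lookup, not a query.
results = {d["id"]: by_directory[d["id"]] for d in directories}
```

The same two-query-plus-grouping shape extends to the phone and email sets: one IN query per relation, rather than one query per object.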
Re: Project optimisation stage: Advice to boost speed of database queries across tables?
Thanks Koen I had suspected such things would exist but couldn't find them. I will take a look at this. Looks like a neat addon which would greatly help my project turnaround time. cheers Sam On Wed, Jan 6, 2010 at 11:07 PM, koenb wrote: > > On 31 dec 2009, 01:56, Sam Walters wrote: >> Thanks for the replies. >> >> Yes, there is the option of going to raw MySQL. However the project >> requirements mean i can't use raw SQL. (portability, readability) >> From what i can see using django's db API i have to execute the >> queries 500 times. >> I am very familiar with the query documentation and i know that >> select_related will prevent foward facing foreign keys translating to >> an individual sql queries which hit the db and would slow it down. >> >> Fact is even when i dont use 'select_related' the major performance >> problem occurs with the 'many-to-many' and 'reverse foreign' keys >> (some 75% of the performance penalty for my package method is with >> these) and only 20% can be solved by select_related. >> >> To be specific about how the multiplicities unfold: >> >> search_querySet is a Directory.objects.filter(... >> >> for s in search_querySet: >> address_info = s.address_set.all() #.select_related(depth=2) - >> yes i can/will put select related here but it really does not help >> that much 20% tops >> #address_info is usually 2-3 rows from an address table >> for a in address_info:#.select_related(depth=2): >> if a.addresstype.adrtype == 'Physical' and >> a.addradmin.addr_enabled == True: >> #further reduction in the number of rows which we need to >> get values from. >> related_phone=a.phone_set.all() >> related_email=s.email_set.all() >> #phones and emails are a small number of rows 2-3 tops >> >> It is these lines which produce the performance hit.
>> I cant see a way of using django's query language to avoid having to >> descend into each of the 500 'directory' objects because of the >> necessity to get all rows from the related tables in phones an emails >> and to inspect the type of 'address' object. >> >> thanks for the ideas. will continue testing and looking for answers >> >> -Sam >> >> On Thu, Dec 31, 2009 at 2:41 AM, Nick Arnett wrote: >> >> > On Wed, Dec 30, 2009 at 7:15 AM, Adam Playford >> > wrote: >> >> >> I'm not an expert on this, but a few thoughts. >> >> >> First, if I'm reading your message right, it sounds like your problem >> >> probably isn't with the query, but with how many times you're running >> >> it. >> >> > I'll echo that... the problem is not the database - the queries are as good >> > as it gets. The problem is running them repeatedly. >> >> > If all else fails, I'd replace those queries that execute 500 times with >> > raw >> > SQL that uses the IN operator to get the required rows. >> >> > E.g.: SELECT `common_addresstype`.`id`, `common_addresstype`.`adrtype` FROM >> > `common_addresstype` WHERE `common_addresstype`.`id` IN (1,6,8,52,173) >> >> > I imagine there's an ORM query that will do the same thing, but I know >> > MySQL >> > far better than I know Django. >> >> > Nick >> > > You can take a look at apps like django-selectreverse [1] or django- > batch-select [2] for some ideas to make this kind of optimisations > easier. > > Koen > > [1] http://code.google.com/p/django-selectreverse/ > [2] http://github.com/lilspikey/django-batch-select > > -- > You received this message because you are subscribed to the Google Groups > "Django users" group. > To post to this group, send email to django-us...@googlegroups.com. > To unsubscribe from this group, send email to > django-users+unsubscr...@googlegroups.com. > For more options, visit this group at > http://groups.google.com/group/django-users?hl=en. 
Re: Project optimisation stage: Advice to boost speed of database queries across tables?
On 31 dec 2009, 01:56, Sam Walterswrote: > Thanks for the replies. > > Yes, there is the option of going to raw MySQL. However the project > requirements mean i can't use raw SQL. (portability, readability) > From what i can see using django's db API i have to execute the > queries 500 times. > I am very familiar with the query documentation and i know that > select_related will prevent foward facing foreign keys translating to > an individual sql queries which hit the db and would slow it down. > > Fact is even when i dont use 'select_related' the major performance > problem occurs with the 'many-to-many' and 'reverse foreign' keys > (some 75% of the performance penalty for my package method is with > these) and only 20% can be solved by select_related. > > To be specific about how the multiplicities unfold: > > search_querySet is a Directory.objects.filter(... > > for s in search_querySet: > address_info = s.address_set.all() #.select_related(depth=2) - > yes i can/will put select related here but it really does not help > that much 20% tops > #address_info is usually 2-3 rows from an address table > for a in address_info:#.select_related(depth=2): > if a.addresstype.adrtype == 'Physical' and > a.addradmin.addr_enabled == True: > #further reduction in the number of rows which we need to > get values from. > related_phone=a.phone_set.all() > related_email=s.email_set.all() > #phones and emails are a small number of rows 2-3 tops > > It is these lines which produce the performance hit. > I cant see a way of using django's query language to avoid having to > descend into each of the 500 'directory' objects because of the > necessity to get all rows from the related tables in phones an emails > and to inspect the type of 'address' object. > > thanks for the ideas. 
will continue testing and looking for answers > > -Sam > > On Thu, Dec 31, 2009 at 2:41 AM, Nick Arnett wrote: > > > On Wed, Dec 30, 2009 at 7:15 AM, Adam Playford > > wrote: > > >> I'm not an expert on this, but a few thoughts. > > >> First, if I'm reading your message right, it sounds like your problem > >> probably isn't with the query, but with how many times you're running > >> it. > > > I'll echo that... the problem is not the database - the queries are as good > > as it gets. The problem is running them repeatedly. > > > If all else fails, I'd replace those queries that execute 500 times with raw > > SQL that uses the IN operator to get the required rows. > > > E.g.: SELECT `common_addresstype`.`id`, `common_addresstype`.`adrtype` FROM > > `common_addresstype` WHERE `common_addresstype`.`id` IN (1,6,8,52,173) > > > I imagine there's an ORM query that will do the same thing, but I know MySQL > > far better than I know Django. > > > Nick > You can take a look at apps like django-selectreverse [1] or django-batch-select [2] for some ideas to make this kind of optimisation easier. Koen [1] http://code.google.com/p/django-selectreverse/ [2] http://github.com/lilspikey/django-batch-select
Re: Project optimisation stage: Advice to boost speed of database queries across tables?
On 4 Sty, 07:30, Sam Walters wrote: > Hi Tomasz > Yes, i have followed a raw sql approach now im looking at my test data > to see which objects have multiple rows and cleaning that up. > > Its a shame that '__in' has limited use under these scenarios: > > directories = search_querySet.distinct() > addresses = Address.objects.filter(directory__in=directories) > addresses.values('directory__id', . *some other relevent fields*) > > this certainly allows me to select the related sets of addresses for > each directory however i need to be able to relate each address object > back to its directory (by id would be great) trying to get the > directory 'id' packaged in the values() gives me an error saying > invalid field even though 'directory' is listed as a valid field. I'm not 100% sure, but AFAIR you can only take values of fields contained directly in models. > If i could do that then i could iterate through each dictionary and > zip related items together based on their directory id's or something > nice like that. > > " > Re those VIEWs, they are SELECTs with JOINs, which effectively > produce up to a few rows for single object (e.g. if you have person > with 3 phone numbers, > you're getting 3 rows), but it turns out to be much more efficient to > process/merge > that in Python code than to issue hundreds of SQL queries. > " > > Yes this seems to be the best way, do you have any links where i can > see how various people have implemented this? Would be good to write a > 'pythonic' solution > I wrote a simple function that works for my purposes; I suspect that your use-case also doesn't require anything fancy, but I agree that 'pythonic' solutions are good for morale :-) -- Tomasz Zielinski http://pyconsultant.eu
Re: Project optimisation stage: Advice to boost speed of database queries across tables?
Hi Tomasz Yes, I have followed a raw SQL approach; now I'm looking at my test data to see which objects have multiple rows, and cleaning that up. It's a shame that '__in' has limited use under these scenarios: directories = search_querySet.distinct() addresses = Address.objects.filter(directory__in=directories) addresses.values('directory__id', . *some other relevant fields*) This certainly allows me to select the related sets of addresses for each directory; however, I need to be able to relate each address object back to its directory (by id would be great). Trying to get the directory 'id' packaged in the values() gives me an error saying invalid field, even though 'directory' is listed as a valid field. If I could do that, then I could iterate through each dictionary and zip related items together based on their directory ids, or something nice like that. " Re those VIEWs, they are SELECTs with JOINs, which effectively produce up to a few rows for single object (e.g. if you have person with 3 phone numbers, you're getting 3 rows), but it turns out to be much more efficient to process/merge that in Python code than to issue hundreds of SQL queries. " Yes, this seems to be the best way. Do you have any links where I can see how various people have implemented this? Would be good to write a 'pythonic' solution. Cheers -sam 2010/1/4 Tomasz Zieliński: > On 31 Gru 2009, 01:56, Sam Walters wrote: > >> for s in search_querySet: >> address_info = s.address_set.all() #.select_related(depth=2) - >> yes i can/will put select related here but it really does not help >> that much 20% tops >> #address_info is usually 2-3 rows from an address table >> for a in address_info:#.select_related(depth=2): >> if a.addresstype.adrtype == 'Physical' and >> a.addradmin.addr_enabled == True: >> #further reduction in the number of rows which we need to >> get values from.
>> related_phone=a.phone_set.all() >> related_email=s.email_set.all() >> #phones and emails are a small number of rows 2-3 tops >> >> It is these lines which produce the performance hit. >> I cant see a way of using django's query language to avoid having to >> descend into each of the 500 'directory' objects because of the >> necessity to get all rows from the related tables in phones an emails >> and to inspect the type of 'address' object. >> > > I solved very similar problem by creating database VIEWs with data I > needed, > wrapping those VIEWs with unmanaged Django models and then using > simple .filter(...)-s. > > Re those VIEWs, they are SELECTs with JOINs, which effectively > produce up to a few rows for single object (e.g. if you have person > with 3 phone numbers, > you're getting 3 rows), but it turns out to be much more efficient to > process/merge > that in Python code than to issue hundreds of SQL queries. > > -- > Tomasz Zielinski > http://pyconsultant.eu > > -- > > You received this message because you are subscribed to the Google Groups > "Django users" group. > To post to this group, send email to django-us...@googlegroups.com. > To unsubscribe from this group, send email to > django-users+unsubscr...@googlegroups.com. > For more options, visit this group at > http://groups.google.com/group/django-users?hl=en.
Re: Project optimisation stage: Advice to boost speed of database queries across tables?
On 31 Gru 2009, 01:56, Sam Walters wrote: > for s in search_querySet: > address_info = s.address_set.all() #.select_related(depth=2) - > yes i can/will put select related here but it really does not help > that much 20% tops > #address_info is usually 2-3 rows from an address table > for a in address_info:#.select_related(depth=2): > if a.addresstype.adrtype == 'Physical' and > a.addradmin.addr_enabled == True: > #further reduction in the number of rows which we need to > get values from. > related_phone=a.phone_set.all() > related_email=s.email_set.all() > #phones and emails are a small number of rows 2-3 tops > > It is these lines which produce the performance hit. > I cant see a way of using django's query language to avoid having to > descend into each of the 500 'directory' objects because of the > necessity to get all rows from the related tables in phones an emails > and to inspect the type of 'address' object. > I solved a very similar problem by creating database VIEWs with the data I needed, wrapping those VIEWs with unmanaged Django models and then using simple .filter(...)-s. Re those VIEWs, they are SELECTs with JOINs, which effectively produce up to a few rows for a single object (e.g. if you have a person with 3 phone numbers, you're getting 3 rows), but it turns out to be much more efficient to process/merge that in Python code than to issue hundreds of SQL queries. -- Tomasz Zielinski http://pyconsultant.eu
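Tomasz's VIEW approach can be illustrated with a self-contained sqlite3 sketch (the table and column names here are invented for the example; in the real project the VIEW would live in MySQL and be wrapped by a Django model with managed = False in its Meta). One query against the view returns the joined rows, and the merge happens in Python:

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE phone (id INTEGER PRIMARY KEY, person_id INTEGER, number TEXT);
    INSERT INTO person VALUES (1, 'Sam'), (2, 'Koen');
    INSERT INTO phone VALUES (1, 1, '111'), (2, 1, '222'), (3, 2, '333');
    -- The VIEW does the JOIN once; a person with two phones yields two rows.
    CREATE VIEW person_with_phone AS
        SELECT person.id AS person_id, person.name, phone.number
        FROM person JOIN phone ON phone.person_id = person.id;
""")

# One query against the view, then merge the duplicated parent rows in Python.
merged = defaultdict(list)
for person_id, name, number in conn.execute(
        "SELECT person_id, name, number FROM person_with_phone "
        "ORDER BY person_id, number"):
    merged[name].append(number)
```

The trade-off is exactly the one described above: the view produces a few redundant rows per object, but merging them in Python is far cheaper than issuing hundreds of separate queries.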
Re: Project optimisation stage: Advice to boost speed of database queries across tables?
On 30 Gru 2009, 16:41, Nick Arnett wrote: > > E.g.: SELECT `common_addresstype`.`id`, `common_addresstype`.`adrtype` FROM > `common_addresstype` WHERE `common_addresstype`.`id` IN (1,6,8,52,173) > > I imagine there's an ORM query that will do the same thing, but I know MySQL > far better than I know Django. > This is as simple as using .filter(addresstype__id__in=[1,6,8,52,173]) -- Tomasz Zielinski http://pyconsultant.eu
Re: Project optimisation stage: Advice to boost speed of database queries across tables?
Thanks for the replies. Yes, there is the option of going to raw MySQL. However, the project requirements mean I can't use raw SQL (portability, readability). From what I can see, using django's db API I have to execute the queries 500 times. I am very familiar with the query documentation, and I know that select_related will prevent forward-facing foreign keys translating into individual SQL queries which hit the db and would slow it down. The fact is, even when I don't use 'select_related' the major performance problem occurs with the 'many-to-many' and 'reverse foreign' keys (some 75% of the performance penalty for my package method is with these), and only 20% can be solved by select_related. To be specific about how the multiplicities unfold, search_querySet is a Directory.objects.filter(...

for s in search_querySet:
    # address_info is usually 2-3 rows from an address table
    address_info = s.address_set.all()  # .select_related(depth=2) - yes I
                                        # can/will put select_related here but
                                        # it really does not help that much,
                                        # 20% tops
    for a in address_info:  # .select_related(depth=2)
        if a.addresstype.adrtype == 'Physical' and a.addradmin.addr_enabled == True:
            # further reduction in the number of rows which we need to
            # get values from
            related_phone = a.phone_set.all()
            related_email = s.email_set.all()
            # phones and emails are a small number of rows, 2-3 tops

It is these lines which produce the performance hit. I can't see a way of using django's query language to avoid having to descend into each of the 500 'directory' objects, because of the necessity to get all rows from the related tables in phones and emails and to inspect the type of the 'address' object. Thanks for the ideas; will continue testing and looking for answers. -Sam On Thu, Dec 31, 2009 at 2:41 AM, Nick Arnett wrote: > > > On Wed, Dec 30, 2009 at 7:15 AM, Adam Playford > wrote: >> >> I'm not an expert on this, but a few thoughts.
>> >> First, if I'm reading your message right, it sounds like your problem >> probably isn't with the query, but with how many times you're running >> it. > > I'll echo that... the problem is not the database - the queries are as good > as it gets. The problem is running them repeatedly. > > If all else fails, I'd replace those queries that execute 500 times with raw > SQL that uses the IN operator to get the required rows. > > E.g.: SELECT `common_addresstype`.`id`, `common_addresstype`.`adrtype` FROM > `common_addresstype` WHERE `common_addresstype`.`id` IN (1,6,8,52,173) > > I imagine there's an ORM query that will do the same thing, but I know MySQL > far better than I know Django. > > Nick > > -- > > You received this message because you are subscribed to the Google Groups > "Django users" group. > To post to this group, send email to django-us...@googlegroups.com. > To unsubscribe from this group, send email to > django-users+unsubscr...@googlegroups.com. > For more options, visit this group at > http://groups.google.com/group/django-users?hl=en.
Re: Project optimisation stage: Advice to boost speed of database queries across tables?
On Wed, Dec 30, 2009 at 7:15 AM, Adam Playford wrote: > I'm not an expert on this, but a few thoughts. > > First, if I'm reading your message right, it sounds like your problem > probably isn't with the query, but with how many times you're running > it. I'll echo that... the problem is not the database - the queries are as good as it gets. The problem is running them repeatedly. If all else fails, I'd replace those queries that execute 500 times with raw SQL that uses the IN operator to get the required rows. E.g.: SELECT `common_addresstype`.`id`, `common_addresstype`.`adrtype` FROM `common_addresstype` WHERE `common_addresstype`.`id` IN (1,6,8,52,173) I imagine there's an ORM query that will do the same thing, but I know MySQL far better than I know Django. Nick
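Nick's IN-operator suggestion, sketched with sqlite3 and proper parameter placeholders instead of string-pasted ids (the table name is borrowed from the thread's queries; the rows are made up for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE common_addresstype (id INTEGER PRIMARY KEY, adrtype TEXT)")
conn.executemany(
    "INSERT INTO common_addresstype VALUES (?, ?)",
    [(1, "Physical"), (6, "Postal"), (8, "Billing"), (52, "Other")])

wanted_ids = [1, 6, 8, 52, 173]  # 173 does not exist; it is simply skipped

# Build one placeholder per id so the values are bound, not interpolated.
placeholders = ",".join("?" for _ in wanted_ids)
rows = conn.execute(
    f"SELECT id, adrtype FROM common_addresstype WHERE id IN ({placeholders})",
    wanted_ids,
).fetchall()
# One round trip fetches every matching row, instead of one query per id.
```

In Django the equivalent is the `__in` lookup shown later in the thread, which generates this same IN clause for you.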
Re: Project optimisation stage: Advice to boost speed of database queries across tables?
I'm not an expert on this, but a few thoughts. First, if I'm reading your message right, it sounds like your problem probably isn't with the query, but with how many times you're running it. If you think about it, if it's taking you 12 seconds to run *500* queries, that's actually pretty darn good -- only 24 ms per query, which is quite fast. So (again, if I understand you correctly) I would try focusing on how you can execute that query fewer times. It should be possible to not run a different query for every single item; you'll just need to look at how you're using Django's ORM. A few thoughts: 1) If you haven't looked at select_related, give it a shot. It might fix your problem entirely, particularly if you can find the right depth: http://docs.djangoproject.com/en/1.1/ref/models/querysets/#id4 2) If that doesn't work, try setting up django-debugtoolbar. In fact, I'd do that anyway. It's a great help at narrowing down problems like this, because it shows you every query a page uses, and then shows you how long each individual query takes to execute. This should let you narrow in on the problem more precisely: http://github.com/robhudson/django-debug-toolbar Good luck. Adam On Dec 30, 1:46 am, Sam Walterswrote: > Hi > I have reached the optimisation stage of my project. > I am trying to work out how to reduce the turnaround time for my queries. > I implemented 'db_index=True' for the fields where i determined there > should be a performance boost by indexing that particular field. > This has not really improved the speed of the queries. > > Worst case senario is that my website will have to display about 500 > out of 6000 entries in the db. > > Each entry bridges a many-to-many relationship with another table and > also follows a reverse one-to-one which will return a set of related > entries. Often the query takes 12-14 seconds from start to finish. 
> (and 4-5 seconds to display around 100 results)
> The major performance penalty thus far has been measured to be when
> the MySQL statements for these queries are executed.
> The many-to-many and one-to-many components only return 1-3 related
> records at most for each entry, and they always hit indexes, so that
> part of the design is optimised.
>
> I have been using the following tools:
>
> print connection.queries
> import time
> looking at the raw MySQL to see what other optimisations could be made
> use of MySQL EXPLAIN
>
> Anyway, here are the details:
>
> * in views.py, the method that packages the results from the MySQL
> query: http://pastebin.com/m3eef56e5
>
> * the models associated with this, from two applications, 'directory' and
> 'common': http://pastebin.com/m3868a1fc http://pastebin.com/m18ec3765
>
> * python manage.py sqlall directory && python manage.py sqlall common:
> http://pastebin.com/m63a50593 http://pastebin.com/m6f958cda
>
> As far as I can tell indexes are working and the queries should be fast.
> Here is the MySQL per iteration (it will execute 500 times for my
> worst-case scenario, which takes 12 seconds on average):
>
> SELECT `common_addresstype`.`id`, `common_addresstype`.`adrtype` FROM
> `common_addresstype` WHERE `common_addresstype`.`id` = 1;
> SELECT `common_addresstype`.`id`, `common_addresstype`.`adrtype` FROM
> `common_addresstype` WHERE `common_addresstype`.`id` = 2;
> SELECT `common_addradmin`.`id`, `common_addradmin`.`surfaceMail_works`,
> `common_addradmin`.`addr_enabled` FROM `common_addradmin` WHERE
> `common_addradmin`.`id` = 1;
> SELECT `common_address`.`id`, `common_address`.`airfield_id`,
> `common_address`.`country_id`, `common_address`.`addresstype_id`,
> `common_address`.`addradmin_id`, `common_address`.`location_id`,
> `common_address`.`street1`, `common_address`.`street2`,
> `common_address`.`user_lat_dec`, `common_address`.`user_long_dec`,
> `common_address`.`zoom` FROM `common_address` INNER JOIN
> `common_address_directory` ON (`common_address`.`id` =
> `common_address_directory`.`address_id`) WHERE
> `common_address_directory`.`directory_id` = 4267;
>
> Last but not least, EXPLAIN for the above MySQL for one instance of 500 entries:
>
> mysql> EXPLAIN SELECT `common_addresstype`.`id`,
> `common_addresstype`.`adrtype` FROM `common_addresstype` WHERE
> `common_addresstype`.`id` = 1;
> +----+-------------+--------------------+-------+---------------+---------+---------+-------+------+-------+
> | id | select_type | table              | type  | possible_keys | key     | key_len | ref   | rows | Extra |
> +----+-------------+--------------------+-------+---------------+---------+---------+-------+------+-------+
> |  1 | SIMPLE      | common_addresstype | const | PRIMARY       | PRIMARY | 4       | const |    1 |       |
> +----+-------------+--------------------+-------+---------------+---------+---------+-------+------+-------+
>
> mysql> EXPLAIN SELECT `common_addresstype`.`id`,
> `common_addresstype`.`adrtype` FROM `common_addresstype` WHERE
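To make Adam's select_related point concrete: without it, each foreign-key access fires its own query (the N+1 pattern); with it, Django pulls the related rows in via the JOIN. For the models in this thread that would be something like `Address.objects.select_related('addresstype', 'addradmin')` (model names assumed from the table names). A pure-Python sketch of the cost difference, with a fake query counter standing in for the database:

```python
# Toy illustration of the N+1 pattern select_related removes -- not real
# Django code. The fake "database" just counts the queries it receives.
class QueryCounter:
    def __init__(self):
        self.queries = 0

    def fetch(self, sql):
        self.queries += 1
        return {"adrtype": "postal"}  # dummy row

db = QueryCounter()

# Lazy per-row lookups: one query for each of the 500 displayed entries.
for entry_id in range(500):
    db.fetch("SELECT ... WHERE common_addresstype.id = %d" % entry_id)
lazy_cost = db.queries

# select_related-style: the related rows arrive in a single JOINed query.
db.queries = 0
db.fetch("SELECT ... FROM common_address "
         "INNER JOIN common_addresstype ON ...")
joined_cost = db.queries

print(lazy_cost, joined_cost)
# 500 1
```

Each individual 24 ms query is cheap; it is the 500x multiplier on the left-hand pattern that produces the 12-second page.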
Project optimisation stage: Advice to boost speed of database queries across tables?
Hi

I have reached the optimisation stage of my project. I am trying to work out how to reduce the turnaround time for my queries. I implemented 'db_index=True' for the fields where I determined there should be a performance boost by indexing that particular field. This has not really improved the speed of the queries.

Worst-case scenario is that my website will have to display about 500 out of 6000 entries in the db.

Each entry bridges a many-to-many relationship with another table and also follows a reverse one-to-one which will return a set of related entries. Often the query takes 12-14 seconds from start to finish (and 4-5 seconds to display around 100 results). The major performance penalty thus far has been measured to be when the MySQL statements for these queries are executed. The many-to-many and one-to-many components only return 1-3 related records at most for each entry, and they always hit indexes, so that part of the design is optimised.

I have been using the following tools:

print connection.queries
import time
looking at the raw MySQL to see what other optimisations could be made
use of MySQL EXPLAIN

Anyway, here are the details:

* in views.py, the method that packages the results from the MySQL query: http://pastebin.com/m3eef56e5

* the models associated with this, from two applications, 'directory' and 'common': http://pastebin.com/m3868a1fc http://pastebin.com/m18ec3765

* python manage.py sqlall directory && python manage.py sqlall common: http://pastebin.com/m63a50593 http://pastebin.com/m6f958cda

As far as I can tell indexes are working and the queries should be fast.
Here is the MySQL per iteration (it will execute 500 times for my worst-case scenario, which takes 12 seconds on average):

SELECT `common_addresstype`.`id`, `common_addresstype`.`adrtype` FROM
`common_addresstype` WHERE `common_addresstype`.`id` = 1;

SELECT `common_addresstype`.`id`, `common_addresstype`.`adrtype` FROM
`common_addresstype` WHERE `common_addresstype`.`id` = 2;

SELECT `common_addradmin`.`id`, `common_addradmin`.`surfaceMail_works`,
`common_addradmin`.`addr_enabled` FROM `common_addradmin` WHERE
`common_addradmin`.`id` = 1;

SELECT `common_address`.`id`, `common_address`.`airfield_id`,
`common_address`.`country_id`, `common_address`.`addresstype_id`,
`common_address`.`addradmin_id`, `common_address`.`location_id`,
`common_address`.`street1`, `common_address`.`street2`,
`common_address`.`user_lat_dec`, `common_address`.`user_long_dec`,
`common_address`.`zoom` FROM `common_address` INNER JOIN
`common_address_directory` ON (`common_address`.`id` =
`common_address_directory`.`address_id`) WHERE
`common_address_directory`.`directory_id` = 4267;

Last but not least, EXPLAIN for the above MySQL for one instance of 500 entries.
mysql> EXPLAIN SELECT `common_addresstype`.`id`,
`common_addresstype`.`adrtype` FROM `common_addresstype` WHERE
`common_addresstype`.`id` = 1;
+----+-------------+--------------------+-------+---------------+---------+---------+-------+------+-------+
| id | select_type | table              | type  | possible_keys | key     | key_len | ref   | rows | Extra |
+----+-------------+--------------------+-------+---------------+---------+---------+-------+------+-------+
|  1 | SIMPLE      | common_addresstype | const | PRIMARY       | PRIMARY | 4       | const |    1 |       |
+----+-------------+--------------------+-------+---------------+---------+---------+-------+------+-------+

mysql> EXPLAIN SELECT `common_addresstype`.`id`,
`common_addresstype`.`adrtype` FROM `common_addresstype` WHERE
`common_addresstype`.`id` = 2;
+----+-------------+--------------------+-------+---------------+---------+---------+-------+------+-------+
| id | select_type | table              | type  | possible_keys | key     | key_len | ref   | rows | Extra |
+----+-------------+--------------------+-------+---------------+---------+---------+-------+------+-------+
|  1 | SIMPLE      | common_addresstype | const | PRIMARY       | PRIMARY | 4       | const |    1 |       |
+----+-------------+--------------------+-------+---------------+---------+---------+-------+------+-------+
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT `common_addradmin`.`id`,
`common_addradmin`.`surfaceMail_works`,
`common_addradmin`.`addr_enabled` FROM `common_addradmin` WHERE
`common_addradmin`.`id` = 1;
+----+-------------+------------------+-------+---------------+---------+---------+-------+------+-------+
| id | select_type | table            | type  | possible_keys | key     | key_len | ref   | rows | Extra |
+----+-------------+------------------+-------+---------------+---------+---------+-------+------+-------+
|  1 | SIMPLE      | common_addradmin | const | PRIMARY       | PRIMARY | 4       | const |    1 |       |
+----+-------------+------------------+-------+---------------+---------+---------+-------+------+-------+
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT `common_address`.`id`, `common_address`.`airfield_id`,
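The repeated per-id SELECTs above are exactly the shape that an `id__in` filter, or the queryset `in_bulk()` helper, collapses into one query returning an {id: object} mapping; the per-entry work then becomes dictionary lookups instead of queries. A minimal sketch, where `AddressType` is an assumed model name for `common_addresstype` and plain dicts stand in for model instances:

```python
# Hedged sketch: with the ORM this would be one query per related table, e.g.
#   type_map = AddressType.objects.in_bulk([1, 2])   # {id: instance}
# after which attaching related data is a dict lookup, not a query.
def resolve_types(entries, type_map):
    """Attach the address-type label to each entry without extra queries."""
    return [(e["id"], type_map[e["addresstype_id"]]) for e in entries]

type_map = {1: "postal", 2: "street"}            # stand-in for in_bulk()
entries = [{"id": 4267, "addresstype_id": 1},
           {"id": 4268, "addresstype_id": 2}]
print(resolve_types(entries, type_map))
# [(4267, 'postal'), (4268, 'street')]
```

With the lookup tables this small (1-3 related records per entry, per the thread), the two batched queries plus dict lookups should cut the 500-query loop down to a handful of round trips.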