Re: Project optimisation stage: Advice to boost speed of database queries across tables?
I also read a good article. As it turns out I'm using a lot of inner joins: http://www.caktusgroup.com/blog/2009/09/ This was quite handy. My process is usually to build it to be as fast as possible in raw SQL, then go back in and try to recreate it with the ORM. "I've read your description and it doesn't mention how often your data changes. If it's not too often, you might want to consider simply caching the whole results or parts of it. You'll then have to add a bit of code to regenerate the cache every time the data changes." Yes, actually, in some of the scenarios the data is quite static; however, I'm doing a lot of stuff with lat/long coordinates now, so I have a bunch of keys in the returned results dict which are all dynamic, per-click values and need to be recalculated a lot. Note: I have taken your advice on board for some of my views, Philippe. Cheers Sam_W On Fri, Jan 8, 2010 at 12:18 AM, Philippe Raoult wrote: > Hi Sam, > > I've read your description and it doesn't mention how often your data > changes. If it's not too often, you might want to consider simply > caching the whole results or parts of it. You'll then have to add a > bit of code to regenerate the cache every time the data changes. It > looks to me like you have one query for selecting which entries to > print, then many repetitive queries to find information about each > entry. If you had a per-entry cache then you would probably be down to > one main query plus a lot of cache lookups. > > Also you were on the right track when using addresses = > Address.objects.filter(directory__in=directories). I've often used > that kind of queries to reduce (1 + many queries) down to (2 + lots of > dictionnary operations) > > Hope that helps. > > Regards, > Philippe > > > On 7 jan, 11:39, Sam Walters wrote: >> Thanks Koen >> I had suspected such things would exist but couldnt find them. >> I will take a look at this. Looks like a neat addon which would >> greatly help my project turnaround time.
>> >> cheers >> Sam >> >> On Wed, Jan 6, 2010 at 11:07 PM, koenb wrote: >> >> > On 31 dec 2009, 01:56, Sam Walters wrote: >> >> Thanks for the replies. >> >> >> Yes, there is the option of going to raw MySQL. However the project >> >> requirements mean i can't use raw SQL. (portability, readability) >> >> From what i can see using django's db API i have to execute the >> >> queries 500 times. >> >> I am very familiar with the query documentation and i know that >> >> select_related will prevent foward facing foreign keys translating to >> >> an individual sql queries which hit the db and would slow it down. >> >> >> Fact is even when i dont use 'select_related' the major performance >> >> problem occurs with the 'many-to-many' and 'reverse foreign' keys >> >> (some 75% of the performance penalty for my package method is with >> >> these) and only 20% can be solved by select_related. >> >> >> To be specific about how the multiplicities unfold: >> >> >> search_querySet is a Directory.objects.filter(... >> >> >> for s in search_querySet: >> >> address_info = s.address_set.all() #.select_related(depth=2) - >> >> yes i can/will put select related here but it really does not help >> >> that much 20% tops >> >> #address_info is usually 2-3 rows from an address table >> >> for a in address_info:#.select_related(depth=2): >> >> if a.addresstype.adrtype == 'Physical' and >> >> a.addradmin.addr_enabled == True: >> >> #further reduction in the number of rows which we need to >> >> get values from. >> >> related_phone=a.phone_set.all() >> >> related_email=s.email_set.all() >> >> #phones and emails are a small number of rows 2-3 tops >> >> >> It is these lines which produce the performance hit. >> >> I cant see a way of using django's query language to avoid having to >> >> descend into each of the 500 'directory' objects because of the >> >> necessity to get all rows from the related tables in phones an emails >> >> and to inspect the type of 'address' object. 
>> >> >> thanks for the ideas. will continue testing and looking for answers >> >> >> -Sam >> >> >> On Thu, Dec 31, 2009 at 2:41 AM, Nick Arnett >> >> wrote: >> >> >> > On Wed, Dec 30, 2009 at 7:15 AM, Adam Playford >> >> > wrote: >> >> >> >> I'm not an expert on this, but a few thoughts. >> >> >> >> First, if I'm reading your message right, it sounds like your problem >> >> >> probably isn't with the query, but with how many times you're running >> >> >> it. >> >> >> > I'll echo that... the problem is not the database - the queries are as >> >> > good >> >> > as it gets. The problem is running them repeatedly. >> >> >> > If all else fails, I'd replace those queries that execute 500 times >> >> > with raw >> >> > SQL that uses the IN operator to get the required rows.
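Philippe's per-entry cache suggestion, quoted above, can be sketched roughly as follows. This is a minimal illustration, not code from the thread: the function names are made up, and a plain dict stands in for Django's cache framework (django.core.cache offers the same get/set/delete shape, plus timeouts).

```python
# Minimal sketch of a per-entry cache: the static per-entry data is
# computed once and reused; dynamic per-click values (e.g. lat/long
# derived keys) would stay outside the cache and be computed per request.
_entry_cache = {}  # stand-in for django.core.cache

def get_entry_details(entry_id, fetch_from_db):
    """Return cached details for one directory entry, computing on a miss."""
    if entry_id not in _entry_cache:
        _entry_cache[entry_id] = fetch_from_db(entry_id)
    return _entry_cache[entry_id]

def invalidate_entry(entry_id):
    """Call this from every write path that touches the entry's rows."""
    _entry_cache.pop(entry_id, None)

# Usage: the expensive per-entry queries run only on the first request.
calls = []
def fake_db_fetch(entry_id):
    calls.append(entry_id)  # record each simulated database hit
    return {"id": entry_id, "addresses": ["..."]}

first = get_entry_details(42, fake_db_fetch)
second = get_entry_details(42, fake_db_fetch)  # served from the cache
```

The invalidation hook is the part Philippe warns about: if any write path forgets to call it, stale results get served.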
Re: Project optimisation stage: Advice to boost speed of database queries across tables?
Hi Sam, I've read your description and it doesn't mention how often your data changes. If it's not too often, you might want to consider simply caching the whole results or parts of it. You'll then have to add a bit of code to regenerate the cache every time the data changes. It looks to me like you have one query for selecting which entries to print, then many repetitive queries to find information about each entry. If you had a per-entry cache then you would probably be down to one main query plus a lot of cache lookups. Also, you were on the right track when using addresses = Address.objects.filter(directory__in=directories). I've often used that kind of query to reduce (1 + many queries) down to (2 + lots of dictionary operations). Hope that helps. Regards, Philippe On 7 jan, 11:39, Sam Walters wrote: > Thanks Koen > I had suspected such things would exist but couldnt find them. > I will take a look at this. Looks like a neat addon which would > greatly help my project turnaround time. > > cheers > Sam > > On Wed, Jan 6, 2010 at 11:07 PM, koenb wrote: > > > On 31 dec 2009, 01:56, Sam Walters wrote: > >> Thanks for the replies. > > >> Yes, there is the option of going to raw MySQL. However the project > >> requirements mean i can't use raw SQL. (portability, readability) > >> From what i can see using django's db API i have to execute the > >> queries 500 times. > >> I am very familiar with the query documentation and i know that > >> select_related will prevent foward facing foreign keys translating to > >> an individual sql queries which hit the db and would slow it down. > > >> Fact is even when i dont use 'select_related' the major performance > >> problem occurs with the 'many-to-many' and 'reverse foreign' keys > >> (some 75% of the performance penalty for my package method is with > >> these) and only 20% can be solved by select_related. > > >> To be specific about how the multiplicities unfold: > > >> search_querySet is a Directory.objects.filter(...
> > >> for s in search_querySet: > >> address_info = s.address_set.all() #.select_related(depth=2) - > >> yes i can/will put select related here but it really does not help > >> that much 20% tops > >> #address_info is usually 2-3 rows from an address table > >> for a in address_info:#.select_related(depth=2): > >> if a.addresstype.adrtype == 'Physical' and > >> a.addradmin.addr_enabled == True: > >> #further reduction in the number of rows which we need to > >> get values from. > >> related_phone=a.phone_set.all() > >> related_email=s.email_set.all() > >> #phones and emails are a small number of rows 2-3 tops > > >> It is these lines which produce the performance hit. > >> I cant see a way of using django's query language to avoid having to > >> descend into each of the 500 'directory' objects because of the > >> necessity to get all rows from the related tables in phones an emails > >> and to inspect the type of 'address' object. > > >> thanks for the ideas. will continue testing and looking for answers > > >> -Sam > > >> On Thu, Dec 31, 2009 at 2:41 AM, Nick Arnett wrote: > > >> > On Wed, Dec 30, 2009 at 7:15 AM, Adam Playford > >> > wrote: > > >> >> I'm not an expert on this, but a few thoughts. > > >> >> First, if I'm reading your message right, it sounds like your problem > >> >> probably isn't with the query, but with how many times you're running > >> >> it. > > >> > I'll echo that... the problem is not the database - the queries are as > >> > good > >> > as it gets. The problem is running them repeatedly. > > >> > If all else fails, I'd replace those queries that execute 500 times with > >> > raw > >> > SQL that uses the IN operator to get the required rows. > > >> > E.g.: SELECT `common_addresstype`.`id`, `common_addresstype`.`adrtype` > >> > FROM > >> > `common_addresstype` WHERE `common_addresstype`.`id` IN (1,6,8,52,173) > > >> > I imagine there's an ORM query that will do the same thing, but I know > >> > MySQL > >> > far better than I know Django. 
> > >> > Nick > > > You can take a look at apps like django-selectreverse [1] or django- > > batch-select [2] for some ideas to make this kind of optimisations > > easier. > > > Koen > > > [1]http://code.google.com/p/django-selectreverse/ > > [2]http://github.com/lilspikey/django-batch-select > > > -- > > You received this message because you are subscribed to the Google Groups > > "Django users" group. > > To post to this group, send email to django-us...@googlegroups.com. > > To unsubscribe from this group, send email to > > django-users+unsubscr...@googlegroups.com. > > For more options, visit this group > > athttp://groups.google.com/group/django-users?hl=en.
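The "(2 + lots of dictionary operations)" pattern Philippe describes can be sketched in plain Python. The row dicts below are made-up stand-ins for what Directory.objects.filter(...) and Address.objects.filter(directory__in=...) would return; only the grouping logic is the point.

```python
from collections import defaultdict

# Query 1 (simulated): the directory entries to display.
directories = [{"id": 1, "name": "Alpha"}, {"id": 2, "name": "Beta"}]

# Query 2 (simulated): ALL related addresses fetched in one go via
# Address.objects.filter(directory__in=...), instead of one query per entry.
addresses = [
    {"directory_id": 1, "adrtype": "Physical"},
    {"directory_id": 1, "adrtype": "Postal"},
    {"directory_id": 2, "adrtype": "Physical"},
]

# Dictionary operations: group the children by their parent's id once...
by_directory = defaultdict(list)
for addr in addresses:
    by_directory[addr["directory_id"]].append(addr)

# ...then each entry's addresses are a cheap dict lookup, not a query.
results = {d["id"]: by_directory[d["id"]] for d in directories}
```

The same two-query-plus-grouping shape extends to the phone and email sets: one IN query per relation, rather than one query per object.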
Re: Project optimisation stage: Advice to boost speed of database queries across tables?
Thanks Koen I had suspected such things would exist but couldn't find them. I will take a look at this. Looks like a neat addon which would greatly help my project turnaround time. cheers Sam On Wed, Jan 6, 2010 at 11:07 PM, koenb wrote: > > On 31 dec 2009, 01:56, Sam Walters wrote: >> Thanks for the replies. >> >> Yes, there is the option of going to raw MySQL. However the project >> requirements mean i can't use raw SQL. (portability, readability) >> From what i can see using django's db API i have to execute the >> queries 500 times. >> I am very familiar with the query documentation and i know that >> select_related will prevent foward facing foreign keys translating to >> an individual sql queries which hit the db and would slow it down. >> >> Fact is even when i dont use 'select_related' the major performance >> problem occurs with the 'many-to-many' and 'reverse foreign' keys >> (some 75% of the performance penalty for my package method is with >> these) and only 20% can be solved by select_related. >> >> To be specific about how the multiplicities unfold: >> >> search_querySet is a Directory.objects.filter(... >> >> for s in search_querySet: >> address_info = s.address_set.all() #.select_related(depth=2) - >> yes i can/will put select related here but it really does not help >> that much 20% tops >> #address_info is usually 2-3 rows from an address table >> for a in address_info:#.select_related(depth=2): >> if a.addresstype.adrtype == 'Physical' and >> a.addradmin.addr_enabled == True: >> #further reduction in the number of rows which we need to >> get values from. >> related_phone=a.phone_set.all() >> related_email=s.email_set.all() >> #phones and emails are a small number of rows 2-3 tops >> >> It is these lines which produce the performance hit.
>> I cant see a way of using django's query language to avoid having to >> descend into each of the 500 'directory' objects because of the >> necessity to get all rows from the related tables in phones an emails >> and to inspect the type of 'address' object. >> >> thanks for the ideas. will continue testing and looking for answers >> >> -Sam >> >> On Thu, Dec 31, 2009 at 2:41 AM, Nick Arnett wrote: >> >> > On Wed, Dec 30, 2009 at 7:15 AM, Adam Playford >> > wrote: >> >> >> I'm not an expert on this, but a few thoughts. >> >> >> First, if I'm reading your message right, it sounds like your problem >> >> probably isn't with the query, but with how many times you're running >> >> it. >> >> > I'll echo that... the problem is not the database - the queries are as good >> > as it gets. The problem is running them repeatedly. >> >> > If all else fails, I'd replace those queries that execute 500 times with >> > raw >> > SQL that uses the IN operator to get the required rows. >> >> > E.g.: SELECT `common_addresstype`.`id`, `common_addresstype`.`adrtype` FROM >> > `common_addresstype` WHERE `common_addresstype`.`id` IN (1,6,8,52,173) >> >> > I imagine there's an ORM query that will do the same thing, but I know >> > MySQL >> > far better than I know Django. >> >> > Nick >> > > You can take a look at apps like django-selectreverse [1] or django- > batch-select [2] for some ideas to make this kind of optimisations > easier. > > Koen > > [1] http://code.google.com/p/django-selectreverse/ > [2] http://github.com/lilspikey/django-batch-select > > -- > You received this message because you are subscribed to the Google Groups > "Django users" group. > To post to this group, send email to django-us...@googlegroups.com. > To unsubscribe from this group, send email to > django-users+unsubscr...@googlegroups.com. > For more options, visit this group at > http://groups.google.com/group/django-users?hl=en. 
Re: Project optimisation stage: Advice to boost speed of database queries across tables?
On 31 dec 2009, 01:56, Sam Walterswrote: > Thanks for the replies. > > Yes, there is the option of going to raw MySQL. However the project > requirements mean i can't use raw SQL. (portability, readability) > From what i can see using django's db API i have to execute the > queries 500 times. > I am very familiar with the query documentation and i know that > select_related will prevent foward facing foreign keys translating to > an individual sql queries which hit the db and would slow it down. > > Fact is even when i dont use 'select_related' the major performance > problem occurs with the 'many-to-many' and 'reverse foreign' keys > (some 75% of the performance penalty for my package method is with > these) and only 20% can be solved by select_related. > > To be specific about how the multiplicities unfold: > > search_querySet is a Directory.objects.filter(... > > for s in search_querySet: > address_info = s.address_set.all() #.select_related(depth=2) - > yes i can/will put select related here but it really does not help > that much 20% tops > #address_info is usually 2-3 rows from an address table > for a in address_info:#.select_related(depth=2): > if a.addresstype.adrtype == 'Physical' and > a.addradmin.addr_enabled == True: > #further reduction in the number of rows which we need to > get values from. > related_phone=a.phone_set.all() > related_email=s.email_set.all() > #phones and emails are a small number of rows 2-3 tops > > It is these lines which produce the performance hit. > I cant see a way of using django's query language to avoid having to > descend into each of the 500 'directory' objects because of the > necessity to get all rows from the related tables in phones an emails > and to inspect the type of 'address' object. > > thanks for the ideas. 
will continue testing and looking for answers > > -Sam > > On Thu, Dec 31, 2009 at 2:41 AM, Nick Arnett wrote: > > > On Wed, Dec 30, 2009 at 7:15 AM, Adam Playford > > wrote: > > >> I'm not an expert on this, but a few thoughts. > > >> First, if I'm reading your message right, it sounds like your problem > >> probably isn't with the query, but with how many times you're running > >> it. > > > I'll echo that... the problem is not the database - the queries are as good > > as it gets. The problem is running them repeatedly. > > > If all else fails, I'd replace those queries that execute 500 times with raw > > SQL that uses the IN operator to get the required rows. > > > E.g.: SELECT `common_addresstype`.`id`, `common_addresstype`.`adrtype` FROM > > `common_addresstype` WHERE `common_addresstype`.`id` IN (1,6,8,52,173) > > > I imagine there's an ORM query that will do the same thing, but I know MySQL > > far better than I know Django. > > > Nick > You can take a look at apps like django-selectreverse [1] or django-batch-select [2] for some ideas to make this kind of optimisation easier. Koen [1] http://code.google.com/p/django-selectreverse/ [2] http://github.com/lilspikey/django-batch-select
Re: Project optimisation stage: Advice to boost speed of database queries across tables?
On 4 Sty, 07:30, Sam Walters wrote: > Hi Tomasz > Yes, i have followed a raw sql approach now im looking at my test data > to see which objects have multiple rows and cleaning that up. > > Its a shame that '__in' has limited use under these scenarios: > > directories = search_querySet.distinct() > addresses = Address.objects.filter(directory__in=directories) > addresses.values('directory__id', . *some other relevent fields*) > > this certainly allows me to select the related sets of addresses for > each directory however i need to be able to relate each address object > back to its directory (by id would be great) trying to get the > directory 'id' packaged in the values() gives me an error saying > invalid field even though 'directory' is listed as a valid field. I'm not 100% sure, but AFAIR you can only take values of fields contained directly in models. > If i could do that then i could iterate through each dictionary and > zip related items together based on their directory id's or something > nice like that. > > " > Re those VIEWs, they are SELECTs with JOINs, which effectively > produce up to a few rows for single object (e.g. if you have person > with 3 phone numbers, > you're getting 3 rows), but it turns out to be much more efficient to > process/merge > that in Python code than to issue hundreds of SQL queries. > " > > Yes this seems to be the best way, do you have any links where i can > see how various people have implemented this? Would be good to write a > 'pythonic' solution > I wrote a simple function that works for my purposes; I suspect that your use-case also doesn't require anything fancy, but I agree that 'pythonic' solutions are good for morale :-) -- Tomasz Zielinski http://pyconsultant.eu
Re: Project optimisation stage: Advice to boost speed of database queries across tables?
Hi Tomasz Yes, I have followed a raw SQL approach; now I'm looking at my test data to see which objects have multiple rows, and cleaning that up. It's a shame that '__in' has limited use under these scenarios: directories = search_querySet.distinct() addresses = Address.objects.filter(directory__in=directories) addresses.values('directory__id', . *some other relevant fields*) This certainly allows me to select the related sets of addresses for each directory; however, I need to be able to relate each address object back to its directory (by id would be great). Trying to get the directory 'id' packaged in the values() gives me an error saying invalid field, even though 'directory' is listed as a valid field. If I could do that, then I could iterate through each dictionary and zip related items together based on their directory ids, or something nice like that. " Re those VIEWs, they are SELECTs with JOINs, which effectively produce up to a few rows for single object (e.g. if you have person with 3 phone numbers, you're getting 3 rows), but it turns out to be much more efficient to process/merge that in Python code than to issue hundreds of SQL queries. " Yes, this seems to be the best way. Do you have any links where I can see how various people have implemented this? Would be good to write a 'pythonic' solution. Cheers -sam 2010/1/4 Tomasz Zieliński: > On 31 Gru 2009, 01:56, Sam Walters wrote: > >> for s in search_querySet: >> address_info = s.address_set.all() #.select_related(depth=2) - >> yes i can/will put select related here but it really does not help >> that much 20% tops >> #address_info is usually 2-3 rows from an address table >> for a in address_info:#.select_related(depth=2): >> if a.addresstype.adrtype == 'Physical' and >> a.addradmin.addr_enabled == True: >> #further reduction in the number of rows which we need to >> get values from.
>> related_phone=a.phone_set.all() >> related_email=s.email_set.all() >> #phones and emails are a small number of rows 2-3 tops >> >> It is these lines which produce the performance hit. >> I cant see a way of using django's query language to avoid having to >> descend into each of the 500 'directory' objects because of the >> necessity to get all rows from the related tables in phones an emails >> and to inspect the type of 'address' object. >> > > I solved very similar problem by creating database VIEWs with data I > needed, > wrapping those VIEWs with unmanaged Django models and then using > simple .filter(...)-s. > > Re those VIEWs, they are SELECTs with JOINs, which effectively > produce up to a few rows for single object (e.g. if you have person > with 3 phone numbers, > you're getting 3 rows), but it turns out to be much more efficient to > process/merge > that in Python code than to issue hundreds of SQL queries. > > -- > Tomasz Zielinski > http://pyconsultant.eu > > -- > > You received this message because you are subscribed to the Google Groups > "Django users" group. > To post to this group, send email to django-us...@googlegroups.com. > To unsubscribe from this group, send email to > django-users+unsubscr...@googlegroups.com. > For more options, visit this group at > http://groups.google.com/group/django-users?hl=en.
Re: Project optimisation stage: Advice to boost speed of database queries across tables?
On 31 Gru 2009, 01:56, Sam Walters wrote: > for s in search_querySet: > address_info = s.address_set.all() #.select_related(depth=2) - > yes i can/will put select related here but it really does not help > that much 20% tops > #address_info is usually 2-3 rows from an address table > for a in address_info:#.select_related(depth=2): > if a.addresstype.adrtype == 'Physical' and > a.addradmin.addr_enabled == True: > #further reduction in the number of rows which we need to > get values from. > related_phone=a.phone_set.all() > related_email=s.email_set.all() > #phones and emails are a small number of rows 2-3 tops > > It is these lines which produce the performance hit. > I cant see a way of using django's query language to avoid having to > descend into each of the 500 'directory' objects because of the > necessity to get all rows from the related tables in phones an emails > and to inspect the type of 'address' object. > I solved a very similar problem by creating database VIEWs with the data I needed, wrapping those VIEWs with unmanaged Django models and then using simple .filter(...)-s. Re those VIEWs, they are SELECTs with JOINs, which effectively produce up to a few rows for a single object (e.g. if you have a person with 3 phone numbers, you're getting 3 rows), but it turns out to be much more efficient to process/merge that in Python code than to issue hundreds of SQL queries. -- Tomasz Zielinski http://pyconsultant.eu
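Tomasz's VIEW approach can be illustrated with a self-contained sqlite3 sketch (the table and column names here are invented for the example; in the real project the VIEW would live in MySQL and be wrapped by a Django model with managed = False in its Meta). One query against the view returns the joined rows, and the merge happens in Python:

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE phone (id INTEGER PRIMARY KEY, person_id INTEGER, number TEXT);
    INSERT INTO person VALUES (1, 'Sam'), (2, 'Koen');
    INSERT INTO phone VALUES (1, 1, '111'), (2, 1, '222'), (3, 2, '333');
    -- The VIEW does the JOIN once; a person with two phones yields two rows.
    CREATE VIEW person_with_phone AS
        SELECT person.id AS person_id, person.name, phone.number
        FROM person JOIN phone ON phone.person_id = person.id;
""")

# One query against the view, then merge the duplicated parent rows in Python.
merged = defaultdict(list)
for person_id, name, number in conn.execute(
        "SELECT person_id, name, number FROM person_with_phone "
        "ORDER BY person_id, number"):
    merged[name].append(number)
```

The trade-off is exactly the one described above: the view produces a few redundant rows per object, but merging them in Python is far cheaper than issuing hundreds of separate queries.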
Re: Project optimisation stage: Advice to boost speed of database queries across tables?
On 30 Gru 2009, 16:41, Nick Arnett wrote: > > E.g.: SELECT `common_addresstype`.`id`, `common_addresstype`.`adrtype` FROM > `common_addresstype` WHERE `common_addresstype`.`id` IN (1,6,8,52,173) > > I imagine there's an ORM query that will do the same thing, but I know MySQL > far better than I know Django. > This is as simple as using .filter(addresstype__id__in=[1,6,8,52,173]) -- Tomasz Zielinski http://pyconsultant.eu
Re: Project optimisation stage: Advice to boost speed of database queries across tables?
Thanks for the replies. Yes, there is the option of going to raw MySQL. However, the project requirements mean I can't use raw SQL (portability, readability). From what I can see, using django's db API I have to execute the queries 500 times. I am very familiar with the query documentation, and I know that select_related will prevent forward-facing foreign keys translating into individual SQL queries which hit the db and would slow it down. The fact is, even when I don't use 'select_related' the major performance problem occurs with the 'many-to-many' and 'reverse foreign' keys (some 75% of the performance penalty for my package method is with these), and only 20% can be solved by select_related. To be specific about how the multiplicities unfold, search_querySet is a Directory.objects.filter(...

for s in search_querySet:
    # address_info is usually 2-3 rows from an address table
    address_info = s.address_set.all()  # .select_related(depth=2) - yes I
                                        # can/will put select_related here but
                                        # it really does not help that much,
                                        # 20% tops
    for a in address_info:  # .select_related(depth=2)
        if a.addresstype.adrtype == 'Physical' and a.addradmin.addr_enabled == True:
            # further reduction in the number of rows which we need to
            # get values from
            related_phone = a.phone_set.all()
            related_email = s.email_set.all()
            # phones and emails are a small number of rows, 2-3 tops

It is these lines which produce the performance hit. I can't see a way of using django's query language to avoid having to descend into each of the 500 'directory' objects, because of the necessity to get all rows from the related tables in phones and emails and to inspect the type of the 'address' object. Thanks for the ideas; will continue testing and looking for answers. -Sam On Thu, Dec 31, 2009 at 2:41 AM, Nick Arnett wrote: > > > On Wed, Dec 30, 2009 at 7:15 AM, Adam Playford > wrote: >> >> I'm not an expert on this, but a few thoughts.
>> >> First, if I'm reading your message right, it sounds like your problem >> probably isn't with the query, but with how many times you're running >> it. > > I'll echo that... the problem is not the database - the queries are as good > as it gets. The problem is running them repeatedly. > > If all else fails, I'd replace those queries that execute 500 times with raw > SQL that uses the IN operator to get the required rows. > > E.g.: SELECT `common_addresstype`.`id`, `common_addresstype`.`adrtype` FROM > `common_addresstype` WHERE `common_addresstype`.`id` IN (1,6,8,52,173) > > I imagine there's an ORM query that will do the same thing, but I know MySQL > far better than I know Django. > > Nick > > -- > > You received this message because you are subscribed to the Google Groups > "Django users" group. > To post to this group, send email to django-us...@googlegroups.com. > To unsubscribe from this group, send email to > django-users+unsubscr...@googlegroups.com. > For more options, visit this group at > http://groups.google.com/group/django-users?hl=en.
Re: Project optimisation stage: Advice to boost speed of database queries across tables?
On Wed, Dec 30, 2009 at 7:15 AM, Adam Playford wrote: > I'm not an expert on this, but a few thoughts. > > First, if I'm reading your message right, it sounds like your problem > probably isn't with the query, but with how many times you're running > it. I'll echo that... the problem is not the database - the queries are as good as it gets. The problem is running them repeatedly. If all else fails, I'd replace those queries that execute 500 times with raw SQL that uses the IN operator to get the required rows. E.g.: SELECT `common_addresstype`.`id`, `common_addresstype`.`adrtype` FROM `common_addresstype` WHERE `common_addresstype`.`id` IN (1,6,8,52,173) I imagine there's an ORM query that will do the same thing, but I know MySQL far better than I know Django. Nick
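Nick's IN-operator suggestion, sketched with sqlite3 and proper parameter placeholders instead of string-pasted ids (the table name is borrowed from the thread's queries; the rows are made up for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE common_addresstype (id INTEGER PRIMARY KEY, adrtype TEXT)")
conn.executemany(
    "INSERT INTO common_addresstype VALUES (?, ?)",
    [(1, "Physical"), (6, "Postal"), (8, "Billing"), (52, "Other")])

wanted_ids = [1, 6, 8, 52, 173]  # 173 does not exist; it is simply skipped

# Build one placeholder per id so the values are bound, not interpolated.
placeholders = ",".join("?" for _ in wanted_ids)
rows = conn.execute(
    f"SELECT id, adrtype FROM common_addresstype WHERE id IN ({placeholders})",
    wanted_ids,
).fetchall()
# One round trip fetches every matching row, instead of one query per id.
```

In Django the equivalent is the `__in` lookup shown later in the thread, which generates this same IN clause for you.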
Re: Project optimisation stage: Advice to boost speed of database queries across tables?
I'm not an expert on this, but a few thoughts. First, if I'm reading your message right, it sounds like your problem probably isn't with the query, but with how many times you're running it. If you think about it, if it's taking you 12 seconds to run *500* queries, that's actually pretty darn good -- only 24 ms per query, which is quite fast. So (again, if I understand you correctly) I would try focusing on how you can execute that query fewer times. It should be possible to not run a different query for every single item; you'll just need to look at how you're using Django's ORM. A few thoughts: 1) If you haven't looked at select_related, give it a shot. It might fix your problem entirely, particularly if you can find the right depth: http://docs.djangoproject.com/en/1.1/ref/models/querysets/#id4 2) If that doesn't work, try setting up django-debugtoolbar. In fact, I'd do that anyway. It's a great help at narrowing down problems like this, because it shows you every query a page uses, and then shows you how long each individual query takes to execute. This should let you narrow in on the problem more precisely: http://github.com/robhudson/django-debug-toolbar Good luck. Adam On Dec 30, 1:46 am, Sam Walterswrote: > Hi > I have reached the optimisation stage of my project. > I am trying to work out how to reduce the turnaround time for my queries. > I implemented 'db_index=True' for the fields where i determined there > should be a performance boost by indexing that particular field. > This has not really improved the speed of the queries. > > Worst case senario is that my website will have to display about 500 > out of 6000 entries in the db. > > Each entry bridges a many-to-many relationship with another table and > also follows a reverse one-to-one which will return a set of related > entries. Often the query takes 12-14 seconds from start to finish. 
> (and 4-5 seconds to display around 100 results)
> The major performance penalty thus far has been measured to be when
> the MySQL statements for these queries are executed.
> The many-to-many and one-to-many components only return 1-3 related
> records at most for each entry, and they always hit indexes, so that
> part of the design is optimised.
>
> I have been using the following tools:
>
> print connection.queries
> import time
> looking at the raw MySQL to see what other optimisations could be made
> use of MySQL EXPLAIN
>
> Anyway, here are the details:
>
> * in views.py, the method that packages the results from the MySQL
> query: http://pastebin.com/m3eef56e5
>
> * the models associated with this, from two applications, 'directory' and
> 'common': http://pastebin.com/m3868a1fc http://pastebin.com/m18ec3765
>
> * python manage.py sqlall directory && python manage.py sqlall common:
> http://pastebin.com/m63a50593 http://pastebin.com/m6f958cda
>
> As far as I can tell indexes are working and the queries should be fast.
> Here is the MySQL per iteration (it will execute 500 times for my
> worst-case scenario, which takes 12 seconds on average):
>
> SELECT `common_addresstype`.`id`, `common_addresstype`.`adrtype` FROM
> `common_addresstype` WHERE `common_addresstype`.`id` = 1;
> SELECT `common_addresstype`.`id`, `common_addresstype`.`adrtype` FROM
> `common_addresstype` WHERE `common_addresstype`.`id` = 2;
> SELECT `common_addradmin`.`id`, `common_addradmin`.`surfaceMail_works`,
> `common_addradmin`.`addr_enabled` FROM `common_addradmin` WHERE
> `common_addradmin`.`id` = 1;
> SELECT `common_address`.`id`, `common_address`.`airfield_id`,
> `common_address`.`country_id`, `common_address`.`addresstype_id`,
> `common_address`.`addradmin_id`, `common_address`.`location_id`,
> `common_address`.`street1`, `common_address`.`street2`,
> `common_address`.`user_lat_dec`, `common_address`.`user_long_dec`,
> `common_address`.`zoom` FROM `common_address` INNER JOIN
> `common_address_directory` ON (`common_address`.`id` =
> `common_address_directory`.`address_id`) WHERE
> `common_address_directory`.`directory_id` = 4267;
>
> Last but not least, EXPLAIN for the above MySQL for one instance of 500 entries:
>
> mysql> EXPLAIN SELECT `common_addresstype`.`id`,
> `common_addresstype`.`adrtype` FROM `common_addresstype` WHERE
> `common_addresstype`.`id` = 1;
> +----+-------------+--------------------+-------+---------------+---------+---------+-------+------+-------+
> | id | select_type | table              | type  | possible_keys | key     | key_len | ref   | rows | Extra |
> +----+-------------+--------------------+-------+---------------+---------+---------+-------+------+-------+
> |  1 | SIMPLE      | common_addresstype | const | PRIMARY       | PRIMARY | 4       | const |    1 |       |
> +----+-------------+--------------------+-------+---------------+---------+---------+-------+------+-------+
>
> mysql> EXPLAIN SELECT `common_addresstype`.`id`,
> `common_addresstype`.`adrtype` FROM `common_addresstype` WHERE
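To make Adam's select_related point concrete: without it, each foreign-key access fires its own query (the N+1 pattern); with it, Django pulls the related rows in via the JOIN. For the models in this thread that would be something like `Address.objects.select_related('addresstype', 'addradmin')` (model names assumed from the table names). A pure-Python sketch of the cost difference, with a fake query counter standing in for the database:

```python
# Toy illustration of the N+1 pattern select_related removes -- not real
# Django code. The fake "database" just counts the queries it receives.
class QueryCounter:
    def __init__(self):
        self.queries = 0

    def fetch(self, sql):
        self.queries += 1
        return {"adrtype": "postal"}  # dummy row

db = QueryCounter()

# Lazy per-row lookups: one query for each of the 500 displayed entries.
for entry_id in range(500):
    db.fetch("SELECT ... WHERE common_addresstype.id = %d" % entry_id)
lazy_cost = db.queries

# select_related-style: the related rows arrive in a single JOINed query.
db.queries = 0
db.fetch("SELECT ... FROM common_address "
         "INNER JOIN common_addresstype ON ...")
joined_cost = db.queries

print(lazy_cost, joined_cost)
# 500 1
```

Each individual 24 ms query is cheap; it is the 500x multiplier on the left-hand pattern that produces the 12-second page.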
Project optimisation stage: Advice to boost speed of database queries across tables?
Hi

I have reached the optimisation stage of my project. I am trying to work out how to reduce the turnaround time for my queries. I implemented 'db_index=True' for the fields where I determined there should be a performance boost by indexing that particular field. This has not really improved the speed of the queries.

Worst-case scenario is that my website will have to display about 500 out of 6000 entries in the db.

Each entry bridges a many-to-many relationship with another table and also follows a reverse one-to-one which will return a set of related entries. Often the query takes 12-14 seconds from start to finish (and 4-5 seconds to display around 100 results). The major performance penalty thus far has been measured to be when the MySQL statements for these queries are executed. The many-to-many and one-to-many components only return 1-3 related records at most for each entry, and they always hit indexes, so that part of the design is optimised.

I have been using the following tools:

print connection.queries
import time
looking at the raw MySQL to see what other optimisations could be made
use of MySQL EXPLAIN

Anyway, here are the details:

* in views.py, the method that packages the results from the MySQL query: http://pastebin.com/m3eef56e5

* the models associated with this, from two applications, 'directory' and 'common': http://pastebin.com/m3868a1fc http://pastebin.com/m18ec3765

* python manage.py sqlall directory && python manage.py sqlall common: http://pastebin.com/m63a50593 http://pastebin.com/m6f958cda

As far as I can tell indexes are working and the queries should be fast.
Here is the MySQL per iteration (it will execute 500 times for my worst-case scenario, which takes 12 seconds on average):

SELECT `common_addresstype`.`id`, `common_addresstype`.`adrtype` FROM
`common_addresstype` WHERE `common_addresstype`.`id` = 1;

SELECT `common_addresstype`.`id`, `common_addresstype`.`adrtype` FROM
`common_addresstype` WHERE `common_addresstype`.`id` = 2;

SELECT `common_addradmin`.`id`, `common_addradmin`.`surfaceMail_works`,
`common_addradmin`.`addr_enabled` FROM `common_addradmin` WHERE
`common_addradmin`.`id` = 1;

SELECT `common_address`.`id`, `common_address`.`airfield_id`,
`common_address`.`country_id`, `common_address`.`addresstype_id`,
`common_address`.`addradmin_id`, `common_address`.`location_id`,
`common_address`.`street1`, `common_address`.`street2`,
`common_address`.`user_lat_dec`, `common_address`.`user_long_dec`,
`common_address`.`zoom` FROM `common_address` INNER JOIN
`common_address_directory` ON (`common_address`.`id` =
`common_address_directory`.`address_id`) WHERE
`common_address_directory`.`directory_id` = 4267;

Last but not least, EXPLAIN for the above MySQL for one instance of 500 entries.
mysql> EXPLAIN SELECT `common_addresstype`.`id`,
`common_addresstype`.`adrtype` FROM `common_addresstype` WHERE
`common_addresstype`.`id` = 1;
+----+-------------+--------------------+-------+---------------+---------+---------+-------+------+-------+
| id | select_type | table              | type  | possible_keys | key     | key_len | ref   | rows | Extra |
+----+-------------+--------------------+-------+---------------+---------+---------+-------+------+-------+
|  1 | SIMPLE      | common_addresstype | const | PRIMARY       | PRIMARY | 4       | const |    1 |       |
+----+-------------+--------------------+-------+---------------+---------+---------+-------+------+-------+

mysql> EXPLAIN SELECT `common_addresstype`.`id`,
`common_addresstype`.`adrtype` FROM `common_addresstype` WHERE
`common_addresstype`.`id` = 2;
+----+-------------+--------------------+-------+---------------+---------+---------+-------+------+-------+
| id | select_type | table              | type  | possible_keys | key     | key_len | ref   | rows | Extra |
+----+-------------+--------------------+-------+---------------+---------+---------+-------+------+-------+
|  1 | SIMPLE      | common_addresstype | const | PRIMARY       | PRIMARY | 4       | const |    1 |       |
+----+-------------+--------------------+-------+---------------+---------+---------+-------+------+-------+
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT `common_addradmin`.`id`,
`common_addradmin`.`surfaceMail_works`,
`common_addradmin`.`addr_enabled` FROM `common_addradmin` WHERE
`common_addradmin`.`id` = 1;
+----+-------------+------------------+-------+---------------+---------+---------+-------+------+-------+
| id | select_type | table            | type  | possible_keys | key     | key_len | ref   | rows | Extra |
+----+-------------+------------------+-------+---------------+---------+---------+-------+------+-------+
|  1 | SIMPLE      | common_addradmin | const | PRIMARY       | PRIMARY | 4       | const |    1 |       |
+----+-------------+------------------+-------+---------------+---------+---------+-------+------+-------+
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT `common_address`.`id`, `common_address`.`airfield_id`,
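The repeated per-id SELECTs above are exactly the shape that an `id__in` filter, or the queryset `in_bulk()` helper, collapses into one query returning an {id: object} mapping; the per-entry work then becomes dictionary lookups instead of queries. A minimal sketch, where `AddressType` is an assumed model name for `common_addresstype` and plain dicts stand in for model instances:

```python
# Hedged sketch: with the ORM this would be one query per related table, e.g.
#   type_map = AddressType.objects.in_bulk([1, 2])   # {id: instance}
# after which attaching related data is a dict lookup, not a query.
def resolve_types(entries, type_map):
    """Attach the address-type label to each entry without extra queries."""
    return [(e["id"], type_map[e["addresstype_id"]]) for e in entries]

type_map = {1: "postal", 2: "street"}            # stand-in for in_bulk()
entries = [{"id": 4267, "addresstype_id": 1},
           {"id": 4268, "addresstype_id": 2}]
print(resolve_types(entries, type_map))
# [(4267, 'postal'), (4268, 'street')]
```

With the lookup tables this small (1-3 related records per entry, per the thread), the two batched queries plus dict lookups should cut the 500-query loop down to a handful of round trips.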