Re: Optimization of a Large QuerySet iteration
On Jul 26, 4:59 pm, nixlists wrote:
> On Tue, Jul 26, 2011 at 7:42 AM, Roman Klesel wrote:
> > ...
> >
> > The main question you may ask yourself may be whether or not you
> > really want to have django do the whole calculation thing. Many
> > database engines have very powerful aggregation capabilities, support
> > for stored procedures, functions etc.
> > Some of the aggregation features are available through the django orm.
>
> Thanks,
>
> Is it a best practice to take a bunch of code out to the stored
> procedures as much as possible, or is the opposite true with Django?

It all depends on your project's requirements. A generic app should be as db-agnostic as possible (that is, it should work on any DB you can use with Django), which may make stored procs somewhat impractical. If it's a specific, custom app and the spec says "it will run on this exact DB, period", then stored procs may be fine (just make sure the customer / DBA is ok with it...). This being said, you can already do a lot with standard SQL functions and aggregation.

--
You received this message because you are subscribed to the Google Groups "Django users" group.
To post to this group, send email to django-users@googlegroups.com.
To unsubscribe from this group, send email to django-users+unsubscr...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/django-users?hl=en.
Re: Optimization of a Large QuerySet iteration
2011/7/26 nixlists:
> Is it a best practice to take a bunch of code out to the stored
> procedures as much as possible, or the opposite is true with Django?

I don't want to comment on whether it's best practice or not ... I was thinking about efficiency: if the data processing gets too slow when done in django, shipping the task to the database could improve things. On the other hand, if speed is not the main issue, keeping things simple and in one place is also nice. I just wanted to point out that databases are usually capable of doing these things very efficiently.

Regards
Roman
Re: Optimization of a Large QuerySet iteration
On Tue, Jul 26, 2011 at 7:42 AM, Roman Klesel wrote:
> ...
>
> The main question you may ask yourself may be whether or not you
> really want to have django do the whole calculation thing. Many
> database engines have very powerful aggregation capabilities, support
> for stored procedures, functions etc.
> Some of the aggregation features are available through the django orm.

Thanks,

Is it a best practice to take a bunch of code out to the stored procedures as much as possible, or is the opposite true with Django?
Re: Optimization of a Large QuerySet iteration
On Tue, Jul 26, 2011 at 9:52 AM, Tom Evans wrote:
> p_dict = dict(contract_products.values_list('product', 'wac__wac'))
>
> This should cut down the number of queries in the view significantly.

Thanks! This is much better than what I had.
Re: Optimization of a Large QuerySet iteration
On Mon, Jul 25, 2011 at 4:57 AM, nixlists wrote:
> Hi. I am a newbie :)
>
> Is there a way to rewrite the dictionary code above to make it more
> readable, and/or rewrite everything in a more efficient manner in
> general? It still takes a bit of time to run (but much faster than
> doing queries inside the loop).
>
> If I am doing anything glaringly wrong, or if you have any other
> suggestions please let me know. I really appreciate your help.
>
> Thanks.

I've rearranged your mail to make it a bit easier to answer. You are on the right track with your approach here, but you are doing too much in loops, which is why you feel it is slow.

> c = Contract.objects.get(id=contract_id)
>
> p_per_c = c.products.all()
>
> c_dict = dict((product.id, product.contractproduct_set.get(
>     contract=contract_id, ndc=product.id).rebate_pct)
>     for product in p_per_c)
>
> p_dict = dict((product.id, product.contractproduct_set.get(
>     contract=contract_id, ndc=product.id).wac.wac)
>     for product in p_per_c)

So, this is the critical section of your code. You are trying to build dictionaries of data that you can then use later on, to avoid hitting the database repeatedly. However, the queries you use here are a bit naive, and mean you have to do many similar queries to build up your data sets. This is where the code can be improved.

c_dict seems to contain the product id => rebate_pct from the appropriate ContractProduct for each Product associated with this Contract, but it must perform one query to get the list of products, and one query per product to get the rebate_pct. This could be simplified - there are many ways to approach this, but I like to start from the thing that I want - the ContractProduct:

    ContractProduct.objects.filter(contract=c)

should produce all the items we are interested in for this contract.
The next performance tip is not to fetch and create django model instances when all you want is the data - just ask for it using values() or values_list(), hence:

    contract_products = ContractProduct.objects.filter(contract=c)
    c_dict = dict(contract_products.values_list('product', 'rebate_pct'))

p_dict, unless I've misread, is just slightly different data from the same data set, so the same approach applies:

    p_dict = dict(contract_products.values_list('product', 'wac__wac'))

This should cut down the number of queries in the view significantly.

Cheers

Tom
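[Editor's note: Tom's dict(values_list(...)) trick works because a two-field values_list() queryset yields 2-tuples, and dict() accepts any iterable of key/value pairs. A minimal pure-Python sketch of the pattern; the rows below are made-up stand-ins for the queryset result, not data from the thread:]

```python
# values_list('product', 'rebate_pct') yields (product_id, rebate_pct)
# 2-tuples; dict() turns any iterable of such pairs into a mapping in
# a single pass. The rows here are hypothetical stand-ins for the
# queryset rows.
rows = [(101, 0.15), (102, 0.20), (103, 0.10)]

c_dict = dict(rows)   # {product_id: rebate_pct}

print(c_dict[102])    # 0.2 - one dict lookup replaces one SELECT
```

With this in place, the per-claim loop costs one dict lookup per row instead of one database query per row.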
Re: Optimization of a Large QuerySet iteration
Hello,

2011/7/25 nixlists:
> Is there a way to rewrite the dictionary code above to make it more
> readable, and/or rewrite everything in a more efficient manner in
> general?

I also consider myself a beginner in programming, but I've been using django for about a year now. To me the code looks quite ok and straightforward.

The main question you may ask yourself may be whether or not you really want to have django do the whole calculation thing. Many database engines have very powerful aggregation capabilities, support for stored procedures, functions etc. Some of the aggregation features are available through the django orm. But that's a design decision that may depend on various factors. If the db host is powerful enough, however, it may speed things up quite a bit.

Regards
Roman
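[Editor's note: as a hedged illustration of Roman's point, not code from the thread: the whole totalling step can be pushed into the database as one aggregate query. This sketch uses the standard-library sqlite3 module with an invented table layout loosely modelled on the Claims data:]

```python
import sqlite3

# Minimal demonstration of letting the database do the aggregation
# instead of looping over rows in Python. Table and column names are
# made up for the example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE claims (quantity INTEGER, wac REAL, rebate_pct REAL)")
conn.executemany(
    "INSERT INTO claims VALUES (?, ?, ?)",
    [(10, 2.0, 0.1), (5, 4.0, 0.2)])

# One round trip returns both totals; no Python-side loop needed.
t_rebate, t_wac_sales = conn.execute(
    "SELECT SUM(quantity * wac * rebate_pct), SUM(quantity * wac) "
    "FROM claims").fetchone()

print(t_rebate, t_wac_sales)
```

The same idea applies to any engine; with a real DB the SUM runs where the data lives, so only two numbers cross the wire instead of every claim row.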
Optimization of a Large QuerySet iteration
Hi. I am a newbie :)

This is probably not a very Django-like design at this point, but the goal is to move towards one from a legacy-type DB design. My models are currently set up something like the following (since there is still quite a bit of trial-and-error database design and data-import work at this stage to see how best to design/redesign things, most of the fields are allowed to be blank and NULL; hopefully this will be resolved when the DB design is finalized):

class Product(models.Model):
    ndc = models.CharField(max_length=50, blank=True, null=True)
    product_name = models.CharField(max_length=50, blank=True, null=True)
    quantity = models.IntegerField(blank=True, null=True)
    product_description = models.CharField(max_length=50, blank=True, null=True)

class Pricing(models.Model):
    ndc = models.ForeignKey(Product)
    wac = models.DecimalField(max_digits=17, decimal_places=8, blank=True, null=True)
    wac_date = models.DateField(blank=True, null=True)
    wacdate_end = models.DateField(blank=True, null=True)

class Contract(models.Model):
    products = models.ManyToManyField(Product, through='ContractProduct')
    ...

class ContractProduct(models.Model):
    contract = models.ForeignKey(Contract)
    ndc = models.ForeignKey(Product)
    rebate_pct = models.DecimalField(max_digits=17, decimal_places=8)
    wac = models.ForeignKey(Pricing)

class Claims(models.Model):
    contract = models.ForeignKey(Contract)
    ndc = models.ForeignKey(Product)
    date = models.DateField(blank=True, null=True)
    claim_number = models.CharField(max_length=50, blank=True, null=True)
    quantity = models.IntegerField(blank=True, null=True)
    prices = models.ManyToManyField(Pricing)  # likely not needed
    ...

One of my views returns many objects from the Claims table and iterates over them to calculate some values based on all the objects in the query.
In order to optimize things a bit and not do a bazillion queries inside the for loop, I build small dictionaries and iterate over those instead, something like the following:

@login_required
def agreement(request, contract_id, quarter, paginate_by=30, **kwargs):
    c = Contract.objects.get(id=contract_id)

    p_per_c = c.products.all()

    c_dict = dict((product.id, product.contractproduct_set.get(
        contract=contract_id, ndc=product.id).rebate_pct)
        for product in p_per_c)

    p_dict = dict((product.id, product.contractproduct_set.get(
        contract=contract_id, ndc=product.id).wac.wac)
        for product in p_per_c)

    yearnow = datetime.date.today().year
    quarters = [
        [datetime.date(yearnow, 1, 1), datetime.date(yearnow, 3, 31)],
        [datetime.date(yearnow, 4, 1), datetime.date(yearnow, 6, 30)],
        [datetime.date(yearnow, 7, 1), datetime.date(yearnow, 9, 30)],
        [datetime.date(yearnow, 10, 1), datetime.date(yearnow, 12, 31)]
    ]

    start_date = quarters[int(quarter)-1][0]
    end_date = quarters[int(quarter)-1][1]

    cset = Claims.objects.filter(contract=contract_id,
                                 rebate_period_start=start_date,
                                 rebate_period_end__lte=end_date)

    t_rebate = 0
    t_wac_sales = 0

    for row in cset:
        rebate_pct = c_dict[row.ndc_id]
        rebate = row.quantity * p_dict[row.ndc_id] * rebate_pct
        wac_sales = row.quantity * p_dict[row.ndc_id]
        t_wac_sales += wac_sales
        t_rebate += rebate

Is there a way to rewrite the dictionary code above to make it more readable, and/or rewrite everything in a more efficient manner in general? It still takes a bit of time to run (but much faster than doing queries inside the loop).

If I am doing anything glaringly wrong, or if you have any other suggestions please let me know. I really appreciate your help.

Thanks.
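[Editor's note: the quarter table in the view can be pulled out into a small helper, which makes the date lookup easy to test on its own. A sketch restating the same ranges in pure Python, no Django required; the function name is invented:]

```python
import datetime

def quarter_bounds(year, quarter):
    """Return the (start_date, end_date) pair for calendar quarter 1-4,
    matching the quarters table in the view above. quarter may be a
    string, as when it comes from a URL pattern."""
    quarters = [
        (datetime.date(year, 1, 1),  datetime.date(year, 3, 31)),
        (datetime.date(year, 4, 1),  datetime.date(year, 6, 30)),
        (datetime.date(year, 7, 1),  datetime.date(year, 9, 30)),
        (datetime.date(year, 10, 1), datetime.date(year, 12, 31)),
    ]
    return quarters[int(quarter) - 1]

start_date, end_date = quarter_bounds(2011, "2")
print(start_date, end_date)  # 2011-04-01 2011-06-30
```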
Re: queryset iteration
On Feb 3, 6:19 pm, Malcolm Tredinnick wrote:
> "Default" iteration over a queryset (using __iter__) also caches the
> instances in memory so that they can be reused without requiring extra
> database queries. Normally, the memory overhead is negligible, since the
> queryset is being used to put results on, say, a web page, where a
> billion rows isn't the right number to send back.
>
> If you want to iterate over the results without that caching behaviour,
> use the iterator() method on the QuerySet, rather than the __iter__()
> method. Thus:
>
> for i in qs.iterator():
>     ...
>
> There will still be a lot of memory used here (proportional to the
> number of results returned, which in your case is "a lot"), since the
> database wrapper (psycopg or whatever) will pull all the data back from
> the database. That will still be a multiple less than also keeping
> copies of the Python objects, but it won't be zero. Using a pagination
> equivalent, as you're doing, could well be the best solution if you want
> to keep memory at an absolute minimum.

Thanks, Malcolm, for the detailed explanation. Using iterator() worked great! While it still took more memory than my manual slicing method (still a manageable amount, in my case), it reduced the number of queries to the database, which are time consuming.

Regards,
Casey
Re: queryset iteration
On Tue, 2009-02-03 at 15:06 -0800, Casey Deccio wrote:
> Hi,
>
> I'm using django 1.0.2. I have a model with a large number of
> records. I was hoping that the built-in iteration on queryset objects
> would reduce the memory requirement, but it is using up all my memory
> (and swap), and I'm forced to break up the chunks manually using
> queryset slices. Iterating over slices of 1 entries at a time, my
> memory usage is only around 40MB, but without breaking it up
> manually (i.e., iterating over the entire, unsliced queryset) it
> was using over 1GB before I killed it.

"Default" iteration over a queryset (using __iter__) also caches the instances in memory so that they can be reused without requiring extra database queries. Normally, the memory overhead is negligible, since the queryset is being used to put results on, say, a web page, where a billion rows isn't the right number to send back.

If you want to iterate over the results without that caching behaviour, use the iterator() method on the QuerySet, rather than the __iter__() method. Thus:

    for i in qs.iterator():
        ...

There will still be a lot of memory used here (proportional to the number of results returned, which in your case is "a lot"), since the database wrapper (psycopg or whatever) will pull all the data back from the database. That will still be a multiple less than also keeping copies of the Python objects, but it won't be zero. Using a pagination equivalent, as you're doing, could well be the best solution if you want to keep memory at an absolute minimum.

Regards,
Malcolm

> Is there anything I might be missing?
>
> Here are the examples:
>
> # without manual slicing
> qs = MyModel.objects.all()
> for obj in qs:
>     # do something
>     # ...
>     pass
>
> # with manual slicing
> index = 0
> qs = MyModel.objects.all()[0:1]
> while qs:
>     for obj in qs:
>         # do something
>         # ...
>         pass
>     index += 1
>     qs = MyModel.objects.all()[index:index+1]
>
> Regards,
> Casey
queryset iteration
Hi,

I'm using django 1.0.2. I have a model with a large number of records. I was hoping that the built-in iteration on queryset objects would reduce the memory requirement, but it is using up all my memory (and swap), and I'm forced to break up the chunks manually using queryset slices. Iterating over slices of 1 entries at a time, my memory usage is only around 40MB, but without breaking it up manually (i.e., iterating over the entire, unsliced queryset) it was using over 1GB before I killed it.

Is there anything I might be missing?

Here are the examples:

# without manual slicing
qs = MyModel.objects.all()
for obj in qs:
    # do something
    # ...
    pass

# with manual slicing
index = 0
qs = MyModel.objects.all()[0:1]
while qs:
    for obj in qs:
        # do something
        # ...
        pass
    index += 1
    qs = MyModel.objects.all()[index:index+1]

Regards,
Casey
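[Editor's note: the manual-slicing loop above can be wrapped in a generator so callers just iterate while only one chunk of objects is alive at a time. A pure-Python sketch, where fetch_slice(start, stop) is a hypothetical stand-in for MyModel.objects.all()[start:stop]:]

```python
def iter_in_chunks(fetch_slice, chunk_size):
    """Yield objects one at a time, fetching chunk_size of them per
    call to fetch_slice, so at most one chunk is held in memory."""
    index = 0
    while True:
        chunk = fetch_slice(index, index + chunk_size)
        if not chunk:
            return
        for obj in chunk:
            yield obj
        index += chunk_size

# Stand-in data source; with Django this would be a fresh sliced
# queryset per call, as in the while loop above.
data = list(range(10))
result = list(iter_in_chunks(lambda lo, hi: data[lo:hi], 3))
print(result == data)  # True
```

This is the same tradeoff Malcolm describes: more queries (one per chunk) in exchange for bounded memory.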