Re: Optimization of a Large QuerySet iteration

2011-07-27 Thread bruno desthuilliers
On Jul 26, 4:59 pm, nixlists wrote:
> On Tue, Jul 26, 2011 at 7:42 AM, Roman Klesel wrote:
>
> ...
>
> > The main question to ask yourself is whether or not you really
> > want to have Django do the whole calculation. Many database engines
> > have very powerful aggregation capabilities, support for stored
> > procedures, functions, etc. Some of the aggregation features are
> > available through the Django ORM.
>
> Thanks,
>
> Is it best practice to move as much code as possible into stored
> procedures, or is the opposite true with Django?

It all depends on your project's requirements. A generic app should be
as db-agnostic as possible (that is, it should work on any DB you can
use with Django), which may make stored procs somewhat impractical. If
it's a specific, custom app and the spec says "it will run on this
exact DB, period", then stored procs may be fine (just make sure the
customer / DBA is OK with it...).

This being said, you can already do a lot with standard SQL functions
and aggregation.
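
And if one hot spot does end up needing a stored proc, you can still
call it through Django's raw DB cursor without giving up the ORM
elsewhere. A minimal sketch (the procedure name and signature here are
invented for the example):

from django.db import connection

def rebate_for_contract(contract_id):
    cursor = connection.cursor()
    # compute_rebate is a hypothetical procedure defined in your DB
    cursor.execute("SELECT compute_rebate(%s)", [contract_id])
    row = cursor.fetchone()
    return row[0] if row else None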




Re: Optimization of a Large QuerySet iteration

2011-07-27 Thread Roman Klesel
2011/7/26 nixlists:

> Is it best practice to move as much code as possible into stored
> procedures, or is the opposite true with Django?

I don't want to comment on whether it's best practice or not ...

I was thinking about efficiency:
If the data processing gets too slow doing it in Django, shipping
the task to the database could improve things.

On the other hand, if speed is not the main issue, keeping things
simple and in one place is also nice.

I just wanted to point out that usually databases are capable of doing
these things very efficiently.

Regards Roman




Re: Optimization of a Large QuerySet iteration

2011-07-26 Thread nixlists
On Tue, Jul 26, 2011 at 7:42 AM, Roman Klesel wrote:
...

> The main question to ask yourself is whether or not you really want
> to have Django do the whole calculation. Many database engines have
> very powerful aggregation capabilities, support for stored procedures,
> functions, etc. Some of the aggregation features are available through
> the Django ORM.

Thanks,

Is it best practice to move as much code as possible into stored
procedures, or is the opposite true with Django?




Re: Optimization of a Large QuerySet iteration

2011-07-26 Thread nixlists
On Tue, Jul 26, 2011 at 9:52 AM, Tom Evans wrote:
>  p_dict = dict(contract_products.values_list('product', 'wac__wac'))
>
> This should cut down the number of queries in the view significantly.

Thanks! This is much better than what I had.




Re: Optimization of a Large QuerySet iteration

2011-07-26 Thread Tom Evans
On Mon, Jul 25, 2011 at 4:57 AM, nixlists wrote:
> Hi. I am a newbie  :)
>
> Is there a way to rewrite the dictionary code above to make it more
> readable, and/or rewrite everything in a more efficient manner in
> general? It still takes a bit of time to run (but much faster than
> doing queries inside the loop).
>
> If I am doing anything glaringly wrong, or if you have any other
> suggestions please let me know. I really appreciate your help.
>
> Thanks.
>

I've rearranged your mail to make it a bit easier to answer. You are
on the right track with your approach here, but you are doing too much
in loops, which is why you feel it is slow.


>    c = Contract.objects.get(id = contract_id)
>
>    p_per_c=c.products.all()
>
>    c_dict = dict((product.id, product.contractproduct_set.get(
>        contract=contract_id,ndc=product.id).rebate_pct)
>        for product in p_per_c)
>
>    p_dict = dict((product.id, product.contractproduct_set.get(
>        contract=contract_id, ndc=product.id).wac.wac)
>        for product in p_per_c)


So, this is the critical section of your code. You are trying to build
dictionaries of data that you can then use later on, to avoid hitting
the database repeatedly. However, the queries you use here are a bit
naive, and mean you have to do many similar queries to build up your
data sets. This is where the code can be improved.

c_dict seems to contain the product id => rebate_pct from the
appropriate ContractProduct for each Product associated with this
Contract, but it must perform one query to get the list of products,
and one query per product to get the rebate_pct. This could be
simplified - there are many ways to approach this, but I like to start
from the thing that I want - the ContractProduct:

  ContractProduct.objects.filter(contract=c)

should produce all the items we are interested in for this contract.
The next performance tip is to not fetch and create Django model
instances when all you want is the data - just ask for it using
values() or values_list(), hence:

  contract_products = ContractProduct.objects.filter(contract=c)
  c_dict = dict(contract_products.values_list('product', 'rebate_pct'))

p_dict, unless I've misread, is just slightly different data from the
same data set, and hence applying the same approach:

  p_dict = dict(contract_products.values_list('product', 'wac__wac'))

This should cut down the number of queries in the view significantly.
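
If you want to convince yourself, run both versions in a shell with
DEBUG=True (Django only records queries in debug mode) and count them.
A rough sketch:

  from django.db import connection, reset_queries

  reset_queries()
  contract_products = ContractProduct.objects.filter(contract=c)
  c_dict = dict(contract_products.values_list('product', 'rebate_pct'))
  p_dict = dict(contract_products.values_list('product', 'wac__wac'))
  print len(connection.queries)  # 2 - one SELECT per values_list()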

Cheers

Tom




Re: Optimization of a Large QuerySet iteration

2011-07-26 Thread Roman Klesel
Hello,

2011/7/25 nixlists:
> Is there a way to rewrite the dictionary code above to make it more
> readable, and/or rewrite everything in a more efficient manner in
> general?

I also consider myself a beginner in programming, but I've been using
Django for about a year now.

To me the code looks quite OK and straightforward.

The main question to ask yourself is whether or not you really want
to have Django do the whole calculation. Many database engines have
very powerful aggregation capabilities, support for stored procedures,
functions, etc. Some of the aggregation features are available through
the Django ORM.

But that's a design decision that may depend on various factors.

If the DB host is powerful enough, however, it may speed things up
quite a bit.
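
For example, simple per-product totals can already be pushed down into
a single GROUP BY query through the ORM's aggregation support. A sketch
only, using the Claims.date field from your model listing (your view
filters on the rebate_period_* fields instead, so adjust accordingly):

from django.db.models import Sum

# one query, grouped by product, instead of summing in a python loop
totals = (Claims.objects
          .filter(contract=contract_id,
                  date__range=(start_date, end_date))
          .values('ndc')
          .annotate(total_quantity=Sum('quantity')))

The rebate amount itself multiplies columns from different tables,
which the ORM can't currently express inside an aggregate, so that
part would need extra() or raw SQL if you want the database to do it.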

Regards
  Roman




Optimization of a Large QuerySet iteration

2011-07-24 Thread nixlists
Hi. I am a newbie  :)

This is probably not a very Django-like design at this point, but the
goal is to move towards one from a legacy-style DB design.

My models are currently set up something like the following. (Since
there is still quite a bit of trial-and-error database design and data
import work at this stage, most of the fields are allowed to be blank
and NULL; hopefully this will be resolved once the DB design is
finalized.)

class Product(models.Model):
    ndc = models.CharField(max_length=50, blank=True, null=True)
    product_name = models.CharField(max_length=50, blank=True, null=True)
    quantity = models.IntegerField(blank=True, null=True)
    product_description = models.CharField(max_length=50, blank=True, null=True)

class Pricing(models.Model):
    ndc = models.ForeignKey(Product)
    wac = models.DecimalField(max_digits=17, decimal_places=8,
                              blank=True, null=True)
    wac_date = models.DateField(blank=True, null=True)
    wacdate_end = models.DateField(blank=True, null=True)

class Contract(models.Model):
    products = models.ManyToManyField(Product, through='ContractProduct')
    ...

class ContractProduct(models.Model):
    contract = models.ForeignKey(Contract)
    ndc = models.ForeignKey(Product)
    rebate_pct = models.DecimalField(max_digits=17, decimal_places=8)
    wac = models.ForeignKey(Pricing)

class Claims(models.Model):
    contract = models.ForeignKey(Contract)
    ndc = models.ForeignKey(Product)
    date = models.DateField(blank=True, null=True)
    claim_number = models.CharField(max_length=50, blank=True, null=True)
    quantity = models.IntegerField(blank=True, null=True)
    prices = models.ManyToManyField(Pricing)  # likely not needed
    ...

One of my views returns many objects from the Claims table and
iterates over them to calculate some values based on all the objects
in the query. In order to optimize things a bit and not do a bazillion
queries inside the for loop, I build small dictionaries and iterate
over those instead, something like the following:

@login_required
def agreement(request, contract_id, quarter, paginate_by=30, **kwargs):
    c = Contract.objects.get(id=contract_id)

    p_per_c = c.products.all()

    c_dict = dict((product.id, product.contractproduct_set.get(
        contract=contract_id, ndc=product.id).rebate_pct)
        for product in p_per_c)

    p_dict = dict((product.id, product.contractproduct_set.get(
        contract=contract_id, ndc=product.id).wac.wac)
        for product in p_per_c)

    yearnow = datetime.date.today().year

    quarters = [[datetime.date(yearnow, 1, 1), datetime.date(yearnow, 3, 31)],
                [datetime.date(yearnow, 4, 1), datetime.date(yearnow, 6, 30)],
                [datetime.date(yearnow, 7, 1), datetime.date(yearnow, 9, 30)],
                [datetime.date(yearnow, 10, 1), datetime.date(yearnow, 12, 31)]]
    start_date = quarters[int(quarter) - 1][0]
    end_date = quarters[int(quarter) - 1][1]

    cset = Claims.objects.filter(contract=contract_id,
        rebate_period_start=start_date, rebate_period_end__lte=end_date)

    t_rebate = 0
    t_wac_sales = 0

    for row in cset:
        rebate_pct = c_dict[row.ndc_id]
        rebate = row.quantity * p_dict[row.ndc_id] * rebate_pct
        wac_sales = row.quantity * p_dict[row.ndc_id]
        t_wac_sales += wac_sales
        t_rebate += rebate


Is there a way to rewrite the dictionary code above to make it more
readable, and/or rewrite everything in a more efficient manner in
general? It still takes a bit of time to run (but much faster than
doing queries inside the loop).

If I am doing anything glaringly wrong, or if you have any other
suggestions please let me know. I really appreciate your help.

Thanks.




Re: queryset iteration

2009-02-04 Thread Casey

On Feb 3, 6:19 pm, Malcolm Tredinnick wrote:
> "Default" iteration over a queryset (using __iter__) also caches the
> instances in memory so that they can be reused without requiring extra
> database queries. Normally, the memory overhead is negligible, since the
> queryset is being used to put results on, say, a web page, where a
> billion rows isn't the right number to send back.
>
> If you want to iterate over the results without that caching behaviour,
> use the iterator() method on the QuerySet, rather than the __iter__()
> method. Thus:
>
>         for i in qs.iterator():
>            ...
>
> There will still be a lot of memory used here (proportional to the
> number of results returned, which in your case is "a lot"), since the
> database wrapper (psycopg or whatever) will pull all the data back from
> the database. That will still be a multiple less than also keeping
> copies of the Python objects, but it won't be zero. Using a pagination
> equivalent, as you're doing, could well be the best solution if you want
> to keep memory at an absolute minimum.
>

Thanks, Malcolm, for the detailed explanation.  Using iterator()
worked great!  While it still took more memory than my manual slicing
method (though still a manageable amount, in my case), it reduced the
number of queries to the database, which are time-consuming.

Regards,
Casey



Re: queryset iteration

2009-02-03 Thread Malcolm Tredinnick

On Tue, 2009-02-03 at 15:06 -0800, Casey Deccio wrote:
>  Hi,
> 
> I'm using Django 1.0.2.  I have a model with a large number of
> records.  I was hoping that the built-in iteration on queryset objects
> would reduce the memory requirement, but it is using up all my memory
> (and swap), and I'm forced to break the work into chunks manually
> using queryset slices.  Iterating over slices of 1 entry at a time, my
> memory usage is only around 40MB, but without breaking it up manually
> (i.e., iterating over the entire, unsliced queryset) it was using over
> 1GB before I killed it.

"Default" iteration over a queryset (using __iter__) also caches the
instances in memory so that they can be reused without requiring extra
database queries. Normally, the memory overhead is negligible, since the
queryset is being used to put results on, say, a web page, where a
billion rows isn't the right number to send back.

If you want to iterate over the results without that caching behaviour,
use the iterator() method on the QuerySet, rather than the __iter__()
method. Thus:

for i in qs.iterator():
   ...

There will still be a lot of memory used here (proportional to the
number of results returned, which in your case is "a lot"), since the
database wrapper (psycopg or whatever) will pull all the data back from
the database. That will still be a multiple less than also keeping
copies of the Python objects, but it won't be zero. Using a pagination
equivalent, as you're doing, could well be the best solution if you want
to keep memory at an absolute minimum.
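
If you do roll your own chunking, one pattern that avoids both the
caching and ever-deeper OFFSET clauses is to step through by primary
key. A sketch only, assuming an integer primary key:

def chunked(qs, chunk_size=1000):
    # Yields instances in pk order, pulling chunk_size rows per query.
    last_pk = 0
    while True:
        chunk = list(qs.filter(pk__gt=last_pk).order_by('pk')[:chunk_size])
        if not chunk:
            break
        for obj in chunk:
            yield obj
        last_pk = chunk[-1].pk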

Regards,
Malcolm

> 
> Is there anything I might be missing?
> 
> Here are the examples:
> 
> # without manual slicing
> qs = MyModel.objects.all()
> for obj in qs:
>     # do something
>     # ...
>     pass
> 
> # with manual slicing
> index = 0
> qs = MyModel.objects.all()[0:1]
> while qs:
>     for obj in qs:
>         # do something
>         # ...
>         pass
>     index += 1
>     qs = MyModel.objects.all()[index:index+1]
> 
> Regards,
> Casey
> 





queryset iteration

2009-02-03 Thread Casey Deccio
Hi,

I'm using Django 1.0.2.  I have a model with a large number of records.  I
was hoping that the built-in iteration on queryset objects would reduce the
memory requirement, but it is using up all my memory (and swap), and I'm
forced to break the work into chunks manually using queryset slices.
Iterating over slices of 1 entry at a time, my memory usage is only around
40MB, but without breaking it up manually (i.e., iterating over the entire,
unsliced queryset) it was using over 1GB before I killed it.

Is there anything I might be missing?

Here are the examples:

# without manual slicing
qs = MyModel.objects.all()
for obj in qs:
    # do something
    # ...
    pass

# with manual slicing
index = 0
qs = MyModel.objects.all()[0:1]
while qs:
    for obj in qs:
        # do something
        # ...
        pass
    index += 1
    qs = MyModel.objects.all()[index:index+1]

Regards,
Casey
