Re: Optimizing DB query involving annotate and aggregate

2014-02-11 Thread ST
Managed it in the end, so for reference here is what I ended up with:

def total_credit_debt(self):
transaction_totals = 
Transaction.objects.filter(member__isnull=False).values('member').annotate(total=Sum('amount')).order_by()
creditors = transaction_totals.filter(total__gt=0)
debtors = transaction_totals.filter(total__lt=0)
total_credit = sum(row['total'] for row in creditors)
total_debt = sum(row['total'] for row in debtors)
return total_credit, total_debt

(@classmethod on TransactionManager)

Unfortunately I didn't manage to get it to do the second sum inside the 
database as well, but this is fast enough - there's only a few hundred 
members with non-zero balances. If you think there's any way this could be 
improved / cleaned up, I'd be happy to hear:)

Thanks!


On Thursday, February 6, 2014 7:59:37 PM UTC, arnonym wrote:
>
> On Wed, 5 Feb 2014 10:11:29 -0800 (PST) ST 
>  
>
> wrote: 
> > Hi, 
> > 
> > I'm trying to optimize the run-time of getting total credit and debt 
> > values out of our database. Ideally I'd like to formulate it as a 
> > Django query. This is the raw SQL query I have, which produces the 
> > right answer and is very fast (milliseconds): 
> > 
> > SELECT sum(tg.total) FROM 
> >   ( 
> >   SELECT sum(t.amount) AS total, t.member_id AS member_id 
> >   FROM club_transaction AS t 
> >   WHERE t.member_id IS NOT NULL 
> >   GROUP BY t.member_id 
> >   ) AS tg 
> > WHERE tg.total < 0 
> > 
> > (plus a second query for > 0) 
> > 
> > My Django implementation was: 
> > 
> > m = 
> > Member.objects.annotate(balance=Sum('transaction__amount')) m_debt = 
> > m.filter(balance__lt=0).aggregate(total=Sum('balance')) m_credit = 
> > m.filter(balance__gt=0).aggregate(total=Sum('balance')) 
> > 
> > which looks a lot nicer, is easier to understand and maintain. 
> > 
> > However, it results in the following SQL query (slightly redacted): 
> > 
> > SELECT SUM(balance) FROM 
> >   ( 
> >   SELECT "club_member"."id" AS "id", {all the other fields}, 
> > SUM("club_transaction"."amount") AS "balance" 
> >   FROM "club_member" 
> >   LEFT OUTER JOIN "auth_user" ON ("club_member"."user_id" = 
> > "auth_user"."id") 
> >   LEFT OUTER JOIN "club_transaction" ON ("club_member"."id" = 
> > "club_transaction"."member_id") 
> >   GROUP BY "club_member"."id", {all the other fields}, 
> > "auth_user"."last_name", "auth_user"."first_name" 
> >   HAVING SUM("club_transaction"."amount") < 0 
> >   ORDER BY "auth_user"."last_name" ASC, "auth_user"."first_name" ASC 
> >   ) subquery 
> > 
> > (again, plus another one for > 0) 
> > which is very slow (almost 1.5 seconds). 
> > 
> > How can I construct a Django query which doesn't request (and group 
> > by) all the unnecessary other fields ? 
> > I already tried playing around with only() and values() but never got 
> > it to work. 
>
> I did something similar a few years back. Its next to impossible with 
> django-orm, just do it in raw sql. 
> The django-orm can't really do any advanced group_by clauses. And you 
> can't give them with extra(). Well, you can give them but they won't be 
> used;-) 
>
> Doing two (or three) orm-queries and then joining the data in python 
> will actually be slower then doing it all in hand-crafted sql. 
>
> So just do the sql by hand. And then optimize by having three columns, 
> one with the SUM(amount) if amount>0, one with the SUM(amount) if 
> amount<0 and one with the SUM(amount). Total credits, total depts and 
> balance all in one query (if possible)... 
>
> - Arnold 
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-users+unsubscr...@googlegroups.com.
To post to this group, send email to django-users@googlegroups.com.
Visit this group at http://groups.google.com/group/django-users.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-users/73bd4ddb-d6ee-461a-bed3-a3ee915b8859%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: Optimizing DB query involving annotate and aggregate

2014-02-06 Thread Arnold Krille
On Wed, 5 Feb 2014 10:11:29 -0800 (PST) ST 
wrote:
> Hi,
> 
> I'm trying to optimize the run-time of getting total credit and debt
> values out of our database. Ideally I'd like to formulate it as a
> Django query. This is the raw SQL query I have, which produces the
> right answer and is very fast (milliseconds):
> 
> SELECT sum(tg.total) FROM
>   (
>   SELECT sum(t.amount) AS total, t.member_id AS member_id
>   FROM club_transaction AS t
>   WHERE t.member_id IS NOT NULL
>   GROUP BY t.member_id
>   ) AS tg
> WHERE tg.total < 0
> 
> (plus a second query for > 0)
> 
> My Django implementation was:
> 
> m =
> Member.objects.annotate(balance=Sum('transaction__amount')) m_debt =
> m.filter(balance__lt=0).aggregate(total=Sum('balance')) m_credit =
> m.filter(balance__gt=0).aggregate(total=Sum('balance'))
> 
> which looks a lot nicer, is easier to understand and maintain.
> 
> However, it results in the following SQL query (slightly redacted):
> 
> SELECT SUM(balance) FROM
>   (
>   SELECT "club_member"."id" AS "id", {all the other fields}, 
> SUM("club_transaction"."amount") AS "balance"
>   FROM "club_member"
>   LEFT OUTER JOIN "auth_user" ON ("club_member"."user_id" = 
> "auth_user"."id")
>   LEFT OUTER JOIN "club_transaction" ON ("club_member"."id" = 
> "club_transaction"."member_id")
>   GROUP BY "club_member"."id", {all the other fields}, 
> "auth_user"."last_name", "auth_user"."first_name"
>   HAVING SUM("club_transaction"."amount") < 0
>   ORDER BY "auth_user"."last_name" ASC, "auth_user"."first_name" ASC
>   ) subquery
> 
> (again, plus another one for > 0)
> which is very slow (almost 1.5 seconds).
> 
> How can I construct a Django query which doesn't request (and group
> by) all the unnecessary other fields ?
> I already tried playing around with only() and values() but never got
> it to work.

I did something similar a few years back. Its next to impossible with
django-orm, just do it in raw sql.
The django-orm can't really do any advanced group_by clauses. And you
can't give them with extra(). Well, you can give them but they won't be
used;-)

Doing two (or three) orm-queries and then joining the data in python
will actually be slower then doing it all in hand-crafted sql.

So just do the sql by hand. And then optimize by having three columns,
one with the SUM(amount) if amount>0, one with the SUM(amount) if
amount<0 and one with the SUM(amount). Total credits, total depts and
balance all in one query (if possible)...

- Arnold


signature.asc
Description: PGP signature


Re: Optimizing DB query involving annotate and aggregate

2014-02-06 Thread ST
On Wednesday, February 5, 2014 8:01:53 PM UTC, Anssi Kääriäinen wrote:
>
> Something like this might work:
> 
> Transaction.objects.values_list('member_id').annotate(total=Sum('amount')).filter(total__gt=0).aggregate(Sum('total'))
>

This didn't work - it produced a "SELECT  FROM" query, which obviously 
didn't work - tried adding 'amount' to the values_list, but that didn't 
help either. Eventually got it to work by using .only('member', 'amount') 
instead, and it *was* fast, but it didn't quite do what we need: we only 
want the *outstanding* credit/debt, i.e. some member might have many 
transactions, but they all add up to zero - so instead of increasing both 
total_debt and total_credit by that value, it shouldn't affect the totals. 
Which is why I wanted to total up by members first... any idea how to get 
that to work ?

 

>
> That is, don't start from Member, Django isn't smart enough to get rid of 
> the non-necessary joins. Instead go directly for the query you wrote in 
> SQL. In the ORM .values() is the way to control GROUP BY clause. Getting 
> the order of .values() and .annotate() right is important, and it is 
> sometimes hard to figure out how to do it correctly. Also, if you have some 
> extra ordering going on in the query just issue an empty .order_by() call - 
> that will get rid of all ordering.
>
>  - Anssi
>
> On Wednesday, February 5, 2014 8:11:29 PM UTC+2, ST wrote:
>>
>> Hi,
>>
>> I'm trying to optimize the run-time of getting total credit and debt 
>> values out of our database. Ideally I'd like to formulate it as a Django 
>> query. This is the raw SQL query I have, which produces the right answer 
>> and is very fast (milliseconds):
>>
>> SELECT sum(tg.total) FROM
>>   (
>>   SELECT sum(t.amount) AS total, t.member_id AS member_id
>>   FROM club_transaction AS t
>>   WHERE t.member_id IS NOT NULL
>>   GROUP BY t.member_id
>>   ) AS tg
>> WHERE tg.total < 0
>>
>> (plus a second query for > 0)
>>
>> My Django implementation was:
>>
>> m = Member.objects.annotate(balance=Sum('transaction__amount'))
>> m_debt = m.filter(balance__lt=0).aggregate(total=Sum('balance'))
>> m_credit = m.filter(balance__gt=0).aggregate(total=Sum('balance'))
>>
>> which looks a lot nicer, is easier to understand and maintain.
>>
>> However, it results in the following SQL query (slightly redacted):
>>
>> SELECT SUM(balance) FROM
>>   (
>>   SELECT "club_member"."id" AS "id", {all the other fields}, 
>> SUM("club_transaction"."amount") AS "balance"
>>   FROM "club_member"
>>   LEFT OUTER JOIN "auth_user" ON ("club_member"."user_id" = 
>> "auth_user"."id")
>>   LEFT OUTER JOIN "club_transaction" ON ("club_member"."id" = 
>> "club_transaction"."member_id")
>>   GROUP BY "club_member"."id", {all the other fields}, 
>> "auth_user"."last_name", "auth_user"."first_name"
>>   HAVING SUM("club_transaction"."amount") < 0
>>   ORDER BY "auth_user"."last_name" ASC, "auth_user"."first_name" ASC
>>   ) subquery
>>
>> (again, plus another one for > 0)
>> which is very slow (almost 1.5 seconds).
>>
>> How can I construct a Django query which doesn't request (and group by) 
>> all the unnecessary other fields ?
>> I already tried playing around with only() and values() but never got it 
>> to work.
>>
>> Looking forward to your responses!
>>
>> best regards,
>> ST
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-users+unsubscr...@googlegroups.com.
To post to this group, send email to django-users@googlegroups.com.
Visit this group at http://groups.google.com/group/django-users.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-users/3e239afe-037c-4ace-b9eb-ac2404f75981%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: Optimizing DB query involving annotate and aggregate

2014-02-05 Thread Anssi Kääriäinen
Something like this might work:

Transaction.objects.values_list('member_id').annotate(total=Sum('amount')).filter(total__gt=0).aggregate(Sum('total'))

That is, don't start from Member, Django isn't smart enough to get rid of 
the non-necessary joins. Instead go directly for the query you wrote in 
SQL. In the ORM .values() is the way to control GROUP BY clause. Getting 
the order of .values() and .annotate() right is important, and it is 
sometimes hard to figure out how to do it correctly. Also, if you have some 
extra ordering going on in the query just issue an empty .order_by() call - 
that will get rid of all ordering.

 - Anssi

On Wednesday, February 5, 2014 8:11:29 PM UTC+2, ST wrote:
>
> Hi,
>
> I'm trying to optimize the run-time of getting total credit and debt 
> values out of our database. Ideally I'd like to formulate it as a Django 
> query. This is the raw SQL query I have, which produces the right answer 
> and is very fast (milliseconds):
>
> SELECT sum(tg.total) FROM
>   (
>   SELECT sum(t.amount) AS total, t.member_id AS member_id
>   FROM club_transaction AS t
>   WHERE t.member_id IS NOT NULL
>   GROUP BY t.member_id
>   ) AS tg
> WHERE tg.total < 0
>
> (plus a second query for > 0)
>
> My Django implementation was:
>
> m = Member.objects.annotate(balance=Sum('transaction__amount'))
> m_debt = m.filter(balance__lt=0).aggregate(total=Sum('balance'))
> m_credit = m.filter(balance__gt=0).aggregate(total=Sum('balance'))
>
> which looks a lot nicer, is easier to understand and maintain.
>
> However, it results in the following SQL query (slightly redacted):
>
> SELECT SUM(balance) FROM
>   (
>   SELECT "club_member"."id" AS "id", {all the other fields}, 
> SUM("club_transaction"."amount") AS "balance"
>   FROM "club_member"
>   LEFT OUTER JOIN "auth_user" ON ("club_member"."user_id" = 
> "auth_user"."id")
>   LEFT OUTER JOIN "club_transaction" ON ("club_member"."id" = 
> "club_transaction"."member_id")
>   GROUP BY "club_member"."id", {all the other fields}, 
> "auth_user"."last_name", "auth_user"."first_name"
>   HAVING SUM("club_transaction"."amount") < 0
>   ORDER BY "auth_user"."last_name" ASC, "auth_user"."first_name" ASC
>   ) subquery
>
> (again, plus another one for > 0)
> which is very slow (almost 1.5 seconds).
>
> How can I construct a Django query which doesn't request (and group by) 
> all the unnecessary other fields ?
> I already tried playing around with only() and values() but never got it 
> to work.
>
> Looking forward to your responses!
>
> best regards,
> ST
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-users+unsubscr...@googlegroups.com.
To post to this group, send email to django-users@googlegroups.com.
Visit this group at http://groups.google.com/group/django-users.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-users/ed49abcb-2089-4c4f-a46a-7ba67f3caa81%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Optimizing DB query involving annotate and aggregate

2014-02-05 Thread ST
Hi,

I'm trying to optimize the run-time of getting total credit and debt values 
out of our database. Ideally I'd like to formulate it as a Django query. 
This is the raw SQL query I have, which produces the right answer and is 
very fast (milliseconds):

SELECT sum(tg.total) FROM
  (
  SELECT sum(t.amount) AS total, t.member_id AS member_id
  FROM club_transaction AS t
  WHERE t.member_id IS NOT NULL
  GROUP BY t.member_id
  ) AS tg
WHERE tg.total < 0

(plus a second query for > 0)

My Django implementation was:

m = Member.objects.annotate(balance=Sum('transaction__amount'))
m_debt = m.filter(balance__lt=0).aggregate(total=Sum('balance'))
m_credit = m.filter(balance__gt=0).aggregate(total=Sum('balance'))

which looks a lot nicer, is easier to understand and maintain.

However, it results in the following SQL query (slightly redacted):

SELECT SUM(balance) FROM
  (
  SELECT "club_member"."id" AS "id", {all the other fields}, 
SUM("club_transaction"."amount") AS "balance"
  FROM "club_member"
  LEFT OUTER JOIN "auth_user" ON ("club_member"."user_id" = 
"auth_user"."id")
  LEFT OUTER JOIN "club_transaction" ON ("club_member"."id" = 
"club_transaction"."member_id")
  GROUP BY "club_member"."id", {all the other fields}, 
"auth_user"."last_name", "auth_user"."first_name"
  HAVING SUM("club_transaction"."amount") < 0
  ORDER BY "auth_user"."last_name" ASC, "auth_user"."first_name" ASC
  ) subquery

(again, plus another one for > 0)
which is very slow (almost 1.5 seconds).

How can I construct a Django query which doesn't request (and group by) all 
the unnecessary other fields ?
I already tried playing around with only() and values() but never got it to 
work.

Looking forward to your responses!

best regards,
ST

-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-users+unsubscr...@googlegroups.com.
To post to this group, send email to django-users@googlegroups.com.
Visit this group at http://groups.google.com/group/django-users.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-users/367c1f59-870c-40d1-9a53-4feb08890f29%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.