My bad, the correct SQL query is this:

SELECT "date", sum("car_crashes") FROM (


// String from Python
str(aggregated.query)


) as "aggregated" GROUP BY "date"



Il giorno lunedì 19 agosto 2019 23:10:47 UTC+2, Jo ha scritto:
>
> I have a table that looks like this
>
> date                car_crashes         city
> 01.01               1                   Washington
> 01.02               4                   Washington
> 01.03               0                   Washington
> 01.04               2                   Washington
> 01.05               0                   Washington
> 01.06               3                   Washington
> 01.07               4                   Washington
> 01.08               1                   Washington
> 01.01               0                   Detroit
> 01.02               2                   Detroit
> 01.03               4                   Detroit
> 01.04               2                   Detroit
> 01.05               0                   Detroit
> 01.06               3                   Detroit
> 01.07               1                   Detroit
>
>
> I want to know how many car crashes for each day happened in the entire 
> nation, and I can do that with this:
>
>
> Model.values("date") \
> .annotate(car_crashes=Sum('car_crashes')) \
> .values("date", "car_crashes")
>
>
>
> Now, let's suppose I have an array like this:
>
> weights = [
>     {
>         "city": "Washington",
>         "weight": 1,
>     },
>     {
>         "city": "Detroit",
>         "weight": 2,
>     }
> ]
>
>
>
> This means that Detroit's car crashes should be multiplied by 2 before 
> being aggregated with Washington's.
>
> It can be done like this:
>
> from django.db.models import IntegerField
>
>
> when_list = [When(city=w['city'], then=w['weight']) for w in weights]
> case_params = {'default': 1, 'output_field': IntegerField()}
>
>
> Model.objects.values('date') \
>     .annotate(
>         weighted_car_crashes=Sum(
>             F('car_crashes') * Case(*when_list, **case_params)
>     ))
>
> However, this generates very slow SQL code, especially as more properties 
> and a larger array are introduced.
>
> Another solution which is way faster but still sub-optimal is using pandas 
> to :
>
> aggregated = false
> for weight in weights:
>
>  ag = Model.values("date") \
>  .annotate(car_crashes=Sum('car_crashes')) \
>  .values("date", "car_crashes")
>
>
>  if aggregated is False:
>  aggregated = ag
>  else:
>  aggregated = aggregated.union(ag)
>
>
> aggregated = pd.DataFrame(aggregated)
> if len(weights) > 1:
>  aggregated = aggregated.groupby("date", as_index=False).sum(level=[1])
>
>
> This is faster, but still not as fast as what happens if, before calling 
> pandas, I take the aggregated.query string and
> wrap it with a few lines of SQL.
>
>
> SELECT "date", sum("car_crashes") FROM (
>
>
> // String from Python
> str(aggregated.query)
>
>
> )
>
>
> This works perfectly when pasted into my database SQL. I could do this in 
> Python/Django using .raw() but the documentation says to ask here before 
> using .raw() as mostly anything could be acomplished with the ORM.
>
> Yet, I don't see how. Once I call .union on 2 querysets, I cannot 
> aggregate further.
>
> aggregated.union(ag).annotate(cc=Sum('car_crashes'))
>
> gives
>
> Cannot compute Sum('car_crashes'): 'car_crashes' is an aggregate
>
>
>
>
> Is this possible to do with the Django ORM or should I use .raw()?
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-users/7096fdff-ec55-4d1b-8110-9fd6bb70f9e1%40googlegroups.com.

Reply via email to