Hey Mads,

I'll start by saying that subquery handling is a bit messy in the ORM. As 
you noticed there a few tickets and PRs about trying to make it better. I 
personally tried to get things better by pushing a lot of the current 
Subquery logic down Query a few months[0] ago but there's still a lot of 
room for improvements.

If I understand correctly filtering against window expression would require 
an automatic subquery wrap around which the ORM already does under certain 
circumstances but always on terminal operations such as aggregation[1].

I think that getting the semantic of how the wrapping should be performed 
while allowing following Queryset operations (filter, annotate, aggregate) 
to take place would be quite tricky and error prone so I suggest you 
explore the following alternative.

There's a ticket opened[2] to introduce a Queryset method to use the 
current query as a subquery which would allow to circumvent the implicit 
wrap around problem and require this method to be used to filter against 
window expression. To reuse #28333's example and assuming the method would 
be named subquery you could achieve what you're after with the following

    Window.objects.annotate(
        row=Window(expression=RowNumber())
    ).subquery().filter(row__gt=1)

which would result in SQL equivalent to

    SELECT * FROM (
        SELECT *, ROW_NUMBER()
        FROM window
    ) subquery WHERE subquery.row > 1

This subquery operation would address a few other cases that I know 
Claude[3] is interested in seeing addressed as well. Maybe you'd be able to 
pair on this problem given his interest in the problem and your past 
interactions with some of the ORM's internal.

>From the little investgation I've done so far I think it shouldn't be too 
hard to implement once there's a way to add subquery to 
`Query.alias_map`[4]. From that point the _subquery_ operation should 
create a new `Query` instance and add the current one to `alias_map` as 
`base_table` (the first entry in `alias_map`) by making sure to augment the 
outer `Query.annotations` and select mask appropriately. It's not clear to 
me what should be done with `JOIN`s originating from `select_related` prior 
to calling the operation but I think it would be more appropriate to prune 
them from the inner query and graft them to the outer one since they'll 
have the same base table.

Here's a bit of peudo-code of what I tried to describe above

    def subquery(self):
        query = Query(self.model)
        alias = 'subquery'
        query.alias_map[alias] = BaseSubquery(self.query.clone(), alias)
        # XXX: Handle the select mask as well.
        query.annotations = {
            annotation: Ref(annotation, expr)
            for annotation, expr in self.query.annotations.items()
        }
        return QuerySet(self.model, query=query)

It's certainly missing quite a lot of things but it should get you started.

I'm happy to provide more guidance along the way if you decide to head in 
this direction.

Cheers,
Simon


[0] 
https://github.com/django/django/commit/35431298226165986ad07e91f9d3aca721ff38ec#diff-0edd853580d56db07e4020728d59e193
[1] 
https://github.com/django/django/blob/6fbce45b0376f0ec8f3bf244f4f1f8d62c201f58/django/db/models/sql/compiler.py#L1556-L1574
[2] https://code.djangoproject.com/ticket/24462
[3] https://code.djangoproject.com/ticket/30592
[4] https://github.com/django/django/pull/12602

Le samedi 4 avril 2020 05:11:01 UTC-4, Mads Jensen a écrit :
>
> Hey,
>
> I originally submitted the window functions expressions to Django a couple 
> of years ago for Django 2.0. In hindsight, 
> there are probably things that could be implemented a bit better although 
> I'm not sure how I easily could have used the
> SQLCompiler or QuerySet/Query classes directly, especially considering the 
> fact that there are slight discrepencies in
> how window expressions are supported between the different databases that 
> Django officielly supports.
>
> Anyway, it spawned a ticket #28333 [1] that I like to dedicate some time 
> to implement since the feature appears to be wanted. 
>
> I have some thoughts about the process:
>
> 1. Include another flag: filtering_requires_subquery (should default to 
> False, just filterable defaults to True)
> 2. Edit in QuerySet._filter_or_exclude to add somehow the following logic:
>
> 1. If the expression is not filterable, then we can introduce a subquery 
> and filter directly there.
> I surmise that this statement would be generic for SQL: if something is 
> not filterable in a main query,
> then it will be including it in another query as a subquery, and filtering 
> that way?
> 2. Everything related to existing ordering etc. is a bit blurry.
> 3. Having to work with table aliases doesn't seem to be something that I'd 
> want to worry about.
> There's Query.rewrite_cols for the case when there's an annotation from a 
> joined table,
> Sum(F('author__awards')) is the example given in the comment in that 
> function.
>
> 3. I think it's a general lacune that it's difficult to add a subquery to 
> an existing query.
> Although there's a boolean flag Query.subquery, it doesn't seem to make it 
> easier to deal with filtering and other
> things related to a subquery. I found this when searching for tickets that 
> have "subquery" in the description: 
> https://code.djangoproject.com/ticket/20127 
>
> When are the table aliases used (T1, T2 etc.), and when are the actual 
> table names used? I think it's when a table is auto-joined?
>
> I'm not sure how much work that is required in SQLCompiler and Query, or 
> if those two parts can stay untouched.
>
> I tend to overthink things and sometimes make them more complex, so it's 
> likely that there are things in the above that 
>
> Query.as_sql has this:
>
> def as_sql(self, compiler, connection):
> sql, params = self.get_compiler(connection=connection).as_sql()
> if self.subquery:
> sql = '(%s)' % sql
> return sql, params
>
> If someone is willing to discuss ideas about the process, please reach out 
> to me. Devil lies in the details, and I suppose some 
> of the problem that subqueries are difficult in the ORM could be mended 
> with work on this patch. However, I'd like to interact with
> people who have worked more with the internals of the ORM than I did. One 
> particular name Simon Charettes comes to mind, 
> since he also volunteered to mentor GSOC in the ORM area. 
>
> [1] https://code.djangoproject.com/ticket/28333
> --
> Med venlig hilsen / Kind regards,
> Mads Jensen
>
>
>
>
>
>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/27c2598f-08b3-41f7-84af-3b026eadfd97%40googlegroups.com.

Reply via email to