This has not received much feedback so far, so I think some code can 
help. I am not sure what the next steps are; as I understand it, we need 
some kind of consensus to open a ticket.

Please don't hesitate to share your opinion on the feature, the API, and 
the implementation so we can move this forward.
https://github.com/tatarinov1997/django/pull/1/
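
For anyone who wants to see the shape of the query without checking out the 
branch, here is a minimal sketch that runs outside Django, using only 
Python's sqlite3 module. The table and column names follow the example 
quoted below; the second product row, the short aliases and the ORDER BY 
are assumptions added for illustration, and the SQL is hand-written rather 
than produced by the patch. It compares one joined subquery with the 
equivalent query that repeats a correlated subquery per selected column.

```python
import sqlite3

# Stand-alone illustration with the stdlib sqlite3 module; this is not the
# patch itself, and the SQL below is hand-written, not generated by Django.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE core_product (
        id INTEGER PRIMARY KEY, name TEXT, country TEXT, type TEXT
    );
    CREATE TABLE core_tax (
        id INTEGER PRIMARY KEY, name TEXT, amount INTEGER,
        product_country TEXT, product_type TEXT
    );
    -- 'Chair' is an extra row (assumption) that has no matching tax.
    INSERT INTO core_product (name, country, type)
        VALUES ('Pizza', 'USA', 'FOOD'), ('Chair', 'USA', 'FURNITURE');
    INSERT INTO core_tax (name, amount, product_country, product_type)
        VALUES ('USA Food Tax', 10, 'USA', 'FOOD');
""")

# One joined subquery supplies both "tax" and "tax_name"; the subquery is
# written once, however many of its columns are selected.
JOINED_SQL = """
    SELECT p.name, taxes.amount AS tax, taxes.name AS tax_name
    FROM core_product AS p
    LEFT OUTER JOIN (
        SELECT name, amount, product_country, product_type FROM core_tax
    ) AS taxes
      ON taxes.product_country = p.country AND taxes.product_type = p.type
    ORDER BY p.id
"""

# The same result with scalar subqueries: the correlated subquery has to be
# repeated for every column taken from core_tax.
REPEATED_SQL = """
    SELECT
        p.name,
        (SELECT t.amount FROM core_tax AS t
          WHERE t.product_country = p.country
            AND t.product_type = p.type) AS tax,
        (SELECT t.name FROM core_tax AS t
          WHERE t.product_country = p.country
            AND t.product_type = p.type) AS tax_name
    FROM core_product AS p
    ORDER BY p.id
"""

rows = conn.execute(JOINED_SQL).fetchall()
repeated_rows = conn.execute(REPEATED_SQL).fetchall()

for row in rows:
    print(row)
```

Both queries return the same rows here because each product matches at most 
one tax; with several matching taxes the join would return one row per match, 
while the scalar-subquery form would not.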

Cheers,
Alexandr.



On Sunday, 12 April 2020 19:35:55 UTC+3, Alexandr Tatarinov wrote:
>
> Hello everyone.
> So far, I've managed to implement joins via Subquery and OuterRef with 
> some caveats.
>
> To begin with, the API looks like this
>
> Product.objects.create(name='Pizza', country='USA', type='FOOD')
> Tax.objects.create(
>     name='USA Food Tax', amount=10,
>     product_country='USA', product_type='FOOD',
> )
>
> taxes = Tax.objects.filter(
>     product_type=OuterRef('type'),
>     product_country=OuterRef('country')
> )
> products = Product.objects.annotate(
>     taxes=Subquery(taxes, join=True, join_type=LOUTER)
> )
> product = products.annotate(
>     tax=F('taxes__amount'),
>     tax_name=F('taxes__name'),
> )
>
> The generated SQL is
> SELECT
>     "core_product"."id", "core_product"."name",
>     "core_product"."country", "core_product"."type",
>     "taxes"."__col3" AS "tax", "taxes"."__col4" AS "tax_name"
> FROM "core_product"
> LEFT OUTER JOIN
> (
>     SELECT
>         "core_tax"."id", "core_tax"."product_country",
>         "core_tax"."product_type", "core_tax"."name", "core_tax"."amount",
>         "core_tax"."product_country" AS "__col1",
>         "core_tax"."product_type" AS "__col2",
>         "core_tax"."amount" AS "__col3",
>         "core_tax"."name" AS "__col4"
>     FROM "core_tax"
> ) "taxes" ON (
>     "taxes"."__col1" = ("core_product"."country")
>     AND "taxes"."__col2" = ("core_product"."type")
> )
>
> We use .annotate() for joins to avoid adding new methods; .annotate() also 
> has some useful checks to avoid name collisions.
> To distinguish a plain subquery from a joined subquery, a join flag is 
> added. To specify the join type, there is an optional join_type parameter 
> to Subquery.
>
> Then, one can refer to the joined subquery's columns by the given alias, 
> but only via an F() expression.
> This means it is not possible to write .values('taxes__name') directly, 
> but the same result can be achieved via 
> .annotate(tax_name=F('taxes__name')).values('tax_name').
> Other examples are .order_by(F('taxes__name')) and 
> .values_list(F('taxes__name')); they work without .annotate() but 
> require the F() wrapper.
>
> It is still unclear to me why string field names and F expressions are 
> treated separately and differently in various QuerySet methods.
> The F expression works because the implementation relies on 
> Query.resolve_ref being called to short-circuit columns from the joined 
> subqueries.
> Maybe we will be able to fix this later, or leave it as-is with 
> documentation in place saying to always use .annotate().
>
> The implementation is ~170 lines long, but I think some corner cases are 
> yet to be discovered.
> For now, it would be great to hear your feedback on the API and the 
> feature itself, so we can move on with the ticket and discuss the 
> implementation.
>
> Thanks, Alexandr.
>
>
> On Monday, 6 April 2020 16:34:55 UTC+3, Alexandr Tatarinov wrote:
>>
>> Hello folks,
>>
>> Following the discussion 
>> https://groups.google.com/forum/#!topic/django-developers/b370mxfKCHg, I 
>> would like to suggest adding the ability to join QuerySet with a subquery 
>> clause to Django.
>> The benefits are pretty much described and discussed in the linked topic, 
>> the only one I would like to add (my use case) is the ability to select 
>> multiple columns from a Subquery without the need to repeat the subquery 
>> clause, which hurts performance.
>> As Anssi Kääriäinen mentioned, generic subquery support seems to be a 
>> better idea than CTEs, and a simple implementation as a starting point 
>> will allow moving gradually and adding RETURNING support later on (for 
>> nested creates and updates).
>>
>> I have created a working prototype and am successfully using it in one of 
>> my projects in production. After reading Daniel Miller's 
>> https://github.com/dimagi/django-cte, I've rewritten the API inspired by 
>> his ideas, while trying to keep it as Django-style as possible.
>> Here is the usage example.
>>
>> taxes = JoinedSubquery(Tax.objects.all())
>> products = Product.objects.all()
>>
>> # .join() would probably be better placed on the QuerySet class, so it
>> # would be products.join(taxes, ...)
>> products_with_taxes = taxes.join(
>>     products,
>>     join_type=LOUTER,  # Optional argument, LOUTER by default
>>     country=taxes.F('product_country'),  # Q objects are also supported
>>     type=taxes.F('product_type')
>> ).annotate(
>>     # .join() and .annotate() calls need not be chained
>>     tax_amount=taxes.F('amount')
>> )
>>
>>
>> I am not sure about named joins discussed in the CTE topic
>> qs.attach(q1=some_qs).filter(a=F('q1__b'))
>>
>> It seems to be more implicit and harder to implement, requiring changes 
>> to existing functionality and thus introducing a big patch which I would 
>> like to avoid.
>>
>> It is not possible to follow relations in join.F expressions, but this 
>> is easily achievable by first fetching the needed columns with 
>> annotate(), so I don't see the need to implement it.
>>
>> Source code is available here 
>> https://gist.github.com/tatarinov1997/126c49c4b1bb44ae6c57afbc5f43f58d , 
>> tested with Django 2.2. Feel free to copy and play around.
>>
>> So, please let me know what you think about it. I am willing to 
>> continue working on it, and any help is appreciated.
>>
>> Thanks,
>> Alexandr.
>>
>
