I will always suggest that we use the Subquery API to make CTEs. To make them 
recursive, just add a keyword argument (recursive=True) and/or use a union.

It’s been a while since I looked at CTEs, so I might be missing something.

I would hate to see us create an entirely separate API for CTEs.


From: django-developers@googlegroups.com <django-developers@googlegroups.com> 
On Behalf Of Gaga Ro
Sent: Tuesday, May 10, 2022 9:01 AM
To: Django developers (Contributions to Django itself) 
<django-developers@googlegroups.com>
Subject: Re: Adding generated common table expressions

Hello everyone,

I'm often using django-cte and I'd be thrilled to have CTE in the core.

If I'm not mistaken, the only DB currently supported by Django and not having 
CTE support is MySQL 5.7 (with an end of life in October 2023). I don't know if 
Django 4.2 will support it, but it should be dropped for Django 5.0 as it will 
be released in 2023. So we should have all supported DB supporting CTE when 
this feature would be over.

The ticket (https://code.djangoproject.com/ticket/28919) has been stalled for a 
few years now, this thread as well. I am willing to work on this but I would 
like more information first.

If I try to list all the requirements, we should have:

* A way to add one or more CTE.
* A way to reference the columns from the CTE.
* A way to join them in the main query.
* Setting a CTE as recursive?
* Choosing if a CTE is materialized or not (Not all DB support that, and I'm 
not sure if they all handle it the same way)?
* Insert / delete CTE with returning data?

Do we have a better idea now of what the API should look like?

Thanks.
Le jeudi 17 octobre 2019 à 23:43:49 UTC+2, 
buzzi....@gmail.com<mailto:buzzi....@gmail.com> a écrit :
What do you think of this syntax instead?
q1 = Book.objects.values('author_id').annotate(avg_price=Avg('price'))

q2 = Author.objects.attach('book_prices', q1, id=F('book_prices__author_id'))


def attach(name, queryset, **params):
   # Would look something like this.
   ...


Same sql output.

On Thursday, April 6, 2017 at 9:14:01 AM UTC-4, Anssi Kääriäinen wrote:
On Thursday, April 6, 2017 at 11:53:32 AM UTC+3, Marc Tamlyn wrote:
Regarding Anssi's comments about SubQuery, we do now have that in core as of 
1.11 [0]. It does look like an .attach() approach might actually have been a 
nicer version of this, but on the other hand it's currently implementable 
solely with the Expressions API. It seems like the OuterRef is very similar to 
your queryset.ref(). An even nicer approach using attach could be to say 
qs.attach(q1=some_qs).filter(a=F('q1__b'))?

Hmmh, we have one form of SubQuery, but that's actually for SELECT clause, not 
for FROM clause. I believe the same class won't work for the CTE or subquery in 
FROM clause case.

As for the attach(), seems like a really nice syntax. We do need something for 
generating the join clause for the JOIN. If you look at an example:
    q1 = Book.objects.values('author_id').annotate(avg_price=Avg('price'))
    q2 = Author.objects.attach(q1=q1)
it needs to create something like:
WITH q1 AS (
    SELECT author_id, avg(price) FROM book GROUP BY author_id
)
SELECT 
author.id<https://us-east-2.protection.sophos.com?d=author.id&u=aHR0cDovL2F1dGhvci5pZA==&i=NWVjN2YxNzUxNGEyNzMxNmMyMGRkZGU1&t=Z0JhMUVTd2l1QmxSMDlQeU9kaHNOQW03N21LM3UwTFlJTDB6aitGcE1URT0=&h=5e5d822fc76540f8bea7d6204744abc8>,
 
author.name<https://us-east-2.protection.sophos.com?d=author.name&u=aHR0cDovL2F1dGhvci5uYW1l&i=NWVjN2YxNzUxNGEyNzMxNmMyMGRkZGU1&t=WXM2NWFjb2s0ejAvN0ZWS3V2aXJDbVIrZnUzNmlhL0hnQTQyeWx2K1lEMD0=&h=5e5d822fc76540f8bea7d6204744abc8>
   FROM author
   LEFT JOIN q1 ON 
author.id<https://us-east-2.protection.sophos.com?d=author.id&u=aHR0cDovL2F1dGhvci5pZA==&i=NWVjN2YxNzUxNGEyNzMxNmMyMGRkZGU1&t=Z0JhMUVTd2l1QmxSMDlQeU9kaHNOQW03N21LM3UwTFlJTDB6aitGcE1URT0=&h=5e5d822fc76540f8bea7d6204744abc8>
 = q1.author_id;

Or, equivalently without the CTE:

SELECT 
author.id<https://us-east-2.protection.sophos.com?d=author.id&u=aHR0cDovL2F1dGhvci5pZA==&i=NWVjN2YxNzUxNGEyNzMxNmMyMGRkZGU1&t=Z0JhMUVTd2l1QmxSMDlQeU9kaHNOQW03N21LM3UwTFlJTDB6aitGcE1URT0=&h=5e5d822fc76540f8bea7d6204744abc8>,
 
author.name<https://us-east-2.protection.sophos.com?d=author.name&u=aHR0cDovL2F1dGhvci5uYW1l&i=NWVjN2YxNzUxNGEyNzMxNmMyMGRkZGU1&t=WXM2NWFjb2s0ejAvN0ZWS3V2aXJDbVIrZnUzNmlhL0hnQTQyeWx2K1lEMD0=&h=5e5d822fc76540f8bea7d6204744abc8>
   FROM author
   LEFT JOIN ( SELECT author_id, avg(price) FROM book GROUP BY author_id) ON 
author.id<https://us-east-2.protection.sophos.com?d=author.id&u=aHR0cDovL2F1dGhvci5pZA==&i=NWVjN2YxNzUxNGEyNzMxNmMyMGRkZGU1&t=Z0JhMUVTd2l1QmxSMDlQeU9kaHNOQW03N21LM3UwTFlJTDB6aitGcE1URT0=&h=5e5d822fc76540f8bea7d6204744abc8>
 = q1.author_id;

Now, the main points are:
   1. There is no need to design this to be about CTEs. That just limits the 
feature from backends that don't have CTEs without any real benefit. From 
Django's perspective the two above queries are the same.
   2. We do need something for the JOIN ON condition. In some cases Django 
could guess this, but there needs to be an explicit way to express the join 
condition.

If we allow usage of expressions from the attached queryset, but don't try to 
go for cases where model instance are created from the attached queryset, this 
will be both possible to implement without having to write a change-everything 
patch, and this will also be a really nice feature.

For recursive CTEs, I'd leave that strictly as a later step. The only thing we 
need to check right now is that we don't do something that prevents a good 
recursive CTEs implementation later on.

 - Anssi

Looking forwards to seeing a DEP!

[0] 
https://docs.djangoproject.com/en/1.11/ref/models/expressions/#subquery-expressions<https://us-east-2.protection.sophos.com?d=djangoproject.com&u=aHR0cHM6Ly9kb2NzLmRqYW5nb3Byb2plY3QuY29tL2VuLzEuMTEvcmVmL21vZGVscy9leHByZXNzaW9ucy8jc3VicXVlcnktZXhwcmVzc2lvbnM=&i=NWVjN2YxNzUxNGEyNzMxNmMyMGRkZGU1&t=MmdMampLV0lDLzhFamQyVzFmd0NWektaS3hDbUgzTkdaQ0J0Q0RmZXo5Zz0=&h=5e5d822fc76540f8bea7d6204744abc8>

On 22 March 2017 at 01:32, Ashley Waite 
<ashley....@gmail.com<mailto:ashley....@gmail.com>> wrote:
Here's the code changes I've made, noting that some of them were to shove in a 
generalised VALUES clause that mocks being a queryset, so that it plays with 
the same interface.

https://github.com/django/django/compare/master...ashleywaite:cte-dev#files_bucket<https://us-east-2.protection.sophos.com?d=github.com&u=aHR0cHM6Ly9naXRodWIuY29tL2RqYW5nby9kamFuZ28vY29tcGFyZS9tYXN0ZXIuLi5hc2hsZXl3YWl0ZTpjdGUtZGV2I2ZpbGVzX2J1Y2tldA==&i=NWVjN2YxNzUxNGEyNzMxNmMyMGRkZGU1&t=eVZVYlJFVzljMzBxbTU3Nm11MStCRFQ3eDVKR0dvT3pobFVkR1VCSHJyVT0=&h=5e5d822fc76540f8bea7d6204744abc8>

I've had a glance at cte-trees/cte-forest and once general CTEs are worked out 
expanding that to include recursive CTEs wouldn't be too difficult, and that 
would greatly simplify the implementation of cte-forest to the extent that it 
might be viable as a django data/reference type.

- Ashley


On Saturday, March 18, 2017 at 8:28:53 PM UTC+11, Josh Smeaton wrote:
Thanks for bringing this up Ashley, and for all of the detail you provided. I'd 
certainly like to see CTEs make their way into Django, provided we could come 
up with a nice enough API. From the look of it, you've already got something 
that works with an okay API so I'm hopeful.

I'd be very interested in seeing your POC too if you're able to share.

>From looking very briefly at django-cte-trees it doesn't aim to support user 
>defined CTEs for anything other than recursive queries. I'd be interested in 
>seeing, as part of a DEP, how CTE inclusion in django core could support the 
>cte-trees project from an API perspective.

On Friday, 17 March 2017 22:28:17 UTC+11, Ashley Waite wrote:

Hey all,



I'd like to suggest adding Common Table Expression (CTE) query generation as a 
feature to Django.

I've been working on a project that required manipulation of many records at 
once, and as with many ORMs found that this wasn't an ideal use-case in Django. 
As the rest of our code base and related projects are in Django, there was a 
strong preference to find a way to do it and keep to the same 
model-is-the-truth design.

I first did this by writing some hackish functions using raw querysets and 
generating my own CTE based queries, but it lacked ideal flexibility and 
maintainability. So I've now written some modifications into my Django to do 
this in a more Django-esque way and think that this functionality would be 
beneficial within the project itself, but am unsure exactly where to start the 
conversation about that.



Why generate CTE based queries from querysets?

By allowing querysets to be attached to each other, and setting appropriate 
WHERE clauses, arbitrary and nested SQL queries can be generated. Where the 
results of the queries are only necessary for the execution of following 
queries this saves a very substantial amount of time and database work. Once 
these features exist, other functionality can also transparently use these to 
generate more efficient queries (such as large IN clauses).

This allows several powerful use cases I think Django would benefit from:



Large 'IN' clauses, can be implemented as CTEs reducing expensive lookups to a 
single CTE INNER JOIN. For sets of thousands to match from tables of millions 
of records this can be a very substantial gain.



Composite 'IN' conditions, where multiple fields must match and you're matching 
against a large set of condition rows. In my usage this was "where the md5/sha 
hashes match one of the million md5/sha tuples in my match set". This is simply 
a CTE JOIN with two clauses in the WHERE.



Nested data creation, where the parent doesn't yet exist. Django doesn't 
currently do this as the primary keys are needed, and this makes normalised 
data structures unappealing. Using INSERTs as CTEs that supply those keys to 
following statements means that entire nested data structures of new 
information can be recreated in the database at once, efficiently and 
atomically.



Non-uniform UPDATEs, such that a modified set of objects can all be updated 
with different data at the same time by utilising a CTE values statement JOINed 
to the UPDATE statement. As there's currently no way to do this kind of bulk 
update the alternative is to update each instance individually, and this 
doesn't scale well.

These could also be used with aggregations and other calculated fields to 
create complex queries that aren't possible at the moment.



What my PoC looks like

With another mildly hackish PoC that creates a VALUEs set from a 
dict/namedtuple which can be used to provide large input data, my present 
modified version syntax looks a bit like this (not perfect queries):

class Hashes(models.Model):
    md5 = models.UUIDField(verbose_name="MD5 hash (base16)", db_index=True)
    sha2 = models.CharField(max_length=44, null=True, verbose_name="SHA256 hash 
(base64)")

# Mock QuerySet of values

q_mo = Hashes.as_literal(input_hashes).values("md5", "sha2")

# A big IN query
q_in = Hashes.objects.attach(q_mo).filter(md5=q_mo.ref("md5"))

# Matched existing values with composite 'IN' (where md5 and sha2 match, or md5 
matches and existing record lacks sha2)

q_ex = 
Hashes.objects.attach(q_mo).filter(md5=q_mo.ref("md5")).filter(Q(sha160=q_mo.ref("sha160"))
 | Q(sha160=None))



# Create new records that don't exist

q_cr = Hashes.objects.attach(q_mo, 
q_ex).filter(md5=q_mo.ref("md5")).exclude(md5=q_ex.ref("md5")).values("md5", 
"sha2").as_insert()
Returning the newly created records.

SQL can be generated that looks something like this:

WITH cte_1_0 (md5, sha2) AS (

      VALUES ('00002d30243bfe9d06673765c432c2bd'::uuid, 
'fsA8okuCuq9KybxqcAzNdjlIyAx1QJjTPdf1ZFK/hDI='::varchar(44)),
      ('0000f20a46e4e60338697948a0917423', 
'6bVZgpYZtit1E32BlANWXoKnFFFDNierDSIi0SraND4=')),

cte_1 AS (

      SELECT "hashes"."id", "hashes"."md5", "hashes"."sha2"

      FROM "hashes" , "cte_1_0"

      WHERE ("hashes"."md5" = (cte_1_0.md5) AND ("hashes"."sha2" = 
(cte_1_0.sha2) OR "hashes"."sha2" IS NULL) ))

SELECT "hashes"."md5"

FROM "hashes" , "cte_1_0" , "cte_1"

WHERE ("hashes"."md5" = (cte_1_0.md5) AND NOT ("hashes"."md5" = (cte_1.md5)))
That is:

  *   A qs.as_insert() and qs.as_update() on queryset to create *lazy* insert 
and update queries.
  *   A qs.attach() that allows querysets to be attached to other querysets, 
and will generate them as CTE statements.
  *   A qs.ref() that returns an expression that when the query is compiled 
will be a field reference on the CTE that represents that queryset.
  *   Additional compilers on the QuerySet subclasses that these return (so no 
changes to base compilers meaning no functionality impact to existing usage)
  *   Generation of WITH clauses for attached querysets, and RETURN clauses for 
lazy UPDATE and INSERT querysets with fields requested (via values() in this 
case)

As these can be attached to querysets that are attached to querysets, that 
are... etc, many statements can be chained allowing substantial changes to be 
performed without needing Django to have to receive, process, and resend at 
every step.

I've had a read through the enhancement proposal docs etc, and I'm willing to 
do what's needed to make this functionality solid, and put forth a proposal to 
add it. But am first seeking feedback on it, and whether this is a feature that 
will be considered.


Thanks,
- Ashley
--
You received this message because you are subscribed to the Google Groups 
"Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to 
django-develop...@googlegroups.com<mailto:django-develop...@googlegroups.com>.
To post to this group, send email to 
django-d...@googlegroups.com<mailto:django-d...@googlegroups.com>.
Visit this group at 
https://groups.google.com/group/django-developers<https://us-east-2.protection.sophos.com?d=google.com&u=aHR0cHM6Ly9ncm91cHMuZ29vZ2xlLmNvbS9ncm91cC9kamFuZ28tZGV2ZWxvcGVycw==&i=NWVjN2YxNzUxNGEyNzMxNmMyMGRkZGU1&t=TkgrS3ZLMHNmcEljSG5vaUwwZDdRSVZML1pMN3RGS3gzUlU5WDBsMURFQT0=&h=5e5d822fc76540f8bea7d6204744abc8>.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/9af2d3dc-81a0-4d76-a3a0-cdef850158ee%40googlegroups.com<https://us-east-2.protection.sophos.com?d=google.com&u=aHR0cHM6Ly9ncm91cHMuZ29vZ2xlLmNvbS9kL21zZ2lkL2RqYW5nby1kZXZlbG9wZXJzLzlhZjJkM2RjLTgxYTAtNGQ3Ni1hM2EwLWNkZWY4NTAxNThlZSU0MGdvb2dsZWdyb3Vwcy5jb20_dXRtX21lZGl1bT1lbWFpbCZ1dG1fc291cmNlPWZvb3Rlcg==&i=NWVjN2YxNzUxNGEyNzMxNmMyMGRkZGU1&t=UnkrcHRwb2lieFRxNjR2anl0akxnTGljVmc0L3Uxb1BRNnB3WWFrV2RUaz0=&h=5e5d822fc76540f8bea7d6204744abc8>.

For more options, visit 
https://groups.google.com/d/optout<https://us-east-2.protection.sophos.com?d=google.com&u=aHR0cHM6Ly9ncm91cHMuZ29vZ2xlLmNvbS9kL29wdG91dA==&i=NWVjN2YxNzUxNGEyNzMxNmMyMGRkZGU1&t=QzduVHdqNExHaUNiem9wUHNWQTlQN1RXb1VySC9ubVFZRHN0dDBLN3RFbz0=&h=5e5d822fc76540f8bea7d6204744abc8>.

--
You received this message because you are subscribed to the Google Groups 
"Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to 
django-developers+unsubscr...@googlegroups.com<mailto:django-developers+unsubscr...@googlegroups.com>.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/c559fd08-447f-4953-9940-b2090f2148e5n%40googlegroups.com<https://us-east-2.protection.sophos.com?d=google.com&u=aHR0cHM6Ly9ncm91cHMuZ29vZ2xlLmNvbS9kL21zZ2lkL2RqYW5nby1kZXZlbG9wZXJzL2M1NTlmZDA4LTQ0N2YtNDk1My05OTQwLWIyMDkwZjIxNDhlNW4lNDBnb29nbGVncm91cHMuY29tP3V0bV9tZWRpdW09ZW1haWwmdXRtX3NvdXJjZT1mb290ZXI=&i=NWVjN2YxNzUxNGEyNzMxNmMyMGRkZGU1&t=WlhxTFdUTmpoelQyUElsS1Q4UG9qUThhN3hBakRHYXo5L21DTjVWVEkvUT0=&h=5e5d822fc76540f8bea7d6204744abc8>.

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/7b017a9731e949bda0904d63868fba52%40Exchange.ISS.LOCAL.

Reply via email to