Re: Adding generated common table expressions

2022-05-10 Thread Gaga Ro
As far as I know, CTE and subquery are equivalent when used only in a 
single place in the query.

CTE should be better when a single query is used several times.

If we want to reuse the Subquery API, we should find a way to be able to 
reuse a subquery (hash its content and use that as a key to detect the 
duplication?).

Le mardi 10 mai 2022 à 16:08:06 UTC+2, matthew.pava a écrit :

> I will always suggest that we use the Subquery API to make CTEs. To make 
> them recursive, just add a keyword argument (recursive=True) and/or use a 
> union.
>
>  
>
> It’s been a while since I looked at CTEs, so I might be missing something.
>
>  
>
> I would hate to see us create an entirely separate API for CTEs.
>
>  
>
>  
>
> *From:* django-d...@googlegroups.com  *On 
> Behalf Of *Gaga Ro
> *Sent:* Tuesday, May 10, 2022 9:01 AM
> *To:* Django developers (Contributions to Django itself) <
> django-d...@googlegroups.com>
> *Subject:* Re: Adding generated common table expressions
>
>  
>
> Hello everyone,
>
>  
>
> I'm often using django-cte and I'd be thrilled to have CTE in the core.
>
>  
>
> If I'm not mistaken, the only DB currently supported by Django and not 
> having CTE support is MySQL 5.7 (with an end of life in October 2023). I 
> don't know if Django 4.2 will support it, but it should be dropped for 
> Django 5.0 as it will be released in 2023. So we should have all supported 
> DB supporting CTE when this feature would be over.
>
>  
>
> The ticket (https://code.djangoproject.com/ticket/28919) has been stalled 
> for a few years now, this thread as well. I am willing to work on this but 
> I would like more information first.
>
>  
>
> If I try to list all the requirements, we should have:
>
>  
>
> * A way to add one or more CTE.
>
> * A way to reference the columns from the CTE.
>
> * A way to join them in the main query.
>
> * Setting a CTE as recursive?
>
> * Choosing if a CTE is materialized or not (Not all DB support that, and 
> I'm not sure if they all handle it the same way)?
>
> * Insert / delete CTE with returning data?
>
>  
>
> Do we have a better idea now of what the API should look like?
>
>  
>
> Thanks.
>
> Le jeudi 17 octobre 2019 à 23:43:49 UTC+2, buzzi@gmail.com a écrit :
>
> What do you think of this syntax instead?
>
> q1 = Book.objects.values('author_id').annotate(avg_price=Avg('price'))
>
>
> q2 = Author.objects.attach('book_prices', q1, id=F(
> 'book_prices__author_id'))
>
>
> def attach(name, queryset, **params):
># Would look something like this.
>...
>
>  
>
>  
>
> Same sql output.
>
>
> On Thursday, April 6, 2017 at 9:14:01 AM UTC-4, Anssi Kääriäinen wrote:
>
> On Thursday, April 6, 2017 at 11:53:32 AM UTC+3, Marc Tamlyn wrote:
>
> Regarding Anssi's comments about SubQuery, we do now have that in core as 
> of 1.11 [0]. It does look like an .attach() approach might actually have 
> been a nicer version of this, but on the other hand it's currently 
> implementable solely with the Expressions API. It seems like the OuterRef 
> is very similar to your queryset.ref(). An even nicer approach using attach 
> could be to say qs.attach(q1=some_qs).filter(a=F('q1__b'))?
>
>  
>
> Hmmh, we have one form of SubQuery, but that's actually for SELECT clause, 
> not for FROM clause. I believe the same class won't work for the CTE or 
> subquery in FROM clause case.
>
>  
>
> As for the attach(), seems like a really nice syntax. We do need something 
> for generating the join clause for the JOIN. If you look at an example:
>
> q1 = Book.objects.values('author_id').annotate(avg_price=Avg('price'))
>
> q2 = Author.objects.attach(q1=q1)
>
> it needs to create something like:
>
> WITH q1 AS (
>
> SELECT author_id, avg(price) FROM book GROUP BY author_id
>
> )
>
> SELECT author.id 
> <https://us-east-2.protection.sophos.com?d=author.id=aHR0cDovL2F1dGhvci5pZA===NWVjN2YxNzUxNGEyNzMxNmMyMGRkZGU1=Z0JhMUVTd2l1QmxSMDlQeU9kaHNOQW03N21LM3UwTFlJTDB6aitGcE1URT0==5e5d822fc76540f8bea7d6204744abc8>,
>  
> author.name 
> <https://us-east-2.protection.sophos.com?d=author.name=aHR0cDovL2F1dGhvci5uYW1l=NWVjN2YxNzUxNGEyNzMxNmMyMGRkZGU1=WXM2NWFjb2s0ejAvN0ZWS3V2aXJDbVIrZnUzNmlhL0hnQTQyeWx2K1lEMD0==5e5d822fc76540f8bea7d6204744abc8>
>
>FROM author
>
>LEFT JOIN q1 ON author.id 
> <https://us-east-2.protection.sophos.com?d=author.id=aHR0cDovL2F1dGhvci5pZA===NWVjN2YxNzUxNGEyNzMxNmMyMGRkZGU1=Z0JhMUVTd2l1QmxSMDlQeU9kaHNOQW03N21LM3UwTFlJTDB6aitGcE1URT0==5e5d822fc76540f8bea7d6204744abc8>
>  
> = q1.author_id;
>
>  
>
> Or, 

RE: Adding generated common table expressions

2022-05-10 Thread Matthew Pava
I will always suggest that we use the Subquery API to make CTEs. To make them 
recursive, just add a keyword argument (recursive=True) and/or use a union.

It’s been a while since I looked at CTEs, so I might be missing something.

I would hate to see us create an entirely separate API for CTEs.


From: django-developers@googlegroups.com  
On Behalf Of Gaga Ro
Sent: Tuesday, May 10, 2022 9:01 AM
To: Django developers (Contributions to Django itself) 

Subject: Re: Adding generated common table expressions

Hello everyone,

I'm often using django-cte and I'd be thrilled to have CTE in the core.

If I'm not mistaken, the only DB currently supported by Django and not having 
CTE support is MySQL 5.7 (with an end of life in October 2023). I don't know if 
Django 4.2 will support it, but it should be dropped for Django 5.0 as it will 
be released in 2023. So we should have all supported DB supporting CTE when 
this feature would be over.

The ticket (https://code.djangoproject.com/ticket/28919) has been stalled for a 
few years now, this thread as well. I am willing to work on this but I would 
like more information first.

If I try to list all the requirements, we should have:

* A way to add one or more CTE.
* A way to reference the columns from the CTE.
* A way to join them in the main query.
* Setting a CTE as recursive?
* Choosing if a CTE is materialized or not (Not all DB support that, and I'm 
not sure if they all handle it the same way)?
* Insert / delete CTE with returning data?

Do we have a better idea now of what the API should look like?

Thanks.
Le jeudi 17 octobre 2019 à 23:43:49 UTC+2, 
buzzi@gmail.com<mailto:buzzi@gmail.com> a écrit :
What do you think of this syntax instead?
q1 = Book.objects.values('author_id').annotate(avg_price=Avg('price'))

q2 = Author.objects.attach('book_prices', q1, id=F('book_prices__author_id'))


def attach(name, queryset, **params):
   # Would look something like this.
   ...


Same sql output.

On Thursday, April 6, 2017 at 9:14:01 AM UTC-4, Anssi Kääriäinen wrote:
On Thursday, April 6, 2017 at 11:53:32 AM UTC+3, Marc Tamlyn wrote:
Regarding Anssi's comments about SubQuery, we do now have that in core as of 
1.11 [0]. It does look like an .attach() approach might actually have been a 
nicer version of this, but on the other hand it's currently implementable 
solely with the Expressions API. It seems like the OuterRef is very similar to 
your queryset.ref(). An even nicer approach using attach could be to say 
qs.attach(q1=some_qs).filter(a=F('q1__b'))?

Hmmh, we have one form of SubQuery, but that's actually for SELECT clause, not 
for FROM clause. I believe the same class won't work for the CTE or subquery in 
FROM clause case.

As for the attach(), seems like a really nice syntax. We do need something for 
generating the join clause for the JOIN. If you look at an example:
q1 = Book.objects.values('author_id').annotate(avg_price=Avg('price'))
q2 = Author.objects.attach(q1=q1)
it needs to create something like:
WITH q1 AS (
SELECT author_id, avg(price) FROM book GROUP BY author_id
)
SELECT 
author.id<https://us-east-2.protection.sophos.com?d=author.id=aHR0cDovL2F1dGhvci5pZA===NWVjN2YxNzUxNGEyNzMxNmMyMGRkZGU1=Z0JhMUVTd2l1QmxSMDlQeU9kaHNOQW03N21LM3UwTFlJTDB6aitGcE1URT0==5e5d822fc76540f8bea7d6204744abc8>,
 
author.name<https://us-east-2.protection.sophos.com?d=author.name=aHR0cDovL2F1dGhvci5uYW1l=NWVjN2YxNzUxNGEyNzMxNmMyMGRkZGU1=WXM2NWFjb2s0ejAvN0ZWS3V2aXJDbVIrZnUzNmlhL0hnQTQyeWx2K1lEMD0==5e5d822fc76540f8bea7d6204744abc8>
   FROM author
   LEFT JOIN q1 ON 
author.id<https://us-east-2.protection.sophos.com?d=author.id=aHR0cDovL2F1dGhvci5pZA===NWVjN2YxNzUxNGEyNzMxNmMyMGRkZGU1=Z0JhMUVTd2l1QmxSMDlQeU9kaHNOQW03N21LM3UwTFlJTDB6aitGcE1URT0==5e5d822fc76540f8bea7d6204744abc8>
 = q1.author_id;

Or, equivalently without the CTE:

SELECT 
author.id<https://us-east-2.protection.sophos.com?d=author.id=aHR0cDovL2F1dGhvci5pZA===NWVjN2YxNzUxNGEyNzMxNmMyMGRkZGU1=Z0JhMUVTd2l1QmxSMDlQeU9kaHNOQW03N21LM3UwTFlJTDB6aitGcE1URT0==5e5d822fc76540f8bea7d6204744abc8>,
 
author.name<https://us-east-2.protection.sophos.com?d=author.name=aHR0cDovL2F1dGhvci5uYW1l=NWVjN2YxNzUxNGEyNzMxNmMyMGRkZGU1=WXM2NWFjb2s0ejAvN0ZWS3V2aXJDbVIrZnUzNmlhL0hnQTQyeWx2K1lEMD0==5e5d822fc76540f8bea7d6204744abc8>
   FROM author
   LEFT JOIN ( SELECT author_id, avg(price) FROM book GROUP BY author_id) ON 
author.id<https://us-east-2.protection.sophos.com?d=author.id=aHR0cDovL2F1dGhvci5pZA===NWVjN2YxNzUxNGEyNzMxNmMyMGRkZGU1=Z0JhMUVTd2l1QmxSMDlQeU9kaHNOQW03N21LM3UwTFlJTDB6aitGcE1URT0==5e5d822fc76540f8bea7d6204744abc8>
 = q1.author_id;

Now, the main points are:
   1. There is no need to design this to be about CTEs. That just limits the 
feature from backends that don't have CTEs without any real benefit. From 
Django's perspective the two above queries are the same.
   2. We do need something for the JOIN ON condition. In some cases Django 
could guess t

Re: Adding generated common table expressions

2022-05-10 Thread Gaga Ro
Hello everyone,

I'm often using django-cte and I'd be thrilled to have CTE in the core.

If I'm not mistaken, the only DB currently supported by Django and not 
having CTE support is MySQL 5.7 (with an end of life in October 2023). I 
don't know if Django 4.2 will support it, but it should be dropped for 
Django 5.0 as it will be released in 2023. So we should have all supported 
DB supporting CTE when this feature would be over.

The ticket (https://code.djangoproject.com/ticket/28919) has been stalled 
for a few years now, this thread as well. I am willing to work on this but 
I would like more information first.

If I try to list all the requirements, we should have:

* A way to add one or more CTE.
* A way to reference the columns from the CTE.
* A way to join them in the main query.
* Setting a CTE as recursive?
* Choosing if a CTE is materialized or not (Not all DB support that, and 
I'm not sure if they all handle it the same way)?
* Insert / delete CTE with returning data?

Do we have a better idea now of what the API should look like?

Thanks.
Le jeudi 17 octobre 2019 à 23:43:49 UTC+2, buzzi@gmail.com a écrit :

> What do you think of this syntax instead?
>
> q1 = Book.objects.values('author_id').annotate(avg_price=Avg('price'))
>
> q2 = Author.objects.attach('book_prices', q1, id=F(
> 'book_prices__author_id'))
>
>
> def attach(name, queryset, **params):
># Would look something like this.
>...
>
>
> Same sql output.
>
> On Thursday, April 6, 2017 at 9:14:01 AM UTC-4, Anssi Kääriäinen wrote:
>>
>> On Thursday, April 6, 2017 at 11:53:32 AM UTC+3, Marc Tamlyn wrote:
>>>
>>> Regarding Anssi's comments about SubQuery, we do now have that in core 
>>> as of 1.11 [0]. It does look like an .attach() approach might actually have 
>>> been a nicer version of this, but on the other hand it's currently 
>>> implementable solely with the Expressions API. It seems like the OuterRef 
>>> is very similar to your queryset.ref(). An even nicer approach using attach 
>>> could be to say qs.attach(q1=some_qs).filter(a=F('q1__b'))?
>>>
>>
>> Hmmh, we have one form of SubQuery, but that's actually for SELECT 
>> clause, not for FROM clause. I believe the same class won't work for the 
>> CTE or subquery in FROM clause case.
>>
>> As for the attach(), seems like a really nice syntax. We do need 
>> something for generating the join clause for the JOIN. If you look at an 
>> example:
>> q1 = Book.objects.values('author_id').annotate(avg_price=Avg('price'))
>> q2 = Author.objects.attach(q1=q1)
>> it needs to create something like:
>> WITH q1 AS (
>> SELECT author_id, avg(price) FROM book GROUP BY author_id
>> )
>> SELECT author.id, author.name
>>FROM author
>>LEFT JOIN q1 ON author.id = q1.author_id;
>>
>> Or, equivalently without the CTE:
>>
>> SELECT author.id, author.name
>>FROM author
>>LEFT JOIN ( SELECT author_id, avg(price) FROM book GROUP BY author_id) 
>> ON author.id = q1.author_id;
>>
>> Now, the main points are:
>>1. There is no need to design this to be about CTEs. That just limits 
>> the feature from backends that don't have CTEs without any real benefit. 
>> From Django's perspective the two above queries are the same.
>>2. We do need something for the JOIN ON condition. In some cases 
>> Django could guess this, but there needs to be an explicit way to express 
>> the join condition.
>>
>> If we allow usage of expressions from the attached queryset, but don't 
>> try to go for cases where model instance are created from the attached 
>> queryset, this will be both possible to implement without having to write a 
>> change-everything patch, and this will also be a really nice feature.
>>
>> For recursive CTEs, I'd leave that strictly as a later step. The only 
>> thing we need to check right now is that we don't do something that 
>> prevents a good recursive CTEs implementation later on.
>>
>>  - Anssi
>>
>>>
>>> Looking forwards to seeing a DEP!
>>>
>>> [0] 
>>> https://docs.djangoproject.com/en/1.11/ref/models/expressions/#subquery-expressions
>>>
>>> On 22 March 2017 at 01:32, Ashley Waite  wrote:
>>>
 Here's the code changes I've made, noting that some of them were to 
 shove in a generalised VALUES clause that mocks being a queryset, so that 
 it plays with the same interface.


 https://github.com/django/django/compare/master...ashleywaite:cte-dev#files_bucket

 I've had a glance at cte-trees/cte-forest and once general CTEs are 
 worked out expanding that to include recursive CTEs wouldn't be too 
 difficult, and that would greatly simplify the implementation of 
 cte-forest 
 to the extent that it might be viable as a django data/reference type.

 - Ashley


 On Saturday, March 18, 2017 at 8:28:53 PM UTC+11, Josh Smeaton wrote:
>
> Thanks for bringing this up Ashley, and for all of the detail you 
> provided. I'd certainly like to see CTEs make their way into Django, 

Re: Adding generated common table expressions

2019-10-17 Thread Javier Buzzi
What do you think of this syntax instead?

q1 = Book.objects.values('author_id').annotate(avg_price=Avg('price'))
q2 = Author.objects.attach('book_prices', q1, id=F('book_prices__author_id'
))


def attach(name, queryset, **params):
   # Would look something like this.
   ...


Same sql output.

On Thursday, April 6, 2017 at 9:14:01 AM UTC-4, Anssi Kääriäinen wrote:
>
> On Thursday, April 6, 2017 at 11:53:32 AM UTC+3, Marc Tamlyn wrote:
>>
>> Regarding Anssi's comments about SubQuery, we do now have that in core as 
>> of 1.11 [0]. It does look like an .attach() approach might actually have 
>> been a nicer version of this, but on the other hand it's currently 
>> implementable solely with the Expressions API. It seems like the OuterRef 
>> is very similar to your queryset.ref(). An even nicer approach using attach 
>> could be to say qs.attach(q1=some_qs).filter(a=F('q1__b'))?
>>
>
> Hmmh, we have one form of SubQuery, but that's actually for SELECT clause, 
> not for FROM clause. I believe the same class won't work for the CTE or 
> subquery in FROM clause case.
>
> As for the attach(), seems like a really nice syntax. We do need something 
> for generating the join clause for the JOIN. If you look at an example:
> q1 = Book.objects.values('author_id').annotate(avg_price=Avg('price'))
> q2 = Author.objects.attach(q1=q1)
> it needs to create something like:
> WITH q1 AS (
> SELECT author_id, avg(price) FROM book GROUP BY author_id
> )
> SELECT author.id, author.name
>FROM author
>LEFT JOIN q1 ON author.id = q1.author_id;
>
> Or, equivalently without the CTE:
>
> SELECT author.id, author.name
>FROM author
>LEFT JOIN ( SELECT author_id, avg(price) FROM book GROUP BY author_id) 
> ON author.id = q1.author_id;
>
> Now, the main points are:
>1. There is no need to design this to be about CTEs. That just limits 
> the feature from backends that don't have CTEs without any real benefit. 
> From Django's perspective the two above queries are the same.
>2. We do need something for the JOIN ON condition. In some cases Django 
> could guess this, but there needs to be an explicit way to express the join 
> condition.
>
> If we allow usage of expressions from the attached queryset, but don't try 
> to go for cases where model instance are created from the attached 
> queryset, this will be both possible to implement without having to write a 
> change-everything patch, and this will also be a really nice feature.
>
> For recursive CTEs, I'd leave that strictly as a later step. The only 
> thing we need to check right now is that we don't do something that 
> prevents a good recursive CTEs implementation later on.
>
>  - Anssi
>
>>
>> Looking forwards to seeing a DEP!
>>
>> [0] 
>> https://docs.djangoproject.com/en/1.11/ref/models/expressions/#subquery-expressions
>>
>> On 22 March 2017 at 01:32, Ashley Waite  wrote:
>>
>>> Here's the code changes I've made, noting that some of them were to 
>>> shove in a generalised VALUES clause that mocks being a queryset, so that 
>>> it plays with the same interface.
>>>
>>>
>>> https://github.com/django/django/compare/master...ashleywaite:cte-dev#files_bucket
>>>
>>> I've had a glance at cte-trees/cte-forest and once general CTEs are 
>>> worked out expanding that to include recursive CTEs wouldn't be too 
>>> difficult, and that would greatly simplify the implementation of cte-forest 
>>> to the extent that it might be viable as a django data/reference type.
>>>
>>> - Ashley
>>>
>>>
>>> On Saturday, March 18, 2017 at 8:28:53 PM UTC+11, Josh Smeaton wrote:

 Thanks for bringing this up Ashley, and for all of the detail you 
 provided. I'd certainly like to see CTEs make their way into Django, 
 provided we could come up with a nice enough API. From the look of it, 
 you've already got something that works with an okay API so I'm hopeful.

 I'd be very interested in seeing your POC too if you're able to share.

 From looking very briefly at django-cte-trees it doesn't aim to support 
 user defined CTEs for anything other than recursive queries. I'd be 
 interested in seeing, as part of a DEP, how CTE inclusion in django core 
 could support the cte-trees project from an API perspective.

 On Friday, 17 March 2017 22:28:17 UTC+11, Ashley Waite wrote:
>
> Hey all,
>
>
> I'd like to suggest adding Common Table Expression (CTE) query 
> generation as a feature to Django.
>
> I've been working on a project that required manipulation of many 
> records at once, and as with many ORMs found that this wasn't an ideal 
> use-case in Django. As the rest of our code base and related projects are 
> in Django, there was a strong preference to find a way to do it and keep 
> to 
> the same model-is-the-truth design.
>
> I first did this by writing some hackish functions using raw querysets 
> and generating my own CTE based queries, but 

Re: Adding generated common table expressions

2019-07-22 Thread Jacob Rief
For my better understanding, could you please elaborate on what you mean by 
Hashes.as_literal(input_hashes) ?

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/48354bd1-5bd4-4b3a-b23e-376389ab2767%40googlegroups.com.


Re: Adding generated common table expressions

2017-11-29 Thread matthew.pava
I would like to know what the status of this is (adding CTE functionality 
to Django).  It is supported by all current databases that Django supports.

On Friday, March 17, 2017 at 6:49:39 PM UTC-5, Tim Graham wrote:
>
> Hi, I don't know anything about CTE, but did you see this third-party 
> package? https://github.com/petrounias/django-cte-trees -- It seems to be 
> PostgreSQL only.
>
> I was going to write, "Considering that not all databases support CTE 
> (MySQL doesn't), a third-party app might be the way to go rather than 
> having it built-in to Django." -- however, then I noticed that supported is 
> added in MySQL 8 [0], so maybe any database differences could be abstracted 
> away with an appropriate API.
>
> I'll leave it to others to comment on the technical details of your 
> proposal. Meanwhile, if you have a chance to contribute smaller patches to 
> Django, I think it's helpful to start with smaller patches before tackling 
> something larger like this.
>
> [0] 
> http://mysqlserverteam.com/mysql-8-0-labs-recursive-common-table-expressions-in-mysql-ctes/
>
> On Friday, March 17, 2017 at 7:28:17 AM UTC-4, Ashley Waite wrote:
>>
>> Hey all,
>>
>>
>> I'd like to suggest adding Common Table Expression (CTE) query generation 
>> as a feature to Django.
>>
>> I've been working on a project that required manipulation of many records 
>> at once, and as with many ORMs found that this wasn't an ideal use-case in 
>> Django. As the rest of our code base and related projects are in Django, 
>> there was a strong preference to find a way to do it and keep to the same 
>> model-is-the-truth design.
>>
>> I first did this by writing some hackish functions using raw querysets 
>> and generating my own CTE based queries, but it lacked ideal flexibility 
>> and maintainability. So I've now written some modifications into my Django 
>> to do this in a more Django-esque way and think that this functionality 
>> would be beneficial within the project itself, but am unsure exactly where 
>> to start the conversation about that.
>>
>>
>> *Why generate CTE based queries from querysets?*
>>
>> By allowing querysets to be attached to each other, and setting 
>> appropriate WHERE clauses, arbitrary and nested SQL queries can be 
>> generated. Where the results of the queries are only necessary for the 
>> execution of following queries this saves a very substantial amount of time 
>> and database work. Once these features exist, other functionality can also 
>> transparently use these to generate more efficient queries (such as large 
>> IN clauses).
>>
>> This allows several powerful use cases I think Django would benefit from:
>>
>>
>> *Large 'IN' clauses*, can be implemented as CTEs reducing expensive 
>> lookups to a single CTE INNER JOIN. For sets of thousands to match from 
>> tables of millions of records this can be a very substantial gain.
>>
>>
>> *Composite 'IN' conditions,* where multiple fields must match and you're 
>> matching against a large set of condition rows. In my usage this was "where 
>> the md5/sha hashes match one of the million md5/sha tuples in my match 
>> set". This is simply a CTE JOIN with two clauses in the WHERE.
>>
>>
>> *Nested data creation*, where the parent doesn't yet exist. Django 
>> doesn't currently do this as the primary keys are needed, and this makes 
>> normalised data structures unappealing. Using INSERTs as CTEs that supply 
>> those keys to following statements means that entire nested data structures 
>> of new information can be recreated in the database at once, efficiently 
>> and atomically.
>>
>>
>> *Non-uniform UPDATE*s, such that a modified set of objects can all be 
>> updated with different data at the same time by utilising a CTE values 
>> statement JOINed to the UPDATE statement. As there's currently no way to do 
>> this kind of bulk update the alternative is to update each instance 
>> individually, and this doesn't scale well.
>>
>> These could also be used with aggregations and other calculated fields to 
>> create complex queries that aren't possible at the moment.
>>
>>
>> *What my PoC looks like*
>>
>> With another mildly hackish PoC that creates a VALUEs set from a 
>> dict/namedtuple which can be used to provide large input data, my present 
>> modified version syntax looks a bit like this (not perfect queries):
>>
>> class Hashes(models.Model):
>> md5 = models.UUIDField(verbose_name="MD5 hash (base16)", db_index=True)
>> sha2 = models.CharField(max_length=44, null=True, verbose_name="SHA256 
>> hash (base64)")
>>
>> # Mock QuerySet of values
>> q_mo = Hashes.as_literal(input_hashes).values("md5", "sha2")
>> # A big IN query
>> q_in = Hashes.objects.attach(q_mo).filter(md5=q_mo.ref("md5"))
>>
>> # Matched existing values with composite 'IN' (where md5 and sha2 match, or 
>> md5 matches and existing record lacks sha2)
>> q_ex = 
>> Hashes.objects.attach(q_mo).filter(md5=q_mo.ref("md5")).filter(Q(sha160=q_mo.ref("sha160"))
>>  | 

Re: Adding generated common table expressions

2017-04-06 Thread Anssi Kääriäinen
On Thursday, April 6, 2017 at 11:53:32 AM UTC+3, Marc Tamlyn wrote:
>
> Regarding Anssi's comments about SubQuery, we do now have that in core as 
> of 1.11 [0]. It does look like an .attach() approach might actually have 
> been a nicer version of this, but on the other hand it's currently 
> implementable solely with the Expressions API. It seems like the OuterRef 
> is very similar to your queryset.ref(). An even nicer approach using attach 
> could be to say qs.attach(q1=some_qs).filter(a=F('q1__b'))?
>

Hmmh, we have one form of SubQuery, but that's actually for SELECT clause, 
not for FROM clause. I believe the same class won't work for the CTE or 
subquery in FROM clause case.

As for the attach(), seems like a really nice syntax. We do need something 
for generating the join clause for the JOIN. If you look at an example:
q1 = Book.objects.values('author_id').annotate(avg_price=Avg('price'))
q2 = Author.objects.attach(q1=q1)
it needs to create something like:
WITH q1 AS (
SELECT author_id, avg(price) FROM book GROUP BY author_id
)
SELECT author.id, author.name
   FROM author
   LEFT JOIN q1 ON author.id = q1.author_id;

Or, equivalently without the CTE:

SELECT author.id, author.name
   FROM author
   LEFT JOIN ( SELECT author_id, avg(price) FROM book GROUP BY author_id) 
ON author.id = q1.author_id;

Now, the main points are:
   1. There is no need to design this to be about CTEs. That just limits 
the feature from backends that don't have CTEs without any real benefit. 
>From Django's perspective the two above queries are the same.
   2. We do need something for the JOIN ON condition. In some cases Django 
could guess this, but there needs to be an explicit way to express the join 
condition.

If we allow usage of expressions from the attached queryset, but don't try 
to go for cases where model instance are created from the attached 
queryset, this will be both possible to implement without having to write a 
change-everything patch, and this will also be a really nice feature.

For recursive CTEs, I'd leave that strictly as a later step. The only thing 
we need to check right now is that we don't do something that prevents a 
good recursive CTEs implementation later on.

 - Anssi

>
> Looking forwards to seeing a DEP!
>
> [0] 
> https://docs.djangoproject.com/en/1.11/ref/models/expressions/#subquery-expressions
>
> On 22 March 2017 at 01:32, Ashley Waite  > wrote:
>
>> Here's the code changes I've made, noting that some of them were to shove 
>> in a generalised VALUES clause that mocks being a queryset, so that it 
>> plays with the same interface.
>>
>>
>> https://github.com/django/django/compare/master...ashleywaite:cte-dev#files_bucket
>>
>> I've had a glance at cte-trees/cte-forest and once general CTEs are 
>> worked out expanding that to include recursive CTEs wouldn't be too 
>> difficult, and that would greatly simplify the implementation of cte-forest 
>> to the extent that it might be viable as a django data/reference type.
>>
>> - Ashley
>>
>>
>> On Saturday, March 18, 2017 at 8:28:53 PM UTC+11, Josh Smeaton wrote:
>>>
>>> Thanks for bringing this up Ashley, and for all of the detail you 
>>> provided. I'd certainly like to see CTEs make their way into Django, 
>>> provided we could come up with a nice enough API. From the look of it, 
>>> you've already got something that works with an okay API so I'm hopeful.
>>>
>>> I'd be very interested in seeing your POC too if you're able to share.
>>>
>>> From looking very briefly at django-cte-trees it doesn't aim to support 
>>> user defined CTEs for anything other than recursive queries. I'd be 
>>> interested in seeing, as part of a DEP, how CTE inclusion in django core 
>>> could support the cte-trees project from an API perspective.
>>>
>>> On Friday, 17 March 2017 22:28:17 UTC+11, Ashley Waite wrote:

 Hey all,


 I'd like to suggest adding Common Table Expression (CTE) query 
 generation as a feature to Django.

 I've been working on a project that required manipulation of many 
 records at once, and as with many ORMs found that this wasn't an ideal 
 use-case in Django. As the rest of our code base and related projects are 
 in Django, there was a strong preference to find a way to do it and keep 
 to 
 the same model-is-the-truth design.

 I first did this by writing some hackish functions using raw querysets 
 and generating my own CTE based queries, but it lacked ideal flexibility 
 and maintainability. So I've now written some modifications into my Django 
 to do this in a more Django-esque way and think that this functionality 
 would be beneficial within the project itself, but am unsure exactly where 
 to start the conversation about that.


 *Why generate CTE based queries from querysets?*

 By allowing querysets to be attached to each other, and setting 
 appropriate WHERE clauses, 

Re: Adding generated common table expressions

2017-04-06 Thread Ashley Waite
Oh right, I feel silly for not thinking of named arguments as a resolution
there, that's a damn good idea!

It resolves a couple of minor issues I'd anticipated in a complete
implementation, so consider the idea stolen!  ;)

I mean, ummm, that was totally what I had planned!

- Ashley

On 06/04/2017 6:53 PM, "Marc Tamlyn"  wrote:

> Hi!
>
> This looks generally very good, and I'm quite excited about it.
>
> In terms of taking it forwards, I think a DEP is a very good idea, and
> there are at least 3 core devs who are keen to see a solution. Whether you
> have the right solution yet I'm not so sure, but it sounds like you're not
> either.
>
> Regarding Anssi's comments about SubQuery, we do now have that in core as
> of 1.11 [0]. It does look like an .attach() approach might actually have
> been a nicer version of this, but on the other hand it's currently
> implementable solely with the Expressions API. It seems like the OuterRef
> is very similar to your queryset.ref(). An even nicer approach using attach
> could be to say qs.attach(q1=some_qs).filter(a=F('q1__b'))?
>
> Looking forwards to seeing a DEP!
>
> [0] https://docs.djangoproject.com/en/1.11/ref/
> models/expressions/#subquery-expressions
>
> On 22 March 2017 at 01:32, Ashley Waite  wrote:
>
>> Here's the code changes I've made, noting that some of them were to shove
>> in a generalised VALUES clause that mocks being a queryset, so that it
>> plays with the same interface.
>>
>> https://github.com/django/django/compare/master...ashleywait
>> e:cte-dev#files_bucket
>>
>> I've had a glance at cte-trees/cte-forest and once general CTEs are
>> worked out expanding that to include recursive CTEs wouldn't be too
>> difficult, and that would greatly simplify the implementation of cte-forest
>> to the extent that it might be viable as a django data/reference type.
>>
>> - Ashley
>>
>>
>> On Saturday, March 18, 2017 at 8:28:53 PM UTC+11, Josh Smeaton wrote:
>>>
>>> Thanks for bringing this up Ashley, and for all of the detail you
>>> provided. I'd certainly like to see CTEs make their way into Django,
>>> provided we could come up with a nice enough API. From the look of it,
>>> you've already got something that works with an okay API so I'm hopeful.
>>>
>>> I'd be very interested in seeing your POC too if you're able to share.
>>>
>>> From looking very briefly at django-cte-trees it doesn't aim to support
>>> user defined CTEs for anything other than recursive queries. I'd be
>>> interested in seeing, as part of a DEP, how CTE inclusion in django core
>>> could support the cte-trees project from an API perspective.
>>>
>>> On Friday, 17 March 2017 22:28:17 UTC+11, Ashley Waite wrote:

 Hey all,


 I'd like to suggest adding Common Table Expression (CTE) query
 generation as a feature to Django.

 I've been working on a project that required manipulation of many
 records at once, and as with many ORMs found that this wasn't an ideal
 use-case in Django. As the rest of our code base and related projects are
 in Django, there was a strong preference to find a way to do it and keep to
 the same model-is-the-truth design.

 I first did this by writing some hackish functions using raw querysets
 and generating my own CTE based queries, but it lacked ideal flexibility
 and maintainability. So I've now written some modifications into my Django
 to do this in a more Django-esque way and think that this functionality
 would be beneficial within the project itself, but am unsure exactly where
 to start the conversation about that.


 *Why generate CTE based queries from querysets?*

 By allowing querysets to be attached to each other, and setting
 appropriate WHERE clauses, arbitrary and nested SQL queries can be
 generated. Where the results of the queries are only necessary for the
 execution of following queries this saves a very substantial amount of time
 and database work. Once these features exist, other functionality can also
 transparently use these to generate more efficient queries (such as large
 IN clauses).

 This allows several powerful use cases I think Django would benefit
 from:


 *Large 'IN' clauses*, can be implemented as CTEs reducing expensive
 lookups to a single CTE INNER JOIN. For sets of thousands to match from
 tables of millions of records this can be a very substantial gain.


 *Composite 'IN' conditions,* where multiple fields must match and
 you're matching against a large set of condition rows. In my usage this was
 "where the md5/sha hashes match one of the million md5/sha tuples in my
 match set". This is simply a CTE JOIN with two clauses in the WHERE.


 *Nested data creation*, where the parent doesn't yet exist. Django
 doesn't currently do this as the primary keys are needed, and this makes

Re: Adding generated common table expressions

2017-04-06 Thread Marc Tamlyn
Hi!

This looks generally very good, and I'm quite excited about it.

In terms of taking it forwards, I think a DEP is a very good idea, and
there are at least 3 core devs who are keen to see a solution. Whether you
have the right solution yet I'm not so sure, but it sounds like you're not
either.

Regarding Anssi's comments about SubQuery, we do now have that in core as
of 1.11 [0]. It does look like an .attach() approach might actually have
been a nicer version of this, but on the other hand it's currently
implementable solely with the Expressions API. It seems like the OuterRef
is very similar to your queryset.ref(). An even nicer approach using attach
could be to say qs.attach(q1=some_qs).filter(a=F('q1__b'))?

Looking forwards to seeing a DEP!

[0]
https://docs.djangoproject.com/en/1.11/ref/models/expressions/#subquery-expressions

On 22 March 2017 at 01:32, Ashley Waite  wrote:

> Here's the code changes I've made, noting that some of them were to shove
> in a generalised VALUES clause that mocks being a queryset, so that it
> plays with the same interface.
>
> https://github.com/django/django/compare/master...
> ashleywaite:cte-dev#files_bucket
>
> I've had a glance at cte-trees/cte-forest and once general CTEs are worked
> out expanding that to include recursive CTEs wouldn't be too difficult, and
> that would greatly simplify the implementation of cte-forest to the extent
> that it might be viable as a django data/reference type.
>
> - Ashley
>
>
> On Saturday, March 18, 2017 at 8:28:53 PM UTC+11, Josh Smeaton wrote:
>>
>> Thanks for bringing this up Ashley, and for all of the detail you
>> provided. I'd certainly like to see CTEs make their way into Django,
>> provided we could come up with a nice enough API. From the look of it,
>> you've already got something that works with an okay API so I'm hopeful.
>>
>> I'd be very interested in seeing your POC too if you're able to share.
>>
>> From looking very briefly at django-cte-trees it doesn't aim to support
>> user defined CTEs for anything other than recursive queries. I'd be
>> interested in seeing, as part of a DEP, how CTE inclusion in django core
>> could support the cte-trees project from an API perspective.
>>
>> On Friday, 17 March 2017 22:28:17 UTC+11, Ashley Waite wrote:
>>>
>>> Hey all,
>>>
>>>
>>> I'd like to suggest adding Common Table Expression (CTE) query
>>> generation as a feature to Django.
>>>
>>> I've been working on a project that required manipulation of many
>>> records at once, and as with many ORMs found that this wasn't an ideal
>>> use-case in Django. As the rest of our code base and related projects are
>>> in Django, there was a strong preference to find a way to do it and keep to
>>> the same model-is-the-truth design.
>>>
>>> I first did this by writing some hackish functions using raw querysets
>>> and generating my own CTE based queries, but it lacked ideal flexibility
>>> and maintainability. So I've now written some modifications into my Django
>>> to do this in a more Django-esque way and think that this functionality
>>> would be beneficial within the project itself, but am unsure exactly where
>>> to start the conversation about that.
>>>
>>>
>>> *Why generate CTE based queries from querysets?*
>>>
>>> By allowing querysets to be attached to each other, and setting
>>> appropriate WHERE clauses, arbitrary and nested SQL queries can be
>>> generated. Where the results of the queries are only necessary for the
>>> execution of following queries this saves a very substantial amount of time
>>> and database work. Once these features exist, other functionality can also
>>> transparently use these to generate more efficient queries (such as large
>>> IN clauses).
>>>
>>> This allows several powerful use cases I think Django would benefit from:
>>>
>>>
>>> *Large 'IN' clauses*, can be implemented as CTEs reducing expensive
>>> lookups to a single CTE INNER JOIN. For sets of thousands to match from
>>> tables of millions of records this can be a very substantial gain.
>>>
>>>
>>> *Composite 'IN' conditions,* where multiple fields must match and
>>> you're matching against a large set of condition rows. In my usage this was
>>> "where the md5/sha hashes match one of the million md5/sha tuples in my
>>> match set". This is simply a CTE JOIN with two clauses in the WHERE.
>>>
>>>
>>> *Nested data creation*, where the parent doesn't yet exist. Django
>>> doesn't currently do this as the primary keys are needed, and this makes
>>> normalised data structures unappealing. Using INSERTs as CTEs that supply
>>> those keys to following statements means that entire nested data structures
>>> of new information can be recreated in the database at once, efficiently
>>> and atomically.
>>>
>>>
>>> *Non-uniform UPDATE*s, such that a modified set of objects can all be
>>> updated with different data at the same time by utilising a CTE values
>>> statement JOINed to the UPDATE statement. As there's 

Re: Adding generated common table expressions

2017-03-21 Thread Ashley Waite
Here's the code changes I've made, noting that some of them were to shove 
in a generalised VALUES clause that mocks being a queryset, so that it 
plays with the same interface.

https://github.com/django/django/compare/master...ashleywaite:cte-dev#files_bucket

I've had a glance at cte-trees/cte-forest and once general CTEs are worked 
out expanding that to include recursive CTEs wouldn't be too difficult, and 
that would greatly simplify the implementation of cte-forest to the extent 
that it might be viable as a django data/reference type.

- Ashley

On Saturday, March 18, 2017 at 8:28:53 PM UTC+11, Josh Smeaton wrote:
>
> Thanks for bringing this up Ashley, and for all of the detail you 
> provided. I'd certainly like to see CTEs make their way into Django, 
> provided we could come up with a nice enough API. From the look of it, 
> you've already got something that works with an okay API so I'm hopeful.
>
> I'd be very interested in seeing your POC too if you're able to share.
>
> From looking very briefly at django-cte-trees it doesn't aim to support 
> user defined CTEs for anything other than recursive queries. I'd be 
> interested in seeing, as part of a DEP, how CTE inclusion in django core 
> could support the cte-trees project from an API perspective.
>
> On Friday, 17 March 2017 22:28:17 UTC+11, Ashley Waite wrote:
>>
>> Hey all,
>>
>>
>> I'd like to suggest adding Common Table Expression (CTE) query generation 
>> as a feature to Django.
>>
>> I've been working on a project that required manipulation of many records 
>> at once, and as with many ORMs found that this wasn't an ideal use-case in 
>> Django. As the rest of our code base and related projects are in Django, 
>> there was a strong preference to find a way to do it and keep to the same 
>> model-is-the-truth design.
>>
>> I first did this by writing some hackish functions using raw querysets 
>> and generating my own CTE based queries, but it lacked ideal flexibility 
>> and maintainability. So I've now written some modifications into my Django 
>> to do this in a more Django-esque way and think that this functionality 
>> would be beneficial within the project itself, but am unsure exactly where 
>> to start the conversation about that.
>>
>>
>> *Why generate CTE based queries from querysets?*
>>
>> By allowing querysets to be attached to each other, and setting 
>> appropriate WHERE clauses, arbitrary and nested SQL queries can be 
>> generated. Where the results of the queries are only necessary for the 
>> execution of following queries this saves a very substantial amount of time 
>> and database work. Once these features exist, other functionality can also 
>> transparently use these to generate more efficient queries (such as large 
>> IN clauses).
>>
>> This allows several powerful use cases I think Django would benefit from:
>>
>>
>> *Large 'IN' clauses*, can be implemented as CTEs reducing expensive 
>> lookups to a single CTE INNER JOIN. For sets of thousands to match from 
>> tables of millions of records this can be a very substantial gain.
>>
>>
>> *Composite 'IN' conditions,* where multiple fields must match and you're 
>> matching against a large set of condition rows. In my usage this was "where 
>> the md5/sha hashes match one of the million md5/sha tuples in my match 
>> set". This is simply a CTE JOIN with two clauses in the WHERE.
>>
>>
>> *Nested data creation*, where the parent doesn't yet exist. Django 
>> doesn't currently do this as the primary keys are needed, and this makes 
>> normalised data structures unappealing. Using INSERTs as CTEs that supply 
>> those keys to following statements means that entire nested data structures 
>> of new information can be recreated in the database at once, efficiently 
>> and atomically.
>>
>>
>> *Non-uniform UPDATE*s, such that a modified set of objects can all be 
>> updated with different data at the same time by utilising a CTE values 
>> statement JOINed to the UPDATE statement. As there's currently no way to do 
>> this kind of bulk update the alternative is to update each instance 
>> individually, and this doesn't scale well.
>>
>> These could also be used with aggregations and other calculated fields to 
>> create complex queries that aren't possible at the moment.
>>
>>
>> *What my PoC looks like*
>>
>> With another mildly hackish PoC that creates a VALUEs set from a 
>> dict/namedtuple which can be used to provide large input data, my present 
>> modified version syntax looks a bit like this (not perfect queries):
>>
>> class Hashes(models.Model):
>> md5 = models.UUIDField(verbose_name="MD5 hash (base16)", db_index=True)
>> sha2 = models.CharField(max_length=44, null=True, verbose_name="SHA256 
>> hash (base64)")
>>
>> # Mock QuerySet of values
>> q_mo = Hashes.as_literal(input_hashes).values("md5", "sha2")
>> # A big IN query
>> q_in = Hashes.objects.attach(q_mo).filter(md5=q_mo.ref("md5"))
>>
>> # Matched existing values with composite 'IN' (where 

Re: Adding generated common table expressions

2017-03-21 Thread Ashley Waite
CTE Forest is a specific use case of CTEs to store self-referential tree 
data sets.

It's quite a different use to what I'm proposing, though its implementation 
might become simpler if Django is generally CTE aware.

- Ashley

On Saturday, March 18, 2017 at 6:59:57 PM UTC+11, Matthias Kestenholz wrote:
>
> Hi, 
>
> On Sat, Mar 18, 2017 at 12:49 AM, Tim Graham  > wrote: 
> > Hi, I don't know anything about CTE, but did you see this third-party 
> > package? https://github.com/petrounias/django-cte-trees -- It seems to 
> be 
> > PostgreSQL only. 
>
> Just chiming in to point out a maintained and up-to-date friendly fork 
> of the project above: 
> https://github.com/matthiask/django-cte-forest 
>
> Ashley, is your PoC available somewhere? Is your implementation in any 
> way similar to the implementation in 
> django-cte-trees/django-cte-forest? 
>
> Thanks, 
> Matthias 
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/742681c3-85e7-4420-89c8-834c33690990%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Adding generated common table expressions

2017-03-21 Thread Ashley Waite
I'm nesting the query generators as they currently exist and adding 
awareness of that nesting into another compiler, so that queries that 
reference other models, contain annotations, etc should still in theory 
work fine.

I do agree that the best method would be to generalise this to incorporate 
subqueries as well, but I considered those to be a subset of general CTE 
queries which can effectively be any valid query.

With some minor modifications to the existing INSERT/UPDATE/SELECT 
compilers via subclassing I think that subqueries as special cases of 
general statements that have some restrictions is probably the easiest 
route.
My current fiddling has changed INSERT/UPDATE to have returning clauses 
using subclassed compilers:
https://github.com/ashleywaite/django/blob/cte-dev/django/db/models/sql/subqueries.py#L174-L209
https://github.com/ashleywaite/django/blob/cte-dev/django/db/models/sql/compiler.py#L1283-L1321

Such that regular non-RETURN aware uses aren't impacted by that code - but 
I suspect that making INSERT and UPDATE be RETURN aware and lazily executed 
as a default case might be better, and more consistent with the way that 
SELECT based querysets work.

I'm using a workaround to feed in VALUES clauses, which again I think would 
probably be better as a proper QuerySet subclass that is used by INSERT 
statements and CTEs alike.
The language is a bit confusing because .values() is a method in Django, so 
mock querysets that define literal values intended to generate a VALUES 
clause don't really have a good name, but they would introduce some 
consistency as that's what INSERT statements are doing, and what 
non-uniform UPDATEs and composite IN style queries would want to do.

The leading CTE WITH clauses I'm generating are done via it's own compiler 
as well, such that only queries that are aware of having such queries 
attached will generate them:
https://github.com/ashleywaite/django/blob/cte-dev/django/db/models/sql/compiler.py#L1324-L1374

These could fairly easily be adapted to use a more general case for 
attaching one query to another, as my relatively haskish PoC was mostly to 
show that it can be done without substantial complexity, and that 
modification of other systems to better incorporate these patterns is 
probably a good idea.

As they stand now, CTEs used in the way I am using them could with little 
effort be modified to support aggregations, and annotations on the base 
query to pull in values from complex generated CTE's is probably the ideal 
way to perform some multi-table calculations.

- Ashley

On Monday, March 20, 2017 at 6:23:44 PM UTC+11, Anssi Kääriäinen wrote:
>
> +1 for the INSERT and UPDATE parts.
>
> For the .attach() method, I wonder if it would be better to work on having 
> generic subquery support in the ORM instead of targeting CTEs specifically. 
> From Django's perspective there isn't any big difference between:
> WITH fooquery AS (SELECT * FROM foo WHERE ...)
> SELECT * FROM baz JOIN fooquery ON ...
> and
> SELECT * FROM baz JOIN (SELECT * FROM foo WHERE ...) fooquery ON ...
>
> The great thing about the subquery form is that it works on every 
> database, and to my knowledge the subquery support is surprisingly standard 
> on all database Django needs to care about.
>
> If we want full support for either of the above cases, then we need some 
> way to tell to the Django's ORM what the fooquery (either as subquery or 
> with CTE) produces. For simple cases this won't be complex (say, for 
> .values() it's just a list of expressions), but what to do when the 
> subquery/CTE contains .select_related() for example? Then we'd have a 
> situation where the subquery produces a list of expressions but those 
> expressions might refer to different models in the CTE.
>
> So, the starting point should be to allow only "simple" queries in 
> .attach(). This means that the attached query must be either .values() 
> queryset, or a queryset containing expressions from single model only (no 
> .extra(), .annotate() or select_related() added). In addition I think 
> .attach() should support subqueries instead of using only CTEs. We get 
> support on every backend instead of PostgreSQL only for pretty much the 
> same amount of effort.
>
>  - Anssi
>
> On Saturday, March 18, 2017 at 11:28:53 AM UTC+2, Josh Smeaton wrote:
>>
>> Thanks for bringing this up Ashley, and for all of the detail you 
>> provided. I'd certainly like to see CTEs make their way into Django, 
>> provided we could come up with a nice enough API. From the look of it, 
>> you've already got something that works with an okay API so I'm hopeful.
>>
>> I'd be very interested in seeing your POC too if you're able to share.
>>
>> From looking very briefly at django-cte-trees it doesn't aim to support 
>> user defined CTEs for anything other than recursive queries. I'd be 
>> interested in seeing, as part of a DEP, how CTE inclusion in django core 
>> could support the cte-trees 

Re: Adding generated common table expressions

2017-03-18 Thread Josh Smeaton
Thanks for bringing this up Ashley, and for all of the detail you provided. 
I'd certainly like to see CTEs make their way into Django, provided we 
could come up with a nice enough API. From the look of it, you've already 
got something that works with an okay API so I'm hopeful.

I'd be very interested in seeing your POC too if you're able to share.

>From looking very briefly at django-cte-trees it doesn't aim to support 
user defined CTEs for anything other than recursive queries. I'd be 
interested in seeing, as part of a DEP, how CTE inclusion in django core 
could support the cte-trees project from an API perspective.

On Friday, 17 March 2017 22:28:17 UTC+11, Ashley Waite wrote:
>
> Hey all,
>
>
> I'd like to suggest adding Common Table Expression (CTE) query generation 
> as a feature to Django.
>
> I've been working on a project that required manipulation of many records 
> at once, and as with many ORMs found that this wasn't an ideal use-case in 
> Django. As the rest of our code base and related projects are in Django, 
> there was a strong preference to find a way to do it and keep to the same 
> model-is-the-truth design.
>
> I first did this by writing some hackish functions using raw querysets and 
> generating my own CTE based queries, but it lacked ideal flexibility and 
> maintainability. So I've now written some modifications into my Django to 
> do this in a more Django-esque way and think that this functionality would 
> be beneficial within the project itself, but am unsure exactly where to 
> start the conversation about that.
>
>
> *Why generate CTE based queries from querysets?*
>
> By allowing querysets to be attached to each other, and setting 
> appropriate WHERE clauses, arbitrary and nested SQL queries can be 
> generated. Where the results of the queries are only necessary for the 
> execution of following queries this saves a very substantial amount of time 
> and database work. Once these features exist, other functionality can also 
> transparently use these to generate more efficient queries (such as large 
> IN clauses).
>
> This allows several powerful use cases I think Django would benefit from:
>
>
> *Large 'IN' clauses*, can be implemented as CTEs reducing expensive 
> lookups to a single CTE INNER JOIN. For sets of thousands to match from 
> tables of millions of records this can be a very substantial gain.
>
>
> *Composite 'IN' conditions,* where multiple fields must match and you're 
> matching against a large set of condition rows. In my usage this was "where 
> the md5/sha hashes match one of the million md5/sha tuples in my match 
> set". This is simply a CTE JOIN with two clauses in the WHERE.
>
>
> *Nested data creation*, where the parent doesn't yet exist. Django 
> doesn't currently do this as the primary keys are needed, and this makes 
> normalised data structures unappealing. Using INSERTs as CTEs that supply 
> those keys to following statements means that entire nested data structures 
> of new information can be recreated in the database at once, efficiently 
> and atomically.
>
>
> *Non-uniform UPDATE*s, such that a modified set of objects can all be 
> updated with different data at the same time by utilising a CTE values 
> statement JOINed to the UPDATE statement. As there's currently no way to do 
> this kind of bulk update the alternative is to update each instance 
> individually, and this doesn't scale well.
>
> These could also be used with aggregations and other calculated fields to 
> create complex queries that aren't possible at the moment.
>
>
> *What my PoC looks like*
>
> With another mildly hackish PoC that creates a VALUEs set from a 
> dict/namedtuple which can be used to provide large input data, my present 
> modified version syntax looks a bit like this (not perfect queries):
>
> class Hashes(models.Model):
> md5 = models.UUIDField(verbose_name="MD5 hash (base16)", db_index=True)
> sha2 = models.CharField(max_length=44, null=True, verbose_name="SHA256 
> hash (base64)")
>
> # Mock QuerySet of values
> q_mo = Hashes.as_literal(input_hashes).values("md5", "sha2")
> # A big IN query
> q_in = Hashes.objects.attach(q_mo).filter(md5=q_mo.ref("md5"))
>
> # Matched existing values with composite 'IN' (where md5 and sha2 match, or 
> md5 matches and existing record lacks sha2)
> q_ex = 
> Hashes.objects.attach(q_mo).filter(md5=q_mo.ref("md5")).filter(Q(sha160=q_mo.ref("sha160"))
>  | Q(sha160=None))
>
> # Create new records that don't exist
> q_cr = Hashes.objects.attach(q_mo, 
> q_ex).filter(md5=q_mo.ref("md5")).exclude(md5=q_ex.ref("md5")).values("md5", 
> "sha2").as_insert()
>
> Returning the newly created records.
>
> SQL can be generated that looks something like this:
>
> WITH cte_1_0 (md5, sha2) AS (
>   VALUES ('2d30243bfe9d06673765c432c2bd'::uuid, 
> 'fsA8okuCuq9KybxqcAzNdjlIyAx1QJjTPdf1ZFK/hDI='::varchar(44)),
>   ('f20a46e4e60338697948a0917423', 
> '6bVZgpYZtit1E32BlANWXoKnFFFDNierDSIi0SraND4=')),
> 

Re: Adding generated common table expressions

2017-03-18 Thread Adam Johnson
>
>  supported is added in MySQL 8 [0]
>

Additionally MariaDB 10.2 supports them, and it's nearing release.

On 18 March 2017 at 07:59, Matthias Kestenholz  wrote:

> Hi,
>
> On Sat, Mar 18, 2017 at 12:49 AM, Tim Graham  wrote:
> > Hi, I don't know anything about CTE, but did you see this third-party
> > package? https://github.com/petrounias/django-cte-trees -- It seems to
> be
> > PostgreSQL only.
>
> Just chiming in to point out a maintained and up-to-date friendly fork
> of the project above:
> https://github.com/matthiask/django-cte-forest
>
> Ashley, is your PoC available somewhere? Is your implementation in any
> way similar to the implementation in
> django-cte-trees/django-cte-forest?
>
> Thanks,
> Matthias
>
> --
> You received this message because you are subscribed to the Google Groups
> "Django developers  (Contributions to Django itself)" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to django-developers+unsubscr...@googlegroups.com.
> To post to this group, send email to django-developers@googlegroups.com.
> Visit this group at https://groups.google.com/group/django-developers.
> To view this discussion on the web visit https://groups.google.com/d/
> msgid/django-developers/CANvPqgDWUPuUA6rw8uawa7_
> AsUkXFhuegFM1cJ-gjVUCxvBUwg%40mail.gmail.com.
> For more options, visit https://groups.google.com/d/optout.
>



-- 
Adam

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/CAMyDDM1pXBMBEOJNmg8w%3D1PWxnFCUCZSpONgYu%3Dk7WXG55yLew%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Adding generated common table expressions

2017-03-18 Thread Matthias Kestenholz
Hi,

On Sat, Mar 18, 2017 at 12:49 AM, Tim Graham  wrote:
> Hi, I don't know anything about CTE, but did you see this third-party
> package? https://github.com/petrounias/django-cte-trees -- It seems to be
> PostgreSQL only.

Just chiming in to point out a maintained and up-to-date friendly fork
of the project above:
https://github.com/matthiask/django-cte-forest

Ashley, is your PoC available somewhere? Is your implementation in any
way similar to the implementation in
django-cte-trees/django-cte-forest?

Thanks,
Matthias

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/CANvPqgDWUPuUA6rw8uawa7_AsUkXFhuegFM1cJ-gjVUCxvBUwg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Adding generated common table expressions

2017-03-17 Thread Tim Graham
Hi, I don't know anything about CTE, but did you see this third-party 
package? https://github.com/petrounias/django-cte-trees -- It seems to be 
PostgreSQL only.

I was going to write, "Considering that not all databases support CTE 
(MySQL doesn't), a third-party app might be the way to go rather than 
having it built-in to Django." -- however, then I noticed that supported is 
added in MySQL 8 [0], so maybe any database differences could be abstracted 
away with an appropriate API.

I'll leave it to others to comment on the technical details of your 
proposal. Meanwhile, if you have a chance to contribute smaller patches to 
Django, I think it's helpful to start with smaller patches before tackling 
something larger like this.

[0] 
http://mysqlserverteam.com/mysql-8-0-labs-recursive-common-table-expressions-in-mysql-ctes/

On Friday, March 17, 2017 at 7:28:17 AM UTC-4, Ashley Waite wrote:
>
> Hey all,
>
>
> I'd like to suggest adding Common Table Expression (CTE) query generation 
> as a feature to Django.
>
> I've been working on a project that required manipulation of many records 
> at once, and as with many ORMs found that this wasn't an ideal use-case in 
> Django. As the rest of our code base and related projects are in Django, 
> there was a strong preference to find a way to do it and keep to the same 
> model-is-the-truth design.
>
> I first did this by writing some hackish functions using raw querysets and 
> generating my own CTE based queries, but it lacked ideal flexibility and 
> maintainability. So I've now written some modifications into my Django to 
> do this in a more Django-esque way and think that this functionality would 
> be beneficial within the project itself, but am unsure exactly where to 
> start the conversation about that.
>
>
> *Why generate CTE based queries from querysets?*
>
> By allowing querysets to be attached to each other, and setting 
> appropriate WHERE clauses, arbitrary and nested SQL queries can be 
> generated. Where the results of the queries are only necessary for the 
> execution of following queries this saves a very substantial amount of time 
> and database work. Once these features exist, other functionality can also 
> transparently use these to generate more efficient queries (such as large 
> IN clauses).
>
> This allows several powerful use cases I think Django would benefit from:
>
>
> *Large 'IN' clauses*, can be implemented as CTEs reducing expensive 
> lookups to a single CTE INNER JOIN. For sets of thousands to match from 
> tables of millions of records this can be a very substantial gain.
>
>
> *Composite 'IN' conditions,* where multiple fields must match and you're 
> matching against a large set of condition rows. In my usage this was "where 
> the md5/sha hashes match one of the million md5/sha tuples in my match 
> set". This is simply a CTE JOIN with two clauses in the WHERE.
>
>
> *Nested data creation*, where the parent doesn't yet exist. Django 
> doesn't currently do this as the primary keys are needed, and this makes 
> normalised data structures unappealing. Using INSERTs as CTEs that supply 
> those keys to following statements means that entire nested data structures 
> of new information can be recreated in the database at once, efficiently 
> and atomically.
>
>
> *Non-uniform UPDATE*s, such that a modified set of objects can all be 
> updated with different data at the same time by utilising a CTE values 
> statement JOINed to the UPDATE statement. As there's currently no way to do 
> this kind of bulk update the alternative is to update each instance 
> individually, and this doesn't scale well.
>
> These could also be used with aggregations and other calculated fields to 
> create complex queries that aren't possible at the moment.
>
>
> *What my PoC looks like*
>
> With another mildly hackish PoC that creates a VALUEs set from a 
> dict/namedtuple which can be used to provide large input data, my present 
> modified version syntax looks a bit like this (not perfect queries):
>
> class Hashes(models.Model):
> md5 = models.UUIDField(verbose_name="MD5 hash (base16)", db_index=True)
> sha2 = models.CharField(max_length=44, null=True, verbose_name="SHA256 
> hash (base64)")
>
> # Mock QuerySet of values
> q_mo = Hashes.as_literal(input_hashes).values("md5", "sha2")
> # A big IN query
> q_in = Hashes.objects.attach(q_mo).filter(md5=q_mo.ref("md5"))
>
> # Matched existing values with composite 'IN' (where md5 and sha2 match, or 
> md5 matches and existing record lacks sha2)
> q_ex = 
> Hashes.objects.attach(q_mo).filter(md5=q_mo.ref("md5")).filter(Q(sha160=q_mo.ref("sha160"))
>  | Q(sha160=None))
>
> # Create new records that don't exist
> q_cr = Hashes.objects.attach(q_mo, 
> q_ex).filter(md5=q_mo.ref("md5")).exclude(md5=q_ex.ref("md5")).values("md5", 
> "sha2").as_insert()
>
> Returning the newly created records.
>
> SQL can be generated that looks something like this:
>
> WITH cte_1_0 (md5, sha2) AS (
>   VALUES 

Adding generated common table expressions

2017-03-17 Thread Ashley Waite


Hey all,


I'd like to suggest adding Common Table Expression (CTE) query generation 
as a feature to Django.

I've been working on a project that required manipulation of many records 
at once, and as with many ORMs found that this wasn't an ideal use-case in 
Django. As the rest of our code base and related projects are in Django, 
there was a strong preference to find a way to do it and keep to the same 
model-is-the-truth design.

I first did this by writing some hackish functions using raw querysets and 
generating my own CTE based queries, but it lacked ideal flexibility and 
maintainability. So I've now written some modifications into my Django to 
do this in a more Django-esque way and think that this functionality would 
be beneficial within the project itself, but am unsure exactly where to 
start the conversation about that.


*Why generate CTE based queries from querysets?*

By allowing querysets to be attached to each other, and setting appropriate 
WHERE clauses, arbitrary and nested SQL queries can be generated. Where the 
results of the queries are only necessary for the execution of following 
queries this saves a very substantial amount of time and database work. 
Once these features exist, other functionality can also transparently use 
these to generate more efficient queries (such as large IN clauses).

This allows several powerful use cases I think Django would benefit from:


*Large 'IN' clauses*, can be implemented as CTEs reducing expensive lookups 
to a single CTE INNER JOIN. For sets of thousands to match from tables of 
millions of records this can be a very substantial gain.


*Composite 'IN' conditions,* where multiple fields must match and you're 
matching against a large set of condition rows. In my usage this was "where 
the md5/sha hashes match one of the million md5/sha tuples in my match 
set". This is simply a CTE JOIN with two clauses in the WHERE.


*Nested data creation*, where the parent doesn't yet exist. Django doesn't 
currently do this as the primary keys are needed, and this makes normalised 
data structures unappealing. Using INSERTs as CTEs that supply those keys 
to following statements means that entire nested data structures of new 
information can be recreated in the database at once, efficiently and 
atomically.


*Non-uniform UPDATE*s, such that a modified set of objects can all be 
updated with different data at the same time by utilising a CTE values 
statement JOINed to the UPDATE statement. As there's currently no way to do 
this kind of bulk update the alternative is to update each instance 
individually, and this doesn't scale well.

These could also be used with aggregations and other calculated fields to 
create complex queries that aren't possible at the moment.


*What my PoC looks like*

With another mildly hackish PoC that creates a VALUEs set from a 
dict/namedtuple which can be used to provide large input data, my present 
modified version syntax looks a bit like this (not perfect queries):

class Hashes(models.Model):
md5 = models.UUIDField(verbose_name="MD5 hash (base16)", db_index=True)
sha2 = models.CharField(max_length=44, null=True, verbose_name="SHA256 hash 
(base64)")

# Mock QuerySet of values
q_mo = Hashes.as_literal(input_hashes).values("md5", "sha2")
# A big IN query
q_in = Hashes.objects.attach(q_mo).filter(md5=q_mo.ref("md5"))

# Matched existing values with composite 'IN' (where md5 and sha2 match, or md5 
matches and existing record lacks sha2)
q_ex = 
Hashes.objects.attach(q_mo).filter(md5=q_mo.ref("md5")).filter(Q(sha160=q_mo.ref("sha160"))
 | Q(sha160=None))

# Create new records that don't exist
q_cr = Hashes.objects.attach(q_mo, 
q_ex).filter(md5=q_mo.ref("md5")).exclude(md5=q_ex.ref("md5")).values("md5", 
"sha2").as_insert()

Returning the newly created records.

SQL can be generated that looks something like this:

WITH cte_1_0 (md5, sha2) AS (
VALUES ('2d30243bfe9d06673765c432c2bd'::uuid, 
'fsA8okuCuq9KybxqcAzNdjlIyAx1QJjTPdf1ZFK/hDI='::varchar(44)),
('f20a46e4e60338697948a0917423', 
'6bVZgpYZtit1E32BlANWXoKnFFFDNierDSIi0SraND4=')),
cte_1 AS (
SELECT "hashes"."id", "hashes"."md5", "hashes"."sha2" 
FROM "hashes" , "cte_1_0" 
WHERE ("hashes"."md5" = (cte_1_0.md5) AND ("hashes"."sha2" = 
(cte_1_0.sha2) OR "hashes"."sha2" IS NULL) )) 
SELECT "hashes"."md5" 
FROM "hashes" , "cte_1_0" , "cte_1" 
WHERE ("hashes"."md5" = (cte_1_0.md5) AND NOT ("hashes"."md5" = (cte_1.md5)))

That is:

   - A qs.as_insert() and qs.as_update() on queryset to create *lazy* 
   insert and update queries.
   - A qs.attach() that allows querysets to be attached to other querysets, 
   and will generate them as CTE statements.
   - A qs.ref() that returns an expression that when the query is compiled 
   will be a field reference on the CTE that represents that queryset.
   - Additional compilers on the QuerySet subclasses that these return (so 
   no changes to base compilers meaning no