Re: Database agnostic EnumField

2017-10-20 Thread Ashley Waite
Thanks for that link, Tom. I thought I'd come across that before but forgot
where I saw the discussion!
That definitely seems like the right place.

Jani, I patch in several changes to core classes in order to do this
without any of those caveats, but have tried to keep them to as few as possible.

Specifically:
* Changes to the migration state classes to allow state tracking of types
* Changes to the schema editor to allow parametrised db_types
* Changes to the migration autodetector so operations are generated and placed
correctly (enum changes before the models that use them)

Tom, to handle existing records I reuse the on_delete behaviours. If declared
on a field (it's optional) it provides a default behaviour for that field, but
makemigrations also asks interactively (see
https://github.com/ashleywaite/django-more/blob/master/django_enum/patches.py#L28)
and offers to apply any of the equivalent behaviours to existing records,
including setting them to a newly added value so you can effectively 'rename'
a value within the enum.
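
As a rough illustration of what that declaration might look like (the
EnumField import path and the on_delete argument here are assumptions based on
the description above, not a confirmed API):

from enum import Enum

from django.db import models
from django_enum import EnumField  # hypothetical import path


class OrderStatus(Enum):
    PENDING = 'pending'
    SHIPPED = 'shipped'
    CANCELLED = 'cancelled'


class Order(models.Model):
    # on_delete declares the default behaviour for existing rows when a member
    # is later removed from OrderStatus; makemigrations can still ask whether
    # to apply it, set another value, or protect the rows instead.
    status = EnumField(OrderStatus, on_delete=models.SET_DEFAULT,
                       default=OrderStatus.PENDING)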

Then during migrate these are applied, updating records accordingly, or
blocking the migration if affected data is marked as protected.

I do still need to route that part through the execute() method so that it
plays properly with sqlmigrate.
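
For reference, a minimal sketch of that point, assuming a custom migration
operation: schema_editor.execute() is the hook that sqlmigrate captures (it
runs the schema editor with collect_sql=True), while the operation class and
the PostgreSQL-specific SQL below are illustrative only.

from django.db.migrations.operations.base import Operation


class AddEnumValue(Operation):
    """Illustrative operation that adds a value to an existing enum type."""

    def __init__(self, enum_name, value):
        self.enum_name = enum_name
        self.value = value

    def state_forwards(self, app_label, state):
        pass  # state tracking omitted in this sketch

    def database_forwards(self, app_label, schema_editor, from_state, to_state):
        # Routing the DDL through execute() means sqlmigrate can collect the
        # statement instead of running it; a raw cursor call would be
        # invisible to sqlmigrate.
        schema_editor.execute('ALTER TYPE %s ADD VALUE %s' % (
            schema_editor.quote_name(self.enum_name),
            schema_editor.quote_value(self.value),
        ))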

- Ashley

On Saturday, October 21, 2017 at 5:39:30 AM UTC+11, Tom Forbes wrote:
>
> There is a maybe/someday ticket for this here: 
> https://code.djangoproject.com/ticket/24342
>
> If you're interested you could work on adding this to Django through that
> ticket; it seems like an interesting idea.
>
> When removing an enum value, how do you handle existing records? Are all 
> supported databases consistent in throwing errors if existing rows are 
> still present with a removed enumeration value?
>
> On Fri, Oct 20, 2017 at 12:55 PM, Ashley Waite <ashley@gmail.com> wrote:
>
>> I've been working a bit on an EnumField implementation because it'll save
>> me a lot of future time in a project, and existing implementations seem to
>> be fragile, non-reversible, or one-database-only. Wondering why there isn't
>> a PEP 435 based EnumField in Django itself, I didn't find many answers with
>> a search on the mailing list.
>>
>> Is this a feature that would be considered, and if so, what would the 
>> expectations on it be?
>>
>> I was a bit reluctant to use any of the implementations I could find
>> because they all reduce to some of these issues:
>> * Using an int/char column instead of a database enum
>> * Being database vendor-specific
>> * Requiring a non-standard enum class or a subclass of it
>> * Not allowing enum reuse
>> * Not migrating changes statefully (i.e., injecting the type declaration on
>> connection in PostgreSQL)
>> * Not allowing enum changes (add/remove/rename)
>> * Not parametrising enum values (MySQL)
>> * Not handling data consistency on enum changes
>>
>> And I realised that maybe that's why it's not a standard field.
>>
>> I've done a proof-of-concept implementation that works for MySQL and
>> PostgreSQL, and it should generalise easily to any database via two flags
>> (has_enum and requires_enum_declaration) that determine how changes to the
>> enum are handled.
>>
>> It addresses all of these issues, migrates, and with a little more work 
>> can handle data effects as well.
>>
>> So where should I go with this from here?
>> https://github.com/ashleywaite/django-more/tree/master/django_enum
>>
>> - Ashley
>>



Database agnostic EnumField

2017-10-20 Thread Ashley Waite
I've been working a bit on an EnumField implementation because it'll save
me a lot of future time in a project, and existing implementations seem to
be fragile, non-reversible, or one-database-only. Wondering why there isn't
a PEP 435 based EnumField in Django itself, I didn't find many answers with
a search on the mailing list.

Is this a feature that would be considered, and if so, what would the 
expectations on it be?

I was a bit reluctant to use any of the implementations I could find because
they all reduce to some of these issues:
* Using an int/char column instead of a database enum
* Being database vendor-specific
* Requiring a non-standard enum class or a subclass of it
* Not allowing enum reuse
* Not migrating changes statefully (i.e., injecting the type declaration on
connection in PostgreSQL)
* Not allowing enum changes (add/remove/rename)
* Not parametrising enum values (MySQL)
* Not handling data consistency on enum changes

And I realised that maybe that's why it's not a standard field.

I've done a proof-of-concept implementation that works for MySQL and
PostgreSQL, and it should generalise easily to any database via two flags
(has_enum and requires_enum_declaration) that determine how changes to the
enum are handled.
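
A minimal sketch of how those flags could steer the column type; the method
body, the self.values and self.type_name attributes, and the varchar fallback
are illustrative assumptions, not the actual implementation:

def db_type(self, connection):
    features = connection.features
    if getattr(features, 'requires_enum_declaration', False):
        # PostgreSQL-style: a named type is created by a separate migration
        # operation, so the column just references that type by name.
        return self.type_name
    if getattr(features, 'has_enum', False):
        # MySQL-style: allowed values are declared inline on the column.
        # (Naive quoting here; the real code would parametrise these values.)
        return 'enum(%s)' % ', '.join("'%s'" % v for v in self.values)
    # No native enum support at all: fall back to a plain varchar column.
    return 'varchar(%s)' % max(len(v) for v in self.values)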

It addresses all of these issues, migrates, and with a little more work can 
handle data effects as well.

So where should I go with this from here?
https://github.com/ashleywaite/django-more/tree/master/django_enum

- Ashley



Re: PostgreSQL Partial Indexes package

2017-10-20 Thread Ashley Waite
I did a similar thing (but only tested on PostgreSQL) a while back, using Q
objects / filter clauses to generate the WHERE clause.

Perhaps with our powers combined we're close to a generalised 
implementation?

https://github.com/ashleywaite/django-more/blob/master/django_more/indexes.py
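
For what it's worth, the underlying trick can be sketched roughly like this on
recent Django versions; the helper name is mine and the literal inlining is
naive, so treat it as an illustration of the approach rather than the linked
implementation. The resulting fragment can then be appended as the WHERE
clause of the CREATE INDEX statement the schema editor emits.

from django.db.models.sql.query import Query


def q_to_where_sql(model, q, schema_editor):
    # Build a throwaway Query for the model and compile the Q object the same
    # way a filter() would be, yielding an SQL fragment plus its parameters.
    query = Query(model)
    where = query.build_where(q)
    connection = schema_editor.connection
    compiler = query.get_compiler(connection=connection)
    sql, params = where.as_sql(compiler, connection)
    # Index DDL can't take query parameters, so inline them as quoted literals.
    return sql % tuple(schema_editor.quote_value(p) for p in params)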



On Saturday, October 7, 2017 at 7:56:00 PM UTC+11, Mattias Linnap wrote:
>
> Hi django-developers,
>
> I have written a package that implements PostgreSQL and SQLite partial 
> indexes on top of the new class-based indexes: 
> https://github.com/mattiaslinnap/django-partial-index
> The most common use case is partial unique constraints, but I have a few 
> projects where non-unique partial indexes have turned out useful as well.
>
> I have a few questions on how to continue with this:
>
> 1. Right now the "where condition" expression is provided as a string, and 
> has to be different for PostgreSQL and SQLite in some common cases (for 
> example boolean literals). Is there a good abstraction for SQL expressions 
> somewhere in Django internals that I could use instead, something similar 
> to Q-expressions perhaps? In particular, to add validate_unique() support 
> to ModelForms, I would need to be able to extract all fields that are 
> mentioned in the where condition.
> 2. I've seen mentions of "check constraints" support being in 
> development (https://github.com/django/django/pull/7615). Will that 
> include partial unique constraints, or is it just for per-column checks?
> 3. If separate, then it would be nice to one day get partial indexes 
> merged into the contrib.postgres package. Do you have any suggestions on 
> what needs to happen before that - more test coverage, more contributors, 
> more users, or similar?
>
> Best,
>
> Mattias
>



Re: Adding generated common table expressions

2017-04-06 Thread Ashley Waite
Oh right, I feel silly for not thinking of named arguments as a resolution
there, that's a damn good idea!

It resolves a couple of minor issues I'd anticipated in a complete
implementation, so consider the idea stolen!  ;)

I mean, ummm, that was totally what I had planned!
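
Roughly, the two spellings in question side by side (neither is an existing
Django API; both just follow the syntax used in this thread, with Hashes and
input_hashes as in the PoC example):

from django.db.models import F

values_qs = Hashes.as_literal(input_hashes).values("md5", "sha2")

# ref()-based form from the original PoC: the attached queryset object is
# also the handle used to reference its columns.
matched = Hashes.objects.attach(values_qs).filter(md5=values_qs.ref("md5"))

# Named-argument form from the quoted reply: the CTE gets an explicit name,
# so its columns can be referenced like a relation via F().
matched = Hashes.objects.attach(new_hashes=values_qs).filter(md5=F("new_hashes__md5"))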

- Ashley

On 06/04/2017 6:53 PM, "Marc Tamlyn" <marc.tam...@gmail.com> wrote:

> Hi!
>
> This looks generally very good, and I'm quite excited about it.
>
> In terms of taking it forwards, I think a DEP is a very good idea, and
> there are at least 3 core devs who are keen to see a solution. Whether you
> have the right solution yet I'm not so sure, but it sounds like you're not
> either.
>
> Regarding Anssi's comments about SubQuery, we do now have that in core as
> of 1.11 [0]. It does look like an .attach() approach might actually have
> been a nicer version of this, but on the other hand it's currently
> implementable solely with the Expressions API. It seems like the OuterRef
> is very similar to your queryset.ref(). An even nicer approach using attach
> could be to say qs.attach(q1=some_qs).filter(a=F('q1__b'))?
>
> Looking forwards to seeing a DEP!
>
> [0] https://docs.djangoproject.com/en/1.11/ref/models/expressions/#subquery-expressions
>
> On 22 March 2017 at 01:32, Ashley Waite <ashley.c.wa...@gmail.com> wrote:
>
>> Here's the code changes I've made, noting that some of them were to shove
>> in a generalised VALUES clause that mocks being a queryset, so that it
>> plays with the same interface.
>>
>> https://github.com/django/django/compare/master...ashleywaite:cte-dev#files_bucket
>>
>> I've had a glance at cte-trees/cte-forest and once general CTEs are
>> worked out expanding that to include recursive CTEs wouldn't be too
>> difficult, and that would greatly simplify the implementation of cte-forest
>> to the extent that it might be viable as a django data/reference type.
>>
>> - Ashley
>>
>>
>> On Saturday, March 18, 2017 at 8:28:53 PM UTC+11, Josh Smeaton wrote:
>>>
>>> Thanks for bringing this up Ashley, and for all of the detail you
>>> provided. I'd certainly like to see CTEs make their way into Django,
>>> provided we could come up with a nice enough API. From the look of it,
>>> you've already got something that works with an okay API so I'm hopeful.
>>>
>>> I'd be very interested in seeing your POC too if you're able to share.
>>>
>>> From looking very briefly at django-cte-trees it doesn't aim to support
>>> user defined CTEs for anything other than recursive queries. I'd be
>>> interested in seeing, as part of a DEP, how CTE inclusion in django core
>>> could support the cte-trees project from an API perspective.
>>>
>>> On Friday, 17 March 2017 22:28:17 UTC+11, Ashley Waite wrote:
>>>>
>>>> Hey all,
>>>>
>>>>
>>>> I'd like to suggest adding Common Table Expression (CTE) query
>>>> generation as a feature to Django.
>>>>
>>>> I've been working on a project that required manipulation of many
>>>> records at once, and as with many ORMs found that this wasn't an ideal
>>>> use-case in Django. As the rest of our code base and related projects are
>>>> in Django, there was a strong preference to find a way to do it and keep to
>>>> the same model-is-the-truth design.
>>>>
>>>> I first did this by writing some hackish functions using raw querysets
>>>> and generating my own CTE based queries, but it lacked ideal flexibility
>>>> and maintainability. So I've now written some modifications into my Django
>>>> to do this in a more Django-esque way and think that this functionality
>>>> would be beneficial within the project itself, but am unsure exactly where
>>>> to start the conversation about that.
>>>>
>>>>
>>>> *Why generate CTE based queries from querysets?*
>>>>
>>>> By allowing querysets to be attached to each other, and setting
>>>> appropriate WHERE clauses, arbitrary and nested SQL queries can be
>>>> generated. Where the results of the queries are only necessary for the
>>>> execution of following queries this saves a very substantial amount of time
>>>> and database work. Once these features exist, other functionality can also
>>>> transparently use these to generate more efficient queries (such as large
>>>> IN clauses).
>>>>
>>>> This allows several powerful use cases I think Django would benefit
>>>> f

Re: Adding generated common table expressions

2017-03-21 Thread Ashley Waite
Here's the code changes I've made, noting that some of them were to shove 
in a generalised VALUES clause that mocks being a queryset, so that it 
plays with the same interface.

https://github.com/django/django/compare/master...ashleywaite:cte-dev#files_bucket

I've had a glance at cte-trees/cte-forest and, once general CTEs are worked
out, expanding that to include recursive CTEs wouldn't be too difficult; that
would greatly simplify the implementation of cte-forest, to the extent that it
might be viable as a Django data/reference type.

- Ashley

On Saturday, March 18, 2017 at 8:28:53 PM UTC+11, Josh Smeaton wrote:
>
> Thanks for bringing this up Ashley, and for all of the detail you 
> provided. I'd certainly like to see CTEs make their way into Django, 
> provided we could come up with a nice enough API. From the look of it, 
> you've already got something that works with an okay API so I'm hopeful.
>
> I'd be very interested in seeing your POC too if you're able to share.
>
> From looking very briefly at django-cte-trees it doesn't aim to support 
> user defined CTEs for anything other than recursive queries. I'd be 
> interested in seeing, as part of a DEP, how CTE inclusion in django core 
> could support the cte-trees project from an API perspective.
>
> On Friday, 17 March 2017 22:28:17 UTC+11, Ashley Waite wrote:
>>
>> Hey all,
>>
>>
>> I'd like to suggest adding Common Table Expression (CTE) query generation 
>> as a feature to Django.
>>
>> I've been working on a project that required manipulation of many records 
>> at once, and as with many ORMs found that this wasn't an ideal use-case in 
>> Django. As the rest of our code base and related projects are in Django, 
>> there was a strong preference to find a way to do it and keep to the same 
>> model-is-the-truth design.
>>
>> I first did this by writing some hackish functions using raw querysets 
>> and generating my own CTE based queries, but it lacked ideal flexibility 
>> and maintainability. So I've now written some modifications into my Django 
>> to do this in a more Django-esque way and think that this functionality 
>> would be beneficial within the project itself, but am unsure exactly where 
>> to start the conversation about that.
>>
>>
>> *Why generate CTE based queries from querysets?*
>>
>> By allowing querysets to be attached to each other, and setting 
>> appropriate WHERE clauses, arbitrary and nested SQL queries can be 
>> generated. Where the results of the queries are only necessary for the 
>> execution of following queries this saves a very substantial amount of time 
>> and database work. Once these features exist, other functionality can also 
>> transparently use these to generate more efficient queries (such as large 
>> IN clauses).
>>
>> This allows several powerful use cases I think Django would benefit from:
>>
>>
>> *Large 'IN' clauses*, can be implemented as CTEs reducing expensive 
>> lookups to a single CTE INNER JOIN. For sets of thousands to match from 
>> tables of millions of records this can be a very substantial gain.
>>
>>
>> *Composite 'IN' conditions,* where multiple fields must match and you're 
>> matching against a large set of condition rows. In my usage this was "where 
>> the md5/sha hashes match one of the million md5/sha tuples in my match 
>> set". This is simply a CTE JOIN with two clauses in the WHERE.
>>
>>
>> *Nested data creation*, where the parent doesn't yet exist. Django 
>> doesn't currently do this as the primary keys are needed, and this makes 
>> normalised data structures unappealing. Using INSERTs as CTEs that supply 
>> those keys to following statements means that entire nested data structures 
>> of new information can be recreated in the database at once, efficiently 
>> and atomically.
>>
>>
>> *Non-uniform UPDATE*s, such that a modified set of objects can all be 
>> updated with different data at the same time by utilising a CTE values 
>> statement JOINed to the UPDATE statement. As there's currently no way to do 
>> this kind of bulk update the alternative is to update each instance 
>> individually, and this doesn't scale well.
>>
>> These could also be used with aggregations and other calculated fields to 
>> create complex queries that aren't possible at the moment.
>>
>>
>> *What my PoC looks like*
>>
>> With another mildly hackish PoC that creates a VALUEs set from a 
>> dict/namedtuple which can be used to provide large input data, my present 
>> modified version syntax looks a bit like this (not perfect q

Re: Adding generated common table expressions

2017-03-21 Thread Ashley Waite
CTE Forest is a specific use case of CTEs to store self-referential tree 
data sets.

It's quite a different use from what I'm proposing, though its implementation
might become simpler if Django is generally CTE-aware.

- Ashley

On Saturday, March 18, 2017 at 6:59:57 PM UTC+11, Matthias Kestenholz wrote:
>
> Hi, 
>
> On Sat, Mar 18, 2017 at 12:49 AM, Tim Graham wrote: 
> > Hi, I don't know anything about CTE, but did you see this third-party 
> > package? https://github.com/petrounias/django-cte-trees -- It seems to be 
> > PostgreSQL only. 
>
> Just chiming in to point out a maintained and up-to-date friendly fork 
> of the project above: 
> https://github.com/matthiask/django-cte-forest 
>
> Ashley, is your PoC available somewhere? Is your implementation in any 
> way similar to the implementation in 
> django-cte-trees/django-cte-forest? 
>
> Thanks, 
> Matthias 
>




Adding generated common table expressions

2017-03-17 Thread Ashley Waite


Hey all,


I'd like to suggest adding Common Table Expression (CTE) query generation 
as a feature to Django.

I've been working on a project that required manipulating many records at
once, and as with many ORMs, found that this isn't an ideal use case for
Django. As the rest of our code base and related projects are in Django,
there was a strong preference to find a way to do it and keep to the same
model-is-the-truth design.

I first did this by writing some hackish functions using raw querysets and
generating my own CTE-based queries, but that approach lacked flexibility and
maintainability. So I've now written some modifications into my copy of Django
to do this in a more Django-esque way, and I think this functionality would be
beneficial within the project itself, but am unsure exactly where to start the
conversation about that.


*Why generate CTE based queries from querysets?*

By allowing querysets to be attached to each other, and setting appropriate
WHERE clauses, arbitrary and nested SQL queries can be generated. Where the
results of the queries are only needed for the execution of following queries,
this saves a very substantial amount of time and database work. Once these
features exist, other functionality can also transparently use them to
generate more efficient queries (such as large IN clauses).

This allows several powerful use cases I think Django would benefit from:


*Large 'IN' clauses* can be implemented as CTEs, reducing expensive lookups
to a single CTE INNER JOIN. For sets of thousands to match from tables of
millions of records this can be a very substantial gain.


*Composite 'IN' conditions,* where multiple fields must match and you're 
matching against a large set of condition rows. In my usage this was "where 
the md5/sha hashes match one of the million md5/sha tuples in my match 
set". This is simply a CTE JOIN with two clauses in the WHERE.


*Nested data creation*, where the parent doesn't yet exist. Django doesn't
currently support this in one pass because the parent's primary keys are
needed first, which makes normalised data structures unappealing. Using
INSERTs as CTEs that supply those keys to following statements means that
entire nested data structures of new information can be created in the
database at once, efficiently and atomically.


*Non-uniform UPDATEs*, such that a modified set of objects can all be
updated with different data at the same time, by utilising a CTE VALUES
statement JOINed to the UPDATE statement. As there's currently no way to do
this kind of bulk update, the alternative is to update each instance
individually, and that doesn't scale well.
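
For that last case, a rough sketch in the PoC syntax introduced below; the
Product model is made up and the way new values are bound to the update is a
guess, since the post only names as_update():

from django.db.models import F

new_prices = [
    {"sku": "A1", "price": 100},
    {"sku": "B2", "price": 250},
]
q_vals = Product.as_literal(new_prices).values("sku", "price")
# One UPDATE joined to a CTE VALUES list, instead of one UPDATE per row.
q_upd = (Product.objects
         .attach(q_vals)
         .filter(sku=q_vals.ref("sku"))
         .annotate(new_price=q_vals.ref("price"))
         .as_update(price=F("new_price")))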

These could also be used with aggregations and other calculated fields to 
create complex queries that aren't possible at the moment.


*What my PoC looks like*

With another mildly hackish PoC that creates a VALUES set from a
dict/namedtuple, which can be used to provide large input data, the syntax in
my present modified version looks a bit like this (not perfect queries):

from django.db import models
from django.db.models import Q


class Hashes(models.Model):
    md5 = models.UUIDField(verbose_name="MD5 hash (base16)", db_index=True)
    sha2 = models.CharField(max_length=44, null=True,
                            verbose_name="SHA256 hash (base64)")

# Mock QuerySet of values
q_mo = Hashes.as_literal(input_hashes).values("md5", "sha2")
# A big IN query
q_in = Hashes.objects.attach(q_mo).filter(md5=q_mo.ref("md5"))

# Matched existing values with composite 'IN' (where md5 and sha2 match, or
# md5 matches and existing record lacks sha2)
q_ex = Hashes.objects.attach(q_mo).filter(md5=q_mo.ref("md5")).filter(
    Q(sha2=q_mo.ref("sha2")) | Q(sha2=None))

# Create new records that don't exist
q_cr = Hashes.objects.attach(q_mo, q_ex).filter(md5=q_mo.ref("md5")).exclude(
    md5=q_ex.ref("md5")).values("md5", "sha2").as_insert()

Returning the newly created records.

SQL can be generated that looks something like this:

WITH cte_1_0 (md5, sha2) AS (
    VALUES ('2d30243bfe9d06673765c432c2bd'::uuid,
            'fsA8okuCuq9KybxqcAzNdjlIyAx1QJjTPdf1ZFK/hDI='::varchar(44)),
           ('f20a46e4e60338697948a0917423',
            '6bVZgpYZtit1E32BlANWXoKnFFFDNierDSIi0SraND4=')),
cte_1 AS (
    SELECT "hashes"."id", "hashes"."md5", "hashes"."sha2"
    FROM "hashes", "cte_1_0"
    WHERE ("hashes"."md5" = (cte_1_0.md5)
           AND ("hashes"."sha2" = (cte_1_0.sha2) OR "hashes"."sha2" IS NULL)))
SELECT "hashes"."md5"
FROM "hashes", "cte_1_0", "cte_1"
WHERE ("hashes"."md5" = (cte_1_0.md5) AND NOT ("hashes"."md5" = (cte_1.md5)))

That is:

   - A qs.as_insert() and qs.as_update() on querysets to create *lazy*
     insert and update queries.
   - A qs.attach() that allows querysets to be attached to other querysets,
     and will generate them as CTE statements.
   - A qs.ref() that returns an expression that, when the query is compiled,
     will be a field reference on the CTE that represents that queryset.
   - Additional compilers on the QuerySet subclasses that these return (so
     no changes to base compilers meaning no