Re: RFC: "UPSERT" in PostgreSQL

Anssi Kääriäinen Mon, 29 Sep 2014 00:48:08 -0700

I think we could use UPSERT for our .save() logic. Django's save()
method is defined as:
  - Insert a row in to the database if there isn't already a matching
row for the saved model's primary key
  - Otherwise update the row


This is currently implemented as "try to update, if nothing was updated,
then insert". Naturally, that approach has some race conditions.

There are a couple of problems with the suggested INSERT ... ON
DUPLICATE KEY UPDATE feature for this use case:
  1. We need to use WITHIN primary_key_idx_name, but we don't
necessarily know the primary key index name of the table. It would be
extremely useful to have support for WITHIN PRIMARY KEY syntax.
  2. We need to know if the row was updated or if it was created
(post_save signal needs this information). Looking at the spec, it seems
this is possible to do by issuing a query:
   INSERT ... ON DUPLICATE KEY UPDATE RETURNING primary_key_col;
and then, if nothing is returned, we know it was an update. However, I
see this way as problematic. if PostgreSQL ever wants to allow returning
the updated values on conflict, then using RETURNING primary_key_col;
wouldn't work any more. It seems somewhat likely that somebody will want
to add support for RETURNING for the ON CONFLICT UPDATE case later on.

Using the proposed feature for create_or_update() method isn't that easy
- the problem is that the user is free to specify any filtering for the
model to be upserted. We would need to check if the filtering matches
some unique index exactly, and if it does, then do an INSERT ON
DUPLICATE KEY UPDATE WITHING uq_index_name, but we don't know the index
name.

Still possible use case is some sort of "bulk merge" operation. The
operation would be defined as:
  - It takes in a list of model instances
  - For those models which have primary key set:
    - If the database has a row with same pk, that row is updated
    - Otherwise a new row is inserted
  - Those models that do not have primary key set are bulk inserted to
the database.

A good use case is for example updating employee table from an external
resource, or just loading test data from a file (AKA fixture loading).

If we could use WITHIN PRIMARY KEY, and have better knowledge of which
rows were inserted and which updated then it seems the proposed feature
would match the bulk merge's primary key set use case perfectly.

Even if MySQL has the ON DUPLICATE KEY UPDATE feature we haven't yet
used it. Quickly checking, it seems we can't use it, because we can't
define to use only the primary key index of the table for conflict
checking.

Some non-Django review comments:
  - I am not sure what exactly the "ON CONFLICT UPDATE also optionally
accepts a WITHIN clause..." section of the docs means.
  - I am not sure what exactly is supposed to happen if you do:
   INSERT INTO author(name, age) VALUES ('Anssi', 33) ON CONFLICT UPDATE
SET name = CONFLICTING(name), age = CONFLICTING(age);
   and author has separate unique indexes on name and age, and finally
two rows with values ('Tom', 33), and ('Anssi', 30) exists. If I am
reading the section mentioned in the first item correctly, then one of
the conflicting indexes is chosen as the source of the conflict, and
that row's value is then updated. Is this effectively random behavior a
good API?
  - Wild suggestion: Maybe it would be better to default to the PRIMARY
KEY index of the table. If no PK index exists, the user must specify
which unique index to use. Maybe there shouldn't be a possibility to
specify more than one unique index?
  - I assume there is a good reason to use CONFLICTING(id) instead of
CONFLICTING.id in the syntax?
  - I didn't see tests for expression or partial indexes in the patches.
Are partial unique indexes supported?

 - Anssi

On Sat, 2014-09-27 at 16:01 -0700, Peter Geoghegan wrote:
> Hello,
> 
> I am a PostgreSQL major contributor, currently undertaking development
> of a feature sometimes called "UPSERT" for PostgreSQL.
> 
> Jacob Kaplan-Moss is an acquaintance and co-worker, and I know that
> it's certainly something that interests him personally - presumably he
> has some ideas on how an UPSERT-like feature could be used by Django.
> Beyond that, I don't have a good sense of what the level of interest
> is in the Django community. I would like to reach out and get feedback
> on the syntax that I've proposed on the pgsql-hackers development
> list. Certainly, I've put a lot of work into this, and expect to put a
> lot more in, so obviously it's important that we not make any missteps
> with user-visible semantics. I've used Django a bit in the past, and
> my general impression is that the Django development community has
> made sound technical decisions, so I'm happy to take your concerns as
> representative of the concerns of ORM authors and web frameworks
> generally.
> 
> This is a non-standard syntax that is somewhat comparable to MySQL's
> INSERT ... ON DUPLICATE KEY UPDATE, while being more flexible and
> safe. The SQL standard's MERGE statement is kind of weird, and has a
> lot of baggage from trying to serve multiple use-cases, so I avoided
> trying to implement that, preferring to focus on something that has
> simple atomic insert-or-update semantics - what people seem to
> actually want.
> 
> I would like to get a sense of:
> 
> * Would you consider the syntax that I've proposed a good one?
> 
> * If it was available, would you use it in future versions of Django?
> Do you think the plugins ecosystem could take advantage of it sooner?
> Is the demand there?
> 
> * What could be better about the feature?
> 
> * How are you using the MySQL feature (INSERT ... ON DUPLICATE KEY
> UPDATE), if at all? Do you think it would be useful to have some
> degree of compatibility with it? Why or why not?
> 
> I've built html user-facing documentation, which is publicly
> available. This page is probably of most interest -- there is
> extensive discussion of the new "ON CONFLICT UPDATE/IGNORE" clause:
> 
> http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/on-conflict-docs/sql-insert.html
> 
> These may be of some interest (search for "on conflict" to see the new
> references to the feature):
> 
> http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/on-conflict-docs/transaction-iso.html
> http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/on-conflict-docs/ddl-inherit.html
> http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/on-conflict-docs/sql-createrule.html
> http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/on-conflict-docs/trigger-definition.html
> http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/on-conflict-docs/sql-createtrigger.html
> http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/on-conflict-docs/index-unique-checks.html
> http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/on-conflict-docs/sql-createview.html
> http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/on-conflict-docs/postgres-fdw.html
> 
> If you feel like actually trying out my patch, you can download the
> latest version of the patchset from:
> 
> http://www.postgresql.org/message-id/CAM3SWZRvkCKc=1Y6_Wn8mk97_Vi8+j-aX-RY-=msrjvu-ec...@mail.gmail.com
> 
> You'll need to build PostgreSQL from the git master branch (which
> includes dependencies on things like Flex and Bison), with the patch
> set applied. There are guides to doing this on the internet, including
> this one: http://www.postgresql.org/docs/devel/static/sourcerepo.html
> 
> Thanks!
> -- 
> Peter Geoghegan
> 


-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/1411977381.14923.64.camel%40TTY32.
For more options, visit https://groups.google.com/d/optout.

Re: RFC: "UPSERT" in PostgreSQL

Reply via email to