Re: [GENERAL] Why is unique constraint needed for upsert? (treat atomicity as optional)
On 7/23/14 6:03 PM, John R Pierce wrote: On 7/23/2014 1:45 PM, Seamus Abshere wrote: What if we treat atomicity as optional? atomicity is not and never will be optional in PostgreSQL. I'm wondering what a minimal definition of upsert could be - possibly separating concurrency handling out as a (rigorously defined) option for those who need it. -- no guarantees, no index required UPSERT age = 5 INTO dogs WHERE name = 'Jerry'; and if there's several rows with name='Jerry', you'd want to update them ALL ? if name isn't indexed, this will, as Tom suggests, require a FULL table scan, and it still will have issues with concurrency Ah, I was just saying, in terms of correctness, it seems to me that upsert shouldn't NEED a index to work, just like you don't need an index on "name" when you say WHERE name = 'Jerry' in SELECTs or INSERTS or UPDATES. Appreciate the defense of data integrity in any case!! Best, Seamus -- Seamus Abshere, SCEA https://github.com/seamusabshere -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why is unique constraint needed for upsert? (treat atomicity as optional)
On 7/23/2014 1:45 PM, Seamus Abshere wrote: What if we treat atomicity as optional? You could have extremely readable syntax like: atomicity is not and never will be optional in PostgreSQL. -- no guarantees, no index required UPSERT age = 5 INTO dogs WHERE name = 'Jerry'; and if there's several rows with name='Jerry', you'd want to update them ALL ? if name isn't indexed, this will, as Tom suggests, require a FULL table scan, and it still will have issues with concurrency (connection scans table, finds nothing, starts to insert, user 2 scans table, finds nothing, starts to insert, poof, now we have two records?!?). If name *is* indexed and unique, this collision will cause a error at commit for at least one of those connections. -- john r pierce 37N 122W somewhere on the middle of the left coast -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why is unique constraint needed for upsert? (treat atomicity as optional)
Seamus Abshere writes: > On 7/23/14 3:40 PM, Tom Lane wrote: >> For the OP's benefit --- the subtext John left unstated is that the >> unique-key mechanism has already solved the problem of preventing >> concurrent updates from creating duplicate keys. > What if we treat atomicity as optional? You'll get a *much* warmer response to that kind of suggestion from MongoDB or MySQL, no doubt. PG is not in the business of optional data integrity. > I just think there are a lot of non-concurrent bulk loading and > processing workflows that could benefit from the performance advantages > of upsert (one trip to database). What exactly is your argument for supposing that an UPSERT without an underlying index would perform so well? It seems much more likely that it'd suck, because of having to do full-table scans to look for existing rows. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why is unique constraint needed for upsert? (treat atomicity as optional)
On 7/23/14 3:40 PM, Tom Lane wrote: John R Pierce writes: On 7/23/2014 10:21 AM, Seamus Abshere wrote: Upsert is usually defined [1] in reference to a violating a unique key: Is this theoretically preferable to just looking for a row that matches certain criteria, updating it if found or inserting otherwise? what happens when two connections do this more or less concurrently, in transactions? For the OP's benefit --- the subtext John left unstated is that the unique-key mechanism has already solved the problem of preventing concurrent updates from creating duplicate keys. If we build a version of UPSERT that doesn't rely on a unique index then it'll need some entirely new mechanism to prevent concurrent key insertion. (And if you don't care about concurrent cases, you don't really need UPSERT ...) hi all, What if we treat atomicity as optional? You could have extremely readable syntax like: -- no guarantees, no index required UPSERT age = 5 INTO dogs WHERE name = 'Jerry'; -- optionally tell us how you want to deal with collision UPSERT age = 3 INTO dogs WHERE name = 'Jerry' ON DUPLICATE KEY KEEP NEWEST; UPSERT age = 5 INTO dogs WHERE name = 'Jerry' ON DUPLICATE KEY KEEP NEWEST; -- only **require** (by throwing an error) a unique index or a locked table for queries like UPSERT age = age+1 INTO dogs WHERE name = 'Jerry'; Obviously this flies in the face of what most people say the "fundamental Upsert property" is [1] At READ COMMITTED isolation level, you should always get an atomic insert or update [1] I just think there are a lot of non-concurrent bulk loading and processing workflows that could benefit from the performance advantages of upsert (one trip to database). Best, thanks, Seamus [1] http://www.pgcon.org/2014/schedule/events/661.en.html -- Seamus Abshere, SCEA https://github.com/seamusabshere -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general