Re: New addition to the merge sql standard

2023-11-20 Thread Nick DeCoursin

It might be pertinent for me to reference this passage from the
documentation on READ COMMITTED semantics:

> MERGE allows the user to specify various combinations of INSERT, UPDATE and
> DELETE subcommands. A MERGE command with both INSERT and UPDATE subcommands
> looks similar to INSERT with an ON CONFLICT DO UPDATE clause but does not
> guarantee that either INSERT or UPDATE will occur. If MERGE attempts an
> UPDATE or DELETE and the row is concurrently updated but the join
> condition still passes for the current target and the current source tuple,
> then MERGE will behave the same as the UPDATE or DELETE commands and
> perform its action on the updated version of the row. *However, because
> MERGE can specify several actions and they can be conditional, the
> conditions for each action are re-evaluated on the updated version of the
> row, starting from the first action, even if the action that had originally
> matched appears later in the list of actions.* On the other hand, if the
> row is concurrently updated or deleted so that the join condition fails,
> then MERGE will evaluate the condition's NOT MATCHED actions next, and
> execute the first one that succeeds. If MERGE attempts an INSERT and a
> unique index is present and a duplicate row is concurrently inserted, then
> a uniqueness violation error is raised; MERGE does not attempt to avoid
> such errors by restarting evaluation of MATCHED conditions.
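
To illustrate the emphasized sentence, here is a sketch (the `inventory`
and `staged_changes` tables are my own invention, not from the docs) of a
MERGE with several conditional actions:

    MERGE INTO inventory AS t
    USING staged_changes AS s
      ON t.sku = s.sku
    WHEN MATCHED AND t.qty + s.delta <= 0 THEN
      -- Conditions are re-checked from the top against the updated row:
      -- a concurrent update that lowers t.qty can make this DELETE fire
      -- even if the unconditional UPDATE below matched originally.
      DELETE
    WHEN MATCHED THEN
      UPDATE SET qty = t.qty + s.delta
    WHEN NOT MATCHED THEN
      INSERT (sku, qty) VALUES (s.sku, s.delta);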

With this in mind, the `merge` statement doesn't block on concurrent
inserts, nor does it need to: the merge semantics already imply that there
is no blocking or waiting. It follows that the insert within merge doesn't
need to block or wait either; only at the moment it actually performs the
insert can it fail, and at that point it could simply behave like `on
conflict do nothing`. Therefore, due to the original merge semantics, the
insert within merge doesn't need to follow the same requirements as a
normal `insert`, and it doesn't need to *wait* - it's best effort.
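
In other words (reusing the same hypothetical tables as above, and assuming
a unique constraint on `sku`), plain insert can already skip conflicting
rows, while merge's insert cannot:

    -- Plain INSERT can skip rows that conflict, without failing:
    INSERT INTO inventory (sku, qty)
    SELECT sku, delta FROM staged_changes
    ON CONFLICT (sku) DO NOTHING;

    -- The MERGE equivalent instead raises a unique violation if another
    -- transaction concurrently inserts the same sku:
    MERGE INTO inventory AS t
    USING staged_changes AS s
      ON t.sku = s.sku
    WHEN NOT MATCHED THEN
      INSERT (sku, qty) VALUES (s.sku, s.delta);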

In my opinion, `merge` is meant for batch operations on large datasets,
and that's the best way to think about it. It's not meant for perfectly
serializable workloads; it's meant for moving huge datasets efficiently on
a best-effort basis.

Cheers,
Nick

On Thu, Nov 16, 2023 at 6:13 PM Alvaro Herrera wrote:

> On 2023-Nov-16, Nick DeCoursin wrote:
>
> > In my opinion, it would be better for merge to offer the functionality to
> > simply ignore the rows that cause unique violation exceptions instead of
> > tanking the whole query.
>
> "ignore" may not be what you want, though.  Perhaps the fact that insert
> (coming from the NOT MATCHED clause) fails (== conflicts with a tuple
> concurrently inserted in an unique or exclusion constraint) should
> transform the row operation into a MATCHED case, so it'd fire the other
> clauses in the overall MERGE operation.  Then you could add a WHEN
> MATCHED DO NOTHING case which does the ignoring that you want; or just
> let them be handled by WHEN MATCHED UPDATE or whatever.  But you may
> need some way to distinguish rows that appeared concurrently from rows
> that were there all along.
>
> In regards to the SQL standard, I hope what you're saying is merely not
> documented by them.  If it indeed isn't, it may be possible to get them
> to accept some new behavior, and then I'm sure we'd consider
> implementing it.  If your suggestion goes against what they already
> have, I'm afraid you'd be doomed.  So the next question is, how do other
> implementations handle this case you're talking about?  SQL Server, DB2
> and Oracle being the relevant ones.
>
> Assuming the idea is good and there are no conflicts, then maybe it's
> just lack of round tuits.
>
> Happen to have some?
>
> I vaguely recall thinking about this, and noticing that implementing
> something of this sort would require messing around with the ExecInsert
> interface.  It'd probably require splitting it in pieces, similar to how
> ExecUpdate was split.
>
> There are some comments in the code about possible "live-locks" where
> merge would be eternally confused between inserting a new row which it
> then wants to delete; or something like that.  For sure we would need to
> understand the concurrent behavior of this new feature very clearly.
>
>
> An interesting point is that our inserts *wait* to see whether the
> concurrent insertion commits or aborts, when a unique constraint is
> involved.  I'm not sure you want to have MERGE blocking on concurrent
> inserts.  This is all assuming READ COMMITTED semantics; on REPEATABLE
> READ or higher, I think you're just screwed, because of course MERGE is
> not going to get a snapshot that sees the rows inserted by transactions
> that started after.
>
> You'd need to explore all this very carefully.
>
> --
> Álvaro Herrera         Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
>


New addition to the merge sql standard

2023-11-16 Thread Nick DeCoursin
Dear Postgres Administrators,

There was a great article about `merge` by Lukas Fittl here:
https://pganalyze.com/blog/5mins-postgres-15-merge-vs-insert-on-conflict

In his article, he highlights one of the severe disadvantages of merge:

> The comment that he essentially made is that the downside of MERGE's
> handling of concurrency is that when you concurrently INSERT, so at the
> same time as you're executing the MERGE statement, there is another INSERT
> going on, then MERGE might not notice that. MERGE would go into its INSERT
> logic, and then it would get a unique violation.


This means that any individual row inserted during the merge query's
insert logic can cause a unique violation and tank the entire merge query.
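
A minimal sketch of the race, with a made-up table and two concurrent
sessions:

    CREATE TABLE target (id int PRIMARY KEY, val text);

    -- Session A: id = 1 is absent from A's snapshot, so MERGE takes the
    -- NOT MATCHED branch and attempts the insert.
    MERGE INTO target AS t
    USING (VALUES (1, 'a')) AS s(id, val)
      ON t.id = s.id
    WHEN NOT MATCHED THEN
      INSERT (id, val) VALUES (s.id, s.val);

    -- Session B: commits this in between.
    INSERT INTO target (id, val) VALUES (1, 'b');

    -- Session A then aborts the whole MERGE:
    -- ERROR:  duplicate key value violates unique constraint "target_pkey"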

I explained this in more detail here:
https://stackoverflow.com/questions/77479975/postgres-merge-silently-ignore-unique-constraint-violation

In my opinion, it would be better for merge to offer the functionality to
simply ignore the rows that cause unique violation exceptions instead of
tanking the whole query.
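
For illustration only, such an option could look something like this
hypothetical clause (not valid syntax in any PostgreSQL release):

    MERGE INTO target AS t
    USING source AS s
      ON t.id = s.id
    WHEN NOT MATCHED THEN
      INSERT (id, val) VALUES (s.id, s.val)
      ON CONFLICT DO NOTHING;  -- hypothetical: skip the row instead of
                               -- raising a unique violation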

Thank you,
Nick