Re: [HACKERS] Making joins involving ctid work for the benefit of UPSERT

David G Johnston Wed, 23 Jul 2014 15:49:34 -0700

Peter Geoghegan-3 wrote
>> with semantics like this:
>>
>> 1. Search the table, using any type of scan you like, for a row
>> matching the given predicate.
> 
> Perhaps I've misunderstood, but this is fundamentally different to
> what I'd always thought would need to happen. I always understood that
> the UPSERT should be insert-driven, and so not really have an initial
> scan at all in the same sense that every Insert lacks one. Moreover,
> I've always thought that everyone agreed on that. We go to insert, and
> then in the course of inserting index tuples do something with dirty
> snapshots. That's how we get information on conflicts.


SQL standard not-withstanding there really is no way to pick one
implementation and not have it be sub-optimal in some situations.  Unless
there is a high barrier why not introduce syntax:

SCAN FIRST; INSERT FIRST

that allows the user to specify the behavior that they expect would be most
efficient given their existing/new data ratio?


>> Having said all that, I believe the INSERT ON CONFLICT syntax is more
>> easily comprehensible than previous proposals.  But I still tend to
>> agree with Andres that an explicit UPSERT syntax or something like it,
>> that captures all of the MVCC games inside itself, is likely
>> preferable from a user standpoint, whatever the implementation ends up
>> looking like.
> 
> Okay then. If you both feel that way, I will come up with something
> closer to what you sketch. But for now I still strongly feel it ought
> to be driven by an insert. Perhaps I've misunderstood you entirely,
> though.

Getting a little syntax crazy here but having all of:

UPSERT [SCAN|INSERT] FIRST
INSERT ON CONFLICT UPDATE - same as INSERT FIRST
UPDATE ON MISSING INSERT - same as SCAN FIRST

with the corresponding 2 implementations would make the user interface
slightly more complicated but able to be conformed to the actual data that
the user has.


You could basically perform a two-phase pass where you run the
user-requested algorithm and then for all failures attempt the alternate
algorithm and then error if both fail.

I am not at all fluent on the concurrency issues here, and the MVCC
violations and re-tries that might be considered, but at a high-level there
is disagreement here simply because both answers are "correct" and ideally
both can be provided to the user.

David J.






--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Making-joins-involving-ctid-work-for-the-benefit-of-UPSERT-tp5811919p5812640.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Making joins involving ctid work for the benefit of UPSERT

Reply via email to