On Wed, Dec 17, 2014 at 1:12 PM, Heikki Linnakangas <hlinnakan...@vmware.com> wrote: > It looks like we are close to reaching consensus on the syntax. Phew! Thanks > for maintaining the wiki pages and the documentation. All of the below is > based on those, I haven't looked at the patch itself yet.
Great, thanks! Yes, I am relieved that we appeared to have agreed on a syntax. > The one thing that I still feel uneasy about is the Unique Index Inference > thing. > The documentation says that: > >> Omitting the specification indicates a total indifference to where >> any would-be uniqueness violation could occur, which isn't always >> appropriate; > Some questions: > > 1. Does that mean that if you leave out the key columns, the insertion is > IGNOREd if it violates *any* unique key constraint? Yes. This is particularly important for the implementation of things like IGNORE's updatable view support. More generally, for various ETL use cases it's possible to imagine the user simply not caring. > 2. If you do specify the key columns, then the IGNORE path is taken only if > the insertion violates a unique key constraint on those particular columns. > Otherwise an error is thrown. Right? That's right. > Now, let's imagine a table like this: > > CREATE TABLE persons ( > username text unique, > real_name text unique, > data text > ); > > Is there any way to specify both of those constraints, so that the insertion > is IGNOREd if it violates either one of them? If you try to do: > > INSERT INTO persons(username, real_name, data) > VALUES('foobar', 'foo bar') > ON CONFLICT (username, real_name) IGNORE; > > It will fail because there is no unique index on (username, real_name). In > this particular case, you could leave out the specification, but if there > was a third constraint that you're not expecting to conflict with, you would > want violations of that constraint to still throw an error. And you can't > leave out the specification with ON CONFLICT UPDATE anyway. Good point. For the IGNORE case: I guess the syntax just isn't that flexible. I agree that that isn't ideal. For the UPDATE case: Suppose your example was an UPDATE where we simply assigned the excluded.data value to the data column in the auxiliary UPDATE's targetlist. What would the user really be asking for with that command, at a really high level? It seems like they might actually want to run two UPSERT commands (one for username, the other for real_name), or rethink their indexing strategy - in particular, whether it's appropriate that there isn't a composite unique constraint on (username, real_name). Now, suppose that by accident or by convention it will always be possible for a composite unique index to be built on (username, real_name) - no dup violations would be raised if it was attempted, but it just hasn't been and won't be. In other words, it's generally safe to actually pretend that there is one. Then, surely it doesn't matter if the user picks one or the other unique index. It'll all work out when the user assigns to both in the UPDATE targetlist, because of the assumed convention that I think is implied by the example. If the convention is violated, at least you get a dup violation letting you know (iff you bothered to assign). But I wouldn't like to encourage that pattern. I think that the long and the short of it is that you really ought to have one unique index as an arbiter in mind when writing a DML statement for the UPDATE variant. Relying on this type of convention is possible, I suppose, but ill-advised. > 3. Why is the specification required with ON CONFLICT UPDATE, but not with > ON CONFLICT IGNORE? That was a fairly recent decision, taken mainly to keep Kevin happy -- although TBH I don't recall that he was particularly insistent on that restriction. I could still go either way on that question. The idea, as I mentioned, is that it's legitimate to not care where a dup violation might occur for certain ETL use cases. For UPSERT, though, the only argument for not making it mandatory is that that's something extra to type, and lazy people would prefer not to bother. This is because we assume that the first dup violation is the only possible one without the unique index inference clause. If we don't have the index expressions with which to infer an arbiter unique index (as with MySQL's ON DUPLICATE KEY UPDATE), you'd better be sure that you accounted for all possible sources of would-be duplicate violations - otherwise a random row will be updated! That isn't a fantastic argument for not making a unique index inference clause mandatory, but it might be an okay one. > 4. What happens if there are multiple unique indexes with identical columns, > and you give those columns in the inference specification? Doesn't matter > which index you use, I guess, if they're all identical, but see next > question. It doesn't really matter which one you pick, but right now, at the urging of Robert, we cost the list of candidates and pick the cheapest iff there is more than one [1] (and error if there are none). This is roughly similar to costing of indexes for CLUSTER, and occurs during optimization (only parse analysis of the expressions associated with the unique index inference clause occurs during parse analysis - indexes are looked up and matched in the optimizer). > 5. What if there are multiple unique indexes with the same columns, but > different operator classes? I thought about that. I am reusing a little bit of the CREATE INDEX infrastructure for raw parsing, and for a small amount of parse analysis (conveniently, this makes the command reject things like aggregate functions with no additional code - the error messages only mention "index expressions", so I believe that's fine). This could include an opclass specification, but right now non-default opclasses are rejected during extra steps in parse analysis, for no particular reason. I could easily have the unique index inference specification accept a named opclass, if you thought that was important, and you thought naming a non-default opclass by name was a good SQL interface. It would take only a little effort to support non-default opclasses. > 6. Why are partial unique indexes not supported as arbitrators? Robert and I discussed this quite a bit -- it was the argument for being able to name a unique index by name (not that I'm very happy with that idea or anything) [2]. Basically, dealing with the possible behaviors with before row insert triggers might in general greatly complicate the implementation, even though the issues we'd then be protected against would seldom arise. Robert seemed to think that we could revisit this in a future version [3]. Note that IGNORE will still IGNORE any partial unique index -- it just won't accept one as the sole arbiter of whether or not the IGNORE path should be taken (so it's really the inference specification syntax that doesn't accept partial unique indexes, just as it doesn't accept updatable views, exclusion constraints, and inheritance parents where the semantics are similarly iffy -- that's both the reason for and the mechanism by which ON CONFLICT UPDATE does not support these things). [1] http://www.postgresql.org/message-id/cam3swzqz+jykwfuzvcsf0qtpa2qiy+8ngnchjfwgz3ddzrf...@mail.gmail.com [2] http://www.postgresql.org/message-id/cam3swzq8tdpdjiwj_fw4ao8gevpyiixwbe67ovquufpj+y1...@mail.gmail.com [3] http://www.postgresql.org/message-id/CA+TgmoZgLgY2PBAMTY3T1jpYXAvNL-w=t6o+6pmqrvr+vn-...@mail.gmail.com -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers