Re: Guidance on INSERT RETURNING order
Sorry, I may have jumped to a conclusion that's not quite correct. On Mon, 17 Apr 2023 at 23:58, Federico wrote: > On Tue, 18 Apr 2023 at 00:21, John Howroyd > wrote: > > ... > > > > Personally, I didn't see Frederico's comment as anything to do with > order; just how one could output additional values in the RETURNING clause > (namely, v.num from a subexpression of the SELECT but in whatever order it > comes). On the other hand, that seems a lot more complicated to me because > it is not an expression in the overall SELECT feeding the INSERT, whereas > the WITH ORDINALITY is a specific declaration to match input order with > output order by inserting a counter. > > I didn't mean to suggest any particular order should be kept by insert > or by returning. I was merely commenting on the David G. Johnston > reply > > I suppose breaking the restriction that only columns present on > the insertion-table can be returned is a possible option that also > solves another infrequent request. > > ... > Best, > Federico > This might be a possibility. The v.num (from the original example) is accessible in the outer select, so one can a envisage a system to handle this, but at (presumably) much greater expense: preparation of the SELECT, orchestration of the output rows (some fields for INSERT some for RETURNING) and (presumably) a whole load of mem copies to RETURNING. Is this something to consider (perhaps just for feasibility while writing an initial patch based on WITH ORDINALITY)? To put it another way, v.num is a valid expression in the overall SELECT, but still a lot more complicated (and presumably expensive).
Re: Guidance on INSERT RETURNING order
PS: Sorry, I haven't yet thought how this might work with UPDATE or MERGE, but if I am on the right track with INSERT I'll give this some thought. On Mon, 17 Apr 2023 at 18:48, John Howroyd wrote: > May I clarify the ideas being discussed so far, perhaps with a view to > make a relevant proposal. My apologies if I get anything wrong or go too > far. > > As I understand it the proposal is to supplement the syntax to something > like: > > INSERT INTO table (a, b, c) > VALUES ((1,2,3), (4,5,6), ...) > WITH ORDINALITY > RETURNING table.id, ordinality > ; > > The meaning of which is to adjoin an ordinality column to the output > reflecting the declaration order in the values clause. So an output of > (not necessarily in any order): > (1001, 1) > (1003, 2) > means that table.id = 1001 was assigned to the inserted row from tuple > (1,2,3) (from VALUES, because that table.id is associated to ordinality = > 1) and table.id = 1003 was assigned to the inserted row from tuple > (4,5,6). The output being ordered as determined by the internals of query > execution (not necessarily the one shown). > > Is that correct? > > I presume (although, not quite so clear) that one would have: > > INSERT INTO table (a, b, c) > SELECT a_val, b_val, c_val > FROM joined_tables > WHERE some_condition > ORDER BY something_relevant > WITH ORDINALITY > RETURNING table.id, ordinality > ; > > The meaning being very much as before replacing 'declaration order' by > 'row order of the SELECT statement as defined by the ORDER BY clause'; so > pretty much like a row_number() but in the output of the RETURNING clause > (and without an OVER modification). I added the ORDER BY clause as I don't > really see what this would mean without it; but this (presumably) does not > affect output order only the order of the incoming rows (and hence the > generation of the ordinality output). > > Is that correct? > > Might there be a natural syntax to label the 'ordinality' output column? > Perhaps something like: > > ... > WITH ORDINALITY (col_name) > RETURNING table.id, col_name > ; > > I don't want to clash with the syntax for Table Functions. > > Is it a step too far to propose allowing an additional ORDER BY clause > after the RETURNING clause (a specific declaration for the query execution > to assign cpu cycles; especially if the WITH ORDINALITY is not tied to > output order)? > > Personally, I didn't see Frederico's comment as anything to do with order; > just how one could output additional values in the RETURNING clause > (namely, v.num from a subexpression of the SELECT but in whatever order it > comes). On the other hand, that seems a lot more complicated to me because > it is not an expression in the overall SELECT feeding the INSERT, whereas > the WITH ORDINALITY is a specific declaration to match input order with > output order by inserting a counter. > > Apologies, if I have misunderstood or invented something that's not > possible! >
Re: Guidance on INSERT RETURNING order
May I clarify the ideas being discussed so far, perhaps with a view to make a relevant proposal. My apologies if I get anything wrong or go too far. As I understand it the proposal is to supplement the syntax to something like: INSERT INTO table (a, b, c) VALUES ((1,2,3), (4,5,6), ...) WITH ORDINALITY RETURNING table.id, ordinality ; The meaning of which is to adjoin an ordinality column to the output reflecting the declaration order in the values clause. So an output of (not necessarily in any order): (1001, 1) (1003, 2) means that table.id = 1001 was assigned to the inserted row from tuple (1,2,3) (from VALUES, because that table.id is associated to ordinality = 1) and table.id = 1003 was assigned to the inserted row from tuple (4,5,6). The output being ordered as determined by the internals of query execution (not necessarily the one shown). Is that correct? I presume (although, not quite so clear) that one would have: INSERT INTO table (a, b, c) SELECT a_val, b_val, c_val FROM joined_tables WHERE some_condition ORDER BY something_relevant WITH ORDINALITY RETURNING table.id, ordinality ; The meaning being very much as before replacing 'declaration order' by 'row order of the SELECT statement as defined by the ORDER BY clause'; so pretty much like a row_number() but in the output of the RETURNING clause (and without an OVER modification). I added the ORDER BY clause as I don't really see what this would mean without it; but this (presumably) does not affect output order only the order of the incoming rows (and hence the generation of the ordinality output). Is that correct? Might there be a natural syntax to label the 'ordinality' output column? Perhaps something like: ... WITH ORDINALITY (col_name) RETURNING table.id, col_name ; I don't want to clash with the syntax for Table Functions. Is it a step too far to propose allowing an additional ORDER BY clause after the RETURNING clause (a specific declaration for the query execution to assign cpu cycles; especially if the WITH ORDINALITY is not tied to output order)? Personally, I didn't see Frederico's comment as anything to do with order; just how one could output additional values in the RETURNING clause (namely, v.num from a subexpression of the SELECT but in whatever order it comes). On the other hand, that seems a lot more complicated to me because it is not an expression in the overall SELECT feeding the INSERT, whereas the WITH ORDINALITY is a specific declaration to match input order with output order by inserting a counter. Apologies, if I have misunderstood or invented something that's not possible!
Re: Guidance on INSERT RETURNING order
A patch for what? All my testing gives me the same output order as the declaration order. Does anyone have an example where this is not the case? The problem is that SQLAlchemy is an ORM and they need to be sure to match records from the insert to the relevant code side objects. This needs to be efficient as the majority of the python world will use this ORM for their database needs. There is no PostgreSQL documentation upon which to give this assertion. So what is really needed is for PostgreSQL to commit to this behaviour, or give guidance on how this might be achieved. On Fri, 14 Apr 2023 at 19:50, David G. Johnston wrote: > On Fri, Apr 14, 2023 at 11:42 AM John Howroyd > wrote: > >> @PostgreSQL: Might I ask if this is still being actively considered or >> should we repost to another mailing list (perhaps pgsql-hackers or any >> other you might suggest)? >> > > This is the right place for such a discussion. Unless you think you've > come up with a design that you want to propose and that, if accepted, you > would then write up a patch for. For that you'd probably want to move the > discussion to -hackers with an outline of the patch you plan to write. > > David J. > >
Re: Guidance on INSERT RETURNING order
@PostgreSQL: Might I ask if this is still being actively considered or should we repost to another mailing list (perhaps pgsql-hackers or any other you might suggest)?
Re: Guidance on INSERT RETURNING order
A few more thoughts. My " simply marshal the RETURNING results through and in memory/temporary table maintaining the order from the SELECT and return these to the client upon commit" isn't quite right with the "upon commit": that (the commit) may happen much later with several statements wrapped into a transaction. So very much an internal decision about when those results happen to be returned. The sentiment is that the returning results are emitted as a last step in the processing of the statement / execution plan. Future statements in a multi-statement transaction may depend on the output. However, the marshalling is presumably already happening (except any tie to ordering to the original declared tuples); otherwise I really don't understand how the with syntax (I think first suggested by Adrian Klaver) would work. @SQLAlchemy, if you wanted to off load reordering to database side a combination solution might be considered: with rslt as ( INSERT INTO mytable (a, b) SELECT p1, p2 FROM (VALUES (10, 11, 1), (12, 13, 2)) AS sel(p1, p2, num) ORDER BY num RETURNING mytable.id, mytable.other_auto_gen ) select * from rslt order by id ; should work (not yet tested); and could be modified to the "sentinel" columns. Where the primary key is explicitly in the inserted columns (and you are not using an identity) then explicit matching would need to occur unless you are given an order guarantee. As for randomly generated GUID/UUID, I don't think anything will work other than a guarantee on order or a surrogate "sentinel" column/explicit tuple id. @Mike Bayer: Isn't the order guarantee sufficient (wasn't that the original assumption in the 2.0 code)? I do quite understand the wish for having one solution that fits all without dialectic dependencies. However, in my opinion, this one is going to run for some time and is heavily dependent on internals. With mssql, this seems to me to be deep down in the insert internals and unlikely to change any time soon (at least until they start losing market share because other DBMSs do it better). Here (PostgreSQL) the actual mechanisms required can probably be achieved much more readily (given the will). But the fundamental issue of matching code side objects with database records (with side effects) remains and is only going to become more prominent. @PostgreSQL, isn't this your opportunity to lead the way again!-). On Thu, 13 Apr 2023 at 05:26, Mike Bayer wrote: > here's what would be cool, a new token called "tuple_order" or something > > > INSERT INTO table (a, b, c) VALUES ((1, 2, 3), (4, 5, 6), ...) RETURNING > table.id, inserted.tuple_order > > tuple_order would be incrementing values 1, 2, 3, 4, 5, ... which > correlate the each row delivered by RETURNING to each entry in the VALUES > clause, in the order they were stated in that VALUES clause, that is entry > (1, 2, 3) would be tuple_order 1, entry (4, 5, 6) would be tuple order 2, > etc. > > That token would solve the entire problem in all cases. The 1800 line > changeset I've been working on all week would go away (if every database > had this) and the change would be reduced to a few dozen lines. > > > On Wed, Apr 12, 2023, at 11:05 PM, John Howroyd wrote: > > To clarify, the work around (for SQLAlchemy) does not require the > RETURNING values to be ordered; they will be reordered code side. Longer > term matching the order up would be an added bonus! > > On Thu, 13 Apr 2023 at 04:02, John Howroyd > wrote: > > The ideal solution would be that the RETURNING values are in the same > order as the declared rows in the table literal "FROM (VALUES (10, 11, 1), > (12, 13, 2)) AS sel(p1, p2, num)". But before any of you jump down my > throat about about sets and a bar to parallel inserts, I think it's a small > matter of some internal smoke and mirrors: simply marshal the RETURNING > results through and in memory/temporary table maintaining the order from > the SELECT and return these to the client upon commit. At this moment, I > believe the folk at SQLAlchemy are looking for a work around that can work > for most use cases with either an identity column or a fake "sentinel" > column for matching back to the ORM objects. There may be an upper limit > of 1000 to the number rows in a single insert (this arises in MS SQL > Server, who absolutely do not guarantee the order of their OUTPUT clause -- > it's tied to the physical row insert rather than marshalled). > > My feeling is that PostgreSQL may already do this. See the commit: > > https://github.com/postgres/postgres/commit/9118d03a8cca3d97327c56bf89a72e328e454e63 > and the original message thread > > https://www.postgresql.org/message-id/flat/CADBwkHv1nUx5bqxqRLp26k8iUWSnY3uW-1_0EMYxK
Re: Guidance on INSERT RETURNING order
To clarify, the work around (for SQLAlchemy) does not require the RETURNING values to be ordered; they will be reordered code side. Longer term matching the order up would be an added bonus! On Thu, 13 Apr 2023 at 04:02, John Howroyd wrote: > The ideal solution would be that the RETURNING values are in the same > order as the declared rows in the table literal "FROM (VALUES (10, 11, 1), > (12, 13, 2)) AS sel(p1, p2, num)". But before any of you jump down my > throat about about sets and a bar to parallel inserts, I think it's a small > matter of some internal smoke and mirrors: simply marshal the RETURNING > results through and in memory/temporary table maintaining the order from > the SELECT and return these to the client upon commit. At this moment, I > believe the folk at SQLAlchemy are looking for a work around that can work > for most use cases with either an identity column or a fake "sentinel" > column for matching back to the ORM objects. There may be an upper limit > of 1000 to the number rows in a single insert (this arises in MS SQL > Server, who absolutely do not guarantee the order of their OUTPUT clause -- > it's tied to the physical row insert rather than marshalled). > > My feeling is that PostgreSQL may already do this. See the commit: > > https://github.com/postgres/postgres/commit/9118d03a8cca3d97327c56bf89a72e328e454e63 > and the original message thread > > https://www.postgresql.org/message-id/flat/CADBwkHv1nUx5bqxqRLp26k8iUWSnY3uW-1_0EMYxKns0t2JnUA%40mail.gmail.com > and references to undesired behaviour prior to PostgreSQL 9.6 such as in > https://www.postgresql.org/docs/current/sql-select.html. > > The test cases in the above commit use: > SELECT nextval('sequence'), datavals FROM ... ORDER BY something; > type syntax. And I presume that's exactly what's happening internally > when there's a serial identity column (via query rewrite rules or another > mechanism). > > So really, that just needs confirming. What happens with triggers? > Presumably the same as long as query rewrite rules are used, but this is > presumably getting deeper into the code for actual inserts after the > initial select. The jump to the output being ordered, is just a guess but > there's some bits and pieces that seem to suggest that there may indeed be > a marshalling process going on in the background (whether or not that is > linked to the original order is another matter). > > I have set up a PostgreSQL server to test if I can break this hypothesis > and see what query explains can allude to. Does anyone have a test case > where the order is not preserved? > > Might I also point out that if the implementation of parallel inserts does > create a bar then doing so may end up with the programmatic interfaces > (such as SQLAlchemy) not being able to use that feature (possibly reverting > back to single inserts). Ur, so what would be the problem being solved > with parallel inserts? > > On Thu, 13 Apr 2023 at 02:27, Mike Bayer wrote: > >> We do something different for uuids. These are usually created client >> side anyway or are very easy to create client side if not and then we also >> have a separate sentinel column option. >> >> Overall the whole thing is an optimization that can be turned off for >> uncommon cases. We just can't go back to having insert of 1000 rows be >> 3-5x slower for all tables w server generated primary keys which is what >> statement at a time gets us. Surrogate integer Pks w sequence or identity >> is the big case. >> >> On Wed, Apr 12, 2023, at 8:58 PM, Thorsten Glaser wrote: >> > On Wed, 12 Apr 2023, Mike Bayer wrote: >> > >> >>ascending values for "mytable_id_seq" will correspond to the ascending >> >>values for "num". >> > >> > But, again, what if it’s uuid or something instead of a sequence? >> > >> > bye, >> > //mirabilos >> > -- >> > 15:41⎜ Somebody write a testsuite for >> helloworld :-) >> >> >>
Re: Guidance on INSERT RETURNING order
The ideal solution would be that the RETURNING values are in the same order as the declared rows in the table literal "FROM (VALUES (10, 11, 1), (12, 13, 2)) AS sel(p1, p2, num)". But before any of you jump down my throat about about sets and a bar to parallel inserts, I think it's a small matter of some internal smoke and mirrors: simply marshal the RETURNING results through and in memory/temporary table maintaining the order from the SELECT and return these to the client upon commit. At this moment, I believe the folk at SQLAlchemy are looking for a work around that can work for most use cases with either an identity column or a fake "sentinel" column for matching back to the ORM objects. There may be an upper limit of 1000 to the number rows in a single insert (this arises in MS SQL Server, who absolutely do not guarantee the order of their OUTPUT clause -- it's tied to the physical row insert rather than marshalled). My feeling is that PostgreSQL may already do this. See the commit: https://github.com/postgres/postgres/commit/9118d03a8cca3d97327c56bf89a72e328e454e63 and the original message thread https://www.postgresql.org/message-id/flat/CADBwkHv1nUx5bqxqRLp26k8iUWSnY3uW-1_0EMYxKns0t2JnUA%40mail.gmail.com and references to undesired behaviour prior to PostgreSQL 9.6 such as in https://www.postgresql.org/docs/current/sql-select.html. The test cases in the above commit use: SELECT nextval('sequence'), datavals FROM ... ORDER BY something; type syntax. And I presume that's exactly what's happening internally when there's a serial identity column (via query rewrite rules or another mechanism). So really, that just needs confirming. What happens with triggers? Presumably the same as long as query rewrite rules are used, but this is presumably getting deeper into the code for actual inserts after the initial select. The jump to the output being ordered, is just a guess but there's some bits and pieces that seem to suggest that there may indeed be a marshalling process going on in the background (whether or not that is linked to the original order is another matter). I have set up a PostgreSQL server to test if I can break this hypothesis and see what query explains can allude to. Does anyone have a test case where the order is not preserved? Might I also point out that if the implementation of parallel inserts does create a bar then doing so may end up with the programmatic interfaces (such as SQLAlchemy) not being able to use that feature (possibly reverting back to single inserts). Ur, so what would be the problem being solved with parallel inserts? On Thu, 13 Apr 2023 at 02:27, Mike Bayer wrote: > We do something different for uuids. These are usually created client > side anyway or are very easy to create client side if not and then we also > have a separate sentinel column option. > > Overall the whole thing is an optimization that can be turned off for > uncommon cases. We just can't go back to having insert of 1000 rows be > 3-5x slower for all tables w server generated primary keys which is what > statement at a time gets us. Surrogate integer Pks w sequence or identity > is the big case. > > On Wed, Apr 12, 2023, at 8:58 PM, Thorsten Glaser wrote: > > On Wed, 12 Apr 2023, Mike Bayer wrote: > > > >>ascending values for "mytable_id_seq" will correspond to the ascending > >>values for "num". > > > > But, again, what if it’s uuid or something instead of a sequence? > > > > bye, > > //mirabilos > > -- > > 15:41⎜ Somebody write a testsuite for helloworld > :-) > > >
Re: Guidance on INSERT RETURNING order
Hi All, The fundamental question here is: What does "insert into object(col1, ... , coln) select sel1, ... , seln from ... order by ord1, ... ordm returning val1, ..., valk" mean? In particular, what does the "order by" do in "insert ... select ... order by"? Will this be honoured with parallel inserts? Is that really too much to ask? Is a DBMS just a black box archive of data (throw it in and forget about it)? In my opinion, it's a system for the consistent and efficient storage and *retrieval* of data, to be used in the context of other software systems. Programmatically, that means matching up what's in the database with what's stored in memory. And yes, a DBMS needs to give guarantees! ACID compliance is one such guarantee, but it is not the only one required. Finally, please don't read this badly. It is meant purely to question where we want to go, I strongly support PostgreSQL and believe it has made many major contributions to the development of DBMS over the years. Many thanks, John. On Wed, 12 Apr 2023 at 11:10, Federico wrote: > On Wed, 12 Apr 2023 at 11:46, Thorsten Glaser wrote: > > > > On Tue, 11 Apr 2023, Federico wrote: > > > > >My understanding was that they are generated in select order > > > > But are they? (I don’t know, but I’d not assume that.) > > That's kind of the point for this question, to see if that's correct or > not. > > > >If my understanding is incorrect, would this alternative guarantee the > above > > > > >INSERT INTO t(id, data) > > >SELECT nextval(pg_get_serial_sequence('t', 'id')) data > > >FROM (VALUES ('a', 1), ('b', 2), ('c', 3)) as vv(data, num) > > >ORDER BY num > > >RETURNING id > > > > Wouldn’t, at that point, it be better to just send multiple > > individual INSERT statements? The overhead (on both sides) > > for all mentioned… workarounds… surely is larger than that? > > No, not by a long shot. Sending thousands of single inserts > sequentially over the network requires a lot more time even when doing > that on localhost. > Using a single statement is many times faster. > > Federico > > > bye, > > //mirabilos > > -- > > 15:41⎜ Somebody write a testsuite for helloworld > :-) > > > > > > > > > > > > >