Re: Guidance on INSERT RETURNING order

2023-04-18 Thread John Howroyd
Sorry, I may have jumped to a conclusion that's not quite correct.

On Mon, 17 Apr 2023 at 23:58, Federico  wrote:

> On Tue, 18 Apr 2023 at 00:21, John Howroyd 
> wrote:
> > ...
> >
> > Personally, I didn't see Frederico's comment as anything to do with
> order; just how one could output additional values in the RETURNING clause
> (namely, v.num from a subexpression of the SELECT but in whatever order it
> comes).  On the other hand, that seems a lot more complicated to me because
> it is not an expression in the overall SELECT feeding the INSERT, whereas
> the WITH ORDINALITY is a specific declaration to match input order with
> output order by inserting a counter.
>
> I didn't mean to suggest any particular order should be kept by insert
> or by returning. I was merely commenting on the David G. Johnston
> reply
>
>  I suppose breaking the restriction that only columns present on
> the insertion-table can be returned is a possible option that also
> solves another infrequent request.
>
>
...
> Best,
>   Federico
>

This might be a possibility.  The v.num (from the original example) is
accessible in the outer select, so one can a envisage a system to handle
this, but at (presumably) much greater expense: preparation of the SELECT,
orchestration of the output rows (some fields for INSERT some for
RETURNING) and (presumably) a whole load of mem copies to RETURNING.  Is
this something to consider (perhaps just for feasibility while writing an
initial patch based on WITH ORDINALITY)?

To put it another way, v.num is a valid expression in the overall SELECT,
but still a lot more complicated (and presumably expensive).


Re: Guidance on INSERT RETURNING order

2023-04-17 Thread John Howroyd
PS:  Sorry, I haven't yet thought how this might work with UPDATE or MERGE,
but if I am on the right track with INSERT I'll give this some thought.

On Mon, 17 Apr 2023 at 18:48, John Howroyd  wrote:

> May I clarify the ideas being discussed so far, perhaps with a view to
> make a relevant proposal.  My apologies if I get anything wrong or go too
> far.
>
> As I understand it the proposal is to supplement the syntax to something
> like:
>
> INSERT INTO table (a, b, c)
> VALUES ((1,2,3), (4,5,6), ...)
> WITH ORDINALITY
> RETURNING table.id, ordinality
> ;
>
> The meaning of which is to adjoin an ordinality column to the output
> reflecting the declaration order in the values clause.  So an output of
> (not necessarily in any order):
> (1001, 1)
> (1003, 2)
> means that table.id = 1001 was assigned to the inserted row from tuple
> (1,2,3) (from VALUES, because that table.id is associated to ordinality =
> 1) and table.id = 1003 was assigned to the inserted row  from tuple
> (4,5,6).  The output being ordered as determined by the internals of query
> execution (not necessarily the one shown).
>
> Is that correct?
>
> I presume (although, not quite so clear) that one would have:
>
> INSERT INTO table (a, b, c)
> SELECT a_val, b_val, c_val
> FROM joined_tables
> WHERE some_condition
> ORDER BY something_relevant
> WITH ORDINALITY
> RETURNING table.id, ordinality
> ;
>
> The meaning being very much as before replacing 'declaration order' by
> 'row order of the SELECT statement as defined by the ORDER BY clause'; so
> pretty much like a row_number() but in the output of the RETURNING clause
> (and without an OVER modification).  I added the ORDER BY clause as I don't
> really see what this would mean without it; but this (presumably) does not
> affect output order only the order of the incoming rows (and hence the
> generation of the ordinality output).
>
> Is that correct?
>
> Might there be a natural syntax to label the 'ordinality' output column?
> Perhaps something like:
>
> ...
> WITH ORDINALITY (col_name)
> RETURNING table.id, col_name
> ;
>
> I don't want to clash with the syntax for Table Functions.
>
> Is it a step too far to propose allowing an additional ORDER BY clause
> after the RETURNING clause (a specific declaration for the query execution
> to assign cpu cycles; especially if the WITH ORDINALITY is not tied to
> output order)?
>
> Personally, I didn't see Frederico's comment as anything to do with order;
> just how one could output additional values in the RETURNING clause
> (namely, v.num from a subexpression of the SELECT but in whatever order it
> comes).  On the other hand, that seems a lot more complicated to me because
> it is not an expression in the overall SELECT feeding the INSERT, whereas
> the WITH ORDINALITY is a specific declaration to match input order with
> output order by inserting a counter.
>
> Apologies, if I have misunderstood or invented something that's not
> possible!
>


Re: Guidance on INSERT RETURNING order

2023-04-17 Thread John Howroyd
May I clarify the ideas being discussed so far, perhaps with a view to make
a relevant proposal.  My apologies if I get anything wrong or go too far.

As I understand it the proposal is to supplement the syntax to something
like:

INSERT INTO table (a, b, c)
VALUES ((1,2,3), (4,5,6), ...)
WITH ORDINALITY
RETURNING table.id, ordinality
;

The meaning of which is to adjoin an ordinality column to the output
reflecting the declaration order in the values clause.  So an output of
(not necessarily in any order):
(1001, 1)
(1003, 2)
means that table.id = 1001 was assigned to the inserted row from tuple
(1,2,3) (from VALUES, because that table.id is associated to ordinality =
1) and table.id = 1003 was assigned to the inserted row  from tuple
(4,5,6).  The output being ordered as determined by the internals of query
execution (not necessarily the one shown).

Is that correct?

I presume (although, not quite so clear) that one would have:

INSERT INTO table (a, b, c)
SELECT a_val, b_val, c_val
FROM joined_tables
WHERE some_condition
ORDER BY something_relevant
WITH ORDINALITY
RETURNING table.id, ordinality
;

The meaning being very much as before replacing 'declaration order' by 'row
order of the SELECT statement as defined by the ORDER BY clause'; so pretty
much like a row_number() but in the output of the RETURNING clause (and
without an OVER modification).  I added the ORDER BY clause as I don't
really see what this would mean without it; but this (presumably) does not
affect output order only the order of the incoming rows (and hence the
generation of the ordinality output).

Is that correct?

Might there be a natural syntax to label the 'ordinality' output column?
Perhaps something like:

...
WITH ORDINALITY (col_name)
RETURNING table.id, col_name
;

I don't want to clash with the syntax for Table Functions.

Is it a step too far to propose allowing an additional ORDER BY clause
after the RETURNING clause (a specific declaration for the query execution
to assign cpu cycles; especially if the WITH ORDINALITY is not tied to
output order)?

Personally, I didn't see Frederico's comment as anything to do with order;
just how one could output additional values in the RETURNING clause
(namely, v.num from a subexpression of the SELECT but in whatever order it
comes).  On the other hand, that seems a lot more complicated to me because
it is not an expression in the overall SELECT feeding the INSERT, whereas
the WITH ORDINALITY is a specific declaration to match input order with
output order by inserting a counter.

Apologies, if I have misunderstood or invented something that's not
possible!


Re: Guidance on INSERT RETURNING order

2023-04-14 Thread John Howroyd
A patch for what?  All my testing gives me the same output order as the
declaration order.  Does anyone have an example where this is not the case?

The problem is that SQLAlchemy is an ORM and they need to be sure to match
records from the insert to the relevant code side objects.  This needs to
be efficient as the majority of the python world will use this ORM for
their database needs.  There is no PostgreSQL documentation upon which to
give this assertion.

So what is really needed is for PostgreSQL to commit to this behaviour, or
give guidance on how this might be achieved.

On Fri, 14 Apr 2023 at 19:50, David G. Johnston 
wrote:

> On Fri, Apr 14, 2023 at 11:42 AM John Howroyd 
> wrote:
>
>> @PostgreSQL:  Might I ask if this is still being actively considered or
>> should we repost to another mailing list (perhaps pgsql-hackers or any
>> other you might suggest)?
>>
>
> This is the right place for such a discussion.  Unless you think you've
> come up with a design that you want to propose and that, if accepted, you
> would then write up a patch for.  For that you'd probably want to move the
> discussion to -hackers with an outline of the patch you plan to write.
>
> David J.
>
>


Re: Guidance on INSERT RETURNING order

2023-04-14 Thread John Howroyd
@PostgreSQL:  Might I ask if this is still being actively considered or
should we repost to another mailing list (perhaps pgsql-hackers or any
other you might suggest)?


Re: Guidance on INSERT RETURNING order

2023-04-13 Thread John Howroyd
A few more thoughts.

My " simply marshal the RETURNING results through and in memory/temporary
table maintaining the order from the SELECT and return these to the client
upon commit" isn't quite right with the "upon commit": that (the commit)
may happen much later with several statements wrapped into a transaction.
So very much an internal decision about when those results happen to be
returned.  The sentiment is that the returning results are emitted as a
last step in the processing of the statement / execution plan.  Future
statements in a multi-statement transaction may depend on the output.

However, the marshalling is presumably already happening (except any tie to
ordering to the original declared tuples); otherwise I really don't
understand how the with syntax (I think first suggested by Adrian Klaver)
would work.  @SQLAlchemy, if you wanted to off load reordering to database
side a combination solution might be considered:

with rslt as (
  INSERT INTO mytable (a, b)
  SELECT p1, p2 FROM (VALUES (10, 11, 1), (12, 13, 2)) AS sel(p1, p2, num)
ORDER BY num
  RETURNING mytable.id, mytable.other_auto_gen
)
select * from rslt
order by id
;

should work (not yet tested); and could be modified to the "sentinel"
columns.  Where the primary key is explicitly in the inserted columns (and
you are not using an identity) then explicit matching would need to occur
unless you are given an order guarantee.  As for randomly generated
GUID/UUID, I don't think anything will work other than a guarantee on order
or a surrogate "sentinel" column/explicit tuple id.

@Mike Bayer: Isn't the order guarantee sufficient (wasn't that the original
assumption in the 2.0 code)?  I do quite understand the wish for having one
solution that fits all without dialectic dependencies.  However, in my
opinion, this one is going to run for some time and is heavily dependent on
internals.  With mssql, this seems to me to be deep down in the insert
internals and unlikely to change any time soon (at least until they start
losing market share because other DBMSs do it better).  Here (PostgreSQL)
the actual mechanisms required can probably be achieved much more readily
(given the will).  But the fundamental issue of matching code side objects
with database records (with side effects) remains and is only going to
become more prominent.  @PostgreSQL, isn't this your opportunity to lead
the way again!-).

On Thu, 13 Apr 2023 at 05:26, Mike Bayer  wrote:

> here's what would be cool, a new token called "tuple_order" or something
>
>
> INSERT INTO table (a, b, c) VALUES ((1, 2, 3), (4, 5, 6), ...) RETURNING
> table.id, inserted.tuple_order
>
> tuple_order would be incrementing values 1, 2, 3, 4, 5, ... which
> correlate the each row delivered by RETURNING to each entry in the VALUES
> clause, in the order they were stated in that VALUES clause, that is entry
> (1, 2, 3) would be tuple_order 1, entry (4, 5, 6) would be tuple order 2,
> etc.
>
> That token would solve the entire problem in all cases. The 1800 line
> changeset I've been working on all week would go away (if every database
> had this) and the change would be reduced to a few dozen lines.
>
>
> On Wed, Apr 12, 2023, at 11:05 PM, John Howroyd wrote:
>
> To clarify, the work around (for SQLAlchemy) does not require the
> RETURNING values to be ordered; they will be reordered code side.  Longer
> term matching the order up would be an added bonus!
>
> On Thu, 13 Apr 2023 at 04:02, John Howroyd 
> wrote:
>
> The ideal solution would be that the RETURNING values are in the same
> order as the declared rows in the table literal "FROM (VALUES (10, 11, 1),
> (12, 13, 2)) AS sel(p1, p2, num)".  But before any of you jump down my
> throat about about sets and a bar to parallel inserts, I think it's a small
> matter of some internal smoke and mirrors: simply marshal the RETURNING
> results through and in memory/temporary table maintaining the order from
> the SELECT and return these to the client upon commit.  At this moment, I
> believe the folk at SQLAlchemy are looking for a work around that can work
> for most use cases with either an identity column or a fake "sentinel"
> column for matching back to the ORM objects.  There may be an upper limit
> of 1000 to the number rows in a single insert (this arises in  MS SQL
> Server, who absolutely do not guarantee the order of their OUTPUT clause --
> it's tied to the physical row insert rather than marshalled).
>
> My feeling is that PostgreSQL may already do this.  See the commit:
>
> https://github.com/postgres/postgres/commit/9118d03a8cca3d97327c56bf89a72e328e454e63
> and the original message thread
>
> https://www.postgresql.org/message-id/flat/CADBwkHv1nUx5bqxqRLp26k8iUWSnY3uW-1_0EMYxK

Re: Guidance on INSERT RETURNING order

2023-04-12 Thread John Howroyd
To clarify, the work around (for SQLAlchemy) does not require the RETURNING
values to be ordered; they will be reordered code side.  Longer term
matching the order up would be an added bonus!

On Thu, 13 Apr 2023 at 04:02, John Howroyd  wrote:

> The ideal solution would be that the RETURNING values are in the same
> order as the declared rows in the table literal "FROM (VALUES (10, 11, 1),
> (12, 13, 2)) AS sel(p1, p2, num)".  But before any of you jump down my
> throat about about sets and a bar to parallel inserts, I think it's a small
> matter of some internal smoke and mirrors: simply marshal the RETURNING
> results through and in memory/temporary table maintaining the order from
> the SELECT and return these to the client upon commit.  At this moment, I
> believe the folk at SQLAlchemy are looking for a work around that can work
> for most use cases with either an identity column or a fake "sentinel"
> column for matching back to the ORM objects.  There may be an upper limit
> of 1000 to the number rows in a single insert (this arises in  MS SQL
> Server, who absolutely do not guarantee the order of their OUTPUT clause --
> it's tied to the physical row insert rather than marshalled).
>
> My feeling is that PostgreSQL may already do this.  See the commit:
>
> https://github.com/postgres/postgres/commit/9118d03a8cca3d97327c56bf89a72e328e454e63
> and the original message thread
>
> https://www.postgresql.org/message-id/flat/CADBwkHv1nUx5bqxqRLp26k8iUWSnY3uW-1_0EMYxKns0t2JnUA%40mail.gmail.com
> and references to undesired behaviour prior to PostgreSQL 9.6 such as in
> https://www.postgresql.org/docs/current/sql-select.html.
>
> The test cases in the above commit use:
> SELECT nextval('sequence'), datavals FROM ... ORDER BY something;
> type syntax.  And I presume that's exactly what's happening internally
> when there's a serial identity column (via query rewrite rules or another
> mechanism).
>
> So really, that just needs confirming.  What happens with triggers?
> Presumably the same as long as query rewrite rules are used, but this is
> presumably getting deeper into the code for actual inserts after the
> initial select.  The jump to the output being ordered, is just a guess but
> there's some bits and pieces that seem to suggest that there may indeed be
> a marshalling process going on in the background (whether or not that is
> linked to the original order is another matter).
>
> I have set up a PostgreSQL server to test if I can break this hypothesis
> and see what query explains can allude to.  Does anyone have a test case
> where the order is not preserved?
>
> Might I also point out that if the implementation of parallel inserts does
> create a bar then doing so may end up with the programmatic interfaces
> (such as SQLAlchemy) not being able to use that feature (possibly reverting
> back to single inserts).  Ur, so what would be the problem being solved
> with parallel inserts?
>
> On Thu, 13 Apr 2023 at 02:27, Mike Bayer  wrote:
>
>> We do something different for uuids.  These are usually created client
>> side anyway or are very easy to create client side if not and then we also
>> have a separate sentinel column option.
>>
>> Overall the whole thing is an optimization that can be turned off for
>> uncommon cases.   We just can't go back to having insert of 1000 rows be
>> 3-5x slower for all tables w server generated primary keys which is what
>> statement at a time gets us.  Surrogate integer Pks w sequence or identity
>> is the big case.
>>
>> On Wed, Apr 12, 2023, at 8:58 PM, Thorsten Glaser wrote:
>> > On Wed, 12 Apr 2023, Mike Bayer wrote:
>> >
>> >>ascending values for "mytable_id_seq" will correspond to the ascending
>> >>values for "num".
>> >
>> > But, again, what if it’s uuid or something instead of a sequence?
>> >
>> > bye,
>> > //mirabilos
>> > --
>> > 15:41⎜ Somebody write a testsuite for
>> helloworld :-)
>>
>>
>>


Re: Guidance on INSERT RETURNING order

2023-04-12 Thread John Howroyd
The ideal solution would be that the RETURNING values are in the same order
as the declared rows in the table literal "FROM (VALUES (10, 11, 1), (12,
13, 2)) AS sel(p1, p2, num)".  But before any of you jump down my throat
about about sets and a bar to parallel inserts, I think it's a small matter
of some internal smoke and mirrors: simply marshal the RETURNING results
through and in memory/temporary table maintaining the order from the SELECT
and return these to the client upon commit.  At this moment, I believe the
folk at SQLAlchemy are looking for a work around that can work for most use
cases with either an identity column or a fake "sentinel" column for
matching back to the ORM objects.  There may be an upper limit of 1000 to
the number rows in a single insert (this arises in  MS SQL Server, who
absolutely do not guarantee the order of their OUTPUT clause -- it's tied
to the physical row insert rather than marshalled).

My feeling is that PostgreSQL may already do this.  See the commit:
https://github.com/postgres/postgres/commit/9118d03a8cca3d97327c56bf89a72e328e454e63
and the original message thread
https://www.postgresql.org/message-id/flat/CADBwkHv1nUx5bqxqRLp26k8iUWSnY3uW-1_0EMYxKns0t2JnUA%40mail.gmail.com
and references to undesired behaviour prior to PostgreSQL 9.6 such as in
https://www.postgresql.org/docs/current/sql-select.html.

The test cases in the above commit use:
SELECT nextval('sequence'), datavals FROM ... ORDER BY something;
type syntax.  And I presume that's exactly what's happening internally when
there's a serial identity column (via query rewrite rules or another
mechanism).

So really, that just needs confirming.  What happens with triggers?
Presumably the same as long as query rewrite rules are used, but this is
presumably getting deeper into the code for actual inserts after the
initial select.  The jump to the output being ordered, is just a guess but
there's some bits and pieces that seem to suggest that there may indeed be
a marshalling process going on in the background (whether or not that is
linked to the original order is another matter).

I have set up a PostgreSQL server to test if I can break this hypothesis
and see what query explains can allude to.  Does anyone have a test case
where the order is not preserved?

Might I also point out that if the implementation of parallel inserts does
create a bar then doing so may end up with the programmatic interfaces
(such as SQLAlchemy) not being able to use that feature (possibly reverting
back to single inserts).  Ur, so what would be the problem being solved
with parallel inserts?

On Thu, 13 Apr 2023 at 02:27, Mike Bayer  wrote:

> We do something different for uuids.  These are usually created client
> side anyway or are very easy to create client side if not and then we also
> have a separate sentinel column option.
>
> Overall the whole thing is an optimization that can be turned off for
> uncommon cases.   We just can't go back to having insert of 1000 rows be
> 3-5x slower for all tables w server generated primary keys which is what
> statement at a time gets us.  Surrogate integer Pks w sequence or identity
> is the big case.
>
> On Wed, Apr 12, 2023, at 8:58 PM, Thorsten Glaser wrote:
> > On Wed, 12 Apr 2023, Mike Bayer wrote:
> >
> >>ascending values for "mytable_id_seq" will correspond to the ascending
> >>values for "num".
> >
> > But, again, what if it’s uuid or something instead of a sequence?
> >
> > bye,
> > //mirabilos
> > --
> > 15:41⎜ Somebody write a testsuite for helloworld
> :-)
>
>
>


Re: Guidance on INSERT RETURNING order

2023-04-12 Thread John Howroyd
Hi All,

The fundamental question here is:
What does "insert into object(col1, ... , coln) select sel1, ... , seln
from ... order by ord1, ... ordm returning val1, ..., valk" mean?

In particular, what does the "order by" do in "insert ... select ... order
by"?  Will this be honoured with parallel inserts?  Is that really too much
to ask?

Is a DBMS just a black box archive of data (throw it in and forget about
it)?  In my opinion, it's a system for the consistent and efficient storage
and *retrieval* of data, to be used in the context of other software
systems.  Programmatically, that means matching up what's in the database
with what's stored in memory.

And yes, a DBMS needs to give guarantees!  ACID compliance is one such
guarantee, but it is not the only one required.

Finally, please don't read this badly.  It is meant purely to question
where we want to go, I strongly support PostgreSQL and believe it has made
many major contributions to the development of DBMS over the years.

Many thanks,
John.



On Wed, 12 Apr 2023 at 11:10, Federico  wrote:

> On Wed, 12 Apr 2023 at 11:46, Thorsten Glaser  wrote:
> >
> > On Tue, 11 Apr 2023, Federico wrote:
> >
> > >My understanding was that they are generated in select order
> >
> > But are they? (I don’t know, but I’d not assume that.)
>
> That's kind of the point for this question, to see if that's correct or
> not.
>
> > >If my understanding is incorrect, would this alternative guarantee the
> above
> >
> > >INSERT INTO t(id, data)
> > >SELECT nextval(pg_get_serial_sequence('t', 'id')) data
> > >FROM (VALUES ('a', 1), ('b', 2), ('c', 3)) as vv(data, num)
> > >ORDER BY num
> > >RETURNING id
> >
> > Wouldn’t, at that point, it be better to just send multiple
> > individual INSERT statements? The overhead (on both sides)
> > for all mentioned… workarounds… surely is larger than that?
>
> No, not by a long shot. Sending thousands of single inserts
> sequentially over the network requires a lot more time even when doing
> that on localhost.
> Using a single statement is many times faster.
>
> Federico
>
> > bye,
> > //mirabilos
> > --
> > 15:41⎜ Somebody write a testsuite for helloworld
> :-)
> >
> >
> >
> >
>
>
>
>
>