Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2024-03-23 Thread Alexander Korotkov
On Tue, Mar 19, 2024 at 5:20 PM Alexander Korotkov  wrote:
> On Tue, Nov 28, 2023 at 11:00 AM Pavel Borisov  wrote:
> > > You're designing new APIs, days before the feature freeze.
> > On Wed, 5 Apr 2023 at 06:54, Michael Paquier  wrote:
> > >
> > > On Tue, Apr 04, 2023 at 01:25:46AM +0300, Alexander Korotkov wrote:
> > > > Pavel, thank you for your review, revisions and rebase.
> > > > We'll reconsider this once v17 is branched.
> >
> > I've looked through patches v16 once more and think they're good
> > enough, and previous issues are all addressed. I see that there is
> > nothing that blocks it from being committed except the last iteration
> > was days before v16 feature freeze.
> >
> > Recently in another thread [1] Alexander posted a new version of
> > patches v16 (as 0001 and 0002). In 0001 only indentation, comments, and
> > commit messages changed from v16 in this thread. In 0002 new test
> > eval-plan-qual-2 was integrated into the existing eval-plan-qual test.
> > For maintaining the most recent versions in this thread I'm attaching
> > them under v17. I suppose that we can commit these patches to v17 if
> > there are no objections or additional reviews.
> >
> > [1] 
> > https://www.postgresql.org/message-id/flat/CAPpHfdurb9ycV8udYqM%3Do0sPS66PJ4RCBM1g-bBpvzUfogY0EA%40mail.gmail.com
>
> The new revision of patches is attached.
>
> It has updated commit messages and new comments, and some variables were
> renamed to be more consistent with their surroundings.
>
> I also think that all the design issues discussed earlier are resolved.
> It would be nice to hear from Andres about this.
>
> I'll continue rechecking these patches myself.

I've re-read this thread.  It still seems to me that the issues raised
before have now been addressed.  Fingers crossed, I'm going to push this if
there are no objections.

--
Regards,
Alexander Korotkov




Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2024-03-19 Thread Alexander Korotkov
Hi, Pavel!

On Tue, Nov 28, 2023 at 11:00 AM Pavel Borisov  wrote:
> > You're designing new APIs, days before the feature freeze.
> On Wed, 5 Apr 2023 at 06:54, Michael Paquier  wrote:
> >
> > On Tue, Apr 04, 2023 at 01:25:46AM +0300, Alexander Korotkov wrote:
> > > Pavel, thank you for your review, revisions and rebase.
> > > We'll reconsider this once v17 is branched.
>
> I've looked through patches v16 once more and think they're good
> enough, and previous issues are all addressed. I see that there is
> nothing that blocks it from being committed except the last iteration
> was days before v16 feature freeze.
>
> Recently in another thread [1] Alexander posted a new version of
> patches v16 (as 0001 and 0002). In 0001 only indentation, comments, and
> commit messages changed from v16 in this thread. In 0002 new test
> eval-plan-qual-2 was integrated into the existing eval-plan-qual test.
> For maintaining the most recent versions in this thread I'm attaching
> them under v17. I suppose that we can commit these patches to v17 if
> there are no objections or additional reviews.
>
> [1] 
> https://www.postgresql.org/message-id/flat/CAPpHfdurb9ycV8udYqM%3Do0sPS66PJ4RCBM1g-bBpvzUfogY0EA%40mail.gmail.com

The new revision of patches is attached.

It has updated commit messages and new comments, and some variables were
renamed to be more consistent with their surroundings.

I also think that all the design issues discussed earlier are resolved.
It would be nice to hear from Andres about this.

I'll continue rechecking these patches myself.

--
Regards,
Alexander Korotkov


v18-0001-Allow-locking-updated-tuples-in-tuple_update-and.patch
Description: Binary data


v18-0002-Add-EvalPlanQual-delete-returning-isolation-test.patch
Description: Binary data


Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2023-11-28 Thread Pavel Borisov
Hi, hackers!

> You're designing new APIs, days before the feature freeze.
On Wed, 5 Apr 2023 at 06:54, Michael Paquier  wrote:
>
> On Tue, Apr 04, 2023 at 01:25:46AM +0300, Alexander Korotkov wrote:
> > Pavel, thank you for your review, revisions and rebase.
> > We'll reconsider this once v17 is branched.

I've looked through the v16 patches once more and think they're good
enough, and the previous issues are all addressed. I see nothing that
blocks them from being committed except that the last iteration came
just days before the v16 feature freeze.

Recently, in another thread [1], Alexander posted a new version of the
v16 patches (as 0001 and 0002). In 0001 only indentation, comments, and
commit messages changed from v16 in this thread. In 0002 the new test
eval-plan-qual-2 was integrated into the existing eval-plan-qual test.
To keep the most recent versions in this thread, I'm attaching them here
as v17. I suppose we can commit these patches to v17 if there are no
objections or additional reviews.

[1] 
https://www.postgresql.org/message-id/flat/CAPpHfdurb9ycV8udYqM%3Do0sPS66PJ4RCBM1g-bBpvzUfogY0EA%40mail.gmail.com

Kind regards,
Pavel Borisov


v17-0002-Add-EvalPlanQual-delete-returning-isolation-test-v1.patch
Description: Binary data


v17-0001-Allow-locking-updated-tuples-in-tuple_update-and--v1.patch
Description: Binary data


Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2023-04-04 Thread Michael Paquier
On Tue, Apr 04, 2023 at 01:25:46AM +0300, Alexander Korotkov wrote:
> Pavel, thank you for your review, revisions and rebase.
> We'll reconsider this once v17 is branched.

The patch was still in the current CF, so I have moved it to the next
one based on the latest updates. 
--
Michael




Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2023-04-03 Thread Alexander Korotkov
On Mon, Apr 3, 2023 at 5:12 PM Pavel Borisov  wrote:
> After Alexander reverted the v15 patches from master, I rebased the v4
> correction patches from the message above onto a fresh master (together
> with the v15 patches). The resulting v16 patch is attached.

Pavel, thank you for your review, revisions and rebase.
We'll reconsider this once v17 is branched.

--
Regards,
Alexander Korotkov




Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2023-04-03 Thread Pavel Borisov
After Alexander reverted the v15 patches from master, I rebased the v4
correction patches from the message above onto a fresh master (together
with the v15 patches). The resulting v16 patch is attached.

Pavel.


v16-0002-Add-EvalPlanQual-delete-returning-isolation-test.patch
Description: Binary data


v16-0001-Allow-locking-updated-tuples-in-tuple_update-and.patch
Description: Binary data


Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2023-04-03 Thread Pavel Borisov
Hi, Alexander!

On 2023-04-02 03:37:19 +0300, Alexander Korotkov wrote:
> On Sat, Apr 1, 2023 at 8:21 AM Andres Freund  wrote:
> > Given that the in-tree state has been broken for a week, I think it probably
> > is time to revert the commits that already went in.
>
> The revised patch is attached.  The most notable change is getting rid
> of LazyTupleTableSlot.  Also get rid of complex computations to detect
> how to initialize LazyTupleTableSlot.  Instead just pass the oldSlot
> as an argument of ExecUpdate() and ExecDelete().  The price for this
> is just preallocation of ri_oldTupleSlot before calling ExecDelete().
> The slot allocation is quite cheap.  After all wrappers it's
> table_slot_callbacks(), which is very cheap, single palloc() and few
> fields initialization.  It doesn't seem reasonable to introduce an
> infrastructure to evade this.
>
> I think patch resolves all the major issues you've highlighted.  Even
> if there are some minor things missed, I'd prefer to push this rather
> than reverting the whole work.

I looked into the latest patch v3.
In my view, it addresses all the issues discussed in [1]. Also, with
the oldSlot logic pushed outside, the code becomes more transparent. I've
added some very minor modifications to the code and comments in patch
v4-0001. I'm also for committing Andres' isolation test; I've added
some minor revisions to make the test run routinely among the other
isolation tests. The test could also be made part of the existing
eval-plan-qual.spec, but I have left it separate for now.

Also, I think the signatures of the ExecUpdate() and ExecDelete()
functions, especially the latter, are somewhat overloaded with
different status bool variables added by different authors on
different occasions. It would be nice if they were combined into some
kind of status variable. But as this doesn't touch the API and is not
related to the current update/delete optimization, it could be
modified anytime in the future as well.

The changes that do touch the API are the addition of a TupleTableSlot
argument and the conversion of the bool wait flag into a four-state
options variable for tuple_update(), tuple_delete(), heap_update(),
heap_delete(), heap_lock_tuple(), and a couple of Exec*DeleteTriggers().
I think they are justified.
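
For illustration only, here is a minimal sketch of the kind of signature
change being described.  The enum values, flag semantics, and function name
below are hypothetical stand-ins, not necessarily what the patch itself uses:

/* A hypothetical four-state option replacing the former "bool wait". */
typedef enum ExampleWaitPolicy
{
	EXAMPLE_MODIFY_DONT_WAIT,			/* old "wait = false" behaviour */
	EXAMPLE_MODIFY_WAIT,				/* old "wait = true" behaviour */
	EXAMPLE_MODIFY_WAIT_FETCH_OLD,		/* also return the old tuple version in oldSlot */
	EXAMPLE_MODIFY_WAIT_LOCK_UPDATED	/* on concurrent update, lock and return the latest version */
} ExampleWaitPolicy;

/*
 * Sketch of a tuple_delete()-style entry point: the caller passes the option
 * above and, optionally, a slot that receives the old (or newly locked)
 * tuple version instead of re-fetching it afterwards.
 */
extern TM_Result example_tuple_delete(Relation rel, ItemPointer tid,
									  CommandId cid, Snapshot snapshot,
									  Snapshot crosscheck,
									  ExampleWaitPolicy options,
									  TM_FailureData *tmfd,
									  bool changingPart,
									  TupleTableSlot *oldSlot);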

One thing that is not clear to me is that we pass oldSlot into
simple_table_tuple_update(), whereas per the comment on this function
"concurrent updates of the target tuple is not expected (for example,
because we have a lock on the relation associated with the tuple)". It
seems not to break anything, but maybe this could be simplified.

Overall I think the patch is good enough.

Regards,
Pavel Borisov,
Supabase.

[1] 
https://www.postgresql.org/message-id/CAPpHfdtwKb5UVXKkDQZYW8nQCODy0fY_S7mV5Z%2Bcg7urL%3DzDEA%40mail.gmail.com


v4-0002-Add-EvalPlanQual-delete-returning-isolation-test.patch
Description: Binary data


v4-0001-Revise-changes-in-764da7710b-and-11470f544e.patch
Description: Binary data


Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2023-04-03 Thread Alexander Korotkov
On Sun, Apr 2, 2023 at 3:47 AM Andres Freund  wrote:
> On 2023-04-02 03:37:19 +0300, Alexander Korotkov wrote:
> > On Sat, Apr 1, 2023 at 8:21 AM Andres Freund  wrote:
> > > Given that the in-tree state has been broken for a week, I think it 
> > > probably
> > > is time to revert the commits that already went in.
> >
> > The revised patch is attached.  The most notable change is getting rid
> > of LazyTupleTableSlot.  Also get rid of complex computations to detect
> > how to initialize LazyTupleTableSlot.  Instead just pass the oldSlot
> > as an argument of ExecUpdate() and ExecDelete().  The price for this
> > is just preallocation of ri_oldTupleSlot before calling ExecDelete().
> > The slot allocation is quite cheap.  After all wrappers it's
> > table_slot_callbacks(), which is very cheap, single palloc() and few
> > fields initialization.  It doesn't seem reasonable to introduce an
> > infrastructure to evade this.
> >
> > I think patch resolves all the major issues you've highlighted.  Even
> > if there are some minor things missed, I'd prefer to push this rather
> > than reverting the whole work.
>
> Shrug. You're designing new APIs, days before the feature freeze. This just
> doesn't seem ready in time for 16. I certainly won't have time to look at it
> sufficiently in the next 5 days.

OK.  Reverted.

--
Regards,
Alexander Korotkov




Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2023-04-01 Thread Andres Freund
Hi,

On 2023-04-02 03:37:19 +0300, Alexander Korotkov wrote:
> On Sat, Apr 1, 2023 at 8:21 AM Andres Freund  wrote:
> > Given that the in-tree state has been broken for a week, I think it probably
> > is time to revert the commits that already went in.
> 
> The revised patch is attached.  The most notable change is getting rid
> of LazyTupleTableSlot.  Also get rid of complex computations to detect
> how to initialize LazyTupleTableSlot.  Instead just pass the oldSlot
> as an argument of ExecUpdate() and ExecDelete().  The price for this
> is just preallocation of ri_oldTupleSlot before calling ExecDelete().
> The slot allocation is quite cheap.  After all wrappers it's
> table_slot_callbacks(), which is very cheap, single palloc() and few
> fields initialization.  It doesn't seem reasonable to introduce an
> infrastructure to evade this.
> 
> I think patch resolves all the major issues you've highlighted.  Even
> if there are some minor things missed, I'd prefer to push this rather
> than reverting the whole work.

Shrug. You're designing new APIs, days before the feature freeze. This just
doesn't seem ready in time for 16. I certainly won't have time to look at it
sufficiently in the next 5 days.

Greetings,

Andres Freund




Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2023-04-01 Thread Alexander Korotkov
Hi!

On Sat, Apr 1, 2023 at 8:21 AM Andres Freund  wrote:
> On 2023-03-31 16:57:41 +0300, Alexander Korotkov wrote:
> > On Wed, Mar 29, 2023 at 8:34 PM Alexander Korotkov  
> > wrote:
> > > On Mon, Mar 27, 2023 at 1:49 PM Alexander Korotkov  
> > > wrote:
> > > > On Fri, Mar 24, 2023 at 3:39 AM Andres Freund  
> > > > wrote:
> > > > > On 2023-03-23 23:24:19 +0300, Alexander Korotkov wrote:
> > > > > > On Thu, Mar 23, 2023 at 8:06 PM Andres Freund  
> > > > > > wrote:
> > > > > > > I seriously doubt that solving this at the tuple locking level is 
> > > > > > > the right
> > > > > > > thing. If we want to avoid refetching tuples, why don't we add a 
> > > > > > > parameter to
> > > > > > > delete/update to generally put the old tuple version into a slot, 
> > > > > > > not just as
> > > > > > > an optimization for a subsequent lock_tuple()? Then we could 
> > > > > > > remove all
> > > > > > > refetching tuples for triggers. It'd also provide the basis for 
> > > > > > > adding support
> > > > > > > for referencing the OLD version in RETURNING, which'd be quite 
> > > > > > > powerful.
> > > >
> > > > After some thoughts, I think I like idea of fetching old tuple version
> > > > in update/delete.  Everything that evades extra tuple fetching and do
> > > > more of related work in a single table AM call, makes table AM API
> > > > more flexible.
> > > >
> > > > I'm working on patch implementing this.  I'm going to post it later 
> > > > today.
> > >
> > > Here is the patchset.  I'm continuing to work on comments and refactoring.
> > >
> > > My quick question is why do we need ri_TrigOldSlot for triggers?
> > > Can't we just pass the old tuple for after row trigger in
> > > ri_oldTupleSlot?
> > >
> > > Also, I wonder if we really need a LazyTupleSlot.  It allows to evade
> > > extra tuple slot allocation.  But as I get in the end the tuple slot
> > > allocation is just a single palloc.  I bet the effect would be
> > > invisible in the benchmarks.
> >
> > Sorry, previous patches don't even compile.  The fixed version is attached.
> > I'm going to post significantly revised patchset soon.
>
> Given that the in-tree state has been broken for a week, I think it probably
> is time to revert the commits that already went in.

The revised patch is attached.  The most notable change is getting rid
of LazyTupleTableSlot.  It also gets rid of the complex computations to
detect how to initialize the LazyTupleTableSlot.  Instead, the oldSlot is
simply passed as an argument of ExecUpdate() and ExecDelete().  The price
for this is just the preallocation of ri_oldTupleSlot before calling
ExecDelete().  The slot allocation is quite cheap: after all the wrappers
it boils down to table_slot_callbacks(), a single palloc(), and the
initialization of a few fields.  It doesn't seem reasonable to introduce
infrastructure just to avoid this.
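
For illustration, a rough sketch of the caller-side pattern described above.
This is a simplified example of the approach, not the literal patch code, and
the helper name is made up:

/*
 * Preallocate the old-tuple slot up front, so ExecDelete()/ExecUpdate() can
 * simply be handed a ready slot instead of deciding lazily whether and how
 * to allocate one.
 */
static void
example_prepare_old_slot(ResultRelInfo *resultRelInfo, EState *estate)
{
	if (resultRelInfo->ri_oldTupleSlot == NULL)
		resultRelInfo->ri_oldTupleSlot =
			table_slot_create(resultRelInfo->ri_RelationDesc,
							  &estate->es_tupleTable);
}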

I think the patch resolves all the major issues you've highlighted.  Even
if some minor things are still missed, I'd prefer to push this rather
than revert the whole work.

--
Regards,
Alexander Korotkov


0001-Revise-changes-in-764da7710b-and-11470f544e-v3.patch
Description: Binary data


Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2023-04-01 Thread Pavel Borisov
Hi, Andres!

On Sat, 1 Apr 2023, 09:21 Andres Freund,  wrote:

> Given that the in-tree state has been broken for a week, I think it
> probably
> is time to revert the commits that already went in.
>

It seems that although the patch addressing the issues is not a quick fix,
there is already big progress on it. I propose checking its status a week
from now and, if it is not ready then, reverting the existing commits. I
hope there are no other patches on the existing branch reported to suffer
from this.

Kind regards,
Pavel Borisov,
Supabase

>


Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2023-03-31 Thread Andres Freund
Hi,

On 2023-03-31 16:57:41 +0300, Alexander Korotkov wrote:
> On Wed, Mar 29, 2023 at 8:34 PM Alexander Korotkov  
> wrote:
> > On Mon, Mar 27, 2023 at 1:49 PM Alexander Korotkov  
> > wrote:
> > > On Fri, Mar 24, 2023 at 3:39 AM Andres Freund  wrote:
> > > > On 2023-03-23 23:24:19 +0300, Alexander Korotkov wrote:
> > > > > On Thu, Mar 23, 2023 at 8:06 PM Andres Freund  
> > > > > wrote:
> > > > > > I seriously doubt that solving this at the tuple locking level is 
> > > > > > the right
> > > > > > thing. If we want to avoid refetching tuples, why don't we add a 
> > > > > > parameter to
> > > > > > delete/update to generally put the old tuple version into a slot, 
> > > > > > not just as
> > > > > > an optimization for a subsequent lock_tuple()? Then we could remove 
> > > > > > all
> > > > > > refetching tuples for triggers. It'd also provide the basis for 
> > > > > > adding support
> > > > > > for referencing the OLD version in RETURNING, which'd be quite 
> > > > > > powerful.
> > >
> > > After some thoughts, I think I like idea of fetching old tuple version
> > > in update/delete.  Everything that evades extra tuple fetching and do
> > > more of related work in a single table AM call, makes table AM API
> > > more flexible.
> > >
> > > I'm working on patch implementing this.  I'm going to post it later today.
> >
> > Here is the patchset.  I'm continuing to work on comments and refactoring.
> >
> > My quick question is why do we need ri_TrigOldSlot for triggers?
> > Can't we just pass the old tuple for after row trigger in
> > ri_oldTupleSlot?
> >
> > Also, I wonder if we really need a LazyTupleSlot.  It allows to evade
> > extra tuple slot allocation.  But as I get in the end the tuple slot
> > allocation is just a single palloc.  I bet the effect would be
> > invisible in the benchmarks.
> 
> Sorry, previous patches don't even compile.  The fixed version is attached.
> I'm going to post significantly revised patchset soon.

Given that the in-tree state has been broken for a week, I think it probably
is time to revert the commits that already went in.

Greetings,

Andres Freund




Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2023-03-31 Thread Alexander Korotkov
On Wed, Mar 29, 2023 at 8:34 PM Alexander Korotkov  wrote:
> On Mon, Mar 27, 2023 at 1:49 PM Alexander Korotkov  
> wrote:
> > On Fri, Mar 24, 2023 at 3:39 AM Andres Freund  wrote:
> > > On 2023-03-23 23:24:19 +0300, Alexander Korotkov wrote:
> > > > On Thu, Mar 23, 2023 at 8:06 PM Andres Freund  
> > > > wrote:
> > > > > I seriously doubt that solving this at the tuple locking level is the 
> > > > > right
> > > > > thing. If we want to avoid refetching tuples, why don't we add a 
> > > > > parameter to
> > > > > delete/update to generally put the old tuple version into a slot, not 
> > > > > just as
> > > > > an optimization for a subsequent lock_tuple()? Then we could remove 
> > > > > all
> > > > > refetching tuples for triggers. It'd also provide the basis for 
> > > > > adding support
> > > > > for referencing the OLD version in RETURNING, which'd be quite 
> > > > > powerful.
> >
> > After some thoughts, I think I like idea of fetching old tuple version
> > in update/delete.  Everything that evades extra tuple fetching and do
> > more of related work in a single table AM call, makes table AM API
> > more flexible.
> >
> > I'm working on patch implementing this.  I'm going to post it later today.
>
> Here is the patchset.  I'm continuing to work on comments and refactoring.
>
> My quick question is why do we need ri_TrigOldSlot for triggers?
> Can't we just pass the old tuple for after row trigger in
> ri_oldTupleSlot?
>
> Also, I wonder if we really need a LazyTupleSlot.  It allows to evade
> extra tuple slot allocation.  But as I get in the end the tuple slot
> allocation is just a single palloc.  I bet the effect would be
> invisible in the benchmarks.

Sorry, the previous patches don't even compile.  The fixed version is attached.
I'm going to post a significantly revised patchset soon.

--
Regards,
Alexander Korotkov


0001-Improve-lazy-tuple-slot-v2.patch
Description: Binary data


0002-Revise-changes-for-tuple_update-and-tuple_delete-v2.patch
Description: Binary data


Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2023-03-29 Thread Alexander Korotkov
Andres,

On Mon, Mar 27, 2023 at 1:49 PM Alexander Korotkov  wrote:
> On Fri, Mar 24, 2023 at 3:39 AM Andres Freund  wrote:
> > On 2023-03-23 23:24:19 +0300, Alexander Korotkov wrote:
> > > On Thu, Mar 23, 2023 at 8:06 PM Andres Freund  wrote:
> > > > I seriously doubt that solving this at the tuple locking level is the 
> > > > right
> > > > thing. If we want to avoid refetching tuples, why don't we add a 
> > > > parameter to
> > > > delete/update to generally put the old tuple version into a slot, not 
> > > > just as
> > > > an optimization for a subsequent lock_tuple()? Then we could remove all
> > > > refetching tuples for triggers. It'd also provide the basis for adding 
> > > > support
> > > > for referencing the OLD version in RETURNING, which'd be quite powerful.
>
> After some thoughts, I think I like idea of fetching old tuple version
> in update/delete.  Everything that evades extra tuple fetching and do
> more of related work in a single table AM call, makes table AM API
> more flexible.
>
> I'm working on patch implementing this.  I'm going to post it later today.

Here is the patchset.  I'm continuing to work on comments and refactoring.

My quick question is: why do we need ri_TrigOldSlot for triggers?
Can't we just pass the old tuple for the after-row trigger in
ri_oldTupleSlot?

Also, I wonder if we really need a LazyTupleSlot.  It lets us avoid an
extra tuple slot allocation.  But as I understand it, in the end the tuple
slot allocation is just a single palloc().  I bet the effect would be
invisible in the benchmarks.

--
Regards,
Alexander Korotkov


0001-Improve-lazy-tuple-slot-v1.patch
Description: Binary data


0002-Revise-changes-for-tuple_update-and-tuple_delete-v1.patch
Description: Binary data


Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2023-03-27 Thread Alexander Korotkov
Hi!

On Fri, Mar 24, 2023 at 3:39 AM Andres Freund  wrote:
> On 2023-03-23 23:24:19 +0300, Alexander Korotkov wrote:
> > On Thu, Mar 23, 2023 at 8:06 PM Andres Freund  wrote:
> > > I seriously doubt that solving this at the tuple locking level is the 
> > > right
> > > thing. If we want to avoid refetching tuples, why don't we add a 
> > > parameter to
> > > delete/update to generally put the old tuple version into a slot, not 
> > > just as
> > > an optimization for a subsequent lock_tuple()? Then we could remove all
> > > refetching tuples for triggers. It'd also provide the basis for adding 
> > > support
> > > for referencing the OLD version in RETURNING, which'd be quite powerful.

After some thought, I think I like the idea of fetching the old tuple
version in update/delete.  Anything that avoids extra tuple fetching and
does more of the related work in a single table AM call makes the table AM
API more flexible.

I'm working on a patch implementing this.  I'm going to post it later today.

--
Regards,
Alexander Korotkov




Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2023-03-23 Thread Andres Freund
Hi,

An off-list conversation veered on-topic again. Reposting for posterity:

On 2023-03-23 23:24:19 +0300, Alexander Korotkov wrote:
> On Thu, Mar 23, 2023 at 8:06 PM Andres Freund  wrote:
> > I seriously doubt that solving this at the tuple locking level is the right
> > thing. If we want to avoid refetching tuples, why don't we add a parameter 
> > to
> > delete/update to generally put the old tuple version into a slot, not just 
> > as
> > an optimization for a subsequent lock_tuple()? Then we could remove all
> > refetching tuples for triggers. It'd also provide the basis for adding 
> > support
> > for referencing the OLD version in RETURNING, which'd be quite powerful.
>
> I spent some time thinking about this.  Does our attempt to update/delete
> a tuple imply that we've already fetched the old tuple version?

Yes, but somewhat "far away", below the ExecProcNode() in ExecModifyTable(). I
don't think we can rely on that. The old tuple is just identified via a junk
attribute (c.f. "For UPDATE/DELETE/MERGE, fetch the row identity info for the
tuple..."). The NEW tuple is computed in the target list of the source query.
It's possible that for some simpler cases we could figure out that the
returned slot is the "old" tuple, but it'd be hard to make that work.

Alternatively we could evaluate returning as part of the source query
plan. While that'd work nicely for the EPQ cases (the EPQ evaluation would
compute the new values), it could not be relied upon for before triggers.

It might or might not be a win to try to do so - if you have a selective
query, ferrying around the entire source tuple might cost more than it
saves.


> We needed that at least to do initial qual check and calculation of the new
> tuple (for update case).

The NEW tuple is computed in the source query, as I mentioned, I don't think
we easily can get access to the source row in the general case.


> We currently may not have the old tuple at hand at the time we do
> table_tuple_update()/table_tuple_delete().  But that seems to be just an
> issue of our executor code.  Is it worth making the table AM fetch the old
> *unmodified* tuple given that we've already fetched it for sure?

Not unconditionally (e.g. if you neither have triggers, nor RETURNING, there's
not much point, unless the query is simple enough that we could make it
free). But in the other cases it seems beneficial. The caller would reliably
know whether they want the source tuple to be fetched, or not.

We could make it so that iff we already have the "old" tuple in the slot,
it'll not be put in there "again", but if it's not the right row version, it
is.

We could use the same approach to make the "happy path" in update/delete
cheaper. If the source tuple is provided, heap_delete(), heap_update() won't
need to do a ReadBuffer(), they could just IncrBufferRefCount(). That'd be a
quite substantial win.
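
To make that concrete, here is a very rough sketch of what the "happy path"
shortcut inside heap_delete()/heap_update() could look like.  This is a
hypothetical illustration only (srcSlot is an assumed caller-provided
parameter), not code from any patch in this thread, and it glosses over
pin/lock lifetime details:

	Buffer		buffer;

	/*
	 * Hypothetical: if the caller supplied a buffer-backed slot that still
	 * holds the target row version, reuse its pin (the slot keeps the buffer
	 * pinned) instead of doing a fresh buffer-mapping lookup.
	 */
	if (srcSlot != NULL && TTS_IS_BUFFERTUPLE(srcSlot) &&
		ItemPointerEquals(&srcSlot->tts_tid, tid))
	{
		buffer = ((BufferHeapTupleTableSlot *) srcSlot)->buffer;
		IncrBufferRefCount(buffer);	/* cheap: just bumps the local refcount */
	}
	else
		buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(tid));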

Greetings,

Andres Freund




Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2023-03-23 Thread Andres Freund
Hi,

On 2023-03-23 18:08:36 +0300, Alexander Korotkov wrote:
> On Thu, Mar 23, 2023 at 3:30 AM Andres Freund  wrote:
> > On 2023-03-21 01:25:11 +0300, Alexander Korotkov wrote:
> > > I'm going to push patchset v15 if no objections.
> >
> > Just saw that this went in - didn't catch up with the thread before,
> > unfortunately. At the very least I'd like to see some more work on cleaning 
> > up
> > the lazy tuple slot stuff. It's replete with unnecessary multiple-evaluation
> > hazards - I realize that there's some of those already, but I don't think we
> > should go further down that route. As far as I can tell there's no need for
> > any of this to be macros.
> 
> Thank you for taking a look at this, even post-commit.  Regarding the
> macros, do you think inline functions would be good instead?

Yes.


> > I think it's entirely sensible to avoid the tuple fetching in ExecDelete(),
> > but it needs a bit less localized work. Instead of using the presence of a
> > tuple in the returning slot, ExecDelete() should track whether it already 
> > has
> > fetched the deleted tuple.
> >
> > Or alternatively, do the work to avoid refetching the tuple for the much 
> > more
> > common case of not needing EPQ at all.
> >
> > I guess this really is part of my issue with this change - it optimizes the
> > rare case, while not addressing the same inefficiency in the common case.
> 
> I'm going to fix this for ExecDelete().  Avoiding refetching the tuple
> in the more common case is something I'm definitely very interested in.
> But I would leave it for the future.

It doesn't seem like a good plan to start with the rare and then address the
common case. The solution for the common case might solve the rare case as
well. One way to fix the common case would be to return a tuple
suitable for returning computation as part of the input plan - which would
also fix the EPQ case, since we could just use the EPQ output. Of course there
are complications like triggers, but they seem like they could be dealt with.


> > > @@ -299,14 +305,46 @@ heapam_tuple_complete_speculative(Relation 
> > > relation, TupleTableSlot *slot,
> > >  static TM_Result
> > >  heapam_tuple_delete(Relation relation, ItemPointer tid, CommandId cid,
> > >   Snapshot snapshot, Snapshot 
> > > crosscheck, bool wait,
> > > - TM_FailureData *tmfd, bool 
> > > changingPart)
> > > + TM_FailureData *tmfd, bool 
> > > changingPart,
> > > + LazyTupleTableSlot *lockedSlot)
> > >  {
> > > + TM_Result   result;
> > > +
> > >   /*
> > >* Currently Deleting of index tuples are handled at vacuum, in 
> > > case if
> > >* the storage itself is cleaning the dead tuples by itself, it is 
> > > the
> > >* time to call the index tuple deletion also.
> > >*/
> > > - return heap_delete(relation, tid, cid, crosscheck, wait, tmfd, 
> > > changingPart);
> > > + result = heap_delete(relation, tid, cid, crosscheck, wait,
> > > +  tmfd, changingPart);
> > > +
> > > + /*
> > > +  * If the tuple has been concurrently updated, then get the lock on 
> > > it.
> > > +  * (Do this if caller asked for tat by providing a 'lockedSlot'.) 
> > > With the
> > > +  * lock held retry of delete should succeed even if there are more
> > > +  * concurrent update attempts.
> > > +  */
> > > + if (result == TM_Updated && lockedSlot)
> > > + {
> > > + TupleTableSlot *evalSlot;
> > > +
> > > + Assert(wait);
> > > +
> > > + evalSlot = LAZY_TTS_EVAL(lockedSlot);
> > > + result = heapam_tuple_lock_internal(relation, tid, snapshot,
> > > + 
> > > evalSlot, cid, LockTupleExclusive,
> > > + 
> > > LockWaitBlock,
> > > + 
> > > TUPLE_LOCK_FLAG_FIND_LAST_VERSION,
> > > + 
> > > tmfd, true);
> >
> > Oh, huh? As I mentioned before, the unconditional use of LockWaitBlock means
> > the wait parameter is ignored.
> >
> > I'm frankly getting annoyed here.
> 
> lockedSlot shouldn't be provided when wait == false.  The assertion
> above expresses this intention.  However, the code lacks a comment
> directly expressing this idea.

I don't think a comment here is going to fix things. You can't expect somebody
trying to use tableam to look into the guts of a heapam function to understand
the API constraints to this degree. And there's afaict no comments in tableam
that indicate any of this.

I also just don't see why this is a sensible constraint? Why should this only
work 

Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2023-03-23 Thread Alexander Korotkov
Hi!

On Thu, Mar 23, 2023 at 3:30 AM Andres Freund  wrote:
> On 2023-03-21 01:25:11 +0300, Alexander Korotkov wrote:
> > I'm going to push patchset v15 if no objections.
>
> Just saw that this went in - didn't catch up with the thread before,
> unfortunately. At the very least I'd like to see some more work on cleaning up
> the lazy tuple slot stuff. It's replete with unnecessary multiple-evaluation
> hazards - I realize that there's some of those already, but I don't think we
> should go further down that route. As far as I can tell there's no need for
> any of this to be macros.

Thank you for taking a look at this, even post-commit.  Regarding the
macros, do you think inline functions would be good instead?

> > From a8b4e8a7b27815e013ea07b8cc9ac68541a9ac07 Mon Sep 17 00:00:00 2001
> > From: Alexander Korotkov 
> > Date: Tue, 21 Mar 2023 00:34:15 +0300
> > Subject: [PATCH 1/2] Evade extra table_tuple_fetch_row_version() in
> >  ExecUpdate()/ExecDelete()
> >
> > When we lock tuple using table_tuple_lock() then we at the same time fetch
> > the locked tuple to the slot.  In this case we can skip extra
> > table_tuple_fetch_row_version() thank to we've already fetched the 'old' 
> > tuple
> > and nobody can change it concurrently since it's locked.
> >
> > Discussion: 
> > https://postgr.es/m/CAPpHfdua-YFw3XTprfutzGp28xXLigFtzNbuFY8yPhqeq6X5kg%40mail.gmail.com
> > Reviewed-by: Aleksander Alekseev, Pavel Borisov, Vignesh C, Mason Sharp
> > Reviewed-by: Andres Freund, Chris Travers
> > ---
> >  src/backend/executor/nodeModifyTable.c | 48 +++---
> >  1 file changed, 35 insertions(+), 13 deletions(-)
> >
> > diff --git a/src/backend/executor/nodeModifyTable.c 
> > b/src/backend/executor/nodeModifyTable.c
> > index 3a673895082..93ebfdbb0d8 100644
> > --- a/src/backend/executor/nodeModifyTable.c
> > +++ b/src/backend/executor/nodeModifyTable.c
> > @@ -1559,6 +1559,22 @@ ldelete:
> >   {
> >   case TM_Ok:
> >   
> > Assert(context->tmfd.traversed);
> > +
> > + /*
> > +  * Save locked tuple 
> > for further processing of
> > +  * RETURNING clause.
> > +  */
> > + if (processReturning 
> > &&
> > + 
> > resultRelInfo->ri_projectReturning &&
> > + 
> > !resultRelInfo->ri_FdwRoutine)
> > + {
> > + 
> > TupleTableSlot *returningSlot;
> > +
> > + returningSlot 
> > = ExecGetReturningSlot(estate, resultRelInfo);
> > + 
> > ExecCopySlot(returningSlot, inputslot);
> > + 
> > ExecMaterializeSlot(returningSlot);
> > + }
> > +
> >   epqslot = 
> > EvalPlanQual(context->epqstate,
> > 
> >  resultRelationDesc,
> > 
> >  resultRelInfo->ri_RangeTableIndex,
>
> This seems a bit byzantine. We use inputslot = EvalPlanQualSlot(...) to make
> EvalPlanQual() a bit cheaper, because that avoids a slot copy inside
> EvalPlanQual().  But now we copy and materialize that slot anyway - and we do
> so even if EPQ fails. And we afaics also do it when epqreturnslot is set, in
> which case we'll afaics never use the copied slot.

Yes, I agree that there is a redundancy we could avoid.

> Read the next paragraph below before replying to the above - I don't think
> this is right for other reasons:
>
> > @@ -1673,12 +1689,17 @@ ldelete:
> >   }
> >   else
> >   {
> > + /*
> > +  * Tuple can be already fetched to the returning slot 
> > in case
> > +  * we've previously locked it.  Fetch the tuple only 
> > if the slot
> > +  * is empty.
> > +  */
> >   slot = ExecGetReturningSlot(estate, resultRelInfo);
> >   if (oldtuple != NULL)
> >   {
> >   ExecForceStoreHeapTuple(oldtuple, slot, 
> > false);
> >   }
> > - else
> > + else if (TupIsNull(slot))
> >   {

Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2023-03-22 Thread Andres Freund
Hi,

On 2023-03-21 01:25:11 +0300, Alexander Korotkov wrote:
> I'm going to push patchset v15 if no objections.

Just saw that this went in - didn't catch up with the thread before,
unfortunately. At the very least I'd like to see some more work on cleaning up
the lazy tuple slot stuff. It's replete with unnecessary multiple-evaluation
hazards - I realize that there's some of those already, but I don't think we
should go further down that route. As far as I can tell there's no need for
any of this to be macros.
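
To illustrate the multiple-evaluation hazard being referred to, here is a
deliberately simplified example; the struct, macro, and function below are
made up for illustration and are not the actual LazyTupleTableSlot code:

/* Hypothetical shape of a lazily-allocated slot holder. */
typedef struct ExampleLazySlot
{
	TupleTableSlot *slot;		/* NULL until first requested */
	TupleTableSlot *(*alloc_cb) (void *arg);
	void	   *arg;
} ExampleLazySlot;

/*
 * Macro version: 'ls' is expanded, and therefore evaluated, several times,
 * so a call like EXAMPLE_LAZY_EVAL(next_lazy_slot()) would misbehave.
 */
#define EXAMPLE_LAZY_EVAL(ls) \
	((ls)->slot != NULL ? (ls)->slot : ((ls)->slot = (ls)->alloc_cb((ls)->arg)))

/* Inline-function version: the argument is evaluated exactly once. */
static inline TupleTableSlot *
example_lazy_eval(ExampleLazySlot *ls)
{
	if (ls->slot == NULL)
		ls->slot = ls->alloc_cb(ls->arg);
	return ls->slot;
}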


> From a8b4e8a7b27815e013ea07b8cc9ac68541a9ac07 Mon Sep 17 00:00:00 2001
> From: Alexander Korotkov 
> Date: Tue, 21 Mar 2023 00:34:15 +0300
> Subject: [PATCH 1/2] Evade extra table_tuple_fetch_row_version() in
>  ExecUpdate()/ExecDelete()
>
> When we lock tuple using table_tuple_lock() then we at the same time fetch
> the locked tuple to the slot.  In this case we can skip extra
> table_tuple_fetch_row_version() thank to we've already fetched the 'old' tuple
> and nobody can change it concurrently since it's locked.
>
> Discussion: 
> https://postgr.es/m/CAPpHfdua-YFw3XTprfutzGp28xXLigFtzNbuFY8yPhqeq6X5kg%40mail.gmail.com
> Reviewed-by: Aleksander Alekseev, Pavel Borisov, Vignesh C, Mason Sharp
> Reviewed-by: Andres Freund, Chris Travers
> ---
>  src/backend/executor/nodeModifyTable.c | 48 +++---
>  1 file changed, 35 insertions(+), 13 deletions(-)
>
> diff --git a/src/backend/executor/nodeModifyTable.c 
> b/src/backend/executor/nodeModifyTable.c
> index 3a673895082..93ebfdbb0d8 100644
> --- a/src/backend/executor/nodeModifyTable.c
> +++ b/src/backend/executor/nodeModifyTable.c
> @@ -1559,6 +1559,22 @@ ldelete:
>   {
>   case TM_Ok:
>   
> Assert(context->tmfd.traversed);
> +
> + /*
> +  * Save locked tuple 
> for further processing of
> +  * RETURNING clause.
> +  */
> + if (processReturning &&
> + 
> resultRelInfo->ri_projectReturning &&
> + 
> !resultRelInfo->ri_FdwRoutine)
> + {
> + TupleTableSlot 
> *returningSlot;
> +
> + returningSlot = 
> ExecGetReturningSlot(estate, resultRelInfo);
> + 
> ExecCopySlot(returningSlot, inputslot);
> + 
> ExecMaterializeSlot(returningSlot);
> + }
> +
>   epqslot = 
> EvalPlanQual(context->epqstate,
>   
>resultRelationDesc,
>   
>resultRelInfo->ri_RangeTableIndex,

This seems a bit byzantine. We use inputslot = EvalPlanQualSlot(...) to make
EvalPlanQual() a bit cheaper, because that avoids a slot copy inside
EvalPlanQual().  But now we copy and materialize that slot anyway - and we do
so even if EPQ fails. And we afaics also do it when epqreturnslot is set, in
which case we'll afaics never use the copied slot.

Read the next paragraph below before replying to the above - I don't think
this is right for other reasons:

> @@ -1673,12 +1689,17 @@ ldelete:
>   }
>   else
>   {
> + /*
> +  * Tuple can be already fetched to the returning slot 
> in case
> +  * we've previously locked it.  Fetch the tuple only if 
> the slot
> +  * is empty.
> +  */
>   slot = ExecGetReturningSlot(estate, resultRelInfo);
>   if (oldtuple != NULL)
>   {
>   ExecForceStoreHeapTuple(oldtuple, slot, false);
>   }
> - else
> + else if (TupIsNull(slot))
>   {
>   if 
> (!table_tuple_fetch_row_version(resultRelationDesc, tupleid,
>   
>SnapshotAny, slot))


I don't think this is correct as-is - what if ExecDelete() is called with some
older tuple in the returning slot? If we don't enter the TM_Updated path, it
won't get updated, and we'll return the wrong tuple.  It 

Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2023-03-12 Thread Alexander Korotkov
On Fri, Mar 10, 2023 at 8:17 PM Chris Travers  wrote:
> "Right, the improvement this patch gives to the heap is not the full 
> motivation. Another motivation is the improvement it gives to TableAM API. 
> Our current API implies that the effort on locating the tuple by tid is 
> small. This is more or less true for the heap, where we just need to pin and 
> lock the buffer. But imagine other TableAM implementations, where locating a 
> tuple is more expensive."
>
> Yeah. Our TableAM API is a very nice start to getting pluggable storage, but 
> we still have a long ways to go to have an ability to really provide a wide 
> variety of pluggable storage engines.
>
> In particular, the following approaches are likely to have much more 
> expensive tid lookups:
>  - columnar storage (may require a lot of random IO to reconstruct a tuple)
>  - index oriented storage (tid no longer physically locatable in the file via 
> seek)
>  - compressed cold storage like pg_cryogen (again seek may be problematic).
>
> To my mind I think the performance benefits are a nice side benefit, but the 
> main interest I have on this is regarding improvements in the TableAM 
> capabilities.  I cannot see how to do this without a lot more infrastructure.

Chris, thank you for your feedback.

The revised patch set is attached.  Some comments are improved.  Also,
we implicitly skip the new facility for the MERGE case.  As I understand
it, Dean Rasheed is going to revise the locking for MERGE soon [1].

Pavel, could you please re-run your test case on the revised patch?

1. 
https://www.postgresql.org/message-id/caezatcu9e9ccbi70ynbccf7xvz+zrjid0_6eeq2zeza1p+7...@mail.gmail.com

--
Regards,
Alexander Korotkov
From a06e0864fd45b3fcac961a45e8c66f4dad6aac47 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov 
Date: Wed, 11 Jan 2023 15:00:49 +0300
Subject: [PATCH 1/2] Evade extra table_tuple_fetch_row_version() in
 ExecUpdate()/ExecDelete()

When we lock tuple using table_tuple_lock() then we at the same time fetch
the locked tuple to the slot.  In this case we can skip extra
table_tuple_fetch_row_version() thank to we've already fetched the 'old' tuple
and nobody can change it concurrently since it's locked.

Discussion: https://postgr.es/m/CAPpHfdua-YFw3XTprfutzGp28xXLigFtzNbuFY8yPhqeq6X5kg%40mail.gmail.com
Reviewed-by: Aleksander Alekseev, Pavel Borisov, Vignesh C, Mason Sharp
---
 src/backend/executor/nodeModifyTable.c | 46 ++
 1 file changed, 33 insertions(+), 13 deletions(-)

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 6f44d71f16b..839e8fe0d04 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -1589,6 +1589,21 @@ ldelete:
 	{
 		case TM_Ok:
 			Assert(context->tmfd.traversed);
+
+			/*
+			 * Save locked tuple for further processing of
+			 * RETURNING clause.
+			 */
+			if (processReturning &&
+resultRelInfo->ri_projectReturning &&
+!resultRelInfo->ri_FdwRoutine)
+			{
+TupleTableSlot *returningSlot;
+returningSlot = ExecGetReturningSlot(estate, resultRelInfo);
+ExecCopySlot(returningSlot, inputslot);
+ExecMaterializeSlot(returningSlot);
+			}
+
 			epqslot = EvalPlanQual(context->epqstate,
    resultRelationDesc,
    resultRelInfo->ri_RangeTableIndex,
@@ -1703,12 +1718,16 @@ ldelete:
 		}
 		else
 		{
+			/*
+			 * Tuple can be already fetched to the returning slot in case we've
+			 * previously locked it.  Fetch the tuple only if the slot is empty.
+			 */
 			slot = ExecGetReturningSlot(estate, resultRelInfo);
 			if (oldtuple != NULL)
 			{
 ExecForceStoreHeapTuple(oldtuple, slot, false);
 			}
-			else
+			else if (TupIsNull(slot))
 			{
 if (!table_tuple_fetch_row_version(resultRelationDesc, tupleid,
    SnapshotAny, slot))
@@ -2409,6 +2428,19 @@ redo_act:
 		case TM_Ok:
 			Assert(context->tmfd.traversed);
 
+			/* Make sure ri_oldTupleSlot is initialized. */
+			if (unlikely(!resultRelInfo->ri_projectNewInfoValid))
+ExecInitUpdateProjection(context->mtstate,
+		 resultRelInfo);
+
+			/*
+			 * Save the locked tuple for further calculation of
+			 * the new tuple.
+			 */
+			oldSlot = resultRelInfo->ri_oldTupleSlot;
+			ExecCopySlot(oldSlot, inputslot);
+			ExecMaterializeSlot(oldSlot);
+
 			epqslot = EvalPlanQual(context->epqstate,
    resultRelationDesc,
    resultRelInfo->ri_RangeTableIndex,
@@ -2417,18 +2449,6 @@ redo_act:
 /* Tuple not passing quals anymore, exiting... */
 return NULL;
 
-			/* Make sure ri_oldTupleSlot is initialized. */
-			if (unlikely(!resultRelInfo->ri_projectNewInfoValid))
-ExecInitUpdateProjection(context->mtstate,
-		 resultRelInfo);
-
-			/* Fetch the most recent version of old tuple. */
-			oldSlot 

Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2023-03-10 Thread Chris Travers
"Right, the improvement this patch gives to the heap is not the full 
motivation. Another motivation is the improvement it gives to TableAM API. Our 
current API implies that the effort on locating the tuple by tid is small. This 
is more or less true for the heap, where we just need to pin and lock the 
buffer. But imagine other TableAM implementations, where locating a tuple is 
more expensive."

Yeah. Our TableAM API is a very nice start toward pluggable storage, but we
still have a long way to go before we can really provide a wide variety of
pluggable storage engines.

In particular, the following approaches are likely to have much more expensive 
tid lookups:
 - columnar storage (may require a lot of random IO to reconstruct a tuple)
 - index oriented storage (tid no longer physically locatable in the file via 
seek)
 - compressed cold storage like pg_cryogen (again, seek may be problematic).

To my mind I think the performance benefits are a nice side benefit, but the 
main interest I have on this is regarding improvements in the TableAM 
capabilities.  I cannot see how to do this without a lot more infrastructure.

Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2023-03-10 Thread Alexander Korotkov
On Wed, Mar 8, 2023 at 4:22 AM Andres Freund  wrote:
> On 2023-03-07 04:45:32 +0300, Alexander Korotkov wrote:
> > The second patch now implements a concept of LazyTupleTableSlot, a slot
> > which gets allocated only when needed.  Also, there is more minor
> > refactoring and more comments.
>
> This patch already is pretty big for what it actually improves. Introducing
> even infrastructure to get a not that big win, in a not particularly
> interesting, extreme, workload...

It's true that the win isn't dramatic.  But I can't agree that the
workload isn't interesting.  In my experience, high contention over a
limited set of rows is something that frequently happens in production.
I have personally taken part in multiple investigations of such workloads.

> What is motivating this?

Right, the improvement this patch gives to the heap is not the full
motivation.  Another motivation is the improvement it gives to the TableAM
API.  Our current API implies that the effort of locating a tuple by
tid is small.  This is more or less true for the heap, where we just need
to pin and lock the buffer.  But imagine other TableAM
implementations, where locating a tuple is more expensive.  The current
API insists that we do that twice: once for the update attempt and once
for the lock.  Doing it in a single call could give such TableAMs
significant savings (and even for the heap it's something).  I'm working
on such a TableAM: OrioleDB, which implements index-organized tables.
And I know there are other examples (for instance, zedstore) where the
TID lookup includes some indirection.
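
To make the "twice vs. once" point concrete, here is a schematic sketch of
the two calling patterns.  The first call sequence follows the existing
tableam functions; the extended signature and the
EXAMPLE_MODIFY_WAIT_LOCK_UPDATED flag in the second are hypothetical
illustrations, not the literal API of the patch:

/* Today: on a concurrent update, the tuple is located by tid twice. */
result = table_tuple_delete(rel, tid, cid, snapshot, crosscheck,
							true /* wait */ , &tmfd, changingPart);
if (result == TM_Updated)
	result = table_tuple_lock(rel, tid, snapshot, oldSlot, cid,
							  LockTupleExclusive, LockWaitBlock,
							  TUPLE_LOCK_FLAG_FIND_LAST_VERSION, &tmfd);

/*
 * Proposed shape: a single call locates the tuple once and, on a concurrent
 * update, locks the latest version and returns it in oldSlot.
 */
result = table_tuple_delete(rel, tid, cid, snapshot, crosscheck,
							EXAMPLE_MODIFY_WAIT_LOCK_UPDATED,
							&tmfd, changingPart, oldSlot);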

--
Regards,
Alexander Korotkov




Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2023-03-07 Thread Andres Freund
Hi,

On 2023-03-07 04:45:32 +0300, Alexander Korotkov wrote:
> The second patch now implements a concept of LazyTupleTableSlot, a slot
> which gets allocated only when needed.  Also, there is more minor
> refactoring and more comments.

This patch already is pretty big for what it actually improves. Introducing
even infrastructure to get a not that big win, in a not particularly
interesting, extreme, workload...

What is motivating this?

Greetings,

Andres Freund




Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2023-03-07 Thread Alexander Korotkov
On Tue, Mar 7, 2023 at 7:26 PM Pavel Borisov  wrote:
> On Tue, 7 Mar 2023, 10:10 Alexander Korotkov,  wrote:
>> I don't know what exactly Pavel meant, but average overall numbers for
>> low concurrency are.
>> master: 420401 (stddev of average 233)
>> patchset v11: 420111 (stddev of average 199)
>> The difference is less than 0.1% and that is very safely within the error.
>
>
> Yes, the only thing that I meant is that for low-concurrency case the results 
> between patch and master are within the difference between repeated series of 
> measurements. So I concluded that the test can not prove any difference 
> between patch and master.
>
> I haven't meant or written there is some performance degradation.
>
> Alexander, I suppose did an extra step and calculated overall average and 
> stddev, from raw data provided. Thanks!

Pavel, thank you for verifying this.

Could you, please, rerun performance benchmarks for the v13?  It
introduces LazyTupleTableSlot, which shouldn't do any measurable
impact on performance.  But still.

--
Regards,
Alexander Korotkov




Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2023-03-07 Thread Pavel Borisov
Hi, Andres and Alexander!

On Tue, 7 Mar 2023, 10:10 Alexander Korotkov,  wrote:

> On Tue, Mar 7, 2023 at 4:50 AM Andres Freund  wrote:
> > On 2023-03-02 14:28:56 +0400, Pavel Borisov wrote:
> > > 2. Heap updates with low tuple concurrency:
> > > Prepare with pkeys (pgbench -d postgres -i -I dtGvp -s 300
> --unlogged-tables)
> > > Update 3*10^7 rows, 50 conns (pgbench postgres -f
> > > ./update-only-account.sql -s 300 -P10 -M prepared -T 600 -j 5 -c 50)
> > >
> > > Result: Both patches and master are the same within a tolerance of
> > > less than 0.7%.
> >
> > What exactly does that mean? I would definitely not want to accept a 0.7%
> > regression of the uncontended case to benefit the contended case here...
>
> I don't know what exactly Pavel meant, but average overall numbers for
> low concurrency are.
> master: 420401 (stddev of average 233)
> patchset v11: 420111 (stddev of average 199)
> The difference is less than 0.1% and that is very safely within the error.
>

Yes, the only thing I meant is that for the low-concurrency case the
results for the patch and master are within the difference between repeated
series of measurements. So I concluded that the test cannot prove any
difference between the patch and master.

I didn't mean or write that there is any performance degradation.

Alexander, I suppose, did the extra step of calculating the overall average
and stddev from the raw data provided. Thanks!

Regards,
Pavel.

>


Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2023-03-06 Thread Alexander Korotkov
On Tue, Mar 7, 2023 at 4:50 AM Andres Freund  wrote:
> On 2023-03-02 14:28:56 +0400, Pavel Borisov wrote:
> > 2. Heap updates with low tuple concurrency:
> > Prepare with pkeys (pgbench -d postgres -i -I dtGvp -s 300 
> > --unlogged-tables)
> > Update 3*10^7 rows, 50 conns (pgbench postgres -f
> > ./update-only-account.sql -s 300 -P10 -M prepared -T 600 -j 5 -c 50)
> >
> > Result: Both patches and master are the same within a tolerance of
> > less than 0.7%.
>
> What exactly does that mean? I would definitely not want to accept a 0.7%
> regression of the uncontended case to benefit the contended case here...

I don't know exactly what Pavel meant, but the overall average numbers for
low concurrency are:
master: 420401 (stddev of average 233)
patchset v11: 420111 (stddev of average 199)
The difference is less than 0.1%, which is very safely within the error.

--
Regards,
Alexander Korotkov




Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2023-03-06 Thread Andres Freund
Hi,

On 2023-03-02 14:28:56 +0400, Pavel Borisov wrote:
> 2. Heap updates with low tuple concurrency:
> Prepare with pkeys (pgbench -d postgres -i -I dtGvp -s 300 --unlogged-tables)
> Update 3*10^7 rows, 50 conns (pgbench postgres -f
> ./update-only-account.sql -s 300 -P10 -M prepared -T 600 -j 5 -c 50)
> 
> Result: Both patches and master are the same within a tolerance of
> less than 0.7%.

What exactly does that mean? I would definitely not want to accept a 0.7%
regression of the uncontended case to benefit the contended case here...

Greetings,

Andres Freund




Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2023-03-06 Thread Alexander Korotkov
On Thu, Mar 2, 2023 at 11:16 PM Alexander Korotkov  wrote:
> On Thu, Mar 2, 2023 at 9:17 PM Pavel Borisov  wrote:
> > On Thu, 2 Mar 2023 at 18:53, Alexander Korotkov  
> > wrote:
> > > On Thu, Mar 2, 2023 at 1:29 PM Pavel Borisov  
> > > wrote:
> > > > > Let's see the performance results for the patchset.  I'll properly
> > > > > revise the comments if results will be good.
> > > > >
> > > > > Pavel, could you please re-run your tests over revised patchset?
> > > >
> > > > Since last time I've improved the test to avoid significant series
> > > > differences due to AWS storage access variation that is seen in [1].
> > > > I.e. each series of tests is run on a tmpfs with newly inited pgbench
> > > > tables and vacuum. Also, I've added a test for low-concurrency updates
> > > > where the locking optimization isn't expected to improve performance,
> > > > just to make sure the patches don't make things worse.
> > > >
> > > > The tests are as follows:
> > > > 1. Heap updates with high tuple concurrency:
> > > > Prepare without pkeys (pgbench -d postgres -i -I dtGv -s 10 
> > > > --unlogged-tables)
> > > > Update tellers 100 rows, 50 conns ( pgbench postgres -f
> > > > ./update-only-tellers.sql -s 10 -P10 -M prepared -T 600 -j 5 -c 50 )
> > > >
> > > > Result: Average of 5 series with patches (0001+0002) is around 5%
> > > > faster than both master and patch 0001. Still, there are some
> > > > fluctuations between different series of the measurements of the same
> > > > patch, but much less than in [1]
> > >
> > > Thank you for running this that fast!
> > >
> > > So, it appears that 0001 patch has no effect.  So, we probably should
> > > consider to drop 0001 patch and consider just 0002 patch.
> > >
> > > The attached patch v12 contains v11 0002 patch extracted separately.
> > > Please, add it to the performance comparison.  Thanks.
> >
> > I've done a benchmarking on a full series of four variants: master vs
> > v11-0001 vs v11-0001+0002 vs v12 in the same configuration as in the
> > previous measurement. The results are as follows:
> >
> > 1. Heap updates with high tuple concurrency:
> > Average of 5 series v11-0001+0002 is around 7% faster than the master.
> > I need to note that while v11-0001+0002 shows consistent performance
> > improvement over the master, its value can not be determined more
> > precisely than a couple of percents even with averaging. So I'd
> > suppose we may not conclude from the results if a more subtle
> > difference between v11-0001+0002 vs v12 (and master vs v11-0001)
> > really exists.
> >
> > 2. Heap updates with low tuple concurrency:
> > All patches and master are still the same within a tolerance of
> > less than 0.7%.
> >
> > Overall patch v11-0001+0002 doesn't show performance degradation so I
> > don't see why to apply only patch 0002 skipping 0001.
>
> Thank you, Pavel.  So, it seems that we have substantial benefit only
> with two patches.  So, I'll continue working on both of them.

The revised patchset is attached.  The patch removing the extra
table_tuple_fetch_row_version() is back.  The second patch now
implements the concept of a LazyTupleTableSlot, a slot which gets
allocated only when needed.  Also, there is more minor refactoring and
there are more comments.

--
Regards,
Alexander Korotkov


0001-Evade-extra-table_tuple_fetch_row_version-in-Exe-v13.patch
Description: Binary data


0002-Allow-locking-updated-tuples-in-tuple_update-and-v13.patch
Description: Binary data


Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2023-03-02 Thread Alexander Korotkov
On Thu, Mar 2, 2023 at 9:17 PM Pavel Borisov  wrote:
> On Thu, 2 Mar 2023 at 18:53, Alexander Korotkov  wrote:
> > On Thu, Mar 2, 2023 at 1:29 PM Pavel Borisov  wrote:
> > > > Let's see the performance results for the patchset.  I'll properly
> > > > revise the comments if results will be good.
> > > >
> > > > Pavel, could you please re-run your tests over revised patchset?
> > >
> > > Since last time I've improved the test to avoid significant series
> > > differences due to AWS storage access variation that is seen in [1].
> > > I.e. each series of tests is run on a tmpfs with newly inited pgbench
> > > tables and vacuum. Also, I've added a test for low-concurrency updates
> > > where the locking optimization isn't expected to improve performance,
> > > just to make sure the patches don't make things worse.
> > >
> > > The tests are as follows:
> > > 1. Heap updates with high tuple concurrency:
> > > Prepare without pkeys (pgbench -d postgres -i -I dtGv -s 10 
> > > --unlogged-tables)
> > > Update tellers 100 rows, 50 conns ( pgbench postgres -f
> > > ./update-only-tellers.sql -s 10 -P10 -M prepared -T 600 -j 5 -c 50 )
> > >
> > > Result: Average of 5 series with patches (0001+0002) is around 5%
> > > faster than both master and patch 0001. Still, there are some
> > > fluctuations between different series of the measurements of the same
> > > patch, but much less than in [1]
> >
> > Thank you for running this that fast!
> >
> > So, it appears that 0001 patch has no effect.  So, we probably should
> > consider to drop 0001 patch and consider just 0002 patch.
> >
> > The attached patch v12 contains v11 0002 patch extracted separately.
> > Please, add it to the performance comparison.  Thanks.
>
> I've done a benchmarking on a full series of four variants: master vs
> v11-0001 vs v11-0001+0002 vs v12 in the same configuration as in the
> previous measurement. The results are as follows:
>
> 1. Heap updates with high tuple concurrency:
> The average of 5 series shows v11-0001+0002 is around 7% faster than master.
> I need to note that while v11-0001+0002 shows a consistent performance
> improvement over master, its value cannot be determined more precisely
> than a couple of percent even with averaging. So I'd suppose we cannot
> conclude from the results whether a more subtle difference between
> v11-0001+0002 vs v12 (and master vs v11-0001) really exists.
>
> 2. Heap updates with low tuple concurrency:
> All patches and master are still the same within a tolerance of
> less than 0.7%.
>
> Overall, patch v11-0001+0002 doesn't show performance degradation, so I
> don't see a reason to apply only patch 0002 and skip 0001.

Thank you, Pavel.  So, it seems that we get a substantial benefit only
when both patches are applied.  I'll continue working on both of them.

--
Regards,
Alexander Korotkov




Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2023-03-02 Thread Alexander Korotkov
Hi, Pavel!

On Thu, Mar 2, 2023 at 1:29 PM Pavel Borisov  wrote:
> > Let's see the performance results for the patchset.  I'll properly
> > revise the comments if the results are good.
> >
> > Pavel, could you please re-run your tests over the revised patchset?
>
> Since last time I've improved the test to avoid significant series
> differences due to AWS storage access variation that is seen in [1].
> I.e. each series of tests is run on a tmpfs with newly inited pgbench
> tables and vacuum. Also, I've added a test for low-concurrency updates
> where the locking optimization isn't expected to improve performance,
> just to make sure the patches don't make things worse.
>
> The tests are as follows:
> 1. Heap updates with high tuple concurrency:
> Prepare without pkeys (pgbench -d postgres -i -I dtGv -s 10 --unlogged-tables)
> Update tellers 100 rows, 50 conns ( pgbench postgres -f
> ./update-only-tellers.sql -s 10 -P10 -M prepared -T 600 -j 5 -c 50 )
>
> Result: Average of 5 series with patches (0001+0002) is around 5%
> faster than both master and patch 0001. Still, there are some
> fluctuations between different series of the measurements of the same
> patch, but much less than in [1]

Thank you for running this so fast!

So, it appears that the 0001 patch has no effect.  We should probably
consider dropping the 0001 patch and keeping just the 0002 patch.

The attached patch v12 contains the v11 0002 patch extracted separately.
Please add it to the performance comparison.  Thanks.

--
Regards,
Alexander Korotkov


0001-Allow-locking-updated-tuples-in-tuple_update-and-v12.patch
Description: Binary data


Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2023-03-01 Thread Gregory Stark
It looks like this patch received some feedback from Andres and hasn't
had any further work posted. I'm going to move it to "Waiting on
Author".

It doesn't sound like this is likely to get committed this release
cycle unless responding to Andres's points is simpler than I expect.




Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2023-03-01 Thread Alexander Korotkov
Hi, Andres.

Thank you for your review.  Sorry for the late reply.  It took me some
time to figure out how to revise the patch.

The revised patchset is attached.  I decided to split the patch into two:
1) Avoid re-fetching the "original" row version during update and delete.
2) Save effort by re-using the existing context of
tuple_update()/tuple_delete() for locking the tuple.
These are two separate optimizations, so let's evaluate their
performance separately.

On Tue, Jan 10, 2023 at 4:07 AM Andres Freund  wrote:
> I'm a bit worried that this is optimizing the rare case while hurting the
> common case. See e.g. my point below about creating additional slots in the
> happy path.

This makes sense.  It is worth allocating the slot only if we're going
to store a tuple in it.  I implemented this by passing a callback for
slot allocation instead of the slot itself.
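
To illustrate the callback idea concretely, here is a sketch of how the
AM-side code could look; the callback type name, the wait-policy handling,
and the overall shape are my assumptions for illustration, not the code
from the attached patches:

```
#include "postgres.h"
#include "access/heapam.h"
#include "access/tableam.h"

/* Hypothetical callback type; the actual patch may spell this differently. */
typedef TupleTableSlot *(*SlotAllocCB) (void *arg);

static TM_Result
sketch_heapam_tuple_delete(Relation relation, ItemPointer tid, CommandId cid,
						   Snapshot snapshot, Snapshot crosscheck, bool wait,
						   TM_FailureData *tmfd, bool changingPart,
						   SlotAllocCB alloc_slot, void *alloc_arg)
{
	TM_Result	result;

	result = heap_delete(relation, tid, cid, crosscheck, wait, tmfd,
						 changingPart);

	if (result == TM_Updated && alloc_slot != NULL)
	{
		/* The slot is created only here, on the rarely taken retry path. */
		TupleTableSlot *slot = alloc_slot(alloc_arg);

		/*
		 * LockWaitBlock is a placeholder; the real code also has to honor
		 * the 'wait' flag, per the review note quoted below.
		 */
		result = heapam_tuple_lock_internal(relation, tid, snapshot, slot,
											cid, LockTupleExclusive,
											LockWaitBlock,
											TUPLE_LOCK_FLAG_FIND_LAST_VERSION,
											tmfd, true);
		if (result == TM_Ok)
		{
			tmfd->traversed = true;
			return TM_Updated;
		}
	}

	return result;
}
```

The happy path never touches the callback, so the common case creates no
extra slot at all.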

> It's also not clear that this change is right directionally. If we want to avoid
> re-fetching the "original" row version, why don't we provide that
> functionality via table_tuple_lock()?

These are two distinct optimizations.  Now, they come as two distinct patches.

> On 2023-01-09 13:29:18 +0300, Alexander Korotkov wrote:
> > @@ -53,6 +53,12 @@ static bool SampleHeapTupleVisible(TableScanDesc scan, Buffer buffer,
> >                                    HeapTuple tuple,
> >                                    OffsetNumber tupoffset);
> >
> > +static TM_Result heapam_tuple_lock_internal(Relation relation, ItemPointer tid,
> > +                                            Snapshot snapshot, TupleTableSlot *slot,
> > +                                            CommandId cid, LockTupleMode mode,
> > +                                            LockWaitPolicy wait_policy, uint8 flags,
> > +                                            TM_FailureData *tmfd, bool updated);
> > +
> >  static BlockNumber heapam_scan_get_blocks_done(HeapScanDesc hscan);
> >
> >  static const TableAmRoutine heapam_methods;
> > @@ -299,14 +305,39 @@ heapam_tuple_complete_speculative(Relation relation, TupleTableSlot *slot,
> >  static TM_Result
> >  heapam_tuple_delete(Relation relation, ItemPointer tid, CommandId cid,
> >                      Snapshot snapshot, Snapshot crosscheck, bool wait,
> > -                    TM_FailureData *tmfd, bool changingPart)
> > +                    TM_FailureData *tmfd, bool changingPart,
> > +                    TupleTableSlot *lockedSlot)
> >  {
> > +    TM_Result   result;
> > +
> >      /*
> >       * Currently Deleting of index tuples are handled at vacuum, in case if
> >       * the storage itself is cleaning the dead tuples by itself, it is the
> >       * time to call the index tuple deletion also.
> >       */
> > -    return heap_delete(relation, tid, cid, crosscheck, wait, tmfd, changingPart);
> > +    result = heap_delete(relation, tid, cid, crosscheck, wait, tmfd, changingPart);
> > +
> > +    /*
> > +     * If the tuple has been concurrently updated, get lock already so that on
> > +     * retry it will succeed, provided that the caller asked to do this by
> > +     * providing a lockedSlot.
> > +     */
> > +    if (result == TM_Updated && lockedSlot != NULL)
> > +    {
> > +        result = heapam_tuple_lock_internal(relation, tid, snapshot,
> > +                                            lockedSlot, cid, LockTupleExclusive,
> > +                                            LockWaitBlock,
> > +                                            TUPLE_LOCK_FLAG_FIND_LAST_VERSION,
> > +                                            tmfd, true);
>
> You're ignoring the 'wait' parameter here, no? I think the modification to
> heapam_tuple_update() has the same issue.

Yep.  I didn't catch this, because currently we also call
tuple_update()/tuple_delete() with wait == true.  Fixed.

> > + if (result == TM_Ok)
> > + {
> > + tmfd->traversed = true;
> > + return TM_Updated;
> > + }
> > + }
> > +
> > + return result;
>
> Doesn't this mean that the caller can't easily distinguish between
> heapam_tuple_delete() and heapam_tuple_lock_internal() returning a failure
> state?

Exactly.  But currently nodeModifyTable.c handles these failure states
in a similar way.  And I don't see why it should be different in the
future.

> > @@ -350,213 +402,8 @@ heapam_tuple_lock(Relation relation, ItemPointer tid, 
> > 

Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2023-02-28 Thread Alexander Korotkov
On Wed, Mar 1, 2023 at 12:03 AM Gregory Stark  wrote:
> It looks like this patch received some feedback from Andres and hasn't
> had any further work posted. I'm going to move it to "Waiting on
> Author".

I'll post the updated version in the next couple of days.

> It doesn't sound like this is likely to get committed this release
> cycle unless responding to Andres's points is simpler than I expect.

I wouldn't think ahead that much.

--
Regards,
Alexander Korotkov




Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2023-01-09 Thread Andres Freund
Hi,

I'm a bit worried that this is optimizing the rare case while hurting the
common case. See e.g. my point below about creating additional slots in the
happy path.

It's also not clear that this change is right directionally. If we want to avoid
re-fetching the "original" row version, why don't we provide that
functionality via table_tuple_lock()?


On 2023-01-09 13:29:18 +0300, Alexander Korotkov wrote:
> @@ -53,6 +53,12 @@ static bool SampleHeapTupleVisible(TableScanDesc scan, Buffer buffer,
>                                    HeapTuple tuple,
>                                    OffsetNumber tupoffset);
>
> +static TM_Result heapam_tuple_lock_internal(Relation relation, ItemPointer tid,
> +                                            Snapshot snapshot, TupleTableSlot *slot,
> +                                            CommandId cid, LockTupleMode mode,
> +                                            LockWaitPolicy wait_policy, uint8 flags,
> +                                            TM_FailureData *tmfd, bool updated);
> +
>  static BlockNumber heapam_scan_get_blocks_done(HeapScanDesc hscan);
>
>  static const TableAmRoutine heapam_methods;
> @@ -299,14 +305,39 @@ heapam_tuple_complete_speculative(Relation relation, TupleTableSlot *slot,
>  static TM_Result
>  heapam_tuple_delete(Relation relation, ItemPointer tid, CommandId cid,
>                      Snapshot snapshot, Snapshot crosscheck, bool wait,
> -                    TM_FailureData *tmfd, bool changingPart)
> +                    TM_FailureData *tmfd, bool changingPart,
> +                    TupleTableSlot *lockedSlot)
>  {
> +    TM_Result   result;
> +
>      /*
>       * Currently Deleting of index tuples are handled at vacuum, in case if
>       * the storage itself is cleaning the dead tuples by itself, it is the
>       * time to call the index tuple deletion also.
>       */
> -    return heap_delete(relation, tid, cid, crosscheck, wait, tmfd, changingPart);
> +    result = heap_delete(relation, tid, cid, crosscheck, wait, tmfd, changingPart);
> +
> +    /*
> +     * If the tuple has been concurrently updated, get lock already so that on
> +     * retry it will succeed, provided that the caller asked to do this by
> +     * providing a lockedSlot.
> +     */
> +    if (result == TM_Updated && lockedSlot != NULL)
> +    {
> +        result = heapam_tuple_lock_internal(relation, tid, snapshot,
> +                                            lockedSlot, cid, LockTupleExclusive,
> +                                            LockWaitBlock,
> +                                            TUPLE_LOCK_FLAG_FIND_LAST_VERSION,
> +                                            tmfd, true);

You're ignoring the 'wait' parameter here, no? I think the modification to
heapam_tuple_update() has the same issue.


> + if (result == TM_Ok)
> + {
> + tmfd->traversed = true;
> + return TM_Updated;
> + }
> + }
> +
> + return result;

Doesn't this mean that the caller can't easily distinguish between
heapam_tuple_delete() and heapam_tuple_lock_internal() returning a failure
state?


> @@ -350,213 +402,8 @@ heapam_tuple_lock(Relation relation, ItemPointer tid, Snapshot snapshot,
>                     LockWaitPolicy wait_policy, uint8 flags,
>                     TM_FailureData *tmfd)
>  {

Moving the entire body of the function around makes it harder to review
this change, because the code movement is intermingled with "actual" changes.


> +/*
> + * This routine does the work for heapam_tuple_lock(), but also support
> + * `updated` to re-use the work done by heapam_tuple_update() or
> + * heapam_tuple_delete() on fetching tuple and checking its visibility.
> + */
> +static TM_Result
> +heapam_tuple_lock_internal(Relation relation, ItemPointer tid, Snapshot snapshot,
> +                           TupleTableSlot *slot, CommandId cid, LockTupleMode mode,
> +                           LockWaitPolicy wait_policy, uint8 flags,
> +                           TM_FailureData *tmfd, bool updated)
> +{
> +    BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
> +    TM_Result   result;
> +    Buffer      buffer = InvalidBuffer;
> +    HeapTuple   tuple = &bslot->base.tupdata;
> +    bool        follow_updates;
> +
> +    follow_updates

Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2023-01-09 Thread Andres Freund
Hi,

On 2023-01-09 13:46:50 +0300, Alexander Korotkov wrote:
> I'm going to push v9.

Could you hold off for a bit? I'd like to look at this, I'm not sure I like
the direction.

Greetings,

Andres Freund




Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2023-01-09 Thread Alexander Korotkov
Hi, Pavel!

On Mon, Jan 9, 2023 at 1:41 PM Pavel Borisov  wrote:
> On Mon, 9 Jan 2023 at 13:29, Alexander Korotkov  wrote:
> > On Mon, Jan 9, 2023 at 1:10 PM Alexander Korotkov  
> > wrote:
> > > On Mon, Jan 9, 2023 at 12:56 PM Aleksander Alekseev
> > >  wrote:
> > > > > I'm going to push this if no objections.
> > > >
> > > > I took a fresh look at the patch and it LGTM. I only did a few
> > > > cosmetic changes, PFA v7.
> > > >
> > > > Changes since v6 are:
> > >
> > Thank you for looking into this.  It appears that I've applied the changes
> > proposed by Mason to v5, not v6.  That led to the comment mismatch with
> > the code that you noticed.  v8 should be correct.  Please recheck.
> >
> > v9 also incorporates lost changes to the commit message by Pavel Borisov.
> I've looked through patch v9. It resembles patch v5 plus the comment
> clarifications by Mason plus the right discussion link in the commit
> message from v8. Aleksander's proposal of an Assert in v7 was due to
> changes lost between v5 and v6; combining the connected variables in v5
> makes checks that they agree with each other unnecessary. So the
> changes from v7 are not in v9.
>
> Sorry for dwelling on small details. In my opinion the patch is now
> ready to be committed.

Sorry for creating this mess with lost changes.  And thank you for
confirming it's good now.  I'm going to push v9.

--
Regards,
Alexander Korotkov




Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2023-01-09 Thread Aleksander Alekseev
Alexander, Pavel,

> Sorry for dwelling on small details. In my opinion the patch is now
> ready to be committed.

Agree.

Personally I liked the version with the (lockUpdated, lockedSlot) pair a
bit more since it is a bit more readable; however, the version without
lockUpdated is less error-prone and slightly more efficient. So all in
all I have no strong opinion on which is better.

-- 
Best regards,
Aleksander Alekseev




Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2023-01-09 Thread Pavel Borisov
Hi, Alexander!

On Mon, 9 Jan 2023 at 13:29, Alexander Korotkov  wrote:
>
> On Mon, Jan 9, 2023 at 1:10 PM Alexander Korotkov  
> wrote:
> > On Mon, Jan 9, 2023 at 12:56 PM Aleksander Alekseev
> >  wrote:
> > > > I'm going to push this if no objections.
> > >
> > > I took a fresh look at the patch and it LGTM. I only did a few
> > > cosmetic changes, PFA v7.
> > >
> > > Changes since v6 are:
> >
> > Thank you for looking into this.  It appears that I've applied the changes
> > proposed by Mason to v5, not v6.  That led to the comment mismatch with
> > the code that you noticed.  v8 should be correct.  Please recheck.
>
> v9 also incorporates lost changes to the commit message by Pavel Borisov.
I've looked through patch v9. It resembles patch v5 plus the comment
clarifications by Mason plus the right discussion link in the commit
message from v8. Aleksander's proposal of an Assert in v7 was due to
changes lost between v5 and v6; combining the connected variables in v5
makes checks that they agree with each other unnecessary. So the
changes from v7 are not in v9.

Sorry for dwelling on small details. In my opinion the patch is now
ready to be committed.

Regards,
Pavel Borisov




Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2023-01-09 Thread Alexander Korotkov
On Mon, Jan 9, 2023 at 1:10 PM Alexander Korotkov  wrote:
> On Mon, Jan 9, 2023 at 12:56 PM Aleksander Alekseev
>  wrote:
> > > I'm going to push this if no objections.
> >
> > I took a fresh look at the patch and it LGTM. I only did a few
> > cosmetic changes, PFA v7.
> >
> > Changes since v6 are:
>
> Thank you for looking into this.  It appears that I've applied the changes
> proposed by Mason to v5, not v6.  That led to the comment mismatch with
> the code that you noticed.  v8 should be correct.  Please recheck.

v9 also incorporates lost changes to the commit message by Pavel Borisov.

--
Regards,
Alexander Korotkov


0001-Allow-locking-updated-tuples-in-tuple_update-and--v9.patch
Description: Binary data


Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2023-01-09 Thread Alexander Korotkov
Hi Aleksander,

On Mon, Jan 9, 2023 at 12:56 PM Aleksander Alekseev
 wrote:
> > I'm going to push this if no objections.
>
> I took a fresh look at the patch and it LGTM. I only did a few
> cosmetic changes, PFA v7.
>
> Changes since v6 are:

Thank you for looking into this.  It appears that I've applied the changes
proposed by Mason to v5, not v6.  That led to the comment mismatch with
the code that you noticed.  v8 should be correct.  Please recheck.

--
Regards,
Alexander Korotkov


0001-Allow-locking-updated-tuples-in-tuple_update-and--v8.patch
Description: Binary data


Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2023-01-09 Thread Aleksander Alekseev
Hi Alexander,

> I'm going to push this if no objections.

I took a fresh look at the patch and it LGTM. I only did a few
cosmetic changes, PFA v7.

Changes since v6 are:

```
@@ -318,12 +318,12 @@ heapam_tuple_delete(Relation relation, ItemPointer tid, CommandId cid,
 result = heap_delete(relation, tid, cid, crosscheck, wait, tmfd, changingPart);

 /*
- * If the tuple has been concurrently updated, get lock already so that on
- * retry it will succeed, provided that the caller asked to do this by
- * providing a lockedSlot.
+ * If lockUpdated is true and the tuple has been concurrently updated, get
+ * the lock immediately so that on retry we will succeed.
  */
 if (result == TM_Updated && lockUpdated)
 {
+Assert(lockedSlot != NULL);
```

... and the same for heapam_tuple_update().

-- 
Best regards,
Aleksander Alekseev


v7-0001-Allow-locking-updated-tuples-in-tuple_update-and-.patch
Description: Binary data


Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2023-01-08 Thread Alexander Korotkov
Hi, Mason!

Thank you very much for your review.

On Sun, Jan 8, 2023 at 9:33 PM Mason Sharp  wrote:
> On Fri, Jan 6, 2023 at 4:46 AM Pavel Borisov  wrote:
>> Besides, the new version has only some minor changes in the comments
>> and the commit message.
> It looks good, and the greater the concurrency the greater the benefit will 
> be. Just a few minor suggestions regarding comments.
>
> "ExecDeleteAct() have already locked the old tuple for us", change "have" to 
> "has".
>
> The comments in heapam_tuple_delete() and heapam_tuple_update() might be a 
> little clearer with something like:
>
> "If the tuple has been concurrently updated, get lock already so that on
> retry it will succeed, provided that the caller asked to do this by
> providing a lockedSlot."

Thank you.  These changes are incorporated into v6 of the patch.

> Also, not too important, but perhaps better clarify in the commit message 
> that the repeated work is driven by ExecUpdate and ExecDelete and can happen 
> multiple times depending on the concurrency.

Hmm... It can't happen an arbitrary number of times.  If the tuple was
concurrently updated, then we lock it.  Once we lock it, nobody can change
it until we finish our work.  So, I think no changes are needed.

I'm going to push this if no objections.

--
Regards,
Alexander Korotkov


0001-Allow-locking-updated-tuples-in-tuple_update-and--v6.patch
Description: Binary data


Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2023-01-08 Thread Mason Sharp
On Fri, Jan 6, 2023 at 4:46 AM Pavel Borisov  wrote:

> Hi, Alexander!
>
> On Thu, 5 Jan 2023 at 15:11, Alexander Korotkov 
> wrote:
> >
> > On Wed, Jan 4, 2023 at 5:05 PM Alexander Korotkov 
> wrote:
> > > On Wed, Jan 4, 2023 at 3:43 PM Pavel Borisov 
> wrote:
> > > > One more update of a patchset to avoid compiler warnings.
> > >
> > > Thank you for your help.  I'm going to provide the revised version of
> > > patch with comments and commit message in the next couple of days.
> >
> > The revised patch is attached.  It contains a descriptive commit message,
> > comments, and some minor code improvements.
>
> I've looked through the patch once again. It seems to be in good shape
> to be committed.
> I also noticed that at the tableam level and in the NodeModifyTable
> function calls we have a one-to-one correspondence between *lockedSlot
> and bool lockUpdated, but no checks on this in case something changes
> in the code in the future. I'd propose combining these variables so we
> remain free from these checks. See v5 of the patch. Tests pass
> successfully.
> Besides, the new version has only some minor changes in the comments
> and the commit message.
>
> Kind regards,
> Pavel Borisov,
> Supabase.
>

It looks good, and the greater the concurrency the greater the benefit will
be. Just a few minor suggestions regarding comments.

"ExecDeleteAct() have already locked the old tuple for us", change "have"
to "has".

The comments in heapam_tuple_delete() and heapam_tuple_update() might be a
little clearer with something like:

"If the tuple has been concurrently updated, get lock already so that on
retry it will succeed, provided that the caller asked to do this by
providing a lockedSlot."

Also, not too important, but perhaps better clarify in the commit message
that the repeated work is driven by ExecUpdate and ExecDelete and can
happen multiple times depending on the concurrency.

Best Regards,

Mason


Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2023-01-06 Thread Pavel Borisov
Hi, Alexander!

On Thu, 5 Jan 2023 at 15:11, Alexander Korotkov  wrote:
>
> On Wed, Jan 4, 2023 at 5:05 PM Alexander Korotkov  
> wrote:
> > On Wed, Jan 4, 2023 at 3:43 PM Pavel Borisov  wrote:
> > > One more update of a patchset to avoid compiler warnings.
> >
> > Thank you for your help.  I'm going to provide the revised version of
> > patch with comments and commit message in the next couple of days.
>
> The revised patch is attached.  It contains a descriptive commit message,
> comments, and some minor code improvements.

I've looked through the patch once again. It seems to be in good shape
to be committed.
I also noticed that at the tableam level and in the NodeModifyTable
function calls we have a one-to-one correspondence between *lockedSlot
and bool lockUpdated, but no checks on this in case something changes
in the code in the future. I'd propose combining these variables so we
remain free from these checks. See v5 of the patch. Tests pass
successfully.
Besides, the new version has only some minor changes in the comments
and the commit message.

Kind regards,
Pavel Borisov,
Supabase.


v5-0001-Allow-locking-updated-tuples-in-tuple_update-and-.patch
Description: Binary data


Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2023-01-05 Thread Alexander Korotkov
On Wed, Jan 4, 2023 at 5:05 PM Alexander Korotkov  wrote:
> On Wed, Jan 4, 2023 at 3:43 PM Pavel Borisov  wrote:
> > One more update of a patchset to avoid compiler warnings.
>
> Thank you for your help.  I'm going to provide the revised version of
> patch with comments and commit message in the next couple of days.

The revised patch is attached.  It contains a descriptive commit message,
comments, and some minor code improvements.

--
Regards,
Alexander Korotkov


0001-Allow-locking-updated-tuples-in-tuple_update-and--v4.patch
Description: Binary data


Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2023-01-04 Thread Alexander Korotkov
Hi, Pavel!

On Wed, Jan 4, 2023 at 3:43 PM Pavel Borisov  wrote:
> On Wed, 4 Jan 2023 at 12:52, Pavel Borisov  wrote:
> > On Wed, 4 Jan 2023 at 12:41, vignesh C  wrote:
> > >
> > > On Fri, 1 Jul 2022 at 16:49, Alexander Korotkov  
> > > wrote:
> > > >
> > > > Hackers,
> > > >
> > > > When working in the read committed transaction isolation mode
> > > > (default), we have the following sequence of actions when
> > > > tuple_update() or tuple_delete() find concurrently updated tuple.
> > > >
> > > > 1. tuple_update()/tuple_delete() returns TM_Updated
> > > > 2. tuple_lock()
> > > > 3. Re-evaluate plan qual (recheck if we still need to update/delete
> > > > and calculate the new tuple for update)
> > > > 4. tuple_update()/tuple_delete() (this time should be successful,
> > > > since we've previously locked the tuple).
> > > >
> > > > I wonder if we should merge steps 1 and 2. We could save some efforts
> > > > already done during tuple_update()/tuple_delete() for locking the
> > > > tuple. In heap table access method, we've to start tuple_lock() with
> > > > the first tuple in the chain, but tuple_update()/tuple_delete()
> > > > already visited it. For undo-based table access methods,
> > > > tuple_update()/tuple_delete() should start from the last version, why
> > > > don't place the tuple lock immediately once a concurrent update is
> > > > detected. I think this patch should have some performance benefits on
> > > > high concurrency.
> > > >
> > > > Also, the patch simplifies code in nodeModifyTable.c getting rid of
> > > > the nested case. I also get rid of extra
> > > > table_tuple_fetch_row_version() in ExecUpdate. Why re-fetch the old
> > > > tuple, when it should be exactly the same tuple we've just locked.
> > > >
> > > > I'm going to check the performance impact. Thoughts and feedback are 
> > > > welcome.
> > >
> > > The patch does not apply on top of HEAD as in [1], please post a rebased 
> > > patch:
> > > === Applying patches on top of PostgreSQL commit ID
> > > eb5ad4ff05fd382ac98cab60b82f7fd6ce4cfeb8 ===
> > > === applying patch
> > > ./0001-Lock-updated-tuples-in-tuple_update-and-tuple_del-v1.patch
> > > patching file src/backend/executor/nodeModifyTable.c
> > > ...
> > > Hunk #3 FAILED at 1376.
> > > ...
> > > 1 out of 15 hunks FAILED -- saving rejects to file
> > > src/backend/executor/nodeModifyTable.c.rej
> > >
> > > [1] - http://cfbot.cputube.org/patch_41_4099.log
> >
> > The rebased patch is attached. It's just a change in formatting, no
> > changes in code though.
>
> One more update of a patchset to avoid compiler warnings.

Thank you for your help.  I'm going to provide the revised version of
patch with comments and commit message in the next couple of days.

--
Regards,
Alexander Korotkov




Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2023-01-04 Thread Pavel Borisov
On Wed, 4 Jan 2023 at 12:52, Pavel Borisov  wrote:
>
> Hi, Vignesh!
>
> On Wed, 4 Jan 2023 at 12:41, vignesh C  wrote:
> >
> > On Fri, 1 Jul 2022 at 16:49, Alexander Korotkov  
> > wrote:
> > >
> > > Hackers,
> > >
> > > When working in the read committed transaction isolation mode
> > > (default), we have the following sequence of actions when
> > > tuple_update() or tuple_delete() find concurrently updated tuple.
> > >
> > > 1. tuple_update()/tuple_delete() returns TM_Updated
> > > 2. tuple_lock()
> > > 3. Re-evaluate plan qual (recheck if we still need to update/delete
> > > and calculate the new tuple for update)
> > > 4. tuple_update()/tuple_delete() (this time should be successful,
> > > since we've previously locked the tuple).
> > >
> > > I wonder if we should merge steps 1 and 2. We could save some efforts
> > > already done during tuple_update()/tuple_delete() for locking the
> > > tuple. In heap table access method, we've to start tuple_lock() with
> > > the first tuple in the chain, but tuple_update()/tuple_delete()
> > > already visited it. For undo-based table access methods,
> > > tuple_update()/tuple_delete() should start from the last version, why
> > > don't place the tuple lock immediately once a concurrent update is
> > > detected. I think this patch should have some performance benefits on
> > > high concurrency.
> > >
> > > Also, the patch simplifies code in nodeModifyTable.c getting rid of
> > > the nested case. I also get rid of extra
> > > table_tuple_fetch_row_version() in ExecUpdate. Why re-fetch the old
> > > tuple, when it should be exactly the same tuple we've just locked.
> > >
> > > I'm going to check the performance impact. Thoughts and feedback are 
> > > welcome.
> >
> > The patch does not apply on top of HEAD as in [1], please post a rebased 
> > patch:
> > === Applying patches on top of PostgreSQL commit ID
> > eb5ad4ff05fd382ac98cab60b82f7fd6ce4cfeb8 ===
> > === applying patch
> > ./0001-Lock-updated-tuples-in-tuple_update-and-tuple_del-v1.patch
> > patching file src/backend/executor/nodeModifyTable.c
> > ...
> > Hunk #3 FAILED at 1376.
> > ...
> > 1 out of 15 hunks FAILED -- saving rejects to file
> > src/backend/executor/nodeModifyTable.c.rej
> >
> > [1] - http://cfbot.cputube.org/patch_41_4099.log
>
> The rebased patch is attached. It's just a change in formatting, no
> changes in code though.

One more update of a patchset to avoid compiler warnings.

Regards,
Pavel Borisov


v3-0001-Lock-updated-tuples-in-tuple_update-and-tuple_del.patch
Description: Binary data


Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2023-01-04 Thread Pavel Borisov
Hi, Vignesh!

On Wed, 4 Jan 2023 at 12:41, vignesh C  wrote:
>
> On Fri, 1 Jul 2022 at 16:49, Alexander Korotkov  wrote:
> >
> > Hackers,
> >
> > When working in the read committed transaction isolation mode
> > (default), we have the following sequence of actions when
> > tuple_update() or tuple_delete() find concurrently updated tuple.
> >
> > 1. tuple_update()/tuple_delete() returns TM_Updated
> > 2. tuple_lock()
> > 3. Re-evaluate plan qual (recheck if we still need to update/delete
> > and calculate the new tuple for update)
> > 4. tuple_update()/tuple_delete() (this time should be successful,
> > since we've previously locked the tuple).
> >
> > I wonder if we should merge steps 1 and 2. We could save some efforts
> > already done during tuple_update()/tuple_delete() for locking the
> > tuple. In heap table access method, we've to start tuple_lock() with
> > the first tuple in the chain, but tuple_update()/tuple_delete()
> > already visited it. For undo-based table access methods,
> > tuple_update()/tuple_delete() should start from the last version, why
> > don't place the tuple lock immediately once a concurrent update is
> > detected. I think this patch should have some performance benefits on
> > high concurrency.
> >
> > Also, the patch simplifies code in nodeModifyTable.c getting rid of
> > the nested case. I also get rid of extra
> > table_tuple_fetch_row_version() in ExecUpdate. Why re-fetch the old
> > tuple, when it should be exactly the same tuple we've just locked.
> >
> > I'm going to check the performance impact. Thoughts and feedback are 
> > welcome.
>
> The patch does not apply on top of HEAD as in [1], please post a rebased 
> patch:
> === Applying patches on top of PostgreSQL commit ID
> eb5ad4ff05fd382ac98cab60b82f7fd6ce4cfeb8 ===
> === applying patch
> ./0001-Lock-updated-tuples-in-tuple_update-and-tuple_del-v1.patch
> patching file src/backend/executor/nodeModifyTable.c
> ...
> Hunk #3 FAILED at 1376.
> ...
> 1 out of 15 hunks FAILED -- saving rejects to file
> src/backend/executor/nodeModifyTable.c.rej
>
> [1] - http://cfbot.cputube.org/patch_41_4099.log

The rebased patch is attached. It's just a change in formatting, no
changes in code though.

Regards,
Pavel Borisov,
Supabase.


v2-0001-Lock-updated-tuples-in-tuple_update-and-tuple_del.patch
Description: Binary data


Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2023-01-04 Thread vignesh C
On Fri, 1 Jul 2022 at 16:49, Alexander Korotkov  wrote:
>
> Hackers,
>
> When working in the read committed transaction isolation mode
> (default), we have the following sequence of actions when
> tuple_update() or tuple_delete() find concurrently updated tuple.
>
> 1. tuple_update()/tuple_delete() returns TM_Updated
> 2. tuple_lock()
> 3. Re-evaluate plan qual (recheck if we still need to update/delete
> and calculate the new tuple for update)
> 4. tuple_update()/tuple_delete() (this time should be successful,
> since we've previously locked the tuple).
>
> I wonder if we should merge steps 1 and 2. We could save some efforts
> already done during tuple_update()/tuple_delete() for locking the
> tuple. In heap table access method, we've to start tuple_lock() with
> the first tuple in the chain, but tuple_update()/tuple_delete()
> already visited it. For undo-based table access methods,
> tuple_update()/tuple_delete() should start from the last version, why
> don't place the tuple lock immediately once a concurrent update is
> detected. I think this patch should have some performance benefits on
> high concurrency.
>
> Also, the patch simplifies code in nodeModifyTable.c getting rid of
> the nested case. I also get rid of extra
> table_tuple_fetch_row_version() in ExecUpdate. Why re-fetch the old
> tuple, when it should be exactly the same tuple we've just locked.
>
> I'm going to check the performance impact. Thoughts and feedback are welcome.

The patch does not apply on top of HEAD as in [1], please post a rebased patch:
=== Applying patches on top of PostgreSQL commit ID
eb5ad4ff05fd382ac98cab60b82f7fd6ce4cfeb8 ===
=== applying patch
./0001-Lock-updated-tuples-in-tuple_update-and-tuple_del-v1.patch
patching file src/backend/executor/nodeModifyTable.c
...
Hunk #3 FAILED at 1376.
...
1 out of 15 hunks FAILED -- saving rejects to file
src/backend/executor/nodeModifyTable.c.rej

[1] - http://cfbot.cputube.org/patch_41_4099.log

Regards,
Vignesh




Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2022-07-29 Thread Alexander Korotkov
Hi, Pavel!

On Fri, Jul 29, 2022 at 11:12 AM Pavel Borisov  wrote:
> I ran the following benchmark on master branch (15) vs patch (15-lock):
>
> On a 36-vcore AWS server, I've run an UPDATE-only pgbench script with 50
> connections on pgbench_tellers with 100 rows. The idea was to introduce as
> much concurrency for updates as possible while avoiding many clients being
> in a wait state.
> Indexes were not built, to avoid index-update-related delays.
> I did 2 runs, each consisting of 6 series of updates (1st run:
> master-patch-master-patch-master-patch, 2nd run:
> patch-master-patch-master-patch-master).
> Each series started a fresh server and did VACUUM FULL so that heap relation
> bloat from the previous series would not affect the current one. Each series
> collected data for 10 minutes, with the first minute's data being dropped.
> Disk-related operations were suppressed where possible (WAL, fsync etc.)
>
> postgresql.conf:
> fsync = off
> autovacuum = off
> full_page_writes = off
> max_worker_processes = 99
> max_parallel_workers = 99
> max_connections = 100
> shared_buffers = 4096MB
> work_mem = 50MB
>
> Attached are pictures of 2 runs, shell script, and SQL script that were 
> running.
> According to htop all 36-cores were loaded to ~94% in each series
>
> I'm not sure how to interpret the results. It seems like the TPS difference
> between runs is significant, with the average performance of the lock patch
> (15lock) seeming a little bit faster than master (15).
>
> Could someone try to repeat this on another server? What do you think?

Thank you for your benchmarks.  The TPS variation is high, and run
order heavily affects the result.  Nevertheless, I think there is a
small but noticeable positive effect from the patch.  I'll continue
working on the patch, bringing it into a more acceptable shape.

--
Regards,
Alexander Korotkov




Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2022-07-12 Thread Alexander Korotkov
Hi Aleksander!

Thank you for your efforts reviewing this patch.

On Thu, Jul 7, 2022 at 12:43 PM Aleksander Alekseev
 wrote:
> > I'm going to need more time to meditate on the proposed changes and to 
> > figure out the performance impact.
>
> OK, turned out this patch is slightly more complicated than I
> initially thought, but I think I managed to get some vague
> understanding of what's going on.
>
> I tried to reproduce the case with concurrently updated tuples you
> described on the current `master` branch. I created a new table:
>
> ```
> CREATE TABLE phonebook(
>   "id" SERIAL PRIMARY KEY NOT NULL,
>   "name" NAME NOT NULL,
>   "phone" INT NOT NULL);
>
> INSERT INTO phonebook ("name", "phone")
> VALUES ('Alice', 123), ('Bob', 456), ('Charlie', 789);
> ```
>
> Then I opened two sessions and attached them with LLDB. I did:
>
> ```
> (lldb) b heapam_tuple_update
> (lldb) c
> ```
>
> ... in both cases because I wanted to see two calls (steps 2 and 4) to
> heapam_tuple_update() and check the return values.
>
> Then I did:
>
> ```
> session1 =# BEGIN;
> session2 =# BEGIN;
> session1 =# UPDATE phonebook SET name = 'Alex' WHERE name = 'Alice';
> ```
>
> This update succeeds and I see heapam_tuple_update() returning TM_Ok.
>
> ```
> session2 =# UPDATE phonebook SET name = 'Alfred' WHERE name = 'Alice';
> ```
>
> This update hangs on a lock.
>
> ```
> session1 =# COMMIT;
> ```
>
> Now session2 unfreezes and returns 'UPDATE 0'. table_tuple_update()
> was called once and returned TM_Updated. Also session2 sees an updated
> tuple now. So apparently the visibility check (step 3) didn't pass.

Yes.  But it's not exactly a visibility check.  Session2 re-evaluates
the WHERE condition on the most recent row version (bypassing the snapshot).
The WHERE condition is not true anymore, thus the row is not updated.

> At this point I'm slightly confused. I don't see where a performance
> improvement is expected, considering that session2 gets blocked until
> session1 commits.
>
> Could you please walk me through here? Am I using the right test case
> or maybe you had another one in mind? Which steps do you consider
> expensive and expect to be mitigated by the patch?

This patch is not intended to change any high-level logic. At the high
level, the transaction that updated the row still holds a lock on it
until it finishes. The possible positive performance impact I expect
comes from doing the work of the two calls tuple_update() and tuple_lock()
in a single call of tuple_update().  If we do this in one call, we can
save some effort, for instance locking the same buffer once, not twice.
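
To make the saving more concrete, here is a rough caller-side sketch of the
flow on master (a simplified illustration, not the real nodeModifyTable.c
code; the helper and variable names are assumed):

```
#include "postgres.h"
#include "access/tableam.h"

/*
 * Simplified sketch of the current caller-side flow being discussed; on a
 * concurrent update the executor makes a second AM call that re-walks the
 * update chain and re-pins the buffer tuple_delete() has just visited.
 */
static void
delete_with_retry_sketch(Relation rel, ItemPointer tid, CommandId cid,
						 Snapshot snapshot, TupleTableSlot *slot)
{
	TM_FailureData tmfd;
	TM_Result	result;

	result = table_tuple_delete(rel, tid, cid, snapshot, InvalidSnapshot,
								true /* wait */ , &tmfd,
								false /* changingPart */ );

	if (result == TM_Updated)
	{
		/* A second walk over the same tuple chain, a second buffer lock. */
		result = table_tuple_lock(rel, tid, snapshot, slot, cid,
								  LockTupleExclusive, LockWaitBlock,
								  TUPLE_LOCK_FLAG_FIND_LAST_VERSION, &tmfd);
		/* ... EvalPlanQual recheck and a retry of the delete would follow ... */
	}

	/*
	 * With the patch, the lock is taken inside tuple_delete() itself while
	 * the buffer is still at hand, so the separate table_tuple_lock() call
	 * above drops out of the retry path.
	 */
}
```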

--
Regards,
Alexander Korotkov




Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2022-07-07 Thread Aleksander Alekseev
Hi Alexander,

> I'm going to need more time to meditate on the proposed changes and to figure 
> out the performance impact.

OK, turned out this patch is slightly more complicated than I
initially thought, but I think I managed to get some vague
understanding of what's going on.

I tried to reproduce the case with concurrently updated tuples you
described on the current `master` branch. I created a new table:

```
CREATE TABLE phonebook(
  "id" SERIAL PRIMARY KEY NOT NULL,
  "name" NAME NOT NULL,
  "phone" INT NOT NULL);

INSERT INTO phonebook ("name", "phone")
VALUES ('Alice', 123), ('Bob', 456), ('Charlie', 789);
```

Then I opened two sessions and attached them with LLDB. I did:

```
(lldb) b heapam_tuple_update
(lldb) c
```

... in both cases because I wanted to see two calls (steps 2 and 4) to
heapam_tuple_update() and check the return values.

Then I did:

```
session1 =# BEGIN;
session2 =# BEGIN;
session1 =# UPDATE phonebook SET name = 'Alex' WHERE name = 'Alice';
```

This update succeeds and I see heapam_tuple_update() returning TM_Ok.

```
session2 =# UPDATE phonebook SET name = 'Alfred' WHERE name = 'Alice';
```

This update hangs on a lock.

```
session1 =# COMMIT;
```

Now session2 unfreezes and returns 'UPDATE 0'. table_tuple_update()
was called once and returned TM_Updated. Also session2 sees an updated
tuple now. So apparently the visibility check (step 3) didn't pass.

At this point I'm slightly confused. I don't see where a performance
improvement is expected, considering that session2 gets blocked until
session1 commits.

Could you please walk me through here? Am I using the right test case
or maybe you had another one in mind? Which steps do you consider
expensive and expect to be mitigated by the patch?

-- 
Best regards,
Aleksander Alekseev




Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2022-07-05 Thread Aleksander Alekseev
Hi again,

> +    if (!updated)
> +    {
> +        /* Should not encounter speculative tuple on recheck */
> +        Assert(!HeapTupleHeaderIsSpeculative(tuple->t_data));
> -    ReleaseBuffer(buffer);
> +        ReleaseBuffer(buffer);
> +    }
> +    else
> +    {
> +        updated = false;
> +    }

OK, I got confused here. I suggest changing the if(!...) { .. } else {
.. } code to if() { .. } else { .. } here.

-- 
Best regards,
Aleksander Alekseev




Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2022-07-05 Thread Aleksander Alekseev
Hi Alexander,

> Thoughts and feedback are welcome.

I took a preliminary look at the patch. I'm going to need more time
to meditate on the proposed changes and to figure out the performance
impact.

So far I just wanted to let you know that the patch applied OK for me
and passed all the tests. The `else` branch seems to be redundant
here:

+    if (!updated)
+    {
+        /* Should not encounter speculative tuple on recheck */
+        Assert(!HeapTupleHeaderIsSpeculative(tuple->t_data));
-    ReleaseBuffer(buffer);
+        ReleaseBuffer(buffer);
+    }
+    else
+    {
+        updated = false;
+    }

Also, I wish there were a few more comments, since some of the
proposed changes are not that straightforward.

-- 
Best regards,
Aleksander Alekseev




POC: Lock updated tuples in tuple_update() and tuple_delete()

2022-07-01 Thread Alexander Korotkov
Hackers,

When working in the read committed transaction isolation mode
(the default), we have the following sequence of actions when
tuple_update() or tuple_delete() finds a concurrently updated tuple.

1. tuple_update()/tuple_delete() returns TM_Updated
2. tuple_lock()
3. Re-evaluate the plan qual (recheck if we still need to update/delete
and calculate the new tuple for update)
4. tuple_update()/tuple_delete() (this time it should be successful,
since we've previously locked the tuple).

I wonder if we should merge steps 1 and 2. We could save some of the effort
already spent during tuple_update()/tuple_delete() on locking the
tuple. In the heap table access method, we have to start tuple_lock() with
the first tuple in the chain, but tuple_update()/tuple_delete() has
already visited it. For undo-based table access methods,
tuple_update()/tuple_delete() should start from the last version, so why
not place the tuple lock immediately once a concurrent update is
detected? I think this patch should have some performance benefits under
high concurrency.

Also, the patch simplifies the code in nodeModifyTable.c by getting rid of
the nested case. I also get rid of the extra
table_tuple_fetch_row_version() in ExecUpdate. Why re-fetch the old
tuple when it should be exactly the same tuple we've just locked?
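
Roughly, the redundant step looks like this (a hypothetical helper, not the
actual ExecUpdate code; 'oldSlot' and the function name are assumptions):

```
#include "postgres.h"
#include "access/tableam.h"
#include "utils/snapmgr.h"

/*
 * Hypothetical helper: once the EvalPlanQual lock step has materialized the
 * locked row version into 'oldSlot', the re-fetch below only buys another
 * visit to the same buffer.
 */
static void
refetch_after_lock_sketch(Relation rel, ItemPointer tid, TupleTableSlot *oldSlot)
{
	if (!table_tuple_fetch_row_version(rel, tid, SnapshotAny, oldSlot))
		elog(ERROR, "failed to fetch tuple being updated");

	/* The patch drops this call and keeps using the slot filled by the lock. */
}
```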

I'm going to check the performance impact. Thoughts and feedback are welcome.

--
Regards,
Alexander Korotkov


0001-Lock-updated-tuples-in-tuple_update-and-tuple_del-v1.patch
Description: Binary data