Re: Exposing the lock manager's WaitForLockers() to SQL

2024-05-30 Thread Will Mortensen
I should add that the latest patches remove permissions checks because
pg_locks doesn't have any, and improve the commit messages. Hope I
didn't garble anything doing this late after the dev conference. :-)

Robert asked me about other existing functions that could be
leveraged, such as GetConflictingVirtualXIDs(), but I didn't see any
with close-enough semantics that handle fast-path locks as needed for
tables/relations.




Re: Exposing the lock manager's WaitForLockers() to SQL

2024-05-30 Thread Will Mortensen
I got some very helpful off-list feedback from Robert Haas that this
needed more self-contained explanation/motivation. So here goes. :-)

This patch set adds a new SQL function pg_wait_for_lockers(), which
waits for transactions holding specified table locks to commit or roll
back. This can be useful with knowledge of the queries in those
transactions, particularly for asynchronous and incremental processing
of inserted/updated rows.

Specifically, consider a scenario where INSERTs and UPDATEs always set
a serial column to its default value. A client can call
pg_sequence_last_value() + pg_wait_for_lockers() and then take a new
DB snapshot and know that rows committed after this snapshot will have
values of the serial column greater than the value from
pg_sequence_last_value(). As shown in the example at the end, this
allows the client to asynchronously and incrementally read
inserted/updated rows with minimal per-client state, without buffering
changes, and without affecting writer transactions.

There are lots of other ways to support incrementally reading new
rows, but they don’t have all of those qualities. For example:

* Forcing writers to commit in a specific order (e.g. by serial column
value) would reduce throughput

* Explicitly tracking or coordinating with writers would likely be
more complex, impact performance, and/or require much more state

* Methods that are synchronous or buffer/queue changes are problematic
if readers fall behind

Existing ways to wait for table locks also have downsides:

* Taking a conflicting lock with LOCK blocks new transactions from
taking the lock of interest while LOCK waits. And in order to wait for
writers holding RowExclusiveLock, we must take ShareLock, which also
conflicts with ShareUpdateExclusiveLock and therefore unnecessarily
interferes with (auto)vacuum. Finally, with multiple tables LOCK locks
them one at a time, so it waits (and holds locks) longer than
necessary.

* Using pg_locks / pg_lock_status() to identify the transactions
holding the locks is more expensive since it also returns all other
locks in the DB cluster, plus there’s no efficient built-in way to
wait for the transactions to commit or roll back.

By contrast, pg_wait_for_lockers() doesn’t block other transactions,
waits on multiple tables in parallel, and doesn’t spend time looking
at irrelevant locks.

This change is split into three patches for ease of review. The first
two patches modify the existing WaitForLockers() C function and other
locking internals to support waiting for lockers in a single lock
mode, which allows waiting for INSERT/UPDATE without waiting for
vacuuming. These changes could be omitted at the cost of unnecessary
waiting, potentially for a long time with slow vacuums. The third
patch adds the pg_wait_for_lockers() SQL function, which just calls
WaitForLockers().

FWIW, another solution might be to directly expose the functions that
WaitForLockers() calls, namely GetLockConflicts() (generalized to
GetLockers() in the first patch) to identify the transactions holding
the locks, and VirtualXactLock() to wait for each transaction to
commit or roll back. That would be more complicated for the client but
could be more broadly useful. I could investigate that further if it
seems preferable.


=== Example ===

Assume we have the following table:

CREATE TABLE page_views (
id bigserial,
view_time timestamptz
);

which is only ever modified by (potentially concurrent) INSERT
commands that assign the default value to the id column. We can run
the following commands:

SELECT pg_sequence_last_value('page_views_id_seq');

 pg_sequence_last_value

   4

SELECT pg_wait_for_lockers(array['page_views']::oid, regclass[],
'RowExclusiveLock', FALSE);

Now we know that all rows where id <= 4 have been committed or rolled
back, and we can observe/process them:

SELECT * FROM page_views WHERE id <= 4;

 id |   view_time
+---
  2 | 2024-01-01 12:34:01.00-00
  3 | 2024-01-01 12:34:00.00-00

Later we can iterate:

SELECT pg_sequence_last_value('page_views_id_seq');

 pg_sequence_last_value

  9

SELECT pg_wait_for_lockers(array['page_views']::oid, regclass[],
'RowExclusiveLock', FALSE);

We already observed all the rows where id <= 4, so this time we can
filter them out:

SELECT * FROM page_views WHERE id > 4 AND id <= 9;

 id |   view_time
+---
  5 | 2024-01-01 12:34:05.00-00
  8 | 2024-01-01 12:34:04.00-00
  9 | 2024-01-01 12:34:07.00-00

We can continue iterating like this to incrementally observe more
newly inserted rows. Note that the only state we persist across
iterations is the value returned by pg_sequence_last_value().

In this example, we processed inserted rows exactly once. Variations
are possible for handling updates, as discussed in the original email,
and I could explain that 

Re: [PATCH] LockAcquireExtended improvement

2024-05-22 Thread Will Mortensen
Thanks! :-)




Re: [PATCH] LockAcquireExtended improvement

2024-05-18 Thread Will Mortensen
On Tue, Mar 26, 2024 at 7:14 PM Will Mortensen  wrote:
> This comment on ProcSleep() seems to have the values of dontWait
> backward (double negatives are tricky):
>
> * Result: PROC_WAIT_STATUS_OK if we acquired the lock,
> PROC_WAIT_STATUS_ERROR
> * if not (if dontWait = true, this is a deadlock; if dontWait = false, we
> * would have had to wait).
>
> Also there's a minor typo in a comment in LockAcquireExtended():
>
> * Check the proclock entry status. If dontWait = true, this is an
> * expected case; otherwise, it will open happen if something in the
> * ipc communication doesn't work correctly.
>
> "open" should be "only".

Here's a patch fixing those typos.


v1-0001-Fix-typos-from-LOCK-NOWAIT-improvement.patch
Description: Binary data


Re: Exposing the lock manager's WaitForLockers() to SQL

2024-03-26 Thread Will Mortensen
Rebased, fixed a couple typos, and reordered the isolation tests to
put the most elaborate pair last.


v11-0001-Refactor-GetLockConflicts-into-more-general-GetL.patch
Description: Binary data


v11-0002-Allow-specifying-single-lockmode-in-WaitForLocke.patch
Description: Binary data


v11-0003-Add-pg_wait_for_lockers-function.patch
Description: Binary data


Re: [PATCH] LockAcquireExtended improvement

2024-03-26 Thread Will Mortensen
On Thu, Mar 14, 2024 at 1:15 PM Robert Haas  wrote:
> Seeing no further discussion, I have committed my version of this
> patch, with your test case.

This comment on ProcSleep() seems to have the values of dontWait
backward (double negatives are tricky):

* Result: PROC_WAIT_STATUS_OK if we acquired the lock,
PROC_WAIT_STATUS_ERROR
* if not (if dontWait = true, this is a deadlock; if dontWait = false, we
* would have had to wait).

Also there's a minor typo in a comment in LockAcquireExtended():

* Check the proclock entry status. If dontWait = true, this is an
* expected case; otherwise, it will open happen if something in the
* ipc communication doesn't work correctly.

"open" should be "only".




Re: Exposing the lock manager's WaitForLockers() to SQL

2024-03-08 Thread Will Mortensen
Rebased and fixed conflicts.

FWIW re: Andrey's comment in his excellent CF summary email[0]: we're
currently using vanilla Postgres (via Gentoo) on single nodes, and not
anything fancy like Citus. The Citus relationship is just that we were
inspired by Marco's blog post there. We have a variety of clients
written in different languages that generally don't coordinate their
table modifications, and Marco's scheme merely requires them to use
sequences idiomatically, which we can just about manage. :-)

This feature is then a performance optimization to support this scheme
while avoiding the case where one writer holding a RowExclusiveLock
blocks the reader from taking a ShareLock which in turn prevents other
writers from taking a RowExclusiveLock for a long time. Instead, the
reader can wait for the first writer without taking any locks or
blocking later writers. I've illustrated this difference in the
isolation tests.

Still hoping we can get this into 17. :-)

[0] 
https://www.postgresql.org/message-id/C8D65462-0888-4484-A72C-C99A94381ECD%40yandex-team.ru


v10-0003-Add-pg_wait_for_lockers-function.patch
Description: Binary data


v10-0001-Refactor-GetLockConflicts-into-more-general-GetL.patch
Description: Binary data


v10-0002-Allow-specifying-single-lockmode-in-WaitForLocke.patch
Description: Binary data


Re: Exposing the lock manager's WaitForLockers() to SQL

2024-01-28 Thread Will Mortensen
Minor style fix; sorry for the spam.


v9-0001-Refactor-GetLockConflicts-into-more-general-GetLo.patch
Description: Binary data


v9-0002-Allow-specifying-single-lockmode-in-WaitForLocker.patch
Description: Binary data


v9-0003-Add-pg_wait_for_lockers-function.patch
Description: Binary data


Re: Exposing the lock manager's WaitForLockers() to SQL

2024-01-28 Thread Will Mortensen
I guess the output of the deadlock test was unstable, so I simply
removed it in v8 here, but I can try to fix it instead if it seems
important to test that.


v8-0001-Refactor-GetLockConflicts-into-more-general-GetLo.patch
Description: Binary data


v8-0003-Add-pg_wait_for_lockers-function.patch
Description: Binary data


v8-0002-Allow-specifying-single-lockmode-in-WaitForLocker.patch
Description: Binary data


Re: Exposing the lock manager's WaitForLockers() to SQL

2024-01-28 Thread Will Mortensen
On Fri, Jan 26, 2024 at 4:54 AM vignesh C  wrote:
>
> CFBot shows that there is one warning as in [1]:
> patching file doc/src/sgml/libpq.sgml
> ...
> [09:30:40.000] [943/2212] Compiling C object
> src/backend/postgres_lib.a.p/storage_lmgr_lock.c.obj
> [09:30:40.000] c:\cirrus\src\backend\storage\lmgr\lock.c(4084) :
> warning C4715: 'ParseLockmodeName': not all control paths return a
> value

Thanks Vignesh, I guess the MS compiler doesn't have
__builtin_constant_p()? So I added an unreachable return, and a
regression test that exercises this error path.

I also made various other simplifications and minor fixes to the code,
docs, and tests.

Back in v5 (with a new SQL command) I had a detailed example in the
docs, which I removed when changing to a function, and I'm not sure if
I should try to add it back now...I could shrink it but it might still
be too long for this part of the docs?

Anyway, please see attached.


v7-0001-Refactor-GetLockConflicts-into-more-general-GetLo.patch
Description: Binary data


v7-0002-Allow-specifying-single-lockmode-in-WaitForLocker.patch
Description: Binary data


v7-0003-Add-pg_wait_for_lockers-function.patch
Description: Binary data


Re: Exposing the lock manager's WaitForLockers() to SQL

2024-01-11 Thread Will Mortensen
Here is a new series adding a single pg_wait_for_lockers() function
that takes a boolean argument to control the interpretation of the
lock mode. It omits LOCK's handling of descendant tables so it
requires permissions directly on descendants in order to wait for
locks on them. Not sure if that would be a problem for anyone.


v6-0001-Refactor-GetLockConflicts-into-more-general-GetLo.patch
Description: Binary data


v6-0002-Allow-specifying-single-lockmode-in-WaitForLocker.patch
Description: Binary data


v6-0003-Add-pg_wait_for_lockers-function.patch
Description: Binary data


Re: Exposing the lock manager's WaitForLockers() to SQL

2024-01-09 Thread Will Mortensen
Hi Laurenz, thanks for taking a look!

On Sat, Jan 6, 2024 at 4:00 AM Laurenz Albe  wrote:
> While your original use case is valid, I cannot think of
> any other use case.  So it is a special-purpose statement that is only
> useful for certain processing of append-only tables.

It is definitely somewhat niche. :-) But as I mentioned in my
longwinded original message, the scheme is easily extended (with some
tradeoffs) to process updates, if they set a non-primary-key column
using a sequence. As for deletions though, our applications handle
them separately.

> Is it worth creating a new SQL statement for that, which could lead to
> a conflict with future editions of the SQL standard?  Couldn't we follow
> the PostgreSQL idiosyncrasy of providing a function with side effects
> instead?

I would be happy to add a pg_foo() function instead. Here are a few
things to figure out:

* To support waiting for lockers in a specified mode vs. conflicting
with a specified mode, should there be two functions, or one function
with a boolean argument like I used in C?

* Presumably the function(s) would take a regclass[] argument?

* Presumably the lock mode would be specified using strings like
'ShareLock'? There's no code to parse these AFAICT, but we could add
it.

* Maybe we could omit LOCK's handling of descendant tables for
simplicity? I will have to see how much other code needs to be
duplicated or shared.

I'll look further into it later this week.




Re: Exposing the lock manager's WaitForLockers() to SQL

2024-01-06 Thread Will Mortensen
Simplified the code and docs, and rewrote the example with more prose
instead of PL/pgSQL, which unfortunately made it longer, although it
could be truncated. Not really sure what's best...


v5-0001-Refactor-GetLockConflicts-into-more-general-GetLo.patch
Description: Binary data


v5-0003-Add-WAIT-FOR-LOCKERS-command.patch
Description: Binary data


v5-0002-Allow-specifying-single-lockmode-in-WaitForLocker.patch
Description: Binary data


Re: Exposing the lock manager's WaitForLockers() to SQL

2023-12-23 Thread Will Mortensen
I meant to add that the example in the doc is adapted from Marco
Slot's blog post linked earlier:
https://www.citusdata.com/blog/2018/06/14/scalable-incremental-data-aggregation/




Re: Exposing the lock manager's WaitForLockers() to SQL

2023-12-23 Thread Will Mortensen
On Sun, Sep 3, 2023 at 11:16 PM Will Mortensen  wrote:
> I realized that for our use case, we'd ideally wait for holders of
> RowExclusiveLock only, and not e.g. VACUUM holding
> ShareUpdateExclusiveLock. Waiting for lockers in a specific mode seems
> possible by generalizing/duplicating WaitForLockersMultiple() and
> GetLockConflicts(), but I'd love to have a sanity check before
> attempting that. Also, I imagine those semantics might be too
> different to make sense as part of the LOCK command.

Well I attempted it. :-) Here is a new series that refactors
GetLockConflicts(), generalizes WaitForLockersMultiple(), and adds a
new WAIT FOR LOCKERS command.

I first tried extending LOCK further, but the code became somewhat
unwieldy and the syntax became very confusing. I also thought again
about making new pg_foo() functions, but that would seemingly make it
harder to share code with LOCK, and sharing syntax (to the extent it
makes sense) feels very natural. Also, a new SQL command provides
plenty of doc space. :-) (I'll see about adding more examples later.)

I'll try to edit the title of the CF entry accordingly. Still looking
forward to any feedback. :-)


v4-0002-Allow-specifying-single-lockmode-in-WaitForLocker.patch
Description: Binary data


v4-0003-Add-WAIT-FOR-LOCKERS-command.patch
Description: Binary data


v4-0001-Refactor-GetLockConflicts-into-more-general-GetLo.patch
Description: Binary data


Re: Exposing the lock manager's WaitForLockers() to SQL

2023-09-04 Thread Will Mortensen
I realized that for our use case, we'd ideally wait for holders of
RowExclusiveLock only, and not e.g. VACUUM holding
ShareUpdateExclusiveLock. Waiting for lockers in a specific mode seems
possible by generalizing/duplicating WaitForLockersMultiple() and
GetLockConflicts(), but I'd love to have a sanity check before
attempting that. Also, I imagine those semantics might be too
different to make sense as part of the LOCK command.

Alternatively, I had originally been trying to use the pg_locks view,
which obviously provides flexibility in identifying existing lock
holders. But I couldn't find a way to wait for the locks to be
released / transactions to finish, and I was a little concerned about
the performance impact of selecting from it frequently when we only
care about a subset of the locks, although I didn't try to assess that
in our particular application.

In any case, I'm looking forward to hearing more feedback from
reviewers and potential users. :-)




Re: Exposing the lock manager's WaitForLockers() to SQL

2023-08-03 Thread Will Mortensen
Updated docs a bit. I'll see about adding this to the next CF in hopes
of attracting a reviewer. :-)


v3-0001-Add-WAIT-ONLY-option-to-LOCK-command.patch
Description: Binary data


Re: Exposing the lock manager's WaitForLockers() to SQL

2023-07-04 Thread Will Mortensen
Updated patch with more tests and a first attempt at doc updates.

As the commit message and doc now point out, using
WaitForLockersMultiple() makes for a behavior difference with actually
locking multiple tables, in that the combined set of conflicting locks
is obtained only once for all tables, rather than obtaining conflicts
and locking / waiting for just the first table and then obtaining
conflicts and locking / waiting for the second table, etc. This is
definitely desirable for my use case, but maybe these kinds of
differences illustrate the potential awkwardness of extending LOCK?

Thanks again for any and all feedback!


v2-0001-Add-WAIT-ONLY-option-to-LOCK-statement.patch
Description: Binary data


Re: [PATCH] doc: add missing mention of MERGE in MVCC

2023-06-21 Thread Will Mortensen
I saw, thanks again!

On Wed, Jun 21, 2023 at 4:08 PM Bruce Momjian  wrote:
>
> On Mon, Jun 19, 2023 at 11:32:46PM -0700, Will Mortensen wrote:
> > MERGE is now a data-modification command too.
>
> Yes, this has been applied too.
>
> --
>   Bruce Momjian  https://momjian.us
>   EDB  https://enterprisedb.com
>
>   Only you can decide what is important to you.




[PATCH] doc: add missing mention of MERGE in MVCC

2023-06-20 Thread Will Mortensen
MERGE is now a data-modification command too.


0002-doc-add-missing-mention-of-MERGE-in-MVCC.patch
Description: Binary data


[PATCH] doc: fix markup indentation in MVCC

2023-06-20 Thread Will Mortensen
Trivial fix to make the indentation consistent.
From 46977fbe5fa0a26ef77938a8fe30b9def062e8f8 Mon Sep 17 00:00:00 2001
From: Will Mortensen 
Date: Sat, 27 Aug 2022 17:07:11 -0700
Subject: [PATCH 1/6] doc: fix markup indentation in MVCC

---
 doc/src/sgml/mvcc.sgml | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml
index 337f6dd429..69b01d01b9 100644
--- a/doc/src/sgml/mvcc.sgml
+++ b/doc/src/sgml/mvcc.sgml
@@ -109,8 +109,8 @@
dirty read
dirty read
   
- 
-  
+  
+   
 A transaction reads data written by a concurrent uncommitted 
transaction.

   
@@ -121,8 +121,8 @@
nonrepeatable read
nonrepeatable read
   
- 
-  
+  
+   
 A transaction re-reads data it has previously read and finds that data
 has been modified by another transaction (that committed since the
 initial read).
@@ -135,8 +135,8 @@
phantom read
phantom read
   
- 
-  
+  
+   
 A transaction re-executes a query returning a set of rows that satisfy 
a
 search condition and finds that the set of rows satisfying the 
condition
 has changed due to another recently-committed transaction.
@@ -149,8 +149,8 @@
serialization anomaly
serialization anomaly
   
- 
-  
+  
+   
 The result of successfully committing a group of transactions
 is inconsistent with all possible orderings of running those
 transactions one at a time.
-- 
2.25.1



Re: Exposing the lock manager's WaitForLockers() to SQL

2023-02-01 Thread Will Mortensen
Here is a first attempt at a WIP patch. Sorry about the MIME type.

It doesn't take any locks on the tables, but I'm not super confident
that that's safe, so any input would be appreciated.

I omitted view support for simplicity, but if that seems like a
requirement I'll see about adding it. I assume we would need to take
AccessShareLock on views (and release it, per above).

If the syntax and behavior seem roughly correct I'll work on updating the docs.

The commit message at the beginning of the .patch has slightly more commentary.

Thanks for any and all feedback!


0001-Add-WAIT-ONLY-option-to-LOCK.patch
Description: Binary data


Re: Exposing the lock manager's WaitForLockers() to SQL

2023-01-12 Thread Will Mortensen
Hi Andres,

On Thu, Jan 12, 2023 at 7:49 PM Andres Freund  wrote:
> Consider a scenario like this:
>
> tx 1: acquires RowExclusiveLock on tbl1 to insert rows
> tx 2: acquires AccessShareLock on tbl1
> tx 2: WaitForLockers(ShareRowExclusiveLock, tbl1) ends up waiting for tx1
> tx 1: truncate tbl1 needs an AccessExclusiveLock

Oh of course, thanks.

Is it even necessary to take the AccessShareLock? I see that one can call e.g.
RangeVarGetRelidExtended() with NoLock, and from the comments it seems
like that might be OK here?

Did you have any remaining concerns about the suitability of WaitForLockers()
for the use case?

Any thoughts on the syntax? It seems like an option to LOCK (like Marco
suggested) might be simplest to implement albeit a little tricky to document.

Supporting descendant tables looks straightforward enough (just collect more
locktags?). Views look more involved; maybe we can avoid supporting them?




Re: Exposing the lock manager's WaitForLockers() to SQL

2023-01-12 Thread Will Mortensen
Hi Andres,

On Thu, Jan 12, 2023 at 11:31 AM Andres Freund  wrote:
> I know that WaitForLockers() is an existing function :). I'm not sure it's
> entirely suitable for your use case. So I mainly wanted to point out that if
> you end up writing a separate version of it, you still need to integrate with
> the deadlock detection.

I see. What about it seems potentially unsuitable?

> On 2023-01-11 23:03:30 -0800, Will Mortensen wrote:
> > To my very limited understanding, from looking at the existing callers and
> > the implementation of LOCK, that would look something like this (assuming
> > we're in a SQL command like LOCK and calling unmodified WaitForLockers()
> > with a single table):
> >
> > 1. Call something like RangeVarGetRelidExtended() with AccessShareLock
> > to ensure the table is not dropped and obtain the table oid
> >
> > 2. Use SET_LOCKTAG_RELATION() to construct the lock tag from the oid
> >
> > 3. Call WaitForLockers(), which internally calls GetLockConflicts() and
> > VirtualXactLock(). These certainly take plenty of locks of various types,
> > and will likely sleep in LockAcquire() waiting for transactions to finish,
> > but there don't seem to be any unusual pre/postconditions, nor do we
> > hold any unusual locks already.
>
> I suspect that keeping the AccessShareLock while doing the WaitForLockers() is
> likely to increase the deadlock risk noticeably.  I think for the use case you
> might get away with resolving the relation names, building the locktags, and
> then release the lock, before calling WaitForLockers. If somebody drops the
> table or such, you'd presumably still get desired behaviour that way, without
> the increased deaadlock risk.

That makes sense. I agree it seems fine to just return if e.g. the table is
dropped.

FWIW re: deadlocks in general, I probably didn't highlight it well in my
original email, but the existing solution for this use case (as Marco
described in his blog post) is to actually lock the table momentarily.
Marco's blog post uses ShareRowExclusiveLock, but I think ShareLock is
sufficient for us; in any case, that's stronger than the AccessShareLock that
we need to merely wait.

And actually locking the table with e.g. ShareLock seems perhaps *more*
likely to cause deadlocks (and hurts performance), since it not only waits for
existing conflicting lockers (e.g. RowExclusiveLock) as desired, but also
undesirably blocks other transactions from newly acquiring conflicting locks
in the meantime. Hence the motivation for this feature. :-)

I'm sure I may be missing something though. Thanks for all your feedback. :-)




Re: Exposing the lock manager's WaitForLockers() to SQL

2023-01-12 Thread Will Mortensen
I suppose if it's correct that we need to lock the table first (at least
in ACCESS SHARE mode), an option to LOCK perhaps makes
more sense. Maybe you could specify two modes like:

LOCK TABLE IN _lockmode_ MODE AND THEN WAIT FOR CONFLICTS WITH _waitmode_ MODE;

But that might be excessive. :-D And I don't know if there's any
reason to use a _lockmode_ other than ACCESS SHARE.

On Wed, Jan 11, 2023 at 11:03 PM Will Mortensen  wrote:
>
> Hi Andres,
>
> On Wed, Jan 11, 2023 at 12:33 PM Andres Freund  wrote:
> > I think such a function would still have to integrate enough with the lock
> > manager infrastructure to participate in the deadlock detector. Otherwise I
> > think you'd trivially end up with loads of deadlocks.
>
> Could you elaborate on which unusual deadlock concerns arise? To be
> clear, WaitForLockers() is an existing function in lmgr.c
> (https://github.com/postgres/postgres/blob/216a784829c2c5f03ab0c43e009126cbb819e9b2/src/backend/storage/lmgr/lmgr.c#L986),
> and naively it seems like we mostly just need to call it. To my very
> limited understanding, from looking at the existing callers and the
> implementation of LOCK, that would look something like this
> (assuming we're in a SQL command like LOCK and calling unmodified
> WaitForLockers() with a single table):
>
> 1. Call something like RangeVarGetRelidExtended() with AccessShareLock
> to ensure the table is not dropped and obtain the table oid
>
> 2. Use SET_LOCKTAG_RELATION() to construct the lock tag from the oid
>
> 3. Call WaitForLockers(), which internally calls GetLockConflicts() and
> VirtualXactLock(). These certainly take plenty of locks of various types,
> and will likely sleep in LockAcquire() waiting for transactions to finish,
> but there don't seem to be any unusual pre/postconditions, nor do we
> hold any unusual locks already.
>
> Obviously a deadlock is possible if transactions end up waiting for each
> other, just as when taking table or row locks, etc., but it seems like this
> would be detected as usual?




Re: Exposing the lock manager's WaitForLockers() to SQL

2023-01-11 Thread Will Mortensen
Hi Andres,

On Wed, Jan 11, 2023 at 12:33 PM Andres Freund  wrote:
> I think such a function would still have to integrate enough with the lock
> manager infrastructure to participate in the deadlock detector. Otherwise I
> think you'd trivially end up with loads of deadlocks.

Could you elaborate on which unusual deadlock concerns arise? To be
clear, WaitForLockers() is an existing function in lmgr.c
(https://github.com/postgres/postgres/blob/216a784829c2c5f03ab0c43e009126cbb819e9b2/src/backend/storage/lmgr/lmgr.c#L986),
and naively it seems like we mostly just need to call it. To my very
limited understanding, from looking at the existing callers and the
implementation of LOCK, that would look something like this
(assuming we're in a SQL command like LOCK and calling unmodified
WaitForLockers() with a single table):

1. Call something like RangeVarGetRelidExtended() with AccessShareLock
to ensure the table is not dropped and obtain the table oid

2. Use SET_LOCKTAG_RELATION() to construct the lock tag from the oid

3. Call WaitForLockers(), which internally calls GetLockConflicts() and
VirtualXactLock(). These certainly take plenty of locks of various types,
and will likely sleep in LockAcquire() waiting for transactions to finish,
but there don't seem to be any unusual pre/postconditions, nor do we
hold any unusual locks already.

Obviously a deadlock is possible if transactions end up waiting for each
other, just as when taking table or row locks, etc., but it seems like this
would be detected as usual?




Re: Exposing the lock manager's WaitForLockers() to SQL

2023-01-11 Thread Will Mortensen
Hi Marco, thanks for the reply! Glad to know you'd find it useful too. :-)

On Tue, Jan 10, 2023 at 1:01 AM Marco Slot  wrote:
> I'm wondering whether it could be an option of the LOCK command.
> (LOCK WAIT ONLY?)

I assume that's doable, but just from looking at the docs, it might be
a little confusing. For example, at least if we use
WaitForLockersMultiple(), waiting for multiple tables would happen in
parallel (which I think is good), while locking them is documented to
happen sequentially. Also, normal LOCK is illegal outside a
transaction, but waiting makes perfect sense. (Actually, normal LOCK
makes sense too, if the goal was just to wait. :-) )

By contrast, while LOCK has NOWAIT, and SELECT's locking clause
has NOWAIT and SKIP LOCKED, they only change the blocking/failure
behavior, while success still means taking the lock and has the same
semantics.

But I'm really no expert on SQL syntax or typical practice for things like
this. Anything that works is fine with me. :-)



As a possibly superfluous sidebar, I wanted to correct this part of my
original message:

> On Fri, Dec 23, 2022 at 11:43 AM Will Mortensen  wrote:
> > pg_sequence_last_value() (still undocumented!) can be used to
> > obtain an instantaneous upper bound on the sequence values
> > that have been returned by nextval(), even if the transaction
> > that called nextval() hasn't yet committed.

This is true, but not the most important part of making this scheme
work: as you mentioned in the Citus blog post, to avoid missing rows,
we need (and this gives us) an instantaneous *lower* bound on the
sequence values that could be used by transactions that commit after
we finish waiting (and start processing). This doesn't work with
sequence caching, since without somehow inspecting all sessions'
sequence caches, rows with arbitrarily old/low cached sequence
values could be committed arbitrarily far into the future, and we'd
fail to process them.

As you also implied in the blog post, the upper bound is what
allows us to also process each row *exactly* once (instead of at
least once) and in sequence order, if desired.

So those are the respective justifications for both arms of the
WHERE clause: id > min_id AND id <= max_id .

On Tue, Jan 10, 2023 at 1:01 AM Marco Slot  wrote:
>
> On Fri, Dec 23, 2022 at 11:43 AM Will Mortensen  wrote:
> > We'd like to be able to call the lock manager's WaitForLockers() and
> > WaitForLockersMultiple() from SQL. Below I describe our use case, but
> > basically I'm wondering if this:
> >
> > 1. Seems like a reasonable thing to do
> >
> > 2. Would be of interest upstream
> >
> > 3. Should be done with a new pg_foo() function (taking an
> >oid?), or a whole new SQL command, or something else
>
> Definitely +1 on adding a function/syntax to wait for lockers without
> actually taking a lock. The get sequence value + lock-and-release
> approach is still the only reliable scheme I've found for reliably and
> efficiently processing new inserts in PostgreSQL. I'm wondering
> whether it could be an option of the LOCK command. (LOCK WAIT ONLY?)
>
> Marco




Exposing the lock manager's WaitForLockers() to SQL

2022-12-23 Thread Will Mortensen
Hi there,

We'd like to be able to call the lock manager's WaitForLockers() and
WaitForLockersMultiple() from SQL. Below I describe our use case, but
basically I'm wondering if this:

1. Seems like a reasonable thing to do

2. Would be of interest upstream

3. Should be done with a new pg_foo() function (taking an
   oid?), or a whole new SQL command, or something else

If this sounds promising, we may be able to code this up and submit it.

The rest of this email describes our use cases and related observations:


 Use Case Background 

Our use case is inspired by this blog post by Marco Slot (CC'ed) at
Citus Data: 
https://www.citusdata.com/blog/2018/06/14/scalable-incremental-data-aggregation/
. This describes a scheme for correctly aggregating rows given minimal
coordination with an arbitrary number of writers while keeping minimal
additional state. It relies on two simple facts:

1. INSERT/UPDATE take their ROW EXCLUSIVE lock on the target
   table before evaluating any column DEFAULT expressions,
   and thus before e.g. calling nextval() on a sequence in
   the DEFAULT expression. And of course, this lock is only
   released when the transaction commits or rolls back.

2. pg_sequence_last_value() (still undocumented!) can be
   used to obtain an instantaneous upper bound on the
   sequence values that have been returned by nextval(), even
   if the transaction that called nextval() hasn't yet
   committed.

So, assume we have a table:

create table tbl (
id bigserial,
data text
);

which is only ever modified by INSERTs that use DEFAULT for id. Then,
a client can process each row exactly once using a loop like this
(excuse the pseudo-SQL):

min_id := 0;
while true:
max_id := pg_sequence_last_value('tbl_id_seq');
wait_for_writers('tbl'::regclass);
SELECT
some_aggregation(data)
FROM tbl
WHERE id > min_id AND id <= max_id;
min_id := max_id;

In the blog post, the equivalent of wait_for_writers() is implemented
by taking and immediately releasing a SHARE ROW EXCLUSIVE lock on tbl.
It's unclear why this can't be SHARE, since it just needs to conflict
with INSERT's ROW EXCLUSIVE, but in any case it's sufficient for
correctness.

(Note that this version only works if the rows committed by the
transactions that it waited for are actually visible to the SELECT, so
for example, the whole thing can't be within a Repeatable Read or
Serializable transaction.)


 Why WaitForLockers()? 

No new writer can acquire a ROW EXCLUSIVE lock as long as we're
waiting to obtain the SHARE lock, even if we only hold it for an
instant. If we have to wait a long time, because some existing writer
holds its ROW EXCLUSIVE lock for a long time, this could noticeably
reduce overall writer throughput.

But we don't actually need to obtain a lock at all--and waiting for
transactions that already hold conflicting locks is exactly what
WaitForLockers() / WaitForLockersMultiple() does. Using it instead
would prevent any interference with writers.


 Appendix: Extensions and Observations 

Aside from downgrading to SHARE mode and merely waiting instead of
locking, we propose a couple other extensions and observations related
to Citus' scheme. These only tangentially motivate our need for
WaitForLockers(), so you may stop reading here unless the overall
scheme is of interest.

== Separate client for reading sequences and waiting ==

First, in our use case each batch of rows might require extensive
processing as part of a larger operation that doesn't want to block
waiting for writers to commit. A simple extension is to separate the
processing from the determination of sequence values. In other words,
have a single client that sits in a loop:

while true:
seq_val := pg_sequence_last_value('tbl_id_seq');
WaitForLockers('tbl'::regclass, 'SHARE');
publish(seq_val);

and any number of other clients that use the series of published
sequence values to do their own independent processing (maintaining
their own additional state).

This can be extended to multiple tables with WaitForLockersMultiple():

while true:
seq_val1 := pg_sequence_last_value('tbl1_id_seq');
seq_val2 := pg_sequence_last_value('tbl2_id_seq');
WaitForLockersMultiple(
ARRAY['tbl1', 'tbl2']::regclass[], 'SHARE');
publish('tbl1', seq_val1);
publish('tbl2', seq_val2);

Which is clearly more efficient than locking or waiting for the tables
in sequence, hence the desire for that function as well.

== Latency ==

This brings us to a series of observations about latency. If some
writers take a long time to commit, some already-committed rows might
not be processed for a long time. To avoid exacerbating this when
using WaitForLockersMultiple(), which obviously has to wait for the
last writer of any specified table, it should be 

[PATCH] fix doc example of bit-reversed MAC address

2022-06-03 Thread Will Mortensen
Pretty trivial since this is documenting something that Postgres
*doesn't* do, but it incorrectly reversed only the bits of each
nibble, not the whole byte. See e.g.
https://www.ibm.com/docs/en/csfdcd/7.1?topic=ls-bit-ordering-in-mac-addresses
for a handy table.
diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 9b0c6c6926..eb2cf34ae6 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -3907,7 +3907,7 @@ SELECT person.name, holidays.num_weeks FROM person, holidays
  IEEE Std 802-2001 specifies the second shown form (with hyphens)
  as the canonical form for MAC addresses, and specifies the first
  form (with colons) as the bit-reversed notation, so that
- 08-00-2b-01-02-03 = 01:00:4D:08:04:0C.  This convention is widely
+ 08-00-2b-01-02-03 = 10:00:D4:80:40:C0.  This convention is widely
  ignored nowadays, and it is relevant only for obsolete network
  protocols (such as Token Ring).  PostgreSQL makes no provisions
  for bit reversal, and all accepted formats use the canonical LSB