Re: Generate pg_stat_get_xact*() functions with Macros

2023-01-11 Thread Drouvot, Bertrand

Hi,

On 1/11/23 11:59 PM, Andres Freund wrote:

Hi,

Michael, CCing you because of the point about PG_STAT_GET_DBENTRY_FLOAT8
below.


On 2023-01-05 14:48:39 +0100, Drouvot, Bertrand wrote:

While at it, I also took the opportunity to create the macros for 
pg_stat_get_xact_function_total_time(),
pg_stat_get_xact_function_self_time() and pg_stat_get_function_total_time(), 
pg_stat_get_function_self_time()
(even if the same code pattern is only repeated two times).


I'd split that up into a separate commit.




Thanks for looking at it! Makes sense, will do.



Now that this patch renames some fields


I don't mind renaming the fields - the prefixes really don't provide anything
useful. But it's not clear why this is related to this patch? You could just
include the f_ prefix in the macro, no?




Right, but the idea is to take the same approach as the one used in
8018ffbf58 (where placing the prefixes in the macro would have been
possible too).
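
(For illustration, the alternative Andres mentions - keeping the historical
f_ prefix by pasting it inside the macro instead of renaming the fields -
would look roughly like the hedged sketch below; it is not what the patch
does, just the other option:)

#define PG_STAT_GET_FUNCENTRY_FLOAT8(stat)							\
Datum																\
CppConcat(pg_stat_get_function_,stat)(PG_FUNCTION_ARGS)			\
{																	\
	Oid			funcid = PG_GETARG_OID(0);							\
	PgStat_StatFuncEntry *funcentry;								\
																	\
	if ((funcentry = pgstat_fetch_stat_funcentry(funcid)) == NULL)	\
		PG_RETURN_NULL();											\
	/* convert counter from microsec to millisec for display */	\
	PG_RETURN_FLOAT8(((double) funcentry->CppConcat(f_, stat)) / 1000.0); \
}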



, I think that, for consistency, those ones should be renamed too (aka remove 
the f_ and t_ prefixes):

PgStat_FunctionCounts.f_numcalls
PgStat_StatFuncEntry.f_numcalls
PgStat_TableCounts.t_truncdropped
PgStat_TableCounts.t_delta_live_tuples
PgStat_TableCounts.t_delta_dead_tuples
PgStat_TableCounts.t_changed_tuples


Yea, without that the result is profoundly weird.



But I think it would be better to do it in a follow-up patch (once this one
gets committed).


I don't mind committing it in a separate commit, but I think it should be done
at least immediately following the other commit. I.e. developed together.

I probably would go the other way, and rename all of them first. That'd make
this commit a lot more focused, and the renaming commit purely mechanical.



Yeah, makes sense. Let's proceed that way. I'll provide the "rename" patch.


Probably should remove PgStat_BackendFunctionEntry. 


I think that would be a 3rd patch, agree?


@@ -168,19 +168,19 @@ pgstat_end_function_usage(PgStat_FunctionCallUsage *fcu, bool finalize)
 	INSTR_TIME_ADD(total_func_time, f_self);
 
 	/*
-	 * Compute the new f_total_time as the total elapsed time added to the
-	 * pre-call value of f_total_time.  This is necessary to avoid
+	 * Compute the new total_time as the total elapsed time added to the
+	 * pre-call value of total_time.  This is necessary to avoid
 	 * double-counting any time taken by recursive calls of myself.  (We do
 	 * not need any similar kluge for self time, since that already excludes
 	 * any recursive calls.)
 	 */
-	INSTR_TIME_ADD(f_total, fcu->save_f_total_time);
+	INSTR_TIME_ADD(f_total, fcu->save_total_time);
 
 	/* update counters in function stats table */
 	if (finalize)
 		fs->f_numcalls++;
-	fs->f_total_time = f_total;
-	INSTR_TIME_ADD(fs->f_self_time, f_self);
+	fs->total_time = f_total;
+	INSTR_TIME_ADD(fs->self_time, f_self);
 }


I'd also rename f_self etc.



Makes sense, will do.


@@ -148,29 +148,24 @@ pg_stat_get_function_calls(PG_FUNCTION_ARGS)
 	PG_RETURN_INT64(funcentry->f_numcalls);
 }
 
-Datum
-pg_stat_get_function_total_time(PG_FUNCTION_ARGS)
-{
-	Oid			funcid = PG_GETARG_OID(0);
-	PgStat_StatFuncEntry *funcentry;
-
-	if ((funcentry = pgstat_fetch_stat_funcentry(funcid)) == NULL)
-		PG_RETURN_NULL();
-	/* convert counter from microsec to millisec for display */
-	PG_RETURN_FLOAT8(((double) funcentry->f_total_time) / 1000.0);
+#define PG_STAT_GET_FUNCENTRY_FLOAT8(stat)							\
+Datum																\
+CppConcat(pg_stat_get_function_,stat)(PG_FUNCTION_ARGS)			\
+{																	\
+	Oid			funcid = PG_GETARG_OID(0);							\
+	PgStat_StatFuncEntry *funcentry;								\
+																	\
+	if ((funcentry = pgstat_fetch_stat_funcentry(funcid)) == NULL)	\
+		PG_RETURN_NULL();											\
+	/* convert counter from microsec to millisec for display */	\
+	PG_RETURN_FLOAT8(((double) funcentry->stat) / 1000.0);			\
 }


Hm. Given the conversion with / 1000, is PG_STAT_GET_FUNCENTRY_FLOAT8 an
accurate name? Maybe PG_STAT_GET_FUNCENTRY_FLOAT8_MS?

I now see that PG_STAT_GET_DBENTRY_FLOAT8 already exists, defined the same
way. But the name feels misleading enough ...

Re: Using WaitEventSet in the postmaster

2023-01-11 Thread Thomas Munro
On Thu, Jan 12, 2023 at 7:57 PM Thomas Munro  wrote:
> On Thu, Jan 12, 2023 at 7:27 PM Tom Lane  wrote:
> > skink seems to have found a problem:
> >
> > ==2011873== VALGRINDERROR-BEGIN
> > ==2011873== Syscall param epoll_wait(events) points to unaddressable byte(s)
> > ==2011873==at 0x4D8DC73: epoll_wait (epoll_wait.c:30)
> > ==2011873==by 0x55CA49: WaitEventSetWaitBlock (latch.c:1527)
> > ==2011873==by 0x55D591: WaitEventSetWait (latch.c:1473)
> > ==2011873==by 0x4F2B28: ServerLoop (postmaster.c:1729)
> > ==2011873==by 0x4F3E85: PostmasterMain (postmaster.c:1452)
> > ==2011873==by 0x42643C: main (main.c:200)
> > ==2011873==  Address 0x7b1e620 is 1,360 bytes inside a recently 
> > re-allocated block of size 8,192 alloc'd
> > ==2011873==at 0x48407B4: malloc (vg_replace_malloc.c:381)
> > ==2011873==by 0x6D9D30: AllocSetContextCreateInternal (aset.c:446)
> > ==2011873==by 0x4F2D9B: PostmasterMain (postmaster.c:614)
> > ==2011873==by 0x42643C: main (main.c:200)
> > ==2011873==
> > ==2011873== VALGRINDERROR-END
>
> Repro'd here on Valgrind.  Oh, that's interesting.  WaitEventSetWait()
> wants to use an internal buffer of the size given to the constructor
> function, but passes the size of the caller's output buffer to
> epoll_wait() and friends.  Perhaps it should use Min(nevents,
> set->nevents_space).  I mean, I should have noticed that, but I think
> that's arguably a pre-existing bug in the WES code, or at least an
> unhelpful interface.  Thinking...

Yeah.  This stops valgrind complaining here.
From f3d201a2509affc8248a930f18a58cdb7ed3220f Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Thu, 12 Jan 2023 20:05:38 +1300
Subject: [PATCH] Fix WaitEventSetWait() buffer overrun.

The WAIT_USE_EPOLL and WAIT_USE_KQUEUE implementations of
WaitEventSetWaitBlock() confused the size of their internal buffer with
the size of the caller's output buffer, and could ask the kernel for too
many events.  In fact the set of events retrieved from the kernel needs
to be able to fit in both buffers, so take the minimum of the two.

The WAIT_USE_POLL and WAIT_USE_WIN32 implementations didn't have this
confusion.

This probably didn't come up before because we always used the same
number in both places, but commit 7389aad6 calculates a dynamic size at
construction time, while using MAXLISTEN for its output event buffer on
the stack.  That seems like a reasonable thing to want to do, so
consider this to be a pre-existing bug worth fixing.

As reported by skink, valgrind and Tom Lane.

Discussion: https://postgr.es/m/901504.1673504836%40sss.pgh.pa.us
---
 src/backend/storage/ipc/latch.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index a238c5827c..d79d71a851 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -1525,7 +1525,7 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
 
 	/* Sleep */
 	rc = epoll_wait(set->epoll_fd, set->epoll_ret_events,
-	nevents, cur_timeout);
+	Min(nevents, set->nevents_space), cur_timeout);
 
 	/* Check return code */
 	if (rc < 0)
@@ -1685,7 +1685,8 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
 
 	/* Sleep */
 	rc = kevent(set->kqueue_fd, NULL, 0,
-set->kqueue_ret_events, nevents,
+set->kqueue_ret_events,
+Min(nevents, set->nevents_space),
 timeout_p);
 
 	/* Check return code */
-- 
2.35.1



Re: [PATCH] Const'ify the arguments of ilist.c/ilist.h functions

2023-01-11 Thread Peter Eisentraut

On 07.01.23 08:21, Peter Eisentraut wrote:

On 23.11.22 14:57, Aleksander Alekseev wrote:

Hi Andres,

Thanks for the review!

I don't think it is correct for any of these to add const. The only reason it
works is because of casting etc.


Fair enough. PFA the corrected patch v2.


This patch version looks correct to me.  It is almost the same as the 
one that Andres had posted in his thread, except that yours also 
modifies slist_delete() and dlist_member_check().  Both of these changes 
also look correct to me.


committed





Re: Strengthen pg_waldump's --save-fullpage tests

2023-01-11 Thread Drouvot, Bertrand

Hi,

On 1/12/23 5:44 AM, Michael Paquier wrote:

On Wed, Jan 11, 2023 at 07:17:47PM +0530, Bharath Rupireddy wrote:

Note that the raw page on the table might differ not just in page LSN
but also in other fields; for instance, see heap_mask. It
masks lsn, checksum, hint bits, unused space etc. before verifying FPI
consistency during recovery in
verifyBackupPageConsistency().


FWIW, I don't really want to enter in this business here.  That feels
like a good addition of technical debt compared to the potential
gain.


Agree, let's forget about it.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Exposing the lock manager's WaitForLockers() to SQL

2023-01-11 Thread Will Mortensen
Hi Andres,

On Wed, Jan 11, 2023 at 12:33 PM Andres Freund  wrote:
> I think such a function would still have to integrate enough with the lock
> manager infrastructure to participate in the deadlock detector. Otherwise I
> think you'd trivially end up with loads of deadlocks.

Could you elaborate on which unusual deadlock concerns arise? To be
clear, WaitForLockers() is an existing function in lmgr.c
(https://github.com/postgres/postgres/blob/216a784829c2c5f03ab0c43e009126cbb819e9b2/src/backend/storage/lmgr/lmgr.c#L986),
and naively it seems like we mostly just need to call it. To my very
limited understanding, from looking at the existing callers and the
implementation of LOCK, that would look something like this
(assuming we're in a SQL command like LOCK and calling unmodified
WaitForLockers() with a single table):

1. Call something like RangeVarGetRelidExtended() with AccessShareLock
to ensure the table is not dropped and obtain the table oid

2. Use SET_LOCKTAG_RELATION() to construct the lock tag from the oid

3. Call WaitForLockers(), which internally calls GetLockConflicts() and
VirtualXactLock(). These certainly take plenty of locks of various types,
and will likely sleep in LockAcquire() waiting for transactions to finish,
but there don't seem to be any unusual pre/postconditions, nor do we
hold any unusual locks already.

Obviously a deadlock is possible if transactions end up waiting for each
other, just as when taking table or row locks, etc., but it seems like this
would be detected as usual?
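
For concreteness, here is a hedged sketch of steps 1-3 above as a
SQL-callable C function; the function name pg_wait_for_lockers and the
choice of ShareLock as the conflict mode are illustrative assumptions, not
part of any proposal:

#include "postgres.h"

#include "catalog/namespace.h"
#include "fmgr.h"
#include "miscadmin.h"
#include "storage/lmgr.h"
#include "storage/lock.h"
#include "utils/varlena.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(pg_wait_for_lockers);

Datum
pg_wait_for_lockers(PG_FUNCTION_ARGS)
{
	text	   *relname = PG_GETARG_TEXT_PP(0);
	RangeVar   *rv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
	Oid			relid;
	LOCKTAG		tag;

	/* 1. keep the table from being dropped while we wait */
	relid = RangeVarGetRelidExtended(rv, AccessShareLock, 0, NULL, NULL);

	/* 2. build the lock tag for the relation */
	SET_LOCKTAG_RELATION(tag, MyDatabaseId, relid);

	/*
	 * 3. wait for transactions currently holding locks that conflict with
	 * ShareLock (i.e. writers) to finish; internally this goes through
	 * GetLockConflicts() and VirtualXactLock(), so the wait is an ordinary
	 * lock wait as far as the deadlock detector is concerned.
	 */
	WaitForLockers(tag, ShareLock, true);

	PG_RETURN_VOID();
}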




RE: Fix pg_publication_tables to exclude generated columns

2023-01-11 Thread shiy.f...@fujitsu.com
On Wed, Jan 11, 2023 2:40 PM Amit Kapila  wrote:
> 
> On Wed, Jan 11, 2023 at 10:07 AM Tom Lane  wrote:
> >
> > Amit Kapila  writes:
> > >> On Mon, Jan 9, 2023 11:06 PM Tom Lane  wrote:
> > >>> We could just not fix it in the back branches.  I'd argue that this is
> > >>> as much a definition change as a bug fix, so it doesn't really feel
> > >>> like something to back-patch anyway.
> >
> > > So, if we don't backpatch then it could lead to an error when it
> > > shouldn't have which is clearly a bug. I think we should backpatch
> > > this unless Tom or others are against it.
> >
> > This isn't a hill that I'm ready to die on ... but do we have any field
> > complaints about this?  If not, I still lean against a back-patch.
> > I think there's a significant risk of breaking case A while fixing
> > case B when we change this behavior, and that's something that's
> > better done only in a major release.
> >
> 
> Fair enough, but note that there is a somewhat related problem for
> dropped columns [1] as well. While reviewing that it occurred to me
> that generated columns also have a similar problem which leads to this
> thread (it would have been better if there had been a mention of this in
> the initial email). Now, as the symptoms are similar, I think we shouldn't
> back-patch that as well, otherwise, it will appear to be partially
> fixed. What do you think?
> 
> [1] - https://www.postgresql.org/message-
> id/OSZPR01MB631087C65BA81E1FEE5A60D2FDF59%40OSZPR01MB6310.jpnpr
> d01.prod.outlook.com
> 

I agree to only fix them on HEAD.

I merged this patch and the one in [1] as they are similar problems. Please
see the attached patch.

I removed the changes in tablesync.c which simplified the query in
fetch_remote_table_info(), because it only works for publishers of v16. Those
changes are based on pg_get_publication_tables() returning all columns when no
column list is specified, but publishers of v15 return NULL in that case.

[1] 
https://www.postgresql.org/message-id/OSZPR01MB631087C65BA81E1FEE5A60D2FDF59%40OSZPR01MB6310.jpnprd01.prod.outlook.com


Regards,
Shi yu


v2-0001-Ignore-dropped-columns-and-generated-columns-when.patch
Description:  v2-0001-Ignore-dropped-columns-and-generated-columns-when.patch


v2-0002-Tests-for-checking-column-list-restriction.patch
Description: v2-0002-Tests-for-checking-column-list-restriction.patch


Re: Using WaitEventSet in the postmaster

2023-01-11 Thread Thomas Munro
On Thu, Jan 12, 2023 at 7:27 PM Tom Lane  wrote:
> skink seems to have found a problem:
>
> ==2011873== VALGRINDERROR-BEGIN
> ==2011873== Syscall param epoll_wait(events) points to unaddressable byte(s)
> ==2011873==at 0x4D8DC73: epoll_wait (epoll_wait.c:30)
> ==2011873==by 0x55CA49: WaitEventSetWaitBlock (latch.c:1527)
> ==2011873==by 0x55D591: WaitEventSetWait (latch.c:1473)
> ==2011873==by 0x4F2B28: ServerLoop (postmaster.c:1729)
> ==2011873==by 0x4F3E85: PostmasterMain (postmaster.c:1452)
> ==2011873==by 0x42643C: main (main.c:200)
> ==2011873==  Address 0x7b1e620 is 1,360 bytes inside a recently re-allocated 
> block of size 8,192 alloc'd
> ==2011873==at 0x48407B4: malloc (vg_replace_malloc.c:381)
> ==2011873==by 0x6D9D30: AllocSetContextCreateInternal (aset.c:446)
> ==2011873==by 0x4F2D9B: PostmasterMain (postmaster.c:614)
> ==2011873==by 0x42643C: main (main.c:200)
> ==2011873==
> ==2011873== VALGRINDERROR-END

Repro'd here on Valgrind.  Oh, that's interesting.  WaitEventSetWait()
wants to use an internal buffer of the size given to the constructor
function, but passes the size of the caller's output buffer to
epoll_wait() and friends.  Perhaps it should use Min(nevents,
set->nevents_space).  I mean, I should have noticed that, but I think
that's arguably a pre-existing bug in the WES code, or at least an
unhelpful interface.  Thinking...




Re: On login trigger: take three

2023-01-11 Thread Pavel Stehule
Hi


On Mon, Dec 19, 2022 at 10:40 AM, Mikhail Gribkov wrote:

> Hi Ted,
>
> > bq. in to the system
> > 'in to' -> into
> > bq. Any bugs in a trigger procedure
> > Any bugs -> Any bug
>
> Thanks!  Fixed typos in the attached v35.
>
> >   +   if (event == EVT_Login)
> >   +   dbgtag = CMDTAG_LOGIN;
> >   +   else
> >   +   dbgtag = CreateCommandTag(parsetree);
> > The same snippet appears more than once. It seems CMDTAG_LOGIN handling
> can be moved into `CreateCommandTag`.
>
> It appears twice in fact, both cases are nearby: in the main code and
> under the assert-checking ifdef.
> Moving CMDTAG_LOGIN to CreateCommandTag would change the CreateCommandTag
> function signature and ideology. CreateCommandTag finds a tag based on the
> command parse tree, while login event is a specific case when we do not
> have any command and the parse tree is NULL. Changing CreateCommandTag
> signature for these two calls looks a little bit overkill as it would lead
> to changing lots of other places the function is called from.
> We could move this snippet to a separate function, but here are the
> similar concerns I think: it would make sense for a more common or a more
> complex snippet, but not for two cases of if-else one-liners.
>
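
FWIW, a hedged sketch of the "separate function" variant being discussed;
CMDTAG_LOGIN and the login-event flag come from the patch itself, so this
is illustrative only:

static CommandTag
GetCommandTagForEvent(bool is_login_event, Node *parsetree)
{
	/* login events carry no parse tree, so CreateCommandTag() cannot be used */
	return is_login_event ? CMDTAG_LOGIN : CreateCommandTag(parsetree);
}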

I checked this patch and it looks good. All tests passed. Together with
https://commitfest.postgresql.org/41/4013/ it can be a good feature.

I re-tested the impact on performance and the worst case looks like less
than 1% (0.8%). I think that is acceptable. I tested the pgbench scenario
"SELECT 1":

pgbench -f ~/test.sql -C -c 3 -j 5 -T 100 -P10 postgres

733 tps (master), 727 tps (patched).

I think raising an exception inside the trigger should be better tested - now
it is only in 001_stream_rep.pl - generally more tests are welcome - there is
no test of exception handling.

Regards

Pavel


> --
>  best regards,
> Mikhail A. Gribkov
>
> e-mail: youzh...@gmail.com
> *http://www.flickr.com/photos/youzhick/albums
> *
> http://www.strava.com/athletes/5085772
> phone: +7(916)604-71-12
> Telegram: @youzhick
>
>
>
> On Sat, Dec 17, 2022 at 3:29 PM Ted Yu  wrote:
>
>>
>>
>> On Sat, Dec 17, 2022 at 3:46 AM Mikhail Gribkov 
>> wrote:
>>
>>> Hi Nikita,
>>>
>>> > Mikhail, I've checked the patch and previous discussion,
>>> > the condition mentioned earlier is still actual:
>>>
>>> Thanks for pointing this out! My bad, I forgot to fix the documentation
>>> example.
>>> Attached v34 has this issue fixed, as well as a couple other problems
>>> with the example code.
>>>
>>> --
>>>  best regards,
>>> Mikhail A. Gribkov
>>>
>> Hi,
>>
>> bq. in to the system
>>
>> 'in to' -> into
>>
>> bq. Any bugs in a trigger procedure
>>
>> Any bugs -> Any bug
>>
>> +   if (event == EVT_Login)
>> +   dbgtag = CMDTAG_LOGIN;
>> +   else
>> +   dbgtag = CreateCommandTag(parsetree);
>>
>> The same snippet appears more than once. It seems CMDTAG_LOGIN handling
>> can be moved into `CreateCommandTag`.
>>
>> Cheers
>>
>


Re: Fix condition in shm_toc and remove unused function shm_toc_freespace.

2023-01-11 Thread Zhang Mingli
Hi,

On Jan 12, 2023, 14:34 +0800, Zhang Mingli , wrote:
> Hi, hackers
>
> Some conditions in shm_toc_insert and shm_toc_allocate are bogus, like:
>   if (toc_bytes + nbytes > total_bytes || toc_bytes + nbytes < toc_bytes)
> Remove the condition `toc_bytes + nbytes < toc_bytes` and take
> sizeof(shm_toc_entry) into account in shm_toc_allocate though
> shm_toc_allocate does that too.
  shm_toc_insert does that too, and we can report the error earlier.

Regards,
Zhang Mingli


Re: Small miscellaneus fixes (Part II)

2023-01-11 Thread John Naylor
On Thu, Jan 12, 2023 at 12:34 PM Justin Pryzby  wrote:
>
> On Thu, Jan 12, 2023 at 12:15:24PM +0700, John Naylor wrote:
> > On Fri, Dec 23, 2022 at 8:08 AM Justin Pryzby 
wrote:
> >
> > > Makes sense now (in your first message, you said that the problem was
> > > with "sign", and the patch didn't address the actual problem in
> > > IS_PLUS()).
> > >
> > > One can look and find that the unreachable code was introduced at
> > > 7a3e7b64a.
> > >
> > > With your proposed change, the unreachable line is hit by regression
> > > tests, which is an improvement.  As is the change to pg_dump.c.
> >
> > But that now reachable line just unsets a flag that we previously found
> > unset, right?
>
> Good point.
>
> > And if that line was unreachable, then surely the previous flag-clearing
> > operation is too?
> >
> > 5669  994426 :                 if (IS_MINUS(Np->Num))  // <- also always false
> > 5670       0 :                     Np->Num->flag &= ~NUM_F_MINUS;
> > 5671         :             }
> > 5672     524 :             else if (Np->sign != '+' && IS_PLUS(Np->Num))
> > 5673       0 :                 Np->Num->flag &= ~NUM_F_PLUS;
> >
> > https://coverage.postgresql.org/src/backend/utils/adt/formatting.c.gcov.html
> > I'm inclined to turn the dead unsets into asserts.
>
> To be clear - did you mean like this ?
>
> diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
> index a4b524ea3ac..848956879f5 100644
> --- a/src/backend/utils/adt/formatting.c
> +++ b/src/backend/utils/adt/formatting.c
> @@ -5662,15 +5662,13 @@ NUM_processor(FormatNode *node, NUMDesc *Num, char *inout,
>  		}
>  		else
>  		{
> +			Assert(!IS_MINUS(Np->Num));
> +			Assert(!IS_PLUS(Np->Num));
>  			if (Np->sign != '-')
>  			{
>  				if (IS_BRACKET(Np->Num) && IS_FILLMODE(Np->Num))
>  					Np->Num->flag &= ~NUM_F_BRACKET;
> -				if (IS_MINUS(Np->Num))
> -					Np->Num->flag &= ~NUM_F_MINUS;
>  			}
> -			else if (Np->sign != '+' && IS_PLUS(Np->Num))
> -				Np->Num->flag &= ~NUM_F_PLUS;
>  
>  			if (Np->sign == '+' && IS_FILLMODE(Np->Num) && IS_LSIGN(Np->Num) == false)
>  				Np->sign_wrote = true;	/* needn't sign */

I was originally thinking of something more localized:

--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -5666,11 +5666,10 @@ NUM_processor(FormatNode *node, NUMDesc *Num, char *inout,
 			{
 				if (IS_BRACKET(Np->Num) && IS_FILLMODE(Np->Num))
 					Np->Num->flag &= ~NUM_F_BRACKET;
-				if (IS_MINUS(Np->Num))
-					Np->Num->flag &= ~NUM_F_MINUS;
+				Assert(!IS_MINUS(Np->Num));
 			}
-			else if (Np->sign != '+' && IS_PLUS(Np->Num))
-				Np->Num->flag &= ~NUM_F_PLUS;
+			else if (Np->sign != '+')
+				Assert(!IS_PLUS(Np->Num));

...but arguably the earlier check is close enough that it's silly to assert
in the "else" branch, and I'd be okay with just nuking those lines. Another
thing that caught my attention is the assumption that unsetting a bit is so
expensive that we have to first check if it's set, so we may as well remove
"IS_BRACKET(Np->Num)" as well.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: Common function for percent placeholder replacement

2023-01-11 Thread Peter Eisentraut

On 11.01.23 19:54, Nathan Bossart wrote:

On Wed, Jan 11, 2023 at 11:09:01AM +0100, Peter Eisentraut wrote:

committed with that fixed


While rebasing my recovery modules patch set, I noticed a couple of small
things that might be worth cleaning up.


committed, thanks





Fix condition in shm_toc and remove unused function shm_toc_freespace.

2023-01-11 Thread Zhang Mingli
Hi, hackers

Some conditions in shm_toc_insert and shm_toc_allocate are bogus, like:

if (toc_bytes + nbytes > total_bytes || toc_bytes + nbytes < toc_bytes)

Remove the condition `toc_bytes + nbytes < toc_bytes` and take
sizeof(shm_toc_entry) into account in shm_toc_allocate, though
shm_toc_allocate does that too.

 	/* Check for memory exhaustion and overflow. */
-	if (toc_bytes + nbytes > total_bytes || toc_bytes + nbytes < toc_bytes)
+	if (toc_bytes + sizeof(shm_toc_entry) + nbytes > total_bytes)
 	{
 		SpinLockRelease(&toc->toc_mutex);

shm_toc_freespace is introduced with shm_toc by original commit 6ddd5137b2, but 
is not used since then, so remove it.


Regards,
Zhang Mingli


v0-0001-Fix-condition-in-shm_toc-and-remove-unused-functi.patch
Description: Binary data


Re: recovery modules

2023-01-11 Thread Michael Paquier
On Wed, Jan 11, 2023 at 11:29:01AM -0800, Nathan Bossart wrote:
> I'm having trouble thinking of any practical advantage of providing the
> redo LSN and TLI.  If the main use-case is removing older archives as the
> documentation indicates, it seems better to provide the file name so that
> you can plug it straight into strcmp() to determine whether the file can be
> removed (i.e., what pg_archivecleanup does).  If we provided the LSN and
> TLI instead, you'd either need to convert that into a WAL file name for
> strcmp(), or you'd need to convert the candidate file name into an LSN and
> TLI and compare against those.
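
(For reference, a hedged sketch of the strcmp()-style pruning described
above; the callback name, the archive_directory argument, and the overall
shape are illustrative assumptions, not the proposed recovery-module API:)

#include "postgres.h"

#include <dirent.h>
#include <unistd.h>

#include "access/xlog_internal.h"
#include "storage/fd.h"

/* remove every archived segment that sorts before the oldest one needed */
static void
my_archive_cleanup(const char *archive_directory, const char *oldest_keep_file)
{
	DIR		   *dir = AllocateDir(archive_directory);
	struct dirent *de;

	while ((de = ReadDir(dir, archive_directory)) != NULL)
	{
		/* WAL segment file names sort in LSN order, so strcmp() suffices */
		if (IsXLogFileName(de->d_name) &&
			strcmp(de->d_name, oldest_keep_file) < 0)
		{
			char		path[MAXPGPATH];

			snprintf(path, sizeof(path), "%s/%s", archive_directory, de->d_name);
			unlink(path);
		}
	}
	FreeDir(dir);
}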

Logging was one thing that came immediately to mind, to let the module
know the redo LSN and TLI the segment name was built from without
having to recalculate it back.  If you don't feel strongly about that,
I am fine to discard this remark.  It is not like this hook should be
set in stone across major releases, in any case.

> I initially created a separate basic_restore module, but decided to fold it
> into basic_archive to simplify the patch and tests.  I hesitated to rename
> it because it already exists in v15, and since it deals with creating and
> restoring archive files, the name still seemed somewhat accurate.  That
> being said, I don't mind renaming it if that's what folks want.

I've done that in the past for pg_verify_checksums -> pg_checksums, so
I would not mind renaming it so as it reflects better its role.
(Being outvoted is fine for me if this suggestion sounds bad).

Saying that, 0001 seems fine on its own (minus the redo LSN/TLI with
the duplication for the segment name build), so I would be tempted to
get this one done.  My gut tells me that we'd better remove the
duplication and just pass down the two fields to
shell_archive_cleanup() and shell_recovery_end(), with the segment
name given to ExecuteRecoveryCommand()..
--
Michael


signature.asc
Description: PGP signature


Re: Using WaitEventSet in the postmaster

2023-01-11 Thread Tom Lane
Thomas Munro  writes:
> Pushed, after a few very minor adjustments, mostly comments.  Thanks
> for the reviews and pointers.  I think there are quite a lot of
> refactoring and refinement opportunities unlocked by this change (I
> have some draft proposals already), but for now I'll keep an eye on
> the build farm.

skink seems to have found a problem:

==2011873== VALGRINDERROR-BEGIN
==2011873== Syscall param epoll_wait(events) points to unaddressable byte(s)
==2011873==at 0x4D8DC73: epoll_wait (epoll_wait.c:30)
==2011873==by 0x55CA49: WaitEventSetWaitBlock (latch.c:1527)
==2011873==by 0x55D591: WaitEventSetWait (latch.c:1473)
==2011873==by 0x4F2B28: ServerLoop (postmaster.c:1729)
==2011873==by 0x4F3E85: PostmasterMain (postmaster.c:1452)
==2011873==by 0x42643C: main (main.c:200)
==2011873==  Address 0x7b1e620 is 1,360 bytes inside a recently re-allocated 
block of size 8,192 alloc'd
==2011873==at 0x48407B4: malloc (vg_replace_malloc.c:381)
==2011873==by 0x6D9D30: AllocSetContextCreateInternal (aset.c:446)
==2011873==by 0x4F2D9B: PostmasterMain (postmaster.c:614)
==2011873==by 0x42643C: main (main.c:200)
==2011873== 
==2011873== VALGRINDERROR-END

regards, tom lane




Re: Add a new pg_walinspect function to extract FPIs from WAL records

2023-01-11 Thread Michael Paquier
On Wed, Jan 11, 2023 at 06:59:18PM +0530, Bharath Rupireddy wrote:
> I've done it that way for pg_get_wal_fpi_info. If this format looks
> okay, I can propose to do the same for other functions (for
> backpatching too) in a separate thread though.

My vote would be to make that happen first, to have in place cleaner
basics for the docs.  I could just do it and move on..

> We deliberated to have something like below:
> pg_get_wal_stats(start_lsn, end_lsn, till_end_of_wal default false);
> pg_get_wal_records_info(start_lsn, end_lsn, till_end_of_wal default false);
> 
> We wanted to have better validation of the start_lsn and end_lsn, that
> is, start_lsn < end_lsn and end_lsn mustn't fall into the future when
> users specify it by themselves (otherwise, one can easily trick the
> server by passing in the extreme end of the LSN - 0x).
> And, we couldn't find a better way to deal with when till_end_of_wal
> is passed as true (in the above version of the functions).
>
> Another idea was to have something like below:
> pg_get_wal_stats(start_lsn, end_lsn default '0/0');
> pg_get_wal_records_info(start_lsn, end_lsn default '0/0');
> 
> When end_lsn is not entered or entered as invalid lsn, then return the
> stats/info till end of the WAL. Again, we wanted to have some
> validation of the user-entered end_lsn.
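
For reference, a hedged sketch of the sort of end-LSN validation described
above (a hypothetical helper; the actual pg_walinspect code is shaped
differently):

#include "postgres.h"

#include "access/xlog.h"
#include "access/xlogrecovery.h"

static void
validate_lsn_range(XLogRecPtr start_lsn, XLogRecPtr end_lsn)
{
	/* compare against the current insert LSN, or the replay LSN on a standby */
	XLogRecPtr	curr_lsn = RecoveryInProgress() ?
		GetXLogReplayRecPtr(NULL) : GetXLogInsertRecPtr();

	if (start_lsn >= end_lsn)
		ereport(ERROR,
				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
				 errmsg("start LSN must be less than end LSN")));
	if (end_lsn > curr_lsn)
		ereport(ERROR,
				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
				 errmsg("end LSN is ahead of the current WAL position")));
}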

This reminds me of the slot advancing, where we discarded this case
just because it is useful to enforce a LSN far in the future.
Honestly, I cannot think of any case where I would use this level of
validation, especially having *two* functions to decide one behavior
or the other: this stuff could just use one function and use for
example NULL as a setup to enforce the end of WAL, on top of a LSN far
ahead..  But well..

>> be useful to make this new FPI function work at least with an insanely
>> high LSN value to make sure that we fetch all the FPIs from a given
>> start position, up to the end of WAL?  That looks like a pretty good
>> default behavior to me, rather than issuing an error when a LSN is
>> defined as in the future..  I am really wondering why we have
>> ValidateInputLSNs(till_end_of_wal=false) to begin with, while we could
>> just allow any LSN value in the future automatically, as we can know
>> the current insert or replay LSNs (depending on the recovery state).
> 
> Hm. How about having pg_get_wal_fpi_info_till_end_of_wal() then?

I don't really want to make the interface more bloated with more
functions than necessary, TBH, though I get your point to push for
consistency in these functions.  This makes me really wonder whether
we'd better relax all the existing functions, but I may get
outvoted.
--
Michael


signature.asc
Description: PGP signature


Re: [PoC] Improve dead tuple storage for lazy vacuum

2023-01-11 Thread Masahiko Sawada
On Wed, Jan 11, 2023 at 12:13 PM John Naylor
 wrote:
>
> On Tue, Jan 10, 2023 at 7:08 PM Masahiko Sawada  wrote:
>
> > It looks no problem in terms of vacuum integration, although I've not
> > fully tested yet. TID store uses the radix tree as the main storage,
> > and with the template radix tree, the data types for shared and
> > non-shared will be different. TID store can have an union for the
> > radix tree and the structure would be like follows:
>
> > /* Storage for Tids */
> > union tree
> > {
> > local_radix_tree*local;
> > shared_radix_tree   *shared;
> > };
>
> We could possibly go back to using a common data type for this, but with 
> unused fields in each setting, as before. We would have to be more careful of 
> things like the 32-bit crash from a few weeks ago.

One idea to have a common data type without unused fields is to use
radix_tree as a base class. We cast it to radix_tree_shared or
radix_tree_local depending on the is_shared flag in radix_tree. For
instance, we would have something like (based on the non-template version):

struct radix_tree
{
boolis_shared;
MemoryContext context;
};

typedef struct rt_shared
{
rt_handle   handle;
uint32  magic;

/* Root node */
   dsa_pointer  root;

uint64  max_val;
uint64  num_keys;

/* need a lwlock */

/* statistics */
#ifdef RT_DEBUG
int32   cnt[RT_SIZE_CLASS_COUNT];
#endif
} rt_shared;

struct radix_tree_shared
{
radix_tree rt;

rt_shared *shared;
dsa_area *area;
} radix_tree_shared;

struct radix_tree_local
{
radix_tree rt;

uint64  max_val;
uint64  num_keys;

rt_node *root;

/* used only when the radix tree is private */
MemoryContextData *inner_slabs[RT_SIZE_CLASS_COUNT];
MemoryContextData *leaf_slabs[RT_SIZE_CLASS_COUNT];

/* statistics */
#ifdef RT_DEBUG
int32   cnt[RT_SIZE_CLASS_COUNT];
#endif
} radix_tree_local;
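
With that layout, each entry point could dispatch on the shared flag in the
common header; a hedged sketch (rt_set_shared and rt_set_local are
hypothetical names for the per-flavor implementations):

static bool
rt_set(struct radix_tree *rt, uint64 key, uint64 value)
{
	/* a single, easily predicted branch per call */
	if (rt->is_shared)
		return rt_set_shared((struct radix_tree_shared *) rt, key, value);
	else
		return rt_set_local((struct radix_tree_local *) rt, key, value);
}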

>
> > In the functions of TID store, we need to call either local or shared
> > radix tree functions depending on whether TID store is shared or not.
> > We need if-branch for each key-value pair insertion, but I think it
> > would not be a big performance problem in TID store use cases, since
> > vacuum is an I/O intensive operation in many cases.
>
> Also, the branch will be easily predicted. That was still true in earlier 
> patches, but with many more branches and fatter code paths.
>
> > Overall, I think
> > there is no problem and I'll investigate it in depth.
>
> Okay, great. If the separate-functions approach turns out to be ugly, we can 
> always go back to the branching approach for shared memory. I think we'll 
> want to keep this as a template overall, at least to allow different value 
> types and to ease adding variable-length keys if someone finds a need.

I agree to keep this as a template. From the vacuum integration
perspective, it would be better if we can use a common data type for
shared and local. It makes sense to have different data types if the
radix trees have different values types.

>
> > Apart from that, I've been considering the lock support for shared
> > radix tree. As we discussed before, the current usage (i.e, only
> > parallel index vacuum) doesn't require locking support at all, so it
> > would be enough to have a single lock for simplicity.
>
> Right, that should be enough for PG16.
>
> > If we want to
> > use the shared radix tree for other use cases such as the parallel
> > heap vacuum or the replacement of the hash table for shared buffers,
> > we would need better lock support.
>
> For future parallel pruning, I still think a global lock is "probably" fine 
> if the workers buffer in local arrays. Highly concurrent applications will 
> need additional work, of course.
>
> > For example, if we want to support
> > Optimistic Lock Coupling[1],
>
> Interesting, from the same authors!

+1

>
> > we would need to change not only the node
> > structure but also the logic. Which probably leads to widen the gap
> > between the code for non-shared and shared radix tree. In this case,
> > once we have a better radix tree optimized for shared case, perhaps we
> > can replace the templated shared radix tree with it. I'd like to hear
> > your opinion on this line.
>
> I'm not in a position to speculate on how best to do scalable concurrency, 
> much less how it should coexist with the local implementation. It's 
> interesting that their "ROWEX" scheme gives up maintaining order in the 
> linear nodes.

>
> > > One review point I'll mention: Somehow I didn't notice there is no use 
> > > for the "chunk" field in the rt_node type -- it's only set to zero and 
> > > copied when growing. What is the purpose? Removing it would allow the 
> > > smallest node to take up only 32 bytes with a fanout of 3, by eliminating 
> > > padding.
> >
> > Oh, I didn't notice that. The chunk field was originally used when
> > redirecting the child pointer in the parent node from old to new
> > 

Re: pgsql: Doc: add XML ID attributes to and tags.

2023-01-11 Thread Brar Piening

On 12.01.2023 at 00:05, Tom Lane wrote:

Peter Eisentraut  writes:

Any reason the new ids in create_database.sgml deviate from the normal
naming schemes used everywhere else?  Is it to preserve the existing
create-database-strategy?  Maybe we should rename that one and make the
new ones consistent?


I don't remember every single choice I made but the general goal was to
at least stay consistent within a varlist/[sub]section/file/chapter and
*never* change pre-existing ids since somebody may already be pointing
to them.

After all it is just an identifier that is supposed to be unique and
should not hurt our aesthetic feelings too much.

The consistency is mostly because we tend to like it and maybe also to
avoid collisions when making up new ids but I doubt that anybody will
ever try to remember an id or infer one from knowledge about the thing
it should be pointing at. I consider it a pretty opaque string that is
meant for copy-paste from a browser to some editing window.

It is all in our heads and, as a matter of fact, we could be using UUIDs as
Ids and save ourselves from any further consistency issues. It's just that they
look so ugly.


You'd have to ask Brar that, I didn't question his choices too much.

I have no objection to changing things as you suggest.  I'm hesitant to
rename very many pre-existing IDs for fear of breaking people's bookmarks,
but changing create-database-strategy doesn't seem like a big deal.


Personally I'd only do this for ids that haven't been "released" as
official documentation (even as "devel", since the new things tend to
attract more discussion and probably linking). I very much consider
URLs as UI and go to great lengths to keep them consistent (HTTP 3xx is a
friend of mine), as you never know who might be pointing at them from
where, and making them a moving target defeats their purpose and probably
hurts more than some inconsistency.

Regards,

Brar





Re: Small miscellaneus fixes (Part II)

2023-01-11 Thread Justin Pryzby
On Thu, Jan 12, 2023 at 12:15:24PM +0700, John Naylor wrote:
> On Fri, Dec 23, 2022 at 8:08 AM Justin Pryzby  wrote:
> 
> > Makes sense now (in your first message, you said that the problem was
> > with "sign", and the patch didn't address the actual problem in
> > IS_PLUS()).
> >
> > One can look and find that the unreachable code was introduced at
> > 7a3e7b64a.
> >
> > With your proposed change, the unreachable line is hit by regression
> > tests, which is an improvement.  As is the change to pg_dump.c.
> 
> But that now reachable line just unsets a flag that we previously found
> unset, right?

Good point.

> And if that line was unreachable, then surely the previous flag-clearing
> operation is too?
> 
> 5669  994426 :                 if (IS_MINUS(Np->Num))  // <- also always false
> 5670       0 :                     Np->Num->flag &= ~NUM_F_MINUS;
> 5671         :             }
> 5672     524 :             else if (Np->sign != '+' && IS_PLUS(Np->Num))
> 5673       0 :                 Np->Num->flag &= ~NUM_F_PLUS;
> 
> https://coverage.postgresql.org/src/backend/utils/adt/formatting.c.gcov.html
> 
> I'm inclined to turn the dead unsets into asserts.

To be clear - did you mean like this ?

diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index a4b524ea3ac..848956879f5 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -5662,15 +5662,13 @@ NUM_processor(FormatNode *node, NUMDesc *Num, char *inout,
 		}
 		else
 		{
+			Assert(!IS_MINUS(Np->Num));
+			Assert(!IS_PLUS(Np->Num));
 			if (Np->sign != '-')
 			{
 				if (IS_BRACKET(Np->Num) && IS_FILLMODE(Np->Num))
 					Np->Num->flag &= ~NUM_F_BRACKET;
-				if (IS_MINUS(Np->Num))
-					Np->Num->flag &= ~NUM_F_MINUS;
 			}
-			else if (Np->sign != '+' && IS_PLUS(Np->Num))
-				Np->Num->flag &= ~NUM_F_PLUS;
 
 			if (Np->sign == '+' && IS_FILLMODE(Np->Num) && IS_LSIGN(Np->Num) == false)
 				Np->sign_wrote = true;	/* needn't sign */




Re: [EXTERNAL] Re: [PATCH] Support using "all" for the db user in pg_ident.conf

2023-01-11 Thread Michael Paquier
On Wed, Jan 11, 2023 at 03:22:35PM +0100, Jelte Fennema wrote:
> The main uncertainty I have is if the case insensitivity check is
> actually needed in check_role. It seems like a case insensitive
> check against the database user shouldn't actually be necessary.
> I only understand the need for the case insensitive check against
> the system user. But I have too little experience with LDAP/kerberos
> to say for certain. So for now I kept the existing behaviour to
> not regress.

if (!identLine->pg_user->quoted &&
+   identLine->pg_user->string[0] != '+' &&
+   !token_is_keyword(identLine->pg_user, "all") &&
+   !token_has_regexp(identLine->pg_user) &&
If we finish by allowing a regexp for the PG user in an IdentLine, I
would choose to drop "all" entirely.  Simpler is better when it comes
to authentication, though we are working on getting things more..
Complicated.

+   Quoting a database-username containing
+   \1 makes the \1
+   lose its special meaning.
0002 and 0003 need careful thinking.

+# Success as the regular expression matches and \1 is replaced
+reset_pg_ident($node, 'mypeermap', qq{/^$system_user(.*)\$},
+   'test\1mapuser');
+test_role(
+   $node,
+   qq{testmapuser},
+   'peer',
+   0,
+   'with regular expression in user name map with \1',
+   log_like =>
+ [qr/connection authenticated: identity="$system_user" method=peer/]);
Relying on kerberos to check the substitution pattern is a bit
annoying..  I would be really tempted to extract and commit that
independently of the rest, actually, to provide some coverage of the
substitution case in the peer test.
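
(For reference, the substitution that test exercises corresponds to a
pg_ident.conf line like the following; a hedged example assuming the OS
user is "alice":

    # MAPNAME     SYSTEM-USERNAME    PG-USERNAME
    mypeermap     /^alice(.*)$       test\1mapuser

here "alice" matches with an empty capture, so \1 expands to nothing and the
map allows connecting as "testmapuser"; an OS user "alice2" would map to
"test2mapuser".)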

> The patchset also contains 3 preparatory patches with two refactoring
> passes and one small bugfix + test additions.

Applied 0001, which looked fine and was an existing issue.  At the
end, I had no issues with the names you suggested.
--
Michael


signature.asc
Description: PGP signature


Re: resend from mailing list archive doesn't working

2023-01-11 Thread Pavel Stehule
On Thu, Jan 12, 2023 at 6:24 AM, Justin Pryzby wrote:

> On Thu, Jan 12, 2023 at 06:20:16AM +0100, Pavel Stehule wrote:
> > Hi
> >
> > I need to resend
> >
> https://www.postgresql.org/message-id/CALte62yFXQvRrA47unpedfcn%3DGoE_VyvxcKkqj2NUhenK__qgA%40mail.gmail.com
> >
> > Unfortunately I didn't get any mail.
>
> It looks like you're already on the "CC" list, and so gmail will
> de-duplicate the mail, and you won't see the resent one.
>
> I only know that because it was explained to me here:
>
> https://www.postgresql.org/message-id/cabuevezdvk5sxwfutzfkgjb7zvf3m_y-uxbw1cxeyvbxuu6...@mail.gmail.com


It is true, I found this thread in the trash.

Thank you for info

Regards

Pavel

>
>
> --
> Justin
>


Re: resend from mailing list archive doesn't working

2023-01-11 Thread Justin Pryzby
On Thu, Jan 12, 2023 at 06:20:16AM +0100, Pavel Stehule wrote:
> Hi
> 
> I need to resend
> https://www.postgresql.org/message-id/CALte62yFXQvRrA47unpedfcn%3DGoE_VyvxcKkqj2NUhenK__qgA%40mail.gmail.com
> 
> Unfortunately I didn't get any mail.

It looks like you're already on the "CC" list, and so gmail will
de-duplicate the mail, and you won't see the resent one.

I only know that because it was explained to me here:
https://www.postgresql.org/message-id/cabuevezdvk5sxwfutzfkgjb7zvf3m_y-uxbw1cxeyvbxuu6...@mail.gmail.com

-- 
Justin




Re: Remove source code display from \df+?

2023-01-11 Thread Pavel Stehule
On Wed, Jan 11, 2023 at 10:11 PM, Tom Lane wrote:

> Pavel Stehule  writes:
> > On Wed, Jan 11, 2023 at 7:31 PM, Magnus Hagander wrote:
> >> This is only about Internal and C, isn't it? Isn't the oid of these
> >> static, and identified by INTERNALlanguageId and ClanguageId
> respectively?
> >> So we could just have the query show the prosrc column if the language
> oid
> >> is one of those two, and otherwise show "Please use \sf to view the
> >> source"?
>
> > I think it can be acceptable - maybe we can rename the column "source
> > code" to something like "internal name".
>
> Yeah, "source code" has always been kind of a misnomer for these
> languages.
>
> > again I don't think printing "Please use \sf to view the source" often
> > can be user friendly.  \? is clear and \sf is easy to use
>
> We could shorten it to "See \sf" or something like that.  But if we change
> the column header to "internal name" or the like, then the column just
> obviously doesn't apply for non-internal languages, so leaving it null
> should be fine.
>

+1



> regards, tom lane
>


resend from mailing list archive doesn't working

2023-01-11 Thread Pavel Stehule
Hi

I need to resend
https://www.postgresql.org/message-id/CALte62yFXQvRrA47unpedfcn%3DGoE_VyvxcKkqj2NUhenK__qgA%40mail.gmail.com

Unfortunately I didn't get any mail.

Regards

Pavel


Re: Small miscellaneus fixes (Part II)

2023-01-11 Thread John Naylor
On Fri, Dec 23, 2022 at 8:08 AM Justin Pryzby  wrote:

> Makes sense now (in your first message, you said that the problem was
> with "sign", and the patch didn't address the actual problem in
> IS_PLUS()).
>
> One can look and find that the unreachable code was introduced at
> 7a3e7b64a.
>
> With your proposed change, the unreachable line is hit by regression
> tests, which is an improvement.  As is the change to pg_dump.c.

But that now reachable line just unsets a flag that we previously found
unset, right?
And if that line was unreachable, then surely the previous flag-clearing
operation is too?

5669  994426 :                 if (IS_MINUS(Np->Num))  // <- also always false
5670       0 :                     Np->Num->flag &= ~NUM_F_MINUS;
5671         :             }
5672     524 :             else if (Np->sign != '+' && IS_PLUS(Np->Num))
5673       0 :                 Np->Num->flag &= ~NUM_F_PLUS;

https://coverage.postgresql.org/src/backend/utils/adt/formatting.c.gcov.html

I'm inclined to turn the dead unsets into asserts.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: on placeholder entries in view rule action query's range table

2023-01-11 Thread Justin Pryzby
On Wed, Jan 11, 2023 at 10:45:33PM -0500, Tom Lane wrote:
> Amit Langote  writes:
> >  On Thu, Jan 12, 2023 at 10:06 AM Tom Lane  wrote:
> >> I've pushed this with some cleanup --- aside from fixing
> >> outfuncs/readfuncs, I did some more work on the comments, which
> >> I think you were too sloppy about.
> 
> > Thanks a lot for the fixes.
> 
> It looks like we're not out of the woods on this: the buildfarm
> members that run cross-version-upgrade tests are all unhappy.
> Most of them are not reporting any useful details, but I suspect
> that they are barfing because dumps from the old server include
> table-qualified variable names in some CREATE VIEW commands while
> dumps from HEAD omit the qualifications.  I don't see any
> mechanism in TestUpgradeXversion.pm that could deal with that
> conveniently, and in any case we'd have to roll out a client
> script update to the affected animals.  I fear we may have to
> revert this pending development of better TestUpgradeXversion.pm
> support.

There's a diffs available for several of them:

- SELECT citext_table.id,
-citext_table.name
+ SELECT id,
+name

It looks like TestUpgradeXversion.pm is using the diff command I sent to
get tighter bounds on allowable changes.

20210415153722.gl6...@telsasoft.com

It's ugly and a terrible hack, and I don't know whether anyone would say
it's good enough, but one can probably avoid the diff like:

sed -r '/CREATE/,/^$/{ s/\w+\.//g }'

You'd still have to wait for it to be deployed, though.

-- 
Justin




Re: allowing for control over SET ROLE

2023-01-11 Thread Noah Misch
On Wed, Jan 11, 2023 at 03:13:29PM -0500, Robert Haas wrote:
> On Wed, Jan 11, 2023 at 10:16 AM Noah Misch  wrote:
> > I still think docs for the SET option itself should give a sense of the
> > diversity of things it's intended to control.  It could be simple.  A bunch 
> > of
> > the sites you're modifying are near text like "These restrictions enforce 
> > that
> > altering the owner doesn't do anything you couldn't do by dropping and
> > recreating the aggregate function."  Perhaps the main SET doc could say
> > something about how it restricts other things that would yield equivalent
> > outcomes.  (Incidentally, DROP is another case of something one likely 
> > doesn't
> > want the WITH SET FALSE member using.  I think that reinforces a point I 
> > wrote
> > upthread.  To achieve the original post's security objective, the role must
> > own no objects whatsoever.)
> 
> I spent a while on this. The attached is as well as I was able to figure
> out how to do. What do you think?

I think this is good to go modulo one or two things:

> Subject: [PATCH v2] More documentation update for GRANT ... WITH SET OPTION.
> 
> Update the reference pages for various ALTER commands that
> mentioned that you must be a member of role that will be the
> new owner to instead say that you must be able to SET ROLE
> to the new owner. Update ddl.sgml's generate statement on this

s/generate/general/

> --- a/doc/src/sgml/ref/grant.sgml
> +++ b/doc/src/sgml/ref/grant.sgml
> @@ -298,6 +298,20 @@ GRANT  class="parameter">role_name [, ...] TO  This option defaults to TRUE.
>
>  
> +  
> +   To create an object owned by another role or give ownership of an existing
> +   object to another role, you must have the ability to SET
> +   ROLE to that role; otherwise, commands such as ALTER
> +   ... OWNER TO or CREATE DATABASE ... OWNER
> +   will fail.  However, a user who inherits the privileges of a role but does
> +   not have the ability to SET ROLE to that role may be
> +   able to obtain full access to the role by manipulating existing objects
> +   owned by that role (e.g. they could redefine an existing function to act
> +   as a Trojan horse).  Therefore, if a role's privileges are to be inherited
> +   but should not be accessible via SET ROLE, it should 
> not
> +   own any SQL objects.
> +  

I recommend deleting the phrase "are to be inherited but" as superfluous.  The
earlier sentence's mention will still be there.  WITH SET FALSE + NOINHERIT is
a combination folks should not use or should use only when the role has no
known privileges.




Re: Spinlock is missing when updating two_phase of ReplicationSlot

2023-01-11 Thread Masahiko Sawada
On Thu, Jan 12, 2023 at 1:42 PM Michael Paquier  wrote:
>
> On Wed, Jan 11, 2023 at 02:36:17PM +0900, Michael Paquier wrote:
> > Looks right to me, the paths updating the data related to the slots
> > are careful about that, even when it comes to fetching a slot from
> > MyReplicationSlot.  I have been looking around the slot code to see if
> > there are other inconsistencies, and did not notice anything standing
> > out.  Will fix..
>
> And done, thanks!

Thank you!

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Perform streaming logical transactions by background workers and parallel apply

2023-01-11 Thread Amit Kapila
On Thu, Jan 12, 2023 at 9:54 AM Peter Smith  wrote:
>
>
> doc/src/sgml/monitoring.sgml
>
> 5. pg_stat_subscription
>
> @@ -3198,11 +3198,22 @@ SELECT pid, wait_event_type, wait_event FROM
> pg_stat_activity WHERE wait_event i
>
>   
>
> +   apply_leader_pid integer
> +  
> +  
> +   Process ID of the leader apply worker, if this process is a apply
> +   parallel worker. NULL if this process is a leader apply worker or a
> +   synchronization worker.
> +  
> + 
> +
> + 
> +  
> relid oid
>
>
> OID of the relation that the worker is synchronizing; null for the
> -   main apply worker
> +   main apply worker and the parallel apply worker
>
>   
>
> 5a.
>
> (Same as general comment #1 about terminology)
>
> "apply_leader_pid" --> "leader_apply_pid"
>

How about naming this as just leader_pid? I think it could be helpful
in the future if we decide to parallelize initial sync (aka parallel
copy) because then we could use this for the leader PID of parallel
sync workers as well.

-- 
With Regards,
Amit Kapila.




Re: Cygwin cleanup

2023-01-11 Thread Justin Pryzby
On Wed, Jan 11, 2023 at 10:39:49PM -0600, Justin Pryzby wrote:
> Here's my latest copy of the patch.
> +  # due to resource constraints we don't run this task by default for now
> +  trigger_type: manual

Now, with trigger_type commented, so Thomas doesn't have to click
"trigger" for me.
>From 16d2553e1e1d95aea1c215d7e909c7ea57fe160e Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Mon, 25 Jul 2022 23:05:10 +1200
Subject: [PATCH] WIP CI support for Cygwin.

See also: d8e78714-dc77-4a64-783f-e863ba4d9...@2ndquadrant.com

https://cirrus-ci.com/task/5145086722834432

XXX This should use a canned Docker image with all the right packages
installed?  But if the larger image is slower to start, then maybe not...

XXX trigger_type: manual
ci-os-only: cygwin
---
 .cirrus.yml | 83 +
 configure   |  2 +-
 configure.ac|  2 +-
 src/backend/replication/logical/snapbuild.c |  4 +-
 src/test/perl/PostgreSQL/Test/Cluster.pm|  4 +-
 src/test/perl/PostgreSQL/Test/Utils.pm  | 12 ++-
 src/test/recovery/t/020_archive_status.pl   |  2 +-
 src/tools/ci/cores_backtrace.sh | 19 -
 8 files changed, 118 insertions(+), 10 deletions(-)

diff --git a/.cirrus.yml b/.cirrus.yml
index d13726ed893..f704da2ff14 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -737,6 +737,89 @@ task:
   type: text/plain
 
 
+task:
+  name: Windows - Cygwin
+
+  # due to resource constraints we don't run this task by default for now
+  #XXX trigger_type: manual
+  # worth using only_if despite being manual, otherwise this task will show up
+  # when e.g. ci-os-only: linux is used.
+  only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*cygwin.*'
+  # otherwise it'll be sorted before other tasks
+  depends_on: SanityCheck
+
+  #XXX only_if: $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*cygwin.*'
+  #timeout_in: 120m
+
+  env:
+CPUS: 4
+BUILD_JOBS: $CPUS
+TEST_JOBS: $CPUS
+CCACHE_DIR: /tmp/ccache
+CCACHE_LOGFILE: ccache.log
+# --disable-dynamicbase
+# --with-gssapi
+CONFIGURE_CACHE: /tmp/ccache/configure.cache
+PG_TEST_USE_UNIX_SOCKETS: 1
+EXTRA_REGRESS_OPTS: --max-connections=1
+PG_TEST_EXTRA: ldap ssl # disable kerberos
+CC: ccache gcc
+CFLAGS: -Og -ggdb
+BASH: C:\tools\cygwin\bin\bash.exe --login
+
+  #windows_container:
+#image: cirrusci/windowsservercore:2019-2022.06.23
+#os_version: 2019
+  compute_engine_instance:
+image_project: $IMAGE_PROJECT
+image: family/pg-ci-windows-ci-vs-2019
+platform: windows
+cpu: $CPUS
+memory: 4G
+
+  setup_additional_packages_script: |
+choco install -y --no-progress cygwin
+C:\tools\cygwin\cygwinsetup.exe -q -P cygrunsrv,make,gcc-core,ccache,binutils,libtool,pkg-config,flex,bison,zlib-devel,libxml2-devel,libxslt-devel,libssl-devel,openldap-devel,libreadline-devel,perl,meson,ninja,perl-IPC-Run
+REM libkrb5-devel,krb5-server
+%BASH% -c "cygserver-config -y"
+%BASH% -c "echo 'kern.ipc.semmni 1024' >> /etc/cygserver.conf"
+%BASH% -c "echo 'kern.ipc.semmns 1024' >> /etc/cygserver.conf"
+%BASH% -c "net start cygserver"
+
+  sysinfo_script: |
+chcp
+systeminfo
+powershell -Command get-psdrive -psprovider filesystem
+set
+%BASH% -c "id; uname -a; ulimit -a -H; ulimit -a -S; export"
+
+  ccache_cache:
+folder: C:\tools\cygwin\tmp\ccache
+fingerprint_key: ccache/cygwin
+reupload_on_changes: true
+
+  configure_script: |
+%BASH% -c "cd '%cd%' && meson setup --buildtype=debug -Dcassert=true -Dssl=openssl -Duuid=e2fs -DPG_TEST_EXTRA='$PG_TEST_EXTRA' build"
+
+  build_script: |
+%BASH% -c "cd '%cd%' && ninja -C build -j${BUILD_JOBS}"
+%BASH% -c "ccache --show-stats"
+
+  always:
+upload_caches: ccache
+
+#%BASH% -c "cd '%cd%' && echo 'data_sync_retry = on' >> src/tools/ci/pg_ci_base.conf"
+#%BASH% -c "cd '%cd%' && echo 'wal_sync_method = fdatasync' >> src/tools/ci/pg_ci_base.conf"
+# --repeat 9
+  test_world_script: |
+%BASH% -c "cd '%cd%' && meson test $MTEST_ARGS --num-processes ${TEST_JOBS}"
+
+  on_failure:
+<<: *on_failure_meson
+cores_script: |
+  %BASH% -c "cd '%cd%' && src/tools/ci/cores_backtrace.sh cygwin ."
+
+
 task:
   name: CompilerWarnings
   # task that did not run, count as a success, so we need to recheck Linux'
diff --git a/configure b/configure
index 5d07fd0bb91..72d56f00534 100755
--- a/configure
+++ b/configure
@@ -16477,7 +16477,7 @@ fi
 
 # mingw has adopted a GNU-centric interpretation of optind/optreset,
 # so always use our version on Windows.
-if test "$PORTNAME" = "win32"; then
+if test "$PORTNAME" = "win32" -o "$PORTNAME" = "cygwin"; then
   case " $LIBOBJS " in
   *" getopt.$ac_objext "* ) ;;
   *) LIBOBJS="$LIBOBJS getopt.$ac_objext"
diff --git a/configure.ac b/configure.ac
index e9b74ced6ca..e0a9c332060 100644
--- 

Re: on placeholder entries in view rule action query's range table

2023-01-11 Thread Amit Langote
On Thu, Jan 12, 2023 at 12:45 PM Tom Lane  wrote:
> Amit Langote  writes:
> >  On Thu, Jan 12, 2023 at 10:06 AM Tom Lane  wrote:
> >> I've pushed this with some cleanup --- aside from fixing
> >> outfuncs/readfuncs, I did some more work on the comments, which
> >> I think you were too sloppy about.
>
> > Thanks a lot for the fixes.
>
> It looks like we're not out of the woods on this: the buildfarm
> members that run cross-version-upgrade tests are all unhappy.
> Most of them are not reporting any useful details, but I suspect
> that they are barfing because dumps from the old server include
> table-qualified variable names in some CREATE VIEW commands while
> dumps from HEAD omit the qualifications.  I don't see any
> mechanism in TestUpgradeXversion.pm that could deal with that
> conveniently, and in any case we'd have to roll out a client
> script update to the affected animals.  I fear we may have to
> revert this pending development of better TestUpgradeXversion.pm
> support.

Ah, OK, no problem.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: Strengthen pg_waldump's --save-fullpage tests

2023-01-11 Thread Bharath Rupireddy
On Thu, Jan 12, 2023 at 10:14 AM Michael Paquier  wrote:
>
> On Wed, Jan 11, 2023 at 07:17:47PM +0530, Bharath Rupireddy wrote:
> > Note that the raw page on the table might differ not just in page LSN
> > but also in other fields; for instance, see heap_mask. It
> > masks lsn, checksum, hint bits, unused space etc. before verifying FPI
> > consistency during recovery in
> > verifyBackupPageConsistency().
>
> FWIW, I don't really want to enter in this business here.  That feels
> like a good addition of technical debt compared to the potential
> gain.

I couldn't agree more.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Strengthen pg_waldump's --save-fullpage tests

2023-01-11 Thread Michael Paquier
On Wed, Jan 11, 2023 at 07:17:47PM +0530, Bharath Rupireddy wrote:
> Note that the raw page on the table might differ not just in page LSN
> but also in other fields; for instance, see heap_mask. It
> masks lsn, checksum, hint bits, unused space etc. before verifying FPI
> consistency during recovery in
> verifyBackupPageConsistency().

FWIW, I don't really want to enter in this business here.  That feels
like a good addition of technical debt compared to the potential
gain.
--
Michael


signature.asc
Description: PGP signature


Re: Spinlock is missing when updating two_phase of ReplicationSlot

2023-01-11 Thread Michael Paquier
On Wed, Jan 11, 2023 at 02:36:17PM +0900, Michael Paquier wrote:
> Looks right to me, the paths updating the data related to the slots
> are careful about that, even when it comes to fetching a slot from
> MyReplicationSlot.  I have been looking around the slot code to see if
> there are other inconsistencies, and did not notice anything standing
> out.  Will fix..

And done, thanks!
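
For reference, the locking pattern being referred to looks roughly like this
(a hedged sketch only, not the committed hunk; the field names are assumed
from src/include/replication/slot.h):

    /* mutate shared slot state only while holding the slot's spinlock */
    SpinLockAcquire(&slot->mutex);
    slot->data.two_phase = true;
    SpinLockRelease(&slot->mutex);
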
--
Michael


signature.asc
Description: PGP signature


Re: Cygwin cleanup

2023-01-11 Thread Justin Pryzby
On Sat, Jan 07, 2023 at 12:39:11AM +1300, Thomas Munro wrote:
> On Wed, Nov 9, 2022 at 2:04 PM Justin Pryzby  wrote:
> > +data_sync_retry = on
> 
> Sharing with the list some clues that Justin and I figured out about
> what that part is doing.  Without it, you get failures like:
> 
>   PANIC:  could not open file "pg_logical/snapshots/0-14FE6B0.snap":
> No such file or directory
> 
> That's been seen before:
> 
>   https://www.postgresql.org/message-id/flat/17827.1549866683%40sss.pgh.pa.us
> 
> That thread concluded that the operating system must have a non-atomic
> rename(), ie a kernel bug.  I don't know why Cygwin would display that
> behaviour and our native Windows build not; maybe timing, or maybe our
> own open() and rename() wrappers for Windows do something important
> differently than Cygwin's open() and rename().
> 
> On reflection, that seems a bit too flimsy to have in-tree without
> more investigation, which I won't have time for myself, so I'm going
> to withdraw this entry.

Not so fast :)

Here's my latest copy of the patch.  Most recently, rather than setting
data_sync_retry=on, I changed it to call fsync_fname_ext() rather than
fsync_fname(), which PANICs (except when data_sync_retry is
enabled).  That seems to work, showing that the problem is limited to
SnapBuildSerialize() and is not a problem with fsync() in general...

https://cirrus-ci.com/task/5990885733695488
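
The shape of that change is roughly the following (a hedged sketch, not the
actual hunk from the attached patch; signatures assumed from
src/include/storage/fd.h, and "path" stands in for the snapshot file being
serialized):

    /* before: fsync_fname() promotes an fsync failure to PANIC via
     * data_sync_elevel() unless data_sync_retry is enabled */
    fsync_fname(path, /* isdir */ false);

    /* after: the caller picks the elevel explicitly, so a failure here is
     * reported as a plain ERROR */
    fsync_fname_ext(path, /* isdir */ false, /* ignore_perm */ false, ERROR);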

Thomas raised a good question, which was how the tests were passing when
SnapBuildSerialize() was raising an ERROR, which is what it would've
been doing when I used data_sync_retry=on.

So .. why is wal_sync_method being used to control fsync for things
other than WAL?

See 6dc7760ac (c. 2005) which added wal_fsync_writethrough, at which
point (since 9b178555f, c. 2004) wal_sync_method was already being used
for SLOG.

Now, it's also being used for logical decoding (since b89e1510 and
858ec1185, c. 2014) in rewriteheap.c/snapbuild.c.  And pidfiles (since
ee0e525bf, 2010).  And the control file (8b938d36f7, 2019).  Note that
data_sync_retry wasn't added until 9ccdd7f66 (c. 2018).

It looks like logical decoding may be the "most wrong" place that
wal_sync_method is being used, so maybe my change is reasonable to
consider, and not just a workaround.

I'm going to re-open the CF entry to let this run for a while to see how
it works out.

-- 
Justin
From b07add11b8bf39f5bfbae4f9072470f31da97360 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Mon, 25 Jul 2022 23:05:10 +1200
Subject: [PATCH] WIP CI support for Cygwin.

ci-os-only: cygwin

See also: d8e78714-dc77-4a64-783f-e863ba4d9...@2ndquadrant.com

https://cirrus-ci.com/task/5145086722834432

XXX This should use a canned Docker image with all the right packages
installed?  But if the larger image is slower to start, then maybe not...
---
 .cirrus.yml | 83 +
 configure   |  2 +-
 configure.ac|  2 +-
 src/backend/replication/logical/snapbuild.c |  4 +-
 src/test/perl/PostgreSQL/Test/Cluster.pm|  4 +-
 src/test/perl/PostgreSQL/Test/Utils.pm  | 12 ++-
 src/test/recovery/t/020_archive_status.pl   |  2 +-
 src/tools/ci/cores_backtrace.sh | 19 -
 8 files changed, 118 insertions(+), 10 deletions(-)

diff --git a/.cirrus.yml b/.cirrus.yml
index d13726ed893..4507f734e94 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -737,6 +737,89 @@ task:
   type: text/plain
 
 
+task:
+  name: Windows - Cygwin
+
+  # due to resource constraints we don't run this task by default for now
+  trigger_type: manual
+  # worth using only_if despite being manual, otherwise this task will show up
+  # when e.g. ci-os-only: linux is used.
+  only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*cygwin.*'
+  # otherwise it'll be sorted before other tasks
+  depends_on: SanityCheck
+
+  #XXX only_if: $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*cygwin.*'
+  #timeout_in: 120m
+
+  env:
+CPUS: 4
+BUILD_JOBS: $CPUS
+TEST_JOBS: $CPUS
+CCACHE_DIR: /tmp/ccache
+CCACHE_LOGFILE: ccache.log
+# --disable-dynamicbase
+# --with-gssapi
+CONFIGURE_CACHE: /tmp/ccache/configure.cache
+PG_TEST_USE_UNIX_SOCKETS: 1
+EXTRA_REGRESS_OPTS: --max-connections=1
+PG_TEST_EXTRA: ldap ssl # disable kerberos
+CC: ccache gcc
+CFLAGS: -Og -ggdb
+BASH: C:\tools\cygwin\bin\bash.exe --login
+
+  #windows_container:
+#image: cirrusci/windowsservercore:2019-2022.06.23
+#os_version: 2019
+  compute_engine_instance:
+image_project: $IMAGE_PROJECT
+image: family/pg-ci-windows-ci-vs-2019
+platform: windows
+cpu: $CPUS
+memory: 4G
+
+  setup_additional_packages_script: |
+choco install -y --no-progress cygwin
+C:\tools\cygwin\cygwinsetup.exe -q -P 

Re: Perform streaming logical transactions by background workers and parallel apply

2023-01-11 Thread Peter Smith
Hi, here are some review comments for patch v78-0001.

==

General

1. (terminology)

AFAIK, until now we've been referring everywhere
(docs/comments/code) to the parent apply worker as the "leader apply
worker". Not the "main apply worker". Not the "apply leader worker".
Not any other variation...

From this POV I think the worker member "apply_leader_pid" would be
better named "leader_apply_pid", but I see that this was already
committed to HEAD differently.

Maybe it is not possible (or you don't want) to change that internal
member name, but IMO at least all the new code and docs should try to
use consistent terminology (e.g. leader_apply_XXX) where possible.

==

Commit message

2.

main_worker_pid is Process ID of the leader apply worker, if this process is a
apply parallel worker. NULL if this process is a leader apply worker or a
synchronization worker.

IIUC, this text is just cut/paste from the monitoring.sgml. In a
review comment below I suggest some changes to that text, so then this
commit message should also change to be the same.

~~

3.

The new column can make it easier to distinguish leader apply worker and apply
parallel worker which is also similar to the 'leader_pid' column in
pg_stat_activity.

SUGGESTION
The new column makes it easier to distinguish parallel apply workers
from other kinds of workers. It is implemented this way to be similar
to the 'leader_pid' column in pg_stat_activity.

==

doc/src/sgml/logical-replication.sgml

4.

+   being synchronized. Moreover, if the streaming transaction is applied in
+   parallel, there will be additional workers.

SUGGESTION
there will be additional workers -> there may be additional parallel
apply workers

==

doc/src/sgml/monitoring.sgml

5. pg_stat_subscription

@@ -3198,11 +3198,22 @@ SELECT pid, wait_event_type, wait_event FROM
pg_stat_activity WHERE wait_event i

  
   
+   apply_leader_pid integer
+  
+  
+   Process ID of the leader apply worker, if this process is a apply
+   parallel worker. NULL if this process is a leader apply worker or a
+   synchronization worker.
+  
+ 
+
+ 
+  
relid oid
   
   
OID of the relation that the worker is synchronizing; null for the
-   main apply worker
+   main apply worker and the parallel apply worker
   
  

5a.

(Same as general comment #1 about terminology)

"apply_leader_pid" --> "leader_apply_pid"

~~

5b.

The current text feels awkward. I see it was copied from the similar
text of 'pg_stat_activity' but perhaps it can be simplified a bit.

SUGGESTION
Process ID of the leader apply worker if this process is a parallel
apply worker; otherwise NULL.

~~

5c.
BEFORE
null for the main apply worker and the parallel apply worker

AFTER
null for the leader apply worker and parallel apply workers

~~

5d.

relid oid
   
   
OID of the relation that the worker is synchronizing; null for the
-   main apply worker
+   main apply worker and the parallel apply worker
   


main apply worker -> leader apply worker

~~~

6.

@@ -3212,7 +3223,7 @@ SELECT pid, wait_event_type, wait_event FROM
pg_stat_activity WHERE wait_event i
   
   
Last write-ahead log location received, the initial value of
-   this field being 0
+   this field being 0; null for the parallel apply worker
   
  

BEFORE
null for the parallel apply worker

AFTER
null for parallel apply workers

~~~

7.

@@ -3221,7 +3232,8 @@ SELECT pid, wait_event_type, wait_event FROM
pg_stat_activity WHERE wait_event i
last_msg_send_time timestamp
with time zone
   
   
-   Send time of last message received from origin WAL sender
+   Send time of last message received from origin WAL sender; null for the
+   parallel apply worker
   
  

(same as #6)

BEFORE
null for the parallel apply worker

AFTER
null for parallel apply workers

~~~

8.

@@ -3230,7 +3242,8 @@ SELECT pid, wait_event_type, wait_event FROM
pg_stat_activity WHERE wait_event i
last_msg_receipt_time
timestamp with time zone
   
   
-   Receipt time of last message received from origin WAL sender
+   Receipt time of last message received from origin WAL sender; null for
+   the parallel apply worker
   
  

(same as #6)

BEFORE
null for the parallel apply worker

AFTER
null for parallel apply workers

~~~

9.

@@ -3239,7 +3252,8 @@ SELECT pid, wait_event_type, wait_event FROM
pg_stat_activity WHERE wait_event i
latest_end_lsn pg_lsn
   
   
-   Last write-ahead log location reported to origin WAL sender
+   Last write-ahead log location reported to origin WAL sender; null for
+   the parallel apply worker
   
  

(same as #6)

BEFORE
null for the parallel apply worker

AFTER
null for parallel apply workers

~~~

10.

@@ -3249,7 +3263,7 @@ SELECT pid, wait_event_type, wait_event FROM

Re: Using WaitEventSet in the postmaster

2023-01-11 Thread Thomas Munro
On Wed, Jan 11, 2023 at 4:07 PM Thomas Munro  wrote:
> Thanks.  Here's v10.  I'll wait a bit longer to see if anyone else has 
> feedback.

Pushed, after a few very minor adjustments, mostly comments.  Thanks
for the reviews and pointers.  I think there are quite a lot of
refactoring and refinement opportunities unlocked by this change (I
have some draft proposals already), but for now I'll keep an eye on
the build farm.




Re: Fixing a couple of buglets in how VACUUM sets visibility map bits

2023-01-11 Thread Peter Geoghegan
On Mon, Jan 9, 2023 at 12:58 PM Peter Geoghegan  wrote:
> On Mon, Jan 9, 2023 at 11:57 AM Andres Freund  wrote:
> > Afaict we'll need to backpatch this all the way?
>
> I thought that we probably wouldn't need to, at first. But I now think
> that we really have to.

Attached is v4. This is almost the same as v3. The only notable change
is in how the issue is explained in comments, and in the commit
message.

I have revised my opinion on this question once more. Given what
has come to light about the issue from recent testing, I lean towards
a HEAD-only commit once again. What do you think?

I still hope to be able to commit this on my original timeline (on
Monday or so), without the issue taking up too much more of
everybody's time.

Thanks
-- 
Peter Geoghegan


v4-0001-Tighten-up-VACUUM-s-approach-to-setting-VM-bits.patch
Description: Binary data


Re: on placeholder entries in view rule action query's range table

2023-01-11 Thread Tom Lane
Amit Langote  writes:
>  On Thu, Jan 12, 2023 at 10:06 AM Tom Lane  wrote:
>> I've pushed this with some cleanup --- aside from fixing
>> outfuncs/readfuncs, I did some more work on the comments, which
>> I think you were too sloppy about.

> Thanks a lot for the fixes.

It looks like we're not out of the woods on this: the buildfarm
members that run cross-version-upgrade tests are all unhappy.
Most of them are not reporting any useful details, but I suspect
that they are barfing because dumps from the old server include
table-qualified variable names in some CREATE VIEW commands while
dumps from HEAD omit the qualifications.  I don't see any
mechanism in TestUpgradeXversion.pm that could deal with that
conveniently, and in any case we'd have to roll out a client
script update to the affected animals.  I fear we may have to
revert this pending development of better TestUpgradeXversion.pm
support.

regards, tom lane




Re: Blocking execution of SECURITY INVOKER

2023-01-11 Thread Andres Freund
Hi,


On 2023-01-11 18:16:32 -0800, Jeff Davis wrote:
> Motivation
> ==
> 
> SECURITY INVOKER is dangerous, particularly for administrators. There
> are numerous ways to put code in a place that's likely to be executed:
> triggers, views calling functions, logically-replicated tables, casts,
> search path and function resolution tricks, etc. If this code is run
> with the privileges of the invoker, then it provides an easy path to
> privilege escalation.

> We've addressed some of these risks, i.e. by offering better ways to
> control the search path, and by ignoring SECURITY INVOKER in some
> contexts (like maintenance commands). But it still leaves a lot of
> risks for administrators who want to do a SELECT or INSERT. And it
> limits major use cases, like logical replication (where the
> subscription owner must trust all table owners).

I'm very skeptical about this framing. On the one hand, you can do a lot of
mischief with security definer functions if they get privileged information as
well. But more importantly, just because a function is security definer,
doesn't mean it's safe to be called with attacker controlled input, and the
privilege check will be done with the rights of the admin in many of these
contexts.

And encouraging more security definer functions to be used will cause a lot of
other security issues.


However - I think the concept of more strict ownership checks is a good one. I
just don't think it's right to tie it to SECURITY INVOKER.

I think it'd be quite valuable to have a guc that prevents the execution of
any code that's not directly controlled by the privileged user. Not just
checking function ownership, but also checking ownership of the trigger
definition (i.e. table), check constraints, domain constraints, indexes with
expression columns / partial indexes, etc.



> Use Cases
> =
> 
> 1. If you are a superuser/admin working on a problem interactively, you
> can protect yourself against accidentally executing malicious code with
> your privileges.

In that case I think what's actually desirable is to simply execute no code
controlled by untrusted users.  Even a security definer function can mess up
your day when called in the wrong situation, e.g. due to getting access to the
content of arguments (e.g. a trigger's row contents) or preventing an admin's
write from taking effect (by returning the relevant values from a trigger).

And not ever allowing execution of untrusted code in that situation IME
doesn't prevent desirable things.


> 2. You can set up logical replication subscriptions into tables owned
> by users you don't trust, as long as triggers (if needed) can be safely
> written as SECURITY DEFINER.

I think a much more promising path towards that is to add a feature to logical
replication that changes the execution context to the table owner while
applying those changes.

Using security definer functions for triggers opens up a significant new
attack surface, lots of code that previously didn't need to be safe against
any possible privilege escalation, now needs to be. Expanding the scope of
what needs to protect against privesc is a BAD idea.


> 3. You can ensure that running an extension script doesn't somehow
> execute malicious code with superuser privileges.

It's not safe to allow executing secdef code in that context either. If a less
privileged user manages to get called in that context, you don't want to
execute the code, even in a secdef; you want to error out, so the problem can
be detected and rectified.


> 4. Users can protect themselves from executing malicious code in cases
> where:
>   a. role membership accurately describes the trust relationship
> already
>   b. triggers, views-calling-UDFs, etc., (if any) can be safely written
> as SECURITY DEFINER

I don't think b) is true very often. And a) seems very problematic as well, as
particularly in "pseudo superuser" environments the most feasible way to
implement pseudo superusers is to automatically grant group membership to the
pseudo superuser.  See also e5b8a4c098a.


> While that may not be every conceivable use case, it feels very useful
> to me.
> 
> Even if you really don't like SECURITY DEFINER, points 1, 3, and 4(a)
> seem like wins. And there are a lot of cases where the user simply
> doesn't need triggers (etc.).

4 doesn't seem like a win, it's a requirement?

And as I said, for 1 and 3 I think it's way preferable to error out.



> Future Work
> ===
> 
> In some cases, we should consider defaulting (or even forcing) this GUC
> to be true, such as in a subscription apply worker.
> 
> This proposal may offer a path to allowing non-superusers to create
> event triggers.

That'd allow a less-privileged user to completely hobble the admin by erroring
out on all actions.


Greetings,

Andres Freund




Re: How to generate the new expected out file.

2023-01-11 Thread Tom Lane
Amit Kapila  writes:
> On Thu, Jan 12, 2023 at 4:39 AM Andres Freund  wrote:
>> On 2023-01-05 16:22:01 +0530, Amit Kapila wrote:
>>> You can run the tests and copy the required changes from
>>> src/test/regress/output/interval.out to
>>> src/test/regress/expected/interval.out

>> Wonder if we should have a bit of content about that in 
>> doc/src/sgml/regress.sgml?

> Yeah, I think it could be useful, especially for new people. The other
> option could be to add some information in src/test/regress/README/

Yeah, regress.sgml is more aimed at consumers of the regression tests
than developers.  I could see expanding the README to include some
developer tips.

In this case I can think of another important tip, which is to be
sure to update all of the variant expected-files when a test has
more than one of them.  If you are not in a position to reproduce
all of the variants directly (say, there's a Windows-specific
expected-file and you're not on Windows) it often works to take
the diff you have for one variant and apply it to the other(s).
If that's not quite right, well, the cfbot or buildfarm will
help you out eventually.

regards, tom lane




Re: ATTACH PARTITION seems to ignore column generation status

2023-01-11 Thread Tom Lane
Amit Langote  writes:
> I noticed a typo in the doc additions, which I've attached a fix for.

Doh, right, pushed.

regards, tom lane




Re: Time delayed LR (WAS Re: logical replication restrictions)

2023-01-11 Thread shveta malik
On Wed, Jan 11, 2023 at 6:16 PM Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear Shveta,
>
> Thanks for reviewing! PSA new version.
>
> > 1.
> > + errmsg("min_apply_delay must not be set when streaming = parallel")));
> > we give the same error msg for  both the cases:
> > a. when subscription is created with streaming=parallel  but we are
> > trying to alter subscription to set min_apply_delay >0
> > b. when subscription is created with some min_apply_delay and we are
> > trying to alter subscription to make it streaming=parallel.
> > For case a, error msg looks fine but for case b, I think error msg
> > should be changed slightly.
> > ALTER SUBSCRIPTION regress_testsub SET (streaming = parallel);
> > ERROR:  min_apply_delay must not be set when streaming = parallel
> > This gives the feeling that we are trying to modify min_apply_delay
> > but we are not. Maybe we can change it to:
> > "subscription with min_apply_delay must not be allowed to stream
> > parallel" (or something better)
>
> Your point that error messages are strange is right. And while
> checking other ones, I found they have very similar styles. Therefore I 
> reworded
> ERROR messages in AlterSubscription() and parse_subscription_options() to 
> follow
> them. Which version is better?
>

v14 one looks much better. Thanks!

thanks
Shveta




Re: How to generate the new expected out file.

2023-01-11 Thread Amit Kapila
On Thu, Jan 12, 2023 at 4:39 AM Andres Freund  wrote:
>
> On 2023-01-05 16:22:01 +0530, Amit Kapila wrote:
> > On Thu, Jan 5, 2023 at 4:12 PM jian he  wrote:
> > > Hi.
> > >
> > > I changed the src/test/regress/sql/interval.sql, How can I generate the 
> > > new src/test/regress/expected/interval.out file.
> > >
> >
> > You can run the tests and copy the required changes from
> > src/test/regress/output/interval.out to
> > src/test/regress/expected/interval.out
>
> Wonder if we should have a bit of content about that in 
> doc/src/sgml/regress.sgml?
>

Yeah, I think it could be useful, especially for new people. The other
option could be to add some information in src/test/regress/README/


-- 
With Regards,
Amit Kapila.




RE: Allow logical replication to copy tables in binary format

2023-01-11 Thread Takamichi Osumi (Fujitsu)
On Wednesday, January 11, 2023 7:45 PM Melih Mutlu  
wrote:
> Thanks for your review.
> Done.
Hi, minor comments on v5.


(1) publisher's version check

+   /* If the publisher is v16 or later, copy in binary format.*/
+   server_version = walrcv_server_version(LogRepWorkerWalRcvConn);
+   if (server_version >=16 && MySubscription->binary)
+   {
+   appendStringInfoString(&cmd, "  WITH (FORMAT binary)");
+   options = lappend(options, makeDefElem("format", (Node *) 
makeString("binary"), -1));
+   }
+
+   elog(LOG, "version: %i, %s", server_version, cmd.data);

(1-1) There is no need to log the version and the query by elog here.
(1-2) Also, I suggest removing the server_version variable itself,
  because it has only one actual reference.
  There is a style of calling walrcv_server_version() directly in the
  'if' condition, as the existing code in fetch_remote_table_info() does.
(1-3) Suggestion to improve comments.
  FROM:
  /* If the publisher is v16 or later, copy in binary format.*/
  TO:
  /* If the publisher is v16 or later, copy data in the required data 
format. */


(2) Minor suggestion for some test code alignment.

 $result =
   $node_subscriber->safe_psql('postgres',
"SELECT sum(a) FROM tst_dom_constr");
-is($result, '21', 'sql-function constraint on domain');
+is($result, '33', 'sql-function constraint on domain');
+
+$result_binary =
+  $node_subscriber->safe_psql('postgres',
+   "SELECT sum(a) FROM tst_dom_constr");
+is($result_binary, '33', 'sql-function constraint on domain');


I think if we change the order of this part of the check as below, it
would look more aligned with the other existing test code introduced by this
patch.

---
my $domain_check = 'SELECT sum(a) FROM tst_dom_constr';
$result = $node_subscriber->safe_psql('postgres', $domain_check);
$result_binary = $node_subscriber->safe_psql('postgres', $domain_check);
is($result, '33', 'sql-function constraint on domain');
is($result_binary, '33', 'sql-function constraint on domain in binary');
---



Best Regards,
Takamichi Osumi





Re: Time delayed LR (WAS Re: logical replication restrictions)

2023-01-11 Thread Kyotaro Horiguchi
At Wed, 11 Jan 2023 12:46:24 +, "Hayato Kuroda (Fujitsu)" 
 wrote in 
> them. Which version is better?


Some comments by a quick loock, different from the above.


+ CONNECTION 'host=192.168.1.50 port=5432 user=foo dbname=foodb'

I understand that we (not PG people, but IT people in general) are supposed
to use, in documentation, a certain set of special addresses that are
guaranteed not to be routed in the field.

> TEST-NET-1 : 192.0.2.0/24
> TEST-NET-2 : 198.51.100.0/24
> TEST-NET-3 : 203.0.113.0/24

(I found 192.83.123.89 in the postgres_fdw doc, but it'd be another issue..)


+   if (strspn(tmp, "-0123456789 ") == strlen(tmp))

Do we need to bother spending another memory block here just to check
for non-digits?


+   errmsg(INT64_FORMAT " ms is 
outside the valid range for parameter \"%s\"",

We don't use INT64_FORMAT in translatable message strings. Cast then
use %lld instead.
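
A minimal sketch of that convention, keeping the message text from the patch
under review (the surrounding errcode and variable name are assumptions for
illustration):

    ereport(ERROR,
            (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
             errmsg("%lld ms is outside the valid range for parameter \"%s\"",
                    (long long) ms, "min_apply_delay")));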

This message looks unfriendly, as it doesn't suggest the valid range,
and it shows the input value in a different unit from what was in the
input. I think we can spell it as "\"%s\" is outside the valid range
for subscription parameter \"%s\" (0 ..  in millisecond)"

+   int64   min_apply_delay;
..
+   if (ms < 0 || ms > INT_MAX)

Why is the variable wider than required?


+   errmsg("%s and %s are mutually 
exclusive options",
+  "min_apply_delay > 0", 
"streaming = parallel"));

Mmm. Couldn't we refuse 0 as min_apply_delay?


+   sub->minapplydelay > 0)
...
+   if (opts.min_apply_delay > 0 &&

Is there any reason for the differentiation?


+   errmsg("cannot 
set %s for subscription with %s",
+  
"streaming = parallel", "min_apply_delay > 0"));

I think that this should read more like human speech. Say, "cannot
enable min_apply_delay for subscription in parallel streaming mode" or
something...  The same applies to the nearby message.



+static void maybe_delay_apply(TimestampTz ts);

  apply_spooled_messages(FileSet *stream_fileset, TransactionId xid,
-  XLogRecPtr lsn)
+  XLogRecPtr lsn, TimestampTz ts)

"ts" looks too generic. Couldn't it be more specific?
We need an explanation for the parameter in the function comment.


+   if (!am_parallel_apply_worker())
+   {
+   Assert(ts > 0);
+   maybe_delay_apply(ts);

It seems better to me to check the if condition and the assertion
inside maybe_delay_apply().
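
A hedged sketch of what that refactoring might look like; am_parallel_apply_worker()
is taken from the worker code, while wait_for_min_apply_delay() is a hypothetical
stand-in for the patch's existing delay logic:

    static void
    maybe_delay_apply(TimestampTz finish_ts)
    {
        /* parallel apply workers do not delay; only the leader does */
        if (am_parallel_apply_worker())
            return;

        Assert(finish_ts > 0);

        /* hypothetical stand-in for the existing min_apply_delay wait */
        wait_for_min_apply_delay(finish_ts);
    }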


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: ATTACH PARTITION seems to ignore column generation status

2023-01-11 Thread Amit Langote
On Thu, Jan 12, 2023 at 5:58 AM Tom Lane  wrote:
> Amit Langote  writes:
> > I've updated your disallow-generated-child-columns-2.patch to do this,
> > and have also merged the delta post that I had attached with my last
> > email, whose contents you sound to agree with.
>
> Pushed with some further work to improve the handling of multiple-
> inheritance cases.  We still need to insist that all or none of the
> parent columns are generated, but we don't have to require their
> generation expressions to be alike: that can be resolved by letting
> the child table override the expression, much as we've long done for
> plain default expressions.  (This did need some work in pg_dump
> after all.)  I'm pretty happy with where this turned out.

Thanks, that all looks more consistent now indeed.

I noticed a typo in the doc additions, which I've attached a fix for.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com


8bf6ec3ba3-doc-typo.patch
Description: Binary data


Re: releasing ParallelApplyTxnHash when pa_launch_parallel_worker returns NULL

2023-01-11 Thread Ted Yu
On Wed, Jan 11, 2023 at 6:54 PM Amit Kapila  wrote:

> On Wed, Jan 11, 2023 at 9:31 AM Ted Yu  wrote:
> >
> > On Tue, Jan 10, 2023 at 7:55 PM houzj.f...@fujitsu.com <
> houzj.f...@fujitsu.com> wrote:
> >>
> >> On Wednesday, January 11, 2023 10:21 AM Ted Yu 
> wrote:
> >> > /* First time through, initialize parallel apply worker state
> hashtable. */
> >> > if (!ParallelApplyTxnHash)
> >> >
> >> > I think it would be better if `ParallelApplyTxnHash` is created by
> the first
> >> > successful parallel apply worker.
> >>
> >> Thanks for the suggestion. But I am not sure if it's worth changing the
> >> order here, because it will only optimize the case where the user enables
> >> parallel apply but never gets an available worker, which should be rare.
> >> And in such a case, it'd be better to increase the number of workers or
> >> disable the parallel mode.
> >>
> >
> >
> > I think even though the chance is rare, we shouldn't leak the resource.
> >
>
> But that is true iff we are never able to start the worker. Anyway, I
> think it is probably fine either way, but we can change it as per your
> suggestion to make it more robust and probably for the sake of code
> clarity. I'll push this tomorrow unless someone thinks otherwise.
>
> --
> With Regards,
> Amit Kapila.
>

Thanks Amit for the confirmation.

Cheers


Re: releasing ParallelApplyTxnHash when pa_launch_parallel_worker returns NULL

2023-01-11 Thread Amit Kapila
On Wed, Jan 11, 2023 at 9:31 AM Ted Yu  wrote:
>
> On Tue, Jan 10, 2023 at 7:55 PM houzj.f...@fujitsu.com 
>  wrote:
>>
>> On Wednesday, January 11, 2023 10:21 AM Ted Yu  wrote:
>> > /* First time through, initialize parallel apply worker state 
>> > hashtable. */
>> > if (!ParallelApplyTxnHash)
>> >
>> > I think it would be better if `ParallelApplyTxnHash` is created by the 
>> > first
>> > successful parallel apply worker.
>>
>> Thanks for the suggestion. But I am not sure if it's worth changing the
>> order here, because it will only optimize the case where the user enables
>> parallel apply but never gets an available worker, which should be rare. And
>> in such a case, it'd be better to increase the number of workers or disable
>> the parallel mode.
>>
>
>
> I think even though the chance is rare, we shouldn't leak the resource.
>

But that is true iff we are never able to start the worker. Anyway, I
think it is probably fine either way, but we can change it as per your
suggestion to make it more robust and probably for the sake of code
clarity. I'll push this tomorrow unless someone thinks otherwise.

-- 
With Regards,
Amit Kapila.




Re: on placeholder entries in view rule action query's range table

2023-01-11 Thread Amit Langote
 On Thu, Jan 12, 2023 at 10:06 AM Tom Lane  wrote:
> Amit Langote  writes:
> > On Mon, Jan 9, 2023 at 5:58 AM Tom Lane  wrote:
> >> Conceivably we could make it work by allowing RTE_SUBQUERY RTEs to
> >> carry a relation OID and associated RTEPermissionInfo, so that when a
> >> view's RTE_RELATION RTE is transmuted to an RTE_SUBQUERY RTE it still
> >> carries the info needed to let us lock and permission-check the view.
> >> That might be a bridge too far from the ugliness perspective ...
> >> although certainly the business with OLD and/or NEW RTEs isn't very
> >> pretty either.
>
> > I had thought about that idea but was a bit scared of trying it,
> > because it does sound like something that might become a maintenance
> > burden in the future.  Though I gave that a try today given that it
> > sounds like I may have your permission. ;-)
>
> Given the small number of places that need to be touched, I don't
> think it's a maintenance problem.  I agree with your fear that you
> might have missed some, but I also searched and found no more.

OK, thanks.

> > I was
> > surprised that nothing failed with a -DWRITE_READ_PARSE_PLAN_TREES
> > build, because of the way RTEs are written and read -- relid,
> > rellockmode are not written/read for RTE_SUBQUERY RTEs.
>
> I think that's mostly accidental, stemming from the facts that:
> (1) Stored rules wouldn't have these fields populated yet anyway.
> (2) The regression tests haven't got any good way to check that a
> needed lock was actually acquired.  It looks to me like with the
> patch as you have it, when a plan tree is copied into the plan
> cache the view relid is lost (if pg_plan_query stripped it thanks
> to -DWRITE_READ_PARSE_PLAN_TREES) and thus we won't re-acquire the
> view lock in AcquireExecutorLocks during later plan uses.  But
> that would have no useful effect unless it forced a re-plan due to
> a concurrent view replacement, which is a scenario I'm pretty sure
> we don't actually exercise in the tests.

Ah, does it make sense to have isolation tests cover this?

> (3) The executor doesn't look at these fields after startup, so
> failure to transmit them to parallel workers doesn't hurt.
>
> In any case, it would clearly be very foolish not to fix
> outfuncs/readfuncs to preserve all the fields we're using.
>
> I've pushed this with some cleanup --- aside from fixing
> outfuncs/readfuncs, I did some more work on the comments, which
> I think you were too sloppy about.

Thanks a lot for the fixes.

> Sadly, the original nested-views test case still has O(N^2)
> planning time :-(.  I dug into that harder and now see where
> the problem really lies.  The rewriter recursively replaces
> the view RTE_RELATION RTEs with RTE_SUBQUERY all the way down,
> which takes it only linear time.  However, then we have a deep
> nest of RTE_SUBQUERYs, and the initial copyObject in
> pull_up_simple_subquery repeatedly copies everything below the
> current pullup recursion level, so that it's still O(N^2)
> even though the final rtable will have only N entries.

That makes sense.

> I'm afraid to remove the copyObject step, because that would
> cause problems in the cases where we try to perform pullup
> and have to abandon it later.  (Maybe we could get rid of
> all such cases, but I'm not sanguine about that succeeding.)
> I'm tempted to try to fix it by taking view replacement out
> of the rewriter altogether and making prepjointree.c handle
> it during the same recursive scan that does subquery pullup,
> so that we aren't applying copyObject to already-expanded
> RTE_SUBQUERY nests.  However, that's more work than I care to
> put into the problem right now.

OK, I will try to give your idea a shot sometime later.

BTW, I noticed that we could perhaps remove the following in the
fireRIRrules()'s loop that calls ApplyRetrieveRule(), because we no
longer put any unreferenced OLD/NEW RTEs in the view queries.

/*
 * If the table is not referenced in the query, then we ignore it.
 * This prevents infinite expansion loop due to new rtable entries
 * inserted by expansion of a rule. A table is referenced if it is
 * part of the join set (a source table), or is referenced by any Var
 * nodes, or is the result table.
 */
if (rt_index != parsetree->resultRelation &&
!rangeTableEntry_used((Node *) parsetree, rt_index, 0))
continue;

Commenting this out doesn't break make check.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Blocking execution of SECURITY INVOKER

2023-01-11 Thread Jeff Davis
Motivation
==

SECURITY INVOKER is dangerous, particularly for administrators. There
are numerous ways to put code in a place that's likely to be executed:
triggers, views calling functions, logically-replicated tables, casts,
search path and function resolution tricks, etc. If this code is run
with the privileges of the invoker, then it provides an easy path to
privilege escalation.

We've addressed some of these risks, i.e. by offering better ways to
control the search path, and by ignoring SECURITY INVOKER in some
contexts (like maintenance commands). But it still leaves a lot of
risks for administrators who want to do a SELECT or INSERT. And it
limits major use cases, like logical replication (where the
subscription owner must trust all table owners).

Note that, in the SQL spec, SECURITY DEFINER is the default, which may
be due to some of the dangers of SECURITY INVOKER. (SECURITY DEFINER
carries its own risks, of course, especially if the definer is highly
privileged.)

Prior work
==

https://www.postgresql.org/message-id/19327.1533748538%40sss.pgh.pa.us

The above thread came up with a couple solutions to express a trust
relationship between users (via GUC or DDL). I'm happy if that
discussion continues, but it appeared to trail off.

My new proposal is different (and simpler, I believe) in two ways:

First, my proposal is only concerned with SECURITY INVOKER functions
and executing arbitrary code written by untrusted users. There's still
the potential for mischief without using SECURITY INVOKER (e.g. if the
search path isn't properly controlled), but it's a different kind of
problem. This narrower problem scope makes my proposal less invasive.

Second, my proposal doesn't establish a new trust relationship. If the
SECURITY INVOKER function is owned by a user that can SET ROLE to you,
you trust it; otherwise not. 

Proposal


New boolean GUC check_function_owner_trust, default false.

If check_function_owner_trust=true, throw an error if you try to
execute a function that is SECURITY INVOKER and owned by a user other
than you or someone that can SET ROLE to you.
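
Purely for illustration, the check could conceivably take a shape like the
following; the proposal does not say where such a check would live, and
invoker_trusts_role() is a hypothetical helper standing in for the "owner can
SET ROLE to me" test:

    if (check_function_owner_trust &&
        !procform->prosecdef &&                    /* SECURITY INVOKER only */
        procform->proowner != GetUserId() &&
        !invoker_trusts_role(procform->proowner))  /* hypothetical trust test */
        ereport(ERROR,
                (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
                 errmsg("refusing to execute SECURITY INVOKER function owned by an untrusted role")));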

Use Cases
=

1. If you are a superuser/admin working on a problem interactively, you
can protect yourself against accidentally executing malicious code with
your privileges.

2. You can set up logical replication subscriptions into tables owned
by users you don't trust, as long as triggers (if needed) can be safely
written as SECURITY DEFINER.

3. You can ensure that running an extension script doesn't somehow
execute malicious code with superuser privileges.

4. Users can protect themselves from executing malicious code in cases
where:
  a. role membership accurately describes the trust relationship
already
  b. triggers, views-calling-UDFs, etc., (if any) can be safely written
as SECURITY DEFINER

While that may not be every conceivable use case, it feels very useful
to me.

Even if you really don't like SECURITY DEFINER, points 1, 3, and 4(a)
seem like wins. And there are a lot of cases where the user simply
doesn't need triggers (etc.).

Extensions
==

Some extensions might create an extension-specific user that owns lots
of SECURITY INVOKER functions. If this GUC is set, other users wouldn't
be able to call those functions.

Our contrib extensions don't seem to do that, and all the tests for them
pass without modification (even when the GUC is true).

For extensions that do create extension-specific users that own
SECURITY INVOKER functions, this GUC alone won't work. Trying to
capture that use case as well could involve more discussion (involving
extension authors) and may result in an extension-specific trust
proposal, so I'm considering that out of scope.

Loose Ends
==

Do we need to check security-invoker views? I don't think it's nearly
as important, because views can't write data. A security-invoker view
read from a security definer function uses the privileges of the
function owner, so I don't see an obvious way to abuse a security
invoker view, but perhaps I'm not creative enough.

Also, Noah's patch did things differently from mine in a few places,
and I need to work out whether I missed something. I may have to add a
check for the range type's "subtype_diff" function, for instance.

Future Work
===

In some cases, we should consider defaulting (or even forcing) this GUC
to be true, such as in a subscription apply worker.

This proposal may offer a path to allowing non-superusers to create
event triggers.

We may want to provide SECURITY PUBLIC or SECURITY NONE (or even
"SECURITY AS "?), which would execute a function with minimal
privileges, and further reduce the need for executing untrusted
SECURITY INVOKER code.

Another idea is to have READ ONLY functions which would be another way
to make SECURITY INVOKER safer.

-- 
Jeff Davis
PostgreSQL Contributor Team - AWS


From 1b4b575017a226011e6a0174c0a4c372015faa06 Mon Sep 17 00:00:00 2001
From: Jeff Davis 
Date: Wed, 11 

RE: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2023-01-11 Thread Yoshikazu Imai (Fujitsu)
Hi.

While investigating the code in RelationCopyStorageUsingBuffer(), I wonder
whether there is any reason to use RBM_NORMAL for acquiring the destination
buffer.  I think we can use RBM_ZERO_AND_LOCK here instead of RBM_NORMAL.

When we use RBM_NORMAL, a destination block is read by smgrread and its
contents are verified with PageIsVerifiedExtended in ReadBuffer_common.
Page verification normally succeeds because destination blocks are
zero-filled by using smgrextend, but can we trust that they are surely zero-filled?

On Fri, Aug 5, 2022 at 00:32 AM Tom Lane  
wrote:
> Hmm ... makes sense.  We'd be using smgrextend to write just the last page
> of the file, relying on its API spec "Note that we assume writing a block
> beyond current EOF causes intervening file space to become filled with
> zeroes".  I don't know that we're using that assumption aggressively
> today, but as long as it doesn't confuse the kernel it'd probably be fine.

If there is a block which is not zero-filled, page verification would fail and
eventually CREATE DATABASE would fail.
(I also think that we don't really need page verification for the destination
blocks here, because those blocks are simply overwritten with the source
buffer's contents.)

Also, from a performance POV, I think it is good to use RBM_ZERO_AND_LOCK.
With RBM_NORMAL, destination blocks are read from disk by smgrread each time,
but with RBM_ZERO_AND_LOCK we only zero-fill the buffer with MemSet and don't
access the disk at all, which is better for performance.
If we expect the destination blocks to always be zero-filled, we can use
RBM_ZERO_AND_LOCK.
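
A hedged sketch of the suggested change in RelationCopyStorageUsingBuffer()
(the surrounding code and the exact variable names are assumptions, not a
quote from the patch):

    /* before: reads the (supposedly zero-filled) destination block from disk
     * and runs PageIsVerifiedExtended() on it */
    dstBuf = ReadBufferWithoutRelcache(dstlocator, forkNum, blkno,
                                       RBM_NORMAL, bstrategy_dst, permanent);
    LockBuffer(dstBuf, BUFFER_LOCK_EXCLUSIVE);

    /* after: skip smgrread()/page verification and get back a zeroed,
     * already exclusively-locked buffer */
    dstBuf = ReadBufferWithoutRelcache(dstlocator, forkNum, blkno,
                                       RBM_ZERO_AND_LOCK, bstrategy_dst,
                                       permanent);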


Apart from the above, there seems to be an old comment left over from the old
code that acquired the destination buffer by using P_NEW.

"/* Use P_NEW to extend the destination relation. */"


--
Yoshikazu Imai

> -Original Message-
> From: Dilip Kumar 
> Sent: Friday, September 2, 2022 11:22 PM
> To: Justin Pryzby 
> Cc: Robert Haas ; Tom Lane ; 
> Andres Freund ;
> Ashutosh Sharma ; Maciek Sakrejda 
> ; Bruce Momjian
> ; Alvaro Herrera ; Andrew Dunstan 
> ; Heikki
> Linnakangas ; pgsql-hackers@lists.postgresql.org; Thomas 
> Munro 
> Subject: Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
> 
> On Fri, Sep 2, 2022 at 5:25 PM Justin Pryzby  wrote:
> >
> > On Tue, Aug 02, 2022 at 12:50:43PM -0500, Justin Pryzby wrote:
> > > Also, if I understand correctly, this patch seems to assume that
> > > nobody is connected to the source database.  But what's actually
> > > enforced is just that nobody *else* is connected.  Is it any issue
> > > that the current DB can be used as a source?  Anyway, both of the
> > > above problems are reproducible using a different database.
> > >
> > > |postgres=# CREATE DATABASE new TEMPLATE postgres STRATEGY wal_log;
> > > |CREATE DATABASE
> >
> > On Thu, Aug 04, 2022 at 05:16:04PM -0500, Justin Pryzby wrote:
> > > On Thu, Aug 04, 2022 at 06:02:50PM -0400, Tom Lane wrote:
> > > > The "invalidation" comment bothered me for awhile, but I think
> > > > it's
> > > > fine: we know that no other backend can connect to the source DB
> > > > because we have it locked,
> > >
> > > About that - is it any problem that the currently-connected db can
> > > be used as a template?  It's no issue for 2-phase commit, because
> > > "create database" cannot run in an txn.
> >
> > Would anybody want to comment on this ?
> > Is it okay that the *current* DB can be used as a template ?
> 
> I don't think there should be any problem with that.  The main problem could 
> have been that since we are reading the
> pg_class tuple block by block there could be an issue if someone concurrently 
> modifies the pg_class or there are some
> tuples that are inserted by the prepared transaction.  But in this case, the 
> same backend can not have an open prepared
> transaction while creating a database and that backend of course can not 
> perform any parallel operation as well.
> 
> --
> Regards,
> Dilip Kumar
> EnterpriseDB: http://www.enterprisedb.com
> 



Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans

2023-01-11 Thread Peter Geoghegan
On Wed, Jan 11, 2023 at 4:44 PM Andres Freund  wrote:
> > Attached patch fixes up these issues. It's almost totally mechanical.
>
> Looks better, thanks!

Pushed that just now.

Thanks
-- 
Peter Geoghegan




Re: No Callbacks on FATAL

2023-01-11 Thread Tom Lane
Andres Freund  writes:
> On 2023-01-11 18:10:33 -0500, Tom Lane wrote:
>> It's intended behavior, and I seriously doubt that it ever worked
>> differently.

> Hm? MemoryContextDelete() unconditionally calls the
> callbacks. ShutdownPostgres() calls AbortOutOfAnyTransaction(). So if there's
> an ongoing transaction, we'll call the reset callbacks on TopMemoryContext and
> its children.

Hmm ... I'd forgotten that we'd reach AbortOutOfAnyTransaction in
the FATAL code path.  It does seem like any memory contexts below
TopTransactionContext ought to get cleaned up then.

As you say, we really need more details to see what's happening
here.

regards, tom lane




Re: on placeholder entries in view rule action query's range table

2023-01-11 Thread Tom Lane
Amit Langote  writes:
> On Mon, Jan 9, 2023 at 5:58 AM Tom Lane  wrote:
>> Conceivably we could make it work by allowing RTE_SUBQUERY RTEs to
>> carry a relation OID and associated RTEPermissionInfo, so that when a
>> view's RTE_RELATION RTE is transmuted to an RTE_SUBQUERY RTE it still
>> carries the info needed to let us lock and permission-check the view.
>> That might be a bridge too far from the ugliness perspective ...
>> although certainly the business with OLD and/or NEW RTEs isn't very
>> pretty either.

> I had thought about that idea but was a bit scared of trying it,
> because it does sound like something that might become a maintenance
> burden in the future.  Though I gave that a try today given that it
> sounds like I may have your permission. ;-)

Given the small number of places that need to be touched, I don't
think it's a maintenance problem.  I agree with your fear that you
might have missed some, but I also searched and found no more.

> I was
> surprised that nothing failed with a -DWRITE_READ_PARSE_PLAN_TREES
> build, because of the way RTEs are written and read -- relid,
> rellockmode are not written/read for RTE_SUBQUERY RTEs.

I think that's mostly accidental, stemming from the facts that:
(1) Stored rules wouldn't have these fields populated yet anyway.
(2) The regression tests haven't got any good way to check that a
needed lock was actually acquired.  It looks to me like with the
patch as you have it, when a plan tree is copied into the plan
cache the view relid is lost (if pg_plan_query stripped it thanks
to -DWRITE_READ_PARSE_PLAN_TREES) and thus we won't re-acquire the
view lock in AcquireExecutorLocks during later plan uses.  But
that would have no useful effect unless it forced a re-plan due to
a concurrent view replacement, which is a scenario I'm pretty sure
we don't actually exercise in the tests.
(3) The executor doesn't look at these fields after startup, so
failure to transmit them to parallel workers doesn't hurt.

In any case, it would clearly be very foolish not to fix
outfuncs/readfuncs to preserve all the fields we're using.

>> BTW, I don't entirely understand why this patch is passing regression
>> tests, because it's failed to deal with numerous places that have
>> hard-wired knowledge about these extra RTEs.  Look for references to
>> PRS2_OLD_VARNO and PRS2_NEW_VARNO.

> AFAICS, the places that still have hard-wired knowledge of these
> placeholder RTEs only manipulate non-SELECT rules, so don't care about
> views.

Yeah, I looked through them too and didn't find any problems.

I've pushed this with some cleanup --- aside from fixing
outfuncs/readfuncs, I did some more work on the comments, which
I think you were too sloppy about.

Sadly, the original nested-views test case still has O(N^2)
planning time :-(.  I dug into that harder and now see where
the problem really lies.  The rewriter recursively replaces
the view RTE_RELATION RTEs with RTE_SUBQUERY all the way down,
which takes it only linear time.  However, then we have a deep
nest of RTE_SUBQUERYs, and the initial copyObject in
pull_up_simple_subquery repeatedly copies everything below the
current pullup recursion level, so that it's still O(N^2)
even though the final rtable will have only N entries.

I'm afraid to remove the copyObject step, because that would
cause problems in the cases where we try to perform pullup
and have to abandon it later.  (Maybe we could get rid of
all such cases, but I'm not sanguine about that succeeding.)
I'm tempted to try to fix it by taking view replacement out
of the rewriter altogether and making prepjointree.c handle
it during the same recursive scan that does subquery pullup,
so that we aren't applying copyObject to already-expanded
RTE_SUBQUERY nests.  However, that's more work than I care to
put into the problem right now.

regards, tom lane




Re: low wal_retrieve_retry_interval causes missed signals on Windows

2023-01-11 Thread Nathan Bossart
On Wed, Jan 11, 2023 at 04:40:14PM -0800, Andres Freund wrote:
> On 2023-01-11 15:26:45 -0800, Nathan Bossart wrote:
>> On Wed, Jan 11, 2023 at 12:48:36PM -0800, Andres Freund wrote:
>> > Given that we check for interrupts in other parts of recovery with
>> > HandleStartupProcInterrupt(), which doesn't interact with latches, isn't 
>> > the
>> > actual bug that HandleStartupProcInterrupt() doesn't contain the same black
>> > magic that CHECK_FOR_INTERRUPTS() contains on windows?  Namely this stuff:
>> 
>> Yeah, this seems like a more comprehensive fix.  I've attached a patch that
>> adds this Windows signaling stuff to the HandleXXXInterrupts() functions in
>> the files you listed.  Is this roughly what you had in mind?  If so, I'll
>> look around for anywhere else it is needed.
> 
> Yes, that's what I roughly was thinking of. Although seeing the diff, I think
> it might be worth introducing a helper function that'd contain at least
> pgwin32_dispatch_queued_signals() and ProcessProcSignalBarrier(). It's a bit
> complicated by ProcessProcSignalBarrier() only being applicable to shared
> memory connected processes - excluding e.g. pgarch.

As of d75288f, the archiver should be connected to shared memory, so we
might be in luck.  I guess we'd need to watch out for this if we want to
back-patch it beyond v14.  I'll work on a patch...

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: No Callbacks on FATAL

2023-01-11 Thread Andres Freund
Hi,

On 2023-01-11 18:10:33 -0500, Tom Lane wrote:
> Ed Behn  writes:
> > I'm using a callback function that is called when a memory context is
> > deleted to remove a temporary file. This works fine when the transaction
> > ends normally or raises an ERROR. However, when a FATAL event happens, the
> > callback is not run. Is this a bug or intended behaviour?
>
> It's intended behavior, and I seriously doubt that it ever worked
> differently.

Hm? MemoryContextDelete() unconditionally calls the
callbacks. ShutdownPostgres() calls AbortOutOfAnyTransaction(). So if there's
an ongoing transaction, we'll call the reset callbacks on TopMemoryContext and
its children.

Of course that doesn't mean we'll delete all existing contexts...


> > It seems to me that callbacks should be run in the event of a FATAL event
> > in order to clean up any lingering issues.
>
> They'd be far more likely to cause issues than cure them.  Or at least
> that's the design assumption.  If you really need something here, put
> it in an on_proc_exit callback not a memory context callback.

Or, depending on the use case, a transaction callback.

It's really hard to know precisely what to suggest without knowing a good bit
more about the intended use case.
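
For the archives, the on_proc_exit() route being suggested looks roughly like
this (a hedged sketch; the temp-file bookkeeping and helper names are
hypothetical, not the reporter's code):

    #include "postgres.h"

    #include <unistd.h>

    #include "storage/ipc.h"

    /* remembered path of the temporary file we created */
    static char temp_file_path[MAXPGPATH];

    /* on_proc_exit callbacks run during proc_exit(), including after FATAL */
    static void
    cleanup_temp_file(int code, Datum arg)
    {
        if (temp_file_path[0] != '\0')
            unlink(temp_file_path);
    }

    /* call once, right after creating the file */
    static void
    remember_temp_file(const char *path)
    {
        strlcpy(temp_file_path, path, sizeof(temp_file_path));
        on_proc_exit(cleanup_temp_file, (Datum) 0);
    }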


Greetings,

Andres Freund




Re: pgsql: Add new GUC createrole_self_grant.

2023-01-11 Thread David G. Johnston
On Wed, Jan 11, 2023 at 2:16 PM Robert Haas  wrote:

> On Wed, Jan 11, 2023 at 4:00 PM Tom Lane  wrote:
> > Robert Haas  writes:
> > > If you want to make safe a SECURITY DEFINER function written using sql
> > > or plpgsql, you either have to schema-qualify every single reference
> > > or, more realistically, attach a SET clause to the function to set the
> > > search_path to a sane value during the time that the function is
> > > executing. The problem here can be handled the same way, except that
> > > it's needed in a vastly more limited set of circumstances: you have to
> > > be calling a SECURITY DEFINER function that will execute CREATE ROLE
> > > as a non-superuser (and that user then needs to be sensitive to the
> > > value of this GUC in some security-relevant way). It might be good to
> > > document this -- I just noticed that the CREATE FUNCTION page has a
> > > section on "Writing SECURITY DEFINER Functions Safely" which talks
> > > about dealing with the search_path issues, and it seems like it would
> > > be worth adding a sentence or two there to talk about this.
> >
> > OK, I'd be satisfied with that.
>
> OK, I'll draft a patch tomorrow.
>
>
Just wanted to chime in and say Robert has eloquently put into words much
of what I have been thinking here, and that I concur that guiding the DBA
to use care with the power they have been provided is a sane position to
take.

+1, and thank you.

David J.


Re: Switching XLog source from archive to streaming when primary available

2023-01-11 Thread Nathan Bossart
On Tue, Oct 18, 2022 at 07:31:33AM +0530, Bharath Rupireddy wrote:
> In summary, the standby state machine in WaitForWALToBecomeAvailable()
> exhausts all the WAL in pg_wal before switching to streaming after
> failing to fetch from archive. The v8 patch proposed upthread deviates
> from this behaviour. Hence, attaching v9 patch that keeps the
> behaviour as-is, that means, the standby exhausts all the WAL in
> pg_wal before switching to streaming after fetching WAL from archive
> for at least streaming_replication_retry_interval milliseconds.

I think this is okay.  The following comment explains why archives are
preferred over existing files in pg_wal:

 * When doing archive recovery, we always prefer an archived log file 
even
 * if a file of the same name exists in XLOGDIR.  The reason is that the
 * file in XLOGDIR could be an old, un-filled or partly-filled version
 * that was copied and restored as part of backing up $PGDATA.

With your patch, we might replay one of these "old" files in pg_wal instead
of the complete version of the file from the archives, but I think that is
still correct.  We'll just replay whatever exists in pg_wal (which may be
un-filled or partly-filled) before attempting streaming.  If that fails,
we'll go back to trying the archives again.

Would you mind testing this scenario?

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans

2023-01-11 Thread Andres Freund
Hi,

On 2023-01-11 16:06:31 -0800, Peter Geoghegan wrote:
> On Mon, Jan 9, 2023 at 2:18 PM Peter Geoghegan  wrote:
> > I'll try to get back to it this week.
> 
> Attached patch fixes up these issues. It's almost totally mechanical.

Looks better, thanks!


> (Ended up using "git diff --color-moved=dimmed-zebra
> --color-moved-ws=ignore-all-space" with this, per your recent tip,
> which did help.)

It's a really useful feature. I configured git to always use
--color-moved=dimmed-zebra, but haven't quite dared to enable
--color-moved-ws=ignore-all-space by default.

Greetings,

Andres Freund




Re: low wal_retrieve_retry_interval causes missed signals on Windows

2023-01-11 Thread Andres Freund
Hi,

On 2023-01-11 15:26:45 -0800, Nathan Bossart wrote:
> On Wed, Jan 11, 2023 at 12:48:36PM -0800, Andres Freund wrote:
> > Given that we check for interrupts in other parts of recovery with
> > HandleStartupProcInterrupt(), which doesn't interact with latches, isn't the
> > actual bug that HandleStartupProcInterrupt() doesn't contain the same black
> > magic that CHECK_FOR_INTERRUPTS() contains on windows?  Namely this stuff:
> 
> Yeah, this seems like a more comprehensive fix.  I've attached a patch that
> adds this Windows signaling stuff to the HandleXXXInterrupts() functions in
> the files you listed.  Is this roughly what you had in mind?  If so, I'll
> look around for anywhere else it is needed.

Yes, that's what I roughly was thinking of. Although seeing the diff, I think
it might be worth introducing a helper function that'd contain at least
pgwin32_dispatch_queued_signals() and ProcessProcSignalBarrier(). It's a bit
complicated by ProcessProcSignalBarrier() only being applicable to shared
memory connected processes - excluding e.g. pgarch.
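
Something along these lines, perhaps (a hedged sketch; the helper name and the
shmem flag are hypothetical, not from an actual patch):

    static inline void
    ProcessMainLoopInterrupts_common(bool connected_to_shmem)
    {
    #ifdef WIN32
        /* dispatch signals queued by the Windows signal emulation layer */
        pgwin32_dispatch_queued_signals();
    #endif

        if (connected_to_shmem && ProcSignalBarrierPending)
            ProcessProcSignalBarrier();
    }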

Greetings,

Andres Freund




Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

2023-01-11 Thread Andres Freund
Hi,

On 2023-01-11 17:26:19 -0700, David G. Johnston wrote:
> Should we just add "ring_buffers" to the existing "shared_buffers" and
> "temp_buffers" settings?

The different types of ring buffers have different sizes, for good reasons. So
I don't see that working well. I also think it'd be more often useful to
control this on a statement basis - if you have a parallel import tool that
starts NCPU COPYs you'd want a smaller buffer than a single threaded COPY. Of
course each session can change the ring buffer settings, but still.


> Then give VACUUM a (BUFFER_POOL=ring*|shared) option?

That seems likely to mislead, because it'd still use shared buffers when the
blocks are already present. The ring buffers aren't a separate buffer pool,
they're a subset of the normal buffer pool. Lookup is done normally; only when
a page isn't found does the search for a victim buffer first try to use a buffer
from the ring.
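
A hedged, heavily simplified sketch of that flow (the real logic lives in
bufmgr.c and freelist.c and is more involved; signatures are deliberately
abbreviated):

    /*
     * BufferAlloc():
     *   id = BufTableLookup(&tag, hash);        -- normal shared-buffer lookup
     *   if (id >= 0)
     *       return that buffer;                 -- hit: the ring plays no role
     *   buf = StrategyGetBuffer(strategy, ...); -- miss: find a victim
     *
     * StrategyGetBuffer():
     *   if (strategy != NULL and the ring has a reusable slot)
     *       return a buffer from the ring;      -- GetBufferFromRing()
     *   otherwise run the usual clock sweep over all of shared_buffers;
     */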


> I think making DBAs aware of this dynamic and making the ring buffer usage
> user-facing is beneficial in its own right (at least, the concept that
> changes done by vacuum don't impact shared_buffers, regardless of how that
> non-impact manifests).

VACUUM can end up dirtying all of shared buffers, even with the ring buffer in
use...

Greetings,

Andres Freund




Re: Use windows VMs instead of windows containers on the CI

2023-01-11 Thread Andres Freund
Hi,

On 2023-01-11 17:21:21 -0600, Justin Pryzby wrote:
> On Tue, Jan 10, 2023 at 03:20:18PM +0300, Nazir Bilal Yavuz wrote:
> > Hi,
> > 
> > I propose using windows VMs instead of containers, the patch is attached.
> > Currently, windows containers are used on the CI, but these container images
> > need to be pulled on every CI run, and they are also slow to run.
> 
> > @@ -589,8 +591,10 @@ task:
> ># otherwise it'll be sorted before other tasks
> >depends_on: SanityCheck
> >  
> > -  windows_container:
> > -image: $CONTAINER_REPO/windows_ci_mingw64:latest
> > +  compute_engine_instance:
> > +image_project: $IMAGE_PROJECT
> > +image: family/pg-ci-windows-ci-mingw64
> > +platform: windows
> >  cpu: $CPUS
> >  memory: 4G
> 
> It looks like MinGW currently doesn't have the necessary perl modules:
> 
> [19:58:46.356] Message: Can't locate IPC/Run.pm in @INC (you may need to 
> install the IPC::Run module) (@INC contains: 
> C:/msys64/ucrt64/lib/perl5/site_perl/5.32.1 
> C:/msys64/ucrt64/lib/perl5/site_perl/5.32.1 
> C:/msys64/ucrt64/lib/perl5/site_perl C:/msys64/ucrt64/lib/perl5/vendor_perl 
> C:/msys64/ucrt64/lib/perl5/core_perl) at config/check_modules.pl line 11.
> [19:58:46.356] BEGIN failed--compilation aborted at config/check_modules.pl 
> line 11.
> [19:58:46.356] meson.build:1337: WARNING: Additional Perl modules are 
> required to run TAP tests.
> 
> That could be caused by a transient failure combined with bad error
> handling - if there's an error while building the image, it shouldn't be
> uploaded.

Yea, there's a problem where packer on windows doesn't seem to abort after a
powershell script errors out. The reason isn't yet quite clear. I think Bilal
is working on a workaround.

Greetings,

Andres Freund




Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

2023-01-11 Thread David G. Johnston
On Wed, Jan 11, 2023 at 2:39 PM Andres Freund  wrote:

> Hi,
>
> On 2023-01-11 16:18:34 -0500, Tom Lane wrote:
> > Peter Geoghegan  writes:
> > > On Wed, Jan 11, 2023 at 11:18 AM Andres Freund 
> wrote:
> > >> I don't like that - it's also quite useful to disable use of
> ringbuffers when
> > >> you actually need to clean up indexes. Especially when we have a lot
> of dead
> > >> tuples we'll rescan indexes over and over...
> >
> > > That's a fair point.
> >
> > > My vote goes to "REUSE_BUFFERS", then.
> >
> > I wonder whether it could make sense to allow a larger ringbuffer size,
> > rather than just the limit cases of "on" and "off".
>
> I can see that making sense, particularly if we were to later extend this to
> other users of ringbuffers. E.g. COPY's use of the ringbuffer makes loading
> of data > 16MB but also << s_b vastly slower, but it can still be very
> important to use if there's lots of parallel processes loading data.
>
>
Should we just add "ring_buffers" to the existing "shared_buffers" and
"temp_buffers" settings?

Then give VACUUM a (BUFFER_POOL=ring*|shared) option?

I think making DBAs aware of this dynamic and making the ring buffer usage
user-facing is beneficial in its own right (at least, the concept that
changes done by vacuum don't impact shared_buffers, regardless of how that
non-impact manifests).  But I don't see much benefit trying to come up with
a different name.

David J.


Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans

2023-01-11 Thread Peter Geoghegan
On Mon, Jan 9, 2023 at 2:18 PM Peter Geoghegan  wrote:
> I'll try to get back to it this week.

Attached patch fixes up these issues. It's almost totally mechanical.

(Ended up using "git diff --color-moved=dimmed-zebra
--color-moved-ws=ignore-all-space" with this, per your recent tip,
which did help.)

--
Peter Geoghegan


v1-0001-Rename-freeze-plan-dedup-routines.patch
Description: Binary data


Re: Refactor recordExtObjInitPriv()

2023-01-11 Thread Nathan Bossart
On Tue, Dec 27, 2022 at 09:56:10AM +0100, Peter Eisentraut wrote:
> Refactor recordExtObjInitPriv():  Instead of half a dozen of
> mostly-duplicate conditional branches, write one common one that can handle
> most catalogs.  We already have all the information we need, such as which
> system catalog corresponds to which catalog table and which column is the
> ACL column.

This seems reasonable.

> + /* This will error on unsupported classoid. */
> + else if (get_object_attnum_acl(classoid))

nitpick: I would suggest explicitly checking that it isn't
InvalidAttrNumber instead of relying on it always being 0.
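
A self-contained way to put it (the helper name is invented, not part of the
patch):

#include "postgres.h"
#include "access/attnum.h"
#include "catalog/objectaddress.h"

/* true if the catalog identified by classoid has an ACL column */
static bool
object_class_has_acl_column(Oid classoid)
{
	return get_object_attnum_acl(classoid) != InvalidAttrNumber;
}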

> -  classoid == AggregateRelationId ||

I noticed that AggregateRelationId isn't listed in the ObjectProperty
array, so I think recordExtObjInitPriv() will begin erroring for that
classoid instead of ignoring it like we do today.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Rework of collation code, extensibility

2023-01-11 Thread Jeff Davis
On Wed, 2023-01-11 at 15:08 +0100, Peter Eisentraut wrote:
> I think the refactoring that you proposed in the thread "Refactor to 
> introduce pg_strcoll()." was on a sensible track.  Maybe we should
> try 
> to get that done.

Those should be patches 0001-0003 in this thread (now at v6), which are
all pure refactoring.

Let's consider those patches the topic of this thread and I'll move
0004-0007 back to the multi-lib ICU thread on the next revision.

-- 
Jeff Davis
PostgreSQL Contributor Team - AWS






Re: Improve tab completion for ALTER EXTENSION ADD/DROP

2023-01-11 Thread Michael Paquier
On Wed, Jan 11, 2023 at 10:29:25PM +0530, vignesh C wrote:
> I too felt keeping it simpler is better. How about using the simple
> first version of patch itself?

Okay, I have just done that, then, after checking that all the object
types were covered (28).  Note that PROCEDURAL LANGUAGE has been
removed for simplicity.
--
Michael


signature.asc
Description: PGP signature


Re: pgsql: Delay commit status checks until freezing executes.

2023-01-11 Thread Peter Geoghegan
On Wed, Jan 11, 2023 at 3:06 PM Andres Freund  wrote:
> > + * We can't use TransactionIdDidAbort here because it won't treat 
> > transactions
> > + * that were in progress during a crash as aborted by now.  We determine 
> > that
> > + * transactions aborted/crashed through process of elimination instead.
>
> s/by now//?

Did it that way in the commit I pushed just now.

Thanks
-- 
Peter Geoghegan




Re: low wal_retrieve_retry_interval causes missed signals on Windows

2023-01-11 Thread Nathan Bossart
On Wed, Jan 11, 2023 at 12:48:36PM -0800, Andres Freund wrote:
> Given that we check for interrupts in other parts of recovery with
> HandleStartupProcInterrupt(), which doesn't interact with latches, isn't the
> actual bug that HandleStartupProcInterrupt() doesn't contain the same black
> magic that CHECK_FOR_INTERRUPTS() contains on windows?  Namely this stuff:

Yeah, this seems like a more comprehensive fix.  I've attached a patch that
adds this Windows signaling stuff to the HandleXXXInterrupts() functions in
the files you listed.  Is this roughly what you had in mind?  If so, I'll
look around for anywhere else it is needed.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index de0bbbfa79..e57be4f72b 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -543,6 +543,11 @@ CheckpointerMain(void)
 static void
 HandleCheckpointerInterrupts(void)
 {
+#ifdef WIN32
+	if (UNBLOCKED_SIGNAL_QUEUE())
+		pgwin32_dispatch_queued_signals();
+#endif
+
 	if (ProcSignalBarrierPending)
 		ProcessProcSignalBarrier();
 
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 8ecdb9ca23..68184894fe 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -777,6 +777,11 @@ pgarch_die(int code, Datum arg)
 static void
 HandlePgArchInterrupts(void)
 {
+#ifdef WIN32
+	if (UNBLOCKED_SIGNAL_QUEUE())
+		pgwin32_dispatch_queued_signals();
+#endif
+
 	if (ProcSignalBarrierPending)
 		ProcessProcSignalBarrier();
 
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index 8786186898..0e8cb3a174 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -171,6 +171,11 @@ HandleStartupProcInterrupts(void)
 	static uint32 postmaster_poll_count = 0;
 #endif
 
+#ifdef WIN32
+	if (UNBLOCKED_SIGNAL_QUEUE())
+		pgwin32_dispatch_queued_signals();
+#endif
+
 	/*
 	 * Process any requests or signals received recently.
 	 */


Re: Use windows VMs instead of windows containers on the CI

2023-01-11 Thread Justin Pryzby
On Tue, Jan 10, 2023 at 03:20:18PM +0300, Nazir Bilal Yavuz wrote:
> Hi,
> 
> I propose using windows VMs instead of containers, the patch is attached.
> Currently, windows containers are used on the CI, but these container images
> need to be pulled on every CI run, and they are also slow to run.

> @@ -589,8 +591,10 @@ task:
># otherwise it'll be sorted before other tasks
>depends_on: SanityCheck
>  
> -  windows_container:
> -image: $CONTAINER_REPO/windows_ci_mingw64:latest
> +  compute_engine_instance:
> +image_project: $IMAGE_PROJECT
> +image: family/pg-ci-windows-ci-mingw64
> +platform: windows
>  cpu: $CPUS
>  memory: 4G

It looks like MinGW currently doesn't have the necessary perl modules:

[19:58:46.356] Message: Can't locate IPC/Run.pm in @INC (you may need to 
install the IPC::Run module) (@INC contains: 
C:/msys64/ucrt64/lib/perl5/site_perl/5.32.1 
C:/msys64/ucrt64/lib/perl5/site_perl/5.32.1 
C:/msys64/ucrt64/lib/perl5/site_perl C:/msys64/ucrt64/lib/perl5/vendor_perl 
C:/msys64/ucrt64/lib/perl5/core_perl) at config/check_modules.pl line 11.
[19:58:46.356] BEGIN failed--compilation aborted at config/check_modules.pl 
line 11.
[19:58:46.356] meson.build:1337: WARNING: Additional Perl modules are required 
to run TAP tests.

That could be caused by a transient failure combined with bad error
handling - if there's an error while building the image, it shouldn't be
uploaded.

-- 
Justin




Re: Show various offset arrays for heap WAL records

2023-01-11 Thread Peter Geoghegan
On Wed, Jan 11, 2023 at 3:00 PM Andres Freund  wrote:
> What are your thoughts about the place for the helper functions? You're ok
> with rmgrdesc_utils.[ch]?

Yeah, that seems okay.

We may well need to put more stuff in that file. We're overdue a big
overhaul of the rmgr output, so that everybody uses the same format
for everything. We made some progress on that for 16 already, by
standardizing on the name snapshotConflictHorizon, but a lot of
annoying inconsistencies still remain. Like the punctuation issue you
mentioned.

Ideally we'd be able to make the output easier to manipulate via
the SQL interface from pg_walinspect, or perhaps via scripting. That
would require some rules that are imposed top-down, so that consumers
of the data can make certain general assumptions. But that's fairly
natural. It's not like there is just inherently a great deal of
diversity that needs to be considered. For example, the WAL records
used by each individual index access method are all very similar. In
fact the most important index AM WAL records used by each index AM
(e.g. insert, delete, vacuum) have virtually the same format as each
other already.

-- 
Peter Geoghegan




Re: No Callbacks on FATAL

2023-01-11 Thread Tom Lane
Ed Behn  writes:
> I'm using a callback function that is called when a memory context is
> deleted to remove a temporary file. This works fine when the transaction
> ends normally or raises an ERROR. However, when a FATAL event happens, the
> callback is not run. Is this a bug or intended behaviour?

It's intended behavior, and I seriously doubt that it ever worked
differently.

> It seems to me that callbacks should be run in the event of a FATAL event
> in order to clean up any lingering issues.

They'd be far more likely to cause issues than cure them.  Or at least
that's the design assumption.  If you really need something here, put
it in an on_proc_exit callback not a memory context callback.
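
For instance, something along these lines (a sketch with invented names,
assuming the file path is known when the cleanup is registered):

#include "postgres.h"

#include <unistd.h>

#include "storage/ipc.h"

static char temp_file_path[MAXPGPATH];

/* on_proc_exit callbacks do run on FATAL exits, because FATAL goes through proc_exit() */
static void
remove_temp_file_at_exit(int code, Datum arg)
{
	if (temp_file_path[0] != '\0')
		unlink(temp_file_path);
}

static void
register_temp_file_cleanup(const char *path)
{
	strlcpy(temp_file_path, path, sizeof(temp_file_path));
	on_proc_exit(remove_temp_file_at_exit, (Datum) 0);
}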

regards, tom lane




Re: How to generate the new expected out file.

2023-01-11 Thread Andres Freund
Hi,

On 2023-01-05 16:22:01 +0530, Amit Kapila wrote:
> On Thu, Jan 5, 2023 at 4:12 PM jian he  wrote:
> > Hi.
> >
> > I changed the src/test/regress/sql/interval.sql, How can I generate the new 
> > src/test/regress/expected/interval.out file.
> >
> 
> You can run the tests and copy the required changes from
> src/test/regress/output/interval.out to
> src/test/regress/expected/interval.out

Wonder if we should have a bit of content about that in 
doc/src/sgml/regress.sgml?

Greetings,

Andres Freund




Re: pgsql: Delay commit status checks until freezing executes.

2023-01-11 Thread Andres Freund
Hi,

On 2023-01-11 14:29:25 -0800, Peter Geoghegan wrote:
> On Sat, Jan 7, 2023 at 7:25 PM Andres Freund  wrote:
> > Probably a good idea, although it doesn't neatly fit right now.
> 
> I'll leave it for now.
> 
> Attached is v2, which changes things based on your feedback. Would
> like to get this out of the way soon.

Makes sense. It's clearly an improvement.


> + * We can't use TransactionIdDidAbort here because it won't treat 
> transactions
> + * that were in progress during a crash as aborted by now.  We determine that
> + * transactions aborted/crashed through process of elimination instead.

s/by now//?


>   * When using an MVCC snapshot, we rely on XidInMVCCSnapshot rather than
>   * TransactionIdIsInProgress, but the logic is otherwise the same: do not
> diff --git a/src/backend/access/transam/transam.c 
> b/src/backend/access/transam/transam.c
> index 3a28dcc43..7629904bb 100644
> --- a/src/backend/access/transam/transam.c
> +++ b/src/backend/access/transam/transam.c
> @@ -110,7 +110,8 @@ TransactionLogFetch(TransactionId transactionId)
>   *  transaction tree.
>   *
>   * See also TransactionIdIsInProgress, which once was in this module
> - * but now lives in procarray.c.
> + * but now lives in procarray.c, as well as comments at the top of
> + * heapam_visibility.c that explain how everything fits together.
>   * 
>   */

+1

Greetings,

Andres Freund




Re: pgsql: Doc: add XML ID attributes to and tags.

2023-01-11 Thread Tom Lane
Peter Eisentraut  writes:
> On 09.01.23 21:08, Tom Lane wrote:
>> Doc: add XML ID attributes to  and  tags.

> Any reason the new ids in create_database.sgml deviate from the normal 
> naming schemes used everywhere else?  Is it to preserve the existing 
> create-database-strategy?  Maybe we should rename that one and make the 
> new ones consistent?

You'd have to ask Brar that, I didn't question his choices too much.

I have no objection to changing things as you suggest.  I'm hesitant to
rename very many pre-existing IDs for fear of breaking people's bookmarks,
but changing create-database-strategy doesn't seem like a big deal.

That reminds me that I was going to suggest fixing the few existing
variances from the "use '-' not '_'" policy:

$ grep 'id="[a-zA-Z0-9-]*_' *sgml ref/*sgml
config.sgml: 
libpq.sgml:
libpq.sgml:
libpq.sgml:
libpq.sgml:
pgbuffercache.sgml:  
ref/pg_checksums.sgml: 

As you say, this isn't required by the toolchain any longer, but it
seems like a good idea to have consistent tag spelling.  I'm particularly
annoyed by guc-plan-cache_mode, which isn't even consistent with itself
let alone every other guc-XXX tag.

regards, tom lane




Re: No Callbacks on FATAL

2023-01-11 Thread Andres Freund
Hi,

On 2023-01-11 17:47:28 -0500, Ed Behn wrote:
> I'm developing a module that implements Haskell as a procedural language (
> https://www.postgresql.org/about/news/plhaskell-v10-released-2519/)
> 
> I'm using a callback function that is called when a memory context is
> deleted to remove a temporary file. This works fine when the transaction
> ends normally or raises an ERROR. However, when a FATAL event happens, the
> callback is not run. Is this a bug or intended behaviour? I think that this
> is a new behavior and that the callback was called in an earlier version
> (perhaps v14) when I was originally developing this code. I'm running
> v15.1.
> 
> It seems to me that callbacks should be run in the event of a FATAL event
> in order to clean up any lingering issues.

I think you need to provide a bit more details to allow us to analyze this. I
assume you're talking about a MemoryContextRegisterResetCallback()?  Which
memory context are you registering the callback on? What FATAL error is
preventing the cleanup from happening?
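
For reference, a minimal sketch of the kind of setup I'm assuming you mean
(names invented):

#include "postgres.h"

#include <unistd.h>

#include "utils/memutils.h"

typedef struct TempFileCleanup
{
	MemoryContextCallback cb;
	char		path[MAXPGPATH];
} TempFileCleanup;

/* called when the context the callback was registered on is reset or deleted */
static void
temp_file_reset_callback(void *arg)
{
	TempFileCleanup *tfc = (TempFileCleanup *) arg;

	unlink(tfc->path);
}

static void
attach_temp_file_cleanup(MemoryContext cxt, const char *path)
{
	TempFileCleanup *tfc = (TempFileCleanup *)
		MemoryContextAlloc(cxt, sizeof(TempFileCleanup));

	strlcpy(tfc->path, path, sizeof(tfc->path));
	tfc->cb.func = temp_file_reset_callback;
	tfc->cb.arg = tfc;
	MemoryContextRegisterResetCallback(cxt, &tfc->cb);
}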

Even better would be a way to reproduce this without needing to build an
external extension with its own dependencies. Perhaps you can hack it into one
of the contrib/ modules?

Greetings,

Andres Freund




Re: Show various offset arrays for heap WAL records

2023-01-11 Thread Andres Freund
Hi,

On 2023-01-11 14:53:54 -0800, Peter Geoghegan wrote:
> On Tue, Jan 10, 2023 at 11:35 AM Andres Freund  wrote:
> > Nontrivial, I'm afraid. We don't pass any relevant parameters to rm_desc:
> > void(*rm_desc) (StringInfo buf, XLogReaderState 
> > *record);
> >
> > so we'd need to patch all of them. That might be worth doing at some point,
> > but I don't want to tackle it right now.
> 
> Okay. Let's just get the basics in soon, then.

> I would like to have a similar capability for index access methods,
> but mostly just for investigating performance. Whenever we've really
> needed something like this for debugging it seems to have been a
> heapam thing, just because there's a lot more that can go wrong with
> pruning, which is spread across many different places.

What are your thoughts about the place for the helper functions? You're ok
with rmgrdesc_utils.[ch]?

Greetings,

Andres Freund




Re: Generate pg_stat_get_xact*() functions with Macros

2023-01-11 Thread Andres Freund
Hi,

Michael, CCing you because of the point about PG_STAT_GET_DBENTRY_FLOAT8
below.


On 2023-01-05 14:48:39 +0100, Drouvot, Bertrand wrote:
> While at it, I also took the opportunity to create the macros for 
> pg_stat_get_xact_function_total_time(),
> pg_stat_get_xact_function_self_time() and pg_stat_get_function_total_time(), 
> pg_stat_get_function_self_time()
> (even if the same code pattern is only repeated two times)

I'd split that up into a separate commit.


> Now that this patch renames some fields

I don't mind renaming the fields - the prefixes really don't provide anything
useful. But it's not clear why this is related to this patch? You could just
include the f_ prefix in the macro, no?


> , I think that, for consistency, those ones should be renamed too (aka remove 
> the f_ and t_ prefixes):
> 
> PgStat_FunctionCounts.f_numcalls
> PgStat_StatFuncEntry.f_numcalls
> PgStat_TableCounts.t_truncdropped
> PgStat_TableCounts.t_delta_live_tuples
> PgStat_TableCounts.t_delta_dead_tuples
> PgStat_TableCounts.t_changed_tuples

Yea, without that the result is profoundly weird.


> But I think it would be better to do it in a follow-up patch (once this one 
> get committed).

I don't mind committing it in a separate commit, but I think it should be done
at least immediately following the other commit. I.e. developed together.

I probably would go the other way, and rename all of them first. That'd make
this commit a lot more focused, and the renaming commit purely mechanical.

Probably should remove PgStat_BackendFunctionEntry. PgStat_TableStatus has
reason to exist, but that's not true for PgStat_BackendFunctionEntry.



> @@ -168,19 +168,19 @@ pgstat_end_function_usage(PgStat_FunctionCallUsage 
> *fcu, bool finalize)
>   INSTR_TIME_ADD(total_func_time, f_self);
>  
>   /*
> -  * Compute the new f_total_time as the total elapsed time added to the
> -  * pre-call value of f_total_time.  This is necessary to avoid
> +  * Compute the new total_time as the total elapsed time added to the
> +  * pre-call value of total_time.  This is necessary to avoid
>* double-counting any time taken by recursive calls of myself.  (We do
>* not need any similar kluge for self time, since that already excludes
>* any recursive calls.)
>*/
> - INSTR_TIME_ADD(f_total, fcu->save_f_total_time);
> + INSTR_TIME_ADD(f_total, fcu->save_total_time);
>  
>   /* update counters in function stats table */
>   if (finalize)
>   fs->f_numcalls++;
> - fs->f_total_time = f_total;
> - INSTR_TIME_ADD(fs->f_self_time, f_self);
> + fs->total_time = f_total;
> + INSTR_TIME_ADD(fs->self_time, f_self);
>  }

I'd also rename f_self etc.


> @@ -148,29 +148,24 @@ pg_stat_get_function_calls(PG_FUNCTION_ARGS)
>   PG_RETURN_INT64(funcentry->f_numcalls);
>  }
>  
> -Datum
> -pg_stat_get_function_total_time(PG_FUNCTION_ARGS)
> -{
> - Oid funcid = PG_GETARG_OID(0);
> - PgStat_StatFuncEntry *funcentry;
> -
> - if ((funcentry = pgstat_fetch_stat_funcentry(funcid)) == NULL)
> - PG_RETURN_NULL();
> - /* convert counter from microsec to millisec for display */
> - PG_RETURN_FLOAT8(((double) funcentry->f_total_time) / 1000.0);
> +#define PG_STAT_GET_FUNCENTRY_FLOAT8(stat)                            \
> +Datum                                                                 \
> +CppConcat(pg_stat_get_function_,stat)(PG_FUNCTION_ARGS)               \
> +{                                                                     \
> +	Oid			funcid = PG_GETARG_OID(0);                              \
> +	PgStat_StatFuncEntry *funcentry;                                    \
> +                                                                      \
> +	if ((funcentry = pgstat_fetch_stat_funcentry(funcid)) == NULL)     \
> +		PG_RETURN_NULL();                                               \
> +	/* convert counter from microsec to millisec for display */        \
> +	PG_RETURN_FLOAT8(((double) funcentry->stat) / 1000.0);             \
>  }

Hm. Given the conversion with / 1000, is PG_STAT_GET_FUNCENTRY_FLOAT8 an
accurate name? Maybe PG_STAT_GET_FUNCENTRY_FLOAT8_MS?

I now see that PG_STAT_GET_DBENTRY_FLOAT8 already exists, defined the same
way. But the name feels misleading enough that I'd be inclined to rename it?


> +#define PG_STAT_GET_XACT_WITH_SUBTRANS_RELENTRY_INT64(stat)  
> \

How about PG_STAT_GET_XACT_PLUS_SUBTRANS_INT64?


Although I suspect this actually hints at an 

Re: Show various offset arrays for heap WAL records

2023-01-11 Thread Peter Geoghegan
On Tue, Jan 10, 2023 at 11:35 AM Andres Freund  wrote:
> Nontrivial, I'm afraid. We don't pass any relevant parameters to rm_desc:
> void(*rm_desc) (StringInfo buf, XLogReaderState *record);
>
> so we'd need to patch all of them. That might be worth doing at some point,
> but I don't want to tackle it right now.

Okay. Let's just get the basics in soon, then.

I would like to have a similar capability for index access methods,
but mostly just for investigating performance. Whenever we've really
needed something like this for debugging it seems to have been a
heapam thing, just because there's a lot more that can go wrong with
pruning, which is spread across many different places.

-- 
Peter Geoghegan




Re: Add SHELL_EXIT_CODE to psql

2023-01-11 Thread Corey Huinker
>
>
>
> The patch does not apply on top of HEAD as in [1], please post a rebased
> patch:
>
>
Conflict was due to the doc patch applying id tags to psql variable names.
I've rebased and added my own id tags to the two new variables.
From 9e2827a6f955e7cebf87ca538fab113a359951b4 Mon Sep 17 00:00:00 2001
From: coreyhuinker 
Date: Wed, 11 Jan 2023 17:27:23 -0500
Subject: [PATCH v5 1/2] Add PG_OS_TARGET environment variable to enable
 OS-specific changes to regression tests.

---
 src/test/regress/pg_regress.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 6cd5998b9d..bf0ada4560 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -594,6 +594,17 @@ initialize_environment(void)
 #endif
 	}
 
+		/*
+		 * Regression tests that invoke OS commands must occasionally use
+		 * OS-specific syntax (example: NUL vs /dev/null). This env var allows
+		 * those tests to adjust commands accordingly.
+		 */
+#if defined(WIN32)
+		setenv("PG_OS_TARGET", "WIN32", 1);
+#else
+		setenv("PG_OS_TARGET", "OTHER", 1);
+#endif
+
 	/*
 	 * Set translation-related settings to English; otherwise psql will
 	 * produce translated messages and produce diffs.  (XXX If we ever support
-- 
2.39.0

From 4ed7ef0a21a541277250641c63a7bceea4c1ef62 Mon Sep 17 00:00:00 2001
From: coreyhuinker 
Date: Wed, 11 Jan 2023 17:28:17 -0500
Subject: [PATCH v5 2/2] Add psql variables SHELL_ERROR and SHELL_EXIT_CODE
 which report the success or failure of the most recent psql OS-command
 executed via the \! command or `backticks`. These variables are roughly
 analogous to ERROR and SQLSTATE, but for OS level operations instead of SQL
 commands.

---
 doc/src/sgml/ref/psql-ref.sgml | 21 +
 src/bin/psql/command.c |  8 +++
 src/bin/psql/help.c|  4 
 src/bin/psql/psqlscanslash.l   | 38 ++
 src/test/regress/expected/psql.out | 25 
 src/test/regress/sql/psql.sql  | 21 +
 6 files changed, 112 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index dc6528dc11..3d8a7353b3 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -4264,6 +4264,27 @@ bar
 
   
 
+  
+   SHELL_ERROR
+   
+
+ true if the last shell command failed, false if
+ it succeeded. This applies to shell commands invoked via either \!
+ or `. See also SHELL_EXIT_CODE.
+
+   
+  
+
+  
+   SHELL_EXIT_CODE
+   
+
+ The exit code returned by the last shell command, invoked via either \! or `.
+ 0 means no error. See also SHELL_ERROR.
+
+   
+  
+
   
 SHOW_ALL_RESULTS
 
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index b5201edf55..5d15c2143b 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -5005,6 +5005,7 @@ static bool
 do_shell(const char *command)
 {
 	int			result;
+	char		exit_code_buf[32];
 
 	fflush(NULL);
 	if (!command)
@@ -5032,6 +5033,13 @@ do_shell(const char *command)
 	else
 		result = system(command);
 
+	snprintf(exit_code_buf, sizeof(exit_code_buf), "%d", WEXITSTATUS(result));
+	SetVariable(pset.vars, "SHELL_EXIT_CODE", exit_code_buf);
+	if (result == 0)
+		SetVariable(pset.vars, "SHELL_ERROR", "false");
+	else
+		SetVariable(pset.vars, "SHELL_ERROR", "true");
+
 	if (result == 127 || result == -1)
 	{
 		pg_log_error("\\!: failed");
diff --git a/src/bin/psql/help.c b/src/bin/psql/help.c
index e45c4aaca5..6d5226f793 100644
--- a/src/bin/psql/help.c
+++ b/src/bin/psql/help.c
@@ -455,6 +455,10 @@ helpVariables(unsigned short int pager)
 		  "show all results of a combined query (\\;) instead of only the last\n");
 	HELP0("  SHOW_CONTEXT\n"
 		  "controls display of message context fields [never, errors, always]\n");
+	HELP0("  SHELL_ERROR\n"
+		  "true if last shell command failed, else false\n");
+	HELP0("  SHELL_EXIT_CODE\n"
+		  "Exit code of the last shell command\n");
 	HELP0("  SINGLELINE\n"
 		  "if set, end of line terminates SQL commands (same as -S option)\n");
 	HELP0("  SINGLESTEP\n"
diff --git a/src/bin/psql/psqlscanslash.l b/src/bin/psql/psqlscanslash.l
index 8449ee1a52..f6f03bd816 100644
--- a/src/bin/psql/psqlscanslash.l
+++ b/src/bin/psql/psqlscanslash.l
@@ -23,6 +23,9 @@
 #include "fe_utils/conditional.h"
 
 #include "libpq-fe.h"
+#include "settings.h"
+#include "variables.h"
+#include 
 }
 
 %{
@@ -774,6 +777,8 @@ evaluate_backtick(PsqlScanState state)
 	bool		error = false;
 	char		buf[512];
 	size_t		result;
+	int			exit_code = 0;
+	char		exit_code_buf[32];
 
 	initPQExpBuffer(&cmd_output);
 
@@ -783,6 +788,7 @@ evaluate_backtick(PsqlScanState state)
 	{
 		pg_log_error("%s: %m", cmd);
 		error = true;
+		exit_code = -1;
 	}
 
 	if (!error)
@@ -790,7 

No Callbacks on FATAL

2023-01-11 Thread Ed Behn
I'm developing a module that implements Haskell as a procedural language (
https://www.postgresql.org/about/news/plhaskell-v10-released-2519/)

I'm using a callback function that is called when a memory context is
deleted to remove a temporary file. This works fine when the transaction
ends normally or raises an ERROR. However, when a FATAL event happens, the
callback is not run. Is this a bug or intended behaviour? I think that this
is a new behavior and that the callback was called in an earlier version
(perhaps v14) when I was originally developing this code. I'm running
v15.1.

It seems to me that callbacks should be run in the event of a FATAL event
in order to clean up any lingering issues.
-Ed


Re: pgsql: Delay commit status checks until freezing executes.

2023-01-11 Thread Peter Geoghegan
On Sat, Jan 7, 2023 at 7:25 PM Andres Freund  wrote:
> Probably a good idea, although it doesn't neatly fit right now.

I'll leave it for now.

Attached is v2, which changes things based on your feedback. Would
like to get this out of the way soon.

> > Also, does amcheck's get_xid_status() need a reference to these rules?
>
> Don't think so? What made you ask?

Just the fact that it seems to more or less follow the protocol
described at the top of heapam_visibility.c. Not very important,
though.

-- 
Peter Geoghegan


v2-0001-Improve-TransactionIdDidAbort-comments.patch
Description: Binary data


Re: [PATCH] psql: Add tab-complete for optional view parameters

2023-01-11 Thread Thomas Munro
On Thu, Jan 12, 2023 at 5:50 AM vignesh C  wrote:
> For some reason CFBot is not able to apply the patch, please have a
> look and post an updated version if required, also check and handle
> Dean Rasheed and Jim Jones  comment if applicable:
> === Applying patches on top of PostgreSQL commit ID
> 5f6401f81cb24bd3930e0dc589fc4aa8b5424cdc ===
> === applying patch
> ./v2-0001-psql-Add-tab-complete-for-optional-view-parameter.patch
> gpatch:  Only garbage was found in the patch input.

Melanie pointed me at this issue.  This particular entry is now fixed,
and I think I know what happened:  cfbot wasn't checking the HTTP
status when downloading patches from the web archives, because I had
incorrectly assumed Python's requests.get() would raise an exception
if the web server sent an error status, but it turns out you have to
ask for that.  I've now fixed that.  So I think it was probably trying
to apply one of those "guru meditation" error messages the web archives
occasionally spit out.




Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

2023-01-11 Thread Magnus Hagander
On Wed, Jan 11, 2023 at 8:06 PM Jacob Champion 
wrote:

> On Wed, Jan 11, 2023 at 10:23 AM Magnus Hagander 
> wrote:
> > Sorry to jump in (very) late in this game. So first, I like this general
> approach :)
>
> Thanks!
>
> > It feels icky to have to add configure tests just to make a test work.
> But I guess there isn't really a way around that if we want to test the
> full thing.
>
> I agree...
>
> > However, shouldn't we be using X509_get_default_cert_file_env() to get
> the name of the env? Granted it's  unlikely to be anything else, but if
> it's an API you're supposed to use. (In an ideal world that function would
> not return anything in LibreSSL but I think it does include something, and
> then just ignores it?)
>
> I think you're right, but before I do that, is the cure better than
> the disease? It seems like that would further complicate a part of the
> Perl tests that is already unnecessarily complicated. (The Postgres
> code doesn't use the envvar at all.) Unless you already know of an
> OpenSSL-alike that doesn't use that same envvar name?
>

Fair point. No, I have not run into one, I just recalled having seen the
API :)

And you're right -- I didn't consider that we were looking at that one in
the *perl* code, not the C code. In the C code it would've been a trivial
replacement. In the perl, I agree it's not worth it -- at least not until
we run into a platform where it *would' matter.
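
For reference, the C-side use of that API is about as simple as it gets - a
sketch:

#include <stdlib.h>

#include <openssl/x509.h>

/* Resolve the default certificate file the way OpenSSL itself does. */
static const char *
default_cert_file(void)
{
	const char *path = getenv(X509_get_default_cert_file_env());

	return (path && path[0] != '\0') ? path : X509_get_default_cert_file();
}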

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

2023-01-11 Thread Justin Pryzby
> Subject: [PATCH v45 4/5] Add system view tracking IO ops per backend type

The patch can/will fail with:

CREATE TABLESPACE test_io_shared_stats_tblspc LOCATION '';
+WARNING:  tablespaces created by regression test cases should have names 
starting with "regress_"

CREATE TABLESPACE test_stats LOCATION '';
+WARNING:  tablespaces created by regression test cases should have names 
starting with "regress_"

(I already sent patches to address the omission in cirrus.yml)

1760 :  errhint("Target must be \"archiver\", 
\"io\", \"bgwriter\", \"recovery_prefetch\", or \"wal\".")));
=> Do you want to put these in order?

pgstat_get_io_op_name() isn't currently being hit by tests; actually,
it's completely unused.

FlushRelationBuffers() isn't being hit for local buffers.

> +  
> pg_stat_iopg_stat_io
> +  
> +   One row per backend type, context, target object combination showing
> +   cluster-wide I/O statistics.

I suggest: "One row for each combination of of .."

> +   The pg_stat_io and
> +   pg_statio_ set of views are especially useful for
> +   determining the effectiveness of the buffer cache.  When the number of 
> actual
> +   disk reads is much smaller than the number of buffer hits, then the cache 
> is
> +   satisfying most read requests without invoking a kernel call.

I would change this to say "Postgres' own buffer cache is satisfying ..."

> However, these
> +   statistics do not give the entire story: due to the way in which
> +   PostgreSQL handles disk I/O, data that is not 
> in
> +   the PostgreSQL buffer cache might still reside 
> in
> +   the kernel's I/O cache, and might therefore still be fetched without

I suggest to refer to "the kernel's page cache"

> +   The pg_stat_io view will contain one row for each
> +   backend type, I/O context, and target I/O object combination showing
> +   cluster-wide I/O statistics. Combinations which do not make sense are
> +   omitted.

"..for each combination of .."

> +  io_context for a type of I/O operation. For

"for I/O operations"

> +  vacuum: I/O operations done outside of shared
> +  buffers incurred while vacuuming and analyzing permanent relations.

s/incurred/performed/

> +  bulkread: Qualifying large read I/O operations
> +  done outside of shared buffers, for example, a sequential scan of a
> +  large table.

I don't think it's correct to say that it's "outside of" shared-buffers.
s/Qualifying/Certain/

> +  bulkwrite: Qualifying large write I/O operations
> +  done outside of shared buffers, such as COPY.

Same

> +Target object of an I/O operation. Possible values are:
> +   
> +
> + 
> +  relation: This includes permanent relations.

It says "includes permanent" but what seems to mean is that it
"exclusive of temporary relations".

> + 
> +  
> +   
> +read bigint
> +   
> +   
> +Number of read operations in units of op_bytes.

This looks too much like it means "bytes".
Should say: "in number of blocks of size >op_bytes<"

But wait - is it the number of read operations "in units of op_bytes"
(which would mean this is already multiplied by op_bytes, and is in units
of bytes)?

Or the "number of read operations" *of* op_bytes chunks?  Which would
mean this is a "pure" number, and could be multiplied by op_bytes to
obtain a size in bytes.

> +Number of write operations in units of op_bytes.

> +Number of relation extend operations in units of
> +op_bytes.

same

> +In io_context normal, this 
> counts
> +the number of times a block was evicted from a buffer and replaced 
> with
> +another block. In io_contexts
> +bulkwrite, bulkread, and
> +vacuum, this counts the number of times a block 
> was
> +evicted from shared buffers in order to add the shared buffer to a
> +separate size-limited ring buffer.

This never defines what "evicted" means.  Does it mean that a dirty
buffer was written out?

> +The number of times an existing buffer in a size-limited ring buffer
> +outside of shared buffers was reused as part of an I/O operation in 
> the
> +bulkread, bulkwrite, or
> +vacuum io_contexts.

Maybe say "as part of a bulk I/O operation (bulkread, bulkwrite, or
vacuum)."

> +  
> +   pg_stat_io can be used to inform database tuning.

> +   For example:
> +   
> +
> + 
> +  A high evicted count can indicate that shared 
> buffers
> +  should be increased.
> + 
> +
> +
> + 
> +  Client backends rely on the checkpointer to ensure data is persisted to
> +  permanent storage. Large numbers of files_synced by
> +  client backends could indicate a misconfiguration of
> +  shared buffers or of checkpointer. More information on checkpointer

of *the* checkpointer

> +  Normally, client backends should be able to 

Re: logical decoding and replication of sequences, take 2

2023-01-11 Thread Andres Freund
Hi,

On 2023-01-11 22:30:42 +0100, Tomas Vondra wrote:
> On 1/11/23 21:58, Andres Freund wrote:
> > If you're thinking of decoding changes in parallel (rather than streaming 
> > out
> > large changes before commit when possible), you'd only be able to do that in
> > cases when transaction haven't performed catalog changes, I think. In which
> > case there'd also be no issue wrt transactional sequence changes.
> > 
> 
> Perhaps, although it's not clear to me how you would know that in
> advance? I mean, you could start decoding changes in parallel, and then
> you find one of the earlier transactions touched a catalog.

You could have a running count of in-progress catalog modifying transactions
and not allow parallelized processing when that's not 0.
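
Something as simple as this would do (made-up sketch, maintenance of the
counter by the decoder implied):

static int	catalog_modifying_xact_count = 0;

static bool
parallel_decoding_allowed(void)
{
	return catalog_modifying_xact_count == 0;
}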


> But maybe I misunderstand what "decoding" refers to - don't we need the
> snapshot only in reorderbuffer? In which case all the other stuff could
> be parallelized (not sure if that's really expensive).

Calling output functions is pretty expensive, so being able to call those in
parallel has some benefits. But I don't think we're there.


> Anyway, all of this is far out of scope of this patch.

Yea, clearly that's independent work. And I don't think relying on commit
order in one more place, i.e. for sequences, would make it harder.

Greetings,

Andres Freund




Re: logical decoding and replication of sequences, take 2

2023-01-11 Thread Tomas Vondra



On 1/11/23 21:12, Andres Freund wrote:
> Hi,
> 
> 
> Heikki, CCed you due to the point about 2c03216d8311 below.
> 
> 
> On 2023-01-10 19:32:12 +0100, Tomas Vondra wrote:
>> 0001 is a fix for the pre-existing issue in logicalmsg_decode,
>> attempting to build a snapshot before getting into a consistent state.
>> AFAICS this only affects assert-enabled builds and is otherwise
>> harmless, because we are not actually using the snapshot (apply gets a
>> valid snapshot from the transaction).
> 
> LGTM.
> 
> 
>> 0002 is a rebased version of the original approach, committed as
>> 0da92dc530 (and then reverted in 2c7ea57e56). This includes the same fix
>> as 0001 (for the sequence messages), the primary reason for the revert.
>>
>> The rebase was not quite straightforward, due to extensive changes in
>> how publications deal with tables/schemas, and so on. So this adopts
>> them, but other than that it behaves just like the original patch.
> 
> This is a huge diff:
>>  72 files changed, 4715 insertions(+), 612 deletions(-)
> 
> It'd be nice to split it to make review easier. Perhaps the sequence decoding
> support could be split from the whole publication rigamarole?
> 
> 
>> This does not include any changes to test_decoding and/or the built-in
>> replication - those will be committed in separate patches.
> 
> Looks like that's not the case anymore?
> 

Ah, right!  Now I realized I originally committed this in chunks, but
the revert was a single commit. And I just "reverted the revert" to
create this patch.

I'll definitely split this into smaller patches. This also explains the
obsolete commit message about test_decoding not being included, etc.

> 
>> +/*
>> + * Update the sequence state by modifying the existing sequence data row.
>> + *
>> + * This keeps the same relfilenode, so the behavior is non-transactional.
>> + */
>> +static void
>> +SetSequence_non_transactional(Oid seqrelid, int64 last_value, int64 
>> log_cnt, bool is_called)
>> +{
>> +SeqTable	elm;
>> +Relation	seqrel;
>> +Buffer  buf;
>> +HeapTupleData seqdatatuple;
>> +Form_pg_sequence_data seq;
>> +
>> +/* open and lock sequence */
>> +init_sequence(seqrelid, &elm, &seqrel);
>> +
>> +/* lock page' buffer and read tuple */
>> +seq = read_seq_tuple(seqrel, &buf, &seqdatatuple);
>> +
>> +/* check the comment above nextval_internal()'s equivalent call. */
>> +if (RelationNeedsWAL(seqrel))
>> +{
>> +GetTopTransactionId();
>> +
>> +if (XLogLogicalInfoActive())
>> +GetCurrentTransactionId();
>> +}
>> +
>> +/* ready to change the on-disk (or really, in-buffer) tuple */
>> +START_CRIT_SECTION();
>> +
>> +seq->last_value = last_value;
>> +seq->is_called = is_called;
>> +seq->log_cnt = log_cnt;
>> +
>> +MarkBufferDirty(buf);
>> +
>> +/* XLOG stuff */
>> +if (RelationNeedsWAL(seqrel))
>> +{
>> +xl_seq_rec  xlrec;
>> +XLogRecPtr  recptr;
>> +Pagepage = BufferGetPage(buf);
>> +
>> +XLogBeginInsert();
>> +XLogRegisterBuffer(0, buf, REGBUF_WILL_INIT);
>> +
>> +xlrec.locator = seqrel->rd_locator;
>> +xlrec.created = false;
>> +
>> +XLogRegisterData((char *) &xlrec, sizeof(xl_seq_rec));
>> +XLogRegisterData((char *) seqdatatuple.t_data, 
>> seqdatatuple.t_len);
>> +
>> +recptr = XLogInsert(RM_SEQ_ID, XLOG_SEQ_LOG);
>> +
>> +PageSetLSN(page, recptr);
>> +}
>> +
>> +END_CRIT_SECTION();
>> +
>> +UnlockReleaseBuffer(buf);
>> +
>> +/* Clear local cache so that we don't think we have cached numbers */
>> +/* Note that we do not change the currval() state */
>> +elm->cached = elm->last;
>> +
>> +relation_close(seqrel, NoLock);
>> +}
>> +
>> +/*
>> + * Update the sequence state by creating a new relfilenode.
>> + *
>> + * This creates a new relfilenode, to allow transactional behavior.
>> + */
>> +static void
>> +SetSequence_transactional(Oid seq_relid, int64 last_value, int64 log_cnt, 
>> bool is_called)
>> +{
>> +SeqTable	elm;
>> +Relation	seqrel;
>> +Buffer  buf;
>> +HeapTupleData seqdatatuple;
>> +Form_pg_sequence_data seq;
>> +HeapTuple   tuple;
>> +
>> +/* open and lock sequence */
>> +init_sequence(seq_relid, &elm, &seqrel);
>> +
>> +/* lock page' buffer and read tuple */
>> +seq = read_seq_tuple(seqrel, &buf, &seqdatatuple);
>> +
>> +/* Copy the existing sequence tuple. */
>> +tuple = heap_copytuple(&seqdatatuple);
>> +
>> +/* Now we're done with the old page */
>> +UnlockReleaseBuffer(buf);
>> +
>> +/*
>> + * Modify the copied tuple to update the sequence state (similar to what
>> + * ResetSequence does).
>> + */
>> +seq = (Form_pg_sequence_data) GETSTRUCT(tuple);
>> +seq->last_value = last_value;
>> +seq->is_called = is_called;
>> +seq->log_cnt = log_cnt;
>> +
>> +/*
>> + * Create a 

Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

2023-01-11 Thread Andres Freund
Hi,

On 2023-01-11 16:18:34 -0500, Tom Lane wrote:
> Peter Geoghegan  writes:
> > On Wed, Jan 11, 2023 at 11:18 AM Andres Freund  wrote:
> >> I don't like that - it's also quite useful to disable use of ringbuffers 
> >> when
> >> you actually need to clean up indexes. Especially when we have a lot of 
> >> dead
> >> tuples we'll rescan indexes over and over...
> 
> > That's a fair point.
> 
> > My vote goes to "REUSE_BUFFERS", then.
> 
> I wonder whether it could make sense to allow a larger ringbuffer size,
> rather than just the limit cases of "on" and "off".

I can see that making sense, particularly if we were to later extend this to
other users of ringbuffers. E.g. COPY's use of the ringbuffer makes loading of
data > 16MB but also << s_b vastly slower, but it can still be very important
to use if there's lots of parallel processes loading data.

Maybe BUFFER_USAGE_LIMIT, with a value from -1 to N, with -1 indicating the
default value, 0 preventing use of a buffer access strategy, and 1..N
indicating the size in blocks?

Would we want to set an upper limit lower than implied by the memory limit for
the BufferAccessStrategy allocation?
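
To sketch what that could translate to (GetAccessStrategyWithSize() is
hypothetical here; today only the fixed-size GetAccessStrategy() exists):

#include "postgres.h"
#include "storage/bufmgr.h"

static BufferAccessStrategy
vacuum_buffer_strategy(int buffer_usage_limit)
{
	if (buffer_usage_limit == 0)
		return NULL;			/* no strategy: compete for all of s_b */

	if (buffer_usage_limit < 0)
		return GetAccessStrategy(BAS_VACUUM);	/* built-in default size */

	/* hypothetical sized variant, ring size given in blocks */
	return GetAccessStrategyWithSize(BAS_VACUUM, buffer_usage_limit);
}

The memory for the strategy itself is tiny either way; the limit question
above is about how large a ring is still useful.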

Greetings,

Andres Freund




Re: logical decoding and replication of sequences, take 2

2023-01-11 Thread Tomas Vondra



On 1/11/23 21:58, Andres Freund wrote:
> Hi,
> 
> On 2023-01-11 15:41:45 -0500, Robert Haas wrote:
>> I wonder, then, what happens if somebody wants to do parallel apply.  That
>> would seem to require some relaxation of this rule, but then doesn't that
>> break what this patch wants to do?
> 
> I don't think it'd pose a direct problem - presumably you'd only parallelize
> applying changes, not committing the transactions containing them. You'd get a
> lot of inconsistencies otherwise.
> 

Right. It's the commit order that matters - as long as that's
maintained, the result should be consistent etc.

There are plenty of other hard problems, though - for example it's trivial
for the apply workers to apply the changes in the incorrect order
(contradicting commit order) and end up in a deadlock. And the deadlock
detector may easily keep aborting the incorrect worker (the oldest one),
so that the replication grinds to a halt.

I was wondering recently how far we would get by just doing prefetch for
logical apply - instead of applying the changes, just try doing a lookup
on the replica identity values, and then simple serial apply.

> If you're thinking of decoding changes in parallel (rather than streaming out
> large changes before commit when possible), you'd only be able to do that in
> cases when transaction haven't performed catalog changes, I think. In which
> case there'd also be no issue wrt transactional sequence changes.
> 

Perhaps, although it's not clear to me how you would know that in
advance? I mean, you could start decoding changes in parallel, and then
you find one of the earlier transactions touched a catalog.

But maybe I misunderstand what "decoding" refers to - don't we need the
snapshot only in reorderbuffer? In which case all the other stuff could
be parallelized (not sure if that's really expensive).

Anyway, all of this is far out of scope of this patch.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

2023-01-11 Thread Tom Lane
Peter Geoghegan  writes:
> On Wed, Jan 11, 2023 at 11:18 AM Andres Freund  wrote:
>> I don't like that - it's also quite useful to disable use of ringbuffers when
>> you actually need to clean up indexes. Especially when we have a lot of dead
>> tuples we'll rescan indexes over and over...

> That's a fair point.

> My vote goes to "REUSE_BUFFERS", then.

I wonder whether it could make sense to allow a larger ringbuffer size,
rather than just the limit cases of "on" and "off".

regards, tom lane




Re: pgsql: Add new GUC createrole_self_grant.

2023-01-11 Thread Robert Haas
On Wed, Jan 11, 2023 at 4:00 PM Tom Lane  wrote:
> Robert Haas  writes:
> > If you want to make safe a SECURITY DEFINER function written using sql
> > or plpgsql, you either have to schema-qualify every single reference
> > or, more realistically, attach a SET clause to the function to set the
> > search_path to a sane value during the time that the function is
> > executing. The problem here can be handled the same way, except that
> > it's needed in a vastly more limited set of circumstances: you have to
> > be calling a SECURITY DEFINER function that will execute CREATE ROLE
> > as a non-superuser (and that user then needs to be sensitive to the
> > value of this GUC in some security-relevant way). It might be good to
> > document this -- I just noticed that the CREATE FUNCTION page has a
> > section on "Writing SECURITY DEFINER Functions Safely" which talks
> > about dealing with the search_path issues, and it seems like it would
> > be worth adding a sentence or two there to talk about this.
>
> OK, I'd be satisfied with that.

OK, I'll draft a patch tomorrow.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: doc: mentioned CREATE+ATTACH PARTITION as an alternative to CREATE TABLE..PARTITION OF

2023-01-11 Thread Robert Haas
On Wed, Jan 11, 2023 at 10:48 AM Robert Treat  wrote:
> > @Robert: I wonder why shouldn't CREATE..PARTITION OF *also* be patched
> > to first create a table, and then attach the partition, transparently
> > doing what everyone would want, without having to re-read the updated
> > docs or know to issue two commands?  I wrote a patch for this which
> > "doesn't fail tests", but I still wonder if I'm missing something..
> >
>
> I was thinking there might be either lock escalation issues or perhaps
> issues around index attachment that don't surface using create
> partition of, but I don't actually see any, in which case that does
> seem like a better change all around. But like you, I feel I must be
> overlooking something :-)

To be honest, I'm not sure whether either of you are missing anything
or not. I think a major reason why I didn't implement this was that
it's a different code path. DefineRelation() has code to do a bunch of
things that are also done by ATExecAttachPartition(), and I haven't
gone through exhaustively and checked whether there are any relevant
differences. I think that part of the reason that I did not research
that at the time is that the patch was incredibly complicated to get
working at all and I didn't want to take any risk of adding things to
it that might create more problems. Now that it's been a few years, we
might feel more confident.

Another thing that probably deserves at least a bit of thought is the
fact that ATTACH PARTITION just attaches a partition, whereas CREATE
TABLE does a lot more things. Are any of those things potential
hazards? Like what if the newly-created table references the parent
via a foreign key, or uses the parent's row type as a column type or
as part of a column default expression or in a CHECK constraint or
something? Basically, try to think of weird scenarios where the new
table would interact with the parent in some weird way where the
weaker lock would be a problem. Maybe there's nothing to see here: not
sure.

Also, we need to separately analyze the cases where (1) the new
partition is the default partition, (2) the new partition is not the
default partition but a default partition exists, and (3) the new
partition is not the default partition and no default partition
exists.

Sorry not to have more definite thoughts here. I know that when I
developed the original patch, I thought about this case and decided my
brain was full. However, I do not recall whether I knew about any
specific problems that needed to be fixed, or just feared that there
might be some.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Remove source code display from \df+?

2023-01-11 Thread Tom Lane
Pavel Stehule  writes:
> On Wed, Jan 11, 2023 at 19:31, Magnus Hagander  wrote:
>> This is only about Internal and C, isn't it? Isn't the oid of these
>> static, and identified by INTERNALlanguageId and ClanguageId respectively?
>> So we could just have the query show the prosrc column if the language oid
>> is one of those two, and otherwise show "Please use \sf to view the
>> source"?

> I think it can be acceptable - maybe we can rename the column "source code"
> to something like "internal name".

Yeah, "source code" has always been kind of a misnomer for these
languages.

> again, I don't think printing "Please use \sf to view the source" often
> can be user friendly.  \? is clear and \sf is easy to use

We could shorten it to "See \sf" or something like that.  But if we change
the column header to "internal name" or the like, then the column just
obviously doesn't apply for non-internal languages, so leaving it null
should be fine.

regards, tom lane




Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

2023-01-11 Thread Andres Freund
Hi,

On 2023-01-11 14:38:34 -0600, Justin Pryzby wrote:
> On Wed, Jan 11, 2023 at 10:58:54AM -0800, Andres Freund wrote:
> > Some ideas:
> > 
> > USE_RING_BUFFERS on|off
> > REUSE_BUFFERS on|off
> 
> +1 for either of these.

Then I'd go for REUSE_BUFFERS. What made you prefer it over
LIMIT_BUFFER_USAGE?

USE_BUFFER_ACCESS_STRATEGY would be a name tied to the implementation that's
not awful, I think..


> I don't think it's an issue to expose implementation details here.
> Anyone who wants to change this will know about the implementation
> details that they're changing, and it's better to refer to it by the
> same/similar name and not by some other name that's hard to find.

A ringbuffer could refer to a lot of things other than something limiting
buffer usage, that's why I don't like it.


> BTW I can't see that the ring buffer is currently exposed in any
> user-facing docs for COPY/ALTER/VACUUM/CREATE ?

Yea, there's surprisingly little in the docs about it, given how much it
influences behaviour. It's mentioned in tablesample-method.sgml, but without
explanation - and it's a page documenting C API...

Greetings,

Andres Freund




Re: Can we let extensions change their dumped catalog schemas?

2023-01-11 Thread Tom Lane
Jacob Champion  writes:
> On Tue, Jan 10, 2023 at 7:53 PM Tom Lane  wrote:
>> I also fear that it will break
>> a bunch of use-cases that work fine today, which are exactly the
>> ones for which we originally defined pg_dump as *not* committing
>> to a particular extension version.

> Right, I think it would have to be opt-in. Say, a new control file
> option dump_version or some such.

That would require all the installed extensions to cope with this
the same way, which does not seem like a great assumption.

regards, tom lane




Re: pgsql: Add new GUC createrole_self_grant.

2023-01-11 Thread Tom Lane
Robert Haas  writes:
> If you want to make safe a SECURITY DEFINER function written using sql
> or plpgsql, you either have to schema-qualify every single reference
> or, more realistically, attach a SET clause to the function to set the
> search_path to a sane value during the time that the function is
> executing. The problem here can be handled the same way, except that
> it's needed in a vastly more limited set of circumstances: you have to
> be calling a SECURITY DEFINER function that will execute CREATE ROLE
> as a non-superuser (and that user then needs to be sensitive to the
> value of this GUC in some security-relevant way). It might be good to
> document this -- I just noticed that the CREATE FUNCTION page has a
> section on "Writing SECURITY DEFINER Functions Safely" which talks
> about dealing with the search_path issues, and it seems like it would
> be worth adding a sentence or two there to talk about this.

OK, I'd be satisfied with that.

regards, tom lane




Re: ATTACH PARTITION seems to ignore column generation status

2023-01-11 Thread Tom Lane
Amit Langote  writes:
> I've updated your disallow-generated-child-columns-2.patch to do this,
> and have also merged the delta post that I had attached with my last
> email, whose contents you sound to agree with.

Pushed with some further work to improve the handling of multiple-
inheritance cases.  We still need to insist that all or none of the
parent columns are generated, but we don't have to require their
generation expressions to be alike: that can be resolved by letting
the child table override the expression, much as we've long done for
plain default expressions.  (This did need some work in pg_dump
after all.)  I'm pretty happy with where this turned out.

regards, tom lane




Re: logical decoding and replication of sequences, take 2

2023-01-11 Thread Andres Freund
Hi,

On 2023-01-11 15:41:45 -0500, Robert Haas wrote:
> I wonder, then, what happens if somebody wants to do parallel apply.  That
> would seem to require some relaxation of this rule, but then doesn't that
> break what this patch wants to do?

I don't think it'd pose a direct problem - presumably you'd only parallelize
applying changes, not committing the transactions containing them. You'd get a
lot of inconsistencies otherwise.

If you're thinking of decoding changes in parallel (rather than streaming out
large changes before commit when possible), you'd only be able to do that in
cases when transaction haven't performed catalog changes, I think. In which
case there'd also be no issue wrt transactional sequence changes.

Greetings,

Andres Freund




Re: low wal_retrieve_retry_interval causes missed signals on Windows

2023-01-11 Thread Andres Freund
Hi,

On 2023-01-10 22:11:16 -0800, Nathan Bossart wrote:
> The attached patch fixes this by always calling WaitLatch(), even if
> wal_retrieve_retry_interval milliseconds have already elapsed and the
> timeout is 0.

It doesn't seem right to call WaitLatch() just for that purpose - nor
necessarily a complete fix.

Given that we check for interrupts in other parts of recovery with
HandleStartupProcInterrupt(), which doesn't interact with latches, isn't the
actual bug that HandleStartupProcInterrupt() doesn't contain the same black
magic that CHECK_FOR_INTERRUPTS() contains on windows?  Namely this stuff:


#ifndef WIN32
...
#else
#define INTERRUPTS_PENDING_CONDITION() \
(unlikely(UNBLOCKED_SIGNAL_QUEUE()) ? pgwin32_dispatch_queued_signals() : 0, \
 unlikely(InterruptPending))
#endif

/* Service interrupt, if one is pending and it's safe to service it now */
#define CHECK_FOR_INTERRUPTS() \
do { \
if (INTERRUPTS_PENDING_CONDITION()) \
ProcessInterrupts(); \
} while(0)


Looks like we have that bug in quite a few places... Some are "protected" by
unconditional WaitLatch() calls, but at least pgarch.c, checkpointer.c via
CheckpointWriteDelay() seem borked.


Greetings,

Andres Freund




Re: logical decoding and replication of sequences, take 2

2023-01-11 Thread Robert Haas
On Wed, Jan 11, 2023 at 3:28 PM Andres Freund  wrote:
> On 2023-01-11 15:23:18 -0500, Robert Haas wrote:
> > Yeah, I meant if #1 had committed and then #2 started to do its thing.
> > I was worried that decoding might reach the nextval operations in
> > transaction #2 before it replayed #1.
> >
> > This worry may be entirely based on me not understanding how this
> > actually works. Do we always apply a transaction as soon as we see the
> > commit record for it, before decoding any further?
>
> Yes.
>
> Otherwise we'd have a really hard time figuring out the correct historical
> snapshot to use for subsequent transactions - they'd have been able to see the
> catalog modifications made by the committing transaction.

I wonder, then, what happens if somebody wants to do parallel apply.
That would seem to require some relaxation of this rule, but then
doesn't that break what this patch wants to do?

-- 
Robert Haas
EDB: http://www.enterprisedb.com



