Add progress reporting to pg_verifybackup

2023-01-05 Thread Masahiko Sawada
Hi all,

I've attached a simple patch to add a progress reporting option to
pg_verifybackup. With the --progress option, progress information is
displayed only during checksum verification, which is the most
time-consuming task. It cannot be used together with the --quiet option.

Feedback is very welcome.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


v1-0001-Add-progress-reporting-to-pg_verifybackup.patch
Description: Binary data


Re: Perform streaming logical transactions by background workers and parallel apply

2023-01-05 Thread Dilip Kumar
On Fri, Jan 6, 2023 at 12:59 PM Dilip Kumar  wrote:
>
> > > 3.
> > > +else if (shmq_res == SHM_MQ_WOULD_BLOCK)
> > > +{
> > > +/* Replay the changes from the file, if any. */
> > > +if (pa_has_spooled_message_pending())
> > > +{
> > > +pa_spooled_messages();
> > > +}
> > >
> > > I think we do not need this pa_has_spooled_message_pending() function.
> > > Because this function is just calling pa_get_fileset_state() which is
> > > acquiring mutex and getting filestate then if the filestate is not
> > > FS_EMPTY then we call pa_spooled_messages() that will again call
> > > pa_get_fileset_state() which will again acquire mutex.  I think when
> > > the state is FS_SERIALIZE_IN_PROGRESS it will frequently call
> > > pa_get_fileset_state() consecutively 2 times, and I think we can
> > > easily achieve the same behavior with just one call.
> > >
> >
> > This is just to keep the code easy to follow. As this would be a rare
> > case, so thought of giving preference to code clarity.
>
> I think the code will be simpler with just one function no? I mean
> instead of calling pa_has_spooled_message_pending() in if condition
> what if we directly call pa_spooled_messages();, this is anyway
> fetching the file_state and if the filestate is EMPTY then it can
> return false, and if it returns false we can execute the code which is
> there in else condition.  We might need to change the name of the
> function though.
>
But anyway, it is not a performance-critical path, so if you think the
current way looks cleaner then I am fine with that too.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Perform streaming logical transactions by background workers and parallel apply

2023-01-05 Thread Dilip Kumar
On Fri, Jan 6, 2023 at 12:05 PM Amit Kapila  wrote:
>

>
> Yeah, I also don't think sending extra eight bytes with stream_start
> message is worth it. But it is fine to mention the same in the
> comments.

Right.

> > 2.
> >
> > + * XXX Additionally, we also stop the worker if the leader apply worker
> > + * serialize part of the transaction data due to a send timeout. This 
> > is
> > + * because the message could be partially written to the queue and 
> > there
> > + * is no way to clean the queue other than resending the message until 
> > it
> > + * succeeds. Instead of trying to send the data which anyway would have
> > + * been serialized and then letting the parallel apply worker deal with
> > + * the spurious message, we stop the worker.
> > + */
> > +if (winfo->serialize_changes ||
> > +list_length(ParallelApplyWorkerPool) >
> > +(max_parallel_apply_workers_per_subscription / 2))
> >
> > IMHO this reason (XXX Additionally, we also stop the worker if the
> > leader apply worker serialize part of the transaction data due to a
> > send timeout) for stopping the worker looks a bit hackish to me.  It
> > may be a rare case so I am not talking about the performance but the
> > reasoning behind stopping is not good. Ideally we should be able to
> > clean up the message queue and reuse the worker.
> >
>
> TBH, I don't know what is the better way to deal with this with the
> current infrastructure. I thought we can do this as a separate
> enhancement in the future.

Okay.

> > 3.
> > +else if (shmq_res == SHM_MQ_WOULD_BLOCK)
> > +{
> > +/* Replay the changes from the file, if any. */
> > +if (pa_has_spooled_message_pending())
> > +{
> > +pa_spooled_messages();
> > +}
> >
> > I think we do not need this pa_has_spooled_message_pending() function.
> > Because this function is just calling pa_get_fileset_state() which is
> > acquiring mutex and getting filestate then if the filestate is not
> > FS_EMPTY then we call pa_spooled_messages() that will again call
> > pa_get_fileset_state() which will again acquire mutex.  I think when
> > the state is FS_SERIALIZE_IN_PROGRESS it will frequently call
> > pa_get_fileset_state() consecutively 2 times, and I think we can
> > easily achieve the same behavior with just one call.
> >
>
> This is just to keep the code easy to follow. As this would be a rare
> case, so thought of giving preference to code clarity.

I think the code will be simpler with just one function, no? I mean,
instead of calling pa_has_spooled_message_pending() in the if condition,
what if we directly call pa_spooled_messages()? It is anyway fetching
the file state, and if the file state is EMPTY it can return false; if
it returns false, we can execute the code that is currently in the else
branch.  We might need to change the name of the function, though.
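
Something like the following is what I have in mind, just as a rough
sketch (the state type name is from memory of the patch, and the bool
return value is the part that would change):

static bool
pa_spooled_messages(void)
{
	PartialFileSetState fileset_state;

	/* Single fetch of the fileset state (one mutex acquisition). */
	fileset_state = pa_get_fileset_state();

	/* Nothing spooled; the caller can take the current "else" path. */
	if (fileset_state == FS_EMPTY)
		return false;

	/* ... existing replay logic of pa_spooled_messages(), unchanged ... */

	return true;
}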


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Logical replication timeout problem

2023-01-05 Thread Ashutosh Bapat
Hi Wang,
Thanks for working on this. One of our customers faced a similar
situation when running BDR with PostgreSQL.

I tested your patch and it solves the problem.

Please find some review comments below

On Tue, Nov 8, 2022 at 8:34 AM wangw.f...@fujitsu.com
 wrote:
>
>
> Attach the patch.
>

+/*
+ * Helper function for ReorderBufferProcessTXN for updating progress.
+ */
+static inline void
+ReorderBufferUpdateProgress(ReorderBuffer *rb, ReorderBufferTXN *txn,
+ReorderBufferChange *change)
+{
+LogicalDecodingContext *ctx = rb->private_data;
+static intchanges_count = 0;

It's not easy to know that a variable is static when reading the code that
uses it, so it's easy to misinterpret the code. I would probably track it
through the logical decoding context itself, OR through a global variable as
in other places where we track the last timestamps. But there's more on this below.
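
By the former I mean roughly the following (just moving the patch's counter,
for illustration):

	/* in LogicalDecodingContext, instead of a function-local static */
	int			changes_count;	/* changes processed since the last update_progress() call */

and by the latter, a file-scope variable next to the existing last-timestamp
trackers.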

+
+if (!ctx->update_progress)
+return;
+
+Assert(!ctx->fast_forward);
+
+/* set output state */
+ctx->accept_writes = false;
+ctx->write_xid = txn->xid;
+ctx->write_location = change->lsn;
+ctx->end_xact = false;

This patch reverts many of the changes of the previous commit that tried to
fix this issue, i.e. 8df2374. end_xact was introduced by the same commit, but
without much explanation in the commit message. Its only user,
WalSndUpdateProgress(), is probably making a wrong assumption as well.

 * We don't have a mechanism to get the ack for any LSN other than end
 * xact LSN from the downstream. So, we track lag only for end of
 * transaction LSN.

IIUC, the WAL sender tracks the LSN of the last WAL record read in sentPtr,
which is sent downstream through a keepalive message. The downstream may
acknowledge this LSN, so we do get an ack for any LSN, not just the commit LSN.

So I propose removing end_xact as well.

+
+/*
+ * We don't want to try sending a keepalive message after processing each
+ * change as that can have overhead. Tests revealed that there is no
+ * noticeable overhead in doing it after continuously processing 100 or so
+ * changes.
+ */
+#define CHANGES_THRESHOLD 100

I think a time-based threshold makes more sense. What if the timeout was
nearing and those 100 changes just took a little more time, causing a timeout?
We already have a time-based threshold in WalSndKeepaliveIfNecessary(), and
that function is invoked after reading every WAL record in WalSndLoop(), so it
does not look like an expensive function. If it is expensive, we might want to
worry about WalSndLoop() as well. Does it make more sense to remove this
threshold?
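
For reference, the kind of time-based check I was thinking of would look
roughly like this (purely illustrative; the interval value and variable names
are made up):

#define PROGRESS_REPORT_INTERVAL_MS 1000	/* made-up value */

	static TimestampTz last_progress_report = 0;

	if (TimestampDifferenceExceeds(last_progress_report,
								   GetCurrentTimestamp(),
								   PROGRESS_REPORT_INTERVAL_MS))
	{
		ctx->update_progress(ctx, ctx->write_location, ctx->write_xid, false);
		last_progress_report = GetCurrentTimestamp();
	}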

+
+/*
+ * After continuously processing CHANGES_THRESHOLD changes, we
+ * try to send a keepalive message if required.
+ */
+if (++changes_count >= CHANGES_THRESHOLD)
+{
+ctx->update_progress(ctx, ctx->write_location, ctx->write_xid, false);
+changes_count = 0;
+}
+}
+

On the other thread, I mentioned that we don't have a TAP test for it. I agree
with Amit's opinion there that it's hard to create a test that will time out
everywhere. I think what we need is a way to control the time required for
decoding a transaction.

A rough idea is to induce a small sleep after decoding every change. The amount
of sleep multiplied by the number of changes would let us estimate and control
the time taken to decode a transaction. Then we create a transaction that takes
longer than the timeout threshold to decode. But that's a significant amount of
code, and I don't think PostgreSQL has a facility to induce a delay at a
particular place in the code.
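
If we had such a facility, it could be as simple as the following at the point
where a change has been decoded (the GUC here is completely hypothetical):

	/* hypothetical developer-only GUC, does not exist today */
	if (logical_decoding_delay_per_change_ms > 0)
		pg_usleep(logical_decoding_delay_per_change_ms * 1000L);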

-- 
Best Wishes,
Ashutosh Bapat




Re: Rework of collation code, extensibility

2023-01-05 Thread Jeff Davis
On Wed, 2023-01-04 at 22:46 +0100, Peter Eisentraut wrote:
> It combines some refactoring that was previously posted with partial 
> support for multiple ICU libraries with partial support for some new 
> hooks.  Shouldn't those be three separate threads?

Originally they felt more separate to me, too; but as I worked on them
it seemed better to consider them as a patch series. Whatever is easier
for reviewers works for me, though.

>   I think the multiple 
> ICU libraries already does have a separate thread; how does this
> relate 
> to that work?

Multilib ICU support adds complexity, and my hope is that this patch
set cleans up and organizes things to better prepare for that
complexity.

>   I don't know what the hooks are supposed to be for?

I found them very useful for testing during development. One of the
patches adds a test module for the ICU hook, and I think that's a
valuable place to test regardless of whether any other extension uses
the hook. Also, if proper multilib support doesn't land in 16, then the
hooks could be a way to build rudimentary multilib support (or at least
some kind of ICU version lockdown) until it does land.

When Thomas's work is in place, I expect the hooks to change slightly.
The hooks are not meant to set any specific API in stone.

>   What 
> other locale libraries are you thinking about using this way?  How
> can 
> we asses whether these interfaces are sufficient for that?

I'm not considering any other locale libraries, nor did I see much
discussion of that.

>   The 
> refactoring patches don't look convincing just by looking at the
> numbers:
> 
>   3 files changed, 406 insertions(+), 247 deletions(-)
>   6 files changed, 481 insertions(+), 150 deletions(-)
>   12 files changed, 400 insertions(+), 323 deletions(-)

The existing code is not great, in my opinion: it doesn't have clear
API boundaries, the comments are insufficient, and lots of special
cases need to be handled awkwardly by callers. That style is hard to
beat when it comes to the raw line count; but it's quite difficult to
understand and work on.

I think my changes are an improvement, but obviously that depends on
the opinion of others who are working in this part of the code. What do
you think?


-- 
Jeff Davis
PostgreSQL Contributor Team - AWS






Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)

2023-01-05 Thread Japin Li


On Mon, 19 Dec 2022 at 22:40, Maxim Orlov  wrote:
> Hi!
>
> As a result of discussion in the thread [0], Robert Haas proposed to focus
> on making SLRU 64 bit, as a first step towards 64 bit XIDs.
> Here is the patch set.
>
> In overall, code of this patch set is based on the existing code from [0]
> and may be simplified, due to the fact, that SLRU_PAGES_PER_SEGMENT is not
> meant to be changed now.
> But I decided to leave it that way. At least for now.
>
> As always, reviews and opinions are very welcome.
>

For v51-0003: we can use GetClogDirName() instead of the GET_MAJOR_VERSION()
checks in copy_subdir_files().

diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 1c49c63444..3934978b97 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -857,10 +857,7 @@ copy_xact_xlog_xid(void)
pfree(new_path);
}
else
-		copy_subdir_files(GET_MAJOR_VERSION(old_cluster.major_version) <= 906 ?
-						  "pg_clog" : "pg_xact",
-						  GET_MAJOR_VERSION(new_cluster.major_version) <= 906 ?
-						  "pg_clog" : "pg_xact");
+		copy_subdir_files(GetClogDirName(old_cluster), GetClogDirName(new_cluster));
 
prep_status("Setting oldest XID for new cluster");
exec_prog(UTILITY_LOG_FILE, NULL, true, true,

-- 
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.




Re: moving extraUpdatedCols out of RangeTblEntry (into ModifyTable)

2023-01-05 Thread Amit Langote
On Fri, Jan 6, 2023 at 3:33 PM Tom Lane  wrote:
> Amit Langote  writes:
> > BTW, you wrote in the commit message:
> > (At present it seems that we don't enforce that for partitioning
> > either, which is likely wrong to some degree or other; but the case
> > clearly needs to be handled with traditional inheritance.)
>
> > Maybe I'm missing something, but AFICS, neither
> > traditional-inheritance child tables nor partitions allow a generated
> > column with an expression that is not the same as the parent's
> > expression for the same generated column:
>
> Well, there's some large holes in that, as per my post at [1].
> I'm on board with locking this down for partitioning, but we haven't
> done so yet.
>
> [1] https://www.postgresql.org/message-id/2793383.1672944799%40sss.pgh.pa.us

Ah, I had missed that.  Will check and reply there.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: Perform streaming logical transactions by background workers and parallel apply

2023-01-05 Thread Amit Kapila
On Fri, Jan 6, 2023 at 11:24 AM Dilip Kumar  wrote:
>
> On Fri, Jan 6, 2023 at 9:37 AM houzj.f...@fujitsu.com
>  wrote:
> >
> > On Thursday, January 5, 2023 7:54 PM Dilip Kumar  
> > wrote:
> > Thanks, I have started another thread[1]
> >
> > Attach the parallel apply patch set here again. I didn't change the patch 
> > set,
> > attach it here just to let the CFbot keep testing it.
>
> I have completed the review and some basic testing and it mostly looks
> fine to me.  Here is my last set of comments/suggestions.
>
> 1.
> /*
>  * Don't start a new parallel worker if user has set skiplsn as it's
>  * possible that they want to skip the streaming transaction. For
>  * streaming transactions, we need to serialize the transaction to a file
>  * so that we can get the last LSN of the transaction to judge whether to
>  * skip before starting to apply the change.
>  */
> if (!XLogRecPtrIsInvalid(MySubscription->skiplsn))
> return false;
>
>
> I think this is fine to block parallelism in this case, but it is also
> possible to make it less restrictive, basically, only if the first lsn
> of the transaction is <= skiplsn, then only it is possible that the
> final_lsn might match with skiplsn otherwise that is not possible. And
> if we want then we can allow parallelism in that case.
>
> I understand that currently we do not have first_lsn of the
> transaction in stream start message but I think that should be easy to
> do?  Although I am not sure if it is worth it, it's good to make a
> note at least.
>

Yeah, I also don't think sending an extra eight bytes with the stream_start
message is worth it. But it is fine to mention this in the
comments.

> 2.
>
> + * XXX Additionally, we also stop the worker if the leader apply worker
> + * serialize part of the transaction data due to a send timeout. This is
> + * because the message could be partially written to the queue and there
> + * is no way to clean the queue other than resending the message until it
> + * succeeds. Instead of trying to send the data which anyway would have
> + * been serialized and then letting the parallel apply worker deal with
> + * the spurious message, we stop the worker.
> + */
> +if (winfo->serialize_changes ||
> +list_length(ParallelApplyWorkerPool) >
> +(max_parallel_apply_workers_per_subscription / 2))
>
> IMHO this reason (XXX Additionally, we also stop the worker if the
> leader apply worker serialize part of the transaction data due to a
> send timeout) for stopping the worker looks a bit hackish to me.  It
> may be a rare case so I am not talking about the performance but the
> reasoning behind stopping is not good. Ideally we should be able to
> clean up the message queue and reuse the worker.
>

TBH, I don't know of a better way to deal with this with the
current infrastructure. I thought we could do this as a separate
enhancement in the future.

> 3.
> +else if (shmq_res == SHM_MQ_WOULD_BLOCK)
> +{
> +/* Replay the changes from the file, if any. */
> +if (pa_has_spooled_message_pending())
> +{
> +pa_spooled_messages();
> +}
>
> I think we do not need this pa_has_spooled_message_pending() function.
> Because this function is just calling pa_get_fileset_state() which is
> acquiring mutex and getting filestate then if the filestate is not
> FS_EMPTY then we call pa_spooled_messages() that will again call
> pa_get_fileset_state() which will again acquire mutex.  I think when
> the state is FS_SERIALIZE_IN_PROGRESS it will frequently call
> pa_get_fileset_state() consecutively 2 times, and I think we can
> easily achieve the same behavior with just one call.
>

This is just to keep the code easy to follow. As this would be a rare
case, I thought of giving preference to code clarity.

-- 
With Regards,
Amit Kapila.




Re: moving extraUpdatedCols out of RangeTblEntry (into ModifyTable)

2023-01-05 Thread Tom Lane
Amit Langote  writes:
> BTW, you wrote in the commit message:
> (At present it seems that we don't enforce that for partitioning
> either, which is likely wrong to some degree or other; but the case
> clearly needs to be handled with traditional inheritance.)

> Maybe I'm missing something, but AFICS, neither
> traditional-inheritance child tables nor partitions allow a generated
> column with an expression that is not the same as the parent's
> expression for the same generated column:

Well, there's some large holes in that, as per my post at [1].
I'm on board with locking this down for partitioning, but we haven't
done so yet.

regards, tom lane

[1] https://www.postgresql.org/message-id/2793383.1672944799%40sss.pgh.pa.us




Re: moving extraUpdatedCols out of RangeTblEntry (into ModifyTable)

2023-01-05 Thread Amit Langote
On Fri, Jan 6, 2023 at 12:28 AM Tom Lane  wrote:
> Amit Langote  writes:
> > On Thu, Jan 5, 2023 at 4:59 AM Tom Lane  wrote:
> >> I've not looked into what it'd take to back-patch this.  We can't
> >> add a field to ResultRelInfo in released branches (cf 4b3e37993),
> >> but we might be able to repurpose RangeTblEntry.extraUpdatedCols.
>
> > I think we can make that work.  Would you like me to give that a try?
>
> I'm on it already.

Thanks.  What you committed seems fine to me.

>  AFAICT, the above won't actually work because
> we don't have RTEs for all ResultRelInfos (per the
> "relinfo->ri_RangeTableIndex != 0" test in ExecGetExtraUpdatedCols).

Yeah, I noticed that too.  I was thinking that wouldn't be a problem,
because only partitions that are execution-time routing targets don't
have their own RTEs, and using a translated copy of the root parent's
RTE's extraUpdatedCols for them, as before, isn't wrong.  Note that
partitions can't have generated columns that are not present in their
parent.

BTW, you wrote in the commit message:

However, there's nothing that says a traditional-inheritance child
can't have generated columns that aren't there in its parent, or that
have different dependencies than are in the parent's expression.
(At present it seems that we don't enforce that for partitioning
either, which is likely wrong to some degree or other; but the case
clearly needs to be handled with traditional inheritance.)

Maybe I'm missing something, but AFAICS, neither
traditional-inheritance child tables nor partitions allow a generated
column with an expression that is not the same as the parent's
expression for the same generated column:

-- traditional inheritance
create table inhp (a int, b int generated always as (a+1) stored);
create table inhc (b int generated always as (a+2) stored) inherits (inhp);
NOTICE:  moving and merging column "b" with inherited definition
DETAIL:  User-specified column moved to the position of the inherited column.
ERROR:  child column "b" specifies generation expression
HINT:  Omit the generation expression in the definition of the child
table column to inherit the generation expression from the parent
table.
create table inhc (a int, b int);
alter table inhc inherit inhp;
ERROR:  column "b" in child table must be a generated column
alter table inhc drop b, add b int generated always as (a+2) stored;
alter table inhc inherit inhp;
ERROR:  column "b" in child table has a conflicting generation expression

-- partitioning
create table partp (a int, b int generated always as (a+1) stored)
partition by list (a);
create table partc partition of partp (b generated always as (a+2)
stored) for values in (1);
ERROR:  generated columns are not supported on partitions
create table partc (a int, b int);
alter table partp attach partition partc for values in (1);
ERROR:  column "b" in child table must be a generated column
alter table partc drop b, add b int generated always as (a+2) stored;
alter table partp attach partition partc for values in (1);
ERROR:  column "b" in child table has a conflicting generation expression


--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: Moving forward with TDE

2023-01-05 Thread vignesh C
On Fri, 4 Nov 2022 at 03:36, David Christensen
 wrote:
>
> > Unless somebody in the community remembers open questions/issues with
> > TDE that were never addressed I suggest simply iterating with our
> > usual testing/reviewing process. For now I'm going to change the
> > status of the CF entry [1] to "Waiting for Author" since the patchset
> > doesn't pass the CI [2].
>
> Thanks, enclosed is a new version that is rebased on HEAD and fixes a
> bug that the new pg_control_init() test picked up.

The patch does not apply on top of HEAD as in [1]; please post a rebased patch:
=== Applying patches on top of PostgreSQL commit ID
b82557ecc2ebbf649142740a1c5ce8d19089f620 ===
=== applying patch
./v2-0004-cfe-04-common_over_cfe-03-scripts-squash-commit.patch
patching file src/common/Makefile
Hunk #2 FAILED at 84.
1 out of 2 hunks FAILED -- saving rejects to file src/common/Makefile.rej

[1] - http://cfbot.cputube.org/patch_41_3985.log

Regards,
Vignesh




Re: refactoring relation extension and BufferAlloc(), faster COPY

2023-01-05 Thread vignesh C
On Sat, 29 Oct 2022 at 08:24, Andres Freund  wrote:
>
> The patches here aren't fully polished (as will be evident). But they should
> be more than good enough to discuss whether this is a sane direction.

The patch does not apply on top of HEAD as in [1]; please post a rebased patch:
=== Applying patches on top of PostgreSQL commit ID
f2857af485a00ab5dbfa2c83af9d83afe4378239 ===
=== applying patch
./v1-0001-wip-lwlock-fix-quadratic-behaviour-with-very-long.patch
patching file src/include/storage/proc.h
Hunk #1 FAILED at 217.
1 out of 1 hunk FAILED -- saving rejects to file src/include/storage/proc.h.rej
patching file src/backend/storage/lmgr/lwlock.c
Hunk #1 succeeded at 988 with fuzz 2 (offset 1 line).
Hunk #2 FAILED at 1047.
Hunk #3 FAILED at 1076.
Hunk #4 FAILED at 1104.
Hunk #5 FAILED at 1117.
Hunk #6 FAILED at 1141.
Hunk #7 FAILED at 1775.
Hunk #8 FAILED at 1790.
7 out of 8 hunks FAILED -- saving rejects to file
src/backend/storage/lmgr/lwlock.c.rej

[1] - http://cfbot.cputube.org/patch_41_3993.log

Regards,
Vignesh




Re: Add a new pg_walinspect function to extract FPIs from WAL records

2023-01-05 Thread Bharath Rupireddy
On Thu, Jan 5, 2023 at 6:51 PM Bharath Rupireddy
 wrote:
>
> > I'm also wondering if it would make sense to extend the test coverage of it 
> > (and pg_waldump) to "validate" that both
> > extracted images are the same and matches the one modified right after the 
> > checkpoint.
> >
> > What do you think? (could be done later in another patch though).
>
> I think pageinspect can be used here. We can fetch the raw page from
> the table after the checkpoint and raw FPI from the WAL record logged
> as part of the update. I've tried to do so [1], but I see a slight
> difference in the raw output. The expectation is that they both be the
> same. It might be that the update operation logs the FPI with some
> more info set (prune_xid). I'll try to see why it is so.
>
> I'm attaching the v2 patch for further review.
>
> [1]
> SELECT * FROM page_header(:'page_from_table');
> lsn| checksum | flags | lower | upper | special | pagesize |
> version | prune_xid
> ---+--+---+---+---+-+--+-+---
>  0/1891D78 |0 | 0 |40 |  8064 |8192 | 8192 |
> 4 | 0
> (1 row)
>
> SELECT * FROM page_header(:'page_from_wal');
> lsn| checksum | flags | lower | upper | special | pagesize |
> version | prune_xid
> ---+--+---+---+---+-+--+-+---
>  0/1891D78 |0 | 0 |44 |  8032 |8192 | 8192 |
> 4 |   735
> (1 row)

Ugh, the v2 patch missed the newly added file; I'm attaching the v3 patch for
further review. Sorry for the noise.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


v3-0001-Add-FPI-extract-function-to-pg_walinspect.patch
Description: Binary data


RE: Force streaming every change in logical decoding

2023-01-05 Thread shiy.f...@fujitsu.com
On Thu, Dec 22, 2022 3:17 PM Masahiko Sawada  wrote:
> >  I think
> > instead of adding new tests with this patch, it may be better to
> > change some of the existing tests related to streaming to use this
> > parameter as that will clearly show one of the purposes of this patch.
> 
> +1. I think test_decoding/sql/stream.sql and spill.sql are good
> candidates and we change logical replication TAP tests in a separate
> patch.
> 

Hi,

I tried to reduce the data size in the test_decoding tests by using the newly
added GUC "logical_decoding_mode", and found that the new GUC is not suitable
for some cases.

For example, in the following case in stream.sql (and there are some similar
cases in twophase_stream.sql), the changes in the sub-transaction exceed
logical_decoding_work_mem, but they won't be streamed because the
sub-transaction is rolled back. (The transaction is still marked as streamed,
though.) After the sub-transaction there is a small number of inserts; since
logical_decoding_work_mem is not exceeded, they are streamed when decoding
COMMIT. If we use logical_decoding_mode=immediate, the small number of inserts
in the toplevel transaction will be streamed immediately. This is different
from before, so I'm not sure we can use logical_decoding_mode in this case.

```
-- streaming test with sub-transaction
BEGIN;
savepoint s1;
SELECT 'msg5' FROM pg_logical_emit_message(true, 'test', repeat('a', 50));
INSERT INTO stream_test SELECT repeat('a', 2000) || g.i FROM generate_series(1, 
35) g(i);
TRUNCATE table stream_test;
rollback to s1;
INSERT INTO stream_test SELECT repeat('a', 10) || g.i FROM generate_series(1, 
20) g(i);
COMMIT;
```

Besides, some cases in spill.sql can't use the new GUC because they want to
serialize when processing a specific toplevel transaction or sub-transaction.
For example, in the case below, the changes in the subxact exceed
logical_decoding_work_mem and are serialized, while the insert after the
subxact is not serialized. If logical_decoding_mode is used, both of them will
be serialized, which is not what we want to test.

```
-- spilling subxact, non-spilling main xact
BEGIN;
SAVEPOINT s;
INSERT INTO spill_test SELECT 'serialize-subbig-topsmall--1:'||g.i FROM 
generate_series(1, 5000) g(i);
RELEASE SAVEPOINT s;
INSERT INTO spill_test SELECT 'serialize-subbig-topsmall--2:'||g.i FROM 
generate_series(5001, 5001) g(i);
COMMIT;
```

Although the remaining cases in spill.sql could use the new GUC, that would
require setting and resetting logical_decoding_mode many times. Besides, the
tests in this file were added before logical_decoding_work_mem was introduced,
so I am not sure whether we should modify these cases to use the new GUC.

Also, it looks like the tests for TOAST in stream.sql can't be changed either.

For the above reasons, it seems that currently few tests can be modified to
use logical_decoding_mode. If we later find some tests that can be changed to
use it, we can do that in a separate thread. I will close the CF entry.

Regards,
Shi yu


Re: A new strategy for pull-up correlated ANY_SUBLINK

2023-01-05 Thread vignesh C
On Sun, 13 Nov 2022 at 04:15, Tom Lane  wrote:
>
> Andy Fan  writes:
> > In the past we pull-up the ANY-sublink with 2 steps, the first step is to
> > pull up the sublink as a subquery, and the next step is to pull up the
> > subquery if it is allowed.  The benefits of this method are obvious,
> > pulling up the subquery has more requirements, even if we can just finish
> > the first step, we still get huge benefits. However the bad stuff happens
> > if varlevelsup = 1 involves, things fail at step 1.
>
> > convert_ANY_sublink_to_join ...
>
> > if (contain_vars_of_level((Node *) subselect, 1))
> > return NULL;
>
> > In this patch we distinguish the above case and try to pull-up it within
> > one step if it is helpful, It looks to me that what we need to do is just
> > transform it to EXIST-SUBLINK.
>
> This patch seems awfully messy to me.  The fact that you're having to
> duplicate stuff done elsewhere suggests at the least that you've not
> plugged the code into the best place.
>
> Looking again at that contain_vars_of_level restriction, I think the
> reason for it was just to avoid making a FROM subquery that has outer
> references, and the reason we needed to avoid that was merely that we
> didn't have LATERAL at the time.  So I experimented with the attached.
> It seems to work, in that we don't get wrong answers from any of the
> small number of places that are affected.  (I wonder though whether
> those test cases still test what they were intended to, particularly
> the postgres_fdw one.  We might have to try to hack them some more
> to not get affected by this optimization.)  Could do with more test
> cases, no doubt.
>
> One thing I'm not at all clear about is whether we need to restrict
> the optimization so that it doesn't occur if the subquery contains
> outer references falling outside available_rels.  I think that that
> case is covered by is_simple_subquery() deciding later to not pull up
> the subquery based on LATERAL restrictions, but maybe that misses
> something.
>
> I'm also wondering whether the similar restriction in
> convert_EXISTS_sublink_to_join could be removed similarly.
> In this light it was a mistake for convert_EXISTS_sublink_to_join
> to manage the pullup itself rather than doing it in the two-step
> fashion that convert_ANY_sublink_to_join does it.

The patch does not apply on top of HEAD as in [1]; please post a rebased patch:
=== Applying patches on top of PostgreSQL commit ID
b82557ecc2ebbf649142740a1c5ce8d19089f620 ===
=== applying patch ./v2-0001-use-LATERAL-for-ANY_SUBLINK.patch
patching file contrib/postgres_fdw/expected/postgres_fdw.out
...
Hunk #2 FAILED at 6074.
Hunk #3 FAILED at 6087.
2 out of 3 hunks FAILED -- saving rejects to file
src/test/regress/expected/join.out.rej

[1] - http://cfbot.cputube.org/patch_41_3941.log

Regards,
Vignesh




Re: making relfilenodes 56 bits

2023-01-05 Thread Dilip Kumar
On Wed, Jan 4, 2023 at 5:45 PM vignesh C  wrote:
>
> On Fri, 21 Oct 2022 at 11:31, Michael Paquier  wrote:
> >
> > On Thu, Sep 29, 2022 at 09:23:38PM -0400, Tom Lane wrote:
> > > Hmmm ... I'd tend to do SELECT COUNT(*) FROM.  But can't we provide
> > > any actual checks on the sanity of the output?  I realize that the
> > > output's far from static, but still ...
> >
> > Honestly, checking all the fields is not that exciting, but the
> > maximum I can think of that would be portable enough is something like
> > the attached.  No arithmetic operators for xid limits things a bit,
> > but at least that's something.
> >
> > Thoughts?
>
> The patch does not apply on top of HEAD as in [1], please post a rebased 
> patch:
>

Because of the extra WAL overhead, we are not continuing with the
patch; I will withdraw it.



-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Generating code for query jumbling through gen_node_support.pl

2023-01-05 Thread vignesh C
On Wed, 7 Dec 2022 at 13:27, Michael Paquier  wrote:
>
> Hi all,
>
> This thread is a follow-up of the recent discussion about query
> jumbling with DDL statements, where the conclusion was that we'd want
> to generate all this code automatically for all the nodes:
> https://www.postgresql.org/message-id/36e5bffe-e989-194f-85c8-06e7bc88e...@amazon.com
>
> What this patch allows to do it to compute the same query ID for
> utility statements using their parsed Node state instead of their
> string, meaning that things like "BEGIN", "bEGIN" or "begin" would be
> treated the same, for example.  But the main idea is not only that.
>
> I have implemented that as of the attached, where the following things
> are done:
> - queryjumble.c is moved to src/backend/nodes/, to stick with the
> other things for node equal/read/write/copy, renamed to
> jumblefuncs.c.
> - gen_node_support.c is extended to generate the functions and the
> switch for the jumbling.  There are a few exceptions, as of the Lists
> and RangeTblEntry to do the jumbling consistently.
> - Two pg_node_attr() are added in consistency with the existing ones:
> no_jumble to discard completely a node from the the query jumbling
> and jumble_ignore to discard one field from the jumble.
>
> The patch is in a rather good shape, passes check-world and the CI,
> but there are a few things that need to be discussed IMO.  Things
> could be perhaps divided in more patches, now the areas touched are
> quite different so it did not look like a big deal to me as the
> changes touch different areas and are straight-forward.
>
> The location of the Nodes is quite invasive because we only care about
> that for T_Const now in the query jumbling, and this could be
> compensated with a third pg_node_attr() that tracks for the "int
> location" of a Node whether it should participate in the jumbling or
> not.  There is also an argument where we would want to not include by
> default new fields added to a Node, but that would require added more
> pg_node_attr() than what's done here.
>
> Note that the plan is to extend the normalization to some other parts
> of the Nodes, like CALL and SET as mentioned on the other thread.  I
> have done nothing about that yet but doing so can be done in a few
> lines with the facility presented here (aka just add a location
> field).  Hence, the normalization is consistent with the existing
> queryjumble.c for the fields and the nodes processed.
>
> In this patch, things are done so as the query ID is not computed
> anymore from the query string but from the Query.  I still need to
> study the performance impact of that with short queries.  If things
> prove to be noticeable in some cases, this stuff could be switched to
> use a new GUC where we could have a code path for the computation of
> utilityStmt using its string as a fallback.  I am not sure that I want
> to enter in this level of complications, though, to keep things
> simple, but that's yet to be done.
>
> A bit more could be cut but pg_ident got in the way..  There are also
> a few things for pg_stat_statements where a query ID of 0 can be
> implied for utility statements in some cases.
>
> Generating this code leads to an overall removal of code as what
>  queryjumble.c is generated automatically:
>  13 files changed, 901 insertions(+), 1113 deletions(-)
>
> I am adding that to the next commit fest.

The patch does not apply on top of HEAD as in [1]; please post a rebased patch:
=== Applying patches on top of PostgreSQL commit ID
b82557ecc2ebbf649142740a1c5ce8d19089f620 ===
=== applying patch
./0001-Support-for-automated-query-jumble-with-all-Nodes.patch
...
patching file src/backend/utils/misc/queryjumble.c
Hunk #1 FAILED at 1.
Not deleting file src/backend/utils/misc/queryjumble.c as content
differs from patch
1 out of 1 hunk FAILED -- saving rejects to file
src/backend/utils/misc/queryjumble.c.rej

[1] - http://cfbot.cputube.org/patch_41_4047.log

Regards,
Vignesh




Re: Force streaming every change in logical decoding

2023-01-05 Thread vignesh C
On Mon, 26 Dec 2022 at 14:04, Amit Kapila  wrote:
>
> On Sat, Dec 24, 2022 at 3:28 PM Dilip Kumar  wrote:
> >
> > On Sat, Dec 24, 2022 at 8:56 AM Amit Kapila  wrote:
> > >
> > > > >
> > > > > OK, I removed the modification in 022_twophase_cascade.pl and combine 
> > > > > the two patches.
> > > >
> > > > Thank you for updating the patch. The v6 patch looks good to me.
> > > >
> > >
> > > LGTM as well. I'll push this on Monday.
> > >
> >
> > The patch looks good to me.
> >
>
> Pushed.

If there is nothing pending for this patch, can we close the commitfest entry?

Regards,
Vignesh




Re: Add a new pg_walinspect function to extract FPIs from WAL records

2023-01-05 Thread vignesh C
On Thu, 5 Jan 2023 at 18:52, Bharath Rupireddy
 wrote:
>
> On Wed, Jan 4, 2023 at 8:19 PM Drouvot, Bertrand
>  wrote:
> >
> > I think it makes sense to somehow align the pg_walinspect functions with 
> > the pg_waldump "features".
> > And since [1] added FPI "extraction" then +1 for the proposed patch in this 
> > thread.
> >
> > > [1] 
> > > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=d497093cbecccf6df26365e06a5f8f8614b591c8
> > > [2] 
> > > https://postgr.es/m/CAOxo6XKjQb2bMSBRpePf3ZpzfNTwjQUc4Tafh21=jzjx6bx...@mail.gmail.com
> >
> > I just have a few comments:
>
> Thanks for reviewing.
>
> > +
> > +/*
> > + * Get full page images and their info associated with a given WAL record.
> > + */
> >
> >
> > + 
> > +  Gets raw full page images and their information associated with all 
> > the
> > +  valid WAL records between start_lsn and
> > +  end_lsn. Returns one row per full page 
> > image.
> >
> > Worth to add a few words about decompression too?
>
> Done.
>
> > What about adding a few words about compression? (like "Decompression is 
> > applied if necessary"?)
> >
> >
> > +   /* Full page exists, so let's output it. */
> > +   if (!RestoreBlockImage(record, block_id, page))
> >
> > "Full page exists, so let's output its info and content." instead?
>
> Done.
>
> > I'm also wondering if it would make sense to extend the test coverage of it 
> > (and pg_waldump) to "validate" that both
> > extracted images are the same and matches the one modified right after the 
> > checkpoint.
> >
> > What do you think? (could be done later in another patch though).
>
> I think pageinspect can be used here. We can fetch the raw page from
> the table after the checkpoint and raw FPI from the WAL record logged
> as part of the update. I've tried to do so [1], but I see a slight
> difference in the raw output. The expectation is that they both be the
> same. It might be that the update operation logs the FPI with some
> more info set (prune_xid). I'll try to see why it is so.
>
> I'm attaching the v2 patch for further review.

It looks like one of the files is missing from the patch:
[13:39:03.534] contrib/pg_walinspect/meson.build:19:0: ERROR: File
pg_walinspect--1.0--1.1.sql does not exist.

Please post an updated version for the same.

Regards,
Vignesh




Re: [PATCH] Expand character set for ltree labels

2023-01-05 Thread vignesh C
On Wed, 4 Jan 2023 at 00:27, Garen Torikian  wrote:
>
> Sure. Rebased onto HEAD.
>

There is one more merge conflict; please post a rebased patch:
=== Applying patches on top of PostgreSQL commit ID
eb5ad4ff05fd382ac98cab60b82f7fd6ce4cfeb8 ===
=== applying patch ./0003-Expand-character-set-for-ltree-labels.patch
patching file contrib/ltree/expected/ltree.out
Hunk #1 succeeded at 25 with fuzz 2.
Hunk #2 FAILED at 51.
Hunk #3 FAILED at 537.
Hunk #4 succeeded at 1201 with fuzz 2.
2 out of 4 hunks FAILED -- saving rejects to file
contrib/ltree/expected/ltree.out.rej

Regards,
Vignesh




Re: TAP output format in pg_regress

2023-01-05 Thread vignesh C
On Tue, 3 Jan 2023 at 16:01, vignesh C  wrote:
>
> On Tue, 29 Nov 2022 at 00:57, Daniel Gustafsson  wrote:
> >
> > > On 28 Nov 2022, at 20:02, Nikolay Shaplov  wrote:
> >
> > > From my reviewer's point of view patch is ready for commit.
> > >
> > > Thank you for your patience :-)
> >
> > Thanks for review.
> >
> > The attached tweaks a few comments and attempts to address the compiler 
> > warning
> > error in the CFBot CI.  Not sure I entirely agree with the compiler there 
> > but
> > here is an attempt to work around it at least (by always copying the va_list
> > for the logfile). It also adds a missing va_end for the logfile va_list.
> >
> > I hope this is close to a final version of this patch (commitmessage needs a
> > bit of work though).
> >
> > > PS Should I change commitfest status?
> >
> > Sure, go ahead.
> >
>
> The patch does not apply on top of HEAD as in [1], please post a rebased 
> patch:
> === Applying patches on top of PostgreSQL commit ID
> 92957ed98c5c565362ce665266132a7f08f6b0c0 ===
> === applying patch
> ./v14-0001-Change-pg_regress-output-format-to-be-TAP-compli.patch
> patching file meson.build
> Hunk #1 FAILED at 2968.
> 1 out of 1 hunk FAILED -- saving rejects to file meson.build.rej

Attached a rebased patch on top of HEAD to try and see if we can close
this patch in this commitfest.

Regards,
Vignesh
From 005bcf13037577a8682e3a6365b1b52bb2e48492 Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson 
Date: Fri, 6 Jan 2023 11:09:10 +0530
Subject: [PATCH v15] Change pg_regress output format to be TAP compliant

This converts the pg_regress output format to emit TAP compliant output
while keeping it as human readable as possible for use without TAP
test harnesses. As verbose harness-related information isn't really
supported by TAP, this also reduces the verbosity of pg_regress runs,
which makes scrolling through log output in buildfarm/CI runs a bit
easier as well.

TAP format testing is also enabled in Meson as of this.

Discussion: https://postgr.es/m/bd4b107d-7e53-4794-acba-275beb432...@yesql.se
---
 meson.build   |   1 +
 src/test/regress/pg_regress.c | 575 ++
 2 files changed, 314 insertions(+), 262 deletions(-)

diff --git a/meson.build b/meson.build
index 45fb9dd616..7195937d17 100644
--- a/meson.build
+++ b/meson.build
@@ -3024,6 +3024,7 @@ foreach test_dir : tests
   env.prepend('PATH', temp_install_bindir, test_dir['bd'])
 
   test_kwargs = {
+'protocol': 'tap',
 'priority': 10,
 'timeout': 1000,
 'depends': test_deps + t.get('deps', []),
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 40e6c231a3..a05f423c75 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -68,6 +68,26 @@ const char *basic_diff_opts = "-w";
 const char *pretty_diff_opts = "-w -U3";
 #endif
 
+/* For parallel tests, testnames are indented when printed for grouping */
+#define PARALLEL_INDENT 7
+/*
+ * The width of the testname field when printing to ensure vertical alignment
+ * of test runtimes. This number is somewhat arbitrarily chosen to match the
+ * older pre-TAP output format.
+ */
+#define TESTNAME_WIDTH 36
+
+typedef enum TAPtype
+{
+	DIAG = 0,
+	BAIL,
+	NOTE,
+	DETAIL,
+	TEST_STATUS,
+	PLAN,
+	NONE
+}			TAPtype;
+
 /* options settable from command line */
 _stringlist *dblist = NULL;
 bool		debug = false;
@@ -103,6 +123,7 @@ static const char *sockdir;
 static const char *temp_sockdir;
 static char sockself[MAXPGPATH];
 static char socklock[MAXPGPATH];
+static StringInfo failed_tests = NULL;
 
 static _resultmap *resultmap = NULL;
 
@@ -116,12 +137,29 @@ static int	fail_ignore_count = 0;
 static bool directory_exists(const char *dir);
 static void make_directory(const char *dir);
 
-static void header(const char *fmt,...) pg_attribute_printf(1, 2);
-static void status(const char *fmt,...) pg_attribute_printf(1, 2);
+static void test_status_print(bool ok, const char *testname, double runtime, bool parallel, bool ignore);
+static void test_status_ok(const char *testname, double runtime, bool parallel);
+static void test_status_failed(const char *testname, bool ignore, double runtime, bool parallel);
+static void bail_out(bool noatexit, const char *fmt,...) pg_attribute_printf(2, 3);
+static void emit_tap_output(TAPtype type, const char *fmt,...) pg_attribute_printf(2, 3);
+static void emit_tap_output_v(TAPtype type, const char *fmt, va_list argp) pg_attribute_printf(2, 0);
+
 static StringInfo psql_start_command(void);
 static void psql_add_command(StringInfo buf, const char *query,...) pg_attribute_printf(2, 3);
 static void psql_end_command(StringInfo buf, const char *database);
 
+/*
+ * Convenience macros for printing TAP output with a more shorthand syntax
+ * aimed at making the code more readable.
+ */
+#define plan(x)emit_tap_output(PLAN, "1..%i\n", (x))
+#define note(...)			emit_tap_output(NOTE, __VA_ARGS__)
+#define note_detail(...)	

Re: Perform streaming logical transactions by background workers and parallel apply

2023-01-05 Thread Dilip Kumar
On Fri, Jan 6, 2023 at 9:37 AM houzj.f...@fujitsu.com
 wrote:
>
> On Thursday, January 5, 2023 7:54 PM Dilip Kumar  
> wrote:
> Thanks, I have started another thread[1]
>
> Attach the parallel apply patch set here again. I didn't change the patch set,
> attach it here just to let the CFbot keep testing it.

I have completed the review and some basic testing and it mostly looks
fine to me.  Here is my last set of comments/suggestions.

1.
/*
 * Don't start a new parallel worker if user has set skiplsn as it's
 * possible that they want to skip the streaming transaction. For
 * streaming transactions, we need to serialize the transaction to a file
 * so that we can get the last LSN of the transaction to judge whether to
 * skip before starting to apply the change.
 */
if (!XLogRecPtrIsInvalid(MySubscription->skiplsn))
return false;


I think it is fine to block parallelism in this case, but it is also
possible to make it less restrictive: only if the first LSN of the
transaction is <= skiplsn is it possible for the final_lsn to match
skiplsn; otherwise that is not possible, and if we want we can allow
parallelism in that case.

I understand that currently we do not have the first_lsn of the
transaction in the stream start message, but I think that should be easy
to do.  Although I am not sure if it is worth it, it would be good to at
least make a note of it.
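
Just to be concrete, I am imagining something like the below, where first_lsn
is hypothetical since we do not currently carry it in the stream start message:

	/*
	 * skiplsn can only ever equal the transaction's final_lsn if the
	 * transaction started at or before it, so parallelism is safe to
	 * allow otherwise.
	 */
	if (!XLogRecPtrIsInvalid(MySubscription->skiplsn) &&
		first_lsn <= MySubscription->skiplsn)
		return false;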

2.

+ * XXX Additionally, we also stop the worker if the leader apply worker
+ * serialize part of the transaction data due to a send timeout. This is
+ * because the message could be partially written to the queue and there
+ * is no way to clean the queue other than resending the message until it
+ * succeeds. Instead of trying to send the data which anyway would have
+ * been serialized and then letting the parallel apply worker deal with
+ * the spurious message, we stop the worker.
+ */
+if (winfo->serialize_changes ||
+list_length(ParallelApplyWorkerPool) >
+(max_parallel_apply_workers_per_subscription / 2))

IMHO this reason (XXX Additionally, we also stop the worker if the
leader apply worker serialize part of the transaction data due to a
send timeout) for stopping the worker looks a bit hackish to me.  It
may be a rare case, so I am not talking about performance, but the
reasoning behind stopping is not good. Ideally we should be able to
clean up the message queue and reuse the worker.

3.
+else if (shmq_res == SHM_MQ_WOULD_BLOCK)
+{
+/* Replay the changes from the file, if any. */
+if (pa_has_spooled_message_pending())
+{
+pa_spooled_messages();
+}

I think we do not need this pa_has_spooled_message_pending() function.
This function just calls pa_get_fileset_state(), which acquires the
mutex and gets the file state; then, if the file state is not FS_EMPTY,
we call pa_spooled_messages(), which calls pa_get_fileset_state() again
and acquires the mutex again.  So when the state is
FS_SERIALIZE_IN_PROGRESS we will frequently call pa_get_fileset_state()
twice in a row, and I think we can easily achieve the same behavior
with just one call.

4.

+ * leader, or when there there is an error. None of these cases will allow
+ * the code to reach here.

/when there there is an error/when there is an error



-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Notify downstream to discard the streamed transaction which was aborted due to crash.

2023-01-05 Thread Dilip Kumar
On Fri, Jan 6, 2023 at 9:25 AM houzj.f...@fujitsu.com
 wrote:
>
> Hi,
>
> When developing another feature, I find an existing bug which was reported 
> from Dilip[1].
>
> Currently, it's possible that we only send a streaming block without sending a
> end of stream message(stream abort) when decoding and streaming a transaction
> that was aborted due to crash because we might not WAL log a XLOG_XACT_ABORT
> for such a crashed transaction. This will cause the subscriber(with
> streaming=on) create a stream file but won't delete it until the apply
> worker restart.
>
> BUG repro(borrowed from Dilip):
> ---
> 1. start 2 servers(config: logical_decoding_work_mem=64kB)
> ./pg_ctl -D data/ -c -l pub_logs start
> ./pg_ctl -D data1/ -c -l sub_logs start
>
> 2. Publisher:
> create table t(a int PRIMARY KEY ,b text);
> create publication test_pub for table t
> with(PUBLISH='insert,delete,update,truncate');
> alter table t replica identity FULL ;
>
> 3. Subscription Server:
> create table t(a int,b text);
> create subscription test_sub CONNECTION 'host=localhost port=1
> dbname=postgres' PUBLICATION test_pub WITH ( slot_name =
> test_slot_sub1,streaming=on);
>
> 4. Publication Server:
> begin ;
> insert into t values (generate_series(1,5), 'z');  -- (while 
> executing this restart publisher in 2-3 secs)
>
> Restart the publication server, while the transaction is still in an
> uncommitted state.
> ./pg_ctl -D data/ -c -l pub_logs restart -mi
> ---
>
> After restarting the publisher, we can see the subscriber receive a streaming
> block and create a stream file(/base/pgsql_tmp/xxx.fileset).
>
> To fix it, One idea is to send a stream abort message when we are cleaning up
> crashed transaction on publisher(e.g. in ReorderBufferAbortOld()). And here is
> a tiny patch which changes the same. I have confirmed that the bug is fixed 
> and
> all regression tests pass. I didn't add a testcase because we need to make 
> sure
> the crash happens before all the WAL logged transactions data are decoded 
> which
> doesn't seem easy to write a stable test for this.
>
> Thoughts ?

Fix looks good to me.  Thanks for working on this.
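
IIUC, the essence of the change is to notify the downstream while cleaning up
the crashed transaction, roughly along these lines (simplified sketch, the
actual patch may differ):

	/* in ReorderBufferAbortOld(), while discarding the old transaction */
	if (rbtxn_is_streamed(txn))
	{
		/*
		 * The downstream may already have received streamed data for this
		 * transaction, so send a stream abort; otherwise the stream file
		 * lingers on the subscriber until the apply worker restarts.
		 */
		rb->stream_abort(rb, txn, InvalidXLogRecPtr);
	}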

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




RE: Notify downstream to discard the streamed transaction which was aborted due to crash.

2023-01-05 Thread houzj.f...@fujitsu.com
On Friday, January 6, 2023 1:15 PM Amit Kapila  wrote:
> 
> On Fri, Jan 6, 2023 at 9:25 AM houzj.f...@fujitsu.com 
> wrote:
> >
> >
> > To fix it, One idea is to send a stream abort message when we are
> > cleaning up crashed transaction on publisher(e.g. in
> > ReorderBufferAbortOld()). And here is a tiny patch which changes the
> > same. I have confirmed that the bug is fixed and all regression tests
> > pass. I didn't add a testcase because we need to make sure the crash
> > happens before all the WAL logged transactions data are decoded which
> doesn't seem easy to write a stable test for this.
> >
> 
> Your fix looks good to me. Have you tried this in PG-14 where it was
> introduced?

Yes, I have confirmed that PG-14 has the same problem and that it is fixed
after applying the patch.

Best regards,
Hou zj


Re: wake up logical workers after ALTER SUBSCRIPTION

2023-01-05 Thread Nathan Bossart
I found some additional places that should remove the last-start time from
the hash table.  I've added those in v14.

On Fri, Jan 06, 2023 at 10:30:18AM +0530, Amit Kapila wrote:
> On Thu, Jan 5, 2023 at 10:49 PM Nathan Bossart  
> wrote:
>> On Thu, Jan 05, 2023 at 11:34:37AM +0530, Amit Kapila wrote:
>> > On Thu, Jan 5, 2023 at 10:16 AM Nathan Bossart  
>> > wrote:
>> >> In v12, I moved the restart for two_phase mode to the end of
>> >> process_syncing_tables_for_apply() so that we don't need to rely on 
>> >> another
>> >> iteration of the loop.
>> >
>> > This should work but it is better to add a comment before calling
>> > CommandCounterIncrement() to indicate that this is for making changes
>> > to the relation state visible.
>>
>> Will do.
> 
> Isn't it better to move this part into a separate patch as this is
> useful even without the main patch to improve wakeups?

I moved it to a separate patch in v14.

>> > Thinking along similar lines, won't apply worker need to be notified
>> > of SUBREL_STATE_SYNCWAIT state change by the tablesync worker?
>>
>> wait_for_worker_state_change() should notify the apply worker in this case.
> 
> I think this is yet to be included in the patch, right?

This is already present on HEAD.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 955111b35d9d416b8d895649525b25c03a15e653 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Thu, 5 Jan 2023 21:17:42 -0800
Subject: [PATCH v14 1/3] move restart for two_phase mode to end of
 process_syncing_tables_for_apply()

---
 src/backend/replication/logical/tablesync.c | 48 +++--
 1 file changed, 26 insertions(+), 22 deletions(-)

diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 2fdfeb5b4c..280fed8cdb 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -415,6 +415,7 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
 	static HTAB *last_start_times = NULL;
 	ListCell   *lc;
 	bool		started_tx = false;
+	bool		should_exit = false;
 
 	Assert(!IsTransactionState());
 
@@ -446,28 +447,6 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
 		last_start_times = NULL;
 	}
 
-	/*
-	 * Even when the two_phase mode is requested by the user, it remains as
-	 * 'pending' until all tablesyncs have reached READY state.
-	 *
-	 * When this happens, we restart the apply worker and (if the conditions
-	 * are still ok) then the two_phase tri-state will become 'enabled' at
-	 * that time.
-	 *
-	 * Note: If the subscription has no tables then leave the state as
-	 * PENDING, which allows ALTER SUBSCRIPTION ... REFRESH PUBLICATION to
-	 * work.
-	 */
-	if (MySubscription->twophasestate == LOGICALREP_TWOPHASE_STATE_PENDING &&
-		AllTablesyncsReady())
-	{
-		ereport(LOG,
-(errmsg("logical replication apply worker for subscription \"%s\" will restart so that two_phase can be enabled",
-		MySubscription->name)));
-
-		proc_exit(0);
-	}
-
 	/*
 	 * Process all tables that are being synchronized.
 	 */
@@ -619,9 +598,34 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
 
 	if (started_tx)
 	{
+		/*
+		 * Even when the two_phase mode is requested by the user, it remains as
+		 * 'pending' until all tablesyncs have reached READY state.
+		 *
+		 * When this happens, we restart the apply worker and (if the conditions
+		 * are still ok) then the two_phase tri-state will become 'enabled' at
+		 * that time.
+		 *
+		 * Note: If the subscription has no tables then leave the state as
+		 * PENDING, which allows ALTER SUBSCRIPTION ... REFRESH PUBLICATION to
+		 * work.
+		 */
+		CommandCounterIncrement();	/* make updates visible */
+		if (MySubscription->twophasestate == LOGICALREP_TWOPHASE_STATE_PENDING &&
+			AllTablesyncsReady())
+		{
+			ereport(LOG,
+	(errmsg("logical replication apply worker for subscription \"%s\" will restart so that two_phase can be enabled",
+			MySubscription->name)));
+			should_exit = true;
+		}
+
 		CommitTransactionCommand();
 		pgstat_report_stat(true);
 	}
+
+	if (should_exit)
+		proc_exit(0);
 }
 
 /*
-- 
2.25.1

>From 3018a4029b573c69f6674b86735d0d94c1350eb4 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Mon, 21 Nov 2022 16:01:01 -0800
Subject: [PATCH v14 2/3] wake up logical workers as needed instead of relying
 on periodic wakeups

---
 src/backend/access/transam/xact.c|  3 ++
 src/backend/commands/alter.c |  7 
 src/backend/commands/subscriptioncmds.c  |  7 
 src/backend/replication/logical/worker.c | 46 
 src/include/replication/logicalworker.h  |  3 ++
 5 files changed, 66 insertions(+)

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 24221542e7..54145bf805 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -47,6 +47,7 @@
 #include "pgstat.h"
 #include "replication/logical.h"
 #include 

Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)

2023-01-05 Thread Andrey Borodin
On Mon, Dec 19, 2022 at 6:41 AM Maxim Orlov  wrote:
>
> As always, reviews and opinions are very welcome.


Hi! I think that 64-bit xids are a very important feature and I want
to help advance it. That's why I want to try to understand a patch
better.
Do I get it right that the proposed v51 patchset only changes the SLRU
filenames and type of pageno representation? Is SLRU wraparound still
exactly there after 0x byte?

The thing is we had some nasty bugs because SLRU wraparound is tricky.
And I think it would be beneficial if we could get to continuous SLRU
space. But the patch seems to avoid addressing this problem.

Also, I do not understand the reason for splitting the 1st and 2nd
steps. Certainly, there must be some logic behind it, but I just can't
grasp it...
And the purpose of the 3rd step with the pg_upgrade changes is a complete
mystery to me. Please excuse my incompetence in the topic, but maybe
a commit message or some comments would help. What kind of xact segment
conversion do we do? Why is it only necessary for xacts, but not for other
SLRUs?

Thank you for working on this important project!


Best regards, Andrey Borodin.




Re: Logical replication - schema change not invalidating the relation cache

2023-01-05 Thread vignesh C
On Fri, 6 Jan 2023 at 04:32, Tom Lane  wrote:
>
> vignesh C  writes:
> > On Thu, 5 Jan 2023 at 03:17, Tom Lane  wrote:
> >> The bigger picture here though is that in examples such as the one
> >> you gave at the top of the thread, it's not very clear to me that
> >> there's *any* principled behavior.  If the connection between publisher
> >> and subscriber tables is only the relation name, fine ... but exactly
> >> which relation name applies?
>
> > The connection between publisher and subscriber table is based on
> > relation id, During the first change relid, relname and schema name
> > from publisher will be sent to the subscriber. Subscriber stores these
> > id, relname and schema name in the LogicalRepRelMap hash for which
> > relation id is the key. Subsequent data received in the subscriber
> > will use the relation id received from the publisher and apply the
> > changes in the subscriber.
>
> Hm.  I spent some time cleaning up this patch, and found that there's
> still a problem.  ISTM that the row with value "3" ought to end up
> in the subscriber's sch2.t1 table, but it does not: the attached
> test script fails with
>
> t/100_bugs.pl .. 6/?
> #   Failed test 'check data in subscriber sch2.t1 after schema rename'
> #   at t/100_bugs.pl line 361.
> #  got: ''
> # expected: '3'
> # Looks like you failed 1 test of 9.
> t/100_bugs.pl .. Dubious, test returned 1 (wstat 256, 0x100)
> Failed 1/9 subtests
>
> What's up with that?

When the subscription is created, the subscriber builds a subscription
relation map of the corresponding relations from the publication. That map
will only have an entry for sch1.t1. As sch2.t1 was not present on the
publisher when the subscription was created, the subscription will not have
an entry for it in the relation map, so the insert operations performed on
the new table sch2.t1 will not be applied by the subscriber. We have to
refresh the publication using 'ALTER SUBSCRIPTION ... REFRESH PUBLICATION'
to fetch the missing table information from the publisher. This starts
replication of tables that were added to the subscribed-to publications
since CREATE SUBSCRIPTION or the last invocation of REFRESH PUBLICATION.
I have modified the test to include 'ALTER SUBSCRIPTION ... REFRESH
PUBLICATION' to get the new data. The test should expect both 1 and 3 for
sch2.t1, as the record with value 1 was already inserted before the rename.
The updated v6 patch has the changes for the same.

Regards,
Vignesh
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 7737242516..876adab38e 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -1929,7 +1929,22 @@ init_rel_sync_cache(MemoryContext cachectx)
 
 	Assert(RelationSyncCache != NULL);
 
+	/* We must update the cache entry for a relation after a relcache flush */
 	CacheRegisterRelcacheCallback(rel_sync_cache_relation_cb, (Datum) 0);
+
+	/*
+	 * Flush all cache entries after a pg_namespace change, in case it was a
+	 * schema rename affecting a relation being replicated.
+	 */
+	CacheRegisterSyscacheCallback(NAMESPACEOID,
+  rel_sync_cache_publication_cb,
+  (Datum) 0);
+
+	/*
+	 * Flush all cache entries after any publication changes.  (We need no
+	 * callback entry for pg_publication, because publication_invalidation_cb
+	 * will take care of it.)
+	 */
 	CacheRegisterSyscacheCallback(PUBLICATIONRELMAP,
   rel_sync_cache_publication_cb,
   (Datum) 0);
@@ -2325,8 +2340,8 @@ rel_sync_cache_relation_cb(Datum arg, Oid relid)
 /*
  * Publication relation/schema map syscache invalidation callback
  *
- * Called for invalidations on pg_publication, pg_publication_rel, and
- * pg_publication_namespace.
+ * Called for invalidations on pg_publication, pg_publication_rel,
+ * pg_publication_namespace, and pg_namespace.
  */
 static void
 rel_sync_cache_publication_cb(Datum arg, int cacheid, uint32 hashvalue)
@@ -2337,14 +2352,14 @@ rel_sync_cache_publication_cb(Datum arg, int cacheid, uint32 hashvalue)
 	/*
 	 * We can get here if the plugin was used in SQL interface as the
 	 * RelSchemaSyncCache is destroyed when the decoding finishes, but there
-	 * is no way to unregister the relcache invalidation callback.
+	 * is no way to unregister the invalidation callbacks.
 	 */
 	if (RelationSyncCache == NULL)
 		return;
 
 	/*
-	 * There is no way to find which entry in our cache the hash belongs to so
-	 * mark the whole cache as invalid.
+	 * We have no easy way to identify which cache entries this invalidation
+	 * event might have affected, so just mark them all invalid.
 	 */
 	hash_seq_init(, RelationSyncCache);
 	while ((entry = (RelationSyncEntry *) hash_seq_search()) != NULL)
diff --git a/src/test/subscription/t/100_bugs.pl b/src/test/subscription/t/100_bugs.pl
index 143caac792..ed310184ff 100644
--- a/src/test/subscription/t/100_bugs.pl
+++ 

Re: Notify downstream to discard the streamed transaction which was aborted due to crash.

2023-01-05 Thread Amit Kapila
On Fri, Jan 6, 2023 at 9:25 AM houzj.f...@fujitsu.com
 wrote:
>
>
> To fix it, One idea is to send a stream abort message when we are cleaning up
> crashed transaction on publisher(e.g. in ReorderBufferAbortOld()). And here is
> a tiny patch which changes the same. I have confirmed that the bug is fixed 
> and
> all regression tests pass. I didn't add a testcase because we need to make 
> sure
> the crash happens before all the WAL logged transactions data are decoded 
> which
> doesn't seem easy to write a stable test for this.
>

Your fix looks good to me. Have you tried this in PG-14 where it was introduced?

-- 
With Regards,
Amit Kapila.




Re: wake up logical workers after ALTER SUBSCRIPTION

2023-01-05 Thread Amit Kapila
On Thu, Jan 5, 2023 at 10:49 PM Nathan Bossart  wrote:
>
> On Thu, Jan 05, 2023 at 11:34:37AM +0530, Amit Kapila wrote:
> > On Thu, Jan 5, 2023 at 10:16 AM Nathan Bossart  
> > wrote:
> >> In v12, I moved the restart for two_phase mode to the end of
> >> process_syncing_tables_for_apply() so that we don't need to rely on another
> >> iteration of the loop.
> >
> > This should work but it is better to add a comment before calling
> > CommandCounterIncrement() to indicate that this is for making changes
> > to the relation state visible.
>
> Will do.
>

Isn't it better to move this part into a separate patch as this is
useful even without the main patch to improve wakeups?

> > Thinking along similar lines, won't apply worker need to be notified
> > of SUBREL_STATE_SYNCWAIT state change by the tablesync worker?
>
> wait_for_worker_state_change() should notify the apply worker in this case.
>

I think this is yet to be included in the patch, right?

-- 
With Regards,
Amit Kapila.




Re: Infinite Interval

2023-01-05 Thread jian he
On Fri, Jan 6, 2023 at 6:54 AM Joseph Koshakow  wrote:

> Jian,
>
> I incorporated your changes and updated interval.out and ran
> pgindent. Looks like some of the error messages have changed and we
> have some issues with parsing "+infinity" after rebasing.
>
> - Joe
>

Looks like some of the error messages have changed and we
> have some issues with parsing "+infinity" after rebasing.
>

There is a commit, 2ceea5adb02603ef52579b568ca2c5aebed87358
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=2ceea5adb02603ef52579b568ca2c5aebed87358
If you pull in this commit then you can do select interval '+infinity', even
though I don't know why.


Re: Schema variables - new implementation for Postgres 15 (typo)

2023-01-05 Thread Julien Rouhaud
Hi,

On Fri, Dec 23, 2022 at 08:38:43AM +0100, Pavel Stehule wrote:
>
> I am sending an updated patch, fixing the mentioned issue. Big thanks for
> testing, and checking.

There were multiple reviews in the last weeks which reported some issues, but
unless I'm missing something I don't see any follow up from the reviewers on
the changes?

I will still wait a bit to see if they chime in while I keep looking at the
diff since the last version of the code I reviewed.

But in the meantime I already saw a couple of things that don't look right:

--- a/src/backend/commands/dropcmds.c
+++ b/src/backend/commands/dropcmds.c
@@ -481,6 +481,11 @@ does_not_exist_skipping(ObjectType objtype, Node *object)
msg = gettext_noop("publication \"%s\" does not exist, 
skipping");
name = strVal(object);
break;
+   case OBJECT_VARIABLE:
+   msg = gettext_noop("session variable \"%s\" does not 
exist, skipping");
+   name = NameListToString(castNode(List, object));
+   break;
+   default:

case OBJECT_COLUMN:

the "default:" seems like a thinko during a rebase?

+Datum
+GetSessionVariableWithTypeCheck(Oid varid, bool *isNull, Oid expected_typid)
+{
+   SVariable   svar;
+
+   svar = prepare_variable_for_reading(varid);
+   Assert(svar != NULL && svar->is_valid);
+
+   return CopySessionVariableWithTypeCheck(varid, isNull, expected_typid);
+
+   if (expected_typid != svar->typid)
+   elog(ERROR, "type of variable \"%s.%s\" is different than 
expected",
+
get_namespace_name(get_session_variable_namespace(varid)),
+get_session_variable_name(varid));
+
+   *isNull = svar->isnull;
+
+   return svar->value;
+}

there's an unconditional return in the middle of the function, which also
looks like a thinko during a rebase, since CopySessionVariableWithTypeCheck
mostly contains the same code that follows it.
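
For reference, the thinko could be resolved either way: keeping the inline
checks would look roughly like the sketch below (my reconstruction from the
quoted code; the alternative is to keep only the call to
CopySessionVariableWithTypeCheck and drop the rest):

Datum
GetSessionVariableWithTypeCheck(Oid varid, bool *isNull, Oid expected_typid)
{
	SVariable	svar;

	svar = prepare_variable_for_reading(varid);
	Assert(svar != NULL && svar->is_valid);

	if (expected_typid != svar->typid)
		elog(ERROR, "type of variable \"%s.%s\" is different than expected",
			 get_namespace_name(get_session_variable_namespace(varid)),
			 get_session_variable_name(varid));

	*isNull = svar->isnull;

	return svar->value;
}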

I'm also wondering if there should be additional tests based on the last
scenario reported by Dmitry? (I don't see any new or similar test, but I may
have missed it).

> > > Why do you think so? The variable has no mvcc support - it is just stored
> > > value with local visibility without mvcc support. There can be little bit
> > > similar issues like with global temporary tables.
> >
> > Yeah, sorry for not being precise, I mean global temporary tables. This
> > is not my analysis, I've simply picked up it was mentioned a couple of
> > times here. The points above are not meant to serve as an objection
> > against the patch, but rather to figure out if there are any gaps left
> > to address and come up with some sort of plan with "committed" as a
> > final destination.
> >
>
> There are some similarities, but there are a lot of differences too.
> Handling of metadata is partially similar, but session variable is almost
> the value cached in session memory. It has no statistics, it is not stored
> in a file. Because there is different storage, I don't think there is some
> intersection on implementation level.

+1




Re: pg_ftruncate hardcodes length=0 but only under windows

2023-01-05 Thread Thomas Munro
On Fri, Jan 6, 2023 at 4:16 PM Justin Pryzby  wrote:
> -   ret = ftruncate(fd, 0);
> +   ret = ftruncate(fd, length);

Oops.  Right.  Thanks, pushed.




Notify downstream to discard the streamed transaction which was aborted due to crash.

2023-01-05 Thread houzj.f...@fujitsu.com
Hi,

While developing another feature, I found an existing bug which was reported
by Dilip [1].

Currently, it's possible that we send only a streaming block without sending
an end-of-stream message (stream abort) when decoding and streaming a
transaction that was aborted due to a crash, because we might not WAL-log a
XLOG_XACT_ABORT record for such a crashed transaction. This causes the
subscriber (with streaming=on) to create a stream file but not delete it
until the apply worker restarts.

BUG repro(borrowed from Dilip):
---
1. start 2 servers(config: logical_decoding_work_mem=64kB)
./pg_ctl -D data/ -c -l pub_logs start
./pg_ctl -D data1/ -c -l sub_logs start

2. Publisher:
create table t(a int PRIMARY KEY ,b text);
create publication test_pub for table t
with(PUBLISH='insert,delete,update,truncate');
alter table t replica identity FULL ;

3. Subscription Server:
create table t(a int,b text);
create subscription test_sub CONNECTION 'host=localhost port=1
dbname=postgres' PUBLICATION test_pub WITH ( slot_name =
test_slot_sub1,streaming=on);

4. Publication Server:
begin ;
insert into t values (generate_series(1,5), 'z');  -- (while 
executing this restart publisher in 2-3 secs)

Restart the publication server, while the transaction is still in an
uncommitted state.
./pg_ctl -D data/ -c -l pub_logs restart -mi
---

After restarting the publisher, we can see the subscriber receive a streaming
block and create a stream file(/base/pgsql_tmp/xxx.fileset).

To fix it, one idea is to send a stream abort message when we are cleaning up
a crashed transaction on the publisher (e.g. in ReorderBufferAbortOld()), and
here is a tiny patch which does that. I have confirmed that the bug is fixed
and all regression tests pass. I didn't add a test case because we would need
to make sure the crash happens before all of the WAL-logged transaction data
is decoded, which doesn't seem easy to arrange in a stable test.
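
For reference, the shape of that idea in ReorderBufferAbortOld() is roughly
the following (just a sketch of the approach; the attached patch is the
authoritative change):

	/*
	 * If the transaction was already streamed, notify the downstream to
	 * discard the streamed data, since no XLOG_XACT_ABORT will ever be
	 * decoded for this crashed transaction.
	 */
	if (rbtxn_is_streamed(txn))
		rb->stream_abort(rb, txn, InvalidXLogRecPtr);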

Thoughts ?

[1] 
https://www.postgresql.org/message-id/CAFiTN-sTYk%3Dh75Jn1a7ee%2B5hOcdQFjKGDvF_0NWQQXmoBv4A%2BA%40mail.gmail.com

Best regards,
Hou zj


0001-fix-stream-changes-for-crashed-transaction.patch
Description: 0001-fix-stream-changes-for-crashed-transaction.patch


Re: Minimal logical decoding on standbys

2023-01-05 Thread Andres Freund
Hi,

Thomas, there's one point at the bottom wrt ConditionVariables that'd be
interesting for you to comment on.


On 2023-01-05 16:15:39 -0500, Robert Haas wrote:
> On Tue, Jan 3, 2023 at 2:42 AM Drouvot, Bertrand
>  wrote:
> > Please find attached v36, tiny rebase due to 1de58df4fe.
>
> 0001 looks committable to me now, though we probably shouldn't do that
> unless we're pretty confident about shipping enough of the rest of
> this to accomplish something useful.

Cool!

ISTM that the ordering of patches isn't quite right later on: it doesn't make
sense to introduce working logical decoding without first fixing
WalSndWaitForWal() (i.e. patch 0006). What made you order the patches that
way?

0001:
> 4. We can't rely on the standby's relcache entries for this purpose in
> any way, because the WAL record that causes the problem might be
> replayed before the standby even reaches consistency.

The startup process can't access catalog contents in the first place, so the
consistency issue is secondary.


ISTM that the commit message omits a fairly significant portion of the change:
The introduction of indisusercatalog / the reason for its introduction.

Why is indisusercatalog stored as a "full" column, whereas we store the fact
of a table being used as a catalog table in a reloption? I'm not averse to
moving to a full column, but then I think we should do the same for tables.

Earlier versions of the patches IIRC sourced the "catalogness" from the
relation. What led you to change that? I'm not saying it's wrong, just not
sure it's right either.

It'd be good to introduce cross-checks that indisusercatalog is set
correctly. RelationGetIndexList() seems like a good candidate.

I'd probably split the introduction of indisusercatalog into a separate patch.

Why was HEAP_DEFAULT_USER_CATALOG_TABLE introduced in this patch?


I wonder if we should instead compute a relation's "catalogness" in the
relcache. That would have the advantage of making
RelationIsUsedAsCatalogTable() cheaper and of working for all kinds of
relations.


VISIBILITYMAP_ON_CATALOG_ACCESSIBLE_IN_LOGICAL_DECODING is a very long
identifier. Given that the field in the xlog records is just named
isCatalogRel, any reason to not just name it correspondingly?


0002:


> +/*
> + * Helper for InvalidateConflictingLogicalReplicationSlot -- acquires the 
> given slot
> + * and mark it invalid, if necessary and possible.
> + *
> + * Returns whether ReplicationSlotControlLock was released in the interim 
> (and
> + * in that case we're not holding the lock at return, otherwise we are).
> + *
> + * This is inherently racy, because we release the LWLock
> + * for syscalls, so caller must restart if we return true.
> + */
> +static bool
> +InvalidatePossiblyConflictingLogicalReplicationSlot(ReplicationSlot *s, 
> TransactionId xid)

This appears to be a near complete copy of InvalidatePossiblyObsoleteSlot(). I
don't think we should have two versions of that non-trivial code. Seems we
could just have an additional argument for InvalidatePossiblyObsoleteSlot()?


> + ereport(LOG,
> + (errmsg("invalidating slot \"%s\" 
> because it conflicts with recovery", NameStr(slotname;
> +

I think this should report more details, similar to what
InvalidateObsoleteReplicationSlots() does.


> --- a/src/backend/replication/logical/logicalfuncs.c
> +++ b/src/backend/replication/logical/logicalfuncs.c
> @@ -216,11 +216,14 @@ pg_logical_slot_get_changes_guts(FunctionCallInfo 
> fcinfo, bool confirm, bool bin
>
>   /*
>* After the sanity checks in CreateDecodingContext, make sure 
> the
> -  * restart_lsn is valid.  Avoid "cannot get changes" wording in 
> this
> +  * restart_lsn is valid or both xmin and catalog_xmin are valid.
> +  * Avoid "cannot get changes" wording in this
>* errmsg because that'd be confusingly ambiguous about no 
> changes
>* being available.
>*/
> - if (XLogRecPtrIsInvalid(MyReplicationSlot->data.restart_lsn))
> + if (XLogRecPtrIsInvalid(MyReplicationSlot->data.restart_lsn)
> + || (!TransactionIdIsValid(MyReplicationSlot->data.xmin)
> + && 
> !TransactionIdIsValid(MyReplicationSlot->data.catalog_xmin)))
>   ereport(ERROR,
>   
> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>errmsg("can no longer get changes from 
> replication slot \"%s\"",

Hm. Feels like we should introduce a helper like SlotIsInvalidated() instead
of having this condition in a bunch of places.
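
Something like this, say (the condition is taken from the quoted diff; the
name and placement are just a suggestion):

static inline bool
SlotIsInvalidated(ReplicationSlot *s)
{
	return XLogRecPtrIsInvalid(s->data.restart_lsn) ||
		(!TransactionIdIsValid(s->data.xmin) &&
		 !TransactionIdIsValid(s->data.catalog_xmin));
}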


> + if (!TransactionIdIsValid(MyReplicationSlot->data.xmin)
> +  && !TransactionIdIsValid(MyReplicationSlot->data.catalog_xmin))
> + ereport(ERROR,
> + 
> 

pg_ftruncate hardcodes length=0 but only under windows

2023-01-05 Thread Justin Pryzby
57faaf376 added pg_truncate(const char *path, off_t length), but
"length" is ignored under WIN32 and the file is unconditionally
truncated to 0.

There's no live bug, since the only caller passes 0:

| src/backend/storage/smgr/md.c:  ret = pg_truncate(path, 0);

But I guess extension users could be unhappy under win32, so maybe a fix
should be backpatched.

diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index d4a46f01583..926d000f2ea 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -638,7 +638,7 @@ pg_truncate(const char *path, off_t length)
fd = OpenTransientFile(path, O_RDWR | PG_BINARY);
if (fd >= 0)
{
-   ret = ftruncate(fd, 0);
+   ret = ftruncate(fd, length);
save_errno = errno;
CloseTransientFile(fd);
errno = save_errno;




Re: Add index scan progress to pg_stat_progress_vacuum

2023-01-05 Thread Imseih (AWS), Sami
>Similar to above three cases, vacuum can bypass index vacuuming if
>there are almost zero TIDs. Should we set indexes_total to 0 in this
>case too? If so, I think we can set both indexes_total and
>indexes_completed at the beginning of the index vacuuming/cleanup and
>reset them at the end. 

Unlike the other 3 cases, in which the vacuum and cleanup are totally skipped,
a cleanup still occurs when the index vacuum is bypassed. From what I can tell,
this is to allow for things like a gin pending list cleanup even if the index
is not vacuumed. There could be other reasons as well.

if (bypass)
{
/*
 * There are almost zero TIDs.  Behave as if there were 
precisely
 * zero: bypass index vacuuming, but do index cleanup.
 *
 * We expect that the ongoing VACUUM operation will finish very
 * quickly, so there is no point in considering speeding up as a
 * failsafe against wraparound failure. (Index cleanup is 
expected to
 * finish very quickly in cases where there were no 
ambulkdelete()
 * calls.)
 */
vacrel->do_index_vacuuming = false;
}

So it seems like we should still report the total number of indexes as we are 
currently
doing in the patch.

With that said, the documentation should make this be more clear.

For indexes_total, the description should be:

   Number of indexes that will be vacuumed or cleaned up. This value will be
   0 if there are no indexes to vacuum, INDEX_CLEANUP is set to OFF, or
   vacuum failsafe is triggered.

For indexes_completed, it should be:

   Number of indexes vacuumed in the current vacuum cycle when the
   phase is vacuuming indexes, or the number
   of indexes cleaned up in the cleaning up indexes
   phase.


Regards,

--
Sami Imseih
Amazon Web Services: https://aws.amazon.com



Re: Improve tab completion for ALTER FUNCTION/PROCEDURE/ROUTINE

2023-01-05 Thread vignesh C
On Thu, 5 Jan 2023 at 18:22, Dean Rasheed  wrote:
>
> On Tue, 6 Dec 2022 at 19:12, vignesh C  wrote:
> >
> > On Tue, 6 Dec 2022 at 20:42, Melih Mutlu  wrote:
> > >
> > > Also one little suggestion:
> > >
> > >> + if (ends_with(prev_wd, ')'))
> > >> + COMPLETE_WITH(Alter_routine_options, "CALLED ON NULL INPUT",
> > >> +  "RETURNS NULL ON NULL INPUT", "STRICT", "SUPPORT");
> > >
> > > What do you think about gathering FUNCTION options as you did with 
> > > ROUTINE options.
> > > Something like the following would seem nicer, I think.
> > >
> > >> #define Alter_function_options \
> > >> Alter_routine_options, "CALLED ON NULL INPUT", \
> > >> "RETURNS NULL ON NULL INPUT", "STRICT", "SUPPORT"
> >
> > I did not make it as a macro for alter function options as it is used
> > only in one place whereas the others were required in more than one
> > place.
>
> My feeling is that having this macro somewhat improves readability and
> consistency between the 3 cases, so I think it's worth it, even if
> it's only used once.
>
> I think it slightly improves readability to keep all the arguments to
> Matches() on one line, and that seems to be the style elsewhere, even
> if that makes the line longer than 80 characters.
>
> Also in the interests of readability, I think it's slightly easier to
> follow if the "ALTER PROCEDURE  (...)" and "ALTER ROUTINE 
> (...)" cases are made to immediately follow the "ALTER FUNCTION 
> (...)" case, with the longer/more complex cases following on from
> that.
>
> That leads to the attached, which barring objections, I'll push shortly.

The changes look good to me.

Regards,
Vignesh




Re: Infinite Interval

2023-01-05 Thread Joseph Koshakow
Jian,

I incorporated your changes and updated interval.out and ran
pgindent. Looks like some of the error messages have changed and we
have some issues with parsing "+infinity" after rebasing.

- Joe
From 4bf672f9079322cffde635dff2078582fca55f09 Mon Sep 17 00:00:00 2001
From: Joseph Koshakow 
Date: Sat, 17 Dec 2022 14:21:26 -0500
Subject: [PATCH] This is WIP.

TODOs
1. Various TODOs in code.
2. Correctly implement interval_part for infinite intervals.
3. Fix Tests.
4. Should we just use the months field to test for infinity?
5. Update docs

Ashutosh Bapat and Joe Koshakow
---
 src/backend/utils/adt/date.c   |  20 +
 src/backend/utils/adt/datetime.c   |  14 +-
 src/backend/utils/adt/timestamp.c  | 425 ---
 src/include/datatype/timestamp.h   |  22 +
 src/test/regress/expected/horology.out |   6 +-
 src/test/regress/expected/interval.out | 691 +++--
 src/test/regress/sql/horology.sql  |   6 +-
 src/test/regress/sql/interval.sql  | 191 ++-
 8 files changed, 1264 insertions(+), 111 deletions(-)

diff --git a/src/backend/utils/adt/date.c b/src/backend/utils/adt/date.c
index 99171d9c92..8334b9053f 100644
--- a/src/backend/utils/adt/date.c
+++ b/src/backend/utils/adt/date.c
@@ -2067,6 +2067,11 @@ time_pl_interval(PG_FUNCTION_ARGS)
 	Interval   *span = PG_GETARG_INTERVAL_P(1);
 	TimeADT		result;
 
+	if (INTERVAL_NOT_FINITE(span))
+		ereport(ERROR,
+(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
+ errmsg("cannot add infinite interval to time")));
+
 	result = time + span->time;
 	result -= result / USECS_PER_DAY * USECS_PER_DAY;
 	if (result < INT64CONST(0))
@@ -2085,6 +2090,11 @@ time_mi_interval(PG_FUNCTION_ARGS)
 	Interval   *span = PG_GETARG_INTERVAL_P(1);
 	TimeADT		result;
 
+	if (INTERVAL_NOT_FINITE(span))
+		ereport(ERROR,
+(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
+ errmsg("cannot subtract infinite interval from time")));
+
 	result = time - span->time;
 	result -= result / USECS_PER_DAY * USECS_PER_DAY;
 	if (result < INT64CONST(0))
@@ -2599,6 +2609,11 @@ timetz_pl_interval(PG_FUNCTION_ARGS)
 	Interval   *span = PG_GETARG_INTERVAL_P(1);
 	TimeTzADT  *result;
 
+	if (INTERVAL_NOT_FINITE(span))
+		ereport(ERROR,
+(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
+ errmsg("cannot add infinite interval to time")));
+
 	result = (TimeTzADT *) palloc(sizeof(TimeTzADT));
 
 	result->time = time->time + span->time;
@@ -2621,6 +2636,11 @@ timetz_mi_interval(PG_FUNCTION_ARGS)
 	Interval   *span = PG_GETARG_INTERVAL_P(1);
 	TimeTzADT  *result;
 
+	if (INTERVAL_NOT_FINITE(span))
+		ereport(ERROR,
+(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
+ errmsg("cannot subtract infinite interval from time")));
+
 	result = (TimeTzADT *) palloc(sizeof(TimeTzADT));
 
 	result->time = time->time - span->time;
diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c
index d166613895..4192e7a74b 100644
--- a/src/backend/utils/adt/datetime.c
+++ b/src/backend/utils/adt/datetime.c
@@ -70,7 +70,7 @@ static bool DetermineTimeZoneAbbrevOffsetInternal(pg_time_t t,
   const char *abbr, pg_tz *tzp,
   int *offset, int *isdst);
 static pg_tz *FetchDynamicTimeZone(TimeZoneAbbrevTable *tbl, const datetkn *tp,
-   DateTimeErrorExtra *extra);
+   DateTimeErrorExtra * extra);
 
 
 const int	day_tab[2][13] =
@@ -978,7 +978,7 @@ ParseDateTime(const char *timestr, char *workbuf, size_t buflen,
 int
 DecodeDateTime(char **field, int *ftype, int nf,
 			   int *dtype, struct pg_tm *tm, fsec_t *fsec, int *tzp,
-			   DateTimeErrorExtra *extra)
+			   DateTimeErrorExtra * extra)
 {
 	int			fmask = 0,
 tmask,
@@ -1928,7 +1928,7 @@ DetermineTimeZoneAbbrevOffsetInternal(pg_time_t t, const char *abbr, pg_tz *tzp,
 int
 DecodeTimeOnly(char **field, int *ftype, int nf,
 			   int *dtype, struct pg_tm *tm, fsec_t *fsec, int *tzp,
-			   DateTimeErrorExtra *extra)
+			   DateTimeErrorExtra * extra)
 {
 	int			fmask = 0,
 tmask,
@@ -3233,7 +3233,7 @@ DecodeTimezone(const char *str, int *tzp)
 int
 DecodeTimezoneAbbrev(int field, const char *lowtoken,
 	 int *ftype, int *offset, pg_tz **tz,
-	 DateTimeErrorExtra *extra)
+	 DateTimeErrorExtra * extra)
 {
 	const datetkn *tp;
 
@@ -3635,6 +3635,8 @@ DecodeInterval(char **field, int *ftype, int nf, int range,
 			case DTK_STRING:
 			case DTK_SPECIAL:
 type = DecodeUnits(i, field[i], );
+if (type == UNKNOWN_FIELD)
+	type = DecodeSpecial(i, field[i], );
 if (type == IGNORE_DTF)
 	continue;
 
@@ -4040,7 +4042,7 @@ DecodeUnits(int field, const char *lowtoken, int *val)
  * separate SQLSTATE codes, so ...
  */
 void
-DateTimeParseError(int dterr, DateTimeErrorExtra *extra,
+DateTimeParseError(int dterr, DateTimeErrorExtra * extra,
    const char *str, const char *datatype,
    Node *escontext)
 {
@@ -4919,7 +4921,7 @@ InstallTimeZoneAbbrevs(TimeZoneAbbrevTable *tbl)
  */
 static pg_tz *
 

Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

2023-01-05 Thread Peter Geoghegan
On Thu, Jan 5, 2023 at 3:27 PM Peter Geoghegan  wrote:
> This particular error message is from the hardening added to Postgres
> 15 in commit e7428a99. So it's not surprising that Michail didn't see
> the same error on 14.

Reproduced this on HEAD locally (no docker), without any difficulty.

FWIW, I find that the "Assert(ItemIdIsNormal(lp));" at the top of
heap_lock_tuple() is the first thing that fails on my assert-enabled
build.

-- 
Peter Geoghegan




Re: fix and document CLUSTER privileges

2023-01-05 Thread Nathan Bossart
Apparently I forgot to run all the tests before posting v4.  Here is a new
version of the patch that should pass all tests.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
diff --git a/doc/src/sgml/ref/cluster.sgml b/doc/src/sgml/ref/cluster.sgml
index 145101e6a5..749b410e16 100644
--- a/doc/src/sgml/ref/cluster.sgml
+++ b/doc/src/sgml/ref/cluster.sgml
@@ -67,7 +67,8 @@ CLUSTER [VERBOSE]
   
 
   
-   CLUSTER without any parameter reclusters all the
+   CLUSTER without a
+   table_name reclusters all the
previously-clustered tables in the current database that the calling user
owns or has the MAINTAIN privilege for, or all such tables
if called by a superuser or a role with privileges of the
@@ -134,6 +135,15 @@ CLUSTER [VERBOSE]
  
   Notes
 
+   
+To cluster a table, one must have the MAINTAIN privilege
+on the table or be the table's owner, a superuser, or a role with
+privileges of the
+pg_maintain
+role.  Database-wide clusters and clusters on partitioned tables will skip
+over any tables that the calling user does not have permission to cluster.
+   
+

 In cases where you are accessing single rows randomly
 within a table, the actual order of the data in the
@@ -202,7 +212,8 @@ CLUSTER [VERBOSE]

 Clustering a partitioned table clusters each of its partitions using the
 partition of the specified partitioned index.  When clustering a partitioned
-table, the index may not be omitted.
+table, the index may not be omitted.  CLUSTER on a
+partitioned table cannot be executed inside a transaction block.

 
  
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index f11691aff7..55b699ef05 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -80,6 +80,7 @@ static void copy_table_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex,
 static List *get_tables_to_cluster(MemoryContext cluster_context);
 static List *get_tables_to_cluster_partitioned(MemoryContext cluster_context,
 			   Oid indexOid);
+static bool cluster_is_permitted_for_relation(Oid relid, Oid userid);
 
 
 /*---
@@ -366,8 +367,7 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
 	if (recheck)
 	{
 		/* Check that the user still has privileges for the relation */
-		if (!object_ownercheck(RelationRelationId, tableOid, save_userid) &&
-			pg_class_aclcheck(tableOid, save_userid, ACL_MAINTAIN) != ACLCHECK_OK)
+		if (!cluster_is_permitted_for_relation(tableOid, save_userid))
 		{
 			relation_close(OldHeap, AccessExclusiveLock);
 			goto out;
@@ -1645,8 +1645,7 @@ get_tables_to_cluster(MemoryContext cluster_context)
 
 		index = (Form_pg_index) GETSTRUCT(indexTuple);
 
-		if (!object_ownercheck(RelationRelationId, index->indrelid, GetUserId()) &&
-			pg_class_aclcheck(index->indrelid, GetUserId(), ACL_MAINTAIN) != ACLCHECK_OK)
+		if (!cluster_is_permitted_for_relation(index->indrelid, GetUserId()))
 			continue;
 
 		/* Use a permanent memory context for the result list */
@@ -1695,10 +1694,7 @@ get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
 			continue;
 
 		/* Silently skip partitions which the user has no access to. */
-		if (!object_ownercheck(RelationRelationId, relid, GetUserId()) &&
-			pg_class_aclcheck(relid, GetUserId(), ACL_MAINTAIN) != ACLCHECK_OK &&
-			(!object_ownercheck(DatabaseRelationId, MyDatabaseId, GetUserId()) ||
-			 IsSharedRelation(relid)))
+		if (!cluster_is_permitted_for_relation(relid, GetUserId()))
 			continue;
 
 		/* Use a permanent memory context for the result list */
@@ -1714,3 +1710,16 @@ get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
 
 	return rtcs;
 }
+
+static bool
+cluster_is_permitted_for_relation(Oid relid, Oid userid)
+{
+	if (object_ownercheck(RelationRelationId, relid, userid) ||
+		pg_class_aclcheck(relid, userid, ACL_MAINTAIN) == ACLCHECK_OK)
+		return true;
+
+	ereport(WARNING,
+			(errmsg("permission denied to cluster \"%s\", skipping it",
+	get_rel_name(relid;
+	return false;
+}
diff --git a/src/test/isolation/expected/cluster-conflict-partition.out b/src/test/isolation/expected/cluster-conflict-partition.out
index 7acb675c97..9e31fc06d9 100644
--- a/src/test/isolation/expected/cluster-conflict-partition.out
+++ b/src/test/isolation/expected/cluster-conflict-partition.out
@@ -3,7 +3,7 @@ Parsed test spec with 2 sessions
 starting permutation: s1_begin s1_lock_parent s2_auth s2_cluster s1_commit s2_reset
 step s1_begin: BEGIN;
 step s1_lock_parent: LOCK cluster_part_tab IN SHARE UPDATE EXCLUSIVE MODE;
-step s2_auth: SET ROLE regress_cluster_part;
+step s2_auth: SET ROLE regress_cluster_part; SET client_min_messages = 'ERROR';
 step s2_cluster: CLUSTER cluster_part_tab USING cluster_part_ind; 
 step s1_commit: COMMIT;
 step s2_cluster: <... completed>
@@ -11,7 +11,7 @@ step s2_reset: 

Re: [PATCH] Enable using llvm jitlink as an alternative llvm jit linker of old Rtdyld.

2023-01-05 Thread Alex Fan
There is a discussion at
https://github.com/riscv-non-isa/riscv-toolchain-conventions/issues/13 about
changing the abi default, but it has not received much attention for some
time. The consensus seems to be to set the abi and extensions explicitly.

> I recommend proposing a patch for adding such an API to LLVM.

I would like to try some time later. JITLink allows a lot of flexibility to
inspect each linking process; I feel I don't know enough use cases yet to
propose a good enough C ABI for it.

The thing I am thinking is that these patches to llvm will take some time to
land, especially the abi and extension defaults. But JITLink and ORC for
riscv have been very mature since llvm-15, and even on llvm-14 with two minor
patches. It would be good to have these bits, though ugly, so that the
postgresql jit can work with llvm-15, as most distros are still moving to it.

cheers,
Alex Fan

On Sun, Dec 25, 2022 at 11:02 PM Andres Freund  wrote:

> Hi,
>
> On 2022-11-23 21:13:04 +1100, Alex Fan wrote:
> > > @@ -241,6 +246,40 @@ llvm_mutable_module(LLVMJitContext *context)
> > > context->module = LLVMModuleCreateWithName("pg");
> > > LLVMSetTarget(context->module, llvm_triple);
> > > LLVMSetDataLayout(context->module, llvm_layout);
> > > +#ifdef __riscv
> > > +#if __riscv_xlen == 64
> > > +#ifdef __riscv_float_abi_double
> > > +   abiname = "lp64d";
> > > +#elif defined(__riscv_float_abi_single)
> > > +   abiname = "lp64f";
> > > +#else
> > > +   abiname = "lp64";
> > > +#endif
> > > +#elif __riscv_xlen == 32
> > > +#ifdef __riscv_float_abi_double
> > > +   abiname = "ilp32d";
> > > +#elif defined(__riscv_float_abi_single)
> > > +   abiname = "ilp32f";
> > > +#else
> > > +   abiname = "ilp32";
> > > +#endif
> > > +#else
> > > +   elog(ERROR, "unsupported riscv xlen %d", __riscv_xlen);
> > > +#endif
> > > +   /*
> > > +* set this manually to avoid llvm defaulting to soft
> > > float and
> > > +* resulting in linker error: `can't link double-float
> > > modules
> > > +* with soft-float modules`
> > > +* we could set this for TargetMachine via MCOptions,
> but
> > > there
> > > +* is no C API for it
> > > +* ref:
>
> I think this is something that should go into the llvm code, rather than
> postgres.
>
>
> > > @@ -820,16 +861,21 @@ llvm_session_initialize(void)
> > > elog(DEBUG2, "LLVMJIT detected CPU \"%s\", with features
> \"%s\"",
> > >  cpu, features);
> > >
> > > +#ifdef __riscv
> > > +   reloc=LLVMRelocPIC;
> > > +   codemodel=LLVMCodeModelMedium;
> > > +#endif
>
> Same.
>
>
>
>
> > > +#ifdef USE_JITLINK
> > > +/*
> > > + * There is no public C API to create ObjectLinkingLayer for JITLINK,
> > > create our own
> > > + */
> > > +DEFINE_SIMPLE_CONVERSION_FUNCTIONS(llvm::orc::ExecutionSession,
> > > LLVMOrcExecutionSessionRef)
> > > +DEFINE_SIMPLE_CONVERSION_FUNCTIONS(llvm::orc::ObjectLayer,
> > > LLVMOrcObjectLayerRef)
>
> I recommend proposing a patch for adding such an API to LLVM.
>
>
>
> Greetings,
>
> Andres Freund
>


Re: wake up logical workers after ALTER SUBSCRIPTION

2023-01-05 Thread Nathan Bossart
On Thu, Jan 05, 2023 at 09:29:24AM -0800, Nathan Bossart wrote:
> On Thu, Jan 05, 2023 at 10:57:58AM +0530, Amit Kapila wrote:
>> True, if we want we can use dshash for this.
> 
> I'll look into this.

Here is an attempt at using dshash.  This is quite a bit cleaner since we
don't need garbage collection or the flag in the worker slots.  There is
some extra work required to set up the table, but it doesn't seem too bad.
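
For anyone wondering what the extra setup amounts to: a dshash table needs a
DSA area plus a parameters struct, roughly along these lines (the entry type
and tranche names here are made up for illustration; the attached patch has
the real details):

	static const dshash_parameters dsh_params = {
		sizeof(Oid),					/* key: subscription OID */
		sizeof(LogicalRepWakeupEntry),	/* hypothetical entry struct */
		dshash_memcmp,
		dshash_memhash,
		LWTRANCHE_LAUNCHER_HASH			/* hypothetical tranche */
	};

	dsa_area   *dsa = dsa_create(LWTRANCHE_LAUNCHER_DSA);
	dshash_table *hash = dshash_create(dsa, &dsh_params, NULL);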

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 8853500e9b238b0d07871940d4e7db9e43802b8b Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Mon, 21 Nov 2022 16:01:01 -0800
Subject: [PATCH v13 1/2] wake up logical workers as needed instead of relying
 on periodic wakeups

---
 src/backend/access/transam/xact.c   |  3 ++
 src/backend/commands/alter.c|  7 +++
 src/backend/commands/subscriptioncmds.c |  7 +++
 src/backend/replication/logical/tablesync.c | 49 -
 src/backend/replication/logical/worker.c| 46 +++
 src/include/replication/logicalworker.h |  3 ++
 6 files changed, 93 insertions(+), 22 deletions(-)

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 24221542e7..54145bf805 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -47,6 +47,7 @@
 #include "pgstat.h"
 #include "replication/logical.h"
 #include "replication/logicallauncher.h"
+#include "replication/logicalworker.h"
 #include "replication/origin.h"
 #include "replication/snapbuild.h"
 #include "replication/syncrep.h"
@@ -2360,6 +2361,7 @@ CommitTransaction(void)
 	AtEOXact_PgStat(true, is_parallel_worker);
 	AtEOXact_Snapshot(true, false);
 	AtEOXact_ApplyLauncher(true);
+	AtEOXact_LogicalRepWorkers(true);
 	pgstat_report_xact_timestamp(0);
 
 	CurrentResourceOwner = NULL;
@@ -2860,6 +2862,7 @@ AbortTransaction(void)
 		AtEOXact_HashTables(false);
 		AtEOXact_PgStat(false, is_parallel_worker);
 		AtEOXact_ApplyLauncher(false);
+		AtEOXact_LogicalRepWorkers(false);
 		pgstat_report_xact_timestamp(0);
 	}
 
diff --git a/src/backend/commands/alter.c b/src/backend/commands/alter.c
index 70d359eb6a..4e8102b59f 100644
--- a/src/backend/commands/alter.c
+++ b/src/backend/commands/alter.c
@@ -59,6 +59,7 @@
 #include "commands/user.h"
 #include "miscadmin.h"
 #include "parser/parse_func.h"
+#include "replication/logicalworker.h"
 #include "rewrite/rewriteDefine.h"
 #include "tcop/utility.h"
 #include "utils/builtins.h"
@@ -279,6 +280,12 @@ AlterObjectRename_internal(Relation rel, Oid objectId, const char *new_name)
 		if (strncmp(new_name, "regress_", 8) != 0)
 			elog(WARNING, "subscriptions created by regression test cases should have names starting with \"regress_\"");
 #endif
+
+		/*
+		 * Wake up the logical replication workers to handle this change
+		 * quickly.
+		 */
+		LogicalRepWorkersWakeupAtCommit(objectId);
 	}
 	else if (nameCacheId >= 0)
 	{
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index b9c5df796f..c0fd82cdf2 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -34,6 +34,7 @@
 #include "nodes/makefuncs.h"
 #include "pgstat.h"
 #include "replication/logicallauncher.h"
+#include "replication/logicalworker.h"
 #include "replication/origin.h"
 #include "replication/slot.h"
 #include "replication/walreceiver.h"
@@ -1362,6 +1363,9 @@ AlterSubscription(ParseState *pstate, AlterSubscriptionStmt *stmt,
 
 	InvokeObjectPostAlterHook(SubscriptionRelationId, subid, 0);
 
+	/* Wake up the logical replication workers to handle this change quickly. */
+	LogicalRepWorkersWakeupAtCommit(subid);
+
 	return myself;
 }
 
@@ -1733,6 +1737,9 @@ AlterSubscriptionOwner_internal(Relation rel, HeapTuple tup, Oid newOwnerId)
 			  form->oid, 0);
 
 	ApplyLauncherWakeupAtCommit();
+
+	/* Wake up the logical replication workers to handle this change quickly. */
+	LogicalRepWorkersWakeupAtCommit(form->oid);
 }
 
 /*
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 2fdfeb5b4c..8c8f27ebcf 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -415,6 +415,7 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
 	static HTAB *last_start_times = NULL;
 	ListCell   *lc;
 	bool		started_tx = false;
+	bool		should_exit = false;
 
 	Assert(!IsTransactionState());
 
@@ -446,28 +447,6 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
 		last_start_times = NULL;
 	}
 
-	/*
-	 * Even when the two_phase mode is requested by the user, it remains as
-	 * 'pending' until all tablesyncs have reached READY state.
-	 *
-	 * When this happens, we restart the apply worker and (if the conditions
-	 * are still ok) then the two_phase tri-state will become 'enabled' at
-	 * that time.
-	 *
-	 * Note: If the subscription has no tables then leave the state as
-	 * PENDING, 

Re: A varint implementation for PG?

2023-01-05 Thread Robert Haas
Andres asked me off-list if I could take another look at this.

So here's a bit of review:

- The header comment at the top of the file gives some examples of how
the encoding works, and then basically says, oh wait, there's also a
sign bit at the end, so all those examples are actually wrong. It's
really this other thing. Perhaps it's possible to reword things to
avoid that.

- The XXX comment in the file header says "variants" where it probably
means "varints". A later comment mentions "lenght" instead of
"length".

- In pg_varint_encode_uint64, where it says "XXX: I'm sure there's a
neater way to do this," I'm not sure exactly what your parameters for
neater are, but you could avoid the conditional if you just did:

bits_of_output_data = bits_of_input_data + bytes_of_input_data;
bytes_of_output_data = (bits_of_output_data + BITS_PER_BYTE - 1) /
BITS_PER_BYTE;

I think the comment implies that you thought of that and discarded it
on performance grounds, but I'm not quite sure that I'm right about
that, so I mention it here.

I also thought of another approach that doesn't require computing
bytes_of_input_data:

bytes_of_output_data = bits_of_input_data / 7 + 1;

The intuition here is that every byte you add to the output gives you
room for 7 additional data bits, because you gain 8 for the new byte
but also have to consume 1 bit from the first byte, for a net gain of
7. This formula gives the wrong answer when bits_of_input_data is 63
or 64, but you don't need to use it for values greater than
PG_VARINT_UINT64_MAX_8BYTE_VAL, so that doesn't matter.

- It's a bit surprising that the length argument to memcpy() is a
constant in the 8-bytes-or-less case. It should be fine, because the
output buffer must be big enough to hold at least 9 more bytes, so all
that can happen is we write unnecessary bytes that the caller can
later overwrite, or not. But it would be worth a comment, I think. In
particular, we should note that you should always have enough
allocated space in the output buffer for the maximum width of 9 bytes
even if you know the number you're actually encoding is small. You do
mention this in the decoding function, but not on the encoding side.

- The FIXME comment for pg_varint_decode_uint64 no verb.

- ret <<= BITS_PER_BYTE * (BITS_PER_BYTE - bytes) is awfully hard to
reason about. My first thought was that it had to be wrong: if we are
shifting left during encoding, how can we also be shifting left during
decoding? Actually I think I see, now, how it works on a little-Endian
machine. Logically, the encoded bytes are the least-significant bytes.
We copy them to the least-significant bytes of "ret," but they're in
the wrong order, with the most significant byte of the encoded
representation in the last significant byte of "ret". By shifting
left, we move all the bytes to the other "end" of "ret", and then the
byte swap puts them back where they were, but now in the right order.
Along the way, the garbage bits we're supposed to be ignoring get
thrown away and replaced with zeroes. But I still don't see how it
works on a big-Endian machine. On such a machine, we copy the encoded
bytes, which are still the least significant bytes of the original
value, to the most significant bytes of "ret". The byte swap isn't
going to do anything here, so in this case it feels like the shift
needs to be in the other direction -- a shift right. But maybe not,
because originally I thought it should be a shift right in both cases,
and now I don't think that's right.

- Re "mask out length indicator bits & separator bit", only the
separator bit actually is being or needs to be masked out.

- I think the overall interface is fine. I think it might be useful to
add a function that returns the length to which something would encode
without actually encoding it, for cases where you want to estimate how
much space you're going to need for something so that you can allocate
space for it, and then only afterwards do the encoding for real.
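
Something like the following usage pattern is what that last point is getting
at (the helper's name and signature are invented here for illustration; the
patch would pick its own):

	/* hypothetical: int pg_varint_uint64_encoded_length(uint64 value); */
	Size		needed = 0;

	for (int i = 0; i < nvals; i++)
		needed += pg_varint_uint64_encoded_length(vals[i]);
	buf = palloc(needed);
	/* ... then a second pass does the actual encoding into buf ... */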

That's all I've got.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-05 Thread David Rowley
On Thu, 5 Jan 2023 at 19:14, Ankit Kumar Pandey  wrote:
> We are already doing something like I mentioned.
>
> Consider this example:
>
> explain SELECT rank() OVER (ORDER BY a), count(*) OVER (ORDER BY a,b)
> FROM abcd;
>  QUERY PLAN
> --
>   WindowAgg  (cost=83.80..127.55 rows=1250 width=24)
> ->  WindowAgg  (cost=83.80..108.80 rows=1250 width=16)
>   ->  Sort  (cost=83.80..86.92 rows=1250 width=8)
> Sort Key: a, b
> ->  Seq Scan on abcd  (cost=0.00..19.50 rows=1250 width=8)
> (5 rows)
>
>
> If it is okay to do extra sort for first window function (rank) here,
> why would it be
>
> any different in case which I mentioned?

We *can* reuse Sorts where a more strict or equivalent sort order is
available.  The question is how do we get the final WindowClause to do
something slightly more strict to save having to do anything for the
ORDER BY.  One way you might think would be to adjust the
WindowClause's orderClause to add the additional clauses, but that
cannot be done because it would cause are_peers() in nodeWindowAgg.c
to not count some rows as peers when they should be, given the less
strict orderClause in the WindowClause.

It might be possible to adjust create_one_window_path() so that when
processing the final WindowClause that it looks at the DISTINCT or
ORDER BY clause to see if we can sort on a few extra columns to save
having to do any further sorting.  We just *cannot* make any
adjustments to the WindowClause's orderClause.

David




Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

2023-01-05 Thread Peter Geoghegan
On Thu, Jan 5, 2023 at 1:49 PM Matthias van de Meent
 wrote:
> client 29 script 0 aborted in command 2 query 0: ERROR:  heap tid from index 
> tuple (111,1) points past end of heap page line pointer array at offset 262 
> of block 1 in index "something_is_wrong_here_pkey"

This particular error message is from the hardening added to Postgres
15 in commit e7428a99. So it's not surprising that Michail didn't see
the same error on 14.

-- 
Peter Geoghegan




Re: Generate pg_stat_get_xact*() functions with Macros

2023-01-05 Thread Andres Freund
Hi,

On 2023-01-05 15:19:54 -0500, Corey Huinker wrote:
> It does get me wondering, however, if we reordered the three typedefs to
> group like-typed registers together, we could make them an array with the
> names becoming defined constant index values (or keeping them via a union),
> then the typedefs effectively become:

I think that'd make it substantially enough harder to work with the
datastructures that I don't want to go there.


The "more fundamental" approach would be to switch to using a table-returning
function for accessing these stat values. When just accessing a single counter
or two, the current approach avoids the overhead of having to construct a
tuple. But after that the overhead of having to fetch the stats data (i.e. a
hash table lookup, potentially some locking) multiple times takes over.

Unfortunately there's currently no way to dynamically switch between those
behaviours.

Greetings,

Andres Freund




Re: [PATCH] Enable using llvm jitlink as an alternative llvm jit linker of old Rtdyld.

2023-01-05 Thread Alex Fan
> why should we do this only for riscv?
I originally considered riscv because I only have amd64 and riscv hardware
for testing. But afaik, JITLink currently supports arm64 and x86-64 (ELF and
MacOS) and riscv (both 64-bit and 32-bit, ELF); i386 also seems to be
supported. No support for ppc or mips exists yet. I am most familiar with
riscv, and its support has been quite stable. I also know the Julia folks
have been using orcjit only for arm64 on MacOS for quite a while, but stayed
on mcjit for other platforms. clasp also uses orcjit on x86 (ELF and macos)
heavily. I'd say it should be safe to switch on x86 and arm64 also.

> that is necessary for RISCV because they haven't got around to doing this
> yet
I doubt the RuntimeDyld patch will eventually be accepted, since mcjit with
its RuntimeDyld is in maintenance mode. Users are generally advised to switch
to orcjit.

I am not familiar with Windows either. I realise from your link that
RuntimeDyld does have Windows COFF support; I have clearly misread something
on Discord. There is a recent talk about JITLink Windows support that covers
a lot of details. I think the situation is that RuntimeDyld's support for
COFF is not stable and is limited to the large code model (a RuntimeDyld
constraint), so people tend to use the ELF format on Windows with
RuntimeDyld. JITLink's recent COFF support is more stable and allows the
small code model.

The logic of the abi and extensions is indeed confusing, and the llvm backend
also has some nuances. I will try to explain it as best I can, and am very
happy to clarify more, based on my knowledge, if there are more questions.

'm', 'i', 'a', 'f' and 'd' are all extension features and are usually grouped
together as 'g' for "general". Together with 'c' for compression (and two tiny
extension sets, Zicsr and Zifencei, split from 'i' since ISA spec 20191213;
llvm will still enable them by default along with 'i', so they are not
relevant to us), this is named rv64gc for a 64-bit machine. rv64gc is
generally the recommended base set for Linux-capable 64-bit machines. Some
embedded cores without +f +d still work with Linux, but they are rare and
unlikely to want postgresql.
The abi name, like 'lp64' or 'lp64d', is considered independent from the cpu
extensions. You can have a "+f +d" machine with the lp64 abi, meaning the
function body can use +f +d instructions and registers, but the function
signature cannot. To set the abi explicitly for the llvm backend, we need to
pass MCOptions.ABIName or a module metadata entry "target-abi".
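
From C, the module metadata route would look roughly like this (a sketch
using the llvm-c API; I haven't checked that this is exactly what the patch
ends up doing):

	/* record the ABI as module metadata so the riscv backend picks it up */
	LLVMMetadataRef abi_md;

	abi_md = LLVMMDStringInContext2(LLVMGetModuleContext(context->module),
									abiname, strlen(abiname));
	LLVMAddModuleFlag(context->module, LLVMModuleFlagBehaviorOverride,
					  "target-abi", strlen("target-abi"), abi_md);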

If the abi is missing, before llvm-15 the backend defaults to lp64 on a
64-bit platform, ignoring whether any float extension is enabled. The
consensus seemed to be that the backend should be explicitly configured.
Since llvm-15, specifically this commit, it chooses lp64d if +d is present,
to align with the clang default. I mostly test on llvm-14 because JITLink
riscv support is already complete there except for some minor fixes, and I
did not notice the new change until now. But because I test on Gentoo, which
has an lp64 abi build on riscv64gc, if the abi is not configured this way I
would end up with lp64d (enabled by the +d extension of riscv64gc) on an
lp64 build.

> your patch seems to imply that LLVM is not able to get features reliably
> for RISCV -- why not, immaturity or technical reason why it can't?
Immaturity. Actually, it is unimplemented for riscv. Because gethostcpuname
usually returns generic or generic-rv64, the feature list for these is
basically empty except for 'i'. I may work out a patch for llvm later.

> wouldn't the subtarget/ABI be the same as the one that the LLVM library
> itself was compiled for
llvm is inherently a multi-target, cross-platform compiler backend. It is
capable of all subtargets and features for the enabled platform(s). The
target triple works like you said: there is LLVM_DEFAULT_TARGET_TRIPLE, which
sets the default to native if no target is specified at runtime, so the
default triple is reliable. But the cpu name, abi and extensions don't follow
this logic. The llvm riscv backend devs expect these to be configured
explicitly (although there are some defaults and dependencies, and frontends
like clang also have defaults). Therefore gethostcpuname is needed, and the
feature extensions are derived from the known cpu name. In case the cpu name
from gethostcpuname is not enough, gethostcpufeatures is needed, like your
example of the AVX extension.

> why we don't need to deal with this kind of manual subtarget selection on
> any other architecture
ppc sets the default abi here, so there is no abi issue. Big endian or little
endian is encoded in the target triple, like ppc64 (big endian), ppc64le
(little endian), and a recent riscv64be patch. I guess that is why there are 

Re: Logical replication - schema change not invalidating the relation cache

2023-01-05 Thread Tom Lane
vignesh C  writes:
> On Thu, 5 Jan 2023 at 03:17, Tom Lane  wrote:
>> The bigger picture here though is that in examples such as the one
>> you gave at the top of the thread, it's not very clear to me that
>> there's *any* principled behavior.  If the connection between publisher
>> and subscriber tables is only the relation name, fine ... but exactly
>> which relation name applies?

> The connection between publisher and subscriber table is based on
> relation id, During the first change relid, relname and schema name
> from publisher will be sent to the subscriber. Subscriber stores these
> id, relname and schema name in the LogicalRepRelMap hash for which
> relation id is the key. Subsequent data received in the subscriber
> will use the relation id received from the publisher and apply the
> changes in the subscriber.

Hm.  I spent some time cleaning up this patch, and found that there's
still a problem.  ISTM that the row with value "3" ought to end up
in the subscriber's sch2.t1 table, but it does not: the attached
test script fails with

t/100_bugs.pl .. 6/? 
#   Failed test 'check data in subscriber sch2.t1 after schema rename'
#   at t/100_bugs.pl line 361.
#  got: ''
# expected: '3'
# Looks like you failed 1 test of 9.
t/100_bugs.pl .. Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/9 subtests 

What's up with that?

regards, tom lane

diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 7737242516..876adab38e 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -1929,7 +1929,22 @@ init_rel_sync_cache(MemoryContext cachectx)
 
 	Assert(RelationSyncCache != NULL);
 
+	/* We must update the cache entry for a relation after a relcache flush */
 	CacheRegisterRelcacheCallback(rel_sync_cache_relation_cb, (Datum) 0);
+
+	/*
+	 * Flush all cache entries after a pg_namespace change, in case it was a
+	 * schema rename affecting a relation being replicated.
+	 */
+	CacheRegisterSyscacheCallback(NAMESPACEOID,
+  rel_sync_cache_publication_cb,
+  (Datum) 0);
+
+	/*
+	 * Flush all cache entries after any publication changes.  (We need no
+	 * callback entry for pg_publication, because publication_invalidation_cb
+	 * will take care of it.)
+	 */
 	CacheRegisterSyscacheCallback(PUBLICATIONRELMAP,
   rel_sync_cache_publication_cb,
   (Datum) 0);
@@ -2325,8 +2340,8 @@ rel_sync_cache_relation_cb(Datum arg, Oid relid)
 /*
  * Publication relation/schema map syscache invalidation callback
  *
- * Called for invalidations on pg_publication, pg_publication_rel, and
- * pg_publication_namespace.
+ * Called for invalidations on pg_publication, pg_publication_rel,
+ * pg_publication_namespace, and pg_namespace.
  */
 static void
 rel_sync_cache_publication_cb(Datum arg, int cacheid, uint32 hashvalue)
@@ -2337,14 +2352,14 @@ rel_sync_cache_publication_cb(Datum arg, int cacheid, uint32 hashvalue)
 	/*
 	 * We can get here if the plugin was used in SQL interface as the
 	 * RelSchemaSyncCache is destroyed when the decoding finishes, but there
-	 * is no way to unregister the relcache invalidation callback.
+	 * is no way to unregister the invalidation callbacks.
 	 */
 	if (RelationSyncCache == NULL)
 		return;
 
 	/*
-	 * There is no way to find which entry in our cache the hash belongs to so
-	 * mark the whole cache as invalid.
+	 * We have no easy way to identify which cache entries this invalidation
+	 * event might have affected, so just mark them all invalid.
 	 */
 	hash_seq_init(, RelationSyncCache);
 	while ((entry = (RelationSyncEntry *) hash_seq_search()) != NULL)
diff --git a/src/test/subscription/t/100_bugs.pl b/src/test/subscription/t/100_bugs.pl
index 143caac792..de827f8777 100644
--- a/src/test/subscription/t/100_bugs.pl
+++ b/src/test/subscription/t/100_bugs.pl
@@ -70,9 +70,10 @@ $node_publisher->wait_for_catchup('sub1');
 pass('index predicates do not cause crash');
 
 # We'll re-use these nodes below, so drop their replication state.
-# We don't bother to drop the tables though.
 $node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION sub1");
 $node_publisher->safe_psql('postgres', "DROP PUBLICATION pub1");
+# Drop the tables too.
+$node_publisher->safe_psql('postgres', "DROP TABLE tab1");
 
 $node_publisher->stop('fast');
 $node_subscriber->stop('fast');
@@ -307,6 +308,58 @@ is( $node_subscriber->safe_psql(
 	qq(-1|1),
 	"update works with REPLICA IDENTITY");
 
+# Clean up
+$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
+$node_publisher->safe_psql('postgres', "DROP PUBLICATION tap_pub");
+$node_publisher->safe_psql('postgres', "DROP TABLE tab_replidentity_index");
+$node_subscriber->safe_psql('postgres', "DROP TABLE tab_replidentity_index");
+
+# Test schema invalidation by renaming the schema
+
+# Create tables on publisher
+$node_publisher->safe_psql('postgres', "CREATE SCHEMA sch1");
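
(For readers following along, a rough SQL sketch of the scenario the test
exercises; this is illustrative only, not the TAP test verbatim. After the
initial sync, the schema is renamed on both nodes and a new row is inserted
on the publisher, which is then expected to land in the subscriber's renamed
table.)

CREATE SCHEMA sch1;
CREATE TABLE sch1.t1 (a int PRIMARY KEY);
CREATE PUBLICATION pub FOR TABLE sch1.t1;
-- ... the subscriber creates matching objects and a subscription,
-- and the initial rows replicate into its sch1.t1 ...

ALTER SCHEMA sch1 RENAME TO sch2;   -- on both publisher and subscriber

-- this row ought to end up in the subscriber's sch2.t1
INSERT INTO sch2.t1 VALUES (3);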

Re: postgres_fdw: using TABLESAMPLE to collect remote sample

2023-01-05 Thread Tom Lane
Tomas Vondra  writes:
> There are no callers outside postgresAcquireSampleRowsFunc, so what
> about renaming them like this?

>   deparseAnalyzeTuplesSql
> -> deparseAnalyzeInfoSql

>   postgresCountTuplesForForeignTable
> -> postgresGetAnalyzeInfoForForeignTable

WFM.

regards, tom lane




Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

2023-01-05 Thread Matthias van de Meent
On Thu, 5 Jan 2023 at 14:12, Michail Nikolaev 
wrote:
>
> Hello, hackers.
>
> It seems like PG 14 works incorrectly with vacuum_defer_cleanup_age
> (or just not cleared rows, not sure) and SELECT FOR UPDATE + UPDATE.
> I am not certain, but hot_standby_feedback probably able to cause the
> same issues.
>
> Steps to reproduce:
>
> [steps]
>
> I was able to see such a set of errors (looks scary):
>
> ERROR:  MultiXactId 30818104 has not been created yet -- apparent
wraparound
> ERROR:  could not open file "base/13757/16385.1" (target block
> 39591744): previous segment is only 24 blocks

This looks quite suspicious too - it wants to access a block at 296GB of
data, where only 196kB exist.

> ERROR:  attempted to lock invisible tuple
> ERROR:  could not access status of transaction 38195704
> DETAIL:  Could not open file "pg_subtrans/0246": No such file or
directory.

I just saw two instances of this "attempted to lock invisible tuple" error
for the 15.1 image (run on Docker in Ubuntu in WSL) with your reproducer
script, so this does not seem to be specific to PG14 (.6).

And, after some vacuum and restarting the process, I got the following:

client 29 script 0 aborted in command 2 query 0: ERROR:  heap tid from
index tuple (111,1) points past end of heap page line pointer array at
offset 262 of block 1 in index "something_is_wrong_here_pkey"

There is indeed something wrong there; the page can't be read by
pageinspect:

$ select get_raw_page('public.something_is_wrong_here', 111)::bytea;
ERROR:  invalid page in block 111 of relation base/5/16385

I don't have access to the v14 data anymore (I tried a restart, which
dropped the data :-( ), but will retain my v15 instance for some time to
help any debugging.

Kind regards,

Matthias van de Meent


Re: postgres_fdw: using TABLESAMPLE to collect remote sample

2023-01-05 Thread Tomas Vondra



On 1/5/23 22:05, Tom Lane wrote:
> Tomas Vondra  writes:
>> The second patch adds the relkind check, so that we only issue
>> TABLESAMPLE on valid relation types (tables, matviews). But I'm not sure
>> we actually need it - the example presented earlier was foreign table
>> pointing to a sequence. But that actually works fine even without this
> patch, because for sequences we have reltuples=1, which disables
>> sampling due to the check mentioned above (because targrows >= 1).
> 
> Seems pretty accidental still.
> 
>> The other issue with this patch is that it seems wrong to check the
>> relkind value locally - instead we should check this on the remote
>> server, because that's where we'll run TABLESAMPLE. Currently there are
>> no differences between versions, but what if we implement TABLESAMPLE
>> for another relkind in the future? Then we'll either fail to use
>> sampling or (worse) we'll try using TABLESAMPLE for unsupported relkind,
>> depending on which of the servers has newer version.
> 
> We don't have a way to do that, and I don't think we need one.  The
> worst case is that we don't try to use TABLESAMPLE when the remote
> server is a newer version that would allow it.  If we do extend the
> set of supported relkinds, we'd need a version-specific test in
> postgres_fdw so that we don't wrongly use TABLESAMPLE with an older
> remote server ... but that's hardly complicated, or a new concept.
> 
> On the other hand, if we let the code stand, the worst case is that
> ANALYZE fails because we try to apply TABLESAMPLE in an unsupported
> case, presumably with some remote relkind that doesn't exist today.
> I think that's clearly worse than not exploiting TABLESAMPLE
> although we could (with some other new remote relkind).
> 

OK

> As far as the patch details go: I'd write 0001 as
> 
> +Assert(sample_frac >= 0.0 && sample_frac <= 1.0);
> 
> The way you have it seems a bit contorted.

Yeah, that's certainly cleaner.

> As for 0002, personally
> I'd rename the affected functions to reflect their expanded duties,
> and they *definitely* require adjustments to their header comments.
> Functionally it looks fine though.
> 

No argument about the header comments, ofc - I just haven't done that in
the WIP patch. As for the renaming, any suggestions for the new names?
The patch tweaks two functions in a way that affects what they return:

1) deparseAnalyzeTuplesSql - adds relkind to the "reltuples" SQL

2) postgresCountTuplesForForeignTable - adds can_sample flag

There are no callers outside postgresAcquireSampleRowsFunc, so what
about renaming them like this?

  deparseAnalyzeTuplesSql
-> deparseAnalyzeInfoSql

  postgresCountTuplesForForeignTable
-> postgresGetAnalyzeInfoForForeignTable


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Cygwin cleanup

2023-01-05 Thread Thomas Munro
On Fri, Jan 6, 2023 at 1:22 AM Thomas Munro  wrote:
> https://cirrus-ci.com/task/5142810819559424 [still running at time of writing]
>
> Gotta run, but I'll check again in the morning to see if that does better...

Yes!  Two successful runs with all TAP tests so far.  So it looks like
we can probably stop lorikeet's spurious failures, by happy
coincidence due to other work, and we could seriously consider
committing this optional CI test for it, much like we have the
optional MSYS build.  Any interest in producing a tidied up version of
the patch, Justin?  Or I can, but I'll go and work on other things
first.

I pushed a change to switch the semaphore implementation.  I haven't
personally seen that failure mode on lorikeet, but I would guess
that's because (1) it's only running a tiny subset of the tests, (2)
it crashes for the other reason with higher likelihood, and/or (3)
it's not using much concurrency yet because the build farm doesn't use
meson yet.




Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

2023-01-05 Thread Michail Nikolaev
Hello, Andres.

I apologize for the direct ping, but I think your snapshot scalability
work in PG14 could be related to the issue.

The TransactionIdRetreatedBy implementation looks correct... But with
txid_current=212195 I see errors like "could not access status of
transaction 58643736"...
So, maybe vacuum_defer_cleanup_age just highlights some special case
(something with "previous" xids on the left side of zero?)

Thanks,
Michail.




Re: Add tracking of backend memory allocated to pg_stat_activity

2023-01-05 Thread Reid Thompson
On Thu, 2023-01-05 at 14:13 -0600, Justin Pryzby wrote:
> 
> I suggest to close the associated CF entry.

Closed with status of Withdrawn. If that is invalid, please advise.

> (Also, the people who participated in this thread may want to be
> included in the other thread going forward.)

Explicitly adding previously missed participants to Cc: - the
conversation/patches are being consolidated into the thread "Add the ability
to limit the amount of memory that can be allocated to backends":
https://www.postgresql.org/message-id/bd57d9a4c219cc1392665fd5fba61dde8027b3da.ca...@crunchydata.com



Thanks,
Reid

-- 
Reid Thompson
Senior Software Engineer
Crunchy Data, Inc.

reid.thomp...@crunchydata.com
www.crunchydata.com





Re: ExecRTCheckPerms() and many prunable partitions (checkAsUser)

2023-01-05 Thread Tom Lane
Justin Pryzby  writes:
> On Wed, Dec 21, 2022 at 01:44:11PM -0600, Justin Pryzby wrote:
>> Alvaro could you comment on this ?

I believe Alvaro's on vacation for a few days more.

regards, tom lane




Re: Minimal logical decoding on standbys

2023-01-05 Thread Robert Haas
On Tue, Jan 3, 2023 at 2:42 AM Drouvot, Bertrand
 wrote:
> Please find attached v36, tiny rebase due to 1de58df4fe.

0001 looks committable to me now, though we probably shouldn't do that
unless we're pretty confident about shipping enough of the rest of
this to accomplish something useful.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: ExecRTCheckPerms() and many prunable partitions (checkAsUser)

2023-01-05 Thread Justin Pryzby
On Wed, Dec 21, 2022 at 01:44:11PM -0600, Justin Pryzby wrote:
> Alvaro could you comment on this ?

I added here so it's not forgotten.
https://commitfest.postgresql.org/42/4107/




Re: postgres_fdw: using TABLESAMPLE to collect remote sample

2023-01-05 Thread Tom Lane
Tomas Vondra  writes:
> The second patch adds the relkind check, so that we only issue
> TABLESAMPLE on valid relation types (tables, matviews). But I'm not sure
> we actually need it - the example presented earlier was foreign table
> pointing to a sequence. But that actually works fine even without this
> patch, because for sequences we have reltuples=1, which disables
> sampling due to the check mentioned above (because targrows >= 1).

Seems pretty accidental still.

> The other issue with this patch is that it seems wrong to check the
> relkind value locally - instead we should check this on the remote
> server, because that's where we'll run TABLESAMPLE. Currently there are
> no differences between versions, but what if we implement TABLESAMPLE
> for another relkind in the future? Then we'll either fail to use
> sampling or (worse) we'll try using TABLESAMPLE for unsupported relkind,
> depending on which of the servers has newer version.

We don't have a way to do that, and I don't think we need one.  The
worst case is that we don't try to use TABLESAMPLE when the remote
server is a newer version that would allow it.  If we do extend the
set of supported relkinds, we'd need a version-specific test in
postgres_fdw so that we don't wrongly use TABLESAMPLE with an older
remote server ... but that's hardly complicated, or a new concept.

On the other hand, if we let the code stand, the worst case is that
ANALYZE fails because we try to apply TABLESAMPLE in an unsupported
case, presumably with some remote relkind that doesn't exist today.
I think that's clearly worse than not exploiting TABLESAMPLE
although we could (with some other new remote relkind).

As far as the patch details go: I'd write 0001 as

+Assert(sample_frac >= 0.0 && sample_frac <= 1.0);

The way you have it seems a bit contorted.  As for 0002, personally
I'd rename the affected functions to reflect their expanded duties,
and they *definitely* require adjustments to their header comments.
Functionally it looks fine though.

regards, tom lane




Re: postgres_fdw: using TABLESAMPLE to collect remote sample

2023-01-05 Thread Tomas Vondra
Hi,

here's two minor postgres_fdw patches, addressing the two issues
discussed last week.


1) 0001-Fix-stale-comment-in-postgres_fdw.patch

The first one cleans up the comment referencing the sample rate
adjustment. While doing that, I realized without the adjustment we
should never get sample_rate outside the valid range because we do:

if ((reltuples <= 0) || (targrows >= reltuples))
method = ANALYZE_SAMPLE_OFF;

So I removed the clamping, or rather replaced it with an assert.


2) 0002-WIP-check-relkind-in-FDW-analyze.patch

The second patch adds the relkind check, so that we only issue
TABLESAMPLE on valid relation types (tables, matviews). But I'm not sure
we actually need it - the example presented earlier was foreign table
pointing to a sequence. But that actually works fine even without this
patch, because for sequences we have reltuples=1, which disables
sampling due to the check mentioned above (because targrows >= 1).
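
For illustration, with the 0002 patch attached below, the metadata query that
ANALYZE sends to the remote server becomes roughly the following (the table
name is just an example):

SELECT reltuples, relkind
FROM pg_catalog.pg_class
WHERE oid = 'public.some_table'::pg_catalog.regclass;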

The other issue with this patch is that it seems wrong to check the
relkind value locally - instead we should check this on the remote
server, because that's where we'll run TABLESAMPLE. Currently there are
no differences between versions, but what if we implement TABLESAMPLE
for another relkind in the future? Then we'll either fail to use
sampling or (worse) we'll try using TABLESAMPLE for unsupported relkind,
depending on which of the servers has newer version.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From 681b3569278bc55b4e2bbfcf129245702616ecd9 Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Thu, 5 Jan 2023 21:13:20 +0100
Subject: [PATCH 2/2] WIP: check relkind in FDW analyze

---
 contrib/postgres_fdw/deparse.c  |  2 +-
 contrib/postgres_fdw/postgres_fdw.c | 19 +++
 2 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 4461fb06f02..9f39d792280 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -2383,7 +2383,7 @@ deparseAnalyzeTuplesSql(StringInfo buf, Relation rel)
 	initStringInfo();
 	deparseRelation(, rel);
 
-	appendStringInfoString(buf, "SELECT reltuples FROM pg_catalog.pg_class WHERE oid = ");
+	appendStringInfoString(buf, "SELECT reltuples, relkind FROM pg_catalog.pg_class WHERE oid = ");
 	deparseStringLiteral(buf, relname.data);
 	appendStringInfoString(buf, "::pg_catalog.regclass");
 }
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 65822b8a9aa..50b03067152 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -4978,7 +4978,7 @@ postgresAnalyzeForeignTable(Relation relation,
  *		Count tuples in foreign table (just get pg_class.reltuples).
  */
 static double
-postgresCountTuplesForForeignTable(Relation relation)
+postgresCountTuplesForForeignTable(Relation relation, bool *can_sample)
 {
 	ForeignTable *table;
 	UserMapping *user;
@@ -4986,6 +4986,10 @@ postgresCountTuplesForForeignTable(Relation relation)
 	StringInfoData sql;
 	PGresult   *volatile res = NULL;
 	volatile double reltuples = -1;
+	volatile char relkind = 0;
+
+	/* assume the relkind can't be sampled */
+	*can_sample = false;
 
 	/*
 	 * Get the connection to use.  We do the remote access as the table's
@@ -5008,9 +5012,10 @@ postgresCountTuplesForForeignTable(Relation relation)
 		if (PQresultStatus(res) != PGRES_TUPLES_OK)
 			pgfdw_report_error(ERROR, res, conn, false, sql.data);
 
-		if (PQntuples(res) != 1 || PQnfields(res) != 1)
+		if (PQntuples(res) != 1 || PQnfields(res) != 2)
 			elog(ERROR, "unexpected result from deparseAnalyzeTuplesSql query");
 		reltuples = strtod(PQgetvalue(res, 0, 0), NULL);
+		relkind = *(PQgetvalue(res, 0, 1));
 	}
 	PG_FINALLY();
 	{
@@ -5021,6 +5026,10 @@ postgresCountTuplesForForeignTable(Relation relation)
 
 	ReleaseConnection(conn);
 
+	*can_sample = (relkind == RELKIND_RELATION ||
+   relkind == RELKIND_MATVIEW ||
+   relkind == RELKIND_PARTITIONED_TABLE);
+
 	return reltuples;
 }
 
@@ -5167,13 +5176,15 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
 	 */
 	if (method != ANALYZE_SAMPLE_OFF)
 	{
-		reltuples = postgresCountTuplesForForeignTable(relation);
+		bool	can_sample;
+
+		reltuples = postgresCountTuplesForForeignTable(relation, _sample);
 
 		/*
 		 * Remote's reltuples could be 0 or -1 if the table has never been
 		 * vacuumed/analyzed.  In that case, disable sampling after all.
 		 */
-		if ((reltuples <= 0) || (targrows >= reltuples))
+		if ((!can_sample) || (reltuples <= 0) || (targrows >= reltuples))
 			method = ANALYZE_SAMPLE_OFF;
 		else
 		{
-- 
2.39.0

From 020ba6afb87287d782477054949d77954377d347 Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Thu, 5 Jan 2023 20:11:49 +0100
Subject: [PATCH 1/2] Fix stale comment in postgres_fdw

A comment was left behind referencing a bit (sample rate adjustment)
removed from 

Re: Generate pg_stat_get_xact*() functions with Macros

2023-01-05 Thread Corey Huinker
On Thu, Jan 5, 2023 at 8:50 AM Drouvot, Bertrand <
bertranddrouvot...@gmail.com> wrote:

> Hi hackers,
>
> Please find attached a patch proposal to $SUBJECT.
>
> This is the same kind of work that has been done in 83a1a1b566 and
> 8018ffbf58 but this time for the
> pg_stat_get_xact*() functions (as suggested by Andres in [1]).
>
> The function names remain the same, but some fields have to be renamed.
>
> While at it, I also took the opportunity to create the macros for
> pg_stat_get_xact_function_total_time(),
> pg_stat_get_xact_function_self_time() and
> pg_stat_get_function_total_time(), pg_stat_get_function_self_time()
> (even if the same code pattern is only repeated two times).
>
> Now that this patch renames some fields, I think that, for consistency,
> those ones should be renamed too (aka remove the f_ and t_ prefixes):
>
> PgStat_FunctionCounts.f_numcalls
> PgStat_StatFuncEntry.f_numcalls
> PgStat_TableCounts.t_truncdropped
> PgStat_TableCounts.t_delta_live_tuples
> PgStat_TableCounts.t_delta_dead_tuples
> PgStat_TableCounts.t_changed_tuples
>
> But I think it would be better to do it in a follow-up patch (once this
> one get committed).
>
> [1]:
> https://www.postgresql.org/message-id/20230105002733.ealhzubjaiqis6ua%40awork3.anarazel.de
>
> Looking forward to your feedback,
>
> Regards,
>
> --
> Bertrand Drouvot
> PostgreSQL Contributors Team
> RDS Open Source Databases
> Amazon Web Services: https://aws.amazon.com


I like code cleanups like this. It makes sense, it results in less code,
and anyone doing a `git grep pg_stat_get_live_tuples` will quickly find the
macro definition.

Unsurprisingly, it passes `make check-world`.

So I think it's good to go as-is.

It does get me wondering, however: if we reordered the three typedefs to
group like-typed counters together, we could make them arrays, with the
names becoming defined constant index values (or kept via a union). The
typedefs would then effectively become:

typedef struct PgStat_FunctionCallUsage
{
PgStat_FunctionCounts *fs;
instr_time  time_counters[3];
} PgStat_FunctionCallUsage;


typedef struct PgStat_BackendSubEntry
{
PgStat_Counter counters[2];
} PgStat_BackendSubEntry;


typedef struct PgStat_TableCounts
{
boolt_truncdropped;
PgStat_Counter counters[12];
} PgStat_TableCounts;


Then we'd only have 3 actual C functions:

pg_stat_get_xact_counter(oid, int)
pg_stat_get_xact_subtrans_counter(oid, int)
pg_stat_get_xact_function_time_counter(oid, int)

and then the existing functions become SQL standard function body calls,
something like this:

CREATE OR REPLACE FUNCTION pg_stat_get_xact_numscans(oid)
 RETURNS bigint
 LANGUAGE sql
 STABLE PARALLEL RESTRICTED COST 1
RETURN pg_stat_get_xact_counter($1, 0);


CREATE OR REPLACE FUNCTION pg_stat_get_xact_tuples_returned(oid)
 RETURNS bigint
 LANGUAGE sql
 STABLE PARALLEL RESTRICTED COST 1
RETURN pg_stat_get_xact_counter($1, 1);



The most obvious drawback to this approach is that the C functions would
need to do runtime bounds checking on the index parameter, and the amount
of memory footprint saved by going from 17 short functions to 3 is not
enough to make any real difference. So I think your approach is better, but
I wanted to throw this idea out there.


Re: Add tracking of backend memory allocated to pg_stat_activity

2023-01-05 Thread Justin Pryzby
On Thu, Jan 05, 2023 at 01:58:33PM -0500, Reid Thompson wrote:
> On Tue, 2023-01-03 at 16:25 +0530, vignesh C wrote:
> > ...
> > The patch does not apply on top of HEAD as in [1], please post a
> > rebased patch:
> >... 
> > Regards,
> > Vignesh
> 
> Per conversation in thread listed below, patches have been submitted to the 
> "Add the ability to limit the amount of memory that can be allocated to 
> backends" thread
> https://www.postgresql.org/message-id/bd57d9a4c219cc1392665fd5fba61dde8027b3da.ca...@crunchydata.com

I suggest to close the associated CF entry.

(Also, the people who participated in this thread may want to be
included in the other thread going forward.)

> 0001-Add-tracking-of-backend-memory-allocated-to-pg_stat_.patch
> 0002-Add-the-ability-to-limit-the-amount-of-memory-that-c.patch
> 
> On Thu, 8 Dec 2022 at 19:44, Reid Thompson
>  wrote:
> >
> > On Sun, 2022-11-27 at 09:40 -0600, Justin Pryzby wrote:
> > > > ...
> > > > I still wonder whether there needs to be a separate CF entry for
> > > > the 0001 patch.  One issue is that there's two different lists of
> > > > people involved in the threads.
> > > >
> >
> > I'm OK with containing the conversation to one thread if everyone else
> > is.  If there's no argument against, then patches after today will go
> > to the "Add the ability to limit the amount of memory that can be
> > allocated to backends" thread
> > https://www.postgresql.org/message-id/bd57d9a4c219cc1392665fd5fba61dde8027b3da.ca...@crunchydata.com




Re: fixing CREATEROLE

2023-01-05 Thread Robert Haas
On Tue, Jan 3, 2023 at 3:11 PM Robert Haas  wrote:
> Committed and back-patched 0001 with fixes for the issues that you pointed 
> out.
>
> Here's a trivial rebase of the rest of the patch set.

I committed 0001 and 0002 after improving the commit messages a bit.
Here's the remaining two patches back. I've done a bit more polishing
of these as well, specifically in terms of fleshing out the regression
tests. I'd like to move forward with these soon, if nobody's too
vehemently opposed to that.

Previous feedback, especially from Tom but also others, was that the
role-level properties the final patch was creating were not good. Now
it doesn't create any new role-level properties, and in fact it has
nothing to say about role-level properties in any way. That might not
be the right thing. Right now, if you have CREATEROLE, you can create
new roles with any combination of attributes you like, except that you
cannot set the SUPERUSER, REPLICATION, or BYPASSRLS properties. While
I think it makes sense that a CREATEROLE user can't hand out SUPERUSER
or REPLICATION privileges, it is really not obvious to me why a
CREATEROLE user shouldn't be permitted to hand out BYPASSRLS, at least
if they have it themselves, and right now there's no way to allow
that. On the other hand, I think that some superusers might want to
restrict a CREATEROLE user's ability to hand out CREATEROLE or
CREATEDB to the users they create, and right now there's no way to
prohibit that.

I don't have a great idea about what a system for handling this
problem ought to look like. In a vacuum, I think it would be
reasonable to change CREATEROLE to only allow CREATEDB, BYPASSRLS, and
similar to be given to new users if the creating user possesses them,
but that approach does not work for CREATEROLE, because if you didn't
have that, you couldn't create any new users at all. It's also pretty
weird for, say, CONNECTION LIMIT. I doubt that there's any connection
between the CONNECTION LIMIT of the CREATEROLE user and the values
that they ought to be able to set for users that they create. Probably
you just want to allow setting CONNECTION LIMIT for downstream users,
or not. Or maybe it's not even worth worrying about -- I think there
might be a decent argument that limiting the ability to set CONNECTION
LIMIT just isn't interesting.

If someone else has a good idea what we ought to do about this part of
the problem, I'd be interested to hear it. Absent such a good idea --
or if that good idea is more work to implement that can be done in the
near term -- I think it would be OK to ship as much as I've done here
and revisit the topic at some later point when we've had a chance to
absorb user feedback.

Thanks,

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v4-0001-Restrict-the-privileges-of-CREATEROLE-users.patch
Description: Binary data


v4-0002-Add-new-GUC-createrole_self_grant.patch
Description: Binary data


Re: Add tracking of backend memory allocated to pg_stat_activity

2023-01-05 Thread Reid Thompson
On Tue, 2023-01-03 at 16:25 +0530, vignesh C wrote:
> ...
> The patch does not apply on top of HEAD as in [1], please post a
> rebased patch:
>... 
> Regards,
> Vignesh

Per conversation in thread listed below, patches have been submitted to the 
"Add the ability to limit the amount of memory that can be allocated to 
backends" thread
https://www.postgresql.org/message-id/bd57d9a4c219cc1392665fd5fba61dde8027b3da.ca...@crunchydata.com

0001-Add-tracking-of-backend-memory-allocated-to-pg_stat_.patch
0002-Add-the-ability-to-limit-the-amount-of-memory-that-c.patch

On Thu, 8 Dec 2022 at 19:44, Reid Thompson
 wrote:
>
> On Sun, 2022-11-27 at 09:40 -0600, Justin Pryzby wrote:
> > > ...
> > > I still wonder whether there needs to be a separate CF entry for
> > > the 0001 patch.  One issue is that there's two different lists of
> > > people involved in the threads.
> > >
>
> I'm OK with containing the conversation to one thread if everyone else
> is.  If there's no argument against, then patches after today will go
> to the "Add the ability to limit the amount of memory that can be
> allocated to backends" thread
> https://www.postgresql.org/message-id/bd57d9a4c219cc1392665fd5fba61dde8027b3da.ca...@crunchydata.com

-- 
Reid Thompson
Senior Software Engineer
Crunchy Data, Inc.

reid.thomp...@crunchydata.com
www.crunchydata.com





Re: pgsql: Delay commit status checks until freezing executes.

2023-01-05 Thread Peter Geoghegan
On Wed, Jan 4, 2023 at 10:59 PM Amit Kapila  wrote:
> You are an extremely valuable person for this project and I wish that
> you continue working with the same enthusiasm.

Thank you, Amit. Knowing that my efforts are appreciated by colleagues
does make it easier to persevere.

--
Peter Geoghegan




ATTACH PARTITION seems to ignore column generation status

2023-01-05 Thread Tom Lane
This does not seem good:

regression=# create table pp (a int, b int) partition by range(a);
CREATE TABLE
regression=# create table cc (a int generated always as (b+1) stored, b int);
CREATE TABLE
regression=# alter table pp attach partition cc for values from ('1') to 
('10'); 
ALTER TABLE
regression=# insert into pp values(1,100);
INSERT 0 1
regression=# table pp;
  a  |  b  
-+-
 101 | 100
(1 row)

I'm not sure to what extent it's sensible for partitions to have
GENERATED columns that don't match their parent; but even if that's
okay for payload columns I doubt we want to allow partitioning
columns to be GENERATED.
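
For what it's worth, the mismatch is easy to see in the catalogs; a quick
sketch of such a check (attgenerated is 's' for a stored generated column and
empty otherwise):

SELECT attrelid::regclass AS rel, attname, attgenerated
FROM pg_attribute
WHERE attrelid IN ('pp'::regclass, 'cc'::regclass) AND attname = 'a';
--  rel | attname | attgenerated
-- -----+---------+--------------
--  pp  | a       |
--  cc  | a       | s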

regards, tom lane




Re: Add the ability to limit the amount of memory that can be allocated to backends.

2023-01-05 Thread Reid Thompson
On Tue, 2023-01-03 at 16:22 +0530, vignesh C wrote:
> 
> The patch does not apply on top of HEAD as in [1], please post a
> rebased patch:
> ...
> Regards,
> Vignesh
>

Attached is rebased patch, with some updates related to committed changes.

Thanks,
Reid

-- 
Reid Thompson
Senior Software Engineer
Crunchy Data, Inc.

reid.thomp...@crunchydata.com
www.crunchydata.com

From 69516942b71d5d41850fbc00b971db7476c7a01a Mon Sep 17 00:00:00 2001
From: Reid Thompson 
Date: Sat, 4 Jun 2022 22:23:59 -0400
Subject: [PATCH 2/2] Add the ability to limit the amount of memory that can be
 allocated to backends.

This builds on the work that adds backend memory allocated to pg_stat_activity.

Add GUC variable max_total_backend_memory.

Specifies a limit to the amount of memory (in MB) that may be allocated to
backends in total (i.e. this is not a per user or per backend limit). If unset,
or set to 0 it is disabled. It is intended as a resource to help avoid the OOM
killer on LINUX and manage resources in general. A backend request that would
push the total over the limit will be denied with an out of memory error causing
that backend's current query/transaction to fail. Due to the dynamic nature of
memory allocations, this limit is not exact. If within 1.5MB of the limit and
two backends request 1MB each at the same time both may be allocated, and exceed
the limit. Further requests will not be allocated until dropping below the
limit. Keep this in mind when setting this value. This limit does not affect
auxiliary backend processes. Backend memory allocations are displayed in the
pg_stat_activity view.
---
 doc/src/sgml/config.sgml  |  26 +
 src/backend/storage/ipc/dsm_impl.c|  12 ++
 src/backend/utils/activity/backend_status.c   | 108 ++
 src/backend/utils/misc/guc_tables.c   |  11 ++
 src/backend/utils/misc/postgresql.conf.sample |   3 +
 src/backend/utils/mmgr/aset.c |  17 +++
 src/backend/utils/mmgr/generation.c   |   9 ++
 src/backend/utils/mmgr/slab.c |   9 +-
 src/include/utils/backend_status.h|   3 +
 9 files changed, 197 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 05b3862d09..0362f26451 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2079,6 +2079,32 @@ include_dir 'conf.d'
   
  
 
+ 
+  max_total_backend_memory (integer)
+  
+   max_total_backend_memory configuration parameter
+  
+  
+  
+   
+Specifies a limit to the amount of memory (MB) that may be allocated to
+backends in total (i.e. this is not a per user or per backend limit).
+If unset, or set to 0 it is disabled.  A backend request that would
+push the total over the limit will be denied with an out of memory
+error causing that backend's current query/transaction to fail. Due to
+the dynamic nature of memory allocations, this limit is not exact. If
+within 1.5MB of the limit and two backends request 1MB each at the same
+time both may be allocated, and exceed the limit. Further requests will
+not be allocated until dropping below the limit. Keep this in mind when
+setting this value. This limit does not affect auxiliary backend
+processes  . Backend memory
+allocations (allocated_bytes) are displayed in the
+pg_stat_activity
+view.
+   
+  
+ 
+
  
  
 
diff --git a/src/backend/storage/ipc/dsm_impl.c b/src/backend/storage/ipc/dsm_impl.c
index 22885c7bd2..f7047107d5 100644
--- a/src/backend/storage/ipc/dsm_impl.c
+++ b/src/backend/storage/ipc/dsm_impl.c
@@ -254,6 +254,10 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
 		return true;
 	}
 
+	/* Do not exceed maximum allowed memory allocation */
+	if (op == DSM_OP_CREATE && exceeds_max_total_bkend_mem(request_size))
+		return false;
+
 	/*
 	 * Create new segment or open an existing one for attach.
 	 *
@@ -525,6 +529,10 @@ dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
 		int			flags = IPCProtection;
 		size_t		segsize;
 
+		/* Do not exceed maximum allowed memory allocation */
+		if (op == DSM_OP_CREATE && exceeds_max_total_bkend_mem(request_size))
+			return false;
+
 		/*
 		 * Allocate the memory BEFORE acquiring the resource, so that we don't
 		 * leak the resource if memory allocation fails.
@@ -719,6 +727,10 @@ dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size,
 		return true;
 	}
 
+	/* Do not exceed maximum allowed memory allocation */
+	if (op == DSM_OP_CREATE && exceeds_max_total_bkend_mem(request_size))
+		return false;
+
 	/* Create new segment or open an existing one for attach. */
 	if (op == DSM_OP_CREATE)
 	{
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index 7baf2db57d..da2b5fb042 100644
--- 

Support for dumping extended statistics

2023-01-05 Thread Hari krishna Maddileti
Hi Team,

In order to restore dumped extended statistics (stxdndistinct,
stxddependencies, stxdmcv) we need to provide input functions to parse
pg_ndistinct/pg_dependencies/pg_mcv_list strings.

Today we get the ERROR "cannot accept a value of type
pg_ndistinct/pg_dependencies/pg_mcv_list" when we try to do an insert of any
such value.

Approach tried:
- Using a yacc grammar file (statistics_gram.y) to parse the input string into
its internal format for the types pg_ndistinct and pg_dependencies.
- Simply calling byteain() for the serialized input text of type pg_mcv_list.

Currently the changes are working locally. I would like to push the changes
upstream if there is any use case for Postgres. Would like to know if there is
any interest from the Postgres community.

Regards,
Hari Krishna


Re: New strategies for freezing, advancing relfrozenxid early

2023-01-05 Thread Matthias van de Meent
On Thu, 5 Jan 2023 at 02:21, I wrote:
>
> On Tue, 3 Jan 2023 at 21:30, Peter Geoghegan  wrote:
> >
> > Attached is v14.
> > [PATCH v14 3/3] Finish removing aggressive mode VACUUM.
>
> I've not completed a review for this patch - I'll continue on that
> tomorrow:

This is that.

> @@ -2152,10 +2109,98 @@ lazy_scan_noprune(LVRelState *vacrel,
> [...]
> +/* wait 10ms, then 20ms, then 30ms, then give up */
> [...]
> +pg_usleep(1000L * 10L * i);

Could this use something like autovacuum_cost_delay? I don't quite
like the use of arbitrary hardcoded millisecond delays - it can slow a
system down by a significant fraction, especially on high-contention
systems, and this potential of 60ms delay per scanned page can limit
the throughput of this new vacuum strategy to < 17 pages/second
(<136kB/sec) for highly contended sections, which is not great.

It is also not unlikely that in the time it was waiting, the page
contents were updated significantly (concurrent prune, DELETEs
committed), which could result in improved bounds. I think we should
redo the dead items check if we waited, but failed to get a lock - any
tuples removed now reduce work we'll have to do later.
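
To make that concrete, a minimal sketch of the shape I have in mind (the
names retry_delay_ms, max_lock_attempts, got_cleanup_lock and
recheck_dead_items are assumptions for illustration, not from the patch):

for (int attempt = 1; attempt <= max_lock_attempts; attempt++)
{
    if (ConditionalLockBufferForCleanup(buf))
    {
        got_cleanup_lock = true;
        break;
    }

    /* back off using a configurable delay rather than fixed 10/20/30 ms */
    pg_usleep(1000L * retry_delay_ms * attempt);

    /*
     * The page may have been pruned or had deletes commit while we slept,
     * so recomputing the dead-item bounds before giving up can only reduce
     * the work left for later.
     */
    recheck_dead_items = true;
}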

> +++ b/doc/src/sgml/ref/vacuum.sgml
> [...] Pages where
> +  all tuples are known to be frozen are always skipped.

"...are always skipped, unless the >DISABLE_PAGE_SKIPPING< option is used."

> +++ b/doc/src/sgml/maintenance.sgml

There are a lot of details being lost from the previous version of
that document. Some of the details are obsolete (mentions of
aggressive VACUUM and freezing behavior), but others are not
(FrozenTransactionId in rows from a pre-9.4 system, the need for
vacuum for prevention of issues surrounding XID wraparound).

I also am not sure this is the best place to store most of these
mentions, but I can't find a different place where these details on
certain interesting parts of the system are documented, and plain
removal of the information does not sit right with me.

Specifically, I don't like the removal of the following information
from our documentation:

- Size of pg_xact and pg_commit_ts data in relation to autovacuum_freeze_max_age
   Although it is less likely with the new behaviour that we'll hit
these limits due to more eager freezing of transactions, it is still
important for users to have easy access to this information, and
tuning this for storage size is not useless information.

- The reason why VACUUM is essential to the long-term consistency of
Postgres' MVCC system
Informing the user about our use of 32-bit transaction IDs and
that we update an epoch when this XID wraps around does not
automatically make the user aware of the issues that surface around
XID wraparound. Retaining the explainer for XID wraparound in the docs
seems like a decent idea - it may be moved, but please don't delete
it.

- Special transaction IDs, their meaning and where they can occur
   I can't seem to find any other information in the docs section, and
it is useful to have users understand that certain values are
considered special: FrozenTransactionId and BootstrapTransactionId.


Kind regards,

Matthias van de Meent




Re: daitch_mokotoff module

2023-01-05 Thread Alvaro Herrera
On 2023-Jan-05, Dag Lem wrote:

> Is there anything else I should do here, to avoid the status being
> incorrectly stuck at "Waiting for Author" again.

Just mark it Needs Review for now.  I'll be back from vacation on Jan
11th and can have a look then (or somebody else can, perhaps.)

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Puedes vivir sólo una vez, pero si lo haces bien, una vez es suficiente"




Re: wake up logical workers after ALTER SUBSCRIPTION

2023-01-05 Thread Nathan Bossart
On Thu, Jan 05, 2023 at 10:57:58AM +0530, Amit Kapila wrote:
> True, if we want we can use dshash for this.

I'll look into this.

> The garbage collection
> mechanism used in the patch seems odd to me as that will remove/add
> entries to the hash table even when the corresponding subscription is
> never dropped.

Yeah, I think this deserves a comment.  We can remove anything beyond
wal_retrieve_retry_interval because the lack of a hash table entry is taken
to mean that we can start the worker immediately.  There might be a corner
case when wal_retrieve_retry_interval is concurrently updated, in which
case we'll effectively use the previous value for the worker.  That doesn't
seem too terrible to me.

It might be possible to remove this garbage collection completely if we use
dshash, but I haven't thought through that approach completely yet.
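
A rough sketch of the idea (struct and function names here are made up for
illustration, and error handling is elided):

typedef struct LauncherLastStartEntry
{
    Oid         subid;              /* hash key: subscription OID */
    TimestampTz last_start_time;
} LauncherLastStartEntry;

static void
record_subscription_start_time(dshash_table *last_start_times, Oid subid)
{
    LauncherLastStartEntry *entry;
    bool        found;

    entry = dshash_find_or_insert(last_start_times, &subid, &found);
    entry->last_start_time = GetCurrentTimestamp();
    dshash_release_lock(last_start_times, entry);
}

ALTER/DROP SUBSCRIPTION (or a backend that wants an immediate restart) could
then delete the subscription's entry directly instead of relying on garbage
collection or a restart_immediately flag.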

> Also, adding this garbage collection each time seems
> like an overhead, especially for small values of
> wal_retrieve_retry_interval and a large number of subscriptions.

Right.

> Another point is immediately after cleaning the worker info, trying to
> find it again seems of no use. In logicalrep_worker_launch(), using
> both in_use and restart_immediately to find an unused slot doesn't
> look neat to me, we could probably keep the in_use flag intact if we
> want to reuse the worker. But again after freeing the worker, keeping
> its associated slot allocated sounds odd to me.

Yeah, this flag certainly feels hacky.  With a shared hash table, we could
just have backends remove the last-start-time entry directly, and we
wouldn't need the flag.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: wake up logical workers after ALTER SUBSCRIPTION

2023-01-05 Thread Nathan Bossart
On Thu, Jan 05, 2023 at 11:34:37AM +0530, Amit Kapila wrote:
> On Thu, Jan 5, 2023 at 10:16 AM Nathan Bossart  
> wrote:
>> In v12, I moved the restart for two_phase mode to the end of
>> process_syncing_tables_for_apply() so that we don't need to rely on another
>> iteration of the loop.
> 
> This should work but it is better to add a comment before calling
> CommandCounterIncrement() to indicate that this is for making changes
> to the relation state visible.

Will do.

> Thinking along similar lines, won't apply worker need to be notified
> of SUBREL_STATE_SYNCWAIT state change by the tablesync worker?

wait_for_worker_state_change() should notify the apply worker in this case.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: fix and document CLUSTER privileges

2023-01-05 Thread Nathan Bossart
On Thu, Jan 05, 2023 at 02:38:58PM +0100, Gilles Darold wrote:
> This is a bit confusing, this commitfest entry patch is also included in an
> other commitfest entry [1] into patch v3-0001-fix-maintain-privs.patch with
> some additional conditions.
> 
> Committers should be aware that this commitfest entry must be withdrawn if
> [1] is committed first.  There is no status or dependency field that I can
> use, I could move this one to "Ready for Committer" status but this is not
> exact until [1] has been committed or withdrawn.

I will either rebase the other patch or discard this one as needed.  I'm
not positive that we'll proceed with the proposed approach for the other
one, but the patch tracked here should still be worthwhile regardless.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Infinite Interval

2023-01-05 Thread Joseph Koshakow
On Thu, Jan 5, 2023 at 5:20 AM jian he  wrote:
>
>
>
> On Wed, Jan 4, 2023 at 10:13 PM jian he  wrote:
>>
>>
>>
>> I don't know how to generate an interval.out file.

Personally I just write the .out files manually. I think it especially
helps as a way to double-check that the results are what you expected.
After running make check, a regression.diffs file will be generated with
all the differences between your .out file and the results of the test.


> logic combine and clean up for functions in backend/utils/adt/timestamp.c 
> (timestamp_pl_interval,timestamptz_pl_interval, interval_pl, interval_mi).

One thing I was hoping to achieve was to avoid redundant checks if
possible. For example, in the following code:
> +if ((INTERVAL_IS_NOBEGIN(span1) && INTERVAL_IS_NOEND(span2))
> +  ||(INTERVAL_IS_NOBEGIN(span1) && !INTERVAL_NOT_FINITE(span2))
> +  ||(!INTERVAL_NOT_FINITE(span1) && INTERVAL_IS_NOEND(span2)))
> +   INTERVAL_NOBEGIN(result);
If `(INTERVAL_IS_NOBEGIN(span1) && INTERVAL_IS_NOEND(span2))` is false,
then we end up checking `INTERVAL_IS_NOBEGIN(span1)` twice
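
For example, one way to restructure that particular condition so each macro
is evaluated at most once (just a sketch of the shape, preserving the same
semantics as the quoted snippet):

if (INTERVAL_IS_NOBEGIN(span1))
{
    /* covers the first two arms of the original condition */
    if (INTERVAL_IS_NOEND(span2) || !INTERVAL_NOT_FINITE(span2))
        INTERVAL_NOBEGIN(result);
}
else if (!INTERVAL_NOT_FINITE(span1) && INTERVAL_IS_NOEND(span2))
    INTERVAL_NOBEGIN(result);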

> For 1. I don't know how to format the code. I have a problem installing 
> pg_indent. If the format is wrong, please reformat.

I'll run pg_indent and send an updated patch if anything changes.

Thanks for your help on this patch!

- Joe Koshakow




Re: moving extraUpdatedCols out of RangeTblEntry (into ModifyTable)

2023-01-05 Thread Tom Lane
Amit Langote  writes:
> On Thu, Jan 5, 2023 at 4:59 AM Tom Lane  wrote:
>> Here's a draft patch that does it like that.  This seems like a win
>> for more reasons than just pruning, because I was able to integrate
>> the calculation into runtime setup of the expressions, so that we
>> aren't doing an extra stringToNode() on them.

> Thanks for the patch.  This looks pretty neat and I agree that this
> seems like a net win overall.

Thanks for looking.

>> I've not looked into what it'd take to back-patch this.  We can't
>> add a field to ResultRelInfo in released branches (cf 4b3e37993),
>> but we might be able to repurpose RangeTblEntry.extraUpdatedCols.

> I think we can make that work.  Would you like me to give that a try?

I'm on it already.  AFAICT, the above won't actually work because
we don't have RTEs for all ResultRelInfos (per the
"relinfo->ri_RangeTableIndex != 0" test in ExecGetExtraUpdatedCols).
Probably we need something more like what 4b3e37993 did.

regards, tom lane




Re: Allow tailoring of ICU locales with custom rules

2023-01-05 Thread Peter Eisentraut

Patch needed a rebase; no functionality changes.

On 14.12.22 10:26, Peter Eisentraut wrote:
This patch exposes the ICU facility to add custom collation rules to a 
standard collation.  This would allow users to customize any ICU 
collation to whatever they want.  A very simple example from the 
documentation/tests:


CREATE COLLATION en_custom
     (provider = icu, locale = 'en', rules = '&a < g');

This places "g" after "a" before "b".  Details about the syntax can be 
found at 
.
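
A quick way to see the effect (a sketch using the collation created above):

SELECT c FROM (VALUES ('a'), ('b'), ('g')) AS t(c) ORDER BY c COLLATE en_custom;
-- expected order: a, g, b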


The code is pretty straightforward.  It mainly just records these rules 
in the catalog and feeds them to ICU when creating the collator object.
From ae729e2d5d37a382b713ea95897ccb35d0c1 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 14 Dec 2022 10:15:03 +0100
Subject: [PATCH v2] Allow tailoring of ICU locales with custom rules

This exposes the ICU facility to add custom collation rules to a
standard collation.

Discussion: 
https://www.postgresql.org/message-id/flat/821c71a4-6ef0-d366-9acf-bb8e367f7...@enterprisedb.com
---
 doc/src/sgml/catalogs.sgml| 18 +++
 doc/src/sgml/ref/create_collation.sgml| 22 +
 doc/src/sgml/ref/create_database.sgml | 12 +
 src/backend/catalog/pg_collation.c|  5 ++
 src/backend/commands/collationcmds.c  | 23 +++--
 src/backend/commands/dbcommands.c | 49 +--
 src/backend/utils/adt/pg_locale.c | 41 +++-
 src/backend/utils/init/postinit.c | 11 -
 src/include/catalog/pg_collation.h|  2 +
 src/include/catalog/pg_database.h |  3 ++
 src/include/utils/pg_locale.h |  1 +
 .../regress/expected/collate.icu.utf8.out | 30 
 src/test/regress/sql/collate.icu.utf8.sql | 13 +
 13 files changed, 220 insertions(+), 10 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 9316b811ac..afa9f28ef9 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -2428,6 +2428,15 @@ pg_collation 
Columns
   
  
 
+ 
+  
+   collicurules text
+  
+  
+   ICU collation rules for this collation object
+  
+ 
+
  
   
collversion text
@@ -3106,6 +3115,15 @@ pg_database 
Columns
   
  
 
+ 
+  
+   daticurules text
+  
+  
+   ICU collation rules for this database
+  
+ 
+
  
   
datcollversion text
diff --git a/doc/src/sgml/ref/create_collation.sgml 
b/doc/src/sgml/ref/create_collation.sgml
index 58f5f0cd63..2c7266107e 100644
--- a/doc/src/sgml/ref/create_collation.sgml
+++ b/doc/src/sgml/ref/create_collation.sgml
@@ -27,6 +27,7 @@
 [ LC_CTYPE = lc_ctype, ]
 [ PROVIDER = provider, ]
 [ DETERMINISTIC = boolean, ]
+[ RULES = rules, ]
 [ VERSION = version ]
 )
 CREATE COLLATION [ IF NOT EXISTS ] name FROM 
existing_collation
@@ -149,6 +150,19 @@ Parameters
  
 
 
+
+ rules
+
+ 
+  
+   Specifies additional collation rules to customize the behavior of the
+   collation.  This is supported for ICU only.  See https://unicode-org.github.io/icu/userguide/collation/customization/"/>
+   for details on the syntax.
+  
+ 
+
+
 
  version
 
@@ -228,6 +242,14 @@ Examples
 
   
 
+  
+   To create a collation using the ICU provider, based on the English ICU
+   locale, with custom rules:
+
+
+
+  
+
   
To create a collation from an existing collation:
 
diff --git a/doc/src/sgml/ref/create_database.sgml 
b/doc/src/sgml/ref/create_database.sgml
index ea38c64731..aa6f121a81 100644
--- a/doc/src/sgml/ref/create_database.sgml
+++ b/doc/src/sgml/ref/create_database.sgml
@@ -192,6 +192,18 @@ Parameters
   
  
 
+ 
+  icu_rules
+  
+   
+Specifies additional collation rules to customize the behavior of the
+collation.  This is supported for ICU only.  See https://unicode-org.github.io/icu/userguide/collation/customization/"/>
+for details on the syntax.
+   
+  
+ 
+
  
   locale_provider
 
diff --git a/src/backend/catalog/pg_collation.c 
b/src/backend/catalog/pg_collation.c
index 287b13725d..fd022e6fc2 100644
--- a/src/backend/catalog/pg_collation.c
+++ b/src/backend/catalog/pg_collation.c
@@ -50,6 +50,7 @@ CollationCreate(const char *collname, Oid collnamespace,
int32 collencoding,
const char *collcollate, const char *collctype,
const char *colliculocale,
+   const char *collicurules,
const char *collversion,
bool if_not_exists,
bool quiet)
@@ -194,6 +195,10 @@ CollationCreate(const char *collname, Oid collnamespace,

Re: moving extraUpdatedCols out of RangeTblEntry (into ModifyTable)

2023-01-05 Thread Amit Langote
On Thu, Jan 5, 2023 at 4:59 AM Tom Lane  wrote:
> I wrote:
> > After further thought: maybe we should get radical and postpone this
> > work all the way to executor startup.  The downside of that is having
> > to do it over again on each execution of a prepared plan.  But the
> > upside is that when the UPDATE targets a many-partitioned table,
> > we would have a chance at not doing the work at all for partitions
> > that get pruned at runtime.  I'm not sure if that win would emerge
> > immediately or if we still have executor work to do to manage pruning
> > of the target table.  I'm also not sure that this'd be a net win
> > overall.  But it seems worth considering.
>
> Here's a draft patch that does it like that.  This seems like a win
> for more reasons than just pruning, because I was able to integrate
> the calculation into runtime setup of the expressions, so that we
> aren't doing an extra stringToNode() on them.

Thanks for the patch.  This looks pretty neat and I agree that this
seems like a net win overall.

As an aside, I wonder why AttrDefault (and other things in
RelationData that need stringToNode() done on them to put into a Query
or a plan tree) doesn't store the expression Node tree to begin with?

> There's still a code path that does such a calculation at plan time
> (get_rel_all_updated_cols), but it's only used by postgres_fdw which
> has some other rather-inefficient behaviors in the same area.
>
> I've not looked into what it'd take to back-patch this.  We can't
> add a field to ResultRelInfo in released branches (cf 4b3e37993),
> but we might be able to repurpose RangeTblEntry.extraUpdatedCols.

I think we can make that work.  Would you like me to give that a try?

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: heapgettup refactoring

2023-01-05 Thread Peter Eisentraut

On 03.01.23 21:39, Melanie Plageman wrote:

On 30.11.22 23:34, Melanie Plageman wrote:

I have attached a patchset with only the code changes contained in the
previous patch 0003. I have broken the refactoring down into many
smaller pieces for ease of review.


To keep this moving along a bit, I have committed your 0002, which I
think is a nice little improvement on its own.


Thanks!
I've attached a rebased patchset - v4.

I also changed heapgettup_no_movement() to noinline (instead of inline).
David Rowley pointed out that this might make more sense given how
comparatively rare no movement scans are.


Ok, let's look through these patches starting from the top then.

v4-0001-Add-no-movement-scan-helper.patch

This makes sense overall; there is clearly some duplicate code that can 
be unified.


It appears that during your rebasing you have effectively reverted your 
earlier changes that have been committed as 
8e1db29cdbbd218ab6ba53eea56624553c3bef8c.  You should undo that.


I don't understand the purpose of the noinline marker.  If it's not 
necessary to inline, we can just leave it off, but there is no need to 
outright prevent inlining AFAICT.


I don't know why you changed the if/else sequences.  Before, the 
sequence was effectively


if (forward)
{
...
}
else if (backward)
{
...
}
else
{
/* it's no movement */
}

Now it's changed to

if (no movement)
{
...
return;
}

if (forward)
{
...
}
else
{
Assert(backward);
...
}

Sure, that's the same thing, but it looks less elegant to me.





Generate pg_stat_get_xact*() functions with Macros

2023-01-05 Thread Drouvot, Bertrand

Hi hackers,

Please find attached a patch proposal to $SUBJECT.

This is the same kind of work that has been done in 83a1a1b566 and 8018ffbf58 
but this time for the
pg_stat_get_xact*() functions (as suggested by Andres in [1]).

The function names remain the same, but some fields have to be renamed.

While at it, I also took the opportunity to create the macros for 
pg_stat_get_xact_function_total_time(),
pg_stat_get_xact_function_self_time() and pg_stat_get_function_total_time(), 
pg_stat_get_function_self_time()
(even if the same code pattern is only repeated two times).
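
For readers who haven't looked at 83a1a1b566/8018ffbf58, the generated
functions follow a macro pattern of roughly this shape (a sketch only; the
exact macro and field names are the patch's, not necessarily these):

#define PG_STAT_GET_XACT_RELENTRY_INT64(stat)                       \
Datum                                                               \
CppConcat(pg_stat_get_xact_,stat)(PG_FUNCTION_ARGS)                 \
{                                                                   \
    Oid         relid = PG_GETARG_OID(0);                           \
    PgStat_TableStatus *tabentry = find_tabstat_entry(relid);       \
                                                                    \
    if (tabentry == NULL)                                           \
        PG_RETURN_INT64(0);                                         \
    PG_RETURN_INT64((int64) (tabentry->t_counts.stat));             \
}

PG_STAT_GET_XACT_RELENTRY_INT64(blocks_fetched)
PG_STAT_GET_XACT_RELENTRY_INT64(blocks_hit)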

Now that this patch renames some fields, I think that, for consistency, those 
ones should be renamed too (aka remove the f_ and t_ prefixes):

PgStat_FunctionCounts.f_numcalls
PgStat_StatFuncEntry.f_numcalls
PgStat_TableCounts.t_truncdropped
PgStat_TableCounts.t_delta_live_tuples
PgStat_TableCounts.t_delta_dead_tuples
PgStat_TableCounts.t_changed_tuples

But I think it would be better to do it in a follow-up patch (once this one get 
committed).

[1]: 
https://www.postgresql.org/message-id/20230105002733.ealhzubjaiqis6ua%40awork3.anarazel.de

Looking forward to your feedback,

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
diff --git a/src/backend/utils/activity/pgstat_function.c 
b/src/backend/utils/activity/pgstat_function.c
index 6139a27fee..77af8541e0 100644
--- a/src/backend/utils/activity/pgstat_function.c
+++ b/src/backend/utils/activity/pgstat_function.c
@@ -124,7 +124,7 @@ pgstat_init_function_usage(FunctionCallInfo fcinfo,
fcu->fs = >f_counts;
 
/* save stats for this function, later used to compensate for recursion 
*/
-   fcu->save_f_total_time = pending->f_counts.f_total_time;
+   fcu->save_total_time = pending->f_counts.total_time;
 
/* save current backend-wide total time */
fcu->save_total = total_func_time;
@@ -168,19 +168,19 @@ pgstat_end_function_usage(PgStat_FunctionCallUsage *fcu, 
bool finalize)
INSTR_TIME_ADD(total_func_time, f_self);
 
/*
-* Compute the new f_total_time as the total elapsed time added to the
-* pre-call value of f_total_time.  This is necessary to avoid
+* Compute the new total_time as the total elapsed time added to the
+* pre-call value of total_time.  This is necessary to avoid
 * double-counting any time taken by recursive calls of myself.  (We do
 * not need any similar kluge for self time, since that already excludes
 * any recursive calls.)
 */
-   INSTR_TIME_ADD(f_total, fcu->save_f_total_time);
+   INSTR_TIME_ADD(f_total, fcu->save_total_time);
 
/* update counters in function stats table */
if (finalize)
fs->f_numcalls++;
-   fs->f_total_time = f_total;
-   INSTR_TIME_ADD(fs->f_self_time, f_self);
+   fs->total_time = f_total;
+   INSTR_TIME_ADD(fs->self_time, f_self);
 }
 
 /*
@@ -204,10 +204,10 @@ pgstat_function_flush_cb(PgStat_EntryRef *entry_ref, bool 
nowait)
return false;
 
shfuncent->stats.f_numcalls += localent->f_counts.f_numcalls;
-   shfuncent->stats.f_total_time +=
-   INSTR_TIME_GET_MICROSEC(localent->f_counts.f_total_time);
-   shfuncent->stats.f_self_time +=
-   INSTR_TIME_GET_MICROSEC(localent->f_counts.f_self_time);
+   shfuncent->stats.total_time +=
+   INSTR_TIME_GET_MICROSEC(localent->f_counts.total_time);
+   shfuncent->stats.self_time +=
+   INSTR_TIME_GET_MICROSEC(localent->f_counts.self_time);
 
pgstat_unlock_entry(entry_ref);
 
diff --git a/src/backend/utils/activity/pgstat_relation.c 
b/src/backend/utils/activity/pgstat_relation.c
index 1730425de1..625cd2a96f 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -371,9 +371,9 @@ pgstat_count_heap_update(Relation rel, bool hot)
ensure_tabstat_xact_level(pgstat_info);
pgstat_info->trans->tuples_updated++;
 
-   /* t_tuples_hot_updated is nontransactional, so just advance it 
*/
+   /* tuples_hot_updated is nontransactional, so just advance it */
if (hot)
-   pgstat_info->t_counts.t_tuples_hot_updated++;
+   pgstat_info->t_counts.tuples_hot_updated++;
}
 }
 
@@ -501,9 +501,9 @@ AtEOXact_PgStat_Relations(PgStat_SubXactStatus *xact_state, 
bool isCommit)
if (!isCommit)
restore_truncdrop_counters(trans);
/* count attempted actions regardless of commit/abort */
-   tabstat->t_counts.t_tuples_inserted += trans->tuples_inserted;
-   tabstat->t_counts.t_tuples_updated += trans->tuples_updated;
-   tabstat->t_counts.t_tuples_deleted += trans->tuples_deleted;
+   

Re: fix and document CLUSTER privileges

2023-01-05 Thread Gilles Darold

Le 05/01/2023 à 06:12, Nathan Bossart a écrit :

On Wed, Jan 04, 2023 at 11:27:05PM +0100, Gilles Darold wrote:

Got it, this is patch add_cluster_skip_messages.patch . IMHO this patch
should be part of this commitfest as it is directly based on this one. You
could create a second patch here that adds the warning message so that
committers can decide here if it should be applied.

That's fine with me.  I added the warning messages in v4.



This is a bit confusing: this commitfest entry's patch is also included in
another commitfest entry [1], in patch v3-0001-fix-maintain-privs.patch,
with some additional conditions.



Committers should be aware that this commitfest entry must be withdrawn 
if [1] is committed first.  There is no status or dependency field that 
I can use, I could move this one to "Ready for Committer" status but 
this is not exact until [1] has been committed or withdrawn.



[1] https://commitfest.postgresql.org/41/4070/


--
Gilles Darold





Re: Ignore dropped columns when checking the column list in logical replication

2023-01-05 Thread Amit Kapila
On Wed, Jan 4, 2023 at 12:28 PM shiy.f...@fujitsu.com
 wrote:
>
> I tried to fix them in the attached patch.
>

Don't we need a similar handling for generated columns? We don't send
those to the subscriber side, see checks in proto.c.
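
For instance, a minimal sketch of the kind of case I have in mind (the
table, column, and publication names here are made up):

CREATE TABLE tab_gen (a int PRIMARY KEY,
                      b int GENERATED ALWAYS AS (a * 2) STORED);
CREATE PUBLICATION pub_gen FOR TABLE tab_gen (a);
-- Generated columns such as "b" are never sent to the subscriber (see the
-- checks in proto.c), so the column-list handling arguably needs to skip
-- them in the same way as dropped columns.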

-- 
With Regards,
Amit Kapila.




Re: Simplify standby state machine a bit in WaitForWALToBecomeAvailable()

2023-01-05 Thread Bharath Rupireddy
On Wed, Jan 4, 2023 at 12:03 AM Nathan Bossart  wrote:
>
> On Tue, Jan 03, 2023 at 02:53:10PM +0530, Bharath Rupireddy wrote:
> > In summary:
> > the flow when the standby is in crash recovery is pg_wal -> [archive
> > -> pg_wal -> stream] -> [archive -> pg_wal -> stream] -> [] -> [] ...
> > the flow when the standby is in archive recovery is [archive -> pg_wal
> > -> stream] -> [archive -> pg_wal -> stream] -> [] -> [] ...
>
> This is my understanding as well.
>
> > The proposed patch makes the inherent state change to pg_wal after
> > failure to read from archive in XLogFileReadAnyTLI() to explicit by
> > setting currentSource to XLOG_FROM_PG_WAL in the state machine. I
> > think it doesn't alter the existing state machine or add any new extra
> > lookups in pg_wal.
>
> I'm assuming this change would simplify your other patch that modifieѕ
> WaitForWALToBecomeAvailable() [0].  Is that correct?
>
> [0] https://commitfest.postgresql.org/41/3663/

Yes, it does simplify the other feature patch.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Add a new pg_walinspect function to extract FPIs from WAL records

2023-01-05 Thread Bharath Rupireddy
On Wed, Jan 4, 2023 at 8:19 PM Drouvot, Bertrand
 wrote:
>
> I think it makes sense to somehow align the pg_walinspect functions with the 
> pg_waldump "features".
> And since [1] added FPI "extraction" then +1 for the proposed patch in this 
> thread.
>
> > [1] 
> > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=d497093cbecccf6df26365e06a5f8f8614b591c8
> > [2] 
> > https://postgr.es/m/CAOxo6XKjQb2bMSBRpePf3ZpzfNTwjQUc4Tafh21=jzjx6bx...@mail.gmail.com
>
> I just have a few comments:

Thanks for reviewing.

> +
> +/*
> + * Get full page images and their info associated with a given WAL record.
> + */
>
>
> + 
> +  Gets raw full page images and their information associated with all the
> +  valid WAL records between start_lsn and
> +  end_lsn. Returns one row per full page 
> image.
>
> Worth to add a few words about decompression too?

Done.

> What about adding a few words about compression? (like "Decompression is 
> applied if necessary"?)
>
>
> +   /* Full page exists, so let's output it. */
> +   if (!RestoreBlockImage(record, block_id, page))
>
> "Full page exists, so let's output its info and content." instead?

Done.

> I'm also wondering if it would make sense to extend the test coverage of it 
> (and pg_waldump) to "validate" that both
> extracted images are the same and matches the one modified right after the 
> checkpoint.
>
> What do you think? (could be done later in another patch though).

I think pageinspect can be used here. We can fetch the raw page from
the table after the checkpoint and raw FPI from the WAL record logged
as part of the update. I've tried to do so [1], but I see a slight
difference in the raw output. The expectation is that they both be the
same. It might be that the update operation logs the FPI with some
more info set (prune_xid). I'll try to see why it is so.

I'm attaching the v2 patch for further review.

[1]
SELECT * FROM page_header(:'page_from_table');
lsn| checksum | flags | lower | upper | special | pagesize |
version | prune_xid
---+--+---+---+---+-+--+-+---
 0/1891D78 |0 | 0 |40 |  8064 |8192 | 8192 |
4 | 0
(1 row)

SELECT * FROM page_header(:'page_from_wal');
lsn| checksum | flags | lower | upper | special | pagesize |
version | prune_xid
---+--+---+---+---+-+--+-+---
 0/1891D78 |0 | 0 |44 |  8032 |8192 | 8192 |
4 |   735
(1 row)

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


v2-0001-Add-FPI-extract-function-to-pg_walinspect.patch
Description: Binary data


Re: MERGE ... WHEN NOT MATCHED BY SOURCE

2023-01-05 Thread Dean Rasheed
On Thu, 5 Jan 2023 at 11:03, Alvaro Herrera  wrote:
>
> I haven't read this patch other than superficially; I suppose the
> feature it's introducing is an OK one to have as an extension to the
> standard.  (I hope the community members that are committee members
> will propose this extension to become part of the standard.)
>

Thanks for looking!

> > --- a/src/backend/optimizer/prep/preptlist.c
> > +++ b/src/backend/optimizer/prep/preptlist.c
> > @@ -157,15 +157,14 @@ preprocess_targetlist(PlannerInfo *root)
> >   /*
> >* Add resjunk entries for any Vars used in each 
> > action's
> >* targetlist and WHEN condition that belong to 
> > relations other
> > -  * than target.  Note that aggregates, window 
> > functions and
> > -  * placeholder vars are not possible anywhere in 
> > MERGE's WHEN
> > -  * clauses.  (PHVs may be added later, but they don't 
> > concern us
> > -  * here.)
> > +  * than target.  Note that aggregates and window 
> > functions are not
> > +  * possible anywhere in MERGE's WHEN clauses, but 
> > PlaceHolderVars
> > +  * may have been added by subquery pullup.
> >*/
> >   vars = pull_var_clause((Node *)
> >  
> > list_concat_copy((List *) action->qual,
> > 
> >   action->targetList),
> > -0);
> > +
> > PVC_INCLUDE_PLACEHOLDERS);
>
> Hmm, is this new because of NOT MATCHED BY SOURCE, or is it something
> that can already be hit by existing features of MERGE?  In other words
> -- is this a bug fix that should be backpatched ahead of introducing NOT
> MATCHED BY SOURCE?
>

It's new because of NOT MATCHED BY SOURCE, and I also found that I had
to make the same change in the MERGE INTO view patch, in the case
where the target view is simple enough to allow subquery pullup, but
also had INSTEAD OF triggers causing the pullup to happen in the
planner rather than the rewriter.

I couldn't think of a way that it could happen with the existing MERGE
code though, so I don't think it's a bug that needs fixing and
back-patching.

> > @@ -127,10 +143,12 @@ transformMergeStmt(ParseState *pstate, M
> >*/
> >   is_terminal[0] = false;
> >   is_terminal[1] = false;
> > + is_terminal[2] = false;
>
> I think these 0/1/2 should be replaced by the values of MergeMatchKind.
>

Agreed.

> > + /* Join type required */
> > + if (left_join && right_join)
> > + qry->mergeJoinType = JOIN_FULL;
> > + else if (left_join)
> > + qry->mergeJoinType = JOIN_LEFT;
> > + else if (right_join)
> > + qry->mergeJoinType = JOIN_RIGHT;
> > + else
> > + qry->mergeJoinType = JOIN_INNER;
>
> One of the review comments that MERGE got initially was that parse
> analysis was not a place to "do query optimization", in the sense that
> the original code was making a decision whether to make an outer or
> inner join based on the set of WHEN clauses that appear in the command.
> That's how we ended up with transform_MERGE_to_join and
> mergeUseOuterJoin instead.  This new code is certainly not the same, but
> it makes me a bit unconfortable.  Maybe it's OK, though.
>

Yeah I agree, it's a bit ugly. Perhaps a better solution would be to
do away with that field entirely and just make the decision in
transform_MERGE_to_join() by examining the action list again. That
would require making MergeAction's "matched" field a MergeMatchKind
rather than a bool, but maybe that's not so bad, since retaining that
information might prove useful one day.
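
For anyone following along, here is a rough sketch of how the WHEN clause
kinds are meant to map onto the join types above (made-up table names, and
assuming I have the target-is-left / source-is-right orientation of the
join right):

-- WHEN MATCHED / WHEN NOT MATCHED BY SOURCE only => target LEFT JOIN source
MERGE INTO tgt t USING src s ON t.id = s.id
  WHEN MATCHED THEN UPDATE SET val = s.val
  WHEN NOT MATCHED BY SOURCE THEN DELETE;

-- adding WHEN NOT MATCHED [BY TARGET] as well => FULL JOIN
MERGE INTO tgt t USING src s ON t.id = s.id
  WHEN MATCHED THEN UPDATE SET val = s.val
  WHEN NOT MATCHED BY SOURCE THEN DELETE
  WHEN NOT MATCHED THEN INSERT (id, val) VALUES (s.id, s.val);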

Regards,
Dean




BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

2023-01-05 Thread Michail Nikolaev
Hello, hackers.

It seems like PG 14 works incorrectly with vacuum_defer_cleanup_age
(or perhaps with rows that were simply not cleaned up, I am not sure)
and SELECT FOR UPDATE + UPDATE.
I am not certain, but hot_standby_feedback is probably able to cause the
same issues.

Steps to reproduce:

1) Start Postgres like this:

  docker run -it -p 5432:5432 --name pg -e
POSTGRES_PASSWORD=postgres -e LANG=C.UTF-8 -d postgres:14.6  -c
vacuum_defer_cleanup_age=100

2) Prepare scheme:

  CREATE TABLE something_is_wrong_here (id bigserial PRIMARY KEY,
value numeric(15,4) DEFAULT 0 NOT NULL);
  INSERT INTO something_is_wrong_here (value) (SELECT 1 from
generate_series(0, 100));

3) Prepare file for pgbench:

  BEGIN;

  SELECT omg.*
 FROM something_is_wrong_here AS omg
 ORDER BY random()
 LIMIT 1
 FOR UPDATE
  \gset

  UPDATE something_is_wrong_here SET value = :value + 1 WHERE id = :id;

  END;

4) Run pgbench:

  pgbench -c 50 -j 2 -n -f reproduce.bench 'port= 5432
host=localhost user=postgres dbname=postgres password=postgres' -T 100
-P 1

I was able to see such a set of errors (looks scary):

ERROR:  MultiXactId 30818104 has not been created yet -- apparent wraparound
ERROR:  could not open file "base/13757/16385.1" (target block
39591744): previous segment is only 24 blocks
ERROR:  attempted to lock invisible tuple
ERROR:  could not access status of transaction 38195704
DETAIL:  Could not open file "pg_subtrans/0246": No such file or directory.


Best regards,
Michail.




Re: Improve tab completion for ALTER FUNCTION/PROCEDURE/ROUTINE

2023-01-05 Thread Dean Rasheed
On Tue, 6 Dec 2022 at 19:12, vignesh C  wrote:
>
> On Tue, 6 Dec 2022 at 20:42, Melih Mutlu  wrote:
> >
> > Also one little suggestion:
> >
> >> + if (ends_with(prev_wd, ')'))
> >> + COMPLETE_WITH(Alter_routine_options, "CALLED ON NULL INPUT",
> >> +  "RETURNS NULL ON NULL INPUT", "STRICT", "SUPPORT");
> >
> > What do you think about gathering FUNCTION options as you did with ROUTINE 
> > options.
> > Something like the following would seem nicer, I think.
> >
> >> #define Alter_function_options \
> >> Alter_routine_options, "CALLED ON NULL INPUT", \
> >> "RETURNS NULL ON NULL INPUT", "STRICT", "SUPPORT"
>
> I did not make it as a macro for alter function options as it is used
> only in one place whereas the others were required in more than one
> place.

My feeling is that having this macro somewhat improves readability and
consistency between the 3 cases, so I think it's worth it, even if
it's only used once.

I think it slightly improves readability to keep all the arguments to
Matches() on one line, and that seems to be the style elsewhere, even
if that makes the line longer than 80 characters.

Also in the interests of readability, I think it's slightly easier to
follow if the "ALTER PROCEDURE  (...)" and "ALTER ROUTINE 
(...)" cases are made to immediately follow the "ALTER FUNCTION 
(...)" case, with the longer/more complex cases following on from
that.

That leads to the attached, which barring objections, I'll push shortly.

While playing around with this, I noticed that the "... SET SCHEMA"
case offers "FROM CURRENT" and "TO" as completions, which is
incorrect. It should really offer to complete with a list of schemas.
However, since that's a pre-existing bug in a different region of the
code, I think it's best addressed in a separate patch, which probably
ought to be back-patched.
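
For clarity, the statement being completed there has this shape (made-up
names), so what should be offered after SET SCHEMA is a list of schemas:

ALTER FUNCTION myfunc(integer) SET SCHEMA myschema;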

Regards,
Dean
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
new file mode 100644
index 5312a7e..b07257c
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -1149,6 +1149,21 @@ static const SchemaQuery Query_for_trigg
 "CREATE", "CONNECT", "TEMPORARY", "EXECUTE", "USAGE", "SET", "ALTER SYSTEM", \
 "MAINTAIN", "ALL"
 
+/* ALTER PROCEDURE options */
+#define Alter_procedure_options \
+"DEPENDS ON EXTENSION", "EXTERNAL SECURITY", "NO DEPENDS ON EXTENSION", \
+"OWNER TO", "RENAME TO", "RESET", "SECURITY", "SET"
+
+/* ALTER ROUTINE options */
+#define Alter_routine_options \
+Alter_procedure_options, "COST", "IMMUTABLE", "LEAKPROOF", "NOT LEAKPROOF", \
+"PARALLEL", "ROWS", "STABLE", "VOLATILE"
+
+/* ALTER FUNCTION options */
+#define Alter_function_options \
+Alter_routine_options, "CALLED ON NULL INPUT", "RETURNS NULL ON NULL INPUT", \
+"STRICT", "SUPPORT"
+
 /*
  * These object types were introduced later than our support cutoff of
  * server version 9.2.  We use the VersionedQuery infrastructure so that
@@ -1812,15 +1827,45 @@ psql_completion(const char *text, int st
 		else
 			COMPLETE_WITH_FUNCTION_ARG(prev2_wd);
 	}
-	/* ALTER FUNCTION,PROCEDURE,ROUTINE  (...) */
-	else if (Matches("ALTER", "FUNCTION|PROCEDURE|ROUTINE", MatchAny, MatchAny))
+	/* ALTER FUNCTION  (...) */
+	else if (Matches("ALTER", "FUNCTION", MatchAny, MatchAny))
 	{
 		if (ends_with(prev_wd, ')'))
-			COMPLETE_WITH("OWNER TO", "RENAME TO", "SET SCHEMA",
-		  "DEPENDS ON EXTENSION", "NO DEPENDS ON EXTENSION");
+			COMPLETE_WITH(Alter_function_options);
 		else
 			COMPLETE_WITH_FUNCTION_ARG(prev2_wd);
 	}
+	/* ALTER PROCEDURE  (...) */
+	else if (Matches("ALTER", "PROCEDURE", MatchAny, MatchAny))
+	{
+		if (ends_with(prev_wd, ')'))
+			COMPLETE_WITH(Alter_procedure_options);
+		else
+			COMPLETE_WITH_FUNCTION_ARG(prev2_wd);
+	}
+	/* ALTER ROUTINE  (...) */
+	else if (Matches("ALTER", "ROUTINE", MatchAny, MatchAny))
+	{
+		if (ends_with(prev_wd, ')'))
+			COMPLETE_WITH(Alter_routine_options);
+		else
+			COMPLETE_WITH_FUNCTION_ARG(prev2_wd);
+	}
+	/* ALTER FUNCTION|ROUTINE  (...) PARALLEL */
+	else if (Matches("ALTER", "FUNCTION|ROUTINE", MatchAny, MatchAny, "PARALLEL"))
+		COMPLETE_WITH("RESTRICTED", "SAFE", "UNSAFE");
+	/* ALTER FUNCTION|PROCEDURE|ROUTINE  (...) [EXTERNAL] SECURITY */
+	else if (Matches("ALTER", "FUNCTION|PROCEDURE|ROUTINE", MatchAny, MatchAny, "SECURITY") ||
+			 Matches("ALTER", "FUNCTION|PROCEDURE|ROUTINE", MatchAny, MatchAny, "EXTERNAL", "SECURITY"))
+		COMPLETE_WITH("DEFINER", "INVOKER");
+	/* ALTER FUNCTION|PROCEDURE|ROUTINE  (...) RESET */
+	else if (Matches("ALTER", "FUNCTION|PROCEDURE|ROUTINE", MatchAny, MatchAny, "RESET"))
+		COMPLETE_WITH_QUERY_VERBATIM_PLUS(Query_for_list_of_set_vars,
+		  "ALL");
+	/* ALTER FUNCTION|PROCEDURE|ROUTINE  (...) SET */
+	else if (Matches("ALTER", "FUNCTION|PROCEDURE|ROUTINE", MatchAny, MatchAny, "SET"))
+		COMPLETE_WITH_QUERY_VERBATIM_PLUS(Query_for_list_of_set_vars,
+		  "SCHEMA");
 
 	/* ALTER PUBLICATION  */
 	else if (Matches("ALTER", "PUBLICATION", MatchAny))


Re: Using AF_UNIX sockets always for tests on Windows

2023-01-05 Thread Andrew Dunstan


On 2023-01-04 We 07:13, vignesh C wrote:
> On Fri, 2 Dec 2022 at 18:08, Andrew Dunstan  wrote:
>>
>> On 2022-12-01 Th 21:10, Andres Freund wrote:
>>> Hi,
>>>
>>> On 2022-12-01 20:56:18 -0500, Tom Lane wrote:
 Andres Freund  writes:
> On 2022-12-01 20:30:36 -0500, Tom Lane wrote:
>> If we remove that, won't we have a whole lot of code that's not
>> tested at all on any platform, ie all the TCP-socket code?
> There's some coverage via the auth and ssl tests. But I agree it's an
> issue. But to me the fix for that seems to be to add a dedicated test for
> that, rather than relying on windows to test our socket code - that's 
> quite a
> few separate code paths from the tcp support of other platforms.
 IMO that's not the best way forward, because you'll always have
 nagging questions about whether a single-purpose test covers
 everything that needs coverage.
>>> Still seems better than not having any coverage in our development
>>> environments...
>>>
>>>
 I think the best place to be in would be to be able to run the whole test
 suite using either TCP or UNIX sockets, on any platform (with stuff like 
 the
 SSL test overriding the choice as needed).
>>> I agree that that's useful. But it seems somewhat independent from the
>>> majority of the proposed changes. To be able to test force-tcp-everywhere we
>>> don't need e.g.  code for setting sspi auth in pg_regress etc - it's afaik
>>> just needed so there's a secure way of running tests at all on windows.
>>>
>>> I think 0003 should be "trimmed" to only change the default for
>>> $use_unix_sockets on windows and to remove PG_TEST_USE_UNIX_SOCKETS. Whoever
>>> wants to, can then add a new environment variable to force tap tests to use
>>> tcp.
>>>
>> Not sure if it's useful here, but a few months ago I prepared patches to
>> remove the config-auth option of pg_regress, which struck me as more
>> than odd, and replace it with a perl module. I didn't get around to
>> finishing them, but the patches as of then are attached.
>>
>> I agree that having some switch that says "run everything with TCP" or
>> "run (almost) everything with Unix sockets" would be good.
> The patch does not apply on top of HEAD as in [1], please post a rebased 
> patch:
> === Applying patches on top of PostgreSQL commit ID
> bf03cfd162176d543da79f9398131abc251ddbb9 ===
> === applying patch
> ./0001-Do-config_auth-in-perl-code-for-TAP-tests-and-vcregr.patch
> patching file contrib/basebackup_to_shell/t/001_basic.pl
> Hunk #1 FAILED at 21.
> 1 out of 1 hunk FAILED -- saving rejects to file
> contrib/basebackup_to_shell/t/001_basic.pl.rej
> patching file src/bin/pg_basebackup/t/010_pg_basebackup.pl
> Hunk #1 FAILED at 29.
> 1 out of 1 hunk FAILED -- saving rejects to file
> src/bin/pg_basebackup/t/010_pg_basebackup.pl.rej
> Hunk #3 FAILED at 461.
> 1 out of 3 hunks FAILED -- saving rejects to file
> src/test/perl/PostgreSQL/Test/Cluster.pm.rej
>
> [1] - http://cfbot.cputube.org/patch_41_4033.log
>

What I posted was not intended as a replacement for Thomas' patches, or
indeed meant as a CF item at all.

So really we're waiting on Thomas to post a response to Tom's and
Andres' comments upthread.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Cygwin cleanup

2023-01-05 Thread Thomas Munro
On Wed, Jan 4, 2023 at 3:25 AM Justin Pryzby  wrote:
> On Tue, Jan 03, 2023 at 05:54:56PM +0530, vignesh C wrote:
> > Is there still some work pending for this thread as Andres had
> > committed some part, if so, can you post an updated patch for the
> > same.
>
> Thomas, what's your opinion ?

One observation is that your CI patch *nearly* succeeds, even if
hacked to turn on the full TAP tests, if applied on top of the
WaitEventSet-for-postmaster patch:

https://cirrus-ci.com/task/4533371804581888

No cigar though, it still failed a few times for me in the
subscription tests with EAGAIN, when accessing semaphores:

semctl(24576010, 14, SETVAL, 0) failed: Resource temporarily unavailable

That isn't an error I expect from semctl(), but from some cursory
research it seems like that system call is actually talking to the
cygserver process over a pipe (?) to implement SysV semaphores.  Maybe
it couldn't keep up, but doesn't like to block?  Perhaps we could try
to tune that server, but let's try the POSIX kind of semaphores
instead.  From a quick peek at the source, they are implemented some
other way on direct native NT voodoo, no cygserver involved.

https://cirrus-ci.com/task/5142810819559424 [still running at time of writing]

Gotta run, but I'll check again in the morning to see if that does better...




Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2023-01-05 Thread Alexander Korotkov
On Wed, Jan 4, 2023 at 5:05 PM Alexander Korotkov  wrote:
> On Wed, Jan 4, 2023 at 3:43 PM Pavel Borisov  wrote:
> > One more update of a patchset to avoid compiler warnings.
>
> Thank you for your help.  I'm going to provide the revised version of
> patch with comments and commit message in the next couple of days.

The revised patch is attached.  It contains a descriptive commit message,
comments, and some minor code improvements.

--
Regards,
Alexander Korotkov


0001-Allow-locking-updated-tuples-in-tuple_update-and--v4.patch
Description: Binary data


Re: Logical replication - schema change not invalidating the relation cache

2023-01-05 Thread vignesh C
On Thu, 5 Jan 2023 at 03:17, Tom Lane  wrote:
>
> vignesh C  writes:
> > [ v3-0001-Fix-for-invalidating-logical-replication-relation.patch ]
>
> (btw, please don't send multiple patch versions with the same number,
> it's very confusing.)

Since it was just a rebase on top of HEAD, I did not change the
version; I will take care of this point in later versions.

> I looked briefly at this patch.  I wonder why you wrote a whole new
> callback function instead of just using rel_sync_cache_publication_cb
> for this case too.

Yes we can use rel_sync_cache_publication_cb itself for the
invalidation of the relations, I have changed it.

> The bigger picture here though is that in examples such as the one
> you gave at the top of the thread, it's not very clear to me that
> there's *any* principled behavior.  If the connection between publisher
> and subscriber tables is only the relation name, fine ... but exactly
> which relation name applies?  If you've got a transaction that is both
> inserting some data and renaming the table, it's really debatable which
> insertions should be sent under which name(s).  So how much should we
> actually care about such cases?  Do we really want to force a cache flush
> any time somebody changes a (possibly unrelated) pg_namespace entry?
> We could be giving up significant performance and not accomplishing much
> except changing from one odd behavior to a different one.

The connection between the publisher and subscriber tables is based on
the relation id. With the first change, the relid, relname and schema name
from the publisher are sent to the subscriber. The subscriber stores the
id, relname and schema name in the LogicalRepRelMap hash, keyed by the
relation id. Subsequent data received by the subscriber uses the relation
id sent by the publisher to apply the changes on the subscriber.
The problem does not stop even after the transaction that renames the
schema is completed (Step 3 in the first mail). Even after that transaction
is completed, inserts into both sch1.t1 and sch2.t1 get replicated to
sch1.t1 on the subscriber side. This happens because the publisher relids
of sch2.t1 and sch1.t1 are both mapped to sch1.t1 on the subscriber side,
so both inserts succeed.
Step4) In Publisher
postgres=# insert into sch2.t1 values(11);
INSERT 0 1
postgres=# insert into sch1.t1 values(12);
INSERT 0 1

Step5) In Subscriber
postgres=# select * from sch1.t1;
 c1

 11
 12
(2 rows)

On the first insertion into sch1.t1, the relid, relname and schema name
from the publisher are sent to the subscriber; this entry is mapped to
sch1.t1 on the subscriber side, and any insert from the publisher goes
into sch1.t1.
After the schema rename (the relid does not change), this entry is not
invalidated, so even though we are now inserting into sch2.t1, the
subscriber continues to insert into sch1.t1.
On the first insert into the new table sch1.t1, the relid, relname and
schema name from the publisher are sent to the subscriber, and this entry
is again mapped to sch1.t1 on the subscriber side.
Since both entries, sch1.t1 and sch2.t1, are mapped to sch1.t1 on the
subscriber side, both inserts go into the same table.
This issue gets fixed if we invalidate the relation and update the
relmap in the subscriber.
I did not like the behavior where any insert on sch1.t1 or sch2.t1
replicates the changes to sch1.t1 in the subscriber, and felt it might
be better to fix this issue. I agree that the invalidations are costly.
If you feel this is a very corner case then we can skip it.
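
To restate the publisher-side sequence in SQL terms (just a sketch; the full
setup is in the first mail of this thread, and sch1.t1 is already published
and being applied on the subscriber):

INSERT INTO sch1.t1 VALUES (1);    -- relid, relname and schema sent to subscriber
ALTER SCHEMA sch1 RENAME TO sch2;  -- the table keeps its relid
CREATE SCHEMA sch1;
CREATE TABLE sch1.t1 (c1 int);     -- a new table with a new relid, also published
-- From here on, inserts into either sch2.t1 or the new sch1.t1 are both
-- applied to sch1.t1 on the subscriber, as shown in Step 4 and Step 5 above.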

Attached an updated patch.

Regards,
Vignesh
From 74211d016528699205163dab8ecc7fe04aec09b2 Mon Sep 17 00:00:00 2001
From: Vigneshwaran C 
Date: Sat, 12 Mar 2022 13:04:55 +0530
Subject: [PATCH v4] Fix for invalidating logical replication relations when
 there is a change in schema.

When the schema gets changed, the rel sync cache invalidation was not
happening, fixed it by adding a callback for schema change.
---
 src/backend/replication/pgoutput/pgoutput.c |  3 +
 src/test/subscription/t/001_rep_changes.pl  | 67 +
 2 files changed, 70 insertions(+)

diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 7737242516..f68c271cf2 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -1936,6 +1936,9 @@ init_rel_sync_cache(MemoryContext cachectx)
 	CacheRegisterSyscacheCallback(PUBLICATIONNAMESPACEMAP,
   rel_sync_cache_publication_cb,
   (Datum) 0);
+	CacheRegisterSyscacheCallback(NAMESPACEOID,
+  rel_sync_cache_publication_cb,
+  (Datum) 0);
 }
 
 /*
diff --git a/src/test/subscription/t/001_rep_changes.pl b/src/test/subscription/t/001_rep_changes.pl
index 91aa068c95..3e91640c03 100644
--- a/src/test/subscription/t/001_rep_changes.pl
+++ b/src/test/subscription/t/001_rep_changes.pl
@@ -542,6 +542,73 

Re: Perform streaming logical transactions by background workers and parallel apply

2023-01-05 Thread Dilip Kumar
On Thu, Jan 5, 2023 at 5:03 PM houzj.f...@fujitsu.com
 wrote:
>
> On Thursday, January 5, 2023 4:22 PM Dilip Kumar  
> wrote:
> >

> Thanks for reporting the problem.
>
> After analyzing the behavior, I think it's a bug on publisher side which
> is not directly related to parallel apply.
>
> I think the root reason is that we didn't try to send a stream end(stream
> abort) message to subscriber for the crashed transaction which was streamed
> before.
> The behavior is that, after restarting, the publisher will start to decode the
> transaction that aborted due to crash, and when try to stream the first change
> of that transaction, it will send a stream start message but then it realizes
> that the transaction was aborted, so it will enter the PG_CATCH block of
> ReorderBufferProcessTXN() and call ReorderBufferResetTXN() which send the
> stream stop message. And in this case, there would be a parallel apply worker
> started on subscriber waiting for stream end message which will never come.

I suspected it but didn't analyze this.

> I think the same behavior happens for the non-parallel mode which will cause
> a stream file left on subscriber and will not be cleaned until the apply 
> worker is
> restarted.
> To fix it, I think we need to send a stream abort message when we are cleaning
> up crashed transaction on publisher(e.g., in ReorderBufferAbortOld()). And 
> here
> is a tiny patch which change the same. I have confirmed that the bug is fixed
> and all regression tests pass.
>
> What do you think ?
> I will start a new thread and try to write a testcase if possible
> after reaching a consensus.

I think your analysis looks correct and we can raise this in a new thread.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




RE: Perform streaming logical transactions by background workers and parallel apply

2023-01-05 Thread houzj.f...@fujitsu.com
On Thursday, January 5, 2023 4:22 PM Dilip Kumar  wrote:
> 
> On Thu, Jan 5, 2023 at 9:07 AM houzj.f...@fujitsu.com
>  wrote:
> >
> > On Wednesday, January 4, 2023 9:29 PM Dilip Kumar
>  wrote:
> 
> > > I think this looks good to me.
> >
> > Thanks for the comments.
> > Attach the new version patch set which changed the comments as
> suggested.
> 
> Thanks for the updated patch, while testing this I see one strange
> behavior which seems like bug to me, here is the step to reproduce
> 
> 1. start 2 servers(config: logical_decoding_work_mem=64kB)
> ./pg_ctl -D data/ -c -l pub_logs start
> ./pg_ctl -D data1/ -c -l sub_logs start
> 
> 2. Publisher:
> create table t(a int PRIMARY KEY ,b text);
> CREATE OR REPLACE FUNCTION large_val() RETURNS TEXT LANGUAGE SQL AS
> 'select array_agg(md5(g::text))::text from generate_series(1, 256) g';
> create publication test_pub for table t
> with(PUBLISH='insert,delete,update,truncate');
> alter table t replica identity FULL ;
> insert into t values (generate_series(1,2000),large_val()) ON CONFLICT
> (a) DO UPDATE SET a=EXCLUDED.a*300;
> 
> 3. Subscription Server:
> create table t(a int,b text);
> create subscription test_sub CONNECTION 'host=localhost port=5432
> dbname=postgres' PUBLICATION test_pub WITH ( slot_name =
> test_slot_sub1,streaming=parallel);
> 
> 4. Publication Server:
> begin ;
> savepoint a;
> delete from t;
> savepoint b;
> insert into t values (generate_series(1,5000),large_val()) ON CONFLICT
> (a) DO UPDATE SET a=EXCLUDED.a*3;  -- (while executing this start
> publisher in 2-3 secs)
> 
> Restart the publication server, while the transaction is still in an
> uncommitted state.
> ./pg_ctl -D data/ -c -l pub_logs stop -mi
> ./pg_ctl -D data/ -c -l pub_logs start -mi
> 
> after this, the parallel apply worker stuck in waiting on stream lock
> forever (even after 10 mins) -- see below, from subscriber logs I can
> see one of the parallel apply worker [75677] started but never
> finished [no error], after that I have performed more operation [same
> insert] which got applied by new parallel apply worked which got
> started and finished within 1 second.
> 

Thanks for reporting the problem.

After analyzing the behavior, I think it's a bug on the publisher side which
is not directly related to parallel apply.

I think the root cause is that we didn't try to send a stream end (stream
abort) message to the subscriber for the crashed transaction which was
streamed before.

The behavior is that, after restarting, the publisher will start to decode the
transaction that aborted due to the crash, and when trying to stream the first
change of that transaction, it will send a stream start message but then
realize that the transaction was aborted, so it will enter the PG_CATCH block
of ReorderBufferProcessTXN() and call ReorderBufferResetTXN(), which sends the
stream stop message. In this case, there would be a parallel apply worker
started on the subscriber waiting for a stream end message which will never
come.

I think the same behavior happens in non-parallel mode, which will leave a
stream file on the subscriber that will not be cleaned up until the apply
worker is restarted.

To fix it, I think we need to send a stream abort message when we are cleaning
up a crashed transaction on the publisher (e.g., in ReorderBufferAbortOld()).
Here is a tiny patch which changes that. I have confirmed that the bug is
fixed and all regression tests pass.

What do you think?
I will start a new thread and try to write a testcase if possible
after reaching a consensus.

Best regards,
Hou zj


0001-fix-stream-changes-for-crashed-transaction.patch
Description: 0001-fix-stream-changes-for-crashed-transaction.patch


Re: MERGE ... WHEN NOT MATCHED BY SOURCE

2023-01-05 Thread Alvaro Herrera
I haven't read this patch other than superficially; I suppose the
feature it's introducing is an OK one to have as an extension to the
standard.  (I hope the community members that are committee members 
will propose this extension to become part of the standard.)

On 2023-Jan-02, Dean Rasheed wrote:

> --- a/src/backend/optimizer/prep/preptlist.c
> +++ b/src/backend/optimizer/prep/preptlist.c
> @@ -157,15 +157,14 @@ preprocess_targetlist(PlannerInfo *root)
>   /*
>* Add resjunk entries for any Vars used in each 
> action's
>* targetlist and WHEN condition that belong to 
> relations other
> -  * than target.  Note that aggregates, window functions 
> and
> -  * placeholder vars are not possible anywhere in 
> MERGE's WHEN
> -  * clauses.  (PHVs may be added later, but they don't 
> concern us
> -  * here.)
> +  * than target.  Note that aggregates and window 
> functions are not
> +  * possible anywhere in MERGE's WHEN clauses, but 
> PlaceHolderVars
> +  * may have been added by subquery pullup.
>*/
>   vars = pull_var_clause((Node *)
>  
> list_concat_copy((List *) action->qual,
>   
> action->targetList),
> -0);
> +
> PVC_INCLUDE_PLACEHOLDERS);

Hmm, is this new because of NOT MATCHED BY SOURCE, or is it something
that can already be hit by existing features of MERGE?  In other words
-- is this a bug fix that should be backpatched ahead of introducing NOT
MATCHED BY SOURCE?

> @@ -127,10 +143,12 @@ transformMergeStmt(ParseState *pstate, M
>*/
>   is_terminal[0] = false;
>   is_terminal[1] = false;
> + is_terminal[2] = false;

I think these 0/1/2 should be replaced by the values of MergeMatchKind.

> + /* Join type required */
> + if (left_join && right_join)
> + qry->mergeJoinType = JOIN_FULL;
> + else if (left_join)
> + qry->mergeJoinType = JOIN_LEFT;
> + else if (right_join)
> + qry->mergeJoinType = JOIN_RIGHT;
> + else
> + qry->mergeJoinType = JOIN_INNER;

One of the review comments that MERGE got initially was that parse
analysis was not a place to "do query optimization", in the sense that
the original code was making a decision whether to make an outer or
inner join based on the set of WHEN clauses that appear in the command.
That's how we ended up with transform_MERGE_to_join and
mergeUseOuterJoin instead.  This new code is certainly not the same, but
it makes me a bit uncomfortable.  Maybe it's OK, though.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: How to generate the new expected out file.

2023-01-05 Thread Amit Kapila
On Thu, Jan 5, 2023 at 4:12 PM jian he  wrote:
> Hi.
>
> I changed the src/test/regress/sql/interval.sql, How can I generate the new 
> src/test/regress/expected/interval.out file.
>

You can run the tests and copy the required changes from
src/test/regress/output/interval.out to
src/test/regress/expected/interval.out

-- 
With Regards,
Amit Kapila.




How to generate the new expected out file.

2023-01-05 Thread jian he
Hi.

I changed src/test/regress/sql/interval.sql. How can I generate the new
src/test/regress/expected/interval.out file?


Issue in MERGE with concurrent UPDATE and MERGE

2023-01-05 Thread Shruthi Gowda
Hi,

While I was running some isolation tests for MERGE, I noticed one issue
when MERGE tries to UPDATE rows that are concurrently updated by another
session.

Below is the test case for the same.


===== TEST CASE START =====

DROP TABLE target;
DROP TABLE source;

CREATE TABLE source (id int primary key, balance int);
INSERT INTO source VALUES (1, 100);
INSERT INTO source VALUES (2, 200);

CREATE TABLE target (id int primary key, balance int);
INSERT INTO target VALUES (1, 10);
INSERT INTO target VALUES (2, 20);

Session 1:

begin;
UPDATE target SET balance = balance + 1;
select * from target;

Session 2:

begin;
MERGE INTO target t
  USING (SELECT * from source) s
  ON (s.id = t.id)
  WHEN MATCHED THEN
    UPDATE SET balance = t.balance + s.balance
  WHEN NOT MATCHED THEN
    INSERT (id, balance) VALUES (s.id, s.balance);

< MERGE will wait because the rows are locked by Session 1 >

Session 1:

commit;

Session 2:

SELECT * FROM target;
commit;

===== TEST CASE END =====



The MERGE fails with the following error:

ERROR:  duplicate key value violates unique constraint "target_pkey"
DETAIL:  Key (id)=(2) already exists.



However, the above test case works fine when the target table has only one
row matching the source table. When there are multiple matching rows and
those rows are concurrently updated, only the first record gets updated by
MERGE. The subsequent records fail to update, return from ExecMergeMatched()
at the place shown below, and enter the WHEN NOT MATCHED INSERT flow.


(void) ExecGetJunkAttribute(epqslot,
                            resultRelInfo->ri_RowIdAttNo,
                            &isNull);
if (isNull)
    return false;




Regards,
Shruthi KC
EnterpriseDB: http://www.enterprisedb.com


Re: Infinite Interval

2023-01-05 Thread jian he
On Wed, Jan 4, 2023 at 10:13 PM jian he  wrote:

>
>
> On Tue, Jan 3, 2023 at 6:14 AM Joseph Koshakow  wrote:
>
>> I have another patch, this one adds validations to operations that
>> return intervals and updated error messages. I tried to give all of the
>> error messages meaningful text, but I'm starting to think that almost all
>> of them should just say "interval out of range". The current approach
>> may reveal some implementation details and lead to confusion. For
>> example, some subtractions are converted to additions which would lead
>> to an error message about addition.
>>
>> SELECT date 'infinity' - interval 'infinity';
>> ERROR:  cannot add infinite values with opposite signs
>>
>> I've also updated the commit message to include the remaining TODOs,
>> which I've copied below
>>
>>   1. Various TODOs in code.
>>   2. Correctly implement interval_part for infinite intervals.
>>   3. Test consolidation.
>>   4. Should we just use the months field to test for infinity?
>>
>
>
> 3. Test consolidation.
> I used the DO command, reduced a lot of test sql code.
> I don't know how to generate an interval.out file.
> I hope the format is ok. I use https://sqlformat.darold.net/ format the
> sql code.
> Then I saw on the internet that one line should be no more than 80 chars.
> so I slightly changed the format.
>
> --
>  I recommend David Deutsch's <>
>
>   Jian
>
>
>

1. Various TODOs in code:
combined and cleaned up the logic for the functions in
backend/utils/adt/timestamp.c (timestamp_pl_interval, timestamptz_pl_interval,
interval_pl, interval_mi).
3. Test consolidation in src/test/regress/sql/interval.sql.

For 1, I don't know how to format the code; I have a problem installing
pgindent. If the format is wrong, please reformat.
For 3, as in the previous email in this thread.
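
For reference, the behaviour the combined branches below are intended to
preserve is roughly the following (a sketch of the intended semantics, not
taken from the patch's regression tests):

SELECT timestamp 'infinity' + interval '1 day';      -- infinity
SELECT timestamp '2023-01-01' + interval 'infinity'; -- infinity
SELECT timestamp '-infinity' + interval 'infinity';
-- ERROR:  cannot add infinite values with opposite signs
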
diff --git a/src/backend/utils/adt/timestamp.c b/src/backend/utils/adt/timestamp.c
index 9484b29ec4..350363b9ad 100644
--- a/src/backend/utils/adt/timestamp.c
+++ b/src/backend/utils/adt/timestamp.c
@@ -2973,21 +2973,21 @@ timestamp_pl_interval(PG_FUNCTION_ARGS)
 	 * timestamp with the same sign. Adding two infintes with different signs
 	 * results in an error.
 	 */
-	/* TODO this logic can probably be combined and cleaned up. */
-	if (INTERVAL_IS_NOBEGIN(span) && TIMESTAMP_IS_NOBEGIN(timestamp))
-		TIMESTAMP_NOBEGIN(result);
-	else if (INTERVAL_IS_NOEND(span) && TIMESTAMP_IS_NOEND(timestamp))
-		TIMESTAMP_NOEND(result);
-	else if (INTERVAL_NOT_FINITE(span) && TIMESTAMP_NOT_FINITE(timestamp))
-		ereport(ERROR,
-(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
- errmsg("cannot add infinite values with opposite signs")));
-	else if (INTERVAL_IS_NOBEGIN(span))
-		TIMESTAMP_NOBEGIN(result);
-	else if (INTERVAL_IS_NOEND(span))
-		TIMESTAMP_NOEND(result);
-	else if (TIMESTAMP_NOT_FINITE(timestamp))
-		result = timestamp;
+	if (	(INTERVAL_IS_NOBEGIN(span) && TIMESTAMP_IS_NOBEGIN(timestamp))
+	|| (INTERVAL_IS_NOBEGIN(span) && !TIMESTAMP_NOT_FINITE(timestamp))
+	|| (!INTERVAL_NOT_FINITE(span)&&	TIMESTAMP_IS_NOBEGIN(timestamp)))
+			TIMESTAMP_NOBEGIN(result);
+
+	else if ((INTERVAL_IS_NOEND(span)&& TIMESTAMP_IS_NOEND(timestamp))
+	|| (INTERVAL_IS_NOEND(span)&& !TIMESTAMP_NOT_FINITE(timestamp))
+	||(!INTERVAL_NOT_FINITE(span) &&	TIMESTAMP_IS_NOEND(timestamp)))
+			TIMESTAMP_NOEND(result);
+
+	else if ((!INTERVAL_NOT_FINITE(span) && TIMESTAMP_NOT_FINITE(timestamp))
+		|| (INTERVAL_NOT_FINITE(span) && !TIMESTAMP_NOT_FINITE(timestamp)))
+		ereport(ERROR,
+			(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
+		errmsg("cannot add infinite values with opposite signs")));
 	else
 	{
 		if (span->month != 0)
@@ -3095,21 +3095,21 @@ timestamptz_pl_interval(PG_FUNCTION_ARGS)
 	 * timestamp with the same sign. Adding two infintes with different signs
 	 * results in an error.
 	 */
-	/* TODO this logic can probably be combined and cleaned up. */
-	if (INTERVAL_IS_NOBEGIN(span) && TIMESTAMP_IS_NOBEGIN(timestamp))
-		TIMESTAMP_NOBEGIN(result);
-	else if (INTERVAL_IS_NOEND(span) && TIMESTAMP_IS_NOEND(timestamp))
-		TIMESTAMP_NOEND(result);
-	else if (INTERVAL_NOT_FINITE(span) && TIMESTAMP_NOT_FINITE(timestamp))
-		ereport(ERROR,
-(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
- errmsg("cannot add infinite values with opposite signs")));
-	else if (INTERVAL_IS_NOBEGIN(span))
-		TIMESTAMP_NOBEGIN(result);
-	else if (INTERVAL_IS_NOEND(span))
-		TIMESTAMP_NOEND(result);
-	else if (TIMESTAMP_NOT_FINITE(timestamp))
-		result = timestamp;
+	if ((INTERVAL_IS_NOBEGIN(span) && TIMESTAMP_IS_NOBEGIN(timestamp))
+|| (INTERVAL_IS_NOBEGIN(span) && !TIMESTAMP_NOT_FINITE(timestamp))
+|| (!INTERVAL_NOT_FINITE(span)&&  TIMESTAMP_IS_NOBEGIN(timestamp)))
+TIMESTAMP_NOBEGIN(result);
+
+else if ((INTERVAL_IS_NOEND(span)   && TIMESTAMP_IS_NOEND(timestamp))
+	|| (INTERVAL_IS_NOEND(span)&& !TIMESTAMP_NOT_FINITE(timestamp))
+	|| (!INTERVAL_NOT_FINITE(span) &&  

Re: Allow placeholders in ALTER ROLE w/o superuser

2023-01-05 Thread Alexander Korotkov
On Tue, Jan 3, 2023 at 5:41 PM Pavel Borisov  wrote:
> On Tue, 3 Jan 2023 at 17:28, Justin Pryzby  wrote:
> > I should've mentioned that the same things should be removed from
> > Makefile, too...
> >
> Thanks, Justin!
> Attached is a new patch accordingly.

Thank you.  I've pushed my version, which is split into two commits.

--
Regards,
Alexander Korotkov




Re: Split index and table statistics into different types of stats

2023-01-05 Thread Drouvot, Bertrand

Hi,

On 1/5/23 1:27 AM, Andres Freund wrote:

Hi,

On 2023-01-03 15:19:18 +0100, Drouvot, Bertrand wrote:

diff --git a/src/backend/access/common/relation.c 
b/src/backend/access/common/relation.c
index 4017e175e3..fca166a063 100644
--- a/src/backend/access/common/relation.c
+++ b/src/backend/access/common/relation.c
@@ -73,7 +73,10 @@ relation_open(Oid relationId, LOCKMODE lockmode)
if (RelationUsesLocalBuffers(r))
MyXactFlags |= XACT_FLAGS_ACCESSEDTEMPNAMESPACE;
  
-	pgstat_init_relation(r);

+   if (r->rd_rel->relkind == RELKIND_INDEX)
+   pgstat_init_index(r);
+   else
+   pgstat_init_table(r);
  
  	return r;

  }
@@ -123,7 +126,10 @@ try_relation_open(Oid relationId, LOCKMODE lockmode)
if (RelationUsesLocalBuffers(r))
MyXactFlags |= XACT_FLAGS_ACCESSEDTEMPNAMESPACE;
  
-	pgstat_init_relation(r);

+   if (r->rd_rel->relkind == RELKIND_INDEX)
+   pgstat_init_index(r);
+   else
+   pgstat_init_table(r);
  
  	return r;

  }


Not this patch's fault, but the functions in relation.c have gotten duplicated
to an almost ridiculous degree :(



Thanks for looking at it!
Right, I'll have a look and will try to submit a dedicated patch for this.




diff --git a/src/backend/storage/buffer/bufmgr.c 
b/src/backend/storage/buffer/bufmgr.c
index 3fb38a25cf..98bb230b95 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -776,11 +776,19 @@ ReadBufferExtended(Relation reln, ForkNumber forkNum, 
BlockNumber blockNum,
 * Read the buffer, and update pgstat counters to reflect a cache hit or
 * miss.
 */
-   pgstat_count_buffer_read(reln);
+   if (reln->rd_rel->relkind == RELKIND_INDEX)
+   pgstat_count_index_buffer_read(reln);
+   else
+   pgstat_count_table_buffer_read(reln);
buf = ReadBuffer_common(RelationGetSmgr(reln), 
reln->rd_rel->relpersistence,
forkNum, blockNum, mode, 
strategy, );
if (hit)
-   pgstat_count_buffer_hit(reln);
+   {
+   if (reln->rd_rel->relkind == RELKIND_INDEX)
+   pgstat_count_index_buffer_hit(reln);
+   else
+   pgstat_count_table_buffer_hit(reln);
+   }
return buf;
  }


Not nice to have additional branches here :(.


Indeed, but that does look like the price to pay for the moment ;-(



I think going forward we should move buffer stats to a "per-relfilenode" stats
entry (which would allow to track writes too), then thiw branch would be
removed again.




Agreed. I think the best approach is to have this patch committed and then
resume working on [1] (which would most probably introduce
the "per-relfilenode" stats). Does this approach make sense to you?



+/* -
+ *
+ * pgstat_index.c
+ *   Implementation of index statistics.


This is a fair bit of duplicated code. Perhaps it'd be worth keeping
pgstat_relation with code common to table/index stats?



Good point, will look at what can be done.




+bool
+pgstat_index_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
+{
+   static const PgStat_IndexCounts all_zeroes;
+   Oid dboid;
+
+   PgStat_IndexStatus *lstats; /* pending stats entry  */
+   PgStatShared_Index *shrelcomstats;


What does "com" stand for in shrelcomstats?



Oops, thanks!

This naming comes from my first attempt at this subject (which I did not share).
The idea at that time was to create a PGSTAT_KIND_RELATION_COMMON stat type
for the stats common to tables and indexes, and a dedicated one
(PGSTAT_KIND_TABLE) for tables (given that indexes would have been fully
covered by the common one).
But it did not work well (especially as we want "dedicated" field names), so I
preferred to submit the current proposal.

Will fix this bad naming.




+   PgStat_StatIndEntry *indentry;  /* index entry of shared stats */
+   PgStat_StatDBEntry *dbentry;/* pending database entry */
+
+   dboid = entry_ref->shared_entry->key.dboid;
+   lstats = (PgStat_IndexStatus *) entry_ref->pending;
+   shrelcomstats = (PgStatShared_Index *) entry_ref->shared_stats;
+
+   /*
+* Ignore entries that didn't accumulate any actual counts, such as
+* indexes that were opened by the planner but not used.
+*/
+   if (memcmp(>i_counts, _zeroes,
+  sizeof(PgStat_IndexCounts)) == 0)
+   {
+   return true;
+   }


I really need to propose pg_memiszero().



Oh yeah, great idea, that would be easier to read.





  Datum
-pg_stat_get_xact_numscans(PG_FUNCTION_ARGS)
+pg_stat_get_tab_xact_numscans(PG_FUNCTION_ARGS)
  {
Oid relid = PG_GETARG_OID(0);
int64   result;
@@ 

Re: daitch_mokotoff module

2023-01-05 Thread Dag Lem
Is there anything else I should do here, to avoid the status being
incorrectly stuck at "Waiting for Author" again?

Best regards

Dag Lem




Re: Perform streaming logical transactions by background workers and parallel apply

2023-01-05 Thread Dilip Kumar
On Thu, Jan 5, 2023 at 9:07 AM houzj.f...@fujitsu.com
 wrote:
>
> On Wednesday, January 4, 2023 9:29 PM Dilip Kumar  
> wrote:

> > I think this looks good to me.
>
> Thanks for the comments.
> Attach the new version patch set which changed the comments as suggested.

Thanks for the updated patch. While testing this I see one strange
behavior which seems like a bug to me; here are the steps to reproduce:

1. start 2 servers(config: logical_decoding_work_mem=64kB)
./pg_ctl -D data/ -c -l pub_logs start
./pg_ctl -D data1/ -c -l sub_logs start

2. Publisher:
create table t(a int PRIMARY KEY ,b text);
CREATE OR REPLACE FUNCTION large_val() RETURNS TEXT LANGUAGE SQL AS
'select array_agg(md5(g::text))::text from generate_series(1, 256) g';
create publication test_pub for table t
with(PUBLISH='insert,delete,update,truncate');
alter table t replica identity FULL ;
insert into t values (generate_series(1,2000),large_val()) ON CONFLICT
(a) DO UPDATE SET a=EXCLUDED.a*300;

3. Subscription Server:
create table t(a int,b text);
create subscription test_sub CONNECTION 'host=localhost port=5432
dbname=postgres' PUBLICATION test_pub WITH ( slot_name =
test_slot_sub1,streaming=parallel);

4. Publication Server:
begin ;
savepoint a;
delete from t;
savepoint b;
insert into t values (generate_series(1,5000),large_val()) ON CONFLICT
(a) DO UPDATE SET a=EXCLUDED.a*3;  -- (while executing this start
publisher in 2-3 secs)

Restart the publication server, while the transaction is still in an
uncommitted state.
./pg_ctl -D data/ -c -l pub_logs stop -mi
./pg_ctl -D data/ -c -l pub_logs start -mi

after this, the parallel apply worker is stuck waiting on the stream lock
forever (even after 10 mins) -- see below. From the subscriber logs I can
see that one of the parallel apply workers [75677] started but never
finished [no error]; after that I performed another operation [the same
insert], which got applied by a new parallel apply worker which started
and finished within 1 second.

dilipku+  75660  1  0 13:39 ?00:00:00
/home/dilipkumar/work/PG/install/bin/postgres -D data
dilipku+  75661  75660  0 13:39 ?00:00:00 postgres: checkpointer
dilipku+  75662  75660  0 13:39 ?00:00:00 postgres: background writer
dilipku+  75664  75660  0 13:39 ?00:00:00 postgres: walwriter
dilipku+  75665  75660  0 13:39 ?00:00:00 postgres: autovacuum launcher
dilipku+  75666  75660  0 13:39 ?00:00:00 postgres: logical
replication launcher
dilipku+  75675  75595  0 13:39 ?00:00:00 postgres: logical
replication apply worker for subscription 16389
dilipku+  75676  75660  0 13:39 ?00:00:00 postgres: walsender
dilipkumar postgres ::1(42192) START_REPLICATION
dilipku+  75677  75595  0 13:39 ?00:00:00 postgres: logical
replication parallel apply worker for subscription 16389  waiting


Subscriber logs:
2023-01-05 13:39:07.261 IST [75595] LOG:  background worker "logical
replication worker" (PID 75649) exited with exit code 1
2023-01-05 13:39:12.272 IST [75675] LOG:  logical replication apply
worker for subscription "test_sub" has started
2023-01-05 13:39:12.307 IST [75677] LOG:  logical replication parallel
apply worker for subscription "test_sub" has started
2023-01-05 13:43:31.003 IST [75596] LOG:  checkpoint starting: time
2023-01-05 13:46:32.045 IST [76337] LOG:  logical replication parallel
apply worker for subscription "test_sub" has started
2023-01-05 13:46:35.214 IST [76337] LOG:  logical replication parallel
apply worker for subscription "test_sub" has finished
2023-01-05 13:46:50.241 IST [76384] LOG:  logical replication parallel
apply worker for subscription "test_sub" has started
2023-01-05 13:46:53.676 IST [76384] LOG:  logical replication parallel
apply worker for subscription "test_sub" has finished

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



