Re: Design of pg_stat_subscription_workers vs pgstats

2022-02-20 Thread Andres Freund
Hi,

On 2022-02-21 12:39:31 +0530, Amit Kapila wrote:
> Fair enough. Then, how about keeping the following information:

Mostly sounds good.


> * subid (subscription id)
> * subname (subscription name)

Coming from catalog, via join, I assume?


> * sync_error_count/sync_failure_count (number of times table sync failed)
> * apply_error_count/apply_failure_count (number of times apply failed)

Yep.


> * sync_success_count (number of table syncs successfully finished)

This one I'm not quite convinced by. You can't rely on precise counts with
pgstats and we should be able to get a better idea from somewhere more
permanent which relations succeeded?  But it also doesn't do much harm, so ...


> * apply_commit_count (number of transactions applied successfully)
> * apply_rollback_count (number of transactions explicitly rolled back)

What does "explicit" mean here?


> * stats_reset (Time at which these statistics were last reset)
> 
> The view name could be pg_stat_subscription_lrep,
> pg_stat_logical_replication, or something on those lines.

pg_stat_subscription_stats :)

(I really dislike that we have pg_stat_ stuff that's not actually stats, but
something describing the current state, but that ship has well sailed).

Greetings,

Andres Freund




Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?

2022-02-20 Thread Andy Fan
>
>
> >> It actually deals with a more general form of this case, because the
> >> clauses don't need to reference the same attribute - so for example this
> >> would work too, assuming there is extended stats object on the columns
> >> on each side:
> >>
> >>   P(A.c = B.d | (A.e < 42) & (B.f < 42))
> >
> > That'd be cool.
> >
>
> Yeah, but the patch implementing this still needs more work.
>
>
Thanks for that patch. That patch has been on my study list for a long
time, and it can fix another real case I ran into some days ago.  I spent a
day studying it again yesterday, but the result does not yet deserve
sharing at the current stage.  As for the purpose here, if we have
extended statistics, I believe it can work well.  But requiring extended
statistics for this feature does not look very reasonable to me.  Do you
think we can go further in that direction for the issue here?  It would
also be super great if you could take a look at commit 3 [1].  IIUC,
it can solve the issue and is pretty straightforward.

[1]
https://www.postgresql.org/message-id/CAKU4AWrdeQZ8xvf%3DDVhndUs%3DRGn8oVoSJvYK3Yj7uWq2%3Ddt%3DMg%40mail.gmail.com


-- 
Best Regards
Andy Fan


RE: Synchronizing slots from primary to standby

2022-02-20 Thread kato-...@fujitsu.com
Hello,

This patch's status is already Returned with Feedback.
However, I applied this patch on 27b77ecf9f and tested it, so I am reporting the results.

make installcheck-world passes.
However, when I promote the standby server and run updates on the new primary server,
the apply worker cannot start logical replication and emits the following errors.

LOG:  background worker "logical replication worker" (PID 14506) exited with 
exit code 1
LOG:  logical replication apply worker for subscription "sub1" has started
ERROR:  terminating logical replication worker due to timeout
LOG:  background worker "logical replication worker" (PID 14535) exited with 
exit code 1
LOG:  logical replication apply worker for subscription "sub1" has started

It seems that the apply worker does not start because a walsender already exists
on the new primary.
Do you have any thoughts about what the cause might be?

test script is attached.

regards, sho kato


failover.sh
Description: failover.sh


Re: make tuplestore helper function

2022-02-20 Thread Michael Paquier
On Thu, Feb 17, 2022 at 04:10:01PM +0900, Michael Paquier wrote:
> Asserting that we are in the correct memory context in when calling
> MakeFuncResultTuplestore() sounds rather sensible from here as per the
> magics done in the various json functions.  Still, it really feels
> like we could do a more centralized consolidation of the whole.

So, I got my hands on this area, and found myself applying 07daca5 as
a first piece of the puzzle.  Anyway, after more review today, I have
bumped into more pieces that could be consolidated, and finished with
the following patch set:
- 0001 changes a couple of SRFs that I found worth simplifying.  These
refer mostly to the second and fourth group mentioned upthread by
Melanie, with two exceptions: pg_tablespace_databases() where it is
not worth changing to keep it backward-compatible and pg_ls_dir() as
per its one-arg version.  That's a nice first cut in itself.
- 0002 changes a couple of places to unify some existing SRF checks,
that I bumped into on the way.  The value here is in using the same
error messages everywhere, reducing the translation effort and
generating the same errors for all cases based on the same conditions.
This eases the review of the next patch.
- 0003 is the actual refactoring meat, where I have been able to move
the check on expectedDesc into MakeFuncResultTuplestore(), shaving
more code than previously.  If you discard the cases of patch 0001, it
should actually be possible to set setResult, setDesc and returnMode
within the new function, which would feel natural as that's the place
where we create the function's tuplestore and we want to materialize
its contents.  The cases with the JSON code are also a bit hairy and
need more thoughts, but we could also cut this part of the code from
the initial refactoring effort.
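To make that last point concrete, a rough sketch of the consolidated helper
(illustration only, not the exact code in the attached patches) could look
like this:

```
/*
 * Illustrative sketch only (needs funcapi.h, miscadmin.h and
 * utils/tuplestore.h); the posted patches may shape this differently.
 */
static Tuplestorestate *
MakeFuncResultTuplestore(FunctionCallInfo fcinfo, TupleDesc *tupdesc)
{
	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
	Tuplestorestate *tupstore;
	MemoryContext oldcontext;

	/* Check that the caller supports returning a materialized result */
	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
		ereport(ERROR,
				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
				 errmsg("set-valued function called in context that cannot accept a set")));
	if (!(rsinfo->allowedModes & SFRM_Materialize) ||
		rsinfo->expectedDesc == NULL)
		ereport(ERROR,
				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
				 errmsg("materialize mode required, but it is not allowed in this context")));

	/* Build the result tuple descriptor from the function's declaration */
	if (get_call_result_type(fcinfo, NULL, tupdesc) != TYPEFUNC_COMPOSITE)
		elog(ERROR, "return type must be a row type");

	/* The tuplestore must be created in the per-query memory context */
	oldcontext = MemoryContextSwitchTo(rsinfo->econtext->ecxt_per_query_memory);
	tupstore = tuplestore_begin_heap(rsinfo->allowedModes & SFRM_Materialize_Random,
									 false, work_mem);
	rsinfo->returnMode = SFRM_Materialize;
	rsinfo->setResult = tupstore;
	rsinfo->setDesc = *tupdesc;
	MemoryContextSwitchTo(oldcontext);

	return tupstore;
}
```

With something like that in place, a caller such as pg_prepared_statement()
would shrink to one call plus its tuplestore_putvalues() loop.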

So, for now, 0001 and 0002 look like rather committable pieces.  0003
needs a bit more thoughts about all the corner cases we need to
consider, mostly what I am mentioning above.  Perhaps the pieces of
0003 related to pg_options_to_table() should be moved to 0001 as a
matter of clarity.
--
Michael
From 0b72ccfe50dfb9e8a3d7f71fd15b6c653ead22a7 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Mon, 21 Feb 2022 16:15:26 +0900
Subject: [PATCH v7 1/3] Clean up and simplify some code related to SRFs

---
 src/backend/commands/prepare.c | 31 +++---
 src/backend/utils/misc/guc.c   | 20 ++---
 src/backend/utils/misc/pg_config.c | 69 --
 src/backend/utils/mmgr/portalmem.c | 23 +++---
 4 files changed, 35 insertions(+), 108 deletions(-)

diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index e0c985ef8b..dce30aed6c 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -22,6 +22,7 @@
 #include "catalog/pg_type.h"
 #include "commands/createas.h"
 #include "commands/prepare.h"
+#include "funcapi.h"
 #include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
 #include "parser/analyze.h"
@@ -716,30 +717,13 @@ pg_prepared_statement(PG_FUNCTION_ARGS)
 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
  errmsg("materialize mode required, but it is not allowed in this context")));
 
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+		elog(ERROR, "return type must be a row type");
+
 	/* need to build tuplestore in query context */
 	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
 	oldcontext = MemoryContextSwitchTo(per_query_ctx);
 
-	/*
-	 * build tupdesc for result tuples. This must match the definition of the
-	 * pg_prepared_statements view in system_views.sql
-	 */
-	tupdesc = CreateTemplateTupleDesc(7);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "name",
-	   TEXTOID, -1, 0);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 2, "statement",
-	   TEXTOID, -1, 0);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 3, "prepare_time",
-	   TIMESTAMPTZOID, -1, 0);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 4, "parameter_types",
-	   REGTYPEARRAYOID, -1, 0);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 5, "from_sql",
-	   BOOLOID, -1, 0);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 6, "generic_plans",
-	   INT8OID, -1, 0);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 7, "custom_plans",
-	   INT8OID, -1, 0);
-
 	/*
 	 * We put all the tuples into a tuplestore in one scan of the hashtable.
 	 * This avoids any issue of the hashtable possibly changing between calls.
@@ -747,6 +731,9 @@ pg_prepared_statement(PG_FUNCTION_ARGS)
 	tupstore =
 		tuplestore_begin_heap(rsinfo->allowedModes & SFRM_Materialize_Random,
 			  false, work_mem);
+	rsinfo->returnMode = SFRM_Materialize;
+	rsinfo->setResult = tupstore;
+	rsinfo->setDesc = tupdesc;
 
 	/* generate junk in short-term context */
 	MemoryContextSwitchTo(oldcontext);
@@ -778,10 +765,6 @@ pg_prepared_statement(PG_FUNCTION_ARGS)
 		}
 	}
 
-	rsinfo->returnMode = SFRM_Materialize;
-	rsinfo->setResult = tupstore;
-	rsinfo->setDesc = tupdesc;
-
 	return (Datum) 0;

Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?

2022-02-20 Thread Andy Fan
Thanks for the detailed explanation.

On Sat, Feb 19, 2022 at 2:27 AM Robert Haas  wrote:

> On Fri, Feb 18, 2022 at 12:56 AM Andy Fan 
> wrote:
> > What do you think about moving on this feature?  The items known by me
> > are: 1).  Make sure the estimation error can be fixed or discuss if my
> current
> > solution is workable.  b).  Just distribute some selectivity
> restrictinfo to
> > RelOptInfo.  c).  See how hard it is to treat the original / derived
> qual equally.
> > d).  Reduce the extra planner cost at much as possible.  Any other
> important
> > items I missed?
>
> I think it's not realistic to do anything here for PostgreSQL 15.
> Considering that it's almost the end of February and feature freeze
> will probably be in perhaps 5-6 weeks, in order to get something
> committed at this point,


I didn't expect that we could commit it very soon;)   Actually my
expectation
was that more people would care about the direction of this feature.  I care
about it, but that's not enough obviously.  So I summarized the direction I
want to go, and let more people see if that's right.


> Tom doesn't think we need this at all, and you and I and
> Tomas all have somewhat different ideas on what approach we ought to
> be taking,


Agreed.  IMO, the estimation error looks like a serious issue that we
all agree needs a solution, but currently we have different ideas about
how to handle it.  I'd very much hope that we can have a discussion about
this stuff.

> and the patches appear to be at a POC level at this point rather than
> something that's close to being ready to ship,
>

This is very true since there is no consensus on an approach so far.  A PoC
should be enough for now.


> It seems to me that the thing to do here is see if you can build
> consensus on an approach. Just saying that we ought to think the
> patches you've already got are good enough is not going to get you
> anywhere.


I truly understand this.  No matter which approach I push for, the
only reason is that I think it is the best one IMO, not whether it
comes from me or not.


> I do understand that the political element of this problem
> is frustrating to you, as it is to many people. But consider the
> alternative: suppose the way things worked around here is that any
> committer could commit anything they liked without needing the
> approval of any other committer, or even over their objections. Well,
> it would be chaos.


This is the fact I think.


> People would be constantly reverting or rewriting
> things that other people had done, and everybody would probably be
> pissed off at each other all the time, and the quality would go down
> the tubes and nobody would use PostgreSQL any more.



> But the reason it's frustrating is because the PostgreSQL
> community is a community of human beings, and there's nothing more
> frustrating in the world than the stuff other human beings do.
>
>
That is new knowledge for me about how committers think about others' patches :)
It is reasonable.  Committing the patch is not my only goal.  Thinking
about the problem more completely is also an awesome thing to get out of the
discussion.  It's just that sometimes being ignored is frustrating (I also truly
understand that everyone's energy is limited).

However, it's equally true that we get further working together than
> we would individually. I think Tom is wrong about the merits of doing
> something in this area, but I also think he's incredibly smart and
> thoughtful and one of the best technologists I've ever met, and
> probably just one of the absolute best technologists on Planet Earth.
> And I also have to consider, and this is really important, the
> possibility that Tom is right about this issue and I am wrong. So far
> Tom hasn't replied to what I wrote, but I hope he does. Maybe he'll
> admit that I have some valid points. Maybe he'll tell me why he thinks
> I'm wrong. Maybe I'll learn about some problem that I haven't
> considered from his response, and maybe that will lead to a refinement
> of the idea that will make it better.


+1.  Just to be more precise, are you also unconvinced about why this
should not be done at all?  IIUC, I get three reasons from Tom's reply:
a) planning cost; b) estimation error; c) extra qual execution is bad.



> I don't know, but it's certainly
> happened in plenty of other cases. And that's how PostgreSQL gets to
> be this pretty amazing database that it is. So, yeah, building
> consensus is frustrating and it takes a long time and sometimes it
> feels like other people are obstructing you needlessly and sometimes
> that's probably true. But there's not a realistic alternative. Nobody
> here is smart enough to create software that is as good as what all of
> us create together.
>

+1.

-- 
Best Regards
Andy Fan


Re: Slow standby snapshot

2022-02-20 Thread Michail Nikolaev
Hello, Andrey.

Thanks for your efforts.

> Patch on barrier seems too complicated to me right now. I’d propose to focus 
> on KnownAssignedXidsNext patch: it’s clean, simple and effective.
I'll extract it into a separate patch later.

> I’ve rebased the patch so that it does not depend on previous step. Please 
> check out it’s current state, if you are OK with it - let’s mark the patch 
> Ready for Committer. Just maybe slightly better commit message would make the 
> patch easier to understand.
Everything seems to be correct.

Best regards,
Michail.




Re: Design of pg_stat_subscription_workers vs pgstats

2022-02-20 Thread Amit Kapila
On Mon, Feb 21, 2022 at 11:04 AM Andres Freund  wrote:
>
> On 2022-02-21 12:56:46 +0900, Masahiko Sawada wrote:
>
> > To take a precedent, we used to store accumulative statistics such as
> > spill_txns to pg_stat_replication, but then for the same reason we moved
> > those statistics to the new stats view, pg_stat_replication_slot. New
> > subscription statistics that we're introducing are cumulative statistics
> > whereas pg_stat_subscription is a dynamic statistics view.
>
> I'm happy to have cumulative subscriber stats somewhere in pgstats. But it
> shouldn't be split by worker or relation, and it shouldn't contain
> non-cumulative error information.
>

Fair enough. Then, how about keeping the following information:

* subid (subscription id)
* subname (subscription name)
* sync_error_count/sync_failure_count (number of times table sync failed)
* apply_error_count/apply_failure_count (number of times apply failed)
* sync_success_count (number of table syncs successfully finished)
* apply_commit_count (number of transactions applied successfully)
* apply_rollback_count (number of transactions explicitly rolled back)
* stats_reset (Time at which these statistics were last reset)

The view name could be pg_stat_subscription_lrep,
pg_stat_logical_replication, or something on those lines.
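
To visualize the cumulative part, such an entry could boil down to something
like the following in pgstats (a rough sketch only; the struct name and
layout here are just for illustration, not from any posted patch):

```
/* Rough sketch only; names are illustrative. */
typedef struct PgStat_StatSubEntry
{
	Oid			subid;			/* subscription OID; subname would come from
								 * a join against pg_subscription */
	PgStat_Counter sync_error_count;
	PgStat_Counter apply_error_count;
	PgStat_Counter sync_success_count;
	PgStat_Counter apply_commit_count;
	PgStat_Counter apply_rollback_count;
	TimestampTz stats_reset;	/* time of last reset */
} PgStat_StatSubEntry;
```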

-- 
With Regards,
Amit Kapila.




Re: Assert in pageinspect with NULL pages

2022-02-20 Thread Alexander Lakhin
Hello Michael,

18.02.2022 06:07, Michael Paquier wrote:
> On Thu, Feb 17, 2022 at 09:00:20PM -0600, Justin Pryzby wrote:
>> BRIN can also crash if passed a non-brin index.
>>
>> I've been sitting on this one for awhile.  Feel free to include it in your
>> patchset.
> Ugh.  Thanks!  I am keeping a note about this one.
Could you please confirm before committing the patchset that it fixes
bug #16527 [1]? Or maybe I could check it?
(The original patch proposed by Daria doesn't cover that case, but if the
patch is going to be improved, it's probably worth fixing that bug too.)

[1]
https://www.postgresql.org/message-id/flat/16527-ef7606186f0610a1%40postgresql.org

Best regards,
Alexander




Re: Optionally automatically disable logical replication subscriptions on error

2022-02-20 Thread Peter Smith
Thanks for addressing my previous comments. Now I have looked at v19.

On Mon, Feb 21, 2022 at 11:25 AM osumi.takami...@fujitsu.com
 wrote:
>
> On Friday, February 18, 2022 3:27 PM Peter Smith  
> wrote:
> > Hi. Below are my code review comments for v18.
> Thank you for your review !
...
> > 5. src/backend/replication/logical/worker.c - DisableSubscriptionOnError
> >
> > + /*
> > + * We would not be here unless this subscription's disableonerr field
> > + was
> > + * true when our worker began applying changes, but check whether that
> > + * field has changed in the interim.
> > + */
> >
> > Apparently, this function might just do nothing if it detects some situation
> > where the flag was changed somehow, but I'm not 100% sure that the callers
> > are properly catering for when nothing happens.
> >
> > IMO it would be better if this function would return true/false to mean "did
> > disable subscription happen or not?" because that will give the calling 
> > code the
> > chance to check the function return and do the right thing - e.g. if the 
> > caller first
> > thought it should be disabled but then it turned out it did NOT disable...
> I don't think we need to do anything more.
> After this function, the table sync worker and the apply worker
> just exit. IMO, we don't need to do additional work for an
> already-disabled subscription on the caller side.
> It should be sufficient to fulfill the purpose of
> DisableSubscriptionOnError, or to confirm it has been fulfilled.

Hmmm - yeah, it may be that the workers just exit soon after anyhow,
as you say, so everything comes out in the wash.  But still, I felt that
for the case when DisableSubscriptionOnError turns out to do nothing it
would be better to exit via the existing logic, and that is easy to do
if the function returns true/false.

For example, changes like below seemed neater code to me. YMMV.

BEFORE (SyncTableStartWrapper):
if (MySubscription->disableonerr)
{
DisableSubscriptionOnError();
proc_exit(0);
}
AFTER
if (MySubscription->disableonerr && DisableSubscriptionOnError())
proc_exit(0);

BEFORE (ApplyLoopWrapper)
if (MySubscription->disableonerr)
{
/* Disable the subscription */
DisableSubscriptionOnError();
return;
}
AFTER
if (MySubscription->disableonerr && DisableSubscriptionOnError())
return;

~~~

Here are a couple more comments:

1. src/backend/replication/logical/worker.c -
DisableSubscriptionOnError, Refactor error handling

(this comment assumes the above gets changed too)

+static void
+DisableSubscriptionOnError(void)
+{
+ Relation rel;
+ bool nulls[Natts_pg_subscription];
+ bool replaces[Natts_pg_subscription];
+ Datum values[Natts_pg_subscription];
+ HeapTuple tup;
+ Form_pg_subscription subform;
+
+ /* Emit the error */
+ EmitErrorReport();
+ /* Abort any active transaction */
+ AbortOutOfAnyTransaction();
+ /* Reset the ErrorContext */
+ FlushErrorState();
+
+ /* Disable the subscription in a fresh transaction */
+ StartTransactionCommand();

If this DisableSubscriptionOnError function decides later that the
'disableonerr' flag is actually false (i.e. it's NOT going to disable
the subscription after all) then IMO it makes more sense for the error
logging in that case to just do whatever it does now via the normal
error processing mechanism.

In other words, I thought perhaps the code to
EmitErrorReport/FlushError state etc be moved to be BELOW the if
(!subform->subdisableonerr) bail-out code?

Please see what you think in my attached POC [1]. It seems neater to
me, and tests are all OK. Maybe I am mistaken...

~~~

2. Commit message - wording

Logical replication apply workers for a subscription can easily get
stuck in an infinite loop of attempting to apply a change,
triggering an error (such as a constraint violation), exiting with
an error written to the subscription worker log, and restarting.

SUGGESTION
"exiting with an error written" --> "exiting with the error written"

--
[1] peter-v19-poc.diff - POC just to try some of my suggestions above
to make sure all tests still pass ok.

Kind Regards,
Peter Smith.
Fujitsu Australia.
diff --git a/src/backend/replication/logical/worker.c 
b/src/backend/replication/logical/worker.c
index a3c240f..4fe6b7f 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -2805,7 +2805,7 @@ LogicalRepApplyLoop(XLogRecPtr last_received)
 /*
  * Disable the current subscription, after error recovery processing.
  */
-static void
+static bool
 DisableSubscriptionOnError(void)
 {
Relationrel;
@@ -2815,14 +2815,8 @@ DisableSubscriptionOnError(void)
HeapTuple   tup;
Form_pg_subscription subform;
 
-   /* Emit the error */
-   EmitErrorReport();
-   /* Abort any active transaction */
-   AbortOutOfAnyTransaction();
-   /* Reset the ErrorContext */
-   FlushErrorState();
-
/* Disable the subscription in a fresh transaction */
+   AbortOutOfAnyTransaction();

Re: postgres_fdw: commit remote (sub)transactions in parallel during pre-commit

2022-02-20 Thread Etsuro Fujita
On Sat, Feb 19, 2022 at 6:55 AM David Zhang  wrote:
> Tried to apply the patches to master branch, no warning found and
> regression test passed.

Thanks for testing!

> Now, we have many places (5) calling the same function with a constant
> number 3. Is this a good time to consider redefining this number as a
> macro somewhere?

Yeah, I think that is a good idea.  I’ll do so in the next version of
the parallel-abort patch (#0003) if no objections.

Best regards,
Etsuro Fujita




Re: Separate the result of \watch for each query execution (psql)

2022-02-20 Thread Pavel Stehule
Hi

On Mon, Feb 21, 2022 at 6:19 AM Noboru Saito  wrote:

> I need a way to separate the results of \watch for each query execution.
>
> There is only a blank line between the results of \watch.
> However, there is also a blank line after the title, which complicates
> the rules.
>
> My suggestion is to insert a "form feed(\f)" (preferably a newline)
> before the result and output it.
> Then, the output will be as follows.
>
> # select now(); \watch 1
> ^L  <--- add
> Thu Feb 17 11:52:05 2022 (every 1s)
>
> now
> ---
> 2022-02-17 11:52:05.69394+09
> (1 row)
>
> ^L   <--- add
> Thu Feb 17 11:52:06 2022 (every 1s)
>
> now
> --
> 2022-02-17 11:52:06.96906+09
> (1 row)
>
> (^L is usually not displayed. It is visualized by passing through a
> filter such as `less`.)
>
> This is possible with the following simple patch.
>
> ```
> diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
> index d65b9a124f..ee9442d0a6 100644
> --- a/src/bin/psql/common.c
> +++ b/src/bin/psql/common.c
> @@ -646,10 +646,12 @@ PSQLexecWatch(const char *query, const
> printQueryOpt *opt, FILE *printQueryFout)
>   switch (PQresultStatus(res))
>   {
>   case PGRES_TUPLES_OK:
> + fprintf(fout, "\f\n");
>   printQuery(res, opt, fout, false, pset.logfile);
>   break;
>
>   case PGRES_COMMAND_OK:
> + fprintf(fout, "\f\n");
>   fprintf(fout, "%s\n%s\n\n", opt->title, PQcmdStatus(res));
>   break;
>
> ```
>
> I am developing a terminal pager ov (https://github.com/noborus/ov).
> It's a generic pager, but it has features suitable for use with psql.
>
> I found that \watch supports PAGER in PostgreSQL 15. That's great.
>
> ov can be received and displayed, but I want to display it from the
> beginning every time it is executed (like pspg).
>
> The current output is a bit difficult to clear and display for each result.
>

I strongly agree. It was a lot of work to find a workable solution for
pspg. A special character marking the start of a result (and maybe another
marking the end) can significantly increase robustness and reduce code. I
think it would be better to use the form feed at the end of the form - that
is the semantics of a form feed. At that moment you know the result is
complete. https://en.wikipedia.org/wiki/Page_break

I don't think using it by default would be best. Lots of people don't use
specialized pagers, but it could be enabled with \pset. The form feed should
be emitted at the end:

\pset formfeed [on, off]

ASCII has a few nice characters that we could use:

1 .. SOH - start of heading
2 .. STX - start of text
3 .. ETX - end of text

Using these could reduce some heuristics in pspg that are not fully 100%
reliable when border is not 2.

But implementing form feed support would be a very good start. Any mark
that can be used for synchronization can help a lot.
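
As a rough illustration of that direction (pset.formfeed here is
hypothetical, standing in for whatever a new \pset formfeed option would
set; it is not an existing psql setting), the hook in PSQLexecWatch() could
look like:

```
	/*
	 * Sketch only: emit the separator at the end of each result, and only
	 * when the (hypothetical) formfeed setting is enabled.
	 */
	switch (PQresultStatus(res))
	{
		case PGRES_TUPLES_OK:
			printQuery(res, opt, fout, false, pset.logfile);
			if (pset.formfeed)	/* would be set by "\pset formfeed on" */
				fprintf(fout, "\f\n");
			break;

		case PGRES_COMMAND_OK:
			fprintf(fout, "%s\n%s\n\n", opt->title, PQcmdStatus(res));
			if (pset.formfeed)
				fprintf(fout, "\f\n");
			break;

		default:
			break;
	}
```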

Regards

Pavel


Re: pg_stat_get_replication_slot and pg_stat_get_subscription_worker incorrectly marked as proretset

2022-02-20 Thread Masahiko Sawada
Hi,

On Mon, Feb 21, 2022 at 2:36 PM Michael Paquier  wrote:
>
> Hi all,
> (Author and committer added in CC.)
>
> While reviewing the code of a bunch of SRF functions in the core code,
> I have noticed that the two functions mentioned in $subject are marked
> as proretset but both functions don't return a set of tuples, just one
> record for the object given in input.  It is also worth noting that
> prorows is set to 1.

Thanks for pointing it out. Agreed.

>
> This looks like a copy-pasto error that has spread around.  The error
> on pg_stat_get_subscription_worker is recent as of 8d74fc9, and the
> one on pg_stat_get_replication_slot has been introduced in 3fa17d3,
> meaning that REL_14_STABLE got it wrong for the second part.
>
> I am aware about the discussions on the parent view for the first
> case and its design issues, but it does not change the fact that we'd
> better address the second case on HEAD IMO.
>
> Thoughts?

Agreed.

Regards,


--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: postgres_fdw: commit remote (sub)transactions in parallel during pre-commit

2022-02-20 Thread Etsuro Fujita
On Fri, Feb 18, 2022 at 1:46 AM Fujii Masao  wrote:
> I reviewed 0001 patch. It looks good to me except the following minor things. 
> If these are addressed, I think that the 001 patch can be marked as ready for 
> committer.

OK

> +* Also determine to commit (sub)transactions opened on the remote server
> +* in parallel at (sub)transaction end.
>
> Like the comment "Determine whether to keep the connection ...", "determine 
> to commit" should be "determine whether to commit"?

Agreed.  I’ll change it as such.

> "remote server" should be "remote servers"?

Maybe I’m missing something, but we determine this for the given
remote server, so it seems to me correct to say “the remote server”,
not “the remote servers”.

> +   curlevel = GetCurrentTransactionNestLevel();
> +   snprintf(sql, sizeof(sql), "RELEASE SAVEPOINT s%d", curlevel);
>
> Why does pgfdw_finish_pre_subcommit_cleanup() need to call 
> GetCurrentTransactionNestLevel() and construct the "RELEASE SAVEPOINT" query 
> string again? pgfdw_subxact_callback() already does them and probably we can 
> make it pass either of them to pgfdw_finish_pre_subcommit_cleanup() as its 
> argument.

Yeah, that would save cycles, but I think that that makes code a bit
unclean IMO.  (To save cycles, I think we could also modify
pgfdw_subxact_callback() to reuse the query in the while loop in that
function when processing multiple open remote subtransactions there,
but that would make code a bit complicated, so I don’t think it’s a
good idea to do so, either.)  So I’d vote for reconstructing the query
in pgfdw_finish_pre_subcommit_cleanup() as we do in
pgfdw_subxact_callback().

To avoid calling GetCurrentTransactionNestLevel() again, I think we
could pass the curlevel variable to that function.
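
For example, something along these lines (just a sketch of the signature I
have in mind; the actual parameter list in the patch may differ):

```
/*
 * Sketch only: pgfdw_subxact_callback() passes down the nesting level it
 * already computed, and the query string is still rebuilt locally.
 */
static void
pgfdw_finish_pre_subcommit_cleanup(List *pending_entries, int curlevel)
{
	char		sql[100];

	snprintf(sql, sizeof(sql), "RELEASE SAVEPOINT s%d", curlevel);
	/* ... issue sql on each connection in pending_entries in parallel ... */
}
```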

> +   This option controls whether postgres_fdw commits
> +   remote (sub)transactions opened on a foreign server in a local
> +   (sub)transaction in parallel when the local (sub)transaction commits.
>
> "a foreign server" should be "foreign servers"?

I thought it would be good to say “a foreign server”, not “foreign
servers”, because it makes clear that even remote transactions opened
on a single foreign server are committed in parallel.  (To say that
this option is not for a specific foreign server, I added to the
documentation “This option can only be specified for foreign
servers”.)

> "in a local (sub)transaction" part seems not to be necessary.

And I thought adding it would make clear which remote transactions are
committed in parallel.  But maybe I’m missing something, so could you
elaborate a bit more on these?

Thanks for reviewing!

Best regards,
Etsuro Fujita




pg_stat_get_replication_slot and pg_stat_get_subscription_worker incorrectly marked as proretset

2022-02-20 Thread Michael Paquier
Hi all,
(Author and committer added in CC.)

While reviewing the code of a bunch of SRF functions in the core code,
I have noticed that the two functions mentioned in $subject are marked
as proretset but both functions don't return a set of tuples, just one
record for the object given in input.  It is also worth noting that
prorows is set to 1.

This looks like a copy-pasto error that has spread around.  The error
on pg_stat_get_subscription_worker is recent as of 8d74fc9, and the
one on pg_stat_get_replication_slot has been introduced in 3fa17d3,
meaning that REL_14_STABLE got it wrong for the second part.

I am aware about the discussions on the parent view for the first
case and its design issues, but it does not change the fact that we'd
better address the second case on HEAD IMO.

Thoughts?
--
Michael
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 7f1ee97f55..aa05b9665f 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5370,16 +5370,16 @@
   proargnames => '{pid,status,receive_start_lsn,receive_start_tli,written_lsn,flushed_lsn,received_tli,last_msg_send_time,last_msg_receipt_time,latest_end_lsn,latest_end_time,slot_name,sender_host,sender_port,conninfo}',
   prosrc => 'pg_stat_get_wal_receiver' },
 { oid => '6169', descr => 'statistics: information about replication slot',
-  proname => 'pg_stat_get_replication_slot', prorows => '1', proisstrict => 'f',
-  proretset => 't', provolatile => 's', proparallel => 'r',
+  proname => 'pg_stat_get_replication_slot', proisstrict => 'f',
+  provolatile => 's', proparallel => 'r',
   prorettype => 'record', proargtypes => 'text',
   proallargtypes => '{text,text,int8,int8,int8,int8,int8,int8,int8,int8,timestamptz}',
   proargmodes => '{i,o,o,o,o,o,o,o,o,o,o}',
   proargnames => '{slot_name,slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,total_txns,total_bytes,stats_reset}',
   prosrc => 'pg_stat_get_replication_slot' },
 { oid => '8523', descr => 'statistics: information about subscription worker',
-  proname => 'pg_stat_get_subscription_worker', prorows => '1', proisstrict => 'f',
-  proretset => 't', provolatile => 's', proparallel => 'r',
+  proname => 'pg_stat_get_subscription_worker', proisstrict => 'f',
+  provolatile => 's', proparallel => 'r',
   prorettype => 'record', proargtypes => 'oid oid',
   proallargtypes => '{oid,oid,oid,oid,oid,text,xid,int8,text,timestamptz}',
   proargmodes => '{i,i,o,o,o,o,o,o,o,o}',




Re: Design of pg_stat_subscription_workers vs pgstats

2022-02-20 Thread Andres Freund
Hi,

On 2022-02-21 12:56:46 +0900, Masahiko Sawada wrote:
> > The patch you referenced [1] should just store the stats in the
> > pg_stat_subscription view, not pg_stat_subscription_workers.
> >
> > It *does* make sense to keep stats about the number of table syncs that 
> > failed
> > etc. But that should be a counter in pg_stat_subscription, not a row in
> > pg_stat_subscription_workers.
>
> We have discussed using pg_stat_subscription before but concluded it's
> not an appropriate place to store error information because it ends up
> keeping cumulative stats mixed with non-cumulative stats.

Well, as we've amply discussed, the non-cumulative stats shouldn't be in the
pgstat subsystem.


> To take a precedent, we used to store accumulative statistics such as
> spill_txns to pg_stat_replication, but then for the same reason we moved
> those statistics to the new stats view, pg_stat_replication_slot. New
> subscription statistics that we're introducing are cumulative statistics
> whereas pg_stat_subscription is a dynamic statistics view.

I'm happy to have cumulative subscriber stats somewhere in pgstats. But it
shouldn't be split by worker or relation, and it shouldn't contain
non-cumulative error information.

Greetings,

Andres Freund




Separate the result of \watch for each query execution (psql)

2022-02-20 Thread Noboru Saito
I need a way to separate the results of \watch for each query execution.

There is only a blank line between the results of \watch.
However, there is also a blank line after the title, which complicates
the rules.

My suggestion is to insert a "form feed(\f)" (preferably a newline)
before the result and output it.
Then, the output will be as follows.

# select now(); \watch 1
^L  <--- add
Thu Feb 17 11:52:05 2022 (every 1s)

now
---
2022-02-17 11:52:05.69394+09
(1 row)

^L   <--- add
Thu Feb 17 11:52:06 2022 (every 1s)

now
--
2022-02-17 11:52:06.96906+09
(1 row)

(^L is usually not displayed. It is visualized by passing through a
filter such as `less`.)

This is possible with the following simple patch.

```
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index d65b9a124f..ee9442d0a6 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -646,10 +646,12 @@ PSQLexecWatch(const char *query, const
printQueryOpt *opt, FILE *printQueryFout)
  switch (PQresultStatus(res))
  {
  case PGRES_TUPLES_OK:
+ fprintf(fout, "\f\n");
  printQuery(res, opt, fout, false, pset.logfile);
  break;

  case PGRES_COMMAND_OK:
+ fprintf(fout, "\f\n");
  fprintf(fout, "%s\n%s\n\n", opt->title, PQcmdStatus(res));
  break;

```

I am developing a terminal pager ov (https://github.com/noborus/ov).
It's a generic pager, but it has features suitable for use with psql.

I found that \watch supports PAGER in PostgreSQL 15. That's great.

ov can be received and displayed, but I want to display it from the
beginning every time it is executed (like pspg).

The current output is a bit difficult to clear and display for each result.
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index d65b9a124f..ee9442d0a6 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -646,10 +646,12 @@ PSQLexecWatch(const char *query, const printQueryOpt *opt, FILE *printQueryFout)
 	switch (PQresultStatus(res))
 	{
 		case PGRES_TUPLES_OK:
+			fprintf(fout, "\f\n");
 			printQuery(res, opt, fout, false, pset.logfile);
 			break;
 
 		case PGRES_COMMAND_OK:
+			fprintf(fout, "\f\n");
 			fprintf(fout, "%s\n%s\n\n", opt->title, PQcmdStatus(res));
 			break;
 


Re: Design of pg_stat_subscription_workers vs pgstats

2022-02-20 Thread Amit Kapila
On Sat, Feb 19, 2022 at 10:35 PM David G. Johnston
 wrote:
>
> On Sat, Feb 19, 2022 at 9:37 AM Andres Freund  wrote:
>>
>> IMO the type of information you'd want for apply failures is substantially
>>
>> different enough from worker failures that I don't really see the temptation
>> to put them in the same table.
>>
>
> It's an error message and a transaction LSN in both cases right now, along 
> with knowledge of whether said transaction only affects a single table (relid 
> is not null) or not (relid is null).  Do you have a concrete idea in mind 
> that would make this separation need more obvious?
>

I would also like to mention that in some cases sync workers also
behave like the apply worker (after the initial sync, till they catch up with
the apply worker) and try to stream and apply changes similarly to the apply
worker. Errors during that phase will be no different from the apply
worker's. One idea to make the separation more obvious is to
introduce a 'worker_type' column, similar to backend_type in
pg_stat_activity, which will tell whether it is an apply worker or a
table sync worker.

-- 
With Regards,
Amit Kapila.




Re: Design of pg_stat_subscription_workers vs pgstats

2022-02-20 Thread Amit Kapila
On Sat, Feb 19, 2022 at 10:07 PM Andres Freund  wrote:
>
>
> I also still think that _worker shouldn't be part of any of the naming
> here.
>

Okay, the other options that come to mind for this are
pg_subscription_replication_error, pg_subscription_lreplication_error, or
pg_subscription_lrep_error. We can use similar naming at the other place
(view) if required.

-- 
With Regards,
Amit Kapila.




Re: Design of pg_stat_subscription_workers vs pgstats

2022-02-20 Thread Masahiko Sawada
On Sun, Feb 20, 2022 at 1:02 AM Andres Freund  wrote:
>
> Hi,
>
> On 2022-02-19 22:38:19 +0900, Masahiko Sawada wrote:
> > On Sat, Feb 19, 2022 at 5:32 AM Andres Freund  wrote:
> > > On 2022-02-18 17:26:04 +0900, Masahiko Sawada wrote:
> > > > With this change, pg_stat_subscription_workers will be like:
> > > >
> > > > * subid
> > > > * subname
> > > > * subrelid
> > > > * error_count
> > > > * last_error_timestamp
> > >
> > > > This view will be extended by adding transaction statistics proposed
> > > > on another thread[1].
> > >
> > > I do not agree with these bits. What's the point of these per-relation 
> > > stats
> > > at this poitns.  You're just duplicating the normal relation pg_stats 
> > > here.
> > >
> > > I really think we just should drop pg_stat_subscription_workers. Even if 
> > > we
> > > don't, we definitely should rename it, because it still isn't meaningfully
> > > about workers.
> >
> > The view has stats per subscription worker (i.e., apply worker and
> > tablesync worker), not per relation. The subrelid is OID of the
> > relation that the tablesync worker is synchronizing. For the stats of
> > apply workers, it is null.
>
> That's precisely the misuse of the stats subsystem that I'm complaining about
> here. The whole design of pgstat (as it is today) only makes sense if you can
> > lose a message and it doesn't matter much, because it's just an incremental
> > counter increment that's lost.  And to be able to properly prune dead pgstat
> > contents, the underlying objects whose stats are kept around either need to be
> permanent (e.g. stats about WAL) or a record of objects needs to exist
> (e.g. stats about relations).
>
>
> Even leaving everything else aside, a key of (dboid, subid, subrelid), where
> subrelid can be NULL, but where (dboid, subid) is *not* unique, imo is poor
> relational design.  What is the justification for mixing relation specific and
> non-relation specific contents in this view?

I think the current schema of the view with key (dboid, subid,
subrelid) comes from the fact that we store the same statistics for
both apply and tablesync. I think even if we have two separate views
for apply and tablesync, they would have almost the same columns
except for their keys. Also, from the relational design point of view,
pg_locks has a somewhat similar table schema; its database and
relation columns can be NULL.

>
>
> The patch you referenced [1] should just store the stats in the
> pg_stat_subscription view, not pg_stat_subscription_workers.
>
> It *does* make sense to keep stats about the number of table syncs that failed
> etc. But that should be a counter in pg_stat_subscription, not a row in
> pg_stat_subscription_workers.

We have discussed using pg_stat_subscription before but concluded it's
not an appropriate place to store error information because it ends up
keeping cumulative stats mixed with non-cumulative stats. To take a
precedent, we used to store accumulative statistics such as spill_txns
to pg_stat_replication, but then for the same reason we moved those
statistics to the new stats view, pg_stat_replication_slot. New
subscription statistics that we're introducing are cumulative
statistics whereas pg_stat_subscription is a dynamic statistics view.


Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




RE: Failed transaction statistics to measure the logical replication progress

2022-02-20 Thread osumi.takami...@fujitsu.com
On Saturday, February 19, 2022 12:00 AM osumi.takami...@fujitsu.com 
 wrote:
> On Friday, February 18, 2022 3:34 PM Tang, Haiying/唐 海英
>  wrote:
> > On Wed, Jan 12, 2022 8:35 PM osumi.takami...@fujitsu.com
> >  wrote:
> > 4) I noticed that the abort_count doesn't include aborted streaming
> > transactions.
> > Should we take this case into consideration?
> Hmm, we can add this into this column, when there's no objection.
> I'm not sure but someone might say those should be separate columns.
I've addressed this point in a new v23 patch,
since there was no opinion on this so far.

Kindly have a look at the attached one.

Best Regards,
Takamichi Osumi



v23-0001-Extend-pg_stat_subscription_workers-to-include-g.patch
Description:  v23-0001-Extend-pg_stat_subscription_workers-to-include-g.patch


Re: Fix overflow in DecodeInterval

2022-02-20 Thread Joseph Koshakow
On Sun, Feb 20, 2022 at 6:37 PM Tom Lane  wrote:
> Couple of quick comments:

Thanks for the comments Tom, I'll work on fixing these and submit a
new patch.

> * The uses of tm2itm make me a bit itchy.  Is that sweeping
> upstream-of-there overflow problems under the rug?

I agree, I'm not super happy with that approach. In fact
I'm pretty sure it will cause queries like
SELECT INTERVAL '2147483648:00:00';
to overflow upstream, even though queries like
SELECT INTERVAL '2147483648 hours';
would not. The places tm2itm is being used are
 * After DecodeTime
 * In interval_to_char.
The more general issue is how to share code with
functions that are doing almost identical things but use
pg_tm instead of the new pg_itm. I'm not really sure what
the best solution is right now, but I will think about it. If
anyone has suggestions though, feel free to chime in.

- Joe Koshakow




Re: Trap errors from streaming child in pg_basebackup to exit early

2022-02-20 Thread Michael Paquier
On Fri, Feb 18, 2022 at 10:00:43PM +0100, Daniel Gustafsson wrote:
> This is good idea, I was going in a different direction earlier with a test 
> but
> this is cleaner.  The attached 0001 refactors pump_until; 0002 fixes a trivial
> spelling error found while hacking; and 0003 is the previous patch complete
> with a test that passes on Cirrus CI.

This looks rather sane to me, and I can confirm that this passes
the CI and a manual run of MSVC tests with my own box.

+is($node->poll_query_until('postgres',
+   "SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE " .
+   "application_name = '010_pg_basebackup.pl' AND wait_event = 'WalSenderMain' " .
+   "AND backend_type = 'walsender'"), "1", "Walsender killed");
If you do that, don't you risk killing the WAL sender doing the
BASE_BACKUP?  That could falsify the test.  It seems to me that it
would be safer to add a check on query ~ 'START_REPLICATION' or
something like that.

-   diag("aborting wait: program timed out");
-   diag("stream contents: >>", $$stream, "<<");
-   diag("pattern searched for: ", $untl);
Keeping some of this information around would be useful for
debugging in the refactored routine.

+my $sigchld_bb = IPC::Run::start(
+   [
+   @pg_basebackup_defs, '-X', 'stream', '-D', "$tempdir/sigchld",
+   '-r', '32', '-d', $node->connstr('postgres')
+   ],
I would recommend the use of long options here as a matter to
self-document what this does, and add a comment explaining why
--max-rate is preferable, mainly for fast machines.
--
Michael




Re: Uniforms the errors msgs at tuplestore paths

2022-02-20 Thread Ranier Vilela
Em dom., 20 de fev. de 2022 às 22:08, Michael Paquier 
escreveu:

> On Sun, Feb 20, 2022 at 11:37:33AM -0300, Ranier Vilela wrote:
> > I can't see:
> > plperl.c
> > pl_exec.c
> pltcl.c
> >
> > Only jsonfuncs.c, but the error about "materialized mode" is not
> reported.
>
> Melanie has done a nice analysis of all the code paths doing
> materialization checks for her patch with SRF functions.  Though there
> are parts that can be simplified to reduce the differences in check
> patterns before doing the overall refactoring, I think that we'd
> better keep any discussion related to this topic on the other thread
> rather than splitting the effort across more threads.
>
Fine, I agree.

regards,
Ranier Vilela


Re: Uniforms the errors msgs at tuplestore paths

2022-02-20 Thread Michael Paquier
On Sun, Feb 20, 2022 at 11:37:33AM -0300, Ranier Vilela wrote:
> I can't see:
> plperl.c
> pl_exec.c
> pltcl.c
> 
> Only jsonfuncs.c, but the error about "materialized mode" is not reported.

Melanie has done a nice analysis of all the code paths doing
materialization checks for her patch with SRF functions.  Though there
are parts that can be simplified to reduce the differences in check
patterns before doing the overall refactoring, I think that we'd
better keep any discussion related to this topic on the other thread
rather than splitting the effort across more threads.
--
Michael




Re: pg_upgrade verbosity when redirecting output to log file

2022-02-20 Thread Andres Freund
Hi,

On 2022-02-18 19:46:26 -0600, Justin Pryzby wrote:
> +* If outputting to a tty / or , append newline. pg_log_v() will put the
> +* individual progress items onto the next line.
> +*/
> +   if (log_opts.isatty || log_opts.verbose)
>
> I guess the comment should say "or in verbose mode".

Indeed. I think I got caught in a back-and-forth between different
formulations.

Barring that, anybody against committing this?

Greetings,

Andres Freund




Re: pgsql: Add support for building with ZSTD.

2022-02-20 Thread Michael Paquier
On Sun, Feb 20, 2022 at 09:45:05AM -0500, Robert Haas wrote:
> No issues at all with you adjusting this, but I think that sentence
> reads a little awkwardly.

Thanks.

> Perhaps instead of "The default is
> zstd, that would be the command found in
> PATH." you could write something like "The default
> is zstd, which will search for a command by that
> name in the configured PATH." Or maybe something
> else is better, not sure exactly, your version just seems a little odd
> to me.

Okay, done then.  We've been using the same wording for all the other
variables, and what you are suggesting here sounds much better to me,
so I have adjusted all the descriptions this way, and added the ZSTD
part.
--
Michael




Re: WIP: System Versioned Temporal Table

2022-02-20 Thread Corey Huinker
>
>
> The spec does not allow schema changes at all on a a system versioned
> table, except to change the system versioning itself.
>
>
That would greatly simplify things!


Re: set TESTDIR from perl rather than Makefile

2022-02-20 Thread Andres Freund
Hi,

On 2022-02-19 17:53:09 -0600, Justin Pryzby wrote:
> I also meant to also attach it.

Is the patch actually independent of the other patches in your stack?


I like this concept a lot:

- I've had to use a wrapper around individual tap tests for meson, just to set
  the CWD etc.
- Being able to run all tap tests at once, instead of many separate prove
  invocations is a lot more readable. And can be faster.
- makes it easier to invoke tap tests "manually", which can be useful when
  debugging problems (not fun to run make in valgrind or rr)
- I'd like to put test data and test log files in different places than they
  are eventually. This seems like it gets us a tiny bit closer to that.



> - $expected = slurp_file_eval("traces/$testname.trace");
> + my $inputdir = "$ENV{'TESTDIR'}/tmp_check";
> + $expected = slurp_file_eval("$inputdir/traces/$testname.trace");

Why is this needed? Shouldn't we end up in exactly the same dir with/without
this patch?


> diff --git a/src/test/perl/PostgreSQL/Test/Utils.pm 
> b/src/test/perl/PostgreSQL/Test/Utils.pm
> index 31e2b0315e..8a8d95ca8c 100644
> --- a/src/test/perl/PostgreSQL/Test/Utils.pm
> +++ b/src/test/perl/PostgreSQL/Test/Utils.pm
> @@ -184,19 +184,21 @@ INIT
>   # test may still fail, but it's more likely to report useful facts.
>   $SIG{PIPE} = 'IGNORE';
>  
> - # Determine output directories, and create them.  The base path is the
> - # TESTDIR environment variable, which is normally set by the invoking
> - # Makefile.
> - $tmp_check = $ENV{TESTDIR} ? "$ENV{TESTDIR}/tmp_check" : "tmp_check";
> + my $test_dir = File::Spec->rel2abs(dirname($0));
> + my $test_name = basename($0);
> + $test_name =~ s/\.[^.]+$//;
> +
> + # Determine output directories, and create them.
> + # TODO: set srcdir?
> + $tmp_check = "$test_dir/tmp_check";
>   $log_path = "$tmp_check/log";
> + $ENV{TESTDIR} = $test_dir;
>  
>   mkdir $tmp_check;
>   mkdir $log_path;
>  
>   # Open the test log file, whose name depends on the test name.
> - $test_logfile = basename($0);
> - $test_logfile =~ s/\.[^.]+$//;
> - $test_logfile = "$log_path/regress_log_$test_logfile";
> + $test_logfile = "$log_path/regress_log_$test_name";
>   open my $testlog, '>', $test_logfile
> or die "could not open STDOUT to logfile \"$test_logfile\": $!";
>  
> diff --git a/src/tools/msvc/vcregress.pl b/src/tools/msvc/vcregress.pl
> index e2b0db0879..63085506e0 100644
> --- a/src/tools/msvc/vcregress.pl
> +++ b/src/tools/msvc/vcregress.pl
> @@ -261,10 +261,8 @@ sub tap_check
>   $ENV{PG_REGRESS}= "$topdir/$Config/pg_regress/pg_regress";
>   $ENV{REGRESS_SHLIB} = "$topdir/src/test/regress/regress.dll";
>  
> - $ENV{TESTDIR} = "$dir";
>   my $module = basename $dir;
> - # add the module build dir as the second element in the PATH
> - $ENV{PATH} =~ s!;!;$topdir/$Config/$module;!;
> + #$ENV{VCREGRESS_MODE} = $Config;

Hm. How does the module build dir end up on PATH after this?

Greetings,

Andres Freund




RE: Optionally automatically disable logical replication subscriptions on error

2022-02-20 Thread osumi.takami...@fujitsu.com
On Friday, February 18, 2022 3:27 PM Peter Smith  wrote:
> Hi. Below are my code review comments for v18.
Thank you for your review !

> ==
> 
> 1. Commit Message - wording
> 
> BEFORE
> To partially remedy the situation, adding a new subscription_parameter named
> 'disable_on_error'.
> 
> AFTER
> To partially remedy the situation, this patch adds a new
> subscription_parameter named 'disable_on_error'.
Fixed.

> ~~~
> 
> 2. Commit message - wording
> 
> BEFORE
> Require to bump catalog version.
> 
> AFTER
> A catalog version bump is required.
Fixed.

> ~~~
> 
> 3. doc/src/sgml/ref/alter_subscription.sgml - whitespace
> 
> @@ -201,8 +201,8 @@ ALTER SUBSCRIPTION  class="parameter">name RENAME TO <
>information.  The parameters that can be altered
>are slot_name,
>synchronous_commit,
> -  binary, and
> -  streaming.
> +  binary,streaming, and
> +  disable_on_error.
>   
> 
> There is a missing space before streaming.
Fixed. 



> ~~~
> 
> 4. src/backend/replication/logical/worker.c - WorkerErrorRecovery
> 
> @@ -2802,6 +2803,89 @@ LogicalRepApplyLoop(XLogRecPtr
> last_received)  }
> 
>  /*
> + * Worker error recovery processing, in preparation for disabling the
> + * subscription.
> + */
> +static void
> +WorkerErrorRecovery(void)
> 
> I was wondering about the need for this to be a separate function? It is only
> called immediately before calling 'DisableSubscriptionOnError'
> so would it maybe be better just to put this code inside
> DisableSubscriptionOnError with the appropriate comments?
I preferred to have a separate function for error handling,
because from the caller side, when we catch an error, it's then apparent
that error recovery is done. But the function name "DisableSubscriptionOnError"
by itself already has the nuance that we do something on error,
so we can consider it okay to have the error recovery processing
in this function.

So, I removed the function and fixed some related comments.


> ~~~
> 
> 5. src/backend/replication/logical/worker.c - DisableSubscriptionOnError
> 
> + /*
> + * We would not be here unless this subscription's disableonerr field
> + was
> + * true when our worker began applying changes, but check whether that
> + * field has changed in the interim.
> + */
> 
> Apparently, this function might just do nothing if it detects some situation
> where the flag was changed somehow, but I'm not 100% sure that the callers
> are properly catering for when nothing happens.
> 
> IMO it would be better if this function would return true/false to mean "did
> disable subscription happen or not?" because that will give the calling code 
> the
> chance to check the function return and do the right thing - e.g. if the 
> caller first
> thought it should be disabled but then it turned out it did NOT disable...
I don't think we need to do anything more.
After this function, the table sync worker and the apply worker
just exit. IMO, we don't need to do additional work for an
already-disabled subscription on the caller side.
It should be sufficient to fulfill the purpose of
DisableSubscriptionOnError, or to confirm it has been fulfilled.


> ~~~
> 
> 6. src/backend/replication/logical/worker.c - LogicalRepHandleTableSync
> name
> 
> +/*
> + * Execute the initial sync with error handling. Disable the
> +subscription,
> + * if it's required.
> + */
> +static void
> +LogicalRepHandleTableSync(XLogRecPtr *origin_startpos,
> +   char **myslotname, MemoryContext cctx)
> 
> I felt that it is a bit overkill to put a "LogicalRep" prefix here because it 
> is a static
> function.
> 
> IMO this function should be renamed as 'SyncTableStartWrapper' because that
> describes better what it is doing.
Makes sense. Fixed.


> ~~~
> 
> 7. src/backend/replication/logical/worker.c - LogicalRepHandleTableSync
> Assert
> 
> Even though we can know this to be true because of where it is called from, I
> think the readability of the function will be improved if you add an 
> assertion at
> the top:
> 
> Assert(am_tablesync_worker());
Fixed.

> And then, because the function is clearly for Tablesync worker only there is 
> no
> need to keep mentioning that in the subsequent comments...
> 
> e.g.1
> /* This is table synchronization worker, call initial sync. */
> AFTER:
> /* Call initial sync. */
Fixed.

> e.g.2
> /*
>  * Report the table sync error. There is no corresponding message type
>  * for table synchronization.
>  */
> AFTER
> /*
>  * Report the error. There is no corresponding message type for table
>  * synchronization.
>  */
Agreed. Fixed


> ~~~
> 
> 8. src/backend/replication/logical/worker.c - LogicalRepHandleTableSync
> unnecessarily complex
> 
> +static void
> +LogicalRepHandleTableSync(XLogRecPtr *origin_startpos,
> +   char **myslotname, MemoryContext cctx) {
> + char*syncslotname;
> + bool error_recovery_done = false;
> 
> IMO this logic is way more complex than it needed to be. IIUC that
> 'error_recovery_done' and various conditions can be removed, and the whole

Re: Fix overflow in DecodeInterval

2022-02-20 Thread Tom Lane
Joseph Koshakow  writes:
> Attached is a patch of my first pass. The to_char method isn't finished
> and I need to add a bunch of tests, but everything else is in place. It
> ended up being a fairly large change in case anyone wants to take a look
> the changes so far.

Couple of quick comments:

* You've assumed in a number of places that long == int64.  This is wrong
on many platforms.  Current project style for use of int64 in printf-type
calls is to cast the argument to (long long) explicitly and use "%lld" (or
"%llu", etc).  As for strtoint_64, surely that functionality exists
somewhere already (hopefully with fewer bugs than this has).
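
For instance, the style referred to here looks like this (illustrative
snippet, not code from the patch):

```
	int64		hours = 0;		/* whatever value is being reported */

	/* not portable:  errmsg("... %ld hours", hours)  -- long may be 32 bits */
	ereport(ERROR,
			(errcode(ERRCODE_DATETIME_FIELD_OVERFLOW),
			 errmsg("interval field value out of range: %lld hours",
					(long long) hours)));
```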

* I think that tools like Coverity will complain about how you've
got both calls that check the result of AdjustFractSeconds (etc)
and calls that don't.  You might consider coding the latter like

+/* this can't overflow: */
+(void) AdjustFractSeconds(fval, itm, fsec, SECS_PER_DAY);

in hopes of (a) silencing those warnings and (b) making the implicit
assumption clear to readers.

* I'm not entirely buying use of pg_time_t, rather than just int64,
in struct itm.  I think the point of this struct is to get away
from datetime-specific datatypes.  Also, if pg_time_t ever changed
width, it'd create issues for users of this struct.  I also note
a number of places in the patch that are assuming that these fields
are int64 not something else.

* I'm a little inclined to rename interval2tm/tm2interval to
interval2itm/itm2interval, as I think it'd be confusing for those
function names to refer to a typedef they no longer use.

* The uses of tm2itm make me a bit itchy.  Is that sweeping
upstream-of-there overflow problems under the rug?

* // comments are not project style, either.  While pgindent will
convert them to /* ... */ style, the results might not be pleasing.

> One thing I noticed is that interval.c has a ton of code copied and pasted
> from other files. The code seemed out of date from those other files, so
> I tried to bring it up to date and add my changes.

We haven't actually maintained ecpg's copies of backend datatype-specific
code in many years.  While bringing that stuff up to speed might be
worthwhile (or perhaps not, given the lack of complaints), I'd see it
as a separate project.

regards, tom lane




Re: do only critical work during single-user vacuum?

2022-02-20 Thread Peter Geoghegan
On Sun, Feb 20, 2022 at 2:15 PM Andres Freund  wrote:
> We could e.g. add an error if FreezeMultiXactId() needs to create a new
> multixact for a far-in-the-past xid. That's not great, of course, but if we
> include the precise cause (pid of backend / prepared xact name / slot name /
> ...) necessitating creating a new multi, it'd still be a significant
> improvement over the status quo.

There are databases that have large tables (that grow and grow), and
also have tables that need to allocate many MultiXacts (or lots of
member space, at least). I strongly suspect that these are seldom the
same table, though.

The current inability of the system to recognize this difference seems
like it might be a real problem. Why should big tables that contain no
actual MultiXactIds at all (and never contained even one) get early
anti-wraparound VACUUMs, specifically focussed on averting MultiXact
wraparound? I'm hoping that the patch that adds smarter tracking of
final relfrozenxid/relminmxid values during VACUUM makes this less of
a problem automatically.

-- 
Peter Geoghegan




Re: do only critical work during single-user vacuum?

2022-02-20 Thread Noah Misch
On Sun, Feb 20, 2022 at 02:15:37PM -0800, Andres Freund wrote:
> On 2022-02-19 20:57:57 -0800, Noah Misch wrote:
> > On Wed, Feb 16, 2022 at 03:43:12PM +0700, John Naylor wrote:
> > > On Wed, Feb 16, 2022 at 6:17 AM Peter Geoghegan  wrote:
> > > > On Tue, Feb 15, 2022 at 9:28 AM Peter Geoghegan  wrote:
> > > 
> > > > > I did notice from my own testing of the failsafe (by artificially
> > > > > inducing wraparound failure using an XID burning C function) that
> > > > > autovacuum seemed to totally correct the problem, even when the system
> > > > > had already crossed xidStopLimit - it came back on its own. I wasn't
> > > > > completely sure of how robust this effect was, though.
> > > 
> > > I'll put some effort in finding any way that it might not be robust.
> > 
> > A VACUUM may create a not-trivially-bounded number of multixacts via
> > FreezeMultiXactId().  In a cluster at multiStopLimit, completing VACUUM
> > without error needs preparation something like:
> > 
> > 1. Kill each XID that might appear in a multixact.
> > 2. Resolve each prepared transaction that might appear in a multixact.
> > 3. Run VACUUM.  At this point, multiStopLimit is blocking new multixacts 
> > from
> >other commands, and the lack of running multixact members removes the 
> > need
> >for FreezeMultiXactId() to create multixacts.
> > 
> > Adding to the badness of single-user mode so well described upthread, one 
> > can
> > enter it without doing (2) and then wrap the nextMXact counter.
> 
> If we collected the information along the lines of what I proposed in the second 
> half of
> https://www.postgresql.org/message-id/20220204013539.qdegpqzvayq3d4y2%40alap3.anarazel.de
> we should be able to handle such cases more intelligently, I think?
> 
> We could e.g. add an error if FreezeMultiXactId() needs to create a new
> multixact for a far-in-the-past xid. That's not great, of course, but if we
> include the precise cause (pid of backend / prepared xact name / slot name /
> ...) necessitating creating a new multi, it'd still be a significant
> improvement over the status quo.

Yes, exactly.




Re: do only critical work during single-user vacuum?

2022-02-20 Thread Andres Freund
Hi,

On 2022-02-19 20:57:57 -0800, Noah Misch wrote:
> On Wed, Feb 16, 2022 at 03:43:12PM +0700, John Naylor wrote:
> > On Wed, Feb 16, 2022 at 6:17 AM Peter Geoghegan  wrote:
> > > On Tue, Feb 15, 2022 at 9:28 AM Peter Geoghegan  wrote:
> > 
> > > > I did notice from my own testing of the failsafe (by artificially
> > > > inducing wraparound failure using an XID burning C function) that
> > > > autovacuum seemed to totally correct the problem, even when the system
> > > > had already crossed xidStopLimit - it came back on its own. I wasn't
> > > > completely sure of how robust this effect was, though.
> > 
> > I'll put some effort in finding any way that it might not be robust.
> 
> A VACUUM may create a not-trivially-bounded number of multixacts via
> FreezeMultiXactId().  In a cluster at multiStopLimit, completing VACUUM
> without error needs preparation something like:
> 
> 1. Kill each XID that might appear in a multixact.
> 2. Resolve each prepared transaction that might appear in a multixact.
> 3. Run VACUUM.  At this point, multiStopLimit is blocking new multixacts from
>other commands, and the lack of running multixact members removes the need
>for FreezeMultiXactId() to create multixacts.
> 
> Adding to the badness of single-user mode so well described upthread, one can
> enter it without doing (2) and then wrap the nextMXact counter.

If we collected the information along the lines of what I proposed in the second 
half of
https://www.postgresql.org/message-id/20220204013539.qdegpqzvayq3d4y2%40alap3.anarazel.de
we should be able to handle such cases more intelligently, I think?

We could e.g. add an error if FreezeMultiXactId() needs to create a new
multixact for a far-in-the-past xid. That's not great, of course, but if we
include the precise cause (pid of backend / prepared xact name / slot name /
...) necessitating creating a new multi, it'd still be a significant
improvement over the status quo.
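
A rough sketch of the kind of check being proposed; the variable names, error code, and message wording below are invented for illustration and are not from any actual patch:

    /* Hypothetical addition to FreezeMultiXactId(): all names are illustrative. */
    if (TransactionIdIsNormal(oldest_member_xid) &&
        TransactionIdPrecedes(oldest_member_xid, safe_xid_cutoff))
        ereport(ERROR,
                (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
                 errmsg("cannot create new MultiXactId containing too-old transaction ID %u",
                        oldest_member_xid),
                 errdetail("The transaction is still referenced by the backend with PID %d.",
                           blocking_pid)));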

Greetings,

Andres Freund




Re: initdb / bootstrap design

2022-02-20 Thread Andres Freund
Hi,

On 2022-02-19 20:46:26 -0500, Tom Lane wrote:
> I tried it like that (full patch attached) and the results are intensely
> disappointing.  On my Mac laptop, the time needed for 50 iterations of
> initdb drops from 16.8 sec to 16.75 sec.

Hm. I'd hoped for at least a little bit bigger win. But I think it enables
more, see below:


> Not sure that this is worth pursuing any further.

I experimented with moving all the bootstrapping into --boot mode and got it
working. Albeit definitely with a few hacks (more below).

While I had hoped for a bit more of a win, it's IMO a nice improvement.
Executing 10 initdb -N --wal-segsize 1 in a loop:

HEAD:

  assert:
  8.06user 1.17system 0:09.25elapsed 99%CPU (0avgtext+0avgdata 91724maxresident)k
  0inputs+549280outputs (40major+99824minor)pagefaults 0swaps

  opt:
  2.89user 0.99system 0:04.81elapsed 80%CPU (0avgtext+0avgdata 88864maxresident)k
  0inputs+549280outputs (40major+99792minor)pagefaults 0swaps


default to lz4:

  assert:
  7.61user 1.03system 0:08.69elapsed 99%CPU (0avgtext+0avgdata 91508maxresident)k
  0inputs+546400outputs (42major+99551minor)pagefaults 0swaps

  opt:
  2.55user 0.94system 0:03.49elapsed 99%CPU (0avgtext+0avgdata 88816maxresident)k
  0inputs+546400outputs (40major+99551minor)pagefaults 0swaps


bootstrap replace:

  assert:
  7.42user 1.00system 0:08.52elapsed 98%CPU (0avgtext+0avgdata 91656maxresident)k
  0inputs+546400outputs (40major+97737minor)pagefaults 0swaps

  opt:
  2.49user 0.98system 0:03.49elapsed 99%CPU (0avgtext+0avgdata 88700maxresident)k
  0inputs+546400outputs (40major+97728minor)pagefaults 0swaps


everything in bootstrap:

  assert:
  6.31user 0.94system 0:07.35elapsed 98%CPU (0avgtext+0avgdata 97812maxresident)k
  0inputs+547360outputs (30major+88617minor)pagefaults 0swaps

  opt:
  2.42user 0.85system 0:03.28elapsed 99%CPU (0avgtext+0avgdata 94572maxresident)k
  0inputs+547360outputs (30major+83712minor)pagefaults 0swaps


optimize WAL in bootstrap:
  assert:
  6.26user 0.96system 0:07.29elapsed 99%CPU (0avgtext+0avgdata 97844maxresident)k
  0inputs+547360outputs (30major+88586minor)pagefaults 0swaps

  opt:
  2.43user 0.80system 0:03.24elapsed 99%CPU (0avgtext+0avgdata 94436maxresident)k
  0inputs+547360outputs (30major+83664minor)pagefaults 0swaps


remote isatty in bootstrap:

  assert:
  6.15user 0.83system 0:06.99elapsed 99%CPU (0avgtext+0avgdata 97832maxresident)k
  0inputs+465120outputs (30major+88559minor)pagefaults 0swaps

  opt:
  2.28user 0.85system 0:03.14elapsed 99%CPU (0avgtext+0avgdata 94604maxresident)k
  0inputs+465120outputs (30major+83728minor)pagefaults 0swaps


That's IMO not bad.

On Windows I see higher gains, which makes sense, because filesystem IO is
slower. FreeBSD as well, but the variance is oddly high, so I might be doing
something wrong.


The main reason I like this, however, isn't the speedup itself, but that after
this change initdb doesn't depend on single-user mode at all anymore.


About the prototype:

- Most of the bootstrap SQL is executed from bootstrap.c itself. But some
  still comes from the client. E.g. password, a few information_schema
  details and the database / authid changes.

- To execute the sql I mostly used extension.c's
  read_whole_file()/execute_sql_string(). But VACUUM, CREATE DATABASE require
  all the transactional hacks in portal.c etc. So I wrapped
  exec_simple_query() for that phase.

  Might be better to just call vacuum.c / database.c directly.

- for indexed relcache access to work the phase of
  RelationCacheInitializePhase3() that's initially skipped needs to be
  executed. I hacked that up by adding a RelationCacheInitializePhase3b() that
  bootstrap.c can call, but that's obviously too ugly to live.

- InvalidateSystemCaches() is needed after bki processing. Otherwise I see an
  "row is too big:" error. Didn't investigate yet.

- I definitely removed some validation that we'd probably want. But that seems
  something to care about later...

- 0004 prevents a fair bit of WAL from being written. While XLogInsert did
  some of that, it didn't block FPIs, which obviously are bulky. This reduces
  WAL from ~5MB to ~100kB.


There's quite a bit of further speedup potential:

- One bottleneck, particularly in optimized mode, is the handling of huge node
  trees for views. strToNode() and nodeRead() are > 10% alone

- Enabling index access sometime during the postgres.bki processing would make
  invalidation handling for subsequent indexes faster. Or maybe we can disable
  a few more invalidations. Inval processing is >10%

- more than 10% (assert) / 7% (optimized) is spent in
  compute_scalar_stats()->qsort_arg(). Something seems off with that to me.


Completely crazy?


Greetings,

Andres Freund
>From 45d63168ddeb8bdf3ed29ca150f453ffcd051697 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Sun, 20 Feb 2022 12:20:42 -0800
Subject: [PATCH v1 1/5] Set default_toast_compression=lz4 if available.

Makes initdb faster, generally a good idea, 

Re: Adding CI to our tree (ccache)

2022-02-20 Thread Justin Pryzby
On Sun, Feb 20, 2022 at 12:47:31PM -0800, Andres Freund wrote:
> > Did you ever try to use clcache (or others) ?
> > 
> > When I tried, it refused to cache because of our debug settings
> > (DebugInformationFormat) - which seem to be enabled even in release mode.
> 
> > I wonder if that'll be an issue for ccache, too.  I think that line may 
> > need to
> > be conditional on debug mode.
> 
> That's relatively easily solvable by using a different debug format IIRC (/Z7
> or such).

Yes. I got that working for CI by overriding with a value from the environment.
https://cirrus-ci.com/task/6191974075072512

This is right after rebasing, so it doesn't save anything, but normally cuts
build time to 90sec, which isn't impressive, but it's something.

BTW, I think it's worth compiling the windows build with optimizations (as I
did here).  At least with all the tap tests, this pays for itself.  I suppose
you don't want to use a Release build, but optimizations could be enabled by
an(other) environment variable.

-- 
Justin




Re: killing perl2host

2022-02-20 Thread Thomas Munro
On Sun, Feb 20, 2022 at 10:51 AM Andrew Dunstan  wrote:
> On 2/19/22 16:34, Andres Freund wrote:
> >> The first
> >> removes perl2host completely and adjusts  all the places that use it.
> >> The second removes almost all the remaining msys special processing in
> >> TAP tests.

Very nice improvement.  Thanks for sorting this out.  I didn't enjoy
playing "Wordle" with the build farm.




Re: Adding CI to our tree (ccache)

2022-02-20 Thread Andres Freund
Hi,

On 2022-02-20 13:36:55 -0600, Justin Pryzby wrote:
> Have you tried to use the yet-to-be-released ccache with MSVC ?

Yes, it doesn't work, because it requires cl.exe to be used in a specific way
(only a single input file, specific output file naming). Which would require a
decent amount of changes to src/tools/msvc. I think it's more realistic with
meson etc.


> Also, do you know about msbuild /outputResultsCache ?

I don't think it's really usable for what we need. But it's hard to tell.


> Did you ever try to use clcache (or others) ?
> 
> When I tried, it refused to cache because of our debug settings
> (DebugInformationFormat) - which seem to be enabled even in release mode.

> I wonder if that'll be an issue for ccache, too.  I think that line may need 
> to
> be conditional on debug mode.

That's relatively easily solvable by using a different debug format IIRC (/Z7
or such).

Greetings,

Andres Freund




Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-02-20 Thread Peter Geoghegan

On Sun, Feb 20, 2022 at 7:30 AM Robert Haas  wrote:
> Right, so we at least need to add a similar comment to what I proposed
> for MXIDs, and maybe other changes are needed, too.

Agreed.

> > Maybe the way that FreezeLimit/cutoff_xid is overloaded can be fixed
> > here, to make all of this less confusing. I only now fully realized
> > how confusing all of this stuff is -- very.
>
> Right. I think I understand all of this, or at least most of it -- but
> not from the comment. The question is how the comment can be more
> clear. My general suggestion is that function header comments should
> have more to do with the behavior of the function than how it fits
> into the bigger picture. If it's clear to the reader what conditions
> must hold before calling the function and which must hold on return,
> it helps a lot. IMHO, it's the job of the comments in the calling
> function to clarify why we then choose to call that function at the
> place and in the way that we do.

You've given me a lot of high quality feedback on all of this, which
I'll work through soon. It's hard to get the balance right here, but
it's made much easier by this kind of feedback.

> I think that the idea has potential, but I don't think that I
> understand yet what the *exact* algorithm is.

The algorithm seems to exploit a natural tendency that Andres once
described in a blog post about his snapshot scalability work [1]. To a
surprising extent, we can usefully bucket all tuples/pages into two
simple categories:

1. Very, very old ("infinitely old" for all practical purposes).

2. Very very new.

There doesn't seem to be much need for a third "in-between" category
in practice. This seems to be at least approximately true all of the
time.

Perhaps Andres wouldn't agree with this very general statement -- he
actually said something more specific. I for one believe that the
point he made generalizes surprisingly well, though. I have my own
theories about why this appears to be true. (Executive summary: power
laws are weird, and it seems as if the sparsity-of-effects principle
makes it easy to bucket things at the highest level, in a way that
generalizes well across disparate workloads.)

> Maybe I need to read the
> code, when I have some time for that. I can't form an intelligent
> opinion at this stage about whether this is likely to be a net
> positive.

The code in the v8-0002 patch is a bit sloppy right now. I didn't
quite get around to cleaning it up -- I was focussed on performance
validation of the algorithm itself. So bear that in mind if you do
look at v8-0002 (might want to wait for v9-0002 before looking).

I believe that the only essential thing about the algorithm itself is
that it freezes all the tuples on a page when it anticipates setting
the page all-visible, or (barring edge cases) freezes none at all.
(Note that setting the page all-visible/all-frozen may be happen just
after lazy_scan_prune returns, or in the second pass over the heap,
after LP_DEAD items are set to LP_UNUSED -- lazy_scan_prune doesn't
care which way it will happen.)
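
As a rough pseudocode sketch of that page-level policy (names invented here, not the actual lazy_scan_prune code):

    #include "postgres.h"
    #include "access/transam.h"

    /* Freeze everything on the page or nothing, per the description above. */
    static bool
    freeze_whole_page(bool becomes_all_visible,
                      TransactionId oldest_unfrozen_xid,
                      TransactionId freeze_limit)
    {
        if (becomes_all_visible)
            return true;            /* workload-driven case */
        if (TransactionIdPrecedes(oldest_unfrozen_xid, freeze_limit))
            return true;            /* backstop: don't fall too far behind */
        return false;               /* otherwise leave the page unfrozen */
    }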

There are one or two other design choices that we need to make, like
what exact tuples we freeze in the edge case where FreezeLimit/XID age
forces us to freeze in lazy_scan_prune. These other design choices
don't seem relevant to the issue of central importance, which is
whether or not we come out ahead overall with this new algorithm.
FreezeLimit will seldom affect our choice to freeze or not freeze now,
and so AFAICT the exact way that FreezeLimit affects which precise
freezing-eligible tuples we freeze doesn't complicate performance
validation.

Remember when I got excited about how my big TPC-C benchmark run
showed a predictable, tick/tock style pattern across VACUUM operations
against the order and order lines table [2]? It seemed very
significant to me that the OldestXmin of VACUUM operation n
consistently went on to become the new relfrozenxid for the same table
in VACUUM operation n + 1. It wasn't exactly the same XID, but very
close to it (within the range of noise). This pattern was clearly
present, even though VACUUM operation n + 1 might happen as long as 4
or 5 hours after VACUUM operation n (this was a big table).

This pattern was encouraging to me because it showed (at least for the
workload and tables in question) that the amount of unnecessary extra
freezing can't have been too bad -- the fact that we can always
advance relfrozenxid in the same way is evidence of that. Note that
the vacuum_freeze_min_age setting can't have affected our choice of
what to freeze (given what we see in the logs), and yet there is a
clear pattern where the pages (it's really pages, not tuples) that the
new algorithm doesn't freeze in VACUUM operation n will reliably get
frozen in VACUUM operation n + 1 instead.

And so this pattern seems to lend support to the general idea of
letting the workload itself be the primary driver of what pages we
freeze (not FreezeLimit, and not anything based 

Re: libpq async duplicate error results

2022-02-20 Thread Tom Lane
Alvaro Herrera  writes:
> On 2022-Feb-07, Tom Lane wrote:
>> I see that PQgetResult does a hacky "resetPQExpBuffer(&conn->errorMessage)"
>> in between pipelined queries.  It seems like if there are real usability
>> issues in this area, they probably come from that not being well placed.
>> In particular, I wonder why that's done with the pqPipelineProcessQueue
>> call in the PGASYNC_IDLE stanza, yet not with the pqPipelineProcessQueue
>> call in the PGASYNC_READY stanza (where we're returning a PIPELINE_SYNC
>> result).  It kinda looks to me like maybe pqPipelineProcessQueue
>> ought to have the responsibility for doing that, rather than having
>> two out of the three call sites do it.  Also it would seem more natural
>> to associate it with the reinitialization that happens inside
>> pqPipelineProcessQueue.

> Yeah, I agree that the placement of error message reset in pipelined
> libpq is more ad-hoc to the testing I was doing than carefully
> principled.  I didn't test changing it, but on a quick look your
> proposed change seems reasonable to me.

Here's a patch to do that.  It passes check-world (particularly
src/test/modules/libpq_pipeline), for whatever that's worth.

However ... in the wake of 618c16707, I wonder if we should consider
an alternative definition, which is to NOT clear errorMessage when
starting a new pipelined query.  (That would be this patch minus
the addition to pqPipelineProcessQueue.)  Thanks to 618c16707,
the returned error PGresult(s) should bear exactly the same contents
either way.  What would change is that PQerrorMessage(conn) would return
the concatenation of all errors that have occurred during the current
pipeline sequence.  I think that's arguably a more useful definition
than what we have now --- though obviously it'd require a docs change.
It seems to fit with the spirit of the v14 changes to ensure that
PQerrorMessage tells you about everything that happened during a
failed connection attempt, not only the last thing.
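
To make that concrete, a small client sketch (not from the thread; connection and error handling elided) showing where the difference would be visible:

    #include <stdio.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        PGconn     *conn = PQconnectdb("");
        PGresult   *res;
        int         syncs_seen = 0;

        PQenterPipelineMode(conn);

        /* Two failing queries, each followed by a sync point. */
        PQsendQueryParams(conn, "SELECT 1/0", 0, NULL, NULL, NULL, NULL, 0);
        PQpipelineSync(conn);
        PQsendQueryParams(conn, "SELECT no_such_column", 0, NULL, NULL, NULL, NULL, 0);
        PQpipelineSync(conn);

        /* Drain results up to the second sync. */
        while (syncs_seen < 2)
        {
            res = PQgetResult(conn);
            if (res == NULL)
                continue;
            if (PQresultStatus(res) == PGRES_PIPELINE_SYNC)
                syncs_seen++;
            PQclear(res);
        }

        /*
         * With the current definition this reports only the most recent error;
         * under the alternative definition it would be the concatenation of
         * both error messages from this pipeline sequence.
         */
        printf("%s", PQerrorMessage(conn));

        PQfinish(conn);
        return 0;
    }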

I've tested that variant, and it also passes check-world, which may
say something about libpq_pipeline's coverage of error cases ...
I didn't look closely.

regards, tom lane

diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c
index 45dddaf556..495d2b4c13 100644
--- a/src/interfaces/libpq/fe-exec.c
+++ b/src/interfaces/libpq/fe-exec.c
@@ -1380,10 +1380,7 @@ pqAppendCmdQueueEntry(PGconn *conn, PGcmdQueueEntry *entry)
 			 * state, we don't have to do anything.
 			 */
 			if (conn->asyncStatus == PGASYNC_IDLE)
-			{
-pqClearConnErrorState(conn);
 pqPipelineProcessQueue(conn);
-			}
 			break;
 	}
 }
@@ -2149,11 +2146,8 @@ PQgetResult(PGconn *conn)
 /*
  * We're about to return the NULL that terminates the round of
  * results from the current query; prepare to send the results
- * of the next query when we're called next.  Also, since this
- * is the start of the results of the next query, clear any
- * prior error message.
+ * of the next query when we're called next.
  */
-pqClearConnErrorState(conn);
 pqPipelineProcessQueue(conn);
 			}
 			break;
@@ -3099,6 +3093,9 @@ pqPipelineProcessQueue(PGconn *conn)
 		conn->cmd_queue_head == NULL)
 		return;
 
+	/* Reset the error state */
+	pqClearConnErrorState(conn);
+
 	/* Initialize async result-accumulation state */
 	pqClearAsyncResult(conn);
 


Re: Adding CI to our tree (ccache)

2022-02-20 Thread Justin Pryzby
Have you tried to use the yet-to-be-released ccache with MSVC ?

Also, do you know about msbuild /outputResultsCache ?
When I tried that, it gave a bunch of errors.

https://cirrus-ci.com/task/5697497241747456

|[16:35:13.605]  1>c:\cirrus\pgsql.sln.metaproj : error : MSB4252: Project 
"c:\cirrus\pgsql.sln" with global properties [c:\cirrus\pgsql.sln]
|[16:35:13.615] c:\cirrus\pgsql.sln.metaproj : error : 
(TrackFileAccess=false; CLToolExe=clcache.exe) [c:\cirrus\pgsql.sln]
|[16:35:13.615] c:\cirrus\pgsql.sln.metaproj : error : is building project 
"c:\cirrus\initdb.vcxproj" with global properties [c:\cirrus\pgsql.sln]
|[16:35:13.615] c:\cirrus\pgsql.sln.metaproj : error : 
(TrackFileAccess=false; CLToolExe=clcache.exe; BuildingSolutionFile=true; 
CurrentSolutionConfigurationContents= 
[c:\cirrus\pgsql.sln]
|[16:35:13.615] c:\cirrus\pgsql.sln.metaproj : error :   Debug|x64 
[c:\cirrus\pgsql.sln]
|...
|[16:35:14.518]c:\cirrus\pgsql.sln.metaproj : error :   
Debug|x64 
[c:\cirrus\pgsql.sln]
|[16:35:14.518]c:\cirrus\pgsql.sln.metaproj : error : 
; SolutionDir=c:\cirrus\; SolutionExt=.sln; 
SolutionFileName=pgsql.sln; SolutionName=pgsql; 
SolutionPath=c:\cirrus\pgsql.sln; Configuration=Debug; Platform=x64) 
[c:\cirrus\pgsql.sln]
|[16:35:14.518]c:\cirrus\pgsql.sln.metaproj : error : with the 
(default) target(s) but the build result for the built project is not in the 
engine cache. In isolated builds this could mean one of the following: 
[c:\cirrus\pgsql.sln]
|[16:35:14.518]c:\cirrus\pgsql.sln.metaproj : error : - the 
reference was called with a target which is not specified in the 
ProjectReferenceTargets item in project "c:\cirrus\pgsql.sln" 
[c:\cirrus\pgsql.sln]
|[16:35:14.518]c:\cirrus\pgsql.sln.metaproj : error : - the 
reference was called with global properties that do not match the static graph 
inferred nodes [c:\cirrus\pgsql.sln]
|[16:35:14.518]c:\cirrus\pgsql.sln.metaproj : error : - the 
reference was not explicitly specified as a ProjectReference item in project 
"c:\cirrus\pgsql.sln" [c:\cirrus\pgsql.sln]
|[16:35:14.518]c:\cirrus\pgsql.sln.metaproj : error :  
[c:\cirrus\pgsql.sln]
|[16:35:14.518] 
|[16:35:14.518] 0 Warning(s)
|[16:35:14.518] 149 Error(s)

Did you ever try to use clcache (or others) ?

When I tried, it refused to cache because of our debug settings
(DebugInformationFormat) - which seem to be enabled even in release mode.

I wonder if that'll be an issue for ccache, too.  I think that line may need to
be conditional on debug mode.

https://cirrus-ci.com/task/4808554103177216

|[17:14:28.765]   C:\ProgramData\chocolatey\lib\clcache\clcache\clcache.py 
Expanded commandline '['/c', '/Isrc/include', '/Isrc/include/port/win32', 
'/Isrc/include/port/win32_msvc', '/Ic:/openssl/1.1/\\include', '/Zi', 
'/nologo', '/W3', '/WX-', '/diagnostics:column', '/Ox', '/D', 'WIN32', '/D', 
'_WINDOWS', '/D', '__WINDOWS__', '/D', '__WIN32__', '/D', 
'WIN32_STACK_RLIMIT=4194304', '/D', '_CRT_SECURE_NO_DEPRECATE', '/D', 
'_CRT_NONSTDC_NO_DEPRECATE', '/D', 'FRONTEND', '/D', '_MBCS', '/GF', '/Gm-', 
'/EHsc', '/MD', '/GS', '/fp:precise', '/Zc:wchar_t', '/Zc:forScope', 
'/Zc:inline', '/Fo.\\Release\\libpgcommon\\', 
'/Fd.\\Release\\libpgcommon\\libpgcommon.pdb', '/external:W3', '/Gd', '/TC', 
'/wd4018', '/wd4244', '/wd4273', '/wd4101', '/wd4102', '/wd4090', '/wd4267', 
'/FC', '/errorReport:queue', '/MP', 'src/common/archive.c', 
'src/common/base64.c', 'src/common/checksum_helper.c', 
'src/common/config_info.c', 'src/common/controldata_utils.c', 
'src/common/cryptohash_openssl.c', 'src/common/d2s.c', 'src/common/encnames.c', 
'src/common/exec.c', 'src/common/f2s.c', 'src/common/fe_memutils.c', 
'src/common/file_perm.c', 'src/common/file_utils.c', 'src/common/hashfn.c', 
'src/common/hmac_openssl.c', 'src/common/ip.c', 'src/common/jsonapi.c', 
'src/common/keywords.c', 'src/common/kwlookup.c', 'src/common/link-canary.c', 
'src/common/logging.c', 'src/common/md5_common.c', 'src/common/pg_get_line.c', 
'src/common/pg_lzcompress.c', 'src/common/pg_prng.c', 'src/common/pgfnames.c', 
'src/common/protocol_openssl.c', 'src/common/psprintf.c', 
'src/common/relpath.c', 'src/common/restricted_token.c', 'src/common/rmtree.c', 
'src/common/saslprep.c', 'src/common/scram-common.c', 'src/common/sprompt.c', 
'src/common/string.c', 'src/common/stringinfo.c', 'src/common/unicode_norm.c', 
'src/common/username.c', 'src/common/wait_error.c', 'src/common/wchar.c']'
|[17:14:28.765]   C:\ProgramData\chocolatey\lib\clcache\clcache\clcache.py 
Cannot cache invocation as ['/c', '/Isrc/include', '/Isrc/include/port/win32', 
'/Isrc/include/port/win32_msvc', '/Ic:/openssl/1.1/\\include', '/Zi', 
'/nologo', '/W3', '/WX-', '/diagnostics:column', '/Ox', '/D', 'WIN32', '/D', 
'_WINDOWS', '/D', '__WINDOWS__', '/D', '__WIN32__', '/D', 
'WIN32_STACK_RLIMIT=4194304', '/D', '_CRT_SECURE_NO_DEPRECATE', '/D', 
'_CRT_NONSTDC_NO_DEPRECATE', '/D', 

Re: Slow standby snapshot

2022-02-20 Thread Andrey Borodin


> On 22 Nov 2021, at 14:05, Michail Nikolaev  wrote:
> 
>> Write barrier must be issued after write, not before.
>> Don't we need to issue read barrier too?
> 
> The write barrier is issued after the changes to KnownAssignedXidsNext
> and KnownAssignedXidsValid arrays and before the update of
> headKnownAssignedXids.
> So, it seems to be correct. We make sure once the CPU sees changes of
> headKnownAssignedXids - underlying arrays contain all the required
> data.
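
For readers following along, the ordering being described is the usual publish/consume pattern; a simplified sketch (the declarations below are stand-ins, not the actual procarray.c code):

    /* writer side: make the array entries visible before advancing the head */
    KnownAssignedXidsNext[slot] = next_value;
    KnownAssignedXidsValid[slot] = true;
    pg_write_barrier();
    headKnownAssignedXids = new_head;

    /* reader side: pair with a read barrier before touching the arrays */
    head = headKnownAssignedXids;
    pg_read_barrier();
    /* entries up to 'head' can now be read safely */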

The patch on barriers seems too complicated to me right now. I’d propose to focus on
the KnownAssignedXidsNext patch: it’s clean, simple and effective.

I’ve rebased the patch so that it does not depend on the previous step. Please
check out its current state; if you are OK with it, let’s mark the patch
Ready for Committer. Maybe just a slightly better commit message would make the
patch easier to understand.


Thanks! Best regards, Andrey Borodin.


v4-0001-Use-linked-list-to-improve-KnownAssignedXids-perf.patch
Description: Binary data


Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-02-20 Thread Robert Haas
On Fri, Feb 18, 2022 at 7:12 PM Peter Geoghegan  wrote:
> We have to worry about XIDs from MultiXacts (and xmax values more
> generally). And we have to worry about the case where we start out
> with only xmin frozen (by an earlier VACUUM), and then have to freeze
> xmax too. I believe that we have to generally consider xmin and xmax
> independently. For example, we cannot ignore xmax, just because we
> looked at xmin, since in general xmin alone might have already been
> frozen.

Right, so we at least need to add a similar comment to what I proposed
for MXIDs, and maybe other changes are needed, too.

> The difference between the cleanup lock path (in
> lazy_scan_prune/heap_prepare_freeze_tuple) and the share lock path (in
> lazy_scan_noprune/heap_tuple_needs_freeze) is what is at issue in both
> of these confusing comment blocks, really. Note that cutoff_xid is the
> name that both heap_prepare_freeze_tuple and heap_tuple_needs_freeze
> have for FreezeLimit (maybe we should rename every occurrence of
> cutoff_xid in heapam.c to FreezeLimit).
>
> At a high level, we aren't changing the fundamental definition of an
> aggressive VACUUM in any of the patches -- we still need to advance
> relfrozenxid up to FreezeLimit in an aggressive VACUUM, just like on
> HEAD, today (we may be able to advance it *past* FreezeLimit, but
> that's just a bonus). But in a non-aggressive VACUUM, where there is
> still no strict requirement to advance relfrozenxid (by any amount),
> the code added by 0001 can set relfrozenxid to any known safe value,
> which could either be from before FreezeLimit, or after FreezeLimit --
> almost anything is possible (provided we respect the relfrozenxid
> invariant, and provided we see that we didn't skip any
> all-visible-not-all-frozen pages).
>
> Since we still need to "respect FreezeLimit" in an aggressive VACUUM,
> the aggressive case might need to wait for a full cleanup lock the
> hard way, having tried and failed to do it the easy way within
> lazy_scan_noprune (lazy_scan_noprune will still return false when any
> call to heap_tuple_needs_freeze for any tuple returns false) -- same
> as on HEAD, today.
>
> And so the difference at issue here is: FreezeLimit/cutoff_xid only
> needs to affect the new NewRelfrozenxid value we use for relfrozenxid in
> heap_prepare_freeze_tuple, which is involved in real freezing -- not
> in heap_tuple_needs_freeze, whose main purpose is still to help us
> avoid freezing where a cleanup lock isn't immediately available. While
> the purpose of FreezeLimit/cutoff_xid within heap_tuple_needs_freeze
> is to determine its bool return value, which will only be of interest
> to the aggressive case (which might have to get a cleanup lock and do
> it the hard way), not the non-aggressive case (where ratcheting back
> NewRelfrozenxid is generally possible, and generally leaves us with
> almost as good of a value).
>
> In other words: the calls to heap_tuple_needs_freeze made from
> lazy_scan_noprune are simply concerned with the page as it actually
> is, whereas the similar/corresponding calls to
> heap_prepare_freeze_tuple from lazy_scan_prune are concerned with
> *what the page will actually become*, after freezing finishes, and
> after lazy_scan_prune is done with the page entirely (ultimately
> the final NewRelfrozenxid value set in pg_class.relfrozenxid only has
> to be <= the oldest extant XID *at the time the VACUUM operation is
> just about to end*, not some earlier time, so "being versus becoming"
> is an interesting distinction for us).
>
> Maybe the way that FreezeLimit/cutoff_xid is overloaded can be fixed
> here, to make all of this less confusing. I only now fully realized
> how confusing all of this stuff is -- very.

Right. I think I understand all of this, or at least most of it -- but
not from the comment. The question is how the comment can be more
clear. My general suggestion is that function header comments should
have more to do with the behavior of the function than how it fits
into the bigger picture. If it's clear to the reader what conditions
must hold before calling the function and which must hold on return,
it helps a lot. IMHO, it's the job of the comments in the calling
function to clarify why we then choose to call that function at the
place and in the way that we do.

> As a general rule, we try to freeze all of the remaining live tuples
> on a page (following pruning) together, as a group, or none at all.
> Most of the time this is triggered by our noticing that the page is
> about to be set all-visible (but not all-frozen), and doing work
> sufficient to mark it fully all-frozen instead. Occasionally there is
> FreezeLimit to consider, which is now more of a backstop thing, used
> to make sure that we never get too far behind in terms of unfrozen
> XIDs. This is useful in part because it avoids any future
> non-aggressive VACUUM that is fundamentally unable to advance
> relfrozenxid (you can't skip all-visible pages if there are 

Re: Print warning when I execute my own extension function

2022-02-20 Thread Dong Wook Lee
Now I found a bug in this code.
The code in the part below is wrong:

```
char *buffer = palloc(7 * sizeof(char));
>> unsigned int offset = sizeof(buffer); // wrong
buffer[--offset] = '\0'; /* here */
```
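
In case it helps: sizeof(buffer) is the size of the pointer (typically 8 bytes), not the 7 bytes handed out by palloc, so the terminator is written one byte past the chunk, which is exactly what the aset.c sentinel check complains about. A corrected sketch (BASE36_BUFLEN is just an illustrative name):

```
#define BASE36_BUFLEN 7

char *buffer = palloc(BASE36_BUFLEN);
unsigned int offset = BASE36_BUFLEN;

buffer[--offset] = '\0'; /* writes buffer[6], inside the allocation */
```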

Thank you for replying

On Sun, Feb 20, 2022 at 11:51 PM, Dong Wook Lee wrote:
>
> I found a source code line that prints warning in the source code
> (src/backend/utils/mmgr/aset.c, line 1496)
>    /*
>    1492     * Check for overwrite of padding space in an allocated chunk.
>    1493     */
>    1494    if (chunk->aset == (void *) set && dsize < chsize &&
>    1495        !sentinel_ok(chunk, ALLOC_CHUNKHDRSZ + dsize))
>    1496        elog(WARNING, "problem in alloc set %s: detected write past chunk end in block %p, chunk %p",
>    1497             name, block, chunk);
>
> In my extension C code,
> I assigned a value at the allocated memory address,
> and I found this assignment statement triggers the warning.
> Should I not use it like this?
> ```
> char *buffer = palloc(7 * sizeof(char));
> unsigned int offset = sizeof(buffer);
> > buffer[--offset] = '\0'; /* here */
> ```
>
>
>
> On Sun, Feb 20, 2022 at 7:29 PM, Julien Rouhaud wrote:
> >
> > Hi,
> >
> > On Sun, Feb 20, 2022 at 07:23:56PM +0900, Dong Wook Lee wrote:
> > > Hi hackers,
> > > I've read in this blog (
> > > http://big-elephants.com/2015-10/writing-postgres-extensions-part-i/)
> > > and I wrote an extension about base36_encode with c code
> > > but when I executed a query like this below I got a warning below.
> > >
> > > ```
> > > postgres=# SELECT base36_encode(123);
> > > WARNING:  problem in alloc set ExprContext: detected write past chunk end
> > > in block 0x55fb75334d40, chunk 0x55fb75334d68
> > > WARNING:  problem in alloc set ExprContext: detected write past chunk end
> > > in block 0x55fb75334d40, chunk 0x55fb75334d68
> > >  base36_encode
> > > ---
> > >  3f
> > > (1 row)
> > > ```
> > >
> > > I don't know what this warning means and how I can fix it.
> >
> > It means that you have some problem in your memory allocation.  You can 
> > refer
> > to src/backend/utils/mmgr/aset.c for more details on those safety checks.




Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-02-20 Thread Robert Haas
On Sat, Feb 19, 2022 at 8:54 PM Andres Freund  wrote:
> > Leaving behind disconnected/orphaned heap-only tuples is pretty much
> > pointless anyway, since they'll never be accessible by index scans.
> > Even after a REINDEX, since there is no root item from the heap page
> > to go in the index. (A dump and restore might work better, though.)
>
> Given that heap_surgery's raison d'etre is correcting corruption etc, I think
> it makes sense for it to do as minimal work as possible. Iterating through a
> HOT chain would be a problem if you e.g. tried to repair a page with HOT
> corruption.

Yeah, I agree. I don't have time to respond to all of these emails
thoroughly right now, but I think it's really important that
pg_surgery do the exact surgery the user requested, and not any other
work. I don't think that page defragmentation should EVER be REQUIRED
as a condition of other work. If other code is relying on that, I'd
say it's busted. I'm a little more uncertain about the case where we
kill the root tuple of a HOT chain, because I can see that this might
leave the page in a state where sequential scans see the tuple at the end
of the chain and index scans don't. I'm not sure whether that should
be the responsibility of pg_surgery itself to avoid, or whether that's
your problem as a user of it -- although I lean mildly toward the
latter view, at the moment. But in any case surely the pruning code
can't just decide to go into an infinite loop if that happens. Code
that manipulates the states of data pages needs to be as robust
against arbitrary on-disk states as we can reasonably make it, because
pages get garbled on disk all the time.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Print warning when I execute my own extension function

2022-02-20 Thread Dong Wook Lee
I found a source code line that prints warning in the source code
(src/backend/utils/mmgr/aset.c, line 1496)
   /*
   1492     * Check for overwrite of padding space in an allocated chunk.
   1493     */
   1494    if (chunk->aset == (void *) set && dsize < chsize &&
   1495        !sentinel_ok(chunk, ALLOC_CHUNKHDRSZ + dsize))
   1496        elog(WARNING, "problem in alloc set %s: detected write past chunk end in block %p, chunk %p",
   1497             name, block, chunk);

In my extension C code,
I assigned a value at the allocated memory address,
and I found this assignment statement triggers the warning.
Should I not use it like this?
```
char *buffer = palloc(7 * sizeof(char));
unsigned int offset = sizeof(buffer);
> buffer[--offset] = '\0'; /* here */
```



On Sun, Feb 20, 2022 at 7:29 PM, Julien Rouhaud wrote:
>
> Hi,
>
> On Sun, Feb 20, 2022 at 07:23:56PM +0900, Dong Wook Lee wrote:
> > Hi hackers,
> > I've read in this blog (
> > http://big-elephants.com/2015-10/writing-postgres-extensions-part-i/)
> > and I wrote an extension about base36_encode with c code
> > but when I executed a query like this below I got a warning below.
> >
> > ```
> > postgres=# SELECT base36_encode(123);
> > WARNING:  problem in alloc set ExprContext: detected write past chunk end
> > in block 0x55fb75334d40, chunk 0x55fb75334d68
> > WARNING:  problem in alloc set ExprContext: detected write past chunk end
> > in block 0x55fb75334d40, chunk 0x55fb75334d68
> >  base36_encode
> > ---
> >  3f
> > (1 row)
> > ```
> >
> > I don't know what this warning means and how I can fix it.
>
> It means that you have some problem in your memory allocation.  You can refer
> to src/backend/utils/mmgr/aset.c for more details on those safety checks.




Re: pgsql: Add support for building with ZSTD.

2022-02-20 Thread Robert Haas
On Sat, Feb 19, 2022 at 1:19 AM Michael Paquier  wrote:
> On Fri, Feb 18, 2022 at 06:53:10PM +, Robert Haas wrote:
> > Add support for building with ZSTD.
> >
> > This commit doesn't actually add anything that uses ZSTD; that will be
> > done separately. It just puts the basic infrastructure into place.
> >
> > Jeevan Ladhe, Robert Haas, and Michael Paquier. Reviewed by Justin
> > Pryzby and Andres Freund.
>
> I completely forgot that the section of the docs dedicated to the TAP
> tests with MSVC also needs a refresh to mention the new variable ZSTD,
> as of the attached.  Do you mind if I apply that?

No issues at all with you adjusting this, but I think that sentence
reads a little awkwardly. Perhaps instead of "The default is
zstd, that would be the command found in
PATH." you could write something like "The default
is zstd, which will search for a command by that
name in the configured PATH." Or maybe something
else is better, not sure exactly, your version just seems a little odd
to me.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Uniforms the errors msgs at tuplestore paths

2022-02-20 Thread Ranier Vilela
On Sun, Feb 20, 2022 at 11:30 AM, Justin Pryzby wrote:

> On Sun, Feb 20, 2022 at 11:12:42AM -0300, Ranier Vilela wrote:
> > Like how the commit
> >
> https://github.com/postgres/postgres/commit/07daca53bfcad59618a9c6fad304e380cc9d2bc1
> > There are some paths that were missed:
>
> I think these are all unified by the existing tuplestore patch.
> https://commitfest.postgresql.org/37/3420/

I can't see:
plperl.c
pl_exec.c
pltcl.c

Only jsonfuncs.c, but the error about "materialized mode" is not reported.

regards,
Ranier Vilela


Re: Uniforms the errors msgs at tuplestore paths

2022-02-20 Thread Justin Pryzby
On Sun, Feb 20, 2022 at 11:12:42AM -0300, Ranier Vilela wrote:
> Like how the commit
> https://github.com/postgres/postgres/commit/07daca53bfcad59618a9c6fad304e380cc9d2bc1
> There are some paths that were missed:

I think these are all unified by the existing tuplestore patch.
https://commitfest.postgresql.org/37/3420/

> -At jsonfuncs.c
>  The error msgs do not report about the materialize mode.
> -At plperl.c
>  The function plperl_return_next_internal does not check rsi appropriately.
>  The error about "materialize mode required" is missing.
> -At pl_exec.c
>  The error msgs do not report about the materialize mode.
> -At pltcl.c:
>  The function pltcl_init_tuple_store does not check rsi appropriately.




Uniforms the errors msgs at tuplestore paths

2022-02-20 Thread Ranier Vilela
Hi,

Like how the commit
https://github.com/postgres/postgres/commit/07daca53bfcad59618a9c6fad304e380cc9d2bc1
There are some paths that were missed:

-At jsonfuncs.c
 The error msgs do not report about the materialize mode.
-At plperl.c
 The function plperl_return_next_internal does not check rsi appropriately.
 The error about "materialize mode required" is missing.
-At pl_exec.c
 The error msgs do not report about the materialize mode.
-At pltcl.c:
 The function pltcl_init_tuple_store does not check rsi appropriately.

Patch attached.
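
For reference, the check pattern that the cited commit standardizes on looks roughly like this (a sketch of the usual tests, not the contents of the attached patch):

    ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;

    if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
        ereport(ERROR,
                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                 errmsg("set-valued function called in context that cannot accept a set")));

    if (!(rsinfo->allowedModes & SFRM_Materialize))
        ereport(ERROR,
                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                 errmsg("materialize mode required, but it is not allowed in this context")));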

regards,
Ranier Vilela


v1-0001-uniform-error-msgs-tuplestore.patch
Description: Binary data


Re: Print warning when I execute my own extension function

2022-02-20 Thread Julien Rouhaud
Hi,

On Sun, Feb 20, 2022 at 07:23:56PM +0900, Dong Wook Lee wrote:
> Hi hackers,
> I've read in this blog (
> http://big-elephants.com/2015-10/writing-postgres-extensions-part-i/)
> and I wrote an extension about base36_encode with c code
> but when I executed a query like this below I got a warning below.
>
> ```
> postgres=# SELECT base36_encode(123);
> WARNING:  problem in alloc set ExprContext: detected write past chunk end
> in block 0x55fb75334d40, chunk 0x55fb75334d68
> WARNING:  problem in alloc set ExprContext: detected write past chunk end
> in block 0x55fb75334d40, chunk 0x55fb75334d68
>  base36_encode
> ---
>  3f
> (1 row)
> ```
>
> I don't know what this warning means and how I can fix it.

It means that you have some problem in your memory allocation.  You can refer
to src/backend/utils/mmgr/aset.c for more details on those safety checks.




Print warning when I execute my own extension function

2022-02-20 Thread Dong Wook Lee
Hi hackers,
I've read in this blog (
http://big-elephants.com/2015-10/writing-postgres-extensions-part-i/)
and I wrote an extension about base36_encode with c code
but when I executed a query like this below I got a warning below.

```
postgres=# SELECT base36_encode(123);
WARNING:  problem in alloc set ExprContext: detected write past chunk end
in block 0x55fb75334d40, chunk 0x55fb75334d68
WARNING:  problem in alloc set ExprContext: detected write past chunk end
in block 0x55fb75334d40, chunk 0x55fb75334d68
 base36_encode
---
 3f
(1 row)
```

I don't know what this warning means and how I can fix it.

Thanks
Dong Wook Lee.


Re: postgres_fdw and skip locked

2022-02-20 Thread Alexander Pyhalov

Ashutosh Bapat wrote on 2022-02-17 16:30:

On Wed, Feb 16, 2022 at 8:38 PM Alexander Pyhalov
 wrote:


Ashutosh Bapat wrote on 2022-02-16 16:40:
> On Mon, Feb 14, 2022 at 4:23 PM Alexander Pyhalov
>  wrote:
>>
> I see that these options will work for all kinds of relations. So no
> problem if foreign table is pointing to something other than a table.

Hi.
The issue is that we perform deparsing while planning; we haven't
connected to the server yet.
Are there any ideas on how to find out its version without specifying it,
for example, in server options?


Yes, you are right. I wish foreign servers, esp. postgresql, could be
more integrated and we could defer deparsing till the execution phase.
But that's out of scope for this patch.


Hi.
I've updated the patch to provide a server-level "deparse_wait_policy" option,
which controls whether we deparse SKIP LOCKED or NOWAIT for SELECT FOR UPDATE
(true by default).
This will make it possible for people with old servers to revert to the old
behavior.

--
Best regards,
Alexander Pyhalov,
Postgres Professional
From 5f5a0434227debd8ca7ee406e8cb1997edff14a3 Mon Sep 17 00:00:00 2001
From: Alexander Pyhalov 
Date: Mon, 14 Feb 2022 12:01:26 +0300
Subject: [PATCH] postgres_fdw could pass lock wait policy to foreign server

---
 contrib/postgres_fdw/deparse.c|  12 +-
 .../postgres_fdw/expected/postgres_fdw.out| 110 +-
 contrib/postgres_fdw/option.c |   4 +-
 contrib/postgres_fdw/postgres_fdw.c   |   4 +
 contrib/postgres_fdw/postgres_fdw.h   |   1 +
 contrib/postgres_fdw/sql/postgres_fdw.sql |  18 +++
 doc/src/sgml/postgres-fdw.sgml|  16 +++
 7 files changed, 161 insertions(+), 4 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index bf12eac0288..f55afd461f0 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -1429,8 +1429,7 @@ deparseLockingClause(deparse_expr_cxt *context)
  * For now, just ignore any [NO] KEY specification, since (a)
  * it's not clear what that means for a remote table that we
  * don't have complete information about, and (b) it wouldn't
- * work anyway on older remote servers.  Likewise, we don't
- * worry about NOWAIT.
+ * work anyway on older remote servers.
  */
 switch (rc->strength)
 {
@@ -1451,6 +1450,15 @@ deparseLockingClause(deparse_expr_cxt *context)
 if (bms_membership(rel->relids) == BMS_MULTIPLE &&
 	rc->strength != LCS_NONE)
 	appendStringInfo(buf, " OF %s%d", REL_ALIAS_PREFIX, relid);
+
+/* Lock behavior */
+if (fpinfo->deparse_wait_policy && rc->strength != LCS_NONE)
+{
+	if (rc->waitPolicy == LockWaitSkip)
+		appendStringInfoString(buf, " SKIP LOCKED");
+	else if (rc->waitPolicy == LockWaitError)
+		appendStringInfoString(buf, " NOWAIT");
+}
 			}
 		}
 	}
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index b2e02caefe4..b33d61b8656 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -389,6 +389,35 @@ SELECT * FROM ft1 t1 WHERE c1 = 102 FOR SHARE;
  102 |  2 | 00102 | Sat Jan 03 00:00:00 1970 PST | Sat Jan 03 00:00:00 1970 | 2  | 2  | foo
 (1 row)
 
+-- test wait policy specification
+EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 = 101 FOR UPDATE NOWAIT;
+   QUERY PLAN
+-
+ Foreign Scan on public.ft1 t1
+   Output: c1, c2, c3, c4, c5, c6, c7, c8, t1.*
+   Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = 101)) FOR UPDATE NOWAIT
+(3 rows)
+
+SELECT * FROM ft1 t1 WHERE c1 = 101 FOR UPDATE NOWAIT;
+ c1  | c2 |  c3   |  c4  |c5| c6 | c7 | c8  
+-++---+--+--+++-
+ 101 |  1 | 00101 | Fri Jan 02 00:00:00 1970 PST | Fri Jan 02 00:00:00 1970 | 1  | 1  | foo
+(1 row)
+
+EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 = 102 FOR SHARE SKIP LOCKED;
+ QUERY PLAN  
+-
+ Foreign Scan on public.ft1 t1
+   Output: c1, c2, c3, c4, c5, c6, c7, c8, t1.*
+   Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = 102)) FOR SHARE SKIP LOCKED
+(3 rows)
+
+SELECT * FROM ft1 t1 WHERE c1 = 102 FOR SHARE SKIP LOCKED;
+ c1  | c2 |  c3   |  c4  |c5| c6 | c7 | c8