RE: [BUG] Update Toast data failure in logical replication

2022-02-10 Thread tanghy.f...@fujitsu.com
On Thu, Feb 10, 2022 9:34 PM Amit Kapila  wrote:
> 
> On Mon, Feb 7, 2022 at 1:27 PM Dilip Kumar  wrote:
> >
> > On Mon, Feb 7, 2022 at 12:25 PM Amit Kapila 
> wrote:
> > >
> > > Attached please find the modified patches.
> >
> > I have looked into the latest modification and back branch patches and
> > they look fine to me.
> >
> 
> Today, while looking at this patch again, I think I see one problem
> with the below change (referring pg10 patch):
> + if (attrnum < 0)
> + {
> + if (attrnum != ObjectIdAttributeNumber &&
> + attrnum != TableOidAttributeNumber)
> + {
> + modified = bms_add_member(modified,
> +   attrnum -
> +   FirstLowInvalidHeapAttributeNumber);
> + continue;
> + }
> + }
> ...
> ...
> + /* No need to check attributes that can't be stored externally. */
> + if (isnull1 || TupleDescAttr(tupdesc, attrnum - 1)->attlen != -1)
> + continue;
> 
> I think it is possible that we use TupleDescAttr for system attribute
> (in this case ObjectIdAttributeNumber/TableOidAttributeNumber) which
> will be wrong as it contains only user attributes, not system
> attributes. See comments atop TupleDescData.
> 
> I think this check should be modified to if (attrnum < 0 || isnull1
> || TupleDescAttr(tupdesc, attrnum - 1)->attlen != -1). What do you
> think?
> 

I agree with you.
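So the check in the attached patches now reads roughly as below (a sketch of
the suggested condition; per-branch details may differ slightly):

	/*
	 * Skip system attributes, and any attribute that can't be stored
	 * externally (only varlena attributes, with attlen == -1, can be).
	 */
	if (attrnum < 0 || isnull1 ||
		TupleDescAttr(tupdesc, attrnum - 1)->attlen != -1)
		continue;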

> * Another minor comment:
> + if (!heap_attr_equals(RelationGetDescr(relation), attrnum, value1,
> +   value2, isnull1, isnull2))
> 
> I think here we can directly use tupdesc instead of 
> RelationGetDescr(relation).
> 

+1.

Attached are the patches, which fix the above two comments and the first comment in
my previous mail [1]; the rest is the same as before.
I ran the tests on all branches; they all passed as expected.

[1] 
https://www.postgresql.org/message-id/OS0PR01MB61134DD41BE6D986B9DB80CCFB2E9%40OS0PR01MB6113.jpnprd01.prod.outlook.com

Regards,
Tang


13-v9-0001-WAL-log-unchanged-toasted-replica-identity-key-.patch
Description:  13-v9-0001-WAL-log-unchanged-toasted-replica-identity-key-.patch


14-v9-0001-WAL-log-unchanged-toasted-replica-identity-key-.patch
Description:  14-v9-0001-WAL-log-unchanged-toasted-replica-identity-key-.patch


HEAD-v9-0001-WAL-log-unchanged-toasted-replica-identity-key-at.patch
Description:  HEAD-v9-0001-WAL-log-unchanged-toasted-replica-identity-key-at.patch


10-v9-0001-WAL-log-unchanged-toasted-replica-identity-key-.patch
Description:  10-v9-0001-WAL-log-unchanged-toasted-replica-identity-key-.patch


11-v9-0001-WAL-log-unchanged-toasted-replica-identity-key-.patch
Description:  11-v9-0001-WAL-log-unchanged-toasted-replica-identity-key-.patch


12-v9-0001-WAL-log-unchanged-toasted-replica-identity-key-.patch
Description:  12-v9-0001-WAL-log-unchanged-toasted-replica-identity-key-.patch


Re: Race condition in TransactionIdIsInProgress

2022-02-10 Thread Andres Freund
Hi,

On 2022-02-10 21:56:09 -0800, Andres Freund wrote:
> I think this may actually mean that the HOT corruption problem fixed in
> 
> commit 18b87b201f7
> Author: Andres Freund 
> Date:   2021-12-10 20:12:26 -0800
> 
> Fix possible HOT corruption when RECENTLY_DEAD changes to DEAD while pruning.
> 
> for 14/master is present in older branches too :(. Need to trace through the
> HTSV and pruning logic with a bit more braincells than currently available to
> be sure.

On second thought, this can probably cause corruption in ways more direct than
via HOT pruning. Not that that's not a problem, but ... an inconsistent
TransactionIdIsInProgress() result can wreak havoc quite
broadly.

Looks like syncrep will make this a lot worse, because it can drastically
increase the window between the TransactionIdCommitTree() and
ProcArrayEndTransaction() due to the SyncRepWaitForLSN() in between.  But at
least it might make it easier to write tests exercising this scenario...
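For reference, the ordering in question, abridged (treat the exact call sites
as an approximation of RecordTransactionCommit() / CommitTransaction()):

	/*
	 * The gap between the clog update and the procarray removal is the
	 * window in question; SyncRepWaitForLSN() sits right in the middle.
	 */
	XLogFlush(XactLastRecEnd);							/* commit record durable */
	TransactionIdCommitTree(xid, nchildren, children);	/* clog: committed */
	SyncRepWaitForLSN(XactLastRecEnd, true);			/* syncrep widens the gap */
	/* ... later, back in CommitTransaction() ... */
	ProcArrayEndTransaction(MyProc, latestXid);			/* not "in progress" anymore */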

Greetings,

Andres Freund




Re: Race condition in TransactionIdIsInProgress

2022-02-10 Thread Andres Freund
Hi,

On 2022-02-10 15:21:27 -0500, Robert Haas wrote:
> On Thu, Feb 10, 2022 at 11:46 AM Konstantin Knizhnik  
> wrote:
> > Postgres first records the transaction state in CLOG, then removes the
> > transaction from the procarray and finally releases locks.
> > But it can happen that the transaction is marked as committed in CLOG,
> > the XMAX_COMMITTED bit is set in the modified tuple, but
> > TransactionIdIsInProgress still returns true. As a result, the "t_xmin %u is
> > uncommitted in tuple (%u,%u) to be updated in table"
> > error is reported.
>
> This is exactly why the HeapTupleSatisfiesFOO() functions all test
> TransactionIdIsInProgress() first and TransactionIdDidCommit() only if
> it returns false. I don't think it's really legal to test
> TransactionIdDidCommit() without checking TransactionIdIsInProgress()
> first. And I bet that's exactly what this code is doing...

The problem is that the TransactionIdDidAbort() call is coming from within
TransactionIdIsInProgress().

If I understand correctly, the problem is that the xid cache stuff doesn't
work correctly for subtransactions and breaks TransactionIdIsInProgress().

For the problem to occur you need an xid that is for a subtransaction that
suboverflowed and that is 'clog committed' but not 'procarray committed'.

If that xid is tested twice with TransactionIdIsInProgress(), you will get a
bogus result on the second call within the same session.

The first TransactionIdIsInProgress() will go to
/*
 * Step 4: have to check pg_subtrans.
 *
 * At this point, we know it's either a subtransaction of one of the Xids
 * in xids[], or it's not running.  If it's an already-failed
 * subtransaction, we want to say "not running" even though its parent may
 * still be running.  So first, check pg_xact to see if it's been aborted.
 */
xc_slow_answer_inc();

if (TransactionIdDidAbort(xid))
return false;

and fetch that xid. The TransactionLogFetch() will set cachedFetchXid, because
the xid's status is TRANSACTION_STATUS_COMMITTED. Because the transaction
didn't abort, TransactionIdIsInProgress() will go on to the following loop,
see that the toplevel xid is in progress in the procarray, and return true.

The second call to TransactionIdIsInProgress() will see that the fetched
transaction matches the cached transactionid and return false.

/*
 * We may have just checked the status of this transaction, so if it is
 * already known to be completed, we can fall out without any access to
 * shared memory.
 */
if (TransactionIdIsKnownCompleted(xid))
{
xc_by_known_xact_inc();
return false;
}

All this without any improper uses of TransactionIdDidAbort()/TransactionIdDidCommit()
before TransactionIdIsInProgress(). And other sessions will still return
true. And even *this* session can return true again, if another transaction is
checked that ends up caching another transaction's status in cachedFetchXid.
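Spelled out as a sketch (the function names are the real ones; the
interleaving is the scenario described above):

	/*
	 * xid is clog-committed, but its suboverflowed toplevel transaction is
	 * still in the procarray.
	 */
	in_progress = TransactionIdIsInProgress(xid);	/* true, correctly; but the
													 * TransactionIdDidAbort() in
													 * step 4 has now cached xid
													 * as COMMITTED in
													 * cachedFetchXid */
	in_progress = TransactionIdIsInProgress(xid);	/* false, bogusly: the
													 * TransactionIdIsKnownCompleted()
													 * fast path short-circuits
													 * on the cache */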


It looks to me like the TransactionIdIsKnownCompleted() path in
TransactionIdIsInProgress() is just entirely broken. A prior
TransactionLogFetch() can't be allowed to make subsequent
TransactionIdIsInProgress() calls return wrong values.


If I understand the problem correctly, a SELECT
pg_xact_status(clog-committed-pgproc-in-progress) can make
TransactionIdIsInProgress() return wrong values too. It's not just
TransactionIdIsInProgress() itself.


Looks like this problem goes all the way back to

commit 611b4393f22f2bb43135501cd6b7591299b6b453
Author: Tom Lane 
Date:   2008-03-11 20:20:35 +

Make TransactionIdIsInProgress check transam.c's single-item XID status cache
before it goes groveling through the ProcArray.  In situations where the same
recently-committed transaction ID is checked repeatedly by tqual.c, this saves
a lot of shared-memory searches.  And it's cheap enough that it shouldn't
hurt noticeably when it doesn't help.

Concept and patch by Simon, some minor tweaking and comment-cleanup by Tom.


I think this may actually mean that the HOT corruption problem fixed in

commit 18b87b201f7
Author: Andres Freund 
Date:   2021-12-10 20:12:26 -0800

Fix possible HOT corruption when RECENTLY_DEAD changes to DEAD while pruning.

for 14/master is present in older branches too :(. Need to trace through the
HTSV and pruning logic with a bit more braincells than currently available to
be sure.

Greetings,

Andres Freund




Re: Is there any documentation on how to correctly create extensions in HA(primary-standby) setup?

2022-02-10 Thread Julien Rouhaud
On Fri, Feb 11, 2022 at 10:16:27AM +0530, Bharath Rupireddy wrote:
> On Tue, Jan 18, 2022 at 1:26 PM Julien Rouhaud  wrote:
> >
> > I think that the common terminology is "module", not "extension".  That's
> > especially important here as this information is also relevant for modules 
> > that
> > may come with an SQL level extension.  This should be made clear in that new
> > documentation, same for the CREATE EXTENSION part that may not be relevant.
> 
> Thanks for reviewing this. The aim of the patch is to document how one can
> create extensions with the "CREATE EXTENSION" command in a replication
> setup; I'm not sure why I should use the term "module". I hope you have
> seen the usage of "extension" in extend.sgml.

The aim of this patch should be to clarify postgres configuration for
additional modules in physical replication, whether those include an extension
or not.  Your patch covers the implications of modifying shared_preload_libraries;
are you saying that if there's no extension associated with that library it
shouldn't be covered?

A simple example is auto_explain, which also means that the documentation
should probably mention that shared_preload_libraries is only an example and
that all configuration changes should be carried over (and adapted where
necessary) to the standby, like session_preload_libraries among others.




Re: Is there any documentation on how to correctly create extensions in HA(primary-standby) setup?

2022-02-10 Thread Bharath Rupireddy
On Tue, Jan 18, 2022 at 1:26 PM Julien Rouhaud  wrote:
>
> Hi,
>
> On Thu, Dec 09, 2021 at 08:19:06AM +0530, Bharath Rupireddy wrote:
> >
> > Thanks. Attaching v1 patch specifying the notes there. Please review.
>
> I think that the common terminology is "module", not "extension".  That's
> especially important here as this information is also relevant for modules 
> that
> may come with an SQL level extension.  This should be made clear in that new
> documentation, same for the CREATE EXTENSION part that may not be relevant.

Thanks for reviewing this. The aim of the patch is to document how one can
create extensions with the "CREATE EXTENSION" command in a replication
setup; I'm not sure why I should use the term "module". I hope you have
seen the usage of "extension" in extend.sgml.

> It also seems that this documentation is only aimed for physical replication.
> It should also be explicitly stated as it might not be obvious for the 
> intended
> readers.

Yeah, I've changed the title and description accordingly.

> + [...] set it either via ALTER 
> SYSTEM
> + command or postgresql.conf file on both primary and
> + standbys, reload the postgresql.conf file and 
> restart
> + the servers.
>
> Isn't the reload terrible advice?  By definition changing
> shared_preload_libraries isn't compatible with a simple reload and will emit
> some error.

Yes, it will emit the following messages. I removed the reload part.

2022-02-11 04:07:53.178 UTC [1206594] LOG:  parameter
"shared_preload_libraries" cannot be changed without restarting the
server
2022-02-11 04:07:53.178 UTC [1206594] LOG:  configuration file
"/home/bharath/postgres/inst/bin/data/postgresql.auto.conf" contains
errors; unaffected changes were applied

> + [...] Create the extension on the primary, there is no need to
> + create it on the standbys as the  linkend="sql-createextension">CREATE EXTENSION
> + command is replicated.
>
> The "no need" here is quite ambiguous, as it seems to indicate that trying to
> create the extension on the standby will work but is unnecessary.

Modified.

Attaching v2, please have a look.

Regards,
Bharath Rupireddy.


v2-0001-Document-creating-an-extension-in-replication-set.patch
Description: Binary data


Re: Merging statistics from children instead of re-sampling everything

2022-02-10 Thread Andrey V. Lepikhov

On 2/11/22 03:37, Tomas Vondra wrote:

That being said, this thread was not really about foreign partitions,
but about re-analyzing inheritance trees in general. And sampling
foreign partitions doesn't really solve that - we'll still do the
sampling over and over.

IMO, to solve the problem we should do two things:
1. Avoid repeated partition scans in the case of an inheritance tree.
2. Avoid re-analyzing everything when only a small subset of partitions
is actively changing.


For (1) I can imagine a solution like multiplexing: at the stage of 
defining which relations to scan, group them and prepare the scan 
parameters so that multiple samples are collected in one pass.
It looks like we need separate logic for the analysis of partitioned 
tables - we should form and cache samples for each partition before an 
analysis.
This requires a prototype to understand the complexity of such a 
solution, and it can be done separately from (2).


Task (2) is more difficult to solve. Here we could store samples from each 
partition in the values[] field of pg_statistic, or in a dedicated table 
which stores a 'most probable values' snapshot of each table.
The most difficult problem here, as you mentioned, is the ndistinct value. 
Is it possible to store not an exactly calculated value of ndistinct, but 
an 'expected value', based on analysis of the samples and histograms of 
the partitions? Such a value could also solve the problem of estimating 
the grouping of a SETOP result (joins of them, etc.), where we have 
statistics only on the sources of the union.


--
regards,
Andrey Lepikhov
Postgres Professional




Re: List of all* PostgreSQL EXTENSIONs in the world

2022-02-10 Thread Noah Misch
On Thu, Feb 10, 2022 at 04:04:17PM -0500, Joe Conway wrote:
> On 2/10/22 15:35, Robert Haas wrote:
> >On Thu, Feb 10, 2022 at 3:19 PM Joel Jacobson  wrote:
> >>I've compiled a list of all* PostgreSQL EXTENSIONs in the world:
> >>
> >>https://gist.github.com/joelonsql/e5aa27f8cc9bd22b8999b7de8aee9d47
> >>
> >>*) It's not all, but 1041, compared to the 338 found on PGXN.

How did you make the list?  (I'd imagine doing it by searching for
repositories containing evidence like \bpgxs\b matches.)

> >>Maybe it would be an idea to provide a lightweight solution,
> >>e.g. maintaining a simple curated list of repo urls,
> >>published at postgresql.org or wiki.postgresql.org,
> >>with a simple form allowing missing repos to be suggested for insertion?
> >
> >I think a list like this is probably not useful without at least a
> >brief description of each one, and maybe some attempt at
> >categorization. If I want a PostgreSQL extension to bake tasty
> >lasagne, I'm not going to go scroll through 1041 entries and hope
> >something jumps out at me. I'm going to Google "PostgreSQL lasagne
> >extension" and see if anything promising shows up. But if I had a list
> >that were organized by category, I might try looking for a relevant
> >category and then reading the blurbs for each one...
> 
> Agreed.

When I change back-branch APIs and ABIs, I use a PGXN snapshot to judge the
scope of affected modules.  Supplementing the search with these git
repositories would help, even without further curation.  I agree curation
would make the list more valuable for other use cases.  Contributing to PGXN
may be the way to go, though.




Re: can we add subscription TAP test option "vcregress subscriptioncheck" for MSVC builds?

2022-02-10 Thread Bharath Rupireddy
On Fri, Feb 11, 2022 at 12:29 AM Justin Pryzby  wrote:
>
> On Thu, Feb 10, 2022 at 10:21:08PM +0530, Bharath Rupireddy wrote:
> > Thanks for the comments. Above looks good to me, changed that way, PSA v2.
>
> I spy a typo: subcription

Thanks. Corrected in v3 attached.

The CF entry https://commitfest.postgresql.org/36/3354/ was closed
with "Returned with Feedback". I'm not sure why. If the patch is still
of interest, I will add a new one for tracking.

Regards,
Bharath Rupireddy.


v3-0001-add-vcregress-taptest-and-mention-about-Windows-o.patch
Description: Binary data


Re: List of all* PostgreSQL EXTENSIONs in the world

2022-02-10 Thread Julien Rouhaud
Hi,

On Fri, Feb 11, 2022 at 09:22:01AM +0900, Ian Lawrence Barwick wrote:
> 2022年2月11日(金) 5:19 Joel Jacobson :
> >
> > Hi hackers,
> >
> > I've compiled a list of all* PostgreSQL EXTENSIONs in the world:
> >
> > https://gist.github.com/joelonsql/e5aa27f8cc9bd22b8999b7de8aee9d47
> >
> > *) It's not all, but 1041, compared to the 338 found on PGXN.
> 
> The only one I consider of possible general, viable use (and which is 
> available
> as part of the community packaging infrastructure) is also listed on PGXN.

Agreed, PGXN is already the official place for that and has all you need to
properly look for what you want AFAICT.

> > Maybe it would be an idea to provide a lightweight solution,
> > e.g. maintaining a simple curated list of repo urls,
> > published at postgresql.org or wiki.postgresql.org,
> > with a simple form allowing missing repos to be suggested for insertion?
> 
> The wiki sounds like a good starting point, assuming someone is willing to
> create/curate/maintain the list. It would need weeding out of any extensions
> which are inactive/unmaintained, or duplicates/copies/forks (e.g. I see three
> instances of blackhole_fdw listed, but not the original repo, which is not
> even on GitHub), etc.

There are already some repositories that try to gather curated lists of
projects around postgres, and they probably already have the same problem with
abandoned or simply moved projects.

The only real solution is to have authors keep publishing and updating their
tools on PGXN.




Re: Why is src/test/modules/committs/t/002_standby.pl flaky?

2022-02-10 Thread Andres Freund
Hi,

Over in another thread I made some wild unsubstantiated guesses that the
windows issues could have been made much more likely by a somewhat odd bit of
code in PQisBusy():

https://postgr.es/m/1959196.1644544971%40sss.pgh.pa.us

Alexander, any chance you could try whether that changes the likelihood of the
problem occurring, without any other fixes / reverts applied?

Greetings,

Andres Freund




Re: Skipping logical replication transactions on subscriber side

2022-02-10 Thread Masahiko Sawada
On Thu, Jan 27, 2022 at 10:42 PM Peter Eisentraut
 wrote:
>
> On 26.01.22 05:05, Masahiko Sawada wrote:
> >> I think it is okay to clear after the first successful application of
> >> any transaction. What I was not sure was about the idea of giving
> >> WARNING/ERROR if the first xact to be applied is not the same as
> >> skip_xid.
> > Do you prefer not to do anything in this case?
>
> I think a warning would be sensible.  If the user specifies to skip a
> certain transaction and then that doesn't happen, we should at least say
> something.

While waiting for comments on the discussion about the designs of
both pg_stat_subscription_workers and the ALTER SUBSCRIPTION SKIP feature,
I've incorporated some (minor) comments on the current design patch,
which includes:

* Use LSN instead of XID.
* Raise a warning if the user specifies to skip a certain transaction
and then that doesn't happen.
* Skip-LSN has an effect on the first non-empty transaction. That is,
it's cleared after successfully committing a non-empty transaction,
preventing a wrong user-specified LSN from remaining in effect.
* Remove some unnecessary tap tests to reduce the test time.

I think we all agree with the first point regardless of where we store
error information. And speaking of the current design, I think we all
agree on the other points. Since the design discussion is ongoing, I'll
incorporate other comments according to the result of the discussion.

The attached 0001 patch modifies pg_stat_subscription_workers to
report the LSN instead of the XID, which is required by the ALTER
SUBSCRIPTION SKIP patch, 0002.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/


v11-0001-Report-error-transaction-s-commit-LSN-instead-of.patch
Description: Binary data


v11-0002-Add-ALTER-SUBSCRIPTION-.-SKIP-to-skip-the-transa.patch
Description: Binary data


Re: generic plans and "initial" pruning

2022-02-10 Thread Andres Freund
Hi,

On 2022-02-10 17:13:52 +0900, Amit Langote wrote:
> The attached patch implements this idea.  Sorry for the delay in
> getting this out and thanks to Robert for the off-list discussions on
> this.

I did not follow this thread at all. And I only skimmed the patch. So I'm
probably wrong.

I'm wary of this increasing executor overhead even in cases where it won't
help. Even without this patch, for simple queries, small allocations show up
noticeably in profiles. This adds a bunch more, even if
!context->stmt->usesPreExecPruning:

- makeNode(ExecPrepContext)
- makeNode(ExecPrepOutput)
- palloc0(sizeof(PlanPrepOutput *) * result->numPlanNodes)
- stmt_execprep_list = lappend(stmt_execprep_list, execprep);
- AllocSetContextCreate(CurrentMemoryContext,
  "CachedPlan execprep list", ...
- ...

That's a lot of extra work for something that's already a bottleneck.

Greetings,

Andres Freund




Re: Add jsonlog log_destination for JSON server logs

2022-02-10 Thread Michael Paquier
On Thu, Feb 10, 2022 at 07:45:17PM -0300, Alvaro Herrera wrote:
> From what I hear in the container world, what they would *prefer* (but
> they don't often get) is to receive the JSON-format logs directly in
> stderr from the daemons they run; they capture stderr and they have the
> logs just in the format they need, without having to open the log files,
> parsing the lines to rewrite in a different format as is done currently.

Yes, I have been pinged about that, which is why there are still cases
for my out-of-core extension jsonlog that uses the elog hook.

> I think this would be a relatively easy patch to do.  Opinions?

The postmaster goes through a couple of loops with the fd to open for
the default format, which the syslogger inherits from the postmaster,
and I am pretty sure that there are a couple of code paths around
postmaster startup that can be tricky to reason about.

Making the new parameter PGC_POSTMASTER makes things easier to handle,
but the postmaster still generates a couple of LOG entries and redirects
them to stderr before loading any GUC values, which would mean that we
cannot make sure that all the logs are valid JSON objects.  If we want
to be 100% waterproof here, we may want to track down the format to
use by default by some means other than a GUC for the postmaster
startup?  A file holding this information in the root of the data
folder would be one way.
--
Michael




Re: Add jsonlog log_destination for JSON server logs

2022-02-10 Thread Tom Lane
Alvaro Herrera  writes:
> So, thinking about this, there is one important piece that is missing
> here, which is the ability to change the default format for what we
> write to stderr.  Right now, if you have stderr output, it is always in
> the "plain multiline" format, with no option to change it.  If you want
> a JSON log, you have to read a file.  But ISTM it would be pretty useful
> if you could say "log_default_format=json" and get the log that we get
> in stderr in the JSON format instead.

> From what I hear in the container world, what they would *prefer* (but
> they don't often get) is to receive the JSON-format logs directly in
> stderr from the daemons they run; they capture stderr and they have the
> logs just in the format they need, without having to open the log files,
> parsing the lines to rewrite in a different format as is done currently.

> I think this would be a relatively easy patch to do.  Opinions?

I think assuming that everything that comes out on the postmaster's stderr
is generated by our code is hopelessly naive.  See for example glibc's
bleats when it detects malloc corruption, or when loading a shlib fails.
So I don't believe something like this can be made to work reliably.

The existing syslogger logic does have the ability to cope with
such out-of-protocol data.  So maybe, if you are using syslogger,
you could have it transform such messages into some
lowest-common-denominator jsonlog format.  But it's not going to
work to expect that to happen with raw stderr.

regards, tom lane




Re: List of all* PostgreSQL EXTENSIONs in the world

2022-02-10 Thread Ian Lawrence Barwick
2022年2月11日(金) 5:19 Joel Jacobson :
>
> Hi hackers,
>
> I've compiled a list of all* PostgreSQL EXTENSIONs in the world:
>
> https://gist.github.com/joelonsql/e5aa27f8cc9bd22b8999b7de8aee9d47
>
> *) It's not all, but 1041, compared to the 338 found on PGXN.

More precisely, it's a list of all(?) the repositories with PostgreSQL
extensions on GitHub?

FWIW I see a few of mine on there, which are indeed PostgreSQL extensions,
but are also mostly just random, mainly unmaintained, non-production-ready
code I've dumped on GitHub in the unlikely event it's of any interest.

The only one I consider of possible general, viable use (and which is available
as part of the community packaging infrastructure) is also listed on PGXN.

OTOH not everything is on GitHub; PostGIS comes to mind.

> Maybe it would be an idea to provide a lightweight solution,
> e.g. maintaining a simple curated list of repo urls,
> published at postgresql.org or wiki.postgresql.org,
> with a simple form allowing missing repos to be suggested for insertion?

The wiki sounds like a good starting point, assuming someone is willing to
create/curate/maintain the list. It would need weeding out of any extensions
which are inactive/unmaintained, or duplicates/copies/forks (e.g. I see three
instances of blackhole_fdw listed, but not the original repo, which is not
even on GitHub), etc.

Regards

Ian Barwick

-- 
EnterpriseDB: https://www.enterprisedb.com




Re: Allow parallel plan for referential integrity checks?

2022-02-10 Thread Andreas Karlsson

On 2/7/22 11:26, Frédéric Yhuel wrote:
Attached is a (naive) patch that aims to fix the case of a FK addition, 
but the handling of the flag CURSOR_OPT_PARALLEL_OK, generally speaking, 
looks rather hackish.


Thanks for the patch. You can add it to the current open commitfest 
(https://commitfest.postgresql.org/37/).


Best regards,
Andreas




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-02-10 Thread Andres Freund
Hi,

On 2022-02-10 10:32:42 -0500, Robert Haas wrote:
> I won't really be surprised if we hear that someone has a 10GB
> template database and likes to make a ton of copies and only change
> 500 rows in each one while replicating the whole thing over a slow
> WAN. That can definitely happen, and I'm sure whoever is doing that
> has reasons for it which they consider good and sufficient. However, I
> don't think there are likely to be a ton of people doing stuff like
> that - just a few.

Yea. I would be a bit more concerned if we made creating template databases
very cheap, e.g. by using file copy-on-write functionality like we have for
pg_upgrade.  But right now it's a fairly hefty operation anyway.

Greetings,

Andres Freund




Re: Add jsonlog log_destination for JSON server logs

2022-02-10 Thread Alvaro Herrera
So, thinking about this, there is one important piece that is missing
here, which is the ability to change the default format for what we
write to stderr.  Right now, if you have stderr output, it is always in
the "plain multiline" format, with no option to change it.  If you want
a JSON log, you have to read a file.  But ISTM it would be pretty useful
if you could say "log_default_format=json" and get the log that we get
in stderr in the JSON format instead.

From what I hear in the container world, what they would *prefer* (but
they don't often get) is to receive the JSON-format logs directly in
stderr from the daemons they run; they capture stderr and they have the
logs just in the format they need, without having to open the log files,
parsing the lines to rewrite in a different format as is done currently.

I think this would be a relatively easy patch to do.  Opinions?

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/




Re: Merging statistics from children instead of re-sampling everything

2022-02-10 Thread Tomas Vondra



On 2/10/22 12:50, Andrey Lepikhov wrote:
> On 21/1/2022 01:25, Tomas Vondra wrote:
>> But I don't have a very good idea what to do about statistics that we
>> can't really merge. For some types of statistics it's rather tricky to
>> reasonably merge the results - ndistinct is a simple example, although
>> we could work around that by building and merging hyperloglog counters.
>
> I think, as a first step along this path, we can reduce the number of pulled
> tuples. We don't really need to pull all tuples from a remote server.
> To construct a reservoir, we can pull only a tuple sample. The reservoir
> method needs only a few arguments to return a sample, just as when you read
> tuples locally. Also, to fetch such partial samples asynchronously, we
> can get the size of each partition in a preliminary step of the analysis.
> In my opinion, even this solution can reduce the weight of the problem
> drastically.
> 

Oh, wow! I haven't realized we're fetching all the rows from foreign
(postgres_fdw) partitions. For local partitions we already do that,
because that uses the usual acquire function, with a reservoir
proportional to partition size. I have assumed we use tablesample to
fetch just a small fraction of rows from FDW partitions, and I agree
doing that would be a pretty huge benefit.

I actually tried hacking that together - there are a couple of problems with
that (e.g. determining what fraction to sample using bernoulli/system),
but in principle it seems quite doable. Some minor changes to the FDW
API may be necessary, not sure.

Not sure about the async execution - that seems way more complicated,
and the sampling reduces the total cost, async just parallelizes it.


That being said, this thread was not really about foreign partitions,
but about re-analyzing inheritance trees in general. And sampling
foreign partitions doesn't really solve that - we'll still do the
sampling over and over.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: wrong fds used for refilenodes after pg_upgrade relfilenode changes

2022-02-10 Thread Andres Freund
Hi,

On 2022-02-11 09:10:38 +1300, Thomas Munro wrote:
> I was about to commit that, because the original Windows problem it
> solved is showing up occasionally in CI failures (that is, it already
> solves a live problem, albeit a different and non-data-corrupting
> one):

+1

> It seems like I should go ahead and do that today, and we can study
> further uses for PROCSIGNAL_BARRIER_SMGRRELEASE in follow-on work?

Yes.

I wonder whether we really should make the barriers conditional on
defined(WIN32) || defined(USE_ASSERT_CHECKING) as done in that patch, even
leaving wraparound dangers aside. On !windows we still have the issue of the
space for the dropped / moved files not being released if there are processes
having them open. That can be a lot of space if there are long-lived connections
in a cluster that doesn't fit into s_b (because processes will have random fds
open for writing back dirty buffers). And we don't truncate the files before
unlinking when done as part of a DROP DATABASE...

But that's something we can fine-tune later as well...

Greetings,

Andres Freund




Re: wrong fds used for refilenodes after pg_upgrade relfilenode changes

2022-02-10 Thread Andres Freund
Hi,

On 2022-02-10 13:49:50 -0500, Robert Haas wrote:
> I agree. While I feel sort of bad about missing this issue in review,
> I also feel like it's pretty surprising that there isn't something
> plugging this hole already. It feels unexpected that our FD management
> layer might hand you an FD that references the previous file that had
> the same name instead of the one that exists right now, and I don't
> think it's too much of a stretch of the imagination to suppose that
> this won't be the last problem we have if we don't fix it.

Arguably it's not really the FD layer that's the issue - it's at the smgr
level. But that's semantics...


> The main question in my mind is who is going to actually make that
> happen. It was your idea (I think), Thomas coded it, and my commit
> made it a live problem. So who's going to get something committed
> here?

I think it'd make sense for Thomas to commit the current patch for the
tablespace issues first. And then we can go from there?


> > If we add such barriers to dropdb(), XLOG_DBASE_DROP redo, DropTableSpace(),
> > XLOG_TBLSPC_DROP redo, I think all the "new" ways to hit this are 
> > foreclosed.
> 
> What about movedb()?

My first reaction was that it should be fine, because RelFileNode.spcNode
would differ. But of course that doesn't protect against moving a database
back and forth. So yes, you're right, it's needed there too.


Are there other commands that could be problematic? ALTER TABLE SET TABLESPACE
should be fine, that allocates a new relfilenode.


> I definitely think it would be nice to cover this issue not only for
> database OIDs and tablespace OIDs but also for relfilenode OIDs.
> Otherwise we're right on the cusp of having the same bug in that case.
> pg_upgrade doesn't drop and recreate tables the way it does databases,
> but if somebody made it do that in the future, would they think about
> this? I bet not.

And even just plugging the wraparound aspect would be great. It's not actually
*that* hard to wrap oids around, because of toast.


> Dilip's been working on your idea of making relfilenode into a 56-bit
> quantity. That would make the problem of detecting wraparound go away.
> In the meantime, I suppose we could do think of trying to do something
> here:
> 
> /* wraparound, or first post-initdb
> assignment, in normal mode */
> ShmemVariableCache->nextOid = FirstNormalObjectId;
> ShmemVariableCache->oidCount = 0;
> 
> That's a bit sketch, because this is the OID counter, which isn't only
> used for relfilenodes.

I can see a few ways to deal with that:

1) Split nextOid into two, nextRelFileNode and nextOid. Have the wraparound
   behaviour only in the nextRelFileNode case.  That'd be a nice improvement,
   but not entirely trivial, because we currently use GetNewRelFileNode() for
   oid assignment as well (c.f. heap_create_with_catalog()).

2) Keep track of the last assigned relfilenode in ShmemVariableCache, and wait
   for a barrier in GetNewRelFileNode() if the assigned oid is wrapped
   around.  While there would be a chance of "missing" a ->nextOid wraparound,
   e.g. because of >2**32 toast oids being allocated, I think that could only
   happen if there were no "wrapped around relfilenodes" allocated, so that
   should be ok.

3) Any form of oid wraparound might cause problems, not just relfilenode
   issues. So having a place to emit barriers on oid wraparound might be a
   good idea.


One thing I wonder is how to make sure the path would be tested. Perhaps we
could emit the anti-wraparound barriers more frequently when assertions are
enabled?


> I'm not sure whether there would be problems waiting for a barrier here - I
> think we might be holding some lwlocks in some code paths.

It seems like a bad idea to call GetNewObjectId() with lwlocks held. From what
I can see the two callers of GetNewObjectId() both have loops that can go on
for a long time after a wraparound.



> > I think we need to add some debugging code to detect "fd to deleted file of
> > same name" style problems. Especially during regression tests they're quite
> > hard to notice, because most of the contents read/written are identical 
> > across
> > recycling.
> 
> Maybe. If we just plug the holes here, I am not sure we need permanent
> instrumentation. But I'm also not violently opposed to it.

The question is how we can find all the places where we need to add barrier
emits & waits. For example, I initially didn't realize that there's wraparound
danger in WAL replay as well.

Greetings,

Andres Freund




Re: Windows now has fdatasync()

2022-02-10 Thread Thomas Munro
On Sun, Feb 6, 2022 at 7:20 PM Michael Paquier  wrote:
> On Sun, Dec 12, 2021 at 03:48:10PM +1300, Thomas Munro wrote:
> > I tried out a quick POC patch and it runs a bit faster than fsync(), as
> > expected.
>
> Good news, as a too high difference would be suspect :)
>
> How much difference does it make in % and are the numbers rather
> reproducible?  Just wondering..

I've only tested on a qemu/kvm virtual machine with a virtual SATA
disk device, so take this with a bucket of salt, but I think that's
enough to see the impact of 'slow' SATA commands hitting the device
and being waited for. What I see is that wal_sync_method=fdatasync
does about 25% more TPS than wal_sync_method=fsync, and
wal_sync_method=open_datasync gives a wildly higher number that I don't
believe (i.e. I don't believe it waited for power-loss durability, and
the links above support that understanding), but it tumbles back to
earth and almost exactly matches the wal_sync_method=fdatasync number
when the write cache is disabled.
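For anyone wanting to experiment, here is a minimal sketch of how fdatasync()
can be provided on Windows, assuming the NtFlushBuffersFileEx() route; the
flag value, declaration and error mapping below are my assumptions, not
necessarily what the POC patch does:

	#include <windows.h>
	#include <winternl.h>	/* NTSTATUS, IO_STATUS_BLOCK */
	#include <io.h>			/* _get_osfhandle() */
	#include <string.h>
	#include <errno.h>

	#ifndef FLUSH_FLAGS_FILE_DATA_SYNC_ONLY
	#define FLUSH_FLAGS_FILE_DATA_SYNC_ONLY 0x00000004	/* from ntifs.h */
	#endif

	/* NtFlushBuffersFileEx() lives in ntdll; declare it ourselves. */
	NTSYSAPI NTSTATUS NTAPI NtFlushBuffersFileEx(HANDLE FileHandle,
												 ULONG Flags,
												 PVOID Parameters,
												 ULONG ParametersSize,
												 PIO_STATUS_BLOCK IoStatusBlock);

	int
	fdatasync(int fd)
	{
		HANDLE		handle = (HANDLE) _get_osfhandle(fd);
		IO_STATUS_BLOCK iosb;

		if (handle == INVALID_HANDLE_VALUE)
		{
			errno = EBADF;
			return -1;
		}
		memset(&iosb, 0, sizeof(iosb));
		/* Flush file data (not metadata), like fdatasync() on POSIX. */
		if (NtFlushBuffersFileEx(handle, FLUSH_FLAGS_FILE_DATA_SYNC_ONLY,
								 NULL, 0, &iosb) >= 0)	/* NT_SUCCESS */
			return 0;
		errno = EIO;			/* crude error mapping */
		return -1;
	}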




Re: generic plans and "initial" pruning

2022-02-10 Thread Robert Haas
On Thu, Feb 10, 2022 at 3:14 AM Amit Langote  wrote:
> Maybe this should be more than one patch?  Say:
>
> 0001 to add ExecutorPrep and the boilerplate,
> 0002 to teach plancache.c to use the new facility

Could be, not sure. I agree that if it's possible to split this in a
meaningful way, it would facilitate review. I notice that there is
some straight code movement e.g. the creation of
ExecPartitionPruneFixSubPlanIndexes. It would be best, I think, to do
pure code movement in a preparatory patch so that the main patch is
just adding the new stuff we need and not moving stuff around.

David Rowley recently proposed a patch for some parallel-safety
debugging cross checks which added a plan tree walker. I'm not sure
whether he's going to press that patch forward to commit, but I think
we should get something like that into the tree and start using it,
rather than adding more bespoke code. Maybe you/we should steal that
part of his patch and commit it separately. What I'm imagining is that
plan_tree_walker() would know which nodes have subnodes and how to
recurse over the tree structure, and you'd have a walker function to
use with it that would know which executor nodes have ExecPrep
functions and call them, and just do nothing for the others. That
would spare you adding stub functions for nodes that don't need to do
anything, or don't need to do anything other than recurse. Admittedly
it would look a bit different from the existing executor phases, but
I'd argue that it's a better coding model.

Actually, you might've had this in the patch at some point, because
you have a declaration for plan_tree_walker but no implementation. I
guess one thing that's a bit awkward about this idea is that in some
cases you want to recurse to some subnodes but not other subnodes. But
maybe it would work to put the recursion in the walker function in
that case, and then just return true; but if you want to walk all
children, return false.
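To make that concrete, a hypothetical sketch (plan_tree_walker() here is the
proposed, uncommitted function, and ExecPrepAppend() is an invented stand-in
for a per-node ExecPrep function):

	static bool
	exec_prep_walker(Plan *plan, void *context)
	{
		switch (nodeTag(plan))
		{
			case T_Append:
				/* a node type that actually has prep work to do */
				ExecPrepAppend((Append *) plan, context);
				return false;	/* let the walker recurse into all children */
			default:
				return false;	/* nothing to do here: just recurse */
		}
	}

That way only node types with prep work need a case, and everything else
falls through without a stub function.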

+ bool contains_init_steps;
+ bool contains_exec_steps;

s/steps/pruning/? maybe with contains -> needs or performs or requires as well?

+ * Returned information includes the set of RT indexes of relations referenced
+ * in the plan, and a PlanPrepOutput node for each node in the planTree if the
+ * node type supports producing one.

Aren't all RT indexes referenced in the plan?

+ * This may lock relations whose information may be used to produce the
+ * PlanPrepOutput nodes. For example, a partitioned table before perusing its
+ * PartitionPruneInfo contained in an Append node to do the pruning the result
+ * of which is used to populate the Append node's PlanPrepOutput.

"may lock" feels awfully fuzzy to me. How am I supposed to rely on
something that "may" happen? And don't we need to have tight logic
around locking, with specific guarantees about what is locked at which
points in the code and what is not?

+ * At least one of 'planstate' or 'econtext' must be passed to be able to
+ * successfully evaluate any non-Const expressions contained in the
+ * steps.

This also seems fuzzy. If I'm thinking of calling this function, I
don't know how I'd know whether this criterion is met.

I don't love PlanPrepOutput the way you have it. I think one of the
basic design issues for this patch is: should we think of the prep
phase as specifically pruning, or is it general prep and pruning is
the first thing for which we're going to use it? If it's really a
pre-pruning phase, we could name it that way instead of calling it
"prep". If it's really a general prep phase, then why does
PlanPrepOutput contain initially_valid_subnodes as a field? One could
imagine letting each prep function decide what kind of prep node it
would like to return, with partition pruning being just one of the
options. But is that a useful generalization of the basic concept, or
just pretending that a special-purpose mechanism is more general than
it really is?

+ return CreateQueryDesc(pstmt, NULL, /* XXX pass ExecPrepOutput too? */

It seems to me that we should do what the XXX suggests. It doesn't
seem nice if the parallel workers could theoretically decide to prune
a different set of nodes than the leader.

+ * known at executor startup (excludeing expressions containing

Extra e.

+ * into subplan indexes, is also returned for use during subsquent

Missing e.

Somewhere, we're going to need to document the idea that this may
permit us to execute a plan that isn't actually fully valid, but that
we expect to survive because we'll never do anything with the parts of
it that aren't. Maybe that should be added to the executor README, or
maybe there's some better place, but I don't think that should remain
something that's just implicit.

This is not a full review, just some initial thoughts looking through this.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: WaitLatchOrSocket seems to not count to 4 right...

2022-02-10 Thread Thomas Munro
On Tue, Feb 8, 2022 at 1:51 PM Thomas Munro  wrote:
> (and one day we should make it dynamic and change udata to hold
> an index instead of a pointer...)

Here's a patch like that.

I'd originally sketched this out for another project, but I don't
think I need it for that anymore.  After this exchange I couldn't
resist fleshing it out for a commitfest, just on usability grounds.
Thoughts?
From 7d011ed1de06ee6aad850f72399f0d3c9217458f Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Fri, 11 Feb 2022 10:39:46 +1300
Subject: [PATCH] Expand WaitEventSet dynamically.

Previously, CreateWaitEventSet() took an argument to say how many events
could be added.  Teach it to grow automatically instead.  Also expose a
"reserve" function in case a caller knows how many it needs and would
like to avoid reallocation.

Discussion: https://postgr.es/m/CA%2BhUKG%2BiaLp6ETo%2Bu0632yVUGO6eSMx4hKBbcc7zy88thBQy%3Dg%40mail.gmail.com
---
 src/backend/executor/nodeAppend.c  |   2 +-
 src/backend/libpq/pqcomm.c |   2 +-
 src/backend/postmaster/pgstat.c|   2 +-
 src/backend/postmaster/syslogger.c |   2 +-
 src/backend/storage/ipc/latch.c| 201 +++--
 src/include/storage/latch.h|   3 +-
 6 files changed, 110 insertions(+), 102 deletions(-)

diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 7937f1c88f..3ad0441756 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -1029,7 +1029,7 @@ ExecAppendAsyncEventWait(AppendState *node)
 	/* We should never be called when there are no valid async subplans. */
 	Assert(node->as_nasyncremain > 0);
 
-	node->as_eventset = CreateWaitEventSet(CurrentMemoryContext, nevents);
+	node->as_eventset = CreateWaitEventSet(CurrentMemoryContext);
 	AddWaitEventToSet(node->as_eventset, WL_EXIT_ON_PM_DEATH, PGINVALID_SOCKET,
 	  NULL, NULL);
 
diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 22eb04948e..99fcc447b0 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -204,7 +204,7 @@ pq_init(void)
 (errmsg("could not set socket to nonblocking mode: %m")));
 #endif
 
-	FeBeWaitSet = CreateWaitEventSet(TopMemoryContext, 3);
+	FeBeWaitSet = CreateWaitEventSet(TopMemoryContext);
 	socket_pos = AddWaitEventToSet(FeBeWaitSet, WL_SOCKET_WRITEABLE,
    MyProcPort->sock, NULL, NULL);
 	latch_pos = AddWaitEventToSet(FeBeWaitSet, WL_LATCH_SET, PGINVALID_SOCKET,
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 0646f53098..b4e791146c 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3528,7 +3528,7 @@ PgstatCollectorMain(int argc, char *argv[])
 	pgStatDBHash = pgstat_read_statsfiles(InvalidOid, true, true);
 
 	/* Prepare to wait for our latch or data in our socket. */
-	wes = CreateWaitEventSet(CurrentMemoryContext, 3);
+	wes = CreateWaitEventSet(CurrentMemoryContext);
 	AddWaitEventToSet(wes, WL_LATCH_SET, PGINVALID_SOCKET, MyLatch, NULL);
 	AddWaitEventToSet(wes, WL_POSTMASTER_DEATH, PGINVALID_SOCKET, NULL, NULL);
 	AddWaitEventToSet(wes, WL_SOCKET_READABLE, pgStatSock, NULL, NULL);
diff --git a/src/backend/postmaster/syslogger.c b/src/backend/postmaster/syslogger.c
index 25e2131e31..d4a9a8af34 100644
--- a/src/backend/postmaster/syslogger.c
+++ b/src/backend/postmaster/syslogger.c
@@ -311,7 +311,7 @@ SysLoggerMain(int argc, char *argv[])
 	 * syslog pipe, which implies that all other backends have exited
 	 * (including the postmaster).
 	 */
-	wes = CreateWaitEventSet(CurrentMemoryContext, 2);
+	wes = CreateWaitEventSet(CurrentMemoryContext);
 	AddWaitEventToSet(wes, WL_LATCH_SET, PGINVALID_SOCKET, MyLatch, NULL);
 #ifndef WIN32
 	AddWaitEventToSet(wes, WL_SOCKET_READABLE, syslogPipe[0], NULL, NULL);
diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index 5bb609b368..b5974d0f06 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -84,9 +84,32 @@
 #error "no wait set implementation available"
 #endif
 
+/* Each implementation needs an array of a certain object type. */
+#if defined(WAIT_USE_EPOLL)
+typedef struct epoll_event WaitEventImpl;
+#elif defined(WAIT_USE_KQUEUE)
+typedef struct kevent WaitEventImpl;
+#elif defined(WAIT_USE_POLL)
+typedef struct pollfd WaitEventImpl;
+#elif defined(WAIT_USE_WIN32)
+typedef HANDLE WaitEventImpl;
+#endif
+
+/* Windows needs an extra element in events_impl. */
+#if defined(WAIT_USE_WIN32)
+#define EXTRA_EVENT_IMPL 1
+#else
+#define EXTRA_EVENT_IMPL 0
+#endif
+
+/* Can be set lower to exercise reallocation code. */
+#define INITIAL_WAIT_EVENT_SET_SIZE 4
+
 /* typedef in latch.h */
 struct WaitEventSet
 {
+	MemoryContext context;
+
 	int			nevents;		/* number of registered events */
 	int			nevents_space;	/* maximum number of events in this set */
 
@@ -112,26 +135,17 @@ struct WaitEventSet
 	 */
 	bool		exit_on_postmaster_death;
 
+	/*
+	 * Array, of nevents_space 

Re: Synchronizing slots from primary to standby

2022-02-10 Thread Bruce Momjian
On Tue, Feb  8, 2022 at 08:27:32PM +0530, Ashutosh Sharma wrote:
> > Which means that if e.g. the standby_slot_names GUC differs from
> > synchronize_slot_names on the physical replica, the slots synchronized on 
> > the
> > physical replica are not going to be valid.  Or if the primary drops its
> > logical slots.
> >
> >
> > > Should the redo function for the drop replication slot have the capability
> > > to drop it on standby and its subscribers (if any) as well?
> >
> > Slots are not WAL logged (and shouldn't be).
> >
> > I think you pretty much need the recovery conflict handling infrastructure I
> > referenced upthread, which recognized during replay if a record has a 
> > conflict
> > with a slot on a standby.  And then ontop of that you can build something 
> > like
> > this patch.
> >
> 
> OK. Understood, thanks Andres.

I would love to see this feature in PG 15.  Can someone explain its
current status?  Thanks.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: support for CREATE MODULE

2022-02-10 Thread Alvaro Herrera
On 2022-Feb-04, Tom Lane wrote:

> If we invent modules I think they need to work more like extensions
> naming-wise, ie they group objects but have no effect for naming.

I think modules are an orthogonal concept to extensions, and trying to
mix both is messy.

The way I see modules working is as a "logical" grouping of objects --
they provide encapsulated units of functionality.  A module has private
functions, which cannot be called except from other functions in the
same module.  You can abstract them out of the database design, leaving
you with only the exposed functions, the public API.

An extension is a way to distribute or package objects.  An extension
can contain a module, and of course you should be able to use an
extension to distribute a module, or even several modules.  In fact, I
think it should be possible to have several extensions contribute
different objects to the same module.

But things like name resolution rules (search path) are not affected by
how the objects are distributed, whereas the search path is critical in
how you think about the objects in a module.


If modules are just going to be extensions, I see no point.

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/




Re: support for CREATE MODULE

2022-02-10 Thread Bruce Momjian
On Thu, Feb 10, 2022 at 08:53:15AM -0800, Swaha Miller wrote:
> On Fri, Feb 4, 2022 at 3:51 PM Tom Lane  wrote:
> 
> Hm. If the functional requirement is "group objects without needing
> any out-in-the-filesystem infrastructure", then I could see defining
> a module as being exactly like an extension except there's no such
> infrastructure --- and hence no concept of versions, plus pg_dump
> needs to act differently.  That's probably enough semantic difference
> to justify using a separate word, even if we can share a lot of
> code infrastructure.
> 
> Then as a first cut for modules, could we add CREATE MODULE
> syntax which adds an entry to pg_extension like CREATE EXTENSION
> does? And also add a new column to pg_extension to distinguish 
> modules from extensions. 
> 
> The three-part path name resolution for functions would remain the 
> same, nothing would need to change there because of modules.
> 
> Would that be an acceptable direction to go?

Well, that would allow us to have CREATE MODULE syntax, but what
would it do that CREATE SCHEMA does not?

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: List of all* PostgreSQL EXTENSIONs in the world

2022-02-10 Thread Joe Conway

On 2/10/22 15:35, Robert Haas wrote:

On Thu, Feb 10, 2022 at 3:19 PM Joel Jacobson  wrote:

I've compiled a list of all* PostgreSQL EXTENSIONs in the world:

https://gist.github.com/joelonsql/e5aa27f8cc9bd22b8999b7de8aee9d47

*) It's not all, but 1041, compared to the 338 found on PGXN.

Maybe it would be an idea to provide a lightweight solution,
e.g. maintaining a simple curated list of repo urls,
published at postgresql.org or wiki.postgresql.org,
with a simple form allowing missing repos to be suggested for insertion?


I think a list like this is probably not useful without at least a
brief description of each one, and maybe some attempt at
categorization. If I want a PostgreSQL extension to bake tasty
lasagne, I'm not going to go scroll through 1041 entries and hope
something jumps out at me. I'm going to Google "PostgreSQL lasagne
extension" and see if anything promising shows up. But if I had a list
that were organized by category, I might try looking for a relevant
category and then reading the blurbs for each one...


Agreed.

The example I really like is the R project CRAN "Task Views" for their 
packages (currently the CRAN package repository has 18917 available 
packages):


https://cloud.r-project.org/web/views/

Joe
--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development




Re: List of all* PostgreSQL EXTENSIONs in the world

2022-02-10 Thread Robert Haas
On Thu, Feb 10, 2022 at 3:19 PM Joel Jacobson  wrote:
> I've compiled a list of all* PostgreSQL EXTENSIONs in the world:
>
> https://gist.github.com/joelonsql/e5aa27f8cc9bd22b8999b7de8aee9d47
>
> *) It's not all, but 1041, compared to the 338 found on PGXN.
>
> Maybe it would be an idea to provide a lightweight solution,
> e.g. maintaining a simple curated list of repo urls,
> published at postgresql.org or wiki.postgresql.org,
> with a simple form allowing missing repos to be suggested for insertion?

I think a list like this is probably not useful without at least a
brief description of each one, and maybe some attempt at
categorization. If I want a PostgreSQL extension to bake tasty
lasagne, I'm not going to go scroll through 1041 entries and hope
something jumps out at me. I'm going to Google "PostgreSQL lasagne
extension" and see if anything promising shows up. But if I had a list
that were organized by category, I might try looking for a relevant
category and then reading the blurbs for each one...

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: is the base backup protocol used by out-of-core tools?

2022-02-10 Thread Robert Haas
On Wed, Feb 9, 2022 at 3:05 PM Magnus Hagander  wrote:
> Late to the game, but here's another +1.

Not that late, thanks for chiming in.

Looks pretty unanimous, so I have committed the patches.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: wrong fds used for refilenodes after pg_upgrade relfilenode changes

2022-02-10 Thread Robert Haas
On Thu, Feb 10, 2022 at 3:11 PM Thomas Munro  wrote:
> On Fri, Feb 11, 2022 at 7:50 AM Robert Haas  wrote:
> > The main question in my mind is who is going to actually make that
> > happen. It was your idea (I think), Thomas coded it, and my commit
> > made it a live problem. So who's going to get something committed
> > here?
>
> I was about to commit that, because the original Windows problem it
> solved is showing up occasionally in CI failures (that is, it already
> solves a live problem, albeit a different and non-data-corrupting
> one):
>
> https://www.postgresql.org/message-id/CA%2BhUKGJp-m8uAD_wS7%2BdkTgif013SNBSoJujWxvRUzZ1nkoUyA%40mail.gmail.com
>
> It seems like I should go ahead and do that today, and we can study
> further uses for PROCSIGNAL_BARRIER_SMGRRELEASE in follow-on work?

A bird in the hand is worth two in the bush. Unless it's a vicious,
hand-biting bird.

In other words, if you think what you've got is better than what we
have now and won't break anything worse than it is today, +1 for
committing, and more improvements can come when we get enough round
tuits.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Race condition in TransactionIdIsInProgress

2022-02-10 Thread Robert Haas
On Thu, Feb 10, 2022 at 11:46 AM Konstantin Knizhnik  wrote:
> Postgres first records the transaction state in CLOG, then removes the
> transaction from the procarray and finally releases locks.
> But it can happen that the transaction is marked as committed in CLOG,
> the XMAX_COMMITTED bit is set in the modified tuple, but
> TransactionIdIsInProgress still returns true. As a result, the "t_xmin %u is
> uncommitted in tuple (%u,%u) to be updated in table"
> error is reported.

This is exactly why the HeapTupleSatisfiesFOO() functions all test
TransactionIdIsInProgress() first and TransactionIdDidCommit() only if
it returns false. I don't think it's really legal to test
TransactionIdDidCommit() without checking TransactionIdIsInProgress()
first. And I bet that's exactly what this code is doing...
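For reference, the xmin handling in the HeapTupleSatisfiesFOO() family looks
roughly like this (abridged from HeapTupleSatisfiesMVCC(); the ordering is
the point):

	if (TransactionIdIsInProgress(HeapTupleHeaderGetRawXmin(tuple)))
		return false;			/* still running: not visible */
	else if (TransactionIdDidCommit(HeapTupleHeaderGetRawXmin(tuple)))
		SetHintBits(tuple, buffer, HEAP_XMIN_COMMITTED,
					HeapTupleHeaderGetRawXmin(tuple));
	else
	{
		/* it must have aborted or crashed */
		SetHintBits(tuple, buffer, HEAP_XMIN_INVALID, InvalidTransactionId);
		return false;
	}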

-- 
Robert Haas
EDB: http://www.enterprisedb.com




List of all* PostgreSQL EXTENSIONs in the world

2022-02-10 Thread Joel Jacobson
Hi hackers,

I've compiled a list of all* PostgreSQL EXTENSIONs in the world:

https://gist.github.com/joelonsql/e5aa27f8cc9bd22b8999b7de8aee9d47

*) It's not all, but 1041, compared to the 338 found on PGXN.

Maybe it would be an idea to provide a lightweight solution,
e.g. maintaining a simple curated list of repo urls,
published at postgresql.org or wiki.postgresql.org,
with a simple form allowing missing repos to be suggested for insertion?

/Joel


Re: wrong fds used for refilenodes after pg_upgrade relfilenode changes

2022-02-10 Thread Thomas Munro
On Fri, Feb 11, 2022 at 7:50 AM Robert Haas  wrote:
> The main question in my mind is who is going to actually make that
> happen. It was your idea (I think), Thomas coded it, and my commit
> made it a live problem. So who's going to get something committed
> here?

I was about to commit that, because the original Windows problem it
solved is showing up occasionally in CI failures (that is, it already
solves a live problem, albeit a different and non-data-corrupting
one):

https://www.postgresql.org/message-id/CA%2BhUKGJp-m8uAD_wS7%2BdkTgif013SNBSoJujWxvRUzZ1nkoUyA%40mail.gmail.com

It seems like I should go ahead and do that today, and we can study
further uses for PROCSIGNAL_BARRIER_SMGRRELEASE in follow-on work?




Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-02-10 Thread Nathan Bossart
On Thu, Feb 10, 2022 at 09:30:45PM +0530, Bharath Rupireddy wrote:
> In any case, let's remove the editor's lock/state file from those
> comments and have just only "We just log a message if a file doesn't
> fit the pattern". Attached v8 patch with that change.

I've moved this one to ready-for-committer.  I was under the impression
that Andres was firmly against this approach, but you did mention there was
an off-list discussion.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: [PATCH v2] use has_privs_for_role for predefined roles

2022-02-10 Thread Joe Conway

On 2/10/22 14:28, Nathan Bossart wrote:
> On Wed, Feb 09, 2022 at 04:39:11PM -0500, Joe Conway wrote:
>> On 2/9/22 13:13, Nathan Bossart wrote:
>>> I do wonder if users find the differences between predefined roles and role
>>> attributes confusing.  INHERIT doesn't govern role attributes, but it will
>>> govern predefined roles when this patch is applied.  Maybe the role
>>> attribute system should eventually be deprecated in favor of using
>>> predefined roles for everything.  Or perhaps the predefined roles should be
>>> converted to role attributes.
>>
>> Yep, I was suggesting that the latter would have been preferable to me while
>> Robert seemed to prefer the former. Honestly I could be happy with either of
>> those solutions, but as I alluded to that is probably a discussion for the
>> next development cycle since I don't see us doing that big a change in this
>> one.
>
> I agree.  I still think Joshua's proposed patch is a worthwhile improvement
> for v15.

+1

I am planning to get into it in detail this weekend. So far I have
really just ensured it merges cleanly and passes make world.


Joe
--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development




Re: decoupling table and index vacuum

2022-02-10 Thread Peter Geoghegan
On Thu, Feb 10, 2022 at 11:14 AM Robert Haas  wrote:
> Hmm. I think you're vastly overestimating the extent to which it's
> possible to spread out and reschedule the work. I don't know which of
> us is wrong. From my point of view, if VACUUM is going to do a full
> phase 1 heap pass and a full phase 2 heap pass on either side of
> whatever index work it does, there is no way that things are going to
> get that much more dynamic than they are today.

Waiting to vacuum each index allows us to wait until the next VACUUM
operation on the table, giving us more TIDs to remove when we do go to
vacuum one of these large indexes. Making decisions dynamically seems
very promising because it gives us the most flexibility. In principle
the workload might not allow for any of that, but in practice I think
that it will.

> I don't understand what your point is in these two paragraphs. I'm
> just arguing that, if a raw dead tuple count is meaningless because a
> lot of them are going to disappear harmlessly with or without vacuum,
> it's reasonable to try to get around that problem by counting the
> subset of dead tuples where that isn't true. I agree that it's unclear
> how to do that, but that doesn't mean that it can't be done.

VACUUM is a participant in the system -- it sees how physical
relations are affected by the workload, but it also sees how physical
relations are affected by previous VACUUM operations. If it goes to
VACUUM an index on the basis of a relatively small difference (that
might just be noise), and does so systematically and consistently,
that might have unintended consequences. In particular, we might do
the wrong thing, again and again, because we're overinterpreting noise
again and again.

> I have the same concern about this as what I mentioned before: it's
> purely retrospective. Therefore in my mind it's a very reasonable
> choice for a backstop, but a somewhat unsatisfying choice for a
> primary mechanism.

I'm not saying that it's impossible or even unreasonable to do
something based on the current or anticipated state of the index,
exactly. Just that you have to be realistic about how accurate that
model is going to be in practice. In practice it'll be quite noisy,
and that must be accounted for. For example, we could deliberately
coarsen the information, so that only relatively large differences in
apparent-bloatedness are visible to the model.
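
As a strawman (purely illustrative, not from any patch), coarsening could
be as simple as bucketing the estimate so that only large differences
survive:

    /* Collapse a noisy per-index dead-tuple fraction into coarse buckets. */
    static int
    bloat_bucket(double dead_frac)
    {
        if (dead_frac < 0.05)
            return 0;           /* below the noise floor */
        if (dead_frac < 0.20)
            return 1;
        if (dead_frac < 0.50)
            return 2;
        return 3;               /* unambiguously bloated */
    }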

The other thing is that VACUUM itself cannot be expected to operate
with all that much precision, just because of how it works at a high
level. Any quantitative measure will only be meaningful as a way of
prioritizing work. Which is going to be far easier by making the
behavior dynamic, and continually reassessing. Once a relatively large
difference among two indexes first emerges, we can be relatively
confident about what to do. But smaller differences are likely just
noise.

-- 
Peter Geoghegan




Re: Race condition in TransactionIdIsInProgress

2022-02-10 Thread Alvaro Herrera
On 2022-Feb-10, Andrey Borodin wrote:

> > On 10 Feb 2022, at 21:46, Konstantin Knizhnik  wrote:
> > As a result "t_xmin %u is uncommitted in tuple (%u,%u) to be updated in table"
> > error is reported.
> 
> Wow, cool, that seems to be the solution to one more mysterious
> corruption thread - “reporting TID/table with corruption error” [0].
> Thank you!

Ooh, great find.  I'll have a look at this soon.  Thanks for CCing me.

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/
"I love the Postgres community. It's all about doing things _properly_. :-)"
(David Garamond)




Re: [PATCH v2] use has_privs_for_role for predefined roles

2022-02-10 Thread Nathan Bossart
On Wed, Feb 09, 2022 at 04:39:11PM -0500, Joe Conway wrote:
> On 2/9/22 13:13, Nathan Bossart wrote:
>> I do wonder if users find the differences between predefined roles and role
>> attributes confusing.  INHERIT doesn't govern role attributes, but it will
>> govern predefined roles when this patch is applied.  Maybe the role
>> attribute system should eventually be deprecated in favor of using
>> predefined roles for everything.  Or perhaps the predefined roles should be
>> converted to role attributes.
> 
> Yep, I was suggesting that the latter would have been preferable to me while
> Robert seemed to prefer the former. Honestly I could be happy with either of
> those solutions, but as I alluded to that is probably a discussion for the
> next development cycle since I don't see us doing that big a change in this
> one.

I agree.  I still think Joshua's proposed patch is a worthwhile improvement
for v15.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: make MaxBackends available in _PG_init

2022-02-10 Thread Nathan Bossart
On Thu, Feb 10, 2022 at 10:47:39AM +0900, Michael Paquier wrote:
> On Wed, Feb 09, 2022 at 09:53:38AM -0800, Nathan Bossart wrote:
>> This is fine with me as well.  I only left these out because the extra
>> variable felt unnecessary to me for these functions.
> 
> Okay, done, then.
> 
>> While you are at it, would you mind fixing the misspelling of
>> OldestVisibleMXactId in multixact.c (as per the attached)?
> 
> Indeed.  Fixed as well.

Thanks!

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: decoupling table and index vacuum

2022-02-10 Thread Peter Geoghegan
On Thu, Feb 10, 2022 at 11:16 AM Robert Haas  wrote:
> On Thu, Feb 10, 2022 at 3:10 AM Dilip Kumar  wrote:
> > Actually I was not worried about the scan getting slow.  What I was
> > worried about is that if we keep ignoring the dead tuples for a long time,
> > then in the worst case we could have a huge number of dead tuples in the
> > index, maybe 80% to 90%, and then suddenly get a lot of insertions
> > for keys which cannot use bottom-up deletion (due to the key
> > range).  So now we have a lot of pages which have only dead tuples, but
> > we will still allocate new pages because we ignored the dead tuple %
> > and did not vacuum for a long time.
>
> It seems like a reasonable concern to me ... and I think it's somewhat
> related to my comments about trying to distinguish which dead tuples
> matter vs. which don't.

It's definitely a reasonable concern. But once you find yourself in
this situation, *every* index will need to be vacuumed anyway, pretty
much as soon as possible. There will be many LP_DEAD items in the
heap, which will be enough to force index vacuuming of all indexes.

--
Peter Geoghegan




Re: decoupling table and index vacuum

2022-02-10 Thread Robert Haas
On Thu, Feb 10, 2022 at 3:10 AM Dilip Kumar  wrote:
> Actually I was not worried about the scan getting slow.  What I was
> worried about is that if we keep ignoring the dead tuples for a long time,
> then in the worst case we could have a huge number of dead tuples in the
> index, maybe 80% to 90%, and then suddenly get a lot of insertions
> for keys which cannot use bottom-up deletion (due to the key
> range).  So now we have a lot of pages which have only dead tuples, but
> we will still allocate new pages because we ignored the dead tuple %
> and did not vacuum for a long time.

It seems like a reasonable concern to me ... and I think it's somewhat
related to my comments about trying to distinguish which dead tuples
matter vs. which don't.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: decoupling table and index vacuum

2022-02-10 Thread Robert Haas
On Wed, Feb 9, 2022 at 6:18 PM Peter Geoghegan  wrote:
> You seem to be vastly underestimating the value in being able to
> spread out and reschedule the work, and manage costs more generally.

Hmm. I think you're vastly overestimating the extent to which it's
possible to spread out and reschedule the work. I don't know which of
us is wrong. From my point of view, if VACUUM is going to do a full
phase 1 heap pass and a full phase 2 heap pass on either side of
whatever index work it does, there is no way that things are going to
get that much more dynamic than they are today. And even if we didn't
do that, in order to make any progress setting LP_DEAD pointers to
LP_UNUSED, you have to vacuum the entire index, which might be BIG. It
would be great to have a lot of granularity here but it doesn't seem
achievable.

> > I was thinking along the lines of trying to figure out either a more
> > reliable count of dead tuples in the index, subtracting out whatever
> > we save by kill_prior_tuple and bottom-up vacuuming; or else maybe a
> > count of the subset of dead tuples that are likely not to get
> > opportunistically pruned in one way or another, if there's some way to
> > guess that.
>
> I don't know how to build something like that, since that works by
> understanding what's working, not by noticing that some existing
> strategy plainly isn't working. The only positive information that I have
> confidence in is the extreme case where you have zero index growth.
> Which is certainly possible, but perhaps not that interesting with a
> real workload.
>
> There are emergent behaviors with bottom-up deletion. Purely useful
> behaviors, as far as I know, but still very hard to precisely nail
> down. For example, Victor Yegorov came up with an adversarial
> benchmark [1] that showed that the technique dealt with index bloat
> from queue-like inserts and deletes that recycled the same distinct
> key values over time, since they happened to be mixed with non-hot
> updates. It dealt very well with that, even though *I had no clue*
> that it would work *at all*, and might have even incorrectly predicted
> the opposite if Victor had asked about it in advance.

I don't understand what your point is in these two paragraphs. I'm
just arguing that, if a raw dead tuple count is meaningless because a
lot of them are going to disappear harmlessly with or without vacuum,
it's reasonable to try to get around that problem by counting the
subset of dead tuples where that isn't true. I agree that it's unclear
how to do that, but that doesn't mean that it can't be done.

> > I realize I'm
> > hand-waving, but if the property is a property of the heap rather than
> > the index, how will different indexes get different treatment?
>
> Maybe by making the primary key growth an indicator of what is
> reasonable for the other indexes (or other B-Tree indexes) -- it has a
> natural tendency to be the least bloated possible index. If you have
> something like a GiST index, or if you have a B-Tree index that
> constantly gets non-HOT updates that logically modify an indexed
> column, then it should become reasonably obvious. Maybe there'd be
> some kind of feedback behavior to lock in "bloat prone index" for a
> time.

I have the same concern about this as what I mentioned before: it's
purely retrospective. Therefore in my mind it's a very reasonable
choice for a backstop, but a somewhat unsatisfying choice for a
primary mechanism.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [RFC] building postgres with meson

2022-02-10 Thread Andrew Dunstan


On 2/10/22 12:52, Andres Freund wrote:
> Hi,
>
> On 2022-02-10 12:00:16 -0500, Andrew Dunstan wrote:
>> Any objection to my moving ahead with this?
> No. I don't yet understand what the transforms issue is and whether it can be
avoided, but clearly it's an improvement to be able to build with builtin
> msys tools vs not...



OK, thanks, done.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: can we add subscription TAP test option "vcregress subscriptioncheck" for MSVC builds?

2022-02-10 Thread Justin Pryzby
On Thu, Feb 10, 2022 at 10:21:08PM +0530, Bharath Rupireddy wrote:
> Thanks for the comments. Above looks good to me, changed that way, PSA v2.

I spy a typo: subcription

-- 
Justin




Re: wrong fds used for refilenodes after pg_upgrade relfilenode changes

2022-02-10 Thread Robert Haas
On Wed, Feb 9, 2022 at 5:00 PM Andres Freund  wrote:
> The problem starts with
>
> commit aa01051418f10afbdfa781b8dc109615ca785ff9
> Author: Robert Haas 
> Date:   2022-01-24 14:23:15 -0500
>
> pg_upgrade: Preserve database OIDs.

Well, that's sad.

> I think the most realistic way to address this is the mechanism is to use the
> mechanism from
> https://postgr.es/m/CA%2BhUKGLdemy2gBm80kz20GTe6hNVwoErE8KwcJk6-U56oStjtg%40mail.gmail.com

I agree. While I feel sort of bad about missing this issue in review,
I also feel like it's pretty surprising that there isn't something
plugging this hole already. It feels unexpected that our FD management
layer might hand you an FD that references the previous file that had
the same name instead of the one that exists right now, and I don't
think it's too much of a stretch of the imagination to suppose that
this won't be the last problem we have if we don't fix it. I also
agree that the mechanism proposed in that thread is the best way to
fix the problem. The sinval mechanism won't work, since there's
nothing to ensure that the invalidation messages are processed in a
sufficient timely fashion, and there's no obvious way to get around
that problem. But the ProcSignalBarrier mechanism is not intertwined
with heavyweight locking in the way that sinval is, so it should be a
good fit for this problem.
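
For reference, the call pattern would presumably be the one the
ProcSignalBarrier machinery already provides, using the barrier type from
Thomas's patch (assuming it lands as PROCSIGNAL_BARRIER_SMGRRELEASE; this
is a sketch, not the committed code):

    /* e.g. in dropdb(), before the file names can be reused: */
    WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_SMGRRELEASE));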

The main question in my mind is who is going to actually make that
happen. It was your idea (I think), Thomas coded it, and my commit
made it a live problem. So who's going to get something committed
here?

> If we add such barriers to dropdb(), XLOG_DBASE_DROP redo, DropTableSpace(),
> XLOG_TBLSPC_DROP redo, I think all the "new" ways to hit this are foreclosed.

What about movedb()?

> We perhaps could avoid all relfilenode recycling, issues on the fd level, by
> adding a barrier whenever relfilenode allocation wraps around. But I'm not
> sure how easy that is to detect.

I definitely think it would be nice to cover this issue not only for
database OIDs and tablespace OIDs but also for relfilenode OIDs.
Otherwise we're right on the cusp of having the same bug in that case.
pg_upgrade doesn't drop and recreate tables the way it does databases,
but if somebody made it do that in the future, would they think about
this? I bet not.

Dilip's been working on your idea of making relfilenode into a 56-bit
quantity. That would make the problem of detecting wraparound go away.
In the meantime, I suppose we could think of trying to do something
here:

            /* wraparound, or first post-initdb assignment, in normal mode */
            ShmemVariableCache->nextOid = FirstNormalObjectId;
            ShmemVariableCache->oidCount = 0;

That's a bit sketch, because this is the OID counter, which isn't only
used for relfilenodes. I'm not sure whether there would be problems
waiting for a barrier here - I think we might be holding some lwlocks
in some code paths.

> I think we might actually be able to remove the checkpoint from dropdb() by
> using barriers?

That looks like it might work. We'd need to be sure that the
checkpointer would both close any open FDs and also absorb fsync
requests before acknowledging the barrier.

> I think we need to add some debugging code to detect "fd to deleted file of
> same name" style problems. Especially during regression tests they're quite
> hard to notice, because most of the contents read/written are identical across
> recycling.

Maybe. If we just plug the holes here, I am not sure we need permanent
instrumentation. But I'm also not violently opposed to it.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Per-table storage parameters for TableAM/IndexAM extensions

2022-02-10 Thread Jeff Davis
On Tue, 2022-01-18 at 22:44 +0530, Sadhuprasad Patro wrote:
> As of now, I have fixed the comments from Dilip & Rushabh and have
> done some more changes after internal testing and review. Please find
> the latest patch attached.

Hi,

Thank you for working on this! Some questions/comments:

At a high level, it seems there are some options that are common to all
tables, regardless of the AM. For instance, the vacuum/autovacuum
options. (Even if the AM doesn't require vacuum, then it needs to at
least be able to communicate that somehow.) I think parallel_workers
and user_catalog_table also fit into this category. That means we need
all of StdRdOptions to be the same, with the possible exception of
toast_tuple_target and/or fillfactor.

The current patch just leaves it up to the AM to return a bytea that
can be cast to StdRdOptions, which seems like a fragile API.

That makes me think that what we really want is to have *extra* options
for a table AM, not an entirely custom set. Do you agree?

If so, I suggest you refactor so that if validation doesn't recognize a
parameter, it calls a table AM method to validate it, and lets it in if
validation succeeds. That way all the stuff around StdRdOptions is
unchanged. When the table AM needs the parameter value, it can parse
pg_class.reloptions for itself and save it in rd_amcache.
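
To illustrate the shape I mean (all names here are invented for
illustration, not an existing API):

    /*
     * Hypothetical table AM callback: core reloption validation calls
     * this for any parameter it does not itself recognize, and lets the
     * option through only if the AM accepts it.
     */
    static bool
    myam_validate_reloption(const char *name, const char *value)
    {
        if (strcmp(name, "myam_compress_level") == 0)
            return myam_check_level(value);     /* hypothetical helper */

        return false;           /* unknown: core raises the usual error */
    }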

Regards,
Jeff Davis






Re: [RFC] building postgres with meson

2022-02-10 Thread Andres Freund
Hi,

On 2022-02-10 12:00:16 -0500, Andrew Dunstan wrote:
> Any objection to my moving ahead with this?

No. I don't yet understand what the transforms issue is and whether it can be
avoided, but clearly it's an improvement to be able to build with builtin
msys tools vs not...

Greetings,

Andres Freund




Re: Race condition in TransactionIdIsInProgress

2022-02-10 Thread Andrey Borodin



> On 10 Feb 2022, at 21:46, Konstantin Knizhnik  wrote:
> As a result "t_xmin %u is uncommitted in tuple (%u,%u) to be updated in table"
> error is reported.

Wow, cool, that seems to be the solution to one more mysterious corruption
thread - “reporting TID/table with corruption error” [0]. Thank you!

Now it’s obvious that this is not real data corruption. Maybe let’s remove
the corruption error code from the error? I have literally been woken at
night by this code a few times in January.

And do you have a plan for how to fix the actual issue?


Best regards, Andrey Borodin.

[0] 
https://www.postgresql.org/message-id/flat/202108191637.oqyzrdtnheir%40alvherre.pgsql





Re: [RFC] building postgres with meson

2022-02-10 Thread Andrew Dunstan


On 2/6/22 15:57, Andrew Dunstan wrote:
> On 2/6/22 13:39, Andres Freund wrote:
>> Hi,
>>
>> On 2022-02-06 12:06:41 -0500, Andrew Dunstan wrote:
>>> Here's a patch. I've tested the perl piece on master and it works fine.
>>> It applies cleanly down to 9.4, which is before we got transform modules
>>> (9.5) which fail if we just omit doing this platform-specific piece.
>> Given https://postgr.es/m/34e972bc-6e75-0754-9e6d-cde2518773a1%40dunslane.net
>> wouldn't it make sense to simply remove the pexports/gendef logic instead of
>> moving to gendef?
>>
> I haven't found a way to fix the transform builds if we do that. So
> let's leave that as a separate exercise unless you have a solution for
> that - this patch is really trivial.
>
>

Any objection to my moving ahead with this? My current workaround is this:


cat > /usr/bin/pexports 

Re: support for CREATE MODULE

2022-02-10 Thread Swaha Miller
On Fri, Feb 4, 2022 at 3:51 PM Tom Lane  wrote:

> Hm. If the functional requirement is "group objects without needing
> any out-in-the-filesystem infrastructure", then I could see defining
> a module as being exactly like an extension except there's no such
> infrastructure --- and hence no concept of versions, plus pg_dump
> needs to act differently.  That's probably enough semantic difference
> to justify using a separate word, even if we can share a lot of
> code infrastructure.
>

Then as a first cut for modules, could we add CREATE MODULE
syntax which adds an entry to pg_extension like CREATE EXTENSION
does? And also add a new column to pg_extension to distinguish
modules from extensions.

The three-part path name resolution for functions would remain the
same, nothing would need to change there because of modules.

Would that be an acceptable direction to go?

Swaha


Re: can we add subscription TAP test option "vcregress subscriptioncheck" for MSVC builds?

2022-02-10 Thread Bharath Rupireddy
On Mon, Dec 13, 2021 at 5:09 AM Juan José Santamaría Flecha
 wrote:
>
>
> On Thu, Oct 21, 2021 at 7:21 AM Bharath Rupireddy 
>  wrote:
>>
>>
>> Here's the documentation v1 patch that I've come up with. Please review it.
>>
> There's a typo:
> +   To run an arbitrary TAP test set, run vcregress taptest
> +   comamnd. For example, use the following command for running subcription 
> TAP
> +   tests:
> s/comamnd/command/
>
> But also the wording, I like better what vcregress prints as help, so 
> something like:
> +   You can use vcregress taptest TEST_DIR to run an
> +   arbitrary TAP test set, where TEST_DIR is a required argument pointing to
> +   the directory where the tests reside. For example, use the following
> +   command for running the subcription TAP tests:

Thanks for the comments. Above looks good to me, changed that way, PSA v2.

Regards,
Bharath Rupireddy.


v2-0001-add-vcregress-taptest-and-mention-about-Windows-o.patch
Description: Binary data


Race condition in TransactionIdIsInProgress

2022-02-10 Thread Konstantin Knizhnik

Hi hackers,

Postgres first records the transaction state in CLOG, then removes the
transaction from the procarray and finally releases locks.
But it can happen that a transaction is marked as committed in CLOG and
the XMAX_COMMITTED bit is set in the modified tuple, but
TransactionIdIsInProgress still returns true. As a result "t_xmin %u is
uncommitted in tuple (%u,%u) to be updated in table"

error is reported.

TransactionIdIsInProgress first checks the cached XID status:

    /*
     * We may have just checked the status of this transaction, so if it is
     * already known to be completed, we can fall out without any access to
     * shared memory.
     */
    if (TransactionIdIsKnownCompleted(xid))
    {
        xc_by_known_xact_inc();
        return false;
    }

This status cache is updated by TransactionLogFetch. So once the
transaction status has been fetched from CLOG, subsequent invocations of
TransactionIdIsKnownCompleted will return true even if the transaction
has not yet been removed from the procarray.

The following stack illustrates how it can happen:

#6  0x55ab7758bbaf in TransactionLogFetch (transactionId=4482)
    at /home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/transam/transam.c:88
#7  TransactionLogFetch (transactionId=4482)
    at /home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/transam/transam.c:52
#8  0x55ab7758bc7d in TransactionIdDidAbort (transactionId=transactionId@entry=4482)
    at /home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/transam/transam.c:186
#9  0x55ab77812a43 in TransactionIdIsInProgress (xid=4482)
    at /home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/storage/ipc/procarray.c:1587
#10 0x55ab7753de79 in HeapTupleSatisfiesVacuumHorizon (htup=htup@entry=0x7ffe562cec50, buffer=buffer@entry=1817,
    dead_after=dead_after@entry=0x7ffe562ceb14)
    at /home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/heap/heapam_visibility.c:1281
#11 0x55ab7753e2c5 in HeapTupleSatisfiesNonVacuumable (buffer=1817, snapshot=0x7ffe562cec70, htup=0x7ffe562cec50)
    at /home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/heap/heapam_visibility.c:1449
#12 HeapTupleSatisfiesVisibility (tup=tup@entry=0x7ffe562cec50, snapshot=snapshot@entry=0x7ffe562cec70, buffer=buffer@entry=1817)
    at /home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/heap/heapam_visibility.c:1804
#13 0x55ab7752f3b5 in heap_hot_search_buffer (tid=tid@entry=0x7ffe562cec2a, relation=relation@entry=0x7fa57fbf17c8, buffer=buffer@entry=1817,
    snapshot=snapshot@entry=0x7ffe562cec70, heapTuple=heapTuple@entry=0x7ffe562cec50, all_dead=all_dead@entry=0x0, first_call=true)
    at /home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/heap/heapam.c:1802
#14 0x55ab775370d8 in heap_index_delete_tuples (rel=0x7fa57fbf17c8, delstate=)
    at /home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/heap/heapam.c:7480
#15 0x55ab7755767a in table_index_delete_tuples (delstate=0x7ffe562d0160, rel=0x2)
    at /home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/include/access/tableam.h:1327
#16 _bt_delitems_delete_check (rel=rel@entry=0x7fa57fbf4518, buf=buf@entry=1916, heapRel=heapRel@entry=0x7fa57fbf17c8,
    delstate=delstate@entry=0x7ffe562d0160)
    at /home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/nbtree/nbtpage.c:1541
#17 0x55ab7754e475 in _bt_bottomupdel_pass (rel=rel@entry=0x7fa57fbf4518, buf=buf@entry=1916, heapRel=heapRel@entry=0x7fa57fbf17c8, newitemsz=20)
    at /home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/nbtree/nbtdedup.c:406
#18 0x55ab7755010a in _bt_delete_or_dedup_one_page (rel=0x7fa57fbf4518, heapRel=0x7fa57fbf17c8, insertstate=0x7ffe562d0640,
    simpleonly=, checkingunique=, uniquedup=, indexUnchanged=true)
    at /home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/nbtree/nbtinsert.c:2763
#19 0x55ab775548c3 in _bt_findinsertloc (heapRel=0x7fa57fbf17c8, stack=0x55ab7a1cde08, indexUnchanged=true, checkingunique=true,
    insertstate=0x7ffe562d0640, rel=0x7fa57fbf4518)
    at /home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/nbtree/nbtinsert.c:904
#20 _bt_doinsert (rel=rel@entry=0x7fa57fbf4518, itup=itup@entry=0x55ab7a1cde30, checkUnique=checkUnique@entry=UNIQUE_CHECK_YES,
    indexUnchanged=indexUnchanged@entry=true, heapRel=heapRel@entry=0x7fa57fbf17c8)
    at /home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/nbtree/nbtinsert.c:255
#21 0x55ab7755a931 in btinsert (rel=0x7fa57fbf4518, values=, isnull=, ht_ctid=0x55ab7a1

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

2022-02-10 Thread Peter Geoghegan
On Sun, Feb 6, 2022 at 7:45 AM Robert Haas  wrote:
> For what it's worth, I am generally in favor of having something like
> this in PostgreSQL. I think it's wrong of us to continue assuming that
> everyone has command-line access. Even when that's true, it's not
> necessarily convenient. If you choose to use a relational database,
> you may be the sort of person who likes SQL. And if you are, you may
> want to have the database tell you what's going on via SQL rather than
> command-line tools or operating system utilities. Imagine if we didn't
> have pg_stat_activity and you had to get that information by running a
> separate binary. Would anyone like that? Why is this case any
> different?

+1. An SQL interface is significantly easier to work with. Especially
because it can use the built-in LSN type, pg_lsn.

I don't find the slippery slope argument convincing. There aren't that
many other things that are like pg_waldump, but haven't already been
exposed via an SQL interface. Offhand, I can't think of any.

-- 
Peter Geoghegan




Re: faulty link

2022-02-10 Thread Dave Page
On Thu, 10 Feb 2022 at 15:53, Dagfinn Ilmari Mannsåker 
wrote:

> Tom Lane  writes:
>
> > Josef Šimánek writes:
> >> On Thu, Feb 10, 2022 at 15:35, Erik Rijkers wrote:
> >>> The provided link
> >>> https://www.postgresql.org/docs/release/
> >>> leads to
> >>> https://www.postgresql.org/docs/release/14.2/
> >>> which gives 'Not Found' for me (Netherlands)
> >
> >> Thinking about that again, the 14.2 release just happened. Could it be
> >> just a matter of propagating new release info to mirrors?
> >
> > The link works for me, too (USA).  Stale cache seems like a reasonable
> > explanation for the OP's problem --- maybe clearing browser cache
> > would help?
>
> I'm getting a 404 as well from London. After trying multiple times with
> curl I did get one 200 response, but it's mostly 404s.
>
> It looks like some of the mirrors have it, but not all:
>
> $ for h in $(dig +short -tA www.mirrors.postgresql.org); do echo -n "$h: ";
> curl -i -k -s -HHost:www.postgresql.org "https://$h/docs/release/14.2/" | grep ^HTTP; done
>  72.32.157.230: HTTP/2 200
>  87.238.57.232: HTTP/2 404
>  217.196.149.50: HTTP/2 200
>
> $ for h in $(dig +short -tAAAA www.mirrors.postgresql.org); do echo -n "$h: ";
> curl -i -k -s -HHost:www.postgresql.org "https://[$h]/docs/release/14.2/" | grep ^HTTP; done
>  2001:4800:3e1:1::230: HTTP/2 200
>  2a02:c0:301:0:::32: HTTP/2 404
>  2a02:16a8:dc51::50: HTTP/2 200
>

Despite the name, they're not actually mirrors. They're varnish caches. By
the looks of it one of them cached a 404 (probably someone tried to access
the new page before it really did exist). I've purged /docs/release now,
and everything is returning 200.

-- 
Dave Page
Blog: https://pgsnake.blogspot.com
Twitter: @pgsnake

EDB: https://www.enterprisedb.com


Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-02-10 Thread Bharath Rupireddy
On Fri, Feb 4, 2022 at 5:33 AM Nathan Bossart  wrote:
> > Thanks. I get it. For syncing map files, we don't want to tolerate any
> > errors, whereas removal of the old map files (lesser than cutoff LSN)
> > can be tolerated in CheckPointLogicalRewriteHeap.
>
> LGTM.  Andres noted upthread [0] that the comment above sscanf() about
> skipping editors' lock files might not be accurate.  I don't think it's a
> huge problem if sscanf() matches those files, but perhaps we can improve
> the comment.
>
> [0] https://postgr.es/m/20220120194618.hmfd4kxkng2cgryh%40alap3.anarazel.de

Andres' comment from [0]:

> An editor's lock file that starts with map- would presumably be the whole
> filename plus an additional file-ending. But this check won't catch those.

Agreed. The sscanf checks can't detect files named "whole filename
plus an additional file-ending". I just checked with a vi editor lock
file, .0-14ED3B8.snap.swp [1]; the generated log is [2]. I'm not
sure exactly which editor would create a lockfile like "whole filename
plus an additional file-ending".
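
A quick illustration of why such names sail through (simplified format
string, not the real LOGICAL_REWRITE_FORMAT):

    unsigned int a, b;

    /* sscanf() stops once the format is satisfied and silently ignores
     * any trailing characters, so both of these calls return 2: */
    sscanf("map-1A2B-3C4D",     "map-%X-%X", &a, &b);
    sscanf("map-1A2B-3C4D.swp", "map-%X-%X", &a, &b);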

In any case, let's remove the editor's lock/state file from those
comments and have just only "We just log a message if a file doesn't
fit the pattern". Attached v8 patch with that change.

[1]
-rw--- 1 bharath bharath 12288 Feb 10 15:48 .0-14ED3B8.snap.swp
-rw--- 1 bharath bharath   128 Feb 10 15:48 0-14ED518.snap
-rw--- 1 bharath bharath   128 Feb 10 15:49 0-14ED518.snap.lockfile
-rw--- 1 bharath bharath   128 Feb 10 15:49 0-14ED550.snap
-rw--- 1 bharath bharath   128 Feb 10 15:49 0-14ED600.snap

[2]
2022-02-10 15:48:47.938 UTC [1121678] LOG:  could not parse file name
"pg_logical/snapshots/.0-14ED3B8.snap.swp"

Regards,
Bharath Rupireddy.


v8-0001-Replace-ReadDir-with-ReadDirExtended.patch
Description: Binary data


Re: Add checkpoint and redo LSN to LogCheckpointEnd log message

2022-02-10 Thread Fujii Masao




On 2022/02/09 11:52, Kyotaro Horiguchi wrote:
> 0001: The fix of CreateRestartPoint

This patch might be ok for the master branch. But since concurrent checkpoint
and restartpoint can happen in v14 or before, we would need another patch based
on that assumption for the backport. How about fixing the bug in all the branches
first, then applying this patch in the master to improve the code?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: faulty link

2022-02-10 Thread Dagfinn Ilmari Mannsåker
Tom Lane  writes:

> Josef Šimánek writes:
>> On Thu, Feb 10, 2022 at 15:35, Erik Rijkers wrote:
>>> The provided link
>>> https://www.postgresql.org/docs/release/
>>> leads to
>>> https://www.postgresql.org/docs/release/14.2/
>>> which gives 'Not Found' for me (Netherlands)
>
>> Thinking about that again, the 14.2 release just happened. Could it be
>> just a matter of propagating new release info to mirrors?
>
> The link works for me, too (USA).  Stale cache seems like a reasonable
> explanation for the OP's problem --- maybe clearing browser cache
> would help?

I'm getting a 404 as well from London. After trying multiple times with
curl I did get one 200 response, but it's mostly 404s.

It looks like some of the mirrors have it, but not all:

$ for h in $(dig +short -tA www.mirrors.postgresql.org); do echo -n "$h: ";
curl -i -k -s -HHost:www.postgresql.org "https://$h/docs/release/14.2/" | grep ^HTTP; done
 72.32.157.230: HTTP/2 200
 87.238.57.232: HTTP/2 404
 217.196.149.50: HTTP/2 200

$ for h in $(dig +short -tAAAA www.mirrors.postgresql.org); do echo -n "$h: ";
curl -i -k -s -HHost:www.postgresql.org "https://[$h]/docs/release/14.2/" | grep ^HTTP; done
 2001:4800:3e1:1::230: HTTP/2 200
 2a02:c0:301:0:::32: HTTP/2 404
 2a02:16a8:dc51::50: HTTP/2 200
 
>   regards, tom lane

- ilmari




Re: Logging in LockBufferForCleanup()

2022-02-10 Thread Fujii Masao




On 2022/02/10 20:28, Masahiko Sawada wrote:
> While I understand that these are theoretically valid concerns, it’s
> unclear to me how much the current code leads to bad effects in
> practice. There are other low-level code paths that call palloc()
> while holding LWLocks, and I’m not sure that the overheads result in
> visible negative performance impact, particularly because the call
> to LockBufferForCleanup() is likely to involve a wait in the first
> place.

I'm also not sure about that. palloc() for the PS display, the copy of the
process title, GetCurrentTimestamp() and ereport() of the recovery conflict
happen when we fail to acquire the lock. In that case we also call
ResolveRecoveryConflictWithBufferPin() and wait to be signaled there. So,
compared to the wait time or overhead caused in
ResolveRecoveryConflictWithBufferPin(), ISTM that the overhead caused by
them would be relatively small. Also, ereport() of the recovery conflict
can happen only when the log_recovery_conflict_waits parameter is enabled
and the deadlock timeout elapses during the lock wait.

> BTW I think calling LockBufferForCleanup() in a critical
> section is a bad idea for sure since it becomes uninterruptible.

Yes. And as far as I tested, LockBufferForCleanup() is currently not
called in a critical section. Right?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-02-10 Thread Robert Haas
On Thu, Feb 10, 2022 at 2:52 AM Julien Rouhaud  wrote:
> Those extra WALs will also impact backups and replication.  You could have
> fancy hardware, a read-mostly workload and the need to replicate over a slow
> WAN, and in that case the 10GB could be much more problematic.

True, I guess, but how bad does your WAN have to be for that to be an
issue? On a 1 gigabit/second link, that's a little over 2 minutes of
transfer time. That's not nothing, but it's not extreme, either,
especially because there's no sense in querying an empty database.
You're going to have to put some stuff in that database before you can
do anything meaningful with it, and that's going to have to be
replicated with or without this feature.

I am not saying it couldn't be a problem, and that's why I'm endorsing
making the behavior optional. But I think that it's a niche scenario.
You need a bigger-than-normal template database, a slow WAN link, AND
you need the amount of data loaded into the databases you create from
the template to be small enough to make the cost of logging the
template pages material. If you create a 10GB database from a template
and then load 200GB of data into it, the WAL-logging overhead of
creating the template is only 5%.

I won't really be surprised if we hear that someone has a 10GB
template database and likes to make a ton of copies and only change
500 rows in each one while replicating the whole thing over a slow
WAN. That can definitely happen, and I'm sure whoever is doing that
has reasons for it which they consider good and sufficient. However, I
don't think there are likely to be a ton of people doing stuff like
that - just a few.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: faulty link

2022-02-10 Thread Tom Lane
Josef Šimánek writes:
> On Thu, Feb 10, 2022 at 15:35, Erik Rijkers wrote:
>> The provided link
>> https://www.postgresql.org/docs/release/
>> leads to
>> https://www.postgresql.org/docs/release/14.2/
>> which gives 'Not Found' for me (Netherlands)

> Thinking about that again, the 14.2 release just happened. Could it be
> just a matter of propagating new release info to mirrors?

The link works for me, too (USA).  Stale cache seems like a reasonable
explanation for the OP's problem --- maybe clearing browser cache
would help?

regards, tom lane




Re: CREATE INDEX CONCURRENTLY on partitioned index

2022-02-10 Thread Alexander Pyhalov

Justin Pryzby wrote on 2021-02-26 21:20:
> On Mon, Feb 15, 2021 at 10:07:05PM +0300, Anastasia Lubennikova wrote:
>> 5) Speaking of documentation, I think we need to add a paragraph about CIC
>> on partitioned indexes which will explain that invalid indexes may appear
>> and what user should do to fix them.
>
> I'm not sure about that - it's already documented in general, for
> nonpartitioned indexes.

Hi.

I've rebased the patches and tried to fix the issues I've seen. I've fixed
a reference after table_close() in the first patch (it can be seen while
building with CPPFLAGS='-DRELCACHE_FORCE_RELEASE'). It seems childidxs
shouldn't live in ind_context, so I moved it out. I updated the
documentation to state that CIC can leave invalid or valid indexes on
partitions if it doesn't succeed. I also merged the old
0002-f-progress-reporting.patch and
0003-WIP-Add-SKIPVALID-flag-for-more-integration.patch. It seems the
first one didn't really fix the issue with progress reporting (as
ReindexRelationConcurrently() uses pgstat_progress_start_command(),
which seems to mess up the effect of this command in DefineIndex()).
Note that the third patch completely removes attempts to report CREATE
INDEX progress correctly (reindex reports about individual commands, not
the whole CREATE INDEX).

So I've added 0003-Try-to-fix-create-index-progress-report.patch, which
tries to fix the mess with the CREATE INDEX progress report. It introduces
a new flag, REINDEXOPT_REPORT_CREATE_PART, in ReindexParams->options. Given
this flag, ReindexRelationConcurrently() will not report individual
operations' start/stop, but ReindexMultipleInternal() will report about
reindexed partitions. To make the issue worse, some partitions can be
handled in ReindexPartitions(), and ReindexMultipleInternal() should know
how many, to correctly update the PROGRESS_CREATEIDX_PARTITIONS_DONE
counter. It also needs the IndexOid to correctly generate the
pg_stat_progress_create_index record, so we pass these parameters to it.

--
Best regards,
Alexander Pyhalov,
Postgres Professional

From eaad7c3ed2fda93fdb91aea60294f60489444bf7 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Mon, 7 Feb 2022 10:28:42 +0300
Subject: [PATCH 1/4] Allow CREATE INDEX CONCURRENTLY on partitioned table

0001-Allow-CREATE-INDEX-CONCURRENTLY-on-partitioned-table.patch from
https://www.postgresql.org/message-id/20210226182019.gu20...@telsasoft.com

Fixes:
  - rel was used after table_close();
  - it seems childidxs shouldn't live in ind_context;
  - updated doc.
---
 doc/src/sgml/ref/create_index.sgml |  14 +--
 src/backend/commands/indexcmds.c   | 151 ++---
 src/test/regress/expected/indexing.out |  60 +-
 src/test/regress/sql/indexing.sql  |  18 ++-
 4 files changed, 186 insertions(+), 57 deletions(-)

diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index 91eaaabc90f..ffa98692430 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -641,7 +641,10 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] 
 If a problem arises while scanning the table, such as a deadlock or a
 uniqueness violation in a unique index, the CREATE INDEX
-command will fail but leave behind an invalid index. This index
+command will fail but leave behind an invalid index.
+If this happens while creating index concurrently on a partitioned
+table, the command can also leave behind valid or
+invalid indexes on table partitions.  The invalid index
 will be ignored for querying purposes because it might be incomplete;
 however it will still consume update overhead. The psql
 \d command will report such an index as INVALID:
@@ -688,15 +691,6 @@ Indexes:
 cannot.

 
-   
-Concurrent builds for indexes on partitioned tables are currently not
-supported.  However, you may concurrently build the index on each
-partition individually and then finally create the partitioned index
-non-concurrently in order to reduce the time where writes to the
-partitioned table will be locked out.  In this case, building the
-partitioned index is a metadata only operation.
-   
-
   
  
 
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 560dcc87a2c..666ced8e1d7 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -68,6 +68,7 @@
 
 
 /* non-export function prototypes */
+static void reindex_invalid_child_indexes(Oid indexRelationId);
 static bool CompareOpclassOptions(Datum *opts1, Datum *opts2, int natts);
 static void CheckPredicate(Expr *predicate);
 static void ComputeIndexAttrs(IndexInfo *indexInfo,
@@ -670,17 +671,6 @@ DefineIndex(Oid relationId,
 	partitioned = rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE;
 	if (partitioned)
 	{
-		/*
-		 * Note: we check 'stmt->concurrent' rather than 'concurrent', so that
-		 * the error is thrown also for temporary tables.  Seems better to be
-		 * con

Re: faulty link

2022-02-10 Thread Josef Šimánek
On Thu, Feb 10, 2022 at 15:35, Erik Rijkers wrote:
>
> The provided link
>https://www.postgresql.org/docs/release/
>
> leads to
>https://www.postgresql.org/docs/release/14.2/
>
> which gives 'Not Found' for me (Netherlands)

Thinking about that again, the 14.2 release just happened. Could it be
just a matter of propagating new release info to mirrors?

>
> At least one person on IRC reports it 'works' for them but it seems
> there still something wrong..
>
>
> Erik Rijkers
>
>
>
>
>




Re: faulty link

2022-02-10 Thread Daniel Westermann (DWE)
>The provided link
>   https://www.postgresql.org/docs/release/

>leads to
>   https://www.postgresql.org/docs/release/14.2/

>which gives 'Not Found' for me (Netherlands)

Works fine for me in Germany

Regards
Daniel



Re: faulty link

2022-02-10 Thread walther




> leads to
> https://www.postgresql.org/docs/release/14.2/
>
> which gives 'Not Found' for me (Netherlands)

Same here: Not Found. (Germany)




Re: faulty link

2022-02-10 Thread Josef Šimánek
It works well here (Czech republic).

On Thu, Feb 10, 2022 at 15:35, Erik Rijkers wrote:
>
> The provided link
>https://www.postgresql.org/docs/release/
>
> leads to
>https://www.postgresql.org/docs/release/14.2/
>
> which gives 'Not Found' for me (Netherlands)
>
>
> At least one person on IRC reports it 'works' for them but it seems
> there still something wrong..
>
>
> Erik Rijkers
>
>
>
>
>




Re: Support escape sequence for cluster_name in postgres_fdw.application_name

2022-02-10 Thread Fujii Masao



On 2022/02/09 9:19, r.takahash...@fujitsu.com wrote:
> Hi,
>
> Thank you for updating the patch.
> I agree with the documentation and program.
>
> How about adding the test for %c (Session ID)?
> (Adding the test for %C (cluster_name) seems difficult.)

Ok, I added the tests for %c and %C escape sequences.
Attached is the updated version of the patch.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index b2e02caefe..057342083c 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -10910,6 +10910,26 @@ SELECT pg_terminate_backend(pid, 18) FROM pg_stat_activity
  t
 (1 row)
 
+-- Test %c (session ID) and %C (cluster name) escape sequences.
+SET postgres_fdw.application_name TO 'fdw_%C%c';
+SELECT 1 FROM ft6 LIMIT 1;
+ ?column? 
+--
+1
+(1 row)
+
+SELECT pg_terminate_backend(pid, 18) FROM pg_stat_activity
+  WHERE application_name =
+substring('fdw_' || current_setting('cluster_name') ||
+  to_hex(trunc(EXTRACT(EPOCH FROM (SELECT backend_start FROM
+  pg_stat_get_activity(pg_backend_pid()::integer) || '.' ||
+  to_hex(pg_backend_pid())
+  for current_setting('max_identifier_length')::int);
+ pg_terminate_backend 
+--
+ t
+(1 row)
+
 --Clean up
 RESET postgres_fdw.application_name;
 RESET debug_discard_caches;
diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c
index fc3ce6a53a..af38e956e7 100644
--- a/contrib/postgres_fdw/option.c
+++ b/contrib/postgres_fdw/option.c
@@ -489,6 +489,12 @@ process_pgfdw_appname(const char *appname)
case 'a':
appendStringInfoString(&buf, application_name);
break;
+   case 'c':
+   appendStringInfo(&buf, "%lx.%x", (long) (MyStartTime), MyProcPid);
+   break;
+   case 'C':
+   appendStringInfoString(&buf, cluster_name);
+   break;
case 'd':
appendStringInfoString(&buf, MyProcPort->database_name);
break;
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index e050639b57..6c9f579c41 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -3501,6 +3501,17 @@ SELECT pg_terminate_backend(pid, 18) FROM pg_stat_activity
 substring('fdw_' || current_setting('application_name') ||
   CURRENT_USER || '%' for current_setting('max_identifier_length')::int);
 
+-- Test %c (session ID) and %C (cluster name) escape sequences.
+SET postgres_fdw.application_name TO 'fdw_%C%c';
+SELECT 1 FROM ft6 LIMIT 1;
+SELECT pg_terminate_backend(pid, 18) FROM pg_stat_activity
+  WHERE application_name =
+substring('fdw_' || current_setting('cluster_name') ||
+  to_hex(trunc(EXTRACT(EPOCH FROM (SELECT backend_start FROM
+  pg_stat_get_activity(pg_backend_pid()::integer) || '.' ||
+  to_hex(pg_backend_pid())
+  for current_setting('max_identifier_length')::int);
+
 --Clean up
 RESET postgres_fdw.application_name;
 RESET debug_discard_caches;
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index 2bb31f1125..17cd90ab12 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -983,6 +983,20 @@ postgres=# SELECT postgres_fdw_disconnect_all();
  %a
  Application name on local server
 
+
+ %c
+ 
+  Session ID on local server
+  (see  for details)
+ 
+
+
+ %C
+ 
+  Cluster name in local server
+  (see  for details)
+ 
+
 
  %u
  User name on local server
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index 6bb81707b0..f1bfe79feb 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -271,7 +271,7 @@ extern int  temp_file_limit;
 
 extern int num_temp_buffers;
 
-extern char *cluster_name;
+extern PGDLLIMPORT char *cluster_name;
 extern PGDLLIMPORT char *ConfigFileName;
 extern char *HbaFileName;
 extern char *IdentFileName;


Re: Plug minor memleak in pg_dump

2022-02-10 Thread Ranier Vilela
On Thu, Feb 10, 2022 at 10:57, Daniel Gustafsson wrote:

> > On 10 Feb 2022, at 12:14, Ranier Vilela  wrote:
> > On Wed, Feb 9, 2022 at 23:16, Michael Paquier <mich...@paquier.xyz> wrote:
>
> > This patch makes things worse, doesn't it?
> > No.
> >
> > Doesn't this localized
> > change mean that we expose ourselves more into *ignoring* TOC entries
> > if we mess up with this code in the future?
> > InvalidOid already used for "default".
>
> There is no default case here, setting the tableoid to InvalidOid is done
> when
> the archive doesn't support this particular feature.  If we can't read the
> tableoid here, it's a corrupt TOC and we should abort.
>
Well, the v2 aborts.


>
> > If ReadStr fails and returns NULL, sscanf will crash.
>
> Yes, which is better than silently propagating the error.
>
Ok, silently propagating the error is bad, but crashing is the sign of a
poor tool.


> > Maybe in this case, better report to the user?
> > pg_log_warning?
>
> That would demote what is today a crash to a warning on a corrupt TOC
> entry,
> which I think is the wrong way to go.  Question is, can this fail in a
> non-synthetic case on output which was successfully generated by pg_dump?
> I'm
> not saying we should ignore errors, but I have a feeling that any input fed
> that triggers this will be broken enough to cause fireworks elsewhere too,
> and
> this being a chase towards low returns apart from complicating the code.
>
For me the code stays simpler and more maintainable.

regards,
Ranier Vilela


faulty link

2022-02-10 Thread Erik Rijkers

The provided link
  https://www.postgresql.org/docs/release/

leads to
  https://www.postgresql.org/docs/release/14.2/

which gives 'Not Found' for me (Netherlands)


At least one person on IRC reports it 'works' for them but it seems 
there still something wrong..



Erik Rijkers







Re: refactoring basebackup.c

2022-02-10 Thread Jeevan Ladhe
Thanks for the patch, Dipesh.
With a quick look at the patch I have the following observations:

--
In bbstreamer_lz4_compressor_new(), I think this alignment is not needed
on the client side:

/* Align the output buffer length. */
compressed_bound += compressed_bound + BLCKSZ - (compressed_bound % BLCKSZ);
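
Incidentally, as written that line more than doubles the buffer instead of
aligning it; a plain round-up to a BLCKSZ boundary (shown for comparison
only) would be:

    if (compressed_bound % BLCKSZ != 0)
        compressed_bound += BLCKSZ - (compressed_bound % BLCKSZ);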
--

In bbstreamer_lz4_compressor_content(), the avail_in and len variables are
both unchanged. I think we can simply change len to avail_in in the
argument list.
--

Comment:

+   * Update the offset and capacity of output buffer based on based on number
+   * of bytes written to output buffer.

I think it is a thinko; it should be:

+   * Update the offset and capacity of output buffer based on number of
+   * bytes written to output buffer.
--

Indentation:

+   if ((mystreamer->base.bbs_buffer.maxlen - mystreamer->bytes_written) <=
+   footer_bound)

--
I think that, similar to bbstreamer_lz4_compressor_content(), we can
change len to avail_in in bbstreamer_lz4_decompressor_content().


Regards,
Jeevan Ladhe

On Thu, 10 Feb 2022 at 18:11, Dipesh Pandit  wrote:

> Hi,
>
> > On Mon, Jan 31, 2022 at 4:41 PM Jeevan Ladhe <
> jeevan.la...@enterprisedb.com> wrote:
>
>> Hi Robert,
>>
>> I had an offline discussion with Dipesh, and he will be working on the
>> lz4 client side decompression part.
>>
>
> Please find the attached patch to support client side compression
> and decompression using lz4.
>
> Added a new lz4 bbstreamer to compress the archive chunks at the
> client if the user has specified the --compress=client-lz4:[LEVEL] option
> in pg_basebackup. The new streamer accepts archive chunks,
> compresses them and forwards them to the plain-writer.
>
> Similarly, if a user has specified a server-compressed lz4 archive
> with a plain format (-F p) backup, then it requires decompressing
> the compressed archive chunks before forwarding them to the tar extractor.
> Added a new bbstreamer to decompress the compressed archive
> and forward it to the tar extractor.
>
> Note: This patch can be applied on Jeevan Ladhe's v12 patch
> for lz4 compression.
>
> Thanks,
> Dipesh
>


Re: Plug minor memleak in pg_dump

2022-02-10 Thread Daniel Gustafsson
> On 10 Feb 2022, at 12:14, Ranier Vilela  wrote:
> On Wed, Feb 9, 2022 at 23:16, Michael Paquier <mich...@paquier.xyz> wrote:

> This patch makes things worse, doesn't it? 
> No.
>  
> Doesn't this localized
> change mean that we expose ourselves more into *ignoring* TOC entries
> if we mess up with this code in the future?
> InvalidOid already used for "default".

There is no default case here, setting the tableoid to InvalidOid is done when
the archive doesn't support this particular feature.  If we can't read the
tableoid here, it's a corrupt TOC and we should abort.
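
A minimal sketch of what aborting there could look like (illustrative
only, not the actual patch; the error wording is invented):

    tmp = ReadStr(AH);
    if (tmp == NULL)
        fatal("corrupt archive: missing table OID in TOC entry");
    sscanf(tmp, "%u", &te->catalogId.tableoid);
    free(tmp);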

> If ReadStr fails and returns NULL, sscanf will crash.

Yes, which is better than silently propagating the error.

> Maybe in this case, better report to the user?
> pg_log_warning?

That would demote what is today a crash to a warning on a corrupt TOC entry,
which I think is the wrong way to go.  Question is, can this fail in a
non-synthetic case on output which was successfully generated by pg_dump?  I'm
not saying we should ignore errors, but I have a feeling that any input fed
that triggers this will be broken enough to cause fireworks elsewhere too, and
this being a chase towards low returns apart from complicating the code.

--
Daniel Gustafsson   https://vmware.com/





Re: Plug minor memleak in pg_dump

2022-02-10 Thread Ranier Vilela
On Thu, Feb 10, 2022 at 08:14, Ranier Vilela wrote:

> On Wed, Feb 9, 2022 at 23:16, Michael Paquier wrote:
>
>> On Wed, Feb 09, 2022 at 02:48:35PM -0300, Ranier Vilela wrote:
>> > IMO I think that still have troubles here.
>> >
>> > ReadStr can return NULL, so the fix can crash.
>>
>> -   sscanf(tmp, "%u", &te->catalogId.tableoid);
>> -   free(tmp);
>> +   if (tmp)
>> +   {
>> +   sscanf(tmp, "%u", &te->catalogId.tableoid);
>> +   free(tmp);
>> +   }
>> +   else
>> +   te->catalogId.tableoid = InvalidOid;
>>
>> This patch makes things worse, doesn't it?
>
> No.
>
>
>> Doesn't this localized
>> change mean that we expose ourselves more into *ignoring* TOC entries
>> if we mess up with this code in the future?
>
> InvalidOid already used for "default".
> If ReadStr fails and returns NULL, sscanf will crash.
>
> Maybe in this case, better report to the user?
> pg_log_warning?
>
Maybe in this case, the right thing is to abort?

See v2, please.

regards,
Ranier Vilela


v2_fix_possible_null_dereference_pg_backup_archiver.patch
Description: Binary data


Re: [BUG]Update Toast data failure in logical replication

2022-02-10 Thread Amit Kapila
On Mon, Feb 7, 2022 at 1:27 PM Dilip Kumar  wrote:
>
> On Mon, Feb 7, 2022 at 12:25 PM Amit Kapila  wrote:
> >
> > Attached please find the modified patches.
>
> I have looked into the latest modification and back branch patches and
> they look fine to me.
>

Today, while looking at this patch again, I think I see one problem
with the below change (referring to the pg10 patch):
+ if (attrnum < 0)
+ {
+ if (attrnum != ObjectIdAttributeNumber &&
+ attrnum != TableOidAttributeNumber)
+ {
+ modified = bms_add_member(modified,
+   attrnum -
+   FirstLowInvalidHeapAttributeNumber);
+ continue;
+ }
+ }
...
...
+ /* No need to check attributes that can't be stored externally. */
+ if (isnull1 || TupleDescAttr(tupdesc, attrnum - 1)->attlen != -1)
+ continue;

I think it is possible that we use TupleDescAttr for a system attribute
(in this case ObjectIdAttributeNumber/TableOidAttributeNumber), which
will be wrong as the tuple descriptor contains only user attributes, not
system attributes. See comments atop TupleDescData.

I think this check should be modified to if (attrnum < 0 || isnull1
|| TupleDescAttr(tupdesc, attrnum - 1)->attlen != -1). What do you
think?

* Another minor comment:
+ if (!heap_attr_equals(RelationGetDescr(relation), attrnum, value1,
+   value2, isnull1, isnull2))

I think here we can directly use tupdesc instead of RelationGetDescr(relation).
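
Combining both points, the adjusted code would look roughly like this (a
sketch against the pg10 patch, not the final hunk):

+ /* No need to check system attributes or attributes that can't be
+  * stored externally. */
+ if (attrnum < 0 || isnull1 ||
+     TupleDescAttr(tupdesc, attrnum - 1)->attlen != -1)
+     continue;
...
+ if (!heap_attr_equals(tupdesc, attrnum, value1,
+                       value2, isnull1, isnull2))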

-- 
With Regards,
Amit Kapila.




Re: refactoring basebackup.c

2022-02-10 Thread Dipesh Pandit
Hi,

> On Mon, Jan 31, 2022 at 4:41 PM Jeevan Ladhe <
jeevan.la...@enterprisedb.com> wrote:

> Hi Robert,
>
> I had an offline discussion with Dipesh, and he will be working on the
> lz4 client side decompression part.
>

Please find the attached patch to support client side compression
and decompression using lz4.

Added a new lz4 bbstreamer to compress the archive chunks on the
client if the user has specified the --compress=client-lz4:[LEVEL]
option in pg_basebackup. The new streamer accepts archive chunks,
compresses them, and forwards them to the plain-writer.

Similarly, if a user has specified a server-compressed lz4 archive
with a plain format (-F p) backup, the compressed archive chunks must
be decompressed before being forwarded to the tar extractor. Added a
new bbstreamer to decompress the compressed archive and forward it to
the tar extractor.

Note: This patch can be applied on Jeevan Ladhe's v12 patch
for lz4 compression.

Thanks,
Dipesh
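
For readers unfamiliar with the LZ4 frame API the new streamer builds on,
here is a minimal, self-contained sketch of the begin/update/end cycle it
performs per chunk (an illustration only; buffer management and error
handling in the actual bbstreamer differ):

#include <stdlib.h>
#include <string.h>
#include <lz4frame.h>

/*
 * Compress one in-memory chunk into a complete LZ4 frame, mirroring the
 * begin/update/end sequence the streamer runs over archive chunks.
 * Returns the compressed size, or 0 on error.  (Error handling is
 * abbreviated; real code must test LZ4F_isError() on every call.)
 */
static size_t
compress_chunk(const char *src, size_t src_len, int level, char **dst_out)
{
	LZ4F_compressionContext_t cctx;
	LZ4F_preferences_t prefs;
	size_t		bound, pos = 0;
	char	   *dst;

	memset(&prefs, 0, sizeof(prefs));
	prefs.compressionLevel = level;	/* from --compress=client-lz4:LEVEL */

	if (LZ4F_isError(LZ4F_createCompressionContext(&cctx, LZ4F_VERSION)))
		return 0;

	/* Room for the frame header, worst-case payload, and frame footer. */
	bound = LZ4F_HEADER_SIZE_MAX +
		LZ4F_compressBound(src_len, &prefs) +
		LZ4F_compressBound(0, &prefs);
	if ((dst = malloc(bound)) == NULL)
	{
		LZ4F_freeCompressionContext(cctx);
		return 0;
	}

	pos += LZ4F_compressBegin(cctx, dst, bound, &prefs);	/* header */
	pos += LZ4F_compressUpdate(cctx, dst + pos, bound - pos,
							   src, src_len, NULL);	/* payload */
	pos += LZ4F_compressEnd(cctx, dst + pos, bound - pos, NULL);	/* footer */

	LZ4F_freeCompressionContext(cctx);
	*dst_out = dst;
	return pos;
}

The bbstreamer version does the same thing incrementally, carrying cctx and
prefs in its bbstreamer_lz4_frame state across content() callbacks; the
decompressor is the mirror image built on LZ4F_decompress().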
From 67e47579e119897c66e6f5f7a5e5e9542399072f Mon Sep 17 00:00:00 2001
From: Dipesh Pandit 
Date: Thu, 3 Feb 2022 18:31:03 +0530
Subject: [PATCH] support client side compression and decompression using LZ4

---
 src/bin/pg_basebackup/Makefile|   1 +
 src/bin/pg_basebackup/bbstreamer.h|   3 +
 src/bin/pg_basebackup/bbstreamer_lz4.c| 436 ++
 src/bin/pg_basebackup/pg_basebackup.c |  32 +-
 src/bin/pg_verifybackup/t/009_extract.pl  |   7 +-
 src/bin/pg_verifybackup/t/010_client_untar.pl | 111 +++
 src/tools/msvc/Mkvcbuild.pm   |   1 +
 7 files changed, 585 insertions(+), 6 deletions(-)
 create mode 100644 src/bin/pg_basebackup/bbstreamer_lz4.c
 create mode 100644 src/bin/pg_verifybackup/t/010_client_untar.pl

diff --git a/src/bin/pg_basebackup/Makefile b/src/bin/pg_basebackup/Makefile
index ada3a5a..1d0db4f 100644
--- a/src/bin/pg_basebackup/Makefile
+++ b/src/bin/pg_basebackup/Makefile
@@ -43,6 +43,7 @@ BBOBJS = \
 	bbstreamer_file.o \
 	bbstreamer_gzip.o \
 	bbstreamer_inject.o \
+	bbstreamer_lz4.o \
 	bbstreamer_tar.o
 
 all: pg_basebackup pg_receivewal pg_recvlogical
diff --git a/src/bin/pg_basebackup/bbstreamer.h b/src/bin/pg_basebackup/bbstreamer.h
index fe49ae3..c2de77b 100644
--- a/src/bin/pg_basebackup/bbstreamer.h
+++ b/src/bin/pg_basebackup/bbstreamer.h
@@ -206,6 +206,9 @@ extern bbstreamer *bbstreamer_extractor_new(const char *basepath,
 			void (*report_output_file) (const char *));
 
 extern bbstreamer *bbstreamer_gzip_decompressor_new(bbstreamer *next);
+extern bbstreamer *bbstreamer_lz4_compressor_new(bbstreamer *next,
+ int compresslevel);
+extern bbstreamer *bbstreamer_lz4_decompressor_new(bbstreamer *next);
 extern bbstreamer *bbstreamer_tar_parser_new(bbstreamer *next);
 extern bbstreamer *bbstreamer_tar_terminator_new(bbstreamer *next);
 extern bbstreamer *bbstreamer_tar_archiver_new(bbstreamer *next);
diff --git a/src/bin/pg_basebackup/bbstreamer_lz4.c b/src/bin/pg_basebackup/bbstreamer_lz4.c
new file mode 100644
index 000..9055a23
--- /dev/null
+++ b/src/bin/pg_basebackup/bbstreamer_lz4.c
@@ -0,0 +1,436 @@
+/*-
+ *
+ * bbstreamer_lz4.c
+ *
+ * Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		  src/bin/pg_basebackup/bbstreamer_lz4.c
+ *-
+ */
+
+#include "postgres_fe.h"
+
+#include 
+
+#ifdef HAVE_LIBLZ4
+#include 
+#endif
+
+#include "bbstreamer.h"
+#include "common/logging.h"
+#include "common/file_perm.h"
+#include "common/string.h"
+
+#ifdef HAVE_LIBLZ4
+typedef struct bbstreamer_lz4_frame
+{
+	bbstreamer	base;
+
+	LZ4F_compressionContext_t	cctx;
+	LZ4F_decompressionContext_t	dctx;
+	LZ4F_preferences_t			prefs;
+
+	size_t		bytes_written;
+	bool		header_written;
+} bbstreamer_lz4_frame;
+
+static void bbstreamer_lz4_compressor_content(bbstreamer *streamer,
+			  bbstreamer_member *member,
+			  const char *data, int len,
+			  bbstreamer_archive_context context);
+static void bbstreamer_lz4_compressor_finalize(bbstreamer *streamer);
+static void bbstreamer_lz4_compressor_free(bbstreamer *streamer);
+
+const bbstreamer_ops bbstreamer_lz4_compressor_ops = {
+	.content = bbstreamer_lz4_compressor_content,
+	.finalize = bbstreamer_lz4_compressor_finalize,
+	.free = bbstreamer_lz4_compressor_free
+};
+
+static void bbstreamer_lz4_decompressor_content(bbstreamer *streamer,
+bbstreamer_member *member,
+const char *data, int len,
+bbstreamer_archive_context context);
+static void bbstreamer_lz4_decompressor_finalize(bbstreamer *streamer);
+static void bbstreamer_lz4_decompressor_free(bbstreamer *streamer);
+
+const bbstreamer_ops bbstreamer_lz4_decompressor_ops = {
+	.content = bbstreamer_lz4_decompressor_content,
+	.finalize = bbstreamer_lz4_decompressor_finalize,
+	.free = bbstreamer_lz4_decompressor_free
+

Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-02-10 Thread Dilip Kumar
On Wed, Feb 9, 2022 at 9:31 PM Robert Haas  wrote:
>
> On Wed, Feb 9, 2022 at 10:59 AM Dilip Kumar  wrote:
> > On Wed, Feb 9, 2022 at 9:25 PM Andrew Dunstan  wrote:
> > > On 6/16/21 03:52, Dilip Kumar wrote:
> > > > On Tue, Jun 15, 2021 at 7:01 PM Andrew Dunstan  
> > > > wrote:
> > > >> Rather than use size, I'd be inclined to say use this if the source
> > > >> database is marked as a template, and use the copydir approach for
> > > >> anything that isn't.
> > > > Yeah, that is possible, on the other thought wouldn't it be good to
> > > > provide control to the user by providing two different commands, e.g.
> > > > COPY DATABASE for the existing method (copydir) and CREATE DATABASE
> > > > for the new method (fully wal logged)?
> > >
> > > This proposal seems to have gotten lost.
> >
> > Yeah, I am planning to work on this part so that we can support both 
> > methods.
>
> But can we pick a different syntax? In my view this should be an
> option to CREATE DATABASE rather than a whole new command.

Maybe we can provide something like

CREATE DATABASE..WITH WAL_LOG=true/false ? OR
CREATE DATABASE..WITH WAL_LOG_DATA_PAGE=true/false ? OR
CREATE DATABASE..WITH CHECKPOINT=true/false ?

And then we can explain these options in the documentation?  I think
the default should be the new method?


--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Refactor CheckpointWriteDelay()

2022-02-10 Thread Bharath Rupireddy
Hi,

Given that CheckpointWriteDelay() is really a hot code path, i.e. it gets
called for every dirty buffer written to disk, here's an opportunity
for us to improve it by avoiding some function calls.

Firstly, the AmCheckpointerProcess() macro check can be placed outside
of CheckpointWriteDelay(), which saves one function call per dirty
buffer written to disk for a non-checkpointer process (standalone
backend) performing a checkpoint.

Secondly, remove the function ImmediateCheckpointRequested() and read
ckpt_flags from CheckpointerShmem with a volatile qualifier directly.
This saves a few LOC and, again, one function call per dirty buffer
written to disk. ImmediateCheckpointRequested() really does nothing
special: it just reads from shared memory without a lock and checks
whether there's an immediate checkpoint request pending behind the
current one.
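
For clarity, the inlined form at the call site is essentially the body of
the removed function (shown in the patch below):

/* Sketch of the inlined check replacing ImmediateCheckpointRequested() */
volatile CheckpointerShmemStruct *cps = CheckpointerShmem;
bool		immediate_pending;

/*
 * No need to acquire ckpt_lck: we only look at a single flag bit to see
 * whether an immediate checkpoint request is pending behind the current
 * one.
 */
immediate_pending = (cps->ckpt_flags & CHECKPOINT_IMMEDIATE) != 0;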

Attaching a patch with the above changes. Please have a look at it.

I did a small experiment[1] with a use case [2] on my dev system where
I measured the total time spent in CheckpointWriteDelay() with and
without the patch; to my surprise, I didn't see much difference. It may
be that my experiment is wrong or that my dev box simply doesn't show
much difference, and others may or may not notice one. Despite this, the
proposed patch still helps IMO, as it saves a few LOC (16) and I'm sure
it will also save some time for standalone backend checkpoints.

Other hackers may not agree with me on the readability (IMO, the patch
doesn't make it unreadable) or on the diff that it creates against
previous versions, and so on. I'd rather argue that
CheckpointWriteDelay() is really a hot code path in production
environments and the proposed patch has some benefits.

Thoughts?

[1] see "write delay" at the end of "checkpoint complete" message:
HEAD:
2022-02-08 05:56:45.551 UTC [651784] LOG:  checkpoint starting: time
2022-02-08 05:57:39.154 UTC [651784] LOG:  checkpoint complete: wrote
14740 buffers (90.0%); 0 WAL file(s) added, 0 removed, 27 recycled;
write=53.461 s, sync=0.027 s, total=53.604 s; sync files=22,
longest=0.016 s, average=0.002 s; distance=438865 kB, estimate=438865
kB; write delay=53104.194 ms

2022-02-08 05:59:24.173 UTC [652589] LOG:  checkpoint starting: time
2022-02-08 06:00:18.166 UTC [652589] LOG:  checkpoint complete: wrote
14740 buffers (90.0%); 0 WAL file(s) added, 0 removed, 27 recycled;
write=53.848 s, sync=0.030 s, total=53.993 s; sync files=22,
longest=0.017 s, average=0.002 s; distance=438865 kB, estimate=438865
kB; write delay=53603.159 ms

PATCHED:
2022-02-08 06:07:26.286 UTC [662667] LOG:  checkpoint starting: time
2022-02-08 06:08:20.152 UTC [662667] LOG:  checkpoint complete: wrote
14740 buffers (90.0%); 0 WAL file(s) added, 0 removed, 27 recycled;
write=53.732 s, sync=0.026 s, total=53.867 s; sync files=22,
longest=0.016 s, average=0.002 s; distance=438865 kB, estimate=438865
kB; write delay=53399.582 ms

2022-02-08 06:10:17.554 UTC [663393] LOG:  checkpoint starting: time
2022-02-08 06:11:11.163 UTC [663393] LOG:  checkpoint complete: wrote
14740 buffers (90.0%); 0 WAL file(s) added, 0 removed, 27 recycled;
write=53.488 s, sync=0.023 s, total=53.610 s; sync files=22,
longest=0.018 s, average=0.002 s; distance=438865 kB, estimate=438865
kB; write delay=53099.114 ms

[2]
checkpoint_timeout = 60s
create table t1(a1 int, b1 int);
/* inserted 7mn rows */
insert into t1 select i, i*2 from generate_series(1, 700) i;

Regards,
Bharath Rupireddy.
From d9d58fc7d006f8fdb37d5ab9119062c5e0083c14 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Tue, 8 Feb 2022 05:19:37 +
Subject: [PATCH v1] Refactor CheckpointWriteDelay()

---
 src/backend/postmaster/checkpointer.c | 36 +++
 src/backend/storage/buffer/bufmgr.c   |  6 +++--
 2 files changed, 13 insertions(+), 29 deletions(-)

diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 23f691cd47..e1f0ccd561 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -162,7 +162,6 @@ static pg_time_t last_xlog_switch_time;
 static void HandleCheckpointerInterrupts(void);
 static void CheckArchiveTimeout(void);
 static bool IsCheckpointOnSchedule(double progress);
-static bool ImmediateCheckpointRequested(void);
 static bool CompactCheckpointerRequestQueue(void);
 static void UpdateSharedMemoryConfig(void);
 
@@ -651,25 +650,6 @@ CheckArchiveTimeout(void)
 	}
 }
 
-/*
- * Returns true if an immediate checkpoint request is pending.  (Note that
- * this does not check the *current* checkpoint's IMMEDIATE flag, but whether
- * there is one pending behind it.)
- */
-static bool
-ImmediateCheckpointRequested(void)
-{
-	volatile CheckpointerShmemStruct *cps = CheckpointerShmem;
-
-	/*
-	 * We don't need to acquire the ckpt_lck in this case because we're only
-	 * looking at a single flag bit.
-	 */
-	if (cps->ckpt_flags & CHECKPOINT_IMMEDIATE)
-		return true;
-	return false;
-}
-
 /*
  * Chec

Re: Unnecessary call to resetPQExpBuffer in getIndexes

2022-02-10 Thread Julien Rouhaud
On Thu, Feb 10, 2022 at 12:25:36PM +0100, Peter Eisentraut wrote:
> On 09.02.22 19:21, Nathan Bossart wrote:
> > On Wed, Feb 09, 2022 at 10:50:07AM +0800, Julien Rouhaud wrote:
> > > I just noticed that e2c52beecd (adding PeterE in Cc) added a 
> > > resetPQExpBuffer()
> > > which seems unnecessary since the variable is untouched since the initial
> > > createPQExpBuffer().
> > > 
> > > Simple patch attached.
> > 
> > LGTM
> 
> committed

Thanks!




Re: Merging statistics from children instead of re-sampling everything

2022-02-10 Thread Andrey Lepikhov

On 21/1/2022 01:25, Tomas Vondra wrote:

But I don't have a very good idea what to do about statistics that we
can't really merge. For some types of statistics it's rather tricky to
reasonably merge the results - ndistinct is a simple example, although
we could work around that by building and merging hyperloglog counters.
I think, as a first step in this direction, we can reduce the number of
pulled tuples. We don't really need to pull all tuples from a remote
server. To construct a reservoir, we can pull only a sample of tuples.
The reservoir method needs only a few arguments to return a sample, just
as when reading tuples locally. Also, to fetch such partial samples
asynchronously, we can obtain the size of each partition in a
preliminary step of the analysis.
In my opinion, even this solution can drastically reduce the severity of
the problem.
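
For reference, the classic reservoir method mentioned above (Vitter's
Algorithm R) is tiny; here is a generic sketch, not PostgreSQL's
acquire_sample_rows() implementation:

#include <stdlib.h>

/*
 * Fill 'reservoir' with a uniform random sample of up to 'k' items from
 * a stream of 'n' items, looking at each item exactly once.
 */
static void
reservoir_sample(const int *stream, int n, int *reservoir, int k)
{
	for (int i = 0; i < n; i++)
	{
		if (i < k)
			reservoir[i] = stream[i];	/* fill the reservoir first */
		else
		{
			int		j = rand() % (i + 1);	/* uniform in [0, i]; modulo
											 * bias ignored for brevity */

			if (j < k)
				reservoir[j] = stream[i];	/* keep with prob. k/(i+1) */
		}
	}
}

This is why only the stream itself (plus the partition sizes, to schedule
the fetches) is needed to build a sample remotely.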


--
regards,
Andrey Lepikhov
Postgres Professional




Re: Logging in LockBufferForCleanup()

2022-02-10 Thread Masahiko Sawada
On Thu, Feb 10, 2022 at 11:22 AM Andres Freund  wrote:
>
> Hi,
>
> I just noticed somewhat new code in LockBufferForCleanup(), added in
>
> commit 39b03690b529935a3c33024ee68f08e2d347cf4f
> Author: Fujii Masao 
> Date:   2021-01-13 22:59:17 +0900
>
> Log long wait time on recovery conflict when it's resolved.
>
> commit 64638ccba3a659d8b8a3a4bc5b47307685a64a8a
> Author: Fujii Masao 
> Date:   2020-03-30 17:35:03 +0900
>
> Report waiting via PS while recovery is waiting for buffer pin in hot 
> standby.
>
>
> After those commit LockBufferForCleanup() contains code doing memory
> allocations, elogs. That doesn't strike me as a good idea:
>
> Previously the code looked somewhat safe to use in critical section like
> blocks (although whether it'd be good idea to use in one is a different
> question), but not after. Even if not used in a critical section, adding new
> failure conditions to low-level code that's holding LWLocks etc. doesn't seem
> like a good idea.
>
> Secondly, the way it's done seems like a significant laying violation. Before
> the HS related code in LockBufferForCleanup() was encapsulated in a few calls
> to routines dedicated to dealing with that. Now it's all over
> LockBufferForCleanup().
>
> It also just increases the overhead of LockBuffer(). Adding palloc(), copying
> of process title, GetCurrentTimestamp() to a low level routine like this isn't
> free - even if it's mostly in the contended paths.

While I understand that these are theoretically valid concerns, it’s
unclear to me how much the current code leads to bad effects in
practice. There are other low-level code paths that call palloc()
while holding LWLocks, and I’m not sure that the overheads result in a
visible negative performance impact, particularly because a call to
LockBufferForCleanup() is likely to involve waiting in the first
place. BTW, I think calling LockBufferForCleanup() in a critical
section is a bad idea for sure, since it becomes uninterruptible.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Remove 'IN' from pg_freespace docs for consistency

2022-02-10 Thread Bharath Rupireddy
Hi,

It seems like 'IN' isn't used in the description of a function's input
parameters in the docs, unlike 'OUT'. AFAICS this is consistent across
the docs except for the function pg_freespace. Attaching a tiny patch
to remove 'IN' there.

Thoughts?

Regards,
Bharath Rupireddy.


v1-0001-Remove-IN-from-pg_freespace-docs-for-consistency.patch
Description: Binary data


Re: Unnecessary call to resetPQExpBuffer in getIndexes

2022-02-10 Thread Peter Eisentraut

On 09.02.22 19:21, Nathan Bossart wrote:

On Wed, Feb 09, 2022 at 10:50:07AM +0800, Julien Rouhaud wrote:

I just noticed that e2c52beecd (adding PeterE in Cc) added a resetPQExpBuffer()
which seems unnecessary since the variable is untouched since the initial
createPQExpBuffer().

Simple patch attached.


LGTM


committed




Re: Plug minor memleak in pg_dump

2022-02-10 Thread Ranier Vilela
On Wed, Feb 9, 2022 at 11:16 PM Michael Paquier 
wrote:

> On Wed, Feb 09, 2022 at 02:48:35PM -0300, Ranier Vilela wrote:
> > IMO I think that it still has troubles here.
> >
> > ReadStr can return NULL, so the fix can crash.
>
> -   sscanf(tmp, "%u", &te->catalogId.tableoid);
> -   free(tmp);
> +   if (tmp)
> +   {
> +   sscanf(tmp, "%u", &te->catalogId.tableoid);
> +   free(tmp);
> +   }
> +   else
> +   te->catalogId.tableoid = InvalidOid;
>
> This patch makes things worse, doesn't it?

No.


> Doesn't this localized
> change mean that we expose ourselves more into *ignoring* TOC entries
> if we mess up with this code in the future?

InvalidOid is already used for "default".
If ReadStr fails and returns NULL, sscanf will crash.

Maybe in this case, it's better to report it to the user?
pg_log_warning?

regards,
Ranier Vilela


Re: Database-level collation version tracking

2022-02-10 Thread Julien Rouhaud
On Thu, Feb 10, 2022 at 09:57:59AM +0100, Peter Eisentraut wrote:
> New patch that fixes all reported issues, I think:
> 
> - Added test for ALTER DATABASE / REFRESH COLLATION VERSION
> 
> - Rewrote AlterDatabaseRefreshCollVersion() with better locking
> 
> - Added version checking in createdb()

Thanks!  All issues are indeed fixed.  I just have a few additional comments:



> From 290ebb9ca743a2272181f435d5ea76d8a7280a0a Mon Sep 17 00:00:00 2001
> From: Peter Eisentraut 
> Date: Thu, 10 Feb 2022 09:44:20 +0100
> Subject: [PATCH v4] Database-level collation version tracking



> +	 * collation version was specified explicitly as an statement option; that

typo, should be "as a statement"

> +		actual_versionstr = get_collation_actual_version(COLLPROVIDER_LIBC, dbcollate);
> +		if (!actual_versionstr)
> +			ereport(ERROR,
> +					(errmsg("template database \"%s\" has a collation version, but no actual collation version could be determined",
> +							dbtemplate)));
> +
> +		if (strcmp(actual_versionstr, src_collversion) != 0)
> +			ereport(ERROR,
> +					(errmsg("template database \"%s\" has a collation version mismatch",
> +							dbtemplate),
> +					 errdetail("The template database was created using collation version %s, "
> +							   "but the operating system provides version %s.",
> +							   src_collversion, actual_versionstr),
> +					 errhint("Rebuild all objects affected by collation in the template database and run "
> +							 "ALTER DATABASE %s REFRESH COLLATION VERSION, "
> +							 "or build PostgreSQL with the right library version.",
> +							 quote_identifier(dbtemplate))));

After a second read I think the messages are slightly ambiguous.  What do you
think about specifying the problematic collation name and provider?

For now we only support the libc default collation, so users will probably
have to reindex almost everything in that database (not sure if the
versioning is more fine-grained on Windows), but we should probably still
say "affected by libc collation" in the errhint so they have a chance to
avoid unnecessary reindexing.
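
Something along these lines, say (hypothetical wording only, to illustrate
the suggestion; dbcollate and dbtemplate are the variables from the quoted
hunk):

errhint("Rebuild all objects in the template database that are affected "
		"by the libc collation \"%s\" and run "
		"ALTER DATABASE %s REFRESH COLLATION VERSION, "
		"or build PostgreSQL with the right library version.",
		dbcollate, quote_identifier(dbtemplate));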

And having this information will hopefully become even more important when
ICU default collations land :)

> +/*
> + * ALTER DATABASE name REFRESH COLLATION VERSION
> + */
> +ObjectAddress
> +AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt)

I'm wondering why you changed this function to return an ObjectAddress rather
than an Oid?  There's no event trigger support for ALTER DATABASE, and the
rest of the similar utility commands also return an Oid.

Other than that it all looks good to me!




Re: row filtering for logical replication

2022-02-10 Thread Amit Kapila
On Thu, Feb 10, 2022 at 9:29 AM houzj.f...@fujitsu.com
 wrote:
>
>
> Attach the V80 patch which addressed the above comments and
> comments from Amit[1].
>

Thanks for the new version. A few minor/cosmetic comments:

1. Can we slightly change the below comment:
Before:
+ * To avoid fetching the publication information, we cache the publication
+ * actions and row filter validation information.

After:
To avoid fetching the publication information repeatedly, we cache the
publication actions and row filter validation information.

2.
+ /*
+ * For ordinary tables, make sure we don't copy data from child
+ * that inherits the named table.
+ */
+ if (lrel.relkind == RELKIND_RELATION)
+ appendStringInfoString(&cmd, " ONLY ");

I think we should mention the reason why we are doing so. So how about
something like: "For regular tables, make sure we don't copy data from
a child that inherits the named table as those will be copied
separately."

3.
Can we change the below comment?

Before:
+ /*
+ * Initialize the tuple slot, map and row filter that are only used
+ * when publishing inserts, updates or deletes.
+ */

After:
Initialize the tuple slot, map, and row filter. These are only used
when publishing inserts, updates, or deletes.

4.
+CREATE TABLE testpub_rf_tbl1 (a integer, b text);
+CREATE TABLE testpub_rf_tbl2 (c text, d integer);

Here, you can add a comment saying: "-- Test row filters" or something
on those lines.

5.
+-- test \d+ (now it displays filter information)
+SET client_min_messages = 'ERROR';
+CREATE PUBLICATION testpub_dplus_rf_yes FOR TABLE testpub_rf_tbl1
WHERE (a > 1) WITH (publish = 'insert');
+CREATE PUBLICATION testpub_dplus_rf_no FOR TABLE testpub_rf_tbl1;
+RESET client_min_messages;
+\d+ testpub_rf_tbl1
+  Table "public.testpub_rf_tbl1"
+ Column |  Type   | Collation | Nullable | Default | Storage  | Stats
target | Description
++-+---+--+-+--+--+-
+ a  | integer |   |  | | plain|  |
+ b  | text|   |  | | extended |  |
+Publications:
+"testpub_dplus_rf_no"
+"testpub_dplus_rf_yes" WHERE (a > 1)

I think here \d is sufficient to show row filters? I think it is
better to use publication names such as testpub_rf_yes or
testpub_rf_no in this test.

6.
+# Similarly, the table filter for tab_rf_x (after the initial phase) has no
+# effect when combined with the ALL TABLES IN SCHEMA.
+# Expected: 5 initial rows + 2 new rows = 7 rows
+$node_publisher->safe_psql('postgres', "INSERT INTO tab_rf_x (x)
VALUES (-99), (99)");
+$node_publisher->wait_for_catchup($appname);
+$result = $node_subscriber->safe_psql('postgres', "SELECT count(x)
FROM tab_rf_x");
+is($result, qq(7), 'check table tab_rf_x should not be filtered');

I think the comment here should say "ALL TABLES." instead of "ALL
TABLES IN SCHEMA." as there is no publication before this test which
is created with "ALL TABLES IN SCHEMA" clause.

7.
+# The subscription of the ALL TABLES IN SCHEMA publication means
there should be
+# no filtering on the tablesync COPY, so all expect all 5 will be present.

It doesn't make sense to use 'all' twice in the above comment, the
first one can be removed.

8.
+
+# setup structure on publisher
+$node_publisher->safe_psql('postgres',

I think it will be good if we can add some generic comments explaining
the purpose of the tests following this. We can add "# Tests FOR TABLE
with row filter publications" before the current comment.

9. For the newly added test for tab_rowfilter_inherited, the patch has
a test case only for initial sync, can we add a test for replication
after initial sync for the same?

-- 
With Regards,
Amit Kapila.




Re: Identify missing publications from publisher while create/alter subscription.

2022-02-10 Thread Ashutosh Sharma
On Wed, Feb 9, 2022 at 11:53 PM Euler Taveira  wrote:
>
> On Wed, Feb 9, 2022, at 12:06 PM, Ashutosh Sharma wrote:
>
> Just wondering if we should also be detecting the incorrect conninfo
> set with ALTER SUBSCRIPTION command as well. See below:
>
> -- try creating a subscription with incorrect conninfo. the command fails.
> postgres=# create subscription sub1 connection 'host=localhost
> port=5490 dbname=postgres' publication pub1;
> ERROR:  could not connect to the publisher: connection to server at
> "localhost" (::1), port 5490 failed: Connection refused
> Is the server running on that host and accepting TCP/IP connections?
> connection to server at "localhost" (127.0.0.1), port 5490 failed:
> Connection refused
> Is the server running on that host and accepting TCP/IP connections?
>
> That's because by default 'connect' parameter is true.
>

So can we use this option with the ALTER SUBSCRIPTION command? I think
we can't, which means that if the user sets a wrong conninfo using the
ALTER SUBSCRIPTION command, then we don't have the option to detect it
like we do in the case of the CREATE SUBSCRIPTION command. Since this
thread is trying to add the ability to identify a wrong/missing
publication name specified with the ALTER SUBSCRIPTION command, can't
we do the same for a wrong conninfo?

--
With Regards,
Ashutosh Sharma.




Re: POC: GROUP BY optimization

2022-02-10 Thread Andrey V. Lepikhov

On 1/22/22 01:34, Tomas Vondra wrote:




I rebased (with minor fixes) this patch onto current master.

Also, the second patch is dedicated to the problem of "varno 0" (fake_var).
I think this case should produce the same estimates as the case of
varno != 0 where no stats are found. So I propose to restrict the number
of groups to the minimum of the number of incoming tuples and the
DEFAULT_NUM_DISTINCT value.


--
regards,
Andrey Lepikhov
Postgres Professional

From d5fd0f8f981d9e457320c1007f21f2b9b74aab9e Mon Sep 17 00:00:00 2001
From: Andrey Lepikhov 
Date: Thu, 10 Feb 2022 13:51:51 +0500
Subject: [PATCH 2/2] Use default restriction for number of groups.

---
 src/backend/optimizer/path/costsize.c | 22 +++---
 1 file changed, 7 insertions(+), 15 deletions(-)

diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 68a32740d7..b9e975df10 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1756,8 +1756,8 @@ cost_recursive_union(Path *runion, Path *nrterm, Path *rterm)
 
 /*
  * is_fake_var
- *		Workaround for generate_append_tlist() which generates fake Vars with
- *		varno == 0, that will cause a fail of estimate_num_group() call
+ *		Workaround for generate_append_tlist() which generates fake Vars for the
+ *		case of "varno 0", that will cause a fail of estimate_num_group() call
  *
  * XXX Ummm, why would estimate_num_group fail with this?
  */
@@ -1978,21 +1978,13 @@ compute_cpu_sort_cost(PlannerInfo *root, List *pathkeys, int nPresortedKeys,
 														  tuplesPerPrevGroup, NULL, NULL,
 														  &cache_varinfos,
 														  list_length(pathkeyExprs) - 1);
-		else if (tuples > 4.0)
+		else
 			/*
-			 * Use geometric mean as estimation if there is no any stats.
-			 * Don't use DEFAULT_NUM_DISTINCT because it used for only one
-			 * column while here we try to estimate number of groups over
-			 * set of columns.
-			 *
-			 * XXX Perhaps this should use DEFAULT_NUM_DISTINCT at least to
-			 * limit the calculated values, somehow?
-			 *
-			 * XXX What's the logic of the following formula?
+			 * In case of full uncertainity use default defensive approach. It
+			 * means that any permutations of such vars are equivalent.
+			 * Also, see comments for the cost_incremental_sort routine.
 			 */
-			nGroups = ceil(2.0 + sqrt(tuples) * (i + 1) / list_length(pathkeys));
-		else
-			nGroups = tuples;
+			nGroups = Min(tuplesPerPrevGroup, DEFAULT_NUM_DISTINCT);
 
 		/*
 		 * Presorted keys aren't participated in comparison but still checked
-- 
2.25.1

From 7357142a0f45313aa09014c4413c011b61aafe5f Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Fri, 21 Jan 2022 20:22:14 +0100
Subject: [PATCH 1/2] GROUP BY reordering

---
 .../postgres_fdw/expected/postgres_fdw.out|  15 +-
 src/backend/optimizer/path/costsize.c | 369 +-
 src/backend/optimizer/path/equivclass.c   |  13 +-
 src/backend/optimizer/path/pathkeys.c | 580 
 src/backend/optimizer/plan/planner.c  | 653 ++
 src/backend/optimizer/util/pathnode.c |   2 +-
 src/backend/utils/adt/selfuncs.c  |  37 +-
 src/backend/utils/misc/guc.c  |  32 +
 src/include/nodes/nodes.h |   1 +
 src/include/nodes/pathnodes.h |  10 +
 src/include/optimizer/cost.h  |   4 +-
 src/include/optimizer/paths.h |  11 +
 src/include/utils/selfuncs.h  |   5 +
 src/test/regress/expected/aggregates.out  | 244 ++-
 src/test/regress/expected/guc.out |   9 +-
 .../regress/expected/incremental_sort.out |   2 +-
 src/test/regress/expected/join.out|  51 +-
 .../regress/expected/partition_aggregate.out  | 136 ++--
 src/test/regress/expected/partition_join.out  |  75 +-
 src/test/regress/expected/union.out   |  60 +-
 src/test/regress/sql/aggregates.sql   |  99 +++
 src/test/regress/sql/incremental_sort.sql |   2 +-
 22 files changed, 1913 insertions(+), 497 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index b2e02caefe..2331d13858 100644
--- a/contrib/pos

Re: Database-level collation version tracking

2022-02-10 Thread Peter Eisentraut

New patch that fixes all reported issues, I think:

- Added test for ALTER DATABASE / REFRESH COLLATION VERSION

- Rewrote AlterDatabaseRefreshCollVersion() with better locking

- Added version checking in createdb()

From 290ebb9ca743a2272181f435d5ea76d8a7280a0a Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 10 Feb 2022 09:44:20 +0100
Subject: [PATCH v4] Database-level collation version tracking

This adds to database objects the same version tracking that collation
objects have.  There is a new pg_database column datcollversion that
stores the version, a new function
pg_database_collation_actual_version() to get the version from the
operating system, and a new subcommand ALTER DATABASE ... REFRESH
COLLATION VERSION.

This was not originally added together with pg_collation.collversion,
since originally version tracking was only supported for ICU, and ICU
on a database-level is not currently supported.  But we now have
version tracking for glibc (since PG13), FreeBSD (since PG14), and
Windows (since PG13), so this is useful to have now.

Discussion: 
https://www.postgresql.org/message-id/flat/f0ff3190-29a3-5b39-a179-fa32eee57db6%40enterprisedb.com

XXX catversion bump
---
 doc/src/sgml/catalogs.sgml|  11 +
 doc/src/sgml/func.sgml|  18 ++
 doc/src/sgml/ref/alter_collation.sgml |   3 +-
 doc/src/sgml/ref/alter_database.sgml  |  12 ++
 doc/src/sgml/ref/create_database.sgml |  21 ++
 src/backend/commands/dbcommands.c | 194 +-
 src/backend/parser/gram.y |   6 +
 src/backend/tcop/utility.c|  14 +-
 src/backend/utils/init/postinit.c |  34 +++
 src/bin/initdb/initdb.c   |  12 ++
 src/bin/pg_dump/pg_dump.c |  21 ++
 src/bin/psql/tab-complete.c   |   2 +-
 src/include/catalog/pg_database.h |   3 +
 src/include/catalog/pg_proc.dat   |   5 +
 src/include/commands/dbcommands.h |   1 +
 src/include/nodes/nodes.h |   1 +
 src/include/nodes/parsenodes.h|   6 +
 .../regress/expected/collate.icu.utf8.out |   4 +
 .../regress/expected/collate.linux.utf8.out   |   4 +
 src/test/regress/sql/collate.icu.utf8.sql |   4 +
 src/test/regress/sql/collate.linux.utf8.sql   |   4 +
 21 files changed, 366 insertions(+), 14 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 879d2dbce0..5a1627a394 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -3043,6 +3043,17 @@ pg_database 
Columns
   
  
 
+ 
+  
+   datcollversion text
+  
+  
+   Provider-specific version of the collation.  This is recorded when the
+   database is created and then checked when it is used, to detect
+   changes in the collation definition that could lead to data corruption.
+  
+ 
+
  
   
datacl aclitem[]
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 1b064b4feb..df3cd5987b 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -27061,6 +27061,24 @@ Collation Management Functions

   
 
+  
+   
+
+ pg_database_collation_actual_version
+
+pg_database_collation_actual_version ( 
oid )
+text
+   
+   
+Returns the actual version of the database's collation as it is 
currently
+installed in the operating system.  If this is different from the
+value in
+
pg_database.datcollversion,
+then objects depending on the collation might need to be rebuilt.  See
+also .
+   
+  
+
   

 
diff --git a/doc/src/sgml/ref/alter_collation.sgml 
b/doc/src/sgml/ref/alter_collation.sgml
index 892c466565..a8c831d728 100644
--- a/doc/src/sgml/ref/alter_collation.sgml
+++ b/doc/src/sgml/ref/alter_collation.sgml
@@ -151,7 +151,8 @@ Notes

   
   
-   Currently, there is no version tracking for the database default collation.
+   For the database default collation, there is an analogous command
+   ALTER DATABASE ... REFRESH COLLATION VERSION.
   
 
   
diff --git a/doc/src/sgml/ref/alter_database.sgml 
b/doc/src/sgml/ref/alter_database.sgml
index 81e37536a3..89ed261b4c 100644
--- a/doc/src/sgml/ref/alter_database.sgml
+++ b/doc/src/sgml/ref/alter_database.sgml
@@ -35,6 +35,8 @@
 
 ALTER DATABASE name SET 
TABLESPACE new_tablespace
 
+ALTER DATABASE name REFRESH 
COLLATION VERSION
+
 ALTER DATABASE name SET 
configuration_parameter { TO | = } { 
value | DEFAULT }
 ALTER DATABASE name SET 
configuration_parameter FROM CURRENT
 ALTER DATABASE name RESET 
configuration_parameter
@@ -171,6 +173,16 @@ Parameters
 

 
+   
+REFRESH COLLATION VERSION
+
+ 
+  Update the database collation version.  See  for background.
+ 
+
+   
+
  
   configuration_parameter
   

Re: decoupling table and index vacuum

2022-02-10 Thread Dilip Kumar
On Wed, Feb 9, 2022 at 7:43 PM Robert Haas  wrote:
>
> On Wed, Feb 9, 2022 at 1:18 AM Dilip Kumar  wrote:

> I think that dead index tuples really don't matter if they're going to
> get removed anyway before a page split happens. In particular, if
> we're going to do a bottom-up index deletion pass before splitting the
> page, then who cares if there are going to be dead tuples around until
> then? You might think that they'd have the unfortunate effect of
> slowing down scans, and they could slow down ONE scan, but if they do,
> then I think kill_prior_tuple will hint them dead and they won't
> matter any more.

Actually, I was not worried about scans getting slow.  What I was
worried about is the worst case where, if we keep ignoring the dead
tuples for a long time, we end up with a huge fraction of dead tuples
in the index, maybe 80% to 90%, and then suddenly get a lot of
insertions for keys which can not use bottom-up deletion (due to the
key range).  At that point we have a lot of pages which contain only
dead tuples, but we will still allocate new pages, because we ignored
the dead tuple % and did not vacuum for a long time.

In short, I am worried about creating sudden bloat in the index due
to a lot of existing dead tuples.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com