Re: Out-of-memory error reports in libpq

2021-07-29 Thread Peter Smith
(This is not a code review - this is just to satisfy my curiosity)

I've seen lots of code like this where I may have been tempted to use
a ternary operator for readability, so I was wondering is there a PG
convention to avoid such ternary operator assignments, or is it simply
a personal taste thing, or is there some other reason?

For example:

if (msg)
  res->errMsg = msg;
else
  res->errMsg = libpq_gettext("out of memory\n");

VERSUS:

res->errMsg = msg ? msg : libpq_gettext("out of memory\n");

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Some code cleanup for pgbench and pg_verifybackup

2021-07-29 Thread Michael Paquier
On Wed, Jul 28, 2021 at 10:28:13AM -0400, Robert Haas wrote:
> I think all of these are reasonable fixes. In the case of
> pg_basebackup, a chmod() failure doesn't necessarily oblige us to give
> up and exit; we could presumably still write the data. But it may be
> best to give up and exit. The other cases appear to be clear bugs.

Yeah, there are advantages to both positions, but it is more natural
to me not to ignore this kind of failure.  Note the inconsistency
with initdb, for example.  So, done.
--
Michael




Re: Use WaitLatch for {pre, post}_auth_delay instead of pg_usleep

2021-07-29 Thread Kyotaro Horiguchi
At Thu, 29 Jul 2021 09:52:08 +0900, Michael Paquier  wrote 
in 
> On Wed, Jul 28, 2021 at 08:28:12PM +, Bossart, Nathan wrote:
> > On 7/28/21, 11:32 AM, "Tom Lane"  wrote:
> >> I follow the idea of using WaitLatch to ensure that the delays are
> >> interruptible by postmaster signals, but even that isn't worth a
> >> lot given the expected use of these things.  I don't see a need to
> >> expend any extra effort on wait-reporting.
> > 
> > +1.  The proposed patch doesn't make the delay visibility any worse
> > than what's already there.
> 
> Agreed to just drop the patch (my opinion about this patch is
> unchanged).  Not to mention that wait events are not available at SQL
> level at this stage yet.

I'm +1 for not adding the wait event stuff at all. So the only advantage
this patch would offer is interruptibility. I vote +-0.0 for adding that
interruptibility (down from my previous +1.0 :p).

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns

2021-07-29 Thread Masahiko Sawada
On Tue, Mar 16, 2021 at 1:35 AM Oh, Mike  wrote:
>
> Sending this to pgsql-hackers list to create a CommitFest entry with the 
> attached patch proposal.
>
>
>
> Hello,
>
> We noticed that the logical replication could fail when the 
> Standby::RUNNING_XACT record is generated in the middle of a catalog 
> modifying transaction and if the logical decoding has to restart from the 
> RUNNING_XACT
>
> WAL entry.
>
> The Standby::RUNNING_XACT record is generated periodically (roughly every 15s 
> by default) or during a CHECKPOINT operation.
>
>
>
> Detailed problem description:
>
> Tested on 11.8 & current master.
>
> The logical replication slot restart_lsn advances in the middle of an open 
> txn that modified the catalog (e.g. TRUNCATE operation).
>
> Should the logical decoding have to restart, it could fail with an error like 
> this:
>
> ERROR:  could not map filenode "base/13237/442428"

Thank you for reporting the issue.

I could reproduce this issue with the steps you shared.
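
For reference, here is a rough sketch of the failure scenario (not the exact
steps from the original report; the table name and timing are illustrative):

-- session 1 on the publisher: catalog-modifying transaction left open
BEGIN;
TRUNCATE test_tbl;   -- assigns a new relfilenode, i.e. modifies the catalog

-- session 2: force a RUNNING_XACTS record while session 1 is still open
CHECKPOINT;

-- session 1:
COMMIT;

-- Then restart the walsender / logical decoding client so that decoding
-- resumes from a restart_lsn pointing at that RUNNING_XACTS record;
-- decoding the changes for test_tbl can then fail with
--   ERROR:  could not map filenode "base/..." to relation OID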

>
> Currently, the system relies on processing Heap2::NEW_CID to keep track of 
> catalog modifying (sub)transactions.
>
> This context is lost if the logical decoding has to restart from a 
> Standby::RUNNING_XACTS that is written between the NEW_CID record and its 
> parent txn commit.
>
> If the logical stream restarts from this restart_lsn, then it doesn't have 
> the xid responsible for modifying the catalog.
>

I agree with your analysis. Since we don't use the commit WAL record to
track transactions that have modified system catalogs, if we decode
only the commit record of such a transaction, we cannot know that the
transaction has modified system catalogs, resulting in subsequent
transactions scanning the system catalogs with the wrong snapshot.

With the patch, if the commit WAL record has an XACT_XINFO_HAS_INVALS
flag and its xid is included in the RUNNING_XACT record written at
restart_lsn, we forcibly add the top XID and its sub XIDs as a
committed transaction that has modified system catalogs to the
snapshot. I might be missing something about your patch but I have
some comments on this approach:

1. The commit WAL record may not have an invalidation message for system
catalogs (e.g., when the commit record has only a relcache invalidation
message) even if it has the XACT_XINFO_HAS_INVALS flag. In this case, the
transaction is wrongly added to the snapshot; is that okay?

2. We might add a subtransaction XID as a committed transaction that
has modified system catalogs even if it actually didn't. As the
comment in SnapBuildBuildSnapshot() describes, we track only the
transactions that have modified the system catalog and store them in
the snapshot (in the 'xip' array). The patch could break that assumption.
However, I'm really not sure how to deal with this point. We cannot
know which subtransaction has actually modified system catalogs by
using only the commit WAL record.

3. The patch covers only the case where the restart_lsn exactly
matches the LSN of RUNNING_XACT. I wonder if there could be a case
where the decoding starts at a WAL record other than RUNNING_XACT but
the next WAL record is RUNNING_XACT.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: when the startup process doesn't (logging startup delays)

2021-07-29 Thread Nitin Jadhav
> The InitPostgres() case occurs when the server is started in bootstrap
> mode (during initdb) or in single-user mode (postgres --single). I do
> not see any reason why we shouldn't produce progress messages in at
> least the latter case. I suspect that someone who is in the rather
> desperate scenario of having to use single-user mode would really like
> to know how long the server is going to take to start up.

Thanks for sharing the information. I have made the necessary changes
to show the logs in the latter case (postgres --single) and
verified the log messages.

> +1 to emit the same log messages in single-user mode and basically
> whoever calls StartupXLOG. Do we need to adjust the GUC parameter
> log_startup_progress_interval(a reasonable value) so that the logs are
> generated by default?

At present, this feature is enabled by default and the initial value
set for log_startup_progress_interval is 10 seconds.
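
With this patch applied, that corresponds to something like the following
(a sketch; the exact GUC semantics are defined by the patch):

# postgresql.conf
log_startup_progress_interval = 10     # in seconds, per this patch version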


On Wed, Jul 28, 2021 at 9:07 PM Robert Haas  wrote:
>
> On Wed, Jul 28, 2021 at 11:25 AM Bharath Rupireddy
>  wrote:
> > > Perhaps during initdb we don't want messages, but I'm not sure that we
> > > need to do anything about that here. None of the messages that the
> > > server normally produces show up when you run initdb, so I guess they
> > > are getting redirected to /dev/null or something.
> >
> > I enabled the below log message in XLogFlush and ran initdb, it is
> > printing these logs onto the stdout, looks like the logs have not been
> > redirected to /dev/null in initdb/bootstrap mode.
> >
> > #ifdef WAL_DEBUG
> > if (XLOG_DEBUG)
> > elog(LOG, "xlog flush request %X/%X; write %X/%X; flush %X/%X",
> > LSN_FORMAT_ARGS(record),
> > LSN_FORMAT_ARGS(LogwrtResult.Write),
> > LSN_FORMAT_ARGS(LogwrtResult.Flush));
> > #endif
> >
> > So, even in bootstrap mode, can we use the above #ifdef WAL_DEBUG and
> > XLOG_DEBUG to print those logs? It looks like the problem with these
> > macros is that they are not settable by normal users in the production
> > environment, but only by the developers. I'm not sure how much it is
> > helpful to print the startup process progress logs in bootstrap mode.
> > Maybe we can use the IsBootstrapProcessingMode macro to disable these
> > logs in the bootstrap mode.
>
> I don't think we should drag XLOG_DEBUG into this. That's a debugging
> facility that isn't really relevant to the topic at hand. The point is
> the server normally prints a bunch of messages that we don't see in
> bootstrap mode. For example:
>
> [rhaas pgsql]$ postgres
> 2021-07-28 11:32:33.824 EDT [36801] LOG:  starting PostgreSQL 15devel
> on x86_64-apple-darwin19.6.0, compiled by clang version 5.0.2
> (tags/RELEASE_502/final), 64-bit
> 2021-07-28 11:32:33.825 EDT [36801] LOG:  listening on IPv6 address
> "::1", port 5432
> 2021-07-28 11:32:33.825 EDT [36801] LOG:  listening on IPv4 address
> "127.0.0.1", port 5432
> 2021-07-28 11:32:33.826 EDT [36801] LOG:  listening on Unix socket
> "/tmp/.s.PGSQL.5432"
> 2021-07-28 11:32:33.846 EDT [36802] LOG:  database system was shut
> down at 2021-07-28 11:32:32 EDT
> 2021-07-28 11:32:33.857 EDT [36801] LOG:  database system is ready to
> accept connections
>
> None of that shows up when you run initdb. Taking a fast look at the
> code, I don't think the reasons are the same for all of those
> messages. Some of the code isn't reached, whereas e.g. "database
> system was shut down at 2021-07-28 11:32:32 EDT" is special-cased. I'm
> not sure right off the top of my head what this code should do, but
> ideally it looks something like one of the cases we've already got.
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com
From 55c73bb0920e3487b72d2588c43c1c5e8798e6d4 Mon Sep 17 00:00:00 2001
From: Nitin 
Date: Thu, 29 Jul 2021 14:11:24 +0530
Subject: [PATCH] Shows the progress of the operations performed during startup
 process. The interval between each progress update is configurable and it
 should be mentioned in seconds

---
 src/backend/access/transam/xlog.c | 163 ++
 src/backend/postmaster/startup.c  |   1 +
 src/backend/storage/file/fd.c |  13 ++
 src/backend/storage/file/reinit.c |  17 +++
 src/backend/utils/init/postinit.c |   1 +
 src/backend/utils/misc/guc.c  |  12 ++
 src/backend/utils/misc/postgresql.conf.sample |   6 +
 src/include/access/xlog.h |  18 +++
 src/include/utils/timeout.h   |   1 +
 9 files changed, 232 insertions(+)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 3479402..5a39da7 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -79,6 +79,7 @@
 #include "utils/relmapper.h"
 #include "utils/pg_rusage.h"
 #include "utils/snapmgr.h"
+#include "utils/timeout.h"
 #include "utils/timestamp.h"
 
 extern uint32 bootstrap_data_checksum_version;
@@ -397,6 +398,23 @@ static bool doRequestWalReceiverReply;
  */
 s

Re: pg_receivewal starting position

2021-07-29 Thread Ronan Dunklau
Le jeudi 29 juillet 2021, 07:32:37 CEST Kyotaro Horiguchi a écrit :
> I didn't thought in details.  But I forgot that ordinary SQL commands
> have been prohibited in physical replication connection.  So we need a
> new replication command but it's not that a big deal.

Thank you for your feedback!


> 
> > I don't see any reason not to make it work for logical replication
> > connections / slots, but it wouldn't be that useful since we can query
> > the database in that case.
> 
> Ordinary SQL queries are usable on a logical replication slot so
> I'm not sure how logical replication connection uses the command.
> However, like you, I wouldn't bother restricting the command to
> physical replication, but perhaps the new command should return the
> slot type.
> 

Ok done in the attached patch.

> 
> I'm not sure it's worth adding complexity for such strictness.
> START_REPLICATION safely fails if someone steals the slot meanwhile.
> In the first place there's no means to protect a slot from others
> while idle.  One possible problem is the case where START_REPLICATION
> successfully acquire the slot after the new command failed.  But that
> case doesn't seem worse than the case someone advances the slot while
> absence.  So I think READ_REPLICATION_SLOT is sufficient.
> 

Ok, I implemented it like this. I tried to follow the pg_get_replication_slots 
approach with regards to how to prevent concurrent modification while reading 
the slot.

> > From pg_receivewal point of view, this would amount to:
> >  - check if we currently have wal in the target directory.
> >  
> >- if we do, proceed as currently done, by computing the start lsn and
> > 
> > timeline from the last archived wal
> > 
> >   - if we don't, and we have a slot, run ACQUIRE_REPLICATION_SLOT. Use the
> > 
> > restart_lsn as the start lsn if there is one, and don't provide a timeline
> > 
> >   - if we still don't have a start_lsn, fallback to using the current
> >   server
> > 
> > wal position as is done.
> 
> That's pretty much it.

Great.

> 
> > What do you think ? Which information should we provide about the slot ?
> 
> We need the timeline id to start with when using restart_lsn.  The
> current timeline can be used in most cases but there's a case where
> the LSN is historical.

Ok, see below.

> 
> pg_receivewal doesn't send a replication status report when a segment
> is finished. So after pg_receivewal stops just after a segment is
> finished, the slot stays at the beginning of the last segment.  Thus
> next time it will start from there, creating a duplicate segment.

I'm not sure I see where the problem is here. If we don't keep the segments in 
pg_receivewal's target directory, then it is the responsibility of 
whoever moved them to make sure we don't have duplicates, or to handle them 
gracefully. 

Even if we were forcing a feedback message after a segment is finished, there 
could still be a problem if the feedback never made it to the server but the 
segment was there. It might be interesting to send feedback anyway.

Please find attached two patches implementing what we've been discussing.

Patch 0001 adds the new READ_REPLICATION_SLOT command. 
For a given slot it returns the type, restart_lsn, flush_lsn, 
restart_lsn_timeline and flush_lsn_timeline.
The timelines are determined by reading the current timeline history and 
finding the timeline where the record may be found. I didn't find an explicit 
test for e.g. IDENTIFY_SYSTEM, so I didn't write one for this new command 
either, but it is tested indirectly in patch 0002.
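
For illustration, the command is intended to be run on a replication
connection, along these lines (a sketch based on the description above; the
exact syntax and output columns are defined by the patch):

-- on a physical replication connection, e.g. psql "replication=true"
READ_REPLICATION_SLOT my_slot;
-- returns: type, restart_lsn, flush_lsn, restart_lsn_timeline, flush_lsn_timeline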

Patch 0002 makes pg_receivewal use that command if we use a replication slot 
and the command is available, using the restart_lsn and restart_lsn_timeline 
as the starting point. It also adds a small test to check that we start back 
from the previous restart_lsn instead of the current flush position when our 
destination directory does not contain any WAL file. 

I also noticed we don't test following a timeline switch. It would probably be 
good to add that, both for the case where we determine the previous timeline 
from the archived segments and for the case where it comes from the new 
command. What do you think?


Regards,

-- 
Ronan Dunklau

From 37d9545c05b9e36aafac751f9dc549e75798413c Mon Sep 17 00:00:00 2001
From: Ronan Dunklau 
Date: Wed, 28 Jul 2021 16:34:54 +0200
Subject: [PATCH v1 1/2] Add READ_REPLICATION_SLOT command.

This commit introduces a new READ_REPLICATION_SLOT  command.
This command is used to read information about a replication slot when
using a physical replication connection.

In this first version it returns the slot type, restart_lsn, flush_lsn and
the timeline of the restart_lsn and flush_lsn, which are obtained by following the
current timeline history.
---
 doc/src/sgml/protocol.sgml |  62 +++
 src/backend/replication/repl_gram.y|  18 -
 src/backend/replication/repl_scanner.l |   1 +
 src/backend/replication/walsender.c| 106 +
 src/

Re: Slightly improve initdb --sync-only option's help message

2021-07-29 Thread Daniel Gustafsson
> On 26 Jul 2021, at 20:05, Bossart, Nathan  wrote:

> Here are my suggestions in patch form.

-   printf(_("  -S, --sync-only   only sync data directory\n"));
+   printf(_("  -S, --sync-only   safely write all database files 
to disk and exit\n"));

I think removing the word "only" here is a net loss, since it IMO clearly
conveyed that this option isn't doing any of the other initdb duties.  The
documentation states it in more words, but the help output must be brief so I
think "only" is a good keyword.

--
Daniel Gustafsson   https://vmware.com/





Re: [PoC] Improve dead tuple storage for lazy vacuum

2021-07-29 Thread Masahiko Sawada
On Thu, Jul 29, 2021 at 3:53 AM Andres Freund  wrote:
>
> Hi,
>
> On 2021-07-27 13:06:56 +0900, Masahiko Sawada wrote:
> > Apart from performance and memory usage points of view, we also need
> > to consider the reusability of the code. When I started this thread, I
> > thought the best data structure would be the one optimized for
> > vacuum's dead tuple storage. However, if we can use a data structure
> > that can also be used in general, we can use it also for other
> > purposes. Moreover, if it's too optimized for the current TID system
> > (32 bits block number, 16 bits offset number, maximum block/offset
> > number, etc.) it may become a blocker for future changes.
>
> Indeed.
>
>
> > In that sense, radix tree also seems good since it can also be used in
> > gist vacuum as a replacement for intset, or a replacement for hash
> > table for shared buffer as discussed before. Are there any other use
> > cases?
>
> Yes, I think there are. Whenever there is some spatial locality it has a
> decent chance of winning over a hash table, and it will most of the time
> win over ordered datastructures like rbtrees (which perform very poorly
> due to the number of branches and pointer dispatches). There's plenty
> hashtables, e.g. for caches, locks, etc, in PG that have a medium-high
> degree of locality, so I'd expect a few potential uses. When adding
> "tree compression" (i.e. skip inner nodes that have a single incoming &
> outgoing node) radix trees even can deal quite performantly with
> variable width keys.

Good point.

>
> > On the other hand, I’m concerned that radix tree would be an
> > over-engineering in terms of vacuum's dead tuples storage since the
> > dead tuple storage is static data and requires only lookup operation,
> > so if we want to use radix tree as dead tuple storage, I'd like to see
> > further use cases.
>
> I don't think we should rely on the read-only-ness. It seems pretty
> clear that we'd want parallel dead-tuple scans at a point not too far
> into the future?

Indeed. Given that the radix tree itself has other use cases, I have
no concern about using radix tree for vacuum's dead tuples storage. It
will be better to have one that can be generally used and has some
optimizations that are helpful also for vacuum's use case, rather than
having one that is very optimized only for vacuum's use case.

During the performance benchmark, I found some bugs in the radix tree
implementation. Also, we need the functionality of tree iteration, and
if we have the radix tree in the source tree as a general library, we
need some changes, since the current implementation seems to be a
replacement for the shared buffer's hash table. I'll try to work on that
stuff as a PoC if you don't. What do you think?

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: ECPG bug fix: DECLARE STATEMENT and DEALLOCATE, DESCRIBE

2021-07-29 Thread Michael Meskes

All,

> between DECLARE and the past queries qualify as an open item.  I am
> adding Michael Meskes in CC.  I got to wonder how much of a
> compatibility break it would be for DEALLOCATE and DESCRIBE to handle
> EXEC SQL AT in a way more consistent than DECLARE, even if these are
> bounded to a result set, and not a connection.

I just wanted to let you know that I'm well aware of this thread but
need to get through my backlog before I can comment. Sorry for the
delay.

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De
Michael at Meskes dot (De|Com|Net|Org)
Meskes at (Debian|Postgresql) dot Org




Doc: Fixed the result of the bit_count example

2021-07-29 Thread wangzk.fns...@fujitsu.com
Hello:

There is a small problem with the documentation for the previously added SQL 
function “bit_count”.
In the doc (https://www.postgresql.org/docs/14/functions-binarystring.html),
the result of “bit_count('\x1234567890'::bytea)” is shown as "31", but the 
actual result is "15".

Similar problems have been fixed in the test file (commit 
ebedd0c78fc51c293abe56e99a18c67af14da0c9), but the doc has not been modified.
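
For reference, the corrected example looks like this:

SELECT bit_count('\x1234567890'::bytea);
 bit_count
-----------
        15
(1 row)
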
Regards,
Wang


0001-modify-doc-about-bit_count.patch
Description: 0001-modify-doc-about-bit_count.patch


Re: Doc: Fixed the result of the bit_count example

2021-07-29 Thread Daniel Gustafsson
> On 29 Jul 2021, at 11:23, wangzk.fns...@fujitsu.com wrote:
> 
> Hello:
> 
> There is a small problem with the documentation for the previously added SQL 
> function “bit_count”.
> 
> In the doc (https://www.postgresql.org/docs/14/functions-binarystring.html),
> the result of “bit_count('\x1234567890'::bytea)” is shown as "31", but the 
> actual result is "15".
> 
> Similar problems have been fixed in test file (commit: 
> ebedd0c78fc51c293abe56e99a18c67af14da0c9),  but the doc has not been modified.

Good catch; with the default value of standard_conforming_strings the result is
indeed 15, and the example should be updated to reflect that.  I'll apply this
shortly, backpatched to 14 where bit_count was introduced.

--
Daniel Gustafsson   https://vmware.com/





Re: Out-of-memory error reports in libpq

2021-07-29 Thread Andrew Dunstan


On 7/29/21 3:01 AM, Peter Smith wrote:
> (This is not a code review - this is just to satisfy my curiosity)
>
> I've seen lots of code like this where I may have been tempted to use
> a ternary operator for readability, so I was wondering is there a PG
> convention to avoid such ternary operator assignments, or is it simply
> a personal taste thing, or is there some other reason?
>
> For example:
>
> if (msg)
>   res->errMsg = msg;
> else
>   res->errMsg = libpq_gettext("out of memory\n");
>
> VERSUS:
>
> res->errMsg = msg ? msg : libpq_gettext("out of memory\n");
>


A simple grep on the sources should disabuse you of any idea that there
is such a convention. The code is littered with examples of the ?: operator.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Replace l337sp34k in comments.

2021-07-29 Thread Andrew Dunstan


On 7/27/21 7:39 PM, Peter Smith wrote:
> IMO the PG code comments are not an appropriate place for leetspeak 
> creativity.
>
> PSA a patch to replace a few examples that I recently noticed.
>
> "up2date" --> "up-to-date"
>

Personally, I would have written this as just "up to date"; I don't
think the hyphens are required.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Dave Cramer
On Thu, 29 Jul 2021 at 00:35, David G. Johnston 
wrote:

> On Wed, Jul 28, 2021 at 6:52 PM Bruce Momjian  wrote:
>
>> I came up with the attached patch.
>>
>
> Thank you.  It is an improvement but I think more could be done here (not
> exactly sure what - though removing the "copy binaries for contrib modules
> from the old server" seems like a decent second step.)
>
> I'm not sure it really needs a parenthetical, and I personally dislike
> using "Consider" to start the sentence.
>
> "Bringing extensions up to the newest version available on the new server
> can be done later using ALTER EXTENSION UPGRADE (after ensuring the correct
> binaries are installed)."
>


As for the patch: what exactly is being copied?
This is all very confusing. Some of the extensions will work perfectly fine
on the new server without an upgrade. At least one of the extensions will
appear to function perfectly fine with new binaries and old function
definitions.
It seems to me we need to do more.

Dave


Re: [PoC] Improve dead tuple storage for lazy vacuum

2021-07-29 Thread Yura Sokolov

Masahiko Sawada wrote on 2021-07-29 12:11:
On Thu, Jul 29, 2021 at 3:53 AM Andres Freund  
wrote:


Hi,

On 2021-07-27 13:06:56 +0900, Masahiko Sawada wrote:
> Apart from performance and memory usage points of view, we also need
> to consider the reusability of the code. When I started this thread, I
> thought the best data structure would be the one optimized for
> vacuum's dead tuple storage. However, if we can use a data structure
> that can also be used in general, we can use it also for other
> purposes. Moreover, if it's too optimized for the current TID system
> (32 bits block number, 16 bits offset number, maximum block/offset
> number, etc.) it may become a blocker for future changes.

Indeed.


> In that sense, radix tree also seems good since it can also be used in
> gist vacuum as a replacement for intset, or a replacement for hash
> table for shared buffer as discussed before. Are there any other use
> cases?

Yes, I think there are. Whenever there is some spatial locality it has a
decent chance of winning over a hash table, and it will most of the time
win over ordered datastructures like rbtrees (which perform very poorly
due to the number of branches and pointer dispatches). There's plenty
hashtables, e.g. for caches, locks, etc, in PG that have a medium-high
degree of locality, so I'd expect a few potential uses. When adding
"tree compression" (i.e. skip inner nodes that have a single incoming &
outgoing node) radix trees even can deal quite performantly with
variable width keys.


Good point.



> On the other hand, I’m concerned that radix tree would be an
> over-engineering in terms of vacuum's dead tuples storage since the
> dead tuple storage is static data and requires only lookup operation,
> so if we want to use radix tree as dead tuple storage, I'd like to see
> further use cases.

I don't think we should rely on the read-only-ness. It seems pretty
clear that we'd want parallel dead-tuple scans at a point not too far
into the future?


Indeed. Given that the radix tree itself has other use cases, I have
no concern about using radix tree for vacuum's dead tuples storage. It
will be better to have one that can be generally used and has some
optimizations that are helpful also for vacuum's use case, rather than
having one that is very optimized only for vacuum's use case.


Main portion of svtm that leads to memory saving is compression of many
pages at once (CHUNK). It could be combined with radix as a storage for
pointers to CHUNKs.

At the moment I'm benchmarking an IntegerSet replacement based on a Trie
(HATM like) and CHUNK compression, therefore the datastructure could be used
for gist vacuum as well.

Since it is generic (it allows indexing all 64 bits) it lacks the trick used
to speed up svtm. Still, on the 10x test it is faster than radix.

I'll send result later today after all benchmarks complete.

And I'll try then to make mix of radix and CHUNK compression.


During the performance benchmark, I found some bugs in the radix tree
implementation.


There is a bug in radix_to_key_off as well:

tid_i |= ItemPointerGetBlockNumber(tid) << shift;

ItemPointerGetBlockNumber returns uint32, therefore result after shift
is uint32 as well.

It leads to lesser memory consumption (and therefore better times) on
10x test, when page number exceed 2^23 (8M). It still produce "correct"
result for test since every page is filled in the same way.

Could you push your fixes for radix, please?

regards,
Yura Sokolov

y.soko...@postgrespro.ru
funny.fal...@gmail.com




Re: [Patch] ALTER SYSTEM READ ONLY

2021-07-29 Thread Dilip Kumar
On Wed, Jul 28, 2021 at 5:03 PM Amul Sul  wrote:
>
> I was too worried about how I could miss that & after thinking more
> about that, I realized that the operation for ArchiveRecoveryRequested
> is never going to be skipped in the startup process and that never
> left for the checkpoint process to do that later. That is the reason
> that assert was added there.
>
> When ArchiveRecoveryRequested, the server will no longer be in
> the wal prohibited mode, we implicitly change the state to
> wal-permitted. Here is the snip from the 0003 patch:
>
> @@ -6614,13 +6629,30 @@ StartupXLOG(void)
>   (errmsg("starting archive recovery")));
>   }
>
> - /*
> - * Take ownership of the wakeup latch if we're going to sleep during
> - * recovery.
> - */
>   if (ArchiveRecoveryRequested)
> + {
> + /*
> + * Take ownership of the wakeup latch if we're going to sleep during
> + * recovery.
> + */
>   OwnLatch(&XLogCtl->recoveryWakeupLatch);
>
> + /*
> + * Since archive recovery is requested, we cannot be in a wal prohibited
> + * state.
> + */
> + if (ControlFile->wal_prohibited)
> + {
> + /* No need to hold ControlFileLock yet, we aren't up far enough */
> + ControlFile->wal_prohibited = false;
> + ControlFile->time = (pg_time_t) time(NULL);
> + UpdateControlFile();
> +

Is there some reason why we are forcing 'wal_prohibited' to off if we
are doing archive recovery?  It might have already been discussed, but
I could not find it on a quick look into the thread.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Out-of-memory error reports in libpq

2021-07-29 Thread Ranier Vilela
On Thu, Jul 29, 2021 at 00:40, Tom Lane wrote:

> Ranier Vilela  writes:
> > IMO, I think that "char *msg" is unnecessary, isn't it?
>
> > + if (!PQExpBufferBroken(errorMessage))
> > + res->errMsg = pqResultStrdup(res, errorMessage->data);
> >   else
> > - res->errMsg = NULL;
> > + res->errMsg = libpq_gettext("out of memory\n");
>
> Please read the comment.
>
You're right, I missed the pqResultStrdup failure case.

+1

regards,
Ranier Vilela


Re: Out-of-memory error reports in libpq

2021-07-29 Thread Ranier Vilela
On Thu, Jul 29, 2021 at 04:02, Peter Smith wrote:

> (This is not a code review - this is just to satisfy my curiosity)
>
> I've seen lots of code like this where I may have been tempted to use
> a ternary operator for readability, so I was wondering is there a PG
> convention to avoid such ternary operator assignments, or is it simply
> a personal taste thing, or is there some other reason?
>
> For example:
>
> if (msg)
>   res->errMsg = msg;
> else
>   res->errMsg = libpq_gettext("out of memory\n");
>
The C compiler will expand:

res->errMsg = msg ? msg : libpq_gettext("out of memory\n");

to

if (msg)
 res->errMsg = msg;
else
 res->errMsg = libpq_gettext("out of memory\n");

Which IMHO is much more readable.

regards,
Ranier Vilela


Re: [Patch] ALTER SYSTEM READ ONLY

2021-07-29 Thread Amul Sul
On Thu, Jul 29, 2021 at 4:47 PM Dilip Kumar  wrote:
>
> On Wed, Jul 28, 2021 at 5:03 PM Amul Sul  wrote:
> >
> > I was too worried about how I could miss that & after thinking more
> > about that, I realized that the operation for ArchiveRecoveryRequested
> > is never going to be skipped in the startup process and that never
> > left for the checkpoint process to do that later. That is the reason
> > that assert was added there.
> >
> > When ArchiveRecoveryRequested, the server will no longer be in
> > the wal prohibited mode, we implicitly change the state to
> > wal-permitted. Here is the snip from the 0003 patch:
> >
> > @@ -6614,13 +6629,30 @@ StartupXLOG(void)
> >   (errmsg("starting archive recovery")));
> >   }
> >
> > - /*
> > - * Take ownership of the wakeup latch if we're going to sleep during
> > - * recovery.
> > - */
> >   if (ArchiveRecoveryRequested)
> > + {
> > + /*
> > + * Take ownership of the wakeup latch if we're going to sleep during
> > + * recovery.
> > + */
> >   OwnLatch(&XLogCtl->recoveryWakeupLatch);
> >
> > + /*
> > + * Since archive recovery is requested, we cannot be in a wal prohibited
> > + * state.
> > + */
> > + if (ControlFile->wal_prohibited)
> > + {
> > + /* No need to hold ControlFileLock yet, we aren't up far enough */
> > + ControlFile->wal_prohibited = false;
> > + ControlFile->time = (pg_time_t) time(NULL);
> > + UpdateControlFile();
> > +
>
> Is there some reason why we are forcing 'wal_prohibited' to off if we
> are doing archive recovery?  It might have already been discussed, but
> I could not find it on a quick look into the thread.
>

Here is: 
https://postgr.es/m/CA+TgmoZ=cctbaxxmtyzogxegqzoz9smkbwrdpsacpjvfcgc...@mail.gmail.com

Regards,
Amul




Re: [HACKERS] logical decoding of two-phase transactions

2021-07-29 Thread Amit Kapila
On Tue, Jul 27, 2021 at 11:41 AM Peter Smith  wrote:
>
> Please find attached the latest patch set v99*
>
> v98-0001 --> split into v99-0001 + v99-0002
>

Pushed the first refactoring patch after making few modifications as below.
1.
- /* open the spool file for the committed transaction */
+ /* Open the spool file for the committed/prepared transaction */
  changes_filename(path, MyLogicalRepWorker->subid, xid);

In the above comment, we don't need to say prepared. It can be done as
part of the second patch.

2.
+apply_handle_prepare_internal(LogicalRepPreparedTxnData
*prepare_data, char *gid)

I don't think there is any need for this function to take gid as
input. It can compute by itself instead of callers doing it.

3.
+static TransactionId
+logicalrep_read_prepare_common(StringInfo in, char *msgtype,
+   LogicalRepPreparedTxnData *prepare_data)

I don't think the above function needs to return xid because it is
already present as part of prepare_data. Even, if it is required due
to some reason for the second patch then let's do it as part of if but
I don't think it is required for the second patch.

4.
 /*
- * Write PREPARE to the output stream.
+ * Common code for logicalrep_write_prepare and
logicalrep_write_stream_prepare.
  */

Here and at a similar another place, we don't need to refer to
logicalrep_write_stream_prepare as that is part of the second patch.

Few comments on 0002 patch:
==
1.
+# -
+# 2PC + STREAMING TESTS
+# -
+
+# Setup logical replication (streaming = on)
+
+$node_B->safe_psql('postgres', "
+ ALTER SUBSCRIPTION tap_sub_B
+ SET (streaming = on);");
+
+$node_C->safe_psql('postgres', "
+ ALTER SUBSCRIPTION tap_sub_C
+ SET (streaming = on)");
+
+# Wait for subscribers to finish initialization
+$node_A->wait_for_catchup($appname_B);
+$node_B->wait_for_catchup($appname_C);

This is not the right way to determine if the new streaming option is
enabled on the publisher. Even if there is no restart of apply workers
(and walsender) after you have enabled the option, the above wait will
succeed. You need to do something like below as we are doing in
001_rep_changes.pl:

$oldpid = $node_publisher->safe_psql('postgres',
"SELECT pid FROM pg_stat_replication WHERE application_name = 'tap_sub';"
);
$node_subscriber->safe_psql('postgres',
"ALTER SUBSCRIPTION tap_sub SET PUBLICATION tap_pub_ins_only WITH
(copy_data = false)"
);
$node_publisher->poll_query_until('postgres',
"SELECT pid != $oldpid FROM pg_stat_replication WHERE application_name
= 'tap_sub';"
) or die "Timed out while waiting for apply to restart";

2.
+# Create some pre-existing content on publisher (uses same DDL as
015_stream test)

Here, in the comments, I don't see the need to say "uses same DDL" ...

-- 
With Regards,
Amit Kapila.




Re: Replace l337sp34k in comments.

2021-07-29 Thread Geoff Winkless
On Thu, 29 Jul 2021 at 11:22, Andrew Dunstan  wrote:

> Personally, I would have written this as just "up to date", I don't
> think the hyphens are required.
>

FWIW Merriam-Webster and the CED suggest "up-to-date" when before a noun,
so the changes should be "up-to-date answer" but "are up to date".

https://dictionary.cambridge.org/dictionary/english/up-to-date

Geoff


Re: Out-of-memory error reports in libpq

2021-07-29 Thread Tom Lane
Andrew Dunstan  writes:
> On 7/29/21 3:01 AM, Peter Smith wrote:
>> I've seen lots of code like this where I may have been tempted to use
>> a ternary operator for readability, so I was wondering is there a PG
>> convention to avoid such ternary operator assignments, or is it simply
>> a personal taste thing, or is there some other reason?

> A simple grep on the sources should disabuse you of any idea that there
> is such a convention. The code is littered with examples of the ?: operator.

Yeah.  I happened not to write it that way here, but if I'd been reviewing
someone else's code and they'd done it that way, I'd not have objected.

In the case at hand, I'd personally avoid a ternary op for the first
assignment because then the line would run over 80 characters, and
you'd have to make decisions about where to break it.  (We don't have
a standardized convention about that, and none of the alternatives
look very good to my eye.)  Then it seemed to make sense to also
write the second step as an "if" not a ternary op.

regards, tom lane




Re: Showing applied extended statistics in explain

2021-07-29 Thread Dmitry Dolgov
> On Tue, Jul 27, 2021 at 10:20:34PM +0200, Tomas Vondra wrote:
>
> >> 1) The information is stashed in multiple lists added to a Plan. Maybe
> >> there's a better place, and maybe we need to invent a better way to
> >> track the info (a new node stashed in a single List).
> >>
> >> ...
> >>
> >> 3) It does not work for functional dependencies, because we effectively
> >> "merge" all functional dependencies and apply the entries. Not sure how
> >> to display this, but I think it should show the individual dependencies
> >> actually applied.
> >>
> >> ...
> >>
> >> 5) It includes just statistics name + clauses, but maybe we should
> >> include additional info (e.g estimate for that combination of clauses).
> >
> > Yes, a new node would be nice to have. The other questions above are
> > somewhat related to what it should contain, and I guess it depends on
> > the use case this patch targets. E.g. for the case "help to figure out
> > if an extended statistics was applied" even "merged" functional
> > dependencies would be fine I believe.
>
> What do you mean by "merged" functional dependencies? I guess we could
> say "these clauses were estimated using dependencies" without listing
> which exact dependencies were applied.

Yes, that's exactly what I was thinking. From the original email I got
the impression that in the case of functional dependencies you plan to
display the info only with the individual dependencies (when
implemented) or nothing at all. By "merged" I wanted to refer to the
statement about "merging" all functional dependencies and applying the
entries.

> > More advanced plan troubleshooting may benefit from estimates.
>
> I'm sorry, I don't know what you mean by this. Can you elaborate?

Yeah, sorry for not being clear. The idea was that the question about including
"additional info" strongly depends on which use cases the patch tries to
address, and I was following up on that. There is no hidden detailed
meaning here :)

> > I've got few more questions after reading the patch:
> >
> > * Is there any particular reason behind choosing only some scan nodes to
> >   display extended stats? E.g. BitmapHeapScan is missing.
> >
>
> All nodes that may apply extended stats to estimate quals should include
> this info. I guess BitmapHeapScan may do that for the filter, right?
>

Yes, something like this (stats output added by me):

 Bitmap Heap Scan on public.tenk1
   Output: unique1
   Recheck Cond: (tenk1.unique1 < 1000)
   Filter: (tenk1.stringu1 = 'xxx'::name)
   Statistics: public.s Clauses: ((unique1 < 1000) AND (stringu1 = 
'xxx'::name))
   ->  Bitmap Index Scan on tenk1_unique1
 Index Cond: (tenk1.unique1 < 1000)
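
For context, the "public.s" statistics object shown above could have been
created with something along these lines (a guess for illustration; its
actual definition isn't shown in this thread):

CREATE STATISTICS s (mcv) ON unique1, stringu1 FROM tenk1;
ANALYZE tenk1;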




Re: [PoC] Improve dead tuple storage for lazy vacuum

2021-07-29 Thread Masahiko Sawada
On Thu, Jul 29, 2021 at 8:03 PM Yura Sokolov  wrote:
>
> Masahiko Sawada wrote on 2021-07-29 12:11:
> > On Thu, Jul 29, 2021 at 3:53 AM Andres Freund 
> > wrote:
> >>
> >> Hi,
> >>
> >> On 2021-07-27 13:06:56 +0900, Masahiko Sawada wrote:
> >> > Apart from performance and memory usage points of view, we also need
> >> > to consider the reusability of the code. When I started this thread, I
> >> > thought the best data structure would be the one optimized for
> >> > vacuum's dead tuple storage. However, if we can use a data structure
> >> > that can also be used in general, we can use it also for other
> >> > purposes. Moreover, if it's too optimized for the current TID system
> >> > (32 bits block number, 16 bits offset number, maximum block/offset
> >> > number, etc.) it may become a blocker for future changes.
> >>
> >> Indeed.
> >>
> >>
> >> > In that sense, radix tree also seems good since it can also be used in
> >> > gist vacuum as a replacement for intset, or a replacement for hash
> >> > table for shared buffer as discussed before. Are there any other use
> >> > cases?
> >>
> >> Yes, I think there are. Whenever there is some spatial locality it has
> >> a
> >> decent chance of winning over a hash table, and it will most of the
> >> time
> >> win over ordered datastructures like rbtrees (which perform very
> >> poorly
> >> due to the number of branches and pointer dispatches). There's plenty
> >> hashtables, e.g. for caches, locks, etc, in PG that have a medium-high
> >> degree of locality, so I'd expect a few potential uses. When adding
> >> "tree compression" (i.e. skip inner nodes that have a single incoming
> >> &
> >> outgoing node) radix trees even can deal quite performantly with
> >> variable width keys.
> >
> > Good point.
> >
> >>
> >> > On the other hand, I’m concerned that radix tree would be an
> >> > over-engineering in terms of vacuum's dead tuples storage since the
> >> > dead tuple storage is static data and requires only lookup operation,
> >> > so if we want to use radix tree as dead tuple storage, I'd like to see
> >> > further use cases.
> >>
> >> I don't think we should rely on the read-only-ness. It seems pretty
> >> clear that we'd want parallel dead-tuple scans at a point not too far
> >> into the future?
> >
> > Indeed. Given that the radix tree itself has other use cases, I have
> > no concern about using radix tree for vacuum's dead tuples storage. It
> > will be better to have one that can be generally used and has some
> > optimizations that are helpful also for vacuum's use case, rather than
> > having one that is very optimized only for vacuum's use case.
>
> Main portion of svtm that leads to memory saving is compression of many
> pages at once (CHUNK). It could be combined with radix as a storage for
> pointers to CHUNKs.
>
> At the moment I'm benchmarking an IntegerSet replacement based on a Trie
> (HATM like) and CHUNK compression, therefore the datastructure could be used
> for gist vacuum as well.
>
> Since it is generic (it allows indexing all 64 bits) it lacks the trick used
> to speed up svtm. Still, on the 10x test it is faster than radix.

BTW, how does svtm work when we add two sets of dead tuple TIDs to one
svtm? Dead tuple TIDs are unique sets but those sets could have TIDs
of different offsets on the same block. The case I imagine is the
idea discussed on this thread[1]. With this idea, we store the
collected dead tuple TIDs somewhere and skip index vacuuming for some
reason (index skipping optimization, failsafe mode, or interruptions
etc.). Then, in the next lazy vacuum timing, we load the dead tuple
TIDs and start to scan the heap. During the heap scan in the second
lazy vacuum, it's possible that new dead tuples will be found on the
pages that we have already stored in svtm during the first lazy
vacuum. How can we efficiently update the chunk in the svtm?

Regards,

[1] 
https://www.postgresql.org/message-id/CA%2BTgmoZgapzekbTqdBrcH8O8Yifi10_nB7uWLB8ajAhGL21M6A%40mail.gmail.com


--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: [WIP] UNNEST(REFCURSOR): allowing SELECT to consume data from a REFCURSOR

2021-07-29 Thread Daniel Verite
   Hi,

Trying the v7a patch, here are a few comments:

* SIGSEGV with ON HOLD cursors.

Reproducer:

declare c cursor with hold for select oid,relname
  from pg_class order by 1 limit 10;

select * from rows_in('c') as x(f1 oid,f2 name);

consumes a bit of time, then crashes and generates a 13 GB core file
without a usable stacktrace:

Core was generated by `postgres: daniel postgres [local] SELECT  '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x7f4c5b2f3dc9 in ?? ()
(gdb) bt
#0  0x7f4c5b2f3dc9 in ?? ()
#1  0x564567efc505 in ?? ()
#2  0x0001 in ?? ()
#3  0x56456a4b28f8 in ?? ()
#4  0x56456a4b2908 in ?? ()
#5  0x56456a4b2774 in ?? ()
#6  0x56456a4ad218 in ?? ()
#7  0x56456a4b1590 in ?? ()
#8  0x0010 in ?? ()
#9  0x in ?? ()


* rows_in() does not fetch from the current position of the cursor,
but from the start. For instance, I would expect that if doing
FETCH FROM cursor followed by SELECT * FROM rows_in('cursor'), the first
row would be ignored by rows_in(). That seems more convenient and more
principled.


* 
+  
+   This section describes functions that cursors to be manipulated
+   in normal SELECT queries.
+  

A verb seems to be missing.
It should be "function that *allow* cursors to be..." or something
like that?

* 
+   The REFCURSOR must be open, and the query must be a
+   SELECT statement. If the REFCURSOR’s
+   output does not

After  there is a fancy quote (codepoint U+2019). There is
currently no codepoint outside of US-ASCII in *.sgml ref/*.sgml, so
they're probably not welcome.


* Also: does the community wants it as a built-in function in core?
As mentioned in a previous round of review, a function like this in
plpgsql comes close:

create function rows_in(x refcursor) returns setof record as $$
declare
 r record;
begin
  loop
fetch x into r;
exit when not found;
return next r;
  end loop;
end $$ language plpgsql;
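
For example, it could then be used like this (a sketch, reusing the cursor
from the reproducer above but without WITH HOLD, inside a transaction):

BEGIN;
DECLARE c CURSOR FOR
  SELECT oid, relname FROM pg_class ORDER BY 1 LIMIT 10;
SELECT * FROM rows_in('c') AS x(f1 oid, f2 name);
COMMIT;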



Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite




Re: libpq compression

2021-07-29 Thread Daniil Zakhlystov
Hi!

I made some notable changes to the patch and fixed the previously mentioned 
issues.

On Fri, Mar 19, 2021 at 16:28 AM Justin Pryzby  wrote:
> Previously, I suggested that the server should have a "policy" GUC defining
> which compression methods are allowed.  Possibly including compression 
> "level".
> For example, the default might be to allow zstd, but only up to level 9.

Now libpq_compression GUC server setting controls the available client-server 
traffic compression methods.
It allows specifying the exact list of the allowed compression algorithms.

For example, to allow only zstd and zlib, set the setting to "zstd,zlib". Also, 
a maximal allowed compression level can be specified for each 
method, e.g. the "zstd:1,zlib:2" setting will set the maximal compression level 
for zstd to 1 and zlib to 2. 
If a client requests the compression with a higher compression level, it will 
be set to the maximal allowed one. 
The default (and recommended) maximal compression level for each algorithm is 1.
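
For example, based on the description above, the server side could be
configured like this (a sketch; the exact accepted syntax is defined by the
patch):

# postgresql.conf on a server built with this patch
libpq_compression = 'zstd:1,zlib:2'   # allow zstd up to level 1, zlib up to level 2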

On Fri, Mar 19, 2021 at 16:28 AM Justin Pryzby  wrote:
> There's commit messages claiming that the compression can be "asymmetric"
> between client-server vs server-client, but the commit seems to be unrelated,
> and the protocol documentation doesn't describe how this works.

On Dec 10, 2020, at 1:39 AM, Robert Haas  wrote:
> Another idea is that you could have a new message type that says "hey,
> the payload of this is 1 or more compressed messages." It uses the
> most-recently set compression method. This would make switching
> compression methods easier since the SetCompressionMethod message
> itself could always be sent uncompressed and/or not take effect until
> the next compressed message


I rewrote the compression initialization logic. Now it is asymmetric and the 
compression method is changeable on the fly.

Now compression initialization consists only of two phases:
1. Client sends startup packet with _pq_.compression containing the list of 
compression algorithms specified by the client with an optional
specification of compression level (e.g. “zstd:2,zlib:1”)
2. The server intersects the requested compression algorithms with the allowed 
ones (controlled via the libpq_compression server config setting).
If the intersection is not empty, the server responds with CompressionAck 
containing the final list of the compression algorithms that can be used for 
the compression of libpq messages between the client and server.
If the intersection is empty (server does not accept any of the requested 
algorithms), then it replies with CompressionAck containing the empty list.

After sending the CompressionAck message, the server can send the 
SetCompressionMethod message to set the current compression algorithm for 
server-to-client traffic compression.
After receiving the CompressionAck message, the client can send the 
SetCompressionMethod message to set the current compression algorithm for 
client-to-server traffic compression.

SetCompressionMethod message contains the index of the compression algorithm in 
the final list of the compression algorithms which is generated during 
compression initialization (described above).

Compressed data is wrapped into CompressedData messages.

Rules for compressing the protocol messages are defined in zpq_stream.c. For 
each protocol message, the preferred compression algorithm can be chosen.

On Wed, Apr 21, 2021 at 15:35 AM Ian Zagorskikh  
wrote:
> Let me drop my humble 2 cents in this thread. At this moment I checked only 
> 0001-Add-zlib-and-zstd-streaming-compression patch. With no order:
> 
> * No checks for failures. For example, value from malloc() is not checked for 
> NULL and used as-is right after the call. Also as a result possibly NULL 
> values passed into ZSTD functions which are explicitly not NULL-tolerant and 
> so dereference pointers without checks.
> 
> * Used memory management does not follow the schema used in the common 
> module. Direct calls to std C malloc/free are hard coded. AFAIU this is not 
> backend friendly. Looking at the code around I believe they should be wrapped 
> to ALLOC/FREE local macroses that depend on FRONTEND define. As it's done for 
> example. in src/common/hmac.c.
> 
> * If we're going to fix our code to be palloc/pfree friendly there's also a 
> way to pass our custom allocator callbacks inside ZSTD functions. By default 
> ZSTD uses malloc/free but it can be overwritten by the caller in e.g. 
> ZSTD_createDStream_advanced(ZSTD_customMem customMem) versions of API . IMHO 
> that would be good. If a 3rd party component allows us to inject  a custom 
> memory allocator and we do have it why not use this feature?
> 
> * Most zs_foo() functions do not check ptr arguments, though some like 
> zs_free() do. If we're speaking about a "common API" that can be used by a 
> wide range of different modules around the project, such a relaxed way to 
> treat input arguments IMHO can be an evil. Or at least add Assert(ptr) 
> ass

Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Bruce Momjian
On Wed, Jul 28, 2021 at 09:35:28PM -0700, David G. Johnston wrote:
> On Wed, Jul 28, 2021 at 6:52 PM Bruce Momjian  wrote:
> 
> I came up with the attached patch.
> 
> 
> Thank you.  It is an improvement but I think more could be done here (not
> exactly sure what - though removing the "copy binaries for contrib modules 
> from
> the old server" seems like a decent second step.)

Uh, I don't see that text.

> I'm not sure it really needs a parenthetical, and I personally dislike using
> "Consider" to start the sentence.
> 
> "Bringing extensions up to the newest version available on the new server can
> be done later using ALTER EXTENSION UPGRADE (after ensuring the correct
> binaries are installed)."

OK, I went with this new text.  There is confusion over install vs copy,
and whether this is happening at the operating system level or the SQL
level.  I tried to clarify that, but I am not sure I was successful.  I
also used the word "extension" since this is more common than "custom".

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index a83c63cd98..5ab7c57cd9 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -299,15 +299,17 @@ make prefix=/usr/local/pgsql.new install

 

-Install custom shared object files
+Install extension shared object files
 
 
- Install any custom shared object files (or DLLs) used by the old cluster
- into the new cluster, e.g., pgcrypto.so,
- whether they are from contrib
- or some other source. Do not install the schema definitions, e.g.,
- CREATE EXTENSION pgcrypto, because these will be upgraded
- from the old cluster.
+ If the old cluster used extensions, whether from
+ contrib or some other source, install new
+ versions of the extension shared object files (or DLLs) into the
+ new cluster, e.g., pgcrypto.so.
+ Do not load the schema definitions, e.g.,
+ CREATE EXTENSION pgcrypto, because these will be
+ recreated from the old cluster.  (The extensions may be
+ upgraded later using ALTER EXTENSION ... UPGRADE.)
  Also, any custom full text search files (dictionary, synonym,
  thesaurus, stop words) must also be copied to the new cluster.
 
@@ -494,10 +496,10 @@ pg_upgrade.exe
  
 
  
-  Install custom shared object files
+  Install extension shared object files
 
   
-   Install the same custom shared object files on the new standbys
+   Install the same extension shared object files on the new standbys
that you installed in the new primary cluster.
   
  


Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Dave Cramer
>
> OK, I went with this new text.  There is confusion over install vs copy,
> and whether this is happening at the operating system level or the SQL
> level.  I tried to clarify that, but I am not sure I was successful.  I
> also used the word "extension" since this is more common than "custom".
>

Much better, however I'm unclear on whether CREATE EXTENSION is actually
executed on the upgraded server.

From what I could gather it is not, otherwise the new function definitions
should have been in place.
I think what happens is that the function definitions are copied as part of
the schema and pg_extension is also copied.

Dave

>
>


Re: [postgres_fdw] add local pid to fallback_application_name

2021-07-29 Thread Tom Lane
"kuroda.hay...@fujitsu.com"  writes:
> I propose adding trackable information in postgres_fdw, in order to track 
> remote query correctly.

I don't think this is a great idea as-is.  People who need to do this
sort of thing will all have their own ideas of what they need to track
--- most obviously, it might be appropriate to include the originating
server's name, else you don't know what machine the PID is for.
So I think most people with this sort of requirement will be overriding
the default application name anyway, so we might as well keep the
default behavior simple.

What would be better to think about is how to let users specify this
kind of behavior for themselves.  I think it's possible to set
application_name as part of a foreign server's connection options,
but at present the result would only be a constant string.  Somebody
who wished the PID to be in there would like to have some sort of
formatting escape, say "%p" for PID.  Extrapolating wildly, maybe we
could make all the %-codes known to log_line_prefix available here.

Perhaps this is overkill.  But I think the patch you have here is
not going to make very many people happy: it'll either be detail
they don't want, or too little detail.

regards, tom lane




Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Bruce Momjian
On Thu, Jul 29, 2021 at 11:01:43AM -0400, Dave Cramer wrote:
> 
> 
> 
> 
> OK, I went with this new text.  There is confusion over install vs copy,
> and whether this is happening at the operating system level or the SQL
> level.  I tried to clarify that, but I am not sure I was successful.  I
> also used the word "extension" since this is more common than "custom".
> 
> 
> Much better, however I'm unclear on whether CREATE EXTENSION is actually
> executed on the upgraded server.
> 
> From what I could gather it is not, otherwise the new function definitions
> should have been in place. 
> I think what happens is that the function definitions are copied as part of 
> the
> schema and pg_extension is also copied.

Yes, the _effect_ of CREATE EXTENSION in the old cluster is copied to
the new cluster as object definitions.  CREATE EXTENSION runs the SQL
files associated with the extension.
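
For illustration, after the upgrade the installed and available extension
versions can be compared and the extension brought up to date, along these
lines (a sketch using pg_stat_statements as an example):

SELECT e.extname, e.extversion AS installed, a.default_version AS available
  FROM pg_extension e
  JOIN pg_available_extensions a ON a.name = e.extname;

ALTER EXTENSION pg_stat_statements UPDATE;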

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: Replace l337sp34k in comments.

2021-07-29 Thread Andrew Dunstan


On 7/29/21 8:51 AM, Geoff Winkless wrote:
> On Thu, 29 Jul 2021 at 11:22, Andrew Dunstan wrote:
>
> Personally, I would have written this as just "up to date", I don't
> think the hyphens are required.
>
>  
> FWIW Merriam-Webster and the CED suggest "up-to-date" when before a
> noun, so the changes should be "up-to-date answer" but "are up to date".
>
> https://dictionary.cambridge.org/dictionary/english/up-to-date
> 
>
>

Interesting, thanks. My (admittedly old) Concise OED only has the
version with spaces, while my (also old) Collins Concise has the
hyphenated version. I learn something new every day, no matter how trivial.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Dave Cramer
On Thu, 29 Jul 2021 at 11:10, Bruce Momjian  wrote:

> On Thu, Jul 29, 2021 at 11:01:43AM -0400, Dave Cramer wrote:
> >
> >
> >
> >
> > OK, I went with this new text.  There is confusion over install vs
> copy,
> > and whether this is happening at the operating system level or the
> SQL
> > level.  I tried to clarify that, but I am not sure I was
> successful.  I
> > also used the word "extension" since this is more common than
> "custom".
> >
> >
> > Much better, however I'm unclear on whether CREATE EXTENSION is actually
> > executed on the upgraded server.
> >
> > From what I could gather it is not, otherwise the new function
> definitions
> > should have been in place.
> > I think what happens is that the function definitions are copied as part
> of the
> > schema and pg_extension is also copied.
>
> Yes, the _effect_ of CREATE EXTENSION in the old cluster is copied to
> the new cluster as object definitions.  CREATE EXTENSION runs the SQL
> files associated with the extension.
>

OK, I think we should be more clear as to what is happening.  Saying they
will be recreated is misleading.
The extension definitions are being copied from the old server to the new
server.

I also think we should have stronger wording in the "The extensions may be
upgraded ..." statement.
I'd suggest "Any new versions of extensions should be upgraded using"

Dave

>
>


Re: [Proposal] Global temporary tables

2021-07-29 Thread wenjing zeng


> On Jul 28, 2021, at 23:09, Tony Zhu wrote:
> 
> Hi Wenjing
> 
> would you please rebase the code?
Thank you for your attention.
According to my testing, the latest pgmaster code merges the latest patch
cleanly and passes the tests.
https://www.travis-ci.com/github/wjzeng/postgres/builds 

If you have any questions, please give me feedback.


Wenjing


> 
> Thank you very much
> Tony
> 
> The new status of this patch is: Waiting on Author



Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Bruce Momjian
On Thu, Jul 29, 2021 at 11:17:52AM -0400, Dave Cramer wrote:
> On Thu, 29 Jul 2021 at 11:10, Bruce Momjian  wrote:
> OK, I think we should be more clear as to what is happening.  Saying they will
> be recreated is misleading.
> The extension definitions are being copied from the old server to the new
> server.

I think my wording says exactly that:

Do not load the schema definitions, e.g., CREATE EXTENSION
pgcrypto, because these will be recreated from the old
cluster.  

> I also think we should have stronger wording in the "The extensions may be
> upgraded ..." statement. 
> I'd suggest "Any new versions of extensions should be upgraded using"

I can't really comment on that since I see little mention of upgrading
extensions in our docs.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread David G. Johnston
On Thu, Jul 29, 2021 at 7:56 AM Bruce Momjian  wrote:

> On Wed, Jul 28, 2021 at 09:35:28PM -0700, David G. Johnston wrote:
> > On Wed, Jul 28, 2021 at 6:52 PM Bruce Momjian  wrote:
> >
> > I came up with the attached patch.
> >
> >
> > Thank you.  It is an improvement but I think more could be done here (not
> > exactly sure what - though removing the "copy binaries for contrib
> modules from
> > the old server" seems like a decent second step.)
>
> Uh, I don't see that text.
>
>
"""
 5. Install custom shared object files

Install any custom shared object files (or DLLs) used by the old cluster
into the new cluster, e.g., pgcrypto.so, whether they are from contrib or
some other source. Do not install the schema definitions, e.g., CREATE
EXTENSION pgcrypto, because these will be upgraded from the old cluster.
Also, any custom full text search files (dictionary, synonym, thesaurus,
stop words) must also be copied to the new cluster.
"""
I have an issue with the fragment "whether they are from contrib" - my
understanding at this point is that because of the way we package and
version contrib it should not be necessary to copy those shared object
files from the old to the new server (maybe, just maybe, with a
qualification that you are upgrading between two versions that were in
support during the same time period).

David J.


Re: [PoC] Improve dead tuple storage for lazy vacuum

2021-07-29 Thread Yura Sokolov

Masahiko Sawada wrote 2021-07-29 17:29:
On Thu, Jul 29, 2021 at 8:03 PM Yura Sokolov  
wrote:


Masahiko Sawada wrote 2021-07-29 12:11:
> On Thu, Jul 29, 2021 at 3:53 AM Andres Freund 
> wrote:
>>
>> Hi,
>>
>> On 2021-07-27 13:06:56 +0900, Masahiko Sawada wrote:
>> > Apart from performance and memory usage points of view, we also need
>> > to consider the reusability of the code. When I started this thread, I
>> > thought the best data structure would be the one optimized for
>> > vacuum's dead tuple storage. However, if we can use a data structure
>> > that can also be used in general, we can use it also for other
>> > purposes. Moreover, if it's too optimized for the current TID system
>> > (32 bits block number, 16 bits offset number, maximum block/offset
>> > number, etc.) it may become a blocker for future changes.
>>
>> Indeed.
>>
>>
>> > In that sense, radix tree also seems good since it can also be used in
>> > gist vacuum as a replacement for intset, or a replacement for hash
>> > table for shared buffer as discussed before. Are there any other use
>> > cases?
>>
>> Yes, I think there are. Whenever there is some spatial locality it has
>> a
>> decent chance of winning over a hash table, and it will most of the
>> time
>> win over ordered datastructures like rbtrees (which perform very
>> poorly
>> due to the number of branches and pointer dispatches). There's plenty
>> hashtables, e.g. for caches, locks, etc, in PG that have a medium-high
>> degree of locality, so I'd expect a few potential uses. When adding
>> "tree compression" (i.e. skip inner nodes that have a single incoming
>> &
>> outgoing node) radix trees even can deal quite performantly with
>> variable width keys.
>
> Good point.
>
>>
>> > On the other hand, I’m concerned that radix tree would be an
>> > over-engineering in terms of vacuum's dead tuples storage since the
>> > dead tuple storage is static data and requires only lookup operation,
>> > so if we want to use radix tree as dead tuple storage, I'd like to see
>> > further use cases.
>>
>> I don't think we should rely on the read-only-ness. It seems pretty
>> clear that we'd want parallel dead-tuple scans at a point not too far
>> into the future?
>
> Indeed. Given that the radix tree itself has other use cases, I have
> no concern about using radix tree for vacuum's dead tuples storage. It
> will be better to have one that can be generally used and has some
> optimizations that are helpful also for vacuum's use case, rather than
> having one that is very optimized only for vacuum's use case.

Main portion of svtm that leads to memory saving is compression of many
pages at once (CHUNK). It could be combined with radix as a storage for
pointers to CHUNKs.

For the moment I'm benchmarking an IntegerSet replacement based on a Trie
(HAMT-like) and CHUNK compression, therefore the data structure could be
used for gist vacuum as well.

Since it is generic (it allows indexing the whole 64-bit range) it lacks
the trick used to speed up svtm. Still, on the 10x test it is faster than
radix.


I've attached IntegerSet2 patch for pgtools repo and benchmark results.
Branch https://github.com/funny-falcon/pgtools/tree/integerset2

SVTM is measured with a couple of changes from commit 5055ef72d23482dd3e11ce
in that branch: 1) compress the bitmap more often, but slower, 2) a couple
of popcount tricks.

IntegerSet2 consists of a trie index to CHUNKS. A CHUNK is a compressed
bitmap of 2^15 (6+9) bits (almost like in SVTM, but for a fixed bit width).

Well, IntegerSet2 is always faster than IntegerSet and always uses
significantly less memory (radix uses more memory than IntegerSet in a
couple of tests and comparable amounts in the others).

IntegerSet2 is not always faster than radix. It is more like radix. That is
because both are generic prefix trees with a comparable number of memory
accesses. SVTM did the trick by being not a multilevel prefix tree, but
just a 1-level bitmap index to chunks.

I believe the trie part of IntegerSet2 could be replaced with radix, i.e.
use radix as storage for pointers to CHUNKS.
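
To make the CHUNK idea above more concrete, here is a very rough illustrative
C sketch of a membership test against one uncompressed chunk.  The 6+9 bit
split (low block bits plus line-pointer offset) and all names are assumptions
for illustration only; the real SVTM/IntegerSet2 code compresses the bitmaps
and reaches the right chunk through the trie/radix index, which this sketch
skips:

#include <stdbool.h>
#include <stdint.h>

#define CHUNK_KEY_BITS  15                  /* 6 block bits + 9 offset bits */
#define CHUNK_NKEYS     (1u << CHUNK_KEY_BITS)

typedef struct Chunk
{
    uint64_t    words[CHUNK_NKEYS / 64];    /* plain membership bitmap, 4 kB */
} Chunk;

static inline bool
chunk_is_member(const Chunk *chunk, uint32_t blkno, uint32_t offnum)
{
    /* low 6 bits of the block number and 9 bits of the offset form the key */
    uint32_t    key = ((blkno & 0x3f) << 9) | (offnum & 0x1ff);

    return (chunk->words[key >> 6] >> (key & 63)) & 1;
}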


BTW, how does svtm work when we add two sets of dead tuple TIDs to one
svtm? Dead tuple TIDs are unique sets, but those sets could have TIDs
with different offsets on the same block. The case I imagine is the
idea discussed on this thread[1]. With this idea, we store the
collected dead tuple TIDs somewhere and skip index vacuuming for some
reason (index skipping optimization, failsafe mode, or interruptions
etc.). Then, in the next lazy vacuum timing, we load the dead tuple
TIDs and start to scan the heap. During the heap scan in the second
lazy vacuum, it's possible that new dead tuples will be found on the
pages that we have already stored in svtm during the first lazy
vacuum. How can we efficiently update the chunk in the svtm?


If we store the tidmap to disk, then it will be serialized. Since SVTM/
IntegerSet2 are ordered, they could be loaded in order. Then we
can just merge tuples on a per-page basis: deserialize the page (or CHUNK),
put new tu

Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Dave Cramer
I have an issue with the fragment "whether they are from contrib" - my
> understanding at this point is that because of the way we package and
> version contrib it should not be necessary to copy those shared object
> files from the old to the new server (maybe, just maybe, with a
> qualification that you are upgrading between two versions that were in
> support during the same time period).
>

Just to clarify. In no case are binaries copied from the old server to the
new server. Whether from contrib or otherwise.

Dave


Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread David G. Johnston
On Thu, Jul 29, 2021 at 8:36 AM Dave Cramer  wrote:

>
>
> I have an issue with the fragment "whether they are from contrib" - my
>> understanding at this point is that because of the way we package and
>> version contrib it should not be necessary to copy those shared object
>> files from the old to the new server (maybe, just maybe, with a
>> qualification that you are upgrading between two versions that were in
>> support during the same time period).
>>
>
> Just to clarify. In no case are binaries copied from the old server to the
> new server. Whether from contrib or otherwise.
>
>
I had used "binaries" when I should have written "shared object files".  I
just imagine a DLL as being a binary file so it seems accurate but we use
the term differently I suppose?

David J.


Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Bruce Momjian
On Thu, Jul 29, 2021 at 08:28:12AM -0700, David G. Johnston wrote:
> On Thu, Jul 29, 2021 at 7:56 AM Bruce Momjian  wrote:
> 
> On Wed, Jul 28, 2021 at 09:35:28PM -0700, David G. Johnston wrote:
> > On Wed, Jul 28, 2021 at 6:52 PM Bruce Momjian  wrote:
> >
> >     I came up with the attached patch.
> >
> >
> > Thank you.  It is an improvement but I think more could be done here 
> (not
> > exactly sure what - though removing the "copy binaries for contrib
> modules from
> > the old server" seems like a decent second step.)
> 
> Uh, I don't see that text.
> """
>  5. Install custom shared object files
> 
> Install any custom shared object files (or DLLs) used by the old cluster into
> the new cluster, e.g., pgcrypto.so, whether they are from contrib or some 
> other
> source. Do not install the schema definitions, e.g., CREATE EXTENSION 
> pgcrypto,
> because these will be upgraded from the old cluster. Also, any custom full 
> text
> search files (dictionary, synonym, thesaurus, stop words) must also be copied
> to the new cluster.
> """
> I have an issue with the fragment "whether they are from contrib" - my
> understanding at this point is that because of the way we package and version
> contrib it should not be necessary to copy those shared object files from the
> old to the new server (maybe, just maybe, with a qualification that you are
> upgrading between two versions that were in support during the same time
> period).

OK, so this is the confusion I was talking about.  You are supposed to
install _new_ _versions_ of the extensions that are in the old cluster
to the new cluster.  You are not supposed to _copy_ the files from the
old to new cluster.  I think my new patch makes that clearer, but can it
be improved?

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Bruce Momjian
On Thu, Jul 29, 2021 at 11:36:19AM -0400, Dave Cramer wrote:
> 
> 
> I have an issue with the fragment "whether they are from contrib" - my
> understanding at this point is that because of the way we package and
> version contrib it should not be necessary to copy those shared object
> files from the old to the new server (maybe, just maybe, with a
> qualification that you are upgrading between two versions that were in
> support during the same time period).
> 
> 
> Just to clarify. In no case are binaries copied from the old server to the new
> server. Whether from contrib or otherwise.

Right.  Those are _binaries_ and therefore made to match a specific
Postgres binary.  They might work or might not, but copying them is
never a good idea --- they should be recompiled to match the new server
binary, even if the extension had no version/API changes.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Jan Wieck

On 7/29/21 11:10 AM, Bruce Momjian wrote:

On Thu, Jul 29, 2021 at 11:01:43AM -0400, Dave Cramer wrote:

Much better, however I'm unclear on whether CREATE EXTENSION is actually
executed on the upgraded server.

From what I could gather it is not, otherwise the new function definitions
should have been in place. I think what happens is that the function
definitions are copied as part of the schema and pg_extension is also copied.


Yes, the _effect_ of CREATE EXTENSION in the old cluster is copied to
the new cluster as object definitions.  CREATE EXTENSION runs the SQL
files associated with the extension.



This assumes that the scripts executed during CREATE EXTENSION have no 
conditional code in them that depends on the server version. Running the 
same SQL script on different server versions can have different effects.


I don't have a ready example of such an extension, but if we ever would 
convert the backend parts of Slony into an extension, it would be one.



Regards, Jan

--
Jan Wieck




Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Dave Cramer
On Thu, 29 Jul 2021 at 11:39, David G. Johnston 
wrote:

> On Thu, Jul 29, 2021 at 8:36 AM Dave Cramer  wrote:
>
>>
>>
>> I have an issue with the fragment "whether they are from contrib" - my
>>> understanding at this point is that because of the way we package and
>>> version contrib it should not be necessary to copy those shared object
>>> files from the old to the new server (maybe, just maybe, with a
>>> qualification that you are upgrading between two versions that were in
>>> support during the same time period).
>>>
>>
>> Just to clarify. In no case are binaries copied from the old server to
>> the new server. Whether from contrib or otherwise.
>>
>>
> I had used "binaries" when I should have written "shared object files".  I
> just imagine a DLL as being a binary file so it seems accurate but we use
> the term differently I suppose?
>

No, we are using the same term. pg_upgrade does not copy anything that was
compiled, whether it is called a DLL or otherwise.


Dave

>
> David J.
>


Re: needless complexity in StartupXLOG

2021-07-29 Thread Robert Haas
On Wed, Jul 28, 2021 at 3:28 PM Andres Freund  wrote:
> > Imagine if instead of
> > all the hairy logic we have now we just replaced this whole if
> > (IsInRecovery) stanza with this:
> >
> > if (InRecovery)
> > CreateEndOfRecoveryRecord();
> >
> > That would be WAY easier to reason about than the rat's nest we have
> > here today. Now, I am not sure what it would take to get there, but I
> > think that is the direction we ought to be heading.
>
> What are we going to do in the single user ([1]) case in this awesome future?
> I guess we could just not create a checkpoint until single user mode is shut
> down / creates a checkpoint for other reasons?

It probably depends on who writes this thus-far-hypothetical patch,
but my thought is that we'd unconditionally request a checkpoint after
writing the end-of-recovery record, same as we do now if (promoted).
If we happened to be in single-user mode, then that checkpoint request
would turn into performing a checkpoint on the spot in the one and
only process we've got, but StartupXLOG() wouldn't really need to care
what happens under the hood after it called RequestCheckpoint().
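
A rough illustration of that shape (a sketch only, not an actual patch, and
it assumes RequestCheckpoint() behaves in single-user mode as described
above):

    if (InRecovery)
    {
        CreateEndOfRecoveryRecord();

        /*
         * In a normal multi-process server this just signals the
         * checkpointer; in single-user mode RequestCheckpoint() performs
         * the checkpoint in the current process.
         */
        RequestCheckpoint(CHECKPOINT_FORCE);
    }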

> [1] I really wish somebody had the energy to just remove single user and
> bootstrap modes. The degree to which they increase complexity in the rest of
> the system is entirely unreasonable. There's not actually any reason
> bootstrapping can't happen with checkpointer et al running, it's just
> autovacuum that'd need to be blocked.

I don't know whether this is the way forward or not. I think a lot of
the complexity of the current system is incidental rather than
intrinsic. If I were going to work on this, I'd probably working on
trying to tidy up the code and reduce the number of places that need
to care about IsUnderPostmaster and IsPostmasterEnvironment, rather
than trying to get rid of them. I suspect that's a necessary
prerequisite step anyway, and not a trivial effort either.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: alter table set TABLE ACCESS METHOD

2021-07-29 Thread Jeff Davis
On Thu, 2021-07-29 at 15:27 +0900, Michael Paquier wrote:
> Doing any checks around the hooks of objectaccess.h is very annoying,
> because we have no modules to check after them easily except sepgsql.
> Anyway, I have been checking that, with the hack-ish module attached
> and tracked down that swap_relation_files() calls
> InvokeObjectPostAlterHookArg() already (as you already spotted?), but
> that's an internal change when it comes to SET LOGGED/UNLOGGED/ACCESS
> METHOD :(
> 
> Attached is a small module I have used for those tests, for
> reference.  It passes on HEAD, and with the patch attached you can
> see
> the extra entries.

I see that ATExecSetTableSpace() also invokes the hook even for a no-
op. Should we do the same thing for setting the AM?

> > > > Also, I agree with Justin that it should fail when there are
> > > > multiple
> > > > SET ACCESS METHOD subcommands consistently, regardless of
> > > > whether
> > > > one
> > > > is a no-op, and it should probably throw a syntax error to
> > > > match
> > > > SET
> > > > TABLESPACE.
> > > 
> > > Hmm.  Okay.
> 
> I'd still disagree with that.

OK, I won't press for a change here.

Regards,
Jeff Davis






Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread David G. Johnston
On Thu, Jul 29, 2021 at 8:42 AM Bruce Momjian  wrote:

> On Thu, Jul 29, 2021 at 11:36:19AM -0400, Dave Cramer wrote:
> >
> >
> > I have an issue with the fragment "whether they are from contrib" -
> my
> > understanding at this point is that because of the way we package and
> > version contrib it should not be necessary to copy those shared
> object
> > files from the old to the new server (maybe, just maybe, with a
> > qualification that you are upgrading between two versions that were
> in
> > support during the same time period).
> >
> >
> > Just to clarify. In no case are binaries copied from the old server to
> the new
> > server. Whether from contrib or otherwise.
>
> Right.  Those are _binaries_ and therefore made to match a specific
> Postgres binary.  They might work or might not, but copying them is
> never a good idea --- they should be recompiled to match the new server
> binary, even if the extension had no version/API changes.
>

Ok, looking at the flow again, where exactly would the user even be able to
execute "CREATE EXTENSION" meaningfully?  The relevant databases do not
exist (not totally sure what happens to the postgres database created
during the initdb step...) so at the point where the user is "installing
the extension" all they can reasonably do is a server-level install (they
could maybe create extension in the postgres database, but does that even
matter?).

So, I'd propose simplifying this all to something like:

Install extensions on the new server

Any extensions that are used by the old cluster need to be installed into
the new cluster.  Each database in the old cluster will have its current
version of all extensions migrated to the new cluster as-is.  You can use
the ALTER EXTENSION command, on a per-database basis, to update its
extensions post-upgrade.

David J.


Re: Out-of-memory error reports in libpq

2021-07-29 Thread Robert Haas
On Thu, Jul 29, 2021 at 9:57 AM Tom Lane  wrote:
> In the case at hand, I'd personally avoid a ternary op for the first
> assignment because then the line would run over 80 characters, and
> you'd have to make decisions about where to break it.  (We don't have
> a standardized convention about that, and none of the alternatives
> look very good to my eye.)

This is exactly why I rarely use ?:

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Bruce Momjian
On Thu, Jul 29, 2021 at 08:39:32AM -0700, David G. Johnston wrote:
> On Thu, Jul 29, 2021 at 8:36 AM Dave Cramer  wrote:
> 
> 
> 
> 
> I have an issue with the fragment "whether they are from contrib" - my
> understanding at this point is that because of the way we package and
> version contrib it should not be necessary to copy those shared object
> files from the old to the new server (maybe, just maybe, with a
> qualification that you are upgrading between two versions that were in
> support during the same time period).
> 
> 
> Just to clarify. In no case are binaries copied from the old server to the
> new server. Whether from contrib or otherwise.
>
> I had used "binaries" when I should have written "shared object files".  I 
> just
> imagine a DLL as being a binary file so it seems accurate but we use the term
> differently I suppose?

Uh, technically, the _executable_ binary should only use shared object /
loadable libraries that were compiled against that binary's exported
API.  Sometimes mismatches work (if the API used by the shared object
has not changed in the binary) so people get used to it working, and
then one day it doesn't, but it is never a safe process.

If two people here are confused about this, obviously others will be as
well.  I think we were trying to do too much in that first sentence, so
I split it into two in the attached patch.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index a83c63cd98..7352936a94 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -299,15 +299,18 @@ make prefix=/usr/local/pgsql.new install

 

-Install custom shared object files
+Install extension shared object files
 
 
- Install any custom shared object files (or DLLs) used by the old cluster
- into the new cluster, e.g., pgcrypto.so,
- whether they are from contrib
- or some other source. Do not install the schema definitions, e.g.,
- CREATE EXTENSION pgcrypto, because these will be upgraded
- from the old cluster.
+ If the old cluster used extensions, whether from
+ contrib or some other source, it used
+ shared object files (or DLLs) to implement these extensions, e.g.,
+ pgcrypto.so.  Now, shared object files matching
+ the new server binary must be installed in the new cluster, usually
+ via operating system commands.  Do not load the schema definitions,
+ e.g., CREATE EXTENSION pgcrypto, because these
+ will be recreated from the old cluster.  (The extensions may be
+ upgraded later using ALTER EXTENSION ... UPGRADE.)
  Also, any custom full text search files (dictionary, synonym,
  thesaurus, stop words) must also be copied to the new cluster.
 
@@ -494,10 +497,10 @@ pg_upgrade.exe
  
 
  
-  Install custom shared object files
+  Install extension shared object files
 
   
-   Install the same custom shared object files on the new standbys
+   Install the same extension shared object files on the new standbys
that you installed in the new primary cluster.
   
  


Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Bruce Momjian
On Thu, Jul 29, 2021 at 11:46:12AM -0400, Jan Wieck wrote:
> On 7/29/21 11:10 AM, Bruce Momjian wrote:
> > On Thu, Jul 29, 2021 at 11:01:43AM -0400, Dave Cramer wrote:
> > > Much better, however I'm unclear on whether CREATE EXTENSION is actually
> > > executed on the upgraded server.
> > > 
> > > From what I could gather it is not, otherwise the new function definitions
> > > should have been in place. I think what happens is that the function
> > > definitions are copied as part of the
> > > schema and pg_extension is also copied.
> > 
> > Yes, the _effect_ of CREATE EXTENSION in the old cluster is copied to
> > the new cluster as object definitions.  CREATE EXTENSION runs the SQL
> > files associated with the extension.
> > 
> 
> This assumes that the scripts executed during CREATE EXTENSION have no
> conditional code in them that depends on the server version. Running the
> same SQL script on different server versions can have different effects.
> 
> I don't have a ready example of such an extension, but if we ever would
> convert the backend parts of Slony into an extension, it would be one.

The bottom line is that we have _no_ mechanism to handle this except
uninstalling the extension before the upgrade and re-installing it
afterward, which isn't clearly spelled out for each extension, as far as
I know, and would not work for extensions that create data types.

Yes, I do feel this is a big hole in our upgrade instructions.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Bruce Momjian
On Thu, Jul 29, 2021 at 09:00:36AM -0700, David G. Johnston wrote:
> On Thu, Jul 29, 2021 at 8:42 AM Bruce Momjian  wrote:
> 
> On Thu, Jul 29, 2021 at 11:36:19AM -0400, Dave Cramer wrote:
> >
> >
> >     I have an issue with the fragment "whether they are from contrib" -
> my
> >     understanding at this point is that because of the way we package 
> and
> >     version contrib it should not be necessary to copy those shared
> object
> >     files from the old to the new server (maybe, just maybe, with a
> >     qualification that you are upgrading between two versions that were
> in
> >     support during the same time period).
> >
> >
> > Just to clarify. In no case are binaries copied from the old server to
> the new
> > server. Whether from contrib or otherwise.
> 
> Right.  Those are _binaries_ and therefore made to match a specific
> Postgres binary.  They might work or might not, but copying them is
> never a good idea --- they should be recompiled to match the new server
> binary, even if the extension had no version/API changes.
> 
> 
> Ok, looking at the flow again, where exactly would the user even be able to
> execute "CREATE EXTENSION" meaningfully?  The relevant databases do not exist
> (not totally sure what happens to the postgres database created during the
> initdb step...) so at the point where the user is "installing the extension"
> all they can reasonably do is a server-level install (they could maybe create
> extension in the postgres database, but does that even matter?).

They could technically start the new cluster and use "CREATE EXTENSION"
before the upgrade, and then the upgrade would fail since there would be
duplicate object errors.

> So, I'd propose simplifying this all to something like:
> 
> Install extensions on the new server
> 
> Any extensions that are used by the old cluster need to be installed into the
> new cluster.  Each database in the old cluster will have its current version 
> of
> all extensions migrated to the new cluster as-is.  You can use the ALTER
> EXTENSION command, on a per-database basis, to update its extensions
> post-upgrade.

Can you review the text I just posted?  Thanks.   I think we are making
progress.  ;-)

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: [Patch] ALTER SYSTEM READ ONLY

2021-07-29 Thread Robert Haas
On Wed, Jul 28, 2021 at 7:33 AM Amul Sul  wrote:
> I was too worried about how I could miss that, and after thinking more
> about it, I realized that the operation for ArchiveRecoveryRequested
> is never going to be skipped in the startup process and is never
> left for the checkpoint process to do later. That is the reason
> that assert was added there.
>
> When ArchiveRecoveryRequested, the server will no longer be in
> the wal prohibited mode, we implicitly change the state to
> wal-permitted. Here is the snip from the 0003 patch:

Ugh, OK. That makes sense, but I'm still not sure that I like it. I've
kind of been wondering: why not have XLogAcceptWrites() be the
responsibility of the checkpointer all the time, in every case? That
would require fixing some more things, and this is one of them, but
then it would be consistent, which means that any bugs would be likely
to get found and fixed. If calling XLogAcceptWrites() from the
checkpointer is some funny case that only happens when the system
crashes while WAL is prohibited, then we might fail to notice that we
have a bug.

This is especially true given that we have very little test coverage
in this area. Andres was ranting to me about this earlier this week,
and I wasn't sure he was right, but then I noticed that we have
exactly zero tests in the entire source tree that make use of
recovery_end_command. We really need a TAP test for that, I think.
It's too scary to do much reorganization of the code without having
any tests at all for the stuff we're moving around. Likewise, we're
going to need TAP tests for the stuff that is specific to this patch.
For example, we should have a test that crashes the server while it's
read only, brings it back up, checks that we still can't write WAL,
then re-enables WAL, and checks that we now can write WAL. There are
probably a bunch of other things that we should test, too.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: when the startup process doesn't (logging startup delays)

2021-07-29 Thread Robert Haas
On Thu, Jul 29, 2021 at 4:56 AM Nitin Jadhav
 wrote:
> Thanks for sharing the information. I have done the necessary changes
> to show the logs during the latter case (postgres --single) and
> verified the log messages.

Thanks. Can you please have a look at what I suggested down toward the
bottom of 
http://postgr.es/m/ca+tgmoap2wefsktmcgwt9lxuz7y99hnduyshpk7qnfuqb98...@mail.gmail.com
?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: needless complexity in StartupXLOG

2021-07-29 Thread Robert Haas
On Thu, Jul 29, 2021 at 1:47 AM Amul Sul  wrote:
> Can we have an elog() fatal error or warning to make sure that the
> last checkpoint is still readable? Since the case where the user
> (knowingly or unknowingly) or some buggy code has removed the WAL file
> containing the last checkpoint could be possible. If it is then we
> would have a hard time finding out when we get further unexpected
> behavior due to this. Thoughts?

Sure, we could, but I don't think we should. Such crazy things can
happen any time, not just at the point where this check is happening.
It's not particularly more likely to happen here vs. any other place
where we could insert a check. Should we check everywhere, all the
time, just in case?

> > I realize that conservatism may have played a role in this code ending
> > up looking the way that it does; someone seems to have thought it
> > would be better not to rely on a new idea in all cases. From my point
> > of view, though, it's scary to have so many cases, especially cases
> > that don't seem like they should ever be reached. I think that
> > simplifying the logic here and trying to do the same things in as many
> > cases as we can will lead to better robustness. Imagine if instead of
> > all the hairy logic we have now we just replaced this whole if
> > (IsInRecovery) stanza with this:
> >
> > if (InRecovery)
> > CreateEndOfRecoveryRecord();
>
> +1, and do the checkpoint at the end unconditionally as we are doing
> for the promotion.

Yeah, that was my thought, too.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Dave Cramer
Dave Cramer


On Thu, 29 Jul 2021 at 12:16, Bruce Momjian  wrote:

> On Thu, Jul 29, 2021 at 09:00:36AM -0700, David G. Johnston wrote:
> > On Thu, Jul 29, 2021 at 8:42 AM Bruce Momjian  wrote:
> >
> > On Thu, Jul 29, 2021 at 11:36:19AM -0400, Dave Cramer wrote:
> > >
> > >
> > > I have an issue with the fragment "whether they are from
> contrib" -
> > my
> > > understanding at this point is that because of the way we
> package and
> > > version contrib it should not be necessary to copy those shared
> > object
> > > files from the old to the new server (maybe, just maybe, with a
> > > qualification that you are upgrading between two versions that
> were
> > in
> > > support during the same time period).
> > >
> > >
> > > Just to clarify. In no case are binaries copied from the old
> server to
> > the new
> > > server. Whether from contrib or otherwise.
> >
> > Right.  Those are _binaries_ and therefore made to match a specific
> > Postgres binary.  They might work or might not, but copying them is
> > never a good idea --- they should be recompiled to match the new
> server
> > binary, even if the extension had no version/API changes.
> >
> >
> > Ok, looking at the flow again, where exactly would the user even be able
> to
> > execute "CREATE EXTENSION" meaningfully?  The relevant databases do not
> exist
> > (not totally sure what happens to the postgres database created during
> the
> > initdb step...) so at the point where the user is "installing the
> extension"
> > all they can reasonably do is a server-level install (they could maybe
> create
> > extension in the postgres database, but does that even matter?).
>
> They could technically start the new cluster and use "CREATE EXTENSION"
> before the upgrade, and then the ugprade would fail since there would be
> duplicate object errors.
>
> > So, I'd propose simplifying this all to something like:
> >
> > Install extensions on the new server
> >
> > Any extensions that are used by the old cluster need to be installed
> into the
> > new cluster.  Each database in the old cluster will have its current
> version of
> > all extensions migrated to the new cluster as-is.  You can use the ALTER
> > EXTENSION command, on a per-database basis, to update its extensions
> > post-upgrade.
>
> Can you review the text I just posted?  Thanks.   I think we are making
> progress.  ;-)
>

I am OK with Everything except

Do not load the schema definitions,
e.g., CREATE EXTENSION pgcrypto, because these
will be recreated from the old cluster.  (The extensions may be
upgraded later using ALTER EXTENSION ... UPGRADE.)

 I take issue with the word "recreated". This implies something new is
created, when in fact the old definitions are simply copied over.

As I said earlier; using the wording "may be upgraded" is not nearly
cautionary enough.

Dave


Re: fixing pg_basebackup tests for modern Windows/msys2

2021-07-29 Thread Andrew Dunstan


On 7/28/21 9:31 AM, Andrew Dunstan wrote:
> While looking into issues with fairywren and pg_basebackup tests, I
> created a similar environment but with more modern Windows / msys2.
> Before it even got to the test that failed on fairywren it failed the
> first TAP test for a variety of reasons, all connected to
> TestLib::perl2host.
>
> First, this function is in some cases returning paths for directories
> with trailing slashes and or embedded double slashes.  Both of these can
> cause problems, especially when written to a tablespace map file. Also,
> the cygpath invocation is returning a path with backslashes whereas "pwd
> -W' returns a path with forward slashes.
>
> So the first attached patch rectifies these problems. It fixes issues
> with doubles and trailing slashes and makes cygpath return a path with
> forward slashes just like the non-cygpath branch.
>
> However, there is another problem, which is that if called on a path
> that includes a symlink, on the test platform I set up it actually
> resolves that link rather than just following it. The end result is that
> the use of a shorter path via a symlink is effectively defeated. I
> haven't found any way to stop this behaviour.
>
> The second patch therefore adjusts the test to avoid calling perl2host
> on such a path. It just calls perl2host on the symlink's parent, and
> thereafter uses that result.
>
>


I've pushed these in master and REL_14_STABLE.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Jan Wieck

On 7/29/21 12:00 PM, David G. Johnston wrote:
Ok, looking at the flow again, where exactly would the user even be able 
to execute "CREATE EXTENSION" meaningfully?  The relevant databases do 
not exist (not totally sure what happens to the postgres database 
created during the initdb step...) so at the point where the user is 
"installing the extension" all they can reasonably do is a server-level 
install (they could maybe create extension in the postgres database, but 
does that even matter?).


So, I'd propose simplifying this all to something like:

Install extensions on the new server


Extensions are not installed on the server level. Their binary 
components (shared objects) are, but the actual catalog modifications 
that make them accessible are performed per database by CREATE 
EXTENSION, which executes the SQL files associated with the extension. 
And they can be performed differently per database, like for example 
placing one and the same extension into different schemas in different 
databases.


pg_upgrade is not (and should not be) concerned with placing the 
extension's installation components into the new version's lib and share 
directories. But it is pg_upgrade's job to perform the correct catalog 
modification per database during the upgrade.


Any extensions that are used by the old cluster need to be installed 
into the new cluster.  Each database in the old cluster will have its 
current version of all extensions migrated to the new cluster as-is.  
You can use the ALTER EXTENSION command, on a per-database basis, to 
update its extensions post-upgrade.


That assumes that the extension SQL files are capable of detecting a 
server version change and perform the necessary (if any) steps to alter 
the extension's objects accordingly.


Off the top of my head I don't remember what happens when one executes 
ALTER EXTENSION ... UPGRADE ... when it is already on the latest version 
*of the extension*. Might be an error or a no-op.


And to make matters worse, it is not possible to work around this with a 
DROP EXTENSION ... CREATE EXTENSION. There are extensions that create 
objects, like user defined data types and functions, that will be 
referenced by end user objects like tables and views.



Regards, Jan

--
Jan Wieck




Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Bruce Momjian
On Thu, Jul 29, 2021 at 12:22:59PM -0400, Dave Cramer wrote:
> On Thu, 29 Jul 2021 at 12:16, Bruce Momjian  wrote:
> Can you review the text I just posted?  Thanks.   I think we are making
> progress.  ;-)
> 
> 
> I am OK with Everything except
> 
> Do not load the schema definitions,
> e.g., CREATE EXTENSION pgcrypto, because these
> will be recreated from the old cluster.  (The extensions may be
> upgraded later using ALTER EXTENSION ... UPGRADE.)
> 
>  I take issue with the word "recreated". This implies something new is 
> created,
> when in fact the old definitions are simply copied over.
> 
> As I said earlier; using the wording "may be upgraded" is not nearly 
> cautionary
> enough.

OK, I changed it to "copy" though I used "recreated" earlier since I was
worried "copy" would be confused with file copy.  I changed the
recommendation to "should be".

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index a83c63cd98..0e69f26628 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -299,15 +299,18 @@ make prefix=/usr/local/pgsql.new install

 

-Install custom shared object files
+Install extension shared object files
 
 
- Install any custom shared object files (or DLLs) used by the old cluster
- into the new cluster, e.g., pgcrypto.so,
- whether they are from contrib
- or some other source. Do not install the schema definitions, e.g.,
- CREATE EXTENSION pgcrypto, because these will be upgraded
- from the old cluster.
+ If the old cluster used extensions, whether from
+ contrib or some other source, it used
+ shared object files (or DLLs) to implement these extensions, e.g.,
+ pgcrypto.so.  Now, shared object files matching
+ the new server binary must be installed in the new cluster, usually
+ via operating system commands.  Do not load the schema definitions,
+ e.g., CREATE EXTENSION pgcrypto, because these
+ will be copied from the old cluster.  (Extensions should be upgraded
+ later using ALTER EXTENSION ... UPGRADE.)
  Also, any custom full text search files (dictionary, synonym,
  thesaurus, stop words) must also be copied to the new cluster.
 
@@ -494,10 +497,10 @@ pg_upgrade.exe
  
 
  
-  Install custom shared object files
+  Install extension shared object files
 
   
-   Install the same custom shared object files on the new standbys
+   Install the same extension shared object files on the new standbys
that you installed in the new primary cluster.
   
  


Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread David G. Johnston
On Thu, Jul 29, 2021 at 9:28 AM Jan Wieck  wrote:

> On 7/29/21 12:00 PM, David G. Johnston wrote:
> > Ok, looking at the flow again, where exactly would the user even be able
> > to execute "CREATE EXTENSION" meaningfully?  The relevant databases do
> > not exist (not totally sure what happens to the postgres database
> > created during the initdb step...) so at the point where the user is
> > "installing the extension" all they can reasonably do is a server-level
> > install (they could maybe create extension in the postgres database, but
> > does that even matter?).
> >
> > So, I'd propose simplifying this all to something like:
> >
> > Install extensions on the new server
>
> Extensions are not installed on the server level. Their binary
> components (shared objects) are, but the actual catalog modifications
> that make them accessible are performed per database by CREATE
> EXTENSION, which executes the SQL files associated with the extension.
> And they can be performed differently per database, like for example
> placing one and the same extension into different schemas in different
> databases.
>
> pg_upgrade is not (and should not be) concerned with placing the
> extension's installation components into the new version's lib and share
> directories. But it is pg_upgrade's job to perform the correct catalog
> modification per database during the upgrade.
>

That is exactly the point I am making.  The section is informing the user
of things to do that the server will not do, namely "install extension
code into the O/S"; mentioning CREATE EXTENSION at this point in
the process is talking about something that is simply out of scope.



> > Any extensions that are used by the old cluster need to be installed
> > into the new cluster.  Each database in the old cluster will have its
> > current version of all extensions migrated to the new cluster as-is.
> > You can use the ALTER EXTENSION command, on a per-database basis, to
> > update its extensions post-upgrade.
>
> That assumes that the extension SQL files are capable of detecting a
> server version change and perform the necessary (if any) steps to alter
> the extension's objects accordingly.
>
> Off the top of my head I don't remember what happens when one executes
> ALTER EXTENSION ... UPGRADE ... when it is already on the latest version
> *of the extension*. Might be an error or a no-op.
>
> And to make matters worse, it is not possible to work around this with a
> DROP EXTENSION ... CREATE EXTENSION. There are extensions that create
> objects, like user defined data types and functions, that will be
> referenced by end user objects like tables and views.
>
>
These are all excellent points - but at present pg_upgrade simply doesn't
care and hopes that the extension author's documentation deals with these
possibilities in a sane manner.

David J.


Re: [PoC] Improve dead tuple storage for lazy vacuum

2021-07-29 Thread Yura Sokolov

Yura Sokolov wrote 2021-07-29 18:29:


I've attached IntegerSet2 patch for pgtools repo and benchmark results.
Branch https://github.com/funny-falcon/pgtools/tree/integerset2


Strange web-mail client... I never can be sure what it will attach...

Reattach benchmark results



regards,

Yura Sokolov
y.soko...@postgrespro.ru
funny.fal...@gmail.com
* Test 1
select prepare(
100, -- max block
10, -- # of dead tuples per page
20, -- dead tuples interval within  a page
1, -- # of consecutive pages having dead tuples
1 -- page interval
);

  name  |  size     | attach | shuffled |  size_x10  | attach_x10| shuffled_x10
+---++--++---+-
 array  | 57.23 MB  |  0.029 |   96.650 | 572.21 MB  | _ | ___
 intset | 46.88 MB  |  0.095 |   82.797 | 468.67 MB  | _ | ___
 radix  | 40.26 MB  |  0.089 |   12.082 | 401.27 MB  | 0.999 | 196.490
 rtbm   | 64.02 MB  |  0.164 |   14.300 | 512.02 MB  | 1.991 | 189.343
 svtm   | 17.78 MB  |  0.098 |   12.411 | 178.59 MB  | 0.973 | 210.901
 tbm| 96.01 MB  |  0.239 |8.486 | 768.01 MB  | 2.607 | 119.593
 intset2| 17.78 MB  |  0.220 |   19.203 | 175.84 MB  | 2.030 | 320.488

* Test 2
select prepare(
100, -- max block
10, -- # of dead tuples per page
1, -- dead tuples interval within  a page
1, -- # of consecutive pages having dead tuples
1 -- page interval
);

  name  |  size     | attach | shuffled |  size_x10  | attach_x10| shuffled_x10
+---++--++---+-
 array  | 57.23 MB  |  0.030 |4.737 | 572.21 MB  | _ | ___
 intset | 46.88 MB  |  0.096 |4.077 | 468.67 MB  | _ | ___
 radix  | 9.95 MB   |  0.051 |0.752 | 98.38 MB   | 0.531 |  18.642
 rtbm   | 34.02 MB  |  0.166 |0.424 | 288.02 MB  | 2.036 |   6.294
 svtm   | 5.78 MB   |  0.037 |0.203 | 54.60 MB   | 0.400 |   5.682
 tbm| 96.01 MB  |  0.245 |0.423 | 768.01 MB  | 2.624 |   5.848
 intset2| 6.27 MB   |  0.097 |0.414 | 61.29 MB   | 1.002 |  12.823

* Test 3
select prepare(
100, -- max block
2, -- # of dead tuples per page
100, -- dead tuples interval within  a page
1, -- # of consecutive pages having dead tuples
1 -- page interval
);

  name  |  size     | attach | shuffled |  size_x10  | attach_x10| shuffled_x10
+---++--++---+-
 array  | 11.45 MB  |  0.006 |   53.623 | 114.45 MB  | _ | ___
 intset | 15.63 MB  |  0.024 |   56.291 | 156.23 MB  | _ | ___
 radix  | 40.26 MB  |  0.055 |   10.928 | 401.27 MB  | 0.622 | 173.953
 rtbm   | 36.02 MB  |  0.116 |8.452 | 320.02 MB  | 1.370 | 125.438
 svtm   |  9.28 MB  |  0.069 |6.780 | 102.10 MB  | 0.690 | 167.220
 tbm| 96.01 MB  |  0.188 |8.569 | 768.01 MB  | 1.935 | 118.841
 intset2|  4.28 MB  |  0.022 |5.624 |  42.04 MB  | 0.229 | 146.096

* Test 4
select prepare(
100, -- max block
100, -- # of dead tuples per page
1, -- dead tuples interval within  a page
1, -- # of consecutive pages having dead tuples
1 -- page interval
);

  name  |  size     | attach | shuffled |  size_x10  | attach_x10| shuffled_x10
+---++--++---+-
 array  | 572.21 MB |  0.304 |   76.625 | 5722.05 MB | _ | ___
 intset | 93.74 MB  |  0.795 |   47.977 | 937.34 MB  | _ | ___
 radix  | 40.26 MB  |  0.264 |6.322 | 401.27 MB  | 2.674 | 101.887
 rtbm   | 36.02 MB  |  0.360 |4.195 | 320.02 MB  | 4.009 |  62.441
 svtm   | 7.28 MB   |  0.329 |2.729 | 73.60 MB   | 3.271 |  74.725
 tbm| 96.01 MB  |  0.564 |4.220 | 768.01 MB  | 5.711 |  59.047
 intset2| 7.28 MB   |  0.908 |5.826 | 70.55 MB   | 9.089 | 137.902


* Test 5
select prepare(
100, -- max block
150, -- # of dead tuples per page
1, -- dead tuples interval within  a page
1, -- # of consecutive pages having dead tuples
2 -- page interval
);

There are 1 consecutive pages that have 150 dead tuples at every
2 pages.

  name  |  size     | attach | shuffled |  size_x10  | attach_x10| shuffled_x10
+---++--++---+-
 array  | 429.16 MB |  0.225 |   65.404 | 4291.54 MB | _ | ___
 intset | 46.88 MB  |  0.510 |   39.460 | 468.67 MB  | _ | ___
 radix  | 20.26 MB  |  0.192 |6.440 | 201.48 MB  | 1.954 | 111.539
 rtbm   | 18.02 MB  |  0.225 |7.077 | 160.02 MB  | 2.515 | 120.816
 svtm   | 3.66 MB   |  0.233 |3.493 | 37.10 MB   | 2.348 |  73.092
 tbm| 48.01 MB  |  0.341 |9.146 | 384.01 MB  | 3.544 | 161.435
 intset2| 3.78 MB   |  0.671 |5.081 | 35.29 MB   | 6.791 | 117.617

* Test 6
select prepare(
100, -- max block
10, -- # of

Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Jan Wieck

On 7/29/21 12:44 PM, David G. Johnston wrote:

... - but at present pg_upgrade simply 
doesn't care and hopes that the extension author's documentation deals 
with these possibilities in a sane manner.


pg_upgrade is not able to address this in a guaranteed, consistent 
fashion. At this point there is no way to even make sure that a 3rd 
party extension provides the scripts needed to upgrade from one 
extension version to the next. We don't have a mechanism to upgrade 
the same extension version from one server version to the next, in case 
any in-place modifications are necessary.


How exactly do you envision that we, the PostgreSQL project, make sure 
that extension developers provide those mechanisms in the future?



Regards, Jan

--
Jan Wieck




Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread David G. Johnston
On Thu, Jul 29, 2021 at 9:34 AM Bruce Momjian  wrote:

> On Thu, Jul 29, 2021 at 12:22:59PM -0400, Dave Cramer wrote:
> > On Thu, 29 Jul 2021 at 12:16, Bruce Momjian  wrote:
> > Can you review the text I just posted?  Thanks.   I think we are
> making
> > progress.  ;-)
> >
> >
> > I am OK with Everything except
> >
> > Do not load the schema definitions,
> > e.g., CREATE EXTENSION pgcrypto, because these
> > will be recreated from the old cluster.  (The extensions may be
> > upgraded later using ALTER EXTENSION ... UPGRADE.)
> >
> >  I take issue with the word "recreated". This implies something new is
> created,
> > when in fact the old definitions are simply copied over.
> >
> > As I said earlier; using the wording "may be upgraded" is not nearly
> cautionary
> > enough.
>
> OK, I changed it to "copy" though I used "recreated" earlier since I was
> worried "copy" would be confused with file copy.  I changed the
> recommendation to "should be".
>
>
I'm warming up to "should" but maybe add a "why" such as "the old versions
are considered unsupported on the newer server".

I dislike "usually via operating system commands", just offload this to the
extension, i.e., "must be installed in the new cluster via installation
procedures specific to, and documented by, each extension (for contrib it
is usually enough to ensure the -contrib package was chosen to be installed
along with the -server package for your operating system.)"

I would simplify the first two sentences to just:

If the old cluster used extensions those same extensions must be installed
in the new cluster via installation procedures specific to, and documented
by, each extension.  For contrib extensions it is usually enough to install
the -contrib package via the same method you used to install the PostgreSQL
server.

I would consider my suggestion for "copy as-is/alter extension" wording in
my previous email in lieu of the existing third and fourth sentences, with
the "should" and "why" wording possibly worked in.  But the existing works
ok.

David J.


Re: Fix around conn_duration in pgbench

2021-07-29 Thread Fujii Masao




On 2021/07/29 2:23, Fujii Masao wrote:



On 2021/07/28 16:15, Yugo NAGATA wrote:

I found another disconnect_all().

/* XXX should this be connection time? */
disconnect_all(state, nclients);

The measurement is also not necessary here.
So the above comment should be removed or updated?


I think this disconnect_all will be a no-op because all connections should
be already closed in threadRun(), but I left it just to be sure that
connections are all cleaned up. I updated the comment to explain this.

I attached the updated patch. Could you please look at this?


Thanks for updating the patch! LGTM.


This patch needs to be back-patched because it fixes the bug
in measurement of disconnection delays. Thought?

But the patch fails to be back-patched to v13 or before because
pgbench's time logic was changed by commit 547f04e734.
Do you have the patches that can be back-patched to v13 or before?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread David G. Johnston
On Thu, Jul 29, 2021 at 9:55 AM Jan Wieck  wrote:

> How exactly do you envision that we, the PostgreSQL project, make sure
> that extension developers provide those mechanisms in the future?
>
>
I have no suggestions here, and don't really plan to get deeply involved in
this area of the project anytime soon.  But it is definitely a topic worthy
of discussion on a new thread.

David J.


Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Dave Cramer
>
>
>
> I would simplify the first two sentences to just:
>
> If the old cluster used extensions those same extensions must be installed
> in the new cluster via installation procedures specific to, and documented
> by, each extension.  For contrib extensions it is usually enough to install
> the -contrib package via the same method you used to install the PostgreSQL
> server.
>

Well this is not strictly true. There are many extensions that would work
just fine with the current pg_upgrade. It may not even be necessary to
recompile them.

Dave


Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Alvaro Herrera
On 2021-Jul-29, Dave Cramer wrote:

> > If the old cluster used extensions those same extensions must be
> > installed in the new cluster via installation procedures specific
> > to, and documented by, each extension.  For contrib extensions it is
> > usually enough to install the -contrib package via the same method
> > you used to install the PostgreSQL server.
> 
> Well this is not strictly true. There are many extensions that would
> work just fine with the current pg_upgrade. It may not even be
> necessary to recompile them.

It is always necessary to recompile because of the PG_MODULE_MAGIC
declaration, which is mandatory and contains a check that the major
version matches.  Copying the original shared object never works.
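
For anyone not familiar with the mechanism, this is roughly what every
loadable module's source file carries (a minimal sketch, not taken from any
particular extension):

    #include "postgres.h"
    #include "fmgr.h"

    PG_MODULE_MAGIC;    /* emits a "magic block" recording, among other
                         * things, the major server version the module was
                         * compiled against; the server checks it when
                         * loading the .so and refuses a mismatch */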

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
"But static content is just dynamic content that isn't moving!"
http://smylers.hates-software.com/2007/08/15/fe244d0c.html




Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Dave Cramer
On Thu, 29 Jul 2021 at 13:13, Alvaro Herrera 
wrote:

> On 2021-Jul-29, Dave Cramer wrote:
>
> > > If the old cluster used extensions those same extensions must be
> > > installed in the new cluster via installation procedures specific
> > > to, and documented by, each extension.  For contrib extensions it is
> > > usually enough to install the -contrib package via the same method
> > > you used to install the PostgreSQL server.
> >
> > Well this is not strictly true. There are many extensions that would
> > work just fine with the current pg_upgrade. It may not even be
> > necessary to recompile them.
>
> It is always necessary to recompile because of the PG_MODULE_MAGIC
> declaration, which is mandatory and contains a check that the major
> version matches.  Copying the original shared object never works.
>

Thx, I knew I was on shaky ground with that last statement :)

Dave

>
>


Re: [PoC] Improve dead tuple storage for lazy vacuum

2021-07-29 Thread Robert Haas
On Thu, Jul 29, 2021 at 5:11 AM Masahiko Sawada  wrote:
> Indeed. Given that the radix tree itself has other use cases, I have
> no concern about using radix tree for vacuum's dead tuples storage. It
> will be better to have one that can be generally used and has some
> optimizations that are helpful also for vacuum's use case, rather than
> having one that is very optimized only for vacuum's use case.

What I'm about to say might be a really stupid idea, especially since
I haven't looked at any of the code already posted, but what I'm
wondering about is whether we need a full radix tree or maybe just a
radix-like lookup aid. For example, suppose that for a relation <= 8MB
in size, we create an array of 1024 elements indexed by block number.
Each element of the array stores an offset into the dead TID array.
When you need to probe for a TID, you look up blkno and blkno + 1 in
the array and then bsearch only between those two offsets. For bigger
relations, a two or three level structure could be built, or it could
always be 3 levels. This could even be done on demand, so you
initialize all of the elements to some special value that means "not
computed yet" and then fill them the first time they're needed,
perhaps with another special value that means "no TIDs in that block".
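
To make the shape of that concrete, here is a minimal single-level sketch
(hypothetical names; it ignores the multi-level and fill-on-demand parts
described above):

    #include "postgres.h"
    #include "storage/itemptr.h"

    typedef struct BlockLookup
    {
        int         nblocks;        /* heap blocks covered */
        ItemPointerData *dead_tids; /* sorted array of dead TIDs */
        int        *block_start;    /* nblocks + 1 offsets into dead_tids;
                                     * block b's TIDs occupy
                                     * [block_start[b], block_start[b + 1]) */
    } BlockLookup;

    static bool
    lookup_is_dead(BlockLookup *lu, ItemPointer tid)
    {
        BlockNumber blkno = ItemPointerGetBlockNumber(tid);
        OffsetNumber off = ItemPointerGetOffsetNumber(tid);
        int         lo,
                    hi;

        if (blkno >= (BlockNumber) lu->nblocks)
            return false;

        /* binary-search only this block's slice of the TID array */
        lo = lu->block_start[blkno];
        hi = lu->block_start[blkno + 1];
        while (lo < hi)
        {
            int         mid = lo + (hi - lo) / 2;
            OffsetNumber midoff =
                ItemPointerGetOffsetNumber(&lu->dead_tids[mid]);

            if (off == midoff)
                return true;
            if (off < midoff)
                hi = mid;
            else
                lo = mid + 1;
        }
        return false;
    }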

I don't know if this is better, but I do kind of like the fact that
the basic representation is just an array. It makes it really easy to
predict how much memory will be needed for a given number of dead
TIDs, and it's very DSM-friendly as well.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Alvaro Herrera
On 2021-Jul-29, Bruce Momjian wrote:

> + If the old cluster used extensions, whether from
> + contrib or some other source, it used
> + shared object files (or DLLs) to implement these extensions, e.g.,
> + pgcrypto.so.  Now, shared object files matching
> + the new server binary must be installed in the new cluster, usually
> + via operating system commands.  Do not load the schema definitions,
> + e.g., CREATE EXTENSION pgcrypto, because these
> + will be copied from the old cluster.  (Extensions should be upgraded
> + later using ALTER EXTENSION ... UPGRADE.)

I propose this:


  If the old cluster used shared-object files (or DLLs) for extensions
  or other loadable modules, install recompiled versions of those files
  onto the new cluster.
  Do not install the extension themselves (i.e., do not run
  CREATE EXTENSION), because extension definitions
  will be carried forward from the old cluster.



  Extensions can be upgraded after pg_upgrade completes using
  ALTER EXTENSION ... UPGRADE, on a per-database
  basis.


I suggest " ... for extensions or other loadable modules" because
loadable modules aren't necessarily for extensions.  Also, it's
perfectly possible to have extensions that don't have a loadable module.

I suggest "extension definitions ... carried forward" instead of
"extensions ... copied" (your proposed text) to avoid the idea that
files are copied; use it instead of "schema definitions ... upgraded"
(the current docs) to avoid the idea that they are actually upgraded;
also, "schema definition" seems a misleading term to use here.

I suggest "can be upgraded" rather than "should be upgraded" because
we're not making a recommendation, merely stating the fact that it is
possible to do so.

I suggest using the imperative mood, to be consistent with the
surrounding text.  (Applies to the first para; the second para is
informative.)


I haven't seen it mentioned in the thread, but I think the final phrase
of this  should be a separate step,


 Copy custom full-text search files
 
  Copy any custom full text search file (dictionary, synonym, thesaurus,
  stop word list) to the new server.
 


While this is closely related to extensions, it's completely different.

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/
"Sallah, I said NO camels! That's FIVE camels; can't you count?"
(Indiana Jones)




Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Julien Rouhaud
On Fri, Jul 30, 2021 at 12:14 AM Bruce Momjian  wrote:
>
> On Thu, Jul 29, 2021 at 11:46:12AM -0400, Jan Wieck wrote:
> >
> > This assumes that the scripts executed during CREATE EXTENSION have no
> > conditional code in them that depends on the server version. Running the
> > same SQL script on different server versions can have different effects.
> >
> > I don't have a ready example of such an extension, but if we ever would
> > convert the backend parts of Slony into an extension, it would be one.
>
> The bottom line is that we have _no_ mechanism to handle this except
> uninstalling the extension before the upgrade and re-installing it
> afterward, which isn't clearly spelled out for each extension, as far as
> I know, and would not work for extensions that create data types.
>
Yes, I do feel this is a big hole in our upgrade instructions.

FWIW I have an example of such an extension: the powa-archivist extension
script runs an anonymous code block and, if needed, creates a custom
wrapper to emulate the current_setting(text, boolean) variant that
doesn't exist on pre-9.6 servers.




Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Dave Cramer
Dave Cramer


On Thu, 29 Jul 2021 at 13:43, Alvaro Herrera 
wrote:

> On 2021-Jul-29, Bruce Momjian wrote:
>
> > + If the old cluster used extensions, whether from
> > + contrib or some other source, it used
> > + shared object files (or DLLs) to implement these extensions, e.g.,
> > + pgcrypto.so.  Now, shared object files
> matching
> > + the new server binary must be installed in the new cluster, usually
> > + via operating system commands.  Do not load the schema definitions,
> > + e.g., CREATE EXTENSION pgcrypto, because these
> > + will be copied from the old cluster.  (Extensions should be
> upgraded
> > + later using ALTER EXTENSION ... UPGRADE.)
>
> I propose this:
>
> 
>   If the old cluster used shared-object files (or DLLs) for extensions
>   or other loadable modules, install recompiled versions of those files
>   onto the new cluster.
>   Do not install the extension themselves (i.e., do not run
>   CREATE EXTENSION), because extension definitions
>   will be carried forward from the old cluster.
> 
>
> 
>   Extensions can be upgraded after pg_upgrade completes using
>   ALTER EXTENSION ... UPGRADE, on a per-database
>   basis.
> 
>
> I suggest " ... for extensions or other loadable modules" because
> loadable modules aren't necessarily for extensions.  Also, it's
> perfectly possible to have extension that don't have a loadable module.
>
> I suggest "extension definitions ... carried forward" instead of
> "extensions ... copied" (your proposed text) to avoid the idea that
> files are copied; use it instead of "schema definitions ... upgraded"
> (the current docs) to avoid the idea that they are actually upgraded;
> also, "schema definition" seems a misleading term to use here.
>

I like "carried forward", however it presumes quite a bit of knowledge of
what is going on inside pg_upgrade.
That said I don't have a better option short of explaining the whole thing
which is clearly out of scope.

>
> I suggest "can be upgraded" rather than "should be upgraded" because
> we're not making a recommendation, merely stating the fact that it is
> possible to do so.
>
> Why not recommend it? I was going to suggest that we actually run alter
extension upgrade ... on all of them as a solution.

What are the downsides to upgrading them all by default ? AFAIK if they
need upgrading this should run all of the SQL scripts, if they don't then
this should be a NO-OP.

Dave


Re: Numeric x^y for negative x

2021-07-29 Thread Dean Rasheed
On Thu, 22 Jul 2021 at 16:19, Dean Rasheed  wrote:
>
> On Thu, 22 Jul 2021 at 06:13, Yugo NAGATA  wrote:
> >
> > Thank you for updating the patch. I am fine with the additional comments.
> > I don't think there is any other problem left, so I marked it 
> > Ready-for-Committers.
>
> Thanks for looking. Barring any further comments, I'll push this in a few 
> days.
>

So I have been testing this a lot over the last couple of days, and I
have concluded that the patch works well as far as it goes, but I did
manage to construct another case where numeric_power() loses
precision. I think, though, that it would be better to tackle that as
a separate patch.

In writing the commit message for this patch, I realised that it was
possible to tidy up the local_rscale calculation part of it a little,
to make it more obvious what was going wrong, so attached is a
slightly tweaked version. I'll hold off on committing it for a few
more days in case anyone else wants to have a look. Tom?

The other issue I found is related to the first part of power_var(),
where it does a low-precision calculation to get an estimate for the
weight of the result. It occurred to me that for certain input bases,
that calculation could be made to be quite inaccurate, and therefore
lead to choosing the wrong rscale later. This is the test case I
constructed:

  (1-1.5123456789012345678e-1000) ^ 1.15e1003

Here, the base is a sliver under 1, and so ln(base) is approximately
-1.5e-1000, and ln_dweight is -1000 (the decimal weight of ln(base)).
The problem is that the local_rscale used for the first low-precision
calculation is limited to NUMERIC_MAX_DISPLAY_SCALE (which is 1000),
so we only compute ln_base to a scale of 1000 at that stage, and the
result is rounded to exactly 2e-1000, which is off by a factor of
around 1.33.

That makes it think the result weight will be -998, when actually it's
-755, so it then chooses a local_rscale for the full calculation
that's far too small, and the result is very inaccurate.
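
(Spelling out the arithmetic behind those two weights, as a rough check: the
estimated decimal weight of the result is exp * ln(base) / ln(10), so

    1.15e1003 * (-1.5123e-1000) / ln(10)  ~=  -1739.2 / 2.3026  ~=  -755.3

with the correctly computed ln(base), but

    1.15e1003 * (-2e-1000) / ln(10)  =  -2300 / 2.3026  ~=  -998.9

with the value rounded away at rscale 1000, matching the -755 and -998
weights above.)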

To fix this, I think it's necessary to remove the line that limits the
initial local_rscale. I tried that in a debugger, and managed to get a
result that agreed with the result from "bc -l" with a scale of 2000.

The reason I think it will be OK to remove that line is that it only
ever comes into play when ln_dweight is a large negative number (and
the smallest it can be is -16383). But that can only happen in
instances like this, where the base is very very close to 1. In such
cases, the ln(base) calculation is very fast, because it basically
only has to do a couple of Taylor series terms, and it's done. This
will still only be a low-precision estimate of the result (about 8
significant digits, shifted down a long way).

It might also be necessary to re-think the choice of local_rscale for
the mul_var() that follows. If the weight of exp is much larger than
the weight of ln_base (or vice versa), it doesn't really make sense to
compute the product to the same local_rscale. That might be a source
of other inaccuracies. I'll try to investigate some more.

Anyway, I don't think any of that should get in the way of committing
the current patch.

Regards,
Dean
From 071d877af3ffcb8e2abd445cb5963d90a2e766bc Mon Sep 17 00:00:00 2001
From: Dean Rasheed 
Date: Thu, 29 Jul 2021 17:26:08 +0100
Subject: [PATCH] Fix corner-case errors and loss of precision in
 numeric_power().

This fixes a couple of related problems that arise when raising
numbers to very large powers.

Firstly, when raising a negative number to a very large integer power,
the result should be well-defined, but the previous code would only
cope if the exponent was small enough to go through power_var_int().
Otherwise it would throw an internal error, attempting to take the
logarithm of a negative number. Fix this by adding suitable handling
to the general case in power_var() to cope with negative bases,
checking for integer powers there.

Next, when raising a (positive or negative) number whose absolute
value is slightly less than 1 to a very large power, the result should
approach zero as the power is increased. However, in some cases, for
sufficiently large powers, this would lose all precision and return 1
instead. This was due to the way that the local_rscale was being
calculated for the final full-precision calculation:

  local_rscale = rscale + (int) val - ln_dweight + 8

The first two terms on the right hand side are meant to give the
number of significant digits required in the result ("val" being the
estimated result weight). However, this failed to account for the fact
that rscale is clipped to a maximum of NUMERIC_MAX_DISPLAY_SCALE
(1000), and the result weight might be less than -1000, causing their
sum to be negative, leading to a loss of precision. Fix this by
forcing the number of significant digits to be nonnegative. It's OK
for it to be zero (when the result weight is less than -1000), since
the local_rscale value then includes a few extra digits to ensure an

Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Jan Wieck

On 7/29/21 2:04 PM, Julien Rouhaud wrote:

On Thu, Jul 29, 2021 at 11:46:12AM -0400, Jan Wieck wrote:



> I don't have a ready example of such an extension, but if we ever would
> convert the backend parts of Slony into an extension, it would be one.



FWIW I have an example of such an extension: the powa-archivist extension
script runs an anonymous code block and, if needed, creates a custom
wrapper to emulate the current_setting(text, boolean) variant that
doesn't exist on pre-9.6 servers.



Thank you!

I presume that pg_upgrade on a database with that extension installed 
would silently succeed and have the pg_catalog as well as public (or 
wherever) version of that function present.



Regards, Jan

--
Jan Wieck




Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Alvaro Herrera
On 2021-Jul-29, Dave Cramer wrote:

> > I suggest "can be upgraded" rather than "should be upgraded" because
> > we're not making a recommendation, merely stating the fact that it is
> > possible to do so.
>
> Why not recommend it? I was going to suggest that we actually run alter
> extension upgrade ... on all of them as a solution.
> 
> What are the downsides to upgrading them all by default ? AFAIK if they
> need upgrading this should run all of the SQL scripts, if they don't then
> this should be a NO-OP.

I'm not aware of any downsides, and I think it would be a good idea to
do so, but I also think that it would be good to sort out the docs
precisely (a backpatchable doc change, IMV) and once that is done we can
discuss how to improve pg_upgrade so that users no longer need to do
that (a non-backpatchable code change).  Incremental improvements and
all that ...

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/
"La libertad es como el dinero; el que no la sabe emplear la pierde" (Alvarez)




Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Bruce Momjian
On Thu, Jul 29, 2021 at 10:00:12AM -0700, David G. Johnston wrote:
> I'm warming up to "should" but maybe add a "why" such as "the old versions are
> considered unsupported on the newer server".
> 
> I dislike "usually via operating system commands", just offload this to the
> extension, i.e., "must be installed in the new cluster via installation
> procedures specific to, and documented by, each extension (for contrib it is
> usually enough to ensure the -contrib package was chosen to be installed along
> with the -server package for your operating system.)"
> 
> I would simplify the first two sentences to just:
> 
> If the old cluster used extensions those same extensions must be installed in
> the new cluster via installation procedures specific to, and documented by,
> each extension.  For contrib extensions it is usually enough to install the
> -contrib package via the same method you used to install the PostgreSQL 
> server.
> 
> I would consider my suggestion for "copy as-is/alter extension" wording in my
> previous email in lieu of the existing third and fourth sentences, with the
> "should" and "why" wording possibly worked in.  But the existing works ok.

I am sorry but none of your suggestions are exciting me --- they seem to
get into too much detail for the context.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: amcheck verification for GiST and GIN

2021-07-29 Thread Nikolay Samokhvalov
Hello,

First of all, thank you all -- Andrey, Peter, Heikki and others -- for this
work; GIN support in amcheck is *really* needed, especially for OS upgrades
such as from Ubuntu 16.04 (which is EOL now) to 18.04 or 20.04.

I was trying to check a bunch of GINs on a production system after switching
from Ubuntu 16.04 to 18.04 and got many errors. So I decided to check on
16.04 first (which is still used in production for that DB), without any
OS/glibc changes.

On 16.04, I still saw errors, which was not really expected because it would
mean that production is corrupted too. So REINDEX should fix it. But it
didn't -- see output below. I cannot share the data, and I am thinking about
how to create a synthetic demo of this. Any suggestions?

And is this a sign that the tool is wrong, rather than that we have real
corruption cases? (I assume that if we did, we would see no errors after
REINDEXing -- assuming, of course, that GIN itself doesn't have bugs.)

Env: Ubuntu 16.04 (so, glibc 2.27), Postgres 12.7, patch from Heikki
slightly adjusted to work with PG12
(https://gitlab.com/postgres/postgres/-/merge_requests/5); snippet used to
run amcheck: https://gitlab.com/-/snippets/2001962 (see file #3)

Before reindex:


INFO:  [2021-07-29 17:44:42.525+00] Processing 4/29: index:
index_XXX_trigram (index relpages: 117935; heap tuples: ~379793)...

ERROR: index "index_XXX_trigram" has wrong tuple order, block 65754, offset
232


test=# reindex index index_XXX_trigram;

REINDEX



After REINDEX:


INFO:  [2021-07-29 18:01:23.339+00] Processing 4/29: index:
index_XXX_trigram (index relpages: 70100; heap tuples: ~379793)...

ERROR: index "index_XXX_trigram" has wrong tuple order, block 70048, offset
253



On Thu, Jul 15, 2021 at 00:03 Heikki Linnakangas  wrote:

> On 07/08/2020 00:33, Peter Geoghegan wrote:
> > On Wed, May 27, 2020 at 10:11 AM Grigory Kryachko 
> wrote:
> >> Here is the patch which I (with Andrey as my advisor) built on the top
> of the last patch from this thread:
> https://commitfest.postgresql.org/25/1800/ .
> >> It adds an ability to verify validity  of GIN index. It is not polished
> yet, but it works and we wanted to show it to you so you can give us some
> feedback, and also let you know about this work if you have any plans of
> writing something like that yourselves, so that you do not redo what is
> already done.
> >
> > Can you rebase this patch, please?
> >
> > Also suggest breaking out the series into distinct patch files using
> > "git format-patch master".
>
> I rebased the GIN parts of this patch, see attached. I also ran pgindent
> and made some other tiny cosmetic fixes, but I didn't review the patch,
> only rebased it in the state it was.
>
> I was hoping that this would be useful to track down the bug we're
> discussing here:
>
> https://www.postgresql.org/message-id/CAJYBUS8aBQQL22oHsAwjHdwYfdB_NMzt7-sZxhxiOdEdn7cOkw%40mail.gmail.com.
>
> But now that I look what checks this performs, I doubt this will catch
> the kind of corruption that's happened there. I suspect it's more subtle
> than an inconsistencies between parent and child pages, because only a
> few rows are affected. But doesn't hurt to try.
>
> - Heikki
>


Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Bruce Momjian
On Thu, Jul 29, 2021 at 02:28:55PM -0400, Bruce Momjian wrote:
> On Thu, Jul 29, 2021 at 10:00:12AM -0700, David G. Johnston wrote:
> > I'm warming up to "should" but maybe add a "why" such as "the old versions 
> > are
> > considered unsupported on the newer server".
> > 
> > I dislike "usually via operating system commands", just offload this to the
> > extension, i.e., "must be installed in the new cluster via installation
> > procedures specific to, and documented by, each extension (for contrib it is
> > usually enough to ensure the -contrib package was chosen to be installed 
> > along
> > with the -server package for your operating system.)"
> > 
> > I would simplify the first two sentences to just:
> > 
> > If the old cluster used extensions those same extensions must be installed 
> > in
> > the new cluster via installation procedures specific to, and documented by,
> > each extension.  For contrib extensions it is usually enough to install the
> > -contrib package via the same method you used to install the PostgreSQL 
> > server.

Oh, and you can't use the same installation procedures as when you
installed the extension because that probably included CREATE EXTENSION.
This really highlights why this is tricky to explain --- we need the
binaries, but not the SQL that goes with it.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread David G. Johnston
On Thu, Jul 29, 2021 at 11:28 AM Bruce Momjian  wrote:

> I am sorry but none of your suggestions are exciting me --- they seem to
> get into too much detail for the context.
>

Fair, I still need to consider Alvaro's anyway, but given the amount of
general angst surrounding performing a pg_upgrade I do not feel that being
detailed is necessarily a bad thing, so long as the detail is relevant.
But I'll keep this in mind for my next reply.

David J.


Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread David G. Johnston
On Thu, Jul 29, 2021 at 11:35 AM Bruce Momjian  wrote:

>
> Oh, and you can't use the same installation procedures as when you
> installed the extension because that probably included CREATE EXTENSION.
> This really highlights why this is tricky to explain --- we need the
> binaries, but not the SQL that goes with it.
>

Maybe...but given that "installation to the O/S" is cluster-wide and
"CREATE EXTENSION" is database-specific, I believe this will generally take
care of itself in practice, especially if we leave in the part "(but ignore
any installation instructions that advise executing CREATE EXTENSION)".

David J.


Re: [PoC] Improve dead tuple storage for lazy vacuum

2021-07-29 Thread Yura Sokolov

Robert Haas wrote on 2021-07-29 20:15:
On Thu, Jul 29, 2021 at 5:11 AM Masahiko Sawada  
wrote:

Indeed. Given that the radix tree itself has other use cases, I have
no concern about using radix tree for vacuum's dead tuples storage. It
will be better to have one that can be generally used and has some
optimizations that are helpful also for vacuum's use case, rather than
having one that is very optimized only for vacuum's use case.


What I'm about to say might be a really stupid idea, especially since
I haven't looked at any of the code already posted, but what I'm
wondering about is whether we need a full radix tree or maybe just a
radix-like lookup aid. For example, suppose that for a relation <= 8MB
in size, we create an array of 1024 elements indexed by block number.
Each element of the array stores an offset into the dead TID array.
When you need to probe for a TID, you look up blkno and blkno + 1 in
the array and then bsearch only between those two offsets. For bigger
relations, a two or three level structure could be built, or it could
always be 3 levels. This could even be done on demand, so you
initialize all of the elements to some special value that means "not
computed yet" and then fill them the first time they're needed,
perhaps with another special value that means "no TIDs in that block".


An 8MB relation is not a problem, imo. There is no need to do anything to
handle an 8MB relation.

The problem is a 2TB relation. It has 256M pages and, let's suppose, 3G dead
tuples.

Then the offset array will be 2GB and the tuple offset array will be 6GB
(2-byte offset per tuple): 8GB in total.

We can make the offset array cover only the high 3 bytes of the block number.
We then will have a 1M-entry offset array weighing 8MB, and there will be an
array of 3-byte tuple pointers (1 remaining byte from the block number, and
2 bytes from the tuple offset) weighing 9GB.

But using per-batch compression schemes, there could be an amortized
4 bytes per page and 1 byte per tuple: 1GB + 3GB = 4GB of memory.
Yes, it is not as guaranteed as in the array approach. But 95% of the time it
is that low or even lower. And better: the more tuples are dead, the better
the compression works. A page with all tuples dead could be encoded in as
little as 5 bytes. Therefore, overall memory consumption is more stable and
predictable.

Lower memory consumption for tuple storage means there is less chance that
indexes will have to be scanned two or more times. It gives more
predictability in the user experience.


I don't know if this is better, but I do kind of like the fact that
the basic representation is just an array. It makes it really easy to
predict how much memory will be needed for a given number of dead
TIDs, and it's very DSM-friendly as well.


The whole thing could be encoded in one single array of bytes. Just give
"pointer-to-array" + "array-size" to the constructor, and use a "bump
allocator" inside. A complex logical structure doesn't imply
"DSM-unfriendliness" -- hmm, I mean, if it is suitably designed.

In fact, my code uses a bump allocator internally to avoid the
"per-allocation overhead" of "aset", "slab" or "generational". And the
IntegerSet2 version even uses it for all allocations since it has no
reallocatable parts.

Well, if a data structure has reallocatable parts, it could be less friendly
to DSM.

regards,

---
Yura Sokolov
y.soko...@postgrespro.ru
funny.fal...@gmail.com




Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Bruce Momjian
On Thu, Jul 29, 2021 at 01:43:09PM -0400, Álvaro Herrera wrote:
> On 2021-Jul-29, Bruce Momjian wrote:
> 
> > + If the old cluster used extensions, whether from
> > + contrib or some other source, it used
> > + shared object files (or DLLs) to implement these extensions, e.g.,
> > + pgcrypto.so.  Now, shared object files matching
> > + the new server binary must be installed in the new cluster, usually
> > + via operating system commands.  Do not load the schema definitions,
> > + e.g., CREATE EXTENSION pgcrypto, because these
> > + will be copied from the old cluster.  (Extensions should be upgraded
> > + later using ALTER EXTENSION ... UPGRADE.)
> 
> I propose this:
> 
> 
>   If the old cluster used shared-object files (or DLLs) for extensions
>   or other loadable modules, install recompiled versions of those files
>   onto the new cluster.
>   Do not install the extension themselves (i.e., do not run
>   CREATE EXTENSION), because extension definitions
>   will be carried forward from the old cluster.
> 
> 
> 
>   Extensions can be upgraded after pg_upgrade completes using
>   ALTER EXTENSION ... UPGRADE, on a per-database
>   basis.
> 
> 
> I suggest " ... for extensions or other loadable modules" because
> loadable modules aren't necessarily for extensions.  Also, it's
> perfectly possible to have extensions that don't have a loadable module.

Yes, good point.

> I suggest "extension definitions ... carried forward" instead of
> "extensions ... copied" (your proposed text) to avoid the idea that
> files are copied; use it instead of "schema definitions ... upgraded"
> (the current docs) to avoid the idea that they are actually upgraded;
> also, "schema definition" seems a misleading term to use here.

I used the term "duplicated".

> I suggest "can be upgraded" rather than "should be upgraded" because
> we're not making a recommendation, merely stating the fact that it is
> possible to do so.

Agreed.  Most extensions don't have updates between major versions.

> I suggest using the imperative mood, to be consistent with the
> surrounding text.  (Applies to the first para; the second para is
> informative.)

OK.

> I haven't seen it mentioned in the thread, but I think the final phrase
> of this  should be a separate step,
> 
> 
>  Copy custom full-text search files
>  
>   Copy any custom full text search file (dictionary, synonym, thesaurus,
>   stop word list) to the new server.
>  
> 
> 
> While this is closely related to extensions, it's completely different.

Agreed.  See attached patch.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index a83c63cd98..0a71465805 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -299,17 +299,27 @@ make prefix=/usr/local/pgsql.new install

 

-Install custom shared object files
+Install extension shared object files
 
 
- Install any custom shared object files (or DLLs) used by the old cluster
- into the new cluster, e.g., pgcrypto.so,
- whether they are from contrib
- or some other source. Do not install the schema definitions, e.g.,
- CREATE EXTENSION pgcrypto, because these will be upgraded
- from the old cluster.
- Also, any custom full text search files (dictionary, synonym,
- thesaurus, stop words) must also be copied to the new cluster.
+ Many extensions and custom modules, whether from
+ contrib or another source, use shared object
+ files (or DLLs), e.g., pgcrypto.so.  If the old
+ cluster used these, shared object files matching the new server binary
+ must be installed in the new cluster, usually via operating system
+ commands.  Do not load the schema definitions, e.g., CREATE
+ EXTENSION pgcrypto, because these will be duplicated from
+ the old cluster.  (Extensions with available updates can be processed
+ later using ALTER EXTENSION ... UPDATE.)
+
+   
+
+   
+Copy custom full-text search files
+
+
+ Copy any custom full text search files (dictionary, synonym,
+ thesaurus, stop words) from the old to the new cluster.
 

 
@@ -494,10 +504,10 @@ pg_upgrade.exe
  
 
  
-  Install custom shared object files
+  Install extension shared object files
 
   
-   Install the same custom shared object files on the new standbys
+   Install the same extension shared object files on the new standbys
that you installed in the new primary cluster.
   
  


Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Bruce Momjian
On Thu, Jul 29, 2021 at 02:19:17PM -0400, Álvaro Herrera wrote:
> On 2021-Jul-29, Dave Cramer wrote:
> 
> > > I suggest "can be upgraded" rather than "should be upgraded" because
> > > we're not making a recommendation, merely stating the fact that it is
> > > possible to do so.
> >
> > Why not recommend it? I was going to suggest that we actually run alter
> > extension upgrade ... on all of them as a solution.
> > 
> > What are the downsides to upgrading them all by default ? AFAIK if they
> > need upgrading this should run all of the SQL scripts, if they don't then
> > this should be a NO-OP.
> 
> I'm not aware of any downsides, and I think it would be a good idea to
> do so, but I also think that it would be good to sort out the docs
> precisely (a backpatchable doc change, IMV) and once that is done we can
> discuss how to improve pg_upgrade so that users no longer need to do
> that (a non-backpatchable code change).  Incremental improvements and
> all that ...

Agreed.  I don't think we have any consistent set of steps for detecting
and upgrading extensions --- that needs a lot more research.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: [PoC] Improve dead tuple storage for lazy vacuum

2021-07-29 Thread Andres Freund
Hi,

On 2021-07-29 13:15:53 -0400, Robert Haas wrote:
> I don't know if this is better, but I do kind of like the fact that
> the basic representation is just an array. It makes it really easy to
> predict how much memory will be needed for a given number of dead
> TIDs, and it's very DSM-friendly as well.

I think those advantages are far outstripped by the big disadvantage of
needing to either size the array accurately from the start, or to
reallocate the whole array.  Our current pre-allocation behaviour is
very wasteful for most vacuums but doesn't handle large work_mem at all,
causing unnecessary index scans.

Greetings,

Andres Freund




Re: needless complexity in StartupXLOG

2021-07-29 Thread Andres Freund
Hi,

On 2021-07-29 12:49:19 +0800, Julien Rouhaud wrote:
> On Thu, Jul 29, 2021 at 3:28 AM Andres Freund  wrote:
> >
> > [1] I really wish somebody had the energy to just remove single user and
> > bootstrap modes. The degree to which they increase complexity in the rest of
> > the system is entirely unreasonable. There's not actually any reason
> > bootstrapping can't happen with checkpointer et al running, it's just
> > autovacuum that'd need to be blocked.
> 
> Any objection to adding an entry for that in the wiki TODO?

Not sure there's enough consensus on the idea for that. I personally
think that's a good approach at reducing relevant complexity, but I
don't know if anybody agrees...

Greetings,

Andres Freund




Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Dave Cramer
Dave Cramer


On Thu, 29 Jul 2021 at 15:06, Bruce Momjian  wrote:

> On Thu, Jul 29, 2021 at 01:43:09PM -0400, Álvaro Herrera wrote:
> > On 2021-Jul-29, Bruce Momjian wrote:
> >
> > > + If the old cluster used extensions, whether from
> > > + contrib or some other source, it used
> > > + shared object files (or DLLs) to implement these extensions,
> e.g.,
> > > + pgcrypto.so.  Now, shared object files
> matching
> > > + the new server binary must be installed in the new cluster,
> usually
> > > + via operating system commands.  Do not load the schema
> definitions,
> > > + e.g., CREATE EXTENSION pgcrypto, because these
> > > + will be copied from the old cluster.  (Extensions should be
> upgraded
> > > + later using ALTER EXTENSION ... UPGRADE.)
> >
> > I propose this:
> >
> > 
> >   If the old cluster used shared-object files (or DLLs) for extensions
> >   or other loadable modules, install recompiled versions of those files
> >   onto the new cluster.
> >   Do not install the extension themselves (i.e., do not run
> >   CREATE EXTENSION), because extension definitions
> >   will be carried forward from the old cluster.
> > 
> >
> > 
> >   Extensions can be upgraded after pg_upgrade completes using
> >   ALTER EXTENSION ... UPGRADE, on a per-database
> >   basis.
> > 
> >
> > I suggest " ... for extensions or other loadable modules" because
> > loadable modules aren't necessarily for extensions.  Also, it's
> > perfectly possible to have extensions that don't have a loadable module.
>
> Yes, good point.
>
> > I suggest "extension definitions ... carried forward" instead of
> > "extensions ... copied" (your proposed text) to avoid the idea that
> > files are copied; use it instead of "schema definitions ... upgraded"
> > (the current docs) to avoid the idea that they are actually upgraded;
> > also, "schema definition" seems a misleading term to use here.
>
> I used the term "duplicated".
>
> > I suggest "can be upgraded" rather than "should be upgraded" because
> > we're not making a recommendation, merely stating the fact that it is
> > possible to do so.
>
> Agreed.  Most extensions don't have updates between major versions.
>
> > I suggest using the imperative mood, to be consistent with the
> > surrounding text.  (Applies to the first para; the second para is
> > informative.)
>
> OK.
>
> > I haven't seen it mentioned in the thread, but I think the final phrase
> > of this  should be a separate step,
> >
> > 
> >  Copy custom full-text search files
> >  
> >   Copy any custom full text search file (dictionary, synonym, thesaurus,
> >   stop word list) to the new server.
> >  
> > 
> >
> > While this is closely related to extensions, it's completely different.
>
> Agreed.  See attached patch.
>

So back to the original motivation for bringing this up. Recall that a
cluster was upgraded. Everything appeared to work fine, except that the
definitions of the functions were slightly different enough to cause a
fatal issue on restoring a dump from pg_dump.
Since pg_upgrade is now part of the core project, we need to make sure this
is not possible or be much more insistent that the user needs to upgrade
any extensions that require it.

I believe we should be doing more than making a recommendation.

Dave

>
>


Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread David G. Johnston
On Thu, Jul 29, 2021 at 12:28 PM Dave Cramer  wrote:

> So back to the original motivation for bringing this up. Recall that a
> cluster was upgraded. Everything appeared to work fine, except that the
> definitions of the functions were slightly different enough to cause a
> fatal issue on restoring a dump from pg_dump.
> Since pg_upgrade is now part of the core project, we need to make sure
> this is not possible or be much more insistent that the user needs to
> upgrade any extensions that require it.
>

I'm missing something here because I do not recall that level of detail
being provided.  The first email was simply an observation that the
pg_upgraded version and the create extension version were different in the
new cluster - which is working as designed (opinions of said design, at
least right here and now, are immaterial).

From your piecemeal follow-on descriptions I do see that pg_dump seems to
be involved - though a self-contained demonstration is not available that I
can find.  But so far as pg_dump is concerned it just needs to export the
current version each database is running for a given extension, and
pg_restore issue a CREATE EXTENSION for the same version when prompted.  I
presume it does this correctly but admittedly haven't checked.  IOW, if
pg_dump is failing here it is more likely its own bug and should be fixed
rather than blame pg_upgrade.  Or it's pg_stat_statements' bug and it should
be fixed.

In theory the procedure and requirements imposed by pg_upgrade here seem
reasonable.  Fewer moving parts during the upgrade is strictly better.  The
documentation was not clear on how things worked, and so its being cleaned
up, but the how hasn't been shown yet to be a problem nor that simply
running alter extension would be an appropriate solution for this single
case, let alone in general.  Since running alter extension manually is
simple, constructing such a test case and proving that the alter extension
at least works for it should be straightforward.

Without that I cannot support changing the behavior or even saying that
users must run alter extension manually to overcome a limitation in
pg_upgrade.  They should do so in order to keep their code base current and
running supported code - but that is a judgement we've always left to the
DBA, with the exception of strongly encouraging updating to the newest
point release and getting off unsupported major releases.

David J.




Re: Case expression pushdown

2021-07-29 Thread Tom Lane
Alexander Pyhalov  writes:
> [ 0001-Allow-pushing-CASE-expression-to-foreign-server-v7.patch ]

I looked this over.  It's better than before, but the collation
handling is still not at all correct.  We have to consider that a
CASE's arg expression supplies the collation for a contained
CaseTestExpr, otherwise we'll come to the wrong conclusions about
whether "CASE foreignvar WHEN ..." is shippable, if the foreignvar
is what's determining collation of the comparisons.
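
To illustrate the kind of expression at stake (assuming a postgres_fdw
foreign table "ft" with a text column "c"):

    SELECT * FROM ft
      WHERE CASE c WHEN 'foo' THEN true ELSE false END;

The WHEN comparison takes its collation from "c", so the walker has to see
that collation as deriving from the foreign Var to reach the right
shippability conclusion.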

This means that the CaseExpr level of recursion has to pass data down
to the CaseTestExpr level.  In the attached, I did that by adding an
additional argument to foreign_expr_walker().  That's a bit invasive,
but it's not awful.  I thought about instead adding fields to the
foreign_loc_cxt struct.  But that seemed considerably messier in the
end, because we'd then have some fields that are information sourced
at one recursion level and some that are info sourced at another
level.

I also whacked the regression test cases around a lot.  They seemed
to spend a lot of time on irrelevant combinations, while failing to
check the things that matter, namely whether collation-based pushdown
decisions are made correctly.

regards, tom lane

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 31919fda8c..d0cbfa8ad9 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -116,7 +116,8 @@ typedef struct deparse_expr_cxt
  */
 static bool foreign_expr_walker(Node *node,
 foreign_glob_cxt *glob_cxt,
-foreign_loc_cxt *outer_cxt);
+foreign_loc_cxt *outer_cxt,
+foreign_loc_cxt *case_arg_cxt);
 static char *deparse_type_name(Oid type_oid, int32 typemod);
 
 /*
@@ -158,6 +159,7 @@ static void deparseScalarArrayOpExpr(ScalarArrayOpExpr *node,
 static void deparseRelabelType(RelabelType *node, deparse_expr_cxt *context);
 static void deparseBoolExpr(BoolExpr *node, deparse_expr_cxt *context);
 static void deparseNullTest(NullTest *node, deparse_expr_cxt *context);
+static void deparseCaseExpr(CaseExpr *node, deparse_expr_cxt *context);
 static void deparseArrayExpr(ArrayExpr *node, deparse_expr_cxt *context);
 static void printRemoteParam(int paramindex, Oid paramtype, int32 paramtypmod,
 			 deparse_expr_cxt *context);
@@ -254,7 +256,7 @@ is_foreign_expr(PlannerInfo *root,
 		glob_cxt.relids = baserel->relids;
 	loc_cxt.collation = InvalidOid;
 	loc_cxt.state = FDW_COLLATE_NONE;
-	if (!foreign_expr_walker((Node *) expr, &glob_cxt, &loc_cxt))
+	if (!foreign_expr_walker((Node *) expr, &glob_cxt, &loc_cxt, NULL))
 		return false;
 
 	/*
@@ -283,6 +285,10 @@ is_foreign_expr(PlannerInfo *root,
  *
  * In addition, *outer_cxt is updated with collation information.
  *
+ * case_arg_cxt is NULL if this subexpression is not inside a CASE-with-arg.
+ * Otherwise, it points to the collation info derived from the arg expression,
+ * which must be consulted by any CaseTestExpr.
+ *
  * We must check that the expression contains only node types we can deparse,
  * that all types/functions/operators are safe to send (they are "shippable"),
  * and that all collations used in the expression derive from Vars of the
@@ -294,7 +300,8 @@ is_foreign_expr(PlannerInfo *root,
 static bool
 foreign_expr_walker(Node *node,
 	foreign_glob_cxt *glob_cxt,
-	foreign_loc_cxt *outer_cxt)
+	foreign_loc_cxt *outer_cxt,
+	foreign_loc_cxt *case_arg_cxt)
 {
 	bool		check_type = true;
 	PgFdwRelationInfo *fpinfo;
@@ -432,17 +439,17 @@ foreign_expr_walker(Node *node,
  * result, so do those first and reset inner_cxt afterwards.
  */
 if (!foreign_expr_walker((Node *) sr->refupperindexpr,
-		 glob_cxt, &inner_cxt))
+		 glob_cxt, &inner_cxt, case_arg_cxt))
 	return false;
 inner_cxt.collation = InvalidOid;
 inner_cxt.state = FDW_COLLATE_NONE;
 if (!foreign_expr_walker((Node *) sr->reflowerindexpr,
-		 glob_cxt, &inner_cxt))
+		 glob_cxt, &inner_cxt, case_arg_cxt))
 	return false;
 inner_cxt.collation = InvalidOid;
 inner_cxt.state = FDW_COLLATE_NONE;
 if (!foreign_expr_walker((Node *) sr->refexpr,
-		 glob_cxt, &inner_cxt))
+		 glob_cxt, &inner_cxt, case_arg_cxt))
 	return false;
 
 /*
@@ -478,7 +485,7 @@ foreign_expr_walker(Node *node,
  * Recurse to input subexpressions.
  */
 if (!foreign_expr_walker((Node *) fe->args,
-		 glob_cxt, &inner_cxt))
+		 glob_cxt, &inner_cxt, case_arg_cxt))
 	return false;
 
 /*
@@ -526,7 +533,7 @@ foreign_expr_walker(Node *node,
  * Recurse to input subexpressions.
  */
 if (!foreign_expr_walker((Node *) oe->args,
-		 glob_cxt, &inner_cxt))
+		 glob_cxt, &inner_cxt, case_arg_cxt))
 	return false;
 
 /*
@@ -566,7 +573,7 @@ foreign_expr_walker(Node *node,
  * Recurse to input subexpressions.
  */
 if (!foreign_expr_walker((Node *) oe->args,
-		 glob_c

Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Tom Lane
Dave Cramer  writes:
> So back to the original motivation for bringing this up. Recall that a
> cluster was upgraded. Everything appeared to work fine, except that the
> definitions of the functions were slightly different enough to cause a
> fatal issue on restoring a dump from pg_dump.
> Since pg_upgrade is now part of the core project, we need to make sure this
> is not possible or be much more insistent that the user needs to upgrade
> any extensions that require it.

TBH, this seems like mostly the fault of the extension author.
The established design is that the SQL-level objects will be
carried forward verbatim by pg_upgrade.  Therefore, any newer-version
shared library MUST be able to cope sanely with SQL objects from
some previous revision.  The contrib modules provide multiple
examples of ways to do this.

If particular extension authors don't care to go to that much
trouble, it's on their heads to document that their extensions
are unsafe to pg_upgrade.

regards, tom lane




Re: Doc: Fixed the result of the bit_count example

2021-07-29 Thread Daniel Gustafsson
> On 29 Jul 2021, at 11:35, Daniel Gustafsson  wrote:

> I'll apply this shortly backpatched to 14 where bit_count was introduced.


And done, thanks!

--
Daniel Gustafsson   https://vmware.com/





Re: Replace l337sp34k in comments.

2021-07-29 Thread Gavin Flower

On 30/07/21 12:51 am, Geoff Winkless wrote:
On Thu, 29 Jul 2021 at 11:22, Andrew Dunstan > wrote:


Personally, I would have written this as just "up to date", I don't
think the hyphens are required.

FWIW Merriam-Webster and the CED suggest "up-to-date" when before a 
noun, so the changes should be "up-to-date answer" but "are up to date".


https://dictionary.cambridge.org/dictionary/english/up-to-date 



Geoff


That 'feels' right to me.

Though in code, possibly it would be better to just use 'up-to-date' in 
code for consistency and to make it easier to grep?


As a minor aside: double quotes should be used for speech and single 
quotes for quoting!



Cheers,
Gavin





Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Bruce Momjian
 On Thu, Jul 29, 2021 at 05:01:24PM -0400, Tom Lane wrote:
> Dave Cramer  writes:
> > So back to the original motivation for bringing this up. Recall that a
> > cluster was upgraded. Everything appeared to work fine, except that the
> > definitions of the functions were slightly different enough to cause a
> > fatal issue on restoring a dump from pg_dump.
> > Since pg_upgrade is now part of the core project, we need to make sure this
> > is not possible or be much more insistent that the user needs to upgrade
> > any extensions that require it.
> 
> TBH, this seems like mostly the fault of the extension author.
> The established design is that the SQL-level objects will be
> carried forward verbatim by pg_upgrade.  Therefore, any newer-version
> shared library MUST be able to cope sanely with SQL objects from
> some previous revision.  The contrib modules provide multiple
> examples of ways to do this.
> 
> If particular extension authors don't care to go to that much
> trouble, it's on their heads to document that their extensions
> are unsafe to pg_upgrade.

I think we need to first give clear instructions on how to find out if
an extension update is available, and then how to update it.  I am
thinking we should supply a query which reports all extensions that can
be upgraded, at least for contrib.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Tom Lane
Bruce Momjian  writes:
> I think we need to first give clear instructions on how to find out if
> an extension update is available, and then how to update it.  I am
> thinking we should supply a query which reports all extensions that can
> be upgraded, at least for contrib.

I suggested awhile ago that pg_upgrade should look into
pg_available_extensions in the new cluster, and prepare
a script with ALTER EXTENSION UPDATE commands for
anything that's installed but is not the (new cluster's)
default version.
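
Something of roughly this shape, run against each database, would find them
(a sketch; pg_available_extensions reports the installed and default
versions):

    SELECT name, installed_version, default_version
      FROM pg_available_extensions
     WHERE installed_version IS NOT NULL
       AND installed_version IS DISTINCT FROM default_version;

and the same WHERE clause could drive generation of the
ALTER EXTENSION ... UPDATE commands for the script.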

regards, tom lane




Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Bruce Momjian
On Thu, Jul 29, 2021 at 06:19:56PM -0400, Tom Lane wrote:
> Bruce Momjian  writes:
> > I think we need to first give clear instructions on how to find out if
> > an extension update is available, and then how to update it.  I am
> > thinking we should supply a query which reports all extensions that can
> > be upgraded, at least for contrib.
> 
> I suggested awhile ago that pg_upgrade should look into
> pg_available_extensions in the new cluster, and prepare
> a script with ALTER EXTENSION UPDATE commands for
> anything that's installed but is not the (new cluster's)
> default version.

I can do that, but I would think a pg_dump/restore would also have this
issue, so should this be more generic?  If we had a doc section about
that, we could add it a step to run in the pg_upgrade instructions.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Tom Lane
Bruce Momjian  writes:
> On Thu, Jul 29, 2021 at 06:19:56PM -0400, Tom Lane wrote:
>> I suggested awhile ago that pg_upgrade should look into
>> pg_available_extensions in the new cluster, and prepare
>> a script with ALTER EXTENSION UPDATE commands for
>> anything that's installed but is not the (new cluster's)
>> default version.

> I can do that, but I would think a pg_dump/restore would also have this
> issue, so should this be more generic?

No, because dump/restore does not have this issue.  Regular pg_dump just
issues "CREATE EXTENSION" commands, so you automatically get the target
server's default version.

regards, tom lane




Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Bruce Momjian
On Thu, Jul 29, 2021 at 06:29:11PM -0400, Tom Lane wrote:
> Bruce Momjian  writes:
> > On Thu, Jul 29, 2021 at 06:19:56PM -0400, Tom Lane wrote:
> >> I suggested awhile ago that pg_upgrade should look into
> >> pg_available_extensions in the new cluster, and prepare
> >> a script with ALTER EXTENSION UPDATE commands for
> >> anything that's installed but is not the (new cluster's)
> >> default version.
> 
> > I can do that, but I would think a pg_dump/restore would also have this
> > issue, so should this be more generic?
> 
> No, because dump/restore does not have this issue.  Regular pg_dump just
> issues "CREATE EXTENSION" commands, so you automatically get the target
> server's default version.

Oh, so pg_upgrade does it differently so the oids are preserved?

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Dave Cramer
On Thu, 29 Jul 2021 at 18:39, Bruce Momjian  wrote:

> On Thu, Jul 29, 2021 at 06:29:11PM -0400, Tom Lane wrote:
> > Bruce Momjian  writes:
> > > On Thu, Jul 29, 2021 at 06:19:56PM -0400, Tom Lane wrote:
> > >> I suggested awhile ago that pg_upgrade should look into
> > >> pg_available_extensions in the new cluster, and prepare
> > >> a script with ALTER EXTENSION UPDATE commands for
> > >> anything that's installed but is not the (new cluster's)
> > >> default version.
> >
> > > I can do that, but I would think a pg_dump/restore would also have this
> > > issue, so should this be more generic?
> >
> > No, because dump/restore does not have this issue.  Regular pg_dump just
> > issues "CREATE EXTENSION" commands, so you automatically get the target
> > server's default version.
>
> Oh, so pg_upgrade does it differently so the oids are preserved?
>
>
I suspect this is part of --binary-upgrade mode

Dave

> --
>   Bruce Momjian  https://momjian.us
>   EDB  https://enterprisedb.com
>
>   If only the physical world exists, free will is an illusion.
>
>


Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread Alvaro Herrera
On 2021-Jul-29, Bruce Momjian wrote:

> On Thu, Jul 29, 2021 at 06:29:11PM -0400, Tom Lane wrote:

> > No, because dump/restore does not have this issue.  Regular pg_dump just
> > issues "CREATE EXTENSION" commands, so you automatically get the target
> > server's default version.
> 
> Oh, so pg_upgrade does it differently so the oids are preserved?

Have a look at pg_dump --binary-upgrade output :-)

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/
"Ellos andaban todos desnudos como su madre los parió, y también las mujeres,
aunque no vi más que una, harto moza, y todos los que yo vi eran todos
mancebos, que ninguno vi de edad de más de XXX años" (Cristóbal Colón)




Re: Use WaitLatch for {pre, post}_auth_delay instead of pg_usleep

2021-07-29 Thread Bossart, Nathan
On 7/29/21, 12:59 AM, "Kyotaro Horiguchi"  wrote:
> At Thu, 29 Jul 2021 09:52:08 +0900, Michael Paquier  
> wrote in
>> On Wed, Jul 28, 2021 at 08:28:12PM +, Bossart, Nathan wrote:
>> > On 7/28/21, 11:32 AM, "Tom Lane"  wrote:
>> >> I follow the idea of using WaitLatch to ensure that the delays are
>> >> interruptible by postmaster signals, but even that isn't worth a
>> >> lot given the expected use of these things.  I don't see a need to
>> >> expend any extra effort on wait-reporting.
>> >
>> > +1.  The proposed patch doesn't make the delay visibility any worse
>> > than what's already there.
>>
>> Agreed to just drop the patch (my opinion about this patch is
>> unchanged).  Not to mention that wait events are not available at SQL
>> level at this stage yet.
>
> I'm +1 to not adding wait event stuff at all. So the only advantage
> this patch would offer is interruptivity. I vote +-0.0 for adding that
> interruptivity (+1.0 from the previous opinion of mine:p).

I'm still in favor of moving to WaitLatch() for pre/post_auth_delay,
but I don't think we should worry about the wait reporting stuff.  The
patch doesn't add a tremendous amount of complexity, it improves the
behavior on postmaster crashes, and it follows the best practice
described in pgsleep.c of using WaitLatch() for long sleeps.
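
For reference, the pattern in question is roughly this (a sketch, not the
exact patch hunk, reusing the generic pg_sleep wait event):

    /* delay_secs would be PreAuthDelay or PostAuthDelay */
    (void) WaitLatch(MyLatch,
                     WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
                     delay_secs * 1000L,
                     WAIT_EVENT_PG_SLEEP);
    ResetLatch(MyLatch);

Unlike a plain pg_usleep(), this returns early when the latch is set and
exits promptly if the postmaster dies.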

Nathan



Re: A qsort template

2021-07-29 Thread John Naylor
DER BY 1 DESC OFFSET '$ROWS''
#		run_query $1 $wm 0 'SELECT * FROM txt_test ORDER BY 1 DESC OFFSET '$ROWS''
		run_index $1 $wm 0 'CREATE INDEX x ON txt_test (a); DROP INDEX x'

		run_query $1 $wm 0 'SELECT * FROM num_test ORDER BY 1 OFFSET '$ROWS''
#		run_query $1 $wm 0 'SELECT * FROM num_test ORDER BY 1 DESC OFFSET '$ROWS''
		run_index $1 $wm 0 'CREATE INDEX x ON num_test (a); DROP INDEX x'

	done

}

create_tables

log "sort benchmark $ROWS"

load_random
run_queries "random"

load_sorted
run_queries "sorted"

load_almost_sorted
run_queries "almost_sorted"

load_reversed
run_queries "reversed"

load_organ_pipe
run_queries "organ_pipe"

load_rotated
run_queries "rotated"

load_0_1
run_queries "0_1"


test-acc-tuplesort-20210729.ods
Description: application/vnd.oasis.opendocument.spreadsheet


0001-WIP-Accelerate-tuple-sorting-for-common-types.patch
Description: Binary data


Re: Replace l337sp34k in comments.

2021-07-29 Thread Michael Paquier
On Fri, Jul 30, 2021 at 09:46:59AM +1200, Gavin Flower wrote:
> That 'feels' right to me.
> 
> Though in code, possibly it would be better to just use 'up-to-date' in code
> for consistency and to make it easier to grep?

The change in llvmjit_expr.c may not look like an adjective though,
which I admit can be a bit confusing.  Still that does not look
completely wrong to me either.
--
Michael


signature.asc
Description: PGP signature


Re: Autovacuum on partitioned table (autoanalyze)

2021-07-29 Thread Andres Freund
Hi,

CCing RMT because I think we need to do something about this for v14.

On 2021-07-27 19:23:42 -0700, Andres Freund wrote:
> On 2021-07-22 13:54:58 -0700, Andres Freund wrote:
> > On 2021-04-08 01:20:14 -0400, Alvaro Herrera wrote:
> > > On 2021-Apr-07, Alvaro Herrera wrote:
> > >
> > > > OK, I bit the bullet and re-did the logic in the way I had proposed
> > > > earlier in the thread: do the propagation on the collector's side, by
> > > > sending only the list of ancestors: the collector can read the tuple
> > > > change count by itself, to add it to each ancestor.  This seems less
> > > > wasteful.  Attached is v16 which does it that way and seems to work
> > > > nicely under my testing.
> > >
> > > Pushed with this approach.  Thanks for persisting with this.
> >
> > I'm looking at this in the context of rebasing & polishing the shared
> > memory stats patch.
> >
> > I have a few questions / concerns:
>
> Another one, and I think this might warrant thinking about for v14:
>
> Isn't this going to create a *lot* of redundant sampling?  Especially if you
> have any sort of nested partition tree. In the most absurd case a partition
> with n parents will get sampled n times, solely due to changes to itself.
>
> Look at the following example:
>
> BEGIN;
> DROP TABLE if exists p;
> CREATE TABLE p (i int) partition by range(i);
> CREATE TABLE p_0 PARTITION OF p FOR VALUES FROM (   0) to (5000) partition by range(i);
> CREATE TABLE p_0_0 PARTITION OF p_0 FOR VALUES FROM (   0) to (1000);
> CREATE TABLE p_0_1 PARTITION OF p_0 FOR VALUES FROM (1000) to (2000);
> CREATE TABLE p_0_2 PARTITION OF p_0 FOR VALUES FROM (2000) to (3000);
> CREATE TABLE p_0_3 PARTITION OF p_0 FOR VALUES FROM (3000) to (4000);
> CREATE TABLE p_0_4 PARTITION OF p_0 FOR VALUES FROM (4000) to (5000);
> -- create some initial data
> INSERT INTO p select generate_series(0, 5000 - 1) data FROM generate_series(1, 100) reps;
> COMMIT;
>
> UPDATE p_0_4 SET i = i;
>
>
> Whenever the update is executed, all partitions will be sampled at least twice
> (once for p and once for p_0), with p_0_4 sampled three times.
>
> Of course, this is an extreme example, but it's not hard to imagine cases
> where v14 will cause the number of auto-analyzes to increase sufficiently to bog
> down autovacuum to a problematic degree.
>
>
> Additionally, while analyzing a partitioned table, all of its child partitions
> are AccessShareLock'ed at once. If a partition hierarchy has more than one
> level, it actually is likely that multiple autovacuum workers will end up
> processing the ancestors separately.  This seems like it might contribute to
> lock exhaustion issues with larger partition hierarchies?
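
As an aside, that locking behavior is easy to observe from SQL (a sketch;
ANALYZE is allowed inside a transaction block, so the locks remain visible
until commit):

BEGIN;
ANALYZE p;
SELECT relation::regclass AS rel, mode
  FROM pg_locks
 WHERE pid = pg_backend_pid() AND locktype = 'relation'
 ORDER BY 1;
COMMIT;

With the partition tree from the example above, every leaf under p should show
up with AccessShareLock inside that single transaction.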


I started to write a patch rejiggering autovacuum.c portion of this
change. While testing it I hit the case of manual ANALYZEs leaving
changes_since_analyze for partitioned tables in a bogus state - without a
minimally invasive way to fix that. After a bit of confused staring I realized
that the current code has a very similar problem:

Using the same setup as above:

INSERT INTO p VALUES (0,0); /* repeat as many times as desired */
ANALYZE p_0_0;

At this point the system will have lost track of the changes to p_0_0, unless
an autovacuum worker was launched between the INSERTs and the ANALYZE (which
would cause pgstat_report_anl_ancestors() to report the change count upwards).
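
The lost counts can also be seen from SQL (illustrative;
pg_stat_all_tables.n_mod_since_analyze exposes changes_since_analyze, and the
stats collector needs a moment to catch up):

INSERT INTO p VALUES (0, 0);   -- repeat a few times
ANALYZE p_0_0;
SELECT relname, n_mod_since_analyze
  FROM pg_stat_all_tables
 WHERE relname IN ('p', 'p_0', 'p_0_0')
 ORDER BY relname;

Unless an autovacuum worker happened to run in between, p and p_0 keep
reporting 0 here, i.e. those inserts never count towards auto-analyzing the
ancestors.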

There appears to be code trying to address that, but I don't see how it
ever does anything meaningful?

    /*
     * Now report ANALYZE to the stats collector.  For regular tables, we do
     * it only if not doing inherited stats.  For partitioned tables, we only
     * do it for inherited stats. (We're never called for not-inherited stats
     * on partitioned tables anyway.)
     *
     * Reset the changes_since_analyze counter only if we analyzed all
     * columns; otherwise, there is still work for auto-analyze to do.
     */
    if (!inh || onerel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
        pgstat_report_analyze(onerel, totalrows, totaldeadrows,
                              (va_cols == NIL));

    /*
     * If this is a manual analyze of all columns of a permanent leaf
     * partition, and not doing inherited stats, also let the collector know
     * about the ancestor tables of this partition.  Autovacuum does the
     * equivalent of this at the start of its run, so there's no reason to do
     * it there.
     */
    if (!inh && !IsAutoVacuumWorkerProcess() &&
        (va_cols == NIL) &&
        onerel->rd_rel->relispartition &&
        onerel->rd_rel->relkind == RELKIND_RELATION &&
        onerel->rd_rel->relpersistence == RELPERSISTENCE_PERMANENT)
    {
        pgstat_report_anl_ancestors(RelationGetRelid(onerel));
    }

The pgstat_report_analyze() triggers pgstat_recv_analyze() to reset the
counter that pg

Re: speed up verifying UTF-8

2021-07-29 Thread John Naylor
On Mon, Jul 26, 2021 at 8:56 AM John Naylor wrote:
>
> >
> > Does that (and "len >= 32" condition) mean the patch does not improve
> > validation of the shorter strings (the ones less than 32 bytes)?
>
> Right. Also, the 32 byte threshold was just a temporary need for testing
> 32-byte stride -- testing different thresholds wouldn't hurt.  I'm not
> terribly concerned about short strings, though, as long as we don't
> regress.

I put together the attached quick test to try to rationalize the fast-path
threshold. (In case it isn't obvious, it must be at least 16 on all builds,
since wchar.c doesn't know which implementation it's calling, and SSE
register width sets the lower bound.) I changed the threshold first to 16,
and then 10, which will force using the byte-at-a-time code.

If we have only 16 bytes in the input, it still seems to be faster to use
SSE, even though it's called through a function pointer on x86. I didn't
test the DFA path, but I don't think the conclusion would be different.
I'll include the 16 threshold next time I need to update the patch.
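
To make the dispatch concrete, a minimal sketch (the helper names are invented
for illustration; the real patch reaches the fast path through a function
pointer chosen at startup):

int
pg_utf8_verifystr(const unsigned char *s, int len)
{
	/*
	 * The cutoff must be at least 16 on every build: this caller cannot know
	 * whether the fast path resolves to the SSE implementation, and that one
	 * consumes 16-byte (one XMM register) chunks.
	 */
	if (len >= 16)
		return utf8_verify_fast(s, len);		/* SIMD or DFA fast path */
	return utf8_verify_fallback(s, len);		/* byte-at-a-time */
}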

Macbook x86, clang 12:

master + use 16:
 asc16 | asc32 | asc64 | mb16 | mb32 | mb64
-------+-------+-------+------+------+------
   270 |   279 |   282 |  291 |  296 |  304

force byte-at-a-time:
 asc16 | asc32 | asc64 | mb16 | mb32 | mb64
-------+-------+-------+------+------+------
   277 |   292 |   310 |  296 |  317 |  362

--
John Naylor
EDB: http://www.enterprisedb.com


mbverifystr-threshold.sql
Description: Binary data

