Re: bogus: logical replication rows/cols combinations
On Mon, May 2, 2022 at 6:11 PM Alvaro Herrera wrote:
>
> On 2022-May-02, Amit Kapila wrote:
> >
> > I think it is possible to expose a list of publications for each
> > walsender as it is stored in each walsender's
> > LogicalDecodingContext->output_plugin_private. AFAIK, each walsender
> > can have one such LogicalDecodingContext and we can probably share it
> > via shared memory?
>
> I guess we need to create a DSM each time a walsender opens a
> connection, at START_REPLICATION time.  Then ALTER PUBLICATION needs to
> connect to all DSMs of all running walsenders and see if they are
> reading from it.  Is that what you have in mind?  Alternatively, we
> could have one DSM per publication with a PID array of all walsenders
> that are sending it (each walsender needs to add its PID as it starts).
> The latter might be better.
>

While thinking about using DSM here, I came across one of your commits,
f2f9fcb303, which seems to indicate that it is not a good idea to rely on
DSM. However, I think you changed dynamic shared memory to fixed shared
memory usage there because fixed shared memory was more suitable, not
because DSM is unportable: I see a later commit, bcbd940806, where we
removed the 'none' option of dynamic_shared_memory_type. So, I think it
should be okay to use DSM in this context. What do you think?

--
With Regards,
Amit Kapila.
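For illustration, a minimal sketch of the "one DSM per publication with a PID array" idea could look like the following. This is only a sketch: the struct and function names are hypothetical, and the sizing policy, locking, and DSM handle bookkeeping (how ALTER PUBLICATION would find the segment) are all elided.

    #include "postgres.h"
    #include "miscadmin.h"
    #include "storage/dsm.h"
    #include "storage/shmem.h"

    /* Hypothetical per-publication shared state: PIDs of attached walsenders. */
    typedef struct PublicationWalSenderState
    {
        int         npids;
        int         pids[FLEXIBLE_ARRAY_MEMBER];    /* MyProcPid values */
    } PublicationWalSenderState;

    /* Walsender side: create the segment and advertise our PID (no locking shown). */
    static dsm_segment *
    advertise_walsender_for_publication(int max_walsenders)
    {
        Size        size = offsetof(PublicationWalSenderState, pids) +
            mul_size(max_walsenders, sizeof(int));
        dsm_segment *seg = dsm_create(size, 0);
        PublicationWalSenderState *state =
            (PublicationWalSenderState *) dsm_segment_address(seg);

        state->npids = 0;
        state->pids[state->npids++] = MyProcPid;
        return seg;
    }

ALTER PUBLICATION would then attach with dsm_attach() using a handle published somewhere findable (that part is the open design question above).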
Re: wrong fds used for refilenodes after pg_upgrade relfilenode changes
On Wed, May 4, 2022 at 2:23 PM Thomas Munro wrote:
> Assuming no objections or CI failures show up, I'll consider pushing
> the first two patches tomorrow.

Done. Time to watch the build farm.

It's possible that these changes will produce some blowback, now that we're
using PROCSIGNAL_BARRIER_SMGRRELEASE on common platforms. Obviously the
earlier work started down this path already, but that was Windows-only, so
it probably didn't get much attention from our mostly Unix crowd. For
example, if there are bugs in the procsignal barrier system, or if there
are places that don't process interrupts at all or promptly, we might hear
complaints about that. That bug surface includes, for example, background
workers created by extensions.

An example on my mind is a place where we hold interrupts while cleaning up
temporary files (= a loop of arbitrary size with filesystem ops that might
hang on slow storage), so a concurrent DROP TABLESPACE will have to wait,
which is kinda strange because it would in fact be perfectly safe to
process this particular "interrupt". In that case we really just don't want
the kinds of interrupts that perform non-local exits and prevent our
cleanup work from running to completion, but we don't have a way to say
that.

I think we'll probably also want to invent a way to report which backend is
holding up progress, since without that it's practically impossible for an
end user to understand why their command is hanging.
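To make the shape of that problem concrete, here is a hedged sketch (not the actual PostgreSQL code; the helper function and variables are hypothetical) of the pattern being described:

    /* Compiles only in a backend context; illustrative names throughout. */
    #include "postgres.h"
    #include "miscadmin.h"
    #include "storage/fd.h"

    static void
    cleanup_temp_files(const char **paths, int npaths)
    {
        /*
         * While interrupts are held, a pending ProcSignalBarrier (such as
         * PROCSIGNAL_BARRIER_SMGRRELEASE) cannot be absorbed, so a concurrent
         * DROP TABLESPACE waiting on the barrier is stuck for the duration of
         * this loop, even though absorbing that particular barrier here would
         * be harmless.
         */
        HOLD_INTERRUPTS();
        for (int i = 0; i < npaths; i++)
            durable_unlink(paths[i], LOG);  /* may hang on slow storage */
        RESUME_INTERRUPTS();
    }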
Re: Support logical replication of DDLs
On Fri, May 6, 2022 at 10:21 PM Zheng Li wrote:
>
> > Attached is a set of two patches as an attempt to evaluate this approach.
>
> Thanks for exploring this direction. I read the deparsing thread and
> your patch. Here is my thought:
> 1. The main concern on maintainability of the deparsing code still
> applies if we want to adapt it for DDL replication.
>

I agree that it adds to our maintainability effort, in that every time we
enhance any DDL or add a new DDL that needs to be replicated, we probably
need to change the deparsing code. But OTOH this approach seems to provide
better flexibility. So, in the long run, maybe the effort is worthwhile. I
am not completely sure at this stage which approach is better, but I
thought it worth evaluating this approach, as Alvaro and Robert seem to
prefer this idea.

> 2. The search_path and role still need to be handled, in the deparsing
> code. And I think it's actually more overhead to qualify every object
> compared to just logging the search_path and enforcing it on the apply
> worker.
>

But doing it in the deparsing code will have the benefit that the other
plugins won't have to develop similar logic.

--
With Regards,
Amit Kapila.
Re: Collation version tracking for macOS
On Mon, Feb 14, 2022 at 10:00 PM Peter Eisentraut wrote:
> During development, I have been using the attached patch to simulate
> libc collation versions on macOS.  It just uses the internal major OS
> version number.  I don't know to what extent the libc locales on
> macOS are maintained or updated at all, so I don't know what practical
> effect this would have.  Again, it's mainly for development.  If there
> is interest from others, I think we could add this, maybe disabled by
> default, or we just keep it in the mailing list archives for interested
> parties.

Last time I looked into this it seemed like macOS's strcoll() gave sensible
answers in the traditional single-byte encodings, but didn't understand
UTF-8 at all, so you get C/strcmp() order. In other words, there was
effectively nothing to version. I remember that other old Unixes used to be
like that, and I suspect that they might be using old pre-UTF-8 FreeBSD
code for locales, based on a quick peek at [1] (though FreeBSD itself has
since learned to do CLDR-based UTF-8 sorting with a completely new
implementation shared with other OSes).

This makes me wonder if Apple is hiding another collation implementation
somewhere up its sleeve -- surely that libc support is not good enough for
the world's shiny globalised macOS/iOS apps? Maybe UCCompareText() and
friends (UnicodeUtilitiesCoreLib) and the various Obj-C NSString comparison
stuff, all of which probably predate Unixoid macOS (google tells me that
UnicodeUtilities.h was present in macOS 9). It wouldn't be surprising if it
shares nothing with the modern OS's C runtime stuff that came via NeXT.

Just mentioning this as a curiosity, because I was trying to figure out how
that could be left non-working without anyone complaining...

[1] https://github.com/apple-open-source-mirror/Libc/tree/master/locale
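A quick way to check this behaviour for yourself is a tiny test program along these lines (a sketch; which locale names are actually available varies by platform):

    #include <locale.h>
    #include <stdio.h>
    #include <string.h>

    int
    main(void)
    {
        if (setlocale(LC_COLLATE, "en_US.UTF-8") == NULL)
        {
            fprintf(stderr, "locale not available\n");
            return 1;
        }

        /*
         * A CLDR-aware collation sorts "apple" before "Banana" (negative
         * result); raw strcmp() byte order puts 'B' (0x42) before 'a' (0x61),
         * so a libc that doesn't really understand UTF-8 returns the same
         * sign as strcmp() here.
         */
        printf("strcoll: %d\n", strcoll("apple", "Banana"));
        printf("strcmp:  %d\n", strcmp("apple", "Banana"));
        return 0;
    }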
Re: make MaxBackends available in _PG_init
On Fri, May 6, 2022 at 5:15 PM Nathan Bossart wrote:
> On Fri, May 06, 2022 at 08:27:11AM -0700, Nathan Bossart wrote:
> > +1, I'll put together a new patch set.
>
> As promised...

Looks reasonable to me. Anyone else have thoughts?

--
Robert Haas
EDB: http://www.enterprisedb.com
Re: Mark all GUC variable as PGDLLIMPORT
Hi,

On 2022-04-08 08:42:38 -0400, Robert Haas wrote:
> On Wed, Apr 6, 2022 at 7:56 PM Michael Paquier wrote:
> > On Wed, Apr 06, 2022 at 12:57:29AM +0700, John Naylor wrote:
> > > For these two patches, I'd say a day or two after feature freeze is a
> > > reasonable goal.
> >
> > Yeah.  For patches as invasive as the PGDLLIMPORT business and the
> > frontend error refactoring, I am also fine to have two exceptions with
> > the freeze deadline.
>
> Done now.

Just noticed that
  extern sigset_t UnBlockSig, BlockSig, StartupBlockSig;
are unadorned.

There's also a number of variables that are only declared in .c files that
!windows can still access. Some likely aren't worth caring about, but some
others are underlying GUCs, so we probably do?

E.g.
  CommitDelay
  CommitSiblings
  default_tablespace
  ignore_checksum_failure
  ignore_invalid_pages
  Log_disconnections
  ssl_renegotiation_limit
  temp_tablespaces
  Unix_socket_group
  Unix_socket_permissions
  wal_level_options

Likely don't care:
  dynamic_shared_memory_options
  gistBufferingOptValues
  recovery_target_action_options
  ssl_protocol_versions_info

Also noticed that the bbsink_ops variables are const, instead of static
const; was that intentional?

Greetings,

Andres Freund
Re: Possible corruption by CreateRestartPoint at promotion
On Fri, May 06, 2022 at 08:52:45AM -0700, Nathan Bossart wrote:
> I was looking at other changes in this area (e.g., 3c64dcb), and now I'm
> wondering if we actually should invalidate the minRecoveryPoint when the
> control file no longer indicates archive recovery.  Specifically, what
> happens if a base backup of a newly promoted standby is used for a
> point-in-time restore?  If the checkpoint location is updated and all
> previous segments have been recycled/removed, it seems like the
> minRecoveryPoint might point to a missing segment.

A new checkpoint is enforced at the beginning of the backup, which would
update minRecoveryPoint and minRecoveryPointTLI, and we don't allow a
backup to finish if it began on a standby that was promoted in between.

--
Michael
Re: [RFC] building postgres with meson -v8
Hi,

On 2022-05-06 14:27:24 -0700, Andres Freund wrote:
> > 0003-meson-Install-all-server-headers.patch
> >
> > With this, all the server headers installed by a makefile-based build are
> > installed. I tried to strike a balance between using install_subdir() with
> > exclude list versus listing things explicitly. Different variations might be
> > possible, but this looked pretty sensible to me.
>
> I locally had something similar, but I'm worried that this approach will be
> too fragile. Leads to e.g. editor temp files getting installed. I've merged it
> for now, but I think we need a different approach.

Meant to add potential alternatives here:

The easiest likely would be to just add an install script that globs *.h.
Alternatively, we could build a file list at configure time and then
install that with install_headers(). The advantage would be that it would
be available for things like cpluspluscheck; the disadvantage is that
something needs to trigger reconfiguration to update the file list.

Greetings,

Andres Freund
Re: [RFC] building postgres with meson -v8
Hi,

On 2022-04-21 17:34:47 -0400, Tom Lane wrote:
> FWIW, I don't think that either gaur or prairiedog need be factored into
> this conversation.  They cannot build ninja at all for lack of ,
> so whether they could run meson is pretty much beside the point.

Yea.

> (I wonder if we should stick in a configure test for ,
> just to see if anything else doesn't have it?)

Might be worth doing?

> We should worry a little more about Solaris and AIX, but even there I
> think it's largely up to the platform owner whether they've updated
> python to something modern.

Looks like "AIX toolbox" is at 3.7. Solaris 11.4 apparently has 3.5 (11.3
is EOL January 2024). I think it's worth caring about supporting 3.6 due to
RHEL 7 for now.

> If it isn't, you need to move the goalposts
> back some more :-(.  As of today I see the following pre-3.6 pythons
> in the buildfarm, exclusive of mine:
>
> skate      3.2.3
> snapper    3.2.3

Debian wheezy, I feel ok with dropping that.

> topminnow  3.4.2

Debian jessie, similar.

> hornet     3.4.3
> sungazer   3.4.3

Looks like a newer python version is available for AIX, without manually
compiling.

> wrasse     3.4.3

Apparently solaris 11.4 has python 3.5 (still not great :/)

> shelduck   3.4.10

This animal seems to have retired.

> curculio   3.5.1

Supported versions of openbsd have modern versions of python.

> hoverfly   3.5.1

AIX

> batfish    3.5.2
> spurfowl   3.5.2
> cuon       3.5.2

Ubuntu 16.04 is EOL (since 2021-04), outside of paid extended support.

> ayu        3.5.3
> chimaera   3.5.3
> chipmunk   3.5.3
> grison     3.5.3
> mussurana  3.5.3
> tadarida   3.5.3
> urocryon   3.5.3

These are all [variants of] debian stretch. I think we should be ok
dropping support for that; the extended "LTS" support for stretch ends
June 30, 2022 (with the last non-extended update at July 18, 2020).

Greetings,

Andres Freund

[1] https://repology.org/project/python/versions
Re: [RFC] building postgres with meson -v8
Hi,

On 2022-05-04 13:53:54 +0200, Peter Eisentraut wrote:
> 0001-meson-Assorted-compiler-test-tweaks.patch
>
> I was going through a diff of pg_config.h between old and new build and
> found a few omissions and small differences.

Thanks, merged that.

> is of course annoying and can be removed eventually, but it's useful when
> analyzing the diff, and since it's already done in other places it seems
> reasonable to apply it consistently.

Yea, I'd tried to minimize the difference at some point, but haven't done
so in a while...

> 0002-meson-Add-pg_walinspect.patch
>
> This was added more recently and was not ported yet.  Nothing too
> interesting here.

Merged that.

> 0003-meson-Install-all-server-headers.patch
>
> With this, all the server headers installed by a makefile-based build are
> installed. I tried to strike a balance between using install_subdir() with
> exclude list versus listing things explicitly. Different variations might be
> possible, but this looked pretty sensible to me.

I locally had something similar, but I'm worried that this approach will be
too fragile. Leads to e.g. editor temp files getting installed. I've merged
it for now, but I think we need a different approach.

> With these patches, the list of files installed with make versus meson match
> up, except for known open items (postmaster symlink, some library naming
> differences, pkgconfig, pgxs, test modules installed, documentation).

I added pkgconfig since then. They're not exactly the same, but pretty
close, except for one thing: it looks like some of the ecpg libraries
really should link to some other ecpg libs? I think we're missing something
there... That then leads to missing requirements in the .pc files.

Re symlink: Do you have an opinion about dropping the symlink vs.
implementing it (likely via a small helper script)?

Re library naming: It'd obviously be easy to adjust the library names, but
I wonder if it'd not be worth keeping the _static.a suffix; right now the
unsuffixed library name is imo quite confusing.

Re test modules: Not sure what the best fix for that is yet. Except that we
don't have a search path for server libs, I'd just install them to a
dedicated path or add the build dir to the search path. But we don't, so ...

Re docs: I think the best approach here would be to have a new
meson_options.txt option defining whether the docs should be built. But not
quite sure.

Greetings,

Andres Freund
Re: make MaxBackends available in _PG_init
On Fri, May 06, 2022 at 08:27:11AM -0700, Nathan Bossart wrote:
> +1, I'll put together a new patch set.

As promised...

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

From 831218d6e0c04d6342bf593dbf6799efdd831b6b Mon Sep 17 00:00:00 2001
From: Nathan Bossart
Date: Mon, 18 Apr 2022 15:25:37 -0700
Subject: [PATCH v6 1/2] Add a new shmem_request_hook hook.

Currently, preloaded libraries are expected to request additional
shared memory and LWLocks in _PG_init().  However, it is not unusual
for such requests to depend on MaxBackends, which won't be initialized
at that time.  Such requests could also depend on GUCs that other
modules might change.  This introduces a new hook where modules can
safely use MaxBackends and GUCs to request additional shared memory
and LWLocks.

Furthermore, this change restricts requests for shared memory and
LWLocks to this hook.  Previously, libraries could make requests until
the size of the main shared memory segment was calculated.  Unlike
before, we no longer silently ignore requests received at invalid
times.  Instead, we FATAL if someone tries to request additional
shared memory or LWLocks outside of the hook.

Authors: Julien Rouhaud, Nathan Bossart
Discussion: https://postgr.es/m/20220412210112.GA2065815%40nathanxps13
---
 contrib/pg_prewarm/autoprewarm.c              | 17 +++-
 .../pg_stat_statements/pg_stat_statements.c   | 27 +--
 src/backend/postmaster/postmaster.c           |  5
 src/backend/storage/ipc/ipci.c                | 20 +-
 src/backend/storage/lmgr/lwlock.c             | 18 -
 src/backend/utils/init/miscinit.c             | 15 +++
 src/include/miscadmin.h                       |  5
 src/tools/pgindent/typedefs.list              |  1 +
 8 files changed, 72 insertions(+), 36 deletions(-)

diff --git a/contrib/pg_prewarm/autoprewarm.c b/contrib/pg_prewarm/autoprewarm.c
index 45e012a63a..c0c4f5d9ca 100644
--- a/contrib/pg_prewarm/autoprewarm.c
+++ b/contrib/pg_prewarm/autoprewarm.c
@@ -96,6 +96,8 @@ static void apw_start_database_worker(void);
 static bool apw_init_shmem(void);
 static void apw_detach_shmem(int code, Datum arg);
 static int	apw_compare_blockinfo(const void *p, const void *q);
+static void autoprewarm_shmem_request(void);
+static shmem_request_hook_type prev_shmem_request_hook = NULL;
 
 /* Pointer to shared-memory state. */
 static AutoPrewarmSharedState *apw_state = NULL;
@@ -139,13 +141,26 @@ _PG_init(void)
 
 	MarkGUCPrefixReserved("pg_prewarm");
 
-	RequestAddinShmemSpace(MAXALIGN(sizeof(AutoPrewarmSharedState)));
+	prev_shmem_request_hook = shmem_request_hook;
+	shmem_request_hook = autoprewarm_shmem_request;
 
 	/* Register autoprewarm worker, if enabled. */
 	if (autoprewarm)
 		apw_start_leader_worker();
 }
 
+/*
+ * Requests any additional shared memory required for autoprewarm.
+ */
+static void
+autoprewarm_shmem_request(void)
+{
+	if (prev_shmem_request_hook)
+		prev_shmem_request_hook();
+
+	RequestAddinShmemSpace(MAXALIGN(sizeof(AutoPrewarmSharedState)));
+}
+
 /*
  * Main entry point for the leader autoprewarm process.  Per-database workers
  * have a separate entry point.
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index df2ce63790..87b75d779e 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -252,6 +252,7 @@ static int	exec_nested_level = 0;
 static int	plan_nested_level = 0;
 
 /* Saved hook values in case of unload */
+static shmem_request_hook_type prev_shmem_request_hook = NULL;
 static shmem_startup_hook_type prev_shmem_startup_hook = NULL;
 static post_parse_analyze_hook_type prev_post_parse_analyze_hook = NULL;
 static planner_hook_type prev_planner_hook = NULL;
@@ -317,6 +318,7 @@ PG_FUNCTION_INFO_V1(pg_stat_statements_1_10);
 PG_FUNCTION_INFO_V1(pg_stat_statements);
 PG_FUNCTION_INFO_V1(pg_stat_statements_info);
 
+static void pgss_shmem_request(void);
 static void pgss_shmem_startup(void);
 static void pgss_shmem_shutdown(int code, Datum arg);
 static void pgss_post_parse_analyze(ParseState *pstate, Query *query,
@@ -452,17 +454,11 @@ _PG_init(void)
 
 	MarkGUCPrefixReserved("pg_stat_statements");
 
-	/*
-	 * Request additional shared resources.  (These are no-ops if we're not in
-	 * the postmaster process.)  We'll allocate or attach to the shared
-	 * resources in pgss_shmem_startup().
-	 */
-	RequestAddinShmemSpace(pgss_memsize());
-	RequestNamedLWLockTranche("pg_stat_statements", 1);
-
 	/*
 	 * Install hooks.
 	 */
+	prev_shmem_request_hook = shmem_request_hook;
+	shmem_request_hook = pgss_shmem_request;
 	prev_shmem_startup_hook = shmem_startup_hook;
 	shmem_startup_hook = pgss_shmem_startup;
 	prev_post_parse_analyze_hook = post_parse_analyze_hook;
@@ -488,6 +484,7 @@ void
 _PG_fini(void)
 {
 	/* Uninstall hooks. */
+	shmem_request_hook = prev_shmem_request_hook;
 	shmem_startup_hook =
Re: configure openldap crash warning
I wrote:
> After thinking about this for awhile, it seems like the best solution
> is to make configure proceed like this:
> 1. Find libldap.
> 2. Detect whether it's OpenLDAP 2.5 or newer.
> 3. If not, try to find libldap_r.

Here's a proposed patch for that. It seems to do the right thing with
openldap 2.4.x and 2.6.x, but I don't have a 2.5 installation at hand to
try.

			regards, tom lane

diff --git a/configure b/configure
index 364f37559d..274e0db8c5 100755
--- a/configure
+++ b/configure
@@ -13410,7 +13410,18 @@ _ACEOF
 fi
 done
 
-if test "$enable_thread_safety" = yes; then
+# The separate ldap_r library only exists in OpenLDAP < 2.5, and if we
+# have 2.5 or later, we shouldn't even probe for ldap_r (we might find a
+# library from a separate OpenLDAP installation).  The most reliable
+# way to check that is to check for a function introduced in 2.5.
+ac_fn_c_check_func "$LINENO" "ldap_verify_credentials" "ac_cv_func_ldap_verify_credentials"
+if test "x$ac_cv_func_ldap_verify_credentials" = xyes; then :
+  thread_safe_libldap=yes
+else
+  thread_safe_libldap=no
+fi
+
+if test "$enable_thread_safety" = yes -a "$thread_safe_libldap" = no; then
   # Use ldap_r for FE if available, else assume ldap is thread-safe.
   # On some platforms ldap_r fails to link without PTHREAD_LIBS.
   LIBS="$_LIBS"
diff --git a/configure.ac b/configure.ac
index bfd8b713a9..bba1ac5878 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1374,7 +1374,14 @@ if test "$with_ldap" = yes ; then
     LDAP_LIBS_BE="-lldap $EXTRA_LDAP_LIBS"
     # This test is carried out against libldap.
     AC_CHECK_FUNCS([ldap_initialize])
-    if test "$enable_thread_safety" = yes; then
+    # The separate ldap_r library only exists in OpenLDAP < 2.5, and if we
+    # have 2.5 or later, we shouldn't even probe for ldap_r (we might find a
+    # library from a separate OpenLDAP installation).  The most reliable
+    # way to check that is to check for a function introduced in 2.5.
+    AC_CHECK_FUNC([ldap_verify_credentials],
+                  [thread_safe_libldap=yes],
+                  [thread_safe_libldap=no])
+    if test "$enable_thread_safety" = yes -a "$thread_safe_libldap" = no; then
       # Use ldap_r for FE if available, else assume ldap is thread-safe.
       # On some platforms ldap_r fails to link without PTHREAD_LIBS.
       LIBS="$_LIBS"
Re: configure openldap crash warning
I wrote:
> We still have a bit of work to do, because this setup isn't getting
> all the way through src/test/ldap/:
> 2022-05-04 11:01:33.459 EDT [21304] LOG:  server process (PID 21312) was
> terminated by signal 11: Segmentation fault: 11
> Many of the test cases pass, but it looks like ldaps-related ones don't.

Sadly, this still happens with MacPorts' build of openldap 2.6.1. I was
able to get a stack trace from the point of the segfault this time:

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x00018f120628 libsystem_pthread.dylib`pthread_rwlock_rdlock
    frame #1: 0x0001019174c4 libcrypto.3.dylib`CRYPTO_THREAD_read_lock + 12
    frame #2: 0x0001019099d0 libcrypto.3.dylib`ossl_lib_ctx_get_data + 56
    frame #3: 0x0001019144d0 libcrypto.3.dylib`get_provider_store + 28
    frame #4: 0x00010191641c libcrypto.3.dylib`ossl_provider_deregister_child_cb + 32
    frame #5: 0x000101909748 libcrypto.3.dylib`OSSL_LIB_CTX_free + 48
    frame #6: 0x00010aa982d8 legacy.dylib`legacy_teardown + 24
    frame #7: 0x000101914840 libcrypto.3.dylib`ossl_provider_free + 76
    frame #8: 0x0001018ec404 libcrypto.3.dylib`evp_cipher_free_int + 48
    frame #9: 0x000101472804 libssl.3.dylib`SSL_CTX_free + 420
    frame #10: 0x0001013a4768 libldap.2.dylib`ldap_int_tls_destroy + 40
    frame #11: 0x00018f00fdd0 libsystem_c.dylib`__cxa_finalize_ranges + 464
    frame #12: 0x00018f00fb74 libsystem_c.dylib`exit + 44
    frame #13: 0x000100941cb8 postgres`proc_exit(code=) at ipc.c:152:2 [opt]
    frame #14: 0x00010096d804 postgres`PostgresMain(dbname=, username=) at postgres.c:4756:5 [opt]
    frame #15: 0x0001008d7730 postgres`BackendRun(port=) at postmaster.c:4489:2 [opt]
    frame #16: 0x0001008d6ff4 postgres`ServerLoop [inlined] BackendStartup(port=) at postmaster.c:4217:3 [opt]
    frame #17: 0x0001008d6fcc postgres`ServerLoop at postmaster.c:1791:7 [opt]
    frame #18: 0x0001008d474c postgres`PostmasterMain(argc=, argv=) at postmaster.c:1463:11 [opt]
    frame #19: 0x00010083a248 postgres`main(argc=, argv=) at main.c:202:3 [opt]
    frame #20: 0x00010117908c dyld`start + 520

So (1) libldap relies on libssl to implement ldaps ... no surprise there,
and (2) something's going wrong in the atexit callback that it seemingly
installs to close down its SSL context.

It's not clear from this whether this is purely libldap's fault or if
there is something we're doing that sends it off the rails. I could believe
that the problem is essentially a double shutdown of libssl, except that
there doesn't seem to be any reason why PG itself would have touched
libssl; this isn't an SSL-enabled build. (Adding --with-openssl doesn't
make it better, either.) On the whole I'm leaning to the position that this
is openldap's fault not ours.

			regards, tom lane
Re: Configuration Parameter/GUC value validation hook
On Wed, May 4, 2022, at 8:12 AM, Bharath Rupireddy wrote:
> How about we provide a sample extension (limiting some important
> parameters say shared_buffers, work_mem and so on to some
> "reasonable/recommended" limits) in the core along with the
> set_config_option_hook? This way, all the people using open source
> postgres out-of-the-box will benefit and whoever wants, can modify
> that sample extension to suit their needs. The sample extension can
> also serve as an example to implement set_config_option_hook.

I agree with Robert that providing a feature for it instead of a hook is
the way to go. It is not just one or two vendors that will benefit from it;
almost all vendors, if not all, will use this feature. Hooks should be used
for niche features; that's not the case here.

The commit a0ffa885e47 introduced the GRANT SET ON PARAMETER command. It
could be used for this purpose. Despite accepting GRANT on PGC_USERSET
GUCs, it has no use today, but that doesn't mean additional properties
couldn't be added to the current syntax. This additional use case should be
enforced before or while executing set_config_option(). Is it ok to extend
this SQL command? The syntax could be:

GRANT SET ON PARAMETER work_mem (MIN '1MB', MAX '512MB') TO bob;

The NULL keyword can be used to remove the MIN and MAX limits. The idea is
to avoid a verbose syntax (adding an "action" to MIN/MAX -- ADD MIN 1, DROP
MAX 234, SET MIN 456).

The other alternative is ALTER USER SET and ALTER DATABASE SET. The current
user can set a parameter for himself, and he could adjust the limits.
Besides that, the purpose of these SQL commands is to apply initial
settings for a combination of user/database; I'm afraid it is out of scope
to check after the session is established.

--
Euler Taveira
EDB   https://www.enterprisedb.com/
Re: failures in t/031_recovery_conflict.pl on CI
Andres Freund writes: > Done. Perhaps you could trigger a run on longfin, that seems to have been the > most reliably failing animal? No need, its cron job launched already. regards, tom lane
Re: configure openldap crash warning
I wrote:
> Oh, I have a theory about this: I bet your Homebrew installation
> has a recent OpenLDAP version that only supplies libldap not libldap_r.
> In that case, configure will still find libldap_r available and will
> bind libpq to it, and you get the observed result.  The configure
> check is not sophisticated enough to realize that it's finding chunks
> of two different OpenLDAP installations.

After thinking about this for awhile, it seems like the best solution is to
make configure proceed like this:

1. Find libldap.
2. Detect whether it's OpenLDAP 2.5 or newer.
3. If not, try to find libldap_r.

There are various ways we could perform step 2, but I think the most
reliable is to try to link to some function that's present in 2.5 but not
before. (In particular, this doesn't require any strong assumptions about
whether the installation's header files match the library.) After a quick
dig in 2.4 and 2.5, it looks like ldap_verify_credentials() would serve.
Barring objections, I'll make a patch for that.

BTW, I was a little distressed to read this in the 2.4 headers:

** If you fail to define LDAP_THREAD_SAFE when linking with
** -lldap_r or define LDAP_THREAD_SAFE when linking with -lldap,
** provided header definations and declarations may be incorrect.

That's not something we do or ever have done, AFAIK. Given the lack of
complaints and the fact that 2.4 is more or less EOL, I don't feel a strong
need to worry about it; but it might be something to keep in mind in case
we get bug reports.

			regards, tom lane
Re: failures in t/031_recovery_conflict.pl on CI
On 2022-05-06 12:12:19 -0400, Tom Lane wrote:
> Andres Freund writes:
> > I looked through all the failures I found and it's two kinds of failures,
> > both related to the deadlock test. So I'm thinking of skipping just that
> > test as in the attached.
>
> > Working on committing / backpatching that, unless somebody suggests changes
> > quickly...
>
> WFM.

Done. Perhaps you could trigger a run on longfin, that seems to have been
the most reliably failing animal?

Greetings,

Andres Freund
Re: Estimating HugePages Requirements?
On Tue, Apr 26, 2022 at 10:34:06AM +0900, Michael Paquier wrote: > Yes, the redirection issue would apply to all the run-time GUCs. Should this be tracked as an open item for v15? There was another recent report about the extra log output [0]. [0] https://www.postgresql.org/message-id/YnARlI5nvbziobR4%40momjian.us -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Support logical replication of DDLs
> Attached is a set of two patches as an attempt to evaluate this approach.
>
> The first patch provides functions to deparse DDL commands. Currently,
> it is restricted to just a simple CREATE TABLE statement; the required
> code is extracted from one of the patches posted in the thread [1].
>
> The second patch allows replicating a simple CREATE TABLE DDL. To do
> that we used an event trigger and DDL deparsing facilities. While
> creating a publication, we register a command end trigger that deparses
> the DDL as a JSON blob and WAL-logs it. The event trigger is
> automatically removed at the time of drop publication. The WALSender
> decodes the WAL and sends it downstream similar to other DML commands.
> The subscriber then converts JSON back to the DDL command string and
> executes it. In the subscriber, we also add the newly added rel to
> pg_subscription_rel so that the DML changes on the new table can be
> replicated without having to manually run "ALTER SUBSCRIPTION ...
> REFRESH PUBLICATION". Some of the code related to WAL logging and
> subscriber-side work is taken from the patch posted by Zheng in this
> thread, but there are quite a few changes in that as we don't need
> schema, role, or transactional vs. non-transactional handling.
>
> Note that for now, we have hacked the Create Publication code such that
> when the user specifies the "FOR ALL TABLES" clause, we invoke this new
> functionality. So, this will work only for "FOR ALL TABLES"
> publications. For example, we need the below to replicate the simple
> Create Table command.
>
> Publisher:
> Create Publication pub1 For All Tables;
>
> Subscriber:
> Create Subscription sub1 Connection '...' Publication pub1;
>
> Publisher:
> Create Table t1(c1 int);
>
> Subscriber:
> \d should show t1.
>
> As we have hacked the CreatePublication function for this POC, the
> regression tests are not passing, but we can easily change it so that we
> invoke the new functionality with the syntax proposed in this thread or
> with some other syntax, and we shall do that in the next patch unless
> this approach is not worth pursuing.
>
> This POC is prepared by Ajin Cherian, Hou-San, and me.
>
> Thoughts?
>
> [1] - https://www.postgresql.org/message-id/20150215044814.GL3391%40alvh.no-ip.org

Hi,

Thanks for exploring this direction. I read the deparsing thread and your
patch. Here is my thought:

1. The main concern on maintainability of the deparsing code still applies
if we want to adapt it for DDL replication.

2. The search_path and role still need to be handled in the deparsing code.
And I think it's actually more overhead to qualify every object compared to
just logging the search_path and enforcing it on the apply worker.

3. I'm trying to understand if deparsing helps with edge cases like "ALTER
TABLE foo ADD COLUMN bar double precision DEFAULT random();". I don't think
it helps out of the box. The crux of separating table rewrite and DDL still
needs to be solved for such cases.

Regards,
Zheng
Re: Can postgres ever delete the recycled future WAL files to free-up disk space if max_wal_size is reduced or wal_recycle is set to off?
Bharath Rupireddy writes: > Can postgres delete the recycled future WAL files once max_wal_size is > reduced and/or wal_recycle is set to off? A checkpoint should do that, see RemoveOldXlogFiles. Maybe you have a broken WAL archiving setup, or something else preventing removal of old WAL files? regards, tom lane
Can postgres ever delete the recycled future WAL files to free-up disk space if max_wal_size is reduced or wal_recycle is set to off?
Hi,

I have a scenario where max_wal_size = 10GB and wal_recycle = on: postgres
recycles and keeps WAL files for future use, and eventually around ~600 WAL
files had accumulated in the pg_wal directory. Checkpoints were happening
at regular intervals. But the disk was about to get full (of course,
scaling up the disk is an option), so to avoid "no space left on device"
crashes I changed max_wal_size = 5GB and issued a checkpoint, thinking that
postgres would free up 5GB of disk space.

It seems that's not the case: postgres will not remove future WAL files
even after max_wal_size is reduced. If it could delete the future WAL
file(s) immediately, the server would have had 5GB of free disk space to
stay up and avoid a crash while the disk scaling was performed.

Can postgres delete the recycled future WAL files once max_wal_size is
reduced and/or wal_recycle is set to off?

Thoughts?

Regards,
Bharath Rupireddy.
Re: failures in t/031_recovery_conflict.pl on CI
Andres Freund writes: > I looked through all the failures I found and it's two kinds of failures, both > related to the deadlock test. So I'm thinking of skipping just that test as in > the attached. > Working on committing / backpatching that, unless somebody suggests changes > quickly... WFM. regards, tom lane
Re: failures in t/031_recovery_conflict.pl on CI
Hi,

On 2022-05-05 23:57:28 -0400, Tom Lane wrote:
> Andres Freund writes:
> > On 2022-05-05 23:36:22 -0400, Tom Lane wrote:
> >> So I reluctantly vote for removing 031_recovery_conflict.pl in the
> >> back branches for now, with the expectation that we'll fix the
> >> infrastructure and put it back after the current release round
> >> is done.
>
> > What about instead marking the flapping test TODO? That'd still give us most
> > of the coverage...
>
> Are you sure there's just one test that's failing?  I haven't checked
> the buildfarm history close enough to be sure of that.  But if it's
> true, disabling just that one would be fine (again, as a stopgap
> measure).

I looked through all the failures I found and it's two kinds of failures,
both related to the deadlock test. So I'm thinking of skipping just that
test as in the attached.

Working on committing / backpatching that, unless somebody suggests changes
quickly...

Greetings,

Andres Freund

diff --git c/src/test/recovery/t/031_recovery_conflict.pl i/src/test/recovery/t/031_recovery_conflict.pl
index 52f00a6f514..72808095d21 100644
--- c/src/test/recovery/t/031_recovery_conflict.pl
+++ i/src/test/recovery/t/031_recovery_conflict.pl
@@ -228,6 +228,10 @@ check_conflict_stat("lock");
 
 ## RECOVERY CONFLICT 5: Deadlock
 
+SKIP:
+{
+	skip "disabled until after minor releases, due to instability";
+
 $sect = "startup deadlock";
 $expected_conflicts++;
 
@@ -286,6 +290,7 @@ check_conflict_stat("deadlock");
 
 # clean up for next tests
 $node_primary->safe_psql($test_db, qq[ROLLBACK PREPARED 'lock';]);
+}
 
 # Check that expected number of conflicts show in pg_stat_database. Needs to
Re: Possible corruption by CreateRestartPoint at promotion
On Fri, May 06, 2022 at 07:58:43PM +0900, Michael Paquier wrote:
> And I have spent a bit of time on this stuff to finish with the attached.
> It will be a plus to get that done on HEAD for beta1, so I'll try to deal
> with it on Monday.  I am still a bit stressed about the back branches, as
> concurrent checkpoints are possible via CreateCheckPoint() from the
> startup process (not the case of HEAD), but the stable branches will
> have a new point release very soon, so let's revisit this choice there
> later.  v6 attached includes a TAP test, but I don't intend to include
> it as it is expensive.

I was looking at other changes in this area (e.g., 3c64dcb), and now I'm
wondering if we actually should invalidate the minRecoveryPoint when the
control file no longer indicates archive recovery. Specifically, what
happens if a base backup of a newly promoted standby is used for a
point-in-time restore? If the checkpoint location is updated and all
previous segments have been recycled/removed, it seems like the
minRecoveryPoint might point to a missing segment.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Re: libpq async duplicate error results
I wrote:
> What is happening is that this bit in PQsendQueryStart is deciding
> not to clear the message buffer because conn->cmd_queue_head isn't
> NULL:
>
>     /*
>      * If this is the beginning of a query cycle, reset the error state.
>      * However, in pipeline mode with something already queued, the error
>      * buffer belongs to that command and we shouldn't clear it.
>      */
>     if (newQuery && conn->cmd_queue_head == NULL)
>         pqClearConnErrorState(conn);
>
> So apparently the fault is with the pipelined-query logic.  I'm
> not sure what cleanup step is missing though.

I experimented with the attached patch, which is based on the idea that
clearing the command queue is being done in entirely the wrong place. It's
more appropriate to do it where we drop unsent output data, i.e. in
pqDropConnection not pqDropServerData. The fact that it was jammed into the
latter without any commentary doesn't do much to convince me that that
placement was thought about carefully.

This fixes the complained-of symptom and still passes check-world.
However, I wonder whether cutting the queue down to completely empty is the
right thing. Why is the queue set up like this in the first place --- that
is, why don't we remove an entry the moment the command is sent? I also
wonder whether pipelineStatus ought to get reset here. We aren't resetting
asyncStatus here, so maybe it's fine, but I'm not quite sure.

			regards, tom lane

diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 4c12f1411f..6e936bbff3 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -377,6 +377,7 @@ static int	connectDBStart(PGconn *conn);
 static int	connectDBComplete(PGconn *conn);
 static PGPing internal_ping(PGconn *conn);
 static PGconn *makeEmptyPGconn(void);
+static void pqFreeCommandQueue(PGcmdQueueEntry *queue);
 static bool fillPGconn(PGconn *conn, PQconninfoOption *connOptions);
 static void freePGconn(PGconn *conn);
 static void closePGconn(PGconn *conn);
@@ -462,6 +463,12 @@ pqDropConnection(PGconn *conn, bool flushInput)
 	/* Always discard any unsent data */
 	conn->outCount = 0;
 
+	/* Likewise, discard any pending pipelined commands */
+	pqFreeCommandQueue(conn->cmd_queue_head);
+	conn->cmd_queue_head = conn->cmd_queue_tail = NULL;
+	pqFreeCommandQueue(conn->cmd_queue_recycle);
+	conn->cmd_queue_recycle = NULL;
+
 	/* Free authentication/encryption state */
 #ifdef ENABLE_GSS
 	{
@@ -569,12 +576,6 @@ pqDropServerData(PGconn *conn)
 	}
 	conn->notifyHead = conn->notifyTail = NULL;
 
-	pqFreeCommandQueue(conn->cmd_queue_head);
-	conn->cmd_queue_head = conn->cmd_queue_tail = NULL;
-
-	pqFreeCommandQueue(conn->cmd_queue_recycle);
-	conn->cmd_queue_recycle = NULL;
-
 	/* Reset ParameterStatus data, as well as variables deduced from it */
 	pstatus = conn->pstatus;
 	while (pstatus != NULL)
Re: make MaxBackends available in _PG_init
On Fri, May 06, 2022 at 10:43:21AM -0400, Tom Lane wrote:
> Robert Haas writes:
> > On Fri, May 6, 2022 at 9:57 AM Tom Lane wrote:
> >> I agree that _PG_fini functions as they stand are worthless.
> >> What I'm not getting is why we should care enough about that
> >> to break just about everybody's extension.  Even if unloading
> >> extensions were feasible, who would bother?
>
> > Well, if we think that, then we ought to remove the NOT_USED code and
> > all the random _PG_fini() stuff that's still floating around.
>
> I think that's exactly what we should do, if it bugs you that stuff
> is just sitting there.  I see no prospect that we'll ever make it
> work, because the question of unhooking from hooks is just the tip
> of the iceberg.  As an example, what should happen with any custom
> GUCs the module has defined?  Dropping their values might not be
> very nice, but if we leave them around then the next LOAD (if any)
> will see a conflict.  Another fun question is whether it's ever
> safe to unload a module that was preloaded by the postmaster.
>
> In short, this seems like a can of very wriggly worms, with not
> a lot of benefit that would ensue from opening it.

+1, I'll put together a new patch set.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Re: gcc 12.1.0 warning
Erik Rijkers writes:
> Not sure if these compiler-mutterings are worth reporting but I guess
> we're trying to get a silent compile.
> System: Debian 4.9.303-1 (2022-03-07) x86_64 GNU/Linux
> Compiling with gcc 12.1.0 causes the below 'warning' and 'note'.
> (Compiling with --enable-cassert --enable-debug is silent, no warnings.)
>
> In function ‘guc_var_compare’,
>     inlined from ‘bsearch’ at /usr/include/x86_64-linux-gnu/bits/stdlib-bsearch.h:33:23,
>     inlined from ‘find_option’ at guc.c:5649:35:
> guc.c:5736:38: warning: array subscript ‘const struct config_generic[0]’
> is partly outside array bounds of ‘const char[8]’ [-Warray-bounds]
>  5736 |         return guc_name_compare(confa->name, confb->name);
>       |                                 ~^~

I'd call that a compiler bug.

			regards, tom lane
Re: make MaxBackends available in _PG_init
Robert Haas writes:
> On Fri, May 6, 2022 at 9:57 AM Tom Lane wrote:
> > I agree that _PG_fini functions as they stand are worthless.
> > What I'm not getting is why we should care enough about that
> > to break just about everybody's extension.  Even if unloading
> > extensions were feasible, who would bother?

> Well, if we think that, then we ought to remove the NOT_USED code and
> all the random _PG_fini() stuff that's still floating around.

I think that's exactly what we should do, if it bugs you that stuff is just
sitting there. I see no prospect that we'll ever make it work, because the
question of unhooking from hooks is just the tip of the iceberg. As an
example, what should happen with any custom GUCs the module has defined?
Dropping their values might not be very nice, but if we leave them around
then the next LOAD (if any) will see a conflict. Another fun question is
whether it's ever safe to unload a module that was preloaded by the
postmaster.

In short, this seems like a can of very wriggly worms, with not a lot of
benefit that would ensue from opening it.

			regards, tom lane
gcc 12.1.0 warning
Hi,

Not sure if these compiler-mutterings are worth reporting, but I guess
we're trying to get a silent compile.

System: Debian 4.9.303-1 (2022-03-07) x86_64 GNU/Linux

Compiling with gcc 12.1.0 causes the below 'warning' and 'note'.
(Compiling with --enable-cassert --enable-debug is silent, no warnings.)

In function ‘guc_var_compare’,
    inlined from ‘bsearch’ at /usr/include/x86_64-linux-gnu/bits/stdlib-bsearch.h:33:23,
    inlined from ‘find_option’ at guc.c:5649:35:
guc.c:5736:38: warning: array subscript ‘const struct config_generic[0]’ is partly outside array bounds of ‘const char[8]’ [-Warray-bounds]
 5736 |         return guc_name_compare(confa->name, confb->name);
      |                                 ~^~

guc.c: In function ‘find_option’:
guc.c:5636:25: note: object ‘name’ of size 8
 5636 | find_option(const char *name, bool create_placeholders, bool skip_errors,
      |             ^~~~

(Compiling with gcc 6.3.0 does not complain.)

Below are the two configure lines, FWIW.

Erik Rijkers

# cassert-build: no warning/note
./configure \
  --prefix=/home/aardvark/pg_stuff/pg_installations/pgsql.HEAD \
  --bindir=/home/aardvark/pg_stuff/pg_installations/pgsql.HEAD/bin \
  --libdir=/home/aardvark/pg_stuff/pg_installations/pgsql.HEAD/lib \
  --with-pgport=6515 --quiet --enable-depend \
  --enable-cassert --enable-debug \
  --with-openssl --with-perl --with-libxml --with-libxslt --with-zlib \
  --enable-tap-tests --with-extra-version=_0506_HEAD_701d --with-lz4

# normal build: causes warning/note:
./configure \
  --prefix=/home/aardvark/pg_stuff/pg_installations/pgsql.HEAD \
  --bindir=/home/aardvark/pg_stuff/pg_installations/pgsql.HEAD/bin.fast \
  --libdir=/home/aardvark/pg_stuff/pg_installations/pgsql.HEAD/lib.fast \
  --with-pgport=6515 --quiet --enable-depend \
  --with-openssl --with-perl --with-libxml --with-libxslt --with-zlib \
  --enable-tap-tests --with-extra-version=_0506_HEAD_701d --with-lz4
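For readers wanting to study this outside the tree, a reduced, self-contained sketch of the pattern involved could look like the following. This is an assumption-laden reduction, not the real guc.c: the struct is cut down to two fields, and guc_name_compare() is replaced by strcmp(). The essence is that find_option() relies on 'name' being the first field of config_generic, so a pointer to a pointer to a bare name string can stand in for a pointer to a pointer to a full struct:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Cut-down stand-in for guc.c's config_generic; name must be first. */
    struct config_generic
    {
        const char *name;
        int         flags;      /* stand-in for the many other fields */
    };

    static int
    guc_var_compare(const void *a, const void *b)
    {
        const struct config_generic *confa = *(struct config_generic *const *) a;
        const struct config_generic *confb = *(struct config_generic *const *) b;

        return strcmp(confa->name, confb->name);
    }

    int
    main(void)
    {
        static struct config_generic v1 = {"log_disconnections", 0};
        static struct config_generic v2 = {"work_mem", 0};
        struct config_generic *guc_variables[] = {&v1, &v2};

        const char *name = "work_mem";
        const char **key = &name;   /* &key is reinterpreted as config_generic ** */

        struct config_generic **res =
            bsearch(&key, guc_variables, 2,
                    sizeof(struct config_generic *), guc_var_compare);

        printf("%s\n", res ? (*res)->name : "not found");
        return 0;
    }

Under inlining, GCC 12 can see that the "struct config_generic" dereferenced for the key is really an 8-byte pointer variable, hence the "partly outside array bounds of 'const char[8]'" wording.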
Re: libpq async duplicate error results
Peter Eisentraut writes:
> I took another look at this.  The output from the test program at the
> beginning of the thread is now (master branch):
> ...
> command = SELECT 'after';
> PQsendQuery() error: FATAL:  terminating connection due to administrator command
> server closed the connection unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request.
> no connection to the server

> It appears the "after" query is getting the error message from the
> previous cycle somehow.

What is happening is that this bit in PQsendQueryStart is deciding not to
clear the message buffer because conn->cmd_queue_head isn't NULL:

    /*
     * If this is the beginning of a query cycle, reset the error state.
     * However, in pipeline mode with something already queued, the error
     * buffer belongs to that command and we shouldn't clear it.
     */
    if (newQuery && conn->cmd_queue_head == NULL)
        pqClearConnErrorState(conn);

So apparently the fault is with the pipelined-query logic. I'm not sure
what cleanup step is missing though.

			regards, tom lane
Re: BufferAlloc: don't take two simultaneous locks
On Thu, Apr 21, 2022 at 6:58 PM Yura Sokolov wrote:
> At the master state:
> - SharedBufHash is not declared as HASH_FIXED_SIZE
> - get_hash_entry falls back to element_alloc too fast (just if it doesn't
>   find a free entry in the current freelist partition).
> - get_hash_entry has races.
> - if there is a small number of spare items (and NUM_BUFFER_PARTITIONS is
>   a small number) and HASH_FIXED_SIZE is set, it becomes contended and
>   therefore slow.
>
> HASH_REUSE solves (for shared buffers) most of these issues. The free list
> becomes a rare fallback, so HASH_FIXED_SIZE for SharedBufHash doesn't lead
> to a performance hit. And with a fair number of spare items, get_hash_entry
> will find a free entry despite its races.

Hmm, I see. The idea of trying to arrange to reuse entries rather than
pushing them onto a freelist and immediately trying to take them off again
is an interesting one, and I kind of like it. But I can't imagine that
anyone would commit this patch the way you have it. It's way too much
action at a distance. If any ereport(ERROR,...) could happen between the
HASH_REUSE operation and the subsequent HASH_ENTER, it would be disastrous,
and those things are separated by multiple levels of call stack across
different modules, so mistakes would be easy to make.

If this could be made into something dynahash takes care of internally
without requiring extensive cooperation with the calling code, I think it
would very possibly be accepted. One approach would be to have a
hash_replace() call that takes two const void * arguments, one to delete
and one to insert. Then maybe you propagate that idea upward and have,
similarly, a BufTableReplace operation that uses that, and then the bufmgr
code calls BufTableReplace instead of BufTableDelete. Maybe there are other
better ideas out there...

--
Robert Haas
EDB: http://www.enterprisedb.com
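To make the suggested shape concrete, a hedged sketch of what such a hash_replace() could look like from the caller's side follows. This is not an actual dynahash API: the function name, signature, and semantics are all hypothetical, and a real implementation would hand the element freed for del_key directly to the insertion of ins_key inside dynahash rather than round-tripping through the freelist as the naive composition below does.

    #include "postgres.h"
    #include "utils/hsearch.h"

    /*
     * Hypothetical API: replace one key with another, intended to reuse the
     * deleted element for the inserted one.  Returns the entry for ins_key.
     */
    static void *
    hash_replace(HTAB *hashp, const void *del_key, const void *ins_key)
    {
        bool        found;

        /*
         * Naive composition shown for exposition only; the whole point of
         * the proposal is that dynahash would fuse these two steps and
         * never touch the (contended) freelist.
         */
        (void) hash_search(hashp, del_key, HASH_REMOVE, &found);
        Assert(found);
        return hash_search(hashp, ins_key, HASH_ENTER, &found);
    }

A BufTableReplace() built on top of this would let bufmgr swap a buffer tag in one call, keeping the reuse logic invisible to callers.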
Re: make MaxBackends available in _PG_init
On Fri, May 6, 2022 at 9:57 AM Tom Lane wrote:
> Robert Haas writes:
> > perhaps for v16 or some future release we can think about redoing a
> > bunch of hooks this way. There would be some migration pain for
> > extension authors for sure, but then we might get to a point where
> > extensions can be cleanly unloaded in at least some circumstances,
> > instead of continuing to write silly _PG_fini() functions that
> > couldn't possibly ever work.
>
> I agree that _PG_fini functions as they stand are worthless.
> What I'm not getting is why we should care enough about that
> to break just about everybody's extension.  Even if unloading
> extensions were feasible, who would bother?

Well, if we think that, then we ought to remove the NOT_USED code and all
the random _PG_fini() stuff that's still floating around.

I don't actually have a clear answer to whether it's a useful thing to be
able to unload modules. If the module is just providing a bunch of
SQL-callable functions, it probably isn't. If it's modifying the behavior
of your session, it might be. Now, it could also have an "off" switch in
the form of a GUC, and probably should - but you'd probably save at least a
few cycles by detaching from the hooks rather than just still getting
called and doing nothing, and maybe that's worth something. Whether it's
worth enough to justify making changes that will affect extensions, I'm not
sure.

IOW, I don't really know what we ought to do here, but I think that
maintaining a vestigial system that has never worked and can never work is
not it.

--
Robert Haas
EDB: http://www.enterprisedb.com
Re: make MaxBackends available in _PG_init
Robert Haas writes:
> perhaps for v16 or some future release we can think about redoing a
> bunch of hooks this way. There would be some migration pain for
> extension authors for sure, but then we might get to a point where
> extensions can be cleanly unloaded in at least some circumstances,
> instead of continuing to write silly _PG_fini() functions that
> couldn't possibly ever work.

I agree that _PG_fini functions as they stand are worthless. What I'm not
getting is why we should care enough about that to break just about
everybody's extension. Even if unloading extensions were feasible, who
would bother?

			regards, tom lane
Re: bogus: logical replication rows/cols combinations
On 5/6/22 15:40, Amit Kapila wrote:
> On Fri, May 6, 2022 at 5:56 PM Tomas Vondra wrote:
>>
>>>> I could think of below two options:
>>>> 1. Forbid any case where column list is different for the same table
>>>> when combining publications.
>>>> 2. Forbid if the column list and row filters for a table are different
>>>> in the set of publications we are planning to combine. This means we
>>>> will allow combining column lists when row filters are not present or
>>>> when column list is the same (we don't get anything additional by
>>>> combining but the idea is we won't forbid such cases) and row filters
>>>> are different.
>>>>
>>>> Now, I think the points in favor of (1) are that the main purpose of
>>>> introducing a column list are: (a) the structure/schema of the
>>>> subscriber is different from the publisher, (b) want to hide sensitive
>>>> columns data. In both cases, it should be fine if we follow (1) and
>>>> from Peter E.'s latest email [1] he also seems to be indicating the
>>>> same. If we want to be slightly more relaxed then we can probably (2).
>>>> We can decide on something else as well but I feel it should be such
>>>> that it is easy to explain.
>>>
>>> I also think it makes sense to add a restriction like (1). I am planning
>>> to implement the restriction if no one objects.
>>
>> I'm not going to block that approach if that's the consensus here,
>> though I'm not convinced.
>>
>> Let me point out (1) does *not* work for data redaction use case,
>> certainly not the example Alvaro and me presented, because that relies
>> on a combination of row filters and column filters.
>
> This should just forbid the case presented by Alvaro in his first
> email in this thread [1].
>
>> Requiring all column
>> lists to be the same (and not specific to row filter) prevents that
>> example from working. Yes, you can create multiple subscriptions, but
>> that brings its own set of challenges too.
>>
>> I doubt forcing users to use the more complex setup is good idea, and
>> combining the column lists per [1] seems sound to me.
>>
>> That being said, the good thing is this restriction seems it might be
>> relaxed in the future to work per [1], without causing any backwards
>> compatibility issues.
>
> These are my thoughts as well. Even if we decide to go via the column
> list merging approach (in selective cases), we need to do some
> performance testing of that approach as it does much more work per
> tuple. It is possible that the impact is not much but still worth
> evaluating, so let's try to see the patch to prohibit combining the
> column lists then we can decide.

Surely we could do some performance testing now. I doubt it's very
expensive - sure, you can construct cases with many row filters / column
lists, but how likely is that in practice? Moreover, it's not like this
would affect existing setups, so even if it's a bit expensive, we may
interpret that as a cost of the feature.

>> Should we do something similar for row filters, though? It seems quite
>> weird we're so concerned about unexpected behavior due to combining
>> column lists (despite having a patch that makes it behave sanely), and
>> at the same time wave off similarly strange behavior due to combining
>> row filters because "that's what you get if you define the publications
>> in a strange way".
>
> During development, we found that we can't combine the row-filters for
> 'insert' and 'update'/'delete' because of replica identity
> restrictions, so we have kept them separate. But if we came across
> other such things then we can either try to fix those or forbid them.

I understand how we got to the current state. I'm just saying that this
allows defining separate publications for insert, update, and delete
actions, and setting different row filters for each of them, which results
in behavior that is hard to explain/understand, especially when it comes to
tablesync. It seems quite strange to prohibit merging column lists because
there might be some strange behavior that no one described, while allowing
setups with different row filters that definitely have strange behavior.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: make MaxBackends available in _PG_init
On Tue, Apr 19, 2022 at 11:47 AM Nathan Bossart wrote:
> Okay, I did it this way in v5.

I pushed 0001. Regarding 0002, I think there's no point in adding a
_PG_fini(). The code to call _PG_fini() has been surrounded by #ifdef
NOT_USED forever, and that seems unlikely to change any time soon as long
as we stick with these stupid hook designs where there's effectively a
linked list of hooks, but you can't actually access the list because it's
spread across a bunch of random static variables in each module.

I think we ought to go through all the hooks that are being used in this
kind of way and replace them with a system where there's an explicit List
of functions to call, and instead of this lame stuff like:

+	prev_shmem_request_hook = shmem_request_hook;
+	shmem_request_hook = autoprewarm_shmem_request;

You instead do:

	shmem_request_functions = lappend(shmem_request_functions,
									  autoprewarm_shmem_request);

Or maybe wrap that up in an API, like:

	add_shmem_request_function(autoprewarm_shmem_request);

Then it would be easy to have a corresponding remove function. For shmem
request, it would probably make sense to ditch the part where each hook
function calls the next hook function and instead just have the calling
code loop over the list and call them one by one. For things like the
executor start/run/end hooks, that wouldn't work, but you could have
something like invoke_next_executor_start_hook_function(args).

I don't think we should try to make this kind of change now - it seems to
me that what you've done here is consistent with existing practice and we
might as well commit it more or less like this for now. But perhaps for v16
or some future release we can think about redoing a bunch of hooks this
way. There would be some migration pain for extension authors for sure, but
then we might get to a point where extensions can be cleanly unloaded in at
least some circumstances, instead of continuing to write silly _PG_fini()
functions that couldn't possibly ever work.

--
Robert Haas
EDB: http://www.enterprisedb.com
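As a concrete illustration of that design, a minimal sketch could look like the following. This is hypothetical code, not an actual PostgreSQL API: the typedef, the function names, and the choice to store function pointers in a List are all assumptions made for exposition.

    #include "postgres.h"
    #include "nodes/pg_list.h"

    /* Hypothetical: signature shared by all shmem-request callbacks. */
    typedef void (*shmem_request_function) (void);

    static List *shmem_request_functions = NIL;

    /* What an extension would call from _PG_init() instead of hook chaining. */
    void
    add_shmem_request_function(shmem_request_function func)
    {
        /*
         * Storing a function pointer as a List pointer is itself a
         * sketch-level shortcut; a real API might use a dedicated array or
         * node type instead.
         */
        shmem_request_functions = lappend(shmem_request_functions,
                                          (void *) func);
    }

    /* The matching remove function becomes trivial... */
    void
    remove_shmem_request_function(shmem_request_function func)
    {
        shmem_request_functions = list_delete_ptr(shmem_request_functions,
                                                  (void *) func);
    }

    /* ...and core code just loops, rather than chaining through hooks. */
    static void
    call_shmem_request_functions(void)
    {
        ListCell   *lc;

        foreach(lc, shmem_request_functions)
        {
            shmem_request_function func = (shmem_request_function) lfirst(lc);

            func();
        }
    }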
Re: libpq async duplicate error results
I took another look at this. The output from the test program at the
beginning of the thread is now (master branch):

command = SELECT 'before';
result 1 status = PGRES_TUPLES_OK
error message = ""

command = SELECT pg_terminate_backend(pg_backend_pid());
result 1 status = PGRES_FATAL_ERROR
error message = "FATAL:  terminating connection due to administrator command
"
result 2 status = PGRES_FATAL_ERROR
error message = "server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
"

command = SELECT 'after';
PQsendQuery() error: FATAL:  terminating connection due to administrator command
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
no connection to the server

It appears the "after" query is getting the error message from the previous
cycle somehow.

The output in PG14 and PG13 is:

command = SELECT 'after';
PQsendQuery() error: no connection to the server

Is the change intended, or do we need to think about more tweaking?
Re: Trying to add more tests to gistbuild.c
The attached patch is failing on make check due to a typo, resubmitting the correct one. -- Matheus Alcantara
diff --git a/src/test/regress/expected/gist.out b/src/test/regress/expected/gist.out index a36b4c9c56..b5edc44250 100644 --- a/src/test/regress/expected/gist.out +++ b/src/test/regress/expected/gist.out @@ -387,6 +387,12 @@ select p from gist_tbl order by circle(p,1) <-> point(0,0) limit 1; select p from gist_tbl order by circle(p,1) <-> point(0,0) limit 1; ERROR: lossy distance functions are not supported in index-only scans +-- Build an index using buffering, caused by an index build that doesn't fit in cache. +set effective_cache_size = '1MB'; +create index gist_tbl_box_index_buffering on gist_tbl using gist (p, b, c); +reset effective_cache_size; +-- Force an index build using buffering. +create index gist_tbl_box_index_forcing_buffering on gist_tbl using gist (p) with (buffering=on); -- Clean up reset enable_seqscan; reset enable_bitmapscan; diff --git a/src/test/regress/sql/gist.sql b/src/test/regress/sql/gist.sql index 3360266370..214366c157 100644 --- a/src/test/regress/sql/gist.sql +++ b/src/test/regress/sql/gist.sql @@ -169,6 +169,15 @@ explain (verbose, costs off) select p from gist_tbl order by circle(p,1) <-> point(0,0) limit 1; select p from gist_tbl order by circle(p,1) <-> point(0,0) limit 1; + +-- Build an index using buffering, caused by an index build that doesn't fit in cache. +set effective_cache_size = '1MB'; +create index gist_tbl_box_index_buffering on gist_tbl using gist (p, b, c); +reset effective_cache_size; + +-- Force an index build using buffering. +create index gist_tbl_box_index_forcing_buffering on gist_tbl using gist (p) with (buffering=on); + -- Clean up reset enable_seqscan; reset enable_bitmapscan;
Re: bogus: logical replication rows/cols combinations
On Fri, May 6, 2022 at 5:56 PM Tomas Vondra wrote: > > >> > >> I could think of below two options: > >> 1. Forbid any case where column list is different for the same table > >> when combining publications. > >> 2. Forbid if the column list and row filters for a table are different > >> in the set of publications we are planning to combine. This means we > >> will allow combining column lists when row filters are not present or > >> when column list is the same (we don't get anything additional by > >> combining but the idea is we won't forbid such cases) and row filters > >> are different. > >> > >> Now, I think the points in favor of (1) are that the main purpose of > >> introducing a column list are: (a) the structure/schema of the > >> subscriber is different from the publisher, (b) want to hide sensitive > >> columns data. In both cases, it should be fine if we follow (1) and > >> from Peter E.'s latest email [1] he also seems to be indicating the > >> same. If we want to be slightly more relaxed then we can probably (2). > >> We can decide on something else as well but I feel it should be such > >> that it is easy to explain. > > > > I also think it makes sense to add a restriction like (1). I am planning to > > implement the restriction if no one objects. > > > > I'm not going to block that approach if that's the consensus here, > though I'm not convinced. > > Let me point out (1) does *not* work for the data redaction use case, > certainly not the example Alvaro and I presented, because that relies > on a combination of row filters and column filters. > This should just forbid the case presented by Alvaro in his first email in this thread [1]. > Requiring all column > lists to be the same (and not specific to a row filter) prevents that > example from working. Yes, you can create multiple subscriptions, but > that brings its own set of challenges too. > > I doubt forcing users to use the more complex setup is a good idea, and > combining the column lists per [1] seems sound to me. > > That being said, the good thing is this restriction seems like it might be > relaxed in the future to work per [1], without causing any backwards > compatibility issues. > These are my thoughts as well. Even if we decide to go via the column list merging approach (in selective cases), we need to do some performance testing of that approach, as it does much more work per tuple. It is possible that the impact is not much, but it is still worth evaluating, so let's try to see the patch to prohibit combining the column lists, and then we can decide. > Should we do something similar for row filters, though? It seems quite > weird we're so concerned about unexpected behavior due to combining > column lists (despite having a patch that makes it behave sanely), and > at the same time wave off similarly strange behavior due to combining > row filters because "that's what you get if you define the publications > in a strange way". > During development, we found that we can't combine the row-filters for 'insert' and 'update'/'delete' because of replica identity restrictions, so we have kept them separate. But if we come across other such things then we can either try to fix those or forbid them. [1] - https://www.postgresql.org/message-id/202204251548.mudq7jbqnh7r%40alvherre.pgsql -- With Regards, Amit Kapila.
Re: Fix typo in comment
Alvaro Herrera writes: > On 2022-May-06, Zaorang Yang wrote: >> Maybe, the first letter of comments in postinit.c should be capitalized. > Hmm, typically these one-line comments are not "full sentences", so they > don't have capitals and no ending periods either. I wouldn't like the > endless stream of patches that would result if we let this go in. There is no project style guideline suggesting one over the other, so I think we should just leave it alone. regards, tom lane
Trying to add more tests to gistbuild.c
I'm studying how the gist index works, trying to improve the test coverage of gistbuild.c. Reading the source code I noticed that the gistInitBuffering function is not covered, so I decided to start with it. Reading the documentation and the source I understood that for this function to be executed it is necessary to force buffering=on when creating the index, or the index to be created must be big enough to not fit in the cache -- am I correct? Considering the above, I added two new index creation statements in the gist regression test (patch attached) to create an index using buffering=on and another to try to simulate an index that does not fit in the cache. With these new tests the coverage went from 45.3% to 85.5%, but I have some doubts: - Does this test make sense? - Would there be a way to validate that the buffering was done correctly? - Is this test necessary? I've been studying the PostgreSQL implementation and I'm just trying to start contributing to the source code. -- Matheus Alcantara
diff --git a/src/test/regress/expected/gist.out b/src/test/regress/expected/gist.out index a36b4c9c56..044986433a 100644 --- a/src/test/regress/expected/gist.out +++ b/src/test/regress/expected/gist.out @@ -46,6 +46,12 @@ vacuum analyze gist_tbl; set enable_seqscan=off; set enable_bitmapscan=off; set enable_indexonlyscan=on; +-- Build an index using buffering, caused by an index build that doesn't fit in cache. +set effective_cache_size = '1MB'; +create index gist_tbl_box_index_buffering on gist_tbl using gist (p, b, c); +reset effective_cache_size; +-- Force an index build using buffering. +create index gist_tbl_box_index_forcing_buffering on gist_tbl using gist (p) with (buffering=on); -- Test index-only scan with point opclass create index gist_tbl_point_index on gist_tbl using gist (p); -- check that the planner chooses an index-only scan diff --git a/src/test/regress/sql/gist.sql b/src/test/regress/sql/gist.sql index 3360266370..836ce84d71 100644 --- a/src/test/regress/sql/gist.sql +++ b/src/test/regress/sql/gist.sql @@ -55,6 +55,14 @@ set enable_seqscan=off; set enable_bitmapscan=off; set enable_indexonlyscan=on; + +-- Build an index using buffering, caused by an index build that doesn't fit in cache. +set effective_cache_size = '1MB'; +create index gist_tbl_box_index_buffering on gist_tbl using gist (p, b, c); +reset effective_cache_size; + +-- Force an index build using buffering. +create index gist_tbl_box_index_forcing_buffering on gist_tbl using gist (p) with (buffering=on); + -- Test index-only scan with point opclass create index gist_tbl_point_index on gist_tbl using gist (p);
Re: Fix typo in comment
> On 6 May 2022, at 14:50, Alvaro Herrera wrote: > > On 2022-May-06, Zaorang Yang wrote: > >> Maybe, the first letter of comments in postinit.c should be capitalized. > > Hmm, typically these one-line comments are not "full sentences", so they > don't have capitals and no ending periods either. I wouldn't like the > endless stream of patches that would result if we let this go in. Agreed. A quick grep turns up a fair number of such comments: $ git grep "^\s*\/\* [a-z].\+\*\/$" src/ | wc -l 16588 If anything should be done about this, it would perhaps be better to add it to pgindent or a similar automated process. If anything. -- Daniel Gustafsson https://vmware.com/
Re: Fix typo in comment
On 2022-May-06, Zaorang Yang wrote: > Maybe, the first letter of comments in postinit.c should be capitalized. Hmm, typically these one-line comments are not "full sentences", so they don't have capitals and no ending periods either. I wouldn't like the endless stream of patches that would result if we let this go in. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Fix typo in comment
Maybe, the first letter of comments in postinit.c should be capitalized. Attaching a tiny patch to fix it. 0001-Fix-typo-in-comment.patch Description: Binary data
Re: bogus: logical replication rows/cols combinations
On 5/6/22 05:23, houzj.f...@fujitsu.com wrote: > On Tuesday, May 3, 2022 11:31 AM Amit Kapila wrote: >> >> On Tue, May 3, 2022 at 12:10 AM Tomas Vondra >> wrote: >>> >>> On 5/2/22 19:51, Alvaro Herrera wrote: > Why would we need to know publications replicated by other >> walsenders? > And what if the subscriber is not connected at the moment? In that case > there'll be no walsender. Sure, if the replica is not connected then there's no issue -- as you say, that replica will fail at START_REPLICATION time. >>> >>> Right, I got confused a bit. >>> >>> Anyway, I think the main challenge is defining what exactly we want to >>> check, in order to ensure "sensible" behavior, without preventing way >>> too many sensible use cases. >>> >> >> I could think of below two options: >> 1. Forbid any case where column list is different for the same table >> when combining publications. >> 2. Forbid if the column list and row filters for a table are different >> in the set of publications we are planning to combine. This means we >> will allow combining column lists when row filters are not present or >> when column list is the same (we don't get anything additional by >> combining but the idea is we won't forbid such cases) and row filters >> are different. >> >> Now, I think the points in favor of (1) are that the main purpose of >> introducing a column list are: (a) the structure/schema of the >> subscriber is different from the publisher, (b) want to hide sensitive >> columns data. In both cases, it should be fine if we follow (1) and >> from Peter E.'s latest email [1] he also seems to be indicating the >> same. If we want to be slightly more relaxed then we can probably (2). >> We can decide on something else as well but I feel it should be such >> that it is easy to explain. > > I also think it makes sense to add a restriction like (1). I am planning to > implement the restriction if no one objects. > I'm not going to block that approach if that's the consensus here, though I'm not convinced. Let me point out (1) does *not* work for the data redaction use case, certainly not the example Alvaro and I presented, because that relies on a combination of row filters and column filters. Requiring all column lists to be the same (and not specific to a row filter) prevents that example from working. Yes, you can create multiple subscriptions, but that brings its own set of challenges too. I doubt forcing users to use the more complex setup is a good idea, and combining the column lists per [1] seems sound to me. That being said, the good thing is this restriction seems like it might be relaxed in the future to work per [1], without causing any backwards compatibility issues. Should we do something similar for row filters, though? It seems quite weird we're so concerned about unexpected behavior due to combining column lists (despite having a patch that makes it behave sanely), and at the same time wave off similarly strange behavior due to combining row filters because "that's what you get if you define the publications in a strange way". regards [1] https://www.postgresql.org/message-id/5a85b8b7-fc1c-364b-5c62-0bb3e1e25824%40enterprisedb.com -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Fix typo in code comment - origin.c
On Fri, May 06, 2022 at 07:25:34PM +1000, Peter Smith wrote: > PSA trivial patch to fix a very old comment typo. Thanks, fixed. -- Michael signature.asc Description: PGP signature
Re: Possible corruption by CreateRestartPoint at promotion
On Thu, Apr 28, 2022 at 03:49:42PM +0900, Michael Paquier wrote: > I am not sure what you mean here. FWIW, I am translating the > suggestion of Nathan to split the existing check in > CreateRestartPoint() that we are discussing here into two if blocks, > instead of just one: > - Move the update of checkPoint and checkPointCopy into its own if > block, controlled only by the check on > (ControlFile->checkPointCopy.redo < lastCheckPoint.redo) > - Keep the code updating minRecoveryPoint and minRecoveryPointTLI > mostly as-is, but just do the update when the control file state is > DB_IN_ARCHIVE_RECOVERY. Of course we need to keep the check on > (minRecoveryPoint < lastCheckPointEndPtr). And I have spent a bit of time on this stuff to finish with the attached. It will be a plus to get that done on HEAD for beta1, so I'll try to deal with it on Monday. I am still a bit stressed about the back branches, as concurrent checkpoints are possible via CreateCheckPoint() from the startup process (not the case on HEAD), but the stable branches will have a new point release very soon, so let's revisit this choice there later. v6 attached includes a TAP test, but I don't intend to include it as it is expensive. -- Michael From ffc1469819b28ff76e3290cd695e60511d94305a Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Fri, 6 May 2022 19:57:54 +0900 Subject: [PATCH v6] Correctly update control file at promotion If the cluster ends recovery on a promotion (aka the control file shows a state different than DB_IN_ARCHIVE_RECOVERY) while CreateRestartPoint() is still processing, this function could miss the update of the control file but still do the recycling/removal of the past WAL segments, causing a follow-up restart doing recovery to fail on a missing checkpoint record (aka PANIC) because the redo LSN referred to in the control file was located in a segment already gone. This commit fixes the control file update so the redo LSN is updated to reflect what gets recycled, with the minimum recovery point changed only if the cluster is still doing archive recovery. --- src/backend/access/transam/xlog.c | 48 .../t/027_invalid_checkpoint_after_promote.pl | 113 ++ 2 files changed, 141 insertions(+), 20 deletions(-) create mode 100644 src/test/recovery/t/027_invalid_checkpoint_after_promote.pl diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 61cda56c6f..c9c0b3f90a 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -6921,6 +6921,9 @@ CreateRestartPoint(int flags) XLogSegNo _logSegNo; TimestampTz xtime; + /* Concurrent checkpoint/restartpoint cannot happen */ + Assert (!IsUnderPostmaster || MyBackendType == B_CHECKPOINTER); + /* Get a local copy of the last safe checkpoint record. */ SpinLockAcquire(&XLogCtl->info_lck); lastCheckPointRecPtr = XLogCtl->lastCheckPointRecPtr; @@ -7013,30 +7016,33 @@ CreateRestartPoint(int flags) */ PriorRedoPtr = ControlFile->checkPointCopy.redo; - /* - * Update pg_control, using current time. Check that it still shows - * DB_IN_ARCHIVE_RECOVERY state and an older checkpoint, else do nothing; - * this is a quick hack to make sure nothing really bad happens if somehow - * we get here after the end-of-recovery checkpoint. - */ + /* Update pg_control, using current time */ LWLockAcquire(ControlFileLock, LW_EXCLUSIVE); + + /* + * Check if the control file still shows an older checkpoint, and update + * its information.
+ */ + if (ControlFile->checkPointCopy.redo < lastCheckPoint.redo) { ControlFile->checkPoint = lastCheckPointRecPtr; ControlFile->checkPointCopy = lastCheckPoint; + } - /* - * Ensure minRecoveryPoint is past the checkpoint record. Normally, - * this will have happened already while writing out dirty buffers, - * but not necessarily - e.g. because no buffers were dirtied. We do - * this because a backup performed in recovery uses minRecoveryPoint to - * determine which WAL files must be included in the backup, and the - * file (or files) containing the checkpoint record must be included, - * at a minimum. Note that for an ordinary restart of recovery there's - * no value in having the minimum recovery point any earlier than this - * anyway, because redo will begin just after the checkpoint record. - */ + /* + * Ensure minRecoveryPoint is past the checkpoint record while archive + * recovery is still ongoing. Normally, this will have happened already + * while writing out dirty buffers, but not necessarily - e.g. because no + * buffers were dirtied. We do this because a base backup uses + * minRecoveryPoint to determine which WAL files must be included in the + * backup, and the file (or files) containing the checkpoint record must + * be included, at a minimum. Note that for an ordinary restart of + *
Re: strange slow query - lost lot of time somewhere
On Fri, 6 May 2022 at 10:05, David Rowley wrote: > On Fri, 6 May 2022 at 17:52, Pavel Stehule > wrote: > > Breakpoint 1, build_hash_table (size=4369066, mstate=0xfc7f08) at > nodeMemoize.c:268 > > 268 if (size == 0) > > (gdb) p size > > $1 = 4369066 > > Thanks for the report. I think I now see the problem. Looking at > [1], it seems that's a bushy plan. That's fine, but less common than a > left deep plan. > > I think the problem is down to some incorrect code in > get_memoize_path() where we pass the wrong value of "calls" to > create_memoize_path(). I think instead of outer_path->parent->rows it > should be outer_path->rows. > > If you look closely at the plan, you'll see that the outer side of the > inner-most Nested Loop is parameterized by some higher-level nested > loop. > > -> Nested Loop (cost=1.14..79.20 rows=91 width=8) (actual > time=0.024..0.024 rows=1 loops=66) > -> Index Only Scan using > uq_isi_itemid_itemimageid on item_share_image itemsharei2__1 > (cost=0.57..3.85 rows=91 width=16) (actual time=0.010..0.010 rows=1 > loops=66) >Index Cond: (item_id = itembo0_.id) >Heap Fetches: 21 > -> Memoize (cost=0.57..2.07 rows=1 width=8) > (actual time=0.013..0.013 rows=1 loops=66) > > so instead of passing 91 to create_memoize_path() as I thought, it > passes outer_path->parent->rows. Since > I can't see any WHERE clause items filtering rows from the > itemsharei2__1 relation, the outer_path->parent->rows should > be whatever pg_class.reltuples says. > > Are you able to send the results of: > > explain select item_id from item_share_image group by item_id; -- I'm > interested in the estimated number of groups in the plan's top node. > > > select reltuples from pg_class where oid = 'item_share_image'::regclass; > > I'm expecting the estimated number of rows in the top node of the > group by plan to be about 4369066. > (2022-05-06 12:30:23) prd_aukro=# explain select item_id from item_share_image group by item_id; QUERY PLAN Finalize HashAggregate (cost=1543418.63..1554179.08 rows=1076045 width=8) Group Key: item_id -> Gather (cost=1000.57..1532658.18 rows=4304180 width=8) Workers Planned: 4 -> Group (cost=0.57..1101240.18 rows=1076045 width=8) Group Key: item_id -> Parallel Index Only Scan using ixfk_isi_itemid on item_share_image (cost=0.57..1039823.86 rows=24566530 width=8) (7 rows) Time: 1.808 ms (2022-05-06 12:30:26) prd_aukro=# select reltuples from pg_class where oid = 'item_share_image'::regclass; reltuples 9.826612e+07 (1 row) Time: 0.887 ms Regards Pavel > David > > [1] https://explain.depesz.com/s/2rBw#source
Re: postgres_fdw: commit remote (sub)transactions in parallel during pre-commit
On Thu, May 5, 2022 at 6:39 AM David Zhang wrote: > On 2022-05-02 1:25 a.m., Etsuro Fujita wrote: > > On Wed, Apr 20, 2022 at 4:55 AM David Zhang wrote: > >> I tried to apply the patch to master and plan to run some tests, but got > >> below errors due to other commits. > > I rebased the patch against HEAD. Attached is an updated patch. > > Applied the patch v8 to the master branch today, and `make check` is OK. > I also repeated the previous performance tests on three virtual Ubuntu > 18.04 machines, and the performance improvement of parallel abort, averaged > over 10 runs, is more consistent. > > before: > > abort.1 = 2.6344 ms > abort.2 = 4.2799 ms > > after: > > abort.1 = 1.4105 ms > abort.2 = 2.2075 ms Good to know! Thanks for testing! > >> + * remote server in parallel at (sub)transaction end. > >> > >> Here, I think the comment above could potentially apply to multiple > >> remote servers. > > I agree on that point, but I think it's correct to say "the remote > > server" here, because we determine this for the given remote server. > > Maybe I'm missing something, so could you elaborate on it? > Your understanding is correct. I was thinking `remote server(s)` would > be easy for end users to understand, but this is a comment in source code, > so either way is fine for me. Ok, but I noticed that the comment failed to mention that the parallel_commit option is disabled by default. Also, I noticed a comment above it: * It's enough to determine this only when making new connection because * all the connections to the foreign server whose keep_connections option * is changed will be closed and re-made later. This would apply to the parallel_commit option as well. How about updating these like the attached? (I simplified the latter comment and moved it to a more appropriate place.) Best regards, Etsuro Fujita update-comment-in-make_new_connection.patch Description: Binary data
Re: Configuration Parameter/GUC value validation hook
On Wed, May 4, 2022 at 7:12 AM Bharath Rupireddy wrote: > Thanks Tom and Robert for your responses. > > How about we provide a sample extension (limiting some important > parameters, say shared_buffers, work_mem and so on, to some > "reasonable/recommended" limits) in the core along with the > set_config_option_hook? This way, all the people using open source > postgres out-of-the-box will benefit, and whoever wants can modify > that sample extension to suit their needs. The sample extension can > also serve as an example of how to implement set_config_option_hook. > > Thoughts? Well, it's better than just adding a hook and stopping, but I'm not really sure that it's as good as what I'd like to have. -- Robert Haas EDB: http://www.enterprisedb.com
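Just to illustrate the idea being discussed, a purely hypothetical sketch of such a sample extension is below. Note that no set_config_option_hook exists in core today, so the hook signature here is invented for illustration only:

/*
 * HYPOTHETICAL: assumes core called a hook of this shape from
 * set_config_option() with the GUC name and the proposed new value,
 * where returning false rejects the change.  No such hook exists yet.
 */
static bool
limit_guc_check(const char *name, const char *value)
{
    if (strcmp(name, "work_mem") == 0)
    {
        int         kbytes = atoi(value);   /* simplistic parsing */

        if (kbytes > 1024 * 1024)           /* cap at 1GB */
        {
            ereport(WARNING,
                    (errmsg("rejecting work_mem setting larger than 1GB")));
            return false;
        }
    }
    return true;
}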
Re: strange slow query - lost lot of time somewhere
On Fri, 6 May 2022 at 20:04, David Rowley wrote: > Thanks for the report. I think I now see the problem. Looking at > [1], it seems that's a bushy plan. That's fine, but less common than a > left deep plan. On second thoughts, it does not need to be a bushy plan for the outer side of the nested loop to be parameterized by some higher-level nested loop. There's an example of a plan like this in the regression tests. regression=# explain (analyze, costs off, summary off) regression-# select * from tenk1 t1 left join regression-# (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2) regression-# on t1.hundred = t2.hundred and t1.ten = t3.ten regression-# where t1.unique1 = 1; QUERY PLAN - Nested Loop Left Join (actual time=0.258..0.487 rows=20 loops=1) -> Index Scan using tenk1_unique1 on tenk1 t1 (actual time=0.049..0.049 rows=1 loops=1) Index Cond: (unique1 = 1) -> Nested Loop (actual time=0.204..0.419 rows=20 loops=1) Join Filter: (t1.ten = t3.ten) Rows Removed by Join Filter: 80 -> Bitmap Heap Scan on tenk1 t2 (actual time=0.064..0.194 rows=100 loops=1) Recheck Cond: (t1.hundred = hundred) Heap Blocks: exact=86 -> Bitmap Index Scan on tenk1_hundred (actual time=0.036..0.036 rows=100 loops=1) Index Cond: (hundred = t1.hundred) -> Memoize (actual time=0.001..0.001 rows=1 loops=100) Cache Key: t2.thousand Cache Mode: logical Hits: 90 Misses: 10 Evictions: 0 Overflows: 0 Memory Usage: 4kB -> Index Scan using tenk1_unique2 on tenk1 t3 (actual time=0.009..0.009 rows=1 loops=10) Index Cond: (unique2 = t2.thousand) (17 rows) Debugging this, I see that the memoize plan won because it was passing 1 as the number of calls. It should have been passing 100. The Memoize node's number of loops agrees with that. Fixing the calls to correctly pass 100 gets rid of the Memoize node. I've attached a patch to fix it. I'll look at it in more detail after the weekend. I'm very tempted to change the EXPLAIN output in at least master to display the initial and final (maximum) hash table sizes. Wondering if anyone would object to that?
David
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c index 9a8c5165b0..313f379ed8 100644 --- a/src/backend/optimizer/path/joinpath.c +++ b/src/backend/optimizer/path/joinpath.c @@ -610,7 +610,7 @@ get_memoize_path(PlannerInfo *root, RelOptInfo *innerrel, hash_operators, extra->inner_unique, binary_mode, - outer_path->parent->rows); + outer_path->rows); } return NULL; diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out index bf1a2db2cf..bd3375f2ba 100644 --- a/src/test/regress/expected/join.out +++ b/src/test/regress/expected/join.out @@ -3673,8 +3673,8 @@ select * from tenk1 t1 left join (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2) on t1.hundred = t2.hundred and t1.ten = t3.ten where t1.unique1 = 1; - QUERY PLAN --- + QUERY PLAN + Nested Loop Left Join -> Index Scan using tenk1_unique1 on tenk1 t1 Index Cond: (unique1 = 1) @@ -3684,20 +3684,17 @@ where t1.unique1 = 1; Recheck Cond: (t1.hundred = hundred) -> Bitmap Index Scan on tenk1_hundred Index Cond: (hundred = t1.hundred) - -> Memoize - Cache Key: t2.thousand - Cache Mode: logical - -> Index Scan using tenk1_unique2 on tenk1 t3 - Index Cond: (unique2 = t2.thousand) -(14 rows) + -> Index Scan using tenk1_unique2 on tenk1 t3 + Index Cond: (unique2 = t2.thousand) +(11 rows) explain (costs off) select * from tenk1 t1 left join (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2) on t1.hundred = t2.hundred and t1.ten + t2.ten = t3.ten where t1.unique1 = 1; - QUERY PLAN --- + QUERY PLAN
Fix typo in code comment - origin.c
PSA trivial patch to fix a very old comment typo. -- Kind Regards, Peter Smith. Fujitsu Australia v1-0001-Fix-typo-in-comment.patch Description: Binary data
Re: Perform streaming logical transactions by background workers and parallel apply
On Fri, Apr 29, 2022 at 3:22 PM shiy.f...@fujitsu.com wrote: ... > Thanks for your patch. > > The patch modified the streaming option in logical replication; it can be set to > 'on', 'off' and 'apply'. The new option 'apply' hasn't been tested in the > tap tests. > Attached is a patch which modifies the subscription tap tests to cover both the 'on' and > 'apply' options. (The main patch is also attached to make cfbot happy.) > Here are my review comments for v5-0002 (TAP tests) == 1. Commit message For the tap tests about streaming option in logical replication, test both 'on' and 'apply' option. SUGGESTION Change all TAP tests using the PUBLICATION "streaming" option, so they now test both 'on' and 'apply' values. ~~~ 2. src/test/subscription/t/015_stream.pl +sub test_streaming +{ I think the function should have a comment to say that its purpose is to encapsulate all the common (stream-related) test steps so the same code can be run both for the streaming=on and streaming=apply cases. ~~~ 3. src/test/subscription/t/015_stream.pl + +# Test streaming mode on +# Test streaming mode apply These comments feel too small. IMO they should both be more prominent like: # Test using streaming mode 'on' ### # Test using streaming mode 'apply' ### ~~~ 4. src/test/subscription/t/015_stream.pl +# Test streaming mode apply +$node_publisher->safe_psql('postgres', "DELETE FROM test_tab WHERE (a > 2)"); $node_publisher->wait_for_catchup($appname); I think those 2 lines do not really belong after the "# Test streaming mode apply" comment. IIUC they are really just doing cleanup from the prior test part, so I think they should a) be *above* this comment (and say "# cleanup the test data") or b) maybe it is best to put all the cleanup lines actually inside the 'test_streaming' function so that the last thing the function does is clean up after itself. Option b seems tidier to me. ~~~ 5. src/test/subscription/t/016_stream_subxact.pl sub test_streaming should be commented. (same as comment #2) ~~~ 6. src/test/subscription/t/016_stream_subxact.pl The comments for the different streaming modes should be more prominent. (same as comment #3) ~~~ 7. src/test/subscription/t/016_stream_subxact.pl +# Test streaming mode apply +$node_publisher->safe_psql('postgres', "DELETE FROM test_tab WHERE (a > 2)"); $node_publisher->wait_for_catchup($appname); These don't seem to belong here. They are cleanup from the prior test. (same as comment #4) ~~~ 8. src/test/subscription/t/017_stream_ddl.pl sub test_streaming should be commented. (same as comment #2) ~~~ 9. src/test/subscription/t/017_stream_ddl.pl The comments for the different streaming modes should be more prominent. (same as comment #3) ~~~ 10. src/test/subscription/t/017_stream_ddl.pl +# Test streaming mode apply $node_publisher->safe_psql( 'postgres', q{ -BEGIN; -INSERT INTO test_tab VALUES (2001, md5(2001::text), -2001, 2*2001); -ALTER TABLE test_tab ADD COLUMN e INT; -SAVEPOINT s1; -INSERT INTO test_tab VALUES (2002, md5(2002::text), -2002, 2*2002, -3*2002); -COMMIT; +DELETE FROM test_tab WHERE (a > 2); +ALTER TABLE test_tab DROP COLUMN c, DROP COLUMN d, DROP COLUMN e, DROP COLUMN f; }); $node_publisher->wait_for_catchup($appname); These don't seem to belong here. They are cleanup from the prior test. (same as comment #4) ~~~ 11. .../t/018_stream_subxact_abort.pl sub test_streaming should be commented. (same as comment #2) ~~~ 12.
.../t/018_stream_subxact_abort.pl The comments for the different streaming modes should be more prominent. (same as comment #3) ~~~ 13. .../t/018_stream_subxact_abort.pl +# Test streaming mode apply +$node_publisher->safe_psql('postgres', "DELETE FROM test_tab WHERE (a > 2)"); $node_publisher->wait_for_catchup($appname); These don't seem to belong here. They are cleanup from the prior test. (same as comment #4) ~~~ 14. .../t/019_stream_subxact_ddl_abort.pl sub test_streaming should be commented. (same as comment #2) ~~~ 15. .../t/019_stream_subxact_ddl_abort.pl The comments for the different streaming modes should be more prominent. (same as comment #3) ~~~ 16. .../t/019_stream_subxact_ddl_abort.pl +test_streaming($node_publisher, $node_subscriber, $appname); + +# Test streaming mode apply $node_publisher->safe_psql( 'postgres', q{ -BEGIN; -INSERT INTO test_tab SELECT i, md5(i::text) FROM generate_series(3,500) s(i); -ALTER TABLE test_tab ADD COLUMN c INT; -SAVEPOINT s1; -INSERT INTO test_tab SELECT i, md5(i::text), -i FROM generate_series(501,1000) s(i); -ALTER TABLE test_tab ADD COLUMN d INT; -SAVEPOINT s2; -INSERT INTO test_tab SELECT i, md5(i::text), -i, 2*i FROM generate_series(1001,1500) s(i); -ALTER TABLE test_tab ADD COLUMN e INT; -SAVEPOINT
Re: strange slow query - lost lot of time somewhere
On Fri, 6 May 2022 at 17:52, Pavel Stehule wrote: > Breakpoint 1, build_hash_table (size=4369066, mstate=0xfc7f08) at > nodeMemoize.c:268 > 268 if (size == 0) > (gdb) p size > $1 = 4369066 Thanks for the report. I think I now see the problem. Looking at [1], it seems that's a bushy plan. That's fine, but less common than a left deep plan. I think the problem is down to some incorrect code in get_memoize_path() where we pass the wrong value of "calls" to create_memoize_path(). I think instead of outer_path->parent->rows it should be outer_path->rows. If you look closely at the plan, you'll see that the outer side of the inner-most Nested Loop is parameterized by some higher-level nested loop. -> Nested Loop (cost=1.14..79.20 rows=91 width=8) (actual time=0.024..0.024 rows=1 loops=66) -> Index Only Scan using uq_isi_itemid_itemimageid on item_share_image itemsharei2__1 (cost=0.57..3.85 rows=91 width=16) (actual time=0.010..0.010 rows=1 loops=66) Index Cond: (item_id = itembo0_.id) Heap Fetches: 21 -> Memoize (cost=0.57..2.07 rows=1 width=8) (actual time=0.013..0.013 rows=1 loops=66) so instead of passing 91 to create_memoize_path() as I thought, it passes outer_path->parent->rows. Since I can't see any WHERE clause items filtering rows from the itemsharei2__1 relation, the outer_path->parent->rows should be whatever pg_class.reltuples says. Are you able to send the results of: explain select item_id from item_share_image group by item_id; -- I'm interested in the estimated number of groups in the plan's top node. select reltuples from pg_class where oid = 'item_share_image'::regclass; I'm expecting the estimated number of rows in the top node of the group by plan to be about 4369066. David [1] https://explain.depesz.com/s/2rBw#source
RE: Logical replication timeout problem
On Fri, May 6, 2022 at 9:54 AM Masahiko Sawada wrote: > On Wed, May 4, 2022 at 7:18 PM Amit Kapila wrote: > > > > On Mon, May 2, 2022 at 8:07 AM Masahiko Sawada > wrote: > > > > > > On Mon, May 2, 2022 at 11:32 AM Amit Kapila > wrote: > > > > > > > > > > > > So, shall we go back to the previous approach of using a separate > > > > function update_replication_progress? > > > > > > Ok, agreed. > > > > > > > Attached, please find the updated patch accordingly. Currently, I have > > prepared it for HEAD, if you don't see any problem with this, we can > > prepare the back-branch patches based on this. > > Thank you for updating the patch. Looks good to me. Thanks for your review. I improved the back-branch patches according to the discussion, moving the CHANGES_THRESHOLD logic from the function OutputPluginUpdateProgress to the new function update_replication_progress. In addition, I improved the formatting of all patches with pgindent. Attached are the patches. Regards, Wang wei HEAD_v21-0001-Fix-the-logical-replication-timeout-during-large.patch Description: HEAD_v21-0001-Fix-the-logical-replication-timeout-during-large.patch REL14_v4-0001-Fix-the-logical-replication-timeout-during-large-.patch Description: REL14_v4-0001-Fix-the-logical-replication-timeout-during-large-.patch REL13_v3-0001-Fix-the-logical-replication-timeout-during-large-.patch Description: REL13_v3-0001-Fix-the-logical-replication-timeout-during-large-.patch REL12_v3-0001-Fix-the-logical-replication-timeout-during-large-.patch Description: REL12_v3-0001-Fix-the-logical-replication-timeout-during-large-.patch REL11_v3-0001-Fix-the-logical-replication-timeout-during-large-.patch Description: REL11_v3-0001-Fix-the-logical-replication-timeout-during-large-.patch REL10_v3-0001-Fix-the-logical-replication-timeout-during-large-.patch Description: REL10_v3-0001-Fix-the-logical-replication-timeout-during-large-.patch
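To recap the shape of the fix, here is a simplified sketch only; the threshold handling and surrounding details differ slightly across the attached branch patches:

/*
 * Sketch: send a progress/keepalive update every CHANGES_THRESHOLD
 * changes, so that a large transaction that produces no network
 * traffic does not run into wal_sender_timeout.
 */
#define CHANGES_THRESHOLD 100

static void
update_replication_progress(LogicalDecodingContext *ctx, bool skipped_xact)
{
    static int  changes_count = 0;

    if (++changes_count >= CHANGES_THRESHOLD)
    {
        OutputPluginUpdateProgress(ctx, skipped_xact);
        changes_count = 0;
    }
}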
Re: make MaxBackends available in _PG_init
Hello! On 19.04.2022 18:46, Nathan Bossart wrote: Okay, I did it this way in v5. Recently I ran into a problem where it would be preferable in our extended pg_stat_statements to use MaxBackends in _PG_init(), but it's equal to zero there. And I was very happy to find this patch and this thread. It was not only very interesting and informative for me, but it also solves the current problem. The patch applies cleanly on current master. Tests under Linux and Windows were successful. I've tried this patch with our extended pg_stat_statements and checked that MaxBackends has the correct value during extra shmem allocation. Thanks a lot! +1 for this patch. I have a little doubt about the comment in postmaster.c: "Now that loadable modules have had their chance to alter any GUCs". Perhaps this comment is too general. Not sure that it is possible to change any arbitrary GUC here. And maybe it would not be bad to add Assert(size > 0) in RequestAddinShmemSpace() to catch cases where the size calculation in contrib is wrong. With best regards, -- Anton A. Melnikov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
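For example, with the suggested assertion in place, a shmem request computed from MaxBackends would trip it immediately if the extension ran too early. A minimal sketch (my_shmem_request and MyPerBackendEntry are placeholder names, not part of any real extension):

typedef struct MyPerBackendEntry
{
    int         counter;        /* placeholder payload */
} MyPerBackendEntry;

static void
my_shmem_request(void)
{
    Size        size = mul_size(MaxBackends, sizeof(MyPerBackendEntry));

    /*
     * An Assert(size > 0) inside RequestAddinShmemSpace() would catch
     * MaxBackends still being 0 at this point.
     */
    RequestAddinShmemSpace(size);
}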
Re: strange slow query - lost lot of time somewhere
Pavel Stehule writes: > Breakpoint 1, build_hash_table (size=4369066, mstate=0xfc7f08) at > nodeMemoize.c:268 > 268 if (size == 0) > (gdb) p size > $1 = 4369066 Uh-huh regards, tom lane