RE: Ability to reference other extensions by schema in extension scripts
> Subject: Re: Ability to reference other extensions by schema in extension
> scripts
>
> "Regina Obe" writes:
> >> requires = 'extfoo, extbar'
> >> no_relocate = 'extfoo'
>
> > So when no_relocate is specified, where would that live?
>
> In the control file.
>
> > Would I mark the extfoo as not relocatable on CREATE / ALTER of said
> > extension?
> > Or add an extra field to pg_extension
>
> We don't record dependent extensions in pg_extension now, so that doesn't
> seem like it would fit well. I was envisioning that ALTER EXTENSION SET
> SCHEMA would do something along the lines of
>
> (1) scrape the list of dependent extensions out of pg_depend
> (2) open and parse each of their control files
> (3) fail if any of their control files mentions the target one in
>     no_relocate.
>
> Admittedly, this'd be a bit slow, but I doubt that ALTER EXTENSION SET
> SCHEMA is a performance bottleneck for anybody.
>
> > I had tried to do that originally, e.g. instead of even bothering with
> > such an extra arg, just mark it as not relocatable if the extension's
> > script contains references to the required extension's schema.
>
> I don't think that's a great approach, because those references might
> appear in places that can track a rename (ie, in an object name that's
> resolved to a stored OID). Short of fully parsing the script file you
> aren't going to get a reliable answer. I'm content to lay that problem
> off on the extension authors.
>
> > But then what if extfoo is upgraded?
>
> We already have mechanisms for version-dependent control files, so I don't
> see where there's a problem.
>
> regards, tom lane

Attached is a revised patch with these changes in place.

0006-Allow-use-of-extschema-reqextname-to-reference.patch
Description: Binary data
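For illustration, a sketch of how an extension author might use this machinery. The extension names, function names, and the exact `@extschema:name@` substitution spelling follow the patch under discussion and should be treated as assumptions until the committed documentation confirms them:

```
# extbar.control -- hypothetical extension that depends on extfoo
requires    = 'extfoo'
no_relocate = 'extfoo'   # makes ALTER EXTENSION extfoo SET SCHEMA fail

-- extbar--1.0.sql: reference an object of extfoo by its schema;
-- @extschema:extfoo@ would be substituted when the script is run
CREATE FUNCTION bar_fn(x int) RETURNS int
AS $$ SELECT @extschema:extfoo@.foo_fn(x) $$ LANGUAGE SQL;
```

The no_relocate marker is what lets step (3) of Tom's ALTER EXTENSION SET SCHEMA outline detect that relocating extfoo would break extbar's already-executed script text.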
Re: optimize several list functions with SIMD intrinsics
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature:      not tested
Spec compliant:          not tested
Documentation:           not tested

Hello,

Adding some review comments:

1. In list_member_ptr, would it be okay to move `const ListCell *cell` from

       #ifdef USE_NO_SIMD
           const ListCell *cell;
       #endif

   into the #else branch, as below? This would make the visual separation
   between the #if cases cleaner:

       #else
           const ListCell *cell;

           foreach(cell, list)
           {
               if (lfirst(cell) == datum)
                   return true;
           }

           return false;
       #endif

2. There are lots of duplicated `if (list == NIL)` checks before calling
   list_member_inline_internal(list, datum). Can we not add this check in
   list_member_inline_internal itself?

3. For `if (cell) return list_delete_cell(list, cell)` in
   list_delete_int/oid, can we change `if (cell)` to `if (cell != NULL)` as
   used elsewhere?

4. list_member_inline_interal_idx: there is a typo in the name ("interal"
   should be "internal").

5. list_member_inline_interal_idx and list_member_inline_internal are
   practically the same, with 90+% duplication. Can we not refactor both
   into one and allow a cell or true/false as the return? Although I see the
   usage of const ListCell, so this might be problematic.

6. The loop `for (i = 0; i < tail_idx; i += nelem_per_iteration)` for
   Vector32 in list.c looks duplicated from pg_lfind32 (in pg_lfind.h); can
   we not reuse that?

7. Is it possible to add a benchmark which shows the improvement against
   HEAD?

Regards,
Ankit

The new status of this patch is: Waiting on Author
Re: Add LZ4 compression in pg_dump
--- Original Message ---
On Saturday, March 11th, 2023 at 7:00 AM, Alexander Lakhin wrote:

> Hello,
> 23.02.2023 23:24, Tomas Vondra wrote:
> > On 2/23/23 16:26, Tomas Vondra wrote:
> > > Thanks for v30 with the updated commit messages. I've pushed 0001 after
> > > fixing a comment typo and removing (I think) an unnecessary change in
> > > an error message.
> > >
> > > I'll give the buildfarm a bit of time before pushing 0002 and 0003.
> >
> > I've now pushed 0002 and 0003, after minor tweaks (a couple typos etc.),
> > and marked the CF entry as committed. Thanks for the patch!
> >
> > I wonder how difficult would it be to add the zstd compression, so that
> > we don't have the annoying "unsupported" cases.
>
> With the patch 0003 committed, a single warning -Wtype-limits appeared in
> the master branch:
>
> $ CPPFLAGS="-Og -Wtype-limits" ./configure --with-lz4 -q && make -s -j8
> compress_lz4.c: In function ‘LZ4File_gets’:
> compress_lz4.c:492:19: warning: comparison of unsigned expression in ‘< 0’
> is always false [-Wtype-limits]
>   492 |         if (dsize < 0)
>       |

Thank you Alexander. Please find attached an attempt to address it.

> (I wonder, is it accidental that there are no other places that trigger
> the warning, or did some buildfarm animals have this check enabled
> before?)

I can not answer about the buildfarms. Do you think that adding an explicit
check for this warning in meson would help? I am a bit uncertain, as I think
that type-limits are included in extra.

@@ -1748,6 +1748,7 @@ common_warning_flags = [
   '-Wshadow=compatible-local',
   # This was included in -Wall/-Wformat in older GCC versions
   '-Wformat-security',
+  '-Wtype-limits',
 ]

> It is not a false positive, as can be proved by the 002_pg_dump.pl
> modified as follows:
> -          program => $ENV{'LZ4'},
> +          program => 'mv',
>            args => [
> -              '-z', '-f', '--rm',
>                "$tempdir/compression_lz4_dir/blobs.toc",
>                "$tempdir/compression_lz4_dir/blobs.toc.lz4",
>            ],
>        },

Correct, it is not a false positive.
The existing testing framework provides limited support for exercising error
branches, especially when those depend on generated output.

> A diagnostic logging added shows:
> LZ4File_gets() after LZ4File_read_internal; dsize: 18446744073709551615
>
> and pg_restore fails with:
> error: invalid line in large object TOC file
> ".../src/bin/pg_dump/tmp_check/tmp_test_22ri/compression_lz4_dir/blobs.toc": ""

It is a good thing that the restore fails on bad input. Yet it should have
failed earlier. The attached makes certain that it does fail earlier.

Cheers,
//Georgios

> Best regards,
> Alexander

From b80bb52ef6546aee8c8431d7cc126fa4a76b543c Mon Sep 17 00:00:00 2001
From: Georgios Kokolatos
Date: Sat, 11 Mar 2023 09:54:40 +
Subject: [PATCH v1] Respect return type of LZ4File_read_internal

The function LZ4File_gets() was storing the return value of
LZ4File_read_internal in a variable of the wrong type, disregarding
signedness. As a consequence, LZ4File_gets() would not take the error path
when it should.

In an attempt to improve readability, spell out the significance of a
negative return value of LZ4File_read_internal() in LZ4File_read().
Reported-by: Alexander Lakhin
---
 src/bin/pg_dump/compress_lz4.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/bin/pg_dump/compress_lz4.c b/src/bin/pg_dump/compress_lz4.c
index 63e794cdc6..cc039f0b47 100644
--- a/src/bin/pg_dump/compress_lz4.c
+++ b/src/bin/pg_dump/compress_lz4.c
@@ -453,7 +453,7 @@ LZ4File_read(void *ptr, size_t size, CompressFileHandle *CFH)
 	int			ret;
 
 	ret = LZ4File_read_internal(fs, ptr, size, false);
-	if (ret != size && !LZ4File_eof(CFH))
+	if (ret < 0 || (ret != size && !LZ4File_eof(CFH)))
 		pg_fatal("could not read from input file: %s", LZ4File_get_error(CFH));
 
 	return ret;
@@ -486,14 +486,14 @@ static char *
 LZ4File_gets(char *ptr, int size, CompressFileHandle *CFH)
 {
 	LZ4File    *fs = (LZ4File *) CFH->private_data;
-	size_t		dsize;
+	int			ret;
 
-	dsize = LZ4File_read_internal(fs, ptr, size, true);
-	if (dsize < 0)
+	ret = LZ4File_read_internal(fs, ptr, size, true);
+	if (ret < 0)
 		pg_fatal("could not read from input file: %s", LZ4File_get_error(CFH));
 
 	/* Done reading */
-	if (dsize == 0)
+	if (ret == 0)
 		return NULL;
 
 	return ptr;
-- 
2.34.1
Re: optimize several list functions with SIMD intrinsics
> 7. Is it possible to add a benchmark which shows improvement against HEAD?

Please ignore this from my earlier mail; I just saw the stats now.

Thanks,
Ankit
Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements
Hi,

I've done a rebase of the patch onto the current master.

-- 
Andrei Zubkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

From e43e9eab39bbd377665a7cad3b7fe7162f8f6578 Mon Sep 17 00:00:00 2001
From: Andrei Zubkov
Date: Sat, 11 Mar 2023 09:53:10 +0300
Subject: [PATCH] pg_stat_statements: Track statement entry timestamp

This patch adds stats_since and minmax_stats_since columns to the
pg_stat_statements view and pg_stat_statements() function. The new min/max
reset mode for the pg_stat_statements_reset() function is controlled by the
parameter minmax_only.

The stats_since column is populated with the current timestamp when a new
statement is added to the pg_stat_statements hashtable. It provides clean
information about the statistics collection time interval for each
statement. Besides, it can be used by sampling solutions to detect
situations when a statement was evicted and stored again between samples.

Such a sampling solution could derive any pg_stat_statements statistic
values for an interval between two samples, with the exception of all
min/max statistics. To address this issue, this patch adds the ability to
reset min/max statistics independently of the statement reset using the new
minmax_only parameter of the pg_stat_statements_reset(userid oid, dbid oid,
queryid bigint, minmax_only boolean) function. The timestamp of such a
reset is stored in the minmax_stats_since field for each statement.

The pg_stat_statements_reset() function now returns the timestamp of a
reset as a result.
Discussion: https://www.postgresql.org/message-id/flat/72e80e7b160a6eb189df9ef6f068cce3765d37f8.camel%40moonset.ru --- contrib/pg_stat_statements/Makefile | 4 +- .../pg_stat_statements/expected/cursors.out | 28 +-- contrib/pg_stat_statements/expected/dml.out | 20 +-- .../expected/entry_timestamp.out | 159 ++ .../expected/level_tracking.out | 80 - .../expected/oldextversions.out | 70 .../pg_stat_statements/expected/planning.out | 18 +- .../pg_stat_statements/expected/select.out| 44 ++--- .../expected/user_activity.out| 120 ++--- .../pg_stat_statements/expected/utility.out | 150 - contrib/pg_stat_statements/expected/wal.out | 10 +- contrib/pg_stat_statements/meson.build| 2 + .../pg_stat_statements--1.10--1.11.sql| 81 + .../pg_stat_statements/pg_stat_statements.c | 139 +++ .../pg_stat_statements.control| 2 +- contrib/pg_stat_statements/sql/cursors.sql| 6 +- contrib/pg_stat_statements/sql/dml.sql| 4 +- .../sql/entry_timestamp.sql | 114 + .../pg_stat_statements/sql/level_tracking.sql | 12 +- .../pg_stat_statements/sql/oldextversions.sql | 7 + contrib/pg_stat_statements/sql/planning.sql | 4 +- contrib/pg_stat_statements/sql/select.sql | 10 +- .../pg_stat_statements/sql/user_activity.sql | 15 +- contrib/pg_stat_statements/sql/utility.sql| 28 +-- contrib/pg_stat_statements/sql/wal.sql| 2 +- doc/src/sgml/pgstatstatements.sgml| 66 +++- 26 files changed, 881 insertions(+), 314 deletions(-) create mode 100644 contrib/pg_stat_statements/expected/entry_timestamp.out create mode 100644 contrib/pg_stat_statements/pg_stat_statements--1.10--1.11.sql create mode 100644 contrib/pg_stat_statements/sql/entry_timestamp.sql diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile index 5578a9dd4e3..b2446e7a1cf 100644 --- a/contrib/pg_stat_statements/Makefile +++ b/contrib/pg_stat_statements/Makefile @@ -6,7 +6,7 @@ OBJS = \ pg_stat_statements.o EXTENSION = pg_stat_statements -DATA = pg_stat_statements--1.4.sql \ +DATA = pg_stat_statements--1.4.sql 
pg_stat_statements--1.10--1.11.sql \ pg_stat_statements--1.9--1.10.sql pg_stat_statements--1.8--1.9.sql \ pg_stat_statements--1.7--1.8.sql pg_stat_statements--1.6--1.7.sql \ pg_stat_statements--1.5--1.6.sql pg_stat_statements--1.4--1.5.sql \ @@ -18,7 +18,7 @@ LDFLAGS_SL += $(filter -lm, $(LIBS)) REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/pg_stat_statements/pg_stat_statements.conf REGRESS = select dml cursors utility level_tracking planning \ - user_activity wal cleanup oldextversions + user_activity wal entry_timestamp cleanup oldextversions # Disabled because these tests require "shared_preload_libraries=pg_stat_statements", # which typical installcheck users do not have (e.g. buildfarm clients). NO_INSTALLCHECK = 1 diff --git a/contrib/pg_stat_statements/expected/cursors.out b/contrib/pg_stat_statements/expected/cursors.out index 46375ea9051..0fc4b2c098d 100644 --- a/contrib/pg_stat_statements/expected/cursors.out +++ b/contrib/pg_stat_statements/expected/cursors.out @@ -3,10 +3,10 @@ -- -- These tests require track_utility to be enabled. SET pg_stat_statements.track_utility = TRUE; -SELECT pg_stat_statements_
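As a usage sketch of the interface described in the commit message above (the column names and function signature follow this patch; treat them as proposed, not committed, behavior):

```sql
-- When was each statement's entry created, and when were its
-- min/max statistics last reset?
SELECT query, stats_since, minmax_stats_since
FROM pg_stat_statements;

-- Reset only the min/max statistics of one statement, keeping the
-- cumulative counters; the function returns the reset timestamp.
SELECT pg_stat_statements_reset(10, 20, 12345, minmax_only := true);
```

A sampling tool would compare stats_since between two samples: if it changed, the entry was evicted and re-created in between, so deltas for that statement cannot be trusted.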
Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode
> On Tue, Feb 28, 2023 at 3:16 AM Bharath Rupireddy wrote:
> > On Thu, Jan 12, 2023 at 6:06 AM Andres Freund wrote:
> > > On 2023-01-11 17:26:19 -0700, David G. Johnston wrote:
> > > > Should we just add "ring_buffers" to the existing "shared_buffers"
> > > > and "temp_buffers" settings?
> > >
> > > The different types of ring buffers have different sizes, for good
> > > reasons. So I don't see that working well. I also think it'd be more
> > > often useful to control this on a statement basis - if you have a
> > > parallel import tool that starts NCPU COPYs you'd want a smaller
> > > buffer than a single threaded COPY. Of course each session can change
> > > the ring buffer settings, but still.
> >
> > How about having GUCs for each ring buffer (bulk_read_ring_buffers,
> > bulk_write_ring_buffers, vacuum_ring_buffers - ah, 3 more new GUCs)?
> > These options can help especially when statement level controls aren't
> > easy to add (COPY, CREATE TABLE AS/CTAS, REFRESH MAT VIEW/RMV)? If
> > needed users can also set them at the system level. For instance, one
> > can set bulk_write_ring_buffers to other than 16MB or -1 to disable
> > the ring buffer to use shared_buffers and run a bunch of bulk write
> > queries.

In attached v3, I've changed the name of the GUC from buffer_usage_limit to
vacuum_buffer_usage_limit, since it is only used for vacuum and autovacuum.
I haven't added the other suggested strategy GUCs, as those could easily be
done in a future patchset. I've also changed GetAccessStrategyExt() to
GetAccessStrategyWithSize() -- similar to initArrayResultWithSize(). And
I've added tab completion for BUFFER_USAGE_LIMIT so that it is easier to
try out my patch.

Most of the TODOs in the code are related to the question of how autovacuum
uses the GUC vacuum_buffer_usage_limit. autovacuum creates the buffer
access strategy ring in do_autovacuum() before looping through and
vacuuming tables. It passes this strategy object on to vacuum().
Since we reuse the same strategy object for all tables in a given invocation
of do_autovacuum(), only failsafe autovacuum would change buffer access
strategies. This is probably okay, but it does mean that the table-level
VacuumParams variable, ring_size, means something different for autovacuum
than for vacuum. Autovacuum workers will always have set it to -1. We won't
ever reach code in vacuum() which relies on VacuumParams->ring_size as long
as autovacuum workers pass a non-NULL BufferAccessStrategy object to
vacuum(), though.

- Melanie

From 6f40d87f4f462d48a67721260be7e30a7438520e Mon Sep 17 00:00:00 2001
From: Melanie Plageman
Date: Wed, 22 Feb 2023 12:26:01 -0500
Subject: [PATCH v3 2/3] use shared buffers when failsafe active

---
 src/backend/access/heap/vacuumlazy.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8f14cf85f3..b319a244d5 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2622,6 +2622,11 @@ lazy_check_wraparound_failsafe(LVRelState *vacrel)
 	if (unlikely(vacuum_xid_failsafe_check(&vacrel->cutoffs)))
 	{
 		vacrel->failsafe_active = true;
 
+		/*
+		 * Assume the caller who allocated the memory for the
+		 * BufferAccessStrategy object will free it.
+		 */
+		vacrel->bstrategy = NULL;
 
 		/* Disable index vacuuming, index cleanup, and heap rel truncation */
 		vacrel->do_index_vacuuming = false;
-- 
2.37.2

From 139e71241729d7304188dffb603e6f43db0bf67c Mon Sep 17 00:00:00 2001
From: Melanie Plageman
Date: Wed, 22 Feb 2023 15:28:34 -0500
Subject: [PATCH v3 3/3] add vacuum option to specify ring_size and guc

---
 src/backend/commands/vacuum.c                 | 51 ++-
 src/backend/commands/vacuumparallel.c         |  6 ++-
 src/backend/postmaster/autovacuum.c           | 15 +-
 src/backend/storage/buffer/README             |  2 +
 src/backend/storage/buffer/freelist.c         | 40 +++
 src/backend/utils/init/globals.c              |  2 +
 src/backend/utils/misc/guc_tables.c           | 11 
 src/backend/utils/misc/postgresql.conf.sample |  4 ++
 src/bin/psql/tab-complete.c                   |  2 +-
 src/include/commands/vacuum.h                 |  1 +
 src/include/miscadmin.h                       |  1 +
 src/include/storage/bufmgr.h                  |  5 ++
 12 files changed, 136 insertions(+), 4 deletions(-)

diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index c62be52e3b..22fe42ee2e 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -127,6 +127,9 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
 	/* By default parallel vacuum is enabled */
 	params.nworkers = 0;
 
+	/* by default use buffer access strategy with default size */
+	params.ring_size = -1;
+
 	/* Parse options list */
 	foreach(lc, vacstmt->options)
 	{
@@ -210,6 +213,43 @@ ExecVacuum(ParseState *pstate, Vacuu
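The option and GUC added by patch 3/3 would be used roughly like this (syntax as proposed in the patch and subject to change before commit; the value is apparently in kilobytes, judging by the patch's bufsize-to-nbuffers conversion):

```sql
-- per-statement: use a 256 kB strategy ring for this vacuum only
VACUUM (BUFFER_USAGE_LIMIT 256) big_table;

-- 0 disables the ring and lets vacuum use shared buffers freely,
-- which is what failsafe mode does implicitly in patch 2/3
VACUUM (BUFFER_USAGE_LIMIT 0) big_table;

-- session/system default for vacuum and autovacuum (-1 = built-in default)
SET vacuum_buffer_usage_limit = -1;
```

The point of the per-statement form is the one Andres raised upthread: the right ring size depends on workload concurrency, so a session-wide GUC alone is too blunt.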
Re: [PoC] Improve dead tuple storage for lazy vacuum
On Fri, Mar 10, 2023 at 9:30 PM Masahiko Sawada wrote:
>
> On Fri, Mar 10, 2023 at 3:42 PM John Naylor wrote:
> > I'd suggest sharing your todo list in the meanwhile, it'd be good to
> > discuss what's worth doing and what is not.
>
> Apart from more rounds of reviews and tests, my todo items that need
> discussion and possibly implementation are:

Quick thoughts on these:

> * The memory measurement in radix trees and the memory limit in
> tidstores. I've implemented it in v30-0007 through 0009 but we need to
> review it. This is the highest priority for me.

Agreed.

> * Additional size classes. It's important for an alternative of path
> compression as well as supporting our decoupling approach. Middle
> priority.

I'm going to push back a bit and claim this doesn't bring much gain, while
it does have a complexity cost. The node1 from Andres's prototype is 32
bytes in size, the same as our node3, so it's roughly equivalent as a way
to ameliorate the lack of path compression. I say "roughly" because the
loop in node3 is probably noticeably slower. A new size class will by
definition still use that loop.

About a smaller node125-type class: I'm actually not even sure we need any
sub-max node bigger than about 64 (node size 768 bytes). I'd just let 65+
go to the max node -- there won't be many of them, at least in the
synthetic workloads we've seen so far.

> * Node shrinking support. Low priority.

This is an architectural wart that's been neglected since the tid store
doesn't perform deletion. We'll need it sometime. If we're not going to
make this work, why ship a deletion API at all?

I took a look at this a couple weeks ago, and fixing it wouldn't be that
hard. I even had an idea of how to detect when to shrink the size class
within a node kind, while keeping the header at 5 bytes. I'd be willing to
put effort into that, but to have a chance of succeeding, I'm unwilling to
make it more difficult by adding more size classes at this point.
--
John Naylor
EDB: http://www.enterprisedb.com
Re: Add LZ4 compression in pg_dump
Hi Georgios,

11.03.2023 13:50, gkokola...@pm.me wrote:
> I can not answer about the buildfarms. Do you think that adding an
> explicit check for this warning in meson would help? I am a bit
> uncertain as I think that type-limits are included in extra.
>
> @@ -1748,6 +1748,7 @@ common_warning_flags = [
>    '-Wshadow=compatible-local',
>    # This was included in -Wall/-Wformat in older GCC versions
>    '-Wformat-security',
> +  '-Wtype-limits',
>  ]

I'm not sure that I can promote additional checks (or determine where to
put them), but if some patch introduces a warning of a type that wasn't
present before, I think it's worth eliminating the warning (if it is
sensible) to keep the source code check baseline at the same level, or
even to lift it up gradually.

I've also found that the same commit introduced a single instance of the
analyzer-possible-null-argument warning:

CPPFLAGS="-Og -fanalyzer -Wno-analyzer-malloc-leak -Wno-analyzer-file-leak
-Wno-analyzer-null-dereference -Wno-analyzer-shift-count-overflow
-Wno-analyzer-free-of-non-heap -Wno-analyzer-null-argument
-Wno-analyzer-double-free -Wanalyzer-possible-null-argument"
./configure --with-lz4 -q && make -s -j8

compress_io.c: In function ‘hasSuffix’:
compress_io.c:158:47: warning: use of possibly-NULL ‘filename’ where
non-null expected [CWE-690] [-Wanalyzer-possible-null-argument]
  158 |         int filenamelen = strlen(filename);
      |                           ^~~~
‘InitDiscoverCompressFileHandle’: events 1-3
...

(I use gcc-11.3.)

As I can see, many existing uses of strdup() are followed by a check for a
null result, so maybe it's a common practice and a similar check should be
added in InitDiscoverCompressFileHandle(). (There are also a couple of
other warnings introduced with the lz4 compression patches, but those ones
are not unique, so maybe they aren't worth fixing.)

> It is a good thing that the restore fails with bad input. Yet it should
> have failed earlier. The attached makes certain it does fail earlier.

Thanks! Your patch definitely fixes the issue.

Best regards,
Alexander
Re: Transparent column encryption
> On Mar 9, 2023, at 11:18 PM, Peter Eisentraut wrote:
>
> Here is an updated patch.

Thanks, Peter. The patch appears to be in pretty good shape, but I have a
few comments and concerns.

CEKIsVisible() and CMKIsVisible() are obviously copied from
TSParserIsVisible(), and the code comments weren't fully updated.
Specifically, the phrase "hidden by another parser of the same name" should
be updated to not mention "parser".

Why does get_cmkalg_name() return the string "unspecified" for
PG_CMK_UNSPECIFIED, but the next function get_cmkalg_jwa_name() returns
NULL for PG_CMK_UNSPECIFIED? It seems they would both return NULL, or both
return "unspecified". If there's a reason for the divergence, could you add
a code comment to clarify? BTW, get_cmkalg_jwa_name() has no test coverage.

Looking further at code coverage, the new conditional in
printsimple_startup() is never tested with
(MyProcPort->column_encryption_enabled), so the block is never entered.
This would seem to be a consequence of backends like walsender not using
column encryption, which is not terribly surprising, but it got me
wondering if you had a particular client use case in mind when you added
this block?

The new function pg_encrypted_in() appears totally untested, but I have to
wonder if that's because we're not round-trip testing pg_dump with column
encryption...? The code coverage in pg_dump looks fairly decent, but some
column encryption code is not covered.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [Proposal] Allow pg_dump to include all child tables with the root table
Le 04/03/2023 à 20:18, Tom Lane a écrit :
> As noted, "childs" is bad English and "partitions" is flat out wrong
> (unless you change it to recurse only to partitions, which doesn't seem
> like a better definition). We could go with
> --[exclude-]table-and-children, or maybe
> --[exclude-]table-and-child-tables, but those are getting into
> carpal-tunnel-syndrome-inducing territory 🙁. I lack a better naming
> suggestion offhand.

Attached is version 3 of the patch; it includes the use of the options
suggested by Stephane and Tom: --table-and-children,
--exclude-table-and-children and --exclude-table-data-and-children. The
documentation has been updated too.

Thanks

-- 
Gilles Darold

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 334e4b7fd1..e97b73194e 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -775,6 +775,16 @@ PostgreSQL documentation
 
+
+   --exclude-table-and-children=pattern
+
+
+    Same as -T/--exclude-table option but automatically
+    exclude partitions or child tables of the specified tables if any.
+
+
+
 
   --exclude-table-data=pattern
 
@@ -793,6 +803,17 @@ PostgreSQL documentation
 
+
+   --exclude-table-data-and-children=pattern
+
+
+    Same as --exclude-table-data but automatically
+    exclude partitions or child tables of the specified tables if any.
+
+
+
+
 
   --extra-float-digits=ndigits
 
@@ -1158,6 +1179,16 @@ PostgreSQL documentation
 
+
+   --table-with-children=pattern
+
+
+    Same as -t/--table option but automatically
+    include partitions or child tables of the specified tables if any.
+ + + + --use-set-session-authorization diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index 4217908f84..09d37991d6 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -130,6 +130,10 @@ static SimpleOidList foreign_servers_include_oids = {NULL, NULL}; static SimpleStringList extension_include_patterns = {NULL, NULL}; static SimpleOidList extension_include_oids = {NULL, NULL}; +static SimpleStringList table_include_patterns_and_children = {NULL, NULL}; +static SimpleStringList table_exclude_patterns_and_children = {NULL, NULL}; +static SimpleStringList tabledata_exclude_patterns_and_children = {NULL, NULL}; + static const CatalogId nilCatalogId = {0, 0}; /* override for standard extra_float_digits setting */ @@ -180,7 +184,8 @@ static void expand_foreign_server_name_patterns(Archive *fout, static void expand_table_name_patterns(Archive *fout, SimpleStringList *patterns, SimpleOidList *oids, - bool strict_names); + bool strict_names, + bool with_child_tables); static void prohibit_crossdb_refs(PGconn *conn, const char *dbname, const char *pattern); @@ -421,6 +426,9 @@ main(int argc, char **argv) {"on-conflict-do-nothing", no_argument, &dopt.do_nothing, 1}, {"rows-per-insert", required_argument, NULL, 10}, {"include-foreign-data", required_argument, NULL, 11}, + {"table-and-children", required_argument, NULL, 12}, + {"exclude-table-and-children", required_argument, NULL, 13}, + {"exclude-table-data-and-children", required_argument, NULL, 14}, {NULL, 0, NULL, 0} }; @@ -631,6 +639,19 @@ main(int argc, char **argv) optarg); break; + case 12: /* include table(s) with children */ +simple_string_list_append(&table_include_patterns_and_children, optarg); +dopt.include_everything = false; +break; + + case 13: /* exclude table(s) with children */ +simple_string_list_append(&table_exclude_patterns_and_children, optarg); +break; + + case 14:/* exclude table(s) data */ +simple_string_list_append(&tabledata_exclude_patterns_and_children, 
optarg); +break; + default: /* getopt_long already emitted a complaint */ pg_log_error_hint("Try \"%s --help\" for more information.", progname); @@ -814,17 +835,34 @@ main(int argc, char **argv) { expand_table_name_patterns(fout, &table_include_patterns, &table_include_oids, - strict_names); + strict_names, false); if (table_include_oids.head == NULL) pg_fatal("no matching tables were found"); } expand_table_name_patterns(fout, &table_exclude_patterns, &table_exclude_oids, - false); + false, false); expand_table_name_patterns(fout, &tabledata_exclude_patterns, &tabledata_exclude_oids, - false); + false, false); + + /* Expand table and children selection patterns into OID lists */ + if (table_include_patterns_and_children.head != NULL) + { + expand_table_name_patterns(fout, &table_include_patterns_and_children, + &table_in
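With the v3 naming, usage would look like this (illustrative invocations only; the long-option spellings follow this version of the patch and are still under discussion upthread, and `mydb` is a placeholder database name):

```shell
# dump a partitioned or inheritance root plus all of its children
pg_dump --table-and-children='measurement' mydb > dump.sql

# exclude a whole tree, or keep its definitions but skip its data
pg_dump --exclude-table-and-children='log_*' mydb > dump.sql
pg_dump --exclude-table-data-and-children='log_*' mydb > dump.sql
```

Without these options, matching every partition requires the user to know and enumerate the child table names, which is what the pattern-plus-recursion behavior avoids.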
Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode
On Sat, Mar 11, 2023 at 09:55:33AM -0500, Melanie Plageman wrote:
> Subject: [PATCH v3 2/3] use shared buffers when failsafe active
>
> +		/*
> +		 * Assume the caller who allocated the memory for the
> +		 * BufferAccessStrategy object will free it.
> +		 */
> +		vacrel->bstrategy = NULL;

This comment could use elaboration:

** VACUUM normally restricts itself to a small ring buffer; but in failsafe
mode, in order to process tables as quickly as possible, allow leaving
behind a large number of dirty buffers.

> Subject: [PATCH v3 3/3] add vacuum option to specify ring_size and guc

>  #define INT_ACCESS_ONCE(var)	((int)(*((volatile int *)&(var
> +#define bufsize_limit_to_nbuffers(bufsize) (bufsize * 1024 / BLCKSZ)

Macros are normally capitalized.

It's a good idea to write "(bufsize)", in case someone passes "a+b".

> @@ -586,6 +587,45 @@ GetAccessStrategy(BufferAccessStrategyType btype)
> +BufferAccessStrategy
> +GetAccessStrategyWithSize(BufferAccessStrategyType btype, int ring_size)

Maybe it would make sense for GetAccessStrategy() to call
GetAccessStrategyWithSize(). Or maybe not.

> +	{"vacuum_buffer_usage_limit", PGC_USERSET, RESOURCES_MEM,
> +		gettext_noop("Sets the buffer pool size for operations employing a
> buffer access strategy."),

The description should mention vacuum, if that's the scope of the GUC's
behavior.

> +#vacuum_buffer_usage_limit = -1	# size of vacuum buffer access strategy ring.
> +	# -1 to use default,
> +	# 0 to disable vacuum buffer access strategy and use shared buffers
> +	# > 0 to specify size

If I'm not wrong, there's still no documentation about "ring buffers" or
postgres' "strategy". Which seems important to do for this patch, along
with other documentation.

This patch should add support in vacuumdb.c. And maybe a comment about
adding support there, since it's annoying when the release notes one year
say "support VACUUM (FOO)" and then one year later say "support vacuumdb
--foo".

-- 
Justin
Re: Improve WALRead() to suck data directly from WAL buffers when possible
> [1]
> subscription tests:
> PATCHED: WAL buffers hit - 1972, misses - 32616

Can you share more details about the test here?

I went through the v8 patch. Following are my thoughts on improving the WAL
buffer hit ratio.

Currently, the no-longer-needed WAL data present in the WAL buffers gets
cleared in XLogBackgroundFlush(), which is called based on the
wal_writer_delay config setting. Once the data is flushed to disk, it is
treated as no longer needed and will be cleared as soon as possible, based
on some config settings. I have done some testing by tweaking the
wal_writer_delay config setting to confirm the behaviour. We can see that
the WAL buffer hit ratio is good when wal_writer_delay is big enough [2]
compared to a smaller wal_writer_delay [1].

So irrespective of the wal_writer_delay setting, we should keep the WAL
data in the WAL buffers as long as possible so that all the readers (mainly
WAL senders) can take advantage of this. A WAL page should be evicted from
the WAL buffers only when the buffer pool is full and we need room for a
new page. The attached patch takes care of this. We can see the
improvements in WAL buffer hit ratio even when wal_writer_delay is set to a
lower value [3].

Second, in WALRead(), we try to read the data from disk whenever we don't
find it in the WAL buffers. We don't store this data in the WAL buffers; we
just read the data, use it and leave it. If we stored this data in the WAL
buffers, then we might avoid a few disk reads.

[1]:
wal_buffers=1GB
wal_writer_delay=1ms
./pgbench --initialize --scale=300 postgres
WAL buffers hit=5046
WAL buffers miss=56767

[2]:
wal_buffers=1GB
wal_writer_delay=10s
./pgbench --initialize --scale=300 postgres
WAL buffers hit=45454
WAL buffers miss=14064

[3]:
wal_buffers=1GB
wal_writer_delay=1ms
./pgbench --initialize --scale=300 postgres
WAL buffers hit=37214
WAL buffers miss=844

Please share your thoughts.
Thanks & Regards,
Nitin Jadhav

On Tue, Mar 7, 2023 at 12:39 PM Bharath Rupireddy wrote:
>
> On Tue, Mar 7, 2023 at 3:30 AM Nathan Bossart wrote:
> >
> > +void
> > +XLogReadFromBuffers(XLogRecPtr startptr,
> >
> > Since this function presently doesn't return anything, can we have it
> > return the number of bytes read instead of storing it in a pointer
> > variable?
>
> Done.
>
> > +    ptr = startptr;
> > +    nbytes = count;
> > +    dst = buf;
> >
> > These variables seem superfluous.
>
> Needed startptr and count for DEBUG1 message and assertion at the end.
> Removed dst and used buf in the new patch now.
>
> > +    /*
> > +     * Requested WAL isn't available in WAL buffers, so return with
> > +     * what we have read so far.
> > +     */
> > +    break;
> >
> > nitpick: I'd move this to the top so that you can save a level of
> > indentation.
>
> Done.
>
> > +    /*
> > +     * All the bytes are not in one page. Read available bytes on
> > +     * the current page, copy them over to output buffer and
> > +     * continue to read remaining bytes.
> > +     */
> >
> > Is it possible to memcpy more than a page at a time?
>
> It would complicate things a lot there; the logic to figure out the
> last page bytes that may or may not fit in the whole page gets
> complicated. Also, the logic to verify each page's header gets
> complicated. We might lose out if we memcpy all the pages at once and
> start verifying each page's header in another loop.
>
> I would like to keep it simple - read a single page from WAL buffers,
> verify it and continue.
>
> > +    /*
> > +     * The fact that we acquire WALBufMappingLock while reading the WAL
> > +     * buffer page itself guarantees that no one else initializes it or
> > +     * makes it ready for next use in AdvanceXLInsertBuffer().
> > +     *
> > +     * However, we perform basic page header checks for ensuring that
> > +     * we are not reading a page that just got initialized. Callers
> > +     * will anyway perform extensive page-level and record-level
> > +     * checks.
> > + */ > > > > Hm. I wonder if we should make these assertions instead. > > Okay. I added XLogReaderValidatePageHeader for assert-only builds > which will help catch any issues there. But we can't perform record > level checks here because this function doesn't know where the record > starts from, it knows only pages. This change required us to pass in > XLogReaderState to XLogReadFromBuffers. I marked it as > PG_USED_FOR_ASSERTS_ONLY and did page header checks only when it is > passed as non-null so that someone who doesn't have XLogReaderState > can still read from buffers. > > > +elog(DEBUG1, "read %zu bytes out of %zu bytes from WAL buffers for > > given LSN %X/%X, Timeline ID %u", > > + *read_bytes, count, LSN_FORMAT_ARGS(startptr), tli); > > > > I definitely don't think we should
Dropped and generated columns might cause wrong data on subs when REPLICA IDENTITY FULL
Hi all,

(cc'ed Amit as he has the context)

While working on [1], I realized that on HEAD there is a problem with the $subject. Here is the relevant discussion on the thread [2]. Quoting my own notes on that thread below:

> I realized that the dropped columns also get into the tuples_equal()
> function. And, the remote sends NULL for the dropped columns (i.e.,
> remoteslot), but index_getnext_slot() (or table_scan_getnextslot)
> indeed fills the dropped columns on the outslot. So, the dropped
> columns are not NULL in the outslot.

Amit also suggested checking generated columns, which indeed have the same problem.

Here are the steps to repro the problem with dropped columns:

-- pub
CREATE TABLE test (drop_1 jsonb, x int, drop_2 numeric, y text, drop_3 timestamptz);
ALTER TABLE test REPLICA IDENTITY FULL;
INSERT INTO test SELECT NULL, i, i, (i)::text, now() FROM generate_series(0,1)i;
CREATE PUBLICATION pub FOR ALL TABLES;

-- sub
CREATE TABLE test (drop_1 jsonb, x int, drop_2 numeric, y text, drop_3 timestamptz);
CREATE SUBSCRIPTION sub CONNECTION 'host=localhost port=5432 dbname=postgres' PUBLICATION pub;

-- show that before dropping the columns, the data in the source and
-- target are deleted properly
DELETE FROM test WHERE x = 0;

-- both on the source and target
SELECT count(*) FROM test WHERE x = 0;
┌───────┐
│ count │
├───────┤
│     0 │
└───────┘
(1 row)

-- drop columns on the source
ALTER TABLE test DROP COLUMN drop_1;
ALTER TABLE test DROP COLUMN drop_2;
ALTER TABLE test DROP COLUMN drop_3;

-- drop columns on the target
ALTER TABLE test DROP COLUMN drop_1;
ALTER TABLE test DROP COLUMN drop_2;
ALTER TABLE test DROP COLUMN drop_3;

-- on the target
ALTER SUBSCRIPTION sub REFRESH PUBLICATION;

-- after dropping the columns
DELETE FROM test WHERE x = 1;

-- source
SELECT count(*) FROM test WHERE x = 1;
┌───────┐
│ count │
├───────┤
│     0 │
└───────┘
(1 row)

-- target, OOPS wrong result
SELECT count(*) FROM test WHERE x = 1;
┌───────┐
│ count │
├───────┤
│     1 │
└───────┘
(1 row)

Attaching a
patch that could possibly solve the problem. Thanks, Onder KALACI [1] https://www.postgresql.org/message-id/flat/CACawEhUN%3D%2BvjY0%2B4q416-rAYx6pw-nZMHQYsJZCftf9MjoPN3w%40mail.gmail.com#2f7fa76f9e4496e3b52a9be6736e5b43 [2] https://www.postgresql.org/message-id/CACawEhUu6S8E4Oo7%2Bs5iaq%3DyLRZJb6uOZeEQSGJj-7NVkDzSaw%40mail.gmail.com v1-0001-Skip-dropped-and-generated-columns-when-REPLICA-I.patch Description: Binary data
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Hi Amit, all

>> This triggers tuples_equal to fail. To fix that, I improved the
>> tuples_equal such that it skips the dropped columns.
>
> By any chance, have you tried with generated columns?

Yes, it shows the same behavior.

> See logicalrep_write_tuple()/logicalrep_write_attrs() where we neither
> send anything for dropped columns nor for generated columns.
> Similarly, on the receiving side, in logicalrep_rel_open() and
> slot_store_data(), we seem to be using NULL for such columns.

Thanks for the explanation, it helps a lot.

> Yes, it would be better to report and discuss this in a separate thread,

Done via [1]

>> Attached as v40_0001 on the patch.
>>
>> Note that I need to have that commit as 0001 so that 0002 patch
>> passes the tests.
>
> I think we can add such a test (which relies on existing buggy
> behavior) later after fixing the existing bug. For now, it would be
> better to remove that test and add it after we fix the dropped
> columns issue in HEAD.

Alright, when I push the next version (hopefully tomorrow), I'll follow this suggestion.

Thanks,
Onder KALACI

[1] https://www.postgresql.org/message-id/CACawEhVQC9WoofunvXg12aXtbqKnEgWxoRx3%2Bv8q32AWYsdpGg%40mail.gmail.com
Re: buildfarm + meson
Hi, On 2023-03-09 18:31:10 -0500, Andrew Dunstan wrote: > Another thing: the test for uuid.h is too strict. On Fedora 36 the OSSP > header is in /usr/include, not /usr/include/ossp (I got around that for now > by symlinking it, but obviously that's a nasty hack we can't ask people to > do) Yea, that was just wrong. It happened to work on debian and a few other OSs, but ossp's .pc puts whatever the right directory is into the include path. Pushed the fairly obvious fix. Greetings, Andres Freund
Re: Doc: Rework contrib appendix -- informative titles, tweaked sentences
Hi Alvaro, On Thu, 9 Mar 2023 10:22:49 +0100 Alvaro Herrera wrote: > On 2023-Jan-22, Karl O. Pinc wrote: > > > Actually, this CSS, added to doc/src/sgml/stylesheet.css, > > makes the column spacing look pretty good: > Okay, this looks good to me too. However, for it to actually work, we > need to patch the corresponding CSS file in the pgweb repository too. > I'll follow up in the other mailing list. Do you also like the page breaking in the PDF for each contributed package, per the v10-0002-Page-break-before-sect1-in-contrib-appendix-when-pdf.patch of https://www.postgresql.org/message-id/20230122144246.0ff87372%40slate.karlpinc.com ? No need to reply if I don't need to do anything. (I didn't want the patch to get lost.) Thanks for the review. Regards, Karl Free Software: "You don't pay back, you pay forward." -- Robert A. Heinlein
Re: [PATCH] psql: \dn+ to show size of each schema (and \dA+ for AMs)
On Thu, Dec 15, 2022 at 10:13:23AM -0600, Justin Pryzby wrote: > Rebased on c727f511b. Rebased on 30a53b792. With minor changes including fixes to an intermediate patch. > This patch record was "closed for lack of interest", but I think what's > actually needed is committer review of which approach to take. > > - add backend functions but do not modify psql ? > - add to psql slash-plus commnds ? > - introduce psql double-plus commands for new options ? > - change pre-existing psql plus commands to only show size with >double-plus ? > - go back to the original, two-line client-side sum() ? > > Until then, the patchset is organized with those questions in mind. >From cf1596faa449ff1bd1d8b56306118d5eac764291 Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Tue, 13 Jul 2021 21:25:48 -0500 Subject: [PATCH 1/4] Add pg_am_size(), pg_namespace_size() .. See also: 358a897fa, 528ac10c7 --- src/backend/utils/adt/dbsize.c | 132 src/include/catalog/pg_proc.dat | 19 + 2 files changed, 151 insertions(+) diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c index e5c0f1c45b6..af0955d1790 100644 --- a/src/backend/utils/adt/dbsize.c +++ b/src/backend/utils/adt/dbsize.c @@ -13,19 +13,25 @@ #include +#include "access/genam.h" #include "access/htup_details.h" #include "access/relation.h" +#include "access/table.h" #include "catalog/catalog.h" #include "catalog/namespace.h" #include "catalog/pg_authid.h" #include "catalog/pg_database.h" +#include "catalog/pg_namespace.h" #include "catalog/pg_tablespace.h" #include "commands/dbcommands.h" +#include "commands/defrem.h" #include "commands/tablespace.h" #include "miscadmin.h" #include "storage/fd.h" #include "utils/acl.h" #include "utils/builtins.h" +#include "utils/fmgroids.h" +#include "utils/lsyscache.h" #include "utils/numeric.h" #include "utils/rel.h" #include "utils/relfilenumbermap.h" @@ -858,6 +864,132 @@ pg_size_bytes(PG_FUNCTION_ARGS) PG_RETURN_INT64(result); } +/* + * Return the sum of size of relations 
for which the given attribute of + * pg_class matches the specified OID value. + */ +static int64 +calculate_size_attvalue(AttrNumber attnum, Oid attval) +{ + int64 totalsize = 0; + ScanKeyData skey; + Relation pg_class; + SysScanDesc scan; + HeapTuple tuple; + + ScanKeyInit(&skey, attnum, +BTEqualStrategyNumber, F_OIDEQ, attval); + + pg_class = table_open(RelationRelationId, AccessShareLock); + scan = systable_beginscan(pg_class, InvalidOid, false, NULL, 1, &skey); + while (HeapTupleIsValid(tuple = systable_getnext(scan))) + { + Form_pg_class classtuple = (Form_pg_class) GETSTRUCT(tuple); + Relation rel; + + rel = try_relation_open(classtuple->oid, AccessShareLock); + if (!rel) + continue; + + for (ForkNumber forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++) + totalsize += calculate_relation_size(&(rel->rd_locator), rel->rd_backend, forkNum); + + relation_close(rel, AccessShareLock); + } + + systable_endscan(scan); + table_close(pg_class, AccessShareLock); + return totalsize; +} + +/* Compute the size of relations in a schema (namespace) */ +static int64 +calculate_namespace_size(Oid nspOid) +{ + /* + * User must be a member of pg_read_all_stats or have CREATE privilege for + * target namespace. 
+ */ + if (!is_member_of_role(GetUserId(), ROLE_PG_READ_ALL_STATS)) + { + AclResult aclresult; + + aclresult = object_aclcheck(NamespaceRelationId, nspOid, GetUserId(), ACL_CREATE); + if (aclresult != ACLCHECK_OK) + aclcheck_error(aclresult, OBJECT_SCHEMA, + get_namespace_name(nspOid)); + } + + return calculate_size_attvalue(Anum_pg_class_relnamespace, nspOid); +} + +Datum +pg_namespace_size_oid(PG_FUNCTION_ARGS) +{ + Oid nspOid = PG_GETARG_OID(0); + int64 size; + + size = calculate_namespace_size(nspOid); + + if (size < 0) + PG_RETURN_NULL(); + + PG_RETURN_INT64(size); +} + +Datum +pg_namespace_size_name(PG_FUNCTION_ARGS) +{ + Name nspName = PG_GETARG_NAME(0); + Oid nspOid = get_namespace_oid(NameStr(*nspName), false); + int64 size; + + size = calculate_namespace_size(nspOid); + + if (size < 0) + PG_RETURN_NULL(); + + PG_RETURN_INT64(size); +} + +/* Compute the size of relations using the given access method */ +static int64 +calculate_am_size(Oid amOid) +{ + /* XXX acl_check? */ + + return calculate_size_attvalue(Anum_pg_class_relam, amOid); +} + +Datum +pg_am_size_oid(PG_FUNCTION_ARGS) +{ + Oid amOid = PG_GETARG_OID(0); + int64 size; + + size = calculate_am_size(amOid); + + if (size < 0) + PG_RETURN_NULL(); + + PG_RETURN_INT64(size); +} + +Datum +pg_am_size_name(PG_FUNCTION_ARGS) +{ + Name amName = PG_GETARG_NAME(0); + Oid amOid = get_am_oid(NameStr(*amName), false); + int64 size; + + size = calculate_am_size(amOid); + + if (size < 0) + PG_RETURN_NULL(); + + PG_RETURN_INT64(size); +} + /* * Get the filenode of a relation * diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index 1351d4be67c
Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE
Hi, On 2023-03-09 12:15:16 -0800, Mark Dilger wrote: > > Somewhat random note: > > > > Is it intentional that we VACUUM FREEZE test ROWCOUNT times? That's > > effectively O(ROWCOUNT^2), albeit with small enough constants to not really > > matter. I don't think we need to insert the rows one-by-one either. Changing > > that to a single INSERT and FREEZE shaves 10-12% off the tests. I didn't > > change that, but we also fire off a psql for each tuple for > > heap_page_items(), > > with offset $N no less. That seems to be another 500ms. > > I don't recall the reasoning. Feel free to optimize the tests. Something like the attached. I don't know enough perl to know how to interpolate something like use constant ROWCOUNT => 17; so I just made it a variable. Greetings, Andres Freund >From a01e1481505e74097112c0ea358e7e0eef6a5684 Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Sat, 11 Mar 2023 15:15:35 -0800 Subject: [PATCH v1] pg_amcheck: Minor test speedups Discussion: https://postgr.es/m/20230309001558.b7shzvio645eb...@awork3.anarazel.de --- src/bin/pg_amcheck/t/004_verify_heapam.pl | 33 +++ 1 file changed, 15 insertions(+), 18 deletions(-) diff --git a/src/bin/pg_amcheck/t/004_verify_heapam.pl b/src/bin/pg_amcheck/t/004_verify_heapam.pl index 9984d0d9f87..e5ae7e6aada 100644 --- a/src/bin/pg_amcheck/t/004_verify_heapam.pl +++ b/src/bin/pg_amcheck/t/004_verify_heapam.pl @@ -217,17 +217,17 @@ my $rel = $node->safe_psql('postgres', my $relpath = "$pgdata/$rel"; # Insert data and freeze public.test -use constant ROWCOUNT => 17; +my $ROWCOUNT = 17; $node->safe_psql( 'postgres', qq( INSERT INTO public.test (a, b, c) - VALUES ( + SELECT x'DEADF9F9DEADF9F9'::bigint, 'abcdefg', repeat('w', 1) - ); - VACUUM FREEZE public.test - )) for (1 .. 
ROWCOUNT); +FROM generate_series(1, $ROWCOUNT); + VACUUM FREEZE public.test;) +); my $relfrozenxid = $node->safe_psql('postgres', q(select relfrozenxid from pg_class where relname = 'test')); @@ -246,16 +246,13 @@ if ($datfrozenxid <= 3 || $datfrozenxid >= $relfrozenxid) } # Find where each of the tuples is located on the page. -my @lp_off; -for my $tup (0 .. ROWCOUNT - 1) -{ - push( - @lp_off, - $node->safe_psql( - 'postgres', qq( -select lp_off from heap_page_items(get_raw_page('test', 'main', 0)) - offset $tup limit 1))); -} +my @lp_off = split '\n', $node->safe_psql( + 'postgres', qq( + select lp_off from heap_page_items(get_raw_page('test', 'main', 0)) + where lp <= $ROWCOUNT +) +); +is(scalar @lp_off, $ROWCOUNT, "acquired row offsets"); # Sanity check that our 'test' table on disk layout matches expectations. If # this is not so, we will have to skip the test until somebody updates the test @@ -267,7 +264,7 @@ open($file, '+<', $relpath) binmode $file; my $ENDIANNESS; -for (my $tupidx = 0; $tupidx < ROWCOUNT; $tupidx++) +for (my $tupidx = 0; $tupidx < $ROWCOUNT; $tupidx++) { my $offnum = $tupidx + 1;# offnum is 1-based, not zero-based my $offset = $lp_off[$tupidx]; @@ -345,7 +342,7 @@ open($file, '+<', $relpath) or BAIL_OUT("open failed: $!"); binmode $file; -for (my $tupidx = 0; $tupidx < ROWCOUNT; $tupidx++) +for (my $tupidx = 0; $tupidx < $ROWCOUNT; $tupidx++) { my $offnum = $tupidx + 1;# offnum is 1-based, not zero-based my $offset = $lp_off[$tupidx]; @@ -522,7 +519,7 @@ for (my $tupidx = 0; $tupidx < ROWCOUNT; $tupidx++) $tup->{t_infomask} &= ~HEAP_XMIN_INVALID; push @expected, - qr/${$header}xmin ${xmin} equals or exceeds next valid transaction ID 0:\d+/; + qr/${$header}xmin ${xmin} equals or exceeds next valid transaction ID 0:\d+/; } write_tuple($file, $offset, $tup); } -- 2.38.0
Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE
> On Mar 11, 2023, at 3:22 PM, Andres Freund wrote: > > Something like the attached. I like that your patch doesn't make the test longer. I assume you've already run the tests and that it works. > I don't know enough perl to know how to interpolate something like > use constant ROWCOUNT => 17; > so I just made it a variable. Seems fair. I certainly don't mind. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
slapd logs to syslog during tests
Hi,

On my buildfarm host (for all my animals) I noted that slapd was by far the biggest contributor to syslog, even though there's not normally a slapd running. It's of course the slapds started by various tests.

Would anybody mind if I add 'logfile_only' to slapd's config in LdapServer.pm? That still leaves a few log lines from before the config file parsing, but it's a lot better than all requests getting logged.

Obviously I also could reconfigure syslog to just filter this stuff, but it seems that the tests shouldn't spam like that.

Greetings,

Andres Freund
Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE
Hi, On 2023-03-11 15:34:55 -0800, Mark Dilger wrote: > > On Mar 11, 2023, at 3:22 PM, Andres Freund wrote: > > > > Something like the attached. > > I like that your patch doesn't make the test longer. I assume you've already > run the tests and that it works. I did check that, yes :). My process of writing perl is certainly, uh, iterative. No way I would get anything close to working without testing it. CI now finished the tests as well: https://cirrus-ci.com/build/6675457702100992 So I'll go ahead and push that. Greetings, Andres Freund
Re: [BUG] pg_stat_statements and extended query protocol
> If I remove this patch and recompile again, then "initdb -D $PGDATA" works. It appears you must "make clean; make install" to correctly compile after applying the patch. Regards, Sami Imseih Amazon Web Services (AWS)
Re: Transparent column encryption
Hi, On 2023-03-10 08:18:34 +0100, Peter Eisentraut wrote: > Here is an updated patch. I have done some cosmetic polishing and fixed a > minor Windows-related bug. > > In my mind, the patch is complete. > > If someone wants to do some in-depth code review, I suggest focusing on the > following files: > > * src/backend/access/common/printtup.c Have you done benchmarks of some simple workloads to verify this doesn't cause slowdowns (when not using encryption, obviously)? printtup.c is a performance sensitive portion for simple queries, particularly when they return multiple columns. And making tupledescs even wider is likely to have some price, both due to the increase in memory usage, and due to the lower cache density - and that's code where we're already hurting noticeably. Greetings, Andres Freund
Re: unsafe_tests module
Hi, On 2023-02-22 06:47:34 -0500, Andrew Dunstan wrote: > Given its nature and purpose as a module we don't want to run against an > installed instance, shouldn't src/test/modules/unsafe_tests have > NO_INSTALLCHECK=1 in its Makefile and runningcheck:false in its meson.build? Seems like a good idea to me. Greetings, Andres Freund
Re: slapd logs to syslog during tests
> On Mar 11, 2023, at 6:37 PM, Andres Freund wrote: > > Hi, > > On my buildfarm host (for all my animals) I noted that slapd was by far the > biggest contributor to syslog. Even though there's not normally slapd > running. It's of course the slapds started by various tests. > > Would anybody mind if I add 'logfile_only' to slapd's config in LdapServer.pm? > That still leaves a few logline, from before the config file parsing, but it's > a lot better than all requests getting logged. Makes sense Cheers Andrew
Re: [BUG] pg_stat_statements and extended query protocol
On Sat, Mar 11, 2023 at 11:55:22PM +, Imseih (AWS), Sami wrote: > It appears you must "make clean; make install" to correctly compile after > applying the patch. In a git repository, I've learnt to rely on this simple formula, even if it means extra cycles when running ./configure: git clean -d -x -f -- Michael signature.asc Description: PGP signature
Re: Sub-millisecond [autovacuum_]vacuum_cost_delay broken
On Fri, Mar 10, 2023 at 1:05 PM Thomas Munro wrote:
> On Fri, Mar 10, 2023 at 11:37 AM Nathan Bossart wrote:
> > On Thu, Mar 09, 2023 at 05:27:08PM -0500, Tom Lane wrote:
> > > Is it reasonable to assume that all modern platforms can time
> > > millisecond delays accurately? Ten years ago I'd have suggested
> > > truncating the delay to a multiple of 10msec and using this logic
> > > to track the remainder, but maybe now that's unnecessary.
> >
> > If so, it might also be worth updating or removing this comment in
> > pgsleep.c:
> >
> >  * NOTE: although the delay is specified in microseconds, the effective
> >  * resolution is only 1/HZ, or 10 milliseconds, on most Unixen. Expect
> >  * the requested delay to be rounded up to the next resolution boundary.
> >
> > I've had doubts for some time about whether this is still accurate...

Unfortunately I was triggered by this Unix archeology discussion, and wasted some time this weekend testing every Unix we target. I found 3 groups:

1. OpenBSD, NetBSD: Like the comment says, kernel ticks still control sleep resolution. I measure an average time of ~20ms when I ask for 1ms sleeps in a loop with select() or nanosleep(). I don't actually understand why it's not ~10ms because HZ is 100 on these systems, but I didn't look harder.

2. AIX, Solaris, illumos: select() can sleep for 1ms accurately, but not fractions of 1ms. If you use nanosleep() instead of select(), then AIX joins the third group (huh, maybe it's just that its select(us) calls poll(ms) under the covers?), but Solaris does not (maybe it's still tick-based, but HZ == 1000?).

3. Linux, FreeBSD, macOS: sub-ms sleeps are quite accurate (via various system calls).

I didn't test Windows but it sounds a lot like it is in group 1 if you use WaitForMultipleObjects() or SleepEx(), as we do.
You can probably tune some of the above; for example FreeBSD can go back to the old way with kern.eventtimer.periodic=1 to get a thousand interrupts per second (kern.hz) instead of programming a hardware timer to get an interrupt at just the right time, and then 0.5ms sleep requests get rounded to an average of 1ms, just like on Solaris. And power usage probably goes up.

As for what to do about it, I dunno, how about this?

  * NOTE: although the delay is specified in microseconds, the effective
- * resolution is only 1/HZ, or 10 milliseconds, on most Unixen.  Expect
- * the requested delay to be rounded up to the next resolution boundary.
+ * resolution is only 1/HZ on systems that use periodic kernel ticks to limit
+ * sleeping.  This may cause sleeps to be rounded up by as much as 1-20
+ * milliseconds on old Unixen and Windows.

As for the following paragraph about the dangers of select() and interrupts and restarts, I suspect it is describing the HP-UX behaviour (a dropped platform), which I guess must have led to POSIX's reluctance to standardise that properly, but in any case all hypothetical concerns would disappear if we just used POSIX [clock_]nanosleep(), no? It has defined behaviour on signals, and it also tells you the remaining time (if we cared, which we wouldn't).
Re: Testing autovacuum wraparound (including failsafe)
On Wed, Mar 8, 2023 at 10:47 PM Michael Paquier wrote: > I may be missing something, but you cannot use directly a "postgres" > command in a TAP test, can you? See 1a9d802, that has fixed a problem > when TAP tests run with a privileged account on Windows. I was joking. But I did have a real point: once we have tests for the xidStopLimit mechanism, why not take the opportunity to correct the long standing issue with the documentation advising the use of single user mode? -- Peter Geoghegan
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
On Sun, Mar 12, 2023 at 1:34 AM Önder Kalacı wrote: > >> >> I think we can add such a test (which relies on existing buggy >> behavior) later after fixing the existing bug. For now, it would be >> better to remove that test and add it after we fix dropped columns >> issue in HEAD. > > > Alright, when I push the next version (hopefully tomorrow), I'll follow this > suggestion. > Okay, thanks. See, if you can also include your changes in the patch wip_for_optimize_index_column_match (after discussed modification). Few other minor comments: 1. + are enforced for primary keys. Internally, we follow a similar approach for + supporting index scans within logical replication scope. If there are no I think we can remove the above line: "Internally, we follow a similar approach for supporting index scans within logical replication scope." This didn't seem useful for users. 2. diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c index bc6409f695..646e608eb7 100644 --- a/src/backend/executor/execReplication.c +++ b/src/backend/executor/execReplication.c @@ -83,11 +83,8 @@ build_replindex_scan_key(ScanKey skey, Relation rel, Relation idxrel, if (!AttributeNumberIsValid(table_attno)) { /* -* XXX: For a non-primary/unique index with an additional -* expression, we do not have to continue at this point. However, -* the below code assumes the index scan is only done for simple -* column references. If we can relax the assumption in the below -* code-block, we can also remove the continue. +* XXX: Currently, we don't support expressions in the scan key, +* see code below. */ I have tried to simplify the above comment. See, if that makes sense to you. 3. /* + * We only need to allocate once. This is allocated within per + * tuple context -- ApplyMessageContext -- hence no need to + * explicitly pfree(). + */ We normally don't write why we don't need to explicitly pfree. 
It is good during the review, but I'm not sure it is a good idea to keep it in the final code.

4. I have modified the proposed commit message as follows; see if that makes sense to you, and let me know if I missed anything, especially the review/author credit.

Allow the use of indexes other than PK and REPLICA IDENTITY on the subscriber.

Using REPLICA IDENTITY FULL on the publisher can lead to a full table scan per tuple change on the subscription when REPLICA IDENTITY or PK index is not available. This makes REPLICA IDENTITY FULL impractical to use apart from some small number of use cases.

This patch allows using indexes other than PRIMARY KEY or REPLICA IDENTITY on the subscriber during apply of update/delete. The index that can be used must be a btree index, not a partial index, and it must have at least one column reference (i.e. cannot consist of only expressions). We can lift these restrictions in the future.

There is no smart mechanism to pick the index. If there is more than one index that satisfies these requirements, we just pick the first one. We discussed using some of the optimizer's low-level APIs for this but ruled it out as that can be a maintenance burden in the long run.

This patch improves the performance in the vast majority of cases and the improvement is proportional to the amount of data in the table. However, there could be some regression in a small number of cases where the indexes have a lot of duplicate and dead rows. It was discussed that those are mostly impractical cases but we can provide a table or subscription level option to disable this feature if required.

Author: Onder Kalaci
Reviewed-by: Peter Smith, Shi yu, Hou Zhijie, Vignesh C, Kuroda Hayato, Amit Kapila
Discussion: https://postgr.es/m/CACawEhVLqmAAyPXdHEPv1ssU2c=dqoniigz7g73hfys7+ng...@mail.gmail.com

--
With Regards,
Amit Kapila.
Re: windows CI failing PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED
And again: TRAP: failed Assert("PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED"), File: "../src/backend/storage/ipc/pmsigna... https://cirrus-ci.com/task/6558324615806976 https://api.cirrus-ci.com/v1/artifact/task/6558324615806976/testrun/build/testrun/pg_upgrade/002_pg_upgrade/log/002_pg_upgrade_old_node.log https://api.cirrus-ci.com/v1/artifact/task/6558324615806976/crashlog/crashlog-postgres.exe_0974_2023-03-11_13-57-27-982.txt
Re: [PoC] Implementation of distinct in Window Aggregates
On 04/01/23 18:10, Ankit Kumar Pandey wrote: On 29/12/22 20:58, Ankit Kumar Pandey wrote: > > On 24/12/22 18:22, Ankit Pandey wrote: >> Hi, >> >> This is a PoC patch which implements distinct operation in window >> aggregates (without order by and for single column aggregation, final >> version may vary wrt these limitations). Purpose of this PoC is to >> get feedback on the approach used and corresponding implementation, >> any nitpicking as deemed reasonable. >> >> Distinct operation is mirrored from implementation in nodeAgg. >> Existing partitioning logic determines if row is in partition and >> when distinct is required, all tuples for the aggregate column are >> stored in tuplesort. When finalize_windowaggregate gets called, >> tuples are sorted and duplicates are removed, followed by calling the >> transition function on each tuple. >> When distinct is not required, the above process is skipped and the >> transition function gets called directly and nothing gets inserted >> into tuplesort. >> Note: For each partition, in tuplesort_begin and tuplesort_end is >> involved to rinse tuplesort, so at any time, max tuples in tuplesort >> is equal to tuples in a particular partition. >> >> I have verified it for interger and interval column aggregates (to >> rule out obvious issues related to data types). >> >> Sample cases: >> >> create table mytable(id int, name text); >> insert into mytable values(1, 'A'); >> insert into mytable values(1, 'A'); >> insert into mytable values(5, 'B'); >> insert into mytable values(3, 'A'); >> insert into mytable values(1, 'A'); >> >> select avg(distinct id) over (partition by name) from mytable; >> avg >> >> 2. >> 2. >> 2. >> 2. >> 5. >> >> select avg(id) over (partition by name) from mytable; >> avg >> >> 1.5000 >> 1.5000 >> 1.5000 >> 1.5000 >> 5. >> >> select avg(distinct id) over () from mytable; >> avg >> >> 3. >> 3. >> 3. >> 3. >> 3. >> >> select avg(distinct id) from mytable; >> avg >> >> 3. 
>> >> This is my first-time contribution. Please let me know if anything >> can be >> improved as I`m eager to learn. >> >> Regards, >> Ankit Kumar Pandey > > Hi all, > > I know everyone is busy with holidays (well, Happy Holidays!) but I > will be glad if someone can take a quick look at this PoC and share > thoughts. > > This is my first time contribution so I am pretty sure there will be > some very obvious feedbacks (which will help me to move forward with > this change). > > Updated patch with latest master. Last patch was an year old. Attaching patch with rebase from latest HEAD Thanks, Ankit diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c index 9240c691c1..7c07fb0684 100644 --- a/src/backend/executor/nodeWindowAgg.c +++ b/src/backend/executor/nodeWindowAgg.c @@ -155,13 +155,6 @@ typedef struct WindowStatePerAggData int64 transValueCount; /* number of currently-aggregated rows */ - Datum lastdatum; /* used for single-column DISTINCT */ - FmgrInfo equalfnOne; /* single-column comparisons*/ - Oid *eq_ops; - Oid *sort_ops; - - bool sort_in; - /* Data local to eval_windowaggregates() */ bool restart; /* need to restart this agg in this cycle? 
*/ } WindowStatePerAggData; @@ -171,7 +164,7 @@ static void initialize_windowaggregate(WindowAggState *winstate, WindowStatePerAgg peraggstate); static void advance_windowaggregate(WindowAggState *winstate, WindowStatePerFunc perfuncstate, - WindowStatePerAgg peraggstate, Datum value, bool isNull); + WindowStatePerAgg peraggstate); static bool advance_windowaggregate_base(WindowAggState *winstate, WindowStatePerFunc perfuncstate, WindowStatePerAgg peraggstate); @@ -181,9 +174,6 @@ static void finalize_windowaggregate(WindowAggState *winstate, Datum *result, bool *isnull); static void eval_windowaggregates(WindowAggState *winstate); -static void process_ordered_aggregate_single(WindowAggState *winstate, - WindowStatePerFunc perfuncstate, - WindowStatePerAgg peraggstate); static void eval_windowfunction(WindowAggState *winstate, WindowStatePerFunc perfuncstate, Datum *result, bool *isnull); @@ -241,7 +231,6 @@ initialize_windowaggregate(WindowAggState *winstate, peraggstate->transValueIsNull = peraggstate->initValueIsNull; peraggstate->transValueCount = 0; peraggstate->resultValue = (Datum) 0; - peraggstate->lastdatum = (Datum) 0; peraggstate->resultValueIsNull = true; } @@ -252,21 +241,43 @@ initialize_windowaggregate(WindowAggState *winstate, static void