Re: Diagnostic comment in LogicalIncreaseXminForSlot
On Mon, Jul 12, 2021 at 5:28 PM Ashutosh Bapat wrote:
>
> On Mon, Jul 12, 2021 at 8:39 AM Amit Kapila wrote:
>>
>> On Mon, Jul 5, 2021 at 12:54 PM Masahiko Sawada wrote:
>> >
>> > On Fri, May 21, 2021 at 6:00 PM Ashutosh Bapat wrote:
>> > >
>> > > It's there in CF. I am fine with PG-15. It will be good to patch the
>> > > back-branches to have this extra diagnostic information available.
>> >
>> > The patch looks good to me.
>> >
>>
>> {
>>     slot->candidate_catalog_xmin = xmin;
>>     slot->candidate_xmin_lsn = current_lsn;
>> +   elog(DEBUG1, "got new catalog_xmin %u at %X/%X", xmin,
>> +        LSN_FORMAT_ARGS(current_lsn));
>> }
>> SpinLockRelease(&slot->mutex);
>>
>> Today, again looking at this patch, I don't think doing elog inside
>> spinlock is a good idea. We can either release spinlock before it or
>> use some variable to remember that we need to write such an elog and
>> do it after releasing the lock. What do you think?
>
> The elog will be effective only under DEBUG1, otherwise it will be almost a
> NOOP. I am wondering whether it's worth adding a bool assignment and moving
> the elog out only for DEBUG1.
> If you don't like adding another variable then probably we can release the
> spinlock in each of the if .. else branches.

As mentioned previously, it doesn't seem a good idea to use elog inside a
spinlock, so we either need to find some way to avoid that or we will probably
drop this patch. Do let me know what you or others prefer here?

--
With Regards,
Amit Kapila.
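The fix Amit sketches — remember inside the critical section that a message is due, and emit it only after releasing the lock — can be illustrated with a small standalone C sketch. The lock helpers, slot field, and log call below are stand-ins invented for the example, not PostgreSQL's actual SpinLockAcquire/SpinLockRelease, ReplicationSlot, and elog():

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for the slot spinlock; asserts catch misuse. */
static bool lock_held = false;

static void demo_spin_acquire(void) { assert(!lock_held); lock_held = true; }
static void demo_spin_release(void) { assert(lock_held); lock_held = false; }

/* Stand-in for elog(DEBUG1, ...): must never run with the lock held. */
static void demo_log(unsigned xmin)
{
    assert(!lock_held);              /* the invariant the thread is after */
    fprintf(stderr, "got new catalog_xmin %u\n", xmin);
}

static unsigned candidate_catalog_xmin = 0;

/* Returns true if the candidate was updated (and hence logged). */
bool update_candidate_xmin(unsigned xmin)
{
    bool updated = false;

    demo_spin_acquire();
    if (xmin > candidate_catalog_xmin)
    {
        candidate_catalog_xmin = xmin;
        updated = true;              /* remember now; log after releasing */
    }
    demo_spin_release();

    if (updated)
        demo_log(xmin);              /* emitted outside the critical section */
    return updated;
}
```

The extra bool costs nothing when DEBUG1 is off, and the logging call can never sleep or recurse while the spinlock is held.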
Re: [BUG] wrong refresh when ALTER SUBSCRIPTION ADD/DROP PUBLICATION
On Fri, Aug 6, 2021 at 9:57 PM Japin Li wrote: > > > > > Hmm yes, it cannot cover all cases. I had somehow misunderstood that > > the subscriber knows which relations are associated with which > > publications. Given that the subscriber doesn’t know which relations > > are associated with which publications (and the set of subscribed > > relations on the subscriber becomes out of date once the publication > > is updated), I think it's impossible to achieve that DROP PUBLICATION > > drops relations that are associated with only the publication being > > dropped. > > > >> Do we have better ideas to fix or shall we just > >> say that we will refresh based on whatever current set of relations > >> are present in publication to be dropped? > > > > I don't have better ideas. It seems to me that it's prudent that we > > accept the approach in v3 patch and refresh all publications in DROP > > PUBLICATION cases. > > > > Maybe refreshing all publications in PUBLICATION cases is simple way to > fix the problem. > Do you mean to say that do it for both Add and Drop or just for Drop? Actually, doing it both will make the behavior consistent but doing it just for Drop might be preferable by some users. I think it is better to be consistent here but I am fine if you and others feel it is better to refresh all publications only in Drop case. -- With Regards, Amit Kapila.
Re: Alias collision in `refresh materialized view concurrently`
Michael Paquier writes: > On Fri, Aug 06, 2021 at 10:48:40AM -0400, Tom Lane wrote: >> Given that it took this long to notice the problem at all, maybe >> this is not a fix to cram in on the weekend before the release wrap. >> But I don't see why we need to settle for "mostly works" when >> "always works" is barely any harder. > Yes, I would vote to delay that for a couple of days. That's not > worth taking a risk for. I went ahead and created the patch, including test case, and it seems fine. So I'm leaning towards pushing that tomorrow. Mainly because I don't want to have to document "we partially fixed this" in this release set and then "we really fixed it" three months from now. regards, tom lane
Re: Alias collision in `refresh materialized view concurrently`
On Fri, Aug 06, 2021 at 10:48:40AM -0400, Tom Lane wrote:
> AFAICT that works and generates the identical parse tree to the original
> coding. The only place touched by the patch where it's a bit difficult to
> make the syntax unambiguous this way is
>
>     "CREATE TEMP TABLE %s AS "
>     "SELECT _$mv.ctid AS tid, _$newdata "
>     "FROM %s _$mv FULL JOIN %s _$newdata ON (",
>
> because newdata.* would ordinarily get expanded to multiple columns
> if it's at the top level of a SELECT list, and that's not what we want.
> However, that's easily fixed using the same hack as in ruleutils.c's
> get_variable: add a no-op cast to the table's rowtype. So this
> would look like
>
>     appendStringInfo(&querybuf,
>                      "CREATE TEMP TABLE %s AS "
>                      "SELECT mv.ctid AS tid, newdata.*::%s "
>                      "FROM %s mv FULL JOIN %s newdata ON (",
>                      diffname, matviewname, matviewname, tempname);

Smart piece. I hadn't thought of that.

> Given that it took this long to notice the problem at all, maybe
> this is not a fix to cram in on the weekend before the release wrap.
> But I don't see why we need to settle for "mostly works" when
> "always works" is barely any harder.

Yes, I would vote to delay that for a couple of days. That's not
worth taking a risk for.
--
Michael
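For readers following along, the shape of the generated SQL can be sketched outside the server. The table and type names below are placeholders; the point is only the `alias.*::rowtype` no-op cast that keeps `.*` from being expanded into multiple columns in the SELECT list:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Builds a diff-table query in the style of refresh_by_match_merge();
 * diffname/rowtype/matviewname/tempname are hypothetical placeholders.
 * Note the "newdata.*::%s" no-op cast discussed above.
 */
int build_diff_query(char *buf, size_t buflen,
                     const char *diffname, const char *rowtype,
                     const char *matviewname, const char *tempname)
{
    return snprintf(buf, buflen,
                    "CREATE TEMP TABLE %s AS "
                    "SELECT mv.ctid AS tid, newdata.*::%s AS newdata "
                    "FROM %s mv FULL JOIN %s newdata ON (",
                    diffname, rowtype, matviewname, tempname);
}

/* Check the query carries the cast and no "_$" prefixed aliases remain. */
bool demo_query_has_cast(void)
{
    char buf[256];

    build_diff_query(buf, sizeof buf,
                     "diff", "public.mv", "public.mv", "pg_temp_3");
    return strstr(buf, "newdata.*::public.mv") != NULL
        && strstr(buf, "_$") == NULL;
}
```

Because the aliases are now plain identifiers and the whole-row reference is disambiguated by the cast, there is no longer any reliance on `_$`-prefixed names being collision-free.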
Re: ECPG bug fix: DECALRE STATEMENT and DEALLOCATE, DESCRIBE
On Fri, Jul 30, 2021 at 3:01 PM Peter Geoghegan wrote: > The RMT discussed this recently. We are concerned about this issue, > including how it has been handled so far. > > If you're unable to resolve the issue (or at least commit time to > resolving the issue) then the RMT will call for the revert of the > original feature patch. The RMT continues to be concerned about the lack of progress on this open item, especially the lack of communication from Michael Meskes, the committer of the original patch (commit ad8305a). We are prepared to exercise our authority to resolve open items directly. The only fallback option available to us is a straight revert of commit ad8305a. We ask that Michael Meskes give a status update here no later than 23:59 on Fri 13 Aug ("anywhere on earth" timezone). This update should include a general assessment of the problem, a proposed resolution (e.g., committing the proposed patch from Hayato Kuroda), and an estimate of when we can expect the problem to be fully resolved. If Michael is unable to provide a status update by that deadline, or if Michael is unable to commit to a reasonably prompt resolution for this open item, then the RMT will call for a revert without further delay. The RMT's first responsibility is to ensure that PostgreSQL 14 is released on schedule. We would prefer to avoid a revert, but we cannot allow this to drag on indefinitely. -- Peter Geoghegan
Re: Use generation context to speed up tuplesorts
On 8/6/21 3:07 PM, David Rowley wrote:
> On Wed, 4 Aug 2021 at 02:10, Tomas Vondra wrote:
>> A review would be nice, although it can wait - It'd be interesting to know
>> if those patches help with the workload(s) you've been looking at.
>
> I tried out the v2 set of patches using the attached scripts. The attached
> spreadsheet includes the original tests and compares master with the patch
> which uses the generation context vs that patch plus your v2 patch.
>
> I've also included 4 additional tests, each of which starts with a 1 column
> table and then adds another 32 columns, testing the performance after adding
> each additional column. I did this because I wanted to see if the
> performance was more similar to master when the allocations had less power
> of 2 wastage from allocset. If, for example, you look at row 123 of the
> spreadsheet, you can see that both patched and unpatched the allocations
> were 272 bytes each, yet there was still a 50% performance improvement with
> just the generation context patch when compared to master.
>
> Looking at the spreadsheet, you'll also notice that in the 2 column test of
> each of the 4 new tests the number of bytes used for each allocation is
> larger with the generation context: 56 vs 48. This is due to the
> GenerationChunk struct size being larger than the Allocset's version by 8
> bytes, because it also holds the GenerationBlock. So with the patch there
> are some cases where we'll use slightly more memory.
>
> Additional tests:
> 1. Sort 1 tuples on a column with values 0-99 in memory.
> 2. As #1 but with 1 million tuples.
> 3. As #1 but with a large OFFSET to remove the overhead of sending to the client.
> 4. As #2 but with a large OFFSET.
>
> Test #3 above is the most similar one to the original tests and shows
> similar gains. When the sort becomes larger (1 million tuple test), the
> gains reduce. This indicates the gains are coming from improved CPU cache
> efficiency from the removal of the power of 2 wastage in memory allocations.
>
> All of the tests show that the patches to improve the allocation efficiency
> of generation.c don't help to improve the results of the test cases. I
> wondered if it's maybe worth trying to see what happens if instead of
> doubling the allocations each time, we quadruple them instead. I didn't try
> this.

Thanks for the scripts and the spreadsheet with results. I doubt quadrupling
the allocations would help very much, but I suspect the problem might be in
the 0004 patch - at least that's what shows regression in my results. Could
you try with just 0001-0003 applied?

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
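The "power of 2 wastage" David refers to can be quantified with a toy calculation. The rounding rule below mimics (approximately) an AllocSet context's round-up-to-power-of-2 behavior for chunk sizes; the exact thresholds in PostgreSQL differ, so treat this as an illustration only:

```c
#include <stddef.h>

/* Round a request up to the next power of two, starting at 8 bytes,
 * roughly what an AllocSet-style allocator hands out per chunk. */
size_t pow2_round_up(size_t sz)
{
    size_t p = 8;

    while (p < sz)
        p <<= 1;
    return p;
}

/* Bytes wasted per allocation when every chunk is rounded up. */
size_t pow2_wastage(size_t sz)
{
    return pow2_round_up(sz) - sz;
}
```

For the 272-byte tuples in row 123 of the spreadsheet, this rounds up to 512 bytes, wasting 240 bytes (almost half) per tuple — the kind of cache-density loss a generation context avoids by allocating the exact requested size plus a small chunk header.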
Re: Changes to recovery_min_apply_delay are ignored while waiting for delay
Rebased. Also added a stronger check to see if the standby is stuck in
recovery_min_apply_delay:

$node_standby->poll_query_until('postgres', qq{
SELECT wait_event = 'RecoveryApplyDelay' FROM pg_stat_activity
WHERE backend_type='startup';
}) or die "Timed out checking if startup is in recovery_min_apply_delay";

Attached v3.

Regards,
Soumyadeep (VMware)

From 441716f45d0fbffde1135ba007c3a6c5b6b464de Mon Sep 17 00:00:00 2001
From: Soumyadeep Chakraborty
Date: Mon, 2 Aug 2021 20:50:37 -0700
Subject: [PATCH v3 1/1] Refresh delayUntil in recovery delay loop

This ensures that the wait interval in the loop is correctly
recalculated with the updated value of recovery_min_apply_delay. Now, if
one changes the GUC while we are waiting, those changes will take
effect. Practical applications include instantly cancelling a long wait
time by setting recovery_min_apply_delay to 0. This can be useful in
tests.
---
 src/backend/access/transam/xlog.c       | 11 +++++++---
 src/test/recovery/t/005_replay_delay.pl | 36 +++++++++++++++++++++++--
 2 files changed, 42 insertions(+), 5 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index d0ec6a834be..3e05119212b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6234,12 +6234,11 @@ recoveryApplyDelay(XLogReaderState *record)
 	if (!getRecordTimestamp(record, &xtime))
 		return false;
 
-	delayUntil = TimestampTzPlusMilliseconds(xtime, recovery_min_apply_delay);
-
 	/*
 	 * Exit without arming the latch if it's already past time to apply this
 	 * record
 	 */
+	delayUntil = TimestampTzPlusMilliseconds(xtime, recovery_min_apply_delay);
 	msecs = TimestampDifferenceMilliseconds(GetCurrentTimestamp(), delayUntil);
 	if (msecs <= 0)
 		return false;
@@ -6255,7 +6254,13 @@ recoveryApplyDelay(XLogReaderState *record)
 			break;
 
 		/*
-		 * Wait for difference between GetCurrentTimestamp() and delayUntil
+		 * Recalculate delayUntil as recovery_min_apply_delay could have
+		 * changed while we were waiting in the loop.
+		 */
+		delayUntil = TimestampTzPlusMilliseconds(xtime, recovery_min_apply_delay);
+
+		/*
+		 * Wait for difference between GetCurrentTimestamp() and delayUntil.
 		 */
 		msecs = TimestampDifferenceMilliseconds(GetCurrentTimestamp(),
												delayUntil);
diff --git a/src/test/recovery/t/005_replay_delay.pl b/src/test/recovery/t/005_replay_delay.pl
index 0b56380e0a7..4f409ad10d1 100644
--- a/src/test/recovery/t/005_replay_delay.pl
+++ b/src/test/recovery/t/005_replay_delay.pl
@@ -7,7 +7,7 @@ use warnings;
 
 use PostgresNode;
 use TestLib;
-use Test::More tests => 3;
+use Test::More tests => 4;
 
 # Initialize primary node
 my $node_primary = PostgresNode->new('primary');
@@ -56,7 +56,6 @@ $node_standby->poll_query_until('postgres',
 ok(time() - $primary_insert_time >= $delay,
 	"standby applies WAL only after replication delay");
 
-
 # Check that recovery can be paused or resumed expectedly.
 my $node_standby2 = PostgresNode->new('standby2');
 $node_standby2->init_from_backup($node_primary, $backup_name,
@@ -110,3 +109,36 @@ $node_standby2->poll_query_until('postgres',
 $node_standby2->promote;
 $node_standby2->poll_query_until('postgres', "SELECT NOT pg_is_in_recovery()")
   or die "Timed out while waiting for promotion to finish";
+
+# Now test to see if updates to recovery_min_apply_delay apply when a standby is
+# waiting for a recovery delay to elapse.
+
+# First, set the delay for the next commit to some obscenely high value.
+$node_standby->safe_psql('postgres', "ALTER SYSTEM SET recovery_min_apply_delay TO '2h';");
+$node_standby->reload;
+
+$node_primary->safe_psql('postgres',
+	"INSERT INTO tab_int VALUES (generate_series(51, 60))");
+
+# The standby's flush LSN should be caught up.
+my $last_insert_lsn =
+	$node_primary->safe_psql('postgres', "SELECT pg_current_wal_lsn();");
+$node_primary->wait_for_catchup('standby', 'flush', $last_insert_lsn);
+
+# The standby's replay LSN will not be caught up as it should be waiting in
+# recovery_min_apply_delay.
+$node_standby->poll_query_until('postgres', qq{
+SELECT wait_event = 'RecoveryApplyDelay' FROM pg_stat_activity
+WHERE backend_type='startup';
+}) or die "Timed out checking if startup is in recovery_min_apply_delay";
+is( $node_standby->safe_psql('postgres',
+	"SELECT pg_last_wal_replay_lsn() < '$last_insert_lsn'::pg_lsn;"),
+	't', 'standby stuck in recovery_min_apply_delay');
+
+# Clear the recovery_min_apply_delay timeout so that the wait is immediately
+# cancelled and replay can proceed.
+$node_standby->safe_psql('postgres', "ALTER SYSTEM SET recovery_min_apply_delay TO DEFAULT;");
+$node_standby->reload;
+
+# Now the standby should catch up.
+$node_primary->wait_for_catchup('standby');
--
2.25.1
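The behavior the patch introduces — recomputing delayUntil on every pass of the wait loop so a reloaded GUC takes effect immediately — can be modeled with a small deterministic simulation. The names and the fixed 10 ms "sleep" are inventions for illustration; the real loop waits on a latch:

```c
typedef long long ms_t;

/*
 * Simulated recovery-delay loop: the deadline is recomputed from
 * *delay_setting on every iteration, so changing the setting mid-wait
 * (as a SIGHUP reload of recovery_min_apply_delay would) takes effect.
 * Returns the number of iterations spent waiting.
 */
int apply_delay_loop(ms_t record_time, ms_t *delay_setting, ms_t *clock,
                     int change_at, ms_t new_delay)
{
    int iters = 0;

    for (;;)
    {
        /* Recompute delayUntil each pass, as the patch does. */
        ms_t delay_until = record_time + *delay_setting;

        if (*clock >= delay_until)
            break;
        if (++iters == change_at)
            *delay_setting = new_delay;   /* simulated configuration reload */
        *clock += 10;                     /* simulated 10 ms latch timeout */
    }
    return iters;
}

/* A 2-hour delay cancelled by setting the GUC to 0 on the third pass. */
int demo_cancelled(void)
{
    ms_t delay = 2LL * 60 * 60 * 1000, clock = 0;

    return apply_delay_loop(0, &delay, &clock, 3, 0);
}

/* A 100 ms delay left alone runs to completion. */
int demo_uninterrupted(void)
{
    ms_t delay = 100, clock = 0;

    return apply_delay_loop(0, &delay, &clock, 0, 0);
}
```

With the pre-patch behavior (deadline computed once, before the loop), the cancelled case would instead spin for the full two simulated hours.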
Re: [BUG] wrong refresh when ALTER SUBSCRIPTION ADD/DROP PUBLICATION
On Fri, Aug 6, 2021 at 2:50 PM Amit Kapila wrote: > > On Fri, Aug 6, 2021 at 10:09 AM Amit Kapila wrote: > > > > On Fri, Aug 6, 2021 at 5:09 AM Masahiko Sawada > > wrote: > > > > > > > But, isn't this happening because of your suggestion to compare the > > current set of relations with relations from publications that doesn't > > need to be removed? Do we have better ideas to fix or shall we just > > say that we will refresh based on whatever current set of relations > > are present in publication to be dropped? > > > > The other options could be that (a) for drop publication, we refresh > all the publications, (b) for both add/drop publication, we refresh > all the publications. > > Any better ideas? > > As this is introduced in PG-14 via the below commit, I am adding > authors and committer to see if they have any better ideas. > > commit 82ed7748b710e3ddce3f7ebc74af80fe4869492f > Author: Peter Eisentraut > Date: Tue Apr 6 10:44:26 2021 +0200 > > ALTER SUBSCRIPTION ... ADD/DROP PUBLICATION > > Author: Japin Li > Author: Bharath Rupireddy > I've added this to Open Items. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: [bug] Logical Decoding of relation rewrite with toast does not reset toast_hash
Hi Drouvot,

I can reproduce the issue you mentioned on REL_12_STABLE as well as the master
branch, but the patch doesn't apply to REL_12_STABLE. After applying it to the
master branch, it returns some weird results the first time the query is run.
As you can see in the log below, the first time I execute the query
`select * from pg_logical_slot_get_changes('bdt_slot', null, null);` it
returns some extra data.

david:postgres$ psql -d postgres
psql (15devel)
Type "help" for help.
postgres=# \q
david:postgres$ psql -d postgres
psql (15devel)
Type "help" for help.
postgres=# select pg_create_logical_replication_slot('bdt_slot','test_decoding');
 pg_create_logical_replication_slot
------------------------------------
 (bdt_slot,0/1484FA8)
(1 row)

postgres=# CREATE TABLE tbl1 (a INT, b TEXT);
CREATE TABLE
postgres=# CREATE TABLE tbl2 (a INT);
CREATE TABLE
postgres=# ALTER TABLE tbl1 ALTER COLUMN b SET STORAGE EXTERNAL;
ALTER TABLE
postgres=#
postgres=# BEGIN;
BEGIN
postgres=*# INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ;
INSERT 0 1
postgres=*# ALTER TABLE tbl1 ADD COLUMN id serial primary key;
ALTER TABLE
postgres=*# INSERT INTO tbl2 VALUES(1);
INSERT 0 1
postgres=*# commit;
COMMIT
postgres=#
postgres=# select * from pg_logical_slot_get_changes('bdt_slot', null, null);
 lsn | xid | data
Re: Numeric x^y for negative x
On Fri, 6 Aug 2021 at 21:26, Tom Lane wrote: > > Dean Rasheed writes: > > On Fri, 6 Aug 2021 at 17:15, Tom Lane wrote: > >> Looks plausible by eyeball (I've not tested). > > > So, I have back-branch patches for this ready to go. The question is, > > is it better to push now, or wait until after next week's releases? > > I'd push now, given we have a failing buildfarm member. > > Admittedly, there may be nobody else using that compiler out in > the real world, but we don't know that. > OK. Will do. Regards, Dean
Re: Numeric x^y for negative x
Dean Rasheed writes: > On Fri, 6 Aug 2021 at 17:15, Tom Lane wrote: >> Looks plausible by eyeball (I've not tested). > So, I have back-branch patches for this ready to go. The question is, > is it better to push now, or wait until after next week's releases? I'd push now, given we have a failing buildfarm member. Admittedly, there may be nobody else using that compiler out in the real world, but we don't know that. regards, tom lane
Re: Alias collision in `refresh materialized view concurrently`
I wrote:
> I just came across this issue while preparing the release notes.
> ISTM that people have expended a great deal of effort on a fundamentally
> unreliable solution, when a reliable one is easily available.

Concretely, I propose the attached.

			regards, tom lane

diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 9493b227b4..512b00bc65 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -537,9 +537,12 @@ transientrel_destroy(DestReceiver *self)
 /*
  * Given a qualified temporary table name, append an underscore followed by
  * the given integer, to make a new table name based on the old one.
+ * The result is a palloc'd string.
  *
- * This leaks memory through palloc(), which won't be cleaned up until the
- * current memory context is freed.
+ * As coded, this would fail to make a valid SQL name if the given name were,
+ * say, "FOO"."BAR". Currently, the table name portion of the input will
+ * never be double-quoted because it's of the form "pg_temp_NNN", cf
+ * make_new_heap(). But we might have to work harder someday.
  */
 static char *
 make_temptable_name_n(char *tempname, int n)
@@ -627,16 +630,20 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 	 * that in a way that allows showing the first duplicated row found. Even
 	 * after we pass this test, a unique index on the materialized view may
 	 * find a duplicate key problem.
+	 *
+	 * Note: here and below, we use "tablename.*::tablerowtype" as a hack to
+	 * keep ".*" from being expanded into multiple columns in a SELECT list.
+	 * Compare ruleutils.c's get_variable().
 	 */
 	resetStringInfo(&querybuf);
 	appendStringInfo(&querybuf,
-					 "SELECT _$newdata FROM %s _$newdata "
-					 "WHERE _$newdata IS NOT NULL AND EXISTS "
-					 "(SELECT 1 FROM %s _$newdata2 WHERE _$newdata2 IS NOT NULL "
-					 "AND _$newdata2 OPERATOR(pg_catalog.*=) _$newdata "
-					 "AND _$newdata2.ctid OPERATOR(pg_catalog.<>) "
-					 "_$newdata.ctid)",
-					 tempname, tempname);
+					 "SELECT newdata.*::%s FROM %s newdata "
+					 "WHERE newdata.* IS NOT NULL AND EXISTS "
+					 "(SELECT 1 FROM %s newdata2 WHERE newdata2.* IS NOT NULL "
+					 "AND newdata2.* OPERATOR(pg_catalog.*=) newdata.* "
+					 "AND newdata2.ctid OPERATOR(pg_catalog.<>) "
+					 "newdata.ctid)",
+					 tempname, tempname, tempname);
 	if (SPI_execute(querybuf.data, false, 1) != SPI_OK_SELECT)
 		elog(ERROR, "SPI_exec failed: %s", querybuf.data);
 	if (SPI_processed > 0)
@@ -663,9 +670,9 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 	resetStringInfo(&querybuf);
 	appendStringInfo(&querybuf,
 					 "CREATE TEMP TABLE %s AS "
-					 "SELECT _$mv.ctid AS tid, _$newdata "
-					 "FROM %s _$mv FULL JOIN %s _$newdata ON (",
-					 diffname, matviewname, tempname);
+					 "SELECT mv.ctid AS tid, newdata.*::%s AS newdata "
+					 "FROM %s mv FULL JOIN %s newdata ON (",
+					 diffname, tempname, matviewname, tempname);
 
 	/*
 	 * Get the list of index OIDs for the table from the relcache, and look up
@@ -757,9 +764,9 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 				if (foundUniqueIndex)
 					appendStringInfoString(&querybuf, " AND ");
 
-				leftop = quote_qualified_identifier("_$newdata",
+				leftop = quote_qualified_identifier("newdata",
 													NameStr(attr->attname));
-				rightop = quote_qualified_identifier("_$mv",
+				rightop = quote_qualified_identifier("mv",
 													NameStr(attr->attname));
 
 				generate_operator_clause(&querybuf,
@@ -787,8 +794,8 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 	Assert(foundUniqueIndex);
 
 	appendStringInfoString(&querybuf,
-						   " AND _$newdata OPERATOR(pg_catalog.*=) _$mv) "
-						   "WHERE _$newdata IS NULL OR _$mv IS NULL "
+						   " AND newdata.* OPERATOR(pg_catalog.*=) mv.*) "
+						   "WHERE newdata.* IS NULL OR mv.* IS NULL "
 						   "ORDER BY tid");
 
 	/* Create the temporary "diff" table. */
@@ -814,10 +821,10 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 	/* Deletes must come before inserts; do them first. */
 	resetStringInfo(&querybuf);
 	appendStringInfo(&querybuf,
-					 "DELETE FROM %s _$mv WHERE ctid OPERATOR(pg_catalog.=) ANY "
-					 "(SELECT _$diff.tid FROM %s _$diff "
-					 "WHERE _$diff.tid IS NOT NULL "
-					 "AND _$diff._$newdata IS NULL)",
+					 "DELETE FROM %s mv WHERE ctid OPERATOR(pg_catalog.=) ANY "
+					 "(SELECT diff.tid FROM %s diff "
+					 "WHERE diff.tid IS NOT NULL "
+					 "AND diff.newdata IS NULL)",
 					 matviewname, diffname);
 	if (SPI_exec(querybuf.data, 0) != SPI_OK_DELETE)
 		elog(ERROR, "SPI_exec failed: %s", querybuf.data);
@@ -825,8 +832,8 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 	/* Inserts go last. */
 	resetStringInfo(&querybuf);
 	appendStringInfo(&querybuf,
-					 "INSERT INTO %s SELECT (_$diff._$newdata).* "
-					 "FROM %s _$diff WHERE tid
Re: Numeric x^y for negative x
On Fri, 6 Aug 2021 at 17:15, Tom Lane wrote: > > > I guess the best thing to do is just test the value against > > PG_INT32_MIN/MAX, which is what int84() does. There are 2 other places > > in numeric.c that use similar code to check for int16/32 overflow, so > > it's possible that they're broken in the same way on that platform, > > but they aren't covered by the regression tests, so it's also possible > > that they're OK. Anyway, something like the attached seems likely to > > be safer. > > Looks plausible by eyeball (I've not tested). > So, I have back-branch patches for this ready to go. The question is, is it better to push now, or wait until after next week's releases? Regards, Dean
Re: OpenSSL 3.0.0 compatibility
Daniel Gustafsson writes: > Until there is an animal running OpenSSL 3.0.0 in the buildfarm I think this > should be HEAD only. Further down the line we need to support OpenSSL 3 in > all > backbranches IMO since they are all equally likely to be compiled against it, > but not until we can regularly test against it in the farm. Works for me. regards, tom lane
Re: OpenSSL 3.0.0 compatibility
> On 6 Aug 2021, at 21:01, Tom Lane wrote: > > Peter Eisentraut writes: >> Are you going to commit these? Absolutely, a combination of unplanned home renovations and vacations changed my plans a bit recently. > Note that with release wraps scheduled for Monday, we are probably > already past the time when it'd be wise to push anything that has > a significant chance of introducing portability issues. There's > just not much time to deal with it if the buildfarm shows problems. > So unless you intend this as HEAD-only, I'd counsel waiting for the > release window to pass. Until there is an animal running OpenSSL 3.0.0 in the buildfarm I think this should be HEAD only. Further down the line we need to support OpenSSL 3 in all backbranches IMO since they are all equally likely to be compiled against it, but not until we can regularly test against it in the farm. -- Daniel Gustafsson https://vmware.com/
Re: OpenSSL 3.0.0 compatibility
Peter Eisentraut writes: > Are you going to commit these? Note that with release wraps scheduled for Monday, we are probably already past the time when it'd be wise to push anything that has a significant chance of introducing portability issues. There's just not much time to deal with it if the buildfarm shows problems. So unless you intend this as HEAD-only, I'd counsel waiting for the release window to pass. regards, tom lane
Release notes for August minor releases
I've pushed the first draft for $SUBJECT at https://git.postgresql.org/pg/commitdiff/2f38ec6a157b60291ece1deb072bfff84f317334 Please send comments/corrections by Sunday. regards, tom lane
Re: Support reset of Shared objects statistics in "pg_stat_reset" function
On Fri, 6 Aug 2021 at 21:17, Dilip Kumar wrote:
>
> On Fri, Aug 6, 2021 at 8:53 PM Himanshu Upadhyaya wrote:
> >
> > Hi Sadhu,
> >
> > Patch working as expected with shared tables, just one minor comment on
> > the patch.
> >
> > +		if (!dbentry)
> > +			return;
> > +
> > +		/*
> > +		 * We simply throw away all the shared table entries by recreating new
> > +		 * hash table for them.
> > +		 */
> > +		if (dbentry->tables != NULL)
> > +			hash_destroy(dbentry->tables);
> > +		if (dbentry->functions != NULL)
> > +			hash_destroy(dbentry->functions);
> > +
> > +		dbentry->tables = NULL;
> > +		dbentry->functions = NULL;
> > +
> > +		/*
> > +		 * This creates empty hash tables for tables and functions.
> > +		 */
> > +		reset_dbentry_counters(dbentry);
> >
> > We already have the above code for non-shared tables; can we avoid the
> > duplicate code? One solution I can think of: have a list with two elements
> > and iterate over each element with these common steps.
>
> Another idea could be that instead of putting this logic in
> pgstat_recv_resetcounter(), we can have this logic in pg_stat_reset()
> or maybe in pgstat_reset_counters(). So now the
> pgstat_recv_resetcounter() logic remains the same, and I think that
> logic is much cleaner, i.e. whatever dboid it got in the message, it
> will reset stats for that dboid.
>
> So now it depends upon the logic of the callers what they want to do;
> in this case pgstat_recv_resetcounter() can send two messages, one for
> MyDatabaseOid, which it is already doing, and another message for
> InvalidOid.

Hi,

I reviewed the patch; please find my review comments below:

1) In pgstat_recv_resetcounter, first we are checking for m_databaseid:

+++ b/src/backend/postmaster/pgstat.c
@@ -5092,7 +5092,6 @@ pgstat_recv_resetcounter(PgStat_MsgResetcounter *msg, int len)
 	 * Lookup the database in the hashtable. Nothing to do if not there.
 	 */
 	dbentry = pgstat_get_db_entry(msg->m_databaseid, false);
 	if (!dbentry)
 		return;

If we don't find any dbentry then we return, but I think we should still try
to reset stats for the shared tables. I may be wrong because I haven't tested
this.

2)
+
+	/*
+	 * Lookup for the shared tables also to reset the stats
+	 */
+	dbentry = pgstat_get_db_entry(InvalidOid, false);
+	if (!dbentry)
+		return;

I think we should always get a dbentry for the shared tables, so we can add an
assert here:

+	Assert(dbentry);

3)
 pgstat_recv_resetsinglecounter(PgStat_MsgResetsinglecounter *msg, int len)
 {
 	PgStat_StatDBEntry *dbentry;
+	bool		found;
 
 	dbentry = pgstat_get_db_entry(msg->m_databaseid, false);
 
@@ -5168,13 +5192,41 @@ pgstat_recv_resetsinglecounter(PgStat_MsgResetsinglecounter *msg, int len)
 	/* Set the reset timestamp for the whole database */
 	dbentry->stat_reset_timestamp = GetCurrentTimestamp();
 
-	/* Remove object if it exists, ignore it if not */
+	/* Remove object if it exists, if not then may be it's a shared table */
 	if (msg->m_resettype == RESET_TABLE)
+	{
 		(void) hash_search(dbentry->tables, (void *) &(msg->m_objectid),
-						   HASH_REMOVE, NULL);
+						   HASH_REMOVE, &found);
+		if (!found)
+		{
+			/* If we didn't find it, maybe it's a shared table */
+			dbentry = pgstat_get_db_entry(InvalidOid, false);
+			if (dbentry)
+			{
+				/* Set the reset timestamp for the whole database */
+				dbentry->stat_reset_timestamp = GetCurrentTimestamp();
+				(void) hash_search(dbentry->tables, (void *) &(msg->m_objectid),
+								   HASH_REMOVE, NULL);
+			}
+		}
+	}
 	else if (msg->m_resettype == RESET_FUNCTION)
+	{
 		(void) hash_search(dbentry->functions, (void *) &(msg->m_objectid),
-						   HASH_REMOVE, NULL);
+						   HASH_REMOVE, &found);
+		if (!found)
+		{
+			/* If we didn't find it, maybe it's a shared table */
+			dbentry = pgstat_get_db_entry(InvalidOid, false);
+			if (dbentry)
+			{
+				/* Set the reset timestamp for the whole database */
+				dbentry->stat_reset_timestamp = GetCurrentTimestamp();
+				(void) hash_search(dbentry->functions, (void *) &(msg->m_objectid),
+								   HASH_REMOVE, NULL);
+			}
+		}
+	}
 }

Above code can be replaced with:

@@ -5160,7 +5162,10 @@ pgstat_recv_resetsinglecounter(PgStat_MsgResetsinglecounter *msg, int len)
 {
 	PgStat_StatDBEntry *dbentry;
 
-	dbentry = pgstat_get_db_entry(msg->m_databaseid, false);
+	if (IsSharedRelation(msg->m_objectid))
+		dbentry = pgstat_get_db_entry(InvalidOid, false);
+	else
+
Re: OpenSSL 3.0.0 compatibility
On 20.07.21 01:23, Daniel Gustafsson wrote: So I think your proposed patch is sound and a good short-term and low-risk solution The attached 0001 disables the padding. I've tested this with OpenSSL 1.0.1, 1.0.2, 1.1.1 and Git HEAD at e278127cbfa2709d. Another aspect of OpenSSL 3 compatibility is that of legacy cipher support, and as we concluded upthread it's best to leave that to the user to define in openssl.cnf. The attached 0002 adds alternative output files for 3.0.0 installations without the legacy provider loaded, as well as adds a note in the pgcrypto docs to enable it in case DES is needed. It does annoy me a bit that we don't load the openssl.cnf file for 1.0.1 if we start mentioning it in the docs for other versions, but it's probably not worth the effort to fix it given the lack of complaints so far (it needs a call to OPENSSL_config(NULL); guarded to HAVE_ macros for 1.0.1). Are you going to commit these?
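For readers wondering what "enable it in case DES is needed" involves on the user's side, the mechanism is OpenSSL 3's provider configuration. A minimal sketch of an openssl.cnf that activates both the default and legacy providers, following the OpenSSL 3 migration documentation, looks like this (treat the section names after `openssl_init` as the conventional ones, not something pgcrypto itself mandates):

```ini
# Minimal openssl.cnf enabling the default and legacy providers,
# needed for legacy ciphers such as DES in pgcrypto with OpenSSL 3.
openssl_conf = openssl_init

[openssl_init]
providers = provider_sect

[provider_sect]
default = default_sect
legacy = legacy_sect

[default_sect]
activate = 1

[legacy_sect]
activate = 1
```

Note that activating the legacy provider explicitly means the default provider must be activated explicitly too, since loading any provider section disables the implicit default.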
Re: [BUG] wrong refresh when ALTER SUBSCRIPTION ADD/DROP PUBLICATION
Hi, Sorry for the late reply. Having read this thread, the problem is caused by misunderstanding the AlterSubscription_refresh(). My apologies. On Fri, 06 Aug 2021 at 14:12, Masahiko Sawada wrote: > On Fri, Aug 6, 2021 at 1:39 PM Amit Kapila wrote: >> >> On Fri, Aug 6, 2021 at 5:09 AM Masahiko Sawada wrote: >> > >> > On Thu, Aug 5, 2021 at 11:40 PM houzj.f...@fujitsu.com >> > wrote: >> > > >> > > > To summary, I think that what we want to do in DROP SUBSCRIPTION cases >> > > > is to >> > > > drop relations from pg_subscription_rel that are no longer included in >> > > > the set of >> > > > after-calculated publications. In the latter example, when ALTER >> > > > SUBSCRIPTION ... DROP PUBLICATION pub12, we should compare the current >> > > > set of relations associated with {pub12, pub13} to the set of >> > > > relcations associated >> > > > with pub13, not pub12. Then we can find out t2 is no longer associated >> > > > with the >> > > > after-calculated publication (i.g., pub13) so remove it. >> > > >> > > Thanks for the review. You are right, I missed this point. >> > > Attach new version patch which fix these problems(Also add a new >> > > pl-test). >> > > >> > >> > Thank you for updating the patch! >> > >> > Here are some comments: >> > >> > I think that it still could happen that DROP PULIBCATION misses >> > dropping relations. 
Please consider the following case: >> > >> > On publisher and subscriber: >> > create table t1 (a int); >> > create table t2 (a int); >> > create table t3 (a int); >> > >> > On publisher: >> > create publication pub12 for table t1, t2; >> > create publication pub3 for table t3; >> > >> > On subscriber: >> > create subscription sub connection 'dbname=postgres' publication pub12, >> > pub3; >> > >> > On publisher: >> > alter publication pub3 add table t1; >> > >> > On subscriber: >> > alter subscription sub drop publication pub12; >> > select srsubid, srrelid::regclass::text, srsubstate, srsublsn from >> > pg_subscription_rel; >> > srsubid | srrelid | srsubstate | srsublsn >> > -+-++--- >> >16393 | t3 | r | 0/1707CE0 >> >16393 | t1 | r | 0/1707D18 >> > >> > With v3 patch, pg_subscription_rel has entries for t3 and t1. But if >> > ALTER SUBSCRIPTION ... DROP PUBLICATION pub12 refreshes only relations >> > associated with pub12, t1 should be removed and be added when we >> > refresh pub3. >> > >> >> But, isn't this happening because of your suggestion to compare the >> current set of relations with relations from publications that doesn't >> need to be removed? > > Hmm yes, it cannot cover all cases. I had somehow misunderstood that > the subscriber knows which relations are associated with which > publications. Given that the subscriber doesn’t know which relations > are associated with which publications (and the set of subscribed > relations on the subscriber becomes out of date once the publication > is updated), I think it's impossible to achieve that DROP PUBLICATION > drops relations that are associated with only the publication being > dropped. > >> Do we have better ideas to fix or shall we just >> say that we will refresh based on whatever current set of relations >> are present in publication to be dropped? > > I don't have better ideas. 
It seems to me that it's prudent that we > accept the approach in v3 patch and refresh all publications in DROP > PUBLICATION cases. > Maybe refreshing all publications in PUBLICATION cases is a simple way to fix the problem. >> >> One more thing, I think it would be better to write a separate refresh >> function for add/drop rather than adding checks in the current refresh >> functionality. We can extract common functionality code which can be >> called from the different refresh functions. > > +1 > Agreed. > Regards, -- Regards, Japin Li. ChengDu WenWu Information Technology Co.,Ltd.
Re: Numeric x^y for negative x
Dean Rasheed writes: > So the "test for overflow by reverse-conversion" obviously isn't > working as intended, and it's going through power_var_int() when it > shouldn't. I don't think there's anything wrong with that code, so I > think this is a compiler bug. Yeah, looks like one. > I guess the best thing to do is just test the value against > PG_INT32_MIN/MAX, which is what int84() does. There are 2 other places > in numeric.c that use similar code to check for int16/32 overflow, so > it's possible that they're broken in the same way on that platform, > but they aren't covered by the regression tests, so it's also possible > that they're OK. Anyway, something like the attached seems likely to > be safer. Looks plausible by eyeball (I've not tested). regards, tom lane
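For readers following the thread, the two overflow-check idioms being contrasted here can be sketched standalone; the function names below are invented for illustration and are not the numeric.c code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * The idiom the patch removes: narrow first, then check by converting
 * back.  Correctness depends on the exact behavior of the narrowing
 * cast, which is what the suspected compiler bug gets wrong.
 */
static bool
to_int32_reverse_conversion(int64_t val, int32_t *result)
{
	*result = (int32_t) val;
	return ((int64_t) *result == val);
}

/*
 * The idiom the patch introduces: check the range explicitly before
 * narrowing, so no round trip through the cast is needed.
 */
static bool
to_int32_range_check(int64_t val, int32_t *result)
{
	if (val < INT32_MIN || val > INT32_MAX)
		return false;
	*result = (int32_t) val;
	return true;
}
```

On a conforming compiler both forms return the same answers; the point of the change is that the second form never gives a buggy compiler a chance to mishandle the narrowing cast.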
Re: Numeric x^y for negative x
On Fri, 6 Aug 2021 at 03:58, Tom Lane wrote: > > Dean Rasheed writes: > > On Thu, 5 Aug 2021 at 17:04, Tom Lane wrote: > >> It looks like castoroides is not happy with this patch: > >> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=castoroides&dt=2021-08-01%2008%3A52%3A43 > > > Hmm, there's something very weird going on there. > > Yeah. I tried to reproduce this on the gcc compile farm's Solaris 10 > machine, but the test passed fine for me. The only obvious configuration > difference I can find is that that machine has > > $ cc -V > cc: Sun C 5.10 SunOS_sparc Patch 141861-10 2012/11/07 > > whereas castoroides' compiler seems to be a few years older. So this > does seem like it could be a compiler bug. > Ah, so the latest test results from castoroides confirm my previous hypothesis, that it isn't even reaching the new code in power_var(): 0.99 ^ 233000 returned 1.0199545627709647 0.99 ^ 70 returned 0.9396000441558118 which are actually the results you'd get if you just cast the exponent to an int32, throwing away the top 32 bits, and computing the results: 0.99 ^ -197580800 = 1.0199545627709647 0.99 ^ 623009792 = 0.9396000441558118 So the "test for overflow by reverse-conversion" obviously isn't working as intended, and it's going through power_var_int() when it shouldn't. I don't think there's anything wrong with that code, so I think this is a compiler bug. I guess the best thing to do is just test the value against PG_INT32_MIN/MAX, which is what int84() does. There are 2 other places in numeric.c that use similar code to check for int16/32 overflow, so it's possible that they're broken in the same way on that platform, but they aren't covered by the regression tests, so it's also possible that they're OK. Anyway, something like the attached seems likely to be safer.
Regards, Dean diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c new file mode 100644 index 13bb968..0c210a3 --- a/src/backend/utils/adt/numeric.c +++ b/src/backend/utils/adt/numeric.c @@ -4289,11 +4289,13 @@ numericvar_to_int32(const NumericVar *va if (!numericvar_to_int64(var, )) return false; + if (unlikely(val < PG_INT32_MIN) || unlikely(val > PG_INT32_MAX)) + return false; + /* Down-convert to int4 */ *result = (int32) val; - /* Test for overflow by reverse-conversion. */ - return ((int64) *result == val); + return true; } Datum @@ -4373,15 +4375,14 @@ numeric_int2(PG_FUNCTION_ARGS) (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE), errmsg("smallint out of range"))); - /* Down-convert to int2 */ - result = (int16) val; - - /* Test for overflow by reverse-conversion. */ - if ((int64) result != val) + if (unlikely(val < PG_INT16_MIN) || unlikely(val > PG_INT16_MAX)) ereport(ERROR, (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE), errmsg("smallint out of range"))); + /* Down-convert to int2 */ + result = (int16) val; + PG_RETURN_INT16(result); } @@ -10186,10 +10187,7 @@ power_var(const NumericVar *base, const if (numericvar_to_int64(exp, )) { - int expval = (int) expval64; - - /* Test for overflow by reverse-conversion. 
*/ - if ((int64) expval == expval64) + if (expval64 >= PG_INT32_MIN && expval64 <= PG_INT32_MAX) { /* Okay, select rscale */ rscale = NUMERIC_MIN_SIG_DIGITS; @@ -10197,7 +10195,7 @@ power_var(const NumericVar *base, const rscale = Max(rscale, NUMERIC_MIN_DISPLAY_SCALE); rscale = Min(rscale, NUMERIC_MAX_DISPLAY_SCALE); -power_var_int(base, expval, result, rscale); +power_var_int(base, (int) expval64, result, rscale); return; } } diff --git a/src/test/regress/expected/numeric.out b/src/test/regress/expected/numeric.out new file mode 100644 index 80b42f8..3707aff --- a/src/test/regress/expected/numeric.out +++ b/src/test/regress/expected/numeric.out @@ -1154,6 +1154,55 @@ SELECT * FROM fract_only; (7 rows) DROP TABLE fract_only; +-- Check conversion to integers +SELECT (-9223372036854775808.5)::int8; -- should fail +ERROR: bigint out of range +SELECT (-9223372036854775808.4)::int8; -- ok + int8 +-- + -9223372036854775808 +(1 row) + +SELECT 9223372036854775807.4::int8; -- ok +int8 +- + 9223372036854775807 +(1 row) + +SELECT 9223372036854775807.5::int8; -- should fail +ERROR: bigint out of range +SELECT (-2147483648.5)::int4; -- should fail +ERROR: integer out of range +SELECT (-2147483648.4)::int4; -- ok +int4 +- + -2147483648 +(1 row) + +SELECT 2147483647.4::int4; -- ok +int4 + + 2147483647 +(1 row) + +SELECT 2147483647.5::int4; -- should fail +ERROR: integer out of range +SELECT (-32768.5)::int2; -- should fail +ERROR: smallint out of range +SELECT (-32768.4)::int2; -- ok + int2 + + -32768 +(1 row) + +SELECT 32767.4::int2; -- ok + int2 +--- + 32767 +(1 row) + +SELECT
Re: Support reset of Shared objects statistics in "pg_stat_reset" function
On Fri, Aug 6, 2021 at 8:53 PM Himanshu Upadhyaya wrote:
>
> Hi Sadhu,
>
> Patch working as expected with shared tables, just one minor comment on the patch.
>
> + if (!dbentry)
> +     return;
> +
> + /*
> +  * We simply throw away all the shared table entries by recreating new
> +  * hash tables for them.
> +  */
> + if (dbentry->tables != NULL)
> +     hash_destroy(dbentry->tables);
> + if (dbentry->functions != NULL)
> +     hash_destroy(dbentry->functions);
> +
> + dbentry->tables = NULL;
> + dbentry->functions = NULL;
> +
> + /*
> +  * This creates empty hash tables for tables and functions.
> +  */
> + reset_dbentry_counters(dbentry);
>
> We already have the above code for non-shared tables, can we restrict the duplicate code?
> One solution I can think of: if we can have a list with two elements and iterate over each element with these common steps?

Another idea could be that instead of putting this logic in pgstat_recv_resetcounter(), we could have it in pg_stat_reset() or maybe in pgstat_reset_counters(). Then the pgstat_recv_resetcounter() logic remains the same, and I think that logic is much cleaner, i.e. whatever dboid it gets in the message, it resets the stats for that dboid. It then depends upon the logic of the callers what they want to do; in this case, the caller of pgstat_recv_resetcounter() can send two messages: one for MyDatabaseOid, which it is already doing, and another message for InvalidOid.

-- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Support reset of Shared objects statistics in "pg_stat_reset" function
Hi Sadhu,

Patch working as expected with shared tables, just one minor comment on the patch.

+ if (!dbentry)
+     return;
+
+ /*
+  * We simply throw away all the shared table entries by recreating new
+  * hash tables for them.
+  */
+ if (dbentry->tables != NULL)
+     hash_destroy(dbentry->tables);
+ if (dbentry->functions != NULL)
+     hash_destroy(dbentry->functions);
+
+ dbentry->tables = NULL;
+ dbentry->functions = NULL;
+
+ /*
+  * This creates empty hash tables for tables and functions.
+  */
+ reset_dbentry_counters(dbentry);

We already have the above code for non-shared tables, can we restrict the duplicate code? One solution I can think of: if we can have a list with two elements and iterate over each element with these common steps?

Thanks, Himanshu

On Fri, Aug 6, 2021 at 5:40 PM Himanshu Upadhyaya < upadhyaya.himan...@gmail.com> wrote:
> Hi Sadhu,
>
> > > The call to “pg_stat_reset“ does reset all the statistics data for
> > > tables belonging to the current database but does not take care of
> > > shared tables e.g pg_attribute.
>
> pg_attribute is not a shared catalog; is the mentioned scenario also applicable to a few other tables?
>
> I have just tried it without your patch:
>
> postgres=# SELECT * FROM pg_statio_all_tables where relid=1249;
>  relid | schemaname |   relname    | heap_blks_read | heap_blks_hit | idx_blks_read | idx_blks_hit | toast_blks_read | toast_blks_hit | tidx_blks_read | tidx_blks_hit
> -------+------------+--------------+----------------+---------------+---------------+--------------+-----------------+----------------+----------------+---------------
>   1249 | pg_catalog | pg_attribute |             29 |           522 |             8 |          673 |                 |                |                |
> (1 row)
>
> postgres=# select pg_stat_reset();
>  pg_stat_reset
> ---------------
>
> (1 row)
>
> postgres=# SELECT * FROM pg_statio_all_tables where relid=1249;
>  relid | schemaname |   relname    | heap_blks_read | heap_blks_hit | idx_blks_read | idx_blks_hit | toast_blks_read | toast_blks_hit | tidx_blks_read | tidx_blks_hit
> -------+------------+--------------+----------------+---------------+---------------+--------------+-----------------+----------------+----------------+---------------
>   1249 | pg_catalog | pg_attribute |              0 |             0 |             0 |            0 |                 |                |                |
>
> We are able to reset the stats of pg_attribute without your patch.
> > Thanks, > Himanshu > > On Fri, Aug 6, 2021 at 1:56 PM Sadhuprasad Patro > wrote: > >> Hi, >> >> The call to “pg_stat_reset“ does reset all the statistics data for >> tables belonging to the current database but does not take care of >> shared tables e.g pg_attribute. Similarly to reset the statistics at >> table level, the call to “pg_stat_reset_single_table_counters“ does >> not take care of shared tables. >> >> When we reset all the statistics using the call “pg_stat_reset”, the >> postgres process internally makes calls to “ >> pgstat_recv_resetcounter“, which resets the statistics of all the >> tables of the current database. But not resetting the statistics of >> shared objects using database ID as 'InvalidOid'. >> >> The same fix is made in the internal function >> “pgstat_recv_resetsinglecounter“ to reset the statistics for the >> shared table for the call "pg_stat_reset_single_table_counters". >> >> -- >> thank u >> SADHU PRASAD >> EnterpriseDB: http://www.enterprisedb.com >> >
Re: Alias collision in `refresh materialized view concurrently`
Michael Paquier writes: > On Wed, Jun 02, 2021 at 03:44:45PM +0530, Bharath Rupireddy wrote: >> Thanks. The changes with that approach are very minimal. PSA v5 and let >> me know if any more changes are needed. > Simple enough, so applied and back-patched. I just came across this issue while preparing the release notes. ISTM that people have expended a great deal of effort on a fundamentally unreliable solution, when a reliable one is easily available. The originally complained-of issue was that a user-chosen column name could collide with the query-chosen table name: ERROR: column reference "mv" is ambiguous LINE 1: ...alog.=) mv.mv AND newdata OPERATOR(pg_catalog.*=) mv) WHERE ... This is true, but it's self-inflicted damage, because all you have to do is write the query so that mv is clearly a table name: ... mv.mv AND newdata.* OPERATOR(pg_catalog.*=) mv.*) WHERE ... AFAICT that works and generates the identical parse tree to the original coding. The only place touched by the patch where it's a bit difficult to make the syntax unambiguous this way is "CREATE TEMP TABLE %s AS " "SELECT _$mv.ctid AS tid, _$newdata " "FROM %s _$mv FULL JOIN %s _$newdata ON (", because newdata.* would ordinarily get expanded to multiple columns if it's at the top level of a SELECT list, and that's not what we want. However, that's easily fixed using the same hack as in ruleutils.c's get_variable: add a no-op cast to the table's rowtype. So this would look like appendStringInfo(&querybuf, "CREATE TEMP TABLE %s AS " "SELECT mv.ctid AS tid, newdata.*::%s " "FROM %s mv FULL JOIN %s newdata ON (", diffname, matviewname, matviewname, tempname); Given that it took this long to notice the problem at all, maybe this is not a fix to cram in on the weekend before the release wrap. But I don't see why we need to settle for "mostly works" when "always works" is barely any harder. regards, tom lane
Re: RFC: Improve CPU cache locality of syscache searches
On Thu, Aug 5, 2021 at 4:12 PM Andres Freund wrote: > I have wondered before whether we should have syscache definitions generate > code specific to each syscache definition. I do think that'd give a good bit > of performance boost. But I don't see a trivial way to get there without > notational overhead. > > We could define syscaches in a header using a macro that's defined differently > in syscache.c than everywhere else. The header would declare a set of > functions for each syscache, syscache.c would define them to call an > always_inline function with the relevant constants. > > Or perhaps we should move syscache definitions into the pg_*.h headers, and > generate the relevant code as part of their processing. That seems like it > could be nice from a modularity POV alone. And it could do better than the > current approach, because we could hardcode the types for columns in the > syscache definition without increasing notational overhead. I had code laying around to generate the info needed for initialization, but I apparently deleted it and never emailed it. :-( If we went that far, I wonder if we could specialize the cc_bucket for the actual types, rather than just number of Datums, which would further save on space. More complex, though. Also, if comparisons got cheaper, maybe we could get away with not storing the full hash in the bucket in favor of a tag of the 16 highest bits. (The catctup would still need the full hash for invalidation, at least). Another idea I came across while researching in-memory key-value stores is "bulk chaining" -- see [1] for an example. In that case, our example 2-cacheline bucket would have a dlist_head pointing not to the catctups, but to another bucket. So that throws away my L1/L2 analogy and uses a chain of buckets, each of which contain multiple sets of hashes/keys/pointers. That's closer to a normal collision chain, but with better cache locality. 
If we did that, I imagine the catctup could dispense with storing its Datum array and its dlist_node entirely. [1] https://www.usenix.org/system/files/conference/nsdi14/nsdi14-paper-lim.pdf -- John Naylor EDB: http://www.enterprisedb.com
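To make the bulk-chaining idea concrete, here is a rough standalone sketch of what such a bucket might look like; every name here is invented, and nothing like it exists in PostgreSQL today:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical bulk-chained syscache bucket.  Several 16-bit hash tags
 * and entry pointers are packed together, so a lookup scans one or two
 * cache lines instead of hopping through a pointer-per-entry collision
 * chain.
 */
#define BULK_FANOUT 7

typedef struct CacheBucket CacheBucket;

struct CacheBucket
{
	uint16_t	tags[BULK_FANOUT];	/* high 16 bits of each entry's hash */
	uint16_t	nused;				/* occupied slots in this bucket */
	void	   *entries[BULK_FANOUT];	/* the cached tuples */
	CacheBucket *overflow;			/* next bucket in the bulk chain */
};

/*
 * Return the first slot whose tag matches, or -1.  The caller would
 * still verify the full hash and the keys on the pointed-to entry.
 */
static int
bucket_probe(const CacheBucket *b, uint16_t tag)
{
	for (int i = 0; i < b->nused; i++)
		if (b->tags[i] == tag)
			return i;
	return -1;
}
```

On a typical 64-bit build this struct comes out at 80 bytes, and the tag scan touches only the first 16 of them, which is the cache-locality property the discussion above is after.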
[PATCH] OpenSSL: Mark underlying BIO with the appropriate type flags
Hi Everyone, I've run into an issue where the OpenSSL API function SSL_get_fd() fails, because the underlying BIO object created by Postgres is not flagged properly. Prior to OpenSSL version 1.1.0, the BIO methods object would be copied directly from the existing socket type and then its read/write functions would be replaced. With 1.1.0 and up, the object is created from scratch and then all its methods are initialized to be the ones of the socket type, except read/write, which are custom. In this newer way, a new type is given to it by calling BIO_get_new_index(), but the related type flags aren't added. For more information please see: https://www.openssl.org/docs/man1.1.0/man3/BIO_get_new_index.html In this case, the type should have both BIO_TYPE_SOURCE_SINK and BIO_TYPE_DESCRIPTOR. The proposed patch adds these flags to the BIO type on creation. I have compiled it with OpenSSL, enabled encryption and verified that basic queries work fine. I don't believe this affects Postgres itself, since the code has been this way for 5 years. I ran into it because I'm writing auditing code that hooks on Postgres calls. I've already found a workaround by adding these flags myself with an additional hook, but thought it would be worth bringing up here and seeing if you think it's worth patching. Regards, Itamar Gafni Imperva 001-Add-type-flags-to-SSL-BIO.patch Description: 001-Add-type-flags-to-SSL-BIO.patch
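As a standalone illustration of the proposed change, the fix boils down to OR-ing the two type flags into the index returned by BIO_get_new_index(). The helper name below is invented, and the two #define values are copied from OpenSSL's <openssl/bio.h> so that the sketch compiles without the library headers:

```c
#include <assert.h>

/*
 * Flag values as defined in OpenSSL's <openssl/bio.h>, reproduced here
 * so this sketch is self-contained.
 */
#define BIO_TYPE_DESCRIPTOR		0x0100	/* socket, fd, connect or accept */
#define BIO_TYPE_SOURCE_SINK	0x0400

/*
 * Hypothetical helper: given a fresh index from BIO_get_new_index(),
 * build the type value to pass to BIO_meth_new(), so that callers such
 * as SSL_get_fd() recognize the BIO as a descriptor-backed source/sink.
 */
static int
socket_bio_type(int new_index)
{
	return new_index | BIO_TYPE_SOURCE_SINK | BIO_TYPE_DESCRIPTOR;
}
```

The flags live in high bits, so OR-ing them in leaves the low-byte index itself intact.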
[PATCH]Remove obsolete macro CHECKFLOATVAL in btree_gist
Hi Commit 6bf0bc842 replaced float.c's CHECKFLOATVAL() macro with static inline subroutines, but the fix introduced a performance regression, as Tom Lane pointed out and fixed in 607f8ce74. I found an obsolete CHECKFLOATVAL usage in contrib/btree_gist and fixed it according to 607f8ce74. The attached patch has been tested on master; all tap tests passed. Regards, Tang 0001-Remove-obsolete-macro-CHECKFLOATVAL.patch Description: 0001-Remove-obsolete-macro-CHECKFLOATVAL.patch
Re: logical replication empty transactions
On Mon, Aug 2, 2021 at 7:20 PM Amit Kapila wrote: > > On Fri, Jul 23, 2021 at 3:39 PM Ajin Cherian wrote: > > > > Let's first split the patch for prepared and non-prepared cases as > that will help to focus on each of them separately. As a first shot, I have split the patch into prepared and non-prepared cases. Regards, Ajin Cherian Fujitsu Australia v12-0002-Skip-empty-prepared-transactions-for-logical-rep.patch Description: Binary data v12-0001-Skip-empty-transactions-for-logical-replication.patch Description: Binary data
Re: Added schema level support for publication.
On Fri, Aug 6, 2021 at 4:02 PM Amit Kapila wrote: > > On Fri, Aug 6, 2021 at 2:16 PM vignesh C wrote: > > > > On Thu, Aug 5, 2021 at 3:54 PM Amit Kapila wrote: > > > > > > > > > Few more comments: > > > === > > > 1. Do we need the previous column 'puballtables' after adding pubtype > > > or pubkind in pg_publication? > > > > I felt this should be retained as old client will still be using > > puballtables, like in case of old client executing \dRp+ commands. > > > > But do we guarantee that old clients work with newer server versions? > For example, psql docs say: "psql works best with servers of the same > or an older major version. Backslash commands are particularly likely > to fail if the server is of a newer version than psql itself." [1] > (See Notes Section). Similarly, pg_dump docs say: "However, pg_dump > cannot dump from PostgreSQL servers newer than its own major version; > it will refuse to even try, rather than risk making an invalid dump." > [2] (See Notes Section). > > It seems Sawada-San has the same question and IIUC docs suggest we > don't need such compatibility, so what makes you think we need it? Ok, I was not sure if we can remove any system table columns, hence had retained it. It seems that my understanding was wrong. I will remove this column in the next version of the patch. Regards, Vignesh
Re: Use generation context to speed up tuplesorts
On Wed, 4 Aug 2021 at 02:10, Tomas Vondra wrote: > A review would be nice, although it can wait - It'd be interesting to > know if those patches help with the workload(s) you've been looking at. I tried out the v2 set of patches using the attached scripts. The attached spreadsheet includes the original tests and compares master with the patch which uses the generation context vs that patch plus your v2 patch. I've also included 4 additional tests, each of which starts with a 1 column table and then adds another 32 columns, testing the performance after adding each additional column. I did this because I wanted to see if the performance was more similar to master when the allocations had less power of 2 wastage from allocset. If, for example, you look at row 123 of the spreadsheet, you can see that both patched and unpatched the allocations were 272 bytes each, yet there was still a 50% performance improvement with just the generation context patch when compared to master. Looking at the spreadsheet, you'll also notice that in the 2 column test of each of the 4 new tests the number of bytes used for each allocation is larger with the generation context: 56 vs 48. This is due to the GenerationChunk struct size being larger than the Allocset's version by 8 bytes. This is because it also holds a pointer to the GenerationBlock. So with the patch there are some cases where we'll use slightly more memory. Additional tests: 1. Sort 1 tuples on a column with values 0-99 in memory. 2. As #1 but with 1 million tuples. 3. As #1 but with a large OFFSET to remove the overhead of sending to the client. 4. As #2 but with a large OFFSET. Test #3 above is the most similar one to the original tests and shows similar gains. When the sort becomes larger (the 1 million tuple test), the gains reduce. This indicates the gains are coming from improved CPU cache efficiency from the removal of the power of 2 wastage in memory allocations.
All of the tests show that the patches to improve the allocation efficiency of generation.c don't help to improve the results of the test cases. I wondered if it's maybe worth trying to see what happens if instead of doubling the allocations each time, quadruple them instead. I didn't try this. David #!/bin/bash sec=5 dbname=postgres records=100 mod=100 psql -c "drop table if exists t" $dbname psql -c "create table t (a bigint not null)" $dbname psql -c "insert into t select x % $mod from generate_series(1,$records) x" $dbname psql -c "vacuum freeze t" $dbname for i in {1..32} do echo $i psql -c "explain analyze select * from t order by a offset 10" $dbname | grep -E "Memory|Disk" echo "select * from t order by a offset 10" > bench.sql for loops in {1..3} do pgbench -n -M prepared -T $sec -f bench.sql $dbname | grep tps done psql -c "alter table t add column c$i bigint" $dbname psql -c "update t set c$i = a" $dbname psql -c "vacuum full t" $dbname psql -c "vacuum freeze t" $dbname done generation context tuplesort.ods Description: application/vnd.oasis.opendocument.spreadsheet #!/bin/bash sec=5 dbname=postgres records=1 mod=100 psql -c "drop table if exists t" $dbname psql -c "create table t (a bigint not null)" $dbname psql -c "insert into t select x % $mod from generate_series(1,$records) x" $dbname psql -c "vacuum freeze t" $dbname for i in {1..32} do echo $i psql -c "explain analyze select * from t order by a offset 10" $dbname | grep -E "Memory|Disk" echo "select * from t order by a offset 10" > bench.sql for loops in {1..3} do pgbench -n -M prepared -T $sec -f bench.sql $dbname | grep tps done psql -c "alter table t add column c$i bigint" $dbname psql -c "update t set c$i = a" $dbname psql -c "vacuum full t" $dbname psql -c "vacuum freeze t" $dbname done
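The power-of-2 wastage discussed above is easy to quantify with a little standalone arithmetic. The sketch below mimics allocation-set rounding; the 8-byte minimum is an assumption for illustration, not the real allocator's constant:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Mimic the power-of-2 rounding an allocation set applies to chunk
 * requests.  A generation context stores chunks at roughly their
 * requested size instead, avoiding this rounding waste.
 */
static size_t
pow2_round(size_t request)
{
	size_t		size = 8;		/* assumed minimum chunk size */

	while (size < request)
		size <<= 1;
	return size;
}
```

For example, a 272-byte tuple, as in row 123 of the spreadsheet, lands in a 512-byte chunk, wasting nearly half of it; the generation context would store it at close to its requested size.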
Re: Worth using personality(ADDR_NO_RANDOMIZE) for EXEC_BACKEND on linux?
On 8/5/21 11:29 PM, Andres Freund wrote: > Hi, > > When testing EXEC_BACKEND on linux I see occasional test failures as long as I > don't disable ASLR. There's a code comment to that effect: > >* If testing EXEC_BACKEND on Linux, you should run this as root before >* starting the postmaster: >* >* echo 0 >/proc/sys/kernel/randomize_va_space > > but I don't like doing that on a system wide basis. > > Linux allows disabling ASLR on a per-process basis using > personality(ADDR_NO_RANDOMIZE). There's a wrapper binary to do that as well, > setarch --addr-no-randomize. > > I was wondering if we should have postmaster do personality(ADDR_NO_RANDOMIZE) > for EXEC_BACKEND builds? It seems nicer to make it automatically work than > have people remember that they need to call "setarch --addr-no-randomize make > check". > > Not that it actually matters for EXEC_BACKEND, but theoretically doing > personality(ADDR_NO_RANDOMIZE) in postmaster is a tad more secure than doing > it via setarch, as in the personality() case postmaster's layout itself is > still randomized... > > > Or perhaps we should just add a comment mentioning setarch. > If we can set it conveniently then that seems worth doing. (Thinks: do we have non-Windows buildfarm members doing EXEC_BACKEND?) cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
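For what it's worth, the per-process toggle is nearly a one-liner on Linux. A minimal sketch follows; the helper name and error handling are mine, not the proposed postmaster change:

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/personality.h>

/*
 * Disable ASLR for the current process and anything it subsequently
 * exec()s, which is what an EXEC_BACKEND postmaster would want to do
 * before spawning backends.  Returns false if the kernel (or a seccomp
 * policy, as in some container runtimes) refuses.
 */
static bool
disable_aslr(void)
{
	/* Passing 0xffffffff queries the current persona without changing it. */
	int			persona = personality(0xffffffff);

	if (persona == -1)
		return false;
	return personality(persona | ADDR_NO_RANDOMIZE) != -1;
}
```

Note that some container runtimes' default seccomp policies restrict personality() arguments, which is one more reason to attempt this inside postmaster and fall back gracefully rather than assume it succeeds.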
Re: Support reset of Shared objects statistics in "pg_stat_reset" function
On Fri, 6 Aug 2021 at 17:40, Himanshu Upadhyaya wrote:
>
> Hi Sadhu,
>
> > The call to “pg_stat_reset“ does reset all the statistics data for
> > tables belonging to the current database but does not take care of
> > shared tables e.g pg_attribute.
>
> pg_attribute is not a shared catalog; is the mentioned scenario also applicable to a few other tables?

Yes, I think, by mistake, Sadhu has mentioned pg_attribute. With the patch, I checked for pg_database and verified that we are resetting its stats.

psql (15devel)
Type "help" for help.

postgres=# SELECT * FROM pg_statio_all_tables where relid=1262;
 relid | schemaname |   relname   | heap_blks_read | heap_blks_hit | idx_blks_read | idx_blks_hit | toast_blks_read | toast_blks_hit | tidx_blks_read | tidx_blks_hit
-------+------------+-------------+----------------+---------------+---------------+--------------+-----------------+----------------+----------------+---------------
  1262 | pg_catalog | pg_database |              1 |             2 |             2 |            0 |               0 |              0 |              0 |             0
(1 row)

postgres=# select pg_stat_reset();
 pg_stat_reset
---------------

(1 row)

postgres=# SELECT * FROM pg_statio_all_tables where relid=1262;
 relid | schemaname |   relname   | heap_blks_read | heap_blks_hit | idx_blks_read | idx_blks_hit | toast_blks_read | toast_blks_hit | tidx_blks_read | tidx_blks_hit
-------+------------+-------------+----------------+---------------+---------------+--------------+-----------------+----------------+----------------+---------------
  1262 | pg_catalog | pg_database |              0 |             0 |             0 |            0 |               0 |              0 |              0 |             0
(1 row)

Shared tables are:
1. pg_database -- DatabaseRelationId 1262
2. pg_tablespace -- TableSpaceRelationId 1213
3. pg_authid -- AuthIdRelationId 1260
4. pg_auth_members -- AuthMemRelationId 1261
5. pg_shdescription -- SharedDescriptionRelationId 2396
6. pg_shdepend -- SharedDependRelationId 1214
7. pg_shseclabel -- SharedSecLabelRelationId 3592
8. pg_db_role_setting -- DbRoleSettingRelationId 2694
9. pg_replication_origin -- ReplicationOriginRelationId 6000
10.
pg_subscription -- SubscriptionRelationId 6100 -- Thanks and Regards Mahendra Singh Thalor EnterpriseDB: http://www.enterprisedb.com > > I have just tried it with-out your patch: > > postgres=# SELECT * FROM pg_statio_all_tables where relid=1249; > relid | schemaname | relname| heap_blks_read | heap_blks_hit | > idx_blks_read | idx_blks_hit | toast_blks_read | toast_blks_hit | > tidx_blks_read | tidx_blks_hit > ---++--++---+---+--+-+++--- > 1249 | pg_catalog | pg_attribute | 29 | 522 | > 8 | 673 | ||| > (1 row) > > postgres=# select pg_stat_reset(); > pg_stat_reset > --- > > (1 row) > > postgres=# SELECT * FROM pg_statio_all_tables where relid=1249; > relid | schemaname | relname| heap_blks_read | heap_blks_hit | > idx_blks_read | idx_blks_hit | toast_blks_read | toast_blks_hit | > tidx_blks_read | tidx_blks_hit > ---++--++---+---+--+-+++--- > 1249 | pg_catalog | pg_attribute | 0 | 0 | > 0 |0 | ||| > > > We are able to reset the stats of pg_attibute without your patch. > > Thanks, > Himanshu > > On Fri, Aug 6, 2021 at 1:56 PM Sadhuprasad Patro wrote: >> >> Hi, >> >> The call to “pg_stat_reset“ does reset all the statistics data for >> tables belonging to the current database but does not take care of >> shared tables e.g pg_attribute. Similarly to reset the statistics at >> table level, the call to “pg_stat_reset_single_table_counters“ does >> not take care of shared tables. >> >> When we reset all the statistics using the call “pg_stat_reset”, the >> postgres process internally makes calls to “ >> pgstat_recv_resetcounter“, which resets the statistics of all the >> tables of the current database. But not resetting the statistics of >> shared objects using database ID as 'InvalidOid'. >> >> The same fix is made in the internal function >> “pgstat_recv_resetsinglecounter“ to reset the statistics for the >> shared table for the call "pg_stat_reset_single_table_counters". 
>> >> -- >> thank u >> SADHU PRASAD >> EnterpriseDB: http://www.enterprisedb.com -- Thanks and Regards Mahendra Singh Thalor EnterpriseDB: http://www.enterprisedb.com
Re: Support reset of Shared objects statistics in "pg_stat_reset" function
Hi Sadhu,

> The call to “pg_stat_reset“ does reset all the statistics data for
> tables belonging to the current database but does not take care of
> shared tables e.g pg_attribute.

pg_attribute is not a shared catalog; is the mentioned scenario also applicable to a few other tables?

I have just tried it without your patch:

postgres=# SELECT * FROM pg_statio_all_tables where relid=1249;
 relid | schemaname |   relname    | heap_blks_read | heap_blks_hit | idx_blks_read | idx_blks_hit | toast_blks_read | toast_blks_hit | tidx_blks_read | tidx_blks_hit
-------+------------+--------------+----------------+---------------+---------------+--------------+-----------------+----------------+----------------+---------------
  1249 | pg_catalog | pg_attribute |             29 |           522 |             8 |          673 |                 |                |                |
(1 row)

postgres=# select pg_stat_reset();
 pg_stat_reset
---------------

(1 row)

postgres=# SELECT * FROM pg_statio_all_tables where relid=1249;
 relid | schemaname |   relname    | heap_blks_read | heap_blks_hit | idx_blks_read | idx_blks_hit | toast_blks_read | toast_blks_hit | tidx_blks_read | tidx_blks_hit
-------+------------+--------------+----------------+---------------+---------------+--------------+-----------------+----------------+----------------+---------------
  1249 | pg_catalog | pg_attribute |              0 |             0 |             0 |            0 |                 |                |                |

We are able to reset the stats of pg_attribute without your patch.

Thanks, Himanshu

On Fri, Aug 6, 2021 at 1:56 PM Sadhuprasad Patro wrote:
> Hi,
>
> The call to “pg_stat_reset“ does reset all the statistics data for
> tables belonging to the current database but does not take care of
> shared tables e.g pg_attribute. Similarly to reset the statistics at
> table level, the call to “pg_stat_reset_single_table_counters“ does
> not take care of shared tables.
>
> When we reset all the statistics using the call “pg_stat_reset”, the
> postgres process internally makes calls to
> “pgstat_recv_resetcounter“, which resets the statistics of all the
> tables of the current database. But it does not reset the statistics of
> shared objects, which use 'InvalidOid' as the database ID.
>
> The same fix is made in the internal function
> “pgstat_recv_resetsinglecounter“ to reset the statistics for the
> shared table for the call "pg_stat_reset_single_table_counters".
> > -- > thank u > SADHU PRASAD > EnterpriseDB: http://www.enterprisedb.com >
Re: Support reset of Shared objects statistics in "pg_stat_reset" function
On Fri, 6 Aug 2021 at 13:56, Sadhuprasad Patro wrote: > > Hi, > > The call to “pg_stat_reset“ does reset all the statistics data for > tables belonging to the current database but does not take care of > shared tables e.g pg_attribute. Similarly to reset the statistics at > table level, the call to “pg_stat_reset_single_table_counters“ does > not take care of shared tables. > > When we reset all the statistics using the call “pg_stat_reset”, the > postgres process internally makes calls to “ > pgstat_recv_resetcounter“, which resets the statistics of all the > tables of the current database. But not resetting the statistics of > shared objects using database ID as 'InvalidOid'. > > The same fix is made in the internal function > “pgstat_recv_resetsinglecounter“ to reset the statistics for the > shared table for the call "pg_stat_reset_single_table_counters". > Hi Sadhu, I was looking into the patch. I will look more in the coming days. I have below comments. 1. Please can you share the test case to reproduce the issue and please add the test case in patch. 2. diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c index 56755cb92b..f272931276 100644 --- a/src/backend/postmaster/pgstat.c +++ b/src/backend/postmaster/pgstat.c @@ -5092,7 +5092,6 @@ pgstat_recv_resetcounter(PgStat_MsgResetcounter *msg, int len) * Lookup the database in the hashtable. Nothing to do if not there. */ dbentry = pgstat_get_db_entry(msg->m_databaseid, false); - if (!dbentry) return; I think, by mistake, you removed one line in the patch. -- Thanks and Regards Mahendra Singh Thalor EnterpriseDB: http://www.enterprisedb.com
Re: Added schema level support for publication.
On Fri, Aug 6, 2021 at 2:02 PM vignesh C wrote:
>
> On Wed, Aug 4, 2021 at 4:10 PM Amit Kapila wrote:
> >
> > On Tue, Aug 3, 2021 at 8:38 PM vignesh C wrote:
> > > 6.
> > + {PublicationSchemaRelationId,	/* PUBLICATIONSCHEMAMAP */
> > + 	PublicationSchemaPsnspcidPspubidIndexId,
> > + 	2,
> > + 	{
> > + 		Anum_pg_publication_schema_psnspcid,
> > + 		Anum_pg_publication_schema_pspubid,
> > + 		0,
> > + 		0
> > + 	},
> >
> > Why don't we keep pubid as the first column in this index?
>
> I wanted to keep it similar to PUBLICATIONRELMAP, should we keep it as
> it is, thoughts?
>

Okay, I see your point. I think for PUBLICATIONRELMAP, we need it
because it is searched using only the relid in GetRelationPublications;
similarly, in the patch, you are using schema_oid in
GetSchemaPublications, so probably that will help.

I was wondering why you haven't directly used the cache in
GetSchemaPublications similar to GetRelationPublications? It seems there
is a danger of concurrent object drop. Can you please check how the
safety is ensured, say, when either one wants to drop the corresponding
relation/schema or publication?

Another point is why can't we use the other index (where the index is on
relid or schema_oid (PublicationSchemaObjectIndexId))?

--
With Regards,
Amit Kapila.
Re: Added schema level support for publication.
On Fri, Aug 6, 2021 at 2:16 PM vignesh C wrote: > > On Thu, Aug 5, 2021 at 3:54 PM Amit Kapila wrote: > > > > > > Few more comments: > > === > > 1. Do we need the previous column 'puballtables' after adding pubtype > > or pubkind in pg_publication? > > I felt this should be retained as old client will still be using > puballtables, like in case of old client executing \dRp+ commands. > But do we guarantee that old clients work with newer server versions? For example, psql docs say: "psql works best with servers of the same or an older major version. Backslash commands are particularly likely to fail if the server is of a newer version than psql itself." [1] (See Notes Section). Similarly, pg_dump docs say: "However, pg_dump cannot dump from PostgreSQL servers newer than its own major version; it will refuse to even try, rather than risk making an invalid dump." [2] (See Notes Section). It seems Sawada-San has the same question and IIUC docs suggest we don't need such compatibility, so what makes you think we need it? [1] - https://www.postgresql.org/docs/devel/app-psql.html [2] - https://www.postgresql.org/docs/devel/app-pgdump.html -- With Regards, Amit Kapila.
RE: [PATCH]Comment improvement in publication.sql
Hi

I saw some inaccurate comments for the AlterPublicationStmt structure
when reviewing patches related to publication[1]. The variable 'tables'
is used for 'ALTER PUBLICATION ... ADD/DROP/SET TABLE', but the comments
only say 'ADD/DROP'. How about adding 'SET' to the comments?

typedef struct AlterPublicationStmt
{
	NodeTag		type;

	char	   *pubname;		/* Name of the publication */

	/* parameters used for ALTER PUBLICATION ... WITH */
	List	   *options;		/* List of DefElem nodes */

	/* parameters used for ALTER PUBLICATION ... ADD/DROP TABLE */
	List	   *tables;			/* List of tables to add/drop */
	bool		for_all_tables;	/* Special publication for all tables in db */
	DefElemAction tableAction;	/* What action to perform with the tables */
} AlterPublicationStmt;

It's also a comment improvement, so I added this change to this patch. A
new version patch is attached, please have a look.

[1] https://www.postgresql.org/message-id/OS0PR01MB61132C2C4E2232258EB192FDFBF19%40OS0PR01MB6113.jpnprd01.prod.outlook.com

Regards
Tang

v3-improvement-on-comment.patch
Description: v3-improvement-on-comment.patch
Re: Added schema level support for publication.
On Thu, Aug 5, 2021 at 3:54 PM Amit Kapila wrote: > > On Wed, Aug 4, 2021 at 4:10 PM Amit Kapila wrote: > > > > On Tue, Aug 3, 2021 at 8:38 PM vignesh C wrote: > > > > > > Thanks for reporting this, this is fixed in the v18 patch attached. > > > > > > > I have started looking into this patch and below are some initial comments. > > > > Few more comments: > === > 1. Do we need the previous column 'puballtables' after adding pubtype > or pubkind in pg_publication? I felt this should be retained as old client will still be using puballtables, like in case of old client executing \dRp+ commands. > 2. > @@ -224,6 +279,20 @@ CreatePublication(ParseState *pstate, > CreatePublicationStmt *stmt) > .. > + nspcrel = table_open(NamespaceRelationId, ShareUpdateExclusiveLock); > + PublicationAddSchemas(puboid, schemaoidlist, true, NULL); > + table_close(nspcrel, ShareUpdateExclusiveLock); > > What is the reason for opening and taking a lock on > NamespaceRelationId? Do you want to avoid dropping the corresponding > schema during this duration? If so, that is not sufficient because > what if somebody drops it just before you took lock on > NamespaceRelationId. I think you need to use LockDatabaseObject to > avoid dropping the schema and note that it should be unlocked only at > the end of the transaction similar to what we do for tables. I guess > you need to add this locking inside the function > PublicationAddSchemas. Also, it would be good if you can add few > comments in this part of the code to explain the reason for locking. Modified. > 3. The function PublicationAddSchemas is called from multiple places > in the patch but the locking protection is not there at all places. I > think if you add locking as suggested in the previous point then it > should be okay. I think you need similar locking for > PublicationDropSchemas. Modified. > 4. 
> @@ -421,16 +537,84 @@ AlterPublicationTables(AlterPublicationStmt > *stmt, Relation rel, > PublicationAddTables(pubid, rels, true, stmt); > > CloseTableList(delrels); > + if (pubform->pubtype == PUBTYPE_EMPTY) > + UpdatePublicationTypeTupleValue(rel, tup, > + Anum_pg_publication_pubtype, > + PUBTYPE_TABLE); > } > > At the above and all other similar places, the patch first updates the > pg_publication after performing the rel/schema operation. Isn't it > better to first update pg_publication to remain in sync with how > CreatePublication works? I am not able to see any issue with the way > you have it in the patch but it is better to keep the code consistent > across various similar functions to avoid confusion in the future. Modified. Thanks for the comments, the changes for the above are available in the v19 patch posted at [1]. [1] - https://www.postgresql.org/message-id/CALDaNm3BMLBpWOSdS3Q2vwpsM%3D0yovpJm8dKHRqNyFpANbrhpw%40mail.gmail.com Regards, Vignesh
Re: Added schema level support for publication.
On Wed, Aug 4, 2021 at 8:08 PM tanghy.f...@fujitsu.com < tanghy.f...@fujitsu.com> wrote: > > On Tuesday, August 3, 2021 11:08 PM vignesh C wrote: > > > > Thanks for reporting this, this is fixed in the v18 patch attached. > > Thanks for fixing it. > > Few suggestions for V18: > > 1. > +# Clean up the tables on both publisher and subscriber as we don't need them > +$node_publisher->safe_psql('postgres', "DROP SCHEMA sch1 cascade"); > +$node_publisher->safe_psql('postgres', "DROP SCHEMA sch2 cascade"); > +$node_publisher->safe_psql('postgres', "DROP SCHEMA sch3 cascade"); > +$node_subscriber->safe_psql('postgres', "DROP SCHEMA sch1 cascade"); > +$node_subscriber->safe_psql('postgres', "DROP SCHEMA sch2 cascade"); > +$node_subscriber->safe_psql('postgres', "DROP SCHEMA sch3 cascade"); > > Should we change the comment to "Clean up the schemas ... ", instead of 'tables'? Modified. > 2. > +$result = $node_subscriber->safe_psql('postgres', > +"SELECT count(*) FROM sch1.tab3"); > > Spaces are used here(and some other places), but in most places we use a TAB, so > I think it's better to change it to a TAB. Modified. > 3. > List *tables; /* List of tables to add/drop */ > boolfor_all_tables; /* Special publication for all tables in db */ > DefElemAction tableAction; /* What action to perform with the tables */ > + List *schemas;/* Optional list of schemas */ > } AlterPublicationStmt; > > Should we change the comment here to 'List of schemas to add/drop', then it can > be consistent with the comment for 'tables'. Modified. > I also noticed that 'tableAction' variable is used when we add/drop/set schemas, > so maybe the variable name is not suitable any more. Changed the variable name. > Besides, the following comment is above these codes. Should we add some comments > for schema? > > /* parameters used for ALTER PUBLICATION ... ADD/DROP TABLE */ Modified > And it says 'add/drop', do we need to add 'set'? 
(it's not related to this > patch, so I think I can add it in another thread[1] if needed, which is related > to comment improvement) You can include the change in the patch posted. > 4. > I saw the existing tests about permissions in publication.sql, should we add > tests for schema publication? Like this: > > diff --git a/src/test/regress/sql/publication.sql b/src/test/regress/sql/publication.sql > index 33dbdf7bed..c19337631e 100644 > --- a/src/test/regress/sql/publication.sql > +++ b/src/test/regress/sql/publication.sql > @@ -160,16 +160,19 @@ GRANT CREATE ON DATABASE regression TO regress_publication_user2; > SET ROLE regress_publication_user2; > SET client_min_messages = 'ERROR'; > CREATE PUBLICATION testpub2; -- ok > +CREATE PUBLICATION testpub3; -- ok > RESET client_min_messages; > > ALTER PUBLICATION testpub2 ADD TABLE testpub_tbl1; -- fail > +ALTER PUBLICATION testpub3 ADD SCHEMA pub_test; -- fail > > SET ROLE regress_publication_user; > GRANT regress_publication_user TO regress_publication_user2; > SET ROLE regress_publication_user2; > ALTER PUBLICATION testpub2 ADD TABLE testpub_tbl1; -- ok > +ALTER PUBLICATION testpub3 ADD SCHEMA pub_test; -- ok > > -DROP PUBLICATION testpub2; > +DROP PUBLICATION testpub2, testpub3; > > SET ROLE regress_publication_user; > REVOKE CREATE ON DATABASE regression FROM regress_publication_user2; Added. The changes for the above are available in the v19 patch posted at [1]. [1] - https://www.postgresql.org/message-id/CALDaNm3BMLBpWOSdS3Q2vwpsM%3D0yovpJm8dKHRqNyFpANbrhpw%40mail.gmail.com Regards, Vignesh
Gather performance analysis
Hi,

I have been working on analyzing the performance of sending tuples from
the workers to the Gather node through the tuple queue. In the past
there were many off-list discussions around this area; basically, the
main point is that when "shm_mq" was implemented, that was maybe one of
the best ways to implement this. But now we have other choices, like DSA
for allocating shared memory on demand, and shared temporary files for a
non-blocking tuple queue.

So my motivation for looking into this area is that, now that we have
another flexible alternative, can we use it to make Gather faster, and
if so:
1. Can we actually reduce the tuple transfer cost and enable parallelism
in more cases by reducing parallel_tuple_cost?
2. Can we use the tuple queue in more places, e.g., to implement a
redistribute operator where we need to transfer data between the
workers?

IMHO, for #1 it will be good enough if we can make the tuple transfer
faster, but for #2 we will have to provide a) faster tuple transfer,
because then we will have to transfer the tuples between the workers as
well, and b) an infinite non-blocking tuple queue (maybe using a shared
temp file) so that there is no deadlock while workers are redistributing
tuples to each other.

So I have done some quick performance tests and analysis using perf, and
some experiments with small prototypes targeting a different set of
problems.
--Setup
SET parallel_tuple_cost TO 0;	-- to test parallelism in the extreme case
CREATE TABLE t (a int, b varchar);
INSERT INTO t SELECT i, repeat('a', 200) from generate_series(1,2) as i;
ANALYZE t;

Test query: EXPLAIN ANALYZE SELECT * FROM t;

Perf analysis: Gather Node
- 43.57% shm_mq_receive
  - 78.94% shm_mq_receive_bytes
    - 91.27% pg_atomic_read_u64
      - pg_atomic_read_u64_impl
        - apic_timer_interrupt
            smp_apic_timer_interrupt

Perf analysis: Worker Node
- 99.14% shm_mq_sendv
  - 74.10% shm_mq_send_bytes
    + 42.35% shm_mq_inc_bytes_written
    - 32.56% pg_atomic_read_u64
      - pg_atomic_read_u64_impl
        - 86.27% apic_timer_interrupt
  + 17.93% WaitLatch

From the perf results and also from the code analysis, I can think of
two main problems here:

1. Synchronization between the worker and the gather node: just to
identify the bytes written and read, they need to do at least 2-3 atomic
operations for each tuple, and I think that has a huge penalty due to
a) frequent cache line invalidation and b) a lot of atomic operations.
2. If the tuple queue is full, the worker might need to wait for the
gather node to consume the tuple.

Experiment #1: As part of this experiment, I have modified the sender to
keep local copies of "mq_bytes_read" and "mq_bytes_written" in the local
mqh handle so that we don't need to frequently read/write these
cache-sensitive shared memory variables. So now we only read/write from
shared memory under the below conditions:
1) If the number of available bytes is not enough to send the tuple,
read the updated value of bytes read and also inform the reader about
the new writes.
2) After every 4k bytes written, update the shared memory variable and
inform the reader.
3) On detach, for sending any remaining data.
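[Editorial note] The batching idea in Experiment #1 can be sketched as a toy single-producer ring queue whose sender caches the shared counters locally and publishes "bytes written" only every 4kB, or when the queue looks full. This is a simplified, hypothetical model with invented names, not the actual shm_mq code: the real implementation must use atomic operations and memory barriers where this sketch uses plain loads and stores.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RING_SIZE   65536
#define FLUSH_EVERY 4096	/* mirror the 4kB batching from the experiment */

/* State that would live in shared memory. */
typedef struct
{
	uint64_t	bytes_written;	/* published by the sender */
	uint64_t	bytes_read;		/* published by the receiver */
	char		buf[RING_SIZE];
} shm_ring;

/* Sender-local handle caching the shared counters. */
typedef struct
{
	shm_ring   *ring;
	uint64_t	local_written;	/* writes not yet published */
	uint64_t	cached_read;	/* last bytes_read value we saw */
	int			shared_updates;	/* instrumentation: publishes performed */
} ring_sender;

static void
ring_flush(ring_sender *s)
{
	s->ring->bytes_written = s->local_written;	/* atomic store in real code */
	s->shared_updates++;
}

/* Returns 0 on success, -1 if the queue is full even after re-reading. */
static int
ring_send(ring_sender *s, const char *data, uint64_t len)
{
	uint64_t	used = s->local_written - s->cached_read;

	if (RING_SIZE - used < len)
	{
		/* Only now re-read the shared counter and publish our writes. */
		s->cached_read = s->ring->bytes_read;	/* atomic read in real code */
		ring_flush(s);
		used = s->local_written - s->cached_read;
		if (RING_SIZE - used < len)
			return -1;
	}
	for (uint64_t i = 0; i < len; i++)
		s->ring->buf[(s->local_written + i) % RING_SIZE] = data[i];
	s->local_written += len;

	/* Publish in batches instead of once per message. */
	if (s->local_written - s->ring->bytes_written >= FLUSH_EVERY)
		ring_flush(s);
	return 0;
}
```

In this model the sender touches the shared counters roughly once per ~40 small messages instead of 2-3 times per tuple, which is the traffic the perf profiles attribute to pg_atomic_read_u64.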
Machine information:
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
CPU(s):                56
On-line CPU(s) list:   0-55
Thread(s) per core:    2
Core(s) per socket:    14
Socket(s):             2
NUMA node(s):          2

Results: (query EXPLAIN ANALYZE SELECT * FROM t;)
1) Non-parallel (default)
   Execution Time: 31627.492 ms
2) Parallel with 4 workers (forced by setting parallel_tuple_cost to 0)
   Execution Time: 37498.672 ms
3) Same as above (2) but with the patch
   Execution Time: 23649.287 ms

Observations:
- As expected, the results show that forcing parallelism (by reducing
parallel_tuple_cost) drastically impacts performance.
- But in the same scenario, with the patch, we can see a huge gain of ~40%.
- Even if we compare it with the non-parallel plan, we have a gain of ~25%.
- With this, I think we can conclude that there is huge potential for
improvement if we communicate the tuples in batches: 1) one simple
approach is what I used in my experiment; I think we can do some
optimization in the reader as well, so that instead of reading
bytes_written every time from shared memory it remembers the previous
value, and only once that is exhausted does it read back the updated
value from shared memory. 2) Instead of copying the whole tuple into the
tuple queue, we can store the dsa_pointers of the tuple batch; I think
Thomas Munro also suggested a similar approach to Robert (I got to know
this in an off-list discussion with Robert).

Experiment #2: See the behavior when increasing the parallel tuple queue
size on head (for this I created a small patch to make the
parallel_tuple_queue size configurable).

-- Results
4 WORKERS (tup_queue size = 64kB) : 38337.046 ms
4 WORKERS
Re: Added schema level support for publication.
On Wed, Aug 4, 2021 at 12:08 AM vignesh C wrote:
>
> On Tue, Aug 3, 2021 at 12:00 PM tanghy.f...@fujitsu.com wrote:
> >
> > On Monday, August 2, 2021 11:40 PM vignesh C wrote:
> > >
> > > Thanks for the comments, attached v17 patches has the fixes for the same.
> >
> > Thanks for your new patch.
> >
> > I saw the following warning when compiling. It seems we don't need this
> > variable any more.
> >
> > publicationcmds.c: In function ‘AlterPublicationSchemas’:
> > publicationcmds.c:592:15: warning: unused variable ‘oldlc’ [-Wunused-variable]
> >    ListCell *oldlc;
> >              ^
>
> Thanks for reporting this, this is fixed in the v18 patch attached.

I've also started reviewing this patch. I've not looked at the patch
yet, but here are initial comments/questions based on using this
feature:

The pg_publication catalog still has the puballtables column, but is it
still necessary? IIUC, since pubtype = 'a' means publishing all tables
in the database, puballtables seems no longer necessary.

---
Suppose that a parent table and its child table are defined in different
schemas, there is a publication for the schema where only the parent
table is defined, and the subscriber subscribes to the publication;
should changes for its child table be replicated to the subscriber? In
FOR TABLE cases, i.e., where the subscriber subscribes to the
publication that is only for the parent table, changes for its child
table are replicated to the subscriber. As far as I tested the v18
patch, changes for the child table are not replicated in FOR SCHEMA
cases.
Here is the test script:

On publisher and subscriber:
create schema p_schema;
create schema c_schema;
create table p_schema.p (a int) partition by list (a);
create table c_schema.c partition of p_schema.p for values in (1);

On publisher:
create publication pub_p_schema for schema p_schema;

On subscriber:
create subscription pub connection 'dbname=postgres' publication pub_p_schema;

On publisher:
insert into p_schema.p values (1);
select * from p_schema.p;
 a
---
 1
(1 row)

On subscriber:
select * from p_schema.p;
 a
---
(0 rows)

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
Support reset of Shared objects statistics in "pg_stat_reset" function
Hi,

The call to “pg_stat_reset“ does reset all the statistics data for
tables belonging to the current database, but it does not take care of
shared tables, e.g. pg_attribute. Similarly, to reset the statistics at
table level, the call to “pg_stat_reset_single_table_counters“ does
not take care of shared tables.

When we reset all the statistics using the call “pg_stat_reset”, the
postgres process internally makes calls to
“pgstat_recv_resetcounter“, which resets the statistics of all the
tables of the current database, but it does not reset the statistics of
shared objects, which are stored under the database ID 'InvalidOid'.

The same fix is made in the internal function
“pgstat_recv_resetsinglecounter“ to reset the statistics for the
shared table for the call "pg_stat_reset_single_table_counters".

--
thank u
SADHU PRASAD
EnterpriseDB: http://www.enterprisedb.com

0001-pg_stat_reset-and-pg_stat_reset_single_table_counter.patch
Description: Binary data
Re: archive status ".ready" files may be created too early
At Fri, 6 Aug 2021 00:21:34 +, "Bossart, Nathan" wrote in
> On 8/5/21, 12:39 AM, "Kyotaro Horiguchi" wrote:
> > Honestly I don't like having this initialization in XLogWrite. We
> > should and I think can initialize it earlier. It seems to me the most
> > appropriate timing to initialize the variable is just before running
> > the end-of-recovery checkpoint, since StartupXLOG knows the first
> > segment to write. If it were set to 0, that doesn't matter at all.
> > We can get rid of the new symbol by doing this.
>
> This seems like a good idea to me. I made this change in v5. I
> performed some basic testing, and it seems to reliably initialize
> lastNotifiedSeg correctly.
>
> > + /*
> > +  * EndOfLog resides on the next segment of the last finished one. Set the
> > +  * last finished segment as lastNotifiedSeg now. In the case where the
> > +  * last crash has left the last several segments not being marked as
> > +  * .ready, the checkpoint just after does that for all finished segments.
> > +  * There's a corner case where the checkpoint advances segment, but that
> > +  * ends up at most with a duplicate archive notification.
> > +  */
>
> I'm not quite following the corner case you've described here. Is it
> possible that the segment that EndOfLog points to will be eligible for
> removal after the checkpoint?

Archiving doesn't immediately mean removal. A finished segment ought to
be archived right away. Since the EndOfLog segment must not get marked
.ready, setting lastNotifiedSeg to the previous segment is quite right,
but if the end-of-recovery checkpoint advances the segment, EndOfLog is
marked .ready at the XLogFlush just after. But, sorry, what I forgot at
the time was that the checkpoint also moves lastNotifiedSeg. So, sorry,
that corner case does not exist.

> In v5 of the patch, I've also added an extra call to
> NotifySegmentsReadyForArchive() in the same place we previously
> created the .ready files.
> I think this helps notify the archiver sooner
> in certain cases (e.g., asynchronous commit).

In v5, NotifySegmentsReadyForArchive() still holds ArchNotifyLock
including the .ready file creations. Since the notification loop doesn't
need the hash itself, can the loop be taken out of the lock section?

current:

	LWLockAcquire(ArchNotifyLock, LW_EXCLUSIVE);
	last_notified = GetLastNotifiedSegment();
	latest_boundary_seg = GetLatestRecordBoundarySegment(last_notified, flushed, &found);
	if (found)
	{
		SetLastNotifiedSegment(latest_boundary_seg - 1);
		for (seg = last_notified + 1; seg < latest_boundary_seg; seg++)
			XLogArchiveNotifySeg(seg, false);
		RemoveRecordBoundariesUpTo(latest_boundary_seg);
		PgArchWakeup();
	}
	LWLockRelease(ArchNotifyLock);

But we can release the lock earlier.

	LWLockAcquire(ArchNotifyLock, LW_EXCLUSIVE);
	last_notified = GetLastNotifiedSegment();
	latest_boundary_seg = GetLatestRecordBoundarySegment(last_notified, flushed, &found);
	if (found)
	{
		SetLastNotifiedSegment(latest_boundary_seg - 1);
		RemoveRecordBoundariesUpTo(latest_boundary_seg);
	}
	LWLockRelease(ArchNotifyLock);

	if (found)
	{
		for (seg = last_notified + 1; seg < latest_boundary_seg; seg++)
			XLogArchiveNotifySeg(seg, false);
		PgArchWakeup();
	}

--
Kyotaro Horiguchi
NTT Open Source Software Center
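[Editorial note] The restructuring proposed in the message above follows a common pattern: claim the range of work while holding the lock, then perform the slow per-segment notifications after releasing it. A minimal self-contained sketch of that pattern, with hypothetical names (this is not the actual xlog/archiver code; notify_segment() stands in for creating a .ready file):

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t notify_lock = PTHREAD_MUTEX_INITIALIZER;
static int last_notified = 0;	/* shared: highest segment already notified */
static int notified_count = 0;	/* instrumentation: notifications performed */

static void
notify_segment(int seg)
{
	/* would create a .ready file here: slow filesystem work */
	(void) seg;
	notified_count++;
}

static void
notify_up_to(int latest_boundary_seg)
{
	int			from, to;

	pthread_mutex_lock(&notify_lock);
	from = last_notified + 1;
	to = latest_boundary_seg - 1;
	if (to >= from)
		last_notified = to;		/* claim the range while locked */
	pthread_mutex_unlock(&notify_lock);

	/* slow part runs without the lock; the claimed range is exclusively ours */
	for (int seg = from; seg <= to; seg++)
		notify_segment(seg);
}
```

Because the shared cursor is advanced before the lock is released, two concurrent callers can never notify the same segment twice, yet neither holds the lock during the filesystem work.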
Re: [BUG] wrong refresh when ALTER SUBSCRIPTION ADD/DROP PUBLICATION
On Fri, Aug 6, 2021 at 1:39 PM Amit Kapila wrote:
>
> On Fri, Aug 6, 2021 at 5:09 AM Masahiko Sawada wrote:
> >
> > On Thu, Aug 5, 2021 at 11:40 PM houzj.f...@fujitsu.com wrote:
> > >
> > > > To summarize, I think that what we want to do in DROP SUBSCRIPTION cases is to
> > > > drop relations from pg_subscription_rel that are no longer included in the set of
> > > > after-calculated publications. In the latter example, when ALTER
> > > > SUBSCRIPTION ... DROP PUBLICATION pub12, we should compare the current
> > > > set of relations associated with {pub12, pub13} to the set of relations associated
> > > > with pub13, not pub12. Then we can find out t2 is no longer associated with the
> > > > after-calculated publication (i.e., pub13), so remove it.
> > >
> > > Thanks for the review. You are right, I missed this point.
> > > Attach new version patch which fixes these problems (also adds a new pl-test).
> > >
> >
> > Thank you for updating the patch!
> >
> > Here are some comments:
> >
> > I think that it still could happen that DROP PUBLICATION misses
> > dropping relations. Please consider the following case:
> >
> > On publisher and subscriber:
> > create table t1 (a int);
> > create table t2 (a int);
> > create table t3 (a int);
> >
> > On publisher:
> > create publication pub12 for table t1, t2;
> > create publication pub3 for table t3;
> >
> > On subscriber:
> > create subscription sub connection 'dbname=postgres' publication pub12, pub3;
> >
> > On publisher:
> > alter publication pub3 add table t1;
> >
> > On subscriber:
> > alter subscription sub drop publication pub12;
> > select srsubid, srrelid::regclass::text, srsubstate, srsublsn from
> > pg_subscription_rel;
> >  srsubid | srrelid | srsubstate | srsublsn
> > ---------+---------+------------+-----------
> >    16393 | t3      | r          | 0/1707CE0
> >    16393 | t1      | r          | 0/1707D18
> >
> > With v3 patch, pg_subscription_rel has entries for t3 and t1. But if
> > ALTER SUBSCRIPTION ...
> > DROP PUBLICATION pub12 refreshes only relations
> > associated with pub12, t1 should be removed and be added back when we
> > refresh pub3.
> >
>
> But, isn't this happening because of your suggestion to compare the
> current set of relations with relations from publications that don't
> need to be removed?

Hmm yes, it cannot cover all cases. I had somehow misunderstood that the
subscriber knows which relations are associated with which publications.
Given that the subscriber doesn't know which relations are associated
with which publications (and the set of subscribed relations on the
subscriber becomes out of date once the publication is updated), I think
it's impossible to achieve that DROP PUBLICATION drops relations that
are associated with only the publication being dropped.

> Do we have better ideas to fix this, or shall we just
> say that we will refresh based on whatever current set of relations
> is present in the publication to be dropped?

I don't have better ideas. It seems to me that it's prudent that we
accept the approach in the v3 patch and refresh all publications in DROP
PUBLICATION cases.

> One more thing, I think it would be better to write a separate refresh
> function for add/drop rather than adding checks in the current refresh
> functionality. We can extract common functionality code which can be
> called from the different refresh functions.

+1

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
Re: RFC: Improve CPU cache locality of syscache searches
Hi,

On Thu, Aug 5, 2021, at 22:20, Yura Sokolov wrote:
> Andres Freund wrote 2021-08-06 06:49:
> > Hi,
> >
> > On 2021-08-06 06:43:55 +0300, Yura Sokolov wrote:
> >> Why not use simplehash or something like that? Open-addressing schemes
> >> show superior cache locality.
> >
> > I thought about that as well - but it doesn't really resolve the question of
> > what we want to store in-line in the hashtable and what not. We can't store
> > the tuples themselves in the hashtable for a myriad of reasons (need pointer
> > stability, they're variably sized, way too large to move around frequently).
> >
> >
> >> Well, a simplehash entry will be 24 bytes this way. If the simplehash template
> >> supported external key/element storage, then it could be shrunk to 16 bytes,
> >> and syscache entries would not need a dlist_node. (But it doesn't at the
> >> moment.)
> >
> > I think storing keys outside of the hashtable entry defeats the purpose of the
> > open addressing, given that they are always checked and that our conflict
> > ratio should be fairly low.
>
> It's the opposite: if the conflict ratio were high, then a key outside of
> the hashtable would be expensive, since a lookup on a non-matching key
> would cost an excess memory access. But with a low conflict ratio we will
> usually hit the matching entry at the first probe. And since we will use
> the entry soon, it doesn't matter when it goes to the CPU L1 cache:
> during the lookup or during actual usage.

Often enough it does matter, because there will be earlier dependencies
on whether a lookup is a cache hit/miss than on the content of the
cached tuple.

Regards,

Andres
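[Editorial note] A minimal sketch of the inline-key layout under discussion: an open-addressing table whose slots carry a copy of the key next to a stable pointer to the out-of-line entry, so a probe can reject a non-matching slot without dereferencing the entry pointer. All names here are hypothetical; this is not PostgreSQL's simplehash API, just an illustration of the trade-off.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#define TABLE_SIZE 1024			/* power of two; kept sparse for few conflicts */

typedef struct CacheTuple		/* the real entry: variably sized, never moved */
{
	uint64_t	key;
	int			payload;
} CacheTuple;

typedef struct Slot
{
	uint64_t	key;			/* inline copy: checked without chasing tuple */
	CacheTuple *tuple;			/* stable pointer to the out-of-line entry */
} Slot;

static Slot table[TABLE_SIZE];

static uint64_t
hash_u64(uint64_t k)
{
	k *= 0x9e3779b97f4a7c15u;	/* Fibonacci hashing; good enough for a demo */
	return k >> 54;				/* top 10 bits -> [0, 1024) */
}

static void
ht_insert(CacheTuple *t)
{
	uint64_t	i = hash_u64(t->key);

	while (table[i].tuple != NULL)	/* linear probing */
		i = (i + 1) % TABLE_SIZE;
	table[i].key = t->key;
	table[i].tuple = t;
}

static CacheTuple *
ht_lookup(uint64_t key)
{
	uint64_t	i = hash_u64(key);

	while (table[i].tuple != NULL)
	{
		if (table[i].key == key)	/* inline key: decided within the slot */
			return table[i].tuple;
		i = (i + 1) % TABLE_SIZE;
	}
	return NULL;
}
```

A probe that misses on the inline key skips the entry dereference entirely, which is the cache-locality argument for open addressing; storing the key out of line would turn every collision check into an extra dependent load.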