Re: Improve handling of parameter differences in physical replication
On 2020-02-28 08:45, Michael Paquier wrote: On Thu, Feb 27, 2020 at 02:37:24PM +0100, Peter Eisentraut wrote: On 2020-02-27 11:13, Fujii Masao wrote: Btw., I think the current setup is slightly buggy. The MaxBackends value that is used to size shared memory is computed as MaxConnections + autovacuum_max_workers + 1 + max_worker_processes + max_wal_senders, but we don't track autovacuum_max_workers in WAL. Maybe this is because autovacuum doesn't work during recovery? Autovacuum on the primary can use locks or xids, and so it's possible that the standby when processing WAL encounters more of those than it has locally allocated shared memory to handle. Putting aside your patch because that sounds like a separate issue.. Doesn't this mean that autovacuum_max_workers should be added to the control file, that we need to record in WAL any updates done to it and that CheckRequiredParameterValues() is wrong? That would be a direct fix, yes. Perhaps it might be better to track the combined MaxBackends instead, however. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
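For reference, the value being discussed can be eyeballed from SQL by adding up the relevant GUCs. This is only an illustrative sketch of the formula quoted above, not how the server actually computes MaxBackends internally (that happens in C at startup):

SELECT current_setting('max_connections')::int
     + current_setting('autovacuum_max_workers')::int
     + 1
     + current_setting('max_worker_processes')::int
     + current_setting('max_wal_senders')::int AS approx_max_backends;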
Re: more ALTER .. DEPENDS ON EXTENSION fixes
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: tested, passed Documentation: not tested Tested the pg_dump patch for dumping "ALTER .. DEPENDS ON EXTENSION" for indexes, functions, triggers, etc. The "ALTER .. DEPENDS ON EXTENSION" statements are included in the dump. However, in some cases I am not sure why "ALTER INDEX .. DEPENDS ON EXTENSION" is repeated several times in the dump.
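For context, a minimal sketch of the kind of statement being dumped (the table, index, and extension names here are made up, and pg_trgm is only used as an example extension):

CREATE EXTENSION pg_trgm;
CREATE TABLE t (txt text);
CREATE INDEX t_txt_idx ON t USING gin (txt gin_trgm_ops);
ALTER INDEX t_txt_idx DEPENDS ON EXTENSION pg_trgm;

One would expect pg_dump to emit the ALTER INDEX ... DEPENDS ON EXTENSION line once per marked index, so the repeated lines mentioned above do look worth investigating.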
Re: Improve handling of parameter differences in physical replication
On Thu, Feb 27, 2020 at 02:37:24PM +0100, Peter Eisentraut wrote: > On 2020-02-27 11:13, Fujii Masao wrote: >>> Btw., I think the current setup is slightly buggy. The > MaxBackends value that is used to size shared memory is computed as > MaxConnections + autovacuum_max_workers + 1 + max_worker_processes + > max_wal_senders, but we don't track autovacuum_max_workers in WAL. >> Maybe this is because autovacuum doesn't work during recovery? > > Autovacuum on the primary can use locks or xids, and so it's possible that > the standby when processing WAL encounters more of those than it has locally > allocated shared memory to handle. Putting aside your patch because that sounds like a separate issue.. Doesn't this mean that autovacuum_max_workers should be added to the control file, that we need to record in WAL any updates done to it and that CheckRequiredParameterValues() is wrong? -- Michael signature.asc Description: PGP signature
Re: Make message at end-of-recovery less scary.
On Fri, Feb 28, 2020 at 04:01:00PM +0900, Kyotaro Horiguchi wrote: > Hello, this is a followup thread of [1]. > > # I didn't noticed that the thread didn't cover -hackers.. > > When recovery of any type ends, we see several kinds of error messages > that says "WAL is broken". Have you considered an error context here? Your patch leads to a bit of duplication with the message a bit down of what you are changing where the end of local pg_wal is reached. > + * reached the end of WAL. Otherwise something's really wrong and > + * we report just only the errormsg if any. If we don't receive This sentence sounds strange to me. Or you meant "Something is wrong, so use errormsg as report if it is set"? > + * Note: errormsg is alreay translated. Typo here. > + if (StandbyMode) > + ereport(actual_emode, > + (errmsg ("rached end of WAL at %X/%X on timeline %u in > %s during streaming replication", StandbyMode happens also with only WAL archiving, depending on if primary_conninfo is set or not. > + (errmsg ("rached end of WAL at %X/%X on timeline %u in %s during crash > recovery", FWIW, you are introducing three times the same typo, in the same word, in three different messages. -- Michael signature.asc Description: PGP signature
Re: Implementing Incremental View Maintenance
On Thu, 27 Feb 2020 14:35:55 -0700 (MST) legrand legrand wrote: > > I have tried to use an other patch with yours: > > "Planning counters in pg_stat_statements (using pgss_store)" > > > setting > > shared_preload_libraries='pg_stat_statements' > > pg_stat_statements.track=all > > restarting the cluster and creating the extension > > > > When trying following syntax: > > > create table b1 (id integer, x numeric(10,3)); > > create incremental materialized view mv1 as select id, count(*),sum(x) > > from b1 group by id; > > insert into b1 values (1,1) > > > > I got an ASSERT FAILURE in pg_stat_statements.c > > on > > Assert(query != NULL); > > > > comming from matview.c > > refresh_matview_datafill(dest_old, query, queryEnv, NULL); > > or > > refresh_matview_datafill(dest_new, query, queryEnv, NULL); > > > > If this (last) NULL field was replaced by the query text, > > a comment or just "n/a", > > it would fix the problem. > > > Could this be investigated ? > > Hello, > > thank you for patch v14, that fix problems inherited from temporary tables. > it seems that this ASSERT problem with pgss patch is still present ;o( > > Could we have a look ? Sorry, but we are busy fixing and improving the IVM patches. I think fixing the assertion failure needs non-trivial changes to other parts of PostgreSQL. So we would like to work on the issue you reported after the pgss patch gets committed. Best Regards, Takuma Hoshiai > Thanks in advance > Regards > PAscal > > > > -- > Sent from: > https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html > > > > > > -- > Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html > > > -- Takuma Hoshiai
Re: Assert failure due to "drop schema pg_temp_3 cascade" for temporary tables and \d+ is not showing any info after dropping temp table schema
On Fri, Feb 28, 2020 at 11:29:56AM +0530, Mahendra Singh Thalor wrote: > Patch looks good to me and it is fixing the assert failure. Thanks for looking at the patch. Bringing the code to act consistently with what was done in 246a6c8 still looks like the correct direction to take, in short don't drop temp relations created without an existing temp schema and ignore them instead of creating more issues with the storage, so I'd like to apply and back-patch this stuff down to 11. First, let's wait a couple of extra days. -- Michael signature.asc Description: PGP signature
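For anyone trying to reproduce the behavior in the subject line, the scenario is roughly this sketch (run as a superuser; the pg_temp_N schema name depends on which backend slot the session gets):

CREATE TEMP TABLE tmp_demo (a int);
SELECT pg_my_temp_schema()::regnamespace;   -- e.g. pg_temp_3
DROP SCHEMA pg_temp_3 CASCADE;              -- substitute the schema reported above
\d+ tmp_demo                                -- no longer shows any info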
Make message at end-of-recovery less scary.
Hello, this is a followup thread of [1]. # I didn't noticed that the thread didn't cover -hackers.. When recovery of any type ends, we see several kinds of error messages that says "WAL is broken". > LOG: invalid record length at 0/7CB6BC8: wanted 24, got 0 > LOG: redo is not required > LOG: database system is ready to accept connections This patch reduces the scariness of such messages as the follows. > LOG: rached end of WAL at 0/1551048 on timeline 1 in pg_wal during crash > recovery > DETAIL: invalid record length at 0/1551048: wanted 24, got 0 > LOG: redo is not required > LOG: database system is ready to accept connections [1] https://www.postgresql.org/message-id/20200117.172655.1384889922565817808.horikyota.ntt%40gmail.com I'll register this to the coming CF. regards. -- Kyotaro Horiguchi NTT Open Source Software Center >From f3692cb484b7f1ebc351ba8a522039c0b91bcfdb Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Fri, 28 Feb 2020 15:52:58 +0900 Subject: [PATCH] Make End-Of-Recovery error less scary When recovery in any type ends, we see a bit scary error message like "invalid record length" that suggests something serious is happening. Make this message less scary as "reached end of WAL". --- src/backend/access/transam/xlog.c | 45 ++- 1 file changed, 38 insertions(+), 7 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index d19408b3be..452c376f62 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -4288,6 +4288,10 @@ ReadRecord(XLogReaderState *xlogreader, int emode, EndRecPtr = xlogreader->EndRecPtr; if (record == NULL) { + int actual_emode = +emode_for_corrupt_record(emode, + ReadRecPtr ? ReadRecPtr : EndRecPtr); + if (readFile >= 0) { close(readFile); @@ -4295,14 +4299,41 @@ ReadRecord(XLogReaderState *xlogreader, int emode, } /* - * We only end up here without a message when XLogPageRead() - * failed - in that case we already logged something. In - * StandbyMode that only happens if we have been triggered, so we - * shouldn't loop anymore in that case. + * randAccess here means we are reading successive records during + * recovery. If we get here during recovery, we can assume that we + * reached the end of WAL. Otherwise something's really wrong and + * we report just only the errormsg if any. If we don't receive + * errormsg here, we already logged something. We don't emit + * "reached end of WAL" in muted messages. + * + * Note: errormsg is alreay translated. */ - if (errormsg) -ereport(emode_for_corrupt_record(emode, EndRecPtr), - (errmsg_internal("%s", errormsg) /* already translated */ )); + if (!private->randAccess && actual_emode == emode) + { +if (StandbyMode) + ereport(actual_emode, + (errmsg ("rached end of WAL at %X/%X on timeline %u in %s during streaming replication", + (uint32) (EndRecPtr >> 32), (uint32) EndRecPtr, + ThisTimeLineID, + xlogSourceNames[currentSource]), + (errormsg ? errdetail_internal("%s", errormsg) : 0))); +else if (InArchiveRecovery) + ereport(actual_emode, + (errmsg ("rached end of WAL at %X/%X on timeline %u in %s during archive recovery", + (uint32) (EndRecPtr >> 32), (uint32) EndRecPtr, + ThisTimeLineID, + xlogSourceNames[currentSource]), + (errormsg ? errdetail_internal("%s", errormsg) : 0))); +else + ereport(actual_emode, + (errmsg ("rached end of WAL at %X/%X on timeline %u in %s during crash recovery", + (uint32) (EndRecPtr >> 32), (uint32) EndRecPtr, + ThisTimeLineID, + xlogSourceNames[currentSource]), + (errormsg ? 
errdetail_internal("%s", errormsg) : 0))); + } + else if (errormsg) +ereport(actual_emode, (errmsg_internal("%s", errormsg))); } /* -- 2.18.2
Re: Trying to pull up EXPR SubLinks
On Fri, Feb 28, 2020 at 2:35 PM Richard Guo wrote: > Hi All, > > Currently we will not consider EXPR_SUBLINK when pulling up sublinks and > this would cause performance issues for some queries with the form of: > 'a > (SELECT agg(b) from ...)' as described in [1]. > > So here is a patch as an attempt to pull up EXPR SubLinks. The idea, > which is based on Greenplum's implementation, is to perform the > following transformation. > > For query: > > select * from foo where foo.a > > (select avg(bar.a) from bar where foo.b = bar.b); > > we transform it to: > > select * from foo inner join > (select bar.b, avg(bar.a) as avg from bar group by bar.b) sub > on foo.b = sub.b and foo.a > sub.avg; > Glad to see this. I think the hard part is that this transform is not *always* good. For example, if foo has only 1 row but bar has a lot of rows, the original form would be the better one. Does this patch consider this problem? > Thanks > Richard >
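To make the concern above concrete, here is a minimal sketch (data sizes made up): with a single row in foo, the original correlated subplan scans bar once and aggregates only the rows matching that one b value, whereas the transformed query has to group the whole of bar by b first.

CREATE TABLE foo (a numeric, b int);
CREATE TABLE bar (a numeric, b int);
INSERT INTO foo VALUES (1, 1);
INSERT INTO bar SELECT i, i % 1000 FROM generate_series(1, 1000000) i;
ANALYZE foo;
ANALYZE bar;

EXPLAIN ANALYZE
SELECT * FROM foo
WHERE foo.a > (SELECT avg(bar.a) FROM bar WHERE foo.b = bar.b);

Comparing the plan and timing with and without the pull-up on such data would show whether the transformation needs a costing guard.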
Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line
On Thu, Feb 27, 2020 at 06:29:34PM +0300, Alexey Kondratov wrote: > On 2020-02-27 16:41, Alexey Kondratov wrote: > > > > New version of the patch is attached. Thanks again for your review. > > > > Last patch (v18) got a conflict with one of today commits (05d8449e73). > Rebased version is attached. The shape of the patch is getting better. I have found some issues when reading through the patch, but nothing huge. + printf(_(" -c, --restore-target-wal use restore_command in target config\n")); + printf(_(" to retrieve WAL files from archive\n")); [...] {"progress", no_argument, NULL, 'P'}, + {"restore-target-wal", no_argument, NULL, 'c'}, It may be better to reorder that alphabetically. + if (rc != 0) + /* Sanity check, should never happen. */ + elog(ERROR, "failed to build restore_command due to missing parameters"); No point in having this comment IMO. +/* logging support */ +#define pg_fatal(...) do { pg_log_fatal(__VA_ARGS__); exit(1); } while(0) Actually, I don't think that this is a good idea to name this pg_fatal() as we have the same think in pg_rewind so it could be confusing. - while ((c = getopt_long(argc, argv, "D:nNPR", long_options, &option_index)) != -1) + while ((c = getopt_long(argc, argv, "D:nNPRc", long_options, &option_index)) != -1) Alphabetical order here. + rmdir($node_master->archive_dir); rmtree() is used in all our other tests. + pg_log_error("archive file \"%s\" has wrong size: %lu instead of %lu, %s", +xlogfname, (unsigned long) stat_buf.st_size, +(unsigned long) expectedSize, strerror(errno)); I think that the error message should be reworded: "unexpected WAL file size for \"%s\": %lu instead of %lu". Please note that there is no need for strerror() here at all, as errno should be 0. +if (xlogfd < 0) +pg_log_error("could not open file \"%s\" restored from archive: %s\n", + xlogpath, strerror(errno)); [...] +pg_log_error("could not stat file \"%s\" restored from archive: %s", +xlogpath, strerror(errno)); No need for strerror() as you can just use %m. And no need for the extra newline at the end as pg_log_* routines do that by themselves. + pg_log_error("could not restore file \"%s\" from archive\n", +xlogfname); No need for a newline here. -- Michael signature.asc Description: PGP signature
Trying to pull up EXPR SubLinks
Hi All,

Currently we will not consider EXPR_SUBLINK when pulling up sublinks and
this would cause performance issues for some queries with the form of:
'a > (SELECT agg(b) from ...)' as described in [1].

So here is a patch as an attempt to pull up EXPR SubLinks. The idea,
which is based on Greenplum's implementation, is to perform the
following transformation.

For query:

select * from foo where foo.a >
    (select avg(bar.a) from bar where foo.b = bar.b);

we transform it to:

select * from foo inner join
    (select bar.b, avg(bar.a) as avg from bar group by bar.b) sub
    on foo.b = sub.b and foo.a > sub.avg;

To do that, we recurse through the quals in sub-select and extract quals
of form 'foo(outervar) = bar(innervar)' and then according to innervars
we make new SortGroupClause items and TargetEntry items for sub-select.
And at last we pull up the sub-select into upper range table.

As a result, the plan would change as:

FROM

                   QUERY PLAN
------------------------------------------------
 Seq Scan on foo
   Filter: ((a)::numeric > (SubPlan 1))
   SubPlan 1
     ->  Aggregate
           ->  Seq Scan on bar
                 Filter: (foo.b = b)
(6 rows)

TO

                    QUERY PLAN
--------------------------------------------------
 Hash Join
   Hash Cond: (foo.b = bar.b)
   Join Filter: ((foo.a)::numeric > (avg(bar.a)))
   ->  Seq Scan on foo
   ->  Hash
         ->  HashAggregate
               Group Key: bar.b
               ->  Seq Scan on bar
(8 rows)

The patch works but still in draft stage. Post it here to see if it is
the right thing we want.

[1] https://www.postgresql.org/message-id/flat/CAKU4AWodctmbU%2BZj6U83y_RniQk0UeXBvKH1ZaJ%3DLR_iC90GOw%40mail.gmail.com

Thanks
Richard

v1-0001-Draft-PR-for-pulling-up-EXPR_SUBLINK.patch
Description: Binary data
Re: Add LogicalTapeSetExtend() to logtape.c
On Thu, Feb 27, 2020 at 01:01:08PM -0800, Jeff Davis wrote: > Attached is a patch that exports a new function from logtape.c: > >extern LogicalTapeSet *LogicalTapeSetExtend( > LogicalTapeSet *lts, int nAdditional); > > The purpose is to allow adding new tapes dynamically for the upcoming > Hash Aggregation work[0]. HashAgg doesn't know in advance how many > tapes it will need, though only a limited number are actually active at > a time. > > This was proposed and originally written by Adam Lee[1] (extract only > the changes to logtape.c/h from his posted patch). Strangely, I'm > seeing ~5% regression with his version when doing: > > -- t20m_1_int4 has 20 million random integers > select * from t20m_1_int4 order by i offset 1; > > Which seems to be due to using a pointer rather than a flexible array > member (I'm using gcc[2]). My version (attached) still uses a flexible > array member, which doesn't see the regression; but I repalloc the > whole struct so the caller needs to save the new pointer to the tape > set. > > That doesn't entirely make sense to me, and I'm wondering if someone > else can repro that result and/or make a suggestion, because I don't > have a good explanation. I'm fine with my version of the patch, but it > would be nice to know why there's such a big difference using a pointer > versus a flexible array member. > > Regards, > Jeff Davis I noticed another difference, I was using palloc0(), which could be one of the reason, but not sure. Tested your hashagg-20200226.patch on my laptop(Apple clang version 11.0.0), the average time is 25.9s: ``` create table t20m_1_int4(i int); copy t20m_1_int4 from program 'shuf -i 1-4294967295 -n 2000'; analyze t20m_1_int4; ``` ``` explain analyze select * from t20m_1_int4 order by i offset 1; QUERY PLAN --- Limit (cost=3310741.20..3310741.20 rows=1 width=4) (actual time=25666.471..25666.471 rows=0 loops=1) -> Sort (cost=3260740.96..3310741.20 rows=2096 width=4) (actual time=20663.065..24715.269 rows=2000 loops=1) Sort Key: i Sort Method: external merge Disk: 274056kB -> Seq Scan on t20m_1_int4 (cost=0.00..288496.96 rows=2096 width=4) (actual time=0.075..2749.385 rows=2000 loops=1) Planning Time: 0.109 ms Execution Time: 25911.542 ms (7 rows) ``` But if use the palloc0() or do the MemSet() like: ``` lts = (LogicalTapeSet *) palloc0(offsetof(LogicalTapeSet, tapes) + ntapes * sizeof(LogicalTape)); ... MemSet(lts->tapes, 0, lts->nTapes * sizeof(LogicalTape)); <--- this line doesn't matter as I observed, which makes a little sense the compiler might know it's already zero. ``` The average time goes up to 26.6s: ``` explain analyze select * from t20m_1_int4 order by i offset 1; QUERY PLAN --- Limit (cost=3310741.20..3310741.20 rows=1 width=4) (actual time=26419.712..26419.712 rows=0 loops=1) -> Sort (cost=3260740.96..3310741.20 rows=2096 width=4) (actual time=21430.044..25468.661 rows=2000 loops=1) Sort Key: i Sort Method: external merge Disk: 274056kB -> Seq Scan on t20m_1_int4 (cost=0.00..288496.96 rows=2096 width=4) (actual time=0.060..2839.452 rows=2000 loops=1) Planning Time: 0.111 ms Execution Time: 26652.255 ms (7 rows) ``` -- Adam Lee
Re: pgbench: option delaying queries till connections establishment?
At Thu, 27 Feb 2020 10:51:29 -0800, Andres Freund wrote in > Hi, > > On 2020-02-27 10:01:00 -0800, Andres Freund wrote: > > If so, should this be done unconditionally? A new option? Included in an > > existing one somehow? > > FWIW, leaving windows, error handling, and other annoyances aside, this > can be implemented fairly simply. See below. > > As an example of the difference: > > Before: > andres@awork3:~/build/postgres/dev-optimize/vpath$ ./src/bin/pgbench/pgbench > -M prepared -c 5000 -j 100 -T 100 -P1 -S > starting vacuum...end. > progress: 100.4 s, 515307.4 tps, lat 1.374 ms stddev 7.739 > transaction type: > scaling factor: 30 > query mode: prepared > number of clients: 5000 > number of threads: 100 > duration: 100 s > number of transactions actually processed: 51728348 > latency average = 1.374 ms > latency stddev = 7.739 ms > tps = 513802.541226 (including connections establishing) > tps = 521342.427158 (excluding connections establishing) > > > Note that there's no progress report until the end. That's because the > main thread didn't get a connection until the other threads were done. > > > After: > > pgbench -M prepared -c 5000 -j 100 -T 100 -P1 -S > starting vacuum...end. > progress: 1.5 s, 9943.5 tps, lat 4.795 ms stddev 14.822 > progress: 2.0 s, 380312.6 tps, lat 1.728 ms stddev 15.461 > progress: 3.0 s, 478811.1 tps, lat 2.052 ms stddev 31.687 > progress: 4.0 s, 470804.6 tps, lat 1.941 ms stddev 24.661 > > > > I think this also shows that "including/excluding connections > establishing" as well as some of the other stats reported pretty > bogus. In the 'before' case a substantial numer of the connections had > not yet been established until the end of the test run! I see it useful. In most cases we don't care connection time of pgbench. Especially in the mentioned case the result is just bogus. I think the reason for "including/excluding connection establishing" is not that people wants to see how long connection took to establish but that how long the substantial work took. If each client did run with continuously re-establishing new connections the connection time would be useful, but actually all the connections are established at once at the beginning. So FWIW I prefer that the barrier is applied by default (that is, it can be disabled) and the progress time starts at the time all clients has been established. > starting vacuum...end. + time to established 5000 connections: 1323ms ! progress: 1.0 s, 33.5 tps, lat 2.795 ms stddev 14.822 ! progress: 2.0 s, 380312.6 tps, lat 1.728 ms stddev 15.461 ! progress: 3.0 s, 478811.1 tps, lat 2.052 ms stddev 31.687 ! progress: 4.0 s, 470804.6 tps, lat 1.941 ms stddev 24.661 > transaction type: > scaling factor: 30 > query mode: prepared > number of clients: 5000 > number of threads: 100 > duration: 100 s > number of transactions actually processed: 51728348 > latency average = 1.374 ms > latency stddev = 7.739 ms > tps = 513802.541226 (including connections establishing) > tps = 521342.427158 (excluding connections establishing) regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Assert failure due to "drop schema pg_temp_3 cascade" for temporary tables and \d+ is not showing any info after dropping temp table schema
On Thu, 16 Jan 2020 at 09:36, Michael Paquier wrote: > > On Fri, Jan 10, 2020 at 08:07:48PM +0900, Michael Paquier wrote: > > Thinking more about it, this has a race condition if a temporary > > schema is removed after collecting the OIDs in the drop phase. So the > > updated attached is actually much more conservative and does not need > > an update of the log message, without giving up on the improvements > > done in v11~. In 9.4~10, the code of the second phase relies on > > GetTempNamespaceBackendId() which causes an orphaned relation to not > > be dropped in the event of a missing namespace. I'll just leave that > > alone for a couple of days now.. > > And back on that one, I still like better the solution as of the > attached which skips any relations with their namespace gone missing > as 246a6c87's intention was only to allow orphaned temp relations to > be dropped by autovacuum when a backend slot is connected, but not > using yet its own temp namespace. > > If we want the drop of temp relations to work properly, more thoughts > are needed regarding the storage part, and I am not actually sure that > it is autovacuum's job to handle that better. > > Any thoughts? Hi, Patch looks good to me and it is fixing the assert failure. -- Thanks and Regards Mahendra Singh Thalor EnterpriseDB: http://www.enterprisedb.com
Re: Identifying user-created objects
On Wed, Feb 26, 2020 at 4:48 PM Masahiko Sawada wrote: > On Thu, 13 Feb 2020 at 17:13, Julien Rouhaud wrote: > > On Thu, Feb 13, 2020 at 8:32 AM Amit Langote > > wrote: > > > Maybe we could document that pg_is_user_object() and its internal > > > counterpart returns true only for objects that are created during > > > "normal" multi-user database operation. > > > > +1 > > Agreed. > > Attached updated version patch. Thanks for updating the patch. Some comments: + + pg_is_user_object(oid) + bool + +true if oid is the object which is created during +normal multi-user database operation. + + How about clarifying the description further as follows: "true for objects created while database is operating in normal multi-user mode, as opposed to single-user mode (see )." Term "multi-user operation" is not mentioned elsewhere in the documentation, so better to clarify what it means. Also, maybe a minor nitpick, but how about adding the new function's row at the end of the table (Table 9.72) instead of in the middle? Other than that, patch looks to be in pretty good shape. Thanks, Amit
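A quick usage sketch, assuming the pg_is_user_object() function as proposed in the patch (it is not in core yet):

SELECT oid, relname, pg_is_user_object(oid) AS is_user_object
FROM pg_class
ORDER BY oid DESC
LIMIT 5;

Per the proposed documentation, this should return true for relations created during normal multi-user operation and false for ones created at initdb time or in single-user mode.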
Re: Autovacuum on partitioned table
On Fri, Feb 28, 2020 at 11:25 AM Alvaro Herrera wrote: > > /* > > + * If the relation is a partitioned table, we must add up children's > > + * reltuples. > > + */ > > + if (classForm->relkind == RELKIND_PARTITIONED_TABLE) > > + { > > + List *children; > > + ListCell *lc; > > + > > + reltuples = 0; > > + > > + /* Find all members of inheritance set taking AccessShareLock > > */ > > + children = find_all_inheritors(relid, AccessShareLock, NULL); > > + > > + foreach(lc, children) > > + { > > + OidchildOID = lfirst_oid(lc); > > + HeapTuple childtuple; > > + Form_pg_class childclass; > > + > > + /* Ignore the parent table */ > > + if (childOID == relid) > > + continue; > > I think this loop counts partitioned partitions multiple times, because > we add up reltuples for all levels, no? (If I'm wrong, that is, if > a partitioned rel does not have reltuples, then why skip the parent?) +1, no need to skip partitioned tables here a their reltuples is always 0. > > + /* > > + * If the table is a leaf partition, tell the stats collector its > > parent's > > + * changes_since_analyze for auto analyze Maybe write: For a leaf partition, add its current changes_since_analyze into its ancestors' counts. This must be done before sending the ANALYZE message as it resets the partition's changes_since_analyze counter. > > + */ > > + if (IsAutoVacuumWorkerProcess() && > > + rel->rd_rel->relispartition && > > + !(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)) > > I'm not sure I understand why we do this only on autovac. Why not all > analyzes? +1. If there is a reason, it should at least be documented in the comment above. > > + { > > + Oid parentoid; > > + Relation parentrel; > > + PgStat_StatDBEntry *dbentry; > > + PgStat_StatTabEntry *tabentry; > > + > > + /* Get its parent table's Oid and relation */ > > + parentoid = get_partition_parent(RelationGetRelid(rel)); > > + parentrel = table_open(parentoid, AccessShareLock); > > Climbing up the partitioning hierarchy acquiring locks on ancestor > relations opens up for deadlocks. It's better to avoid that. (As a > test, you could try what happens if you lock the topmost relation with > access-exclusive and leave a transaction open, then have autoanalyze > run). At the same time, I wonder if it's sensible to move one level up > here, and also have pgstat_report_partanalyze move more levels up. Maybe fetch all ancestors here and process from the top. But as we'd have locked the leaf partition long before we got here, maybe we should lock ancestors even before we start analyzing the leaf partition? AccessShareLock should be enough on the ancestors because we're not actually analyzing them. (It appears get_partition_ancestors() returns a list where the root parent is the last element, so need to be careful with that.) Thanks, Amit
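Regarding the reltuples point above, a quick way to see that partitioned tables carry no row counts of their own (a sketch with made-up names):

CREATE TABLE pt (i int) PARTITION BY RANGE (i);
CREATE TABLE pt1 PARTITION OF pt FOR VALUES FROM (0) TO (100);
INSERT INTO pt SELECT generate_series(0, 99);
ANALYZE pt1;
SELECT relname, relkind, reltuples FROM pg_class WHERE relname IN ('pt', 'pt1');

On the branch being discussed here, pt keeps reltuples = 0 since nothing is ever stored in it directly; only the leaf pt1 gets a real count.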
Re: Crash by targeted recovery
At Thu, 27 Feb 2020 20:04:41 +0900, Fujii Masao wrote in > > > On 2020/02/27 17:05, Kyotaro Horiguchi wrote: > > Thank you for the comment. > > At Thu, 27 Feb 2020 16:23:44 +0900, Fujii Masao > > wrote in > >> On 2020/02/27 15:23, Kyotaro Horiguchi wrote: > I failed to understand why random access while reading from > stream is bad idea. Could you elaborate why? > >>> It seems to me the word "streaming" suggests that WAL record should be > >>> read sequentially. Random access, which means reading from arbitrary > >>> location, breaks a stream. (But the patch doesn't try to stop wal > >>> sender if randAccess.) > >>> > Isn't it sufficient to set currentSource to 0 when disabling > StandbyMode? > >>> I thought that and it should work, but I hesitated to manipulate on > >>> currentSource in StartupXLOG. currentSource is basically a private > >>> state of WaitForWALToBecomeAvailable. ReadRecord modifies it but I > >>> think it's not good to modify it out of the the logic in > >>> WaitForWALToBecomeAvailable. > >> > >> If so, what about adding the following at the top of > >> WaitForWALToBecomeAvailable()? > >> > >> if (!StandbyMode && currentSource == XLOG_FROM_STREAM) > >> currentSource = 0; > > It works virtually the same way. I'm happy to do that if you don't > > agree to using randAccess. But I'd rather do that in the 'if > > (!InArchiveRecovery)' section. > > The approach using randAccess seems unsafe. Please imagine > the case where currentSource is changed to XLOG_FROM_ARCHIVE > because randAccess is true, while walreceiver is still running. > For example, this case can occur when the record at REDO > starting point is fetched with randAccess = true after walreceiver > is invoked to fetch the last checkpoint record. The situation > "currentSource != XLOG_FROM_STREAM while walreceiver is > running" seems invalid. No? When I mentioned an possibility of changing ReadRecord so that it modifies randAccess instead of currentSource, I thought that WaitForWALToBecomeAvailable should shutdown wal receiver as needed. At Thu, 27 Feb 2020 15:23:07 +0900 (JST), Kyotaro Horiguchi wrote in me> location, breaks a stream. (But the patch doesn't try to stop wal me> sender if randAccess.) And random access during StandbyMode ususally (always?) lets RecPtr go back. I'm not sure WaitForWALToBecomeAvailable works correctly if we don't have a file in pg_wal and the REDO point is far back by more than a segment from the initial checkpoint record. (It seems to cause assertion failure, but I haven't checked that.) If we go back to XLOG_FROM_ARCHIVE by random access, it correctly re-connects to the primary for the past segment. > So I think that the approach that I proposed is better. It depends on how far we assume RecPtr go back. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Autovacuum on partitioned table
Hosoya-san, Thanks for the new patch. On Wed, Feb 26, 2020 at 11:33 AM yuzuko wrote: > Attach the v5 patch. In this patch, pgstat_report_analyze() always reports 0 > as > msg.m_live_tuples and m_dead_tuples when the relation is partitioned. Some comments: + * PgStat_MsgPartAnalyzeSent by the backend or autovacuum daemon + * after ANALYZE for partitioned tables Looking at the way this message is used, it does not seem to be an "analyze" message and also it's not sent "after ANALYZE of partitioned tables", but really after ANALYZE of leaf partitions. Analyze (for both partitioned tables and leaf partitions) is reported as a PgStat_MsgAnalyze message as before. It seems that PgStat_MsgPartAnalyze is only sent to update a leaf partition's parent's (and recursively any grandparents') changes_since_analyze counters, so maybe we should find a different name for it. Maybe, PgStat_MsgPartChanges and accordingly the message type enum value. /* - * Report ANALYZE to the stats collector, too. However, if doing - * inherited stats we shouldn't report, because the stats collector only - * tracks per-table stats. Reset the changes_since_analyze counter only - * if we analyzed all columns; otherwise, there is still work for - * auto-analyze to do. + * Report ANALYZE to the stats collector, too. If the table is a + * partition, report changes_since_analyze of its parent because + * autovacuum process for partitioned tables needs it. Reset the + * changes_since_analyze counter only if we analyzed all columns; + * otherwise, there is still work for auto-analyze to do. */ The new comment says "partitions", which we typically use to refer to a child table, but this comment really talks about parent tables. Old comment says we don't report "inherited stats", presumably because stats collector lacks the infrastructure to distinguish a table's inherited stats and own stats, at least in the case of traditional inheritance. With this patch, we are making an exception for partitioned tables, because we are also teaching the stats collector to maintain at least changes_since_analyze for them that accumulates counts of changed tuples from partitions. It seems Alvaro already reported some of the other issues I had with the patch, such as why partanalyze messages are only sent from a autovacuum worker. Thanks, Amit
Re: Autovacuum on partitioned table
Hello Yuzuko, > + * Report ANALYZE to the stats collector, too. If the table is a > + * partition, report changes_since_analyze of its parent because > + * autovacuum process for partitioned tables needs it. Reset the > + * changes_since_analyze counter only if we analyzed all columns; > + * otherwise, there is still work for auto-analyze to do. >*/ > - if (!inh) > - pgstat_report_analyze(onerel, totalrows, totaldeadrows, > - (va_cols == NIL)); > + if (!inh || onerel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) > + pgstat_report_analyze(onerel, totalrows, totaldeadrows, > + (va_cols == NIL)); Hmm, I think the comment has a bug: it says "report ... of its parent" but the report is of the same rel. (The pgstat_report_analyze line is mis-indented also). > /* > + * If the relation is a partitioned table, we must add up children's > + * reltuples. > + */ > + if (classForm->relkind == RELKIND_PARTITIONED_TABLE) > + { > + List *children; > + ListCell *lc; > + > + reltuples = 0; > + > + /* Find all members of inheritance set taking AccessShareLock */ > + children = find_all_inheritors(relid, AccessShareLock, NULL); > + > + foreach(lc, children) > + { > + OidchildOID = lfirst_oid(lc); > + HeapTuple childtuple; > + Form_pg_class childclass; > + > + /* Ignore the parent table */ > + if (childOID == relid) > + continue; I think this loop counts partitioned partitions multiple times, because we add up reltuples for all levels, no? (If I'm wrong, that is, if a partitioned rel does not have reltuples, then why skip the parent?) > + /* > + * If the table is a leaf partition, tell the stats collector its > parent's > + * changes_since_analyze for auto analyze > + */ > + if (IsAutoVacuumWorkerProcess() && > + rel->rd_rel->relispartition && > + !(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)) I'm not sure I understand why we do this only on autovac. Why not all analyzes? > + { > + Oid parentoid; > + Relation parentrel; > + PgStat_StatDBEntry *dbentry; > + PgStat_StatTabEntry *tabentry; > + > + /* Get its parent table's Oid and relation */ > + parentoid = get_partition_parent(RelationGetRelid(rel)); > + parentrel = table_open(parentoid, AccessShareLock); Climbing up the partitioning hierarchy acquiring locks on ancestor relations opens up for deadlocks. It's better to avoid that. (As a test, you could try what happens if you lock the topmost relation with access-exclusive and leave a transaction open, then have autoanalyze run). At the same time, I wonder if it's sensible to move one level up here, and also have pgstat_report_partanalyze move more levels up. > + * pgstat_report_partanalyze() - > + * > + * Tell the collector about the parent table of which partition just > analyzed. > + * > + * Caller must provide a child's changes_since_analyze as a parents. I'm not sure what the last line is trying to say. Thanks, -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Autovacuum on partitioned table
On Wed, 26 Feb 2020 at 11:33, yuzuko wrote: > > Hi, > > Thanks for reviewing the patch. > > > > We can make it work correctly but I think perhaps we can skip updating > > > statistics values of partitioned tables other than n_mod_since_analyze > > > as the first step. Because if we support also n_live_tup and > > > n_dead_tup, user might get confused that other statistics values such > > > as seq_scan, seq_tup_read however are not supported. > > > > +1, that makes sense. > > > Yes, Indeed. I modified it not to update statistics other than > n_mod_since_analyze. > Attach the v5 patch. In this patch, pgstat_report_analyze() always reports 0 > as > msg.m_live_tuples and m_dead_tuples when the relation is partitioned. > Thank you for updating the patch. I'll look at it. I'd recommend registering this patch in the next commit fest so that it is not forgotten. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Allowing ALTER TYPE to change storage strategy
Hi, From time to time I run into the limitation that ALTER TYPE does not allow changing storage strategy. I've written a bunch of data types over the years - in some cases I simply forgot to specify storage in CREATE TYPE (so it defaulted to PLAIN) or I expected PLAIN to be sufficient and then later wished I could enable TOAST. Obviously, without ALTER TYPE supporting that it's rather tricky. You either need to do dump/restore, or tweak the pg_type catalog directly. So here is an initial patch extending ALTER TYPE to support this. The question is why this was not implemented before - my assumption is this is not simply because no one wanted that. We track the storage in pg_attribute too, and ALTER TABLE allows changing that ... My understanding is that pg_type.typstorage essentially specifies two things: (a) default storage strategy for the attributes with this type, and (b) whether the type implementation is prepared to handle TOAST-ed values or not. And pg_attribute.attstorage has to respect this - when the type says PLAIN then the attribute can't simply use strategy that would enable TOAST. Luckily, this is only a problem when switching typstorage to PLAIN (i.e. when disabling TOAST for the type). The attached v1 patch checks if there are attributes with non-PLAIN storage for this type, and errors out if it finds one. But unfortunately that's not entirely correct, because ALTER TABLE only sets storage for new data. A table may be created with e.g. EXTENDED storage for an attribute, a bunch of rows may be loaded and then the storage for the attribute may be changed to PLAIN. That would pass the check as it's currently in the patch, yet there may be TOAST-ed values for the type with PLAIN storage :-( I'm not entirely sure what to do about this, but I think it's OK to just reject changes in this direction (from non-PLAIN to PLAIN storage). I've never needed it, and it seems pretty useless - it seems fine to just instruct the user to do a dump/restore. In the opposite direction - when switching from PLAIN to a TOAST-enabled storage, or enabling/disabling compression, this is not an issue at all. It's legal for type to specify e.g. EXTENDED but attribute to use PLAIN, for example. So the attached v1 patch simply allows this direction. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services diff --git a/doc/src/sgml/ref/alter_type.sgml b/doc/src/sgml/ref/alter_type.sgml index 67be1dd568..03a6a59468 100644 --- a/doc/src/sgml/ref/alter_type.sgml +++ b/doc/src/sgml/ref/alter_type.sgml @@ -30,6 +30,7 @@ ALTER TYPE name RENAME TO name SET SCHEMA new_schema ALTER TYPE name ADD VALUE [ IF NOT EXISTS ] new_enum_value [ { BEFORE | AFTER } neighbor_enum_value ] ALTER TYPE name RENAME VALUE existing_enum_value TO new_enum_value +ALTER TYPE name SET STORAGE { PLAIN | EXTERNAL | EXTENDED | MAIN } where action is one of: @@ -97,6 +98,38 @@ ALTER TYPE name RENAME VALUE + + + SET STORAGE + + TOAST + per-type storage settings + + + + + This form sets the storage mode for a data type, controlling whether the + values are held inline or in a secondary TOAST table, + and whether the data should be compressed or not. + PLAIN must be used for fixed-length values such as + integer and is inline, uncompressed. MAIN + is for inline, compressible data. EXTERNAL is for + external, uncompressed data, and EXTENDED is for + external, compressed data. EXTENDED is the default + for most data types that support non-PLAIN storage. 
+ Use of EXTERNAL will make substring operations on + very large text and bytea values run faster, + at the penalty of increased storage space. Note that + SET STORAGE doesn't itself change anything in the + tables, it just sets the strategy to be used by tables created in the + future. See for more information. + Note that this merely modifies the default for a data type, but each + attribute specifies it's own strategy that overrides this value. + See for how to change that. + + + + SET SCHEMA diff --git a/src/backend/commands/alter.c b/src/backend/commands/alter.c index 1cb84182b0..d021b1d272 100644 --- a/src/backend/commands/alter.c +++ b/src/backend/commands/alter.c @@ -1042,3 +1042,48 @@ AlterObjectOwner_internal(Relation rel, Oid objectId, Oid new_ownerId) InvokeObjectPostAlterHook(classId, objectId, 0); } + +/* + * Executes an ALTER TYPE / SET STORAGE statement. At the moment this is + * supported only for OBJECT_TYPE nad OBJECT_DOMAIN. + */ +ObjectAddress +ExecAlterTypeStorageStmt(AlterTypeStorageStmt *stmt) +{ + char *storagemode; + charnewstorage; + + Assert(IsA(stmt->newstorage, String)); + storagemode = strVal(stmt->newstorage); + + if (pg_strcasecmp(storagemode, "plain") == 0)
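A small usage sketch, assuming the ALTER TYPE ... SET STORAGE syntax added by the patch (type and function names are made up; the internal-language I/O functions require superuser):

CREATE TYPE mytext;
CREATE FUNCTION mytext_in(cstring) RETURNS mytext
    AS 'textin' LANGUAGE internal IMMUTABLE STRICT;
CREATE FUNCTION mytext_out(mytext) RETURNS cstring
    AS 'textout' LANGUAGE internal IMMUTABLE STRICT;
CREATE TYPE mytext (
    INPUT = mytext_in,
    OUTPUT = mytext_out,
    INTERNALLENGTH = VARIABLE,
    STORAGE = plain      -- the "forgot to enable TOAST" case described above
);

ALTER TYPE mytext SET STORAGE extended;   -- the direction the patch allows

As the message explains, the opposite direction (back to PLAIN) is the problematic one and is rejected by the v1 patch.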
Re: ALTER INDEX fails on partitioned index
On 2020-Feb-27, Justin Pryzby wrote: > The attached allows CREATE/ALTER to specify reloptions on a partitioned table > which are used as defaults for future children. > > I think that's a desirable behavior, same as for tablespaces. Michael > mentioned that ALTER INDEX ONLY doesn't exist, but that's only an issue if > ALTER acts recursively, which isn't the case here. I think ALTER not acting recursively is a bug that we would do well not to propagate any further. Enough effort we have to spend trying to fix that already. Let's add ALTER .. ONLY if needed. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: error context for vacuum to include block number
On 2020-Feb-27, Justin Pryzby wrote: > Originally, the patch only supported "scanning heap", and set the callback > strictly, to avoid having callback installed when calling other functions > (like > vacuuming heap/indexes). > > Then incrementally added callbacks in increasing number of places. We only > need one errcontext. And possibly you're right that the callback could always > be in place (?). But what about things like vacuuming FSM ? I think we'd > need > another "phase" for that (or else invent a PHASE_IGNORE to do nothing). Would > VACUUM_FSM be added to progress reporting, too? We're also talking about new > phase for TRUNCATE_PREFETCH and TRUNCATE_WAIT. I think we should use a separate enum. It's simple enough, and there's no reason to use the same enum for two different things if it seems to complicate matters. > Regarding the cbarg, at one point I took a suggestion from Andres to use the > LVRelStats struct. I got rid of that since I didn't like sharing "blkno" > between heap scanning and heap vacuuming, and needs to be reset when switching > back to scanning heap. I experimented now going back to that now. The only > utility is in having an single allocation of relname/space. I'm unsure about reusing that struct. Not saying don't do it, just ... unsure. It possibly has other responsibilities. I don't think there's a reason to keep 0002 separate. Regarding this, > + case PROGRESS_VACUUM_PHASE_VACUUM_HEAP: > + if (BlockNumberIsValid(cbarg->blkno)) > + errcontext("while vacuuming block %u of > relation \"%s.%s\"", > + cbarg->blkno, > cbarg->relnamespace, cbarg->relname); > + break; I think you should still call errcontext() when blkno is invalid. In fact, just remove the "if" line altogether and let it show whatever value is there. It should work okay. We don't expect the value to be invalid anyway. Maybe it would make sense to make the LVRelStats struct members be char arrays rather than pointers. Then you memcpy() or strlcpy() them instead of palloc/free. Please don't cuddle your braces. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Rethinking opclass member checks and dependency strength
Tomas Vondra writes: > On Sun, Jan 05, 2020 at 12:33:10PM -0500, Tom Lane wrote: >> I see your point that "check" suggests a read-only operation, but >> I'm not sure about a better verb. I thought of "amvalidatemembers", >> but that's not really much better than "check" is it? > I don't :-( Still haven't got a better naming idea, but in the meantime here's a rebase to fix a conflict with 612a1ab76. regards, tom lane diff --git a/contrib/bloom/blutils.c b/contrib/bloom/blutils.c index 0104d02..1098ab7 100644 --- a/contrib/bloom/blutils.c +++ b/contrib/bloom/blutils.c @@ -138,6 +138,7 @@ blhandler(PG_FUNCTION_ARGS) amroutine->amproperty = NULL; amroutine->ambuildphasename = NULL; amroutine->amvalidate = blvalidate; + amroutine->amcheckmembers = NULL; amroutine->ambeginscan = blbeginscan; amroutine->amrescan = blrescan; amroutine->amgettuple = NULL; diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml index 37f8d87..d76d17d 100644 --- a/doc/src/sgml/indexam.sgml +++ b/doc/src/sgml/indexam.sgml @@ -141,6 +141,7 @@ typedef struct IndexAmRoutine amproperty_function amproperty; /* can be NULL */ ambuildphasename_function ambuildphasename; /* can be NULL */ amvalidate_function amvalidate; +amcheckmembers_function amcheckmembers; /* can be NULL */ ambeginscan_function ambeginscan; amrescan_function amrescan; amgettuple_function amgettuple; /* can be NULL */ @@ -500,7 +501,48 @@ amvalidate (Oid opclassoid); the access method can reasonably do that. For example, this might include testing that all required support functions are provided. The amvalidate function must return false if the opclass is - invalid. Problems should be reported with ereport messages. + invalid. Problems should be reported with ereport + messages, typically at INFO level. + + + + +void +amcheckmembers (Oid opfamilyoid, +Oid opclassoid, +List *operators, +List *functions); + + Validate proposed operator and function members of an operator family, + so far as the access method can reasonably do that, and set their + dependency types if the default is not satisfactory. This is called + during CREATE OPERATOR CLASS and during + ALTER OPERATOR FAMILY ADD; in the latter + case opclassoid is InvalidOid. + The List arguments are lists + of OpFamilyMember structs, as defined + in amapi.h. + + Tests done by this function will typically be a subset of those + performed by amvalidate, + since amcheckmembers cannot assume that it is + seeing a complete set of members. For example, it would be reasonable + to check the signature of a support function, but not to check whether + all required support functions are provided. Any problems can be + reported by throwing an error. + + The dependency-related fields of + the OpFamilyMember structs are initialized by + the core code to create hard dependencies on the opclass if this + is CREATE OPERATOR CLASS, or soft dependencies on the + opfamily if this is ALTER OPERATOR FAMILY ADD. + amcheckmembers can adjust these fields if some other + behavior is more appropriate. For example, GIN, GiST, and SP-GiST + always set operator members to have soft dependencies on the opfamily, + since the connection between an operator and an opclass is relatively + weak in these index types; so it is reasonable to allow operator members + to be added and removed freely. Optional support functions are typically + also given soft dependencies, so that they can be removed if necessary. 
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c index 2e8f67e..d1e1176 100644 --- a/src/backend/access/brin/brin.c +++ b/src/backend/access/brin/brin.c @@ -118,6 +118,7 @@ brinhandler(PG_FUNCTION_ARGS) amroutine->amproperty = NULL; amroutine->ambuildphasename = NULL; amroutine->amvalidate = brinvalidate; + amroutine->amcheckmembers = NULL; amroutine->ambeginscan = brinbeginscan; amroutine->amrescan = brinrescan; amroutine->amgettuple = NULL; diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c index a7e55ca..e61e629 100644 --- a/src/backend/access/gin/ginutil.c +++ b/src/backend/access/gin/ginutil.c @@ -70,6 +70,7 @@ ginhandler(PG_FUNCTION_ARGS) amroutine->amproperty = NULL; amroutine->ambuildphasename = NULL; amroutine->amvalidate = ginvalidate; + amroutine->amcheckmembers = gincheckmembers; amroutine->ambeginscan = ginbeginscan; amroutine->amrescan = ginrescan; amroutine->amgettuple = NULL; diff --git a/src/backend/access/gin/ginvalidate.c b/src/backend/access/gin/ginvalidate.c index 0b62e0a..f44aa4d 100644 --- a/src/backend/access/gin/ginvalidate.c +++ b/src/backend/access/gin/ginvalidate.c @@ -267,3 +267,64 @@ ginvalidate(Oid opclassoid) return result; } + +/* + * Prechecking function for adding operators/functions to a GIN opfa
Re: ALTER INDEX fails on partitioned index
The attached allows CREATE/ALTER to specify reloptions on a partitioned table which are used as defaults for future children. I think that's a desirable behavior, same as for tablespaces. Michael mentioned that ALTER INDEX ONLY doesn't exist, but that's only an issue if ALTER acts recursively, which isn't the case here. The current behavior seems unreasonable: CREATE allows specifying fillfactor, which does nothing, and unable to alter it, either: postgres=# CREATE TABLE tt(i int)PARTITION BY RANGE (i);; CREATE TABLE postgres=# CREATE INDEX ON tt(i)WITH(fillfactor=11); CREATE INDEX postgres=# \d tt ... "tt_i_idx" btree (i) WITH (fillfactor='11') postgres=# ALTER INDEX tt_i_idx SET (fillfactor=12); ERROR: "tt_i_idx" is not a table, view, materialized view, or index Maybe there are other ALTER commands to handle (UNLOGGED currently does nothing on a partitioned table?, STATISTICS, ...). The first patch makes a prettier message, per Robert's suggestion. -- Justin >From e5bb363f514d768a4f540d9c82ad5745944b1486 Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Mon, 30 Dec 2019 09:31:03 -0600 Subject: [PATCH v2 1/2] More specific error message when failing to alter a partitioned index.. "..is not an index" is deemed to be unfriendly https://www.postgresql.org/message-id/CA%2BTgmobq8_-DS7qDEmMi-4ARP1_0bkgFEjYfiK97L2eXq%2BQ%2Bnw%40mail.gmail.com --- src/backend/commands/tablecmds.c | 23 +-- 1 file changed, 17 insertions(+), 6 deletions(-) diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index b7c8d66..1b271af 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -366,7 +366,7 @@ static void ATRewriteTables(AlterTableStmt *parsetree, static void ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode); static AlteredTableInfo *ATGetQueueEntry(List **wqueue, Relation rel); static void ATSimplePermissions(Relation rel, int allowed_targets); -static void ATWrongRelkindError(Relation rel, int allowed_targets); +static void ATWrongRelkindError(Relation rel, int allowed_targets, int actual_target); static void ATSimpleRecursion(List **wqueue, Relation rel, AlterTableCmd *cmd, bool recurse, LOCKMODE lockmode, AlterTableUtilityContext *context); @@ -5458,7 +5458,7 @@ ATSimplePermissions(Relation rel, int allowed_targets) /* Wrong target type? */ if ((actual_target & allowed_targets) == 0) - ATWrongRelkindError(rel, allowed_targets); + ATWrongRelkindError(rel, allowed_targets, actual_target); /* Permissions checks */ if (!pg_class_ownercheck(RelationGetRelid(rel), GetUserId())) @@ -5479,7 +5479,7 @@ ATSimplePermissions(Relation rel, int allowed_targets) * type. 
*/ static void -ATWrongRelkindError(Relation rel, int allowed_targets) +ATWrongRelkindError(Relation rel, int allowed_targets, int actual_target) { char *msg; @@ -5527,9 +5527,20 @@ ATWrongRelkindError(Relation rel, int allowed_targets) break; } - ereport(ERROR, - (errcode(ERRCODE_WRONG_OBJECT_TYPE), - errmsg(msg, RelationGetRelationName(rel; + if (actual_target == ATT_PARTITIONED_INDEX && + (allowed_targets&ATT_INDEX) && + !(allowed_targets&ATT_PARTITIONED_INDEX)) + /* Add a special errhint for this case, since "is not an index" message is unfriendly */ + ereport(ERROR, +(errcode(ERRCODE_WRONG_OBJECT_TYPE), + errmsg(msg, RelationGetRelationName(rel)), + // errhint("\"%s\" is a partitioned index", RelationGetRelationName(rel; + errhint("operation is not supported on partitioned indexes"))); + else + ereport(ERROR, +(errcode(ERRCODE_WRONG_OBJECT_TYPE), + errmsg(msg, RelationGetRelationName(rel; + } /* -- 2.7.4 >From 5f7f79c4c8e85528e88ffa0b749faad1da4ab4d5 Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Tue, 25 Feb 2020 14:27:09 -0600 Subject: [PATCH v2 2/2] Allow reloptions on partitioned tables and indexes.. ..which provides default values used for future (direct) partitions. See also similar behavior from: dfa60814 - Fix tablespace handling for partitioned indexes ca410302 - Fix tablespace handling for partitioned tables c94e6942 - Don't allocate storage for partitioned tables. --- src/backend/access/common/reloptions.c | 19 +-- src/backend/commands/tablecmds.c | 32 +++- src/test/regress/expected/reloptions.out | 29 + src/test/regress/sql/reloptions.sql | 11 +++ 4 files changed, 64 insertions(+), 27 deletions(-) diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c index 79430d2..a2678e7 100644 --- a/src/backend/access/common/reloptions.c +++ b/src/backend/access/common/reloptions.c @@ -1108,10 +1108,8 @@ extractRelOptions(HeapTuple tuple, TupleDesc tupdesc, case RELKIND_RELATION: case RELKIND_TOASTVALUE: case RELKIND_MATVIEW: - options = heap_reloptions(classForm->relkind, datum
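A sketch of the behavior the patch is aiming for (hypothetical session; today the ALTER step is what fails, as shown above):

CREATE TABLE tt (i int) PARTITION BY RANGE (i);
CREATE INDEX ON tt (i) WITH (fillfactor = 11);
ALTER INDEX tt_i_idx SET (fillfactor = 12);    -- accepted with the patch
CREATE TABLE tt1 PARTITION OF tt FOR VALUES FROM (0) TO (10);
\d+ tt1       -- the partition's index is expected to pick up fillfactor = 12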
Re: Implementing Incremental View Maintenance
> I have tried to use an other patch with yours: > "Planning counters in pg_stat_statements (using pgss_store)" > setting > shared_preload_libraries='pg_stat_statements' > pg_stat_statements.track=all > restarting the cluster and creating the extension > When trying following syntax: > create table b1 (id integer, x numeric(10,3)); > create incremental materialized view mv1 as select id, count(*),sum(x) > from b1 group by id; > insert into b1 values (1,1) > > I got an ASSERT FAILURE in pg_stat_statements.c > on > Assert(query != NULL); > > comming from matview.c > refresh_matview_datafill(dest_old, query, queryEnv, NULL); > or > refresh_matview_datafill(dest_new, query, queryEnv, NULL); > > If this (last) NULL field was replaced by the query text, > a comment or just "n/a", > it would fix the problem. > Could this be investigated ? Hello, thank you for patch v14, which fixes the problems inherited from temporary tables. It seems that this ASSERT problem with the pgss patch is still present ;o( Could we have a look ? Thanks in advance Regards PAscal -- Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
Re: Resolving the python 2 -> python 3 mess
Andrew Dunstan writes: > On Thu, Feb 27, 2020 at 1:33 AM Tom Lane wrote: >> OK, so we need that *and* the AddProject addition you mentioned? > Yes, the first one builds it, the second one fixes the tests to run correctly. Thanks, here's a patchset incorporating those fixes. Otherwise same as before. regards, tom lane diff --git a/doc/src/sgml/plpython.sgml b/doc/src/sgml/plpython.sgml index 1921915..ac989a3 100644 --- a/doc/src/sgml/plpython.sgml +++ b/doc/src/sgml/plpython.sgml @@ -164,13 +164,6 @@ - See also the - document https://docs.python.org/3/whatsnew/3.0.html";>What's - New In Python 3.0 for more information about porting to - Python 3. - - - It is not allowed to use PL/Python based on Python 2 and PL/Python based on Python 3 in the same session, because the symbols in the dynamic modules would clash, which could result in crashes of the @@ -179,6 +172,90 @@ a mismatch is detected. It is possible, however, to use both PL/Python variants in the same database, from separate sessions. + + + Converting from Python 2 to Python 3 + + +See the +document https://docs.python.org/3/whatsnew/3.0.html";>What's +New In Python 3.0 for the Python community's information and +recommendations about porting to Python 3. + + + +PostgreSQL provides some support for helping +you to convert existing Python 2 routines to Python 3. In an +installation built with Python 3, there is an +extension convert_python3 that changes functions +and procedures from the plpythonu +and plpython2u languages to +the plpython3u language. While doing so, it applies +the 2to3 tool described in the above document to +the body of each such routine. + + + +Using convert_python3 can be as simple as: + +CREATE EXTENSION convert_python3; +CALL convert_python3_all(); + +This must be done as database superuser. If you wish, you can drop the +extension once you're done converting everything. + + + +Since convert_python3 is Python 3 code, be careful +not to install or run it in a session that has previously executed any +Python 2 code. As explained above, that won't work. + + + +convert_python3_all has two optional arguments: the +name of the conversion tool to use (by default 2to3, +but you might for instance need to provide a full path name) and any +special command-line options to provide to it. You might for example +want to adjust the set of fixer rules +that 2to3 applies: + +CALL convert_python3_all(options => '-f idioms -x apply'); + +See 2to3's +https://docs.python.org/3/library/2to3.html";>documentation +for more information. + + + +The convert_python3 extension also provides a +procedure that converts just one Python 2 function at a time: + +CALL convert_python3_one('myfunc(int)'); + +The argument is the target function's OID, which can be written as +a regprocedure constant (see +). The main reason to use this would be +if you need to use different options for different functions. It has +the same optional arguments as convert_python3_all: + +CALL convert_python3_one('otherfunc(text)', tool => '/usr/bin/2to3', + options => '-f idioms'); + + + + +If you have needs that go beyond this, consult the source code for +the convert_python3 extension (it's just a +couple of plpython3u procedures) and adapt those +procedures as necessary. + + + +Keep in mind that if you've constructed any DO blocks +that use Python 2 code, those will have to be fixed up manually, +wherever the source code for them exists. 
+ + diff --git a/src/pl/plpython/Makefile b/src/pl/plpython/Makefile index 9e95285..03f858b 100644 --- a/src/pl/plpython/Makefile +++ b/src/pl/plpython/Makefile @@ -38,6 +38,9 @@ DATA = $(NAME)u.control $(NAME)u--1.0.sql ifeq ($(python_majorversion),2) DATA += plpythonu.control plpythonu--1.0.sql endif +ifeq ($(python_majorversion),3) +DATA += convert_python3.control convert_python3--1.0.sql +endif # header files to install - it's not clear which of these might be needed # so install them all. diff --git a/src/pl/plpython/convert_python3--1.0.sql b/src/pl/plpython/convert_python3--1.0.sql new file mode 100644 index 000..3444ac0 --- /dev/null +++ b/src/pl/plpython/convert_python3--1.0.sql @@ -0,0 +1,149 @@ +/* convert_python3--1.0.sql */ + +-- complain if script is sourced in psql, rather than via CREATE EXTENSION +\echo Use "CREATE EXTENSION convert_python3" to load this file. \quit + +-- This module provides two procedures, one to convert all python2 +-- functions and one to do just one. They're nearly identical, and +-- in principle convert_python3_all() could be written as a loop +-- around convert_python3_one(). It's not done that way since +-- creating a temp directory for each function in a bulk
Re: error context for vacuum to include block number
On Thu, Feb 20, 2020 at 02:02:36PM -0300, Alvaro Herrera wrote: > On 2020-Feb-19, Justin Pryzby wrote: > > > Also, I was thinking that lazy_scan_heap doesn't needs to do this: > > > > + /* Pop the error context stack while calling vacuum */ > > + error_context_stack = errcallback.previous; > > ... > > + /* Set the error context while continuing heap scan */ > > + error_context_stack = &errcallback; > > > > It seems to me that's not actually necessary, since lazy_vacuum_heap will > > just > > *push* a context handler onto the stack, and then pop it back off. > > So if you don't pop before pushing, you'll end up with two context > lines, right? Hm, looks like you're right, but that's not what I intended (and I didn't hit that in my test). > I think it would make sense to > initialize the context callback just *once* for a vacuum run, and from > that point onwards, just update the errcbarg struct to match what > you're currently doing -- not continually pop/push error callback stack > entries. See below ... Originally, the patch only supported "scanning heap", and set the callback strictly, to avoid having the callback installed when calling other functions (like vacuuming heap/indexes). Then callbacks were incrementally added in an increasing number of places. We only need one errcontext. And possibly you're right that the callback could always be in place (?). But what about things like vacuuming the FSM? I think we'd need another "phase" for that (or else invent a PHASE_IGNORE to do nothing). Would VACUUM_FSM be added to progress reporting, too? We're also talking about a new phase for TRUNCATE_PREFETCH and TRUNCATE_WAIT. Regarding the cbarg, at one point I took a suggestion from Andres to use the LVRelStats struct. I got rid of that since I didn't like sharing "blkno" between heap scanning and heap vacuuming, which then needs to be reset when switching back to scanning heap. I have now experimented with going back to that. The only utility is in having a single allocation of relname/namespace. 
-- Justin >From cad1177c7b61f1543fca1ac2120a238fe6f9e555 Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Thu, 12 Dec 2019 20:54:37 -0600 Subject: [PATCH v22 1/3] vacuum errcontext to show block being processed Discussion: https://www.postgresql.org/message-id/20191120210600.gc30...@telsasoft.com --- src/backend/access/heap/vacuumlazy.c | 162 +++ 1 file changed, 144 insertions(+), 18 deletions(-) diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c index 03c43ef..d34d0c5 100644 --- a/src/backend/access/heap/vacuumlazy.c +++ b/src/backend/access/heap/vacuumlazy.c @@ -270,6 +270,8 @@ typedef struct LVParallelState typedef struct LVRelStats { + char *relnamespace; + char *relname; /* useindex = true means two-pass strategy; false means one-pass */ bool useindex; /* Overall statistics about rel */ @@ -290,8 +292,12 @@ typedef struct LVRelStats int num_index_scans; TransactionId latestRemovedXid; bool lock_waiter_detected; -} LVRelStats; + /* Used for error callback: */ + char *indname; + BlockNumber blkno; /* used only for heap operations */ + int phase; /* Reusing same constants as for progress reporting */ +} LVRelStats; /* A few variables that don't seem worth passing around as parameters */ static int elevel = -1; @@ -314,10 +320,10 @@ static void lazy_vacuum_all_indexes(Relation onerel, Relation *Irel, LVRelStats *vacrelstats, LVParallelState *lps, int nindexes); static void lazy_vacuum_index(Relation indrel, IndexBulkDeleteResult **stats, - LVDeadTuples *dead_tuples, double reltuples); + LVDeadTuples *dead_tuples, double reltuples, LVRelStats *vacrelstats); static void lazy_cleanup_index(Relation indrel, IndexBulkDeleteResult **stats, - double reltuples, bool estimated_count); + double reltuples, bool estimated_count, LVRelStats *vacrelstats); static int lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer, int tupindex, LVRelStats *vacrelstats, Buffer *vmbuffer); static bool should_attempt_truncation(VacuumParams *params, @@ -361,6 +367,9 @@ static void end_parallel_vacuum(Relation *Irel, IndexBulkDeleteResult **stats, LVParallelState *lps, int nindexes); static LVSharedIndStats *get_indstats(LVShared *lvshared, int n); static bool skip_parallel_vacuum_index(Relation indrel, LVShared *lvshared); +static void vacuum_error_callback(void *arg); +static void update_vacuum_error_cbarg(LVRelStats *errcbarg, int phase, + BlockNumber blkno, Relation rel); /* @@ -460,6 +469,9 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params, vacrelstats = (LVRelStats *) palloc0(sizeof(LVRelStats)); + vacrelstats->relnamespace = get_namespace_name(RelationGetNamespace(onerel)); + vacrelstats->relname = pstrdup(RelationGetRelationName(onerel)); + vacrelstats->indname = NULL; vacrelst
Add LogicalTapeSetExtend() to logtape.c
Attached is a patch that exports a new function from logtape.c: extern LogicalTapeSet *LogicalTapeSetExtend( LogicalTapeSet *lts, int nAdditional); The purpose is to allow adding new tapes dynamically for the upcoming Hash Aggregation work[0]. HashAgg doesn't know in advance how many tapes it will need, though only a limited number are actually active at a time. This was proposed and originally written by Adam Lee[1] (extract only the changes to logtape.c/h from his posted patch). Strangely, I'm seeing ~5% regression with his version when doing: -- t20m_1_int4 has 20 million random integers select * from t20m_1_int4 order by i offset 1; Which seems to be due to using a pointer rather than a flexible array member (I'm using gcc[2]). My version (attached) still uses a flexible array member, which doesn't see the regression; but I repalloc the whole struct so the caller needs to save the new pointer to the tape set. That doesn't entirely make sense to me, and I'm wondering if someone else can repro that result and/or make a suggestion, because I don't have a good explanation. I'm fine with my version of the patch, but it would be nice to know why there's such a big difference using a pointer versus a flexible array member. Regards, Jeff Davis [0] https://postgr.es/m/6e7c269b9a84ff75fefcad8ab2d4758f03581e98.camel%40j-davis.com [1] https://postgr.es/m/20200108071202.GA1511@mars.local [2] gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 diff --git a/src/backend/utils/sort/logtape.c b/src/backend/utils/sort/logtape.c index 4f78b55fbaf..36104a73a75 100644 --- a/src/backend/utils/sort/logtape.c +++ b/src/backend/utils/sort/logtape.c @@ -201,6 +201,7 @@ static long ltsGetFreeBlock(LogicalTapeSet *lts); static void ltsReleaseBlock(LogicalTapeSet *lts, long blocknum); static void ltsConcatWorkerTapes(LogicalTapeSet *lts, TapeShare *shared, SharedFileSet *fileset); +static void ltsInitTape(LogicalTape *lt); static void ltsInitReadBuffer(LogicalTapeSet *lts, LogicalTape *lt); @@ -536,6 +537,30 @@ ltsConcatWorkerTapes(LogicalTapeSet *lts, TapeShare *shared, lts->nHoleBlocks = lts->nBlocksAllocated - nphysicalblocks; } +/* + * Initialize per-tape struct. Note we allocate the I/O buffer and the first + * block for a tape only when it is first actually written to. This avoids + * wasting memory space when tuplesort.c overestimates the number of tapes + * needed. + */ +static void +ltsInitTape(LogicalTape *lt) +{ + lt->writing = true; + lt->frozen= false; + lt->dirty = false; + lt->firstBlockNumber = -1L; + lt->curBlockNumber= -1L; + lt->nextBlockNumber = -1L; + lt->offsetBlockNumber = 0L; + lt->buffer= NULL; + lt->buffer_size = 0; + /* palloc() larger than MaxAllocSize would fail */ + lt->max_size = MaxAllocSize; + lt->pos = 0; + lt->nbytes= 0; +} + /* * Lazily allocate and initialize the read buffer. This avoids waste when many * tapes are open at once, but not all are active between rewinding and @@ -579,7 +604,6 @@ LogicalTapeSetCreate(int ntapes, TapeShare *shared, SharedFileSet *fileset, int worker) { LogicalTapeSet *lts; - LogicalTape *lt; int i; /* @@ -597,29 +621,8 @@ LogicalTapeSetCreate(int ntapes, TapeShare *shared, SharedFileSet *fileset, lts->nFreeBlocks = 0; lts->nTapes = ntapes; - /* - * Initialize per-tape structs. Note we allocate the I/O buffer and the - * first block for a tape only when it is first actually written to. This - * avoids wasting memory space when tuplesort.c overestimates the number - * of tapes needed. 
- */ for (i = 0; i < ntapes; i++) - { - lt =tapes[i]; - lt->writing = true; - lt->frozen = false; - lt->dirty = false; - lt->firstBlockNumber = -1L; - lt->curBlockNumber = -1L; - lt->nextBlockNumber = -1L; - lt->offsetBlockNumber = 0L; - lt->buffer = NULL; - lt->buffer_size = 0; - /* palloc() larger than MaxAllocSize would fail */ - lt->max_size = MaxAllocSize; - lt->pos = 0; - lt->nbytes = 0; - } + ltsInitTape( tapes[i]); /* * Create temp BufFile storage as required. @@ -1004,6 +1007,29 @@ LogicalTapeFreeze(LogicalTapeSet *lts, int tapenum, TapeShare *share) } } +/* + * Add additional tapes to this tape set. Not intended to be used when any + * tapes are frozen. + */ +LogicalTapeSet * +LogicalTapeSetExtend(LogicalTapeSet *lts, int nAdditional) +{ + int i; + int nTapesOrig = lts->nTapes; + Size newSize; + + lts->nTapes += nAdditional; + newSize = offsetof(LogicalTapeSet, tapes) + + lts->nTapes * sizeof(LogicalTape); + + lts = (LogicalTapeSet *) repalloc(lts, newSize); + + for (i = nTapesOrig; i < lts->nTapes; i++) + ltsInitTape( tapes[i]); + + return lts; +} + /* * Backspace the tape a given number of bytes. (We also support a more * general seek interface, see below.) diff --git a/src/include/utils/logtape.h b/src/include/utils/logtape.h index 69
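To illustrate the API contract described above (the tape set may be reallocated by repalloc, so the caller must keep the returned pointer), here is a minimal, hypothetical caller-side sketch; the tape counts are made up and the serial-case arguments to LogicalTapeSetCreate() are assumptions:

/* create a serial tape set with 8 tapes, no shared state or parallel worker */
LogicalTapeSet *lts = LogicalTapeSetCreate(8, NULL, NULL, -1);

/* ... later, more tapes turn out to be needed than were created up front ... */
lts = LogicalTapeSetExtend(lts, 4);	/* must reassign: lts may have moved */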
Add absolute value to dict_int
I've seen a few requests on how to make FTS search on the absolute value of integers. This question is usually driven by the fact that the text search parser interprets a separating hyphen ("partnumber-987") as a minus sign. There is currently no good answer for this that doesn't involve C coding. I think this feature has a natural and trivial home in the dict_int extension, so attached is a patch that does that. There are no changes to the extension creation scripts, so there is no need to bump the version. And I think the convention is that we don't bump the version just to signify a change which implements a new feature when that doesn't change the creation scripts. Cheers, Jeff dict_int_absval_v001.patch Description: Binary data
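Since the attached patch is binary data here, a hedged sketch of how the new capability would presumably be used; the option name "absval" is an assumption for illustration, not confirmed by this message:

ALTER TEXT SEARCH DICTIONARY intdict (absval = true);
SELECT ts_lexize('intdict', '-123456');
-- with the option enabled this would be expected to return {123456},
-- so a separating hyphen no longer behaves like a minus sign in searches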
Re: [HACKERS] Doubt in pgbench TPS number
On 2020-02-27 12:26:36 -0800, Andres Freund wrote: > Hi, > > On 2015-09-25 20:35:45 +0200, Fabien COELHO wrote: > > > > Hello Tatsuo, > > > > > Hmmm... I never use -C. The formula seems ok: > > > > > >tps_exclude = normal_xacts / (time_include - > > >(INSTR_TIME_GET_DOUBLE(conn_total_time) / > > > nthreads)); > > > > Hmmm... it is not:-) > > > > I think that the degree of parallelism to consider is nclients, not > > nthreads: while connection time is accumulated in conn_time, other clients > > are possibly doing their transactions, in parallel, even if it is in the > > same thread, so it is not "stopped time" for all clients. It starts to > > matter with "-j 1 -c 30" and slow transactions, the cumulated conn_time in > > each thread may be arbitrary close to the whole time if there are many > > clients. > > I think this pretty much entirely broke the tps_exclude logic when not > using -C, especially when -c and -j differ. The wait time there is > actually per thread, not per client. > > In this example I set post_auth_delay=1s on the server. Pgbench > tells me: > pgbench -M prepared -c 180 -j 180 -T 10 -P1 -S > tps = 897607.544862 (including connections establishing) > tps = 1004793.708611 (excluding connections establishing) > > pgbench -M prepared -c 180 -j 60 -T 10 -P1 -S > tps = 739502.979613 (including connections establishing) > tps = 822639.038779 (excluding connections establishing) > > pgbench -M prepared -c 180 -j 30 -T 10 -P1 -S > tps = 376468.177081 (including connections establishing) > tps = 418554.527585 (excluding connections establishing) > > which pretty obviously is bogus. While I'd not expect it'd to work > perfectly, the "excluding" number should stay roughly constant. > > > The fundamental issue is that without -C *none* of the connections in > each thread gets to actually execute work before all of them have > established a connection. So dividing conn_total_time by / nclients is > wrong. > > For more realistic connection delays this leads to the 'excluding' > number being way too close to the 'including' number, even if a > substantial portion of the time is spent on it. Not suggesting it as an actual fix, but just multiplying the computed conn_time within runThread() by the number of connections the thread handles (for the non -C case) results in much more reasonable output. pgbench -M prepared -c 180 -j 30 -T 10 -P1 -S before: tps = 378393.985650 (including connections establishing) tps = 420691.025930 (excluding connections establishing) after: tps = 379818.929680 (including connections establishing) tps = 957405.785600 (excluding connections establishing) pgbench -M prepared -c 180 -j 180 -T 10 -P1 -S before: tps = 906662.031099 (including connections establishing) tps = 1012223.500880 (excluding connections establishing) after: tps = 906178.154876 (including connections establishing) tps = 1012431.154463 (excluding connections establishing) The numbers are still a bit bogus because of thread startup overhead being included, and conn_time being computed relative to the thread creation. But they at least make some basic sense. 
diff --git i/src/bin/pgbench/pgbench.c w/src/bin/pgbench/pgbench.c index 1159757acb0..3bc45107136 100644 --- i/src/bin/pgbench/pgbench.c +++ w/src/bin/pgbench/pgbench.c @@ -6265,6 +6265,16 @@ threadRun(void *arg) INSTR_TIME_SET_CURRENT(thread->conn_time); INSTR_TIME_SUBTRACT(thread->conn_time, thread->start_time); +/* add once for each other connection */ +if (!is_connect) +{ +instr_time e = thread->conn_time; +for (i = 0; i < (nstate - 1); i++) +{ +INSTR_TIME_ADD(thread->conn_time, e); +} +} + /* explicitly initialize the state machines */ for (i = 0; i < nstate; i++) { Greetings, Andres Freund
ALTER TEXT SEARCH DICTIONARY tab completion
"ALTER TEXT SEARCH DICTIONARY foobar" can be followed by an open parenthesis, but that is not offered in tab completion. That is useful, because otherwise I have to look up the docs to see if I need a SET or OPTION(S) or WITH or something before it, just to discover I don't. The attached one-line patch adds "(". We can't go beyond that, as available options for each dictionary are not known in advance. Cheers, Jeff alter_dict_tab_complete.patch Description: Binary data
Re: [HACKERS] Doubt in pgbench TPS number
Hi, On 2015-09-25 20:35:45 +0200, Fabien COELHO wrote: > > Hello Tatsuo, > > > Hmmm... I never use -C. The formula seems ok: > > > >tps_exclude = normal_xacts / (time_include - > >(INSTR_TIME_GET_DOUBLE(conn_total_time) / nthreads)); > > Hmmm... it is not:-) > > I think that the degree of parallelism to consider is nclients, not > nthreads: while connection time is accumulated in conn_time, other clients > are possibly doing their transactions, in parallel, even if it is in the > same thread, so it is not "stopped time" for all clients. It starts to > matter with "-j 1 -c 30" and slow transactions, the cumulated conn_time in > each thread may be arbitrary close to the whole time if there are many > clients. I think this pretty much entirely broke the tps_exclude logic when not using -C, especially when -c and -j differ. The wait time there is actually per thread, not per client. In this example I set post_auth_delay=1s on the server. Pgbench tells me: pgbench -M prepared -c 180 -j 180 -T 10 -P1 -S tps = 897607.544862 (including connections establishing) tps = 1004793.708611 (excluding connections establishing) pgbench -M prepared -c 180 -j 60 -T 10 -P1 -S tps = 739502.979613 (including connections establishing) tps = 822639.038779 (excluding connections establishing) pgbench -M prepared -c 180 -j 30 -T 10 -P1 -S tps = 376468.177081 (including connections establishing) tps = 418554.527585 (excluding connections establishing) which pretty obviously is bogus. While I'd not expect it to work perfectly, the "excluding" number should stay roughly constant. The fundamental issue is that without -C *none* of the connections in each thread gets to actually execute work before all of them have established a connection. So dividing conn_total_time by nclients is wrong. For more realistic connection delays this leads to the 'excluding' number being way too close to the 'including' number, even if a substantial portion of the time is spent on it. Greetings, Andres Freund
Less-silly selectivity for JSONB matching operators
While looking at a recent complaint about bad planning, I was reminded that jsonb's @> and related operators use "contsel" as their selectivity estimator. This is really bad, because (a) contsel is only a stub, yielding a fixed default estimate, and (b) that default is 0.001, meaning we estimate these operators as five times more selective than equality, which is surely pretty silly. There's a good model for improving this in ltree's ltreeparentsel(): for any "var OP constant" query, we can try applying the operator to all of the column's MCV and histogram values, taking the latter as being a random sample of the non-MCV values. That code is actually 100% generic except for the question of exactly what default selectivity ought to be plugged in when we don't have stats. Hence, the attached draft patch moves that logic into a generic function in selfuncs.c, and then invents "matchsel" and "matchjoinsel" generic estimators that have a default estimate of twice DEFAULT_EQ_SEL. (I'm not especially wedded to that number, but it seemed like a reasonable starting point.) There were a couple of other operators that seemed to be inappropriately using contsel, so I changed all of these to use matchsel: @>(tsquery,tsquery)| tsq_mcontains <@(tsquery,tsquery)| tsq_mcontained @@(text,text) | ts_match_tt @@(text,tsquery) | ts_match_tq -|-(anyrange,anyrange) | range_adjacent @>(jsonb,jsonb)| jsonb_contains ?(jsonb,text) | jsonb_exists ?|(jsonb,text[]) | jsonb_exists_any ?&(jsonb,text[]) | jsonb_exists_all <@(jsonb,jsonb)| jsonb_contained @?(jsonb,jsonpath) | jsonb_path_exists_opr @@(jsonb,jsonpath) | jsonb_path_match_opr Note: you might think that we should just shove this generic logic into contsel itself, and maybe areasel and patternsel while at it. However, that would be pretty useless for these functions' intended usage with the geometric operators, because we collect neither MCV nor histogram stats for the geometric data types, making the extra complexity worthless. Pending somebody putting some effort into estimation for the geometric data types, I think we should just get out of the business of having non-geometric types relying on these estimators. This patch is not complete, because I didn't look at changing the contrib modules, and grep says at least some of them are using contsel for non-geometric data types. But I thought I'd put it up for discussion at this stage. regards, tom lane diff --git a/contrib/ltree/ltree_op.c b/contrib/ltree/ltree_op.c index 070868f..2791037 100644 --- a/contrib/ltree/ltree_op.c +++ b/contrib/ltree/ltree_op.c @@ -559,8 +559,6 @@ ltree2text(PG_FUNCTION_ARGS) } -#define DEFAULT_PARENT_SEL 0.001 - /* * ltreeparentsel - Selectivity of parent relationship for ltree data types. */ @@ -571,101 +569,12 @@ ltreeparentsel(PG_FUNCTION_ARGS) Oid operator = PG_GETARG_OID(1); List *args = (List *) PG_GETARG_POINTER(2); int varRelid = PG_GETARG_INT32(3); - VariableStatData vardata; - Node *other; - bool varonleft; double selec; - /* - * If expression is not variable <@ something or something <@ variable, - * then punt and return a default estimate. - */ - if (!get_restriction_variable(root, args, varRelid, - &vardata, &other, &varonleft)) - PG_RETURN_FLOAT8(DEFAULT_PARENT_SEL); - - /* - * If the something is a NULL constant, assume operator is strict and - * return zero, ie, operator will never return TRUE. 
- */ - if (IsA(other, Const) && - ((Const *) other)->constisnull) - { - ReleaseVariableStats(vardata); - PG_RETURN_FLOAT8(0.0); - } - - if (IsA(other, Const)) - { - /* Variable is being compared to a known non-null constant */ - Datum constval = ((Const *) other)->constvalue; - FmgrInfo contproc; - double mcvsum; - double mcvsel; - double nullfrac; - int hist_size; - - fmgr_info(get_opcode(operator), &contproc); - - /* - * Is the constant "<@" to any of the column's most common values? - */ - mcvsel = mcv_selectivity(&vardata, &contproc, constval, varonleft, - &mcvsum); - - /* - * If the histogram is large enough, see what fraction of it the - * constant is "<@" to, and assume that's representative of the - * non-MCV population. Otherwise use the default selectivity for the - * non-MCV population. - */ - selec = histogram_selectivity(&vardata, &contproc, - constval, varonleft, - 10, 1, &hist_size); - if (selec < 0) - { - /* Nope, fall back on default */ - selec = DEFAULT_PARENT_SEL; - } - else if (hist_size < 100) - { - /* - * For histogram sizes from 10 to 100, we combine the histogram - * and default selectivities, putting increasingly more trust in - * the histogram for larger sizes. - */ - double hist_weight = hist_size / 100.0; - - selec = selec * hist_weight + -DEFAULT_PARENT_SEL * (1.0 - hist_weight); - } - - /* In any case, don'
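To make the proposed switch concrete, here is a hedged illustration of how an operator would pick up the new estimators, written as extension-style DDL (in core the change would be made in pg_operator.dat instead; this definition is only a sketch, not part of the patch):

CREATE OPERATOR @> (
    LEFTARG = jsonb,
    RIGHTARG = jsonb,
    FUNCTION = jsonb_contains,
    RESTRICT = matchsel,
    JOIN = matchjoinsel
);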
Re: [GsoC] Read/write transaction-level routing in Odyssey Project Idea
Greetings, * koschasia...@gmail.com (koschasia...@gmail.com) wrote: > I send this response to remind you of my initial email because it might have > been lost among others. Andrey is the one listed as a possible mentor for that project- I've added him to the CC list. Hopefully he'll get back to you soon. Thanks, Stephen signature.asc Description: PGP signature
Re: [GsoC] Read/write transaction-level routing in Odyssey Project Idea
Hello once again, I send this response to remind you of my initial email because it might have been lost among others. I also want to know, if I can familiarize myself with your project idea before even sending my proposal. Thanks for your time, I understand you receive hundreds of emails. > On 25 Feb 2020, at 17:56, Kostas Chasialis wrote: > > > Greetings, > > I am a Computer Science student at National and Kapodistrian University of > Athens, and I would like to be a part of this year's GsoC program. > During my academic courses, I developed an interest on databases (back-end) > and the main language I used during my academic career is C. > > A project I recently developed was In-Memory SUM SQL Query Execution on > RDBMS-like queries using SortMerge Join. As input data we used the data > provided from SIGMOD 2018 contest. > (github link : https://github.com/kchasialis/Query-Compiler-Executor) > > Therefore, the project idea "Read/Write transaction-level routing in Odyssea" > highly intrigued me and I would like to be a part of it. > I believe that I have the necessary background to live up to your > expectations and I would like to learn more things about this project and > ways I can contribute to its development. > > Thanks in advance, Kostas. >
Re: pgbench: option delaying queries till connections establishment?
Hi, On 2020-02-27 10:01:00 -0800, Andres Freund wrote: > If so, should this be done unconditionally? A new option? Included in an > existing one somehow? FWIW, leaving windows, error handling, and other annoyances aside, this can be implemented fairly simply. See below. As an example of the difference: Before: andres@awork3:~/build/postgres/dev-optimize/vpath$ ./src/bin/pgbench/pgbench -M prepared -c 5000 -j 100 -T 100 -P1 -S starting vacuum...end. progress: 100.4 s, 515307.4 tps, lat 1.374 ms stddev 7.739 transaction type: scaling factor: 30 query mode: prepared number of clients: 5000 number of threads: 100 duration: 100 s number of transactions actually processed: 51728348 latency average = 1.374 ms latency stddev = 7.739 ms tps = 513802.541226 (including connections establishing) tps = 521342.427158 (excluding connections establishing) Note that there's no progress report until the end. That's because the main thread didn't get a connection until the other threads were done. After: pgbench -M prepared -c 5000 -j 100 -T 100 -P1 -S starting vacuum...end. progress: 1.5 s, 9943.5 tps, lat 4.795 ms stddev 14.822 progress: 2.0 s, 380312.6 tps, lat 1.728 ms stddev 15.461 progress: 3.0 s, 478811.1 tps, lat 2.052 ms stddev 31.687 progress: 4.0 s, 470804.6 tps, lat 1.941 ms stddev 24.661 I think this also shows that "including/excluding connections establishing" as well as some of the other stats reported pretty bogus. In the 'before' case a substantial numer of the connections had not yet been established until the end of the test run! diff --git i/src/bin/pgbench/pgbench.c w/src/bin/pgbench/pgbench.c index 1159757acb0..1a82c6a290e 100644 --- i/src/bin/pgbench/pgbench.c +++ w/src/bin/pgbench/pgbench.c @@ -310,6 +310,8 @@ typedef struct RandomState /* Various random sequences are initialized from this one. */ static RandomState base_random_sequence; +pthread_barrier_t conn_barrier; + /* * Connection state machine states. */ @@ -6110,6 +6112,8 @@ main(int argc, char **argv) /* start threads */ #ifdef ENABLE_THREAD_SAFETY +pthread_barrier_init(&conn_barrier, NULL, nthreads); + for (i = 0; i < nthreads; i++) { TState *thread = &threads[i]; @@ -6265,6 +6269,8 @@ threadRun(void *arg) INSTR_TIME_SET_CURRENT(thread->conn_time); INSTR_TIME_SUBTRACT(thread->conn_time, thread->start_time); +pthread_barrier_wait(&conn_barrier); + /* explicitly initialize the state machines */ for (i = 0; i < nstate; i++) { Greetings, Andres Freund
pgbench: option delaying queries till connections establishment?
Hi, I am trying to run a few benchmarks measuring the effects of a patch to make GetSnapshotData() faster in the face of larger numbers of established connections. Before the patch, connection establishment is often very slow due to contention. The first few connections are fast, but after that it takes increasingly long. The first few connections constantly hold ProcArrayLock in shared mode, which then makes it hard for new connections to acquire it exclusively (I'm addressing that to a significant degree in the patch FWIW). But for a fair comparison of the runtime effects I'd like to only compare the throughput for when connections are actually usable, otherwise I end up benchmarking few vs many connections, which is not useful. And because I'd like to run the numbers for a lot of different numbers of connections etc, I can't just make each run several hours long to make the initial minutes not matter much. Therefore I'd like to make pgbench wait until it has established all connections before they run queries. Does anybody else see this as being useful? If so, should this be done unconditionally? A new option? Included in an existing one somehow? Greetings, Andres Freund
Re: Allow auto_explain to log plans before queries are executed
Hi, that feature for dumping plans with auto_explain is already available in https://github.com/legrandlegrand/pg_stat_sql_plans This is a hybrid extension combining auto_explain and pg_stat_statements, adding a planid and tracking metrics even on error, etc. With pg_stat_sql_plans.track_planid = true and pg_stat_sql_plans.explain = true it writes the explain plan to the log file after planning, and only once per (queryid, planid), so there is no need for sampling. And with pg_stat_sql_plans.track = 'all' the function pgssp_backend_queryid(pid) retrieves the (nested) queryid of a stuck statement, which permits retrieving its plan (by its queryid) from the logs. Regards PAscal -- Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
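For reference, the settings mentioned above as they would appear in postgresql.conf (names are taken from the description in this message):

shared_preload_libraries = 'pg_stat_sql_plans'
pg_stat_sql_plans.track_planid = true
pg_stat_sql_plans.explain = true
pg_stat_sql_plans.track = 'all'

After a restart and CREATE EXTENSION, pgssp_backend_queryid(pid) can then be used to obtain the queryid of a stuck backend and look up its plan in the logs, as described above.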
Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.
On Wed, Feb 26, 2020 at 10:03 PM Fujii Masao wrote: > Thanks for committing this nice feature! You're welcome! > Here is one minor comment. > > + deduplicate_items > + storage parameter > > This should be > > deduplicate_items storage parameter I pushed a fix for this. Thanks -- Peter Geoghegan
BEFORE ROW triggers for partitioned tables
Enabling BEFORE triggers FOR EACH ROW in partitioned tables is very easy -- just remove the check against them. With that, they work fine. The main problem is that the executor is not prepared to re-route the tuple if the user decides to change the partitioning columns (so you get the error that the partitioning constraint is violated by the partition, which makes no sense if you're inserting in the top-level partitioned table). There are several views about that: 1. Just let it be. If the user does something silly, it's their problem if they get an ugly error message. 2. If the partition keys are changed, raise an error. The trigger is allowed to change everything but those columns. Then there's no conflict, and it allows desirable use cases. 3. Allow the partition keys to change, as long as the tuple ends up in the same partition. This is the same as (1) except the error message is nicer. The attached patch implements (2). The cases that are allowed by (3) are a strict superset of those allowed by (2), so if we decide to allow it in the future, it is possible without breaking anything that works after implementing (2). The implementation harnesses the newly added pg_trigger.tgparentid column; if that is set to a non-zero value, then we search up the partitioning hierarchy for each partitioning key column, and verify the values are bitwise equal, up to the "root". Notes: * We must check all levels, not just the one immediately above, because the routing might involve crawling down several levels, and any of those might become invalidated if the trigger changes values. * The "root" is not necessarily the root partitioned table, but instead it's the table that was named in the command. Because of this, we don't need to acquire locks on the tables, since the executor already has the tables open and locked (thus they cannot be modified by concurrent commands). * I find it a little odd that the leaf partition doesn't have any intel on what its partitioning columns are. I guess they haven't been needed thus far, and it seems inappropriate for this admittedly very small feature to add such a burden on the rest of the system. * The new function I added, ReportTriggerPartkeyChange(), contains one serious bug (namely: it doesn't map attribute numbers properly if partitions are differently defined). Also, it has a performance issue, namely that we do heap_getattr() potentially repeatedly -- maybe it'd be better to "slotify" the tuple prior to doing the checks. Another possible controversial point is that its location in commands/trigger.c isn't great. (Frankly, I don't understand why the executor trigger firing functions are in commands/ at all.) Thoughts? -- Álvaro Herrera >From 20ad3320d9d5f16756d3fd0bba2db5df74117d35 Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Wed, 26 Feb 2020 17:34:54 -0300 Subject: [PATCH] Enable BEFORE row-level triggers for partitioned tables ... with the limitation that if partitioning columns are changed, the operation is aborted. (The reason for this limitation is that it might require re-routing the tuple, which doesn't work.) 
--- src/backend/commands/tablecmds.c | 7 -- src/backend/commands/trigger.c | 106 ++--- src/backend/partitioning/partdesc.c| 2 +- src/include/utils/reltrigger.h | 1 + src/test/regress/expected/triggers.out | 53 - src/test/regress/sql/triggers.sql | 39 - 6 files changed, 182 insertions(+), 26 deletions(-) diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 02a7c04fdb..3b560067a4 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -16503,13 +16503,6 @@ CloneRowTriggersToPartition(Relation parent, Relation partition) !OidIsValid(trigForm->tgparentid))) continue; - /* - * Complain if we find an unexpected trigger type. - */ - if (!TRIGGER_FOR_AFTER(trigForm->tgtype)) - elog(ERROR, "unexpected trigger \"%s\" found", - NameStr(trigForm->tgname)); - /* Use short-lived context for CREATE TRIGGER */ oldcxt = MemoryContextSwitchTo(perTupCxt); diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c index 6e8b7223fe..d4687de211 100644 --- a/src/backend/commands/trigger.c +++ b/src/backend/commands/trigger.c @@ -53,10 +53,12 @@ #include "utils/acl.h" #include "utils/builtins.h" #include "utils/bytea.h" +#include "utils/datum.h" #include "utils/fmgroids.h" #include "utils/inval.h" #include "utils/lsyscache.h" #include "utils/memutils.h" +#include "utils/partcache.h" #include "utils/rel.h" #include "utils/snapmgr.h" #include "utils/syscache.h" @@ -82,6 +84,9 @@ static int MyTriggerDepth = 0; /* Local function prototypes */ static void ConvertTriggerToFK(CreateTrigStmt *stmt, Oid funcoid); static void SetTriggerFlags(TriggerDesc *trigdesc, Trigger *trigger); +static void ReportTriggerPartkeyChange(ResultRelInfo *relinfo, +
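To make the semantics of option (2) concrete, here is a hedged SQL sketch of the behavior the patch aims for (object names are made up; the exact error message is not taken from the patch):

CREATE TABLE parted (a int, b text) PARTITION BY LIST (a);
CREATE TABLE parted_1 PARTITION OF parted FOR VALUES IN (1);

CREATE FUNCTION parted_trigfunc() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
    NEW.a = NEW.a + 1;   -- modifies a partitioning column
    RETURN NEW;
END;
$$;

CREATE TRIGGER t BEFORE INSERT ON parted
    FOR EACH ROW EXECUTE FUNCTION parted_trigfunc();

INSERT INTO parted VALUES (1, 'hello');
-- expected to fail under option (2): the BEFORE ROW trigger changed the
-- partition key, so the command is aborted instead of mis-routing the tuple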
Re: pg_trigger.tgparentid
Alvaro Herrera writes: > Thanks both -- pushed. I also changed regress/sql/triggers to leave > tables around that have a non-zero tgparentid. This ensures that the > pg_upgrade test sees such objects, as well as findoidjoins. > I refrained from doing the findoidjoins dance itself, though; I got a > large number of false positives that I think are caused by some pg12-era > hacking. Generally I try to update findoidjoins once per release cycle, after feature freeze. I don't think it's worth messing with it more often than that. But thanks for making sure there'll be data for it to find. regards, tom lane
Re: Error on failed COMMIT
On Thu, 27 Feb 2020 at 07:44, Dave Cramer wrote: > > > > On Wed, 26 Feb 2020 at 16:57, Vik Fearing wrote: > >> On 26/02/2020 22:22, Tom Lane wrote: >> > Dave Cramer writes: >> >> OK, here is a patch that actually doesn't leave the transaction in a >> failed >> >> state but emits the error and rolls back the transaction. >> > >> >> This is far from complete as it fails a number of tests and does not >> cover >> >> all of the possible paths. >> >> But I'd like to know if this is strategy will be acceptable ? >> > >> > I really don't think that changing the server's behavior here is going >> to >> > fly. The people who are unhappy that we changed it are going to vastly >> > outnumber the people who are happy. Even the people who are happy are >> not >> > going to find that their lives are improved all that much, because >> they'll >> > still have to deal with old servers with the old behavior for the >> > foreseeable future. >> >> Dealing with old servers for a while is something that everyone is used >> to. >> >> > Even granting that a behavioral incompatibility is acceptable, I'm not >> > sure how a client is supposed to be sure that this "error" means that a >> > rollback happened, as opposed to real errors that prevented any state >> > change from occurring. >> >> Because the error is a Class 40 — Transaction Rollback. >> >> My original example was: >> >> postgres=!# commit; >> ERROR: 40P00: transaction cannot be committed >> DETAIL: First error was "42601: syntax error at or near "error"". >> >> >> > (A trivial example of that is misspelling the >> > COMMIT command; which I'll grant is unlikely in practice. But there are >> > less-trivial examples involving internal server malfunctions.) >> >> Misspelling the COMMIT command is likely a syntax error, which is Class >> 42. Can you give one of those less-trivial examples please? >> >> > The only >> > way to be sure you're out of the transaction is to check the transaction >> > state that's sent along with ReadyForQuery ... but if you need to do >> > that, it's not clear why we should change the server behavior at all. >> >> How does this differ from the deferred constraint violation example you >> provided early on in the thread? That gave the error 23505 and >> terminated the transaction. If you run the same scenario with the >> primary key immediate, you get the *exact same error* but the >> transaction is *not* terminated! >> >> I won't go so far as to suggest we change all COMMIT errors to Class 40 >> (as the spec says), but I'm thinking it very loudly. >> >> > I also don't think that this scales to the case of subtransaction >> > commit/rollback. That should surely act the same, but your patch >> doesn't >> > change it. >> >> How does one commit a subtransaction? >> >> > Lastly, introducing a new client-visible message level seems right out. >> > That's a very fundamental protocol break, independently of all else. >> >> Yeah, this seemed like a bad idea to me, too. >> > > Ok, here is a much less obtrusive solution thanks to Vladimir. > Still had to mess with error levels since commit and chain needs the existing context to succeed. After fixing up the tests only 1 still failing. Dave Cramer http://www.postgres.rocks throwerror2.patch Description: Binary data
Re: pg_trigger.tgparentid
Thanks both -- pushed. I also changed regress/sql/triggers to leave tables around that have a non-zero tgparentid. This ensures that the pg_upgrade test sees such objects, as well as findoidjoins. I refrained from doing the findoidjoins dance itself, though; I got a large number of false positives that I think are caused by some pg12-era hacking. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
jsonpath syntax extensions
Hi, hackers! Attached patches implement several useful jsonpath syntax extensions. I already published them two years ago in the original SQL/JSON thread, but then after creation of separate threads for SQL/JSON functions and JSON_TABLE I forgot about them. A brief description of the patches: 1. Introduced new jsonpath modifier 'pg' which is used for enabling PostgreSQL-specific extensions. This feature was already proposed in the discussion of jsonpath's like_regex implementation. 2. Added support for raw jbvObject and jbvArray JsonbValues inside jsonpath engine. Now, jsonpath can operate with JSON arrays and objects only in jbvBinary form. But with introduction of array and object constructors in patches #4 and #5 raw in-memory jsonb containers can appear in jsonpath engine. In some places we can iterate through jbvArrays, in others we need to encode jbvArrays and jbvObjects into jbvBinay. 3. SQL/JSON sequence construction syntax. A simple comma-separated list can be used to concatenate single values or sequences into a single resulting sequence. SELECT jsonb_path_query('[1, 2, 3]', 'pg $[*], 4, 2 + 3'); jsonb_path_query -- 1 2 3 4 5 SELECT jsonb_path_query('{ "a": [1, 2, 3], "b": [4, 5] }', 'pg ($.a[*], $.b[*]) ? (@ % 2 == 1)'); jsonb_path_query -- 1 3 5 Patches #4-#6 implement ECMAScript-like syntax constructors and accessors: 4. Array construction syntax. This can also be considered as enclosing a sequence constructor into brackets. SELECT jsonb_path_query('[1, 2, 3]', 'pg [$[*], 4, 2 + 3]'); jsonb_path_query -- [1, 2, 3, 4, 5] Having this feature, jsonb_path_query_array() becomes somewhat redundant. 5. Object construction syntax. It is useful for constructing derived objects from the interesting parts of the original object. (But this is not sufficient to "project" each object in array, item method like '.map()' is needed here.) SELECT jsonb_path_query('{"b": 2}', 'pg { a : 1, b : $.b, "x y" : $.b + 3 }'); jsonb_path_query --- { "a" : 1, "b": 3, "x y": 5 } Fields with empty values are simply skipped regardless of lax/strict mode: SELECT jsonb_path_query('{"a": 1}', 'pg { b : $.b, a : $.a ? (@ > 1) }'); jsonb_path_query -- {} 6. Object subscription syntax. This gives us ability to specify what key to extract on runtime. The syntax is the same as ordinary array subscription syntax. 
-- non-existent $.x is simply skipped in lax mode SELECT jsonb_path_query('{"a": "b", "b": "c"}', 'pg $[$.a, "x", "a"]'); jsonb_path_query -- "c" "b" SELECT jsonb_path_query('{"a": "b", "b": "c"}', 'pg $[$fld]', '{"fld": "b"}'); jsonb_path_query -- "c" -- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company >From 258579bf45484fd150d952cf78f32f37fabff77a Mon Sep 17 00:00:00 2001 From: Nikita Glukhov Date: Thu, 27 Jun 2019 19:58:35 +0300 Subject: [PATCH v1 1/8] Add jsonpath 'pg' modifier for enabling extensions --- doc/src/sgml/func.sgml | 29 ++ src/backend/utils/adt/jsonpath.c | 54 ++ src/backend/utils/adt/jsonpath_exec.c | 4 +++ src/backend/utils/adt/jsonpath_gram.y | 17 +++ src/backend/utils/adt/jsonpath_scan.l | 1 + src/include/utils/jsonpath.h | 9 -- src/test/regress/expected/jsonpath.out | 18 src/test/regress/sql/jsonpath.sql | 3 ++ src/tools/pgindent/typedefs.list | 1 + 9 files changed, 116 insertions(+), 20 deletions(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 28035f1..d344b95 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -12601,6 +12601,35 @@ table2-mapping + +Extensions + + PostgreSQL has some extensions to the SQL/JSON + Path standard. These syntax extensions that can enabled by the specifying + additional pg modifier before the + strict/lax flags. + shows these + extensions with examples. + + + +jsonpath Syntax Extensions + + + +Name +Description +Example JSON +Example JSON path +Result + + + + + + + + Regular Expressions diff --git a/src/backend/utils/adt/jsonpath.c b/src/backend/utils/adt/jsonpath.c index 3c0dc38..17c09a7 100644 --- a/src/backend/utils/adt/jsonpath.c +++ b/src/backend/utils/adt/jsonpath.c @@ -72,11 +72,20 @@ #include "utils/jsonpath.h" +/* Context for jsonpath encoding. */ +typedef struct JsonPathEncodingContext +{ + StringInfo buf; /* output buffer */ + bool ext; /* PG extensions are enabled? */ +} JsonPathEncodingContext; + static Datum jsonPathFromCstring(char *in, int len); static char *jsonPathToCstring(StringInfo out, JsonPath *in, int estimated
Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line
On 2020-02-27 16:41, Alexey Kondratov wrote: New version of the patch is attached. Thanks again for your review. Last patch (v18) got a conflict with one of today commits (05d8449e73). Rebased version is attached. -- Alexey Kondratov Postgres Professional https://www.postgrespro.com The Russian Postgres Company From ea93b52b298d80aac547735c5917386b37667595 Mon Sep 17 00:00:00 2001 From: Alexander Korotkov Date: Tue, 25 Feb 2020 02:22:45 +0300 Subject: [PATCH v19] pg_rewind: Add options to restore WAL files from archive Currently, pg_rewind fails when it could not find required WAL files in the target data directory. One have to manually figure out which WAL files are required and copy them back from archive. This commit implements new pg_rewind options, which allow pg_rewind to automatically retrieve missing WAL files from archival storage. The restore_command option is read from postgresql.conf. Discussion: https://postgr.es/m/a3acff50-5a0d-9a2c-b3b2-ee36168955c1%40postgrespro.ru Author: Alexey Kondratov Reviewed-by: Michael Paquier, Andrey Borodin, Alvaro Herrera Reviewed-by: Andres Freund, Alexander Korotkov --- doc/src/sgml/ref/pg_rewind.sgml | 28 -- src/backend/access/transam/xlogarchive.c | 60 ++-- src/bin/pg_rewind/parsexlog.c| 33 ++- src/bin/pg_rewind/pg_rewind.c| 77 +++- src/bin/pg_rewind/pg_rewind.h| 6 +- src/bin/pg_rewind/t/001_basic.pl | 3 +- src/bin/pg_rewind/t/RewindTest.pm| 66 +- src/common/Makefile | 2 + src/common/archive.c | 102 + src/common/fe_archive.c | 111 +++ src/include/common/archive.h | 22 + src/include/common/fe_archive.h | 19 src/tools/msvc/Mkvcbuild.pm | 8 +- 13 files changed, 457 insertions(+), 80 deletions(-) create mode 100644 src/common/archive.c create mode 100644 src/common/fe_archive.c create mode 100644 src/include/common/archive.h create mode 100644 src/include/common/fe_archive.h diff --git a/doc/src/sgml/ref/pg_rewind.sgml b/doc/src/sgml/ref/pg_rewind.sgml index 42d29edd4e..64a6942031 100644 --- a/doc/src/sgml/ref/pg_rewind.sgml +++ b/doc/src/sgml/ref/pg_rewind.sgml @@ -66,11 +66,11 @@ PostgreSQL documentation can be found either on the target timeline, the source timeline, or their common ancestor. In the typical failover scenario where the target cluster was shut down soon after the divergence, this is not a problem, but if the - target cluster ran for a long time after the divergence, the old WAL - files might no longer be present. In that case, they can be manually - copied from the WAL archive to the pg_wal directory, or - fetched on startup by configuring or - . The use of + target cluster ran for a long time after the divergence, its old WAL + files might no longer be present. In this case, you can manually copy them + from the WAL archive to the pg_wal directory, or run + pg_rewind with the -c option to + automatically retrieve them from the WAL archive. The use of pg_rewind is not limited to failover, e.g. a standby server can be promoted, run some write transactions, and then rewinded to become a standby again. @@ -232,6 +232,19 @@ PostgreSQL documentation + + -c + --restore-target-wal + + +Use the restore_command defined in +the target cluster configuration to retrieve WAL files from +the WAL archive if these files are no longer available in the +pg_wal directory. + + + + --debug @@ -318,7 +331,10 @@ GRANT EXECUTE ON function pg_catalog.pg_read_binary_file(text, bigint, bigint, b history forked off from the target cluster. For each WAL record, record each data block that was touched. 
This yields a list of all the data blocks that were changed in the target cluster, after the - source cluster forked off. + source cluster forked off. If some of the WAL files are no longer + available, try re-running pg_rewind with + the -c option to search for the missing files in + the WAL archive. diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c index 188b73e752..f78a7e8f02 100644 --- a/src/backend/access/transam/xlogarchive.c +++ b/src/backend/access/transam/xlogarchive.c @@ -21,6 +21,7 @@ #include "access/xlog.h" #include "access/xlog_internal.h" +#include "common/archive.h" #include "miscadmin.h" #include "postmaster/startup.h" #include "replication/walsender.h" @@ -55,9 +56,6 @@ RestoreArchivedFile(char *path, const char *xlogfname, char xlogpath[MAXPGPATH]; char xlogRestoreCmd[MAXPGPATH]; char lastRestartPointFname[MAXPGPATH]; - char *dp; - char *end
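For reference, a hedged example of how the new option would typically be invoked once the target cluster's postgresql.conf has restore_command configured (paths and connection string are illustrative):

pg_rewind --target-pgdata=/path/to/old_primary \
    --source-server='host=new_primary port=5432 user=postgres dbname=postgres' \
    --restore-target-wal

The short form -c can be used instead of --restore-target-wal, per the documentation change above.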
Re: proposal: schema variables
čt 27. 2. 2020 v 15:59 odesílatel DUVAL REMI napsal: > Hello Pavel. > > > > That looks pretty good to me ! > > > > I’m adding Philippe Beaudoin who was also interested in this topic. > > > > Recap : We were looking for a way to separate variable from constants in > the “Schema Variables” proposition from Pavel. > > Pavel was saying that there are some limitations regarding the keywords we > can use, as the community don’t want to introduce too much new keywords in > Postgres SQL (PL/pgSQL is a different list of keywords). > > “CONSTANT” is not a keyword in SQL for Now (though it is one in PL/pgSQL). > > Pavel’s syntax allow to use it as a keyword in the “WITH OPTIONS” clause > that is already supported. > > … I think it’s a good idea. > > > > The list of keywords is defined in : postgresql\src\include\parser\kwlist.h > > > > Pavel, I saw that in DB2, those variables are called “Global Variables”, > is it something we can consider changing, or do you prefer to keep using > the “Schema Variable” name ? > It is most hard question. Global variables has sense, but when we will use it in plpgsql, then this name can be little bit confusing. Personally I prefer "schema variable" although my opinion is not too strong. This name more signalize so this is more generic, more database related than some special kind of plpgsql variables. Now, I think so maybe is better to use schema variables, because there is different syntax then global temp tables. Variables are global by design. So in this moment I prefer the name "schema variables". It can be used as global variables in plpgsql, but it is one case. Pavel > > > > *De :* Pavel Stehule [mailto:pavel.steh...@gmail.com] > *Envoyé :* jeudi 27 février 2020 15:38 > *À :* DUVAL REMI > *Cc :* PostgreSQL Hackers > *Objet :* Re: proposal: schema variables > > > > > > Hi > > > > > 3) Any way to define CONSTANTs ? > We already talked a bit about this subject and also Gilles Darold > introduces it in this mailing-list topic but I'd like to insist on it. > I think it would be nice to have a way to say that a variable should not > be changed once defined. > Maybe it's hard to implement and can be implemented later, but I just want > to know if this concern is open. > > > > I played little bit with it and I didn't find any nice solution, but maybe > I found the solution. I had ideas about some variants, but almost all time > I had a problem with parser's shifts because all potential keywords are not > reserved. > > > > last variant, but maybe best is using keyword WITH > > > > So the syntax can looks like > > > > CREATE [ TEMP ] VARIABLE varname [ AS ] type [ NOT NULL ] [ DEFAULT > expression ] [ WITH [ OPTIONS ] '(' ... ')' ] ] > > > > What do you think about this syntax? It doesn't need any new keyword, and > it easy to enhance it. > > > > CREATE VARIABLE foo AS int DEFAULT 10 WITH OPTIONS ( CONSTANT); > > > > ? > > > > Regards > > > > Pavel > > > > >
RE: proposal: schema variables
Hello Pavel. That looks pretty good to me ! I’m adding Philippe Beaudoin who was also interested in this topic. Recap : We were looking for a way to separate variable from constants in the “Schema Variables” proposition from Pavel. Pavel was saying that there are some limitations regarding the keywords we can use, as the community don’t want to introduce too much new keywords in Postgres SQL (PL/pgSQL is a different list of keywords). “CONSTANT” is not a keyword in SQL for Now (though it is one in PL/pgSQL). Pavel’s syntax allow to use it as a keyword in the “WITH OPTIONS” clause that is already supported. … I think it’s a good idea. The list of keywords is defined in : postgresql\src\include\parser\kwlist.h Pavel, I saw that in DB2, those variables are called “Global Variables”, is it something we can consider changing, or do you prefer to keep using the “Schema Variable” name ? De : Pavel Stehule [mailto:pavel.steh...@gmail.com] Envoyé : jeudi 27 février 2020 15:38 À : DUVAL REMI Cc : PostgreSQL Hackers Objet : Re: proposal: schema variables Hi 3) Any way to define CONSTANTs ? We already talked a bit about this subject and also Gilles Darold introduces it in this mailing-list topic but I'd like to insist on it. I think it would be nice to have a way to say that a variable should not be changed once defined. Maybe it's hard to implement and can be implemented later, but I just want to know if this concern is open. I played little bit with it and I didn't find any nice solution, but maybe I found the solution. I had ideas about some variants, but almost all time I had a problem with parser's shifts because all potential keywords are not reserved. last variant, but maybe best is using keyword WITH So the syntax can looks like CREATE [ TEMP ] VARIABLE varname [ AS ] type [ NOT NULL ] [ DEFAULT expression ] [ WITH [ OPTIONS ] '(' ... ')' ] ] What do you think about this syntax? It doesn't need any new keyword, and it easy to enhance it. CREATE VARIABLE foo AS int DEFAULT 10 WITH OPTIONS ( CONSTANT); ? Regards Pavel
Re: proposal: schema variables
Hi > 3) Any way to define CONSTANTs ? > We already talked a bit about this subject and also Gilles Darold > introduces it in this mailing-list topic but I'd like to insist on it. > I think it would be nice to have a way to say that a variable should not > be changed once defined. > Maybe it's hard to implement and can be implemented later, but I just want > to know if this concern is open. > I played a little bit with it and I didn't find any nice solution, but maybe now I have found one. I had ideas about several variants, but almost every time I had a problem with parser shift conflicts, because the potential keywords are not reserved. The last variant, and maybe the best one, is using the keyword WITH. So the syntax can look like CREATE [ TEMP ] VARIABLE varname [ AS ] type [ NOT NULL ] [ DEFAULT expression ] [ WITH [ OPTIONS ] '(' ... ')' ] ] What do you think about this syntax? It doesn't need any new keyword, and it is easy to enhance. CREATE VARIABLE foo AS int DEFAULT 10 WITH OPTIONS ( CONSTANT); ? Regards Pavel
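To make the intended behavior concrete, a short hedged usage sketch (the LET assignment syntax comes from the schema variables patch series; the failure on the last line is the expected outcome of the CONSTANT option, not something demonstrated in this message):

CREATE VARIABLE foo AS int DEFAULT 10 WITH OPTIONS ( CONSTANT );
SELECT foo;     -- returns 10
LET foo = 20;   -- expected to fail, because foo was declared CONSTANT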
Re: Commit fest manager for 2020-03
On 2020-Feb-27, Michael Paquier wrote: > On Wed, Feb 26, 2020 at 08:34:26PM +0100, Tomas Vondra wrote: > > Nope, the RMT for PG12 was announced on 2019/03/30 [1], i.e. shortly > > before the end of the last CF (and before pgcon). I think there was some > > discussion about the members at/after the FOSDEM dev meeting. The > > overlap with CFM duties is still fairly minimal, and there's not much > > for RMT to do before the end of the last CF anyway ... > > > > [1] https://www.postgresql.org/message-id/20190330094043.ga28...@paquier.xyz > > > > Maybe we shouldn't wait with assembling RMT until pgcon, though. > > Waiting until PGCon if a bad idea, +1. As I recall, the RMT assembles around the time the last CF is over. Last year it was announced on March 30th, which is the latest date it has ever happened. The history can be seen at the bottom here: https://wiki.postgresql.org/wiki/RMT -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line
On 2020-02-27 04:52, Michael Paquier wrote: On Thu, Feb 27, 2020 at 12:43:55AM +0300, Alexander Korotkov wrote: Regarding text split change, it was made by pgindent. I didn't notice it belongs to unchanged part of code. Sure, we shouldn't include this into the patch. I have read through v17 (not tested, sorry), and spotted a couple of issues that need to be addressed. + "--source-pgdata=$standby_pgdata", + "--target-pgdata=$master_pgdata", + "--no-sync", "--no-ensure-shutdown", FWIW, I think that perl indenting would reshape this part. I would recommend to run src/tools/pgindent/pgperltidy and ./src/tools/perlcheck/pgperlcritic before commit. Thanks, formatted this part with perltidy. It also has modified RecursiveCopy's indents. Pgperlcritic has no complains about this file. BTW, being executed on the whole project pgperltidy modifies dozens of perl files an even pgindent itself. + * Copyright (c) 2020, PostgreSQL Global Development Group Wouldn't it be better to just use the full copyright here? I mean the following: Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group Portions Copyright (c) 1994, The Regents of the University of California I think so, it contains some older code parts, so it is better to use unified copyrights. +++ b/src/common/archive.c [...] +#include "postgres.h" + +#include "common/archive.h" This is incorrect. All files shared between the backend and the frontend in src/common/ have to include the following set of headers: #ifndef FRONTEND #include "postgres.h" #else #include "postgres_fe.h" #endif +++ b/src/common/fe_archive.c [...] +#include "postgres_fe.h" This is incomplete. The following piece should be added: #ifndef FRONTEND #error "This file is not expected to be compiled for backend code" #endif Fixed both. + snprintf(postgres_cmd, sizeof(postgres_cmd), "%s -D %s -C restore_command", +postgres_exec_path, datadir_target); + I think that this is missing proper quoting. Yep, added the same quoting as in pg_upgrade/options. I would rename ConstructRestoreCommand() to BuildRestoreCommand() while on it.. OK, shorter is better. I think that it would be saner to check the return status of ConstructRestoreCommand() in xlogarchive.c as a sanity check, with an elog(ERROR) if not 0, as that should never happen. Added. New version of the patch is attached. Thanks again for your review. Regards -- Alexey Kondratov Postgres Professional https://www.postgrespro.com The Russian Postgres Company From c775a2e40e405474f6ecef35843d276d43fb462f Mon Sep 17 00:00:00 2001 From: Alexander Korotkov Date: Tue, 25 Feb 2020 02:22:45 +0300 Subject: [PATCH v18] pg_rewind: Add options to restore WAL files from archive Currently, pg_rewind fails when it could not find required WAL files in the target data directory. One have to manually figure out which WAL files are required and copy them back from archive. This commit implements new pg_rewind options, which allow pg_rewind to automatically retrieve missing WAL files from archival storage. The restore_command option is read from postgresql.conf. 
Discussion: https://postgr.es/m/a3acff50-5a0d-9a2c-b3b2-ee36168955c1%40postgrespro.ru Author: Alexey Kondratov Reviewed-by: Michael Paquier, Andrey Borodin, Alvaro Herrera Reviewed-by: Andres Freund, Alexander Korotkov --- doc/src/sgml/ref/pg_rewind.sgml | 28 -- src/backend/access/transam/xlogarchive.c | 60 ++-- src/bin/pg_rewind/parsexlog.c| 33 ++- src/bin/pg_rewind/pg_rewind.c| 77 +++- src/bin/pg_rewind/pg_rewind.h| 6 +- src/bin/pg_rewind/t/001_basic.pl | 3 +- src/bin/pg_rewind/t/RewindTest.pm| 66 +- src/common/Makefile | 2 + src/common/archive.c | 102 + src/common/fe_archive.c | 111 +++ src/include/common/archive.h | 22 + src/include/common/fe_archive.h | 19 src/tools/msvc/Mkvcbuild.pm | 8 +- 13 files changed, 457 insertions(+), 80 deletions(-) create mode 100644 src/common/archive.c create mode 100644 src/common/fe_archive.c create mode 100644 src/include/common/archive.h create mode 100644 src/include/common/fe_archive.h diff --git a/doc/src/sgml/ref/pg_rewind.sgml b/doc/src/sgml/ref/pg_rewind.sgml index 42d29edd4e..64a6942031 100644 --- a/doc/src/sgml/ref/pg_rewind.sgml +++ b/doc/src/sgml/ref/pg_rewind.sgml @@ -66,11 +66,11 @@ PostgreSQL documentation can be found either on the target timeline, the source timeline, or their common ancestor. In the typical failover scenario where the target cluster was shut down soon after the divergence, this is not a problem, but if the -
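To make the mechanism discussed in this thread concrete: the patch obtains restore_command from the target cluster's own configuration by invoking the postgres binary, and the command template uses the usual %f/%p substitutions. The paths below are only placeholders, and the exact option names may differ from the final patch:

    # Ask the target cluster for its configured restore_command (run while the cluster is shut down)
    postgres -D /path/to/target_pgdata -C restore_command

    # A typical restore_command in postgresql.conf; %f is the WAL file name, %p the destination path
    restore_command = 'cp /mnt/server/archivedir/%f "%p"'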
Re: Improve handling of parameter differences in physical replication
On 2020-02-27 11:13, Fujii Masao wrote: Btw., I think the current setup is slightly buggy. The MaxBackends value that is used to size shared memory is computed as MaxConnections + autovacuum_max_workers + 1 + max_worker_processes + max_wal_senders, but we don't track autovacuum_max_workers in WAL. Maybe this is because autovacuum doesn't work during recovery? Autovacuum on the primary can use locks or xids, and so it's possible that the standby when processing WAL encounters more of those than it has locally allocated shared memory to handle. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
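For reference, the sizing in question is computed roughly as follows — a paraphrase of InitializeMaxBackends(), shown only to make the formula above concrete, not the exact source:

    /* Sketch: shared memory is sized from the sum of these settings */
    MaxBackends = MaxConnections + autovacuum_max_workers + 1 +
                  max_worker_processes + max_wal_senders;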
Re: Using stat collector for collecting long SQL
Hi > There is an often-encountered problem with obtaining the source of long SQL strings. The > pg_stat_activity query field is truncated at some short limit, and it is not very > practical to increase this limit. I thought this was an "old story", since track_activity_query_size can be increased substantially: https://www.postgresql.org/message-id/flat/7b5ecc5a9991045e2f13c84e3047541d%40postgrespro.ru [...] > It could be the basis for implementing EXPLAIN PID? +1 for this concept, that would be very useful for diagnosing stuck queries. Until now the only proposal I saw regarding this was detailed in https://www.postgresql.org/message-id/1582756552256-0.p...@n3.nabble.com a prerequisite to be able to use the extension postgrespro/pg_query_state Regards PAscal -- Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
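For example (assuming 32kB is enough for the queries in question), the limit can be raised like this; note that it only takes effect after a server restart:

    ALTER SYSTEM SET track_activity_query_size = 32768;
    -- requires a restart; this parameter cannot be changed while the server is running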
Re: [Proposal] Level4 Warnings show many shadow vars
On Thu, Feb 27, 2020 at 10:43:50AM +0100, Peter Eisentraut wrote: > This thread is still in the commit fest, but it's apparently gone as far as > it will, so I've set it to "Committed" for lack of a "partial" status. Thanks, that sounds right to me. I was just looking at the latest patch presented after seeing your reply, and I did not spot immediately any issues standing out compared to the others. -- Michael signature.asc Description: PGP signature
Re: PG v12.2 - Setting jit_above_cost is causing the server to crash
Hi On Thu, Feb 27, 2020 at 12:41 PM Tom Lane wrote: > Aditya Toshniwal writes: > > On Mon, Feb 24, 2020 at 12:46 PM Andres Freund > wrote: > >> This isn't reproducible here. Are you sure that you're running on a > >> clean installation? > > > Yes I did a fresh installation using installer provided here - > > https://www.enterprisedb.com/downloads/postgresql > > There is apparently something wrong with the JIT stuff in EDB's 12.2 > build for macOS. At least, that's the conclusion I came to after > off-list discussion with the submitter of bug #16264, which has pretty > much exactly this symptom (especially if you're seeing "signal 9" > reports in the postmaster log). For him, either disabling JIT or > reverting to 12.1 made it go away. > We've been looking into this; Apple started a notarisation process some time ago, designed to mark their applications as conforming to various security requirements, but prior to Catalina it was essentially optional. When Catalina was released, they made notarisation for distributed software a requirement, but had the process issue warnings for non-compliance. As-of the end of January, those warnings became hard errors, so now our packages must be notarised, and for that to happen, must be hardened by linking with a special runtime and having securely time stamped signatures on every binary before being checked and notarised as such by Apple. Without that, users would have to disable security features on their systems before they could run our software. Our packages are being successfully notarised at the moment, because that's essentially done through a static analysis. We can (and have) added what Apple call an entitlement in test builds which essentially puts a flag in the notarisation for the product that declares that it will do JIT operations, however, it seems that this alone is not enough and that in addition to the entitlement, we also need to include the MAP_JIT flag in mmap() calls. See https://developer.apple.com/documentation/security/hardened_runtime and https://developer.apple.com/documentation/bundleresources/entitlements/com_apple_security_cs_allow-jit We're working on trying to test a patch for that at the moment. -- Dave Page Blog: http://pgsnake.blogspot.com Twitter: @pgsnake EnterpriseDB UK: http://www.enterprisedb.com The Enterprise PostgreSQL Company
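For illustration only, the combination being tested is roughly the following: the com.apple.security.cs.allow-jit entitlement on the binary plus MAP_JIT on the executable mappings. This is a sketch, not the actual LLVM/PostgreSQL code, and size here is just a placeholder:

    #include <sys/mman.h>

    /* Sketch: allocating writable+executable memory under the macOS hardened runtime */
    void *code = mmap(NULL, size,
                      PROT_READ | PROT_WRITE | PROT_EXEC,
                      MAP_PRIVATE | MAP_ANON | MAP_JIT,
                      -1, 0);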
Re: Error on failed COMMIT
On Wed, 26 Feb 2020 at 16:57, Vik Fearing wrote: > On 26/02/2020 22:22, Tom Lane wrote: > > Dave Cramer writes: > >> OK, here is a patch that actually doesn't leave the transaction in a > failed > >> state but emits the error and rolls back the transaction. > > > >> This is far from complete as it fails a number of tests and does not > cover > >> all of the possible paths. > >> But I'd like to know if this is strategy will be acceptable ? > > > > I really don't think that changing the server's behavior here is going to > > fly. The people who are unhappy that we changed it are going to vastly > > outnumber the people who are happy. Even the people who are happy are > not > > going to find that their lives are improved all that much, because > they'll > > still have to deal with old servers with the old behavior for the > > foreseeable future. > > Dealing with old servers for a while is something that everyone is used to. > > > Even granting that a behavioral incompatibility is acceptable, I'm not > > sure how a client is supposed to be sure that this "error" means that a > > rollback happened, as opposed to real errors that prevented any state > > change from occurring. > > Because the error is a Class 40 — Transaction Rollback. > > My original example was: > > postgres=!# commit; > ERROR: 40P00: transaction cannot be committed > DETAIL: First error was "42601: syntax error at or near "error"". > > > > (A trivial example of that is misspelling the > > COMMIT command; which I'll grant is unlikely in practice. But there are > > less-trivial examples involving internal server malfunctions.) > > Misspelling the COMMIT command is likely a syntax error, which is Class > 42. Can you give one of those less-trivial examples please? > > > The only > > way to be sure you're out of the transaction is to check the transaction > > state that's sent along with ReadyForQuery ... but if you need to do > > that, it's not clear why we should change the server behavior at all. > > How does this differ from the deferred constraint violation example you > provided early on in the thread? That gave the error 23505 and > terminated the transaction. If you run the same scenario with the > primary key immediate, you get the *exact same error* but the > transaction is *not* terminated! > > I won't go so far as to suggest we change all COMMIT errors to Class 40 > (as the spec says), but I'm thinking it very loudly. > > > I also don't think that this scales to the case of subtransaction > > commit/rollback. That should surely act the same, but your patch doesn't > > change it. > > How does one commit a subtransaction? > > > Lastly, introducing a new client-visible message level seems right out. > > That's a very fundamental protocol break, independently of all else. > > Yeah, this seemed like a bad idea to me, too. > Ok, here is a much less obtrusive solution thanks to Vladimir. FWIW, only 10 of 196 tests fail. Dave Cramer www.postgres.rocks > > throwerror2.patch Description: Binary data
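For anyone following along, the deferred-constraint case referred to earlier in the thread can be reproduced like this; the error is raised by COMMIT itself and the transaction ends up rolled back:

    CREATE TABLE t (id int PRIMARY KEY DEFERRABLE INITIALLY DEFERRED);
    BEGIN;
    INSERT INTO t VALUES (1);
    INSERT INTO t VALUES (1);  -- no error yet, the uniqueness check is deferred
    COMMIT;                    -- ERROR: duplicate key value violates unique constraint "t_pkey" (SQLSTATE 23505)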
Re: Crash by targetted recovery
On 2020/02/27 17:05, Kyotaro Horiguchi wrote: Thank you for the comment. At Thu, 27 Feb 2020 16:23:44 +0900, Fujii Masao wrote in On 2020/02/27 15:23, Kyotaro Horiguchi wrote: I failed to understand why random access while reading from stream is bad idea. Could you elaborate why? It seems to me the word "streaming" suggests that WAL record should be read sequentially. Random access, which means reading from arbitrary location, breaks a stream. (But the patch doesn't try to stop wal sender if randAccess.) Isn't it sufficient to set currentSource to 0 when disabling StandbyMode? I thought that and it should work, but I hesitated to manipulate on currentSource in StartupXLOG. currentSource is basically a private state of WaitForWALToBecomeAvailable. ReadRecord modifies it but I think it's not good to modify it out of the the logic in WaitForWALToBecomeAvailable. If so, what about adding the following at the top of WaitForWALToBecomeAvailable()? if (!StandbyMode && currentSource == XLOG_FROM_STREAM) currentSource = 0; It works virtually the same way. I'm happy to do that if you don't agree to using randAccess. But I'd rather do that in the 'if (!InArchiveRecovery)' section. The approach using randAccess seems unsafe. Please imagine the case where currentSource is changed to XLOG_FROM_ARCHIVE because randAccess is true, while walreceiver is still running. For example, this case can occur when the record at REDO starting point is fetched with randAccess = true after walreceiver is invoked to fetch the last checkpoint record. The situation "currentSource != XLOG_FROM_STREAM while walreceiver is running" seems invalid. No? So I think that the approach that I proposed is better. Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters
Re: Online verification of checksums
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: tested, passed Documentation:not tested The patch applies cleanly and works as expected. Just a few minor observations: - I would suggest refactoring the PageIsZero function by getting rid of the all_zeroes variable and simply returning false when a non-zero byte is found, rather than setting the all_zeroes variable to false and breaking out of the for loop. The function should simply return true at the end otherwise. - Remove the empty line: +* would throw an assertion failure. Consider this a +* checksum failure. +*/ + + checksum_failures++; - Code needs to run through pgindent. Also, I'd suggest making "5" a define within the current file/function, perhaps something like "MAX_CHECKSUM_FAILURES". You could move the second warning outside the conditional statement as it appears in both "if" and "else" blocks. Regards, --Asif The new status of this patch is: Waiting on Author
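For what it's worth, the suggested refactoring might look roughly like this (the signature is assumed from the description above, not taken from the patch):

    /* Sketch: return as soon as a non-zero byte is found */
    static bool
    PageIsZero(const char *buffer)
    {
        for (int i = 0; i < BLCKSZ; i++)
        {
            if (buffer[i] != 0)
                return false;
        }
        return true;
    }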
Re: Improve handling of parameter differences in physical replication
On 2020/02/27 17:23, Peter Eisentraut wrote: When certain parameters are changed on a physical replication primary, this is communicated to standbys using the XLOG_PARAMETER_CHANGE WAL record. The standby then checks whether its own settings are at least as big as the ones on the primary. If not, the standby shuts down with a fatal error. The correspondence of settings between primary and standby is required because those settings influence certain shared memory sizings that are required for processing WAL records that the primary might send. For example, if the primary sends a prepared transaction, the standby must have had max_prepared_transaction set appropriately or it won't be able to process those WAL records. However, fatally shutting down the standby immediately upon receipt of the parameter change record might be a bit of an overreaction. The resources related to those settings are not required immediately at that point, and might never be required if the activity on the primary does not exhaust all those resources. An extreme example is raising max_prepared_transactions on the primary but never actually using prepared transactions. Where this becomes a serious problem is if you have many standbys and you do a failover. If the newly promoted standby happens to have a higher setting for one of the relevant parameters, all the other standbys that have followed it then shut down immediately and won't be able to continue until you change all their settings. If we didn't do the hard shutdown and we just let the standby roll on with recovery, nothing bad will happen and it will eventually produce an appropriate error when those resources are required (e.g., "maximum number of prepared transactions reached"). So I think there are better ways to handle this. It might be reasonable to provide options. The attached patch doesn't do that but it would be pretty easy. What the attached patch does is: Upon receipt of XLOG_PARAMETER_CHANGE, we still check the settings but only issue a warning and set a global flag if there is a problem. Then when we actually hit the resource issue and the flag was set, we issue another warning message with relevant information. Additionally, at that point we pause recovery instead of shutting down, so a hot standby remains usable. (That could certainly be configurable.) +1 Btw., I think the current setup is slightly buggy. The MaxBackends value that is used to size shared memory is computed as MaxConnections + autovacuum_max_workers + 1 + max_worker_processes + max_wal_senders, but we don't track autovacuum_max_workers in WAL. Maybe this is because autovacuum doesn't work during recovery? Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters
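As a usage note for the proposed behaviour: a hot standby that has paused itself remains queryable, and the existing recovery-control functions can be used to check the state and, if appropriate, resume replay:

    -- on the standby
    SELECT pg_is_wal_replay_paused();
    SELECT pg_wal_replay_resume();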
Re: Improve handling of parameter differences in physical replication
Hello Thank you for working on this! > Where this becomes a serious problem is if you have many standbys and you do > a failover. +1 Several times my team would have liked to pause recovery instead of panicking after changing settings on the primary. (The same goes for create_tablespace_directories replay errors...) Do we document anywhere (outside of the code) that the standby shuts down immediately upon receipt of the parameter change record? doc/src/sgml/high-availability.sgml only mentions "refuse to start". regards, Sergei
Re: [Proposal] Level4 Warnings show many shadow vars
On 2019-12-18 10:55, Alvaro Herrera wrote: On 2019-Dec-18, Michael Paquier wrote: Let's take one example. The changes in pg_dump/ like /progname/prog_name/ have just been done in haste, without actual thoughts about how the problem ought to be fixed. And in this case, something which could be more adapted is to remove the argument from usage() because progname is a global variable, initialized from the beginning in pg_restore. We discussed progname as a global/local before -- IIRC in the thread that introduced the frontend logging API -- and while I think the whole issue could stand some improvement, we shouldn't let it be driven by minor changes; that'll only make it more confusing. IMO if we want it improved, a larger change (involving the bunch of frontend programs) is what to look for. Maybe what you suggest is an improvement, though (certainly the "prog_name" patch wasn't). This thread is still in the commit fest, but it's apparently gone as far as it will, so I've set it to "Committed" for lack of a "partial" status. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH] Connection time for \conninfo
My opinion is that this is not particularly useful and not appropriate to piggy-back onto \conninfo. Connection information including host, port, database, user name is a well-established concept in PostgreSQL programs and tools and it contains a delimited set of information. Knowing what server and what database you are connected to also seems kind of important. Moreover, this is information that is under control of the client, so it must be tracked on the client side. Knowing how long you've been connected on the other hand seems kind of fundamentally unimportant. If we add that, what's to stop us from adding other statistics of minor interest such as how many commands you've run, how many errors there were, etc. The connection time is already available, and perhaps we should indeed make it a bit easier to get, but it doesn't need to be a psql command. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
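For example, the elapsed connection time is already obtainable from information the server tracks, without any psql change:

    SELECT now() - backend_start AS connection_time
    FROM pg_stat_activity
    WHERE pid = pg_backend_pid();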
Re: vacuum verbose detail logs are unclear; log at *start* of each stage
On 2020-01-26 06:36, Justin Pryzby wrote: From a3d0b41435655615ab13f808ec7c30e53e596e50 Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Sat, 25 Jan 2020 21:25:37 -0600 Subject: [PATCH v3 1/4] Remove gettext erronously readded at 580ddce --- src/backend/access/heap/vacuumlazy.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c index 8ce5011..8e8ea9d 100644 --- a/src/backend/access/heap/vacuumlazy.c +++ b/src/backend/access/heap/vacuumlazy.c @@ -1690,7 +1690,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats, "%u pages are entirely empty.\n", empty_pages), empty_pages); - appendStringInfo(&buf, _("%s."), pg_rusage_show(&ru0)); + appendStringInfo(&buf, "%s.", pg_rusage_show(&ru0)); ereport(elevel, (errmsg("\"%s\": found %.0f removable, %.0f nonremovable row versions in %u out of %u pages", -- 2.7.4 Why do you think it was erroneously added? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Yet another vectorized engine
On 27.02.2020 11:09, Hubert Zhang wrote: Hi Konstantin, I also ran vimdiff on nodeAgg.c in your PG13 branch against nodeAgg.c in pg's main repo. Many functions have changed from PG96 to PG13, e.g. 'advance_aggregates' and 'lookup_hash_entry'. The vectorized nodeAgg still seems to follow the PG96 way of implementing these functions. In general, I think we'd better port the PG13 executor to a vectorized executor of PG13 instead of merging some PG13 code into the vectorized executor of PG96 to make it work, because it's hard to determine which functions need to be merged and it's buggy if the executor code of both PG13 and PG96 exists in one branch. What's your opinion? In the new version of Postgres, all logic of aggregate transitions is encapsulated in expressions and performed by execExprInterp or generated JIT code. If we are not going to embed the vectorized engine in the kernel and continue to develop it as an extension, then I do not have any good idea how to achieve this without copying and patching the code of ExecInterpExpr. In any case, the current prototype doesn't show any noticeable performance improvement compared with the existing executor with JIT enabled. And providing a vectorized version of ExecInterpExpr will not help to increase speed (according to the profile, time is spent in other places). -- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [postgis-devel] About EXTENSION from UNPACKAGED on PostgreSQL 13
On Thu, Feb 27, 2020 at 09:32:24AM +0100, Sandro Santilli wrote: > On Wed, Feb 26, 2020 at 11:18:43AM -0500, Chapman Flack wrote: > > On 2/26/20 10:52 AM, Sandro Santilli wrote: > > > > > This part is not clear to me. You're _assuming_ that the unpackaged--xxx > > > will not make checks, so you _drop_ support for it ? Can't the normal > > > extension script also be unsafe for some reason ? Or can't the > > > unpackaged-xxx script be made safe by the publishers ? Or, as a last > > > resort.. can't you just mark postgis as UNSAFE and still require > > > superuser, which would give us the same experience as before ? > > > > I am wondering: does anything in the PG 13 change preclude writing > > a postgis_raster--unpackaged.sql script that could be applied with > > CREATE EXTENSION postgis_raster VERSION unpackaged; > > and would do perhaps nothing at all, or merely confirm that the > > right unpackaged things are present and are the right things... > > > > ... from which an ALTER EXTENSION postgis_raster UPDATE TO 3.0; > > would naturally run the existing postgis_raster--unpackaged--3.0.sql > > and execute all of its existing ALTER EXTENSION ... ADD operations? > > > > Has the disadvantage of being goofy, but possibly the advantage of > > being implementable in the current state of affairs. > > Thanks for this hint, yes, seems to be technically feasible, as well > as doing packaging in the extension creation scripts. Only... this > would basically work around the intentionally removed syntax, which > Steven Frost was against (still unclear to me why)... NOTE: my suggestion was to directly have CREATE EXTENSION do the packaging, which would give the same level of security as the workaround suggested here, but with less hops. --strk;
Re: WIP: Aggregation push-down
legrand legrand wrote: > Antonin Houska-2 wrote > > Right now I recall two problems: 1) is the way I currently store > > RelOptInfo for the grouped relations correct?, 2) how should we handle > > types for which logical equality does not imply physical (byte-wise) > > equality? > > > > Fortunately it seems now that I'm not the only one who cares about 2), so > > this > > problem might be resolved soon: > > > > https://www.postgresql.org/message-id/CAH2-Wzn3Ee49Gmxb7V1VJ3-AC8fWn-Fr8pfWQebHe8rYRxt5OQ%40mail.gmail.com > > > > But 1) still remains. > > > > Hello > would "pgsql: Add equalimage B-Tree support functions." > https://www.postgresql.org/message-id/e1j72ny-0002gi...@gemulon.postgresql.org Yes, it seems so. I'll adapt the patch soon, hopefully next week. Thanks for reminder. -- Antonin Houska Web: https://www.cybertec-postgresql.com
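A simple illustration of problem 2): numeric values can compare as equal even though their stored representations differ byte-wise, because the display scale is preserved:

    SELECT '0.0'::numeric = '0.00'::numeric;  -- true, yet the two datums are not binary-identical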
Using stat collector for collecting long SQL
Hi There is an often-encountered problem with obtaining the source of long SQL strings. The pg_stat_activity query field is truncated at some short limit, and it is not very practical to increase this limit. I have an idea: use the stats collector process as a solution to this problem. When the query string is truncated, the full form of the query string can be saved in the stats collector process, maybe together with the execution plan. Then any process can read this information from the collector. What do you think about this idea? It could be the basis for implementing EXPLAIN PID? Notes, comments? Regards Pavel
Re: [postgis-devel] About EXTENSION from UNPACKAGED on PostgreSQL 13
On Wed, Feb 26, 2020 at 11:18:43AM -0500, Chapman Flack wrote: > On 2/26/20 10:52 AM, Sandro Santilli wrote: > > > This part is not clear to me. You're _assuming_ that the unpackaged--xxx > > will not make checks, so you _drop_ support for it ? Can't the normal > > extension script also be unsafe for some reason ? Or can't the > > unpackaged-xxx script be made safe by the publishers ? Or, as a last > > resort.. can't you just mark postgis as UNSAFE and still require > > superuser, which would give us the same experience as before ? > > I am wondering: does anything in the PG 13 change preclude writing > a postgis_raster--unpackaged.sql script that could be applied with > CREATE EXTENSION postgis_raster VERSION unpackaged; > and would do perhaps nothing at all, or merely confirm that the > right unpackaged things are present and are the right things... > > ... from which an ALTER EXTENSION postgis_raster UPDATE TO 3.0; > would naturally run the existing postgis_raster--unpackaged--3.0.sql > and execute all of its existing ALTER EXTENSION ... ADD operations? > > Has the disadvantage of being goofy, but possibly the advantage of > being implementable in the current state of affairs. Thanks for this hint, yes, seems to be technically feasible, as well as doing packaging in the extension creation scripts. Only... this would basically work around the intentionally removed syntax, which Steven Frost was against (still unclear to me why)... --strk;
Improve handling of parameter differences in physical replication
When certain parameters are changed on a physical replication primary, this is communicated to standbys using the XLOG_PARAMETER_CHANGE WAL record. The standby then checks whether its own settings are at least as big as the ones on the primary. If not, the standby shuts down with a fatal error. The correspondence of settings between primary and standby is required because those settings influence certain shared memory sizings that are required for processing WAL records that the primary might send. For example, if the primary sends a prepared transaction, the standby must have had max_prepared_transaction set appropriately or it won't be able to process those WAL records. However, fatally shutting down the standby immediately upon receipt of the parameter change record might be a bit of an overreaction. The resources related to those settings are not required immediately at that point, and might never be required if the activity on the primary does not exhaust all those resources. An extreme example is raising max_prepared_transactions on the primary but never actually using prepared transactions. Where this becomes a serious problem is if you have many standbys and you do a failover. If the newly promoted standby happens to have a higher setting for one of the relevant parameters, all the other standbys that have followed it then shut down immediately and won't be able to continue until you change all their settings. If we didn't do the hard shutdown and we just let the standby roll on with recovery, nothing bad will happen and it will eventually produce an appropriate error when those resources are required (e.g., "maximum number of prepared transactions reached"). So I think there are better ways to handle this. It might be reasonable to provide options. The attached patch doesn't do that but it would be pretty easy. What the attached patch does is: Upon receipt of XLOG_PARAMETER_CHANGE, we still check the settings but only issue a warning and set a global flag if there is a problem. Then when we actually hit the resource issue and the flag was set, we issue another warning message with relevant information. Additionally, at that point we pause recovery instead of shutting down, so a hot standby remains usable. (That could certainly be configurable.) Btw., I think the current setup is slightly buggy. The MaxBackends value that is used to size shared memory is computed as MaxConnections + autovacuum_max_workers + 1 + max_worker_processes + max_wal_senders, but we don't track autovacuum_max_workers in WAL. (This patch was developed together with Simon Riggs.) -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services From f5b4b7fd853b0dba2deea6b1e8290ae4c6df7081 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Thu, 27 Feb 2020 08:50:37 +0100 Subject: [PATCH v1] Improve handling of parameter differences in physical replication When certain parameters are changed on a physical replication primary, this is communicated to standbys using the XLOG_PARAMETER_CHANGE WAL record. The standby then checks whether its own settings are at least as big as the ones on the primary. If not, the standby shuts down with a fatal error. The correspondence of settings between primary and standby is required because those settings influence certain shared memory sizings that are required for processing WAL records that the primary might send. 
For example, if the primary sends a prepared transaction, the standby must have had max_prepared_transaction set appropriately or it won't be able to process those WAL records. However, fatally shutting down the standby immediately upon receipt of the parameter change record might be a bit of an overreaction. The resources related to those settings are not required immediately at that point, and might never be required if the activity on the primary does not exhaust all those resources. If we just let the standby roll on with recovery, it will eventually produce an appropriate error when those resources are used. So this patch relaxes this a bit. Upon receipt of XLOG_PARAMETER_CHANGE, we still check the settings but only issue a warning and set a global flag if there is a problem. Then when we actually hit the resource issue and the flag was set, we issue another warning message with relevant information. Additionally, at that point we pause recovery, so a hot standby remains usable. --- src/backend/access/transam/twophase.c | 3 ++ src/backend/access/transam/xlog.c | 56 ++- src/backend/storage/ipc/procarray.c | 15 ++- src/backend/storage/lmgr/lock.c | 10 + src/include/access/xlog.h | 1 + 5 files changed, 74 insertions(+), 11 deletions(-) diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c index 5adf956f41..fdac2ed69d 100644 --- a/src/b
Re: Yet another vectorized engine
Hi Konstantin, I also ran vimdiff on nodeAgg.c in your PG13 branch against nodeAgg.c in pg's main repo. Many functions have changed from PG96 to PG13, e.g. 'advance_aggregates' and 'lookup_hash_entry'. The vectorized nodeAgg still seems to follow the PG96 way of implementing these functions. In general, I think we'd better port the PG13 executor to a vectorized executor of PG13 instead of merging some PG13 code into the vectorized executor of PG96 to make it work, because it's hard to determine which functions need to be merged and it's buggy if the executor code of both PG13 and PG96 exists in one branch. What's your opinion?
Re: reindex concurrently and two toast indexes
On Thu, Feb 27, 2020 at 04:32:11PM +0900, Michael Paquier wrote: > On Sat, Feb 22, 2020 at 04:06:57PM +0100, Julien Rouhaud wrote: > > Sorry, I just realized that I forgot to commit the last changes before > > sending > > the patch, so here's the correct v2. > > Thanks for the patch. > > > + if (skipit) > > + { > > + ereport(NOTICE, > > +(errmsg("skipping invalid index \"%s.%s\"", > > + > > get_namespace_name(get_rel_namespace(indexOid)), > > +get_rel_name(indexOid; > > ReindexRelationConcurrently() issues a WARNING when bumping on an > invalid index, shouldn't the same log level be used? For ReindexRelationConcurrently, the index is skipped because the feature isn't supported, thus a warning. In this case that would work, it's just that we don't want to process such indexes, so I used a notice instead. I'm not opposed to using a warning instead if you prefer. What errcode should be used though, ERRCODE_WARNING? ERRCODE_FEATURE_NOT_SUPPORTED doesn't feel right. > Even with this patch, it is possible to reindex an invalid toast index > with REINDEX INDEX (with and without CONCURRENTLY), which is the > problem I mentioned upthread (Er, actually only for the non-concurrent > case as told about reindex_index). Shouldn't both cases be prevented > as well with an ERROR? Ah indeed, sorry I missed that. While looking at it, I see that invalid indexes seem to be leaked when the table is dropped, with no way to get rid of them: s1: CREATE TABLE t1(val text); CREATE INDEX ON t1 (val); BEGIN; SELECT * FROM t1 FOR UPDATE; s2: REINDEX TABLE CONCURRENTLY t1; [stuck and canceled] SELECT indexrelid::regclass, indrelid::regclass FROM pg_index WHERE NOT indisvalid; indexrelid |indrelid -+- t1_val_idx_ccold| t1 pg_toast.pg_toast_16385_index_ccold | pg_toast.pg_toast_16385 (2 rows) s1: ROLLBACK; DROP TABLE t1; SELECT indexrelid::regclass, indrelid::regclass FROM pg_index WHERE NOT indisvalid; indexrelid | indrelid -+-- t1_val_idx_ccold| 16385 pg_toast.pg_toast_16385_index_ccold | 16388 (2 rows) REINDEX INDEX t1_val_idx_ccold; ERROR: XX000: could not open relation with OID 16385 LOCATION: relation_open, relation.c:62 DROP INDEX t1_val_idx_ccold; ERROR: XX000: could not open relation with OID 16385 LOCATION: relation_open, relation.c:62 REINDEX INDEX pg_toast.pg_toast_16385_index_ccold; ERROR: XX000: could not open relation with OID 16388 LOCATION: relation_open, relation.c:62 DROP INDEX pg_toast.pg_toast_16385_index_ccold; ERROR: XX000: could not open relation with OID 16388 LOCATION: relation_open, relation.c:62 REINDEX DATABASE rjuju; REINDEX SELECT indexrelid::regclass, indrelid::regclass FROM pg_index WHERE NOT indisvalid; indexrelid | indrelid -+-- t1_val_idx_ccold| 16385 pg_toast.pg_toast_16385_index_ccold | 16388 (2 rows) Shouldn't DROP TABLE be fixed to also drop invalid indexes?
Re: Crash by targetted recovery
Thank you for the comment. At Thu, 27 Feb 2020 16:23:44 +0900, Fujii Masao wrote in > On 2020/02/27 15:23, Kyotaro Horiguchi wrote: > >> I failed to understand why random access while reading from > >> stream is bad idea. Could you elaborate why? > > It seems to me the word "streaming" suggests that WAL record should be > > read sequentially. Random access, which means reading from arbitrary > > location, breaks a stream. (But the patch doesn't try to stop wal > > sender if randAccess.) > > > >> Isn't it sufficient to set currentSource to 0 when disabling > >> StandbyMode? > > I thought that and it should work, but I hesitated to manipulate on > > currentSource in StartupXLOG. currentSource is basically a private > > state of WaitForWALToBecomeAvailable. ReadRecord modifies it but I > > think it's not good to modify it out of the the logic in > > WaitForWALToBecomeAvailable. > > If so, what about adding the following at the top of > WaitForWALToBecomeAvailable()? > > if (!StandbyMode && currentSource == XLOG_FROM_STREAM) > currentSource = 0; It works virtually the same way. I'm happy to do that if you don't agree to using randAccess. But I'd rather do that in the 'if (!InArchiveRecovery)' section. > > Come to think of that I got to think the > > following part in ReadRecord should use randAccess instead.. > > xlog.c:4384 > >> /* > > - * Before we retry, reset lastSourceFailed and currentSource > > - * so that we will check the archive next. > > + * Streaming has broken, we retry from the same LSN. > >> */ > >> lastSourceFailed = false; > > - currentSource = 0; > > + private->randAccess = true; > > Sorry, I failed to understand why this change is necessary... It's not necessary, just for being tidy about the responsibility on currentSource. > At least the comment that you added seems incorrect > because WAL streaming should not have started yet when > we reach the above point. Oops, right. - * Streaming has broken, we retry from the same LSN. + * Restart recovery from the current LSN. For clarity, I don't insist on the change at all. If it were necessary, it's another topic, anyway. Please forget it. regards. 
-- Kyotaro Horiguchi NTT Open Source Software Center >From 8cb817a43226a0d60dd62c6205219a2f0c807b9e Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Wed, 26 Feb 2020 20:41:11 +0900 Subject: [PATCH v2 1/2] TAP test for a crash bug --- src/test/recovery/t/003_recovery_targets.pl | 32 + 1 file changed, 32 insertions(+) diff --git a/src/test/recovery/t/003_recovery_targets.pl b/src/test/recovery/t/003_recovery_targets.pl index fd14bab208..8e71788981 100644 --- a/src/test/recovery/t/003_recovery_targets.pl +++ b/src/test/recovery/t/003_recovery_targets.pl @@ -167,3 +167,35 @@ foreach my $i (0..100) $logfile = slurp_file($node_standby->logfile()); ok($logfile =~ qr/FATAL: recovery ended before configured recovery target was reached/, 'recovery end before target reached is a fatal error'); + + +# Edge case where targetted promotion happens on segment boundary +$node_standby = get_new_node('standby_9'); +$node_standby->init_from_backup($node_master, 'my_backup', +has_restoring => 1, has_streaming => 1); +$node_standby->start; +## read wal_segment_size +my $result = + $node_standby->safe_psql('postgres', "SHOW wal_segment_size"); +die "unknown format of wal_segment_size: $result\n" + if ($result !~ /^([0-9]+)MB$/); +my $segsize = $1 * 1024 * 1024; +## stop just before the next segment boundary +$result = + $node_standby->safe_psql('postgres', "SELECT pg_last_wal_replay_lsn()"); +my ($seg, $off) = split('/', $result); +my $target = sprintf("$seg/%08X", (hex($off) / $segsize + 1) * $segsize); + +$node_standby->stop; +$node_standby->append_conf('postgresql.conf', qq( +recovery_target_inclusive=no +recovery_target_lsn='$target' +recovery_target_action='promote' +)); +$node_standby->start; +## do targetted promote +$node_master->safe_psql('postgres', "CREATE TABLE t(); DROP TABLE t;"); +$node_master->safe_psql('postgres', "SELECT pg_switch_wal(); CHECKPOINT;"); +my $caughtup_query = "SELECT NOT pg_is_in_recovery()"; +$node_standby->poll_query_until('postgres', $caughtup_query) + or die "Timed out while waiting for standby to promote"; -- 2.18.2 >From d917638b787b9d1393cbc0d77586168da844756b Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Wed, 26 Feb 2020 20:41:37 +0900 Subject: [PATCH v2 2/2] Fix a crash bug of targetted promotion. After recovery target is reached, StartupXLOG turns off standby mode then refetches the last record. If the last record starts from the previous segment at the time, WaitForWALToBecomeAvailable crashes with assertion failure. WaitForWALToBecomeAvailable should move back to XLOG_FROM_ARCHIVE if standby mode is turned off while XLOG_FROM_STREAM. --- src/backend/access/transam/xlog.c | 13 +++-- 1 file changed, 11 insertions(+), 2 de