Re: security_context_t marked as deprecated in libselinux 3.1
On Thu, Aug 13, 2020 at 01:29:35AM -0400, Tom Lane wrote:
> Well, "you get a compiler warning" isn't a reason to consider the version
> unsupported.  There are probably going to be a few other warnings you get
> when building on an ancient platform --- as long as it works, I think
> we're fine.  So based on this, no objection, and I think no need to
> change our statement about what's supported.

Okay, thanks for confirming.  Let's do so then, I'll just wait a bit to
see if there are more comments from others.
--
Michael
Re: security_context_t marked as deprecated in libselinux 3.1
Michael Paquier writes:
> On Wed, Aug 12, 2020 at 10:50:21PM -0400, Tom Lane wrote:
>> Ummm ... aren't you going to get some cast-away-const warnings now?

> Let me see..  The function signatures we use have been visibly changed
> in 9eb9c932, which comes down to a point between 2.2.2 and 2.3, and
> there are two of them we care about, both use now "const char *":
> - security_check_context_raw()
> - security_compute_create_name_raw()

OK, it's all good then.

> Based on this information, what if we increased the minimum support to
> 2.3 then?  That's a release from 2014, and maintaining such legacy
> code does not seem much worth the effort IMO.

Well, "you get a compiler warning" isn't a reason to consider the version
unsupported.  There are probably going to be a few other warnings you get
when building on an ancient platform --- as long as it works, I think
we're fine.  So based on this, no objection, and I think no need to
change our statement about what's supported.

			regards, tom lane
Re: run pgindent on a regular basis / scripted manner
On Wed, Aug 12, 2020 at 10:14 PM Tom Lane wrote:
> Noah Misch writes:
> > ...  Another advantage of master-only is a guarantee against
> > disrupting time-critical patches.  (It would be ugly to push back
> > branches and sort out the master push later, but it doesn't obstruct
> > the mission.)
>
> Hm, doesn't it?  I had the idea that "git push" is atomic --- either all
> the per-branch commits succeed, or they all fail.  I might be wrong.

As of Git 2.28, atomic pushes are not turned on by default.  That means
"git push my-remote foo bar" _can_ result in partial success.  I'm that
paranoid who does "git push --atomic my-remote foo bar fizz".

Cheers,
Jesse
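[The partial-success behavior described here can be reproduced against throwaway local repositories.  This is only a sketch: the branch names (foo, bar) and the per-ref update hook that rejects bar are contrivances standing in for whatever would make one ref update fail in practice.]

```shell
# Two bare "remotes", each with an update hook that rejects branch "bar".
# The update hook runs once per ref, which makes a per-ref failure easy to fake.
tmp=$(mktemp -d)
for r in r1 r2; do
    git init -q --bare "$tmp/$r.git"
    cat > "$tmp/$r.git/hooks/update" <<'EOF'
#!/bin/sh
# args: refname old-sha new-sha; a nonzero exit rejects this one ref
[ "$1" = "refs/heads/bar" ] && exit 1
exit 0
EOF
    chmod +x "$tmp/$r.git/hooks/update"
done

# A work repo with two branches to push.
git init -q "$tmp/work"
cd "$tmp/work"
git checkout -q -b foo
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m init
git branch bar

# Default (non-atomic) push: "foo" lands even though "bar" is rejected,
# and the push as a whole exits nonzero.
git push -q "$tmp/r1.git" foo bar 2>/dev/null || true

# Atomic push: the rejection of "bar" aborts the whole transaction.
git push -q --atomic "$tmp/r2.git" foo bar 2>/dev/null || true

git -C "$tmp/r1.git" for-each-ref --format='r1 %(refname)'   # r1 refs/heads/foo
git -C "$tmp/r2.git" for-each-ref --format='r2 %(refname)'   # (no output)
```

With server support (receive-pack from Git 2.4 onward), --atomic turns the partial success into an all-or-nothing transaction, which is the behavior Tom assumed was the default.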
Re: security_context_t marked as deprecated in libselinux 3.1
On Wed, Aug 12, 2020 at 10:50:21PM -0400, Tom Lane wrote:
> Ummm ... aren't you going to get some cast-away-const warnings now?
> Or are all of the called functions declared as taking "const char *"
> not just "char *"?

Let me see..  The function signatures we use have been visibly changed
in 9eb9c932, which comes down to a point between 2.2.2 and 2.3, and
there are two of them we care about, both use now "const char *":
- security_check_context_raw()
- security_compute_create_name_raw()

We claim in the docs that the minimum version of libselinux supported
is 2.1.10 (7a86fe1a from March 2012).  Then, the only buildfarm animal
I know of testing selinux is rhinoceros, that uses CentOS 7.1, and this
visibly already bundles libselinux 2.5 that was released in 2016
(2b69984), per the RPM list here:
http://mirror.centos.org/centos/7/

Joe, what's the version of libselinux used in rhinoceros?  2.5?

Based on this information, what if we increased the minimum support to
2.3 then?  That's a release from 2014, and maintaining such legacy
code does not seem much worth the effort IMO.
--
Michael
Re: run pgindent on a regular basis / scripted manner
On Thu, Aug 13, 2020 at 01:14:33AM -0400, Tom Lane wrote:
> Noah Misch writes:
> > ...  Another advantage of master-only is a guarantee against
> > disrupting time-critical patches.  (It would be ugly to push back
> > branches and sort out the master push later, but it doesn't obstruct
> > the mission.)
>
> Hm, doesn't it?  I had the idea that "git push" is atomic --- either all
> the per-branch commits succeed, or they all fail.  I might be wrong.

Atomicity is good.  I just meant that you could issue something like
"git push origin $(cd .git/refs/heads && ls REL*)" to defer the
complaint about master.
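[One caveat with globbing the loose-ref directory: branches that git has packed into .git/packed-refs (which "git pack-refs" and "git gc" do) leave no file under .git/refs/heads, so the "ls REL*" trick can silently come up empty.  A small demonstration with made-up branch names, plus a plumbing-level alternative:]

```shell
# Build a throwaway repo with two REL* branches, then pack its refs.
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
git checkout -q -b master
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m init
git branch REL_12_STABLE
git branch REL_13_STABLE
git pack-refs --all    # refs move to .git/packed-refs; loose files vanish

ls .git/refs/heads     # (no output: nothing left to glob)
git for-each-ref --format='%(refname:short)' 'refs/heads/REL*'   # still lists both
```

A packed-ref-safe spelling of the same push would therefore be something like "git push origin $(git for-each-ref --format='%(refname:short)' 'refs/heads/REL*')".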
Re: run pgindent on a regular basis / scripted manner
Noah Misch writes:
> ...  Another advantage of master-only is a guarantee against
> disrupting time-critical patches.  (It would be ugly to push back
> branches and sort out the master push later, but it doesn't obstruct
> the mission.)

Hm, doesn't it?  I had the idea that "git push" is atomic --- either all
the per-branch commits succeed, or they all fail.  I might be wrong.

			regards, tom lane
Re: [bug+patch] Inserting DEFAULT into generated columns from VALUES RTE
Hi

On Thu, Aug 13, 2020 at 6:31 AM, Mikhail Titov wrote:
> Hello!
>
> According to the docs[1], one may use DEFAULT keyword while inserting
> into generated columns (stored and identity).  However, currently it
> works only for a single VALUES sublist with DEFAULT for a generated
> column but not for the case when VALUES RTE is used.  This is not being
> tested and it is broken.
>
> I am attaching two patches.  One for tests and another one with the
> proposed changes based on ideas from Andrew on IRC.  So if all good
> there goes the credit where credit is due.  If patch is no good, then
> it is likely my misunderstanding how to put words into code :-)
>
> This is my only second patch to PostgreSQL (the first one was rejected)
> so don't be too harsh :-)  It may not be perfect but I am open for a
> feedback and this is just to get the ball rolling and to let the
> community know about this issue.
>
> Before you ask why would I want to insert DEFAULTs ... well, there are
> ORMs[2] that still need to be patched and current situation contradicts
> documentation[1].

Please assign your patch to the commitfest application:
https://commitfest.postgresql.org/29/

Regards

Pavel

> Footnotes:
> [1] https://www.postgresql.org/docs/devel/ddl-generated-columns.html
>
> [2] https://github.com/rails/rails/pull/39368#issuecomment-670351379
>
> --
> Mikhail
Re: Parallel query hangs after a smart shutdown is issued
Thomas Munro writes:
> Makes sense.  I tested this version on a primary and a replica and
> verified that parallel workers launch, but I saw that autovacuum
> workers still can't start without something like this:

> @@ -2463,7 +2463,8 @@ canAcceptConnections(int backend_type)
>       * be returned until we have checked for too many children.
>       */
>      if (smartShutState != SMART_NORMAL_USAGE &&
> -        backend_type != BACKEND_TYPE_BGWORKER)
> +        backend_type != BACKEND_TYPE_BGWORKER &&
> +        backend_type != BACKEND_TYPE_AUTOVAC)

Hmmm ... maybe that should be more like

    if (smartShutState != SMART_NORMAL_USAGE &&
        backend_type == BACKEND_TYPE_NORMAL)

(the adjacent comment needs adjustment too of course).

			regards, tom lane
Re: Parallel query hangs after a smart shutdown is issued
On Thu, Aug 13, 2020 at 2:37 PM Tom Lane wrote:
> I experimented with separating the shutdown-in-progress state into a
> separate variable, letting us actually reduce not increase the number of
> pmStates.  This way, PM_RUN and other states still apply until we're
> ready to pull the shutdown trigger, so that we don't need to complicate
> state-based decisions about launching auxiliary processes.  This patch
> also unifies the signal-sending for the smart and fast shutdown paths,
> which seems like a nice improvement.  I kind of like this, though I'm
> not in love with the particular variable name I used here
> (smartShutState).

Makes sense.  I tested this version on a primary and a replica and
verified that parallel workers launch, but I saw that autovacuum workers
still can't start without something like this:

@@ -2463,7 +2463,8 @@ canAcceptConnections(int backend_type)
      * be returned until we have checked for too many children.
      */
     if (smartShutState != SMART_NORMAL_USAGE &&
-        backend_type != BACKEND_TYPE_BGWORKER)
+        backend_type != BACKEND_TYPE_BGWORKER &&
+        backend_type != BACKEND_TYPE_AUTOVAC)
     {
         if (smartShutState == SMART_SUPERUSER_ONLY)
             result = CAC_WAITBACKUP;    /* allow superusers only */
@@ -2471,7 +2472,8 @@
             return CAC_SHUTDOWN;    /* shutdown is pending */
     }
     if (pmState != PM_RUN &&
-        backend_type != BACKEND_TYPE_BGWORKER)
+        backend_type != BACKEND_TYPE_BGWORKER &&
+        backend_type != BACKEND_TYPE_AUTOVAC)
     {
         if (Shutdown > NoShutdown)
             return CAC_SHUTDOWN;    /* shutdown is pending */
Re: Switch to multi-inserts for pg_depend
On Wed, Aug 12, 2020 at 07:52:42PM -0400, Alvaro Herrera wrote:
> Yeah.  As I understand, the only reason to have this number is to avoid
> an arbitrarily large number of entries created as a single multi-insert
> WAL record ... but does that really ever happen?  I guess if you create
> a table with some really complicated schema you might get, say, a
> hundred pg_depend rows at once.  But to fill eight complete pages of
> pg_depend entries sounds astoundingly ridiculous already -- I'd say it's
> just an easy way to spell "infinity" for this.  Tweaking one infinity
> value to become some other infinity value sounds useless.
>
> So I agree with what Andres said.  Let's have just one such define and
> be done with it.

Okay.  Would src/include/catalog/catalog.h be a suitable location for
this flag, or does somebody have a better idea?
--
Michael
REINDEX SCHEMA/DATABASE/SYSTEM weak with dropped relations
Hi all,

While working on support for REINDEX for partitioned relations, I have
noticed an old bug in the logic of ReindexMultipleTables(): the list of
relations to process is built in a first transaction, and then each
table is done in an independent transaction, but we don't actually check
that the relation still exists when doing the real work.

I think that a complete fix involves two things:

1) Addition of one SearchSysCacheExists1() at the beginning of the loop
processing each item in the list in ReindexMultipleTables().  This would
protect from a relation dropped, but that would not be enough if
ReindexMultipleTables() is looking at a relation dropped before we lock
it in reindex_table() or ReindexRelationConcurrently().  Still that's
simple, cheaper, and would protect from most problems.

2) Be completely water-proof and adopt a logic close to what we do for
VACUUM where we try to open a relation, but leave if it does not exist.
This can be achieved with just some try_relation_open() calls with the
correct lock used, and we also need to have a new REINDEXOPT_* flag to
control this behavior conditionally.

2) and 1) are complementary, but 2) is invasive, so based on the lack of
complaints we have seen that does not seem really worth backpatching to
me, and I think that we could also just have 1) on stable branches as a
minimal fix, to take care of most of the problems that could show up to
users.

Attached is a patch to fix all that, with a cheap isolation test that
fails on HEAD with a cache lookup failure.  I am adding that to the next
CF for now, and I would rather fix this issue before moving on with
REINDEX for partitioned relations as fixing this issue reduces the use
of session locks for partition trees.

Any thoughts?
-- Michael diff --git a/src/include/access/table.h b/src/include/access/table.h index cf0ef7b337..68dc4d62c0 100644 --- a/src/include/access/table.h +++ b/src/include/access/table.h @@ -22,6 +22,7 @@ extern Relation table_open(Oid relationId, LOCKMODE lockmode); extern Relation table_openrv(const RangeVar *relation, LOCKMODE lockmode); extern Relation table_openrv_extended(const RangeVar *relation, LOCKMODE lockmode, bool missing_ok); +extern Relation try_table_open(Oid relationId, LOCKMODE lockmode); extern void table_close(Relation relation, LOCKMODE lockmode); #endif /* TABLE_H */ diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h index 151bcdb7ef..f6ffee7e53 100644 --- a/src/include/nodes/parsenodes.h +++ b/src/include/nodes/parsenodes.h @@ -3351,6 +3351,7 @@ typedef struct ConstraintsSetStmt /* Reindex options */ #define REINDEXOPT_VERBOSE (1 << 0) /* print progress info */ #define REINDEXOPT_REPORT_PROGRESS (1 << 1) /* report pgstat progress */ +#define REINDEXOPT_MISSING_OK (2 << 1) /* skip missing relations */ typedef enum ReindexObjectType { diff --git a/src/backend/access/table/table.c b/src/backend/access/table/table.c index 1aa01a54b3..7c29091e6c 100644 --- a/src/backend/access/table/table.c +++ b/src/backend/access/table/table.c @@ -57,6 +57,40 @@ table_open(Oid relationId, LOCKMODE lockmode) return r; } + +/* + * try_table_open - open a table relation by relation OID + * + * Same as table_open, except return NULL instead of failing + * if the relation does not exist. 
+ * + */ +Relation +try_table_open(Oid relationId, LOCKMODE lockmode) +{ + Relation r; + + r = try_relation_open(relationId, lockmode); + + /* leave if table does not exist */ + if (!r) + return NULL; + + if (r->rd_rel->relkind == RELKIND_INDEX || + r->rd_rel->relkind == RELKIND_PARTITIONED_INDEX) + ereport(ERROR, +(errcode(ERRCODE_WRONG_OBJECT_TYPE), + errmsg("\"%s\" is an index", + RelationGetRelationName(r; + else if (r->rd_rel->relkind == RELKIND_COMPOSITE_TYPE) + ereport(ERROR, +(errcode(ERRCODE_WRONG_OBJECT_TYPE), + errmsg("\"%s\" is a composite type", + RelationGetRelationName(r; + + return r; +} + /* * table_openrv - open a table relation specified * by a RangeVar node diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c index 1be27eec52..4e7b3eb6a2 100644 --- a/src/backend/catalog/index.c +++ b/src/backend/catalog/index.c @@ -3423,8 +3423,20 @@ reindex_index(Oid indexId, bool skip_constraint_checks, char persistence, * Open and lock the parent heap relation. ShareLock is sufficient since * we only need to be sure no schema or data changes are going on. */ - heapId = IndexGetRelation(indexId, false); - heapRelation = table_open(heapId, ShareLock); + heapId = IndexGetRelation(indexId, + (options & REINDEXOPT_MISSING_OK) != 0); + /* if relation is missing, leave */ + if (!OidIsValid(heapId)) + return; + + if ((options & REINDEXOPT_MISSING_OK) != 0) + heapRelation = try_table_open(heapId, ShareLock); + else + heapRelation = table_open(heapId, ShareLock); + + /* if relation is gone, leave */ + if
[bug+patch] Inserting DEFAULT into generated columns from VALUES RTE
Hello! According to the docs[1], one may use DEFAULT keyword while inserting into generated columns (stored and identity). However, currently it works only for a single VALUES sublist with DEFAULT for a generated column but not for the case when VALUES RTE is used. This is not being tested and it is broken. I am attaching two patches. One for tests and another one with the proposed changes based on ideas from Andrew on IRC. So if all good there goes the credit where credit is due. If patch is no good, then it is likely my misunderstanding how to put words into code :-) This is my only second patch to PostgreSQL (the first one was rejected) so don't be too harsh :-) It may not be perfect but I am open for a feedback and this is just to get the ball rolling and to let the community know about this issue. Before you ask why would I want to insert DEFAULTs ... well, there are ORMs[2] that still need to be patched and current situation contradicts documentation[1]. Footnotes: [1] https://www.postgresql.org/docs/devel/ddl-generated-columns.html [2] https://github.com/rails/rails/pull/39368#issuecomment-670351379 -- Mikhail From d606c4f952a6080dff6fb621ea034bfce2865f7b Mon Sep 17 00:00:00 2001 From: Mikhail Titov Date: Wed, 12 Aug 2020 22:42:37 -0500 Subject: [PATCH 1/2] Test DEFAULT in VALUES RTE for generated columns --- src/test/regress/expected/generated.out | 9 + src/test/regress/expected/identity.out | 13 + src/test/regress/sql/generated.sql | 4 src/test/regress/sql/identity.sql | 1 + 4 files changed, 23 insertions(+), 4 deletions(-) diff --git a/src/test/regress/expected/generated.out b/src/test/regress/expected/generated.out index 7ccc3c65ed..31525ef2a6 100644 --- a/src/test/regress/expected/generated.out +++ b/src/test/regress/expected/generated.out @@ -90,6 +90,15 @@ CREATE TABLE gtest_err_8 (a int PRIMARY KEY, b int GENERATED BY DEFAULT AS (a * ERROR: for a generated column, GENERATED ALWAYS must be specified LINE 1: ...E gtest_err_8 (a int PRIMARY KEY, b int 
GENERATED BY DEFAULT... ^ +-- test VALUES RTE with defaults +INSERT INTO gtest0 VALUES (1, DEFAULT), (2, DEFAULT); +SELECT * FROM gtest0; + a | b +---+ + 1 | 55 + 2 | 55 +(2 rows) + INSERT INTO gtest1 VALUES (1); INSERT INTO gtest1 VALUES (2, DEFAULT); INSERT INTO gtest1 VALUES (3, 33); -- error diff --git a/src/test/regress/expected/identity.out b/src/test/regress/expected/identity.out index 7ac9df767f..ca27b7ff73 100644 --- a/src/test/regress/expected/identity.out +++ b/src/test/regress/expected/identity.out @@ -76,6 +76,7 @@ INSERT INTO itest3 DEFAULT VALUES; INSERT INTO itest3 DEFAULT VALUES; INSERT INTO itest4 DEFAULT VALUES; INSERT INTO itest4 DEFAULT VALUES; +INSERT INTO itest4 (a, b) VALUES (DEFAULT, 'foo'), (DEFAULT, 'bar'); SELECT * FROM itest1; a | b ---+--- @@ -99,10 +100,12 @@ SELECT * FROM itest3; SELECT * FROM itest4; a | b +--- +---+- 1 | 2 | -(2 rows) + 3 | foo + 4 | bar +(4 rows) -- VALUES RTEs INSERT INTO itest3 VALUES (DEFAULT, 'a'); @@ -211,11 +214,13 @@ ALTER TABLE itest4 ALTER COLUMN a DROP NOT NULL; INSERT INTO itest4 DEFAULT VALUES; SELECT * FROM itest4; a | b +--- +---+- 1 | 2 | + 3 | foo + 4 | bar | -(3 rows) +(5 rows) -- check that sequence is removed SELECT sequence_name FROM itest4_a_seq; diff --git a/src/test/regress/sql/generated.sql b/src/test/regress/sql/generated.sql index 4cff1279c7..18bb1d3c78 100644 --- a/src/test/regress/sql/generated.sql +++ b/src/test/regress/sql/generated.sql @@ -40,6 +40,10 @@ CREATE TABLE gtest_err_7d (a int PRIMARY KEY, b int GENERATED ALWAYS AS (generat -- GENERATED BY DEFAULT not allowed CREATE TABLE gtest_err_8 (a int PRIMARY KEY, b int GENERATED BY DEFAULT AS (a * 2) STORED); +-- test VALUES RTE with defaults +INSERT INTO gtest0 VALUES (1, DEFAULT), (2, DEFAULT); +SELECT * FROM gtest0; + INSERT INTO gtest1 VALUES (1); INSERT INTO gtest1 VALUES (2, DEFAULT); INSERT INTO gtest1 VALUES (3, 33); -- error diff --git a/src/test/regress/sql/identity.sql b/src/test/regress/sql/identity.sql index 
1bf2a976eb..b3d99583c2 100644 --- a/src/test/regress/sql/identity.sql +++ b/src/test/regress/sql/identity.sql @@ -47,6 +47,7 @@ INSERT INTO itest3 DEFAULT VALUES; INSERT INTO itest3 DEFAULT VALUES; INSERT INTO itest4 DEFAULT VALUES; INSERT INTO itest4 DEFAULT VALUES; +INSERT INTO itest4 (a, b) VALUES (DEFAULT, 'foo'), (DEFAULT, 'bar'); SELECT * FROM itest1; SELECT * FROM itest2; -- 2.28.0.windows.1 From 7a187f698d31638c65da10a71e97060793a29c7f Mon Sep 17 00:00:00 2001 From: Mikhail Titov Date: Wed, 12 Aug 2020 21:07:22 -0500 Subject: [PATCH 2/2] Allow DEFAULT in VALUES RTE for generated columns --- src/backend/parser/parse_relation.c | 50 ++-- src/backend/rewrite/rewriteHandler.c | 13 +++- 2 files changed, 51 insertions(+), 12 deletions(-) diff --git
Re: run pgindent on a regular basis / scripted manner
On Thu, Aug 13, 2020 at 12:08:36AM -0400, Tom Lane wrote:
> Noah Misch writes:
> > On Wed, Aug 12, 2020 at 07:47:01PM -0400, Tom Lane wrote:
> >> If the workflow is commit first and re-indent later, then we can always
> >> revert the pgindent commit and clean things up manually; but an auto
> >> re-indent during commit wouldn't provide that history.
>
> > There are competing implementations of assuring pgindent-cleanliness
> > at every check-in:
>
> > 1. After each push, an automated followup commit appears, restoring
> >    pgindent-cleanliness.
> > 2. "git push" results in a commit that melds pgindent changes into
> >    what the committer tried to push.
> > 3. "git push" fails, for the master branch, if the pushed commit
> >    disrupts pgindent-cleanliness.
>
> > Were you thinking of (2)?
>
> I was objecting to (2).  (1) would perhaps work.  (3) could be pretty
> darn annoying,

Right.  I think of three use cases here:

a) I'm a committer who wants to push clean code.  I want (3).
b) I'm a committer who wants to ignore pgindent.  I get some email
   complaints under (1), which I ignore.  Under (3), I'm forced to
   become (a).
c) I'm reading the history.  I want (3).

> I hadn't thought about the angle of HEAD versus back-branch patches,
> but that does seem like something to ponder.  The back branches don't
> have the same pgindent rules necessarily, plus the patch versions
> might be different in more than just whitespace.  My own habit when
> back-patching has been to indent the HEAD patch per-current-rules and
> then preserve that layout as much as possible in the back branches,
> but I doubt we could get a tool to do that with any reliability.

Similar habit here.  Another advantage of master-only is a guarantee
against disrupting time-critical patches.  (It would be ugly to push
back branches and sort out the master push later, but it doesn't
obstruct the mission.)
Re: run pgindent on a regular basis / scripted manner
Noah Misch writes:
> On Wed, Aug 12, 2020 at 07:47:01PM -0400, Tom Lane wrote:
>> If the workflow is commit first and re-indent later, then we can always
>> revert the pgindent commit and clean things up manually; but an auto
>> re-indent during commit wouldn't provide that history.

> There are competing implementations of assuring pgindent-cleanliness
> at every check-in:
> 1. After each push, an automated followup commit appears, restoring
>    pgindent-cleanliness.
> 2. "git push" results in a commit that melds pgindent changes into
>    what the committer tried to push.
> 3. "git push" fails, for the master branch, if the pushed commit
>    disrupts pgindent-cleanliness.
> Were you thinking of (2)?

I was objecting to (2).  (1) would perhaps work.  (3) could be pretty
darn annoying, especially if it blocks a time-critical security patch.

> Regarding typedefs.list, I would use the in-tree one, like you
> discussed here:

Yeah, after thinking about that more, it seems like automated
typedefs.list updates would be far riskier than automated reindent based
on the existing typedefs.list.  The latter could at least be expected
not to change code unrelated to the immediate commit.  typedefs.list
updates need some amount of adult supervision.  (I'd still vote for
nag-mail to the committer whose work got reindented, in case the bot
made things a lot worse.)

I hadn't thought about the angle of HEAD versus back-branch patches,
but that does seem like something to ponder.  The back branches don't
have the same pgindent rules necessarily, plus the patch versions
might be different in more than just whitespace.  My own habit when
back-patching has been to indent the HEAD patch per-current-rules and
then preserve that layout as much as possible in the back branches,
but I doubt we could get a tool to do that with any reliability.

Of course, there's also the possibility of forcibly reindenting all the
active back branches to current rules.  But I think we've rejected that
idea already, because it would cause so much pain for forks that are
following a back branch.

			regards, tom lane
Re: run pgindent on a regular basis / scripted manner
On Wed, Aug 12, 2020 at 07:47:01PM -0400, Tom Lane wrote:
> Jesse Zhang writes:
> > On Wed, Aug 12, 2020 at 3:34 PM Andres Freund wrote:
> >> Is there any reason we don't just automatically run pgindent
> >> regularly?  Like once a week?  And also update typedefs.list
> >> automatically, while we're at it?
>
> > You know what's better than weekly?  Every check-in.
>
> I'm not in favor of unsupervised pgindent runs, really.  It can do a lot
> of damage to code that was written without thinking about it --- in
> particular, it'll make a hash of comment blocks that were manually
> formatted and not protected with dashes.
>
> If the workflow is commit first and re-indent later, then we can always
> revert the pgindent commit and clean things up manually; but an auto
> re-indent during commit wouldn't provide that history.

There are competing implementations of assuring pgindent-cleanliness at
every check-in:

1. After each push, an automated followup commit appears, restoring
   pgindent-cleanliness.
2. "git push" results in a commit that melds pgindent changes into what
   the committer tried to push.
3. "git push" fails, for the master branch, if the pushed commit
   disrupts pgindent-cleanliness.

Were you thinking of (2)?  (1) doesn't have the lack-of-history problem,
but it does have the unexpected-damage problem, and it makes gitweb
noisier.  (3) has neither problem, and I'd prefer it over (1), (2), or
$SUBJECT.

Regarding typedefs.list, I would use the in-tree one, like you discussed
here:

On Wed, Aug 12, 2020 at 07:57:29PM -0400, Tom Lane wrote:
> Maybe the secret is to not allow automated adoption of new typedefs.list
> entries, but to require somebody to add entries to that file by hand,
> even if they're basing it on the buildfarm results.  (This would
> encourage the habit some people have adopted of updating typedefs.list
> along with commits that add typedefs.  I've never done that, but would
> be willing to change if there's good motivation to.)
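[Option (3) would ordinarily be a server-side pre-receive or update hook.  The sketch below only exercises the hook's control flow; INDENT_CHECK is a made-up placeholder for a real cleanliness test (e.g. running pgindent over the pushed tree and diffing), which would be site-specific:]

```shell
# A pre-receive hook receives one "old-sha new-sha refname" line per pushed
# ref on stdin; exiting nonzero rejects the entire push.
hook=$(mktemp)
cat > "$hook" <<'EOF'
#!/bin/sh
# INDENT_CHECK is injectable here so the flow can be tested; a real hook
# would hard-code its cleanliness test against the pushed commit ($new).
check=${INDENT_CHECK:-true}
while read old new ref; do
    if [ "$ref" = "refs/heads/master" ] && ! $check "$new"; then
        echo "rejected: $new would leave master not pgindent-clean" >&2
        exit 1
    fi
done
exit 0
EOF
chmod +x "$hook"

# Simulated pushes: only master is policed, matching the master-only idea,
# so back branches are never blocked.
echo "0000 1111 refs/heads/master" | INDENT_CHECK=false "$hook" \
    || echo "master push rejected"
echo "0000 1111 refs/heads/REL_13_STABLE" | INDENT_CHECK=false "$hook" \
    && echo "back-branch push accepted"
```

Because the hook declines the push before any ref moves, it has neither the lack-of-history problem of (2) nor the noise of (1).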
Re: Handing off SLRU fsyncs to the checkpointer
On Wed, Aug 12, 2020 at 6:06 PM Thomas Munro wrote: > [patch] Bitrot, rebased, no changes. > Yeah, the combined effect of these two patches is better than I > expected. To be clear though, I was only measuring the time between > the "redo starts at ..." and "redo done at ..." messages, since I've > been staring at the main recovery code, but there are also some more > fsyncs before (SyncDataDirectory()) and after (RemoveOldXlogFiles()) > that are unaffected. I think it's probably possible to do something > about those too, but that's another topic. ... and of course the end-of-recovery checkpoint; in my tests this wasn't materially changed since there isn't actually very much CLOG, it's just that we avoided syncing it block at a time and getting rescheduled. FWIW I put a very simple test here: https://github.com/macdice/redo-bench, YMMV. From 32c1c16d2800467d1d179678b66d1042d07c Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Tue, 4 Aug 2020 17:57:18 +1200 Subject: [PATCH v2] Defer flushing of SLRU files. Previously, we called fsync() after writing out pg_xact, multixact and commit_ts pages, leading to an I/O stall in user backends and recovery. Collapse requests for the same file into a single system call as part of the next checkpoint, as we do for relation files. 
Discussion: https://postgr.es/m/CA+hUKGLJ=84yt+nvhkeedauutvhmfq9i-n7k_o50jmq6rpj...@mail.gmail.com --- src/backend/access/transam/clog.c | 13 +++- src/backend/access/transam/commit_ts.c | 12 ++- src/backend/access/transam/multixact.c | 24 +- src/backend/access/transam/slru.c | 101 +++-- src/backend/access/transam/subtrans.c | 4 +- src/backend/commands/async.c | 5 +- src/backend/storage/lmgr/predicate.c | 4 +- src/backend/storage/sync/sync.c| 28 ++- src/include/access/clog.h | 3 + src/include/access/commit_ts.h | 3 + src/include/access/multixact.h | 4 + src/include/access/slru.h | 12 ++- src/include/storage/sync.h | 7 +- 13 files changed, 174 insertions(+), 46 deletions(-) diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c index dd2f4d5bc7..3eb33aea01 100644 --- a/src/backend/access/transam/clog.c +++ b/src/backend/access/transam/clog.c @@ -42,6 +42,7 @@ #include "pg_trace.h" #include "pgstat.h" #include "storage/proc.h" +#include "storage/sync.h" /* * Defines for CLOG page sizes. A page is the same BLCKSZ as is used @@ -692,7 +693,8 @@ CLOGShmemInit(void) { XactCtl->PagePrecedes = CLOGPagePrecedes; SimpleLruInit(XactCtl, "Xact", CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE, - XactSLRULock, "pg_xact", LWTRANCHE_XACT_BUFFER); + XactSLRULock, "pg_xact", LWTRANCHE_XACT_BUFFER, + SYNC_HANDLER_CLOG); } /* @@ -1034,3 +1036,12 @@ clog_redo(XLogReaderState *record) else elog(PANIC, "clog_redo: unknown op code %u", info); } + +/* + * Entrypoint for sync.c to sync clog files. 
+ */ +int +clogsyncfiletag(const FileTag *ftag, char *path) +{ + return slrusyncfiletag(XactCtl, ftag, path); +} diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c index 5244b06a2b..913ec9e48d 100644 --- a/src/backend/access/transam/commit_ts.c +++ b/src/backend/access/transam/commit_ts.c @@ -555,7 +555,8 @@ CommitTsShmemInit(void) CommitTsCtl->PagePrecedes = CommitTsPagePrecedes; SimpleLruInit(CommitTsCtl, "CommitTs", CommitTsShmemBuffers(), 0, CommitTsSLRULock, "pg_commit_ts", - LWTRANCHE_COMMITTS_BUFFER); + LWTRANCHE_COMMITTS_BUFFER, + SYNC_HANDLER_COMMIT_TS); commitTsShared = ShmemInitStruct("CommitTs shared", sizeof(CommitTimestampShared), @@ -1083,3 +1084,12 @@ commit_ts_redo(XLogReaderState *record) else elog(PANIC, "commit_ts_redo: unknown op code %u", info); } + +/* + * Entrypoint for sync.c to sync commit_ts files. + */ +int +committssyncfiletag(const FileTag *ftag, char *path) +{ + return slrusyncfiletag(CommitTsCtl, ftag, path); +} diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c index b8bedca04a..344006b0f5 100644 --- a/src/backend/access/transam/multixact.c +++ b/src/backend/access/transam/multixact.c @@ -1831,11 +1831,13 @@ MultiXactShmemInit(void) SimpleLruInit(MultiXactOffsetCtl, "MultiXactOffset", NUM_MULTIXACTOFFSET_BUFFERS, 0, MultiXactOffsetSLRULock, "pg_multixact/offsets", - LWTRANCHE_MULTIXACTOFFSET_BUFFER); + LWTRANCHE_MULTIXACTOFFSET_BUFFER, + SYNC_HANDLER_MULTIXACT_OFFSET); SimpleLruInit(MultiXactMemberCtl, "MultiXactMember", NUM_MULTIXACTMEMBER_BUFFERS, 0, MultiXactMemberSLRULock, "pg_multixact/members", - LWTRANCHE_MULTIXACTMEMBER_BUFFER); + LWTRANCHE_MULTIXACTMEMBER_BUFFER, + SYNC_HANDLER_MULTIXACT_MEMBER); /* Initialize our shared state struct */ MultiXactState = ShmemInitStruct("Shared MultiXact State", @@ -3386,3 +3388,21 @@
Autonomous database is coming to Postgres?
Hello,

I'm not sure if I should have posted this to pgsql-advocacy, but this is
being developed so I posted here.  Does anyone know if this development
will come to open source Postgres, or only to the cloud services of
Microsoft and Google?  (I wonder if this will become another reason that
Postgres won't incorporate an optimizer hint feature.)

Data systems that learn to be better
http://news.mit.edu/2020/mit-data-systems-learn-be-better-tsunami-bao-0810

[Quote]
--
As a first step toward this vision, Kraska and colleagues developed
Tsunami and Bao.  Tsunami uses machine learning to automatically
re-organize a dataset’s storage layout based on the types of queries
that its users make.  Tests show that it can run queries up to 10 times
faster than state-of-the-art systems.  What’s more, its datasets can be
organized via a series of "learned indexes" that are up to 100 times
smaller than the indexes used in traditional systems.

Bao, meanwhile, focuses on improving the efficiency of query
optimization through machine learning.  ...  Traditional query
optimizers take years to build, are very hard to maintain, and, most
importantly, do not learn from their mistakes.  Bao is the first
learning-based approach to query optimization that has been fully
integrated into the popular database management system PostgreSQL.  Lead
author Ryan Marcus, a postdoc in Kraska’s group, says that Bao produces
query plans that run up to 50 percent faster than those created by the
PostgreSQL optimizer, meaning that it could help to significantly reduce
the cost of cloud services, like Amazon’s Redshift, that are based on
PostgreSQL.

Kraska says that in contrast to other learning-based approaches to query
optimization, Bao learns much faster and can outperform open-source and
commercial optimizers with as little as one hour of training time.  In
the future, his team aims to integrate Bao into cloud systems to improve
resource utilization in environments where disk, RAM, and CPU time are
scarce resources.  ...

The work was done as part of the Data System and AI Lab (DSAIL@CSAIL),
which is sponsored by Intel, Google, Microsoft, and the U.S. National
Science Foundation.
--

Regards
Takayuki Tsunakawa
Re: remove some ancient port hacks
On Wed, Aug 12, 2020 at 09:12:07AM +0200, Peter Eisentraut wrote: > There are two ancient hacks in the cygwin and solaris ports that appear to > have been solved more than 10 years ago, so I think we can remove them. See > attached patches. +1 for removing these. >10y age is not sufficient justification by itself; if systems that shipped with the defect were not yet EOL, that would tend to justify waiting longer. For these particular hacks, though, affected systems are both old and EOL.
Re: security_context_t marked as deprecated in libselinux 3.1
Michael Paquier writes: > Per the following commit in upstream SELinux, security_context_t has > been marked as deprecated, generating complaints with > -Wdeprecated-declarations: > https://github.com/SELinuxProject/selinux/commit/7a124ca2758136f49cc38efc26fb1a2d385ecfd9 Huh. Apparently it's been considered legacy for a good while, too. > This can be seen with Debian SID when building contrib/sepgsql/, as > we have libselinux 3.1 there. Per the upstream repo, > security_context_t maps to char * in include/selinux/selinux.h, so we > can easily get rid of the warnings with the attached patch, which replaces > the references to security_context_t. Ummm ... aren't you going to get some cast-away-const warnings now? Or are all of the called functions declared as taking "const char *" not just "char *"? regards, tom lane
Re: Parallel query hangs after a smart shutdown is issued
Thomas Munro writes: > On Thu, Aug 13, 2020 at 10:21 AM Tom Lane wrote: >> Also, the state before PM_WAIT_READONLY could have been >> PM_RECOVERY or PM_STARTUP, in which case we don't really want to think >> it's like PM_HOT_STANDBY either; only the BgWorkerStart_PostmasterStart >> case should be accepted. That suggests that we need yet another pmState, >> or else a more thoroughgoing refactoring of how the postmaster's state >> is represented. > Hmm. I experimented with separating the shutdown-in-progress state into a separate variable, letting us actually reduce not increase the number of pmStates. This way, PM_RUN and other states still apply until we're ready to pull the shutdown trigger, so that we don't need to complicate state-based decisions about launching auxiliary processes. This patch also unifies the signal-sending for the smart and fast shutdown paths, which seems like a nice improvement. I kind of like this, though I'm not in love with the particular variable name I used here (smartShutState). If we go this way, CAC_WAITBACKUP ought to be renamed since the PMState it's named after no longer exists. I left that alone pending making final naming choices, though. regards, tom lane diff --git a/doc/src/sgml/ref/pg_ctl-ref.sgml b/doc/src/sgml/ref/pg_ctl-ref.sgml index e31275a04e..3946fa52ea 100644 --- a/doc/src/sgml/ref/pg_ctl-ref.sgml +++ b/doc/src/sgml/ref/pg_ctl-ref.sgml @@ -185,8 +185,8 @@ PostgreSQL documentation stop mode shuts down the server that is running in the specified data directory. Three different shutdown methods can be selected with the -m - option. Smart mode waits for all active - clients to disconnect and any online backup to finish. + option. Smart mode disallows new connections, then waits + for all existing clients to disconnect and any online backup to finish. If the server is in hot standby, recovery and streaming replication will be terminated once all clients have disconnected. 
Fast mode (the default) does not wait for clients to disconnect and diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index 38e2c16ac2..66b402e7f7 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -148,8 +148,6 @@ #define BACKEND_TYPE_BGWORKER 0x0008 /* bgworker process */ #define BACKEND_TYPE_ALL 0x000F /* OR of all the above */ -#define BACKEND_TYPE_WORKER (BACKEND_TYPE_AUTOVAC | BACKEND_TYPE_BGWORKER) - /* * List of active backends (or child processes anyway; we don't actually * know whether a given child has become a backend or is still in the @@ -304,8 +302,7 @@ static bool FatalError = false; /* T if recovering from backend crash */ * and we switch to PM_RUN state. * * Normal child backends can only be launched when we are in PM_RUN or - * PM_HOT_STANDBY state. (We also allow launch of normal - * child backends in PM_WAIT_BACKUP state, but only for superusers.) + * PM_HOT_STANDBY state. (smartShutState can also restrict launching.) * In other states we handle connection requests by launching "dead_end" * child processes, which will simply send the client an error message and * quit. (We track these in the BackendList so that we can know when they @@ -319,10 +316,10 @@ static bool FatalError = false; /* T if recovering from backend crash */ * * Notice that this state variable does not distinguish *why* we entered * states later than PM_RUN --- Shutdown and FatalError must be consulted - * to find that out. FatalError is never true in PM_RECOVERY_* or PM_RUN - * states, nor in PM_SHUTDOWN states (because we don't enter those states - * when trying to recover from a crash). It can be true in PM_STARTUP state, - * because we don't clear it until we've successfully started WAL redo. + * to find that out. FatalError is never true in PM_RECOVERY, PM_HOT_STANDBY, + * or PM_RUN states, nor in PM_SHUTDOWN states (because we don't enter those + * states when trying to recover from a crash). 
It can be true in PM_STARTUP + * state, because we don't clear it until we've successfully started WAL redo. */ typedef enum { @@ -331,8 +328,7 @@ typedef enum PM_RECOVERY,/* in archive recovery mode */ PM_HOT_STANDBY,/* in hot standby mode */ PM_RUN, /* normal "database is alive" state */ - PM_WAIT_BACKUP,/* waiting for online backup mode to end */ - PM_WAIT_READONLY, /* waiting for read only backends to exit */ + PM_STOP_BACKENDS, /* need to stop remaining backends */ PM_WAIT_BACKENDS, /* waiting for live backends to exit */ PM_SHUTDOWN,/* waiting for checkpointer to do shutdown * ckpt */ @@ -344,6 +340,21 @@ typedef enum static PMState pmState = PM_INIT; +/* + * While performing a "smart shutdown", we restrict new connections but stay + * in PM_RUN or PM_HOT_STANDBY state until all the client backends are gone. + * smartShutState is a sub-state indicator showing the
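The design sketched in this patch — keep pmState in PM_RUN (or PM_HOT_STANDBY) during a smart shutdown and track the shutdown itself in a separate sub-state variable — can be illustrated with a toy state machine. The names below echo the patch's smartShutState idea, but the class and methods are purely illustrative, not the actual postmaster code:

```python
from enum import Enum, auto

class PMState(Enum):
    PM_RUN = auto()
    PM_HOT_STANDBY = auto()
    PM_WAIT_BACKENDS = auto()
    PM_SHUTDOWN = auto()

class SmartShutState(Enum):
    SMART_NORMAL = auto()        # no smart shutdown in progress
    SMART_WAIT_CLIENTS = auto()  # new connections refused, waiting for clients

class ToyPostmaster:
    def __init__(self):
        self.pm_state = PMState.PM_RUN
        self.smart_shut = SmartShutState.SMART_NORMAL
        self.client_backends = 2

    def accepts_connections(self):
        # Launch decisions still key off pm_state; the sub-state only
        # vetoes new client connections during a smart shutdown.
        return (self.pm_state in (PMState.PM_RUN, PMState.PM_HOT_STANDBY)
                and self.smart_shut is SmartShutState.SMART_NORMAL)

    def request_smart_shutdown(self):
        self.smart_shut = SmartShutState.SMART_WAIT_CLIENTS

    def client_disconnected(self):
        self.client_backends -= 1
        if (self.smart_shut is SmartShutState.SMART_WAIT_CLIENTS
                and self.client_backends == 0):
            # Only now do we pull the shutdown trigger.
            self.pm_state = PMState.PM_WAIT_BACKENDS

pm = ToyPostmaster()
pm.request_smart_shutdown()
print(pm.accepts_connections())  # False: new connections refused
print(pm.pm_state.name)          # PM_RUN: state unchanged until clients leave
pm.client_disconnected(); pm.client_disconnected()
print(pm.pm_state.name)          # PM_WAIT_BACKENDS
```

The point of the split is visible here: auxiliary-process launch logic that tests pm_state needs no extra cases, because a smart shutdown never introduces a new top-level state.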
Re: run pgindent on a regular basis / scripted manner
On Wed, Aug 12, 2020 at 06:53:25PM -0400, Alvaro Herrera wrote: > On 2020-Aug-12, Andres Freund wrote: >> Is there any reason we don't just automatically run pgindent regularly? >> Like once a week? And also update typedefs.list automatically, while >> we're at it? > > Seconded. Thirded. -- Michael signature.asc Description: PGP signature
security_context_t marked as deprecated in libselinux 3.1
Hi all, Per the following commit in upstream SELinux, security_context_t has been marked as deprecated, generating complaints with -Wdeprecated-declarations: https://github.com/SELinuxProject/selinux/commit/7a124ca2758136f49cc38efc26fb1a2d385ecfd9 This can be seen with Debian SID when building contrib/sepgsql/, as we have libselinux 3.1 there. Per the upstream repo, security_context_t maps to char * in include/selinux/selinux.h, so we can easily get rid of the warnings with the attached patch, which replaces the references to security_context_t. Funnily, our code already mixes both definitions, see for example sepgsql_set_client_label, so this clarifies things. Any thoughts? -- Michael diff --git a/contrib/sepgsql/label.c b/contrib/sepgsql/label.c index 32e405530b..b00b91df5a 100644 --- a/contrib/sepgsql/label.c +++ b/contrib/sepgsql/label.c @@ -120,7 +120,7 @@ sepgsql_set_client_label(const char *new_label) tcontext = client_label_peer; else { - if (security_check_context_raw((security_context_t) new_label) < 0) + if (security_check_context_raw(new_label) < 0) ereport(ERROR, (errcode(ERRCODE_INVALID_NAME), errmsg("SELinux: invalid security label: \"%s\"", @@ -453,9 +453,9 @@ sepgsql_get_label(Oid classId, Oid objectId, int32 subId) object.objectSubId = subId; label = GetSecurityLabel(&object, SEPGSQL_LABEL_TAG); - if (!label || security_check_context_raw((security_context_t) label)) + if (!label || security_check_context_raw(label)) { - security_context_t unlabeled; + char *unlabeled; if (security_get_initial_context_raw("unlabeled", &unlabeled) < 0) ereport(ERROR, @@ -487,7 +487,7 @@ sepgsql_object_relabel(const ObjectAddress *object, const char *seclabel) * context of selinux. 
*/ if (seclabel && - security_check_context_raw((security_context_t) seclabel) < 0) + security_check_context_raw(seclabel) < 0) ereport(ERROR, (errcode(ERRCODE_INVALID_NAME), errmsg("SELinux: invalid security label: \"%s\"", seclabel))); @@ -725,7 +725,7 @@ exec_object_restorecon(struct selabel_handle *sehnd, Oid catalogId) char *objname; int objtype = 1234; ObjectAddress object; - security_context_t context; + char *context; /* * The way to determine object name depends on object classes. So, any diff --git a/contrib/sepgsql/selinux.c b/contrib/sepgsql/selinux.c index 9fdc810f2e..2695e88f23 100644 --- a/contrib/sepgsql/selinux.c +++ b/contrib/sepgsql/selinux.c @@ -768,8 +768,8 @@ sepgsql_compute_avd(const char *scontext, * Ask SELinux what is allowed set of permissions on a pair of the * security contexts and the given object class. */ - if (security_compute_av_flags_raw((security_context_t) scontext, - (security_context_t) tcontext, + if (security_compute_av_flags_raw(scontext, + tcontext, tclass_ex, 0, &avd_ex) < 0) ereport(ERROR, (errcode(ERRCODE_INTERNAL_ERROR), @@ -838,7 +838,7 @@ sepgsql_compute_create(const char *scontext, uint16 tclass, const char *objname) { - security_context_t ncontext; + char *ncontext; security_class_t tclass_ex; const char *tclass_name; char *result; @@ -853,8 +853,8 @@ sepgsql_compute_create(const char *scontext, * Ask SELinux what is the default context for the given object class on a * pair of security contexts */ - if (security_compute_create_name_raw((security_context_t) scontext, - (security_context_t) tcontext, + if (security_compute_create_name_raw(scontext, + tcontext, tclass_ex, objname, &ncontext) < 0) diff --git a/contrib/sepgsql/uavc.c b/contrib/sepgsql/uavc.c index 639a52c556..97189b7c46 100644 --- a/contrib/sepgsql/uavc.c +++ b/contrib/sepgsql/uavc.c @@ -171,7 +171,7 @@ sepgsql_avc_unlabeled(void) { if (!avc_unlabeled) { - security_context_t unlabeled; + char *unlabeled; if (security_get_initial_context_raw("unlabeled", &unlabeled) < 0) 
ereport(ERROR, @@ -216,7 +216,7 @@ sepgsql_avc_compute(const char *scontext, const char *tcontext, uint16 tclass) * policy is reloaded, validation status shall be kept, so we also cache * whether the supplied security context was valid, or not. */ - if (security_check_context_raw((security_context_t) tcontext) != 0) + if (security_check_context_raw(tcontext) != 0) ucontext = sepgsql_avc_unlabeled(); /* signature.asc Description: PGP signature
Re: Add LWLock blocker(s) information
On Wed, Aug 12, 2020 at 5:39 PM Andres Freund wrote: > Attached. Needed one python3 fix, and to be adapted so it works with > futex based semaphores. Seems to work for both sysv and posix semaphores > now, based a very short test. Great, thanks! -- Peter Geoghegan
Re: [HACKERS] Dynamic instrumentation of lwlock wait times (lwlock flamegraphs)
Hi, On 2017-06-22 14:08:45 -0700, Andres Freund wrote: > At pgcon some people were talking about the difficulty of instrumenting > the time actually spent waiting for lwlocks and related measurements. > I'd mentioned that linux these days provides infrastructure to measure > such things in unmodified binaries. > > Attached is a prototype of a script that measures the time spent inside > PGSemaphoreLock(), aggregates that inside the kernel, grouped by pid and > stacktrace. That allows one to generate nice flame graphs showing which > part of the code waits how long for lwlocks. > > The attached script clearly needs improvements, but I thought I'd show > some of the results it can get. To run it you need the python > library of the 'bcc' project [1], and a sufficiently new kernel. Some > distributions, e.g. newer debian versions, package this as python-bpfcc > and similar. > > The output of the script can be turned into a flamegraph with > https://github.com/brendangregg/FlameGraph 's flamegraph.pl. The script has bitrotted slightly, due to python3 and postgres changes (the move to posix semaphores). Updated version attached. 
Based on the discussion in https://www.postgresql.org/message-id/20200813003934.yrm4qqngfygr6ii7%40alap3.anarazel.de Greetings, Andres Freund #!/usr/bin/python from __future__ import print_function from bcc import BPF from time import sleep import argparse import signal def positive_int(val): try: ival = int(val) except ValueError: raise argparse.ArgumentTypeError("must be an integer") if ival < 0: raise argparse.ArgumentTypeError("must be positive") return ival def positive_nonzero_int(val): ival = positive_int(val) if ival == 0: raise argparse.ArgumentTypeError("must be nonzero") return ival bpf_text = """ #include #include struct stats_key_t { u32 pid; int user_stack_id; }; struct stats_value_t { u64 total_time; }; struct start_key_t { u32 pid; }; struct start_value_t { u64 last_start; }; // map_type, key_type, leaf_type, table_name, num_entry BPF_HASH(stats, struct stats_key_t, struct stats_value_t); BPF_HASH(start, struct start_key_t, struct start_value_t); BPF_STACK_TRACE(stack_traces, STACK_STORAGE_SIZE); int trace_sem_entry(struct pt_regs *ctx) { u32 pid = bpf_get_current_pid_tgid(); struct start_key_t start_key = {}; struct start_value_t start_value = {}; if (!(THREAD_FILTER)) { return 0; } start_key.pid = pid; start_value.last_start = bpf_ktime_get_ns(); start.update(&start_key, &start_value); return 0; } int trace_sem_return(struct pt_regs *ctx) { u32 pid = bpf_get_current_pid_tgid(); struct stats_key_t stats_key = {}; struct start_key_t start_key = {}; struct stats_value_t zero = {}; struct stats_value_t *stats_value; struct start_value_t *start_value; u64 delta; if (!(THREAD_FILTER)) { return 0; } start_key.pid = pid; start_value = start.lookup(&start_key); if (!start_value) return 0; /* missed start */; delta = bpf_ktime_get_ns() - start_value->last_start; start.delete(&start_key); stats_key.pid = pid; stats_key.user_stack_id = stack_traces.get_stackid(ctx, BPF_F_REUSE_STACKID | BPF_F_USER_STACK); stats_value = stats.lookup_or_init(&stats_key, &zero); stats_value->total_time += delta; 
return 0; } """ examples = """examples: ./pgsemwait.py -x BINARY# trace postgres lwlock wait time until Ctrl-C ./pgsemwait.py -x BINARY 5 # trace for 5 seconds only ./pgsemwait.py -x BINARY -p PID # trace PID """ parser = argparse.ArgumentParser( description="Measure Postgres LWLock Wait Time", formatter_class=argparse.RawDescriptionHelpFormatter, epilog=examples) parser.add_argument("-x", "--binary", metavar="BINARY", dest="binary", required = True, help="path to postgres binary") parser.add_argument("-p", "--pid", metavar="PID", dest="pid", help="trace this PID only", type=positive_int) parser.add_argument("-f", "--folded", action="store_true", help="output folded format") parser.add_argument("duration", nargs="?", default=, type=positive_nonzero_int, help="duration of trace, in seconds") parser.add_argument("--stack-storage-size", default=1024, type=positive_nonzero_int, help="the number of unique stack traces that can be stored and " "displayed (default 1024)") args = parser.parse_args() folded = args.folded duration = int(args.duration) # set stack storage size bpf_text = bpf_text.replace('STACK_STORAGE_SIZE', str(args.stack_storage_size)) # setup pid filter thread_filter = '1' if args.pid is not None: thread_filter = 'pid == %d' % args.pid bpf_text = bpf_text.replace('THREAD_FILTER', thread_filter) binary = args.binary b = BPF(text=bpf_text) libpath = BPF.find_exe(binary) if not libpath: bail("can't resolve library %s" % library) library = libpath
Re: Add LWLock blocker(s) information
Hi, On 2020-08-12 16:47:13 -0700, Peter Geoghegan wrote: > On Mon, Aug 10, 2020 at 5:41 PM Andres Freund wrote: > > Most of the cases where this kind of information really is interesting > > seem to benefit a lot from having stack information available. That > > obviously has overhead, so we don't want the cost all the > > time. The script at > > https://postgr.es/m/20170622210845.d2hsbqv6rxu2tiye%40alap3.anarazel.de > > can give you results like e.g. > > https://anarazel.de/t/2017-06-22/pgsemwait_64_async.svg > > It seems to have bitrot. Do you have a more recent version of the script? Attached. Needed one python3 fix, and to be adapted so it works with futex based semaphores. Seems to work for both sysv and posix semaphores now, based a very short test. sudo python3 ./pgsemwait.py -x /home/andres/build/postgres/dev-optimize/vpath/src/backend/postgres -f 3|~/src/flamegraph/flamegraph.pl Will add a note to the other thread. Greetings, Andres Freund #!/usr/bin/python from __future__ import print_function from bcc import BPF from time import sleep import argparse import signal def positive_int(val): try: ival = int(val) except ValueError: raise argparse.ArgumentTypeError("must be an integer") if ival < 0: raise argparse.ArgumentTypeError("must be positive") return ival def positive_nonzero_int(val): ival = positive_int(val) if ival == 0: raise argparse.ArgumentTypeError("must be nonzero") return ival bpf_text = """ #include #include struct stats_key_t { u32 pid; int user_stack_id; }; struct stats_value_t { u64 total_time; }; struct start_key_t { u32 pid; }; struct start_value_t { u64 last_start; }; // map_type, key_type, leaf_type, table_name, num_entry BPF_HASH(stats, struct stats_key_t, struct stats_value_t); BPF_HASH(start, struct start_key_t, struct start_value_t); BPF_STACK_TRACE(stack_traces, STACK_STORAGE_SIZE); int trace_sem_entry(struct pt_regs *ctx) { u32 pid = bpf_get_current_pid_tgid(); struct start_key_t start_key = {}; struct start_value_t start_value = 
{}; if (!(THREAD_FILTER)) { return 0; } start_key.pid = pid; start_value.last_start = bpf_ktime_get_ns(); start.update(&start_key, &start_value); return 0; } int trace_sem_return(struct pt_regs *ctx) { u32 pid = bpf_get_current_pid_tgid(); struct stats_key_t stats_key = {}; struct start_key_t start_key = {}; struct stats_value_t zero = {}; struct stats_value_t *stats_value; struct start_value_t *start_value; u64 delta; if (!(THREAD_FILTER)) { return 0; } start_key.pid = pid; start_value = start.lookup(&start_key); if (!start_value) return 0; /* missed start */; delta = bpf_ktime_get_ns() - start_value->last_start; start.delete(&start_key); stats_key.pid = pid; stats_key.user_stack_id = stack_traces.get_stackid(ctx, BPF_F_REUSE_STACKID | BPF_F_USER_STACK); stats_value = stats.lookup_or_init(&stats_key, &zero); stats_value->total_time += delta; return 0; } """ examples = """examples: ./pgsemwait.py -x BINARY# trace postgres lwlock wait time until Ctrl-C ./pgsemwait.py -x BINARY 5 # trace for 5 seconds only ./pgsemwait.py -x BINARY -p PID # trace PID """ parser = argparse.ArgumentParser( description="Measure Postgres LWLock Wait Time", formatter_class=argparse.RawDescriptionHelpFormatter, epilog=examples) parser.add_argument("-x", "--binary", metavar="BINARY", dest="binary", required = True, help="path to postgres binary") parser.add_argument("-p", "--pid", metavar="PID", dest="pid", help="trace this PID only", type=positive_int) parser.add_argument("-f", "--folded", action="store_true", help="output folded format") parser.add_argument("duration", nargs="?", default=, type=positive_nonzero_int, help="duration of trace, in seconds") parser.add_argument("--stack-storage-size", default=1024, type=positive_nonzero_int, help="the number of unique stack traces that can be stored and " "displayed (default 1024)") args = parser.parse_args() folded = args.folded duration = int(args.duration) # set stack storage size bpf_text = bpf_text.replace('STACK_STORAGE_SIZE', str(args.stack_storage_size)) # setup pid 
filter thread_filter = '1' if args.pid is not None: thread_filter = 'pid == %d' % args.pid bpf_text = bpf_text.replace('THREAD_FILTER', thread_filter) binary = args.binary b = BPF(text=bpf_text) libpath = BPF.find_exe(binary) if not libpath: bail("can't resolve library %s" % library) library = libpath b.attach_uprobe(name=library, sym_re='PGSemaphoreLock|futex', fn_name="trace_sem_entry", pid = -1) b.attach_uretprobe(name=library, sym_re='PGSemaphoreLock|futex', fn_name="trace_sem_return", pid = -1) matched = b.num_open_uprobes() if matched == 0: print("error: 0 functions traced. Exiting.", file=stderr) exit(1) #
Re: run pgindent on a regular basis / scripted manner
Andres Freund writes: > Unfortunately that is, with the current tooling, not entirely trivial to > do so completely. The way we generate the list of known typedefs > unfortunately depends on the platform a build is run on. Therefore the > buildfarm collects a number of the generated list of typedefs from > different platforms, and then we use that combined list to run pgindent. Yeah, it's hard to see how to avoid that given that the set of typedefs provided by system headers is not fixed. (Windows vs not Windows is the worst case of course, but Unixen aren't all alike either.) Another gotcha that we have to keep our eyes on is that sometimes the process finds spurious names that we don't want to treat as typedefs because it results in messing up too much code. There's a reject list in pgindent that takes care of those, but it has to be maintained manually. So I'm not sure how that could work in conjunction with unsupervised reindents --- by the time you notice a problem, the git history will already be cluttered with bogus reindentations. Maybe the secret is to not allow automated adoption of new typedefs.list entries, but to require somebody to add entries to that file by hand, even if they're basing it on the buildfarm results. (This would encourage the habit some people have adopted of updating typedefs.list along with commits that add typedefs. I've never done that, but would be willing to change if there's good motivation to.) regards, tom lane
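The workflow Tom describes — union the per-platform typedef lists, then subtract a manually maintained reject list — can be approximated in a few lines. The file contents and the reject entries below are made up for illustration; the real reject list lives inside pgindent itself:

```python
def merge_typedefs(per_platform_lists, reject_list):
    """Union the typedef names collected on each platform, drop known
    false positives, and return a sorted list suitable for pgindent."""
    merged = set()
    for names in per_platform_lists:
        merged.update(names)
    return sorted(merged - set(reject_list))

# Hypothetical per-platform results from the buildfarm:
linux_names = ["Oid", "XLogRecPtr", "abs"]      # "abs" is a spurious hit
windows_names = ["Oid", "DWORD", "XLogRecPtr"]  # Windows adds its own types
rejects = ["abs"]                               # maintained by hand

print(merge_typedefs([linux_names, windows_names], rejects))
# ['DWORD', 'Oid', 'XLogRecPtr']
```

Tom's suggestion amounts to making the reject-list subtraction (and perhaps the union itself) a human-reviewed step rather than an automated one.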
Re: Switch to multi-inserts for pg_depend
On 2020-Aug-11, Robert Haas wrote: > On Tue, Aug 11, 2020 at 1:59 AM Michael Paquier wrote: > > On Mon, Aug 10, 2020 at 05:32:21PM -0700, Andres Freund wrote: > > > Do we really want to end up with several separate defines for different > > > type of catalog batch inserts? That doesn't seem like a good > > > thing. Think there should be a single define for all catalog bulk > > > inserts. > > > > Unlikely so, but I kept them separate to potentially lower the > > threshold of 64kB for catalog rows that have a lower average size than > > pg_attribute. > > Uh, why would we want to do that? Yeah. As I understand, the only reason to have this number is to avoid an arbitrarily large number of entries created as a single multi-insert WAL record ... but does that really ever happen? I guess if you create a table with some really complicated schema you might get, say, a hundred pg_depend rows at once. But to fill eight complete pages of pg_depend entries sounds astoundingly ridiculous already -- I'd say it's just an easy way to spell "infinity" for this. Tweaking one infinity value to become some other infinity value sounds useless. So I agree with what Andres said. Let's have just one such define and be done with it. -- Álvaro Herrera  https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
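The "eight complete pages" remark is simple arithmetic: with the default 8 kB page size, a 64 kB cap spans eight pages, and at a rough 100-byte average row (the row size here is an illustrative guess, not a measured figure) the cap only matters once a single command creates several hundred catalog rows:

```python
BLCKSZ = 8192            # default PostgreSQL block (page) size, bytes
CAP = 64 * 1024          # the 64 kB multi-insert threshold under discussion
AVG_ROW = 100            # assumed average catalog row size, bytes

pages = CAP // BLCKSZ
rows_to_hit_cap = CAP // AVG_ROW
print(pages)             # 8 -- eight full pages of entries
print(rows_to_hit_cap)   # 655 -- rows before the cap bites
```

Which is the point being made: for realistic schemas the threshold is effectively infinity, so tuning it per catalog buys nothing.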
Re: Add LWLock blocker(s) information
On Mon, Aug 10, 2020 at 5:41 PM Andres Freund wrote: > Most of the cases where this kind of information really is interesting > seem to benefit a lot from having stack information available. That > obviously has overhead, so we don't want the cost all the > time. The script at > https://postgr.es/m/20170622210845.d2hsbqv6rxu2tiye%40alap3.anarazel.de > can give you results like e.g. > https://anarazel.de/t/2017-06-22/pgsemwait_64_async.svg It seems to have bitrot. Do you have a more recent version of the script? -- Peter Geoghegan
Re: run pgindent on a regular basis / scripted manner
Jesse Zhang writes: > On Wed, Aug 12, 2020 at 3:34 PM Andres Freund wrote: >> Is there any reason we don't just automatically run pgindent regularly? >> Like once a week? And also update typedefs.list automatically, while >> we're at it? > You know what's better than weekly? Every check-in. I'm not in favor of unsupervised pgindent runs, really. It can do a lot of damage to code that was written without thinking about it --- in particular, it'll make a hash of comment blocks that were manually formatted and not protected with dashes. If the workflow is commit first and re-indent later, then we can always revert the pgindent commit and clean things up manually; but an auto re-indent during commit wouldn't provide that history. I do like the idea of more frequent, smaller pgindent runs instead of doing just one giant run per year. With the giant runs it's necessary to invest a fair amount of time eyeballing all the changes; if we did it every couple weeks then the pain would be a lot less. Another idea would be to have a bot that runs pgindent *without* committing the results, and emails the people who have made commits into files that changed, saying "if you don't like these diffs then you'd better clean up your commit before it happens for real". With some warning like that, it might be okay to do automatic reindenting a little bit later. Plus, nagging committers who habitually commit improperly-indented code might persuade them to clean up their acts ;-) regards, tom lane
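Tom's warn-don't-commit bot could be a thin wrapper around pgindent: run it in a scratch checkout, and if the working tree changes, notify whoever committed to the touched files. A rough sketch — the pgindent invocation and the notification step are placeholders, not an existing tool:

```python
import subprocess

def indent_diff(repo_dir):
    """Run pgindent in a scratch checkout and return the resulting diff.
    An empty string means the tree is already correctly indented."""
    subprocess.run(["src/tools/pgindent/pgindent"], cwd=repo_dir, check=True)
    out = subprocess.run(["git", "diff"], cwd=repo_dir,
                         capture_output=True, text=True, check=True)
    return out.stdout

def notify(diff):
    """Placeholder for mailing the diff to the relevant committers;
    here it just reports whether action is needed."""
    if not diff:
        msg = "tree is clean"
    else:
        msg = "pgindent would change %d lines" % diff.count("\n")
    print(msg)
    return msg

# The reporting step works standalone:
notify("")                      # prints "tree is clean"
notify("-foo\n+\tfoo\n")        # prints "pgindent would change 2 lines"
```

The actual re-indent would still be a supervised commit some days later, after committers have had a chance to react to the warning.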
Re: Parallel query hangs after a smart shutdown is issued
On Thu, Aug 13, 2020 at 10:21 AM Tom Lane wrote: > Thomas Munro writes: > > @@ -5911,11 +5912,11 @@ bgworker_should_start_now(BgWorkerStartTime > > start_time) > > + case PM_WAIT_READONLY: > > + case PM_WAIT_CLIENTS: > > case PM_RUN: > > So the question here is whether time-based bgworkers should be allowed to > restart in this scenario. I'm not quite sure --- depending on what the > bgworker's purpose is, you could make an argument either way, I think. > Do we need some way to control that? I'm not sure why any bgworker would actually want to be shut down or not restarted during the twilight zone of a smart shutdown though -- if users can do arbitrary stuff, why can't supporting workers carry on? For example, a hypothetical extension that triggers vacuum freeze at smarter times, or a wait event sampling extension, an FDW that uses an extra worker to maintain a connection to something, etc etc could all be things that a user is indirectly relying on to do their normal work, and I struggle to think of an example of something that you explicitly don't want running just because (in some sense) the server *plans* to shut down, when the users get around to logging off. But maybe I lack imagination. > In any case, we'd want to treat PM_WAIT_READONLY like PM_HOT_STANDBY not > PM_RUN, no? Yeah, you're right. > Also, the state before PM_WAIT_READONLY could have been > PM_RECOVERY or PM_STARTUP, in which case we don't really want to think > it's like PM_HOT_STANDBY either; only the BgWorkerStart_PostmasterStart > case should be accepted. That suggests that we need yet another pmState, > or else a more thoroughgoing refactoring of how the postmaster's state > is represented. Hmm.
Re: run pgindent on a regular basis / scripted manner
Hi, On 2020-08-12 16:08:50 -0700, Jesse Zhang wrote: > On Wed, Aug 12, 2020 at 3:34 PM Andres Freund wrote: > > > > Hi, > > > > When developing patches I find it fairly painful that I cannot re-indent > > patches with pgindent without also seeing a lot of indentation changes > > in unmodified parts of files. It is easy enough ([1]) to only re-indent > > files that I have modified, but there's often a lot of independent > > indentation changes in the files that I did modify. > > > > I e.g. just re-indented patch 0001 of my GetSnapshotData() series and > > most of the hunks were entirely unrelated. Despite the development > > window for 14 having only relatively recently opened. Based on my > > experience it tends to get worse over time. > > How bad was it right after branching 13? I wonder if we have any > empirical measure of badness over time -- assuming there was a point in > the recent past where everything was good, and the bad just crept in. Well, just after branching it was perfect, because pgindent is customarily run just before branching. After that it incrementally gets worse. > > Is there any reason we don't just automatically run pgindent regularly? > > Like once a week? And also update typedefs.list automatically, while > > we're at it? > > You know what's better than weekly? Every check-in. I for one would love > it if we can just format the entire codebase, and ensure that new > check-ins are also formatted. We _do_ need some form of continuous > integration to catch us when we have fallen short (again, once HEAD > reaches a "known good" state, it's conceivably cheap to keep it in the > good state. Unfortunately that is, with the current tooling, not entirely trivial to do so completely. The way we generate the list of known typedefs unfortunately depends on the platform a build is run on. Therefore the buildfarm collects a number of the generated list of typedefs from different platforms, and then we use that combined list to run pgindent. 
We surely can improve further, but I think having any automation around this already would be a huge step. Greetings, Andres Freund
Re: Dependencies for partitioned indexes are still a mess
Andres Freund writes: > Given that pg_dump already re-orders the columns for DDL, could we make > it apply that re-ordering not just during the CREATE TABLE, but also > when dumping the table contents? Hm, possibly. I think when this was last looked at, we didn't have any way to get COPY to output the columns in non-physical order, but now we do. regards, tom lane
Re: run pgindent on a regular basis / scripted manner
Hi Andres, On Wed, Aug 12, 2020 at 3:34 PM Andres Freund wrote: > > Hi, > > When developing patches I find it fairly painful that I cannot re-indent > patches with pgindent without also seeing a lot of indentation changes > in unmodified parts of files. It is easy enough ([1]) to only re-indent > files that I have modified, but there's often a lot of independent > indentation changes in the files that I did modify. > > I e.g. just re-indented patch 0001 of my GetSnapshotData() series and > most of the hunks were entirely unrelated. Despite the development > window for 14 having only relatively recently opened. Based on my > experience it tends to get worse over time. How bad was it right after branching 13? I wonder if we have any empirical measure of badness over time -- assuming there was a point in the recent past where everything was good, and the bad just crept in. > > > Is there any reason we don't just automatically run pgindent regularly? > Like once a week? And also update typedefs.list automatically, while > we're at it? You know what's better than weekly? Every check-in. I for one would love it if we can just format the entire codebase, and ensure that new check-ins are also formatted. We _do_ need some form of continuous integration to catch us when we have fallen short (again, once HEAD reaches a "known good" state, it's conceivably cheap to keep it in the good state. Cheers, Jesse
Re: run pgindent on a regular basis / scripted manner
On 2020-Aug-12, Andres Freund wrote: > Is there any reason we don't just automatically run pgindent regularly? > Like once a week? And also update typedefs.list automatically, while > we're at it? Seconded. -- Álvaro Herrera  https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Dependencies for partitioned indexes are still a mess
On 2020-Jul-15, Tom Lane wrote: > Issue #2: parallel restore does not work > > 1. dropdb r2; createdb r2 > 2. pg_restore -j8 -d r2 regression.dump > > This is fairly timing-dependent, but some attempts fail with messages > like > > pg_restore: while PROCESSING TOC: > pg_restore: from TOC entry 6684; 2606 29166 FK CONSTRAINT fk fk_a_fkey > postgres > pg_restore: error: could not execute query: ERROR: there is no unique > constraint matching given keys for referenced table "pk" > Command was: ALTER TABLE fkpart3.fk > ADD CONSTRAINT fk_a_fkey FOREIGN KEY (a) REFERENCES fkpart3.pk(a); Hmm, we do make the FK constraint depend on the ATTACH for the direct children; what I think we're lacking is dependencies on descendants twice-removed (?) or higher. This mock patch seems to fix this problem by adding dependencies recursively on all children of the index; I no longer see this problem with it. -- Álvaro Herrera  https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index 42391b0b2c..c78bbd7d00 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -7412,6 +7412,24 @@ getExtendedStatistics(Archive *fout) destroyPQExpBuffer(query); } +/* recursive bit of getConstraints */ +static void +addConstrChildIdxDeps(DumpableObject *dobj, IndxInfo *refidx) +{ + SimplePtrListCell *cell; + + for (cell = refidx->partattaches.head; cell; cell = cell->next) + { + DumpableObject *childobj = (DumpableObject *) cell->ptr; + IndexAttachInfo *attach = (IndexAttachInfo *) childobj; + + addObjectDependency(dobj, childobj->dumpId); + + if (attach->partitionIdx->partattaches.head != NULL) + addConstrChildIdxDeps(dobj, attach->partitionIdx); + } +} + /* * getConstraints * @@ -7517,25 +7535,20 @@ getConstraints(Archive *fout, TableInfo tblinfo[], int numTables) reftable = findTableByOid(constrinfo[j].confrelid); if (reftable && reftable->relkind == 
RELKIND_PARTITIONED_TABLE) { -IndxInfo *refidx; Oid indexOid = atooid(PQgetvalue(res, j, i_conindid)); if (indexOid != InvalidOid) { for (int k = 0; k < reftable->numIndexes; k++) { - SimplePtrListCell *cell; + IndxInfo *refidx; /* not our index? */ if (reftable->indexes[k].dobj.catId.oid != indexOid) continue; refidx = &reftable->indexes[k]; - for (cell = refidx->partattaches.head; cell; - cell = cell->next) - addObjectDependency(&constrinfo[j].dobj, -((DumpableObject *) - cell->ptr)->dumpId); + addConstrChildIdxDeps(&constrinfo[j].dobj, refidx); break; } }
Re: Dependencies for partitioned indexes are still a mess
Hi, On 2020-08-12 18:29:16 -0400, Tom Lane wrote: > Andres Freund writes: > > I've attached the diff between first.sql and second.sql. Here's the > > highlights: > > As I recall, the differences in b_star etc are expected, because > pg_dump reorders that table's columns to match its inheritance parent, > which they don't to start with because of ALTER TABLE operations. Ugh. Obviously applications shouldn't use INSERT or SELECT without a target list, but that still seems somewhat nasty. I guess we could script it so that we don't compare the "original" with a restored database, but instead compare the restored version with one restored from that. But that seems likely to hide bugs. Given that pg_dump already re-orders the columns for DDL, could we make it apply that re-ordering not just during the CREATE TABLE, but also when dumping the table contents? Greetings, Andres Freund
run pgindent on a regular basis / scripted manner
Hi, When developing patches I find it fairly painful that I cannot re-indent patches with pgindent without also seeing a lot of indentation changes in unmodified parts of files. It is easy enough ([1]) to only re-indent files that I have modified, but there's often a lot of independent indentation changes in the files that I did modify. I e.g. just re-indented patch 0001 of my GetSnapshotData() series and most of the hunks were entirely unrelated, despite the development window for 14 having only relatively recently opened. Based on my experience it tends to get worse over time. Is there any reason we don't just automatically run pgindent regularly? Like once a week? And also update typedefs.list automatically, while we're at it? Currently the yearly pgindent runs are somewhat painful for patches that didn't make it into the release, but if we were to reindent on a more regular basis, that should be much less the case. It'd also help newer developers who we sometimes tell to use pgindent - which doesn't really work. Greetings, Andres Freund [1] ./src/tools/pgindent/pgindent $(git diff-tree --no-commit-id --name-only -r upstream/master..HEAD|grep -v src/test|grep -v README|grep -v typedefs.list)
Re: Dependencies for partitioned indexes are still a mess
Andres Freund writes: > I've attached the diff between first.sql and second.sql. Here's the > highlights: As I recall, the differences in b_star etc are expected, because pg_dump reorders that table's columns to match its inheritance parent, which they don't to start with because of ALTER TABLE operations. I'm pretty sure we set it up that way deliberately ages ago, because pg_dump used to have bugs when contending with such cases. Not sure about a good way to mechanize recognizing that these diffs are expected. Dunno about test_type_diff2, but it might be a newer instance of the same thing. regards, tom lane
Re: Parallel query hangs after a smart shutdown is issued
Thomas Munro writes: > I think we also need: > + else if (Shutdown <= SmartShutdown && > +backend_type == BACKEND_TYPE_AUTOVAC) > + result = CAC_OK; Hm, ok. > Retesting the original complaint, I think we need: > @@ -5911,11 +5912,11 @@ bgworker_should_start_now(BgWorkerStartTime > start_time) > + case PM_WAIT_READONLY: > + case PM_WAIT_CLIENTS: > case PM_RUN: So the question here is whether time-based bgworkers should be allowed to restart in this scenario. I'm not quite sure --- depending on what the bgworker's purpose is, you could make an argument either way, I think. Do we need some way to control that? In any case, we'd want to treat PM_WAIT_READONLY like PM_HOT_STANDBY not PM_RUN, no? Also, the state before PM_WAIT_READONLY could have been PM_RECOVERY or PM_STARTUP, in which case we don't really want to think it's like PM_HOT_STANDBY either; only the BgWorkerStart_PostmasterStart case should be accepted. That suggests that we need yet another pmState, or else a more thoroughgoing refactoring of how the postmaster's state is represented. regards, tom lane
Re: Dependencies for partitioned indexes are still a mess
Hi, On 2020-07-15 15:52:03 -0400, Tom Lane wrote: > I've been experimenting with trying to dump-and-restore the > regression database, which is a test case that for some reason > we don't cover in the buildfarm (pg_upgrade is not the same thing). Yea, we really should have that. IIRC I was trying to add that, and tests that compare dumps from primary / standby, and failed due to some differences that were hard to fix. A quick test with pg_dumpall shows some odd differences after: 1) create new cluster 2) installcheck-parallel 3) drop table gtest30_1, gtest1_1; 4) pg_dumpall > first.sql 5) recreate cluster 6) psql < first.sql > first.sql.log 7) pg_dumpall > second.sql I've attached the diff between first.sql and second.sql. Here's the highlights: @@ -15392,9 +15392,9 @@ -- CREATE TABLE public.test_type_diff2_c1 ( +int_two smallint, int_four bigint, -int_eight bigint, -int_two smallint +int_eight bigint ) INHERITS (public.test_type_diff2); ... @@ -39194,10 +39194,10 @@ -- Data for Name: b_star; Type: TABLE DATA; Schema: public; Owner: andres -- -COPY public.b_star (class, aa, bb, a) FROM stdin; -b 3 mumble \N +COPY public.b_star (class, aa, a, bb) FROM stdin; +b 3 \N mumble b 4 \N \N -b \N bumble \N +b \N \N bumble b \N \N \N \. @@ -323780,7 +323780,7 @@ -- Data for Name: renamecolumnanother; Type: TABLE DATA; Schema: public; Owner: andres -- -COPY public.renamecolumnanother (d, a, c, w) FROM stdin; +COPY public.renamecolumnanother (d, w, a, c) FROM stdin; \. The primary / standby differences are caused by sequence logging. I wonder if there's some good way to hide those, or to force them to be the same between primary / standby, without hiding bugs. 
Greetings, Andres Freund --- /tmp/first.sql 2020-08-12 15:01:11.810862861 -0700 +++ /tmp/second.sql 2020-08-12 15:02:05.877709572 -0700 @@ -15392,9 +15392,9 @@ -- CREATE TABLE public.test_type_diff2_c1 ( +int_two smallint, int_four bigint, -int_eight bigint, -int_two smallint +int_eight bigint ) INHERITS (public.test_type_diff2); @@ -15406,9 +15406,9 @@ -- CREATE TABLE public.test_type_diff2_c2 ( -int_eight bigint, int_two smallint, -int_four bigint +int_four bigint, +int_eight bigint ) INHERITS (public.test_type_diff2); @@ -39194,10 +39194,10 @@ -- Data for Name: b_star; Type: TABLE DATA; Schema: public; Owner: andres -- -COPY public.b_star (class, aa, bb, a) FROM stdin; -b 3 mumble \N +COPY public.b_star (class, aa, a, bb) FROM stdin; +b 3 \N mumble b 4 \N \N -b \N bumble \N +b \N \N bumble b \N \N \N \. @@ -91102,10 +91102,10 @@ -- Data for Name: c_star; Type: TABLE DATA; Schema: public; Owner: andres -- -COPY public.c_star (class, aa, cc, a) FROM stdin; -c 5 hi mom \N +COPY public.c_star (class, aa, a, cc) FROM stdin; +c 5 \N hi mom c 6 \N \N -c \N hi paul \N +c \N \N hi paul c \N \N \N \. 
@@ -91381,22 +91381,22 @@ -- Data for Name: d_star; Type: TABLE DATA; Schema: public; Owner: andres -- -COPY public.d_star (class, aa, bb, cc, dd, a) FROM stdin; -d 7 grumble hi sunita 0 \N -d 8 stumble hi koko \N \N -d 9 rumble \N 1.1 \N -d 10 \N hi kristin 10.01 \N -d \N crumble hi boris 100.001 \N -d 11 fumble \N \N \N -d 12 \N hi avi \N \N -d 13 \N \N 1000.0001 \N -d \N tumble hi andrew \N \N -d \N humble \N 1.1 \N -d \N \N hi ginger 10.01 \N +COPY public.d_star (class, aa, a, bb, cc, dd) FROM stdin; +d 7 \N grumble hi sunita 0 +d 8 \N stumble hi koko \N +d 9 \N rumble \N 1.1 +d 10 \N \N hi kristin 10.01 +d \N \N crumble hi boris 100.001 +d 11 \N fumble \N \N +d 12 \N \N hi avi \N +d 13 \N \N \N 1000.0001 +d \N \N tumble hi andrew \N +d \N \N humble \N 1.1 +d \N \N \N hi ginger 10.01 d 14 \N \N \N \N -d \N jumble \N \N \N -d \N \N hi jolly \N \N -d \N \N \N 100.001 \N +d \N \N jumble \N \N +d \N \N \N hi jolly \N +d \N \N \N \N 100.001 d \N \N \N \N \N \. @@ -103095,14 +103095,14 @@ -- Data for Name: e_star; Type: TABLE DATA; Schema: public; Owner: andres -- -COPY public.e_star (class, aa, cc, ee, e, a) FROM stdin; -e 15 hi carol -1 \N \N -e 16 hi bob \N \N \N -e 17 \N -2 \N \N -e \N hi michelle -3 \N \N +COPY public.e_star (class, aa, a, cc, ee, e) FROM stdin; +e 15 \N hi carol -1 \N +e 16 \N hi bob \N \N +e 17 \N \N -2 \N +e \N \N hi michelle -3 \N e 18 \N \N \N \N -e \N hi elisa \N \N \N -e \N \N -4 \N \N +e \N \N hi elisa \N \N +e \N \N \N -4 \N \. @@ -103174,23 +103174,23 @@ -- Data for Name: f_star; Type: TABLE DATA; Schema: public; Owner: andres -- -COPY public.f_star (class, aa, cc, ee, ff, f, e, a) FROM stdin; -f 19 hi claire -5 ((1,3),(2,4)) 10 \N \N -f 20 hi mike -6 \N 10 \N \N -f 21 hi marcel \N ((11,44),(22,55),(33,66)) 10 \N \N -f 22 \N -7 ((111,555),(222,666),(333,777),(444,888)) 10 \N \N -f \N hi keith -8 ((,),(,)) 10 \N \N -f 24 hi marc \N \N 10 \N \N
Re: Dependencies for partitioned indexes are still a mess
On 2020-Jul-15, Tom Lane wrote: > Issue #1: "--clean" does not work > > 1. createdb r2 > 2. pg_restore -d r2 regression.dump > 3. pg_restore --clean -d r2 regression.dump > > pg_restore: while PROCESSING TOC: > pg_restore: from TOC entry 6606; 1259 35458 INDEX idxpart32_a_idx postgres > pg_restore: error: could not execute query: ERROR: cannot drop index > public.idxpart32_a_idx because index public.idxpart3_a_idx requires it > HINT: You can drop index public.idxpart3_a_idx instead. > Command was: DROP INDEX public.idxpart32_a_idx; I think this problem is just that we're trying to drop a partition index that's not droppable. This seems fixed with just leaving the dropStmt empty, as in the attached. One issue is that if you previously restored only that particular partition and its indexes, but not the ATTACH command that would make it dependent on the parent index, there would not be a DROP command to get rid of it. Do we need to be concerned about that case? I'm inclined to think not. > (There seem to be some other problems as well, but most of the 54 complaints > are related to partitioned indexes/constraints.) In my run of it there's a good dozen remaining problems, all alike: we do DROP TYPE widget CASCADE (which works) followed by DROP FUNCTION public.widget_out(widget), which fails complaining that type widget doesn't exist. But in reality the error is innocuous, since that function was dropped by the DROP TYPE CASCADE anyway. You could say that the same thing is happening with these noisy DROP INDEX of index partitions: the complaints are right in that each partition's DROP INDEX command doesn't actually work, but the indexes are dropped later anyway, so the effect is the same. > Issue #2: parallel restore does not work Looking. 
-- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >From 8334445705c53bb0abff407ebb92ac67975a5898 Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Wed, 12 Aug 2020 17:36:37 -0400 Subject: [PATCH] Don't dump DROP stmts for index partitions They would fail to run in --clean mode anyway --- src/bin/pg_dump/pg_dump.c | 14 +- src/bin/pg_dump/pg_dump.h | 2 +- 2 files changed, 10 insertions(+), 6 deletions(-) diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index 9c8436dde6..42391b0b2c 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -16416,7 +16416,8 @@ dumpIndex(Archive *fout, IndxInfo *indxinfo) qindxname); } - appendPQExpBuffer(delq, "DROP INDEX %s;\n", qqindxname); + if (indxinfo->parentidx == InvalidOid) + appendPQExpBuffer(delq, "DROP INDEX %s;\n", qqindxname); if (indxinfo->dobj.dump & DUMP_COMPONENT_DEFINITION) ArchiveEntry(fout, indxinfo->dobj.catId, indxinfo->dobj.dumpId, @@ -16695,10 +16696,13 @@ dumpConstraint(Archive *fout, ConstraintInfo *coninfo) "pg_catalog.pg_class", "INDEX", fmtQualifiedDumpable(indxinfo)); - appendPQExpBuffer(delq, "ALTER %sTABLE ONLY %s ", foreign, - fmtQualifiedDumpable(tbinfo)); - appendPQExpBuffer(delq, "DROP CONSTRAINT %s;\n", - fmtId(coninfo->dobj.name)); + if (indxinfo->parentidx == InvalidOid) + { + appendPQExpBuffer(delq, "ALTER %sTABLE ONLY %s ", foreign, + fmtQualifiedDumpable(tbinfo)); + appendPQExpBuffer(delq, "DROP CONSTRAINT %s;\n", + fmtId(coninfo->dobj.name)); + } tag = psprintf("%s %s", tbinfo->dobj.name, coninfo->dobj.name); diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h index da97b731b1..2f051b83d9 100644 --- a/src/bin/pg_dump/pg_dump.h +++ b/src/bin/pg_dump/pg_dump.h @@ -368,7 +368,7 @@ typedef struct _indxInfo * contains both key and nonkey attributes */ bool indisclustered; bool indisreplident; - Oid parentidx; /* if partitioned, parent index OID */ + Oid parentidx; /* if a partition, 
parent index OID */ SimplePtrList partattaches; /* if partitioned, partition attach objects */ /* if there is an associated constraint object, its dumpId: */ -- 2.20.1
Re: Parallel query hangs after a smart shutdown is issued
On Thu, Aug 13, 2020 at 8:59 AM Tom Lane wrote: > I wrote: > > Oh, excellent point! I'd not thought to look at tests of the Shutdown > > variable, but yeah, those should be <= SmartShutdown if we want autovac > > to continue to operate in this state. > > On looking closer, there's another problem: setting start_autovac_launcher > isn't enough to get the AV launcher to run, because ServerLoop() won't > launch it except in PM_RUN state. Likewise, the other "relaunch a dead > process" checks in ServerLoop() need to be generalized to support > relaunching background processes while we're waiting out the foreground > clients. So that leads me to the attached v3. I had to re-instantiate > PM_WAIT_READONLY as an alternate state to PM_WAIT_CLIENTS; these states > are about the same so far as PostmasterStateMachine is concerned, but > some of the should-we-launch-FOO checks care about the difference. I think we also need: @@ -2459,6 +2459,9 @@ canAcceptConnections(int backend_type) { if (pmState == PM_WAIT_BACKUP) result = CAC_WAITBACKUP;/* allow superusers only */ + else if (Shutdown <= SmartShutdown && +backend_type == BACKEND_TYPE_AUTOVAC) + result = CAC_OK; else if (Shutdown > NoShutdown) return CAC_SHUTDOWN;/* shutdown is pending */ else if (!FatalError && Retesting the original complaint, I think we need: @@ -5911,11 +5912,11 @@ bgworker_should_start_now(BgWorkerStartTime start_time) case PM_SHUTDOWN_2: case PM_SHUTDOWN: case PM_WAIT_BACKENDS: - case PM_WAIT_READONLY: - case PM_WAIT_CLIENTS: case PM_WAIT_BACKUP: break; + case PM_WAIT_READONLY: + case PM_WAIT_CLIENTS: case PM_RUN: if (start_time == BgWorkerStart_RecoveryFinished) return true;
Re: Parallel query hangs after a smart shutdown is issued
I wrote: > Oh, excellent point! I'd not thought to look at tests of the Shutdown > variable, but yeah, those should be <= SmartShutdown if we want autovac > to continue to operate in this state. On looking closer, there's another problem: setting start_autovac_launcher isn't enough to get the AV launcher to run, because ServerLoop() won't launch it except in PM_RUN state. Likewise, the other "relaunch a dead process" checks in ServerLoop() need to be generalized to support relaunching background processes while we're waiting out the foreground clients. So that leads me to the attached v3. I had to re-instantiate PM_WAIT_READONLY as an alternate state to PM_WAIT_CLIENTS; these states are about the same so far as PostmasterStateMachine is concerned, but some of the should-we-launch-FOO checks care about the difference. The various pmState tests are getting messy enough to cry out for refactorization, but I've not attempted that here. There's enough variance in the conditions for launching different subprocesses that I'm not very sure what would be a nicer-looking way to write them. regards, tom lane diff --git a/doc/src/sgml/ref/pg_ctl-ref.sgml b/doc/src/sgml/ref/pg_ctl-ref.sgml index e31275a04e..3946fa52ea 100644 --- a/doc/src/sgml/ref/pg_ctl-ref.sgml +++ b/doc/src/sgml/ref/pg_ctl-ref.sgml @@ -185,8 +185,8 @@ PostgreSQL documentation stop mode shuts down the server that is running in the specified data directory. Three different shutdown methods can be selected with the -m - option. Smart mode waits for all active - clients to disconnect and any online backup to finish. + option. Smart mode disallows new connections, then waits + for all existing clients to disconnect and any online backup to finish. If the server is in hot standby, recovery and streaming replication will be terminated once all clients have disconnected. 
Fast mode (the default) does not wait for clients to disconnect and diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index 38e2c16ac2..d134dade53 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -148,8 +148,6 @@ #define BACKEND_TYPE_BGWORKER 0x0008 /* bgworker process */ #define BACKEND_TYPE_ALL 0x000F /* OR of all the above */ -#define BACKEND_TYPE_WORKER (BACKEND_TYPE_AUTOVAC | BACKEND_TYPE_BGWORKER) - /* * List of active backends (or child processes anyway; we don't actually * know whether a given child has become a backend or is still in the @@ -319,10 +317,10 @@ static bool FatalError = false; /* T if recovering from backend crash */ * * Notice that this state variable does not distinguish *why* we entered * states later than PM_RUN --- Shutdown and FatalError must be consulted - * to find that out. FatalError is never true in PM_RECOVERY_* or PM_RUN - * states, nor in PM_SHUTDOWN states (because we don't enter those states - * when trying to recover from a crash). It can be true in PM_STARTUP state, - * because we don't clear it until we've successfully started WAL redo. + * to find that out. FatalError is never true in PM_RECOVERY, PM_HOT_STANDBY, + * or PM_RUN states, nor in PM_SHUTDOWN states (because we don't enter those + * states when trying to recover from a crash). It can be true in PM_STARTUP + * state, because we don't clear it until we've successfully started WAL redo. 
*/ typedef enum { @@ -332,8 +330,9 @@ typedef enum PM_HOT_STANDBY,/* in hot standby mode */ PM_RUN, /* normal "database is alive" state */ PM_WAIT_BACKUP,/* waiting for online backup mode to end */ - PM_WAIT_READONLY, /* waiting for read only backends to exit */ - PM_WAIT_BACKENDS, /* waiting for live backends to exit */ + PM_WAIT_CLIENTS, /* waiting for normal backends to exit */ + PM_WAIT_READONLY, /* likewise, when we had been in a RO state */ + PM_WAIT_BACKENDS, /* waiting for all backends to exit */ PM_SHUTDOWN,/* waiting for checkpointer to do shutdown * ckpt */ PM_SHUTDOWN_2,/* waiting for archiver and walsenders to @@ -437,9 +436,10 @@ static void InitPostmasterDeathWatchHandle(void); * even during recovery. */ #define PgArchStartupAllowed() \ - ((XLogArchivingActive() && pmState == PM_RUN) || \ + ((XLogArchivingActive() && \ + (pmState == PM_RUN || pmState == PM_WAIT_BACKUP || pmState == PM_WAIT_CLIENTS)) || \ (XLogArchivingAlways() && \ - (pmState == PM_RECOVERY || pmState == PM_HOT_STANDBY))) + (pmState == PM_RECOVERY || pmState == PM_HOT_STANDBY || pmState == PM_WAIT_READONLY))) #ifdef EXEC_BACKEND @@ -1750,7 +1750,8 @@ ServerLoop(void) * fails, we'll just try again later. Likewise for the checkpointer. */ if (pmState == PM_RUN || pmState == PM_RECOVERY || - pmState == PM_HOT_STANDBY) + pmState == PM_HOT_STANDBY || pmState == PM_WAIT_BACKUP || + pmState == PM_WAIT_CLIENTS || pmState == PM_WAIT_READONLY) { if (CheckpointerPID == 0)
Re: [BUG] Error in BRIN summarization
On 2020-Aug-12, Alvaro Herrera wrote: > 'anyvisible' mode is not required AFAICS; reading the code, I think this > could also hit REINDEX CONCURRENTLY and CREATE INDEX CONCURRENTLY, which > do not use that flag. I didn't try to reproduce it there, though. > Anyway, I'm going to remove that Assert() I added. So this is what I propose as the final form of the fix. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >From e0abe5d957155285980a40fb33c192100699e8c0 Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Wed, 12 Aug 2020 14:02:58 -0400 Subject: [PATCH v4] fix HOT tuples while scanning for index builds --- src/backend/access/heap/heapam_handler.c | 20 src/backend/access/heap/pruneheap.c | 5 +++-- 2 files changed, 23 insertions(+), 2 deletions(-) diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c index 267a6ee25a..ba44e30035 100644 --- a/src/backend/access/heap/heapam_handler.c +++ b/src/backend/access/heap/heapam_handler.c @@ -1324,6 +1324,12 @@ heapam_index_build_range_scan(Relation heapRelation, * buffer continuously while visiting the page, so no pruning * operation can occur either. * + * In cases with only ShareUpdateExclusiveLock on the table, it's + * possible for some HOT tuples to appear that we didn't know about + * when we first read the page. To handle that case, we re-obtain the + * list of root offsets when a HOT tuple points to a root item that we + * don't know about. + * * Also, although our opinions about tuple liveness could change while * we scan the page (due to concurrent transaction commits/aborts), * the chain root locations won't, so this info doesn't need to be @@ -1625,6 +1631,20 @@ heapam_index_build_range_scan(Relation heapRelation, offnum = ItemPointerGetOffsetNumber(>t_self); + /* + * If a HOT tuple points to a root that we don't know + * about, obtain root items afresh. If that still fails, + * report it as corruption. 
+ */ + if (root_offsets[offnum - 1] == InvalidOffsetNumber) + { +Page page = BufferGetPage(hscan->rs_cbuf); + +LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_SHARE); +heap_get_root_tuples(page, root_offsets); +LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_UNLOCK); + } + if (!OffsetNumberIsValid(root_offsets[offnum - 1])) ereport(ERROR, (errcode(ERRCODE_DATA_CORRUPTED), diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c index 256df4de10..7e3d44dfd6 100644 --- a/src/backend/access/heap/pruneheap.c +++ b/src/backend/access/heap/pruneheap.c @@ -732,7 +732,7 @@ heap_page_prune_execute(Buffer buffer, * root_offsets[k - 1] = j. * * The passed-in root_offsets array must have MaxHeapTuplesPerPage entries. - * We zero out all unused entries. + * Unused entries are filled with InvalidOffsetNumber (zero). * * The function must be called with at least share lock on the buffer, to * prevent concurrent prune operations. @@ -747,7 +747,8 @@ heap_get_root_tuples(Page page, OffsetNumber *root_offsets) OffsetNumber offnum, maxoff; - MemSet(root_offsets, 0, MaxHeapTuplesPerPage * sizeof(OffsetNumber)); + MemSet(root_offsets, InvalidOffsetNumber, + MaxHeapTuplesPerPage * sizeof(OffsetNumber)); maxoff = PageGetMaxOffsetNumber(page); for (offnum = FirstOffsetNumber; offnum <= maxoff; offnum = OffsetNumberNext(offnum)) -- 2.20.1
Re: use pg_get_functiondef() in pg_dump
On Wed, Aug 12, 2020 at 4:25 AM Peter Eisentraut wrote: > Here is a patch to have pg_dump use pg_get_functiondef() instead of > assembling the CREATE FUNCTION/PROCEDURE commands itself. This should > save on maintenance effort in the future. It's also a prerequisite for > being able to dump functions with SQL-standard function body discussed > in [0]. > > pg_get_functiondef() was meant for psql's \ef command, so its defaults > are slightly different from what pg_dump would like, so this adds a few > optional parameters for tweaking the behavior. The naming of the > parameters is up for discussion. One problem with this, which I think Tom pointed out before, is that it might make it hard to handle some forward-compatibility problems. In other words, if something that the server is generating needs to be modified for compatibility with a future release, it's not easy to do that. Like if we needed to quote something we weren't previously quoting, for example. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Parallel query hangs after a smart shutdown is issued
Thomas Munro writes: > On Thu, Aug 13, 2020 at 6:00 AM Tom Lane wrote: >> One other thing I changed here was to remove PM_WAIT_READONLY from the >> set of states in which we'll allow promotion to occur or a new walreceiver >> to start. I'm not convinced that either of those behaviors aren't >> bugs; although if someone thinks they're right, we can certainly put >> back PM_WAIT_CLIENTS in those checks. (But, for example, it does not >> appear possible to reach PM_WAIT_READONLY/PM_WAIT_CLIENTS state with >> Shutdown == NoShutdown, so the test in MaybeStartWalReceiver sure looks >> like confusingly dead code to me. If we do want to allow restarting >> the walreceiver in this state, the Shutdown condition needs fixed.) > If a walreceiver is allowed to run, why should it not be allowed to > restart? I'd come to about the same conclusion after thinking more, so v2 attached undoes that change. I think putting off promotion is fine though; it'll get handled at the next postmaster start. (It looks like the state machine would just proceed to exit anyway if we allowed the promotion, but that's a hard-to-test state transition that we could do without.) > Yeah, I suppose that other test'd need to be Shutdown <= > SmartShutdown, just like we do in SIGHUP_handler(). Looking at other > places where we test Shutdown == NoShutdown, one that jumps out is the > autovacuum wraparound defence stuff and the nearby > PMSIGNAL_START_AUTOVAC_WORKER code. Oh, excellent point! I'd not thought to look at tests of the Shutdown variable, but yeah, those should be <= SmartShutdown if we want autovac to continue to operate in this state. I also noticed that where reaper() is dealing with startup process exit(3), it unconditionally sets Shutdown = SmartShutdown which seems pretty bogus; that variable's value should never be allowed to decrease, but this could cause it. 
In the attached I did StartupStatus = STARTUP_NOT_RUNNING; -Shutdown = SmartShutdown; +Shutdown = Max(Shutdown, SmartShutdown); TerminateChildren(SIGTERM); But given that it's forcing immediate termination of all backends, I wonder if that's not more like a FastShutdown? (Scary here is that the coverage report shows we're not testing this path, so who knows if it works at all.) regards, tom lane diff --git a/doc/src/sgml/ref/pg_ctl-ref.sgml b/doc/src/sgml/ref/pg_ctl-ref.sgml index e31275a04e..3946fa52ea 100644 --- a/doc/src/sgml/ref/pg_ctl-ref.sgml +++ b/doc/src/sgml/ref/pg_ctl-ref.sgml @@ -185,8 +185,8 @@ PostgreSQL documentation stop mode shuts down the server that is running in the specified data directory. Three different shutdown methods can be selected with the -m - option. Smart mode waits for all active - clients to disconnect and any online backup to finish. + option. Smart mode disallows new connections, then waits + for all existing clients to disconnect and any online backup to finish. If the server is in hot standby, recovery and streaming replication will be terminated once all clients have disconnected. 
Fast mode (the default) does not wait for clients to disconnect and diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index 38e2c16ac2..e8ad4b67a3 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -148,8 +148,6 @@ #define BACKEND_TYPE_BGWORKER 0x0008 /* bgworker process */ #define BACKEND_TYPE_ALL 0x000F /* OR of all the above */ -#define BACKEND_TYPE_WORKER (BACKEND_TYPE_AUTOVAC | BACKEND_TYPE_BGWORKER) - /* * List of active backends (or child processes anyway; we don't actually * know whether a given child has become a backend or is still in the @@ -319,7 +317,7 @@ static bool FatalError = false; /* T if recovering from backend crash */ * * Notice that this state variable does not distinguish *why* we entered * states later than PM_RUN --- Shutdown and FatalError must be consulted - * to find that out. FatalError is never true in PM_RECOVERY_* or PM_RUN + * to find that out. FatalError is never true in PM_RECOVERY or PM_RUN * states, nor in PM_SHUTDOWN states (because we don't enter those states * when trying to recover from a crash). It can be true in PM_STARTUP state, * because we don't clear it until we've successfully started WAL redo. @@ -332,8 +330,8 @@ typedef enum PM_HOT_STANDBY,/* in hot standby mode */ PM_RUN, /* normal "database is alive" state */ PM_WAIT_BACKUP,/* waiting for online backup mode to end */ - PM_WAIT_READONLY, /* waiting for read only backends to exit */ - PM_WAIT_BACKENDS, /* waiting for live backends to exit */ + PM_WAIT_CLIENTS, /* waiting for normal backends to exit */ + PM_WAIT_BACKENDS, /* waiting for all backends to exit */ PM_SHUTDOWN,/* waiting for checkpointer to do shutdown * ckpt */ PM_SHUTDOWN_2,/* waiting for archiver and
Re: Parallel query hangs after a smart shutdown is issued
On Thu, Aug 13, 2020 at 6:00 AM Tom Lane wrote: > Thomas Munro writes: > > On Thu, Aug 13, 2020 at 3:32 AM Bharath Rupireddy > > wrote: > >> After a smart shutdown is issued(with pg_ctl), run a parallel query, > >> then the query hangs. > > > Yeah, the current situation is not good. I think your option 2 sounds > > better, because the documented behaviour of smart shutdown is that it > > "lets existing sessions end their work normally". I think that means > > that a query that is already running or allowed to start should be > > able to start new workers and not have its existing workers > > terminated. Arseny Sher wrote a couple of different patches to try > > that last year, but they fell through the cracks: > > https://www.postgresql.org/message-id/flat/CA%2BhUKGLrJij0BuFtHsMHT4QnLP54Z3S6vGVBCWR8A49%2BNzctCw%40mail.gmail.com > > I already commented on this in the other thread that Bharath started [1]. > I think the real issue here is why is the postmaster's SIGTERM handler > doing *anything* other than disallowing new connections? It seems quite > premature to kill support processes of any sort, not only parallel > workers. The documentation says existing clients are allowed to end > their work, not that their performance is going to be crippled until > they end. Right. It's pretty strange that during smart shutdown, you could run for hours with no autovacuum, walwriter, bgwriter. I guess Arseny and I were looking for a minimal change to fix a bug, but clearly there's a more general problem and this change works out cleaner anyway. > So I looked at moving the kills of all the support processes to happen > after we detect that there are no remaining regular backends, and it > seems to not be too hard. I realized that the existing PM_WAIT_READONLY > state is doing that already, but just for a subset of support processes > that it thinks might be active in hot standby mode. 
So what I did in the > attached was to repurpose that state as "PM_WAIT_CLIENTS", which does the > right thing in either regular or hot standby mode. Make sense, works as expected and passes check-world. > One other thing I changed here was to remove PM_WAIT_READONLY from the > set of states in which we'll allow promotion to occur or a new walreceiver > to start. I'm not convinced that either of those behaviors aren't > bugs; although if someone thinks they're right, we can certainly put > back PM_WAIT_CLIENTS in those checks. (But, for example, it does not > appear possible to reach PM_WAIT_READONLY/PM_WAIT_CLIENTS state with > Shutdown == NoShutdown, so the test in MaybeStartWalReceiver sure looks > like confusingly dead code to me. If we do want to allow restarting > the walreceiver in this state, the Shutdown condition needs fixed.) If a walreceiver is allowed to run, why should it not be allowed to restart? Yeah, I suppose that other test'd need to be Shutdown <= SmartShutdown, just like we do in SIGHUP_handler(). Looking at other places where we test Shutdown == NoShutdown, one that jumps out is the autovacuum wraparound defence stuff and the nearby PMSIGNAL_START_AUTOVAC_WORKER code.
ltree_plpython test failure on Cygwin for 12.4
I am finally trying to move from python 2.7 to python 3.x, for both 3.7 and 3.8. I have (attached log):

2020-08-12 18:35:47.433 CEST [10418] pg_regress/python3/ltree_plpython ERROR: incompatible library "/pub/devel/postgresql/prova38/postgresql-12.4-1.x86_64/build/tmp_install/usr/lib/postgresql/plpython3.dll": missing magic block
2020-08-12 18:35:47.433 CEST [10418] pg_regress/python3/ltree_plpython HINT: Extension libraries are required to use the PG_MODULE_MAGIC macro.
2020-08-12 18:35:47.433 CEST [10418] pg_regress/python3/ltree_plpython STATEMENT: CREATE EXTENSION ltree_plpython3u CASCADE;
2020-08-12 18:35:47.433 CEST [10418] pg_regress/python3/ltree_plpython ERROR: language "plpython3u" does not exist
2020-08-12 18:35:47.433 CEST [10418] pg_regress/python3/ltree_plpython HINT: Use CREATE EXTENSION to load the language into the database.
2020-08-12 18:35:47.433 CEST [10418] pg_regress/python3/ltree_plpython STATEMENT: CREATE FUNCTION test1(val ltree) RETURNS int LANGUAGE plpython3u TRANSFORM FOR TYPE ltree AS $$ plpy.info(repr(val)) return len(val) $$;

Only the python tests fail:

$ grep FAIL postgresql-12.4-1-check.log
test python3/hstore_plpython ... FAILED 423 ms
test python3/jsonb_plpython ... FAILED 172 ms
test python3/ltree_plpython ... FAILED 163 ms

I never had problems with python 2.7. Suggestions?
Regards
Marco

2020-08-12 18:35:43.891 CEST [10397] LOG: starting PostgreSQL 12.4 on x86_64-pc-cygwin, compiled by gcc (GCC) 9.3.0, 64-bit
2020-08-12 18:35:43.905 CEST [10397] LOG: listening on Unix socket "/tmp/pg_regress-RPb7cl/.s.PGSQL.54468"
2020-08-12 18:35:43.987 CEST [10400] LOG: database system was shut down at 2020-08-12 18:35:43 CEST
2020-08-12 18:35:44.017 CEST [10401] [unknown] FATAL: the database system is starting up
2020-08-12 18:35:44.192 CEST [10397] LOG: database system is ready to accept connections
2020-08-12 18:35:45.366 CEST [10402] LOG: checkpoint starting: immediate force wait flush-all
2020-08-12 18:35:45.370 CEST [10402] LOG: checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.001 s, sync=0.000 s, total=0.004 s; sync files=0, longest=0.000 s, average=0.000 s; distance=1 kB, estimate=1 kB
2020-08-12 18:35:46.503 CEST [10402] LOG: checkpoint starting: immediate force wait
2020-08-12 18:35:46.507 CEST [10402] LOG: checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.000 s, sync=0.000 s, total=0.003 s; sync files=0, longest=0.000 s, average=0.000 s; distance=0 kB, estimate=1 kB
2020-08-12 18:35:47.433 CEST [10418] pg_regress/python3/ltree_plpython ERROR: incompatible library "/pub/devel/postgresql/prova38/postgresql-12.4-1.x86_64/build/tmp_install/usr/lib/postgresql/plpython3.dll": missing magic block
2020-08-12 18:35:47.433 CEST [10418] pg_regress/python3/ltree_plpython HINT: Extension libraries are required to use the PG_MODULE_MAGIC macro.
2020-08-12 18:35:47.433 CEST [10418] pg_regress/python3/ltree_plpython STATEMENT: CREATE EXTENSION ltree_plpython3u CASCADE;
2020-08-12 18:35:47.433 CEST [10418] pg_regress/python3/ltree_plpython ERROR: language "plpython3u" does not exist
2020-08-12 18:35:47.433 CEST [10418] pg_regress/python3/ltree_plpython HINT: Use CREATE EXTENSION to load the language into the database.
2020-08-12 18:35:47.433 CEST [10418] pg_regress/python3/ltree_plpython STATEMENT: CREATE FUNCTION test1(val ltree) RETURNS int LANGUAGE plpython3u TRANSFORM FOR TYPE ltree AS $$ plpy.info(repr(val)) return len(val) $$;
2020-08-12 18:35:47.434 CEST [10418] pg_regress/python3/ltree_plpython ERROR: function test1(ltree) does not exist at character 8
2020-08-12 18:35:47.434 CEST [10418] pg_regress/python3/ltree_plpython HINT: No function matches the given name and argument types. You might need to add explicit type casts.
2020-08-12 18:35:47.434 CEST [10418] pg_regress/python3/ltree_plpython STATEMENT: SELECT test1('aa.bb.cc'::ltree);
2020-08-12 18:35:47.435 CEST [10418] pg_regress/python3/ltree_plpython ERROR: language "plpython3u" does not exist
2020-08-12 18:35:47.435 CEST [10418] pg_regress/python3/ltree_plpython HINT: Use CREATE EXTENSION to load the language into the database.
2020-08-12 18:35:47.435 CEST [10418] pg_regress/python3/ltree_plpython STATEMENT: CREATE FUNCTION test1n(val ltree) RETURNS int LANGUAGE plpython3u TRANSFORM FOR TYPE ltree AS $$ plpy.info(repr(val)) return len(val) $$;
2020-08-12 18:35:47.435 CEST [10418] pg_regress/python3/ltree_plpython ERROR: function test1n(ltree) does not exist at character 8
2020-08-12 18:35:47.435 CEST [10418] pg_regress/python3/ltree_plpython HINT: No function matches the given name and argument types. You might need to add explicit type casts.
2020-08-12 18:35:47.435 CEST [10418]
Re: [BUG] Error in BRIN summarization
On 2020-Aug-11, Alvaro Herrera wrote: > A much more troubling thought is what happens if the range is > desummarized, then the index item is used for the summary of a different > range. Then the index might end up returning corrupt results. Actually, this is not a concern because the brin tuple's bt_blkno is rechecked before returning it, and if it doesn't match what we're searching for, the loop is restarted. It becomes an infinite loop problem if the revmap is pointing to a tuple that's labelled with a different range's blkno. So I think my patch as posted is a sufficient fix for this problem. -- Álvaro Herrera  https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Parallel query hangs after a smart shutdown is issued
Thomas Munro writes: > On Thu, Aug 13, 2020 at 3:32 AM Bharath Rupireddy > wrote: >> After a smart shutdown is issued(with pg_ctl), run a parallel query, >> then the query hangs. > Yeah, the current situation is not good. I think your option 2 sounds > better, because the documented behaviour of smart shutdown is that it > "lets existing sessions end their work normally". I think that means > that a query that is already running or allowed to start should be > able to start new workers and not have its existing workers > terminated. Arseny Sher wrote a couple of different patches to try > that last year, but they fell through the cracks: > https://www.postgresql.org/message-id/flat/CA%2BhUKGLrJij0BuFtHsMHT4QnLP54Z3S6vGVBCWR8A49%2BNzctCw%40mail.gmail.com I already commented on this in the other thread that Bharath started [1]. I think the real issue here is why is the postmaster's SIGTERM handler doing *anything* other than disallowing new connections? It seems quite premature to kill support processes of any sort, not only parallel workers. The documentation says existing clients are allowed to end their work, not that their performance is going to be crippled until they end. So I looked at moving the kills of all the support processes to happen after we detect that there are no remaining regular backends, and it seems to not be too hard. I realized that the existing PM_WAIT_READONLY state is doing that already, but just for a subset of support processes that it thinks might be active in hot standby mode. So what I did in the attached was to repurpose that state as "PM_WAIT_CLIENTS", which does the right thing in either regular or hot standby mode. One other thing I changed here was to remove PM_WAIT_READONLY from the set of states in which we'll allow promotion to occur or a new walreceiver to start. I'm not convinced that either of those behaviors aren't bugs; although if someone thinks they're right, we can certainly put back PM_WAIT_CLIENTS in those checks. 
(But, for example, it does not appear possible to reach PM_WAIT_READONLY/PM_WAIT_CLIENTS state with Shutdown == NoShutdown, so the test in MaybeStartWalReceiver sure looks like confusingly dead code to me. If we do want to allow restarting the walreceiver in this state, the Shutdown condition needs fixed.) regards, tom lane [1] https://www.postgresql.org/message-id/65189.1597181322%40sss.pgh.pa.us diff --git a/doc/src/sgml/ref/pg_ctl-ref.sgml b/doc/src/sgml/ref/pg_ctl-ref.sgml index e31275a04e..3946fa52ea 100644 --- a/doc/src/sgml/ref/pg_ctl-ref.sgml +++ b/doc/src/sgml/ref/pg_ctl-ref.sgml @@ -185,8 +185,8 @@ PostgreSQL documentation stop mode shuts down the server that is running in the specified data directory. Three different shutdown methods can be selected with the -m - option. Smart mode waits for all active - clients to disconnect and any online backup to finish. + option. Smart mode disallows new connections, then waits + for all existing clients to disconnect and any online backup to finish. If the server is in hot standby, recovery and streaming replication will be terminated once all clients have disconnected. 
Fast mode (the default) does not wait for clients to disconnect and diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index 38e2c16ac2..790948a4b2 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -148,8 +148,6 @@ #define BACKEND_TYPE_BGWORKER 0x0008 /* bgworker process */ #define BACKEND_TYPE_ALL 0x000F /* OR of all the above */ -#define BACKEND_TYPE_WORKER (BACKEND_TYPE_AUTOVAC | BACKEND_TYPE_BGWORKER) - /* * List of active backends (or child processes anyway; we don't actually * know whether a given child has become a backend or is still in the @@ -332,8 +330,8 @@ typedef enum PM_HOT_STANDBY,/* in hot standby mode */ PM_RUN, /* normal "database is alive" state */ PM_WAIT_BACKUP,/* waiting for online backup mode to end */ - PM_WAIT_READONLY, /* waiting for read only backends to exit */ - PM_WAIT_BACKENDS, /* waiting for live backends to exit */ + PM_WAIT_CLIENTS, /* waiting for normal backends to exit */ + PM_WAIT_BACKENDS, /* waiting for all backends to exit */ PM_SHUTDOWN,/* waiting for checkpointer to do shutdown * ckpt */ PM_SHUTDOWN_2,/* waiting for archiver and walsenders to @@ -2793,35 +2791,19 @@ pmdie(SIGNAL_ARGS) sd_notify(0, "STOPPING=1"); #endif - if (pmState == PM_RUN || pmState == PM_RECOVERY || -pmState == PM_HOT_STANDBY || pmState == PM_STARTUP) - { -/* autovac workers are told to shut down immediately */ -/* and bgworkers too; does this need tweaking? */ -SignalSomeChildren(SIGTERM, - BACKEND_TYPE_AUTOVAC | BACKEND_TYPE_BGWORKER); -/* and the autovac launcher too */ -if (AutoVacPID != 0) - signal_child(AutoVacPID, SIGTERM); -/* and the bgwriter
Re: EDB builds Postgres 13 with an obsolete ICU version
On Tue, 11 Aug 2020 at 13:45, Thomas Kellerer wrote: > > Jaime Casanova schrieb am 11.08.2020 um 20:39: > >> As a follow-up to bug #16570 [1] and other previous discussions > >> on the mailing-lists, I'm checking out PG13 beta for Windows > >> from: > >> https://www.enterprisedb.com/postgresql-early-experience > >> and it ships with the same obsolete ICU 53 that was used > >> for PG 10,11,12. > >> Besides not having the latest Unicode features and fixes, ICU 53 > >> ignores the BCP 47 tags syntax in collations used as examples > >> in Postgres documentation, which leads to confusion and > >> false bug reports. > >> The current version is ICU 67. > >> > > > > Sadly, that is managed by EDB and not by the community. > > > > You can try > > https://www.2ndquadrant.com/en/resources/postgresql-installer-2ndquadrant/ > > which uses ICU-62.2, is not the latest but should allow you to follow > > the examples in the documentation. > > > One of the reasons I prefer the EDB builds is, that they provide a ZIP file > without the installer overhead. > Any chance 2ndQuadrant can supply something like that as well? > I don't think so; an unattended install mode is the closest. -- Jaime Casanova www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Parallel query hangs after a smart shutdown is issued
On Thu, Aug 13, 2020 at 3:32 AM Bharath Rupireddy wrote: > After a smart shutdown is issued(with pg_ctl), run a parallel query, > then the query hangs. The postmaster doesn't inform backends about the > smart shutdown(see pmdie() -> SIGTERM -> BACKEND_TYPE_NORMAL are not > informed), so if they request parallel workers, the postmaster is > unable to fork any workers as it's status(pmState) gets changed to > PM_WAIT_BACKENDS(see maybe_start_bgworkers() --> > bgworker_should_start_now() returns false). > > Few ways we could solve this: > 1. Do we want to disallow parallelism when there is a pending smart > shutdown? - If yes, then, we can let the postmaster know the regular > backends whenever a smart shutdown is received and the backends use > this info to not consider parallelism. If we use SIGTERM to notify, > since the backends have die() as handlers, they just cancel the > queries which is again an inconsistent behaviour[1]. Would any other > signal like SIGUSR2(I think it's currently ignored by backends) be > used here? If the signals are overloaded, can we multiplex SIGTERM > similar to SIGUSR1? If we don't want to use signals at all, the > postmaster can make an entry of it's status in bg worker shared memory > i.e. BackgroundWorkerData, RegisterDynamicBackgroundWorker() can > simply return, without requesting the postmaster for parallel workers. > > 2. If we want to allow parallelism, then, we can tweak > bgworker_should_start_now(), detect that the pending bg worker fork > requests are for parallelism, and let the postmaster start the > workers. > > Thoughts? Hello Bharath, Yeah, the current situation is not good. I think your option 2 sounds better, because the documented behaviour of smart shutdown is that it "lets existing sessions end their work normally". I think that means that a query that is already running or allowed to start should be able to start new workers and not have its existing workers terminated. 
Arseny Sher wrote a couple of different patches to try that last year, but they fell through the cracks: https://www.postgresql.org/message-id/flat/CA%2BhUKGLrJij0BuFtHsMHT4QnLP54Z3S6vGVBCWR8A49%2BNzctCw%40mail.gmail.com
pg_stat_statements and "IN" conditions
Hi,

I would like to start another thread to follow up on [1], mostly to bump the topic. As a reminder, it's about how pg_stat_statements jumbles an ArrayExpr in queries like:

SELECT something FROM table WHERE col IN (1, 2, 3, ...)

The current implementation produces a different jumble hash for every different number of arguments, for what is essentially the same query. Unfortunately a lot of ORMs like to generate these types of queries, which in turn leads to pg_stat_statements pollution. Ideally we want to prevent this and have only one record for such a query. As a result of [1] I've identified two main approaches to improve this situation:

* Reduce the generated ArrayExpr to an array Const immediately, in cases where all the inputs are Consts.
* Make repeating Consts contribute nothing to the resulting hash.

I've tried to prototype both approaches to find out the pros and cons and be more specific. The attached patches cannot be considered completed work, but they seem to work, mostly pass the tests, and demonstrate the point. I would like to get some high-level input on them, and ideally settle which solution is preferred going forward.

# Reducing ArrayExpr to an array Const

IIUC this requires producing a Const with an ArrayType constvalue in transformAExprIn for ScalarArrayOpExpr. This could be a general improvement, since apparently it's being done later anyway. But it deals only with Consts, which leaves more on the table, e.g. Params and other similar kinds of duplication we observe when repeating constants are wrapped into VALUES. Another point is that this approach will quite possibly still require corresponding changes in pg_stat_statements, since just hiding the duplicates also loses information. Ideally we also need some understanding of how many elements are actually there, and display it, e.g. for the case where a single outlier query contains a huge IN list.
# Contribute nothing to the hash

I guess there could be multiple ways of doing this, but the first idea I had in mind is to skip jumbling when necessary. This can also be implemented in a more centralized way for different types of queries (although the attached patch covers only Const & Values). In the simplest case we just identify a sequence of constants of the same type, which ignores any cases where the content is mixed. But I believe mixed content can be considered a rare corner case, and it's better to start with the simplest solution. Having said that, I find the second approach, contributing nothing to the hash, more appealing, but I would love to hear other opinions.

[1]: https://www.postgresql.org/message-id/flat/CAF42k%3DJCfHMJtkAVXCzBn2XBxDC83xb4VhV7jU7enPnZ0CfEQQ%40mail.gmail.com

0001-Reduce-ArrayExpr-into-const-array.patch Description: Binary data
0001-Limit-jumbling-for-repeating-constants.patch Description: Binary data
Re: [BUG] Error in BRIN summarization
On 2020-Aug-11, Alvaro Herrera wrote: > I think this is more complicated than necessary. It seems easier to > solve this problem by just checking whether the given root pointer is > set to InvalidOffsetNumber, which is already done in the existing coding > of heap_get_root_tuples (only they spell it "0" rather than > InvalidOffsetNumber, which I propose to change). AFAIR this should only > happen in the 'anyvisible' mode, so I added that in an assert. 'anyvisible' mode is not required AFAICS; reading the code, I think this could also hit REINDEX CONCURRENTLY and CREATE INDEX CONCURRENTLY, which do not use that flag. I didn't try to reproduce it there, though. Anyway, I'm going to remove that Assert() I added. -- Álvaro Herrera  https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [BUG] Error in BRIN summarization
On 2020-Jul-28, Peter Geoghegan wrote: > On Mon, Jul 27, 2020 at 10:25 AM Alvaro Herrera > wrote: > > (I was also considering whether it needs to be a loop to reobtain root > > tuples, in case a concurrent transaction can create a new item while > > we're checking that item; but I don't think that can really happen for > > one individual tuple.) > > I wonder if something like that is the underlying problem in a recent > problem case involving a "REINDEX index > pg_class_tblspc_relfilenode_index" command that runs concurrently with > the regression tests: > > https://postgr.es/m/CAH2-WzmBxu4o=pmsniur+bwhqcgcmv_aolkuk6buu7nga6e...@mail.gmail.com > > We see a violation of the HOT invariant in this case, though only for > a system catalog index, and only in fairly particular circumstances > involving concurrency. Hmm. As far as I understand, the bug Anastasia reports can only hit an index build that occurs concurrently with heap updates; and that cannot happen for a regular index build, only for CREATE INDEX CONCURRENTLY and REINDEX CONCURRENTLY. So unless I'm missing something, it's not related to that other bug. -- Álvaro Herrera  https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Parallel query hangs after a smart shutdown is issued
Hi,

After a smart shutdown is issued (with pg_ctl), run a parallel query: the query hangs. The postmaster doesn't inform backends about the smart shutdown (see pmdie() -> SIGTERM: BACKEND_TYPE_NORMAL processes are not informed), so when they request parallel workers, the postmaster is unable to fork any, as its status (pmState) has changed to PM_WAIT_BACKENDS (see maybe_start_bgworkers() --> bgworker_should_start_now(), which returns false).

A few ways we could solve this:

1. Do we want to disallow parallelism when there is a pending smart shutdown? If yes, we can have the postmaster notify the regular backends whenever a smart shutdown is received, and the backends can use this info to not consider parallelism. If we use SIGTERM to notify, then, since the backends have die() as their handler, they just cancel the queries, which is again inconsistent behaviour [1]. Could some other signal, like SIGUSR2 (I think it's currently ignored by backends), be used here? If the signals are overloaded, can we multiplex SIGTERM similarly to SIGUSR1? If we don't want to use signals at all, the postmaster can record its status in the bgworker shared memory, i.e. BackgroundWorkerData, and RegisterDynamicBackgroundWorker() can simply return without requesting parallel workers from the postmaster.

2. If we want to allow parallelism, we can tweak bgworker_should_start_now() to detect that the pending bgworker fork requests are for parallelism, and let the postmaster start the workers.

Thoughts?

Note: this issue was identified while working on [1].

[1] - https://www.postgresql.org/message-id/CALj2ACWTAQ2uWgj4yRFLQ6t15MMYV_uc3GCT5F5p8R9pzrd7yQ%40mail.gmail.com

With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Re: 回复:how to create index concurrently on partitioned table
On Wed, Aug 12, 2020 at 12:28:20AM -0500, Justin Pryzby wrote: > On Wed, Aug 12, 2020 at 01:54:38PM +0900, Michael Paquier wrote: >> +++ b/src/backend/catalog/index.c >> @@ -3661,20 +3662,12 @@ reindex_relation(Oid relid, int flags, int options) >> +elog(ERROR, "unsupported relation kind for relation \"%s\"", >> + RelationGetRelationName(rel)); > > I guess it should show the relkind(%c) in the message, like these: Sure, but I don't see much the point in adding the relkind here knowing that we know which one it is. > ISTM reindex_index is missing that, too: > > 8b08f7d4820fd7a8ef6152a9dd8c6e3cb01e5f99 > + if (iRel->rd_rel->relkind == RELKIND_PARTITIONED_INDEX) > + elog(ERROR, "unsupported relation kind for index \"%s\"", > +RelationGetRelationName(iRel)); The error string does not follow the usual project style either, so I have updated both. >> >> - Reindexing partitioned tables or partitioned indexes is not supported. >> - Each individual partition can be reindexed separately instead. >> + Reindexing partitioned indexes or partitioned tables is supported >> + with respectively REINDEX INDEX or >> + REINDEX TABLE. > > Should say "..with REINDEX INDEX or REINDEX TABLE, respectively". > >> + Each partition of the partitioned >> + relation defined is rebuilt in its own transaction. > > => Each partition of the specified partitioned relation is reindexed in a > separate transaction. Thanks, good idea. I have been able to work more on this patch today, and finally added an error context for the transaction block check, as that's cleaner. In my manual tests, I have also bumped into a case that failed with the original patch (where there were no session locks), and created an isolation test based on it: drop of a partition leaf concurrently to REINDEX done on the parent. 
One last thing I have spotted is that we failed to discard properly foreign tables defined as leaves of a partition tree, causing a reindex to fail, so reindex_partitions() ought to just use RELKIND_HAS_STORAGE() to do its filtering work. I am leaving this patch alone for a couple of days now, and I'll try to come back to it after and potentially commit it. The attached has been indented by the way. -- Michael diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h index c26a102b17..22db3f660d 100644 --- a/src/include/commands/defrem.h +++ b/src/include/commands/defrem.h @@ -34,8 +34,10 @@ extern ObjectAddress DefineIndex(Oid relationId, bool check_not_in_use, bool skip_build, bool quiet); -extern void ReindexIndex(RangeVar *indexRelation, int options, bool concurrent); -extern Oid ReindexTable(RangeVar *relation, int options, bool concurrent); +extern void ReindexIndex(RangeVar *indexRelation, int options, bool concurrent, + bool isTopLevel); +extern Oid ReindexTable(RangeVar *relation, int options, bool concurrent, + bool isTopLevel); extern void ReindexMultipleTables(const char *objectName, ReindexObjectType objectKind, int options, bool concurrent); extern char *makeObjectName(const char *name1, const char *name2, diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c index 1be27eec52..14e3464d0e 100644 --- a/src/backend/catalog/index.c +++ b/src/backend/catalog/index.c @@ -77,6 +77,7 @@ #include "utils/lsyscache.h" #include "utils/memutils.h" #include "utils/pg_rusage.h" +#include "utils/rel.h" #include "utils/snapmgr.h" #include "utils/syscache.h" #include "utils/tuplesort.h" @@ -3447,11 +3448,12 @@ reindex_index(Oid indexId, bool skip_constraint_checks, char persistence, iRel->rd_rel->relam); /* - * The case of reindexing partitioned tables and indexes is handled - * differently by upper layers, so this case shouldn't arise. + * Partitioned indexes should never get processed here, as they have no + * physical storage. 
*/ if (iRel->rd_rel->relkind == RELKIND_PARTITIONED_INDEX) - elog(ERROR, "unsupported relation kind for index \"%s\"", + elog(ERROR, "cannot reindex partitioned index \"%s.%s\"", + get_namespace_name(RelationGetNamespace(iRel)), RelationGetRelationName(iRel)); /* @@ -3661,20 +3663,13 @@ reindex_relation(Oid relid, int flags, int options) rel = table_open(relid, ShareLock); /* - * This may be useful when implemented someday; but that day is not today. - * For now, avoid erroring out when called in a multi-table context - * (REINDEX SCHEMA) and happen to come across a partitioned table. The - * partitions may be reindexed on their own anyway. + * Partitioned tables should never get processed here, as they have no + * physical storage. */ if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) - { - ereport(WARNING, -(errcode(ERRCODE_FEATURE_NOT_SUPPORTED), - errmsg("REINDEX of partitioned tables is not yet implemented, skipping \"%s\"", -
Re: recovering from "found xmin ... from before relfrozenxid ..."
Thanks Robert for the review. Please find my comments inline below: On Fri, Aug 7, 2020 at 9:21 PM Robert Haas wrote: > > On Thu, Aug 6, 2020 at 9:23 AM Ashutosh Sharma wrote: > > Attached v4 patch fixes the latest comments from Robert and Masahiko-san. > > Compiler warning: > > heap_surgery.c:136:13: error: comparison of unsigned expression < 0 is > always false [-Werror,-Wtautological-compare] > if (blkno < 0 || blkno >= nblocks) > ~ ^ ~ > Fixed. > There's a certain inconsistency to these messages: > > rhaas=# create table foo (a int); > CREATE TABLE > rhaas=# insert into foo values (1); > INSERT 0 1 > rhaas=# select heap_force_kill('foo'::regclass, array['(0,2)'::tid]); > NOTICE: skipping tid (0, 2) because it contains an invalid offset > heap_force_kill > - > > (1 row) > > rhaas=# select heap_force_kill('foo'::regclass, array['(1,0)'::tid]); > ERROR: invalid item pointer > LOCATION: tids_same_page_fetch_offnums, heap_surgery.c:347 > rhaas=# select heap_force_kill('foo'::regclass, array['(1,1)'::tid]); > ERROR: block number 1 is out of range for relation "foo" > > From a user perspective it seems like I've made three very similar > mistakes: in the first case, the offset is too high, in the second > case it's too low, and in the third case the block number is out of > range. But in one case I get a NOTICE and in the other two cases I get > an ERROR. In one case I get the relation name and in the other two > cases I don't. The two complaints about an invalid offset are phrased > completely differently from each other. For example, suppose you do > this: > > ERROR: tid (%u, %u) is invalid for relation "%s" because the block > number is out of range (%u..%u) > ERROR: tid (%u, %u) is invalid for relation "%s" because the item > number is out of range for this block (%u..%u) > ERROR: tid (%u, %u) is invalid for relation "%s" because the item is unused > ERROR: tid (%u, %u) is invalid for relation "%s" because the item is dead > Corrected. 
> I think I misled you when I said to use pg_class_aclcheck. I think it > should actually be pg_class_ownercheck. > okay, I've changed it to pg_class_ownercheck. > I think the relkind sanity check should permit RELKIND_MATVIEW also. > Yeah, actually we should allow MATVIEW, don't know why I thought of blocking it earlier. > It's unclear to me why the freeze logic here shouldn't do this part > what heap_prepare_freeze_tuple() does when freezing xmax: > > frz->t_infomask2 &= ~HEAP_HOT_UPDATED; > frz->t_infomask2 &= ~HEAP_KEYS_UPDATED; > Yeah, we should have these changes when freezing the xmax. > Likewise, why should we not freeze or invalidate xvac in the case > where tuple->t_infomask & HEAP_MOVED, as heap_prepare_freeze_tuple() > would do? > Again, we should have this as well. Apart from above, this time I've also added the documentation on pg_surgery module and added a few more test-cases. Attached patch with above changes. -- With Regards, Ashutosh Sharma EnterpriseDB:http://www.enterprisedb.com From 2aa56b9632cf1d291f4433afd972dc647d354dcb Mon Sep 17 00:00:00 2001 From: ashu Date: Wed, 12 Aug 2020 14:38:14 +0530 Subject: [PATCH] Add contrib/pg_surgery to perform surgery on a damaged heap table. This contrib module basically adds a couple of functions named heap_force_kill and heap_force_freeze that can be used in the scripts to perform surgery on the damaged heap tables. Ashutosh Sharma. 
--- contrib/Makefile | 1 + contrib/pg_surgery/Makefile| 23 ++ contrib/pg_surgery/expected/pg_surgery.out | 161 + contrib/pg_surgery/heap_surgery.c | 375 + contrib/pg_surgery/pg_surgery--1.0.sql | 18 ++ contrib/pg_surgery/pg_surgery.control | 5 + contrib/pg_surgery/sql/pg_surgery.sql | 89 +++ doc/src/sgml/contrib.sgml | 1 + doc/src/sgml/filelist.sgml | 1 + doc/src/sgml/pgsurgery.sgml| 94 10 files changed, 768 insertions(+) create mode 100644 contrib/pg_surgery/Makefile create mode 100644 contrib/pg_surgery/expected/pg_surgery.out create mode 100644 contrib/pg_surgery/heap_surgery.c create mode 100644 contrib/pg_surgery/pg_surgery--1.0.sql create mode 100644 contrib/pg_surgery/pg_surgery.control create mode 100644 contrib/pg_surgery/sql/pg_surgery.sql create mode 100644 doc/src/sgml/pgsurgery.sgml diff --git a/contrib/Makefile b/contrib/Makefile index 1846d41..07d5734 100644 --- a/contrib/Makefile +++ b/contrib/Makefile @@ -35,6 +35,7 @@ SUBDIRS = \ pg_standby \ pg_stat_statements \ pg_trgm \ + pg_surgery \ pgcrypto \ pgrowlocks \ pgstattuple \ diff --git a/contrib/pg_surgery/Makefile b/contrib/pg_surgery/Makefile new file mode 100644 index 000..ecf2e20 --- /dev/null +++ b/contrib/pg_surgery/Makefile @@ -0,0 +1,23 @@ +# contrib/pg_surgery/Makefile + +MODULE_big = pg_surgery
Re: Allows Extend Protocol support CURSOR_OPT_HOLD with prepared stmt.
On Wed, Aug 12, 2020 at 8:21 PM Dave Cramer wrote: > > > On Wed, 12 Aug 2020 at 08:14, Andy Fan wrote: > >> >> >> On Wed, Aug 12, 2020 at 8:11 PM Andy Fan >> wrote: >> >>> >>> >>> On Wed, Aug 12, 2020 at 5:54 PM Dave Cramer >>> wrote: >>> On Tue, 11 Aug 2020 at 22:33, Andy Fan wrote: > > > On Mon, Jul 27, 2020 at 11:57 AM Andy Fan > wrote: > >> >>> 2. Currently I want to add a new GUC parameter, if set it to true, >>> server will >>> create a holdable portal, or else nothing changed. Then let the >>> user set >>> it to true in the above case and reset it to false afterward. Is >>> there any issue >>> with this method? >>> >>> >> I forget to say in this case, the user has to drop the holdable >> portal explicitly. >> >> >> > After some days's hack and testing, I found more issues to support the > following case > > rs = prepared_stmt.execute(1); > while(rs.next()) > { > // do something with the result (mainly DML ) > conn.commit(); or conn.rollback(); > > // commit / rollback to avoid the long lock holding. > } > > The holdable portal is still be dropped in transaction > aborted/rollbacked case since > the HoldPortal doesn't happens before that and "abort/rollabck" means > something > wrong so it is risk to hold it again. What I did to fix this issue is > HoldPortal just after > we define a Holdable portal. However, that's bad for performance. > Originally, we just > needed to scan the result when needed, now we have to hold all the > results and then fetch > and the data one by one. > > The above user case looks reasonable to me IMO, I would say it is > kind of "tech debt" > in postgres. To support this completely, looks we have to decouple > the snapshot/locking > management with transaction? If so, it looks like a huge change. I > wonder if anybody > tried to resolve this issue and where do we get to that point? 
> > -- > Best Regards > Andy Fan > I think if you set the fetch size the driver will use a named cursor and this should work >>> If the drivers can use the tempfile as an extra store, then things will >>> be better than the server. >>> >> >> Maybe not much better, just the same as each other. Both need to >> store all of them first and fetch them from the temp store again. >> >> > Ya I thought about this after I answered it. If you have a resultset that > you requested in a transaction and then you commit the transaction I think > it is reasonable to expect that the resultset is no longer valid. > > I checked JDBC, the resultset only uses memory to cache the resultset. so we can't set an inf+ fetch size with the hope that the client's resultset can cache all of them for us. Basically I will use my above hack. -- Best Regards Andy Fan
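The tradeoff described in this thread (a held portal must run the query to completion and materialize every row so the results survive transaction end, instead of producing rows lazily on demand) can be sketched in miniature. All names below are hypothetical illustrations, not PostgreSQL's actual portal code:

```c
#include <assert.h>
#include <stdlib.h>

typedef struct MiniPortal
{
	int			nrows;			/* rows the query would produce */
	int			next;			/* next row to return */
	int		   *held;			/* non-NULL once the portal has been held */
} MiniPortal;

/* Lazy fetch: compute the next row on demand.  Cheap, but only valid
 * while the originating transaction (and its snapshot) is still open. */
int
mini_fetch_lazy(MiniPortal *p)
{
	return p->next++;
}

/* "Hold" the portal: execute it to completion and store every row, so
 * the results survive a later commit or rollback. */
void
mini_hold(MiniPortal *p)
{
	p->held = malloc(sizeof(int) * p->nrows);
	for (int i = 0; i < p->nrows; i++)
		p->held[i] = i;			/* stand-in for actually running the query */
}

/* Fetch from the held store after the transaction has ended. */
int
mini_fetch_held(MiniPortal *p)
{
	return p->held[p->next++];
}
```

Holding pays the full materialization cost up front, which is the performance regression noted above for calling HoldPortal immediately after defining a holdable portal.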
Re: Allows Extend Protocol support CURSOR_OPT_HOLD with prepared stmt.
On Wed, 12 Aug 2020 at 08:14, Andy Fan wrote: > > > On Wed, Aug 12, 2020 at 8:11 PM Andy Fan wrote: > >> >> >> On Wed, Aug 12, 2020 at 5:54 PM Dave Cramer >> wrote: >> >>> >>> >>> >>> On Tue, 11 Aug 2020 at 22:33, Andy Fan wrote: >>> On Mon, Jul 27, 2020 at 11:57 AM Andy Fan wrote: > >> 2. Currently I want to add a new GUC parameter, if set it to true, >> server will >> create a holdable portal, or else nothing changed. Then let the user >> set >> it to true in the above case and reset it to false afterward. Is >> there any issue >> with this method? >> >> > I forget to say in this case, the user has to drop the holdable > portal explicitly. > > > After some days's hack and testing, I found more issues to support the following case rs = prepared_stmt.execute(1); while(rs.next()) { // do something with the result (mainly DML ) conn.commit(); or conn.rollback(); // commit / rollback to avoid the long lock holding. } The holdable portal is still be dropped in transaction aborted/rollbacked case since the HoldPortal doesn't happens before that and "abort/rollabck" means something wrong so it is risk to hold it again. What I did to fix this issue is HoldPortal just after we define a Holdable portal. However, that's bad for performance. Originally, we just needed to scan the result when needed, now we have to hold all the results and then fetch and the data one by one. The above user case looks reasonable to me IMO, I would say it is kind of "tech debt" in postgres. To support this completely, looks we have to decouple the snapshot/locking management with transaction? If so, it looks like a huge change. I wonder if anybody tried to resolve this issue and where do we get to that point? -- Best Regards Andy Fan >>> >>> >>> I think if you set the fetch size the driver will use a named cursor and >>> this should work >>> >>> >> If the drivers can use the tempfile as an extra store, then things will >> be better than the server. 
>> > > Maybe not much better, just the same as each other. Both need to > store all of them first and fetch them from the temp store again. > > Yeah, I thought about this after I answered it. If you have a resultset that you requested in a transaction and then you commit the transaction, I think it is reasonable to expect that the resultset is no longer valid. Dave Cramer www.postgres.rocks
Re: Allows Extend Protocol support CURSOR_OPT_HOLD with prepared stmt.
On Wed, Aug 12, 2020 at 8:11 PM Andy Fan wrote: > > > On Wed, Aug 12, 2020 at 5:54 PM Dave Cramer > wrote: > >> >> >> >> On Tue, 11 Aug 2020 at 22:33, Andy Fan wrote: >> >>> >>> >>> On Mon, Jul 27, 2020 at 11:57 AM Andy Fan >>> wrote: >>> > 2. Currently I want to add a new GUC parameter, if set it to true, > server will > create a holdable portal, or else nothing changed. Then let the user > set > it to true in the above case and reset it to false afterward. Is > there any issue > with this method? > > I forget to say in this case, the user has to drop the holdable portal explicitly. >>> After some days's hack and testing, I found more issues to support the >>> following case >>> >>> rs = prepared_stmt.execute(1); >>> while(rs.next()) >>> { >>> // do something with the result (mainly DML ) >>> conn.commit(); or conn.rollback(); >>> >>> // commit / rollback to avoid the long lock holding. >>> } >>> >>> The holdable portal is still be dropped in transaction >>> aborted/rollbacked case since >>> the HoldPortal doesn't happens before that and "abort/rollabck" means >>> something >>> wrong so it is risk to hold it again. What I did to fix this issue is >>> HoldPortal just after >>> we define a Holdable portal. However, that's bad for performance. >>> Originally, we just >>> needed to scan the result when needed, now we have to hold all the >>> results and then fetch >>> and the data one by one. >>> >>> The above user case looks reasonable to me IMO, I would say it is kind >>> of "tech debt" >>> in postgres. To support this completely, looks we have to decouple the >>> snapshot/locking >>> management with transaction? If so, it looks like a huge change. I >>> wonder if anybody >>> tried to resolve this issue and where do we get to that point? 
>>> >>> -- >>> Best Regards >>> Andy Fan >>> >> >> >> I think if you set the fetch size the driver will use a named cursor and >> this should work >> >> > If the drivers can use the tempfile as an extra store, then things will be > better than the server. > Maybe not much better, just the same as each other. Both need to store all of them first and fetch them from the temp store again. -- Best Regards Andy Fan
Re: Allows Extend Protocol support CURSOR_OPT_HOLD with prepared stmt.
On Wed, Aug 12, 2020 at 5:54 PM Dave Cramer wrote: > > > > On Tue, 11 Aug 2020 at 22:33, Andy Fan wrote: > >> >> >> On Mon, Jul 27, 2020 at 11:57 AM Andy Fan >> wrote: >> >>> 2. Currently I want to add a new GUC parameter, if set it to true, server will create a holdable portal, or else nothing changed. Then let the user set it to true in the above case and reset it to false afterward. Is there any issue with this method? >>> I forget to say in this case, the user has to drop the holdable >>> portal explicitly. >>> >>> >>> >> After some days's hack and testing, I found more issues to support the >> following case >> >> rs = prepared_stmt.execute(1); >> while(rs.next()) >> { >> // do something with the result (mainly DML ) >> conn.commit(); or conn.rollback(); >> >> // commit / rollback to avoid the long lock holding. >> } >> >> The holdable portal is still be dropped in transaction aborted/rollbacked >> case since >> the HoldPortal doesn't happens before that and "abort/rollabck" means >> something >> wrong so it is risk to hold it again. What I did to fix this issue is >> HoldPortal just after >> we define a Holdable portal. However, that's bad for performance. >> Originally, we just >> needed to scan the result when needed, now we have to hold all the >> results and then fetch >> and the data one by one. >> >> The above user case looks reasonable to me IMO, I would say it is kind >> of "tech debt" >> in postgres. To support this completely, looks we have to decouple the >> snapshot/locking >> management with transaction? If so, it looks like a huge change. I wonder >> if anybody >> tried to resolve this issue and where do we get to that point? >> >> -- >> Best Regards >> Andy Fan >> > > > I think if you set the fetch size the driver will use a named cursor and > this should work > > I knew this point before working on that, but I heard from my customer that the size would be pretty big, so I gave up on this idea (too early). 
However, after working on a holdable solution, I see there is very little difference between caching the result on the server or on the client. If the drivers can use a temp file as an extra store, then things will be better than on the server. Otherwise, things will still be complex. Thanks for your reminder! -- Best Regards Andy Fan
Re: Make contrib modules' installation scripts more secure.
Re: Tom Lane > > (The Debian regression tests remove plpgsql before testing all > > extensions in turn.) > > Meh. I think that's testing a case that we don't guarantee to work. > There was already a plpgsql dependency in hstore--1.1--1.2.sql, > which I just cribbed from to make these fixes. The key difference is that hstore--1.1--1.2.sql was never required for installing an extension from scratch, only for upgrades. The practical relevance of this distinction is that the upgrade scripts are only run once, while install-time scripts (including the upgrade scripts for versions that do not have a direct creation script) are also required for dump-restore cycles. As an admin, I'd very much hate databases that couldn't be restored without extra fiddling. The thing that maybe saves us here is that while hstore is trusted, so any user can create it, plpgsql is trusted as well, but owned by postgres, so even database owners can't drop it from beneath hstore. Only superusers can "mess up" a database in that way. But still. > A band-aid sort of fix would be to roll up the base install scripts > for these modules to the latest version, so that a plain install from > scratch doesn't need to execute any of the catalog adjustments in > their update scripts. That's not terribly attractive from a maintenance > or testing standpoint, though. That's a pretty small price compared to the dump-reload inconsistencies. I can see the extra maintenance effort, but how many extensions would require rewriting as direct-install.sql scripts? I guess it's only a few that need plpgsql for upgrades. Christoph
Re: Allows Extend Protocol support CURSOR_OPT_HOLD with prepared stmt.
On Tue, 11 Aug 2020 at 22:33, Andy Fan wrote: > > > On Mon, Jul 27, 2020 at 11:57 AM Andy Fan > wrote: > >> >>> 2. Currently I want to add a new GUC parameter, if set it to true, >>> server will >>> create a holdable portal, or else nothing changed. Then let the user >>> set >>> it to true in the above case and reset it to false afterward. Is there >>> any issue >>> with this method? >>> >>> >> I forget to say in this case, the user has to drop the holdable >> portal explicitly. >> >> >> > After some days's hack and testing, I found more issues to support the > following case > > rs = prepared_stmt.execute(1); > while(rs.next()) > { > // do something with the result (mainly DML ) > conn.commit(); or conn.rollback(); > > // commit / rollback to avoid the long lock holding. > } > > The holdable portal is still be dropped in transaction aborted/rollbacked > case since > the HoldPortal doesn't happens before that and "abort/rollabck" means > something > wrong so it is risk to hold it again. What I did to fix this issue is > HoldPortal just after > we define a Holdable portal. However, that's bad for performance. > Originally, we just > needed to scan the result when needed, now we have to hold all the results > and then fetch > and the data one by one. > > The above user case looks reasonable to me IMO, I would say it is kind of > "tech debt" > in postgres. To support this completely, looks we have to decouple the > snapshot/locking > management with transaction? If so, it looks like a huge change. I wonder > if anybody > tried to resolve this issue and where do we get to that point? > > -- > Best Regards > Andy Fan > I think if you set the fetch size the driver will use a named cursor and this should work Dave Cramer www.postgres.rocks
use pg_get_functiondef() in pg_dump
Here is a patch to have pg_dump use pg_get_functiondef() instead of assembling the CREATE FUNCTION/PROCEDURE commands itself. This should save on maintenance effort in the future. It's also a prerequisite for being able to dump functions with SQL-standard function body discussed in [0]. pg_get_functiondef() was meant for psql's \ef command, so its defaults are slightly different from what pg_dump would like, so this adds a few optional parameters for tweaking the behavior. The naming of the parameters is up for discussion. [0]: https://www.postgresql.org/message-id/flat/1c11f1eb-f00c-43b7-799d-2d44132c0...@2ndquadrant.com -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services From 6d87daace8d5e9cec80d1b3b5d1a03ff9c2206ee Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Wed, 12 Aug 2020 09:19:58 +0200 Subject: [PATCH v1 1/2] Drop final newline from pg_get_functiondef result This makes it more consistent with other pg_get_*def() functions. psql (for \ef) already ensures that a trailing newline is added as necessary, so this change doesn't affect psql's functionality. 
--- src/backend/utils/adt/ruleutils.c | 2 -- src/test/regress/expected/create_function_3.out | 12 src/test/regress/expected/create_procedure.out | 3 +-- src/test/regress/expected/rules.out | 3 +-- 4 files changed, 6 insertions(+), 14 deletions(-) diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c index 60dd80c23c..877f99cba3 100644 --- a/src/backend/utils/adt/ruleutils.c +++ b/src/backend/utils/adt/ruleutils.c @@ -2832,8 +2832,6 @@ pg_get_functiondef(PG_FUNCTION_ARGS) appendStringInfoString(, prosrc); appendBinaryStringInfo(, dq.data, dq.len); - appendStringInfoChar(, '\n'); - ReleaseSysCache(proctup); PG_RETURN_TEXT_P(string_to_text(buf.data)); diff --git a/src/test/regress/expected/create_function_3.out b/src/test/regress/expected/create_function_3.out index ba260df996..5d0be17282 100644 --- a/src/test/regress/expected/create_function_3.out +++ b/src/test/regress/expected/create_function_3.out @@ -200,8 +200,7 @@ SELECT pg_get_functiondef('functest_A_1'::regproc); CREATE OR REPLACE FUNCTION temp_func_test.functest_a_1(text, date)+ RETURNS boolean + LANGUAGE sql + - AS $function$SELECT $1 = 'abcd' AND $2 > '2001-01-01'$function$ + - + AS $function$SELECT $1 = 'abcd' AND $2 > '2001-01-01'$function$ (1 row) SELECT pg_get_functiondef('functest_B_3'::regproc); @@ -211,8 +210,7 @@ SELECT pg_get_functiondef('functest_B_3'::regproc); RETURNS boolean + LANGUAGE sql + STABLE+ - AS $function$SELECT $1 = 0$function$ + - + AS $function$SELECT $1 = 0$function$ (1 row) SELECT pg_get_functiondef('functest_C_3'::regproc); @@ -222,8 +220,7 @@ SELECT pg_get_functiondef('functest_C_3'::regproc); RETURNS boolean + LANGUAGE sql + SECURITY DEFINER + - AS $function$SELECT $1 < 0$function$ + - + AS $function$SELECT $1 < 0$function$ (1 row) SELECT pg_get_functiondef('functest_F_2'::regproc); @@ -233,8 +230,7 @@ SELECT pg_get_functiondef('functest_F_2'::regproc); RETURNS boolean + LANGUAGE sql + STRICT+ - AS $function$SELECT $1 = 50$function$ + - + AS 
$function$SELECT $1 = 50$function$ (1 row) -- information_schema tests diff --git a/src/test/regress/expected/create_procedure.out b/src/test/regress/expected/create_procedure.out index 211a42cefa..2c94b9a4b6 100644 --- a/src/test/regress/expected/create_procedure.out +++ b/src/test/regress/expected/create_procedure.out @@ -29,8 +29,7 @@ SELECT pg_get_functiondef('ptest1'::regproc); LANGUAGE sql+ AS $procedure$ + INSERT INTO cp_test VALUES (1, x); + - $procedure$ + - + $procedure$ (1 row) -- show only normal functions diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out index 601734a6f1..be434c97ac 100644 --- a/src/test/regress/expected/rules.out +++ b/src/test/regress/expected/rules.out @@ -3402,8 +3402,7 @@ SELECT pg_get_functiondef('func_with_set_params()'::regprocedure); SET work_mem TO '4MB'
Re: remove some ancient port hacks
On 12.08.2020 09:12, Peter Eisentraut wrote: There are two ancient hacks in the cygwin and solaris ports that appear to have been solved more than 10 years ago, so I think we can remove them. See attached patches. Hi Peter, This is really archaeology: the check is for b20.1, which was released in 1998. No problem at all to remove it. Regards Marco Cygwin Package Maintainer
remove some ancient port hacks
There are two ancient hacks in the cygwin and solaris ports that appear to have been solved more than 10 years ago, so I think we can remove them. See attached patches. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services From 6322cb1d9579ab8e2fd66139aa6a0f342925b7f0 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Wed, 12 Aug 2020 09:09:02 +0200 Subject: [PATCH 1/2] Remove obsolete HAVE_BUGGY_SOLARIS_STRTOD Fixed more than 10 years ago. --- src/backend/utils/adt/float.c | 24 src/include/port/solaris.h| 12 2 files changed, 36 deletions(-) diff --git a/src/backend/utils/adt/float.c b/src/backend/utils/adt/float.c index ffd1ce8c76..429c9280c0 100644 --- a/src/backend/utils/adt/float.c +++ b/src/backend/utils/adt/float.c @@ -271,18 +271,6 @@ float4in(PG_FUNCTION_ARGS) errmsg("invalid input syntax for type %s: \"%s\"", "real", orig_num))); } -#ifdef HAVE_BUGGY_SOLARIS_STRTOD - else - { - /* -* Many versions of Solaris have a bug wherein strtod sets endptr to -* point one byte beyond the end of the string when given "inf" or -* "infinity". -*/ - if (endptr != num && endptr[-1] == '\0') - endptr--; - } -#endif /* HAVE_BUGGY_SOLARIS_STRTOD */ /* skip trailing whitespace */ while (*endptr != '\0' && isspace((unsigned char) *endptr)) @@ -499,18 +487,6 @@ float8in_internal_opt_error(char *num, char **endptr_p, type_name, orig_string))), have_error); } -#ifdef HAVE_BUGGY_SOLARIS_STRTOD - else - { - /* -* Many versions of Solaris have a bug wherein strtod sets endptr to -* point one byte beyond the end of the string when given "inf" or -* "infinity". 
-*/ - if (endptr != num && endptr[-1] == '\0') - endptr--; - } -#endif /* HAVE_BUGGY_SOLARIS_STRTOD */ /* skip trailing whitespace */ while (*endptr != '\0' && isspace((unsigned char) *endptr)) diff --git a/src/include/port/solaris.h b/src/include/port/solaris.h index eeb1a320bd..e63a3bd824 100644 --- a/src/include/port/solaris.h +++ b/src/include/port/solaris.h @@ -24,15 +24,3 @@ #if defined(__i386__) #include #endif - -/* - * Many versions of Solaris have broken strtod() --- see bug #4751182. - * This has been fixed in current versions of Solaris: - * - * http://sunsolve.sun.com/search/document.do?assetkey=1-21-108993-62-1=108993-62 - * http://sunsolve.sun.com/search/document.do?assetkey=1-21-112874-34-1=112874-34 - * - * However, many people might not have patched versions, so - * still use our own fix for the buggy version. - */ -#define HAVE_BUGGY_SOLARIS_STRTOD -- 2.28.0 From 71a0e828454f2c473c65824cfdbf6a14640ffcf6 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Wed, 12 Aug 2020 09:09:02 +0200 Subject: [PATCH 2/2] Remove obsolete cygwin.h hack The version being checked for is 20 years old. --- src/include/port/cygwin.h | 9 - 1 file changed, 9 deletions(-) diff --git a/src/include/port/cygwin.h b/src/include/port/cygwin.h index f1fc1a93d7..64d69936e5 100644 --- a/src/include/port/cygwin.h +++ b/src/include/port/cygwin.h @@ -1,14 +1,5 @@ /* src/include/port/cygwin.h */ -#include - -/* - * Check for b20.1 and disable AF_UNIX family socket support. - */ -#if CYGWIN_VERSION_DLL_MAJOR < 1001 -#undef HAVE_UNIX_SOCKETS -#endif - #ifdef BUILDING_DLL #define PGDLLIMPORT __declspec (dllexport) #else -- 2.28.0
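The bug the removed HAVE_BUGGY_SOLARIS_STRTOD workaround guarded against was strtod() on old Solaris setting endptr one byte beyond the end of the string for "inf" or "infinity". On a conforming libc, endptr lands exactly on the first unconsumed character, which is what the callers in float.c rely on. A small sketch, assuming a C99-conforming strtod:

```c
#include <assert.h>
#include <stdlib.h>

/* Return how many characters strtod() consumed from num.  On a correct
 * libc, endptr points just past the last accepted character; the old
 * Solaris bug made it overshoot by one byte for "inf"/"infinity". */
size_t
consumed_by_strtod(const char *num)
{
	char	   *endptr;

	(void) strtod(num, &endptr);
	return (size_t) (endptr - num);
}
```

With a fixed strtod, "inf" consumes exactly 3 characters and "infinity" exactly 8, so no endptr-correction hack is needed.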
Re: SyncRepLock acquired exclusively in default configuration
On Wed, 12 Aug 2020 at 14:06, Asim Praveen wrote: > > > > > On 11-Aug-2020, at 8:57 PM, Robert Haas wrote: > > > > I think this gets to the root of the issue. If we check the flag > > without a lock, we might see a slightly stale value. But, considering > > that there's no particular amount of time within which configuration > > changes are guaranteed to take effect, maybe that's OK. However, there > > is one potential gotcha here: if the walsender declares the standby to > > be synchronous, a user can see that, right? So maybe there's this > > problem: a user sees that the standby is synchronous and expects a > > transaction committing afterward to provoke a wait, but really it > > doesn't. Now the user is unhappy, feeling that the system didn't > > perform according to expectations. > > Yes, pg_stat_replication reports a standby in sync as soon as walsender > updates priority of the standby to something other than 0. > > The potential gotcha referred above doesn’t seem too severe. What is the > likelihood of someone setting synchronous_standby_names GUC with either “*” > or a standby name and then immediately promoting that standby? If the > standby is promoted before the checkpointer on master gets a chance to update > sync_standbys_defined in shared memory, commits made during this interval on > master may not make it to standby. Upon promotion, those commits may be lost. I think that if the standby is far behind the primary and the primary crashes, the likelihood of losing commits gets higher. The user can see via pg_stat_replication that the standby became a synchronous standby, but a commit completes without waiting because the checkpointer hasn't updated sync_standbys_defined yet. If the primary crashes before the standby catches up and the user fails over, the committed transaction will be lost, even though the user expected that the transaction commit had been replicated to the standby synchronously. And this can happen even without the patch, right?
Regards, -- Masahiko Sawada  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
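The unlocked-read pattern under discussion (a backend reading sync_standbys_defined without taking SyncRepLock and tolerating a briefly stale value) looks roughly like this in portable C11. This is an illustrative sketch only, not PostgreSQL's actual shared-memory code, which has its own atomics and locking layers:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative stand-in for a shared-memory flag like
 * sync_standbys_defined: one process publishes it, others read it
 * without a lock and may briefly observe a stale value. */
static atomic_bool sync_flag = false;

/* Checkpointer side: publish the value recomputed from the GUC. */
void
publish_sync_standbys(bool defined)
{
	atomic_store_explicit(&sync_flag, defined, memory_order_release);
}

/* Backend side: cheap unlocked read.  A slightly stale answer is
 * tolerable here because configuration changes carry no hard
 * visibility deadline anyway. */
bool
sync_rep_needed(void)
{
	return atomic_load_explicit(&sync_flag, memory_order_acquire);
}
```

The release/acquire pairing guarantees that once a reader sees the flag set, it also sees everything the writer did before publishing it; what it does not guarantee (and what the thread debates) is any bound on how long a reader may keep seeing the old value.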
Re: Handing off SLRU fsyncs to the checkpointer
On Sat, Aug 8, 2020 at 2:44 AM Robert Haas wrote: > On Wed, Aug 5, 2020 at 2:01 AM Thomas Munro wrote: > > * Master is around 11% faster than last week before commit c5315f4f > > "Cache smgrnblocks() results in recovery." > > * This patch gives a similar speedup, bringing the total to around 25% > > faster than last week (the time is ~20% less, the WAL processing speed > > is ~1.25x). > > Dang, that's pretty nice, especially for the relatively small amount > of code that it seems to require. Yeah, the combined effect of these two patches is better than I expected. To be clear though, I was only measuring the time between the "redo starts at ..." and "redo done at ..." messages, since I've been staring at the main recovery code, but there are also some more fsyncs before (SyncDataDirectory()) and after (RemoveOldXlogFiles()) that are unaffected. I think it's probably possible to do something about those too, but that's another topic. I spotted a small problem: if the transaction IDs wrap all the way around between checkpoints, then you might have cancelled requests for a removed SLRU segment from the previous epoch, so we'd better uncancel them if we see that. That's a one-line fix, done in the attached. I also adjusted the commit message to be a little clearer (this work deferment/collapsing scheme works in crash recovery too, not just when there is a checkpointer process to hand the work to). From 63235aca6b2525716f8f29caf07e0a0e6a26965e Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Tue, 4 Aug 2020 17:57:18 +1200 Subject: [PATCH] Defer flushing of SLRU files. Previously, we called fsync() after writing out pg_xact, multixact and commit_ts pages, leading to an I/O stall in user backends and recovery. Collapse requests for the same file into a single system call as part of the next checkpoint, as we do for relation files.
Discussion: https://postgr.es/m/CA+hUKGLJ=84yt+nvhkeedauutvhmfq9i-n7k_o50jmq6rpj...@mail.gmail.com --- src/backend/access/transam/clog.c | 13 +++- src/backend/access/transam/commit_ts.c | 12 ++- src/backend/access/transam/multixact.c | 24 +- src/backend/access/transam/slru.c | 101 +++-- src/backend/access/transam/subtrans.c | 4 +- src/backend/commands/async.c | 5 +- src/backend/storage/lmgr/predicate.c | 4 +- src/backend/storage/sync/sync.c| 28 ++- src/include/access/clog.h | 3 + src/include/access/commit_ts.h | 3 + src/include/access/multixact.h | 4 + src/include/access/slru.h | 12 ++- src/include/storage/sync.h | 7 +- 13 files changed, 174 insertions(+), 46 deletions(-) diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c index f3da40ae01..0c5b7a525e 100644 --- a/src/backend/access/transam/clog.c +++ b/src/backend/access/transam/clog.c @@ -42,6 +42,7 @@ #include "pg_trace.h" #include "pgstat.h" #include "storage/proc.h" +#include "storage/sync.h" /* * Defines for CLOG page sizes. A page is the same BLCKSZ as is used @@ -692,7 +693,8 @@ CLOGShmemInit(void) { XactCtl->PagePrecedes = CLOGPagePrecedes; SimpleLruInit(XactCtl, "Xact", CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE, - XactSLRULock, "pg_xact", LWTRANCHE_XACT_BUFFER); + XactSLRULock, "pg_xact", LWTRANCHE_XACT_BUFFER, + SYNC_HANDLER_CLOG); } /* @@ -1034,3 +1036,12 @@ clog_redo(XLogReaderState *record) else elog(PANIC, "clog_redo: unknown op code %u", info); } + +/* + * Entrypoint for sync.c to sync clog files. 
+ */ +int +clogsyncfiletag(const FileTag *ftag, char *path) +{ + return slrusyncfiletag(XactCtl, ftag, path); +} diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c index 903280ae92..b4edbdb4e3 100644 --- a/src/backend/access/transam/commit_ts.c +++ b/src/backend/access/transam/commit_ts.c @@ -555,7 +555,8 @@ CommitTsShmemInit(void) CommitTsCtl->PagePrecedes = CommitTsPagePrecedes; SimpleLruInit(CommitTsCtl, "CommitTs", CommitTsShmemBuffers(), 0, CommitTsSLRULock, "pg_commit_ts", - LWTRANCHE_COMMITTS_BUFFER); + LWTRANCHE_COMMITTS_BUFFER, + SYNC_HANDLER_COMMIT_TS); commitTsShared = ShmemInitStruct("CommitTs shared", sizeof(CommitTimestampShared), @@ -1083,3 +1084,12 @@ commit_ts_redo(XLogReaderState *record) else elog(PANIC, "commit_ts_redo: unknown op code %u", info); } + +/* + * Entrypoint for sync.c to sync commit_ts files. + */ +int +committssyncfiletag(const FileTag *ftag, char *path) +{ + return slrusyncfiletag(CommitTsCtl, ftag, path); +} diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c index 475f5ed861..27ae2edbdc 100644 --- a/src/backend/access/transam/multixact.c +++ b/src/backend/access/transam/multixact.c @@ -1831,11 +1831,13 @@ MultiXactShmemInit(void) SimpleLruInit(MultiXactOffsetCtl, "MultiXactOffset", NUM_MULTIXACTOFFSET_BUFFERS, 0,