Re: WIP: Avoid creation of the free space map for small tables
On 10/16/18, Amit Kapila wrote:
> I think we can avoid using prevBlockNumber and try_every_block, if we
> maintain a small cache which tells whether the particular block is
> tried or not. What I am envisioning is that while finding the block
> with free space, if we came to know that the relation in question is
> small enough that it doesn't have FSM, we can perform a local_search.
> In local_search, we can enquire the cache for any block that we can try
> and if we find any block, we can try inserting in that block,
> otherwise, we need to extend the relation. One simple way to imagine
> such a cache would be an array of structure and structure has blkno
> and status fields. After we get the usable block, we need to clear
> the cache, if it exists.

Here is the design I've implemented in the attached v6. There is more
code than v5, but there's a cleaner separation between freespace.c and
hio.c, as you preferred. I also think it's more robust. I've expended
some effort to avoid doing unnecessary system calls to get the number
of blocks.

--

For the local, in-memory map, maintain a static array of status markers,
of fixed length HEAP_FSM_CREATION_THRESHOLD, indexed by block number.
This is populated every time we call GetPageWithFreeSpace() on small
tables with no FSM. The statuses are

'zero' (beyond the relation)
'available to try'
'tried already'

Example for a 4-page heap:

01234567
AAAA0000

If we try block 3 and there is no space, we set it to 'tried' and next
time through the loop we'll try 2, etc:

01234567
AAAT0000

If we try all available blocks, we will extend the relation. As in the
master branch, first we call GetPageWithFreeSpace() again to see if
another backend extended the relation to 5 blocks while we were waiting
for the lock. If we find a new block, we will mark the new block
available and leave the rest alone:

01234567
TTTTA000

On the odd chance we still can't insert into the new block, we'll skip
checking any others and we'll redo the logic to extend the relation.
If we're about to successfully return a buffer, whether from an existing
block, or by extension, we clear the local map.

Once this is in shape, I'll do some performance testing.

-John Naylor

From 529fa1f57946d70736b2304c2883213e45f7c077 Mon Sep 17 00:00:00 2001
From: John Naylor
Date: Mon, 22 Oct 2018 13:39:25 +0700
Subject: [PATCH v6] Avoid creation of the free space map for small tables.

The FSM isn't created if the heap has fewer than 8 blocks. If the last
known good block has insufficient space, try every block before
extending the heap.

If a heap with a FSM is truncated back to below the threshold, the FSM
stays around and can be used as usual.
---
 contrib/pageinspect/expected/page.out     |  77 
 contrib/pageinspect/sql/page.sql          |  33 ++--
 src/backend/access/brin/brin.c            |   2 +-
 src/backend/access/brin/brin_pageops.c    |   8 +-
 src/backend/access/heap/hio.c             |  57 --
 src/backend/commands/vacuumlazy.c         |  17 +-
 src/backend/storage/freespace/freespace.c | 224 +-
 src/backend/storage/freespace/indexfsm.c  |   4 +-
 src/include/storage/freespace.h           |   7 +-
 9 files changed, 344 insertions(+), 85 deletions(-)

diff --git a/contrib/pageinspect/expected/page.out b/contrib/pageinspect/expected/page.out
index 3fcd9fbe6d..83e5910453 100644
--- a/contrib/pageinspect/expected/page.out
+++ b/contrib/pageinspect/expected/page.out
@@ -1,48 +1,69 @@
 CREATE EXTENSION pageinspect;
-CREATE TABLE test1 (a int, b int);
-INSERT INTO test1 VALUES (16777217, 131584);
-VACUUM test1; -- set up FSM
+CREATE TABLE test_rel_forks (a int);
+-- Make sure there are enough blocks in the heap for the FSM to be created.
+INSERT INTO test_rel_forks SELECT g from generate_series(1,1) g;
+-- set up FSM and VM
+VACUUM test_rel_forks;
 -- The page contents can vary, so just test that it can be read
 -- successfully, but don't keep the output.
-SELECT octet_length(get_raw_page('test1', 'main', 0)) AS main_0;
+SELECT octet_length(get_raw_page('test_rel_forks', 'main', 0)) AS main_0;
  main_0 
--------
   8192
(1 row)

-SELECT octet_length(get_raw_page('test1', 'main', 1)) AS main_1;
-ERROR: block number 1 is out of range for relation "test1"
-SELECT octet_length(get_raw_page('test1', 'fsm', 0)) AS fsm_0;
+SELECT octet_length(get_raw_page('test_rel_forks', 'main', 100)) AS main_100;
+ERROR: block number 100 is out of range for relation "test_rel_forks"
+SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 0)) AS fsm_0;
  fsm_0 
-------
  8192
(1 row)

-SELECT octet_length(get_raw_page('test1', 'fsm', 1)) AS fsm_1;
- fsm_1 
-  8192
-(1 row)
-
-SELECT octet_length(get_raw_page('test1', 'vm', 0)) AS vm_0;
+SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 10)) AS fsm_10;
+ERROR: block number 10 is out of range for relation "test_rel_forks"
+SELECT octet_length(get_raw_page('test_rel_forks', 'vm', 0)) AS vm_0;
  vm_0 
------
 8192
Re: Side effect of CVE-2017-7484 fix?
Hi,

On 2018/10/22 14:41, Stephen Frost wrote:
> Greetings,
>
> * Dilip Kumar (dilipbal...@gmail.com) wrote:
>> As part of the security fix
>> (e2d4ef8de869c57e3bf270a30c12d48c2ce4e00c), we have restricted the
>> users from accessing the statistics of the table if the user doesn't
>> have privileges on the table and the function is not leakproof. Now,
>> as a side effect of this, if the user has the privileges on the root
>> partitioned table but does not have privilege on the child tables, the
>> user will be able to access the data of the child table but it won't
>> be able to access the statistics of the child table. This may result
>> in a bad plan. I am not sure what should be the fix. Should we
>> allow to access the statistics of the table if a user has privilege on
>> its parent table?
>
> Yes... If the user has access to the parent table then they can see the
> child tables, so they should be able to see the statistics on them.

Yeah, but I'd think only if the child tables are being accessed via the
parent table. So, maybe, statistic_proc_security_check() added by that
commit should return true if IS_OTHER_REL(vardata->rel)?

Thanks,
Amit
Re: Function to promote standby servers
On Sat, Oct 20, 2018 at 01:48:56PM +0900, Michael Paquier wrote:
> On Sat, Oct 20, 2018 at 06:24:28AM +0200, Laurenz Albe wrote:
>> Here is another version, with a fix in pg_proc.dat, an improved comment
>> and "wait_seconds" exercised in the regression test.
>
> Thanks for the new version. This looks pretty good to me. I'll see if
> I can review it once and then commit.
>
>> -WAIT_EVENT_SYNC_REP
>> +WAIT_EVENT_SYNC_REP,
>> +WAIT_EVENT_PROMOTE
>> } WaitEventIPC;
>
> Those are kept in alphabetical order. Individual wait events are also
> documented with a description.

Regarding the documentation, wouldn't it be more adapted to list the new
function under the section "Recovery Control Functions"? Not only does
the new function signal the postmaster, but it also creates the
promotion trigger file.

Anyway, I have been looking in depth at the patch, and I finish with the
attached. Here are some notes:
- pg_ctl returns an error if it cannot create the promote trigger file
  and if it doesn't close it. pg_promote should do the same.
- WL_TIMEOUT could have been reached, leading to no further retries
  (remove for example the part related to the trigger file creation and
  try to trigger the promotion, the wait time is incorrect).
- Documentation has been reworked.
- The new wait event is documented.
- A WARNING is returned if the signal to the postmaster is not sent,
  which is something I think makes most sense as we do the same for
  other signals.
- The position including unistd.h was not correct in xlogfuncs.c.
- Timeouts for the tests are made a tad longer. Some buildfarm machines
  don't like a promotion wait of 10s.
- A catalog version bump is included, just a personal reminder..
- Indentation has been run.

Laurenz, what do you think?
-- Michael diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 5193df3366..96d45419e5 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -19202,6 +19202,9 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup()); pg_is_wal_replay_paused + +pg_promote + pg_wal_replay_pause @@ -19232,6 +19235,22 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup()); True if recovery is paused. + + +pg_promote(wait boolean DEFAULT true, wait_seconds integer DEFAULT 60) + + boolean + +Promotes a physical standby server. Returns true +if promotion is successful and false otherwise. +With wait set to true, the +default, the function waits until promotion is completed or +wait_seconds seconds have passed, otherwise the +function returns immediately after sending the promotion signal to the +postmaster. This function is restricted to superusers by default, but +other users can be granted EXECUTE to run the function. + + pg_wal_replay_pause() diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml index ebcb3daaed..faf8e71854 100644 --- a/doc/src/sgml/high-availability.sgml +++ b/doc/src/sgml/high-availability.sgml @@ -1471,14 +1471,17 @@ synchronous_standby_names = 'ANY 2 (s1, s2, s3)' -To trigger failover of a log-shipping standby server, -run pg_ctl promote or create a trigger -file with the file name and path specified by the trigger_file -setting in recovery.conf. If you're planning to use -pg_ctl promote to fail over, trigger_file is -not required. If you're setting up the reporting servers that are -only used to offload read-only queries from the primary, not for high -availability purposes, you don't need to promote it. +To trigger failover of a log-shipping standby server, run +pg_ctl promote, call pg_promote, +or create a trigger file with the file name and path specified by the +trigger_file setting in +recovery.conf. 
If you're planning to use +pg_ctl promote or to call +pg_promote to fail over, +trigger_file is not required. If you're +setting up the reporting servers that are only used to offload read-only +queries from the primary, not for high availability purposes, you don't +need to promote it. diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 0484cfa77a..073f19542a 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -1268,7 +1268,7 @@ postgres 27093 0.0 0.0 30096 2752 ?Ss 11:34 0:00 postgres: ser Waiting in an extension. - IPC + IPC BgWorkerShutdown Waiting for background worker to shut down. @@ -1384,6 +1384,10 @@ postgres 27093 0.0 0.0 30096 2752 ?Ss 11:34 0:00 postgres: ser ProcArrayGroupUpdate Waiting for group leader to clear transaction id at transaction end. + + P
Re: remove deprecated @@@ operator ?
On Sun, Oct 21, 2018 at 11:24 PM Tom Lane wrote: > > Oleg Bartunov writes: > > The commit 9b5c8d45f62bd3d243a40cc84deb93893f2f5122 is now 10+ years > > old, may be we could remove deprecated @@@ operator ? > > Is it actually causing any problem? AFAICS it's just a couple extra > pg_operator entries, so why not leave it? > > I'd be +1 for removing it from the docs, though ... attached a tiny patch for docs > > regards, tom lane -- Postgres Professional: http://www.postgrespro.com The Russian Postgres Company func.sgml.patch Description: Binary data
Re: Side effect of CVE-2017-7484 fix?
On Mon, Oct 22, 2018 at 11:10:09AM +0530, Dilip Kumar wrote: > As part of the security fix > (e2d4ef8de869c57e3bf270a30c12d48c2ce4e00c), we have restricted the > users from accessing the statistics of the table if the user doesn't > have privileges on the table and the function is not leakproof. > Now, as a side effect of this, if the user has the privileges on the > root partitioned table but does not have privilege on the child > tables, the user will be able to access the data of the child table > but it won't be able to access the statistics of the child table. Do we have any kind of quantitative idea of how much worse query performance gets with this blind spot? > This may result in a bad plan. I am not sure what should be the > fix. Should we allow to access the statistics of the table if a > user has privilege on its parent table? In threat modeling terms, access to the statistics is an information leak. If we just say "too bad" to the people who care a lot about slowing information leaks, I'm guessing that would make them very unhappy. Since the access controls are built for those people, I'd say that we should prioritize performance optimizations for cases when people haven't explicitly decided to trade performance for lower information leak rates, which is to say for people who haven't put those access controls on in the first place. That's just my take, though. Another GUC to control this, perhaps? Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: Side effect of CVE-2017-7484 fix?
Greetings, * Dilip Kumar (dilipbal...@gmail.com) wrote: > As part of the security fix > (e2d4ef8de869c57e3bf270a30c12d48c2ce4e00c), we have restricted the > users from accessing the statistics of the table if the user doesn't > have privileges on the table and the function is not leakproof. Now, > as a side effect of this, if the user has the privileges on the root > partitioned table but does not have privilege on the child tables, the > user will be able to access the data of the child table but it won't > be able to access the statistics of the child table. This may result > in a bad plan. I am not sure what should be the fix. Should we > allow to access the statistics of the table if a user has privilege on > its parent table? Yes... If the user has access to the parent table then they can see the child tables, so they should be able to see the statistics on them. That's my 2c on a quick review, at least. Thanks! Stephen signature.asc Description: PGP signature
Side effect of CVE-2017-7484 fix?
As part of the security fix (e2d4ef8de869c57e3bf270a30c12d48c2ce4e00c), we have restricted the users from accessing the statistics of the table if the user doesn't have privileges on the table and the function is not leakproof. Now, as a side effect of this, if the user has the privileges on the root partitioned table but does not have privilege on the child tables, the user will be able to access the data of the child table but it won't be able to access the statistics of the child table. This may result in a bad plan. I am not sure what should be the fix. Should we allow to access the statistics of the table if a user has privilege on its parent table? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: SQL:2011 PERIODS vs Postgres Ranges?
Hi

On Sun, 21 Oct 2018 at 21:47, Paul A Jungwirth <
p...@illuminatedcomputing.com> wrote:

> On Sun, Oct 21, 2018 at 12:11 PM Heikki Linnakangas
> wrote:
> > On 21/10/2018 21:17, Paul A Jungwirth wrote:
> > > 3. Build our own abstractions on top of ranges, and then use those to
> > > implement PERIOD-based features.
> > +1 on this approach. I think [7] got the model right. If we can
> > implement SQL-standard PERIODs on top of it, then that's a bonus, but
> > having a sane, flexible, coherent set of range operators is more
> > important to me.
>
> Okay, I'm surprised to hear from you and Isaac that following the
> standard isn't as important as I thought, but I'm certainly pleased
> not to make it the focus. I just thought that Postgres's reputation
> was to be pretty careful about sticking to it. (I think we could still
> add a standard-compliant layer, but like you I don't feel a duty to
> suffer from it.) It sounds like I should work out some proposed
> function signatures and write up how to use them, and see what people
> think. Is that a useful approach?

It would be very unfortunate if we could not implement the standard
syntax and behavior. Which implementation lies behind it is not too
important, but we should not accept any design that makes implementing
the standard impossible. The world is 10 years behind the standards
(maybe more). This feature is now implemented in MySQL/MariaDB, and I
expect pressure for the standardized syntax within 5 years.

Regards

Pavel

> What are we missing?
>
> Here are a few big ones:
>
> 1. Define temporal primary keys and foreign keys that are known to the
> database catalog and controlled as higher-level objects. For instance
> I wrote an extension at https://github.com/pjungwir/time_for_keys to
> create temporal foreign keys, but the database isn't "aware" of them.
> That means they are more cluttered in `\d foo` than necessary (you see > the trigger constraints instead of something about a foreign key), > they don't automatically disappear if you drop the column, it is hard > to make them "polymorphic" (My extension supports only > int+tstzrange.), they don't validate that the referenced table has a > declared temporal PK, they probably have slightly different > locking/transaction semantics than the real RI code, etc. This is what > I'd like to implement right now. > > 2. System time: automatically track DML changes to the table, and let > you query "as of" a given time. > > 3. Temporal joins. I don't want to tackle this myself, because there > is already an amazing proposed patch that does everything we could ask > for at > https://www.postgresql-archive.org/PROPOSAL-Temporal-query-processing-with-range-types-tt5913058.html > (recently updated btw, so I hope someone will look at it!). > > 4. Temporal UPDATE/DELETE: these should be converted to instead change > the end time of old rows and insert new rows with the changed > attributes. I'm interested in implementing this too, but one thing at > a time. . . . > > I really appreciate your sharing your thoughts! > > Paul > >
Re: Number of buckets/partitions of dshash
Hello. At Fri, 12 Oct 2018 06:19:12 +, "Ideriha, Takeshi" wrote in <4E72940DA2BF16479384A86D54D0988A6F1C1779@G01JPEXMBKW04> > Hi, let me clarify my understanding about the $title. > > It seems that the number of hash partitions is fixed at 128 in dshash and > right now we cannot change it unless dshash.c code itself is changed, right? > > According to the comment of dshash.c, DSHASH_NUM_PARTITIONS could be runtime > parameter in future. > Are there someone working on it? > (I want to do it, but TBH I cannot promise it because of some other work.) > > In my current development for allocating catalog cache on the shared memory[1] > I'm trying to put hash table of each CatCache to the shared memory using > dshash. > The number of buckets for CatCache is pre-defined by cacheinfo and most of > them is under 128 like 8 or 16. > This would cause some waste of memory on DSA because some partitions > (buckets) is allocated but not used. > > So I'm thinking that current dshash design is still ok but flexible size of > partition is appropriate > for some use cases like mine. > > Do you have any thoughts? We could do that easily, but shouldn't we find a way to reduce or eliminate the impact of locking first? dshash needs to hold partition lock while the caller is examining a returned entry. > [1] > https://www.postgresql.org/message-id/4E72940DA2BF16479384A86D54D0988A567B9245%40G01JPEXMBKW04 regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: found xmin x from before relfrozenxid y
Andres Freund writes:
> On 2018-10-21 10:24:16 -0400, Tom Lane wrote:
>> (Well, I see the short answer: the code layer throwing the error
>> doesn't know. But that could be fixed easily enough.)

> I wonder if the better approach wouldn't be to add an errcontext for
> vacuum, where we continually update the block number etc. There's
> plenty of different sources of corruption that'd potentially cause
> debug messages or errors, and that should get most of them.

I did not chase this in detail, but it looked to me like there were two
code paths leading to heap_prepare_freeze_tuple, and only one of them
came from VACUUM. So adding a Relation parameter seemed like a more
promising fix for it. But possibly there are more error messages we
need to worry about besides this.

regards, tom lane
Re: pgsql: Avoid duplicate XIDs at recovery when building initial snapshot
(moving to -hackers)

On Sun, Oct 14, 2018 at 10:42:40AM -0700, Andres Freund wrote:
> I'm unhappy this approach was taken over objections. Without a real
> warning.

Oops, that was not clear to me. Sorry about that! I did not see you
objecting again after the last arguments I raised.

The end of PREPARE TRANSACTION has been designed like that since it was
introduced, so changing the way the dummy GXACT is inserted before the
main one is cleared from its XID does not sound wise to me. The current
design is also present for a couple of reasons, please see this thread:
https://www.postgresql.org/message-id/13905.1119109...@sss.pgh.pa.us
This has resulted in e26b0abd.

Among the things I considered are:
- Clearing the XID at the same time the dummy entry is added, which
  actually means holding ProcArrayLock longer while doing more at the
  end of prepare. I actually don't think you can do that cleanly
  without endangering the transaction visibility for other backends,
  and syncrep may cause the window to get wider.
- Changing GetRunningTransactionData() so that duplicates are removed
  at this stage. However this also requires holding ProcArrayLock for
  longer. For most deployments, if no dummy entries from 2PC
  transactions are present it would be possible to bypass the checks to
  remove the duplicated entries, but if at least one dummy entry is
  found it would be necessary to scan again the whole ProcArray, which
  would be most likely the case at each checkpoint with workloads like
  what Konstantin has mentioned in the original report. And
  ProcArrayLock is already a point of contention for many OLTP
  workloads with small transactions. So the performance argument
  worries me.

Speaking of which, I have looked at the performance of qsort, and for a
couple of thousand entries, we may not see any impact.
But I am not confident enough to say that it would be OK for all
platforms each time a standby snapshot is taken when ordering works on
4-byte elements, so the performance argument from Konstantin seems
quite sensible to me (see the quickly-hacked qsort_perf.c attached).

> Even leaving the crummyness aside, did you check other users of
> XLOG_RUNNING_XACTS, e.g. logical decoding?

I actually spent some time checking that, so it is not innocent.
SnapBuildWaitSnapshot() waits for transactions to commit or abort based
on the XIDs in the record. And that's the only place where those XIDs
are used, so it won't matter to wait twice for the same transaction to
finish. The error callback would be used only once.
--
Michael

/*
 * Measure performance of qsort:
 * qsort_perf number_elements
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int
int_compare(const void *arg1, const void *arg2)
{
	int		int1 = *(const int *) arg1;
	int		int2 = *(const int *) arg2;

	if (int1 > int2)
		return 1;
	if (int1 < int2)
		return -1;
	return 0;
}

int
main(int argc, char **argv)
{
	int		num_elts, i;
	int	   *elts;

	if (argc != 2)
	{
		fprintf(stderr, "Incorrect number of arguments\n");
		exit(1);
	}

	num_elts = atoi(argv[1]);
	elts = (int *) malloc(num_elts * sizeof(int));
	for (i = 0; i < num_elts; i++)
		elts[i] = random();

	fprintf(stdout, "Beginning %u\n", (unsigned) time(NULL));
	qsort(elts, num_elts, sizeof(int), int_compare);
	fprintf(stdout, "Finishing %u\n", (unsigned) time(NULL));
	exit(0);
}

signature.asc
Description: PGP signature
Re: relhassubclass and partitioned indexes
Hi,

On 2018/10/22 11:09, Michael Paquier wrote:
> On Fri, Oct 19, 2018 at 06:46:15PM +0900, Amit Langote wrote:
>> Thanks. Attached a patch to set relhassubclass when an index partition is
>> added to a partitioned index.
>
> Thanks, committed after adding a test with ALTER TABLE ONLY, and
> checking upgrades as well as ATTACH partition for ALTER INDEX and ALTER
> TABLE. In all cases the same code paths are taken.

Thank you for committing with those changes. I didn't know about create
index on "only".

>> /* update pg_inherits, if needed */
>> if (OidIsValid(parentIndexRelid))
>> +{
>> StoreSingleInheritance(indexRelationId, parentIndexRelid, 1);
>>
>> +/* Also, set the parent's relhassubclass. */
>> +SetRelationHasSubclass(parentIndexRelid, true);
>> +}
>
> Calling SetRelationHasSubclass() updates pg_class for this parent
> relation. We would need CommandCounterIncrement() if the tuple gets
> updated, but that's not the case as far as I checked for all code paths
> where this gets called. This would be seen immediately by the way..

I assume you're talking about avoiding getting into a situation that
results in "ERROR: tuple concurrently updated". Afaics,
acquire_inherited_sample_rows() "does" perform CCI, because as the
comment says the parent's pg_class row may already have been updated in
that case:

    /* CCI because we already updated the pg_class row in this command */
    CommandCounterIncrement();
    SetRelationHasSubclass(RelationGetRelid(onerel), false);

In all the other cases, SetRelationHasSubclass() seems to be the first
time that the parent's pg_class row is updated.

Thanks,
Amit
Re: relhassubclass and partitioned indexes
On Fri, Oct 19, 2018 at 06:46:15PM +0900, Amit Langote wrote: > Thanks. Attached a patch to set relhassubclass when an index partition is > added to a partitioned index. Thanks, committed after adding a test with ALTER TABLE ONLY, and checking upgrades as well as ATTACH partition for ALTER INDEX and ALTER TABLE. In all cases the same code paths are taken. > /* update pg_inherits, if needed */ > if (OidIsValid(parentIndexRelid)) > + { > StoreSingleInheritance(indexRelationId, parentIndexRelid, 1); > > + /* Also, set the parent's relhassubclass. */ > + SetRelationHasSubclass(parentIndexRelid, true); > + } Calling SetRelationHasSubclass() updates pg_class for this parent relation. We would need CommandCounterIncrement() if the tuple gets updated, but that's not the case as far as I checked for all code paths where this gets called. This would be seen immediately by the way.. -- Michael signature.asc Description: PGP signature
Re: found xmin x from before relfrozenxid y
Hi,

On 2018-10-21 10:24:16 -0400, Tom Lane wrote:
> Johannes Graën writes:
> > after upgrading to version 11, I see the error pattern "found xmin x
> > from before relfrozenxid y" in different databases on different hosts.
> > From https://www.postgresql.org/docs/10/static/release-10-3.html, I
> > learned that this was an error caused by pg_upgrade, which apparently
> > had been fixed in 10.3. This page also states that refreshing the
> > affected materialized view non-concurrently would fix the problem.
> > My question is now how to infer the affected materialized view from the
> > error message. Is there a way to tell which one to refresh from the xmin
> > or relfrozenxid value?
>
> No :-(. I wonder why in the world we didn't make that error message
> include the relation and block number the tuple was found in.

Because it was a really complicated bugfix already; I don't think the
answer is more complicated than that.

> (Well, I see the short answer: the code layer throwing the error
> doesn't know. But that could be fixed easily enough.)

I wonder if the better approach wouldn't be to add an errcontext for
vacuum, where we continually update the block number etc. There's
plenty of different sources of corruption that'd potentially cause
debug messages or errors, and that should get most of them.

Greetings,

Andres Freund
Re: More issues with pg_verify_checksums and checksum verification in base backups
Greetings, * Michael Paquier (mich...@paquier.xyz) wrote: > On Sun, Oct 21, 2018 at 11:03:30AM -0400, Stephen Frost wrote: > > This doesn't change my opinion of the bigger question though, which is > > to what extent we should be implicitly supporting extensions and > > whatever else putting files into the database and tablespace > > directories. > > Well, the whole point is that I have never seen either that it is > forbidden for extensions to drop files into global/ and/or base/. I am > pretty sure that I'd actually want to be able to do that myself by the > way. If I had pluggable storage APIs and the possibility to write by > myself a storage engine out-of-the-box, I would like to be able to rely > on the default tablespace path and use other tablespace paths. All of this pie-in-the-sky about what pluggable storage might have is just hand-waving, in my opinion, and not worth much more than that. I hope (and suspect..) that the actual pluggable storage that's being worked on doesn't do any of this "just drop a file somewhere" because there's a lot of downsides to it- and if it did, it wouldn't be much more than what we can do with an FDW, so why go through and add it? Considering the example given doesn't today do that (maybe it once did?) as you mentioned up-thread, it seems like maybe there was a realization that it's not a good idea even without this issue around pg_basebackup and pg_verify_checksums. > > Even if we go with this proposed change to look at the relation > > filename, I'd be happier with some kind of warning being thrown when we > > come across files we don't recognize in directories that aren't intended > > to have random files showing up. > > Yes, that could be something we could do, as an option I would guess as > this does not match with what v10 does. 
I'll wait for more people to > provide input on this thread before answering more, but if possible I > think that we should focus on fixing v11 and HEAD first, then try to > figure out what we'd want to do later on. pg_basebackup works the way it does in v10 because we've accepted letting users drop files into the root of PGDATA and even encouraged it to some extent. I don't think there was ever a serious intent that it would be used to back up an extension's self-created files on the filesystem in tablespaces, since there's no way for it to know how to do so in a way that would ensure that they're consistent or useful or sensible to backup online. What are you thinking the 'option' would look like? Everything I come up with seems awful confusing and not very sensible. Thanks! Stephen signature.asc Description: PGP signature
RE: WAL archive (archive_mode = always) ?
From: Adelino Silva [mailto:adelino.j.si...@googlemail.com] > What is the advantage to use archive_mode = always in a slave server compared > to archive_mode = on (shared WAL archive) ? > > I only see duplication of Wal files, what is the purpose of this feature ? This also saves you the network bandwidth by not sending the WAL archive from the primary to the standby(s). The network bandwidth can be costly between remote regions for disaster recovery. Regards Takayuki Tsunakawa
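For reference, a minimal sketch of what this looks like on the standby;
the archive directory path and the `test`/`cp` command are only
examples, not a recommendation:

```
# postgresql.conf on the standby
archive_mode = always     # also archive WAL received from the primary
archive_command = 'test ! -f /mnt/standby_archive/%f && cp %p /mnt/standby_archive/%f'
```

With archive_mode = on, in contrast, only a primary archives WAL, so a
remote standby that wants its own archive would otherwise have to pull
it over the network from the primary's archive.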
Re: More issues with pg_verify_checksums and checksum verification in base backups
On Sun, Oct 21, 2018 at 11:03:30AM -0400, Stephen Frost wrote: > This doesn't change my opinion of the bigger question though, which is > to what extent we should be implicitly supporting extensions and > whatever else putting files into the database and tablespace > directories. Well, the whole point is that I have never seen either that it is forbidden for extensions to drop files into global/ and/or base/. I am pretty sure that I'd actually want to be able to do that myself by the way. If I had pluggable storage APIs and the possibility to write by myself a storage engine out-of-the-box, I would like to be able to rely on the default tablespace path and use other tablespace paths. > Even if we go with this proposed change to look at the relation > filename, I'd be happier with some kind of warning being thrown when we > come across files we don't recognize in directories that aren't intended > to have random files showing up. Yes, that could be something we could do, as an option I would guess as this does not match with what v10 does. I'll wait for more people to provide input on this thread before answering more, but if possible I think that we should focus on fixing v11 and HEAD first, then try to figure out what we'd want to do later on. -- Michael signature.asc Description: PGP signature
Re: Patch to avoid SIGQUIT accident
I agree with your arguments. What if, instead, we added an option to disable this at compile time, or a setting in the psql CLI? On Sun, Oct 21, 2018, 20:20 Tom Lane wrote: > Renato dos Santos writes: > > I have been using psql for quite a few days. And I have accidentally > pressed the CTRL + \ keys that sends the signal QUIT+Coredump (CTRL+4 also > sends the same signal). > > I hope it's relevant to more people. (This has bothered me.) > > > this patch avoids the output of the CLI using ctrl + \ is the same as > ctrl + c > > I'm fairly confused about why this would be a good idea, for several > reasons: > > * If you have a tendency to make that typo, why would you want a fix that > only affects psql and not any of your other applications? (If you do > want the latter, there are already ways to do it, eg you could remap > SIGQUIT to some other key via stty, or disable core dumps via ulimit.) > > * If we put this in, what becomes of people who actually want a core dump, > eg for debugging? > > * SIGQUIT is a fairly well-known way to get out of an application when all > else fails. People who aren't familiar with psql's exit commands might > find it pretty unfriendly of us to block this off. > > regards, tom lane >
Re: Patch to avoid SIGQUIT accident
Renato dos Santos writes: > I have been using psql for quite a few days. And I have accidentally pressed > the CTRL + \ keys that sends the signal QUIT+Coredump (CTRL+4 also sends the > same signal). > I hope it's relevant to more people. (This has bothered me.) > this patch avoids the output of the CLI using ctrl + \ is the same as ctrl + c I'm fairly confused about why this would be a good idea, for several reasons: * If you have a tendency to make that typo, why would you want a fix that only affects psql and not any of your other applications? (If you do want the latter, there are already ways to do it, eg you could remap SIGQUIT to some other key via stty, or disable core dumps via ulimit.) * If we put this in, what becomes of people who actually want a core dump, eg for debugging? * SIGQUIT is a fairly well-known way to get out of an application when all else fails. People who aren't familiar with psql's exit commands might find it pretty unfriendly of us to block this off. regards, tom lane
Re: Pluggable Storage - Andres's take
On Thu, Oct 18, 2018 at 6:28 AM Haribabu Kommi wrote: > On Tue, Oct 16, 2018 at 6:06 AM Alexander Korotkov > wrote: >> * 0002-add-costing-function-to-API.patch – adds functions for costing >> sequential and table sample scans to the tableam API. The zheap costing >> functions are now copies of the heap costing functions. This should be >> adjusted in the future. > > This patch misses the new *_cost.c files that add the specific cost > functions. Thank you for noticing. A revised patchset is attached. >> Estimation for heap lookups during index scans >> should also be pluggable, but that is not yet implemented (TODO). > > Yes. Is it possible to use the same API that is added by the above > patch? I'm not yet sure. I'll elaborate more on that. I'd like to keep the number of costing functions small. Handling the costing of index scan heap fetches will probably require a function signature change. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company 0001-remove-extra-snapshot-functions-2.patch Description: Binary data 0002-add-costing-function-to-API-2.patch Description: Binary data
Re: [PATCH] XLogReadRecord returns pointer to currently read page
On 17/08/2018 06:47, Andrey Lepikhov wrote: I propose a patch to fix one small code defect. The XLogReadRecord() function reads the pages of a WAL segment that contain a WAL record. It then creates a readRecordBuf buffer in the backend's private memory and copies the record from those pages into it; the 'record' pointer is set to the beginning of the readRecordBuf buffer. But if the WAL record fits entirely on one page, the XLogReadRecord() function forgets to point "record" at the beginning of the readRecordBuf buffer. In this case, XLogReadRecord() returns a pointer to an internal xlog page. This patch fixes the defect. Hmm. I agree it looks weird the way it is. But I wonder, why do we even copy the record to readRecordBuf, if it fits on the WAL page? Returning a pointer to the xlog page buffer seems OK in that case. What if we remove the memcpy(), instead? - Heikki
Re: remove deprecated @@@ operator ?
Oleg Bartunov writes: > The commit 9b5c8d45f62bd3d243a40cc84deb93893f2f5122 is now 10+ years > old, may be we could remove deprecated @@@ operator ? Is it actually causing any problem? AFAICS it's just a couple extra pg_operator entries, so why not leave it? I'd be +1 for removing it from the docs, though ... regards, tom lane
remove deprecated @@@ operator ?
Hello, The commit 9b5c8d45f62bd3d243a40cc84deb93893f2f5122 is now 10+ years old; maybe we could remove the deprecated @@@ operator? Author: Tom Lane Date: Mon Apr 14 17:05:34 2008 + Push index operator lossiness determination down to GIST/GIN opclass "consistent" functions, and remove pg_amop.opreqcheck, as per recent discussion. The main immediate benefit of this is that we no longer need 8.3's ugly hack of requiring @@@ rather than @@ to test weight-using tsquery searches on GIN indexes. In future it should be possible to optimize some other queries better than is done now, by detecting at runtime whether the index match is exact or not. Tom Lane, after an idea of Heikki's, and with some help from Teodor. Best Regards, Oleg -- Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: SQL:2011 PERIODS vs Postgres Ranges?
On Sun, Oct 21, 2018 at 12:11 PM Heikki Linnakangas wrote: > On 21/10/2018 21:17, Paul A Jungwirth wrote: > > 3. Build our own abstractions on top of ranges, and then use those to > > implement PERIOD-based features. > +1 on this approach. I think [7] got the model right. If we can > implement SQL-standard PERIODs on top of it, then that's a bonus, but > having sane, flexible, coherent set of range operators is more important > to me. Okay, I'm surprised to hear from you and Isaac that following the standard isn't as important as I thought, but I'm certainly pleased not to make it the focus. I just thought that Postgres's reputation was to be pretty careful about sticking to it. (I think we could still add a standard-compliant layer, but like you I don't feel a duty to suffer from it.) It sounds like I should work out some proposed function signatures and write up how to use them, and see what people think. Is that a useful approach? > What are we missing? Here are a few big ones: 1. Define temporal primary keys and foreign keys that are known to the database catalog and controlled as higher-level objects. For instance I wrote an extension at https://github.com/pjungwir/time_for_keys to create temporal foreign keys, but the database isn't "aware" of them. That means they are more cluttered in `\d foo` than necessary (you see the trigger constraints instead of something about a foreign key), they don't automatically disappear if you drop the column, it is hard to make them "polymorphic" (My extension supports only int+tstzrange.), they don't validate that the referenced table has a declared temporal PK, they probably have slightly different locking/transaction semantics than the real RI code, etc. This is what I'd like to implement right now. 2. System time: automatically track DML changes to the table, and let you query "as of" a given time. 3. Temporal joins. 
I don't want to tackle this myself, because there is already an amazing proposed patch that does everything we could ask for at https://www.postgresql-archive.org/PROPOSAL-Temporal-query-processing-with-range-types-tt5913058.html (recently updated btw, so I hope someone will look at it!). 4. Temporal UPDATE/DELETE: these should be converted to instead change the end time of old rows and insert new rows with the changed attributes. I'm interested in implementing this too, but one thing at a time. . . . I really appreciate your sharing your thoughts! Paul
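Since both PERIODs and Postgres ranges use closed-open [start, end) semantics, the overlap test underlying temporal keys (the && operator that the exclusion-constraint approach relies on) reduces to two comparisons. A minimal sketch, using doubles with INFINITY standing in for unbounded endpoints, in the spirit of the 'infinity' timestamp values suggested upthread instead of sentinel dates:

```c
#include <math.h>
#include <stdbool.h>

/*
 * Closed-open [start, end) overlap -- the test behind && on tstzrange,
 * and what a temporal-PK exclusion constraint enforces pairwise.
 * INFINITY / -INFINITY model unbounded endpoints.
 */
bool
ranges_overlap(double a_start, double a_end, double b_start, double b_end)
{
	/* each range must start strictly before the other one ends */
	return a_start < b_end && b_start < a_end;
}
```

Note how closed-open semantics make adjacent periods like [1,5) and [5,9) *not* overlap, which is exactly what lets a temporal primary key split a row's history at an instant without a conflict.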
Re: SQL:2011 PERIODS vs Postgres Ranges?
On 21/10/2018 21:17, Paul A Jungwirth wrote: 3. Build our own abstractions on top of ranges, and then use those to implement PERIOD-based features. This is the least clear option, and I imagine it would require a lot more design effort. Our range types are already a step in this direction. Does anyone think this approach has promise? If so I can start thinking about how we'd do it. I imagine we could use a lot of the ideas in [7]. ... [7] C. J. Date, Hugh Darwen, Nikos Lorentzos. Time and Relational Theory, Second Edition: Temporal Databases in the Relational Model and SQL. 2nd edition, 2014. +1 on this approach. I think [7] got the model right. If we can implement SQL-standard PERIODs on top of it, then that's a bonus, but having sane, flexible, coherent set of range operators is more important to me. What are we missing? It's been years since I read that book, but IIRC temporal joins is one thing, at least. What features do you have in mind? - Heikki
Re: SQL:2011 PERIODS vs Postgres Ranges?
On Sun, 21 Oct 2018 at 14:18, Paul A Jungwirth wrote: > Also, just how strictly do we have to follow the standard? Requiring > sentinels like '01 JAN 3000` just seems so silly. Could Postgres > permit nullable start/end PERIOD columns, and give them the same > meaning as ranges (unbounded)? Even if I forgot about ranges > altogether, I'd sure love to avoid these sentinels. > We have "infinity" and "-infinity" values in our date and timestamp types: https://www.postgresql.org/docs/current/static/datatype-datetime.html I think this avoids the silliness with sentinel values. For myself, I don't care about PERIOD etc. one bit. The "every new capability gets its own syntax" model that SQL follows is very old-fashioned, and for good reason. I'm happy with ranges and exclusion constraints. But if we can provide an implementation of PERIOD that makes it easier to port applications written for legacy database systems, it might be worthwhile.
SQL:2011 PERIODS vs Postgres Ranges?
Hello, I'm interested in contributing some temporal database functionality to Postgres, starting with temporal primary and foreign keys. I know some other folks nearby interested in helping out, too. But before we begin I'd like to ask the community about complying with the SQL:2011 standard [1] for these things. In SQL:2011, temporal features all build upon PERIODs, which are a new concept you can attach to tables. Each PERIOD is composed of a start column and an end column (both of some date/time type). You define PERIODs when you CREATE TABLE or ALTER TABLE. Then you refer to the periods when you create primary keys or foreign keys to make them temporal. There are also a handful of new operators for testing two ranges for overlap/succession/etc.[2] Most PERIODs are for tracking the history of a *thing* over time, but if the PERIOD is named SYSTEM_TIME it instead tracks the history of changes to *your database*.[3] (Google for "bitemporal" to read more about this.) Personally I think PERIODs are quite disappointing. They are not part of relational theory. They are not a column, but something else. If you say `SELECT * FROM t` you don't get `PERIODs` (as far as I can tell). But you can mention PERIODs approximately wherever you can mention columns [4], so now we have to support them when projecting, selecting, joining, aggregating, etc. (Or if we are permitted to not support them in some of those places, isn't that even worse?) You can see that PERIODs share a lot with Postgres's own range types. But ranges are a real column, requiring no special-case behavior, either for RDBMS implementers or SQL users. They have a richer set of operators.[5] They don't require any special declarations to put them in a table. They aren't limited to just date/time types. You can even define new range types yourself (e.g. I've found it helpful before to define inetrange and floatrange). 
Also the start/end columns of a PERIOD must be not nullable,[6] so that unbounded ranges must use sentinels like `01 JAN ` or `01 JAN 3000` instead. Also there is no way (as far as I can tell) to define and use a period within a subquery or CTE or view. Many of these criticisms of PERIODs you can find in [7], pages 403 - 410 (where "interval" means basically our own range types), plus others: for example PERIODs are always closed/open, you can only have a single application PERIOD per table, they are wordy, etc. I expect that any Postgres implementation of the standard would wind up using ranges internally. For example a temporal primary key would use an exclusion constraint based on a range expression, so if you had a PERIOD defined on columns named `valid_start` and `valid_end`, the PK would use something like `EXCLUDE USING gist (id WITH =, tstzrange(valid_start, valid_end) WITH &&)`. Also the new SQL:2011 operators would be easy to implement on top of our range operators. And then a temporal foreign key implementation would use either those or raw range operators. So is there any way for Postgres to offer the same temporal features, but give users the choice of using either PERIODs or ranges? If we built that, would the community be interested in it? I think there are several possible ways to go about it: 1. Permit defining PERIODs on either a start/end column pair, or an existing range column. Then everything else continues to use PERIODs. This seems tidy to implement, although since it acquiesces to the PERIOD-based approach for temporal functionality, it doesn't solve all the problems above. Also as [9] points out, it would lead to incompatibilities in the new `information_schema` views. E.g. `periods` is supposed to have `start_column_name` and `end_column_name` columns.[8] 2. Permit either ranges or PERIODs in the new syntax, e.g. `PRIMARY KEY (id, valid_at WITHOUT OVERLAPS)` where `valid_at` is either a PERIOD or a range column. 
Similarly with foreign keys. There is probably some `information_schema` messiness here too, but perhaps less than with #1. This seems like a great alternative to application-time PERIODs, but I'm not sure how you'd tell Postgres to use a range column for the system-time dimension.[3] Perhaps just a function, and then the PERIOD of `SYSTEM_TIME` would call that function (with a range expression). 3. Build our own abstractions on top of ranges, and then use those to implement PERIOD-based features. This is the least clear option, and I imagine it would require a lot more design effort. Our range types are already a step in this direction. Does anyone think this approach has promise? If so I can start thinking about how we'd do it. I imagine we could use a lot of the ideas in [7]. 4. Just give up and follow the standard to the letter. I'm not enthusiastic about this, but I also really want temporal features, so I might still do the work if that's what folks preferred. Left to my own devices I would probably go with a mix of #2 & #3, where temporal functionality is exposed by a layer of public
Patch to avoid SIGQUIT accident
Hello, I have been using psql for quite a few days, and I have accidentally pressed the CTRL + \ keys, which send the signal QUIT+Coredump (CTRL+4 also sends the same signal). I hope it's relevant to more people. (This has bothered me.) This patch stops Ctrl+\ from exiting the CLI, making it behave the same as Ctrl+C. Renato dos Santos shaz...@gmail.com psql.patch Description: Binary data smime.p7s Description: S/MIME cryptographic signature
Re: More issues with pg_verify_checksums and checksum verification in base backups
Greetings, * Michael Paquier (mich...@paquier.xyz) wrote: > This is a follow-up of the following thread: > https://www.postgresql.org/message-id/20181012010411.re53cwcistcpi...@alap3.anarazel.de Thanks for starting this thread, Michael. > pg_verify_checksums used first a blacklist to decide if files have > checksums or not. So with this approach all files should have checksums > except files like pg_control, pg_filenode.map, PG_VERSION, etc. So, this is a great example- pg_control actually *does* have a CRC and we really should be checking it in tools like pg_verify_checksums and pg_basebackup. Similarly, we should be trying to get to a point where we have checksums for anything else that we actually care about. > d55241a has added a first fix to silence the buildfarm for EXEC_BACKEND > builds. After discussion, Andres has pointed out that some extensions > may want to drop files within global/ or base/ as part of their logic. > cstore_fdw was mentioned but if you look at their README that's not the > case. So the one example that's been put forward doesn't actually do this..? > However, I think that Andres argument is pretty good as with > pluggable storage we should allow extensions to put custom files for > different tablespaces. Andres wasn't arguing, that I saw at least, that pluggable storage would result in random files showing up in tablespace directories that the core code has no knowledge of. In fact, he seemed to be saying in 20181019205724.ewnuhvrsanaci...@alap3.anarazel.de that pluggable storage results in the core code being aware of the files that those other storage mechanisms create, so this entire line of argument seems to be without merit. > So this commit has changed the logic of > pg_verify_checksums to use a whitelist, which assumes that only normal > relation files have checksums: > > <digits> > <digits>.<segment> > <digits>_<forkname> > <digits>_<forkname>.<segment>
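For illustration, the whitelist patterns quoted above amount to a filename-shape check along these lines (a hedged sketch, not the committed code; the fork-name suffixes _fsm, _vm, and _init follow PostgreSQL's relation-fork naming):

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Sketch of the whitelist idea: a file participates in checksum
 * verification only if its name has the shape of a relation data file,
 *     <digits>[_<forkname>][.<segment>]
 * where <forkname> is one of fsm, vm, init.
 */
bool
looks_like_relation_file(const char *name)
{
	const char *p = name;

	if (!isdigit((unsigned char) *p))	/* must start with the relfilenode */
		return false;
	while (isdigit((unsigned char) *p))
		p++;

	if (*p == '_')			/* optional fork name */
	{
		if (strncmp(p, "_fsm", 4) == 0)
			p += 4;
		else if (strncmp(p, "_init", 5) == 0)
			p += 5;
		else if (strncmp(p, "_vm", 3) == 0)
			p += 3;
		else
			return false;
	}

	if (*p == '.')			/* optional segment number */
	{
		p++;
		if (!isdigit((unsigned char) *p))
			return false;
		while (isdigit((unsigned char) *p))
			p++;
	}

	return *p == '\0';		/* nothing may trail the pattern */
}
```

Stephen's objection below is precisely that a check like this says nothing about files that *fail* it: they are silently skipped rather than flagged.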
> After more discussion on the thread mentioned above, Stephen has pointed > out that base backups use the same blacklist logic when verifying > checksums. This has the same problem with EXEC_BACKEND-specific files, > resulting in spurious warnings (that's an example!) which are confusing > for the user: > $ pg_basebackup -D popo > WARNING: cannot verify checksum in file "./global/config_exec_params", > block 0: read buffer size 5 and page size 8192 differ Stephen also extensively argued that extensions shouldn't be dropping random files into PG's database and tablespace directories and that we shouldn't be writing code which attempts to work around that (and, indeed, ultimately can't, since technically extension authors could drop files into those directories which match these "whitelist patterns"). Further, using a whitelist risks possibly missing files that should be included, unlike having an exclude list which ensures that we actually check everything and complain about anything found that's out of the ordinary. Additionally, having these checks would hopefully make it clear that people shouldn't be dropping random files into our database and tablespace directories- something we didn't try to do for the root of the data directory, resulting in, frankly, a big mess. Allowing random files to exist in the data directory has led to quite a few issues, including things like pg_basebackup breaking because of a root-owned file or similar that can't be read, and this extends that. I thought the point of this new thread was to encourage discussion of that point and the pros and cons seen for each side, not to only include one side of it, so I'm rather disappointed. > Folks on this thread agreed that both pg_verify_checksums and checksum > verification for base backups should use the same logic.
> It seems to me > that using a whitelist-based approach for both is easier to maintain as > the patterns of files supporting checksums is more limited than files > which do not support checksums. So this way we allow extensions > bundling custom files to still work with pg_verify_checksums and > checksum verification in base backups. This is not an accurate statement- those random files which some extension drops into these directories don't "work" with pg_verify_checksums, this just makes pg_verify_checksums ignore them. That is not the same thing at all. Worse, we end up with things like pg_basebackup copying/backing up those files, but skipping them for validation checking, when we have no reason to expect that those files will be at all valid when they're copied and no way to see if they are valid in the resulting restore. Other parts of the system will continue to similarly do things that people might not expect- DROP DATABASE will happily nuke any file it finds, no matter if it matches these patterns or not. Basically, the way I see it, at least, is that we should either maintain that PG's database and tablespace directories are under the purview of PG and other things shouldn't be putting files there without the core code's knowledge, or we decide that it's ok for ran
Re: found xmin x from before relfrozenxid y
Johannes Graën writes: > after upgrading to version 11, I see the error pattern "found xmin x > from before relfrozenxid y" in different databases on different hosts. > From https://www.postgresql.org/docs/10/static/release-10-3.html, I > learned that this was an error caused by pg_upgrade, which apparently > had been fixed in 10.3. This page also states that refreshing the > affected materialized view non-concurrently would fix the problem. > My question is now how to infer the affected materialized view from the > error message. Is there a way to tell which one to refresh from the xmin > or relfrozenxid value? No :-(. I wonder why in the world we didn't make that error message include the relation and block number the tuple was found in. (Well, I see the short answer: the code layer throwing the error doesn't know. But that could be fixed easily enough.) In the meantime, the only answer I can think of offhand is to manually do VACUUM FREEZE on each of your MVs, and then refresh anything that shows up with an error. regards, tom lane
More issues with pg_verify_checksums and checksum verification in base backups
Hi all, This is a follow-up of the following thread: https://www.postgresql.org/message-id/20181012010411.re53cwcistcpi...@alap3.anarazel.de In a nutshell, b34e84f1 has added TAP tests for pg_verify_checksums, and the buildfarm has immediately complained about files generated by EXEC_BACKEND missing, causing pg_verify_checksums to fail for such builds. An extra issue has been noticed by Andres in the tool: it fails to check for temporary files, hence files like base/pgsql_tmp/pgsql_tmp$PID.NN.sharedfileset/1.1 can also cause the tool to fail. After a crash, those files would not be removed, and stopping the instance would still leave them around. pg_verify_checksums first used a blacklist to decide if files have checksums or not. So with this approach all files should have checksums except files like pg_control, pg_filenode.map, PG_VERSION, etc. d55241a has added a first fix to silence the buildfarm for EXEC_BACKEND builds. After discussion, Andres has pointed out that some extensions may want to drop files within global/ or base/ as part of their logic. cstore_fdw was mentioned, but if you look at their README that's not the case. However, I think that Andres' argument is pretty good, as with pluggable storage we should allow extensions to put custom files in different tablespaces. So this commit has changed the logic of pg_verify_checksums to use a whitelist, which assumes that only normal relation files have checksums:

<digits>
<digits>.<segment>
<digits>_<forkname>
<digits>_<forkname>.<segment>

diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index b20f6c379c..207e2facb8 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -72,7 +72,6 @@
 static void parse_basebackup_options(List *options, basebackup_options *opt);
 static void SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli);
 static int compareWalFileNames(const void *a, const void *b);
 static void throttle(size_t increment);
-static bool is_checksummed_file(const char *fullpath, const char *filename);

 /* Was the backup currently in-progress initiated in recovery mode? */
 static bool backup_started_in_recovery = false;
@@ -187,17 +186,6 @@ static const char *excludeFiles[] =
 	NULL
 };

-/*
- * List of files excluded from checksum validation.
- */
-static const char *const noChecksumFiles[] = {
-	"pg_control",
-	"pg_filenode.map",
-	"pg_internal.init",
-	"PG_VERSION",
-	NULL,
-};
-
 /*
  * Called when ERROR or FATAL happens in perform_base_backup() after
@@ -1101,8 +1089,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
 		/* Exclude all forks for unlogged tables except the init fork */
 		if (isDbDir &&
-			parse_filename_for_nontemp_relation(de->d_name, &relOidChars,
-												&relForkNum))
+			ParseRelationFileName(de->d_name, &relOidChars, &relForkNum))
 		{
 			/* Never exclude init forks */
 			if (relForkNum != INIT_FORKNUM)
@@ -1312,32 +1299,6 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
 	return size;
 }

-/*
- * Check if a file should have its checksum validated.
- * We validate checksums on files in regular tablespaces
- * (including global and default) only, and in those there
- * are some files that are explicitly excluded.
- */
-static bool
-is_checksummed_file(const char *fullpath, const char *filename)
-{
-	const char *const *f;
-
-	/* Check that the file is in a tablespace */
-	if (strncmp(fullpath, "./global/", 9) == 0 ||
-		strncmp(fullpath, "./base/", 7) == 0 ||
-		strncmp(fullpath, "/", 1) == 0)
-	{
-		/* Compare file against noChecksumFiles skiplist */
-		for (f = noChecksumFiles; *f; f++)
-			if (strcmp(*f, filename) == 0)
-				return false;
-
-		return true;
-	}
-	else
-		return false;
-}

 /*
  * Functions for handling tar file format
@@ -1391,13 +1352,15 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
 	char	   *filename;

 	/*
-	 * Get the filename (excluding path).  As last_dir_separator()
-	 * includes the last directory separator, we chop that off by
-	 * incrementing the pointer.
+	 * Check if a given file should have its checksums verified or not.
+	 * Only relation files should have this property now.  First get the
+	 * filename (excluding path).  As last_dir_separator() includes the
+	 * last directory separator, we chop that off by incrementing the
+	 * pointer.
 	 */
 	filename = last_dir_separator(readfilename) + 1;

-	if (is_checksummed_file(readfilename, filename))
+	if (ParseRelationFileName(filename, NULL, NULL))
 	{
 		verify_checksum = true;
diff --git a/src/backend/storage/file/reinit.c b/src/backend/storage/file/reinit.c
index 74ff6c359b..eae32e9a5d 100644
--- a/src/backend/storage/file/reinit.c
+++ b/src/backend/storage/file/reinit.c
@@ -186,8 +186,7 @@ ResetUnloggedRelationsInDbspaceDir(const char *dbspacedirname, int op)
 		unlogged_relation_entry ent;

 		/* Skip anything that doesn't look like a relation data file. */
-		if (!parse_fil