Re: Track IO times in pg_stat_io
Hi,

On 3/9/23 1:34 AM, Andres Freund wrote:
> Hi,
>
> On 2023-03-08 12:55:34 +0100, Drouvot, Bertrand wrote:
>> On 3/7/23 7:47 PM, Andres Freund wrote:
>>> On 2023-03-07 13:43:28 -0500, Melanie Plageman wrote:
>>> No, I don't think we can do that. It can be enabled on a per-session
>>> basis.
>>
>> Oh right. So it's even less clear to me how one would make use of those
>> new *_time fields, given that:
>>
>> - pg_stat_io is "global" across all sessions. So, even if one session is
>>   doing some "testing" and needs to turn track_io_timing on, it is not
>>   even sure it's only reflecting its own testing (as other sessions may
>>   have turned it on too).
>
> I think for 17 we should provide access to per-existing-connection
> pg_stat_io stats, and also provide a database aggregated version. Neither
> should be particularly hard.

+1, that would be great.

> I don't think it's particularly useful to use the time to calculate "per
> IO" costs - they can vary *drastically* due to kernel level buffering.

Exactly, and I think that's the reason why it could be useful. I think that
could help (with frequent enough sampling) to try to identify when the IOs
are served by the page cache or not (if one knows their infra well enough).
One could say (for example, depending on their environment) that if the
read time is > 4ms then the IO is served by spinning disks (if any), and if
it is well below 1ms then by the page cache. What I mean is that one could
try to characterize their IOs based on thresholds that they define.

Adding/reporting histograms would be even better: something we could look
at for 17?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
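To illustrate the kind of sampling described above, here is a sketch of a monitoring query (column names as in the pg_stat_io view; the *_time columns are only populated while track_io_timing is enabled, and per-IO averages can be heavily skewed by kernel-level buffering, so any threshold is environment-specific):

```sql
-- Average time per read IO, by backend type and IO context. An average
-- well above a few ms may suggest reads served by slow storage; an average
-- far below 1 ms suggests the kernel page cache. Thresholds must be
-- calibrated for the specific environment.
SELECT backend_type, object, context, reads,
       round((read_time / NULLIF(reads, 0))::numeric, 3) AS avg_read_ms
FROM pg_stat_io
WHERE reads > 0
ORDER BY avg_read_ms DESC NULLS LAST;
```

Sampling this periodically and diffing the cumulative counters would give per-interval averages, which is closer to the histogram idea mentioned above.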
Re: pg_upgrade and logical replication
Hi,

On Thu, Mar 09, 2023 at 12:05:36PM +0530, Amit Kapila wrote:
> On Wed, Mar 8, 2023 at 12:26 PM Julien Rouhaud wrote:
> >
> > Is there something that can be done for pg16? I was thinking that
> > having a fix for the normal and easy case could be acceptable: only
> > allowing pg_upgrade to optionally, and not by default, preserve the
> > subscription relations IFF all subscriptions only have tables in ready
> > state. Different states should be transient, and it's easy to check as
> > a user beforehand and also easy to check during pg_upgrade, so it
> > seems like an acceptable limitation (which I personally see as a good
> > sanity check, but YMMV). It could be lifted in later releases if
> > wanted anyway.
> >
> > It's unclear to me whether this limited scope would also require
> > preserving the replication origins, but having looked at the code I
> > don't think it would be much of a problem as the local LSN doesn't
> > have to be preserved.
>
> I think we need to preserve replication origins as they help us to
> determine the WAL location from where to start the streaming after the
> upgrade. If we don't preserve those then from which location will the
> subscriber start streaming?

It would start from the slot's information on the publisher side, but I
guess there's no guarantee that this will be accurate in all cases.

> We don't want to replicate the WAL which has already been sent.

Yeah, I agree. I added support to also preserve the subscription's
replication origin information, a new --preserve-subscription-state
(better naming welcome) documented option for pg_upgrade to optionally ask
for this new mode, and a similar (but undocumented) option for pg_dump
that only works with --binary-upgrade, and added a check in pg_upgrade
that all relations are in 'r' (ready) state.

Patch v2 attached.
>From 0a77ac305243e0f58dbfce6bb7c8cf062b45d4f4 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud
Date: Wed, 22 Feb 2023 09:19:32 +0800
Subject: [PATCH v2] Optionally preserve the full subscription's state during
 pg_upgrade

Previously, only the subscription metadata information was preserved.
Without the list of relations and their state, it's impossible to
re-enable the subscriptions without missing some records, as the list of
relations can only be refreshed after enabling the subscription (and
therefore starting the apply worker).  Even if we added a way to refresh
the subscription while enabling a publication, we still wouldn't know
which relations are new on the publication side, and therefore should be
fully synced, and which shouldn't.  Similarly, the subscription's
replication origins are needed to ensure that we don't replicate anything
twice.

To fix this problem, this patch teaches pg_dump in binary upgrade mode to
emit additional commands to restore the content of pg_subscription_rel,
and an additional LSN parameter in the subscription creation to restore
the underlying replication origin remote LSN.  The LSN parameter is only
accepted in CREATE SUBSCRIPTION in binary upgrade mode.

The new ALTER SUBSCRIPTION subcommand, usable only during binary upgrade,
has the following syntax:

    ALTER SUBSCRIPTION name ADD TABLE (relid = XYZ, state = 'x' [, lsn = 'X/Y'])

The relation is identified by its oid, as it's preserved during
pg_upgrade.  The lsn is optional, and defaults to NULL / InvalidXLogRecPtr.

This mode is optional and not enabled by default.  A new
--preserve-subscription-state option is added to pg_upgrade to use it.
For now, pg_upgrade will check that all the subscription relations are in
'r' (ready) state, and will error out if any subscription relation in any
database has a different state, logging the list of problematic databases
with the number of problematic relations in each.
Author: Julien Rouhaud
Reviewed-by: FIXME
Discussion: https://postgr.es/m/20230217075433.u5mjly4d5cr4hcfe@jrouhaud
---
 doc/src/sgml/ref/pgupgrade.sgml         |  13 +++
 src/backend/commands/subscriptioncmds.c |  67 +-
 src/backend/parser/gram.y               |  11 +++
 src/bin/pg_dump/pg_backup.h             |   2 +
 src/bin/pg_dump/pg_dump.c               | 114 +++-
 src/bin/pg_dump/pg_dump.h               |  13 +++
 src/bin/pg_upgrade/check.c              |  54 +++
 src/bin/pg_upgrade/dump.c               |   3 +-
 src/bin/pg_upgrade/option.c             |   7 ++
 src/bin/pg_upgrade/pg_upgrade.h         |   1 +
 src/include/nodes/parsenodes.h          |   3 +-
 11 files changed, 283 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 7816b4c685..aef3b8a8b8 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -240,6 +240,19 @@ PostgreSQL documentation
+
+  --preserve-subscription-state
+
+   Fully preserve the logical subscription state if any. That includes
+   the underlying replication origins with their remote LSN
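To make the above concrete, here is a sketch of the readiness check and the new subcommand (all object names, OIDs, and LSNs are invented; the ALTER SUBSCRIPTION syntax follows the commit message above):

```sql
-- Hypothetical pre-upgrade sanity check, run per database: any row
-- returned here would make pg_upgrade --preserve-subscription-state
-- error out, since only the 'r' (ready) state is accepted.
SELECT s.subname, sr.srrelid::regclass, sr.srsubstate
FROM pg_subscription_rel sr
JOIN pg_subscription s ON s.oid = sr.srsubid
WHERE sr.srsubstate <> 'r';

-- The binary-upgrade-only subcommand emitted by pg_dump, per the syntax
-- above (relid, state, and lsn are invented values):
ALTER SUBSCRIPTION mysub ADD TABLE (relid = 16384, state = 'r', lsn = '0/12345A0');
```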
Re: ICU locale validation / canonicalization
On 28.02.23 06:57, Jeff Davis wrote:
> On Mon, 2023-02-20 at 15:23 -0800, Jeff Davis wrote:
>> New patch attached. The new patch also includes a GUC that (when
>> enabled) validates that the collator is actually found.
>
> New patch attached. Now it always preserves the exact locale string
> during pg_upgrade, and does not attempt to canonicalize it. Before it
> was trying to be clever by determining if the language tag was finding
> the same collator as the original string -- I didn't find a problem with
> that, but it just seemed a bit too clever. So, only newly-created
> locales and databases have the ICU locale string canonicalized to a
> language tag.
>
> Also, I added a SQL function pg_icu_language_tag() that can convert
> locale strings to language tags, and check whether they exist or not.

This patch appears to do about three things at once, and it's not clear
exactly where the boundaries are between them and which ones we might
actually want.  And I think the terminology also gets mixed up a bit,
which makes following this harder.

1. Canonicalizing the locale string.  This is presumably what
uloc_canonicalize() does, which the patch doesn't actually use.  What are
examples of what this does?  Does the patch actually do this?

2. Converting the locale string to BCP 47 format.  This converts
'de@collation=phonebook' to 'de-u-co-phonebk'.  This is what
uloc_getLanguageTag() does.

3. Validating the locale string, to reject faulty input.

What are the relationships between these?  I don't understand how the
validation actually happens in your patch.  Does uloc_getLanguageTag() do
the validation also?  Can you do canonicalization without converting to
language tag?  Can you do validation of un-canonicalized locale names?

What is the guidance for the use of the icu_locale_validation GUC?  The
description throws in yet another term: "validates that ICU locale strings
are well-formed".  What is "well-formed"?  How does that relate to the
other concepts?
Personally, I'm not on board with this behavior:

=> CREATE COLLATION test (provider = icu, locale = 'de@collation=phonebook');
NOTICE:  0: using language tag "de-u-co-phonebk" for locale "de@collation=phonebook"

I mean, maybe that is a thing we want to do somehow sometime, to migrate
people to the "new" spellings, but the old ones aren't wrong.  So this
should be a separate consideration, with an option, and it would require
various updates in the documentation.

It also doesn't appear to address how to handle ICU before version 54.
But, see earlier questions, are these three things all connected somehow?
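For reference, a sketch of how the proposed pg_icu_language_tag() function from the patch would be used (the function exists only in the patch under discussion, not in any released PostgreSQL; the expected mapping is the one shown in the NOTICE above):

```sql
-- Hypothetical usage of the patch's conversion function: turn an
-- old-style ICU locale string into a BCP 47 language tag.
SELECT pg_icu_language_tag('de@collation=phonebook');
-- per the discussion above, this would yield: de-u-co-phonebk
```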
Re: Allow tests to pass in OpenSSL FIPS mode
On 08.03.23 10:37, Daniel Gustafsson wrote: The comment in question was missed in 55fe26a4b58, but I agree that it's a false claim given the OpenSSL implementation so removing or at least mimicking the comments in cryptohash_openssl.c would be better. I have fixed these comments to match cryptohash_openssl.c.
Re: Doc: Rework contrib appendix -- informative titles, tweaked sentences
On 2023-Jan-22, Karl O. Pinc wrote:

> Actually, this CSS, added to doc/src/sgml/stylesheet.css, makes the
> column spacing look pretty good:
>
> /* Adequate spacing between columns in a simplelist non-inline table */
> .simplelist td { padding-left: 2em; padding-right: 2em; }

Okay, this looks good to me too.  However, for it to actually work, we
need to patch the corresponding CSS file in the pgweb repository too.
I'll follow up in the other mailing list.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
Re: Time delayed LR (WAS Re: logical replication restrictions)
At Thu, 9 Mar 2023 11:00:46 +0530, Amit Kapila wrote in
> On Wed, Mar 8, 2023 at 9:20 AM Masahiko Sawada wrote:
> >
> > On Wed, Mar 8, 2023 at 3:30 AM Nathan Bossart wrote:
> > >
> > > IMO the current set of trade-offs (e.g., unavoidable bloat and WAL
> > > buildup) would make this feature virtually unusable for a lot of
> > > workloads, so it's probably worth exploring an alternative approach.
> >
> > It might require more engineering effort for alternative approaches
> > such as one I proposed but the feature could become better from the
> > user perspective. I also think it would be worth exploring it if we've
> > not yet.
>
> Fair enough. I think as of now most people think that we should
> consider alternative approaches for this feature. The two ideas at a

If we can notify the subscriber of the transaction start time, will that
solve the current problem?  If not, or if it is not possible, +1 to look
for other solutions.

> high level are that the apply worker itself first flushes the decoded
> WAL (maybe only when time-delay is configured) or have a separate
> walreceiver process as we have for standby. I think we need to analyze
> the pros and cons of each of those approaches and see if they would be
> useful even for other things on the apply side.

My understanding of the requirements here is that the publisher should not
hold changes, the subscriber should not hold data reads, and all
transactions, including two-phase ones, should be applied at once upon
committing.  Both sides need to respond to requests from the other side.
We expect an apply delay of several hours or more.

My thoughts considering the requirements are as follows: if we expect
delays of several hours or more, I don't think it's feasible to stack
received changes in the process memory.  So, if apply-delay is in effect,
I think it would be better to process transactions through files
regardless of the process configuration.
I'm not sure whether we should have a separate process for protocol
processing.  On one hand, it would simplify the protocol-processing part,
but on the other hand, changes would always have to be applied through
files.  If we plan to integrate the paths with and without apply-delay by
the file-passing method, this might work.  If we want to maintain the
current approach when apply-delay is not in effect, I think we would have
to implement it in a single process, but I feel the protocol-processing
part could become complicated.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
Re: Move defaults toward ICU in 16?
On 08.03.23 06:55, Jeff Davis wrote: On Fri, 2023-03-03 at 21:45 -0800, Jeff Davis wrote: 0002: update template0 in new cluster (as described above) I think 0002 is about ready and I plan to commit it soon unless someone has more comments. 0002 seems fine to me. I'm holding off on 0001 for now, because you objected. But I still think 0001 is a good idea so I'd like to hear more before I withdraw it. Let's come back to that after dealing with the other two. 0003: default initdb to use ICU This is also about ready, and I plan to commit this soon after 0002. This seems mostly ok to me. I have a few small comments. +default, ICU obtains the ICU locale from the ICU default collator. This seems to be a fancy way of saying, the default ICU locale will be set to something that approximates what you have set your operating system to. Which is what we want, I think. Can we say this in a more user-friendly way? +static void +check_icu_locale() should be check_icu_locale(void) + if (U_ICU_VERSION_MAJOR_NUM >= 54) + { If we're going to add more of these mysterious version checks, could we add a comment about what they are for? However, I suspect what this chunk is doing is some sort of canonicalization/language-tag conversion, which per the other thread, I have some questions about. How about for this patch, we skip this part and just do the else branch + icu_locale = pg_strdup(default_locale); and then put the canonicalization business into the canonicalization patch set?
Re: Time delayed LR (WAS Re: logical replication restrictions)
On Thu, Mar 9, 2023 at 2:56 PM Kyotaro Horiguchi wrote:
>
> At Thu, 9 Mar 2023 11:00:46 +0530, Amit Kapila wrote in
> > On Wed, Mar 8, 2023 at 9:20 AM Masahiko Sawada wrote:
> > >
> > > On Wed, Mar 8, 2023 at 3:30 AM Nathan Bossart wrote:
> > > >
> > > > IMO the current set of trade-offs (e.g., unavoidable bloat and WAL
> > > > buildup) would make this feature virtually unusable for a lot of
> > > > workloads, so it's probably worth exploring an alternative approach.
> > >
> > > It might require more engineering effort for alternative approaches
> > > such as one I proposed but the feature could become better from the
> > > user perspective. I also think it would be worth exploring it if
> > > we've not yet.
> >
> > Fair enough. I think as of now most people think that we should
> > consider alternative approaches for this feature. The two ideas at a
>
> If we can notify subscriber of the transaction start time, will that
> solve the current problem?

I don't think that will solve the current problem because the problem is
related to confirming back the flush LSN (commit LSN) to the publisher,
which we do only after we commit the delayed transaction.  Due to this, we
are not able to advance the WAL (restart_lsn)/XMIN on the publisher, which
causes an accumulation of WAL and does not allow the vacuum to remove
deleted rows.  Do you have something else in mind which makes you think
that it can solve the problem?

-- 
With Regards,
Amit Kapila.
Re: improving user.c error messages
On 20.02.23 23:58, Nathan Bossart wrote:
>> Similarly -- this is an existing issue but we might as well look at it
>> -- in something like
>>
>>     must be superuser or a role with privileges of the pg_write_server_files role
>>
>> the phrase "a role with the privileges of that other role" seems
>> ambiguous.  Doesn't it really mean you must be a member of that role?
>
> Membership alone is not sufficient.  You must also inherit the
> privileges of the role via the INHERIT option.  I thought about making
> this something like
>
>     must have the INHERIT option on role %s
>
> but I'm not sure that's accurate either.  That wording makes it sound
> like you need to be granted membership to the role directly WITH INHERIT
> OPTION, but what you really need is membership, direct or indirect, with
> an INHERIT chain up to the role in question.  However, it looks like
> "must have the ADMIN option on role %s" is used to mean something
> similar, so perhaps I am overthinking it.  For now, I've reworded these
> as "must inherit privileges of".

I don't have a good mental model of all this role inheritance, personally,
but I fear that this change makes the messages more jargony and less
clear.  Maybe the original wording was good enough.

A couple of other thoughts: "admin option" is sort of a natural language
term, I think, so we don't need to parametrize it as "%s option".  Also,
there are no other "options" in this context, I think.

A general thought: It seems we currently don't have any error messages
that address the user like "You must do this".  Do we want to go there?
Should we try for a more impersonal wording like

    "You must have the %s attribute to create roles."
    "Current user must have the %s attribute to create roles."
    "%s attribute is required to create roles."

By the way, I'm not sure what the separation between 0001 and 0002 is
supposed to be.
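To illustrate the "membership, direct or indirect, with an INHERIT chain" situation discussed above, here is a sketch (role names invented; uses the per-grant WITH INHERIT option from the current development cycle):

```sql
-- alice ends up with pg_write_server_files privileges via an intermediate
-- role, even though she was never granted that role directly. Both grants
-- in the chain must carry INHERIT for the privileges to flow through.
CREATE ROLE intermediate;
GRANT pg_write_server_files TO intermediate WITH INHERIT TRUE;

CREATE ROLE alice LOGIN;
GRANT intermediate TO alice WITH INHERIT TRUE;
```

A message like "must have the INHERIT option on role pg_write_server_files" would be misleading here, since alice holds no direct grant on that role at all.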
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Hi Peter,

> 1.
> In my previous review [1] (comment #1) I wrote that only some of the
> "should" were misleading and gave examples where to change. But I didn't
> say that *every* usage of that word was wrong, so your global replace of
> "should" to "must" has modified a couple of places in unexpected ways.
>
> Details are in subsequent review comments below -- see #2b, #3, #5.

Ah, that was my mistake. Thanks for the thorough review on this.

> ==
> doc/src/sgml/logical-replication.sgml
>
> 2.
> A published table must have a “replica identity” configured in order to
> be able to replicate UPDATE and DELETE operations, so that appropriate
> rows to update or delete can be identified on the subscriber side. By
> default, this is the primary key, if there is one. Another unique index
> (with certain additional requirements) can also be set to be the replica
> identity. If the table does not have any suitable key, then it can be
> set to replica identity “full”, which means the entire row becomes the
> key. When replica identity “full” is specified, indexes can be used on
> the subscriber side for searching the rows. Candidate indexes must be
> btree, non-partial, and have at least one column reference (i.e. cannot
> consist of only expressions). These restrictions on the non-unique index
> properties adheres to some of the restrictions that are enforced for
> primary keys. Internally, we follow a similar approach for supporting
> index scans within logical replication scope. If there are no such
> suitable indexes, the search on the subscriber side can be very
> inefficient, therefore replica identity “full” must only be used as a
> fallback if no other solution is possible. If a replica identity other
> than “full” is set on the publisher side, a replica identity comprising
> the same or fewer columns must also be set on the subscriber side. See
> REPLICA IDENTITY for details on how to set the replica identity. If a
> table without a replica identity is added to a publication that
> replicates UPDATE or DELETE operations then subsequent UPDATE or DELETE
> operations will cause an error on the publisher. INSERT operations can
> proceed regardless of any replica identity.
>
> ~~
>
> 2a.
> My previous review [1] (comment #2) was not fixed quite as suggested.
>
> Please change:
> "adheres to" --> "adhere to"

Oops, it seems I only got the "to" part of your suggestion and missed the
"s". Done now.

> ~~
>
> 2b. should/must
>
> This should/must change was OK as it was before, because here it is only
> advice.
>
> Please change this back to how it was:
> "must only be used as a fallback" --> "should only be used as a fallback"

Thanks, changed.

> ==
> src/backend/executor/execReplication.c
>
> 3. build_replindex_scan_key
>
>  /*
>   * Setup a ScanKey for a search in the relation 'rel' for a tuple 'key'
>   * that is setup to match 'rel' (*NOT* idxrel!).
>   *
> - * Returns whether any column contains NULLs.
> + * Returns how many columns must be used for the index scan.
> + *
>
> ~
>
> This should/must change does not seem quite right.
>
> SUGGESTION (reworded)
> Returns how many columns to use for the index scan.

Fixed. (I wish we had a simpler process to incorporate such comments.)

> ~~~
>
> 4. build_replindex_scan_key
>
> > Based on the discussions below, I kept as-is. I really don't want to
> > do unrelated changes in this patch, as I also got several feedback for
> > not doing it,
>
> Hmm, although this code pre-existed I don't consider this one as
> "unrelated changes" because the patch introduced the new "if
> (!AttributeNumberIsValid(table_attno))" which changed things. As I wrote
> to Amit yesterday [2] IMO it would be better to do the 'opttype'
> assignment *after* the potential 'continue' otherwise there is every
> chance that the assignment is just redundant. And if you move the
> assignment where it belongs, then you might as well declare the variable
> in the more appropriate place at the same time – i.e. with the
> 'opfamily' declaration. Anyway, I've given my reason a couple of times
> now, so if you don't want to change it I won't debate about it anymore.

Alright, given both you and Amit [1] agree on this, I'll follow that.

> ==
> src/backend/replication/logical/relation.c
>
> 5. FindUsableIndexForReplicaIdentityFull
>
> + * XXX: There are no fundamental problems for supporting non-btree indexes.
> + * We mostly need to relax the limitations in RelationFindReplTupleByIndex().
> + * For partial indexes, the required changes are likely to be larger. If
> + * none of the tuples satisfy the expression for the index scan, we must
> + * fall-back to sequential execution, which might not be a good idea in some
> + * cases.
>
> The should/must change (the one in the XXX comment) does not seem quite
> right.
>
> SUGGESTION
> "we must fall-back to sequential execution" --> "we fallback to
> sequential execution"

Fixed, thanks.

> ==
> .../subs
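A sketch of the index restrictions quoted above (table and index names invented): on the subscriber side, for a table replicated with replica identity "full", only the first index below would be a candidate for the row search:

```sql
CREATE TABLE tbl (a int, b text);

CREATE INDEX usable_idx  ON tbl (a);                -- usable: btree, non-partial, has a column reference
CREATE INDEX partial_idx ON tbl (a) WHERE a > 100;  -- rejected: partial index
CREATE INDEX expr_idx    ON tbl ((lower(b)));       -- rejected: consists of only expressions
CREATE INDEX hash_idx    ON tbl USING hash (a);     -- rejected: not btree
```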
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Hi Vignesh C,

> > Hmm, can you please elaborate more on this? The declaration and
> > assignment are already on different lines.
> >
> > ps: pgindent changed this line a bit. Does that look better?
>
> I thought of changing it to something like below:
>     bool        isUsableIndex;
>     Oid         idxoid = lfirst_oid(lc);
>     Relation    indexRelation = index_open(idxoid, AccessShareLock);
>     IndexInfo  *indexInfo = BuildIndexInfo(indexRelation);
>
>     isUsableIndex = IsIndexUsableForReplicaIdentityFull(indexInfo);

Alright, this looks slightly better. I did a small change to your
suggestion, basically kept lfirst_oid as the first statement in the loop.
I'll attach the changes on v38 in the next e-mail.

Thanks,
Onder KALACI
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Hi Amit, all

> This new test takes ~9s on my machine whereas most other tests in
> subscription/t take roughly 2-5s. I feel we should try to reduce the
> test timing without sacrificing much of the functionality or code
> coverage.

Alright, that is reasonable.

> I think if possible we should try to reduce setup/teardown cost for each
> separate test by combining them where possible. I have a few comments on
> tests which also might help to optimize these tests.
>
> 1.
> +# Testcase start: SUBSCRIPTION USES INDEX
> +#
> +# Basic test where the subscriber uses index
> +# and only updates 1 row and deletes
> +# 1 other row
> ...
> +# Testcase start: SUBSCRIPTION USES INDEX UPDATEs MULTIPLE ROWS
> +#
> +# Basic test where the subscriber uses index
> +# and updates 50 rows
> ...
> +# Testcase start: SUBSCRIPTION USES INDEX WITH MULTIPLE COLUMNS
> +#
> +# Basic test where the subscriber uses index
> +# and deletes 200 rows
>
> I think to a good extent these tests overlap each other. I think we can
> have just one test for the index with multiple columns that updates
> multiple rows and have both updates and deletes.

Alright, dropped SUBSCRIPTION USES INDEX, and expanded SUBSCRIPTION USES
INDEX WITH MULTIPLE COLUMNS with an UPDATE that touches multiple rows.

> 2.
> +# Testcase start: SUBSCRIPTION DOES NOT USE PARTIAL INDEX
> ...
> +# Testcase start: SUBSCRIPTION DOES NOT USE INDEXES WITH ONLY EXPRESSIONS
>
> Instead of having separate tests where we do all setups again, I think
> it would be better if we create both the indexes in one test and show
> that none of them is used.

Makes sense.

> 3.
> +# now, the update could either use the test_replica_id_full_idy or
> test_replica_id_full_idy index
> +# it is not possible for user to control which index to use
>
> The name of the second index is wrong in the above comment.

Thanks, fixed.

> 4.
> +# Testcase start: SUBSCRIPTION BEHAVIOR WITH ENABLE_INDEXSCAN
>
> As we have removed enable_indexscan check, you should remove this test.

Hmm, I think my rebase problems are causing confusion here, which v38
fixes. In the first commit, we have ENABLE_INDEXSCAN checks. In the second
commit, I changed the same test to use
enable_replica_identity_full_index_scan.

If we are going to only consider the first patch to get into the master
branch, I can probably drop the test. In that case, I'm not sure what our
perspective is on the ENABLE_INDEXSCAN GUC. Do we want to keep that guard
in the code (and hence the test)?

> 5. In general, the line length seems to vary a lot for different
> multi-line comments. Though we are not strict in that, it will look
> better if there is consistency in that (let's have ~80 cols line length
> for each comment in a single line).

Went over the tests, and made them ~80 cols. There is one exception: in
the first commit, the test for enable_indexscan is still shorter, but I
didn't manage to format that properly. I'll try to fix that as well, but
I didn't want to block the progress due to that.

Also, you have not noted it, but I think SUBSCRIPTION USES INDEX WITH
MULTIPLE COLUMNS already covers SUBSCRIPTION USES INDEX UPDATEs MULTIPLE
ROWS. So, I changed the first one to SUBSCRIPTION USES INDEX WITH MULTIPLE
ROWS AND COLUMNS and dropped the second one. Let me know if it does not
make sense to you.

If I try, there are a few more opportunities to squeeze some more tests
together, but those would start to complicate readability.

Attached v38.

Thanks,
Onder KALACI

v38_0002_use_index_on_subs_when_pub_rep_ident_full.patch
Description: Binary data

v38_0001_use_index_on_subs_when_pub_rep_ident_full.patch
Description: Binary data
RE: Support logical replication of DDLs
On Thu, Mar 9, 2023 10:27 AM Wang, Wei/王 威 wrote:
> On Mon, Mar 6, 2023 18:17 PM Wang, Wei/王 威 wrote:
> > On Mon, Mar 6, 2023 14:34 AM Ajin Cherian wrote:
> > > Changes are in patch 1 and patch 2
> >
> > Thanks for updating the patch set.
> >
> > Here are some comments:
>
> Here are some more comments for the v-75-0002* patch:

Thanks for your comments.

> 1. In the function deparse_AlterRelation
> +	if ((sub->address.objectId != relId &&
> +		 sub->address.objectId != InvalidOid) &&
> +		!(subcmd->subtype == AT_AddConstraint &&
> +		  subcmd->recurse) &&
> +		istable)
> +		continue;
> I think when we execute the command "ALTER TABLE ... CLUSTER ON"
> (subtype is AT_ClusterOn), this command will be skipped for parsing. I
> think we need to parse this command here.
>
> I think we are skipping some needed parsing due to this check, such as
> [1].#1 and the AT_ClusterOn command mentioned above. After reading the
> thread, I think the purpose of this check is to fix the bug in [2] (see
> the last point in [2]).
> I think maybe we could modify this check to `continue` when
> sub->address.objectId and relId are inconsistent and
> sub->address.objectId is a child (inherited or partition) table. What
> do you think?

Fixed as suggested.

> ~~~
>
> 2. In the function deparse_CreateStmt
> I think when we execute the following command:
> `CREATE TABLE tbl (a int GENERATED ALWAYS AS (1) STORED);`
> the deparsed result is:
> `CREATE TABLE public.tbl (a pg_catalog.int4 STORAGE plain GENERATED
> ALWAYS AS 1 STORED);`
> I think the parentheses around the generation_expr (I mean `1`) are
> missing, which would cause a syntax error.

Fixed.

> ~~~
>
> 3. In the function deparse_IndexStmt
> I think we missed parsing of the option [NULLS NOT DISTINCT] in the
> following command:
> ```
> CREATE UNIQUE INDEX ... ON table_name ... NULLS NOT DISTINCT;
> ```
> I think we could check this option via node->nulls_not_distinct.

Fixed.

Best Regards,
Hou zj
Re: Raising the SCRAM iteration count
> On 9 Mar 2023, at 08:09, Michael Paquier wrote:
>
> On Wed, Mar 08, 2023 at 05:21:20PM +0900, Michael Paquier wrote:
>> On Wed, Mar 08, 2023 at 09:07:36AM +0100, Daniel Gustafsson wrote:
>>> AFAIK a TAP test with psql_interactive is the only way to do this so
>>> that's what I've implemented.
>
> I cannot think of a better idea than what you have here, so I am
> marking this patch as ready for committer.

Thanks for the review!

> I am wondering how stable a logic based on a timer of 5s would be..

Actually that was a bug; it should be using the default timeout and
restarting it for each operation to ensure that even overloaded hosts
won't time out unless something is actually broken/incorrect. I've fixed
that in the attached rev and also renamed the password in the regress test
from "raisediterationcount" as it's now lowering the count in the test.

Unless there are objections to this version, I plan to commit it during
this CF.

-- 
Daniel Gustafsson

v8-0001-Make-SCRAM-iteration-count-configurable.patch
Description: Binary data
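For context, a usage sketch of the feature being committed (assuming the GUC ends up spelled scram_iterations and is settable per session, as in the committed form of this feature; role name invented):

```sql
-- Lower the SCRAM iteration count for passwords set in this session.
-- Trade-off: fewer iterations mean faster authentication but a cheaper
-- brute-force target.
SET scram_iterations = 4096;
ALTER ROLE myuser PASSWORD 'new-password';  -- verifier built with 4096 iterations

-- The iteration count an existing verifier was built with is the first
-- field of the stored SCRAM-SHA-256 string:
SELECT rolpassword FROM pg_authid WHERE rolname = 'myuser';
```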
Re: Add standard collation UNICODE
On 08.03.23 19:25, Jeff Davis wrote:
> Why is "unicode" only provided for the UTF-8 encoding? For "ucs_basic"
> that makes some sense, because the implementation only works in UTF-8.
> But here we are using ICU, and the "und" locale should work for any
> ICU-supported encoding. I suggest that we use collencoding=-1 for
> "unicode", and the docs can just add a note next to "ucs_basic" that it
> only works for UTF-8, because that's the weird case.

Makes sense.

> For the docs, I suggest that you clarify that "ucs_basic" has the same
> behavior as the C locale does *in the UTF-8 encoding*. Not all users
> might pick up on the subtlety that the C locale has different behaviors
> in different encodings.

Ok, word-smithed a bit more. How about this patch version?

From a8e33d010f60cceb9442123bd0531451875df313 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Thu, 9 Mar 2023 11:14:28 +0100
Subject: [PATCH v2] Add standard collation UNICODE

Discussion: https://www.postgresql.org/message-id/flat/1293e382-2093-a2bf-a397-c04e8f83d...@enterprisedb.com
---
 doc/src/sgml/charset.sgml | 31 ---
 src/bin/initdb/initdb.c   | 10 +++---
 2 files changed, 35 insertions(+), 6 deletions(-)

diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index 3032392b80..12fabb7372 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -659,9 +659,34 @@ Standard Collations
-Additionally, the SQL standard collation name ucs_basic
-is available for encoding UTF8. It is equivalent
-to C and sorts by Unicode code point.
+Additionally, two SQL standard collation names are available:
+
+  unicode
+
+This collation sorts using the Unicode Collation Algorithm with the
+Default Unicode Collation Element Table. It is available in all
+encodings. ICU support is required to use this collation. (This
+collation has the same behavior as the ICU root locale; see .)
+
+  ucs_basic
+
+This collation sorts by Unicode code point. It is only available for
+encoding UTF8. (This collation has the same
+behavior as the libc locale specification C in
+UTF8 encoding.)
+

diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 5e3c6a27c4..d303cc5609 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -1486,10 +1486,14 @@ static void
 setup_collation(FILE *cmdfd)
 {
 	/*
-	 * Add an SQL-standard name.  We don't want to pin this, so it doesn't go
-	 * in pg_collation.h.  But add it before reading system collations, so
-	 * that it wins if libc defines a locale named ucs_basic.
+	 * Add SQL-standard names.  We don't want to pin these, so they don't go
+	 * in pg_collation.dat.  But add them before reading system collations, so
+	 * that they win if libc defines a locale with the same name.
 	 */
+	PG_CMD_PRINTF("INSERT INTO pg_collation (oid, collname, collnamespace, collowner, collprovider, collisdeterministic, collencoding, colliculocale)"
+				  "VALUES (pg_nextoid('pg_catalog.pg_collation', 'oid', 'pg_catalog.pg_collation_oid_index'), 'unicode', 'pg_catalog'::regnamespace, %u, '%c', true, -1, 'und');\n\n",
+				  BOOTSTRAP_SUPERUSERID, COLLPROVIDER_ICU);
+
 	PG_CMD_PRINTF("INSERT INTO pg_collation (oid, collname, collnamespace, collowner, collprovider, collisdeterministic, collencoding, collcollate, collctype)"
 				  "VALUES (pg_nextoid('pg_catalog.pg_collation', 'oid', 'pg_catalog.pg_collation_oid_index'), 'ucs_basic', 'pg_catalog'::regnamespace, %u, '%c', true, %d, 'C', 'C');\n\n",
 				  BOOTSTRAP_SUPERUSERID, COLLPROVIDER_LIBC, PG_UTF8);

base-commit: 36ea345f8fa616fd9b40576310e54145aa70c1a1
-- 
2.39.2
Re: Allow tests to pass in OpenSSL FIPS mode
On Thu, Mar 09, 2023 at 10:01:14AM +0100, Peter Eisentraut wrote: > I have fixed these comments to match cryptohash_openssl.c. Missed that, thanks for the fix. -- Michael
Re: A bug with ExecCheckPermissions
I didn't like very much this business of setting the perminfoindex directly to '2' and '1'. It looks ugly with no explanation. What do you think of creating them as we go along and setting each index correspondingly, as in the attached? -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "Ellos andaban todos desnudos como su madre los parió, y también las mujeres, aunque no vi más que una, harto moza, y todos los que yo vi eran todos mancebos, que ninguno vi de edad de más de XXX años" (Cristóbal Colón) >From f0041bf1ac2f0c727d6b8495a63a548240c3b9c5 Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Thu, 9 Mar 2023 11:33:18 +0100 Subject: [PATCH] fix ExecCheckPermissions call in RI_Initial_Check --- src/backend/executor/execMain.c | 22 src/backend/utils/adt/ri_triggers.c | 40 - 2 files changed, 44 insertions(+), 18 deletions(-) diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c index b32f419176..6230f5e3d4 100644 --- a/src/backend/executor/execMain.c +++ b/src/backend/executor/execMain.c @@ -577,6 +577,28 @@ ExecCheckPermissions(List *rangeTable, List *rteperminfos, ListCell *l; bool result = true; +#ifdef USE_ASSERT_CHECKING + Bitmapset *indexset = NULL; + + /* Check that rteperminfos is consistent with rangeTable */ + foreach(l, rangeTable) + { + RangeTblEntry *rte = lfirst_node(RangeTblEntry, l); + + if (rte->perminfoindex != 0) + { + /* Sanity checks */ + (void) getRTEPermissionInfo(rteperminfos, rte); + /* Many-to-one mapping not allowed */ + Assert(!bms_is_member(rte->perminfoindex, indexset)); + indexset = bms_add_member(indexset, rte->perminfoindex); + } + } + + /* All rteperminfos are referenced */ + Assert(bms_num_members(indexset) == list_length(rteperminfos)); +#endif + foreach(l, rteperminfos) { RTEPermissionInfo *perminfo = lfirst_node(RTEPermissionInfo, l); diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c index 375b17b9f3..b4e789899e 100644 ---
a/src/backend/utils/adt/ri_triggers.c +++ b/src/backend/utils/adt/ri_triggers.c @@ -1373,10 +1373,11 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel) char fkrelname[MAX_QUOTED_REL_NAME_LEN]; char pkattname[MAX_QUOTED_NAME_LEN + 3]; char fkattname[MAX_QUOTED_NAME_LEN + 3]; - RangeTblEntry *pkrte; - RangeTblEntry *fkrte; + RangeTblEntry *rte; RTEPermissionInfo *pk_perminfo; RTEPermissionInfo *fk_perminfo; + List *rtes = NIL; + List *perminfos = NIL; const char *sep; const char *fk_only; const char *pk_only; @@ -1394,25 +1395,29 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel) * * XXX are there any other show-stopper conditions to check? */ - pkrte = makeNode(RangeTblEntry); - pkrte->rtekind = RTE_RELATION; - pkrte->relid = RelationGetRelid(pk_rel); - pkrte->relkind = pk_rel->rd_rel->relkind; - pkrte->rellockmode = AccessShareLock; - pk_perminfo = makeNode(RTEPermissionInfo); pk_perminfo->relid = RelationGetRelid(pk_rel); pk_perminfo->requiredPerms = ACL_SELECT; - - fkrte = makeNode(RangeTblEntry); - fkrte->rtekind = RTE_RELATION; - fkrte->relid = RelationGetRelid(fk_rel); - fkrte->relkind = fk_rel->rd_rel->relkind; - fkrte->rellockmode = AccessShareLock; + perminfos = lappend(perminfos, pk_perminfo); + rte = makeNode(RangeTblEntry); + rte->rtekind = RTE_RELATION; + rte->relid = RelationGetRelid(pk_rel); + rte->relkind = pk_rel->rd_rel->relkind; + rte->rellockmode = AccessShareLock; + rte->perminfoindex = list_length(perminfos); + rtes = lappend(rtes, rte); fk_perminfo = makeNode(RTEPermissionInfo); fk_perminfo->relid = RelationGetRelid(fk_rel); fk_perminfo->requiredPerms = ACL_SELECT; + perminfos = lappend(perminfos, fk_perminfo); + rte = makeNode(RangeTblEntry); + rte->rtekind = RTE_RELATION; + rte->relid = RelationGetRelid(fk_rel); + rte->relkind = fk_rel->rd_rel->relkind; + rte->rellockmode = AccessShareLock; + rte->perminfoindex = list_length(perminfos); + rtes = lappend(rtes, rte); for (int i = 0; i < riinfo->nkeys; 
i++) { @@ -1425,8 +1430,7 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel) fk_perminfo->selectedCols = bms_add_member(fk_perminfo->selectedCols, attno); } - if (!ExecCheckPermissions(list_make2(fkrte, pkrte), - list_make2(fk_perminfo, pk_perminfo), false)) + if (!ExecCheckPermissions(rtes, perminfos, false)) return false; /* @@ -1436,9 +1440,9 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel) */ if (!has_bypassrls_privilege(GetUserId()) && ((pk_rel->rd_rel->relrowsecurity && - !object_ownercheck(RelationRelationId, pkrte->relid, GetUserId())) || + !object_ownercheck(RelationRelationId, RelationGetRelid(pk_rel), GetUserId())) || (fk_rel->rd_rel->relrowsecurity && - !object_ownercheck(RelationRelationId, fkrte->relid, GetUserId() + !object_ownercheck(RelationRelationId,
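The invariant that the USE_ASSERT_CHECKING block above enforces can be sketched outside the executor as follows. This is an illustrative Python model, not the actual Node structures: the range table is reduced to a list of perminfoindex values (1-based, with 0 meaning "no permission info"), and the rules are the same ones the Asserts encode — every nonzero index must point at a distinct rteperminfos entry, and every entry must be referenced.

```python
def check_perminfo_mapping(perminfo_indexes, perminfos):
    """Model of the sanity check: perminfo_indexes holds each RTE's
    perminfoindex; perminfos stands in for the rteperminfos list."""
    seen = set()
    for idx in perminfo_indexes:
        if idx == 0:
            continue                              # RTE carries no permission info
        # getRTEPermissionInfo() would error out on a dangling index
        assert 1 <= idx <= len(perminfos), "dangling perminfoindex"
        # Many-to-one mapping not allowed
        assert idx not in seen, "perminfo referenced twice"
        seen.add(idx)
    # All rteperminfos entries must be referenced
    assert len(seen) == len(perminfos), "unreferenced perminfo entry"
    return True
```

Under this model, the RI_Initial_Check change passes because each lappend() to perminfos is immediately mirrored by an rte->perminfoindex set to the list's current length.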
Re: Small omission in type_sanity.sql
On 2023-Jan-27, Andres Freund wrote: > On 2023-01-27 20:39:04 -0500, Tom Lane wrote: > > Andres Freund writes: > > > Tom, is there a reason we run the various sanity tests early-ish in the > > > schedule? It does seem to reduce their effectiveness a bit... > > > > Originally, those tests were mainly needed to sanity-check the > > hand-maintained initial catalog data, so it made sense to run them > > early. > > It's also kinda useful to have some basic validity testing early on, because > if there's something wrong with the catalog values, it'll cause lots of issues > later. We can just list the tests twice in the schedule ... -- Álvaro Herrera, Breisgau, Deutschland — https://www.EnterpriseDB.com/
Re: A bug with ExecCheckPermissions
Hi Alvaro, On Thu, Mar 9, 2023 at 7:39 PM Alvaro Herrera wrote: > I didn't like very much this business of setting the perminfoindex > directly to '2' and '1'. It looks ugly with no explanation. What do > you think of creating the as we go along and set each index > correspondingly, as in the attached? Agree it looks cleaner and self-explanatory that way. Thanks. -- Thanks, Amit Langote EDB: http://www.enterprisedb.com
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
On Thu, Mar 9, 2023 at 3:26 PM Önder Kalacı wrote: > >> >> 4. >> +# Testcase start: SUBSCRIPTION BEHAVIOR WITH ENABLE_INDEXSCAN >> >> As we have removed enable_indexscan check, you should remove this test. > > > Hmm, I think my rebase problems are causing confusion here, which v38 fixes. > I think it is still not fixed in v38 as the test is still present in 0001. > In the first commit, we have ENABLE_INDEXSCAN checks. In the second commit, > I changed the same test to use enable_replica_identity_full_index_scan. > > If we are going to only consider the first patch to get into the master > branch, > I can probably drop the test. In that case, I'm not sure what is our > perspective > on ENABLE_INDEXSCAN GUC. Do we want to keep that guard in the code > (hence the test)? > I am not sure what we are going to do on this because I feel we need some storage option as you have in 0002 patch but you and Andres thinks that is not required. So, we can discuss that a bit more after 0001 is committed but if there is no agreement then we need to probably drop it. Even if drop it, I don't think using enable_index makes sense. I think for now you don't need to send 0002 patch, let's first try to get 0001 patch and then we can discuss about 0002. >> >> >> 5. In general, the line length seems to vary a lot for different >> multi-line comments. Though we are not strict in that it will look >> better if there is consistency in that (let's have ~80 cols line >> length for each comment in a single line). >> > > Went over the tests, and made ~80 cols. There is one exception, in the first > commit, the test for enable_indexcan is still shorter, but I failed to make > that > properly. I'll try to fix that as well, but I didn't want to block the > progress due to > that. > > Also, you have not noted, but I think SUBSCRIPTION USES INDEX WITH MULTIPLE > COLUMNS > already covers SUBSCRIPTION USES INDEX UPDATEs MULTIPLE ROWS. 
> > So, I changed the first one to SUBSCRIPTION USES INDEX WITH MULTIPLE ROWS AND > COLUMNS > and dropped the second one. Let me know if it does not make sense to you. If > I try, there are few more > opportunities to squeeze in some more tests together, but those would start > to complicate readability. > I still want to reduce the test time and will think about it. Which of the other tests do you think can be combined? BTW, did you consider updating the patch based on my yesterday's email [1]? One minor comment: +# now, create index and see that the index is used +$node_subscriber->safe_psql('postgres', + "CREATE INDEX test_replica_id_full_idx ON test_replica_id_full(x)"); + +# wait until the index is created +$node_subscriber->poll_query_until( + 'postgres', q{select count(*)=1 from pg_stat_all_indexes where indexrelname = 'test_replica_id_full_idx';} +) or die "Timed out while waiting for creating index test_replica_id_full_idx"; + +$node_publisher->safe_psql('postgres', + "UPDATE test_replica_id_full SET x = x + 1 WHERE x = 15;"); +$node_publisher->wait_for_catchup($appname); + + +# wait until the index is used on the subscriber +$node_subscriber->poll_query_until( + 'postgres', q{select (idx_scan = 1) from pg_stat_all_indexes where indexrelname = 'test_replica_id_full_idx';} +) or die "Timed out while waiting for check subscriber tap_sub_rep_full updates one row via index"; + + +# now, create index on column y as well +$node_subscriber->safe_psql('postgres', + "CREATE INDEX test_replica_id_full_idy ON test_replica_id_full(y)"); + +# wait until the index is created +$node_subscriber->poll_query_until( + 'postgres', q{select count(*)=1 from pg_stat_all_indexes where indexrelname = 'test_replica_id_full_idy';} +) or die "Timed out while waiting for creating index test_replica_id_full_idy"; It appears you are using inconsistent spacing. It may be better to use a single empty line wherever required. 
[1] - https://www.postgresql.org/message-id/CAA4eK1%2BoM_v-b_WDHZmqCyVHU2oD4j3vF9YcH9xVHj%3DzAfy4og%40mail.gmail.com -- With Regards, Amit Kapila.
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
On Thu, Mar 9, 2023 at 4:50 PM Amit Kapila wrote: > > On Thu, Mar 9, 2023 at 3:26 PM Önder Kalacı wrote: > > > > > > So, I changed the first one to SUBSCRIPTION USES INDEX WITH MULTIPLE ROWS > > AND COLUMNS > > and dropped the second one. Let me know if it does not make sense to you. > > If I try, there are few more > > opportunities to squeeze in some more tests together, but those would start > > to complicate readability. > > > > I still want to reduce the test time and will think about it. Which of > the other tests do you think can be combined? > Some of the ideas I can think of are as follows: 1. Combine "SUBSCRIPTION USES INDEX WITH MULTIPLE ROWS AND COLUMNS" and "SUBSCRIPTION USES INDEX WITH DROPPED COLUMNS" such that after verifying updates and deletes of the first test, we can drop some of the columns on both publisher and subscriber, then use alter subscription ... refresh publication command and then do the steps of the second test. Note that we don't add tables after initial setup, only changing schema. 2. We can also combine "Some NULL values" and "PUBLICATION LACKS THE COLUMN ON THE SUBS INDEX" as both use the same schema. After the first test, we need to drop the existing index and create a new index on the subscriber node. 3. General comment +# updates 200 rows +$node_publisher->safe_psql('postgres', + "UPDATE test_replica_id_full SET x = x + 1 WHERE x IN (5, 6);"); I think here you are updating 20 rows not 200. So, the comment seems wrong to me. Please think more and see if we can combine some other tests like "Unique index that is not primary key or replica identity" and the test we will have after comment#2 above. -- With Regards, Amit Kapila.
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Hi, Amit Kapila wrote on Wed, 8 Mar 2023, 14:42: > On Wed, Mar 8, 2023 at 4:51 PM Önder Kalacı wrote: > > > > > >> > >> I just share this case and then we > >> can discuss should we pick the index which only contain the extra > columns on the > >> subscriber. > >> > > > > I think its performance implications come down to the discussion on [1]. > Overall, I prefer > > avoiding adding any additional complexity in the code for some edge > cases. The code > > can handle this sub-optimal user pattern, with a sub-optimal performance. > > > > It is fine to leave this and Hou-San's case if they make the patch > complex. However, it may be better to give it a try and see if this or > other regression/optimization can be avoided without adding much > complexity to the patch. You can prepare a top-up patch and then we > can discuss it. > > > Alright, I did some basic prototypes for the problems mentioned, just to show that these problems can be solved without too much hassle. But the patches are not complete: some tests fail, there are no comments or tests yet, some values should be cached, etc. I'm mostly sharing this as a heads-up on the progress, given that I have not responded to this specific mail. I'll update these when I have some extra time, after replying to the 0001 patch. Thanks, Onder wip_optimize_for_non_pkey_non_ri_unique_index.patch Description: Binary data wip_for_optimize_index_column_match.diff Description: Binary data
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Hi Amit, > > > >> > >> 4. > >> +# Testcase start: SUBSCRIPTION BEHAVIOR WITH ENABLE_INDEXSCAN > >> > >> As we have removed enable_indexscan check, you should remove this test. > > > > > > Hmm, I think my rebase problems are causing confusion here, which v38 > fixes. > > > > I think it is still not fixed in v38 as the test is still present in 0001. > Ah, yes, sorry again for the noise. v39 will drop that. > > > In the first commit, we have ENABLE_INDEXSCAN checks. In the second > commit, > > I changed the same test to use enable_replica_identity_full_index_scan. > > > > If we are going to only consider the first patch to get into the master > branch, > > I can probably drop the test. In that case, I'm not sure what is our > perspective > > on ENABLE_INDEXSCAN GUC. Do we want to keep that guard in the code > > (hence the test)? > > > > I am not sure what we are going to do on this because I feel we need > some storage option as you have in 0002 patch but you and Andres > thinks that is not required. So, we can discuss that a bit more after > 0001 is committed but if there is no agreement then we need to > probably drop it. Even if drop it, I don't think using enable_index > makes sense. I think for now you don't need to send 0002 patch, let's > first try to get 0001 patch and then we can discuss about 0002. > sounds good, when needed I'll rebase 0002. > > > > Also, you have not noted, but I think SUBSCRIPTION USES INDEX WITH > MULTIPLE COLUMNS > > already covers SUBSCRIPTION USES INDEX UPDATEs MULTIPLE ROWS. > > > > So, I changed the first one to SUBSCRIPTION USES INDEX WITH MULTIPLE > ROWS AND COLUMNS > > and dropped the second one. Let me know if it does not make sense to > you. If I try, there are few more > > opportunities to squeeze in some more tests together, but those would > start to complicate readability. > > > > I still want to reduce the test time and will think about it. Which of > the other tests do you think can be combined? 
> > I'll follow your suggestion in the next e-mail [2], and focus on further improvements. BTW, did you consider updating the patch based on my yesterday's email [1]? > > Yes, replied to that one just now with some wip commits [1] > It appears you are using inconsistent spacing. It may be better to use > a single empty line wherever required. > > Sure, let me fix those. attached v39. [1] https://www.postgresql.org/message-id/CACawEhXGnk6v7UOHrxuJjjybHvvq33Zv666ouy4UzjPfJM6tBw%40mail.gmail.com [2] https://www.postgresql.org/message-id/CAA4eK1LSYWrthA3xjbrZvZVmwuha10HtM3-QRrVMD7YBt4t3pg%40mail.gmail.com v39_0001_use_index_on_subs_when_pub_rep_ident_full.patch Description: Binary data
Re: Avoid multiple SetLatch() calls in procsignal_sigusr1_handler()
Hi, On 2/28/23 4:30 PM, Bharath Rupireddy wrote: Hi, Most of the multiplexed SIGUSR1 handlers are setting latch explicitly when the procsignal_sigusr1_handler() can do that for them at the end. Right. These multiplexed handlers are currently being used as SIGUSR1 handlers, not as independent handlers, so no problem if SetLatch() is removed from them. Agree, they are only used in procsignal_sigusr1_handler(). A few others do it right by saying /* latch will be set by procsignal_sigusr1_handler */. Yeap, so do HandleProcSignalBarrierInterrupt() and HandleLogMemoryContextInterrupt(). Although calling SetLatch() in quick succession does no harm (it just returns if the latch was previously set), it seems unnecessary. Agree. I'm attaching a patch that avoids multiple SetLatch() calls. Thoughts? I agree with the idea behind the patch. The thing that worries me a bit is that the changed functions are defined as external, and so the change may have an impact outside of core PG; I'm not sure that's worth it. Otherwise the patch LGTM. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
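The idempotence that makes this refactoring safe can be illustrated with a toy latch. This is only a model of the set-once semantics (a Python Event standing in for a process latch), not of the real kernel-level wakeup path; it shows why moving the single set() call to the end of the dispatcher preserves behavior while dropping the redundant calls from the individual handlers.

```python
import threading

class Latch:
    """Toy analogue of a process latch: set() is idempotent, so calling it
    from every multiplexed handler and again from the dispatcher is
    harmless but redundant."""
    def __init__(self):
        self._event = threading.Event()
        self.transitions = 0          # count actual unset -> set transitions

    def set(self):
        if not self._event.is_set():
            self.transitions += 1
        self._event.set()             # no-op if already set, like SetLatch()

    def is_set(self):
        return self._event.is_set()

def sigusr1_dispatcher(latch, handlers):
    """Model of procsignal_sigusr1_handler(): the multiplexed handlers no
    longer set the latch themselves; one call at the end suffices."""
    for handler in handlers:
        handler()
    latch.set()
```

Whether the handlers also call latch.set() or not, the observable outcome is one latch transition per dispatch, which is the basis of the "does no harm, but unnecessary" observation above.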
Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
On Tue, Mar 07, 2023 at 10:18:44AM -0800, Andres Freund wrote: > Hi, > > On 2023-03-06 15:21:14 -0500, Melanie Plageman wrote: > > Good point. Attached is what you suggested. I committed the transaction > > before the drop table so that the statistics would be visible when we > > queried pg_stat_io. > > Pushed, thanks for the report, analysis and fix, Tom, Horiguchi-san, Melanie. There's a 2nd portion of the test that's still flapping, at least on cirrusci. The issue that Tom mentioned is at: SELECT :io_sum_shared_after_writes > :io_sum_shared_before_writes; But what I've seen on cirrusci is at: SELECT :io_sum_shared_after_writes > :io_sum_shared_before_writes; https://api.cirrus-ci.com/v1/artifact/task/6701069548388352/log/src/test/recovery/tmp_check/regression.diffs https://api.cirrus-ci.com/v1/artifact/task/5355168397524992/log/src/test/recovery/tmp_check/regression.diffs https://api.cirrus-ci.com/v1/artifact/task/6142435751886848/testrun/build/testrun/recovery/027_stream_regress/log/regress_log_027_stream_regress It'd be neat if cfbot could show a histogram of test failures, although I'm not entirely sure what granularity would be most useful: the test that failed (027_regress) or the way it failed (:after_write > :before_writes). Maybe it's enough to show the test, with links to its recent failures. -- Justin
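The two granularities floated above — bucketing by the test that failed versus by the way it failed — can be sketched with a small aggregation. Everything here is hypothetical: cfbot does not expose failures as structured records, so the (test_name, symptom) pairs and the function are an assumed format for illustration only.

```python
from collections import Counter

def failure_histogram(failures, granularity="test"):
    """Bucket recent failure records either by failing test or by symptom.
    `failures` is an assumed list of (test_name, symptom) pairs."""
    key = 0 if granularity == "test" else 1
    return Counter(record[key] for record in failures)

# Hypothetical recent-failure data, modeled on the reports in this thread.
failures = [
    ("027_stream_regress", ":after_writes > :before_writes"),
    ("027_stream_regress", ":after_writes > :before_writes"),
    ("027_stream_regress", "timeout waiting for catchup"),
]
by_test = failure_histogram(failures, "test")
by_symptom = failure_histogram(failures, "symptom")
```

With per-test buckets linking to the underlying logs, the coarse view would still let someone drill down to the per-symptom view when a test flaps in more than one way.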
Re: Refactor calculations to use instr_time
Hi, There was a warning while applying the patch, v5 is attached. Regards, Nazir Bilal Yavuz Microsoft From dcd49e48a0784a95b8731df1c6ee7c3a612a8529 Mon Sep 17 00:00:00 2001 From: Nazir Bilal Yavuz Date: Thu, 9 Mar 2023 15:35:38 +0300 Subject: [PATCH v5] Refactor instr_time calculations In instr_time.h it is stated that: * When summing multiple measurements, it's recommended to leave the * running sum in instr_time form (ie, use INSTR_TIME_ADD or * INSTR_TIME_ACCUM_DIFF) and convert to a result format only at the end. So, this patch aims to refactor 'PendingWalStats' to use 'instr_time' instead of 'PgStat_Counter' while accumulating 'wal_write_time' and 'wal_sync_time'. Also, some calculations are refactored to use 'INSTR_TIME_ACCUM_DIFF' instead of using 'INSTR_TIME_SUBTRACT' and 'INSTR_TIME_ADD'. --- src/backend/access/transam/xlog.c | 6 ++--- src/backend/storage/file/buffile.c | 6 ++--- src/backend/utils/activity/pgstat_wal.c | 31 + src/include/pgstat.h| 17 +- src/tools/pgindent/typedefs.list| 1 + 5 files changed, 37 insertions(+), 24 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 543d4d897ae..46821ad6056 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -2206,8 +2206,7 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible) instr_time duration; INSTR_TIME_SET_CURRENT(duration); - INSTR_TIME_SUBTRACT(duration, start); - PendingWalStats.wal_write_time += INSTR_TIME_GET_MICROSEC(duration); + INSTR_TIME_ACCUM_DIFF(PendingWalStats.wal_write_time, duration, start); } PendingWalStats.wal_write++; @@ -8204,8 +8203,7 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli) instr_time duration; INSTR_TIME_SET_CURRENT(duration); - INSTR_TIME_SUBTRACT(duration, start); - PendingWalStats.wal_sync_time += INSTR_TIME_GET_MICROSEC(duration); + INSTR_TIME_ACCUM_DIFF(PendingWalStats.wal_sync_time, duration, start); } PendingWalStats.wal_sync++; diff --git 
a/src/backend/storage/file/buffile.c b/src/backend/storage/file/buffile.c index 0a51624df3b..e55f86b675e 100644 --- a/src/backend/storage/file/buffile.c +++ b/src/backend/storage/file/buffile.c @@ -469,8 +469,7 @@ BufFileLoadBuffer(BufFile *file) if (track_io_timing) { INSTR_TIME_SET_CURRENT(io_time); - INSTR_TIME_SUBTRACT(io_time, io_start); - INSTR_TIME_ADD(pgBufferUsage.temp_blk_read_time, io_time); + INSTR_TIME_ACCUM_DIFF(pgBufferUsage.temp_blk_read_time, io_time, io_start); } /* we choose not to advance curOffset here */ @@ -544,8 +543,7 @@ BufFileDumpBuffer(BufFile *file) if (track_io_timing) { INSTR_TIME_SET_CURRENT(io_time); - INSTR_TIME_SUBTRACT(io_time, io_start); - INSTR_TIME_ADD(pgBufferUsage.temp_blk_write_time, io_time); + INSTR_TIME_ACCUM_DIFF(pgBufferUsage.temp_blk_write_time, io_time, io_start); } file->curOffset += bytestowrite; diff --git a/src/backend/utils/activity/pgstat_wal.c b/src/backend/utils/activity/pgstat_wal.c index e8598b2f4e0..58daae3fbd6 100644 --- a/src/backend/utils/activity/pgstat_wal.c +++ b/src/backend/utils/activity/pgstat_wal.c @@ -21,7 +21,7 @@ #include "executor/instrument.h" -PgStat_WalStats PendingWalStats = {0}; +PgStat_PendingWalStats PendingWalStats = {0}; /* * WAL usage counters saved from pgWALUsage at the previous call to @@ -70,7 +70,7 @@ bool pgstat_flush_wal(bool nowait) { PgStatShared_Wal *stats_shmem = &pgStatLocal.shmem->wal; - WalUsage diff = {0}; + WalUsage wal_usage_diff = {0}; Assert(IsUnderPostmaster || !IsPostmasterEnvironment); Assert(pgStatLocal.shmem != NULL && @@ -88,25 +88,26 @@ pgstat_flush_wal(bool nowait) * Calculate how much WAL usage counters were increased by subtracting the * previous counters from the current ones. 
*/ - WalUsageAccumDiff(&diff, &pgWalUsage, &prevWalUsage); - PendingWalStats.wal_records = diff.wal_records; - PendingWalStats.wal_fpi = diff.wal_fpi; - PendingWalStats.wal_bytes = diff.wal_bytes; + WalUsageAccumDiff(&wal_usage_diff, &pgWalUsage, &prevWalUsage); if (!nowait) LWLockAcquire(&stats_shmem->lock, LW_EXCLUSIVE); else if (!LWLockConditionalAcquire(&stats_shmem->lock, LW_EXCLUSIVE)) return true; -#define WALSTAT_ACC(fld) stats_shmem->stats.fld += PendingWalStats.fld - WALSTAT_ACC(wal_records); - WALSTAT_ACC(wal_fpi); - WALSTAT_ACC(wal_bytes); - WALSTAT_ACC(wal_buffers_full); - WALSTAT_ACC(wal_write); - WALSTAT_ACC(wal_sync); - WALSTAT_ACC(wal_write_time); - WALSTAT_ACC(wal_sync_time); +#define WALSTAT_ACC(fld, var_to_add) \ + (stats_shmem->stats.fld += var_to_add.fld) +#define WALLSTAT_ACC_INSTR_TIME_TYPE(fld) \ + (stats_shmem->stats.fld += INSTR_TIME_GET_MICROSEC(PendingWalStats.fld)) + WALSTAT_ACC(wal_records, wal_usage_diff); + WALSTAT_ACC(wal_fpi, wal_usage_diff); + WALSTAT_ACC(wal_bytes, wal_usage_diff); + WALSTAT_ACC(wal_buffers_full, P
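The accumulation pattern the patch adopts — quoted from instr_time.h in the commit message — is: keep the running sum in the clock's native representation and convert to the reporting unit only once at the end. A minimal sketch of that discipline, with integer nanoseconds standing in for instr_time and the final division standing in for a single INSTR_TIME_GET_MICROSEC at flush time:

```python
import time

def timed_total_microsec(operations):
    """Time a series of callables, accumulating in native units
    (integer nanoseconds) and converting to microseconds exactly once,
    instead of converting (and rounding) on every iteration."""
    total_ns = 0
    for op in operations:
        start = time.perf_counter_ns()
        op()
        # analogous to INSTR_TIME_ACCUM_DIFF(total, now, start)
        total_ns += time.perf_counter_ns() - start
    # analogous to one INSTR_TIME_GET_MICROSEC at the end
    return total_ns // 1000
```

Converting per iteration would discard the sub-microsecond remainder of every measurement; accumulating natively loses at most one conversion's worth of rounding for the whole series, which is the motivation for moving PendingWalStats to instr_time.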
Re: SQL/JSON revisited
On 08.03.23 22:40, Andrew Dunstan wrote: There are probably some fairly easy opportunities to reduce the number of non-terminals introduced here (e.g. I think json_aggregate_func could possibly be expanded in place without introducing json_object_aggregate_constructor and json_array_aggregate_constructor). I'm going to make an attempt at that, at least to pick some low hanging fruit. But in the end I think we are going to be left with a significant expansion of the grammar rules, more or less forced on us by the way the SQL Standards Committee rather profligately invents new ways of contorting the grammar. I browsed these patches, and I agree that the grammar is the thing that sticks out as something that could be tightened up a bit. Try to reduce the number of different symbols, and check that the keywords are all in alphabetical order. There are also various bits of code that are commented out, in some cases because they can't be implemented, in some cases without explanation. I think these should all be removed. Otherwise, whoever needs to touch this code next would be under some sort of obligation to keep the commented-out code up to date with surrounding changes, which would be awkward. We can find better ways to explain missing functionality and room for improvement. Also, perhaps we can find better names for the new test files. Like, what does "sqljson.sql" mean, as opposed to, say, "json.sql"? Maybe something like "json_functions", "json_expressions", etc. would be clearer. (Starting it with "json" would also group the files better.) These both seem like things not worth holding up progress for, and I think it would be good to get these patches committed as soon as possible. My intention is to commit them (after some grammar adjustments) plus their documentation in the next few days. If possible, the documentation for each incremental part should be part of that patch, not a separate all-in-one patch.
Re: Should vacuum process config file reload more often
On Thu, Mar 9, 2023 at 4:47 PM John Naylor wrote: > > > On Thu, Mar 9, 2023 at 12:42 AM Jim Nasby wrote: > > > > Doesn't the dead tuple space grow as needed? Last I looked we don't > > allocate up to 1GB right off the bat. > > Incorrect. > > > Of course, if the patch that eliminates the 1GB vacuum limit gets committed > > the situation will be even worse. > > If you're referring to the proposed tid store, I'd be interested in seeing a > reproducible test case with a m_w_m over 1GB where it makes things worse than > the current state of affairs. And I think that the tidstore makes it easy to react to maintenance_work_mem changes. We don't need to enlarge it and just update its memory limit at an appropriate time. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
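The point that the tidstore need not be enlarged — only its memory limit updated — can be sketched with a toy bounded store. This is an assumed model, not the proposed tidstore's API: the structure grows on demand, so reacting to a reloaded maintenance_work_mem is just a matter of changing the bound that admission is checked against.

```python
class DeadTupleStore:
    """Toy grow-on-demand store with a soft memory budget."""
    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.used_bytes = 0

    def update_limit(self, new_limit_bytes):
        # Reacting to a config reload needs no reallocation or copying:
        # only the comparison bound changes.
        self.limit_bytes = new_limit_bytes

    def add(self, nbytes):
        if self.used_bytes + nbytes > self.limit_bytes:
            return False      # caller would trigger an index-vacuum pass
        self.used_bytes += nbytes
        return True
```

A lowered limit takes effect at the next add(), and a raised limit lets an in-progress vacuum keep accumulating without restarting — the "update its memory limit at an appropriate time" behavior described above.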
Re: Add shared buffer hits to pg_stat_io
On Wed, Mar 8, 2023 at 2:23 PM Andres Freund wrote: > > On 2023-03-08 13:44:32 -0500, Melanie Plageman wrote: > > However, I am concerned that, while unlikely, this could be flakey. > > Something could happen to force all of those blocks out of shared > > buffers (even though they were just read in) before we hit them. > > You could make the test query a simple nested loop self-join, that'll prevent > the page being evicted, because it'll still be pinned on the outer side, while > generating hits on the inner side. Good idea. v3 attached. From e15bd63d662780898ef6fd584608acb13a12dbe1 Mon Sep 17 00:00:00 2001 From: Melanie Plageman Date: Sat, 25 Feb 2023 14:37:25 -0500 Subject: [PATCH v3 1/2] Reorder pgstatfuncs-local enum --- src/backend/utils/adt/pgstatfuncs.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c index b61a12382b..1112bc2904 100644 --- a/src/backend/utils/adt/pgstatfuncs.c +++ b/src/backend/utils/adt/pgstatfuncs.c @@ -1276,16 +1276,16 @@ pgstat_get_io_op_index(IOOp io_op) { case IOOP_EVICT: return IO_COL_EVICTIONS; + case IOOP_EXTEND: + return IO_COL_EXTENDS; + case IOOP_FSYNC: + return IO_COL_FSYNCS; case IOOP_READ: return IO_COL_READS; case IOOP_REUSE: return IO_COL_REUSES; case IOOP_WRITE: return IO_COL_WRITES; - case IOOP_EXTEND: - return IO_COL_EXTENDS; - case IOOP_FSYNC: - return IO_COL_FSYNCS; } elog(ERROR, "unrecognized IOOp value: %d", io_op); -- 2.37.2 From b648727abe315375fbf13d314fd3b9398a75b26f Mon Sep 17 00:00:00 2001 From: Melanie Plageman Date: Sat, 25 Feb 2023 14:36:06 -0500 Subject: [PATCH v3 2/2] Track shared buffer hits in pg_stat_io Count new IOOP_HITs and add "hits" column to pg_stat_io. 
Reviewed-by: Bertrand Drouvot Discussion: https://www.postgresql.org/message-id/flat/CAAKRu_beMa9Hzih40%3DXPYqhDVz6tsgUGTrhZXRo%3Dunp%2Bszb%3DUA%40mail.gmail.com --- doc/src/sgml/monitoring.sgml | 11 src/backend/catalog/system_views.sql | 1 + src/backend/storage/buffer/bufmgr.c| 38 ++ src/backend/storage/buffer/localbuf.c | 11 ++-- src/backend/utils/activity/pgstat_io.c | 2 +- src/backend/utils/adt/pgstatfuncs.c| 3 ++ src/include/catalog/pg_proc.dat| 6 ++-- src/include/pgstat.h | 1 + src/include/storage/buf_internals.h| 2 +- src/test/regress/expected/rules.out| 3 +- src/test/regress/expected/stats.out| 34 +-- src/test/regress/sql/stats.sql | 21 -- 12 files changed, 90 insertions(+), 43 deletions(-) diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 6249bb50d0..8b34ca60bc 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -3855,6 +3855,17 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i + + + +hits bigint + + +The number of times a desired block was found in a shared buffer. 
+ + + + diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index 34ca0e739f..87bbbdfcb3 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -1126,6 +1126,7 @@ SELECT b.writes, b.extends, b.op_bytes, + b.hits, b.evictions, b.reuses, b.fsyncs, diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c index 0a05577b68..05fd3c9a2a 100644 --- a/src/backend/storage/buffer/bufmgr.c +++ b/src/backend/storage/buffer/bufmgr.c @@ -472,7 +472,7 @@ static BufferDesc *BufferAlloc(SMgrRelation smgr, ForkNumber forkNum, BlockNumber blockNum, BufferAccessStrategy strategy, - bool *foundPtr, IOContext *io_context); + bool *foundPtr, IOContext io_context); static void FlushBuffer(BufferDesc *buf, SMgrRelation reln, IOObject io_object, IOContext io_context); static void FindAndDropRelationBuffers(RelFileLocator rlocator, @@ -850,13 +850,14 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum, if (isLocalBuf) { /* - * LocalBufferAlloc() will set the io_context to IOCONTEXT_NORMAL. We - * do not use a BufferAccessStrategy for I/O of temporary tables. + * We do not use a BufferAccessStrategy for I/O of temporary tables. * However, in some cases, the "strategy" may not be NULL, so we can't * rely on IOContextForStrategy() to set the right IOContext for us. * This may happen in cases like CREATE TEMPORARY TABLE AS... */ - bufHdr = LocalBufferAlloc(smgr, forkNum, blockNum, &found, &io_context); + io_context = IOCONTEXT_NORMAL; + io_object = IOOBJECT_TEMP_RELATION; + bufHdr = LocalBufferAlloc(smgr, forkNum, blockNum, &found); if (found) pgBufferUsage.lo
Re: buildfarm + meson
On 2023-03-08 We 17:23, Andrew Dunstan wrote: On 2023-03-08 We 14:22, Andres Freund wrote: Hi, On 2023-03-02 17:35:26 -0500, Andrew Dunstan wrote: On 2023-03-02 Th 17:06, Andres Freund wrote: Hi On 2023-03-02 17:00:47 -0500, Andrew Dunstan wrote: On 2023-03-01 We 16:32, Andres Freund wrote: This is now working on my MSVC test rig (WS2019, VS2019, Strawberry Perl), including TAP tests. I do get a whole lot of annoying messages like this: Unknown TAP version. The first line MUST be `TAP version `. Assuming version 12. The newest minor version has fixed that, it was a misunderstanding about / imprecision in the tap 14 specification. Unfortunately, meson v 1.0.1 appears to be broken on Windows, I had to downgrade back to 1.0.0. Is it possible that you're using a PG checkout from a few days ago? A hack I used was invalidated by 1.0.1, but I fixed that already. CI is running with 1.0.1: https://cirrus-ci.com/task/5806561726038016?logs=configure#L8 No, running against PG master tip. I'll get some details - it's not too hard to switch back and forth. Any more details? I was held up by difficulties even with meson 1.0.0 (the test modules stuff). Now I again have a clean build with meson 1.0.0 on Windows as a baseline I will get back to trying meson 1.0.1. OK, I have now got a clean run using meson 1.0.1 / MSVC. Not sure what made the difference. One change I did make was to stop using "--backend vs" and thus use the ninja backend even for MSVC. That proved necessary to run the new install-test-files target which failed miserably with "--backend vs". Not sure if we have it documented, but if not it should be that you need to use the ninja backend on all platforms. At this stage I think I'm prepared to turn this loose on a couple of my buildfarm animals, and if nothing goes awry for the remainder of the month merge the dev/meson branch and push a new release. There is still probably a little polishing to do, especially w.r.t. log file artefacts. 
cheers andrew -- Andrew Dunstan EDB:https://www.enterprisedb.com
Re: meson: Non-feature feature options
On 03.03.23 11:01, Nazir Bilal Yavuz wrote: On Fri, 3 Mar 2023 at 12:16, Peter Eisentraut wrote: On 02.03.23 11:41, Nazir Bilal Yavuz wrote: I am kind of confused. I added these checks for considering other SSL implementations in the future, for this reason I have two nested if checks. The top one is for checking if we need to search an SSL library and the nested one is for checking if we need to search this specific SSL library. What do you think? I suppose that depends on how you envision integrating other SSL libraries into this logic. It's not that important right now; if the structure makes sense to you, that's fine. Please send an updated patch with the small changes that have been mentioned. The updated patch is attached. This seems to work well. One flaw, the "External libraries" summary shows something like ssl: YES 3.0.7 It would be nice if it showed "openssl". How about we just hardcode "openssl" here instead? We could build that array dynamically, of course, but maybe we leave that until we actually have a need?
Re: meson: Non-feature feature options
> On 9 Mar 2023, at 14:45, Peter Eisentraut > wrote: > How about we just hardcode "openssl" here instead? We could build that array > dynamically, of course, but maybe we leave that until we actually have a need? At least for 16 keeping it hardcoded is an entirely safe bet so +1 for leaving additional complexity for when needed. -- Daniel Gustafsson
Re: meson: Non-feature feature options
Hi, On Thu, 9 Mar 2023 at 16:54, Daniel Gustafsson wrote: > > > On 9 Mar 2023, at 14:45, Peter Eisentraut > > wrote: > > > How about we just hardcode "openssl" here instead? We could build that > > array dynamically, of course, but maybe we leave that until we actually > > have a need? > > At least for 16 keeping it hardcoded is an entirely safe bet so +1 for leaving > additional complexity for when needed. We already have the 'ssl_library' variable. Can't we use that instead of hardcoding 'openssl'? e.g: summary( { 'ssl': ssl.found() ? [ssl, '(@0@)'.format(ssl_library)] : ssl, }, section: 'External libraries', list_sep: ', ', ) And it will output: ssl: YES 3.0.8, (openssl) I don't think that using 'ssl_library' will increase the complexity. Regards, Nazir Bilal Yavuz Microsoft
Re: postgres_fdw, dblink, and CREATE SUBSCRIPTION security
On Wed, Mar 8, 2023 at 5:44 PM Jacob Champion wrote: > > On Wed, Mar 8, 2023 at 11:40 AM Robert Haas wrote: > > On Wed, Mar 8, 2023 at 2:30 PM Jacob Champion > > wrote: > > > I don't think I necessarily like that option better than SASL-style, > > > but hopefully that clarifies it somewhat? > > > > Hmm, yeah, I guess that's OK. > > Okay, cool. > > > I still don't love it, though. It feels > > more solid to me if the proxy can actually block the connections > > before they even happen, without having to rely on a server > > interaction to figure out what is permissible. > > Sure. I don't see a way for the proxy to figure that out by itself, > though, going back to my asymmetry argument from before. Only the > server truly knows, at time of HBA processing, whether the proxy > itself has authority. If the proxy knew, it wouldn't be confused. > > > I don't know what you mean by SASL-style, exactly. > > That's the one where the server explicitly names all forms of > authentication, including the ambient ones (ANONYMOUS, EXTERNAL, > etc.), and requires the client to choose one before running any > actions on their behalf. That lets the require_auth machinery work for > this case, too. > > --Jacob -- Robert Haas EDB: http://www.enterprisedb.com
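Jacob's "SASL-style" idea above can be made concrete with a small sketch: the server names every authentication form it would accept, ambient authority included, and the proxy only proceeds when a non-ambient method that its operator explicitly allows (via something like libpq's require_auth) is available. The method names and helper below are illustrative only, not a real libpq or server API.

```python
# Sketch of the "SASL-style" negotiation described above: the server advertises
# all auth forms, including ambient ones, and the proxy refuses to proceed
# unless a non-ambient method permitted by its operator is offered.
AMBIENT = {"ANONYMOUS", "EXTERNAL", "trust", "peer"}  # illustrative set

def proxy_should_proceed(server_offered, require_auth):
    """require_auth: the methods the proxy's operator permits (in the spirit
    of libpq's require_auth option). Ambient authority is always rejected."""
    acceptable = sorted(m for m in server_offered
                        if m in require_auth and m not in AMBIENT)
    return acceptable[0] if acceptable else None

print(proxy_should_proceed({"ANONYMOUS", "SCRAM-SHA-256"}, {"SCRAM-SHA-256"}))  # SCRAM-SHA-256
print(proxy_should_proceed({"trust"}, {"SCRAM-SHA-256"}))  # None -> refuse
```

The point of the design is visible in the second call: when the only way in is ambient authority, the proxy declines rather than lending its own credentials.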
Re: meson: Non-feature feature options
> On 9 Mar 2023, at 15:12, Nazir Bilal Yavuz wrote: > > Hi, > > On Thu, 9 Mar 2023 at 16:54, Daniel Gustafsson wrote: >> >>> On 9 Mar 2023, at 14:45, Peter Eisentraut >>> wrote: >> >>> How about we just hardcode "openssl" here instead? We could build that >>> array dynamically, of course, but maybe we leave that until we actually >>> have a need? >> >> At least for 16 keeping it hardcoded is an entirely safe bet so +1 for >> leaving >> additional complexity for when needed. > > We already have the 'ssl_library' variable. Can't we use that instead > of hardcoding 'openssl'? e.g: > > summary( > { >'ssl': ssl.found() ? [ssl, '(@0@)'.format(ssl_library)] : ssl, > }, > section: 'External libraries', > list_sep: ', ', > ) > > And it will output: > ssl: YES 3.0.8, (openssl) > > I don't think that using 'ssl_library' will increase the complexity. That seems like a good idea. -- Daniel Gustafsson
Re: postgres_fdw, dblink, and CREATE SUBSCRIPTION security
On Wed, Mar 8, 2023 at 5:44 PM Jacob Champion wrote: > Sure. I don't see a way for the proxy to figure that out by itself, > though, going back to my asymmetry argument from before. Only the > server truly knows, at time of HBA processing, whether the proxy > itself has authority. If the proxy knew, it wouldn't be confused. That seems like a circular argument. If you call the problem the confused deputy problem then the issue must indeed be that the deputy is confused, and needs to talk to someone else to get un-confused. But why is the deputy necessarily confused in the first place? Our deputy is confused because our code to decide whether to proxy a connection or not is super-dumb, but if there's an intrinsic reason it can't be smarter, I don't understand what it is. -- Robert Haas EDB: http://www.enterprisedb.com
Re: meson: Non-feature feature options
On 09.03.23 15:12, Nazir Bilal Yavuz wrote: Hi, On Thu, 9 Mar 2023 at 16:54, Daniel Gustafsson wrote: On 9 Mar 2023, at 14:45, Peter Eisentraut wrote: How about we just hardcode "openssl" here instead? We could build that array dynamically, of course, but maybe we leave that until we actually have a need? At least for 16 keeping it hardcoded is an entirely safe bet so +1 for leaving additional complexity for when needed. We already have the 'ssl_library' variable. Can't we use that instead of hardcoding 'openssl'? e.g: summary( { 'ssl': ssl.found() ? [ssl, '(@0@)'.format(ssl_library)] : ssl, }, section: 'External libraries', list_sep: ', ', ) And it will output: ssl: YES 3.0.8, (openssl) I don't think that using 'ssl_library' will increase the complexity. Then we might as well use ssl_library as the key, like: { ... 'selinux': selinux, ssl_library: ssl, 'systemd': systemd, ... }
Re: Track IO times in pg_stat_io
> >>> Now I've a second thought: what do you think about resetting the related > >>> number > >>> of operations and *_time fields when enabling/disabling track_io_timing? > >>> (And mention it in the doc). > >>> > >>> That way it'd prevent bad interpretation (at least as far as the time per > >>> operation metrics are concerned). > >>> > >>> Thinking that way as we'd lose some (most?) benefits of the new *_time > >>> columns > >>> if one can't "trust" their related operations and/or one is not sampling > >>> pg_stat_io frequently enough (to discard the samples > >>> where the track_io_timing changes occur). > >>> > >>> But well, resetting the operations could also lead to bad interpretation > >>> about the operations... > >>> > >>> Not sure about which approach I like the most yet, what do you think? > >> > >> Oh, this is an interesting idea. I think you are right about the > >> synchronization issues making the statistics untrustworthy and, thus, > >> unusable. > > > > No, I don't think we can do that. It can be enabled on a per-session basis. > Oh right. So it's even less clear to me how one would make use of > those new *_time fields, given that: > - pg_stat_io is "global" across all sessions. So, even if one session is > doing some "testing" and needs to turn track_io_timing on, then it > is not even sure it's only reflecting its own testing (as other sessions may > have turned it on too). > - There is the risk mentioned above of bad interpretations for the "time per > operation" metrics. > - Even if there is frequent enough sampling of pg_stat_io, one does not > know which samples contain track_io_timing changes (at the cluster or session > level). As long as track_io_timing can be toggled, blk_write_time could lead to wrong conclusions. I think it may be helpful to separately track the blks_read that occur while track_io_timing is enabled. 
blks_read will be as is and give the overall blks_read, while a new column blks_read_with_timing will only report on blks_read with track_io_timing enabled. blks_read_with_timing should never be larger than blks_read. This will then make the blks_read_time valuable if it's looked at with the blks_read_with_timing column. Regards, -- Sami Imseih Amazon Web Services (AWS)
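The invariant Sami describes, blks_read_with_timing never exceeding blks_read, is easy to model. The sketch below is a toy illustration of the proposed counter pair; the column names come from the proposal above, not from any committed PostgreSQL code.

```python
class ReadStats:
    """Toy model of the proposed counters (names are hypothetical,
    taken from the proposal above, not from real PostgreSQL code)."""

    def __init__(self):
        self.blks_read = 0               # every read, timed or not
        self.blks_read_with_timing = 0   # only reads made while timing was on
        self.blk_read_time_ms = 0.0      # accumulated only while timing was on

    def record_read(self, track_io_timing, elapsed_ms):
        self.blks_read += 1
        if track_io_timing:
            self.blks_read_with_timing += 1
            self.blk_read_time_ms += elapsed_ms

    def avg_ms_per_timed_read(self):
        # Dividing by blks_read would understate the average whenever
        # track_io_timing was off for part of the workload; dividing by
        # the timed counter keeps the ratio meaningful.
        if self.blks_read_with_timing == 0:
            return None
        return self.blk_read_time_ms / self.blks_read_with_timing

stats = ReadStats()
for i in range(100):
    # timing enabled for only half the reads, as if toggled mid-workload
    stats.record_read(track_io_timing=(i % 2 == 0), elapsed_ms=0.5)

assert stats.blks_read_with_timing <= stats.blks_read
print(stats.blks_read, stats.blks_read_with_timing, stats.avg_ms_per_timed_read())
# 100 50 0.5
```

Even though half the reads went untimed, the time-per-read ratio stays at the true 0.5 ms because it is computed against the timed counter only.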
Re: Improve logging when using Huge Pages
Greetings, * Nathan Bossart (nathandboss...@gmail.com) wrote: > On Wed, Feb 15, 2023 at 10:13:17AM -0800, Nathan Bossart wrote: > > On Tue, Feb 14, 2023 at 07:32:56PM -0600, Justin Pryzby wrote: > >> On Mon, Feb 13, 2023 at 08:18:52PM -0800, Nathan Bossart wrote: > >>> I'm curious why you chose to make this a string instead of an enum. There > >>> might be little practical difference, but since there are only three > >>> possible values, I wonder if it'd be better form to make it an enum. > >> > >> It takes more code to write as an enum - see 002.txt. I'm not convinced > >> this is better. > >> > >> But your comment made me fix its , and reconsider the strings, > >> which I changed to active={unknown/true/false} rather than {unk/on/off}. > >> It could also be active={unknown/yes/no}... > > > > I think unknown/true/false is fine. I'm okay with using a string if no one > > else thinks it should be an enum (or a bool). > > There's been no response for this, so I guess we can proceed with a string > GUC. Just happened to see this and I'm not really a fan of this being a string when it's pretty clear that's not what it actually is. > +Reports whether huge pages are in use by the current instance. > +See for more information. > > I still think we should say "server" in place of "current instance" here. We certainly use 'server' a lot more in config.sgml than we do 'instance'. "currently running server" might be closer to how we describe a running PG system in other parts (we talk about "currently running server processes", "while the server is running", "When running a standby server", "when the server is running"; "instance" is used much less and seems to more typically refer to 'state of files on disk' in my reading vs. 'actively running process' though there's some of each). 
> + {"huge_pages_active", PGC_INTERNAL, PRESET_OPTIONS, > + gettext_noop("Indicates whether huge pages are in > use."), > + NULL, > + GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE | > GUC_RUNTIME_COMPUTED > + }, > > I don't think we need to use GUC_RUNTIME_COMPUTED. 'postgres -C' seems to > always report "unknown" for this GUC, so all this would do is cause that > command to error unnecessarily when the server is running. ... or we could consider adjusting things to actually try the mmap() and find out if we'd end up with huge pages or not. Naturally we'd only want to do that if the server isn't running though and erroring if it is would be perfectly correct. Either that or just refusing to provide it by an error or other approach (see below) seems entirely reasonable. > It might be worth documenting exactly what "unknown" means. IIUC you'll > only ever see "on" or "off" via SHOW or pg_settings, which doesn't seem > tremendously obvious. If we could get rid of that case and just make this a boolean, that seems like it'd really be the best answer. To that end- perhaps this isn't appropriate as a GUC at all? Why not have this just be a system information function? Functionally it really provides the same info- if the server is running then you get back either true or false, if it's not then you can't call it but that's hardly different from an unknown or error result. Thanks, Stephen
Re: meson: Non-feature feature options
Hi, On Thu, 9 Mar 2023 at 17:18, Peter Eisentraut wrote: > > On 09.03.23 15:12, Nazir Bilal Yavuz wrote: > > Hi, > > > > On Thu, 9 Mar 2023 at 16:54, Daniel Gustafsson wrote: > >> > >>> On 9 Mar 2023, at 14:45, Peter Eisentraut > >>> wrote: > >> > >>> How about we just hardcode "openssl" here instead? We could build that > >>> array dynamically, of course, but maybe we leave that until we actually > >>> have a need? > >> > >> At least for 16 keeping it hardcoded is an entirely safe bet so +1 for > >> leaving > >> additional complexity for when needed. > > > > We already have the 'ssl_library' variable. Can't we use that instead > > of hardcoding 'openssl'? e.g: > > > > summary( > >{ > > 'ssl': ssl.found() ? [ssl, '(@0@)'.format(ssl_library)] : ssl, > >}, > >section: 'External libraries', > >list_sep: ', ', > > ) > > > > And it will output: > > ssl: YES 3.0.8, (openssl) > > > > I don't think that using 'ssl_library' will increase the complexity. > > Then we might as well use ssl_library as the key, like: > > { > ... > 'selinux': selinux, > ssl_library: ssl, > 'systemd': systemd, > ... > } > There will be a problem if ssl is not found. It will output 'none: NO' because 'ssl_library' is initialized as 'none' for now. We can initialize 'ssl_library' as 'ssl' but I am not sure that is a good idea. Regards, Nazir Bilal Yavuz Microsoft
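The problem Nazir raises, and the fallback he suggests, can be shown in miniature. This Python sketch only mirrors the meson summary logic being discussed; it is not meson code, and the key/value shapes are illustrative.

```python
# The summary-key problem in miniature: if the dict key comes straight from
# ssl_library while it is still 'none', the summary line would read "none: NO"
# instead of "ssl: NO". Falling back to a stable key avoids that.
def summary_entry(ssl_found, ssl_library):
    key = ssl_library if ssl_found else "ssl"   # stable key when not found
    value = f"YES ({ssl_library})" if ssl_found else "NO"
    return {key: value}

print(summary_entry(True, "openssl"))   # {'openssl': 'YES (openssl)'}
print(summary_entry(False, "none"))     # {'ssl': 'NO'}
```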
Re: [PATCH] Align GSS and TLS error handling in PQconnectPoll()
Greetings, * Michael Paquier (mich...@paquier.xyz) wrote: > On Fri, Feb 17, 2023 at 09:01:43AM -0800, Jacob Champion wrote: > > On Thu, Feb 16, 2023 at 10:59 PM Michael Paquier > > wrote: > >> I am adding Stephen Frost > >> in CC to see if he has any comments about all this part of the logic > >> with gssencmode. > > > > Sounds good. > > Hearing nothing on this part, perhaps we should just move on and > adjust the behavior on HEAD? That seems like one step in the right > direction. If this brews right, we could always discuss a backpatch > at some point, if necessary. > > Thoughts from others? I agree with matching how SSL is handled here, and in a review of the proposed patch I didn't see any issues with it. It seems like something that should also be back-patched, and it doesn't look terribly risky to do so; is there a specific risk that you see? Thanks, Stephen
Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)
Hi, > 1. Use internal 64 bit page numbers in SLRUs without changing segments naming. > 2. Use the larger segment file names in async.c, to lift the current 8 > GB limit on the max number of pending notifications. > [...] > > Where our main focus for PG16 is going to be 1 and 2, and we may try > to deliver 6 and 7 too if time will permit. > > Maxim and I agreed (offlist) that I'm going to submit 1 and 2. The > patch 1 is ready, please see the attachment. I'm currently working on > 2 and going to submit it in a bit. It seems to be relatively > straightforward but I don't want to rush things and want to make sure > I didn't miss something. PFA the patch v57, which now includes both 1 and 2. -- Best regards, Aleksander Alekseev

Attachments:
v57-0001-Index-SLRUs-by-64-bit-integers-rather-than-by-32.patch
v57-0002-Use-larger-segment-file-names-for-pg_notify.patch
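For readers unfamiliar with SLRU naming, the mapping from page number to segment file name looks roughly like the sketch below. SLRU_PAGES_PER_SEGMENT matches slru.h at the time of writing, but the 15-hex-digit "long" form is only illustrative of the widened names the patch series introduces for 64-bit page numbers, not necessarily the exact committed format.

```python
SLRU_PAGES_PER_SEGMENT = 32  # matches PostgreSQL's slru.h

def slru_file_name(pageno, long_names=False):
    """Map an SLRU page number to its segment file name.

    The 4-hex-digit short form mirrors the historical SlruFileName();
    the 15-digit long form is an illustrative stand-in for the widened
    naming needed once page numbers are 64-bit.
    """
    segno = pageno // SLRU_PAGES_PER_SEGMENT
    if long_names:
        return f"{segno:015X}"
    # With the short form, segno only fits in 4 hex digits while
    # segno < 0x10000; 64-bit page numbers overflow it quickly.
    return f"{segno:04X}"

print(slru_file_name(0))                       # 0000
print(slru_file_name(31))                      # 0000 (same segment)
print(slru_file_name(32))                      # 0001
print(slru_file_name(2**40, long_names=True))  # 000000800000000
```

The last line shows why item 2 above needs the larger names: a 64-bit page number produces a segment number far beyond what four hex digits can express.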
Re: Disable rdns for Kerberos tests
Greetings, * Heikki Linnakangas (hlinn...@iki.fi) wrote: > On 25 February 2023 00:50:30 EET, Stephen Frost wrote: > >Thanks for reviewing! Comments added and updated the commit message. > > > >Unless there's anything else, I'll push this early next week. > > s/capture portal/captive portal/. Other than that, looks good to me. Pushed, thanks again! Stephen
Re: HOT chain validation in verify_heapam()
On Wed, Mar 8, 2023 at 7:30 PM Himanshu Upadhyaya <upadhyaya.himan...@gmail.com> wrote:

Please find the v11 patch with all these changes.

--
Regards,
Himanshu Upadhyaya
EnterpriseDB: http://www.enterprisedb.com

From f2b262e95fe07dddfec994f20a6d2e76bc12b410 Mon Sep 17 00:00:00 2001
From: Himanshu Upadhyaya
Date: Thu, 9 Mar 2023 21:18:58 +0530
Subject: [PATCH v11] Implement HOT chain validation in verify_heapam()

Himanshu Upadhyaya, reviewed by Robert Haas, Aleksander Alekseev,
Andres Freund. Some revisions by Robert Haas.
---
 contrib/amcheck/verify_heapam.c           | 291 +-
 src/bin/pg_amcheck/t/004_verify_heapam.pl | 288 -
 2 files changed, 562 insertions(+), 17 deletions(-)

diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c
index 4fcfd6df72..9cd4a795a0 100644
--- a/contrib/amcheck/verify_heapam.c
+++ b/contrib/amcheck/verify_heapam.c
@@ -150,7 +150,9 @@ typedef struct HeapCheckContext
 } HeapCheckContext;

 /* Internal implementation */
-static void check_tuple(HeapCheckContext *ctx);
+static void check_tuple(HeapCheckContext *ctx,
+                        bool *xmin_commit_status_ok,
+                        XidCommitStatus *xmin_commit_status);
 static void check_toast_tuple(HeapTuple toasttup, HeapCheckContext *ctx,
                               ToastedAttribute *ta, int32 *expected_chunk_seq,
                               uint32 extsize);
@@ -160,7 +162,9 @@ static void check_toasted_attribute(HeapCheckContext *ctx,
                                     ToastedAttribute *ta);

 static bool check_tuple_header(HeapCheckContext *ctx);
-static bool check_tuple_visibility(HeapCheckContext *ctx);
+static bool check_tuple_visibility(HeapCheckContext *ctx,
+                                   bool *xmin_commit_status_ok,
+                                   XidCommitStatus *xmin_commit_status);

 static void report_corruption(HeapCheckContext *ctx, char *msg);
 static void report_toast_corruption(HeapCheckContext *ctx,
@@ -399,9 +403,16 @@ verify_heapam(PG_FUNCTION_ARGS)
     for (ctx.blkno = first_block; ctx.blkno <= last_block; ctx.blkno++)
     {
         OffsetNumber maxoff;
+        OffsetNumber predecessor[MaxOffsetNumber];
+        OffsetNumber successor[MaxOffsetNumber];
+        bool        lp_valid[MaxOffsetNumber];
+        bool        xmin_commit_status_ok[MaxOffsetNumber];
+        XidCommitStatus xmin_commit_status[MaxOffsetNumber];

         CHECK_FOR_INTERRUPTS();

+        memset(predecessor, 0, sizeof(OffsetNumber) * MaxOffsetNumber);
+
         /* Optionally skip over all-frozen or all-visible blocks */
         if (skip_option != SKIP_PAGES_NONE)
         {
@@ -433,6 +444,12 @@ verify_heapam(PG_FUNCTION_ARGS)
         for (ctx.offnum = FirstOffsetNumber; ctx.offnum <= maxoff;
              ctx.offnum = OffsetNumberNext(ctx.offnum))
         {
+            BlockNumber nextblkno;
+            OffsetNumber nextoffnum;
+
+            successor[ctx.offnum] = InvalidOffsetNumber;
+            lp_valid[ctx.offnum] = false;
+            xmin_commit_status_ok[ctx.offnum] = false;
             ctx.itemid = PageGetItemId(ctx.page, ctx.offnum);

             /* Skip over unused/dead line pointers */
@@ -469,6 +486,14 @@ verify_heapam(PG_FUNCTION_ARGS)
                 report_corruption(&ctx,
                                   psprintf("line pointer redirection to unused item at offset %u",
                                            (unsigned) rdoffnum));
+
+                /*
+                 * Record the fact that this line pointer has passed basic
+                 * sanity checking, and also the offset number to which it
+                 * points.
+                 */
+                lp_valid[ctx.offnum] = true;
+                successor[ctx.offnum] = rdoffnum;
                 continue;
             }
@@ -502,11 +527,237 @@ verify_heapam(PG_FUNCTION_ARGS)
             }

             /* It should be safe to examine the tuple's header, at least */
+            lp_valid[ctx.offnum] = true;
             ctx.tuphdr = (HeapTupleHeader) PageGetItem(ctx.page, ctx.itemid);
             ctx.natts = HeapTupleHeaderGetNatts(ctx.tuphdr);

             /* Ok, ready to check this next tuple */
-            check_tuple(&ctx);
+            check_tuple(&ctx,
+                        &xmin_commit_status_ok[ctx.offnum],
+                        &xmin_commit_status[ctx.offnum]);
+
+            /*
+             * If the CTID field of this tuple seems to point to another tuple
+             * on the same page, record that tuple as the successor of this
+             * one.
+             */
+            nextblkno = ItemPointerGetBlockNumber(&(ctx.tuphdr)->t_ctid);
+            nextoffnum = ItemPointerGetOffsetNumber(&(ctx.tuphdr)->t_ctid);
+            if (nextblkno == ctx.blkno && nextoffnum != ctx.offnum)
+                successor[ctx.offnum] = nextoffnum;
+        }
+
+        /*
+         * Update chain validation. Check each line pointer that's got a valid
+         * successor against that successor.
+         */
+        ctx.attnum = -1;
+        for (ctx.offnum = FirstOffsetNumber; ctx.offnum <= maxoff;
+             ctx.offnum = OffsetNumberNext(ctx.offnum))
+        {
+            ItemId      curr_lp;
+            ItemId      next_lp;
+            HeapTupleHeader curr_htup;
+            HeapTupleHeader next_htup;
+            TransactionId curr_xmin;
+            TransactionId curr_xmax;
+            TransactionId next_xmin;
+            OffsetNumber nextoffnum = successor[ctx.offnum];
+
+            /*
+             * The current line pointer may not have a successor, either
+             * because it's not valid or because it didn't point to anything.
+             * In either case, we have to give up.
+             *
+             * If the current line pointer does point
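In miniature, the two-pass structure of the patch (first record each line pointer's successor, then validate what it points to) looks like the sketch below. The item states and fields are simplified stand-ins, not amcheck's real structures, and only the redirect check is modeled.

```python
# Miniature model of the patch's two passes: record successors, then validate.
# Only the "redirection to unused item" check is modeled here.
UNUSED, NORMAL, REDIRECT, DEAD = "unused", "normal", "redirect", "dead"

def check_page(items):
    """items: dict offset -> (state, target_offset_or_None).
    Returns a list of corruption messages."""
    # Pass 1: record each line pointer's successor.
    successor = {}
    for off, (state, target) in items.items():
        if state == REDIRECT:
            successor[off] = target
        elif state == NORMAL and target is not None and target != off:
            # a tuple whose ctid points elsewhere on the same page
            successor[off] = target

    # Pass 2: validate each recorded successor.
    problems = []
    for off, nxt in successor.items():
        nxt_state = items.get(nxt, (UNUSED, None))[0]
        if items[off][0] == REDIRECT and nxt_state != NORMAL:
            problems.append(
                f"line pointer redirection to unused item at offset {nxt}")
    return problems

# offset 1 redirects to a normal tuple: fine; offset 3 redirects to a hole.
page = {1: (REDIRECT, 2), 2: (NORMAL, None), 3: (REDIRECT, 4)}
print(check_page(page))  # ['line pointer redirection to unused item at offset 4']
```

The real patch goes much further (xmin/xmax matching along the chain, commit-status caching, and so on), but the successor-array shape is the same.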
Re: Add shared buffer hits to pg_stat_io
Hi, On 3/9/23 2:23 PM, Melanie Plageman wrote: On Wed, Mar 8, 2023 at 2:23 PM Andres Freund wrote: On 2023-03-08 13:44:32 -0500, Melanie Plageman wrote: However, I am concerned that, while unlikely, this could be flakey. Something could happen to force all of those blocks out of shared buffers (even though they were just read in) before we hit them. You could make the test query a simple nested loop self-join, that'll prevent the page being evicted, because it'll still be pinned on the outer side, while generating hits on the inner side. Good idea. v3 attached. Thanks! The added test looks good to me. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
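The nested-loop trick quoted above is worth spelling out: the inner side re-reads pages the outer scan has already brought in, and the outer side's pin keeps them from being evicted, so every inner access is a guaranteed hit. A toy buffer pool makes the arithmetic visible; this models the idea only, not PostgreSQL's buffer manager.

```python
# Toy buffer pool illustrating the test trick: with a nested-loop self-join,
# the inner side re-reads pages the outer side just brought in, so the hit
# count is stable and predictable.
class BufferPool:
    def __init__(self):
        self.cached = set()
        self.reads = 0
        self.hits = 0

    def access(self, page):
        if page in self.cached:
            self.hits += 1
        else:
            self.reads += 1
            self.cached.add(page)

pool = BufferPool()
pages = [1, 2, 3]
for outer in pages:          # outer scan: first touch of each page -> read
    pool.access(outer)
    for inner in pages:      # inner rescan: already cached -> hit
        pool.access(inner)

print(pool.reads, pool.hits)  # 3 reads, 9 hits
```

Three pages cost exactly three reads, and all nine inner-side accesses are hits, which is why the regression test's expected counts don't flake.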
Re: Disable rdns for Kerberos tests
Stephen Frost writes: > Push, thanks again! Why'd you only change HEAD? Isn't the test equally fragile in the back branches? regards, tom lane
Re: Add LZ4 compression in pg_dump
On Wed, Mar 01, 2023 at 05:39:54PM +0100, Tomas Vondra wrote: > On 2/27/23 05:49, Justin Pryzby wrote: > > On Sat, Feb 25, 2023 at 08:05:53AM -0600, Justin Pryzby wrote: > >> On Fri, Feb 24, 2023 at 11:02:14PM -0600, Justin Pryzby wrote: > >>> I have some fixes (attached) and questions while polishing the patch for > >>> zstd compression. The fixes are small and could be integrated with the > >>> patch for zstd, but could be applied independently. > >> > >> One more - WriteDataToArchiveGzip() says: > > > > One more again. > > > > The LZ4 path is using non-streaming mode, which compresses each block > > without persistent state, giving poor compression for -Fc compared with > > -Fp. If the data is highly compressible, the difference can be orders > > of magnitude. > > > > $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -Z lz4 -Fp |wc -c > > 12351763 > > $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -Z lz4 -Fc |wc -c > > 21890708 > > > > That's not true for gzip: > > > > $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z gzip -Fc |wc -c > > 2118869 > > $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z gzip -Fp |wc -c > > 2115832 > > > > The function ought to at least use streaming mode, so each block/row > > isn't compressed in isolation. 003 is a simple patch to use > > streaming mode, which improves the -Fc case: > > > > $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -Z lz4 -Fc |wc -c > > 15178283 > > > > However, that still flushes the compression buffer, writing a block > > header, for every row. With a single-column table, pg_dump -Fc -Z lz4 > > still outputs ~10% *more* data than with no compression at all. And > > that's for compressible data. > > > > $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Fc -Z lz4 |wc -c > > 12890296 > > $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Fc -Z none |wc -c > > 11890296 > > > > I think this should use the LZ4F API with frames, which are buffered to > > avoid outputting a header for every single row. 
The LZ4F format isn't > > compatible with the LZ4 format, so (unlike changing to the streaming > > API) that's not something we can change in a bugfix release. I consider > > this an Open Item. > > > > With the LZ4F API in 004, -Fp and -Fc are essentially the same size > > (like gzip). (Oh, and the output is three times smaller, too.) > > > > $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z lz4 -Fp |wc -c > > 4155448 > > $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z lz4 -Fc |wc -c > > 4156548 > > Thanks. Those are definitely interesting improvements/optimizations! > > I suggest we track them as a separate patch series - please add them to > the CF app (I guess you'll have to add them to 2023-07 at this point, > but we can get them in, I think). Thanks for looking. I'm not sure if I'm the best person to write/submit the patch to implement that for LZ4. Georgios, would you want to take on this change? I think that needs to be changed for v16, since 1) LZ4F works so much better like this, and 2) we can't change it later without breaking compatibility of the dumpfiles, short of changing the header to a name other than "lz4". Also, I imagine we'd want to continue supporting the ability to *restore* a dumpfile using the old (current) format, which would be untestable code unless we also preserved the ability to write it somehow (like -Z lz4-old). One issue is that LZ4F_createCompressionContext() and LZ4F_compressBegin() ought to be called in InitCompressorLZ4(). It seems like it might *need* to be called there to avoid exactly the kind of issue that I reported with empty LOs with gzip. But InitCompressorLZ4() isn't currently passed the ArchiveHandle, so it can't write the header. And LZ4CompressorState has a simple char *buf, and not a more elaborate data structure like zlib's. You could work around that by storing the "len" of the existing buffer and flushing it in EndCompressorLZ4(), but that adds needless complexity to the Write and End functions. 
Maybe the Init function should be passed the AH. -- Justin
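The per-block versus streaming effect Justin measured can be reproduced with any stream compressor. The sketch below uses zlib from the Python standard library rather than LZ4, purely to illustrate the principle: compressing each row in isolation resets the history window and pays per-block overhead, while a single streaming context amortizes both.

```python
# Per-row vs. streaming compression of highly repetitive data. zlib stands in
# for LZ4 here; the shape of the result is what matters, not the library.
import zlib

rows = [b"some,highly,repetitive,row,data\n"] * 1000

# pg_dump's pre-fix LZ4 path in spirit: one independent compression per row
per_row = sum(len(zlib.compress(r)) for r in rows)

# streaming style: a single context across all rows, flushed once at the end
c = zlib.compressobj()
streamed = sum(len(c.compress(r)) for r in rows) + len(c.flush())

raw = sum(len(r) for r in rows)
print(raw, per_row, streamed)
assert streamed < per_row  # streaming wins by a wide margin on repetitive data
```

With per-row compression the output is actually larger than the uncompressed input, mirroring the "~10% more data than with no compression" result above, while the streaming context shrinks it dramatically.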
Re: Improve logging when using Huge Pages
On Thu, Mar 09, 2023 at 09:34:10AM -0500, Stephen Frost wrote: > Greetings, > > * Nathan Bossart (nathandboss...@gmail.com) wrote: > > On Wed, Feb 15, 2023 at 10:13:17AM -0800, Nathan Bossart wrote: > > > On Tue, Feb 14, 2023 at 07:32:56PM -0600, Justin Pryzby wrote: > > >> On Mon, Feb 13, 2023 at 08:18:52PM -0800, Nathan Bossart wrote: > > >>> I'm curious why you chose to make this a string instead of an enum. > > >>> There > > >>> might be little practical difference, but since there are only three > > >>> possible values, I wonder if it'd be better form to make it an enum. > > >> > > >> It takes more code to write as an enum - see 002.txt. I'm not convinced > > >> this is better. > > >> > > >> But your comment made me fix its , and reconsider the strings, > > >> which I changed to active={unknown/true/false} rather than {unk/on/off}. > > >> It could also be active={unknown/yes/no}... > > > > > > I think unknown/true/false is fine. I'm okay with using a string if no > > > one > > > else thinks it should be an enum (or a bool). > > > > There's been no response for this, so I guess we can proceed with a string > > GUC. > > Just happened to see this and I'm not really a fan of this being a > string when it's pretty clear that's not what it actually is. I don't understand what you mean by that. Why do you say it isn't a string? > > +Reports whether huge pages are in use by the current instance. > > +See for more information. > > > > I still think we should say "server" in place of "current instance" here. > > We certainly use 'server' a lot more in config.sgml than we do > 'instance'. "currently running server" might be closer to how we > describe a running PG system in other parts (we talk about "currently > running server processes", "while the server is running", "When running > a standby server", "when the server is running"; "instance" is used much > less and seems to more typically refer to 'state of files on disk' in my > reading vs. 
'actively running process' though there's some of each). I called it "instance" since the GUC has no meaning when the server is not running. I'm fine with renaming it to "running server". > > + {"huge_pages_active", PGC_INTERNAL, PRESET_OPTIONS, > > + gettext_noop("Indicates whether huge pages are in > > use."), > > + NULL, > > + GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE | > > GUC_RUNTIME_COMPUTED > > + }, > > > > I don't think we need to use GUC_RUNTIME_COMPUTED. 'postgres -C' seems to > > always report "unknown" for this GUC, so all this would do is cause that > > command to error unnecessarily when the server is running. > > ... or we could consider adjusting things to actually try the mmap() and > find out if we'd end up with huge pages or not. That seems like a bad idea, since it might work one moment and fail one moment later. If we could tell in advance whether it was going to work, we wouldn't be here, and probably also wouldn't have invented huge_pages=true. > > It might be worth documenting exactly what "unknown" means. IIUC you'll > > only ever see "on" or "off" via SHOW or pg_settings, which doesn't seem > > tremendously obvious. > > If we could get rid of that case and just make this a boolean, that > seems like it'd really be the best answer. > > To that end- perhaps this isn't appropriate as a GUC at all? Why not > have this just be a system information function? Functionally it really > provides the same info- if the server is running then you get back > either true or false, if it's not then you can't call it but that's > hardly different from an unknown or error result. We talked about making it a function ~6 weeks ago. Is there an agreement to use a function instead? -- Justin
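As the exchange above notes, only the server knows the real outcome of its own mmap(); an outside observer can at best approximate. On Linux one rough approximation is to parse /proc/meminfo, as sketched below. This is emphatically not what the patch does (the patch reports the server's own result); the sketch only illustrates why outside guesses stay guesses.

```python
# Rough, Linux-only approximation of huge-page state, parsed from /proc/meminfo
# text. HugePages_Total/HugePages_Free are real meminfo fields; the returned
# labels are made up for this sketch.
def huge_page_state(meminfo_text):
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        fields[key.strip()] = rest.split()[0] if rest.split() else "0"
    total = int(fields.get("HugePages_Total", "0"))
    free = int(fields.get("HugePages_Free", "0"))
    if total == 0:
        return "off"           # no huge pages reserved at all
    if free < total:
        return "maybe-active"  # someone is using them, not necessarily us!
    return "available"

sample = ("MemTotal: 16303384 kB\nHugePages_Total: 512\n"
          "HugePages_Free: 200\nHugepagesize: 2048 kB\n")
print(huge_page_state(sample))  # maybe-active
```

The "maybe" in the middle case is the whole point: even when huge pages are visibly in use system-wide, nothing here says whether this particular postmaster got them, which is exactly what the proposed GUC or function answers from inside.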
Re: Track IO times in pg_stat_io
Hi, v4 attached addresses these review comments. On Tue, Mar 7, 2023 at 1:39 PM Andres Freund wrote: > On 2023-03-06 11:30:13 -0500, Melanie Plageman wrote: > > > As pgstat_bktype_io_stats_valid() is called only in Assert(), I think > > > that would be a good idea > > > to also check that if counts are not Zero then times are not Zero. > > > > Yes, I think adding some validation around the relationship between > > counts and timing should help prevent developers from forgetting to call > > pg_stat_count_io_op() when calling pgstat_count_io_time() (as relevant). > > > > However, I think that we cannot check that if IO counts are non-zero > > that IO times are non-zero, because the user may not have > > track_io_timing enabled. We can check that if IO times are not zero, IO > > counts are not zero. I've done this in the attached v3. > > And even if track_io_timing is enabled, the timer granularity might be so low > that we *still* get zeroes. > > I wonder if we should get rid of pgStatBlockReadTime, pgStatBlockWriteTime, And then have pg_stat_reset_shared('io') reset pg_stat_database IO stats? > > @@ -1000,11 +1000,27 @@ ReadBuffer_common(SMgrRelation smgr, char > > relpersistence, ForkNumber forkNum, > > > > if (isExtend) > > { > > + instr_time io_start, > > + io_time; > > + > > /* new buffers are zero-filled */ > > MemSet((char *) bufBlock, 0, BLCKSZ); > > + > > + if (track_io_timing) > > + INSTR_TIME_SET_CURRENT(io_start); > > + else > > + INSTR_TIME_SET_ZERO(io_start); > > + > > I wonder if there's an argument for tracking this in the existing IO stats as > well. But I guess we've lived with this for a long time... Not sure I want to include that in this patchset. > > @@ -2981,16 +2998,16 @@ FlushBuffer(BufferDesc *buf, SMgrRelation reln, > > IOObject io_object, > >* When a strategy is not in use, the write can only be a "regular" > > write > >* of a dirty shared buffer (IOCONTEXT_NORMAL IOOP_WRITE). 
> >*/ > > - pgstat_count_io_op(IOOBJECT_RELATION, io_context, IOOP_WRITE); > > - > > if (track_io_timing) > > { > > INSTR_TIME_SET_CURRENT(io_time); > > INSTR_TIME_SUBTRACT(io_time, io_start); > > > > pgstat_count_buffer_write_time(INSTR_TIME_GET_MICROSEC(io_time)); > > INSTR_TIME_ADD(pgBufferUsage.blk_write_time, io_time); > > + pgstat_count_io_time(IOOBJECT_RELATION, io_context, > > IOOP_WRITE, io_time); > > } > > I think this needs a bit of cleanup - pgstat_count_buffer_write_time(), > pgBufferUsage.blk_write_time++, pgstat_count_io_time() is a bit excessive. We > might not be able to reduce the whole duplication at this point, but at least > it should be a bit more centralized. So, in the attached v4, I've introduced pgstat_io_start() and pgstat_io_end(...). The end IO function takes the IOObject, IOOp, and IOContext, in addition to the start_time, so that we know which pgBufferUsage field to increment and which pgstat_count_buffer_*_time() to call. I will note that calling this function now causes pgBufferUsage and pgStatBlock*Time to be incremented in a couple of places that they were not before. I think those might have been accidental omissions, so I think it is okay. The exception is pgstat_count_buffer_write_time() being only called for relations in shared buffers and not temporary relations while pgstat_count_buffer_read_time() is called for temporary relations and relations in shared buffers. I left that behavior as is, though it seems like it is wrong. I added pgstat_io_start() to pgstat.c -- not sure if it is best there. I could separate it into a commit that does this refactoring of the existing counting (without adding pgstat_count_io_time()) and then another that adds pgstat_count_io_time(). I hesitated to do that until I knew that the new functions were viable. 
> > + pgstat_count_io_op(IOOBJECT_RELATION, io_context, IOOP_WRITE); > > pgBufferUsage.shared_blks_written++; > > > > /* > > @@ -3594,6 +3611,9 @@ FlushRelationBuffers(Relation rel) > > > > if (RelationUsesLocalBuffers(rel)) > > { > > + instr_time io_start, > > + io_time; > > + > > for (i = 0; i < NLocBuffer; i++) > > { > > uint32 buf_state; > > @@ -3616,6 +3636,11 @@ FlushRelationBuffers(Relation rel) > > > > PageSetChecksumInplace(localpage, > > bufHdr->tag.blockNum); > > > > + if (track_io_timing) > > + INSTR_TIME_SET_CURRENT(io_start); > > + else > > + INSTR_TIME_SET_ZERO(io_start); > > + > > smgrwrite(RelationGetSmgr(rel), > >
Re: [PATCH] Add pretty-printed XML output option
While reviewing this patch, I started to wonder why we don't eliminate the maintenance hassle of xml_1.out by putting in a short-circuit at the top of the test, similar to those in some other scripts: /* skip test if XML support not compiled in */ SELECT 'one'::xml; \if :ERROR \quit \endif (and I guess xmlmap.sql could get the same treatment). The only argument I can think of against it is that the current approach ensures we produce a clean error (and not, say, a crash) for all xml.c entry points not just xml_in. I'm not sure how much that's worth though. The compiler/linker would tell us if we miss compiling out every reference to libxml2. Thoughts? regards, tom lane
Re: Infinite Interval
Hi Joseph, Thanks for working on the patch. Sorry for taking so long to review this patch. But here's it finally, review of code changes static pg_tz *FetchDynamicTimeZone(TimeZoneAbbrevTable *tbl, const datetkn *tp, - DateTimeErrorExtra *extra); + DateTimeErrorExtra * extra); There are a lot of these diffs. PG code doesn't leave an extra space between variable name and *. /* Handle the integer part */ -if (!int64_multiply_add(val, scale, &itm_in->tm_usec)) +if (pg_mul_add_s64_overflow(val, scale, &itm_in->tm_usec)) I think this is a good change, since we are moving the function to int.h where it belongs. We could separate these kind of changes into another patch for easy review. + +result->day = days; +if (pg_mul_add_s32_overflow(weeks, 7, &result->day)) +ereport(ERROR, +(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE), + errmsg("interval out of range"))); I think such changes are also good, but probably a separate patch for ease of review. secs = rint(secs * USECS_PER_SEC); -result->time = hours * ((int64) SECS_PER_HOUR * USECS_PER_SEC) + -mins * ((int64) SECS_PER_MINUTE * USECS_PER_SEC) + -(int64) secs; + +result->time = secs; +if (pg_mul_add_s64_overflow(mins, ((int64) SECS_PER_MINUTE * USECS_PER_SEC), &result->time)) +ereport(ERROR, +(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE), + errmsg("interval out of range"))); +if (pg_mul_add_s64_overflow(hours, ((int64) SECS_PER_HOUR * USECS_PER_SEC), &result->time)) +ereport(ERROR, +(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE), + errmsg("interval out of range"))); shouldn't this be secs = rint(secs); result->time = 0; pg_mul_add_s64_overflow(secs, USECS_PER_SEC, &result->time) to catch overflow error early? +if TIMESTAMP_IS_NOBEGIN +(dt2) Better be written as if (TIMESTAMP_IS_NOBEGIN(dt2))? There are more corrections like this. +if (INTERVAL_NOT_FINITE(result)) +ereport(ERROR, +(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE), + errmsg("interval out of range"))); Probably, I added these kind of checks. 
But I don't remember if those are defensive checks or whether it's really possible that the arithmetic above these lines can yield a non-finite interval. +else +{ +result->time = -interval->time; +result->day = -interval->day; +result->month = -interval->month; + +if (INTERVAL_NOT_FINITE(result)) +ereport(ERROR, +(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE), + errmsg("interval out of range"))); If this error ever gets to the user, it could be confusing. Can we elaborate by adding context e.g. errcontext("while negating an interval") or some such? - -result->time = -interval->time; -/* overflow check copied from int4um */ -if (interval->time != 0 && SAMESIGN(result->time, interval->time)) -ereport(ERROR, -(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE), - errmsg("interval out of range"))); -result->day = -interval->day; -if (interval->day != 0 && SAMESIGN(result->day, interval->day)) -ereport(ERROR, -(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE), - errmsg("interval out of range"))); -result->month = -interval->month; -if (interval->month != 0 && SAMESIGN(result->month, interval->month)) -ereport(ERROR, -(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE), - errmsg("interval out of range"))); +interval_um_internal(interval, result); Shouldn't we incorporate these checks in interval_um_internal()? I don't think INTERVAL_NOT_FINITE() covers all of those. +/* + * Subtracting two infinite intervals with different signs results in an + * infinite interval with the same sign as the left operand. Subtracting + * two infinte intervals with the same sign results in an error. + */ I think we need someone to validate these assumptions and similar assumptions in interval_pl(). Googling gives confusing results in some cases. I have not looked for IEEE standard around this specifically. 
+if (INTERVAL_NOT_FINITE(interval)) +{ +double r = NonFiniteIntervalPart(type, val, lowunits, + INTERVAL_IS_NOBEGIN(interval), + false); + +if (r) I see that this code is very similar to the corresponding code in timestamp and timestamptz, so it's bound to be correct. But I always thought float equality is unreliable. if (r) is equivalent to if (r != 0.0) so it will not work as intended. But maybe (float) 0.0 is a special value for which equality holds true. +static inline boo
Re: Improve logging when using Huge Pages
On 2023-Mar-09, Justin Pryzby wrote: > On Thu, Mar 09, 2023 at 09:34:10AM -0500, Stephen Frost wrote: > > > +Reports whether huge pages are in use by the current instance. > > > +See for more information. > > > > > > I still think we should say "server" in place of "current instance" here. > > > > We certainly use 'server' a lot more in config.sgml than we do > > 'instance'. "currently running server" might be closer to how we > > describe a running PG system in other parts (we talk about "currently > > running server processes", "while the server is running", "When running > > a standby server", "when the server is running"; "instance" is used much > > less and seems to more typically refer to 'state of files on disk' in my > > reading vs. 'actively running process' though there's some of each). > > I called it "instance" since the GUC has no meaning when it's not > running. I'm fine to rename it to "running server". I'd rather make all these other places use "instance" instead. We used to consider these terms interchangeable, but since we introduced the glossary to unify the terminology, they are no longer supposed to be. A server (== a machine) can contain many instances, and each individual instance in the server could be using huge pages or not. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "Now I have my system running, not a byte was off the shelf; It rarely breaks and when it does I fix the code myself. It's stable, clean and elegant, and lightning fast as well, And it doesn't cost a nickel, so Bill Gates can go to hell."
Re: Add LZ4 compression in pg_dump
On 3/9/23 17:15, Justin Pryzby wrote: > On Wed, Mar 01, 2023 at 05:39:54PM +0100, Tomas Vondra wrote: >> On 2/27/23 05:49, Justin Pryzby wrote: >>> On Sat, Feb 25, 2023 at 08:05:53AM -0600, Justin Pryzby wrote: On Fri, Feb 24, 2023 at 11:02:14PM -0600, Justin Pryzby wrote: > I have some fixes (attached) and questions while polishing the patch for > zstd compression. The fixes are small and could be integrated with the > patch for zstd, but could be applied independently. One more - WriteDataToArchiveGzip() says: >>> >>> One more again. >>> >>> The LZ4 path is using non-streaming mode, which compresses each block >>> without persistent state, giving poor compression for -Fc compared with >>> -Fp. If the data is highly compressible, the difference can be orders >>> of magnitude. >>> >>> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -Z lz4 -Fp |wc -c >>> 12351763 >>> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -Z lz4 -Fc |wc -c >>> 21890708 >>> >>> That's not true for gzip: >>> >>> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z gzip -Fc |wc -c >>> 2118869 >>> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z gzip -Fp |wc -c >>> 2115832 >>> >>> The function ought to at least use streaming mode, so each block/row >>> isn't compressed in isolation. 003 is a simple patch to use >>> streaming mode, which improves the -Fc case: >>> >>> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -Z lz4 -Fc |wc -c >>> 15178283 >>> >>> However, that still flushes the compression buffer, writing a block >>> header, for every row. With a single-column table, pg_dump -Fc -Z lz4 >>> still outputs ~10% *more* data than with no compression at all. And >>> that's for compressible data. 
>>> >>> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Fc -Z lz4 |wc -c >>> 12890296 >>> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Fc -Z none |wc -c >>> 11890296 >>> >>> I think this should use the LZ4F API with frames, which are buffered to >>> avoid outputting a header for every single row. The LZ4F format isn't >>> compatible with the LZ4 format, so (unlike changing to the streaming >>> API) that's not something we can change in a bugfix release. I consider >>> this an Opened Item. >>> >>> With the LZ4F API in 004, -Fp and -Fc are essentially the same size >>> (like gzip). (Oh, and the output is three times smaller, too.) >>> >>> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z lz4 -Fp |wc -c >>> 4155448 >>> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z lz4 -Fc |wc -c >>> 4156548 >> >> Thanks. Those are definitely interesting improvements/optimizations! >> >> I suggest we track them as a separate patch series - please add them to >> the CF app (I guess you'll have to add them to 2023-07 at this point, >> but we can get them in, I think). > > Thanks for looking. I'm not sure if I'm the best person to write/submit > the patch to implement that for LZ4. Georgios, would you want to take > on this change ? > > I think that needs to be changed for v16, since 1) LZ4F works so much > better like this, and 2) we can't change it later without breaking > compatibility of the dumpfiles by changing the header with another name > other than "lz4". Also, I imagine we'd want to continue supporting the > ability to *restore* a dumpfile using the old(current) format, which > would be untestable code unless we also preserved the ability to write > it somehow (like -Z lz4-old). > I'm a bit confused about the lz4 vs. lz4f stuff, TBH. If we switch to lz4f, doesn't that mean it (e.g. restore) won't work on systems that only have older lz4 version? 
What would/should happen if we take a backup compressed with lz4f, and then try restoring it on an older system where lz4 does not support lz4f? Maybe if lz4f format is incompatible with regular lz4, we should treat it as a separate compression method 'lz4f'? I'm mostly afk until the end of the week, but I tried searching for lz4f info - the results are not particularly enlightening, unfortunately. AFAICS this only applies to lz4f stuff. Or would the streaming mode be a breaking change too? > One issue is that LZ4F_createCompressionContext() and > LZ4F_compressBegin() ought to be called in InitCompressorLZ4(). It > seems like it might *need* to be called there to avoid exactly the kind > of issue that I reported with empty LOs with gzip. But > InitCompressorLZ4() isn't currently passed the ArchiveHandle, so can't > write the header. And LZ4CompressorState has a simple char *buf, and > not an more elaborate data structure like zlib. You could work around > that by keeping storing the "len" of the existing buffer, and flushing > it in EndCompressorLZ4(), but that adds needless complexity to the Write > and End functions. Maybe the Init function should be passed the AH. > Not sure, but looking at GzipCompressorState I see the only extra thing it has (compared to LZ4CompressorState) is "z_streamp". I can't experiment with this until the end of t
Re: improving user.c error messages
On Thu, Mar 09, 2023 at 10:55:54AM +0100, Peter Eisentraut wrote: > On 20.02.23 23:58, Nathan Bossart wrote: >> For now, I've reworded these as "must inherit privileges of". > > I don't have a good mental model of all this role inheritance, personally, > but I fear that this change makes the messages more jargony and less clear. > Maybe the original wording was good enough. I'm fine with that. > "admin option" is sort of a natural language term, I think, so we don't need > to parametrize it as "%s option". Also, there are no other "options" in > this context, I think. v16 introduces the INHERIT and SET options. I don't have a strong opinion about parameterizing it, though. My intent was to consistently capitalize all the attributes and options. > A general thought: It seems we currently don't have any error messages that > address the user like "You must do this". Do we want to go there? Should we > try for a more impersonal wording like > > "You must have the %s attribute to create roles." > > "Current user must have the %s attribute to create roles." > > "%s attribute is required to create roles." I think I like the last option the most. In general, I agree with trying to avoid the second-person phrasing. > By the way, I'm not sure what the separation between 0001 and 0002 is > supposed to be. I'll combine them. I first started with user.c only, but we kept finding new messages to improve. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: lz4 --rm on Ubuntu 18.04 (Add LZ4 compression to pg_dump)
On 3/9/23 01:30, Michael Paquier wrote: > On Thu, Mar 09, 2023 at 12:39:08AM +0100, Tomas Vondra wrote: >> IMO we should fix that. We have a bunch of buildfarm members running on >> Ubuntu 18.04 (or older) - it's true none of them seems to be running TAP >> tests. But considering how trivial the fix is ... >> >> Barring objections, I'll push a fix early next week. > > +1, better to change that, thanks. Actually, would --rm be OK even on > Windows? As far as I can see, the CI detects a LZ4 command for the > VS2019 environment but not the liblz4 libraries that would be needed > to trigger the set of tests. Thanks for noticing that. I'll investigate next week. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [PATCH] Add pretty-printed XML output option
On 09.03.23 18:38, Tom Lane wrote: While reviewing this patch, I started to wonder why we don't eliminate the maintenance hassle of xml_1.out by putting in a short-circuit at the top of the test, similar to those in some other scripts: /* skip test if XML support not compiled in */ SELECT 'one'::xml; \if :ERROR \quit \endif (and I guess xmlmap.sql could get the same treatment). The only argument I can think of against it is that the current approach ensures we produce a clean error (and not, say, a crash) for all xml.c entry points not just xml_in. I'm not sure how much that's worth though. The compiler/linker would tell us if we miss compiling out every reference to libxml2. Thoughts? regards, tom lane Hi Tom, I agree it would make things easier and it could indeed save some time (and some CI runs ;)). However, checking in the absence of libxml2 if an error message is raised, and checking if this error message is the one we expect, is IMHO also a very nice test. But I guess I could also live with skipping the whole thing. Best, Jim
WaitEventSet resource leakage
In [1] I wrote: > PG Bug reporting form writes: >> The following script: >> [ leaks a file descriptor per error ] > > Yeah, at least on platforms where WaitEventSets own kernel file > descriptors. I don't think it's postgres_fdw's fault though, > but that of ExecAppendAsyncEventWait, which is ignoring the > possibility of failing partway through. It looks like it'd be > sufficient to add a PG_CATCH or PG_FINALLY block there to make > sure the WaitEventSet is disposed of properly --- fortunately, > it doesn't need to have any longer lifespan than that one > function. After further thought that seems like a pretty ad-hoc solution. We probably can do no better in the back branches, but shouldn't we start treating WaitEventSets as ResourceOwner-managed resources? Otherwise, transient WaitEventSets are going to be a permanent source of headaches. regards, tom lane [1] https://www.postgresql.org/message-id/423731.1678381075%40sss.pgh.pa.us
Re: Move defaults toward ICU in 16?
On Thu, 2023-03-09 at 10:36 +0100, Peter Eisentraut wrote: > 0002 seems fine to me. Committed 0002 with some test improvements. > > Let's come back to that after dealing with the other two. Leaving 0001 open for now. 0003 committed after addressing your comments. -- Jeff Davis PostgreSQL Contributor Team - AWS
Re: Add standard collation UNICODE
On Thu, 2023-03-09 at 11:21 +0100, Peter Eisentraut wrote: > How about this patch version? Looks good to me. Regards, Jeff Davis
Re: psql \watch 2nd argument: iteration count
+ pg_log_error("Watch period must be non-negative number, but argument is '%s'", opt); After looking around at the other error messages in this file, I think we should make this more concise. Maybe something like pg_log_error("\\watch: invalid delay interval: %s", opt); + free(opt); + resetPQExpBuffer(query_buf); + return PSQL_CMD_ERROR; Is this missing psql_scan_reset(scan_state)? I haven't had a chance to look closely at 0002 yet. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
Hi, On 2023-03-09 06:51:31 -0600, Justin Pryzby wrote: > On Tue, Mar 07, 2023 at 10:18:44AM -0800, Andres Freund wrote: > > Hi, > > > > On 2023-03-06 15:21:14 -0500, Melanie Plageman wrote: > > > Good point. Attached is what you suggested. I committed the transaction > > > before the drop table so that the statistics would be visible when we > > > queried pg_stat_io. > > > > Pushed, thanks for the report, analysis and fix, Tom, Horiguchi-san, > > Melanie. > > There's a 2nd portion of the test that's still flapping, at least on > cirrusci. > > The issue that Tom mentioned is at: > SELECT :io_sum_shared_after_writes > :io_sum_shared_before_writes; > > But what I've seen on cirrusci is at: > SELECT :io_sum_shared_after_writes > :io_sum_shared_before_writes; Seems you meant to copy a different line for Tom's (s/writes/reads/)? > https://api.cirrus-ci.com/v1/artifact/task/6701069548388352/log/src/test/recovery/tmp_check/regression.diffs Hm. I guess the explanation here is that the buffers were already all written out by another backend. Which is made more likely by your patch. I found a few more occurrences and chatted with Melanie. Melanie will come up with a fix I think. Greetings, Andres Freund
Re: Improve logging when using Huge Pages
On Thu, Mar 09, 2023 at 10:38:56AM -0600, Justin Pryzby wrote: > On Thu, Mar 09, 2023 at 09:34:10AM -0500, Stephen Frost wrote: >> To that end- perhaps this isn't appropriate as a GUC at all? Why not >> have this just be a system information function? Functionally it really >> provides the same info- if the server is running then you get back >> either true or false, if it's not then you can't call it but that's >> hardly different from an unknown or error result. > > We talked about making it a function ~6 weeks ago. > > Is there an agreement to use a function, instead ? If we're tallying votes, please count me as +1 for a system information function. I think that nicely sidesteps having to return "unknown" in some cases, which I worry will just cause confusion. IMHO it is much more obvious that the value refers to the current server if you have to log in and execute a function to see it. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: buildfarm + meson
On 2023-03-09 Th 08:28, Andrew Dunstan wrote: At this stage I think I'm prepared to turn this loose on a couple of my buildfarm animals, and if nothing goes awry for the remainder of the month merge the dev/meson branch and push a new release. There is still probably a little polishing to do, especially w.r.t. log file artefacts. A few things I've found: . We don't appear to have an equivalent of the headerscheck and cpluspluscheck GNUmakefile targets . I don't know how to build other docs targets (e.g. postgres-US.pdf) . There appears to be some mismatch in database names (e.g. regression_dblink vs contrib_regression_dblink). That's going to cause some issues with the module that adjusts things for cross version upgrade. cheers andrew -- Andrew Dunstan EDB:https://www.enterprisedb.com
Re: Disable rdns for Kerberos tests
Greetings, * Tom Lane (t...@sss.pgh.pa.us) wrote: > Stephen Frost writes: > > Pushed, thanks again! > > Why'd you only change HEAD? Isn't the test equally fragile in the > back branches? We hadn't had any complaints about it and so I wasn't sure if it was useful to back-patch it. I'm happy to do so though. Thanks, Stephen
Re: Improve logging when using Huge Pages
On Thu, Mar 09, 2023 at 06:51:21PM +0100, Alvaro Herrera wrote: > I'd rather make all these other places use "instance" instead. We used > to consider these terms interchangeable, but since we introduced the > glossary to unify the terminology, they are no longer supposed to be. > A server (== a machine) can contain many instances, and each individual > instance in the server could be using huge pages or not. Ah, good to know. I've always considered "server" in this context to mean the server process(es) for a single instance, but I can see the value in having different terminology to clearly distinguish the process(es) from the machine. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: buildfarm + meson
Hi, On 2023-03-09 14:47:36 -0500, Andrew Dunstan wrote: > On 2023-03-09 Th 08:28, Andrew Dunstan wrote: > > At this stage I think I'm prepared to turn this loose on a couple of my > > buildfarm animals, and if nothing goes awry for the remainder of the > > month merge the dev/meson branch and push a new release. Cool! > > There is still probably a little polishing to do, especially w.r.t. log > > file artefacts. > A few things I've found: > > . We don't appear to have an equivalent of the headerscheck and > cpluspluscheck GNUmakefile targets Yes. I have a pending patch for it, but haven't yet cleaned it up sufficiently. The way headerscheck/cpluspluscheck query information from Makefile.global is somewhat nasty. > . I don't know how to build other docs targets (e.g. postgres-US.pdf) There's an 'alldocs' target, or you can do ninja doc/src/sgml/postgres-US.pdf > . There appears to be some mismatch in database names (e.g. > regression_dblink vs contrib_regression_dblink). That's going to cause some > issues with the module that adjusts things for cross version upgrade. I guess we can try to do something about that, but the make situation is overly complicated. I don't really want to emulate having randomly differing database names just because a test is in contrib/ rather than src/. Greetings, Andres Freund
Re: Improve logging when using Huge Pages
Greetings, * Justin Pryzby (pry...@telsasoft.com) wrote: > On Thu, Mar 09, 2023 at 09:34:10AM -0500, Stephen Frost wrote: > > * Nathan Bossart (nathandboss...@gmail.com) wrote: > > > On Wed, Feb 15, 2023 at 10:13:17AM -0800, Nathan Bossart wrote: > > > > On Tue, Feb 14, 2023 at 07:32:56PM -0600, Justin Pryzby wrote: > > > >> On Mon, Feb 13, 2023 at 08:18:52PM -0800, Nathan Bossart wrote: > > > >>> I'm curious why you chose to make this a string instead of an enum. > > > >>> There > > > >>> might be little practical difference, but since there are only three > > > >>> possible values, I wonder if it'd be better form to make it an enum. > > > >> > > > >> It takes more code to write as an enum - see 002.txt. I'm not > > > >> convinced > > > >> this is better. > > > >> > > > >> But your comment made me fix its , and reconsider the strings, > > > >> which I changed to active={unknown/true/false} rather than > > > >> {unk/on/off}. > > > >> It could also be active={unknown/yes/no}... > > > > > > > > I think unknown/true/false is fine. I'm okay with using a string if no > > > > one > > > > else thinks it should be an enum (or a bool). > > > > > > There's been no response for this, so I guess we can proceed with a string > > > GUC. > > > > Just happened to see this and I'm not really a fan of this being a > > string when it's pretty clear that's not what it actually is. > > I don't understand what you mean by that. > Why do you say it isn't a string ? Because it's clearly a yes/no, either the server is currently running with huge pages, or it isn't. That's the definition of a boolean. Sure, anything can be cast to text but when there's a data type that fits better, that's almost uniformly better to use. > > > +Reports whether huge pages are in use by the current instance. > > > +See for more information. > > > > > > I still think we should say "server" in place of "current instance" here. > > > > We certainly use 'server' a lot more in config.sgml than we do > > 'instance'. 
"currently running server" might be closer to how we > > describe a running PG system in other parts (we talk about "currently > > running server processes", "while the server is running", "When running > > a standby server", "when the server is running"; "instance" is used much > > less and seems to more typically refer to 'state of files on disk' in my > > reading vs. 'actively running process' though there's some of each). > > I called it "instance" since the GUC has no meaning when it's not > running. I'm fine to rename it to "running server". Great, I do think that would match better with the rest of the documentation. > > > + {"huge_pages_active", PGC_INTERNAL, PRESET_OPTIONS, > > > + gettext_noop("Indicates whether huge pages are in > > > use."), > > > + NULL, > > > + GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE | > > > GUC_RUNTIME_COMPUTED > > > + }, > > > > > > I don't think we need to use GUC_RUNTIME_COMPUTED. 'postgres -C' seems to > > > always report "unknown" for this GUC, so all this would do is cause that > > > command to error unnecessarily when the server is running. > > > > ... or we could consider adjusting things to actually try the mmap() and > > find out if we'd end up with huge pages or not. > > That seems like a bad idea, since it might work one moment and fail one > moment later. If we could tell in advance whether it was going to work, > we wouldn't be here, and probably also wouldn't have invented > huge_pages=true. Sure it might ... but I tend to disagree that it's actually all that likely for it to end up being as inconsistent as that and it'd be nice to be able to see if the server will end up successfully starting (for this part, at least), or not, when configured with huge pages set to on, or if starting with 'try' is likely to result in it actually using huge pages, or not. > > > It might be worth documenting exactly what "unknown" means. 
IIUC you'll > > > only ever see "on" or "off" via SHOW or pg_settings, which doesn't seem > > > tremendously obvious. > > > > If we could get rid of that case and just make this a boolean, that > > seems like it'd really be the best answer. > > > > To that end- perhaps this isn't appropriate as a GUC at all? Why not > > have this just be a system information function? Functionally it really > > provides the same info- if the server is running then you get back > > either true or false, if it's not then you can't call it but that's > > hardly different from an unknown or error result. > > We talked about making it a function ~6 weeks ago. Oh, good, glad I'm not the only one to have thought of that. > Is there an agreement to use a function, instead ? Looking back at the arguments for having it be a GUC ... I just don't really see any of them as very strong. Not that I feel super strongly about it being a function either, but it's certainly not a configuration va
Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE
> On Mar 8, 2023, at 4:15 PM, Andres Freund wrote: > > I worked some more on the fixes for amcheck, and fixes for amcheck. > > The second amcheck fix ends up correcting some inaccurate output in the tests > - previously xids from before xid 0 were reported to be in the future. > > Previously there was no test case exercising exceeding nextxid, without > wrapping around into the past. I added that at the end of > 004_verify_heapam.pl, because renumbering seemed too annoying. > > What do you think? The changes look reasonable to me. > Somewhat random note: > > Is it intentional that we VACUUM FREEZE test ROWCOUNT times? That's > effectively O(ROWCOUNT^2), albeit with small enough constants to not really > matter. I don't think we need to insert the rows one-by-one either. Changing > that to a single INSERT and FREEZE shaves 10-12% off the tests. I didn't > change that, but we also fire off a psql for each tuple for heap_page_items(), > with offset $N no less. That seems to be another 500ms. I don't recall the reasoning. Feel free to optimize the tests. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Improve logging when using Huge Pages
Greetings, * Alvaro Herrera (alvhe...@alvh.no-ip.org) wrote: > On 2023-Mar-09, Justin Pryzby wrote: > > On Thu, Mar 09, 2023 at 09:34:10AM -0500, Stephen Frost wrote: > > > > +Reports whether huge pages are in use by the current instance. > > > > +See for more information. > > > > > > > > I still think we should say "server" in place of "current instance" > > > > here. > > > > > > We certainly use 'server' a lot more in config.sgml than we do > > > 'instance'. "currently running server" might be closer to how we > > > describe a running PG system in other parts (we talk about "currently > > > running server processes", "while the server is running", "When running > > > a standby server", "when the server is running"; "instance" is used much > > > less and seems to more typically refer to 'state of files on disk' in my > > > reading vs. 'actively running process' though there's some of each). > > > > I called it "instance" since the GUC has no meaning when it's not > > running. I'm fine to rename it to "running server". > > I'd rather make all these other places use "instance" instead. We used > to consider these terms interchangeable, but since we introduced the > glossary to unify the terminology, they are no longer supposed to be. > A server (== a machine) can contain many instances, and each individual > instance in the server could be using huge pages or not. Perhaps, but then the references to instance should probably also be changed since there's certainly some that are referring to a set of data files and not to backend processes, eg: ## The shutdown setting is useful to have the instance ready at the exact replay point desired. The instance will still be able to replay more WAL records (and in fact will have to replay WAL records since the last checkpoint next time it is started). ## Not sure about just changing one thing at a time though or using the 'right' term when introducing things while having the 'wrong' term be used next to it. 
Also not saying that this patch should be responsible for fixing everything either. 'Server' in the glossary does explicitly say it could be used when referring to an 'instance' too though, so 'running server' doesn't seem to be entirely wrong. Thanks, Stephen
Re: ICU locale validation / canonicalization
On Thu, 2023-03-09 at 09:46 +0100, Peter Eisentraut wrote: > This patch appears to do about three things at once, and it's not > clear > exactly where the boundaries are between them and which ones we might > actually want. And I think the terminology also gets mixed up a bit, > which makes following this harder. > > 1. Canonicalizing the locale string. This is presumably what > uloc_canonicalize() does, which the patch doesn't actually use. What > are examples of what this does? Does the patch actually do this? Both uloc_canonicalize() and uloc_getLanguageTag() do Level 2 Canonicalization, which is described here: https://unicode-org.github.io/icu/userguide/locale/#canonicalization > 2. Converting the locale string to BCP 47 format. This converts > 'de@collation=phonebook' to 'de-u-co-phonebk'. This is what > uloc_getLanguageTag() does. Yes, though uloc_getLanguageTag() also canonicalizes. I consider converting to the language tag a part of "canonicalization", because it's the canonical form we agreed on in this thread. > 3. Validating the locale string, to reject faulty input. Canonicalization doesn't make sure the locale actually exists in ICU, so it's easy to make a typo like "jp_JP" instead of "ja_JP". After canonicalizing to a language tag, the former is "jp-JP" (resolving to the collator with valid locale "root") and the latter is "ja-JP" (resolving to the collator with valid locale "ja"). The former is clearly a mistake, and I call catching that mistake "validation". If the user specifies something other than the root locale (i.e. not "root", "und", or ""), and the locale resolves to a collator with a valid locale of "root", then this patch considers that to be a mistake and issues a WARNING (upgraded to ERROR if the GUC icu_locale_validation is true). > What are the relationships between these? 1 & 2 are closely related. If we canonicalize, we need to pick one canonical form: either BCP 47 or ICU format locale IDs. 
3 is related, but can be seen as an independent change. > I don't understand how the validation actually happens in your patch. > Does uloc_getLanguageTag() do the validation also? Using the above definition of "validation" it happens inside icu_collator_exists(). > Can you do canonicalization without converting to language tag? If we used uloc_canonicalize(), it would give us ICU format locale IDs, and that would be a valid thing to do; and we could switch the canonical form from ICU format locale IDs to BCP 47 in a separate patch. I don't have a strong opinion, but if we're going to canonicalize, I think it makes sense to go straight to language tags. > Can you do validation of un-canonicalized locale names? Yes, though I feel like an un-canonicalized name is less stable in meaning, and so validation on that name may also be less stable. For instance, if we don't canonicalize "fr_CA.UTF-8", it resolves to plain "fr"; but if we do canonicalize it first, it resolves to "fr-CA". Will the uncanonicalized name always resolve to "fr"? I'm not sure, because the documentation says that ucol_open() expects either an ICU format locale ID or, preferably, a language tag. So they are technically independently useful changes, but I would recommend that canonicalization goes in first. > What is the guidance for the use of the icu_locale_validation GUC? If an error when creating a new collation or database due to a bad locale name would be highly disruptive, leave it false. If such an error would be helpful to make sure you get the locale you expect, then turn it on. In practice, existing important production systems would leave it off; new systems could turn it on to help avoid misconfigurations/mistakes. > The description throws in yet another term: "validates that ICU > locale > strings are well-formed". What is "well-formed"? How does that > relate > to the other concepts? Good point, I don't think I need to redefine "validation". 
Maybe I should just describe it as elevating canonicalization or validation problems from WARNING to ERROR. > Personally, I'm not on board with this behavior: > > => CREATE COLLATION test (provider = icu, locale = > 'de@collation=phonebook'); > NOTICE: 0: using language tag "de-u-co-phonebk" for locale > "de@collation=phonebook" > > I mean, maybe that is a thing we want to do somehow sometime, to > migrate > people to the "new" spellings, but the old ones aren't wrong. I see what you mean; I'm not sure the best thing to do here. We are adjusting the string passed by the user, and it feels like some users might want to know that. It's a NOTICE, not a WARNING, so it's not meant to imply that it's wrong. But at the same time I can see it being annoying or confusing. If it's confusing, perhaps a wording change and documentation would improve it? If it's annoying, we might need to have an option and/or a different log level? > It also doesn't appear to address > how to handle ICU before version 54. Do you have a specific c
Re: [PATCH] Add pretty-printed XML output option
Peter Smith writes: > The patch v19 LGTM. I've looked through this now, and have some minor complaints and a major one. The major one is that it doesn't work for XML that doesn't satisfy IS DOCUMENT. For example, regression=# select '42'::xml is document; ?column? -- f (1 row) regression=# select xmlserialize (content '42' as text); xmlserialize --- 42 (1 row) regression=# select xmlserialize (content '42' as text indent); ERROR: invalid XML document DETAIL: line 1: Extra content at the end of the document 42 ^ This is not what the documentation promises, and I don't think it's good enough --- the SQL spec has no restriction saying you can't use INDENT with CONTENT. I tried adjusting things so that we call xml_parse() with the appropriate DOCUMENT or CONTENT xmloption flag, but all that got me was empty output (except for a document header). It seems like xmlDocDumpFormatMemory is not the thing to use, at least not in the CONTENT case. But libxml2 has a few other "dump" functions, so maybe we can use a different one? I see we are using xmlNodeDump elsewhere, and that has a format option, so maybe there's a way forward there. A lesser issue is that INDENT tacks on a document header (XML declaration) whether there was one or not. I'm not sure whether that's an appropriate thing to do in the DOCUMENT case, but it sure seems weird in the CONTENT case. We have code that can strip off the header again, but we need to figure out exactly when to apply it. I also suspect that it's outright broken to attach a header claiming the data is now in UTF8 encoding. If the database encoding isn't UTF8, then either that's a lie or we now have an encoding violation. Another thing that's mildly irking me is that the current factorization of this code will result in xml_parse'ing the data twice, if you have both DOCUMENT and INDENT specified. 
We could consider avoiding that if we merged the indentation functionality into xmltotext_with_xmloption, but it's probably premature to do so when we haven't figured out how to get the output right --- we might end up needing two xml_parse calls anyway with different parameters, perhaps. I also had a bunch of cosmetic complaints (mostly around this having a bad case of add-at-the-end-itis), which I've cleaned up in the attached v20. This doesn't address any of the above, however. regards, tom lane diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml index 467b49b199..53d59662b9 100644 --- a/doc/src/sgml/datatype.sgml +++ b/doc/src/sgml/datatype.sgml @@ -4460,14 +4460,18 @@ xml 'bar' xml, uses the function xmlserialize:xmlserialize -XMLSERIALIZE ( { DOCUMENT | CONTENT } value AS type ) +XMLSERIALIZE ( { DOCUMENT | CONTENT } value AS type [ [NO] INDENT ] ) type can be character, character varying, or text (or an alias for one of those). Again, according to the SQL standard, this is the only way to convert between type xml and character types, but PostgreSQL also allows -you to simply cast the value. +you to simply cast the value. The option INDENT allows to +indent the serialized xml output - the default is NO INDENT. +It is designed to indent XML strings of type DOCUMENT, but it can also + be used with CONTENT as long as value + contains a well-formed XML. 
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt index 0fb9ab7533..bb4c135a7f 100644 --- a/src/backend/catalog/sql_features.txt +++ b/src/backend/catalog/sql_features.txt @@ -621,7 +621,7 @@ X061 XMLParse: character string input and DOCUMENT option YES X065 XMLParse: binary string input and CONTENT option NO X066 XMLParse: binary string input and DOCUMENT option NO X068 XMLSerialize: BOM NO -X069 XMLSerialize: INDENT NO +X069 XMLSerialize: INDENT YES X070 XMLSerialize: character string serialization and CONTENT option YES X071 XMLSerialize: character string serialization and DOCUMENT option YES X072 XMLSerialize: character string serialization YES diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c index 19351fe34b..3dcd15d5f0 100644 --- a/src/backend/executor/execExprInterp.c +++ b/src/backend/executor/execExprInterp.c @@ -3829,6 +3829,7 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op) { Datum *argvalue = op->d.xmlexpr.argvalue; bool *argnull = op->d.xmlexpr.argnull; +text *result; /* argument type is known to be xml */ Assert(list_length(xexpr->args) == 1); @@ -3837,8 +3838,12 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op) return; value = argvalue[0]; -*op->resvalue = PointerGetDatum(xmltotext_with_xmloption(DatumGetXmlP(value), - xexpr->xmloption)); +result = xmltotext_with_xmloption(DatumG
Re: buildfarm + meson
Andres Freund writes: > On 2023-03-09 14:47:36 -0500, Andrew Dunstan wrote: >> . There appears to be some mismatch in database names (e.g. >> regression_dblink vs contrib_regression_dblink). That's going to cause some >> issues with the module that adjusts things for cross version upgrade. > I guess we can try to do something about that, but the make situation is > overly complicated. I don't really want to emulate having randomly differing > database names just because a test is in contrib/ rather than src/. We could talk about adjusting the behavior on the make side instead, perhaps, but something needs to be done there eventually. Having said that, I'm not sure that the first meson-capable buildfarm version needs to support cross-version-upgrade testing. regards, tom lane
Re: allow_in_place_tablespaces vs. pg_basebackup
On Wed, Mar 8, 2023 at 9:52 PM Thomas Munro wrote: > Yeah. We knew that this didn't work (was discussed in a couple of > other threads), but you might be right that an error would be better > for now. It's absolutely not a user facing mode of operation, it was > intended just for the replication tests. Clearly it might be useful > for testing purposes in the backup area too, which is probably how > Robert got here. I will think about what changes would be needed as I > am looking at backup code currently, but I'm not sure when I'll be > able to post a patch so don't let that stop anyone else who sees how > to get it working... If there had been an error message like "ERROR: pg_basebackup cannot back up a database that contains in-place tablespaces," it would have saved me a lot of time yesterday. If there had at least been a documentation mention, I would have found it eventually (but not nearly as quickly). As it is, the only reference to this topic in the repository seems to be c6f2f01611d4f2c412e92eb7893f76fa590818e8, "Fix pg_basebackup with in-place tablespaces," which makes it look like this is supposed to be working. I also think that if we're going to have in-place tablespaces, it's a good idea for them to work with pg_basebackup. I wasn't really looking to test pg_basebackup with this feature (although it's a good idea); I was just trying to make sure I didn't break in-place tablespaces while working on something else. I think it's sometimes OK to add stuff for testing that doesn't work with absolutely everything we have in the system, but the tablespace code is awfully closely related to the pg_basebackup code for them to just not work together at all. Now that I'm done grumbling, here's a patch. -- Robert Haas EDB: http://www.enterprisedb.com v1-0001-Fix-pg_basebackup-with-in-place-tablespaces-some-.patch Description: Binary data
Sub-millisecond [autovacuum_]vacuum_cost_delay broken
Hi, I think that 4753ef37e0ed undid the work caf626b2c did to support sub-millisecond delays for vacuum and autovacuum. After 4753ef37e0ed, vacuum_delay_point()'s local variable msec is a double which, after being passed to WaitLatch() as timeout, which is a long, ends up being 0, so we don't end up waiting AFAICT. When I set [autovacuum_]vacuum_cost_delay to 0.5, SHOW will report that it is 500us, but WaitLatch() is still getting 0 as timeout. - Melanie
Re: Sub-millisecond [autovacuum_]vacuum_cost_delay broken
On Fri, Mar 10, 2023 at 10:26 AM Melanie Plageman wrote: > I think that 4753ef37e0ed undid the work caf626b2c did to support > sub-millisecond delays for vacuum and autovacuum. > > After 4753ef37e0ed, vacuum_delay_point()'s local variable msec is a > double which, after being passed to WaitLatch() as timeout, which is a > long, ends up being 0, so we don't end up waiting AFAICT. > > When I set [autovacuum_]vacuum_cost_delay to 0.5, SHOW will report that > it is 500us, but WaitLatch() is still getting 0 as timeout. Given that some of the clunkier underlying kernel primitives have milliseconds in their interface, I don't think it would be possible to make a usec-based variant of WaitEventSetWait() that works everywhere. Could it possibly make sense to do something that accumulates the error, so if you're using 0.5 then every second vacuum_delay_point() waits for 1ms?
Re: Date-time extraneous fields with reserved keywords
Joseph Koshakow writes: > Please see the attached patch with these changes. Pushed with a couple of cosmetic adjustments. regards, tom lane
Re: Sub-millisecond [autovacuum_]vacuum_cost_delay broken
Thomas Munro writes: > On Fri, Mar 10, 2023 at 10:26 AM Melanie Plageman > wrote: >> I think that 4753ef37e0ed undid the work caf626b2c did to support >> sub-millisecond delays for vacuum and autovacuum. > Given that some of the clunkier underlying kernel primitives have > milliseconds in their interface, I don't think it would be possible to > make a usec-based variant of WaitEventSetWait() that works everywhere. > Could it possibly make sense to do something that accumulates the > error, so if you're using 0.5 then every second vacuum_delay_point() > waits for 1ms? Yeah ... using float math there was cute, but it'd only get us so far. The caf626b2c code would only work well on platforms that have microsecond-based sleep primitives, so it was already not too portable. Can we fix this by making VacuumCostBalance carry the extra fractional delay, or would a separate variable be better? regards, tom lane
Re: Sub-millisecond [autovacuum_]vacuum_cost_delay broken
Greetings, * Thomas Munro (thomas.mu...@gmail.com) wrote: > On Fri, Mar 10, 2023 at 10:26 AM Melanie Plageman > wrote: > > I think that 4753ef37e0ed undid the work caf626b2c did to support > > sub-millisecond delays for vacuum and autovacuum. > > > > After 4753ef37e0ed, vacuum_delay_point()'s local variable msec is a > > double which, after being passed to WaitLatch() as timeout, which is a > > long, ends up being 0, so we don't end up waiting AFAICT. > > > > When I set [autovacuum_]vacuum_cost_delay to 0.5, SHOW will report that > > it is 500us, but WaitLatch() is still getting 0 as timeout. > > Given that some of the clunkier underlying kernel primitives have > milliseconds in their interface, I don't think it would be possible to > make a usec-based variant of WaitEventSetWait() that works everywhere. > Could it possibly make sense to do something that accumulates the > error, so if you're using 0.5 then every second vacuum_delay_point() > waits for 1ms? Hmm. That generally makes sense to me, though it isn't exactly the same. Still, I wouldn't want to go back to purely pg_usleep() as that has the other downsides mentioned. Perhaps if the delay is sub-millisecond, explicitly do the WaitLatch() with zero but also do the pg_usleep()? That's doing a fair bit of work beyond just sleeping, but it also means we shouldn't miss out on the postmaster going away or similar. Thanks, Stephen
Re: Sub-millisecond [autovacuum_]vacuum_cost_delay broken
On Fri, Mar 10, 2023 at 11:02 AM Tom Lane wrote: > Thomas Munro writes: > > On Fri, Mar 10, 2023 at 10:26 AM Melanie Plageman > > wrote: > >> I think that 4753ef37e0ed undid the work caf626b2c did to support > >> sub-millisecond delays for vacuum and autovacuum. > > > Given that some of the clunkier underlying kernel primitives have > > milliseconds in their interface, I don't think it would be possible to > > make a usec-based variant of WaitEventSetWait() that works everywhere. > > Could it possibly make sense to do something that accumulates the > > error, so if you're using 0.5 then every second vacuum_delay_point() > > waits for 1ms? > > Yeah ... using float math there was cute, but it'd only get us so far. > The caf626b2c code would only work well on platforms that have > microsecond-based sleep primitives, so it was already not too portable. Also, the previous coding was already b0rked, because pg_usleep() rounds up to milliseconds on Windows (with a surprising formula for rounding), and also the whole concept seems to assume things about schedulers that aren't really universally true. If we actually cared about high res times maybe we should be using nanosleep and tracking the drift? And spreading it out a bit. But I don't know. > Can we fix this by making VacuumCostBalance carry the extra fractional > delay, or would a separate variable be better? I was wondering the same thing, but not being too familiar with that code, no opinion on that yet.
Re: Sub-millisecond [autovacuum_]vacuum_cost_delay broken
On Thu, Mar 9, 2023 at 5:10 PM Thomas Munro wrote: > > On Fri, Mar 10, 2023 at 11:02 AM Tom Lane wrote: > > Thomas Munro writes: > > > On Fri, Mar 10, 2023 at 10:26 AM Melanie Plageman > > > wrote: > > >> I think that 4753ef37e0ed undid the work caf626b2c did to support > > >> sub-millisecond delays for vacuum and autovacuum. > > > > > Given that some of the clunkier underlying kernel primitives have > > > milliseconds in their interface, I don't think it would be possible to > > > make a usec-based variant of WaitEventSetWait() that works everywhere. > > > Could it possibly make sense to do something that accumulates the > > > error, so if you're using 0.5 then every second vacuum_delay_point() > > > waits for 1ms? > > > > Yeah ... using float math there was cute, but it'd only get us so far. > > The caf626b2c code would only work well on platforms that have > > microsecond-based sleep primitives, so it was already not too portable. > > Also, the previous coding was already b0rked, because pg_usleep() > rounds up to milliseconds on Windows (with a surprising formula for > rounding), and also the whole concept seems to assume things about > schedulers that aren't really universally true. If we actually cared > about high res times maybe we should be using nanosleep and tracking > the drift? And spreading it out a bit. But I don't know. > > > Can we fix this by making VacuumCostBalance carry the extra fractional > > delay, or would a separate variable be better? > > I was wondering the same thing, but not being too familiar with that > code, no opinion on that yet. Well, that is reset to zero in vacuum() at the top -- which is called for each table for autovacuum, so it would get reset to zero between autovacuuming tables. I dunno how you feel about that... - Melanie
Re: Sub-millisecond [autovacuum_]vacuum_cost_delay broken
Thomas Munro writes: > On Fri, Mar 10, 2023 at 11:02 AM Tom Lane wrote: >> The caf626b2c code would only work well on platforms that have >> microsecond-based sleep primitives, so it was already not too portable. > Also, the previous coding was already b0rked, because pg_usleep() > rounds up to milliseconds on Windows (with a surprising formula for > rounding), and also the whole concept seems to assume things about > schedulers that aren't really universally true. If we actually cared > about high res times maybe we should be using nanosleep and tracking > the drift? And spreading it out a bit. But I don't know. Yeah, I was wondering about trying to make it a closed-loop control, but I think that'd be huge overkill considering what the mechanism is trying to accomplish. A minimalistic fix could be as attached. I'm not sure if it's worth making the state variable global so that it can be reset to zero in the places where we zero out VacuumCostBalance etc. Also note that this is ignoring the VacuumSharedCostBalance stuff, so you'd possibly have the extra delay accumulating in unexpected places when there are multiple workers. But I really doubt it's worth worrying about that. Is it reasonable to assume that all modern platforms can time millisecond delays accurately? Ten years ago I'd have suggested truncating the delay to a multiple of 10msec and using this logic to track the remainder, but maybe now that's unnecessary. regards, tom lane

diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 2e12baf8eb..64d3c59709 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -2229,14 +2229,30 @@ vacuum_delay_point(void)
 	/* Nap if appropriate */
 	if (msec > 0)
 	{
+		/*
+		 * Since WaitLatch can only wait in units of milliseconds, carry any
+		 * residual fractional msec in a static variable, and accumulate it to
+		 * add into the wait interval next time.
+		 */
+		static double residual_msec = 0;
+		long		lmsec;
+
+		msec += residual_msec;
+
 		if (msec > VacuumCostDelay * 4)
 			msec = VacuumCostDelay * 4;
 
-		(void) WaitLatch(MyLatch,
-						 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
-						 msec,
-						 WAIT_EVENT_VACUUM_DELAY);
-		ResetLatch(MyLatch);
+		lmsec = floor(msec);
+		residual_msec = msec - lmsec;
+
+		if (lmsec > 0)
+		{
+			(void) WaitLatch(MyLatch,
+							 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+							 lmsec,
+							 WAIT_EVENT_VACUUM_DELAY);
+			ResetLatch(MyLatch);
+		}
 
 		VacuumCostBalance = 0;
Re: Sub-millisecond [autovacuum_]vacuum_cost_delay broken
On Thu, Mar 09, 2023 at 05:27:08PM -0500, Tom Lane wrote: > Is it reasonable to assume that all modern platforms can time > millisecond delays accurately? Ten years ago I'd have suggested > truncating the delay to a multiple of 10msec and using this logic > to track the remainder, but maybe now that's unnecessary. If so, it might also be worth updating or removing this comment in pgsleep.c: * NOTE: although the delay is specified in microseconds, the effective * resolution is only 1/HZ, or 10 milliseconds, on most Unixen. Expect * the requested delay to be rounded up to the next resolution boundary. I've had doubts for some time about whether this is still accurate... -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: buildfarm + meson
On 2023-03-09 Th 15:25, Tom Lane wrote: Andres Freund writes: On 2023-03-09 14:47:36 -0500, Andrew Dunstan wrote: . There appears to be some mismatch in database names (e.g. regression_dblink vs contrib_regression_dblink). That's going to cause some issues with the module that adjusts things for cross version upgrade. I guess we can try to do something about that, but the make situation is overly complicated. I don't really want to emulate having randomly differing database names just because a test is in contrib/ rather than src/. We could talk about adjusting the behavior on the make side instead, perhaps, but something needs to be done there eventually. Having said that, I'm not sure that the first meson-capable buildfarm version needs to support cross-version-upgrade testing. Well, I want to store up as little future work as possible. This particular issue won't be much of a problem for several months until we branch the code, as we don't do database adjustments for a same version upgrade. At that stage I think a small modification to AdjustUpgrade.pm will do the trick. We just need to remember to do it. cheers andrew -- Andrew Dunstan EDB:https://www.enterprisedb.com
Re: Sub-millisecond [autovacuum_]vacuum_cost_delay broken
On Thu, Mar 9, 2023 at 5:27 PM Tom Lane wrote: > > Thomas Munro writes: > > On Fri, Mar 10, 2023 at 11:02 AM Tom Lane wrote: > >> The caf626b2c code would only work well on platforms that have > >> microsecond-based sleep primitives, so it was already not too portable. > > > Also, the previous coding was already b0rked, because pg_usleep() > > rounds up to milliseconds on Windows (with a surprising formula for > > rounding), and also the whole concept seems to assume things about > > schedulers that aren't really universally true. If we actually cared > > about high res times maybe we should be using nanosleep and tracking > > the drift? And spreading it out a bit. But I don't know. > > Yeah, I was wondering about trying to make it a closed-loop control, > but I think that'd be huge overkill considering what the mechanism is > trying to accomplish. Not relevant to fixing this, but I wonder if you could eliminate the need to specify the cost delay in most cases for autovacuum if you used feedback from how much vacuuming work was done during the last cycle of vacuuming to control the delay value internally - a kind of feedback-adjusted controller. - Melanie
Re: Sub-millisecond [autovacuum_]vacuum_cost_delay broken
On Thu, Mar 9, 2023 at 5:27 PM Tom Lane wrote: > > Thomas Munro writes: > > On Fri, Mar 10, 2023 at 11:02 AM Tom Lane wrote: > >> The caf626b2c code would only work well on platforms that have > >> microsecond-based sleep primitives, so it was already not too portable. > > > Also, the previous coding was already b0rked, because pg_usleep() > > rounds up to milliseconds on Windows (with a surprising formula for > > rounding), and also the whole concept seems to assume things about > > schedulers that aren't really universally true. If we actually cared > > about high res times maybe we should be using nanosleep and tracking > > the drift? And spreading it out a bit. But I don't know. > > Yeah, I was wondering about trying to make it a closed-loop control, > but I think that'd be huge overkill considering what the mechanism is > trying to accomplish. > > A minimalistic fix could be as attached. I'm not sure if it's worth > making the state variable global so that it can be reset to zero in > the places where we zero out VacuumCostBalance etc. Also note that > this is ignoring the VacuumSharedCostBalance stuff, so you'd possibly > have the extra delay accumulating in unexpected places when there are > multiple workers. But I really doubt it's worth worrying about that. What if someone resets the delay guc and there is still a large residual?
Add macros for ReorderBufferTXN toptxn
Hi, During a recent code review, I was confused multiple times by the toptxn member of ReorderBufferTXN, which is defined only for sub-transactions. e.g. txn->toptxn member == NULL means the txn is a top level txn. e.g. txn->toptxn member != NULL means the txn is not a top level txn. It makes sense if you squint and read it slowly enough, but IMO it's too easy to accidentally misinterpret the meaning when reading code that uses this member. ~ Such code can be made easier to read just by introducing some simple macros:

#define isa_toptxn(rbtxn) (rbtxn->toptxn == NULL)
#define isa_subtxn(rbtxn) (rbtxn->toptxn != NULL)
#define get_toptxn(rbtxn) (isa_subtxn(rbtxn) ? rbtxn->toptxn : rbtxn)

~ PSA a small patch that does this. (Tests OK using make check-world) Thoughts? -- Kind Regards, Peter Smith. Fujitsu Australia v1-0001-Add-macros-for-ReorderBufferTXN-toptxn.patch Description: Binary data
Re: pg_dump/pg_restore: Fix stdin/stdout handling of custom format on Win32
On Tue, Mar 7, 2023 at 1:36 PM Juan José Santamaría Flecha < juanjo.santama...@gmail.com> wrote: > > On Thu, Mar 2, 2023 at 8:01 AM Michael Paquier > wrote: > >> >> The internal implementation of _pgstat64() is used in quite a few >> > places, so we'd better update this part first, IMO, and then focus on >> the pg_dump part. Anyway, it looks like you are right here: there is >> nothing for FILE_TYPE_PIPE or FILE_TYPE_CHAR in this WIN32 >> implementation of fstat(). >> >> I am amazed to hear that both ftello64() and fseek64() actually >> succeed if you use a pipe: >> https://pubs.opengroup.org/onlinepubs/9699919799/functions/fseek.html >> Could it be something we should try to make more portable by ourselves >> with a wrapper for these on WIN32? That would not be the first one to >> accomodate our code with POSIX, and who knows what code could be broken >> because of that, like external extensions that use fseek64() without >> knowing it. >> > > The error is reproducible in versions previous to win32stat.c, so that > might work as bug fix. > I've broken the patch in two: 1. fixes the detection of unseekable files in checkSeek(), using logic that hopefully is backpatchable, 2. the improvements on file type detection for stat() proposed by the OP. Regards, Juan José Santamaría Flecha v2-0002-improve-detection-file-type-for-WIN32-stat.patch Description: Binary data v2-0001-fix-chechSeek-for-WIN32.patch Description: Binary data
Re: Sub-millisecond [autovacuum_]vacuum_cost_delay broken
Melanie Plageman writes: > On Thu, Mar 9, 2023 at 5:27 PM Tom Lane wrote: >> A minimalistic fix could be as attached. I'm not sure if it's worth >> making the state variable global so that it can be reset to zero in >> the places where we zero out VacuumCostBalance etc. Also note that >> this is ignoring the VacuumSharedCostBalance stuff, so you'd possibly >> have the extra delay accumulating in unexpected places when there are >> multiple workers. But I really doubt it's worth worrying about that. > What if someone resets the delay guc and there is still a large residual? By definition, the residual is less than 1msec. regards, tom lane
Re: buildfarm + meson
On 2023-03-09 Th 14:47, Andrew Dunstan wrote: On 2023-03-09 Th 08:28, Andrew Dunstan wrote: At this stage I think I'm prepared to turn this loose on a couple of my buildfarm animals, and if nothing goes awry for the remainder of the month merge the dev/meson branch and push a new release. There is still probably a little polishing to do, especially w.r.t. log file artefacts. A few things I've found: . We don't appear to have an equivalent of the headerscheck and cpluspluscheck GNUmakefile targets . I don't know how to build other docs targets (e.g. postgres-US.pdf) . There appears to be some mismatch in database names (e.g. regression_dblink vs contrib_regression_dblink). That's going to cause some issues with the module that adjusts things for cross version upgrade. Another thing: the test for uuid.h is too strict. On Fedora 36 the OSSP header is in /usr/include, not /usr/include/ossp (I got around that for now by symlinking it, but obviously that's a nasty hack we can't ask people to do) cheers andrew -- Andrew Dunstan EDB:https://www.enterprisedb.com
Re: improving user.c error messages
On Thu, Mar 09, 2023 at 09:58:46AM -0800, Nathan Bossart wrote: > On Thu, Mar 09, 2023 at 10:55:54AM +0100, Peter Eisentraut wrote: >> On 20.02.23 23:58, Nathan Bossart wrote: >>> For now, I've reworded these as "must inherit privileges of". >> >> I don't have a good mental model of all this role inheritance, personally, >> but I fear that this change makes the messages more jargony and less clear. >> Maybe the original wording was good enough. > > I'm fine with that. I used the original wording in v7. >> "admin option" is sort of a natural language term, I think, so we don't need >> to parametrize it as "%s option". Also, there are no other "options" in >> this context, I think. > > v16 introduces the INHERIT and SET options. I don't have a strong opinion > about parameterizing it, though. My intent was to consistently capitalize > all the attributes and options. I didn't change this in v7, but I can do so if you still think it shouldn't be parameterized. >> A general thought: It seems we currently don't have any error messages that >> address the user like "You must do this". Do we want to go there? Should we >> try for a more impersonal wording like >> >> "You must have the %s attribute to create roles." >> >> "Current user must have the %s attribute to create roles." >> >> "%s attribute is required to create roles." > > I think I like the last option the most. In general, I agree with trying > to avoid the second-person phrasing. I ended up using the "current user must have" wording in a few places, and for most others, I used "only roles with X may do Y." That seemed to flow relatively well, and IMO it made the required privileges abundantly clear. I initially was going to use the "X attribute is required to Y" wording, but I was worried that didn't make it sufficiently clear that the _role_ must have the attribute. In any case, I'm not wedded to the approach I used in the patch and am willing to try out other wordings. 
BTW I did find one example of a "you must" message while I was updating the patch: write_stderr("%s does not know where to find the server configuration file.\n" "You must specify the --config-file or -D invocation " "option or set the PGDATA environment variable.\n", progname); I don't think it's a common style, though. >> By the way, I'm not sure what the separation between 0001 and 0002 is >> supposed to be. > > I'll combine them. I first started with user.c only, but we kept finding > new messages to improve. I combined the patches in v7. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From 66cc7a3999cae48c1b97ef067bc16872b31b887f Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Thu, 26 Jan 2023 11:05:13 -0800 Subject: [PATCH v7 1/1] Improve several permission-related error messages. --- contrib/file_fdw/expected/file_fdw.out| 3 +- contrib/file_fdw/file_fdw.c | 8 +- .../test_decoding/expected/permissions.out| 12 +- src/backend/backup/basebackup_server.c| 4 +- src/backend/catalog/objectaddress.c | 16 +- src/backend/commands/copy.c | 12 +- src/backend/commands/user.c | 169 +- src/backend/replication/slot.c| 6 +- src/backend/storage/ipc/procarray.c | 4 +- src/backend/storage/ipc/signalfuncs.c | 16 +- src/backend/tcop/utility.c| 5 +- src/backend/utils/init/miscinit.c | 4 + src/backend/utils/init/postinit.c | 12 +- src/backend/utils/misc/guc.c | 15 +- .../expected/dummy_seclabel.out | 3 +- .../unsafe_tests/expected/rolenames.out | 3 +- src/test/regress/expected/create_role.out | 80 ++--- src/test/regress/expected/dependency.out | 4 + src/test/regress/expected/privileges.out | 23 ++- 19 files changed, 287 insertions(+), 112 deletions(-) diff --git a/contrib/file_fdw/expected/file_fdw.out b/contrib/file_fdw/expected/file_fdw.out index 36d76ba26c..a5acf6d4f7 100644 --- a/contrib/file_fdw/expected/file_fdw.out +++ b/contrib/file_fdw/expected/file_fdw.out @@ -474,7 +474,8 @@ ALTER FOREIGN TABLE agg_text OWNER TO regress_file_fdw_user; ALTER FOREIGN 
TABLE agg_text OPTIONS (SET format 'text'); SET ROLE regress_file_fdw_user; ALTER FOREIGN TABLE agg_text OPTIONS (SET format 'text'); -ERROR: only superuser or a role with privileges of the pg_read_server_files role may specify the filename option of a file_fdw foreign table +ERROR: permission denied to set the "filename" option of a file_fdw foreign table +DETAIL: Only roles with privileges of the "pg_read_server_files" role may set this option. SET ROLE regress_file_fdw_superuser; -- cleanup RESET ROLE; diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c index 2d2b0b6a6b..8a312b5d0e 100644 --- a/contrib/file_fdw/file_fdw.c +++ b/contrib/fi
Re: Sub-millisecond [autovacuum_]vacuum_cost_delay broken
On Fri, Mar 10, 2023 at 11:37 AM Nathan Bossart wrote: > > On Thu, Mar 09, 2023 at 05:27:08PM -0500, Tom Lane wrote: > > Is it reasonable to assume that all modern platforms can time > > millisecond delays accurately? Ten years ago I'd have suggested > > truncating the delay to a multiple of 10msec and using this logic > > to track the remainder, but maybe now that's unnecessary. > > If so, it might also be worth updating or removing this comment in > pgsleep.c: > > * NOTE: although the delay is specified in microseconds, the effective > * resolution is only 1/HZ, or 10 milliseconds, on most Unixen. Expect > * the requested delay to be rounded up to the next resolution boundary. > > I've had doubts for some time about whether this is still accurate... What I see with the old select(), or a more modern clock_nanosleep() call, is that Linux, FreeBSD, macOS are happy sleeping for .1ms, .5ms, 1ms, 2ms, 3ms, and through inaccuracies and scheduling overheads etc. it works out to about 5-25% extra sleep time (I expect that can be affected by choice of time source/available hardware, and perhaps various system calls use different tricks). I definitely recall the behaviour described, back in the old days where more stuff was scheduler-tick based. I have no clue for Windows; quick googling tells me that it might still be pretty chunky, unless you do certain other stuff that I didn't follow up; we could probably get more accurate sleep times by rummaging through nt.dll. It would be good to find out how well WaitEventSet does on Windows; perhaps we should have a little timing accuracy test in the tree to collect build farm data?
FWIW epoll has a newer _pwait2() call that has a higher-resolution timeout argument, and Windows WaitEventSet could also do high-resolution timers if you add timer events rather than using the timeout argument, and I guess conceptually even the old poll() thing could do the equivalent with a signal alarm timer, but it sounds a lot like a bad idea to do very short sleeps to me, burning so much CPU on scheduling. I kinda wonder if the 10ms + residual thing might even turn out to be a better idea... but I dunno. The 1ms residual thing looks pretty good to me as a fix to the immediate problem report, but we might also want to adjust the wording in config.sgml?
Re: Sub-millisecond [autovacuum_]vacuum_cost_delay broken
Erm, but maybe I'm just looking at this too myopically. Is there really any point in letting people set it to 0.5, if it behaves as if you'd set it to 1 and doubled the cost limit? Isn't it just more confusing? I haven't read the discussion from when fractional delays came in, where I imagine that must have come up...
Re: Date-Time dangling unit fix
Joseph Koshakow writes: > Also I removed some dead code from the previous patch. This needs a rebase over bcc704b52, so I did that and made a couple of other adjustments. I'm inclined to think that you removed too much from DecodeTimeOnly. That does accept date specs (at least for timetz), and I see no very good reason for it not to accept a Julian date spec. I also wonder why you removed the UNITS: case there. Seems like we want these functions to accept the same syntax as much as possible. I think the code is still a bit schizophrenic about placement of ptype specs. In the UNITS: case, we don't insist that a unit apply to exactly the very next field; instead it applies to the next one where it disambiguates. So for instance this is accepted: regression=# select 'J PM 1234567 1:23'::timestamp; timestamp 1333-01-11 13:23:00 BC That's a little weird, or maybe even a lot weird, but it's not inherently nonsensical so I'm hesitant to stop accepting it. However, if UNITS acts that way, then why is ISOTIME different? So I'm inclined to remove ISOTIME's lookahead check if (i >= nf - 1 || (ftype[i + 1] != DTK_NUMBER && ftype[i + 1] != DTK_TIME && ftype[i + 1] != DTK_DATE)) return DTERR_BAD_FORMAT; and rely on the ptype-still-set error at the bottom of the loop to complain about nonsensical cases. Also, if we do keep the lookahead checks, the one in DecodeTimeOnly could be simplified --- it's accepting some cases that actually aren't supported there.
regards, tom lane diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c index a7558d39a0..13856b06d1 100644 --- a/src/backend/utils/adt/datetime.c +++ b/src/backend/utils/adt/datetime.c @@ -983,7 +983,7 @@ DecodeDateTime(char **field, int *ftype, int nf, int fmask = 0, tmask, type; - int ptype = 0; /* "prefix type" for ISO y2001m02d04 format */ + int ptype = 0; /* "prefix type" for ISO and Julian formats */ int i; int val; int dterr; @@ -1071,6 +1071,9 @@ DecodeDateTime(char **field, int *ftype, int nf, { char *cp; + /* + * Allow a preceding "t" field, but no other units. + */ if (ptype != 0) { /* Sanity check; should not fail this test */ @@ -1175,8 +1178,7 @@ DecodeDateTime(char **field, int *ftype, int nf, case DTK_NUMBER: /* - * Was this an "ISO date" with embedded field labels? An - * example is "y2001m02d04" - thomas 2001-02-04 + * Deal with cases where previous field labeled this one */ if (ptype != 0) { @@ -1187,85 +1189,11 @@ DecodeDateTime(char **field, int *ftype, int nf, value = strtoint(field[i], &cp, 10); if (errno == ERANGE) return DTERR_FIELD_OVERFLOW; - - /* - * only a few kinds are allowed to have an embedded - * decimal - */ - if (*cp == '.') - switch (ptype) - { - case DTK_JULIAN: - case DTK_TIME: - case DTK_SECOND: -break; - default: -return DTERR_BAD_FORMAT; -break; - } - else if (*cp != '\0') + if (*cp != '.' && *cp != '\0') return DTERR_BAD_FORMAT; switch (ptype) { - case DTK_YEAR: - tm->tm_year = value; - tmask = DTK_M(YEAR); - break; - - case DTK_MONTH: - - /* - * already have a month and hour? 
then assume - * minutes - */ - if ((fmask & DTK_M(MONTH)) != 0 && -(fmask & DTK_M(HOUR)) != 0) - { -tm->tm_min = value; -tmask = DTK_M(MINUTE); - } - else - { -tm->tm_mon = value; -tmask = DTK_M(MONTH); - } - break; - - case DTK_DAY: - tm->tm_mday = value; - tmask = DTK_M(DAY); - break; - - case DTK_HOUR: - tm->tm_hour = value; - tmask = DTK_M(HOUR); - break; - - case DTK_MINUTE: - tm->tm_min = value; - tmask = DTK_M(MINUTE); - break; - - case DTK_SECOND: - tm->tm_sec = value; - tmask = DTK_M(SECOND); - if (*cp == '.') - { -dterr = ParseFractionalSecond(cp, fsec); -if (dterr) - return dterr; -tmask = DTK_ALL_SECS_M; - } - break; - - case DTK_TZ: - tmask = DTK_M(TZ); - dterr = DecodeTimezone(field[i], tzp); - if (dterr) -return dterr; - break; - case DTK_JULIAN: /* previous field was a label for "julian date" */ if (value < 0) @@ -1519,6 +1447,9 @@ DecodeDateTime(char **field, int *ftype, int nf, case UNITS: tmask = 0; + /* reject consecutive unhandled units */ + if (ptype != 0) + return DTERR_BAD_FORMAT; ptype = val; break; @@ -1576,6 +1507,10 @@ DecodeDateTime(char **field, int
Re: Sub-millisecond [autovacuum_]vacuum_cost_delay broken
Thomas Munro writes: > Erm, but maybe I'm just looking at this too myopically. Is there > really any point in letting people set it to 0.5, if it behaves as if > you'd set it to 1 and doubled the cost limit? Isn't it just more > confusing? I haven't read the discussion from when fractional delays > came in, where I imagine that must have come up... At [1] I argued >> The reason is this: what we want to do is throttle VACUUM's I/O demand, >> and by "throttle" I mean "gradually reduce". There is nothing gradual >> about issuing a few million I/Os and then sleeping for many milliseconds; >> that'll just produce spikes and valleys in the I/O demand. Ideally, >> what we'd have it do is sleep for a very short interval after each I/O. >> But that's not too practical, both for code-structure reasons and because >> most platforms don't give us a way to so finely control the length of a >> sleep. Hence the design of sleeping for awhile after every so many I/Os. >> >> However, the current settings are predicated on the assumption that >> you can't get the kernel to give you a sleep of less than circa 10ms. >> That assumption is way outdated, I believe; poking around on systems >> I have here, the minimum delay time using pg_usleep(1) seems to be >> generally less than 100us, and frequently less than 10us, on anything >> released in the last decade. >> >> I propose therefore that instead of increasing vacuum_cost_limit, >> what we ought to be doing is reducing vacuum_cost_delay by a similar >> factor. And, to provide some daylight for people to reduce it even >> more, we ought to arrange for it to be specifiable in microseconds >> not milliseconds. There's no GUC_UNIT_US right now, but it's time. That last point was later overruled in favor of keeping it measured in msec to avoid breaking existing configuration files. 
Nonetheless, vacuum_cost_delay *is* an actual time to wait (conceptually at least), not just part of a unitless ratio; and there seem to be good arguments in favor of letting people make it small. I take your point that really short sleeps are inefficient so far as the scheduling overhead goes. But on modern machines you probably have to get down to a not-very-large number of microseconds before that's a big deal. regards, tom lane [1] https://www.postgresql.org/message-id/28720.1552101086%40sss.pgh.pa.us
Re: Add pg_walinspect function with block info columns
On Thu, Mar 09, 2023 at 03:37:21PM +0900, Michael Paquier wrote: > Okay, thanks. Let's use that, then. I have done one pass over that today, and applied it. Thanks! I'd really like to do something about the errors we raise in the module when specifying LSNs in the future for this release, now. I got annoyed by it again this morning while doing \watch queries that kept failing randomly while stressing this patch. -- Michael
Re: Sub-millisecond [autovacuum_]vacuum_cost_delay broken
On Fri, Mar 10, 2023 at 1:46 PM Tom Lane wrote: > >> I propose therefore that instead of increasing vacuum_cost_limit, > >> what we ought to be doing is reducing vacuum_cost_delay by a similar > >> factor. And, to provide some daylight for people to reduce it even > >> more, we ought to arrange for it to be specifiable in microseconds > >> not milliseconds. There's no GUC_UNIT_US right now, but it's time. > > That last point was later overruled in favor of keeping it measured in > msec to avoid breaking existing configuration files. Nonetheless, > vacuum_cost_delay *is* an actual time to wait (conceptually at least), > not just part of a unitless ratio; and there seem to be good arguments > in favor of letting people make it small. > > I take your point that really short sleeps are inefficient so far as the > scheduling overhead goes. But on modern machines you probably have to get > down to a not-very-large number of microseconds before that's a big deal. OK. One idea is to provide a WaitLatchUsec(), which is just some cross platform donkeywork that I think I know how to type in, and it would have to round up on poll() and Windows builds. Then we could either also provide WaitEventSetResolution() that returns 1000 or 1 depending on availability of 1us waits so that we could round appropriately and then track residual, but beyond that let the user worry about inaccuracies and overheads (as mentioned in the documentation), or we could start consulting the clock and tracking our actual sleep time and true residual over time (maybe that's what "closed-loop control" means?).
Re: Add pg_walinspect function with block info columns
On Fri, Mar 10, 2023 at 6:43 AM Michael Paquier wrote: > > I'd really like to do something about the errors we raise in the > module when specifying LSNs in the future for this release, now. I > got annoyed by it again this morning while doing \watch queries that > kept failing randomly while stressing this patch. Perhaps what is proposed here https://www.postgresql.org/message-id/calj2acwqj+m0hoqj9qkav2uqfq97yk5jn2modfkcxusxsyp...@mail.gmail.com might help and avoid many errors around input LSN validations. Let's discuss that in that thread. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Sub-millisecond [autovacuum_]vacuum_cost_delay broken
Thomas Munro writes: > OK. One idea is to provide a WaitLatchUsec(), which is just some > cross platform donkeywork that I think I know how to type in, and it > would have to round up on poll() and Windows builds. Then we could > either also provide WaitEventSetResolution() that returns 1000 or 1 > depending on availability of 1us waits so that we could round > appropriately and then track residual, but beyond that let the user > worry about inaccuracies and overheads (as mentioned in the > documentation), ... so we'd still need to have the residual-sleep-time logic? > or we could start consulting the clock and tracking > our actual sleep time and true residual over time (maybe that's what > "closed-loop control" means?). Yeah, I was hand-waving about trying to measure our actual sleep times. On reflection I doubt it's a great idea. It'll add overhead and there's still a question of whether measurement noise would accumulate. regards, tom lane
Re: Add pg_walinspect function with block info columns
On Fri, Mar 10, 2023 at 06:50:05AM +0530, Bharath Rupireddy wrote: > Perhaps what is proposed here > https://www.postgresql.org/message-id/calj2acwqj+m0hoqj9qkav2uqfq97yk5jn2modfkcxusxsyp...@mail.gmail.com > might help and avoid many errors around input LSN validations. Let's > discuss that in that thread. Yep, I am going to look at your proposal. -- Michael
Re: pg_dump/pg_restore: Fix stdin/stdout handling of custom format on Win32
On Fri, Mar 10, 2023 at 12:12:37AM +0100, Juan José Santamaría Flecha wrote: > I've broken the patch in two: > 1. fixes the detection of unseekable files in checkSeek(), using logic that > hopefully is backpatchable, > 2. the improvements on file type detection for stat() proposed by the OP. I am OK with 0002, so I'll try to get this part backpatched down to where the implementation of stat() has been added. I am not completely sure that 0001 is the right way forward, though, particularly with the long-term picture... In the backend, we have one caller of fseeko() as of read_binary_file(), so we would never pass down a pipe to that. However, there could be a risk of some silent breakages on Windows if some new code relies on that? There is a total of 11 callers of fseeko() in pg_dump, so rather than relying on checkSeek() to see if it actually works, I'd like to think that we should have a central policy to make this code more bullet-proof in the future. -- Michael