Re: ANY_VALUE aggregate
On Thu, 19 Jan 2023 at 06:01, Vik Fearing wrote:
> Thank you for the review. Attached is a new version rebased to d540a02a72.

I've only a bunch of nit-picks, personal preferences and random thoughts to offer as a review:

1. I'd be inclined *not* to mention the possible future optimisation in:

+ * Currently this just returns the first value, but in the future it might be
+ * able to signal to the aggregate that it does not need to be called anymore.

I think it's unlikely that the transfn would "signal" such a thing. It seems more likely, if we did anything about it, that nodeAgg.c would have some additional knowledge not to call that function if the agg state already has a value. Just so we're not preempting how we might do such a thing in the future, it seems best to remove the mention of it. I don't really think it serves as a good reminder that we might want to do this one day anyway.

2. +any_value_trans(PG_FUNCTION_ARGS)

Many transition function names end in "transfn", not "trans". I think it's better to follow the existing (loosely followed) naming pattern that a few aggregates seem to follow rather than invent a new one.

3. I tend to try to copy the capitalisation of keywords from the surrounding regression tests. I see the following breaks that:

+SELECT any_value(v) FILTER (WHERE v > 2) FROM (VALUES (1), (2), (3)) AS v (v);

(Obviously, ideally, we'd always just follow the same capitalisation of keywords everywhere in each .sql file, but we've long broken that and the best we can do is be consistent with surrounding tests.)

4. I think I'd use the word "Returns" instead of "Chooses" in:

+Chooses a non-deterministic value from the non-null input values.

5. I've not managed to find a copy of the 2023 draft, so I'm assuming you've got the ignoring of NULLs correct. I tried to see what other databases do using https://www.db-fiddle.com/ .
I was surprised to see MySQL 8.0 returning NULL with:

create table a (a int, b int);
insert into a values(1,null),(1,2),(1,null);
select any_value(b) from a group by a;

I'd have expected "2" to be returned. (It gets even weirder without the GROUP BY clause, so I'm not too hopeful any useful information can be obtained from looking here.) I know MySQL doesn't follow the spec quite as closely as we do, so I might not be that surprised if they didn't pay attention to the wording when implementing this; however, I've not seen the spec, so I can only speculate what value should be returned. Certainly, doing no aggregation at all for any_value() when there is no GROUP BY seems strange. I see they don't do the same with sum(). Perhaps this is just a side effect of their loose standards when it comes to columns in the SELECT clause that are not in the GROUP BY clause.

6. Is it worth adding a WindowFunc test somewhere in window.sql with an any_value(...) over (...)? Is what any_value() returns as a WindowFunc equally as non-deterministic as when it's used as an Aggref? Can we assume there's no guarantee that it'll return the same value for each row in each partition? Does the spec mention anything about that?

7. I wondered if it's worth adding a SupportRequestOptimizeWindowClause support function for this aggregate. I'm thinking it might not be, as people would more likely use something specific like first_value/nth_value/last_value instead of using any_value as a WindowFunc. Also, I'm currently thinking that a SupportRequestWFuncMonotonic for any_value() is not worth the dozen or so lines of code it would take to write it. I'm assuming it would always be a MONOTONICFUNC_BOTH function. It seems unlikely that someone would have a subquery with a WHERE clause in the upper-level query referencing the any_value() aggregate. Thought I'd mention both of these things anyway, as someone else might think of some good reason to add them that I didn't think of.

David
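To make the NULL-handling question in point 5 concrete, here is a minimal Python sketch (not PostgreSQL code) of the semantics the patch is assumed to implement: the aggregate ignores NULL inputs and returns the first non-null value it sees, which is why the MySQL result above is surprising.

```python
def any_value(values):
    """Return an arbitrary non-null input value; here, the first one seen.

    NULL inputs (modelled as None) are skipped, mirroring the assumed
    standard behaviour of ignoring NULLs; if every input is NULL, the
    aggregate result is NULL (None).
    """
    for v in values:
        if v is not None:
            return v
    return None

# Mirrors: insert into a values (1,null),(1,2),(1,null); select any_value(b) ...
print(any_value([None, 2, None]))  # → 2, the value one would expect from MySQL
print(any_value([None, None]))     # → None (all inputs NULL)
```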
Re: Wasted Vacuum cycles when OldestXmin is not moving
On Wed, Jan 11, 2023 at 3:16 AM sirisha chamarthi wrote:
>
> Hi Hackers,
>
> vacuum is not able to clean up dead tuples when OldestXmin is not moving
> (because of a long running transaction or when hot_standby_feedback is
> behind). Even though OldestXmin has not moved since the last time it checked,
> it keeps retrying every autovacuum_naptime and wastes CPU cycles and IO when
> pages are not in memory. Can we not bypass the dead tuple collection and
> cleanup step until OldestXmin is advanced? The log below shows the vacuum running
> every 1 minute.
>
> 2023-01-09 08:13:01.364 UTC [727219] LOG: automatic vacuum of table "postgres.public.t1": index scans: 0
>   pages: 0 removed, 6960 remain, 6960 scanned (100.00% of total)
>   tuples: 0 removed, 1572864 remain, 786432 are dead but not yet removable
>   removable cutoff: 852, which was 2 XIDs old when operation ended
>   frozen: 0 pages from table (0.00% of total) had 0 tuples frozen
>   index scan not needed: 0 pages from table (0.00% of total) had 0 dead item identifiers removed
>   avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
>   buffer usage: 13939 hits, 0 misses, 0 dirtied
>   WAL usage: 0 records, 0 full page images, 0 bytes
>   system usage: CPU: user: 0.15 s, system: 0.00 s, elapsed: 0.29 s
> 2023-01-09 08:14:01.363 UTC [727289] LOG: automatic vacuum of table "postgres.public.t1": index scans: 0
>   pages: 0 removed, 6960 remain, 6960 scanned (100.00% of total)
>   tuples: 0 removed, 1572864 remain, 786432 are dead but not yet removable
>   removable cutoff: 852, which was 2 XIDs old when operation ended
>   frozen: 0 pages from table (0.00% of total) had 0 tuples frozen
>   index scan not needed: 0 pages from table (0.00% of total) had 0 dead item identifiers removed
>   avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
>   buffer usage: 13939 hits, 0 misses, 0 dirtied
>   WAL usage: 0 records, 0 full page images, 0 bytes
>   system usage: CPU: user: 0.14 s, system: 0.00 s, elapsed: 0.29 s

Can you provide a patch and a test case? If possible, a TAP test with and without the patch.

-- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
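The proposal above amounts to a simple gating check before the dead-tuple collection pass. The following is an illustrative Python sketch only, not PostgreSQL code; the function and parameter names are hypothetical. It shows the idea: if OldestXmin has not advanced since the last vacuum of a table, every "dead but not yet removable" tuple from that run is still not removable, so the pass can be skipped.

```python
def should_attempt_cleanup(current_oldest_xmin, last_oldest_xmin_for_table):
    """Return True if vacuuming the table could remove any new dead tuples.

    XIDs are modelled as plain integers.  If the removable cutoff
    (OldestXmin) has not advanced since this table was last vacuumed,
    the collection/cleanup step would be wasted work.
    """
    if last_oldest_xmin_for_table is None:
        return True  # never vacuumed before: no information, so try
    return current_oldest_xmin > last_oldest_xmin_for_table

# From the log above: the removable cutoff stayed at 852 between runs.
print(should_attempt_cleanup(852, 852))  # → False: skip the wasted pass
print(should_attempt_cleanup(900, 852))  # → True: cutoff advanced, vacuum
```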
Re: Deadlock between logrep apply worker and tablesync worker
On Mon, Jan 23, 2023 at 1:29 AM Tom Lane wrote:
>
> Another thing that has a bad smell about it is the fact that
> process_syncing_tables_for_sync uses two transactions in the first
> place. There's a comment there claiming that it's for crash safety,
> but I can't help suspecting it's really because this case becomes a
> hard deadlock without that mid-function commit.
>
> It's not great in any case that the apply worker can move on in
> the belief that the tablesync worker is done when in fact the latter
> still has catalog state updates to make. And I wonder what we're
> doing with having both of them calling replorigin_drop_by_name
> ... shouldn't that responsibility belong to just one of them?

Originally, it was being dropped at one place only (via the tablesync worker), but we found a race condition, as mentioned in the comments in process_syncing_tables_for_sync() before the start of the second transaction, which led to this change. See the report and discussion about that race condition in the email [1].

[1] - https://www.postgresql.org/message-id/CAD21AoAw0Oofi4kiDpJBOwpYyBBBkJj=sluon4gd2gjuakg...@mail.gmail.com

-- With Regards, Amit Kapila.
Re: Improve GetConfigOptionValues function
On Thu, Jan 19, 2023 at 3:27 PM Nitin Jadhav wrote:
>
> > Possibly a better answer is to refactor into separate functions,
> > along the lines of
> >
> > static bool
> > ConfigOptionIsShowable(struct config_generic *conf)
> >
> > static void
> > GetConfigOptionValues(struct config_generic *conf, const char **values)
>
> Nice suggestion. Attached a patch for the same. Please share
> comments if any.

The v2 patch looks good to me, except that the comment around ConfigOptionIsShowable() is too verbose. How about just "Return whether the GUC variable is visible or not."? I think you can add it to the CF, if not already done, so as not to lose track of it.

-- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
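The suggested split, a visibility predicate separate from the value extraction, can be modelled like this. This is a hypothetical Python sketch of the shape of the refactor, not the actual guc.c code; the field names are illustrative only.

```python
class ConfigGeneric:
    """Toy stand-in for struct config_generic."""
    def __init__(self, name, value, no_show_all=False, superuser_only=False):
        self.name = name
        self.value = value
        self.no_show_all = no_show_all        # excluded from SHOW ALL
        self.superuser_only = superuser_only  # visible to superusers only

def config_option_is_showable(conf, is_superuser=False):
    """Return whether the GUC variable is visible or not."""
    if conf.no_show_all:
        return False
    return is_superuser or not conf.superuser_only

def get_config_option_values(conf):
    """Extract displayable values; assumes visibility was already checked."""
    return (conf.name, conf.value)

gucs = [ConfigGeneric("work_mem", "4MB"),
        ConfigGeneric("session_preload_libraries", "", superuser_only=True)]
visible = [get_config_option_values(g) for g in gucs
           if config_option_is_showable(g)]
print(visible)  # → [('work_mem', '4MB')]
```

Separating the predicate from the getter means callers that only need the visibility answer never pay for building the value array, which is the point of the refactor being reviewed.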
Re: User functions for building SCRAM secrets
On 11/29/22 8:12 PM, Michael Paquier wrote:
> On Tue, Nov 29, 2022 at 09:32:34PM +0100, Daniel Gustafsson wrote:
> > On the whole I tend to agree with Jacob upthread: while this does provide
> > consistency, it doesn't seem to move the needle for best practices. Allowing
> > scram_build_secret_sha256('password', 'a', 1); with the password potentially
> > going in cleartext over the wire and into the logs doesn't seem like a great
> > tradeoff for the (IMHO) niche usecases it would satisfy. Should we try to
> > make \password and libpq more flexible instead?
>
> Two things got discussed in this area since v10:
> - The length of the random salt.
> - The iteration number.
> Or we could bump up the defaults, and come back to that in a few years, again.. ;p

Here is another attempt at this patch that takes into account the SCRAM code refactor. I addressed some of Daniel's previous feedback, but will need to make another pass on the docs and the assert trace, as the main focus of this revision was bringing the code in line with the recent changes.

This patch changes the function name to "scram_build_secret" and now accepts a new parameter of hash type. This sets it up to handle additional hashes in the future.

I do agree we should make libpq more flexible, but as mentioned in the original thread, this does not solve the *server side* cases where a user needs to build a SCRAM secret. For example, being able to precompute hashes on one server before sending them to another server, which can require no plaintext passwords if the server is randomly generating the data. Another use case comes from the "pg_tle" project, specifically the ability to write a "check_password_hook" from an available PL [1]. If a user does follow our best practices and sends a pre-built SCRAM secret over the wire, a hook can then verify that the secret is not contained within a common password dictionary.
Thanks, Jonathan

[1] https://github.com/aws/pg_tle/blob/main/docs/04_hooks.md

From 756c93f83869b7f8cbb87a7e4ccd967cbd8e8553 Mon Sep 17 00:00:00 2001
From: "Jonathan S. Katz"
Date: Mon, 31 Oct 2022 16:13:08 -0400
Subject: [PATCH] Add "scram_build_secret" SQL function

This function lets users build SCRAM secrets from SQL functions and provides the ability for the user to select the password, salt, and number of iterations for the password hashing algorithm. Currently this only supports the "sha256" hash, but can be modified to support additional hashes in the future.
---
 doc/src/sgml/func.sgml                   |  46 ++
 src/backend/catalog/system_functions.sql |   7 ++
 src/backend/libpq/auth-scram.c           |  31 +--
 src/backend/libpq/crypt.c                |   2 +-
 src/backend/utils/adt/Makefile           |   1 +
 src/backend/utils/adt/authfuncs.c        | 109 +++
 src/backend/utils/adt/meson.build        |   1 +
 src/include/catalog/pg_proc.dat          |   4 +
 src/include/libpq/scram.h                |   4 +-
 src/test/regress/expected/scram.out      |  99
 src/test/regress/parallel_schedule       |   2 +-
 src/test/regress/sql/scram.sql           |  62 +
 12 files changed, 356 insertions(+), 12 deletions(-)
 create mode 100644 src/backend/utils/adt/authfuncs.c
 create mode 100644 src/test/regress/expected/scram.out
 create mode 100644 src/test/regress/sql/scram.sql

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index b8dac9ef46..68f11e953a 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -3513,6 +3513,52 @@ repeat('Pg', 4) PgPgPgPg
+
+ scram_build_secret
+
+scram_build_secret ( hash_type text
+, password text
+[, salt bytea
+[, iterations integer ] ])
+text
+
+Using the values provided in hash type and
+password, builds a SCRAM secret equivalent to
+what is stored in
+pg_authid.rolpassword
+and used with scram-sha-256
+authentication. If not provided or set to NULL,
+salt is randomly generated and
+iterations defaults to 4096.
+Currently hash type only supports
+sha256.
+
+SELECT scram_build_secret('sha256', 'secret password', decode('MTIzNDU2Nzg5MGFiY2RlZg==', 'base64'));
+
+ SCRAM-SHA-256$4096:MTIzNDU2Nzg5MGFiY2RlZg==$D5BmucT796UQKargx2k3fdqjDYR7cH/L0viKKhGo6kA=:M33+iHFOESP8C3DKLDb2k5QAhkNVWEbp/YUIFd2CxN4=
+
+SELECT scram_build_secret('sha256', 'secret password', '\xabba5432');
+
+ SCRAM-SHA-256$4096:q7pUMg==$05Nb9QHwHkMA0CRcYaEfwtgZ+3kStIefz8fLMjTEtio=:P126h1ycyP938E69yxktEfhoAILbiwL/UMsMk3Efb6o=
+
+SELECT scram_build_secret('sha256', 'secret password', decode('MTIzNDU2Nzg5MGFiY2RlZg==', 'base64'), 1);
+
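For readers unfamiliar with what such a function has to compute, the following is a hedged Python sketch of the RFC 5802 construction of a SCRAM-SHA-256 secret, the same shape as what is stored in pg_authid.rolpassword. It is an independent illustration, not the patch's C code, and output for a given salt should be verified against a real server rather than taken from this sketch.

```python
import base64
import hashlib
import hmac
import os

def scram_build_secret(password, salt=None, iterations=4096):
    """Sketch of building a SCRAM-SHA-256 secret (RFC 5802 terms).

    SaltedPassword = Hi(password, salt, i) = PBKDF2-HMAC-SHA256
    StoredKey      = SHA256(HMAC(SaltedPassword, "Client Key"))
    ServerKey      = HMAC(SaltedPassword, "Server Key")
    Result format: SCRAM-SHA-256$<i>:<salt>$<StoredKey>:<ServerKey>
    """
    if salt is None:
        salt = os.urandom(16)  # random salt when none is supplied
    salted = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                 salt, iterations)
    client_key = hmac.new(salted, b"Client Key", hashlib.sha256).digest()
    stored_key = hashlib.sha256(client_key).digest()
    server_key = hmac.new(salted, b"Server Key", hashlib.sha256).digest()
    b64 = lambda b: base64.b64encode(b).decode("ascii")
    return (f"SCRAM-SHA-256${iterations}:{b64(salt)}"
            f"${b64(stored_key)}:{b64(server_key)}")

# Same inputs as the first documentation example above.
secret = scram_build_secret("secret password", b"1234567890abcdef")
print(secret.split("$")[0])  # → SCRAM-SHA-256
```

This also illustrates the "precompute on one server" use case from the email: the computation needs only the password, salt, and iteration count, so it can be done anywhere and the resulting string shipped to the target server.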
Re: Perform streaming logical transactions by background workers and parallel apply
On Mon, Jan 23, 2023 at 8:47 AM Amit Kapila wrote: > > On Fri, Jan 20, 2023 at 11:48 AM Masahiko Sawada > wrote: > > > > > > > > Yet another way is to use the existing parameter logical_decode_mode > > > [1]. If the value of logical_decoding_mode is 'immediate', then we can > > > immediately switch to partial serialize mode. This will eliminate the > > > dependency on timing. The one argument against using this is that it > > > won't be as clear as a separate parameter like > > > 'stream_serialize_threshold' proposed by the patch but OTOH we already > > > have a few parameters that serve a different purpose when used on the > > > subscriber. For example, 'max_replication_slots' is used to define the > > > maximum number of replication slots on the publisher and the maximum > > > number of origins on subscribers. Similarly, > > > wal_retrieve_retry_interval' is used for different purposes on > > > subscriber and standby nodes. > > > > Using the existing parameter makes sense to me. But if we use > > logical_decoding_mode also on the subscriber, as Shveta Malik also > > suggested, probably it's better to rename it so as not to confuse. For > > example, logical_replication_mode or something. > > > > +1. Among the options discussed, this sounds better. Yeah, this looks better option with the parameter name 'logical_replication_mode'. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
RE: [Proposal] Add foreign-server health checks infrastructure
Dear Ted,

Thanks for reviewing! PSA new version.

> For v25-0001-Add-PQConnCheck-and-PQCanConnCheck-to-libpq.patch ,
> `pqConnCheck_internal` only has one caller which is quite short.
> Can pqConnCheck_internal and PQConnCheck be merged into one func ?

I divided the function for future extensibility. Currently it works only on the Linux platform, but that limitation should be removed in the future, and the internal function will then be longer. Therefore I want to keep this style.

> +int
> +PQCanConnCheck(void)
>
> It seems the return value should be of bool type.

I slightly changed the returned value to be true/false. But IIUC exported libpq functions cannot be declared with the "bool" datatype. E.g. PQconnectionNeedsPassword() returns true/false, but the function is declared as returning int.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:
v26-0001-Add-PQConnCheck-and-PQCanConnCheck-to-libpq.patch
v26-0002-postgres_fdw-add-postgres_fdw_verify_connection_.patch
v26-0003-add-test.patch
Re: Exit walsender before confirming remote flush in logical replication
On Fri, Jan 20, 2023 at 4:15 PM Amit Kapila wrote:
>
> On Tue, Jan 17, 2023 at 2:41 PM Amit Kapila wrote:
> >
> > Let me try to summarize the discussion till now. The problem we are
> > trying to solve here is to allow a shutdown to complete when walsender
> > is not able to send the entire WAL. Currently, in such cases, the
> > shutdown fails. As per our current understanding, this can happen when
> > (a) the walreceiver/walapply process is stuck (not able to receive more
> > WAL) due to locks or some other reason; (b) a long time delay has been
> > configured to apply the WAL (we don't yet have such a feature for
> > logical replication but the discussion for same is in progress).
> >
> > Both reasons mostly apply to logical replication because there is no
> > separate walreceiver process whose job is to just flush the WAL. In
> > logical replication, the process that receives the WAL also applies
> > it. So, while applying, it can get stuck for a long time waiting for some
> > heavy-weight lock to be released by some other long-running
> > transaction in the backend.
>
> While checking the commits and email discussions in this area, I came
> across the email [1] from Michael where something similar seems to
> have been discussed. Basically, whether the early shutdown of
> walsender can prevent a switchover between publisher and subscriber,
> but that part was never clearly answered in that email chain. It might
> be worth reading the entire discussion [2]. That discussion finally
> led to the following commit:

Right, in that thread the question was raised about whether it makes sense for logical replication to send all WAL, but there is no conclusion on that. But I think this patch is mainly about resolving the PANIC due to extra WAL getting generated by walsender during checkpoint processing; that's the reason the behavior of sending all the WAL is maintained while only the extra WAL generation is stopped (before the shutdown checkpoint can proceed) using this new state.

> commit c6c333436491a292d56044ed6e167e2bdee015a2
> Author: Andres Freund
> Date: Mon Jun 5 18:53:41 2017 -0700
>
> Prevent possibility of panics during shutdown checkpoint.
>
> When the checkpointer writes the shutdown checkpoint, it checks
> afterwards whether any WAL has been written since it started and
> throws a PANIC if so. At that point, only walsenders are still
> active, so one might think this could not happen, but walsenders can
> also generate WAL, for instance in BASE_BACKUP and logical decoding
> related commands (e.g. via hint bits). So they can trigger this panic
> if such a command is run while the shutdown checkpoint is being
> written.
>
> To fix this, divide the walsender shutdown into two phases. First,
> checkpointer, itself triggered by postmaster, sends a
> PROCSIG_WALSND_INIT_STOPPING signal to all walsenders. If the backend
> is idle or runs an SQL query this causes the backend to shutdown, if
> logical replication is in progress all existing WAL records are
> processed followed by a shutdown.
> ...
>
> Here, as mentioned in the commit, we are trying to ensure that before
> checkpoint writes its shutdown WAL record, "if logical
> replication is in progress all existing WAL records are processed
> followed by a shutdown." I think even before this commit, we tried to
> send the entire WAL before shutdown, but I'm not completely sure.

Yes, I think there is no change in that behavior.

-- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Deadlock between logrep apply worker and tablesync worker
On Mon, Jan 23, 2023 at 1:29 AM Tom Lane wrote:
>
> On my machine, the src/test/subscription/t/002_types.pl test
> usually takes right about 1.5 seconds:
>
> $ time make check PROVE_FLAGS=--timer PROVE_TESTS=t/002_types.pl
> ...
> [14:22:12] t/002_types.pl .. ok 1550 ms ( 0.00 usr 0.00 sys + 0.70 cusr 0.25 csys = 0.95 CPU)
> [14:22:13]
>
> I noticed however that sometimes (at least one try in ten, for me)
> it takes about 2.5 seconds:
>
> [14:22:16] t/002_types.pl .. ok 2591 ms ( 0.00 usr 0.00 sys + 0.69 cusr 0.28 csys = 0.97 CPU)
> [14:22:18]
>
> and I've even seen 3.5 seconds. I dug into this and eventually
> identified the cause: it's a deadlock between a subscription's apply
> worker and a tablesync worker that it's spawned. Sometimes the apply
> worker calls wait_for_relation_state_change (to wait for the tablesync
> worker to finish) while it's holding a lock on pg_replication_origin.
> If that's the case, then when the tablesync worker reaches
> process_syncing_tables_for_sync it is able to perform
> UpdateSubscriptionRelState and reach the transaction commit below
> that; but when it tries to do replorigin_drop_by_name a little further
> down, it blocks on acquiring ExclusiveLock on pg_replication_origin.
> So we have an undetected deadlock. We escape that because
> wait_for_relation_state_change has a 1-second timeout, after which
> it rechecks GetSubscriptionRelState and is able to see the committed
> relation state change; so it continues, and eventually releases its
> transaction and the lock, permitting the tablesync worker to finish.
>
> I've not tracked down the exact circumstances in which the apply
> worker ends up holding a problematic lock, but it seems likely
> that it corresponds to cases where its main loop has itself called
> replorigin_drop_by_name, a bit further up, for some other concurrent
> tablesync operation. (In all the cases I've traced through, the apply
> worker is herding multiple tablesync workers when this happens.)
>
> I experimented with having the apply worker release its locks
> before waiting for the tablesync worker, as attached.

I don't see any problem with your proposed change, but I was wondering if it would be better to commit the transaction and release locks immediately after performing the replication origin drop? By doing that, we would minimize the amount of time the transaction holds the lock.

> This passes
> check-world and it seems to eliminate the test runtime instability,
> but I wonder whether it's semantically correct. This whole business
> of taking table-wide ExclusiveLock on pg_replication_origin looks
> like a horrid kluge that we should try to get rid of, not least
> because I don't see any clear documentation of what hazard it's
> trying to prevent.

IIRC, this is done to prevent concurrent drops of an origin, say via the exposed API pg_replication_origin_drop(). See the discussion in [1] related to it. If we want, we can optimize it so that we acquire the lock on the specific origin, as mentioned in the comments in replorigin_drop_by_name(), but it was not clear that this operation would be frequent enough.

[1] - https://www.postgresql.org/message-id/CAHut%2BPuW8DWV5fskkMWWMqzt-x7RPcNQOtJQBp6SdwyRghCk7A%40mail.gmail.com

-- With Regards, Amit Kapila.
Re: Add a new pg_walinspect function to extract FPIs from WAL records
On Thu, Jan 12, 2023 at 05:37:40PM +0530, Bharath Rupireddy wrote:
> I understand. I don't mind discussing something like [1] with the
> following behaviour and discarding the till_end_of_wal functions altogether:
> - If start_lsn is NULL, error out/return NULL.
> - If end_lsn isn't specified, default to NULL, then determine the end_lsn.
> - If end_lsn is specified as NULL, then determine the end_lsn.
> - If end_lsn is specified as non-NULL, then determine if it is greater
>   than start_lsn; if yes, go ahead and do the job, otherwise error out.
>
> I'll think a bit more on this and perhaps discuss it separately.

FWIW, I still find the current interface of the module bloated. So, while it is possible to stick some pg_current_wal_lsn() calls to bypass the error in most cases, enforcing the end of WAL with a NULL or larger value would still be something I would push for based on my own experience, as there would be no need to worry about the latest LSN being two different values in two function contexts. You could keep the functions as STRICT for consistency, and just allow larger values as a synonym for the end of WAL. That said, the version of pg_get_wal_fpi_info() committed respects the current behavior of the module, with an error on an incorrect end LSN.

> I'll keep the FPI extract function simple as proposed in the patch and
> I'll not write a till_end_of_wal version. If needed, to get all the
> FPIs till the end of WAL, one can always determine the end_lsn with
> pg_current_wal_flush_lsn()/pg_last_wal_replay_lsn() when in recovery,
> and use the proposed function.

I was reading the patch this morning, and that's pretty much what I would have done in terms of simplicity, with a test checking that at least one FPI has been generated. I have shortened the documentation a bit, tweaked a few comments, and applied the whole after seeing the result.
One thing that I have been wondering about is whether it is worth adding the block_id from the record in the output, but discarded this idea as it could be confused with the physical block number, even if this function is for advanced users. -- Michael
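The end_lsn handling being debated, treating NULL or an overly large value as "up to the current end of WAL," can be sketched as follows. This is a hypothetical helper modelling the suggested interface, not the committed pg_walinspect code; LSNs are modelled as plain integers.

```python
def resolve_lsn_range(start_lsn, end_lsn, current_lsn):
    """Resolve a user-supplied LSN range against the current end of WAL.

    start_lsn is required.  An end_lsn of None, or anything past the
    current end of WAL, is clamped to current_lsn, so callers never need
    to know the exact latest LSN (which can differ between the moment
    the query is written and the moment it runs).
    """
    if start_lsn is None:
        raise ValueError("start_lsn must not be null")
    if end_lsn is None or end_lsn > current_lsn:
        end_lsn = current_lsn
    if start_lsn >= end_lsn:
        raise ValueError("start_lsn must precede end_lsn")
    return (start_lsn, end_lsn)

print(resolve_lsn_range(100, None, 500))   # → (100, 500): default to end of WAL
print(resolve_lsn_range(100, 10**9, 500))  # → (100, 500): clamp a large value
```

The clamping branch is what removes the need for a separate "till_end_of_wal" flavour of each function.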
Re: [PATCH] Use 128-bit math to accelerate numeric division, when 8 < divisor digits <= 16
On Sun, Jan 22, 2023 at 10:42 PM Joel Jacobson wrote:
> I did write the code as you suggest at first, but changed it,
> since I realised the extra "else if" needed could be eliminated,
> and I thought div_var_int64() wouldn't be slower than div_var_int(), since
> I thought 64-bit instructions are in general as fast as 32-bit instructions
> on 64-bit platforms.

According to Agner's instruction tables [1], integer division on Skylake (for example) has a latency of 26 cycles for 32-bit operands, and 42-95 cycles for 64-bit.

[1] https://www.agner.org/optimize/instruction_tables.pdf

-- John Naylor EDB: http://www.enterprisedb.com
RE: [Proposal] Add foreign-server health checks infrastructure
> Thank you for reviewing! PSA new patch set. Sorry, I missed the updated file in the patch. New version will be posted soon. Best Regards, Hayato Kuroda FUJITSU LIMITED
Re: Parallel Aggregates for string_agg and array_agg
On Thu, 19 Jan 2023 at 20:44, David Rowley wrote: > Thanks. Pending one last look, I'm planning to move ahead with it > unless there are any further objections or concerns. I've now pushed this. Thank you to everyone who reviewed or gave input on this patch. David
Re: Remove source code display from \df+?
On Sun, Jan 22, 2023 at 09:50:29PM -0500, Isaac Morland wrote: > However, one of the jobs (Windows - Server 2019, MinGW64 - Meson) is paused > and appears never to have run: > > https://cirrus-ci.com/task/6687014536347648 Yeah, mingw is currently set to run only when manually "triggered" by the repository owner (because it's slow). There's no mechanism to tell cfbot to trigger the task, but you can do it if you run from your own github. Anyway, there's no reason to think this patch needs extra platform-specific coverage. -- Justin
Re: Logical replication timeout problem
On Mon, Jan 23, 2023 at 6:21 AM Peter Smith wrote:
>
> 1.
> It makes no real difference, but I was wondering about:
> "update txn progress" versus "update progress txn"

Yeah, I think we can go either way, but I still prefer "update progress txn" as that is closer to the LogicalOutputPluginWriterUpdateProgress callback name.

> 5.
> + if (++changes_count >= CHANGES_THRESHOLD)
> + {
> + rb->update_progress_txn(rb, txn, change);
> + changes_count = 0;
> + }
>
> When there is no update_progress function this code still incurs
> some small additional overhead for incrementing and testing the
> THRESHOLD every time, and also needlessly calls the wrapper every
> 100x. This overhead could be avoided with a simpler up-front check
> like shown below. OTOH, maybe the overhead is insignificant enough
> that just leaving the current code is neater?

As far as built-in logical replication is concerned, the callback will be defined, and I don't know if the overhead will be significant in that case. Also, one can say that for the cases where it is defined, we would be adding this check multiple times (it is already checked inside OutputPluginUpdateProgress). So, I would prefer neat code here.

-- With Regards, Amit Kapila.
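The CHANGES_THRESHOLD pattern under discussion, invoking a possibly-absent progress callback only every Nth change, can be modelled like this. This is an illustrative sketch, not the reorderbuffer.c code; it also shows the up-front check suggested in the review, which skips the counter entirely when no callback is defined.

```python
CHANGES_THRESHOLD = 100

def apply_changes(num_changes, update_progress_txn=None):
    """Process changes, reporting progress at most every CHANGES_THRESHOLD.

    Returns the number of progress-callback invocations.  With the
    up-front check, an undefined callback costs nothing per change.
    """
    if update_progress_txn is None:
        return 0  # up-front check: no per-change increment/test overhead
    calls = 0
    changes_count = 0
    for _ in range(num_changes):
        # ... process one change here ...
        changes_count += 1
        if changes_count >= CHANGES_THRESHOLD:
            update_progress_txn()
            calls += 1
            changes_count = 0
    return calls

print(apply_changes(250, update_progress_txn=lambda: None))  # → 2
print(apply_changes(250))                                    # → 0
```

With 250 changes and a threshold of 100, the callback fires after the 100th and 200th changes, which keeps the progress-update cost amortized across many changes.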
Re: Perform streaming logical transactions by background workers and parallel apply
On Fri, Jan 20, 2023 at 11:48 AM Masahiko Sawada wrote: > > > > > Yet another way is to use the existing parameter logical_decode_mode > > [1]. If the value of logical_decoding_mode is 'immediate', then we can > > immediately switch to partial serialize mode. This will eliminate the > > dependency on timing. The one argument against using this is that it > > won't be as clear as a separate parameter like > > 'stream_serialize_threshold' proposed by the patch but OTOH we already > > have a few parameters that serve a different purpose when used on the > > subscriber. For example, 'max_replication_slots' is used to define the > > maximum number of replication slots on the publisher and the maximum > > number of origins on subscribers. Similarly, > > wal_retrieve_retry_interval' is used for different purposes on > > subscriber and standby nodes. > > Using the existing parameter makes sense to me. But if we use > logical_decoding_mode also on the subscriber, as Shveta Malik also > suggested, probably it's better to rename it so as not to confuse. For > example, logical_replication_mode or something. > +1. Among the options discussed, this sounds better. -- With Regards, Amit Kapila.
Re: Remove source code display from \df+?
On Sun, 22 Jan 2023 at 21:37, Justin Pryzby wrote:
> On Sun, Jan 22, 2023 at 08:23:25PM -0500, Isaac Morland wrote:
> > > Were you able to test with your own github account ?
> >
> > I haven’t had a chance to try this. I must confess to being a bit confused
> > by the distinction between running the CI tests and doing "make check";
> > ideally I would like to be able to run all the tests on my own machine
> > without any external resources. But at the same time I don’t pretend to
> > understand the full situation so I will try to use this when I get some
> > time.
>
> First: "make check" only runs the sql tests, and not the perl tests
> (including pg_upgrade) or isolation tests. check-world runs everything.

Thanks very much. I should have remembered check-world, and of course the fact that the CI tests multiple platforms. I’ll go and do some reading/re-reading; now that I’ve gone through some parts of the process I’ll probably understand more.

The latest submission appears to have passed: http://cfbot.cputube.org/isaac-morland.html

However, one of the jobs (Windows - Server 2019, MinGW64 - Meson) is paused and appears never to have run: https://cirrus-ci.com/task/6687014536347648

Other than that, I think this is passing the tests.
Re: recovery modules
On Tue, Jan 17, 2023 at 08:44:27PM -0800, Nathan Bossart wrote:
> Thanks. Here's a rebased version of the last patch.

Thanks for the rebase. The final state of the documentation is as follows:

51. Archive and Recovery Modules
51.1. Archive Module Initialization Functions
51.2. Archive Module Callbacks
51.3. Recovery Module Initialization Functions
51.4. Recovery Module Callbacks

I am not completely sure whether this grouping is the best thing to do. Wouldn't it be better to move that into two different sub-sections instead? One layout suggestion:

51. WAL Modules
51.1. Archive Modules
51.1.1. Archive Module Initialization Functions
51.1.2. Archive Module Callbacks
51.2. Recovery Modules
51.2.1. Recovery Module Initialization Functions
51.2.2. Recovery Module Callbacks

Putting both of them in the same section sounds like a good idea, per the symmetry one would likely have between the code paths of archiving and recovery, as they share some common knowledge. This kind of comes back to the previous suggestion to rename basic_archive to something like wal_modules, which would cover both archiving and recovery. It does not seem like this would overlap with RMGRs, which is much more internal anyway.

 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("must specify restore_command when standby mode is not enabled")));
+ errmsg("must specify restore_command or a restore_library that defines "
+        "a restore callback when standby mode is not enabled")));

This is long. Shouldn't this be split into an errdetail() to list all the options at hand?
- if (XLogArchiveLibrary[0] != '\0' && XLogArchiveCommand[0] != '\0')
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-  errmsg("both archive_command and archive_library set"),
-  errdetail("Only one of archive_command, archive_library may be set.")));
+ CheckMutuallyExclusiveGUCs(XLogArchiveLibrary, "archive_library",
+                            XLogArchiveCommand, "archive_command");

The introduction of this routine could be a patch on its own, as it impacts the archiving path.

+ CheckMutuallyExclusiveGUCs(restoreLibrary, "restore_library",
+                            archiveCleanupCommand, "archive_cleanup_command");

+ if (strcmp(prevRestoreLibrary, restoreLibrary) != 0 ||
+     strcmp(prevArchiveCleanupCommand, archiveCleanupCommand) != 0)
+ {
+     call_recovery_module_shutdown_cb(0, (Datum) 0);
+     LoadRecoveryCallbacks();
+ }
+
+ pfree(prevRestoreLibrary);
+ pfree(prevArchiveCleanupCommand);

Hm.. The callers of CheckMutuallyExclusiveGUCs(), with the new ERROR paths they introduce, need a close lookup. As far as I can see, this concerns four areas depending on the three restore commands (restore_command and recovery_end_command for startup, archive_cleanup_command for the checkpointer):

- Startup process initialization, as of validateRecoveryParameters(), where the postmaster GUCs for the recovery target are evaluated. This one is an early stage, which is fine.

- Startup reloading, as of StartupRereadConfig(). This code could involve a WAL receiver restart depending on a change in the slot or in primary_conninfo, and force at the same time an ERROR because of a conflicting recovery library and command configuration. This one should be safe because pendingWalRcvRestart would just be considered later on by the startup process itself while waiting for WAL to become available. Still, this could deserve a comment? Even if there is a misconfiguration, a reload on a standby would enforce a FATAL in the startup process, taking down the whole server.

- Checkpointer initialization, as of CheckpointerMain().
A configuration failure in this code path, aka server startup, causes the server to loop infinitely on FATAL with the misconfiguration showing up all the time... This is a problem. - Last comes the checkpointer GUC reloading, as of HandleCheckpointerInterrupts(), with a second problem. This introduces a failure path where ConfigReloadPending is processed at the same time as ShutdownRequestPending based on the way it is coded, interacting with what would be a normal shutdown in some cases? And actually, if you enforce a misconfiguration on reload, the checkpointer reports an error but it does not enforce a process restart, hence it keeps around the new, incorrect, configuration while waiting for a new checkpoint to happen once restore_library and archive_cleanup_command are set. This could lead to surprises, IMO. Upgrading to a FATAL in this code path triggers an infinite loop, like the startup path. If the archive_cleanup_command callback of a restore library triggers a FATAL, it seems to me that it would continuously trigger a server restart, actually. Perhaps that's something to document, in comparison to the safe fallbacks of the shell command where we don't
Re: Remove source code display from \df+?
On Sun, Jan 22, 2023 at 08:23:25PM -0500, Isaac Morland wrote: > > Were you able to test with your own github account ? > > I haven’t had a chance to try this. I must confess to being a bit confused > by the distinction between running the CI tests and doing "make check"; > ideally I would like to be able to run all the tests on my own machine > without any external resources. But at the same time I don’t pretend to > understand the full situation so I will try to use this when I get some > time. First: "make check" only runs the sql tests, and not the perl tests (including pg_upgrade) or isolation tests. check-world runs everything. One difference from running it locally is that cirrus runs tests under four OSes. Another is that it has a bunch of compilation flags and variations to help catch errors (although it's currently missing ENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS, so that wouldn't have been caught). And another reason is that it runs in a "clean" environment, so (for example) it'd probably catch if you have local, uncommitted changes, or if you assumed that the username is "postgres" (earlier I said that it didn't, but actually the mac task runs as "admin"). The old way of doing things was for cfbot to "inject" the cirrus.yml file and then push a branch to cirrusci to run tests; it made some sense for people to mail a patch to the list to cause cfbot to run the tests under cirrusci. The current/new way is that .cirrus.yml is in the source tree, so anyone with a github account can do that. IMO it no longer makes sense to send patches to the list "to see" if it passes tests. I encourage those who haven't to try it. -- Justin
Re: Non-superuser subscription owners
Hi, On 2023-01-22 09:05:27 -0800, Jeff Davis wrote: > On Sat, 2023-01-21 at 14:01 -0800, Andres Freund wrote: > > There are good reasons to have 'peer' authentication set up for the > > user > > running postgres, so admin scripts can connect without issues. Which > > unfortunately then also means that postgres_fdw etc can connect to > > the current > > database as superuser, without that check. Which imo clearly is an > > issue. > > Perhaps we should have a way to directly turn on/off authentication > methods in libpq through API functions and/or options? Yes. There's an in-progress patch adding, I think, pretty much what is required here: https://www.postgresql.org/message-id/9e5a8ccddb8355ea9fa4b75a1e3a9edc88a70cd3.ca...@vmware.com require_auth=a,b,c I think an allowlist approach is the right thing for the subscription (and postgres_fdw/dblink) use case, otherwise we'll add some auth method down the line without updating what's disallowed in the subscription code. > > Why is this only about local files, rather than e.g. also using the local > > user? > > It's not, but we happen to already have pg_read_server_files, and it > makes sense to use that at least for files referenced directly in the > connection string. You're right that it's incomplete, and also that it > doesn't make a lot of sense for files accessed indirectly. I just meant that we need to pay attention to user-based permissions as well. Greetings, Andres Freund
Re: run pgindent on a regular basis / scripted manner
Andres Freund writes: > On 2023-01-22 19:28:42 -0500, Tom Lane wrote: >> Hmm ... right offhand, the only objection I can see is that the >> pg_bsd_indent files use the BSD 4-clause license, which is not ours. >> However, didn't UCB grant a blanket exception years ago that said >> that people could treat that as the 3-clause license? > Yep: > https://www.freebsd.org/copyright/license/ Cool. I'll take a look at doing this later (probably after the current CF) unless somebody beats me to it. regards, tom lane
Re: Remove source code display from \df+?
On Sun, 22 Jan 2023 at 17:27, Justin Pryzby wrote: > On Sun, Jan 22, 2023 at 04:28:21PM -0500, Isaac Morland wrote: > > On Sun, 22 Jan 2023 at 15:04, Tom Lane wrote: > But now I'm having a problem I don't understand: the CI are still > failing, > > but not in the psql test. Instead, I get this: > > > > [20:11:17.624] +++ tap check in src/bin/pg_upgrade +++ > > You'll find the diff in the "artifacts", but not a separate "diff" file. > > https://api.cirrus-ci.com/v1/artifact/task/6146418377752576/testrun/build/testrun/pg_upgrade/002_pg_upgrade/log/regress_log_002_pg_upgrade > > CREATE FUNCTION public.regress_psql_df_sql() RETURNS void LANGUAGE sql BEGIN ATOMIC - SELECT NULL::text; + SELECT NULL::text AS text; END; > > It's failing because after restoring the function, the column is named > "text" - maybe it's a bug. > OK, thanks. I'd say I've uncovered a bug related to pg_upgrade, unless I’m missing something. However, I've adjusted my patch so that nothing it creates is kept. This seems tidier even without the test failure. Tom's earlier point was that neither the function nor its owner needs to > be preserved (as is done to exercise pg_dump/restore/upgrade - surely > functions are already tested). Dropping it when you're done running \df > will avoid any possible issue with pg_upgrade. > > Were you able to test with your own github account ? > I haven’t had a chance to try this. I must confess to being a bit confused by the distinction between running the CI tests and doing "make check"; ideally I would like to be able to run all the tests on my own machine without any external resources. But at the same time I don’t pretend to understand the full situation so I will try to use this when I get some time. 0001-Remove-source-code-display-from-df-v6.patch Description: Binary data
Re: run pgindent on a regular basis / scripted manner
Hi, On 2023-01-22 19:28:42 -0500, Tom Lane wrote: > Andres Freund writes: > > I think I've proposed this before, but I still think that as long as we rely > > on pg_bsd_indent, we should have it be part of our source tree and > > automatically built. It's no wonder that barely anybody indents their > > patches, given that it requires building pg_bsd_indent in a separate repo > > (but > > referencing our source tree), putting the binary in path, etc. > > Hmm ... right offhand, the only objection I can see is that the > pg_bsd_indent files use the BSD 4-clause license, which is not ours. > However, didn't UCB grant a blanket exception years ago that said > that people could treat that as the 3-clause license? Yep: https://www.freebsd.org/copyright/license/ NOTE: The copyright of UC Berkeley’s Berkeley Software Distribution ("BSD") source has been updated. The copyright addendum may be found at ftp://ftp.cs.berkeley.edu/pub/4bsd/README.Impt.License.Change and is included below. July 22, 1999 To All Licensees, Distributors of Any Version of BSD: As you know, certain of the Berkeley Software Distribution ("BSD") source code files require that further distributions of products containing all or portions of the software, acknowledge within their advertising materials that such products contain software developed by UC Berkeley and its contributors. Specifically, the provision reads: * 3. All advertising materials mentioning features or use of this software * must display the following acknowledgement: * This product includes software developed by the University of * California, Berkeley and its contributors." Effective immediately, licensees and distributors are no longer required to include the acknowledgement within advertising materials. Accordingly, the foregoing paragraph of those BSD Unix files containing it is hereby deleted in its entirety. 
William Hoskins Director, Office of Technology Licensing University of California, Berkeley I did check, and the FTP bit is still downloadable. A bit awkward though, now that browsers have/are removing ftp support. Greetings, Andres Freund
Re: run pgindent on a regular basis / scripted manner
Hi, On 2023-01-22 19:50:10 -0500, Andrew Dunstan wrote: > On 2023-01-22 Su 18:14, Tom Lane wrote: > > Jelte Fennema writes: > >> Maybe I'm not understanding your issue correctly, but for such > >> a case you could push two commits at the same time. > > I don't know that much about git commit hooks, but do they really > > only check the final state of a series of commits? > > > The pre-commit hook is literally run every time you do `git commit`. But > it's only run on your local instance and only if you have enabled it. > It's not project-wide. There's different hooks. Locally, I think pre-push would be better suited to this than pre-commit (I often save WIP work in local branches, it'd be pretty annoying if some indentation thing swore at me). But there's also hooks like pre-receive, that allow doing validation on the server side. Which obviously would be project wide... Greetings, Andres Freund
Re: Logical replication timeout problem
Here are my review comments for patch v4-0001 == General 1. It makes no real difference, but I was wondering about: "update txn progress" versus "update progress txn" I thought that the first way sounds more natural. YMMV. If you change this then there is impact for the typedef, function names, comments, member names: ReorderBufferUpdateTxnProgressCB --> ReorderBufferUpdateProgressTxnCB “/* update progress txn callback */” --> “/* update txn progress callback */” update_progress_txn_cb_wrapper --> update_txn_progress_cb_wrapper updated_progress_txn --> update_txn_progress == Commit message 2. The problem is when there is a DDL in a transaction that generates lots of temporary data due to rewrite rules, these temporary data will not be processed by the pgoutput plugin. The previous commit (f95d53e) only fixed timeouts caused by filtering out changes in pgoutput. Therefore, the previous fix for DML had no impact on this case. ~ IMO this still needs some rewording to say up-front what the actual problem is -- i.e. an avoidable timeout occurring. SUGGESTION (or something like this...) When there is a DDL in a transaction that generates lots of temporary data due to rewrite rules, this temporary data will not be processed by the pgoutput plugin. This means it is possible for a timeout to occur if a sufficiently long time elapses since the last pgoutput message. A previous commit (f95d53e) fixed a similar scenario in this area, but that only fixed timeouts for DML going through pgoutput, so it did not address this DDL timeout case. == src/backend/replication/logical/logical.c 3. update_progress_txn_cb_wrapper +/* + * Update progress callback while processing a transaction. + * + * Try to update progress and send a keepalive message during sending data of a + * transaction (and its subtransactions) to the output plugin. 
+ * + * For a large transaction, if we don't send any change to the downstream for a + * long time (exceeds the wal_receiver_timeout of standby) then it can timeout. + * This can happen when all or most of the changes are either not published or + * got filtered out. + */ +static void +update_progress_txn_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn, +ReorderBufferChange *change) Simplify the "Try to..." paragraph. And the other part should also mention DDL. SUGGESTION Try to send a keepalive message during transaction processing. This is done because if we don't send any change to the downstream for a long time (exceeds the wal_receiver_timeout of standby), then it can timeout. This can happen for large DDL, or for large transactions when all or most of the changes are either not published or got filtered out. == .../replication/logical/reorderbuffer.c 4. ReorderBufferProcessTXN @@ -2105,6 +2105,19 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, PG_TRY(); { + /* + * Static variable used to accumulate the number of changes while + * processing txn. + */ + static int changes_count = 0; + + /* + * Sending keepalive messages after every change has some overhead, but + * testing showed there is no noticeable overhead if keepalive is only + * sent after every ~100 changes. + */ +#define CHANGES_THRESHOLD 100 + IMO these can be relocated to be declared/defined inside the "while" loop -- i.e. closer to where they are being used. ~~~ 5. + if (++changes_count >= CHANGES_THRESHOLD) + { + rb->update_progress_txn(rb, txn, change); + changes_count = 0; + } When there is no update_progress function this code is still incurring some small additional overhead for incrementing and testing the THRESHOLD every time, and also needlessly calling to the wrapper every 100x. This overhead could be avoided with a simpler up-front check like shown below. OTOH, maybe the overhead is insignificant enough that just leaving the current code is neater? 
LogicalDecodingContext *ctx = rb->private_data; ... if (ctx->update_progress_txn && (++changes_count >= CHANGES_THRESHOLD)) { rb->update_progress_txn(rb, txn, change); changes_count = 0; } -- Kind Regards, Peter Smith. Fujitsu Australia
Re: run pgindent on a regular basis / scripted manner
On 2023-01-22 Su 18:14, Tom Lane wrote: > Jelte Fennema writes: >> Maybe I'm not understanding your issue correctly, but for such >> a case you could push two commits at the same time. > I don't know that much about git commit hooks, but do they really > only check the final state of a series of commits? The pre-commit hook is literally run every time you do `git commit`. But it's only run on your local instance and only if you have enabled it. It's not project-wide. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: run pgindent on a regular basis / scripted manner
On Sun, Jan 22, 2023 at 07:28:42PM -0500, Tom Lane wrote: > Andres Freund writes: >> I think I've proposed this before, but I still think that as long as we rely >> on pg_bsd_indent, we should have it be part of our source tree and >> automatically built. It's no wonder that barely anybody indents their >> patches, given that it requires building pg_bsd_indent in a separate repo (but >> referencing our source tree), putting the binary in path, etc. > > Hmm ... right offhand, the only objection I can see is that the > pg_bsd_indent files use the BSD 4-clause license, which is not ours. > However, didn't UCB grant a blanket exception years ago that said > that people could treat that as the 3-clause license? If we could > get past the license question, I agree that doing what you suggest > would be superior to the current situation. +1. -- Michael signature.asc Description: PGP signature
Re: Record queryid when auto_explain.log_verbose is on
On Fri, Jan 20, 2023 at 12:32:58PM +0900, Michael Paquier wrote: > FWIW, no objections from here. This maps with EXPLAIN where the query > ID is only printed under VERBOSE. While looking at this change, I have been wondering about something.. Isn't the knowledge of the query ID something that should be pushed within ExplainPrintPlan() so as we don't duplicate in two places the checks that show it? In short, the patch ignores the case where compute_query_id = regress in auto_explain. ExplainPrintTriggers() is kind of different because there is auto_explain_log_triggers. Still, we could add a flag in ExplainState deciding if the triggers should be printed, so as it would be possible to move ExplainPrintTriggers() and ExplainPrintJITSummary() within ExplainPrintPlan(), as well? The same kind of logic could be applied for the planning time and the buffer usage if these are tracked in ExplainState rather than being explicit arguments of ExplainOnePlan(). Not to mention that this reduces the differences between ExplainOneUtility() and ExplainOnePlan(). Leaving this comment aside, I think that this should have at least one regression test in 001_auto_explain.pl, where query_log() can be called while the verbose GUC of auto_explain is enabled. -- Michael signature.asc Description: PGP signature
Re: run pgindent on a regular basis / scripted manner
Andres Freund writes: > I think I've proposed this before, but I still think that as long as we rely > on pg_bsd_indent, we should have it be part of our source tree and > automatically built. It's no wonder that barely anybody indents their > patches, given that it requires building pg_bsd_indent in a separate repo (but > referencing our source tree), putting the binary in path, etc. Hmm ... right offhand, the only objection I can see is that the pg_bsd_indent files use the BSD 4-clause license, which is not ours. However, didn't UCB grant a blanket exception years ago that said that people could treat that as the 3-clause license? If we could get past the license question, I agree that doing what you suggest would be superior to the current situation. regards, tom lane
Re: run pgindent on a regular basis / scripted manner
Hi, On 2023-01-22 18:28:27 -0500, Tom Lane wrote: > Andres Freund writes: > > On 2023-01-22 18:20:49 +0100, Jelte Fennema wrote: > >> I don't think the amount of pain is really much lower if we reformat > >> 10,000 or 300,000 lines of code, without automation both would be > >> quite painful. But the git commands I shared in my previous email > >> should alleviate most of that pain. > > > It's practically not possible to review a 300k line change. And perhaps I'm > > paranoid, but I would have a problem with a commit in the history that's > > practically not reviewable. > > As far as that goes, if you had concern then you could run the indentation > tool locally and confirm you got matching results. Of course, but I somehow feel a change of formatting should be reviewable to at least some degree. Even if it's just to make sure that the tool didn't have a bug and cause some subtle behavioural change. > So the more I think about it the less excited I am about depending on > clang-format, because version skew in peoples' clang installations seems > inevitable, and there's good reason to fear that that would show up > as varying indentation results. One thing that I like about clang-format is that it's possible to teach it about our include order rules (which does find some "irregularities"). But of course that's not enough. If we decide to move to another tool, I think it might be worth to remove a few of the pg_bsd_indent options, that other tools won't be able to emulate, first. E.g. -di12 -> -di4 would remove a *lot* of the noise from a move to another tool. And be much easier to write manually, but ... :) I think I've proposed this before, but I still think that as long as we rely on pg_bsd_indent, we should have it be part of our source tree and automatically built. It's no wonder that barely anybody indents their patches, given that it requires building pg_bsd_indent in a separate repo (but referencing our source tree), putting the binary in path, etc. 
Greetings, Andres Freund
Re: run pgindent on a regular basis / scripted manner
On Sun, Jan 22, 2023 at 3:28 PM Tom Lane wrote: > So the more I think about it the less excited I am about depending on > clang-format, because version skew in peoples' clang installations seems > inevitable, and there's good reason to fear that that would show up > as varying indentation results. I have to admit that the way that I was thinking about this was colored by the way that I use clang-format today. I only now realize how different my requirements are to the requirements that we'd have for any tool that gets run against the tree in bulk. In particular, I didn't realize how annoying the non-additive nature of certain variable alignment rules might be until you pointed it out today (seems obvious now!). In my experience clang-format really shines when you need to quickly indent code that is indented in some way that looks completely wrong -- it does quite a lot better than pgindent when that's your starting point. It has a reasonable way of balancing competing goals like maximum number of columns (a soft maximum) and how function parameters are displayed, which pgindent can't do. It also avoids allowing a function parameter from a function declaration with its type name on its own line, before the variable name -- also beyond the capabilities of pgindent IIRC. Features like that make it very useful as a first pass thing, where all the bells and whistles have little real downside. Running clang-format and then running pgindent tends to produce better results than just running pgindent, at least when working on a new patch. -- Peter Geoghegan
Re: pgindent vs variable declaration across multiple lines
On Mon, Jan 23, 2023 at 11:34 AM Tom Lane wrote: > I spent some more time staring at this and came up with what seems like > a workable patch, based on the idea that what we want to indent is > specifically initialization expressions. pg_bsd_indent does have some > understanding of that: ps.block_init is true within such an expression, > and then ps.block_init_level is the brace nesting depth inside it. > If you just enable ind_stmt based on block_init then you get a bunch > of unwanted additional indentation inside struct initializers, but > it seems to work okay if you restrict it to not happen inside braces. > More importantly, it doesn't change anything we don't want changed. Nice! LGTM now that I know about block_init.
Re: run pgindent on a regular basis / scripted manner
Andres Freund writes: > On 2023-01-22 18:20:49 +0100, Jelte Fennema wrote: >> I don't think the amount of pain is really much lower if we reformat >> 10,000 or 300,000 lines of code, without automation both would be >> quite painful. But the git commands I shared in my previous email >> should alleviate most of that pain. > It's practically not possible to review a 300k line change. And perhaps I'm > paranoid, but I would have a problem with a commit in the history that's > practically not reviewable. As far as that goes, if you had concern then you could run the indentation tool locally and confirm you got matching results. But this does point up that the processes Jelte suggested all depend critically on indentation results being 100% reproducible by anybody. So the more I think about it the less excited I am about depending on clang-format, because version skew in peoples' clang installations seems inevitable, and there's good reason to fear that that would show up as varying indentation results. regards, tom lane
Re: pg_stats and range statistics
On 1/22/23 22:33, Justin Pryzby wrote: > On Sun, Jan 22, 2023 at 07:19:41PM +0100, Tomas Vondra wrote: >> On 1/21/23 19:53, Egor Rogov wrote: >>> Hi Tomas, >>> On 21.01.2023 00:50, Tomas Vondra wrote: This simply adds two functions, accepting/producing anyarray - one for lower bounds, one for upper bounds. I don't think it can be done with a plain subquery (or at least I don't know how). >>> >>> Anyarray is an alien to SQL, so functions are well justified here. What >>> makes me a bit uneasy is two almost identical functions. Should we >>> consider other options like a function with an additional parameter or a >>> function returning an array of bounds arrays (which is somewhat >>> wasteful, but probably it doesn't matter much here)? >>> >> >> I thought about the alternatives (e.g. a single >> function with a parameter determining which boundary to return), but I >> don't think they're better. > > What about a common function, maybe called like: > > ranges_upper_bounds(PG_FUNCTION_ARGS) > { > AnyArrayType *array = PG_GETARG_ANY_ARRAY_P(0); > Oid element_type = AARR_ELEMTYPE(array); > TypeCacheEntry *typentry; > > /* Get information about range type; note column might be a domain */ > typentry = range_get_typcache(fcinfo, getBaseType(element_type)); > > return ranges_bounds_common(typentry, array, false); > } > > That saves 40 LOC. > Thanks, that's better. But I'm still not sure it's a good idea to add a function with an anyarray argument, when we need it to be an array of ranges ... I wonder if we have other functions doing something similar, i.e. accepting a polymorphic type and then imposing additional restrictions on it. > Shouldn't this add some sql tests ? > Yeah, I guess we should have a couple tests calling these functions on different range arrays. This reminds me lower()/upper() have some extra rules about handling empty ranges / infinite boundaries etc. 
These functions should behave consistently (as if we called lower() in a loop) and I'm pretty sure that's not the current state. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: run pgindent on a regular basis / scripted manner
Jelte Fennema writes: > Maybe I'm not understanding your issue correctly, but for such > a case you could push two commits at the same time. I don't know that much about git commit hooks, but do they really only check the final state of a series of commits? In any case, I'm still down on the idea of checking this in a commit hook because of the complexity and lack of transparency of such a check. If you think your commit is correctly indented, but the hook (running on somebody else's machine) disagrees, how are you going to debug that? I don't want to get into such a situation, especially since Murphy's law guarantees that it would mainly bite people under time pressure, like when pushing a security fix. regards, tom lane
Re: [PATCH] Use 128-bit math to accelerate numeric division, when 8 < divisor digits <= 16
On Sun, Jan 22, 2023, at 14:25, Dean Rasheed wrote: > I just modified the previous test you posted: > > \timing on > SELECT count(numeric_div_volatile(1e131071,123456)) FROM > generate_series(1,1e4); > > Time: 2048.060 ms (00:02.048) -- HEAD > Time: 2422.720 ms (00:02.423) -- With patch > ... > > Apparently it can make a difference. Probably something to do with > having less data to move around. I remember noticing that when I wrote > div_var_int(), which is why I split it into 2 branches in that way. Many thanks for the feedback. Nice catch! New patch attached. Interesting, I'm not able to reproduce this on my MacBook Pro M1 Max: SELECT version(); PostgreSQL 16devel on aarch64-apple-darwin22.2.0, compiled by Apple clang version 14.0.0 (clang-1400.0.29.202), 64-bit SELECT count(numeric_div_volatile(1e131071,123456)) FROM generate_series(1,1e4); Time: 1569.730 ms (00:01.570) -- HEAD Time: 1569.918 ms (00:01.570) -- div_var_int64.patch Time: 1569.038 ms (00:01.569) -- div_var_int64-2.patch Just curious, what platform are you on? /Joel div_var_int64-2.patch Description: Binary data
Re: run pgindent on a regular basis / scripted manner
Andres Freund writes: > I strongly dislike it, I rarely get it right by hand - but it does have some > benefit over aligning variable names based on the length of the type names as > uncrustify/clang-format: In their approach an added local variable can cause > all the other variables to be re-indented (and their initial value possibly > wrapped). The fixed alignment doesn't have that issue. Yeah. That's one of my biggest gripes about pgperltidy: if you insert another assignment in a series of assignments, it is very likely to reformat all the adjacent assignments because it thinks it's cool to make all the equal signs line up. That's just awful. You can either run pgperltidy on new code before committing, and accept that the feature patch will touch a lot of lines it's not making real changes to (thereby dirtying the "git blame" history) or not do so and thereby commit code that's not passing tidiness checks. Let's *not* adopt any style that causes similar things to start happening in our C code. regards, tom lane
Re: pgindent vs variable declaration across multiple lines
Hi, On 2023-01-22 17:34:52 -0500, Tom Lane wrote: > I spent some more time staring at this and came up with what seems like > a workable patch, based on the idea that what we want to indent is > specifically initialization expressions. That's awesome. Thanks for doing that. > Proposed patch for pg_bsd_indent attached. I've also attached a diff > representing the delta between what current pg_bsd_indent wants to do > to HEAD and what this would do. All the changes it wants to make look > good, although I can't say whether there are other places it's failing > to change that we'd like it to. I think it's a significant improvement, even if it turns out that there's other cases it misses. Greetings, Andres Freund
Re: run pgindent on a regular basis / scripted manner
Hi, On 2023-01-22 18:20:49 +0100, Jelte Fennema wrote: > > But switching away from that intermixed with a lot of other changes isn't > > going to be fun. > > I don't think the amount of pain is really much lower if we reformat > 10,000 or 300,000 lines of code, without automation both would be > quite painful. But the git commands I shared in my previous email > should alleviate most of that pain. It's practically not possible to review a 300k line change. And perhaps I'm paranoid, but I would have a problem with a commit in the history that's practically not reviewable. > > I don't have a problem with the current pgindent alignment of function > > parameters, so not sure what you mean about that. > > Function parameter alignment is fine with pgindent imho, but this +12 > column variable declaration thing I personally think is quite weird. I strongly dislike it, I rarely get it right by hand - but it does have some benefit over aligning variable names based on the length of the type names as uncrustify/clang-format: In their approach an added local variable can cause all the other variables to be re-indented (and their initial value possibly wrapped). The fixed alignment doesn't have that issue. Personally I think the cost of trying to align local variable names is way way higher than the benefit. Greetings, Andres Freund
Re: pgindent vs variable declaration across multiple lines
Thomas Munro writes: > On Fri, Jan 20, 2023 at 2:43 PM Tom Lane wrote: >> Yeah :-(. That's enough of a rat's nest that I've not really wanted to. >> But I'd support applying such a fix if someone can figure it out. > This may be a clue: the place where declarations are treated > differently seems to be (strangely) in io.c: > ps.ind_stmt = ps.in_stmt & ~ps.in_decl; /* next line should be > * indented if we have not > * completed this stmt and if > * we are not in the middle of > * a declaration */ > If you just remove "& ~ps.in_decl" then it does the desired thing for > that new code in predicate.c, but it also interferes with declarations > with commas, ie int i, j; where i and j currently line up, now j just > gets one indentation level. It's probably not the right way to do it > but you can fix that with a last token kluge, something like: > #include "indent_codes.h" > ps.ind_stmt = ps.in_stmt && (!ps.in_decl || ps.last_token != comma); > That improves a lot of code in our tree IMHO but of course there is > other collateral damage... I spent some more time staring at this and came up with what seems like a workable patch, based on the idea that what we want to indent is specifically initialization expressions. pg_bsd_indent does have some understanding of that: ps.block_init is true within such an expression, and then ps.block_init_level is the brace nesting depth inside it. If you just enable ind_stmt based on block_init then you get a bunch of unwanted additional indentation inside struct initializers, but it seems to work okay if you restrict it to not happen inside braces. More importantly, it doesn't change anything we don't want changed. Proposed patch for pg_bsd_indent attached. I've also attached a diff representing the delta between what current pg_bsd_indent wants to do to HEAD and what this would do. All the changes it wants to make look good, although I can't say whether there are other places it's failing to change that we'd like it to. 
regards, tom lane diff --git a/io.c b/io.c index 3ce8bfb..8a05d7e 100644 --- a/io.c +++ b/io.c @@ -205,11 +205,12 @@ dump_line(void) ps.decl_on_line = ps.in_decl; /* if we are in the middle of a * declaration, remember that fact for * proper comment indentation */ -ps.ind_stmt = ps.in_stmt & ~ps.in_decl; /* next line should be - * indented if we have not - * completed this stmt and if - * we are not in the middle of - * a declaration */ +/* next line should be indented if we have not completed this stmt, and + * either we are not in a declaration or we are in an initialization + * assignment; but not if we're within braces in an initialization, + * because that scenario is handled by other rules. */ +ps.ind_stmt = ps.in_stmt && + (!ps.in_decl || (ps.block_init && ps.block_init_level <= 0)); ps.use_ff = false; ps.dumped_decl_indent = 0; *(e_lab = s_lab) = '\0'; /* reset buffers */ diff --git a/contrib/ltree/ltree_gist.c b/contrib/ltree/ltree_gist.c index 5d2db6c62b..119d03522f 100644 --- a/contrib/ltree/ltree_gist.c +++ b/contrib/ltree/ltree_gist.c @@ -43,7 +43,7 @@ ltree_gist_alloc(bool isalltrue, BITVECP sign, int siglen, ltree *left, ltree *right) { int32 size = LTG_HDRSIZE + (isalltrue ? 0 : siglen) + - (left ? VARSIZE(left) + (right ? VARSIZE(right) : 0) : 0); + (left ? VARSIZE(left) + (right ? 
VARSIZE(right) : 0) : 0); ltree_gist *result = palloc(size); SET_VARSIZE(result, size); diff --git a/contrib/test_decoding/test_decoding.c b/contrib/test_decoding/test_decoding.c index e523d22eba..295c7dcc22 100644 --- a/contrib/test_decoding/test_decoding.c +++ b/contrib/test_decoding/test_decoding.c @@ -290,7 +290,7 @@ pg_decode_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn) { TestDecodingData *data = ctx->output_plugin_private; TestDecodingTxnData *txndata = - MemoryContextAllocZero(ctx->context, sizeof(TestDecodingTxnData)); + MemoryContextAllocZero(ctx->context, sizeof(TestDecodingTxnData)); txndata->xact_wrote_changes = false; txn->output_plugin_private = txndata; @@ -350,7 +350,7 @@ pg_decode_begin_prepare_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn) { TestDecodingData *data = ctx->output_plugin_private; TestDecodingTxnData *txndata = - MemoryContextAllocZero(ctx->context, sizeof(TestDecodingTxnData)); + MemoryContextAllocZero(ctx->context, sizeof(TestDecodingTxnData)); txndata->xact_wrote_changes = false; txn->output_plugin_private = txndata; diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c index 14c23101ad..dfcb4d279b 100644 --- a/src/backend/access/common/reloptions.c +++ b/src/backend/access/common/reloptions.c @@ -1717,7
Re: Remove source code display from \df+?
On Sun, Jan 22, 2023 at 04:28:21PM -0500, Isaac Morland wrote: > On Sun, 22 Jan 2023 at 15:04, Tom Lane wrote: > > > Isaac Morland writes: > > > On Sun, 22 Jan 2023 at 14:26, Alvaro Herrera > > > wrote: > > >> This one would fail the sanity check that all roles created by > > >> regression tests need to have names that start with "regress_". > > > > > Thanks for the correction. Now I feel like I've skipped some of the > > > readings! > > > Updated patch attached. Informally, I am adopting the regress_* policy > > for > > > all object types. > > > > That's excessive. The policy Alvaro mentions applies to globally-visible > > object names (i.e., database, role, and tablespace names), and it's there > > to try to ensure that doing "make installcheck" against a live > > installation won't clobber any non-test-created objects. There's no point > > in having such a policy within a test database --- its most likely effect > > there would be to increase the risk that different test scripts step on > > each others' toes. If you feel a need for a name prefix for non-global > > objects, use something based on the name of your test script. > > > > I already used a test-specific prefix, then added "regress_" in front. > Point taken, however, on the difference between global and non-global > objects. > > But now I'm having a problem I don't understand: the CI are still failling, > but not in the psql test. Instead, I get this: > > [20:11:17.624] +++ tap check in src/bin/pg_upgrade +++ You'll find the diff in the "artifacts", but not a separate "diff" file. https://api.cirrus-ci.com/v1/artifact/task/6146418377752576/testrun/build/testrun/pg_upgrade/002_pg_upgrade/log/regress_log_002_pg_upgrade CREATE FUNCTION public.regress_psql_df_sql() RETURNS void LANGUAGE sql BEGIN ATOMIC - SELECT NULL::text; + SELECT NULL::text AS text; END; It's failing because after restoring the function, the column is named "text" - maybe it's a bug. 
Tom's earlier point was that neither the function nor its owner needs to be preserved (as is done to exercise pg_dump/restore/upgrade - surely functions are already tested). Dropping it when you're done running \df will avoid any possible issue with pg_upgrade. Were you able to test with your own github account ? -- Justin
Re: Remove source code display from \df+?
On Sun, 22 Jan 2023 at 16:56, Justin Pryzby wrote: > On Sun, Jan 22, 2023 at 03:04:14PM -0500, Tom Lane wrote: > > That's excessive. The policy Alvaro mentions applies to globally-visible > > object names (i.e., database, role, and tablespace names), and it's there > > to try to ensure that doing "make installcheck" against a live > > installation won't clobber any non-test-created objects. There's no > point > > in having such a policy within a test database --- its most likely effect > > there would be to increase the risk that different test scripts step on > > each others' toes. If you feel a need for a name prefix for non-global > > objects, use something based on the name of your test script. > > But we *are* talking about the role to be created to allow stable output > of \df+ , so it's necessary to name it "regress_*". To appease > ENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS, and to avoid clobbering > global objects during "installcheck". > Tom is talking about my informal policy of prefixing all objects. Only global objects need to be prefixed with regress_, but I prefixed everything I created (functions as well as the role). I actually called the role regress_psql_df and used that entire role name as the prefix of my function names, so I think it unlikely that I’ll collide with anything else.
Re: run pgindent on a regular basis / scripted manner
Hi, On 2023-01-22 22:19:10 +0100, Jelte Fennema wrote: > Maybe I'm not understanding your issue correctly, but for such > a case you could push two commits at the same time. Right. > Apart from that "git diff -w" will hide any whitespace changes so I'm not I > personally wouldn't consider it important to split such changes across > commits. I do think it's important. For one, the changes made by pgindent et al aren't just whitespace ones. But I think it's also important to be able to see the actual changes made in a patch precisely - lots of spurious whitespace changes could indicate a problem. Greetings, Andres Freund
Re: Remove source code display from \df+?
On Sun, Jan 22, 2023 at 03:04:14PM -0500, Tom Lane wrote: > Isaac Morland writes: > > On Sun, 22 Jan 2023 at 14:26, Alvaro Herrera > > wrote: > >> This one would fail the sanity check that all roles created by > >> regression tests need to have names that start with "regress_". > > > Thanks for the correction. Now I feel like I've skipped some of the > > readings! > > Updated patch attached. Informally, I am adopting the regress_* policy for > > all object types. > > That's excessive. The policy Alvaro mentions applies to globally-visible > object names (i.e., database, role, and tablespace names), and it's there > to try to ensure that doing "make installcheck" against a live > installation won't clobber any non-test-created objects. There's no point > in having such a policy within a test database --- its most likely effect > there would be to increase the risk that different test scripts step on > each others' toes. If you feel a need for a name prefix for non-global > objects, use something based on the name of your test script. But we *are* talking about the role to be created to allow stable output of \df+ , so it's necessary to name it "regress_*". To appease ENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS, and to avoid clobbering global objects during "installcheck". -- Justin
Re: pg_stats and range statistics
On Sun, Jan 22, 2023 at 07:19:41PM +0100, Tomas Vondra wrote: > On 1/21/23 19:53, Egor Rogov wrote: > > Hi Tomas, > > On 21.01.2023 00:50, Tomas Vondra wrote: > >> This simply adds two functions, accepting/producing anyarray - one for > >> lower bounds, one for upper bounds. I don't think it can be done with a > >> plain subquery (or at least I don't know how). > > > > Anyarray is an alien to SQL, so functions are well justified here. What > > makes me a bit uneasy is two almost identical functions. Should we > > consider other options like a function with an additional parameter or a > > function returning an array of bounds arrays (which is somewhat > > wasteful, but probably it doesn't matter much here)? > > > > I thought about that, but I think the alternatives (e.g. a single > function with a parameter determining which boundary to return). But I > don't think it's better. What about a common function, maybe called like:

ranges_upper_bounds(PG_FUNCTION_ARGS)
{
    AnyArrayType *array = PG_GETARG_ANY_ARRAY_P(0);
    Oid         element_type = AARR_ELEMTYPE(array);
    TypeCacheEntry *typentry;

    /* Get information about range type; note column might be a domain */
    typentry = range_get_typcache(fcinfo, getBaseType(element_type));

    return ranges_bounds_common(typentry, array, false);
}

That saves 40 LOC. Shouldn't this add some SQL tests?

-- Justin
Re: Doc: Rework contrib appendix -- informative titles, tweaked sentences
On Sun, 22 Jan 2023 14:42:46 -0600 "Karl O. Pinc" wrote: > Attached are 2 patches: > > v10-0001-List-trusted-and-obsolete-extensions.patch > > List trusted extenions in 4 columns, with the CSS altered > to put spacing between vertical columns. In theory, a number of other simplelist presentations could benefit from this. For example, in the Data Types Boolean Type section the true truth values are presently listed vertically, like so: true yes on 1 Instead they could still be listed 'type="vert"' (the default), but with 'columns="4"', to produce something like: true  yes  on  1 This stands out just as much, but takes less space on the page. Likewise, perhaps some tables are tables instead of simplelists just because putting simplelists into columns was so ugly. I'll leave such modifications to others, at least for now. Regards, Karl Free Software: "You don't pay back, you pay forward." -- Robert A. Heinlein
Re: Remove source code display from \df+?
On Sun, 22 Jan 2023 at 15:04, Tom Lane wrote: > Isaac Morland writes: > > On Sun, 22 Jan 2023 at 14:26, Alvaro Herrera > > wrote: > >> This one would fail the sanity check that all roles created by > >> regression tests need to have names that start with "regress_". > > > Thanks for the correction. Now I feel like I've skipped some of the > > readings! > > Updated patch attached. Informally, I am adopting the regress_* policy > for > > all object types. > > That's excessive. The policy Alvaro mentions applies to globally-visible > object names (i.e., database, role, and tablespace names), and it's there > to try to ensure that doing "make installcheck" against a live > installation won't clobber any non-test-created objects. There's no point > in having such a policy within a test database --- its most likely effect > there would be to increase the risk that different test scripts step on > each others' toes. If you feel a need for a name prefix for non-global > objects, use something based on the name of your test script. > I already used a test-specific prefix, then added "regress_" in front. Point taken, however, on the difference between global and non-global objects. But now I'm having a problem I don't understand: the CI is still failing, but not in the psql test. Instead, I get this: [20:11:17.624] +++ tap check in src/bin/pg_upgrade +++ [20:11:17.624] [20:09:11] t/001_basic.pl ... ok 106 ms ( 0.00 usr 0.00 sys + 0.06 cusr 0.02 csys = 0.08 CPU) [20:11:17.624] [20:11:17.624] # Failed test 'old and new dumps match after pg_upgrade' [20:11:17.624] # at t/002_pg_upgrade.pl line 362. [20:11:17.624] # got: '1' [20:11:17.624] # expected: '0' [20:11:17.624] # Looks like you failed 1 test of 13. [20:11:17.624] [20:11:17] t/002_pg_upgrade.pl ..
[20:11:17.624] Dubious, test returned 1 (wstat 256, 0x100) [20:11:17.624] Failed 1/13 subtests [20:11:17.624] [20:11:17] [20:11:17.624] [20:11:17.624] Test Summary Report [20:11:17.624] --- [20:11:17.624] t/002_pg_upgrade.pl (Wstat: 256 Tests: 13 Failed: 1) [20:11:17.624] Failed test: 13 [20:11:17.624] Non-zero exit status: 1 [20:11:17.624] Files=2, Tests=21, 126 wallclock secs ( 0.01 usr 0.00 sys + 6.65 cusr 3.95 csys = 10.61 CPU) [20:11:17.624] Result: FAIL [20:11:17.624] make[2]: *** [Makefile:55: check] Error 1 [20:11:17.625] make[1]: *** [Makefile:43: check-pg_upgrade-recurse] Error 2 As far as I can tell this is the only failure and doesn’t have anything to do with my change. Unless the objects I added are messing it up? Unlike when the psql regression test was failing, I don’t see an indication of where I can see the diffs.
Re: run pgindent on a regular basis / scripted manner
Maybe I'm not understanding your issue correctly, but for such a case you could push two commits at the same time. Apart from that, "git diff -w" will hide any whitespace changes, so I personally wouldn't consider it important to split such changes across commits.
Re: Doc: Rework contrib appendix -- informative titles, tweaked sentences
On Sun, 22 Jan 2023 08:09:03 -0600 "Karl O. Pinc" wrote: > On Sat, 21 Jan 2023 08:11:43 -0600 > "Karl O. Pinc" wrote: > > > Attached are 2 v9 patch versions. I don't think I like them. > > I think the v8 versions are better. But I thought it > > wouldn't hurt to show them to you. > > > > On Fri, 20 Jan 2023 14:22:25 -0600 > > "Karl O. Pinc" wrote: > > > > > Attached are 2 alternatives: > > > (They touch separate files so the ordering is meaningless.) > > > > > > > > > v8-0001-List-trusted-and-obsolete-extensions.patch > > > > > > Instead of putting [trusted] and [obsolete] in the titles > > > of the modules, like v7 does, add a list of them into the text. > > > > > > > v9 puts the list in vertical format, 5 columns. > > > > But the column spacing in HTML is ugly, and I don't > > see a parameter to set to change it. I suppose we could > > do more work on the stylesheets, but this seems excessive. > > Come to think of it, this should be fixed by using CSS > with a > > table.simplelist Actually, this CSS, added to doc/src/sgml/stylesheet.css, makes the column spacing look pretty good:

/* Adequate spacing between columns in a simplelist non-inline table */
.simplelist td { padding-left: 2em; padding-right: 2em; }

(No point in specifying table, since td only shows up in tables.) Note that the default simplelist type value is "vert", causing a 1 column vertical display. There are a number of these in the documentation. I kind of like what the above css does to these layouts. An example would be the layout in doc/src/sgml/html/datatype-boolean.html, which is the "Data Types" section "Boolean Type" sub-section. For other places affected see:

grep -l simplelist doc/src/sgml/*.sgml

Attached are 2 patches: v10-0001-List-trusted-and-obsolete-extensions.patch List trusted extensions in 4 columns, with the CSS altered to put spacing between vertical columns. 
I changed this from the 5 columns of v9 because with 5 columns there was a little bit of overflow into the right hand margin of a US-letter PDF. The PDF still has an ugly page break right before the table. To avoid that use the v8 version, which presents the list inline. v10-0002-Page-break-before-sect1-in-contrib-appendix-when-pdf.patch This is exactly like the v8 version. See my comments earlier about v8 v.s. v9. Regards, Karl Free Software: "You don't pay back, you pay forward." -- Robert A. Heinlein diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml index 12c79b798b..b9f3268cad 100644 --- a/doc/src/sgml/contrib.sgml +++ b/doc/src/sgml/contrib.sgml @@ -84,6 +84,32 @@ CREATE EXTENSION extension_name; provide access to outside-the-database functionality. + These are the trusted extensions: + + + + + + + + + + + + + + + + + + + + + + + + + Many extensions allow you to install their objects in a schema of your choice. To do that, add SCHEMA @@ -100,6 +126,15 @@ CREATE EXTENSION extension_name; component for details. + + These modules and extensions are obsolete: + + + + + + + diff --git a/doc/src/sgml/stylesheet.css b/doc/src/sgml/stylesheet.css index 6410a47958..61d8a6537d 100644 --- a/doc/src/sgml/stylesheet.css +++ b/doc/src/sgml/stylesheet.css @@ -163,3 +163,6 @@ acronym { font-style: inherit; } width: 75%; } } + +/* Adequate spacing between columns in a simplelist non-inline table */ +table.simplelist td { padding-left: 2em; padding-right: 2em; } diff --git a/doc/src/sgml/stylesheet-fo.xsl b/doc/src/sgml/stylesheet-fo.xsl index 0c4dff92c4..68a46f9e24 100644 --- a/doc/src/sgml/stylesheet-fo.xsl +++ b/doc/src/sgml/stylesheet-fo.xsl @@ -132,4 +132,12 @@ + + + + + + +
Re: Remove source code display from \df+?
Isaac Morland writes: > On Sun, 22 Jan 2023 at 14:26, Alvaro Herrera > wrote: >> This one would fail the sanity check that all roles created by >> regression tests need to have names that start with "regress_". > Thanks for the correction. Now I feel like I've skipped some of the > readings! > Updated patch attached. Informally, I am adopting the regress_* policy for > all object types. That's excessive. The policy Alvaro mentions applies to globally-visible object names (i.e., database, role, and tablespace names), and it's there to try to ensure that doing "make installcheck" against a live installation won't clobber any non-test-created objects. There's no point in having such a policy within a test database --- its most likely effect there would be to increase the risk that different test scripts step on each others' toes. If you feel a need for a name prefix for non-global objects, use something based on the name of your test script. regards, tom lane
Deadlock between logrep apply worker and tablesync worker
On my machine, the src/test/subscription/t/002_types.pl test usually takes right about 1.5 seconds: $ time make check PROVE_FLAGS=--timer PROVE_TESTS=t/002_types.pl ... [14:22:12] t/002_types.pl .. ok 1550 ms ( 0.00 usr 0.00 sys + 0.70 cusr 0.25 csys = 0.95 CPU) [14:22:13] I noticed however that sometimes (at least one try in ten, for me) it takes about 2.5 seconds: [14:22:16] t/002_types.pl .. ok 2591 ms ( 0.00 usr 0.00 sys + 0.69 cusr 0.28 csys = 0.97 CPU) [14:22:18] and I've even seen 3.5 seconds. I dug into this and eventually identified the cause: it's a deadlock between a subscription's apply worker and a tablesync worker that it's spawned. Sometimes the apply worker calls wait_for_relation_state_change (to wait for the tablesync worker to finish) while it's holding a lock on pg_replication_origin. If that's the case, then when the tablesync worker reaches process_syncing_tables_for_sync it is able to perform UpdateSubscriptionRelState and reach the transaction commit below that; but when it tries to do replorigin_drop_by_name a little further down, it blocks on acquiring ExclusiveLock on pg_replication_origin. So we have an undetected deadlock. We escape that because wait_for_relation_state_change has a 1-second timeout, after which it rechecks GetSubscriptionRelState and is able to see the committed relation state change; so it continues, and eventually releases its transaction and the lock, permitting the tablesync worker to finish. I've not tracked down the exact circumstances in which the apply worker ends up holding a problematic lock, but it seems likely that it corresponds to cases where its main loop has itself called replorigin_drop_by_name, a bit further up, for some other concurrent tablesync operation. (In all the cases I've traced through, the apply worker is herding multiple tablesync workers when this happens.) I experimented with having the apply worker release its locks before waiting for the tablesync worker, as attached. 
This passes check-world and it seems to eliminate the test runtime instability, but I wonder whether it's semantically correct. This whole business of taking table-wide ExclusiveLock on pg_replication_origin looks like a horrid kluge that we should try to get rid of, not least because I don't see any clear documentation of what hazard it's trying to prevent. Another thing that has a bad smell about it is the fact that process_syncing_tables_for_sync uses two transactions in the first place. There's a comment there claiming that it's for crash safety, but I can't help suspecting it's really because this case becomes a hard deadlock without that mid-function commit. It's not great in any case that the apply worker can move on in the belief that the tablesync worker is done when in fact the latter still has catalog state updates to make. And I wonder what we're doing with having both of them calling replorigin_drop_by_name ... shouldn't that responsibility belong to just one of them? So I think this whole area deserves a hard look, and I'm not at all sure that what's attached is a good solution. regards, tom lane diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c index 4647837b82..bb45c2107f 100644 --- a/src/backend/replication/logical/tablesync.c +++ b/src/backend/replication/logical/tablesync.c @@ -539,15 +539,27 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn) LWLockRelease(LogicalRepWorkerLock); /* - * Enter busy loop and wait for synchronization worker to - * reach expected state (or die trying). + * If we have a transaction, we must commit it to release + * any locks we have (it's quite likely we hold lock on + * pg_replication_origin, which the sync worker will need + * to update). Then start a new transaction so we can + * examine catalog state. 
*/ - if (!started_tx) + if (started_tx) + { + CommitTransactionCommand(); + StartTransactionCommand(); + } + else { StartTransactionCommand(); started_tx = true; } + /* + * Enter busy loop and wait for synchronization worker to + * reach expected state (or die trying). + */ wait_for_relation_state_change(rstate->relid, SUBREL_STATE_SYNCDONE); }
Re: Remove source code display from \df+?
On Sun, 22 Jan 2023 at 14:26, Alvaro Herrera wrote: > On 2023-Jan-22, Isaac Morland wrote: > > > I’ve re-written the tests to create a test-specific role and functions so > > there is no longer a dependency on the superuser name. > > This one would fail the sanity check that all roles created by > regression tests need to have names that start with "regress_". > Thanks for the correction. Now I feel like I've skipped some of the readings! Updated patch attached. Informally, I am adopting the regress_* policy for all object types. > I pondered the notion of going with the flow and just leaving out the > > tests but that seemed like giving up too easily. > > I think avoiding even more untested code is good, so +1 for keeping at > it. > 0001-Remove-source-code-display-from-df-v5.patch Description: Binary data
Re: Remove source code display from \df+?
On 2023-Jan-22, Isaac Morland wrote: > I’ve re-written the tests to create a test-specific role and functions so > there is no longer a dependency on the superuser name. This one would fail the sanity check that all roles created by regression tests need to have names that start with "regress_". > I pondered the notion of going with the flow and just leaving out the > tests but that seemed like giving up too easily. I think avoiding even more untested code is good, so +1 for keeping at it. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "At least to kernel hackers, who really are human, despite occasional rumors to the contrary" (LWN.net)
Re: run pgindent on a regular basis / scripted manner
Jelte Fennema writes: > When reading the emails in this discussion from 2 years ago > it seems like the respondents wouldn't mind updating the > typedefs.list manually. And proposed approach number 3 > seemed to have support overall, i.e. fail a push to master > when pgindent was not run on the new commit. Would > it make sense to simply try that approach and see if > there's any big issues with it? I will absolutely not accept putting in something that fails pushes on this basis. There are too many cases where pgindent purity is not an overriding issue. I mentioned a counterexample just upthread: even if you are as anal as you could be about indentation, you might prefer to separate a logic-changing patch from the ensuing mechanical reindentation. regards, tom lane
Re: wake up logical workers after ALTER SUBSCRIPTION
Nathan Bossart writes: > On Tue, Jan 10, 2023 at 10:59:14AM +0530, Amit Kapila wrote: >> I haven't looked in detail but isn't it better to explain somewhere in >> the comments that it achieves to rate limit the restart of workers in >> case of error and allows them to restart immediately in case of >> subscription parameter change? > I expanded one of the existing comments to make this clear. I pushed v17 with some mostly-cosmetic changes, including more comments. > Of course, if the launcher restarts, then the notify_pid will no longer be > accurate. However, I see that workers also register a before_shmem_exit > callback that will send SIGUSR1 to the launcher_pid currently stored in > shared memory. (I wonder if there is a memory ordering bug here.) I think it's all close enough in reality. There are other issues in this code, and I'm about to start a new thread about one I identified while testing this patch, but I think we're in good shape on this particular point. I've marked the CF entry as committed. regards, tom lane
Re: MERGE ... RETURNING
> diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c > new file mode 100644 > index e34f583..aa3cca0 > --- a/src/backend/commands/copy.c > +++ b/src/backend/commands/copy.c > @@ -274,12 +274,6 @@ DoCopy(ParseState *pstate, const CopyStm > { > Assert(stmt->query); > > - /* MERGE is allowed by parser, but unimplemented. Reject for > now */ > - if (IsA(stmt->query, MergeStmt)) > - ereport(ERROR, > - errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > - errmsg("MERGE not supported in COPY")); Does this COPY stuff come from another branch where you're adding support for MERGE in COPY? I see that you add a test that MERGE without RETURNING fails, but you didn't add any tests that it works with RETURNING. Anyway, I suspect these small changes shouldn't be here. Overall, the idea of using Postgres-specific functions for extracting context in the RETURNING clause looks acceptable to me. We can change that to add support to whatever clauses the SQL committee offers, when they get around to offering something. (We do have to keep our fingers crossed that they will decide to use the same RETURNING syntax as we do in this patch, of course.) Regarding mas_action_idx, I would have thought that it belongs in MergeAction rather than MergeActionState. After all, you determine it once at parse time, and it is a constant from there onwards, right? -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Re: Remove source code display from \df+?
On Sun, 22 Jan 2023 at 00:45, Justin Pryzby wrote: > On Sun, Jan 22, 2023 at 12:18:34AM -0500, Isaac Morland wrote: > > > It turns out that my tests wanted the owner to be “vagrant” rather than > > “postgres”. This is apparently because I was running as that user (in a > > Vagrant VM) when running the tests. Then I took that output and just made > > it the expected output. I’ve re-worked my build environment a bit so > that I > > run as “postgres” inside the Vagrant VM. > > > > What I don’t understand is why that didn’t break all the other tests. > > Probably because the other tests avoid showing the owner, exactly > because it varies depending on who runs the tests. Most of the "plus" > commands aren't tested, at least in the sql regression tests. > Thanks for your patience. I didn’t think about it enough but everything you both said makes sense. I’ve re-written the tests to create a test-specific role and functions so there is no longer a dependency on the superuser name. I pondered the notion of going with the flow and just leaving out the tests but that seemed like giving up too easily. We should probably change one of the CI usernames to something other > than "postgres" to catch the case that someone hardcodes "postgres". > > > proper value in the Owner column so let’s see what CI does with it. > > Or better: see above about using it from your github account. Yes, I will try to get this working before I try to make another contribution. 0001-Remove-source-code-display-from-df-v4.patch Description: Binary data
Re: pg_stats and range statistics
On 1/21/23 19:53, Egor Rogov wrote: > Hi Tomas, > > On 21.01.2023 00:50, Tomas Vondra wrote: > >> Hi Egor, > >> > >> While reviewing a patch improving join estimates for ranges [1] I > >> realized we don't show stats for ranges in pg_stats, and I recalled we > >> had this patch. > >> > >> I rebased the v2, and I decided to took a stab at showing separate > >> histograms for lower/upper histogram bounds. I believe it makes it way > >> more readable, which is what pg_stats is about IMHO. > > > > Thanks for looking into this. > > I have to admit it looks much better this way, so +1. > OK, good to hear. > >> This simply adds two functions, accepting/producing anyarray - one for > >> lower bounds, one for upper bounds. I don't think it can be done with a > >> plain subquery (or at least I don't know how). > > > > Anyarray is an alien to SQL, so functions are well justified here. What > > makes me a bit uneasy is two almost identical functions. Should we > > consider other options like a function with an additional parameter or a > > function returning an array of bounds arrays (which is somewhat > > wasteful, but probably it doesn't matter much here)? > > > > I thought about that, but I don't think the alternatives (e.g. a single function with a parameter determining which boundary to return) are better. Moreover, I think this is pretty similar to lower/upper, which already work on range values. So if we have separate functions for that, we should do the same thing here. I renamed the functions to ranges_lower/ranges_upper, but maybe we could even call the functions just lower/upper? The main trouble with the functions I can think of is that we only have an anyarray type, not anyrangearray. So the functions will get called for an arbitrary array, and the check that it's an array of ranges happens inside. I'm not sure if that's a good or bad idea, or what would it take to add a new polymorphic type ... For now I at least kept "ranges_" to make it less likely. 
> >> Finally, it renames the empty_range_frac to start with range_, per the >> earlier discussion. I wonder if the new column names for lower/upper >> bounds (range_lower_bounds_histograms/range_upper_bounds_histograms) are >> too long ... > > > It seems so. The ending -s should be left out since it's a single > histogram now. And I think that > range_lower_histogram/range_upper_histogram are descriptive enough. > > I'm adding one more patch to shorten the column names, refresh the docs, > and make 'make check' happy (unfortunately, we have to edit > src/regress/expected/rules.out every time pg_stats definition changes). > Thanks. I noticed the docs were added to pg_user_mapping by mistake, not to pg_stats. So I fixed that, and I also added the new functions. Finally, I reordered the fields a bit - moved range_empty_frac to keep the histogram fields together. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL CompanyFrom 8049d2b7e5d1636d5fb2b7d421d6b29a39389fb3 Mon Sep 17 00:00:00 2001 From: Tomas Vondra Date: Fri, 20 Jan 2023 20:50:41 +0100 Subject: [PATCH 1/2] Display length and bounds histograms in pg_stats Values corresponding to STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM and STATISTIC_KIND_BOUNDS_HISTOGRAM were not exposed to pg_stats when these slot kinds were introduced in 918eee0c49. This commit adds the missing fields to pg_stats. 
TODO: catalog version bump --- doc/src/sgml/catalogs.sgml| 40 ++ src/backend/catalog/system_views.sql | 30 - src/backend/utils/adt/rangetypes_typanalyze.c | 118 ++ src/include/catalog/pg_proc.dat | 10 ++ src/include/catalog/pg_statistic.h| 3 + src/test/regress/expected/rules.out | 34 - 6 files changed, 233 insertions(+), 2 deletions(-) diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index c1e4048054..e0851c52b4 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -9634,6 +9634,46 @@ SCRAM-SHA-256$iteration count: User mapping specific options, as keyword=value strings + + + + range_length_histogram anyarray + + + A histogram of the lengths of non-empty and non-null range values of a + range type column. (Null for non-range types.) + + + + + + range_empty_frac float4 + + + Fraction of column entries whose values are empty ranges. + (Null for non-range types.) + + + + + + range_lower_histogram anyarray + + + A histogram of lower bounds of non-empty and non-null range values. + (Null for non-range types.) + + + + + + range_upper_histogram anyarray + + + A histogram of upper bounds of non-empty and non-null range values. + (Null for non-range types.) + + diff --git a/src/backend/catalog/system_views.sql
Re: run pgindent on a regular basis / scripted manner
But I do think this discussion about other formatting tools is distracting from the main pain point I wanted to discuss: our current formatting tool is not run consistently enough. The only thing that another tool would change in this regard is that there would be no need to update typedefs.list. That doesn't seem like such a significant difference that it would change the solution to the first problem. When reading the emails in this discussion from 2 years ago, it seems like the respondents wouldn't mind updating typedefs.list manually. And the proposed approach number 3 seemed to have support overall, i.e. fail a push to master when pgindent was not run on the new commit. Would it make sense to simply try that approach and see if there are any big issues with it? > (Another problem here is that there's a sizable subset of committers > who clearly just don't care, and I'm not sure we can convince them to.) My guess would be that the main reason is simply that committers sometimes forget, because there's no automation complaining about it. On Sun, 22 Jan 2023 at 18:20, Jelte Fennema wrote: > > > But so far I haven't seen one that can make that > > column be column +12. > > Thanks for clarifying what the current variable declaration indentation > rule is. Indeed, neither uncrustify nor clang-format seems to support > that. Getting uncrustify to support it might not be too difficult, but > the question remains if we even want that. > > > But switching away from that intermixed with a lot of other changes isn't > > going to be fun. > > I don't think the amount of pain is really much lower if we reformat > 10,000 or 300,000 lines of code; without automation both would be > quite painful. But the git commands I shared in my previous email > should alleviate most of that pain. > > > I don't have a problem with the current pgindent alignment of function > > parameters, so not sure what you mean about that.
> > Function parameter alignment is fine with pgindent imho, but this +12 > column variable declaration thing I personally think is quite weird. > > > Really? I have been using 14, which is quite recent. Did you just > figure this out recently? If this is true, then it's certainly > discouraging. > > It seems this was due to my Ubuntu 22.04 install having clang-format > 14.0.0. After > updating it to 14.0.6 using the official LLVM-provided packages, I > don't have this > issue on clang-format-14 anymore. To be clear, this was an issue in the alignment > of > variable declarations, not function parameters. > > But I agree with Tom Lane that this makes clear that whatever tool we > pick, we'll need > to pick a specific version, just like we do now with perltidy. And > indeed I'm not sure > how easy that is with clang. Installing a specific uncrustify version > is pretty easy, btw; > the compilation from source is quite quick.
Re: [PATCH] Use 128-bit math to accelerate numeric division, when 8 < divisor digits <= 16
On Sun, 22 Jan 2023 at 15:41, Joel Jacobson wrote: > > On Sun, Jan 22, 2023, at 11:06, Dean Rasheed wrote: > > Seems like a reasonable idea, with some pretty decent gains. > > > > Note, however, that for a divisor having fewer than 5 or 6 digits, > > it's now significantly slower because it's forced to go through > > div_var_int64() instead of div_var_int() for all small divisors. So > > the var2ndigits <= 2 case needs to come first. > > Can you give a measurable example of where the patch, > as written, is significantly slower for a divisor having > fewer than 5 or 6 digits, on some platform? > I just modified the previous test you posted:

\timing on
SELECT count(numeric_div_volatile(1e131071,123456)) FROM generate_series(1,1e4);
Time: 2048.060 ms (00:02.048)  -- HEAD
Time: 2422.720 ms (00:02.423)  -- With patch

> I did write the code like you suggest first, but changed it, > since I realised the extra "else if" needed could be eliminated, > and thought div_var_int64() wouldn't be slower than div_var_int() since > I thought 64-bit instructions in general are as fast as 32-bit instructions, > on 64-bit platforms. > Apparently it can make a difference. Probably something to do with having less data to move around. I remember noticing that when I wrote div_var_int(), which is why I split it into two branches in that way. Regards, Dean
Re: run pgindent on a regular basis / scripted manner
> But so far I haven't seen one that can make that > column be column +12. Thanks for clarifying what the current variable declaration indentation rule is. Indeed, neither uncrustify nor clang-format seems to support that. Getting uncrustify to support it might not be too difficult, but the question remains if we even want that. > But switching away from that intermixed with a lot of other changes isn't > going to be fun. I don't think the amount of pain is really much lower if we reformat 10,000 or 300,000 lines of code; without automation both would be quite painful. But the git commands I shared in my previous email should alleviate most of that pain. > I don't have a problem with the current pgindent alignment of function > parameters, so not sure what you mean about that. Function parameter alignment is fine with pgindent imho, but this +12 column variable declaration thing I personally think is quite weird. > Really? I have been using 14, which is quite recent. Did you just > figure this out recently? If this is true, then it's certainly > discouraging. It seems this was due to my Ubuntu 22.04 install having clang-format 14.0.0. After updating it to 14.0.6 using the official LLVM-provided packages, I don't have this issue on clang-format-14 anymore. To be clear, this was an issue in the alignment of variable declarations, not function parameters. But I agree with Tom Lane that this makes clear that whatever tool we pick, we'll need to pick a specific version, just like we do now with perltidy. And indeed I'm not sure how easy that is with clang. Installing a specific uncrustify version is pretty easy, btw; the compilation from source is quite quick.
Re: Non-superuser subscription owners
On Sat, 2023-01-21 at 14:01 -0800, Andres Freund wrote: > There are good reasons to have 'peer' authentication set up for the > user > running postgres, so admin scripts can connect without issues. Which > unfortunately then also means that postgres_fdw etc can connect to > the current > database as superuser, without that check. Which imo clearly is an > issue. Perhaps we should have a way to directly turn on/off authentication methods in libpq through API functions and/or options? This reminds me of the "channel_binding=required" option. We considered some similar alternatives for that feature. > Why is this only about local files, rather than e.g. also using the > local > user? It's not, but we happen to already have pg_read_server_files, and it makes sense to use that at least for files referenced directly in the connection string. You're right that it's incomplete, and also that it doesn't make a lot of sense for files accessed indirectly. -- Jeff Davis PostgreSQL Contributor Team - AWS
Re: run pgindent on a regular basis / scripted manner
Peter Geoghegan writes: > On Sat, Jan 21, 2023 at 3:39 PM Jelte Fennema wrote: >> ... For clang-format you should use >> at least clang-format 15, otherwise it has some bugs in the alignment >> logic. > Really? I have been using 14, which is quite recent. Did you just > figure this out recently? If this is true, then it's certainly > discouraging. Indeed. What that points to is a future where different contributors get different results depending on what clang version they have installed --- and it's not going to be practical to insist that everybody have the same version, because AFAICT clang-format is tied to clang itself. So that sounds a bit unappetizing. One of the few advantages of the current tool situation is that at any time there's just one agreed-on version of pgindent and pgperltidy. I've not heard push-back about our policy that you should use perltidy version 20170521, because that's not especially connected to any other part of one's system. Maybe the same would hold for uncrustify, but it's never going to work for pieces of the clang ecosystem. regards, tom lane
Re: [Commitfest 2023-01] has started
On Sun, 15 Jan 2023 at 23:02, vignesh C wrote: > > On Sun, 8 Jan 2023 at 21:00, vignesh C wrote: > > > > On Tue, 3 Jan 2023 at 13:13, vignesh C wrote: > > > > > > Hi All, > > > > > > Just a reminder that Commitfest 2023-01 has started. > > > There are many patches based on the latest run from [1] which require > > > a) Rebased on top of head b) Fix compilation failures c) Fix test > > > failure, please have a look and rebase it so that it is easy for the > > > reviewers and committers: > > > 1. TAP output format for pg_regress > > > 2. Add BufFileRead variants with short read and EOF detection > > > 3. Add SHELL_EXIT_CODE variable to psql > > > 4. Add foreign-server health checks infrastructure > > > 5. Add last_vacuum_index_scans in pg_stat_all_tables > > > 6. Add index scan progress to pg_stat_progress_vacuum > > > 7. Add the ability to limit the amount of memory that can be allocated > > > to backends. > > > 8. Add tracking of backend memory allocated to pg_stat_activity > > > 9. CAST( ... ON DEFAULT) > > > 10. CF App: add "Returned: Needs more interest" close status > > > 11. CI and test improvements > > > 12. Cygwin cleanup > > > 13. Expand character set for ltree labels > > > 14. Fix tab completion MERGE > > > 15. Force streaming every change in logical decoding > > > 16. More scalable multixacts buffers and locking > > > 17. Move SLRU data into the regular buffer pool > > > 18. Move extraUpdatedCols out of RangeTblEntry > > > 19.New [relation] options engine > > > 20. Optimizing Node Files Support > > > 21. PGDOCS - Stats views and functions not in order? > > > 22. POC: Lock updated tuples in tuple_update() and tuple_delete() > > > 23. Parallelize correlated subqueries that execute within each worker > > > 24. Pluggable toaster > > > 25. Prefetch the next tuple's memory during seqscans > > > 26. Pulling up direct-correlated ANY_SUBLINK > > > 27. Push aggregation down to base relations and joins > > > 28. 
Reduce timing overhead of EXPLAIN ANALYZE using rdtsc > > > 29. Refactor relation extension, faster COPY > > > 30. Remove NEW placeholder entry from stored view query range table > > > 31. TDE key management patches > > > 32. Use AF_UNIX for tests on Windows (ie drop fallback TCP code) > > > 33. Windows filesystem support improvements > > > 34. making relfilenodes 56 bit > > > 35. postgres_fdw: commit remote (sub)transactions in parallel during > > > pre-commit > > > 36.recovery modules > > > > > > Commitfest status as of now: > > > Needs review:177 > > > Waiting on Author: 47 > > > Ready for Committer: 20 > > > Committed: 31 > > > Withdrawn:4 > > > Rejected: 0 > > > Returned with Feedback: 0 > > > Total: 279 > > > > > > We will be needing more members to actively review the patches to get > > > more patches to the committed state. I would like to remind you that > > > each patch submitter is expected to review at least one patch from > > > another submitter during the CommitFest, those members who have not > > > picked up patch for review please pick someone else's patch to review > > > as soon as you can. > > > I'll send out reminders this week to get your patches rebased and > > > update the status of the patch accordingly. > > > > > > [1] - http://cfbot.cputube.org/ > > > > Hi Hackers, > > > > Here's a quick status report after the first week (I think only about > > 9 commits happened during the week, the rest were pre-CF activity): > > > > status | 3rd Jan | w1 > > -+---+- > > Needs review:| 177 | 149 > > Waiting on Author: |47 | 60 > > Ready for Committer: |20 | 23 > > Committed: |31 | 40 > > Withdrawn: | 4 | 7 > > Rejected:| 0 | 0 > > Returned with Feedback: | 0 | 0 > > Total: | 279 | 279 > > > > Here is a list of "Needs review" entries for which there has not been > > much communication on the thread and needs help in proceeding further. 
> > Please pick one of these and help us on how to proceed further: > > pgbench: using prepared BEGIN statement in a pipeline could cause an > > error | Yugo Nagata > > Fix dsa_free() to re-bin segment | Dongming Liu > > pg_rewind: warn when checkpoint hasn't happened after promotion | James > > Coleman > > Work around non-atomic read of read of control file on ext4 | Thomas Munro > > Rethinking the implementation of ts_headline | Tom Lane > > Fix GetWALAvailability function code comments for WALAVAIL_REMOVED > > return value | sirisha chamarti > > Function to log backtrace of postgres processes | vignesh C, Bharath > > Rupireddy > > disallow HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_LOCKED_ONLY | Nathan Bossart > > New hooks in the connection path | Bertrand Drouvot > > Check consistency of GUC
bug: ANALYZE progress report with inheritance tables
pg_stat_progress_analyze was added in v13 (a166d408e). For tables with inheritance children, do_analyze_rel() and acquire_sample_rows() are called twice. The first time through, pgstat_progress_start_command() has memset() the progress array to zero. But the 2nd time, ANALYZE_BLOCKS_DONE is already set from the previous call, and BLOCKS_TOTAL can be set to some lower value (and in any case a value unrelated to the pre-existing value of BLOCKS_DONE). So the progress report briefly shows a bogus combination of values and, with these assertions, fails regression tests in master and v13, unless BLOCKS_DONE is first zeroed. | Core was generated by `postgres: pryzbyj regression [local] VACUUM '. | ... | #5 0x559a1c9fbbcc in ExceptionalCondition (conditionName=conditionName@entry=0x559a1cb68068 "a[PROGRESS_ANALYZE_BLOCKS_DONE] <= a[PROGRESS_ANALYZE_BLOCKS_TOTAL]", | ... | #16 0x563165cc7cfe in exec_simple_query (query_string=query_string@entry=0x563167cad0c8 "VACUUM ANALYZE stxdinh, stxdinh1, stxdinh2;") at ../src/backend/tcop/postgres.c:1237 | ... | (gdb) p MyBEEntry->st_progress_param[1] | $1 = 5 | (gdb) p MyBEEntry->st_progress_param[2] | $2 = 9 BTW, I found this bug as well as the COPY progress bug I reported [0] while testing the CREATE INDEX progress bug reported by Ilya. It seems like the progress infrastructure should have some checks added. 
[0] https://www.postgresql.org/message-id/flat/20230119054703.gb13...@telsasoft.com diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c index c86e690980e..96710b84558 100644 --- a/src/backend/commands/analyze.c +++ b/src/backend/commands/analyze.c @@ -1145,6 +1145,12 @@ acquire_sample_rows(Relation onerel, int elevel, TableScanDesc scan; BlockNumber nblocks; BlockNumber blksdone = 0; + int64 progress_vals[2] = {0}; + int const progress_inds[2] = { + PROGRESS_ANALYZE_BLOCKS_DONE, + PROGRESS_ANALYZE_BLOCKS_TOTAL + }; + #ifdef USE_PREFETCH int prefetch_maximum = 0; /* blocks to prefetch if enabled */ BlockSamplerData prefetch_bs; @@ -1169,8 +1175,8 @@ acquire_sample_rows(Relation onerel, int elevel, #endif /* Report sampling block numbers */ - pgstat_progress_update_param(PROGRESS_ANALYZE_BLOCKS_TOTAL, -nblocks); + progress_vals[1] = nblocks; + pgstat_progress_update_multi_param(2, progress_inds, progress_vals); /* Prepare for sampling rows */ reservoir_init_selection_state(&rstate, targrows); diff --git a/src/backend/utils/activity/backend_progress.c b/src/backend/utils/activity/backend_progress.c index d96af812b19..05593fb13cb 100644 --- a/src/backend/utils/activity/backend_progress.c +++ b/src/backend/utils/activity/backend_progress.c @@ -10,6 +10,7 @@ */ #include "postgres.h" +#include "commands/progress.h" #include "port/atomics.h" /* for memory barriers */ #include "utils/backend_progress.h" #include "utils/backend_status.h" @@ -37,6 +38,83 @@ pgstat_progress_start_command(ProgressCommandType cmdtype, Oid relid) PGSTAT_END_WRITE_ACTIVITY(beentry); } +/* + * Check for consistency of progress data (current < total). + * + * Check during pgstat_progress_update_*() rather than only from + * pgstat_progress_end_command() to catch issues with uninitialized/stale data + * from previous progress commands. + * + * If a command fails due to interrupt or error, the values may be less than + * the expected final value.
+ */ +static void +pgstat_progress_asserts(void) +{ + volatile PgBackendStatus *beentry = MyBEEntry; + volatile int64 *a = beentry->st_progress_param; + + switch (beentry->st_progress_command) + { + case PROGRESS_COMMAND_VACUUM: + Assert(a[PROGRESS_VACUUM_HEAP_BLKS_SCANNED] <= + a[PROGRESS_VACUUM_TOTAL_HEAP_BLKS]); + Assert(a[PROGRESS_VACUUM_HEAP_BLKS_VACUUMED] <= + a[PROGRESS_VACUUM_TOTAL_HEAP_BLKS]); + Assert(a[PROGRESS_VACUUM_NUM_DEAD_TUPLES] <= + a[PROGRESS_VACUUM_MAX_DEAD_TUPLES]); + break; + + case PROGRESS_COMMAND_ANALYZE: + Assert(a[PROGRESS_ANALYZE_BLOCKS_DONE] <= + a[PROGRESS_ANALYZE_BLOCKS_TOTAL]); + Assert(a[PROGRESS_ANALYZE_EXT_STATS_COMPUTED] <= + a[PROGRESS_ANALYZE_EXT_STATS_TOTAL]); + Assert(a[PROGRESS_ANALYZE_CHILD_TABLES_DONE] <= + a[PROGRESS_ANALYZE_CHILD_TABLES_TOTAL]); + break; + + case PROGRESS_COMMAND_CLUSTER: + Assert(a[PROGRESS_CLUSTER_HEAP_BLKS_SCANNED] <= + a[PROGRESS_CLUSTER_TOTAL_HEAP_BLKS]); +
Re: run pgindent on a regular basis / scripted manner
Andrew Dunstan writes: >> ... btw, can we get away with making the diff run be "diff -upd" >> not just "diff -u"? I find diff output for C files noticeably >> more useful with those options, but I'm unsure about their >> portability. > I think they are available on Linux, MacOS and FBSD, and on Windows (if > anyone's actually using it for this) it's likely to be Gnu diff. So I > think that's probably enough coverage. I checked NetBSD as well, and it has all three too. Patch looks good to me. regards, tom lane
Re: [PATCH] Use 128-bit math to accelerate numeric division, when 8 < divisor digits <= 16
On Sun, Jan 22, 2023, at 11:06, Dean Rasheed wrote: > Seems like a reasonable idea, with some pretty decent gains. > > Note, however, that for a divisor having fewer than 5 or 6 digits, > it's now significantly slower because it's forced to go through > div_var_int64() instead of div_var_int() for all small divisors. So > the var2ndigits <= 2 case needs to come first. Can you give a measurable example of where the patch, as written, is significantly slower for a divisor having fewer than 5 or 6 digits, on some platform? I can't detect any difference at all on my MacBook Pro M1 Max for this example: EXPLAIN ANALYZE SELECT count(numeric_div_volatile(1,)) FROM generate_series(1,1e8); I did write the code like you suggest first, but changed it, since I realised the extra "else if" needed could be eliminated, and I thought div_var_int64() wouldn't be slower than div_var_int(), since 64-bit instructions are in general as fast as 32-bit instructions on 64-bit platforms. I'm not suggesting your claim is incorrect, I'm just trying to understand and verify it experimentally. > The implementation of div_var_int64() should be in an #ifdef HAVE_INT128 > block. > > In div_var_int64(), s/ULONG_MAX/PG_UINT64_MAX/ OK, thanks, I'll fix those, but I'll await your feedback on the above first. /Joel
Re: HOT chain validation in verify_heapam()
On Fri, Jan 20, 2023 at 12:38 AM Robert Haas wrote: > > I think that the handling of lp_valid[] in the loop that begins with > "Loop over offset and populate predecessor array from all entries that > are present in successor array" is very confusing. I think that > lp_valid[] should be answering the question "is the line pointer > basically sane?". That is, if it's a redirect, it needs to point to > something within the line pointer array (and we also check that it > must be an entry in the line pointer array that is used, which seems > fine). If it's not a redirect, it needs to point to space that's > entirely within the block, properly aligned, and big enough to contain > a tuple. We determine the answers to all of these questions in the > first loop, the one that starts with /* Perform tuple checks */. > > Nothing that happens in the second loop, where we populate the > predecessor array, can reverse our previous conclusion that the line > pointer is valid, so this loop shouldn't be resetting entries in > lp_valid[] to false. The reason that it's doing so seems to be that it > wants to use lp_valid[] to control the behavior of the third loop, > where we perform checks against things that have entries in the > predecessor array. As written, the code ensures that we always set > lp_valid[nextoffnum] to false unless we set predecessor[nextoffnum] to > a value other than InvalidOffsetNumber. But that is needlessly > complex: the third loop doesn't need to look at lp_valid[] at all. It > can just check whether predecessor[currentoffnum] is valid. If it is, > perform checks. Otherwise, skip it. It seems to me that this would be > significantly simpler. > I was trying to use lp_valid as I need to identify the root of the HOT chain and we are doing validation on the root of the HOT chain when we loop over the predecessor array. 
if (nextoffnum == InvalidOffsetNumber || !lp_valid[ctx.offnum] || !lp_valid[nextoffnum]) { /* * Set lp_valid of nextoffnum to false if the current tuple's * lp_valid is true. We don't add this to the predecessor array as * it's of no use to validate a tuple if its predecessor is * already corrupted, but we need to identify all those tuples * so that we can differentiate between all the cases of * missing offsets in the predecessor array; this will help in * validating the root of a chain when we loop over the predecessor * array. */ if (!lp_valid[ctx.offnum] && lp_valid[nextoffnum]) lp_valid[nextoffnum] = false; I was resetting lp_valid in the last patch because we don't add data to predecessor[], and while looping over the predecessor array we need to isolate (and identify) all cases of missing data in the predecessor array to exactly identify the root of the HOT chain. One solution is to always add data to the predecessor array while looping over the successor array, and then while looping over the predecessor array we can continue with the other validation if "(lp_valid[predecessor[currentoffnum]] && lp_valid[currentoffnum])" is true, but in that case our third loop will also look at lp_valid[]. > To put the above complaint another way, a variable shouldn't mean two > different things depending on where you are in the function. Right > now, at the end of the first loop, lp_valid[x] answers the question > "is line pointer x basically valid?". But by the end of the second > loop, it answers the question "is line pointer x valid and does it > also have a valid predecessor?". That kind of definitional change is > something to be avoided. I agree. > The test if (pred_in_progress || TransactionIdDidCommit(curr_xmin)) > seems wrong to me. Shouldn't it be &&? Has this code been tested at > all? It doesn't seem to have a test case. Some of these other errors > don't, either. Maybe there's some that we can't easily test in an > automated way, but we should test what we can.
> I guess maybe casual > testing wouldn't reveal the problem here because of the recheck, but > it's worrying to find logic that doesn't look right with no > corresponding comments or test cases. > > This is totally my mistake, apologies for that. I will fix this in my next patch. Regarding the missing test cases, I need an in-progress transaction for these test cases to be included in 004_verify_heapam.pl, but I can't find a clear way to have an in-progress transaction (as per the design of 004_verify_heapam.pl) that I can use in the test cases. I will be doing more research on a solution to add these missing test cases. > Some error message kibitzing: > > psprintf("redirected tuple at line pointer offset %u is not
Re: [PATCH] Use 128-bit math to accelerate numeric division, when 8 < divisor digits <= 16
On Sun, 22 Jan 2023 at 13:42, Joel Jacobson wrote: > > Hi, > > On platforms where we support 128-bit integers, we could accelerate division > when the number of digits in the divisor is larger than 8 and less than or > equal to 16 digits, i.e. when the divisor fits in a 64-bit integer but > would > not fit in a 32-bit integer. > Seems like a reasonable idea, with some pretty decent gains. Note, however, that for a divisor having fewer than 5 or 6 digits, it's now significantly slower because it's forced to go through div_var_int64() instead of div_var_int() for all small divisors. So the var2ndigits <= 2 case needs to come first. The implementation of div_var_int64() should be in an #ifdef HAVE_INT128 block. In div_var_int64(), s/ULONG_MAX/PG_UINT64_MAX/ Regards, Dean
Re: run pgindent on a regular basis / scripted manner
On 2023-01-21 Sa 11:10, Tom Lane wrote: > Andrew Dunstan writes: I think we could do better with some automation tooling for committers here. One low-risk and simple change would be to provide a non-destructive mode for pgindent that would show you the changes if any it would make. That could be worked into a git pre-commit hook that committers could deploy. I can testify to the usefulness of such hooks - I have one that while not perfect has saved me on at least two occasions from forgetting to bump the catalog version. > That sounds like a good idea from here. I do not think we want a > mandatory commit filter, but if individual committers care to work > this into their process in some optional way, great! I can think > of ways I'd use it during patch development, too. Yes, it's intended for use at committers' discretion. We have no way of forcing use of a git hook on committers, although we could reject pushes that offend against certain rules. For the reasons you give below that's not a good idea. A pre-commit hook can be avoided by using `git commit -n` and there's are similar option/hook for `git merge`. > > (One reason not to want a mandatory filter is that you might wish > to apply pgindent as a separate commit, so that you can then > put that commit into .git-blame-ignore-revs. This could be handy > for example when a patch needs to change the nesting level of a lot > of pre-existing code, without making changes in it otherwise.) Agreed. > Looks reasonable, but you should also update > src/tools/pgindent/pgindent.man, which AFAICT is our only > documentation for pgindent switches. (Is it time for a > --help option in pgindent?) > > Yes, see revised patch. > ... btw, can we get away with making the diff run be "diff -upd" > not just "diff -u"? I find diff output for C files noticeably > more useful with those options, but I'm unsure about their > portability. 
I think they are available on Linux, MacOS and FBSD, and on Windows (if anyone's actually using it for this) it's likely to be Gnu diff. So I think that's probably enough coverage. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com diff --git a/src/tools/pgindent/pgindent b/src/tools/pgindent/pgindent index 741b0ccb58..bad2a17582 100755 --- a/src/tools/pgindent/pgindent +++ b/src/tools/pgindent/pgindent @@ -21,16 +21,27 @@ my $indent_opts = my $devnull = File::Spec->devnull; -my ($typedefs_file, $typedef_str, $code_base, $excludes, $indent, $build); +my ($typedefs_file, $typedef_str, $code_base, $excludes, $indent, $build, + $show_diff, $silent_diff, $help); + +$help = 0; my %options = ( + "help" => \$help, "typedefs=s" => \$typedefs_file, "list-of-typedefs=s" => \$typedef_str, "code-base=s"=> \$code_base, "excludes=s" => \$excludes, "indent=s" => \$indent, - "build" => \$build,); -GetOptions(%options) || die "bad command line argument\n"; + "build" => \$build, +"show-diff" => \$show_diff, +"silent_diff"=> \$silent_diff,); +GetOptions(%options) || usage ("bad command line argument"); + +usage() if $help; + +usage ("Cannot have both --silent-diff and --show-diff") + if $silent_diff && $show_diff; run_build($code_base) if ($build); @@ -230,7 +241,6 @@ sub pre_indent sub post_indent { my $source = shift; - my $source_filename = shift; # Restore CATALOG lines $source =~ s!^/\*(CATALOG\(.*)\*/$!$1!gm; @@ -280,33 +290,21 @@ sub run_indent close($src_out); return $source; - } - -# for development diagnostics -sub diff +sub show_diff { - my $pre = shift; - my $post = shift; - my $flags = shift || ""; + my $indented = shift; + my $source_filename = shift; - print STDERR "running diff\n"; + my $post_fh = new File::Temp(TEMPLATE => "pgdiffX"); - my $pre_fh = new File::Temp(TEMPLATE => "pgdiffbX"); - my $post_fh = new File::Temp(TEMPLATE => "pgdiffaX"); + print $post_fh $indented; - print $pre_fh $pre; - print $post_fh $post; - - $pre_fh->close(); 
$post_fh->close(); - system( "diff $flags " - . $pre_fh->filename . " " - . $post_fh->filename - . " >&2"); - return; + my $diff = `diff -upd $source_filename $post_fh->filename 2>&1`; + return $diff; } @@ -377,6 +375,33 @@ sub build_clean return; } +sub usage +{ + my $message = shift; + my $helptext = <<'EOF'; +pgindent [OPTION]... [FILE]... +Options: + --help show this message and quit + --typedefs=FILE file containing a list of typedefs + --list-of-typedefs=STR string containing typedefs, space separated + --code-base=DIR path to the base of PostgreSQL source code + --excludes=PATH file containing list of filename patterns to ignore + --indent=PATH path to pg_bsd_indent program + --build build the pg_bsd_indent program +--show-diff
Re: Doc: Rework contrib appendix -- informative titles, tweaked sentences
On Sat, 21 Jan 2023 08:11:43 -0600 "Karl O. Pinc" wrote: > Attached are 2 v9 patch versions. I don't think I like them. > I think the v8 versions are better. But I thought it > wouldn't hurt to show them to you. > > On Fri, 20 Jan 2023 14:22:25 -0600 > "Karl O. Pinc" wrote: > > > Attached are 2 alternatives: > > (They touch separate files so the ordering is meaningless.) > > > > > > v8-0001-List-trusted-and-obsolete-extensions.patch > > > > Instead of putting [trusted] and [obsolete] in the titles > > of the modules, like v7 does, add a list of them into the text. > > v9 puts the list in vertical format, 5 columns. > > But the column spacing in HTML is ugly, and I don't > see a parameter to set to change it. I suppose we could > do more work on the stylesheets, but this seems excessive. Come to think of it, this should be fixed by using CSS with a table.simplelist selector, or something along those lines. But I don't have a serious interest in proceeding further. An inline list seems good enough, even if it does not stand out in a visual scan of the page. There is a certain amount of visual standout due to all the hyperlinks being next to each other in the inline presentation. Regards, Karl Free Software: "You don't pay back, you pay forward." -- Robert A. Heinlein
RE: Time delayed LR (WAS Re: logical replication restrictions)
On Saturday, January 21, 2023 3:36 AM I wrote:
> Kindly have a look at the patch v18.

I've done some refactoring on top of v18. The latest patch should now be
tidier, with clearer and more consistent comments throughout.

Attached is the updated patch v19.

Best Regards,
	Takamichi Osumi

v19-0001-Time-delayed-logical-replication-subscriber.patch
Description: v19-0001-Time-delayed-logical-replication-subscriber.patch
Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE
Hello. I have registered it as a patch in the commitfest: https://commitfest.postgresql.org/42/4138/ Best regards, Michail.
Re: MERGE ... RETURNING
On Mon, 9 Jan 2023 at 17:44, Dean Rasheed wrote:
>
> On Mon, 9 Jan 2023 at 16:23, Vik Fearing wrote:
> >
> > Bikeshedding here. Instead of Yet Another WITH Clause, could we perhaps
> > make a MERGING() function analogous to the GROUPING() function that goes
> > with grouping sets?
> >
> > MERGE ...
> >   RETURNING *, MERGING('clause'), MERGING('action');
> >
>
> Hmm, possibly, but I think that would complicate the implementation quite a bit.
>
> GROUPING() is not really a function (in the sense that there is no
> pg_proc entry for it, you can't do "\df grouping", and it isn't
> executed with its arguments like a normal function). Rather, it
> requires special-case handling in the parser, through to the executor,
> and I think MERGING() would be similar.
>
> Also, it masks any user function with the same name, and would
> probably require MERGING to be some level of reserved keyword.
>

I thought about this some more, and I think functions do make more sense
here, rather than inventing a special WITH syntax. However, rather than
using a special MERGING() function like GROUPING(), which isn't really a
function at all, I think it's better (and much simpler to implement) to
have a pair of normal functions (one returning int, and one text).
The example from the tests shows the sort of thing this allows:

MERGE INTO sq_target t
USING sq_source s
ON tid = sid
WHEN MATCHED AND tid >= 2 THEN
    UPDATE SET balance = t.balance + delta
WHEN NOT MATCHED THEN
    INSERT (balance, tid) VALUES (balance + delta, sid)
WHEN MATCHED AND tid < 2 THEN
    DELETE
RETURNING pg_merge_when_clause() AS when_clause,
          pg_merge_action() AS merge_action,
          t.*,
          CASE pg_merge_action()
            WHEN 'INSERT' THEN 'Inserted '||t
            WHEN 'UPDATE' THEN 'Added '||delta||' to balance'
            WHEN 'DELETE' THEN 'Removed '||t
          END AS description;

 when_clause | merge_action | tid | balance |     description
-------------+--------------+-----+---------+---------------------
           3 | DELETE       |   1 |     100 | Removed (1,100)
           1 | UPDATE       |   2 |     220 | Added 20 to balance
           2 | INSERT       |   4 |      40 | Inserted (4,40)
(3 rows)

I think this is easier to use than the WITH syntax, and more flexible,
since the new functions can be used anywhere in the RETURNING list,
including in expressions.

There is one limitation though. Due to the way these functions need
access to the originating query, they need to appear directly in MERGE's
RETURNING list, not in subqueries, plpgsql function bodies, or anything
else that amounts to a different query. Maybe there's a way round that,
but it looks tricky. In practice though, it's easy to work around, if
necessary (e.g., by wrapping the MERGE in a CTE).

Regards,
Dean

diff --git a/doc/src/sgml/ref/merge.sgml b/doc/src/sgml/ref/merge.sgml
new file mode 100644
index 0995fe0..bedb2c8
--- a/doc/src/sgml/ref/merge.sgml
+++ b/doc/src/sgml/ref/merge.sgml
@@ -25,6 +25,7 @@ PostgreSQL documentation
 MERGE INTO [ ONLY ] target_table_name [ * ] [ [ AS ] target_alias ]
 USING data_source ON join_condition
 when_clause [...]
+[ RETURNING * | output_expression [ [ AS ] output_name ] [, ...]
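The CTE workaround mentioned above might look something like this. This is only a sketch: it assumes the proposed patch is applied, reuses the tables from the example, and simply demonstrates referencing pg_merge_action() output from an outer query rather than from inside the MERGE itself:

```sql
-- Hypothetical usage sketch: the new functions may only appear directly
-- in MERGE's RETURNING list, so wrap the MERGE in a CTE and let the
-- outer query work with the returned action instead.
WITH m AS (
    MERGE INTO sq_target t
    USING sq_source s
    ON tid = sid
    WHEN MATCHED THEN
        UPDATE SET balance = t.balance + delta
    WHEN NOT MATCHED THEN
        INSERT (balance, tid) VALUES (balance + delta, sid)
    RETURNING pg_merge_action() AS merge_action, t.*
)
SELECT merge_action, count(*)
FROM m
GROUP BY merge_action;
```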
 ]

where data_source is:

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
new file mode 100644
index e34f583..aa3cca0
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -274,12 +274,6 @@ DoCopy(ParseState *pstate, const CopyStm
 {
 	Assert(stmt->query);

-	/* MERGE is allowed by parser, but unimplemented. Reject for now */
-	if (IsA(stmt->query, MergeStmt))
-		ereport(ERROR,
-				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				errmsg("MERGE not supported in COPY"));
-
 	query = makeNode(RawStmt);
 	query->stmt = stmt->query;
 	query->stmt_location = stmt_location;
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
new file mode 100644
index 8043b4e..e02d7d0
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -510,7 +510,8 @@ BeginCopyTo(ParseState *pstate,
 {
 	Assert(query->commandType == CMD_INSERT ||
 		   query->commandType == CMD_UPDATE ||
-		   query->commandType == CMD_DELETE);
+		   query->commandType == CMD_DELETE ||
+		   query->commandType == CMD_MERGE);

 	ereport(ERROR,
 			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
new file mode 100644
index 812ead9..8572b01
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -48,6 +48,7 @@
 #include "utils/array.h"
 #include "utils/builtins.h"
 #include "utils/datum.h"
+#include "utils/fmgroids.h"
 #include "utils/lsyscache.h"
 #include "utils/typcache.h"
@@ -2485,6 +2486,22 @@ ExecInitFunc(ExprEvalStep *scratch, Expr
 	InitFunctionCallInfoData(*fcinfo, flinfo, nargs,
							 inputcollid, NULL, NULL);

+	/*
+	 * Merge support functions should only be called
Re: run pgindent on a regular basis / scripted manner
Hi,

On 2023-01-21 15:32:45 -0800, Peter Geoghegan wrote:
> Attached is my .clang-format, since you asked for it. It was
> originally based on stuff that both you and Peter E posted several
> years back, I believe. Plus the timescaledb one in one or two places.
> I worked a couple of things out through trial and error. It's
> relatively hard to follow the documentation, and there have been
> features added to newer LLVM versions.

Reformatting with clang-format ends up with something like:

Peter's: 2234 files changed, 334753 insertions(+), 289772 deletions(-)
Jelte's: 2236 files changed, 357500 insertions(+), 306815 deletions(-)
Mine (modified to reduce this): 2226 files changed, 261538 insertions(+), 256039 deletions(-)

Which is all at least an order of magnitude too much.

Jelte's uncrustify: 1773 files changed, 121722 insertions(+), 125369 deletions(-)

Better, but still not great. I also had to prevent a few files it choked
on from getting reindented.

I think the main issue with either tool is that our variable definition
indentation just can't be emulated by the tools as-is. Some tools can
indent variable definitions so that the variable name starts on the same
column. Some can limit that for too-long type names. But so far I
haven't seen one that can make that column be column 12. They all look
at the other surrounding types.

I hate that variable name indentation with a fiery passion. But
switching away from it intermixed with a lot of other changes isn't
going to be fun.

Greetings,

Andres Freund
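For readers unfamiliar with the style under discussion, here is a small illustrative C fragment (not code from the thread; the function and variable names are invented) showing the pgindent declaration layout being described, where each variable name is pushed to a fixed column rather than to a column derived from the longest type name nearby:

```c
#include <stdio.h>

/*
 * Illustration only: in pgindent's style, the variable names below all
 * start at the same fixed column regardless of how long each type name
 * is -- the alignment rule the formatters above struggle to emulate.
 */
static int
add_delta(int balance)
{
	int			delta;		/* short type: name pushed right to the column */
	const char *label;		/* longer type: name lands at the same column */

	delta = 20;
	label = "balance";
	printf("%s = %d\n", label, balance + delta);
	return balance + delta;
}
```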