Re: [EXTERNAL] Support load balancing in libpq
Hi, "unlikely" macro is used in libpq_prng_init() in the patch. I wonder if the place is really 'hot' to use "unlikely" macro. Best reagards, -- Tatsuo Ishii SRA OSS LLC English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp
Re: [EXTERNAL] Support load balancing in libpq
> On 28 Mar 2023, at 09:16, Tatsuo Ishii wrote: > "unlikely" macro is used in libpq_prng_init() in the patch. I wonder > if the place is really 'hot' to use "unlikely" macro. I don't think it is, I was thinking to rewrite as the below sketch: { if (pg_prng_strong_seed(&conn->prng_state)) return; /* fallback seeding */ } -- Daniel Gustafsson
Re: Moving forward with TDE
On Tue, Mar 28, 2023 at 5:02 AM Stephen Frost wrote: > > > There's clearly user demand for it as there's a number of organizations >> > who have forks which are providing it in one shape or another. This >> > kind of splintering of the community is actually an actively bad thing >> > for the project and is part of what killed Unix, by at least some pretty >> > reputable accounts, in my view. >> >> Yes, the number of commercial implementations of this is a concern. Of >> course, it is also possible that those commercial implementations are >> meeting checkbox requirements rather than technical ones, and the >> community has been hostile to check box-only features. > > > I’ve grown weary of this argument as the other major piece of work it was > routinely applied to was RLS and yet that has certainly been seen broadly > as a beneficial feature with users clearly leveraging it and in more than > some “checkbox” way. > > Indeed, it’s similar also in that commercial implementations were done of > RLS while there were arguments made about it being a checkbox feature which > were used to discourage it from being implemented in core. Were it truly > checkbox, I don’t feel we would have the regular and ongoing discussion > about it on the lists that we do, nor see other tools built on top of PG > which specifically leverage it. Perhaps there are truly checkbox features > out there which we will never implement, but I’m (perhaps due to what my > dad would call selective listening on my part, perhaps not) having trouble > coming up with any presently. Features that exist in other systems that we > don’t want? Certainly. We don’t characterize those as simply “checkbox” > though. Perhaps that’s in part because we provide alternatives- but that’s > not the case here. We have no comparable way to have this capability as > part of the core system. > > We, as a community, are clearly losing value by lack of this capability, > if by no other measure than simply the numerous users of the commercial > implementations feeling that they simply can’t use PG without this feature, > for whatever their reasoning. > I also think this is something of a problem because very few requirements are actually purely technical requirements, and I think the issue is that in many cases there are ways around the lack of the feature. So I would phrase this differently. What is the value of doing this in core? This dramatically simplifies the question of setting up a PostgreSQL environment that is properly protected with encryption at rest. That in itself is valuable. Today you can accomplish something similar with encrypted filesystems and encryption options in things like pgbackrest. However these are many different pieces of a solution and missing up the setup of any one of them can compromise the data. Having a single point of encryption and decryption means fewer opportunities to mess it up and that means less risk. This in turn makes it easier to settle on using PostgreSQL. There are certainly going to be those who approach encryption at rest as a checkbox item and who don't really care if there are holes in it. But there are others who really should be concerned (and this is becoming a bigger issue where data privacy, PCI-DSS, and other requirements may come into play), and those need better tooling than we have. I also think that as data privacy becomes a larger issue, this will become a larger topic. Anyway, my contribution to that question. 
Best Wishes, Chris Travers > > Thanks, > > Stephen > -- Best Wishes, Chris Travers Efficito: Hosted Accounting and ERP. Robust and Flexible. No vendor lock-in. http://www.efficito.com/learn_more
RE: PGdoc: add missing ID attribute to create_subscription.sgml
Dear Amit, Thank you for reviewing! PSA new version. > Isn't it better to move the link-related part to the next line > wherever possible? Currently, it looks bit odd. Previously I preferred not to add a new line inside the tag, but it caused long lines. So I adjusted them so the lines are neither too short nor too long. > Why 0002 patch is part of this thread? I thought here we want to add > 'ids' to entries corresponding to Create Subscription as we have added > the one in commit ecb696. > 0002 was motivated by Peter's comment [1]. This exceeds the initial intention of the patch, so I have removed it for now. [1]: https://www.postgresql.org/message-id/CAHut%2BPu%2B-OocYYhW9E0gxxqgfUb1yJ8jVQ4AZ0v-ud00s7TxEA%40mail.gmail.com Best Regards, Hayato Kuroda FUJITSU LIMITED v6-0001-Add-XML-ID-attributes-to-create_subscription.sgml.patch Description: v6-0001-Add-XML-ID-attributes-to-create_subscription.sgml.patch
Re: [EXTERNAL] Support load balancing in libpq
I think it's fine to remove it. It originated from postmaster.c, where I copied the original implementation of libpq_prng_init from. On Tue, 28 Mar 2023 at 09:22, Daniel Gustafsson wrote: > > > On 28 Mar 2023, at 09:16, Tatsuo Ishii wrote: > > > "unlikely" macro is used in libpq_prng_init() in the patch. I wonder > > if the place is really 'hot' to use "unlikely" macro. > > I don't think it is, I was thinking to rewrite as the below sketch: > > { > if (pg_prng_strong_seed(&conn->prng_state)) > return; > > /* fallback seeding */ > } > > -- > Daniel Gustafsson >
Re: [EXTERNAL] Support load balancing in libpq
>> "unlikely" macro is used in libpq_prng_init() in the patch. I wonder >> if the place is really 'hot' to use "unlikely" macro. > > I don't think it is, I was thinking to rewrite as the below sketch: > > { > if (pg_prng_strong_seed(&conn->prng_state))) > return; > > /* fallback seeding */ > } +1. Best reagards, -- Tatsuo Ishii SRA OSS LLC English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp
Re: [EXTERNAL] Support load balancing in libpq
> I think it's fine to remove it. It originated from postmaster.c, where > I copied the original implementation of libpq_prng_init from. I agree to remove the unlikely macro here. Best regards, -- Tatsuo Ishii SRA OSS LLC English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp
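For reference, the rewrite being agreed on here could look roughly like the sketch below. This is only an illustration: pg_prng_strong_seed() and pg_prng_seed() are the existing pg_prng.h APIs, conn->prng_state is the field added by the load-balancing patch, and the fallback seed material shown (PID xor'd with the current time) is an assumption rather than what the patch necessarily uses.

static void
libpq_prng_init(PGconn *conn)
{
    uint64      rseed;

    /* Prefer strong seeding from the OS when available. */
    if (pg_prng_strong_seed(&conn->prng_state))
        return;

    /* Fallback seeding; the exact entropy mix here is illustrative only. */
    rseed = ((uint64) getpid()) ^ ((uint64) time(NULL));
    pg_prng_seed(&conn->prng_state, rseed);
}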
Re: Moving forward with TDE
On Tue, Mar 28, 2023 at 8:35 AM Bruce Momjian wrote: > On Tue, Mar 28, 2023 at 02:03:50AM +0200, Stephen Frost wrote: > > The remote storage is certainly an independent system. Multi-mount LUNs > are > > entirely possible in a SAN (and absolutely with NFS, or just the NFS > server > > itself is compromised..), so while the attacker may not have any access > to the > > database server itself, they may have access to these other systems, and > that’s > > not even considering in-transit attacks which are also absolutely > possible, > > especially with iSCSI or NFS. > > > > I don’t understand what is being claimed that the remote storage is “not > an > > independent system” based on my understanding of, eg, NFS. With NFS, a > > directory on the NFS server is exported and the client mounts that > directory as > > NFS locally, all over a network which may or may not be secured against > > manipulation. A user on the NFS server with root access is absolutely > able to > > access and modify files on the NFS server trivially, even if they have no > > access to the PG server. Would you explain what you mean? > > The point is that someone could change values in the storage, pg_xact, > encryption settings, binaries, that would allow the attacker to learn > the encryption key. This is not possible for two secure endpoints and > someone changing data in transit. Yeah, it took me a while to > understand these boundaries too. > > > So the idea is that the backup user can be compromised without the > data > > being vulnerable --- makes sense, though that use-case seems narrow. > > > > That’s perhaps a fair consideration- but it’s clearly of enough value > that many > > of our users are asking for it and not using PG because we don’t have it > today. > > Ultimately though, this clearly makes it more than a “checkbox” feature. > I hope > > we are able to agree on that now. > > It is more than a check box feature, yes, but I am guessing few people > are wanting the this for the actual features beyond check box. > > > Yes, there is value beyond the check-box, but in most cases those > > values are limited considering the complexity of the features, and > the > > check-box is what most people are asking for, I think. > > > > For the users who ask on the lists for this feature, regularly, how many > don’t > > ask because they google or find prior responses on the list to the > question of > > if we have this capability? How do we know that their cases are > “checkbox”? > > Because I have rarely heard people articulate the value beyond check > box. > I think there is value. I am going to try to articulate a case for this here. The first is that if people just want a "checkbox" then they can implement PostgreSQL in ways that have encryption at rest today. This includes using LUKS and the encryption options in pgbackrest. That's good enough for a checkbox. It isn't good enough for a real, secured instance however. There are a few problems with trying to do this for a secured instance. The first is that you have multiple links in the encryption chain, and the failure of any one of them ill lead to cleartext exposure of data files. This is not a problem for those who just want to tick a checkbox. Also the fact that backups and main systems are separately encrypted there (if the backups are encrypted at all) means that people have to choose between complicating a restore process and simply ditching encryption on the backup, which makes the checkbox somewhat pointless. 
Where I have usually seen this come up is in the question of "how do you prevent the problem of someone pulling storage devices from your servers and taking them away to compromise your data?" Physical security comes into it but often times people want more than that as an answer. I saw questions like that from external auditors when I was at Adjust. If you want to actually address that problem, then the current tooling is quite cumbersome. Yes you can do it, but it is very hard to make sure it has been fully secured and also very hard to monitor. TDE would make the setup and verification of this much easier. And in particular it solves a number of other issues that I can see arising from LUKS and similar approaches since it doesn't rely on the kernel to be able to translate plain text to and from cypher text. I have actually worked with folks who have PII and need to protect it and who currently use LUKS and pg_backrest to do so. I would be extremely happy to see TDE replace those for their needs. I can imagine that those who hold high value data would use it as well instead of these other more error prone and less secure setups. > > > Consider that there are standards groups which explicitly consider these > attack > > vectors and consider them important enough to require mitigations to > address > > those vectors. Do the end users of PG understand the attack vectors or > why they > > matter? Perhaps not, but just because
Re: Should vacuum process config file reload more often
At Mon, 27 Mar 2023 14:12:03 -0400, Melanie Plageman wrote in > So, I've attached an alternate version of the patchset which takes the > approach of having one commit which only enables cost-based delay GUC > refresh for VACUUM and another commit which enables it for autovacuum > and makes the changes to balancing variables. > > I still think the commit which has workers updating their own > wi_cost_delay in vacuum_delay_point() is a bit weird. It relies on no one > else emulating our bad behavior and reading from wi_cost_delay without a > lock and on no one else deciding to ever write to wi_cost_delay (even > though it is in shared memory [this is the same as master]). It is only > safe because our process is the only one (right now) writing to > wi_cost_delay, so when we read from it without a lock, we know it isn't > being written to. And everyone else takes a lock when reading from > wi_cost_delay right now. So, it seems...not great. > > This approach also introduces a function that is only around for > one commit until the next commit obsoletes it, which seems a bit silly. (I'm not sure what this refers to, though.) I don't think it's silly if a later patch removes something that the preceding patches introduced, as long as that contributes to readability. Ultimately, they will be merged together on committing. > Basically, I think it is probably better to just have one commit > enabling guc refresh for VACUUM and then another which correctly > implements what is needed for autovacuum to do the same. > Attached v9 does this. > > I've provided both complete versions of both approaches (v9 and v8). I took a look at v9 and have a few comments. 0001: I don't believe it is necessary, as mentioned in the commit message. It appears that we are resetting it at the appropriate times. 0002: I felt a bit uneasy about this. It seems somewhat complex (and makes the succeeding patches complex), has confusing names, and doesn't seem self-contained. I think it'd be simpler to add a global boolean (maybe VacuumCostActiveForceDisable or such) that forces VacuumCostActive to be false and set VacuumCostActive using a setter function that follows the boolean. 0003: +* Reload the configuration file if requested. This allows changes to +* vacuum_cost_limit and vacuum_cost_delay to take effect while a table is +* being vacuumed or analyzed. Analyze should not reload configuration +* file if it is in an outer transaction, as GUC values shouldn't be +* allowed to refer to some uncommitted state (e.g. database objects +* created in this transaction). I'm not sure GUC reload is or should be related to transactions. For instance, work_mem can be changed by a reload during a transaction unless it has been set in the current transaction. I don't think we need to deliberately suppress changes in variables caused by reloads during transactions only for analyze. If analyze doesn't like changes to certain GUC variables, their values should be snapshotted before starting the process. 0004: - double at_vacuum_cost_delay; - int at_vacuum_cost_limit; + double at_table_option_vac_cost_delay; + int at_table_option_vac_cost_limit; We call those options "relopt(ion)". I don't think there's any reason to use different names. dlist_head av_runningWorkers; WorkerInfo av_startingWorker; AutoVacuumWorkItem av_workItems[NUM_WORKITEMS]; + pg_atomic_uint32 av_nworkers_for_balance; The name of the new member doesn't seem to follow the surrounding convention. (However, I don't think the member is needed. See below.)
- /* -* Remember the prevailing values of the vacuum cost GUCs. We have to -* restore these at the bottom of the loop, else we'll compute wrong -* values in the next iteration of autovac_balance_cost(). -*/ - stdVacuumCostDelay = VacuumCostDelay; - stdVacuumCostLimit = VacuumCostLimit; + av_table_option_cost_limit = tab->at_table_option_vac_cost_limit; + av_table_option_cost_delay = tab->at_table_option_vac_cost_delay; I think this requires a comment. + /* There is at least 1 autovac worker (this worker). */ + int nworkers_for_balance = Max(pg_atomic_read_u32( + &AutoVacuumShmem->av_nworkers_for_balance), 1); I think it *must* be greater than 0. However, to begin with, I don't think we need that variable to be shared. I don't believe it matters if we count involved workers every time we calcualte the delay. +/* + * autovac_balance_cost + * Recalculate the number of workers to consider, given table options and + * the current number of active workers. + * + * Caller must h
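To make the balancing arithmetic under review concrete, a minimal sketch of the per-worker limit calculation is shown below. AutoVacuumShmem and av_nworkers_for_balance are names taken from the patch under discussion (and, as noted above, the shared counter may not survive review); the helper function itself is hypothetical.

static int
autovac_balanced_cost_limit(int base_cost_limit)
{
    /* There is at least one autovacuum worker: this one. */
    int         nworkers_for_balance =
        Max(pg_atomic_read_u32(&AutoVacuumShmem->av_nworkers_for_balance), 1);

    /* Split the configured limit evenly across workers, but never below 1. */
    return Max(base_cost_limit / nworkers_for_balance, 1);
}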
Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry
At Tue, 28 Mar 2023 15:16:36 +0900, Michael Paquier wrote in > On Tue, Mar 28, 2023 at 07:49:45AM +0200, Drouvot, Bertrand wrote: > > What about something like? > > > > for pg_stat_get_xact_blocks_fetched(): "block read requests for table or > > index, in the current > > transaction. This number minus pg_stat_get_xact_blocks_hit() gives the > > number of kernel > > read() calls." > > > > pg_stat_get_xact_blocks_hit(): "block read requests for table or index, in > > the current > > transaction, found in cache (not triggering kernel read() calls)". > > Something among these lines within the table would be also OK by me. > Horiguchi-san or Melanie-san, perhaps you have counter-proposals? No. Fine by me, except that "block read requests" seems to suggest kernel read() calls, maybe because it's not clear whether "block" refers to our buffer blocks or file blocks to me.. If it is generally clear, I'm fine with the proposal. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: running logical replication as the subscription owner
On Mon, 27 Mar 2023 at 22:47, Robert Haas wrote: > Can a table owner defend themselves > against a subscription owner who wants to usurp their privileges? I have a hard time following the discussion. And I think it's because there's lots of different possible privilege escalations to think about. Below is the list of escalations I've gathered and how I think they interact with the different patches: 1. Table owner escalating to a high-privileged subscription owner. i.e. the subscription owner is superuser, or has SET ROLE privileges for all owners of tables in the subscription. 2. Table owner escalating to a low-privileged subscription owner. e.g. the subscription owner can only insert into the tables in its subscription a. The subscription owner only has insert permissions for a tables owned by a single other user b. The subscription owner has insert permissions for tables owned by multiple other users (but still does not have SET ROLE, or possibly even select/update/delete) 3. Publisher applying into tables that the subscriber side doesn't want it to 4. Subscription-owner escalating to table owner a. The subscription owner is highly privileged (allows SET ROLE to table owner) b. The subscription owner is lowly privileged Which can currently only be addressed by having 1 subscription/publication pair for every table owner. This has the big issue that WAL needs to be decoded multiple times on the publisher. This patch would make escalation 1 impossible without having to do anything special when setting up the subscription. With Citus we only run into this escalation issue with logical replication at the moment. We want to replicate lots of different tables, possibly owned by multiple users from one node to another. We trust the publisher and we trust the subscription owner, but we don't trust the table owners at all. This is why I'm very much in favor of a version of this patch. 2.a seems a non-issue, because in this case the table owner can easily give itself the same permissions as the subscription owner (if it didn't have them yet). So by "escalating" to the subscription owner the table owner only gets fewer permissions. 2.b is actually interesting from a security perspective, because by escalating to the subscription owner, the table owner might be able to insert into tables that it normally can't. Patch v1 would "solve" both these issues by simply not supporting these scenarios. My patch v2 keeps the existing behaviour, where both "escalations" are possible and who-ever sets up the replication should create a dedicated subscriber for each table owner to make sure that only 2.a ever occurs and 2.b does not. 3 is something I have not run into. But I can easily imagine a case where a subscriber connects to a (somewhat) untrusted publisher for the purpose of replicating changes from a single table, e.g. some events table. But you don't want to allow replication into any other tables, even if the publisher tells you to. Patch v1 would force you to have SET ROLE privileges on the target events table its owner. So if that owner owns other tables too, then effectively you'd allow the publisher to write into those tables too. The current behaviour (without any patch) supports protecting against this escalation, by giving only INSERT permissions on a single table without the need for SET ROLE. My v2 patch preserves that ability. 4.a again seems like an obvious non-issue to me because the subscription owner can already "escalate" to table owner using SET ROLE. 
4.b seems like it's pretty much the same as 3, afaict all the same arguments apply. And I honestly can't think of a real situation where you would not trust the subscription owner (as opposed to the publisher). If any of you have an example of such a situation I'd love to understand this one better. All in all, I think patch v2 is the right direction here, because it covers all these escalations to some extent.
Re: running logical replication as the subscription owner
On Tue, 28 Mar 2023 at 11:15, Jelte Fennema wrote: > Which can currently only be addressed by having 1 > subscription/publication pair for every table owner. Oops, moving sentences around in my email made me not explicitly reference escalation 1 anymore. The above should have been: 1 can currently only be addressed by having 1 subscription/publication pair for every table owner.
Re: Save a few bytes in pg_attribute
On 23.03.23 13:45, Peter Eisentraut wrote: My suggestion is to use this patch and then consider the column encryption patch as it stands now. I have committed this.
Re: Remove 'htmlhelp' documentat format (was meson documentation build open issues)
On 24.03.23 17:58, Andres Freund wrote: On 2023-03-24 11:59:23 +0100, Peter Eisentraut wrote: Another option here is to remove support for htmlhelp. That might actually be the best path - it certainly doesn't look like anybody has been actively using it. Or otherwise somebody would have complained about there not being any instructions on how to actually compile a .chm file. And perhaps complained that it takes next to forever to build. I also have the impression that people don't use the .chm stuff much anymore, but that might just be me not using windows. I think in ancient times, pgadmin used it for its internal help. But I have heard less about htmlhelp over the years than about the info format.
Re: Initial Schema Sync for Logical Replication
On Mon, Mar 27, 2023 at 8:17 AM Masahiko Sawada wrote: > > On Fri, Mar 24, 2023 at 11:51 PM Kumar, Sachin wrote: > > > > > From: Amit Kapila > > > > I think we won't be able to use same snapshot because the transaction > > > > will > > > > be committed. > > > > In CreateSubscription() we can use the transaction snapshot from > > > > walrcv_create_slot() till walrcv_disconnect() is called.(I am not sure > > > > about this part maybe walrcv_disconnect() calls the commits internally > > > > ?). > > > > So somehow we need to keep this snapshot alive, even after transaction > > > > is committed(or delay committing the transaction , but we can have > > > > CREATE SUBSCRIPTION with ENABLED=FALSE, so we can have a restart > > > > before tableSync is able to use the same snapshot.) > > > > > > > > > > Can we think of getting the table data as well along with schema via > > > pg_dump? Won't then both schema and initial data will correspond to the > > > same snapshot? > > > > Right , that will work, Thanks! > > While it works, we cannot get the initial data in parallel, no? > Another possibility is that we dump/restore the schema of each table along with its data. One thing we can explore is whether the parallel option of dump can be useful here. Do you have any other ideas? One related idea is that currently, we fetch the table list corresponding to publications in subscription and create the entries for those in pg_subscription_rel during Create Subscription, can we think of postponing that work till after the initial schema sync? We seem to be already storing publications list in pg_subscription, so it appears possible if we somehow remember the value of copy_data. If this is feasible then I think that may give us the flexibility to perform the initial sync at a later point by the background worker. > > > > > > I think we can have same issues as you mentioned New table t1 is added > > > > to the publication , User does a refresh publication. > > > > pg_dump / pg_restore restores the table definition. But before > > > > tableSync can start, steps from 2 to 5 happen on the publisher. > > > > > 1. Create Table t1(c1, c2); --LSN: 90 2. Insert t1 (1, 1); --LSN 100 > > > > > 3. Insert t1 (2, 2); --LSN 110 4. Alter t1 Add Column c3; --LSN 120 > > > > > 5. Insert t1 (3, 3, 3); --LSN 130 > > > > And table sync errors out > > > > There can be one more issue , since we took the pg_dump without > > > snapshot (wrt to replication slot). > > > > > > > > > > To avoid both the problems mentioned for Refresh Publication, we can do > > > one of the following: (a) create a new slot along with a snapshot for this > > > operation and drop it afterward; or (b) using the existing slot, > > > establish a > > > new snapshot using a technique proposed in email [1]. > > > > > > > Thanks, I think option (b) will be perfect, since we don’t have to create a > > new slot. > > Regarding (b), does it mean that apply worker stops streaming, > requests to create a snapshot, and then resumes the streaming? > Shouldn't this be done by the backend performing a REFRESH publication? -- With Regards, Amit Kapila.
Re: Remove 'htmlhelp' documentat format (was meson documentation build open issues)
On Tue, 28 Mar 2023 at 10:46, Peter Eisentraut < peter.eisentr...@enterprisedb.com> wrote: > On 24.03.23 17:58, Andres Freund wrote: > > On 2023-03-24 11:59:23 +0100, Peter Eisentraut wrote: > >> Another option here is to remove support for htmlhelp. > > > > That might actually be the best path - it certainly doesn't look like > anybody > > has been actively using it. Or otherwise somebody would have complained > about > > there not being any instructions on how to actually compile a .chm file. > And > > perhaps complained that it takes next to forever to build. > > > > I also have the impression that people don't use the .chm stuff much > anymore, > > but that might just be me not using windows. > > I think in ancient times, pgadmin used it for its internal help. > Yes, very ancient :-). We use Sphinx now. > > But I have heard less about htmlhelp over the years than about the info > format. > > > -- Dave Page Blog: https://pgsnake.blogspot.com Twitter: @pgsnake EDB: https://www.enterprisedb.com
RE: Data is copied twice when specifying both child and parent table in publication
On Tues, Mar 28, 2023 at 7:02 AM Jacob Champion wrote: > On Mon, Mar 20, 2023 at 11:22 PM Amit Kapila > wrote: > > If the tests you have in mind are only related to this patch set then > > feel free to propose them here if you feel the current ones are not > > sufficient. > > I think the new tests added by Wang cover my concerns (thanks!). I share > Peter's comment that we don't seem to have a regression test covering > only the bug description itself -- just ones that combine that case with > row and column restrictions -- but if you're all happy with the existing > approach then I have nothing much to add there. The scenario of this bug is to subscribe to two publications at the same time, and these two publications publish parent table and child table respectively. And option via_root is specified in both publications or only in the publication of the parent table. At this time, the data on the publisher-side will be copied twice (the data will be copied to the two tables on the subscribe-side respectively). So, I think we have covered this bug itself in 013_partition.pl. We inserted the initial data into the parent table tab4 on the publisher-side, and checked whether the sync is completed as we expected (there is data in table tab4, but there is no data in table tab4_1). > I was staring at this subquery in fetch_table_list(): > > > +" ( SELECT array_agg(a.attname ORDER > > BY a.attnum)\n" > > +"FROM pg_attribute a\n" > > +"WHERE a.attrelid = gpt.relid > > AND\n" > > +" a.attnum = ANY(gpt.attrs)\n" > > +" ) AS attnames\n" > > On my machine this takes up roughly 90% of the runtime of the query, > which makes for a noticeable delay with a bigger test case (a couple of > FOR ALL TABLES subscriptions on the regression database). And it seems > like we immediately throw all that work away: if I understand correctly, > we only use the third column for its interaction with DISTINCT. Would it > be enough to just replace that whole thing with gpt.attrs? Make sense. Changed as suggested. Attach the new patch. Regards, Wang Wei v25-0001-Avoid-syncing-data-twice-for-the-publish_via_par.patch Description: v25-0001-Avoid-syncing-data-twice-for-the-publish_via_par.patch
Re: Add standard collation UNICODE
On 23.03.23 21:16, Jeff Davis wrote: Another thought: for ICU, do we want the default collation to be UNICODE (root collation)? What we have now gets the default from the environment, which is consistent with the libc provider. But now that we have the UNICODE collation, it makes me wonder if we should just default to that. The server's environment doesn't necessarily say much about the locale of the data stored in it or the locale of the applications accessing it. As long as we still have to initialize the libc locale fields to some language, I think it would be less confusing to keep the ICU locale on the same language. If we ever manage to get rid of that, then I would also support making the ICU locale the root collation by default.
RE: Data is copied twice when specifying both child and parent table in publication
On Tues, Mar 28, 2023 at 18:00 PM Wang, Wei/王 威 wrote: > Attach the new patch. Sorry, I attached the wrong patch. Here is the correct new version patch which addressed all comments so far. Regards, Wang Wei v25-0001-Avoid-syncing-data-twice-for-the-publish_via_par.patch Description: v25-0001-Avoid-syncing-data-twice-for-the-publish_via_par.patch
Re: doc: add missing "id" attributes to extension packaging page
On 28.03.2023 at 00:11, Peter Smith wrote: FYI, there is a lot of overlap between this last attachment and the patches of Kuroda-san's current thread here [1] which is also adding ids to create_subscription.sgml. (Anyway, I guess you might already be aware of that other thread because your new ids look like they have the same names as those chosen by Kuroda-san) -- [1] https://www.postgresql.org/message-id/flat/CAHut%2BPvzo6%3DKKLqMR6-mAQdM%2Bj_dse0eUreGmrFouL7gbLbv2w%40mail.gmail.com#7da8d0e3b73096375847c16c856b4aed Thanks, I actually was not aware of this. Also, kudos for capturing the missing id and advocating for consistency regarding ids even before this is actively enforced. This nurtures my optimism that consistency might actually be achievable without everybody getting angry at me because my patch will enforce it. Regarding the overlap, I currently try to make it as easy as possible for a potential committer and I'm happy to rebase my patch upon request or if Kuroda-san's patch gets applied first. Regards, Brar
Move definition of standard collations from initdb to pg_collation.dat
While working on [0], I was wondering why the collations ucs_basic and unicode are not in pg_collation.dat. I traced this back through history, and I think this was just lost in a game of telephone. The initial commit for pg_collation.h (414c5a2ea6) has only the default collation in pg_collation.h (pre .dat), with initdb handling everything else. Over time, additional collations "C" and "POSIX" were moved to pg_collation.h, and other logic was moved from initdb to pg_import_system_collations(). But ucs_basic was untouched. Commit 0b13b2a771 rearranged the relative order of operations in initdb and added the current comment "We don't want to pin these", but looking at the email[1], I think this was more a guess about the previous intent. I suggest we fix this now; see attached patch. [0]: https://www.postgresql.org/message-id/flat/1293e382-2093-a2bf-a397-c04e8f83d3c2%40enterprisedb.com [1]: https://www.postgresql.org/message-id/28195.1498172402%40sss.pgh.pa.usFrom 0d2c6b92a3340833f13bab395e0556ce1f045226 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Tue, 28 Mar 2023 12:04:34 +0200 Subject: [PATCH] Move definition of standard collations from initdb to pg_collation.dat --- src/bin/initdb/initdb.c | 15 +-- src/include/catalog/pg_collation.dat | 7 +++ 2 files changed, 8 insertions(+), 14 deletions(-) diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c index bae97539fc..9ccbf998ec 100644 --- a/src/bin/initdb/initdb.c +++ b/src/bin/initdb/initdb.c @@ -1695,20 +1695,7 @@ setup_description(FILE *cmdfd) static void setup_collation(FILE *cmdfd) { - /* -* Add SQL-standard names. We don't want to pin these, so they don't go -* in pg_collation.dat. But add them before reading system collations, so -* that they win if libc defines a locale with the same name. -*/ - PG_CMD_PRINTF("INSERT INTO pg_collation (oid, collname, collnamespace, collowner, collprovider, collisdeterministic, collencoding, colliculocale)" - "VALUES (pg_nextoid('pg_catalog.pg_collation', 'oid', 'pg_catalog.pg_collation_oid_index'), 'unicode', 'pg_catalog'::regnamespace, %u, '%c', true, -1, 'und');\n\n", - BOOTSTRAP_SUPERUSERID, COLLPROVIDER_ICU); - - PG_CMD_PRINTF("INSERT INTO pg_collation (oid, collname, collnamespace, collowner, collprovider, collisdeterministic, collencoding, collcollate, collctype)" - "VALUES (pg_nextoid('pg_catalog.pg_collation', 'oid', 'pg_catalog.pg_collation_oid_index'), 'ucs_basic', 'pg_catalog'::regnamespace, %u, '%c', true, %d, 'C', 'C');\n\n", - BOOTSTRAP_SUPERUSERID, COLLPROVIDER_LIBC, PG_UTF8); - - /* Now import all collations we can find in the operating system */ + /* Import all collations we can find in the operating system */ PG_CMD_PUTS("SELECT pg_import_system_collations('pg_catalog');\n\n"); } diff --git a/src/include/catalog/pg_collation.dat b/src/include/catalog/pg_collation.dat index f4bda1c769..14df398ad2 100644 --- a/src/include/catalog/pg_collation.dat +++ b/src/include/catalog/pg_collation.dat @@ -23,5 +23,12 @@ descr => 'standard POSIX collation', collname => 'POSIX', collprovider => 'c', collencoding => '-1', collcollate => 'POSIX', collctype => 'POSIX' }, +{ oid => '962', + descr => 'sorts using the Unicode Collation Algorithm with default settings', + collname => 'unicode', collprovider => 'i', collencoding => '-1', + colliculocale => 'und' }, +{ oid => '963', descr => 'sorts by Unicode code point', + collname => 'ucs_basic', collprovider => 'c', collencoding => '6', + collcollate => 'C', collctype => 'C' }, ] -- 2.40.0
"variable not found in subplan target list"
I have to run now so can't dissect it, but while running sqlsmith on the SQL/JSON patch after Justin's report, I got $SUBJECT in this query: MERGE INTO public.target_parted as target_0 USING (select subq_0.c5 as c0, subq_0.c0 as c1, ref_0.a as c2, subq_0.c1 as c3, subq_0.c9 as c4, (select c from public.prt2_m_p3 limit 1 offset 1) as c5, subq_0.c8 as c6, ref_0.a as c7, subq_0.c7 as c8, subq_0.c1 as c9, pg_catalog.system_user() as c10 from public.itest1 as ref_0 left join (select ref_1.matches as c0, ref_1.typ as c1, ref_1.colname as c2, (select slotname from public.iface limit 1 offset 44) as c3, ref_1.matches as c4, ref_1.op as c5, ref_1.matches as c6, ref_1.value as c7, ref_1.op as c8, ref_1.op as c9, ref_1.typ as c10 from public.brinopers_multi as ref_1 where cast(null as polygon) <@ (select polygon from public.tab_core_types limit 1 offset 22) ) as subq_0 on (cast(null as macaddr8) >= cast(null as macaddr8)) where subq_0.c10 > subq_0.c2 limit 49) as subq_1 ON target_0.b = subq_1.c2 WHEN MATCHED AND (cast(null as box) |>> cast(null as box)) or (cast(null as lseg) ?-| (select s from public.lseg_tbl limit 1 offset 6) ) THEN DELETE WHEN NOT MATCHED AND (EXISTS ( select 21 as c0, subq_2.c0 as c1 from public.itest14 as sample_0 tablesample system (3.6) inner join public.num_exp_sqrt as sample_1 tablesample bernoulli (0.3) on (cast(null as "char") <= cast(null as "char")), lateral (select sample_1.id as c0 from public.a as ref_2 where (cast(null as lseg) <@ cast(null as line)) or ((select b3 from public.bit_defaults limit 1 offset 80) <> (select b3 from public.bit_defaults limit 1 offset 4) ) limit 158) as subq_2 where (cast(null as name) !~ (select t from public.test_tsvector limit 1 offset 5) ) and ((select bool from public.tab_core_types limit 1 offset 61) < (select pg_catalog.bool_or(v) from public.rtest_view1) ))) or (18 is NULL) THEN INSERT VALUES ( pg_catalog.int4um( cast(public.func_with_bad_set() as int4)), 13) WHEN MATCHED AND ((24 is not NULL) or (true)) or (cast(null as "timestamp") <= cast(null as timestamptz)) THEN UPDATE set b = target_0.b Ugh. I got no more SQL/JSON related crashes so far. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "No hay ausente sin culpa ni presente sin disculpa" (Prov. francés)
Re: Move definition of standard collations from initdb to pg_collation.dat
Peter Eisentraut writes: > While working on [0], I was wondering why the collations ucs_basic and > unicode are not in pg_collation.dat. I traced this back through > history, and I think this was just lost in a game of telephone. > The initial commit for pg_collation.h (414c5a2ea6) has only the default > collation in pg_collation.h (pre .dat), with initdb handling everything > else. Over time, additional collations "C" and "POSIX" were moved to > pg_collation.h, and other logic was moved from initdb to > pg_import_system_collations(). But ucs_basic was untouched. Commit > 0b13b2a771 rearranged the relative order of operations in initdb and > added the current comment "We don't want to pin these", but looking at > the email[1], I think this was more a guess about the previous intent. Yeah, I was just loath to change the previous behavior in that patch. I can't see any strong reason not to pin these entries. > I suggest we fix this now; see attached patch. While we're here, do we want to adopt some other spelling of "the root locale" than "und", in view of recent discoveries about the instability of that on old ICU versions? regards, tom lane
Re: "variable not found in subplan target list"
Alvaro Herrera writes: > I have to run now so can't dissect it, but while running sqlsmith on the > SQL/JSON patch after Justin's report, I got $SUBJECT in this query: Reproduces in HEAD and v15 too (once you replace pg_catalog.system_user with some function that exists in v15). So it's not the fault of the JSON patch, nor of my outer-join hacking which had been my first thought. regards, tom lane
Re: Non-replayable WAL records through overflows and >MaxAllocSize lengths
On Mon, Dec 19, 2022 at 12:37:19PM +0100, Alvaro Herrera wrote: > I have created one in the January commitfest, > https://commitfest.postgresql.org/41/ > and rebased the patch on current master. (I have not reviewed this.) I have spent some time on that, and here are some comments with an updated version of the patch attached. The checks in XLogRegisterData() seemed overcomplicated to me. In this context, I think that we should just care about making sure that mainrdata_len does not overflow depending on the length given by the caller, which is where pg_add_u32_overflow() becomes handy. XLogRegisterBufData() added a check on UINT16_MAX in an assert, though we already check for overflow a couple of lines down. This is not necessary, it seems. @@ -535,6 +567,9 @@ XLogRecordAssemble(RmgrId rmid, uint8 info, XLogRecord *rechdr; char *scratch = hdr_scratch; + /* ensure that any assembled record can be decoded */ + Assert(AllocSizeIsValid(DecodeXLogRecordRequiredSpace(MaxXLogRecordSize))); A hardcoded check like that has no need to be in a code path triggered each time a WAL record is assembled. One place where this could be is InitXLogInsert(). It still means that it is called one time for each backend, but seeing it where the initialization of xloginsert.c feels natural, at least. A postmaster location would be enough, as well. XLogRecordMaxSize just needs to be checked once IMO, around the end of XLogRecordAssemble() once we know the total size of the record that will be fed to a XLogReader. One thing that we should be more careful of is to make sure that total_len does not overflow its uint32 value while assembling the record, as well. I have removed XLogErrorDataLimitExceeded(), replacing it with more context about the errors happening. Perhaps this has no need to be that much verbose, but it can be really useful for developers. Some comments had no need to be updated, and there were some typos. I am on board with the idea of a XLogRecordMaxSize that's bounded at 1020MB, leaving 4MB as room for the extra data needed by a XLogReader. At the end, I think that this is quite interesting long-term. For example, if we lift up XLogRecordMaxSize, we can evaluate the APIs adding buffer data or main data separately. Thoughts about this version? -- Michael From 16b40324a5bc5bbe6e06cbfb86251c907f400151 Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Tue, 28 Mar 2023 20:34:31 +0900 Subject: [PATCH v10] Add protections in xlog record APIs against overflows Before this, it was possible for an extension to create malicious WAL records that were too large to replay; or that would overflow the xl_tot_len field, causing potential corruption in WAL record IO ops. Emitting invalid records was also possible through pg_logical_emit_message(), which allowed you to emit arbitrary amounts of data up to 2GB, much higher than the replay limit of approximately 1GB. This patch adds a limit to the size of an XLogRecord (1020MiB), checks against overflows, and ensures that the reader infrastructure can read any max-sized XLogRecords, such that the wal-generating backend will fail when it attempts to insert unreadable records, instead of that insertion succeeding but breaking any replication streams. 
Author: Matthias van de Meent --- src/include/access/xlogrecord.h | 11 src/backend/access/transam/xloginsert.c | 73 + 2 files changed, 74 insertions(+), 10 deletions(-) diff --git a/src/include/access/xlogrecord.h b/src/include/access/xlogrecord.h index 0d576f7883..0a7b0bb6fc 100644 --- a/src/include/access/xlogrecord.h +++ b/src/include/access/xlogrecord.h @@ -54,6 +54,17 @@ typedef struct XLogRecord #define SizeOfXLogRecord (offsetof(XLogRecord, xl_crc) + sizeof(pg_crc32c)) +/* + * XLogReader needs to allocate all the data of an xlog record in a single + * chunk. This means that a single XLogRecord cannot exceed MaxAllocSize + * in length if we ignore any allocation overhead of the XLogReader. + * + * To accommodate some overhead, this value allows for 4M of allocation + * overhead, that should be plenty enough for what + * DecodeXLogRecordRequiredSpace() expects as extra. + */ +#define XLogRecordMaxSize (1020 * 1024 * 1024) + /* * The high 4 bits in xl_info may be used freely by rmgr. The * XLR_SPECIAL_REL_UPDATE and XLR_CHECK_CONSISTENCY bits can be passed by diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c index 008612e032..86fa6a5276 100644 --- a/src/backend/access/transam/xloginsert.c +++ b/src/backend/access/transam/xloginsert.c @@ -33,6 +33,7 @@ #include "access/xloginsert.h" #include "catalog/pg_control.h" #include "common/pg_lzcompress.h" +#include "common/int.h" #include "executor/instrument.h" #include "miscadmin.h" #include "pg_trace.h" @@ -351,11 +352,13 @@ void XLogRegisterData(char *data, uint32 len) { XLogRecData *rdata; + uint32 result = 0; Assert(begininsert_call
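To illustrate the central overflow check being discussed for XLogRegisterData(), here is a minimal sketch. mainrdata_len is xloginsert.c's running total of registered main data; the helper's shape, error code and message are illustrative only and do not necessarily match the patch.

static void
add_to_mainrdata_len(uint32 *mainrdata_len, uint32 len)
{
    uint32      newlen;

    /* pg_add_u32_overflow() (from common/int.h) returns true on wraparound. */
    if (pg_add_u32_overflow(*mainrdata_len, len, &newlen))
        ereport(ERROR,
                (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
                 errmsg("too much WAL data")));

    *mainrdata_len = newlen;
}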
Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
On 2023-03-27 23:28, Damir Belyalov wrote: Hi! I made the specified changes and my patch turned out the same as yours. The performance measurements were the same too. Thanks for your review and measurements. The only thing left to do is how not to add IGNORE_DATATYPE_ERRORS as a keyword. See how this is done for parameters such as FORCE_NOT_NULL, FORCE_NULL, FORCE_QUOTE. They are not in kwlist.h and are not as keywords in gram.y. I might misunderstand something, but I believe the v5 patch uses copy_generic_opt_list and it does not add IGNORE_DATATYPE_ERRORS as a keyword. It modifies neither kwlist.h nor gram.y. -- Regards, -- Atsushi Torikoshi NTT DATA CORPORATION
Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
> > I might misunderstand something, but I believe the v5 patch uses > copy_generic_opt_list and it does not add IGNORE_DATATYPE_ERRORS as a > keyword. > It modifies neither kwlist.h nor gram.y. > Sorry, didn't notice that. I think everything is alright now. Regards, Damir Belyalov Postgres Professional
Re: PGdoc: add missing ID attribute to create_subscription.sgml
On Tue, Mar 28, 2023 at 1:00 PM Hayato Kuroda (Fujitsu) wrote: > > Dear Amit, > > Thank you for reviewing! PSA new version. > > > Isn't it better to move the link-related part to the next line > > wherever possible? Currently, it looks bit odd. > > Previously I preferred not to add a new line inside the tag, but it > caused > long-line. So I adjusted them not to be too short/long length. > There is no need to break the link line. See attached. -- With Regards, Amit Kapila. v7-0001-Add-XML-ID-attributes-to-create_subscription.sgml.patch Description: Binary data
Re: Add standard collation UNICODE
On 3/28/23 06:07, Peter Eisentraut wrote: On 23.03.23 21:16, Jeff Davis wrote: Another thought: for ICU, do we want the default collation to be UNICODE (root collation)? What we have now gets the default from the environment, which is consistent with the libc provider. But now that we have the UNICODE collation, it makes me wonder if we should just default to that. The server's environment doesn't necessarily say much about the locale of the data stored in it or the locale of the applications accessing it. As long as we still have to initialize the libc locale fields to some language, I think it would be less confusing to keep the ICU locale on the same language. I definitely agree with that. If we ever manage to get rid of that, then I would also support making the ICU locale the root collation by default. +1 -- Joe Conway PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: "variable not found in subplan target list"
Alvaro Herrera writes: > I have to run now so can't dissect it, but while running sqlsmith on the > SQL/JSON patch after Justin's report, I got $SUBJECT in this query: I reduced this down to MERGE INTO public.target_parted as target_0 USING public.itest1 as ref_0 ON target_0.b = ref_0.a WHEN NOT MATCHED THEN INSERT VALUES (42, 13); The critical moving part seems to just be that the MERGE target is a partitioned table ... but surely somebody tested that before? regards, tom lane
Re: Hash table scans outside transactions
Bumping it to attract some attention. On Tue, Mar 21, 2023 at 12:51 PM Ashutosh Bapat wrote: > > Hi, > Hash table scans (seq_scan_table/level) are cleaned up at the end of a > transaction in AtEOXact_HashTables(). If a hash seq scan continues > beyond transaction end it will meet "ERROR: no hash_seq_search scan > for hash table" in deregister_seq_scan(). That seems like a limitation on > the hash table usage. > > Our use case is > 1. Add/update/remove entries in hash table > 2. Scan the existing entries and perform one transaction per entry > 3. Close scan > > Repeat the above steps in an infinite loop. Note that we do not > add/modify/delete entries in step 2. We can't use linked lists since > the entries need to be updated or deleted using hash keys. Because the > hash seq scan is cleaned up at the end of the transaction, we > encounter an error in the 3rd step. I don't see that the actual hash > table scan depends upon the seq_scan_table/level[] which is cleaned up > at the end of the transaction. > > I have the following questions > 1. Is there a way to avoid cleaning up seq_scan_table/level() when the > transaction ends? > 2. Is there a way that we can use hash table implementation in > PostgreSQL code for our purpose? > > > -- > Best Wishes, > Ashutosh Bapat -- Best Wishes, Ashutosh Bapat
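For context, the usage pattern described above boils down to something like the sketch below (my_hash_table and MyEntry are hypothetical; the dynahash calls are the real API). The scan itself keeps working across transactions, but CommitTransactionCommand() ends up in AtEOXact_HashTables(), which drops the scan registration, so closing the scan afterwards fails with the quoted error.

HASH_SEQ_STATUS status;
MyEntry    *entry;

hash_seq_init(&status, my_hash_table);
while ((entry = hash_seq_search(&status)) != NULL)
{
    StartTransactionCommand();
    /* ... process this entry; no inserts/deletes into the table here ... */
    CommitTransactionCommand();     /* scan registration is discarded here */
}
/*
 * By the time the scan reaches its end (or hash_seq_term() is called),
 * deregister_seq_scan() no longer finds the registration and raises
 * ERROR: no hash_seq_search scan for hash table.
 */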
Re: "variable not found in subplan target list"
I wrote: > I reduced this down to > MERGE INTO public.target_parted as target_0 > USING public.itest1 as ref_0 > ON target_0.b = ref_0.a > WHEN NOT MATCHED >THEN INSERT VALUES (42, 13); > The critical moving part seems to just be that the MERGE target > is a partitioned table ... but surely somebody tested that before? Oh, it's not just any partitioned table: regression=# \d+ target_parted Partitioned table "public.target_parted" Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description +-+---+--+-+-+-+--+- a | integer | | | | plain | | | b | integer | | | | plain | | | Partition key: LIST (a) Number of partitions: 0 The planner is reducing the scan of target_parted to a dummy scan, as is reasonable, but it forgets to provide ctid as an output from that scan; then the parent join node is unhappy because it does have a ctid output. So it looks like the problem is some shortcut we take while creating the dummy scan. I suppose that without the planner bug, this'd fail at runtime for lack of a partition to put (42,13) into. Because of that, the case isn't really interesting for production, which may explain the lack of reports. regards, tom lane
Re: Memory leak from ExecutorState context?
On Tue, 28 Mar 2023 00:43:34 +0200 Tomas Vondra wrote: > On 3/27/23 23:13, Jehan-Guillaume de Rorthais wrote: > > Please, find in attachment a patch to allocate bufFiles in a dedicated > > context. I picked up your patch, backpatch'd it, went through it and did > > some minor changes to it. I have some comments/questions though. > > > > 1. I'm not sure why we must allocate the "HashBatchFiles" new context > > under ExecutorState and not under hashtable->hashCxt? > > > > The only references I could find was in hashjoin.h:30: > > > >/* [...] > > * [...] (Exception: data associated with the temp files lives in the > > * per-query context too, since we always call buffile.c in that > > context.) > > > > And in nodeHashjoin.c:1243:ExecHashJoinSaveTuple() (I reworded this > > original comment in the patch): > > > >/* [...] > > * Note: it is important always to call this in the regular executor > > * context, not in a shorter-lived context; else the temp file buffers > > * will get messed up. > > > > > > But these are not an explanation of why BufFile related allocations must be > > under a per-query context. > > > > Doesn't that simply describe the current (unpatched) behavior where > BufFile is allocated in the per-query context? I wasn't sure. The first quote from hashjoin.h seems to describe a stronger rule about «**always** call buffile.c in per-query context». But maybe it ought to be «always call buffile.c from one of the sub-query context»? I assume the aim is to enforce the tmp files removal on query end/error? > I mean, the current code calls BufFileCreateTemp() without switching the > context, so it's in the ExecutorState. But with the patch it very clearly is > not. > > And I'm pretty sure the patch should do > > hashtable->fileCxt = AllocSetContextCreate(hashtable->hashCxt, > "HashBatchFiles", > ALLOCSET_DEFAULT_SIZES); > > and it'd still work. Or why do you think we *must* allocate it under > ExecutorState? That was actually my very first patch and it indeed worked. But I was confused about the previous quoted code comments. That's why I kept your original code and decided to raise the discussion here. Fixed in new patch in attachment. > FWIW The comment in hashjoin.h needs updating to reflect the change. Done in the last patch. Is my rewording accurate? > > 2. Wrapping each call of ExecHashJoinSaveTuple() with a memory context > > switch seems fragile as it could be forgotten in future code paths/changes. > > So I added an Assert() in the function to make sure the current memory > > context is "HashBatchFiles" as expected. > > Another way to tie this up might be to pass the memory context as > > argument to the function. > > ... Or maybe I'm being overly cautious. > > > > I'm not sure I'd call that fragile, we have plenty other code that > expects the memory context to be set correctly. Not sure about the > assert, but we don't have similar asserts anywhere else. I mostly stuck it there to stimulate the discussion around this as I needed to scratch that itch. > But I think it's just ugly and overly verbose +1 Your patch was just a demo/debug patch at the time. It needed some cleanup now :) > it'd be much nicer to e.g. pass the memory context as a parameter, and do > the switch inside. That was a proposal in my previous mail, so I did it in the new patch. Let's see what other reviewers think. > > 3. You wrote: > > > >>> A separate BufFile memory context helps, although people won't see it > >>> unless they attach a debugger, I think.
Better than nothing, but I was > >>> wondering if we could maybe print some warnings when the number of batch > >>> files gets too high ... > > > > So I added a WARNING when batches memory are exhausting the memory size > > allowed. > > > >+ if (hashtable->fileCxt->mem_allocated > hashtable->spaceAllowed) > >+ elog(WARNING, "Growing number of hash batch is exhausting > > memory"); > > > > This is repeated on each call of ExecHashIncreaseNumBatches when BufFile > > overflows the memory budget. I realize now I should probably add the > > memory limit, the number of current batch and their memory consumption. > > The message is probably too cryptic for a user. It could probably be > > reworded, but some doc or additionnal hint around this message might help. > > > > Hmmm, not sure is WARNING is a good approach, but I don't have a better > idea at the moment. I stepped it down to NOTICE and added some more infos. Here is the output of the last patch with a 1MB work_mem: =# explain analyze select * from small join large using (id); WARNING: increasing number of batches from 1 to 2 WARNING: increasing number of batches from 2 to 4 WARNING: increasing number of batches from 4 to 8 WARNING: increasing number of batches from 8 to 16 WARNING: increasing numb
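For readers following along, the arrangement discussed here amounts to something like the sketch below: the dedicated context is created under the hash table's own hashCxt, and that context is passed into the tuple-saving routine so the temp BufFile is allocated there instead of in the per-query ExecutorState. Field and parameter names follow the patch under review and may still change; error handling of the writes is elided.

/* In ExecHashTableCreate(): */
hashtable->fileCxt = AllocSetContextCreate(hashtable->hashCxt,
                                           "HashBatchFiles",
                                           ALLOCSET_DEFAULT_SIZES);

/* Sketch of ExecHashJoinSaveTuple() with the context passed as a parameter: */
void
ExecHashJoinSaveTuple(MinimalTuple tuple, uint32 hashvalue,
                      BufFile **fileptr, MemoryContext filecxt)
{
    BufFile    *file = *fileptr;

    if (file == NULL)
    {
        /* First write to this batch file, so create it in filecxt. */
        MemoryContext oldcxt = MemoryContextSwitchTo(filecxt);

        file = BufFileCreateTemp(false);
        *fileptr = file;
        MemoryContextSwitchTo(oldcxt);
    }

    BufFileWrite(file, &hashvalue, sizeof(uint32));
    BufFileWrite(file, tuple, tuple->t_len);
}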
Re: TAP output format in pg_regress
> On 15 Mar 2023, at 11:36, Peter Eisentraut > wrote: > > On 24.02.23 10:49, Daniel Gustafsson wrote: >> Another rebase on top of 337903a16f. Unless there are conflicting reviews, I >> consider this patch to be done and ready for going in during the next CF. > > I think this is just about as good as it's going to get, so I think we can > consider committing this soon. > > A few comments along the way: > > 1) We can remove the gettext markers _() inside calls like note(), bail(), > etc. If we really wanted to do translation, we would do that inside those > functions (similar to errmsg() etc.). Fixed. The attached also removes all explicit \n from output and leaves the decision on when to add a linebreak to the TAP emitting function. I think this better match how we typically handle printing of output like this. It also ensures that all bail messages follow the same syntax. > 2) There are a few fprintf(stderr, ...) calls left. Should those be changed > to something TAP-enabled? Initially the patch kept errors happening before testing started a non-TAP output, there were leftovers which are now converted. > 3) Maybe these lines > > +++ isolation check in src/test/isolation +++ > > should be changed to TAP format? Arguably, if I run "make -s check", then > everything printed should be valid TAP, right? Fixed. > 4) From the introduction lines > > == creating temporary instance== > == initializing database system == > == starting postmaster== > running on port 61696 with PID 85346 > == creating database "regression" == > == running regression test queries== > > you have kept > > # running on port 61696 with PID 85346 > > which, well, is that the most interesting of those? > > The first three lines (creating, initializing, starting) take some noticeable > amount of time, so they could be kept as a progress indicator. Or just > delete all of them? I suppose some explicit indication about temp-instance > versus existing instance would be useful. This was discussed in 20220221164736.rq3ornzjdkmwk...@alap3.anarazel.de which concluded that this was about the only thing of interest, and even at that it was more of a maybe than a definite yes. As this patch stands, it prints the above for a temp install and the host/port for an existing install, but I don't have ny problems removing that as well. > 5) About the output format. Obviously, this will require some retraining of > the eye. But my first impression is that there is a lot of whitespace > without any guiding lines, so to speak, horizontally or vertically. It makes > me a bit dizzy. ;-) > > I think instead > > # parallel group (2 tests): event_trigger oidjoins > ok 210 event_trigger 131 ms > ok 211 oidjoins 190 ms > ok 212 fast_default 158 ms > ok 213 tablespace319 ms > > Something like this would be less dizzy-making: > > # parallel group (2 tests): event_trigger oidjoins > ok 210 - + event_trigger 131 ms > ok 211 - + oidjoins190 ms > ok 212 - fast_default 158 ms > ok 213 - tablespace319 ms The current format is chosen to be close to the old format, while also adding sufficient padding that it won't yield ragged columns. The wide padding is needed to cope with long names in the isolation and ecgp test suites and not so much regress suite. In the attached I've dialled back the padding a little bit to make it a bit more compact, but I doubt it's enough. > Just an idea, we don't have to get this exactly perfect right now, but it's > something to think about. 
I'm quite convinced that this will be revisited once this lands and is in front of developers. I think the attached is a good candidate for going in, but I would like to see it for another spin in the CF bot first. -- Daniel Gustafsson v18-0001-pg_regress-Emit-TAP-compliant-output.patch Description: Binary data
Re: Add standard collation UNICODE
On Tue, 2023-03-28 at 08:46 -0400, Joe Conway wrote:
> > As long as we still have to initialize the libc locale fields to some
> > language, I think it would be less confusing to keep the ICU locale on
> > the same language.
>
> I definitely agree with that.

Sounds good -- no changes then.

Regards, Jeff Davis
Re: Infinite Interval
On Sun, Mar 19, 2023 at 12:18 AM Joseph Koshakow wrote: > > The problem is that `secs = rint(secs)` rounds the seconds too early > and loses any fractional seconds. Do we have an overflow detecting > multiplication function for floats? We have float8_mul() which checks for overflow. typedef double float8; > > >+if (INTERVAL_NOT_FINITE(result)) > >+ereport(ERROR, > >+(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE), > >+ errmsg("interval out of range"))); > > > >Probably, I added these kind of checks. But I don't remember if those are > >defensive checks or whether it's really possible that the arithmetic > > above > >these lines can yield an non-finite interval. > > These checks appear in `make_interval`, `justify_X`, > `interval_um_internal`, `interval_pl`, `interval_mi`, `interval_mul`, > `interval_div`. For all of these it's possible that the interval > overflows/underflows the non-finite ranges, but does not > overflow/underflow the data type. For example > `SELECT INTERVAL '2147483646 months' + INTERVAL '1 month'` would error > on this check. Without this patch postgres@64807=#SELECT INTERVAL '2147483646 months' + INTERVAL '1 month'; ?column? 178956970 years 7 mons (1 row) That result looks correct postgres@64807=#select 178956970 * 12 + 7; ?column? 2147483647 (1 row) So some backward compatibility break. I don't think we can avoid the backward compatibility break without expanding interval structure and thus causing on-disk breakage. But we can reduce the chances of breaking, if we change INTERVAL_NOT_FINITE to check all the three fields, instead of just month. > > > >+else > >+{ > >+result->time = -interval->time; > >+result->day = -interval->day; > >+result->month = -interval->month; > >+ > >+if (INTERVAL_NOT_FINITE(result)) > >+ereport(ERROR, > >+(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE), > >+ errmsg("interval out of range"))); > > > >If this error ever gets to the user, it could be confusing. Can we > > elaborate by > >adding context e.g. errcontext("while negating an interval") or some > > such? > > Done. Thanks. Can we add relevant contexts at similar other places? Also if we use all the three fields, we will need to add such checks in interval_justify_hours() > > I replaced these checks with the following: > > + else if (interval->time == PG_INT64_MIN || interval->day == PG_INT32_MIN || > interval->month == PG_INT32_MIN) > + ereport(ERROR, > + (errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE), > + errmsg("interval out of range"))); > > I think this covers the same overflow check but is maybe a bit more > obvious. Unless, there's something I'm missing? Thanks. Your current version is closer to int4um(). Some more review comments in the following email. -- Best Wishes, Ashutosh Bapat
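To make that suggestion concrete, a minimal sketch of a three-field check could look like the following, assuming the usual Interval layout (int64 time, int32 day, int32 month) and the PG_INT32_MIN/MAX and PG_INT64_MIN/MAX limits from c.h. This is only an illustration, not the definition in the posted patch:

/*
 * Illustration only: classify an interval as infinite when all three
 * fields hold the sentinel values, instead of looking at month alone.
 * This narrows the set of ordinary values that could be mistaken for
 * infinity, per the suggestion above.
 */
#define INTERVAL_IS_NOBEGIN(i) \
	((i)->month == PG_INT32_MIN && \
	 (i)->day == PG_INT32_MIN && \
	 (i)->time == PG_INT64_MIN)

#define INTERVAL_IS_NOEND(i) \
	((i)->month == PG_INT32_MAX && \
	 (i)->day == PG_INT32_MAX && \
	 (i)->time == PG_INT64_MAX)

#define INTERVAL_NOT_FINITE(i) \
	(INTERVAL_IS_NOBEGIN(i) || INTERVAL_IS_NOEND(i))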
Re: SQL/JSON revisited
On 3/27/23 at 20:54, Alvaro Herrera wrote:
> Docs amended as I threatened. Other than that, this has required more
> [v12-0001-SQL-JSON-constructors.patch]
> [v12-0001-delta-uniqueifyJsonbObject-bugfix.patch]

In doc/src/sgml/func.sgml, some minor stuff:

'which specify the data type returned' should be 'which specifies the data type returned'

In the json_arrayagg() description, it says: 'If ABSENT ON NULL is specified, any NULL values are omitted.' That's true, but as omitting NULL values is the default (i.e., also without that clause) maybe it's better to say: 'Any NULL values are omitted unless NULL ON NULL is specified'

I've found no bugs in functionality.

Thanks, Erik Rijkers
Re: Infinite Interval
On Sun, Mar 26, 2023 at 1:28 AM Tom Lane wrote:
>
> I think you can take it as read that simple C test programs on modern
> platforms will exhibit IEEE-compliant handling of float infinities.
>

For the record, I tried the attached. It gives a warning at compilation time.

$ gcc float_inf.c
float_inf.c: In function ‘main’:
float_inf.c:10:17: warning: division by zero [-Wdiv-by-zero]
   10 |     float inf = 1.0/0;
      |                 ^
float_inf.c:11:20: warning: division by zero [-Wdiv-by-zero]
   11 |     float n_inf = -1.0/0;
      |                    ^

$ ./a.out
inf = inf
-inf = -inf
inf + inf = inf
inf + -inf = -nan
-inf + inf = -nan
-inf + -inf = -inf
inf - inf = -nan
inf - -inf = inf
-inf - inf = -inf
-inf - -inf = -nan
float 0.0 equals 0.0
float 1.0 equals 1.0
 5.0 * inf = inf
 5.0 * - inf = -inf
 5.0 / inf = 0.00
 5.0 / - inf = -0.00
 inf / 5.0 = inf
 - inf / 5.0 = -inf

The changes in the patch are compliant with the observations above.

--
Best Wishes,
Ashutosh Bapat

#include <stdio.h>

void test_flag(int flag, char *flag_name);

int main(void)
{
	float	inf = 1.0/0;
	float	n_inf = -1.0/0;

	printf("inf = %f\n", inf);
	printf("-inf = %f\n", n_inf);

	/* Additions */
	printf("inf + inf = %f\n", inf + inf);
	printf("inf + -inf = %f\n", inf + n_inf);
	printf("-inf + inf = %f\n", n_inf + inf);
	printf("-inf + -inf = %f\n", n_inf + n_inf);

	/* Subtractions */
	printf("inf - inf = %f\n", inf - inf);
	printf("inf - -inf = %f\n", inf - n_inf);
	printf("-inf - inf = %f\n", n_inf - inf);
	printf("-inf - -inf = %f\n", n_inf - n_inf);

	if (0.0 == 0.0)
		printf("float 0.0 equals 0.0\n");
	if (0.5 + 0.5 == .64 + .36)
		printf("float 1.0 equals 1.0\n");

	/* Multiplication */
	printf(" 5.0 * inf = %f\n", 5.0 * inf);
	printf(" 5.0 * - inf = %f\n", 5.0 * n_inf);

	/* Division */
	printf(" 5.0 / inf = %f\n", 5.0 / inf);
	printf(" 5.0 / - inf = %f\n", 5.0 / n_inf);
	printf(" inf / 5.0 = %f\n", inf / 5.0);
	printf(" - inf / 5.0 = %f\n", n_inf / 5.0);
}
Re: Infinite Interval
On Mon, Mar 20, 2023 at 3:16 AM Joseph Koshakow wrote: > > > > On Sun, Mar 19, 2023 at 5:13 PM Tom Lane wrote: > > > >Did you actually write "if TIMESTAMP_IS_NOBEGIN(dt2)" and not > >"if (TIMESTAMP_IS_NOBEGIN(dt2))"? If the former, I'm not surprised > >that pgindent gets confused. The parentheses are required by the > >C standard. Your code might accidentally work because the macro > >has parentheses internally, but call sites have no business > >knowing that. For example, it would be completely legit to change > >TIMESTAMP_IS_NOBEGIN to be a plain function, and then this would be > >syntactically incorrect. > > Oh duh. I've been doing too much Rust development and did this without > thinking. I've attached a patch with a fix. > Thanks for fixing this. On this latest patch, I have one code comment @@ -3047,7 +3180,30 @@ timestamptz_pl_interval_internal(TimestampTz timestamp, TimestampTz result; int tz; - if (TIMESTAMP_NOT_FINITE(timestamp)) + /* +* Adding two infinites with the same sign results in an infinite +* timestamp with the same sign. Adding two infintes with different signs +* results in an error. +*/ + if (INTERVAL_IS_NOBEGIN(span)) + { + if TIMESTAMP_IS_NOEND(timestamp) + ereport(ERROR, + (errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE), +errmsg("interval out of range"))); + else + TIMESTAMP_NOBEGIN(result); + } + else if (INTERVAL_IS_NOEND(span)) + { + if TIMESTAMP_IS_NOBEGIN(timestamp) + ereport(ERROR, + (errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE), +errmsg("interval out of range"))); + else + TIMESTAMP_NOEND(result); + } + else if (TIMESTAMP_NOT_FINITE(timestamp)) This code is duplicated in timestamp_pl_interval(). We could create a function to encode the infinity handling rules and then call it in these two places. The argument types are different, Timestamp and TimestampTz viz. which map to in64, so shouldn't be a problem. But it will be slightly unreadable. Or use macros but then it will be difficult to debug. What do you think? Next I will review the test changes and also make sure that every operator that interval as one of its operands or the result has been covered in the code. This is the following list #select oprname, oprcode from pg_operator where oprleft = 'interval'::regtype or oprright = 'interval'::regtype or oprresult = 'interval'::regtype; oprname | oprcode -+- + | date_pl_interval - | date_mi_interval + | timestamptz_pl_interval - | timestamptz_mi - | timestamptz_mi_interval = | interval_eq <> | interval_ne < | interval_lt <= | interval_le > | interval_gt >= | interval_ge - | interval_um + | interval_pl - | interval_mi - | time_mi_time * | interval_mul * | mul_d_interval / | interval_div + | time_pl_interval - | time_mi_interval + | timetz_pl_interval - | timetz_mi_interval + | interval_pl_time + | timestamp_pl_interval - | timestamp_mi - | timestamp_mi_interval + | interval_pl_date + | interval_pl_timetz + | interval_pl_timestamp + | interval_pl_timestamptz (30 rows) -- Best Wishes, Ashutosh Bapat
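To illustrate the proposed refactoring, a rough sketch of such a shared helper might look like the one below. Both Timestamp and TimestampTz map to int64, and the macro and error names are taken from the hunk quoted above; the function name is made up for illustration and this is not part of the posted patch:

/*
 * Sketch only: encode the infinity-handling rules for "timestamp +
 * interval" once, so that timestamp_pl_interval() and
 * timestamptz_pl_interval() can share them.  Returns true if the
 * addition was fully handled here (result set to an infinity), false
 * if the caller should proceed with finite arithmetic.
 */
static bool
infinite_interval_pl_timestamp(int64 timestamp, const Interval *span,
							   int64 *result)
{
	if (INTERVAL_IS_NOBEGIN(span))
	{
		if (TIMESTAMP_IS_NOEND(timestamp))
			ereport(ERROR,
					(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
					 errmsg("interval out of range")));
		TIMESTAMP_NOBEGIN(*result);
		return true;
	}
	else if (INTERVAL_IS_NOEND(span))
	{
		if (TIMESTAMP_IS_NOBEGIN(timestamp))
			ereport(ERROR,
					(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
					 errmsg("interval out of range")));
		TIMESTAMP_NOEND(*result);
		return true;
	}
	return false;
}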
Re: TAP output format in pg_regress
> On 28 Mar 2023, at 15:26, Daniel Gustafsson wrote: > I think the attached is a good candidate for going in, but I would like to > see it > for another spin in the CF bot first. Another candidate due to a thinko which raised a compiler warning. -- Daniel Gustafsson v19-0001-pg_regress-Emit-TAP-compliant-output.patch Description: Binary data
Re: Move definition of standard collations from initdb to pg_collation.dat
On 28.03.23 13:33, Tom Lane wrote:
> While we're here, do we want to adopt some other spelling of "the root
> locale" than "und", in view of recent discoveries about the instability
> of that on old ICU versions?

That issue was fixed by 3b50275b12, so we can keep using the "und" spelling.
Re: Request for comment on setting binary format output per session
On Sun, 26 Mar 2023 at 21:30, Tom Lane wrote:
> Dave Cramer writes:
> > On Sun, 26 Mar 2023 at 18:12, Tom Lane wrote:
> >> I would not expect DISCARD ALL to reset a session-level property.
>
> > Well if we can't reset it with DISCARD ALL how would that work with
> > pgbouncer, or any pool for that matter since it doesn't know which client
> > asked for which (if any) OID's to be binary.
>
> Well, it'd need to know that, just like it already needs to know
> which clients asked for which database or which login role.

OK, IIUC what you are proposing here is that there would be a separate pool for each combination of database, user, and OID list. That doesn't seem very flexible. For instance, if I create a UDT and then want it returned as binary, I have to reconfigure the pool to accept a new list of OIDs. Am I misunderstanding how this would potentially work?

Dave
Re: Support logical replication of global object commands
> I think that there are some (possibly) tricky challenges that haven't > been discussed yet to support replicating global objects. > > First, as for publications having global objects (roles, databases, > and tablespaces), but storing them in database specific tables like > pg_publication doesn't make sense, because it should be at some shared > place where all databases can have access to it. Maybe we need to have > a shared catalog like pg_shpublication or pg_publication_role to store > publications related to global objects or the relationship between > such publications and global objects. Second, we might need to change > the logical decoding infrastructure so that it's aware of shared > catalog changes. Thanks for the feedback. This is insightful. > Currently we need to scan only db-specific catalogs. > Finally, since we process CREATE DATABASE in a different way than > other DDLs (by cloning another database such as template1), simply > replicating the CREATE DATABASE statement would not produce the same > results as the publisher. Also, since event triggers are not fired on > DDLs for global objects, always WAL-logging such DDL statements like > the proposed patch does is not a good idea. > Given that there seems to be some tricky problems and there is a > discussion for cutting the scope to make the initial patch small[1], I > think it's better to do this work after the first version. Agreed. Regards, Zane
Re: Kerberos delegation support in libpq and postgres_fdw
Greetings, * Stephen Frost (sfr...@snowman.net) wrote: > * Greg Stark (st...@mit.edu) wrote: > > The CFBot says there's a function be_gssapi_get_proxy() which is > > undefined. Presumably this is a missing #ifdef or a definition that > > should be outside an #ifdef. > > Yup, just a couple of missing #ifdef's. > > Updated and rebased patch attached. ... and a few more. Apparently hacking on a plane without enough sleep leads to changing ... and unchanging configure flags before testing. Thanks, Stephen From 9808fd4eb4920e40468709af7325a27b18066cec Mon Sep 17 00:00:00 2001 From: Stephen Frost Date: Mon, 28 Feb 2022 20:17:55 -0500 Subject: [PATCH] Add support for Kerberos credential delegation Support GSSAPI/Kerberos credentials being delegated to the server by a client. With this, a user authenticating to PostgreSQL using Kerberos (GSSAPI) credentials can choose to delegate their credentials to the PostgreSQL server (which can choose to accept them, or not), allowing the server to then use those delegated credentials to connect to another service, such as with postgres_fdw or dblink or theoretically any other service which is able to be authenticated using Kerberos. Both postgres_fdw and dblink are changed to allow non-superuser password-less connections but only when GSSAPI credentials have been proxied to the server by the client and GSSAPI is used to authenticate to the remote system. Authors: Stephen Frost, Peifeng Qiu Discussion: https://postgr.es/m/co1pr05mb8023cc2cb575e0faad7df4f8a8...@co1pr05mb8023.namprd05.prod.outlook.com --- contrib/dblink/dblink.c | 127 --- contrib/dblink/expected/dblink.out| 4 +- contrib/postgres_fdw/connection.c | 72 +++- .../postgres_fdw/expected/postgres_fdw.out| 19 +- contrib/postgres_fdw/option.c | 3 + contrib/postgres_fdw/sql/postgres_fdw.sql | 3 +- doc/src/sgml/config.sgml | 17 + doc/src/sgml/dblink.sgml | 5 +- doc/src/sgml/libpq.sgml | 41 +++ doc/src/sgml/monitoring.sgml | 9 + doc/src/sgml/postgres-fdw.sgml| 5 +- src/backend/catalog/system_views.sql | 3 +- src/backend/foreign/foreign.c | 1 + src/backend/libpq/auth.c | 13 +- src/backend/libpq/be-gssapi-common.c | 51 +++ src/backend/libpq/be-secure-gssapi.c | 26 +- src/backend/utils/activity/backend_status.c | 1 + src/backend/utils/adt/pgstatfuncs.c | 20 +- src/backend/utils/init/postinit.c | 8 +- src/backend/utils/misc/guc_tables.c | 10 + src/backend/utils/misc/postgresql.conf.sample | 1 + src/include/catalog/pg_proc.dat | 6 +- src/include/libpq/auth.h | 1 + src/include/libpq/be-gssapi-common.h | 3 + src/include/libpq/libpq-be.h | 2 + src/include/utils/backend_status.h| 1 + src/interfaces/libpq/exports.txt | 1 + src/interfaces/libpq/fe-auth.c| 15 +- src/interfaces/libpq/fe-connect.c | 17 + src/interfaces/libpq/fe-secure-gssapi.c | 23 +- src/interfaces/libpq/libpq-fe.h | 1 + src/interfaces/libpq/libpq-int.h | 2 + src/test/kerberos/Makefile| 3 + src/test/kerberos/t/001_auth.pl | 335 -- src/test/perl/PostgreSQL/Test/Utils.pm| 27 ++ src/test/regress/expected/rules.out | 11 +- 36 files changed, 752 insertions(+), 135 deletions(-) diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c index 78a8bcee6e..d4dd338055 100644 --- a/contrib/dblink/dblink.c +++ b/contrib/dblink/dblink.c @@ -48,6 +48,7 @@ #include "funcapi.h" #include "lib/stringinfo.h" #include "libpq-fe.h" +#include "libpq/libpq-be.h" #include "libpq/libpq-be-fe-helpers.h" #include "mb/pg_wchar.h" #include "miscadmin.h" @@ -111,7 +112,8 @@ static HeapTuple get_tuple_of_interest(Relation rel, int *pkattnums, int pknumat static Relation 
get_rel_from_relname(text *relname_text, LOCKMODE lockmode, AclMode aclmode); static char *generate_relation_name(Relation rel); static void dblink_connstr_check(const char *connstr); -static void dblink_security_check(PGconn *conn, remoteConn *rconn); +static bool dblink_connstr_has_pw(const char *connstr); +static void dblink_security_check(PGconn *conn, remoteConn *rconn, const char *connstr); static void dblink_res_error(PGconn *conn, const char *conname, PGresult *res, bool fail, const char *fmt,...) pg_attribute_printf(5, 6); static char *get_connect_string(const char *servername); @@ -213,7 +215,7 @@ dblink_get_conn(char *conname_or_str, errmsg("could not establish connection"), errdetail_internal("%s", msg))); } - dblink_security_check(conn, rconn); + dblink_security_check(conn, rconn, connstr); if (PQcl
Re: Missing update of all_hasnulls in BRIN opclasses
Hi, It took me a while but I finally got back to reworking this to use the bt_info bit, as proposed by Alvaro. And it turned out to work great, because (a) it's a tuple-level flag, i.e. the right place, and (b) it does not overload existing flags. This greatly simplified the code in add_values_to_range and (especially) union_tuples, making it much easier to understand, I think. One disadvantage is we are unable to see which ranges are empty in current pageinspect, but 0002 addresses that by adding "empty" column to the brin_page_items() output. That's a matter for master only, though. It's a trivial patch and it makes it easier/possible to test this, so we should consider to squeeze it into PG16. I did quite a bit of testing - the attached 0003 adds extra tests, but I don't propose to get this committed as is - it's rather overkill. Maybe some reduced version of it ... The hardest thing to test is the union_tuples() part, as it requires concurrent operations with "correct" timing. Easy to simulate by breakpoints in GDB, not so much in plain regression/TAP tests. There's also a stress tests, doing a lot of randomized summarizations, etc. Without the fix this failed in maybe 30% of runs, now I did ~100 runs without a single failure. I haven't done any backporting, but I think it should be simpler than with the earlier approach. I wonder if we need to care about starting to use the previously unused bit - I don't think so, in the worst case we'll just ignore it, but maybe I'm missing something (e.g. when using physical replication). regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL CompanyFrom 10efaf9964806e5a30818994e3cfda879bb90171 Mon Sep 17 00:00:00 2001 From: Tomas Vondra Date: Sun, 8 Jan 2023 16:43:06 +0100 Subject: [PATCH 1/3] Fix handling of NULLs in BRIN indexes BRIN indexes did not properly distinguish between summaries for empty (no rows) and all-NULL ranges. All summaries were initialized with allnulls=true, and the opclasses simply reset allnulls to false when processing the first non-NULL value. This however fails if the range starts with a NULL value (or a sequence of NULL values), in which case we forget the range contains NULL values. This happens because the allnulls flag is used for two separate purposes - to mark empty ranges (not representing any rows yet) and ranges containing only NULL values. Opclasses don't know which of these cases it is, and so don't know whether to set hasnulls=true. Setting hasnulls=true in both cases would make it correct, but it would also make BRIN indexes useless for queries with IS NULL clauses - all ranges start empty (and thus allnulls=true), so all ranges would end up with either allnulls=true or hasnulls=true. The severity of the issue is somewhat reduced by the fact that it only happens when adding values to an existing summary with allnulls=true, not when the summarization is processing values in bulk (e.g. during CREATE INDEX or automatic summarization). In this case the flags were updated in a slightly different way, not forgetting the NULL values. This introduces a new a new flag marking index tuples representing ranges with no rows. Luckily we have an unused tuple in the BRIN tuple header that we can use for this. We still store index tuples for empty ranges, because otherwise we'd not be able to say whether a range is empty or not yet summarized, and we'd have to process them for any query. Backpatch to 11. 
The issue exists since BRIN indexes were introduced in 9.5, but older releases are already EOL. Backpatch-through: 11 Reviewed-by: Alvaro Herrera, Justin Pryzby, Matthias van de Meent Discussion: https://postgr.es/m/402430e4-7d9d-6cf1-09ef-464d80aff...@enterprisedb.com --- src/backend/access/brin/brin.c| 115 +- src/backend/access/brin/brin_tuple.c | 15 ++- src/include/access/brin_tuple.h | 6 +- ...summarization-and-inprogress-insertion.out | 8 +- ...ummarization-and-inprogress-insertion.spec | 1 + 5 files changed, 137 insertions(+), 8 deletions(-) diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c index 53e4721a54e..162a0c052aa 100644 --- a/src/backend/access/brin/brin.c +++ b/src/backend/access/brin/brin.c @@ -592,6 +592,17 @@ bringetbitmap(IndexScanDesc scan, TIDBitmap *tbm) bval = &dtup->bt_columns[attno - 1]; + /* + * If the BRIN tuple indicates that this range is empty, + * we can skip it: there's nothing to match. We don't + * need to examine the next columns. + */ + if (dtup->bt_empty_range) + { + addrange = false; + break; + } + /* * First check if there are any IS [NOT] NULL scan keys, * and if we're violating them. In that case we can @@ -1590,6 +1601,8 @@ form_and_insert_tuple(BrinBuildState *state) /* * Given two deformed tuples, adjust the first one so that it's consistent * with the summary values
Re: "variable not found in subplan target list"
I wrote: > The planner is reducing the scan of target_parted to > a dummy scan, as is reasonable, but it forgets to > provide ctid as an output from that scan; then the > parent join node is unhappy because it does have > a ctid output. So it looks like the problem is some > shortcut we take while creating the dummy scan. Oh, actually the problem is in distribute_row_identity_vars, which is supposed to handle this case, but it thinks it doesn't have to back-fill the rel's reltarget. Wrong. Now that I see the problem, I wonder if we can't reproduce a similar symptom without MERGE, which would mean that v14 has the issue too. The attached seems to fix it, but I'm going to look for a non-MERGE test case before pushing. regards, tom lane diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c index 9d377385f1..c1b1557570 100644 --- a/src/backend/optimizer/util/appendinfo.c +++ b/src/backend/optimizer/util/appendinfo.c @@ -21,6 +21,7 @@ #include "nodes/nodeFuncs.h" #include "optimizer/appendinfo.h" #include "optimizer/pathnode.h" +#include "optimizer/planmain.h" #include "parser/parsetree.h" #include "utils/lsyscache.h" #include "utils/rel.h" @@ -994,9 +995,10 @@ distribute_row_identity_vars(PlannerInfo *root) * certainly process no rows. Handle this edge case by re-opening the top * result relation and adding the row identity columns it would have used, * as preprocess_targetlist() would have done if it weren't marked "inh". - * (This is a bit ugly, but it seems better to confine the ugliness and - * extra cycles to this unusual corner case.) We needn't worry about - * fixing the rel's reltarget, as that won't affect the finished plan. + * Then re-run build_base_rel_tlists() to ensure that the added columns + * get propagated to the relation's reltarget. (This is a bit ugly, but + * it seems better to confine the ugliness and extra cycles to this + * unusual corner case.) */ if (root->row_identity_vars == NIL) { @@ -1006,6 +1008,8 @@ distribute_row_identity_vars(PlannerInfo *root) add_row_identity_columns(root, result_relation, target_rte, target_relation); table_close(target_relation, NoLock); + build_base_rel_tlists(root, root->processed_tlist); + /* There are no ROWID_VAR Vars in this case, so we're done. */ return; }
Re: Support logical replication of DDLs
On 3/27/23 2:37 AM, Amit Kapila wrote:
> On Mon, Mar 27, 2023 at 2:52 AM Tom Lane wrote:
> > And TBH, I don't think that I quite believe the premise in the first
> > place. The whole point of using logical rather than physical replication
> > is that the subscriber installation(s) aren't exactly like the publisher.
> > Given that, how can we expect that automated DDL replication is going to
> > do the right thing often enough to be a useful tool rather than a
> > disastrous foot-gun?
>
> One of the major use cases as mentioned in the initial email was for online
> version upgrades. And also, people would be happy to automatically sync the
> schema for cases where the logical replication is set up to get a subset of
> the data via features like row filters. Having said that, I agree with you
> that it is very important to define the scope of this feature if we want to
> see it becoming reality.

To echo Amit, this is actually one area where PostgreSQL replication lags behind (no pun intended) other mature RDBMSes. As Amit says, the principal use case is around major version upgrades, but also migration between systems or moving data/schemas between systems that speak the PostgreSQL protocol. All of these are becoming increasingly common as PostgreSQL is taking on more workloads that are sensitive to downtime or are distributed in nature.

There are definitely footguns with logical replication of DDL -- I've seen this from reading other manuals that support this feature and in my own experiments. However, like many features, users have strategies they use to avoid footgun scenarios. For example, in systems that use logical replication as part of their HA, users will either:

* Not replicate DDL, but use some sort of rolling orchestration process to place it on each instance
* Replicate DDL, but coordinate it with some kind of global lock
* Replicate only a subset of DDL, possibly with lock coordination

I'll comment on the patch scope further downthread. I agree it's very big -- I had given some of that feedback privately a few months back -- and it could benefit from the "step back, holistic review." For example, I was surprised that a fairly common pattern[1] did not work due to changes we made when addressing a CVE (some follow-up work was proposed but we haven't done it yet).

I do agree this patch would benefit from stepping back, and I do think we can work through many of the issues. From listening to users and prospective users, it's pretty clear we need to support DDL replication in some capacity.

Thanks,
Jonathan

[1] https://www.postgresql.org/message-id/263bea1c-a897-417d-3765-ba6e1e24711e%40postgresql.org

OpenPGP_signature Description: OpenPGP digital signature
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
Hi Jelte, I had a look into your patchset (v16), did a quick review and played a bit with the feature. Patch 2 is missing the documentation about PQcancelSocket() and contains a few typos; please find attached a (fixup) patch to correct these. --- a/src/interfaces/libpq/libpq-fe.h +++ b/src/interfaces/libpq/libpq-fe.h @@ -321,16 +328,28 @@ extern PostgresPollingStatusType PQresetPoll(PGconn *conn); /* Synchronous (blocking) */ extern void PQreset(PGconn *conn); +/* issue a cancel request */ +extern PGcancelConn * PQcancelSend(PGconn *conn); [...] Maybe I'm missing something, but this function above seems a bit strange. Namely, I wonder why it returns a PGcancelConn and what's the point of requiring the user to call PQcancelStatus() to see if something got wrong. Maybe it could be defined as: int PQcancelSend(PGcancelConn *cancelConn); where the return value would be status? And the user would only need to call PQcancelErrorMessage() in case of error. This would leave only one single way to create a PGcancelConn value (i.e. PQcancelConn()), which seems less confusing to me. Jelte Fennema wrote: > Especially since I ran into another use case that I would want to use > this patch for recently: Adding an async cancel function to Python > it's psycopg3 library. This library exposes both a Connection class > and an AsyncConnection class (using python its asyncio feature). But > one downside of the AsyncConnection type is that it doesn't have a > cancel method. As part of my testing, I've implemented non-blocking cancellation in Psycopg, based on v16 on this patchset. Overall this worked fine and seems useful; if you want to try it: https://github.com/dlax/psycopg3/tree/pg16/non-blocking-pqcancel (The only thing I found slightly inconvenient is the need to convey the connection encoding (from PGconn) when handling error message from the PGcancelConn.) Cheers, Denis >From a5f9cc680ffa520b05fe34b7cac5df2e60a6d4ad Mon Sep 17 00:00:00 2001 From: Denis Laxalde Date: Tue, 28 Mar 2023 16:06:42 +0200 Subject: [PATCH] fixup! Add non-blocking version of PQcancel --- doc/src/sgml/libpq.sgml | 16 +++- src/interfaces/libpq/fe-connect.c| 2 +- src/test/modules/libpq_pipeline/libpq_pipeline.c | 2 +- 3 files changed, 17 insertions(+), 3 deletions(-) diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml index 29f08a4317..aa404c4d15 100644 --- a/doc/src/sgml/libpq.sgml +++ b/doc/src/sgml/libpq.sgml @@ -5795,7 +5795,7 @@ PGcancelConn *PQcancelSend(PGconn *conn); options as the the original PGconn. So when the original connection is encrypted (using TLS or GSS), the connection for the cancel request is encrypted in the same way. Any connection options - that only are only used during authentication or after authentication of + that are only used during authentication or after authentication of the client are ignored though, because cancellation requests do not require authentication and the connection is closed right after the cancellation request is submitted. @@ -5912,6 +5912,20 @@ ConnStatusType PQcancelStatus(const PGcancelConn *conn); + + PQcancelSocketPQcancelSocket + + + + A version of that can be used for + cancellation connections. 
+ +int PQcancelSocket(PGcancelConn *conn); + + + + + PQcancelPollPQcancelPoll diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c index 74e337fddf..16af7303d4 100644 --- a/src/interfaces/libpq/fe-connect.c +++ b/src/interfaces/libpq/fe-connect.c @@ -808,7 +808,7 @@ PQcancelConn(PGconn *conn) /* * Cancel requests should not iterate over all possible hosts. The request * needs to be sent to the exact host and address that the original - * connection used. So we we manually create the host and address arrays + * connection used. So we manually create the host and address arrays * with a single element after freeing the host array that we generated * from the connection options. */ diff --git a/src/test/modules/libpq_pipeline/libpq_pipeline.c b/src/test/modules/libpq_pipeline/libpq_pipeline.c index e8e904892c..6764ab513b 100644 --- a/src/test/modules/libpq_pipeline/libpq_pipeline.c +++ b/src/test/modules/libpq_pipeline/libpq_pipeline.c @@ -89,7 +89,7 @@ pg_fatal_impl(int line, const char *fmt,...) /* * Check that the query on the given connection got cancelled. * - * This is a function wrapped in a macrco to make the reported line number + * This is a function wrapped in a macro to make the reported line number * in an error match the line number of the invocation. */ #define confirm_query_cancelled(conn) confirm_query_cancelled_impl(__LINE__, conn) -- 2.30.2
Re: Memory leak from ExecutorState context?
Hi, Sorry for the late answer, I was reviewing the first patch and it took me some time to study and dig around. On Thu, 23 Mar 2023 08:07:04 -0400 Melanie Plageman wrote: > On Fri, Mar 10, 2023 at 1:51 PM Jehan-Guillaume de Rorthais > wrote: > > > So I guess the best thing would be to go through these threads, see what > > > the status is, restart the discussion and propose what to do. If you do > > > that, I'm happy to rebase the patches, and maybe see if I could improve > > > them in some way. > > > > OK! It took me some time, but I did it. I'll try to sum up the situation as > > simply as possible. > > Wow, so many memories! > > I'm excited that someone looked at this old work (though it is sad that > a customer faced this issue). And, Jehan, I really appreciate your great > summarization of all these threads. This will be a useful reference. Thank you! > > 1. "move BufFile stuff into separate context" > > [...] > >I suppose this simple one has been forgotten in the fog of all other > >discussions. Also, this probably worth to be backpatched. > > I agree with Jehan-Guillaume and Tomas that this seems fine to commit > alone. This is a WIP. > > 2. "account for size of BatchFile structure in hashJoin" > > [...] > > I think I would have to see a modern version of a patch which does this > to assess if it makes sense. But, I probably still agree with 2019 > Melanie :) I volunteer to work on this after the memory context patch, unless someone grab it in the meantime. > [...] > On Mon, Mar 20, 2023 at 10:12 AM Jehan-Guillaume de Rorthais > wrote: > > BNJL and/or other considerations are for 17 or even after. In the meantime, > > Melanie, who authored BNLJ, +1 the balancing patch as it can coexists with > > other discussed solutions. No one down vote since then. Melanie, what is > > your opinion today on this patch? Did you change your mind as you worked > > for many months on BNLJ since then? > > So, in order to avoid deadlock, my design of adaptive hash join/block > nested loop hash join required a new parallelism concept not yet in > Postgres at the time -- the idea of a lone worker remaining around to do > work when others have left. > > See: BarrierArriveAndDetachExceptLast() > introduced in 7888b09994 > > Thomas Munro had suggested we needed to battle test this concept in a > more straightforward feature first, so I implemented parallel full outer > hash join and parallel right outer hash join with it. > > https://commitfest.postgresql.org/42/2903/ > > This has been stalled ready-for-committer for two years. It happened to > change timing such that it made an existing rarely hit parallel hash > join bug more likely to be hit. Thomas recently committed our fix for > this in 8d578b9b2e37a4d (last week). It is my great hope that parallel > full outer hash join goes in before the 16 feature freeze. This is really interesting to follow. I kinda feel/remember how this could be useful for your BNLJ patch. It's good to see things are moving, step by step. Thanks for the pointers. > If it does, I think it could make sense to try and find committable > smaller pieces of the adaptive hash join work. As it is today, parallel > hash join does not respect work_mem, and, in some sense, is a bit broken. > > I would be happy to work on this feature again, or, if you were > interested in picking it up, to provide review and any help I can if for > you to work on it. I don't think I would be able to pick up such a large and complex patch. But I'm interested to help, test and review, as far as I can! 
Regards,
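As a concrete reference for item 1 above (the dedicated BufFile memory context), the shape under discussion is roughly the sketch below — not the submitted patch itself, and using the hashCxt/fileCxt naming that appears elsewhere in this thread:

/*
 * Sketch of the "separate context for batch files" idea: give the batch
 * BufFiles their own child context so their buffers no longer accumulate
 * in the long-lived ExecutorState context, and switch to it whenever a
 * batch file is created.
 */
hashtable->fileCxt = AllocSetContextCreate(hashtable->hashCxt,
										   "HashBatchFiles",
										   ALLOCSET_DEFAULT_SIZES);

/* ... later, each time a new batch temp file is needed ... */
oldcxt = MemoryContextSwitchTo(hashtable->fileCxt);
file = BufFileCreateTemp(false);
MemoryContextSwitchTo(oldcxt);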
Re: Initial Schema Sync for Logical Replication
On Tue, Mar 28, 2023 at 6:47 PM Amit Kapila wrote: > > On Mon, Mar 27, 2023 at 8:17 AM Masahiko Sawada wrote: > > > > On Fri, Mar 24, 2023 at 11:51 PM Kumar, Sachin wrote: > > > > > > > From: Amit Kapila > > > > > I think we won't be able to use same snapshot because the transaction > > > > > will > > > > > be committed. > > > > > In CreateSubscription() we can use the transaction snapshot from > > > > > walrcv_create_slot() till walrcv_disconnect() is called.(I am not sure > > > > > about this part maybe walrcv_disconnect() calls the commits > > > > > internally ?). > > > > > So somehow we need to keep this snapshot alive, even after transaction > > > > > is committed(or delay committing the transaction , but we can have > > > > > CREATE SUBSCRIPTION with ENABLED=FALSE, so we can have a restart > > > > > before tableSync is able to use the same snapshot.) > > > > > > > > > > > > > Can we think of getting the table data as well along with schema via > > > > pg_dump? Won't then both schema and initial data will correspond to the > > > > same snapshot? > > > > > > Right , that will work, Thanks! > > > > While it works, we cannot get the initial data in parallel, no? > > > > Another possibility is that we dump/restore the schema of each table > along with its data. One thing we can explore is whether the parallel > option of dump can be useful here. Do you have any other ideas? A downside of the idea of dumping both table schema and table data would be that we need to temporarily store data twice the size of the table (the dump file and the table itself) during the load. One might think that we can redirect the pg_dump output into the backend so that it can load it via SPI, but it doesn't work since "COPY tbl FROM stdin;" doesn't work via SPI. The --inserts option of pg_dump could help it out but it makes restoration very slow. > > One related idea is that currently, we fetch the table list > corresponding to publications in subscription and create the entries > for those in pg_subscription_rel during Create Subscription, can we > think of postponing that work till after the initial schema sync? We > seem to be already storing publications list in pg_subscription, so it > appears possible if we somehow remember the value of copy_data. If > this is feasible then I think that may give us the flexibility to > perform the initial sync at a later point by the background worker. It sounds possible. With this idea, we will be able to have the apply worker restore the table schemas (and create pg_subscription_rel entries) as the first thing. Another point we might need to consider is that the initial schema sync (i.e. creating tables) and creating pg_subscription_rel entries need to be done in the same transaction. Otherwise, we could end up committing either one change. I think it depends on how we restore the schema data. > > > > > > > > > I think we can have same issues as you mentioned New table t1 is added > > > > > to the publication , User does a refresh publication. > > > > > pg_dump / pg_restore restores the table definition. But before > > > > > tableSync can start, steps from 2 to 5 happen on the publisher. > > > > > > 1. Create Table t1(c1, c2); --LSN: 90 2. Insert t1 (1, 1); --LSN 100 > > > > > > 3. Insert t1 (2, 2); --LSN 110 4. Alter t1 Add Column c3; --LSN 120 > > > > > > 5. Insert t1 (3, 3, 3); --LSN 130 > > > > > And table sync errors out > > > > > There can be one more issue , since we took the pg_dump without > > > > snapshot (wrt to replication slot). 
> > > > > > > > > > > > > To avoid both the problems mentioned for Refresh Publication, we can do > > > > one of the following: (a) create a new slot along with a snapshot for > > > > this > > > > operation and drop it afterward; or (b) using the existing slot, > > > > establish a > > > > new snapshot using a technique proposed in email [1]. > > > > > > > > > > Thanks, I think option (b) will be perfect, since we don’t have to create > > > a new slot. > > > > Regarding (b), does it mean that apply worker stops streaming, > > requests to create a snapshot, and then resumes the streaming? > > > > Shouldn't this be done by the backend performing a REFRESH publication? Hmm, I might be missing something but the idea (b) uses the existing slot to establish a new snapshot, right? What existing replication slot do we use for that? I thought it was the one used by the apply worker. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: Memory leak from ExecutorState context?
On 3/28/23 15:17, Jehan-Guillaume de Rorthais wrote: > On Tue, 28 Mar 2023 00:43:34 +0200 > Tomas Vondra wrote: > >> On 3/27/23 23:13, Jehan-Guillaume de Rorthais wrote: >>> Please, find in attachment a patch to allocate bufFiles in a dedicated >>> context. I picked up your patch, backpatch'd it, went through it and did >>> some minor changes to it. I have some comment/questions thought. >>> >>> 1. I'm not sure why we must allocate the "HashBatchFiles" new context >>> under ExecutorState and not under hashtable->hashCxt? >>> >>> The only references I could find was in hashjoin.h:30: >>> >>>/* [...] >>> * [...] (Exception: data associated with the temp files lives in the >>> * per-query context too, since we always call buffile.c in that >>> context.) >>> >>> And in nodeHashjoin.c:1243:ExecHashJoinSaveTuple() (I reworded this >>> original comment in the patch): >>> >>>/* [...] >>> * Note: it is important always to call this in the regular executor >>> * context, not in a shorter-lived context; else the temp file buffers >>> * will get messed up. >>> >>> >>> But these are not explanation of why BufFile related allocations must be >>> under a per-query context. >>> >> >> Doesn't that simply describe the current (unpatched) behavior where >> BufFile is allocated in the per-query context? > > I wasn't sure. The first quote from hashjoin.h seems to describe a stronger > rule about «**always** call buffile.c in per-query context». But maybe it > ought > to be «always call buffile.c from one of the sub-query context»? I assume the > aim is to enforce the tmp files removal on query end/error? > I don't think we need this info for tempfile cleanup - CleanupTempFiles relies on the VfdCache, which does malloc/realloc (so it's not using memory contexts at all). I'm not very familiar with this part of the code, so I might be missing something. But you can try that - just stick an elog(ERROR) somewhere into the hashjoin code, and see if that breaks the cleanup. Not an explicit proof, but if there was some hard-wired requirement in which memory context to allocate BufFile stuff, I'd expect it to be documented in buffile.c. But that actually says this: * Note that BufFile structs are allocated with palloc(), and therefore * will go away automatically at query/transaction end. Since the underlying * virtual Files are made with OpenTemporaryFile, all resources for * the file are certain to be cleaned up even if processing is aborted * by ereport(ERROR). The data structures required are made in the * palloc context that was current when the BufFile was created, and * any external resources such as temp files are owned by the ResourceOwner * that was current at that time. which I take as confirmation that it's legal to allocate BufFile in any memory context, and that cleanup is handled by the cache in fd.c. >> I mean, the current code calls BufFileCreateTemp() without switching the >> context, so it's in the ExecutorState. But with the patch it very clearly is >> not. >> >> And I'm pretty sure the patch should do >> >> hashtable->fileCxt = AllocSetContextCreate(hashtable->hashCxt, >> "HashBatchFiles", >> ALLOCSET_DEFAULT_SIZES); >> >> and it'd still work. Or why do you think we *must* allocate it under >> ExecutorState? > > That was actually my very first patch and it indeed worked. But I was confused > about the previous quoted code comments. That's why I kept your original code > and decided to rise the discussion here. 
> IIRC I was just lazy when writing the experimental patch, there was not much thought about stuff like this. > Fixed in new patch in attachment. > >> FWIW The comment in hashjoin.h needs updating to reflect the change. > > Done in the last patch. Is my rewording accurate? > >>> 2. Wrapping each call of ExecHashJoinSaveTuple() with a memory context >>> switch seems fragile as it could be forgotten in futur code path/changes. >>> So I added an Assert() in the function to make sure the current memory >>> context is "HashBatchFiles" as expected. >>> Another way to tie this up might be to pass the memory context as >>> argument to the function. >>> ... Or maybe I'm over precautionary. >>> >> >> I'm not sure I'd call that fragile, we have plenty other code that >> expects the memory context to be set correctly. Not sure about the >> assert, but we don't have similar asserts anywhere else. > > I mostly sticked it there to stimulate the discussion around this as I needed > to scratch that itch. > >> But I think it's just ugly and overly verbose > > +1 > > Your patch was just a demo/debug patch by the time. It needed some cleanup now > :) > >> it'd be much nicer to e.g. pass the memory context as a parameter, and do >> the switch inside. > > That was a proposition in my previous mail, so I did it in the new patch. > Let's > see what other re
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Tue, 28 Mar 2023 at 16:54, Denis Laxalde wrote: > I had a look into your patchset (v16), did a quick review and played a > bit with the feature. > > Patch 2 is missing the documentation about PQcancelSocket() and contains > a few typos; please find attached a (fixup) patch to correct these. Thanks applied that patch and attached a new patchset > Namely, I wonder why it returns a PGcancelConn and what's the > point of requiring the user to call PQcancelStatus() to see if something > got wrong. Maybe it could be defined as: > > int PQcancelSend(PGcancelConn *cancelConn); > > where the return value would be status? And the user would only need to > call PQcancelErrorMessage() in case of error. This would leave only one > single way to create a PGcancelConn value (i.e. PQcancelConn()), which > seems less confusing to me. To clarify what you mean, the API would then be like this: PGcancelConn cancelConn = PQcancelConn(conn); if (PQcancelSend(cancelConn) == CONNECTION_BAD) { printf("ERROR %s\n", PQcancelErrorMessage(cancelConn)) exit(1) } Instead of: PGcancelConn cancelConn = PQcancelSend(conn); if (PQcancelStatus(cancelConn) == CONNECTION_BAD) { printf("ERROR %s\n", PQcancelErrorMessage(cancelConn)) exit(1) } Those are so similar, that I have no preference either way. If more people prefer one over the other I'm happy to change it, but for now I'll keep it as is. > As part of my testing, I've implemented non-blocking cancellation in > Psycopg, based on v16 on this patchset. Overall this worked fine and > seems useful; if you want to try it: > > https://github.com/dlax/psycopg3/tree/pg16/non-blocking-pqcancel That's great to hear! I'll try to take a closer look at that change tomorrow. > (The only thing I found slightly inconvenient is the need to convey the > connection encoding (from PGconn) when handling error message from the > PGcancelConn.) Could you expand a bit more on this? And if you have any idea on how to improve the API with regards to this? v17-0002-Copy-and-store-addrinfo-in-libpq-owned-private-m.patch Description: Binary data v17-0005-Start-using-new-libpq-cancel-APIs.patch Description: Binary data v17-0003-Return-2-from-pqReadData-on-EOF.patch Description: Binary data v17-0004-Add-non-blocking-version-of-PQcancel.patch Description: Binary data v17-0001-libpq-Run-pgindent-after-a9e9a9f32b3.patch Description: Binary data
Re: Add LZ4 compression in pg_dump
--- Original Message --- On Friday, March 24th, 2023 at 10:30 AM, gkokola...@pm.me wrote: > > --- Original Message --- > On Thursday, March 23rd, 2023 at 6:10 PM, Tomas Vondra > tomas.von...@enterprisedb.com wrote: > > > This leaves the empty-data issue (which we have a fix for) and the > > switch to LZ4F. And then the zstd part. > > Please expect promptly a patch for the switch to frames. Please find the expected patch attached. Note that the bulk of the patch is code unification, variable renaming to something more appropriate, and comment addition. These are changes that are not strictly necessary to switch to LZ4F. I do believe that are essential for code hygiene after the switch and they do belong on the same commit. Cheers, //Georgios > > Cheers, > //GeorgiosFrom c289fb8d49b680ad180a44b20fff1dc9553b7494 Mon Sep 17 00:00:00 2001 From: Georgios Kokolatos Date: Tue, 28 Mar 2023 15:48:06 + Subject: [PATCH v1] Use LZ4 frames in pg_dump's compressor API. This change allows for greater compaction of data, especially so in very narrow relations, by avoiding at least a compaction header and footer per row. Since LZ4 frames are now used by both compression APIs, some code deduplication opportunities have become obvious and are also implemented. Reported by: Justin Pryzby --- src/bin/pg_dump/compress_lz4.c | 358 ++--- 1 file changed, 244 insertions(+), 114 deletions(-) diff --git a/src/bin/pg_dump/compress_lz4.c b/src/bin/pg_dump/compress_lz4.c index fc2f4e116d..078dc35cd6 100644 --- a/src/bin/pg_dump/compress_lz4.c +++ b/src/bin/pg_dump/compress_lz4.c @@ -17,7 +17,6 @@ #include "compress_lz4.h" #ifdef USE_LZ4 -#include #include /* @@ -29,102 +28,279 @@ #endif /*-- - * Compressor API - *-- + * Common to both APIs */ -typedef struct LZ4CompressorState +/* + * State used for LZ4 (de)compression by both APIs. + */ +typedef struct LZ4State { - char *outbuf; - size_t outsize; -} LZ4CompressorState; + /* + * Used by the File API to keep track of the file stream. + */ + FILE *fp; + + LZ4F_preferences_t prefs; + + LZ4F_compressionContext_t ctx; + LZ4F_decompressionContext_t dtx; + + /* + * Used by the File API's lazy initialization. + */ + bool inited; + + /* + * Used by the File API to distinguish between compression + * and decompression operations. + */ + bool compressing; + + /* + * Used by the Compressor API to mark if the compression + * headers have been written after initialization. + */ + bool needs_header_flush; + + size_t buflen; + char *buffer; + + /* + * Used by the File API to store already uncompressed + * data that the caller has not consumed. + */ + size_t overflowalloclen; + size_t overflowlen; + char *overflowbuf; + + /* + * Used by both APIs to keep track of the compressed + * data length stored in the buffer. + */ + size_t compressedlen; + + /* + * Used by both APIs to keep track of error codes. + */ + size_t errcode; +} LZ4State; + +/* + * Initialize the required LZ4State members for compression. Write the LZ4 frame + * header in a buffer keeping track of its length. Users of this function can + * choose when and how to write the header to a file stream. + * + * Returns true on success. In case of a failure returns false, and stores the + * error code in state->errcode. + */ +static bool +LZ4_compression_state_init(LZ4State *state) +{ + size_t status; + + state->buflen = LZ4F_compressBound(DEFAULT_IO_BUFFER_SIZE, &state->prefs); + + /* + * LZ4F_compressBegin requires a buffer that is greater or equal to + * LZ4F_HEADER_SIZE_MAX. Verify that the requirement is met. 
+ */ + if (state->buflen < LZ4F_HEADER_SIZE_MAX) + state->buflen = LZ4F_HEADER_SIZE_MAX; + + status = LZ4F_createCompressionContext(&state->ctx, LZ4F_VERSION); + if (LZ4F_isError(status)) + { + state->errcode = status; + return false; + } + + state->buffer = pg_malloc(state->buflen); + status = LZ4F_compressBegin(state->ctx, +state->buffer, state->buflen, + &state->prefs); + if (LZ4F_isError(status)) + { + state->errcode = status; + return false; + } + + state->compressedlen = status; + + return true; +} + +/*-- + * Compressor API + *-- + */ /* Private routines that support LZ4 compressed data I/O */ -static void ReadDataFromArchiveLZ4(ArchiveHandle *AH, CompressorState *cs); -static void WriteDataToArchiveLZ4(ArchiveHandle *AH, CompressorState *cs, - const void *data, size_t dLen); -static void EndCompressorLZ4(ArchiveHandle *AH, CompressorState *cs); static void ReadDataFromArchiveLZ4(ArchiveHandle *AH, CompressorState *cs) { - LZ4_streamDecode_t lz4StreamDecode; - char *buf; - char *decbuf; - size_t buflen; - size_t cnt; - - buflen = DEFAULT_IO_BUFFER_SIZE; - buf = pg_malloc(buflen); - decbuf = pg_malloc(buflen); + size_t r; + size_t readbuflen; + char *outbuf; + char *re
Re: Commitfest 2023-03 starting tomorrow!
Status summary: Needs review: 116. Waiting on Author: 30. Ready for Committer: 32. Committed: 94. Moved to next CF: 17. Returned with Feedback: 6. Rejected: 6. Withdrawn: 18. Total: 319. Ok, here are the patches that have been stuck in "Waiting on Author" for a while. I divided them into three groups. * The first group have been stuck for over a week and mostly look like they should be RwF. Some I guess just moved to next CF. But some of them I'm uncertain if I should leave or if they really should be RfC or NR. * The other two groups have had some updates in the last week (actually I used 10 days). But some of them still look like they're pretty much dead for this CF and should either be moved forward or Rwf or Rejected. So here's the triage list. I'm going to send emails and start clearing out the patches pretty much right away. Some of these are pretty clearcut. Nothing in over a week: -- * Better infrastructure for automated testing of concurrency issues - Consensus that this is desirable. But it's not clear what it's actually waiting on Author for. RwF? * Provide the facility to set binary format output for specific OID's per session - I think Dave was looking for feedback and got it from Tom and Peter. I don't actually see a specific patch here but there are two patches linked in the original message. There seems to be enough feedback to proceed but nobody's working on it. RwF? * pg_visibility's pg_check_visible() yields false positive when working in parallel with autovacuum - Bug, but tentatively a false positive... * CAST( ... ON DEFAULT) - it'll have to wait till there's something solid from the committee" -- Rejected? * Fix tab completion MERGE - Partly committed but v9-0002-psql-Add-PartialMatches-macro-for-better-tab-complet.patch remains. There was a review from Dean Rasheed. Move to next CF? * Fix recovery conflict SIGUSR1 handling - This looks like a suite of bug fixes and looks like it should be Needs Review or Ready for Commit * Prefetch the next tuple's memory during seqscans - David Rowley said it was dependeny on "heapgettup() refactoring" which has been refactored. So is it now Needs Review or Ready for Commit? Is it likely to happen this CF? * Pluggable toaster - This seems to have digressed from the original patch. There were patches early on and a lot of feedback. Is the result that the original patches are Rejected or are they still live? * psql - refactor echo code - "I think this patch requires an up-to-date summary and explanation" from Peter. But it seems like Tom was ok with it and just had some additional improvements he wanted that were added. It sounds like this might be "Ready for Commit" if someone familiar with the patch looked at it. * Push aggregation down to base relations and joins - Needs a significant rebase (since March 1). * Remove self join on a unique column - An offer of a Bounty! There was one failing test which was apparently fixed? But it looks like this should be in Needs Review or Ready for Commit. * Split index and table statistics into different types of stats - Was stuck on "Generate pg_stat_get_xact*() functions with Macros" which was committed. So "Ready for Commit" now? * Default to ICU during initdb - Partly committed, 0001 waiting until after CF * suppressing useless wakeups in logical/worker.c - Got feedback March 17. Doesn't look like it's going to be ready this CF. 
* explain analyze rows=%.0f - Patch updated January, but I think Tom still had some simple if tedious changes he asked for * Fix order of checking ICU options in initdb and create database - Feedback Last November but no further questions or progress * Introduce array_shuffle() and array_sample() functions - Feedback from Tom last September. No further questions or progress Status Updates in last week: * Some revises in adding sorting path - Got feedback Feb 21 and author responded but doesn't look like it's going to be ready this CF * Add TAP tests for psql \g piped into program - Peter Eisentraut asked for a one-line change, otherwise it looks like it's Ready for Commit? * Improvements to Meson docs - Some feedback March 15 but no response. I assume this is still in play Emails in last week: --- * RADIUS tests and improvements - last feedback March 20, last patch March 4. Should probably be moved to the next CF unless there's progress soon. * Direct SSL Connections - (This is mine) Code for SSL is pretty finished. The last patch for ALPN support needs a bit of polish. I'll be doing that momentarily. * Fix alter subscription concurrency errors - "patch as-submitted is pretty uninteresting" and "patch that I don't care much about" ... I guess this is Rejected or Withdrawn * Fix improper qual pushdown after applying outer join identity 3 - Tom Lane's patch. Active discussion as of M
Re: running logical replication as the subscription owner
On Mon, Mar 27, 2023 at 6:08 PM Jelte Fennema wrote: > Attached is an updated version of your patch with what I had in mind > (admittedly it needed one more line than "just" the return to make it > work). But as you can see all previous tests for a lowly privileged > subscription owner that **cannot** SET ROLE to the table owner > continue to work as they did before. While still downgrading to the > table owners role when the subscription owner **can** SET ROLE to the > table owner. > > Obviously this needs some comments explaining what's going on and > probably some code refactoring and/or variable renaming, but I hope > it's clear what I meant now: For high privileged subscription owners, > we downgrade to the permissions of the table owner, but for low > privileged ones we care about permissions of the subscription owner > itself. Hmm. This is an interesting idea. A variant on this theme could be: what if we made this an explicit configuration option? I'm worried that if we just try to switch users and silently fall back to not doing so when we don't have enough permissions, the resulting behavior is going to be difficult to understand and troubleshoot. I'm thinking that maybe if you let people pick the behavior they want that becomes more comprehensible. It's also a nice insurance policy: say for the sake of argument we make switch-to-table-owner the new default. If that new behavior causes something to happen to somebody that they don't like, they can always turn it off, even if they are a highly privileged user who doesn't "need" to turn it off. -- Robert Haas EDB: http://www.enterprisedb.com
Re: zstd compression for pg_dump
On 3/27/23 19:28, Justin Pryzby wrote: > On Fri, Mar 17, 2023 at 03:43:31AM +0100, Tomas Vondra wrote: >> On 3/16/23 05:50, Justin Pryzby wrote: >>> On Fri, Mar 10, 2023 at 12:48:13PM -0800, Jacob Champion wrote: On Wed, Mar 8, 2023 at 10:59 AM Jacob Champion wrote: > I did some smoke testing against zstd's GitHub release on Windows. To > build against it, I had to construct an import library, and put that > and the DLL into the `lib` folder expected by the MSVC scripts... > which makes me wonder if I've chosen a harder way than necessary? It looks like pg_dump's meson.build is missing dependencies on zstd (meson couldn't find the headers in the subproject without them). >>> >>> I saw that this was added for LZ4, but I hadn't added it for zstd since >>> I didn't run into an issue without it. Could you check that what I've >>> added works for your case ? >>> > Parallel zstd dumps seem to work as expected, in that the resulting > pg_restore output is identical to uncompressed dumps and nothing > explodes. I haven't inspected the threading implementation for safety > yet, as you mentioned. Hm. Best I can tell, the CloneArchive() machinery is supposed to be handling safety for this, by isolating each thread's state. I don't feel comfortable pronouncing this new addition safe or not, because I'm not sure I understand what the comments in the format-specific _Clone() callbacks are saying yet. >>> >>> My line of reasoning for unix is that pg_dump forks before any calls to >>> zstd. Nothing zstd does ought to affect the pg_dump layer. But that >>> doesn't apply to pg_dump under windows. This is an opened question. If >>> there's no solid answer, I could disable/ignore the option (maybe only >>> under windows). >> >> I may be missing something, but why would the patch affect this? Why >> would it even affect safety of the parallel dump? And I don't see any >> changes to the clone stuff ... > > zstd supports using threads during compression, with -Z zstd:workers=N. > When unix forks, the child processes can't do anything to mess up the > state of the parent processes. > > But windows pg_dump uses threads instead of forking, so it seems > possible that the pg_dump -j threads that then spawn zstd threads could > "leak threads" and break the main thread. I suspect there's no issue, > but we still ought to verify that before declaring it safe. > OK. I don't have access to a Windows machine so I can't test that. Is it possible to disable the zstd threading, until we figure this out? >>> The function is first checking if it was passed a filename which already >>> has a suffix. And if not, it searches through a list of suffixes, >>> testing for an existing file with each suffix. The search with stat() >>> doesn't happen if it has a suffix. I'm having trouble seeing how the >>> hasSuffix() branch isn't dead code. Another opened question. >> >> AFAICS it's done this way because of this comment in pg_backup_directory >> >> * ... >> * ".gz" suffix is added to the filenames. The TOC files are never >> * compressed by pg_dump, however they are accepted with the .gz suffix >> * too, in case the user has manually compressed them with 'gzip'. >> >> I haven't tried, but I believe that if you manually compress the >> directory, it may hit this branch. > > That would make sense, but when I tried, it didn't work like that. > The filenames were all uncompressed names. Maybe it worked differently > in an older release. 
Or maybe it changed during development of the > parallel-directory-dump patch and it's actually dead code. > Interesting. Would be good to find out. I wonder if a little bit of git-log digging could tell us more. Anyway, until we confirm it's dead code, we should probably do what .gz does and have the same check for .lz4 and .zst files. > This is rebased over the updated compression API. > > It seems like I misunderstood something you said before, so now I put > back "supports_compression()". > Thanks! I need to do a bit more testing and review, but it seems pretty much RFC to me, assuming we can figure out what to do about threading. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
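For anyone wanting to run the kind of smoke test described above, a minimal parallel zstd dump/restore check might look like the sketch below. The -Z zstd:workers=N spelling is taken from Justin's description of the patch and could still change before commit; the database name and paths here are made up.

    # directory-format dump with 4 pg_dump workers, zstd compression,
    # and zstd's own worker threads enabled (per the patch under discussion)
    pg_dump -Fd -j 4 -Z zstd:workers=4 -f /tmp/dump_zstd testdb

    # restore into a scratch database and compare with an uncompressed dump
    createdb testdb_restore
    pg_restore -j 4 -d testdb_restore /tmp/dump_zstd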
Re: logical decoding and replication of sequences, take 2
On Mon, Mar 27, 2023 at 11:46 PM Tomas Vondra wrote: > > > > On 3/27/23 03:32, Masahiko Sawada wrote: > > Hi, > > > > On Fri, Mar 24, 2023 at 7:26 AM Tomas Vondra > > wrote: > >> > >> I merged the earlier "fixup" patches into the relevant parts, and left > >> two patches with new tweaks (deducing the corrent "WAL" state from the > >> current state read by copy_sequence), and the interlock discussed here. > >> > > > > Apart from that, how does the publication having sequences work with > > subscribers who are not able to handle sequence changes, e.g. in a > > case where PostgreSQL version of publication is newer than the > > subscriber? As far as I tested the latest patches, the subscriber > > (v15) errors out with the error 'invalid logical replication message > > type "Q"' when receiving a sequence change. I'm not sure it's sensible > > behavior. I think we should instead either (1) deny starting the > > replication if the subscriber isn't able to handle sequence changes > > and the publication includes that, or (2) not send sequence changes to > > such subscribers. > > > > I agree the "invalid message" error is not great, but it's not clear to > me how to do either (1). The trouble is we don't really know if the > publication contains (or will contain) sequences. I mean, what would > happen if the replication starts and then someone adds a sequence? > > For (2), I think that's not something we should do - silently discarding > some messages seems error-prone. If the publication includes sequences, > presumably the user wanted to replicate those. If they want to replicate > to an older subscriber, create a publication without sequences. > > Perhaps the right solution would be to check if the subscriber supports > replication of sequences in the output plugin, while attempting to write > the "Q" message. And error-out if the subscriber does not support it. It might be related to this topic; do we need to bump the protocol version? The commit 64824323e57d introduced new streaming callbacks and bumped the protocol version. I think the same seems to be true for this change as it adds sequence_cb callback. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: Add LZ4 compression in pg_dump
On 3/28/23 18:07, gkokola...@pm.me wrote: > > > > > > --- Original Message --- > On Friday, March 24th, 2023 at 10:30 AM, gkokola...@pm.me > wrote: > >> >> --- Original Message --- >> On Thursday, March 23rd, 2023 at 6:10 PM, Tomas Vondra >> tomas.von...@enterprisedb.com wrote: >> >>> This leaves the empty-data issue (which we have a fix for) and the >>> switch to LZ4F. And then the zstd part. >> >> Please expect promptly a patch for the switch to frames. > > Please find the expected patch attached. Note that the bulk of the > patch is code unification, variable renaming to something more > appropriate, and comment addition. These are changes that are not > strictly necessary to switch to LZ4F. I do believe that are > essential for code hygiene after the switch and they do belong > on the same commit. > Thanks! I agree the renames & cleanup are appropriate - it'd be silly to stick to misleading naming etc. Would it make sense to split the patch into two, to separate the renames and the switch to lz4f? That'd make it the changes necessary for lz4f switch clearer. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Data is copied twice when specifying both child and parent table in publication
On Tue, Mar 28, 2023 at 2:59 AM wangw.f...@fujitsu.com wrote: > The scenario of this bug is to subscribe to two publications at the same time, > and these two publications publish parent table and child table respectively. > And option via_root is specified in both publications or only in the > publication > of the parent table. Ah, reading the initial mail again, that makes sense. I came to this thread with the alternative reproduction in mind (subscribing to one publication with viaroot=true, and another publication with viaroot=false) and misread the report accordingly... In the end, I'm comfortable with the current level of coverage. > > Would it > > be enough to just replace that whole thing with gpt.attrs? > > Make sense. > Changed as suggested. LGTM, by inspection. Thanks! --Jacob
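For concreteness, the reported scenario might be reproduced with a sketch along these lines (all object names are made up; publish_via_partition_root is the released spelling of the via_root option mentioned above):

    -- on the publisher
    CREATE TABLE parent (a int) PARTITION BY RANGE (a);
    CREATE TABLE child PARTITION OF parent FOR VALUES FROM (0) TO (100);
    CREATE PUBLICATION pub_parent FOR TABLE parent
        WITH (publish_via_partition_root = true);
    CREATE PUBLICATION pub_child FOR TABLE child;

    -- on the subscriber: subscribing to both publications at once is what
    -- led to the initial data being copied twice
    CREATE SUBSCRIPTION sub CONNECTION 'dbname=... host=...'
        PUBLICATION pub_parent, pub_child;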
Re: Non-superuser subscription owners
On Fri, 2023-03-24 at 00:17 -0700, Jeff Davis wrote: > The other patch you posted seems like it makes a lot of progress in > that direction, and I think that should go in first. That was one of > the items I suggested previously[2], so thank you for working on > that. The above is not a hard objection. I still hold the opinion that the non-superuser subscriptions work feels premature without the apply-as-table-owner work. It would be great if the other patch ends up ready quickly, which would moot the commit-ordering question. Regards, Jeff Davis
Re: running logical replication as the subscription owner
On Tue, Mar 28, 2023 at 1:38 AM Jeff Davis wrote: > > If they do have those things then in > > theory they might be able to protect themselves, but in practice they > > are unlikely to be careful enough. I judge that practically every > > installation where table owners use triggers would be easily > > compromised. Only the most security-conscious of table owners are > > going to get this right. > > This is the interesting part. > > Commit 11da97024ab set the subscription process's search path as empty. > It seems it was done to protect the subscription owner against users > writing into the public schema. But after your apply-as-table-owner > patch, it seems to also offer some protection for table owners against > a malicious subscription owner, too. > > But I must be missing something obvious here -- if the subscription > process has an empty search_path, what can an attacker do to trick it? Oh, interesting. I hadn't realized we were doing that, and I do think that narrows the vulnerability quite a bit. But I still feel pretty uncomfortable with the idea of requiring only INSERT/UPDATE/DELETE permissions on the target table. Let's suppose that you create a table and attach a trigger to it which is SECURITY INVOKER. Absent logical replication, you don't really need to worry about whether that function is written securely -- it will run with privileges of the person performing the DML, and not impact your account at all. They have reason to be scared of you, but not the reverse. However, if the people doing DML on the table can arrange to perform that DML as you, then any security vulnerabilities in your function potentially allow them to compromise your account. Now, there might not be any, but there also might be some, depending on exactly what your function does. And if logical replication into a table requires only I/U/D permission then it's basically a back-door way to run functions that someone could otherwise execute only as themselves as the table owner, and that's scary. Now, I don't know how much to worry about that. As you just pointed out to me in some out-of-band discussion, this is only going to affect a table owner who writes a trigger, makes it insecure in some way other than failure to sanitize the search_path, and sets it ENABLE ALWAYS TRIGGER or ENABLE REPLICA TRIGGER. And maybe we could say that if you do that last part, you kind of need to think about the consequences for logical replication. If so, we could document that problem away. It would also affect someone who uses a default expression or other means of associating executable code with the table, and something like a default expression doesn't require any explicit setting to make it apply in case of replication, so maybe the risk of someone messing up is a bit higher. But this definitely makes it more of a judgment call than I thought initially. Functions that are vulnerable to search_path exploits are a dime a dozen; other kinds of exploits are going to be less common, and more dependent on exactly what the function is doing. I'm still inclined to leave the patch checking for SET ROLE, because after all, we're thinking of switching user identities to the table owner, and checking whether we can SET ROLE is the most natural way of doing that, and definitely doesn't leave room to escalate to any privilege you don't already have. 
However, there might be an argument that we ought to do something else, like have a REPLICATE privilege on the table that the owner can grant to users that they trust to replicate into that table. Because time is short, I'd like to leave that idea for a future release. What I would like to change in the patch in this release is to add a new subscription property along the lines of what I proposed to Jelte in my earlier email: let's provide an escape hatch that turns off the user-switching behavior. If we do that, then anyone who feels themselves worse off after this patch can switch back to the old behavior. Most people will be better off, I believe, and the opportunity to make things still better in the future is not foreclosed. -- Robert Haas EDB: http://www.enterprisedb.com
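Robert's proposed escape hatch would presumably surface as a subscription option; a purely illustrative sketch follows (the option name run_as_owner is made up here, and the default is whatever gets decided on-list):

    -- hypothetical option name; shown only to illustrate the idea of an
    -- explicit toggle for the user-switching behavior
    CREATE SUBSCRIPTION mysub
        CONNECTION 'dbname=... host=...' PUBLICATION mypub
        WITH (run_as_owner = false);  -- switch to the table owner when applying

    -- opt back into the old behavior (apply as the subscription owner)
    ALTER SUBSCRIPTION mysub SET (run_as_owner = true);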
Re: zstd compression for pg_dump
On Tue, Mar 28, 2023 at 12:23 PM Tomas Vondra wrote: > On 3/27/23 19:28, Justin Pryzby wrote: > > On Fri, Mar 17, 2023 at 03:43:31AM +0100, Tomas Vondra wrote: > >> On 3/16/23 05:50, Justin Pryzby wrote: > >>> On Fri, Mar 10, 2023 at 12:48:13PM -0800, Jacob Champion wrote: > On Wed, Mar 8, 2023 at 10:59 AM Jacob Champion < > jchamp...@timescale.com> wrote: > > I did some smoke testing against zstd's GitHub release on Windows. To > ... > OK. I don't have access to a Windows machine so I can't test that. Is it > possible to disable the zstd threading, until we figure this out? > > Thomas since I appear to be one of the few windows users (I use both), can I help? I can test pg_dump... for you, easy to do. I do about 5-10 pg_dumps a day on windows while developing. Also, I have an AWS instance I created to build PG/Win with readline back in November. I could give you access to that... (you are not the only person who has made this statement here). I've made such instances available for other Open Source developers, to support them. Obvi I would share connection credentials privately. Regards, Kirk
Re: Add pg_walinspect function with block info columns
On Mon, Mar 27, 2023 at 7:47 PM Bharath Rupireddy wrote: > Hm, agreed. Changed in the attached v7-0002 patch. We can as well > write a case statement in the create function SQL to output forkname > instead forknumber, but I'd stop doing that to keep in sync with > pg_buffercache. I just don't see much value in any textual representation of fork name, however generated. In practice it's just not adding very much useful information. It is mostly useful as a way of filtering block references, which makes simple integers more natural. > Oh, thanks for finding it out. Fixed in the attached v7-0001 patch. I > also removed the "invalid fork number" error as users can figure that > out if at all the fork number is wrong. Pushed just now. > On the ordering of the columns, I kept start_lsn, end_lsn and prev_lsn > first and then the rel** columns (this rel** columns order follows > pg_buffercache) and then block data related columns. Michael and > Kyotaro are of the opinion that it's better to keep LSNs first to be > consistent and also given that this function is WAL related, it makes > sense to have LSNs first. Right, but I didn't change that part in the revision of the patch I posted. Those columns still came first, and were totally consistent with the pg_get_wal_record_info function. I think that there was a "mid air collision" here, where we both posted patches that we each called v7 within minutes of each other. Just to be clear, I ended up with a column order as described here in my revision: https://postgr.es/m/CAH2-WzmzO-AU4QSbnzzANBkrpg=4cuod3scvtv+7x65e+qk...@mail.gmail.com It now occurs to me that "fpi_data" should perhaps be called "block_fpi_data". What do you think? -- Peter Geoghegan
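To illustrate why plain integers are convenient for filtering block references, a query against the in-progress block-info function might look like the sketch below. The function and column names (pg_get_wal_block_info, relforknumber) are assumptions based on the naming discussed in this thread and could differ in the committed version; fork number 0 is the main fork.

    SELECT start_lsn, relfilenode, relblocknumber
    FROM pg_get_wal_block_info('0/01000000', '0/02000000')
    WHERE relforknumber = 0;   -- main-fork block references only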
Re: allow_in_place_tablespaces vs. pg_basebackup
On Tue, Mar 28, 2023 at 5:15 PM Thomas Munro wrote: > I guess we should add a test of -Fp too, to keep it working? Oops, that was already tested in existing tests, so I take that back.
Re: "variable not found in subplan target list"
So I'm back home and found a couple more weird errors in the log: MERGE INTO public.idxpart2 as target_0 USING (select 1 from public.xmltest2 as ref_0 inner join public.prt1_l_p1 as sample_0 inner join fkpart4.droppk as ref_1 on (sample_0.a = ref_1.a ) on (true) limit 50) as subq_0 left join information_schema.transforms as ref_2 left join public.transition_table_status as sample_1 on (ref_2.transform_type is not NULL) on (true) ON target_0.a = sample_1.level WHEN MATCHED THEN UPDATE set a = target_0.a; ERROR: mismatching PartitionPruneInfo found at part_prune_index 0 DETALLE: plan node relids (b 1), pruneinfo relids (b 36) This one is probably my fault, will look later. select pg_catalog.pg_stat_get_buf_fsync_backend() as c9 from public.tenk2 as ref_0 where (ref_0.stringu2 is NULL) and (EXISTS ( select 1 from fkpart5.fk1 as ref_1 where pg_catalog.current_date() < (select pg_catalog.max(filler3) from public.mcv_lists))) ; ERROR: subplan "InitPlan 1 (returns $1)" was not initialized CONTEXTO: parallel worker select 1 as c0 from (select subq_0.c9 as c5, subq_0.c8 as c9 from public.iso8859_5_inputs as ref_0, lateral (select ref_1.ident as c2, ref_0.description as c8, ref_1.used_bytes as c9 from pg_catalog.pg_backend_memory_contexts as ref_1 where true ) as subq_0 where subq_0.c2 is not NULL) as subq_1 inner join pg_catalog.pg_class as sample_0 on (subq_1.c5 = public.int8alias1in( cast(case when subq_1.c9 is not NULL then null end as cstring))) where true; ERROR: could not find commutator for operator 53286 There were quite a few of those "variable not found" ones, both mentioning singular "targetlist" and others that said "targetlists". I reran them with your patch and they no longer error out, so I guess it's all the same bug. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "I must say, I am absolutely impressed with what pgsql's implementation of VALUES allows me to do. It's kind of ridiculous how much "work" goes away in my code. Too bad I can't do this at work (Oracle 8/9)." (Tom Allison) http://archives.postgresql.org/pgsql-general/2007-06/msg00016.php
Re: logical decoding and replication of sequences, take 2
On 3/28/23 18:34, Masahiko Sawada wrote: > On Mon, Mar 27, 2023 at 11:46 PM Tomas Vondra > wrote: >> >> >> >> On 3/27/23 03:32, Masahiko Sawada wrote: >>> Hi, >>> >>> On Fri, Mar 24, 2023 at 7:26 AM Tomas Vondra >>> wrote: I merged the earlier "fixup" patches into the relevant parts, and left two patches with new tweaks (deducing the corrent "WAL" state from the current state read by copy_sequence), and the interlock discussed here. >>> >>> Apart from that, how does the publication having sequences work with >>> subscribers who are not able to handle sequence changes, e.g. in a >>> case where PostgreSQL version of publication is newer than the >>> subscriber? As far as I tested the latest patches, the subscriber >>> (v15) errors out with the error 'invalid logical replication message >>> type "Q"' when receiving a sequence change. I'm not sure it's sensible >>> behavior. I think we should instead either (1) deny starting the >>> replication if the subscriber isn't able to handle sequence changes >>> and the publication includes that, or (2) not send sequence changes to >>> such subscribers. >>> >> >> I agree the "invalid message" error is not great, but it's not clear to >> me how to do either (1). The trouble is we don't really know if the >> publication contains (or will contain) sequences. I mean, what would >> happen if the replication starts and then someone adds a sequence? >> >> For (2), I think that's not something we should do - silently discarding >> some messages seems error-prone. If the publication includes sequences, >> presumably the user wanted to replicate those. If they want to replicate >> to an older subscriber, create a publication without sequences. >> >> Perhaps the right solution would be to check if the subscriber supports >> replication of sequences in the output plugin, while attempting to write >> the "Q" message. And error-out if the subscriber does not support it. > > It might be related to this topic; do we need to bump the protocol > version? The commit 64824323e57d introduced new streaming callbacks > and bumped the protocol version. I think the same seems to be true for > this change as it adds sequence_cb callback. > It's not clear to me what should be the exact behavior? I mean, imagine we're opening a connection for logical replication, and the subscriber does not handle sequences. What should the publisher do? (Note: The correct commit hash is 464824323e57d.) I don't think the streaming is a good match for sequences, because of a couple important differences ... Firstly, streaming determines *how* the changes are replicated, not what gets replicated. It doesn't (silently) filter out "bad" events that the subscriber doesn't know how to apply. If the subscriber does not know how to deal with streamed xacts, it'll still get the same changes exactly per the publication definition. Secondly, the default value is "streming=off", i.e. the subscriber has to explicitly request streaming when opening the connection. And we simply check it against the negotiated protocol version, i.e. the check in pgoutput_startup() protects against subscriber requesting a protocol v1 but also streaming=on. I don't think we can/should do more check at this point - we don't know what's included in the requested publications at that point, and I doubt it's worth adding because we certainly can't predict if the publication will be altered to include/decode sequences in the future. 
Speaking of precedents, TRUNCATE is probably a better one, because it's a new action and it determines *what* the subscriber can handle. But that does exactly the thing we do for sequences - if you open a connection from PG10 subscriber (truncate was added in PG11), and the publisher decodes a truncate, subscriber will do: 2023-03-28 20:29:46.921 CEST [2357609] ERROR: invalid logical replication message type "T" 2023-03-28 20:29:46.922 CEST [2356534] LOG: worker process: logical replication worker for subscription 16390 (PID 2357609) exited with exit code 1 I don't see why sequences should do anything else. If you need to replicate to such subscriber, create a publication that does not have 'sequence' in the publish option ... regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
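As a concrete version of Tomas's last suggestion: with the patch under discussion, a publication intended for older subscribers would simply omit the new publish action, something like the sketch below ('sequence' as a publish value exists only in the patch, not in any released server).

    -- replicate DML and TRUNCATE, but not sequence changes, so that older
    -- subscribers never see the "Q" message
    CREATE PUBLICATION pub_no_sequences FOR ALL TABLES
        WITH (publish = 'insert, update, delete, truncate');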
Re: "variable not found in subplan target list"
Alvaro Herrera writes: > So I'm back home and found a couple more weird errors in the log: > ERROR: mismatching PartitionPruneInfo found at part_prune_index 0 > DETALLE: plan node relids (b 1), pruneinfo relids (b 36) This one reproduces for me. > select > pg_catalog.pg_stat_get_buf_fsync_backend() as c9 > from > public.tenk2 as ref_0 > where (ref_0.stringu2 is NULL) > and (EXISTS ( > select 1 from fkpart5.fk1 as ref_1 > where pg_catalog.current_date() < (select pg_catalog.max(filler3) > from public.mcv_lists))) ; > ERROR: subplan "InitPlan 1 (returns $1)" was not initialized > CONTEXTO: parallel worker Hmph, I couldn't reproduce that, not even with other settings of debug_parallel_query. Are you running it with non-default planner parameters? > select 1 as c0 > ... > ERROR: could not find commutator for operator 53286 I got a slightly different error: ERROR: missing support function 1(195306,195306) in opfamily 1976 where regression=# select 195306::regtype; regtype int8alias1 (1 row) So that one is related to the intentionally-somewhat-broken int8 opclass configuration that equivclass.sql leaves behind. I've always had mixed emotions about whether leaving that set up that way was a good idea or not. In principle nothing really bad should happen, but it can lead to confusing errors like this one. Maybe it'd be better to roll that back? regards, tom lane
Re: POC: Better infrastructure for automated testing of concurrency issues
On Wed, 31 Aug 2022 at 20:52, Alexander Korotkov wrote: > > I'll continue work on this patch. The rebased patch is attached. It > implements stop events as configure option (not runtime GUC option). It looks like this patch isn't going to be ready this commitfest. And it hasn't received much discussion since August 2022. If I'm wrong say something but otherwise I'll mark it Returned With Feedback. It can be resurrected (and moved to the next commitfest) when you're free to work on it again. -- Gregory Stark As Commitfest Manager
Re: zstd compression for pg_dump
On 3/28/23 20:03, Kirk Wolak wrote: > On Tue, Mar 28, 2023 at 12:23 PM Tomas Vondra > mailto:tomas.von...@enterprisedb.com>> > wrote: > > On 3/27/23 19:28, Justin Pryzby wrote: > > On Fri, Mar 17, 2023 at 03:43:31AM +0100, Tomas Vondra wrote: > >> On 3/16/23 05:50, Justin Pryzby wrote: > >>> On Fri, Mar 10, 2023 at 12:48:13PM -0800, Jacob Champion wrote: > On Wed, Mar 8, 2023 at 10:59 AM Jacob Champion > mailto:jchamp...@timescale.com>> wrote: > > I did some smoke testing against zstd's GitHub release on > Windows. To > ... > OK. I don't have access to a Windows machine so I can't test that. Is it > possible to disable the zstd threading, until we figure this out? > > Thomas since I appear to be one of the few windows users (I use both), > can I help? > I can test pg_dump... for you, easy to do. I do about 5-10 pg_dumps a > day on windows while developing. > Perhaps. But I'll leave the details up to Justin - it's his patch, and I'm not sure how to verify the threading is OK. I'd try applying this patch, build with --with-zstd and then run the pg_dump TAP tests, and perhaps do some manual tests. And perhaps do the same for --with-lz4 - there's a thread [1] suggesting we don't detect lz4 stuff on Windows, so the TAP tests do nothing. https://www.postgresql.org/message-id/zajl96n9zw84u...@msg.df7cb.de > Also, I have an AWS instance I created to build PG/Win with readline > back in November. > I could give you access to that... (you are not the only person who has > made this statement here). > I've made such instances available for other Open Source developers, to > support them. > > Obvi I would share connection credentials privately. > I'd rather leave the Windows stuff up to someone with more experience with that platform. I have plenty of other stuff on my plate atm. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
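Tomas's suggested verification, spelled out as a rough recipe for an autoconf build (paths and job counts are arbitrary; a meson build would use the equivalent -Dzstd=enabled option):

    # build with zstd (and lz4) support and TAP tests enabled
    ./configure --with-zstd --with-lz4 --enable-tap-tests
    make -j4

    # run the pg_dump TAP tests
    make -C src/bin/pg_dump check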
Re: Add LZ4 compression in pg_dump
On Tue, Mar 28, 2023 at 06:40:03PM +0200, Tomas Vondra wrote: > On 3/28/23 18:07, gkokola...@pm.me wrote: > > --- Original Message --- > > On Friday, March 24th, 2023 at 10:30 AM, gkokola...@pm.me > > wrote: > > > >> --- Original Message --- > >> On Thursday, March 23rd, 2023 at 6:10 PM, Tomas Vondra > >> tomas.von...@enterprisedb.com wrote: > >> > >>> This leaves the empty-data issue (which we have a fix for) and the > >>> switch to LZ4F. And then the zstd part. > >> > >> Please expect promptly a patch for the switch to frames. > > > > Please find the expected patch attached. Note that the bulk of the > > patch is code unification, variable renaming to something more > > appropriate, and comment addition. These are changes that are not > > strictly necessary to switch to LZ4F. I do believe that are > > essential for code hygiene after the switch and they do belong > > on the same commit. > > Thanks! > > I agree the renames & cleanup are appropriate - it'd be silly to stick > to misleading naming etc. Would it make sense to split the patch into > two, to separate the renames and the switch to lz4f? > That'd make it the changes necessary for lz4f switch clearer. I don't think so. Did you mean separate commits only for review ? The patch is pretty readable - the File API has just some renames, and the compressor API is what's being replaced, which isn't going to be any more clear. @Georgeos: did you consider using a C union in LZ4State, to separate the parts used by the different APIs ? -- Justin
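A rough illustration of the union Justin is asking about: keep the members used only by the compressor API and those used only by the file API in separate arms, since a given LZ4State serves one role at a time. All member names below are invented; the real layout is whatever the patch defines (this sketch assumes <stdio.h>, <stdbool.h> and lz4frame.h for the LZ4F_* types).

    typedef struct LZ4State
    {
        /* fields common to both code paths */
        LZ4F_preferences_t prefs;
        bool        compressing;

        union
        {
            struct                  /* compressor (buffer-to-buffer) API */
            {
                LZ4F_compressionContext_t cctx;
                char       *outbuf;
                size_t      outbuflen;
            }           comp;
            struct                  /* file (stream) API */
            {
                FILE       *fp;
                LZ4F_decompressionContext_t dctx;
            }           file;
        }           u;
    } LZ4State;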
Re: Request for comment on setting binary format output per session
FYI I attached this thread to https://commitfest.postgresql.org/42/3777 which I believe is the same issue. I mistakenly had this listed as a CF entry with no discussion for a long time due to that missing link. -- Gregory Stark As Commitfest Manager
Re: CAST(... ON DEFAULT) - WIP build on top of Error-Safe User Functions
On Tue, 3 Jan 2023 at 14:16, Tom Lane wrote: > > Vik Fearing writes: > > > I don't think we should add that syntax until I do get it through the > > committee, just in case they change something. > > Agreed. So this is something we won't be able to put into v16; > it'll have to wait till there's something solid from the committee. I guess I'll mark this Rejected in the CF then. Who knows when the SQL committee will look at this... -- Gregory Stark As Commitfest Manager
Re: running logical replication as the subscription owner
On Tue, 28 Mar 2023 at 18:13, Robert Haas wrote: > > On Mon, Mar 27, 2023 at 6:08 PM Jelte Fennema wrote: > > For high privileged subscription owners, > > we downgrade to the permissions of the table owner, but for low > > privileged ones we care about permissions of the subscription owner > > itself. > > Hmm. This is an interesting idea. A variant on this theme could be: > what if we made this an explicit configuration option? Sounds perfect to me. Let's do that. As long as the old no-superuser tests continue to pass when disabling the new switch-to-table-owner behaviour, that sounds totally fine to me. I think it's probably easiest if you use the tests from my v2 patch when you add that option, since that was by far the thing I spent most time on to get right in the v2 patch.
Re: [PATCH]Feature improvement for MERGE tab completion
It looks like this remaining work isn't going to happen this CF and therefore this release. There hasn't been an update since January when Dean Rasheed posted a review. Unless there's any updates soon I'll move this on to the next commitfest or mark it returned with feedback. -- Gregory Stark As Commitfest Manager
Re: [PATCH]Feature improvement for MERGE tab completion
Ah, another thread with a bouncing email address... Please respond to the thread from this point to avoid bounces. -- Gregory Stark As Commitfest Manager
Re: Why mark empty pages all visible?
Hi, On 2023-03-27 21:51:09 -0700, Peter Geoghegan wrote: > On Mon, Mar 27, 2023 at 9:32 PM Andres Freund wrote: > > > You haven't said anything about the leading cause of marking empty > > > pages all-frozen, by far: lazy_vacuum_heap_page(). Should it also stop > > > marking empty pages all-frozen? > > > > It's not obvious that it should - but it's not as clear a case as the > > ConditionalLockBufferForCleanup() -> lazy_scan_new_or_empty() one. In the > > latter, we know > > a) that we don't have to do any work to be able to advance the horizon > > b) we know that somebody else has the page pinned > > > > What's the point in marking it all-visible at that point? In quite likely be > > from RelationGetBufferForTuple() having extended the relation and then > > briefly > > needed to release the lock (to acquire the lock on otherBuffer or in > > GetVisibilityMapPins()). > > I think that there is significant value in avoiding special cases, on > general principle. If we stopped doing this in > lazy_scan_new_or_empty() we'd be inventing a new thing that cleanup > locks are supposed to protect against. Maybe something like that would > make sense, but if so then make that argument, and make it explicitly > represented in the code. I will probably make that argument - so far I was just trying to understand the intent of the current code. There aren't really comments explaining why we want to mark currently-pinned empty/new pages all-visible and enter them into the FSM. Historically we did *not* enter currently pinned empty/new pages into the FSM / VM. Afaics that's new as of 44fa84881fff. The reason I'm looking at this is that there's a lot of complexity at the bottom of RelationGetBufferForTuple(), related to needing to release the lock on the newly extended page and then needing to recheck whether there still is free space on the page. And that it's not complicated enough (c.f. INSERT_FROZEN doing visibilitymap_pin() with the page locked). As far as I can tell, if we went back to not entering new/empty pages into either VM or FSM, we could rely on the page not getting filled while just pinning, not locking it. > > I don't follow. In the ConditionalLockBufferForCleanup() -> > > lazy_scan_new_or_empty() case we are dealing with an new or empty > > page. Whereas lazy_vacuum_heap_page() deals with a page that definitely > > has dead tuples on it. How are those two cases comparable? > > It doesn't have dead tuples anymore, though. > > ISTM that there is an issue here with the definition of an empty page. > You're concerned about PageIsEmpty() pages. And not just any PageIsEmpty() page, ones that are currently pinned. I also do wonder whether the different behaviour of PageIsEmpty() and PageIsNew() pages makes sense. The justification for not marking PageIsNew() pages as all-visible holds just as true for empty pages. There's just as much free space there. Greetings, Andres Freund
Re: Schema variables - new implementation for Postgres 15
ne 26. 3. 2023 v 13:32 odesílatel Julien Rouhaud napsal: > Hi, > > I just have a few minor wording improvements for the various comments / > documentation you quoted. > > On Sun, Mar 26, 2023 at 08:53:49AM +0200, Pavel Stehule wrote: > > út 21. 3. 2023 v 17:18 odesílatel Peter Eisentraut < > > peter.eisentr...@enterprisedb.com> napsal: > > > > > - What is the purpose of struct Variable? It seems very similar to > > >FormData_pg_variable. At least a comment would be useful. > > > > > > > I wrote comment there: > > > > > > /* > > * The Variable struct is based on FormData_pg_variable struct. Against > > * FormData_pg_variable it can hold node of deserialized expression used > > * for calculation of default value. > > */ > > Did you mean "Unlike" rather than "Against"? > fixed > > > > 0002 > > > > > > expr_kind_allows_session_variables() should have some explanation > > > about criteria for determining which expression kinds should allow > > > variables. > > > > > > > I wrote comment there: > > > > /* > > * Returns true, when expression of kind allows using of > > * session variables. > > + * The session's variables can be used everywhere where > > + * can be used external parameters. Session variables > > + * are not allowed in DDL. Session's variables cannot be > > + * used in constraints. > > + * > > + * The identifier can be parsed as an session variable > > + * only in expression's kinds where session's variables > > + * are allowed. This is the primary usage of this function. > > + * > > + * Second usage of this function is for decision if > > + * an error message "column does not exist" or "column > > + * or variable does not exist" should be printed. When > > + * we are in expression, where session variables cannot > > + * be used, we raise the first form or error message. > > */ > > Maybe > > /* > * Returns true if the given expression kind is valid for session variables > * Session variables can be used everywhere where external parameters can > be > * used. Session variables are not allowed in DDL commands or in > constraints. > * > * An identifier can be parsed as a session variable only for expression > kinds > * where session variables are allowed. This is the primary usage of this > * function. > * > * Second usage of this function is to decide whether "column does not > exist" or > * "column or variable does not exist" error message should be printed. > * When we are in an expression where session variables cannot be used, we > raise > * the first form or error message. > */ > changed > > > > session_variables_ambiguity_warning: There needs to be more > > > information about this. The current explanation is basically just, > > > "warn if your query is confusing". Why do I want that? Why would I > > > not want that? What is the alternative? What are some examples? > > > Shouldn't there be a standard behavior without a need to configure > > > anything? > > > > > > > I enhanced this entry: > > > > + > > +The session variables can be shadowed by column references in a > > query. This > > +is an expected feature. The existing queries should not be > broken > > by creating > > +any session variable, because session variables are shadowed > > always if the > > +identifier is ambiguous. The variables should be named without > > possibility > > +to collision with identifiers of other database objects (column > > names or > > +record field names). The warnings enabled by setting > > session_variables_ambiguity_warning > > +should help with finding identifier's collisions. 
> > Maybe > > Session variables can be shadowed by column references in a query, this is > an > expected behavior. Previously working queries shouldn't error out by > creating > any session variable, so session variables are always shadowed if an > identifier > is ambiguous. Variables should be referenced using an unambiguous > identifier > without any possibility for a collision with identifier of other database > objects (column names or record fields names). The warning messages > emitted > when enabling session_variables_ambiguity_warning can > help > finding such identifier collision. > > > + > > + > > +This feature can significantly increase size of logs, and then > it > > is > > +disabled by default, but for testing or development > environments it > > +should be enabled. > > Maybe > > This feature can significantly increase log size, so it's disabled by > default. > For testing or development environments it's recommended to enable it if > you > use session variables. > replaced Thank you very much for these language correctures Regards Pavel p.s. I'll send updated patch after today or tomorrow - I have to fix broken dependency check after rebase
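To make the shadowing rule in the documentation text above concrete, here is a small sketch using the patch's syntax (CREATE VARIABLE, LET and session_variables_ambiguity_warning exist only in the patch under review; the names are made up):

    CREATE TABLE t (x int);
    INSERT INTO t VALUES (10);
    CREATE VARIABLE x AS int;
    LET x = 1;

    -- inside a query over t the identifier x is ambiguous, so the column wins;
    -- with session_variables_ambiguity_warning = on a warning is emitted
    SELECT x FROM t;    -- returns 10 (the column)
    SELECT x;           -- no column in scope, so the variable is read: 1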
Re: CAST(... ON DEFAULT) - WIP build on top of Error-Safe User Functions
On Mon, 19 Dec 2022 at 17:57, Corey Huinker wrote: > > Attached is my work in progress to implement the changes to the CAST() > function as proposed by Vik Fearing. > > CAST(expr AS typename NULL ON ERROR) > will use error-safe functions to do the cast of expr, and will return > NULL if the cast fails. > > CAST(expr AS typename DEFAULT expr2 ON ERROR) > will use error-safe functions to do the cast of expr, and will return > expr2 if the cast fails. > Is there any difference between NULL and DEFAULT NULL?
Re: Documentation for building with meson
Hi, On Wed, Mar 15, 2023 at 4:28 AM Peter Eisentraut < peter.eisentr...@enterprisedb.com> wrote: > > [PATCH v8 1/5] Make minor additions and corrections to meson docs > > The last hunk revealed that there is some mixing up between meson setup > and meson configure. This goes a bit further. For example, earlier it > says that to get a list of meson setup options, call meson configure > --help and look at https://mesonbuild.com/Commands.html#configure, which > are both wrong. Also later throughout the text it uses one or the > other. I think this has the potential to be very confusing, and we > should clean this up carefully. > > The text about additional meson test options maybe should go into the > regress.sgml chapter? > I tried to make the meson setup and meson configure usage consistent. I've removed the text for the test options. > > > > [PATCH v8 2/5] Add data layout options sub-section in installation > docs > > This makes sense. Please double check your patch for correct title > casing, vertical spacing of XML, to keep everything looking consistent. > Thanks for noticing. Made it consistent on both sides. > > This text isn't yours, but since your patch emphasizes it, I wonder if > it could use some clarification: > > + These options affect how PostgreSQL lays out data on disk. > + Note that changing these breaks on-disk database compatibility, > + meaning you cannot use pg_upgrade to upgrade to > + a build with different values of these options. > > This isn't really correct. What breaking on-disk compatibility means is > that you can't use a server compiled one way with a data directory > initialized by binaries compiled another way. pg_upgrade may well have > the ability to upgrade between one or the other; that's up to pg_upgrade > to figure out but not an intrinsic property. (I wonder why pg_upgrade > cares about the WAL block size.) > Fixed. > > > > [PATCH v8 3/5] Remove Anti-Features section from Installation from > source docs > > Makes sense. But is "--disable-thread-safety" really a developer > feature? I think not. > > Moved to PostgreSQL features. Do you think there's a better place for it? > > > [PATCH v8 4/5] Re-organize Miscellaneous section > > This moves the Miscellaneous section after Developer Features. I think > Developer Features should be last. > > Maybe should remove this section and add the options to the regular > PostgreSQL Features section. > Yes, that makes sense. Made this change. > > Also consider the grouping in meson_options.txt, which is slightly > different yet. Removed Misc options section from meson_options.txt too. > > > > [PATCH v8 5/5] Change Short Version for meson installation guide > > +# create working directory > +mkdir postgres > +cd postgres > + > +# fetch source code > +git clone https://git.postgresql.org/git/postgresql.git src > > This comes after the "Getting the Source" section, so at this point they > already have the source and don't need to do "git clone" etc. again. > > +# setup and enter build directory (done only first time) > +## Unix based platforms > +meson setup build src --prefix=$PWD/install > + > +## Windows > +meson setup build src --prefix=%cd%/install > > Maybe some people work this way, but to me the directory structures you > create here are completely weird. > I'd like to discuss what you think is a good directory structure to work with. I've mentioned some of the drawbacks I see with the current structure for the short version. I know this structure can feel different but it feeling weird is not ideal. 
Do you have a directory structure in mind which is different but doesn't feel odd to you? > > +# Initialize a new database > +../install/bin/initdb -D ../data > + > +# Start database > +../install/bin/pg_ctl -D ../data/ -l logfile start > + > +# Connect to the database > +../install/bin/psql -d postgres > > The terminology here needs to be tightened up. You are using "database" > here to mean three different things. > I'll address this together once we are aligned on the overall directory structure etc. There are a few reasons why I had done this. Some reasons Andres has > described in his previous email and I'll add a few specific examples on why > having the same section for both might not be a good idea. > > * Having readline and zlib as required for building PostgreSQL is now > wrong because they are not required for meson builds. Also, the name of the > configs are different for make and meson and the current section only lists > the make ones. > * There are many references to configure in that section which don't > apply to meson. > * Last I checked Flex and Bison were always required to build via meson > but not for make and the current section doesn't explain those differences. > > I spent a good amount of time thinking if we could have a single section, > clarify these differences to make it correct and not confuse the users. I > couldn't find a way to do all thre
Re: Schema variables - new implementation for Postgres 15
Hi ne 26. 3. 2023 v 19:53 odesílatel Dmitry Dolgov <9erthali...@gmail.com> napsal: > > On Sun, Mar 26, 2023 at 07:32:05PM +0800, Julien Rouhaud wrote: > > Hi, > > > > I just have a few minor wording improvements for the various comments / > > documentation you quoted. > > Talking about documentation I've noticed that the implementation > contains few limitations, that are not mentioned in the docs. Examples > are WITH queries: > > WITH x AS (LET public.svar = 100) SELECT * FROM x; > ERROR: LET not supported in WITH query > The LET statement doesn't support the RETURNING clause, so using inside CTE does not make any sense. Do you have some tips, where this behaviour should be mentioned? > and using with set-returning functions (haven't found any related tests). > There it is: +CREATE VARIABLE public.svar AS int; +-- should be ok +LET public.svar = generate_series(1, 1); +-- should fail +LET public.svar = generate_series(1, 2); +ERROR: expression returned more than one row +LET public.svar = generate_series(1, 0); +ERROR: expression returned no rows +DROP VARIABLE public.svar; > > Another small note is about this change in the rowsecurity: > > /* > -* For SELECT, UPDATE and DELETE, add security quals to enforce > the USING > -* policies. These security quals control access to existing > table rows. > -* Restrictive policies are combined together using AND, and > permissive > -* policies are combined together using OR. > +* For SELECT, LET, UPDATE and DELETE, add security quals to > enforce the > +* USING policies. These security quals control access to > existing table > +* rows. Restrictive policies are combined together using AND, and > +* permissive policies are combined together using OR. > */ > > From this commentary one may think that LET command supports row level > security, but I don't see it being implemented. A wrong commentary? > I don't think so. The row level security should be supported. I tested it on example from doc: CREATE TABLE public.accounts ( manager text, company text, contact_email text ); CREATE VARIABLE public.v AS text; COPY public.accounts (manager, company, contact_email) FROM stdin; t1role xxx t1r...@xxx.org t2role yyy t2r...@yyy.org \. CREATE POLICY account_managers ON public.accounts USING ((manager = CURRENT_USER)); ALTER TABLE public.accounts ENABLE ROW LEVEL SECURITY; GRANT SELECT,INSERT ON TABLE public.accounts TO t1role; GRANT SELECT,INSERT ON TABLE public.accounts TO t2role; GRANT ALL ON VARIABLE public.v TO t1role; GRANT ALL ON VARIABLE public.v TO t2role; [pavel@localhost postgresql.master]$ psql Assertions: on psql (16devel) Type "help" for help. (2023-03-28 21:32:33) postgres=# set role to t1role; SET (2023-03-28 21:32:40) postgres=# select * from accounts ; ┌─┬─┬┐ │ manager │ company │ contact_email │ ╞═╪═╪╡ │ t1role │ xxx │ t1r...@xxx.org │ └─┴─┴┘ (1 row) (2023-03-28 21:32:45) postgres=# let v = (select company from accounts); LET (2023-03-28 21:32:58) postgres=# select v; ┌─┐ │ v │ ╞═╡ │ xxx │ └─┘ (1 row) (2023-03-28 21:33:03) postgres=# set role to default; SET (2023-03-28 21:33:12) postgres=# set role to t2role; SET (2023-03-28 21:33:19) postgres=# select * from accounts ; ┌─┬─┬┐ │ manager │ company │ contact_email │ ╞═╪═╪╡ │ t2role │ yyy │ t2r...@yyy.org │ └─┴─┴┘ (1 row) (2023-03-28 21:33:22) postgres=# let v = (select company from accounts); LET (2023-03-28 21:33:26) postgres=# select v; ┌─┐ │ v │ ╞═╡ │ yyy │ └─┘ (1 row) Regards Pavel
Re: Add SPLIT PARTITION/MERGE PARTITIONS commands
Hi,

Patch v15-0001-ALTER-TABLE-MERGE-PARTITIONS-command.patch applies cleanly. One warning on meson compile (configure -Dssl=openssl -Dldap=enabled -Dauto_features=enabled -DPG_TEST_EXTRA='ssl,ldap,kerberos' -Dbsd_auth=disabled -Dbonjour=disabled -Dpam=disabled -Dpltcl=disabled -Dsystemd=disabled -Dzstd=disabled -Db_coverage=true):

../../src/pgmergesplit/src/test/modules/test_ddl_deparse/test_ddl_deparse.c: In function ‘get_altertable_subcmdinfo’: ../../src/pgmergesplit/src/test/modules/test_ddl_deparse/test_ddl_deparse.c:112:17: warning: enumeration value ‘AT_MergePartitions’ not handled in switch [-Wswitch] 112 | switch (subcmd->subtype) | ^~

Presumably the same applies to 0002. meson test passes, and patch coverage is very good.

Patch v15-0002-ALTER-TABLE-SPLIT-PARTITION-command.patch doesn't apply on 326a33a289c7ba2dbf45f17e610b7be98dc11f67.

Patch v15-0003-Documentation-for-ALTER-TABLE-SPLIT-PARTITION-ME.patch applies with one warning about added whitespace errors (translated from French: "warning: 1 ligne a ajouté des erreurs d'espace"): v15-0003-Documentation-for-ALTER-TABLE-SPLIT-PARTITION-ME.patch:54: trailing whitespace. One of the new partitions partition_name1,

The comments are OK for me, as a non-native English speaker. Perhaps you could add some remarks in ddl.html and alter-ddl.html.

Stéphane

The new status of this patch is: Waiting on Author
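The -Wswitch warning above could presumably be silenced by adding the new subcommand(s) to the switch in test_ddl_deparse.c's get_altertable_subcmdinfo(), along the lines of the sketch below. The exact enum and string spellings should follow the patch; AT_SplitPartition is a guess for the 0002 counterpart, and strtype is the variable the existing cases assign.

    case AT_MergePartitions:
        strtype = "MERGE PARTITIONS";
        break;
    case AT_SplitPartition:
        strtype = "SPLIT PARTITION";
        break;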
Re: POC: Better infrastructure for automated testing of concurrency issues
On Tue, Mar 28, 2023 at 9:44 PM Gregory Stark (as CFM) wrote: > On Wed, 31 Aug 2022 at 20:52, Alexander Korotkov wrote: > > > > I'll continue work on this patch. The rebased patch is attached. It > > implements stop events as configure option (not runtime GUC option). > > It looks like this patch isn't going to be ready this commitfest. And > it hasn't received much discussion since August 2022. If I'm wrong say > something but otherwise I'll mark it Returned With Feedback. It can be > resurrected (and moved to the next commitfest) when you're free to > work on it again. I'm good with that. -- Regards, Alexander Korotkov
Re: Some revises in adding sorting path
On Tue, 21 Feb 2023 at 22:02, Richard Guo wrote: > Looking at the codes now I have some concern that what we do in > create_ordered_paths for partial paths may have already been done in > generate_useful_gather_paths, especially when query_pathkeys is equal to > sort_pathkeys. Not sure if this is a problem. And the comment there > mentions generate_gather_paths(), but ISTM we should mention what > generate_useful_gather_paths has done. I think you need to write some tests for this. I've managed to come up with something to get that code to perform a Sort, but I've not managed to get it to perform an incremental sort. create or replace function parallel_safe_volatile(a int) returns int as $$ begin return a; end; $$ parallel safe volatile language plpgsql; create table abc(a int, b int, c int); insert into abc select x,y,z from generate_Series(1,100)x, generate_Series(1,100)y, generate_Series(1,100)z; set parallel_tuple_cost=0; Without making those parallel paths, we get: postgres=# explain select * from abc where a=1 order by a,b,parallel_safe_volatile(c); QUERY PLAN Sort (cost=13391.49..13417.49 rows=10400 width=16) Sort Key: b, (parallel_safe_volatile(c)) -> Gather (cost=1000.00..12697.58 rows=10400 width=16) Workers Planned: 2 -> Parallel Seq Scan on abc (cost=0.00..11697.58 rows=4333 width=16) Filter: (a = 1) (6 rows) but with, the plan is: postgres=# explain select * from abc where a=1 order by a,b,parallel_safe_volatile(c); QUERY PLAN Gather Merge (cost=12959.35..13060.52 rows=8666 width=16) Workers Planned: 2 -> Sort (cost=11959.32..11970.15 rows=4333 width=16) Sort Key: b, (parallel_safe_volatile(c)) -> Parallel Seq Scan on abc (cost=0.00..11697.58 rows=4333 width=16) Filter: (a = 1) (6 rows) I added the parallel safe and volatile function so that get_useful_pathkeys_for_relation() wouldn't include all of the query_pathkeys. If you write some tests for this code, it will be useful to prove that it actually does something, and also that it does not break again in the future. I don't really want to just blindly copy the pattern used in 3c6fc5820 for creating incremental sort paths if it's not useful here. It would be good to see tests that make an Incremental Sort path using the code you're changing. Same for the 0003 patch. I'll mark this as waiting on author while you work on that. David
Re: Why mark empty pages all visible?
On Tue, Mar 28, 2023 at 12:01 PM Andres Freund wrote: > I will probably make that argument - so far I was just trying to understand > the intent of the current code. There aren't really comments explaining why we > want to mark currently-pinned empty/new pages all-visible and enter them into > the FSM. I don't think that not being able to immediately get a cleanup lock on a page signifies much of anything. I never have. > Historically we did *not* enter currently pinned empty/new pages into the FSM > / VM. Afaics that's new as of 44fa84881fff. Of course that's true, but I don't know why that history is particularly important. Either way, holding a pin was never supposed to work as an interlock against a page being concurrently set all-visible, or having its space recorded in the FSM. It's true that my work on VACUUM has shaken out a couple of bugs where we accidentally relied on that being true. But those were all due to the change in lazy_vacuum_heap_page() made in Postgres 14 -- not the addition of lazy_scan_new_or_empty() in Postgres 15. I actually think that I might agree with the substance of much of what you're saying, but at the same time I don't think that you're defining the problem in a way that's particularly helpful. I gather that you *don't* want to do anything about the lazy_vacuum_heap_page behavior with setting empty pages all-visible (just the lazy_scan_new_or_empty behavior). So clearly this isn't really about marking empty pages all-visible, with or without a cleanup lock. It's actually about something rather more specific: the interactions with RelationGetBufferForTuple. I actually agree that VACUUM is way too unconcerned about interfering with concurrent activity in terms of how it manages free space in the FSM. But this seems like just about the least important example of that (outside the context of your RelationGetBufferForTuple work). The really important case (that VACUUM gets wrong) all involve recently dead tuples. But I don't think that you want to talk about that right now. You want to talk about RelationGetBufferForTuple. > The reason I'm looking at this is that there's a lot of complexity at the > bottom of RelationGetBufferForTuple(), related to needing to release the lock > on the newly extended page and then needing to recheck whether there still is > free space on the page. And that it's not complicated enough > (c.f. INSERT_FROZEN doing visibilitymap_pin() with the page locked). > > As far as I can tell, if we went back to not entering new/empty pages into > either VM or FSM, we could rely on the page not getting filled while just > pinning, not locking it. What you're essentially arguing for is inventing a new rule that makes the early lifetime of a page (what we currently call a PageIsEmpty() page, and new pages) special, to avoid interference from VACUUM. I have made similar arguments myself on quite a few occasions, so I'm actually sympathetic. I just think that you should own it. And no, I'm not just reflexively defending my work in 44fa84881fff; I actually think that framing the problem as a case of restoring a previous behavior is confusing and ahistorical. If there was a useful behavior that was lost, then it was quite an accidental behavior all along. The difference matters because now you have to reconcile what you're saying with the lazy_vacuum_heap_page no-cleanup-lock behavior added in 14. 
I think that you must be arguing for making the early lifetime of a heap page special to VACUUM, since AFAICT you want to change VACUUM's behavior with strictly PageIsEmpty/PageIsNew pages only -- and *not* with pages that have one remaining LP_UNUSED item, but are otherwise empty (which could be set all-visible/all-frozen in either the first or second heap pass, even if we disabled the lazy_scan_new_or_empty() behavior you're complaining about). You seem to want to distinguish between very new pages (that also happen to be empty), and old pages that happen to be empty. Right? > I also do wonder whether the different behaviour of PageIsEmpty() and > PageIsNew() pages makes sense. The justification for not marking PageIsNew() > pages as all-visible holds just as true for empty pages. There's just as much > free space there. What you say here makes a lot of sense to me. I'm just not sure what I'd prefer to do about it. -- Peter Geoghegan
Re: CAST(... ON DEFAULT) - WIP build on top of Error-Safe User Functions
On Tue, Mar 28, 2023 at 2:53 PM Gregory Stark (as CFM) wrote: > On Tue, 3 Jan 2023 at 14:16, Tom Lane wrote: > > > > Vik Fearing writes: > > > > > I don't think we should add that syntax until I do get it through the > > > committee, just in case they change something. > > > > Agreed. So this is something we won't be able to put into v16; > > it'll have to wait till there's something solid from the committee. > > I guess I'll mark this Rejected in the CF then. Who knows when the SQL > committee will look at this... > > -- > Gregory Stark > As Commitfest Manager > Yes, for now. I'm in touch with the pg-people on the committee and will resume work when there's something to act upon.
Re: CAST(... ON DEFAULT) - WIP build on top of Error-Safe User Functions
On Tue, Mar 28, 2023 at 3:25 PM Isaac Morland wrote:
> On Mon, 19 Dec 2022 at 17:57, Corey Huinker wrote:
>> Attached is my work in progress to implement the changes to the CAST()
>> function as proposed by Vik Fearing.
>>
>> CAST(expr AS typename NULL ON ERROR)
>> will use error-safe functions to do the cast of expr, and will return
>> NULL if the cast fails.
>>
>> CAST(expr AS typename DEFAULT expr2 ON ERROR)
>> will use error-safe functions to do the cast of expr, and will return
>> expr2 if the cast fails.
>
> Is there any difference between NULL and DEFAULT NULL?

What I think you're asking is: is there a difference between these two statements?

SELECT CAST(my_string AS integer NULL ON ERROR) FROM my_table;
SELECT CAST(my_string AS integer DEFAULT NULL ON ERROR) FROM my_table;

As I understand it, the answer is no, there is no practical difference. The first case is just a convenient shorthand, whereas the second tees you up for a potentially complex expression. Before you ask, I believe the ON ERROR syntax could be made optional.

As I implemented it, both cases create a default expression which is then typecast to integer, and in both cases that expression would be a const-null, so the optimizer would very quickly collapse those steps into a plain old constant.
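To make the non-NULL case concrete, a sketch using the proposed (not yet committed) syntax, reusing the hypothetical my_table/my_string names from above:

-- failed casts become -1 instead of raising an error
SELECT CAST(my_string AS integer DEFAULT -1 ON ERROR) FROM my_table;

-- the default can be an expression, evaluated only when the cast fails
SELECT CAST(my_string AS integer DEFAULT length(my_string) ON ERROR) FROM my_table;

Per the description above, the DEFAULT expression only takes effect for rows where the error-safe cast of my_string fails.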
Re: pgsql: amcheck: Fix verify_heapam for tuples where xmin or xmax is 0.
On Mon, Mar 27, 2023 at 4:52 PM Peter Geoghegan wrote: > This is fine, as far as it goes. Obviously it fixes the immediate problem. OK, I've committed and back-patched this fix to v14, just like the erroneous commit that created the issue. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Support logical replication of DDLs
On Sun, Mar 26, 2023 at 5:22 PM Tom Lane wrote: > I spent some time looking through this thread to try to get a sense > of the state of things, and I came away quite depressed. The patchset > has ballooned to over 2MB, which is a couple orders of magnitude > larger than anyone could hope to meaningfully review from scratch. > Despite that, it seems that there are fundamental semantics issues > remaining, not to mention clear-and-present security dangers, not > to mention TODO comments all over the code. Thanks for looking into this! > I'm also less than sold on the technical details, specifically > the notion of "let's translate utility parse trees into JSON and > send that down the wire". You can probably make that work for now, > but I wonder if it will be any more robust against cross-version > changes than just shipping the outfuncs.c representation. (Perhaps > it can be made more robust than the raw parse trees, but I see no > evidence that anyone's thought much about how.) I explored the idea of using the outfuncs.c representation in [1] and found this existing format is not designed to be portable between different major versions. So it can't be directly used for replication without serious modification. I think the DDL deparser is a necessary tool if we want to be able to handle cross-version DDL syntax differences by providing the capability to machine-edit the JSON representation. > And TBH, I don't think that I quite believe the premise in the > first place. The whole point of using logical rather than physical > replication is that the subscriber installation(s) aren't exactly like > the publisher. Given that, how can we expect that automated DDL > replication is going to do the right thing often enough to be a useful > tool rather than a disastrous foot-gun? The more you expand the scope > of what gets replicated, the worse that problem becomes --- for > example, I don't buy for one second that "let's replicate roles" > is a credible solution for the problems that come from the roles > not being the same on publisher and subscriber. I agree that a full fledged DDL deparser and DDL replication is too big of a task for one patch. I think we may consider approaching this feature in the following ways: 1. Phased development and testing as discussed in other emails. Probably support table commands first (as they are the most common DDLs), then the other commands in multiple phases. 2. Provide a subscription option to receive the DDL change, raise a notice and to skip applying the change. The users can listen to the DDL notice and implement application logic to apply the change if needed. The idea is we can start gathering user feedback by providing a somewhat useful feature (compared to doing nothing about DDLs), but also avoid heading straight into the potential footgun situation caused by automatically applying any mal-formatted DDLs. 3. As cross-version DDL syntax differences are expected to be uncommon (in real workload), maybe we can think about other options to handle such edge cases instead of fully automating it? For example, what about letting the user specify how a DDL should be replicated on the subscriber by explicitly providing two versions of DDL commands in some way? > I'm not sure how we get from here to a committable and useful feature, > but I don't think we're close to that yet, and I'm not sure that minor > iterations on a 2MB patchset will accomplish much. 
About 1 MB of the patch is testing output files for the DDL deparser (postgres/src/test/modules/test_ddl_deparse_regress/expected/).

Regards,
Zane

[1] https://www.postgresql.org/message-id/CAAD30U%2Boi6e6Vh_zAzhuXzkqUhagmLGD%2B_iyn2N9w_sNRKsoag%40mail.gmail.com
Re: Add LZ4 compression in pg_dump
On 3/28/23 18:07, gkokola...@pm.me wrote: > > --- Original Message --- > On Friday, March 24th, 2023 at 10:30 AM, gkokola...@pm.me > wrote: > >> >> --- Original Message --- >> On Thursday, March 23rd, 2023 at 6:10 PM, Tomas Vondra >> tomas.von...@enterprisedb.com wrote: >> >>> This leaves the empty-data issue (which we have a fix for) and the >>> switch to LZ4F. And then the zstd part. >> >> Please expect promptly a patch for the switch to frames. > > Please find the expected patch attached. Note that the bulk of the > patch is code unification, variable renaming to something more > appropriate, and comment addition. These are changes that are not > strictly necessary to switch to LZ4F. I do believe that are > essential for code hygiene after the switch and they do belong > on the same commit. > I think the patch is fine, but I'm wondering if the renames shouldn't go a bit further. It removes references to LZ4File struct, but there's a bunch of functions with LZ4File_ prefix. Why not to simply use LZ4_ prefix? We don't have GzipFile either. Sure, it might be a bit confusing because lz4.h uses LZ4_ prefix, but then we probably should not define LZ4_compressor_init ... Also, maybe the comments shouldn't use "File API" when compress_io.c calls that "Compressed stream API". regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Why mark empty pages all visible?
Hi, On 2023-03-28 13:05:19 -0700, Peter Geoghegan wrote: > On Tue, Mar 28, 2023 at 12:01 PM Andres Freund wrote: > > I will probably make that argument - so far I was just trying to understand > > the intent of the current code. There aren't really comments explaining why > > we > > want to mark currently-pinned empty/new pages all-visible and enter them > > into > > the FSM. > > I don't think that not being able to immediately get a cleanup lock on > a page signifies much of anything. I never have. Why is that? It's as clear a signal of concurrent activity on the buffer you're going to get. > > Historically we did *not* enter currently pinned empty/new pages into the > > FSM > > / VM. Afaics that's new as of 44fa84881fff. > > Of course that's true, but I don't know why that history is > particularly important. It's interesting to understand *why* we are doing what we are. I think it'd make sense to propose changing how things work around this, but I just don't feel like I have a good enough grasp for why we do some of the things we do. And given there's not a lot of comments around it and some of the comments that do exist are inconsistent with themselves, looking at the history seems like the next best thing? > I actually think that I might agree with the substance of much of what > you're saying, but at the same time I don't think that you're defining > the problem in a way that's particularly helpful. Likely because the goals of the existing code aren't clear to me. So I don't feel like I have a firm grasp... > I gather that you *don't* want to do anything about the > lazy_vacuum_heap_page behavior with setting empty pages all-visible (just > the lazy_scan_new_or_empty behavior). Not in the short-medium term, at least. In the long term I do suspect we might want to do something about it. We have a *crapton* of contention in the FSM and caused by the FSM in bulk workloads. With my relation extension patch disabling the FSM nearly doubles concurrent load speed. At the same time, the fact that we might loose knowledge about all the existing free space in case of promotion or crash and never rediscover that space (because the pages are frozen), seems decidedly not great. I don't know what the path forward is, but it seems somewhat clear that we ought to do something. I suspect having a not crash-safe FSM isn't really acceptable anymore - it probably is fine to not persist a *reduction* in free space, but we can't permanently loose track of free space, no matter how many crashes. I know that you strongly dislike the way the FSM works, although I forgot some of the details. One thing I am fairly certain about is that using the FSM to tell other backends about newly bulk extended pages is not a good solution, even though we're stuck with it for the moment. > I actually agree that VACUUM is way too unconcerned about interfering > with concurrent activity in terms of how it manages free space in the > FSM. But this seems like just about the least important example of > that (outside the context of your RelationGetBufferForTuple work). The > really important case (that VACUUM gets wrong) all involve recently > dead tuples. But I don't think that you want to talk about that right > now. You want to talk about RelationGetBufferForTuple. That's indeed the background. By now I'd also like to add a few comments explaining why we do what we currently do, because I don't find all of it obvious. 
> > The reason I'm looking at this is that there's a lot of complexity at the > > bottom of RelationGetBufferForTuple(), related to needing to release the > > lock > > on the newly extended page and then needing to recheck whether there still > > is > > free space on the page. And that it's not complicated enough > > (c.f. INSERT_FROZEN doing visibilitymap_pin() with the page locked). > > > > As far as I can tell, if we went back to not entering new/empty pages into > > either VM or FSM, we could rely on the page not getting filled while just > > pinning, not locking it. > > What you're essentially arguing for is inventing a new rule that makes > the early lifetime of a page (what we currently call a PageIsEmpty() > page, and new pages) special, to avoid interference from VACUUM. I > have made similar arguments myself on quite a few occasions, so I'm > actually sympathetic. I just think that you should own it. And no, I'm > not just reflexively defending my work in 44fa84881fff; I actually > think that framing the problem as a case of restoring a previous > behavior is confusing and ahistorical. If there was a useful behavior > that was lost, then it was quite an accidental behavior all along. The > difference matters because now you have to reconcile what you're > saying with the lazy_vacuum_heap_page no-cleanup-lock behavior added > in 14. I really don't have a position to own yet, not on firm enough ground. > I think that you must be arguing for making the early lifetime of a > heap page special
Re: zstd compression for pg_dump
On Wed, Mar 15, 2023 at 9:50 PM Justin Pryzby wrote: > On Fri, Mar 10, 2023 at 12:48:13PM -0800, Jacob Champion wrote: > > It looks like pg_dump's meson.build is missing dependencies on zstd > > (meson couldn't find the headers in the subproject without them). > > I saw that this was added for LZ4, but I hadn't added it for zstd since > I didn't run into an issue without it. Could you check that what I've > added works for your case ? I thought I replied to this, sorry -- your newest patchset builds correctly with subprojects, so the new dependency looks good to me. Thanks! > > Hm. Best I can tell, the CloneArchive() machinery is supposed to be > > handling safety for this, by isolating each thread's state. I don't feel > > comfortable pronouncing this new addition safe or not, because I'm not > > sure I understand what the comments in the format-specific _Clone() > > callbacks are saying yet. > > My line of reasoning for unix is that pg_dump forks before any calls to > zstd. Nothing zstd does ought to affect the pg_dump layer. But that > doesn't apply to pg_dump under windows. This is an opened question. If > there's no solid answer, I could disable/ignore the option (maybe only > under windows). To (maybe?) move this forward a bit, note that pg_backup_custom's _Clone() function makes sure that there is no active compressor state at the beginning of the new thread. pg_backup_directory's implementation has no such provision. And I don't think it can, because the parent thread might have concurrently set one up -- see the directory-specific implementation of _CloseArchive(). Perhaps we should just NULL out those fields after the copy, instead? To illustrate why I think this is tough to characterize: if I've read the code correctly, the _Clone() and CloneArchive() implementations are running concurrently with code that is actively modifying the ArchiveHandle and the lclContext. So safety is only ensured to the extent that we keep track of which fields threads are allowed to touch, and I don't have that mental model. --Jacob
Re: Add pg_walinspect function with block info columns
On Tue, Mar 28, 2023 at 11:15:17AM -0700, Peter Geoghegan wrote:
> On Mon, Mar 27, 2023 at 7:47 PM Bharath Rupireddy wrote:
>> Hm, agreed. Changed in the attached v7-0002 patch. We can as well
>> write a case statement in the create function SQL to output forkname
>> instead of forknumber, but I'd stop doing that to keep in sync with
>> pg_buffercache.
>
> I just don't see much value in any textual representation of fork
> name, however generated. In practice it's just not adding very much
> useful information. It is mostly useful as a way of filtering block
> references, which makes simple integers more natural.

I disagree with this argument. Personally, I have a *much* better experience with a textual representation because there is no need to cross-check the internals of the code when you don't remember what a given number means in an enum or in a set of bits, especially if you're in a hurry while looking at a production or customer deployment. In short, it makes for fewer mistakes because you don't have to think about an extra mapping between some integers and what they actually mean as text.

The clauses you'd apply for a GROUP BY on the forks, or for a filter with IN clauses, don't change; they're just made easier to understand for the common user, and that includes experienced people. We should think about this the same way as the text[] arrays we use for flag values, like the FPI flags, or why we've introduced text[] for the HEAP_* flags in the heap functions of pageinspect. There's even more consistency with pageinspect in using a fork name, where we can pass down a fork name to get a raw page.
--
Michael
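To illustrate the two filtering styles being weighed here (a sketch only: pg_get_wal_block_info is the function this thread is adding, the LSN range is arbitrary, and the column names -- a numeric relforknumber versus a textual forkname -- follow the patch discussion rather than any released version):

-- numeric fork column: the reader has to remember the ForkNumber enum
-- (0 = main, 1 = fsm, 2 = vm, 3 = init)
SELECT count(*) FROM pg_get_wal_block_info('0/1000000', '0/2000000')
  WHERE relforknumber = 1;

-- textual fork column, as argued for above
SELECT count(*) FROM pg_get_wal_block_info('0/1000000', '0/2000000')
  WHERE forkname = 'fsm';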
Re: running logical replication as the subscription owner
On Tue, 2023-03-28 at 13:55 -0400, Robert Haas wrote:
> Oh, interesting. I hadn't realized we were doing that, and I do think
> that narrows the vulnerability quite a bit.

It's good to know I wasn't missing something obvious.

> Absent logical replication, you don't really need to worry
> about whether that function is written securely -- it will run with
> privileges of the person performing the DML, and not impact your
> account at all.

That's not strictly true. See the example at the bottom of this email. The second trigger is SECURITY INVOKER, and it captures the leaked secret number that was stored in the tuple by an earlier SECURITY DEFINER trigger.

Perhaps I'm being pedantic, but my point is that SECURITY INVOKER is not a magical shield that protects users from themselves without bound. It protects users against some kinds of attacks, sure, but there's a limit to what it has to offer.

> They have reason to be scared of you, but not the
> reverse. However, if the people doing DML on the table can arrange to
> perform that DML as you, then any security vulnerabilities in your
> function potentially allow them to compromise your account.

OK, but I'd like to be clear that you've moved from your prior statement, "in theory they might be able to protect themselves, but in practice they are unlikely to be careful enough", to something very different, where it seems that we can't think of a concrete problem even for not-so-careful users.

The dangerous cases seem to be something along the lines of a security-invoker trigger function that builds and executes arbitrary SQL based on the row contents. And then the table owner would still need to set ENABLE ALWAYS TRIGGER. Do we really want to take that case on as our security responsibility?

> It would also affect someone who uses a default
> expression or other means of associating executable code with the
> table, and something like a default expression doesn't require any
> explicit setting to make it apply in case of replication, so maybe the
> risk of someone messing up is a bit higher.

Perhaps, but I still don't understand that case. Unless I'm missing something, the table owner would have to write a pretty weird default expression or check constraint or whatever to end up with something dangerous.

> But this definitely makes it more of a judgment call than I thought
> initially.

And I'm fine if the judgement is that we just require SET ROLE to be conservative and make sure we don't over-promise in 16. That's a totally reasonable thing: it's easier to loosen restrictions later than to tighten them. Furthermore, just because you and I can't think of exploitable problems doesn't mean they don't exist.

I have found this discussion enlightening, so thank you for going through these details with me.

> I'm still inclined to leave the patch checking for SET ROLE

+0.5. I want the patch to go in either way, and can carry on the discussion later and improve things more for 17. But I want to say generally that I believe we spend too much effort trying to protect unreasonable users from themselves, which interferes with our ability to protect reasonable users and solve real use cases.

> However, there might be an argument
> that we ought to do something else, like have a REPLICATE privilege

Sounds cool. Not sure I have an opinion yet, but probably 17 material.
> What I would like to change in the > patch in this release is to add a new subscription property along the > lines of what I proposed to Jelte in my earlier email: let's provide > an escape hatch that turns off the user-switching behavior. I believe switching to the table owner (as done in your patch) is the right best practice, and should be the default. I'm fine with an escape hatch here to ease migrations or whatever. But please do it in an unobtrusive way such that we (as hackers) can mostly forget that the non-switching behavior exists. At least for me, our system is already pretty complex without two kinds of subscription security models. Regards, Jeff Davis - example follows -- -- user foo CREATE TABLE secret(I INT); INSERT INTO secret VALUES(42); CREATE TABLE r(i INT, secret INT); CREATE OR REPLACE FUNCTION a_func() RETURNS TRIGGER SECURITY DEFINER LANGUAGE plpgsql AS $$ BEGIN SELECT i INTO NEW.secret FROM secret; RETURN NEW; END; $$; CREATE OR REPLACE FUNCTION z_func() RETURNS TRIGGER SECURITY INVOKER LANGUAGE plpgsql AS $$ BEGIN IF NEW.secret <> abs(NEW.secret) THEN RAISE EXCEPTION 'no negative secret numbers allowed'; END IF; RETURN NEW; END; $$; CREATE TRIGGER a_trigger BEFORE INSERT ON r FOR EACH ROW EXECUTE PROCEDURE a_func(); CREATE TRIGGER z_trigger BEFORE INSERT ON r FOR EACH ROW EXECUTE PROCEDURE z_func(); GRANT INSERT ON r TO bar; GRANT USAGE ON SCHEMA foo TO bar; -- user bar CREATE OR REPLACE FUNCTION bar.abs(i INT4) RETURNS INT4 LA
Re: Request for comment on setting binary format output per session
On Tue, 2023-03-28 at 10:22 -0400, Dave Cramer wrote: > OK, IIUC what you are proposing here is that there would be a > separate pool for > database, user, and OIDs. This doesn't seem too flexible. For > instance if I create a UDT and then want it to be returned > as binary then I have to reconfigure the pool to be able to accept a > new list of OID's. There are two ways that I could imagine the connection pool working: 1. Accept whatever clients connect, and pass along the binary_formats setting to the outbound (server) connection. The downside here is that if you have many different clients (or different versions) that have different binary_formats settings, then it creates too many pools and doesn't share well enough. 2. Some kind of configuration setting (or maybe it can be done automatically) that organizes based on a common subset of binary formats that many clients can understand. These can evolve once the protocol extension is in place. Regards, Jeff Davis
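A small sketch of what option 2 could look like from the client or pool side (the setting name binary_formats and its value format are only placeholders from this discussion, not a settled protocol detail; my_udt is a made-up type name):

-- a client or pool can resolve the OIDs it wants returned in binary,
-- including a newly created UDT, from the catalog:
SELECT oid, typname FROM pg_type WHERE typname IN ('int4', 'text', 'my_udt');

-- hypothetical session-level setting the pool would pass through on the
-- outbound connection and key its pooling decisions on:
-- SET binary_formats = '23,25,90210';   -- 90210 standing in for the UDT's OID

The point is that the OID list is just data, so adding a new UDT means extending a configured list rather than changing client code or creating an entirely new pool.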
Re: Why mark empty pages all visible?
On Tue, Mar 28, 2023 at 3:29 PM Andres Freund wrote: > Why is that? It's as clear a signal of concurrent activity on the buffer > you're going to get. Not immediately getting a cleanup lock in VACUUM is informative to the extent that you only care about what is happening that very nanosecond. If you look at which pages it happens to in detail, what you seem to end up with is a whole bunch of noise, which (on its own) tells you exactly nothing about what VACUUM really ought to be doing with those pages. In almost all cases we could get a cleanup lock by waiting for one millisecond and retrying. I suspect that the cleanup lock thing might be a noisy, unreliable proxy for the condition that you actually care about, in the context of your work on relation extension. I bet there is a better signal to go on, if you look for one. > It's interesting to understand *why* we are doing what we are. I think it'd > make sense to propose changing how things work around this, but I just don't > feel like I have a good enough grasp for why we do some of the things we > do. And given there's not a lot of comments around it and some of the comments > that do exist are inconsistent with themselves, looking at the history seems > like the next best thing? I think that I know why Heikki had no comments about PageIsEmpty() pages when the VM first went in, back in 2009: because it was just so obvious that you'd treat them the same as any other initialized page, it didn't seem to warrant a comment at all. The difference between a page with 0 tuples and 1 tuple is the same difference between a page with 1 tuple and a page with 2 tuples. A tiny difference (one extra tuple), of no particular consequence. I think that you don't see it that way now because you're focussed on the hio.c view of things. That code looked very different back in 2009, and in any case is very far removed from vacuumlazy.c. I can tell you what I was thinking of with lazy_scan_new_or_empty: I hate special cases. I will admit to being a zealot about it. > > I gather that you *don't* want to do anything about the > > lazy_vacuum_heap_page behavior with setting empty pages all-visible (just > > the lazy_scan_new_or_empty behavior). > > Not in the short-medium term, at least. In the long term I do suspect we might > want to do something about it. We have a *crapton* of contention in the FSM > and caused by the FSM in bulk workloads. With my relation extension patch > disabling the FSM nearly doubles concurrent load speed. I've seen the same effect myself. There is no question that that's a big problem. I think that the problem is that the system doesn't have any firm understanding of pages as things that are owned by particular transactions and/or backends, at least to a limited, scoped extent. It's all really low level, when it actually needs to be high level and take lots of context that comes from the application into account. > At the same time, the fact that we might loose knowledge about all the > existing free space in case of promotion or crash and never rediscover that > space (because the pages are frozen), seems decidedly not great. Unquestionably. > I don't know what the path forward is, but it seems somewhat clear that we > ought to do something. I suspect having a not crash-safe FSM isn't really > acceptable anymore - it probably is fine to not persist a *reduction* in free > space, but we can't permanently loose track of free space, no matter how many > crashes. Strongly agreed. It's a terrible false economy. 
If we bit the bullet and made relation extension and the FSM crash safe, we'd gain so much more than we'd lose. > One thing I am fairly certain about is that using the FSM to tell other > backends about newly bulk extended pages is not a good solution, even though > we're stuck with it for the moment. Strongly agreed. > > I think that you must be arguing for making the early lifetime of a > > heap page special to VACUUM, since AFAICT you want to change VACUUM's > > behavior with strictly PageIsEmpty/PageIsNew pages only -- and *not* > > with pages that have one remaining LP_UNUSED item, but are otherwise > > empty (which could be set all-visible/all-frozen in either the first > > or second heap pass, even if we disabled the lazy_scan_new_or_empty() > > behavior you're complaining about). You seem to want to distinguish > > between very new pages (that also happen to be empty), and old pages > > that happen to be empty. Right? > > I think that might be worthwhile, yes. The retry code in > RelationGetBufferForTuple() is quite hairy and almost impossible to test. If > we can avoid the complexity, at a fairly bound cost (vacuum needing to > re-visit new/empty pages if they're currently pinned), it'd imo be more that > worth the price. Short term, you could explicitly say that PageIsEmpty() means that the page is qualitatively different to other empty pages that were left behind by VACUUM's second phase. > But perhaps the better p
Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry
On Tue, Mar 28, 2023 at 05:43:26PM +0900, Kyotaro Horiguchi wrote:
> No. Fine by me, except that "block read requests" seems to suggest
> kernel read() calls, maybe because it's not clear to me whether "block"
> refers to our buffer blocks or to file blocks. If it is generally
> clear, I'm fine with the proposal.

Okay. Would somebody like to draft a patch?
--
Michael
Re: ALTER TABLE SET ACCESS METHOD on partitioned tables
On Tue, Mar 28, 2023 at 02:56:28PM +0900, Michael Paquier wrote:
> Hmm. This is a good point. It is true that the patch feels
> incomplete on this side. I don't see why we could not be flexible,
> and allow a value of 0 in a partitioned table's relam to mean that we
> would pick up the database default in this case when a partition is
> created on it. This would have the advantage of being consistent with
> older versions, where we fall back on the default. We cannot be
> completely consistent with the reltablespace of the leaf partitions,
> unfortunately, as relam should always be set if a relation has
> storage. And allowing a value of 0 means that there are likely other
> tricky cases with dumps?
>
> Another thing: would it make sense to allow an empty string in
> default_table_access_method so that we'd always fall back to a database
> default?

FYI, I am not sure that I will be able to look more at this patch by the end of the commit fest, and there are quite a few more points to consider. Perhaps at this stage we'd better mark it as returned with feedback? I understand that I've arrived late to this party :/
--
Michael
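For context, a sketch of the behavior this patch is aiming at (setting an access method on a partitioned table is the proposed feature, not something available in a released version; all object names are made up):

CREATE TABLE parted (a int) PARTITION BY LIST (a);

-- proposed: record an access method on the partitioned table itself ...
ALTER TABLE parted SET ACCESS METHOD heap;

-- ... so that new partitions pick it up instead of default_table_access_method
CREATE TABLE parted_1 PARTITION OF parted FOR VALUES IN (1);

The open question above is what a relam of 0 on the partitioned table should mean, i.e. whether creating parted_1 should then fall back to the database/GUC default, as it effectively does on older versions.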
Re: Should vacuum process config file reload more often
On Tue, Mar 28, 2023 at 4:21 AM Kyotaro Horiguchi wrote:
> At Mon, 27 Mar 2023 14:12:03 -0400, Melanie Plageman wrote in
> > So, I've attached an alternate version of the patchset which takes the
> > approach of having one commit which only enables cost-based delay GUC
> > refresh for VACUUM and another commit which enables it for autovacuum
> > and makes the changes to balancing variables.
> >
> > I still think the commit which has workers updating their own
> > wi_cost_delay in vacuum_delay_point() is a bit weird. It relies on no one
> > else emulating our bad behavior and reading from wi_cost_delay without a
> > lock and on no one else deciding to ever write to wi_cost_delay (even
> > though it is in shared memory [this is the same as master]). It is only
> > safe because our process is the only one (right now) writing to
> > wi_cost_delay, so when we read from it without a lock, we know it isn't
> > being written to. And everyone else takes a lock when reading from
> > wi_cost_delay right now. So, it seems...not great.
> >
> > This approach also introduces a function that is only around for
> > one commit until the next commit obsoletes it, which seems a bit silly.
>
> (I'm not sure what this refers to, though..) I don't think it's silly
> if a later patch removes something that the preceding patches
> introduced, as long as that contributes to readability. Ultimately,
> they will be merged together on committing.

I was under the impression that reviewers thought config reload and worker balance changes should be committed in separate commits. Either way, the ephemeral function is not my primary concern. I felt more uncomfortable with increasing how often we update a double in shared memory which is read without acquiring a lock.

> > Basically, I think it is probably better to just have one commit
> > enabling guc refresh for VACUUM and then another which correctly
> > implements what is needed for autovacuum to do the same.
> > Attached v9 does this.
> >
> > I've provided both complete versions of both approaches (v9 and v8).
>
> I took a look at v9 and have a few comments.
>
> 0001:
>
> I don't believe it is necessary, as mentioned in the commit
> message. It appears that we are resetting it at the appropriate times.

VacuumCostBalance must be zeroed out when we disable vacuum cost. Previously, we did not reenable VacuumCostActive once it was disabled, but now that we do, I think it is good practice to always zero out VacuumCostBalance when we disable vacuum cost. I will squash this commit into the one introducing VacuumCostInactive, though.

> 0002:
>
> I felt a bit uneasy on this. It seems somewhat complex (and makes the
> succeeding patches complex),

Even if we introduced a second global variable to indicate that failsafe mode has been engaged, we would still require the additional checks of VacuumCostInactive.

> has confusing names,

I would be happy to rename the values of the enum to make them less confusing. Are you thinking "force" instead of "locked"? Maybe VACUUM_COST_FORCE_INACTIVE and VACUUM_COST_INACTIVE?

> and doesn't seem self-contained.

By changing the variable from VacuumCostActive to VacuumCostInactive, I have kept all non-vacuum code from having to distinguish between it being inactive due to failsafe mode or due to user settings.

> I think it'd be simpler to add a global boolean (maybe
> VacuumCostActiveForceDisable or such) that forces VacuumCostActive to
> be false and set VacuumCostActive using a setter function that follows
> the boolean.
I think maintaining an additional global variable is more brittle than including the information in a single variable.

> 0003:
>
> + * Reload the configuration file if requested. This allows changes to
> + * vacuum_cost_limit and vacuum_cost_delay to take effect while a table is
> + * being vacuumed or analyzed. Analyze should not reload configuration
> + * file if it is in an outer transaction, as GUC values shouldn't be
> + * allowed to refer to some uncommitted state (e.g. database objects
> + * created in this transaction).
>
> I'm not sure GUC reload is or should be related to transactions. For
> instance, work_mem can be changed by a reload during a transaction
> unless it has been set in the current transaction. I don't think we
> need to deliberately suppress changes in variables caused by reloads
> during transactions only for analyze. If analyze doesn't like changes
> to certain GUC variables, their values should be snapshotted before
> starting the process.

Currently, we only reload the config file in top-level statements. We don't reload the configuration file from within a nested transaction command. BEGIN starts a transaction but not a transaction command, so BEGIN; ANALYZE; probably wouldn't violate this rule. But it is simpler to just forbid reloading when it is not a top-level transaction command. I have updated the comment to reflect this.

> 0004:
> -
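As a usage-level illustration of what the 0003 behavior buys (a sketch only: big_table is a made-up name, and exactly when the running VACUUM notices the change depends on the patch's reload points, roughly the next vacuum_delay_point()):

-- session 1: a long-running, cost-limited vacuum
VACUUM (VERBOSE) big_table;

-- session 2: relax the throttling and signal a reload; with this patchset the
-- running VACUUM would pick up the new values mid-run instead of only when it
-- starts on the next table
ALTER SYSTEM SET vacuum_cost_delay = 0;
SELECT pg_reload_conf();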
Re: Add LZ4 compression in pg_dump
On 3/28/23 00:34, gkokola...@pm.me wrote: > > ... > >> Got it. In that case I agree it's fine to do that in a single commit. > > For what is worth, I think that this patch should get a +1 and get in. It > solves the empty writes problem and includes a test to a previous untested > case. > Pushed, after updating / rewording the commit message a little bit. Thanks! -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [RFC] Add jit deform_counter
On Sun, 12 Jun 2022 at 21:14, Dmitry Dolgov <9erthali...@gmail.com> wrote:
> I've noticed that the JIT performance counter generation_counter seems to include
> actions relevant to both the jit_expressions and jit_tuple_deforming options. It
> means one can't directly see what the influence of jit_tuple_deforming alone is,
> which would be helpful when adjusting JIT options. To make it better a
> new counter can be introduced; does that make sense?

I'm not so sure about this idea. As of now, if I look at EXPLAIN ANALYZE's JIT summary, the individual times add up to the total time. If we add this deform time, then that's no longer going to be true as the "Generation" time includes the newly added deform time.

master:

 JIT:
   Functions: 600
   Options: Inlining false, Optimization false, Expressions true, Deforming true
   Timing: Generation 37.758 ms, Inlining 0.000 ms, Optimization 6.736 ms, Emission 172.244 ms, Total 216.738 ms

37.758 + 6.736 + 172.244 = 216.738

I think if I were a DBA wondering why JIT was taking so long, I'd probably either be very astonished or I'd report a bug if I noticed that all the individual component JIT times didn't add up to the total time. I don't think the solution is to subtract the deform time from the generation time either.

Can't users just get this by looking at EXPLAIN ANALYZE with and without jit_tuple_deforming?

David
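To spell out the manual comparison suggested here (a sketch; the GUCs are real, but the query and table name are placeholders):

SET jit = on;
SET jit_above_cost = 0;               -- force JIT for the test query
SET jit_tuple_deforming = on;
EXPLAIN (ANALYZE) SELECT sum(a) FROM big_table;   -- note the JIT "Timing" line

SET jit_tuple_deforming = off;
EXPLAIN (ANALYZE) SELECT sum(a) FROM big_table;   -- compare Generation/Total

The difference between the two Generation (and Total) times gives a rough idea of the tuple-deforming share without adding a new counter.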