Re: Mark all GUC variable as PGDLLIMPORT
On Mon, 23 Aug 2021 at 22:45, Julien Rouhaud wrote: > On Mon, Aug 23, 2021 at 10:22 PM Tom Lane wrote: > > > > Bruce Momjian writes: > > > On Mon, Aug 23, 2021 at 10:15:04AM -0400, Tom Lane wrote: > > >> By that argument, *every* globally-visible variable should be marked > > >> PGDLLIMPORT. But the mere fact that two backend .c files need to > access > > > > > No, Julien says 99% need only the GUCs, so that is not the argument I > am > > > making. > > > > That's a claim unbacked by any evidence that I've seen. More to > > the point, we already have a mechanism that extensions can/should > > use to read and write GUC settings, and it's not direct access. > > I clearly didn't try all single extension available out there. It's > excessively annoying to compile extensions on Windows, and also I > don't have a lot of dependencies installed so there are some that I > wasn't able to test since I'm lacking some other lib and given my > absolute lack of knowledge of that platform I didn't spent time trying > to install those. > > Plenty of them are closed too. While that's not the Pg community's problem, as such, it'd be nice not to arbitrarily break them all for no actual benefit.
Re: Mark all GUC variable as PGDLLIMPORT
On Tue, 24 Aug 2021 at 05:08, Chapman Flack wrote: > On 08/23/21 14:30, Robert Haas wrote: > > it seems likely that this proposed > > API would have the exact same problem, because it would let people do > > exactly the same thing. And, going through this proposed API would > > still be significantly more expensive than just accessing the bare > > variables, because you'd at least have to do some kind of lookup based > > on the GUC name > > I think the API ideas in [0] would not let people do exactly the same > thing. > > They would avoid exposing the bare variables to overwrite. Not that > there has been any plague of extensions going and overwriting GUCs, > but I think in some messages on this thread I detected a sense that > in principle it's better if an API precludes it, and that makes sense > to me. > > The second idea also avoids the expense of name-based lookup (except once > at extension initialization), and in fact minimizes the cost of obtaining > the current value when needed, by slightly increasing the infrequent cost > of updating values. > I'd be generally in favour of something that reduced our reliance on the current chaotic and inconsistent jumble of globals which are a semi-random combination of compilation-unit-scoped, globally-scoped-except-on-Windows and globally scoped. Tackling GUCs would be a good start. Especially given the number of GUCs where the actual GUC value isn't the state that anyone should be interacting with directly since a hook maintains the "real" state derived from the GUC storage. And preventing direct writes to GUCs seems like a clearly correct thing to do. Some consideration of performance would be important for some of the hotter GUCs, of course, but most extensions won't be hammering most GUC access a lot.
Re: Mark all GUC variable as PGDLLIMPORT
On Tue, 24 Aug 2021 at 13:21, Craig Ringer wrote: > > There is not even the thinnest pretense of Pg having a dedicated extension > API or any sort of internal vs public API separation. > Oh, and if we do want such a separation, we'll need to introduce a MUCH lower-pain-and-overhead way to get related patches in. Otherwise it'll take decades to add any necessary function wrappers for currently exported data symbols, add necessary hooks, wrap or hide internal symbols and state, etc. But ... what is the actual goal and expected outcome of such a hypothetical public/private API separation? It won't help meaningfully with server maintenance: We already break extensions freely in major releases, and sometimes even minor releases. We don't make any stable API promise at all. So any argument that it would make maintenance of the core server easier is weak at best. It won't help protect server runtime stability: The server is written in C, and makes heavy use of non-opaque / non-hidden types. Many of which would not be practical to hide without enormous refactoring if at all. Writing any sort of "safe" C API is very difficult even when the exposed functionality is very narrow and well defined. Even then such an API can only help prevent inadvertent mistakes, since C programs are free to grovel around in memory. Look at the mess with EXPORT_SYMBOL_GPL in the kernel for just how ugly that can get. So I don't think there's any realistic way to claim that narrowing the exposed API surface would make it safer to load and run extensions that the user has not separately checked and reviewed or obtained from a trusted source with robust testing practices. Certainly it offers no benefit at all against a bad actor. It won't make it safer to use untrusted extensions. What will it do? Not much, in the short term, except cripple existing extensions or add a pile of questionably useful code annotations. 
The only real benefits I see are some improvement in link-time optimization and export symbol table size. Both of which are nice, but IMO not worth the pain by themselves for a pure C program. In C++, with its enormous symbol tables it's absolutely worth it. But not really for Pg. To be clear, I actually love the idea of starting to define a solid public API, with API, ABI and semantic promises and associated tests. But to say it's a nontrivial amount of work is an enormous understatement. And unless done by an existing committer who is trusted to largely define a provisional API without bike-shedding and arguing the merits of every single part of it, it's nearly impossible to do with the way Pg is currently developed. It's completely beyond me why it's OK to export all function symbols on Windows, but not all data symbols. Or why it's OK to export all data symbols on *nix, but not on Windows.
Re: Failure of subscription tests with topminnow
On Mon, Aug 16, 2021 at 6:33 PM Amit Kapila wrote: > The "ALTER SUBSCRIPTION tap_sub CONNECTION" would lead to restart the > walsender process. Now, here the problem seems to be that the previous > walsender process (16336) didn't exit and the new process started to > use the slot which leads to the error shown in the log. This is > evident from the below part of log where we can see that 16336 has > exited after new process started to use the slot. > > 2021-08-15 18:44:38.027 CEST [16475:6] tap_sub LOG: received > replication command: START_REPLICATION SLOT "tap_sub" LOGICAL > 0/16BEEB0 (proto_version '1', publication_names > '"tap_pub","tap_pub_ins_only"') > 2021-08-15 18:44:38.027 CEST [16475:7] tap_sub STATEMENT: > START_REPLICATION SLOT "tap_sub" LOGICAL 0/16BEEB0 (proto_version '1', > publication_names '"tap_pub","tap_pub_ins_only"') > 2021-08-15 18:44:38.027 CEST [16475:8] tap_sub ERROR: replication > slot "tap_sub" is active for PID 16336 > 2021-08-15 18:44:38.027 CEST [16475:9] tap_sub STATEMENT: > START_REPLICATION SLOT "tap_sub" LOGICAL 0/16BEEB0 (proto_version '1', > publication_names '"tap_pub","tap_pub_ins_only"') > 2021-08-15 18:44:38.041 CEST [16475:10] tap_sub LOG: disconnection: > session time: 0:00:00.036 user=nm database=postgres host=[local] > 2021-08-15 18:44:38.043 CEST [16336:14] tap_sub LOG: disconnection: > session time: 0:00:06.367 user=nm database=postgres host=[local] > > One idea to solve this is to first disable the subscription, wait for > the walsender process to exit, and then change the connection string > and re-enable the subscription. I tried to simulate this by putting delays prior to the exit of the walsender. Even though I see the same error in the logs that the replication slot is active for the previous PID, the test case does not fail in the way seen in this case here.. The new walsender tries to acquire the slot but seeing that there is another PID that is active on the slot, it errors and exits. 
At no point does this new failed walsender update its pid for the slot. As a result, the below polled query would not return a true value:

$node_publisher->poll_query_until('postgres',
    "SELECT pid != $oldpid FROM pg_stat_replication WHERE application_name = 'tap_sub';"
) or die "Timed out while waiting for apply to restart";

In my runs I see that this query is repeated until a new walsender is able to successfully acquire the slot. I am not able to explain why this query returned true in the topminnow tap test. This would imply that a walsender was able to acquire the slot and update its pid, but I don't see how. We also know that if this polled query returns true (implying a pid other than $oldpid), then the next query in the test tries to identify the pid: SELECT pid FROM pg_stat_replication WHERE application_name = 'tap_sub'; From the topminnow logs we know that this query returned a NULL, as the value extracted from it is used to formulate the next query, which errored out and eventually caused the test case to fail.

[16482:3] 001_rep_changes.pl ERROR: syntax error at or near "FROM" at character 16
[16482:4] 001_rep_changes.pl STATEMENT: SELECT pid != FROM pg_stat_replication WHERE application_name = 'tap_sub';

I am not an expert in perl, but I looked at the perl function used in the tap test, poll_query_until(); it polls until the query returns 'true' (unless specified otherwise). I don't see in my tests that the polled function exits if the query returns a NULL. I don't know if differences in the installed perl can cause this difference in behaviour. Can a NULL result set be construed as true and cause the poll to exit? Any suggestions? regards, Ajin Cherian Fujitsu Australia
Re: Mark all GUC variable as PGDLLIMPORT
On Tue, 24 Aug 2021 at 02:31, Robert Haas wrote: > On Mon, Aug 23, 2021 at 11:40 AM Alvaro Herrera > wrote: > > In that case, why not improve the API with functions that return the > > values in some native datatype? For scalars with native C types (int, > > floats, Boolean etc) this is easy enough; I bet it'll solve 99% of the > > problems or more. > > Sure, but ... why bother? > > The entire argument rests on the presumption that there is some harm > being done by people accessing the values directly, but I don't think > that's true. And, if it were true, it seems likely that this proposed > API would have the exact same problem, because it would let people do > exactly the same thing. And, going through this proposed API would > still be significantly more expensive than just accessing the bare > variables, because you'd at least have to do some kind of lookup based > on the GUC name to find the corresponding variable. It's just a > solution in search of a problem. > > Nothing bad happens when people write extensions that access GUC > variables directly. It works totally, completely fine. Except on > Windows. > Not only that, but postgres already exports every non-static function symbol on both *nix and Windows, and every data symbol on *nix. A lot of those function symbols are very internal and give anything that can call them the ability to wreak absolute havoc on the server's state. There is not even the thinnest pretense of Pg having a dedicated extension API or any sort of internal vs public API separation. This ongoing pain with PGDLLIMPORT on Windows is hard to see as anything but an excuse to make working with and supporting Windows harder because some of us don't like it. I happen to rather dislike working with Windows myself, but I get to do it anyway, and I'd be very happy to remove this particular source of pain.
Re: Mark all GUC variable as PGDLLIMPORT
On Mon, 23 Aug 2021 at 22:15, Tom Lane wrote: > Bruce Momjian writes: > > On Sun, Aug 22, 2021 at 09:29:01PM +0800, Julien Rouhaud wrote: > >> Then shouldn't we try to prevent direct access on all platforms rather > than > >> only one? > > > Agreed. If Julian says 99% of the non-export problems are GUCs, and we > > can just export them all, why not do it? We already export every global > > variable on Unix-like systems, and we have seen no downsides. > > By that argument, *every* globally-visible variable should be marked > PGDLLIMPORT. But the mere fact that two backend .c files need to access > some variable doesn't mean that we want any random bit of code doing so. > > And yes, I absolutely would prohibit extensions from accessing many > of them, if there were a reasonable way to do it. It would be a good > start towards establishing a defined API for extensions. > There is: -fvisibility=hidden and __attribute__((visibility("default"))). Or if you prefer to explicitly mark private symbols, use __attribute__((visibility("hidden"))). In addition to API surface control this also gives you a smaller export symbol table for faster dynamic linking, and it improves link-time optimization. I could've sworn I proposed its use in the past but I can't find a relevant list thread; all I can find is [1]. But that is where we should start: switch from a linker script for libpq to using PGDLLIMPORT (actually it'd be LIBPQDLLIMPORT) in libpq. When -DBUILDING_LIBPQ is defined, this would expand to __declspec(dllexport) on Windows and __attribute__((visibility("default"))) on gcc/clang. Otherwise it expands to __declspec(dllimport) on Windows and empty on other targets. This would also be a good time to rename the confusingly named BUILDING_DLL macro to BUILDING_POSTGRES. The next step would be to have PGDLLIMPORT expand to __attribute__((visibility("default"))) on gcc/clang when building the server itself. 
This won't do anything by itself since all symbols are already default visibility. But once the "public API" of both function and data symbol is so-annotated, we could switch to building Pg with -fvisibility=hidden by default, and on Windows, we'd disable the linker script that exports all functions using a generated .DEF file. [1] https://www.postgresql.org/message-id/CAMsr%2BYHa3TfA4rKtnZuzurwCSmxxXNQHFE3UE29BoDEQcwfuxA%40mail.gmail.com
Re: Queries that should be canceled will get stuck on secure_write function
On 2021/08/24 0:26, Alvaro Herrera wrote:
> Aren't we talking about query cancellations that occur in response to a standby delay limit? Those aren't in response to user action. What I mean is that if the standby delay limit is exceeded, then we send a query cancel; we expect the standby to cancel its query at that time and then the primary can move on. But if the standby doesn't react, then we can have it terminate its connection.

+1

On 2021/08/24 3:45, Robert Haas wrote:
> On Mon, Aug 23, 2021 at 11:26 AM Alvaro Herrera wrote:
> > Aren't we talking about query cancellations that occur in response to a standby delay limit? Those aren't in response to user action.
>
> Oh, you're right. But I guess a similar problem could also occur in response to pg_terminate_backend(), no?

There seems to be no problem in that case because pg_terminate_backend() causes a backend to set ProcDiePending to true in the die() signal handler, and ProcessClientWriteInterrupt() called by secure_write() handles ProcDiePending. No?

Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: Mark all GUC variable as PGDLLIMPORT
On Sun, 22 Aug 2021 at 21:29, Julien Rouhaud wrote: > On Sun, Aug 22, 2021 at 09:19:42AM -0400, Tom Lane wrote: > > > > Uh, no, it's exactly *not* clear. There are a lot of GUCs that are only > > of interest to particular subsystems. I do not see why being a GUC makes > > something automatically more interesting than any other global variable. > > Usually, the fact that one is global is only so the GUC machinery itself > > can get at it, otherwise it'd be static in the owning module. > > > > As for "extensions should be able to get at the values", the GUC > machinery > > already provides uniform mechanisms for doing that safely. Direct access > > to the variable's internal value would be unsafe in many cases. > > Then shouldn't we try to prevent direct access on all platforms rather than > only one? > Yes. That's what I think we should be doing if we're not going to PGDLLIMPORT them all. The current API is painful because it round-trips via a text representation. We'd at least want some kind of GetConfigOptionBool(...) GetConfigOptionEnum(...) etc. I don't understand the objection to marking them all PGDLLIMPORT anyway though. Any pretense that Pg has any sort of public/private API divide is pure fantasy. Whether things are static or not, in public headers or "internal" headers, etc, is nearly random. We have absolutely no API surface control such as __attribute__((visibility("hidden"))) annotations, and proposals to add them have been soundly rejected in the past. If we have a defined API, where is it defined and annotated? If we don't, then Windows being different and incompatible is a bug, and that bug should be fixed.
Re: Mark all GUC variable as PGDLLIMPORT
On Tue, Aug 24, 2021 at 12:36 PM Pavel Stehule wrote: > > There are few GUC variables, where we need extra fast access - work_mem, > encoding settings, and maybe an application name. For others I hadn't needed > to access it for over 20 years. But I understand that more complex extensions > like timescaledb will use more internal GUC. Note that even trivial extensions require some other GUC access. For instance any single extension that wants to aggregate data based on the query_id has to store an array of query_id in shmem to know what a parallel worker is working on. On top of wasting memory and CPU, it also means that computing the maximum number of backends is required. So you need to recompute MaxBackends, which means access to multiple GUCs. Unfortunately the patch to expose the query_id didn't fix that problem as it only passes the top level query_id to the parallel workers, not the current one.
Re: Mark all GUC variable as PGDLLIMPORT
On Mon, 23 Aug 2021 at 23:08, Chapman Flack wrote: > On 08/23/21 14:30, Robert Haas wrote: > > it seems likely that this proposed > > API would have the exact same problem, because it would let people do > > exactly the same thing. And, going through this proposed API would > > still be significantly more expensive than just accessing the bare > > variables, because you'd at least have to do some kind of lookup based > > on the GUC name > > I think the API ideas in [0] would not let people do exactly the same > thing. > > They would avoid exposing the bare variables to overwrite. Not that > there has been any plague of extensions going and overwriting GUCs, > but I think in some messages on this thread I detected a sense that > in principle it's better if an API precludes it, and that makes sense > to me. > > The second idea also avoids the expense of name-based lookup (except once > at extension initialization), and in fact minimizes the cost of obtaining > the current value when needed, by slightly increasing the infrequent cost > of updating values. > > There are a few GUC variables where we need extra-fast access - work_mem, encoding settings, and maybe the application name. For the others, I haven't needed such access in over 20 years. But I understand that more complex extensions like timescaledb will use more internal GUCs. Maybe a new lightweight API based on enum identifiers instead of string identifiers, ensuring relatively fast (and safe and secure) access, could be a good solution. And for special variables, there can be envelope functions for very fast access. But even with a special interface, the responsibility of the extension's author will not be any less, and C extensions will not be more trusted, so it is hard to say much about the possible benefits. I am inclined to Robert's opinion: the current state is not too bad, and maybe the best thing that is possible now is decreasing the difference between supported platforms. Regards Pavel
Re: Parallel scan with SubTransGetTopmostTransaction assert coredump
On Tue, Aug 24, 2021 at 5:00 AM Robert Haas wrote: > > > I think this looks pretty good. I am not sure I see any reason to > introduce a new function RestoreTxnSnapshotAndSetAsActive. Couldn't we > just use RestoreTransactionSnapshot() and then call > PushActiveSnapshot() from parallel.c? That seems preferable to me from > the standpoint of avoiding multiplication of APIs. > I initially thought this too, but RestoreTransactionSnapshot() sets up the resultant transaction snapshot in "CurrentSnapshot", which is static to snapmgr.c (like the other pointers to valid snapshots) and I didn't really want to mess with the visibility of that, to allow a call to PushActiveSnapshot(CurrentSnapshot) in parallel.c. Also, I wasn't sure if it was safe to call GetTransactionSnapshot() here without the risk of unwanted side-effects - but, looking at it again, I think it is probably OK, so I did use it in my revised patch (attached) and removed that new function RestoreTxnSnapshotAndSetAsActive(). Do you agree that it is OK to call GetTransactionSnapshot() here? > I also think that the comments should explain why we are doing this, > rather than just that we are doing this. So instead of this: > > +/* > + * If the transaction snapshot was serialized, restore it and restore the > + * active snapshot too. Otherwise, the active snapshot is also installed > as > + * the transaction snapshot. > + */ > > ...perhaps something like: > > If the transaction isolation level is READ COMMITTED or SERIALIZABLE, > the leader has serialized the transaction snapshot and we must restore > it. At lower isolation levels, there is no transaction-lifetime > snapshot, but we need TransactionXmin to get set to a value which is > less than or equal to the xmin of every snapshot that will be used by > this worker. The easiest way to accomplish that is to install the > active snapshot as the transaction snapshot. 
Code running in this > parallel worker might take new snapshots via GetTransactionSnapshot() > or GetLatestSnapshot(), but it shouldn't have any way of acquiring a > snapshot older than the active snapshot. > I agree, that is a better comment and detailed description, but didn't you mean "If the transaction isolation level is REPEATABLE READ or SERIALIZABLE ..."? since we have:

#define XACT_READ_UNCOMMITTED 0
#define XACT_READ_COMMITTED   1
#define XACT_REPEATABLE_READ  2
#define XACT_SERIALIZABLE     3

#define IsolationUsesXactSnapshot() (XactIsoLevel >= XACT_REPEATABLE_READ)

Regards, Greg Nancarrow Fujitsu Australia v10-0001-Fix-parallel-worker-failed-assertion-and-coredump.patch Description: Binary data
Re: Support reset of Shared objects statistics in "pg_stat_reset" function
On Sun, 22 Aug 2021 at 22:53, Sadhuprasad Patro wrote: > > > On 2021/08/20 11:07, Mahendra Singh Thalor wrote: > > > 1) > > > Resets statistics for a single table or index in the current database > > > -to zero. > > > +to zero. The input can be a shared table also > > > > > > I think, above comment should be modified. Maybe, we can modify it as "If input is a shared oid(table or index or toast), then resets statistics for a single shared entry to zero. > > > > I'm not sure if ordinary users can understand what "shared oid" means. Instead, > > what about "Resets statistics for a single relation in the current database or > > shared across all databases in the cluster to zero."? > > > > Thank you for the review here. As per the comments, attached the > latest patch here... > Thanks Sadhu for the updated patch. Patch looks good to me and I don't have any more comments. I marked this patch as 'ready for committer'. https://commitfest.postgresql.org/34/3282/ -- Thanks and Regards Mahendra Singh Thalor EnterpriseDB: http://www.enterprisedb.com
Re: Allow batched insert during cross-partition updates
On Tue, Jul 27, 2021 at 11:32 AM Amit Langote wrote: > On Thu, Jul 22, 2021 at 2:18 PM vignesh C wrote: > > One of the test postgres_fdw has failed, can you please post an > > updated patch for the fix: > > test postgres_fdw ... FAILED (test process exited with > > exit code 2) 7264 ms > > Thanks Vignesh. > > I found a problem with the underlying batching code that caused this > failure and have just reported it here: > > https://www.postgresql.org/message-id/CA%2BHiwqEWd5B0-e-RvixGGUrNvGkjH2s4m95%3DJcwUnyV%3Df0rAKQ%40mail.gmail.com > > Here's v8, including the patch for the above problem. Tomas committed the bug-fix, so attaching a rebased version of the patch for $subject. -- Amit Langote EDB: http://www.enterprisedb.com v9-0001-Allow-batching-of-inserts-during-cross-partition-.patch Description: Binary data
Re: prevent immature WAL streaming
At Mon, 23 Aug 2021 18:52:17 -0400, Alvaro Herrera wrote in > Included 蔡梦娟 and Jakub Wartak because they've expressed interest on > this topic -- notably [2] ("Bug on update timing of walrcv->flushedUpto > variable"). > > As mentioned in the course of thread [1], we're missing a fix for > streaming replication to avoid sending records that the primary hasn't > fully flushed yet. This patch is a first attempt at fixing that problem > by retreating the LSN reported as FlushPtr whenever a segment is > registered, based on the understanding that if no registration exists > then the LogwrtResult.Flush pointer can be taken at face value; but if a > registration exists, then we have to stream only till the start LSN of > that registered entry. > > This patch is probably incomplete. First, I'm not sure that logical > replication is affected by this problem. I think it isn't, because > logical replication will halt until the record can be read completely -- > maybe I'm wrong and there is a way for things to go wrong with logical > replication as well. But also, I need to look at the other uses of > GetFlushRecPtr() and see if those need to change to the new function too > or they can remain what they are now. > > I'd also like to have tests. That seems moderately hard, but if we had > WAL-molasses that could be used in walreceiver, it could be done. (It > sounds easier to write tests with a molasses-archive_command.) > > > [1] https://postgr.es/m/cbddfa01-6e40-46bb-9f98-9340f4379...@amazon.com > [2] > https://postgr.es/m/3f9c466d-d143-472c-a961-66406172af96.mengjuan@alibaba-inc.com (I'm not sure what "WAL-molasses" above expresses, same as "sugar"?) For our information, this issue is related to the commit 0668719801, which makes XLogPageRead restart reading a (continued or segment-spanning) record when switching sources. In that thread, I modified the code to cause a server crash under the desired situation. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: .ready and .done files considered harmful
(sigh..) At Tue, 24 Aug 2021 11:35:06 +0900 (JST), Kyotaro Horiguchi wrote in > > IIUC partial WAL files are handled because the next file in the > > sequence with the given TimeLineID won't be there, so we will fall > > back to a directory scan and pick it up. Timeline history files are > > handled by forcing a directory scan, which should work because they > > always have the highest priority. Backup history files, however, do > > not seem to be handled. I think one approach to fixing that is to > > also treat backup history files similarly to timeline history files. > > If one is created, we force a directory scan, and the directory scan > > logic will consider backup history files as higher priority than > > everything but timeline history files. > > Backup history files are (currently) just informational and they are > finally processed at the end of a bulk-archiving performed by the fast > path. However, I feel that it is cleaner to trigger a directory scan > every time we add an other-than-a-regular-WAL-file, as base-backup or - promotion are not supposed happen so infrequently. + promotion are not supposed happen so frequently. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: .ready and .done files considered harmful
At Tue, 24 Aug 2021 00:03:37 +, "Bossart, Nathan" wrote in > On 8/23/21, 10:49 AM, "Robert Haas" wrote: > > On Mon, Aug 23, 2021 at 11:50 AM Bossart, Nathan > > wrote: > >> To handle a "cheating" archive command, I'd probably need to add a > >> stat() for every time pgarch_readyXLog() returned something from > >> arch_files. I suspect something similar might be needed in Dipesh's > >> patch to handle backup history files and partial WAL files. > > > > I think he's effectively got that already, although it's probably > > inside of pgarch_readyXLog(). The idea there is that instead of having > > a cache of files to be returned (as in your case) he just checks > > whether the next file in sequence happens to be present and if so > > returns that file name. To see whether it's present, he uses stat(). > > IIUC partial WAL files are handled because the next file in the > sequence with the given TimeLineID won't be there, so we will fall > back to a directory scan and pick it up. Timeline history files are > handled by forcing a directory scan, which should work because they > always have the highest priority. Backup history files, however, do > not seem to be handled. I think one approach to fixing that is to > also treat backup history files similarly to timeline history files. > If one is created, we force a directory scan, and the directory scan > logic will consider backup history files as higher priority than > everything but timeline history files. Backup history files are (currently) just informational and they are finally processed at the end of a bulk-archiving performed by the fast path. However, I feel that it is cleaner to trigger a directory scan every time we add an other-than-a-regular-WAL-file, as base-backup or promotion are not supposed happen so infrequently. > I've been looking at the v9 patch with fresh eyes, and I still think > we should be able to force the directory scan as needed in > XLogArchiveNotify(). 
Unless the file to archive is a regular WAL file > that is > our stored location in archiver memory, we should force a > directory scan. I think it needs to be > instead of >= because we > don't know if the archiver has just completed a directory scan and > found a later segment to use to update the archiver state (but hasn't > yet updated the state in shared memory). I'm afraid that it can be seen as a violation of modularity. I feel that the WAL-emitter side should not be aware of that detail of archiving. Instead, I would prefer to keep doing directory scans as long as we find a smaller segment id than the next-expected segment id ever archived by the fast path (if possible). This would be less performant in the case where out-of-order segments are frequent, but I think the overall objective of the original patch will be kept. > Also, I think we need to make sure to set PgArch->dirScan back to true > at the end of pgarch_readyXlog() unless we've found a new regular WAL > file that we can use to reset the archiver's stored location. This > ensures that we'll keep doing directory scans as long as there are > timeline/backup history files to process. Right. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [BUG] wrong refresh when ALTER SUBSCRIPTION ADD/DROP PUBLICATION
On Tue, Aug 24, 2021 at 10:01 AM houzj.f...@fujitsu.com wrote: > > > > > -Original Message- > > From: Masahiko Sawada > > Sent: Tuesday, August 24, 2021 8:52 AM > > > > Thanks. The patch for HEAD looks good to me. Regarding the patch for PG14, I > > think we need to update the comment as well: > > > > @@ -987,7 +986,7 @@ AlterSubscription(AlterSubscriptionStmt *stmt, bool > > isTopLevel) > > PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION with > > refresh"); > > > > /* Only refresh the added/dropped list of publications. */ > > - sub->publications = stmt->publication; > > + sub->publications = publist; > > > > Thanks for the comment, attach new version patch which fixed it. Thank you for updating the patch! Looks good to me. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
RE: [BUG] wrong refresh when ALTER SUBSCRIPTION ADD/DROP PUBLICATION
> -Original Message- > From: Masahiko Sawada > Sent: Tuesday, August 24, 2021 8:52 AM > > Thanks. The patch for HEAD looks good to me. Regarding the patch for PG14, I > think we need to update the comment as well: > > @@ -987,7 +986,7 @@ AlterSubscription(AlterSubscriptionStmt *stmt, bool > isTopLevel) > PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION with > refresh"); > > /* Only refresh the added/dropped list of publications. */ > - sub->publications = stmt->publication; > + sub->publications = publist; > Thanks for the comment, attach new version patch which fixed it. Best regards, Hou zj v8-0001-PG15-Fix-Alter-Subscription-Add-Drop-Publication-refre.patch Description: v8-0001-PG15-Fix-Alter-Subscription-Add-Drop-Publication-refre.patch v8-0001-PG14-Fix-Alter-Subscription-Add-Drop-Publication-refre_patch Description: v8-0001-PG14-Fix-Alter-Subscription-Add-Drop-Publication-refre_patch
Re: [BUG] wrong refresh when ALTER SUBSCRIPTION ADD/DROP PUBLICATION
On Mon, Aug 23, 2021 at 11:05 PM houzj.f...@fujitsu.com wrote: > > On Mon, Aug 23, 2021 8:01 PM Amit Kapila wrote: > > On Mon, Aug 23, 2021 at 2:45 PM > > houzj.f...@fujitsu.com wrote: > > > > > > On Mon, Aug 23, 2021 12:59 PM Amit Kapila wrote: > > > > > > > > On Sat, Aug 7, 2021 at 6:53 PM houzj.f...@fujitsu.com > > > > wrote: > > > > > > > > > > Personally, I also think it will be better to make the behavior > > > > > consistent. > > > > > Attach the new version patch make both ADD and DROP behave the > > > > > same as SET PUBLICATION which refresh all the publications. > > > > > > > > > > > > > I think we can have tests in the separate test file > > > > (alter_sub_pub.pl) like you earlier had in one of the versions. Use > > > > some meaningful names for tables instead of temp1, temp2 as you had in > > > > the > > previous version. > > > > Otherwise, the code changes look good to me. > > > > > > Thanks for the comment. > > > Attach new version patch which did the following changes. > > > > > > * move the tests to a separate test file (024_alter_sub_pub.pl) > > > * adjust the document of copy_data option > > > * add tab-complete for copy_data option when ALTER DROP publication. > > > > > > > Thanks, the patch looks good to me. I have made some cosmetic changes in the > > attached version. It makes the test cases easier to understand. > > I am planning to push this tomorrow unless there are more comments or > > suggestions. > > Thanks ! The patch looks good to me. > > I noticed that the patch cannot be applied to PG14-stable, > so I attach a separate patch for back-patch(I didn’t change the patch for > HEAD branch). Thanks. The patch for HEAD looks good to me. Regarding the patch for PG14, I think we need to update the comment as well: @@ -987,7 +986,7 @@ AlterSubscription(AlterSubscriptionStmt *stmt, bool isTopLevel) PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION with refresh"); /* Only refresh the added/dropped list of publications. 
*/ - sub->publications = stmt->publication; + sub->publications = publist; Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Mon, Aug 23, 2021 at 04:57:31PM -0400, Robert Haas wrote: > On Fri, Aug 20, 2021 at 1:36 PM Shruthi Gowda wrote: > > Thanks Robert for your comments. > > I have split the patch into two portions. One that handles DB OID and > > the other that > > handles tablespace OID and relfilenode OID. > > It's pretty clear from the discussion, I think, that the database OID > one is going to need rework to be considered. I assume this patch is not going to be applied until there is an actual use case for preserving these values. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: .ready and .done files considered harmful
On 8/23/21, 10:49 AM, "Robert Haas" wrote: > On Mon, Aug 23, 2021 at 11:50 AM Bossart, Nathan wrote: >> To handle a "cheating" archive command, I'd probably need to add a >> stat() for every time pgarch_readyXLog() returned something from >> arch_files. I suspect something similar might be needed in Dipesh's >> patch to handle backup history files and partial WAL files. > > I think he's effectively got that already, although it's probably > inside of pgarch_readyXLog(). The idea there is that instead of having > a cache of files to be returned (as in your case) he just checks > whether the next file in sequence happens to be present and if so > returns that file name. To see whether it's present, he uses stat(). IIUC partial WAL files are handled because the next file in the sequence with the given TimeLineID won't be there, so we will fall back to a directory scan and pick it up. Timeline history files are handled by forcing a directory scan, which should work because they always have the highest priority. Backup history files, however, do not seem to be handled. I think one approach to fixing that is to also treat backup history files similarly to timeline history files. If one is created, we force a directory scan, and the directory scan logic will consider backup history files as higher priority than everything but timeline history files. I've been looking at the v9 patch with fresh eyes, and I still think we should be able to force the directory scan as needed in XLogArchiveNotify(). Unless the file to archive is a regular WAL file that is > our stored location in archiver memory, we should force a directory scan. I think it needs to be > instead of >= because we don't know if the archiver has just completed a directory scan and found a later segment to use to update the archiver state (but hasn't yet updated the state in shared memory). 
Also, I think we need to make sure to set PgArch->dirScan back to true at the end of pgarch_readyXlog() unless we've found a new regular WAL file that we can use to reset the archiver's stored location. This ensures that we'll keep doing directory scans as long as there are timeline/backup history files to process. Nathan
prevent immature WAL streaming
Included 蔡梦娟 and Jakub Wartak because they've expressed interest on this topic -- notably [2] ("Bug on update timing of walrcv->flushedUpto variable"). As mentioned in the course of thread [1], we're missing a fix for streaming replication to avoid sending records that the primary hasn't fully flushed yet. This patch is a first attempt at fixing that problem by retreating the LSN reported as FlushPtr whenever a segment is registered, based on the understanding that if no registration exists then the LogwrtResult.Flush pointer can be taken at face value; but if a registration exists, then we have to stream only till the start LSN of that registered entry. This patch is probably incomplete. First, I'm not sure that logical replication is affected by this problem. I think it isn't, because logical replication will halt until the record can be read completely -- maybe I'm wrong and there is a way for things to go wrong with logical replication as well. But also, I need to look at the other uses of GetFlushRecPtr() and see if those need to change to the new function too or they can remain what they are now. I'd also like to have tests. That seems moderately hard, but if we had WAL-molasses that could be used in walreceiver, it could be done. (It sounds easier to write tests with a molasses-archive_command.) [1] https://postgr.es/m/cbddfa01-6e40-46bb-9f98-9340f4379...@amazon.com [2] https://postgr.es/m/3f9c466d-d143-472c-a961-66406172af96.mengjuan@alibaba-inc.com -- Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/ >From 2c5d347e9b6819ba59fa0e72e4227a2cb4d1230f Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Mon, 23 Aug 2021 18:25:48 -0400 Subject: [PATCH] Don't stream non-final WAL segments Avoid setting the physical-stream replication read pointer in the middle of a WAL record. This can occur if a record is split in two (or more) across segment boundaries. 
The reason to avoid it is that if we stream the segment containing the first half, and then we crash before writing the next segment, the primary will rewrite the tail of the segment with a new WAL record (having discarded the incomplete record), but the replica will be stuck trying to replay a broken file (since the next segment will never contain the now-gone data). To do this, change streaming replication to retreat the flush pointer according to registered segment boundaries.
---
 src/backend/access/transam/xlog.c   | 39 ++---
 src/backend/replication/walsender.c |  2 +-
 src/include/access/xlog.h           |  1 +
 3 files changed, 37 insertions(+), 5 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 24165ab03e..2b85d1b2ae 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -731,8 +731,10 @@ typedef struct XLogCtlData
 	 */
 	XLogSegNo	lastNotifiedSeg;
 	XLogSegNo	earliestSegBoundary;
+	XLogRecPtr	earliestSegBoundaryStartPtr;
 	XLogRecPtr	earliestSegBoundaryEndPtr;
 	XLogSegNo	latestSegBoundary;
+	XLogRecPtr	latestSegBoundaryStartPtr;
 	XLogRecPtr	latestSegBoundaryEndPtr;

 	slock_t		segtrack_lck;	/* locks shared variables shown above */
@@ -932,7 +934,7 @@ static void RemoveXlogFile(const char *segname, XLogSegNo recycleSegNo,
 						   XLogSegNo *endlogSegNo);
 static void UpdateLastRemovedPtr(char *filename);
 static void ValidateXLOGDirectoryStructure(void);
-static void RegisterSegmentBoundary(XLogSegNo seg, XLogRecPtr pos);
+static void RegisterSegmentBoundary(XLogSegNo seg, XLogRecPtr startpos, XLogRecPtr endpos);
 static void CleanupBackupHistory(void);
 static void UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force);
 static XLogRecord *ReadRecord(XLogReaderState *xlogreader,
@@ -1183,11 +1185,11 @@ XLogInsertRecord(XLogRecData *rdata,
 		 *
 		 * Note that we did not use XLByteToPrevSeg() for determining the
 		 * ending segment.  This is so that a record that fits perfectly into
-		 * the end of the segment causes the latter to get marked ready for
+		 * the end of the segment causes said segment to get marked ready for
 		 * archival immediately.
 		 */
 		if (StartSeg != EndSeg && XLogArchivingActive())
-			RegisterSegmentBoundary(EndSeg, EndPos);
+			RegisterSegmentBoundary(EndSeg, StartPos, EndPos);

 		/*
 		 * Advance LogwrtRqst.Write so that it includes new block(s).
@@ -4398,7 +4400,7 @@ ValidateXLOGDirectoryStructure(void)
  * to delay until the end segment is known flushed.
  */
 static void
-RegisterSegmentBoundary(XLogSegNo seg, XLogRecPtr endpos)
+RegisterSegmentBoundary(XLogSegNo seg, XLogRecPtr startpos, XLogRecPtr endpos)
 {
 	XLogSegNo	segno PG_USED_FOR_ASSERTS_ONLY;

@@ -4415,6 +4417,7 @@ RegisterSegmentBoundary(XLogSegNo seg, XLogRecPtr endpos)
 	if (XLogCtl->earliestSegBoundary == MaxXLogSegNo)
 	{
 		XLogCtl->earliestSegBoundary = seg;
+		XLogCtl->earliestSegBoundaryStartPtr = startpos;
 		XLogCtl->earliestSegBoundaryEndPtr = endpos;
 	}
 	else if (seg >
Re: The Free Space Map: Problems and Opportunities
On Fri, Aug 20, 2021 at 1:30 PM Robert Haas wrote: > On Fri, Aug 20, 2021 at 3:40 PM Peter Geoghegan wrote: > > My concern here is really the data structure and its overall > > complexity. > I don't think I understand the data structure that you have in mind > well enough to comment intelligently. Right now my prototype has a centralized table in shared memory, with a hash table. One entry per relation, generally multiple freelists per relation. And with per-freelist metadata such as owner and original leader backend XID values. Plus of course the lists of free blocks themselves. The prototype already clearly fixes the worst problems with BenchmarkSQL, but that's only one of my goals. That's just the starting point. I appreciate your input on this. And not just on the implementation details -- the actual requirements themselves are still in flux. This isn't so much a project to replace the FSM as it is a project that adds a new rich abstraction layer that goes between access methods and smgr.c -- free space management is only one of the responsibilities. Access methods like heapam can talk to the new abstraction layer, sometimes exploiting semantics from the logical database -- owning transaction IDs for a given lease, things like that. The new abstraction may thereby help us when working on other problems, including problems in areas that might seem totally unrelated. That's the big idea. Here are some (admittedly quite speculative) examples of places that the infrastructure could help with:

* Parallel INSERT/COPY. I imagine parallel DML is hard because of coordination problems. With the infrastructure I have in mind, a parallel DML implementation could request a large free space lease, and distribute blocks from that lease among parallel workers during execution, on demand. This naturally avoids unnecessary buffer lock contention among concurrent writes from the same transaction (but different workers). But that's not all.
I'm also thinking of things like LWLock/buffer lock deadlock avoidance -- issues of fundamental correctness. That stuff becomes easier, too. Individual parallel workers from a parallel INSERT could safely assume that the blocks that the parallel leader tells them about really are only going to be accessed by them, for as long as the owning transaction is around (barring read-only access by VACUUM, things like that). There is simply no question of any other backend/parallel worker thinking that it has the right to use the same block before the transaction commits. The transaction-scoped ownership semantics must be truly robust for this to work at all, but that's a good idea anyway.

* Truncate tables without a disruptive relation-level lock. AFAICT it isn't actually logically necessary to get a heavyweight relation extension lock to truncate a relation during VACUUM. The only reason we need that right now is because it's really hard to nail down which backends might have a usable BlockNumber that they still expect to not point past the end of the physical relfilenode (or much worse, point to new and logically unrelated data). Perhaps we can solve the underlying problem using something like Lanin & Shasha's "drain technique", which is used inside nbtree for page deletion today. The general idea is to defer the physical unlink/recycling until we know for sure that any backends that might have had unsafe references have gone away. This is a very broad technique.

* Truncate-away empty heap table segments that happen to come before other segments that still have useful data.

* Eager cleanup of aborted transactions by VACUUM (or some variation of VACUUM that just handles aborts).

Again, we're using information about the logical database to solve problems with the physical database -- that information is what we seem to currently be missing in all these areas. That's really the central idea behind the new infrastructure. -- Peter Geoghegan
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
Greetings, * Robert Haas (robertmh...@gmail.com) wrote: > On Fri, Aug 20, 2021 at 1:36 PM Shruthi Gowda wrote: > > Thanks Robert for your comments. > > I have split the patch into two portions. One that handles DB OID and > > the other that > > handles tablespace OID and relfilenode OID. > > It's pretty clear from the discussion, I think, that the database OID > one is going to need rework to be considered. Regarding that ... I have to wonder just what promises we feel we've made when it comes to what a user is expected to be able to do with the new cluster *before* pg_upgrade is run on it. For my part, I sure feel like it's "nothing", in which case it seems like we can do things that we can't do with a running system, like literally just DROP and recreate with the correct OID of any databases we need to, or even push that back to the user to do that at initdb time with some kind of error thrown by pg_upgrade during the --check phase. "Initial databases have non-standard OIDs, recreate destination cluster with initdb --with-oid=12341" or something along those lines. Also open to the idea of simply forcing 'template1' to always being OID=1 even if it's dropped/recreated and then just dropping/recreating the template0 and postgres databases if they've got different OIDs than what the old cluster did- after all, they should be getting entirely re-populated as part of the pg_upgrade process itself. Thanks, Stephen signature.asc Description: PGP signature
Re: Mark all GUC variable as PGDLLIMPORT
On 08/23/21 14:30, Robert Haas wrote: > it seems likely that this proposed > API would have the exact same problem, because it would let people do > exactly the same thing. And, going through this proposed API would > still be significantly more expensive than just accessing the bare > variables, because you'd at least have to do some kind of lookup based > on the GUC name I think the API ideas in [0] would not let people do exactly the same thing. They would avoid exposing the bare variables to overwrite. Not that there has been any plague of extensions going and overwriting GUCs, but I think in some messages on this thread I detected a sense that in principle it's better if an API precludes it, and that makes sense to me. The second idea also avoids the expense of name-based lookup (except once at extension initialization), and in fact minimizes the cost of obtaining the current value when needed, by slightly increasing the infrequent cost of updating values. Regards, -Chap [0] https://www.postgresql.org/message-id/6123C425.3080409%40anastigmatix.net
Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)
On Fri, Aug 20, 2021 at 1:36 PM Shruthi Gowda wrote: > Thanks Robert for your comments. > I have split the patch into two portions. One that handles DB OID and > the other that > handles tablespace OID and relfilenode OID. It's pretty clear from the discussion, I think, that the database OID one is going to need rework to be considered. Regarding the other one: - The comment in binary_upgrade_set_pg_class_oids() is still not accurate. You removed the sentence which says "Indexes cannot have toast tables, so we need not make this probe in the index code path" but the immediately preceding sentence is still inaccurate in at least two ways. First, it only talks about tables, but the code now applies to indexes. Second, it only talks about OIDs, but now also deals with relfilenodes. It's really important to fully update every comment that might be affected by your changes! - The SQL query in that function isn't completely correct. There is a left join from pg_class to pg_index whose ON clause includes "c.reltoastrelid = i.indrelid AND i.indisvalid." The reason it's like that is because it is possible, in corner cases, for a TOAST table to have multiple TOAST indexes. I forget exactly how that happens, but I think it might be like if a REINDEX CONCURRENTLY on the TOAST table fails midway through, or something of that sort. Now if that happens, the LEFT JOIN you added is going to cause the output to contain multiple rows, because you didn't replicate the i.indisvalid condition into that ON clause. And then it will fail. Apparently we don't have a pg_upgrade test case for this scenario; we probably should. Actually what I think would be even better than putting i.indisvalid into that ON clause would be to join off of i.indrelid rather than c.reltoastrelid. - The code that decodes the various columns of this query does so in a slightly different order than the query itself. It would be better to make it match. Perhaps put relkind first in both cases. 
I might also think about trying to make the column naming a bit more consistent, e.g. relkind, relfilenode, toast_oid, toast_relfilenode, toast_index_oid, toast_index_relfilenode. - In heap_create(), the wording of the error messages is not quite consistent. You have "relfilenode value not set when in binary upgrade mode", "toast relfilenode value not set when in binary upgrade mode", and "pg_class index relfilenode value not set when in binary upgrade mode". Why does the last one mention pg_class when the other two don't? - The code in heap_create() now has no comments whatsoever, which is a shame, because it's actually kind of a tricky bit of logic. Someone might wonder why we override the relfilenode inside that function instead of doing it at the same places where we absorb binary_upgrade_next_{heap,index,toast}_pg_class_oid and then passing down the relfilenode. I think the answer is that passing down the relfilenode from the caller would result in storage not actually being created, whereas in this case we want it to be created but just with the value we specify, and the reason we want that is because we need later DDL that happens after these statements but before the old cluster's relations are moved over to execute successfully, which it won't if the storage is altogether absent. However, that raises the question of whether this patch has even got the basic design right. Maybe we ought to actually be absorbing the relfilenode setting at the same places where we're doing so for the OID, and then passing an additional parameter to heap_create() like bool suppress_storage or something like that. Maybe, taking it even further, we ought to be changing the signatures of binary_upgrade_next_heap_pg_class_oid and friends to be two-argument functions, and pass down the OID and the relfilenode in the same call, rather than calling two separate functions. I'm not so much concerned about the cost of calling two functions as the potential for confusion. 
I'm not honestly sure that either of these changes are the right thing to do, but I am pretty strongly inclined to do at least the first part - trying to absorb reloid and relfilenode in the same places. If we're not going to do that we certainly need to explain why we're doing it the way we are in the comments. It's not really this patch's fault, but it would sure be nice if we had some better testing for this area. Suppose this patch somehow changed nothing from the present behavior. How would we know? Or suppose it managed to somehow set all the relfilenodes in the new cluster to random values rather than the intended one? There's no automated testing that would catch any of that, and it's not obvious how it could be added to test.sh. I suppose what we really need to do at some point is rewrite that as a TAP test, but that seems like a separate project from this patch. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Delegating superuser tasks to new security roles (Was: Granting control of SUSET gucs to non-superusers)
Greetings, * Mark Dilger (mark.dil...@enterprisedb.com) wrote: > > On Aug 23, 2021, at 12:51 PM, Stephen Frost wrote: > > Simply using superuser_arg() isn't sufficient is exactly the point that > > I'm trying to make. As a 'landlord', I might very well want to have > > some kind of 'landlord' role that isn't directly a superuser but which > > could *become* a superuser by having been GRANT'd a superuser role- but > > I certainly don't want that role's objects to be able to be messed with > > by the tenant. > > > If one of those other non-superuser roles is, itself, a role that can > > become a superuser > > If you have a sandbox-superuser who can do anything within the sandbox but > nothing outside the sandbox, then you need a pretty good wall at the > periphery of the sandbox. Breaking sandbox-superuser-ishness into multiple > distinct privileges rather than one monolithic privilege doesn't change the > need for a good wall at the periphery. The pg_manage_database_objects role > doesn't encompass all sandbox-superuser privileges, but it is on that side of > the wall. > > We could agree to move the wall a little, and say that non-superuser roles > who have the ability to become superusers are on the other side of the wall. > That's fine. I'd have to rework the patch a bit, but conceptually that seems > doable. We could also say that non-superusers who are members of privileged > roles (pg_execute_server_programs, pg_signal_backend, etc) are likewise on > the other side of that wall. > > Does that work? I'd much rather we go down the path that Robert had suggested where we find a way to make a connection between the tenant role and everything that they create, and leave everything that is outside of that box on the other side of the 'wall'. There's also the risk that the landlord creates a role one day but then GRANT's superuser rights to that role on another day, that happened to be after the tenant managed to gain control of that role. 
That kind of thing is something we should work hard to make difficult to happen- the landlord should have to explicitly give the tenant control over something that the landlord creates, it shouldn't happen automagically. Having hard-coded lists of which predefined roles are 'ok' and which aren't sounds generally bad and I don't think we'd actually want to include all predefined roles in that list either (even if it'd be fine today, which I don't think it is given things like pg_monitor and pg_signal_backend, though perhaps there could be some debate over those...). Thanks, Stephen signature.asc Description: PGP signature
Re: Delegating superuser tasks to new security roles (Was: Granting control of SUSET gucs to non-superusers)
> On Aug 23, 2021, at 12:51 PM, Stephen Frost wrote: > > Simply using superuser_arg() isn't sufficient is exactly the point that > I'm trying to make. As a 'landlord', I might very well want to have > some kind of 'landlord' role that isn't directly a superuser but which > could *become* a superuser by having been GRANT'd a superuser role- but > I certainly don't want that role's objects to be able to be messed with > by the tenant. > If one of those other non-superuser roles is, itself, a role that can > become a superuser If you have a sandbox-superuser who can do anything within the sandbox but nothing outside the sandbox, then you need a pretty good wall at the periphery of the sandbox. Breaking sandbox-superuser-ishness into multiple distinct privileges rather than one monolithic privilege doesn't change the need for a good wall at the periphery. The pg_manage_database_objects role doesn't encompass all sandbox-superuser privileges, but it is on that side of the wall. We could agree to move the wall a little, and say that non-superuser roles who have the ability to become superusers are on the other side of the wall. That's fine. I'd have to rework the patch a bit, but conceptually that seems doable. We could also say that non-superusers who are members of privileged roles (pg_execute_server_programs, pg_signal_backend, etc) are likewise on the other side of that wall. Does that work? — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: archive status ".ready" files may be created too early
On 2021-Aug-23, Bossart, Nathan wrote: > On 8/23/21, 12:55 PM, "alvhe...@alvh.no-ip.org" > wrote: > > Thanks. I've pushed this now to all live branches. > > Thank you! I appreciate the thorough reviews. Should we make a new > thread for the streaming replication fix? Yeah, this one is long enough :-) -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Re: archive status ".ready" files may be created too early
On 8/23/21, 12:55 PM, "alvhe...@alvh.no-ip.org" wrote: > Thanks. I've pushed this now to all live branches. Thank you! I appreciate the thorough reviews. Should we make a new thread for the streaming replication fix? Nathan
Re: Mark all GUC variable as PGDLLIMPORT
On Mon, Aug 23, 2021 at 10:56:52AM -0400, Robert Haas wrote: > On Mon, Aug 23, 2021 at 10:15 AM Tom Lane wrote: > > And yes, I absolutely would prohibit extensions from accessing many > > of them, if there were a reasonable way to do it. It would be a good > > start towards establishing a defined API for extensions. > > Mostly, it would make extension development more difficult for no > discernable benefit to the project. > > You've made this argument many times over the years ... but we're no > closer to having an extension API than we ever were, and we continue > to get complaints about breaking stuff on Windows on a pretty regular > basis. > > Honestly, it seems unimaginable that an API is ever really going to be > possible. It would be a ton of work to maintain, and we'd just end up > breaking it every time we discover that there's a new feature we want > to implement which doesn't fit into the defined API now. That's what > we do *now* with functions that third-party extensions actually call, > and with variables that they access, and it's not something that, in > my experience, is any great problem in maintaining an extension. > You're running code that is running inside somebody else's executable > and sometimes you have to adjust it for upstream changes. That's life, > and I don't think that complaints about that topic are nearly as > frequent as complaints about extensions breaking on Windows because of > missing PGDLLIMPORT markings. > > And more than that, I'm pretty sure that you've previously taken the > view that we shouldn't document all the hook functions that only exist > in the backend for the purpose of extension use. As the person on the receiving end of that one, I was nonplussed, so I took a step back to think it over. 
I recognized at that time that I didn't have a great answer for a legitimate concern, namely that any change, however trivial, that goes into the core and doesn't go directly to core database functionality, represents a long-term maintenance burden. The thing I'm coming to is that the key architectural feature PostgreSQL has that other RDBMSs don't is its extensibility. Because that's been a stable feature over time, I'm pretty sure we actually need to see documenting that as a thing that does actually go to core database functionality. Yes, there are resources involved with doing a thing like this, but I don't think that they require constant or even frequent attention from committers or even from seasoned DB hackers. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: archive status ".ready" files may be created too early
On 2021-Aug-23, Bossart, Nathan wrote: > Ah, okay. BTW the other changes you mentioned made sense to me. Thanks. I've pushed this now to all live branches. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ It does it in a really, really complicated way why does it need to be complicated? Because it's MakeMaker.
Re: Delegating superuser tasks to new security roles (Was: Granting control of SUSET gucs to non-superusers)
Greetings, * Mark Dilger (mark.dil...@enterprisedb.com) wrote: > > On Aug 23, 2021, at 11:13 AM, Stephen Frost wrote: > > This I have to object to pretty strongly- we have got to get away from > > the idea that just because X isn't a superuser or isn't owned by a > > superuser that it's fine to allow some non-superuser to mess with it. > > I thought we were trying to create a set of roles which could cumulatively do > everything *inside the sandbox* that superuser can do, but which cannot > escape the sandbox. I would think this pg_manage_database_objects role would > be reasonable in the context of that effort. I wasn't objecting to the general concept of trying to have a role that can do lots of things inside the sandbox but aren't allowed to escape it. I was specifically objecting to the idea that just checking if an object is directly owned by a superuser is not sufficient to prevent a role from being able to escape the sandbox. > > In particlar, just because a role isn't explicitly marked as a superuser > > doesn't mean that the role can't *become* a superuser, or that it hasn't > > got privileged access to the system in other ways, such as by being a > > member of other predefined roles that perhaps the role who is a member > > of pg_manage_database_objects doesn't have. > > The implementation does not allow pg_manage_database_objects to mess with > objects that are owned by a role which satisfies superuser_arg(). If you are > renting out a database to a tenant and change the ownership of stuff to a > non-superuser, then you get what you get. But why would you do that? Simply using superuser_arg() isn't sufficient is exactly the point that I'm trying to make. As a 'landlord', I might very well want to have some kind of 'landlord' role that isn't directly a superuser but which could *become* a superuser by having been GRANT'd a superuser role- but I certainly don't want that role's objects to be able to be messed with by the tenant. 
> > Such a check against > > modifying of "superuser owned" objects implies that it's providing some > > kind of protection against the role being able to become a superuser > > when it doesn't actually provide that protection in any kind of reliable > > fashion and instead ends up fooling the user. > > Please provide steps to reproduce this issue. Assume that a database is > initialized and that everything is owned by the system. A "tenant" role is > created and granted pg_manage_database_objects, and other non-superuser roles > are created. Now, what exactly can "tenant" do that you find objectionable? If one of those other non-superuser roles is, itself, a role that can become a superuser and it decides to create some functions for its own purposes, then the tenant role would be able to modify those functions, allowing the tenant to gain access to the non-superuser role, and from there being able to gain access to superuser. Something along these lines, basically: CREATE USER tenant; GRANT pg_manage_database_objects TO tenant; CREATE USER landlord; GRANT postgres TO landlord; SET ROLE landlord; CREATE FUNCTION do_stuff(); put call to do_stuff() into a cronjob SET ROLE tenant; CREATE OR REPLACE FUNCTION do_stuff(); -- with code to take over landlord poof- tenant has ability to be landlord and then further to become postgres. All of the above applies beyond just superuser too- consider a non-superuser role which has been grant'd pg_execute_server_program. That won't trip up superuser_arg() but it sure would allow a role to break out of the sandbox. > > This is the issue with CREATEROLE and we definitely shouldn't be > > doubling-down on that mistake, and also brings up the point that I, at > > least, had certainly hoped that part of this effort would include a way > > for roles to be created by a user with an appropriate predefined role, > > and w/o CREATEROLE (which would then be deprecated or, ideally, just > > outright removed).
> > Well, pg_manage_database_objects has no special ability to create or drop > roles. I thought separating those powers made more sense than grouping them > together. We can have a new role for doing what you say, but that seems > redundant with CREATEROLE. I didn't want this patch set to be bogged down > waiting for a consensus on how to change the CREATEROLE privilege. CREATEROLE doesn't work to give to folks generally because of the issues above- its check is, similarly, too simple and always has been. This isn't news either, it's been discussed in various places from time to time and is part of why people who run cloud providers end up either not giving out that role attribute and providing another way, or they hack up the PG core code to handle the way that attribute works differently. > > I get that this doesn't have to be in the first > > patch or even patch set going down this road but the lack of discussion > > or of any coordination between this effort and the other one that is > > trying to
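The escalation scenario described in this exchange can be written out as a fuller SQL sketch. This is hypothetical: pg_manage_database_objects is the role proposed in this patch set, not an existing predefined role, and the function bodies are illustrative stand-ins, so this does not run against stock PostgreSQL.

```
-- Hypothetical sketch of the sandbox escape discussed above.
CREATE USER tenant;
GRANT pg_manage_database_objects TO tenant;   -- proposed role, not in core

CREATE USER landlord;
GRANT postgres TO landlord;   -- landlord is not superuser, but can SET ROLE postgres

SET ROLE landlord;
CREATE FUNCTION do_stuff() RETURNS void
    LANGUAGE plpgsql AS $$ BEGIN NULL; /* routine maintenance work */ END $$;
-- do_stuff() is then called periodically, e.g. from a cron job running as landlord
RESET ROLE;

SET ROLE tenant;
-- superuser_arg(landlord) is false, so the proposed check lets tenant
-- replace landlord's function; the next scheduled call then runs the
-- tenant's code with landlord's privileges.
CREATE OR REPLACE FUNCTION do_stuff() RETURNS void
    LANGUAGE plpgsql AS $$
    BEGIN
        SET ROLE postgres;          -- possible because landlord is a member of postgres
        GRANT postgres TO tenant;   -- illustrative payload
    END $$;
```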
Re: Postgres perl module namespace
On Mon, Aug 23, 2021 at 3:03 PM Andrew Dunstan wrote: > OK, I count 3 in favor of changing to PgTest::Cluster, 1 against, > remainder don't care. I'd have gone with something starting with Postgres:: ... but I don't care much. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Delegating superuser tasks to new security roles (Was: Granting control of SUSET gucs to non-superusers)
> On Aug 23, 2021, at 11:13 AM, Stephen Frost wrote: > > This I have to object to pretty strongly- we have got to get away from > the idea that just because X isn't a superuser or isn't owned by a > superuser that it's fine to allow some non-superuser to mess with it. I thought we were trying to create a set of roles which could cumulatively do everything *inside the sandbox* that superuser can do, but which cannot escape the sandbox. I would think this pg_manage_database_objects role would be reasonable in the context of that effort. > In particular, just because a role isn't explicitly marked as a superuser > doesn't mean that the role can't *become* a superuser, or that it hasn't > got privileged access to the system in other ways, such as by being a > member of other predefined roles that perhaps the role who is a member > of pg_manage_database_objects doesn't have. The implementation does not allow pg_manage_database_objects to mess with objects that are owned by a role which satisfies superuser_arg(). If you are renting out a database to a tenant and change the ownership of stuff to a non-superuser, then you get what you get. But why would you do that? > Such a check against > modifying of "superuser owned" objects implies that it's providing some > kind of protection against the role being able to become a superuser > when it doesn't actually provide that protection in any kind of reliable > fashion and instead ends up fooling the user. Please provide steps to reproduce this issue. Assume that a database is initialized and that everything is owned by the system. A "tenant" role is created and granted pg_manage_database_objects, and other non-superuser roles are created. Now, what exactly can "tenant" do that you find objectionable?
> This is the issue with CREATEROLE and we definitely shouldn't be > doubling-down on that mistake, and also brings up the point that I, at > least, had certainly hoped that part of this effort would include a way > for roles to be created by a user with an appropriate predefined role, > and w/o CREATEROLE (which would then be deprecated or, ideally, just > outright removed). Well, pg_manage_database_objects has no special ability to create or drop roles. I thought separating those powers made more sense than grouping them together. We can have a new role for doing what you say, but that seems redundant with CREATEROLE. I didn't want this patch set to be bogged down waiting for a consensus on how to change the CREATEROLE privilege. > I get that this doesn't have to be in the first > patch or even patch set going down this road but the lack of discussion > or of any coordination between this effort and the other one that is > trying to address the CREATEROLE issue seems likely to land us in a bad > place with two distinct approaches being used. I'm confused. This patch set doesn't come within a country mile of CREATEROLE. Why should this patch set have to coordinate with that one? I'm not arguing with you -- merely asking what I'm misunderstanding? — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Delegating superuser tasks to new security roles (Was: Granting control of SUSET gucs to non-superusers)
Greetings, * Robert Haas (robertmh...@gmail.com) wrote: > On Mon, Aug 23, 2021 at 2:13 PM Stephen Frost > wrote: > This I have to object to pretty strongly- we have got to get > away from > > the idea that just because X isn't a superuser or isn't owned by a > > superuser that it's fine to allow some non-superuser to mess with it. > > In particular, just because a role isn't explicitly marked as a superuser > > doesn't mean that the role can't *become* a superuser, or that it hasn't > > got privileged access to the system in other ways, such as by being a > > member of other predefined roles that perhaps the role who is a member > > of pg_manage_database_objects doesn't have. Such a check against > > modifying of "superuser owned" objects implies that it's providing some > > kind of protection against the role being able to become a superuser > > when it doesn't actually provide that protection in any kind of reliable > > fashion and instead ends up fooling the user. > > I think you make a good point here, but it seems to me that we need > *something*. We need a way to create a "lead tenant" role that can > create other tenant roles and then, err, boss them around. Not only > drop the roles again, but also drop or alter or change the owner of > their objects, or really bypass any security those roles would like to > assert as against the lead tenant. If we can't see a way to create > some sort of role of that sort, then I don't think we can really say > we've solved anything much. Sure, but we can't just use the "superuser" flag for that, we need something better. The "better" in my mind here would be akin to what we're thinking about doing for event triggers, but for roles, which actually already have a distinction between becoming a role vs. being able to GRANT that role to another role, and that's the 'admin' option. 
In other words, the user we imagine being GRANT'd this hypothetical pg_manage_database_objects role wouldn't actually need that role to explicitly give them access to be able to modify the objects of other roles- it would be able to do that by virtue of just being a member of those roles. The roles who are allowed to modify existing role membership should have the 'admin' right on those roles, and what we just need is a new predefined role that's basically "allow roles to be created or dropped" but where the only roles which can be GRANT'd by a user with that ability are the ones that they have admin rights on, and the only roles that they're allowed to drop they also have to have admin rights on. > > This is the issue with CREATEROLE and we definitely shouldn't be > > doubling-down on that mistake, and also brings up the point that I, at > > least, had certainly hoped that part of this effort would include a way > > for roles to be created by a user with an appropriate predefined role, > > and w/o CREATEROLE (which would then be deprecated or, ideally, just > > outright removed). I get that this doesn't have to be in the first > > patch or even patch set going down this road but the lack of discussion > > or of any coordination between this effort and the other one that is > > trying to address the CREATEROLE issue seems likely to land us in a bad > > place with two distinct approaches being used. > > Is there an active effort to do something about CREATEROLE? Do you > have a link to the thread? I feel like this is one of those things > that has occasioned discussion over the years but I am not aware of an > active project or a specific proposal to do something about this. 
Hrmpf, I had been thinking of this: https://www.postgresql.org/message-id/flat/c2ee39152957af339ae6f3e851aef09930dd2faf.ca...@credativ.de registered in the CF here: https://commitfest.postgresql.org/34/2918/ though I see now that it isn't trying to explicitly deal with the CREATEROLE bit (which I had understood from some other discussion was a topic of interest to the author), but is definitely caught up in the discussion about who is allowed to set what GUCs, and therefore still seems rather related to me. > Maybe this can be solved from the other end? Like, as opposed to > working by exception and saying, "well, everything but superusers," > maybe we need to explicitly declare who is included. Like, perhaps we > could somehow represent the fact that role A has super-powers with > respect to roles B, C, D, and any future roles that A may create, but > not other roles that exist on the system, or something of that sort? Isn't this exactly what having the 'admin' option on a role is? You're GRANT'd that role and further are allowed to then GRANT that role to other roles. Being a member of that role means you're considered to have 'ownership' level rights for all the objects that that role owns too. Maybe also need to have some condition around "you can only set attributes on roles which you already have", or maybe we need to invent 'admin' options for each of the role attributes if we think it needs to
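The 'admin' option machinery referred to above already exists in core; a minimal sketch (role names are hypothetical) of how it separates membership from the right to grant that membership onward:

```
CREATE ROLE tenant_group;
CREATE ROLE landlord LOGIN;
CREATE ROLE tenant LOGIN;

-- Plain membership: landlord may SET ROLE tenant_group,
-- but may not grant the role onward to anyone else.
GRANT tenant_group TO landlord;

-- Membership WITH ADMIN OPTION: landlord may additionally
-- GRANT and REVOKE tenant_group for other roles.
GRANT tenant_group TO landlord WITH ADMIN OPTION;

SET ROLE landlord;
GRANT tenant_group TO tenant;   -- permitted only because of the admin option
```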
Re: Postgres perl module namespace
On 8/11/21 9:30 AM, Tom Lane wrote: > Alvaro Herrera writes: >> I'll recast my vote to make this line be >> my $node = PgTest::Cluster->new('nodename'); >> which seems succint enough. I could get behind PgTest::PgCluster too, >> but having a second Pg there seems unnecessary. > Either of those WFM. > > OK, I count 3 in favor of changing to PgTest::Cluster, 1 against, remainder don't care. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Parallel scan with SubTransGetTopmostTransaction assert coredump
On Wed, Aug 18, 2021 at 9:28 AM Greg Nancarrow wrote: > Yes, I think I agree on that. > I've updated the patch to restore the actual transaction snapshot in > the IsolationUsesXactSnapshot() case, otherwise the active snapshot is > installed as the transaction snapshot. > I've tested the patch for the different transaction isolation levels, > and the reported coredump (from assertion failure) is not occurring. > (In the "serializable" case there are "could not serialize access due > to read/write dependencies among transactions" errors, as Pavel has > previously reported, but these occur without the patch and it appears > to be an unrelated issue) I think this looks pretty good. I am not sure I see any reason to introduce a new function RestoreTxnSnapshotAndSetAsActive. Couldn't we just use RestoreTransactionSnapshot() and then call PushActiveSnapshot() from parallel.c? That seems preferable to me from the standpoint of avoiding multiplication of APIs. I also think that the comments should explain why we are doing this, rather than just that we are doing this. So instead of this: +/* + * If the transaction snapshot was serialized, restore it and restore the + * active snapshot too. Otherwise, the active snapshot is also installed as + * the transaction snapshot. + */ ...perhaps something like: If the transaction isolation level is REPEATABLE READ or SERIALIZABLE, the leader has serialized the transaction snapshot and we must restore it. At lower isolation levels, there is no transaction-lifetime snapshot, but we need TransactionXmin to get set to a value which is less than or equal to the xmin of every snapshot that will be used by this worker. The easiest way to accomplish that is to install the active snapshot as the transaction snapshot. Code running in this parallel worker might take new snapshots via GetTransactionSnapshot() or GetLatestSnapshot(), but it shouldn't have any way of acquiring a snapshot older than the active snapshot. 
-- Robert Haas EDB: http://www.enterprisedb.com
Re: [PROPOSAL] new diagnostic items for the dynamic sql
Hi On Fri, 20 Aug 2021 at 10:24, Dinesh Chemuduru wrote: > On Sun, 25 Jul 2021 at 16:34, Pavel Stehule > wrote: > please, can you register this patch to commitfest app https://commitfest.postgresql.org/34/ Regards Pavel > >> >> On Sun, 25 Jul 2021 at 12:52, Dinesh Chemuduru < >> dinesh.ku...@migops.com> wrote: >> >>> On Sat, 17 Jul 2021 at 01:29, Pavel Stehule >>> wrote: >>> Hi On Fri, 16 Jul 2021 at 21:47, Dinesh Chemuduru < dinesh.ku...@migops.com> wrote: > Hi Everyone, > > We would like to propose the below 2 new plpgsql diagnostic items, > related to parsing. Because, the current diag items are not providing > the useful diagnostics about the dynamic SQL statements. > > 1. PG_PARSE_SQL_STATEMENT (returns parse failed sql statement) > 2. PG_PARSE_SQL_STATEMENT_POSITION (returns parse failed sql text > cursor position) > > Consider the below example, which is an invalid SQL statement. > > postgres=# SELECT 1 JOIN SELECT 2; > ERROR: syntax error at or near "JOIN" > LINE 1: SELECT 1 JOIN SELECT 2; > ^ > Here, there is a syntax error at JOIN clause, > and also we are getting the syntax error position(^ symbol, the > position of JOIN clause). > This will be helpful, while dealing with long queries. > > Now, if we run the same statement as a dynamic SQL(by using EXECUTE > ), > then it seems we are not getting the text cursor position, > and the SQL statement which is failing at parse level. > > Please find the below example. 
> > postgres=# SELECT exec_me('SELECT 1 JOIN SELECT 2'); > NOTICE: RETURNED_SQLSTATE 42601 > NOTICE: COLUMN_NAME > NOTICE: CONSTRAINT_NAME > NOTICE: PG_DATATYPE_NAME > NOTICE: MESSAGE_TEXT syntax error at or near "JOIN" > NOTICE: TABLE_NAME > NOTICE: SCHEMA_NAME > NOTICE: PG_EXCEPTION_DETAIL > NOTICE: PG_EXCEPTION_HINT > NOTICE: PG_EXCEPTION_CONTEXT PL/pgSQL function exec_me(text) line 18 > at EXECUTE > NOTICE: PG_CONTEXT PL/pgSQL function exec_me(text) line 21 at GET > STACKED DIAGNOSTICS > exec_me > - > > (1 row) > > From the above results, by using all the existing diag items, we are > unable to get the position of "JOIN" in the submitted SQL statement. > By using these proposed diag items, we will be getting the required > information, > which will be helpful while running long SQL statements as dynamic SQL > statements. > > Please find the below example. > > postgres=# SELECT exec_me('SELECT 1 JOIN SELECT 2'); > NOTICE: PG_PARSE_SQL_STATEMENT SELECT 1 JOIN SELECT 2 > NOTICE: PG_PARSE_SQL_STATEMENT_POSITION 10 > exec_me > - > > (1 row) > > From the above results, by using these diag items, > we are able to get what is failing and it's position as well. > This information will be much helpful to debug the issue, > while a long running SQL statement is running as a dynamic SQL > statement. > > We are attaching the patch for this proposal, and will be looking for > your inputs. > +1 It is good idea. I am not sure if the used names are good. I propose PG_SQL_TEXT and PG_ERROR_LOCATION Regards Pavel >>> Thanks Pavel, >>> >>> Sorry for the late reply. >>> >>> The proposed diag items are `PG_SQL_TEXT`, `PG_ERROR_LOCATION` are much >>> better and generic. >>> >>> But, as we are only dealing with the parsing failure, I thought of >>> adding that to the diag name. >>> >> >> I understand. But parsing is only one case - and these variables can be >> used for any case. Sure, ***we don't want*** to have PG_PARSE_SQL_TEXT, >> PG_ANALYZE_SQL_TEXT, PG_EXECUTION_SQL_TEXT ... 
>> >> The idea is good, and you found the case, where it has benefits for >> users. Naming is hard. >> >> > Thanks for your time and suggestions Pavel. > I updated the patch as per the suggestions, and attached it here for > further inputs. > > Regards, > Dinesh Kumar > > > >> Regards >> >> Pavel >> >> >>> Regards, >>> Dinesh Kumar >>> >>> > Regards, > Dinesh Kumar >
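A sketch of what the exec_me() helper from the examples above might look like once the proposed items exist. PG_SQL_TEXT and PG_ERROR_LOCATION are the names suggested in this thread; neither is available in released PostgreSQL, so this is a sketch of the proposal rather than working code:

```
-- Hypothetical: depends on the PG_SQL_TEXT / PG_ERROR_LOCATION items
-- proposed in this thread, which do not exist in released PostgreSQL.
CREATE OR REPLACE FUNCTION exec_me(stmt text) RETURNS void
LANGUAGE plpgsql AS $$
DECLARE
    v_sql_text text;
    v_location text;
BEGIN
    EXECUTE stmt;
EXCEPTION WHEN others THEN
    GET STACKED DIAGNOSTICS
        v_sql_text := PG_SQL_TEXT,         -- proposed: the failing SQL text
        v_location := PG_ERROR_LOCATION;   -- proposed: error cursor position
    RAISE NOTICE 'PG_SQL_TEXT %', v_sql_text;
    RAISE NOTICE 'PG_ERROR_LOCATION %', v_location;
END;
$$;

SELECT exec_me('SELECT 1 JOIN SELECT 2');
```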
Re: Mark all GUC variable as PGDLLIMPORT
On Mon, Aug 23, 2021 at 7:57 AM Robert Haas wrote: > In short, +1 from me for the original proposal of marking all GUCs as > PGDLLIMPORT. +1 > And, heck, +1 for marking all the other global variables > that way, too. We're not solving any problem here. We're just annoying > people, mostly people who are loyal community members and steady > contributors to the project. I'm +0.5 on this aspect -- the result might be a lot of verbosity for no possible benefit. I'm not sure how many global variables there are. Hopefully not that many. Maybe making adding new global variables annoying would be a useful disincentive -- sometimes we use global variables when it isn't particularly natural (it's natural with GUCs, but not other things). That might tip the scales, at least for me. Unnecessary use of global variables is why Postgres 13 went through several point releases before I accidentally found out that parallel VACUUM doesn't respect cost limits -- a very simple bug concerning how we propagate (or fail to propagate) state to parallel workers. I bet it would have taken far longer for the bug to be discovered if it wasn't for my Postgres 14 VACUUM refactoring work. -- Peter Geoghegan
Re: very long record lines in expanded psql output
Hi! Done, here's the link: https://commitfest.postgresql.org/34/3295/ Best regards, Platon Pronko On 2021-08-23 21:14, Andrew Dunstan wrote: On 8/23/21 2:00 PM, Platon Pronko wrote: Hi! Apparently I did forget something, and that's the patch itself :) Thanks for Justin Pryzby for pointing this out. Attaching the patch now. Please add it to the commitfest, the next one starts in about a week. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Queries that should be canceled will get stuck on secure_write function
On Mon, Aug 23, 2021 at 11:26 AM Alvaro Herrera wrote: > Aren't we talking about query cancellations that occur in response to a > standby delay limit? Those aren't in response to user action. Oh, you're right. But I guess a similar problem could also occur in response to pg_terminate_backend(), no? -- Robert Haas EDB: http://www.enterprisedb.com
Re: Delegating superuser tasks to new security roles (Was: Granting control of SUSET gucs to non-superusers)
On Mon, Aug 23, 2021 at 2:13 PM Stephen Frost wrote: > This I have to object to pretty strongly- we have got to get away from > the idea that just because X isn't a superuser or isn't owned by a > superuser that it's fine to allow some non-superuser to mess with it. > In particular, just because a role isn't explicitly marked as a superuser > doesn't mean that the role can't *become* a superuser, or that it hasn't > got privileged access to the system in other ways, such as by being a > member of other predefined roles that perhaps the role who is a member > of pg_manage_database_objects doesn't have. Such a check against > modifying of "superuser owned" objects implies that it's providing some > kind of protection against the role being able to become a superuser > when it doesn't actually provide that protection in any kind of reliable > fashion and instead ends up fooling the user. I think you make a good point here, but it seems to me that we need *something*. We need a way to create a "lead tenant" role that can create other tenant roles and then, err, boss them around. Not only drop the roles again, but also drop or alter or change the owner of their objects, or really bypass any security those roles would like to assert as against the lead tenant. If we can't see a way to create some sort of role of that sort, then I don't think we can really say we've solved anything much. > This is the issue with CREATEROLE and we definitely shouldn't be > doubling-down on that mistake, and also brings up the point that I, at > least, had certainly hoped that part of this effort would include a way > for roles to be created by a user with an appropriate predefined role, > and w/o CREATEROLE (which would then be deprecated or, ideally, just > outright removed). 
I get that this doesn't have to be in the first > patch or even patch set going down this road but the lack of discussion > or of any coordination between this effort and the other one that is > trying to address the CREATEROLE issue seems likely to land us in a bad > place with two distinct approaches being used. Is there an active effort to do something about CREATEROLE? Do you have a link to the thread? I feel like this is one of those things that has occasioned discussion over the years but I am not aware of an active project or a specific proposal to do something about this. Maybe this can be solved from the other end? Like, as opposed to working by exception and saying, "well, everything but superusers," maybe we need to explicitly declare who is included. Like, perhaps we could somehow represent the fact that role A has super-powers with respect to roles B, C, D, and any future roles that A may create, but not other roles that exist on the system, or something of that sort? -- Robert Haas EDB: http://www.enterprisedb.com
Re: Mark all GUC variable as PGDLLIMPORT
On Mon, Aug 23, 2021 at 11:40 AM Alvaro Herrera wrote: > In that case, why not improve the API with functions that return the > values in some native datatype? For scalars with native C types (int, > floats, Boolean etc) this is easy enough; I bet it'll solve 99% of the > problems or more. Sure, but ... why bother? The entire argument rests on the presumption that there is some harm being done by people accessing the values directly, but I don't think that's true. And, if it were true, it seems likely that this proposed API would have the exact same problem, because it would let people do exactly the same thing. And, going through this proposed API would still be significantly more expensive than just accessing the bare variables, because you'd at least have to do some kind of lookup based on the GUC name to find the corresponding variable. It's just a solution in search of a problem. Nothing bad happens when people write extensions that access GUC variables directly. It works totally, completely fine. Except on Windows. -- Robert Haas EDB: http://www.enterprisedb.com
Re: very long record lines in expanded psql output
On 8/23/21 2:00 PM, Platon Pronko wrote: > Hi! > > Apparently I did forget something, and that's the patch itself :) > Thanks for Justin Pryzby for pointing this out. > > Attaching the patch now. > > Please add it to the commitfest, the next one starts in about a week. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Delegating superuser tasks to new security roles (Was: Granting control of SUSET gucs to non-superusers)
Greetings, * Mark Dilger (mark.dil...@enterprisedb.com) wrote: > > On Jul 22, 2021, at 1:01 PM, Robert Haas wrote: > > It's awkward. I think that we can't afford to create a separate > > predefined role for every single thing that someone could > > theoretically want to sever, because then we'll have a zillion of them > > and it will be unmaintainable. So the idea was to try to break up > > everything someone might want to do either via DDL or by setting GUCs > > into one of three categories: internal to the database > > (pg_database_security), facing outward toward the network > > (pg_network_security), and facing inward toward the host > > (pg_host_security). If we didn't have any predefined security-related > > roles already, this would still have complications, but as things > > stand it has more, because as you point out, pg_read_server_files > > overlaps with pg_host_security. But what do we do about that? > > I gave up on the idea of splitting all superuser functions into three roles. I can't say that I blame you for that. :) For my 2c, at least, the ones proposed never really felt like they were very directly tied to specific capabilities, which I think was one of the issues with that approach. > Patch v5-0001 refactors the guc code to allow non-superuser roles to be > associated with guc variables. Any such role can then set the variable, > including via "alter system set". The patch stops short of creating any new > roles or assigning any roles to any guc variable. Haven't looked at the patch yet but this does generally seem like an interesting approach. > Patches v5-0002 through v5-0005 create four new roles for managing host > resource settings, vacuum settings, autovacuum settings, and logging > settings. That last one excludes "where to log" settings, because we don't > want the role to be able to write to arbitrary locations on the server. > Remaining guc variables not in these four categories continue to belong to > the superuser. 
We do have a role today that is allowed to write to arbitrary locations on the server, so I wonder if for those log settings we'd include a requirement for the user to have both of those roles instead..? > Patches v5-0006 and v5-0007 allow non-superusers to own event triggers, and > limit the event triggers to only running for events triggered by roles that > the event trigger owner belongs to. This is backward compatible, because > event triggers have historically belonged only to superusers, and superusers > have implicit membership in all groups. While I generally agree that this doesn't end up opening up security issues, it's certainly going to be a change in how event triggers work as they'll no longer *always* fire, and that seems quite at odds with how triggers are generally thought of. So much so that I worry about mis-use due to this. Then again, if we're going to go down this route at all, I can't think of any particular way to avoid the security issues of running the trigger for everyone when it's owned by a non-superuser. > Patches v5-0008 through v5-0010 allow non-superusers to own subscriptions > while restricting the tablesync and apply workers to only work on tables that > the subscription owner has permissions on. This is almost backward > compatible, because subscriptions have historically belonged only to > superusers, as above, except for unlikely scenarios where superusers have > given ownership to non-superusers. In those cases, the new code will refuse > to apply in situations where the old code would blindly apply changes. Does > anybody see a problem with this? This doesn't particularly bother me, at least. > Patch v5-0011 is a bug fix posted elsewhere that hasn't been committed yet > but which must be committed in preparation for v5-0012. No idea what it is as I haven't looked yet, but if it's a bug fix then shouldn't it be separated and back-patched..? 
> Patch v5-0012 creates a new role, pg_manage_database_objects, which can do > anything with an object that the owner could do with it, as long as the owner > is not a superuser. This role is intended as a "tenant" role, and is in some > sense a less powerful replacement for the pg_database_security role > previously proposed. This I have to object to pretty strongly- we have got to get away from the idea that just because X isn't a superuser or isn't owned by a superuser that it's fine to allow some non-superuser to mess with it. In particular, just because a role isn't explicitly marked as a superuser doesn't mean that the role can't *become* a superuser, or that it hasn't got privileged access to the system in other ways, such as by being a member of other predefined roles that perhaps the role who is a member of pg_manage_database_objects doesn't have. Such a check against modifying of "superuser owned" objects implies that it's providing some kind of protection against the role being able to become a superuser when it doesn't actually
Re: pretty slow merge-join due rescan?
On Mon, 23 Aug 2021 at 18:44, Pavel Stehule wrote: > Hi > > The customer reports a very slow query. I have a reproducer script. The > dataset is not too big > > postgres=# \dt+ > List of relations > ┌┬───┬───┬───┬─┬┬─┐ > │ Schema │ Name │ Type │ Owner │ Persistence │Size│ Description │ > ╞╪═══╪═══╪═══╪═╪╪═╡ > │ public │ f_dep │ table │ pavel │ permanent │ 8192 bytes │ │ > │ public │ f_emp │ table │ pavel │ permanent │ 1001 MB│ │ > │ public │ f_fin │ table │ pavel │ permanent │ 432 kB │ │ > │ public │ qt│ table │ pavel │ permanent │ 1976 kB│ │ > │ public │ qtd │ table │ pavel │ permanent │ 87 MB │ │ > └┴───┴───┴───┴─┴┴─┘ > (5 rows) > > and the query is not too complex > > SELECT > sub.a_121, > count(*) > FROM ( > SELECT > f_fin.dt_business_day_id AS a_1056, > f_dep.description_id AS a_121, > f_emp.employee_id_id AS a_1327 > FROM f_emp > INNER JOIN f_dep ON > ( f_emp.department_id_id = f_dep.id ) > INNER JOIN f_fin ON > ( f_emp.business_day_date_id = f_fin.id ) > GROUP BY 1, 2, 3 > ) AS sub > INNER JOIN qt ON > ( sub.a_1056 = qt.tt_1056_1056_b ) > LEFT OUTER JOIN qtd AS qt_2 ON > ( ( qt.tt_1056_1056_b = qt_2.a_1056 ) > AND ( sub.a_121 = qt_2.a_121 ) > AND ( sub.a_1327 = qt_2.a_1327 ) ) > LEFT OUTER JOIN qtd AS qt_3 ON > ( ( qt.tt_1056_1056_a = qt_3.a_1056 ) > AND ( sub.a_121 = qt_3.a_121 ) > AND ( sub.a_1327 = qt_3.a_1327 ) ) > GROUP BY 1; > > By default I get a good plan, and the performance is ok > https://explain.depesz.com/s/Mr2H (about 16 sec). When I > increase work_mem, I also get a good plan with good performance > https://explain.depesz.com/s/u4Ff > > But this depends on index only scan. In the production environment, the > index only scan is not always available, and I see another plan (I can get > this plan with disabled index only scan). 
> > Although the cost is almost the same, the query is about 15x slower > https://explain.depesz.com/s/L6zP > > │ HashAggregate (cost=1556129.74..1556131.74 rows=200 width=12) (actual > time=269948.878..269948.897 rows=64 loops=1) > │ > │ Group Key: f_dep.description_id > > │ > │ Batches: 1 Memory Usage: 40kB > > │ > │ Buffers: shared hit=5612 read=145051 > > │ > │ -> Merge Left Join (cost=1267976.96..1534602.18 rows=4305512 > width=4) (actual time=13699.847..268785.500 rows=4291151 loops=1) > │ > │ Merge Cond: ((f_emp.employee_id_id = qt_3.a_1327) AND > (f_dep.description_id = qt_3.a_121)) > │ > │ Join Filter: (qt.tt_1056_1056_a = qt_3.a_1056) > > │ > │ Rows Removed by Join Filter: 1203659495 > > │ > │ Buffers: shared hit=5612 read=145051 > > │ > > . > > │ > │ -> Sort (cost=209977.63..214349.77 rows=1748859 width=12) (actual > time=979.522..81842.913 rows=1205261892 loops=1) > │ > │ Sort Key: qt_3.a_1327, qt_3.a_121 > > │ > │ Sort Method: quicksort Memory: 144793kB > > │ > │ Buffers: shared hit=2432 read=8718 > > │ > │ -> Seq Scan on qtd qt_3 (cost=0.00..28638.59 > rows=1748859 width=12) (actual time=0.031..284.437 rows=1748859 loops=1) > │ Buffers: shared hit=2432 read=8718 > > The sort of qtd table is very fast > > postgres=# explain analyze select * from qtd order by a_1327, a_121; > > ┌──┐ > │ QUERY PLAN > │ > > ╞══╡ > │ Sort (cost=209977.63..214349.77 rows=1748859 width=27) (actual > time=863.923...213 rows=1748859 loops=1) │ > │ Sort Key: a_1327, a_121 > │ > │ Sort Method: quicksort Memory: 199444kB > │ > │ -> Seq Scan on qtd (cost=0.00..28638.59 rows=1748859 width=27) > (actual time=0.035..169.385 rows=1748859 loops=1) │ > │ Planning Time: 0.473 ms > │ > │ Execution Time: 1226.305 ms > │ > >
Re: very long record lines in expanded psql output
Hi! Apparently I did forget something, and that's the patch itself :) Thanks to Justin Pryzby for pointing this out. Attaching the patch now. Best regards, Platon Pronko

On 2021-08-23 20:51, Platon Pronko wrote: Hi! Please find attached the patch implementing the proposed changes. I hope I didn't forget anything (docs, tab completion, comments, etc.); if I did forget something please tell me and I'll fix it. Best regards, Platon Pronko

On 2021-08-08 01:18, Andrew Dunstan wrote: On 8/7/21 10:56 AM, Platon Pronko wrote: Hi! I also find this annoying and would be happy to be rid of it. Have you tried "\pset format wrapped"? Pavel suggested it, and it solved most of the problem for me, for example. Yes, but it changes the data line output. Ideally, you should be able to modify these independently. I agree, and I think this can be implemented, but I'm a bit afraid of introducing an additional psql option (there's already quite a lot of them). I suspect primary PostgreSQL maintainers won't be happy with such an approach. I think I qualify as one of those ... :-) Sorry, I'm new here, don't know who's who :) No problem. Welcome! We're always very glad to see new contributors. I'll start working on a new patch then. A couple of questions about specifics:

1. Can we add "expanded" in the option name, like "xheader_expanded_width"? I think adjusting the header row width doesn't make sense in any other modes, and placing that in the option name makes the intent a bit clearer. "xheader" was meant to be shorthand for "expanded output header"

2. What was the "column" option in your original suggestion supposed to do? ("\pset xheader_width column|page|nnn") It's meant to say don't print anything past the column spec, e.g.:

-[ RECORD 1 ]----+----
n                | 42
long_column_name | xx
-[ RECORD 2 ]----+----
n                | 210
long_column_name | xx

3. Should we bother with using this option when in "\pset border 2" mode? I can do it for consistency, but it will still look bad.
Probably not, but since I never use it I'll let others who do weigh in on the subject. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index fcab5c0d51..8cffc18165 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -2799,6 +2799,32 @@ lo_import 152801
+
+        xheader_width
+
+
+        Sets expanded header width to one of full,
+        column,
+        page,
+        or integer value.
+
+
+        full is the default option - expanded header
+        is not truncated.
+
+
+        column: don't print header line past the first
+        column.
+
+
+        page: fit header line to terminal width.
+
+
+        integer value: specify exact width of header line.
+
+
+
   fieldsep
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 49d4c0e3ce..440b732d3c 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -4310,6 +4310,28 @@ do_pset(const char *param, const char *value, printQueryOpt *popt, bool quiet)
 		popt->topt.expanded = !popt->topt.expanded;
 	}

+	/* header line width in expanded mode */
+	else if (strcmp(param, "xheader_width") == 0)
+	{
+		if (value && pg_strcasecmp(value, "full") == 0)
+			popt->topt.expanded_header_width_type = PRINT_XHEADER_FULL;
+		else if (value && pg_strcasecmp(value, "column") == 0)
+			popt->topt.expanded_header_width_type = PRINT_XHEADER_COLUMN;
+		else if (value && pg_strcasecmp(value, "page") == 0)
+			popt->topt.expanded_header_width_type = PRINT_XHEADER_PAGE;
+		else if (value) {
+			popt->topt.expanded_header_width_type = PRINT_XHEADER_EXACT_WIDTH;
+			popt->topt.expanded_header_exact_width = atoi(value);
+			if (popt->topt.expanded_header_exact_width == 0) {
+				pg_log_error("\\pset: allowed xheader_width values are full (default), column, page, or an number specifying exact width.");
+				return false;
+			}
+		} else {
+			/* reset to default if value is empty */
+			popt->topt.expanded_header_width_type = PRINT_XHEADER_FULL;
+		}
+	}
+
 	/* field separator for CSV format */
 	else if (strcmp(param, "csv_fieldsep") == 0)
 	{
@@ -4502,6 +4524,19 @@ printPsetInfo(const char *param, printQueryOpt *popt)
 			printf(_("Expanded display is off.\n"));
 	}

+	/* show xheader width type */
+	else if (strcmp(param, "xheader_width") == 0)
+	{
+		if (popt->topt.expanded_header_width_type == PRINT_XHEADER_FULL)
+			printf(_("Expanded header width is 'full'.\n"));
+		else if
Re: very long record lines in expanded psql output
Hi! Please find attached the patch implementing the proposed changes. I hope I didn't forget anything (docs, tab completion, comments, etc), if I did forget something please tell me and I'll fix it. Best regards, Platon Pronko On 2021-08-08 01:18, Andrew Dunstan wrote: On 8/7/21 10:56 AM, Platon Pronko wrote: Hi! I also find this annoying and would be happy to be rid of it. Have you tried "\pset format wrapped"? Pavel suggested it, and it solved most of the problem for me, for example. Yes, but it changes the data line output. Ideally, you should be able to modify these independently. I agree, and I think this can be implemented, but I'm a bit afraid of introducing an additional psql option (there's already quite a lot of them). I suspect primary PostgreSQL maintainers won't be happy with such an approach. I think I qualify as one of those ... :-) Sorry, I'm new here, don't know who's who :) No problem. Welcome! We're always very glad to see new contributors. I'll start working on a new patch then. A couple questions about specifics: 1. Can we add "expanded" in the option name, like "xheader_expanded_width"? I think adjusting the header row width doesn't make sense on any other modes, and placing that in the option name makes intent a bit clearer. "xheader" was meant to be shorthand for "expanded output header" 2. What was "column" option in your original suggestion supposed to do? ("\pset xheader_width column|page|nnn") It's meant to say don't print anything past the column spec, e.g.: -[ RECORD 1 ]+ n| 42 long_column_name | xx -[ RECORD 2 ]+ n| 210 long_column_name | xx 3. Should we bother with using this option when in "\pset border 2" mode? I can do it for consistency, but it will still look bad. Probably not, but since I never use it I'll let others who do weigh in on the subject. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: .ready and .done files considered harmful
On Mon, Aug 23, 2021 at 11:50 AM Bossart, Nathan wrote: > To handle a "cheating" archive command, I'd probably need to add a > stat() for every time pgarch_readyXLog() returned something from > arch_files. I suspect something similar might be needed in Dipesh's > patch to handle backup history files and partial WAL files. I think he's effectively got that already, although it's probably inside of pgarch_readyXLog(). The idea there is that instead of having a cache of files to be returned (as in your case) he just checks whether the next file in sequence happens to be present and if so returns that file name. To see whether it's present, he uses stat(). > In any case, I think Dipesh's patch is the way to go. It obviously > will perform better in the extreme cases discussed in this thread. I > think it's important to make sure the patch doesn't potentially leave > files behind to be picked up by a directory scan that might not > happen, but there are likely ways to handle that. In the worst case, > perhaps we need to force a directory scan every N files to make sure > nothing gets left behind. But maybe we can do better. It seems to me that we can handle that by just having the startup process notify the archiver every time some file is ready for archiving that's not the next one in the sequence. We have to make sure we cover all the relevant code paths, but that seems like it should be doable, and we have to decide on the synchronization details, but that also seems pretty manageable, even if we haven't totally got it sorted yet. The thing is, as soon as you go back to forcing a directory scan every N files, you've made it formally O(N^2) again, which might not matter in practice if the constant factor is low enough, but I don't think it will be. 
Either you force the scans every, say, 1000 files, in which case it's going to make the whole mechanism a lot less effective in terms of getting out from under problem cases -- or you force scans every, say, 100 files, in which case it's not really going to cause any missed files to get archived soon enough to make anyone happy. I doubt there is really a happy medium in there. I suppose the two approaches could be combined, too - remember the first N files you think you'll encounter and then after that try successive filenames until one is missing. That would be more resilient against O(N^2) behavior in the face of frequent gaps. But it might also be more engineering than is strictly required. -- Robert Haas EDB: http://www.enterprisedb.com
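The "try successive filenames until one is missing" idea hinges on being able to predict the next name in the archive sequence. As an illustration only — the helper name and the 16 MB segment size below are assumptions for this sketch, not code from the patch — the prediction step reduces to formatting the next segment number in the conventional three-field hex WAL naming scheme, after which the archiver can stat() the predicted path and fall back to a full directory scan only when it is absent:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build the file name of WAL segment "segno" on
 * timeline "tli", mirroring the usual three 8-digit hex fields
 * (timeline, log id, segment within log id).  A 16 MB segment size is
 * assumed here.
 */
static void
wal_file_name(char *buf, size_t buflen, uint32_t tli, uint64_t segno)
{
    const uint64_t wal_segsz = 16 * 1024 * 1024;    /* assumed 16 MB */
    const uint64_t segs_per_logid = UINT64_C(0x100000000) / wal_segsz;

    snprintf(buf, buflen, "%08X%08X%08X", tli,
             (uint32_t) (segno / segs_per_logid),
             (uint32_t) (segno % segs_per_logid));
}
```

An archiver loop would call this with last-archived-segno + 1, stat() the resulting ".ready" path, and only scan the whole directory when the predicted file is missing — which is what keeps the common case O(1) per file.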
Re: archive status ".ready" files may be created too early
On 8/23/21, 10:31 AM, "alvhe...@alvh.no-ip.org" wrote: > On 2021-Aug-23, alvhe...@alvh.no-ip.org wrote: > >> The only way .ready files are created is that XLogNotifyWrite() is >> called. For regular WAL files during regular operation, that only >> happens in XLogNotifyWriteSeg(). That, in turn, only happens in >> NotifySegmentsReadyForArchive(). But if the system runs and never >> writes WAL records that cross WAL boundaries, that function will see >> that both earliestSegBoundary and latestSegBoundary are MaxXLogSegno, >> and return without doing anything. So no segments will be notified. > > Nevermind -- I realized that all segments get registered, not just those > for which we generate continuation records. Ah, okay. BTW the other changes you mentioned made sense to me. Nathan
Re: archive status ".ready" files may be created too early
On 2021-Aug-23, alvhe...@alvh.no-ip.org wrote: > The only way .ready files are created is that XLogNotifyWrite() is > called. For regular WAL files during regular operation, that only > happens in XLogNotifyWriteSeg(). That, in turn, only happens in > NotifySegmentsReadyForArchive(). But if the system runs and never > writes WAL records that cross WAL boundaries, that function will see > that both earliestSegBoundary and latestSegBoundary are MaxXLogSegno, > and return without doing anything. So no segments will be notified. Nevermind -- I realized that all segments get registered, not just those for which we generate continuation records. -- Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/ "Use it up, wear it out, make it do, or do without"
Re: archive status ".ready" files may be created too early
On 2021-Aug-23, Bossart, Nathan wrote: > Sorry, I'm still not following this one. If we skipped creating > .ready segments due to a crash, we rely on RemoveOldXlogFiles() to > create them as needed in the end-of-recovery checkpoint. If a record > fits perfectly in the end of a segment, we'll still register it as a > boundary for the next segment (hence why we use XLByteToSeg() instead > of XLByteToPrevSeg()). If database activity stops completely, there > shouldn't be anything to mark ready. The only way .ready files are created is that XLogNotifyWrite() is called. For regular WAL files during regular operation, that only happens in XLogNotifyWriteSeg(). That, in turn, only happens in NotifySegmentsReadyForArchive(). But if the system runs and never writes WAL records that cross WAL boundaries, that function will see that both earliestSegBoundary and latestSegBoundary are MaxXLogSegno, and return without doing anything. So no segments will be notified. -- Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/
Re: archive status ".ready" files may be created too early
On 8/23/21, 9:33 AM, "alvhe...@alvh.no-ip.org" wrote: > On 2021-Aug-23, Bossart, Nathan wrote: > >> Hm. My expectation would be that if there are no registrations, we >> cannot create .ready files for the flushed segments. The scenario >> where I can see that happening is when a record gets flushed to disk >> prior to registration. In that case, we'll still eventually register >> the record and wake up the WAL writer process, which will take care of >> creating the .ready files that were missed earlier. Is there another >> case you are thinking of where we could miss registration for a cross- >> segment record altogether? > > I'm thinking of the case where no record cross segment boundaries ever. Sorry, I'm still not following this one. If we skipped creating .ready segments due to a crash, we rely on RemoveOldXlogFiles() to create them as needed in the end-of-recovery checkpoint. If a record fits perfectly in the end of a segment, we'll still register it as a boundary for the next segment (hence why we use XLByteToSeg() instead of XLByteToPrevSeg()). If database activity stops completely, there shouldn't be anything to mark ready. Nathan
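The earliest/latest boundary bookkeeping being debated here can be modeled in miniature. This is a simplified illustration, not the patch's code: the real implementation protects this state with a spinlock, uses XLogRecPtr end positions, and discards intermediate boundaries while keeping only the earliest and latest, which is the behavior sketched below:

```c
#include <assert.h>
#include <stdint.h>

/* One registered boundary: a record that crosses into segment 'seg'
 * finishes at byte position 'endpos'. */
typedef struct
{
    uint64_t seg;
    uint64_t endpos;
} SegBoundary;

static SegBoundary earliest, latest;
static int nboundaries = 0;

/* Register a cross-segment record; intermediates are dropped, keeping
 * only the earliest and latest boundary, as in the patch. */
static void
register_boundary(uint64_t seg, uint64_t endpos)
{
    if (nboundaries == 0)
        earliest = (SegBoundary){seg, endpos};
    else
        latest = (SegBoundary){seg, endpos};
    if (nboundaries < 2)
        nboundaries++;
}

/* Given WAL flushed up to 'flushed', return the highest registered
 * boundary segment whose ending record is fully on disk (0 if none);
 * segments up to that point may be marked ".ready". */
static uint64_t
notify_limit(uint64_t flushed)
{
    uint64_t limit = 0;

    if (nboundaries >= 1 && flushed >= earliest.endpos)
        limit = earliest.seg;
    if (nboundaries == 2 && flushed >= latest.endpos)
        limit = latest.seg;
    return limit;
}
```

The scenario in the quoted exchange maps onto this model: if no record ever crosses a boundary, nothing is registered and notify_limit() returns 0 — which is why the "no registrations means notify all flushed segments" fix discussed later in the thread is needed.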
pretty slow merge-join due rescan?
Hi

The customer reports a very slow query. I have a reproducer script. The dataset is not too big:

postgres=# \dt+
                             List of relations
┌────────┬───────┬───────┬───────┬─────────────┬────────────┬─────────────┐
│ Schema │ Name  │ Type  │ Owner │ Persistence │    Size    │ Description │
╞════════╪═══════╪═══════╪═══════╪═════════════╪════════════╪═════════════╡
│ public │ f_dep │ table │ pavel │ permanent   │ 8192 bytes │             │
│ public │ f_emp │ table │ pavel │ permanent   │ 1001 MB    │             │
│ public │ f_fin │ table │ pavel │ permanent   │ 432 kB     │             │
│ public │ qt    │ table │ pavel │ permanent   │ 1976 kB    │             │
│ public │ qtd   │ table │ pavel │ permanent   │ 87 MB      │             │
└────────┴───────┴───────┴───────┴─────────────┴────────────┴─────────────┘
(5 rows)

and the query is not too complex:

SELECT
    sub.a_121,
    count(*)
FROM (
    SELECT
        f_fin.dt_business_day_id AS a_1056,
        f_dep.description_id AS a_121,
        f_emp.employee_id_id AS a_1327
    FROM f_emp
    INNER JOIN f_dep ON ( f_emp.department_id_id = f_dep.id )
    INNER JOIN f_fin ON ( f_emp.business_day_date_id = f_fin.id )
    GROUP BY 1, 2, 3
) AS sub
INNER JOIN qt ON ( sub.a_1056 = qt.tt_1056_1056_b )
LEFT OUTER JOIN qtd AS qt_2 ON
    ( ( qt.tt_1056_1056_b = qt_2.a_1056 )
      AND ( sub.a_121 = qt_2.a_121 )
      AND ( sub.a_1327 = qt_2.a_1327 ) )
LEFT OUTER JOIN qtd AS qt_3 ON
    ( ( qt.tt_1056_1056_a = qt_3.a_1056 )
      AND ( sub.a_121 = qt_3.a_121 )
      AND ( sub.a_1327 = qt_3.a_1327 ) )
GROUP BY 1;

By default I get a good plan, and the performance is ok: https://explain.depesz.com/s/Mr2H (about 16 sec). When I increase work_mem, I also get a good plan with good performance: https://explain.depesz.com/s/u4Ff

But this depends on an index only scan. In the production environment, the index only scan is not always available, and I see another plan (I can get this plan with the index only scan disabled).
Although the cost is almost the same, the query is about 15x slower: https://explain.depesz.com/s/L6zP

│ HashAggregate  (cost=1556129.74..1556131.74 rows=200 width=12) (actual time=269948.878..269948.897 rows=64 loops=1) │
│   Group Key: f_dep.description_id │
│   Batches: 1  Memory Usage: 40kB │
│   Buffers: shared hit=5612 read=145051 │
│   ->  Merge Left Join  (cost=1267976.96..1534602.18 rows=4305512 width=4) (actual time=13699.847..268785.500 rows=4291151 loops=1) │
│         Merge Cond: ((f_emp.employee_id_id = qt_3.a_1327) AND (f_dep.description_id = qt_3.a_121)) │
│         Join Filter: (qt.tt_1056_1056_a = qt_3.a_1056) │
│         Rows Removed by Join Filter: 1203659495 │
│         Buffers: shared hit=5612 read=145051 │
.
│         ->  Sort  (cost=209977.63..214349.77 rows=1748859 width=12) (actual time=979.522..81842.913 rows=1205261892 loops=1) │
│               Sort Key: qt_3.a_1327, qt_3.a_121 │
│               Sort Method: quicksort  Memory: 144793kB │
│               Buffers: shared hit=2432 read=8718 │
│               ->  Seq Scan on qtd qt_3  (cost=0.00..28638.59 rows=1748859 width=12) (actual time=0.031..284.437 rows=1748859 loops=1) │
│                     Buffers: shared hit=2432 read=8718 │

The sort of the qtd table by itself is very fast:

postgres=# explain analyze select * from qtd order by a_1327, a_121;
│ Sort  (cost=209977.63..214349.77 rows=1748859 width=27) (actual time=863.923...213 rows=1748859 loops=1) │
│   Sort Key: a_1327, a_121 │
│   Sort Method: quicksort  Memory: 199444kB │
│   ->  Seq Scan on qtd  (cost=0.00..28638.59 rows=1748859 width=27) (actual time=0.035..169.385 rows=1748859 loops=1) │
│ Planning Time: 0.473 ms │
│ Execution Time: 1226.305 ms │
(6 rows)

but in the slow plan the same Sort node returns about 700x more rows and is about 70x slower. Probably something does a rescan. But why? With the index only scan I don't see any sign of a rescan. Is it an executor or optimizer problem, or is it a bug? I tested this behaviour on Postgres 13 and on the fresh
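One plausible reading of the Sort node reporting rows=1205261892 against a 1.7M-row input is the merge join re-reading the sorted inner side (restoring to its mark) once per duplicate outer key. The following toy model — an illustration, not PostgreSQL executor code — counts how many inner tuples a mark/restore merge join actually visits, which is the number EXPLAIN ANALYZE reports on the inner node:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy merge join over two sorted int arrays.  Returns the number of
 * inner tuples *visited*, counting re-visits after restoring to the
 * mark when consecutive outer tuples carry the same key.  With many
 * duplicate outer keys this can vastly exceed the inner relation size,
 * which is the effect seen in the slow plan above.
 */
static size_t
merge_join_inner_visits(const int *outer, size_t n_outer,
                        const int *inner, size_t n_inner)
{
    size_t visits = 0, i = 0, mark = 0;

    for (size_t o = 0; o < n_outer; o++)
    {
        if (o > 0 && outer[o] == outer[o - 1])
            i = mark;           /* restore: rescan the matching inner run */
        else
        {
            while (i < n_inner && inner[i] < outer[o])
            {
                visits++;       /* advance past smaller inner tuples */
                i++;
            }
            mark = i;           /* mark the start of the matching run */
        }
        while (i < n_inner && inner[i] == outer[o])
        {
            visits++;           /* emit a join candidate */
            i++;
        }
    }
    return visits;
}
```

With three equal outer keys against two equal inner keys the inner side is visited six times, not two — scaled up to millions of rows with low-cardinality merge keys, this is exactly the kind of blow-up the plan shows (most of those candidates then dying on the non-mergeable Join Filter).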
Re: [Patch] change the default value of shared_buffers in postgresql.conf.sample
On Wed, Aug 18, 2021 at 08:27:19PM +0200, Magnus Hagander wrote: > On Wed, Aug 18, 2021 at 8:16 PM Bruce Momjian wrote: > > > > On Wed, Aug 18, 2021 at 02:03:56PM -0400, Tom Lane wrote: > > > Bruce Momjian writes: > > > > I think the only question is whether this is a PG 15-only patch or a > > > > backpatch to PG 10; I am in favor of the latter. > > > > > > I think you need a lot stronger argument that this is a bug > > > before you consider back-patching user-visible behavioral > > > changes. > > > > I think the only logic to backpatching it is your statement that this is > > cosmetic, and the new cosmetic appearance is more accurate. However, if > > you don't feel we need to backpatch, that is fine with me --- we have > > gotten very few complaints about this. > > +1 for making the change, and +1 for making it in master only, no backpatch. Patch applied to master. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: archive status ".ready" files may be created too early
On 2021-Aug-23, Bossart, Nathan wrote: > Hm. My expectation would be that if there are no registrations, we > cannot create .ready files for the flushed segments. The scenario > where I can see that happening is when a record gets flushed to disk > prior to registration. In that case, we'll still eventually register > the record and wake up the WAL writer process, which will take care of > creating the .ready files that were missed earlier. Is there another > case you are thinking of where we could miss registration for a cross- > segment record altogether? I'm thinking of the case where no record cross segment boundaries ever. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Re: archive status ".ready" files may be created too early
On 8/23/21, 8:50 AM, "alvhe...@alvh.no-ip.org" wrote: > ... while reading the resulting code after backpatching to all branches, > I realized that if there are no registrations whatsoever, then archiving > won't do anything, which surely is the wrong thing to do. The correct > behavior should be "if there are no registrations, then *all* flushed > segments can be notified". Hm. My expectation would be that if there are no registrations, we cannot create .ready files for the flushed segments. The scenario where I can see that happening is when a record gets flushed to disk prior to registration. In that case, we'll still eventually register the record and wake up the WAL writer process, which will take care of creating the .ready files that were missed earlier. Is there another case you are thinking of where we could miss registration for a cross- segment record altogether? Nathan
Re: Mark all GUC variable as PGDLLIMPORT
On 08/23/21 10:57, Chapman Flack wrote: > Maybe those cases aren't very numerous ... and maybe they're distinctive > enough to recognize when creating one ("hmm, I am creating a check or > assign hook that does significant work here, will it be worth exposing > a getter API for the product of the work?"). How about a generic GetTypedConfigValue(char const *name, void *into) ? By default the type of *into would correspond to the type of the GUC (int for an enum) and would be returned directly from *conf->variable by a getter hook installed by default when the GUC is defined. But the definition could also supply a getter hook that would store a value of a different type, or computed a different way, for a GUC where that's appropriate. The function return could be boolean, true if such a variable exists and isn't a placeholder. Pro: provides read access to the typed value of any GUC, without exposing it to overwriting. Con: has to bsearch the GUCs every time the value is wanted. What I'd really like: ObserveTypedConfigValue(char const *name, void *into, int *flags, int flag) This would register an observer of the named GUC: whenever its typed value changes, it is written at *into and flag is ORed into *flags. This is more restrictive than allowing arbitrary code to be hooked into GUC changes (as changes can happen at delicate times such as error recovery), but allows an extension to always have the current typed value available and to cheaply know when it has changed. Observers would be updated in the normal course of processing a GUC change, eliminating the bsearch-per-lookup cost of the first approach. Thinkable? Regards, -Chap
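The ObserveTypedConfigValue() idea above can be sketched as a standalone registry. All names and the int-only value type here are hypothetical simplifications of the proposal — a real implementation would live in the GUC machinery, cover every GUC type, and be careful about when updates fire (e.g. during error recovery, as the message notes):

```c
#include <assert.h>
#include <string.h>

/* One registered observation: the GUC machinery pushes the typed value
 * to *into and ORs 'flag' into *flags whenever the value changes. */
typedef struct
{
    const char *name;
    int        *into;
    int        *flags;
    int         flag;
} GucObserver;

#define MAX_OBSERVERS 16
static GucObserver observers[MAX_OBSERVERS];
static int n_observers = 0;

/* Register once at extension init; reads afterwards are a plain
 * variable access, with no per-lookup bsearch. */
static void
observe_typed_config_value(const char *name, int *into, int *flags, int flag)
{
    observers[n_observers++] = (GucObserver){name, into, flags, flag};
}

/* Called by the (hypothetical) GUC machinery on every value change;
 * the cost moves to the infrequent update path, as proposed. */
static void
guc_changed(const char *name, int newval)
{
    for (int i = 0; i < n_observers; i++)
        if (strcmp(observers[i].name, name) == 0)
        {
            *observers[i].into = newval;
            *observers[i].flags |= observers[i].flag;
        }
}
```

The extension then polls its own *into and *flags variables at convenient points, getting both the current typed value and a cheap "has it changed" signal without ever touching the bare backend global.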
Re: .ready and .done files considered harmful
On 8/23/21, 6:42 AM, "Robert Haas" wrote: > On Sun, Aug 22, 2021 at 10:31 PM Bossart, Nathan wrote: >> I ran this again on a bigger machine with 200K WAL files pending >> archive. The v9 patch took ~5.5 minutes, the patch I sent took ~8 >> minutes, and the existing logic took just under 3 hours. > > Hmm. On the one hand, 8 minutes > 5.5 minutes, and presumably the gap > would only get wider if the number of files were larger or if reading > the directory were slower. I am pretty sure that reading the directory > must be much slower in some real deployments where this problem has > come up. On the other hand, 8.8 minutes << 3 hours, and your patch > would win if somehow we had a ton of gaps in the sequence of files. > I'm not sure how likely that is to be the cause - probably not very > likely at all if you aren't using an archive command that cheats, but > maybe really common if you are. Hmm, but I think if the > archive_command cheats by marking a bunch of files done when it is > tasked with archiving just one, your patch will break, because, unless > I'm missing something, it doesn't re-evaluate whether things have > changed on every pass through the loop as Dipesh's patch does. So I > guess I'm not quite sure I understand why you think this might be the > way to go? To handle a "cheating" archive command, I'd probably need to add a stat() for every time pgarch_readyXLog() returned something from arch_files. I suspect something similar might be needed in Dipesh's patch to handle backup history files and partial WAL files. In any case, I think Dipesh's patch is the way to go. It obviously will perform better in the extreme cases discussed in this thread. I think it's important to make sure the patch doesn't potentially leave files behind to be picked up by a directory scan that might not happen, but there are likely ways to handle that. In the worst case, perhaps we need to force a directory scan every N files to make sure nothing gets left behind. 
But maybe we can do better. > Maintaining the binary heap in lowest-priority-first order is very > clever, and the patch does look quite elegant. I'm just not sure I > understand the point. This was mostly an exploratory exercise to get some numbers for the different approaches discussed in this thread. Nathan
Re: archive status ".ready" files may be created too early
On 2021-Aug-21, Bossart, Nathan wrote: > > Well, the thing I realized is that these three helper functions have > > exactly one caller each. I think the compiler is going to inline them, > > so there isn't going to be a function call in the assembly. I haven't > > verified this, though. > > Good point. It looks like they're getting inlined for me. I still didn't like it, because it looks like we're creating an API for which there can be only one caller. So I expanded the functions in the caller. It doesn't look too bad. However ... ... while reading the resulting code after backpatching to all branches, I realized that if there are no registrations whatsoever, then archiving won't do anything, which surely is the wrong thing to do. The correct behavior should be "if there are no registrations, then *all* flushed segments can be notified". I'll fix that ... Another thing I didn't like is that you used a name ending in RecPtr for the LSN, which gives no indication that it really is the *end* LSN, not the start pointer. And it won't play nice with the need to add the *start* LSN which we'll need when solving the equivalent problem for streaming replication. I'll rename those to earliestSegBoundaryEndPtr and latestSegBoundaryEndPtr. -- Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/ Syntax error: function hell() needs an argument. Please choose what hell you want to involve.

>From 6e4c3fd6f5687ec5762de121344ce35c1c890812 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera
Date: Mon, 23 Aug 2021 09:06:09 -0400
Subject: [PATCH v15] Avoid creating archive status ".ready" files too early

WAL records may span multiple segments, but XLogWrite() does not wait
for the entire record to be written out to disk before creating archive
status files.  Instead, as soon as the last WAL page of the segment is
written, the archive status file is created, and the archiver may
process it.  If PostgreSQL crashes before it is able to write and flush
the rest of the record (in the next WAL segment), the wrong version of
the first segment file lingers in the archive, which causes operations
such as point-in-time restores to fail.

To fix this, keep track of records that span across segments and ensure
that segments are only marked ready-for-archival once such records have
been completely written to disk.

This has always been wrong, so backpatch all the way back.

Author: Nathan Bossart
Reviewed-by: Kyotaro Horiguchi
Reviewed-by: Ryo Matsumura
Reviewed-by: Andrey Borodin
Discussion: https://postgr.es/m/cbddfa01-6e40-46bb-9f98-9340f4379...@amazon.com
---
 src/backend/access/transam/timeline.c    |   2 +-
 src/backend/access/transam/xlog.c        | 220 +-
 src/backend/access/transam/xlogarchive.c |  17 +-
 src/backend/postmaster/walwriter.c       |   7 +
 src/backend/replication/walreceiver.c    |   6 +-
 src/include/access/xlog.h                |   1 +
 src/include/access/xlogarchive.h         |   4 +-
 src/include/access/xlogdefs.h            |   1 +
 8 files changed, 234 insertions(+), 24 deletions(-)

diff --git a/src/backend/access/transam/timeline.c b/src/backend/access/transam/timeline.c
index 8d0903c175..acd5c2431d 100644
--- a/src/backend/access/transam/timeline.c
+++ b/src/backend/access/transam/timeline.c
@@ -452,7 +452,7 @@ writeTimeLineHistory(TimeLineID newTLI, TimeLineID parentTLI,
 	if (XLogArchivingActive())
 	{
 		TLHistoryFileName(histfname, newTLI);
-		XLogArchiveNotify(histfname);
+		XLogArchiveNotify(histfname, true);
 	}
 }
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e51a7a749d..8e7c3a364a 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -724,6 +724,18 @@ typedef struct XLogCtlData
 	XLogRecPtr	lastFpwDisableRecPtr;

 	slock_t		info_lck;		/* locks shared variables shown above */
+
+	/*
+	 * Variables used to track cross-segment records.  Protected by
+	 * segtrack_lck.
+	 */
+	XLogSegNo	lastNotifiedSeg;
+	XLogSegNo	earliestSegBoundary;
+	XLogRecPtr	earliestSegBoundaryRecPtr;
+	XLogSegNo	latestSegBoundary;
+	XLogRecPtr	latestSegBoundaryRecPtr;
+
+	slock_t		segtrack_lck;	/* locks shared variables shown above */
 } XLogCtlData;

 static XLogCtlData *XLogCtl = NULL;
@@ -920,6 +932,7 @@ static void RemoveXlogFile(const char *segname, XLogSegNo recycleSegNo,
 						   XLogSegNo *endlogSegNo);
 static void UpdateLastRemovedPtr(char *filename);
 static void ValidateXLOGDirectoryStructure(void);
+static void RegisterSegmentBoundary(XLogSegNo seg, XLogRecPtr pos);
 static void CleanupBackupHistory(void);
 static void UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force);
 static XLogRecord *ReadRecord(XLogReaderState *xlogreader,
@@ -1154,23 +1167,56 @@ XLogInsertRecord(XLogRecData *rdata,
 	END_CRIT_SECTION();

 	/*
-	 * Update shared LogwrtRqst.Write, if we crossed page boundary.
+	 * If we crossed page boundary, update LogwrtRqst.Write; if we crossed
+	 *
Regexp: observable bug from careless usage of zaptreesubs
While looking at the regexp code, I started to get uncomfortable about the implications of commit 0c3405cf1: it means that not only the cdissect() phase, but also the preceding DFA-check phase (longest()/shortest()) rely on saved subexpression match positions to be valid for the match we're currently considering. It seemed to me that the cdissect() recursion wasn't being careful to reset the match pointers for an abandoned submatch before moving on to the next attempt, meaning that dfa_backref() could conceivably get applied using a stale match pointer. Upon poking into it, I failed to find any bug of that exact ilk, but what I did find was a pre-existing bug of not resetting an abandoned match pointer at all. That allows these fun things: regression=# select 'abcdef' ~ '^(.)\1|\1.'; ?column? -- t (1 row) regression=# select 'abadef' ~ '^((.)\2|..)\2'; ?column? -- t (1 row) In both of these examples, the (.) capture is in an alternation branch that subsequently fails; therefore, the later backref should never be able to match. But it does, because we forgot to clear the capture's match data on the way out. It turns out that this can be fixed using fewer, not more, zaptreesubs calls, if we are careful to define the recursion rules precisely. See attached. This bug is ancient. I verified it as far back as PG 7.4, and it can also be reproduced in Tcl, so it's likely aboriginal to Spencer's library. It's not that surprising that no one has reported it, because regexps that have this sort of possibly-invalid backref are most likely incorrect. In most use-cases the existing code will fail to match, as expected, so users would probably notice that and fix their regexps without observing that there are cases where the match incorrectly succeeds. 
regards, tom lane

diff --git a/src/backend/regex/regexec.c b/src/backend/regex/regexec.c
index aa5ba85514..2411e6561d 100644
--- a/src/backend/regex/regexec.c
+++ b/src/backend/regex/regexec.c
@@ -216,14 +216,14 @@ pg_regexec(regex_t *re,
 	if (backref && nmatch <= v->g->nsub)
 	{
 		/* need larger work area */
-		if (v->g->nsub + 1 <= LOCALMAT)
+		v->nmatch = v->g->nsub + 1;
+		if (v->nmatch <= LOCALMAT)
 			v->pmatch = mat;
 		else
-			v->pmatch = (regmatch_t *) MALLOC((v->g->nsub + 1) *
-											  sizeof(regmatch_t));
+			v->pmatch = (regmatch_t *) MALLOC(v->nmatch * sizeof(regmatch_t));
 		if (v->pmatch == NULL)
 			return REG_ESPACE;
-		v->nmatch = v->g->nsub + 1;
+		zapallsubs(v->pmatch, v->nmatch);
 	}
 	else
 	{
@@ -488,7 +488,6 @@ find(struct vars *v,
 		return REG_OKAY;

 	/* find submatches */
-	zapallsubs(v->pmatch, v->nmatch);
 	return cdissect(v, v->g->tree, begin, end);
 }
@@ -599,7 +598,6 @@ cfindloop(struct vars *v,
 				break;		/* no match with this begin point, try next */
 			MDEBUG(("tentative end %ld\n", LOFF(end)));
 			/* Dissect the potential match to see if it really matches */
-			zapallsubs(v->pmatch, v->nmatch);
 			er = cdissect(v, v->g->tree, begin, end);
 			if (er == REG_OKAY)
 			{
@@ -647,6 +645,8 @@ cfindloop(struct vars *v,

 /*
  * zapallsubs - initialize all subexpression matches to "no match"
+ *
+ * Note that p[0], the overall-match location, is not touched.
  */
 static void
 zapallsubs(regmatch_t *p,
@@ -716,8 +716,30 @@ subset(struct vars *v,
  * DFA and found that the proposed substring satisfies the DFA.  (We make
  * the caller do that because in concatenation and iteration nodes, it's
  * much faster to check all the substrings against the child DFAs before we
- * recurse.)  Also, caller must have cleared subexpression match data via
- * zaptreesubs (or zapallsubs at the top level).
+ * recurse.)
+ *
+ * A side-effect of a successful match is to save match locations for
+ * capturing subexpressions in v->pmatch[].  This is a little bit tricky,
+ * so we make the following rules:
+ * 1. Before initial entry to cdissect, all match data must have been
+ *    cleared (this is seen to by zapallsubs).
+ * 2. Before any recursive entry to cdissect, the match data for that
+ *    subexpression tree must be guaranteed clear (see zaptreesubs).
+ * 3. When returning REG_OKAY, each level of cdissect will have saved
+ *    any relevant match locations.
+ * 4. When returning REG_NOMATCH, each level of cdissect will guarantee
+ *    that its subexpression match locations are again clear.
+ * 5. No guarantees are made for error cases (i.e., other result codes).
+ * 6. When a level of cdissect abandons a successful sub-match, it will
+ *    clear that subtree's match locations with zaptreesubs before trying
+ *    any new DFA match or cdissect call for that subtree or any subtree
+ *    to its right (that is, any subtree that could have a backref into the
+ *    abandoned match).
+ * This may seem overly complicated, but it's difficult to simplify it
+ * because of the provision that match locations must be reset before
+ * any fresh DFA match (a rule that is needed to
Re: Mark all GUC variable as PGDLLIMPORT
On 2021-Aug-23, Robert Haas wrote: > It's also a bit unfair to say, well we have APIs for accessing GUC > values. It's true that we do. But if the GUC variable is, say, a > Boolean, you do not want your extension to call some function that > does a bunch of shenanigans and returns a string so that you can then > turn around and parse the string to recover the Boolean value. Even > more so if the value is an integer or a comma-separated list. You want > to access the value as the system represents it internally, not > duplicate the parsing logic in a way that is inefficient and > bug-prone. In that case, why not improve the API with functions that return the values in some native datatype? For scalars with native C types (ints, floats, Booleans, etc.) this is easy enough; I bet it'll solve 99% of the problems or more. -- Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/
Re: Queries that should be canceled will get stuck on secure_write function
On 2021-Aug-23, Robert Haas wrote: > On Mon, Aug 23, 2021 at 10:45 AM Alvaro Herrera > wrote: > > Do we actually need new GUCs, though? I think we should never let an > > unresponsive client dictate what the server does, because that opens the > > door for uncooperative or malicious clients to wreak serious havoc. I > > think the implementation should wait until time now+X to cancel the > > query, but if by time now+2X (or whatever we deem reasonable -- maybe > > now+1.1X) we're still waiting, then it's okay to just close the > > connection. This suggests a completely different implementation, though. > > I don't quite understand what you mean by waiting until time now+X to > cancel the query. We don't know a priori when query cancels are going > to happen, but when they do happen we want to respond to them as > quickly as we can. It seems reasonable to say that if we can't handle > them within X amount of time we can resort to emergency measures, but > that's basically what the patch does, except it uses a GUC instead of > hardcoding X. Aren't we talking about query cancellations that occur in response to a standby delay limit? Those aren't in response to user action. What I mean is that if the standby delay limit is exceeded, then we send a query cancel; we expect the standby to cancel its query at that time and then the primary can move on. But if the standby doesn't react, then we can have it terminate its connection. I'm looking at the problem from the primary's point of view rather than the standby's point of view. -- Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/
Re: Queries that should be canceled will get stuck on secure_write function
On Mon, Aug 23, 2021 at 10:45 AM Alvaro Herrera wrote: > Do we actually need new GUCs, though? I think we should never let an > unresponsive client dictate what the server does, because that opens the > door for uncooperative or malicious clients to wreak serious havoc. I > think the implementation should wait until time now+X to cancel the > query, but if by time now+2X (or whatever we deem reasonable -- maybe > now+1.1X) we're still waiting, then it's okay to just close the > connection. This suggests a completely different implementation, though. I don't quite understand what you mean by waiting until time now+X to cancel the query. We don't know a priori when query cancels are going to happen, but when they do happen we want to respond to them as quickly as we can. It seems reasonable to say that if we can't handle them within X amount of time we can resort to emergency measures, but that's basically what the patch does, except it uses a GUC instead of hardcoding X. > I wonder if it's possible to write a test for this. We would have to > send a query and then hang the client somehow. I recently added a TAP > test that uses SIGSTOP to a walsender ... can we use SIGSTOP with a > background psql that's running SELECT pg_sleep() perhaps? > (Or maybe it's sufficient to start background psql and not pump() it?) Starting a background process and not pumping it sounds promising, because it seems like it would be more likely to be portable. I think we'd want to be careful not to assume very much about how large the output buffer is, because I'm guessing that varies by platform and that it might in some cases be fairly large. Perhaps we could use pg_stat_activity to wait until we block in a ClientWrite state, although I wonder if we might find out that we sometimes block on a different wait state than what we expect to see. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Mark all GUC variable as PGDLLIMPORT
On 08/23/21 10:36, Bruce Momjian wrote: > So the problem is that extensions only _need_ to use that API on > Windows, so many initially don't, or that the API is too limited? I think there can be cases where it's too limited, such as when significant computation or validation is needed between the form of the setting known to the GUC machinery and the form that would otherwise be available in the global. I'm thinking, for instance, of the old example before session_timezone was PGDLLIMPORTed, when you'd have to call GetConfigOption("timezone") and repeat the work done by pg_tzset to validate and map the timezone name through the timezone database, to reconstruct the value that was otherwise already available in session_timezone. Maybe those cases aren't very numerous ... and maybe they're distinctive enough to recognize when creating one ("hmm, I am creating a check or assign hook that does significant work here, will it be worth exposing a getter API for the product of the work?"). Regards, -Chap
Re: Mark all GUC variable as PGDLLIMPORT
On Mon, Aug 23, 2021 at 10:15 AM Tom Lane wrote: > And yes, I absolutely would prohibit extensions from accessing many > of them, if there were a reasonable way to do it. It would be a good > start towards establishing a defined API for extensions. Mostly, it would make extension development more difficult for no discernible benefit to the project. You've made this argument many times over the years ... but we're no closer to having an extension API than we ever were, and we continue to get complaints about breaking stuff on Windows on a pretty regular basis. Honestly, it seems unimaginable that an API is ever really going to be possible. It would be a ton of work to maintain, and we'd just end up breaking it every time we discover that there's a new feature we want to implement which doesn't fit into the defined API now. That's what we do *now* with functions that third-party extensions actually call, and with variables that they access, and it's not something that, in my experience, is any great problem in maintaining an extension. You're running code that is running inside somebody else's executable, and sometimes you have to adjust it for upstream changes. That's life, and I don't think that complaints about that topic are nearly as frequent as complaints about extensions breaking on Windows because of missing PGDLLIMPORT markings. And more than that, I'm pretty sure that you've previously taken the view that we shouldn't document all the hook functions that only exist in the backend for the purpose of extension use. I think you would argue against a patch to go and document all the variables that are marked PGDLLIMPORT now. So it seems to me that you're for an API when it means that we don't have to change anything, and against an API when it means that we would have to change something, which doesn't really seem like a consistent position.
I think we should be responding to the real, expressed needs of extension developers, and the lack of PGDLLIMPORT markings on various global variables is surely top of the list. It's also a bit unfair to say, well, we have APIs for accessing GUC values. It's true that we do. But if the GUC variable is, say, a Boolean, you do not want your extension to call some function that does a bunch of shenanigans and returns a string so that you can then turn around and parse the string to recover the Boolean value. Even more so if the value is an integer or a comma-separated list. You want to access the value as the system represents it internally, not duplicate the parsing logic in a way that is inefficient and bug-prone. In short, +1 from me for the original proposal of marking all GUCs as PGDLLIMPORT. And, heck, +1 for marking all the other global variables that way, too. We're not solving any problem here. We're just annoying people, mostly people who are loyal community members and steady contributors to the project. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Mark all GUC variable as PGDLLIMPORT
On Mon, Aug 23, 2021 at 10:15 PM Tom Lane wrote: > > And yes, I absolutely would prohibit extensions from accessing many > of them, if there were a reasonable way to do it. It would be a good > start towards establishing a defined API for extensions. The v2 patch I sent does that, at least when compiling with GCC. I didn't find anything similar for clang, but I only checked quickly. I'm assuming that the unreasonable part is having to add some extra attribute to the variable? Would it be acceptable if it were wrapped in some other macro, as I proposed?
Re: Mark all GUC variable as PGDLLIMPORT
On Mon, Aug 23, 2021 at 10:36 PM Bruce Momjian wrote: > > So the problem is that extensions only _need_ to use that API on > Windows, so many initially don't, or that the API is too limited? The inconvenience with that API is that it only returns C strings, so you have to convert the value back to the original datatype. That's probably why most of the extensions simply read from the original exposed variable rather than using the API, because they're usually written on Linux or similar, not because they want to mess up the stored value.
Re: Mark all GUC variable as PGDLLIMPORT
On Mon, Aug 23, 2021 at 10:22 PM Tom Lane wrote: > > Bruce Momjian writes: > > On Mon, Aug 23, 2021 at 10:15:04AM -0400, Tom Lane wrote: > >> By that argument, *every* globally-visible variable should be marked > >> PGDLLIMPORT. But the mere fact that two backend .c files need to access > > > No, Julien says 99% need only the GUCs, so that is not the argument I am > > making. > > That's a claim unbacked by any evidence that I've seen. More to > the point, we already have a mechanism that extensions can/should > use to read and write GUC settings, and it's not direct access. I clearly didn't try every single extension available out there. It's excessively annoying to compile extensions on Windows, and I also don't have a lot of dependencies installed, so there are some that I wasn't able to test since I'm lacking some other libs; given my absolute lack of knowledge of that platform, I didn't spend time trying to install those. I think I tested a dozen "small" extensions (I'm assuming that the big ones like postgis would require too much effort to build, and they probably already take care of Windows compatibility), and this was the only problem I faced. That's maybe not a representative set, but I also doubt that I was unlucky enough to find the only few exceptions.
Re: Queries that should be canceled will get stuck on secure_write function
On 2021-Aug-23, Robert Haas wrote: > On Mon, Aug 23, 2021 at 4:15 AM 蔡梦娟(玊于) wrote: > > I want to know why the interrupt is only handled when ProcDiePending > > is true, I think query which is supposed to be canceled also should > > respond to the signal. Yeah, I agree. > Well, if we're halfway through sending a message to the client and we > abort the write, we have no way of re-establishing protocol sync, > right? It's OK to die without sending any more data to the client, > since then the connection is closed anyway and the loss of sync > doesn't matter, but continuing the session doesn't seem workable. > > Your proposed patch actually seems to dodge this problem and I think > perhaps we could consider something along those lines. Do we actually need new GUCs, though? I think we should never let an unresponsive client dictate what the server does, because that opens the door for uncooperative or malicious clients to wreak serious havoc. I think the implementation should wait until time now+X to cancel the query, but if by time now+2X (or whatever we deem reasonable -- maybe now+1.1X) we're still waiting, then it's okay to just close the connection. This suggests a completely different implementation, though. I wonder if it's possible to write a test for this. We would have to send a query and then hang the client somehow. I recently added a TAP test that uses SIGSTOP to a walsender ... can we use SIGSTOP with a background psql that's running SELECT pg_sleep() perhaps? (Or maybe it's sufficient to start background psql and not pump() it?) -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "It takes less than 2 seconds to get to 78% complete; that's a good sign. A few seconds later it's at 90%, but it seems to have stuck there. Did somebody make percentages logarithmic while I wasn't looking?" http://smylers.hates-software.com/2005/09/08/1995c749.html
Re: Mark all GUC variable as PGDLLIMPORT
On Mon, Aug 23, 2021 at 10:22:51AM -0400, Tom Lane wrote: > Bruce Momjian writes: > > On Mon, Aug 23, 2021 at 10:15:04AM -0400, Tom Lane wrote: > >> By that argument, *every* globally-visible variable should be marked > >> PGDLLIMPORT. But the mere fact that two backend .c files need to access > > > No, Julien says 99% need only the GUCs, so that is not the argument I am > > making. > > That's a claim unbacked by any evidence that I've seen. More to > the point, we already have a mechanism that extensions can/should > use to read and write GUC settings, and it's not direct access. So the problem is that extensions only _need_ to use that API on Windows, so many initially don't, or that the API is too limited? -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: Mark all GUC variable as PGDLLIMPORT
Bruce Momjian writes: > On Mon, Aug 23, 2021 at 10:15:04AM -0400, Tom Lane wrote: >> By that argument, *every* globally-visible variable should be marked >> PGDLLIMPORT. But the mere fact that two backend .c files need to access > No, Julien says 99% need only the GUCs, so that is not the argument I am > making. That's a claim unbacked by any evidence that I've seen. More to the point, we already have a mechanism that extensions can/should use to read and write GUC settings, and it's not direct access. regards, tom lane
Re: Mark all GUC variable as PGDLLIMPORT
On Mon, Aug 23, 2021 at 10:15:04AM -0400, Tom Lane wrote: > Bruce Momjian writes: > > On Sun, Aug 22, 2021 at 09:29:01PM +0800, Julien Rouhaud wrote: > >> Then shouldn't we try to prevent direct access on all platforms rather than > >> only one? > > > Agreed. If Julien says 99% of the non-export problems are GUCs, and we > > can just export them all, why not do it? We already export every global > > variable on Unix-like systems, and we have seen no downsides. > > By that argument, *every* globally-visible variable should be marked > PGDLLIMPORT. But the mere fact that two backend .c files need to access No, Julien says 99% need only the GUCs, so that is not the argument I am making. > some variable doesn't mean that we want any random bit of code doing so. > > And yes, I absolutely would prohibit extensions from accessing many > of them, if there were a reasonable way to do it. It would be a good > start towards establishing a defined API for extensions. Well, if Unix needed it, we would have addressed this more regularly, but since it is only Windows that has this _feature_, it is the rare Windows cases that need adjustment, and usually as an afterthought since Windows isn't usually a primary tested platform. What I am saying is that if we blocked global variable access on Unix, initial testing would have shown what needs to be exported and it would not be as big a problem. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: Added schema level support for publication.
On Tue, Aug 17, 2021 at 6:55 PM Tom Lane wrote:
>
> Amit Kapila writes:
> > On Tue, Aug 17, 2021 at 6:40 AM Peter Smith wrote:
> >> On Mon, Aug 16, 2021 at 11:31 PM Tom Lane wrote:
> >>> Abstractly it'd be
> >>>
> >>> createpub := CREATE PUBLICATION pubname FOR cpitem [, ... ] [ WITH ... ]
> >>>
> >>> cpitem := ALL TABLES |
> >>>           TABLE name |
> >>>           ALL TABLES IN SCHEMA name |
> >>>           ALL SEQUENCES |
> >>>           SEQUENCE name |
> >>>           ALL SEQUENCES IN SCHEMA name |
> >>>           name
> >>>
> >>> The grammar output would need some post-analysis to attribute the
> >>> right type to bare "name" items, but that doesn't seem difficult.
>
> >> That last bare "name" cpitem looks like it would permit the following syntax:
> >> CREATE PUBLICATION pub FOR a,b,c;
> >> Was that intentional?
>
> > I think so.
>
> I had supposed that we could throw an error at the post-processing stage,
> or alternatively default to assuming that such names are tables.
>
> Now you could instead make the grammar work like
>
> cpitem := ALL TABLES |
>           TABLE name [, ...] |
>           ALL TABLES IN SCHEMA name [, ...] |
>           ALL SEQUENCES |
>           SEQUENCE name [, ...] |
>           ALL SEQUENCES IN SCHEMA name [, ...]
>
> which would result in a two-level-list data structure.  I'm not sure
> that this is better, as any sort of mistake would result in a very
> uninformative generic "syntax error" from Bison.  Errors out of a
> post-processing stage could be more specific than that.

I preferred an implementation along the lines Tom Lane had proposed at [1].
Is it ok if the implementation is something like below:

CreatePublicationStmt:
		CREATE PUBLICATION name FOR pub_obj_list opt_definition
			{
				CreatePublicationStmt *n = makeNode(CreatePublicationStmt);
				n->pubname = $3;
				n->options = $6;
				n->pubobjects = (List *) $5;
				$$ = (Node *) n;
			}
	;

pub_obj_list:	PublicationObjSpec
			{ $$ = list_make1($1); }
		| pub_obj_list ',' PublicationObjSpec
			{ $$ = lappend($1, $3); }
	;

/* FOR TABLE and FOR ALL TABLES IN SCHEMA specifications */
PublicationObjSpec:	TABLE pubobj_expr
			{ }
		| ALL TABLES IN_P SCHEMA pubobj_expr
			{ }
		| pubobj_expr
			{ }
	;

pubobj_expr:
		any_name { }
		| any_name '*' { }
		| ONLY any_name { }
		| ONLY '(' any_name ')' { }
		| CURRENT_SCHEMA { }
	;

I needed pubobj_expr to support the existing syntaxes supported by relation_expr, and also to handle CURRENT_SCHEMA in the case of the "FOR ALL TABLES IN SCHEMA" feature. I changed the name to any_name to support objects like "sch1.t1".

I felt that if a user specified "FOR ALL TABLES", the user should not be allowed to combine it with "FOR TABLE" or "FOR ALL TABLES IN SCHEMA", as "FOR ALL TABLES" will include all the tables anyway.

Should we support similar syntax in the case of ALTER PUBLICATION, like "ALTER PUBLICATION pub1 ADD TABLE t1, t2, ALL TABLES IN SCHEMA sch1, sch2", or shall we keep these separate, like "ALTER PUBLICATION pub1 ADD TABLE t1, t2" and "ALTER PUBLICATION pub1 ADD ALL TABLES IN SCHEMA sch1, sch2"? I preferred to keep them separate, as we have kept ADD/DROP separate, which cannot be combined currently.

I have not mentioned SEQUENCE handling separately; sequences will be extended along similar lines.

I thought of throwing an error at post-processing (the first option that Tom Lane had suggested) if the user specified syntax like: create publication pub1 for t1,t2;

Thoughts?

[1] - https://www.postgresql.org/message-id/877603.1629120678%40sss.pgh.pa.us

Regards, Vignesh
Re: Mark all GUC variable as PGDLLIMPORT
Bruce Momjian writes: > On Sun, Aug 22, 2021 at 09:29:01PM +0800, Julien Rouhaud wrote: >> Then shouldn't we try to prevent direct access on all platforms rather than >> only one? > Agreed. If Julien says 99% of the non-export problems are GUCs, and we > can just export them all, why not do it? We already export every global > variable on Unix-like systems, and we have seen no downsides. By that argument, *every* globally-visible variable should be marked PGDLLIMPORT. But the mere fact that two backend .c files need to access some variable doesn't mean that we want any random bit of code doing so. And yes, I absolutely would prohibit extensions from accessing many of them, if there were a reasonable way to do it. It would be a good start towards establishing a defined API for extensions. regards, tom lane
Re: Queries that should be canceled will get stuck on secure_write function
On Mon, Aug 23, 2021 at 4:15 AM 蔡梦娟(玊于) wrote: > I want to know why the interrupt is only handled when ProcDiePending is true, > I think query which is supposed to be canceled also should respond to the > signal. Well, if we're halfway through sending a message to the client and we abort the write, we have no way of re-establishing protocol sync, right? It's OK to die without sending any more data to the client, since then the connection is closed anyway and the loss of sync doesn't matter, but continuing the session doesn't seem workable. Your proposed patch actually seems to dodge this problem and I think perhaps we could consider something along those lines. It would be interesting to hear what Andres thinks about this idea, since I think he was the last one to rewrite that code. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Mark all GUC variable as PGDLLIMPORT
On Sun, Aug 22, 2021 at 09:29:01PM +0800, Julien Rouhaud wrote: > On Sun, Aug 22, 2021 at 09:19:42AM -0400, Tom Lane wrote: > > > > Uh, no, it's exactly *not* clear. There are a lot of GUCs that are only > > of interest to particular subsystems. I do not see why being a GUC makes > > something automatically more interesting than any other global variable. > > Usually, the fact that one is global is only so the GUC machinery itself > > can get at it, otherwise it'd be static in the owning module. > > > > As for "extensions should be able to get at the values", the GUC machinery > > already provides uniform mechanisms for doing that safely. Direct access > > to the variable's internal value would be unsafe in many cases. > > Then shouldn't we try to prevent direct access on all platforms rather than > only one? Agreed. If Julien says 99% of the non-export problems are GUCs, and we can just export them all, why not do it? We already export every global variable on Unix-like systems, and we have seen no downsides. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
RE: [BUG] wrong refresh when ALTER SUBSCRIPTION ADD/DROP PUBLICATION
On Mon, Aug 23, 2021 8:01 PM Amit Kapila wrote: > On Mon, Aug 23, 2021 at 2:45 PM > houzj.f...@fujitsu.com wrote: > > > > On Mon, Aug 23, 2021 12:59 PM Amit Kapila wrote: > > > > > > On Sat, Aug 7, 2021 at 6:53 PM houzj.f...@fujitsu.com > > > wrote: > > > > > > > > Personally, I also think it will be better to make the behavior > > > > consistent. > > > > Attach the new version patch make both ADD and DROP behave the > > > > same as SET PUBLICATION which refresh all the publications. > > > > > > > > > > I think we can have tests in the separate test file > > > (alter_sub_pub.pl) like you earlier had in one of the versions. Use > > > some meaningful names for tables instead of temp1, temp2 as you had in the > previous version. > > > Otherwise, the code changes look good to me. > > > > Thanks for the comment. > > Attach new version patch which did the following changes. > > > > * move the tests to a separate test file (024_alter_sub_pub.pl) > > * adjust the document of copy_data option > > * add tab-complete for copy_data option when ALTER DROP publication. > > > > Thanks, the patch looks good to me. I have made some cosmetic changes in the > attached version. It makes the test cases easier to understand. > I am planning to push this tomorrow unless there are more comments or > suggestions. Thanks! The patch looks good to me. I noticed that the patch cannot be applied to the PG14 stable branch, so I attach a separate patch for back-patching (I didn't change the patch for the HEAD branch). Best regards, Hou zj Attachments: v7-0001-PG14-Fix-Alter-Subscription-Add-Drop-Publication-refre_patch, v7-0001-PG15-Fix-Alter-Subscription-Add-Drop-Publication-refre.patch
Re: .ready and .done files considered harmful
On Sun, Aug 22, 2021 at 10:31 PM Bossart, Nathan wrote: > I ran this again on a bigger machine with 200K WAL files pending > archive. The v9 patch took ~5.5 minutes, the patch I sent took ~8 > minutes, and the existing logic took just under 3 hours. Hmm. On the one hand, 8 minutes > 5.5 minutes, and presumably the gap would only get wider if the number of files were larger or if reading the directory were slower. I am pretty sure that reading the directory must be much slower in some real deployments where this problem has come up. On the other hand, 8.8 minutes << 3 hours, and your patch would win if somehow we had a ton of gaps in the sequence of files. I'm not sure how likely that is to be the cause - probably not very likely at all if you aren't using an archive command that cheats, but maybe really common if you are. Hmm, but I think if the archive_command cheats by marking a bunch of files done when it is tasked with archiving just one, your patch will break, because, unless I'm missing something, it doesn't re-evaluate whether things have changed on every pass through the loop as Dipesh's patch does. So I guess I'm not quite sure I understand why you think this might be the way to go? Maintaining the binary heap in lowest-priority-first order is very clever, and the patch does look quite elegant. I'm just not sure I understand the point. -- Robert Haas EDB: http://www.enterprisedb.com
Re: replay of CREATE TABLESPACE eats data at wal_level=minimal
On Sun, Aug 22, 2021 at 6:59 PM Noah Misch wrote: > Here's what I plan to push. Besides adding a test, I modified things so > CREATE TABLESPACE redo continues to report an error if a non-directory exists > under the name we seek to create. One could argue against covering that > corner case, but TablespaceCreateDbspace() does cover it. By and large, LGTM, though perhaps it would be appropriate to also credit me as the reporter of the issue. I feel it might be slightly better to highlight somewhere, either in the commit message or in the comments, that removing the old directory is unsafe, because if wal_level=minimal, we may have no other copy of the data. For me that's the key point here. I feel that the commit message and comments inside the patch explain rather thoroughly the possible consequences of the bug and why this particular fix was chosen, but they're not real explicit about why there was a bug at all. Thanks very much for working on this. -- Robert Haas EDB: http://www.enterprisedb.com
Re: [BUG] wrong refresh when ALTER SUBSCRIPTION ADD/DROP PUBLICATION
On Mon, Aug 23, 2021 at 2:45 PM houzj.f...@fujitsu.com wrote: > > On Mon, Aug 23, 2021 12:59 PM Amit Kapila wrote: > > > > On Sat, Aug 7, 2021 at 6:53 PM houzj.f...@fujitsu.com > > wrote: > > > > > > Personally, I also think it will be better to make the behavior > > > consistent. > > > Attach the new version patch make both ADD and DROP behave the same as > > > SET PUBLICATION which refresh all the publications. > > > > > > > I think we can have tests in the separate test file (alter_sub_pub.pl) like > > you > > earlier had in one of the versions. Use some meaningful names for tables > > instead of temp1, temp2 as you had in the previous version. > > Otherwise, the code changes look good to me. > > Thanks for the comment. > Attach new version patch which did the following changes. > > * move the tests to a separate test file (024_alter_sub_pub.pl) > * adjust the document of copy_data option > * add tab-complete for copy_data option when ALTER DROP publication. > Thanks, the patch looks good to me. I have made some cosmetic changes in the attached version. It makes the test cases easier to understand. I am planning to push this tomorrow unless there are more comments or suggestions. -- With Regards, Amit Kapila. v6-0001-Fix-Alter-Subscription-Add-Drop-Publication-refre.patch
Re: replay of CREATE TABLESPACE eats data at wal_level=minimal
On Mon, Aug 23, 2021 at 4:29 AM Noah Misch wrote: > On Wed, Aug 18, 2021 at 10:32:10PM -0700, Noah Misch wrote: > > On Wed, Aug 18, 2021 at 10:47:24AM -0400, Robert Haas wrote: > > > On Tue, Aug 10, 2021 at 9:35 AM Robert Haas > wrote: > > > > Oh, yeah, I think that works, actually. I was imagining a few > problems > > > > here, but I don't think they really exist. The redo routines for > files > > > > within the directory can't possibly care about having the old files > > > > erased for them, since that wouldn't be something that would normally > > > > happen, if there were no recent CREATE TABLESPACE involved. And > > > > there's code further down to remove and recreate the symlink, just in > > > > case. So I think your proposed patch might be all we need. > > > > > > Noah, do you plan to commit this? > > > > Yes. I feel it needs a test case, which is the main reason I've queued > the > > task rather than just pushed what I posted last. > > Here's what I plan to push. Besides adding a test, I have reproduced the data-inconsistency issue with CREATE TABLESPACE at wal_level=minimal. I have also tested the fix with the v0 and v1 patches, and come up with a similar TAP test case (as in v1). The test case looks good. -- With Regards, Prabhat Kumar Sahu EnterpriseDB: http://www.enterprisedb.com
Re: [bug] Logical Decoding of relation rewrite with toast does not reset toast_hash
On Wed, Aug 18, 2021 at 8:09 PM Drouvot, Bertrand wrote: > > Hi, > > On 8/18/21 12:01 PM, Amit Kapila wrote: > > On Wed, Aug 18, 2021 at 1:27 PM Drouvot, Bertrand > > wrote: > >> Hi, > > I've updated the comment and prepared the back patch versions: > I have verified and all your patches look good to me. I'll push and backpatch this by Wednesday unless there are more comments. -- With Regards, Amit Kapila.
Re: Separate out FileSet from SharedFileSet (was Re: pgsql: pgstat: Bring up pgstat in BaseInit() to fix uninitialized use o)
On Mon, Aug 23, 2021 at 1:43 PM Amit Kapila wrote:

(Note: merging comments from multiple emails.)

> > I think we should handle that in worker.c itself, by adding a
> > before_dsm_detach function before_shmem_exit right?
>
> Yeah, I thought of handling it in worker.c similar to what you've in 0002
> patch.

I have done the handling in worker.c.

On Mon, Aug 23, 2021 at 9:11 AM houzj.f...@fujitsu.com wrote:
>
> On Sat, Aug 21, 2021 8:38 PM Dilip Kumar wrote:
>
> 1)
> + TempTablespacePath(tempdirpath, tablespace);
> + snprintf(path, MAXPGPATH, "%s/%s%lu.%u.sharedfileset",
> +          tempdirpath, PG_TEMP_FILE_PREFIX,
> +          (unsigned long) fileset->creator_pid, fileset->number);
>
> do we need to use different filename for shared and un-shared fileset?

I was also thinking about the same; does it make sense to name it just
"%s/%s%lu.%u.fileset"?

On Mon, Aug 23, 2021 at 1:08 PM Masahiko Sawada wrote:
> Here are some comments on 0001 patch:
>
> +/*
> + * Initialize a space for temporary files. This API can be used by shared
> + * fileset as well as if the temporary files are used only by single backend
> + * but the files need to be opened and closed multiple times and also the
> + * underlying files need to survive across transactions.
> + *
> + * Files will be distributed over the tablespaces configured in
> + * temp_tablespaces.
> + *
> + * Under the covers the set is one or more directories which will eventually
> + * be deleted.
> + */
>
> I think it's better to mention cleaning up here like we do in the
> comment of SharedFileSetInit().

Right, done.

> ---
> I think we need to clean up both stream_fileset and subxact_fileset on
> proc exit. The 0002 patch cleans up the fileset but I think we need to
> do that also in 0001 patch.

Done.

> ---
> There still are some comments using "shared fileset" in both buffile.c
> and worker.c.
Done -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com From ced3a5e0222c2f89a977add167615f09d99d107a Mon Sep 17 00:00:00 2001 From: Dilip Kumar Date: Wed, 18 Aug 2021 15:52:21 +0530 Subject: [PATCH v2 1/2] Sharedfileset refactoring Currently, sharedfileset.c is designed for a very specific purpose i.e. one backend could create the fileset and could be shared across multiple backend so the fileset should be created in DSM. But in some use cases, we need exact behavior as sharedfileset that the files created under this should be named files and should survive the transaction and should allow to have feature of open and close. In this patch we have refactored these files such that there will be two files a) fileset.c which will provide general-purpose interfaces b) sharedfileset.c, which will internally used fileset.c but this will be specific for DSM-based filesets. --- src/backend/replication/logical/worker.c | 89 +++ src/backend/storage/file/Makefile | 1 + src/backend/storage/file/buffile.c| 82 +- src/backend/storage/file/fd.c | 2 +- src/backend/storage/file/fileset.c| 204 + src/backend/storage/file/sharedfileset.c | 239 +- src/backend/utils/sort/logtape.c | 8 +- src/backend/utils/sort/sharedtuplestore.c | 5 +- src/include/storage/buffile.h | 12 +- src/include/storage/fileset.h | 40 + src/include/storage/sharedfileset.h | 14 +- 11 files changed, 366 insertions(+), 330 deletions(-) create mode 100644 src/backend/storage/file/fileset.c create mode 100644 src/include/storage/fileset.h diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c index ecaed15..d98ebab 100644 --- a/src/backend/replication/logical/worker.c +++ b/src/backend/replication/logical/worker.c @@ -39,13 +39,13 @@ * BufFile infrastructure supports temporary files that exceed the OS file size * limit, (b) provides a way for automatic clean up on the error and (c) provides * a way to survive these files across local transactions and allow to open 
and - * close at stream start and close. We decided to use SharedFileSet + * close at stream start and close. We decided to use FileSet * infrastructure as without that it deletes the files on the closure of the * file and if we decide to keep stream files open across the start/stop stream * then it will consume a lot of memory (more than 8K for each BufFile and * there could be multiple such BufFiles as the subscriber could receive * multiple start/stop streams for different transactions before getting the - * commit). Moreover, if we don't use SharedFileSet then we also need to invent + * commit). Moreover, if we don't use FileSet then we also need to invent * a new way to pass filenames to BufFile APIs so that we are allowed to open * the file we desired across multiple stream-open calls for the same * transaction. @@ -231,8 +231,8 @@ typedef struct
Re: Proposal: More structured logging
On Sat, Aug 21, 2021 at 2:37 AM Michael Paquier wrote:
>
> On Fri, Aug 20, 2021 at 11:35:29AM +0200, Ronan Dunklau wrote:
> > Michael, your jsonlog module already fulfills this need. Is it something
> > that should be merged into our tree ?
>
> Yes, there is nothing technically preventing us from having this stuff in
> core, of course, and that would even take care of the issues in
> detecting if the piping protocol should be used or not.
>
> Now, the last time this was proposed, there was a lot of drawback
> because the presence of the JSON keys increases significantly the size
> of the logs compared to CSV, and usually users care a lot about the
> size of the log files. I would support the presence of a JSON format
> for the logs in core, FWIW.

As long as it's optional, I don't think that drawback holds as an argument. The same argument could be made against the CSV logs in the first place -- they add information to every row that a lot of people don't need. But it's optional. Leaving it up to the administrator to choose whether they prefer the more verbose but easier-to-parse format or the more compact one seems like the right path.

I bet quite a few would actually choose JSON -- it's easier to integrate with other systems (because newer systems love JSON), and unless you're actually logging a lot of queries (which many people don't), the size of the logfile is often not a concern at all.

In short, I would also support the presence of a JSON log format in core. (But as a proper log_destination, of course -- or, if it's time to actually split that into a separate thing, one parameter for log_destination and another for log_format.)

-- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: Defer selection of asynchronous subplans until the executor initialization stage
On Wed, Jun 30, 2021 at 1:50 PM Andrey Lepikhov wrote: > I have completely rewritten this patch. > > Main idea: > > The async_capable field of a plan node inform us that this node could > work in async mode. Each node sets this field based on its own logic. > The actual mode of a node is defined by the async_capable of PlanState > structure. It is made at the executor initialization stage. > In this patch, only an append node could define async behaviour for its > subplans. I finally reviewed the patch. One thing I noticed about the patch is that it would break ordered Appends. Here is such an example using the patch: create table pt (a int) partition by range (a); create table loct1 (a int); create table loct2 (a int); create foreign table p1 partition of pt for values from (10) to (20) server loopback1 options (table_name 'loct1'); create foreign table p2 partition of pt for values from (20) to (30) server loopback2 options (table_name 'loct2'); explain verbose select * from pt order by a; QUERY PLAN - Append (cost=200.00..440.45 rows=5850 width=4) -> Async Foreign Scan on public.p1 pt_1 (cost=100.00..205.60 rows=2925 width=4) Output: pt_1.a Remote SQL: SELECT a FROM public.loct1 ORDER BY a ASC NULLS LAST -> Async Foreign Scan on public.p2 pt_2 (cost=100.00..205.60 rows=2925 width=4) Output: pt_2.a Remote SQL: SELECT a FROM public.loct2 ORDER BY a ASC NULLS LAST (7 rows) This would not always provide tuples in the required order, as async execution would provide them from the subplans rather randomly. I think it would not only be too late but be not efficient to do the planning work at execution time (consider executing generic plans!), so I think we should avoid doing so. (The cost of doing that work for simple foreign scans is small, but if we support async execution for upper plan nodes such as NestLoop as discussed before, I think the cost for such plan nodes would not be small anymore.) 
To just execute what was planned at execution time, I think we should return to the patch in [1]. The patch was created for Horiguchi-san’s async-execution patch, so I modified it to work with HEAD, and added a simplified version of your test cases. Please find attached a patch. Best regards, Etsuro Fujita [1] https://www.postgresql.org/message-id/7fe10f95-ac6c-c81d-a9d3-227493eb9...@postgrespro.ru allow-async-in-more-cases.patch Description: Binary data
RE: [BUG] wrong refresh when ALTER SUBSCRIPTION ADD/DROP PUBLICATION
On Mon, Aug 23, 2021 12:59 PM Amit Kapila wrote:
>
> On Sat, Aug 7, 2021 at 6:53 PM houzj.f...@fujitsu.com wrote:
> >
> > Personally, I also think it will be better to make the behavior consistent.
> > Attach the new version patch make both ADD and DROP behave the same as
> > SET PUBLICATION which refresh all the publications.
> >
>
> I think we can have tests in the separate test file (alter_sub_pub.pl) like you
> earlier had in one of the versions. Use some meaningful names for tables
> instead of temp1, temp2 as you had in the previous version.
> Otherwise, the code changes look good to me.

Thanks for the comments. Attached is a new version patch with the following changes:

* move the tests to a separate test file (024_alter_sub_pub.pl)
* adjust the documentation of the copy_data option
* add tab-completion for the copy_data option in ALTER ... DROP PUBLICATION

Best regards, Hou zj

v5-0001-Fix-Alter-Subscription-Add-Drop-Publication-refresh-.patch Description: v5-0001-Fix-Alter-Subscription-Add-Drop-Publication-refresh-.patch
RE: [BUG] wrong refresh when ALTER SUBSCRIPTION ADD/DROP PUBLICATION
On Mon, Aug 23, 2021 1:18 PM Masahiko Sawada wrote: > On Mon, Aug 23, 2021 at 1:59 PM Amit Kapila > wrote: > > > > On Sat, Aug 7, 2021 at 6:53 PM houzj.f...@fujitsu.com > > wrote: > > > > > > Personally, I also think it will be better to make the behavior > > > consistent. > > > Attach the new version patch make both ADD and DROP behave the same > > > as SET PUBLICATION which refresh all the publications. > > > > > > > I think we can have tests in the separate test file (alter_sub_pub.pl) > > like you earlier had in one of the versions. Use some meaningful names > > for tables instead of temp1, temp2 as you had in the previous version. > > Otherwise, the code changes look good to me. > > - supported_opts = SUBOPT_REFRESH; > - if (isadd) > - supported_opts |= SUBOPT_COPY_DATA; > + supported_opts = SUBOPT_REFRESH | SUBOPT_COPY_DATA; > > I think that the currently the doc says copy_data option can be specified > except > in DROP PUBLICATION case, which needs to be fixed corresponding the above > change. Thanks for the comment. Fixed in the new version patch. Best regards, Hou zj
Re: pg_dump handling of ALTER DEFAULT PRIVILEGES IN SCHEMA
The following review has been posted through the commitfest application:

make installcheck-world: tested, passed
Implements feature:      tested, passed
Spec compliant:          tested, passed
Documentation:           not tested

The patch looks fine and successfully fixes the missing-privileges issue.

The new status of this patch is: Ready for Committer
Queries that should be canceled will get stuck on secure_write function
Hi, all

Recently I hit a problem where the startup process of a standby gets stuck and stays in a waiting state. The backtrace of the startup process shows that it is waiting for a backend process that conflicts with recovery to exit. The GUC parameters max_standby_streaming_delay and max_standby_archive_delay are both set to 30 seconds, but they don't take effect: the backend process stays alive, and its backtrace shows that it is waiting for the socket to become writeable in secure_write().

After further reading of the code, I found that ProcessClientWriteInterrupt() in secure_write() only processes interrupts when ProcDiePending is true, and otherwise does nothing. However, a snapshot conflict with recovery only sets QueryCancelPending to true, so the response to the signal can be delayed indefinitely if the corresponding client is stuck, thus blocking the recovery process.

I want to know why the interrupt is only handled when ProcDiePending is true; I think a query that is supposed to be canceled should also respond to the signal.

I also want to share a patch with you. It adds a GUC parameter max_standby_client_write_delay: if a query is supposed to be canceled and the time it has been delayed by the client exceeds max_standby_client_write_delay, ProcDiePending is set to true to avoid being delayed indefinitely. What do you think of it? Hope to get your reply.

Thanks & Best Regards

0001-Set-max-client-write-delay-timeout-for-query-which-i.patch Description: Binary data
Re: Separate out FileSet from SharedFileSet (was Re: pgsql: pgstat: Bring up pgstat in BaseInit() to fix uninitialized use o)
On Mon, Aug 23, 2021 at 12:52 PM Dilip Kumar wrote: > > On Mon, Aug 23, 2021 at 11:43 AM Amit Kapila wrote: > > > > On Mon, Aug 23, 2021 at 9:53 AM Dilip Kumar wrote: > > > > > > On Mon, Aug 23, 2021 at 9:11 AM houzj.f...@fujitsu.com > > > wrote: > > > > > > > 4) > > > > -extern File SharedFileSetCreate(SharedFileSet *fileset, const char > > > > *name); > > > > -extern File SharedFileSetOpen(SharedFileSet *fileset, const char *name, > > > > - int mode); > > > > -extern bool SharedFileSetDelete(SharedFileSet *fileset, const char > > > > *name, > > > > - bool > > > > error_on_failure); > > > > extern void SharedFileSetDeleteAll(SharedFileSet *fileset); > > > > -extern void SharedFileSetUnregister(SharedFileSet *input_fileset); > > > > > > > > I noticed the patch delete part of public api, is it better to keep the > > > > old api and > > > > let them invoke new api internally ? Having said that, I didn’t find > > > > any open source > > > > extension use these old api, so it might be fine to delete them. > > > > > > Right, those were internally used by buffile.c but now we have changed > > > buffile.c to directly use the fileset interfaces, so we better remove > > > them. > > > > > > > I also don't see any reason to keep those exposed from > > sharedfileset.h. I see that even in the original commit dc6c4c9dc2, > > these APIs seem to be introduced to be used by buffile. Andres/Thomas, > > do let us know if you think otherwise? > > > > One more comment: > > I think v1-0001-Sharedfileset-refactoring doesn't have a way for > > cleaning up worker.c temporary files on error/exit. It is better to > > have that to make it an independent patch. > > I think we should handle that in worker.c itself, by adding a > before_dsm_detach function before_shmem_exit right? > Yeah, I thought of handling it in worker.c similar to what you've in 0002 patch. -- With Regards, Amit Kapila.
Re: pgsql: pgstat: Bring up pgstat in BaseInit() to fix uninitialized use o
On Sat, Aug 21, 2021 at 9:38 PM Dilip Kumar wrote:
>
> On Wed, Aug 18, 2021 at 3:45 PM Dilip Kumar wrote:
> >
> > On Wed, Aug 18, 2021 at 11:24 AM Amit Kapila wrote:
> > >
> > > On Tue, Aug 17, 2021 at 4:34 PM Andres Freund wrote:
> > > >
> > > > On 2021-08-17 10:54:30 +0530, Amit Kapila wrote:
> > > > > 5. How can we provide a strict mechanism to not allow to use dsm APIs
> > > > > for non-dsm FileSet? One idea could be that we can have a variable
> > > > > (probably bool) in SharedFileSet structure which will be initialized
> > > > > in SharedFileSetInit based on whether the caller has provided dsm
> > > > > segment. Then in other DSM-based APIs, we can check if it is used for
> > > > > the wrong type.
> > > >
> > > > Well, isn't the issue here that it's not a shared file set in case you
> > > > explicitly don't want to share it? ISTM that the proper way to address
> > > > this would be to split out a FileSet from SharedFileSet that's then used
> > > > for worker.c and sharedfileset.c.
> > > >
> > >
> > > Okay, but note that to accomplish the same, we need to tweak the
> > > BufFile (buffile.c) APIs as well so that they can work with FileSet.
> > > As per the initial analysis, there doesn't seem to be any problem with
> > > that though.
> >
> > I was looking into this, so if we want to do that I think the outline
> > will look like this:
> >
> > - There will be fileset.c and fileset.h files, and we will expose a
> > new structure FileSet, which will be the same as SharedFileSet, except
> > for the mutex and refcount. The fileset.c file will expose FileSetInit(),
> > FileSetCreate(), FileSetOpen(), FileSetDelete() and FileSetDeleteAll()
> > interfaces.
> >
> > - sharedfileset.c will internally call the fileset.c interfaces. The
> > SharedFileSet structure will also contain FileSet and the other members,
> > i.e. mutex and refcount.
> >
> > - the buffile.c's interfaces which are ending with Shared e.g.
> > BufFileCreateShared, BufFileOpenShared, should be converted to > > BufFileCreate and > > BufFileOpen respectively. And the input to these interfaces can be > > converted to FileSet instead of SharedFileSet. > > Here is the first draft based on the idea we discussed, 0001, splits > sharedfileset.c in sharedfileset.c and fileset.c and 0002 is same > patch I submitted earlier(use single fileset throughout the worker), > just it is rebased on top of 0001. Please let me know your thoughts. Here are some comments on 0001 patch: +/* + * Initialize a space for temporary files. This API can be used by shared + * fileset as well as if the temporary files are used only by single backend + * but the files need to be opened and closed multiple times and also the + * underlying files need to survive across transactions. + * + * Files will be distributed over the tablespaces configured in + * temp_tablespaces. + * + * Under the covers the set is one or more directories which will eventually + * be deleted. + */ I think it's better to mention cleaning up here like we do in the comment of SharedFileSetInit(). --- I think we need to clean up both stream_fileset and subxact_fileset on proc exit. In 0002 patch cleans up the fileset but I think we need to do that also in 0001 patch. --- There still are some comments using "shared fileset" in both buffile.c and worker.c. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: Minor pg_amcheck fixes spotted while reading code
> On 21 Aug 2021, at 02:43, Michael Paquier wrote: > > On Fri, Aug 20, 2021 at 11:42:09AM -0700, Mark Dilger wrote: >> These look correct. > > static void > -help(const char *progname) > +help(const char *program_name) > These were discussed not long ago, and I recall that they were in the > we-don't-care category. Note for example all the tools of > src/scripts/ and pg_dump/. Fair enough, I had missed that thread. -- Daniel Gustafsson https://vmware.com/
Re: Separate out FileSet from SharedFileSet (was Re: pgsql: pgstat: Bring up pgstat in BaseInit() to fix uninitialized use o)
On Mon, Aug 23, 2021 at 11:43 AM Amit Kapila wrote: > > On Mon, Aug 23, 2021 at 9:53 AM Dilip Kumar wrote: > > > > On Mon, Aug 23, 2021 at 9:11 AM houzj.f...@fujitsu.com > > wrote: > > > > > 4) > > > -extern File SharedFileSetCreate(SharedFileSet *fileset, const char > > > *name); > > > -extern File SharedFileSetOpen(SharedFileSet *fileset, const char *name, > > > - int mode); > > > -extern bool SharedFileSetDelete(SharedFileSet *fileset, const char *name, > > > - bool > > > error_on_failure); > > > extern void SharedFileSetDeleteAll(SharedFileSet *fileset); > > > -extern void SharedFileSetUnregister(SharedFileSet *input_fileset); > > > > > > I noticed the patch delete part of public api, is it better to keep the > > > old api and > > > let them invoke new api internally ? Having said that, I didn’t find any > > > open source > > > extension use these old api, so it might be fine to delete them. > > > > Right, those were internally used by buffile.c but now we have changed > > buffile.c to directly use the fileset interfaces, so we better remove > > them. > > > > I also don't see any reason to keep those exposed from > sharedfileset.h. I see that even in the original commit dc6c4c9dc2, > these APIs seem to be introduced to be used by buffile. Andres/Thomas, > do let us know if you think otherwise? > > One more comment: > I think v1-0001-Sharedfileset-refactoring doesn't have a way for > cleaning up worker.c temporary files on error/exit. It is better to > have that to make it an independent patch. I think we should handle that in worker.c itself, by adding a before_dsm_detach function before_shmem_exit right? Or you are thinking that in FileSetInit, we keep the mechanism of filesetlist like we were doing in SharedFileSetInit? I think that will just add unnecessary complexity in the first patch which will eventually go away in the second patch. 
And if we do that, then SharedFileSetInit() cannot directly use FileSetInit(); otherwise the DSM-based fileset would also get registered for cleanup in filesetlist. To avoid that we would need to pass FileSetInit() a parameter saying whether to register for cleanup or not, which again would not look clean, because we would be reintroducing the conditional cleanup -- IMHO the very problem we are trying to clean up in SharedFileSetInit() by introducing a new FileSetInit().

I think what we can do is introduce a new function FileSetInitInternal() that does what FileSetInit() does today, and have both SharedFileSetInit() and FileSetInit() call it; along with that, SharedFileSetInit() would register the DSM-based cleanup and FileSetInit() would do the filesetlist-based cleanup. But IMHO we should go in this direction only if we are sure that we want to commit the first patch and not the second.

-- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: pg_veryfybackup can fail with a valid backup for TLI > 1
At Mon, 23 Aug 2021 15:46:37 +0900, Michael Paquier wrote in
> On Fri, Aug 20, 2021 at 04:47:15PM +0900, Kyotaro Horiguchi wrote:
> > Yes, backup_label looks correct.
> >
> > backup_label (extract):
> > START WAL LOCATION: 0/528 (file 00020005)
> > CHECKPOINT LOCATION: 0/560
> > START TIMELINE: 2
>
> Okay. I have worked on that today, did more manual tests, and applied
> this fix down to 13. The cherry-pick from 14 to 13 only required a
> s/$master/$primary/ in the test, which was straight-forward. Your
> patch for 13 did that though:
> - if (starttli != entry->tli)
> + if (!XLogRecPtrIsInvalid(entry->begin))
>
> So it would have caused a failure with parent TLIs that have a correct
> begin location, but we expect the opposite. The patch for 14/HEAD had
> that right.

Mmm. Sorry for the silly mistake, and thanks for committing this!

regards.

-- Kyotaro Horiguchi NTT Open Source Software Center
Re: Proposal: More structured logging
Le vendredi 20 août 2021, 11:31:21 CEST Ronan Dunklau a écrit : > Le jeudi 19 août 2021, 15:04:30 CEST Alvaro Herrera a écrit : > > On 2021-Aug-13, Ronan Dunklau wrote: > > > ereport(NOTICE, > > > > > > (errmsg("My log message")), > > > (errtag("EMITTER", "MYEXTENSION")), > > > (errtag("MSG-ID", "%d", error_message_id)) > > > > > > ); > > > > Interesting idea. I agree this would be useful. > > > > > Please find attached a very small POC patch to better demonstrate what I > > > propose. > > > > Seems like a good start. I think a further step towards a committable > > patch would include a test module that uses the new tagging > > functionality. > > Please find attached the original patch + a new one adding a test module. > The test module exposes a function for generating logs with tags, and a log > hook which format the tags in the DETAIL field for easy regression testing. > > Next step would be to emit those tags in the CSV logs. I'm not sure what > representation they should have though: a nested csv in it's own column ? A > key => value pairs list similar to hstore ? A json object ? I opted for a JSON representation in the CSV logs, please find attached v3 which is the same thing as v2 with another patch for CSV log output. > > Also we should probably make this available to the client too, but once > again the format of the tag field needs to be defined. Any opinion ? 
-- 
Ronan Dunklau

>From e5af5d1a07e65926eee90e0d18443a673d1fcba8 Mon Sep 17 00:00:00 2001
From: Ronan Dunklau
Date: Fri, 13 Aug 2021 15:03:18 +0200
Subject: [PATCH v3 1/3] Add ErrorTag support

---
 src/backend/utils/error/elog.c | 48 ++++++++++++++++++
 src/include/utils/elog.h       | 10 +++
 2 files changed, 58 insertions(+)

diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index a3e1c59a82..5b9b1b8a72 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -465,6 +465,7 @@ errstart(int elevel, const char *domain)
 	edata->sqlerrcode = ERRCODE_SUCCESSFUL_COMPLETION;
 	/* errno is saved here so that error parameter eval can't change it */
 	edata->saved_errno = errno;
+	edata->tags = NIL;
 
 	/*
 	 * Any allocations for this error state level should go into ErrorContext
@@ -516,6 +517,7 @@ errfinish(const char *filename, int lineno, const char *funcname)
 	int			elevel;
 	MemoryContext oldcontext;
 	ErrorContextCallback *econtext;
+	ListCell   *lc;
 
 	recursion_depth++;
 	CHECK_STACK_DEPTH();
@@ -621,7 +623,18 @@ errfinish(const char *filename, int lineno, const char *funcname)
 		pfree(edata->constraint_name);
 	if (edata->internalquery)
 		pfree(edata->internalquery);
+	/* Every tag should have been palloc'ed */
+	if (edata->tags != NIL)
+	{
+		foreach(lc, edata->tags)
+		{
+			ErrorTag   *tag = (ErrorTag *) lfirst(lc);
 
+			pfree(tag->tagvalue);
+			pfree(tag);
+		}
+		pfree(edata->tags);
+	}
 	errordata_stack_depth--;
 
 	/* Exit error-handling context */
@@ -1192,6 +1205,41 @@ errhint_plural(const char *fmt_singular, const char *fmt_plural,
 	return 0;					/* return value does not matter */
 }
 
+int
+errtag(const char *tag, const char *fmt_value,...)
+{
+	ErrorData  *edata = &errordata[errordata_stack_depth];
+	ErrorTag   *etag;
+	MemoryContext oldcontext;
+	StringInfoData buf;
+
+	recursion_depth++;
+	CHECK_STACK_DEPTH();
+	oldcontext = MemoryContextSwitchTo(edata->assoc_context);
+	etag = palloc(sizeof(ErrorTag));
+	etag->tagname = tag;
+	initStringInfo(&buf);
+	for (;;)
+	{
+		va_list		args;
+		int			needed;
+
+		errno = edata->saved_errno;
+		va_start(args, fmt_value);
+		needed = appendStringInfoVA(&buf, fmt_value, args);
+		va_end(args);
+		if (needed == 0)
+			break;
+		enlargeStringInfo(&buf, needed);
+	}
+	etag->tagvalue = pstrdup(buf.data);
+	edata->tags = lappend(edata->tags, etag);
+	pfree(buf.data);
+	MemoryContextSwitchTo(oldcontext);
+	recursion_depth--;
+	return 0;
+}
+
 /*
  * errcontext_msg --- add a context error message text to the current error
diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h
index f53607e12e..1c490d1b11 100644
--- a/src/include/utils/elog.h
+++ b/src/include/utils/elog.h
@@ -15,6 +15,7 @@
 #define ELOG_H
 
 #include <setjmp.h>
+#include "nodes/pg_list.h"
 
 /* Error level codes */
 #define DEBUG5		10			/* Debugging messages, in categories of
@@ -193,6 +194,8 @@ extern int	errhint(const char *fmt,...) pg_attribute_printf(1, 2);
 extern int	errhint_plural(const char *fmt_singular, const char *fmt_plural,
						   unsigned long n,...) pg_attribute_printf(1, 4) pg_attribute_printf(2, 4);
 
+extern int	errtag(const char *tag, const char *fmt_value,...) pg_attribute_printf(2, 3);
+
 /*
  * errcontext() is typically called in error context callback functions, not
  * within an ereport() invocation. The callback function can be in a different
@@ -395,11 +398,18 @@ typedef struct ErrorData
 	int			internalpos;	/* cursor index into internalquery */
 	char	   *internalquery;	/* text of internally-generated query */
Re: Mark all GUC variable as PGDLLIMPORT
On Sun, Aug 22, 2021 at 09:29:01PM +0800, Julien Rouhaud wrote: > On Sun, Aug 22, 2021 at 09:19:42AM -0400, Tom Lane wrote: > > > > Uh, no, it's exactly *not* clear. There are a lot of GUCs that are only > > of interest to particular subsystems. I do not see why being a GUC makes > > something automatically more interesting than any other global variable. > > Usually, the fact that one is global is only so the GUC machinery itself > > can get at it, otherwise it'd be static in the owning module. > > > > As for "extensions should be able to get at the values", the GUC machinery > > already provides uniform mechanisms for doing that safely. Direct access > > to the variable's internal value would be unsafe in many cases. > > Then shouldn't we try to prevent direct access on all platforms rather than > only one? So since the non currently explicitly exported GUC global variables shouldn't be accessible by third-party code, I'm attaching a POC patch that does the opposite of v1: enforce that restriction using a new pg_attribute_hidden() macro, defined with GCC only, to start discussing that topic. It would probably be better to have some other macro (e.g. PG_GLOBAL_PUBLIC and PG_GLOBAL_PRIVATE or similar) to make declarations more consistent, but given the amount of changes it would represent I prefer to have some feedback before spending time on that. >From b43c48a80f985e879a88d9d270fc1e2b283b5732 Mon Sep 17 00:00:00 2001 From: Julien Rouhaud Date: Sun, 22 Aug 2021 15:40:47 +0800 Subject: [PATCH v2] Make all GUC ariables non previously marked as PGDLLIMPORT private. 
--- src/backend/utils/misc/guc.c | 18 +++--- src/include/access/gin.h | 2 +- src/include/access/tableam.h | 4 +- src/include/access/toast_compression.h| 2 +- src/include/access/xact.h | 12 ++-- src/include/access/xlog.h | 78 +++ src/include/c.h | 7 ++ src/include/catalog/namespace.h | 2 +- src/include/catalog/storage.h | 2 +- src/include/commands/async.h | 2 +- src/include/commands/user.h | 2 +- src/include/commands/vacuum.h | 12 ++-- src/include/fmgr.h| 2 +- src/include/jit/jit.h | 20 +++--- src/include/libpq/auth.h | 4 +- src/include/libpq/libpq.h | 24 +++ src/include/libpq/pqcomm.h| 2 +- src/include/miscadmin.h | 22 +++ src/include/optimizer/geqo.h | 10 +-- src/include/optimizer/optimizer.h | 4 +- src/include/optimizer/planmain.h | 6 +- src/include/parser/parse_expr.h | 2 +- src/include/parser/parser.h | 4 +- src/include/pgstat.h | 6 +- src/include/pgtime.h | 2 +- src/include/postmaster/autovacuum.h | 30 - src/include/postmaster/bgwriter.h | 8 +-- src/include/postmaster/postmaster.h | 28 src/include/postmaster/syslogger.h| 10 +-- src/include/postmaster/walwriter.h| 4 +- src/include/replication/logicallauncher.h | 4 +- src/include/replication/syncrep.h | 2 +- src/include/replication/walreceiver.h | 6 +- src/include/replication/walsender.h | 6 +- src/include/storage/bufmgr.h | 20 +++--- src/include/storage/dsm_impl.h| 4 +- src/include/storage/fd.h | 2 +- src/include/storage/large_object.h| 2 +- src/include/storage/lock.h| 12 ++-- src/include/storage/lwlock.h | 2 +- src/include/storage/pg_shmem.h| 6 +- src/include/storage/predicate.h | 6 +- src/include/storage/proc.h| 2 +- src/include/storage/standby.h | 8 +-- src/include/tcop/tcopprot.h | 6 +- src/include/tsearch/ts_cache.h| 2 +- src/include/utils/array.h | 2 +- src/include/utils/builtins.h | 2 +- src/include/utils/bytea.h | 2 +- src/include/utils/elog.h | 10 +-- src/include/utils/guc.h | 62 +- src/include/utils/pg_locale.h | 8 +-- src/include/utils/plancache.h | 2 +- src/include/utils/ps_status.h | 2 +- 
src/include/utils/queryjumble.h | 2 +- src/include/utils/rls.h | 2 +- src/include/utils/xml.h | 4 +- 57 files changed, 263 insertions(+), 256 deletions(-) diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c index a2e0f8de7e..ca2d6042cd 100644 --- a/src/backend/utils/misc/guc.c +++ b/src/backend/utils/misc/guc.c @@ -129,20 +129,20 @@ #define REALTYPE_PRECISION 17 /* XXX these should appear in other modules' header files */ -extern bool Log_disconnections; -extern int
Re: pg_veryfybackup can fail with a valid backup for TLI > 1
On Fri, Aug 20, 2021 at 04:47:15PM +0900, Kyotaro Horiguchi wrote: > Yes, backup_label looks correct. > > backup_label (extract): > START WAL LOCATION: 0/528 (file 00020005) > CHECKPOINT LOCATION: 0/560 > START TIMELINE: 2 Okay. I have worked on that today, did more manual tests, and applied this fix down to 13. The cherry-pick from 14 to 13 only required a s/$master/$primary/ in the test, which was straight-forward. Your patch for 13 did that though: - if (starttli != entry->tli) + if (!XLogRecPtrIsInvalid(entry->begin)) So it would have caused a failure with parent TLIs that have a correct begin location, but we expect the opposite. The patch for 14/HEAD had that right. -- Michael signature.asc Description: PGP signature
Re: Some leftovers of recent message cleanup?
At Fri, 20 Aug 2021 19:36:02 +0900, Fujii Masao wrote in > > > On 2021/08/20 11:53, Kyotaro Horiguchi wrote: > > At Thu, 19 Aug 2021 20:29:42 +0900, Fujii Masao > > wrote in > >> On 2021/08/19 17:03, Kyotaro Horiguchi wrote: > >>> Hello. > >>> While I was examining message translation for PG14, I found some > >>> messages that would need to be fixed. > >>> 0001 is a fix for perhaps-leftovers of the recent message cleanups > >>> related to "positive integer"(fd90f6ba7a). > >> > >> There are still other many messages using "positive" and "negative" > >> keywords. > >> We should also fix them at all? > > I'm not sure, or no if anything. My main point here is not to avoid > > use of such kind of words, but reducing variations of the effectively > > the same message from the view of translator burden. The two messages > > in 0001 are in that category. I noticed those messages accidentally. I > > don't think they are the only instance of such divergence, but I'm not > > going to do a comprehensive examination of such divergences.. (Or do I > > need to check that more comprehensively?) > > > Understood. > > - errmsg("modulus for hash partition > must be a positive > - integer"))); > + errmsg("modulus for hash partition must be an integer greater than > zero"))); > > "an integer greater" should be "an integer value greater" because > we use that words in other similar log messages like "modulus for > hash partition must be an integer value greater than zero" > in src/backend/partitioning/partbounds.c? Ugh... Of course. I thought I did that way but the actual file is incorrect... Fixed that in the attached. > I'm thinking to back-patch this to v11 where hash partitioning > was supported. On the other hand, the following change should If I'm not missing something, back to 13, where the "word change" took place? For 11 and 12, we need to change the *both* messages. 
v11, 12:
./src/backend/parser/parse_utilcmd.c
Binary file ./src/backend/parser/gram.o matches
./src/backend/partitioning/partbounds.c

src/interfaces/ecpg/preproc/ecpg.header:
@@ -596,7 +596,7 @@ check_declared_list(const char *name)
 {
 	if (connection)
 		if (connection && strcmp(ptr->connection, connection) != 0)
-			mmerror(PARSE_ERROR, ET_WARNING, "connection %s is overwritten with %s by declare statement %s.", connection, ptr->connection, name);
+			mmerror(PARSE_ERROR, ET_WARNING, "connection %s is overwritten with %s by DECLARE statement %s", connection, ptr->connection, name);
 		connection = mm_strdup(ptr -> connection);
 	return true;
 }
-- 
2.27.0