Re: Introduce XID age and inactive timeout based replication slot invalidation
On Sat, Mar 23, 2024 at 3:02 AM Bharath Rupireddy wrote: > > On Fri, Mar 22, 2024 at 3:15 PM Bertrand Drouvot > wrote: > > > > > > Worth to add some tests too (or we postpone them in future commits because > > we're > > confident enough they will follow soon)? > > Yes. Added some tests in a new TAP test file named > src/test/recovery/t/043_replslot_misc.pl. This new file can be used to > add miscellaneous replication tests in future as well. I couldn't find > a better place in existing test files - tried having the new tests for > physical slots in t/001_stream_rep.pl and I didn't find a right place > for logical slots. > How about adding the test in 019_replslot_limit? It is not a direct fit but I feel later we can even add 'invalid_timeout' related tests in this file which will use last_inactive_time feature. It is also possible that some of the tests added by the 'invalid_timeout' feature will obviate the need for some of these tests. Review of v15 == 1. @@ -1026,7 +1026,8 @@ CREATE VIEW pg_replication_slots AS L.conflicting, L.invalidation_reason, L.failover, -L.synced +L.synced, +L.last_inactive_time FROM pg_get_replication_slots() AS L As mentioned previously, let's keep these new fields before conflicting and after two_phase. 2. +# Get last_inactive_time value after slot's creation. Note that the slot is still +# inactive unless it's used by the standby below. +my $last_inactive_time_1 = $primary->safe_psql('postgres', + qq(SELECT last_inactive_time FROM pg_replication_slots WHERE slot_name = '$sb_slot' AND last_inactive_time IS NOT NULL;) +); We should check $last_inactive_time_1 to be a valid value and add a similar check for logical slots. 3. BTW, why don't we set last_inactive_time for temporary slots (RS_TEMPORARY) as well? Don't we even invalidate temporary slots? If so, then I think we should set last_inactive_time for those as well and later allow them to be invalidated based on timeout parameter. -- With Regards, Amit Kapila.
Re: Introduce XID age and inactive timeout based replication slot invalidation
On Fri, Mar 22, 2024 at 7:17 PM Bertrand Drouvot wrote: > > On Fri, Mar 22, 2024 at 06:02:11PM +0530, Amit Kapila wrote: > > On Fri, Mar 22, 2024 at 5:30 PM Bertrand Drouvot > > wrote: > > > On Fri, Mar 22, 2024 at 03:56:23PM +0530, Amit Kapila wrote: > > > > > > > > > > That would avoid testing twice "slot->data.persistency == > > > > > RS_PERSISTENT". > > > > > > > > > > > > > That sounds like a good idea. Also, don't we need to consider physical > > > > slots where we don't reserve WAL during slot creation? I don't think > > > > there is a need to set inactive_at for such slots. > > > > > > If the slot is not active, why shouldn't we set inactive_at? I can > > > understand > > > that such slots do not present "any risks" but I think we should still > > > set > > > inactive_at (also to not give the false impression that the slot is > > > active). > > > > > > > But OTOH, there is a chance that we will invalidate such slots even > > though they have never reserved WAL in the first place, which doesn't > > appear to be a good thing. > > That's right, but I don't think it is a bad thing. I think we should treat > inactive_at as an independent field (as if the timeout one did not exist at > all) and just focus on its meaning (slot being inactive). If one sets a > timeout (> 0) and gets an invalidation then I think it works as designed (even if the > slot does not present any "risk" as it does not hold any rows or WAL). > Fair point. -- With Regards, Amit Kapila.
Re: Large block sizes support in Linux
On Fri, Mar 22, 2024 at 10:56 PM Pankaj Raghav (Samsung) wrote: > My team and I have been working on adding Large block size(LBS) > support to XFS in Linux[1]. Once this feature lands upstream, we will be > able to create XFS with FS block size > page size of the system on Linux. > We also gave a talk about it in Linux Plumbers conference recently[2] > for more context. The initial support is only for XFS but more FSs will > follow later. Very cool! (I used XFS on IRIX in the 90s, and it had large blocks then, a feature lost in the port to Linux AFAIK.) > On an x86_64 system, fs block size was limited to 4k, but traditionally > Postgres uses 8k as its default internal page size. With LBS support, > fs block size can be set to 8K, thereby matching the Postgres page size. > > If the file system block size == DB page size, then Postgres can have > guarantees that a single DB page will be written as a single unit during > kernel write back and not split. > > My knowledge of Postgres internals is limited, so I'm wondering if there > are any optimizations or potential optimizations that Postgres could > leverage once we have LBS support on Linux? FWIW here are a couple of things I wrote about our storage atomicity problem, for non-PostgreSQL hackers who may not understand our project jargon: https://wiki.postgresql.org/wiki/Full_page_writes https://freebsdfoundation.org/wp-content/uploads/2023/02/munro_ZFS.pdf The short version is that we (and MySQL, via a different scheme with different tradeoffs) could avoid writing all our stuff out twice if we could count on atomic writes of a suitable size on power failure, so the benefits are very large. As far as I know, there are two things we need from the kernel and storage to do that on "overwrite" filesystems like XFS: 1. The disk must promise that its atomicity-on-power-failure is a multiple of our block size -- something like NVMe AWUPF, right? My devices seem to say 0 :-( Or I guess the filesystem has to compensate, but then it's not exactly an overwrite filesystem anymore... 2. The kernel must promise that there is no code path in either buffered I/O or direct I/O that will arbitrarily chop up our 8KB (or other configured block size) writes on some smaller boundary, most likely sector I guess, on their way to the device, as you were saying. Not just in happy cases, but even under memory pressure, if interrupted, etc etc. Sounds like you're working on problem #2 which is great news. I've been wondering for a while how a Unixoid kernel should report these properties to userspace where it knows them, especially on non-overwrite filesystems like ZFS where this sort of thing works already, without stuff like AWUPF working the way one might hope. Here was one throw-away idea on the back of a napkin about that, for what little it's worth: https://wiki.postgresql.org/wiki/FreeBSD/AtomicIO
Re: SQL:2011 application time
On Fri, Mar 22, 2024 at 11:49 PM Peter Eisentraut wrote: > > On 22.03.24 01:35, Paul Jungwirth wrote: > > > 1. In ri_triggers.c ri_KeysEqual, you swap the order of arguments to > > ri_AttributesEqual(): > > > > > > - if (!ri_AttributesEqual(riinfo->ff_eq_oprs[i], > > RIAttType(rel, attnums[i]), > > > - oldvalue, newvalue)) > > > + if (!ri_AttributesEqual(eq_opr, RIAttType(rel, attnums[i]), > > > + newvalue, oldvalue)) > > > > > > But the declared arguments of ri_AttributesEqual() are oldvalue and > > newvalue, so passing them > > > backwards is really confusing. And the change does matter in the tests. > > > > > > Can we organize this better? > > > > I renamed the params and actually the whole function. All it's doing is > > execute `oldvalue op newvalue`, casting if necessary. So I changed it to > > ri_CompareWithCast and added some documentation. In an earlier version > > of this patch I had a separate function for the PERIOD comparison, but > > it's just doing the same thing, so I think the best thing is to give the > > function a more accurate name and use it. > > Ok, I see now, and the new explanation is better. > > But after reading the comment in the function about collations, I think > there could be trouble. As long as we are only comparing for equality > (and we don't support nondeterministic global collations), then we can > use any collation to compare for equality. But if we are doing > contained-by, then the collation does matter, so we would need to get > the actual collation somehow. So as written, this might not always work > correctly. > > I think it would be safer for now if we just kept using the equality > operation even for temporal foreign keys. If we did that, then in the > case that you update a key to a new value that is contained by the old > value, this function would say "not equal" and fire all the checks, even > though it wouldn't need to. This is kind of similar to the "false > negatives" that the comment already talks about. > > What do you think? > We don't need to worry about a primary key and foreign key with different collations, because that errors out as an incompatible data type and the foreign key constraint will not be created. If they have the same collation, then when we build the query string we don't need to worry about collation, because at runtime the oprcode associated with the operator will fetch the collation information later. The main operators and oprcodes related to this patch (0001, 0002) are: range_contained_by_multirange, range_eq, range_overlaps, and range_contained_by. The first three fetch collation information within range_cmp_bounds; range_contained_by fetches it in range_contains_elem_internal.
demo:

CREATE COLLATION case_insensitive (provider = icu, locale = 'und-u-ks-level2', deterministic = false);
DROP TABLE IF EXISTS temporal_fk_rng2rng;
DROP TABLE IF EXISTS temporal_rng;
DROP TYPE textrange_case_insensitive;
CREATE TYPE textrange_case_insensitive AS RANGE (subtype = text, collation = case_insensitive);
CREATE TABLE temporal_rng (id int4range, valid_at textrange_case_insensitive);
ALTER TABLE temporal_rng ADD CONSTRAINT temporal_rng_pk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS);
CREATE TABLE temporal_fk_rng2rng (
    id int4range,
    valid_at textrange_case_insensitive,
    parent_id int4range,
    CONSTRAINT temporal_fk_rng2rng_fk2 FOREIGN KEY (parent_id, PERIOD valid_at)
        REFERENCES temporal_rng (id, PERIOD valid_at)
);
INSERT INTO temporal_rng (id, valid_at) VALUES ('[1,2)', textrange_case_insensitive('c', 'h','[]'));
--fail
INSERT INTO temporal_fk_rng2rng (id, valid_at, parent_id) VALUES ('[1,2)', textrange_case_insensitive('B', 'B','[]'), '[1,2)');
--fail.
INSERT INTO temporal_fk_rng2rng (id, valid_at, parent_id) VALUES ('[1,2)', textrange_case_insensitive('a', 'F','[]'), '[1,2)');
--fail.
INSERT INTO temporal_fk_rng2rng (id, valid_at, parent_id) VALUES ('[1,2)', textrange_case_insensitive('e', 'Z','[]'), '[1,2)');
--ok
INSERT INTO temporal_fk_rng2rng (id, valid_at, parent_id) VALUES ('[1,2)', textrange_case_insensitive('d', 'F','[]'), '[1,2)');
Re: Large block sizes support in Linux
On Fri, Mar 22, 2024 at 10:31:11PM +0100, Tomas Vondra wrote: > Right, but things change over time - current storage devices support > much larger sectors (LBA format), usually 4K. And if you do I/O with > this size, it's usually atomic. > > AFAIK if you built Postgres with 4K pages, on a device with 4K LBA > format, that would not need full-page writes - we always do I/O in 4k > pages, and block layer does I/O (during writeback from page cache) with > minimum guaranteed size = logical block size. 4K are great for OLTP > systems in general, it'd be even better if we didn't need to worry about > torn pages (but the tricky part is to be confident it's safe to disable > them on a particular system). Yes, even if the file system is 8k, and the storage is 8k, we only know that torn pages are impossible if the file system never overwrites existing 8k pages, but writes new ones and then makes it active. I think ZFS does that to handle snapshots. > The other thing is - is there a reliable way to say when the guarantees > actually apply? I mean, how would the administrator *know* it's safe to > set full_page_writes=off, or even better how could we verify this when > the database starts (and complain if it's not safe to disable FPW)? Yes, this is quite hard to know. Our docs have: https://www.postgresql.org/docs/current/wal-reliability.html Another risk of data loss is posed by the disk platter write operations themselves. Disk platters are divided into sectors, commonly 512 bytes each. Every physical read or write operation processes a whole sector. When a write request arrives at the drive, it might be for some multiple of 512 bytes (PostgreSQL typically writes 8192 bytes, or 16 sectors, at a time), and the process of writing could fail due to power loss at any time, meaning some of the 512-byte sectors were written while others were not. To guard against such failures, PostgreSQL periodically writes full page images to permanent WAL storage before modifying the actual page on disk. By doing this, during crash recovery PostgreSQL can --> restore partially-written pages from WAL. If you have file-system --> software that prevents partial page writes (e.g., ZFS), you can turn off --> this page imaging by turning off the full_page_writes parameter. --> Battery-Backed Unit (BBU) disk controllers do not prevent partial page --> writes unless they guarantee that data is written to the BBU as full --> (8kB) pages. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
Re: Potential stack overflow in incremental base backup
On Fri, Mar 8, 2024 at 6:53 AM Robert Haas wrote: > But I think that's really only necessary if we're actually going to > get rid of the idea of segmented relations altogether, which I don't > think is happening at least for v17, and maybe not ever. Yeah, I consider the feedback on ext4's size limitations to have completely killed the idea of getting rid of segments for the foreseeable future, at least in standard md.c (though who knows what people will do with pluggable smgr?). As for initdb --rel-segsize (CF #4305) for md.c, I abandoned plans to do that for 17 because I couldn't see what to do about this issue. Incremental backup effectively relies on smaller segments, by using them as problem-dividing granularity for checksumming and memory usage. That'll take some research...
Re: session username in default psql prompt?
On Fri, Mar 22, 2024 at 7:34 PM Tom Lane wrote: > > On the whole, I think we'd get more complaints about the default > prompt having more-or-less doubled in length than we'd get kudos > about this being a great usability improvement. Prompt space is > expensive and precious, at least for people who aren't in the > habit of working in very wide windows. I'm not sure you're right, but in view of the opposition I won't press it. Thanks to Kori for the patch. cheers andrew
Re: Cannot find a working 64-bit integer type on Illumos
Thomas Munro writes: > . o O ( int64_t, PRId64, etc were standardised a quarter of a century ago ) Yeah. Now that we require C99 it's probably reasonable to assume that those things exist. I wouldn't be in favor of ripping out our existing notations like UINT64CONST, because the code churn would be substantial and the gain minimal. But we could imagine reimplementing that stuff atop <stdint.h> and then getting rid of the configure-time probes. regards, tom lane
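For instance, a minimal sketch of what "atop <stdint.h>" could look like (the pg-side names here are our existing notations; the point is just that the configure probes become unnecessary):

    #include <stdint.h>
    #include <inttypes.h>

    typedef int64_t int64;
    typedef uint64_t uint64;

    #define INT64CONST(x)  INT64_C(x)
    #define UINT64CONST(x) UINT64_C(x)

    #define INT64_FORMAT  "%" PRId64
    #define UINT64_FORMAT "%" PRIu64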
Re: sublink [exists (select xxx group by grouping sets ())] causes an assertion error
Jeff Davis writes: > On Fri, 2024-03-22 at 12:28 -0400, Tom Lane wrote: >> Thanks for the report. I did some bisecting and found that the crash >> appears at Jeff's commit c8aeaf3ab (which introduced this assertion) >> and disappears at Heikki's c4649cce3 (which removed it). So I would >> say that the problem is "this assertion is wrong", and we should fix >> the problem by fixing the assertion, not by hacking around in distant >> calling code. On the whole, since this code has been dead for >> several versions, I'd be inclined to just remove the assertion. > c4649cce3 didn't add additional calls to LogicalTapeSetBlocks(), so I'm > not sure if the removal of the Assert was related to his changes, or if > he just realized the assertion was wrong and removed it along the way? My guess is he just zapped it because the code block was dependent on the "tape" abstraction which was going away. Heikki? > Also, without the assertion, the word "should" in the comment is > ambiguous (does it mean "must not" or something else), and it still > exists in master. Do we care about the calculation being wrong if > there's an unfinished write? I think it's actually fine. The callers of LogicalTapeSetBlocks only use the number for statistics or trace reporting, so precision isn't critical in the first place, but I think what they care about is the amount of data that's really been written out to files. The comment should be clarified, for sure. regards, tom lane
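Perhaps something along these lines for the header comment (just a first cut at the wording):

    /*
     * Return the number of blocks used by this logical tape set.
     *
     * Note: this does not account for data that has been buffered but not
     * yet written out.  The callers only use the result for statistics and
     * trace reporting, so exactness is not critical here.
     */
    long
    LogicalTapeSetBlocks(LogicalTapeSet *lts)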
Re: sublink [exists (select xxx group by grouping sets ())] causes an assertion error
On Fri, 2024-03-22 at 12:28 -0400, Tom Lane wrote: > Thanks for the report. I did some bisecting and found that the crash > appears at Jeff's commit c8aeaf3ab (which introduced this assertion) > and disappears at Heikki's c4649cce3 (which removed it). So I would > say that the problem is "this assertion is wrong", and we should fix > the problem by fixing the assertion, not by hacking around in distant > calling code. On the whole, since this code has been dead for > several versions, I'd be inclined to just remove the assertion. c4649cce3 didn't add additional calls to LogicalTapeSetBlocks(), so I'm not sure if the removal of the Assert was related to his changes, or if he just realized the assertion was wrong and removed it along the way? Also, without the assertion, the word "should" in the comment is ambiguous (does it mean "must not" or something else), and it still exists in master. Do we care about the calculation being wrong if there's an unfinished write? If not, I'll just clarify that the calculation doesn't take into account still-buffered data. If we do care, then something might need to be fixed. Regards, Jeff Davis
Re: Cannot find a working 64-bit integer type on Illumos
On Sat, Mar 23, 2024 at 6:26 AM Tom Lane wrote: > conftest.c:139:5: error: no previous prototype for 'does_int64_work' > [-Werror=missing-prototypes] > 139 | int does_int64_work() > | ^~~ > cc1: all warnings being treated as errors > configure:17003: $? = 1 > configure: program exited with status 1 . o O ( int64_t, PRId64, etc were standardised a quarter of a century ago )
Re: pg_upgrade --copy-file-range
Hmm, this discussion seems to assume that we only use copy_file_range() to copy/clone whole segment files, right? That's great and may even get most of the available benefit given typical databases with many segments of old data that never changes, but... I think copy_file_range() allows us to go further than the other whole-file clone techniques: we can stitch together parts of an old backup segment file and an incremental backup to create a new file. If you're interested in minimising disk use while also removing dependencies on the preceding chain of backups, then it might make sense to do that even if you *also* have to read the data to compute the checksums, I think? That's why I mentioned it: if copy_file_range() (ie sub-file-level block sharing) is a solution in search of a problem, has the world ever seen a better problem than pg_combinebackup?
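To sketch what I mean (a hypothetical helper, assuming Linux's copy_file_range(2); a real caller would also need a plain read()/write() fallback for old kernels and cross-filesystem cases):

    #define _GNU_SOURCE
    #include <unistd.h>

    /* Splice len bytes from src_fd at src_off into dst_fd at dst_off. */
    static int
    clone_range(int src_fd, off_t src_off, int dst_fd, off_t dst_off, size_t len)
    {
        while (len > 0)
        {
            ssize_t nbytes = copy_file_range(src_fd, &src_off,
                                             dst_fd, &dst_off, len, 0);

            if (nbytes <= 0)
                return -1;      /* error or unexpected EOF: fall back */
            len -= nbytes;      /* the offsets were advanced for us */
        }
        return 0;
    }

pg_combinebackup could call something like that once per block range, pulling unchanged ranges from the old full backup's segment and changed blocks from the incremental file, so the output file shares extents with both inputs.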
Re: BitmapHeapScan streaming read user and prelim refactoring
On Tue, Mar 19, 2024 at 02:33:35PM +0200, Heikki Linnakangas wrote: > On 18/03/2024 17:19, Melanie Plageman wrote: > > I've attached v7 rebased over this commit. > > If we delayed table_beginscan_bm() call further, after starting the TBM > iterator, we could skip it altogether when the iterator is empty. > > That's a further improvement, doesn't need to be part of this patch set. > Just caught my eye while reading this. Hmm. You mean like until after the first call to tbm_[shared]_iterate()? AFAICT, tbm_begin_iterate() doesn't tell us anything about whether or not the iterator is "empty". Do you mean cases when the bitmap has no blocks in it? It seems like we should be able to tell that from the TIDBitmap. > > > v7-0003-Push-BitmapHeapScan-skip-fetch-optimization-into-.patch > > I suggest to avoid the double negative with SO_CAN_SKIP_FETCH, and call the > flag e.g. SO_NEED_TUPLE. Agreed. Done in attached v8. Though I wondered if it was a bit weird that the flag is set in the common case and not set in the uncommon case... > As yet another preliminary patch before the streaming read API, it would be > nice to move the prefetching code to heapam.c too. I've done this, but I can say it is not very pretty. see 0013. I had to add a bunch of stuff to TableScanDescData and HeapScanDescData which are only used for bitmapheapscans. I don't know if it makes the BHS streaming read user patch easier to review, but I don't think what I have in 0013 is committable to Postgres. Maybe there was another way I could have approached it. Let me know what you think. In addition to bloating the table descriptors, note that it was difficult to avoid one semantic change -- with 0013, we no longer prefetch or adjust prefetch target when emitting each empty tuple -- though I think this is could actually be desirable. > What's the point of having separate table_scan_bitmap_next_block() and > table_scan_bitmap_next_tuple() functions anymore? The AM owns the TBM > iterator now. The executor node updates the lossy/exact page counts, but > that's the only per-page thing it does now. Oh, interesting. Good point. I've done this in 0015. If you like the way it turned out, I can probably rebase this back into an earlier point in the set and end up dropping some of the other incremental changes (e.g. 0008). > > /* > > * If this is the first scan of the underlying table, create > > the table > > * scan descriptor and begin the scan. > > */ > > if (!scan) > > { > > uint32 extra_flags = 0; > > > > /* > > * We can potentially skip fetching heap pages if we do > > not need > > * any columns of the table, either for checking > > non-indexable > > * quals or for returning data. This test is a bit > > simplistic, as > > * it checks the stronger condition that there's no > > qual or return > > * tlist at all. But in most cases it's probably not > > worth working > > * harder than that. > > */ > > if (node->ss.ps.plan->qual == NIL && > > node->ss.ps.plan->targetlist == NIL) > > extra_flags |= SO_CAN_SKIP_FETCH; > > > > scan = node->ss.ss_currentScanDesc = table_beginscan_bm( > > > > > > node->ss.ss_currentRelation, > > > > > > node->ss.ps.state->es_snapshot, > > > > 0, > > > > NULL, > > > > extra_flags); > > } > > > > scan->tbmiterator = tbmiterator; > > scan->shared_tbmiterator = shared_tbmiterator; > > How about passing the iterator as an argument to table_beginscan_bm()? You'd > then need some other function to change the iterator on rescan, though. 
> Not sure what exactly to do here, but feels that this part of the API is not fully thought-out. Needs comments at least, to explain who sets tbmiterator / shared_tbmiterator and when. For comparison, for a TID scan there's a separate scan_set_tidrange() table AM function. Maybe follow that example and introduce scan_set_tbm_iterator(). I've spent quite a bit of time playing around with the code trying to make it less terrible than what I had
Re: New Table Access Methods for Multi and Single Inserts
On Thu, 2024-03-21 at 13:10 +0530, Bharath Rupireddy wrote: > I'm attaching the v13 patches using virtual tuple slots for buffered > tuples for multi inserts. Comments: * Do I understand correctly that CMV, RMV, and CTAS experience a performance benefit, but COPY FROM does not? And is that because COPY already used table_multi_insert, whereas CMV and RMV did not? * In the COPY FROM code, it looks like it's deciding whether to flush based on MAX_BUFFERED_TUPLES, but the slot array is allocated with MAX_BUFFERED_SLOTS (they happen to be the same for heap, but perhaps not for other AMs). The copy code shouldn't be using internal knowledge of the multi-insert code; it should know somehow from the API when the right time is to flush. * How is the memory management expected to work? It looks like COPY FROM is using the ExprContext when running the input functions, but we really want to be using a memory context owned by the table AM, right? * What's the point of the table_multi_insert_slots() "num_slots" argument? The only caller simply discards it. * table_tuple_insert_v2 isn't called anywhere, what's it for? * the "v2" naming is inconsistent -- it seems you only added it in places where there's a name conflict, which makes it hard to tell which API methods go together. I'm not sure how widely table_multi_insert* is used outside of core, so it's possible that we may even be able to just change those APIs and the few extensions that call it can be updated. * Memory usage tracking should be done in the AM by allocating everything in a single context so it's easy to check the size. Don't manually add up memory. * I don't understand: "Caller may have got the slot using heap_multi_insert_next_free_slot, filled it and passed. So, skip copying in such a case." If the COPY FROM had a WHERE clause and skipped a tuple after filling the slot, doesn't that mean the slot has bogus data from the last tuple? * We'd like this to work for insert-into-select (IIS) and logical replication, too. Do you see any problem there, or is it just a matter of code? * Andres had some comments[1] that don't seem entirely addressed. - You are still allocating the AM-specific part of TableModifyState as a separately-allocated chunk. - It's still called TableInsertState rather than TableModifyState as he suggested. If you change that, you should also change to table_modify_begin/end. - CID: I suppose Andres is considering the use case of "BEGIN; ... ten thousand inserts ... COMMIT;". I don't think this problem is really solvable (discussed below) but we should have some response/consensus on that point. - He mentioned that we only need one new method implemented by the AM. I don't know if one is enough, but 7 does seem excessive. I have some simplification ideas below. Overall: If I understand correctly, there are two ways to use the API: 1. used by CTAS, MV: tistate = table_insert_begin(...); table_multi_insert_v2(tistate, tup1); ... table_multi_insert_v2(tistate, tupN); table_insert_end(tistate); 2. used by COPY ... FROM: tistate = table_insert_begin(..., SKIP_FLUSH); if (multi_insert_slot_array_is_full()) table_multi_insert_flush(tistate); slot = table_insert_next_free_slot(tistate); ... fill slot with tup1 table_multi_insert_v2(tistate, tup1); ... slot = table_insert_next_free_slot(tistate); ... fill slot with tupN table_multi_insert_v2(tistate, tupN); table_insert_end(tistate); Those two uses need comments explaining what's going on. It appears the SKIP_FLUSH flag is used to indicate which use the caller intends. 
Use #2 is not enforced well by either the API or runtime checks. If the caller neglects to check for a full buffer, it appears that it will just overrun the slots array. Also, for use #2, table_multi_insert_v2() doesn't do much other than incrementing the memory used. The slot will never be NULL because it was obtained with table_multi_insert_next_free_slot(), and the other two branches don't happen when SKIP_FLUSH is true. The real benefit to COPY of your new API is that the AM can manage slots for itself, and how many tuples may be tracked (which might be a lot higher for non-heap AMs). I agree with Luc Vlaming's comment[2] that more should be left to the table AM. Your patch tries too hard to work with the copyfrom.c slot array, somehow sharing it with the table AM. That adds complexity to the API and feels like a layering violation. We also shouldn't mandate a slot array in the API. Each slot is 64 bytes -- a lot of overhead for small tuples. For a non-heap AM, it's much better to store the tuple data in a big contiguous chunk with minimal overhead. Let's just have a simple API like: tmstate = table_modify_begin(...); table_modify_save_insert(tmstate, tup1); ... table_modify_save_insert(tmstate, tupN); table_modify_end(tmstate); and leave it up to the AM to do all the buffering and flushing work (as Luc Vlaming suggested[2]). That leaves one
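To illustrate (a sketch only; the names and limits are hypothetical, loosely following heap's existing multi-insert pieces), the AM-side state could be as simple as:

    #define MODIFY_MAX_TUPLES   1000
    #define MODIFY_MAX_BYTES    (64 * 1024)

    typedef struct HeapModifyState
    {
        Relation        rel;
        MemoryContext   context;        /* all buffered tuples live here */
        TupleTableSlot *slots[MODIFY_MAX_TUPLES];
        int             nused;
    } HeapModifyState;

    static void
    heap_modify_save_insert(HeapModifyState *state, TupleTableSlot *slot)
    {
        /* Copy into an AM-owned slot; the caller may reuse its slot. */
        if (state->slots[state->nused] == NULL)
        {
            MemoryContext old = MemoryContextSwitchTo(state->context);

            state->slots[state->nused] = table_slot_create(state->rel, NULL);
            MemoryContextSwitchTo(old);
        }
        ExecCopySlot(state->slots[state->nused], slot);
        state->nused++;

        /* The AM decides when to flush: by count or by memory used. */
        if (state->nused == MODIFY_MAX_TUPLES ||
            MemoryContextMemAllocated(state->context, false) >= MODIFY_MAX_BYTES)
            heap_modify_flush(state);   /* heap_multi_insert() internally */
    }

Since everything buffered lives in state->context, memory accounting is one MemoryContextMemAllocated() call rather than manual addition, and a non-heap AM would be free to store tuple data in a contiguous chunk instead of a slot array.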
Re: Trying to build x86 version on windows using meson
Hi, On 2024-03-21 13:17:44 -0400, Dave Cramer wrote: > Attached correct log file Hm. So there's something a bit odd: > Build started at 2024-03-21T13:07:08.707715 > Main binary: C:\Program Files\Meson\meson.exe > Build Options: '-Dextra_include_dirs=c:\Program Files\OpenSSL-Win64\include' > -Derrorlogs=True '-Dextra_lib_dirs=c:\Program Files\OpenSSL-win64' > '-Dprefix=c:\postgres86' > Python system: Windows > The Meson build system > Version: 1.3.1 > Source dir: C:\Users\davec\projects\postgresql > Build dir: C:\Users\davec\projects\postgresql\build > Build type: native build So meson thinks this is a native build, not a cross build. But then later realizes that generated binaries and the current platform aren't the same. And thus errors out. The line numbers don't match my tree, but I think what's failing is the sizeof() check. Which has support for cross builds, but it only uses that (slower) path if it knows that a cross build is being used. I suggest actually telling meson to cross compile. I don't quite know what properties you're going to need, but something like the following (put it in a file, point meson to it with --cross-file) might give you a start:

[properties]
needs_exe_wrapper = false

[binaries]
c = 'cl'
cpp = 'cl'
ar = 'lib'
windres = 'rc'

[host_machine]
system = 'windows'
cpu_family = 'x86_64'
cpu = 'x86_64'
endian = 'little'

Greetings, Andres Freund
Re: session username in default psql prompt?
Andrew Dunstan writes: > On Fri, Mar 22, 2024 at 4:04 PM Robert Haas wrote: >> I'm not really a fan of this. Right now my prompt looks like this: >> robert.haas=# >> If we did this, it would say: >> robert.h...@robert.haas=# There would be similar duplication for, eg, the postgres user connected to the postgres database. However, I'm more worried about the case where they don't match, because then the %~ suggestion doesn't help shorten it. > Of course, people can put this in their .psqlrc, and I do. The suggestion > came about because I had a couple of instances where people using the > default prompt showed me stuff and the problem arose from confusion about > which user they were connected as. On the whole, I think we'd get more complaints about the default prompt having more-or-less doubled in length than we'd get kudos about this being a great usability improvement. Prompt space is expensive and precious, at least for people who aren't in the habit of working in very wide windows. regards, tom lane
Re: Combine Prune and Freeze records emitted by vacuum
On 20/03/2024 21:17, Melanie Plageman wrote: Attached patch changes-for-0001.patch has a bunch of updated comments -- especially for heapam_xlog.h (including my promised diagram). And a few suggestions (mostly changes that I should have made before). New version of these WAL format changes attached. Squashed to one patch now. + // TODO: should we avoid this if we only froze? heap_xlog_freeze() doesn't + // do it Ah yes, that makes sense, did that. In the final commit message, I think it is worth calling out that all of those freeze logging functions heap_log_freeze_eq/cmp/etc are lifted and shifted from one file to another. When I am reviewing a big diff, it is always helpful to know where I need to review line-by-line. Done. From 5d6fc2ffbdd839e0b69242af16446a46cf6a2dc7 Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas Date: Wed, 20 Mar 2024 13:49:59 +0200 Subject: [PATCH v5 04/26] 'nplans' is a pointer I'm surprised the compiler didn't warn about this oops. while looking at this, I noticed that the asserts I added that nredirected > 0, ndead > 0, and nunused > 0 have the same problem. Good catch, fixed. - remove xlhp_conflict_horizon and store a TransactionId directly. A separate struct would make sense if we needed to store anything else there, but for now it just seems like more code. makes sense. I just want to make sure heapam_xlog.h makes it extra clear that this is happening. I see your comment addition. If you combine it with my comment additions in the attached patch for 0001, hopefully that makes it clear enough. Thanks. I spent more time on the comments throughout the patch. And one notable code change: I replaced the XLHP_LP_TRUNCATE_ONLY flag with XLHP_CLEANUP_LOCK. XLHP_CLEANUP_LOCK directly indicates if you need a cleanup lock to replay the record. It must always be set when XLHP_HAS_REDIRECTIONS or XLHP_HAS_DEAD_ITEMS is set, because replaying those always needs a cleanup lock. That felt easier to document and understand than XLHP_LP_TRUNCATE_ONLY. From 8af186ee9dd8c7dc20f37a69b34cab7b95faa43b Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas Date: Wed, 20 Mar 2024 14:03:06 +0200 Subject: [PATCH v5 07/26] Add comment to log_heap_prune_and_freeze(). XXX: This should be rewritten, but I tried to at least list some important points. Are you thinking that it needs to mention more things or that the things it mentions need more detail? I previously just quickly jotted down things that seemed worth mentioning in the comment. It was not so bad actually, but I reworded it a little. From b26e36ba8614d907a6e15810ed4f684f8f628dd2 Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas Date: Wed, 20 Mar 2024 14:53:31 +0200 Subject: [PATCH v5 08/26] minor refactoring in log_heap_prune_and_freeze() Mostly to make local variables more tightly-scoped. So, I don't think you can move those sub-records into the tighter scope. If you run tests with this applied, you'll see it crashes and a lot of the data in the record is wrong. If you move the sub-record declarations out to a wider scope, the tests pass. The WAL record data isn't actually copied into the buffer until recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_PRUNE_FREEZE); after registering everything. So all of those sub-records you made are out of scope the time it tries to copy from them. I spent the better part of a day last week trying to figure out what was happening after I did the exact same thing. I must say that I found the xloginsert API incredibly unhelpful on this point. Oops. 
I had that in mind and that was actually why I moved the XLogRegisterData() call to the end of the function, because I found it confusing to register the struct before filling it in completely, even though it works perfectly fine. But then I missed it anyway when I moved the local variables. I added a brief comment on that. I would like to review the rest of the suggested changes in this patch after we fix the issue I just mentioned. Thanks, review is appreciated. I feel this is ready now, so barring any big new issues, I plan to commit this early next week. There is another patch in the commitfest that touches this area: "Recording whether Heap2/PRUNE records are from VACUUM or from opportunistic pruning" [1]. That actually goes in the opposite direction than this patch. That patch wants to add more information, to show whether a record was emitted by VACUUM or on-access pruning, while this patch makes the freezing and VACUUM's 2nd phase records also look the same. We could easily add more flags to xl_heap_prune to distinguish them. Or assign different xl_info code for them, like that other patch proposed. But I don't think that needs to block this patch, that can be added as a separate patch. [1] https://www.postgresql.org/message-id/cah2-wzmsevhox+hjpfmqgcxwwdgnzp0r9f+vbnpok6tgxmp...@mail.gmail.com -- Heikki Linnakangas Neon (https://neon.tech) From
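For the archives, the trap looks like this (schematic; xlhp_freeze_plans is one of the new sub-record structs from the patch):

    {
        xlhp_freeze_plans freeze_plans;     /* WRONG: scoped too tightly */

        freeze_plans.nplans = nplans;
        XLogRegisterData((char *) &freeze_plans, sizeof(freeze_plans));
    }
    /* ... more registrations ... */

    /*
     * Nothing is copied out of the registered pointers until here, by
     * which time freeze_plans is out of scope and its stack space may
     * have been reused, so the record ends up with garbage.
     */
    recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_PRUNE_FREEZE);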
Re: session username in default psql prompt?
On Fri, Mar 22, 2024 at 4:04 PM Robert Haas wrote: > On Wed, Mar 13, 2024 at 4:56 AM Andrew Dunstan > wrote: > > Reposting as the archive mail processor doesn't seem to like the Apple > > mail attachment. > > I'm not really a fan of this. Right now my prompt looks like this: > > robert.haas=# > > If we did this, it would say: > > robert.h...@robert.haas=# > Hmm. Perhaps we should change the default to "%n@%~%R%x%# " Then when connected to your eponymous database you would see robert.haas@~=# Of course, people can put this in their .psqlrc, and I do. The suggestion came about because I had a couple of instances where people using the default prompt showed me stuff and the problem arose from confusion about which user they were connected as. > I have yet to meet anyone who doesn't think that one Robert Haas is > quite enough already, and perhaps too much by half. > perish the thought. cheers andrew
Re: Introduce XID age and inactive timeout based replication slot invalidation
On Fri, Mar 22, 2024 at 3:15 PM Bertrand Drouvot wrote: > > Looking at v14-0002: Thanks for reviewing. I agree that 0002 with last_inactive_at can go independently and be of use on its own in addition to helping implement inactive_timeout based invalidation. > 1 === > > @@ -691,6 +699,13 @@ ReplicationSlotRelease(void) > ConditionVariableBroadcast(&slot->active_cv); > } > > + if (slot->data.persistency == RS_PERSISTENT) > + { > + SpinLockAcquire(&slot->mutex); > + slot->last_inactive_at = GetCurrentTimestamp(); > + SpinLockRelease(&slot->mutex); > + } > > I'm not sure we should do system calls while we're holding a spinlock. > Assign a variable before? Can do that. Then, last_inactive_at = current timestamp + mutex acquire time. But that shouldn't be a problem, unlike doing system calls while holding the mutex. So, done that way. > 2 === > > Also, what about moving this here? > > " > if (slot->data.persistency == RS_PERSISTENT) > { > /* > * Mark persistent slot inactive. We're not freeing it, just > * disconnecting, but wake up others that may be waiting for it. > */ > SpinLockAcquire(&slot->mutex); > slot->active_pid = 0; > SpinLockRelease(&slot->mutex); > ConditionVariableBroadcast(&slot->active_cv); > } > " > > That would avoid testing twice "slot->data.persistency == RS_PERSISTENT". Ugh. Done that now. > 3 === > > @@ -2341,6 +2356,7 @@ RestoreSlotFromDisk(const char *name) > > slot->in_use = true; > slot->active_pid = 0; > + slot->last_inactive_at = 0; > > I think we should put GetCurrentTimestamp() here. It's done in v14-0006 but I think it's better to do it in 0002 (and not taking care of inactive_timeout). Done. > 4 === > > Track last_inactive_at in pg_replication_slots > > doc/src/sgml/system-views.sgml | 11 +++ > src/backend/catalog/system_views.sql | 3 ++- > src/backend/replication/slot.c | 16 > src/backend/replication/slotfuncs.c | 7 ++- > src/include/catalog/pg_proc.dat | 6 +++--- > src/include/replication/slot.h | 3 +++ > src/test/regress/expected/rules.out | 5 +++-- > 7 files changed, 44 insertions(+), 7 deletions(-) > > Worth to add some tests too (or we postpone them in future commits because > we're > confident enough they will follow soon)? Yes. Added some tests in a new TAP test file named src/test/recovery/t/043_replslot_misc.pl. This new file can be used to add miscellaneous replication tests in future as well. I couldn't find a better place in existing test files - tried having the new tests for physical slots in t/001_stream_rep.pl and I didn't find a right place for logical slots. > 5 === > > Most of the fields that reflect a time (not duration) in the system views are > _time, so I'm wondering if instead of "last_inactive_at" we should use > something like "last_inactive_time"? Yeah, I can see that. So, I changed it to last_inactive_time. I agree with treating last_inactive_time as a separate property of the slot having its own use in addition to helping implement inactive_timeout based invalidation. I think it can go separately. I tried to address the review comments received for this patch alone and attached v15-0001. I'll post other patches soon. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com v15-0001-Track-last_inactive_time-in-pg_replication_slots.patch Description: Binary data
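To be concrete, the release path in v15 now looks roughly like this (shape only, not the exact hunk):

    if (slot->data.persistency == RS_PERSISTENT)
    {
        /*
         * Capture the timestamp before taking the spinlock;
         * GetCurrentTimestamp() is a system call, and spinlock hold
         * times must stay short.
         */
        TimestampTz now = GetCurrentTimestamp();

        /*
         * Mark the persistent slot inactive.  We're not freeing it,
         * just disconnecting, but wake up others that may be waiting
         * for it.
         */
        SpinLockAcquire(&slot->mutex);
        slot->active_pid = 0;
        slot->last_inactive_time = now;
        SpinLockRelease(&slot->mutex);
        ConditionVariableBroadcast(&slot->active_cv);
    }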
Re: Large block sizes support in Linux
On 3/22/24 19:46, Bruce Momjian wrote: > On Thu, Mar 21, 2024 at 06:46:19PM +0100, Pankaj Raghav (Samsung) wrote: >> Hello, >> >> My team and I have been working on adding Large block size(LBS) >> support to XFS in Linux[1]. Once this feature lands upstream, we will be >> able to create XFS with FS block size > page size of the system on Linux. >> We also gave a talk about it in Linux Plumbers conference recently[2] >> for more context. The initial support is only for XFS but more FSs will >> follow later. >> >> On an x86_64 system, fs block size was limited to 4k, but traditionally >> Postgres uses 8k as its default internal page size. With LBS support, >> fs block size can be set to 8K, thereby matching the Postgres page size. >> >> If the file system block size == DB page size, then Postgres can have >> guarantees that a single DB page will be written as a single unit during >> kernel write back and not split. >> >> My knowledge of Postgres internals is limited, so I'm wondering if there >> are any optimizations or potential optimizations that Postgres could >> leverage once we have LBS support on Linux? > > We have discussed this in the past, and in fact in the early years we > thought we didn't need fsync since the BSD file system was 8k at the > time. > > What we later realized is that we have no guarantee that the file system > will write to the device in the specified block size, and even if it > does, the I/O layers between the OS and the device might not, since many > devices use 512 byte blocks or other sizes. > Right, but things change over time - current storage devices support much larger sectors (LBA format), usually 4K. And if you do I/O with this size, it's usually atomic. AFAIK if you built Postgres with 4K pages, on a device with 4K LBA format, that would not need full-page writes - we always do I/O in 4k pages, and block layer does I/O (during writeback from page cache) with minimum guaranteed size = logical block size. 4K are great for OLTP systems in general, it'd be even better if we didn't need to worry about torn pages (but the tricky part is to be confident it's safe to disable them on a particular system). I did watch the talk linked by Pankaj, and IIUC the promise of the LBS patches is that this benefit would apply even to larger page sizes (= fs page size). Which right now you can't even mount, but the patches allow that. So for example it would be possible to create an XFS filesystem with 8kB pages, and then we'd read/write 8kB pages as usual, and we'd know that the page cache always writes out either the whole page or none of it. Which right now is not guaranteed to happen, it's possible to e.g. write the page as two 4K requests, even if all other things are set properly (drive has 4K logical/physical sectors). At least that's my understanding ... Pankaj, could you clarify what the guarantees provided by LBS are going to be? The talk uses wording like "should be" and "hint" in a couple places, and there's also stuff I'm not 100% familiar with. If we create a filesystem with 8K blocks, and we only ever do writes (and reads) in 8K chunks (our default page size), what guarantees does that give us? What if the underlying device has LBA format with only 4K (or perhaps even just 512B), how would that affect the guarantees? The other thing is - is there a reliable way to say when the guarantees actually apply? 
I mean, how would the administrator *know* it's safe to set full_page_writes=off, or even better how could we verify this when the database starts (and complain if it's not safe to disable FPW)? It's easy to e.g. take a backup on one filesystem and restore it on another one, and forget those may have different block sizes etc. I'm not sure it's possible in a 100% reliable way (tablespaces?). regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
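As a strawman for the complain-at-startup idea, the server could at least cross-check the filesystem's block size against its own page size, though that alone proves nothing about write atomicity (which is the hard part):

    #include <sys/statvfs.h>

    /* Hypothetical startup sanity check: necessary, not sufficient. */
    struct statvfs vfs;

    if (!fullPageWrites &&
        statvfs(DataDir, &vfs) == 0 &&
        vfs.f_bsize < BLCKSZ)
        ereport(WARNING,
                (errmsg("full_page_writes is off, but the filesystem block size (%lu) is smaller than the page size (%u)",
                        (unsigned long) vfs.f_bsize, (unsigned) BLCKSZ)));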
Re: psql not responding to SIGINT upon db reconnection
On Fri Mar 22, 2024 at 12:17 PM CDT, Robert Haas wrote: On Fri, Mar 22, 2024 at 1:05 PM Tristan Partin wrote: > Sorry for taking a while to get back to y'all. I have taken your > feedback into consideration for v9. This is my first time writing > Postgres docs, so I'm ready for another round of editing :). Yeah, that looks like it needs some wordsmithing yet. I can take a crack at that at some point, but here are a few notes: - "takes care of" and "working through the state machine" seem quite vague to me. - the meanings of forRead and forWrite don't seem to be documented. - "retuns" is a typo. > Robert, could you point out some places where comments would be useful > in 0002? I did rename the function and moved it as suggested, thanks! In > turn, I also realized I forgot a prototype, so also added it. Well, for starters, I'd give the function a header comment. Telling me that a 1 second timeout is a 1 second timeout is less useful than telling me why we've picked a 1 second timeout. Presumably the answer is "so we can respond to cancel interrupts in a timely fashion", but I'd make that explicit. It might be worth a comment saying that PQsocket() shouldn't be hoisted out of the loop because it could be a multi-host connection string and the socket might change under us. Unless that's not true, in which case we should hoist the PQsocket() call out of the loop. I think it would also be worth a comment saying that we don't need to report errors here, as the caller will do that; we just need to wait until the connection is known to have either succeeded or failed, or until the user presses cancel. This is good feedback, thanks. I have added comments where you suggested. I reworded the PQsocketPoll docs to hopefully meet your expectations. I am using the term "connection sequence" which is from the PQconnectStartParams docs, so hopefully people will be able to make that connection. I wrote documentation for "forRead" and "forWrite" as well. I had a question about parameter naming. Right now I have a mix of camel-case and snake-case in the function signature since that is what I inherited. Should I change that to be consistent? If so, which case would you like? Thanks for your continued reviews. -- Tristan Partin Neon (https://neon.tech) From dc45e95ab443d973845dd840600d6719dcd4cae2 Mon Sep 17 00:00:00 2001 From: Tristan Partin Date: Tue, 30 Jan 2024 15:40:57 -0600 Subject: [PATCH v10 1/2] Expose PQsocketPoll for frontends PQsocketPoll should help with state machine processing when connecting to a database asynchronously via PQconnectStart(). --- doc/src/sgml/libpq.sgml | 43 +++- src/interfaces/libpq/exports.txt | 1 + src/interfaces/libpq/fe-misc.c | 7 +++--- src/interfaces/libpq/libpq-fe.h | 4 +++ 4 files changed, 50 insertions(+), 5 deletions(-) diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml index d3e87056f2c..774b57ba70b 100644 --- a/doc/src/sgml/libpq.sgml +++ b/doc/src/sgml/libpq.sgml @@ -262,6 +262,44 @@ PGconn *PQsetdb(char *pghost, + + PQsocketPollPQsocketPoll + + + nonblocking connection + Poll a connections underlying socket descriptor retrieved with . + +int PQsocketPoll(int sock, int forRead, int forWrite, time_t end_time); + + + + + This function sets up polling of a file descriptor. The underlying function is either + poll(2) or select(2), depending on platform + support. The primary use of this function is iterating through the connection sequence + described in the documentation of . 
If + forRead is specified, the function waits for the socket to be ready + for reading. If forWrite is specified, the function waits for the + socket to be ready for write. See POLLIN and POLLOUT + from poll(2), or readfds and + writefds from select(2) for more information. + + + + The function returns a value greater than 0 if the specified condition + is met, 0 if a timeout occurred, or -1 if an error + or interrupt occurred. In the event forRead and + forWrite are not set, the function immediately returns a timeout + condition. + + + + end_time is the time in the future in seconds starting from the UNIX + epoch in which you would like the function to return if the condition is not met. + + + + PQconnectStartParamsPQconnectStartParams PQconnectStartPQconnectStart @@ -358,7 +396,10 @@ PostgresPollingStatusType PQconnectPoll(PGconn *conn); Loop thus: If PQconnectPoll(conn) last returned PGRES_POLLING_READING, wait until the socket is ready to read (as indicated by select(), poll(), or - similar system function). + similar system function). Note that PQsocketPoll + can help reduce boilerplate by abstracting the setup of + select(2) or poll(2) if it is + available on
Re: SET ROLE documentation improvement
On Fri, Mar 22, 2024 at 03:45:06PM -0500, Nathan Bossart wrote: > On Fri, Mar 22, 2024 at 03:58:59PM -0400, Robert Haas wrote: >> On Fri, Nov 10, 2023 at 12:41 PM Nathan Bossart >> wrote: >>> I still think we should update the existing note about privileges for >>> SET/RESET ROLE to something like the following: >>> >>> diff --git a/doc/src/sgml/ref/set_role.sgml b/doc/src/sgml/ref/set_role.sgml >>> index 13bad1bf66..c91a95f5af 100644 >>> --- a/doc/src/sgml/ref/set_role.sgml >>> +++ b/doc/src/sgml/ref/set_role.sgml >>> @@ -41,8 +41,10 @@ RESET ROLE >>> >>> >>> >>> - The specified role_name >>> - must be a role that the current session user is a member of. >>> + The current session user must have the SET option for the >>> + specified role_name, either >>> + directly or indirectly via a chain of memberships with the >>> + SET option. >>> (If the session user is a superuser, any role can be selected.) >>> >> >> This is a good change; I should have done this when SET was added. > > Cool. Actually, shouldn't this one be back-patched to v16? If so, I'd do that one separately from the other changes we are discussing. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: SET ROLE documentation improvement
On Fri, Mar 22, 2024 at 03:58:59PM -0400, Robert Haas wrote: > On Fri, Nov 10, 2023 at 12:41 PM Nathan Bossart > wrote: >> Looking again, I'm kind of hesitant to add too much qualification to this >> note about losing superuser privileges. > > The note in question is: > > >Note that when a superuser chooses to SET ROLE to a >non-superuser role, they lose their superuser privileges. > > > It's not entirely clear to me what the point of this note is. I think > we could consider removing it entirely, on the theory that it's just > a poorly-stated special case of what's already been said in the > description, i.e. "permissions checking for SQL commands is carried > out as though the named role were the one that had logged in > originally" and "The specified role_name must be a role that the > current session user is a member of." +1. IMHO these kinds of special mentions of SUPERUSER tend to be redundant, and, as evidenced by this thread, confusing. I'll update the patch. > I think it's also possible that what the author of this paragraph > meant was that role attributes like CREATEDB, CREATEROLE, REPLICATION, > and SUPERUSER follow the current user, not the session user. If we > think that was the point of this paragraph, we could make it say that > more clearly. However, I'm not sure that really needs to be mentioned, > because "permissions checking for SQL commands is carried out as > though the named role were the one that had logged in originally" > seems to cover that ground along with everything else. +1 >> I still think we should update the existing note about privileges for >> SET/RESET ROLE to something like the following: >> >> diff --git a/doc/src/sgml/ref/set_role.sgml b/doc/src/sgml/ref/set_role.sgml >> index 13bad1bf66..c91a95f5af 100644 >> --- a/doc/src/sgml/ref/set_role.sgml >> +++ b/doc/src/sgml/ref/set_role.sgml >> @@ -41,8 +41,10 @@ RESET ROLE >> >> >> >> - The specified role_name >> - must be a role that the current session user is a member of. >> + The current session user must have the SET option for the >> + specified role_name, either >> + directly or indirectly via a chain of memberships with the >> + SET option. >> (If the session user is a superuser, any role can be selected.) >> > > This is a good change; I should have done this when SET was added. Cool. > Another change we could consider is revising "permissions checking for > SQL commands is carried out as though the named role were the one that > had logged in originally" to mention that SET ROLE and SET SESSION > AUTHORIZATION are exceptions. That seems like a reasonable idea, although it might require a few rounds of wordsmithing. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Cannot find a working 64-bit integer type on Illumos
Robert Haas writes: > On Fri, Mar 22, 2024 at 3:31 PM Andres Freund wrote: >> IME the bigger issue is that sometimes it doesn't lead to outright failures, >> just to lots of stuff not being detected as supported / present. > Ugh. That does, indeed, sound not very nice. I could get behind throwing an error if -Werror is spotted. I think trying to pull it out and put it back is too much work and a bit too much magic. regards, tom lane
Re: Slow GRANT ROLE on PostgreSQL 16 with thousands of ROLEs
Nathan Bossart writes: > On Fri, Mar 22, 2024 at 11:54:48AM -0500, Nathan Bossart wrote: >> On Fri, Mar 22, 2024 at 12:53:15PM -0400, Tom Lane wrote: >>> Would you like to review the catcache patch further, or do you >>> think it's good to go? >> I'll take another look this afternoon. > LGTM Thanks for looking, I'll push that shortly. regards, tom lane
Re: [PATCH] plpython function causes server panic
Robert Haas writes: > - I don't think the documentation changes are entirely accurate. The > whole point of the patch is to allow parallel workers to make changes > to the transaction state, but the documentation says you can't. Maybe > we should just delete "change the transaction state" entirely from the > list of things that you're not allowed to do, since "write to the > database" is already listed separately; or maybe we should replace it > with something like "assign new transaction IDs or command IDs," > although that's kind of low-level. I don't think we should just delete > the "even temporarily" bit, as you've done. Fair enough. In the attached v2, I wrote "change the transaction state (other than by using a subtransaction for error recovery)"; what do you think of that? I dug around in the docs and couldn't really find anything about parallel-query transaction limitations other than this bit in parallel.sgml and the more or less copy-pasted text in create_function.sgml; did you have any other spots in mind? (I did find the commentary in README.parallel, but that's not exactly user-facing.) > - While I like the new comments in BeginInternalSubTransaction(), I > think the changes in ReleaseCurrentSubTransaction() and > RollbackAndReleaseCurrentSubTransaction() need more thought. Yah. After studying the code a bit more, I realized that what I'd done would cause IsInParallelMode() to start returning false during a subtransaction within parallel mode, which is surely not what we want. That state has to be heritable into subtransactions in some fashion. The attached keeps the current semantics of parallelModeLevel and adds a bool parallelChildXact field that is true if any outer transaction level has nonzero parallelModeLevel. That's possibly more general than we need today, but it seems like a reasonably clean definition. > One additional thing that might (or might not) be worth mentioning or > checking for here is that the leader shouldn't try to reduce the > height of the transaction state stack to anything less than what it > was when the parallel operation started; if it wants to do that, it > needs to clean up the parallel workers and exit parallel mode first. > Similarly a worker shouldn't ever end the toplevel transaction except > during backend cleanup. I think these things are already dealt with. However, one thing worth questioning is that CommitSubTransaction() will just silently kill any workers started during the current subxact, and likewise CommitTransaction() zaps workers without complaint. Shouldn't these instead throw an error about how you didn't close parallel mode, and then the corresponding Abort function does the cleanup? I did not change that behavior here, but it seems dubious. v2 attached works a bit harder on the comments and adds a simplistic test case. I feel that I don't want to incorporate the plpython crash that started this thread, as it's weird and dependent on Python code outside our control (though I have checked that we don't crash on that anymore). 
regards, tom lane diff --git a/doc/src/sgml/parallel.sgml b/doc/src/sgml/parallel.sgml index 5acc9537d6..dae9dd7f2f 100644 --- a/doc/src/sgml/parallel.sgml +++ b/doc/src/sgml/parallel.sgml @@ -545,10 +545,10 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%'; -Functions and aggregates must be marked PARALLEL UNSAFE if -they write to the database, access sequences, change the transaction state -even temporarily (e.g., a PL/pgSQL function that establishes an -EXCEPTION block to catch errors), or make persistent changes to +Functions and aggregates must be marked PARALLEL UNSAFE +if they write to the database, change the transaction state (other than by +using a subtransaction for error recovery), access sequences, or make +persistent changes to settings. Similarly, functions must be marked PARALLEL RESTRICTED if they access temporary tables, client connection state, cursors, prepared statements, or miscellaneous backend-local state that diff --git a/doc/src/sgml/ref/create_function.sgml b/doc/src/sgml/ref/create_function.sgml index 863d99d1fc..0d240484cd 100644 --- a/doc/src/sgml/ref/create_function.sgml +++ b/doc/src/sgml/ref/create_function.sgml @@ -428,22 +428,24 @@ CREATE [ OR REPLACE ] FUNCTION PARALLEL - PARALLEL UNSAFE indicates that the function - can't be executed in parallel mode and the presence of such a + + PARALLEL UNSAFE indicates that the function + can't be executed in parallel mode; the presence of such a function in an SQL statement forces a serial execution plan. This is the default. PARALLEL RESTRICTED indicates that - the function can be executed in parallel mode, but the execution is - restricted to parallel group leader. PARALLEL SAFE + the function can be executed in parallel mode, but only in the parallel +
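To make the parallelChildXact bookkeeping concrete, it amounts to roughly this (a sketch, not the precise patch text):

    bool
    IsInParallelMode(void)
    {
        TransactionState s = CurrentTransactionState;

        return s->parallelModeLevel != 0 || s->parallelChildXact;
    }

    /* ...and when pushing a subtransaction, the child inherits the
     * fact that some outer level is in parallel mode (p = parent): */
    s->parallelChildXact = (p->parallelModeLevel != 0 ||
                            p->parallelChildXact);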
Re: session username in default psql prompt?
On Wed, Mar 13, 2024 at 4:56 AM Andrew Dunstan wrote: > Reposting as the archive mail processor doesn't seem to like the Apple > mail attachment. I'm not really a fan of this. Right now my prompt looks like this: robert.haas=# If we did this, it would say: robert.h...@robert.haas=# I have yet to meet anyone who doesn't think that one Robert Haas is quite enough already, and perhaps too much by half. -- Robert Haas EDB: http://www.enterprisedb.com
Re: SET ROLE documentation improvement
On Fri, Nov 10, 2023 at 12:41 PM Nathan Bossart wrote: > On Tue, Sep 26, 2023 at 08:33:25AM -0700, Yurii Rashkovskii wrote: > > This is a good start, indeed. I've amended my patch to include it. > > Thanks for the new patch. > > Looking again, I'm kind of hesitant to add too much qualification to this > note about losing superuser privileges. The note in question is: Note that when a superuser chooses to SET ROLE to a non-superuser role, they lose their superuser privileges. It's not entirely clear to me what the point of this note is. I think we could consider removing it entirely, on the theory that it's just a poorly-stated special case of what's already been said in the description, i.e. "permissions checking for SQL commands is carried out as though the named role were the one that had logged in originally" and "The specified role_name must be a role that the current session user is a member of." I think it's also possible that what the author of this paragraph meant was that role attributes like CREATEDB, CREATEROLE, REPLICATION, and SUPERUSER follow the current user, not the session user. If we think that was the point of this paragraph, we could make it say that more clearly. However, I'm not sure that really needs to be mentioned, because "permissions checking for SQL commands is carried out as though the named role were the one that had logged in originally" seems to cover that ground along with everything else. > I still think we should update the existing note about privileges for > SET/RESET ROLE to something like the following: > > diff --git a/doc/src/sgml/ref/set_role.sgml b/doc/src/sgml/ref/set_role.sgml > index 13bad1bf66..c91a95f5af 100644 > --- a/doc/src/sgml/ref/set_role.sgml > +++ b/doc/src/sgml/ref/set_role.sgml > @@ -41,8 +41,10 @@ RESET ROLE > > > > - The specified role_name > - must be a role that the current session user is a member of. > + The current session user must have the SET option for the > + specified role_name, either > + directly or indirectly via a chain of memberships with the > + SET option. > (If the session user is a superuser, any role can be selected.) > This is a good change; I should have done this when SET was added. Another change we could consider is revising "permissions checking for SQL commands is carried out as though the named role were the one that had logged in originally" to mention that SET ROLE and SET SESSION AUTHORIZATION are exceptions. -- Robert Haas EDB: http://www.enterprisedb.com
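For illustration, here is roughly how the membership rule under discussion reads in practice; the role names are invented, and note that a superuser can SET ROLE to any role regardless of membership.

    CREATE ROLE alice LOGIN;
    CREATE ROLE app_owner NOLOGIN;
    GRANT app_owner TO alice WITH SET TRUE;  -- grants the SET option

    -- In a session where alice logged in:
    SET ROLE app_owner;       -- allowed: alice has the SET option on app_owner
    CREATE TABLE t (id int);  -- permission-checked as app_owner, which also owns t
    RESET ROLE;               -- back to alice's own privileges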
Re: [HACKERS] make async slave to wait for lsn to be replayed
Thank you for your feedback. On 2024-03-20 12:11, Alexander Korotkov wrote: On Wed, Mar 20, 2024 at 12:34 AM Kartyshov Ivan wrote: > 4.2 With an unreasonably high future LSN, BEGIN command waits > unboundedly, shouldn't we check if the specified LSN is more than > pg_last_wal_receive_lsn() error out? I think limiting wait lsn by current received lsn would destroy the whole value of this feature. The value is to wait till given LSN is replayed, whether it's already received or not. OK, sounds reasonable; I'll roll back the changes. But I don't see a problem here. On the replica, it's out of our control to check which lsn is good and which is not. We can't check whether the lsn, which is in the future for the replica, has already been issued by the primary. For the case of a wrong lsn, which could cause a potentially infinite wait, there are the timeout and the manual query cancel. Fully agree with this take. > 4.3 With an unreasonably high wait time, BEGIN command waits > unboundedly, shouldn't we restrict the wait time to some max value, > say a day or so? > SELECT pg_last_wal_receive_lsn() + 1 AS future_receive_lsn \gset > BEGIN AFTER :'future_receive_lsn' WITHIN 10; Good idea; I made it one day, but we should discuss this limit. Do you think that specifying the timeout in milliseconds is suitable? I would prefer to switch to seconds (with the ability to specify fractions of a second). This was expressed before by Alexander Lakhin. It sounds like an interesting idea; please review the result. > https://github.com/macdice/redo-bench or similar tools? Ivan, could you do this? Yes, I ran redo-bench/crash-recovery.sh:

This patch on master:
91.327, 1.973
105.907, 3.338
98.412, 4.579
95.818, 4.19

REL_13-STABLE:
116.645, 3.005
113.212, 2.568
117.644, 3.183
111.411, 2.782

master:
124.712, 2.047
117.012, 1.736
116.328, 2.035
115.662, 1.797

Strange behavior: the patched version is faster than REL_13-STABLE and master. I don't see this change in the patch. Normally if a process gets a signal, that causes WaitLatch() to exit immediately. It also exits immediately on query cancel. IIRC, this 1 minute timeout is needed to handle some extreme cases when an interrupt is missing. Other places have it equal to 1 minute; I don't see why we should have it different. OK, I'll roll back my changes. 4) added and expanded sections in the documentation I don't see this in the patch. I see only a short description in func.sgml, which is definitely not sufficient. We need at least everything we had in the docs before, adjusted to the current procedure-based approach. I didn't find another section in which to add the description of pg_wait_lsn(), so I extended the description at the bottom of the table. 5) add default variant of timeout pg_wait_lsn(trg_lsn pg_lsn, delay int8 DEFAULT 0) example: pg_wait_lsn('0/31B1B60') is equal to pg_wait_lsn('0/31B1B60', 0) Does zero here mean no timeout? I think this should be documented. Also, I would prefer to see a timeout by default; probably one minute would be a good default. Let's discuss this point. The loop in WaitForLSN is written so that only delay=0 waits for the LSN indefinitely, with no timeout, so the default must be 0. Please take one more look at the patch. PS: sorry, a strange bug threw my mails out of the thread.
https://www.postgresql.org/message-id/flat/f2ff071aa9141405bb8efee67558a058%40postgrespro.ru -- Ivan Kartyshov Postgres Professional: www.postgrespro.com

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 5b225ccf4f..6e1f69962d 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -27861,10 +27861,54 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset extension. + + + + + pg_wait_lsn + +pg_wait_lsn (trg_lsn pg_lsn, delay int8 DEFAULT 0) + + +Returns ERROR if target LSN was not replayed on standby. +Parameter delay sets seconds, the time to wait to the LSN. + + + +pg_wait_lsn waits till wait_lsn +to be replayed on standby to achieve read-your-writes-consistency, while +using async replica for reads and primary for writes. In that case lsn of +last modification is stored inside application. + + + +You can use pg_wait_lsn to wait the pg_lsn +value. For example: + + Replica: + postgresql.conf + recovery_min_apply_delay = 10s; + + Primary: Application update table and get lsn of changes + postgres=# UPDATE films SET kind = 'Dramatic' WHERE kind = 'Drama'; + postgres=# SELECT pg_current_wal_lsn(); + pg_current_wal_lsn + + 0/306EE20 + (1 row) + + Replica: Wait and read updated data + postgres=# CALL pg_wait_lsn('0/306EE20', 100); // Timeout 100ms is insufficient + ERROR: canceling waiting for LSN due to timeout + postgres=# CALL pg_wait_lsn('0/306EE20'); +
Re: Slow GRANT ROLE on PostgreSQL 16 with thousands of ROLEs
On Fri, Mar 22, 2024 at 11:54:48AM -0500, Nathan Bossart wrote: > On Fri, Mar 22, 2024 at 12:53:15PM -0400, Tom Lane wrote: >> Would you like to review the catcache patch further, or do you >> think it's good to go? > > I'll take another look this afternoon. LGTM -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Cannot find a working 64-bit integer type on Illumos
On Fri, Mar 22, 2024 at 3:31 PM Andres Freund wrote: > IME the bigger issue is that sometimes it doesn't lead to outright failures, > just to lots of stuff not being detected as supported / present. Ugh. That does, indeed, sound not very nice. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Cannot find a working 64-bit integer type on Illumos
Hi, On 2024-03-22 15:02:45 -0400, Robert Haas wrote: > On Fri, Mar 22, 2024 at 2:38 PM Andres Freund wrote: > > I wonder if we ought to make configure warn if it sees -Werror in CFLAGS - > > this is far from the first time somebody has stumbled over -Werror, including a > > few quite senior hackers, if I recall correctly. We could also just filter it out > > temporarily and put it back at the end of configure. > > I think I made this mistake at some point, but I just looked at > config.log and corrected my mistake. IME the bigger issue is that sometimes it doesn't lead to outright failures, just to lots of stuff not being detected as supported / present. Greetings, Andres Freund
Re: documentation structure
On Fri, Mar 22, 2024 at 3:17 PM Bruce Momjian wrote: > I agree and they should be with the other views. I was just explaining > why, at the time, I didn't touch them. Ah, OK. That makes total sense. -- Robert Haas EDB: http://www.enterprisedb.com
Re: documentation structure
On Fri, Mar 22, 2024 at 03:13:29PM -0400, Robert Haas wrote: > On Fri, Mar 22, 2024 at 2:59 PM Bruce Momjian wrote: > > I assume statistics collector views are in "Monitoring Database > > Activity" because that is their purpose. > > Well, yes. :-) > > But the point is that all other statistics views are in a single > section regardless of their purpose. We don't document pg_roles in the > "Database Roles" chapter, for example. > > And on the flip side, pg_locks and pg_replication_origin_status are > also for monitoring database activity, but they're in the "System > Views" chapter anyway. The only system views that are in "Monitoring > Database Activity" rather than "System Views" are the ones where the > name starts with "pg_stat_". > > So the reason you state is why these views are under "Monitoring > Database Activity" rather than a chapter chosen at random. But it > doesn't really explain why they're separate from the other system > views at all. That seems to be a pretty much random choice, AFAICT. I agree and they should be with the other views. I was just explaining why, at the time, I didn't touch them. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
Re: documentation structure
On Fri, Mar 22, 2024 at 2:59 PM Bruce Momjian wrote: > I assume statistics collector views are in "Monitoring Database > Activity" because that is their purpose. Well, yes. :-) But the point is that all other statistics views are in a single section regardless of their purpose. We don't document pg_roles in the "Database Roles" chapter, for example. And on the flip side, pg_locks and pg_replication_origin_status are also for monitoring database activity, but they're in the "System Views" chapter anyway. The only system views that are in "Monitoring Database Activity" rather than "System Views" are the ones where the name starts with "pg_stat_". So the reason you state is why these views are under "Monitoring Database Activity" rather than a chapter chosen at random. But it doesn't really explain why they're separate from the other system views at all. That seems to be a pretty much random choice, AFAICT. -- Robert Haas EDB: http://www.enterprisedb.com
Re: documentation structure
On Fri, Mar 22, 2024 at 11:19 AM Robert Haas wrote: > On Fri, Mar 22, 2024 at 1:35 PM Bruce Momjian wrote: > > But that all seems like a separate question from why we have the > statistic collector views in a completely different part of the > documentation from the rest of the system views. My guess is that it's > just kind of a historical accident, but maybe there was some other > logic to it. > > The details underpinning the cumulative statistics subsystem are definitely large enough to warrant their own subsection. And it isn't as though placing them into the monitoring chapter is wrong; aside from a couple of views, those under System Views don't fit into what we've defined as monitoring. I don't have any desire to lump them under the generic System Views chapter, which itself could probably use a level of categorization, since the nature of pg_locks and pg_cursors is decidedly different from that of pg_indexes and pg_config. This all becomes more appealing to work on once we solve the problem of all sect2 entries being placed on a single page. For a long while I would always look for pg_stat_activity under System Views instead of Monitoring. Amending my prior suggestion in light of this, I suggest we move the Cumulative Statistics Views into Reference, but as its own chapter, not part of System Views, and change its name to "Monitoring Views" (going more generalized here feels like a win to me). I'd move pg_locks, pg_cursors, pg_backend_memory_contexts, pg_prepared_*, pg_shmem_allocations, and pg_replication_*. Those all have the same general monitoring nature to them, compared to the others that basically provide details regarding schema and static or session configuration. The original server admin monitoring section can go into detail regarding cumulative statistics versus other kinds of monitoring. We can use section ordering to fulfill logical grouping desires until we are able to make sect3 entries appear on their own pages. David J.
Re: Cannot find a working 64-bit integer type on Illumos
On Fri, Mar 22, 2024 at 2:38 PM Andres Freund wrote: > I wonder if we ought to make configure warn if it sees -Werror in CFLAGS - > this is far from the first time somebody has stumbled over -Werror, including a > few quite senior hackers, if I recall correctly. We could also just filter it out > temporarily and put it back at the end of configure. I think I made this mistake at some point, but I just looked at config.log and corrected my mistake. I'm not strongly against having an explicit check for -Werror, but I think the main problem here is that the original poster didn't have a look at config.log to see what the actual problem was, and at least IME that's necessary in pretty much 100% of cases where configure fails for whatever reason. Perhaps autotools could be better designed in that regard, but we don't necessarily want to work around every problem that can stem from that design choice in our code, either. -- Robert Haas EDB: http://www.enterprisedb.com
Re: documentation structure
On Fri, Mar 22, 2024 at 02:19:29PM -0400, Robert Haas wrote: > If you were actually looking for the section called "System Views", > you weren't likely to see it here unless you already knew it was > there, because it was 64 items into a 97-item list. Having one of > these two sections inside the other just doesn't work at all. We could > have alternatively chosen to have one chapter with two tags > inside of it, but I think what you actually did was perfectly fine. > IMHO, "System Views" is important enough (and big enough) that giving > it its own chapter is perfectly reasonable. > > But that all seems like a separate question from why we have the > statistic collector views in a completely different part of the > documentation from the rest of the system views. My guess is that it's > just kind of a historical accident, but maybe there was some other > logic to it. I assume statistics collector views are in "Monitoring Database Activity" because that is their purpose. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
Re: Large block sizes support in Linux
On Thu, Mar 21, 2024 at 06:46:19PM +0100, Pankaj Raghav (Samsung) wrote: > Hello, > > My team and I have been working on adding Large block size (LBS) > support to XFS in Linux[1]. Once this feature lands upstream, we will be > able to create XFS with FS block size > page size of the system on Linux. > We also gave a talk about it in Linux Plumbers conference recently[2] > for more context. The initial support is only for XFS but more FSs will > follow later. > > On an x86_64 system, fs block size was limited to 4k, but traditionally > Postgres uses 8k as its default internal page size. With LBS support, > fs block size can be set to 8K, thereby matching the Postgres page size. > > If the file system block size == DB page size, then Postgres can have > guarantees that a single DB page will be written as a single unit during > kernel write back and not split. > > My knowledge of Postgres internals is limited, so I'm wondering if there > are any optimizations or potential optimizations that Postgres could > leverage once we have LBS support on Linux? We have discussed this in the past, and in fact in the early years we thought we didn't need fsync since the BSD file system was 8k at the time. What we later realized is that we have no guarantee that the file system will write to the device in the specified block size, and even if it does, the I/O layers between the OS and the device might not, since many devices use 512 byte blocks or other sizes. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
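For concreteness, the setup being proposed would presumably look something like this once the kernel and xfsprogs support lands; the device and mount point below are placeholders.

    # Create an XFS file system whose block size matches Postgres's 8K page
    # size; on current kernels this fails when the system page size is 4K.
    mkfs.xfs -b size=8192 /dev/sdX
    mount /dev/sdX /var/lib/postgresql

    # Confirm the file system block size.
    xfs_info /var/lib/postgresql | grep bsize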
Re: pg_upgrade --copy-file-range
On Fri, Mar 22, 2024 at 1:22 PM Tomas Vondra wrote: > Right, this will happen: > > pg_combinebackup: error: unable to use accelerated copy when manifest > checksums are to be calculated. Use --no-manifest > > Are you saying we should just silently override the copy method and do > the copy block by block? Yes. > I'm not strongly opposed to that, but it feels > wrong to just ignore that the user explicitly requested cloning, and I'm > not sure why this should be different from any other case when the user > requests an incompatible combination of options and/or options that are not > supported on the current configuration. I don't feel like copying block-by-block when that's needed to compute a checksum is ignoring what the user requested. I mean, if we'd had to perform reconstruction rather than copying an entire file, we would have done that regardless of whether --clone had been there, and I don't see why the need-to-compute-a-checksum case is any different. I think we should view a flag like --clone as specifying how to copy a file when we don't need to do anything but copy it. I don't think it should dictate that we're not allowed to perform other processing when that other processing is required. From my point of view, this is not a case of incompatible options having been specified. If you run pg_basebackup with both --format=p and --format=t, those are incompatible options; the backup can be done one way or the other, but not both at once. But here there's no such conflict. Block-by-block copying and fast-copying can happen as part of the same operation, as in the example that I showed, where some files need the block-by-block copying and some can be fast-copied. The user is entitled to specify which fast-copying method they would like to have used for the files where fast-copying is possible without getting a failure just because it isn't possible for every single file. Or to say it the other way around, if there's 1 file that needs to be copied block by block and 99 files that can be fast-copied, you want to force the user to the block-by-block method for all 100 files. I want to force the use of the block-by-block method for the 1 file where that's the only valid method, and let them choose what they want to do for the other 99. -- Robert Haas EDB: http://www.enterprisedb.com
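A sketch of the per-file dispatch Robert is describing; all names here are invented for illustration and are not the actual pg_combinebackup code.

    typedef enum
    {
        COPY_METHOD_BLOCKS,          /* default: read and write block by block */
        COPY_METHOD_CLONE,           /* --clone */
        COPY_METHOD_COPY_FILE_RANGE  /* --copy-file-range */
    } CopyMethod;

    typedef enum
    {
        CHECKSUM_TYPE_NONE,
        CHECKSUM_TYPE_CRC32C,
        CHECKSUM_TYPE_SHA224
    } ChecksumType;

    extern void copy_file_blocks(const char *src, const char *dst, ChecksumType t);
    extern void clone_file(const char *src, const char *dst);
    extern void copy_file_range_whole(const char *src, const char *dst);

    static void
    copy_one_file(const char *src, const char *dst,
                  CopyMethod requested, ChecksumType checksum_type)
    {
        if (checksum_type != CHECKSUM_TYPE_NONE)
        {
            /*
             * Every block must be read to compute the checksum anyway, so
             * the requested fast-copy method simply does not apply here.
             */
            copy_file_blocks(src, dst, checksum_type);
            return;
        }

        switch (requested)
        {
            case COPY_METHOD_CLONE:
                clone_file(src, dst);
                break;
            case COPY_METHOD_COPY_FILE_RANGE:
                copy_file_range_whole(src, dst);
                break;
            default:
                copy_file_blocks(src, dst, CHECKSUM_TYPE_NONE);
                break;
        }
    }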
Re: Cannot find a working 64-bit integer type on Illumos
Hi, On 2024-03-22 13:25:56 -0400, Tom Lane wrote: > The short answer is that Autoconf is not designed to support -Werror > and it's not worth it to try to make it do so. I wonder if we ought to make configure warn if it sees -Werror in CFLAGS - this is far from the first time somebody has stumbled over -Werror, including a few quite senior hackers, if I recall correctly. We could also just filter it out temporarily and put it back at the end of configure. I don't think there's a great way of making the autoconf buildsystem use -Werror continually, today. IIRC the best way is to use Makefile.custom. Greetings, Andres Freund
Re: documentation structure
On Fri, Mar 22, 2024 at 1:35 PM Bruce Momjian wrote: > > One question I have is why all of these views are documented here > > rather than in chapter 53, "System Views," because surely they are > > system views. I feel like if our documentation index weren't a mile > > long and if you could easily find the entry for "System Views," that's > > where you would naturally look for these details. I don't think it's > > natural for a user to expect that most of the system views are going > > to be documented in section VII, chapter 53 but one particular kind is > > going to be documented in section III, chapter 27, under a chapter > > Well, until this commit in 2022, the system views were _under_ the > system catalogs chapter: Even before that commit, the statistics collector views were documented in a completely separate part of the documentation from all of the other system views. I think that commit was a good idea, even though it made the top-level documentation index bigger, because in v14, the "System Catalogs" chapter looks like this: ... 52.61. pg_ts_template 52.62. pg_type 52.63. pg_user_mapping 52.64. System Views 52.65. pg_available_extensions 52.66. pg_available_extension_versions 52.67. pg_backend_memory_contexts ... If you were actually looking for the section called "System Views", you weren't likely to see it here unless you already knew it was there, because it was 64 items into a 97-item list. Having one of these two sections inside the other just doesn't work at all. We could have alternatively chosen to have one chapter with two tags inside of it, but I think what you actually did was perfectly fine. IMHO, "System Views" is important enough (and big enough) that giving it its own chapter is perfectly reasonable. But that all seems like a separate question from why we have the statistic collector views in a completely different part of the documentation from the rest of the system views. My guess is that it's just kind of a historical accident, but maybe there was some other logic to it. -- Robert Haas EDB: http://www.enterprisedb.com
Re: [PATCH] plpython function causes server panic
On Fri, Mar 22, 2024 at 1:52 PM Tom Lane wrote: > Robert Haas writes: > > I agree with the general direction. A few comments: > > > - Isn't it redundant to test if IsInParallelMode() || > > IsParallelWorker()? We can't be in a parallel worker without also > > being in parallel mode, except during the worker startup sequence. > > Hmm. The existing code in AssignTransactionId and > CommandCounterIncrement tests both, so I figured that the conservative > course was to make DefineSavepoint and friends test both. Are you > saying AssignTransactionId and CommandCounterIncrement are wrong? > If you're saying you don't believe that these routines are reachable > during parallel worker start, that could be true, but I'm not sure > I want to make that assumption. In any case, surely the xxxSavepoint > routines are not hot enough to make it an interesting > micro-optimization. (Perhaps it is worthwhile in AssignTransactionId > and CCI, but changing those seems like a job for another patch.) Yeah, that's all fair enough. I went back and looked at the history of this and found 94b4f7e2a635c3027a23b07086f740615b56aa64. -- Robert Haas EDB: http://www.enterprisedb.com
Re: [PATCH] plpython function causes server panic
Robert Haas writes: > I agree with the general direction. A few comments: > - Isn't it redundant to test if IsInParallelMode() || > IsParallelWorker()? We can't be in a parallel worker without also > being in parallel mode, except during the worker startup sequence. Hmm. The existing code in AssignTransactionId and CommandCounterIncrement tests both, so I figured that the conservative course was to make DefineSavepoint and friends test both. Are you saying AssignTransactionId and CommandCounterIncrement are wrong? If you're saying you don't believe that these routines are reachable during parallel worker start, that could be true, but I'm not sure I want to make that assumption. In any case, surely the xxxSavepoint routines are not hot enough to make it an interesting micro-optimization. (Perhaps it is worthwhile in AssignTransactionId and CCI, but changing those seems like a job for another patch.) regards, tom lane
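For reference, the guard under discussion has this general shape; the following is a sketch of the xact.c pattern, not the exact committed code.

    #include "postgres.h"
    #include "access/parallel.h"
    #include "access/xact.h"

    void
    DefineSavepointSketch(const char *name)
    {
        /*
         * The belt-and-suspenders test: IsInParallelMode() covers the
         * leader (and workers once their state is synchronized), while
         * IsParallelWorker() also covers a worker during its startup
         * sequence, which is the case Tom doesn't want to assume away.
         */
        if (IsInParallelMode() || IsParallelWorker())
            ereport(ERROR,
                    (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
                     errmsg("cannot define savepoints during a parallel operation")));

        /* ... normal savepoint bookkeeping would follow ... */
    }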
Re: [DOC] Add detail regarding resource consumption wrt max_connections
On Fri, Mar 8, 2024 at 9:52 AM Robert Treat wrote: > I'm of the opinion that advice suggesting DBAs set things to DEBUG3 > is unfriendly at best. If you really want to add more, there is an > existing unfriendly section of the docs at > https://www.postgresql.org/docs/devel/kernel-resources.html#LINUX-MEMORY-OVERCOMMIT > that mentions this problem, specifically: > > "If PostgreSQL itself is the cause of the system running out of > memory, you can avoid the problem by changing your configuration. In > some cases, it may help to lower memory-related configuration > parameters, particularly shared_buffers, work_mem, and > hash_mem_multiplier. In other cases, the problem may be caused by > allowing too many connections to the database server itself. In many > cases, it may be better to reduce max_connections and instead make use > of external connection-pooling software." > > I couldn't really find a spot to add in your additional info, but > maybe you can find a spot that fits? Or maybe a well-written > walk-through of this would make for a good wiki page in case people > really want to dig in. > > In any case, I think Roberto's original language is an improvement > over what we have now, so I'd probably recommend just going with that, > along with a similar note to max_prepared_xacts, and optionally a > pointer to the shared mem section of the docs. I agree with this. I don't agree with Cary's statement that if you increase max_connections you should increase shared_buffers as well. That seems situation-dependent to me, and it's also missing Roberto's point, which is that JUST increasing max_connections without doing anything else uses more shared memory. Similarly, I don't think we need to document a detailed testing procedure, as proposed by Reid. If users want to know exactly how many additional resources are used, they can test; either using the DEBUG3 approach, or perhaps more simply via the pg_shmem_allocations view. But I think it's overkill for us to recommend any specific testing procedure here. Rather, I think that it's entirely appropriate to do what Roberto suggested, which is to say, let users know that they're going to use some extra resources if they increase the setting, and then let them figure out what if anything they want to do about that. -- Robert Haas EDB: http://www.enterprisedb.com
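For users who do want to measure the effect rather than follow a prescribed procedure, the view mentioned above makes the comparison straightforward:

    -- Largest shared-memory allocations under the current settings; re-run
    -- after restarting with a different max_connections and compare.
    SELECT name, pg_size_pretty(allocated_size) AS allocated
    FROM pg_shmem_allocations
    ORDER BY allocated_size DESC
    LIMIT 10;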
Re: Proposal for implementing OCSP Stapling in PostgreSQL
On Tue, Mar 5, 2024 at 4:12 PM David Zhang wrote: > Any comments or feedback would be greatly appreciated! Hi David -- I haven't had time to get to this for the 17 release cycle, but I'm interested in this feature and I intend to review it at some point for 18. I think OCSP will be especially helpful for anyone relying on sslrootcert=system. Thanks for working on it! --Jacob
Re: Add minimal C example and SQL registration example for custom table access methods.
On Fri, Jan 26, 2024 at 3:03 PM Fabrízio de Royes Mello wrote: > On Wed, Nov 15, 2023 at 8:29 PM Roberto Mello wrote: > > Suggestion: > > > > In the C example you added you mention in the comment: > > > > + /* Methods from TableAmRoutine omitted from example, but all > > + non-optional ones must be provided here. */ > > > > Perhaps you could provide a "see " to point the reader finding your > > example where he could find these non-optional methods he must provide? > > > > Nitpicking a little: your patch appears to change more lines than it does, > > because it added line breaks earlier in the lines. I would generally avoid > > that unless there's good reason to do so. > > Hey folks, > > There is a previous patch [1] around the same topic. What about joining > efforts on pointing these documentation changes to the proposed test module? > > [1] https://commitfest.postgresql.org/46/4588/ Looking over this thread, I see that it was moved from pgsql-docs to pgsql-hackers while at the same time dropping the original poster from the Cc list. That seems rather unfortunate. I suspect there's a pretty good chance that Phil Eaton hasn't seen any of the replies other than the first one from Paul Jungwirth, which is also the only one that didn't ask for anything to be changed. Re-adding Phil. Phil, you should have a look over https://www.postgresql.org/message-id/flat/CAByiw%2Br%2BCS-ojBDP7Dm%3D9YeOLkZTXVnBmOe_ajK%3Den8C_zB3_g%40mail.gmail.com and respond to the various emails and probably update the patch somehow. Note that feature freeze is in 2 weeks, so if we can't reach agreement on what is to be done here soon, this will have to wait for the next cycle, or later. -- Robert Haas EDB: http://www.enterprisedb.com
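For anyone picking this thread back up, the minimal example under discussion has roughly the following shape; this is a sketch based on the patch's description, and the "mem" names are placeholders.

    #include "postgres.h"
    #include "fmgr.h"
    #include "access/tableam.h"

    PG_MODULE_MAGIC;

    static const TableAmRoutine mem_methods = {
        .type = T_TableAmRoutine,
        /*
         * Callbacks omitted from the example, but every non-optional member
         * of TableAmRoutine (see src/include/access/tableam.h) must be
         * provided here.
         */
    };

    PG_FUNCTION_INFO_V1(mem_tableam_handler);

    Datum
    mem_tableam_handler(PG_FUNCTION_ARGS)
    {
        PG_RETURN_POINTER(&mem_methods);
    }

paired with the SQL registration:

    CREATE FUNCTION mem_tableam_handler(internal)
        RETURNS table_am_handler
        AS 'MODULE_PATHNAME', 'mem_tableam_handler'
        LANGUAGE C STRICT;

    CREATE ACCESS METHOD mem TYPE TABLE HANDLER mem_tableam_handler;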
Re: Adding comments to help understand psql hidden queries
On Fri, Mar 22, 2024 at 11:39 AM David Christensen wrote: > I think it's easier to keep the widths balanced than constant (patch > version included here) Yeah, I'm fine with that, especially because nobody is translating it, nor are they likely to, to be honest. Cheers, Greg
Re: documentation structure
On Fri, Mar 22, 2024 at 08:32:14AM -0400, Robert Haas wrote: > On Thu, Mar 21, 2024 at 6:32 PM David G. Johnston > wrote: > > Just going to note that the section on the cumulative statistics views > > being a single page is still a strongly bothersome issue here. Though the > > quick fix actually requires upgrading the section to chapter status... > > Yeah, I've been bothered by this in the past, too. I'm not very keen > to start promoting things to the top-level, though. I think we need a > more thoughtful fix than that. > > One question I have is why all of these views are documented here > rather than in chapter 53, "System Views," because surely they are > system views. I feel like if our documentation index weren't a mile > long and if you could easily find the entry for "System Views," that's > where you would naturally look for these details. I don't think it's > natural for a user to expect that most of the system views are going > to be documented in section VII, chapter 53 but one particular kind is > going to be documented in section III, chapter 27, under a chapter Well, until this commit in 2022, the system views were _under_ the system catalogs chapter:

commit 64d364bb39c
Author: Bruce Momjian
Date: Thu Jul 14 16:07:12 2022 -0400

doc: move system views section to its own chapter

Previously it was inside the system catalogs chapter.

Reported-by: Peter Smith
Discussion: https://postgr.es/m/cahut+psmc18qp60d+l0hjboxrlqt5m88yvacdyxlq34gfph...@mail.gmail.com
Backpatch-through: 15

The thread contains more discussion of the issue, and I think it still needs help: https://www.postgresql.org/message-id/flat/CAHut%2BPsMc18QP60D%2BL0hJBOXrLQT5m88yVaCDyxLq34gfPHsow%40mail.gmail.com -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
Re: Simplify documentation related to Windows builds
On Tue, Jan 30, 2024 at 3:02 AM Michael Paquier wrote: > The more I think about this thread, the more I'd tend to wipe out most > of "windows-requirements" for the sole reason that it is the far-west > regarding the various ways it is possible to get the dependencies we > need for the build and at runtime. We could keep it minimal with the > set of requirements we are listing under meson in terms of versions: > https://www.postgresql.org/docs/devel/install-requirements.html I'm not very knowledgeable about building software on Windows in general, but on the rare occasions that I've done it, it was MUCH harder to figure out where to get things like Perl than it is on Linux or macOS machines. On Linux, your package manager probably knows about everything you need, and if it doesn't, you can probably fix that by adding an additional RPM repository to your configuration or using something like CPAN to find Perl modules that your OS package manager doesn't have. On macOS, you can install homebrew or macports and then get most things from there. But on Windows you have to go download installers individually for everything you need, and there's lots of installers on the Internet, and not all of them are prepared by equally friendly people, and not all of them necessarily work for building PostgreSQL. So I think that it's pretty darn helpful to have some installation instructions in the documentation for stuff like this, just like I think it's useful that in the documentation we tell people how to get the doc toolchain working on various platforms. I understand the concern about seeming to endorse particular Perl distributions or other software bundles, but I also don't like the idea of telling people something that boils down to "hey, it's possible to get this to compile on Windows, and we know some methods that do work, but we're not going to tell you what they are because we don't want to endorse anything so ... good luck!". If we know a set of things that work, I think we should list them in the documentation and just say that we're not endorsing the use of these particular distributions but just telling you that we've tested with them. And then I think we should update that as things change. -- Robert Haas EDB: http://www.enterprisedb.com
Re: make dist using git archive
On Thu Mar 21, 2024 at 3:44 AM CDT, Peter Eisentraut wrote: Here is an updated version of this patch set. You should add 'disabler: true' to the git find_program in Meson. With the way your patch is currently written, if Git doesn't exist on the system the targets would still be defined, even though they would never succeed. You may also want to make sure that we are actually in a Git repository; I don't think git-archive works outside one. Re the autocrlf, is this something we could throw in a .gitattributes file? I have removed the "dirty check" stuff. It didn't really work well/was buggy under meson, and it failed mysteriously on the Linux CI tasks. So let's just drop that functionality for now. I have also added a more complete commit message and some more code comments. Meson has its own distribution building command (meson dist), but we are not using that at this point. The main problem is that the way they have implemented it, it is not deterministic in the above sense. Also, we want a "make" version for the time being. But the target name "dist" in meson is reserved for that reason, so we call the custom target "pgdist" (so call something like "meson compile -C build pgdist"). I would suggest poisoning `meson dist` in the following way:

if not meson.is_subproject()
  # Maybe edit the comment... Maybe tell perl to print this message
  # instead and then exit non-zero?
  #
  # Meson has its own distribution building command (meson dist), but we
  # are not using that at this point. The main problem is that the way
  # they have implemented it, it is not deterministic in the above sense.
  # Also, we want a "make" version for the time being. But the target
  # name "dist" in meson is reserved for that reason, so we call the
  # custom target "pgdist" (so call something like "meson compile -C build
  # pgdist").
  #
  # We don't poison the dist if we are a subproject because it is
  # possible that the parent project may want to create a dist using
  # the builtin Meson method.
  meson.add_dist_script(perl, '-e', 'exit 1')
endif

I have extracted the freebsd CI script fix into a separate patch (0002). I think this is useful even if we don't take the full CI patch (0003). 0002 looks pretty reasonable to me. About the 0003 patch: It seems useful in principle to test these things continuously. The dist script runs about 10 seconds in each task, and takes a bit of disk space for the artifacts. I'm not sure to what degree this might bother someone. 0003 works for me :). -- Tristan Partin Neon (https://neon.tech)
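For context, the determinism being relied on is roughly the following; this is a sketch, and the real targets also derive the prefix from the version and can emit checksums.

    # Given the same commit, these produce byte-identical output on any
    # machine, which is what makes a git-archive-based "make dist"
    # reproducible; gzip -n omits the embedded name and timestamp.
    git archive --format=tar --prefix=postgresql-17devel/ HEAD \
        | gzip -n > postgresql-17devel.tar.gz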
Re: DOCS: add helpful partitioning links
On Thu, Mar 21, 2024 at 7:27 AM Ashutosh Bapat wrote: > On Wed, Mar 20, 2024 at 5:22 PM Ashutosh Bapat > wrote: >> On Tue, Mar 19, 2024 at 6:38 PM Robert Treat wrote: >>> >>> >>> I've put it in the next commitfest with target version of 17, and I've >>> added you as a reviewer :-) >>> >> >> Thanks. >> >>> >>> Also, attached is an updated patch with your change above which should >>> apply cleanly to the current git master. >> >> >> It did apply for me now. >> >> The HTML renders good, the links work as expected. >> >> The CREATE TABLE ... LIKE command >> I think the original word "option" instead of "command" is better since we >> are talking about LIKE as an option to CREATE TABLE instead of CREATE TABLE >> command. >> >> + but any future attached or created partitions will be indexed as well. >> >> I think just "any future partitions will be indexed as well" would suffice, >> no need to mention whether they were created or attached. >> >> + One limitation when creating new indexes on partitioned tables is that >> it >> + is not possible to use the CONCURRENTLY >> + qualifier when creating such a partitioned index. To avoid long >> >> The sentence uses two different phrases, "indexes on partitioned tables" and >> "partitioned index", for the same thing in the same sentence. Probably it is >> better to leave original sentence as is. >> >> But I think it's time for a committer to take a look at this. Please feel >> free to address the above comments if you agree with them. Marking this as >> ready for committer. >> > > The patch changes things not directly related to $Subject. It will be good to > add a commit message to the patch describing what are those changes about. I > observe that all of them are in section "partition maintenance". > https://www.postgresql.org/docs/16/ddl-partitioning.html#DDL-PARTITIONING-DECLARATIVE-MAINTENANCE. > Do you see any more edits required in that section? > Heh, well, I had thought about some other possible improvements to that section but hadn't quite worked them out, but you inspired me to have another go of it ;-) v5 patch attached which I think further improves clarity/brevity of this section. I've left the patch name the same for simplicity, but I'd agree that the commit would now be more along the lines of editing / improvements / copyediting of "Partition Maintenance" docs. Robert Treat https://xzilla.net improve-partition-links_v5.patch Description: Binary data
Re: Built-in CTYPE provider
On Fri, 2024-03-22 at 15:51 +0100, Peter Eisentraut wrote: > I think this might be too big of a compatibility break. So far, > initcap('123abc') has always returned '123abc'. If the new collation > returns '123Abc' now, then that's quite a change. These are not some > obscure Unicode special case characters, after all. It's a new collation, so I'm not sure it's a compatibility break. But you are right that it is against documentation and expectations for INITCAP(). > What is the ICU configuration incantation for this? Maybe we could > have > the builtin provider understand some of that, too. https://unicode-org.github.io/icu-docs/apidoc/dev/icu4c/stringoptions_8h.html#a4975f537b9960f0330b233061ef0608d https://unicode-org.github.io/icu-docs/apidoc/dev/icu4c/stringoptions_8h.html#afc65fa226cac9b8eeef0e877b8a7744e > Or we should create a function separate from initcap. If we create a new function, that also gives us the opportunity to accept optional arguments to control the behavior rather than relying on collation for every decision. Regards, Jeff Davis
Re: Cannot find a working 64-bit integer type on Illumos
Japin Li writes: > On Sat, 23 Mar 2024 at 00:53, Andres Freund wrote: >> Likely there's an unrelated warning triggering the configure test to >> fail. We'd need to see config.log to see what that is. > Thanks for your quick reply. Attach the config.log. Yup:

conftest.c:139:5: error: no previous prototype for 'does_int64_work' [-Werror=missing-prototypes]
  139 | int does_int64_work()
      |     ^~~
cc1: all warnings being treated as errors
configure:17003: $? = 1
configure: program exited with status 1

This warning is harmless normally, but breaks the configure probe if you enable -Werror. No doubt we could improve that test snippet so that it does not trigger that warning. But trying to make configure safe for -Werror seems like a fool's errand, for these reasons:

* Do you really want to try to make all of configure's probes proof against every compiler warning everywhere?

* Many of the test snippets aren't readily under our control, as they are supplied by Autoconf.

* In the majority of cases, any such failures would be silent, as configure would just conclude that the feature it is probing for isn't there. So even finding there's a problem would be difficult.

The short answer is that Autoconf is not designed to support -Werror and it's not worth it to try to make it do so. regards, tom lane
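For what it's worth, that particular probe could be made -Wmissing-prototypes-clean by declaring the function (or simply making it static), along these lines; this is a sketch, not the actual Autoconf snippet.

    /* A prototype (or "static") avoids -Wmissing-prototypes in the probe. */
    static int does_int64_work(void);

    static int
    does_int64_work(void)
    {
        /* ... the existing 64-bit arithmetic checks would go here ... */
        return 1;
    }

    int
    main(void)
    {
        /* configure treats exit status 0 as success */
        return !does_int64_work();
    }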
Re: Cannot find a working 64-bit integer type on Illumos
On Sat, 23 Mar 2024 at 01:04, Tom Lane wrote: > Japin Li writes: >> When I try to configure PostgreSQL 16.2 on Illumos using the following >> command, >> it complains $subject. > >> ./configure --enable-cassert --enable-debug --enable-nls --with-perl \ >> --with-python --without-tcl --without-gssapi --with-openssl \ >> --with-ldap --with-libxml --with-libxslt --without-systemd \ >> --with-readline --enable-thread-safety --enable-dtrace \ >> DTRACEFLAGS=-64 CFLAGS=-Werror > >> However, if I remove the `CFLAGS=-Werror`, it works fine. >> I'm not sure what happened here. > > CFLAGS=-Werror breaks a whole lot of configure's tests, not only that > one. (We even have this documented, see [1].) So you can't inject > -Werror that way. What I do on my buildfarm animals is the equivalent > of > > export COPT='-Werror' > > after configure and before build. I think configure pays no attention > to COPT, so it'd likely be safe to keep that set all the time, but in > the buildfarm client it's just as easy to be conservative. > > regards, tom lane > > [1] https://www.postgresql.org/docs/devel/install-make.html#CONFIGURE-ENVVARS Thank you very much! I didn't notice this part before.
Re: pg_upgrade --copy-file-range
On 3/22/24 17:42, Robert Haas wrote: > On Fri, Mar 22, 2024 at 10:40 AM Tomas Vondra > wrote: >> There's one question, though. As it stands, there's a bit of asymmetry >> between handling CopyFile() on WIN32 and the clone/copy_file_range on >> other platforms. On WIN32, we simply switch to CopyFile >> automatically if we don't need to calculate a checksum. But with the >> other methods, we error out if the user requests those and we need to >> calculate the checksum. > > That seems completely broken. copy_file() needs to have the ability to > calculate a checksum if one is required; when one isn't required, it > can do whatever it likes. So we should always fall back to the > block-by-block method if we need a checksum. Whatever option the user > specified should only be applied when we don't need a checksum. > > Consider, for example: > > pg_basebackup -D sunday -c fast --manifest-checksums=CRC32C > pg_basebackup -D monday -c fast --manifest-checksums=SHA224 > --incremental sunday/backup_manifest > pg_combinebackup sunday monday -o tuesday --manifest-checksums=CRC32C --clone > > Any files that are copied in their entirety from Sunday's backup can > be cloned, if we have support for cloning. But any files copied from > Monday's backup will need to be re-checksummed, since the checksum > algorithms don't match. With what you're describing, it sounds like > pg_combinebackup would just fail in this case; I don't think that's > the behavior that the user is going to want. > Right, this will happen: pg_combinebackup: error: unable to use accelerated copy when manifest checksums are to be calculated. Use --no-manifest Are you saying we should just silently override the copy method and do the copy block by block? I'm not strongly opposed to that, but it feels wrong to just ignore that the user explicitly requested cloning, and I'm not sure why this should be different from any other case when the user requests an incompatible combination of options and/or options that are not supported on the current configuration. Why not just tell the user to use the correct parameters, i.e. either remove --clone or add --no-manifest? FWIW I now realize it actually fails a bit earlier than I thought - when parsing the options, not in copy_file. But then some checks (if a given copy method is supported) happen in the copy functions. I wonder if it'd be better/possible to do all of that in one place, not sure. Also, the message only suggests to use --no-manifest. It probably should suggest removing --clone too. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: psql not responding to SIGINT upon db reconnection
On Fri, Mar 22, 2024 at 1:05 PM Tristan Partin wrote: > Sorry for taking a while to get back to y'all. I have taken your > feedback into consideration for v9. This is my first time writing > Postgres docs, so I'm ready for another round of editing :). Yeah, that looks like it needs some wordsmithing yet. I can take a crack at that at some point, but here are a few notes: - "takes care of" and "working through the state machine" seem quite vague to me. - the meanings of forRead and forWrite don't seem to be documented. - "retuns" is a typo. > Robert, could you point out some places where comments would be useful > in 0002? I did rename the function and moved it as suggested, thanks! In > turn, I also realized I forgot a prototype, so also added it. Well, for starters, I'd give the function a header comment. Telling me that a 1 second timeout is a 1 second timeout is less useful than telling me why we've picked a 1 second timeout. Presumably the answer is "so we can respond to cancel interrupts in a timely fashion", but I'd make that explicit. It might be worth a comment saying that PQsocket() shouldn't be hoisted out of the loop because it could be a multi-host connection string and the socket might change under us. Unless that's not true, in which case we should hoist the PQsocket() call out of the loop. I think it would also be worth a comment saying that we don't need to report errors here, as the caller will do that; we just need to wait until the connection is known to have either succeeded or failed, or until the user presses cancel. -- Robert Haas EDB: http://www.enterprisedb.com
Re: psql not responding to SIGINT upon db reconnection
On Fri Mar 22, 2024 at 9:59 AM CDT, Robert Haas wrote: On Wed, Jan 31, 2024 at 1:07 PM Tristan Partin wrote: > I was looking for documentation of PQsocket(), but didn't find any > standalone (unless I completely missed it). So I just copied how > PQsocket() is documented in PQconnectPoll(). I am happy to document it > separately if you think it would be useful. As Jelte said back at the end of January, I think you just completely missed it. The relevant part of libpq.sgml starts like this: PQsocketPQsocket As far as I know, we document all of the exported libpq functions in that SGML file. I think there's no real reason why we couldn't get at least 0001 and maybe also 0002 into this release, but only if you move quickly on this. Feature freeze is approaching rapidly. Modulo the documentation changes, I think 0001 is pretty much ready to go. 0002 needs comments. I'm also not so sure about the name process_connection_state_machine(); it seems a little verbose. How about something like wait_until_connected()? And maybe put it below do_connect() instead of above. Robert, Jelte: Sorry for taking a while to get back to y'all. I have taken your feedback into consideration for v9. This is my first time writing Postgres docs, so I'm ready for another round of editing :). Robert, could you point out some places where comments would be useful in 0002? I did rename the function and moved it as suggested, thanks! In turn, I also realized I forgot a prototype, so also added it. This patchset has also been rebased on master, so the libpq export number for PQsocketPoll was bumped. -- Tristan Partin Neon (https://neon.tech) From 7f8bf7250ecf79f65c129ccc42643c36bc53f882 Mon Sep 17 00:00:00 2001 From: Tristan Partin Date: Tue, 30 Jan 2024 15:40:57 -0600 Subject: [PATCH v9 1/2] Expose PQsocketPoll for frontends PQsocketPoll should help with state machine processing when connecting to a database asynchronously via PQconnectStart(). --- doc/src/sgml/libpq.sgml | 38 +++- src/interfaces/libpq/exports.txt | 1 + src/interfaces/libpq/fe-misc.c | 7 +++--- src/interfaces/libpq/libpq-fe.h | 4 4 files changed, 45 insertions(+), 5 deletions(-) diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml index d3e87056f2c..10f191f6b88 100644 --- a/doc/src/sgml/libpq.sgml +++ b/doc/src/sgml/libpq.sgml @@ -262,6 +262,39 @@ PGconn *PQsetdb(char *pghost, + + PQsocketPollPQsocketPoll + + + nonblocking connection + Poll a connections underlying socket descriptor retrieved with . + +int PQsocketPoll(int sock, int forRead, int forWrite, time_t end_time); + + + + + This function takes care of the setup for monitoring a file descriptor. The underlying + function is either poll(2) or select(2), + depending on platform support. The primary use of this function is working through the state + machine instantiated by . + + + + The function returns a value greater than 0 if the specified condition + is met, 0 if a timeout occurred, or -1 if an error + or interrupt occurred. In the event forRead and + forWrite are not set, the function immediately retuns a timeout + condition. + + + + end_time is the time in the future in seconds starting from the UNIX + epoch in which you would like the function to return if the condition is not met. 
+ + + + PQconnectStartParamsPQconnectStartParams PQconnectStartPQconnectStart @@ -358,7 +391,10 @@ PostgresPollingStatusType PQconnectPoll(PGconn *conn); Loop thus: If PQconnectPoll(conn) last returned PGRES_POLLING_READING, wait until the socket is ready to read (as indicated by select(), poll(), or - similar system function). + similar system function). Note that PQsocketPoll + can help reduce boilerplate by abstracting the setup of + select(2) or poll(2) if it is + available on your system. Then call PQconnectPoll(conn) again. Conversely, if PQconnectPoll(conn) last returned PGRES_POLLING_WRITING, wait until the socket is ready diff --git a/src/interfaces/libpq/exports.txt b/src/interfaces/libpq/exports.txt index 9fbd3d34074..1e48d37677d 100644 --- a/src/interfaces/libpq/exports.txt +++ b/src/interfaces/libpq/exports.txt @@ -202,3 +202,4 @@ PQcancelSocket199 PQcancelErrorMessage 200 PQcancelReset 201 PQcancelFinish202 +PQsocketPoll 203 diff --git a/src/interfaces/libpq/fe-misc.c b/src/interfaces/libpq/fe-misc.c index f2fc78a481c..f562cd8d344 100644 --- a/src/interfaces/libpq/fe-misc.c +++ b/src/interfaces/libpq/fe-misc.c @@ -55,7 +55,6 @@ static int pqPutMsgBytes(const void *buf, size_t len, PGconn *conn); static int pqSendSome(PGconn *conn, int len); static int pqSocketCheck(PGconn *conn, int forRead, int forWrite,
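As a usage sketch of the API the patch adds (it assumes the patched libpq; error handling and cancel checking are trimmed):

    #include <time.h>
    #include "libpq-fe.h"

    /* Drive an asynchronous connection attempt using PQsocketPoll. */
    static PGconn *
    connect_async(const char *conninfo)
    {
        PGconn *conn = PQconnectStart(conninfo);
        PostgresPollingStatusType status = PGRES_POLLING_WRITING;

        while (conn != NULL &&
               status != PGRES_POLLING_OK &&
               status != PGRES_POLLING_FAILED)
        {
            /*
             * Re-fetch the socket each iteration: with a multi-host
             * connection string it can change between PQconnectPoll calls.
             */
            int sock = PQsocket(conn);
            int rc = PQsocketPoll(sock,
                                  status == PGRES_POLLING_READING,  /* forRead */
                                  status == PGRES_POLLING_WRITING,  /* forWrite */
                                  time(NULL) + 1);  /* absolute 1-second deadline */

            if (rc < 0)
                break;              /* error or interrupt */
            if (rc > 0)
                status = PQconnectPoll(conn);
            /* rc == 0 is a timeout: a caller could test for cancellation here */
        }
        return conn;
    }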
Re: Cannot find a working 64-bit integer type on Illumos
Japin Li writes: > When I try to configure PostgreSQL 16.2 on Illumos using the following > command, > it complains $subject. > ./configure --enable-cassert --enable-debug --enable-nls --with-perl \ > --with-python --without-tcl --without-gssapi --with-openssl \ > --with-ldap --with-libxml --with-libxslt --without-systemd \ > --with-readline --enable-thread-safety --enable-dtrace \ > DTRACEFLAGS=-64 CFLAGS=-Werror > However, if I remove the `CFLAGS=-Werror`, it works fine. > I'm not sure what happened here. CFLAGS=-Werror breaks a whole lot of configure's tests, not only that one. (We even have this documented, see [1].) So you can't inject -Werror that way. What I do on my buildfarm animals is the equivalent of export COPT='-Werror' after configure and before build. I think configure pays no attention to COPT, so it'd likely be safe to keep that set all the time, but in the buildfarm client it's just as easy to be conservative. regards, tom lane [1] https://www.postgresql.org/docs/devel/install-make.html#CONFIGURE-ENVVARS
Re: documentation structure
On Fri, Mar 22, 2024, 09:32 Robert Haas wrote: > I notice that you say that the "Installation" section should "cover > the architectural overview and point people to where they can find the > stuff they need to install PostgreSQL in the various ways available to > them" so maybe you're not imagining a four-sentence chapter, either. Fair point, but I posit that new users are looking for a chapter named Installation in the documentation, at least the ones willing to read documentation. Having two of them isn't needed, but having zero doesn't make sense either. The current proposal keeps exactly one, so I'm OK with it as-is, but it can be further improved by moving the source-install material elsewhere and having the Installation chapter redirect the reader there for details. I'm not concerned with how long or short the resulting installation chapter is. David J.
Re: Slow GRANT ROLE on PostgreSQL 16 with thousands of ROLEs
On Fri, Mar 22, 2024 at 12:53:15PM -0400, Tom Lane wrote: > Would you like to review the catcache patch further, or do you > think it's good to go? I'll take another look this afternoon. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Slow GRANT ROLE on PostgreSQL 16 with thousands of ROLEs
Nathan Bossart writes: > On Fri, Mar 22, 2024 at 11:27:46AM -0400, Tom Lane wrote: >> * Do we want to risk back-patching any of this, to fix the performance >> regression in v16? I think that the OP's situation is a pretty >> narrow one, but maybe he's not the only person who managed to dodge >> roles_is_member_of's performance issues in most other cases. > I've heard complaints about performance with many roles before, so I > certainly think this area is worth optimizing. As far as back-patching > goes, my current feeling is that the hash table is probably pretty safe and > provides the majority of the benefit, but anything fancier should probably > be reserved for v17 or v18. Yeah. Although both the catcache and list_append_unique_oid bits are O(N^2), the catcache seems to have a much bigger constant factor --- when I did a "perf" check on the unpatched code, I saw catcache eating over 90% of the runtime and list_member_oid about 2%. So let's fix that part in v16 and call it a day. It should be safe to back-patch the catcache changes as long as we put the new fields at the end of the struct and leave cc_lists present but empty. Would you like to review the catcache patch further, or do you think it's good to go? regards, tom lane
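To make Tom's back-patching point concrete, here is a hypothetical sketch of the struct shape being described: the existing cc_lists field stays where it is (but is left empty), and the new hash-table fields go at the end. The stand-in typedef and the names of the appended fields are illustrative, not the committed ones:

/* Stand-in so this sketch is self-contained; in PostgreSQL this comes
 * from lib/ilist.h. */
typedef struct dlist_head
{
    void       *head;
} dlist_head;

typedef struct catcache
{
    /* ... all pre-existing fields, unchanged and in the same order ... */
    dlist_head  cc_lists;       /* retained for ABI compatibility; kept empty */

    /* new fields appended at the end, as required for a safe v16 back-patch */
    int         cc_nlbuckets;   /* number of hash buckets for CatCLists */
    dlist_head *cc_lbucket;     /* hash buckets for CatCLists */
} CatCache;

Extensions compiled against the old layout never look past the old end of the struct, which is why appending fields is safe where reordering them would not be.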
Re: Cannot find a working 64-bit integer type on Illumos
Hi, On 2024-03-23 00:48:05 +0800, Japin Li wrote: > When I try to configure PostgreSQL 16.2 on Illumos using the following > command, > it complains $subject. > > ./configure --enable-cassert --enable-debug --enable-nls --with-perl \ > --with-python --without-tcl --without-gssapi --with-openssl \ > --with-ldap --with-libxml --with-libxslt --without-systemd \ > --with-readline --enable-thread-safety --enable-dtrace \ > DTRACEFLAGS=-64 CFLAGS=-Werror > > However, if I remove the `CFLAGS=-Werror`, it works fine. Likely there's an unrelated warning triggering the configure test to fail. We'd need to see config.log to see what that is. Greetings, Andres Freund
Re: pgsql: Allow using syncfs() in frontend utilities.
On Wed, Sep 6, 2023 at 7:28 PM Nathan Bossart wrote: > Allow using syncfs() in frontend utilities. > > This commit allows specifying a --sync-method in several frontend > utilities that must synchronize many files to disk (initdb, > pg_basebackup, pg_checksums, pg_dump, pg_rewind, and pg_upgrade). > On Linux, users can specify "syncfs" to synchronize the relevant > file systems instead of calling fsync() for every single file. In > many cases, using syncfs() is much faster. > > As with recovery_init_sync_method, this new option comes with some > caveats. The descriptions of these caveats have been moved to a > new appendix section in the documentation. Hi, I'd like to complain about this commit's addition of a new appendix. I do understand the temptation to document caveats like this centrally instead of in multiple places, but as I've been complaining about over in the "documentation structure" thread, our top-level documentation index is too big, and I feel strongly that we need to de-clutter it rather than cluttering it further. This added a new chapter which is just 5 sentences long. I understand that this was done because the same issue applies to a bunch of different utilities and we didn't want to duplicate this text in all of those places, but I feel like this approach just doesn't scale. If we did this in every place where we have this much text that we want to avoid duplicating, we'd soon have hundreds of appendixes. What I would suggest we do instead is pick one of the places where this comes up and document it there, perhaps the recovery_init_sync_method GUC. And then make the documentation for the other say something like, you know those issues we documented for recovery_init_sync_method? Well they also apply to this. -- Robert Haas EDB: http://www.enterprisedb.com
Cannot find a working 64-bit integer type on Illumos
Hi, hackers, When I try to configure PostgreSQL 16.2 on Illumos using the following command, it complains $subject. ./configure --enable-cassert --enable-debug --enable-nls --with-perl \ --with-python --without-tcl --without-gssapi --with-openssl \ --with-ldap --with-libxml --with-libxslt --without-systemd \ --with-readline --enable-thread-safety --enable-dtrace \ DTRACEFLAGS=-64 CFLAGS=-Werror However, if I remove the `CFLAGS=-Werror`, it works fine. I'm not sure what happened here. $ uname -a SunOS db_build 5.11 hunghu-20231216T132436Z i86pc i386 i86pc illumos $ gcc --version gcc (GCC) 10.4.0 Copyright (C) 2020 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Re: pg_upgrade --copy-file-range
On Fri, Mar 22, 2024 at 10:40 AM Tomas Vondra wrote: > There's one question, though. As it stands, there's a bit of asymmetry > between handling CopyFile() on WIN32 and the clone/copy_file_range > handling on other platforms. On WIN32, we simply switch to CopyFile > automatically if we don't need to calculate a checksum. But with the > other methods, we error out if the user requests those and we need to > calculate the checksum. That seems completely broken. copy_file() needs to have the ability to calculate a checksum if one is required; when one isn't required, it can do whatever it likes. So we should always fall back to the block-by-block method if we need a checksum. Whatever option the user specified should only be applied when we don't need a checksum. Consider, for example: pg_basebackup -D sunday -c fast --manifest-checksums=CRC32C pg_basebackup -D monday -c fast --manifest-checksums=SHA224 --incremental sunday/backup_manifest pg_combinebackup sunday monday -o tuesday --manifest-checksums=CRC32C --clone Any files that are copied in their entirety from Sunday's backup can be cloned, if we have support for cloning. But any files copied from Monday's backup will need to be re-checksummed, since the checksum algorithms don't match. With what you're describing, it sounds like pg_combinebackup would just fail in this case; I don't think that's the behavior that the user is going to want. -- Robert Haas EDB: http://www.enterprisedb.com
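The decision Robert describes can be sketched as follows; all type and function names here are illustrative stand-ins rather than pg_combinebackup's actual symbols, and the stub prototypes stand in for the real copy routines:

#include <stddef.h>

typedef enum
{
    COPY_METHOD_COPY,           /* plain block-by-block read/write */
    COPY_METHOD_CLONE,          /* e.g. ioctl(FICLONE) */
    COPY_METHOD_COPY_RANGE      /* copy_file_range(2) */
} CopyMethod;

static void copy_file_blocks(const char *src, const char *dst, void *checksum_ctx);
static void copy_file_clone(const char *src, const char *dst);
static void copy_file_by_range(const char *src, const char *dst);

static void
copy_file(const char *src, const char *dst, CopyMethod method, void *checksum_ctx)
{
    /*
     * If a checksum is required, we must read every block anyway, so fall
     * back to the block-by-block method regardless of the requested one.
     */
    if (checksum_ctx != NULL)
    {
        copy_file_blocks(src, dst, checksum_ctx);
        return;
    }

    /* No checksum needed: honor whatever method the user requested. */
    switch (method)
    {
        case COPY_METHOD_CLONE:
            copy_file_clone(src, dst);
            break;
        case COPY_METHOD_COPY_RANGE:
            copy_file_by_range(src, dst);
            break;
        default:
            copy_file_blocks(src, dst, NULL);
            break;
    }
}

With this shape, the mixed-checksum incremental case above just works: files needing re-checksumming take the block-by-block path, and everything else can be cloned.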
Re: Slow GRANT ROLE on PostgreSQL 16 with thousands of ROLEs
On Fri, Mar 22, 2024 at 11:27:46AM -0400, Tom Lane wrote: > Yeah, that's about what I'd expect: hash+bloom ought to remove > most (not quite all) of the opportunity for simd to shine, because > the bloom filter should eliminate most of the list_member_oid calls. Right. IMHO the SIMD work is still worth considering because there are probably even more extreme cases where it'll make a decent amount of difference. Plus, that stuff is pretty low overhead for what you get in return. That being said, the hash table and Bloom filter should definitely be the higher priority. > * Same question for the bloom logic, but here I think it's mostly > a matter of tuning those constants. I suspect there might be some regressions just after the point where we construct the filter, but I'd still expect that to be a reasonable trade-off. We could probably pretty easily construct some benchmarks to understand the impact with a given number of roles. (I'm not sure I'll be able to get to that today.) > * Do we want to risk back-patching any of this, to fix the performance > regression in v16? I think that the OP's situation is a pretty > narrow one, but maybe he's not the only person who managed to dodge > roles_is_member_of's performance issues in most other cases. I've heard complaints about performance with many roles before, so I certainly think this area is worth optimizing. As far as back-patching goes, my current feeling is that the hash table is probably pretty safe and provides the majority of the benefit, but anything fancier should probably be reserved for v17 or v18. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: documentation structure
On Fri, Mar 22, 2024 at 11:50 AM David G. Johnston wrote: > On Fri, Mar 22, 2024 at 7:10 AM Robert Haas wrote: >> That's actually what we had in chapter >> 18, "Installation from Source Code on Windows", since removed. But for >> some reason we decided that on non-Windows platforms, it needed a >> whole new chapter rather than an extra sentence in the existing one. I >> think that's massively overkill. > > I agree with the premise that we should have a single chapter, in the main > documentation flow, named "Installation". It should cover the architectural > overview and point people to where they can find the stuff they need to > install PostgreSQL in the various ways available to them. I agree with > moving the source installation material to the appendix. None of the > sections under Installation would then actually detail how to install the > software since that isn't something the project itself handles but has > delegated to packagers for the vast majority of cases and the source install > details are in the appendix for the one "supported" mechanism that most > people do not use. Hmm, that's not quite the same as my position. I'm fine with either moving the installation from source material to an appendix, or leaving it where it is. But I'm strongly against continuing to have a chapter with four sentences in it that says to use the same download link that is on the main navigation bar of every page on the postgresql.org web site. We're never going to get the chapter index down to a reasonable size if we insist on having chapters that have a totally de minimis amount of content. So my feeling is that if we keep the installation from source material where it is, then we can make it also mention the download link, just as we used to do in the installation-on-windows chapter. But if we banish installation from source to the appendixes, then we shouldn't keep a whole chapter in the main documentation to tell people something that is anyway obvious. I don't really think that material needs to be there at all, but if we want to have it, surely we can find someplace to put it such that it doesn't require a whole chapter to say that and nothing else. It could for example go at the beginning of the "Server Setup and Operation" chapter, for instance; if that were the first chapter of section III, I think that would be natural enough. I notice that you say that the "Installation" section should "cover the architectural overview and point people to where they can find the stuff they need to install PostgreSQL in the various ways available to them" so maybe you're not imagining a four-sentence chapter, either. But this project is going to be impossible unless we stick to limited goals. We can, and should, rewrite some sections of the documentation to be more useful; but if we try to do that as part of the same project that aims to tidy up the index, the chances of us getting stuck in an endless bikeshedding loop go from "high" to "certain". So I don't want to hypothesize the existence of an installation chapter that isn't any of the things we have today. Let's try to get the things we have into places that make sense, and then consider other improvements separately. -- Robert Haas EDB: http://www.enterprisedb.com
Re: sublink [exists (select xxx group by grouping sets ())] causes an assertion error
"=?UTF-8?B?6LW15bqt5rW3KOW6reeroCk=?=" writes: > I recently notice these sql can lead to a assertion error in pg14 and older > version. Here is an example: > postgres=> CREATE TABLE t1 (a int); > postgres=> INSERT INTO t1 VALUES (1); > postgres=> SELECT EXISTS ( SELECT * FROM t1 GROUP BY GROUPING SETS ((a), > generate_series (1, 262144)) ) AS result; > server closed the connection unexpectedly In current v14 this produces: TRAP: FailedAssertion("!lt->writing || lt->buffer == NULL", File: "logtape.c", Line: 1279, PID: 928622) Thanks for the report. I did some bisecting and found that the crash appears at Jeff's commit c8aeaf3ab (which introduced this assertion) and disappears at Heikki's c4649cce3 (which removed it). So I would say that the problem is "this assertion is wrong", and we should fix the problem by fixing the assertion, not by hacking around in distant calling code. On the whole, since this code has been dead for several versions, I'd be inclined to just remove the assertion. I think it's quite risky because of the possibility that we reach this function during post-transaction-abort cleanup, when there's no very good reason to assume that the tapeset's been closed down cleanly. (To be clear, that's not what's happening in the given test case ... but I fear that it could.) regards, tom lane
Re: [PATCH] plpython function causes server panic
Robert Haas writes: > On Fri, Dec 29, 2023 at 12:56 PM Tom Lane wrote: >> Here's a draft patch along this line. Basically the idea is that >> subtransactions used for error control are now legal in parallel >> mode (including in parallel workers) so long as they don't try to >> acquire their own XIDs. I had to clean up some error handling >> in xact.c, but really this is a pretty simple patch. > I agree with the general direction. A few comments: Thanks for looking at this! I was hoping you'd review it, because I thought there was a pretty significant chance that I'd missed some fundamental reason it couldn't work. I feel better now about it being worth pursuing. I consider the patch draft quality at this point: I didn't spend much effort on docs or comments, and none on test cases. I'll work on those issues and come back with a v2. regards, tom lane
Re: documentation structure
On Fri, Mar 22, 2024 at 7:10 AM Robert Haas wrote: > > That's actually what we had in chapter > 18, "Installation from Source Code on Windows", since removed. But for > some reason we decided that on non-Windows platforms, it needed a > whole new chapter rather than an extra sentence in the existing one. I > think that's massively overkill. > > I agree with the premise that we should have a single chapter, in the main documentation flow, named "Installation". It should cover the architectural overview and point people to where they can find the stuff they need to install PostgreSQL in the various ways available to them. I agree with moving the source installation material to the appendix. None of the sections under Installation would then actually detail how to install the software since that isn't something the project itself handles but has delegated to packagers for the vast majority of cases and the source install details are in the appendix for the one "supported" mechanism that most people do not use. David J.
Re: MIN/MAX functions for a record
Exactly, Tom. I see no fundamental problem preventing this from being implemented, since the comparison operator is already implemented. In fact, MIN/MAX should work for all types for which a comparison operator is defined. Regarding index support, there should not be an issue if the index is defined for the record (e.g. `CREATE INDEX ON my_table(ROW(field_a, field_b))`). However, such indexes do not seem to be supported. I'm not sure whether a composite index would be compatible with a record built from the indexed fields in every edge case... Aleksander, rewriting the year-month example is easy, but how would you rewrite this query? CREATE TABLE events(event_time TIMESTAMP, message VARCHAR, user_id VARCHAR); You want the newest message for each user. It's easy with MAX(record): SELECT user_id, MAX(ROW(event_time, message)).message FROM events GROUP BY user_id; One option is to rewrite it as a subquery with LIMIT 1: SELECT user_id, (SELECT message FROM events e2 WHERE e1.user_id=e2.user_id ORDER BY event_time DESC LIMIT 1) FROM events e1 GROUP BY user_id; If your query already has multiple levels of grouping, multiple joins, UNIONs etc., it gets much more complex. I also wonder if the optimizer would pick the same plan as it would if MAX(record) were supported. Viliam On Fri, Mar 22, 2024 at 4:12 PM Tom Lane wrote: > Aleksander Alekseev writes: > >> In my queries I often need to do MIN/MAX for tuples, for example: > >> SELECT MAX(row(year, month)) > >> FROM (VALUES(2025, 1), (2024,2)) x(year, month); > >> This query throws: > >> ERROR: function max(record) does not exist > >> Was this ever discussed or is there something preventing the > implementation? > > > I believe it would be challenging to implement max(record) that would > > work reasonably well in a general case. > > As long as you define it as "works the same way record comparison > does", ie base it on record_cmp(), I don't think it would be much > more than a finger exercise [*]. And why would you want it to act > any differently from record_cmp()? Those semantics have been > established for a long time. > > regards, tom lane > > [*] Although conceivably there are some challenges in getting > record_cmp's caching logic to work in the context of an aggregate. >
Re: SQL:2011 application time
On 22.03.24 01:35, Paul Jungwirth wrote: >> 1. In ri_triggers.c ri_KeysEqual, you swap the order of arguments to ri_AttributesEqual(): >> >> - if (!ri_AttributesEqual(riinfo->ff_eq_oprs[i], RIAttType(rel, attnums[i]), - oldvalue, newvalue)) + if (!ri_AttributesEqual(eq_opr, RIAttType(rel, attnums[i]), + newvalue, oldvalue)) >> >> But the declared arguments of ri_AttributesEqual() are oldvalue and newvalue, so passing them >> backwards is really confusing. And the change does matter in the tests. >> >> Can we organize this better? > I renamed the params and actually the whole function. All it does is execute `oldvalue op newvalue`, casting if necessary. So I changed it to ri_CompareWithCast and added some documentation. In an earlier version of this patch I had a separate function for the PERIOD comparison, but it's just doing the same thing, so I think the best thing is to give the function a more accurate name and use it. OK, I see now, and the new explanation is better. But after reading the comment in the function about collations, I think there could be trouble. As long as we are only comparing for equality (and we don't support nondeterministic global collations), then we can use any collation to compare for equality. But if we are doing contained-by, then the collation does matter, so we would need to get the actual collation somehow. So as written, this might not always work correctly. I think it would be safer for now if we just kept using the equality operation even for temporal foreign keys. If we did that, then in the case that you update a key to a new value that is contained by the old value, this function would say "not equal" and fire all the checks, even though it wouldn't need to. This is kind of similar to the "false negatives" that the comment already talks about. What do you think?
Re: Add Index-level REINDEX with multiple jobs
On Wed, Mar 20, 2024 at 7:19 PM Alexander Korotkov wrote: > On Mon, Mar 11, 2024 at 3:44 PM Maxim Orlov wrote: > > On Tue, 6 Feb 2024 at 09:22, Michael Paquier wrote: > >> The problem may be actually trickier than that, no? Could there be > >> other factors to take into account for their classification, like > >> their sizes (typically, we'd want to process the biggest one first, I > >> guess)? > > > > > > Sorry for a late reply. Thanks for an explanation. This is sounds > > reasonable to me. > > Svetlana had addressed this in the patch v2. > > I think this patch is a nice improvement. But it doesn't seem to be > implemented in the right way. There is no guarantee that > get_parallel_object_list() will return tables in the same order as > indexes. Especially when there is "ORDER BY idx.relpages". Also, > sort_indices_by_tables() has quadratic complexity (probably OK since > input list shouldn't be too lengthy) and a bit awkward. > > I've revised the patchset. Now appropriate ordering is made in SQL > query. The original list of indexes is modified to match the list of > tables. The tables are ordered by the size of its greatest index, > within table indexes are ordered by size. > > I'm going to further revise this patch, mostly comments and the commit > message. Here goes the revised patch. I'm going to push this if there are no objections. -- Regards, Alexander Korotkov v4-0001-Add-the-index-level-REINDEX-with-multiple-jobs.patch Description: Binary data
Re: [PATCH] plpython function causes server panic
On Fri, Dec 29, 2023 at 12:56 PM Tom Lane wrote: > Here's a draft patch along this line. Basically the idea is that > subtransactions used for error control are now legal in parallel > mode (including in parallel workers) so long as they don't try to > acquire their own XIDs. I had to clean up some error handling > in xact.c, but really this is a pretty simple patch. I agree with the general direction. A few comments: - Isn't it redundant to test if IsInParallelMode() || IsParallelWorker()? We can't be in a parallel worker without also being in parallel mode, except during the worker startup sequence. - I don't think the documentation changes are entirely accurate. The whole point of the patch is to allow parallel workers to make changes to the transaction state, but the documentation says you can't. Maybe we should just delete "change the transaction state" entirely from the list of things that you're not allowed to do, since "write to the database" is already listed separately; or maybe we should replace it with something like "assign new transaction IDs or command IDs," although that's kind of low-level. I don't think we should just delete the "even temporarily" bit, as you've done. - While I like the new comments in BeginInternalSubTransaction(), I think the changes in ReleaseCurrentSubTransaction() and RollbackAndReleaseCurrentSubTransaction() need more thought. For one thing, it's got to be wildly optimistic to claim that we would have caught *anything* that's forbidden in parallel mode; that would require solving the halting problem. I'd rather have no comment at all here than one making such an ambitious claim, and I think that might be a fine way to go. But if we do have a comment, I think it should be more narrowly focused e.g. "We do not check for parallel mode here. It's permissible to start and end subtransactions while in parallel mode, as long as no new XIDs or command IDs are assigned." One additional thing that might (or might not) be worth mentioning or checking for here is that the leader shouldn't try to reduce the height of the transaction state stack to anything less than what it was when the parallel operation started; if it wants to do that, it needs to clean up the parallel workers and exit parallel mode first. Similarly a worker shouldn't ever end the toplevel transaction except during backend cleanup. -- Robert Haas EDB: http://www.enterprisedb.com
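For readers less familiar with this machinery, here is a minimal sketch, modeled on plpgsql's exception-block handling, of the subtransaction-based error control that the patch would permit in parallel mode so long as no new XIDs or command IDs are assigned. The helper and its callback are hypothetical; this is not code from the patch:

#include "postgres.h"
#include "access/xact.h"
#include "utils/memutils.h"
#include "utils/resowner.h"

static void
try_with_subxact(void (*risky_fn) (void))
{
    MemoryContext oldcontext = CurrentMemoryContext;
    ResourceOwner oldowner = CurrentResourceOwner;

    BeginInternalSubTransaction(NULL);
    /* Run the risky code in the caller's memory context, not the subxact's */
    MemoryContextSwitchTo(oldcontext);

    PG_TRY();
    {
        risky_fn();

        /* Success: commit the subtransaction and restore state */
        ReleaseCurrentSubTransaction();
        MemoryContextSwitchTo(oldcontext);
        CurrentResourceOwner = oldowner;
    }
    PG_CATCH();
    {
        /* A real caller would typically CopyErrorData() here to keep the error */
        FlushErrorState();

        /* Abort the subtransaction and restore the caller's state */
        RollbackAndReleaseCurrentSubTransaction();
        MemoryContextSwitchTo(oldcontext);
        CurrentResourceOwner = oldowner;
    }
    PG_END_TRY();
}

As long as risky_fn() never assigns an XID or command ID, nothing here changes the transaction state in a way a parallel worker couldn't cope with, which is the invariant the patch relies on.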
Re: Adding comments to help understand psql hidden queries
On Fri, Mar 22, 2024 at 9:47 AM Greg Sabino Mullane wrote: > > On Thu, Mar 21, 2024 at 6:20 PM Peter Eisentraut wrote: >> >> lines are supposed to align vertically. With your patch, the first line >> would have variable length depending on the command. > > > Yes, that is a good point. Aligning those would be quite tricky, what if we > just kept a standard width for the closing query? Probably the 24 stars we > currently have to match "QUERY", which it appears nobody has changed for > translation purposes yet anyway. (If I am reading the code correctly, it > would be up to the translators to maintain the vertical alignment). I think it's easier to keep the widths balanced than constant (patch version included here), but if we needed to squeeze the opening string to a standard width that would be possible without too much trouble. The internal comment strings seem to be a bit more of a pain if we wanted all of the comments to be the same width, as we'd need a table or something so we can compute the longest string width, etc; doesn't seem worth the convolutions IMHO. No changes to Greg's patch, just keeping 'em both so cfbot stays happy. David v3-0002-Add-output-of-the-command-that-got-us-here-to-the.patch Description: Binary data v3-0001-Include-SQL-comments-on-echo-hidden-output.patch Description: Binary data
Re: Slow GRANT ROLE on PostgreSQL 16 with thousands of ROLEs
Nathan Bossart writes: > On Fri, Mar 22, 2024 at 09:47:39AM -0500, Nathan Bossart wrote: >> hash hash+simd hash+simd+bloom >> create 1.27 1.27 1.28 >> grant 0.18 0.11 0.03 > For just hash+bloom, I'm seeing 1.29 and 0.04. Yeah, that's about what I'd expect: hash+bloom ought to remove most (not quite all) of the opportunity for simd to shine, because the bloom filter should eliminate most of the list_member_oid calls. Possibly we could fix that small regression in the create phase with more careful tuning of the magic constants in the bloom logic? Although I'd kind of expect that the create step doesn't ever invoke the bloom filter, else it would have been showing a performance problem before; so this might not be a good test case for helping us tune those. I think remaining questions are: * Is there any case where the new hash catcache logic could lose measurably? I kind of doubt it, because we were already computing the hash value for list searches; so basically the only overhead is one more palloc per cache during the first list search. (If you accumulate enough lists to cause a rehash, you're almost surely well into winning territory.) * Same question for the bloom logic, but here I think it's mostly a matter of tuning those constants. * Do we want to risk back-patching any of this, to fix the performance regression in v16? I think that the OP's situation is a pretty narrow one, but maybe he's not the only person who managed to dodge roles_is_member_of's performance issues in most other cases. regards, tom lane
Re: [DOC] Introducing Quick Start Guide to PL/pgSQL and PL/Python Documentation
On Fri, Dec 22, 2023 at 12:04 PM Pavel Stehule wrote: > I have not a strong opinion about it. My personal experience is so 99% PL > code is PLpgSQL, so it can be there, and other PL can be referenced there. I > am not sure if there is some common part for all PL. After reading over this thread, it seems clear to me that there is no consensus to proceed with this patch in its current form, and the discussion seems to have stalled. Accordingly, I've marked this "Returned with Feedback" in the CommitFest. Ishaan, if you plan to rework this into a form which might be acceptable given the review comments made up until now, please feel free to change this back to "Waiting on Author", and/or move it to a future CommitFest. -- Robert Haas EDB: http://www.enterprisedb.com
Re: MIN/MAX functions for a record
Aleksander Alekseev writes: >> In my queries I often need to do MIN/MAX for tuples, for example: >> SELECT MAX(row(year, month)) >> FROM (VALUES(2025, 1), (2024,2)) x(year, month); >> This query throws: >> ERROR: function max(record) does not exist >> Was this ever discussed or is there something preventing the implementation? > I believe it would be challenging to implement max(record) that would > work reasonably well in a general case. As long as you define it as "works the same way record comparison does", ie base it on record_cmp(), I don't think it would be much more than a finger exercise [*]. And why would you want it to act any differently from record_cmp()? Those semantics have been established for a long time. regards, tom lane [*] Although conceivably there are some challenges in getting record_cmp's caching logic to work in the context of an aggregate.
Re: hot updates and fillfactor
Thanks for your explanation and for the links On Tue, Mar 19, 2024 at 11:17 AM Aleksander Alekseev < aleksan...@timescale.com> wrote: > Hi Fabrice, > > > I do not understand why hot_updates value is not 0 for pg_database? > Given that reloptions is empty for this table that means it has a default > value of 100% > > Maybe I didn't entirely understand your question, but why would you > assume they are somehow related? > > According to the documentation [1][2]: > > pg_class.reloptions: > Access-method-specific options, as “keyword=value” strings > > pg_stat_all_tables.n_tup_hot_upd: > Number of rows HOT updated. These are updates where no successor > versions are required in indexes. > > The value of n_tup_hot_upd is not zero because there are tuples that > were HOT-updated. That's it. You can read more about HOT here [3]. > > [1]: https://www.postgresql.org/docs/current/catalog-pg-class.html > [2]: https://www.postgresql.org/docs/current/monitoring-stats.html > [3]: https://www.postgresql.org/docs/current/storage-hot.html > > -- > Best regards, > Aleksander Alekseev >
Re: MIN/MAX functions for a record
Hi, > In my queries I often need to do MIN/MAX for tuples, for example: > > SELECT MAX(row(year, month)) > FROM (VALUES(2025, 1), (2024,2)) x(year, month); > > This query throws: > > ERROR: function max(record) does not exist > > In this case you can replace it with `MAX((year||'-'||month||'-1')::date)`. > However in my case I have an event table with `event_time` and `text` > columns, I'm grouping that table by some key and want to have the text for > the newest event. I would do `MAX(ROW(event_time, text)).text`. Workarounds > for this are clumsy, e.g. with a subquery with LIMIT 1. > > The lack of this feature is kind of unexpected, because the `>` operator or > `GREATEST` function are defined for records: > > SELECT > GREATEST((2025, 1), (2024, 2)), > (2025, 1) > (2024, 2) > > Was this ever discussed or is there something preventing the implementation? I believe it would be challenging to implement max(record) that would work reasonably well in a general case. What if, for instance, one of the columns is JSONB or XML? Not to mention the fact that Postgres supports user-defined types which don't necessarily have a reasonable order. Take a point in a 2D or 3D space as an example. On top of that I doubt that the proposed query will perform well since I don't see how it could benefit from using indexes. I don't claim that this is necessarily true in your case but generally one could argue that the wrong schema is used here and instead of (year, month) pair a table should have a date/timestamp(tz) column. Personally I would choose format() function [1] in cases like this in order to play it safe. Assuming of course that the table is small and the query is not executed often. [1]: https://www.postgresql.org/docs/current/functions-string.html -- Best regards, Aleksander Alekseev
Re: psql not responding to SIGINT upon db reconnection
On Wed, Jan 31, 2024 at 1:07 PM Tristan Partin wrote: > I was looking for documentation of PQsocket(), but didn't find any > standalone (unless I completely missed it). So I just copied how > PQsocket() is documented in PQconnectPoll(). I am happy to document it > separately if you think it would be useful. As Jelte said back at the end of January, I think you just completely missed it. The relevant part of libpq.sgml starts like this: PQsocketPQsocket As far as I know, we document all of the exported libpq functions in that SGML file. I think there's no real reason why we couldn't get at least 0001 and maybe also 0002 into this release, but only if you move quickly on this. Feature freeze is approaching rapidly. Modulo the documentation changes, I think 0001 is pretty much ready to go. 0002 needs comments. I'm also not so sure about the name process_connection_state_machine(); it seems a little verbose. How about something like wait_until_connected()? And maybe put it below do_connect() instead of above. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Slow GRANT ROLE on PostgreSQL 16 with thousands of ROLEs
On Fri, Mar 22, 2024 at 09:47:39AM -0500, Nathan Bossart wrote: > hash hash+simd hash+simd+bloom > create 1.27 1.27 1.28 > grant 0.18 0.11 0.03 For just hash+bloom, I'm seeing 1.29 and 0.04. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Built-in CTYPE provider
On 21.03.24 01:13, Jeff Davis wrote: >> Are there any test cases that illustrate the word boundary changes in patch 0005? It might be useful to test those against Oracle as well. > The tests include initcap('123abc') which is '123abc' in the PG_C_UTF8 collation vs '123Abc' in PG_UNICODE_FAST. > The reason for the latter behavior is that the Unicode Default Case Conversion algorithm for toTitlecase() advances to the next Cased character before mapping to titlecase, and digits are not Cased. ICU has a configurable adjustment, and defaults in a way that produces '123abc'. I think this might be too big of a compatibility break. So far, initcap('123abc') has always returned '123abc'. If the new collation returns '123Abc' now, then that's quite a change. These are not some obscure Unicode special case characters, after all. What is the ICU configuration incantation for this? Maybe we could have the builtin provider understand some of that, too. Or we should create a function separate from initcap.
Re: Slow GRANT ROLE on PostgreSQL 16 with thousands of ROLEs
On Thu, Mar 21, 2024 at 08:59:54PM -0400, Tom Lane wrote: > Nathan Bossart writes: >> On Thu, Mar 21, 2024 at 03:40:12PM -0500, Nathan Bossart wrote: >>> On Thu, Mar 21, 2024 at 04:31:45PM -0400, Tom Lane wrote: I don't think we have any really cheap way to de-duplicate the role OIDs, especially seeing that it has to be done on-the-fly within the collection loop, and the order of roles_list is at least potentially interesting. Not sure how to make further progress without a lot of work. > >>> Assuming these are larger lists, this might benefit from optimizations >>> involving SIMD intrinsics. > >> Never mind. With the reproduction script, I'm only seeing a ~2% >> improvement with my patches. > > Yeah, you cannot beat an O(N^2) problem by throwing SIMD at it. I apparently had some sort of major brain fade when I did this because I didn't apply your hashing patch when I ran this SIMD test. With it applied, I see a speedup of ~39%, which makes a whole lot more sense to me. If I add the Bloom patch (updated with your suggestions), I get another ~73% improvement from there, and a much smaller regression in the role creation portion. hash hash+simd hash+simd+bloom create 1.27 1.27 1.28 grant 0.18 0.11 0.03 -- Nathan Bossart Amazon Web Services: https://aws.amazon.com diff --git a/src/backend/utils/adt/acl.c b/src/backend/utils/adt/acl.c index cf5d08576a..4952dc9c1e 100644 --- a/src/backend/utils/adt/acl.c +++ b/src/backend/utils/adt/acl.c @@ -36,6 +36,7 @@ #include "common/hashfn.h" #include "foreign/foreign.h" #include "funcapi.h" +#include "lib/bloomfilter.h" #include "lib/qunique.h" #include "miscadmin.h" #include "utils/acl.h" @@ -4946,6 +4947,7 @@ roles_is_member_of(Oid roleid, enum RoleRecurseType type, ListCell *l; List *new_cached_roles; MemoryContext oldctx; + bloom_filter *bf = NULL; Assert(OidIsValid(admin_of) == PointerIsValid(admin_role)); if (admin_role != NULL) @@ -5023,16 +5025,46 @@ * graph, we must test for having already seen this role. It is * legal for instance to have both A->B and A->C->B. */ - roles_list = list_append_unique_oid(roles_list, otherid); + if ((bf && bloom_lacks_element(bf, (unsigned char *) &otherid, sizeof(Oid))) || +!list_member_oid(roles_list, otherid)) + { +if (bf == NULL && list_length(roles_list) > 1000) +{ + bf = bloom_create(1024 * 1024, work_mem, 0); + foreach_oid(role, roles_list) + bloom_add_element(bf, (unsigned char *) &role, sizeof(Oid)); +} +roles_list = lappend_oid(roles_list, otherid); +if (bf) + bloom_add_element(bf, (unsigned char *) &otherid, sizeof(Oid)); + } } ReleaseSysCacheList(memlist); /* implement pg_database_owner implicit membership */ if (memberid == dba && OidIsValid(dba)) - roles_list = list_append_unique_oid(roles_list, -ROLE_PG_DATABASE_OWNER); + { + Oid dbowner = ROLE_PG_DATABASE_OWNER; + + if ((bf && bloom_lacks_element(bf, (unsigned char *) &dbowner, sizeof(Oid))) || +!list_member_oid(roles_list, dbowner)) + { +if (bf == NULL && list_length(roles_list) > 1000) +{ + bf = bloom_create(1024 * 1024, work_mem, 0); + foreach_oid(role, roles_list) + bloom_add_element(bf, (unsigned char *) &role, sizeof(Oid)); +} +roles_list = lappend_oid(roles_list, dbowner); +if (bf) + bloom_add_element(bf, (unsigned char *) &dbowner, sizeof(Oid)); + } + } } + if (bf) + bloom_free(bf); + /* * Copy the completed list into TopMemoryContext so it will persist. */
Re: Adding comments to help understand psql hidden queries
On Thu, Mar 21, 2024 at 6:20 PM Peter Eisentraut wrote: > lines are supposed to align vertically. With your patch, the first line > would have variable length depending on the command. > Yes, that is a good point. Aligning those would be quite tricky, what if we just kept a standard width for the closing query? Probably the 24 stars we currently have to match "QUERY", which it appears nobody has changed for translation purposes yet anyway. (If I am reading the code correctly, it would be up to the translators to maintain the vertical alignment). Cheers, Greg
Re: pg_upgrade --copy-file-range
Here's a patch reworked along the lines from a couple days ago. The primary goals were to add clone/copy_file_range while minimizing unnecessary disruption, and overall cleanup of the patch. I'm not saying it's committable, but I think the patch is much easier to understand. The main change is that this abandons the idea of handling all possible cases in a single function that looks like a maze of ifdefs, and instead separates each case into its own function, and the decision happens much earlier. This is pretty much exactly what pg_upgrade does, BTW. There's maybe an argument that these functions could be unified and moved to a library in src/common - I can imagine doing that, but I don't think it's required. The functions are pretty trivial wrappers, and it's not like we expect many more callers. And there's probably stuff we'd need to keep out of that library (e.g. the decision which copy/clone methods are available / should be used or error reporting). So it doesn't seem worth it, at least for now. There's one question, though. As it stands, there's a bit of asymmetry between handling CopyFile() on WIN32 and the clone/copy_file_range handling on other platforms. On WIN32, we simply switch to CopyFile automatically if we don't need to calculate a checksum. But with the other methods, we error out if the user requests those and we need to calculate the checksum. The asymmetry comes from the fact there's no option to request CopyFile on WIN32, and we feel confident it's always the right thing to do (safe, faster). We don't seem to know that for the other methods, so the user has to explicitly request those. And if the user requests --clone, for example, it'd be wrong to silently fall back to plain copy. Still, I wonder if this might cause some undesirable issues during restores. But I guess that's why we have --dry-run. This asymmetry also shows a bit in the code - the CopyFile is coded and called a bit differently from the other methods. FWIW I abandoned the approach with "strategy" and just use a switch on CopyMode enum, just like pg_upgrade does. There's a couple more smaller changes: - Addition of docs for --clone/--copy-file-range (shameless copy from pg_upgrade docs). - Removal of opt_errinfo - not only was it buggy, I think the code is actually cleaner without it. - Removal of EINTR retry condition from copy_file_range handling (this is what Thomas ended up doing for pg_upgrade while committing that part). Put together, this cuts the patch from ~40kB to ~15kB (most of this is due to the cleanup of unnecessary whitespace changes, though). I think to make this committable, this requires some review and testing, ideally on a range of platforms. One open question is how to allow testing this. For pg_upgrade we now have PG_TEST_PG_UPGRADE_MODE, which can be set to e.g. "--clone". I wonder if we should add PG_TEST_PG_COMBINEBACKUP_MODE ...
regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL CompanyFrom 7b6a6fe1b555647109caec2817f9200ac8fe9db9 Mon Sep 17 00:00:00 2001 From: Tomas Vondra Date: Tue, 19 Mar 2024 15:16:29 +0100 Subject: [PATCH v20240322] pg_combinebackup - allow using clone/copy_file_range --- doc/src/sgml/ref/pg_combinebackup.sgml | 34 + src/bin/pg_combinebackup/copy_file.c| 157 +++- src/bin/pg_combinebackup/copy_file.h| 18 ++- src/bin/pg_combinebackup/pg_combinebackup.c | 26 +++- src/bin/pg_combinebackup/reconstruct.c | 5 +- src/bin/pg_combinebackup/reconstruct.h | 5 +- 6 files changed, 202 insertions(+), 43 deletions(-) diff --git a/doc/src/sgml/ref/pg_combinebackup.sgml b/doc/src/sgml/ref/pg_combinebackup.sgml index 8a0a600c2b2..60a60e3fae6 100644 --- a/doc/src/sgml/ref/pg_combinebackup.sgml +++ b/doc/src/sgml/ref/pg_combinebackup.sgml @@ -185,6 +185,40 @@ PostgreSQL documentation + + --clone + + +Use efficient file cloning (also known as reflinks on +some systems) instead of copying files to the new cluster. This can +result in near-instantaneous copying of the data files, giving the +speed advantages of -k/--link while +leaving the old cluster untouched. + + + +File cloning is only supported on some operating systems and file +systems. If it is selected but not supported, the +pg_combinebackup run will error. At present, +it is supported on Linux (kernel 4.5 or later) with Btrfs and XFS (on +file systems created with reflink support), and on macOS with APFS. + + + + + + --copy-file-range + + +Use the copy_file_range system call for efficient +copying. On some file systems this gives results similar to +--clone, sharing physical disk blocks, while on others +it may still copy blocks, but do so via an optimized path. At present, +it is supported on
Re: Support a wildcard in backtrace_functions
On Fri, 22 Mar 2024 at 11:14, Bharath Rupireddy wrote: > A few comments: > > 1. > @@ -832,6 +849,9 @@ matches_backtrace_functions(const char *funcname) > { > const char *p; > > +if (!backtrace_functions || backtrace_functions[0] == '\0') > +return true; > + > > Shouldn't this be returning false to not log set backtrace when > backtrace_functions is not set? Am I missing something here? Empty string is considered the new wildcard, i.e. backtrace_functions filtering is not enabled if it is empty. This seemed reasonable to me since you should now disable backtraces by using log_backtrace=none; having backtrace_functions='' mean the same thing seems rather useless. I also documented this in the updated docs. > 2. > +{ > +{"log_backtrace", PGC_SUSET, LOGGING_WHAT, > +gettext_noop("Sets if logs should include a backtrace."), > +NULL > > IMV, log_backtrace, backtrace_min_level and backtrace_functions are > interlinked, so keeping all of them as DEVELOPER_OPTIONS looks fine to > me than having log_backtrace at just LOGGING_WHAT kind. Also, we must > also mark log_backtrace as GUC_NOT_IN_SAMPLE. I agree they are linked, but I don't think it's just useful for developers to be able to set log_backtrace to internal (even if we choose not to make "internal" the default). > 3. I think it's worth adding a few examples to get backtraces in docs. > For instance, what to set to get backtraces of all internal errors and > what to set to get backtraces of all ERRORs coming from a specific > function etc. Good idea, I'll try to add those later. For now your specific cases would be: log_backtrace = 'internal' (default in 0003) backtrace_functions = '' (default) backtrace_min_level = 'ERROR' (default) and log_backtrace = 'all' backtrace_functions = 'someFunc' backtrace_min_level = 'ERROR' (default) > 4. I see the support for wildcard in backtrace_functions is missing. > Is it intentionally left out? If not, can you make it part of 0003 > patch? Yes, it's intentional; see the answer to 1. > 5. The amount of backtraces generated is going to be too huge when > setting log_backtrace = 'all' and backtrace_min_level = 'debug5'. With > this setting installcheck generated 12MB worth of log and the test > took about 55 seconds (as opposed to 48 seconds without it) The point > is if these settings are misused, it can easily slow down the entire > system and fill up disk space leading to crashes eventually. This > makes a strong case for marking log_backtrace a developer only > function. I think the same argument can be made for many other GUCs that are not marked as developer options (e.g. log_min_messages). Normally we "protect" such options by using PGC_SUSET. DEVELOPER_OPTIONS is really only meant for options that are only useful for developers. > 6. In continuation to comment #5, does anyone need backtrace for > elevels like debugX and LOG etc.? What's the use of the backtrace in > such cases? I think at least WARNING and NOTICE could be useful in practice, but I agree LOG and DEBUGX seem kinda useless. But it seems kinda strange to not have them in the list, especially given it is pretty much no effort to support them too.
Re: [PATCH] Add sortsupport for range types and btree_gist
Hi Andrey, On Friday, 22.03.2024 at 18:27 +0500, Andrey M. Borodin wrote: > > Here is a rebased version of the patch. > > FWIW it would be nice at least port tests from commit that I > referenced upthread. > Nowadays we have injection points, so these tests can be much more > stable. Alright, that's a reasonable point. I will look into this. Did you see other important things missing? Changed status back to "Waiting On Author". Thanks, Bernd
Re: speed up a logical replica setup
On Fri, Mar 22, 2024 at 9:02 AM Euler Taveira wrote: > > On Thu, Mar 21, 2024, at 6:49 AM, Shlok Kyal wrote: > > There is a compilation error while building postgres with the patch > due to a recent commit. I have attached a top-up patch v32-0003 to > resolve this compilation error. > I have not updated the version of the patch as I have not made any > change in v32-0001 and v32-0002 patch. > > > I'm attaching a new version (v33) to incorporate this fix (v32-0003) into the > main patch (v32-0001). This version also includes 2 new tests: > > - refuse to run if the standby server is running > - refuse to run if the standby was promoted e.g. it is not in recovery > > The first one exercises a recent change (standby should be stopped) and the > second one covers an important requirement. > > Based on the discussion [1] about the check functions, Vignesh suggested that > it > should check both server before exiting. v33-0003 implements it. I don't have > a > strong preference; feel free to apply it. > > > [1] > https://www.postgresql.org/message-id/CALDaNm1Dg5tDRmaabk%2BZND4WF17NrNq52WZxCE%2B90-PGz5trQQ%40mail.gmail.com I had run valgrind with pg_createsubscriber to see if there were any issues. Valgrind reported the following issues: ==651272== LEAK SUMMARY: ==651272==definitely lost: 1,319 bytes in 18 blocks ==651272==indirectly lost: 1,280 bytes in 2 blocks ==651272== possibly lost: 44 bytes in 3 blocks ==651272==still reachable: 3,066 bytes in 22 blocks ==651272== suppressed: 0 bytes in 0 blocks ==651272== ==651272== For lists of detected and suppressed errors, rerun with: -s ==651272== ERROR SUMMARY: 17 errors from 17 contexts (suppressed: 0 from 0) The attached report has the details of the same. Thanks and Regards: Shubham Khanna. valgrind.log Description: Binary data
Re: UUID v7
On 20.03.24 19:08, Andrey M. Borodin wrote: > On 19 Mar 2024, at 13:55, Peter Eisentraut wrote: >> On 16.03.24 18:43, Andrey M. Borodin wrote: >>> On 15 Mar 2024, at 14:47, Aleksander Alekseev wrote: >>>> +1 to the idea. I doubt that anyone will miss it. >>> PFA v22. Changes: 1. Squashed all editorialisation by Jelte 2. Fixed my erroneous comments on using Method 2 (we are using method 1 instead) 3. Remove all traces of uuid_extract_variant() >> I have committed a subset of this for now, namely the additions of uuid_extract_timestamp() and uuid_extract_version(). These seemed mature and agreed upon. You can rebase the rest of your patch on top of that. > Great! Thank you! PFA v23 with rebase on HEAD. I have been studying the uuidv() function. I find this code extremely hard to follow. We don't need to copy all that documentation from the RFC 4122bis document. People can read that themselves. What I would like to see is easy-to-find information on what from there we are implementing. Like, - UUID version 7 - fixed-length dedicated counter - counter is 18 bits - 4 bits are initialized as zero That's more or less all I would need to know what is going on. That said, I don't understand why you say it's an 18 bit counter, when you overwrite 6 bits with variant and version. Then it's just a 12 bit counter? Which is the size of the rand_a field, so that kind of makes sense. But 12 bits is the recommended minimum, and (in this patch) we don't use sub-millisecond timestamp precision, so we should probably use more than the minimum? Also, you are initializing 4 bits (I think?) to zero to guard against counter rollovers (so it's really just an 8 bit counter?). But nothing checks against such rollovers, so I don't understand the use of that. The code could be organized better. In the not-increment_counter case, you could use two separate pg_strong_random calls: one to initialize rand_b, starting at &uuid->data[8], and one to initialize the counter. Then the former could be shared between the two branches, and the code to assign the sequence_counter to the uuid fields could also be shared. I would also prefer if the normal case (not-increment_counter) were the first if branch. Some other notes on your patch: - Your rebase duplicated the documentation of uuid_extract_timestamp and uuid_extract_version. - PostgreSQL code uses uint64 etc. instead of uint64_t etc. - It seems the added includes #include "access/xlog.h" #include "utils/builtins.h" #include "utils/datetime.h" are not needed. - The static variables sequence_counter and previous_timestamp could be kept inside the uuidv7() function.
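To make the layout questions above concrete, here is an illustrative-only sketch of filling in a UUIDv7 with a 12-bit counter occupying rand_a, per the RFC 4122bis layout (48-bit millisecond timestamp, 4-bit version, 12-bit rand_a, 2-bit variant, 62-bit rand_b). This is not the patch's code; pg_uuid_t is reduced to 16 raw bytes and the inputs are stand-ins:

#include <stdint.h>
#include <string.h>

typedef struct pg_uuid_t
{
    uint8_t     data[16];
} pg_uuid_t;

static void
uuidv7_sketch(pg_uuid_t *uuid, uint64_t unix_ts_ms, uint16_t counter,
              const uint8_t rand_b[8])
{
    /* bytes 0-5: 48-bit big-endian millisecond timestamp */
    for (int i = 0; i < 6; i++)
        uuid->data[i] = (uint8_t) (unix_ts_ms >> (40 - 8 * i));

    /* bytes 6-7: version 7 in the top nibble, 12-bit counter in rand_a */
    uuid->data[6] = (uint8_t) (0x70 | ((counter >> 8) & 0x0F));
    uuid->data[7] = (uint8_t) (counter & 0xFF);

    /* bytes 8-15: variant 0b10 in the top two bits, random bits below */
    memcpy(&uuid->data[8], rand_b, 8);
    uuid->data[8] = (uint8_t) (0x80 | (uuid->data[8] & 0x3F));
}

Writing it out this way shows why only 12 counter bits survive: the version nibble and the variant bits overwrite 6 of the 18 bits that a counter spanning rand_a and the top of rand_b would otherwise have.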
Re: sslinfo extension - add notbefore and notafter timestamps
On Fri, Mar 22, 2024 at 6:15 AM Daniel Gustafsson wrote: > While staging this to commit I realized one silly thing about it warranting > another round here. The ASN.1 timediff code can diff against *any* timestamp, > not just the UNIX epoch, so we could just pass in the postgres epoch and skip > the final subtraction since we're already correctly adjusted. This removes > the > non-overflow checked arithmetic with a simpler logic. Ah, that's much better! +1. --Jacob
Re: speed up a logical replica setup
On Fri, 22 Mar 2024 at 09:01, Euler Taveira wrote: > > On Thu, Mar 21, 2024, at 6:49 AM, Shlok Kyal wrote: > > There is a compilation error while building postgres with the patch > due to a recent commit. I have attached a top-up patch v32-0003 to > resolve this compilation error. > I have not updated the version of the patch as I have not made any > change in v32-0001 and v32-0002 patch. > > > I'm attaching a new version (v33) to incorporate this fix (v32-0003) into the > main patch (v32-0001). This version also includes 2 new tests: > > - refuse to run if the standby server is running > - refuse to run if the standby was promoted e.g. it is not in recovery > > The first one exercises a recent change (standby should be stopped) and the > second one covers an important requirement. A few comments: 1) In the error case, PQclear and PQfinish should be called: + /* Secure search_path */ + res = PQexec(conn, ALWAYS_SECURE_SEARCH_PATH_SQL); + if (PQresultStatus(res) != PGRES_TUPLES_OK) + { + pg_log_error("could not clear search_path: %s", +PQresultErrorMessage(res)); + if (exit_on_error) + exit(1); + + return NULL; + } + PQclear(res); 2) Call fflush here before calling the system command to get properly ordered console output (see the sketch below): a) Call fflush: + int rc = system(cmd_str); + + if (rc == 0) + pg_log_info("subscriber successfully changed the system identifier"); + else + pg_fatal("subscriber failed to change system identifier: exit code: %d", rc); + } b) similarly here: + pg_log_debug("pg_ctl command is: %s", pg_ctl_cmd->data); + rc = system(pg_ctl_cmd->data); + pg_ctl_status(pg_ctl_cmd->data, rc); + standby_running = true; c) similarly here: + pg_ctl_cmd = psprintf("\"%s\" stop -D \"%s\" -s", pg_ctl_path, + datadir); + pg_log_debug("pg_ctl command is: %s", pg_ctl_cmd); + rc = system(pg_ctl_cmd); + pg_ctl_status(pg_ctl_cmd, rc); + standby_running = false; 3) Call PQclear in error cases: a) Call PQclear + res = PQexec(conn, "SELECT system_identifier FROM pg_catalog.pg_control_system()"); + if (PQresultStatus(res) != PGRES_TUPLES_OK) + { + pg_log_error("could not get system identifier: %s", +PQresultErrorMessage(res)); + disconnect_database(conn, true); + } b) similarly here + if (PQntuples(res) != 1) + { + pg_log_error("could not get system identifier: got %d rows, expected %d row", +PQntuples(res), 1); + disconnect_database(conn, true); + } + c) similarly here + res = PQexec(conn, +"SELECT oid FROM pg_catalog.pg_database " +"WHERE datname = pg_catalog.current_database()"); + if (PQresultStatus(res) != PGRES_TUPLES_OK) + { + pg_log_error("could not obtain database OID: %s", +PQresultErrorMessage(res)); + disconnect_database(conn, true); + } + d) There are a few more places like this. 4) Since these are global variables, we might be able to remove many of these initializations: + /* Default settings */ + subscriber_dir = NULL; + opt.config_file = NULL; + opt.pub_conninfo_str = NULL; + opt.socket_dir = NULL; + opt.sub_port = DEFAULT_SUB_PORT; + opt.sub_username = NULL; + opt.database_names = (SimpleStringList){0}; + opt.recovery_timeout = 0; Regards, Vignesh
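On item 2, a minimal sketch of the suggested ordering fix; run_command_flushed is a hypothetical helper, not code from the patch:

#include <stdio.h>
#include <stdlib.h>

/*
 * Flush our own stdio buffers before handing the console to a child
 * process, so the child's output cannot appear ahead of ours.
 */
static int
run_command_flushed(const char *cmd_str)
{
    fflush(stdout);
    fflush(stderr);
    return system(cmd_str);
}

Each of the system() call sites listed above could then call this helper, or simply perform the two fflush() calls inline before invoking the command.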
Re: NLS for extension
On Fri, Mar 22, 2024 at 9:27 AM Ed Behn wrote: > Thank you for your guidance. It turns out the call was coming from inside > the house. The error isn't caused by Postgres. It's the library that I'm > using that reported the error. My extension passes any errors it generates as > Postgres ERRORs. Yeah, I was kind of wondering if it was something like that. But I figured it was a capital mistake to theorize in advance of the data. Glad you figured it out. -- Robert Haas EDB: http://www.enterprisedb.com
Re: documentation structure
On Fri, Mar 22, 2024 at 9:35 AM Peter Eisentraut wrote: > >> But this separation was explicitly added a few years ago, because most > >> people just want to read about the binaries. > > > > I really doubt that this is true. > > Here is the thread: > https://www.postgresql.org/message-id/flat/CABUevExRCf8waYOsrCO-QxQL50XGapMf5dnWScOXj7X%3DMXW--g%40mail.gmail.com Sorry. I didn't mean to dispute the point that the section was added a few years ago, nor the point that most people just want to read about the binaries. I am confident that both of those things are true. What I do want to dispute is that having a four-sentence chapter in the documentation index that tells people something they can find much more easily without using the documentation at all is a good plan. I agree with the concern that Magnus expressed on the thread, i.e.: > It's kind of strange that if you start your PostgreSQL journey by reading our > instructions, you get nothing useful about installing PostgreSQL from binary > packages other than "go ask somebody else about it". But I don't agree that this was the right way to address that problem. I think it would have been better to just add the download link to the existing installation chapter. That's actually what we had in chapter 18, "Installation from Source Code on Windows", since removed. But for some reason we decided that on non-Windows platforms, it needed a whole new chapter rather than an extra sentence in the existing one. I think that's massively overkill. Alternately, I think it would be reasonable to address the concern by just moving all the stuff about building from source code to an appendix, and assume people can figure out how to download the software without us needing to say anything in the documentation at all. What was weird about the state before that patch, IMHO, was that we both talked about building from source code and didn't talk about binary packages. That can be addressed either by adding a mention of binary packages, or by deemphasizing the idea of installing from source code. -- Robert Haas EDB: http://www.enterprisedb.com
Re: UUID v7
Another source: Microservices Pattern: Database per service (a service's database is private to that service).

Sergey Prokhorenko sergeyprokhore...@yahoo.com.au

On Friday, 22 March 2024 at 04:58:59 pm GMT+3, Sergey Prokhorenko wrote:

BTW: Each microservice should have its own database to ensure data isolation and independence, enabling better scalability and fault tolerance. Source: Microservices Pattern: Shared database

Sergey Prokhorenko sergeyprokhore...@yahoo.com.au

On Friday, 22 March 2024 at 04:42:20 pm GMT+3, Sergey Prokhorenko wrote:

Why not use a single UUID generator for the database table in this case, similar to autoincrement?

Sergey Prokhorenko sergeyprokhore...@yahoo.com.au

On Friday, 22 March 2024 at 03:51:20 pm GMT+3, Peter Eisentraut wrote:

On 21.03.24 16:21, Jelte Fennema-Nio wrote:
> On Wed, 20 Mar 2024 at 19:08, Andrey M. Borodin wrote:
>> Timer-based bits contribute to global sortability. But the real timers we
>> have are not even millisecond adjusted. We can hope for ~few ms variation in
>> one datacenter or in presence of atomic clocks.
>
> I think the main benefit of using microseconds would not be
> sortability between servers, but sortability between backends.

There is that, and there are also multiple backend workers for one session.
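To make the quoted trade-off concrete, here is a minimal sketch (illustrative only, not the patch under discussion) of a UUIDv7 generator that spends the 12 "rand_a" bits on sub-millisecond timestamp precision, one of the options RFC 9562 allows. That is what buys sortability between backends on the same host, while the 48-bit millisecond prefix still gives rough sortability across servers:

#include <stdint.h>
#include <stdlib.h>
#include <sys/time.h>

static void
uuidv7_sketch(uint8_t out[16])
{
	struct timeval tv;
	uint64_t	ms;
	uint32_t	frac;

	gettimeofday(&tv, NULL);
	ms = (uint64_t) tv.tv_sec * 1000 + tv.tv_usec / 1000;

	/* 48-bit big-endian millisecond timestamp */
	out[0] = ms >> 40;
	out[1] = ms >> 32;
	out[2] = ms >> 24;
	out[3] = ms >> 16;
	out[4] = ms >> 8;
	out[5] = ms;

	/* scale the leftover microseconds (0..999) into the 12 rand_a bits */
	frac = ((uint32_t) (tv.tv_usec % 1000) << 12) / 1000;
	out[6] = 0x70 | (frac >> 8);	/* version 7 + high fraction bits */
	out[7] = frac & 0xff;

	/* rand_b: illustrative only; a real generator needs a CSPRNG */
	for (int i = 8; i < 16; i++)
		out[i] = random() & 0xff;
	out[8] = (out[8] & 0x3f) | 0x80;	/* RFC 4122 variant bits */
}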
Re: documentation structure
On Thu, Mar 21, 2024 at 7:40 PM Peter Eisentraut wrote:
> I'm highly against this. If I want to read about PL/Python, why should
> I have to wade through PL/Perl and PL/Tcl?
>
> I think, abstractly, in a book, PL/Python should be a chapter of its
> own. Just like GiST should be a chapter of its own. Because they are
> self-contained topics.

On the other hand, in a book, chapters tend to be of relatively uniform length. People don't usually write a book with some chapters that are 100+ pages long, and others that are a single page, or even just a couple of sentences. I mean, I'm sure it's been done, but it's not a normal way to write a book. And I don't believe that if someone were writing a physical book about PostgreSQL from scratch, they'd ever end up with a top-level chapter that looks anything like our GiST chapter. All of the index AM chapters are quite obviously clones of each other, and they're all quite short. Surely you'd make them sections within a chapter, not entire chapters.

I do agree that PL/pgsql is more arguable. I can imagine somebody writing a book about PostgreSQL and choosing to make that topic into a whole chapter. However, I also think that people don't make decisions about what should be a chapter in a vacuum. If you've got 100 people writing a book together, which is essentially what we actually do have, and each of those people makes decisions in isolation about what is worthy of being a chapter, then you end up with exactly the kind of mess that we now have. Some chapters are long and some are short. Some are well-written and some are poorly written. Some are updated regularly and others have hardly been touched in a decade. Books have editors to straighten out those kinds of inconsistencies so that there's some uniformity to the product as a whole.

The problem with that, of course, is that it invites bike-shedding. As you say, every decision that is reflected in our documentation was made for some reason, and most of them will have been made by prominent, active committers. So discussions about how to improve things can easily bog down even when people agree on the overall goals, simply because few individual changes find consensus. I hope that doesn't happen here, because I think most people who have commented so far agree that there is a problem here and that we should try to fix it. Let's not let the perfect be the enemy of the good.

--
Robert Haas
EDB: http://www.enterprisedb.com
Re: UUID v7
BTW: Each microservice should have its own database to ensure data isolation and independence, enabling better scalability and fault tolerance. Source: Microservices Pattern: Shared database

Sergey Prokhorenko sergeyprokhore...@yahoo.com.au

On Friday, 22 March 2024 at 04:42:20 pm GMT+3, Sergey Prokhorenko wrote:

Why not use a single UUID generator for the database table in this case, similar to autoincrement?

Sergey Prokhorenko sergeyprokhore...@yahoo.com.au

On Friday, 22 March 2024 at 03:51:20 pm GMT+3, Peter Eisentraut wrote:

On 21.03.24 16:21, Jelte Fennema-Nio wrote:
> On Wed, 20 Mar 2024 at 19:08, Andrey M. Borodin wrote:
>> Timer-based bits contribute to global sortability. But the real timers we
>> have are not even millisecond adjusted. We can hope for ~few ms variation in
>> one datacenter or in presence of atomic clocks.
>
> I think the main benefit of using microseconds would not be
> sortability between servers, but sortability between backends.

There is that, and there are also multiple backend workers for one session.
Re: Using the %m printf format more
Dagfinn Ilmari Mannsåker writes:
> Michael Paquier writes:
>
>> On Wed, Mar 13, 2024 at 02:33:52PM +0100, Peter Eisentraut wrote:
>>> The 0002 patch looks sensible. It would be good to fix that, otherwise it
>>> could have some confusing outcomes in the future.
>>
>> You mean if we begin to use %m in future callers of
>> emit_tap_output_v(), hypothetically? That's a fair argument.
>
> Yeah, developers would rightfully expect to be able to use %m with
> anything that takes a printf format string. Case in point: when I was
> first doing the conversion I did change the bail() and diag() calls in
> pg_regress to %m, and only while I was preparing the patch for
> submission did I think to check the actual implementation to see if it
> was safe to do so.
>
> The alternative would be to document that you can't use %m with these
> functions, which is silly IMO, given how simple the fix is.
>
> One minor improvement I can think of is to add a comment by the
> save_errno declaration noting that it's needed in order to support %m.

Here's an updated patch that adds such a comment. I'll add it to the commitfest later today unless someone commits it before then.

> - ilmari

>From 4312d746a722582202a013c7199f5c42e88db951 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?=
Date: Mon, 11 Mar 2024 11:08:14 +
Subject: [PATCH v2] Save errno in emit_tap_output_v() and use %m in callers
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This defends against the fprintf() calls before vfprintf(…, fmt, …)
clobbering errno, thus breaking %m.
---
 src/test/regress/pg_regress.c | 84 ++-
 1 file changed, 34 insertions(+), 50 deletions(-)

diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 76f01c73f5..ea94d874b0 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -341,6 +341,14 @@ emit_tap_output_v(TAPtype type, const char *fmt, va_list argp)
 {
 	va_list		argp_logfile;
 	FILE	   *fp;
+	int			save_errno;
+
+	/*
+	 * The fprintf() calls used to output TAP protocol elements might clobber
+	 * errno, so we need to save it and restore it before vfprintf()-ing the
+	 * user's format string, in case it contains %m placeholders.
+	 */
+	save_errno = errno;
 
 	/*
 	 * Diagnostic output will be hidden by prove unless printed to stderr. The
@@ -379,9 +387,13 @@ emit_tap_output_v(TAPtype type, const char *fmt, va_list argp)
 		if (logfile)
 			fprintf(logfile, "# ");
 	}
+	errno = save_errno;
 	vfprintf(fp, fmt, argp);
 	if (logfile)
+	{
+		errno = save_errno;
 		vfprintf(logfile, fmt, argp_logfile);
+	}
 
 	/*
 	 * If we are entering into a note with more details to follow, register
@@ -492,10 +504,7 @@ make_temp_sockdir(void)
 	temp_sockdir = mkdtemp(template);
 	if (temp_sockdir == NULL)
-	{
-		bail("could not create directory \"%s\": %s",
-			 template, strerror(errno));
-	}
+		bail("could not create directory \"%s\": %m", template);
 
 	/* Stage file names for remove_temp(). Unsafe in a signal handler. */
 	UNIXSOCK_PATH(sockself, port, temp_sockdir);
@@ -616,8 +625,7 @@ load_resultmap(void)
 		/* OK if it doesn't exist, else complain */
 		if (errno == ENOENT)
 			return;
-		bail("could not open file \"%s\" for reading: %s",
-			 buf, strerror(errno));
+		bail("could not open file \"%s\" for reading: %m", buf);
 	}
 
 	while (fgets(buf, sizeof(buf), f))
@@ -1046,10 +1054,7 @@ config_sspi_auth(const char *pgdata, const char *superuser_name)
 #define CW(cond)	\
 	do { \
 		if (!(cond)) \
-		{ \
-			bail("could not write to file \"%s\": %s", \
-				 fname, strerror(errno)); \
-		} \
+			bail("could not write to file \"%s\": %m", fname); \
 	} while (0)
 
 	res = snprintf(fname, sizeof(fname), "%s/pg_hba.conf", pgdata);
@@ -1064,8 +1069,7 @@ config_sspi_auth(const char *pgdata, const char *superuser_name)
 	hba = fopen(fname, "w");
 	if (hba == NULL)
 	{
-		bail("could not open file \"%s\" for writing: %s",
-			 fname, strerror(errno));
+		bail("could not open file \"%s\" for writing: %m", fname);
 	}
 	CW(fputs("# Configuration written by config_sspi_auth()\n", hba) >= 0);
 	CW(fputs("host all all 127.0.0.1/32 sspi include_realm=1 map=regress\n",
@@ -1079,8 +1083,7 @@ config_sspi_auth(const char *pgdata, const char *superuser_name)
 	ident = fopen(fname, "w");
 	if (ident == NULL)
 	{
-		bail("could not open file \"%s\" for writing: %s",
-			 fname, strerror(errno));
+		bail("could not open file \"%s\" for writing: %m", fname);
 	}
 	CW(fputs("# Configuration written by config_sspi_auth()\n", ident) >= 0);
@@ -1210,7 +1213,7 @@ spawn_process(const char *cmdline)
 	pid = fork();
 	if (pid == -1)
 	{
-		bail("could not fork: %s", strerror(errno));
+		bail("could not fork: %m");
 	}
 	if (pid == 0)
 	{
@@ -1226,7 +1229,7 @@ spawn_process(const char *cmdline)
 	cmdline2 = psprintf("exec %s", cmdline);
 	execl(shellprog, shellprog, "-c", cmdline2, (char *) NULL);
 	/*
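To illustrate the failure mode the new comment guards against, here is a minimal standalone sketch (not part of the patch; note that %m in the printf family is a glibc extension, which PostgreSQL's own snprintf also provides):

#include <errno.h>
#include <stdio.h>

int
main(void)
{
	errno = ENOENT;			/* pretend a syscall just failed */
	fprintf(stderr, "# ");	/* stdio may change errno even on success */

	/*
	 * Without saving and restoring errno around the prefix output above,
	 * this %m (which formats strerror(errno)) could print the wrong
	 * error message.
	 */
	fprintf(stderr, "could not open file: %m\n");
	return 0;
}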