Re: Introduce XID age and inactive timeout based replication slot invalidation
On Sat, Mar 23, 2024 at 3:02 AM Bharath Rupireddy wrote: > > On Fri, Mar 22, 2024 at 3:15 PM Bertrand Drouvot > wrote: > > > > > > Worth to add some tests too (or we postpone them in future commits because > > we're > > confident enough they will follow soon)? > > Yes. Added some tests in a new TAP test file named > src/test/recovery/t/043_replslot_misc.pl. This new file can be used to > add miscellaneous replication tests in future as well. I couldn't find > a better place in existing test files - tried having the new tests for > physical slots in t/001_stream_rep.pl and I didn't find a right place > for logical slots. > How about adding the test in 019_replslot_limit? It is not a direct fit but I feel later we can even add 'invalid_timeout' related tests in this file which will use last_inactive_time feature. It is also possible that some of the tests added by the 'invalid_timeout' feature will obviate the need for some of these tests. Review of v15 == 1. @@ -1026,7 +1026,8 @@ CREATE VIEW pg_replication_slots AS L.conflicting, L.invalidation_reason, L.failover, -L.synced +L.synced, +L.last_inactive_time FROM pg_get_replication_slots() AS L As mentioned previously, let's keep these new fields before conflicting and after two_phase. 2. +# Get last_inactive_time value after slot's creation. Note that the slot is still +# inactive unless it's used by the standby below. +my $last_inactive_time_1 = $primary->safe_psql('postgres', + qq(SELECT last_inactive_time FROM pg_replication_slots WHERE slot_name = '$sb_slot' AND last_inactive_time IS NOT NULL;) +); We should check $last_inactive_time_1 to be a valid value and add a similar check for logical slots. 3. BTW, why don't we set last_inactive_time for temporary slots (RS_TEMPORARY) as well? Don't we even invalidate temporary slots? If so, then I think we should set last_inactive_time for those as well and later allow them to be invalidated based on timeout parameter. -- With Regards, Amit Kapila.
Re: Introduce XID age and inactive timeout based replication slot invalidation
On Fri, Mar 22, 2024 at 7:17 PM Bertrand Drouvot wrote: > > On Fri, Mar 22, 2024 at 06:02:11PM +0530, Amit Kapila wrote: > > On Fri, Mar 22, 2024 at 5:30 PM Bertrand Drouvot > > wrote: > > > On Fri, Mar 22, 2024 at 03:56:23PM +0530, Amit Kapila wrote: > > > > > > > > > > That would avoid testing twice "slot->data.persistency == > > > > > RS_PERSISTENT". > > > > > > > > > > > > > That sounds like a good idea. Also, don't we need to consider physical > > > > slots where we don't reserve WAL during slot creation? I don't think > > > > there is a need to set inactive_at for such slots. > > > > > > If the slot is not active, why shouldn't we set inactive_at? I can > > > understand > > > that such slots do not present "any risks" but I think we should still > > > set > > > inactive_at (also to not give the false impression that the slot is > > > active). > > > > > > > But OTOH, there is a chance that we will invalidate such slots even > > though they have never reserved WAL in the first place, which doesn't > > appear to be a good thing. > > That's right, but I don't think it is a bad thing. I think we should treat > inactive_at as an independent field (as if the timeout one did not exist at > all) and just focus on its meaning (slot being inactive). If one sets a > timeout (> 0) and gets an invalidation then I think it works as designed (even if the > slot does not present any "risk" as it does not hold any rows or WAL). > Fair point. -- With Regards, Amit Kapila.
Re: Large block sizes support in Linux
On Fri, Mar 22, 2024 at 10:56 PM Pankaj Raghav (Samsung) wrote: > My team and I have been working on adding Large block size(LBS) > support to XFS in Linux[1]. Once this feature lands upstream, we will be > able to create XFS with FS block size > page size of the system on Linux. > We also gave a talk about it in Linux Plumbers conference recently[2] > for more context. The initial support is only for XFS but more FSs will > follow later. Very cool! (I used XFS on IRIX in the 90s, and it had large blocks then, a feature lost in the port to Linux AFAIK.) > On an x86_64 system, fs block size was limited to 4k, but traditionally > Postgres uses 8k as its default internal page size. With LBS support, > fs block size can be set to 8K, thereby matching the Postgres page size. > > If the file system block size == DB page size, then Postgres can have > guarantees that a single DB page will be written as a single unit during > kernel write back and not split. > > My knowledge of Postgres internals is limited, so I'm wondering if there > are any optimizations or potential optimizations that Postgres could > leverage once we have LBS support on Linux? FWIW here are a couple of things I wrote about our storage atomicity problem, for non-PostgreSQL hackers who may not understand our project jargon: https://wiki.postgresql.org/wiki/Full_page_writes https://freebsdfoundation.org/wp-content/uploads/2023/02/munro_ZFS.pdf The short version is that we (and MySQL, via a different scheme with different tradeoffs) could avoid writing all our stuff out twice if we could count on atomic writes of a suitable size on power failure, so the benefits are very large. As far as I know, there are two things we need from the kernel and storage to do that on "overwrite" filesystems like XFS: 1. The disk must promise that its atomicity-on-power-failure is a multiple of our block size -- something like NVMe AWUPF, right? My devices seem to say 0 :-( Or I guess the filesystem has to compensate, but then it's not exactly an overwrite filesystem anymore... 2. The kernel must promise that there is no code path in either buffered I/O or direct I/O that will arbitrarily chop up our 8KB (or other configured block size) writes on some smaller boundary, most likely sector I guess, on their way to the device, as you were saying. Not just in happy cases, but even under memory pressure, if interrupted, etc etc. Sounds like you're working on problem #2 which is great news. I've been wondering for a while how a Unixoid kernel should report these properties to userspace where it knows them, especially on non-overwrite filesystems like ZFS where this sort of thing works already, without stuff like AWUPF working the way one might hope. Here was one throw-away idea on the back of a napkin about that, for what little it's worth: https://wiki.postgresql.org/wiki/FreeBSD/AtomicIO
Re: SQL:2011 application time
On Fri, Mar 22, 2024 at 11:49 PM Peter Eisentraut wrote: > > On 22.03.24 01:35, Paul Jungwirth wrote: > > > 1. In ri_triggers.c ri_KeysEqual, you swap the order of arguments to > > ri_AttributesEqual(): > > > > > > - if (!ri_AttributesEqual(riinfo->ff_eq_oprs[i], > > RIAttType(rel, attnums[i]), > > > - oldvalue, newvalue)) > > > + if (!ri_AttributesEqual(eq_opr, RIAttType(rel, attnums[i]), > > > + newvalue, oldvalue)) > > > > > > But the declared arguments of ri_AttributesEqual() are oldvalue and > > newvalue, so passing them > > > backwards is really confusing. And the change does matter in the tests. > > > > > > Can we organize this better? > > > > I renamed the params and actually the whole function. All it's doing is > > execute `oldvalue op newvalue`, casting if necessary. So I changed it to > > ri_CompareWithCast and added some documentation. In an earlier version > > of this patch I had a separate function for the PERIOD comparison, but > > it's just doing the same thing, so I think the best thing is to give the > > function a more accurate name and use it. > > Ok, I see now, and the new explanation is better. > > But after reading the comment in the function about collations, I think > there could be trouble. As long as we are only comparing for equality > (and we don't support nondeterministic global collations), then we can > use any collation to compare for equality. But if we are doing > contained-by, then the collation does matter, so we would need to get > the actual collation somehow. So as written, this might not always work > correctly. > > I think it would be safer for now if we just kept using the equality > operation even for temporal foreign keys. If we did that, then in the > case that you update a key to a new value that is contained by the old > value, this function would say "not equal" and fire all the checks, even > though it wouldn't need to. This is kind of similar to the "false > negatives" that the comment already talks about. > > What do you think? > We don't need to worry about a primary key and foreign key with different collations, because that errors out as an incompatible data type and the foreign key constraint will not be created. If they have the same collation, then when we build the query string we don't need to worry about collation, because at runtime the oprcode associated with the operator will fetch the collation information later. The main operators and oprcodes related to this patch (0001, 0002) are: range_contained_by_multirange, range_eq, range_overlaps, and range_contained_by. The first three fetch collation information within range_cmp_bounds; range_contained_by fetches it in range_contains_elem_internal.
demo:

CREATE COLLATION case_insensitive (provider = icu, locale = 'und-u-ks-level2', deterministic = false);
DROP TABLE IF EXISTS temporal_fk_rng2rng;
DROP TABLE IF EXISTS temporal_rng;
DROP TYPE textrange_case_insensitive;
CREATE TYPE textrange_case_insensitive AS RANGE (subtype = text, collation = case_insensitive);
CREATE TABLE temporal_rng (id int4range, valid_at textrange_case_insensitive);
ALTER TABLE temporal_rng ADD CONSTRAINT temporal_rng_pk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS);
CREATE TABLE temporal_fk_rng2rng (
    id int4range,
    valid_at textrange_case_insensitive,
    parent_id int4range,
    CONSTRAINT temporal_fk_rng2rng_fk2 FOREIGN KEY (parent_id, PERIOD valid_at)
        REFERENCES temporal_rng (id, PERIOD valid_at)
);
INSERT INTO temporal_rng (id, valid_at) VALUES ('[1,2)', textrange_case_insensitive('c', 'h','[]'));
--fail
INSERT INTO temporal_fk_rng2rng (id, valid_at, parent_id) VALUES ('[1,2)', textrange_case_insensitive('B', 'B','[]'), '[1,2)');
--fail.
INSERT INTO temporal_fk_rng2rng (id, valid_at, parent_id) VALUES ('[1,2)', textrange_case_insensitive('a', 'F','[]'), '[1,2)');
--fail.
INSERT INTO temporal_fk_rng2rng (id, valid_at, parent_id) VALUES ('[1,2)', textrange_case_insensitive('e', 'Z','[]'), '[1,2)');
--ok
INSERT INTO temporal_fk_rng2rng (id, valid_at, parent_id) VALUES ('[1,2)', textrange_case_insensitive('d', 'F','[]'), '[1,2)');
Re: Large block sizes support in Linux
On Fri, Mar 22, 2024 at 10:31:11PM +0100, Tomas Vondra wrote: > Right, but things change over time - current storage devices support > much larger sectors (LBA format), usually 4K. And if you do I/O with > this size, it's usually atomic. > > AFAIK if you built Postgres with 4K pages, on a device with 4K LBA > format, that would not need full-page writes - we always do I/O in 4k > pages, and block layer does I/O (during writeback from page cache) with > minimum guaranteed size = logical block size. 4K are great for OLTP > systems in general, it'd be even better if we didn't need to worry about > torn pages (but the tricky part is to be confident it's safe to disable > them on a particular system). Yes, even if the file system is 8k, and the storage is 8k, we only know that torn pages are impossible if the file system never overwrites existing 8k pages, but writes new ones and then makes it active. I think ZFS does that to handle snapshots. > The other thing is - is there a reliable way to say when the guarantees > actually apply? I mean, how would the administrator *know* it's safe to > set full_page_writes=off, or even better how could we verify this when > the database starts (and complain if it's not safe to disable FPW)? Yes, this is quite hard to know. Our docs have: https://www.postgresql.org/docs/current/wal-reliability.html Another risk of data loss is posed by the disk platter write operations themselves. Disk platters are divided into sectors, commonly 512 bytes each. Every physical read or write operation processes a whole sector. When a write request arrives at the drive, it might be for some multiple of 512 bytes (PostgreSQL typically writes 8192 bytes, or 16 sectors, at a time), and the process of writing could fail due to power loss at any time, meaning some of the 512-byte sectors were written while others were not. To guard against such failures, PostgreSQL periodically writes full page images to permanent WAL storage before modifying the actual page on disk. By doing this, during crash recovery PostgreSQL can --> restore partially-written pages from WAL. If you have file-system --> software that prevents partial page writes (e.g., ZFS), you can turn off --> this page imaging by turning off the full_page_writes parameter. --> Battery-Backed Unit (BBU) disk controllers do not prevent partial page --> writes unless they guarantee that data is written to the BBU as full --> (8kB) pages. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
Re: Potential stack overflow in incremental base backup
On Fri, Mar 8, 2024 at 6:53 AM Robert Haas wrote: > But I think that's really only necessary if we're actually going to > get rid of the idea of segmented relations altogether, which I don't > think is happening at least for v17, and maybe not ever. Yeah, I consider the feedback on ext4's size limitations to have completely killed the idea of getting rid of segments for the foreseeable future, at least in standard md.c (though who knows what people will do with pluggable smgr?). As for initdb --rel-segsize (CF #4305) for md.c, I abandoned plans to do that for 17 because I couldn't see what to do about this issue. Incremental backup effectively relies on smaller segments, by using them as problem-dividing granularity for checksumming and memory usage. That'll take some research...
Re: session username in default psql prompt?
On Fri, Mar 22, 2024 at 7:34 PM Tom Lane wrote: > > On the whole, I think we'd get more complaints about the default > prompt having more-or-less doubled in length than we'd get kudos > about this being a great usability improvement. Prompt space is > expensive and precious, at least for people who aren't in the > habit of working in very wide windows. I'm not sure you're right, but in view of the opposition I won't press it. Thanks to Kori for the patch. cheers andrew
Re: Cannot find a working 64-bit integer type on Illumos
Thomas Munro writes: > . o O ( int64_t, PRId64, etc were standardised a quarter of a century ago ) Yeah. Now that we require C99 it's probably reasonable to assume that those things exist. I wouldn't be in favor of ripping out our existing notations like UINT64CONST, because the code churn would be substantial and the gain minimal. But we could imagine reimplementing that stuff atop <stdint.h> and then getting rid of the configure-time probes. regards, tom lane
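For instance, a minimal sketch of what "atop <stdint.h>" could look like (the pg-side names here are our existing notations; the point is just that the configure probes become unnecessary):

    #include <stdint.h>
    #include <inttypes.h>

    typedef int64_t int64;
    typedef uint64_t uint64;

    #define INT64CONST(x)  INT64_C(x)
    #define UINT64CONST(x) UINT64_C(x)

    #define INT64_FORMAT  "%" PRId64
    #define UINT64_FORMAT "%" PRIu64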
Re: sublink [exists (select xxx group by grouping sets ())] causes an assertion error
Jeff Davis writes: > On Fri, 2024-03-22 at 12:28 -0400, Tom Lane wrote: >> Thanks for the report. I did some bisecting and found that the crash >> appears at Jeff's commit c8aeaf3ab (which introduced this assertion) >> and disappears at Heikki's c4649cce3 (which removed it). So I would >> say that the problem is "this assertion is wrong", and we should fix >> the problem by fixing the assertion, not by hacking around in distant >> calling code. On the whole, since this code has been dead for >> several versions, I'd be inclined to just remove the assertion. > c4649cce3 didn't add additional calls to LogicalTapeSetBlocks(), so I'm > not sure if the removal of the Assert was related to his changes, or if > he just realized the assertion was wrong and removed it along the way? My guess is he just zapped it because the code block was dependent on the "tape" abstraction which was going away. Heikki? > Also, without the assertion, the word "should" in the comment is > ambiguous (does it mean "must not" or something else), and it still > exists in master. Do we care about the calculation being wrong if > there's an unfinished write? I think it's actually fine. The callers of LogicalTapeSetBlocks only use the number for statistics or trace reporting, so precision isn't critical in the first place, but I think what they care about is the amount of data that's really been written out to files. The comment should be clarified, for sure. regards, tom lane
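Perhaps something along these lines for the header comment (just a first cut at the wording):

    /*
     * Return the number of blocks used by this logical tape set.
     *
     * Note: this does not account for data that has been buffered but not
     * yet written out.  The callers only use the result for statistics and
     * trace reporting, so exactness is not critical here.
     */
    long
    LogicalTapeSetBlocks(LogicalTapeSet *lts)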
Re: sublink [exists (select xxx group by grouping sets ())] causes an assertion error
On Fri, 2024-03-22 at 12:28 -0400, Tom Lane wrote: > Thanks for the report. I did some bisecting and found that the crash > appears at Jeff's commit c8aeaf3ab (which introduced this assertion) > and disappears at Heikki's c4649cce3 (which removed it). So I would > say that the problem is "this assertion is wrong", and we should fix > the problem by fixing the assertion, not by hacking around in distant > calling code. On the whole, since this code has been dead for > several versions, I'd be inclined to just remove the assertion. c4649cce3 didn't add additional calls to LogicalTapeSetBlocks(), so I'm not sure if the removal of the Assert was related to his changes, or if he just realized the assertion was wrong and removed it along the way? Also, without the assertion, the word "should" in the comment is ambiguous (does it mean "must not" or something else), and it still exists in master. Do we care about the calculation being wrong if there's an unfinished write? If not, I'll just clarify that the calculation doesn't take into account still-buffered data. If we do care, then something might need to be fixed. Regards, Jeff Davis
Re: Cannot find a working 64-bit integer type on Illumos
On Sat, Mar 23, 2024 at 6:26 AM Tom Lane wrote: > conftest.c:139:5: error: no previous prototype for 'does_int64_work' > [-Werror=missing-prototypes] > 139 | int does_int64_work() > | ^~~ > cc1: all warnings being treated as errors > configure:17003: $? = 1 > configure: program exited with status 1 . o O ( int64_t, PRId64, etc were standardised a quarter of a century ago )
Re: pg_upgrade --copy-file-range
Hmm, this discussion seems to assume that we only use copy_file_range() to copy/clone whole segment files, right? That's great and may even get most of the available benefit given typical databases with many segments of old data that never changes, but... I think copy_file_range() allows us to go further than the other whole-file clone techniques: we can stitch together parts of an old backup segment file and an incremental backup to create a new file. If you're interested in minimising disk use while also removing dependencies on the preceding chain of backups, then it might make sense to do that even if you *also* have to read the data to compute the checksums, I think? That's why I mentioned it: if copy_file_range() (ie sub-file-level block sharing) is a solution in search of a problem, has the world ever seen a better problem than pg_combinebackup?
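To sketch what I mean (a hypothetical helper, assuming Linux's copy_file_range(2); a real caller would also need a plain read()/write() fallback for old kernels and cross-filesystem cases):

    #define _GNU_SOURCE
    #include <unistd.h>

    /* Splice len bytes from src_fd at src_off into dst_fd at dst_off. */
    static int
    clone_range(int src_fd, off_t src_off, int dst_fd, off_t dst_off, size_t len)
    {
        while (len > 0)
        {
            ssize_t nbytes = copy_file_range(src_fd, &src_off,
                                             dst_fd, &dst_off, len, 0);

            if (nbytes <= 0)
                return -1;      /* error or unexpected EOF: fall back */
            len -= nbytes;      /* the offsets were advanced for us */
        }
        return 0;
    }

pg_combinebackup could call something like that once per block range, pulling unchanged ranges from the old full backup's segment and changed blocks from the incremental file, so the output file shares extents with both inputs.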
Re: BitmapHeapScan streaming read user and prelim refactoring
On Tue, Mar 19, 2024 at 02:33:35PM +0200, Heikki Linnakangas wrote: > On 18/03/2024 17:19, Melanie Plageman wrote: > > I've attached v7 rebased over this commit. > > If we delayed table_beginscan_bm() call further, after starting the TBM > iterator, we could skip it altogether when the iterator is empty. > > That's a further improvement, doesn't need to be part of this patch set. > Just caught my eye while reading this. Hmm. You mean like until after the first call to tbm_[shared]_iterate()? AFAICT, tbm_begin_iterate() doesn't tell us anything about whether or not the iterator is "empty". Do you mean cases when the bitmap has no blocks in it? It seems like we should be able to tell that from the TIDBitmap. > > > v7-0003-Push-BitmapHeapScan-skip-fetch-optimization-into-.patch > > I suggest to avoid the double negative with SO_CAN_SKIP_FETCH, and call the > flag e.g. SO_NEED_TUPLE. Agreed. Done in attached v8. Though I wondered if it was a bit weird that the flag is set in the common case and not set in the uncommon case... > As yet another preliminary patch before the streaming read API, it would be > nice to move the prefetching code to heapam.c too. I've done this, but I can say it is not very pretty. see 0013. I had to add a bunch of stuff to TableScanDescData and HeapScanDescData which are only used for bitmapheapscans. I don't know if it makes the BHS streaming read user patch easier to review, but I don't think what I have in 0013 is committable to Postgres. Maybe there was another way I could have approached it. Let me know what you think. In addition to bloating the table descriptors, note that it was difficult to avoid one semantic change -- with 0013, we no longer prefetch or adjust prefetch target when emitting each empty tuple -- though I think this is could actually be desirable. > What's the point of having separate table_scan_bitmap_next_block() and > table_scan_bitmap_next_tuple() functions anymore? The AM owns the TBM > iterator now. The executor node updates the lossy/exact page counts, but > that's the only per-page thing it does now. Oh, interesting. Good point. I've done this in 0015. If you like the way it turned out, I can probably rebase this back into an earlier point in the set and end up dropping some of the other incremental changes (e.g. 0008). > > /* > > * If this is the first scan of the underlying table, create > > the table > > * scan descriptor and begin the scan. > > */ > > if (!scan) > > { > > uint32 extra_flags = 0; > > > > /* > > * We can potentially skip fetching heap pages if we do > > not need > > * any columns of the table, either for checking > > non-indexable > > * quals or for returning data. This test is a bit > > simplistic, as > > * it checks the stronger condition that there's no > > qual or return > > * tlist at all. But in most cases it's probably not > > worth working > > * harder than that. > > */ > > if (node->ss.ps.plan->qual == NIL && > > node->ss.ps.plan->targetlist == NIL) > > extra_flags |= SO_CAN_SKIP_FETCH; > > > > scan = node->ss.ss_currentScanDesc = table_beginscan_bm( > > > > > > node->ss.ss_currentRelation, > > > > > > node->ss.ps.state->es_snapshot, > > > > 0, > > > > NULL, > > > > extra_flags); > > } > > > > scan->tbmiterator = tbmiterator; > > scan->shared_tbmiterator = shared_tbmiterator; > > How about passing the iterator as an argument to table_beginscan_bm()? You'd > then need some other function to change the iterator on rescan, though. 
> Not sure what exactly to do here, but feels that this part of the API is not fully thought-out. Needs comments at least, to explain who sets tbmiterator / shared_tbmiterator and when. For comparison, for a TID scan there's a separate scan_set_tidrange() table AM function. Maybe follow that example and introduce scan_set_tbm_iterator(). I've spent quite a bit of time playing around with the code trying to make it less terrible than what I had
Re: New Table Access Methods for Multi and Single Inserts
On Thu, 2024-03-21 at 13:10 +0530, Bharath Rupireddy wrote: > I'm attaching the v13 patches using virtual tuple slots for buffered > tuples for multi inserts. Comments: * Do I understand correctly that CMV, RMV, and CTAS experience a performance benefit, but COPY FROM does not? And is that because COPY already used table_multi_insert, whereas CMV and RMV did not? * In the COPY FROM code, it looks like it's deciding whether to flush based on MAX_BUFFERED_TUPLES, but the slot array is allocated with MAX_BUFFERED_SLOTS (they happen to be the same for heap, but perhaps not for other AMs). The copy code shouldn't be using internal knowledge of the multi-insert code; it should know somehow from the API when the right time is to flush. * How is the memory management expected to work? It looks like COPY FROM is using the ExprContext when running the input functions, but we really want to be using a memory context owned by the table AM, right? * What's the point of the table_multi_insert_slots() "num_slots" argument? The only caller simply discards it. * table_tuple_insert_v2 isn't called anywhere, what's it for? * the "v2" naming is inconsistent -- it seems you only added it in places where there's a name conflict, which makes it hard to tell which API methods go together. I'm not sure how widely table_multi_insert* is used outside of core, so it's possible that we may even be able to just change those APIs and the few extensions that call it can be updated. * Memory usage tracking should be done in the AM by allocating everything in a single context so it's easy to check the size. Don't manually add up memory. * I don't understand: "Caller may have got the slot using heap_multi_insert_next_free_slot, filled it and passed. So, skip copying in such a case." If the COPY FROM had a WHERE clause and skipped a tuple after filling the slot, doesn't that mean the slot has bogus data from the last tuple? * We'd like this to work for insert-into-select (IIS) and logical replication, too. Do you see any problem there, or is it just a matter of code? * Andres had some comments[1] that don't seem entirely addressed. - You are still allocating the AM-specific part of TableModifyState as a separately-allocated chunk. - It's still called TableInsertState rather than TableModifyState as he suggested. If you change that, you should also change to table_modify_begin/end. - CID: I suppose Andres is considering the use case of "BEGIN; ... ten thousand inserts ... COMMIT;". I don't think this problem is really solvable (discussed below) but we should have some response/consensus on that point. - He mentioned that we only need one new method implemented by the AM. I don't know if one is enough, but 7 does seem excessive. I have some simplification ideas below. Overall: If I understand correctly, there are two ways to use the API: 1. used by CTAS, MV: tistate = table_insert_begin(...); table_multi_insert_v2(tistate, tup1); ... table_multi_insert_v2(tistate, tupN); table_insert_end(tistate); 2. used by COPY ... FROM: tistate = table_insert_begin(..., SKIP_FLUSH); if (multi_insert_slot_array_is_full()) table_multi_insert_flush(tistate); slot = table_insert_next_free_slot(tistate); ... fill slot with tup1 table_multi_insert_v2(tistate, tup1); ... slot = table_insert_next_free_slot(tistate); ... fill slot with tupN table_multi_insert_v2(tistate, tupN); table_insert_end(tistate); Those two uses need comments explaining what's going on. It appears the SKIP_FLUSH flag is used to indicate which use the caller intends. 
Use #2 is not enforced well by either the API or runtime checks. If the caller neglects to check for a full buffer, it appears that it will just overrun the slots array. Also, for use #2, table_multi_insert_v2() doesn't do much other than incrementing the memory used. The slot will never be NULL because it was obtained with table_multi_insert_next_free_slot(), and the other two branches don't happen when SKIP_FLUSH is true. The real benefit to COPY of your new API is that the AM can manage slots for itself, and how many tuples may be tracked (which might be a lot higher for non-heap AMs). I agree with Luc Vlaming's comment[2] that more should be left to the table AM. Your patch tries too hard to work with the copyfrom.c slot array, somehow sharing it with the table AM. That adds complexity to the API and feels like a layering violation. We also shouldn't mandate a slot array in the API. Each slot is 64 bytes -- a lot of overhead for small tuples. For a non-heap AM, it's much better to store the tuple data in a big contiguous chunk with minimal overhead. Let's just have a simple API like: tmstate = table_modify_begin(...); table_modify_save_insert(tmstate, tup1); ... table_modify_save_insert(tmstate, tupN); table_modify_end(tmstate); and leave it up to the AM to do all the buffering and flushing work (as Luc Vlaming suggested[2]). That leaves one
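To illustrate (a sketch only; the names and limits are hypothetical, loosely following heap's existing multi-insert pieces), the AM-side state could be as simple as:

    #define MODIFY_MAX_TUPLES   1000
    #define MODIFY_MAX_BYTES    (64 * 1024)

    typedef struct HeapModifyState
    {
        Relation        rel;
        MemoryContext   context;        /* all buffered tuples live here */
        TupleTableSlot *slots[MODIFY_MAX_TUPLES];
        int             nused;
    } HeapModifyState;

    static void
    heap_modify_save_insert(HeapModifyState *state, TupleTableSlot *slot)
    {
        /* Copy into an AM-owned slot; the caller may reuse its slot. */
        if (state->slots[state->nused] == NULL)
        {
            MemoryContext old = MemoryContextSwitchTo(state->context);

            state->slots[state->nused] = table_slot_create(state->rel, NULL);
            MemoryContextSwitchTo(old);
        }
        ExecCopySlot(state->slots[state->nused], slot);
        state->nused++;

        /* The AM decides when to flush: by count or by memory used. */
        if (state->nused == MODIFY_MAX_TUPLES ||
            MemoryContextMemAllocated(state->context, false) >= MODIFY_MAX_BYTES)
            heap_modify_flush(state);   /* heap_multi_insert() internally */
    }

Since everything buffered lives in state->context, memory accounting is one MemoryContextMemAllocated() call rather than manual addition, and a non-heap AM would be free to store tuple data in a contiguous chunk instead of a slot array.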
Re: Trying to build x86 version on windows using meson
Hi, On 2024-03-21 13:17:44 -0400, Dave Cramer wrote: > Attached correct log file Hm. So there's something a bit odd: > Build started at 2024-03-21T13:07:08.707715 > Main binary: C:\Program Files\Meson\meson.exe > Build Options: '-Dextra_include_dirs=c:\Program Files\OpenSSL-Win64\include' > -Derrorlogs=True '-Dextra_lib_dirs=c:\Program Files\OpenSSL-win64' > '-Dprefix=c:\postgres86' > Python system: Windows > The Meson build system > Version: 1.3.1 > Source dir: C:\Users\davec\projects\postgresql > Build dir: C:\Users\davec\projects\postgresql\build > Build type: native build So meson thinks this is a native build, not a cross build. But then later realizes that generated binaries and the current platform aren't the same. And thus errors out. The line numbers don't match my tree, but I think what's failing is the sizeof() check. Which has support for cross builds, but it only uses that (slower) path if it knows that a cross build is being used. I suggest actually telling meson to cross compile. I don't quite know what properties you're going to need, but something like the following (put it in a file, point meson to it with --cross-file) might give you a start:

[properties]
needs_exe_wrapper = false

[binaries]
c = 'cl'
cpp = 'cl'
ar = 'lib'
windres = 'rc'

[host_machine]
system = 'windows'
cpu_family = 'x86_64'
cpu = 'x86_64'
endian = 'little'

Greetings, Andres Freund
Re: session username in default psql prompt?
Andrew Dunstan writes: > On Fri, Mar 22, 2024 at 4:04 PM Robert Haas wrote: >> I'm not really a fan of this. Right now my prompt looks like this: >> robert.haas=# >> If we did this, it would say: >> robert.h...@robert.haas=# There would be similar duplication for, eg, the postgres user connected to the postgres database. However, I'm more worried about the case where they don't match, because then the %~ suggestion doesn't help shorten it. > Of course, people can put this in their .psqlrc, and I do. The suggestion > came about because I had a couple of instances where people using the > default prompt showed me stuff and the problem arose from confusion about > which user they were connected as. On the whole, I think we'd get more complaints about the default prompt having more-or-less doubled in length than we'd get kudos about this being a great usability improvement. Prompt space is expensive and precious, at least for people who aren't in the habit of working in very wide windows. regards, tom lane
Re: Combine Prune and Freeze records emitted by vacuum
On 20/03/2024 21:17, Melanie Plageman wrote: Attached patch changes-for-0001.patch has a bunch of updated comments -- especially for heapam_xlog.h (including my promised diagram). And a few suggestions (mostly changes that I should have made before). New version of these WAL format changes attached. Squashed to one patch now. + // TODO: should we avoid this if we only froze? heap_xlog_freeze() doesn't + // do it Ah yes, that makes sense, did that. In the final commit message, I think it is worth calling out that all of those freeze logging functions heap_log_freeze_eq/cmp/etc are lifted and shifted from one file to another. When I am reviewing a big diff, it is always helpful to know where I need to review line-by-line. Done. From 5d6fc2ffbdd839e0b69242af16446a46cf6a2dc7 Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas Date: Wed, 20 Mar 2024 13:49:59 +0200 Subject: [PATCH v5 04/26] 'nplans' is a pointer I'm surprised the compiler didn't warn about this oops. while looking at this, I noticed that the asserts I added that nredirected > 0, ndead > 0, and nunused > 0 have the same problem. Good catch, fixed. - remove xlhp_conflict_horizon and store a TransactionId directly. A separate struct would make sense if we needed to store anything else there, but for now it just seems like more code. makes sense. I just want to make sure heapam_xlog.h makes it extra clear that this is happening. I see your comment addition. If you combine it with my comment additions in the attached patch for 0001, hopefully that makes it clear enough. Thanks. I spent more time on the comments throughout the patch. And one notable code change: I replaced the XLHP_LP_TRUNCATE_ONLY flag with XLHP_CLEANUP_LOCK. XLHP_CLEANUP_LOCK directly indicates if you need a cleanup lock to replay the record. It must always be set when XLHP_HAS_REDIRECTIONS or XLHP_HAS_DEAD_ITEMS is set, because replaying those always needs a cleanup lock. That felt easier to document and understand than XLHP_LP_TRUNCATE_ONLY. From 8af186ee9dd8c7dc20f37a69b34cab7b95faa43b Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas Date: Wed, 20 Mar 2024 14:03:06 +0200 Subject: [PATCH v5 07/26] Add comment to log_heap_prune_and_freeze(). XXX: This should be rewritten, but I tried to at least list some important points. Are you thinking that it needs to mention more things or that the things it mentions need more detail? I previously just quickly jotted down things that seemed worth mentioning in the comment. It was not so bad actually, but I reworded it a little. From b26e36ba8614d907a6e15810ed4f684f8f628dd2 Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas Date: Wed, 20 Mar 2024 14:53:31 +0200 Subject: [PATCH v5 08/26] minor refactoring in log_heap_prune_and_freeze() Mostly to make local variables more tightly-scoped. So, I don't think you can move those sub-records into the tighter scope. If you run tests with this applied, you'll see it crashes and a lot of the data in the record is wrong. If you move the sub-record declarations out to a wider scope, the tests pass. The WAL record data isn't actually copied into the buffer until recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_PRUNE_FREEZE); after registering everything. So all of those sub-records you made are out of scope the time it tries to copy from them. I spent the better part of a day last week trying to figure out what was happening after I did the exact same thing. I must say that I found the xloginsert API incredibly unhelpful on this point. Oops. 
I had that in mind and that was actually why I moved the XLogRegisterData() call to the end of the function, because I found it confusing to register the struct before filling it in completely, even though it works perfectly fine. But then I missed it anyway when I moved the local variables. I added a brief comment on that. I would like to review the rest of the suggested changes in this patch after we fix the issue I just mentioned. Thanks, review is appreciated. I feel this is ready now, so barring any big new issues, I plan to commit this early next week. There is another patch in the commitfest that touches this area: "Recording whether Heap2/PRUNE records are from VACUUM or from opportunistic pruning" [1]. That actually goes in the opposite direction than this patch. That patch wants to add more information, to show whether a record was emitted by VACUUM or on-access pruning, while this patch makes the freezing and VACUUM's 2nd phase records also look the same. We could easily add more flags to xl_heap_prune to distinguish them. Or assign different xl_info code for them, like that other patch proposed. But I don't think that needs to block this patch, that can be added as a separate patch. [1] https://www.postgresql.org/message-id/cah2-wzmsevhox+hjpfmqgcxwwdgnzp0r9f+vbnpok6tgxmp...@mail.gmail.com -- Heikki Linnakangas Neon (https://neon.tech) From
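For the archives, the trap looks like this (schematic; xlhp_freeze_plans is one of the new sub-record structs from the patch):

    {
        xlhp_freeze_plans freeze_plans;     /* WRONG: scoped too tightly */

        freeze_plans.nplans = nplans;
        XLogRegisterData((char *) &freeze_plans, sizeof(freeze_plans));
    }
    /* ... more registrations ... */

    /*
     * Nothing is copied out of the registered pointers until here, by
     * which time freeze_plans is out of scope and its stack space may
     * have been reused, so the record ends up with garbage.
     */
    recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_PRUNE_FREEZE);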
Re: session username in default psql prompt?
On Fri, Mar 22, 2024 at 4:04 PM Robert Haas wrote: > On Wed, Mar 13, 2024 at 4:56 AM Andrew Dunstan > wrote: > > Reposting as the archive mail processor doesn't seem to like the Apple > > mail attachment. > > I'm not really a fan of this. Right now my prompt looks like this: > > robert.haas=# > > If we did this, it would say: > > robert.h...@robert.haas=# > Hmm. Perhaps we should change the default to "%n@%~%R%x%# " Then when connected to your eponymous database you would see robert.haas@~=# Of course, people can put this in their .psqlrc, and I do. The suggestion came about because I had a couple of instances where people using the default prompt showed me stuff and the problem arose from confusion about which user they were connected as. > I have yet to meet anyone who doesn't think that one Robert Haas is > quite enough already, and perhaps too much by half. > perish the thought. cheers andrew
Re: Introduce XID age and inactive timeout based replication slot invalidation
On Fri, Mar 22, 2024 at 3:15 PM Bertrand Drouvot wrote: > > Looking at v14-0002: Thanks for reviewing. I agree that 0002 with last_inactive_at can go independently and be of use on its own in addition to helping implement inactive_timeout based invalidation. > 1 === > > @@ -691,6 +699,13 @@ ReplicationSlotRelease(void) > ConditionVariableBroadcast(&slot->active_cv); > } > > + if (slot->data.persistency == RS_PERSISTENT) > + { > + SpinLockAcquire(&slot->mutex); > + slot->last_inactive_at = GetCurrentTimestamp(); > + SpinLockRelease(&slot->mutex); > + } > > I'm not sure we should do system calls while we're holding a spinlock. > Assign a variable before? Can do that. Then, last_inactive_at = current timestamp + mutex acquire time. But that shouldn't be a problem, unlike doing system calls while holding the mutex. So, done that way. > 2 === > > Also, what about moving this here? > > " > if (slot->data.persistency == RS_PERSISTENT) > { > /* > * Mark persistent slot inactive. We're not freeing it, just > * disconnecting, but wake up others that may be waiting for it. > */ > SpinLockAcquire(&slot->mutex); > slot->active_pid = 0; > SpinLockRelease(&slot->mutex); > ConditionVariableBroadcast(&slot->active_cv); > } > " > > That would avoid testing twice "slot->data.persistency == RS_PERSISTENT". Ugh. Done that now. > 3 === > > @@ -2341,6 +2356,7 @@ RestoreSlotFromDisk(const char *name) > > slot->in_use = true; > slot->active_pid = 0; > + slot->last_inactive_at = 0; > > I think we should put GetCurrentTimestamp() here. It's done in v14-0006 but I think it's better to do it in 0002 (and not taking care of inactive_timeout). Done. > 4 === > > Track last_inactive_at in pg_replication_slots > > doc/src/sgml/system-views.sgml | 11 +++ > src/backend/catalog/system_views.sql | 3 ++- > src/backend/replication/slot.c | 16 > src/backend/replication/slotfuncs.c | 7 ++- > src/include/catalog/pg_proc.dat | 6 +++--- > src/include/replication/slot.h | 3 +++ > src/test/regress/expected/rules.out | 5 +++-- > 7 files changed, 44 insertions(+), 7 deletions(-) > > Worth to add some tests too (or we postpone them in future commits because > we're > confident enough they will follow soon)? Yes. Added some tests in a new TAP test file named src/test/recovery/t/043_replslot_misc.pl. This new file can be used to add miscellaneous replication tests in future as well. I couldn't find a better place in existing test files - tried having the new tests for physical slots in t/001_stream_rep.pl and I didn't find a right place for logical slots. > 5 === > > Most of the fields that reflect a time (not duration) in the system views are > _time, so I'm wondering if instead of "last_inactive_at" we should use > something like "last_inactive_time"? Yeah, I can see that. So, I changed it to last_inactive_time. I agree with treating last_inactive_time as a separate property of the slot having its own use in addition to helping implement inactive_timeout based invalidation. I think it can go separately. I tried to address the review comments received for this patch alone and attached v15-0001. I'll post other patches soon. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com v15-0001-Track-last_inactive_time-in-pg_replication_slots.patch Description: Binary data
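To be concrete, the release path in v15 now looks roughly like this (shape only, not the exact hunk):

    if (slot->data.persistency == RS_PERSISTENT)
    {
        /*
         * Capture the timestamp before taking the spinlock;
         * GetCurrentTimestamp() is a system call, and spinlock hold
         * times must stay short.
         */
        TimestampTz now = GetCurrentTimestamp();

        /*
         * Mark the persistent slot inactive.  We're not freeing it,
         * just disconnecting, but wake up others that may be waiting
         * for it.
         */
        SpinLockAcquire(&slot->mutex);
        slot->active_pid = 0;
        slot->last_inactive_time = now;
        SpinLockRelease(&slot->mutex);
        ConditionVariableBroadcast(&slot->active_cv);
    }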
Re: Large block sizes support in Linux
On 3/22/24 19:46, Bruce Momjian wrote: > On Thu, Mar 21, 2024 at 06:46:19PM +0100, Pankaj Raghav (Samsung) wrote: >> Hello, >> >> My team and I have been working on adding Large block size(LBS) >> support to XFS in Linux[1]. Once this feature lands upstream, we will be >> able to create XFS with FS block size > page size of the system on Linux. >> We also gave a talk about it in Linux Plumbers conference recently[2] >> for more context. The initial support is only for XFS but more FSs will >> follow later. >> >> On an x86_64 system, fs block size was limited to 4k, but traditionally >> Postgres uses 8k as its default internal page size. With LBS support, >> fs block size can be set to 8K, thereby matching the Postgres page size. >> >> If the file system block size == DB page size, then Postgres can have >> guarantees that a single DB page will be written as a single unit during >> kernel write back and not split. >> >> My knowledge of Postgres internals is limited, so I'm wondering if there >> are any optimizations or potential optimizations that Postgres could >> leverage once we have LBS support on Linux? > > We have discussed this in the past, and in fact in the early years we > thought we didn't need fsync since the BSD file system was 8k at the > time. > > What we later realized is that we have no guarantee that the file system > will write to the device in the specified block size, and even if it > does, the I/O layers between the OS and the device might not, since many > devices use 512 byte blocks or other sizes. > Right, but things change over time - current storage devices support much larger sectors (LBA format), usually 4K. And if you do I/O with this size, it's usually atomic. AFAIK if you built Postgres with 4K pages, on a device with 4K LBA format, that would not need full-page writes - we always do I/O in 4k pages, and block layer does I/O (during writeback from page cache) with minimum guaranteed size = logical block size. 4K are great for OLTP systems in general, it'd be even better if we didn't need to worry about torn pages (but the tricky part is to be confident it's safe to disable them on a particular system). I did watch the talk linked by Pankaj, and IIUC the promise of the LBS patches is that this benefit would apply even to larger page sizes (= fs page size). Which right now you can't even mount, but the patches allow that. So for example it would be possible to create an XFS filesystem with 8kB pages, and then we'd read/write 8kB pages as usual, and we'd know that the page cache always writes out either the whole page or none of it. Which right now is not guaranteed to happen, it's possible to e.g. write the page as two 4K requests, even if all other things are set properly (drive has 4K logical/physical sectors). At least that's my understanding ... Pankaj, could you clarify what the guarantees provided by LBS are going to be? The talk uses wording like "should be" and "hint" in a couple places, and there's also stuff I'm not 100% familiar with. If we create a filesystem with 8K blocks, and we only ever do writes (and reads) in 8K chunks (our default page size), what guarantees does that give us? What if the underlying device has LBA format with only 4K (or perhaps even just 512B), how would that affect the guarantees? The other thing is - is there a reliable way to say when the guarantees actually apply? 
I mean, how would the administrator *know* it's safe to set full_page_writes=off, or even better how could we verify this when the database starts (and complain if it's not safe to disable FPW)? It's easy to e.g. take a backup on one filesystem and restore it on another one, and forget those may have different block sizes etc. I'm not sure it's possible in a 100% reliable way (tablespaces?). regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
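As a strawman for the complain-at-startup idea, the server could at least cross-check the filesystem's block size against its own page size, though that alone proves nothing about write atomicity (which is the hard part):

    #include <sys/statvfs.h>

    /* Hypothetical startup sanity check: necessary, not sufficient. */
    struct statvfs vfs;

    if (!fullPageWrites &&
        statvfs(DataDir, &vfs) == 0 &&
        vfs.f_bsize < BLCKSZ)
        ereport(WARNING,
                (errmsg("full_page_writes is off, but the filesystem block size (%lu) is smaller than the page size (%u)",
                        (unsigned long) vfs.f_bsize, (unsigned) BLCKSZ)));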
Re: psql not responding to SIGINT upon db reconnection
On Fri Mar 22, 2024 at 12:17 PM CDT, Robert Haas wrote: On Fri, Mar 22, 2024 at 1:05 PM Tristan Partin wrote: > Sorry for taking a while to get back to y'all. I have taken your > feedback into consideration for v9. This is my first time writing > Postgres docs, so I'm ready for another round of editing :). Yeah, that looks like it needs some wordsmithing yet. I can take a crack at that at some point, but here are a few notes: - "takes care of" and "working through the state machine" seem quite vague to me. - the meanings of forRead and forWrite don't seem to be documented. - "retuns" is a typo. > Robert, could you point out some places where comments would be useful > in 0002? I did rename the function and moved it as suggested, thanks! In > turn, I also realized I forgot a prototype, so also added it. Well, for starters, I'd give the function a header comment. Telling me that a 1 second timeout is a 1 second timeout is less useful than telling me why we've picked a 1 second timeout. Presumably the answer is "so we can respond to cancel interrupts in a timely fashion", but I'd make that explicit. It might be worth a comment saying that PQsocket() shouldn't be hoisted out of the loop because it could be a multi-host connection string and the socket might change under us. Unless that's not true, in which case we should hoist the PQsocket() call out of the loop. I think it would also be worth a comment saying that we don't need to report errors here, as the caller will do that; we just need to wait until the connection is known to have either succeeded or failed, or until the user presses cancel. This is good feedback, thanks. I have added comments where you suggested. I reworded the PQsocketPoll docs to hopefully meet your expectations. I am using the term "connection sequence" which is from the PQconnectStartParams docs, so hopefully people will be able to make that connection. I wrote documentation for "forRead" and "forWrite" as well. I had a question about parameter naming. Right now I have a mix of camel-case and snake-case in the function signature since that is what I inherited. Should I change that to be consistent? If so, which case would you like? Thanks for your continued reviews. -- Tristan Partin Neon (https://neon.tech) From dc45e95ab443d973845dd840600d6719dcd4cae2 Mon Sep 17 00:00:00 2001 From: Tristan Partin Date: Tue, 30 Jan 2024 15:40:57 -0600 Subject: [PATCH v10 1/2] Expose PQsocketPoll for frontends PQsocketPoll should help with state machine processing when connecting to a database asynchronously via PQconnectStart(). --- doc/src/sgml/libpq.sgml | 43 +++- src/interfaces/libpq/exports.txt | 1 + src/interfaces/libpq/fe-misc.c | 7 +++--- src/interfaces/libpq/libpq-fe.h | 4 +++ 4 files changed, 50 insertions(+), 5 deletions(-) diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml index d3e87056f2c..774b57ba70b 100644 --- a/doc/src/sgml/libpq.sgml +++ b/doc/src/sgml/libpq.sgml @@ -262,6 +262,44 @@ PGconn *PQsetdb(char *pghost, + + PQsocketPollPQsocketPoll + + + nonblocking connection + Poll a connections underlying socket descriptor retrieved with . + +int PQsocketPoll(int sock, int forRead, int forWrite, time_t end_time); + + + + + This function sets up polling of a file descriptor. The underlying function is either + poll(2) or select(2), depending on platform + support. The primary use of this function is iterating through the connection sequence + described in the documentation of . 
If + forRead is specified, the function waits for the socket to be ready + for reading. If forWrite is specified, the function waits for the + socket to be ready for write. See POLLIN and POLLOUT + from poll(2), or readfds and + writefds from select(2) for more information. + + + + The function returns a value greater than 0 if the specified condition + is met, 0 if a timeout occurred, or -1 if an error + or interrupt occurred. In the event forRead and + forWrite are not set, the function immediately returns a timeout + condition. + + + + end_time is the time in the future in seconds starting from the UNIX + epoch in which you would like the function to return if the condition is not met. + + + + PQconnectStartParamsPQconnectStartParams PQconnectStartPQconnectStart @@ -358,7 +396,10 @@ PostgresPollingStatusType PQconnectPoll(PGconn *conn); Loop thus: If PQconnectPoll(conn) last returned PGRES_POLLING_READING, wait until the socket is ready to read (as indicated by select(), poll(), or - similar system function). + similar system function). Note that PQsocketPoll + can help reduce boilerplate by abstracting the setup of + select(2) or poll(2) if it is + available on
Re: SET ROLE documentation improvement
On Fri, Mar 22, 2024 at 03:45:06PM -0500, Nathan Bossart wrote: > On Fri, Mar 22, 2024 at 03:58:59PM -0400, Robert Haas wrote: >> On Fri, Nov 10, 2023 at 12:41 PM Nathan Bossart >> wrote: >>> I still think we should update the existing note about privileges for >>> SET/RESET ROLE to something like the following: >>> >>> diff --git a/doc/src/sgml/ref/set_role.sgml b/doc/src/sgml/ref/set_role.sgml >>> index 13bad1bf66..c91a95f5af 100644 >>> --- a/doc/src/sgml/ref/set_role.sgml >>> +++ b/doc/src/sgml/ref/set_role.sgml >>> @@ -41,8 +41,10 @@ RESET ROLE >>> >>> >>> >>> - The specified role_name >>> - must be a role that the current session user is a member of. >>> + The current session user must have the SET option for the >>> + specified role_name, either >>> + directly or indirectly via a chain of memberships with the >>> + SET option. >>> (If the session user is a superuser, any role can be selected.) >>> >> >> This is a good change; I should have done this when SET was added. > > Cool. Actually, shouldn't this one be back-patched to v16? If so, I'd do that one separately from the other changes we are discussing. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: SET ROLE documentation improvement
On Fri, Mar 22, 2024 at 03:58:59PM -0400, Robert Haas wrote: > On Fri, Nov 10, 2023 at 12:41 PM Nathan Bossart > wrote: >> Looking again, I'm kind of hesitant to add too much qualification to this >> note about losing superuser privileges. > > The note in question is: > > >Note that when a superuser chooses to SET ROLE to a >non-superuser role, they lose their superuser privileges. > > > It's not entirely clear to me what the point of this note is. I think > we could consider removing it entirely, on the theory that it's just > a poorly-stated special case of what's already been said in the > description, i.e. "permissions checking for SQL commands is carried > out as though the named role were the one that had logged in > originally" and "The specified role_name must be a role that the > current session user is a member of." +1. IMHO these kinds of special mentions of SUPERUSER tend to be redundant, and, as evidenced by this thread, confusing. I'll update the patch. > I think it's also possible that what the author of this paragraph > meant was that role attributes like CREATEDB, CREATEROLE, REPLICATION, > and SUPERUSER follow the current user, not the session user. If we > think that was the point of this paragraph, we could make it say that > more clearly. However, I'm not sure that really needs to be mentioned, > because "permissions checking for SQL commands is carried out as > though the named role were the one that had logged in originally" > seems to cover that ground along with everything else. +1 >> I still think we should update the existing note about privileges for >> SET/RESET ROLE to something like the following: >> >> diff --git a/doc/src/sgml/ref/set_role.sgml b/doc/src/sgml/ref/set_role.sgml >> index 13bad1bf66..c91a95f5af 100644 >> --- a/doc/src/sgml/ref/set_role.sgml >> +++ b/doc/src/sgml/ref/set_role.sgml >> @@ -41,8 +41,10 @@ RESET ROLE >> >> >> >> - The specified role_name >> - must be a role that the current session user is a member of. >> + The current session user must have the SET option for the >> + specified role_name, either >> + directly or indirectly via a chain of memberships with the >> + SET option. >> (If the session user is a superuser, any role can be selected.) >> > > This is a good change; I should have done this when SET was added. Cool. > Another change we could consider is revising "permissions checking for > SQL commands is carried out as though the named role were the one that > had logged in originally" to mention that SET ROLE and SET SESSION > AUTHORIZATION are exceptions. That seems like a reasonable idea, although it might require a few rounds of wordsmithing. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Cannot find a working 64-bit integer type on Illumos
Robert Haas writes: > On Fri, Mar 22, 2024 at 3:31 PM Andres Freund wrote: >> IME the bigger issue is that sometimes it doesn't lead to outright failures, >> just to lots of stuff not being detected as supported / present. > Ugh. That does, indeed, sound not very nice. I could get behind throwing an error if -Werror is spotted. I think trying to pull it out and put it back is too much work and a bit too much magic. regards, tom lane
Re: Slow GRANT ROLE on PostgreSQL 16 with thousands of ROLEs
Nathan Bossart writes: > On Fri, Mar 22, 2024 at 11:54:48AM -0500, Nathan Bossart wrote: >> On Fri, Mar 22, 2024 at 12:53:15PM -0400, Tom Lane wrote: >>> Would you like to review the catcache patch further, or do you >>> think it's good to go? >> I'll take another look this afternoon. > LGTM Thanks for looking, I'll push that shortly. regards, tom lane
Re: [PATCH] plpython function causes server panic
Robert Haas writes: > - I don't think the documentation changes are entirely accurate. The > whole point of the patch is to allow parallel workers to make changes > to the transaction state, but the documentation says you can't. Maybe > we should just delete "change the transaction state" entirely from the > list of things that you're not allowed to do, since "write to the > database" is already listed separately; or maybe we should replace it > with something like "assign new transaction IDs or command IDs," > although that's kind of low-level. I don't think we should just delete > the "even temporarily" bit, as you've done. Fair enough. In the attached v2, I wrote "change the transaction state (other than by using a subtransaction for error recovery)"; what do you think of that? I dug around in the docs and couldn't really find anything about parallel-query transaction limitations other than this bit in parallel.sgml and the more or less copy-pasted text in create_function.sgml; did you have any other spots in mind? (I did find the commentary in README.parallel, but that's not exactly user-facing.) > - While I like the new comments in BeginInternalSubTransaction(), I > think the changes in ReleaseCurrentSubTransaction() and > RollbackAndReleaseCurrentSubTransaction() need more thought. Yah. After studying the code a bit more, I realized that what I'd done would cause IsInParallelMode() to start returning false during a subtransaction within parallel mode, which is surely not what we want. That state has to be heritable into subtransactions in some fashion. The attached keeps the current semantics of parallelModeLevel and adds a bool parallelChildXact field that is true if any outer transaction level has nonzero parallelModeLevel. That's possibly more general than we need today, but it seems like a reasonably clean definition. > One additional thing that might (or might not) be worth mentioning or > checking for here is that the leader shouldn't try to reduce the > height of the transaction state stack to anything less than what it > was when the parallel operation started; if it wants to do that, it > needs to clean up the parallel workers and exit parallel mode first. > Similarly a worker shouldn't ever end the toplevel transaction except > during backend cleanup. I think these things are already dealt with. However, one thing worth questioning is that CommitSubTransaction() will just silently kill any workers started during the current subxact, and likewise CommitTransaction() zaps workers without complaint. Shouldn't these instead throw an error about how you didn't close parallel mode, and then the corresponding Abort function does the cleanup? I did not change that behavior here, but it seems dubious. v2 attached works a bit harder on the comments and adds a simplistic test case. I feel that I don't want to incorporate the plpython crash that started this thread, as it's weird and dependent on Python code outside our control (though I have checked that we don't crash on that anymore). 
regards, tom lane diff --git a/doc/src/sgml/parallel.sgml b/doc/src/sgml/parallel.sgml index 5acc9537d6..dae9dd7f2f 100644 --- a/doc/src/sgml/parallel.sgml +++ b/doc/src/sgml/parallel.sgml @@ -545,10 +545,10 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%'; -Functions and aggregates must be marked PARALLEL UNSAFE if -they write to the database, access sequences, change the transaction state -even temporarily (e.g., a PL/pgSQL function that establishes an -EXCEPTION block to catch errors), or make persistent changes to +Functions and aggregates must be marked PARALLEL UNSAFE +if they write to the database, change the transaction state (other than by +using a subtransaction for error recovery), access sequences, or make +persistent changes to settings. Similarly, functions must be marked PARALLEL RESTRICTED if they access temporary tables, client connection state, cursors, prepared statements, or miscellaneous backend-local state that diff --git a/doc/src/sgml/ref/create_function.sgml b/doc/src/sgml/ref/create_function.sgml index 863d99d1fc..0d240484cd 100644 --- a/doc/src/sgml/ref/create_function.sgml +++ b/doc/src/sgml/ref/create_function.sgml @@ -428,22 +428,24 @@ CREATE [ OR REPLACE ] FUNCTION PARALLEL - PARALLEL UNSAFE indicates that the function - can't be executed in parallel mode and the presence of such a + + PARALLEL UNSAFE indicates that the function + can't be executed in parallel mode; the presence of such a function in an SQL statement forces a serial execution plan. This is the default. PARALLEL RESTRICTED indicates that - the function can be executed in parallel mode, but the execution is - restricted to parallel group leader. PARALLEL SAFE + the function can be executed in parallel mode, but only in the parallel +
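To make the parallelChildXact bookkeeping concrete, it amounts to roughly this (a sketch, not the precise patch text):

    bool
    IsInParallelMode(void)
    {
        TransactionState s = CurrentTransactionState;

        return s->parallelModeLevel != 0 || s->parallelChildXact;
    }

    /* ...and when pushing a subtransaction, the child inherits the
     * fact that some outer level is in parallel mode (p = parent): */
    s->parallelChildXact = (p->parallelModeLevel != 0 ||
                            p->parallelChildXact);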
Re: session username in default psql prompt?
On Wed, Mar 13, 2024 at 4:56 AM Andrew Dunstan wrote: > Reposting as the archive mail processor doesn't seem to like the Apple > mail attachment. I'm not really a fan of this. Right now my prompt looks like this: robert.haas=# If we did this, it would say: robert.h...@robert.haas=# I have yet to meet anyone who doesn't think that one Robert Haas is quite enough already, and perhaps too much by half. -- Robert Haas EDB: http://www.enterprisedb.com
Re: SET ROLE documentation improvement
On Fri, Nov 10, 2023 at 12:41 PM Nathan Bossart wrote: > On Tue, Sep 26, 2023 at 08:33:25AM -0700, Yurii Rashkovskii wrote: > > This is a good start, indeed. I've amended my patch to include it. > > Thanks for the new patch. > > Looking again, I'm kind of hesitant to add too much qualification to this > note about losing superuser privileges. The note in question is: Note that when a superuser chooses to SET ROLE to a non-superuser role, they lose their superuser privileges. It's not entirely clear to me what the point of this note is. I think we could consider removing it entirely, on the theory that it's just a poorly-stated special case of what's already been said in the description, i.e. "permissions checking for SQL commands is carried out as though the named role were the one that had logged in originally" and "The specified role_name must be a role that the current session user is a member of." I think it's also possible that what the author of this paragraph meant was that role attributes like CREATEDB, CREATEROLE, REPLICATION, and SUPERUSER follow the current user, not the session user. If we think that was the point of this paragraph, we could make it say that more clearly. However, I'm not sure that really needs to be mentioned, because "permissions checking for SQL commands is carried out as though the named role were the one that had logged in originally" seems to cover that ground along with everything else. > I still think we should update the existing note about privileges for > SET/RESET ROLE to something like the following: > > diff --git a/doc/src/sgml/ref/set_role.sgml b/doc/src/sgml/ref/set_role.sgml > index 13bad1bf66..c91a95f5af 100644 > --- a/doc/src/sgml/ref/set_role.sgml > +++ b/doc/src/sgml/ref/set_role.sgml > @@ -41,8 +41,10 @@ RESET ROLE > > > > - The specified role_name > - must be a role that the current session user is a member of. > + The current session user must have the SET option for the > + specified role_name, either > + directly or indirectly via a chain of memberships with the > + SET option. > (If the session user is a superuser, any role can be selected.) > This is a good change; I should have done this when SET was added. Another change we could consider is revising "permissions checking for SQL commands is carried out as though the named role were the one that had logged in originally" to mention that SET ROLE and SET SESSION AUTHORIZATION are exceptions. -- Robert Haas EDB: http://www.enterprisedb.com
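For illustration, here is roughly how the membership rule under discussion reads in practice; the role names are invented, and note that a superuser can SET ROLE to any role regardless of membership.

    CREATE ROLE alice LOGIN;
    CREATE ROLE app_owner NOLOGIN;
    GRANT app_owner TO alice WITH SET TRUE;  -- grants the SET option

    -- In a session where alice logged in:
    SET ROLE app_owner;       -- allowed: alice has the SET option on app_owner
    CREATE TABLE t (id int);  -- permission-checked as app_owner, which also owns t
    RESET ROLE;               -- back to alice's own privileges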
Re: [HACKERS] make async slave to wait for lsn to be replayed
Thank you for your feedback. On 2024-03-20 12:11, Alexander Korotkov wrote: On Wed, Mar 20, 2024 at 12:34 AM Kartyshov Ivan wrote: > 4.2 With an unreasonably high future LSN, BEGIN command waits > unboundedly, shouldn't we check if the specified LSN is more than > pg_last_wal_receive_lsn() error out? I think limiting wait lsn by current received lsn would destroy the whole value of this feature. The value is to wait till given LSN is replayed, whether it's already received or not. OK, sounds reasonable; I'll roll back the changes. But I don't see a problem here. On the replica, it's out of our control to check which lsn is good and which is not. We can't check whether the lsn, which is in the future for the replica, has already been issued by the primary. For the case of a wrong lsn, which could cause a potentially infinite wait, there are the timeout and the manual query cancel. Fully agree with this take. > 4.3 With an unreasonably high wait time, BEGIN command waits > unboundedly, shouldn't we restrict the wait time to some max value, > say a day or so? > SELECT pg_last_wal_receive_lsn() + 1 AS future_receive_lsn \gset > BEGIN AFTER :'future_receive_lsn' WITHIN 10; Good idea; I made it one day, but we should discuss this limit. Do you think that specifying the timeout in milliseconds is suitable? I would prefer to switch to seconds (with the ability to specify fractions of a second). This was expressed before by Alexander Lakhin. It sounds like an interesting idea; please review the result. > https://github.com/macdice/redo-bench or similar tools? Ivan, could you do this? Yes, I ran redo-bench/crash-recovery.sh:

This patch on master:
91.327, 1.973
105.907, 3.338
98.412, 4.579
95.818, 4.19

REL_13-STABLE:
116.645, 3.005
113.212, 2.568
117.644, 3.183
111.411, 2.782

master:
124.712, 2.047
117.012, 1.736
116.328, 2.035
115.662, 1.797

Strange behavior: the patched version is faster than REL_13-STABLE and master. I don't see this change in the patch. Normally if a process gets a signal, that causes WaitLatch() to exit immediately. It also exits immediately on query cancel. IIRC, this 1 minute timeout is needed to handle some extreme cases when an interrupt is missing. Other places have it equal to 1 minute; I don't see why we should have it different. OK, I'll roll back my changes. 4) added and expanded sections in the documentation I don't see this in the patch. I see only a short description in func.sgml, which is definitely not sufficient. We need at least everything we had in the docs before, adjusted to the current procedure-based approach. I didn't find another section in which to add the description of pg_wait_lsn(), so I extended the description at the bottom of the table. 5) add default variant of timeout pg_wait_lsn(trg_lsn pg_lsn, delay int8 DEFAULT 0) example: pg_wait_lsn('0/31B1B60') is equal to pg_wait_lsn('0/31B1B60', 0) Does zero here mean no timeout? I think this should be documented. Also, I would prefer to see a timeout by default; probably one minute would be a good default. Let's discuss this point. The loop in WaitForLSN is written so that only delay=0 waits for the LSN indefinitely, with no timeout, so the default must be 0. Please take one more look at the patch. PS: sorry, a strange bug threw my mails out of the thread.
https://www.postgresql.org/message-id/flat/f2ff071aa9141405bb8efee67558a058%40postgrespro.ru -- Ivan Kartyshov Postgres Professional: www.postgrespro.com

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 5b225ccf4f..6e1f69962d 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -27861,10 +27861,54 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset extension. + + + + + pg_wait_lsn + +pg_wait_lsn (trg_lsn pg_lsn, delay int8 DEFAULT 0) + + +Returns ERROR if target LSN was not replayed on standby. +Parameter delay sets seconds, the time to wait to the LSN. + + + +pg_wait_lsn waits till wait_lsn +to be replayed on standby to achieve read-your-writes-consistency, while +using async replica for reads and primary for writes. In that case lsn of +last modification is stored inside application. + + + +You can use pg_wait_lsn to wait the pg_lsn +value. For example: + + Replica: + postgresql.conf + recovery_min_apply_delay = 10s; + + Primary: Application update table and get lsn of changes + postgres=# UPDATE films SET kind = 'Dramatic' WHERE kind = 'Drama'; + postgres=# SELECT pg_current_wal_lsn(); + pg_current_wal_lsn + + 0/306EE20 + (1 row) + + Replica: Wait and read updated data + postgres=# CALL pg_wait_lsn('0/306EE20', 100); // Timeout 100ms is insufficient + ERROR: canceling waiting for LSN due to timeout + postgres=# CALL pg_wait_lsn('0/306EE20'); +
Re: Slow GRANT ROLE on PostgreSQL 16 with thousands of ROLEs
On Fri, Mar 22, 2024 at 11:54:48AM -0500, Nathan Bossart wrote: > On Fri, Mar 22, 2024 at 12:53:15PM -0400, Tom Lane wrote: >> Would you like to review the catcache patch further, or do you >> think it's good to go? > > I'll take another look this afternoon. LGTM -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Cannot find a working 64-bit integer type on Illumos
On Fri, Mar 22, 2024 at 3:31 PM Andres Freund wrote: > IME the bigger issue is that sometimes it doesn't lead to outright failures, > just to lots of stuff not being detected as supported / present. Ugh. That does, indeed, sound not very nice. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Cannot find a working 64-bit integer type on Illumos
Hi, On 2024-03-22 15:02:45 -0400, Robert Haas wrote: > On Fri, Mar 22, 2024 at 2:38 PM Andres Freund wrote: > > I wonder if we ought to make configure warn if it sees -Werror in CFLAGS - > > this is far from the first time somebody has stumbled over -Werror, including a > > few quite senior hackers, if I recall correctly. We could also just filter it out > > temporarily and put it back at the end of configure. > > I think I made this mistake at some point, but I just looked at > config.log and corrected my mistake. IME the bigger issue is that sometimes it doesn't lead to outright failures, just to lots of stuff not being detected as supported / present. Greetings, Andres Freund
Re: documentation structure
On Fri, Mar 22, 2024 at 3:17 PM Bruce Momjian wrote: > I agree and they should be with the other views. I was just explaining > why, at the time, I didn't touch them. Ah, OK. That makes total sense. -- Robert Haas EDB: http://www.enterprisedb.com
Re: documentation structure
On Fri, Mar 22, 2024 at 03:13:29PM -0400, Robert Haas wrote: > On Fri, Mar 22, 2024 at 2:59 PM Bruce Momjian wrote: > > I assume statistics collector views are in "Monitoring Database > > Activity" because that is their purpose. > > Well, yes. :-) > > But the point is that all other statistics views are in a single > section regardless of their purpose. We don't document pg_roles in the > "Database Roles" chapter, for example. > > And on the flip side, pg_locks and pg_replication_origin_status are > also for monitoring database activity, but they're in the "System > Views" chapter anyway. The only system views that are in "Monitoring > Database Activity" rather than "System Views" are the ones where the > name starts with "pg_stat_". > > So the reason you state is why these views are under "Monitoring > Database Activity" rather than a chapter chosen at random. But it > doesn't really explain why they're separate from the other system > views at all. That seems to be a pretty much random choice, AFAICT. I agree and they should be with the other views. I was just explaining why, at the time, I didn't touch them. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
Re: documentation structure
On Fri, Mar 22, 2024 at 2:59 PM Bruce Momjian wrote: > I assume statistics collector views are in "Monitoring Database > Activity" because that is their purpose. Well, yes. :-) But the point is that all other statistics views are in a single section regardless of their purpose. We don't document pg_roles in the "Database Roles" chapter, for example. And on the flip side, pg_locks and pg_replication_origin_status are also for monitoring database activity, but they're in the "System Views" chapter anyway. The only system views that are in "Monitoring Database Activity" rather than "System Views" are the ones where the name starts with "pg_stat_". So the reason you state is why these views are under "Monitoring Database Activity" rather than a chapter chosen at random. But it doesn't really explain why they're separate from the other system views at all. That seems to be a pretty much random choice, AFAICT. -- Robert Haas EDB: http://www.enterprisedb.com
Re: documentation structure
On Fri, Mar 22, 2024 at 11:19 AM Robert Haas wrote: > On Fri, Mar 22, 2024 at 1:35 PM Bruce Momjian wrote: > > But that all seems like a separate question from why we have the > statistic collector views in a completely different part of the > documentation from the rest of the system views. My guess is that it's > just kind of a historical accident, but maybe there was some other > logic to it. > > The details underpinning the cumulative statistics subsystem are definitely large enough to warrant their own subsection. And it isn't as though placing them into the monitoring chapter is wrong; aside from a couple of views, those under System Views don't fit into what we've defined as monitoring. I don't have any desire to lump them under the generic System Views chapter, which itself could probably use a level of categorization, since the nature of pg_locks and pg_cursors is decidedly different from that of pg_indexes and pg_config. This all becomes more appealing to work on once we solve the problem of all sect2 entries being placed on a single page. For a long while I would always look for pg_stat_activity under System Views instead of Monitoring. Amending my prior suggestion in light of this, I suggest we move the Cumulative Statistics Views into Reference, but as its own chapter, not part of System Views, and change its name to "Monitoring Views" (going more generalized here feels like a win to me). I'd move pg_locks, pg_cursors, pg_backend_memory_contexts, pg_prepared_*, pg_shmem_allocations, and pg_replication_*. Those all have the same general monitoring nature to them, compared to the others that basically provide details regarding schema and static or session configuration. The original server admin monitoring section can go into detail regarding cumulative statistics versus other kinds of monitoring. We can use section ordering to fulfill logical grouping desires until we are able to make sect3 entries appear on their own pages. David J.
Re: Cannot find a working 64-bit integer type on Illumos
On Fri, Mar 22, 2024 at 2:38 PM Andres Freund wrote: > I wonder if we ought to make configure warn if it sees -Werror in CFLAGS - > this is far from the first time somebody has stumbled over -Werror, including a > few quite senior hackers, if I recall correctly. We could also just filter it out > temporarily and put it back at the end of configure. I think I made this mistake at some point, but I just looked at config.log and corrected my mistake. I'm not strongly against having an explicit check for -Werror, but I think the main problem here is that the original poster didn't have a look at config.log to see what the actual problem was, and at least IME that's necessary in pretty much 100% of cases where configure fails for whatever reason. Perhaps autotools could be better designed in that regard, but we don't necessarily want to work around every problem that can stem from that design choice in our code, either. -- Robert Haas EDB: http://www.enterprisedb.com
Re: documentation structure
On Fri, Mar 22, 2024 at 02:19:29PM -0400, Robert Haas wrote: > If you were actually looking for the section called "System Views", > you weren't likely to see it here unless you already knew it was > there, because it was 64 items into a 97-item list. Having one of > these two sections inside the other just doesn't work at all. We could > have alternatively chosen to have one chapter with two tags > inside of it, but I think what you actually did was perfectly fine. > IMHO, "System Views" is important enough (and big enough) that giving > it its own chapter is perfectly reasonable. > > But that all seems like a separate question from why we have the > statistic collector views in a completely different part of the > documentation from the rest of the system views. My guess is that it's > just kind of a historical accident, but maybe there was some other > logic to it. I assume statistics collector views are in "Monitoring Database Activity" because that is their purpose. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
Re: Large block sizes support in Linux
On Thu, Mar 21, 2024 at 06:46:19PM +0100, Pankaj Raghav (Samsung) wrote: > Hello, > > My team and I have been working on adding Large block size (LBS) > support to XFS in Linux[1]. Once this feature lands upstream, we will be > able to create XFS with FS block size > page size of the system on Linux. > We also gave a talk about it in Linux Plumbers conference recently[2] > for more context. The initial support is only for XFS but more FSs will > follow later. > > On an x86_64 system, fs block size was limited to 4k, but traditionally > Postgres uses 8k as its default internal page size. With LBS support, > fs block size can be set to 8K, thereby matching the Postgres page size. > > If the file system block size == DB page size, then Postgres can have > guarantees that a single DB page will be written as a single unit during > kernel write back and not split. > > My knowledge of Postgres internals is limited, so I'm wondering if there > are any optimizations or potential optimizations that Postgres could > leverage once we have LBS support on Linux? We have discussed this in the past, and in fact in the early years we thought we didn't need fsync since the BSD file system was 8k at the time. What we later realized is that we have no guarantee that the file system will write to the device in the specified block size, and even if it does, the I/O layers between the OS and the device might not, since many devices use 512 byte blocks or other sizes. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
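For concreteness, the setup being proposed would presumably look something like this once the kernel and xfsprogs support lands; the device and mount point below are placeholders.

    # Create an XFS file system whose block size matches Postgres's 8K page
    # size; on current kernels this fails when the system page size is 4K.
    mkfs.xfs -b size=8192 /dev/sdX
    mount /dev/sdX /var/lib/postgresql

    # Confirm the file system block size.
    xfs_info /var/lib/postgresql | grep bsize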
Re: pg_upgrade --copy-file-range
On Fri, Mar 22, 2024 at 1:22 PM Tomas Vondra wrote: > Right, this will happen: > > pg_combinebackup: error: unable to use accelerated copy when manifest > checksums are to be calculated. Use --no-manifest > > Are you saying we should just silently override the copy method and do > the copy block by block? Yes. > I'm not strongly opposed to that, but it feels > wrong to just ignore that the user explicitly requested cloning, and I'm > not sure why this should be different from any other case when the user > requests an incompatible combination of options and/or options that are not > supported on the current configuration. I don't feel like copying block-by-block when that's needed to compute a checksum is ignoring what the user requested. I mean, if we'd had to perform reconstruction rather than copying an entire file, we would have done that regardless of whether --clone had been there, and I don't see why the need-to-compute-a-checksum case is any different. I think we should view a flag like --clone as specifying how to copy a file when we don't need to do anything but copy it. I don't think it should dictate that we're not allowed to perform other processing when that other processing is required. From my point of view, this is not a case of incompatible options having been specified. If you run pg_basebackup with both --format=p and --format=t, those are incompatible options; the backup can be done one way or the other, but not both at once. But here there's no such conflict. Block-by-block copying and fast-copying can happen as part of the same operation, as in the example that I showed, where some files need the block-by-block copying and some can be fast-copied. The user is entitled to specify which fast-copying method they would like to have used for the files where fast-copying is possible without getting a failure just because it isn't possible for every single file. Or to say it the other way around, if there's 1 file that needs to be copied block by block and 99 files that can be fast-copied, you want to force the user to the block-by-block method for all 100 files. I want to force the use of the block-by-block method for the 1 file where that's the only valid method, and let them choose what they want to do for the other 99. -- Robert Haas EDB: http://www.enterprisedb.com
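A sketch of the per-file dispatch Robert is describing; all names here are invented for illustration and are not the actual pg_combinebackup code.

    typedef enum
    {
        COPY_METHOD_BLOCKS,          /* default: read and write block by block */
        COPY_METHOD_CLONE,           /* --clone */
        COPY_METHOD_COPY_FILE_RANGE  /* --copy-file-range */
    } CopyMethod;

    typedef enum
    {
        CHECKSUM_TYPE_NONE,
        CHECKSUM_TYPE_CRC32C,
        CHECKSUM_TYPE_SHA224
    } ChecksumType;

    extern void copy_file_blocks(const char *src, const char *dst, ChecksumType t);
    extern void clone_file(const char *src, const char *dst);
    extern void copy_file_range_whole(const char *src, const char *dst);

    static void
    copy_one_file(const char *src, const char *dst,
                  CopyMethod requested, ChecksumType checksum_type)
    {
        if (checksum_type != CHECKSUM_TYPE_NONE)
        {
            /*
             * Every block must be read to compute the checksum anyway, so
             * the requested fast-copy method simply does not apply here.
             */
            copy_file_blocks(src, dst, checksum_type);
            return;
        }

        switch (requested)
        {
            case COPY_METHOD_CLONE:
                clone_file(src, dst);
                break;
            case COPY_METHOD_COPY_FILE_RANGE:
                copy_file_range_whole(src, dst);
                break;
            default:
                copy_file_blocks(src, dst, CHECKSUM_TYPE_NONE);
                break;
        }
    }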
Re: Cannot find a working 64-bit integer type on Illumos
Hi, On 2024-03-22 13:25:56 -0400, Tom Lane wrote: > The short answer is that Autoconf is not designed to support -Werror > and it's not worth it to try to make it do so. I wonder if we ought to make configure warn if it sees -Werror in CFLAGS - this is far from the first time somebody has stumbled over -Werror, including a few quite senior hackers, if I recall correctly. We could also just filter it out temporarily and put it back at the end of configure. I don't think there's a great way of making the autoconf buildsystem use -Werror continually, today. IIRC the best way is to use Makefile.custom. Greetings, Andres Freund
Re: documentation structure
On Fri, Mar 22, 2024 at 1:35 PM Bruce Momjian wrote: > > One question I have is why all of these views are documented here > > rather than in chapter 53, "System Views," because surely they are > > system views. I feel like if our documentation index weren't a mile > > long and if you could easily find the entry for "System Views," that's > > where you would naturally look for these details. I don't think it's > > natural for a user to expect that most of the system views are going > > to be documented in section VII, chapter 53 but one particular kind is > > going to be documented in section III, chapter 27, under a chapter > > Well, until this commit in 2022, the system views were _under_ the > system catalogs chapter: Even before that commit, the statistics collector views were documented in a completely separate part of the documentation from all of the other system views. I think that commit was a good idea, even though it made the top-level documentation index bigger, because in v14, the "System Catalogs" chapter looks like this: ... 52.61. pg_ts_template 52.62. pg_type 52.63. pg_user_mapping 52.64. System Views 52.65. pg_available_extensions 52.66. pg_available_extension_versions 52.67. pg_backend_memory_contexts ... If you were actually looking for the section called "System Views", you weren't likely to see it here unless you already knew it was there, because it was 64 items into a 97-item list. Having one of these two sections inside the other just doesn't work at all. We could have alternatively chosen to have one chapter with two tags inside of it, but I think what you actually did was perfectly fine. IMHO, "System Views" is important enough (and big enough) that giving it its own chapter is perfectly reasonable. But that all seems like a separate question from why we have the statistic collector views in a completely different part of the documentation from the rest of the system views. My guess is that it's just kind of a historical accident, but maybe there was some other logic to it. -- Robert Haas EDB: http://www.enterprisedb.com
Re: [PATCH] plpython function causes server panic
On Fri, Mar 22, 2024 at 1:52 PM Tom Lane wrote: > Robert Haas writes: > > I agree with the general direction. A few comments: > > > - Isn't it redundant to test if IsInParallelMode() || > > IsParallelWorker()? We can't be in a parallel worker without also > > being in parallel mode, except during the worker startup sequence. > > Hmm. The existing code in AssignTransactionId and > CommandCounterIncrement tests both, so I figured that the conservative > course was to make DefineSavepoint and friends test both. Are you > saying AssignTransactionId and CommandCounterIncrement are wrong? > If you're saying you don't believe that these routines are reachable > during parallel worker start, that could be true, but I'm not sure > I want to make that assumption. In any case, surely the xxxSavepoint > routines are not hot enough to make it an interesting > micro-optimization. (Perhaps it is worthwhile in AssignTransactionId > and CCI, but changing those seems like a job for another patch.) Yeah, that's all fair enough. I went back and looked at the history of this and found 94b4f7e2a635c3027a23b07086f740615b56aa64. -- Robert Haas EDB: http://www.enterprisedb.com
Re: [PATCH] plpython function causes server panic
Robert Haas writes: > I agree with the general direction. A few comments: > - Isn't it redundant to test if IsInParallelMode() || > IsParallelWorker()? We can't be in a parallel worker without also > being in parallel mode, except during the worker startup sequence. Hmm. The existing code in AssignTransactionId and CommandCounterIncrement tests both, so I figured that the conservative course was to make DefineSavepoint and friends test both. Are you saying AssignTransactionId and CommandCounterIncrement are wrong? If you're saying you don't believe that these routines are reachable during parallel worker start, that could be true, but I'm not sure I want to make that assumption. In any case, surely the xxxSavepoint routines are not hot enough to make it an interesting micro-optimization. (Perhaps it is worthwhile in AssignTransactionId and CCI, but changing those seems like a job for another patch.) regards, tom lane
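For reference, the guard under discussion has this general shape; the following is a sketch of the xact.c pattern, not the exact committed code.

    #include "postgres.h"
    #include "access/parallel.h"
    #include "access/xact.h"

    void
    DefineSavepointSketch(const char *name)
    {
        /*
         * The belt-and-suspenders test: IsInParallelMode() covers the
         * leader (and workers once their state is synchronized), while
         * IsParallelWorker() also covers a worker during its startup
         * sequence, which is the case Tom doesn't want to assume away.
         */
        if (IsInParallelMode() || IsParallelWorker())
            ereport(ERROR,
                    (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
                     errmsg("cannot define savepoints during a parallel operation")));

        /* ... normal savepoint bookkeeping would follow ... */
    }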
Re: [DOC] Add detail regarding resource consumption wrt max_connections
On Fri, Mar 8, 2024 at 9:52 AM Robert Treat wrote: > I'm of the opinion that advice suggesting DBAs set things to DEBUG3 > is unfriendly at best. If you really want to add more, there is an > existing unfriendly section of the docs at > https://www.postgresql.org/docs/devel/kernel-resources.html#LINUX-MEMORY-OVERCOMMIT > that mentions this problem, specifically: > > "If PostgreSQL itself is the cause of the system running out of > memory, you can avoid the problem by changing your configuration. In > some cases, it may help to lower memory-related configuration > parameters, particularly shared_buffers, work_mem, and > hash_mem_multiplier. In other cases, the problem may be caused by > allowing too many connections to the database server itself. In many > cases, it may be better to reduce max_connections and instead make use > of external connection-pooling software." > > I couldn't really find a spot to add in your additional info, but > maybe you can find a spot that fits? Or maybe a well-written > walk-through of this would make for a good wiki page in case people > really want to dig in. > > In any case, I think Roberto's original language is an improvement > over what we have now, so I'd probably recommend just going with that, > along with a similar note to max_prepared_xacts, and optionally a > pointer to the shared mem section of the docs. I agree with this. I don't agree with Cary's statement that if you increase max_connections you should increase shared_buffers as well. That seems situation-dependent to me, and it's also missing Roberto's point, which is that JUST increasing max_connections without doing anything else uses more shared memory. Similarly, I don't think we need to document a detailed testing procedure, as proposed by Reid. If users want to know exactly how many additional resources are used, they can test; either using the DEBUG3 approach, or perhaps more simply via the pg_shmem_allocations view. But I think it's overkill for us to recommend any specific testing procedure here. Rather, I think that it's entirely appropriate to do what Roberto suggested, which is to say, let users know that they're going to use some extra resources if they increase the setting, and then let them figure out what if anything they want to do about that. -- Robert Haas EDB: http://www.enterprisedb.com
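For users who do want to measure the effect rather than follow a prescribed procedure, the view mentioned above makes the comparison straightforward:

    -- Largest shared-memory allocations under the current settings; re-run
    -- after restarting with a different max_connections and compare.
    SELECT name, pg_size_pretty(allocated_size) AS allocated
    FROM pg_shmem_allocations
    ORDER BY allocated_size DESC
    LIMIT 10;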
Re: Proposal for implementing OCSP Stapling in PostgreSQL
On Tue, Mar 5, 2024 at 4:12 PM David Zhang wrote: > Any comments or feedback would be greatly appreciated! Hi David -- I haven't had time to get to this for the 17 release cycle, but I'm interested in this feature and I intend to review it at some point for 18. I think OCSP will be especially helpful for anyone relying on sslrootcert=system. Thanks for working on it! --Jacob
Re: Add minimal C example and SQL registration example for custom table access methods.
On Fri, Jan 26, 2024 at 3:03 PM Fabrízio de Royes Mello wrote: > On Wed, Nov 15, 2023 at 8:29 PM Roberto Mello wrote: > > Suggestion: > > > > In the C example you added you mention in the comment: > > > > + /* Methods from TableAmRoutine omitted from example, but all > > + non-optional ones must be provided here. */ > > > > Perhaps you could provide a "see " to point the reader finding your > > example where he could find these non-optional methods he must provide? > > > > Nitpicking a little: your patch appears to change more lines than it does, > > because it added line breaks earlier in the lines. I would generally avoid > > that unless there's good reason to do so. > > Hey folks, > > There is a previous patch [1] around the same topic. What about joining > efforts on pointing these documentation changes to the proposed test module? > > [1] https://commitfest.postgresql.org/46/4588/ Looking over this thread, I see that it was moved from pgsql-docs to pgsql-hackers while at the same time dropping the original poster from the Cc list. That seems rather unfortunate. I suspect there's a pretty good chance that Phil Eaton hasn't seen any of the replies other than the first one from Paul Jungwirth, which is also the only one that didn't ask for anything to be changed. Re-adding Phil. Phil, you should have a look over https://www.postgresql.org/message-id/flat/CAByiw%2Br%2BCS-ojBDP7Dm%3D9YeOLkZTXVnBmOe_ajK%3Den8C_zB3_g%40mail.gmail.com and respond to the various emails and probably update the patch somehow. Note that feature freeze is in 2 weeks, so if we can't reach agreement on what is to be done here soon, this will have to wait for the next cycle, or later. -- Robert Haas EDB: http://www.enterprisedb.com
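For anyone picking this thread back up, the minimal example under discussion has roughly the following shape; this is a sketch based on the patch's description, and the "mem" names are placeholders.

    #include "postgres.h"
    #include "fmgr.h"
    #include "access/tableam.h"

    PG_MODULE_MAGIC;

    static const TableAmRoutine mem_methods = {
        .type = T_TableAmRoutine,
        /*
         * Callbacks omitted from the example, but every non-optional member
         * of TableAmRoutine (see src/include/access/tableam.h) must be
         * provided here.
         */
    };

    PG_FUNCTION_INFO_V1(mem_tableam_handler);

    Datum
    mem_tableam_handler(PG_FUNCTION_ARGS)
    {
        PG_RETURN_POINTER(&mem_methods);
    }

paired with the SQL registration:

    CREATE FUNCTION mem_tableam_handler(internal)
        RETURNS table_am_handler
        AS 'MODULE_PATHNAME', 'mem_tableam_handler'
        LANGUAGE C STRICT;

    CREATE ACCESS METHOD mem TYPE TABLE HANDLER mem_tableam_handler;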
Re: Adding comments to help understand psql hidden queries
On Fri, Mar 22, 2024 at 11:39 AM David Christensen wrote: > I think it's easier to keep the widths balanced than constant (patch > version included here) Yeah, I'm fine with that, especially because nobody is translating it, nor are they likely to, to be honest. Cheers, Greg
Re: documentation structure
On Fri, Mar 22, 2024 at 08:32:14AM -0400, Robert Haas wrote: > On Thu, Mar 21, 2024 at 6:32 PM David G. Johnston > wrote: > > Just going to note that the section on the cumulative statistics views > > being a single page is still a strongly bothersome issue here. Though the > > quick fix actually requires upgrading the section to chapter status... > > Yeah, I've been bothered by this in the past, too. I'm not very keen > to start promoting things to the top-level, though. I think we need a > more thoughtful fix than that. > > One question I have is why all of these views are documented here > rather than in chapter 53, "System Views," because surely they are > system views. I feel like if our documentation index weren't a mile > long and if you could easily find the entry for "System Views," that's > where you would naturally look for these details. I don't think it's > natural for a user to expect that most of the system views are going > to be documented in section VII, chapter 53 but one particular kind is > going to be documented in section III, chapter 27, under a chapter Well, until this commit in 2022, the system views were _under_ the system catalogs chapter:

commit 64d364bb39c
Author: Bruce Momjian
Date: Thu Jul 14 16:07:12 2022 -0400

doc: move system views section to its own chapter

Previously it was inside the system catalogs chapter.

Reported-by: Peter Smith
Discussion: https://postgr.es/m/cahut+psmc18qp60d+l0hjboxrlqt5m88yvacdyxlq34gfph...@mail.gmail.com
Backpatch-through: 15

The thread contains more discussion of the issue, and I think it still needs help: https://www.postgresql.org/message-id/flat/CAHut%2BPsMc18QP60D%2BL0hJBOXrLQT5m88yVaCDyxLq34gfPHsow%40mail.gmail.com -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
Re: Simplify documentation related to Windows builds
On Tue, Jan 30, 2024 at 3:02 AM Michael Paquier wrote: > The more I think about this thread, the more I'd tend to wipe out most > of "windows-requirements" for the sole reason that it is the far-west > regarding the various ways it is possible to get the dependencies we > need for the build and at runtime. We could keep it minimal with the > set of requirements we are listing under meson in terms of versions: > https://www.postgresql.org/docs/devel/install-requirements.html I'm not very knowledgeable about building software on Windows in general, but on the rare occasions that I've done it, it was MUCH harder to figure out where to get things like Perl than it is on Linux or macOS machines. On Linux, your package manager probably knows about everything you need, and if it doesn't, you can probably fix that by adding an additional RPM repository to your configuration or using something like CPAN to find Perl modules that your OS package manager doesn't have. On macOS, you can install homebrew or macports and then get most things from there. But on Windows you have to go download installers individually for everything you need, and there's lots of installers on the Internet, and not all of them are prepared by equally friendly people, and not all of them necessarily work for building PostgreSQL. So I think that it's pretty darn helpful to have some installation instructions in the documentation for stuff like this, just like I think it's useful that in the documentation we tell people how to get the doc toolchain working on various platforms. I understand the concern about seeming to endorse particular Perl distributions or other software bundles, but I also don't like the idea of telling people something that boils down to "hey, it's possible to get this to compile on Windows, and we know some methods that do work, but we're not going to tell you what they are because we don't want to endorse anything so ... good luck!". If we know a set of things that work, I think we should list them in the documentation and just say that we're not endorsing the use of these particular distributions but just telling you that we've tested with them. And then I think we should update that as things change. -- Robert Haas EDB: http://www.enterprisedb.com
Re: make dist using git archive
On Thu Mar 21, 2024 at 3:44 AM CDT, Peter Eisentraut wrote: Here is an updated version of this patch set. You should add 'disabler: true' to the git find_program in Meson. With the way your patch is currently written, if Git doesn't exist on the system the targets would still be defined, even though they would never succeed. You may also want to make sure that we are actually in a Git repository; I don't think git-archive works outside one. Re the autocrlf, is this something we could throw in a .gitattributes file? I have removed the "dirty check" stuff. It didn't really work well/was buggy under meson, and it failed mysteriously on the Linux CI tasks. So let's just drop that functionality for now. I have also added a more complete commit message and some more code comments. Meson has its own distribution building command (meson dist), but we are not using that at this point. The main problem is that the way they have implemented it, it is not deterministic in the above sense. Also, we want a "make" version for the time being. But the target name "dist" in meson is reserved for that reason, so we call the custom target "pgdist" (so call something like "meson compile -C build pgdist"). I would suggest poisoning `meson dist` in the following way:

if not meson.is_subproject()
  # Maybe edit the comment... Maybe tell perl to print this message
  # instead and then exit non-zero?
  #
  # Meson has its own distribution building command (meson dist), but we
  # are not using that at this point. The main problem is that the way
  # they have implemented it, it is not deterministic in the above sense.
  # Also, we want a "make" version for the time being. But the target
  # name "dist" in meson is reserved for that reason, so we call the
  # custom target "pgdist" (so call something like "meson compile -C build
  # pgdist").
  #
  # We don't poison the dist if we are a subproject because it is
  # possible that the parent project may want to create a dist using
  # the builtin Meson method.
  meson.add_dist_script(perl, '-e', 'exit 1')
endif

I have extracted the freebsd CI script fix into a separate patch (0002). I think this is useful even if we don't take the full CI patch (0003). 0002 looks pretty reasonable to me. About the 0003 patch: It seems useful in principle to test these things continuously. The dist script runs about 10 seconds in each task, and takes a bit of disk space for the artifacts. I'm not sure to what degree this might bother someone. 0003 works for me :). -- Tristan Partin Neon (https://neon.tech)
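For context, the determinism being relied on is roughly the following; this is a sketch, and the real targets also derive the prefix from the version and can emit checksums.

    # Given the same commit, these produce byte-identical output on any
    # machine, which is what makes a git-archive-based "make dist"
    # reproducible; gzip -n omits the embedded name and timestamp.
    git archive --format=tar --prefix=postgresql-17devel/ HEAD \
        | gzip -n > postgresql-17devel.tar.gz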
Re: DOCS: add helpful partitioning links
On Thu, Mar 21, 2024 at 7:27 AM Ashutosh Bapat wrote: > On Wed, Mar 20, 2024 at 5:22 PM Ashutosh Bapat > wrote: >> On Tue, Mar 19, 2024 at 6:38 PM Robert Treat wrote: >>> >>> >>> I've put it in the next commitfest with target version of 17, and I've >>> added you as a reviewer :-) >>> >> >> Thanks. >> >>> >>> Also, attached is an updated patch with your change above which should >>> apply cleanly to the current git master. >> >> >> It did apply for me now. >> >> The HTML renders good, the links work as expected. >> >> The CREATE TABLE ... LIKE command >> I think the original word "option" instead of "command" is better since we >> are talking about LIKE as an option to CREATE TABLE instead of CREATE TABLE >> command. >> >> + but any future attached or created partitions will be indexed as well. >> >> I think just "any future partitions will be indexed as well" would suffice, >> no need to mention whether they were created or attached. >> >> + One limitation when creating new indexes on partitioned tables is that >> it >> + is not possible to use the CONCURRENTLY >> + qualifier when creating such a partitioned index. To avoid long >> >> The sentence uses two different phrases, "indexes on partitioned tables" and >> "partitioned index", for the same thing in the same sentence. Probably it is >> better to leave original sentence as is. >> >> But I think it's time for a committer to take a look at this. Please feel >> free to address the above comments if you agree with them. Marking this as >> ready for committer. >> > > The patch changes things not directly related to $Subject. It will be good to > add a commit message to the patch describing what are those changes about. I > observe that all of them are in section "partition maintenance". > https://www.postgresql.org/docs/16/ddl-partitioning.html#DDL-PARTITIONING-DECLARATIVE-MAINTENANCE. > Do you see any more edits required in that section? > Heh, well, I had thought about some other possible improvements to that section but hadn't quite worked them out, but you inspired me to have another go of it ;-) v5 patch attached which I think further improves clarity/brevity of this section. I've left the patch name the same for simplicity, but I'd agree that the commit would now be more along the lines of editing / improvements / copyediting of "Partition Maintenance" docs. Robert Treat https://xzilla.net improve-partition-links_v5.patch Description: Binary data
Re: Built-in CTYPE provider
On Fri, 2024-03-22 at 15:51 +0100, Peter Eisentraut wrote: > I think this might be too big of a compatibility break. So far, > initcap('123abc') has always returned '123abc'. If the new collation > returns '123Abc' now, then that's quite a change. These are not some > obscure Unicode special case characters, after all. It's a new collation, so I'm not sure it's a compatibility break. But you are right that it is against documentation and expectations for INITCAP(). > What is the ICU configuration incantation for this? Maybe we could > have > the builtin provider understand some of that, too. https://unicode-org.github.io/icu-docs/apidoc/dev/icu4c/stringoptions_8h.html#a4975f537b9960f0330b233061ef0608d https://unicode-org.github.io/icu-docs/apidoc/dev/icu4c/stringoptions_8h.html#afc65fa226cac9b8eeef0e877b8a7744e > Or we should create a function separate from initcap. If we create a new function, that also gives us the opportunity to accept optional arguments to control the behavior rather than relying on collation for every decision. Regards, Jeff Davis
Re: Cannot find a working 64-bit integer type on Illumos
Japin Li writes: > On Sat, 23 Mar 2024 at 00:53, Andres Freund wrote: >> Likely there's an unrelated warning triggering the configure test to >> fail. We'd need to see config.log to see what that is. > Thanks for your quick reply. Attach the config.log. Yup:

conftest.c:139:5: error: no previous prototype for 'does_int64_work' [-Werror=missing-prototypes]
  139 | int does_int64_work()
      |     ^~~
cc1: all warnings being treated as errors
configure:17003: $? = 1
configure: program exited with status 1

This warning is harmless normally, but breaks the configure probe if you enable -Werror. No doubt we could improve that test snippet so that it does not trigger that warning. But trying to make configure safe for -Werror seems like a fool's errand, for these reasons:

* Do you really want to try to make all of configure's probes proof against every compiler warning everywhere?

* Many of the test snippets aren't readily under our control, as they are supplied by Autoconf.

* In the majority of cases, any such failures would be silent, as configure would just conclude that the feature it is probing for isn't there. So even finding there's a problem would be difficult.

The short answer is that Autoconf is not designed to support -Werror and it's not worth it to try to make it do so. regards, tom lane
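For what it's worth, that particular probe could be made -Wmissing-prototypes-clean by declaring the function (or simply making it static), along these lines; this is a sketch, not the actual Autoconf snippet.

    /* A prototype (or "static") avoids -Wmissing-prototypes in the probe. */
    static int does_int64_work(void);

    static int
    does_int64_work(void)
    {
        /* ... the existing 64-bit arithmetic checks would go here ... */
        return 1;
    }

    int
    main(void)
    {
        /* configure treats exit status 0 as success */
        return !does_int64_work();
    }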
Re: Cannot find a working 64-bit integer type on Illumos
On Sat, 23 Mar 2024 at 01:04, Tom Lane wrote: > Japin Li writes: >> When I try to configure PostgreSQL 16.2 on Illumos using the following >> command, >> it complains $subject. > >> ./configure --enable-cassert --enable-debug --enable-nls --with-perl \ >> --with-python --without-tcl --without-gssapi --with-openssl \ >> --with-ldap --with-libxml --with-libxslt --without-systemd \ >> --with-readline --enable-thread-safety --enable-dtrace \ >> DTRACEFLAGS=-64 CFLAGS=-Werror > >> However, if I remove the `CFLAGS=-Werror`, it works fine. >> I'm not sure what happened here. > > CFLAGS=-Werror breaks a whole lot of configure's tests, not only that > one. (We even have this documented, see [1].) So you can't inject > -Werror that way. What I do on my buildfarm animals is the equivalent > of > > export COPT='-Werror' > > after configure and before build. I think configure pays no attention > to COPT, so it'd likely be safe to keep that set all the time, but in > the buildfarm client it's just as easy to be conservative. > > regards, tom lane > > [1] https://www.postgresql.org/docs/devel/install-make.html#CONFIGURE-ENVVARS Thank you very much! I didn't notice this part before.
Re: pg_upgrade --copy-file-range
On 3/22/24 17:42, Robert Haas wrote: > On Fri, Mar 22, 2024 at 10:40 AM Tomas Vondra > wrote: >> There's one question, though. As it stands, there's a bit of asymmetry >> between handling CopyFile() on WIN32 and the clone/copy_file_range on >> other platforms. On WIN32, we simply switch to CopyFile >> automatically if we don't need to calculate a checksum. But with the >> other methods, we error out if the user requests those and we need to >> calculate the checksum. > > That seems completely broken. copy_file() needs to have the ability to > calculate a checksum if one is required; when one isn't required, it > can do whatever it likes. So we should always fall back to the > block-by-block method if we need a checksum. Whatever option the user > specified should only be applied when we don't need a checksum. > > Consider, for example: > > pg_basebackup -D sunday -c fast --manifest-checksums=CRC32C > pg_basebackup -D monday -c fast --manifest-checksums=SHA224 > --incremental sunday/backup_manifest > pg_combinebackup sunday monday -o tuesday --manifest-checksums=CRC32C --clone > > Any files that are copied in their entirety from Sunday's backup can > be cloned, if we have support for cloning. But any files copied from > Monday's backup will need to be re-checksummed, since the checksum > algorithms don't match. With what you're describing, it sounds like > pg_combinebackup would just fail in this case; I don't think that's > the behavior that the user is going to want. > Right, this will happen: pg_combinebackup: error: unable to use accelerated copy when manifest checksums are to be calculated. Use --no-manifest Are you saying we should just silently override the copy method and do the copy block by block? I'm not strongly opposed to that, but it feels wrong to just ignore that the user explicitly requested cloning, and I'm not sure why this should be different from any other case when the user requests an incompatible combination of options and/or options that are not supported on the current configuration. Why not just tell the user to use the correct parameters, i.e. either remove --clone or add --no-manifest? FWIW I now realize it actually fails a bit earlier than I thought - when parsing the options, not in copy_file. But then some checks (if a given copy method is supported) happen in the copy functions. I wonder if it'd be better/possible to do all of that in one place, not sure. Also, the message only suggests to use --no-manifest. It probably should suggest removing --clone too. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: psql not responding to SIGINT upon db reconnection
On Fri, Mar 22, 2024 at 1:05 PM Tristan Partin wrote: > Sorry for taking a while to get back to y'all. I have taken your > feedback into consideration for v9. This is my first time writing > Postgres docs, so I'm ready for another round of editing :). Yeah, that looks like it needs some wordsmithing yet. I can take a crack at that at some point, but here are a few notes: - "takes care of" and "working through the state machine" seem quite vague to me. - the meanings of forRead and forWrite don't seem to be documented. - "retuns" is a typo. > Robert, could you point out some places where comments would be useful > in 0002? I did rename the function and moved it as suggested, thanks! In > turn, I also realized I forgot a prototype, so also added it. Well, for starters, I'd give the function a header comment. Telling me that a 1 second timeout is a 1 second timeout is less useful than telling me why we've picked a 1 second timeout. Presumably the answer is "so we can respond to cancel interrupts in a timely fashion", but I'd make that explicit. It might be worth a comment saying that PQsocket() shouldn't be hoisted out of the loop because it could be a multi-host connection string and the socket might change under us. Unless that's not true, in which case we should hoist the PQsocket() call out of the loop. I think it would also be worth a comment saying that we don't need to report errors here, as the caller will do that; we just need to wait until the connection is known to have either succeeded or failed, or until the user presses cancel. -- Robert Haas EDB: http://www.enterprisedb.com
Re: psql not responding to SIGINT upon db reconnection
On Fri Mar 22, 2024 at 9:59 AM CDT, Robert Haas wrote: On Wed, Jan 31, 2024 at 1:07 PM Tristan Partin wrote: > I was looking for documentation of PQsocket(), but didn't find any > standalone (unless I completely missed it). So I just copied how > PQsocket() is documented in PQconnectPoll(). I am happy to document it > separately if you think it would be useful. As Jelte said back at the end of January, I think you just completely missed it. The relevant part of libpq.sgml starts like this: PQsocketPQsocket As far as I know, we document all of the exported libpq functions in that SGML file. I think there's no real reason why we couldn't get at least 0001 and maybe also 0002 into this release, but only if you move quickly on this. Feature freeze is approaching rapidly. Modulo the documentation changes, I think 0001 is pretty much ready to go. 0002 needs comments. I'm also not so sure about the name process_connection_state_machine(); it seems a little verbose. How about something like wait_until_connected()? And maybe put it below do_connect() instead of above. Robert, Jelte: Sorry for taking a while to get back to y'all. I have taken your feedback into consideration for v9. This is my first time writing Postgres docs, so I'm ready for another round of editing :). Robert, could you point out some places where comments would be useful in 0002? I did rename the function and moved it as suggested, thanks! In turn, I also realized I forgot a prototype, so also added it. This patchset has also been rebased on master, so the libpq export number for PQsocketPoll was bumped. -- Tristan Partin Neon (https://neon.tech) From 7f8bf7250ecf79f65c129ccc42643c36bc53f882 Mon Sep 17 00:00:00 2001 From: Tristan Partin Date: Tue, 30 Jan 2024 15:40:57 -0600 Subject: [PATCH v9 1/2] Expose PQsocketPoll for frontends PQsocketPoll should help with state machine processing when connecting to a database asynchronously via PQconnectStart(). --- doc/src/sgml/libpq.sgml | 38 +++- src/interfaces/libpq/exports.txt | 1 + src/interfaces/libpq/fe-misc.c | 7 +++--- src/interfaces/libpq/libpq-fe.h | 4 4 files changed, 45 insertions(+), 5 deletions(-) diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml index d3e87056f2c..10f191f6b88 100644 --- a/doc/src/sgml/libpq.sgml +++ b/doc/src/sgml/libpq.sgml @@ -262,6 +262,39 @@ PGconn *PQsetdb(char *pghost, + + PQsocketPollPQsocketPoll + + + nonblocking connection + Poll a connections underlying socket descriptor retrieved with . + +int PQsocketPoll(int sock, int forRead, int forWrite, time_t end_time); + + + + + This function takes care of the setup for monitoring a file descriptor. The underlying + function is either poll(2) or select(2), + depending on platform support. The primary use of this function is working through the state + machine instantiated by . + + + + The function returns a value greater than 0 if the specified condition + is met, 0 if a timeout occurred, or -1 if an error + or interrupt occurred. In the event forRead and + forWrite are not set, the function immediately retuns a timeout + condition. + + + + end_time is the time in the future in seconds starting from the UNIX + epoch in which you would like the function to return if the condition is not met. 
+ + + + PQconnectStartParamsPQconnectStartParams PQconnectStartPQconnectStart @@ -358,7 +391,10 @@ PostgresPollingStatusType PQconnectPoll(PGconn *conn); Loop thus: If PQconnectPoll(conn) last returned PGRES_POLLING_READING, wait until the socket is ready to read (as indicated by select(), poll(), or - similar system function). + similar system function). Note that PQsocketPoll + can help reduce boilerplate by abstracting the setup of + select(2) or poll(2) if it is + available on your system. Then call PQconnectPoll(conn) again. Conversely, if PQconnectPoll(conn) last returned PGRES_POLLING_WRITING, wait until the socket is ready diff --git a/src/interfaces/libpq/exports.txt b/src/interfaces/libpq/exports.txt index 9fbd3d34074..1e48d37677d 100644 --- a/src/interfaces/libpq/exports.txt +++ b/src/interfaces/libpq/exports.txt @@ -202,3 +202,4 @@ PQcancelSocket199 PQcancelErrorMessage 200 PQcancelReset 201 PQcancelFinish202 +PQsocketPoll 203 diff --git a/src/interfaces/libpq/fe-misc.c b/src/interfaces/libpq/fe-misc.c index f2fc78a481c..f562cd8d344 100644 --- a/src/interfaces/libpq/fe-misc.c +++ b/src/interfaces/libpq/fe-misc.c @@ -55,7 +55,6 @@ static int pqPutMsgBytes(const void *buf, size_t len, PGconn *conn); static int pqSendSome(PGconn *conn, int len); static int pqSocketCheck(PGconn *conn, int forRead, int forWrite,
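As a usage sketch of the API the patch adds (it assumes the patched libpq; error handling and cancel checking are trimmed):

    #include <time.h>
    #include "libpq-fe.h"

    /* Drive an asynchronous connection attempt using PQsocketPoll. */
    static PGconn *
    connect_async(const char *conninfo)
    {
        PGconn *conn = PQconnectStart(conninfo);
        PostgresPollingStatusType status = PGRES_POLLING_WRITING;

        while (conn != NULL &&
               status != PGRES_POLLING_OK &&
               status != PGRES_POLLING_FAILED)
        {
            /*
             * Re-fetch the socket each iteration: with a multi-host
             * connection string it can change between PQconnectPoll calls.
             */
            int sock = PQsocket(conn);
            int rc = PQsocketPoll(sock,
                                  status == PGRES_POLLING_READING,  /* forRead */
                                  status == PGRES_POLLING_WRITING,  /* forWrite */
                                  time(NULL) + 1);  /* absolute 1-second deadline */

            if (rc < 0)
                break;              /* error or interrupt */
            if (rc > 0)
                status = PQconnectPoll(conn);
            /* rc == 0 is a timeout: a caller could test for cancellation here */
        }
        return conn;
    }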
Re: Cannot find a working 64-bit integer type on Illumos
Japin Li writes: > When I try to configure PostgreSQL 16.2 on Illumos using the following > command, > it complains $subject. > ./configure --enable-cassert --enable-debug --enable-nls --with-perl \ > --with-python --without-tcl --without-gssapi --with-openssl \ > --with-ldap --with-libxml --with-libxslt --without-systemd \ > --with-readline --enable-thread-safety --enable-dtrace \ > DTRACEFLAGS=-64 CFLAGS=-Werror > However, if I remove the `CFLAGS=-Werror`, it works fine. > I'm not sure what happened here. CFLAGS=-Werror breaks a whole lot of configure's tests, not only that one. (We even have this documented, see [1].) So you can't inject -Werror that way. What I do on my buildfarm animals is the equivalent of export COPT='-Werror' after configure and before build. I think configure pays no attention to COPT, so it'd likely be safe to keep that set all the time, but in the buildfarm client it's just as easy to be conservative. regards, tom lane [1] https://www.postgresql.org/docs/devel/install-make.html#CONFIGURE-ENVVARS
Re: documentation structure
On Fri, Mar 22, 2024, 09:32 Robert Haas wrote: > I notice that you say that the "Installation" section should "cover > the architectural overview and point people to where they can find the > stuff they need to install PostgreSQL in the various ways available to > them" so maybe you're not imagining a four-sentence chapter, either. Fair point, but I posit that new users are looking for a chapter named Installation in the documentation, at least the ones willing to read documentation. Having two of them isn't needed, but having zero doesn't make sense either. The current proposal keeps exactly one, so I'm OK with it as-is, but it can be further improved by moving the source-install material elsewhere and having the Installation chapter redirect the reader there for details. I'm not concerned with how long or short the resulting installation chapter is. David J.
Re: Slow GRANT ROLE on PostgreSQL 16 with thousands of ROLEs
On Fri, Mar 22, 2024 at 12:53:15PM -0400, Tom Lane wrote: > Would you like to review the catcache patch further, or do you > think it's good to go? I'll take another look this afternoon. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Slow GRANT ROLE on PostgreSQL 16 with thousands of ROLEs
Nathan Bossart writes: > On Fri, Mar 22, 2024 at 11:27:46AM -0400, Tom Lane wrote: >> * Do we want to risk back-patching any of this, to fix the performance >> regression in v16? I think that the OP's situation is a pretty >> narrow one, but maybe he's not the only person who managed to dodge >> roles_is_member_of's performance issues in most other cases. > I've heard complaints about performance with many roles before, so I > certainly think this area is worth optimizing. As far as back-patching > goes, my current feeling is that the hash table is probably pretty safe and > provides the majority of the benefit, but anything fancier should probably > be reserved for v17 or v18. Yeah. Although both the catcache and list_append_unique_oid bits are O(N^2), the catcache seems to have a much bigger constant factor --- when I did a "perf" check on the unpatched code, I saw catcache eating over 90% of the runtime and list_member_oid about 2%. So let's fix that part in v16 and call it a day. It should be safe to back-patch the catcache changes as long as we put the new fields at the end of the struct and leave cc_lists present but empty. Would you like to review the catcache patch further, or do you think it's good to go? regards, tom lane
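To make Tom's back-patching point concrete, here is a hypothetical sketch of the struct shape being described: the existing cc_lists field stays where it is (but is left empty), and the new hash-table fields go at the end. The stand-in typedef and the names of the appended fields are illustrative, not the committed ones:

/* Stand-in so this sketch is self-contained; in PostgreSQL this comes
 * from lib/ilist.h. */
typedef struct dlist_head
{
    void       *head;
} dlist_head;

typedef struct catcache
{
    /* ... all pre-existing fields, unchanged and in the same order ... */
    dlist_head  cc_lists;       /* retained for ABI compatibility; kept empty */

    /* new fields appended at the end, as required for a safe v16 back-patch */
    int         cc_nlbuckets;   /* number of hash buckets for CatCLists */
    dlist_head *cc_lbucket;     /* hash buckets for CatCLists */
} CatCache;

Extensions compiled against the old layout never look past the old end of the struct, which is why appending fields is safe where reordering them would not be.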
Re: Cannot find a working 64-bit integer type on Illumos
Hi, On 2024-03-23 00:48:05 +0800, Japin Li wrote: > When I try to configure PostgreSQL 16.2 on Illumos using the following > command, > it complains $subject. > > ./configure --enable-cassert --enable-debug --enable-nls --with-perl \ > --with-python --without-tcl --without-gssapi --with-openssl \ > --with-ldap --with-libxml --with-libxslt --without-systemd \ > --with-readline --enable-thread-safety --enable-dtrace \ > DTRACEFLAGS=-64 CFLAGS=-Werror > > However, if I remove the `CFLAGS=-Werror`, it works fine. Likely there's an unrelated warning triggering the configure test to fail. We'd need to see config.log to see what that is. Greetings, Andres Freund
Re: pgsql: Allow using syncfs() in frontend utilities.
On Wed, Sep 6, 2023 at 7:28 PM Nathan Bossart wrote: > Allow using syncfs() in frontend utilities. > > This commit allows specifying a --sync-method in several frontend > utilities that must synchronize many files to disk (initdb, > pg_basebackup, pg_checksums, pg_dump, pg_rewind, and pg_upgrade). > On Linux, users can specify "syncfs" to synchronize the relevant > file systems instead of calling fsync() for every single file. In > many cases, using syncfs() is much faster. > > As with recovery_init_sync_method, this new option comes with some > caveats. The descriptions of these caveats have been moved to a > new appendix section in the documentation. Hi, I'd like to complain about this commit's addition of a new appendix. I do understand the temptation to document caveats like this centrally instead of in multiple places, but as I've been complaining about over in the "documentation structure" thread, our top-level documentation index is too big, and I feel strongly that we need to de-clutter it rather than cluttering it further. This added a new chapter which is just 5 sentences long. I understand that this was done because the same issue applies to a bunch of different utilities and we didn't want to duplicate this text in all of those places, but I feel like this approach just doesn't scale. If we did this in every place where we have this much text that we want to avoid duplicating, we'd soon have hundreds of appendixes. What I would suggest we do instead is pick one of the places where this comes up and document it there, perhaps the recovery_init_sync_method GUC. And then make the documentation for the other say something like, you know those issues we documented for recovery_init_sync_method? Well they also apply to this. -- Robert Haas EDB: http://www.enterprisedb.com
Cannot find a working 64-bit integer type on Illumos
Hi, hackers, When I try to configure PostgreSQL 16.2 on Illumos using the following command, it complains $subject. ./configure --enable-cassert --enable-debug --enable-nls --with-perl \ --with-python --without-tcl --without-gssapi --with-openssl \ --with-ldap --with-libxml --with-libxslt --without-systemd \ --with-readline --enable-thread-safety --enable-dtrace \ DTRACEFLAGS=-64 CFLAGS=-Werror However, if I remove the `CFLAGS=-Werror`, it works fine. I'm not sure what happened here. $ uname -a SunOS db_build 5.11 hunghu-20231216T132436Z i86pc i386 i86pc illumos $ gcc --version gcc (GCC) 10.4.0 Copyright (C) 2020 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Re: pg_upgrade --copy-file-range
On Fri, Mar 22, 2024 at 10:40 AM Tomas Vondra wrote: > There's one question, though. As it stands, there's a bit of asymmetry > between handling CopyFile() on WIN32 and the clone/copy_file_range > handling on other platforms. On WIN32, we simply switch to CopyFile > automatically if we don't need to calculate a checksum. But with the > other methods, we error out if the user requests those and we need to > calculate the checksum. That seems completely broken. copy_file() needs to have the ability to calculate a checksum if one is required; when one isn't required, it can do whatever it likes. So we should always fall back to the block-by-block method if we need a checksum. Whatever option the user specified should only be applied when we don't need a checksum. Consider, for example: pg_basebackup -D sunday -c fast --manifest-checksums=CRC32C pg_basebackup -D monday -c fast --manifest-checksums=SHA224 --incremental sunday/backup_manifest pg_combinebackup sunday monday -o tuesday --manifest-checksums=CRC32C --clone Any files that are copied in their entirety from Sunday's backup can be cloned, if we have support for cloning. But any files copied from Monday's backup will need to be re-checksummed, since the checksum algorithms don't match. With what you're describing, it sounds like pg_combinebackup would just fail in this case; I don't think that's the behavior that the user is going to want. -- Robert Haas EDB: http://www.enterprisedb.com
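The decision Robert describes can be sketched as follows; all type and function names here are illustrative stand-ins rather than pg_combinebackup's actual symbols, and the stub prototypes stand in for the real copy routines:

#include <stddef.h>

typedef enum
{
    COPY_METHOD_COPY,           /* plain block-by-block read/write */
    COPY_METHOD_CLONE,          /* e.g. ioctl(FICLONE) */
    COPY_METHOD_COPY_RANGE      /* copy_file_range(2) */
} CopyMethod;

static void copy_file_blocks(const char *src, const char *dst, void *checksum_ctx);
static void copy_file_clone(const char *src, const char *dst);
static void copy_file_by_range(const char *src, const char *dst);

static void
copy_file(const char *src, const char *dst, CopyMethod method, void *checksum_ctx)
{
    /*
     * If a checksum is required, we must read every block anyway, so fall
     * back to the block-by-block method regardless of the requested one.
     */
    if (checksum_ctx != NULL)
    {
        copy_file_blocks(src, dst, checksum_ctx);
        return;
    }

    /* No checksum needed: honor whatever method the user requested. */
    switch (method)
    {
        case COPY_METHOD_CLONE:
            copy_file_clone(src, dst);
            break;
        case COPY_METHOD_COPY_RANGE:
            copy_file_by_range(src, dst);
            break;
        default:
            copy_file_blocks(src, dst, NULL);
            break;
    }
}

With this shape, the mixed-checksum incremental case above just works: files needing re-checksumming take the block-by-block path, and everything else can be cloned.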
Re: Slow GRANT ROLE on PostgreSQL 16 with thousands of ROLEs
On Fri, Mar 22, 2024 at 11:27:46AM -0400, Tom Lane wrote: > Yeah, that's about what I'd expect: hash+bloom ought to remove > most (not quite all) of the opportunity for simd to shine, because > the bloom filter should eliminate most of the list_member_oid calls. Right. IMHO the SIMD work is still worth considering because there are probably even more extreme cases where it'll make a decent amount of difference. Plus, that stuff is pretty low overhead for what you get in return. That being said, the hash table and Bloom filter should definitely be the higher priority. > * Same question for the bloom logic, but here I think it's mostly > a matter of tuning those constants. I suspect there might be some regressions just after the point where we construct the filter, but I'd still expect that to be a reasonable trade-off. We could probably pretty easily construct some benchmarks to understand the impact with a given number of roles. (I'm not sure I'll be able to get to that today.) > * Do we want to risk back-patching any of this, to fix the performance > regression in v16? I think that the OP's situation is a pretty > narrow one, but maybe he's not the only person who managed to dodge > roles_is_member_of's performance issues in most other cases. I've heard complaints about performance with many roles before, so I certainly think this area is worth optimizing. As far as back-patching goes, my current feeling is that the hash table is probably pretty safe and provides the majority of the benefit, but anything fancier should probably be reserved for v17 or v18. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: documentation structure
On Fri, Mar 22, 2024 at 11:50 AM David G. Johnston wrote: > On Fri, Mar 22, 2024 at 7:10 AM Robert Haas wrote: >> That's actually what we had in chapter >> 18, "Installation from Source Code on Windows", since removed. But for >> some reason we decided that on non-Windows platforms, it needed a >> whole new chapter rather than an extra sentence in the existing one. I >> think that's massively overkill. > > I agree with the premise that we should have a single chapter, in the main > documentation flow, named "Installation". It should cover the architectural > overview and point people to where they can find the stuff they need to > install PostgreSQL in the various ways available to them. I agree with > moving the source installation material to the appendix. None of the > sections under Installation would then actually detail how to install the > software since that isn't something the project itself handles but has > delegated to packagers for the vast majority of cases and the source install > details are in the appendix for the one "supported" mechanism that most > people do not use. Hmm, that's not quite the same as my position. I'm fine with either moving the installation from source material to an appendix, or leaving it where it is. But I'm strongly against continuing to have a chapter with four sentences in it that says to use the same download link that is on the main navigation bar of every page on the postgresql.org web site. We're never going to get the chapter index down to a reasonable size if we insist on having chapters that have a totally de minimis amount of content. So my feeling is that if we keep the installation from source material where it is, then we can make it also mention the download link, just as we used to do in the installation-on-windows chapter. But if we banish installation from source to the appendixes, then we shouldn't keep a whole chapter in the main documentation to tell people something that is anyway obvious. I don't really think that material needs to be there at all, but if we want to have it, surely we can find someplace to put it such that it doesn't require a whole chapter to say that and nothing else. It could for example go at the beginning of the "Server Setup and Operation" chapter, for instance; if that were the first chapter of section III, I think that would be natural enough. I notice that you say that the "Installation" section should "cover the architectural overview and point people to where they can find the stuff they need to install PostgreSQL in the various ways available to them" so maybe you're not imagining a four-sentence chapter, either. But this project is going to be impossible unless we stick to limited goals. We can, and should, rewrite some sections of the documentation to be more useful; but if we try to do that as part of the same project that aims to tidy up the index, the chances of us getting stuck in an endless bikeshedding loop go from "high" to "certain". So I don't want to hypothesize the existence of an installation chapter that isn't any of the things we have today. Let's try to get the things we have into places that make sense, and then consider other improvements separately. -- Robert Haas EDB: http://www.enterprisedb.com
Re: sublink [exists (select xxx group by grouping sets ())] causes an assertion error
"=?UTF-8?B?6LW15bqt5rW3KOW6reeroCk=?=" writes: > I recently notice these sql can lead to a assertion error in pg14 and older > version. Here is an example: > postgres=> CREATE TABLE t1 (a int); > postgres=> INSERT INTO t1 VALUES (1); > postgres=> SELECT EXISTS ( SELECT * FROM t1 GROUP BY GROUPING SETS ((a), > generate_series (1, 262144)) ) AS result; > server closed the connection unexpectedly In current v14 this produces: TRAP: FailedAssertion("!lt->writing || lt->buffer == NULL", File: "logtape.c", Line: 1279, PID: 928622) Thanks for the report. I did some bisecting and found that the crash appears at Jeff's commit c8aeaf3ab (which introduced this assertion) and disappears at Heikki's c4649cce3 (which removed it). So I would say that the problem is "this assertion is wrong", and we should fix the problem by fixing the assertion, not by hacking around in distant calling code. On the whole, since this code has been dead for several versions, I'd be inclined to just remove the assertion. I think it's quite risky because of the possibility that we reach this function during post-transaction-abort cleanup, when there's no very good reason to assume that the tapeset's been closed down cleanly. (To be clear, that's not what's happening in the given test case ... but I fear that it could.) regards, tom lane
Re: [PATCH] plpython function causes server panic
Robert Haas writes: > On Fri, Dec 29, 2023 at 12:56 PM Tom Lane wrote: >> Here's a draft patch along this line. Basically the idea is that >> subtransactions used for error control are now legal in parallel >> mode (including in parallel workers) so long as they don't try to >> acquire their own XIDs. I had to clean up some error handling >> in xact.c, but really this is a pretty simple patch. > I agree with the general direction. A few comments: Thanks for looking at this! I was hoping you'd review it, because I thought there was a pretty significant chance that I'd missed some fundamental reason it couldn't work. I feel better now about it being worth pursuing. I consider the patch draft quality at this point: I didn't spend much effort on docs or comments, and none on test cases. I'll work on those issues and come back with a v2. regards, tom lane
Re: documentation structure
On Fri, Mar 22, 2024 at 7:10 AM Robert Haas wrote: > > That's actually what we had in chapter > 18, "Installation from Source Code on Windows", since removed. But for > some reason we decided that on non-Windows platforms, it needed a > whole new chapter rather than an extra sentence in the existing one. I > think that's massively overkill. > > I agree with the premise that we should have a single chapter, in the main documentation flow, named "Installation". It should cover the architectural overview and point people to where they can find the stuff they need to install PostgreSQL in the various ways available to them. I agree with moving the source installation material to the appendix. None of the sections under Installation would then actually detail how to install the software since that isn't something the project itself handles but has delegated to packagers for the vast majority of cases and the source install details are in the appendix for the one "supported" mechanism that most people do not use. David J.
Re: MIN/MAX functions for a record
Exactly, Tom. I see no fundamental problem preventing this from being implemented, since the comparison operator is already implemented. In fact, MIN/MAX should work for all types for which a comparison operator is defined. Regarding index support, there should not be an issue if the index is defined for the record (e.g. `CREATE INDEX ON my_table(ROW(field_a, field_b))`). However, such indexes do not seem to be supported. I'm not sure whether a composite index would be compatible with a record built from the indexed fields in every edge case... Aleksander, rewriting the year-month example is easy, but how would you rewrite this query? CREATE TABLE events(event_time TIMESTAMP, message VARCHAR, user_id VARCHAR); You want the newest message for each user. It's easy with MAX(record): SELECT user_id, MAX(ROW(event_time, message)).message FROM events GROUP BY user_id; One option is to rewrite it as a subquery with LIMIT 1: SELECT user_id, (SELECT message FROM events e2 WHERE e1.user_id=e2.user_id ORDER BY event_time DESC LIMIT 1) FROM events e1 GROUP BY user_id; If your query already has multiple levels of grouping, multiple joins, UNIONs etc., it gets much more complex. I also wonder if the optimizer would pick the same plan as it would if MAX(record) were supported. Viliam On Fri, Mar 22, 2024 at 4:12 PM Tom Lane wrote: > Aleksander Alekseev writes: > >> In my queries I often need to do MIN/MAX for tuples, for example: > >> SELECT MAX(row(year, month)) > >> FROM (VALUES(2025, 1), (2024,2)) x(year, month); > >> This query throws: > >> ERROR: function max(record) does not exist > >> Was this ever discussed or is there something preventing the > implementation? > > > I believe it would be challenging to implement max(record) that would > > work reasonably well in a general case. > > As long as you define it as "works the same way record comparison > does", ie base it on record_cmp(), I don't think it would be much > more than a finger exercise [*]. And why would you want it to act > any differently from record_cmp()? Those semantics have been > established for a long time. > > regards, tom lane > > [*] Although conceivably there are some challenges in getting > record_cmp's caching logic to work in the context of an aggregate. >
Re: SQL:2011 application time
On 22.03.24 01:35, Paul Jungwirth wrote: >> 1. In ri_triggers.c ri_KeysEqual, you swap the order of arguments to ri_AttributesEqual(): >> >> - if (!ri_AttributesEqual(riinfo->ff_eq_oprs[i], RIAttType(rel, attnums[i]), - oldvalue, newvalue)) + if (!ri_AttributesEqual(eq_opr, RIAttType(rel, attnums[i]), + newvalue, oldvalue)) >> >> But the declared arguments of ri_AttributesEqual() are oldvalue and newvalue, so passing them >> backwards is really confusing. And the change does matter in the tests. >> >> Can we organize this better? > I renamed the params and actually the whole function. All it does is execute `oldvalue op newvalue`, casting if necessary. So I changed it to ri_CompareWithCast and added some documentation. In an earlier version of this patch I had a separate function for the PERIOD comparison, but it's just doing the same thing, so I think the best thing is to give the function a more accurate name and use it. OK, I see now, and the new explanation is better. But after reading the comment in the function about collations, I think there could be trouble. As long as we are only comparing for equality (and we don't support nondeterministic global collations), then we can use any collation to compare for equality. But if we are doing contained-by, then the collation does matter, so we would need to get the actual collation somehow. So as written, this might not always work correctly. I think it would be safer for now if we just kept using the equality operation even for temporal foreign keys. If we did that, then in the case that you update a key to a new value that is contained by the old value, this function would say "not equal" and fire all the checks, even though it wouldn't need to. This is kind of similar to the "false negatives" that the comment already talks about. What do you think?
Re: Add Index-level REINDEX with multiple jobs
On Wed, Mar 20, 2024 at 7:19 PM Alexander Korotkov wrote: > On Mon, Mar 11, 2024 at 3:44 PM Maxim Orlov wrote: > > On Tue, 6 Feb 2024 at 09:22, Michael Paquier wrote: > >> The problem may be actually trickier than that, no? Could there be > >> other factors to take into account for their classification, like > >> their sizes (typically, we'd want to process the biggest one first, I > >> guess)? > > > > > > Sorry for a late reply. Thanks for an explanation. This is sounds > > reasonable to me. > > Svetlana had addressed this in the patch v2. > > I think this patch is a nice improvement. But it doesn't seem to be > implemented in the right way. There is no guarantee that > get_parallel_object_list() will return tables in the same order as > indexes. Especially when there is "ORDER BY idx.relpages". Also, > sort_indices_by_tables() has quadratic complexity (probably OK since > input list shouldn't be too lengthy) and a bit awkward. > > I've revised the patchset. Now appropriate ordering is made in SQL > query. The original list of indexes is modified to match the list of > tables. The tables are ordered by the size of its greatest index, > within table indexes are ordered by size. > > I'm going to further revise this patch, mostly comments and the commit > message. Here goes the revised patch. I'm going to push this if there are no objections. -- Regards, Alexander Korotkov v4-0001-Add-the-index-level-REINDEX-with-multiple-jobs.patch Description: Binary data
Re: [PATCH] plpython function causes server panic
On Fri, Dec 29, 2023 at 12:56 PM Tom Lane wrote: > Here's a draft patch along this line. Basically the idea is that > subtransactions used for error control are now legal in parallel > mode (including in parallel workers) so long as they don't try to > acquire their own XIDs. I had to clean up some error handling > in xact.c, but really this is a pretty simple patch. I agree with the general direction. A few comments: - Isn't it redundant to test if IsInParallelMode() || IsParallelWorker()? We can't be in a parallel worker without also being in parallel mode, except during the worker startup sequence. - I don't think the documentation changes are entirely accurate. The whole point of the patch is to allow parallel workers to make changes to the transaction state, but the documentation says you can't. Maybe we should just delete "change the transaction state" entirely from the list of things that you're not allowed to do, since "write to the database" is already listed separately; or maybe we should replace it with something like "assign new transaction IDs or command IDs," although that's kind of low-level. I don't think we should just delete the "even temporarily" bit, as you've done. - While I like the new comments in BeginInternalSubTransaction(), I think the changes in ReleaseCurrentSubTransaction() and RollbackAndReleaseCurrentSubTransaction() need more thought. For one thing, it's got to be wildly optimistic to claim that we would have caught *anything* that's forbidden in parallel mode; that would require solving the halting problem. I'd rather have no comment at all here than one making such an ambitious claim, and I think that might be a fine way to go. But if we do have a comment, I think it should be more narrowly focused e.g. "We do not check for parallel mode here. It's permissible to start and end subtransactions while in parallel mode, as long as no new XIDs or command IDs are assigned." One additional thing that might (or might not) be worth mentioning or checking for here is that the leader shouldn't try to reduce the height of the transaction state stack to anything less than what it was when the parallel operation started; if it wants to do that, it needs to clean up the parallel workers and exit parallel mode first. Similarly a worker shouldn't ever end the toplevel transaction except during backend cleanup. -- Robert Haas EDB: http://www.enterprisedb.com
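For readers less familiar with this machinery, here is a minimal sketch, modeled on plpgsql's exception-block handling, of the subtransaction-based error control that the patch would permit in parallel mode so long as no new XIDs or command IDs are assigned. The helper and its callback are hypothetical; this is not code from the patch:

#include "postgres.h"
#include "access/xact.h"
#include "utils/memutils.h"
#include "utils/resowner.h"

static void
try_with_subxact(void (*risky_fn) (void))
{
    MemoryContext oldcontext = CurrentMemoryContext;
    ResourceOwner oldowner = CurrentResourceOwner;

    BeginInternalSubTransaction(NULL);
    /* Run the risky code in the caller's memory context, not the subxact's */
    MemoryContextSwitchTo(oldcontext);

    PG_TRY();
    {
        risky_fn();

        /* Success: commit the subtransaction and restore state */
        ReleaseCurrentSubTransaction();
        MemoryContextSwitchTo(oldcontext);
        CurrentResourceOwner = oldowner;
    }
    PG_CATCH();
    {
        /* A real caller would typically CopyErrorData() here to keep the error */
        FlushErrorState();

        /* Abort the subtransaction and restore the caller's state */
        RollbackAndReleaseCurrentSubTransaction();
        MemoryContextSwitchTo(oldcontext);
        CurrentResourceOwner = oldowner;
    }
    PG_END_TRY();
}

As long as risky_fn() never assigns an XID or command ID, nothing here changes the transaction state in a way a parallel worker couldn't cope with, which is the invariant the patch relies on.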
Re: Adding comments to help understand psql hidden queries
On Fri, Mar 22, 2024 at 9:47 AM Greg Sabino Mullane wrote: > > On Thu, Mar 21, 2024 at 6:20 PM Peter Eisentraut wrote: >> >> lines are supposed to align vertically. With your patch, the first line >> would have variable length depending on the command. > > > Yes, that is a good point. Aligning those would be quite tricky, what if we > just kept a standard width for the closing query? Probably the 24 stars we > currently have to match "QUERY", which it appears nobody has changed for > translation purposes yet anyway. (If I am reading the code correctly, it > would be up to the translators to maintain the vertical alignment). I think it's easier to keep the widths balanced than constant (patch version included here), but if we needed to squeeze the opening string to a standard width that would be possible without too much trouble. The internal comment strings seem to be a bit more of a pain if we wanted all of the comments to be the same width, as we'd need a table or something so we can compute the longest string width, etc; doesn't seem worth the convolutions IMHO. No changes to Greg's patch, just keeping 'em both so cfbot stays happy. David v3-0002-Add-output-of-the-command-that-got-us-here-to-the.patch Description: Binary data v3-0001-Include-SQL-comments-on-echo-hidden-output.patch Description: Binary data
Re: Slow GRANT ROLE on PostgreSQL 16 with thousands of ROLEs
Nathan Bossart writes: > On Fri, Mar 22, 2024 at 09:47:39AM -0500, Nathan Bossart wrote: >> hash hash+simd hash+simd+bloom >> create 1.27 1.27 1.28 >> grant 0.18 0.11 0.03 > For just hash+bloom, I'm seeing 1.29 and 0.04. Yeah, that's about what I'd expect: hash+bloom ought to remove most (not quite all) of the opportunity for simd to shine, because the bloom filter should eliminate most of the list_member_oid calls. Possibly we could fix that small regression in the create phase with more careful tuning of the magic constants in the bloom logic? Although I'd kind of expect that the create step doesn't ever invoke the bloom filter, else it would have been showing a performance problem before; so this might not be a good test case for helping us tune those. I think remaining questions are: * Is there any case where the new hash catcache logic could lose measurably? I kind of doubt it, because we were already computing the hash value for list searches; so basically the only overhead is one more palloc per cache during the first list search. (If you accumulate enough lists to cause a rehash, you're almost surely well into winning territory.) * Same question for the bloom logic, but here I think it's mostly a matter of tuning those constants. * Do we want to risk back-patching any of this, to fix the performance regression in v16? I think that the OP's situation is a pretty narrow one, but maybe he's not the only person who managed to dodge roles_is_member_of's performance issues in most other cases. regards, tom lane
Re: [DOC] Introducing Quick Start Guide to PL/pgSQL and PL/Python Documentation
On Fri, Dec 22, 2023 at 12:04 PM Pavel Stehule wrote: > I have not a strong opinion about it. My personal experience is so 99% PL > code is PLpgSQL, so it can be there, and other PL can be referenced there. I > am not sure if there is some common part for all PL. After reading over this thread, it seems clear to me that there is no consensus to proceed with this patch in its current form, and the discussion seems to have stalled. Accordingly, I've marked this "Returned with Feedback" in the CommitFest. Ishaan, if you plan to rework this into a form which might be acceptable given the review comments made up until now, please feel free to change this back to "Waiting on Author", and/or move it to a future CommitFest. -- Robert Haas EDB: http://www.enterprisedb.com
Re: MIN/MAX functions for a record
Aleksander Alekseev writes: >> In my queries I often need to do MIN/MAX for tuples, for example: >> SELECT MAX(row(year, month)) >> FROM (VALUES(2025, 1), (2024,2)) x(year, month); >> This query throws: >> ERROR: function max(record) does not exist >> Was this ever discussed or is there something preventing the implementation? > I believe it would be challenging to implement max(record) that would > work reasonably well in a general case. As long as you define it as "works the same way record comparison does", ie base it on record_cmp(), I don't think it would be much more than a finger exercise [*]. And why would you want it to act any differently from record_cmp()? Those semantics have been established for a long time. regards, tom lane [*] Although conceivably there are some challenges in getting record_cmp's caching logic to work in the context of an aggregate.
Re: hot updates and fillfactor
Thanks for your explanation and for the links On Tue, Mar 19, 2024 at 11:17 AM Aleksander Alekseev < aleksan...@timescale.com> wrote: > Hi Fabrice, > > > I do not understand why hot_updates value is not 0 for pg_database? > Given that reloptions is empty for this table that means it has a default > value of 100% > > Maybe I didn't entirely understand your question, but why would you > assume they are somehow related? > > According to the documentation [1][2]: > > pg_class.reloptions: > Access-method-specific options, as “keyword=value” strings > > pg_stat_all_tables.n_tup_hot_upd: > Number of rows HOT updated. These are updates where no successor > versions are required in indexes. > > The value of n_tup_hot_upd is not zero because there are tuples that > were HOT-updated. That's it. You can read more about HOT here [3]. > > [1]: https://www.postgresql.org/docs/current/catalog-pg-class.html > [2]: https://www.postgresql.org/docs/current/monitoring-stats.html > [3]: https://www.postgresql.org/docs/current/storage-hot.html > > -- > Best regards, > Aleksander Alekseev >
Re: MIN/MAX functions for a record
Hi, > In my queries I often need to do MIN/MAX for tuples, for example: > > SELECT MAX(row(year, month)) > FROM (VALUES(2025, 1), (2024,2)) x(year, month); > > This query throws: > > ERROR: function max(record) does not exist > > In this case you can replace it with `MAX((year||'-'||month||'-1')::date)`. > However in my case I have an event table with `event_time` and `text` > columns, I'm grouping that table by some key and want to have the text for > the newest event. I would do `MAX(ROW(event_time, text)).text`. Workarounds > for this are clumsy, e.g. with a subquery with LIMIT 1. > > The lack of this feature is kind of unexpected, because the `>` operator or > `GREATEST` function are defined for records: > > SELECT > GREATEST((2025, 1), (2024, 2)), > (2025, 1) > (2024, 2) > > Was this ever discussed or is there something preventing the implementation? I believe it would be challenging to implement max(record) that would work reasonably well in a general case. What if, for instance, one of the columns is JSONB or XML? Not to mention the fact that Postgres supports user-defined types which don't necessarily have a reasonable order. Take a point in a 2D or 3D space as an example. On top of that I doubt that the proposed query will perform well since I don't see how it could benefit from using indexes. I don't claim that this is necessarily true in your case but generally one could argue that the wrong schema is used here and instead of (year, month) pair a table should have a date/timestamp(tz) column. Personally I would choose format() function [1] in cases like this in order to play it safe. Assuming of course that the table is small and the query is not executed often. [1]: https://www.postgresql.org/docs/current/functions-string.html -- Best regards, Aleksander Alekseev
Re: psql not responding to SIGINT upon db reconnection
On Wed, Jan 31, 2024 at 1:07 PM Tristan Partin wrote: > I was looking for documentation of PQsocket(), but didn't find any > standalone (unless I completely missed it). So I just copied how > PQsocket() is documented in PQconnectPoll(). I am happy to document it > separately if you think it would be useful. As Jelte said back at the end of January, I think you just completely missed it. The relevant part of libpq.sgml starts like this: PQsocketPQsocket As far as I know, we document all of the exported libpq functions in that SGML file. I think there's no real reason why we couldn't get at least 0001 and maybe also 0002 into this release, but only if you move quickly on this. Feature freeze is approaching rapidly. Modulo the documentation changes, I think 0001 is pretty much ready to go. 0002 needs comments. I'm also not so sure about the name process_connection_state_machine(); it seems a little verbose. How about something like wait_until_connected()? And maybe put it below do_connect() instead of above. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Slow GRANT ROLE on PostgreSQL 16 with thousands of ROLEs
On Fri, Mar 22, 2024 at 09:47:39AM -0500, Nathan Bossart wrote: > hash hash+simd hash+simd+bloom > create 1.27 1.27 1.28 > grant 0.18 0.11 0.03 For just hash+bloom, I'm seeing 1.29 and 0.04. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Built-in CTYPE provider
On 21.03.24 01:13, Jeff Davis wrote: >> Are there any test cases that illustrate the word boundary changes in patch 0005? It might be useful to test those against Oracle as well. > The tests include initcap('123abc') which is '123abc' in the PG_C_UTF8 collation vs '123Abc' in PG_UNICODE_FAST. > The reason for the latter behavior is that the Unicode Default Case Conversion algorithm for toTitlecase() advances to the next Cased character before mapping to titlecase, and digits are not Cased. ICU has a configurable adjustment, and defaults in a way that produces '123abc'. I think this might be too big of a compatibility break. So far, initcap('123abc') has always returned '123abc'. If the new collation returns '123Abc' now, then that's quite a change. These are not some obscure Unicode special case characters, after all. What is the ICU configuration incantation for this? Maybe we could have the builtin provider understand some of that, too. Or we should create a function separate from initcap.
Re: Slow GRANT ROLE on PostgreSQL 16 with thousands of ROLEs
On Thu, Mar 21, 2024 at 08:59:54PM -0400, Tom Lane wrote: > Nathan Bossart writes: >> On Thu, Mar 21, 2024 at 03:40:12PM -0500, Nathan Bossart wrote: >>> On Thu, Mar 21, 2024 at 04:31:45PM -0400, Tom Lane wrote: I don't think we have any really cheap way to de-duplicate the role OIDs, especially seeing that it has to be done on-the-fly within the collection loop, and the order of roles_list is at least potentially interesting. Not sure how to make further progress without a lot of work. > >>> Assuming these are larger lists, this might benefit from optimizations >>> involving SIMD intrinsics. > >> Never mind. With the reproduction script, I'm only seeing a ~2% >> improvement with my patches. > > Yeah, you cannot beat an O(N^2) problem by throwing SIMD at it. I apparently had some sort of major brain fade when I did this because I didn't apply your hashing patch when I ran this SIMD test. With it applied, I see a speedup of ~39%, which makes a whole lot more sense to me. If I add the Bloom patch (updated with your suggestions), I get another ~73% improvement from there, and a much smaller regression in the role creation portion. hash hash+simd hash+simd+bloom create 1.27 1.27 1.28 grant 0.18 0.11 0.03 -- Nathan Bossart Amazon Web Services: https://aws.amazon.com diff --git a/src/backend/utils/adt/acl.c b/src/backend/utils/adt/acl.c index cf5d08576a..4952dc9c1e 100644 --- a/src/backend/utils/adt/acl.c +++ b/src/backend/utils/adt/acl.c @@ -36,6 +36,7 @@ #include "common/hashfn.h" #include "foreign/foreign.h" #include "funcapi.h" +#include "lib/bloomfilter.h" #include "lib/qunique.h" #include "miscadmin.h" #include "utils/acl.h" @@ -4946,6 +4947,7 @@ roles_is_member_of(Oid roleid, enum RoleRecurseType type, ListCell *l; List *new_cached_roles; MemoryContext oldctx; + bloom_filter *bf = NULL; Assert(OidIsValid(admin_of) == PointerIsValid(admin_role)); if (admin_role != NULL) @@ -5023,16 +5025,46 @@ * graph, we must test for having already seen this role. It is * legal for instance to have both A->B and A->C->B. */ - roles_list = list_append_unique_oid(roles_list, otherid); + if ((bf && bloom_lacks_element(bf, (unsigned char *) &otherid, sizeof(Oid))) || +!list_member_oid(roles_list, otherid)) + { +if (bf == NULL && list_length(roles_list) > 1000) +{ + bf = bloom_create(1024 * 1024, work_mem, 0); + foreach_oid(role, roles_list) + bloom_add_element(bf, (unsigned char *) &role, sizeof(Oid)); +} +roles_list = lappend_oid(roles_list, otherid); +if (bf) + bloom_add_element(bf, (unsigned char *) &otherid, sizeof(Oid)); + } } ReleaseSysCacheList(memlist); /* implement pg_database_owner implicit membership */ if (memberid == dba && OidIsValid(dba)) - roles_list = list_append_unique_oid(roles_list, -ROLE_PG_DATABASE_OWNER); + { + Oid dbowner = ROLE_PG_DATABASE_OWNER; + + if ((bf && bloom_lacks_element(bf, (unsigned char *) &dbowner, sizeof(Oid))) || +!list_member_oid(roles_list, dbowner)) + { +if (bf == NULL && list_length(roles_list) > 1000) +{ + bf = bloom_create(1024 * 1024, work_mem, 0); + foreach_oid(role, roles_list) + bloom_add_element(bf, (unsigned char *) &role, sizeof(Oid)); +} +roles_list = lappend_oid(roles_list, dbowner); +if (bf) + bloom_add_element(bf, (unsigned char *) &dbowner, sizeof(Oid)); + } + } } + if (bf) + bloom_free(bf); + /* * Copy the completed list into TopMemoryContext so it will persist. */
Re: Adding comments to help understand psql hidden queries
On Thu, Mar 21, 2024 at 6:20 PM Peter Eisentraut wrote: > lines are supposed to align vertically. With your patch, the first line > would have variable length depending on the command. > Yes, that is a good point. Aligning those would be quite tricky, what if we just kept a standard width for the closing query? Probably the 24 stars we currently have to match "QUERY", which it appears nobody has changed for translation purposes yet anyway. (If I am reading the code correctly, it would be up to the translators to maintain the vertical alignment). Cheers, Greg
Re: pg_upgrade --copy-file-range
Here's a patch reworked along the lines from a couple days ago. The primary goals were to add clone/copy_file_range while minimizing unnecessary disruption, and overall cleanup of the patch. I'm not saying it's committable, but I think the patch is much easier to understand. The main change is that this abandons the idea of handling all possible cases in a single function that looks like a maze of ifdefs, and instead separates each case into its own function, and the decision happens much earlier. This is pretty much exactly what pg_upgrade does, BTW. There's maybe an argument that these functions could be unified and moved to a library in src/common - I can imagine doing that, but I don't think it's required. The functions are pretty trivial wrappers, and it's not like we expect many more callers. And there's probably stuff we'd need to keep out of that library (e.g. the decision which copy/clone methods are available / should be used or error reporting). So it doesn't seem worth it, at least for now. There's one question, though. As it stands, there's a bit of asymmetry between handling CopyFile() on WIN32 and the clone/copy_file_range handling on other platforms. On WIN32, we simply switch to CopyFile automatically if we don't need to calculate a checksum. But with the other methods, we error out if the user requests those and we need to calculate the checksum. The asymmetry comes from the fact there's no option to request CopyFile on WIN32, and we feel confident it's always the right thing to do (safe, faster). We don't seem to know that for the other methods, so the user has to explicitly request those. And if the user requests --clone, for example, it'd be wrong to silently fall back to plain copy. Still, I wonder if this might cause some undesirable issues during restores. But I guess that's why we have --dry-run. This asymmetry also shows a bit in the code - the CopyFile is coded and called a bit differently from the other methods. FWIW I abandoned the approach with "strategy" and just use a switch on CopyMode enum, just like pg_upgrade does. There's a couple more smaller changes: - Addition of docs for --clone/--copy-file-range (shameless copy from pg_upgrade docs). - Removal of opt_errinfo - not only was it buggy, I think the code is actually cleaner without it. - Removal of EINTR retry condition from copy_file_range handling (this is what Thomas ended up doing for pg_upgrade while committing that part). Put together, this cuts the patch from ~40kB to ~15kB (most of this is due to the cleanup of unnecessary whitespace changes, though). I think to make this committable, this requires some review and testing, ideally on a range of platforms. One open question is how to allow testing this. For pg_upgrade we now have PG_TEST_PG_UPGRADE_MODE, which can be set to e.g. "--clone". I wonder if we should add PG_TEST_PG_COMBINEBACKUP_MODE ...
regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL CompanyFrom 7b6a6fe1b555647109caec2817f9200ac8fe9db9 Mon Sep 17 00:00:00 2001 From: Tomas Vondra Date: Tue, 19 Mar 2024 15:16:29 +0100 Subject: [PATCH v20240322] pg_combinebackup - allow using clone/copy_file_range --- doc/src/sgml/ref/pg_combinebackup.sgml | 34 + src/bin/pg_combinebackup/copy_file.c| 157 +++- src/bin/pg_combinebackup/copy_file.h| 18 ++- src/bin/pg_combinebackup/pg_combinebackup.c | 26 +++- src/bin/pg_combinebackup/reconstruct.c | 5 +- src/bin/pg_combinebackup/reconstruct.h | 5 +- 6 files changed, 202 insertions(+), 43 deletions(-) diff --git a/doc/src/sgml/ref/pg_combinebackup.sgml b/doc/src/sgml/ref/pg_combinebackup.sgml index 8a0a600c2b2..60a60e3fae6 100644 --- a/doc/src/sgml/ref/pg_combinebackup.sgml +++ b/doc/src/sgml/ref/pg_combinebackup.sgml @@ -185,6 +185,40 @@ PostgreSQL documentation + + --clone + + +Use efficient file cloning (also known as reflinks on +some systems) instead of copying files to the new cluster. This can +result in near-instantaneous copying of the data files, giving the +speed advantages of -k/--link while +leaving the old cluster untouched. + + + +File cloning is only supported on some operating systems and file +systems. If it is selected but not supported, the +pg_combinebackup run will error. At present, +it is supported on Linux (kernel 4.5 or later) with Btrfs and XFS (on +file systems created with reflink support), and on macOS with APFS. + + + + + + --copy-file-range + + +Use the copy_file_range system call for efficient +copying. On some file systems this gives results similar to +--clone, sharing physical disk blocks, while on others +it may still copy blocks, but do so via an optimized path. At present, +it is supported on
Re: Support a wildcard in backtrace_functions
On Fri, 22 Mar 2024 at 11:14, Bharath Rupireddy wrote: > A few comments: > > 1. > @@ -832,6 +849,9 @@ matches_backtrace_functions(const char *funcname) > { > const char *p; > > +if (!backtrace_functions || backtrace_functions[0] == '\0') > +return true; > + > > Shouldn't this be returning false to not log set backtrace when > backtrace_functions is not set? Am I missing something here? Empty string is considered the new wildcard, i.e. backtrace_functions filtering is not enabled if it is empty. This seemed reasonable to me since you should now disable backtraces by using log_backtrace=none; having backtrace_functions='' mean the same thing seems rather useless. I also documented this in the updated docs. > 2. > +{ > +{"log_backtrace", PGC_SUSET, LOGGING_WHAT, > +gettext_noop("Sets if logs should include a backtrace."), > +NULL > > IMV, log_backtrace, backtrace_min_level and backtrace_functions are > interlinked, so keeping all of them as DEVELOPER_OPTIONS looks fine to > me than having log_backtrace at just LOGGING_WHAT kind. Also, we must > also mark log_backtrace as GUC_NOT_IN_SAMPLE. I agree they are linked, but I don't think it's just useful for developers to be able to set log_backtrace to internal (even if we choose not to make "internal" the default). > 3. I think it's worth adding a few examples to get backtraces in docs. > For instance, what to set to get backtraces of all internal errors and > what to set to get backtraces of all ERRORs coming from a specific > function etc. Good idea, I'll try to add those later. For now your specific cases would be: log_backtrace = 'internal' (default in 0003) backtrace_functions = '' (default) backtrace_min_level = 'ERROR' (default) and log_backtrace = 'all' backtrace_functions = 'someFunc' backtrace_min_level = 'ERROR' (default) > 4. I see the support for wildcard in backtrace_functions is missing. > Is it intentionally left out? If not, can you make it part of 0003 > patch? Yes, it's intentional; see the answer to 1. > 5. The amount of backtraces generated is going to be too huge when > setting log_backtrace = 'all' and backtrace_min_level = 'debug5'. With > this setting installcheck generated 12MB worth of log and the test > took about 55 seconds (as opposed to 48 seconds without it) The point > is if these settings are misused, it can easily slow down the entire > system and fill up disk space leading to crashes eventually. This > makes a strong case for marking log_backtrace a developer only > function. I think the same argument can be made for many other GUCs that are not marked as developer options (e.g. log_min_messages). Normally we "protect" such options by using PGC_SUSET. DEVELOPER_OPTIONS is really only meant for options that are only useful for developers. > 6. In continuation to comment #5, does anyone need backtrace for > elevels like debugX and LOG etc.? What's the use of the backtrace in > such cases? I think at least WARNING and NOTICE could be useful in practice, but I agree LOG and DEBUGX seem kinda useless. But it seems kinda strange to not have them in the list, especially given it is pretty much no effort to support them too.
Re: [PATCH] Add sortsupport for range types and btree_gist
Hi Andrey, On Friday, 22.03.2024 at 18:27 +0500, Andrey M. Borodin wrote: > > Here is a rebased version of the patch. > > FWIW it would be nice at least port tests from commit that I > referenced upthread. > Nowadays we have injection points, so these tests can be much more > stable. Alright, that's a reasonable point. I will look into this. Did you see other important things missing? Changed status back to "Waiting On Author". Thanks, Bernd
Re: speed up a logical replica setup
On Fri, Mar 22, 2024 at 9:02 AM Euler Taveira wrote: > > On Thu, Mar 21, 2024, at 6:49 AM, Shlok Kyal wrote: > > There is a compilation error while building postgres with the patch > due to a recent commit. I have attached a top-up patch v32-0003 to > resolve this compilation error. > I have not updated the version of the patch as I have not made any > change in v32-0001 and v32-0002 patch. > > > I'm attaching a new version (v33) to incorporate this fix (v32-0003) into the > main patch (v32-0001). This version also includes 2 new tests: > > - refuse to run if the standby server is running > - refuse to run if the standby was promoted e.g. it is not in recovery > > The first one exercises a recent change (standby should be stopped) and the > second one covers an important requirement. > > Based on the discussion [1] about the check functions, Vignesh suggested that > it > should check both server before exiting. v33-0003 implements it. I don't have > a > strong preference; feel free to apply it. > > > [1] > https://www.postgresql.org/message-id/CALDaNm1Dg5tDRmaabk%2BZND4WF17NrNq52WZxCE%2B90-PGz5trQQ%40mail.gmail.com I had run valgrind with pg_createsubscriber to see if there were any issues. Valgrind reported the following issues: ==651272== LEAK SUMMARY: ==651272==definitely lost: 1,319 bytes in 18 blocks ==651272==indirectly lost: 1,280 bytes in 2 blocks ==651272== possibly lost: 44 bytes in 3 blocks ==651272==still reachable: 3,066 bytes in 22 blocks ==651272== suppressed: 0 bytes in 0 blocks ==651272== ==651272== For lists of detected and suppressed errors, rerun with: -s ==651272== ERROR SUMMARY: 17 errors from 17 contexts (suppressed: 0 from 0) The attached report has the details of the same. Thanks and Regards: Shubham Khanna. valgrind.log Description: Binary data
Re: UUID v7
On 20.03.24 19:08, Andrey M. Borodin wrote: > On 19 Mar 2024, at 13:55, Peter Eisentraut wrote: >> On 16.03.24 18:43, Andrey M. Borodin wrote: >>> On 15 Mar 2024, at 14:47, Aleksander Alekseev wrote: >>>> +1 to the idea. I doubt that anyone will miss it. >>> PFA v22. Changes: 1. Squashed all editorialisation by Jelte 2. Fixed my erroneous comments on using Method 2 (we are using method 1 instead) 3. Remove all traces of uuid_extract_variant() >> I have committed a subset of this for now, namely the additions of uuid_extract_timestamp() and uuid_extract_version(). These seemed mature and agreed upon. You can rebase the rest of your patch on top of that. > Great! Thank you! PFA v23 with rebase on HEAD. I have been studying the uuidv() function. I find this code extremely hard to follow. We don't need to copy all that documentation from the RFC 4122bis document. People can read that themselves. What I would like to see is easy-to-find information on what from there we are implementing. Like, - UUID version 7 - fixed-length dedicated counter - counter is 18 bits - 4 bits are initialized as zero That's more or less all I would need to know what is going on. That said, I don't understand why you say it's an 18 bit counter, when you overwrite 6 bits with variant and version. Then it's just a 12 bit counter? Which is the size of the rand_a field, so that kind of makes sense. But 12 bits is the recommended minimum, and (in this patch) we don't use sub-millisecond timestamp precision, so we should probably use more than the minimum? Also, you are initializing 4 bits (I think?) to zero to guard against counter rollovers (so it's really just an 8 bit counter?). But nothing checks against such rollovers, so I don't understand the use of that. The code could be organized better. In the not-increment_counter case, you could use two separate pg_strong_random calls: one to initialize rand_b, starting at &uuid->data[8], and one to initialize the counter. Then the former could be shared between the two branches, and the code to assign the sequence_counter to the uuid fields could also be shared. I would also prefer if the normal case (not-increment_counter) were the first if branch. Some other notes on your patch: - Your rebase duplicated the documentation of uuid_extract_timestamp and uuid_extract_version. - PostgreSQL code uses uint64 etc. instead of uint64_t etc. - It seems the added includes #include "access/xlog.h" #include "utils/builtins.h" #include "utils/datetime.h" are not needed. - The static variables sequence_counter and previous_timestamp could be kept inside the uuidv7() function.
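To make the layout questions above concrete, here is an illustrative-only sketch of filling in a UUIDv7 with a 12-bit counter occupying rand_a, per the RFC 4122bis layout (48-bit millisecond timestamp, 4-bit version, 12-bit rand_a, 2-bit variant, 62-bit rand_b). This is not the patch's code; pg_uuid_t is reduced to 16 raw bytes and the inputs are stand-ins:

#include <stdint.h>
#include <string.h>

typedef struct pg_uuid_t
{
    uint8_t     data[16];
} pg_uuid_t;

static void
uuidv7_sketch(pg_uuid_t *uuid, uint64_t unix_ts_ms, uint16_t counter,
              const uint8_t rand_b[8])
{
    /* bytes 0-5: 48-bit big-endian millisecond timestamp */
    for (int i = 0; i < 6; i++)
        uuid->data[i] = (uint8_t) (unix_ts_ms >> (40 - 8 * i));

    /* bytes 6-7: version 7 in the top nibble, 12-bit counter in rand_a */
    uuid->data[6] = (uint8_t) (0x70 | ((counter >> 8) & 0x0F));
    uuid->data[7] = (uint8_t) (counter & 0xFF);

    /* bytes 8-15: variant 0b10 in the top two bits, random bits below */
    memcpy(&uuid->data[8], rand_b, 8);
    uuid->data[8] = (uint8_t) (0x80 | (uuid->data[8] & 0x3F));
}

Writing it out this way shows why only 12 counter bits survive: the version nibble and the variant bits overwrite 6 of the 18 bits that a counter spanning rand_a and the top of rand_b would otherwise have.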
Re: sslinfo extension - add notbefore and notafter timestamps
On Fri, Mar 22, 2024 at 6:15 AM Daniel Gustafsson wrote: > While staging this to commit I realized one silly thing about it warranting > another round here. The ASN.1 timediff code can diff against *any* timestamp, > not just the UNIX epoch, so we could just pass in the postgres epoch and skip > the final subtraction since we're already correctly adjusted. This removes > the > non-overflow checked arithmetic with a simpler logic. Ah, that's much better! +1. --Jacob
Re: speed up a logical replica setup
On Fri, 22 Mar 2024 at 09:01, Euler Taveira wrote: > > On Thu, Mar 21, 2024, at 6:49 AM, Shlok Kyal wrote: > > There is a compilation error while building postgres with the patch > due to a recent commit. I have attached a top-up patch v32-0003 to > resolve this compilation error. > I have not updated the version of the patch as I have not made any > change in v32-0001 and v32-0002 patch. > > > I'm attaching a new version (v33) to incorporate this fix (v32-0003) into the > main patch (v32-0001). This version also includes 2 new tests: > > - refuse to run if the standby server is running > - refuse to run if the standby was promoted e.g. it is not in recovery > > The first one exercises a recent change (standby should be stopped) and the > second one covers an important requirement. A few comments: 1) In the error case, PQclear and PQfinish should be called: + /* Secure search_path */ + res = PQexec(conn, ALWAYS_SECURE_SEARCH_PATH_SQL); + if (PQresultStatus(res) != PGRES_TUPLES_OK) + { + pg_log_error("could not clear search_path: %s", +PQresultErrorMessage(res)); + if (exit_on_error) + exit(1); + + return NULL; + } + PQclear(res); 2) Call fflush here before calling the system command to get properly ordered console output (see the sketch below): a) Call fflush: + int rc = system(cmd_str); + + if (rc == 0) + pg_log_info("subscriber successfully changed the system identifier"); + else + pg_fatal("subscriber failed to change system identifier: exit code: %d", rc); + } b) similarly here: + pg_log_debug("pg_ctl command is: %s", pg_ctl_cmd->data); + rc = system(pg_ctl_cmd->data); + pg_ctl_status(pg_ctl_cmd->data, rc); + standby_running = true; c) similarly here: + pg_ctl_cmd = psprintf("\"%s\" stop -D \"%s\" -s", pg_ctl_path, + datadir); + pg_log_debug("pg_ctl command is: %s", pg_ctl_cmd); + rc = system(pg_ctl_cmd); + pg_ctl_status(pg_ctl_cmd, rc); + standby_running = false; 3) Call PQclear in error cases: a) Call PQclear + res = PQexec(conn, "SELECT system_identifier FROM pg_catalog.pg_control_system()"); + if (PQresultStatus(res) != PGRES_TUPLES_OK) + { + pg_log_error("could not get system identifier: %s", +PQresultErrorMessage(res)); + disconnect_database(conn, true); + } b) similarly here + if (PQntuples(res) != 1) + { + pg_log_error("could not get system identifier: got %d rows, expected %d row", +PQntuples(res), 1); + disconnect_database(conn, true); + } + c) similarly here + res = PQexec(conn, +"SELECT oid FROM pg_catalog.pg_database " +"WHERE datname = pg_catalog.current_database()"); + if (PQresultStatus(res) != PGRES_TUPLES_OK) + { + pg_log_error("could not obtain database OID: %s", +PQresultErrorMessage(res)); + disconnect_database(conn, true); + } + d) There are a few more places like this. 4) Since these are global variables, we might be able to remove many of these initializations: + /* Default settings */ + subscriber_dir = NULL; + opt.config_file = NULL; + opt.pub_conninfo_str = NULL; + opt.socket_dir = NULL; + opt.sub_port = DEFAULT_SUB_PORT; + opt.sub_username = NULL; + opt.database_names = (SimpleStringList){0}; + opt.recovery_timeout = 0; Regards, Vignesh
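On item 2, a minimal sketch of the suggested ordering fix; run_command_flushed is a hypothetical helper, not code from the patch:

#include <stdio.h>
#include <stdlib.h>

/*
 * Flush our own stdio buffers before handing the console to a child
 * process, so the child's output cannot appear ahead of ours.
 */
static int
run_command_flushed(const char *cmd_str)
{
    fflush(stdout);
    fflush(stderr);
    return system(cmd_str);
}

Each of the system() call sites listed above could then call this helper, or simply perform the two fflush() calls inline before invoking the command.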
Re: NLS for extension
On Fri, Mar 22, 2024 at 9:27 AM Ed Behn wrote: > Thank you for your guidance. It turns out the call was coming from inside > the house. The error isn't caused by Postgres. It's the library that I'm > using that reported the error. My extension passes any errors it generates as > Postgres ERRORs. Yeah, I was kind of wondering if it was something like that. But I figured it was a capital mistake to theorize in advance of the data. Glad you figured it out. -- Robert Haas EDB: http://www.enterprisedb.com
Re: documentation structure
On Fri, Mar 22, 2024 at 9:35 AM Peter Eisentraut wrote: > >> But this separation was explicitly added a few years ago, because most > >> people just want to read about the binaries. > > > > I really doubt that this is true. > > Here is the thread: > https://www.postgresql.org/message-id/flat/CABUevExRCf8waYOsrCO-QxQL50XGapMf5dnWScOXj7X%3DMXW--g%40mail.gmail.com Sorry. I didn't mean to dispute the point that the section was added a few years ago, nor the point that most people just want to read about the binaries. I am confident that both of those things are true. What I do want to dispute is that having a four-sentence chapter in the documentation index that tells people something they can find much more easily without using the documentation at all is a good plan. I agree with the concern that Magnus expressed on the thread, i.e.: > It's kind of strange that if you start your PostgreSQL journey by reading our > instructions, you get nothing useful about installing PostgreSQL from binary > packages other than "go ask somebody else about it". But I don't agree that this was the right way to address that problem. I think it would have been better to just add the download link to the existing installation chapter. That's actually what we had in chapter 18, "Installation from Source Code on Windows", since removed. But for some reason we decided that on non-Windows platforms, it needed a whole new chapter rather than an extra sentence in the existing one. I think that's massively overkill. Alternately, I think it would be reasonable to address the concern by just moving all the stuff about building from source code to an appendix, and assume people can figure out how to download the software without us needing to say anything in the documentation at all. What was weird about the state before that patch, IMHO, was that we both talked about building from source code and didn't talk about binary packages. That can be addressed either by adding a mention of binary packages, or by deemphasizing the idea of installing from source code. -- Robert Haas EDB: http://www.enterprisedb.com
Re: UUID v7
Another source: Microservices Pattern: Database per service (a service's database is private to that service).

Sergey Prokhorenko sergeyprokhore...@yahoo.com.au

On Friday, 22 March 2024 at 04:58:59 pm GMT+3, Sergey Prokhorenko wrote:

BTW: Each microservice should have its own database to ensure data isolation and independence, enabling better scalability and fault tolerance. Source: Microservices Pattern: Shared database

Sergey Prokhorenko sergeyprokhore...@yahoo.com.au

On Friday, 22 March 2024 at 04:42:20 pm GMT+3, Sergey Prokhorenko wrote:

Why not use a single UUID generator for the database table in this case, similar to autoincrement?

Sergey Prokhorenko sergeyprokhore...@yahoo.com.au

On Friday, 22 March 2024 at 03:51:20 pm GMT+3, Peter Eisentraut wrote:

On 21.03.24 16:21, Jelte Fennema-Nio wrote:
> On Wed, 20 Mar 2024 at 19:08, Andrey M. Borodin wrote:
>> Timer-based bits contribute to global sortability. But the real timers we
>> have are not even millisecond adjusted. We can hope for ~few ms variation in
>> one datacenter or in presence of atomic clocks.
>
> I think the main benefit of using microseconds would not be
> sortability between servers, but sortability between backends.

There is that, and there are also multiple backend workers for one session.
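To make the quoted trade-off concrete, here is a minimal sketch (illustrative only, not the patch under discussion) of a UUIDv7 generator that spends the 12 "rand_a" bits on sub-millisecond timestamp precision, one of the options RFC 9562 allows. That is what buys sortability between backends on the same host, while the 48-bit millisecond prefix still gives rough sortability across servers:

#include <stdint.h>
#include <stdlib.h>
#include <sys/time.h>

static void
uuidv7_sketch(uint8_t out[16])
{
	struct timeval tv;
	uint64_t	ms;
	uint32_t	frac;

	gettimeofday(&tv, NULL);
	ms = (uint64_t) tv.tv_sec * 1000 + tv.tv_usec / 1000;

	/* 48-bit big-endian millisecond timestamp */
	out[0] = ms >> 40;
	out[1] = ms >> 32;
	out[2] = ms >> 24;
	out[3] = ms >> 16;
	out[4] = ms >> 8;
	out[5] = ms;

	/* scale the leftover microseconds (0..999) into the 12 rand_a bits */
	frac = ((uint32_t) (tv.tv_usec % 1000) << 12) / 1000;
	out[6] = 0x70 | (frac >> 8);	/* version 7 + high fraction bits */
	out[7] = frac & 0xff;

	/* rand_b: illustrative only; a real generator needs a CSPRNG */
	for (int i = 8; i < 16; i++)
		out[i] = random() & 0xff;
	out[8] = (out[8] & 0x3f) | 0x80;	/* RFC 4122 variant bits */
}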
Re: documentation structure
On Thu, Mar 21, 2024 at 7:40 PM Peter Eisentraut wrote:
> I'm highly against this. If I want to read about PL/Python, why should
> I have to wade through PL/Perl and PL/Tcl?
>
> I think, abstractly, in a book, PL/Python should be a chapter of its
> own. Just like GiST should be a chapter of its own. Because they are
> self-contained topics.

On the other hand, in a book, chapters tend to be of relatively uniform length. People don't usually write a book with some chapters that are 100+ pages long, and others that are a single page, or even just a couple of sentences. I mean, I'm sure it's been done, but it's not a normal way to write a book. And I don't believe that if someone were writing a physical book about PostgreSQL from scratch, they'd ever end up with a top-level chapter that looks anything like our GiST chapter. All of the index AM chapters are quite obviously clones of each other, and they're all quite short. Surely you'd make them sections within a chapter, not entire chapters.

I do agree that PL/pgsql is more arguable. I can imagine somebody writing a book about PostgreSQL and choosing to make that topic into a whole chapter. However, I also think that people don't make decisions about what should be a chapter in a vacuum. If you've got 100 people writing a book together, which is essentially what we actually do have, and each of those people makes decisions in isolation about what is worthy of being a chapter, then you end up with exactly the kind of mess that we now have. Some chapters are long and some are short. Some are well-written and some are poorly written. Some are updated regularly and others have hardly been touched in a decade. Books have editors to straighten out those kinds of inconsistencies so that there's some uniformity to the product as a whole.

The problem with that, of course, is that it invites bike-shedding. As you say, every decision that is reflected in our documentation was made for some reason, and most of them will have been made by prominent, active committers. So discussions about how to improve things can easily bog down even when people agree on the overall goals, simply because few individual changes find consensus. I hope that doesn't happen here, because I think most people who have commented so far agree that there is a problem here and that we should try to fix it. Let's not let the perfect be the enemy of the good.

--
Robert Haas
EDB: http://www.enterprisedb.com
Re: UUID v7
BTW: Each microservice should have its own database to ensure data isolation and independence, enabling better scalability and fault tolerance. Source: Microservices Pattern: Shared database

Sergey Prokhorenko sergeyprokhore...@yahoo.com.au

On Friday, 22 March 2024 at 04:42:20 pm GMT+3, Sergey Prokhorenko wrote:

Why not use a single UUID generator for the database table in this case, similar to autoincrement?

Sergey Prokhorenko sergeyprokhore...@yahoo.com.au

On Friday, 22 March 2024 at 03:51:20 pm GMT+3, Peter Eisentraut wrote:

On 21.03.24 16:21, Jelte Fennema-Nio wrote:
> On Wed, 20 Mar 2024 at 19:08, Andrey M. Borodin wrote:
>> Timer-based bits contribute to global sortability. But the real timers we
>> have are not even millisecond adjusted. We can hope for ~few ms variation in
>> one datacenter or in presence of atomic clocks.
>
> I think the main benefit of using microseconds would not be
> sortability between servers, but sortability between backends.

There is that, and there are also multiple backend workers for one session.
Re: Using the %m printf format more
Dagfinn Ilmari Mannsåker writes:
> Michael Paquier writes:
>
>> On Wed, Mar 13, 2024 at 02:33:52PM +0100, Peter Eisentraut wrote:
>>> The 0002 patch looks sensible. It would be good to fix that, otherwise it
>>> could have some confusing outcomes in the future.
>>
>> You mean if we begin to use %m in future callers of
>> emit_tap_output_v(), hypothetically? That's a fair argument.
>
> Yeah, developers would rightfully expect to be able to use %m with
> anything that takes a printf format string. Case in point: when I was
> first doing the conversion I did change the bail() and diag() calls in
> pg_regress to %m, and only while I was preparing the patch for
> submission did I think to check the actual implementation to see if it
> was safe to do so.
>
> The alternative would be to document that you can't use %m with these
> functions, which is silly IMO, given how simple the fix is.
>
> One minor improvement I can think of is to add a comment by the
> save_errno declaration noting that it's needed in order to support %m.

Here's an updated patch that adds such a comment. I'll add it to the commitfest later today unless someone commits it before then.

> - ilmari

>From 4312d746a722582202a013c7199f5c42e88db951 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?=
Date: Mon, 11 Mar 2024 11:08:14 +
Subject: [PATCH v2] Save errno in emit_tap_output_v() and use %m in callers
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This defends against the fprintf() calls before vfprintf(…, fmt, …)
clobbering errno, thus breaking %m.
---
 src/test/regress/pg_regress.c | 84 ++-
 1 file changed, 34 insertions(+), 50 deletions(-)

diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 76f01c73f5..ea94d874b0 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -341,6 +341,14 @@ emit_tap_output_v(TAPtype type, const char *fmt, va_list argp)
 {
 	va_list		argp_logfile;
 	FILE	   *fp;
+	int			save_errno;
+
+	/*
+	 * The fprintf() calls used to output TAP protocol elements might clobber
+	 * errno, so we need to save it and restore it before vfprintf()-ing the
+	 * user's format string, in case it contains %m placeholders.
+	 */
+	save_errno = errno;
 
 	/*
 	 * Diagnostic output will be hidden by prove unless printed to stderr. The
@@ -379,9 +387,13 @@ emit_tap_output_v(TAPtype type, const char *fmt, va_list argp)
 		if (logfile)
 			fprintf(logfile, "# ");
 	}
+	errno = save_errno;
 	vfprintf(fp, fmt, argp);
 	if (logfile)
+	{
+		errno = save_errno;
 		vfprintf(logfile, fmt, argp_logfile);
+	}
 
 	/*
 	 * If we are entering into a note with more details to follow, register
@@ -492,10 +504,7 @@ make_temp_sockdir(void)
 	temp_sockdir = mkdtemp(template);
 	if (temp_sockdir == NULL)
-	{
-		bail("could not create directory \"%s\": %s",
-			 template, strerror(errno));
-	}
+		bail("could not create directory \"%s\": %m", template);
 
 	/* Stage file names for remove_temp(). Unsafe in a signal handler. */
 	UNIXSOCK_PATH(sockself, port, temp_sockdir);
@@ -616,8 +625,7 @@ load_resultmap(void)
 		/* OK if it doesn't exist, else complain */
 		if (errno == ENOENT)
 			return;
-		bail("could not open file \"%s\" for reading: %s",
-			 buf, strerror(errno));
+		bail("could not open file \"%s\" for reading: %m", buf);
 	}
 
 	while (fgets(buf, sizeof(buf), f))
@@ -1046,10 +1054,7 @@ config_sspi_auth(const char *pgdata, const char *superuser_name)
 #define CW(cond)	\
 	do { \
 		if (!(cond)) \
-		{ \
-			bail("could not write to file \"%s\": %s", \
-				 fname, strerror(errno)); \
-		} \
+			bail("could not write to file \"%s\": %m", fname); \
 	} while (0)
 
 	res = snprintf(fname, sizeof(fname), "%s/pg_hba.conf", pgdata);
@@ -1064,8 +1069,7 @@ config_sspi_auth(const char *pgdata, const char *superuser_name)
 	hba = fopen(fname, "w");
 	if (hba == NULL)
 	{
-		bail("could not open file \"%s\" for writing: %s",
-			 fname, strerror(errno));
+		bail("could not open file \"%s\" for writing: %m", fname);
 	}
 	CW(fputs("# Configuration written by config_sspi_auth()\n", hba) >= 0);
 	CW(fputs("host all all 127.0.0.1/32 sspi include_realm=1 map=regress\n",
@@ -1079,8 +1083,7 @@ config_sspi_auth(const char *pgdata, const char *superuser_name)
 	ident = fopen(fname, "w");
 	if (ident == NULL)
 	{
-		bail("could not open file \"%s\" for writing: %s",
-			 fname, strerror(errno));
+		bail("could not open file \"%s\" for writing: %m", fname);
 	}
 	CW(fputs("# Configuration written by config_sspi_auth()\n", ident) >= 0);
@@ -1210,7 +1213,7 @@ spawn_process(const char *cmdline)
 	pid = fork();
 	if (pid == -1)
 	{
-		bail("could not fork: %s", strerror(errno));
+		bail("could not fork: %m");
 	}
 	if (pid == 0)
 	{
@@ -1226,7 +1229,7 @@ spawn_process(const char *cmdline)
 	cmdline2 = psprintf("exec %s", cmdline);
 	execl(shellprog, shellprog, "-c", cmdline2, (char *) NULL);
 	/*
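To illustrate the failure mode the new comment guards against, here is a minimal standalone sketch (not part of the patch; note that %m in the printf family is a glibc extension, which PostgreSQL's own snprintf also provides):

#include <errno.h>
#include <stdio.h>

int
main(void)
{
	errno = ENOENT;			/* pretend a syscall just failed */
	fprintf(stderr, "# ");	/* stdio may change errno even on success */

	/*
	 * Without saving and restoring errno around the prefix output above,
	 * this %m (which formats strerror(errno)) could print the wrong
	 * error message.
	 */
	fprintf(stderr, "could not open file: %m\n");
	return 0;
}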