[PATCH] Add roman support for to_number function

2024-08-30 Thread Hunaid Sohail
Hi,

While looking at formatting.c file, I noticed a TODO about "add support for
roman number to standard number conversion"  (
https://github.com/postgres/postgres/blob/master/src/backend/utils/adt/formatting.c#L52
)

I have attached a patch that adds support for this: it converts Roman
numerals to standard numbers (1 to 3999) while also checking whether the
numerals are valid.
I have also added a new error code, ERRCODE_INVALID_ROMAN_NUMERAL, for the
case of invalid numerals.

A few examples:

postgres=# SELECT to_number('MC', 'RN');
 to_number
-----------
      1100
(1 row)

postgres=# SELECT to_number('XIV', 'RN');
 to_number
-----------
        14
(1 row)

postgres=# SELECT to_number('MMMCMXCIX', 'RN');
 to_number
-----------
      3999
(1 row)

postgres=# SELECT to_number('MCCD', 'RN');
ERROR:  invalid roman numeral

I would appreciate your feedback on the following cases:
- Is it okay for the function to handle Roman numerals in a
case-insensitive way? (e.g., 'XIV', 'xiv', and 'Xiv' are all seen as 14).
- How should we handle Roman numerals with leading or trailing spaces, like
' XIV' or 'MC '? Should we trim the spaces, or would it be better to throw
an error in such cases?

I have tested the changes and would appreciate any suggestions for
improvement. I will update the documentation and regression tests in the
next version of the patch.

Regards,
Hunaid Sohail


v1-0001-Add-RN-rn-support-for-to_number-function.patch
Description: Binary data


Re: consider -Wmissing-variable-declarations

2024-08-30 Thread Peter Eisentraut

On 28.08.24 05:31, Thomas Munro wrote:

On Wed, Jun 19, 2024 at 3:02 AM Andres Freund  wrote:

-const char *EAN13_range[][2] = {
+static const char *EAN13_range[][2] = {
   {"000", "019"}, /* GS1 US */
   {"020", "029"}, /* Restricted distribution (MO defined) */
   {"030", "039"}, /* GS1 US */



-const char *ISBN_range[][2] = {
+static const char *ISBN_range[][2] = {
   {"0-00", "0-19"},
   {"0-200", "0-699"},
   {"0-7000", "0-8499"},
@@ -967,7 +967,7 @@ const char *ISBN_range[][2] = {
   */


FYI these ones generate -Wunused-variable warnings from headerscheck
on CI, though it doesn't fail the task.  Hmm, these aren't really
headers, are they?


Yes, it looks like these ought to be excluded from checking:

diff --git a/src/tools/pginclude/headerscheck b/src/tools/pginclude/headerscheck
index 436e2b92a33..3fc737d2cc1 100755
--- a/src/tools/pginclude/headerscheck
+++ b/src/tools/pginclude/headerscheck
@@ -138,6 +138,12 @@ do
test "$f" = src/pl/tcl/pltclerrcodes.h && continue

# Also not meant to be included standalone.
+   test "$f" = contrib/isn/EAN13.h && continue
+   test "$f" = contrib/isn/ISBN.h && continue
+   test "$f" = contrib/isn/ISMN.h && continue
+   test "$f" = contrib/isn/ISSN.h && continue
+   test "$f" = contrib/isn/UPC.h && continue
+
test "$f" = src/include/common/unicode_nonspacing_table.h && continue
test "$f" = src/include/common/unicode_east_asian_fw_table.h && continue






Re: pgsql: Add more SQL/JSON constructor functions

2024-08-30 Thread Amit Langote
On Thu, Aug 22, 2024 at 12:44 PM Amit Langote  wrote:
> On Thu, Aug 22, 2024 at 11:02 jian he  wrote:
>> On Tue, Jul 30, 2024 at 12:59 PM Amit Langote  
>> wrote:
>> > On Fri, Jul 26, 2024 at 11:19 PM jian he  
>> > wrote:
>> > > {
>> > > ...
>> > > /*
>> > >  * For expression nodes that support soft errors.  Should be set to 
>> > > NULL
>> > >  * before calling ExecInitExprRec() if the caller wants errors 
>> > > thrown.
>> > >  */
>> > > ErrorSaveContext *escontext;
>> > > } ExprState;
>> > >
>> > > I believe makeNode will set escontext to NULL by default.
>> > > So the comment should say: if you want to catch soft errors, make
>> > > sure escontext points to an allocated ErrorSaveContext;
>> > > or maybe just say the default is NULL.
>> > >
>> > > Otherwise, the original comment's meaning feels like: we need to
>> > > explicitly set it to NULL
>> > > for certain operations, which I believe is false?
>> >
>> > OK, I'll look into updating this.

See 0001.

>> > > json_behavior_type:
>> > > ERROR_P { $$ = JSON_BEHAVIOR_ERROR; }
>> > > | NULL_P { $$ = JSON_BEHAVIOR_NULL; }
>> > > | TRUE_P { $$ = JSON_BEHAVIOR_TRUE; }
>> > > | FALSE_P { $$ = JSON_BEHAVIOR_FALSE; }
>> > > | UNKNOWN { $$ = JSON_BEHAVIOR_UNKNOWN; }
>> > > | EMPTY_P ARRAY { $$ = JSON_BEHAVIOR_EMPTY_ARRAY; }
>> > > | EMPTY_P OBJECT_P { $$ = JSON_BEHAVIOR_EMPTY_OBJECT; }
>> > > /* non-standard, for Oracle compatibility only */
>> > > | EMPTY_P { $$ = JSON_BEHAVIOR_EMPTY_ARRAY; }
>> > > ;
>> > >
>> > > EMPTY_P behaves the same as EMPTY_P ARRAY
>> > > so for function GetJsonBehaviorConst, the following "case
>> > > JSON_BEHAVIOR_EMPTY:" is wrong?
>> > >
>> > > case JSON_BEHAVIOR_NULL:
>> > > case JSON_BEHAVIOR_UNKNOWN:
>> > > case JSON_BEHAVIOR_EMPTY:
>> > > val = (Datum) 0;
>> > > isnull = true;
>> > > typid = INT4OID;
>> > > len = sizeof(int32);
>> > > isbyval = true;
>> > > break;
>> > >
>> > > also src/backend/utils/adt/ruleutils.c
>> > > if (jexpr->on_error->btype != JSON_BEHAVIOR_EMPTY)
>> > > get_json_behavior(jexpr->on_error, context, "ERROR");
>> >
>> > Something like the attached makes sense?  While this meaningfully
>> > changes the deparsing output, there is no behavior change for
>> > JsonTable top-level path execution.  That's because the behavior when
>> > there's an error in the execution of the top-level path is to throw it
>> > or return an empty set, which is handled in jsonpath_exec.c, not
>> > execExprInterp.c.

See 0002.

I'm also attaching 0003 to fix a minor annoyance that JSON_TABLE()
columns' default ON ERROR, ON EMPTY behaviors are unnecessarily
emitted in the deparsed output when the top-level ON ERROR behavior is
ERROR.

Will push these on Monday.

I haven't had a chance to take a closer look at your patch to optimize
the code in ExecInitJsonExpr() yet.

-- 
Thanks, Amit Langote


v1-0002-SQL-JSON-Fix-default-ON-ERROR-behavior-for-JSON_T.patch
Description: Binary data


v1-0001-Update-comment-about-ExprState.escontext.patch
Description: Binary data


v1-0003-SQL-JSON-Fix-JSON_TABLE-column-deparsing.patch
Description: Binary data


Re: Set query_id for query contained in utility statement

2024-08-30 Thread Anthonin Bonnefoy
On Tue, Aug 27, 2024 at 11:14 AM jian he  wrote:
> also it's ok to use passed (ParseState *pstate) for
> {
> estate = CreateExecutorState();
> estate->es_param_list_info = params;
> paramLI = EvaluateParams(pstate, entry, execstmt->params, estate);
> }
> ?
> I really don't know.
>
> some of the change is refactoring, maybe you can put it into a separate patch.

Thanks for the review. I think the parser state is mostly used for the
error callbacks and parser_errposition but I'm not 100% sure. Either
way, you're right and it probably shouldn't be in the patch. I've
modified the patch to restrict the changes to only add the necessary
query jumble and post parse hook calls.


v4-0001-Set-query_id-for-queries-contained-in-utility-sta.patch
Description: Binary data


Re: ANALYZE ONLY

2024-08-30 Thread jian he
On Thu, Aug 29, 2024 at 7:31 PM Melih Mutlu  wrote:
>
> Hi Michael,
>
> Michael Harris , on Fri, Aug 23, 2024 at 13:01, wrote:
>>
>> V2 of the patch is attached.
>
>
> Thanks for updating the patch. I have a few more minor feedbacks.
>
>> -ANALYZE [ ( option [, ...] ) ] [ table_and_columns [, ...] ]
>> +ANALYZE [ ( option [, ...] ) ] [ [ ONLY ] table_and_columns [, ...] ]
>
>
> I believe moving "[ ONLY ]" to the definitions of table_and_columns below, as 
> you did with "[ * ]", might be better to be consistent with other places (see 
> [1])
>
>> + if ((options & VACOPT_VACUUM) && is_partitioned_table && ! 
>> include_children)
>
>

I think you are right.

ANALYZE [ ( option [, ...] ) ] [ [ ONLY ] table_and_columns [, ...] ]

does not seem to cover commands like:

ANALYZE ONLY only_parted(a), ONLY only_parted(b);




Re: Collect statistics about conflicts in logical replication

2024-08-30 Thread shveta malik
On Fri, Aug 30, 2024 at 12:15 PM Zhijie Hou (Fujitsu)
 wrote:
>
>
> Here is V5 patch which addressed above and Shveta's[1] comments.
>

The patch looks good to me.

thanks
Shveta




Re: ANALYZE ONLY

2024-08-30 Thread David Rowley
Hi Michael,

On Fri, 23 Aug 2024 at 22:01, Michael Harris  wrote:
> V2 of the patch is attached.

(I understand this is your first PostgreSQL patch)

I just wanted to make sure you knew about the commitfest cycle we use
for the development of PostgreSQL.  If you've not got one already, could
you register a community account?  That will allow you to include this
patch in the September commitfest, which starts on the 1st. See
https://commitfest.postgresql.org/49/

I understand there's now some sort of cool-off period for new
community accounts, so if you have trouble adding the patch before the
CF starts, let me know and I should be able to do it for you. The
commitfest normally closes for new patches around 12:00 UTC on the 1st
of the month.

There's a bit more information in
https://wiki.postgresql.org/wiki/Submitting_a_Patch#Patch_submission

David




Re: Add contrib/pg_logicalsnapinspect

2024-08-30 Thread Bertrand Drouvot
Hi,

On Thu, Aug 29, 2024 at 02:15:47PM +, Bertrand Drouvot wrote:
> I don't see any use case where it could be useful when the server is down. So,
> I think I'll move forward with in core functions (unless someone has a 
> different
> opinion).
> 

Please find v2 attached that creates the 2 new in core functions.

Note that once those new functions are in (or maybe sooner), I'll submit an
additional patch to get rid of the code duplication between the new
ValidateSnapshotFile() and SnapBuildRestore().

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From 6c5a1ad66b203036739aae955932b8e3813c71e3 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot 
Date: Thu, 29 Aug 2024 20:51:31 +
Subject: [PATCH v2] Functions to get ondisk logical snapshots details

Provides SQL functions that allow inspecting the contents of serialized logical
snapshots of a running database cluster, which is useful for debugging or
educational purposes.
---
 contrib/test_decoding/Makefile|   2 +-
 .../expected/get_ondisk_snapshot_info.out |  57 +
 contrib/test_decoding/meson.build |   1 +
 .../specs/get_ondisk_snapshot_info.spec   |  32 +++
 doc/src/sgml/func.sgml|  53 +
 src/backend/replication/logical/snapbuild.c   | 225 ++
 src/include/catalog/pg_proc.dat   |  16 ++
 7 files changed, 385 insertions(+), 1 deletion(-)
  17.3% contrib/test_decoding/expected/
  12.6% contrib/test_decoding/specs/
  17.3% doc/src/sgml/
  45.3% src/backend/replication/logical/
   6.6% src/include/catalog/

diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index a4ba1a509a..b1b8ffa9e8 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -9,7 +9,7 @@ REGRESS = ddl xact rewrite toast permissions decoding_in_xact \
 ISOLATION = mxact delayed_startup ondisk_startup concurrent_ddl_dml \
 	oldest_xmin snapshot_transfer subxact_without_top concurrent_stream \
 	twophase_snapshot slot_creation_error catalog_change_snapshot \
-	skip_snapshot_restore
+	skip_snapshot_restore get_ondisk_snapshot_info
 
 REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
 ISOLATION_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
diff --git a/contrib/test_decoding/expected/get_ondisk_snapshot_info.out b/contrib/test_decoding/expected/get_ondisk_snapshot_info.out
new file mode 100644
index 00..b676ccd528
--- /dev/null
+++ b/contrib/test_decoding/expected/get_ondisk_snapshot_info.out
@@ -0,0 +1,57 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s0_init s0_begin s0_savepoint s0_truncate s1_checkpoint s1_get_changes s0_commit s0_begin s0_insert s1_checkpoint s1_get_changes s0_commit s1_get_changes s1_get_logical_snapshot_info s1_get_logical_snapshot_meta
+step s0_init: SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding');
+?column?
+
+init
+(1 row)
+
+step s0_begin: BEGIN;
+step s0_savepoint: SAVEPOINT sp1;
+step s0_truncate: TRUNCATE tbl1;
+step s1_checkpoint: CHECKPOINT;
+step s1_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'skip-empty-xacts', '1', 'include-xids', '0');
+data
+
+(0 rows)
+
+step s0_commit: COMMIT;
+step s0_begin: BEGIN;
+step s0_insert: INSERT INTO tbl1 VALUES (1);
+step s1_checkpoint: CHECKPOINT;
+step s1_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'skip-empty-xacts', '1', 'include-xids', '0');
+data   
+---
+BEGIN  
+table public.tbl1: TRUNCATE: (no-flags)
+COMMIT 
+(3 rows)
+
+step s0_commit: COMMIT;
+step s1_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'skip-empty-xacts', '1', 'include-xids', '0');
+data 
+-
+BEGIN
+table public.tbl1: INSERT: val1[integer]:1 val2[integer]:null
+COMMIT   
+(3 rows)
+
+step s1_get_logical_snapshot_info: SELECT (pg_get_logical_snapshot_info(f.name::pg_lsn)).state,(pg_get_logical_snapshot_info(f.name::pg_lsn)).catchange_count,array_length((pg_get_logical_snapshot_info(f.name::pg_lsn)).catchange_xip,1),(pg_get_logical_snapshot_info(f.name::pg_lsn)).committed_count,array_length((pg_get_logical_snapshot_info(f.name::pg_lsn)).committed_xip,1) FROM (SELECT replace(replace(name,'.snap',''),'-','/') AS name FROM pg_ls_logicalsnapdir()) AS f ORDER BY 2;
+state|catchange_count|array_length|committed_count|array_length
+-+---++---+
+2|  0|   

Re: long-standing data loss bug in initial sync of logical replication

2024-08-30 Thread Shlok Kyal
> BTW, we should do some performance testing by having a mix of DML and
> DDLs to see the performance impact of this patch.
>
> [1] - 
> https://www.postgresql.org/message-id/CAD21AoAenVqiMjpN-PvGHL1N9DWnHSq673bfgr6phmBUzx=k...@mail.gmail.com
>

I did some performance testing and I found some performance impact for
the following case:

1. Created a publisher, subscriber set up on a single table, say 'tab_conc1';
2. Created a second publisher, subscriber set on a single table say 'tp';
3. Created 'tcount' no. of tables. These tables are not part of any publication.
4. There are two sessions running in parallel, let's say S1 and S2.
5. Begin a transaction in S1.
6. Now in a loop (this loop runs 100 times):
 S1: Insert a row in table 'tab_conc1'
 S1: Insert a row in all 'tcount' tables.
 S2: BEGIN;  Alter publication for 2nd publication; COMMIT;
The current logic in the patch will call the function
'rel_sync_cache_publication_cb' during invalidation. This will
invalidate the cache for all the tables. So cache related to all the
tables i.e. table 'tab_conc1', 'tcount' tables will be invalidated.
7. COMMIT the transaction in S1.

The performance in this case is:
No. of tables  | With patch (in ms) | With head (in ms)
---------------+--------------------+------------------
tcount = 100   | 101376.4           | 101357.8
tcount = 1000  | 994085.4           | 993471.4

For 100 tables the performance is slow by '0.018%' and for 1000 tables
performance is slow by '0.06%'.
These results are the average of 5 runs.

Other than this I tested the following cases and did not find any
performance impact:
1. with 'tcount = 10'.
2. with 'tcount = 0' and running the loop 1000 times.

I have also attached the test script and the machine configurations on
which performance testing was done.
Next I am planning to test solely on the logical decoding side and
will share the results.

Thanks and Regards,
Shlok Kyal
NAME="Red Hat Enterprise Linux Server"
VERSION="7.9 (Maipo)"
ID="rhel"
ID_LIKE="fedora"
VARIANT="Server"
VARIANT_ID="server"
VERSION_ID="7.9"
PRETTY_NAME="Red Hat Enterprise Linux Server 7.9 (Maipo)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:7.9:GA:server"
HOME_URL="https://www.redhat.com/";
BUG_REPORT_URL="https://bugzilla.redhat.com/";

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 7"
REDHAT_BUGZILLA_PRODUCT_VERSION=7.9
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="7.9"
CPU INFO

processor   : 119
vendor_id   : GenuineIntel
cpu family  : 6
model   : 62
model name  : Intel(R) Xeon(R) CPU E7-4890 v2 @ 2.80GHz
stepping: 7
microcode   : 0x715
cpu MHz : 1505.957
cache size  : 38400 KB
physical id : 3
siblings: 30
core id : 14
cpu cores   : 15
apicid  : 125
initial apicid  : 125
fpu : yes
fpu_exception   : yes
cpuid level : 13
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb 
rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology 
nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est 
tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt 
tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm epb intel_ppin ssbd ibrs 
ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt 
dtherm ida arat pln pts md_clear spec_ctrl intel_stibp flush_l1d
bogomips: 5629.54
clflush size: 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:
MemTotal:   792237404 kB
MemFree:724051992 kB
MemAvailable:   762505368 kB
Buffers:2108 kB
Cached: 43885588 kB
SwapCached:0 kB
Active: 22276460 kB
Inactive:   21761812 kB
Active(anon):1199380 kB
Inactive(anon):  4228212 kB
Active(file):   21077080 kB
Inactive(file): 17533600 kB
Unevictable:   0 kB
Mlocked:   0 kB
SwapTotal:   4194300 kB
SwapFree:4194300 kB
Dirty: 0 kB
Writeback: 0 kB
AnonPages:150796 kB
Mapped:  5283248 kB
Shmem:   5277044 kB
Slab:1472084 kB
SReclaimable:1165144 kB
SUnreclaim:   306940 kB
KernelStack:   17504 kB
PageTables:21540 kB
NFS_Unstable:  0 kB
Bounce:0 kB
WritebackTmp:  0 kB
CommitLimit:400313000 kB
Committed_AS:   86287072 kB
VmallocTotal:   34359738367 kB
VmallocUsed: 1942092 kB
VmallocChunk:   33753397244 kB
Percpu:36352 kB
HardwareCorrupted: 0 kB
AnonHugePages: 81920 kB
CmaTotal:  0 kB
CmaFree:   0 kB
HugePages_Total:   0
HugePages_Free:0
HugePages_Rsvd:0
HugePages_Surp

Re: [PoC] Federated Authn/z with OAUTHBEARER

2024-08-30 Thread Peter Eisentraut

On 28.08.24 18:31, Jacob Champion wrote:

On Mon, Aug 26, 2024 at 4:23 PM Jacob Champion
 wrote:

I was having trouble reasoning about the palloc-that-isn't-palloc code
during the first few drafts, so I will try a round with the jsonapi_
prefix.


v27 takes a stab at that. I have kept the ALLOC/FREE naming to match
the strategy in other src/common source files.


This looks pretty good to me.  Maybe on the naming side, this seems like 
a gratuitous divergence:


+#define jsonapi_createStringInfo   makeStringInfo


The name of the variable JSONAPI_USE_PQEXPBUFFER leads to sections of
code that look like this:

+#ifdef JSONAPI_USE_PQEXPBUFFER
+if (!new_prediction || !new_fnames || !new_fnull)
+return false;
+#endif

To me it wouldn't be immediately obvious why "using PQExpBuffer" has
anything to do with this code; the key idea is that we expect any
allocations to be able to fail. Maybe a name like JSONAPI_ALLOW_OOM or
JSONAPI_SHLIB_ALLOCATIONS or...?


Seems ok to me as is.  I think the purpose of JSONAPI_USE_PQEXPBUFFER is 
adequately explained by this comment


+/*
+ * By default, we will use palloc/pfree along with StringInfo.  In libpq,
+ * use malloc and PQExpBuffer, and return JSON_OUT_OF_MEMORY on out-of-memory.
+ */
+#ifdef JSONAPI_USE_PQEXPBUFFER

For some of the other proposed names, I'd be afraid that someone might 
think you are free to mix and match APIs, OOM behavior, and compilation 
options.



Some comments on src/include/common/jsonapi.h:

-#include "lib/stringinfo.h"

I suspect this will fail headerscheck?  Probably needs an exception 
added there.


+#ifdef JSONAPI_USE_PQEXPBUFFER
+#define StrValType PQExpBufferData
+#else
+#define StrValType StringInfoData
+#endif

Maybe use jsonapi_StrValType here.

+typedef struct StrValType StrValType;

I don't think that is needed.  It would just duplicate typedefs that 
already exist elsewhere, depending on what StrValType is set to.


+   bool        parse_strval;
+   StrValType *strval;         /* only used if parse_strval == true */


The parse_strval field could use a better explanation.

I actually don't understand the need for this field.  AFAICT, this is
just used to record whether strval is valid.  But in the cases where
it's not valid, why do we need to record that?  Couldn't you just return
failed_oom in those cases?





Re: Add contrib/pg_logicalsnapinspect

2024-08-30 Thread Amit Kapila
On Thu, Aug 29, 2024 at 6:33 PM Bharath Rupireddy
 wrote:
>
> On Thu, Aug 29, 2024 at 3:44 PM Bertrand Drouvot
>  wrote:
> >
> > Yeah that's fair. And now I'm wondering if we need an extra module. I think
> > we could "simply" expose 2 new functions in core, thoughts?
> >
> > > > What do you think? Did you have something else in mind?
> > > >
> > >
> > > On similar lines, we can also provide a function to get the slot's
> > > on-disk data.
> >
> > Yeah, having a way to expose the data from the disk makes fully sense to me.
> >
> > > IIRC, Bharath had previously proposed a tool to achieve
> > > the same. It is fine if we don't want to add that as part of this
> > > patch but I mentioned it because by having that we can have a set of
> > > functions to view logical decoding data.
> >
> > That's right. I think this one would be simply enough to expose one or two
> > functions in core too (and probably would not need an extra module).
>
> +1 for functions in core unless this extra module
> pg_logicalsnapinspect works as a tool to be helpful even when the
> server is down.
>

We have an example of pageinspect which provides low-level functions
to aid debugging. The proposal for these APIs also seems to fall in
the same category, so why go for the core for these functions?

-- 
With Regards,
Amit Kapila.




Re: why there is not VACUUM FULL CONCURRENTLY?

2024-08-30 Thread Junwang Zhao
Hi,

On Tue, Aug 27, 2024 at 8:01 PM Antonin Houska  wrote:
>
> Attached is version 2, the feature itself is now in 0004.
>
> Unlike version 1, it contains some regression tests (0006) and a new GUC to
> control how long the AccessExclusiveLock may be held (0007).
>
> Kirill Reshke  wrote:
>
> > On Fri, 2 Aug 2024 at 11:09, Antonin Houska  wrote:
> > >
> > > Kirill Reshke  wrote:
> > > > However, in general, the 3rd patch is really big, very hard to
> > > > comprehend.  Please consider splitting this into smaller (and
> > > > reviewable) pieces.
> > >
> > > I'll try to move some preparation steps into separate diffs, but not sure 
> > > if
> > > that will make the main diff much smaller. I prefer self-contained 
> > > patches, as
> > > also explained in [3].
> >
> > Thanks for sharing [3], it is a useful link.
> >
> > There is actually one more case when ACCESS EXCLUSIVE is held: during
> > table rewrite (AT set TAM, AT set Tablespace and AT alter column type
> > are some examples).
> > This can be done CONCURRENTLY too, using the same logical replication
> > approach, or do I miss something?
>
> Yes, the logical replication can potentially be used in other cases.
>
> > I'm not saying we must do it immediately, this should be a separate
> > thread, but we can do some preparation work here.
> >
> > I can see that a bunch of functions which are currently placed in
> > cluster.c can be moved to something like
> > logical_rewrite_heap.c. ConcurrentChange struct and
> > apply_concurrent_insert function  is one example of such.
> >
> > So, if this is the case, 0003 patch can be splitted in two:
> > The first one is general utility code for logical table rewrite
> > The second one with actual VACUUM CONCURRENTLY feature.
>
> > What do you think?
>
> I can imagine moving the function process_concurrent_changes() and subroutines
> to a different file (e.g. rewriteheap.c), but moving it into a separate diff
> that does not contain any call of the function makes little sense to me. Such
> a diff would not add any useful functionality and could not be considered
> refactoring either.
>
> So far I at least moved some code to separate diffs: 0003 and 0005. I'll move
> more if I find sensible opportunity in the future.
>
> --
> Antonin Houska
> Web: https://www.cybertec-postgresql.com
>

Thanks for working on this, I think this is a very useful feature.

The patch doesn't compile in the debug build with errors:

../postgres/src/backend/commands/cluster.c: In function ‘get_catalog_state’:
../postgres/src/backend/commands/cluster.c:2771:33: error: declaration
of ‘td_src’ shadows a previous local [-Werror=shadow=compatible-local]
2771 | TupleDesc td_src, td_dst;
| ^~
../postgres/src/backend/commands/cluster.c:2741:25: note: shadowed
declaration is here
2741 | TupleDesc td_src = RelationGetDescr(rel);

you forgot the meson build for pgoutput_cluster

diff --git a/src/backend/meson.build b/src/backend/meson.build
index 78c5726814..0f9141a4ac 100644
--- a/src/backend/meson.build
+++ b/src/backend/meson.build
@@ -194,5 +194,6 @@ pg_test_mod_args = pg_mod_args + {
 subdir('jit/llvm')
 subdir('replication/libpqwalreceiver')
 subdir('replication/pgoutput')
+subdir('replication/pgoutput_cluster')

I noticed that you use lmode/lock_mode/lockmode. Both lmode and lockmode
exist in the codebase, but I remember someone proposed changing them all to
lockmode; how about sticking to lockmode in your patch?

0004:

+  sure that the old files do not change during the processing because the
+  chnages would get lost due to the swap.
typo

+  files. The data changes that took place during the creation of the new
+  table and index files are captured using logical decoding
+  () and applied before
+  the ACCESS EXCLUSIVE lock is requested. Thus the lock
+  is typically held only for the time needed to swap the files, which
+  should be pretty short.

I remember pg_squeeze also did some logical decoding after getting the exclusive
lock, if that is still true, I guess the doc above is not precise.

+  Note that CLUSTER with the
+  the CONCURRENTLY option does not try to order the
+  rows inserted into the table after the clustering started.

Do you mean after the *logical decoding* started here? If CLUSTER CONCURRENTLY
does not order rows at all, why bother implementing it?

+ errhint("CLUSTER CONCURRENTLY is only allowed for permanent relations")));

errhint messages should end with a dot. Also, why is "CLUSTER CONCURRENTLY"
hardcoded instead of using the parameter *stmt*?

+ ResourceOwner oldowner = CurrentResourceOwner;
+
+ /*
+ * In the CONCURRENT case, do the planning in a subtrensaction so that
typo

I did not see VacuumStmt changes in gram.y, so how are we supposed to
invoke VACUUM FULL CONCURRENTLY? I tried the following without success.

[local] postgres@demo:5432-36097=# vacuum (concurrently) aircrafts_data;
ERROR:  CONCURRENTLY can only be specified with VACUUM FULL

[local] postgres@demo:5432-36097=# vacuum full (c

Re: allowing extensions to control planner behavior

2024-08-30 Thread Robert Haas
On Thu, Aug 29, 2024 at 6:49 PM Jeff Davis  wrote:
> I don't see that in the code yet, so I assume you are referring to the
> comment at [1]?

FYI, I'm hacking on a revised approach but it's not ready to show to
other people yet.

> I still like my idea to generalize the pathkey infrastructure, and
> Robert asked for other approaches to consider. It would allow us to
> hold onto multiple paths for longer, similar to pathkeys, which might
> offer some benefits or simplifications.

This is a fair point. I dislike the fact that add_path() is a thicket
of if-statements that's actually quite hard to understand and easy to
screw up when you're making modifications. But I feel like it would be
difficult to generalize the infrastructure without making it
substantially slower, which would probably cause too much of an
increase in planning time to be acceptable. So my guess is that this
is a dead end, unless there's a clever idea that I'm not seeing.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




bt Scankey in another contradictory case

2024-08-30 Thread bigbro...@hotmail.com
Hi,
While reading the nbtree source code, I noticed another kind of contradictory
case that PostgreSQL does not eliminate (e.g. x > 4 AND x < 3) in the function
_bt_preprocess_keys in nbtutils.c. This causes an unnecessary index scan and
qual evaluation.
It seems no one would write SQL like that by hand, but such quals may be
generated from equivalence classes or degenerate quals.
The function is invoked only from _bt_first while initializing an index scan,
so it is cheap to eliminate this case there.
We only need to pay attention to opposing operators, so there are four detailed
cases (i.e. x > 4 AND x < 3, x > 4 AND x <= 3, x >= 4 AND x < 3, and
x >= 4 AND x <= 3).
When testing whether a pair is contradictory, we need to use the more
restrictive operator when both a less and a more restrictive operator appear:
for a case like x > 4 AND x <= 4, if we choose <=, then 4 <= 4 is satisfied,
although x > 4 AND x <= 4 is actually contradictory.
When >= pairs with <=, or > with <, it is fine to pick either one.
I have added this feature in my local environment and tested it, but I am not
sure it is worth spending more effort on.


bigbro...@hotmail.com


Re: bt Scankey in another contradictory case

2024-08-30 Thread b ro
Hi,
  this is the patch attachment.


From: b ro 
Sent: August 30, 2024 0:56
To: pgsql-hackers 
Subject: bt Scankey in another contradictory case



patch.diff
Description: patch.diff


Re: Add contrib/pg_logicalsnapinspect

2024-08-30 Thread Bertrand Drouvot
Hi,

On Fri, Aug 30, 2024 at 03:43:12PM +0530, Amit Kapila wrote:
> On Thu, Aug 29, 2024 at 6:33 PM Bharath Rupireddy
>  wrote:
> >
> > On Thu, Aug 29, 2024 at 3:44 PM Bertrand Drouvot
> >  wrote:
> > >
> > > Yeah that's fair. And now I'm wondering if we need an extra module. I 
> > > think
> > > we could "simply" expose 2 new functions in core, thoughts?
> > >
> > > > > What do you think? Did you have something else in mind?
> > > > >
> > > >
> > > > On similar lines, we can also provide a function to get the slot's
> > > > on-disk data.
> > >
> > > Yeah, having a way to expose the data from the disk makes fully sense to 
> > > me.
> > >
> > > > IIRC, Bharath had previously proposed a tool to achieve
> > > > the same. It is fine if we don't want to add that as part of this
> > > > patch but I mentioned it because by having that we can have a set of
> > > > functions to view logical decoding data.
> > >
> > > That's right. I think this one would be simply enough to expose one or two
> > > functions in core too (and probably would not need an extra module).
> >
> > +1 for functions in core unless this extra module
> > pg_logicalsnapinspect works as a tool to be helpful even when the
> > server is down.
> >
> 
> We have an example of pageinspect which provides low-level functions
> to aid debugging. The proposal for these APIs also seems to fall in
> the same category,

That's right, but...

>  so why go for the core for these functions?

as we decided not to expose the SnapBuildOnDisk and SnapBuild structs to public
and to create/expose 2 new functions in snapbuild.c then the functions in the
module would do nothing but expose the data coming from the snapbuild.c's
functions (get the tuple and send it to the client). That sounds weird to me to
create a module that would "only" do so, that's why I thought that in core
functions taking care of everything make more sense.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: pgstattuple: fix free space calculation

2024-08-30 Thread Andreas Karlsson

On 8/29/24 4:53 PM, Frédéric Yhuel wrote:

So I think we should just use PageGetExactFreeSpace().


I agree, I feel that is the least surprising behavior because we 
currently sum tiny amounts of free space that is unusable anyway. E.g. 
imagine one million pages with 10 free bytes each, that looks like 10 
free MB so I do not see why we should treat the max tuples per page case 
with any special logic.
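The "one million pages" figure works out as follows (a quick sanity check in Python; the 10-bytes-per-page number is of course the hypothetical from above, well below any usable tuple size):

```python
pages = 1_000_000
free_per_page = 10                      # bytes per page, unusable for tuples
total = pages * free_per_page
print(total)                            # 10000000 bytes
print(round(total / (1024 * 1024), 2))  # ~9.54 MB reported as "free"
```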


Andreas




Re: define PG_REPLSLOT_DIR

2024-08-30 Thread Bertrand Drouvot
Hi,

On Fri, Aug 30, 2024 at 03:34:56PM +0900, Michael Paquier wrote:
> On Tue, Aug 20, 2024 at 04:23:06PM +, Bertrand Drouvot wrote:
> > Please find attached v3 that:
> > 
> > - takes care of your comments (and also removed the use of PG_TBLSPC_DIR in
> > RELATIVE_PG_TBLSPC_DIR).
> > - removes the new macros from the comments (see Michael's and Yugo-San's
> > comments in [0] resp. [1]).
> > - adds a missing sizeof() (see [1]).
> > - implements Ashutosh's idea of adding a new SLOT_DIRNAME_ARGS (see [2]). 
> > It's
> > done in 0002 (I used REPLSLOT_DIR_ARGS though).
> > - fixed a macro usage in ReorderBufferCleanupSerializedTXNs() (was not at 
> > the
> > right location, discovered while implementing 0002).
> 
> I was looking at that, and applied 0001 for pg_replslot and merged
> 0003~0005 together for pg_logical.

Thanks!

> In 0006, I am not sure that RELATIVE_PG_TBLSPC_DIR is really something
> we should have.  Sure, that's a special case for base backups when
> sending a directory, but we have also pg_wal/ and its XLOGDIR so
> that's inconsistent.

That's right but OTOH there is no (for good reason) PG_WAL_DIR or such defined.

That said, I don't have a strong opinion on this one, I think that also makes
sense to leave it as it is. Please find attached v4 doing so.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
>From 4736e0913933284bbcbb81228d4a1ea4ce1fb9da Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot 
Date: Mon, 19 Aug 2024 07:40:10 +
Subject: [PATCH v4] Define PG_TBLSPC_DIR

Replace most of the places where "pg_tblspc" is used in .c files with a new
PG_TBLSPC_DIR define. The places where it is not done are left as-is for
consistency with the existing PG_STAT_TMP_DIR define.
---
 src/backend/access/transam/xlog.c   | 12 -
 src/backend/access/transam/xlogrecovery.c   | 14 +-
 src/backend/backup/backup_manifest.c|  3 ++-
 src/backend/backup/basebackup.c |  2 +-
 src/backend/commands/dbcommands.c   |  2 +-
 src/backend/commands/tablespace.c   |  4 +--
 src/backend/storage/file/fd.c   | 29 +++--
 src/backend/storage/file/reinit.c   | 10 +++
 src/backend/utils/adt/dbsize.c  |  8 +++---
 src/backend/utils/adt/misc.c|  4 +--
 src/backend/utils/cache/relcache.c  |  4 +--
 src/bin/pg_checksums/pg_checksums.c |  6 ++---
 src/bin/pg_combinebackup/pg_combinebackup.c | 18 ++---
 src/bin/pg_rewind/file_ops.c|  2 +-
 src/bin/pg_upgrade/exec.c   |  2 +-
 src/common/file_utils.c |  3 ++-
 src/common/relpath.c| 20 +++---
 src/fe_utils/astreamer_file.c   |  7 +++--
 src/include/common/relpath.h|  7 +
 19 files changed, 85 insertions(+), 72 deletions(-)
  15.5% src/backend/access/transam/
   3.3% src/backend/backup/
   4.4% src/backend/commands/
  26.8% src/backend/storage/file/
   8.8% src/backend/utils/adt/
   4.2% src/bin/pg_checksums/
  11.7% src/bin/pg_combinebackup/
  13.3% src/common/
   3.6% src/fe_utils/
   7.8% src/

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index ee0fb0e28f..4e06d86196 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8944,10 +8944,10 @@ do_pg_backup_start(const char *backupidstr, bool fast, List **tablespaces,
 		datadirpathlen = strlen(DataDir);
 
 		/* Collect information about all tablespaces */
-		tblspcdir = AllocateDir("pg_tblspc");
-		while ((de = ReadDir(tblspcdir, "pg_tblspc")) != NULL)
+		tblspcdir = AllocateDir(PG_TBLSPC_DIR);
+		while ((de = ReadDir(tblspcdir, PG_TBLSPC_DIR)) != NULL)
 		{
-			char		fullpath[MAXPGPATH + 10];
+			char		fullpath[MAXPGPATH + sizeof(PG_TBLSPC_DIR)];
 			char		linkpath[MAXPGPATH];
 			char	   *relpath = NULL;
 			char	   *s;
@@ -8970,7 +8970,7 @@ do_pg_backup_start(const char *backupidstr, bool fast, List **tablespaces,
 			if (*badp != '\0' || errno == EINVAL || errno == ERANGE)
 continue;
 
-			snprintf(fullpath, sizeof(fullpath), "pg_tblspc/%s", de->d_name);
+			snprintf(fullpath, sizeof(fullpath), "%s/%s", PG_TBLSPC_DIR, de->d_name);
 
 			de_type = get_dirent_type(fullpath, de, false, ERROR);
 
@@ -9031,8 +9031,8 @@ do_pg_backup_start(const char *backupidstr, bool fast, List **tablespaces,
  * In this case, we store a relative path rather than an
  * absolute path into the tablespaceinfo.
  */
-snprintf(linkpath, sizeof(linkpath), "pg_tblspc/%s",
-		 de->d_name);
+snprintf(linkpath, sizeof(linkpath), "%s/%s",
+		 PG_TBLSPC_DIR, de->d_name);
 relpath = pstrdup(linkpath);
 			}
 			else
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index ad817fbca6..5d2755ef79 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access

Re: JIT: Remove some unnecessary instructions.

2024-08-30 Thread Andreas Karlsson

On 8/30/24 5:55 AM, Xing Guo wrote:

I find there are some unnecessary load/store instructions being
emitted by the JIT compiler.


Well spotted! All of these are obvious dead instructions and while LLVM 
might be able to optimize them away there is no reason to create extra 
work for the optimizer.


The patch looks good, applies and the tests passes.

Andreas




Re: bt Scankey in another contradictory case

2024-08-30 Thread Peter Geoghegan
On Fri, Aug 30, 2024 at 7:36 AM b ro  wrote:
>   this is the patch attachment.

We discussed this recently:

https://www.postgresql.org/message-id/80384.1715458896%40sss.pgh.pa.us

I think that we should do this.

It doesn't make a huge difference in practice, because we'll still end
the scan once the leaf level is reached. But it matters more when
array keys are involved, where there might be more than one descent to
the leaf level. Plus we might as well just be thorough about this
stuff.

-- 
Peter Geoghegan




Re: Segfault in jit tuple deforming on arm64 due to LLVM issue

2024-08-30 Thread Anthonin Bonnefoy
I created a commitfest entry [1] to have the CI test the patch. There
was a failure in headerscheck and cpluspluscheck when the include of
SectionMemoryManager.h is checked [2]:

In file included from /usr/include/llvm/ADT/SmallVector.h:18,
from /tmp/cirrus-ci-build/src/include/jit/SectionMemoryManager.h:23,
from /tmp/headerscheck.4b1i5C/test.c:2:
/usr/include/llvm/Support/type_traits.h:17:10: fatal error:
type_traits: No such file or directory
   17 | #include <type_traits>

Since SmallVector.h includes type_traits, this file can't be
compiled with a C compiler, so I've just excluded it from headerscheck.

Loosely related to headerscheck, running it locally was failing as it
couldn't find the <type_traits> file. This is because headerscheck
expects llvm include files to be in /usr/include and doesn't rely on
llvm-config. I created a second patch to use the LLVM_CPPFLAGS as
extra flags when testing the src/include/jit/* files.

Lastly, I've used www.github.com instead of github.com link to stop
spamming the llvm-project's PR with reference to the commit every time
it is pushed somewhere (which seems to be the unofficial hack[3]).

[1] https://commitfest.postgresql.org/49/5220/
[2] 
https://cirrus-ci.com/task/4646639124611072?logs=headers_headerscheck#L42-L46
[3] 
https://github.com/orgs/community/discussions/23123#discussioncomment-3239240


v6-0002-Add-LLVM_CPPFLAGS-in-headerscheck-to-llvm-jit-fil.patch
Description: Binary data


v6-0001-Backport-of-LLVM-code-to-fix-ARM-relocation-bug.patch
Description: Binary data


Re: Restart pg_usleep when interrupted

2024-08-30 Thread Sami Imseih
> From this discussion, there is desire for a sleep function that:
> 1/ Sleeps for the full duration of the requested time
> 2/ Continues to handle important interrupts during the sleep
> 
> While something like CF 5118 will take care of point #1,


> I'm not sure, even with CF entry 5118, nanosleep() could be interrupted. But I
> agree that the leader won't be interrupted by PqMsg_Progress anymore.

Correct.


> 1. we should still implement the "1 Hz" stuff as 1.1/ it could be useful if CF
> 5118 gets committed and we move to WaitLatchUs() and 2.2/ it won't break 
> anything
> if CF gets committed and we don't move to WaitLatchUs(). For 1.1 it would 
> still
> prevent the leader to be waked up too frequently by the parallel workers.

Yes, regardless of any changes that may occur in the future that change the
behavior of pg_usleep, preventing a leader from being woken up too frequently
is good to have. The "1 Hz" work is still worth doing.


> 2. still discuss the "need" and get a consensus regarding a sleep that could
> ensure the wait duration (in some cases and depending of the reason why the
> process is waked up).

Unless there is objection, I will withdraw the CF [1] entry for this patch next 
week.

This discussion however should be one of the points that CF 5118 must deal with.
That is, 5118 will change the behavior of pg_usleep when dealing with interrupts
previously signaled by SIGUSR1. 


[1] https://commitfest.postgresql.org/49/5161/

Regards, 

Sami







Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)

2024-08-30 Thread Matthias van de Meent
On Wed, 28 Aug 2024 at 15:53, Peter Geoghegan  wrote:
> On Wed, Aug 28, 2024 at 9:49 AM Robert Haas  wrote:
> > On Wed, Aug 28, 2024 at 9:41 AM Peter Geoghegan  wrote:
> > > On Wed, Aug 28, 2024 at 9:35 AM Robert Haas  wrote:
> > > > > If you think it's important to have this info on all indexes then I'd
> > > > > prefer the pgstat approach over adding a field in IndexScanDescData.
> > > > > If instead you think that this is primarily important to expose for
> > > > > nbtree index scans, then I'd prefer putting it in the BTSO using e.g.
> > > > > the index AM analyze hook approach, as I think that's much more
> > > > > elegant than this.
> > > >
> > > > I agree with this analysis. I don't see why IndexScanDesc would ever
> > > > be the right place for this.
> > >
> > > Then what do you think is the right place?
> >
> > The paragraph that I agreed with and quoted in my reply, and that you
> > then quoted in your reply to me, appears to me to address that exact
> > question.
>
> Are you talking about adding global counters, in the style of pgBufferUsage?

My pgstat approach would be that ExecIndexScan (plus ExecIOS and
ExecBitmapIS) could record the current state of relevant fields from
node->ss.ss_currentRelation->pgstat_info, and diff them with the
recorded values at the end of that node's execution, pushing the
result into e.g. Instrumentation; diffing which is similar to what
happens in InstrStartNode() and InstrStopNode() but for the relation's
pgstat_info instead of pgBufferUsage and pgWalUsage. Alternatively
this could happen in ExecProcNodeInstr, but it'd need some more
special-casing to make sure it only addresses (index) relation scan
nodes.
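The snapshot/diff pattern described above is the same one InstrStartNode()/InstrStopNode() apply to pgBufferUsage; here is a minimal Python sketch of that pattern (all names are hypothetical, nothing from the actual executor):

```python
from dataclasses import dataclass

@dataclass
class RelStats:
    """Stand-in for a relation's cumulative counters (pgstat_info-like)."""
    numscans: int = 0   # e.g. number of index scans against the relation

def instr_start(stats: RelStats) -> RelStats:
    # Record the counters as they stand when the node starts executing.
    return RelStats(numscans=stats.numscans)

def instr_stop(stats: RelStats, snapshot: RelStats) -> RelStats:
    # The node's own contribution is the difference since the snapshot.
    return RelStats(numscans=stats.numscans - snapshot.numscans)

stats = RelStats(numscans=5)            # counters before the node runs
snap = instr_start(stats)
stats.numscans += 3                     # work done while the node executes
print(instr_stop(stats, snap).numscans) # 3: only this node's scans
```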

By pulling the stats directly from Relation->pgstat_info, no catalog
index scans are counted if they aren't also the index which is subject
to that [Bitmap]Index[Only]Scan.

In effect, this would need no changes in AM code, as this would "just"
pull the data from those statistics which are already being updated in
AM code.


Kind regards,

Matthias van de Meent
Neon (https://neon.tech)




Jargon and acronyms on this mailing list

2024-08-30 Thread Greg Sabino Mullane
I normally wouldn't mention my blog entries here, but this one was about
the hackers mailing list, so wanted to let people know about it in case you
don't follow Planet Postgres. I scanned the last year's worth of posts and
gathered the most used acronyms and jargon. The most commonly used acronym
was IMO (in my opinion), followed by FWIW (for what it's worth), and IIUC
(if I understand correctly). The complete list can be found in the post
below, I'll refrain from copying everything here.

https://www.crunchydata.com/blog/understanding-the-postgres-hackers-mailing-list

Cheers,
Greg


Re: Make printtup a bit faster

2024-08-30 Thread Andreas Karlsson

On 8/29/24 1:51 PM, David Rowley wrote:

I had planned to work on this for PG18, but I'd be happy for some
assistance if you're willing.


I am interested in working on this, unless Andy Fan wants to do this 
work. :) I believe that optimizing the out, in and send functions would 
be worth the pain. I get Tom's objections but I do not think adding a 
small check would add much overhead compared to the gains we can get.


And given that all of in, out and send could be optimized I do not like 
the idea of duplicating all three in the catalog.


David, have you given any thought on the cleanest way to check for if 
the new API or the old is the be used for these functions? If not I can 
figure out something myself, just wondering if you already had something 
in mind.


Andreas





Re: Inline non-SQL SRFs using SupportRequestSimplify

2024-08-30 Thread Paul Jungwirth

On 7/26/24 11:58, Tom Lane wrote:
> Heikki Linnakangas  writes:
>> On 28/06/2024 01:01, Paul Jungwirth wrote:
>>> Another approach I considered is using a separate support request, e.g.
>>> SupportRequestInlineSRF, and just calling it from
>>> inline_set_returning_function. I didn't like having two support requests
>>> that did almost exactly the same thing. OTOH my current approach means
>>> you'll get an error if you do this:

>>>
>>> ```
>>> postgres=# select temporal_semijoin('employees', 'id', 'valid_at',
>>> 'positions', 'employee_id', 'valid_at');
>>> ERROR:  unrecognized node type: 66
>>> ```
>>>
>>> I'll look into ways to fix that.
>
> I like this idea, but I like exactly nothing about this implementation.
> The right thing is to have a separate SupportRequestInlineSRF request
> that is called directly by inline_set_returning_function.

Here are new patches using a new SupportRequestInlineSRF request type. They include patches and 
documentation.


The patches handle this:

   SELECT * FROM srf();

but not this:

   SELECT srf();

In the latter case, Postgres always calls the function in "materialized mode" and gets the whole 
result up front, so inline_set_returning_function is never called, even for SQL functions.


For tests I added a `foo_from_bar(colname, tablename, filter)` PL/pgSQL function that does `SELECT 
$colname FROM $tablename [WHERE $colname = $filter]`, then the support function generates the same 
SQL and turns it into a Query node. This matches how I want to use the feature for my 
temporal_semijoin etc functions. If you give a non-NULL filter, you get a Query with a Var node, so 
we are testing something that isn't purely Const.


The SupportRequestSimplify type has some comments about supporting operators, but I don't think you 
can have a set-returning operator, so I didn't repeat those comments for this new type.


I split things up into three patch files because I couldn't get git to gracefully handle shifting a 
large block of code into an if statement. The first two patches have no changes except that 
indentation (and initializing one variable to NULL). They aren't meant to be committed separately.


Rebased to a83a944e9f.

Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com

From 591de3a06ec97cf1bff10d603946f0bac176b9d9 Mon Sep 17 00:00:00 2001
From: "Paul A. Jungwirth" 
Date: Thu, 29 Aug 2024 18:17:55 -0700
Subject: [PATCH v2 1/3] Add indented section

---
 src/backend/optimizer/util/clauses.c | 74 +++-
 1 file changed, 73 insertions(+), 1 deletion(-)

diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index b4e085e9d4b..ef8282ded49 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -5069,7 +5069,7 @@ inline_set_returning_function(PlannerInfo *root, RangeTblEntry *rte)
 	TupleDesc	rettupdesc;
 	List	   *raw_parsetree_list;
 	List	   *querytree_list;
-	Query	   *querytree;
+	Query	   *querytree = NULL;
 
 	Assert(rte->rtekind == RTE_FUNCTION);
 
@@ -5235,6 +5235,78 @@ inline_set_returning_function(PlannerInfo *root, RangeTblEntry *rte)
 			goto fail;
 		querytree = linitial(querytree_list);
 	}
+	if (!querytree)
+	{
+		/* Fetch the function body */
+		tmp = SysCacheGetAttrNotNull(PROCOID, func_tuple, Anum_pg_proc_prosrc);
+		src = TextDatumGetCString(tmp);
+
+		/*
+		 * Setup error traceback support for ereport().  This is so that we can
+		 * finger the function that bad information came from.
+		 */
+		callback_arg.proname = NameStr(funcform->proname);
+		callback_arg.prosrc = src;
+
+		sqlerrcontext.callback = sql_inline_error_callback;
+		sqlerrcontext.arg = (void *) &callback_arg;
+		sqlerrcontext.previous = error_context_stack;
+		error_context_stack = &sqlerrcontext;
+
+		/* If we have prosqlbody, pay attention to that not prosrc */
+		tmp = SysCacheGetAttr(PROCOID,
+			  func_tuple,
+			  Anum_pg_proc_prosqlbody,
+			  &isNull);
+		if (!isNull)
+		{
+			Node	   *n;
+
+			n = stringToNode(TextDatumGetCString(tmp));
+			if (IsA(n, List))
+querytree_list = linitial_node(List, castNode(List, n));
+			else
+querytree_list = list_make1(n);
+			if (list_length(querytree_list) != 1)
+goto fail;
+			querytree = linitial(querytree_list);
+
+			/* Acquire necessary locks, then apply rewriter. */
+			AcquireRewriteLocks(querytree, true, false);
+			querytree_list = pg_rewrite_query(querytree);
+			if (list_length(querytree_list) != 1)
+goto fail;
+			querytree = linitial(querytree_list);
+		}
+		else
+		{
+			/*
+			 * Set up to handle parameters while parsing the function body.  We
+			 * can use the FuncExpr just created as the input for
+			 * prepare_sql_fn_parse_info.
+			 */
+			pinfo = prepare_sql_fn_parse_info(func_tuple,
+			  (Node *) fexpr,
+			  fexpr->inputcollid);
+
+			/*
+			 * Parse, analyze, and rewrite (unlike inline_function(), we can't
+			 * skip rewriting here).  We can fail as soon as we find m

Re: allowing extensions to control planner behavior

2024-08-30 Thread Jeff Davis
On Fri, 2024-08-30 at 07:33 -0400, Robert Haas wrote:
> This is a fair point. I dislike the fact that add_path() is a thicket
> of if-statements that's actually quite hard to understand and easy to
> screw up when you're making modifications. But I feel like it would
> be
> difficult to generalize the infrastructure without making it
> substantially slower, which would probably cause too much of an
> increase in planning time to be acceptable. So my guess is that this
> is a dead end, unless there's a clever idea that I'm not seeing.

As far as performance goes, I'm only looking at branch in add_path()
that calls compare_pathkeys(). Do you have some example queries which
would be a worst case for that path?

In general if you can post some details about how you are measuring,
that would be helpful.

Regards,
Jeff Davis





Re: PG_TEST_EXTRA and meson

2024-08-30 Thread Jacob Champion
On Wed, Aug 28, 2024 at 8:21 AM Nazir Bilal Yavuz  wrote:
> I do not exactly remember the reason but I think I copied the same
> behavior as before, PG_TEST_EXTRA variable was checked in the
> src/test/Makefile so I exported it there.

Okay, give v3 a try then. This exports directly from Makefile.global.
Since that gets pulled into a bunch of places, the scope is a lot
wider than it used to be; I've disabled it for PGXS so it doesn't end
up poisoning other extensions.

--Jacob


v3-make_test_extra.diff
Description: Binary data


Re: [EXTERNAL] Re: Add non-blocking version of PQcancel

2024-08-30 Thread Tom Lane
Alexander Lakhin  writes:
> Let me show you another related anomaly, which drongo kindly discovered
> recently: [1]. That test failed with:
>   SELECT dblink_cancel_query('dtest1');
> - dblink_cancel_query
> --
> - OK
> +   dblink_cancel_query
> +--
> + cancel request timed out
>   (1 row)

While we're piling on, has anyone noticed that *non* Windows buildfarm
animals are also failing this test pretty frequently?  The most recent
occurrence is at [1], and it looks like this:

diff -U3 
/home/bf/bf-build/mylodon/HEAD/pgsql/contrib/postgres_fdw/expected/query_cancel.out
 
/home/bf/bf-build/mylodon/HEAD/pgsql.build/testrun/postgres_fdw/regress/results/query_cancel.out
--- 
/home/bf/bf-build/mylodon/HEAD/pgsql/contrib/postgres_fdw/expected/query_cancel.out
 2024-07-22 11:09:50.638133878 +
+++ 
/home/bf/bf-build/mylodon/HEAD/pgsql.build/testrun/postgres_fdw/regress/results/query_cancel.out
2024-08-30 06:28:01.971083945 +
@@ -17,4 +17,5 @@
 SET LOCAL statement_timeout = '10ms';
 select count(*) from ft1 CROSS JOIN ft2 CROSS JOIN ft4 CROSS JOIN ft5; -- this 
takes very long
 ERROR:  canceling statement due to statement timeout
+WARNING:  could not get result of cancel request due to timeout
 COMMIT;

I trawled the buildfarm database for other occurrences of "could not
get result of cancel request" since this test went in.  I found 34
of them (see attachment), and none that weren't the timeout flavor.

Most of the failing machines are not especially slow, so even though
the hard-wired 30 second timeout that's being used here feels a little
under-engineered, I'm not sure that arranging to raise it would help.
My spidey sense feels that there's some actual bug here, but it's hard
to say where.  mylodon's postmaster log confirms that the 30 seconds
did elapse, and that there wasn't anything much else going on:

2024-08-30 06:27:31.926 UTC client backend[3668381] pg_regress/query_cancel 
ERROR:  canceling statement due to statement timeout
2024-08-30 06:27:31.926 UTC client backend[3668381] pg_regress/query_cancel 
STATEMENT:  select count(*) from ft1 CROSS JOIN ft2 CROSS JOIN ft4 CROSS JOIN 
ft5;
2024-08-30 06:28:01.946 UTC client backend[3668381] pg_regress/query_cancel 
WARNING:  could not get result of cancel request due to timeout

Any thoughts?

regards, tom lane

[1] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mylodon&dt=2024-08-30%2006%3A25%3A46

sysname|branch |  snapshot   |   stage  
  |  l  

---+---+-++-
 flaviventris  | HEAD  | 2024-04-02 23:58:01 | 
postgres_fdwInstallCheck-C | +WARNING:  could not get result of cancel request 
due to timeout
 calliphoridae | HEAD  | 2024-04-11 01:54:17 | 
postgres_fdwInstallCheck-C | +WARNING:  could not get result of cancel request 
due to timeout
 kestrel   | REL_17_STABLE | 2024-07-13 04:15:25 | 
postgres_fdwInstallCheck-C | +WARNING:  could not get result of cancel request 
due to timeout
 sarus | REL_17_STABLE | 2024-07-20 15:02:23 | ContribCheck-C   
  | 2024-07-20 15:10:15.547 UTC [613791:141] pg_regress/postgres_fdw WARNING:  
could not get result of cancel request due to timeout
 sarus | REL_17_STABLE | 2024-07-20 15:02:23 | ContribCheck-C   
  | +WARNING:  could not get result of cancel request due to timeout
 mylodon   | HEAD  | 2024-07-26 13:15:09 | 
postgres_fdwInstallCheck-C | +WARNING:  could not get result of cancel request 
due to timeout
 adder | HEAD  | 2024-07-02 00:24:10 | postgres_fdwCheck
  | +WARNING:  could not get result of cancel request due to timeout
 adder | HEAD  | 2024-07-02 00:24:10 | postgres_fdwCheck
  | 2024-07-02 00:33:45.770 UTC client backend[1036068] pg_regress/postgres_fdw 
WARNING:  could not get result of cancel request due to timeout
 calliphoridae | REL_17_STABLE | 2024-07-16 22:45:08 | 
postgres_fdwInstallCheck-C | +WARNING:  could not get result of cancel request 
due to timeout
 mylodon   | HEAD  | 2024-08-09 05:25:24 | postgres_fdwCheck
  | +WARNING:  could not get result of cancel request due to timeout
 mylodon   | HEAD  | 2024-08-09 05:25:24 | postgres_fdwCheck
  | 2024-08-09 05:28:21.990 UTC client backend[1775253] pg_regress/query_cancel 
WARNING:  could not get result of cancel request due to timeout
 mylodon   | HEAD  | 2024-04-24 07:12:22 | 
postgres_fdwInstallCheck-C | +WARNING:  could not get result of cancel request 
due to timeout
 skink | HEAD  | 2024-07-20 20

Re: Avoiding superfluous buffer locking during nbtree backwards scans

2024-08-30 Thread Matthias van de Meent
On Mon, 19 Aug 2024 at 13:43, Matthias van de Meent
 wrote:
>
> On Sun, 11 Aug 2024 at 21:44, Peter Geoghegan  wrote:
> >
> > On Tue, Aug 6, 2024 at 6:31 PM Matthias van de Meent
> >  wrote:
> > > +1, LGTM.
> > >
> > > This changes the backward scan code in _bt_readpage to have an
> > > approximately equivalent handling as the forward scan case for
> > > end-of-scan cases, which is an improvement IMO.
>
> Here's a new patch that further improves the situation, so that we
> don't try to re-lock the buffer we just accessed when we're stepping
> backward in index scans, reducing buffer lock operations in the common
> case by 1/2.

Attached is an updated version of the patch, now v2, which fixes some
assertion failures for parallel plans by passing the correct
parameters to _bt_parallel_release for forward scans.

With the test setup below, it unifies the number of buffer accesses
between forward and backward scans:

CREATE TABLE test AS
SELECT generate_series(1, 100) as num,
   '' j;
CREATE INDEX ON test (num);
VACUUM (FREEZE) test;
SET enable_seqscan = off; SET max_parallel_workers_per_gather = 0;
EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(j ORDER BY num DESC) -- or ASC
FROM test;

The test case will have an Index Scan, which in DESC case is backward.
Without this patch, the IS will have 7160 accesses for the ASC
ordering, but 9892 in the DESC case (an increase of 2732,
approximately equivalent to the number leaf pages in the index), while
with this patch, the IndexScan will have 7160 buffer accesses for both
ASC and DESC ordering.

In my previous mail I used buffer lock stats from an index-only scan
as proof of the patch working. It's been pointed out to me that an
IndexScan is easier to extract this data from, as it drops the pin on
the page after getting some results from a page.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)


v2-0001-Avoid-unneeded-nbtree-backwards-scan-buffer-locks.patch
Description: Binary data


Re: [EXTERNAL] Re: Add non-blocking version of PQcancel

2024-08-30 Thread Jelte Fennema-Nio
On Fri, Aug 30, 2024, 21:21 Tom Lane  wrote:

>
> While we're piling on, has anyone noticed that *non* Windows buildfarm
> animals are also failing this test pretty frequently?
>


> Any thoughts?

Yes. Fixes are here (see the ~10 emails above in the thread for details):
https://www.postgresql.org/message-id/cageczqqo8cn2rw45xuymvzxet7-bcruuzufmbgqc3ue...@mail.gmail.com

They don't apply anymore after the change to move this test to a dedicated
file. It shouldn't be too hard to update those patches though. I'll try to
do that in a few weeks when I'm back behind my computer. But feel free to
commit something earlier.

>


Re: pl/pgperl Patch for adding $_FN detail just like triggers have for $_TD

2024-08-30 Thread Andrew Dunstan



On 2024-08-29 Th 5:50 PM, Mark Murawski wrote:



On 8/29/24 16:54, Andrew Dunstan wrote:


On 2024-08-29 Th 1:01 PM, Mark Murawski wrote:



On 8/29/24 11:56, Andrew Dunstan wrote:


On 2024-08-28 We 5:53 PM, Mark Murawski wrote:

Hi Hackers!

This would be version v1 of this feature

Basically, the subject says it all: pl/pgperl Patch for being able 
to tell which function you're in.
This is a hashref so it will be possible to populate new and 
exciting other details in the future as the need arises


This also greatly improves logging capabilities for things like 
catching warnings, because as it stands right now, there's no 
information that can assist with locating the source of a warning 
like this:


# tail -f /var/log/postgresql.log
*** GOT A WARNING - Use of uninitialized value $prefix in 
concatenation (.) or string at (eval 531) line 48.


Now, with $_FN you can do this:


CREATE OR REPLACE FUNCTION throw_warning() RETURNS text LANGUAGE 
plperlu AS $function$


use warnings;
use strict;
use Data::Dumper;

$SIG{__WARN__} = sub {
  elog(NOTICE, Dumper($_FN));

  print STDERR "In Function: $_FN->{name}: $_[0]\n";
};

my $a;
print "$a"; # uninit!

return undef;

$function$
;

This patch is against 12 which is still our production branch. 
This could easily be also patched against newer releases as well.


I've been using this code in production now for about 3 years, 
it's greatly helped track down issues.  And there shouldn't be 
anything platform-specific here, it's all regular perl API


I'm not sure about adding testing.  This is my first postgres 
patch, so any guidance on adding regression testing would be 
appreciated.


The rationale for this has come from the need to know the source 
function name, and we've typically resorted to things like this in 
the past:


CREATE OR REPLACE FUNCTION throw_warning() RETURNS text LANGUAGE 
plperlu AS $function$

my $function_name = 'throw_warning';
$SIG{__WARN__} = sub { print STDERR "In Function: $function_name: 
$_[0]\n"; }

$function$
;

We've literally had to copy/paste this all over and it's something 
that postgres should just 'give you' since it knows the name 
already, just like when triggers pass you $_TD with all the 
pertinent information


A wishlist item would be for postgres plperl to automatically 
prepend the function name and schema when throwing perl warnings 
so you don't have to do your own __WARN__ handler, but this is the 
next best thing.




I'm not necessarily opposed to this, but the analogy to $_TD isn't 
really apt.  You can't know the trigger data at compile time, 
whereas you can know the function's name at compile time, using 
just the mechanism you find irksome.


And if we're going to do it for plperl, shouldn't we do it for 
other PLs?






Hi Andrew,


Thanks for the feedback.


1) Why is this not similar to $_TD?  It literally operates 
identically. At run-time it passes you $_TD for triggers; the same 
here for functions.  This is all run-time.  What exactly is the issue 
you're trying to point out?



It's not the same as the trigger data case because the function name 
is knowable at compile time, as in fact you have demonstrated. You 
just find it a bit inconvenient to have to code for that knowledge. 
By contrast, trigger data is ONLY knowable at run time.


But I don't see that it is such a heavy burden to have to write

  my $funcname = "foo";

or similar in your function.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Understood, regarding knowability.  Trigger data is definitely going 
to be very dynamic in that regard.


No, It's not a heavy burden to hard code the function name.  But what 
my ideal goal would be is this:


CREATE OR REPLACE FUNCTION throw_warning() RETURNS text LANGUAGE 
plperlu AS $function$
use 'PostgresWarnHandler'; # <--- imagine this registers a WARN 
handler and outputs $_FN->{name} for you as part of the final warning


my $a;
print $a;

 etc


and then there's nothing 'hard coded' regarding the name of the 
function, anywhere.  It just seems nonsensical that postgres plperl 
can't send you the name of your registered function and there's 
absolutely no way to get it other than hard coding the name 
(redundantly, considering you already know the name when you're 
defining the function in the first place)


But even better would be catching the warn at the plperl level, 
prepending the function name to the warn message, and then you only need:


CREATE OR REPLACE FUNCTION throw_warning() RETURNS text LANGUAGE 
plperlu AS $function$


my $a;
print $a;

 etc

And then in this hypothetical situation -- magic ensues, and you're 
left with this:

# tail -f /var/log/postgresql.log
*** GOT A WARNING - Use of uninitialized value $a in concatenation 
(.) or string in function public.throw_warning() line 1









--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: allowing extensions to control planner behavior

2024-08-30 Thread Robert Haas
On Fri, Aug 30, 2024 at 1:42 PM Jeff Davis  wrote:
> As far as performance goes, I'm only looking at branch in add_path()
> that calls compare_pathkeys(). Do you have some example queries which
> would be a worst case for that path?

I think we must be talking past each other somehow. It seems to me
that your scheme would require replacing that branch with something
more complicated or generalized. If it doesn't, then I don't
understand the proposal. If it does, then that seems like it could be
a problem.

> In general if you can post some details about how you are measuring,
> that would be helpful.

I'm not really measuring anything at this point, just recalling the
many previous times when add_path() has been discussed as a pain
point.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Pgstattuple on Sequences: Seeking Community Feedback on Potential Patch

2024-08-30 Thread Robert Haas
On Thu, Aug 29, 2024 at 1:36 PM Nathan Bossart  wrote:
> Thanks.  Robert, do you have any concerns with this?

I don't know if I'm exactly concerned but I don't understand what
problem we're solving, either. I thought Ayush said that the function
wouldn't produce useful results for sequences; so then why do we need
to change the code to enable it?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible

2024-08-30 Thread Andrew Dunstan


On 2024-08-29 Th 4:44 PM, Jacob Champion wrote:

As for the other patches, I'll ping Andrew about 0001,



Patch 0001 looks sane to me.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com


Re: [EXTERNAL] Re: Add non-blocking version of PQcancel

2024-08-30 Thread Tom Lane
Jelte Fennema-Nio  writes:
> On Fri, Aug 30, 2024, 21:21 Tom Lane  wrote:
>> While we're piling on, has anyone noticed that *non* Windows buildfarm
>> animals are also failing this test pretty frequently?

> Yes. Fixes are here (see the ~10 emails above in the thread for details):
> https://www.postgresql.org/message-id/cageczqqo8cn2rw45xuymvzxet7-bcruuzufmbgqc3ue...@mail.gmail.com

Hmm.  I'm not convinced that 0001 is an actual *fix*, but it should
at least reduce the frequency of occurrence a lot, which'd help.

I don't want to move the test case to where you propose, because
that's basically not sensible.  But can't we avoid remote estimates
by just cross-joining ft1 to itself, and not using the tables for
which remote estimate is enabled?

I think 0002 is probably outright wrong, or at least the change to
disable_statement_timeout is.  Once we get to that, we don't want
to throw a timeout error any more, even if an interrupt was received
just before it.

regards, tom lane




Re: pl/pgperl Patch for adding $_FN detail just like triggers have for $_TD

2024-08-30 Thread Andrew Dunstan



On 2024-08-30 Fr 3:49 PM, Andrew Dunstan wrote:





Sorry for empty reply. Errant finger hit send.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Add tests for PL/pgSQL SRFs

2024-08-30 Thread Paul Jungwirth

Hello Hackers,

While working on inlining non-SQL SRFs [1] I noticed we don't have tests for when a PL/pgSQL 
function requires materialize mode but doesn't have a result TupleDesc. Here is a patch adding tests 
for that, as well as some other conditions around SRF calls with `SETOF RECORD` vs `TABLE (...)`. 
There aren't any code changes, just some new tests.


But IMO it might be better to change the code. This error message is a bit 
confusing:

+-- materialize mode requires a result TupleDesc:
+select array_to_set2(array['one', 'two']); -- fail
+ERROR:  materialize mode required, but it is not allowed in this context
+CONTEXT:  PL/pgSQL function array_to_set2(anyarray) line 3 at RETURN QUERY

Perhaps it would be better to give the same error as here?:

+select * from array_to_set2(array['one', 'two']); -- fail
+ERROR:  a column definition list is required for functions returning "record"
+LINE 1: select * from array_to_set2(array['one', 'two']);

If folks agree, I can work on a patch for that. Otherwise, at least this patch documents the current 
behavior and increases coverage.


[1] https://commitfest.postgresql.org/49/5083/

Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com
From 763db868d2dc121f8745ba9f90e4f737c135bcaa Mon Sep 17 00:00:00 2001
From: "Paul A. Jungwirth" 
Date: Wed, 28 Aug 2024 19:44:02 -0700
Subject: [PATCH v1] Add tests for PL/pgSQL Set-Returning Functions

These tests exercise SRFs with and without a result TupleDesc (in other
words RETURNS TABLE (...) vs RETURNS SETOF RECORD). We can only support
materialize mode in the former case. On the other hand, that case
rejects a column definition list as redundant. The position of these
tests shows the contrast with SQL functions (tested above), which
support SETOF RECORD in more places.
---
 src/test/regress/expected/rangefuncs.out | 90 
 src/test/regress/sql/rangefuncs.sql  | 42 +++
 2 files changed, 132 insertions(+)

diff --git a/src/test/regress/expected/rangefuncs.out b/src/test/regress/expected/rangefuncs.out
index 397a8b35d6d..3d3ebd17780 100644
--- a/src/test/regress/expected/rangefuncs.out
+++ b/src/test/regress/expected/rangefuncs.out
@@ -2169,6 +2169,96 @@ LINE 1: select * from sin(3) as t(f1 int8,f2 int8);
 drop type rngfunc_type cascade;
 NOTICE:  drop cascades to function testrngfunc()
 --
+-- test use of PL/pgSQL functions returning setof record
+--
+create or replace function array_to_set2(anyarray) returns setof record as $$
+  begin
+  return query execute 'select i AS "index", $1[i] AS "value" from generate_subscripts($1, 1) i'
+using $1;
+  end;
+$$ language plpgsql immutable;
+-- materialize mode requires a result TupleDesc:
+select array_to_set2(array['one', 'two']); -- fail
+ERROR:  materialize mode required, but it is not allowed in this context
+CONTEXT:  PL/pgSQL function array_to_set2(anyarray) line 3 at RETURN QUERY
+select * from array_to_set2(array['one', 'two']) as t(f1 int,f2 text);
+ f1 | f2  
++-
+  1 | one
+  2 | two
+(2 rows)
+
+select * from array_to_set2(array['one', 'two']); -- fail
+ERROR:  a column definition list is required for functions returning "record"
+LINE 1: select * from array_to_set2(array['one', 'two']);
+  ^
+select * from array_to_set2(array['one', 'two']) as t(f1 numeric(4,2),f2 text); -- fail
+ERROR:  structure of query does not match function result type
+DETAIL:  Returned type integer does not match expected type numeric(4,2) in column 1.
+CONTEXT:  SQL statement "select i AS "index", $1[i] AS "value" from generate_subscripts($1, 1) i"
+PL/pgSQL function array_to_set2(anyarray) line 3 at RETURN QUERY
+select * from array_to_set2(array['one', 'two']) as t(f1 point,f2 text); -- fail
+ERROR:  structure of query does not match function result type
+DETAIL:  Returned type integer does not match expected type point in column 1.
+CONTEXT:  SQL statement "select i AS "index", $1[i] AS "value" from generate_subscripts($1, 1) i"
+PL/pgSQL function array_to_set2(anyarray) line 3 at RETURN QUERY
+explain (verbose, costs off)
+  select * from array_to_set2(array['one', 'two']) as t(f1 int, f2 text);
+ QUERY PLAN  
+-
+ Function Scan on public.array_to_set2 t
+   Output: f1, f2
+   Function Call: array_to_set2('{one,two}'::text[])
+(3 rows)
+
+drop function array_to_set2;
+--
+-- test use of PL/pgSQL functions returning table
+--
+create or replace function array_to_set2(anyarray) returns table(index int, value anyelement) as $$
+  begin
+  return query execute 'select i AS "index", $1[i] AS "value" from generate_subscripts($1, 1) i'
+using $1;
+  end;
+$$ language plpgsql immutable;
+-- materialize mode requires a result TupleDesc:
+select array_to_set2(array['one', 'two']);
+ array_to_set2 
+---
+ (1,one)
+ (2,two)
+(2 rows)
+
+select * from array_to_set2(array['one', 'two']) as t(f1 int,f2 text); -- fail

Re: pl/pgperl Patch for adding $_FN detail just like triggers have for $_TD

2024-08-30 Thread Mark Murawski

On 8/30/24 16:12, Andrew Dunstan wrote:


Sorry for empty reply. Errant finger hit send.




No problem.

So anyway... my main point is to highlight this:




On 2024-08-29 Th 5:50 PM, Mark Murawski wrote:



And then in this hypothetical situation -- magic ensues, and you're 
left with this:

# tail -f /var/log/postgresql.log
*** GOT A WARNING - Use of uninitialized value $a in 
concatenation (.) or string in function public.throw_warning() line 1






The essential element here is: why should every single developer who 
ever wants to develop in plperl be forced to figure out (typically at 
the worst possible time) that Postgres doesn't log the source function 
of a warning, and then be forced to hard-code their own function name 
as a variable inside their function?  The typical situation is you test 
your code, you push it to production, and then observe.  And then 
production does something you didn't expect and throws a warning.  With 
the current design, you have no idea what code threw the warning, and 
you have to go into every single possible plperl function and throw in 
hard-coded function names for logging.  To me it is highly nonsensical 
to force this on developers.


Pretty much every modern scripting language I've come across, has a way 
to access dynamically: the name of the currently executing function.  
Either by way of a special variable, or a stack trace introspection.  
Being that this is Perl, sure... we can get caller() or a stack trace.  
But the design of plperl Postgres functions uses an anonymous coderef to 
define the function, so there is no function name defined. I don't see 
the harm in adding more information so that the running function can 
know its own name.


Maybe another approach to the fix here is to give the function an actual 
name, so that when calling it, you know where you are executing from.  
But in Perl you cannot define a sub called schema.function; it 
definitely does not compile.  So you would need some sort of name 
mangling like postgres_plperl_schema_function so it's painfully obvious 
where it came from.  So... this is why it's handy to just have a 
variable, since you can format the called schema.function properly.


But ideally, the even better solution of just catching and re-throwing 
in the warn handler sounds like it would be the best option.  I'll need 
to look into this more, since this is really my first jump into the 
Perl C API and I've never worked with warn handlers at this level.

Re: [EXTERNAL] Re: Add non-blocking version of PQcancel

2024-08-30 Thread Tom Lane
I wrote:
> Hmm.  I'm not convinced that 0001 is an actual *fix*, but it should
> at least reduce the frequency of occurrence a lot, which'd help.

After enabling log_statement = all to verify what commands are being
sent to the remote, I realized that there's a third thing this patch
can do to stabilize matters: issue a regular remote query inside the
test transaction, before we enable the timeout.  This will ensure
that we've dealt with configure_remote_session() and started a
remote transaction, so that there aren't extra round trips happening
for that while the clock is running.

Pushed with that addition and some comment-tweaking.  We'll see
whether that actually makes things more stable, but I don't think
it could make it worse.

regards, tom lane




Re: Pgstattuple on Sequences: Seeking Community Feedback on Potential Patch

2024-08-30 Thread Nathan Bossart
On Fri, Aug 30, 2024 at 04:07:30PM -0400, Robert Haas wrote:
> On Thu, Aug 29, 2024 at 1:36 PM Nathan Bossart  
> wrote:
>> Thanks.  Robert, do you have any concerns with this?
> 
> I don't know if I'm exactly concerned but I don't understand what
> problem we're solving, either. I thought Ayush said that the function
> wouldn't produce useful results for sequences; so then why do we need
> to change the code to enable it?

I suppose it would be difficult to argue that it is actually useful, given
it hasn't worked since v11 and apparently nobody noticed until recently.
If we're content to leave it unsupported, then sure, let's just remove the
"relkind == RELKIND_SEQUENCE" check in pgstat_relation().  But I also don't
have a great reason to _not_ support it.  It used to work (which appears to
have been intentional, based on the code), it was unintentionally broken,
and it'd work again with a ~1 line change.  "SELECT count(*) FROM
my_sequence" probably doesn't provide a lot of value, but I have no
intention of proposing a patch that removes support for that.

All that being said, I don't have a terribly strong opinion, but I guess I
lean towards re-enabling.

Another related inconsistency I just noticed in pageinspect:

postgres=# select t_data from heap_page_items(get_raw_page('s', 0));
t_data
--
 \x01
(1 row)

postgres=# select tuple_data_split('s'::regclass, t_data, t_infomask, 
t_infomask2, t_bits) from heap_page_items(get_raw_page('s', 0));
ERROR:  only heap AM is supported

-- 
nathan




Re: [EXTERNAL] Re: Add non-blocking version of PQcancel

2024-08-30 Thread Jelte Fennema-Nio
On Fri, 30 Aug 2024 at 22:12, Tom Lane  wrote:
>
> Jelte Fennema-Nio  writes:
> > On Fri, Aug 30, 2024, 21:21 Tom Lane  wrote:
> >> While we're piling on, has anyone noticed that *non* Windows buildfarm
> >> animals are also failing this test pretty frequently?
>
> > Yes. Fixes are here (see the ~10 emails above in the thread for details):
> > https://www.postgresql.org/message-id/cageczqqo8cn2rw45xuymvzxet7-bcruuzufmbgqc3ue...@mail.gmail.com
>
> Hmm.  I'm not convinced that 0001 is an actual *fix*, but it should
> at least reduce the frequency of occurrence a lot, which'd help.

I also don't think it's an actual fix, but I couldn't think of a way
to fix this. And since this only happens if you cancel right at the
start of a postgres_fdw query, I don't think it's worth investing too
much time on a fix.

> I don't want to move the test case to where you propose, because
> that's basically not sensible.  But can't we avoid remote estimates
> by just cross-joining ft1 to itself, and not using the tables for
> which remote estimate is enabled?

Yeah that should work too (I just saw your next email, where you said
it's committed like this).

> I think 0002 is probably outright wrong, or at least the change to
> disable_statement_timeout is.  Once we get to that, we don't want
> to throw a timeout error any more, even if an interrupt was received
> just before it.

The disable_statement_timeout change was not the part of that patch
that was necessary for stable output, only the change in the first
branch of enable_statement_timeout was necessary. The reason being
that enable_statement_timeout is called multiple times for a query,
because start_xact_command is called multiple times in
exec_simple_query. The change to disable_statement_timeout just seemed
like the logical extension of that change, especially since there was
basically a verbatim copy of disable_statement_timeout in the second
branch of enable_statement_timeout.

To make sure I understand your suggestion correctly: Are you saying
you would want to completely remove the outstanding interrupt if it
was caused by the statement_timeout when disable_statement_timeout is
called? Because I agree that would probably make sense, but that
sounds like a more impactful change. But the current behaviour seems
strictly worse than the behaviour proposed in the patch to me, because
currently the backend would still be interrupted, but the error would
indicate a reason for the interrupt that is simply incorrect i.e. it
will say it was cancelled due to a user request, which never happened.




Re: Use read streams in pg_visibility

2024-08-30 Thread Noah Misch
On Tue, Aug 27, 2024 at 10:49:19AM +0300, Nazir Bilal Yavuz wrote:
> On Fri, 23 Aug 2024 at 22:01, Noah Misch  wrote:
> > On Fri, Aug 23, 2024 at 02:20:06PM +0300, Nazir Bilal Yavuz wrote:
> > > On Tue, 20 Aug 2024 at 21:47, Noah Misch  wrote:
> > > > On Tue, Aug 13, 2024 at 03:22:27PM +0300, Nazir Bilal Yavuz wrote:

> I liked the block_range_read_stream_cb. Attached patches for that
> (first 3 patches). I chose an nblocks way instead of last_blocks in
> the struct.

To read blocks 10 and 11, I would expect to initialize the struct with one of:

{ .first=10, .nblocks=2 }
{ .first=10, .last_inclusive=11 }
{ .first=10, .last_exclusive=12 }

With the patch's API, I would need {.first=10,.nblocks=12}.  The struct field
named "nblocks" behaves like a last_block_exclusive.  Please either make the
behavior an "nblocks" behavior or change the field name to replace the term
"nblocks" with something matching the behavior.  (I used longer field names in
my examples here, to disambiguate those examples.  It's okay if the final
field names aren't those, as long as the field names and the behavior align.)

> > > > The callback doesn't return blocks having zero vm bits, so the blkno 
> > > > variable
> > > > is not accurate.  I didn't test, but I think the loop's "Recheck to 
> > > > avoid
> > > > returning spurious results." looks at the bit for the wrong block.  If 
> > > > that's
> > > > what v1 does, could you expand the test file to catch that?  For 
> > > > example, make
> > > > a two-block table with only the second block all-visible.
> > >
> > > Yes, it was not accurate. I am getting blockno from the buffer now. I
> > > checked and confirmed it is working as expected by manually logging
> > > blocknos returned from the read stream. I am not sure how to add a
> > > test case for this.
> >
> > VACUUM FREEZE makes an all-visible, all-frozen table.  DELETE of a 
> > particular
> > TID, even if rolled back, clears both vm bits for the TID's page.  Past 
> > tests
> > like that had instability problems.  One cause is a concurrent session's XID
> > or snapshot, which can prevent VACUUM setting vm bits.  Using a TEMP table 
> > may
> > have been one of the countermeasures, but I don't remember clearly.  Hence,
> > please search the archives or the existing pg_visibility tests for how we
> > dealt with that.  It may not be problem for this particular test.
> 
> Thanks for the information, I will check these. What I still do not
> understand is how to make sure that only the second block is processed
> and the first one is skipped. pg_check_visible() and pg_check_frozen()
> returns TIDs that cause corruption in the visibility map, there is no
> information about block numbers.

I see what you're saying.  collect_corrupt_items() needs a corrupt table to
report anything; all corruption-free tables get the same output.  Testing this
would need extra C code or techniques like corrupt_page_checksum() to create
the corrupt state.  That wouldn't be a bad thing to have, but it's big enough
that I'll consider it out of scope for $SUBJECT.  With the callback change
above, I'll be ready to push all this.




Re: [EXTERNAL] Re: Add non-blocking version of PQcancel

2024-08-30 Thread Alexander Lakhin

Hello Tom,

30.08.2024 23:55, Tom Lane wrote:

Pushed with that addition and some comment-tweaking.  We'll see
whether that actually makes things more stable, but I don't think
it could make it worse.


Thank you for fixing that issue!

I've tested your fix with the modification I proposed upthread:
 idle_session_timeout_enabled = false;
 }
+if (rand() % 10 == 0) pg_usleep(1);
 /*
  * (5) disable async signal conditions again.

and can confirm that the issue is gone. On 8749d850f~1, the test failed
on iterations 3, 3, 12 for me, but on current REL_17_STABLE, 100 test
iterations succeeded.

At the same time, mylodon confirmed my other finding at [1] and failed [2] with:
-ERROR:  canceling statement due to statement timeout
+ERROR:  canceling statement due to user request

[1] 
https://www.postgresql.org/message-id/4db099c8-4a52-3cc4-e970-14539a319466%40gmail.com
[2] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mylodon&dt=2024-08-30%2023%3A03%3A31

Best regards,
Alexander




Re: Inval reliability, especially for inplace updates

2024-08-30 Thread Noah Misch
On Tue, Jun 18, 2024 at 08:23:49AM -0700, Noah Misch wrote:
> On Mon, Jun 17, 2024 at 06:57:30PM -0700, Andres Freund wrote:
> > On 2024-06-17 16:58:54 -0700, Noah Misch wrote:
> > > On Sat, Jun 15, 2024 at 03:37:18PM -0700, Noah Misch wrote:
> > > > On Wed, May 22, 2024 at 05:05:48PM -0700, Noah Misch wrote:
> > > > > https://postgr.es/m/20240512232923.aa.nmi...@google.com wrote:
> > > > > > Separable, nontrivial things not fixed in the attached patch stack:
> > > > > >
> > > > > > - Inplace update uses transactional CacheInvalidateHeapTuple().  
> > > > > > ROLLBACK of
> > > > > >   CREATE INDEX wrongly discards the inval, leading to the 
> > > > > > relhasindex=t loss
> > > > > >   still seen in inplace-inval.spec.  CacheInvalidateRelmap() does 
> > > > > > this right.
> > > > >
> > > > > I plan to fix that like CacheInvalidateRelmap(): send the inval 
> > > > > immediately,
> > > > > inside the critical section.  Send it in heap_xlog_inplace(), too.
> > 
> > I'm worried this might cause its own set of bugs, e.g. if there are any 
> > places
> > that, possibly accidentally, rely on the invalidation from the inplace 
> > update
> > to also cover separate changes.
> 
> Good point.  I do have index_update_stats() still doing an ideally-superfluous
> relcache update for that reason.  Taking that further, it would be cheap
> insurance to have the inplace update do a transactional inval in addition to
> its immediate inval.  Future master-only work could remove the transactional
> one.  How about that?

Restoring the transactional inval seemed good to me, so I've rebased and
included that.  This applies on top of three patches from
https://postgr.es/m/20240822073200.4f.nmi...@google.com.  I'm attaching those
to placate cfbot, but this thread is for review of the last patch only.
Author: Noah Misch 
Commit: Noah Misch 

Warn if LOCKTAG_TUPLE is held at commit, under debug_assertions.

The current use always releases this locktag.  A planned use will
continue that intent.  It will involve more areas of code, making unlock
omissions easier.  Warn under debug_assertions, like we do for various
resource leaks.  Back-patch to v12 (all supported versions), the plan
for the commit of the new use.

Reviewed by FIXME.

Discussion: https://postgr.es/m/20240512232923.aa.nmi...@google.com

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 0400a50..e5e7ab5 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -2256,6 +2256,16 @@ LockReleaseAll(LOCKMETHODID lockmethodid, bool allLocks)
locallock->numLockOwners = 0;
}
 
+#ifdef USE_ASSERT_CHECKING
+
+   /*
+* Tuple locks are currently held only for short durations 
within a
+* transaction. Check that we didn't forget to release one.
+*/
+   if (LOCALLOCK_LOCKTAG(*locallock) == LOCKTAG_TUPLE && !allLocks)
+   elog(WARNING, "tuple lock held at commit");
+#endif
+
/*
 * If the lock or proclock pointers are NULL, this lock was 
taken via
 * the relation fast-path (and is not known to have been 
transferred).
Author: Noah Misch 
Commit: Noah Misch 

Fix data loss at inplace update after heap_update().

As previously-added tests demonstrated, heap_inplace_update() could
instead update an unrelated tuple of the same catalog.  It could lose
the update.  Losing relhasindex=t was a source of index corruption.
Inplace-updating commands like VACUUM will now wait for heap_update()
commands like GRANT TABLE and GRANT DATABASE.  That isn't ideal, but a
long-running GRANT already hurts VACUUM progress more just by keeping an
XID running.  The VACUUM will behave like a DELETE or UPDATE waiting for
the uncommitted change.

For implementation details, start at the systable_inplace_update_begin()
header comment and README.tuplock.  Back-patch to v12 (all supported
versions).  In back branches, retain a deprecated heap_inplace_update(),
for extensions.

Reviewed by Heikki Linnakangas and Alexander Lakhin.

Discussion: 
https://postgr.es/m/CAMp+ueZQz3yDk7qg42hk6-9gxniYbp-=bg2mgqecerqr5gg...@mail.gmail.com

diff --git a/src/backend/access/heap/README.tuplock 
b/src/backend/access/heap/README.tuplock
index 6441e8b..ddb2def 100644
--- a/src/backend/access/heap/README.tuplock
+++ b/src/backend/access/heap/README.tuplock
@@ -153,3 +153,14 @@ The following infomask bits are applicable:
 
 We currently never set the HEAP_XMAX_COMMITTED when the HEAP_XMAX_IS_MULTI bit
 is set.
+
+Reading inplace-updated columns
+---
+
+Inplace updates create an exception to the rule that tuple data won't change
+under a reader holding a pin.  A reader of a heap_fetch() result tuple may
+witness a torn read.  C

Re: Interrupts vs signals

2024-08-30 Thread Thomas Munro
On Sat, Aug 31, 2024 at 10:17 AM Heikki Linnakangas  wrote:
> > * This direction seems to fit quite nicely with future ideas about
> > asynchronous network I/O.  That may sound unrelated, but imagine that
> > a future version of WaitEventSet is built on Linux io_uring (or
> > Windows iorings, or Windows IOCP, or kqueue), and waits for the kernel
> > to tell you that network data has been transferred directly into a
> > user space buffer.  You could wait for the interrupt word to change at
> > the same time by treating it as a futex[1].  Then all that other stuff
> > -- signalfd, is_set, maybe_sleeping -- just goes away, and all we have
> > left is one single word in memory.  (That it is possible to do that is
> > not really a coincidence, as our own Mr Freund asked Mr Axboe to add
> > it[2].  The existing latch implementation techniques could be used as
> > fallbacks, but when looked at from the right angle, once you squish
> > all the wakeup reasons into a single word, it's all just an
> > implementation of a multiplexable futex with extra steps.)
>
> Cool

Just by the way, speaking of future tricks and the connections between
this code and other problems in other threads, I wanted to mention
that the above thought is also connected to CF #3998.  When I started
working on this, in parallel I had an experimental patch set using
futexes[1] (back then, to try out futexes, I had to patch my OS[2]
because Linux couldn't multiplex them yet, and macOS/*BSD had
something sort of vaguely similar but effectively only usable between
threads in one process).  I planned to switch to waiting directly on
the interrupt vector as a futex when bringing that idea together with
the one in this thread, but I guess I assumed we had to keep latches
too since they seemed like such a central concept in PostgreSQL.  Your
idea seems much better, the more I think about it, but maybe only the
inventor of latches could have the idea of blowing them up :-)
Anyway, in that same experiment I realised I could wake multiple
backends in one system call, which led to more discoveries about the
negative interactions between latches and locks, and begat CF #3998
(SetLatches()).   By way of excuse, unfortunately I got blocked in my
progress on interrupt vectors for a couple of release cycles by the
recovery conflict system, a set of procsignals that were not like the
others, and turned out to be broken more or less as a result.  That
was tricky to fix (CF #3615), leading to journeys into all kinds of
strange places like the regex code...

[1] https://github.com/macdice/postgres/commits/kqueue-usermem/
[2] https://reviews.freebsd.org/D37102




Re: query_id, pg_stat_activity, extended query protocol

2024-08-30 Thread jian he
On Thu, Aug 15, 2024 at 5:06 AM Imseih (AWS), Sami  wrote:
>
> > I think the testing discussion should be moved to a different thread.
> > What do you think?
> See v4.
>
> 0001 deals with reporting queryId in exec_execute_message and
> exec_bind_message.
> 0002 deals with reporting queryId after a cache invalidation.
>
> There are no tests as this requires more discussion in a separate thread(?)
>

hi.
v4-0001 works as expected. I don't know how to test 0002.

In 0001 and 0002, all foreach loops, we can use the new macro foreach_node.
see 
https://git.postgresql.org/cgit/postgresql.git/commit/?id=14dd0f27d7cd56ffae9ecdbe324965073d01a9ff



the following are the minimum tests I come up with for 0001

/* test \bind queryid exists */
select query_id is not null as query_id_exist
from pg_stat_activity where pid = pg_backend_pid() \bind \g



/* test that \parse \bind_named queryid exists */
select pg_backend_pid() as current_pid \gset pref01_
select query_id is not null as query_id_exist from pg_stat_activity
where pid = $1 \parse stmt11
\bind_named stmt11 :pref01_current_pid \g




Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation

2024-08-30 Thread Noah Misch
On Thu, Aug 29, 2024 at 03:48:53PM -0500, Masahiko Sawada wrote:
> On Sun, May 19, 2024 at 6:46 AM Noah Misch  wrote:
> > If I were standardizing pg_trgm on one or the other notion of "char", I 
> > would
> > choose signed char, since I think it's still the majority.  More broadly, I
> > see these options to fix pg_trgm:
> >
> > 1. Change to signed char.  Every arm64 system needs to scan pg_trgm indexes.
> > 2. Change to unsigned char.  Every x86 system needs to scan pg_trgm indexes.
> 
> Even though it's true that signed char systems are the majority, it
> would not be acceptable to force the need to scan pg_trgm indexes on
> unsigned char systems.
> 
> > 3. Offer both, as an upgrade path.  For example, pg_trgm could have separate
> >operator classes gin_trgm_ops and gin_trgm_ops_unsigned.  Running
> >pg_upgrade on an unsigned-char system would automatically map v17
> >gin_trgm_ops to v18 gin_trgm_ops_unsigned.  This avoids penalizing any
> >architecture with upgrade-time scans.
> 
> Very interesting idea. How can new v18 users use the correct operator
> class? I don't want to require users to specify the correct signed or
> unsigned operator classes when creating a GIN index. Maybe we need to

In brief, it wouldn't matter which operator class new v18 indexes use.  The
documentation would focus on gin_trgm_ops and also say something like:

  There's an additional operator class, gin_trgm_ops_unsigned.  It behaves
  exactly like gin_trgm_ops, but it uses a deprecated on-disk representation.
  Use gin_trgm_ops in new indexes, but there's no disadvantage from continuing
  to use gin_trgm_ops_unsigned.  Before PostgreSQL 18, gin_trgm_ops used a
  platform-dependent representation.  pg_upgrade automatically uses
  gin_trgm_ops_unsigned when upgrading from source data that used the
  deprecated representation.

What concerns might users have, then?  (Neither operator class would use plain
"char" in a context that affects on-disk state.  They'll use "signed char" and
"unsigned char".)

> dynamically use the correct compare function for the same operator
> class depending on the char signedness. But is it possible to do it on
> the extension (e.g. pg_trgm) side?

No, I don't think the extension can do that cleanly.  It would need to store
the signedness in the index somehow, and GIN doesn't call the opclass at a
time facilitating that.  That would need help from the core server.




Re: Streaming read-ready sequential scan code

2024-08-30 Thread Thomas Munro
On Fri, Aug 30, 2024 at 1:00 AM Alexander Lakhin  wrote:
> I looked at two perf profiles of such out-of-sync processes and found no
> extra calls or anything whatsoever in the slow one; it just has the number
> of perf samples increased proportionally. It made me suspect CPU frequency
> scaling... Indeed, with the "performance" governor set and the boost mode
> disabled, I'm now seeing much more stable numbers (I do this tuning before
> running performance tests, but I had forgotten about that when I ran
> your test, my bad).

Aha, mystery solved.

I have pushed the fix.  Thanks!




Re: optimizing pg_upgrade's once-in-each-database steps

2024-08-30 Thread Ilya Gladyshev
LGTM in general, but here are some final nitpicks:

+   if (maxFd != 0)
+   (void) select(maxFd + 1, &input_mask, &output_mask, &except_mask, NULL);

It’s a good idea to check the return value of select(), in case it reports an
error.

+   dbs_complete++;
+   (void) PQgetResult(slot->conn);
+   PQfinish(slot->conn);

Perhaps it’s rather for me to understand, what does PQgetResult call do here?

+   /* Check for connection failure. */
+   if (PQconnectPoll(slot->conn) == PGRES_POLLING_FAILED)
+   pg_fatal("connection failure: %s", PQerrorMessage(slot->conn));
+
+   /* Check whether the connection is still establishing. */
+   if (PQconnectPoll(slot->conn) != PGRES_POLLING_OK)
+   return;

Are the two consecutive calls to PQconnectPoll intended here? They seem a bit
wasteful, but maybe that’s OK.

We should probably mention this change in the docs as well, I found these two 
places that I think can be improved:

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 9877f2f01c..ad7aa33f07 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -118,7 +118,7 @@ PostgreSQL documentation
  
   -j njobs
   --jobs=njobs
-  number of simultaneous processes or threads to use
+  number of simultaneous processes, threads or connections to use
   
  
 
@@ -587,8 +587,9 @@ NET STOP postgresql-&majorversion;
 
 
  The --jobs option allows multiple CPU cores to be used
- for copying/linking of files and to dump and restore database schemas
- in parallel;  a good place to start is the maximum of the number of
+ for copying/linking of files, to dump and restore database schemas in
+ parallel and to use multiple simultaneous connections for upgrade checks;
+  a good place to start is the maximum of the number of
  CPU cores and tablespaces.  This option can dramatically reduce the
  time to upgrade a multi-database server running on a multiprocessor
  machine.
-- 


> On 28 Aug 2024, at 22:46, Nathan Bossart wrote:
> 
> I spent some time cleaning up names, comments, etc.  Barring additional
> feedback, I'm planning to commit this stuff in the September commitfest so
> that it has plenty of time to bake in the buildfarm.
> 
> -- 
> nathan
> 



Re: race condition in pg_class

2024-08-30 Thread Noah Misch
On Thu, Aug 29, 2024 at 09:08:43PM +0530, Nitin Motiani wrote:
> On Thu, Aug 29, 2024 at 8:11 PM Noah Misch  wrote:
> > On Tue, Aug 20, 2024 at 11:59:45AM +0300, Heikki Linnakangas wrote:
> > > My order of preference is: 2, 1, 3.
> >
> > I kept tuple locking responsibility in heapam.c.  That's simpler and better
> > for modularity, but it does mean we release+acquire after any xmax wait.
> > Before, we avoided that if the next genam.c scan found the same TID.  (If 
> > the
> > next scan finds the same TID, the xmax probably aborted.)  I think DDL 
> > aborts
> > are rare enough to justify simplifying as this version does.  I don't expect
> > anyone to notice the starvation outside of tests built to show it.  (With
> > previous versions, one can show it with a purpose-built test that commits
> > instead of aborting, like the "001_pgbench_grant@9" test.)
> >
> > This move also loses the optimization of unpinning before 
> > XactLockTableWait().
> > heap_update() doesn't optimize that way, so that's fine.
> >
> > The move ended up more like (1), though I did do
> > s/systable_inplace_update_scan/systable_inplace_update_begin/ like in (2).  
> > I
> > felt that worked better than (2) to achieve lock release before
> > CacheInvalidateHeapTuple().  Alternatives that could be fine:
> >
> From a consistency point of view, I find it cleaner if we can have all
> the heap_inplace_lock and heap_inplace_unlock calls in the same set of
> functions. So here those would be the systable_inplace_* functions.

That will technically be the case after inplace160, and I could make it so
here by inlining heap_inplace_unlock() into its heapam.c caller.  Would that
be cleaner or less clean?

> > - In the cancel case, call both systable_inplace_update_cancel and
> >   systable_inplace_update_end.  _finish or _cancel would own unlock, while
> >   _end would own systable_endscan().
> >
> What happens to CacheInvalidateHeapTuple() in this approach?  I think
> it will still need to be brought to the genam.c layer if we are
> releasing the lock in systable_inplace_update_finish.

Cancel scenarios don't do invalidation.  (Same under other alternatives.)

> > - Hoist the CacheInvalidateHeapTuple() up to the genam.c layer.  While
> >   tolerable now, this gets less attractive after the inplace160 patch from
> >   https://postgr.es/m/flat/20240523000548.58.nmi...@google.com
> >
> I skimmed through the inplace160 patch. It wasn't clear to me why this
> becomes less attractive with the patch. I see there is a new
> CacheInvalidateHeapTupleInPlace but that looks like it would be called
> while holding the lock. And then there is an
> AcceptInvalidationMessages which can perhaps be moved to the genam.c
> layer too. Is the concern that one invalidation call will be in the
> heapam layer and the other will be in the genam layer?

That, or a critical section would start in heapam.c, then end in genam.c.
Current call tree at inplace160 v4:

genam.c:systable_inplace_update_finish
 heapam.c:heap_inplace_update
  PreInplace_Inval
  START_CRIT_SECTION
  BUFFER_LOCK_UNLOCK
  AtInplace_Inval
  END_CRIT_SECTION
  UnlockTuple
  AcceptInvalidationMessages

If we hoisted all of invalidation up to the genam.c layer, a critical section
that starts in heapam.c would end in genam.c:

genam.c:systable_inplace_update_finish
 PreInplace_Inval
 heapam.c:heap_inplace_update
  START_CRIT_SECTION
 BUFFER_LOCK_UNLOCK
 AtInplace_Inval
 END_CRIT_SECTION
 UnlockTuple
 AcceptInvalidationMessages

If we didn't accept splitting the critical section but did accept splitting
invalidation responsibilities, one gets perhaps:

genam.c:systable_inplace_update_finish
 PreInplace_Inval
 heapam.c:heap_inplace_update
  START_CRIT_SECTION
  BUFFER_LOCK_UNLOCK
  AtInplace_Inval
  END_CRIT_SECTION
  UnlockTuple
 AcceptInvalidationMessages

That's how I ended up at inplace120 v9's design.

> Also I have a small question from inplace120.
> 
> I see that all the places we check resultRelInfo->ri_needLockTagTuple,
> we can just call
> IsInplaceUpdateRelation(resultRelInfo->ri_RelationDesc). Is there a
> big advantage of storing a separate bool field? Also there is another

No, not a big advantage.  I felt it was more in line with the typical style of
src/backend/executor.

> write to ri_RelationDesc in CatalogOpenIndexes in
> src/backend/catalog/indexing.c. I think ri_needLockTagTuple needs to
> be set there also to keep it consistent with ri_RelationDesc. Please
> let me know if I am missing something about the usage of the new
> field.

Can you say more about consequences you found?

Only the full executor reads the field, doing so when it fetches the most
recent version of a row.  CatalogOpenIndexes() callers lack the full
executor's practice of fetching the most recent version of a row, so they
couldn't benefit from reading the field.

I don't think any CatalogOpenIndexes() caller passes its ResultRelInfo to the
full executor, and "typedef struct ResultRelInfo *CatalogIndexState" exists in
part to k

Re: 039_end_of_wal: error in "xl_tot_len zero" test

2024-08-30 Thread Thomas Munro
Pushed.  Thanks!