[PATCH] Add roman support for to_number function
Hi,

While looking at formatting.c, I noticed a TODO about "add support for roman number to standard number conversion" ( https://github.com/postgres/postgres/blob/master/src/backend/utils/adt/formatting.c#L52 ). I have attached a patch that adds support for this: it converts Roman numerals to standard numbers (1 to 3999) while also checking whether the Roman numerals are valid. I have also added a new error code, ERRCODE_INVALID_ROMAN_NUMERAL, for invalid numerals.

A few examples:

postgres=# SELECT to_number('MC', 'RN');
 to_number
-----------
      1100
(1 row)

postgres=# SELECT to_number('XIV', 'RN');
 to_number
-----------
        14
(1 row)

postgres=# SELECT to_number('MMMCMXCIX', 'RN');
 to_number
-----------
      3999
(1 row)

postgres=# SELECT to_number('MCCD', 'RN');
ERROR:  invalid roman numeral

I would appreciate your feedback on the following cases:

- Is it okay for the function to handle Roman numerals in a case-insensitive way? (e.g., 'XIV', 'xiv', and 'Xiv' are all seen as 14)
- How should we handle Roman numerals with leading or trailing spaces, like ' XIV' or 'MC '? Should we trim the spaces, or would it be better to throw an error in such cases?

I have tested the changes and would appreciate any suggestions for improvement. I will update the docs and regression tests in the next version of the patch.

Regards,
Hunaid Sohail

v1-0001-Add-RN-rn-support-for-to_number-function.patch
Description: Binary data
Re: consider -Wmissing-variable-declarations
On 28.08.24 05:31, Thomas Munro wrote:

On Wed, Jun 19, 2024 at 3:02 AM Andres Freund wrote:

-const char *EAN13_range[][2] = {
+static const char *EAN13_range[][2] = {
 {"000", "019"},	/* GS1 US */
 {"020", "029"},	/* Restricted distribution (MO defined) */
 {"030", "039"},	/* GS1 US */

-const char *ISBN_range[][2] = {
+static const char *ISBN_range[][2] = {
 {"0-00", "0-19"},
 {"0-200", "0-699"},
 {"0-7000", "0-8499"},
@@ -967,7 +967,7 @@ const char *ISBN_range[][2] = {
 */

FYI these ones generate -Wunused-variable warnings from headerscheck on CI, though it doesn't fail the task. Hmm, these aren't really headers, are they?

Yes, it looks like these ought to be excluded from checking:

diff --git a/src/tools/pginclude/headerscheck b/src/tools/pginclude/headerscheck
index 436e2b92a33..3fc737d2cc1 100755
--- a/src/tools/pginclude/headerscheck
+++ b/src/tools/pginclude/headerscheck
@@ -138,6 +138,12 @@ do
 test "$f" = src/pl/tcl/pltclerrcodes.h && continue

 # Also not meant to be included standalone.
+test "$f" = contrib/isn/EAN13.h && continue
+test "$f" = contrib/isn/ISBN.h && continue
+test "$f" = contrib/isn/ISMN.h && continue
+test "$f" = contrib/isn/ISSN.h && continue
+test "$f" = contrib/isn/UPC.h && continue
+
 test "$f" = src/include/common/unicode_nonspacing_table.h && continue
 test "$f" = src/include/common/unicode_east_asian_fw_table.h && continue
Re: pgsql: Add more SQL/JSON constructor functions
On Thu, Aug 22, 2024 at 12:44 PM Amit Langote wrote:
> On Thu, Aug 22, 2024 at 11:02 jian he wrote:
>> On Tue, Jul 30, 2024 at 12:59 PM Amit Langote wrote:
>> > On Fri, Jul 26, 2024 at 11:19 PM jian he wrote:
>> > > {
>> > > ...
>> > > /*
>> > >  * For expression nodes that support soft errors. Should be set to NULL
>> > >  * before calling ExecInitExprRec() if the caller wants errors thrown.
>> > >  */
>> > > ErrorSaveContext *escontext;
>> > > } ExprState;
>> > >
>> > > i believe by default makeNode will set escontext to NULL.
>> > > So the comment should be, if you want to catch the soft errors, make
>> > > sure the escontext pointing to an allocated ErrorSaveContext.
>> > > or maybe just say, default is NULL.
>> > >
>> > > Otherwise, the original comment's meaning feels like: we need to
>> > > explicitly set it to NULL for certain operations, which I believe is false?
>> >
>> > OK, I'll look into updating this.

See 0001.

>> > > json_behavior_type:
>> > > ERROR_P { $$ = JSON_BEHAVIOR_ERROR; }
>> > > | NULL_P { $$ = JSON_BEHAVIOR_NULL; }
>> > > | TRUE_P { $$ = JSON_BEHAVIOR_TRUE; }
>> > > | FALSE_P { $$ = JSON_BEHAVIOR_FALSE; }
>> > > | UNKNOWN { $$ = JSON_BEHAVIOR_UNKNOWN; }
>> > > | EMPTY_P ARRAY { $$ = JSON_BEHAVIOR_EMPTY_ARRAY; }
>> > > | EMPTY_P OBJECT_P { $$ = JSON_BEHAVIOR_EMPTY_OBJECT; }
>> > > /* non-standard, for Oracle compatibility only */
>> > > | EMPTY_P { $$ = JSON_BEHAVIOR_EMPTY_ARRAY; }
>> > > ;
>> > >
>> > > EMPTY_P behaves the same as EMPTY_P ARRAY,
>> > > so for function GetJsonBehaviorConst, the following "case
>> > > JSON_BEHAVIOR_EMPTY:" is wrong?
>> > >
>> > > case JSON_BEHAVIOR_NULL:
>> > > case JSON_BEHAVIOR_UNKNOWN:
>> > > case JSON_BEHAVIOR_EMPTY:
>> > > val = (Datum) 0;
>> > > isnull = true;
>> > > typid = INT4OID;
>> > > len = sizeof(int32);
>> > > isbyval = true;
>> > > break;
>> > >
>> > > also src/backend/utils/adt/ruleutils.c
>> > > if (jexpr->on_error->btype != JSON_BEHAVIOR_EMPTY)
>> > > get_json_behavior(jexpr->on_error, context, "ERROR");
>> >
>> > Something like the attached makes sense? While this meaningfully
>> > changes the deparsing output, there is no behavior change for
>> > JsonTable top-level path execution. That's because the behavior when
>> > there's an error in the execution of the top-level path is to throw it
>> > or return an empty set, which is handled in jsonpath_exec.c, not
>> > execExprInterp.c.

See 0002.

I'm also attaching 0003 to fix a minor annoyance that JSON_TABLE() columns' default ON ERROR, ON EMPTY behaviors are unnecessarily emitted in the deparsed output when the top-level ON ERROR behavior is ERROR.

Will push these on Monday. I haven't had a chance to take a closer look at your patch to optimize the code in ExecInitJsonExpr() yet.

--
Thanks, Amit Langote

v1-0002-SQL-JSON-Fix-default-ON-ERROR-behavior-for-JSON_T.patch
Description: Binary data
v1-0001-Update-comment-about-ExprState.escontext.patch
Description: Binary data
v1-0003-SQL-JSON-Fix-JSON_TABLE-column-deparsing.patch
Description: Binary data
Re: Set query_id for query contained in utility statement
On Tue, Aug 27, 2024 at 11:14 AM jian he wrote: > also it's ok to use passed (ParseState *pstate) for > { > estate = CreateExecutorState(); > estate->es_param_list_info = params; > paramLI = EvaluateParams(pstate, entry, execstmt->params, estate); > } > ? > I really don't know. > > some of the change is refactoring, maybe you can put it into a separate patch. Thanks for the review. I think the parser state is mostly used for the error callbacks and parser_errposition but I'm not 100% sure. Either way, you're right and it probably shouldn't be in the patch. I've modified the patch to restrict the changes to only add the necessary query jumble and post parse hook calls. v4-0001-Set-query_id-for-queries-contained-in-utility-sta.patch Description: Binary data
Re: ANALYZE ONLY
On Thu, Aug 29, 2024 at 7:31 PM Melih Mutlu wrote:
>
> Hi Michael,
>
> Michael Harris wrote the following on Fri, Aug 23, 2024 at 13:01:
>>
>> V2 of the patch is attached.
>
> Thanks for updating the patch. I have a few more minor feedbacks.
>
>> -ANALYZE [ ( option [, ...] ) ] [ table_and_columns [, ...] ]
>> +ANALYZE [ ( option [, ...] ) ] [ [ ONLY ] table_and_columns [, ...] ]
>
> I believe moving "[ ONLY ]" to the definitions of table_and_columns below, as
> you did with "[ * ]", might be better to be consistent with other places (see [1])
>
>> + if ((options & VACOPT_VACUUM) && is_partitioned_table && !include_children)

I think you are right.

ANALYZE [ ( option [, ...] ) ] [ [ ONLY ] table_and_columns [, ...] ]

does not seem to cover commands like:

ANALYZE ONLY only_parted(a), ONLY only_parted(b);
Re: Collect statistics about conflicts in logical replication
On Fri, Aug 30, 2024 at 12:15 PM Zhijie Hou (Fujitsu) wrote: > > > Here is V5 patch which addressed above and Shveta's[1] comments. > The patch looks good to me. thanks Shveta
Re: ANALYZE ONLY
Hi Michael,

On Fri, 23 Aug 2024 at 22:01, Michael Harris wrote:
> V2 of the patch is attached.

(I understand this is your first PostgreSQL patch.) I just wanted to make sure you knew about the commitfest cycle we use for the development of PostgreSQL. If you've not got one already, could you register a community account? That'll allow you to include this patch in the September commitfest that starts on the 1st. See https://commitfest.postgresql.org/49/

I understand there's now some sort of cool-off period for new community accounts, so if you have trouble adding the patch before the CF starts, let me know and I should be able to do it for you. The commitfest normally closes for new patches around 12:00 UTC on the 1st of the month.

There's a bit more information in https://wiki.postgresql.org/wiki/Submitting_a_Patch#Patch_submission

David
Re: Add contrib/pg_logicalsnapinspect
Hi, On Thu, Aug 29, 2024 at 02:15:47PM +, Bertrand Drouvot wrote: > I don't see any use case where it could be useful when the server is down. So, > I think I'll move forward with in core functions (unless someone has a > different > opinion). > Please find v2 attached that creates the 2 new in core functions. Note that once those new functions are in (or maybe sooner), I'll submit an additional patch to get rid of the code duplication between the new ValidateSnapshotFile() and SnapBuildRestore(). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com >From 6c5a1ad66b203036739aae955932b8e3813c71e3 Mon Sep 17 00:00:00 2001 From: Bertrand Drouvot Date: Thu, 29 Aug 2024 20:51:31 + Subject: [PATCH v2] Functions to get ondisk logical snapshots details Provides SQL functions that allow to inspect the contents of serialized logical snapshots of a running database cluster, which is useful for debugging or educational purposes. 
--- contrib/test_decoding/Makefile| 2 +- .../expected/get_ondisk_snapshot_info.out | 57 + contrib/test_decoding/meson.build | 1 + .../specs/get_ondisk_snapshot_info.spec | 32 +++ doc/src/sgml/func.sgml| 53 + src/backend/replication/logical/snapbuild.c | 225 ++ src/include/catalog/pg_proc.dat | 16 ++ 7 files changed, 385 insertions(+), 1 deletion(-) 17.3% contrib/test_decoding/expected/ 12.6% contrib/test_decoding/specs/ 17.3% doc/src/sgml/ 45.3% src/backend/replication/logical/ 6.6% src/include/catalog/ diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile index a4ba1a509a..b1b8ffa9e8 100644 --- a/contrib/test_decoding/Makefile +++ b/contrib/test_decoding/Makefile @@ -9,7 +9,7 @@ REGRESS = ddl xact rewrite toast permissions decoding_in_xact \ ISOLATION = mxact delayed_startup ondisk_startup concurrent_ddl_dml \ oldest_xmin snapshot_transfer subxact_without_top concurrent_stream \ twophase_snapshot slot_creation_error catalog_change_snapshot \ - skip_snapshot_restore + skip_snapshot_restore get_ondisk_snapshot_info REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf ISOLATION_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf diff --git a/contrib/test_decoding/expected/get_ondisk_snapshot_info.out b/contrib/test_decoding/expected/get_ondisk_snapshot_info.out new file mode 100644 index 00..b676ccd528 --- /dev/null +++ b/contrib/test_decoding/expected/get_ondisk_snapshot_info.out @@ -0,0 +1,57 @@ +Parsed test spec with 2 sessions + +starting permutation: s0_init s0_begin s0_savepoint s0_truncate s1_checkpoint s1_get_changes s0_commit s0_begin s0_insert s1_checkpoint s1_get_changes s0_commit s1_get_changes s1_get_logical_snapshot_info s1_get_logical_snapshot_meta +step s0_init: SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding'); +?column? 
+ +init +(1 row) + +step s0_begin: BEGIN; +step s0_savepoint: SAVEPOINT sp1; +step s0_truncate: TRUNCATE tbl1; +step s1_checkpoint: CHECKPOINT; +step s1_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'skip-empty-xacts', '1', 'include-xids', '0'); +data + +(0 rows) + +step s0_commit: COMMIT; +step s0_begin: BEGIN; +step s0_insert: INSERT INTO tbl1 VALUES (1); +step s1_checkpoint: CHECKPOINT; +step s1_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'skip-empty-xacts', '1', 'include-xids', '0'); +data +--- +BEGIN +table public.tbl1: TRUNCATE: (no-flags) +COMMIT +(3 rows) + +step s0_commit: COMMIT; +step s1_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'skip-empty-xacts', '1', 'include-xids', '0'); +data +- +BEGIN +table public.tbl1: INSERT: val1[integer]:1 val2[integer]:null +COMMIT +(3 rows) + +step s1_get_logical_snapshot_info: SELECT (pg_get_logical_snapshot_info(f.name::pg_lsn)).state,(pg_get_logical_snapshot_info(f.name::pg_lsn)).catchange_count,array_length((pg_get_logical_snapshot_info(f.name::pg_lsn)).catchange_xip,1),(pg_get_logical_snapshot_info(f.name::pg_lsn)).committed_count,array_length((pg_get_logical_snapshot_info(f.name::pg_lsn)).committed_xip,1) FROM (SELECT replace(replace(name,'.snap',''),'-','/') AS name FROM pg_ls_logicalsnapdir()) AS f ORDER BY 2; +state|catchange_count|array_length|committed_count|array_length +-+---++---+ +2| 0|
Re: long-standing data loss bug in initial sync of logical replication
> BTW, we should do some performance testing by having a mix of DML and
> DDLs to see the performance impact of this patch.
>
> [1] - https://www.postgresql.org/message-id/CAD21AoAenVqiMjpN-PvGHL1N9DWnHSq673bfgr6phmBUzx=k...@mail.gmail.com

I did some performance testing and found a performance impact for the following case:

1. Created a publisher/subscriber setup on a single table, say 'tab_conc1'.
2. Created a second publisher/subscriber setup on a single table, say 'tp'.
3. Created 'tcount' no. of tables. These tables are not part of any publication.
4. There are two sessions running in parallel, let's say S1 and S2.
5. Begin a transaction in S1.
6. Now in a loop (this loop runs 100 times):
   S1: Insert a row in table 'tab_conc1'.
   S1: Insert a row in all 'tcount' tables.
   S2: BEGIN; Alter publication for 2nd publication; COMMIT;
   The current logic in the patch will call the function 'rel_sync_cache_publication_cb' during invalidation. This will invalidate the cache for all the tables, i.e. the cache for table 'tab_conc1' and the 'tcount' tables will all be invalidated.
7. COMMIT the transaction in S1.

The performance in this case is:

No. of tables | With patch (in ms) | With head (in ms)
--------------+--------------------+------------------
tcount = 100  | 101376.4           | 101357.8
tcount = 1000 | 994085.4           | 993471.4

For 100 tables the performance is slower by 0.018% and for 1000 tables it is slower by 0.06%. These results are the average of 5 runs.

Other than this I tested the following cases but did not find any performance impact:
1. with 'tcount = 10'.
2. with 'tcount = 0' and running the loop 1000 times.

I have also attached the test script and the machine configuration on which the performance testing was done. Next I am planning to test solely on the logical decoding side and will share the results.
Thanks and Regards, Shlok Kyal NAME="Red Hat Enterprise Linux Server" VERSION="7.9 (Maipo)" ID="rhel" ID_LIKE="fedora" VARIANT="Server" VARIANT_ID="server" VERSION_ID="7.9" PRETTY_NAME="Red Hat Enterprise Linux Server 7.9 (Maipo)" ANSI_COLOR="0;31" CPE_NAME="cpe:/o:redhat:enterprise_linux:7.9:GA:server" HOME_URL="https://www.redhat.com/"; BUG_REPORT_URL="https://bugzilla.redhat.com/"; REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 7" REDHAT_BUGZILLA_PRODUCT_VERSION=7.9 REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux" REDHAT_SUPPORT_PRODUCT_VERSION="7.9" CPU INFO processor : 119 vendor_id : GenuineIntel cpu family : 6 model : 62 model name : Intel(R) Xeon(R) CPU E7-4890 v2 @ 2.80GHz stepping: 7 microcode : 0x715 cpu MHz : 1505.957 cache size : 38400 KB physical id : 3 siblings: 30 core id : 14 cpu cores : 15 apicid : 125 initial apicid : 125 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm epb intel_ppin ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts md_clear spec_ctrl intel_stibp flush_l1d bogomips: 5629.54 clflush size: 64 cache_alignment : 64 address sizes : 46 bits physical, 48 bits virtual power management: MemTotal: 792237404 kB MemFree:724051992 kB MemAvailable: 762505368 kB Buffers:2108 kB Cached: 43885588 kB SwapCached:0 kB Active: 22276460 kB Inactive: 21761812 kB Active(anon):1199380 kB Inactive(anon): 4228212 kB Active(file): 21077080 kB Inactive(file): 17533600 kB Unevictable: 0 kB Mlocked: 0 kB SwapTotal: 4194300 kB SwapFree:4194300 kB Dirty: 0 kB Writeback: 0 kB 
AnonPages:150796 kB Mapped: 5283248 kB Shmem: 5277044 kB Slab:1472084 kB SReclaimable:1165144 kB SUnreclaim: 306940 kB KernelStack: 17504 kB PageTables:21540 kB NFS_Unstable: 0 kB Bounce:0 kB WritebackTmp: 0 kB CommitLimit:400313000 kB Committed_AS: 86287072 kB VmallocTotal: 34359738367 kB VmallocUsed: 1942092 kB VmallocChunk: 33753397244 kB Percpu:36352 kB HardwareCorrupted: 0 kB AnonHugePages: 81920 kB CmaTotal: 0 kB CmaFree: 0 kB HugePages_Total: 0 HugePages_Free:0 HugePages_Rsvd:0 HugePages_Surp
Re: [PoC] Federated Authn/z with OAUTHBEARER
On 28.08.24 18:31, Jacob Champion wrote: On Mon, Aug 26, 2024 at 4:23 PM Jacob Champion wrote: I was having trouble reasoning about the palloc-that-isn't-palloc code during the first few drafts, so I will try a round with the jsonapi_ prefix. v27 takes a stab at that. I have kept the ALLOC/FREE naming to match the strategy in other src/common source files. This looks pretty good to me. Maybe on the naming side, this seems like a gratuitous divergence: +#define jsonapi_createStringInfo makeStringInfo The name of the variable JSONAPI_USE_PQEXPBUFFER leads to sections of code that look like this: +#ifdef JSONAPI_USE_PQEXPBUFFER +if (!new_prediction || !new_fnames || !new_fnull) +return false; +#endif To me it wouldn't be immediately obvious why "using PQExpBuffer" has anything to do with this code; the key idea is that we expect any allocations to be able to fail. Maybe a name like JSONAPI_ALLOW_OOM or JSONAPI_SHLIB_ALLOCATIONS or...? Seems ok to me as is. I think the purpose of JSONAPI_USE_PQEXPBUFFER is adequately explained by this comment +/* + * By default, we will use palloc/pfree along with StringInfo. In libpq, + * use malloc and PQExpBuffer, and return JSON_OUT_OF_MEMORY on out-of-memory. + */ +#ifdef JSONAPI_USE_PQEXPBUFFER For some of the other proposed names, I'd be afraid that someone might think you are free to mix and match APIs, OOM behavior, and compilation options. Some comments on src/include/common/jsonapi.h: -#include "lib/stringinfo.h" I suspect this will fail headerscheck? Probably needs an exception added there. +#ifdef JSONAPI_USE_PQEXPBUFFER +#define StrValType PQExpBufferData +#else +#define StrValType StringInfoData +#endif Maybe use jsonapi_StrValType here. +typedef struct StrValType StrValType; I don't think that is needed. It would just duplicate typedefs that already exist elsewhere, depending on what StrValType is set to. 
+ boolparse_strval; + StrValType *strval; /* only used if parse_strval == true */ The parse_strval field could use a better explanation. I actually don't understand the need for this field. AFAICT, this is just used to record whether strval is valid. But in the cases where it's not valid, why do we need to record that? Couldn't you just return failed_oom in those cases?
Re: Add contrib/pg_logicalsnapinspect
On Thu, Aug 29, 2024 at 6:33 PM Bharath Rupireddy wrote: > > On Thu, Aug 29, 2024 at 3:44 PM Bertrand Drouvot > wrote: > > > > Yeah that's fair. And now I'm wondering if we need an extra module. I think > > we could "simply" expose 2 new functions in core, thoughts? > > > > > > What do you think? Did you have something else in mind? > > > > > > > > > > On similar lines, we can also provide a function to get the slot's > > > on-disk data. > > > > Yeah, having a way to expose the data from the disk makes fully sense to me. > > > > > IIRC, Bharath had previously proposed a tool to achieve > > > the same. It is fine if we don't want to add that as part of this > > > patch but I mentioned it because by having that we can have a set of > > > functions to view logical decoding data. > > > > That's right. I think this one would be simply enough to expose one or two > > functions in core too (and probably would not need an extra module). > > +1 for functions in core unless this extra module > pg_logicalsnapinspect works as a tool to be helpful even when the > server is down. > We have an example of pageinspect which provides low-level functions to aid debugging. The proposal for these APIs also seems to fall in the same category, so why go for the core for these functions? -- With Regards, Amit Kapila.
Re: why there is not VACUUM FULL CONCURRENTLY?
Hi, On Tue, Aug 27, 2024 at 8:01 PM Antonin Houska wrote: > > Attached is version 2, the feature itself is now in 0004. > > Unlike version 1, it contains some regression tests (0006) and a new GUC to > control how long the AccessExclusiveLock may be held (0007). > > Kirill Reshke wrote: > > > On Fri, 2 Aug 2024 at 11:09, Antonin Houska wrote: > > > > > > Kirill Reshke wrote: > > > > However, in general, the 3rd patch is really big, very hard to > > > > comprehend. Please consider splitting this into smaller (and > > > > reviewable) pieces. > > > > > > I'll try to move some preparation steps into separate diffs, but not sure > > > if > > > that will make the main diff much smaller. I prefer self-contained > > > patches, as > > > also explained in [3]. > > > > Thanks for sharing [3], it is a useful link. > > > > There is actually one more case when ACCESS EXCLUSIVE is held: during > > table rewrite (AT set TAM, AT set Tablespace and AT alter column type > > are some examples). > > This can be done CONCURRENTLY too, using the same logical replication > > approach, or do I miss something? > > Yes, the logical replication can potentially be used in other cases. > > > I'm not saying we must do it immediately, this should be a separate > > thread, but we can do some preparation work here. > > > > I can see that a bunch of functions which are currently placed in > > cluster.c can be moved to something like > > logical_rewrite_heap.c. ConcurrentChange struct and > > apply_concurrent_insert function is one example of such. > > > > So, if this is the case, 0003 patch can be splitted in two: > > The first one is general utility code for logical table rewrite > > The second one with actual VACUUM CONCURRENTLY feature. > > > What do you think? > > I can imagine moving the function process_concurrent_changes() and subroutines > to a different file (e.g. rewriteheap.c), but moving it into a separate diff > that does not contain any call of the function makes little sense to me. 
> Such a diff would not add any useful functionality and could not be considered
> refactoring either.
>
> So far I at least moved some code to separate diffs: 0003 and 0005. I'll move
> more if I find sensible opportunity in the future.
>
> --
> Antonin Houska
> Web: https://www.cybertec-postgresql.com

Thanks for working on this, I think this is a very useful feature.

The patch doesn't compile in the debug build, with errors:

../postgres/src/backend/commands/cluster.c: In function ‘get_catalog_state’:
../postgres/src/backend/commands/cluster.c:2771:33: error: declaration of ‘td_src’ shadows a previous local [-Werror=shadow=compatible-local]
 2771 | TupleDesc td_src, td_dst;
      | ^~
../postgres/src/backend/commands/cluster.c:2741:25: note: shadowed declaration is here
 2741 | TupleDesc td_src = RelationGetDescr(rel);

You forgot the meson build for pgoutput_cluster:

diff --git a/src/backend/meson.build b/src/backend/meson.build
index 78c5726814..0f9141a4ac 100644
--- a/src/backend/meson.build
+++ b/src/backend/meson.build
@@ -194,5 +194,6 @@ pg_test_mod_args = pg_mod_args + {
 subdir('jit/llvm')
 subdir('replication/libpqwalreceiver')
 subdir('replication/pgoutput')
+subdir('replication/pgoutput_cluster')

I noticed that you use lmode/lock_mode/lockmode. There are both lmode and lockmode in the codebase, but I remember someone proposed changing them all to lockmode, so how about sticking to lockmode in your patch?

0004:

+ sure that the old files do not change during the processing because the
+ chnages would get lost due to the swap.

typo

+ files. The data changes that took place during the creation of the new
+ table and index files are captured using logical decoding
+ () and applied before
+ the ACCESS EXCLUSIVE lock is requested. Thus the lock
+ is typically held only for the time needed to swap the files, which
+ should be pretty short.

I remember pg_squeeze also did some logical decoding after getting the exclusive lock; if that is still true, I guess the doc above is not precise.
+ Note that CLUSTER with the
+ the CONCURRENTLY option does not try to order the
+ rows inserted into the table after the clustering started.

Do you mean after the *logical decoding* started here? If CLUSTER CONCURRENTLY does not order rows at all, why bother implementing it?

+ errhint("CLUSTER CONCURRENTLY is only allowed for permanent relations")));

errhint messages should end with a dot. Also, why hardcode "CLUSTER CONCURRENTLY" instead of using the parameter *stmt*?

+ ResourceOwner oldowner = CurrentResourceOwner;
+
+ /*
+ * In the CONCURRENT case, do the planning in a subtrensaction so that

typo

I did not see VacuumStmt changes in gram.y, so how are we supposed to use VACUUM FULL CONCURRENTLY? I tried the following but no success.

[local] postgres@demo:5432-36097=# vacuum (concurrently) aircrafts_data;
ERROR: CONCURRENTLY can only be specified with VACUUM FULL
[local] postgres@demo:5432-36097=# vacuum full (c
Re: allowing extensions to control planner behavior
On Thu, Aug 29, 2024 at 6:49 PM Jeff Davis wrote: > I don't see that in the code yet, so I assume you are referring to the > comment at [1]? FYI, I'm hacking on a revised approach but it's not ready to show to other people yet. > I still like my idea to generalize the pathkey infrastructure, and > Robert asked for other approaches to consider. It would allow us to > hold onto multiple paths for longer, similar to pathkeys, which might > offer some benefits or simplifications. This is a fair point. I dislike the fact that add_path() is a thicket of if-statements that's actually quite hard to understand and easy to screw up when you're making modifications. But I feel like it would be difficult to generalize the infrastructure without making it substantially slower, which would probably cause too much of an increase in planning time to be acceptable. So my guess is that this is a dead end, unless there's a clever idea that I'm not seeing. -- Robert Haas EDB: http://www.enterprisedb.com
bt Scankey in another contradictory case
Hi,

While reading the source code of the nbtree part, I noticed another kind of contradictory case that PostgreSQL does not eliminate (e.g. x > 4 AND x < 3) in the function _bt_preprocess_keys in nbtutils.c. This causes an unnecessary index scan and qual evaluation.

It seems no one would write SQL like that, but such quals may be generated from equivalence classes or degenerate quals. The function is invoked only from _bt_first while initializing the index scan, so it is cheap to eliminate this case there.

We only need to pay attention to opposite operators, so there are four detailed cases (i.e. x > 4 AND x < 3, x > 4 AND x <= 3, x >= 4 AND x < 3, x >= 4 AND x <= 3). When testing whether a case is contradictory, we need to use the more restrictive operator when both the less and the more restrictive operators appear. Take x > 4 AND x <= 4: if we choose <=, then 4 <= 4 is satisfied, yet x > 4 AND x <= 4 is actually contradictory. When >= pairs with <=, or > with <, it is fine to pick either one.

I have added this feature in my local environment and tested it, but I am not sure it is worth spending more effort on.

bigbro...@hotmail.com
Re: bt Scankey in another contradictory case
Hi, this is the patch attachment.

From: b ro
Sent: August 30, 2024, 0:56
To: pgsql-hackers
Subject: bt Scankey in another contradictory case

patch.diff
Description: patch.diff
Re: Add contrib/pg_logicalsnapinspect
Hi,

On Fri, Aug 30, 2024 at 03:43:12PM +0530, Amit Kapila wrote:
> On Thu, Aug 29, 2024 at 6:33 PM Bharath Rupireddy wrote:
> >
> > On Thu, Aug 29, 2024 at 3:44 PM Bertrand Drouvot wrote:
> > >
> > > Yeah that's fair. And now I'm wondering if we need an extra module. I think
> > > we could "simply" expose 2 new functions in core, thoughts?
> > >
> > > > > What do you think? Did you have something else in mind?
> > > >
> > > > On similar lines, we can also provide a function to get the slot's
> > > > on-disk data.
> > >
> > > Yeah, having a way to expose the data from the disk makes fully sense to me.
> > >
> > > > IIRC, Bharath had previously proposed a tool to achieve
> > > > the same. It is fine if we don't want to add that as part of this
> > > > patch but I mentioned it because by having that we can have a set of
> > > > functions to view logical decoding data.
> > >
> > > That's right. I think this one would be simple enough to expose one or two
> > > functions in core too (and probably would not need an extra module).
> >
> > +1 for functions in core unless this extra module
> > pg_logicalsnapinspect works as a tool to be helpful even when the
> > server is down.
>
> We have an example of pageinspect which provides low-level functions
> to aid debugging. The proposal for these APIs also seems to fall in
> the same category,

That's right, but...

> so why go for the core for these functions?

As we decided not to expose the SnapBuildOnDisk and SnapBuild structs publicly and to create/expose 2 new functions in snapbuild.c, the functions in the module would do nothing but expose the data coming from snapbuild.c's functions (get the tuple and send it to the client). It sounds weird to me to create a module that would "only" do that, which is why I thought that in-core functions taking care of everything make more sense.
Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: pgstattuple: fix free space calculation
On 8/29/24 4:53 PM, Frédéric Yhuel wrote:
> So I think we should just use PageGetExactFreeSpace().

I agree. I feel that is the least surprising behavior, because we currently sum tiny amounts of free space that are unusable anyway. E.g. imagine one million pages with 10 free bytes each: that looks like 10 free MB, so I do not see why we should treat the max-tuples-per-page case with any special logic.

Andreas
Re: define PG_REPLSLOT_DIR
Hi, On Fri, Aug 30, 2024 at 03:34:56PM +0900, Michael Paquier wrote: > On Tue, Aug 20, 2024 at 04:23:06PM +, Bertrand Drouvot wrote: > > Please find attached v3 that: > > > > - takes care of your comments (and also removed the use of PG_TBLSPC_DIR in > > RELATIVE_PG_TBLSPC_DIR). > > - removes the new macros from the comments (see Michael's and Yugo-San's > > comments in [0] resp. [1]). > > - adds a missing sizeof() (see [1]). > > - implements Ashutosh's idea of adding a new SLOT_DIRNAME_ARGS (see [2]). > > It's > > done in 0002 (I used REPLSLOT_DIR_ARGS though). > > - fixed a macro usage in ReorderBufferCleanupSerializedTXNs() (was not at > > the > > right location, discovered while implementing 0002). > > I was looking at that, and applied 0001 for pg_replslot and merged > 0003~0005 together for pg_logical. Thanks! > In 0006, I am not sure that RELATIVE_PG_TBLSPC_DIR is really something > we should have. Sure, that's a special case for base backups when > sending a directory, but we have also pg_wal/ and its XLOGDIR so > that's inconsistent. That's right but OTOH there is no (for good reason) PG_WAL_DIR or such defined. That said, I don't have a strong opinion on this one, I think that also makes sense to leave it as it is. Please find attached v4 doing so. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com >From 4736e0913933284bbcbb81228d4a1ea4ce1fb9da Mon Sep 17 00:00:00 2001 From: Bertrand Drouvot Date: Mon, 19 Aug 2024 07:40:10 + Subject: [PATCH v4] Define PG_TBLSPC_DIR Replace most of the places where "pg_tblspc" is used in .c files with a new PG_TBLSPC_DIR define. The places where it is not done is for consistency with the existing PG_STAT_TMP_DIR define. 
--- src/backend/access/transam/xlog.c | 12 - src/backend/access/transam/xlogrecovery.c | 14 +- src/backend/backup/backup_manifest.c| 3 ++- src/backend/backup/basebackup.c | 2 +- src/backend/commands/dbcommands.c | 2 +- src/backend/commands/tablespace.c | 4 +-- src/backend/storage/file/fd.c | 29 +++-- src/backend/storage/file/reinit.c | 10 +++ src/backend/utils/adt/dbsize.c | 8 +++--- src/backend/utils/adt/misc.c| 4 +-- src/backend/utils/cache/relcache.c | 4 +-- src/bin/pg_checksums/pg_checksums.c | 6 ++--- src/bin/pg_combinebackup/pg_combinebackup.c | 18 ++--- src/bin/pg_rewind/file_ops.c| 2 +- src/bin/pg_upgrade/exec.c | 2 +- src/common/file_utils.c | 3 ++- src/common/relpath.c| 20 +++--- src/fe_utils/astreamer_file.c | 7 +++-- src/include/common/relpath.h| 7 + 19 files changed, 85 insertions(+), 72 deletions(-) 15.5% src/backend/access/transam/ 3.3% src/backend/backup/ 4.4% src/backend/commands/ 26.8% src/backend/storage/file/ 8.8% src/backend/utils/adt/ 4.2% src/bin/pg_checksums/ 11.7% src/bin/pg_combinebackup/ 13.3% src/common/ 3.6% src/fe_utils/ 7.8% src/ diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index ee0fb0e28f..4e06d86196 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -8944,10 +8944,10 @@ do_pg_backup_start(const char *backupidstr, bool fast, List **tablespaces, datadirpathlen = strlen(DataDir); /* Collect information about all tablespaces */ - tblspcdir = AllocateDir("pg_tblspc"); - while ((de = ReadDir(tblspcdir, "pg_tblspc")) != NULL) + tblspcdir = AllocateDir(PG_TBLSPC_DIR); + while ((de = ReadDir(tblspcdir, PG_TBLSPC_DIR)) != NULL) { - char fullpath[MAXPGPATH + 10]; + char fullpath[MAXPGPATH + sizeof(PG_TBLSPC_DIR)]; char linkpath[MAXPGPATH]; char *relpath = NULL; char *s; @@ -8970,7 +8970,7 @@ do_pg_backup_start(const char *backupidstr, bool fast, List **tablespaces, if (*badp != '\0' || errno == EINVAL || errno == ERANGE) continue; - snprintf(fullpath, 
sizeof(fullpath), "pg_tblspc/%s", de->d_name); + snprintf(fullpath, sizeof(fullpath), "%s/%s", PG_TBLSPC_DIR, de->d_name); de_type = get_dirent_type(fullpath, de, false, ERROR); @@ -9031,8 +9031,8 @@ do_pg_backup_start(const char *backupidstr, bool fast, List **tablespaces, * In this case, we store a relative path rather than an * absolute path into the tablespaceinfo. */ -snprintf(linkpath, sizeof(linkpath), "pg_tblspc/%s", - de->d_name); +snprintf(linkpath, sizeof(linkpath), "%s/%s", + PG_TBLSPC_DIR, de->d_name); relpath = pstrdup(linkpath); } else diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c index ad817fbca6..5d2755ef79 100644 --- a/src/backend/access/transam/xlogrecovery.c +++ b/src/backend/access
Re: JIT: Remove some unnecessary instructions.
On 8/30/24 5:55 AM, Xing Guo wrote:
> I find there are some unnecessary load/store instructions being emitted by the JIT compiler.

Well spotted! All of these are obvious dead instructions, and while LLVM might be able to optimize them away, there is no reason to create extra work for the optimizer.

The patch looks good, applies cleanly, and the tests pass.

Andreas
Re: bt Scankey in another contradictory case
On Fri, Aug 30, 2024 at 7:36 AM b ro wrote: > this is the patch attachment. We discussed this recently: https://www.postgresql.org/message-id/80384.1715458896%40sss.pgh.pa.us I think that we should do this. It doesn't make a huge difference in practice, because we'll still end the scan once the leaf level is reached. But it matters more when array keys are involved, where there might be more than one descent to the leaf level. Plus we might as well just be thorough about this stuff. -- Peter Geoghegan
Re: Segfault in jit tuple deforming on arm64 due to LLVM issue
I created a commitfest entry[1] to have the CI test the patch. There was a failure in headerscheck and cpluspluscheck when the include of SectionMemoryManager.h is checked[2]:

In file included from /usr/include/llvm/ADT/SmallVector.h:18,
                 from /tmp/cirrus-ci-build/src/include/jit/SectionMemoryManager.h:23,
                 from /tmp/headerscheck.4b1i5C/test.c:2:
/usr/include/llvm/Support/type_traits.h:17:10: fatal error: type_traits: No such file or directory
   17 | #include <type_traits>

Since SmallVector.h includes type_traits, this file can't be compiled with a C compiler, so I've just excluded it from headerscheck.

Loosely related to headerscheck, running it locally was failing as it couldn't find the file. This is because headerscheck expects LLVM include files to be in /usr/include and doesn't rely on llvm-config. I created a second patch to use the LLVM_CPPFLAGS as extra flags when testing the src/include/jit/* files.

Lastly, I've used www.github.com instead of github.com links to stop spamming the llvm-project's PR with a reference to the commit every time it is pushed somewhere (which seems to be the unofficial hack[3]).

[1] https://commitfest.postgresql.org/49/5220/
[2] https://cirrus-ci.com/task/4646639124611072?logs=headers_headerscheck#L42-L46
[3] https://github.com/orgs/community/discussions/23123#discussioncomment-3239240

v6-0002-Add-LLVM_CPPFLAGS-in-headerscheck-to-llvm-jit-fil.patch Description: Binary data
v6-0001-Backport-of-LLVM-code-to-fix-ARM-relocation-bug.patch Description: Binary data
Re: Restart pg_usleep when interrupted
> From this discussion, there is desire for a sleep function that:
> 1/ Sleeps for the full duration of the requested time
> 2/ Continues to handle important interrupts during the sleep
>
> While something like CF 5118 will take care of point #1, I'm not sure, even
> with CF entry 5118, nanosleep() could be interrupted. But I agree that the
> leader won't be interrupted by PqMsg_Progress anymore.

Correct.

> 1. we should still implement the "1 Hz" stuff as 1.1/ it could be useful if CF
> 5118 gets committed and we move to WaitLatchUs() and 2.2/ it won't break anything
> if CF gets committed and we don't move to WaitLatchUs(). For 1.1 it would still
> prevent the leader to be waked up too frequently by the parallel workers.

Yes, regardless of any changes that may occur in the future that change the behavior of pg_usleep, preventing a leader from being woken up too frequently is good to have. The "1 Hz" work is still good to have.

> 2. still discuss the "need" and get a consensus regarding a sleep that could
> ensure the wait duration (in some cases and depending of the reason why the
> process is waked up).

Unless there is objection, I will withdraw the CF [1] entry for this patch next week. This discussion however should be one of the points that CF 5118 must deal with. That is, 5118 will change the behavior of pg_usleep when dealing with interrupts previously signaled by SIGUSR1.

[1] https://commitfest.postgresql.org/49/5161/

Regards,

Sami
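A minimal sketch of point #1 above (sleeping for the full requested duration even when the underlying sleep returns early). This is illustrative Python, not the proposed patch; a C implementation would instead loop on nanosleep(), re-issuing it with the remaining time whenever it fails with EINTR, so that interrupts are still serviced but the total wait holds:

```python
import time

def sleep_full(duration_s):
    """Sleep for at least duration_s, restarting after any early wakeup.

    Sketch of the 'restart the sleep when interrupted' idea: recompute the
    remainder against a deadline and keep sleeping until it has passed.
    """
    deadline = time.monotonic() + duration_s
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return
        time.sleep(remaining)  # may wake early; the loop recomputes the remainder

start = time.monotonic()
sleep_full(0.05)
elapsed = time.monotonic() - start
print(elapsed >= 0.05)  # True: the full requested duration was slept
```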
Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)
On Wed, 28 Aug 2024 at 15:53, Peter Geoghegan wrote: > On Wed, Aug 28, 2024 at 9:49 AM Robert Haas wrote: > > On Wed, Aug 28, 2024 at 9:41 AM Peter Geoghegan wrote: > > > On Wed, Aug 28, 2024 at 9:35 AM Robert Haas wrote: > > > > > If you think it's important to have this info on all indexes then I'd > > > > > prefer the pgstat approach over adding a field in IndexScanDescData. > > > > > If instead you think that this is primarily important to expose for > > > > > nbtree index scans, then I'd prefer putting it in the BTSO using e.g. > > > > > the index AM analyze hook approach, as I think that's much more > > > > > elegant than this. > > > > > > > > I agree with this analysis. I don't see why IndexScanDesc would ever > > > > be the right place for this. > > > > > > Then what do you think is the right place? > > > > The paragraph that I agreed with and quoted in my reply, and that you > > then quoted in your reply to me, appears to me to address that exact > > question. > > Are you talking about adding global counters, in the style of pgBufferUsage? My pgstat approach would be that ExecIndexScan (plus ExecIOS and ExecBitmapIS) could record the current state of relevant fields from node->ss.ss_currentRelation->pgstat_info, and diff them with the recorded values at the end of that node's execution, pushing the result into e.g. Instrumentation; diffing which is similar to what happens in InstrStartNode() and InstrStopNode() but for the relation's pgstat_info instead of pgBufferUsage and pgWalUsage. Alternatively this could happen in ExecProcNodeInstr, but it'd need some more special-casing to make sure it only addresses (index) relation scan nodes. By pulling the stats directly from Relation->pgstat_info, no catalog index scans are counted if they aren't also the index which is subject to that [Bitmap]Index[Only]Scan. In effect, this would need no changes in AM code, as this would "just" pull the data from those statistics which are already being updated in AM code. 
Kind regards, Matthias van de Meent Neon (https://neon.tech)
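The snapshot-and-diff approach described above (analogous to what InstrStartNode()/InstrStopNode() do for pgBufferUsage and pgWalUsage) can be sketched as follows. The counter names are illustrative stand-ins, not actual pgstat_info fields:

```python
# Sketch: record the relevant counters when a node starts executing, then
# diff them when it stops, attributing the delta to that node.
stats = {"tuples_fetched": 0, "blocks_hit": 0}   # shared, updated by "AM code"

def start_node():
    return dict(stats)                            # snapshot at node start

def stop_node(snapshot):
    # per-node delta, pushed into e.g. Instrumentation in the real design
    return {k: stats[k] - snapshot[k] for k in stats}

snap = start_node()
stats["tuples_fetched"] += 42                     # work done while this node ran
stats["blocks_hit"] += 7
delta = stop_node(snap)
print(delta)  # {'tuples_fetched': 42, 'blocks_hit': 7}
```

Because the snapshot is taken from the scanned relation's own counters, only that relation's activity is attributed to the node, which is the point made above about catalog index scans not being counted.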
Jargon and acronyms on this mailing list
I normally wouldn't mention my blog entries here, but this one was about the hackers mailing list, so wanted to let people know about it in case you don't follow Planet Postgres. I scanned the last year's worth of posts and gathered the most used acronyms and jargon. The most commonly used acronym was IMO (in my opinion), followed by FWIW (for what it's worth), and IIUC (if I understand correctly). The complete list can be found in the post below, I'll refrain from copying everything here. https://www.crunchydata.com/blog/understanding-the-postgres-hackers-mailing-list Cheers, Greg
Re: Make printtup a bit faster
On 8/29/24 1:51 PM, David Rowley wrote:
> I had planned to work on this for PG18, but I'd be happy for some assistance if you're willing.

I am interested in working on this, unless Andy Fan wants to do this work. :)

I believe that optimizing the out, in and send functions would be worth the pain. I get Tom's objections, but I do not think adding a small check would add much overhead compared to the gains we can get. And given that all of in, out and send could be optimized, I do not like the idea of duplicating all three in the catalog.

David, have you given any thought on the cleanest way to check for if the new API or the old is to be used for these functions? If not I can figure out something myself, just wondering if you already had something in mind.

Andreas
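One shape the "small check" could take is a per-call dispatch with a fallback: prefer a new-style function when the type provides one, otherwise use the old API. This is a loose sketch with made-up names, not the actual catalog or fmgr interfaces:

```python
# Sketch: dispatch to a new-style output function when available, falling
# back to the old one. "out" / "out_v2" are hypothetical slot names.
def output_value(type_info, value, buf):
    new_out = type_info.get("out_v2")        # new API writes into a buffer
    if new_out is not None:
        new_out(value, buf)                  # skips the intermediate string
    else:
        buf.append(type_info["out"](value))  # old API returns a string

int_type = {"out": lambda v: str(v)}                                  # old-only
fast_int_type = {"out": str, "out_v2": lambda v, b: b.append(str(v))} # both

buf = []
output_value(int_type, 42, buf)
output_value(fast_int_type, 7, buf)
print(buf)  # ['42', '7']
```

The check itself is one branch per value, which is the kind of small per-call cost being weighed against duplicating all three functions in the catalog.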
Re: Inline non-SQL SRFs using SupportRequestSimplify
On 7/26/24 11:58, Tom Lane wrote: > Heikki Linnakangas writes: >> On 28/06/2024 01:01, Paul Jungwirth wrote: >>> Another approach I considered is using a separate support request, e.g. SupportRequestInlineSRF, and >>> just calling it from inline_set_returning_function. I didn't like having two support requests that >>> did almost exactly the same thing. OTOH my current approach means you'll get an error if you do this: >>> >>> ``` >>> postgres=# select temporal_semijoin('employees', 'id', 'valid_at', 'positions', 'employee_id', >>> 'valid_at'); >>> ERROR: unrecognized node type: 66 >>> ``` >>> >>> I'll look into ways to fix that. > > I like this idea, but I like exactly nothing about this implementation. > The right thing is to have a separate SupportRequestInlineSRF request > that is called directly by inline_set_returning_function. Here are new patches using a new SupportRequestInlineSRF request type. They include patches and documentation. The patches handle this: SELECT * FROM srf(); but not this: SELECT srf(); In the latter case, Postgres always calls the function in "materialized mode" and gets the whole result up front, so inline_set_returning_function is never called, even for SQL functions. For tests I added a `foo_from_bar(colname, tablename, filter)` PL/pgSQL function that does `SELECT $colname FROM $tablename [WHERE $colname = $filter]`, then the support function generates the same SQL and turns it into a Query node. This matches how I want to use the feature for my temporal_semijoin etc functions. If you give a non-NULL filter, you get a Query with a Var node, so we are testing something that isn't purely Const. The SupportRequestSimplify type has some comments about supporting operators, but I don't think you can have a set-returning operator, so I didn't repeat those comments for this new type. I split things up into three patch files because I couldn't get git to gracefully handle shifting a large block of code into an if statement. 
The first two patches have no changes except that indentation (and initializing one variable to NULL). They aren't meant to be committed separately. Rebased to a83a944e9f. Yours, -- Paul ~{:-) p...@illuminatedcomputing.comFrom 591de3a06ec97cf1bff10d603946f0bac176b9d9 Mon Sep 17 00:00:00 2001 From: "Paul A. Jungwirth" Date: Thu, 29 Aug 2024 18:17:55 -0700 Subject: [PATCH v2 1/3] Add indented section --- src/backend/optimizer/util/clauses.c | 74 +++- 1 file changed, 73 insertions(+), 1 deletion(-) diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c index b4e085e9d4b..ef8282ded49 100644 --- a/src/backend/optimizer/util/clauses.c +++ b/src/backend/optimizer/util/clauses.c @@ -5069,7 +5069,7 @@ inline_set_returning_function(PlannerInfo *root, RangeTblEntry *rte) TupleDesc rettupdesc; List *raw_parsetree_list; List *querytree_list; - Query *querytree; + Query *querytree = NULL; Assert(rte->rtekind == RTE_FUNCTION); @@ -5235,6 +5235,78 @@ inline_set_returning_function(PlannerInfo *root, RangeTblEntry *rte) goto fail; querytree = linitial(querytree_list); } + if (!querytree) + { + /* Fetch the function body */ + tmp = SysCacheGetAttrNotNull(PROCOID, func_tuple, Anum_pg_proc_prosrc); + src = TextDatumGetCString(tmp); + + /* + * Setup error traceback support for ereport(). This is so that we can + * finger the function that bad information came from. 
+ */ + callback_arg.proname = NameStr(funcform->proname); + callback_arg.prosrc = src; + + sqlerrcontext.callback = sql_inline_error_callback; + sqlerrcontext.arg = (void *) &callback_arg; + sqlerrcontext.previous = error_context_stack; + error_context_stack = &sqlerrcontext; + + /* If we have prosqlbody, pay attention to that not prosrc */ + tmp = SysCacheGetAttr(PROCOID, + func_tuple, + Anum_pg_proc_prosqlbody, + &isNull); + if (!isNull) + { + Node *n; + + n = stringToNode(TextDatumGetCString(tmp)); + if (IsA(n, List)) +querytree_list = linitial_node(List, castNode(List, n)); + else +querytree_list = list_make1(n); + if (list_length(querytree_list) != 1) +goto fail; + querytree = linitial(querytree_list); + + /* Acquire necessary locks, then apply rewriter. */ + AcquireRewriteLocks(querytree, true, false); + querytree_list = pg_rewrite_query(querytree); + if (list_length(querytree_list) != 1) +goto fail; + querytree = linitial(querytree_list); + } + else + { + /* + * Set up to handle parameters while parsing the function body. We + * can use the FuncExpr just created as the input for + * prepare_sql_fn_parse_info. + */ + pinfo = prepare_sql_fn_parse_info(func_tuple, + (Node *) fexpr, + fexpr->inputcollid); + + /* + * Parse, analyze, and rewrite (unlike inline_function(), we can't + * skip rewriting here). We can fail as soon as we find m
Re: allowing extensions to control planner behavior
On Fri, 2024-08-30 at 07:33 -0400, Robert Haas wrote: > This is a fair point. I dislike the fact that add_path() is a thicket > of if-statements that's actually quite hard to understand and easy to > screw up when you're making modifications. But I feel like it would > be > difficult to generalize the infrastructure without making it > substantially slower, which would probably cause too much of an > increase in planning time to be acceptable. So my guess is that this > is a dead end, unless there's a clever idea that I'm not seeing. As far as performance goes, I'm only looking at branch in add_path() that calls compare_pathkeys(). Do you have some example queries which would be a worst case for that path? In general if you can post some details about how you are measuring, that would be helpful. Regards, Jeff Davis
Re: PG_TEST_EXTRA and meson
On Wed, Aug 28, 2024 at 8:21 AM Nazir Bilal Yavuz wrote: > I do not exactly remember the reason but I think I copied the same > behavior as before, PG_TEST_EXTRA variable was checked in the > src/test/Makefile so I exported it there. Okay, give v3 a try then. This exports directly from Makefile.global. Since that gets pulled into a bunch of places, the scope is a lot wider than it used to be; I've disabled it for PGXS so it doesn't end up poisoning other extensions. --Jacob v3-make_test_extra.diff Description: Binary data
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
Alexander Lakhin writes: > Let me show you another related anomaly, which drongo kindly discovered > recently: [1]. That test failed with: > SELECT dblink_cancel_query('dtest1'); > - dblink_cancel_query > -- > - OK > + dblink_cancel_query > +-- > + cancel request timed out > (1 row) While we're piling on, has anyone noticed that *non* Windows buildfarm animals are also failing this test pretty frequently? The most recent occurrence is at [1], and it looks like this: diff -U3 /home/bf/bf-build/mylodon/HEAD/pgsql/contrib/postgres_fdw/expected/query_cancel.out /home/bf/bf-build/mylodon/HEAD/pgsql.build/testrun/postgres_fdw/regress/results/query_cancel.out --- /home/bf/bf-build/mylodon/HEAD/pgsql/contrib/postgres_fdw/expected/query_cancel.out 2024-07-22 11:09:50.638133878 + +++ /home/bf/bf-build/mylodon/HEAD/pgsql.build/testrun/postgres_fdw/regress/results/query_cancel.out 2024-08-30 06:28:01.971083945 + @@ -17,4 +17,5 @@ SET LOCAL statement_timeout = '10ms'; select count(*) from ft1 CROSS JOIN ft2 CROSS JOIN ft4 CROSS JOIN ft5; -- this takes very long ERROR: canceling statement due to statement timeout +WARNING: could not get result of cancel request due to timeout COMMIT; I trawled the buildfarm database for other occurrences of "could not get result of cancel request" since this test went in. I found 34 of them (see attachment), and none that weren't the timeout flavor. Most of the failing machines are not especially slow, so even though the hard-wired 30 second timeout that's being used here feels a little under-engineered, I'm not sure that arranging to raise it would help. My spidey sense feels that there's some actual bug here, but it's hard to say where. 
mylodon's postmaster log confirms that the 30 seconds did elapse, and that there wasn't anything much else going on: 2024-08-30 06:27:31.926 UTC client backend[3668381] pg_regress/query_cancel ERROR: canceling statement due to statement timeout 2024-08-30 06:27:31.926 UTC client backend[3668381] pg_regress/query_cancel STATEMENT: select count(*) from ft1 CROSS JOIN ft2 CROSS JOIN ft4 CROSS JOIN ft5; 2024-08-30 06:28:01.946 UTC client backend[3668381] pg_regress/query_cancel WARNING: could not get result of cancel request due to timeout Any thoughts? regards, tom lane [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mylodon&dt=2024-08-30%2006%3A25%3A46 sysname|branch | snapshot | stage | l ---+---+-++- flaviventris | HEAD | 2024-04-02 23:58:01 | postgres_fdwInstallCheck-C | +WARNING: could not get result of cancel request due to timeout calliphoridae | HEAD | 2024-04-11 01:54:17 | postgres_fdwInstallCheck-C | +WARNING: could not get result of cancel request due to timeout kestrel | REL_17_STABLE | 2024-07-13 04:15:25 | postgres_fdwInstallCheck-C | +WARNING: could not get result of cancel request due to timeout sarus | REL_17_STABLE | 2024-07-20 15:02:23 | ContribCheck-C | 2024-07-20 15:10:15.547 UTC [613791:141] pg_regress/postgres_fdw WARNING: could not get result of cancel request due to timeout sarus | REL_17_STABLE | 2024-07-20 15:02:23 | ContribCheck-C | +WARNING: could not get result of cancel request due to timeout mylodon | HEAD | 2024-07-26 13:15:09 | postgres_fdwInstallCheck-C | +WARNING: could not get result of cancel request due to timeout adder | HEAD | 2024-07-02 00:24:10 | postgres_fdwCheck | +WARNING: could not get result of cancel request due to timeout adder | HEAD | 2024-07-02 00:24:10 | postgres_fdwCheck | 2024-07-02 00:33:45.770 UTC client backend[1036068] pg_regress/postgres_fdw WARNING: could not get result of cancel request due to timeout calliphoridae | REL_17_STABLE | 2024-07-16 22:45:08 | postgres_fdwInstallCheck-C | +WARNING: 
could not get result of cancel request due to timeout mylodon | HEAD | 2024-08-09 05:25:24 | postgres_fdwCheck | +WARNING: could not get result of cancel request due to timeout mylodon | HEAD | 2024-08-09 05:25:24 | postgres_fdwCheck | 2024-08-09 05:28:21.990 UTC client backend[1775253] pg_regress/query_cancel WARNING: could not get result of cancel request due to timeout mylodon | HEAD | 2024-04-24 07:12:22 | postgres_fdwInstallCheck-C | +WARNING: could not get result of cancel request due to timeout skink | HEAD | 2024-07-20 20
Re: Avoiding superfluous buffer locking during nbtree backwards scans
On Mon, 19 Aug 2024 at 13:43, Matthias van de Meent wrote:
> On Sun, 11 Aug 2024 at 21:44, Peter Geoghegan wrote:
> > On Tue, Aug 6, 2024 at 6:31 PM Matthias van de Meent wrote:
> > > +1, LGTM.
> > >
> > > This changes the backward scan code in _bt_readpage to have an
> > > approximately equivalent handling as the forward scan case for
> > > end-of-scan cases, which is an improvement IMO.
>
> Here's a new patch that further improves the situation, so that we
> don't try to re-lock the buffer we just accessed when we're stepping
> backward in index scans, reducing buffer lock operations in the common
> case by 1/2.

Attached is an updated version of the patch, now v2, which fixes some assertion failures for parallel plans by passing the correct parameters to _bt_parallel_release for forward scans. With the test setup below, it unifies the number of buffer accesses between forward and backward scans:

CREATE TABLE test AS SELECT generate_series(1, 100) as num, '' j;
CREATE INDEX ON test (num);
VACUUM (FREEZE) test;
SET enable_seqscan = off;
SET max_parallel_workers_per_gather = 0;
EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(j ORDER BY num DESC) -- or ASC
FROM test;

The test case will have an Index Scan, which in the DESC case is backward. Without this patch, the IS will have 7160 accesses for the ASC ordering, but 9892 in the DESC case (an increase of 2732, approximately equivalent to the number of leaf pages in the index), while with this patch, the IndexScan will have 7160 buffer accesses for both ASC and DESC ordering.

In my previous mail I used buffer lock stats from an index-only scan as proof of the patch working. It's been pointed out to me that an IndexScan is easier to extract this data from, as it drops the pin on the page after getting some results from a page.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

v2-0001-Avoid-unneeded-nbtree-backwards-scan-buffer-locks.patch Description: Binary data
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Fri, Aug 30, 2024, 21:21 Tom Lane wrote: > > While we're piling on, has anyone noticed that *non* Windows buildfarm > animals are also failing this test pretty frequently? > Any thoughts? > Yes. Fixes are here (see the ~10 emails above in the thread for details): https://www.postgresql.org/message-id/cageczqqo8cn2rw45xuymvzxet7-bcruuzufmbgqc3ue...@mail.gmail.com They don't apply anymore after the change to move this test to a dedicated file. It shouldn't be too hard to update those patches though. I'll try to do that in a few weeks when I'm back behind my computer. But feel free to commit something earlier. >
Re: pl/pgperl Patch for adding $_FN detail just like triggers have for $_TD
On 2024-08-29 Th 5:50 PM, Mark Murawski wrote: On 8/29/24 16:54, Andrew Dunstan wrote: On 2024-08-29 Th 1:01 PM, Mark Murawski wrote: On 8/29/24 11:56, Andrew Dunstan wrote: On 2024-08-28 We 5:53 PM, Mark Murawski wrote: Hi Hackers! This would be version v1 of this feature Basically, the subject says it all: pl/pgperl Patch for being able to tell which function you're in. This is a hashref so it will be possible to populate new and exciting other details in the future as the need arises This also greatly improves logging capabilities for things like catching warnings, Because as it stands right now, there's no information that can assist with locating the source of a warning like this: # tail -f /var/log/postgresql.log *** GOT A WARNING - Use of uninitialized value $prefix in concatenation (.) or string at (eval 531) line 48. Now, with $_FN you can do this: CREATE OR REPLACE FUNCTION throw_warning() RETURNS text LANGUAGE plperlu AS $function$ use warnings; use strict; use Data::Dumper; $SIG{__WARN__} = sub { elog(NOTICE, Dumper($_FN)); print STDERR "In Function: $_FN->{name}: $_[0]\n"; }; my $a; print "$a"; # uninit! return undef; $function$ ; This patch is against 12 which is still our production branch. This could easily be also patched against newer releases as well. I've been using this code in production now for about 3 years, it's greatly helped track down issues. And there shouldn't be anything platform-specific here, it's all regular perl API I'm not sure about adding testing. This is my first postgres patch, so any guidance on adding regression testing would be appreciated. 
The rationale for this has come from the need to know the source function name, and we've typically resorted to things like this in the past:

CREATE OR REPLACE FUNCTION throw_warning() RETURNS text LANGUAGE plperlu AS $function$
my $function_name = 'throw_warning';
$SIG{__WARN__} = sub { print STDERR "In Function: $function_name: $_[0]\n"; }
$function$ ;

We've literally had to copy/paste this all over, and it's something that postgres should just 'give you' since it knows the name already, just like when triggers pass you $_TD with all the pertinent information.

A wishlist item would be for postgres plperl to automatically prepend the function name and schema when throwing perl warnings so you don't have to do your own __WARN__ handler, but this is the next best thing.

I'm not necessarily opposed to this, but the analogy to $_TD isn't really apt. You can't know the trigger data at compile time, whereas you can know the function's name at compile time, using just the mechanism you find irksome. And if we're going to do it for plperl, shouldn't we do it for other PLs?

Hi Andrew,

Thanks for the feedback.

1) Why is this not similar to _TD? It literally operates identically. At run-time it passes you $_TD for triggers. Same here for functions. This is all run-time. What exactly is the issue you're trying to point out?

It's not the same as the trigger data case because the function name is knowable at compile time, as in fact you have demonstrated. You just find it a bit inconvenient to have to code for that knowledge. By contrast, trigger data is ONLY knowable at run time. But I don't see that it is such a heavy burden to have to write my $funcname = "foo"; or similar in your function.

cheers

andrew

-- Andrew Dunstan EDB: https://www.enterprisedb.com

Understood, regarding knowability. Trigger data is definitely going to be very dynamic in that regard. No, it's not a heavy burden to hard code the function name.
But what my ideal goal would be is this:

CREATE OR REPLACE FUNCTION throw_warning() RETURNS text LANGUAGE plperlu AS $function$
use 'PostgresWarnHandler'; # <--- imagine this registers a WARN handler and outputs $_FN->{name} for you as part of the final warning
my $a;
print $a;
etc

and then there's nothing 'hard coded' regarding the name of the function, anywhere. It just seems nonsensical that postgres plperl can't send you the name of your registered function, and there's absolutely no way to get it other than hard coding the name (redundantly, considering you already know the name when you're defining the function in the first place).

But even better would be catching the warn at the plperl level, prepending the function name to the warn message, and then you only need:

CREATE OR REPLACE FUNCTION throw_warning() RETURNS text LANGUAGE plperlu AS $function$
my $a;
print $a;
etc

And then in this hypothetical situation -- magic ensues, and you're left with this:

# tail -f /var/log/postgresql.log
*** GOT A WARNING - Use of uninitialized value $a in concatenation (.) or string in function public.throw_warning() line 1

-- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: allowing extensions to control planner behavior
On Fri, Aug 30, 2024 at 1:42 PM Jeff Davis wrote: > As far as performance goes, I'm only looking at branch in add_path() > that calls compare_pathkeys(). Do you have some example queries which > would be a worst case for that path? I think we must be talking past each other somehow. It seems to me that your scheme would require replacing that branch with something more complicated or generalized. If it doesn't, then I don't understand the proposal. If it does, then that seems like it could be a problem. > In general if you can post some details about how you are measuring, > that would be helpful. I'm not really measuring anything at this point, just recalling the many previous times when add_path() has been discussed as a pain point. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Pgstattuple on Sequences: Seeking Community Feedback on Potential Patch
On Thu, Aug 29, 2024 at 1:36 PM Nathan Bossart wrote: > Thanks. Robert, do you have any concerns with this? I don't know if I'm exactly concerned but I don't understand what problem we're solving, either. I thought Ayush said that the function wouldn't produce useful results for sequences; so then why do we need to change the code to enable it? -- Robert Haas EDB: http://www.enterprisedb.com
Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
On 2024-08-29 Th 4:44 PM, Jacob Champion wrote: As for the other patches, I'll ping Andrew about 0001, Patch 0001 looks sane to me. cheers andrew -- Andrew Dunstan EDB:https://www.enterprisedb.com
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
Jelte Fennema-Nio writes: > On Fri, Aug 30, 2024, 21:21 Tom Lane wrote: >> While we're piling on, has anyone noticed that *non* Windows buildfarm >> animals are also failing this test pretty frequently? > Yes. Fixes are here (see the ~10 emails above in the thread for details): > https://www.postgresql.org/message-id/cageczqqo8cn2rw45xuymvzxet7-bcruuzufmbgqc3ue...@mail.gmail.com Hmm. I'm not convinced that 0001 is an actual *fix*, but it should at least reduce the frequency of occurrence a lot, which'd help. I don't want to move the test case to where you propose, because that's basically not sensible. But can't we avoid remote estimates by just cross-joining ft1 to itself, and not using the tables for which remote estimate is enabled? I think 0002 is probably outright wrong, or at least the change to disable_statement_timeout is. Once we get to that, we don't want to throw a timeout error any more, even if an interrupt was received just before it. regards, tom lane
Re: pl/pgperl Patch for adding $_FN detail just like triggers have for $_TD
On 2024-08-30 Fr 3:49 PM, Andrew Dunstan wrote:

Sorry for empty reply. Errant finger hit send.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Add tests for PL/pgSQL SRFs
Hello Hackers,

While working on inlining non-SQL SRFs [1] I noticed we don't have tests for when a PL/pgSQL function requires materialize mode but doesn't have a result TupleDesc. Here is a patch adding tests for that, as well as some other conditions around SRF calls with `SETOF RECORD` vs `TABLE (...)`.

There aren't any code changes, just some new tests. But IMO it might be better to change the code. This error message is a bit confusing:

+-- materialize mode requires a result TupleDesc:
+select array_to_set2(array['one', 'two']); -- fail
+ERROR:  materialize mode required, but it is not allowed in this context
+CONTEXT:  PL/pgSQL function array_to_set2(anyarray) line 3 at RETURN QUERY

Perhaps it would be better to give the same error as here?:

+select * from array_to_set2(array['one', 'two']); -- fail
+ERROR:  a column definition list is required for functions returning "record"
+LINE 1: select * from array_to_set2(array['one', 'two']);

If folks agree, I can work on a patch for that. Otherwise, at least this patch documents the current behavior and increases coverage.

[1] https://commitfest.postgresql.org/49/5083/

Yours,

--
Paul ~{:-)
p...@illuminatedcomputing.com

From 763db868d2dc121f8745ba9f90e4f737c135bcaa Mon Sep 17 00:00:00 2001
From: "Paul A. Jungwirth"
Date: Wed, 28 Aug 2024 19:44:02 -0700
Subject: [PATCH v1] Add tests for PL/pgSQL Set-Returning Functions

These tests exercise SRFs with and without a result TupleDesc (in other
words, RETURNS TABLE (...) vs RETURNS SETOF RECORD). We can only support
materialize mode in the former case. On the other hand, that case rejects
a column definition list as redundant. The position of these tests shows
the contrast with SQL functions (tested above), which support SETOF RECORD
in more places.
---
 src/test/regress/expected/rangefuncs.out | 90 ++++++++++++++++++++++++
 src/test/regress/sql/rangefuncs.sql      | 42 +++
 2 files changed, 132 insertions(+)

diff --git a/src/test/regress/expected/rangefuncs.out b/src/test/regress/expected/rangefuncs.out
index 397a8b35d6d..3d3ebd17780 100644
--- a/src/test/regress/expected/rangefuncs.out
+++ b/src/test/regress/expected/rangefuncs.out
@@ -2169,6 +2169,96 @@ LINE 1: select * from sin(3) as t(f1 int8,f2 int8);
 drop type rngfunc_type cascade;
 NOTICE:  drop cascades to function testrngfunc()
 --
+-- test use of PL/pgSQL functions returning setof record
+--
+create or replace function array_to_set2(anyarray) returns setof record as $$
+  begin
+    return query execute 'select i AS "index", $1[i] AS "value" from generate_subscripts($1, 1) i'
+using $1;
+  end;
+$$ language plpgsql immutable;
+-- materialize mode requires a result TupleDesc:
+select array_to_set2(array['one', 'two']); -- fail
+ERROR:  materialize mode required, but it is not allowed in this context
+CONTEXT:  PL/pgSQL function array_to_set2(anyarray) line 3 at RETURN QUERY
+select * from array_to_set2(array['one', 'two']) as t(f1 int,f2 text);
+ f1 | f2
+----+-----
+  1 | one
+  2 | two
+(2 rows)
+
+select * from array_to_set2(array['one', 'two']); -- fail
+ERROR:  a column definition list is required for functions returning "record"
+LINE 1: select * from array_to_set2(array['one', 'two']);
+                      ^
+select * from array_to_set2(array['one', 'two']) as t(f1 numeric(4,2),f2 text); -- fail
+ERROR:  structure of query does not match function result type
+DETAIL:  Returned type integer does not match expected type numeric(4,2) in column 1.
+CONTEXT:  SQL statement "select i AS "index", $1[i] AS "value" from generate_subscripts($1, 1) i"
+PL/pgSQL function array_to_set2(anyarray) line 3 at RETURN QUERY
+select * from array_to_set2(array['one', 'two']) as t(f1 point,f2 text); -- fail
+ERROR:  structure of query does not match function result type
+DETAIL:  Returned type integer does not match expected type point in column 1.
+CONTEXT:  SQL statement "select i AS "index", $1[i] AS "value" from generate_subscripts($1, 1) i"
+PL/pgSQL function array_to_set2(anyarray) line 3 at RETURN QUERY
+explain (verbose, costs off)
+  select * from array_to_set2(array['one', 'two']) as t(f1 int, f2 text);
+                   QUERY PLAN
+------------------------------------------------
+ Function Scan on public.array_to_set2 t
+   Output: f1, f2
+   Function Call: array_to_set2('{one,two}'::text[])
+(3 rows)
+
+drop function array_to_set2;
+--
+-- test use of PL/pgSQL functions returning table
+--
+create or replace function array_to_set2(anyarray) returns table(index int, value anyelement) as $$
+  begin
+    return query execute 'select i AS "index", $1[i] AS "value" from generate_subscripts($1, 1) i'
+using $1;
+  end;
+$$ language plpgsql immutable;
+-- materialize mode requires a result TupleDesc:
+select array_to_set2(array['one', 'two']);
+ array_to_set2
+---------------
+ (1,one)
+ (2,two)
+(2 rows)
+
+select * from array_to_set2(array['one', 'two']) as t(f1 int,f2 text); -- fail
Re: pl/pgperl Patch for adding $_FN detail just like triggers have for $_TD
On 8/30/24 16:12, Andrew Dunstan wrote:
> Sorry for empty reply. Errant finger hit send.

No problem. So anyway... my main point is to highlight this:

On 2024-08-29 Th 5:50 PM, Mark Murawski wrote:
> And then in this hypothetical situation -- magic ensues, and you're left with this:
>
> # tail -f /var/log/postgresql.log
> *** GOT A WARNING - Use of uninitialized value $a in concatenation (.) or string in function public.throw_warning() line 1

The essential element here is: why should every single developer who ever wants to develop in plperl be forced to figure out (typically at the worst possible time) that Postgres doesn't log the source function of a warning, and then be forced to hard-code their own function name as a variable inside their function?

The typical situation is: you test your code, you push it to production, and then observe. Then production does something you didn't expect and throws a warning. With the current design, you have no idea what code threw the warning, and you have to go into every single possible plperl function and add hard-coded function names for logging. To me this is highly nonsensical to force on developers.

Pretty much every modern scripting language I've come across has a way to dynamically access the name of the currently executing function, either by way of a special variable or by stack-trace introspection. Being that this is Perl, sure... we can get caller() or a stack trace. But the design of plperl Postgres functions uses an anonymous coderef to define the function, so there is no function name defined. I don't see the harm in adding more information so that the running function can know its own name.

Maybe another approach to the fix here is to give the function an actual name, so that when calling it, you know where you are executing from. But in Perl you cannot define a sub called schema.function; that definitely does not compile. So you would need some sort of name mangling, like postgres_plperl_schema_function, so it's painfully obvious where it came from. So... this is why it's handy to just have a variable, since you can format the called schema.function properly.

But, ideally, the even better solution of catching and re-throwing via a warn handler sounds like it would be the best option. I'll need to look into this more, since this is really my first jump into the Perl C API and I've never worked with warn handlers at this level.
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
I wrote:
> Hmm. I'm not convinced that 0001 is an actual *fix*, but it should
> at least reduce the frequency of occurrence a lot, which'd help.

After enabling log_statement = all to verify what commands are being sent to the remote, I realized that there's a third thing this patch can do to stabilize matters: issue a regular remote query inside the test transaction, before we enable the timeout. This will ensure that we've dealt with configure_remote_session() and started a remote transaction, so that there aren't extra round trips happening for that while the clock is running.

Pushed with that addition and some comment-tweaking. We'll see whether that actually makes things more stable, but I don't think it could make it worse.

			regards, tom lane
Re: Pgstattuple on Sequences: Seeking Community Feedback on Potential Patch
On Fri, Aug 30, 2024 at 04:07:30PM -0400, Robert Haas wrote:
> On Thu, Aug 29, 2024 at 1:36 PM Nathan Bossart wrote:
>> Thanks. Robert, do you have any concerns with this?
>
> I don't know if I'm exactly concerned but I don't understand what
> problem we're solving, either. I thought Ayush said that the function
> wouldn't produce useful results for sequences; so then why do we need
> to change the code to enable it?

I suppose it would be difficult to argue that it is actually useful, given it hasn't worked since v11 and apparently nobody noticed until recently. If we're content to leave it unsupported, then sure, let's just remove the "relkind == RELKIND_SEQUENCE" check in pgstat_relation(). But I also don't have a great reason to _not_ support it. It used to work (which appears to have been intentional, based on the code), it was unintentionally broken, and it'd work again with a ~1 line change. "SELECT count(*) FROM my_sequence" probably doesn't provide a lot of value, but I have no intention of proposing a patch that removes support for that.

All that being said, I don't have a terribly strong opinion, but I guess I lean towards re-enabling.

Another related inconsistency I just noticed in pageinspect:

postgres=# select t_data from heap_page_items(get_raw_page('s', 0));
 t_data
--------
 \x01
(1 row)

postgres=# select tuple_data_split('s'::regclass, t_data, t_infomask, t_infomask2, t_bits)
             from heap_page_items(get_raw_page('s', 0));
ERROR:  only heap AM is supported

--
nathan
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Fri, 30 Aug 2024 at 22:12, Tom Lane wrote:
> Jelte Fennema-Nio writes:
>> On Fri, Aug 30, 2024, 21:21 Tom Lane wrote:
>>> While we're piling on, has anyone noticed that *non* Windows buildfarm
>>> animals are also failing this test pretty frequently?
>
>> Yes. Fixes are here (see the ~10 emails above in the thread for details):
>> https://www.postgresql.org/message-id/cageczqqo8cn2rw45xuymvzxet7-bcruuzufmbgqc3ue...@mail.gmail.com
>
> Hmm. I'm not convinced that 0001 is an actual *fix*, but it should
> at least reduce the frequency of occurrence a lot, which'd help.

I also don't think it's an actual fix, but I couldn't think of a way to fix this. And since this only happens if you cancel right at the start of a postgres_fdw query, I don't think it's worth investing too much time on a fix.

> I don't want to move the test case to where you propose, because
> that's basically not sensible. But can't we avoid remote estimates
> by just cross-joining ft1 to itself, and not using the tables for
> which remote estimate is enabled?

Yeah, that should work too (I just saw your next email, where you said it's committed like this).

> I think 0002 is probably outright wrong, or at least the change to
> disable_statement_timeout is. Once we get to that, we don't want
> to throw a timeout error any more, even if an interrupt was received
> just before it.

The disable_statement_timeout change was not the part of that patch that was necessary for stable output; only the change in the first branch of enable_statement_timeout was necessary. The reason being that enable_statement_timeout is called multiple times for a query, because start_xact_command is called multiple times in exec_simple_query. The change to disable_statement_timeout just seemed like the logical extension of that change, especially since there was basically a verbatim copy of disable_statement_timeout in the second branch of enable_statement_timeout.
To make sure I understand your suggestion correctly: are you saying you would want to completely remove the outstanding interrupt if it was caused by the statement_timeout when disable_statement_timeout is called? Because I agree that would probably make sense, but that sounds like a more impactful change. The current behaviour seems strictly worse than the behaviour proposed in the patch to me, because currently the backend would still be interrupted, but the error would indicate a reason for the interrupt that is simply incorrect, i.e. it will say the query was cancelled due to a user request, which never happened.
Re: Use read streams in pg_visibility
On Tue, Aug 27, 2024 at 10:49:19AM +0300, Nazir Bilal Yavuz wrote:
> On Fri, 23 Aug 2024 at 22:01, Noah Misch wrote:
>> On Fri, Aug 23, 2024 at 02:20:06PM +0300, Nazir Bilal Yavuz wrote:
>>> On Tue, 20 Aug 2024 at 21:47, Noah Misch wrote:
>>>> On Tue, Aug 13, 2024 at 03:22:27PM +0300, Nazir Bilal Yavuz wrote:
> I liked the block_range_read_stream_cb. Attached patches for that
> (first 3 patches). I chose an nblocks way instead of last_blocks in
> the struct.

To read blocks 10 and 11, I would expect to initialize the struct with one of:

  { .first=10, .nblocks=2 }
  { .first=10, .last_inclusive=11 }
  { .first=10, .last_exclusive=12 }

With the patch's API, I would need { .first=10, .nblocks=12 }. The struct field named "nblocks" behaves like a last_block_exclusive. Please either make the behavior an "nblocks" behavior or change the field name to replace the term "nblocks" with something matching the behavior. (I used longer field names in my examples here, to disambiguate those examples. It's okay if the final field names aren't those, as long as the field names and the behavior align.)

>>>> The callback doesn't return blocks having zero vm bits, so the blkno variable
>>>> is not accurate. I didn't test, but I think the loop's "Recheck to avoid
>>>> returning spurious results." looks at the bit for the wrong block. If that's
>>>> what v1 does, could you expand the test file to catch that? For example, make
>>>> a two-block table with only the second block all-visible.
>>>
>>> Yes, it was not accurate. I am getting blockno from the buffer now. I
>>> checked and confirmed it is working as expected by manually logging
>>> blocknos returned from the read stream. I am not sure how to add a
>>> test case for this.
>>
>> VACUUM FREEZE makes an all-visible, all-frozen table. DELETE of a particular
>> TID, even if rolled back, clears both vm bits for the TID's page.
>> Past tests like that had instability problems. One cause is a concurrent
>> session's XID or snapshot, which can prevent VACUUM setting vm bits. Using a
>> TEMP table may have been one of the countermeasures, but I don't remember
>> clearly. Hence, please search the archives or the existing pg_visibility
>> tests for how we dealt with that. It may not be a problem for this
>> particular test.
>
> Thanks for the information, I will check these. What I still do not
> understand is how to make sure that only the second block is processed
> and the first one is skipped. pg_check_visible() and pg_check_frozen()
> return TIDs that cause corruption in the visibility map; there is no
> information about block numbers.

I see what you're saying. collect_corrupt_items() needs a corrupt table to report anything; all corruption-free tables get the same output. Testing this would need extra C code or techniques like corrupt_page_checksum() to create the corrupt state. That wouldn't be a bad thing to have, but it's big enough that I'll consider it out of scope for $SUBJECT. With the callback change above, I'll be ready to push all this.
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
Hello Tom,

30.08.2024 23:55, Tom Lane wrote:
> Pushed with that addition and some comment-tweaking. We'll see
> whether that actually makes things more stable, but I don't think
> it could make it worse.

Thank you for fixing that issue!

I've tested your fix with the modification I proposed upthread:

     idle_session_timeout_enabled = false;
 }
+if (rand() % 10 == 0) pg_usleep(1);

 /*
  * (5) disable async signal conditions again.

and can confirm that the issue is gone. On 8749d850f~1, the test failed on iterations 3, 3, 12 for me, but on current REL_17_STABLE, 100 test iterations succeeded.

At the same time, mylodon confirmed my other finding at [1] and failed [2] with:

-ERROR: canceling statement due to statement timeout
+ERROR: canceling statement due to user request

[1] https://www.postgresql.org/message-id/4db099c8-4a52-3cc4-e970-14539a319466%40gmail.com
[2] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mylodon&dt=2024-08-30%2023%3A03%3A31

Best regards,
Alexander
Re: Inval reliability, especially for inplace updates
On Tue, Jun 18, 2024 at 08:23:49AM -0700, Noah Misch wrote:
> On Mon, Jun 17, 2024 at 06:57:30PM -0700, Andres Freund wrote:
>> On 2024-06-17 16:58:54 -0700, Noah Misch wrote:
>>> On Sat, Jun 15, 2024 at 03:37:18PM -0700, Noah Misch wrote:
>>>> On Wed, May 22, 2024 at 05:05:48PM -0700, Noah Misch wrote:
>>>>> https://postgr.es/m/20240512232923.aa.nmi...@google.com wrote:
>>>>>> Separable, nontrivial things not fixed in the attached patch stack:
>>>>>>
>>>>>> - Inplace update uses transactional CacheInvalidateHeapTuple(). ROLLBACK of
>>>>>>   CREATE INDEX wrongly discards the inval, leading to the relhasindex=t loss
>>>>>>   still seen in inplace-inval.spec. CacheInvalidateRelmap() does this right.
>>>>>
>>>>> I plan to fix that like CacheInvalidateRelmap(): send the inval immediately,
>>>>> inside the critical section. Send it in heap_xlog_inplace(), too.
>>
>> I'm worried this might cause its own set of bugs, e.g. if there are any places
>> that, possibly accidentally, rely on the invalidation from the inplace update
>> to also cover separate changes.
>
> Good point. I do have index_update_stats() still doing an ideally-superfluous
> relcache update for that reason. Taking that further, it would be cheap
> insurance to have the inplace update do a transactional inval in addition to
> its immediate inval. Future master-only work could remove the transactional
> one. How about that?

Restoring the transactional inval seemed good to me, so I've rebased and included that. This applies on top of three patches from https://postgr.es/m/20240822073200.4f.nmi...@google.com. I'm attaching those to placate cfbot, but this thread is for review of the last patch only.

Author: Noah Misch
Commit: Noah Misch

    Warn if LOCKTAG_TUPLE is held at commit, under debug_assertions.

    The current use always releases this locktag. A planned use will continue
    that intent.
    It will involve more areas of code, making unlock omissions easier. Warn
    under debug_assertions, like we do for various resource leaks.

    Back-patch to v12 (all supported versions), the plan for the commit of the
    new use.

    Reviewed by FIXME.

    Discussion: https://postgr.es/m/20240512232923.aa.nmi...@google.com

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 0400a50..e5e7ab5 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -2256,6 +2256,16 @@ LockReleaseAll(LOCKMETHODID lockmethodid, bool allLocks)
 			locallock->numLockOwners = 0;
 		}

+#ifdef USE_ASSERT_CHECKING
+
+		/*
+		 * Tuple locks are currently held only for short durations within a
+		 * transaction. Check that we didn't forget to release one.
+		 */
+		if (LOCALLOCK_LOCKTAG(*locallock) == LOCKTAG_TUPLE && !allLocks)
+			elog(WARNING, "tuple lock held at commit");
+#endif
+
 		/*
 		 * If the lock or proclock pointers are NULL, this lock was taken via
 		 * the relation fast-path (and is not known to have been transferred).

Author: Noah Misch
Commit: Noah Misch

    Fix data loss at inplace update after heap_update().

    As previously-added tests demonstrated, heap_inplace_update() could instead
    update an unrelated tuple of the same catalog. It could lose the update.
    Losing relhasindex=t was a source of index corruption.

    Inplace-updating commands like VACUUM will now wait for heap_update()
    commands like GRANT TABLE and GRANT DATABASE. That isn't ideal, but a
    long-running GRANT already hurts VACUUM progress more just by keeping an
    XID running. The VACUUM will behave like a DELETE or UPDATE waiting for
    the uncommitted change.

    For implementation details, start at the systable_inplace_update_begin()
    header comment and README.tuplock.

    Back-patch to v12 (all supported versions). In back branches, retain a
    deprecated heap_inplace_update(), for extensions.

    Reviewed by Heikki Linnakangas and Alexander Lakhin.
    Discussion: https://postgr.es/m/CAMp+ueZQz3yDk7qg42hk6-9gxniYbp-=bg2mgqecerqr5gg...@mail.gmail.com

diff --git a/src/backend/access/heap/README.tuplock b/src/backend/access/heap/README.tuplock
index 6441e8b..ddb2def 100644
--- a/src/backend/access/heap/README.tuplock
+++ b/src/backend/access/heap/README.tuplock
@@ -153,3 +153,14 @@ The following infomask bits are applicable:

 We currently never set the HEAP_XMAX_COMMITTED when the HEAP_XMAX_IS_MULTI
 bit is set.
+
+Reading inplace-updated columns
+-------------------------------
+
+Inplace updates create an exception to the rule that tuple data won't change
+under a reader holding a pin. A reader of a heap_fetch() result tuple may
+witness a torn read. C
Re: Interrupts vs signals
On Sat, Aug 31, 2024 at 10:17 AM Heikki Linnakangas wrote:
>> * This direction seems to fit quite nicely with future ideas about
>> asynchronous network I/O. That may sound unrelated, but imagine that
>> a future version of WaitEventSet is built on Linux io_uring (or
>> Windows iorings, or Windows IOCP, or kqueue), and waits for the kernel
>> to tell you that network data has been transferred directly into a
>> user space buffer. You could wait for the interrupt word to change at
>> the same time by treating it as a futex[1]. Then all that other stuff
>> -- signalfd, is_set, maybe_sleeping -- just goes away, and all we have
>> left is one single word in memory. (That it is possible to do that is
>> not really a coincidence, as our own Mr Freund asked Mr Axboe to add
>> it[2]. The existing latch implementation techniques could be used as
>> fallbacks, but when looked at from the right angle, once you squish
>> all the wakeup reasons into a single word, it's all just an
>> implementation of a multiplexable futex with extra steps.)
>
> Cool

Just by the way, speaking of future tricks and the connections between this code and other problems in other threads, I wanted to mention that the above thought is also connected to CF #3998. When I started working on this, in parallel I had an experimental patch set using futexes[1] (back then, to try out futexes, I had to patch my OS[2] because Linux couldn't multiplex them yet, and macOS/*BSD had something sort of vaguely similar but effectively only usable between threads in one process). I planned to switch to waiting directly on the interrupt vector as a futex when bringing that idea together with the one in this thread, but I guess I assumed we had to keep latches too, since they seemed like such a central concept in PostgreSQL.
Your idea seems much better, the more I think about it, but maybe only the inventor of latches could have the idea of blowing them up :-) Anyway, in that same experiment I realised I could wake multiple backends in one system call, which led to more discoveries about the negative interactions between latches and locks, and begat CF #3998 (SetLatches()). By way of excuse, unfortunately I got blocked in my progress on interrupt vectors for a couple of release cycles by the recovery conflict system, a set of procsignals that were not like the others, and turned out to be broken more or less as a result. That was tricky to fix (CF #3615), leading to journeys into all kinds of strange places like the regex code... [1] https://github.com/macdice/postgres/commits/kqueue-usermem/ [2] https://reviews.freebsd.org/D37102
Re: query_id, pg_stat_activity, extended query protocol
On Thu, Aug 15, 2024 at 5:06 AM Imseih (AWS), Sami wrote: > > > I think the testing discussion should be moved to a different thread. > > What do you think? > See v4. > > 0001 deals with reporting queryId in exec_execute_message and > exec_bind_message. > 0002 deals with reporting queryId after a cache invalidation. > > There are no tests as this requires more discussion in a separate thread(?) > hi. v4-0001 work as expected. i don't know how to test 0002 In 0001 and 0002, all foreach loops, we can use the new macro foreach_node. see https://git.postgresql.org/cgit/postgresql.git/commit/?id=14dd0f27d7cd56ffae9ecdbe324965073d01a9ff the following are the minimum tests I come up with for 0001 /* test \bind queryid exists */ select query_id is not null as query_id_exist from pg_stat_activity where pid = pg_backend_pid() \bind \g /* test that \parse \bind_named queryid exists */ select pg_backend_pid() as current_pid \gset pref01_ select query_id is not null as query_id_exist from pg_stat_activity where pid = $1 \parse stmt11 \bind_named stmt11 :pref01_current_pid \g
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
On Thu, Aug 29, 2024 at 03:48:53PM -0500, Masahiko Sawada wrote:
> On Sun, May 19, 2024 at 6:46 AM Noah Misch wrote:
>> If I were standardizing pg_trgm on one or the other notion of "char", I would
>> choose signed char, since I think it's still the majority. More broadly, I
>> see these options to fix pg_trgm:
>>
>> 1. Change to signed char. Every arm64 system needs to scan pg_trgm indexes.
>> 2. Change to unsigned char. Every x86 system needs to scan pg_trgm indexes.
>
> Even though it's true that signed char systems are the majority, it
> would not be acceptable to force the need to scan pg_trgm indexes on
> unsigned char systems.
>
>> 3. Offer both, as an upgrade path. For example, pg_trgm could have separate
>>    operator classes gin_trgm_ops and gin_trgm_ops_unsigned. Running
>>    pg_upgrade on an unsigned-char system would automatically map v17
>>    gin_trgm_ops to v18 gin_trgm_ops_unsigned. This avoids penalizing any
>>    architecture with upgrade-time scans.
>
> Very interesting idea. How can new v18 users use the correct operator
> class? I don't want to require users to specify the correct signed or
> unsigned operator classes when creating a GIN index. Maybe we need to

In brief, it wouldn't matter which operator class new v18 indexes use. The documentation would focus on gin_trgm_ops and also say something like:

  There's an additional operator class, gin_trgm_ops_unsigned. It behaves
  exactly like gin_trgm_ops, but it uses a deprecated on-disk representation.
  Use gin_trgm_ops in new indexes, but there's no disadvantage from continuing
  to use gin_trgm_ops_unsigned. Before PostgreSQL 18, gin_trgm_ops used a
  platform-dependent representation. pg_upgrade automatically uses
  gin_trgm_ops_unsigned when upgrading from source data that used the
  deprecated representation.

What concerns might users have, then? (Neither operator class would use plain "char" in a context that affects on-disk state. They'll use "signed char" and "unsigned char".)
> dynamically use the correct compare function for the same operator
> class depending on the char signedness. But is it possible to do it on
> the extension (e.g. pg_trgm) side?

No, I don't think the extension can do that cleanly. It would need to store the signedness in the index somehow, and GIN doesn't call the opclass at a time facilitating that. That would need help from the core server.
Re: Streaming read-ready sequential scan code
On Fri, Aug 30, 2024 at 1:00 AM Alexander Lakhin wrote:
> I looked at two perf profiles of such out-of-sync processes and found no
> extra calls or whatsoever in the slow one, it just has the number of perf
> samples increased proportionally. It made me suspect CPU frequency
> scaling... Indeed, with the "performance" governor set and the boost mode
> disabled, I'm now seeing much more stable numbers (I do this tuning before
> running performance tests, but I had forgotten about that when I ran your
> test, my bad).

Aha, mystery solved. I have pushed the fix. Thanks!
Re: optimizing pg_upgrade's once-in-each-database steps
LGTM in general, but here are some final nitpicks:

+ if (maxFd != 0)
+     (void) select(maxFd + 1, &input_mask, &output_mask, &except_mask, NULL);

It's a good idea to check the return value of select(), in case it returns an error.

+ dbs_complete++;
+ (void) PQgetResult(slot->conn);
+ PQfinish(slot->conn);

Perhaps this is just for my own understanding: what does the PQgetResult call do here?

+ /* Check for connection failure. */
+ if (PQconnectPoll(slot->conn) == PGRES_POLLING_FAILED)
+     pg_fatal("connection failure: %s", PQerrorMessage(slot->conn));
+
+ /* Check whether the connection is still establishing. */
+ if (PQconnectPoll(slot->conn) != PGRES_POLLING_OK)
+     return;

Are the two consecutive calls of PQconnectPoll intended here? Seems a bit wasteful, but maybe that's ok.

We should probably mention this change in the docs as well; I found these two places that I think can be improved:

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 9877f2f01c..ad7aa33f07 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -118,7 +118,7 @@ PostgreSQL documentation
  -j njobs
  --jobs=njobs
- number of simultaneous processes or threads to use
+ number of simultaneous processes, threads or connections to use

@@ -587,8 +587,9 @@ NET STOP postgresql-&majorversion;
  The --jobs option allows multiple CPU cores to be used
- for copying/linking of files and to dump and restore database schemas
- in parallel; a good place to start is the maximum of the number of
+ for copying/linking of files, to dump and restore database schemas in
+ parallel and to use multiple simultaneous connections for upgrade checks;
+ a good place to start is the maximum of the number of
  CPU cores and tablespaces. This option can dramatically reduce the time
  to upgrade a multi-database server running on a multiprocessor machine.

> On 28 Aug 2024, at 22:46, Nathan Bossart wrote:
>
> I spent some time cleaning up names, comments, etc.
> Barring additional feedback, I'm planning to commit this stuff in the
> September commitfest so that it has plenty of time to bake in the buildfarm.
>
> --
> nathan
Re: race condition in pg_class
On Thu, Aug 29, 2024 at 09:08:43PM +0530, Nitin Motiani wrote:
> On Thu, Aug 29, 2024 at 8:11 PM Noah Misch wrote:
>> On Tue, Aug 20, 2024 at 11:59:45AM +0300, Heikki Linnakangas wrote:
>>> My order of preference is: 2, 1, 3.
>>
>> I kept tuple locking responsibility in heapam.c. That's simpler and better
>> for modularity, but it does mean we release+acquire after any xmax wait.
>> Before, we avoided that if the next genam.c scan found the same TID. (If the
>> next scan finds the same TID, the xmax probably aborted.) I think DDL aborts
>> are rare enough to justify simplifying as this version does. I don't expect
>> anyone to notice the starvation outside of tests built to show it. (With
>> previous versions, one can show it with a purpose-built test that commits
>> instead of aborting, like the "001_pgbench_grant@9" test.)
>>
>> This move also loses the optimization of unpinning before XactLockTableWait().
>> heap_update() doesn't optimize that way, so that's fine.
>>
>> The move ended up more like (1), though I did do
>> s/systable_inplace_update_scan/systable_inplace_update_begin/ like in (2). I
>> felt that worked better than (2) to achieve lock release before
>> CacheInvalidateHeapTuple(). Alternatives that could be fine:
>
> From a consistency point of view, I find it cleaner if we can have all
> the heap_inplace_lock and heap_inplace_unlock in the same set of
> functions. So here those would be the systable_inplace_* functions.

That will technically be the case after inplace160, and I could make it so here by inlining heap_inplace_unlock() into its heapam.c caller. Would that be cleaner or less clean?

>> - In the cancel case, call both systable_inplace_update_cancel and
>>   systable_inplace_update_end. _finish or _cancel would own unlock, while
>>   _end would own systable_endscan().
>
> What happens to CacheInvalidateHeapTuple() in this approach?
I think > it will still need to be brought to the genam.c layer if we are > releasing the lock in systable_inplace_update_finish. Cancel scenarios don't do invalidation. (Same under other alternatives.) > > - Hoist the CacheInvalidateHeapTuple() up to the genam.c layer. While > > tolerable now, this gets less attractive after the inplace160 patch from > > https://postgr.es/m/flat/20240523000548.58.nmi...@google.com > > > I skimmed through the inplace160 patch. It wasn't clear to me why this > becomes less attractive with the patch. I see there is a new > CacheInvalidateHeapTupleInPlace but that looks like it would be called > while holding the lock. And then there is an > AcceptInvalidationMessages which can perhaps be moved to the genam.c > layer too. Is the concern that one invalidation call will be in the > heapam layer and the other will be in the genam layer? That, or a critical section would start in heapam.c, then end in genam.c. Current call tree at inplace160 v4: genam.c:systable_inplace_update_finish heapam.c:heap_inplace_update PreInplace_Inval START_CRIT_SECTION BUFFER_LOCK_UNLOCK AtInplace_Inval END_CRIT_SECTION UnlockTuple AcceptInvalidationMessages If we hoisted all of invalidation up to the genam.c layer, a critical section that starts in heapam.c would end in genam.c: genam.c:systable_inplace_update_finish PreInplace_Inval heapam.c:heap_inplace_update START_CRIT_SECTION BUFFER_LOCK_UNLOCK AtInplace_Inval END_CRIT_SECTION UnlockTuple AcceptInvalidationMessages If we didn't accept splitting the critical section but did accept splitting invalidation responsibilities, one gets perhaps: genam.c:systable_inplace_update_finish PreInplace_Inval heapam.c:heap_inplace_update START_CRIT_SECTION BUFFER_LOCK_UNLOCK AtInplace_Inval END_CRIT_SECTION UnlockTuple AcceptInvalidationMessages That's how I ended up at inplace120 v9's design. > Also I have a small question from inplace120. 
> > I see that all the places we check resultRelInfo->ri_needLockTagTuple, > we can just call > IsInplaceUpdateRelation(resultRelInfo->ri_RelationDesc). Is there a > big advantage of storing a separate bool field? Also there is another No, not a big advantage. I felt it was more in line with the typical style of src/backend/executor. > write to ri_RelationDesc in CatalogOpenIndexes in > src/backlog/catalog/indexing.c. I think ri_needLockTagTuple needs to > be set there also to keep it consistent with ri_RelationDesc. Please > let me know if I am missing something about the usage of the new > field. Can you say more about consequences you found? Only the full executor reads the field, doing so when it fetches the most recent version of a row. CatalogOpenIndexes() callers lack the full executor's practice of fetching the most recent version of a row, so they couldn't benefit reading the field. I don't think any CatalogOpenIndexes() caller passes its ResultRelInfo to the full executor, and "typedef struct ResultRelInfo *CatalogIndexState" exists in part to k
Re: 039_end_of_wal: error in "xl_tot_len zero" test
Pushed. Thanks!