Re: pg_upgrade failing for 200+ million Large Objects

2024-07-24 Thread Justin Pryzby
On Mon, Apr 01, 2024 at 03:28:26PM -0400, Tom Lane wrote:
> Nathan Bossart  writes:
> > The one design point that worries me a little is the non-configurability of
> > --transaction-size in pg_upgrade.  I think it's fine to default it to 1,000
> > or something, but given how often I've had to fiddle with
> > max_locks_per_transaction, I'm wondering if we might regret hard-coding it.
> 
> Well, we could add a command-line switch to pg_upgrade, but I'm
> unconvinced that it'd be worth the trouble.  I think a very large
> fraction of users invoke pg_upgrade by means of packager-supplied
> scripts that are unlikely to provide a way to pass through such
> a switch.  I'm inclined to say let's leave it as-is until we get
> some actual field requests for a switch.

I've been importing our schemas and doing upgrade testing, and was
surprised when a postgres backend was killed for OOM during pg_upgrade:

Killed process 989302 (postgres) total-vm:5495648kB, anon-rss:5153292kB, ...

Upgrading from v16 => v16 doesn't use nearly as much RAM.

While tracking down the responsible commit, I reproduced the problem
using a subset of tables; at 959b38d770, the backend process used
~650 MB of RAM, while at its parent commit it used at most ~120 MB.

959b38d770b Invent --transaction-size option for pg_restore.

By changing RESTORE_TRANSACTION_SIZE to 100, backend RAM use goes to
180 MB during pg_upgrade, which is reasonable.

With partitioning, we have a lot of tables, some of them wide (126
partitioned tables, 8942 children, 1019315 columns in total).  I didn't
track whether certain parts of our schema contribute most to the high
backend memory use, just that it's now 5x (while testing a subset) to
50x higher.

We'd surely prefer that the transaction size be configurable.
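
For reference, the mechanism introduced by 959b38d770 commits after every
N restored objects; a minimal sketch of that batching pattern (illustrative
only, not pg_restore's actual code):

```python
def restore_in_batches(items, transaction_size, execute, commit):
    """Commit after every `transaction_size` restored objects, as
    pg_restore's --transaction-size option does (illustrative sketch,
    not pg_restore's actual code)."""
    pending = 0
    for item in items:
        execute(item)
        pending += 1
        if pending >= transaction_size:
            commit()
            pending = 0
    if pending:
        commit()  # flush the final partial batch

# Each commit releases the locks and per-transaction state accumulated so
# far, which is why a smaller batch size bounds backend memory use.
commits = []
restore_in_batches(range(2500), 1000, lambda obj: None, lambda: commits.append(1))
print(len(commits))  # 3 commits for 2500 objects at size 1000
```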

-- 
Justin




Re: warn if GUC set to an invalid shared library

2024-07-22 Thread Justin Pryzby
On Fri, May 24, 2024 at 01:15:13PM -0400, Robert Haas wrote:
> + /* Note that filename was already canonicalized */
> 
> I see that this comment is copied from load_libraries(), but I don't
> immediately see where the canonicalization actually happens. Do you
> know, or can you find out? Because that's crucial here, else stat()
> might not target the real filename. I wonder if it will anyway. Like,
> couldn't the library be versioned, and might not dlopen() try a few
> possibilities?

This comment made me realize that we've been fixated on the warning.
But the patch was broken, and would've always warned.  I think almost
all of the previous patch versions had this issue - oops.

I added a call to expand_dynamic_library_name(), which seems to answer
your question.

And added a preparatory patch to distinguish ALTER USER/DATABASE SET
from SET in a function, to avoid warning in that case.
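
For context, expand_dynamic_library_name() resolves values like
"$libdir/plpgsql" to a real path, so a later stat() targets the same file
dlopen() will try.  A simplified sketch of that expansion (the real code in
dfmgr.c also searches dynamic_library_path; the libdir and suffix defaults
here are assumptions):

```python
import os

def expand_dynamic_library_name(name, libdir="/usr/lib/postgresql/lib",
                                dlsuffix=".so", exists=os.path.exists):
    """Simplified sketch of the backend's library-name expansion:
    substitute a leading $libdir, then try the name as given and with the
    platform suffix appended.  Not the actual dfmgr.c implementation."""
    if name.startswith("$libdir"):
        name = libdir + name[len("$libdir"):]
    for candidate in (name, name + dlsuffix):
        if exists(candidate):
            return candidate
    return name  # let dlopen()/stat() report the failure on the bare name
```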

-- 
Justin
>From 29610d4e417a1a0cf099e10736b5c14aec90f641 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sun, 21 Jul 2024 09:49:17 -0500
Subject: [PATCH 1/4] change function SET to use a separate GUC source

This allows distinguishing it from ALTER ROLE SET and ALTER DATABASE
SET, as needed for the following patch.

See also:
2abae34a2e8fde42be731b4e18d44cd08901464d - added function SET
0c66a223774dec62edb5281a47e72fe480a8f7aa - (partially) updated comments about PGC_S_TEST
---
 src/backend/access/table/tableamapi.c|  2 +-
 src/backend/catalog/pg_db_role_setting.c |  8 
 src/backend/commands/functioncmds.c  |  4 ++--
 src/backend/commands/tablespace.c|  4 ++--
 src/backend/commands/variable.c  |  8 
 src/backend/storage/buffer/localbuf.c|  2 +-
 src/backend/utils/cache/ts_cache.c   |  2 +-
 src/backend/utils/misc/guc.c | 18 +-
 src/include/utils/guc.h  |  7 ---
 9 files changed, 28 insertions(+), 27 deletions(-)

diff --git a/src/backend/access/table/tableamapi.c b/src/backend/access/table/tableamapi.c
index e9b598256fb..7c0f08ed46d 100644
--- a/src/backend/access/table/tableamapi.c
+++ b/src/backend/access/table/tableamapi.c
@@ -132,7 +132,7 @@ check_default_table_access_method(char **newval, void **extra, GucSource source)
 			 * nonexistent table access method, only a NOTICE. See comments in
 			 * guc.h.
 			 */
-			if (source == PGC_S_TEST)
+			if (source == PGC_S_TEST || source == PGC_S_TEST_FUNCTION)
 			{
 ereport(NOTICE,
 		(errcode(ERRCODE_UNDEFINED_OBJECT),
diff --git a/src/backend/catalog/pg_db_role_setting.c b/src/backend/catalog/pg_db_role_setting.c
index 8c20f519fc0..abfcc5655b3 100644
--- a/src/backend/catalog/pg_db_role_setting.c
+++ b/src/backend/catalog/pg_db_role_setting.c
@@ -70,7 +70,7 @@ AlterSetting(Oid databaseid, Oid roleid, VariableSetStmt *setstmt)
 RelationGetDescr(rel), &isnull);
 
 			if (!isnull)
-new = GUCArrayReset(DatumGetArrayTypeP(datum));
+new = GUCArrayReset(DatumGetArrayTypeP(datum), PGC_S_TEST);
 
 			if (new)
 			{
@@ -115,9 +115,9 @@ AlterSetting(Oid databaseid, Oid roleid, VariableSetStmt *setstmt)
 
 		/* Update (valuestr is NULL in RESET cases) */
 		if (valuestr)
-			a = GUCArrayAdd(a, setstmt->name, valuestr);
+			a = GUCArrayAdd(a, setstmt->name, valuestr, PGC_S_TEST);
 		else
-			a = GUCArrayDelete(a, setstmt->name);
+			a = GUCArrayDelete(a, setstmt->name, PGC_S_TEST);
 
 		if (a)
 		{
@@ -141,7 +141,7 @@ AlterSetting(Oid databaseid, Oid roleid, VariableSetStmt *setstmt)
 
 		memset(nulls, false, sizeof(nulls));
 
-		a = GUCArrayAdd(NULL, setstmt->name, valuestr);
+		a = GUCArrayAdd(NULL, setstmt->name, valuestr, PGC_S_TEST);
 
 		values[Anum_pg_db_role_setting_setdatabase - 1] =
 			ObjectIdGetDatum(databaseid);
diff --git a/src/backend/commands/functioncmds.c b/src/backend/commands/functioncmds.c
index 6593fd7d811..30a58cf027c 100644
--- a/src/backend/commands/functioncmds.c
+++ b/src/backend/commands/functioncmds.c
@@ -657,9 +657,9 @@ update_proconfig_value(ArrayType *a, List *set_items)
 			char	   *valuestr = ExtractSetVariableArgs(sstmt);
 
 			if (valuestr)
-a = GUCArrayAdd(a, sstmt->name, valuestr);
+a = GUCArrayAdd(a, sstmt->name, valuestr, PGC_S_TEST_FUNCTION);
 			else/* RESET */
-a = GUCArrayDelete(a, sstmt->name);
+a = GUCArrayDelete(a, sstmt->name, PGC_S_TEST_FUNCTION);
 		}
 	}
 
diff --git a/src/backend/commands/tablespace.c b/src/backend/commands/tablespace.c
index 113b4807315..b0a460c1622 100644
--- a/src/backend/commands/tablespace.c
+++ b/src/backend/commands/tablespace.c
@@ -1104,7 +1104,7 @@ check_default_tablespace(char **newval, void **extra, GucSource source)
 			 * When source == PGC_S_TEST, don't throw a hard error for a
 			 * nonexistent tablespace, only a NOTICE.  See comments in guc.h.
 			 */
-			if (source == PGC_S_TEST)
+			if (source == PGC_S_TEST || source == PGC_S_TEST_FUNCTION)
 			{
 ereport(NOTICE,
 		(errcode(ERRCODE_UNDEFINED_O

Re: ALTER TABLE uses a bistate but not for toast tables

2024-07-15 Thread Justin Pryzby
@cfbot: rebased
>From bc92a05ba35115ba0df278d553a5c0e4e303fe23 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Tue, 21 Jun 2022 22:28:06 -0500
Subject: [PATCH] WIP: use BulkInsertState for toast tuples, too

DONE: ALTER, CLUSTER
TODO: copyto, copyfrom?

slot_getsomeattrs slot_deform_heap_tuple fetchatt
heap_getnextslot => heapgettup => heapgetpage => ReadBufferExtended
initscan
table_beginscan table_scan_getnextslot
RelationCopyStorageUsingBuffer ReadBufferWithoutRelcache

(gdb) bt
 #0  table_open (relationId=relationId@entry=16390, lockmode=lockmode@entry=1) at table.c:40
 #1  0x56444cb23d3c in toast_fetch_datum (attr=attr@entry=0x7f67933cc6cc) at detoast.c:372
 #2  0x56444cb24217 in detoast_attr (attr=attr@entry=0x7f67933cc6cc) at detoast.c:123
 #3  0x56444d07a4c8 in pg_detoast_datum_packed (datum=datum@entry=0x7f67933cc6cc) at fmgr.c:1743
 #4  0x56444d042c8d in text_to_cstring (t=0x7f67933cc6cc) at varlena.c:224
 #5  0x56444d0434f9 in textout (fcinfo=<optimized out>) at varlena.c:573
 #6  0x56444d078f10 in FunctionCall1Coll (flinfo=flinfo@entry=0x56444e4706b0, collation=collation@entry=0, arg1=arg1@entry=140082828592844) at fmgr.c:1124
 #7  0x56444d079d7f in OutputFunctionCall (flinfo=flinfo@entry=0x56444e4706b0, val=val@entry=140082828592844) at fmgr.c:1561
 #8  0x56444ccb1665 in CopyOneRowTo (cstate=cstate@entry=0x56444e470898, slot=slot@entry=0x56444e396d20) at copyto.c:975
 #9  0x56444ccb2c7d in DoCopyTo (cstate=cstate@entry=0x56444e470898) at copyto.c:891
 #10 0x56444ccab4c2 in DoCopy (pstate=pstate@entry=0x56444e396bb0, stmt=stmt@entry=0x56444e3759b0, stmt_location=0, stmt_len=48, processed=processed@entry=0x7ffc212a6310) at copy.c:308

cluster:
heapam_relation_copy_for_cluster
reform_and_rewrite_tuple
rewrite_heap_tuple
raw_heap_insert

//-os-only: linux
---
 src/backend/access/common/toast_internals.c |  6 --
 src/backend/access/heap/heapam.c| 24 +++--
 src/backend/access/heap/heaptoast.c | 11 ++
 src/backend/access/heap/rewriteheap.c   |  8 +--
 src/backend/access/table/toast_helper.c |  6 --
 src/include/access/heaptoast.h  |  4 +++-
 src/include/access/hio.h|  2 ++
 src/include/access/toast_helper.h   |  3 ++-
 src/include/access/toast_internals.h|  4 +++-
 9 files changed, 49 insertions(+), 19 deletions(-)

diff --git a/src/backend/access/common/toast_internals.c b/src/backend/access/common/toast_internals.c
index 90d0654e629..b66d32a26e3 100644
--- a/src/backend/access/common/toast_internals.c
+++ b/src/backend/access/common/toast_internals.c
@@ -117,7 +117,8 @@ toast_compress_datum(Datum value, char cmethod)
  */
 Datum
 toast_save_datum(Relation rel, Datum value,
- struct varlena *oldexternal, int options)
+ struct varlena *oldexternal, int options,
+ BulkInsertState bistate)
 {
 	Relation	toastrel;
 	Relation   *toastidxs;
@@ -320,7 +321,8 @@ toast_save_datum(Relation rel, Datum value,
		memcpy(VARDATA(&chunk_data), data_p, chunk_size);
 		toasttup = heap_form_tuple(toasttupDesc, t_values, t_isnull);
 
-		heap_insert(toastrel, toasttup, mycid, options, NULL);
+		heap_insert(toastrel, toasttup, mycid, options, bistate ?
+	bistate->toast_state : NULL);
 
 		/*
 		 * Create the index entry.  We cheat a little here by not using
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 91b20147a00..18b05d46735 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -71,7 +71,8 @@
 
 
 static HeapTuple heap_prepare_insert(Relation relation, HeapTuple tup,
-	 TransactionId xid, CommandId cid, int options);
+	 TransactionId xid, CommandId cid, int options,
+	 BulkInsertState bistate);
 static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
   Buffer newbuf, HeapTuple oldtup,
   HeapTuple newtup, HeapTuple old_key_tuple,
@@ -1931,9 +1932,15 @@ GetBulkInsertState(void)
 	bistate = (BulkInsertState) palloc(sizeof(BulkInsertStateData));
 	bistate->strategy = GetAccessStrategy(BAS_BULKWRITE);
 	bistate->current_buf = InvalidBuffer;
+
 	bistate->next_free = InvalidBlockNumber;
 	bistate->last_free = InvalidBlockNumber;
 	bistate->already_extended_by = 0;
+
+	bistate->toast_state = (BulkInsertState) palloc(sizeof(BulkInsertStateData));
+	bistate->toast_state->strategy = GetAccessStrategy(BAS_BULKWRITE);
+	bistate->toast_state->current_buf = InvalidBuffer;
+
 	return bistate;
 }
 
@@ -1945,6 +1952,10 @@ FreeBulkInsertState(BulkInsertState bistate)
 {
 	if (bistate->current_buf != InvalidBuffer)
 		ReleaseBuffer(bistate->current_buf);
+	if (bistate->toast_state->current_buf != InvalidBuffer)
+		ReleaseBuffer(bistate->toast_state->current_buf);
+
+	FreeAccessStrategy(bistate->toast_state->strategy);
 	FreeAccessStrategy(bistate->strategy);
 	pfree(bistate);
 }
@@ -2010,

Re: CREATE INDEX CONCURRENTLY on partitioned index

2024-07-11 Thread Justin Pryzby
On Sat, Jun 15, 2024 at 07:56:38PM +0100, Ilya Gladyshev wrote:
> In addition, I noticed that progress tracking is once again broken for
> partitioned tables, while looking at REINDEX implementation, attaching the
> second patch to fix it.

Thanks for the fixes, I started reviewing them but need some more time
to digest.

Do you mean that progress reporting is broken in master, for REINDEX, or
just with this patch ?

-- 
Justin




Re: CI and test improvements

2024-06-18 Thread Justin Pryzby
On Fri, Jun 14, 2024 at 08:34:37AM -0700, Andres Freund wrote:
> Hm. There actually already is the mingw ccache installed. The images for 
> mingw and msvc used to be separate (for startup performance when using 
> containers), but we just merged them.  So it might be easiest to just 
> explicitly use the ccache from there

> (also an explicit path might be faster).

I don't think the path search is significant; it's fast so long as
there's no choco stub involved.

> Could you check if that's fast?

Yes, it is.

> If not, we can just install the binaries distributed by the project, it's 
> just more work to keep up2date that way. 

I guess you mean a separate line to copy choco's ccache.exe somewhere.

-- 
Justin
>From ad03da2cfa3ecf23334f8b31f2e4529ba3094512 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Wed, 25 May 2022 21:53:22 -0500
Subject: [PATCH 1/7] cirrus/windows: add compiler_warnings_script

The goal is to fail due to warnings only after running tests.
(At least historically, it's too slow to run a separate windows VM to
compile with -Werror.)

https://www.postgresql.org/message-id/20220212212310.f645c6vw3njkgxka%40alap3.anarazel.de

I'm not sure how to write this test in windows shell; it's also easy to
write something that doesn't work in posix sh, since windows shell is
interpreting && and ||...

See also:
8a1ce5e54f6d144e4f8e19af7c767b026ee0c956
https://cirrus-ci.com/task/6241060062494720
https://cirrus-ci.com/task/6496366607204352

20221104161934.gb16...@telsasoft.com
20221031223744.gt16...@telsasoft.com
20221104161934.gb16...@telsasoft.com
20221123054329.gg11...@telsasoft.com
20230224002029.gq1...@telsasoft.com

ci-os-only: windows
---
 .cirrus.tasks.yml | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml
index 33646faeadf..5a2b64f64c2 100644
--- a/.cirrus.tasks.yml
+++ b/.cirrus.tasks.yml
@@ -566,12 +566,21 @@ task:
 
   build_script: |
 vcvarsall x64
-ninja -C build
+ninja -C build |tee build.txt
+REM Since pipes lose the exit status of the preceding command, rerun the compilation
+REM without the pipe, exiting now if it fails, to avoid trying to run checks
+ninja -C build > nul
 
   check_world_script: |
 vcvarsall x64
 meson test %MTEST_ARGS% --num-processes %TEST_JOBS%
 
+  # This should be last, so check_world is run even if there are warnings
+  always:
+compiler_warnings_script:
+  # this avoids using metachars which would be interpreted by the windows shell
+  - sh -c 'if grep ": warning " build.txt; then exit 1; fi; exit 0'
+
   on_failure:
 <<: *on_failure_meson
 crashlog_artifacts:
-- 
2.42.0
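
The pipeline-status problem that patch works around is easy to demonstrate
in POSIX sh ("false" stands in here for a failing ninja invocation):

```shell
# A pipeline reports the exit status of its LAST command, so a failure on
# the left of "| tee" is lost:
false | tee /dev/null
echo "pipeline exit status: $?"   # prints 0, hiding the failure

# The patch's workaround: rerun the build without the pipe so its real
# status is observed before the checks run.
if false > /dev/null; then
  echo "build ok: run checks"
else
  echo "build failed: stop before checks"
fi
```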

>From 78031c45afad402397e4efa1389e6396557cd405 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Wed, 25 May 2022 22:05:13 -0500
Subject: [PATCH 2/7] cirrus/windows: ccache

ccache 4.7 added support for depend mode with MSVC, and ccache v4.10
supports PCH, so this is now viable.

https://www.postgresql.org/message-id/flat/20220522232606.GZ19626%40telsasoft.com

https://cirrus-ci.com/task/5213936495099904 build 29sec
---
 .cirrus.tasks.yml | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml
index 5a2b64f64c2..6ebbc1ccd81 100644
--- a/.cirrus.tasks.yml
+++ b/.cirrus.tasks.yml
@@ -536,6 +536,12 @@ task:
   env:
 TEST_JOBS: 8 # wild guess, data based value welcome
 
+CC: c:\msys64\ucrt64\bin\ccache.exe cl.exe
+CCACHE_DIR: $CIRRUS_WORKING_DIR/.ccache
+CCACHE_MAXSIZE: "500M"
+CCACHE_SLOPPINESS: pch_defines,time_macros
+CCACHE_DEPEND: 1
+
 # Cirrus defaults to SetErrorMode(SEM_NOGPFAULTERRORBOX | ...). That
 # prevents crash reporting from working unless binaries do SetErrorMode()
 # themselves. Furthermore, it appears that either python or, more likely,
@@ -553,6 +559,9 @@ task:
   setup_additional_packages_script: |
 REM choco install -y --no-progress ...
 
+  ccache_cache:
+folder: $CCACHE_DIR
+
   setup_hosts_file_script: |
 echo 127.0.0.1 pg-loadbalancetest >> c:\Windows\System32\Drivers\etc\hosts
 echo 127.0.0.2 pg-loadbalancetest >> c:\Windows\System32\Drivers\etc\hosts
@@ -562,7 +571,7 @@ task:
   # Use /DEBUG:FASTLINK to avoid high memory usage during linking
   configure_script: |
 vcvarsall x64
-meson setup --backend ninja --buildtype debug -Dc_link_args=/DEBUG:FASTLINK -Dcassert=true -Dinjection_points=true -Db_pch=true -Dextra_lib_dirs=c:\openssl\1.1\lib -Dextra_include_dirs=c:\openssl\1.1\include -DTAR=%TAR% -DPG_TEST_EXTRA="%PG_TEST_EXTRA%" build
+meson setup build --backend ninja --buildtype debug -Dc_link_args=/DEBUG:FASTLINK -Dcassert=true -Dinjection_points=true -Db_pch=true -Dextra_lib_dirs=c:\openssl\1.1\lib -Dextra_include_dirs=c:\openssl\1.1\include -DTAR=%TAR% -DPG_TEST_EXTRA="%PG_TEST_EXTRA%" -Dc_args="/Z7&q

Re: Reducing the log spam

2024-06-17 Thread Justin Pryzby
On Thu, May 02, 2024 at 12:47:45PM +0200, Laurenz Albe wrote:
> On Mon, 2024-03-11 at 09:33 +0100, Jelte Fennema-Nio wrote:
> > -   the subscriber's server log.
> > +   the subscriber's server log if you remove 23505 from
> > +   .
> > 
> > This seems like a pretty big regression. Being able to know why your
> > replication got closed seems pretty critical.
> 
> Yes.  But I'd argue that that is a shortcoming of logical replication:
> there should be a ways to get this information via SQL.  Having to look into
> the log file is not a very useful option.
> 
> The feature will become much less useful if unique violations keep getting 
> logged.

Uh, to be clear, your patch is changing the *defaults*, which I found
surprising, even after reading the thread.  Evidently, the current
behavior is not what you want, and you want to change it, but I'm *sure*
that whatever default you want to use at your site/with your application
is going to make someone else unhappy.  I surely want unique violations
to be logged, for example.

> @@ -6892,6 +6892,41 @@ local0.*/var/log/postgresql
>
>   
>  
> +  xreflabel="log_suppress_errcodes">
> +  log_suppress_errcodes (string)
> +  
> +   log_suppress_errcodes configuration 
> parameter
> +  
> +  
> +  
> +   
> +Causes ERROR and FATAL messages
> +from client backend processes with certain error codes to be excluded
> +from the log.
> +The value is a comma-separated list of five-character error codes as
> +listed in .  An error code that
> +represents a class of errors (ends with three zeros) suppresses 
> logging
> +of all error codes within that class.  For example, the entry
> +08000 (connection_exception)
> +would suppress an error with code 08P01
> +(protocol_violation).  The default setting is
> +23505,3D000,3F000,42601,42704,42883,42P01,57P03.
> +Only superusers and users with the appropriate SET
> +privilege can change this setting.
> +   

> +
> +   
> +This setting is useful to exclude error messages from the log that 
> are
> +frequent but irrelevant.

I think you should phrase the feature as ".. *allow* skipping error
logging for messages ... that are frequent but irrelevant for a given
site/role/DB/etc."

I suggest that this patch should not change the default behavior at all:
its default should be empty.  That you, personally, would plan to
exclude this or that error code is pretty uninteresting.  I think the
idea of changing the default behavior will kill the patch, and even if
you want to propose to do that, it should be a separate discussion.
Maybe you should make it an 002 patch.

> + {"log_suppress_errcodes", PGC_SUSET, LOGGING_WHEN,
> + gettext_noop("ERROR and FATAL messages with these error 
> codes don't get logged."),
> + NULL,
> + GUC_LIST_INPUT
> + },
> + &log_suppress_errcodes,
> + DEFAULT_LOG_SUPPRESS_ERRCODES,
> + check_log_suppress_errcodes, assign_log_suppress_errcodes, NULL

> +/*
> + * Default value for log_suppress_errcodes.  ERROR or FATAL messages with
> + * these error codes are never logged.  Error classes (error codes ending 
> with
> + * three zeros) match all error codes in the class.   The idea is to suppress
> + * messages that usually don't indicate a serious problem but tend to pollute
> + * the log file.
> + */
> +
> +#define DEFAULT_LOG_SUPPRESS_ERRCODES 
> "23505,3D000,3F000,42601,42704,42883,42P01,57P03"
> +

../src/backend/utils/errcodes.txt:23505    E    ERRCODE_UNIQUE_VIOLATION        unique_violation
../src/backend/utils/errcodes.txt:3D000    E    ERRCODE_INVALID_CATALOG_NAME    invalid_catalog_name
../src/backend/utils/errcodes.txt:3F000    E    ERRCODE_INVALID_SCHEMA_NAME     invalid_schema_name
../src/backend/utils/errcodes.txt:42601    E    ERRCODE_SYNTAX_ERROR            syntax_error
../src/backend/utils/errcodes.txt:3D000    E    ERRCODE_UNDEFINED_DATABASE
../src/backend/utils/errcodes.txt:42883    E    ERRCODE_UNDEFINED_FUNCTION      undefined_function
../src/backend/utils/errcodes.txt:3F000    E    ERRCODE_UNDEFINED_SCHEMA
../src/backend/utils/errcodes.txt:42P01    E    ERRCODE_UNDEFINED_TABLE         undefined_table
../src/backend/utils/errcodes.txt:42704    E    ERRCODE_UNDEFINED_OBJECT        undefined_object
../src/backend/utils/errcodes.txt:57P03    E    ERRCODE_CANNOT_CONNECT_NOW      cannot_connect_now
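
The class-matching rule in the quoted documentation (an entry ending in
"000" covers every code in its two-character class) can be sketched as
follows (illustrative; not the patch's C implementation):

```python
DEFAULT_SUPPRESS = "23505,3D000,3F000,42601,42704,42883,42P01,57P03".split(",")

def is_suppressed(sqlstate, suppress_list):
    """An entry ending in "000" is an error class and matches every
    SQLSTATE sharing its two-character class prefix; any other entry
    must match exactly."""
    for entry in suppress_list:
        if entry == sqlstate:
            return True
        if entry.endswith("000") and sqlstate[:2] == entry[:2]:
            return True
    return False

print(is_suppressed("08P01", ["08000"]))         # True: class 08000 covers 08P01
print(is_suppressed("23503", DEFAULT_SUPPRESS))  # False: only 23505 is listed, not class 23000
```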

-- 
Justin




Re: CREATE INDEX CONCURRENTLY on partitioned index

2024-06-15 Thread Justin Pryzby
On Thu, May 23, 2024 at 10:14:57PM +0100, Ilya Gladyshev wrote:
> Hi,
> 
> I think it's well worth the effort to revive the patch, so I rebased it on
> master, updated it and will return it back to the commitfest. Alexander,
> Justin feel free to add yourselves as authors

Thanks -- I was intending to write about this.

I realized that the patch will need some isolation tests to exercise its
concurrent behavior.

-- 
Justin




Re: CI and test improvements

2024-06-14 Thread Justin Pryzby
On Fri, Jun 14, 2024 at 05:36:54PM +0300, Nazir Bilal Yavuz wrote:
> Hi,
> 
> On Thu, 13 Jun 2024 at 14:56, Justin Pryzby  wrote:
> >
> > ccache should be installed in the image rather than re-installed on each
> > invocation.
> 
> ccache is installed in the Windows VM images now [1]. It can be used
> as 'set CC=ccache.exe cl.exe' in the Windows CI task.
> 
> [1] 
> https://github.com/anarazel/pg-vm-images/commit/03a9225ac962fb30b5c0722c702941e2d7c1e81e

Thanks.  I think that works by using a "shim" created by choco in
C:\ProgramData\chocolatey\bin.

But going through that indirection seems to incur an extra 15sec of
compilation time; I think we'll want to do something to avoid that.

I'm not sure what the options are; maybe installing ccache into a
fixed path like c:\ccache, or installing a custom link to a "pinned"
version of ccache.

-- 
Justin




Re: CI and test improvements

2024-06-13 Thread Justin Pryzby
On Thu, Jun 13, 2024 at 02:38:46PM +0300, Nazir Bilal Yavuz wrote:
> On Wed, 12 Jun 2024 at 16:10, Justin Pryzby  wrote:
> > On Fri, Jun 24, 2022 at 08:38:50AM +1200, Thomas Munro wrote:
> > > > I've also sent some patches to Thomas for cfbot to help progress some 
> > > > of these
> > > > patches (code coverage and documentation upload as artifacts).
> > > > https://github.com/justinpryzby/cfbot/commits/master
> > >
> > > Thanks, sorry for lack of action, will get to these soon.
> >
> > On Tue, Feb 13, 2024 at 01:10:21PM -0600, Justin Pryzby wrote:
> > > I sent various patches to cfbot but haven't heard back.
> >
> > > https://www.postgresql.org/message-id/flat/20220409021853.gp24...@telsasoft.com
> > > https://www.postgresql.org/message-id/flat/20220623193125.gb22...@telsasoft.com
> > > https://github.com/justinpryzby/cfbot/commits/master
> > > https://github.com/macdice/cfbot/pulls
> >
> > @Thomas: ping
> >
> > I reintroduced the patch for ccache/windows -- v4.10 supports PCH, which
> > can make the builds 2x faster.
> 
> I applied 0001 and 0002 to see ccache support on Windows but the build
> step failed with: 'ccache: error: No stats log has been configured'.
> Perhaps you forgot to add 'CCACHE_STATSLOG: $CCACHE_DIR.stats.log' to
> 0002?

Something like that - I put the line back.  I don't know if statslog
should be included in the patch, but it's useful for demonstrating that
it's working.

ccache should be installed in the image rather than re-installed on each
invocation.

-- 
Justin
>From ad03da2cfa3ecf23334f8b31f2e4529ba3094512 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Wed, 25 May 2022 21:53:22 -0500
Subject: [PATCH 1/6] cirrus/windows: add compiler_warnings_script

The goal is to fail due to warnings only after running tests.
(At least historically, it's too slow to run a separate windows VM to
compile with -Werror.)

https://www.postgresql.org/message-id/20220212212310.f645c6vw3njkgxka%40alap3.anarazel.de

I'm not sure how to write this test in windows shell; it's also easy to
write something that doesn't work in posix sh, since windows shell is
interpreting && and ||...

See also:
8a1ce5e54f6d144e4f8e19af7c767b026ee0c956
https://cirrus-ci.com/task/6241060062494720
https://cirrus-ci.com/task/6496366607204352

20221104161934.gb16...@telsasoft.com
20221031223744.gt16...@telsasoft.com
20221104161934.gb16...@telsasoft.com
20221123054329.gg11...@telsasoft.com
20230224002029.gq1...@telsasoft.com

ci-os-only: windows
---
 .cirrus.tasks.yml | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml
index 33646faeadf..5a2b64f64c2 100644
--- a/.cirrus.tasks.yml
+++ b/.cirrus.tasks.yml
@@ -566,12 +566,21 @@ task:
 
   build_script: |
 vcvarsall x64
-ninja -C build
+ninja -C build |tee build.txt
+REM Since pipes lose the exit status of the preceding command, rerun the compilation
+REM without the pipe, exiting now if it fails, to avoid trying to run checks
+ninja -C build > nul
 
   check_world_script: |
 vcvarsall x64
 meson test %MTEST_ARGS% --num-processes %TEST_JOBS%
 
+  # This should be last, so check_world is run even if there are warnings
+  always:
+compiler_warnings_script:
+  # this avoids using metachars which would be interpreted by the windows shell
+  - sh -c 'if grep ": warning " build.txt; then exit 1; fi; exit 0'
+
   on_failure:
 <<: *on_failure_meson
 crashlog_artifacts:
-- 
2.42.0

>From 31629f2f413ceb5f2704ac71e24d0d8e82cc26ab Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Wed, 25 May 2022 22:05:13 -0500
Subject: [PATCH 2/6] WIP: cirrus/windows: ccache

ccache v4.10 supports PCH, so this now seems viable.
(ccache 4.7 added support for depend mode with MSVC).

https://www.postgresql.org/message-id/flat/20220522232606.GZ19626%40telsasoft.com

ccache should be installed in the VM rather than with choco in each
invocation.

https://cirrus-ci.com/task/5213936495099904 build 29sec

, linux
linux-meson
ci-os-only: windows, windows-msvc
---
 .cirrus.tasks.yml | 22 +++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml
index 5a2b64f64c2..d8c2a396a20 100644
--- a/.cirrus.tasks.yml
+++ b/.cirrus.tasks.yml
@@ -536,6 +536,12 @@ task:
   env:
 TEST_JOBS: 8 # wild guess, data based value welcome
 
+CCACHE_DIR: $CIRRUS_WORKING_DIR/.ccache
+CCACHE_MAXSIZE: "500M"
+CCACHE_SLOPPINESS: pch_defines,time_macros
+CCACHE_DEPEND: 1
+CCACHE_STATSLOG: $CCACHE_DIR.stats.log
+
 # Cirrus defaults to SetErrorMode(SEM_NOGPFAULTERRORBOX | ...). That
 # prevents crash reporting from working unless binaries do SetErrorMode()
 # themselves. Furth

Re: CI and test improvements

2024-06-12 Thread Justin Pryzby
On Fri, Jun 24, 2022 at 08:38:50AM +1200, Thomas Munro wrote:
> > I've also sent some patches to Thomas for cfbot to help progress some of 
> > these
> > patches (code coverage and documentation upload as artifacts).
> > https://github.com/justinpryzby/cfbot/commits/master
> 
> Thanks, sorry for lack of action, will get to these soon.

On Tue, Feb 13, 2024 at 01:10:21PM -0600, Justin Pryzby wrote:
> I sent various patches to cfbot but haven't heard back.

> https://www.postgresql.org/message-id/flat/20220409021853.gp24...@telsasoft.com
> https://www.postgresql.org/message-id/flat/20220623193125.gb22...@telsasoft.com
> https://github.com/justinpryzby/cfbot/commits/master
> https://github.com/macdice/cfbot/pulls

@Thomas: ping

I reintroduced the patch for ccache/windows -- v4.10 supports PCH, which
can make the builds 2x faster.
>From 8cebe912127ee5c5d03b62b231811c2670698da0 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Wed, 25 May 2022 21:53:22 -0500
Subject: [PATCH 1/6] cirrus/windows: add compiler_warnings_script

The goal is to fail due to warnings only after running tests.
(At least historically, it's too slow to run a separate windows VM to
compile with -Werror.)

https://www.postgresql.org/message-id/20220212212310.f645c6vw3njkgxka%40alap3.anarazel.de

I'm not sure how to write this test in windows shell; it's also easy to
write something that doesn't work in posix sh, since windows shell is
interpreting && and ||...

See also:
8a1ce5e54f6d144e4f8e19af7c767b026ee0c956
https://cirrus-ci.com/task/6241060062494720
https://cirrus-ci.com/task/6496366607204352

20221104161934.gb16...@telsasoft.com
20221031223744.gt16...@telsasoft.com
20221104161934.gb16...@telsasoft.com
20221123054329.gg11...@telsasoft.com
20230224002029.gq1...@telsasoft.com

ci-os-only: windows
---
 .cirrus.tasks.yml | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml
index 33646faeadf..5a2b64f64c2 100644
--- a/.cirrus.tasks.yml
+++ b/.cirrus.tasks.yml
@@ -566,12 +566,21 @@ task:
 
   build_script: |
 vcvarsall x64
-ninja -C build
+ninja -C build |tee build.txt
+REM Since pipes lose the exit status of the preceding command, rerun the compilation
+REM without the pipe, exiting now if it fails, to avoid trying to run checks
+ninja -C build > nul
 
   check_world_script: |
 vcvarsall x64
 meson test %MTEST_ARGS% --num-processes %TEST_JOBS%
 
+  # This should be last, so check_world is run even if there are warnings
+  always:
+compiler_warnings_script:
+  # this avoids using metachars which would be interpreted by the windows shell
+  - sh -c 'if grep ": warning " build.txt; then exit 1; fi; exit 0'
+
   on_failure:
 <<: *on_failure_meson
 crashlog_artifacts:
-- 
2.42.0

>From 7f8e1c244b24f04a28d5a1a03da85f00419717e1 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Wed, 25 May 2022 22:05:13 -0500
Subject: [PATCH 2/6] cirrus/windows: ccache

https://www.postgresql.org/message-id/flat/20220522232606.GZ19626%40telsasoft.com

https://cirrus-ci.com/task/5213936495099904 build 29sec

, linux
linux-meson
ci-os-only: windows, windows-msvc
---
 .cirrus.tasks.yml | 21 ++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml
index 5a2b64f64c2..7a3fcc36fae 100644
--- a/.cirrus.tasks.yml
+++ b/.cirrus.tasks.yml
@@ -536,6 +536,11 @@ task:
   env:
 TEST_JOBS: 8 # wild guess, data based value welcome
 
+CCACHE_DIR: $CIRRUS_WORKING_DIR/.ccache
+CCACHE_MAXSIZE: "500M"
+CCACHE_SLOPPINESS: pch_defines,time_macros
+CCACHE_DEPEND: 1
+
 # Cirrus defaults to SetErrorMode(SEM_NOGPFAULTERRORBOX | ...). That
 # prevents crash reporting from working unless binaries do SetErrorMode()
 # themselves. Furthermore, it appears that either python or, more likely,
@@ -550,8 +555,11 @@ task:
   depends_on: SanityCheck
   only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*windows.*'
 
-  setup_additional_packages_script: |
-REM choco install -y --no-progress ...
+  setup_additional_packages_script:
+- choco install -y --no-progress ccache --version 4.10.0
+
+  ccache_cache:
+folder: $CCACHE_DIR
 
   setup_hosts_file_script: |
 echo 127.0.0.1 pg-loadbalancetest >> c:\Windows\System32\Drivers\etc\hosts
@@ -562,7 +570,8 @@ task:
   # Use /DEBUG:FASTLINK to avoid high memory usage during linking
   configure_script: |
 vcvarsall x64
-meson setup --backend ninja --buildtype debug -Dc_link_args=/DEBUG:FASTLINK -Dcassert=true -Dinjection_points=true -Db_pch=true -Dextra_lib_dirs=c:\openssl\1.1\lib -Dextra_include_dirs=c:\openssl\1.1\include -DTAR=%TAR% -DPG_TEST_EXTRA="%PG_TEST_EXTRA%" build
+set CC=c:\ProgramData\chocolatey\lib\ccache\tools\ccache-4.10-windows-x86_64\

Re: ALTER TABLE SET ACCESS METHOD on partitioned tables

2024-06-04 Thread Justin Pryzby
On Tue, May 21, 2024 at 08:33:51AM -0500, Justin Pryzby wrote:
> It occurred to me that psql \dP+ should show the AM of partitioned
> tables (and other partitioned rels).

ping




Re: [PATCH] psql: \dn+ to show size of each schema (and \dA+ for AMs)

2024-06-03 Thread Justin Pryzby
On Thu, May 30, 2024 at 10:59:06AM -0700, David Christensen wrote:
> Hi Justin,
> 
> Thanks for the patch and the work on it.  In reviewing the basic
> feature, I think this is something that has utility and is worthwhile
> at the high level.

Thanks for looking.

> A few more specific notes:
> 
> The pg_namespace_size() function can stand on its own, and probably
> has some utility for the released Postgres versions.

Are you suggesting adding the C function retroactively in back branches?
I don't think anybody would consider doing that.

It wouldn't be used by anything internally, and any module that wanted
to use it would have to check the minor version, instead of just the
major version, which is wrong.
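To spell out the version point: since v10 the server encodes its version as major*10000 + minor (e.g. 160003 is 16.3), and modules normally gate only on the major component; a throwaway sketch with an assumed example value:

```shell
# PG_VERSION_NUM / server_version_num encode major*10000 + minor.
# A module shipped for "v16" can gate on $major alone; a C function
# added only in a minor release would force it to inspect $minor too.
version_num=160003   # assumed example value (16.3)
major=$((version_num / 10000))
minor=$((version_num % 10000))
echo "$major.$minor"
```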

> I do think the psql implementation for the \dn+ or \dA+ commands
> shouldn't need to use this same function; it's a straightforward
> expansion of the SQL query that can be run in a way that will be
> backwards-compatible with any connected postgres version, so no reason
> to exclude this information for this cases.  (This may have been in an
> earlier revision of the patchset; I didn't check everything.)

I think you're suggesting to write the query in SQL rather than in C.

But I did that in the first version of the patch, and the response was
that maybe in the future someone would want to add permission checks
that would compromise the ability to get correct results from SQL, so
then I presented the functionality written in C.

I recommend that reviewers try to read the existing communication on the
thread, otherwise we end up going back and forth about the same things.

> I think the \dX++ command versions add code complexity without a real
> need for it.

If you view this as a way to "show schema sizes", then you're right,
there's no need.  But I don't want this patch to necessarily further
embrace the idea that it's okay for "plus commands to be slow and show
nonportable results".  If there were a consensus that it'd be fine in a
plus command, I would be okay with that, though.

> We have precedence with \l(+) to show permissions on the
> basic display and size on the extended display, and I think this is
> sufficient in this case here.

You also have the precedent that \db doesn't show the ACL, and you
can't get it without also computing the sizes.  That's 1) inconsistent
with \l and 2) pretty inconvenient for someone who wants to show the
ACL (as mentioned in the first message on this thread).
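For illustration, the ACL is of course available straight from the catalog without touching sizes at all (a sketch, not what \db itself emits):

```sql
-- tablespace ACLs with no size computation
SELECT spcname, pg_catalog.array_to_string(spcacl, E'\n') AS acl
FROM pg_catalog.pg_tablespace
ORDER BY 1;
```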

> 0001-Add-pg_am_size-pg_namespace_size.patch
> - fine, but needs rebase to work

I suggest that reviewers consider sending a rebased patch, optionally with
any proposed changes in a separate patch.

> 0002-psql-add-convenience-commands-dA-and-dn.patch
> - split into just + variant; remove \l++
> - make the \dn show permission and \dn+ show size

> 0003-f-convert-the-other-verbose-to-int-too.patch
> - unneeded
> 0004-Move-the-double-plus-Size-columns-to-the-right.patch
> - unneeded

You say they're unneeded, but what I've been hoping for is a committer
interested enough to at least suggest whether to run with 001, 001+002,
001+002+003, or 001+002+003+004.  They're intentionally presented as
such.

I've also thought about submitting a patch specifically dedicated to
"moving size out of + and into ++".  I find the idea compelling, for the
reasons I wrote in the patch description.  That'd be like presenting
003+004 first.

I'm open to changing the behavior or the implementation.  But changing
the patch as I've presented it based on one suggestion will, I think,
lead to incoherent code thrashing.  I need to hear wider agreement.

-- 
Justin
>From 713213bb75dd4b7c45eab9219422a3ab4339df25 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Tue, 13 Jul 2021 21:25:48 -0500
Subject: [PATCH 1/4] Add pg_am_size(), pg_namespace_size() ..

See also: 358a897fa, 528ac10c7
---
 doc/src/sgml/func.sgml  |  39 ++
 src/backend/utils/adt/dbsize.c  | 132 
 src/include/catalog/pg_proc.dat |  19 +
 3 files changed, 190 insertions(+)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 17c44bc3384..0515412ed30 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -29596,6 +29596,45 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset

   
 
+  
+   
+
+ pg_namespace_size
+
+pg_namespace_size ( name )
+bigint
+   
+   
+pg_namespace_size ( oid )
+bigint
+   
+   
+Computes the total disk space used by relations in the namespace (schema)
+with the specified name or OID. To use this function, you must
+have CREATE privilege on the specified namespace
+or have privileges of the pg_read_all_stats role,
+unless it is the default namespace for the current database.
+
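For illustration only, usage of the proposed function would look something like this (a sketch; pg_namespace_size() exists only with this patch series applied):

```sql
-- list schemas by the total size of their relations (requires the patch)
SELECT nspname, pg_size_pretty(pg_namespace_size(oid)) AS size
FROM pg_catalog.pg_namespace
ORDER BY pg_namespace_size(oid) DESC;
```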

Re: Add SPLIT PARTITION/MERGE PARTITIONS commands

2024-05-25 Thread Justin Pryzby
On Fri, May 03, 2024 at 04:32:25PM +0300, Alexander Korotkov wrote:
> On Fri, May 3, 2024 at 4:23 PM Justin Pryzby  wrote:
> > Note that the error that led to "EXCLUDING IDENTITY" is being discused
> > over here:
> > https://www.postgresql.org/message-id/3b8a9dc1-bbc7-0ef5-6863-c432afac7...@gmail.com
> >
> > It's possible that once that's addressed, the exclusion should be
> > removed here, too.
> 
> +1

Can EXCLUDING IDENTITY be removed now ?

I wasn't able to find why it was needed - at one point, I think there
was a test case that threw an error, but now when I remove the EXCLUDE,
nothing goes wrong.

-- 
Justin




Re: warn if GUC set to an invalid shared library

2024-05-24 Thread Justin Pryzby
On Fri, May 24, 2024 at 09:26:54AM -0400, Robert Haas wrote:
> On Thu, Jul 6, 2023 at 4:15 PM Justin Pryzby  wrote:

> But then I realized, reading another email, that Justin already knew
> that the behavior would change, or at least I'm 90% certain that he
> knows that.

You give me too much credit..

> On the behavior change itself, it seems to me that there's a big
> difference between shared_preload_libraries=bogus and work_mem=bogus.
..
> So if changing PGC_S_FILE to
> PGC_S_TEST in AlterSystemSetConfigFile is going to have the effect of
> allowing garbage values into postgresql.auto.conf that would currently
> get blocked, I think that's a bad plan and we shouldn't do it.

Right - this is something I'd failed to realize.  We can't change it in
the naive way because it allows bogus values, and not just missing
libraries.  Specifically, for GUCs with assign hooks conditional on
PGC_TEST.

We don't want to change the behavior to allow this to succeed -- it
would allow leaving the server in a state where it fails to start (rather
than helping to avoid doing so, as intended by this thread).

regression=# ALTER SYSTEM SET default_table_access_method=abc;
NOTICE:  table access method "abc" does not exist
ALTER SYSTEM

Maybe there should be a comment explaining why PGC_FILE is used, and
maybe there should be a TAP test for the behavior, but they're pretty
unrelated to this thread.  So, I've dropped the 001 patch.  

-- 
Justin
>From d29d0cfcf9d15dad7db1d0dd334e3e77cdad653f Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Mon, 13 Dec 2021 08:42:38 -0600
Subject: [PATCH 1/3] warn when setting GUC to a nonextant library

Note that warnings shouldn't be issued during startup (only fatals):

$ ./tmp_install/usr/local/pgsql/bin/postgres -D ./testrun/regress/regress/tmp_check/data -c shared_preload_libraries=ab
2023-07-06 14:47:03.817 CDT postmaster[2608106] FATAL:  could not access file "ab": No existe el archivo o el directorio
2023-07-06 14:47:03.817 CDT postmaster[2608106] CONTEXT:  while loading shared libraries for setting "shared_preload_libraries"
---
 src/backend/commands/variable.c   | 95 +++
 src/backend/utils/misc/guc_tables.c   |  6 +-
 src/include/utils/guc_hooks.h |  7 ++
 .../unsafe_tests/expected/rolenames.out   | 19 
 .../modules/unsafe_tests/sql/rolenames.sql| 13 +++
 src/test/regress/expected/rules.out   |  8 ++
 6 files changed, 145 insertions(+), 3 deletions(-)

diff --git a/src/backend/commands/variable.c b/src/backend/commands/variable.c
index 9345131711e..bab51c4572a 100644
--- a/src/backend/commands/variable.c
+++ b/src/backend/commands/variable.c
@@ -40,6 +40,7 @@
 #include "utils/timestamp.h"
 #include "utils/tzparser.h"
 #include "utils/varlena.h"
+#include <unistd.h>
 
 /*
  * DATESTYLE
@@ -1234,3 +1235,97 @@ check_ssl(bool *newval, void **extra, GucSource source)
 #endif
 	return true;
 }
+
+
+/*
+ * See also load_libraries() and internal_load_library().
+ */
+static bool
+check_preload_libraries(char **newval, void **extra, GucSource source,
+		bool restricted)
+{
+	char	   *rawstring;
+	List	   *elemlist;
+	ListCell   *l;
+
+	/* nothing to do if empty */
+	if (newval == NULL || *newval == NULL || (*newval)[0] == '\0')
+		return true;
+
+	/*
+	 * issue warnings only during an interactive SET, from ALTER
+	 * ROLE/DATABASE/SYSTEM.
+	 */
+	if (source != PGC_S_TEST)
+		return true;
+
+	/* Need a modifiable copy of string */
+	rawstring = pstrdup(*newval);
+
+	/* Parse string into list of filename paths */
+	if (!SplitDirectoriesString(rawstring, ',', &elemlist))
+	{
+		/* Should not happen ? */
+		return false;
+	}
+
+	foreach(l, elemlist)
+	{
+		/* Note that filename was already canonicalized */
+		char	   *filename = (char *) lfirst(l);
+		char	   *expanded = NULL;
+
+		/* If restricting, insert $libdir/plugins if not mentioned already */
+		if (restricted && first_dir_separator(filename) == NULL)
+		{
+			expanded = psprintf("$libdir/plugins/%s", filename);
+			filename = expanded;
+		}
+
+		/*
+		 * stat()/access() only check that the library exists, not that it has
+		 * the correct magic number or even a library.  But error messages
+		 * from dlopen() are not portable, so it'd be hard to report any
+		 * problem other than "does not exist".
+		 */
+		if (access(filename, R_OK) == 0)
+			continue;
+
+		if (source == PGC_S_FILE)
+			/* ALTER SYSTEM */
+			ereport(WARNING,
+	errcode_for_file_access(),
+	errmsg("could not access file \"%s\"", filename),
+	errdetail("The server will currently fail to start with this setting."),
+	errhint("If the server is shut down, it will be necessary to manually edit the %s file to allow it to start again.",
+			"postgresql.auto.conf"));
+		else
+			/* ALTER ROLE/DATABASE */
+			ereport(WARNING,
+	errcode_for_file_acce

Re: ALTER TABLE SET ACCESS METHOD on partitioned tables

2024-05-21 Thread Justin Pryzby
It occurred to me that psql \dP+ should show the AM of partitioned
tables (and other partitioned rels).
Arguably, this could've been done when \dP was introduced in v12, but
at that point would've shown the AM only for partitioned indexes.
But it makes a lot of sense to do it now that partitioned tables support
AMs.  I suggest to consider this for v17.

regression=# \dP+
  List of partitioned relations
 Schema |         Name         |  Owner  |       Type        |     Table      | Access method | Total size | Description
--------+----------------------+---------+-------------------+----------------+---------------+------------+-------------
 public | mlparted             | pryzbyj | partitioned table |                | heap2         | 104 kB     |
 public | tableam_parted_heap2 | pryzbyj | partitioned table |                |               | 32 kB      |
 public | trigger_parted       | pryzbyj | partitioned table |                |               | 0 bytes    |
 public | upsert_test          | pryzbyj | partitioned table |                |               | 8192 bytes |
 public | trigger_parted_pkey  | pryzbyj | partitioned index | trigger_parted | btree         | 16 kB      |
 public | upsert_test_pkey     | pryzbyj | partitioned index | upsert_test    | btree         | 8192 bytes |
---
 src/bin/psql/describe.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index f67bf0b8925..22a668409e7 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -4113,7 +4113,7 @@ listPartitionedTables(const char *reltypes, const char *pattern, bool verbose)
 	PQExpBufferData title;
 	PGresult   *res;
 	printQueryOpt myopt = pset.popt;
-	bool		translate_columns[] = {false, false, false, false, false, false, false, false, false};
+	bool		translate_columns[] = {false, false, false, false, false, false, false, false, false, false};
 	const char *tabletitle;
 	bool		mixed_output = false;
 
@@ -4181,6 +4181,14 @@ listPartitionedTables(const char *reltypes, const char *pattern, bool verbose)
 
 	if (verbose)
 	{
+		/*
+		 * Table access methods were introduced in v12, and can be set on
+		 * partitioned tables since v17.
+		 */
+		appendPQExpBuffer(&buf,
+						  ",\n  am.amname as \"%s\"",
+						  gettext_noop("Access method"));
+
 		if (showNested)
 		{
 			appendPQExpBuffer(&buf,
@@ -4216,6 +4224,9 @@ listPartitionedTables(const char *reltypes, const char *pattern, bool verbose)
 
 	if (verbose)
 	{
+		appendPQExpBufferStr(&buf,
+							 "\n     LEFT JOIN pg_catalog.pg_am am ON c.relam = am.oid");
+
 		if (pset.sversion < 12)
 		{
 			appendPQExpBufferStr(&buf,
-- 
2.42.0





Re: Add SPLIT PARTITION/MERGE PARTITIONS commands

2024-05-14 Thread Justin Pryzby
On Thu, May 09, 2024 at 12:51:32AM +0300, Alexander Korotkov wrote:
> > > However, parent's table extended statistics already covers all its
> > > children.
> >
> > => That's the wrong explanation.  It's not that "stats on the parent
> > table cover its children".  It's that there are two types of stats:
> > stats for the "table hierarchy" and stats for the individual table.
> > That's true for single-column stats as well as for extended stats.
> > In both cases, that's indicated by the inh flag in the code and in the
> > catalog.
> >
> > The right explanation is that extended stats on partitioned tables are
> > not similar to indexes.  Indexes on parent table are nothing other than
> > a mechanism to create indexes on the child tables.  That's not true for
> > stats.
> >
> > See also my prior messages
> > ZiJW1g2nbQs9ekwK@pryzbyj2023
> > Zi5Msg74C61DjJKW@pryzbyj2023
> 
> Yes, I understand that parents pg_statistic entry with stainherit ==
> true includes statistics for the children.  I tried to express this by
> word "covers".  But you're right, this is the wrong explanation.
> 
> Can I, please, ask you to revise the patch?

I tried to make this clear but it'd be nice if someone (Tomas/Alvaro?)
would check that this says what's wanted.

-- 
Justin
>From 265207e5bdb215600ce5d7b45f627bc41fc2bc26 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov 
Date: Wed, 8 May 2024 20:32:20 +0300
Subject: [PATCH] Don't copy extended statistics during MERGE/SPLIT partition
 operations

When MERGE/SPLIT created new partitions, it was cloning the extended
statistics of the parent table.

However, extended stats on partitioned tables are not analogous to
indexes on partitioned tables (which exist only to create physical
indexes on child tables).  Rather, extended stats on a parent 1) cause
extended stats to be collected and computed across the whole partition
hierarchy, and 2) do not cause extended stats to be computed for the
individual partitions.

"CREATE TABLE ... PARTITION OF" command doesn't copy extended
statistics.  This commit makes createPartitionTable() behave
consistently.

Reported-by: Justin Pryzby
Discussion: https://postgr.es/m/ZiJW1g2nbQs9ekwK%40pryzbyj2023
---
 doc/src/sgml/ref/alter_table.sgml | 9 +++--
 src/backend/commands/tablecmds.c  | 8 +---
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 891fa4a7a04..313c722ee7f 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -1154,9 +1154,12 @@ WITH ( MODULUS numeric_literal, REM
  
  
   The new partitions will be created the same as tables created with the
-  SQL command CREATE TABLE partition_nameN (LIKE name INCLUDING ALL EXCLUDING INDEXES EXCLUDING IDENTITY).
+  SQL command CREATE TABLE partition_nameN (LIKE name INCLUDING ALL EXCLUDING INDEXES EXCLUDING IDENTITY EXCLUDING STATISTICS).
   The indexes and identity are created later, after moving the data
   into the new partitions.
+  Extended statistics aren't copied from the parent table, for consistency with
+  CREATE TABLE PARTITION OF.
+
   New partitions will have the same table access method as the parent.
   If the parent table is persistent then new partitions are created
   persistent.  If the parent table is temporary then new partitions
@@ -1224,9 +1227,11 @@ WITH ( MODULUS numeric_literal, REM
  
  
   The new partition will be created the same as a table created with the
-  SQL command CREATE TABLE partition_name (LIKE name INCLUDING ALL EXCLUDING INDEXES EXCLUDING IDENTITY).
+  SQL command CREATE TABLE partition_name (LIKE name INCLUDING ALL EXCLUDING INDEXES EXCLUDING IDENTITY EXCLUDING STATISTICS).
   The indexes and identity are created later, after moving the data
   into the new partition.
+  Extended statistics aren't copied from the parent table, for consistency with
+  CREATE TABLE PARTITION OF.
   The new partition will have the same table access method as the parent.
   If the parent table is persistent then the new partition is created
   persistent.  If the parent table is temporary then the new partition
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 79c9c031833..50fc54cb309 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -20419,7 +20419,7 @@ moveSplitTableRows(Relation rel, Relation splitRel, List *partlist, List *newPar
  * (newPartName) like table (modelRel)
  *
  * Emulates command: CREATE [TEMP] TABLE  (LIKE 
- * INCLUDING ALL EXCLUDING INDEXES EXCLUDING IDENTITY)
+ * INCLUDING ALL EXCLUDING INDEXES EXCLUDING IDENTITY EXCLUDING STATISTICS)
  *
  * Also, this function sets the new partition acc

Re: Add SPLIT PARTITION/MERGE PARTITIONS commands

2024-05-08 Thread Justin Pryzby
On Wed, May 08, 2024 at 09:00:10PM +0300, Alexander Korotkov wrote:
> On Fri, May 3, 2024 at 4:32 PM Alexander Korotkov  
> wrote:
> > On Fri, May 3, 2024 at 4:23 PM Justin Pryzby  wrote:
> > > On Wed, May 01, 2024 at 10:51:24PM +0300, Dmitry Koval wrote:
> > > > 30.04.2024 23:15, Justin Pryzby пишет:
> > > > > Is this issue already fixed ?
> > > > > I wasn't able to reproduce it.  Maybe it only happened with earlier
> > > > > patch versions applied ?
> > > >
> > > > I think this was fixed in commit [1].
> > > >
> > > > [1] 
> > > > https://github.com/postgres/postgres/commit/fcf80c5d5f0f3787e70fca8fd029d2e08a923f91
> > >
> > > I tried to reproduce it at fcf80c5d5f~, but couldn't.
> > > I don't see how that patch would fix it anyway.
> > > I'm hoping Alexander can confirm what happened.
> >
> > This problem is only relevant for an old version of fix [1], which
> > overrides schemas for new partitions.  That version was never
> > committed.
> 
> Here are the patches.
> 0002 Skips copying extended statistics while creating new partitions in 
> MERGE/SPLIT
> 
> For 0002 I'd like to hear some feedback on wordings used in docs and comments.

commit message:

Currenlty => Currently
partiions => partitions
copying => by copying

> However, parent's table extended statistics already covers all its
> children.

=> That's the wrong explanation.  It's not that "stats on the parent
table cover its children".  It's that there are two types of stats:
stats for the "table hierarchy" and stats for the individual table.
That's true for single-column stats as well as for extended stats.
In both cases, that's indicated by the inh flag in the code and in the
catalog.

The right explanation is that extended stats on partitioned tables are
not similar to indexes.  Indexes on a parent table are nothing other than
a mechanism to create indexes on the child tables.  That's not true for
stats.

See also my prior messages
ZiJW1g2nbQs9ekwK@pryzbyj2023
Zi5Msg74C61DjJKW@pryzbyj2023
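The two flavors are visible in the catalogs; on v15+ the extended-stats analogue of pg_statistic.stainherit is pg_statistic_ext_data.stxdinherit (a sketch):

```sql
-- one row per statistics object per inheritance flavor (v15+)
SELECT e.stxname, d.stxdinherit
FROM pg_catalog.pg_statistic_ext e
JOIN pg_catalog.pg_statistic_ext_data d ON d.stxoid = e.oid;
```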

I think EXCLUDING IDENTITY can/should now also be removed - see 509199587.
I'm not able to reproduce that problem anyway, even before that...

-- 
Justin




Re: backend stuck in DataFileExtend

2024-05-07 Thread Justin Pryzby
On Tue, May 07, 2024 at 10:55:28AM +1200, Thomas Munro wrote:
> On Tue, May 7, 2024 at 6:21 AM Justin Pryzby  wrote:
> > FWIW: both are running zfs-2.2.3 RPMs from zfsonlinux.org.
> ...
> > Yes, they're running centos7 with the indicated kernels.
> 
> So far we've got:
> 
> * spurious EIO when opening a file (your previous report)
> * hanging with CPU spinning (?) inside pwritev()
> * old kernel, bleeding edge ZFS
> 
> From an (uninformed) peek at the ZFS code, if it really is spinning
> there is seems like a pretty low level problem: it's finish the write,
> and now is just trying to release (something like our unpin) and
> unlock the buffers, which involves various code paths that might touch
> various mutexes and spinlocks, and to get stuck like that I guess it's
> either corrupted itself or it is deadlocking against something else,
> but what?  Do you see any other processes (including kernel threads)
> with any stuck stacks that might be a deadlock partner?

Sorry, but even after forgetting several times, I finally remembered to
go back to the issue, and had already rebooted the VM, as was needed to
kill the stuck process.

But .. it seems to have recurred again this AM.  Note that yesterday,
I'd taken the opportunity to upgrade to zfs-2.2.4.

These two procs are the oldest active postgres procs, and also (now)
adjacent in ps -ef --sort start_time.

-[ RECORD 1 ]----+------------------------------
backend_start| 2024-05-07 09:45:06.228528-06
application_name | 
xact_start   | 2024-05-07 09:55:38.409549-06
query_start  | 2024-05-07 09:55:38.409549-06
state_change | 2024-05-07 09:55:38.409549-06
pid  | 27449
backend_type | autovacuum worker
wait_event_type  | BufferPin
wait_event   | BufferPin
state| active
backend_xid  | 
backend_xmin | 4293757489
query_id | 
left | autovacuum: VACUUM ANALYZE child.
-[ RECORD 2 ]----+------------------------------
backend_start| 2024-05-07 09:49:24.686314-06
application_name | MasterMetricsLoader -n -m Xml
xact_start   | 2024-05-07 09:50:30.387156-06
query_start  | 2024-05-07 09:50:32.744435-06
state_change | 2024-05-07 09:50:32.744436-06
pid  | 5051
backend_type | client backend
wait_event_type  | IO
wait_event   | DataFileExtend
state| active
backend_xid  | 4293757489
backend_xmin | 4293757429
query_id | 2230046159014513529
left | PREPARE mml_0 AS INSERT INTO chil

The earlier proc is doing:
strace: Process 27449 attached
epoll_wait(11, ^Cstrace: Process 27449 detached
 

The later process is stuck, with:
[pryzbyj@telsasoft-centos7 ~]$ sudo cat /proc/5051/stack 
[] 0x

For good measure:
[pryzbyj@telsasoft-centos7 ~]$ sudo cat /proc/27433/stack 
[] taskq_thread+0x48e/0x4e0 [spl]
[] kthread+0xd1/0xe0
[] ret_from_fork_nospec_end+0x0/0x39
[] 0x
[pryzbyj@telsasoft-centos7 ~]$ sudo cat /proc/27434/stack 
[] taskq_thread+0x48e/0x4e0 [spl]
[] kthread+0xd1/0xe0
[] ret_from_fork_nospec_end+0x0/0x39
[] 0x
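To answer the deadlock-partner question more systematically next time, something like this sweep could help (a sketch; dump_stacks is just a throwaway helper name, and kernel stacks need root):

```shell
# Print /proc/<pid>/stack for each PID given; falls back gracefully
# when the stack file is unreadable (e.g. not running as root).
dump_stacks() {
    for p in "$@"; do
        echo "== pid $p =="
        cat /proc/"$p"/stack 2>/dev/null || echo "(stack unavailable)"
    done
}

# e.g. sweep all postgres backends plus the ZFS taskq threads:
dump_stacks $(pgrep -x postgres; pgrep -f z_wr_iss)
```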

[pryzbyj@telsasoft-centos7 ~]$ ps -O wchan 5051 27449
  PID === S TTY  TIME COMMAND
 5051 ?       R ?        02:14:27 postgres: telsasoft ts ::1(53708) INSERT
27449 ep_poll S ?        00:05:16 postgres: autovacuum worker ts

The interesting procs might be:

ps -eO wchan===,lstart --sort start_time
...
15632 worker_thread  Mon May  6 23:51:34 2024 S ?        00:00:00 [kworker/2:2H]
27433 taskq_thread   Tue May  7 09:35:59 2024 S ?        00:00:56 [z_wr_iss]
27434 taskq_thread   Tue May  7 09:35:59 2024 S ?        00:00:57 [z_wr_iss]
27449 ep_poll        Tue May  7 09:45:05 2024 S ?        00:05:16 postgres: autovacuum worker ts
 5051 ?              Tue May  7 09:49:23 2024 R ?        02:23:04 postgres: telsasoft ts ::1(53708) INSERT
 7861 ep_poll        Tue May  7 09:51:25 2024 S ?        00:03:04 /usr/local/bin/python3.8 -u /home/telsasoft/server/alarms/core/pr...
 7912 ep_poll        Tue May  7 09:51:27 2024 S ?        00:00:00 postgres: telsasoft ts ::1(53794) idle
24518 futex_wait_que Tue May  7 10:42:56 2024 S ?        00:00:55 java -jar /home/telsasoft/server/alarms/alcatel_lucent/jms/jms2rm...
...

> While looking around for reported issues I found your abandoned report
> against an older ZFS version from a few years ago, same old Linux
> version:
> 
> https://github.com/openzfs/zfs/issues/11641
> 
> I don't know enough to say anything useful about that but it certainly
> smells similar...

Wow - I'd completely forgotten about that problem report.
The symptoms are the same, even with a zfs version 3+ years newer.
I wish the ZFS people would do more with their problem reports.

BTW, we'll be upgrading this VM to a newer kernel, if not a newer OS
(for some reason, 

pg_restore -N loses extension comment

2024-05-07 Thread Justin Pryzby
pg_dump -Fc |pg_restore -l -N schema:

| 2; 3079 18187 EXTENSION - pg_buffercache 

Without -N schema also shows:

| 2562; 0 0 COMMENT - EXTENSION pg_buffercache 

I mean literal s-c-h-e-m-a, but I suppose anything else will work the
same.

BTW, I noticed that pg_restore -v shows that duplicate dependencies can be
stored.  We see things like this (and worse).

| 4284; 1259 191439414 VIEW public wmg_server_view telsasoft
| ;  depends on: 612 612 612 612 612 612 612 612 612 612 612 612 612 612 
612 612 612 612 612 612 612 612 612 612 612 612 612 612 612 612 612 612 612 612 
612 612 612 612 612 612 612 612 612 612 612 612 612 612 612 612 612 612 612 612 
612 612 23087 612

I see that's possible not only for views, but also for tables.
That's probably wasteful of CPU, at least.
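A quick way to quantify the duplication from `pg_restore -v` output (a sketch; the inline sample line stands in for real output):

```shell
# Count repeated dependency IDs on a ";  depends on:" line.
# The sample line here is a stand-in for real `pg_restore -v` output;
# 612 appears four times, 23087 once.
printf ';  depends on: 612 612 612 23087 612\n' |
    tr ' ' '\n' |
    grep -E '^[0-9]+$' |
    sort | uniq -c | sort -rn
```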

-- 
Justin




Re: bug: copy progress reporting of backends which run multiple COPYs

2024-05-07 Thread Justin Pryzby
On Sat, Jan 21, 2023 at 02:45:40AM +0100, Matthias van de Meent wrote:
> > Would you do anything different in the master branch, with no
> > compatibility constraints ?  I think the progress reporting would still
> > be limited to one row per backend, not one per CopyFrom().
> 
> I think I would at least introduce another parameter to BeginCopyFrom
> for progress reporting (instead of relying on pstate != NULL), like
> how we have a bit in reindex_index's params->options that specifies
> whether we want progress reporting (which is unset for parallel
> workers iirc).

This didn't get fixed for v16, and it seems unlikely that it'll be
addressed in back branches.

But while I was reviewing forgotten threads, it occurred to me to raise
the issue in time to fix it for v17.

-- 
Justin




Re: backend stuck in DataFileExtend

2024-05-06 Thread Justin Pryzby
On Mon, May 06, 2024 at 10:51:08AM -0700, Andres Freund wrote:
> Hi,
> 
> On 2024-05-06 12:37:26 -0500, Justin Pryzby wrote:
> > On Mon, May 06, 2024 at 10:04:13AM -0700, Andres Freund wrote:
> > > Hi,
> > >
> > > On 2024-05-06 09:05:38 -0500, Justin Pryzby wrote:
> > > > In March, I noticed that a backend got stuck overnight doing:
> > > >
> > > > backend_start| 2024-03-27 22:34:12.774195-07
> > > > xact_start   | 2024-03-27 22:34:39.741933-07
> > > > query_start  | 2024-03-27 22:34:41.997276-07
> > > > state_change | 2024-03-27 22:34:41.997307-07
> > > > wait_event_type  | IO
> > > > wait_event   | DataFileExtend
> > > > state| active
> > > > backend_xid  | 330896991
> > > > backend_xmin | 330896991
> > > > query_id | -3255287420265634540
> > > > query| PREPARE mml_1 AS INSERT INTO child.un...
> > > >
> > > > The backend was spinning at 100% CPU:
> > > >
> > > > [pryzbyj@telsa2021 ~]$ ps -O wchan,pcpu 7881
> > > >PID WCHAN  %CPU S TTY  TIME COMMAND
> > > >   7881 ?  99.4 R ?08:14:55 postgres: telsasoft ts [local] 
> > > > INSERT
> > > >
> > > > This was postgres 16 STABLE compiled at 14e991db8.
> > > >
> > > > It's possible that this is a rare issue that we haven't hit before.
> > > > It's also possible that this is a recent regression.  We previously
> > > > compiled at b2c9936a7 without hitting any issue (possibly by chance).
> > > >
> > > > I could neither strace the process nor attach a debugger.  They got
> > > > stuck attaching.  Maybe it's possible there's a kernel issue.  This is a
> > > > VM running centos7 / 3.10.0-1160.49.1.el7.x86_64.
> > >
> > > > $ awk '{print $14, $15}' /proc/7881/stat # usr / sys
> > > > 229 3088448
> > > >
> > > > When I tried to shut down postgres (hoping to finally be able to attach
> > > > a debugger), instead it got stuck:
> > >
> > > That strongly indicates either a kernel bug or storage having an issue. It
> > > can't be postgres' fault if an IO never completes.
> >
> > Is that for sure even though wchan=? (which I take to mean "not in a system
> > call"), and the process is stuck in user mode ?
> 
> Postgres doesn't do anything to prevent a debugger from working, so this is
> just indicative that the kernel is stuck somewhere that it didn't set up
> information about being blocked - because it's busy doing something.
> 
> 
> > > What do /proc/$pid/stack say?
> >
> > [pryzbyj@telsasoft-centos7 ~]$ sudo cat /proc/18468/stat
> > 18468 (postgres) R 2274 18468 18468 0 -1 4857920 91836 0 3985 0 364 3794271 
> > 0 0 20 0 1 0 6092292660 941846528 10 18446744073709551615 4194304 12848820 
> > 140732995870240 140732995857304 139726958536394 0 4194304 19929088 
> > 536896135 0 0 0 17 3 0 0 1682 0 0 14949632 15052146 34668544 
> > 140732995874457 140732995874511 140732995874511 140732995874781 0
> 
> stack, not stat...

Ah, that is illuminating - thanks.

[pryzbyj@telsasoft-centos7 ~]$ sudo cat /proc/18468/stack 
[] __cond_resched+0x26/0x30
[] dbuf_rele+0x1e/0x40 [zfs]
[] dmu_buf_rele_array.part.6+0x40/0x70 [zfs]
[] dmu_write_uio_dnode+0x11a/0x180 [zfs]
[] dmu_write_uio_dbuf+0x54/0x70 [zfs]
[] zfs_write+0xb9b/0xfb0 [zfs]
[] zpl_aio_write+0x152/0x1a0 [zfs]
[] do_sync_readv_writev+0x7b/0xd0
[] do_readv_writev+0xce/0x260
[] vfs_writev+0x35/0x60
[] SyS_pwritev+0xc2/0xf0
[] system_call_fastpath+0x25/0x2a
[] 0x

FWIW: both are running zfs-2.2.3 RPMs from zfsonlinux.org.

It's surely possible that there's an issue that affects older kernels
but not more recent ones.

> > > > Full disclosure: the VM that hit this issue today has had storage-level
> > > > errors (reported here at ZZqr_GTaHyuW7fLp@pryzbyj2023), as recently as 3
> > > > days ago.
> > >
> > > So indeed, my suspicion from above is confirmed.
> >
> > I'd be fine with that conclusion (as in the earlier thread), except this
> > has now happened on 2 different VMs, and the first one has no I/O
> > issues.  If this were another symptom of a storage failure, and hadn't
> > previously happened on another VM, I wouldn't be re-reporting it.
> 
> Is it the same VM hosting environment? And the same old distro?

Yes, they're running centos7 with the indicated kernels.

dmidecode shows they're both running:

Product Name: VMware Virtual Platform

But they're different customers, so I'd be somewhat surprised if they're
running the same version of the hypervisor.

-- 
Justin




Re: backend stuck in DataFileExtend

2024-05-06 Thread Justin Pryzby
On Mon, May 06, 2024 at 10:04:13AM -0700, Andres Freund wrote:
> Hi,
> 
> On 2024-05-06 09:05:38 -0500, Justin Pryzby wrote:
> > In March, I noticed that a backend got stuck overnight doing:
> > 
> > backend_start| 2024-03-27 22:34:12.774195-07
> > xact_start   | 2024-03-27 22:34:39.741933-07
> > query_start  | 2024-03-27 22:34:41.997276-07
> > state_change | 2024-03-27 22:34:41.997307-07
> > wait_event_type  | IO
> > wait_event   | DataFileExtend
> > state| active
> > backend_xid  | 330896991
> > backend_xmin | 330896991
> > query_id | -3255287420265634540
> > query| PREPARE mml_1 AS INSERT INTO child.un...
> > 
> > The backend was spinning at 100% CPU:
> > 
> > [pryzbyj@telsa2021 ~]$ ps -O wchan,pcpu 7881
> >PID WCHAN  %CPU S TTY  TIME COMMAND
> >   7881 ?  99.4 R ?08:14:55 postgres: telsasoft ts [local] INSERT
> > 
> > This was postgres 16 STABLE compiled at 14e991db8.
> > 
> > It's possible that this is a rare issue that we haven't hit before.
> > It's also possible that this is a recent regression.  We previously
> > compiled at b2c9936a7 without hitting any issue (possibly by chance).
> > 
> > I could neither strace the process nor attach a debugger.  They got
> > stuck attaching.  Maybe it's possible there's a kernel issue.  This is a
> > VM running centos7 / 3.10.0-1160.49.1.el7.x86_64.
> 
> > $ awk '{print $14, $15}' /proc/7881/stat # usr / sys
> > 229 3088448
> > 
> > When I tried to shut down postgres (hoping to finally be able to attach
> > a debugger), instead it got stuck:
> 
> That strongly indicates either a kernel bug or storage having an issue. It
> can't be postgres' fault if an IO never completes.

Is that for sure even though wchan=? (which I take to mean "not in a system
call"), and the process is stuck in user mode ?

> What do /proc/$pid/stack say?

[pryzbyj@telsasoft-centos7 ~]$ sudo cat /proc/18468/stat
18468 (postgres) R 2274 18468 18468 0 -1 4857920 91836 0 3985 0 364 3794271 0 0 
20 0 1 0 6092292660 941846528 10 18446744073709551615 4194304 12848820 
140732995870240 140732995857304 139726958536394 0 4194304 19929088 536896135 0 
0 0 17 3 0 0 1682 0 0 14949632 15052146 34668544 140732995874457 
140732995874511 140732995874511 140732995874781 0

> > In both cases, the backend got stuck after 10pm, which is when a backup
> > job kicks off, followed by other DB maintenance.  Our backup job uses
> > pg_export_snapshot() + pg_dump --snapshot.  In today's case, the pg_dump
> > would've finished and snapshot closed at 2024-05-05 22:15.  The backup
> > job did some more pg_dumps involving historic tables without snapshots
> > and finished at 01:11:40, at which point a reindex job started, but it
> > looks like the DB was already stuck for the purpose of reindex, and so
> > the script ended after a handful of commands were "[canceled] due to
> > statement timeout".
> 
> Is it possible that you're "just" waiting for very slow IO? Is there a lot of
> dirty memory? Particularly on these old kernels that can lead to very extreme
> delays.
> 
> grep -Ei 'dirty|writeback' /proc/meminfo

[pryzbyj@telsasoft-centos7 ~]$ grep -Ei 'dirty|writeback' /proc/meminfo
Dirty:                28 kB
Writeback:             0 kB
WritebackTmp:          0 kB

> > Full disclosure: the VM that hit this issue today has had storage-level
> > errors (reported here at ZZqr_GTaHyuW7fLp@pryzbyj2023), as recently as 3
> > days ago.
> 
> So indeed, my suspicion from above is confirmed.

I'd be fine with that conclusion (as in the earlier thread), except this
has now happened on 2 different VMs, and the first one has no I/O
issues.  If this were another symptom of a storage failure, and hadn't
previously happened on another VM, I wouldn't be re-reporting it.

-- 
Justin




Re: pg17 issues with not-null contraints

2024-05-06 Thread Justin Pryzby
On Mon, May 06, 2024 at 06:34:16PM +0200, Alvaro Herrera wrote:
> On 2024-May-06, Justin Pryzby wrote:
> 
> > > (Do you really want the partition to be
> > > created without the primary key already there?)
> > 
> > Why not ?  The PK will be added when I attach it one moment later.
> > 
> > CREATE TABLE part (LIKE parent);
> > ALTER TABLE parent ATTACH PARTITION part ...
> 
> Well, if you load data in the meantime, you'll spend time during `ALTER
> TABLE parent` for the index to be created.  (On the other hand, you may
> want to first create the table, then load data, then create the
> indexes.)

To be clear, I'm referring to the case of CREATE+ATTACH to avoid a
strong lock while creating a partition in advance of loading data.  See:
20220718143304.gc18...@telsasoft.com
f170b572d2b4cc232c5b6d391b4ecf3e368594b7
898e5e3290a72d288923260143930fb32036c00c

> This would also solve your complaint, because then the table would have
> the not-null constraint in all cases.

I agree that it would solve my complaint, but at this time I've no
further opinion.

-- 
Justin




Re: pg17 issues with not-null contraints

2024-05-06 Thread Justin Pryzby
On Mon, May 06, 2024 at 05:56:54PM +0200, Alvaro Herrera wrote:
> On 2024-May-04, Alvaro Herrera wrote:
> > On 2024-May-03, Justin Pryzby wrote:
> > 
> > > But if it's created with LIKE:
> > > postgres=# CREATE TABLE t1 (LIKE t);
> > > postgres=# ALTER TABLE t ATTACH PARTITION t1 DEFAULT ;
> > > 
> > > ..one also sees:
> > > 
> > > Not-null constraints:
> > > "t1_i_not_null" NOT NULL "i"
> > 
> > Hmm, I think the problem here is not ATTACH; the not-null constraint is
> > there immediately after CREATE.  I think this is all right actually,
> > because we derive a not-null constraint from the primary key and this is
> > definitely intentional.  But I also think that if you do CREATE TABLE t1
> > (LIKE t INCLUDING CONSTRAINTS) then you should get only the primary key
> > and no separate not-null constraint.  That will make the table more
> > similar to the one being copied.
> 
> I misspoke -- it's INCLUDING INDEXES that we need here, not INCLUDING
> CONSTRAINTS ... and it turns out we already do it that way, so with this
> script
> 
> CREATE TABLE t (i int PRIMARY KEY) PARTITION BY RANGE (i);
> CREATE TABLE t1 (LIKE t INCLUDING INDEXES);
> ALTER TABLE t ATTACH PARTITION t1 DEFAULT ;
> 
> you end up with this
> 
> 55432 17devel 71313=# \d+ t
>                                     Partitioned table "public.t"
>  Column │  Type   │ Collation │ Nullable │ Default │ Storage │ Compression │ Stats target │ Description
> ────────┼─────────┼───────────┼──────────┼─────────┼─────────┼─────────────┼──────────────┼─────────────
>  i      │ integer │           │ not null │         │ plain   │             │              │
> Partition key: RANGE (i)
> Indexes:
> "t_pkey" PRIMARY KEY, btree (i)
> Partitions: t1 DEFAULT
> 
> 55432 17devel 71313=# \d+ t1
>                                           Table "public.t1"
>  Column │  Type   │ Collation │ Nullable │ Default │ Storage │ Compression │ Stats target │ Description
> ────────┼─────────┼───────────┼──────────┼─────────┼─────────┼─────────────┼──────────────┼─────────────
>  i      │ integer │           │ not null │         │ plain   │             │              │
> Partition of: t DEFAULT
> No partition constraint
> Indexes:
> "t1_pkey" PRIMARY KEY, btree (i)
> Access method: heap
> 
> which I think is what you want.  (Do you really want the partition to be
> created without the primary key already there?)

Why not ?  The PK will be added when I attach it one moment later.

CREATE TABLE part (LIKE parent);
ALTER TABLE parent ATTACH PARTITION part ...

Do you really think that after ATTACH, the constraints should be
different depending on whether the child was created INCLUDING INDEXES ?
I'll continue to think about this, but I still find that surprising.

-- 
Justin




backend stuck in DataFileExtend

2024-05-06 Thread Justin Pryzby
In March, I noticed that a backend got stuck overnight doing:

backend_start| 2024-03-27 22:34:12.774195-07
xact_start   | 2024-03-27 22:34:39.741933-07
query_start  | 2024-03-27 22:34:41.997276-07
state_change | 2024-03-27 22:34:41.997307-07
wait_event_type  | IO
wait_event   | DataFileExtend
state| active
backend_xid  | 330896991
backend_xmin | 330896991
query_id | -3255287420265634540
query| PREPARE mml_1 AS INSERT INTO child.un...

The backend was spinning at 100% CPU:

[pryzbyj@telsa2021 ~]$ ps -O wchan,pcpu 7881
   PID WCHAN  %CPU S TTY  TIME COMMAND
  7881 ?  99.4 R ?08:14:55 postgres: telsasoft ts [local] INSERT

This was postgres 16 STABLE compiled at 14e991db8.

It's possible that this is a rare issue that we haven't hit before.
It's also possible that this is a recent regression.  We previously
compiled at b2c9936a7 without hitting any issue (possibly by chance).

I could neither strace the process nor attach a debugger.  They got
stuck attaching.  Maybe it's possible there's a kernel issue.  This is a
VM running centos7 / 3.10.0-1160.49.1.el7.x86_64.

$ awk '{print $14, $15}' /proc/7881/stat # usr / sys
229 3088448

When I tried to shut down postgres (hoping to finally be able to attach
a debugger), instead it got stuck:

$ ps -fu postgres
UID PID   PPID  C STIME TTY  TIME CMD
postgres   7881 119674 99 mar27 ?08:38:06 postgres: telsasoft ts 
[local] INSERT
postgres 119674  1  0 mar25 ?00:07:13 /usr/pgsql-16/bin/postgres -D 
/var/lib/pgsql/16/data/
postgres 119676 119674  0 mar25 ?00:00:11 postgres: logger 
postgres 119679 119674  0 mar25 ?00:11:56 postgres: checkpointer 

This first occurred on Mar 27, but I see today that it's now recurring
at a different customer:

backend_start   | 2024-05-05 22:19:17.009477-06
xact_start  | 2024-05-05 22:20:18.129305-06
query_start | 2024-05-05 22:20:19.409555-06
state_change| 2024-05-05 22:20:19.409556-06
pid | 18468
wait_event_type | IO
wait_event  | DataFileExtend
state   | active
backend_xid | 4236249136
backend_xmin| 4236221661
query_id| 2601062835886299431
left| PREPARE mml_1 AS INSERT INTO chil

This time it's running v16.2 (REL_16_STABLE compiled at b78fa8547),
under centos7 / 3.10.0-1160.66.1.el7.x86_64.

The symptoms are the same: backend stuck using 100% CPU in user mode:
[pryzbyj@telsasoft-centos7 ~]$ awk '{print $14, $15}' /proc/18468/stat  # usr / sys
364 2541168
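(Those two numbers are raw scheduler ticks, not seconds.  A generic sketch for
converting them — assuming Linux procfs, the usual USER_HZ, and a comm field
without embedded spaces, which would otherwise shift the awk field numbers;
the PID here is a placeholder:)

```shell
# Convert utime (field 14) and stime (field 15) from /proc/PID/stat into
# seconds.  Assumes the process's comm field contains no spaces.
pid=$$   # substitute the stuck backend's PID
set -- $(awk '{print $14, $15}' /proc/$pid/stat)
utime=$1 stime=$2
hz=$(getconf CLK_TCK)   # usually 100
echo "user: $((utime / hz))s  system: $((stime / hz))s"
```

At 100 ticks/s, the 2541168 stime ticks above work out to roughly 7 hours of
system time.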

This seems to have affected other backends, which are now waiting on
RegisterSyncRequest, frozenid, and CheckpointerComm.

In both cases, the backend got stuck after 10pm, which is when a backup
job kicks off, followed by other DB maintenance.  Our backup job uses
pg_export_snapshot() + pg_dump --snapshot.  In today's case, the pg_dump
would've finished and snapshot closed at 2024-05-05 22:15.  The backup
job did some more pg_dumps involving historic tables without snapshots
and finished at 01:11:40, at which point a reindex job started, but it
looks like the DB was already stuck for the purpose of reindex, and so
the script ended after a handful of commands were "[canceled] due to
statement timeout".

Full disclosure: the VM that hit this issue today has had storage-level
errors (reported here at ZZqr_GTaHyuW7fLp@pryzbyj2023), as recently as 3
days ago.

Maybe more importantly, this VM also seems to suffer from some memory
leak, and the leaky process was Killed shortly after the stuck query
started.  This makes me suspect a race condition which was triggered
while swapping:

May  5 22:24:05 localhost kernel: Out of memory: Kill process 17157 (python3.8) 
score 134 or sacrifice child

We don't have as good logs from March; I'm not aware of any killed
processes on the VM where we hit this then, though it's true that the
I/O there is not fast.

Also, I fibbed when I said these were compiled at 16 STABLE - I'd
backpatched a small number of patches from master:

a97bbe1f1df Reduce branches in heapgetpage()'s per-tuple loop
98f320eb2ef Increase default vacuum_buffer_usage_limit to 2MB.
44086b09753 Preliminary refactor of heap scanning functions
959b38d770b Invent --transaction-size option for pg_restore.
a45c78e3284 Rearrange pg_dump's handling of large objects for better efficiency.
9d1a5354f58 Fix costing bug in MergeAppend
a5cf808be55 Read include/exclude commands for dump/restore from file
8c16ad3b432 Allow using syncfs() in frontend utilities.
6cdeb32 Add support for syncfs() in frontend support functions.
3ed19567198 Make enum for sync methods available to frontend code.
f39b265808b Move PG_TEMP_FILE* macros to file_utils.h.
a14354cac0e Add GUC parameter "huge_pages_status"

I will need to restart services this morning, but if someone wants to
suggest diagnostic measures, I will see whether the command gets stuck
or not.
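(In case it recurs, here is a generic sketch of low-impact state worth
capturing before restarting anything — nothing here is specific advice from
the thread, and the PID is a placeholder:)

```shell
# Cheap, read-only diagnostics from a spinning process.
pid=$$   # substitute the stuck backend's PID
grep -E '^(State|Threads|VmRSS)' /proc/$pid/status
cat /proc/$pid/wchan; echo                    # kernel wait channel; "0" while running
cat /proc/$pid/syscall 2>/dev/null || true    # current syscall number + args, if any
cat /proc/$pid/stack 2>/dev/null || echo "(kernel stack requires root)"
```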

-- 
Justin




Re: pg17 issues with not-null contraints

2024-05-03 Thread Justin Pryzby
On another thread [0], Alexander Lakhin pointed out, indirectly, that
partitions created using LIKE+ATTACH now have different not-null constraints
from partitions created using PARTITION OF.

postgres=# CREATE TABLE t (i int PRIMARY KEY) PARTITION BY RANGE (i);
postgres=# CREATE TABLE t1 PARTITION OF t DEFAULT ;
postgres=# \d+ t1
...
Partition of: t DEFAULT
No partition constraint
Indexes:
"t1_pkey" PRIMARY KEY, btree (i)
Access method: heap

But if it's created with LIKE:
postgres=# CREATE TABLE t1 (LIKE t);
postgres=# ALTER TABLE t ATTACH PARTITION t1 DEFAULT ;

..one also sees:

Not-null constraints:
"t1_i_not_null" NOT NULL "i"

It looks like ATTACH may have an issue with constraints implied by pkey.

[0] 
https://www.postgresql.org/message-id/8034d1c6-5f0e-e858-9af9-45d5e246515e%40gmail.com

-- 
Justin




Re: Add SPLIT PARTITION/MERGE PARTITIONS commands

2024-05-03 Thread Justin Pryzby
On Wed, May 01, 2024 at 10:51:24PM +0300, Dmitry Koval wrote:
> Hi!
> 
> On 30.04.2024 23:15, Justin Pryzby wrote:
> > Is this issue already fixed ?
> > I wasn't able to reproduce it.  Maybe it only happened with earlier
> > patch versions applied ?
> 
> I think this was fixed in commit [1].
> 
> [1] 
> https://github.com/postgres/postgres/commit/fcf80c5d5f0f3787e70fca8fd029d2e08a923f91

I tried to reproduce it at fcf80c5d5f~, but couldn't.  
I don't see how that patch would fix it anyway.
I'm hoping Alexander can confirm what happened.

The other remaining issues I'm aware of are for EXCLUDING STATISTICS and
refusing to ALTER if the owners don't match.

Note that the error that led to "EXCLUDING IDENTITY" is being discussed
over here:
https://www.postgresql.org/message-id/3b8a9dc1-bbc7-0ef5-6863-c432afac7...@gmail.com

It's possible that once that's addressed, the exclusion should be
removed here, too.

-- 
Justin




Re: Add SPLIT PARTITION/MERGE PARTITIONS commands

2024-04-30 Thread Justin Pryzby
On Thu, Apr 11, 2024 at 08:00:00PM +0300, Alexander Lakhin wrote:
> 11.04.2024 16:27, Dmitry Koval wrote:
> > 
> > Added correction (and test), see 
> > v3-0001-Fix-for-SPLIT-MERGE-partitions-of-temporary-table.patch.
> 
> Thank you for the correction, but maybe an attempt to merge into implicit
> pg_temp should fail just like CREATE TABLE ... PARTITION OF ... does?
> 
> Please look also at another anomaly with schemas:
> CREATE SCHEMA s1;
> CREATE TABLE t (i int) PARTITION BY RANGE (i);
> CREATE TABLE tp_0_2 PARTITION OF t
>   FOR VALUES FROM (0) TO (2);
> ALTER TABLE t SPLIT PARTITION tp_0_2 INTO
>   (PARTITION s1.tp0 FOR VALUES FROM (0) TO (1), PARTITION s1.tp1 FOR VALUES 
> FROM (1) TO (2));
> results in:
> \d+ s1.*
> Did not find any relation named "s1.*"
> \d+ tp*
>   Table "public.tp0"

Hi,

Is this issue already fixed ?

I wasn't able to reproduce it.  Maybe it only happened with earlier
patch versions applied ?

Thanks,
-- 
Justin




Re: pg17 issues with not-null contraints

2024-04-30 Thread Justin Pryzby
On Tue, Apr 30, 2024 at 01:52:02PM -0400, Robert Haas wrote:
> On Thu, Apr 18, 2024 at 12:52 PM Justin Pryzby  wrote:
> > I'm not totally clear on what's intended in v17 - maybe it'd be dead
> > code, and maybe it shouldn't even be applied to master branch.  But I do
> > think it's worth patching earlier versions (even though it'll be less
> > useful than having done so 5 years ago).
> 
> This thread is still on the open items list, but I'm not sure whether
> there's still stuff here that needs to be fixed for the current
> release. If not, this thread should be moved to the "resolved before
> 17beta1" section. If so, we should try to reach consensus on what the
> remaining issues are and what we're going to do about them.

I think the only thing that's relevant for v17 is this:

On Tue, Apr 16, 2024 at 08:11:49PM +0200, Alvaro Herrera wrote:
> Speaking of which, I wonder if I should modify pg16's tests so that they
> leave behind tables set up in this way, to immortalize pg_upgrade testing.

The patch on the other thread for pg_upgrade --check addresses an old
issue affecting all stable releases.

-- 
Justin




Re: Internal error codes triggered by tests

2024-04-29 Thread Justin Pryzby
I sent a list of user-facing elogs here, a few times.
ZDclRM/jate66...@telsasoft.com

And if someone had expressed an interest, I might have sent a longer
list.




Re: using extended statistics to improve join estimates

2024-04-29 Thread Justin Pryzby
On Sun, Apr 28, 2024 at 10:07:01AM +0800, Andy Fan wrote:
> 's/estimiatedcluases/estimatedclauses/' typo error in the
> commit message is not fixed since I have to regenerate all the commits

Maybe you know this, but some of these patches need to be squashed.
Regenerating the patches to address feedback is the usual process.
When they're not squashed, it's hard to review the content of the
patches.

For example:
[PATCH v1 18/22] Fix error "unexpected system attribute" when join with system 
attr
..adds .sql regression tests, but the expected .out isn't updated until
[PATCH v1 19/22] Fix the incorrect comment on extended stats.

That fixes an elog() in Tomas' original commit, so it should probably be
002 or 003.  It might make sense to keep the first commit separate for
now, since it's nice to keep Tomas' original patch "pristine" to make
more apparent the changes you're proposing.

Another:
[PATCH v1 20/22] Add fastpath when combine the 2 MCV like eqjoinsel_inner.
..doesn't compile without
[PATCH v1 21/22] When mcv->ndimensions == list_length(clauses), handle it same 
as

Your 022 patch fixes a typo in your 002 patch, which means one first
reads a patch with a typo, and then later a 10-line patch reflowing the
comment with the typo fixed.

A good guideline is that each patch should be self-contained, compiling
and passing tests.  Which is more difficult with a long stack of
patches.
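For what it's worth, folding fixes back into the right patch of the stack is
mechanical with fixup commits plus an autosquash rebase; a self-contained
sketch (the repository, file name, and commit messages here are invented):

```shell
# Record a review fix as a "fixup!" commit against the patch it belongs to,
# then fold the whole stack with a non-interactive autosquash rebase.
set -e
repo=$(mktemp -d); cd "$repo"
git init -q
git -c user.name=t -c user.email=t@t commit -q --allow-empty -m "Add feature X"
echo fix >fix.txt && git add fix.txt
git -c user.name=t -c user.email=t@t commit -q --fixup=HEAD
# GIT_SEQUENCE_EDITOR=true accepts the generated todo list as-is, so the
# fixup commit is squashed into "Add feature X" without prompting.
GIT_SEQUENCE_EDITOR=true git -c user.name=t -c user.email=t@t \
    rebase -q -i --autosquash --root
git log --oneline    # a single "Add feature X" commit, now containing fix.txt
```

Regenerating the series with git format-patch afterwards then produces
patches where each one already contains its fixes.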

-- 
Justin




Re: Add SPLIT PARTITION/MERGE PARTITIONS commands

2024-04-28 Thread Justin Pryzby
On Sun, Apr 28, 2024 at 08:18:42AM -0500, Justin Pryzby wrote:
> > I will explore this.  Do we copy extended stats when we do CREATE
> > TABLE ... PARTITION OF?  I think we need to do the same here.
> 
> Right, they're not copied because an extended stats object on the parent
> does something different than putting stats objects on each child.
> I've convinced myself that it's wrong to copy the parent's stats obj.
> If someone wants stats objects on each child, they'll have to handle
> them specially after MERGE/SPLIT, just as they would for per-child
> defaults/constraints/etc.

I dug up this thread, in which the idea of copying extended stats from
parent to child was considered some 6 years ago, but never implemented;
for consistency, MERGE/SPLIT shouldn't copy extended stats, either.

https://www.postgresql.org/message-id/20180305195750.aecbpihhcvuskzba%40alvherre.pgsql

-- 
Justin




Re: Add SPLIT PARTITION/MERGE PARTITIONS commands

2024-04-28 Thread Justin Pryzby
On Sun, Apr 28, 2024 at 04:04:54AM +0300, Alexander Korotkov wrote:
> Hi Justin,
> 
> Thank you for your review.  Please check v9 of the patchset [1].
> 
> On Wed, Apr 24, 2024 at 11:26 PM Justin Pryzby  wrote:
> > This patch also/already fixes the schema issue I reported.  Thanks.
> >
> > If you wanted to include a test case for that:
> >
> > begin;
> > CREATE SCHEMA s;
> > CREATE SCHEMA t;
> > CREATE TABLE p(i int) PARTITION BY RANGE(i);
> > CREATE TABLE s.c1 PARTITION OF p FOR VALUES FROM (1)TO(2);
> > CREATE TABLE s.c2 PARTITION OF p FOR VALUES FROM (2)TO(3);
> > ALTER TABLE p MERGE PARTITIONS (s.c1, s.c2) INTO s.c1; -- misbehaves if 
> > merging into the same name as an existing partition
> > \d+ p
> > ...
> > Partitions: c1 FOR VALUES FROM (1) TO (3)
> 
> There is already a test which checks merging into the same name as an
> existing partition.  And there are tests with schema-qualified names.
> I'm not yet convinced we need a test with both these properties
> together.

I mentioned that the combination of schemas and merge-into-same-name is
what currently doesn't work right.

> > Also, extended stats objects are currently cloned to new child tables.
> > But I suggested in [0] that they probably shouldn't be.
> 
> I will explore this.  Do we copy extended stats when we do CREATE
> TABLE ... PARTITION OF?  I think we need to do the same here.

Right, they're not copied because an extended stats object on the parent
does something different than putting stats objects on each child.
I've convinced myself that it's wrong to copy the parent's stats obj.
If someone wants stats objects on each child, they'll have to handle
them specially after MERGE/SPLIT, just as they would for per-child
defaults/constraints/etc.

On Sun, Apr 28, 2024 at 04:04:54AM +0300, Alexander Korotkov wrote:
> On Wed, Apr 24, 2024 at 11:26 PM Justin Pryzby  wrote:
> > This patch adds documentation saying:
> > +  Any indexes, constraints and user-defined row-level triggers that 
> > exist
> > +  in the parent table are cloned on new partitions [...]
> >
> > Which is good to say, and addresses part of my message [0]
> > [0] ZiJW1g2nbQs9ekwK@pryzbyj2023
> 
> Makes sense.  Extracted this into a separate patch in v10.

I adjusted the language some and fixed a typo in the commit message.

s/parition/partition/

-- 
Justin
>From e00033fc4b8254c70bf8a3d41d513edd9540e2d7 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov 
Date: Sun, 28 Apr 2024 03:39:30 +0300
Subject: [PATCH] Document the way partition MERGE/SPLIT operations create new
 partitions

Reported-by: Justin Pryzby
Discussion: https://postgr.es/m/ZilrByTp-pbz6Mvf%40pryzbyj2023
---
 doc/src/sgml/ref/alter_table.sgml | 12 
 1 file changed, 12 insertions(+)

diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index fe36ff82e52..fc2dfffe49f 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -1153,6 +1153,12 @@ WITH ( MODULUS numeric_literal, REM
   splitting we have a partition with the same name).
   Only simple, non-partitioned partition can be split.
  
+ 
+  The new partitions will be created the same as tables created with the
+  SQL command CREATE TABLE partition_nameN (LIKE name INCLUDING ALL EXCLUDING INDEXES EXCLUDING IDENTITY).
+  The indexes and identity are created later, after moving the data
+  into the new partitions.
+ 
  
   
This command acquires an ACCESS EXCLUSIVE lock.
@@ -1213,6 +1219,12 @@ WITH ( MODULUS numeric_literal, REM
   can have the same name as one of the merged partitions.  Only simple,
   non-partitioned partitions can be merged.
  
+ 
+  The new partition will be created the same as a table created with the
+  SQL command CREATE TABLE partition_name (LIKE name INCLUDING ALL EXCLUDING INDEXES EXCLUDING IDENTITY).
+  The indexes and identity are created later, after moving the data
+  into the new partition.
+ 
  
   
This command acquires an ACCESS EXCLUSIVE lock.
-- 
2.42.0



Re: Add SPLIT PARTITION/MERGE PARTITIONS commands

2024-04-24 Thread Justin Pryzby
On Mon, Apr 22, 2024 at 01:31:48PM +0300, Alexander Korotkov wrote:
> Hi!
> 
> On Fri, Apr 19, 2024 at 4:29 PM Alexander Korotkov  
> wrote:
> > On Fri, Apr 19, 2024 at 2:26 AM Dmitry Koval  wrote:
> > > 18.04.2024 19:00, Alexander Lakhin wrote:
> > > > leaves a strange constraint:
> > > > \d+ t*
> > > >Table "public.tp_0"
> > > > ...
> > > > Not-null constraints:
> > > >  "merge-16385-26BCB0-tmp_i_not_null" NOT NULL "i"
> > >
> > > Thanks!
> > > Attached fix (with test) for this case.
> > > The patch should be applied after patches
> > > v6-0001- ... .patch ... v6-0004- ... .patch
> >
> > I've incorporated this fix with 0001 patch.
> >
> > Also added to the patchset
> > 005 – tab completion by Dagfinn [1]
> > 006 – draft fix for table AM issue spotted by Alexander Lakhin [2]
> > 007 – doc review by Justin [3]
> >
> > I'm continuing work on this.
> >
> > Links
> > 1. https://www.postgresql.org/message-id/87plumiox2.fsf%40wibble.ilmari.org
> > 2. 
> > https://www.postgresql.org/message-id/84ada05b-be5c-473e-6d1c-ebe5dd21b190%40gmail.com
> > 3. https://www.postgresql.org/message-id/ZiGH0xc1lxJ71ZfB%40pryzbyj2023
> 
> 0001
> The way we handle name collisions during MERGE PARTITIONS operation is
> reworked by integration of patch [3].  This makes note about commit in
> [2] not relevant.

This patch also/already fixes the schema issue I reported.  Thanks.

If you wanted to include a test case for that:

begin;
CREATE SCHEMA s;
CREATE SCHEMA t;
CREATE TABLE p(i int) PARTITION BY RANGE(i);
CREATE TABLE s.c1 PARTITION OF p FOR VALUES FROM (1)TO(2);
CREATE TABLE s.c2 PARTITION OF p FOR VALUES FROM (2)TO(3);
ALTER TABLE p MERGE PARTITIONS (s.c1, s.c2) INTO s.c1; -- misbehaves if merging 
into the same name as an existing partition
\d+ p
...
Partitions: c1 FOR VALUES FROM (1) TO (3)

> 0002
> The persistence of the new partition is copied as suggested in [1].
> But the checks are in-place, because search_path could influence new
> table persistence.  Per review [2], commit message typos are fixed,
> documentation is revised, revised tests to cover schema-qualification,
> usage of search_path.

Subject: [PATCH v8 2/7] Make new partitions with parent's persistence during 
MERGE/SPLIT operations

This patch adds documentation saying:
+  Any indexes, constraints and user-defined row-level triggers that exist
+  in the parent table are cloned on new partitions [...]

Which is good to say, and addresses part of my message [0]
[0] ZiJW1g2nbQs9ekwK@pryzbyj2023

But it doesn't have anything to do with "creating new partitions with
parent's persistence".  Maybe there was a merge conflict and the docs
ended up in the wrong patch ?

Defaults, storage options, and compression are also copied.  As will
be anything else from LIKE.  And since anything added in the future will
also be copied, maybe it's better to just say that the tables will be
created the same way as "LIKE .. INCLUDING ALL EXCLUDING ..", or
similar.  Otherwise, the next person who adds a new option for LIKE
would have to remember to update this paragraph...

Also, extended stats objects are currently cloned to new child tables.
But I suggested in [0] that they probably shouldn't be.

> 007 – doc review by Justin [3]

I suggest to drop this patch for now.  I'll send some more minor fixes to
docs and code comments once the other patches are settled.

-- 
Justin




Re: pgsql: Fix restore of not-null constraints with inheritance

2024-04-20 Thread Justin Pryzby
On Fri, Apr 19, 2024 at 01:59:31PM +0200, Alvaro Herrera wrote:
> BTW because of a concern from Justin that the NO INHERIT stuff will
> cause errors in old server versions, I started to wonder if it wouldn't
> be better to add these constraints in a separate line for compatibility,
> so for example in the table above it'd be
> 
> CREATE TABLE public.rule_and_refint_t2 (
> id2a integer,
> id2c integer
> );
> ALTER TABLE public.rule_and_refint_t2 ADD CONSTRAINT 
> pgdump_throwaway_notnull_0 NOT NULL id2a NO INHERIT;
> ALTER TABLE public.rule_and_refint_t2 ADD CONSTRAINT 
> pgdump_throwaway_notnull_1 NOT NULL id2c NO INHERIT;
> 
> which might be less problematic in terms of compatibility: you still end
> up having the table, it's only the ALTER TABLE that would error out.

Under pg_restore -d, those would all be run in a single transactional
command, so it would *still* fail to create the table...

It seems like the workaround to restore into an old server version would
be to run:
| pg_restore -f- |sed 's/ NO INHERIT//' |psql

Putting them on separate lines makes that a tiny bit better, since you
could do:
| pg_restore -f- |sed '/^ALTER TABLE .* ADD CONSTRAINT .* NOT NULL/{ s/ NO 
INHERIT// }' |psql

But I'm not sure whether that's enough of an improvement to warrant the
effort.
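(To illustrate the difference on invented sample lines: the anchored variant
rewrites only the ALTER TABLE ... NOT NULL statements and leaves unrelated
text that happens to mention NO INHERIT alone:)

```shell
# Invented sample input: one ALTER TABLE line plus an unrelated line.
# Only the former should be rewritten by the anchored sed command.
printf '%s\n' \
  "ALTER TABLE t ADD CONSTRAINT c0 NOT NULL i NO INHERIT;" \
  "-- comment mentioning NO INHERIT, left untouched" |
sed '/^ALTER TABLE .* ADD CONSTRAINT .* NOT NULL/{ s/ NO INHERIT// }'
```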

-- 
Justin




Re: Add SPLIT PARTITION/MERGE PARTITIONS commands

2024-04-19 Thread Justin Pryzby
On Thu, Apr 11, 2024 at 10:20:53PM -0400, Robert Haas wrote:
> On Thu, Apr 11, 2024 at 9:54 PM Alexander Korotkov  
> wrote:
> > I think we shouldn't unconditionally copy schema name and
> > relpersistence from the parent table.  Instead we should throw the
> > error on a mismatch like CREATE TABLE ... PARTITION OF ... does.  I'm
> > working on revising this fix.
> 
> We definitely shouldn't copy the schema name from the parent table. It
> should be possible to schema-qualify the new partition names, and if
> you don't, then the search_path should determine where they get
> placed.

+1.  Alexander Lakhin reported an issue with schemas and SPLIT, and I
noticed an issue with schemas with MERGE.  The issue I hit occurs
when MERGE'ing into a partition with the same name, and it's fixed like
so:

--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -21526,8 +21526,7 @@ ATExecMergePartitions(List **wqueue, AlteredTableInfo *tab, Relation rel,
 	{
 		/* Create partition table with generated temporary name. */
 		sprintf(tmpRelName, "merge-%u-%X-tmp", RelationGetRelid(rel), MyProcPid);
-		mergePartName = makeRangeVar(get_namespace_name(RelationGetNamespace(rel)),
-									 tmpRelName, -1);
+		mergePartName = makeRangeVar(mergePartName->schemaname, tmpRelName, -1);
 	}
 	createPartitionTable(mergePartName,
 						 makeRangeVar(get_namespace_name(RelationGetNamespace(rel)),

> One of the things I dislike about this type of feature -- not this
> implementation specifically, but just this kind of idea in general --
> is that the syntax mentions a whole bunch of tables but in a way where
> you can't set their properties. Persistence, reloptions, whatever.
> There's just no place to mention any of that stuff - and if you wanted
> to create a place, you'd have to invent special syntax for each
> separate thing. That's why I think it's good that the normal way of
> creating a partition is CREATE TABLE .. PARTITION OF. Because that
> way, we know that the full power of the CREATE TABLE statement is
> always available, and you can set anything that you could set for a
> table that is not a partition.

Right.  The current feature is useful and will probably work for 90% of
people's partitioned tables.

Currently, CREATE TABLE .. PARTITION OF does not create stats objects on
the child table, but MERGE PARTITIONS does, which seems strange.
Maybe stats should not be included on the new child ?

Note that stats on the parent table are not analogous to indexes -
partitioned indexes do nothing other than cause indexes to be created on
any new/attached partitions.  But stats objects on the parent 1) cause
extended stats to be collected and computed across the whole partition
hierarchy, and 2) do not cause stats to be computed for the individual
partitions.

Partitions can have different column definitions, for example null
constraints, FKs, defaults.  And currently, if you MERGE partitions,
those will all be lost (or rather, replaced by whatever LIKE parent
gives).  I think that's totally fine - anyone using different defaults
on child tables could either not use MERGE PARTITIONS, or fix up the
defaults afterwards.  There's not much confusion that the details of the
differences between individual partitions will be lost when the
individual partitions are merged and no longer exist.
But I think it'd be useful to document how the new partitions will be
constructed.

-- 
Justin




Re: Add SPLIT PARTITION/MERGE PARTITIONS commands

2024-04-18 Thread Justin Pryzby
Here are some additional fixes to docs.
>From 6da8beaa5a2b78e785e5b6519894f8357002d916 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 18 Apr 2024 15:40:44 -0500
Subject: [PATCH] doc review for ALTER TABLE ... SPLIT/MERGE PARTITION

---
 doc/src/sgml/ddl.sgml |  4 ++--
 doc/src/sgml/ref/alter_table.sgml | 22 +++---
 2 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 026bfff70f3..01277b1d327 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -4384,7 +4384,7 @@ ALTER INDEX measurement_city_id_logdate_key
 
 
  There is also an option for merging multiple table partitions into
- a single partition using the
+ a single partition using
  ALTER TABLE ... MERGE PARTITIONS.
  This feature simplifies the management of partitioned tables by allowing
  users to combine partitions that are no longer needed as
@@ -4403,7 +4403,7 @@ ALTER TABLE measurement
 
 
  Similarly to merging multiple table partitions, there is an option for
- splitting a single partition into multiple using the
+ splitting a single partition into multiple partitions using
  ALTER TABLE ... SPLIT PARTITION.
  This feature could come in handy when one partition grows too big
  and needs to be split into multiple.  It's important to note that
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index fe36ff82e52..e52cfee840c 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -1136,16 +1136,16 @@ WITH ( MODULUS numeric_literal, REM
   If the split partition is a DEFAULT partition, one of the new partitions must be DEFAULT.
   In case one of the new partitions or one of existing partitions is DEFAULT,
   new partitions partition_name1,
-  partition_name2, ... can have spaces
+  partition_name2, ... can have gaps
   between partitions bounds.  If the partitioned table does not have a DEFAULT
   partition, the DEFAULT partition can be defined as one of the new partitions.
  
  
   In case new partitions do not contain a DEFAULT partition and the partitioned table
-  does not have a DEFAULT partition, the following must be true: sum bounds of
+  does not have a DEFAULT partition, the following must be true: the sum bounds of
   new partitions partition_name1,
   partition_name2, ... should be
-  equal to bound of split partition partition_name.
+  equal to the bounds of split partition partition_name.
   One of the new partitions partition_name1,
   partition_name2, ... can have
   the same name as split partition partition_name
@@ -1168,24 +1168,24 @@ WITH ( MODULUS numeric_literal, REM
 
 
  
-  This form merges several partitions into the one partition of the target table.
-  Hash-partitioning is not supported.  If DEFAULT partition is not in the
+  This form merges several partitions of the target table into a single partition.
+  Hash-partitioning is not supported.  If a DEFAULT partition is not in the
   list of partitions partition_name1,
   partition_name2 [, ...]:
   

 
- For range-partitioned tables it is necessary that the ranges
+ For range-partitioned tables, it is necessary that the ranges
  of the partitions partition_name1,
  partition_name2 [, ...] can
- be merged into one range without spaces and overlaps (otherwise an error
+ be merged into one range with neither gaps nor overlaps (otherwise an error
  will be generated).  The combined range will be the range for the partition
  partition_name.
 


 
- For list-partitioned tables the value lists of all partitions
+ For list-partitioned tables, the value lists of all partitions
  partition_name1,
  partition_name2 [, ...] are
  combined and form the list of values of partition
@@ -1193,7 +1193,7 @@ WITH ( MODULUS numeric_literal, REM
 

   
-  If DEFAULT partition is in the list of partitions partition_name1,
+  If a DEFAULT partition is in the list of partitions partition_name1,
   partition_name2 [, ...]:
   

@@ -1204,8 +1204,8 @@ WITH ( MODULUS numeric_literal, REM


 
- For range- and list-partitioned tables the ranges and lists of values
- of the merged partitions can be any.
+ For range- and list-partitioned tables, the ranges and lists of values
+ of the merged partitions can be anything.
 

   
-- 
2.42.0



Re: [HACKERS] pg_upgrade failed with error - ERROR: column "a" in child table must be marked NOT NULL

2024-04-18 Thread Justin Pryzby
On Thu, Dec 14, 2017 at 08:51:06AM +0700, Ali Akbar wrote:
> Patch for adding check in pg_upgrade.

[...]

On Fri, Sep 29, 2023 at 11:36:42AM -0500, Justin Pryzby wrote:
> On Fri, Sep 29, 2023 at 09:16:35AM +0900, Michael Paquier wrote:
> > You mean when upgrading from an instance of 9.6 or older as c30f177 is
> > not there, right?
> 
> No - while upgrading from v15 to v16.  I'm not clear on how we upgraded
> *to* v15 without hitting the issue, nor how the "not null" got
> dropped...
> 
> > Anyway, it seems like the patch from [1] has no need to run this check
> > when the old cluster's version is 10 or newer.  And perhaps it should
> > mention that this check could be removed from pg_upgrade once v10
> > support is out of scope, in the shape of a comment.
> 
> You're still thinking of PRIMARY KEY as the only way to hit this, right?
> But Ali Akbar already pointed out how to reproduce the problem with DROP
> NOT NULL - which still applies to both v16 and v17.

Rebased and amended the forgotten patch.
See also ZiE3NoY6DdvlvFl9@pryzbyj2023 et seq.
From f804bec39acac0b53e29563be9ef7a10237ba717 Mon Sep 17 00:00:00 2001
From: Ali Akbar 
Date: Thu, 14 Dec 2017 19:18:59 +0700
Subject: [PATCH] pg_upgrade: check for inconsistent inherited not null columns

Otherwise, the upgrade can fail halfway through with error:
CREATE TABLE ip(id int PRIMARY KEY); CREATE TABLE ic(id int) INHERITS (ip); ALTER TABLE ic ALTER id DROP NOT NULL;
ERROR: column "a" in child table must be marked NOT NULL

JTP: updated to handle contype='n' - otherwise, the cnn_grandchild2
added at b0e96f311 triggers the issue this patch tries to avoid ...

Should be backpatched to all versions.
---
 src/bin/pg_upgrade/check.c | 101 +
 1 file changed, 101 insertions(+)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index ad1f8e3bcb1..79671a9cdb4 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -31,6 +31,7 @@ static void check_new_cluster_logical_replication_slots(void);
 static void check_new_cluster_subscription_configuration(void);
 static void check_old_cluster_for_valid_slots(bool live_check);
 static void check_old_cluster_subscription_state(void);
+static void check_for_not_null_inheritance(ClusterInfo *cluster);
 
 /*
  * DataTypesUsageChecks - definitions of data type checks for the old cluster
@@ -646,6 +647,8 @@ check_and_dump_old_cluster(bool live_check)
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1100)
		check_for_tables_with_oids(&old_cluster);
 
+	check_for_not_null_inheritance(&old_cluster);
+
 	/*
 	 * Pre-PG 10 allowed tables with 'unknown' type columns and non WAL logged
 	 * hash indexes
@@ -1832,6 +1835,104 @@ check_new_cluster_subscription_configuration(void)
 	check_ok();
 }
 
+/*
+ * check_for_not_null_inheritance()
+ *
+ *  Currently pg_upgrade will fail if there are any inconsistencies in NOT NULL
+ *  constraint inheritance: In PostgreSQL, we can have a column that is NOT NULL
+ *  in parent, but nullable in its children. So check for inconsistent NOT NULL
+ *  constraints in inheritance tree.
+ */
+static void
+check_for_not_null_inheritance(ClusterInfo *cluster)
+{
+	int			dbnum;
+	FILE	   *script = NULL;
+	char		output_path[MAXPGPATH];
+
+	prep_status("Checking for NOT NULL inconsistencies in inheritance");
+
+	snprintf(output_path, sizeof(output_path), "not_null_inconsistent_columns.txt");
+
+	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+	{
+		PGresult   *res;
+		bool		db_used = false;
+		int			ntups;
+		int			rowno;
+		int			i_nspname,
+	i_relname,
+	i_attname;
		DbInfo	   *active_db = &cluster->dbarr.dbs[dbnum];
+		PGconn	   *conn = connectToServer(cluster, active_db->db_name);
+
+		res = executeQueryOrDie(conn,
+"WITH RECURSIVE parents AS ( "
+"	SELECT	i.inhrelid, i.inhparent "
+"	FROM	pg_catalog.pg_inherits i "
+"	UNION ALL "
+"	SELECT	p.inhrelid, i.inhparent "
+"   FROM	parents p "
+"	JOIN	pg_catalog.pg_inherits i "
+"		ON	i.inhrelid = p.inhparent "
+") "
+"SELECT	n.nspname, c.relname, ac.attname  "
+"FROM	parents p, "
+"		pg_catalog.pg_attribute ac, "
+"		pg_catalog.pg_attribute ap, "
+"		pg_catalog.pg_constraint cc, "
+"		pg_catalog.pg_class c, "
+"		pg_catalog.pg_namespace n "
+"WHERE	NOT ac.attnotnull AND "
+"		ac.attrelid = p.inhrelid AND "
+"		ap.attrelid = p.inhparent AND "
+"		ac.attname = ap.attname AND "
+"		ac.attrelid = cc.conrelid AND ac.attnum = ANY(cc.conkey) AND contype='n' AND NOT connoinherit AND"
+		

Re: pg17 issues with not-null contraints

2024-04-18 Thread Justin Pryzby
On Thu, Apr 18, 2024 at 06:23:30PM +0200, Alvaro Herrera wrote:
> On 2024-Apr-18, Justin Pryzby wrote:
> 
> > BTW, this works up to v16 (although maybe it should not):
> > 
> > | CREATE TABLE ip(id int PRIMARY KEY); CREATE TABLE ic(id int) INHERITS (ip); ALTER TABLE ic ALTER id DROP NOT NULL;
> 
> > It'd still be nice to detect the issue in advance rather than failing
> > halfway through the upgrade.
> 
> Maybe we can have pg_upgrade --check look for cases we might have
> trouble upgrading.  (I mean: such cases would fail if you have rows with
> nulls in the affected columns, but the schema upgrade should be
> successful.  Is that what you have in mind?)

Before v16, pg_upgrade failed in the middle of restoring the schema,
without being caught during --check.  The patch to implement that was
forgotten and never progressed.

I'm not totally clear on what's intended in v17 - maybe it'd be dead
code, and maybe it shouldn't even be applied to master branch.  But I do
think it's worth patching earlier versions (even though it'll be less
useful than having done so 5 years ago).
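In the meantime, an admin could check for the problematic schemas by hand before upgrading. Below is a simplified sketch of such a query, along the lines of the one in the patch on the other thread (direct parents only, without the recursive walk over the whole inheritance tree):

```sql
-- List child columns that are nullable while the same-named parent column
-- is NOT NULL: the inconsistency that makes the schema restore fail.
SELECT ns.nspname, c.relname, ac.attname
FROM pg_catalog.pg_inherits i
JOIN pg_catalog.pg_attribute ac ON ac.attrelid = i.inhrelid
JOIN pg_catalog.pg_attribute ap ON ap.attrelid = i.inhparent
                               AND ap.attname = ac.attname
JOIN pg_catalog.pg_class c ON c.oid = i.inhrelid
JOIN pg_catalog.pg_namespace ns ON ns.oid = c.relnamespace
WHERE ac.attnum > 0
  AND NOT ac.attisdropped
  AND NOT ac.attnotnull
  AND ap.attnotnull;
```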

-- 
Justin




Re: pg17 issues with not-null contraints

2024-04-18 Thread Justin Pryzby
On Tue, Apr 16, 2024 at 08:11:49PM +0200, Alvaro Herrera wrote:
> This is still missing some cleanup and additional tests, of course.
> Speaking of which, I wonder if I should modify pg16's tests so that they
> leave behind tables set up in this way, to immortalize pg_upgrade
> testing.

That seems like it could be important.  I considered but never actually
tested your patch by pg_upgrading across major versions.

BTW, this works up to v16 (although maybe it should not):

| CREATE TABLE ip(id int PRIMARY KEY); CREATE TABLE ic(id int) INHERITS (ip); ALTER TABLE ic ALTER id DROP NOT NULL;

Under v17, this fails. Maybe that's okay, but it should probably be
called out in the release notes.
| ERROR:  cannot drop inherited constraint "ic_id_not_null" of relation "ic"

That's the issue that I mentioned in the 6-year-old thread.  In the
future (upgrading *from* v17) it won't be possible anymore, right?  It'd
still be nice to detect the issue in advance rather than failing halfway
through the upgrade.  I have a rebased patch while I'll send on that
thread.  I guess it's mostly unrelated to your patch but it'd be nice if
you could take a look.

-- 
Justin




Re: pg17 issues with not-null contraints

2024-04-16 Thread Justin Pryzby
On Tue, Apr 16, 2024 at 08:11:49PM +0200, Alvaro Herrera wrote:
> On 2024-Apr-15, Alvaro Herrera wrote:
> 
> > - Fourth thought: we do as in the third thought, except we also allow
> > DROP CONSTRAINT a constraint that's marked "local, inherited" to be
> > simply an inherited constraint (remove its "local" marker).
> 
> Here is an initial implementation of what I was thinking.  Can you
> please give it a try and see if it fixes this problem?  At least in my
> run of your original test case, it seems to work as expected.

Yes, this fixes the issue I reported.

BTW, that seems to be the same issue Andrew reported in January.
https://www.postgresql.org/message-id/CAJnzarwkfRu76_yi3dqVF_WL-MpvT54zMwAxFwJceXdHB76bOA%40mail.gmail.com

-- 
Justin




Re: pg17 issues with not-null contraints

2024-04-15 Thread Justin Pryzby
On Mon, Apr 15, 2024 at 03:47:38PM +0200, Alvaro Herrera wrote:
> On 2024-Apr-15, Justin Pryzby wrote:
> 
> > I think there are other issues related to b0e96f3119 (Catalog not-null
> > constraints) - if I dump a v16 server using v17 tools, the backup can't
> > be restored into the v16 server.  I'm okay ignoring a line or two like
> > 'unrecognized configuration parameter "transaction_timeout", but not
> > 'syntax error at or near "NO"'.
> 
> This doesn't look something that we can handle at all.  The assumption
> is that pg_dump's output is going to be fed to a server that's at least
> the same version.  Running on older versions is just not supported.

You're right - the docs say:

|Also, it is not guaranteed that pg_dump's output can be loaded into a
|server of an older major version — not even if the dump was taken from a
|server of that version

Here's a couple more issues affecting upgrades from v16 to v17.

postgres=# CREATE TABLE a(i int NOT NULL); CREATE TABLE b(i int PRIMARY KEY) INHERITS (a);
pg_restore: error: could not execute query: ERROR:  constraint "pgdump_throwaway_notnull_0" of relation "b" does not exist

postgres=# CREATE TABLE a(i int CONSTRAINT a NOT NULL PRIMARY KEY); CREATE TABLE b()INHERITS(a); ALTER TABLE b ADD CONSTRAINT pkb PRIMARY KEY (i);
pg_restore: error: could not execute query: ERROR:  cannot drop inherited constraint "pgdump_throwaway_notnull_0" of relation "b"

-- 
Justin




Re: allow changing autovacuum_max_workers without restarting

2024-04-15 Thread Justin Pryzby
On Wed, Apr 10, 2024 at 04:23:44PM -0500, Nathan Bossart wrote:
> The attached proof-of-concept patch demonstrates what I have in mind.
> Instead of trying to dynamically change the global process table, etc., I'm
> proposing that we introduce a new GUC that sets the effective maximum
> number of autovacuum workers that can be started at any time.  This means
> there would be two GUCs for the number of autovacuum workers: one for the
> number of slots reserved for autovacuum workers, and another that restricts
> the number of those slots that can be used.  The former would continue to
> require a restart to change its value, and users would typically want to
> set it relatively high.  The latter could be changed at any time and would
> allow for raising or lowering the maximum number of active autovacuum
> workers, up to the limit set by the other parameter.
> 
> The proof-of-concept patch keeps autovacuum_max_workers as the maximum
> number of slots to reserve for workers, but I think we should instead
> rename this parameter to something else and then reintroduce
> autovacuum_max_workers as the new parameter that can be adjusted without
> restarting.  That way, autovacuum_max_workers continues to work much the
> same way as in previous versions.

When I thought about this, I considered proposing to add a new GUC for
"autovacuum_policy_workers".

autovacuum_max_workers would be the same as before, requiring a restart
to change.  The policy GUC would be the soft limit, changeable at runtime
up to the hard limit of autovacuum_max_workers (or maybe any policy
value exceeding autovacuum_max_workers would be ignored).

We'd probably change autovacuum_max_workers to default to a higher value
(8, or 32 as in your patch), and have autovacuum_policy_workers default
to 3, for consistency with historic behavior.  Maybe
autovacuum_policy_workers=-1 would mean to use all workers.

There's the existing idea to change autovacuum thresholds during the
busy period of the day vs. off hours.  This would allow something
similar with nworkers rather than thresholds: if the goal were to reduce
the resource use of vacuum, the admin could set max_workers=8, with
policy_workers=2 during the busy period.
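A sketch of what that could look like, using the GUC names proposed above (autovacuum_policy_workers is hypothetical, not an existing parameter):

```sql
-- Reserve 8 worker slots up front (a change here still requires restart):
ALTER SYSTEM SET autovacuum_max_workers = 8;
-- Busy hours: cap active workers at 2 (takes effect on reload):
ALTER SYSTEM SET autovacuum_policy_workers = 2;
SELECT pg_reload_conf();
-- Off hours: let autovacuum use all reserved slots:
ALTER SYSTEM SET autovacuum_policy_workers = -1;
SELECT pg_reload_conf();
```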

-- 
Justin




pg17 issues with not-null contraints

2024-04-15 Thread Justin Pryzby
Forking: <20230829172828.5qi2pfbladvfgvsg@alvherre.pgsql>
Subject: Re: Strange presentaion related to inheritance in \d+

On Tue, Aug 29, 2023 at 07:28:28PM +0200, Alvaro Herrera wrote:
> On 2023-Aug-29, Kyotaro Horiguchi wrote:
> 
> > Attached is the initial version of the patch. It prevents "CREATE
> > TABLE" from executing if there is an inconsisntent not-null
> > constraint.  Also I noticed that "ALTER TABLE t ADD NOT NULL c NO
> > INHERIT" silently ignores the "NO INHERIT" part and fixed it.
> 
> Great, thank you.  I pushed it after modifying it a bit -- instead of
> throwing the error in MergeAttributes, I did it in
> AddRelationNotNullConstraints().  It seems cleaner this way, mostly
> because we already have to match these two constraints there.  (I guess
> you could argue that we waste catalog-insertion work before the error is
> reported and the whole thing is aborted; but I don't think this is a
> serious problem in practice.)

9b581c5341 can break dump/restore from old versions, including
pg_upgrade.

postgres=# CREATE TABLE iparent(id serial PRIMARY KEY); CREATE TABLE child (id int) INHERITS (iparent); ALTER TABLE child ALTER id DROP NOT NULL; ALTER TABLE child ADD CONSTRAINT p PRIMARY KEY (id);

$ pg_dump -h /tmp -p 5678 postgres -Fc |pg_restore -1 -h /tmp -p 5679 -d postgres
ERROR:  cannot change NO INHERIT status of inherited NOT NULL constraint "pgdump_throwaway_notnull_0" on relation "child"
STATEMENT:  ALTER TABLE ONLY public.iparent
ADD CONSTRAINT iparent_pkey PRIMARY KEY (id);
ALTER TABLE ONLY public.iparent DROP CONSTRAINT pgdump_throwaway_notnull_0;

Strangely, if I name the table "parent", it seems to work, which might
indicate an ordering/dependency issue.

I think there are other issues related to b0e96f3119 (Catalog not-null
constraints) - if I dump a v16 server using v17 tools, the backup can't
be restored into the v16 server.  I'm okay ignoring a line or two like
'unrecognized configuration parameter "transaction_timeout", but not
'syntax error at or near "NO"'.

postgres=# CREATE TABLE a(i int not null primary key);

$ pg_dump -h /tmp -p 5678 postgres |psql -h /tmp -p 5678 -d new

2024-04-13 21:26:14.510 CDT [475995] ERROR:  syntax error at or near "NO" at character 86
2024-04-13 21:26:14.510 CDT [475995] STATEMENT:  CREATE TABLE public.a (
i integer CONSTRAINT pgdump_throwaway_notnull_0 NOT NULL NO INHERIT
);
ERROR:  syntax error at or near "NO"
LINE 2: ...er CONSTRAINT pgdump_throwaway_notnull_0 NOT NULL NO INHERIT

The other version checks in pg_dump.c are used to construct SQL for
querying the source db, but this is used to create the SQL to restore
the target, using syntax that didn't exist until v17.

		if (print_notnull)
		{
			if (tbinfo->notnull_constrs[j][0] == '\0')
				appendPQExpBufferStr(q, " NOT NULL");
			else
				appendPQExpBuffer(q, " CONSTRAINT %s NOT NULL",
								  fmtId(tbinfo->notnull_constrs[j]));

			if (tbinfo->notnull_noinh[j])
				appendPQExpBufferStr(q, " NO INHERIT");
		}

This other thread is 6 years old and forgotten again, but still seems
relevant.
https://www.postgresql.org/message-id/flat/b8794d6a-38f0-9d7c-ad4b-e85adf860fc9%40enterprisedb.com

BTW, these comments are out of date:

+* In versions 16 and up, we need pg_constraint for explicit NOT NULL
+   if (fout->remoteVersion >= 17)

+ *   that we needn't specify that again for the child. (Versions >= 16 no
+   if (fout->remoteVersion < 17)

-- 
Justin




Re: using extended statistics to improve join estimates

2024-04-12 Thread Justin Pryzby
On Tue, Apr 02, 2024 at 04:23:45PM +0800, Andy Fan wrote:
> 
> 0001 is your patch, I just rebase them against the current master. 0006
> is not much relevant with current patch, and I think it can be committed
> individually if you are OK with that.

Your 002 should also remove listidx to avoid warning
../src/backend/statistics/extended_stats.c:2879:8: error: variable 'listidx' set but not used [-Werror,-Wunused-but-set-variable]

> Subject: [PATCH v1 2/8] Remove estimiatedcluases and varRelid arguments

> @@ -2939,15 +2939,11 @@ statext_try_join_estimates(PlannerInfo *root, List *clauses, int varRelid,
>   /* needs to happen before skipping any clauses */
>   listidx++;
>  
> - /* Skip clauses that were already estimated. */
> - if (bms_is_member(listidx, estimatedclauses))
> - continue;
> -

Your 007 could instead test if relids == NULL:

> Subject: [PATCH v1 7/8] bms_is_empty is more effective than bms_num_members(b)
>-   if (bms_num_members(relids) == 0)
>+   if (bms_is_empty(relids))

typos:
001: s/heuristict/heuristics/
002: s/grantee/guarantee/
002: s/estimiatedcluases/estimatedclauses/

It'd be nice to fix/silence these warnings from 001:

|../src/backend/statistics/extended_stats.c:3151:36: warning: ‘relid’ may be used uninitialized [-Wmaybe-uninitialized]
| 3151 | if (var->varno != relid)
|  |^
|../src/backend/statistics/extended_stats.c:3104:33: note: ‘relid’ was declared here
| 3104 | int relid;
|  | ^
|[1016/1893] Compiling C object src/backend/postgres_lib.a.p/statistics_mcv.c.o
|../src/backend/statistics/mcv.c: In function ‘mcv_combine_extended’:
|../src/backend/statistics/mcv.c:2431:49: warning: declaration of ‘idx’ shadows a previous local [-Wshadow=compatible-local]

FYI, I also ran the patch with a $large number of reports without
observing any errors or crashes.

I'll try to look harder at the next patch revision.

-- 
Justin




Re: POC, WIP: OR-clause support for indexes

2024-04-07 Thread Justin Pryzby
On Mon, Apr 08, 2024 at 01:34:37AM +0300, Alexander Korotkov wrote:
> Hi!
> 
> On Mon, Apr 1, 2024 at 9:38 AM Andrei Lepikhov
>  wrote:
> > On 28/3/2024 16:54, Alexander Korotkov wrote:
> > > The current patch has a boolean guc enable_or_transformation.
> > > However, when we have just a few ORs to be transformated, then we
> > > should get less performance gain from the transformation and higher
> > > chances to lose a good bitmap scan plan from that.  When there is a
> > > huge list of ORs to be transformed, then the performance gain is
> > > greater and it is less likely we could lose a good bitmap scan plan.
> > >
> > > What do you think about introducing a GUC threshold value: the minimum
> > > size of list to do OR-to-ANY transformation?
> > > min_list_or_transformation or something.
> > I labelled it or_transformation_limit (see in attachment). Feel free to
> > rename it.
> > It's important to note that the limiting GUC doesn't operate
> > symmetrically for forward, OR -> SAOP, and backward SAOP -> OR
> > operations. In the forward case, it functions as you've proposed.
> > However, in the backward case, we only check whether the feature is
> > enabled or not. This is due to our existing limitation,
> > MAX_SAOP_ARRAY_SIZE, and the fact that we can't match the length of the
> > original OR list with the sizes of the resulting SAOPs. For instance, a
> > lengthy OR list with 100 elements can be transformed into 3 SAOPs, each
> > with a size of around 30 elements.
> > One aspect that requires attention is the potential inefficiency of our
> > OR -> ANY transformation when we have a number of elements less than
> > MAX_SAOP_ARRAY_SIZE. This is because we perform a reverse transformation
> > ANY -> OR at the stage of generating bitmap scans. If the BitmapScan
> > path dominates, we may have done unnecessary work. Is this an occurrence
> > that we should address?
> > But the concern above may just be a point of improvement later: We can
> > add one more strategy to the optimizer: testing each array element as an
> > OR clause; we can also provide a BitmapOr path, where SAOP is covered
> > with a minimal number of partial indexes (likewise, previous version).
> 
> I've revised the patch.  Did some beautification, improvements for
> documentation, commit messages etc.
> 
> I've pushed the 0001 patch without 0002.  I think 0001 is good by
> itself given that there is the or_to_any_transform_limit GUC option.
> The more similar OR clauses are here the more likely grouping them
> into SOAP will be a win.  But I've changed the default value to 5.

The sample config file has the wrong default

+#or_to_any_transform_limit = 0

We had a patch to catch this kind of error, but it was closed (which IMO
was itself an error).
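For anyone following along, a rough illustration of the transformation the GUC gates (hypothetical table; plan output abbreviated and not guaranteed verbatim):

```sql
CREATE TABLE t (a int);
SET or_to_any_transform_limit = 5;
-- With at least 5 similar OR clauses, the planner may group them into a
-- single scalar-array comparison instead of a chain of OR quals:
EXPLAIN (COSTS OFF)
SELECT * FROM t WHERE a = 1 OR a = 2 OR a = 3 OR a = 4 OR a = 5;
-- expected to show something like: Filter: (a = ANY ('{1,2,3,4,5}'::integer[]))
```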

-- 
Justin




Re: ALTER TABLE SET ACCESS METHOD on partitioned tables

2024-03-31 Thread Justin Pryzby
On Sun, Mar 31, 2024 at 12:00:00PM +0300, Alexander Lakhin wrote:
> Hello Alvaro,
> 
> 28.03.2024 18:58, Alvaro Herrera wrote:
> > Grumble.  I don't like initialization at declare time, so far from the
> > code that depends on the value.  But the alternative would have been to
> > assign right where this blocks starts, an additional line.  I pushed it
> > like you had it.
> 
> I've stumbled upon a test failure caused by the test query added in that
> commit:
> +ERROR:  deadlock detected
> +DETAIL:  Process 3076180 waits for AccessShareLock on relation 1259 of database 16386; blocked by process 3076181.
> +Process 3076181 waits for AccessShareLock on relation 2601 of database 16386; blocked by process 3076180.

I think this means that, although it was cute to use pg_am in the
reproducer given in the problem report, it's not a good choice to use
here in the SQL regression tests.

-- 
Justin




Re: ALTER TABLE SET ACCESS METHOD on partitioned tables

2024-03-26 Thread Justin Pryzby
On Thu, Mar 21, 2024 at 01:07:01PM +0100, Alvaro Herrera wrote:
> Given that Michaël is temporarily gone, I propose to push the attached
> tomorrow.

Thanks.

On Tue, Mar 26, 2024 at 12:05:47PM +0100, Alvaro Herrera wrote:
> On 2024-Mar-26, Alexander Lakhin wrote:
> 
> > Hello Alvaro,
> > 
> > 21.03.2024 15:07, Alvaro Herrera wrote:
> > > Given that Michaël is temporarily gone, I propose to push the attached
> > > tomorrow.
> > 
> > Please look at a new anomaly introduced with 374c7a229.
> > Starting from that commit, the following erroneous query:
> > CREATE FOREIGN TABLE fp PARTITION OF pg_am DEFAULT SERVER x;
> > 
> > triggers an assertion failure:
> > TRAP: failed Assert("relation->rd_rel->relam == InvalidOid"), File: "relcache.c", Line: 1219, PID: 3706301
> 
> Hmm, yeah, we're setting relam for relations that shouldn't have it.
> I propose the attached.

Looks right.  That's how I originally wrote it, except for the
"stmt->accessMethod != NULL" case.

I preferred my way - the grammar should refuse to set stmt->accessMethod
for inappropriate relkinds.  And you could assert that.

I also preferred to set "accessMethodId = InvalidOid" once, rather than
twice.

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 8a02c5b05b6..050be89728f 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -962,18 +962,21 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,
 	 * case of a partitioned table) the parent's, if it has one.
 	 */
 	if (stmt->accessMethod != NULL)
-		accessMethodId = get_table_am_oid(stmt->accessMethod, false);
-	else if (stmt->partbound)
 	{
-		Assert(list_length(inheritOids) == 1);
-		accessMethodId = get_rel_relam(linitial_oid(inheritOids));
+		Assert(RELKIND_HAS_TABLE_AM(relkind) || relkind == RELKIND_PARTITIONED_TABLE);
+		accessMethodId = get_table_am_oid(stmt->accessMethod, false);
 	}
-	else
-		accessMethodId = InvalidOid;
+	else if (RELKIND_HAS_TABLE_AM(relkind) || relkind == RELKIND_PARTITIONED_TABLE)
+	{
+		if (stmt->partbound)
+		{
+			Assert(list_length(inheritOids) == 1);
+			accessMethodId = get_rel_relam(linitial_oid(inheritOids));
+		}
 
-	/* still nothing? use the default */
-	if (RELKIND_HAS_TABLE_AM(relkind) && !OidIsValid(accessMethodId))
-		accessMethodId = get_table_am_oid(default_table_access_method, false);
+		if (RELKIND_HAS_TABLE_AM(relkind) && !OidIsValid(accessMethodId))
+			accessMethodId = get_table_am_oid(default_table_access_method, false);
+	}
 
 	/*
 	 * Create the relation.  Inherited defaults and constraints are passed in




Re: ALTER TABLE SET ACCESS METHOD on partitioned tables

2024-03-07 Thread Justin Pryzby
On Mon, Mar 04, 2024 at 05:46:56PM +0900, Michael Paquier wrote:
> > Since InvalidOid is already taken, I guess you might need to introduce a
> > boolean flag, like set_relam, indicating that the statement has an
> > ACCESS METHOD clause.
> 
> Yes, I don't see an alternative.  The default needs a different field
> to be tracked down to the execution.

The data structure you used (defaultAccessMethod) allows this, which
is intended to be prohibited:

postgres=# ALTER TABLE a SET access method default, SET access method default;
ALTER TABLE

As you wrote it, you pass the "defaultAccessMethod" bool to
ATExecSetAccessMethodNoStorage(), which seems odd.  Why not just pass
the target amoid as newAccessMethod ?

When I fooled with this on my side, I called it "chgAccessMethod" to
follow "chgPersistence".  I think "is default" isn't the right data
structure.

Attached a relative patch with my version.

Also: I just realized that instead of adding a bool, we could test
(tab->rewrite & AT_REWRITE_ACCESS_METHOD) != 0

+-- Default and AM set in in clause are the same, relam should be set.

in in?

-- 
Justin
From 8aca5e443ebb41b7de1ba18b519a33a97a41a897 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Mon, 4 Mar 2024 17:42:04 +0900
Subject: [PATCH 1/2] Allow specifying access method of partitioned tables

..to be inherited by partitions
See also:
ca4103025dfe26eaaf6a500dec9170fbb176eebc
8586bf7ed8889f39a59dd99b292014b73be85342
ebfe2dbd6b624e2a428e14b7ee9322cc096f63f7 - prevent DROP AM

Authors: Justin Pryzby, Soumyadeep Chakraborty
---
 doc/src/sgml/catalogs.sgml  |   5 +-
 doc/src/sgml/ref/alter_table.sgml   |   8 ++
 doc/src/sgml/ref/create_table.sgml  |   4 +
 src/backend/commands/tablecmds.c| 183 +---
 src/backend/utils/cache/lsyscache.c |  22 +++
 src/backend/utils/cache/relcache.c  |   7 +
 src/bin/pg_dump/pg_dump.c   |   3 +-
 src/bin/pg_dump/t/002_pg_dump.pl|  35 +
 src/include/catalog/pg_class.h  |   4 +-
 src/include/utils/lsyscache.h   |   1 +
 src/test/regress/expected/create_am.out | 158 +++-
 src/test/regress/sql/create_am.sql  |  91 +++-
 12 files changed, 487 insertions(+), 34 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 0ae97d1adaa..e4fcacb610a 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -1989,8 +1989,9 @@ SCRAM-SHA-256$iteration count:
   
   
If this is a table or an index, the access method used (heap,
-   B-tree, hash, etc.); otherwise zero (zero occurs for sequences,
-   as well as relations without storage, such as views)
+   B-tree, hash, etc.); otherwise zero. Zero occurs for sequences,
+   as well as relations without storage, such as views. Partitioned
+   tables may use a non-zero value.
   
  
 
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 96e3d776051..779c8b31226 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -737,6 +737,14 @@ WITH ( MODULUS numeric_literal, REM
   DEFAULT changes the access method of the table
   to .
  
+ 
+  When applied to a partitioned table, there is no data to rewrite, but any
+  partitions created afterwards will use that access method unless
+  overridden by a USING clause. Using
+  DEFAULT switches the partitioned table to rely on
+  the value of default_table_access_method set
+  when new partitions are created, instead.
+ 
 

 
diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 4cbaaccaf7c..b79081a5ec1 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -1330,6 +1330,10 @@ WITH ( MODULUS numeric_literal, REM
   method is chosen for the new table. See  for more information.
  
+ 
+  When creating a partition, the table access method is the access method
+  of its partitioned table, if set.
+ 
 

 
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 7014be80396..cf595329b6c 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -184,6 +184,7 @@ typedef struct AlteredTableInfo
 	bool		verify_new_notnull; /* T if we should recheck NOT NULL */
 	int			rewrite;		/* Reason for forced rewrite, if any */
 	Oid			newAccessMethod;	/* new access method; 0 means no change */
+	bool		defaultAccessMethod;	/* true if SET ACCESS METHOD DEFAULT */
 	Oid			newTableSpace;	/* new tablespace; 0 means no change */
 	bool		chgPersistence; /* T if SET LOGGED/UNLOGGED is used */
 	char		newrelpersistence;	/* if above is true */
@@ -589,6 +590,8 @@ static ObjectAddress ATExecClusterOn(Relation rel, const char *indexName,
 	 LOCKMODE lockmode);

Re: ALTER TABLE SET ACCESS METHOD on partitioned tables

2024-03-01 Thread Justin Pryzby
On Fri, Mar 01, 2024 at 10:56:50AM +0900, Michael Paquier wrote:
> On Thu, Feb 29, 2024 at 08:51:31AM -0600, Justin Pryzby wrote:
> > On Wed, Feb 28, 2024 at 05:08:49PM +0900, Michael Paquier wrote:
> >> I have implemented that so as we keep the default, historical
> >> behavior: if pg_class.relam is 0 for a partitioned table, use the AM
> >> defined by default_table_access_method.  The patch only adds a path to
> >> switch to a different AM than the GUC when creating a new partition if
> >> and only if a partitioned table has been manipulated with ALTER TABLE
> >> SET ACCESS METHOD to update its AM to something else than the GUC.
> >> Similarly to tablespaces, CREATE TABLE USING is *not* supported for
> >> partitioned tables, same behavior as previously.
> > 
> > This patch allows resetting relam=0 by running ALTER TABLE SET AM to the
> > same value as the GUC.  Maybe it'd be better to have an explicit SET
> > DEFAULT (as in b9424d01 and 4f622503).
> 
> Outside the scope of this patch's thread, this looks like a good idea
> even for tables/matviews.  And the semantics are pretty easy: if DEFAULT
> is specified, just set the access method to NULL in the parser and let
> tablecmds.c go the AM OID lookup in the prep phase if set to NULL.
> See 0001 attached.  This one looks pretty good taken as an independent
> piece.
> 
> When it comes to partitioned tables, there is a still a tricky case:
> what should we do when a user specifies a non-default value in the SET
> ACCESS METHOD clause and it matches default_table_access_method?

I don't think it's tricky - it seems more like a weird hack in the
previous patch version to make AMs behave like tablespaces, despite not
being completely parallel, due to the absence of a pg_default AM.

With the new 001, the hack can go away, and so it should.

> Should the relam be 0 or should we force relam to be the OID of the
> given value given by the query?

You said "force" it to be the user-specified value, but I think that's
not "forcing", it's respecting the user's choice.  (Taking the user's
desired value and conditionally storing 0 instead, on the other hand,
is what could be described as "forcing.")

> Implementation-wise, forcing the value to 0 is simpler, but I can get
> why it could be confusing as well, because the state of the catalogs
> does not reflect what was provided in the query.

> At the same time, the user has explicitly set the access method to be
> the same as the default, so perhaps 0 makes sense anyway in this case.

I think if the user sets something "explicitly", the catalog should
reflect what they set.  Tablespaces have dattablespace, but AMs don't --
it's a simpler case.

For 001: we don't *need* to support "ALTER SET AM default" for leaf
tables.  It doesn't do anything that's not already possible.  But, if
AMs for partitioned tables are optional rather than required, then it
seems to be needed to allow (re)setting relam=0.

But for partitioned tables, I think it should set relam=0 directly.
Currently it 1) falls through to default_table_am; and 2) detects that
it's the default, so then sets relam to 0.  Since InvalidOid 

On Fri, Mar 01, 2024 at 02:03:48PM +0900, Michael Paquier wrote:
> Fun topic, especially once coupled with the internals of tablecmds.c
> that uses InvalidOid for the new access AM as a special value to work
> as a no-op.

Since InvalidOid is already taken, I guess you might need to introduce a
boolean flag, like set_relam, indicating that the statement has an
ACCESS METHOD clause.

> + * method defined so as their children can inherit it; however, this is 
> handled

so that

> +  * Do nothing: access methods is a setting that partitions can

method (singular), or s/is/are/

On Wed, Feb 28, 2024 at 05:08:49PM +0900, Michael Paquier wrote:
> Similarly to tablespaces, CREATE TABLE USING is *not* supported for
> partitioned tables, same behavior as previously.

Maybe I misunderstood what you're trying to say, but CREATE..TABLESPACE
*is* supported for partitioned tables.  I'm not sure why it wouldn't be
supported to set the AM, too.

In any case, it'd be a bit confusing for the error message to still say:

postgres=# CREATE TABLE a(i int) PARTITION BY RANGE(a) USING heap2;
ERROR:  specifying a table access method is not supported on a partitioned table
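
To be concrete, this is the behavior the patch does support, sketched in
SQL (assuming the patch is applied and a hypothetical "heap2" AM is
installed):

```sql
-- Hypothetical sketch: "heap2" is a placeholder AM name.
CREATE TABLE parted (i int) PARTITION BY RANGE (i);
ALTER TABLE parted SET ACCESS METHOD heap2;

-- A new partition now inherits heap2 from its parent instead of
-- falling back to default_table_access_method:
CREATE TABLE parted_1 PARTITION OF parted FOR VALUES FROM (1) TO (10);

-- Inspect the stored AMs via the catalogs:
SELECT c.relname, a.amname
  FROM pg_class c LEFT JOIN pg_am a ON a.oid = c.relam
 WHERE c.relname IN ('parted', 'parted_1');
```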

-- 
Justin




Re: ALTER TABLE SET ACCESS METHOD on partitioned tables

2024-02-29 Thread Justin Pryzby
On Wed, Feb 28, 2024 at 05:08:49PM +0900, Michael Paquier wrote:
> On Wed, Feb 21, 2024 at 08:46:48AM +0100, Peter Eisentraut wrote:
> > Yes, I think most people agreed that that would be the preferred behavior.
> 
> Challenge accepted.  As of the patch attached.

Thanks for picking it up.  I find it pretty hard to switch back and
put the needed effort into a patch after a long period.

> I have implemented that so as we keep the default, historical
> behavior: if pg_class.relam is 0 for a partitioned table, use the AM
> defined by default_table_access_method.  The patch only adds a path to
> switch to a different AM than the GUC when creating a new partition if
> and only if a partitioned table has been manipulated with ALTER TABLE
> SET ACCESS METHOD to update its AM to something else than the GUC.
> Similarly to tablespaces, CREATE TABLE USING is *not* supported for
> partitioned tables, same behavior as previously.

This patch allows resetting relam=0 by running ALTER TABLE SET AM to the
same value as the GUC.  Maybe it'd be better to have an explicit SET
DEFAULT (as in b9424d01 and 4f622503).
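
To illustrate the difference (a sketch only; the SET ACCESS METHOD
DEFAULT syntax is the suggestion here, not something that exists yet):

```sql
SET default_table_access_method = heap;

-- With the patch as posted, naming the GUC's current value resets
-- relam to 0 rather than storing heap's OID:
ALTER TABLE parted SET ACCESS METHOD heap;

-- An explicit DEFAULT keyword, as suggested above, would make the
-- reset unambiguous:
ALTER TABLE parted SET ACCESS METHOD DEFAULT;
```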

-- 
Justin




Re: pg_upgrade and logical replication

2024-02-15 Thread Justin Pryzby
On Wed, Feb 14, 2024 at 03:37:03AM +, Hayato Kuroda (Fujitsu) wrote:
> Attached patch modified the test accordingly. Also, it contains some 
> optimizations.
> This can pass the test on my env:

What optimizations?  I can't see them, and since the patch is described
as rearranging test cases (and is therefore already difficult to read), I
think they should be a separate patch, or the optimizations should at
least be described.

-- 
Justin




Re: pg_upgrade and logical replication

2024-02-13 Thread Justin Pryzby
On Tue, Jan 02, 2024 at 03:58:25PM +0530, Amit Kapila wrote:
> Pushed.

pg_upgrade/t/004_subscription.pl says

|my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';

..but I think maybe it should not.

When you try to use --link, it fails:
https://cirrus-ci.com/task/4669494061170688

|Adding ".old" suffix to old global/pg_control ok
|
|If you want to start the old cluster, you will need to remove
|the ".old" suffix from 
/tmp/cirrus-ci-build/build/testrun/pg_upgrade/004_subscription/data/t_004_subscription_old_sub_data/pgdata/global/pg_control.old.
|Because "link" mode was used, the old cluster cannot be safely
|started once the new cluster has been started.
|...
|
|postgres: could not find the database system
|Expected to find it in the directory 
"/tmp/cirrus-ci-build/build/testrun/pg_upgrade/004_subscription/data/t_004_subscription_old_sub_data/pgdata",
|but could not open file 
"/tmp/cirrus-ci-build/build/testrun/pg_upgrade/004_subscription/data/t_004_subscription_old_sub_data/pgdata/global/pg_control":
 No such file or directory
|# No postmaster PID for node "old_sub"
|[19:36:01.396](0.250s) Bail out!  pg_ctl start failed

You could rename pg_control.old to avoid that immediate error, but that doesn't
address the essential issue that "the old cluster cannot be safely started once
the new cluster has been started."

-- 
Justin




Re: CI and test improvements

2024-02-13 Thread Justin Pryzby
On Wed, Jan 31, 2024 at 11:59:21AM +0100, Alvaro Herrera wrote:
> On 2024-Jan-31, vignesh C wrote:
> 
> > Are we planning to do anything more on this? I was not sure if we
> > should move this to next commitfest or return it.
> 
> Well, the patches don't apply anymore since .cirrus.tasks.yml has been
> created.  However, I'm sure we still want [some of] the improvements
> to the tests in [1].  I can volunteer to rebase the patches in time for the
> March commitfest, if Justin is not available to do so.  If you can
> please move it forward to the March cf and set it WoA, I'd appreciate
> that.

The patches are rebased.  A couple were merged since I last rebased them
~10 months ago.  The freebsd patch will probably be obsoleted by a patch
of Thomas's.

On Mon, Mar 13, 2023 at 07:39:52AM +0100, Peter Eisentraut wrote:
> On 03.02.23 15:26, Justin Pryzby wrote:
> > 9baf41674ad pg_upgrade: tap test: exercise --link and --clone
> 
> This seems like a good idea.

On Wed, Mar 15, 2023 at 10:58:41AM +0100, Peter Eisentraut wrote:
> > [PATCH 4/8] pg_upgrade: tap test: exercise --link and --clone
> 
> I haven't been able to get any changes to the test run times outside
> of noise from this.  But some more coverage is sensible in any case.
> 
> I'm concerned that with this change, the only platform that tests
> --copy is Windows, but Windows has a separate code path for copy.  So
> we should leave one Unix platform to test --copy.  Maybe have FreeBSD
> test --link and macOS test --clone and leave the others with --copy?

I addressed Peter's comments, but haven't heard further.

The patch to show HTML docs artifacts may be waiting for Andres' patch
to convert CompilerWarnings to meson.

It may also be waiting on cfbot to avoid squishing all the patches
together.

I sent various patches to cfbot but haven't heard back.
https://www.postgresql.org/message-id/flat/20220409021853.gp24...@telsasoft.com
https://www.postgresql.org/message-id/flat/20220623193125.gb22...@telsasoft.com
https://github.com/justinpryzby/cfbot/commits/master
https://github.com/macdice/cfbot/pulls

-- 
Justin
>From 37a697bf2ff2e9e24c7b8cb2df289f21e1fca924 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Wed, 25 May 2022 21:53:22 -0500
Subject: [PATCH 1/6] cirrus/windows: add compiler_warnings_script

The goal is to fail due to warnings only after running tests.
(At least historically, it's too slow to run a separate windows VM to
compile with -Werror.)

https://www.postgresql.org/message-id/20220212212310.f645c6vw3njkgxka%40alap3.anarazel.de

I'm not sure how to write this test in windows shell; it's also easy to
write something that doesn't work in posix sh, since windows shell is
interpreting && and ||...

See also:
8a1ce5e54f6d144e4f8e19af7c767b026ee0c956
https://cirrus-ci.com/task/6241060062494720
https://cirrus-ci.com/task/6496366607204352

ci-os-only: windows
---
 .cirrus.tasks.yml | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml
index e4e1bcfeb99..2d397448851 100644
--- a/.cirrus.tasks.yml
+++ b/.cirrus.tasks.yml
@@ -561,12 +561,21 @@ task:
 
   build_script: |
 vcvarsall x64
-ninja -C build
+ninja -C build |tee build.txt
+REM Since pipes lose the exit status of the preceding command, rerun the compilation
+REM without the pipe, exiting now if it fails, to avoid trying to run checks
+ninja -C build > nul
 
   check_world_script: |
 vcvarsall x64
 meson test %MTEST_ARGS% --num-processes %TEST_JOBS%
 
+  # This should be last, so check_world is run even if there are warnings
+  always:
+compiler_warnings_script:
+  # this avoids using metachars which would be interpretted by the windows shell
+  - sh -c 'if grep ": warning " build.txt; then exit 1; fi; exit 0'
+
   on_failure:
 <<: *on_failure_meson
 crashlog_artifacts:
-- 
2.42.0

>From 71d3bdad00bd0a4967db5a394247a54eb0f8a1fa Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sat, 30 Jul 2022 19:25:51 -0500
Subject: [PATCH 2/6] cirrus/002_pg_upgrade: exercise --link and --clone

This increases code coverage (and maybe accelerates the test).

See also: b059a2409faf5833b3ba7792e247d6466c9e8090

linux,
macos,
//-os-only: freebsd
---
 .cirrus.tasks.yml | 4 
 1 file changed, 4 insertions(+)

diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml
index 2d397448851..53be0ce15e4 100644
--- a/.cirrus.tasks.yml
+++ b/.cirrus.tasks.yml
@@ -138,6 +138,8 @@ task:
 CPPFLAGS: -DRELCACHE_FORCE_RELEASE -DCOPY_PARSE_PLAN_TREES -DWRITE_READ_PARSE_PLAN_TREES -DRAW_EXPRESSION_COVERAGE_TEST -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS
 CFLAGS: -Og -ggdb
 
+PG_TEST_PG_UPGRADE_MODE: --link
+
   <<: *freebsd_task_template
 
   depends_on: SanityCheck
@@ -431,6 +433,8 @@ task:
 CFLAGS: -Og -ggdb
 CXXFLAGS: -Og -ggdb
 
+PG_TEST_PG_UPGRADE_MODE: --clone
+

Re: pgsql: Add EXPLAIN (MEMORY) to report planner memory consumption

2024-02-05 Thread Justin Pryzby
Up to now, EXPLAIN's "  " (two-space) separator format has not been mixed
with "=".

And other places which show "Memory" do not use "=".  David will
remember the prior discussions.
https://www.postgresql.org/message-id/20200402054120.gc14...@telsasoft.com
https://www.postgresql.org/message-id/20200407042521.gh2...@telsasoft.com

 "Memory: used=%lld bytes  
allocated=%lld bytes",
vs
 "Buckets: %d (originally %d)  
Batches: %d (originally %d)  Memory Usage: %ldkB\n",

There was some discussion about "bytes" - maybe it should instead show
kB?
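
For reference, the feature is invoked like this; the numbers below are
purely illustrative, but the line shows the used=/allocated= format in
question:

```sql
EXPLAIN (MEMORY) SELECT relname FROM pg_class;
-- ...
-- Planning:
--   Memory: used=23432 bytes  allocated=65536 bytes
```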

(Also, I first thought that "peek" should be "peak", but eventually I
realized that it's as intended.)

On Fri, Jan 12, 2024 at 10:53:08PM +0530, Abhijit Menon-Sen wrote:
> (Those "bytes" look slightly odd to me in the midst of all the x=y
> pieces, but that's probably not worth thinking about.)


On Mon, Jan 29, 2024 at 04:55:27PM +, Alvaro Herrera wrote:
> Add EXPLAIN (MEMORY) to report planner memory consumption
> 
> This adds a new "Memory:" line under the "Planning:" group (which
> currently only has "Buffers:") when the MEMORY option is specified.
> 
> In order to make the reporting reasonably accurate, we create a separate
> memory context for planner activities, to be used only when this option
> is given.  The total amount of memory allocated by that context is
> reported as "allocated"; we subtract memory in the context's freelists
> from that and report that result as "used".  We use
> MemoryContextStatsInternal() to obtain the quantities.
> 
> The code structure to show buffer usage during planning was not in
> amazing shape, so I (Álvaro) modified the patch a bit to clean that up
> in passing.
> 
> Author: Ashutosh Bapat
> Reviewed-by: David Rowley, Andrey Lepikhov, Jian He, Andy Fan
> Discussion: 
> https://www.postgresql.org/message-id/CAExHW5sZA=5lj_zppro-w09ck8z9p7eayaqq3ks9gdfhrxe...@mail.gmail.com




Re: cannot abort transaction 2737414167, it was already committed

2024-01-07 Thread Justin Pryzby
On Wed, Dec 27, 2023 at 09:02:25AM -0600, Justin Pryzby wrote:
> We had this:
> 
> < 2023-12-25 04:06:20.062 MST telsasoft >ERROR:  could not open file 
> "pg_tblspc/16395/PG_16_202307071/16384/121010871": Input/output error
> < 2023-12-25 04:06:20.062 MST telsasoft >STATEMENT:  commit
> < 2023-12-25 04:06:20.062 MST telsasoft >WARNING:  AbortTransaction while in 
> COMMIT state
> < 2023-12-25 04:06:20.062 MST telsasoft >PANIC:  cannot abort transaction 
> 2737414167, it was already committed
> < 2023-12-25 04:06:20.473 MST  >LOG:  server process (PID 14678) was 
> terminated by signal 6: Aborted

> It's possible that the filesystem had an IO error, but I can't find any
> evidence of that.  Postgres is running entirely on zfs, which says:

FYI: the VM which hit this error also just hit:

log_time   | 2024-01-07 05:19:11.611-07
error_severity | ERROR
message| could not open file 
"pg_tblspc/16395/PG_16_202307071/16384/123270438_vm": Input/output error
query  | commit
location   | mdopenfork, md.c:663

Since I haven't seen this anywhere else, that's good evidence it's an
issue with the backing storage (even though the FS/kernel aren't nice
enough to say so).

-- 
Justin




Re: warn if GUC set to an invalid shared library

2024-01-07 Thread Justin Pryzby
On Fri, Jul 22, 2022 at 03:26:47PM -0400, Tom Lane wrote:
> Hmph.  I wonder if we shouldn't change that, because it's a lie.
> The value isn't actually coming from the config file, at least
> not yet.

On Thu, Jul 06, 2023 at 03:15:20PM -0500, Justin Pryzby wrote:
> On Sat, Oct 29, 2022 at 10:40 AM Justin Pryzby  wrote:
> > > > > On Fri, Sep 02, 2022 at 05:24:58PM -0500, Justin Pryzby wrote:
> > > > > > It caused no issue when I changed:
> > > > > >
> > > > > > /* Check that it's acceptable for the 
> > > > > > indicated parameter */
> > > > > > if (!parse_and_validate_value(record, name, 
> > > > > > value,
> > > > > > - PGC_S_FILE, 
> > > > > > ERROR,
> > > > > > + PGC_S_TEST, 
> > > > > > ERROR,
> > > > > >   &newval, 
> > > > > >   &newextra))
> > > > > >
> > > > > > I'm not sure where to go from here.
> > > > >
> > > > > I'm hoping for some guidance ; this simple change may be naive, but 
> > > > > I'm not
> > > > > sure what a wider change would look like.
> 
> I'm still hoping.

@cfbot: rebased
>From e983189c5ae64524f724f317f024d6938f6ad271 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 22 Jul 2022 15:52:11 -0500
Subject: [PATCH v8 1/4] WIP: test GUCs from ALTER SYSTEM as PGC_S_TEST not
 FILE

WIP: ALTER SYSTEM should use PGC_S_TEST rather than PGC_S_FILE

Since the value didn't come from a file.  Or maybe we should have
another PGC_S_ value for this, or a flag for 'is a test'.
---
 src/backend/utils/misc/guc.c | 2 +-
 src/include/utils/guc.h  | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 09353bb613a..0339d2bc722 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -4701,7 +4701,7 @@ AlterSystemSetConfigFile(AlterSystemStmt *altersysstmt)
 void	   *newextra = NULL;
 
 if (!parse_and_validate_value(record, name, value,
-			  PGC_S_FILE, ERROR,
+			  PGC_S_TEST, ERROR,
 			  &newval, &newextra))
 	ereport(ERROR,
 			(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index 109e7c5abbd..d0ac2fd4944 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -120,6 +120,7 @@ typedef enum
 	PGC_S_INTERACTIVE,			/* dividing line for error reporting */
 	PGC_S_TEST,	/* test per-database or per-user setting */
 	PGC_S_SESSION,/* SET command */
+	// PGC_S_TEST_FILE,			/* test global cluster settings (ALTER SYSTEM) */
 } GucSource;
 
 /*
-- 
2.42.0

>From 790fc2db29d28c7d00dceaa3f1ca22919ba09213 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Mon, 13 Dec 2021 08:42:38 -0600
Subject: [PATCH v8 2/4] warn when setting GUC to a nonextant library

Note that warnings shouldn't be issued during startup (only fatals):

$ ./tmp_install/usr/local/pgsql/bin/postgres -D ./testrun/regress/regress/tmp_check/data -c shared_preload_libraries=ab
2023-07-06 14:47:03.817 CDT postmaster[2608106] FATAL:  could not access file "ab": No existe el archivo o el directorio
2023-07-06 14:47:03.817 CDT postmaster[2608106] CONTEXT:  while loading shared libraries for setting "shared_preload_libraries"
---
 src/backend/commands/variable.c   | 95 +++
 src/backend/utils/misc/guc_tables.c   |  6 +-
 src/include/utils/guc_hooks.h |  7 ++
 .../unsafe_tests/expected/rolenames.out   | 19 
 .../modules/unsafe_tests/sql/rolenames.sql| 13 +++
 src/test/regress/expected/rules.out   |  8 ++
 6 files changed, 145 insertions(+), 3 deletions(-)

diff --git a/src/backend/commands/variable.c b/src/backend/commands/variable.c
index 30efcd554ae..7109091440c 100644
--- a/src/backend/commands/variable.c
+++ b/src/backend/commands/variable.c
@@ -40,6 +40,7 @@
 #include "utils/timestamp.h"
 #include "utils/tzparser.h"
 #include "utils/varlena.h"
+#include 
 
 /*
  * DATESTYLE
@@ -1234,3 +1235,97 @@ check_ssl(bool *newval, void **extra, GucSource source)
 #endif
 	return true;
 }
+
+
+/*
+ * See also load_libraries() and internal_load_library().
+ */
+static bool
+check_preload_libraries(char **newval, void **extra, GucSource source,
+		bool restricted)
+{
+	char	   *rawstring;
+	List	   *elemlist;
+	ListCell   *l;
+
+	/* nothing to do if empty */
+	if (newval == NULL || *newval[0] == '\0')
+		return true;
+
+	/*
+	 * issue warnings only during an interactive SET, from ALTER
+	 * ROL

Re: cannot abort transaction 2737414167, it was already committed

2023-12-27 Thread Justin Pryzby
On Thu, Dec 28, 2023 at 11:33:16AM +1300, Thomas Munro wrote:
> I guess the large object usage isn't directly relevant (that module's
> EOXact stuff seems to be finished before TRANS_COMMIT, but I don't
> know that code well).  Everything later is supposed to be about
> closing/releasing/cleaning up, and for example smgrDoPendingDeletes()
> reaches code with this relevant comment:
> 
>  * Note: smgr_unlink must treat deletion failure as a WARNING, not an
>  * ERROR, because we've already decided to commit or abort the current
>  * xact.
> 
> We don't really have a general ban on ereporting on system call
> failure, though.  We've just singled unlink() out.  Only a few lines
> above that we call DropRelationsAllBuffers(rels, nrels), and that
> calls smgrnblocks(), and that might need to need to re-open() the
> relation file to do lseek(SEEK_END), because PostgreSQL itself has no
> tracking of relation size.  Hard to say but my best guess is that's
> where you might have got your EIO, assuming you dropped the relation
> in this transaction?

Yeah.  In fact I was confused - this was not lo_unlink().
This uses normal tables, so would've done:

"begin;"
"DROP TABLE IF EXISTS %s", tablename
"DELETE FROM cached_objects WHERE cache_name=%s", tablename
"commit;"

> > This is pg16 compiled at efa8f6064, runing under centos7.  ZFS is 2.2.2,
> > but the pool hasn't been upgraded to use the features new since 2.1.
> 
> I've been following recent ZFS stuff from a safe distance as a user.
> AFAIK the extremely hard to hit bug fixed in that very recent release
> didn't technically require the interesting new feature (namely block
> cloning, though I think that helped people find the root cause after a
> phase of false blame?).  Anyway, it had for symptom some bogus zero
> bytes on read, not a spurious EIO.

The ZFS bug had to do with bogus bytes which may or may not be zero, as
I understand it.  The understanding is that the bug was pre-existing but
became easier to hit in 2.2, and is fixed in 2.2.2 and 2.1.14.

-- 
Justin




cannot abort transaction 2737414167, it was already committed

2023-12-27 Thread Justin Pryzby
We had this:

< 2023-12-25 04:06:20.062 MST telsasoft >ERROR:  could not open file 
"pg_tblspc/16395/PG_16_202307071/16384/121010871": Input/output error
< 2023-12-25 04:06:20.062 MST telsasoft >STATEMENT:  commit
< 2023-12-25 04:06:20.062 MST telsasoft >WARNING:  AbortTransaction while in 
COMMIT state
< 2023-12-25 04:06:20.062 MST telsasoft >PANIC:  cannot abort transaction 
2737414167, it was already committed
< 2023-12-25 04:06:20.473 MST  >LOG:  server process (PID 14678) was terminated 
by signal 6: Aborted

The application is a daily cronjob which would've just done:

begin;
lo_unlink(); -- the client-side function called from pygresql;
DELETE FROM tbl WHERE col=%s;
commit;

The table being removed would've been a transient (but not "temporary")
table created ~1 day prior.

It's possible that the filesystem had an IO error, but I can't find any
evidence of that.  Postgres is running entirely on zfs, which says:

scan: scrub repaired 0B in 00:07:03 with 0 errors on Mon Dec 25 04:49:07 2023
errors: No known data errors

My main question is why an IO error would cause the DB to abort, rather
than raising an ERROR.

This is pg16 compiled at efa8f6064, running under centos7.  ZFS is 2.2.2,
but the pool hasn't been upgraded to use the features new since 2.1.

(gdb) bt
#0  0x7fc961089387 in raise () from /lib64/libc.so.6
#1  0x7fc96108aa78 in abort () from /lib64/libc.so.6
#2  0x009438b7 in errfinish (filename=filename@entry=0xac8e20 "xact.c", 
lineno=lineno@entry=1742, funcname=funcname@entry=0x9a6600 <__func__.32495> 
"RecordTransactionAbort") at elog.c:604
#3  0x0054d6ab in RecordTransactionAbort 
(isSubXact=isSubXact@entry=false) at xact.c:1741
#4  0x0054d7bd in AbortTransaction () at xact.c:2814
#5  0x0054e015 in AbortCurrentTransaction () at xact.c:3415
#6  0x00804e4e in PostgresMain (dbname=0x12ea840 "ts", 
username=0x12ea828 "telsasoft") at postgres.c:4354
#7  0x0077bdd6 in BackendRun (port=, port=) at postmaster.c:4465
#8  BackendStartup (port=0x12e44c0) at postmaster.c:4193
#9  ServerLoop () at postmaster.c:1783
#10 0x0077ce9a in PostmasterMain (argc=argc@entry=3, 
argv=argv@entry=0x12ad280) at postmaster.c:1467
#11 0x004ba8b8 in main (argc=3, argv=0x12ad280) at main.c:198

#3  0x0054d6ab in RecordTransactionAbort 
(isSubXact=isSubXact@entry=false) at xact.c:1741
xid = 2737414167
rels = 0x94f549 
ndroppedstats = 0
droppedstats = 0x0

#4  0x0054d7bd in AbortTransaction () at xact.c:2814
is_parallel_worker = false

-- 
Justin




processes stuck in shutdown following OOM/recovery

2023-11-30 Thread Justin Pryzby
If postgres starts, and one of its children is immediately killed, and
the cluster is also told to stop, then instead of shutting down, the
whole system gets wedged.

$ initdb -D ./pgdev.dat1
$ pg_ctl -D ./pgdev.dat1 start -o '-c port=5678'
$ kill -9 2524495; sleep 0.05; pg_ctl -D ./pgdev.dat1 stop -m fast # 2524495 is 
a child's pid
.. failed
pg_ctl: server does not shut down

$ ps -wwwf --ppid 2524494
UID  PIDPPID  C STIME TTY  TIME CMD
pryzbyj  2524552 2524494  0 20:47 ?00:00:00 postgres: checkpointer 

(gdb) bt
#0  0x7f0ce2d08c03 in epoll_wait (epfd=10, events=0x55cb4cbaac28, 
maxevents=1, timeout=timeout@entry=156481) at 
../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1  0x55cb4c219208 in WaitEventSetWaitBlock (set=set@entry=0x55cb4cbaabc0, 
cur_timeout=cur_timeout@entry=156481, 
occurred_events=occurred_events@entry=0x7ffd80130410, 
nevents=nevents@entry=1) at ../src/backend/storage/ipc/latch.c:1583
#2  0x55cb4c219e02 in WaitEventSetWait (set=0x55cb4cbaabc0, 
timeout=timeout@entry=30, 
occurred_events=occurred_events@entry=0x7ffd80130410, nevents=nevents@entry=1, 
wait_event_info=wait_event_info@entry=83886084) at 
../src/backend/storage/ipc/latch.c:1529
#3  0x55cb4c219f87 in WaitLatch (latch=, 
wakeEvents=wakeEvents@entry=41, timeout=timeout@entry=30, 
wait_event_info=wait_event_info@entry=83886084)
at ../src/backend/storage/ipc/latch.c:539
#4  0x55cb4c1aabc2 in CheckpointerMain () at 
../src/backend/postmaster/checkpointer.c:523
#5  0x55cb4c1a8207 in AuxiliaryProcessMain 
(auxtype=auxtype@entry=CheckpointerProcess) at 
../src/backend/postmaster/auxprocess.c:153
#6  0x55cb4c1ae63d in StartChildProcess 
(type=type@entry=CheckpointerProcess) at 
../src/backend/postmaster/postmaster.c:5331
#7  0x55cb4c1b07f3 in ServerLoop () at 
../src/backend/postmaster/postmaster.c:1792
#8  0x55cb4c1b1c56 in PostmasterMain (argc=argc@entry=5, 
argv=argv@entry=0x55cb4cbaa380) at ../src/backend/postmaster/postmaster.c:1466
#9  0x55cb4c0f4c1b in main (argc=5, argv=0x55cb4cbaa380) at 
../src/backend/main/main.c:198

I noticed this because of the counterproductive behavior of systemd+PGDG
unit files, which run "pg_ctl stop" whenever a backend is killed for OOM:
https://www.postgresql.org/message-id/ZVI112aVNCHOQgfF@pryzbyj2023

This affects v15, and fails at 7ff23c6d27 but not its parent.

commit 7ff23c6d277d1d90478a51f0dd81414d343f3850 (HEAD)
Author: Thomas Munro 
Date:   Mon Aug 2 17:32:20 2021 +1200

Run checkpointer and bgwriter in crash recovery.

-- 
Justin




Re: ALTER TABLE uses a bistate but not for toast tables

2023-11-16 Thread Justin Pryzby
@cfbot: rebased
>From a9bb61421c24bd3273a8519362febb4073c297a1 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Tue, 21 Jun 2022 22:28:06 -0500
Subject: [PATCH v4] WIP: use BulkInsertState for toast tuples, too

DONE: ALTER, CLUSTER
TODO: copyto, copyfrom?

slot_getsomeattrs slot_deform_heap_tuple fetchatt
heap_getnextslot => heapgettup => heapgetpage => ReadBufferExtended
initscan
table_beginscan table_scan_getnextslot
RelationCopyStorageUsingBuffer ReadBufferWithoutRelcache

(gdb) bt
 #0  table_open (relationId=relationId@entry=16390, lockmode=lockmode@entry=1) at table.c:40
 #1  0x56444cb23d3c in toast_fetch_datum (attr=attr@entry=0x7f67933cc6cc) at detoast.c:372
 #2  0x56444cb24217 in detoast_attr (attr=attr@entry=0x7f67933cc6cc) at detoast.c:123
 #3  0x56444d07a4c8 in pg_detoast_datum_packed (datum=datum@entry=0x7f67933cc6cc) at fmgr.c:1743
 #4  0x56444d042c8d in text_to_cstring (t=0x7f67933cc6cc) at varlena.c:224
 #5  0x56444d0434f9 in textout (fcinfo=) at varlena.c:573
 #6  0x56444d078f10 in FunctionCall1Coll (flinfo=flinfo@entry=0x56444e4706b0, collation=collation@entry=0, arg1=arg1@entry=140082828592844) at fmgr.c:1124
 #7  0x56444d079d7f in OutputFunctionCall (flinfo=flinfo@entry=0x56444e4706b0, val=val@entry=140082828592844) at fmgr.c:1561
 #8  0x56444ccb1665 in CopyOneRowTo (cstate=cstate@entry=0x56444e470898, slot=slot@entry=0x56444e396d20) at copyto.c:975
 #9  0x56444ccb2c7d in DoCopyTo (cstate=cstate@entry=0x56444e470898) at copyto.c:891
 #10 0x56444ccab4c2 in DoCopy (pstate=pstate@entry=0x56444e396bb0, stmt=stmt@entry=0x56444e3759b0, stmt_location=0, stmt_len=48, processed=processed@entry=0x7ffc212a6310) at copy.c:308

cluster:
heapam_relation_copy_for_cluster
reform_and_rewrite_tuple
rewrite_heap_tuple
raw_heap_insert

//-os-only: linux
---
 src/backend/access/common/toast_internals.c |  6 --
 src/backend/access/heap/heapam.c| 24 +++--
 src/backend/access/heap/heaptoast.c | 11 ++
 src/backend/access/heap/rewriteheap.c   |  6 +-
 src/backend/access/table/toast_helper.c |  6 --
 src/include/access/heaptoast.h  |  4 +++-
 src/include/access/hio.h|  2 ++
 src/include/access/toast_helper.h   |  3 ++-
 src/include/access/toast_internals.h|  4 +++-
 9 files changed, 48 insertions(+), 18 deletions(-)

diff --git a/src/backend/access/common/toast_internals.c b/src/backend/access/common/toast_internals.c
index 371515395a5..3b52cba5eb1 100644
--- a/src/backend/access/common/toast_internals.c
+++ b/src/backend/access/common/toast_internals.c
@@ -118,7 +118,8 @@ toast_compress_datum(Datum value, char cmethod)
  */
 Datum
 toast_save_datum(Relation rel, Datum value,
- struct varlena *oldexternal, int options)
+ struct varlena *oldexternal, int options,
+ BulkInsertState bistate)
 {
 	Relation	toastrel;
 	Relation   *toastidxs;
@@ -321,7 +322,8 @@ toast_save_datum(Relation rel, Datum value,
 		memcpy(VARDATA(_data), data_p, chunk_size);
 		toasttup = heap_form_tuple(toasttupDesc, t_values, t_isnull);
 
-		heap_insert(toastrel, toasttup, mycid, options, NULL);
+		heap_insert(toastrel, toasttup, mycid, options, bistate ?
+	bistate->toast_state : NULL);
 
 		/*
 		 * Create the index entry.  We cheat a little here by not using
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 14de8158d49..470ffa7e708 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -74,7 +74,8 @@
 
 
 static HeapTuple heap_prepare_insert(Relation relation, HeapTuple tup,
-	 TransactionId xid, CommandId cid, int options);
+	 TransactionId xid, CommandId cid, int options,
+	 BulkInsertState bistate);
 static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
   Buffer newbuf, HeapTuple oldtup,
   HeapTuple newtup, HeapTuple old_key_tuple,
@@ -1765,9 +1766,15 @@ GetBulkInsertState(void)
 	bistate = (BulkInsertState) palloc(sizeof(BulkInsertStateData));
 	bistate->strategy = GetAccessStrategy(BAS_BULKWRITE);
 	bistate->current_buf = InvalidBuffer;
+
 	bistate->next_free = InvalidBlockNumber;
 	bistate->last_free = InvalidBlockNumber;
 	bistate->already_extended_by = 0;
+
+	bistate->toast_state = (BulkInsertState) palloc(sizeof(BulkInsertStateData));
+	bistate->toast_state->strategy = GetAccessStrategy(BAS_BULKWRITE);
+	bistate->toast_state->current_buf = InvalidBuffer;
+
 	return bistate;
 }
 
@@ -1779,6 +1786,10 @@ FreeBulkInsertState(BulkInsertState bistate)
 {
 	if (bistate->current_buf != InvalidBuffer)
 		ReleaseBuffer(bistate->current_buf);
+	if (bistate->toast_state->current_buf != InvalidBuffer)
+		ReleaseBuffer(bistate->toast_state->current_buf);
+
+	FreeAccessStrategy(bistate->toast_state->strategy);
 	FreeAccessStrategy(bistate->strategy);
 	pfree(bistate);
 }
@@ -1844,

XX000: tuple concurrently deleted during DROP STATISTICS

2023-11-08 Thread Justin Pryzby
I found this in our logs, and reproduced it under v11-v16.

CREATE TABLE t(a int, b int);
INSERT INTO t SELECT generate_series(1,999);
CREATE STATISTICS t_stats ON a,b FROM t;

while :; do psql postgres -qtxc "ANALYZE t"; done &
while :; do psql postgres -qtxc "begin; DROP STATISTICS t_stats"; done &

It's known that concurrent DDL can hit elog().  But in this case,
there's only one DDL operation.

(gdb) bt
#0  0x009442a0 in pg_re_throw ()
#1  0x00943504 in errfinish ()
#2  0x004fcafe in simple_heap_delete ()
#3  0x00639d3f in RemoveStatisticsDataById ()
#4  0x00639d79 in RemoveStatisticsById ()
#5  0x0057a428 in deleteObjectsInList ()
#6  0x0057a8f0 in performMultipleDeletions ()
#7  0x0060b5ed in RemoveObjects ()
#8  0x008099ce in ProcessUtilitySlow.isra.1 ()
#9  0x00808c71 in standard_ProcessUtility ()
#10 0x7efbfed7a508 in pgss_ProcessUtility () from 
/usr/pgsql-16/lib/pg_stat_statements.so
#11 0x0080745a in PortalRunUtility ()
#12 0x00807579 in PortalRunMulti ()
#13 0x008079dc in PortalRun ()
#14 0x00803927 in exec_simple_query ()
#15 0x00803f28 in PostgresMain ()
#16 0x0077bae6 in ServerLoop ()
#17 0x0077cbaa in PostmasterMain ()
#18 0x004ba788 in main ()

-- 
Justin




Re: pg16: invalid page/page verification failed

2023-10-06 Thread Justin Pryzby
On Fri, Oct 06, 2023 at 08:47:39AM -0700, Andres Freund wrote:
> Hi,
> 
> On 2023-10-06 09:20:05 +0900, Michael Paquier wrote:
> > On Thu, Oct 05, 2023 at 11:45:18AM -0500, Justin Pryzby wrote:
> > > This table is what it sounds like: a partition into which CSV logs are
> > > COPY'ed.  It would've been created around 8am.  There's no special
> > > params set for the table nor for autovacuum.
> > 
> > This may be an important bit of information.  31966b151e6a is new as
> > of Postgres 16, has changed the way relations are extended and COPY
> > was one area touched.  I am adding Andres in CC.
> 
> Hm, is there any chance the COPY targets more than one partition? If so, this
> sounds like it might be the issue described here
> https://postgr.es/m/20230925213746.fwqauhhifjgefyzk%40alap3.anarazel.de

The first error was from:
log_time | 2023-10-05 09:57:01.939-05
left | COPY postgres_log FROM 
'/var/log/postgresql/postgresql-2023-10-05_095200.csv' WITH csv

Unfortunately, I no longer have the CSV files which caused errors.
After I moved the broken table out of the way and created a new
partition, they would've been imported successfully, and then removed.

Also, it's sad, but the original 2023_10_05_0900 partition I created was
itself rotated out of existence a few hours ago (I still have the most
interesting lines, though).

I've seen that it's possible for a CSV to include some data that ideally
would've gone into the "next" CSV:  2023-01-01_18.csv might include a line
of data after 6pm.  For example, with log_rotation_age=2min,
postgresql-2023-10-06_120800.csv had a row after 12:10:
2023-10-06 12:10:00.101 
CDT,"pryzbyj","pryzbyj",5581,"[local]",65203f66.15cd,2,...

But I'm not sure how that can explain this issue, because this was
095600.csv, and not 095800.csv.  My script knows to create the "next"
partition, to handle the case that the file includes some data that
should've gone to the next logfile.  I'm handling that case with the
anticipation that there might be a few tenths of a second or even a few
seconds of logs in the wrong file - typically 0 lines and sometimes 1
line.  I don't know if it's even possible to have multiple lines in the
"wrong" file.  In any case, I'm not expecting log rotation to be 2
minutes behind.
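The partition pre-creation logic described above can be sketched in Python (hypothetical names; the actual import script is not shown in this thread):

```python
from datetime import datetime, timedelta

def partition_for(ts: datetime) -> str:
    """Name of the hourly partition a log line belongs in,
    e.g. postgres_log_2023_10_05_0900."""
    return ts.strftime("postgres_log_%Y_%m_%d_%H00")

def partitions_to_prepare(csv_start: datetime) -> list[str]:
    """Prepare the partition for the file's nominal hour plus the
    following hour, since rotation can lag and a file may hold a few
    lines that belong to the next hour."""
    current = csv_start.replace(minute=0, second=0, microsecond=0)
    return [partition_for(current),
            partition_for(current + timedelta(hours=1))]
```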

Also, not only was the data in the CSV earlier than 10am, but the error
*itself* was also earlier.  The error importing the CSV was at 9:57, so
the CSV couldn't have had data after 10:00.  Not that it matters, but my
script doesn't import the most recent logfile, and also avoids importing
files written within the last minute.

I don't see how a CSV with a 2 minute interval of data beginning at 9:56
could straddle hourly partitions.

log_time | 2023-10-05 09:57:01.939-05
left | invalid page in block 119 of relation base/16409/801594131
left | COPY postgres_log FROM 
'/var/log/postgresql/postgresql-2023-10-05_095200.csv' WITH csv

log_time | 2023-10-05 09:57:01.939-05
left | page verification failed, calculated checksum 5074 but expected 50
left | COPY postgres_log FROM 
'/var/log/postgresql/postgresql-2023-10-05_095200.csv' WITH csv

-- 
Justin




Re: pg16: invalid page/page verification failed

2023-10-06 Thread Justin Pryzby
On Fri, Oct 06, 2023 at 09:20:05AM +0900, Michael Paquier wrote:
> On Thu, Oct 05, 2023 at 11:45:18AM -0500, Justin Pryzby wrote:
> > This table is what it sounds like: a partition into which CSV logs are
> > COPY'ed.  It would've been created around 8am.  There's no special
> > params set for the table nor for autovacuum.
> 
> This may be an important bit of information.  31966b151e6a is new as
> of Postgres 16, has changed the way relations are extended and COPY
> was one area touched.  I am adding Andres in CC.

Also, I realized that someone kicked off a process just after 9am which
would've done a lot of INSERT ON CONFLICT DO UPDATE, VACUUM FULL, and
VACUUM.  Which consumed and dirtied buffers about 100x faster than normal.

log_time | 2023-10-05 10:00:55.794-05
pid  | 31754
left | duration: 51281.001 ms  statement: VACUUM (FULL,FREEZE) 
othertable...

log_time | 2023-10-05 10:01:01.784-05
backend_type | checkpointer
left | checkpoint starting: time

log_time | 2023-10-05 10:01:02.935-05
pid  | 10023
left | page verification failed, calculated checksum 5074 but 
expected 5050
context  | COPY postgres_log, line 947
left | COPY postgres_log FROM 
'/var/log/postgresql/postgresql-2023-10-05_095600.csv' WITH csv

log_time | 2023-10-05 10:01:02.935-05
pid  | 10023
left | invalid page in block 119 of relation base/16409/801594131
context  | COPY postgres_log, line 947
left | COPY postgres_log FROM 
'/var/log/postgresql/postgresql-2023-10-05_095600.csv' WITH csv

log_time | 2023-10-05 10:01:11.636-05
pid  | 31754
left | duration: 15838.374 ms  statement: VACUUM (FREEZE) 
othertable...

I meant to point out that the issue is on the last block.

postgres=# SELECT 
pg_relation_size('"BROKEN_postgres_log_2023_10_05_0900"')/8192;
?column? | 120

It sounds like there may be an issue locking (pinning?) a page, or
rather not locking it, or releasing the lock too early.

-- 
Justin




Re: pg16: invalid page/page verification failed

2023-10-05 Thread Justin Pryzby
On Thu, Oct 05, 2023 at 07:16:31PM +0200, Matthias van de Meent wrote:
> On Thu, 5 Oct 2023 at 18:48, Justin Pryzby  wrote:
> >
> > On an instance running pg16.0:
> >
> > log_time | 2023-10-05 10:03:00.014-05
> > backend_type | autovacuum worker
> > left | page verification failed, calculated checksum 5074 but 
> > expected 5050
> > context  | while scanning block 119 of relation 
> > "public.postgres_log_2023_10_05_0900"
> >
This is the only error I've seen so far, and for all I know there's an
> > issue on the storage behind the VM, or a cosmic ray hit.  But I moved
> > the table out of the way and saved a copy of get_raw_page() in case
> > someone wants to ask about it.
> >
> > postgres=# SELECT * FROM 
> > heap_page_item_attrs(get_raw_page(801594131::regclass::text, 119), 
> > 801594131);
> >  lp  | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid | t_attrs
> >    1 |   2304 |        1 |     16 |        |        |          |        |             |            |        |        |       |
> >    2 |   8160 |        1 |     16 |        |        |          |        |             |            |        |        |       |
> >    3 |   8144 |        1 |     16 |        |        |          |        |             |            |        |        |       |
> > ...all the same except for lp_off...
> >  365 |   2352 |        1 |     16 |        |        |          |        |             |            |        |        |       |
> >  366 |   2336 |        1 |     16 |        |        |          |        |             |            |        |        |       |
> >  367 |   2320 |        1 |     16 |        |        |          |        |             |            |        |        |       |
> 
> That's not a HEAP page; it looks more like a btree page: lp_len is too
> short for heap (which starts at lp_len = 24), and there are too many
> line pointers for an 8KiB heap page. btree often has lp_len of 16: 8
> bytes indextuple header, one maxalign of data (e.g. int or bigint).
> 
> So, assuming it's a block of a different relation kind, then it's also
> likely it was originally located elsewhere in that other relation,
> indeed causing the checksum failure. You can further validate this by
> looking at the page header's pd_special value - if it is 8176, that'd
> be another indicator for it being a btree.

Nice point.

postgres=# SET ignore_checksum_failure=on; SELECT * FROM 
generate_series(115,119) AS a, 
page_header(get_raw_page(801594131::regclass::text, a)) AS b;
WARNING:  page verification failed, calculated checksum 5074 but expected 5050
  a  |     lsn      | checksum | flags | lower | upper | special | pagesize | version | prune_xid 
-----+--------------+----------+-------+-------+-------+---------+----------+---------+-----------
 115 | B61/A9436C8  |   -23759 |     4 |    92 |   336 |    8192 |     8192 |       4 |         0
 116 | B61/A944FA0  |     3907 |     4 |   104 |   224 |    8192 |     8192 |       4 |         0
 117 | B61/A946828  |   -24448 |     4 |    76 |   264 |    8192 |     8192 |       4 |         0
 118 | B61/A94CCE0  |    26915 |     4 |    28 |  6256 |    8192 |     8192 |       4 |         0
 119 | B5C/9F30D1C8 |     5050 |     0 |  1492 |  2304 |    8176 |     8192 |       4 |         0

The table itself has a few btree indexes on text columns and a brin
index on log_timestamp, but not on the integers.
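Matthias's heuristic — lp_len of 16 plus pd_special = 8176 suggests a btree page, while heap pages have lp_len >= 24 and pd_special = 8192 (no special space) — can be expressed as a small classifier. A Python sketch, a deliberate simplification of the real page layout rules:

```python
BLCKSZ = 8192
BTREE_OPAQUE_SIZE = 16  # MAXALIGNed btree special space on 64-bit builds

def classify_page(pd_special: int, typical_lp_len: int) -> str:
    """Rough page-type guess from the header fields discussed above."""
    if pd_special == BLCKSZ and typical_lp_len >= 24:
        # No special space; heap tuple headers are at least 24 bytes.
        return "heap-like"
    if pd_special == BLCKSZ - BTREE_OPAQUE_SIZE and typical_lp_len == 16:
        # 8-byte index tuple header + one maxalign of data.
        return "btree-like"
    return "unknown"
```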

It sounds like that's expected at this point, but after I
"SET ignore_checksum_failure=on" and read the page in, vacuum kicked
off and then crashed (in heap_page_prune() if that half of the stack
trace can be trusted).

*** stack smashing detected ***: postgres: autovacuum worker postgres terminated

< 2023-10-05 12:35:30.764 CDT  >LOG:  server process (PID 30692) was terminated 
by signal 11: Segmentation fault
< 2023-10-05 12:35:30.764 CDT  >DETAIL:  Failed process was running: 
autovacuum: VACUUM ANALYZE public.BROKEN_postgres_log_2023_10_05_0900

I took the opportunity to fsck the FS, which showed no errors.

I was curious if the relfilenodes had gotten confused/corrupted/??
But this seems to indicate not; the problem is only one block.

postgres=# SELECT oid, relfilenode, oid=relfilenode, relname FROM pg_class 
WHERE oid BETWEEN 80155 AND 801594199 ORDER BY 1;
    oid    | relfilenode | ?column? |           relname            
-----------+-------------+----------+------------------------------
 801564542 |   801564542 | t        | postgres_log_2023_10_05_0800
 801564545 |   801564545 | t        | pg_toast_80

pg16: invalid page/page verification failed

2023-10-05 Thread Justin Pryzby
On an instance running pg16.0:

log_time | 2023-10-05 10:03:00.014-05
backend_type | autovacuum worker
left | page verification failed, calculated checksum 5074 but 
expected 5050
context  | while scanning block 119 of relation 
"public.postgres_log_2023_10_05_0900"

This is the only error I've seen so far, and for all I know there's an
issue on the storage behind the VM, or a cosmic ray hit.  But I moved
the table out of the way and saved a copy of get_raw_page() in case
someone wants to ask about it.

 public | BROKEN_postgres_log_2023_10_05_0900 | table | postgres | permanent   
| heap | 1664 kB

This table is what it sounds like: a partition into which CSV logs are
COPY'ed.  It would've been created around 8am.  There's no special
params set for the table nor for autovacuum.

Although we have a ZFS tablespace, these tables aren't on it, and
full_page_writes=on.  There's no crashes, and the instance has been up
since it was pg_upgraded from v15.4 on sep25.

pg_stat_all_tables indicates that the table was never (successfully)
vacuumed.

This was compiled to RPM on centos7, and might include a few commits
made since v16.0.

postgres=# SELECT * FROM 
heap_page_item_attrs(get_raw_page(801594131::regclass::text, 119), 801594131);
 lp  | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid | t_attrs 
   1 |   2304 |        1 |     16 |        |        |          |        |             |            |        |        |       | 
   2 |   8160 |        1 |     16 |        |        |          |        |             |            |        |        |       | 
   3 |   8144 |        1 |     16 |        |        |          |        |             |            |        |        |       | 
...all the same except for lp_off...
 365 |   2352 |        1 |     16 |        |        |          |        |             |            |        |        |       | 
 366 |   2336 |        1 |     16 |        |        |          |        |             |            |        |        |       | 
 367 |   2320 |        1 |     16 |        |        |          |        |             |            |        |        |       | 

postgres=# SELECT FROM (SELECT tuple_data_split(801594131, t_data, t_infomask, 
t_infomask2, t_bits) a FROM 
heap_page_items(get_raw_page(801594131::regclass::text, 119))) WHERE a IS NOT 
NULL;
WARNING:  page verification failed, calculated checksum 5074 but expected 5050
(0 rows)

-- 
Justin




Re: [HACKERS] pg_upgrade failed with error - ERROR: column "a" in child table must be marked NOT NULL

2023-09-29 Thread Justin Pryzby
On Fri, Sep 29, 2023 at 09:16:35AM +0900, Michael Paquier wrote:
> You mean when upgrading from an instance of 9.6 or older as c30f177 is
> not there, right?

No - while upgrading from v15 to v16.  I'm not clear on how we upgraded
*to* v15 without hitting the issue, nor how the "not null" got
dropped...

> Anyway, it seems like the patch from [1] has no need to run this check
> when the old cluster's version is 10 or newer.  And perhaps it should
> mention that this check could be removed from pg_upgrade once v10
> support is out of scope, in the shape of a comment.

You're still thinking of PRIMARY KEY as the only way to hit this, right?
But Ali Akbar already pointed out how to reproduce the problem with DROP
NOT NULL - which still applies to both v16 and v17.




Re: [HACKERS] pg_upgrade failed with error - ERROR: column "a" in child table must be marked NOT NULL

2023-09-28 Thread Justin Pryzby
On Thu, Dec 14, 2017 at 07:18:59PM +0700, Ali Akbar wrote:
> By the way, should i add this patch to the current commitfest?

The patch for pg_upgrade --check got forgotten 6 years ago, but it's a
continuing problem (we hit it again which cost an hour during
pg_upgrade) and ought to be (have been) backpatched.

I didn't dig into it, but maybe it'd even be possible to fix the issue
with ALTER..DROP NOT NULL ...

-- 
Justin




pg16: XX000: could not find pathkey item to sort

2023-09-18 Thread Justin Pryzby
This fails since 1349d2790b

commit 1349d2790bf48a4de072931c722f39337e72055e
Author: David Rowley 
Date:   Tue Aug 2 23:11:45 2022 +1200

Improve performance of ORDER BY / DISTINCT aggregates

ts=# CREATE TABLE t (a int, b text) PARTITION BY RANGE (a);
ts=# CREATE TABLE td PARTITION OF t DEFAULT;
ts=# INSERT INTO t SELECT 1 AS a, '' AS b;
ts=# SET enable_partitionwise_aggregate=on;
ts=# explain SELECT a, COUNT(DISTINCT b) FROM t GROUP BY a;
ERROR:  XX000: could not find pathkey item to sort
LOCATION:  prepare_sort_from_pathkeys, createplan.c:6235

-- 
Justin




Re: should frontend tools use syncfs() ?

2023-09-01 Thread Justin Pryzby
On Fri, Sep 01, 2023 at 11:08:51AM -0700, Nathan Bossart wrote:
> > This should probably give a distinct error when syncfs is not supported
> > than when the method is truly unrecognized.
> 
> Later versions of the patch should have this.

Oops, right.

> > The patch should handle pg_dumpall, too.
> 
> It looks like pg_dumpall only ever fsyncs a single file, so I don't think
> it is really needed there.

What about (per git grep no-sync doc) pg_receivewal?

-- 
Justin




Re: should frontend tools use syncfs() ?

2023-09-01 Thread Justin Pryzby
> + if (!user_opts.sync_method)
> + user_opts.sync_method = pg_strdup("fsync");

why pg_strdup?

> +parse_sync_method(const char *optarg, SyncMethod *sync_method)
> +{
> + if (strcmp(optarg, "fsync") == 0)
> + *sync_method = SYNC_METHOD_FSYNC;
> +#ifdef HAVE_SYNCFS
> + else if (strcmp(optarg, "syncfs") == 0)
> + *sync_method = SYNC_METHOD_SYNCFS;
> +#endif
> + else
> + {
> + pg_log_error("unrecognized sync method: %s", optarg);
> + return false;
> + }

This should probably give a distinct error when syncfs is not supported
than when the method is truly unrecognized.
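A sketch of the suggested behavior — distinguishing a name that is recognized but compiled out from one that is simply unknown. This is a Python illustration only; the real patch would test HAVE_SYNCFS at compile time:

```python
def parse_sync_method(name: str, have_syncfs: bool):
    """Return (method, error). The error message distinguishes an
    unknown name from a known-but-unavailable one."""
    if name == "fsync":
        return "fsync", None
    if name == "syncfs":
        if have_syncfs:
            return "syncfs", None
        # Recognized, but this build/platform lacks syncfs().
        return None, "syncfs is not supported on this platform"
    # Name not recognized at all.
    return None, f"unrecognized sync method: {name}"
```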

The patch should handle pg_dumpall, too.

Note that /bin/sync doesn't try to de-duplicate, it does just what you
tell it.

$ strace -e openat,syncfs,fsync sync / / / -f
...
openat(AT_FDCWD, "/", O_RDONLY|O_NONBLOCK) = 3
syncfs(3)   = 0
openat(AT_FDCWD, "/", O_RDONLY|O_NONBLOCK) = 3
syncfs(3)   = 0
openat(AT_FDCWD, "/", O_RDONLY|O_NONBLOCK) = 3
syncfs(3)   = 0
+++ exited with 0 +++
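If a frontend tool did want to avoid redundant syncfs() calls for arguments on the same filesystem, deduplicating by device id would be one way; a hypothetical Python sketch, not part of the patch:

```python
import os

def dedupe_by_device(paths):
    """Keep one representative path per filesystem (st_dev), so
    syncfs() would be issued once per filesystem rather than once
    per argument -- unlike /bin/sync, which syncs each argument."""
    seen = set()
    out = []
    for p in paths:
        dev = os.stat(p).st_dev
        if dev not in seen:
            seen.add(dev)
            out.append(p)
    return out
```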




Re: [PATCH] psql: \dn+ to show size of each schema (and \dA+ for AMs)

2023-08-14 Thread Justin Pryzby
On Thu, Dec 15, 2022 at 10:13:23AM -0600, Justin Pryzby wrote:
> This patch record was "closed for lack of interest", but I think what's
> actually needed is committer review of which approach to take.

On Tue, Aug 01, 2023 at 09:54:34AM +0200, Daniel Gustafsson wrote:
> > On 24 May 2023, at 23:05, Justin Pryzby  wrote:
> 
> > I'm planning to set this patch as ready
> 
> This is marked RfC so I'm moving this to the next CF, but the patch no longer
> applies so it needs a rebase.

I was still hoping to receive some feedback on which patches to squish.

-- 
Justin
>From 2b97b0fe27199a343f00f31aaf3fcd79fd1228f1 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Tue, 13 Jul 2021 21:25:48 -0500
Subject: [PATCH 1/4] Add pg_am_size(), pg_namespace_size() ..

See also: 358a897fa, 528ac10c7
---
 doc/src/sgml/func.sgml  |  39 ++
 src/backend/utils/adt/dbsize.c  | 132 
 src/include/catalog/pg_proc.dat |  19 +
 3 files changed, 190 insertions(+)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index be2f54c9141..8a1c8226c48 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -27814,6 +27814,45 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset

   
 
+  
+   
+
+ pg_namespace_size
+
+pg_namespace_size ( name )
+bigint
+   
+   
+pg_namespace_size ( oid )
+bigint
+   
+   
+Computes the total disk space used by relations in the namespace (schema)
+with the specified name or OID. To use this function, you must
+have CREATE privilege on the specified namespace
+or have privileges of the pg_read_all_stats role,
+unless it is the default namespace for the current database.
+   
+  
+
+  
+   
+
+ pg_am_size
+
+pg_am_size ( name )
+bigint
+   
+   
+pg_am_size ( oid )
+bigint
+   
+   
+Computes the total disk space used by relations using the access method
+with the specified name or OID.
+   
+  
+
   

 
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index e5c0f1c45b6..af0955d1790 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -13,19 +13,25 @@
 
 #include 
 
+#include "access/genam.h"
 #include "access/htup_details.h"
 #include "access/relation.h"
+#include "access/table.h"
 #include "catalog/catalog.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_authid.h"
 #include "catalog/pg_database.h"
+#include "catalog/pg_namespace.h"
 #include "catalog/pg_tablespace.h"
 #include "commands/dbcommands.h"
+#include "commands/defrem.h"
 #include "commands/tablespace.h"
 #include "miscadmin.h"
 #include "storage/fd.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
+#include "utils/fmgroids.h"
+#include "utils/lsyscache.h"
 #include "utils/numeric.h"
 #include "utils/rel.h"
 #include "utils/relfilenumbermap.h"
@@ -858,6 +864,132 @@ pg_size_bytes(PG_FUNCTION_ARGS)
 	PG_RETURN_INT64(result);
 }
 
+/*
+ * Return the sum of size of relations for which the given attribute of
+ * pg_class matches the specified OID value.
+ */
+static int64
+calculate_size_attvalue(AttrNumber attnum, Oid attval)
+{
+	int64		totalsize = 0;
+	ScanKeyData skey;
+	Relation	pg_class;
+	SysScanDesc scan;
+	HeapTuple	tuple;
+
+	ScanKeyInit(&skey, attnum,
+BTEqualStrategyNumber, F_OIDEQ, attval);
+
+	pg_class = table_open(RelationRelationId, AccessShareLock);
+	scan = systable_beginscan(pg_class, InvalidOid, false, NULL, 1, &skey);
+	while (HeapTupleIsValid(tuple = systable_getnext(scan)))
+	{
+		Form_pg_class classtuple = (Form_pg_class) GETSTRUCT(tuple);
+		Relation	rel;
+
+		rel = try_relation_open(classtuple->oid, AccessShareLock);
+		if (!rel)
+			continue;
+
+		for (ForkNumber forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
+			totalsize += calculate_relation_size(&(rel->rd_locator), rel->rd_backend, forkNum);
+
+		relation_close(rel, AccessShareLock);
+	}
+
+	systable_endscan(scan);
+	table_close(pg_class, AccessShareLock);
+	return totalsize;
+}
+
+/* Compute the size of relations in a schema (namespace) */
+static int64
+calculate_namespace_size(Oid nspOid)
+{
+	/*
+	 * User must be a member of pg_read_all_stats or have CREATE privilege for
+	 * target namespace.
+	 */
+	if (!is_member_of_role(GetUserId(), ROLE_PG_READ_ALL_STATS))
+	{
+		AclResult	aclresult;
+
+		aclresult = object_aclcheck(NamespaceRelationId, nspOid, GetUserId(), ACL_CREATE);
+		if (aclresult != ACLCHECK_OK)
+			aclcheck_error(aclresult, OBJECT_SCHEMA,
+		

pg15: reltuples stuck at -1 after pg_upgrade and VACUUM

2023-08-08 Thread Justin Pryzby
Since 3d351d916 (pg14), reltuples -1 means that the rel has never been
vacuumed nor analyzed.

But since 4496020e6d (backpatched to pg15), following pg_upgrade, vacuum
can leave reltuples=-1.

commit 4496020e6dfaffe8217e4d3f85567bb2b6927b45
Author: Peter Geoghegan 
Date:   Fri Aug 19 09:26:06 2022 -0700

Avoid reltuples distortion in very small tables.

$ /usr/pgsql-15/bin/initdb -N -D ./pg15.dat2
$ /usr/pgsql-15/bin/initdb -N -D ./pg15.dat3

$ /usr/pgsql-15/bin/postgres -c logging_collector=no -p 5678 -k /tmp -D 
./pg15.dat2& # old cluster, pre-upgrade
postgres=# CREATE TABLE t AS SELECT generate_series(1,);
postgres=# SELECT reltuples FROM pg_class WHERE oid='t'::regclass;
reltuples | -1
postgres=# VACUUM FREEZE t;
postgres=# SELECT reltuples FROM pg_class WHERE oid='t'::regclass;
reltuples | 

$ /usr/pgsql-15/bin/pg_upgrade -b /usr/pgsql-15/bin -d ./pg15.dat2 
-D./pg15.dat3 # -c logging_collector=no -p 5678 -k /tmp&

$ /usr/pgsql-15/bin/postgres -c logging_collector=no -p 5678 -k /tmp -D 
./pg15.dat3& # new cluster, post-upgrade
postgres=# VACUUM FREEZE VERBOSE t;
postgres=# SELECT reltuples FROM pg_class WHERE oid='t'::regclass;
reltuples | -1

The problem isn't that reltuples == -1 after the upgrade (which is
normal).  The issue is that if VACUUM skips all the pages, it can leave
reltuples -1.  My expectation is that after running "vacuum", no tables
are left in the "never have been vacuumed" state.

If the table was already frozen, then VACUUM (FREEZE) is inadequate to
fix it, and you need to use DISABLE_PAGE_SKIPPING.
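This follows from how VACUUM extrapolates reltuples from the pages it actually scanned: if every page is skipped there is nothing to extrapolate from, so the stored value — including the -1 marker — survives. A simplified sketch, loosely modeled on vac_estimate_reltuples (not the exact server code):

```python
def estimate_reltuples(old_rel_tuples: float, total_pages: int,
                       scanned_pages: int, scanned_tuples: int) -> float:
    """Extrapolate a new reltuples from the scanned fraction of the
    table.  If nothing was scanned, keep the old value -- including
    -1, the 'never vacuumed/analyzed' marker left by pg_upgrade."""
    if scanned_pages == 0:
        return old_rel_tuples
    if scanned_pages >= total_pages:
        # Whole table scanned (e.g. DISABLE_PAGE_SKIPPING): exact count.
        return scanned_tuples
    density = scanned_tuples / scanned_pages
    return density * total_pages
```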

-- 
Justin




pg15.3: dereference null *plan in CachedPlanIsSimplyValid/plpgsql

2023-07-20 Thread Justin Pryzby
A production instance crashed like so.

< 2023-07-20 05:41:36.557 MDT  >LOG:  server process (PID 106001) was 
terminated by signal 11: Segmentation fault

[168417.649549] postmaster[106001]: segfault at 12 ip 008f1d40 sp 
7ffe885502e0 error 4 in postgres[40+7dc000]

Core was generated by `postgres: main main 127.0.0.1(51936) SELECT'.

(gdb) bt
#0  CachedPlanIsSimplyValid (plansource=0x2491398, plan=0x0, owner=0x130a198) 
at plancache.c:1441
#1  0x7fcab2e93d15 in exec_eval_simple_expr (rettypmod=0x7ffe88550374, 
rettype=0x7ffe88550370, isNull=0x7ffe8855036f, result=<optimized out>, 
expr=0x2489dc8, estate=0x7ffe885507f0) at pl_exec.c:6067
#2  exec_eval_expr (estate=0x7ffe885507f0, expr=0x2489dc8, 
isNull=0x7ffe8855036f, rettype=0x7ffe88550370, rettypmod=0x7ffe88550374) at 
pl_exec.c:5696
#3  0x7fcab2e97d42 in exec_assign_expr (estate=estate@entry=0x7ffe885507f0, 
target=0x16250f8, expr=0x2489dc8) at pl_exec.c:5028
#4  0x7fcab2e98675 in exec_stmt_assign (stmt=0x2489d80, stmt=0x2489d80, 
estate=0x7ffe885507f0) at pl_exec.c:2155
#5  exec_stmts (estate=estate@entry=0x7ffe885507f0, stmts=0x2489ad8) at 
pl_exec.c:2019
#6  0x7fcab2e9b672 in exec_stmt_block (estate=estate@entry=0x7ffe885507f0, 
block=block@entry=0x248af48) at pl_exec.c:1942
#7  0x7fcab2e9b6cb in exec_toplevel_block 
(estate=estate@entry=0x7ffe885507f0, block=0x248af48) at pl_exec.c:1633
#8  0x7fcab2e9bf84 in plpgsql_exec_function (func=func@entry=0x1035388, 
fcinfo=fcinfo@entry=0x40a76b8, simple_eval_estate=simple_eval_estate@entry=0x0, 
simple_eval_resowner=simple_eval_resowner@entry=0x0,
procedure_resowner=procedure_resowner@entry=0x0, atomic=atomic@entry=true) 
at pl_exec.c:622
#9  0x7fcab2ea7454 in plpgsql_call_handler (fcinfo=0x40a76b8) at 
pl_handler.c:277
#10 0x00658307 in ExecAggPlainTransByRef (setno=<optimized out>, 
aggcontext=<optimized out>, pergroup=0x41a3358, pertrans=<optimized out>, 
aggstate=<optimized out>) at execExprInterp.c:4379
#11 ExecInterpExpr (state=0x40a7870, econtext=0x1843a58, isnull=<optimized out>) at execExprInterp.c:1770
#12 0x00670282 in ExecEvalExprSwitchContext (isNull=0x7ffe88550b47, 
econtext=<optimized out>, state=<optimized out>) at 
../../../src/include/executor/executor.h:341
#13 advance_aggregates (aggstate=<optimized out>, aggstate=<optimized out>) at 
nodeAgg.c:824
#14 agg_fill_hash_table (aggstate=0x1843630) at nodeAgg.c:2544
#15 ExecAgg (pstate=0x1843630) at nodeAgg.c:2154
#16 0x00662c58 in ExecProcNodeInstr (node=0x1843630) at 
execProcnode.c:480
#17 0x0067c7f3 in ExecProcNode (node=0x1843630) at 
../../../src/include/executor/executor.h:259
#18 ExecHashJoinImpl (parallel=false, pstate=0x1843390) at nodeHashjoin.c:268
#19 ExecHashJoin (pstate=0x1843390) at nodeHashjoin.c:621
#20 0x00662c58 in ExecProcNodeInstr (node=0x1843390) at 
execProcnode.c:480
#21 0x0067c7f3 in ExecProcNode (node=0x1843390) at 
../../../src/include/executor/executor.h:259
#22 ExecHashJoinImpl (parallel=false, pstate=0x18430f0) at nodeHashjoin.c:268
#23 ExecHashJoin (pstate=0x18430f0) at nodeHashjoin.c:621
#24 0x00662c58 in ExecProcNodeInstr (node=0x18430f0) at 
execProcnode.c:480
#25 0x0068de76 in ExecProcNode (node=0x18430f0) at 
../../../src/include/executor/executor.h:259
#26 ExecSort (pstate=0x1842ee0) at nodeSort.c:149
#27 0x00662c58 in ExecProcNodeInstr (node=0x1842ee0) at 
execProcnode.c:480
#28 0x0066ced9 in ExecProcNode (node=0x1842ee0) at 
../../../src/include/executor/executor.h:259
#29 fetch_input_tuple (aggstate=aggstate@entry=0x1842908) at nodeAgg.c:563
#30 0x006701b2 in agg_retrieve_direct (aggstate=0x1842908) at 
nodeAgg.c:2346
#31 ExecAgg (pstate=0x1842908) at nodeAgg.c:2161
#32 0x00662c58 in ExecProcNodeInstr (node=0x1842908) at 
execProcnode.c:480
#33 0x0065be12 in ExecProcNode (node=0x1842908) at 
../../../src/include/executor/executor.h:259
#34 ExecutePlan (execute_once=<optimized out>, dest=0x5fb39f8, 
direction=<optimized out>, numberTuples=0, sendTuples=true, 
operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x1842908, 
estate=0x1842298)
at execMain.c:1636
#35 standard_ExecutorRun (queryDesc=0x2356ab8, direction=<optimized out>, 
count=0, execute_once=<optimized out>) at execMain.c:363
#36 0x7fcab32bd67d in pgss_ExecutorRun (queryDesc=0x2356ab8, 
direction=ForwardScanDirection, count=0, execute_once=<optimized out>) at 
pg_stat_statements.c:1010
#37 0x7fcab30b7781 in explain_ExecutorRun (queryDesc=0x2356ab8, 
direction=ForwardScanDirection, count=0, execute_once=<optimized out>) at 
auto_explain.c:320
#38 0x007d010e in PortalRunSelect (portal=portal@entry=0xf79598, 
forward=forward@entry=true, count=0, count@entry=9223372036854775807, 
dest=dest@entry=0x5fb39f8) at pquery.c:924
#39 0x007d15bf in PortalRun (portal=<optimized out>, 
count=9223372036854775807, isTopLevel=<optimized out>, run_once=<optimized out>, dest=0x5fb39f8, altdest=0x5fb39f8, qc=0x7ffe885512d0) at pquery.c:768
#40 0x007cd417 in exec_simple_query (
query_string=0x15ee788 "-- report: thresrept: None: E-UTRAN/eNB KPIs: 
['2023-07-19 2023-07-20', '0 23', '4', '{\"report_type\": \"hourly\", 

Re: ALTER TABLE SET ACCESS METHOD on partitioned tables

2023-07-19 Thread Justin Pryzby
On Thu, Jun 01, 2023 at 08:50:50AM -0400, Michael Paquier wrote:
> On Wed, May 31, 2023 at 06:35:34PM -0500, Justin Pryzby wrote:
> > What do you think the comment ought to say ?  It already says:
> >
> > src/backend/catalog/heap.c-  * Make a dependency link to force 
> > the relation to be deleted if its
> > src/backend/catalog/heap.c-  * access method is.
> 
> This is the third location where we rely on the fact that
> RELKIND_HAS_TABLE_AM() does not include RELKIND_PARTITIONED_TABLE, so
> it seems worth documenting what we are relying on as a comment?  Say:
>  * Make a dependency link to force the relation to be deleted if its
>  * access method is.
>  *
>  * No need to add an explicit dependency for the toast table, as the
>  * main table depends on it.  Partitioned tables have a table access
>  * method defined, and RELKIND_HAS_TABLE_AM ignores them.

You said that this location "relies on" the macro not including
partitioned tables, but I would say the opposite: the places that use
RELKIND_HAS_TABLE_AM() and do *not* say "or relkind==PARTITIONED_TABLE"
are the ones that "rely on" that...

Anyway, this updates various comments.  No other changes.

-- 
Justin
>From e96a2a109d25bb9ed7cc9c0bf80c0824128dceac Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sun, 7 Mar 2021 00:11:38 -0600
Subject: [PATCH] Allow specifying access method of partitioned tables..

..to be inherited by partitions

See also:
ca4103025dfe26eaaf6a500dec9170fbb176eebc
8586bf7ed8889f39a59dd99b292014b73be85342
ebfe2dbd6b624e2a428e14b7ee9322cc096f63f7 - prevent DROP AM

Authors: Justin Pryzby, Soumyadeep Chakraborty
---
 doc/src/sgml/ref/alter_table.sgml   |  4 ++
 doc/src/sgml/ref/create_table.sgml  |  9 ++-
 src/backend/catalog/heap.c  |  6 +-
 src/backend/commands/tablecmds.c| 89 +++--
 src/backend/utils/cache/lsyscache.c | 22 ++
 src/backend/utils/cache/relcache.c  |  5 ++
 src/bin/pg_dump/pg_dump.c   |  3 +-
 src/bin/pg_dump/t/002_pg_dump.pl| 34 ++
 src/include/catalog/pg_class.h  |  3 +-
 src/include/utils/lsyscache.h   |  1 +
 src/test/regress/expected/create_am.out | 62 -
 src/test/regress/sql/create_am.sql  | 25 +--
 12 files changed, 213 insertions(+), 50 deletions(-)

diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index d4d93eeb7c6..d32d4c44b10 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -719,6 +719,10 @@ WITH ( MODULUS numeric_literal, REM
  
   This form changes the access method of the table by rewriting it. See
for more information.
+  When applied to a partitioned table, there is no data to rewrite, but any
+  partitions created afterwards with
+  CREATE TABLE PARTITION OF will use that access method,
+  unless overridden by an USING clause.
  
 

diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 10ef699fab9..b20d272b151 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -1295,9 +1295,12 @@ WITH ( MODULUS numeric_literal, REM
   This optional clause specifies the table access method to use to store
   the contents for the new table; the method needs be an access method of
   type TABLE. See  for more
-  information.  If this option is not specified, the default table access
-  method is chosen for the new table. See  for more information.
+  information.  If this option is not specified, a default table access
+  method is chosen for the new table.
+  When creating a partition, the default table access method is the
+  access method of its parent.
+  For other tables, the default is determined by
+  .
  
 

diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 4c30c7d461f..82e003cabc1 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -1450,9 +1450,11 @@ heap_create_with_catalog(const char *relname,
 		 * access method is.
 		 *
 		 * No need to add an explicit dependency for the toast table, as the
-		 * main table depends on it.
+		 * main table depends on it.  Partitioned tables have an access method
+		 * defined, but are not handled by RELKIND_HAS_TABLE_AM.
 		 */
-		if (RELKIND_HAS_TABLE_AM(relkind) && relkind != RELKIND_TOASTVALUE)
+		if ((RELKIND_HAS_TABLE_AM(relkind) && relkind != RELKIND_TOASTVALUE) ||
+relkind == RELKIND_PARTITIONED_TABLE)
 		{
 			ObjectAddressSet(referenced, AccessMethodRelationId, accessmtd);
 			add_exact_object_address(&referenced, addrs);
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 4dc029f91f1..bc343539aeb 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/c

Re: ALTER TABLE SET ACCESS METHOD on partitioned tables

2023-07-16 Thread Justin Pryzby
On Thu, Jun 01, 2023 at 08:50:50AM -0400, Michael Paquier wrote:
> >> Speaking of which, ATExecSetAccessMethodNoStorage() does a catalog
> >> update even if ALTER TABLE is defined to use the same table AM as what
> >> is currently set.  There is no need to update the relation's pg_class
> >> entry in this case.  Be careful that InvokeObjectPostAlterHook() still
> >> needs to be checked in this case.  Perhaps some tests should be added
> >> in test_oat_hooks.sql?  It would be tempted to add a new SQL file for
> >> that.
> >
> > Are you suggesting to put this in a conditional: if 
> > oldrelam!=newAccessMethod ?
> 
> Yes, that's what I would add with a few lines close to the beginning
> of ATExecSetTableSpaceNoStorage() to save from catalog updates if
> these are not needed.

I understand that it's possible for it to be conditional, but I don't
understand why it's desirable/important?

It still seems preferable to be unconditional.

On Wed, May 31, 2023 at 06:35:34PM -0500, Justin Pryzby wrote:
>> Why is that desirable ?  I'd prefer to see it written with fewer
>> conditionals, not more; then, it hits the same path every time.

It's not conditional in the tablespace code that this follows, nor
others like ATExecSetStatistics(), ATExecForceNoForceRowSecurity(),
ATExecSetCompression(), etc, some of which are recently-added.  If it's
important to do here, isn't it also important to do everywhere else ?

-- 
Justin




Re: CREATE INDEX CONCURRENTLY on partitioned index

2023-07-12 Thread Justin Pryzby
On Mon, Mar 27, 2023 at 01:28:24PM +0300, Alexander Pyhalov wrote:
> Justin Pryzby писал 2023-03-26 17:51:
> > On Sun, Dec 04, 2022 at 01:09:35PM -0600, Justin Pryzby wrote:
> > > This currently handles partitions with a loop around the whole CIC
> > > implementation, which means that things like WaitForLockers() happen
> > > once for each index, the same as REINDEX CONCURRENTLY on a partitioned
> > > table.  Contrast that with ReindexRelationConcurrently(), which handles
> > > all the indexes on a table in one pass by looping around indexes within
> > > each phase.
> > 
> > Rebased over the progress reporting fix (27f5c712b).
> > 
> > I added a list of (intermediate) partitioned tables, rather than looping
> > over the list of inheritors again, to save calling rel_get_relkind().
> > 
> > I think this patch is done.
> 
> Overall looks good to me. However, I think that using 'partitioned' as list
> of partitioned index oids in DefineIndex() is a bit misleading - we've just
> used it as boolean, specifying if we are dealing with a partitioned
> relation.

Right.  This is also rebased on 8c852ba9a4 (Allow some exclusion
constraints on partitions).

-- 
Justin
>From 3f60cbdd12b67115f86854ff60a4009028b8b99f Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sat, 6 Jun 2020 17:42:23 -0500
Subject: [PATCH] Allow CREATE INDEX CONCURRENTLY on partitioned table

https://www.postgresql.org/message-id/flat/20201031063117.gf3...@telsasoft.com
---
 doc/src/sgml/ddl.sgml  |   4 +-
 doc/src/sgml/ref/create_index.sgml |  14 +-
 src/backend/commands/indexcmds.c   | 201 ++---
 src/test/regress/expected/indexing.out | 127 +++-
 src/test/regress/sql/indexing.sql  |  26 +++-
 5 files changed, 297 insertions(+), 75 deletions(-)

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 58aaa691c6a..afa982154a8 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -4161,9 +4161,7 @@ ALTER TABLE measurement ATTACH PARTITION measurement_y2008m02
  so that they are applied automatically to the entire hierarchy.
  This is very
  convenient, as not only will the existing partitions become indexed, but
- also any partitions that are created in the future will.  One limitation is
- that it's not possible to use the CONCURRENTLY
- qualifier when creating such a partitioned index.  To avoid long lock
+ also any partitions that are created in the future will.  To avoid long lock
  times, it is possible to use CREATE INDEX ON ONLY
  the partitioned table; such an index is marked invalid, and the partitions
  do not get the index applied automatically.  The indexes on partitions can
diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index 40986aa502f..b05102efdaf 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -645,7 +645,10 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] 
 If a problem arises while scanning the table, such as a deadlock or a
 uniqueness violation in a unique index, the CREATE INDEX
-command will fail but leave behind an invalid index. This index
+command will fail but leave behind an invalid index.
+If this happens while building an index concurrently on a partitioned
+table, the command can also leave behind valid or
+invalid indexes on table partitions.  The invalid index
 will be ignored for querying purposes because it might be incomplete;
 however it will still consume update overhead. The psql
 \d command will report such an index as INVALID:
@@ -692,15 +695,6 @@ Indexes:
 cannot.

 
-   
-Concurrent builds for indexes on partitioned tables are currently not
-supported.  However, you may concurrently build the index on each
-partition individually and then finally create the partitioned index
-non-concurrently in order to reduce the time where writes to the
-partitioned table will be locked out.  In this case, building the
-partitioned index is a metadata only operation.
-   
-
   
  
 
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index baf3e6e57a5..dfe64052b81 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -92,6 +92,11 @@ static char *ChooseIndexName(const char *tabname, Oid namespaceId,
 			 bool primary, bool isconstraint);
 static char *ChooseIndexNameAddition(List *colnames);
 static List *ChooseIndexColumnNames(List *indexElems);
+static void DefineIndexConcurrentInternal(Oid relationId,
+		  Oid indexRelationId,
+		  IndexInfo *indexInfo,
+		  LOCKTAG heaplocktag,
+		  LockRelId heaprelid);
 static void ReindexIndex(RangeVar *indexRelation, ReindexParams *params,
 		 bool isTopLevel);
 s

Re: pg16b2: REINDEX segv on null pointer in RemoveFromWaitQueue

2023-07-12 Thread Justin Pryzby
On Mon, Jul 10, 2023 at 09:01:37PM -0500, Justin Pryzby wrote:
> An instance compiled locally, without assertions, failed like this:
> 
...
> 
> => REINDEX was running, with parallel workers, but deadlocked with
> ANALYZE, and then crashed.
> 
> It looks like parallel workers are needed to hit this issue.
> I'm not sure if the issue is specific to extended stats - probably not.
> 
> I reproduced the crash with manual REINDEX+ANALYZE, and with assertions (which
> were not hit), and on a more recent commit (1124cb2cf).  The crash is hit 
> about
> 30% of the time when running a loop around REINDEX and then also running
> ANALYZE.
> 
> I hope someone has a hunch where to look; so far, I wasn't able to create a
> minimal reproducer.  

I was able to reproduce this in isolation by reloading data into a test
instance, ANALYZEing the DB to populate pg_statistic_ext_data (so it's
over 3MB in size), and then REINDEXing the stats_ext index in a loop
while ANALYZEing a table with extended stats.

I still don't have a minimal reproducer, but on a hunch I found that
this fails at 5764f611e but not its parent.

commit 5764f611e10b126e09e37fdffbe884c44643a6ce
Author: Andres Freund 
Date:   Wed Jan 18 11:41:14 2023 -0800

Use dlist/dclist instead of PROC_QUEUE / SHM_QUEUE for heavyweight locks

I tried compiling with -DILIST_DEBUG, but that shows nothing beyond
segfaulting, which seems to show that the lists themselves are fine.

-- 
Justin




Re: CI and test improvements

2023-07-11 Thread Justin Pryzby
On Tue, Jan 17, 2023 at 11:35:09AM -0600, Justin Pryzby wrote:
> However, this finds two real problems and one false-positive with
> missing regress/isolation tests:
> 
> $ for makefile in `find src contrib -name Makefile`; do for testname in `sed 
> -r '/^(REGRESS|ISOLATION) =/!d; s///; :l; /$/{s///; N; b l}; s/\n//g' 
> "$makefile"`; do meson=${makefile%/Makefile}/meson.build; grep -Fw 
> "$testname" "$meson" >/dev/null || echo "$testname is missing from $meson"; 
> done; done

And, since 681d9e462:

security is missing from contrib/seg/meson.build




pg16b2: REINDEX segv on null pointer in RemoveFromWaitQueue

2023-07-10 Thread Justin Pryzby
An instance compiled locally, without assertions, failed like this:

< 2023-07-09 22:04:46.470 UTC  >LOG:  process 30002 detected deadlock while 
waiting for ShareLock on transaction 813219577 after 333.228 ms
< 2023-07-09 22:04:46.470 UTC  >DETAIL:  Process holding the lock: 2103. Wait 
queue: 30001.
< 2023-07-09 22:04:46.470 UTC  >CONTEXT:  while checking uniqueness of tuple 
(549,4) in relation "pg_statistic_ext_data"
< 2023-07-09 22:04:46.470 UTC  >STATEMENT:  REINDEX INDEX 
pg_statistic_ext_data_stxoid_inh_index
< 2023-07-09 22:04:46.474 UTC  >ERROR:  deadlock detected
< 2023-07-09 22:04:46.474 UTC  >DETAIL:  Process 30001 waits for ShareLock on 
transaction 813219577; blocked by process 2103.
Process 2103 waits for RowExclusiveLock on relation 3429 of database 
16588; blocked by process 30001.
Process 30001: REINDEX INDEX pg_statistic_ext_data_stxoid_inh_index
Process 2103: autovacuum: ANALYZE child.ericsson_sgsn_rac_202307
< 2023-07-09 22:04:46.474 UTC  >HINT:  See server log for query details.
< 2023-07-09 22:04:46.474 UTC  >CONTEXT:  while checking uniqueness of tuple 
(549,4) in relation "pg_statistic_ext_data"
< 2023-07-09 22:04:46.474 UTC  >STATEMENT:  REINDEX INDEX 
pg_statistic_ext_data_stxoid_inh_index
< 2023-07-09 22:04:46.483 UTC  >LOG:  background worker "parallel worker" (PID 
30002) exited with exit code 1
< 2023-07-09 22:04:46.487 UTC postgres >ERROR:  deadlock detected
< 2023-07-09 22:04:46.487 UTC postgres >DETAIL:  Process 30001 waits for 
ShareLock on transaction 813219577; blocked by process 2103.
Process 2103 waits for RowExclusiveLock on relation 3429 of database 
16588; blocked by process 30001.
< 2023-07-09 22:04:46.487 UTC postgres >HINT:  See server log for query details.
< 2023-07-09 22:04:46.487 UTC postgres >CONTEXT:  while checking uniqueness of 
tuple (549,4) in relation "pg_statistic_ext_data"
parallel worker
< 2023-07-09 22:04:46.487 UTC postgres >STATEMENT:  REINDEX INDEX 
pg_statistic_ext_data_stxoid_inh_index
< 2023-07-09 22:04:46.848 UTC  >LOG:  server process (PID 30001) was terminated 
by signal 11: Segmentation fault
< 2023-07-09 22:04:46.848 UTC  >DETAIL:  Failed process was running: REINDEX 
INDEX pg_statistic_ext_data_stxoid_inh_index

=> REINDEX was running, with parallel workers, but deadlocked with
ANALYZE, and then crashed.

It looks like parallel workers are needed to hit this issue.
I'm not sure if the issue is specific to extended stats - probably not.

I reproduced the crash with manual REINDEX+ANALYZE, and with assertions (which
were not hit), and on a more recent commit (1124cb2cf).  The crash is hit about
30% of the time when running a loop around REINDEX and then also running
ANALYZE.

I hope someone has a hunch where to look; so far, I wasn't able to create a
minimal reproducer.  

Core was generated by `postgres: pryzbyj ts [local] REINDEX '.
Program terminated with signal 11, Segmentation fault.
#0  RemoveFromWaitQueue (proc=0x2aaabc1289e0, hashcode=2627626119) at 
../src/backend/storage/lmgr/lock.c:1898
1898LOCKMETHODID lockmethodid = LOCK_LOCKMETHOD(*waitLock);
(gdb) bt
#0  RemoveFromWaitQueue (proc=0x2aaabc1289e0, hashcode=2627626119) at 
../src/backend/storage/lmgr/lock.c:1898
#1  0x007ab56b in LockErrorCleanup () at 
../src/backend/storage/lmgr/proc.c:735
#2  0x00548a7e in AbortTransaction () at 
../src/backend/access/transam/xact.c:2735
#3  0x00549405 in AbortCurrentTransaction () at 
../src/backend/access/transam/xact.c:3414
#4  0x007b6414 in PostgresMain (dbname=, 
username=) at ../src/backend/tcop/postgres.c:4352
#5  0x00730e9a in BackendRun (port=, port=) at ../src/backend/postmaster/postmaster.c:4461
#6  BackendStartup (port=0x12a8a50) at 
../src/backend/postmaster/postmaster.c:4189
#7  ServerLoop () at ../src/backend/postmaster/postmaster.c:1779
#8  0x0073207d in PostmasterMain (argc=argc@entry=3, 
argv=argv@entry=0x127a230) at ../src/backend/postmaster/postmaster.c:1463
#9  0x004b5535 in main (argc=3, argv=0x127a230) at 
../src/backend/main/main.c:198

(gdb) l
1893RemoveFromWaitQueue(PGPROC *proc, uint32 hashcode)
1894{
1895LOCK   *waitLock = proc->waitLock;
1896PROCLOCK   *proclock = proc->waitProcLock;
1897LOCKMODElockmode = proc->waitLockMode;
1898LOCKMETHODID lockmethodid = LOCK_LOCKMETHOD(*waitLock);
1899
1900/* Make sure proc is waiting */
1901Assert(proc->waitStatus == PROC_WAIT_STATUS_WAITING);
1902Assert(proc->links.next != NULL);

(gdb) p waitLock
$1 = (LOCK *) 0x0

Another variant on this crash:

Jul 11 00:55:19 telsa kernel: postgres[25415]: segfault at f ip 
008a sp 7ffdbc01ea90 error 4 in postgres[40+8df000]

Core was generated by `postgres: parallel worker for PID 27096  waiting '.

(gdb) bt
#0  RemoveFromWaitQueue (proc=0x2aaabc154040, hashcode=2029421528) at 

Re: warn if GUC set to an invalid shared library

2023-07-06 Thread Justin Pryzby
On Sat, Oct 29, 2022 at 10:40 AM Justin Pryzby  wrote:
> > > > On Fri, Sep 02, 2022 at 05:24:58PM -0500, Justin Pryzby wrote:
> > > > > It caused no issue when I changed:
> > > > >
> > > > > /* Check that it's acceptable for the indicated parameter */
> > > > > if (!parse_and_validate_value(record, name, value,
> > > > > -                             PGC_S_FILE, ERROR,
> > > > > +                             PGC_S_TEST, ERROR,
> > > > >                               &newval, &newextra))
> > > > >
> > > > > I'm not sure where to go from here.
> > > >
> > > > I'm hoping for some guidance ; this simple change may be naive, but I'm 
> > > > not
> > > > sure what a wider change would look like.

I'm still hoping.

> > PGC_S_TEST is a better fit, so my question is whether it's really that
> > simple ?  
> 
> I've added the trivial change as 0001 and re-opened the patch (which ended
> up in January's CF)
> 
> If for some reason it's not really as simple as that, then 001 will
> serve as a "straw-man patch" hoping to elicit discussion on that point.

> From defdb57fe0ec373c1eea8df42f0e1831b3f9c3cc Mon Sep 17 00:00:00 2001
> From: Justin Pryzby 
> Date: Fri, 22 Jul 2022 15:52:11 -0500
> Subject: [PATCH v6 1/4] WIP: test GUCs from ALTER SYSTEM as PGC_S_TEST not
>  FILE
> 
> WIP: ALTER SYSTEM should use PGC_S_TEST rather than PGC_S_FILE
> 
> Since the value didn't come from a file.  Or maybe we should have
> another PGC_S_ value for this, or a flag for 'is a test'.
> ---
>  src/backend/utils/misc/guc.c | 2 +-
>  src/include/utils/guc.h  | 1 +
>  2 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
> index 6f21752b844..ae8810591d6 100644
> --- a/src/backend/utils/misc/guc.c
> +++ b/src/backend/utils/misc/guc.c
> @@ -4435,7 +4435,7 @@ AlterSystemSetConfigFile(AlterSystemStmt *altersysstmt)
>  
> 			/* Check that it's acceptable for the indicated parameter */
> 			if (!parse_and_validate_value(record, name, value,
> -										  PGC_S_FILE, ERROR,
> +										  PGC_S_TEST, ERROR,
> 										  &newval, &newextra))
> 				ereport(ERROR,
> 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),

This is rebased over my own patch to enable checks for
REGRESSION_TEST_NAME_RESTRICTIONS.

-- 
Justin
>From aa4c3ae08b3379bcd222e00aa896a40f811155a5 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 22 Jul 2022 15:52:11 -0500
Subject: [PATCH 1/4] WIP: test GUCs from ALTER SYSTEM as PGC_S_TEST not FILE

WIP: ALTER SYSTEM should use PGC_S_TEST rather than PGC_S_FILE

Since the value didn't come from a file.  Or maybe we should have
another PGC_S_ value for this, or a flag for 'is a test'.
---
 src/backend/utils/misc/guc.c | 2 +-
 src/include/utils/guc.h  | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 5308896c87f..dda54310a56 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -4540,7 +4540,7 @@ AlterSystemSetConfigFile(AlterSystemStmt *altersysstmt)
 
 			/* Check that it's acceptable for the indicated parameter */
 			if (!parse_and_validate_value(record, name, value,
-		  PGC_S_FILE, ERROR,
+		  PGC_S_TEST, ERROR,
 		  &newval, &newextra))
 ereport(ERROR,
 		(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index d5253c7ed23..bd45d40c106 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -119,6 +119,7 @@ typedef enum
 	PGC_S_OVERRIDE,/* special case to forcibly set default */
 	PGC_S_INTERACTIVE,			/* dividing line for error reporting */
 	PGC_S_TEST,	/* test per-database or per-user setting */
+	// PGC_S_TEST_FILE,			/* test global cluster settings (ALTER SYSTEM) */
 	PGC_S_SESSION/* SET command */
 } GucSource;
 
-- 
2.34.1

>From b05c738bddfca11d4d0510c497a121cc8fcb585a Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Mon, 13 Dec 2021 08:42:38 -0600
Subject: [PATCH 2/4] warn when setting GUC to a nonextant library

Note that warnings shouldn't be issued during sta

Re: Improve logging when using Huge Pages

2023-06-20 Thread Justin Pryzby
On Tue, Jun 13, 2023 at 02:50:30PM +0900, Michael Paquier wrote:
> On Mon, Jun 12, 2023 at 02:37:15PM -0700, Nathan Bossart wrote:
> > Fair enough.  I know I've been waffling in the GUC versus function
> > discussion, but FWIW v7 of the patch looks reasonable to me.
> 
> +   Assert(strcmp("unknown", GetConfigOption("huge_pages_status", false, 
> false)) != 0);
> 
> Not sure that there is anything to gain with this assertion in
> CreateSharedMemoryAndSemaphores(), because this is pretty much what
> check_GUC_init() looks after?

It seems like you misread the assertion, so I added a comment about it.
Indeed, the assertion addresses the other question you asked later.

That's what I already commented about, and the reason I found it
compelling not to use a boolean.

On Thu, Apr 06, 2023 at 04:57:58PM -0500, Justin Pryzby wrote:
> I added an assert to check that a running server won't output
> "unknown".

On Wed, Jun 14, 2023 at 09:15:35AM +0900, Michael Paquier wrote:
> There was a second thing that bugged me here.  Would it be worth
> adding some checks on huge_pages_status to make sure that it is never
> reported as unknown when the server is up and running?

-- 
Justin
>From 00234f5a0c14e2569ac1e7dab89907196f3ca9e1 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Mon, 23 Jan 2023 18:33:51 -0600
Subject: [PATCH v8] add GUC: huge_pages_active

This is useful to show the current state of huge pages when
huge_pages=try.  The effective status is not otherwise visible without
OS level tools like gdb or /proc/N/smaps.

https://www.postgresql.org/message-id/flat/tu4pr8401mb1152ebb0d271f827e2e37a01ee...@tu4pr8401mb1152.namprd84.prod.outlook.com
---
 doc/src/sgml/config.sgml| 21 -
 src/backend/port/sysv_shmem.c   | 10 ++
 src/backend/port/win32_shmem.c  |  5 +
 src/backend/storage/ipc/ipci.c  |  7 +++
 src/backend/utils/misc/guc_tables.c | 20 
 src/include/storage/pg_shmem.h  |  5 +++--
 6 files changed, 65 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 6262cb7bb2f..e69afae71bf 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1727,7 +1727,8 @@ include_dir 'conf.d'
 server will try to request huge pages, but fall back to the default if
 that fails. With on, failure to request huge pages
 will prevent the server from starting up. With off,
-huge pages will not be requested.
+huge pages will not be requested.  The actual state of huge pages is
+indicated by the server variable .

 

@@ -10738,6 +10739,24 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
   
  
 
+ 
+  huge_pages_status (enum)
+  
+   huge_pages_status configuration parameter
+  
+  
+  
+   
+Reports the state of huge pages in the current instance:
+on, off, or (if displayed with
+postgres -C) unknown.
+This parameter is useful to determine whether allocation of huge pages
+was successful when huge_pages=try.
+See  for more information.
+   
+  
+ 
+
  
   integer_datetimes (boolean)
   
diff --git a/src/backend/port/sysv_shmem.c b/src/backend/port/sysv_shmem.c
index eaba244bc9c..0e710237ea1 100644
--- a/src/backend/port/sysv_shmem.c
+++ b/src/backend/port/sysv_shmem.c
@@ -627,6 +627,10 @@ CreateAnonymousSegment(Size *size)
 	}
 #endif
 
+	/* Report whether huge pages are in use */
+	SetConfigOption("huge_pages_status", ptr == MAP_FAILED ? "off" : "on",
+	PGC_INTERNAL, PGC_S_DYNAMIC_DEFAULT);
+
 	if (ptr == MAP_FAILED && huge_pages != HUGE_PAGES_ON)
 	{
 		/*
@@ -737,8 +741,14 @@ PGSharedMemoryCreate(Size size,
 		sysvsize = sizeof(PGShmemHeader);
 	}
 	else
+	{
 		sysvsize = size;
 
+		/* huge pages are only available with mmap */
+		SetConfigOption("huge_pages_status", "off",
+		PGC_INTERNAL, PGC_S_DYNAMIC_DEFAULT);
+	}
+
 	/*
 	 * Loop till we find a free IPC key.  Trust CreateDataDirLockFile() to
 	 * ensure no more than one postmaster per data directory can enter this
diff --git a/src/backend/port/win32_shmem.c b/src/backend/port/win32_shmem.c
index 62e08074770..87f0b001915 100644
--- a/src/backend/port/win32_shmem.c
+++ b/src/backend/port/win32_shmem.c
@@ -327,6 +327,11 @@ retry:
 			Sleep(1000);
 			continue;
 		}
+
+		/* Report whether huge pages are in use */
+		SetConfigOption("huge_pages_status", (flProtect & SEC_LARGE_PAGES) ?
+		"on" : "off", PGC_INTERNAL, PGC_S_DYNAMIC_DEFAULT);
+
 		break;
 	}
 
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 8f1ded7338f..5901a3bd8eb 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@

Re: ALTER TABLE SET ACCESS METHOD on partitioned tables

2023-05-31 Thread Justin Pryzby
On Thu, May 25, 2023 at 03:49:12PM +0900, Michael Paquier wrote:
> looking at the patch.  Here are a few comments.

...
>  * No need to add an explicit dependency for the toast table, as the
>  * main table depends on it.
>  */
> -   if (RELKIND_HAS_TABLE_AM(relkind) && relkind != RELKIND_TOASTVALUE)
> +   if ((RELKIND_HAS_TABLE_AM(relkind) && relkind != RELKIND_TOASTVALUE) ||
> +   relkind == RELKIND_PARTITIONED_TABLE)
> 
> The comment at the top of this code block needs an update.

What do you think the comment ought to say ?  It already says:

src/backend/catalog/heap.c-  * Make a dependency link to force the 
relation to be deleted if its
src/backend/catalog/heap.c-  * access method is.

> Speaking of which, ATExecSetAccessMethodNoStorage() does a catalog
> update even if ALTER TABLE is defined to use the same table AM as what
> is currently set.  There is no need to update the relation's pg_class
> entry in this case.  Be careful that InvokeObjectPostAlterHook() still
> needs to be checked in this case.  Perhaps some tests should be added
> in test_oat_hooks.sql?  It would be tempted to add a new SQL file for
> that.

Are you suggesting to put this in a conditional: if oldrelam!=newAccessMethod ?

+   ((Form_pg_class) GETSTRUCT(tuple))->relam = newAccessMethod;
+   CatalogTupleUpdate(pg_class, &tuple->t_self, tuple);
+
+   /* Update dependency on new AM */
+   changeDependencyFor(RelationRelationId, relid, AccessMethodRelationId,
+   oldrelam, newAccessMethod);

Why is that desirable ?  I'd prefer to see it written with fewer
conditionals, not more; then, it hits the same path every time.

This ought to address your other comments.

-- 
Justin
>From a726bd7ca8844f6eee639d672afa7edace329caf Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sun, 7 Mar 2021 00:11:38 -0600
Subject: [PATCH] Allow specifying access method of partitioned tables..

..to be inherited by partitions

See also:
ca4103025dfe26eaaf6a500dec9170fbb176eebc
8586bf7ed8889f39a59dd99b292014b73be85342
ebfe2dbd6b624e2a428e14b7ee9322cc096f63f7 - prevent DROP AM

Authors: Justin Pryzby, Soumyadeep Chakraborty
---
 doc/src/sgml/ref/alter_table.sgml   |  4 ++
 doc/src/sgml/ref/create_table.sgml  |  9 ++-
 src/backend/catalog/heap.c  |  3 +-
 src/backend/commands/tablecmds.c| 89 +++--
 src/backend/utils/cache/lsyscache.c | 22 ++
 src/backend/utils/cache/relcache.c  |  5 ++
 src/bin/pg_dump/pg_dump.c   |  3 +-
 src/bin/pg_dump/t/002_pg_dump.pl| 34 ++
 src/include/catalog/pg_class.h  |  4 +-
 src/include/utils/lsyscache.h   |  1 +
 src/test/regress/expected/create_am.out | 62 -
 src/test/regress/sql/create_am.sql  | 25 +--
 12 files changed, 212 insertions(+), 49 deletions(-)

diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index d4d93eeb7c6..d32d4c44b10 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -719,6 +719,10 @@ WITH ( MODULUS numeric_literal, REM
  
   This form changes the access method of the table by rewriting it. See
for more information.
+  When applied to a partitioned table, there is no data to rewrite, but any
+  partitions created afterwards with
+  CREATE TABLE PARTITION OF will use that access method,
+  unless overridden by a USING clause.
  
 

diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 10ef699fab9..b20d272b151 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -1295,9 +1295,12 @@ WITH ( MODULUS numeric_literal, REM
   This optional clause specifies the table access method to use to store
   the contents for the new table; the method needs be an access method of
   type TABLE. See  for more
-  information.  If this option is not specified, the default table access
-  method is chosen for the new table. See  for more information.
+  information.  If this option is not specified, a default table access
+  method is chosen for the new table.
+  When creating a partition, the default table access method is the
+  access method of its parent.
+  For other tables, the default is determined by
+  .
  
 

diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 2a0d82aedd7..bbf8e08618b 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -1452,7 +1452,8 @@ heap_create_with_catalog(const char *relname,
 		 * No need to add an explicit dependency for the toast table, as the
 		 * main table depends on it.
 		 */
-		if (RELKIND_HAS_TABLE_AM(relkind) && relkind != RELKIND_TOASTVALUE)
+		

Re: [PATCH] psql: \dn+ to show size of each schema (and \dA+ for AMs)

2023-05-24 Thread Justin Pryzby
I added documentation for the SQL functions in 001.
And updated to say 17

I'm planning to set this patch as ready - it has not changed
significantly in 18 months.  Not for the first time, I've implemented a
workaround at a higher layer.

-- 
Justin
>From 917e5e36d14018b6de457ba9eafe8936c0e88c3c Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Tue, 13 Jul 2021 21:25:48 -0500
Subject: [PATCH 1/4] Add pg_am_size(), pg_namespace_size() ..

See also: 358a897fa, 528ac10c7
---
 doc/src/sgml/func.sgml  |  39 ++
 src/backend/utils/adt/dbsize.c  | 132 
 src/include/catalog/pg_proc.dat |  19 +
 3 files changed, 190 insertions(+)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 5a47ce43434..6cbf4e9aa56 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -27762,6 +27762,45 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset

   
 
+  
+   
+
+ pg_namespace_size
+
+pg_namespace_size ( name )
+bigint
+   
+   
+pg_namespace_size ( oid )
+bigint
+   
+   
+Computes the total disk space used by relations in the namespace (schema)
+with the specified name or OID. To use this function, you must
+have CREATE privilege on the specified namespace
+or have privileges of the pg_read_all_stats role,
+unless it is the default namespace for the current database.
+   
+  
+
+  
+   
+
+ pg_am_size
+
+pg_am_size ( name )
+bigint
+   
+   
+pg_am_size ( oid )
+bigint
+   
+   
+Computes the total disk space used by relations using the access method
+with the specified name or OID.
+   
+  
+
   

 
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index e5c0f1c45b6..af0955d1790 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -13,19 +13,25 @@
 
 #include 
 
+#include "access/genam.h"
 #include "access/htup_details.h"
 #include "access/relation.h"
+#include "access/table.h"
 #include "catalog/catalog.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_authid.h"
 #include "catalog/pg_database.h"
+#include "catalog/pg_namespace.h"
 #include "catalog/pg_tablespace.h"
 #include "commands/dbcommands.h"
+#include "commands/defrem.h"
 #include "commands/tablespace.h"
 #include "miscadmin.h"
 #include "storage/fd.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
+#include "utils/fmgroids.h"
+#include "utils/lsyscache.h"
 #include "utils/numeric.h"
 #include "utils/rel.h"
 #include "utils/relfilenumbermap.h"
@@ -858,6 +864,132 @@ pg_size_bytes(PG_FUNCTION_ARGS)
 	PG_RETURN_INT64(result);
 }
 
+/*
+ * Return the sum of size of relations for which the given attribute of
+ * pg_class matches the specified OID value.
+ */
+static int64
+calculate_size_attvalue(AttrNumber attnum, Oid attval)
+{
+	int64		totalsize = 0;
+	ScanKeyData skey;
+	Relation	pg_class;
+	SysScanDesc scan;
+	HeapTuple	tuple;
+
+	ScanKeyInit(&skey, attnum,
+				BTEqualStrategyNumber, F_OIDEQ, attval);
+
+	pg_class = table_open(RelationRelationId, AccessShareLock);
+	scan = systable_beginscan(pg_class, InvalidOid, false, NULL, 1, &skey);
+	while (HeapTupleIsValid(tuple = systable_getnext(scan)))
+	{
+		Form_pg_class classtuple = (Form_pg_class) GETSTRUCT(tuple);
+		Relation	rel;
+
+		rel = try_relation_open(classtuple->oid, AccessShareLock);
+		if (!rel)
+			continue;
+
+		for (ForkNumber forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
+			totalsize += calculate_relation_size(&(rel->rd_locator), rel->rd_backend, forkNum);
+
+		relation_close(rel, AccessShareLock);
+	}
+
+	systable_endscan(scan);
+	table_close(pg_class, AccessShareLock);
+	return totalsize;
+}
+
+/* Compute the size of relations in a schema (namespace) */
+static int64
+calculate_namespace_size(Oid nspOid)
+{
+	/*
+	 * User must be a member of pg_read_all_stats or have CREATE privilege for
+	 * target namespace.
+	 */
+	if (!is_member_of_role(GetUserId(), ROLE_PG_READ_ALL_STATS))
+	{
+		AclResult	aclresult;
+
+		aclresult = object_aclcheck(NamespaceRelationId, nspOid, GetUserId(), ACL_CREATE);
+		if (aclresult != ACLCHECK_OK)
+			aclcheck_error(aclresult, OBJECT_SCHEMA,
+		   get_namespace_name(nspOid));
+	}
+
+	return calculate_size_attvalue(Anum_pg_class_relnamespace, nspOid);
+}
+
+Datum
+pg_namespace_size_oid(PG_FUNCTION_ARGS)
+{
+	Oid			nspOid = PG_GETARG_OID(0);
+	int64		size;
+
+	size = calculate_namespace_size(nspOid);
+
+	if (size < 0)
+		PG_RETURN_NULL();
+
+	PG_RETURN_INT64(size);
+}
+
+Datum
+pg_namespace_size_name(P

Re: fix stats_fetch_consistency value in postgresql.conf.sample

2023-05-09 Thread Justin Pryzby
On Wed, Mar 29, 2023 at 11:03:59PM -0500, Justin Pryzby wrote:
> On Wed, Jul 13, 2022 at 04:49:00PM -0700, Andres Freund wrote:
> > On 2022-07-14 08:46:02 +0900, Michael Paquier wrote:
> > > On Wed, Jul 13, 2022 at 12:30:00PM -0500, Justin Pryzby wrote:
> > > > How did you make this list ?  Was it by excluding things that failed 
> > > > for you ?
> > > > 
> > > > cfbot is currently failing due to io_concurrency on windows.
> > > > I think there are more GUC which should be included here.
> > > > 
> > > > http://cfbot.cputube.org/kyotaro-horiguchi.html
> > > 
> > > FWIW, I am not really a fan of making this test depend on a hardcoded
> > > list of GUCs.
> > 
> > I wonder if we should add flags indicating platform dependency etc to guc.c?
> > That should allow to remove most of them?
> 
> Michael commented on this, but on another thread, so I'm copying and
> pasting it here.
> 
> On Thu, Mar 23, 2023 at 08:59:57PM -0500, Justin Pryzby wrote:
> > On Fri, Mar 24, 2023 at 10:24:43AM +0900, Michael Paquier wrote:
> > > >> * Check consistency of GUC defaults between .sample.conf and 
> > > >> pg_settings.boot_val
> > > >   - It looks like this was pretty active until last October and might
> > > > have been ready to apply at least partially? But no further work or
> > > > review has happened since.
> > > 
> > > FWIW, I don't find much appealing the addition of two GUC flags for
> > > only the sole purpose of that,
> > 
> > The flags seem independently interesting - adding them here follows
> > a suggestion Andres made in response to your complaint.
> > 20220713234900.z4rniuaerkq34...@awork3.anarazel.de
> > 
> > > particularly as we get a stronger
> > > dependency between GUCs that can be switched dynamically at
> > > initialization and at compile-time.
> > 
> > What do you mean by "stronger dependency between GUCs" ?
> 
> I'm still not clear what that means ?

Michael ?

This fixes an issue with the last version, which failed with
log_autovacuum_min_duration in cirrusci's pg_ci_base.conf.

It now includes both a perl-based and an SQL-based version of the test -
both of which rely on the flags.
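For anyone skimming the thread, the shape of the check the tests implement can be sketched outside of postgres. This is an illustrative stand-in (the sample text, the settings dict, and the lack of unit normalization are made up for the sketch; the real perl/SQL tests query pg_settings and normalize unit-bearing values):

```python
import re

# Stand-in for a few lines of postgresql.conf.sample:
SAMPLE = """\
#max_connections = 100
#shared_buffers = 128MB\t\t\t# min 128kB
#effective_io_concurrency = 1\t\t# platform-dependent
"""

# Stand-in for: SELECT name, boot_val, pg_settings_get_flags(name) FROM pg_settings;
SETTINGS = {
    "max_connections": ("100", []),
    "shared_buffers": ("16384", []),  # boot_val is in 8kB pages, not "128MB"
    "effective_io_concurrency": ("0", ["DEFAULT_COMPILE"]),
}

def sample_defaults(text):
    """Yield (name, value) for each commented-out default in the sample file."""
    for line in text.splitlines():
        m = re.match(r"#?(\w+) = (\S+)", line)
        if m:
            yield m.group(1), m.group(2)

def mismatches(sample_text, settings):
    """GUCs whose sample-file default disagrees with boot_val, excluding
    GUCs whose default legitimately varies (the two new flags)."""
    out = []
    for name, sample_val in sample_defaults(sample_text):
        boot_val, flags = settings.get(name, (None, []))
        if "DEFAULT_COMPILE" in flags or "DEFAULT_INITDB" in flags:
            continue  # varies by platform/compile options or initdb: not a bug
        if boot_val is not None and sample_val != boot_val:
            out.append(name)
    return out

# shared_buffers is reported only because this sketch skips unit
# normalization; effective_io_concurrency is excluded by its flag.
print(mismatches(SAMPLE, SETTINGS))
```

The point of the two new flags is the `continue`: platform- and initdb-dependent defaults get excluded by metadata instead of a hardcoded per-test skip list.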
From 963b56636b9f7fd4a78e502c6acba07919518910 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Tue, 19 Jul 2022 08:31:56 -0500
Subject: [PATCH 1/2] pg_settings_get_flags(): add DEFAULT_COMPILE and
 DEFAULT_INITDB ..

for settings which vary by ./configure or platform or initdb

Note that this is independent from PGC_S_DYNAMIC_DEFAULT.
---
 doc/src/sgml/func.sgml  | 15 ++
 src/backend/utils/misc/guc_funcs.c  |  6 ++-
 src/backend/utils/misc/guc_tables.c | 76 ++---
 src/include/utils/guc.h |  2 +
 4 files changed, 69 insertions(+), 30 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e13fce6f6b1..17069d2249e 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -24945,6 +24945,21 @@ SELECT collation for ('foo' COLLATE "de_DE");
       <row><entry>Flag</entry><entry>Description</entry></row>
      </thead>
      <tbody>
+
+     <row>
+      <entry><literal>DEFAULT_COMPILE</literal></entry>
+      <entry>Parameters with this flag have default values which vary by
+      platform, or compile-time options.
+      </entry>
+     </row>
+
+     <row>
+      <entry><literal>DEFAULT_INITDB</literal></entry>
+      <entry>Parameters with this flag have default values which are
+      determined dynamically during initdb.
+      </entry>
+     </row>
+
      <row>
       <entry><literal>EXPLAIN</literal></entry>
       <entry>Parameters with this flag are included in
diff --git a/src/backend/utils/misc/guc_funcs.c b/src/backend/utils/misc/guc_funcs.c
index 52c361e9755..183943c1973 100644
--- a/src/backend/utils/misc/guc_funcs.c
+++ b/src/backend/utils/misc/guc_funcs.c
@@ -551,7 +551,7 @@ ShowAllGUCConfig(DestReceiver *dest)
 Datum
 pg_settings_get_flags(PG_FUNCTION_ARGS)
 {
-#define MAX_GUC_FLAGS	6
+#define MAX_GUC_FLAGS	8
 	char	   *varname = TextDatumGetCString(PG_GETARG_DATUM(0));
 	struct config_generic *record;
 	int			cnt = 0;
@@ -564,6 +564,10 @@ pg_settings_get_flags(PG_FUNCTION_ARGS)
 	if (record == NULL)
 		PG_RETURN_NULL();
 
+	if (record->flags & GUC_DEFAULT_COMPILE)
+		flags[cnt++] = CStringGetTextDatum("DEFAULT_COMPILE");
+	if (record->flags & GUC_DEFAULT_INITDB)
+		flags[cnt++] = CStringGetTextDatum("DEFAULT_INITDB");
 	if (record->flags & GUC_EXPLAIN)
 		flags[cnt++] = CStringGetTextDatum("EXPLAIN");
 	if (record->flags & GUC_NO_RESET)
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 2f42cebaf62..94b0aa24a98 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -1233,7 +1233,7 @@ struct config_bool ConfigureNamesBool[] =
 		{"debug_assertions&

Re: Direct I/O

2023-04-30 Thread Justin Pryzby
On Sun, Apr 30, 2023 at 06:35:30PM +1200, Thomas Munro wrote:
> On Sun, Apr 30, 2023 at 4:11 PM Noah Misch  wrote:
> > Speaking of the developer-only status, I find the io_direct name more 
> > enticing
> > than force_parallel_mode, which PostgreSQL renamed due to overuse from 
> > people
> > expecting non-developer benefits.  Should this have a name starting with
> > debug_?
> 
> Hmm, yeah I think people coming from other databases would be tempted
> by it.  But, unlike the
> please-jam-a-gather-node-on-top-of-the-plan-so-I-can-debug-the-parallel-executor
> switch, I think of this thing more like an experimental feature that
> is just waiting for more features to make it useful.  What about a
> warning message about that at startup if it's on?

Such a warning wouldn't be particularly likely to be seen by someone who
didn't already read/understand the docs for the not-feature that they
turned on.

Since this is -currently- a developer-only feature, it seems reasonable
to rename the GUC to debug_direct_io, and (at such time as it's
considered to be helpful to users) later rename it to direct_io.  
That avoids the issue that random advice to enable direct_io=x under
v17+ is applied by people running v16.  +0.8 to do so.

Maybe in the future, it should be added to GUC_EXPLAIN, too ?

-- 
Justin




Re: ALTER TABLE SET ACCESS METHOD on partitioned tables

2023-04-24 Thread Justin Pryzby
On Thu, Mar 30, 2023 at 12:07:58AM -0500, Justin Pryzby wrote:
> On Mon, Mar 27, 2023 at 11:34:36PM -0500, Justin Pryzby wrote:
> > On Tue, Mar 28, 2023 at 09:13:10AM +0900, Michael Paquier wrote:
> > > On Mon, Mar 20, 2023 at 09:30:50AM +0900, Michael Paquier wrote:
> > > > Did you check dump and restore flows with partition
> > > > trees and --no-table-access-method?  Perhaps there should be
> > > > some regression tests with partitioned tables?
> > > 
> > > I was looking at the patch, and as I suspected the dumps generated
> > > are forgetting to apply the AM to the partitioned tables.
> > 
> > The patch said:
> > 
> > +   else if (RELKIND_HAS_TABLE_AM(relkind) || relkind == 
> > RELKIND_PARTITIONED_TABLE)
> > 
> > pg_dump was missing a similar change that's conditional on
> > RELKIND_HAS_TABLE_AM().  It looks like those are the only two places
> > that need be be specially handled.
> > 
> > I dug up my latest patch from 2021 and incorporated the changes from the
> > 0001 patch here, and added a test case.
> > 
> > I realized that one difference with tablespaces is that, as written,
> > partitioned tables will *always* have an AM specified,  and partitions
> > will never use default_table_access_method.  Is that what's intended ?
> > 
> > Or do we need logic similar to tablespaces, such that the relam of a
> > partitioned table is set only if it differs from default_table_am ?
> 
> Actually .. I think it'd be a mistake if the relam needed to be
> interpreted differently for partitioned tables than for other relkinds.
> 
> I updated the patch to allow intermediate partitioned tables to inherit
> relam from their parent.
> 
> Michael wrote:
> > .. and there are quite more points to consider.
> 
> What more points ?  There's nothing that's been raised here.  In fact,
> your message last week is the first comment since last June ..

Michael ?




Re: Wrong results from Parallel Hash Full Join

2023-04-20 Thread Justin Pryzby
On Wed, Apr 19, 2023 at 08:47:07PM -0400, Melanie Plageman wrote:
> On Wed, Apr 19, 2023 at 8:41 PM Justin Pryzby  wrote:
> >
> > On Wed, Apr 19, 2023 at 12:20:51PM -0700, Andres Freund wrote:
> > > On 2023-04-19 12:16:24 -0500, Justin Pryzby wrote:
> > > > On Wed, Apr 19, 2023 at 11:17:04AM -0400, Melanie Plageman wrote:
> > > > > Ultimately this is probably fine. If we wanted to modify one of the
> > > > > existing tests to cover the multi-batch case, changing the select
> > > > > count(*) to a select * would do the trick. I imagine we wouldn't want 
> > > > > to
> > > > > do this because of the excessive output this would produce. I wondered
> > > > > if there was a pattern in the tests for getting around this.
> > > >
> > > > You could use explain (ANALYZE).  But the output is machine-dependent in
> > > > various ways (which is why the tests use "explain analyze" so rarely).
> > >
> > > I think with sufficient options it's not machine specific.
> >
> > It *can* be machine specific depending on the node type..
> >
> > In particular, for parallel workers, it shows "Workers Launched: ..",
> > which can vary even across executions on the same machine.  And don't
> > forget about "loops=".
> >
> > Plus:
> > src/backend/commands/explain.c: "Buckets: %d  Batches: %d  Memory Usage: %ldkB\n",
> >
> > > We have a bunch of
> > >  EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) ..
> > > in our tests.
> >
> > There's 81 uses of "timing off", out of a total of ~1600 explains.  Most
> > of them are in partition_prune.sql.  explain analyze is barely used.
> >
> > I sent a patch to elide the machine-specific parts, which would make it
> > easier to use.  But there was no interest.
> 
> While I don't know about other use cases, I would have used that here.
> Do you still have that patch laying around? I'd be interested to at
> least review it.

https://commitfest.postgresql.org/41/3409/




Re: Wrong results from Parallel Hash Full Join

2023-04-19 Thread Justin Pryzby
On Wed, Apr 19, 2023 at 12:20:51PM -0700, Andres Freund wrote:
> On 2023-04-19 12:16:24 -0500, Justin Pryzby wrote:
> > On Wed, Apr 19, 2023 at 11:17:04AM -0400, Melanie Plageman wrote:
> > > Ultimately this is probably fine. If we wanted to modify one of the
> > > existing tests to cover the multi-batch case, changing the select
> > > count(*) to a select * would do the trick. I imagine we wouldn't want to
> > > do this because of the excessive output this would produce. I wondered
> > > if there was a pattern in the tests for getting around this.
> > 
> > You could use explain (ANALYZE).  But the output is machine-dependent in
> > various ways (which is why the tests use "explain analyze" so rarely).
> 
> I think with sufficient options it's not machine specific.

It *can* be machine specific depending on the node type..

In particular, for parallel workers, it shows "Workers Launched: ..",
which can vary even across executions on the same machine.  And don't
forget about "loops=".

Plus:
src/backend/commands/explain.c: "Buckets: %d  Batches: %d  Memory Usage: %ldkB\n",

> We have a bunch of
>  EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) ..
> in our tests.

There's 81 uses of "timing off", out of a total of ~1600 explains.  Most
of them are in partition_prune.sql.  explain analyze is barely used.

I sent a patch to elide the machine-specific parts, which would make it
easier to use.  But there was no interest.

-- 
Justin




Re: Wrong results from Parallel Hash Full Join

2023-04-19 Thread Justin Pryzby
On Wed, Apr 19, 2023 at 11:17:04AM -0400, Melanie Plageman wrote:
> Ultimately this is probably fine. If we wanted to modify one of the
> existing tests to cover the multi-batch case, changing the select
> count(*) to a select * would do the trick. I imagine we wouldn't want to
> do this because of the excessive output this would produce. I wondered
> if there was a pattern in the tests for getting around this.

You could use explain (ANALYZE).  But the output is machine-dependent in
various ways (which is why the tests use "explain analyze" so rarely).

So you'd have to filter its output with a function (like the functions
that exist in a few places for similar purpose).
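For reference, a rough sketch of the kind of eliding such a filter does. This is hypothetical (the pattern list is illustrative, and the real thing would be a SQL function applied to EXPLAIN's text output), but it shows which parts need masking before the output is stable across machines and runs:

```python
import re

# Each substitution masks one machine- or run-dependent fragment of
# EXPLAIN ANALYZE output: timings, row/loop counts, worker counts,
# and hash buckets/batches/memory.
_PATTERNS = [
    (r"actual time=\d+\.\d+\.\.\d+\.\d+", "actual time=N..N"),
    (r"rows=\d+ loops=\d+", "rows=N loops=N"),
    (r"Workers Launched: \d+", "Workers Launched: N"),
    (r"Buckets: \d+  Batches: \d+  Memory Usage: \d+kB",
     "Buckets: N  Batches: N  Memory Usage: NkB"),
]

def explain_filter(line):
    for pat, repl in _PATTERNS:
        line = re.sub(pat, repl, line)
    return line

print(explain_filter("Hash Join (actual time=0.042..1.337 rows=1000 loops=1)"))
# -> Hash Join (actual time=N..N rows=N loops=N)
```

After filtering, the plan shape and batch/worker structure remain visible while the unstable numbers are normalized away.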

-- 
Justin




Re: Fix typos and inconsistencies for v16

2023-04-18 Thread Justin Pryzby
On Tue, Apr 18, 2023 at 02:06:43PM +1200, David Rowley wrote:
> On Tue, 18 Apr 2023 at 10:10, Justin Pryzby  wrote:
> > > -  * USER SET values are appliciable only for PGC_USERSET 
> > > parameters. We
> > > +  * USER SET values are applicable only for PGC_USERSET 
> > > parameters. We
> > >* use InvalidOid as role in order to evade possible 
> > > privileges of the
> >
> > and s/evade/avoid/
> 
> I didn't touch this. You'll need to provide more justification for why
> you think it's more correct than what's there.  

I'd noticed that it's a substitution/mistake that's been made in the
past.  I dug up:

9436041ed848debb3d64fb5fbff6cdb35bc46d04
8e12f4a250d250a89153da2eb9b91c31bb80c483
cd9479af2af25d7fa9bfd24dd4dcf976b360f077
6df7a9698bb036610c1e8c6d375e1be38cb26d5f
911e70207703799605f5a0e8aad9f06cff067c63

> It might not be worth too much discussion, however.

+many.  I may resend the patch at some later date.

> > Attached some others that I found.
> 
> Pushed the rest.  Thanks

Thanks!

-- 
Justin 




Re: Fix typos and inconsistencies for v16

2023-04-17 Thread Justin Pryzby
On Mon, Apr 17, 2023 at 09:00:00PM +0300, Alexander Lakhin wrote:
> Hello hackers,
> 
> Please consider fixing the following unique words/identifiers introduced in 
> v16:

Well done.

Note that your patches are overlapping:

  3 --- a/src/backend/utils/misc/guc.c
  2 --- a/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
  2 --- a/src/test/ldap/LdapServer.pm
  2 --- a/src/interfaces/libpq/t/004_load_balance_dns.pl
  2 --- a/src/backend/utils/adt/acl.c

It'd make sense if the changes to each file were isolated to a single
patch (especially 004_load and acl.c).

> -  * USER SET values are appliciable only for PGC_USERSET 
> parameters. We
> +  * USER SET values are applicable only for PGC_USERSET 
> parameters. We
>* use InvalidOid as role in order to evade possible privileges 
> of the

and s/evade/avoid/

> +++ b/src/bin/pg_dump/pg_dumpall.c

You missed "boostrap" :)

I independently found 11 of the same typos you did:

> 1. addresess -> addresses
> 3. appeneded -> appended
> 4. appliciable -> applicable
> 8. containsthe ->  contains the
> 15. execpt -> except
> 19. happend -> happened
> 27. optionn -> option
> 30. permissons -> permissions
> 37. remaing -> remaining
> 42. sentinal -> sentinel
> 47. varilables -> variables

But I hadn't yet convinced myself to start the process of defending each
one of the fixes.  Attached are some others that I found.

-- 
Justin
diff --git a/contrib/test_decoding/expected/ddl.out b/contrib/test_decoding/expected/ddl.out
index 9a28b5ddc5a..d55fb3a667f 100644
--- a/contrib/test_decoding/expected/ddl.out
+++ b/contrib/test_decoding/expected/ddl.out
@@ -428,7 +428,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'inc
 -- test whether a known, but not yet logged toplevel xact, followed by a
 -- subxact commit is handled correctly
 BEGIN;
-SELECT pg_current_xact_id() != '0'; -- so no fixed xid apears in the outfile
+SELECT pg_current_xact_id() != '0'; -- so no fixed xid appears in the outfile
  ?column? 
 --
  t
diff --git a/contrib/test_decoding/sql/ddl.sql b/contrib/test_decoding/sql/ddl.sql
index 4f76bed72c1..57285a828c7 100644
--- a/contrib/test_decoding/sql/ddl.sql
+++ b/contrib/test_decoding/sql/ddl.sql
@@ -236,7 +236,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'inc
 -- test whether a known, but not yet logged toplevel xact, followed by a
 -- subxact commit is handled correctly
 BEGIN;
-SELECT pg_current_xact_id() != '0'; -- so no fixed xid apears in the outfile
+SELECT pg_current_xact_id() != '0'; -- so no fixed xid appears in the outfile
 SAVEPOINT a;
 INSERT INTO tr_sub(path) VALUES ('4-top-1-#1');
 RELEASE SAVEPOINT a;
diff --git a/doc/src/sgml/user-manag.sgml b/doc/src/sgml/user-manag.sgml
index b5e0392ad27..b6c37ccef26 100644
--- a/doc/src/sgml/user-manag.sgml
+++ b/doc/src/sgml/user-manag.sgml
@@ -346,7 +346,7 @@ ALTER ROLE myname SET enable_indexscan TO off;
role using SET ROLE. However, since any user who has
ADMIN OPTION on a role can grant membership in that
role to any other user, the CREATEROLE user can gain
-   access to the created role by simplying granting that role back to
+   access to the created role by simply granting that role back to
themselves with the INHERIT and/or SET
options. Thus, the fact that privileges are not inherited by default nor
is SET ROLE granted by default is a safeguard against
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 244957a2483..9bdc70c702e 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -4051,7 +4051,7 @@ recurse_push_qual(Node *setOp, Query *topquery,
  *
  * extra_used_attrs can be passed as non-NULL to mark any columns (offset by
  * FirstLowInvalidHeapAttributeNumber) that we should not remove.  This
- * parameter is modifed by the function, so callers must make a copy if they
+ * parameter is modified by the function, so callers must make a copy if they
  * need to use the passed in Bitmapset after calling this function.
  *
  * To avoid affecting column numbering in the targetlist, we don't physically
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index e3824efe9b5..65adf04c4eb 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -436,7 +436,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 				 * the number-of-tuples estimate to equal the parent table; if it
 				 * is partial then we have to use the same methods as we would for
 				 * a table, except we can be sure that the index is not larger
-				 * than the table.  We must ignore partitioned indexes here as as
+				 * than the table.  We must ignore partitioned indexes here as

Re: v16dev: TRAP: failed Assert("size > SizeOfXLogRecord"), File: "xlog.c", Line: 1055, PID: 13564

2023-04-17 Thread Justin Pryzby
On Mon, Apr 17, 2023 at 01:50:30PM -0400, Tom Lane wrote:
> I wrote:
> > Yeah, I just came to the same conclusion.  One thing I don't understand
> > yet: log_newpage_range is old (it looks like this back to v12), and
> > that Assert is older, so why doesn't this reproduce further back?
> > Maybe the state where all the pages are new didn't happen before?
> 
> Bingo: bisecting shows the failure started at

Just curious: what "test" did you use to bisect with ?
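(For anyone curious about the mechanics: any command that exits 0 on good commits and non-zero on bad ones can drive the search via `git bisect run` -- for the assertion failure upthread, that could be a script that runs the pg_restore recipe and greps the log for "TRAP".  A toy sketch against a throwaway repo, with a stand-in predicate:)

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email you@example.com
git config user.name you
# Five commits; pretend the regression arrives in commit 4.
for i in 1 2 3 4 5; do
    echo "$i" > state
    git add state
    git commit -qm "commit $i"
done
git bisect start HEAD HEAD~4 >/dev/null      # HEAD is bad, HEAD~4 is known good
# The predicate exits 0 (good) while the "bug" is absent:
git bisect run sh -c '[ "$(cat state)" -lt 4 ]' 2>&1 | grep -m1 "first bad commit"
```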




v16dev: TRAP: failed Assert("size > SizeOfXLogRecord"), File: "xlog.c", Line: 1055, PID: 13564

2023-04-17 Thread Justin Pryzby
I hit this assertion while pg_restoring data into a v16 instance.
postgresql16-server-16-alpha_20230417_PGDG.rhel7.x86_64

wal_level=minimal and pg_dump --single-transaction both seem to be
required to hit the issue.

$ /usr/pgsql-16/bin/postgres -D ./pg16test -c maintenance_work_mem=1GB -c max_wal_size=16GB -c wal_level=minimal -c max_wal_senders=0 -c port=5678 -c logging_collector=no &

$ time sudo -u postgres /usr/pgsql-16/bin/pg_restore -d postgres -p 5678 --single-transaction --no-tablespace ./curtables

TRAP: failed Assert("size > SizeOfXLogRecord"), File: "xlog.c", Line: 1055, PID: 13564

Core was generated by `postgres: postgres postgres [local] COMMIT'.
Program terminated with signal 6, Aborted.
#0  0x7f28b8bd5387 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install postgresql16-server-16-alpha_20230417_PGDG.rhel7.x86_64
(gdb) bt
#0  0x7f28b8bd5387 in raise () from /lib64/libc.so.6
#1  0x7f28b8bd6a78 in abort () from /lib64/libc.so.6
#2  0x009bc8c9 in ExceptionalCondition (conditionName=conditionName@entry=0xa373e1 "size > SizeOfXLogRecord", fileName=fileName@entry=0xa31b13 "xlog.c", lineNumber=lineNumber@entry=1055) at assert.c:66
#3  0x0057b049 in ReserveXLogInsertLocation (PrevPtr=0x2e3d750, EndPos=<optimized out>, StartPos=<optimized out>, size=24) at xlog.c:1055
#4  XLogInsertRecord (rdata=rdata@entry=0xf187a0 <hdr_rdt>, fpw_lsn=fpw_lsn@entry=0, flags=<optimized out>, num_fpi=num_fpi@entry=0, topxid_included=topxid_included@entry=false) at xlog.c:844
#5  0x0058210c in XLogInsert (rmid=rmid@entry=0 '\000', info=info@entry=176 '\260') at xloginsert.c:510
#6  0x00582b09 in log_newpage_range (rel=rel@entry=0x2e1f628, forknum=forknum@entry=FSM_FORKNUM, startblk=startblk@entry=0, endblk=endblk@entry=3, page_std=page_std@entry=false) at xloginsert.c:1317
#7  0x005d02f9 in smgrDoPendingSyncs (isCommit=isCommit@entry=true, isParallelWorker=isParallelWorker@entry=false) at storage.c:837
#8  0x00571637 in CommitTransaction () at xact.c:2225
#9  0x00572b25 in CommitTransactionCommand () at xact.c:3201
#10 0x0086afc7 in finish_xact_command () at postgres.c:2782
#11 0x0086d7e1 in exec_simple_query (query_string=0x2dec4f8 "COMMIT") at postgres.c:1307




Re: Move defaults toward ICU in 16?

2023-04-17 Thread Justin Pryzby
On Wed, Apr 05, 2023 at 09:33:25AM +0200, Peter Eisentraut wrote:
> On 16.03.23 14:52, Peter Eisentraut wrote:
> > On 09.03.23 20:14, Jeff Davis wrote:
> > > > Let's come back to that after dealing with the other two.
> > > 
> > > Leaving 0001 open for now.
> > 
> > I suspect making a change like this now would result in a bloodbath on
> > the build farm that we could do without.  I suggest revisiting this
> > after the commit fest ends.
> 
> I don't object to this patch.  I suggest waiting until next week to commit
> it and then see what happens.  It's easy to revert if it goes terribly.

Is this still being considered for v16 ?

-- 
Justin




Re: Can we delete the vacuumdb.sgml notes about which version each option was added in?

2023-04-16 Thread Justin Pryzby
On Sun, Apr 16, 2023 at 10:14:35PM +1200, David Rowley wrote:
> I was just looking at the vacuumdb docs and noticed that I had
> neglected to follow the tradition of adding a note to mention which
> version we added the new option in when I committed the
> --buffer-usage-limit patch.
> 
> There are 3 notes there that read "This option is only available for
> servers running PostgreSQL 9.6 and later.".  Since 9.6 is a few years
> out of support, can we get rid of these 3?
> 
> Or better yet, can we just delete them all?  Is it really worth doing
> this in case someone is using a new vacuumdb on an older server?
> 
> I just tried compiling the HTML with all the notes removed, I see from
> looking at a print preview that it's now ~1 full A4 page shorter than
> it was previously.  5 pages down to 4.
> 
> Does anyone think we should keep these?

I don't know if I'd support removing the notes, but I agree that they
don't need to take up anywhere near as much space as they do (especially
since the note is now repeated 10 times).

https://www.postgresql.org/docs/devel/app-vacuumdb.html

I suggest removing the <note> markup and preserving the annotation about
version compatibility.  It's normal technical writing to repeat the
same language like that.

Another, related improvement I suggested would be to group the
client-side options separately from the server-side options.
https://www.postgresql.org/message-id/zbydtrd1kygg+...@telsasoft.com

-- 
Justin




Re: v16dev: invalid memory alloc request size 8488348128

2023-04-15 Thread Justin Pryzby
On Sat, Apr 15, 2023 at 11:33:58AM +1200, David Rowley wrote:
> On Sat, 15 Apr 2023 at 10:48, Justin Pryzby  wrote:
> >
> > On Sat, Apr 15, 2023 at 10:04:52AM +1200, David Rowley wrote:
> > > Which aggregate function is being called here?  Is it a custom
> > > aggregate written in C, by any chance?
> >
> > That function is not an aggregate:
> 
> There's an aggregate somewhere as indicated by this fragment from the
> stack trace:
> 
> > #12 project_aggregates (aggstate=aggstate@entry=0x607200070d38) at 
> > ../src/backend/executor/nodeAgg.c:1377
> > #13 0x00903eb6 in agg_retrieve_direct 
> > (aggstate=aggstate@entry=0x607200070d38) at 
> > ../src/backend/executor/nodeAgg.c:2520
> > #14 0x00904074 in ExecAgg (pstate=0x607200070d38) at 
> > ../src/backend/executor/nodeAgg.c:2172
> 
> Any chance you could try and come up with a minimal reproducer?  You
> have access to see which aggregates are being used here and what data
> types are being given to them and then what's being done with the
> return value of that aggregate that's causing the crash.  Maybe you
> can still get the crash if you mock up some data to aggregate and
> strip out the guts from the plpgsql functions that we're crashing on?

Try this
CREATE OR REPLACE FUNCTION avfinal(x anycompatiblearray) RETURNS 
anycompatiblearray
LANGUAGE plpgsql AS $$
DECLARE
res real[];
BEGIN
res := ARRAY_FILL(x[1], ARRAY[4]);
RETURN res;
END;
$$;

CREATE OR REPLACE FUNCTION acc(x anycompatiblearray, y anycompatiblearray) 
RETURNS anycompatiblearray
LANGUAGE plpgsql AS $$
BEGIN
RETURN ARRAY_FILL(y[1], ARRAY[4]);
END;
$$;

CREATE OR REPLACE AGGREGATE public.av(anycompatiblearray) (
STYPE = anycompatiblearray,
INITCOND = '{{0},{0}}',
SFUNC = acc,
FINALFUNC = avfinal
);

CREATE OR REPLACE FUNCTION aw(real[])
RETURNS void LANGUAGE plpgsql
AS $function$
begin
end
$function$;

SELECT 
aw(av(ARRAY[1.0::real])),
aw(av(ARRAY[1.0::real])),
1;


Re: Direct I/O

2023-04-15 Thread Justin Pryzby
On Sat, Apr 15, 2023 at 02:19:35PM -0400, Tom Lane wrote:
> PS: I don't quite understand how it managed to get through initdb
> when CREATE DATABASE doesn't work.  Maybe there is a different
> code path taken in standalone mode?

ad43a413c4f7f5d024a5b2f51e00d280a22f1874
initdb: When running CREATE DATABASE, use STRATEGY = WAL_COPY.




Re: segfault tied to "IS JSON predicate" commit

2023-04-15 Thread Justin Pryzby
On Thu, Apr 13, 2023 at 09:14:01PM -0700, Peter Geoghegan wrote:
> I find that if I run the following test against a standard debug build
> on HEAD, my local installation reliably segfaults:
> 
> $ meson test --setup running --suite test_rls_hooks-running
> 
> Attached is a "bt full" run from gdb against a core dump. The query
> "EXPLAIN (costs off) SELECT * FROM rls_test_permissive;" runs when the
> backend segfaults.
> 
> The top frame of the back trace is suggestive of a use-after-free:
> 
> #0  copyObjectImpl (from=0x7f7f7f7f7f7f7f7e) at copyfuncs.c:187
> 187 switch (nodeTag(from))
> ...
> 
> "git bisect" suggests that the problem began at commit 6ee30209,
> "SQL/JSON: support the IS JSON predicate".
> 
> It's a bit surprising that the bug reproduces when I run a standard
> test, and yet we appear to have a bug that's about 2 weeks old.  There
> may be something unusual about my system that will turn out to be
> relevant -- though there is nothing particularly exotic about this
> machine. My repro doesn't rely on concurrent execution, or timing, or
> anything like that -- it's quite reliable.

I was able to reproduce this yesterday but not today.

I think what happened is that you (and I) are in the habit of running
"meson test tmp_install" to compile new binaries and install them into
./tmp_install, and then running a server out of there.  But nowadays
there's also "meson test install_test_files".  I'm not sure what
combination of things is out of sync, but I suspect you forgot one of:
0) compile *and* install the new binaries; 1) restart the running
postmaster; or 2) install the new shared library ("test files").

I saw the crash again when I did this:

time ninja
time meson test tmp_install install_test_files regress/regress # does not 
recompile, BTW
./tmp_install/usr/local/pgsql/bin/postgres -D 
./testrun/regress/regress/tmp_check/data -p 5678 -c autovacuum=no&
git checkout HEAD~222
time meson test tmp_install install_test_files
time PGPORT=5678 meson test --setup running test_rls_hooks-running/regress

In this case, I'm not sure there's anything to blame meson for; the
issue is the running server, which surely has different structures than
it did last month.

-- 
Justin




Re: v16dev: invalid memory alloc request size 8488348128

2023-04-14 Thread Justin Pryzby
Maybe you'll find valgrind errors to be helpful.

==17971== Source and destination overlap in memcpy(0x1eb8c078, 0x1d88cb20, 123876054)
==17971==at 0x4C2E81D: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==17971==by 0x9C705A: memcpy (string3.h:51)
==17971==by 0x9C705A: pg_detoast_datum_copy (fmgr.c:1823)
==17971==by 0x8952F8: expand_array (array_expanded.c:131)
==17971==by 0x1E971A28: plpgsql_exec_function (pl_exec.c:556)
==17971==by 0x1E97CF83: plpgsql_call_handler (pl_handler.c:277)
==17971==by 0x6BFA4E: ExecInterpExpr (execExprInterp.c:733)
==17971==by 0x6D9C8C: ExecEvalExprSwitchContext (executor.h:354)
==17971==by 0x6D9C8C: ExecProject (executor.h:388)
==17971==by 0x6D9C8C: project_aggregates (nodeAgg.c:1377)
==17971==by 0x6DB2B4: agg_retrieve_direct (nodeAgg.c:2520)
==17971==by 0x6DB2B4: ExecAgg (nodeAgg.c:2172)
==17971==by 0x6C4821: ExecProcNode (executor.h:272)
==17971==by 0x6C4821: ExecutePlan (execMain.c:1640)
==17971==by 0x6C4821: standard_ExecutorRun (execMain.c:365)
==17971==by 0x870535: PortalRunSelect (pquery.c:924)
==17971==by 0x871CCE: PortalRun (pquery.c:768)
==17971==by 0x86D552: exec_simple_query (postgres.c:1274)

==17971== Invalid read of size 8
==17971==at 0x4C2EA20: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==17971==by 0x9C705A: memcpy (string3.h:51)
==17971==by 0x9C705A: pg_detoast_datum_copy (fmgr.c:1823)
==17971==by 0x8952F8: expand_array (array_expanded.c:131)
==17971==by 0x1E971A28: plpgsql_exec_function (pl_exec.c:556)
==17971==by 0x1E97CF83: plpgsql_call_handler (pl_handler.c:277)
==17971==by 0x6BFA4E: ExecInterpExpr (execExprInterp.c:733)
==17971==by 0x6D9C8C: ExecEvalExprSwitchContext (executor.h:354)
==17971==by 0x6D9C8C: ExecProject (executor.h:388)
==17971==by 0x6D9C8C: project_aggregates (nodeAgg.c:1377)
==17971==by 0x6DB2B4: agg_retrieve_direct (nodeAgg.c:2520)
==17971==by 0x6DB2B4: ExecAgg (nodeAgg.c:2172)
==17971==by 0x6C4821: ExecProcNode (executor.h:272)
==17971==by 0x6C4821: ExecutePlan (execMain.c:1640)
==17971==by 0x6C4821: standard_ExecutorRun (execMain.c:365)
==17971==by 0x870535: PortalRunSelect (pquery.c:924)
==17971==by 0x871CCE: PortalRun (pquery.c:768)
==17971==by 0x86D552: exec_simple_query (postgres.c:1274)
==17971==  Address 0x1eb8c038 is 8 bytes before a block of size 123,876,112 alloc'd
==17971==at 0x4C29F73: malloc (vg_replace_malloc.c:309)
==17971==by 0x9E4204: AllocSetAlloc (aset.c:732)
==17971==by 0x9ED5BD: palloc (mcxt.c:1224)
==17971==by 0x9C704C: pg_detoast_datum_copy (fmgr.c:1821)
==17971==by 0x8952F8: expand_array (array_expanded.c:131)
==17971==by 0x1E971A28: plpgsql_exec_function (pl_exec.c:556)
==17971==by 0x1E97CF83: plpgsql_call_handler (pl_handler.c:277)
==17971==by 0x6BFA4E: ExecInterpExpr (execExprInterp.c:733)
==17971==by 0x6D9C8C: ExecEvalExprSwitchContext (executor.h:354)
==17971==by 0x6D9C8C: ExecProject (executor.h:388)
==17971==by 0x6D9C8C: project_aggregates (nodeAgg.c:1377)
==17971==by 0x6DB2B4: agg_retrieve_direct (nodeAgg.c:2520)
==17971==by 0x6DB2B4: ExecAgg (nodeAgg.c:2172)
==17971==by 0x6C4821: ExecProcNode (executor.h:272)
==17971==by 0x6C4821: ExecutePlan (execMain.c:1640)
==17971==by 0x6C4821: standard_ExecutorRun (execMain.c:365)
==17971==by 0x870535: PortalRunSelect (pquery.c:924)

==17971== Invalid read of size 8
==17971==at 0x4C2EA28: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==17971==by 0x9C705A: memcpy (string3.h:51)
==17971==by 0x9C705A: pg_detoast_datum_copy (fmgr.c:1823)
==17971==by 0x8952F8: expand_array (array_expanded.c:131)
==17971==by 0x1E971A28: plpgsql_exec_function (pl_exec.c:556)
==17971==by 0x1E97CF83: plpgsql_call_handler (pl_handler.c:277)
==17971==by 0x6BFA4E: ExecInterpExpr (execExprInterp.c:733)
==17971==by 0x6D9C8C: ExecEvalExprSwitchContext (executor.h:354)
==17971==by 0x6D9C8C: ExecProject (executor.h:388)
==17971==by 0x6D9C8C: project_aggregates (nodeAgg.c:1377)
==17971==by 0x6DB2B4: agg_retrieve_direct (nodeAgg.c:2520)
==17971==by 0x6DB2B4: ExecAgg (nodeAgg.c:2172)
==17971==by 0x6C4821: ExecProcNode (executor.h:272)
==17971==by 0x6C4821: ExecutePlan (execMain.c:1640)
==17971==by 0x6C4821: standard_ExecutorRun (execMain.c:365)
==17971==by 0x870535: PortalRunSelect (pquery.c:924)
==17971==by 0x871CCE: PortalRun (pquery.c:768)
==17971==by 0x86D552: exec_simple_query (postgres.c:1274)
==17971==  Address 0x1eb8c030 is 16 bytes before a block of size 123,876,112 alloc'd
==17971==at 0x4C29F73: malloc (vg_replace_malloc.c:309)
==17971==by 0x9E4204: AllocSetAlloc (aset.c:732)
==17971==by 0x9ED5BD: palloc (mcxt.c:1224)
==17971==by 0x9C704C: pg_detoast_datum_copy (fmgr.c:1821)
==17971==by 0x8952F8: expand_array (array_expanded.c:131)
==17971==by 0x1E971A28: plpgsql_exec_function 
