RE: [PATCH] fix msvc build libpq error LNK2019 when link openssl;

2023-02-26 Thread gamefunc
Sorry, I should have made it clear: I am only reporting my build error issue;
the patch is only for illustrative purposes, not for merging upstream.
My English is not good, so I can only write simple descriptions.
>> This diff forces the creation of a VS2022Solution():
Yes. My environment: VS2022, OpenSSL 3.0.7.

>> Note that buildfarm member drongo is providing coverage for 64b:
thank you, but I need to build libpq myself because I need to add functions to
libpq for my own use: https://github.com/gamefunc/Aiolibpq_simple

>> Are you sure that you just didn't mix 64b builds with 32b libraries;
No, all 64-bit.

From: Michael Paquier
Date: February 27, 2023 13:27
To: gamefunc
CC: pgsql-hackers
Subject: Re: [PATCH] fix msvc build libpq error LNK2019 when link openssl;

On Mon, Feb 27, 2023 at 09:58:28AM +0800, gamefunc wrote:
> # I:
> (default target) (1) -> (Link target) ->
>   libcrypto.lib(libcrypto-lib-e_capi.obj) : error LNK2019:  
> __imp_CertOpenStore, capi_open_store  
>   libcrypto.lib(libcrypto-lib-e_capi.obj) : error LNK2019:  
> __imp_CertCloseStore, capi_find_key  
>   libcrypto.lib(libcrypto-lib-e_capi.obj) : error LNK2019:  
> __imp_CertEnumCertificatesInStore, capi_find_cert  
>   libcrypto.lib(libcrypto-lib-e_capi.obj) : error LNK2019:  
> __imp_CertFindCertificateInStore, capi_find_cert
>   libcrypto.lib(libcrypto-lib-e_capi.obj) : error LNK2019:  
> __imp_CertDuplicateCertificateContext, capi_load_ssl_client_cert
>   libcrypto.lib(libcrypto-lib-e_capi.obj) : error LNK2019:  
> __imp_CertFreeCertificateContext, capi_dsa_free
>   libcrypto.lib(libcrypto-lib-e_capi.obj) : error LNK2019:  
> __imp_CertGetCertificateContextProperty, capi_cert_get_fname 

@@ -94,7 +94,7 @@ sub mkvcbuild
die 'Must run from root or msvc directory'
  unless (-d 'src/tools/msvc' && -d 'src');

-   my $vsVersion = DetermineVisualStudioVersion();
+   my $vsVersion = "17.00";

This diff forces the creation of a VS2022Solution(), which would be
incorrect when using an MSVC environment different than 17.0 as
version number, no?

Note that buildfarm member drongo is providing coverage for 64b
Windows builds with Visual 2019 and OpenSSL:
https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=drongo&br=HEAD

Are you sure that you just didn't mix 64b builds with 32b libraries,
or vice-versa?
--
Michael




Re: pg_rewind: warn when checkpoint hasn't happened after promotion

2023-02-26 Thread Heikki Linnakangas

On 16/11/2022 07:17, kuroda.keis...@nttcom.co.jp wrote:
>> The issue here is pg_rewind looks into the control file to determine the
>> source timeline, because the control file is not updated until the
>> first checkpoint ends after promotion finishes, even though file
>> blocks have already diverged.
>>
>> Even in that case the history file for the new timeline is already
>> created, so searching for the latest history file works.
>
> I think this change is a good one because if I want
> pg_rewind to run automatically after a promotion,
> I don't have to wait for the checkpoint to complete.
>
> The attached patch is Horiguchi-san's patch with
> additional tests. The tests are based on James's tests;
> "010_no_checkpoint_after_promotion.pl" tests that
> pg_rewind is successfully executed without running a
> checkpoint after promotion.

I fixed this last week in commit 009746, see thread [1]. I'm sorry I
didn't notice this thread earlier.

I didn't realize that we had a notice about this in the docs. I'll go
and remove that. Thanks!


- Heikki





Re: Allow tests to pass in OpenSSL FIPS mode

2023-02-26 Thread Peter Eisentraut

On 27.02.23 08:16, Michael Paquier wrote:
> On Thu, Oct 13, 2022 at 01:16:18PM +0200, Alvaro Herrera wrote:
>> However, there are some changes in brin_multi.out that are quite
>> surprising and suggest that we might have bugs in brin:
>>
>> +WARNING:  unexpected number of results 31 for
>> (macaddr8col,>,macaddr8,b1:d1:0e:7b:af:a4:42:12,33)
>> +WARNING:  unexpected number of results 17 for
>> (macaddr8col,>=,macaddr8,d9:35:91:bd:f7:86:0e:1e,15)
>> +WARNING:  unexpected number of results 11 for
>> (macaddr8col,<=,macaddr8,23:e8:46:63:86:07:ad:cb,13)
>> +WARNING:  unexpected number of results 4 for
>> (macaddr8col,<,macaddr8,13:16:8e:6a:2e:6c:84:b4,6)
>
> This refers to brin_minmax_multi_distance_macaddr8(), no?  This is
> amazing.  I have a hard time imagining how FIPS would interact with
> what we do in mac8.c to explain that, so it may be something entirely
> different.  Is that reproducible?

This is no longer present in the v2 patch.





Re: Allow tests to pass in OpenSSL FIPS mode

2023-02-26 Thread Michael Paquier
On Thu, Oct 13, 2022 at 01:16:18PM +0200, Alvaro Herrera wrote:
> However, there are some changes in brin_multi.out that are quite
> surprising and suggest that we might have bugs in brin:
> 
> +WARNING:  unexpected number of results 31 for 
> (macaddr8col,>,macaddr8,b1:d1:0e:7b:af:a4:42:12,33)
> +WARNING:  unexpected number of results 17 for 
> (macaddr8col,>=,macaddr8,d9:35:91:bd:f7:86:0e:1e,15)
> +WARNING:  unexpected number of results 11 for 
> (macaddr8col,<=,macaddr8,23:e8:46:63:86:07:ad:cb,13)
> +WARNING:  unexpected number of results 4 for 
> (macaddr8col,<,macaddr8,13:16:8e:6a:2e:6c:84:b4,6)

This refers to brin_minmax_multi_distance_macaddr8(), no?  This is
amazing.  I have a hard time imagining how FIPS would interact with
what we do in mac8.c to explain that, so it may be something entirely
different.  Is that reproducible?
--
Michael




Re: Raising the SCRAM iteration count

2023-02-26 Thread Michael Paquier
On Thu, Feb 23, 2023 at 03:10:05PM +0100, Daniel Gustafsson wrote:
> In fixing the CFBot test error in the previous version I realized through
> off-list discussion that the GUC name was badly chosen.  Incorporating the
> value of another GUC in the name is a bad idea, so the attached version
> reverts to "scram_iterations=<value>".  Should there ever be another SCRAM
> method standardized (which seems a slim chance to happen before the v17
> freeze) we can make a backwards compatible change to
> "<method>:<value> | <value>" where the latter is a default for all.
> Internally the variable contains sha_256 though; that part I think is fine
> for readability.

Okay by me if you want to go this way.  We could always have the
compatibility argument later on if it proves necessary.

Anyway, the patch does that in libpq:
@@ -1181,6 +1181,10 @@ pqSaveParameterStatus(PGconn *conn, const char *name, 
const char *value)
conn->in_hot_standby =
(strcmp(value, "on") == 0) ? PG_BOOL_YES : PG_BOOL_NO;
}
+   else if (strcmp(name, "scram_sha_256_iterations") == 0)
+   {
+   conn->scram_sha_256_iterations = atoi(value);
+   }
This should match on "scram_iterations", which is the name of the
GUC.  Would the long-term plan be to use multiple variables in conn if
we ever get to "<method>:<value>", which would require more parsing?
This is fine by me, just asking.
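
For reference, a minimal sketch of the corrected hunk (a hedged
reconstruction; only the GUC name in the strcmp changes):

-	else if (strcmp(name, "scram_sha_256_iterations") == 0)
+	else if (strcmp(name, "scram_iterations") == 0)
 	{
 		conn->scram_sha_256_iterations = atoi(value);
 	}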

Perhaps there should be a test with \password to make sure that libpq
gets the call when the GUC is updated by a SET command?
--
Michael




Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-02-26 Thread Önder Kalacı
Hi Amit, all


> Wouldn't a table-level option like 'apply_index_scan' be better than a
> subscription-level option with a default value as false? Anyway, the
> bigger point is that we don't see a better way to proceed here than to
> introduce some option to control this behavior.

What would be a good API for adding such an option at the table level?
To be more specific, I cannot see any table-level sub/pub options in the
docs.

My main motivation for doing it at the subscription level is that (a) it
might be too much work for users to control the behavior if it is
table-level, and (b) I couldn't find a good API for table-level, and
inventing a new one seemed like a big change.

Overall, I think it makes sense to disable the feature by default. It is
currently enabled by default, which is good for test coverage for now, but
let me disable it when I push the next version.


>
> I see this as a way to provide this feature for users but I would
> prefer to proceed with this if we can get some more buy-in from senior
> community members (at least one more committer) and some user(s) if
> possible. So, I once again request others to chime in and share their
> opinion.
>
>
Agreed, it would be great to hear some other perspectives on this.

Thanks,
Onder


RE: Time delayed LR (WAS Re: logical replication restrictions)

2023-02-26 Thread Hayato Kuroda (Fujitsu)
Dear Amit,

> I was trying to think if there is any better way to implement the
> newly added callback (WalSndDelay()) but couldn't find any. For
> example, one idea I tried to evaluate is whether we can merge it with
> the existing callback WalSndUpdateProgress() or maybe extract the part
> other than progress tracking from that function into a new callback
> and then try to reuse it here as well. Though there is some common
> functionality like checking for timeout and processing replies still
> they are different enough that they seem to need separate callbacks.
> The prime purpose of a callback for the patch being discussed here is
> to delay the xact before sending the commit/prepare whereas the
> existing callback (WalSndUpdateProgress()) or what we are discussing
> at [1] allows sending the keepalive message in some special cases
> where there is no communication between walsender and walreceiver.
> Now, the WalSndDelay() also tries to check for timeout and send
> keepalive if necessary but there is also dependency on the delay
> parameter, so don't think it is a good idea of trying to combine those
> functionalities into one API.
> 
> Thoughts?
> 
> [1] -
> https://www.postgresql.org/message-id/20230210210423.r26ndnfmuifie4f6%40
> awork3.anarazel.de

Thank you for confirming. My understanding is that we should keep the
current design. I agree with your analysis.

In the current callback and the modified version in [1], sending keepalives
is done via ProcessPendingWrites(). It is called by many functions and
should not be changed just for us, e.g. by adding an end_time parameter.
Moreover, the name is not suitable because time-delayed logical replication
does not wait until the send buffer becomes empty.

If we reconstruct WalSndUpdateProgress() and change the mechanisms around
it, the code will become messy. As Amit said, in one path the lag would be
tracked and the walsender would wait until the buffer is empty; in another
path the lag calculation would be ignored and the walsender would wait
until a given period has elapsed. Such a function would be painful to read
later.

I think callbacks that have different purposes should not be mixed.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Re: Time delayed LR (WAS Re: logical replication restrictions)

2023-02-26 Thread Amit Kapila
On Mon, Feb 27, 2023 at 11:11 AM Masahiko Sawada  wrote:
>
> On Thu, Feb 23, 2023 at 9:10 PM Hayato Kuroda (Fujitsu)
>  wrote:
> >
> > Thank you for reviewing! PSA new version.
> >
> >
>
> Thank you for updating the patch. Here are some comments on v7 patch:
>
> + *
> + * LOGICALREP_PROTO_MIN_SEND_DELAY_VERSION_NUM is the minimum protocol 
> version
> + * with support for delaying to send transactions. Introduced in PG16.
>   */
>  #define LOGICALREP_PROTO_MIN_VERSION_NUM 1
>  #define LOGICALREP_PROTO_VERSION_NUM 1
>  #define LOGICALREP_PROTO_STREAM_VERSION_NUM 2
>  #define LOGICALREP_PROTO_TWOPHASE_VERSION_NUM 3
>  #define LOGICALREP_PROTO_STREAM_PARALLEL_VERSION_NUM 4
> -#define LOGICALREP_PROTO_MAX_VERSION_NUM LOGICALREP_PROTO_STREAM_PARALLEL_VERSION_NUM
> +#define LOGICALREP_PROTO_MIN_SEND_DELAY_VERSION_NUM 4
> +#define LOGICALREP_PROTO_MAX_VERSION_NUM LOGICALREP_PROTO_MIN_SEND_DELAY_VERSION_NUM
>
> What is the usecase of the old macro,
> LOGICALREP_PROTO_STREAM_PARALLEL_VERSION_NUM, after adding
> LOGICALREP_PROTO_MIN_SEND_DELAY_VERSION_NUM ? I think if we go this
> way, we will end up adding macros every time when adding a new option,
> which seems not a good idea. I'm really not sure we need to change the
> protocol version or the macro. Commit
> 366283961ac0ed6d8901c6090f3fd02fce0a introduced the 'origin'
> subscription parameter that is also sent to the publisher, but we
> didn't touch the protocol version at all.
>

Right, I also don't see a reason to do anything for this. We have
previously bumped the protocol version when we send extra/additional
information from walsender but here that is not the requirement, so
this change doesn't seem to be required.

> ---
> Why do we not delay sending COMMIT PREPARED messages?
>

I think we need to either add delay for prepare or commit prepared as
otherwise, it will lead to delaying the xact more than required. The
patch seems to add a delay before sending a PREPARE as that is the
time when the subscriber will apply the changes.
-- 
With Regards,
Amit Kapila.




Re: Time delayed LR (WAS Re: logical replication restrictions)

2023-02-26 Thread Amit Kapila
On Thu, Feb 23, 2023 at 5:40 PM Hayato Kuroda (Fujitsu)
 wrote:
>
> Thank you for reviewing! PSA new version.
>

I was trying to think if there is any better way to implement the
newly added callback (WalSndDelay()) but couldn't find any. For
example, one idea I tried to evaluate is whether we can merge it with
the existing callback WalSndUpdateProgress() or maybe extract the part
other than progress tracking from that function into a new callback
and then try to reuse it here as well. Though there is some common
functionality like checking for timeout and processing replies still
they are different enough that they seem to need separate callbacks.
The prime purpose of a callback for the patch being discussed here is
to delay the xact before sending the commit/prepare whereas the
existing callback (WalSndUpdateProgress()) or what we are discussing
at [1] allows sending the keepalive message in some special cases
where there is no communication between walsender and walreceiver.
Now, the WalSndDelay() also tries to check for timeout and send
keepalive if necessary but there is also dependency on the delay
parameter, so don't think it is a good idea of trying to combine those
functionalities into one API.
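
To make this concrete, here is a rough sketch of the delay loop being
discussed (a hedged reconstruction, not the actual patch; it reuses
existing walsender helpers like ProcessRepliesIfAny() and
WalSndKeepaliveIfNecessary(), and assumes the caller computed end_time):

	/*
	 * Sketch: wait until end_time, but keep processing standby replies
	 * and sending keepalives so the connection stays alive.
	 */
	static void
	WalSndDelay(TimestampTz end_time)
	{
		while (!got_STOPPING)
		{
			long	remaining = TimestampDifferenceMilliseconds(GetCurrentTimestamp(),
																end_time);

			if (remaining <= 0)
				break;			/* delay has fully elapsed */

			/* consume replies and send a keepalive if one is due */
			ProcessRepliesIfAny();
			WalSndKeepaliveIfNecessary();

			(void) WaitLatch(MyLatch,
							 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
							 remaining,
							 WAIT_EVENT_WAL_SENDER_WAIT_WAL);
			ResetLatch(MyLatch);
			CHECK_FOR_INTERRUPTS();
		}
	}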

Thoughts?

[1] - 
https://www.postgresql.org/message-id/20230210210423.r26ndnfmuifie4f6%40awork3.anarazel.de

-- 
With Regards,
Amit Kapila.




Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-02-26 Thread torikoshia

On 2023-02-06 15:00, Tom Lane wrote:
> Andres Freund writes:
>> On February 5, 2023 9:12:17 PM PST, Tom Lane wrote:
>>> Damir Belyalov writes:
>>>> InputFunctionCallSafe() is good for detecting errors from
>>>> input-functions, but there are such errors from NextCopyFrom() that
>>>> cannot be detected with InputFunctionCallSafe(), e.g. "wrong number
>>>> of columns in row".
>>>
>>> If you want to deal with those, then there's more work to be done to
>>> make those bits non-error-throwing.  But there's a very finite amount
>>> of code involved and no obvious reason why it couldn't be done.
>>
>> I'm not even sure it makes sense to avoid that kind of error. An
>> invalid column count or such is something quite different than failing
>> some data type input routine, or failing a constraint.
>
> I think it could be reasonable to put COPY's overall-line-format
> requirements on the same level as datatype input format violations.
> I agree that trying to trap every kind of error is a bad idea,
> for largely the same reason that the soft-input-errors patches
> only trap certain kinds of errors: it's too hard to tell whether
> an error is an "internal" error that it's scary to continue past.

Is it a bad idea to limit the scope of allowing errors to 'soft' errors
in InputFunctionCallSafe()?

I think it could still be useful for some use cases.

  diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql


  +-- tests for IGNORE_DATATYPE_ERRORS option
  +CREATE TABLE check_ign_err (n int, m int[], k int);
  +COPY check_ign_err FROM STDIN WITH IGNORE_DATATYPE_ERRORS;
  +1  {1} 1
  +a  {2} 2
  +3  {3} 33
  +4  {a, 4}  4
  +
  +5  {5} 5
  +\.
  +SELECT * FROM check_ign_err;

  diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out

  index 090ef6c7a8..08e8056fc1 100644

  +-- tests for IGNORE_DATATYPE_ERRORS option
  +CREATE TABLE check_ign_err (n int, m int[], k int);
  +COPY check_ign_err FROM STDIN WITH IGNORE_DATATYPE_ERRORS;
  +WARNING:  invalid input syntax for type integer: "a"
  +WARNING:  value "33" is out of range for type integer
  +WARNING:  invalid input syntax for type integer: "a"
  +WARNING:  invalid input syntax for type integer: ""
  +SELECT * FROM check_ign_err;
  + n |  m  | k
  +---+-+---
  + 1 | {1} | 1
  + 5 | {5} | 5
  +(2 rows)

--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

From 16877d4cdd64db5f85bed9cd559e618d8211e598 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Mon, 27 Feb 2023 12:02:16 +0900
Subject: [PATCH v1] Add COPY option IGNORE_DATATYPE_ERRORS

---
 src/backend/commands/copy.c  |  8 
 src/backend/commands/copyfrom.c  | 11 +++
 src/backend/commands/copyfromparse.c | 12 ++--
 src/backend/parser/gram.y|  8 +++-
 src/bin/psql/tab-complete.c  |  3 ++-
 src/include/commands/copy.h  |  1 +
 src/include/commands/copyfrom_internal.h |  2 ++
 src/include/parser/kwlist.h  |  1 +
 src/test/regress/expected/copy2.out  | 14 ++
 src/test/regress/sql/copy2.sql   | 12 
 10 files changed, 68 insertions(+), 4 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index e34f583ea7..2f1cfb3f4d 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -410,6 +410,7 @@ ProcessCopyOptions(ParseState *pstate,
 	bool		format_specified = false;
 	bool		freeze_specified = false;
 	bool		header_specified = false;
+	bool		ignore_datatype_errors_specified = false;
 	ListCell   *option;
 
 	/* Support external use for option sanity checking */
@@ -449,6 +450,13 @@ ProcessCopyOptions(ParseState *pstate,
 			freeze_specified = true;
 			opts_out->freeze = defGetBoolean(defel);
 		}
+		else if (strcmp(defel->defname, "ignore_datatype_errors") == 0)
+		{
+			if (ignore_datatype_errors_specified)
+				errorConflictingDefElem(defel, pstate);
+			ignore_datatype_errors_specified = true;
+			opts_out->ignore_datatype_errors = defGetBoolean(defel);
+		}
 		else if (strcmp(defel->defname, "delimiter") == 0)
 		{
 			if (opts_out->delim)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index af52faca6d..24eec6a27d 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -959,6 +959,7 @@ CopyFrom(CopyFromState cstate)
 	{
 		TupleTableSlot *myslot;
 		bool		skip_tuple;
+		ErrorSaveContext escontext = {T_ErrorSaveContext};
 
 		CHECK_FOR_INTERRUPTS();
 
@@ -991,10 +992,20 @@ CopyFrom(CopyFromState cstate)
 
 		ExecClearTuple(myslot);
 
+		if (cstate->opts.ignore_datatype_errors)
+		{
+			escontext.details_wanted = true;
+			cstate->escontext = escontext;
+		}
+
 		/* Directly store the values/nulls array in the slot */
 		if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
 			break;
 
+		/* Soft error occurred, skip this tuple */
+		if (cstate->escontext.error_occurred)
+			continue;
+
 		ExecStoreVirtualTuple(myslot)

Re: Time delayed LR (WAS Re: logical replication restrictions)

2023-02-26 Thread Masahiko Sawada
On Thu, Feb 23, 2023 at 9:10 PM Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear Shi,
>
> Thank you for reviewing! PSA new version.
>
> > + elog(DEBUG2, "time-delayed replication for txid %u, delay_time = %d ms, remaining wait time: %ld ms",
> > +  ctx->write_xid, (int) ctx->min_send_delay,
> > +  remaining_wait_time_ms);
> >
> > I tried and saw that the xid here looks wrong, what it got is the xid of the
> > previous transaction. It seems `ctx->write_xid` has not been updated and we
> > can't use it.
> >
>
> Good catch. There are several approaches to fix that; I chose the simplest
> way. TransactionId was added as an argument of the functions.
>

Thank you for updating the patch. Here are some comments on v7 patch:

+ *
+ * LOGICALREP_PROTO_MIN_SEND_DELAY_VERSION_NUM is the minimum protocol version
+ * with support for delaying to send transactions. Introduced in PG16.
  */
 #define LOGICALREP_PROTO_MIN_VERSION_NUM 1
 #define LOGICALREP_PROTO_VERSION_NUM 1
 #define LOGICALREP_PROTO_STREAM_VERSION_NUM 2
 #define LOGICALREP_PROTO_TWOPHASE_VERSION_NUM 3
 #define LOGICALREP_PROTO_STREAM_PARALLEL_VERSION_NUM 4
-#define LOGICALREP_PROTO_MAX_VERSION_NUM LOGICALREP_PROTO_STREAM_PARALLEL_VERSION_NUM
+#define LOGICALREP_PROTO_MIN_SEND_DELAY_VERSION_NUM 4
+#define LOGICALREP_PROTO_MAX_VERSION_NUM LOGICALREP_PROTO_MIN_SEND_DELAY_VERSION_NUM

What is the usecase of the old macro,
LOGICALREP_PROTO_STREAM_PARALLEL_VERSION_NUM, after adding
LOGICALREP_PROTO_MIN_SEND_DELAY_VERSION_NUM ? I think if we go this
way, we will end up adding macros every time when adding a new option,
which seems not a good idea. I'm really not sure we need to change the
protocol version or the macro. Commit
366283961ac0ed6d8901c6090f3fd02fce0a introduced the 'origin'
subscription parameter that is also sent to the publisher, but we
didn't touch the protocol version at all.

---
Why do we not delay sending COMMIT PREPARED messages?

---
+/*
+ * If we've requested to shut down, exit the process.
+ *
+ * Note that WalSndDone() cannot be used here because
the delaying
+ * changes will be sent in the function.
+ */
+if (got_STOPPING)
+WalSndShutdown();

Since the walsender exits without sending the done message at a server
shutdown, we get the following log message on the subscriber:

ERROR:  could not receive data from WAL stream: server closed the
connection unexpectedly

I think that since the walsender is just waiting for sending data, it
can send the done message if the socket is writable.

---
+delayUntil = TimestampTzPlusMilliseconds(delay_start, ctx->min_send_delay);
+remaining_wait_time_ms = TimestampDifferenceMilliseconds(GetCurrentTimestamp(), delayUntil);
+
(snip)
+
+/* Sleep until appropriate time. */
+timeout_sleeptime_ms = WalSndComputeSleeptime(GetCurrentTimestamp());

I think it's better to call GetCurrentTimestamp() only once.
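
A minimal sketch of that suggestion (variable names taken from the quoted
hunk; the surrounding code is elided):

    TimestampTz now = GetCurrentTimestamp();

    delayUntil = TimestampTzPlusMilliseconds(delay_start, ctx->min_send_delay);
    remaining_wait_time_ms = TimestampDifferenceMilliseconds(now, delayUntil);
    ...
    /* Sleep until appropriate time, reusing the same timestamp. */
    timeout_sleeptime_ms = WalSndComputeSleeptime(now);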

---
+# This test is successful only if at least the configured delay has elapsed.
+ok( time() - $publisher_insert_time >= $delay,
+"subscriber applies WAL only after replication delay for
non-streaming transaction"
+);

The subscriber doesn't actually apply WAL records, but logically
replicated changes. How about "subscriber applies changes only
after..."?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Change xl_hash_vacuum_one_page.ntuples from int to uint16

2023-02-26 Thread Amit Kapila
On Fri, Feb 17, 2023 at 7:44 PM Drouvot, Bertrand
 wrote:
>
> On 2/16/23 1:26 PM, Drouvot, Bertrand wrote:
> > Hi,
> >
> > On 2/16/23 12:00 PM, Amit Kapila wrote:
> >
> >> BTW, feel free to create the second patch
> >> (to align the types for variables/arguments) as that would be really
> >> helpful after we commit this one.
>

Pushed the first patch.

> Please find attached a patch proposal to do so.
>
> It looks like a Pandora's box as it produces
> those cascading changes:
>
>  _hash_vacuum_one_page
>  index_compute_xid_horizon_for_tuples
>  gistprunepage
>  PageIndexMultiDelete
>  gistXLogDelete
>  PageIndexMultiDelete
>  spgRedoVacuumRedirect
>  vacuumRedirectAndPlaceholder
>  spgPageIndexMultiDelete
>  moveLeafs
>  doPickSplit
>  _bt_delitems_vacuum
>  btvacuumpage
>  _bt_delitems_delete
>  _bt_delitems_delete_check
>  hash_xlog_move_page_contents
>  gistvacuumpage
>  gistXLogUpdate
>  gistplacetopage
>  hashbucketcleanup
>
>
> Which makes me:
>
> - wonder if it is not too intrusive (we could reduce the scope and keep the
> PageIndexMultiDelete()'s nitems argument as an int, for example).
>
> - worry if there is no side effects (like the one I'm mentioning as a comment
> in PageIndexMultiDelete()) even if it passes the CI tests.
>
> - wonder if we should not change MaxIndexTuplesPerPage from int to uint16
> too (given the fact that the maximum block size is 32 KB).
>
> I'm sharing this version but I still need to think about it and
> I'm curious about your thoughts too.
>

@@ -591,11 +591,11 @@ hash_xlog_move_page_contents(XLogReaderState *record)

  if (len > 0)
  {
- OffsetNumber *unused;
- OffsetNumber *unend;
+ uint16 *unused;
+ uint16 *unend;

- unused = (OffsetNumber *) ptr;
- unend = (OffsetNumber *) ((char *) ptr + len);
+ unused = (uint16 *) ptr;
+ unend = (uint16 *) ((char *) ptr + len);

It doesn't seem useful to me to make such changes. About other changes
in the second patch, it is not clear whether there is much value
addition by those even though I don't see anything wrong with them.
So, let's see if Nathan or others see the value in the proposed patch
or any subset of these changes.

-- 
With Regards,
Amit Kapila.




Re: [PATCH] fix msvc build libpq error LNK2019 when link openssl;

2023-02-26 Thread Michael Paquier
On Mon, Feb 27, 2023 at 09:58:28AM +0800, gamefunc wrote:
> # I:
> (default target) (1) -> (Link target) ->
>   libcrypto.lib(libcrypto-lib-e_capi.obj) : error LNK2019:  
> __imp_CertOpenStore, capi_open_store  
>   libcrypto.lib(libcrypto-lib-e_capi.obj) : error LNK2019:  
> __imp_CertCloseStore, capi_find_key  
>   libcrypto.lib(libcrypto-lib-e_capi.obj) : error LNK2019:  
> __imp_CertEnumCertificatesInStore, capi_find_cert  
>   libcrypto.lib(libcrypto-lib-e_capi.obj) : error LNK2019:  
> __imp_CertFindCertificateInStore, capi_find_cert
>   libcrypto.lib(libcrypto-lib-e_capi.obj) : error LNK2019:  
> __imp_CertDuplicateCertificateContext, capi_load_ssl_client_cert
>   libcrypto.lib(libcrypto-lib-e_capi.obj) : error LNK2019:  
> __imp_CertFreeCertificateContext, capi_dsa_free
>   libcrypto.lib(libcrypto-lib-e_capi.obj) : error LNK2019:  
> __imp_CertGetCertificateContextProperty, capi_cert_get_fname 

@@ -94,7 +94,7 @@ sub mkvcbuild
die 'Must run from root or msvc directory'
  unless (-d 'src/tools/msvc' && -d 'src');

-   my $vsVersion = DetermineVisualStudioVersion();
+   my $vsVersion = "17.00";

This diff forces the creation of a VS2022Solution(), which would be
incorrect when using an MSVC environment different than 17.0 as
version number, no?

Note that buildfarm member drongo is providing coverage for 64b
Windows builds with Visual 2019 and OpenSSL:
https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=drongo&br=HEAD

Are you sure that you just didn't mix 64b builds with 32b libraries,
or vice-versa?
--
Michael




Re: psql \watch 2nd argument: iteration count

2023-02-26 Thread Andrey Borodin
Thanks for looking into this!

On Mon, Feb 20, 2023 at 6:15 PM Kyotaro Horiguchi
 wrote:
>
>  FWIW the patch still accepts an incorrect parameter '1abc'
> by ignoring any trailing garbage.
Indeed, fixed.
>
> In any case, I reckon the error message should be more specific. In
> other words, it would be better if it suggests the expected input
> format and range.
+1.
Not a range, actually, because an upper limit makes no sense for a user.

>
> Regarding the second patch, if we want \watch to throw an error
> message for the garbage trailing to sleep times, I think we should do
> the same for iteration counts.
+1, done.

> Additionally, we need to update the
> documentation.
Done.

Thanks for the review!

Best regards, Andrey Borodin.


v3-0002-Add-iteration-count-argument-to-psql-watch-comman.patch
Description: Binary data


v3-0001-Fix-incorrect-argument-handling-in-psql-watch.patch
Description: Binary data


Re: Add LZ4 compression in pg_dump

2023-02-26 Thread Justin Pryzby
On Sat, Feb 25, 2023 at 08:05:53AM -0600, Justin Pryzby wrote:
> On Fri, Feb 24, 2023 at 11:02:14PM -0600, Justin Pryzby wrote:
> > I have some fixes (attached) and questions while polishing the patch for
> > zstd compression.  The fixes are small and could be integrated with the
> > patch for zstd, but could be applied independently.
> 
> One more - WriteDataToArchiveGzip() says:

One more again.

The LZ4 path is using non-streaming mode, which compresses each block
without persistent state, giving poor compression for -Fc compared with
-Fp.  If the data is highly compressible, the difference can be orders
of magnitude.

$ ./src/bin/pg_dump/pg_dump -h /tmp postgres -Z lz4 -Fp |wc -c
12351763
$ ./src/bin/pg_dump/pg_dump -h /tmp postgres -Z lz4 -Fc |wc -c
21890708

That's not true for gzip:

$ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z gzip -Fc |wc -c
2118869
$ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z gzip -Fp |wc -c
2115832

The function ought to at least use streaming mode, so each block/row
isn't compressed in isolation.  003 is a simple patch to use
streaming mode, which improves the -Fc case:

$ ./src/bin/pg_dump/pg_dump -h /tmp postgres -Z lz4 -Fc |wc -c
15178283

However, that still flushes the compression buffer, writing a block
header, for every row.  With a single-column table, pg_dump -Fc -Z lz4
still outputs ~10% *more* data than with no compression at all.  And
that's for compressible data.

$ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Fc -Z lz4 |wc -c
12890296
$ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Fc -Z none |wc -c
11890296

I think this should use the LZ4F API with frames, which are buffered to
avoid outputting a header for every single row.  The LZ4F format isn't
compatible with the LZ4 format, so (unlike changing to the streaming
API) that's not something we can change in a bugfix release.  I consider
this an Opened Item.

With the LZ4F API in 004, -Fp and -Fc are essentially the same size
(like gzip).  (Oh, and the output is three times smaller, too.)

$ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z lz4 -Fp |wc -c
4155448
$ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z lz4 -Fc |wc -c
4156548
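
For reference, a minimal sketch of frame-based compression with the LZ4F
API (a hedged illustration, not pg_dump's code; error handling is elided,
and dst_cap is assumed to be at least LZ4F_compressBound(src_len, NULL)):

	#include <lz4frame.h>

	static size_t
	compress_chunk(const char *src, size_t src_len, char *dst, size_t dst_cap)
	{
		LZ4F_compressionContext_t ctx;
		size_t		pos = 0;

		LZ4F_createCompressionContext(&ctx, LZ4F_VERSION);

		/* frame header first ... */
		pos += LZ4F_compressBegin(ctx, dst + pos, dst_cap - pos, NULL);
		/* ... then data; LZ4F buffers small updates into one block */
		pos += LZ4F_compressUpdate(ctx, dst + pos, dst_cap - pos,
								   src, src_len, NULL);
		/* flush buffered input and write the frame footer */
		pos += LZ4F_compressEnd(ctx, dst + pos, dst_cap - pos, NULL);

		LZ4F_freeCompressionContext(ctx);
		return pos;
	}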

-- 
Justin
From 3a980f956bf314fb161fbf0a76f62ed0c2c35bfe Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 24 Feb 2023 14:07:09 -0600
Subject: [PATCH 1/4] f!fixes for LZ4

---
 src/bin/pg_dump/compress_gzip.c |  8 
 src/bin/pg_dump/compress_io.h   |  2 +-
 src/bin/pg_dump/compress_lz4.c  | 12 
 src/bin/pg_dump/pg_dump.c   |  2 --
 4 files changed, 5 insertions(+), 19 deletions(-)

diff --git a/src/bin/pg_dump/compress_gzip.c b/src/bin/pg_dump/compress_gzip.c
index 0af65afeb4e..52f41c2e58c 100644
--- a/src/bin/pg_dump/compress_gzip.c
+++ b/src/bin/pg_dump/compress_gzip.c
@@ -123,17 +123,9 @@ WriteDataToArchiveGzip(ArchiveHandle *AH, CompressorState *cs,
 		gzipcs->outbuf = pg_malloc(ZLIB_OUT_SIZE + 1);
 		gzipcs->outsize = ZLIB_OUT_SIZE;
 
-		/*
-		 * A level of zero simply copies the input one block at the time. This
-		 * is probably not what the user wanted when calling this interface.
-		 */
-		if (cs->compression_spec.level == 0)
-			pg_fatal("requested to compress the archive yet no level was specified");
-
 		if (deflateInit(zp, cs->compression_spec.level) != Z_OK)
 			pg_fatal("could not initialize compression library: %s", zp->msg);
 
-		/* Just be paranoid - maybe End is called after Start, with no Write */
 		zp->next_out = gzipcs->outbuf;
 		zp->avail_out = gzipcs->outsize;
 	}
diff --git a/src/bin/pg_dump/compress_io.h b/src/bin/pg_dump/compress_io.h
index bbde2693915..cdb15951ea9 100644
--- a/src/bin/pg_dump/compress_io.h
+++ b/src/bin/pg_dump/compress_io.h
@@ -172,7 +172,7 @@ struct CompressFileHandle
 extern CompressFileHandle *InitCompressFileHandle(const pg_compress_specification compression_spec);
 
 /*
- * Initialize a compress file stream. Deffer the compression algorithm
+ * Initialize a compress file stream. Infer the compression algorithm
  * from 'path', either by examining its suffix or by appending the supported
  * suffixes in 'path'.
  */
diff --git a/src/bin/pg_dump/compress_lz4.c b/src/bin/pg_dump/compress_lz4.c
index fe1014e6e77..7dacfeae469 100644
--- a/src/bin/pg_dump/compress_lz4.c
+++ b/src/bin/pg_dump/compress_lz4.c
@@ -105,12 +105,8 @@ EndCompressorLZ4(ArchiveHandle *AH, CompressorState *cs)
 	LZ4CompressorState *LZ4cs;
 
 	LZ4cs = (LZ4CompressorState *) cs->private_data;
-	if (LZ4cs)
-	{
-		pg_free(LZ4cs->outbuf);
-		pg_free(LZ4cs);
-		cs->private_data = NULL;
-	}
+	pg_free(LZ4cs->outbuf);
+	pg_free(LZ4cs);
 }
 
 
@@ -161,8 +157,8 @@ typedef struct LZ4File
 } LZ4File;
 
 /*
- * LZ4 equivalent to feof() or gzeof(). The end of file is reached if there
- * is no decompressed output in the overflow buffer and the end of the file
+ * LZ4 equivalent to feof() or gzeof().  Return true iff there is no
+ * decompressed output in the overflow buffer and the end of the backing file
  * is

Re: Proposal: :SQL_EXEC_TIME (like :ROW_COUNT) Variable (psql)

2023-02-26 Thread Pavel Stehule
On Mon, Feb 27, 2023 at 5:08 AM Kirk Wolak wrote:

> On Fri, Feb 24, 2023 at 10:56 PM Kirk Wolak  wrote:
>
>> On Fri, Feb 24, 2023 at 2:11 AM Gurjeet Singh  wrote:
>>
>>> On Thu, Feb 23, 2023 at 8:42 PM Kirk Wolak  wrote:
>>>
>> ...
>
>> >   I think like ROW_COUNT, it should not change because of internal
>>> commands.
>>> ...
>>
>> By using \timing, the user is explicitly opting into any overhead
>>> caused by time-keeping. With this feature, the timing info will be
>>> collected all the time. So do consider evaluating the performance
>>> impact this can cause on people's workloads. They may not care for the
>>> impact in interactive mode, but in automated scripts, even a moderate
>>> performance overhead would be a deal-breaker.
>>>
>> Excellent point.  I run lots of long scripts, but I usually set \timing
>> on, just because I turn off everything else.
>> I tested 2,000+ lines of select 1; (Fast sql shouldn't matter, it's the
>> most impacted)
>> Honestly, it was imperceptible,  Maybe approximating 0.01 seconds
>> With timing on:  ~ seconds 0.28
>> With timing of:   ~ seconds 0.27
>>
>> The \timing incurs no realistic penalty at this point.  The ONLY penalty
>> we could face is the time to
>> write it to the variable, and that cannot be tested until implemented.
>> But I will do that.  And I will
>> report the results of the impact.  But I do not expect a big impact.  We
>> update SQL_COUNT without an issue.
>> And that might be much more expensive to get.
>>
>
> Okay, I've written and tested this using SQL_EXEC_ELAPSED (suggested name
> improvement).
> First, the instant you have ANY output, it swamps the impact. (I settled
> on: SELECT 1 as v \gset xxx) for no output
> Second, the variability of running even a constant script is mind-blowing.
> Third, I've limited the output...  I built this in layers (init.sql
> initializes the psql variables I use), run_100.sql runs
> another file (\i tst_2000.sql) 100 times.  Resulting in 200k selects.
>

This is the very worst case.

But nobody will run 200K selects from psql - can you try a slightly more
realistic, but still synthetic, test case?

create table foo(a int);
begin;
  insert into foo values(1);
  ...
  insert into foo values(20);
commit;

Regards

Pavel


>
> Executive Summary:  1,000,000 statements executed, consumes ~2 - 2.5
> seconds of extra time (Total)
>
> So, the per-statement cost is: 2.5s / 1,000,000 = 0.0000025 s per
> statement
> Roughly: 2.5us
>
> Unfortunately, my test lines look like this:
> Without Timing
> done 0.198215 (500) total 98.862548 min 0.167614 avg 0.197725096000 max 0.290659
>
> With Timing
> done 0.191583 (500) total 100.729868 min 0.163280 avg 0.201459736000 max 0.275787
>
> Notice that the With Timing had a lower min, and a lower max.  But a
> higher average.
> The distance between min - avg  AND min - max, is big (those are for 1,000
> selects each)
>
> Are these numbers at the "So What" Level?
>
> While testing, I got the distinct impression that I am measuring something
> that changes, or that the variance in the system itself really swamps this
> on a per-statement basis.  Its only impact is felt on millions of psql
> queries, and that's a couple of seconds...
>
> Curious what others think before I take this any further.
>
> regards, Kirk
>
>>
>> Thanks!
>>
>>>
>>> [1]:
>>> https://github.com/gurjeet/home/blob/08f1051fb854f4fc8fbc4f1326f393ed507a55ce/.bashrc#L278
>>> [2]:
>>> https://github.com/gurjeet/home/blob/08f1051fb854f4fc8fbc4f1326f393ed507a55ce/.bashrc#L262
>>>
>>> Best regards,
>>> Gurjeet
>>> http://Gurje.et
>>>
>>


Re: Proposal: :SQL_EXEC_TIME (like :ROW_COUNT) Variable (psql)

2023-02-26 Thread Kirk Wolak
On Fri, Feb 24, 2023 at 10:56 PM Kirk Wolak  wrote:

> On Fri, Feb 24, 2023 at 2:11 AM Gurjeet Singh  wrote:
>
>> On Thu, Feb 23, 2023 at 8:42 PM Kirk Wolak  wrote:
>>
> ...

> >   I think like ROW_COUNT, it should not change because of internal
>> commands.
>> ...
>
> By using \timing, the user is explicitly opting into any overhead
>> caused by time-keeping. With this feature, the timing info will be
>> collected all the time. So do consider evaluating the performance
>> impact this can cause on people's workloads. They may not care for the
>> impact in interactive mode, but in automated scripts, even a moderate
>> performance overhead would be a deal-breaker.
>>
> Excellent point.  I run lots of long scripts, but I usually set \timing
> on, just because I turn off everything else.
> I tested 2,000+ lines of select 1; (Fast sql shouldn't matter, it's the
> most impacted)
> Honestly, it was imperceptible,  Maybe approximating 0.01 seconds
> With timing on:  ~ seconds 0.28
> With timing of:   ~ seconds 0.27
>
> The \timing incurs no realistic penalty at this point.  The ONLY penalty
> we could face is the time to
> write it to the variable, and that cannot be tested until implemented.
> But I will do that.  And I will
> report the results of the impact.  But I do not expect a big impact.  We
> update SQL_COUNT without an issue.
> And that might be much more expensive to get.
>

Okay, I've written and tested this using SQL_EXEC_ELAPSED (suggested name
improvement).
First, the instant you have ANY output, it swamps the impact. (I settled
on: SELECT 1 as v \gset xxx) for no output
Second, the variability of running even a constant script is mind-blowing.
Third, I've limited the output...  I built this in layers (init.sql
initializes the psql variables I use), run_100.sql runs
another file (\i tst_2000.sql) 100 times.  Resulting in 200k selects.

Executive Summary:  1,000,000 statements executed, consumes ~2 - 2.5
seconds of extra time (Total)

So, the per-statement cost is: 2.5s / 1,000,000 = 0.0000025 s per statement
Roughly: 2.5us

Unfortunately, my test lines look like this:
Without Timing
done 0.198215 (500) total 98.862548 min 0.167614 avg 0.197725096000 max 0.290659

With Timing
done 0.191583 (500) total 100.729868 min 0.163280 avg 0.201459736000 max 0.275787

Notice that the With Timing had a lower min, and a lower max.  But a higher
average.
The distance between min - avg  AND min - max, is big (those are for 1,000
selects each)

Are these numbers at the "So What" Level?

While testing, I got the distinct impression that I am measuring something
that changes, or that the variance in the system itself really swamps this
on a per-statement basis.  Its only impact is felt on millions of psql
queries, and that's a couple of seconds...

Curious what others think before I take this any further.
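
For what it's worth, a minimal sketch of how the measurement could be
recorded (reusing psql's existing \timing machinery; the variable name
SQL_EXEC_ELAPSED is the proposal here, and the placement inside psql's
query path is an assumption):

	#include "portability/instr_time.h"

	instr_time	before, after;
	char		elapsed[32];

	INSTR_TIME_SET_CURRENT(before);
	/* ... execute the query ... */
	INSTR_TIME_SET_CURRENT(after);
	INSTR_TIME_SUBTRACT(after, before);

	/* store elapsed milliseconds into the proposed psql variable */
	snprintf(elapsed, sizeof(elapsed), "%.3f", INSTR_TIME_GET_MILLISEC(after));
	SetVariable(pset.vars, "SQL_EXEC_ELAPSED", elapsed);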

regards, Kirk

>
> Thanks!
>
>>
>> [1]:
>> https://github.com/gurjeet/home/blob/08f1051fb854f4fc8fbc4f1326f393ed507a55ce/.bashrc#L278
>> [2]:
>> https://github.com/gurjeet/home/blob/08f1051fb854f4fc8fbc4f1326f393ed507a55ce/.bashrc#L262
>>
>> Best regards,
>> Gurjeet
>> http://Gurje.et
>>
>


Re: allow meson to find ICU in non-standard localtion

2023-02-26 Thread Jeff Davis
On Sun, 2023-02-26 at 09:57 -0800, Andres Freund wrote:
> If you tell meson where to find the pkg-config file in those
> directories it'd
> also work. -Dpkg_config_path=...

Setup is able to find it, which is good, but it seems like it's not
adding it to RPATH so it's not working.

I think we need some doc updates to clarify which features are affected
by -Dextra_lib_dirs/-Dpkg_config_path.

Regards,
Jeff Davis





Re: What's the prefix?

2023-02-26 Thread Tom Lane
"jack...@gmail.com"  writes:
>> text is variable length so there is header information built into the 
>> datatype representation that indicates how long the content is.

David's statement is accurate.

> No, this is the varlena struct:
> struct varlena
> {
> char vl_len_[4]; /* Do not touch this field directly! */
> char vl_dat[FLEXIBLE_ARRAY_MEMBER]; /* Data content is here */

This struct only accurately describes "untoasted" varlenas.
The one you are looking at is a "short header" varlena;
see varattrib_1b and nearby comments in src/include/varatt.h,
or in postgres.h if you're not looking at current HEAD branch.
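
As a worked example of the 1-byte header (a sketch based on the
varattrib_1b definitions; the exact macro spellings live in varatt.h, and
a little-endian machine is assumed):

	#include <stdio.h>

	int
	main(void)
	{
		unsigned char t_data[] = {0x05, 0x61};	/* what heap_page_items showed */

		/* low bit set => 1-byte ("short") varlena header */
		if (t_data[0] & 0x01)
		{
			/* the other 7 bits hold the total length, header included */
			int		total = (t_data[0] >> 1) & 0x7F;	/* 0x05 >> 1 == 2 */

			printf("total %d bytes: 1 header + %d data ('%c')\n",
				   total, total - 1, t_data[1]);
		}
		return 0;
	}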

regards, tom lane




Re: Re: What's the prefix?

2023-02-26 Thread jack...@gmail.com

From: David G. Johnston
Date: 2023-02-27 00:27
To: jack...@gmail.com
CC: pgsql-hackers
Subject: Re: What's the prefix?
On Sun, Feb 26, 2023 at 9:16 AM jack...@gmail.com  wrote:
use these sqls:
create table t(a text);
insert into t values('a');
select lp,lp_len,t_data from heap_page_items(get_raw_page('t',0));
 lp | lp_len | t_data
----+--------+--------
  1 |     26 | \x0561
as you can see, the 61 is 'a', so what's the 05??? strange.

text is variable length so there is header information built into the datatype 
representation that indicates how long the content is.

David J.

No, this is the varlena struct:
struct varlena
{
char vl_len_[4]; /* Do not touch this field directly! */
char vl_dat[FLEXIBLE_ARRAY_MEMBER]; /* Data content is here */
};
when I insert 'a', this struct will be {
vl_len : 00 00 00 05
vl_dat: 'a'
}
the t_data should be \x000561, but it's \x0561? strange
--
jack...@gmail.com


Re: meson vs make: missing/inconsistent ENV

2023-02-26 Thread Justin Pryzby
On Sun, Feb 26, 2023 at 03:21:04PM -0800, Andres Freund wrote:
> > Is there any consideration of promoting these or other warnings to
> > fatal?
> 
> You mean the perl warnings?

Yes - it'd be nice if the warnings caused an obvious failure to allow
addressing the issue.  I noticed the icu warning while looking at a bug
in 0da243fed, and updating to add ZSTD.  




[PATCH] fix msvc build libpq error LNK2019 when link openssl;

2023-02-26 Thread gamefunc
# I:
(default target) (1) -> (Link target) ->
  libcrypto.lib(libcrypto-lib-e_capi.obj) : error LNK2019:
__imp_CertOpenStore, capi_open_store
  libcrypto.lib(libcrypto-lib-e_capi.obj) : error LNK2019:
__imp_CertCloseStore, capi_find_key
  libcrypto.lib(libcrypto-lib-e_capi.obj) : error LNK2019:
__imp_CertEnumCertificatesInStore, capi_find_cert
  libcrypto.lib(libcrypto-lib-e_capi.obj) : error LNK2019:
__imp_CertFindCertificateInStore, capi_find_cert
  libcrypto.lib(libcrypto-lib-e_capi.obj) : error LNK2019:
__imp_CertDuplicateCertificateContext, capi_load_ssl_client_cert
  libcrypto.lib(libcrypto-lib-e_capi.obj) : error LNK2019:
__imp_CertFreeCertificateContext, capi_dsa_free
  libcrypto.lib(libcrypto-lib-e_capi.obj) : error LNK2019:
__imp_CertGetCertificateContextProperty, capi_cert_get_fname

# A:
missing crypt32.lib

# Fix:
Mkvcbuild.pm: add:
$libpq->AddLibrary('crypt32.lib');
$postgres->AddLibrary('crypt32.lib');

and a simple fix for "Unable to determine Visual Studio version": replace
    my $vsVersion = DetermineVisualStudioVersion();
with
    my $vsVersion = '17.00';

Mkvcbuild.pm.patch
Description: Binary data


Re: Doc update for pg_stat_statements normalization

2023-02-26 Thread Michael Paquier
On Sat, Feb 25, 2023 at 01:59:04PM +, Imseih (AWS), Sami wrote:
> The overhead of storing this additional private data for the life of the query
> execution may not be  desirable.

Okay, but why?

> I think we also will need to copy the
> private data to QueryDesc as well to make it available to planner/utility/exec
> hooks.

This seems like the key point to me here.  If we copy more information
into the Query structures, then we basically have no need for sticky
entries, which could be an advantage on its own as it simplifies the
deallocation and lookup logic.

For a DML or a SELECT, the manipulation of the hash table would still
be a three-step process (post-analyze, planner and execution end), but
the first step would have no need to use an exclusive lock on the hash
table because we could just read the normalized query and copy it over
to the Query if an entry exists, meaning that we could actually relax
things a bit?  This relaxation has as cost the extra memory used to
store more data to allow the insertion to use a proper state of the
Query[Desc] coming from the JumbleState (this extra data has no need
to be JumbleState, just the results we generate from it, aka the
normalized query).

> In v14, we added a dealloc metric to pg_stat_statements_info, which is 
> helpful.
> However, this only deals with the pgss_hash entry deallocation.
> I think we should also add a metric for the text file garbage collection.

This sounds like a good idea on its own.
--
Michael




Provide PID data for "cannot wait on a latch owned by another process" in latch.c

2023-02-26 Thread Michael Paquier
Hi all,

While doing something I should not have done, I have been able to
trigger the error of $subject in latch.c.  Adding the PID owning the
latch and MyProcPid to the elog made me understand immediately why I
was wrong.  Would there be any objections to adding more information
in this case?

The attached patch does so.
Thanks,
--
Michael
diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index f4123e7de7..ab902b265e 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -936,7 +936,8 @@ AddWaitEventToSet(WaitEventSet *set, uint32 events, pgsocket fd, Latch *latch,
 	if (latch)
 	{
 		if (latch->owner_pid != MyProcPid)
-			elog(ERROR, "cannot wait on a latch owned by another process");
+			elog(ERROR, "cannot wait on a latch owned by another process (%d,%d)",
+				 latch->owner_pid, MyProcPid);
 		if (set->latch)
 			elog(ERROR, "cannot wait on more than one latch");
 		if ((events & WL_LATCH_SET) != WL_LATCH_SET)
@@ -1046,7 +1047,8 @@ ModifyWaitEvent(WaitEventSet *set, int pos, uint32 events, Latch *latch)
 	if (events == WL_LATCH_SET)
 	{
 		if (latch && latch->owner_pid != MyProcPid)
-			elog(ERROR, "cannot wait on a latch owned by another process");
+			elog(ERROR, "cannot wait on a latch owned by another process (%d,%d)",
+				 latch->owner_pid, MyProcPid);
 		set->latch = latch;
 
 		/*




Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

2023-02-26 Thread Melanie Plageman
On Wed, Feb 22, 2023 at 8:04 AM Melih Mutlu  wrote:
>
> Hi Wang,
>
> Thanks for reviewing.
> Please see updated patches. [1]

This is cool! Thanks for working on this.
I had a chance to review your patchset and I had some thoughts and
questions.

I notice that you've added a new user-facing option to make a snapshot.
I think functionality to independently make a snapshot for use elsewhere
has been discussed in the past for the implementation of different
features (e.g. [1] pg_dump but they ended up using replication slots for
this I think?), but I'm not quite sure I understand all the implications
for providing a user-visible create snapshot command. Where can it be
used? When can the snapshot be used? In your patch's case, you know that
you can use the snapshot you are creating, but I just wonder if any
restrictions or caveats need be taken for its general use.

For the worker reuse portion of the code, could it be a separate patch
in the set? It could be independently committable and would be easier to
review (separate from repl slot reuse).

Given table sync worker reuse, I think it is worth considering a more
explicit structure for the table sync worker code now -- i.e. having a
TableSyncWorkerMain() function. Though they still do the
LogicalRepApplyLoop(), much of what else they do is different than the
apply leader.

Apply worker leader does:

ApplyWorkerMain()
walrcv_startstreaming()
LogicalRepApplyLoop()
launch table sync workers
walrcv_endstreaming()
proc_exit()

Table Sync workers master:

ApplyWorkerMain()
start_table_sync()
walrcv_create_slot()
copy_table()
walrcv_startstreaming()
start_apply()
LogicalRepApplyLoop()
walrcv_endstreaming()
proc_exit()

Now table sync workers need to loop back and do start_table_sync() again
for their new table.
You have done this in ApplyWorkerMain(). But I think that this could be
a separate main function since their main loop is effectively totally
different now than an apply worker leader.

Something like:

TableSyncWorkerMain()
TableSyncWorkerLoop()
start_table_sync()
walrcv_startstreaming()
LogicalRepApplyLoop()
walrcv_endstreaming()
wait_for_new_rel_assignment()
proc_exit()

You mainly have this structure, but it is a bit hidden and some of the
shared functions that previously may have made sense for table sync
worker and apply workers to share don't really make sense to share
anymore.

The main thing that table sync workers and apply workers share is the
logic in LogicalRepApplyLoop() (table sync workers use when they do
catchup), so perhaps we should make the other code separate?

Also on the topic of worker reuse, I was wondering if having workers
find their own next table assignment (as you have done in
process_syncing_tables_for_sync()) makes sense.

The way the whole system would work now (with your patch applied), as I
understand it, the apply leader would loop through the subscription rel
states and launch workers up to max_sync_workers_per_subscription for
every candidate table needing sync. The apply leader will continue to do
this, even though none of those workers would exit unless they die
unexpectedly. So, once it reaches max_sync_workers_per_subscription, it
won't launch any more workers.

When one of these sync workers is finished with a table (it is synced
and caught up), it will search through the subscription rel states
itself looking for a candidate table to work on.

It seems it would be common for workers to be looking through the
subscription rel states at the same time, and I don't really see how you
prevent races in who is claiming a relation to work on. Though you take
a shared lock on the LogicalRepWorkerLock, what if, in between
logicalrep_worker_find() and updating MyLogicalRepWorker->relid,
someone else also updates their relid to that relid? I don't think you
can update LogicalRepWorker->relid with only a shared lock.
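
A sketch of the kind of claim step that would avoid the race (a hedged
illustration; the point is holding the lock exclusively across both the
lookup and the update, and subid/relid come from the surrounding loop):

	/*
	 * Claim the next unassigned relation atomically: find and claim under
	 * one exclusive lock so two workers cannot pick the same relid.
	 */
	LWLockAcquire(LogicalRepWorkerLock, LW_EXCLUSIVE);

	if (logicalrep_worker_find(subid, relid, false) == NULL)
		MyLogicalRepWorker->relid = relid;	/* nobody has it; claim it */

	LWLockRelease(LogicalRepWorkerLock);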

I wonder if it is not better to have the apply leader, in
process_syncing_tables_for_apply(), first check for an existing worker
for the rel, then check for an available worker without an assignment,
then launch a worker?

Workers could then sleep after finishing their assignment and wait for
the leader to give them a new assignment.

Given an exclusive lock on LogicalRepWorkerLock, it may be okay for
workers to find their own table assignments from the subscriptionrel --
and perhaps this will be much more efficient from a CPU perspective. It
feels just a bit weird to have the code doing that buried in
process_syncing_tables_for_sync(). It seems like it should at least
return out to a main table sync worker loop in which workers loop
through finding a table and assigning it to themselves, syncing the
table, and catching the table up.

- Melanie

[1] 
https://www.postgresql.org/message-id/flat/CA%2BU5nMLRjGtpskUkYSzZOEYZ_8OMc02k%2BO6FDi4una3mB4rS1w%40mail.gmail.com#45692f75a1e79d4ce2d4f6a0

Re: tests against running server occasionally fail, postgres_fdw & tenk1

2023-02-26 Thread Tom Lane
Andres Freund  writes:
> Not that I understand why that tries to terminate connections, instead of just
> looking at application name.

The test is trying to verify the application name reported by the
"remote" session, which isn't constant, so we can't just do "select
application_name from pg_stat_activity".  I agree that terminating the
connection seems like kind of a strange thing to do --- maybe it's to
ensure that we get a new session with the updated application name
for the next test case?  If not, maybe we could do "select 1 from
pg_stat_activity where application_name = computed-pattern", but that
has the same problem that a cache flush might have terminated the
remote session.

regards, tom lane




Re: meson vs make: missing/inconsistent ENV

2023-02-26 Thread Andres Freund
Hi,

On 2023-02-26 16:52:39 -0600, Justin Pryzby wrote:
> I noticed warnings:
> Use of uninitialized value $ENV{"with_icu"} in string eq at 
> /home/pryzbyj/src/postgres/src/bin/pg_dump/t/002_pg_dump.pl line 56.
> 
> and looked through: git grep ^export '*/Makefile'
> 
> and found that:
> src/bin/pg_dump/meson.build is missing with_icu since 396d348b0

Looks like it.


> Also, e6927270c added ZSTD to src/bin/pg_basebackup/meson.build, but
> it's not in ./Makefile ??  Maybe that was for consistency with other
> places, or pre-emptive in case the tap tests want to do tests involving
> the ZSTD tool.  But it'd be better if ./Makefile had it too.

I suspect I just over-eagerly added it when the pg_basebackup zstd support
went in, using the GZIP_PROGRAM/LZ4 cases as a template. And foolishly
assuming a newly added compression method would be tested.


> The rest I think are not errors:
> 
> src/test/meson.build is missing PG_TEST_EXTRA

> src/bin/pg_upgrade/meson.build and ../src/test/recovery/meson.build
> are missing REGRESS_SHLIB

Yep, these are added in the top-level meson.build.


> Is there any consideration of promoting these or other warnings to
> fatal?

You mean the perl warnings?

Greetings,

Andres Freund




RE: Allow logical replication to copy tables in binary format

2023-02-26 Thread Takamichi Osumi (Fujitsu)
On Monday, February 20, 2023 8:47 PM Melih Mutlu  wrote:
> Thanks for letting me know. 
> Attached the fixed version of the patch.
Hi, Melih


Thanks for updating the patch. Minor comments on v9.

(1) commit message

"The patch introduces a new parameter, copy_format, to CREATE SUBSCRIPTION to 
allow to choose
the format used in initial table synchronization."

This patch introduces the new parameter not only to CREATE SUBSCRIPTION but
also to ALTER SUBSCRIPTION, so this description should be more general,
something like below.

"The patch introduces a new parameter, copy_format, as a subscription option
to allow the user to choose the format of initial table synchronization."

(2) copy_table

We don't need to check the publisher's version below.

+
+   /* If the publisher is v16 or later, specify the format to copy data. */
+   if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 16)
+   {
+   char *format = MySubscription->copyformat == LOGICALREP_COPY_AS_BINARY ? "binary" : "text";
+   appendStringInfo(&cmd, "  WITH (FORMAT %s)", format);
+   options = lappend(options, makeDefElem("format", (Node *) makeString(format), -1));
+   }
+

We don't have this kind of check for the "binary" option, and it seems it is
the user's responsibility to avoid any errors during replication. If we want
to add this kind of check, we can add checks for both the "binary" and
"copy_format" options together as an independent patch.
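
For illustration, such a cross-check could look roughly like this (a
hypothetical sketch, not part of the patch; LOGICALREP_COPY_AS_BINARY and
the opts fields follow the patch's naming):

	/* reject a combination that cannot work, instead of failing mid-COPY */
	if (opts->copyformat == LOGICALREP_COPY_AS_BINARY && !opts->binary)
		ereport(ERROR,
				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
				 errmsg("copy_format = binary requires binary = true")));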

(3) subscription.sql/out

The other existing subscription options have tests checking invalid input
for a newly created option (e.g. "foo" for disable_on_error or the streaming
mode). So, I think we can add this type of test for this feature.



Best Regards,
Takamichi Osumi





meson vs make: missing/inconsistent ENV

2023-02-26 Thread Justin Pryzby
I noticed warnings:
Use of uninitialized value $ENV{"with_icu"} in string eq at 
/home/pryzbyj/src/postgres/src/bin/pg_dump/t/002_pg_dump.pl line 56.

and looked through: git grep ^export '*/Makefile'

and found that:
src/bin/pg_dump/meson.build is missing with_icu since 396d348b0

Also, e6927270c added ZSTD to src/bin/pg_basebackup/meson.build, but
it's not in ./Makefile ??  Maybe that was for consistency with other
places, or pre-emptive in case the tap tests want to do tests involving
the ZSTD tool.  But it'd be better if ./Makefile had it too.

The rest I think are not errors:

src/test/meson.build is missing PG_TEST_EXTRA
src/bin/pg_upgrade/meson.build and ../src/test/recovery/meson.build
are missing REGRESS_SHLIB

Is there any consideration of promoting these or other warnings to
fatal?

-- 
Justin




Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

2023-02-26 Thread Melanie Plageman
On Sun, Feb 26, 2023 at 04:11:45PM -0500, Melanie Plageman wrote:
> On Sun, Feb 26, 2023 at 12:33:03PM -0800, Andres Freund wrote:
> > On 2023-02-26 15:08:33 -0500, Tom Lane wrote:
> > > Andres Freund  writes:
> > > > They're all animals for testing older LLVM versions. They're using
> > > > pretty old clang versions. phycodurus and dragonet are clang 3.9,
> > > > petalura and desmoxytes are clang 4, idiacanthus and pogona are clang 5.
> > >
> > > [ shrug ... ]  If I thought this was actually good code, I might
> > > agree with ignoring these warnings; but I think what it mostly is
> > > is misleading overcomplication.
> >
> > I don't mind removing *_FIRST et al by using 0. None of the proposals for
> > getting rid of *_NUM_* seemed a cure actually better than the disease.
> 
> I am also fine with removing *_FIRST and allowing those electrons to
> move on to bigger and better things :)
> 
> >
> > Adding a cast to int of the loop iteration variable seems to work and is
> > only noticeably, not intolerably, ugly.
> >
> > One thing that's odd is that the warnings don't appear reliably. The
> > "io_op < IOOP_NUM_TYPES" comparison in pgstatfuncs.c doesn't trigger any
> > with clang-4.
> 
> Using an int and casting all over the place certainly doesn't make the
> code more attractive, but I am fine with this if it seems like the least
> bad solution.
> 
> I didn't want to write a patch with this (ints instead of enums as loop
> control variable) without being able to reproduce the warnings myself
> and confirm the patch silences them. However, I wasn't able to reproduce
> the warnings myself. I tried to do so with a minimal repro on godbolt,
> and even with
> -Wtautological-constant-out-of-range-compare -Wall -Wextra -Weverything 
> -Werror
> I couldn't get clang 4 or 5 (or a number of other compilers I randomly
> picked from the dropdown) to produce the warnings.

Just kidding: it reproduces if the defined enum has two or fewer values.
Interesting...
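
To make that concrete, here is a minimal sketch of the kind of reproducer
involved (the enum and all names here are invented for illustration, not
taken from the thread):

typedef enum TwoValues
{
    VALUE_FIRST,
    VALUE_SECOND,
} TwoValues;

#define TWOVALUES_NUM_TYPES (VALUE_SECOND + 1)

/*
 * With only two enumerators, some clang versions conclude that the loop
 * condition below is always true and emit
 * -Wtautological-constant-out-of-range-compare.
 */
int
sum_counts(const int counts[TWOVALUES_NUM_TYPES])
{
    int     total = 0;

    for (TwoValues v = VALUE_FIRST; v < TWOVALUES_NUM_TYPES; v++)
        total += counts[v];
    return total;
}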

After discovering this, tried out various solutions including one Andres
suggested:

for (IOOp io_op = 0; (int) io_op < IOOP_NUM_TYPES; io_op++)

and it does silence the warning. What do you think?

- Melanie




Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

2023-02-26 Thread Melanie Plageman
On Sun, Feb 26, 2023 at 12:33:03PM -0800, Andres Freund wrote:
> On 2023-02-26 15:08:33 -0500, Tom Lane wrote:
> > Andres Freund  writes:
> > > They're all animals for testing older LLVM versions. They're using
> > > pretty old clang versions. phycodurus and dragonet are clang 3.9,
> > > petalura and desmoxytes are clang 4, idiacanthus and pogona are clang 5.
> >
> > [ shrug ... ]  If I thought this was actually good code, I might
> > agree with ignoring these warnings; but I think what it mostly is
> > is misleading overcomplication.
>
> I don't mind removing *_FIRST et al by using 0. None of the proposals for
> getting rid of *_NUM_* seemed a cure actually better than the disease.

I am also fine with removing *_FIRST and allowing those electrons to
move on to bigger and better things :)

>
> Adding a cast to int of the loop iteration variable seems to work and is
> only noticeably, not intolerably, ugly.
>
> One thing that's odd is that the warnings don't appear reliably. The
> "io_op < IOOP_NUM_TYPES" comparison in pgstatfuncs.c doesn't trigger any
> with clang-4.

Using an int and casting all over the place certainly doesn't make the
code more attractive, but I am fine with this if it seems like the least
bad solution.

I didn't want to write a patch with this (ints instead of enums as loop
control variable) without being able to reproduce the warnings myself
and confirm the patch silences them. However, I wasn't able to reproduce
the warnings myself. I tried to do so with a minimal repro on godbolt,
and even with
-Wtautological-constant-out-of-range-compare -Wall -Wextra -Weverything -Werror
I couldn't get clang 4 or 5 (or a number of other compilers I randomly
picked from the dropdown) to produce the warnings.

- Melanie




Re: tests against running server occasionally fail, postgres_fdw & tenk1

2023-02-26 Thread Andres Freund
Hi,

On 2023-02-26 15:57:01 -0500, Tom Lane wrote:
> However, the other stanza with debug_discard_caches muckery is the
> one about "test postgres_fdw.application_name GUC", and in that case
> ignoring the number of terminated connections would destroy the
> point of the test entirely; because without that, you're proving
> nothing about what the remote's application_name actually looks like.
> 
> I'm inclined to think we should indeed just nuke that test.  It's
> overcomplicated and it expends a lot of test cycles on a pretty
> marginal feature.

It does seem fairly complicated...

*If* we wanted to rescue it, we probably could just use a transaction around
the SELECT and the termination, which ought to prevent sinval issues.

Not that I understand why that tries to terminate connections, instead of just
looking at application name.

Greetings,

Andres Freund




Re: tests against running server occasionally fail, postgres_fdw & tenk1

2023-02-26 Thread Tom Lane
I wrote:
> I'm inclined to think we should indeed just nuke that test.  It's
> overcomplicated and it expends a lot of test cycles on a pretty
> marginal feature.

Perhaps a better idea: at the start of the test, set
postgres_fdw.application_name to something that exercises all the
available escape sequences, but don't try to verify what the
result looks like.  That at least gives us code coverage for the
escape sequence processing code, even if it doesn't prove that
the output is desirable.

regards, tom lane




Re: WIN32 pg_import_system_collations

2023-02-26 Thread Andrew Dunstan


On 2023-01-03 Tu 08:48, Peter Eisentraut wrote:

On 09.12.22 13:48, Juan José Santamaría Flecha wrote:
On Thu, Dec 1, 2022 at 8:46 AM Peter Eisentraut wrote:



    What is the status of this now?  I think the other issue has been
    addressed?


Yes, that's addressed for MSVC builds. I think there are a couple of 
pending issues for MinGW, but those should have their own threads.


The patch had rotten, so PFA a rebased version.


committed




Now that I have removed the barrier to testing this in the buildfarm, 
and added an appropriate locale setting to drongo, we can see that this 
test fails like this:



diff -w -U3 c:/prog/bf/root/HEAD/pgsql.build/src/test/regress/expected/collate.windows.win1252.out c:/prog/bf/root/HEAD/pgsql.build/src/test/regress/results/collate.windows.win1252.out
--- c:/prog/bf/root/HEAD/pgsql.build/src/test/regress/expected/collate.windows.win1252.out  2023-01-23 04:39:06.755149600 +
+++ c:/prog/bf/root/HEAD/pgsql.build/src/test/regress/results/collate.windows.win1252.out   2023-02-26 17:32:54.115515200 +
@@ -363,16 +363,17 @@
 
 -- to_char

 SET lc_time TO 'de_DE';
+ERROR:  invalid value for parameter "lc_time": "de_DE"
 SELECT to_char(date '2010-03-01', 'DD TMMON YYYY');
    to_char   
 -------------
- 01 MRZ 2010
+ 01 MAR 2010
 (1 row)
 
 SELECT to_char(date '2010-03-01', 'DD TMMON YYYY' COLLATE "de_DE");
    to_char   
 -------------
- 01 MRZ 2010
+ 01 MAR 2010
 (1 row)
 
 -- to_date



The last of these is especially an issue, as it doesn't even throw an error.

See 




cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com


Re: tests against running server occasionally fail, postgres_fdw & tenk1

2023-02-26 Thread Tom Lane
Andres Freund  writes:
> Hm, yea, that should work. It's indeed the entirety of the diff
> https://api.cirrus-ci.com/v1/artifact/task/4718859714822144/testrun/build/testrun/postgres_fdw-running/regress/regression.diffs

> If we go that way we can remove the debug_discard muckery as well, I think?

Okay, so that seems to work for the "reestablish new connection" test:
as coded here, it passes with or without debug_discard_caches enabled,
and I believe it's testing what it intends to either way.  So that's
good.

However, the other stanza with debug_discard_caches muckery is the
one about "test postgres_fdw.application_name GUC", and in that case
ignoring the number of terminated connections would destroy the
point of the test entirely; because without that, you're proving
nothing about what the remote's application_name actually looks like.

I'm inclined to think we should indeed just nuke that test.  It's
overcomplicated and it expends a lot of test cycles on a pretty
marginal feature.

So I propose the attached.

regards, tom lane

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index d5fc61446a..d0ba40aca3 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -9926,11 +9926,6 @@ WARNING:  there is no transaction in progress
 -- Change application_name of remote connection to special one
 -- so that we can easily terminate the connection later.
 ALTER SERVER loopback OPTIONS (application_name 'fdw_retry_check');
--- If debug_discard_caches is active, it results in
--- dropping remote connections after every transaction, making it
--- impossible to test termination meaningfully.  So turn that off
--- for this test.
-SET debug_discard_caches = 0;
 -- Make sure we have a remote connection.
 SELECT 1 FROM ft1 LIMIT 1;
  ?column? 
@@ -9939,13 +9934,12 @@ SELECT 1 FROM ft1 LIMIT 1;
 (1 row)
 
 -- Terminate the remote connection and wait for the termination to complete.
-SELECT pg_terminate_backend(pid, 18) FROM pg_stat_activity
+-- (If a cache flush happens, the remote connection might have already been
+-- dropped; so code this step in a way that doesn't fail if no connection.)
+DO $$ BEGIN
+PERFORM pg_terminate_backend(pid, 18) FROM pg_stat_activity
 	WHERE application_name = 'fdw_retry_check';
- pg_terminate_backend 
---
- t
-(1 row)
-
+END $$;
 -- This query should detect the broken connection when starting new remote
 -- transaction, reestablish new connection, and then succeed.
 BEGIN;
@@ -9958,13 +9952,10 @@ SELECT 1 FROM ft1 LIMIT 1;
 -- If we detect the broken connection when starting a new remote
 -- subtransaction, we should fail instead of establishing a new connection.
 -- Terminate the remote connection and wait for the termination to complete.
-SELECT pg_terminate_backend(pid, 18) FROM pg_stat_activity
+DO $$ BEGIN
+PERFORM pg_terminate_backend(pid, 18) FROM pg_stat_activity
 	WHERE application_name = 'fdw_retry_check';
- pg_terminate_backend 
---
- t
-(1 row)
-
+END $$;
 SAVEPOINT s;
 -- The text of the error might vary across platforms, so only show SQLSTATE.
 \set VERBOSITY sqlstate
@@ -9972,7 +9963,6 @@ SELECT 1 FROM ft1 LIMIT 1;-- should fail
 ERROR:  08006
 \set VERBOSITY default
 COMMIT;
-RESET debug_discard_caches;
 -- =============================================================
 -- test connection invalidation cases and postgres_fdw_get_connections function
 -- =============================================================
@@ -11627,77 +11617,6 @@ ALTER FOREIGN DATA WRAPPER postgres_fdw OPTIONS (nonexistent 'fdw');
 ERROR:  invalid option "nonexistent"
 HINT:  There are no valid options in this context.
 -- ===================================================================
--- test postgres_fdw.application_name GUC
--- ===================================================================
--- Turn debug_discard_caches off for this test to make sure that
--- the remote connection is alive when checking its application_name.
-SET debug_discard_caches = 0;
--- Specify escape sequences in application_name option of a server
--- object so as to test that they are replaced with status information
--- expectedly.
---
--- Since pg_stat_activity.application_name may be truncated to less than
--- NAMEDATALEN characters, note that substring() needs to be used
--- at the condition of test query to make sure that the string consisting
--- of database name and process ID is also less than that.
-ALTER SERVER loopback2 OPTIONS (application_name 'fdw_%d%p');
-SELECT 1 FROM ft6 LIMIT 1;
- ?column? 
---
-1
-(1 row)
-
-SELECT pg_terminate_backend(pid, 18) FROM pg_stat_activity
-  WHERE application_name =
-substring('fdw_' || current_database() || pg_backend_pid() for
-  current_setting('max_identifier_length')::int);
- pg_terminate_backend 
---

Re: meson logs environment

2023-02-26 Thread Andrew Dunstan


On 2023-02-26 Su 12:59, Andres Freund wrote:

Hi,

On 2023-02-20 20:47:59 -0500, Andrew Dunstan wrote:

I've noticed that `meson test` logs the complete environment in
meson_logs/testlog.txt. That seems unnecessary and probably undesirable for
the buildfarm client.

It doesn't seem unnecessary to me, at all. It's what you need to rerun the
test in a precise way.



Well, clearly I'm not the only person who is concerned about it - see 
the upstream issue Nazir referred to. In any case, I have got a 
procedure in my meson buildfarm client for filtering the inherited 
environment to accommodate this verbosity, so there's no need to do 
anything else here.



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com


Re: tests against running server occasionally fail, postgres_fdw & tenk1

2023-02-26 Thread Tom Lane
Andres Freund  writes:
> On 2023-02-26 14:51:45 -0500, Tom Lane wrote:
>> If that's the only diff, we could just hide it, say by writing

> Hm, yea, that should work. It's indeed the entirety of the diff
> https://api.cirrus-ci.com/v1/artifact/task/4718859714822144/testrun/build/testrun/postgres_fdw-running/regress/regression.diffs

> If we go that way we can remove the debug_discard muckery as well, I think?

Perhaps.  I'll check to see if that stanza passes with debug_discard on.

regards, tom lane




Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

2023-02-26 Thread Andres Freund
On 2023-02-26 15:08:33 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > They're all animals for testing older LLVM versions. They're using
> > pretty old clang versions. phycodurus and dragonet are clang 3.9,
> > petalura and desmoxytes are clang 4, idiacanthus and pogona are clang 5.
> 
> [ shrug ... ]  If I thought this was actually good code, I might
> agree with ignoring these warnings; but I think what it mostly is
> is misleading overcomplication.

I don't mind removing *_FIRST et al by using 0. None of the proposals for
getting rid of *_NUM_* seemed a cure actually better than the disease.

Adding a cast to int of the loop iteration variable seems to work and is only
noticeably, not intolerably, ugly.

One thing that's odd is that the warnings don't appear reliably. The
"io_op < IOOP_NUM_TYPES" comparison in pgstatfuncs.c doesn't trigger any
with clang-4.

Greetings,

Andres Freund




Re: stopgap fix for signal handling during restore_command

2023-02-26 Thread Andres Freund
Hi,

On 2023-02-26 11:39:00 -0800, Nathan Bossart wrote:
> On Sun, Feb 26, 2023 at 10:00:29AM -0800, Andres Freund wrote:
> > On 2023-02-25 14:06:29 -0800, Nathan Bossart wrote:
> >> On Sat, Feb 25, 2023 at 11:52:53AM -0800, Andres Freund wrote:
> >> > I think I opined on this before, but we really ought to have a function 
> >> > to do
> >> > some minimal signal safe output. Implemented centrally, instead of being 
> >> > open
> >> > coded in a bunch of places.
> >> 
> >> While looking around for the right place to put this, I noticed that
> >> there's a write_stderr() function in elog.c that we might be able to use.
> >> I used that in v9.  WDYT?
> > 
> > write_stderr() isn't signal safe, from what I can tell.
> 
> *facepalm*  Sorry.
> 
> What precisely did you have in mind?  AFAICT you are asking for a wrapper
> around write().

Partially I just want something that can easily be searched for, and that can
have comments attached to it documenting why what it does is safe.

It'd not be a huge amount of work to have slow and restricted string
interpolation support, to make it easier to write messages. Converting floats
is probably too hard to do safely, and I'm not sure %m can safely be
supported. But basic things like %d would be pretty simple.

Basically a loop around the format string that directly writes to stderr using
write(), and only supports a signal safe subset of normal format strings.
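
To sketch that shape (purely illustrative: the function name, the supported
conversions, and all details below are assumptions, not taken from any posted
patch; only async-signal-safe primitives such as write() are used):

#include <stdarg.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h>

/* hypothetical helper: restricted, signal-safe fprintf(stderr, ...) */
static void
sig_safe_stderr(const char *fmt, ...)
{
    va_list     ap;

    va_start(ap, fmt);
    for (const char *p = fmt; *p; p++)
    {
        if (*p != '%')
        {
            /* byte-at-a-time for simplicity; a real version would batch */
            (void) write(STDERR_FILENO, p, 1);
            continue;
        }
        p++;
        if (*p == '\0')
            break;              /* stray trailing '%'; stop */
        switch (*p)
        {
            case 's':
                {
                    const char *s = va_arg(ap, const char *);

                    (void) write(STDERR_FILENO, s, strlen(s));
                    break;
                }
            case 'd':
                {
                    int         v = va_arg(ap, int);
                    char        buf[16];
                    int         n = sizeof(buf);
                    bool        neg = (v < 0);

                    /* build digits back to front, avoiding snprintf() */
                    do
                    {
                        buf[--n] = '0' + (char) (neg ? -(v % 10) : v % 10);
                        v /= 10;
                    } while (v != 0);
                    if (neg)
                        buf[--n] = '-';
                    (void) write(STDERR_FILENO, buf + n, sizeof(buf) - n);
                    break;
                }
            default:            /* includes "%%" */
                (void) write(STDERR_FILENO, p, 1);
                break;
        }
    }
    va_end(ap);
}

A real version would also need to decide how callers detect truncation or
write() failure; the sketch just ignores both.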

Greetings,

Andres Freund




Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

2023-02-26 Thread Tom Lane
Andres Freund  writes:
> They're all animals for testing older LLVM versions. They're using
> pretty old clang versions. phycodurus and dragonet are clang 3.9, petalura and
> desmoxytes are clang 4, idiacanthus and pogona are clang 5.

[ shrug ... ]  If I thought this was actually good code, I might
agree with ignoring these warnings; but I think what it mostly is
is misleading overcomplication.

regards, tom lane




Re: tests against running server occasionally fail, postgres_fdw & tenk1

2023-02-26 Thread Andres Freund
Hi,

On 2023-02-26 14:51:45 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > On 2022-12-08 16:15:11 -0800, Andres Freund wrote:
> >> The most frequent case is postgres_fdw, which somewhat regularly fails 
> >> with a
> >> regression.diff like this:
> >> WHERE application_name = 'fdw_retry_check';
> >> pg_terminate_backend
> >> --
> >> - t
> >> -(1 row)
> >> +(0 rows)
> 
> > Unless somebody comes up with a way to make the test more reliable pretty
> > soon, I think we should just remove it. It's one of the most frequently
> > flapping tests at the moment.
> 
> If that's the only diff, we could just hide it, say by writing
> 
> do $$ begin
> PERFORM pg_terminate_backend(pid, 18) FROM pg_stat_activity
> WHERE application_name = 'fdw_retry_check';
> end $$;
> 
> The actually important thing is the failure check after this;
> we don't care that much whether the initially-created connection
> is still live at this point.

Hm, yea, that should work. It's indeed the entirety of the diff
https://api.cirrus-ci.com/v1/artifact/task/4718859714822144/testrun/build/testrun/postgres_fdw-running/regress/regression.diffs

If we go that way we can remove the debug_discard muckery as well, I think?

Greetings,

Andres Freund




Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

2023-02-26 Thread Andres Freund
Hi,

On 2023-02-26 14:40:00 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > On 2023-02-26 13:20:00 -0500, Tom Lane wrote:
> >> I hadn't run my buildfarm-compile-warning scraper for a little while,
> >> but I just did, and I find that this commit is causing warnings on
> >> no fewer than 14 buildfarm animals.  They all look like
> 
> > What other animals? If it had been just ayu / clang 4, I'd not be sure it's
> > worth doing much here.
> 
> ayu
> batfish
> demoiselle
> desmoxytes
> dragonet
> idiacanthus
> mantid
> petalura
> phycodurus
> pogona
> wobbegong
> 
> Some of those are yours ;-)
> 
> Actually there are only 11, because I miscounted before, but
> there are new compilers in that group not only old ones.
> desmoxytes is gcc 10, for instance.

I think on mine the warnings come from the clang to generate bitcode, rather
than gcc. The parallel make output makes that a bit hard to see though, as
commands and warnings are interspersed.

They're all animals for testing older LLVM versions. They're using
pretty old clang versions. phycodurus and dragonet are clang 3.9, petalura and
desmoxytes are clang 4, idiacanthus and pogona are clang 5.

Greetings,

Andres Freund




Re: tests against running server occasionally fail, postgres_fdw & tenk1

2023-02-26 Thread Tom Lane
Andres Freund  writes:
> On 2022-12-08 16:15:11 -0800, Andres Freund wrote:
>> The most frequent case is postgres_fdw, which somewhat regularly fails with a
>> regression.diff like this:
>> WHERE application_name = 'fdw_retry_check';
>> pg_terminate_backend
>> --
>> - t
>> -(1 row)
>> +(0 rows)

> Unless somebody comes up with a way to make the test more reliable pretty
> soon, I think we should just remove it. It's one of the most frequently
> flapping tests at the moment.

If that's the only diff, we could just hide it, say by writing

do $$ begin
PERFORM pg_terminate_backend(pid, 18) FROM pg_stat_activity
WHERE application_name = 'fdw_retry_check';
end $$;

The actually important thing is the failure check after this;
we don't care that much whether the initially-created connection
is still live at this point.

regards, tom lane




Re: tests against running server occasionally fail, postgres_fdw & tenk1

2023-02-26 Thread Andres Freund
Hi,

On 2022-12-08 16:15:11 -0800, Andres Freund wrote:
> The most frequent case is postgres_fdw, which somewhat regularly fails with a
> regression.diff like this:
> 
> diff -U3 /tmp/cirrus-ci-build/contrib/postgres_fdw/expected/postgres_fdw.out 
> /tmp/cirrus-ci-build/build/testrun/postgres_fdw-running/regress/results/postgres_fdw.out
> --- /tmp/cirrus-ci-build/contrib/postgres_fdw/expected/postgres_fdw.out   
> 2022-12-08 20:35:24.772888000 +
> +++ 
> /tmp/cirrus-ci-build/build/testrun/postgres_fdw-running/regress/results/postgres_fdw.out
>   2022-12-08 20:43:38.19945 +
> @@ -9911,8 +9911,7 @@
>   WHERE application_name = 'fdw_retry_check';
>   pg_terminate_backend
>  --
> - t
> -(1 row)
> +(0 rows)
> 
>  -- This query should detect the broken connection when starting new remote
>  -- transaction, reestablish new connection, and then succeed.
> 
> 
> See e.g.
> https://cirrus-ci.com/task/5925540020879360
> https://api.cirrus-ci.com/v1/artifact/task/5925540020879360/testrun/build/testrun/postgres_fdw-running/regress/regression.diffs
> https://api.cirrus-ci.com/v1/artifact/task/5925540020879360/testrun/build/testrun/runningcheck.log
> 
> 
> The following comment in the test provides a hint what might be happening:
> 
> -- If debug_discard_caches is active, it results in
> -- dropping remote connections after every transaction, making it
> -- impossible to test termination meaningfully.  So turn that off
> -- for this test.
> SET debug_discard_caches = 0;
> 
> 
> I guess that a cache reset message arrives and leads to the connection being
> terminated. Unfortunately that's hard to see right now, as the relevant log
> messages are output with DEBUG3 - it's quite verbose, so enabling it for all
> tests will be painful.

Downthread I reported that I was able to pinpoint that the source of the issue
indeed is a cache inval message arriving at the wrong moment.


We've had trouble with this test for years by now. We added workarounds, like

commit 1273a15bf91fa322915e32d3b6dc6ec916397268
Author: Tom Lane 
Date:   2021-05-04 13:36:26 -0400

Disable cache clobber to avoid breaking postgres_fdw termination test.

But that didn't suffice to make it reliable. Not entirely surprising, given
there are cache reset sources other than clobber cache.

Unless somebody comes up with a way to make the test more reliable pretty
soon, I think we should just remove it. It's one of the most frequently
flapping tests at the moment.

Greetings,

Andres Freund




Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

2023-02-26 Thread Tom Lane
Andres Freund  writes:
> On 2023-02-26 13:20:00 -0500, Tom Lane wrote:
>> I hadn't run my buildfarm-compile-warning scraper for a little while,
>> but I just did, and I find that this commit is causing warnings on
>> no fewer than 14 buildfarm animals.  They all look like

> What other animals? If it had been just ayu / clang 4, I'd not be sure it's
> worth doing much here.

ayu
batfish
demoiselle
desmoxytes
dragonet
idiacanthus
mantid
petalura
phycodurus
pogona
wobbegong

Some of those are yours ;-)

Actually there are only 11, because I miscounted before, but
there are new compilers in that group not only old ones.
desmoxytes is gcc 10, for instance.

regards, tom lane




Re: stopgap fix for signal handling during restore_command

2023-02-26 Thread Nathan Bossart
On Sun, Feb 26, 2023 at 10:00:29AM -0800, Andres Freund wrote:
> On 2023-02-25 14:06:29 -0800, Nathan Bossart wrote:
>> On Sat, Feb 25, 2023 at 11:52:53AM -0800, Andres Freund wrote:
>> > I think I opined on this before, but we really ought to have a function to 
>> > do
>> > some minimal signal safe output. Implemented centrally, instead of being 
>> > open
>> > coded in a bunch of places.
>> 
>> While looking around for the right place to put this, I noticed that
>> there's a write_stderr() function in elog.c that we might be able to use.
>> I used that in v9.  WDYT?
> 
> write_stderr() isn't signal safe, from what I can tell.

*facepalm*  Sorry.

What precisely did you have in mind?  AFAICT you are asking for a wrapper
around write().

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

2023-02-26 Thread Andres Freund
Hi,

On 2023-02-26 13:20:00 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > Pushed the first (and biggest) commit. More tomorrow.
> 
> I hadn't run my buildfarm-compile-warning scraper for a little while,
> but I just did, and I find that this commit is causing warnings on
> no fewer than 14 buildfarm animals.  They all look like
> 
>  ayu   | 2023-02-25 23:02:08 | pgstat_io.c:40:14: warning: comparison 
> of constant 2 with expression of type 'IOObject' (aka 'enum IOObject') is 
> always true [-Wtautological-constant-out-of-range-compare]
>  ayu   | 2023-02-25 23:02:08 | pgstat_io.c:43:16: warning: comparison 
> of constant 4 with expression of type 'IOContext' (aka 'enum IOContext') is 
> always true [-Wtautological-constant-out-of-range-compare]
>  ayu   | 2023-02-25 23:02:08 | pgstat_io.c:70:19: warning: comparison 
> of constant 2 with expression of type 'IOObject' (aka 'enum IOObject') is 
> always true [-Wtautological-constant-out-of-range-compare]
>  ayu   | 2023-02-25 23:02:08 | pgstat_io.c:71:20: warning: comparison 
> of constant 4 with expression of type 'IOContext' (aka 'enum IOContext') is 
> always true [-Wtautological-constant-out-of-range-compare]
>  ayu   | 2023-02-25 23:02:08 | pgstat_io.c:115:14: warning: 
> comparison of constant 2 with expression of type 'IOObject' (aka 'enum 
> IOObject') is always true [-Wtautological-constant-out-of-range-compare]
>  ayu   | 2023-02-25 23:02:08 | pgstat_io.c:118:16: warning: 
> comparison of constant 4 with expression of type 'IOContext' (aka 'enum 
> IOContext') is always true [-Wtautological-constant-out-of-range-compare]
>  ayu   | 2023-02-25 23:02:08 | pgstatfuncs.c:1329:12: warning: 
> comparison of constant 2 with expression of type 'IOObject' (aka 'enum 
> IOObject') is always true [-Wtautological-constant-out-of-range-compare]
>  ayu   | 2023-02-25 23:02:08 | pgstatfuncs.c:1334:17: warning: 
> comparison of constant 4 with expression of type 'IOContext' (aka 'enum 
> IOContext') is always true [-Wtautological-constant-out-of-range-compare]

What other animals? If it had been just ayu / clang 4, I'd not be sure it's
worth doing much here.


> That is, these compilers think that comparisons like
> 
>   io_object < IOOBJECT_NUM_TYPES
>   io_context < IOCONTEXT_NUM_TYPES
> 
> are constant-true.  This seems not good; if they were to actually
> act on this observation, by removing those loop-ending tests,
> we'd have a problem.

It'd at least be obvious breakage :/


> The issue seems to be that code like this:
> 
> typedef enum IOContext
> {
>   IOCONTEXT_BULKREAD,
>   IOCONTEXT_BULKWRITE,
>   IOCONTEXT_NORMAL,
>   IOCONTEXT_VACUUM,
> } IOContext;
> 
> #define IOCONTEXT_FIRST IOCONTEXT_BULKREAD
> #define IOCONTEXT_NUM_TYPES (IOCONTEXT_VACUUM + 1)
> 
> is far too cute for its own good.  I'm not sure about how to fix it
> either.  I thought of defining
> 
> #define IOCONTEXT_LAST IOCONTEXT_VACUUM
> 
> and make the loop conditions like "io_context <= IOCONTEXT_LAST",
> but that doesn't actually fix the problem.
> 
> (Even aside from that, I do not find this coding even a little bit
> mistake-proof: you still have to remember to update the #define
> when adding another enum value.)

But the alternative is going around and updating N places, or having a LAST
member in the enum, which then means either adding pointless case statements
or adding default: cases, which prevents the compiler from warning when a new
case is added.
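
For illustration, a sketch of that tradeoff with an invented enum: if the
counter lives inside the enum, every switch has to mention it.

typedef enum Fruit
{
    FRUIT_APPLE,
    FRUIT_PEAR,
    FRUIT_NUM_TYPES,            /* counter as a real enum member */
} Fruit;

const char *
fruit_name(Fruit f)
{
    switch (f)
    {
        case FRUIT_APPLE:
            return "apple";
        case FRUIT_PEAR:
            return "pear";
        case FRUIT_NUM_TYPES:
            break;              /* pointless case, present only to keep
                                 * -Wswitch quiet; a default: would instead
                                 * hide genuinely missing cases */
    }
    return "unknown";
}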

I haven't dug up an old enough compiler yet, but what happens if
IOOBJECT_NUM_TYPES is redefined to ((int) IOOBJECT_TEMP_RELATION + 1)?


> We have similar code involving enum ForkNumber but it looks to me
> like the loop variables are always declared as plain "int".  That
> might be the path of least resistance here.

IIRC that caused some even longer lines due to casting the integer to the enum
in some other lines. Perhaps we should just cast for the < comparison?

Greetings,

Andres Freund




Re: logical decoding and replication of sequences, take 2

2023-02-26 Thread Jonathan S. Katz

On 2/23/23 7:56 AM, Tomas Vondra wrote:

On 2/22/23 18:04, Jonathan S. Katz wrote:

On 2/22/23 5:02 AM, Tomas Vondra wrote:





Interestingly, in systems that tend to have higher rates of failover
(I'm thinking of a few distributed systems), this may cause int4
sequences to exhaust numbers slightly (marginally?) more quickly. Likely
not too big of an issue, but something to keep in mind.



IMHO the number of systems that would work fine with int4 sequences, but
where this change results in the sequences being "exhausted" too quickly, is
indistinguishable from 0. I don't think this is an issue.


I agree it's an edge case. I do think it's a number greater than 0, 
having seen some incredibly flaky setups, particularly in distributed 
systems. I would not worry about it, but only mentioned it to try and 
probe edge cases.



Well, yeah. We don't support active-active logical replication (at least
not with the built-in). You can easily get into similar issues without
sequences.


The "origin=none" feature lets you replicate tables bidirectionally.
While it's not full "active-active", this is a starting point and a
feature for v16. We'll definitely have users replicating data
bidirectionally with this.



Well, then the users need to use some other way to generate IDs, not
local sequences. Either some sort of distributed/global sequence, UUIDs
or something like that.

[snip]


In any case, we should update the restrictions in [2] to state: while
sequences can be replicated, there is additional work required if you
are bidirectionally replicating tables that use sequences, esp. if used
in a PK or a constraint. We can provide alternatives to how a user could
set that up, i.e. not replicate the sequences or do something like in [3].



I agree. I see this as mostly a documentation issue.


Great. I agree that users need other mechanisms to generate IDs, but we 
should ensure we document that. If needed, I'm happy to help with the 
docs here.


Thanks,

Jonathan


OpenPGP_signature
Description: OpenPGP digital signature


Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

2023-02-26 Thread Tom Lane
I wrote:
> The issue seems to be that code like this:
> ...
> is far too cute for its own good.

Oh, there's another thing here that qualifies as too-cute: loops like

for (IOObject io_object = IOOBJECT_FIRST;
 io_object < IOOBJECT_NUM_TYPES; io_object++)

make it look like we could define these enums as 1-based rather
than 0-based, but if we did this code would fail, because it's
confusing "the number of values" with "1 more than the last value".

Again, we could fix that with tests like "io_context <= IOCONTEXT_LAST",
but I don't see the point of adding more macros rather than removing
some.  We do need IOOBJECT_NUM_TYPES to declare array sizes with,
so I think we should nuke the "xxx_FIRST" macros as being not worth
the electrons they're written on, and write these loops like

for (int io_object = 0; io_object < IOOBJECT_NUM_TYPES; io_object++)

which is not actually adding any assumptions that you don't already
make by using io_object as a C array subscript.

regards, tom lane




Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

2023-02-26 Thread Tom Lane
Andres Freund  writes:
> Pushed the first (and biggest) commit. More tomorrow.

I hadn't run my buildfarm-compile-warning scraper for a little while,
but I just did, and I find that this commit is causing warnings on
no fewer than 14 buildfarm animals.  They all look like

 ayu   | 2023-02-25 23:02:08 | pgstat_io.c:40:14: warning: comparison 
of constant 2 with expression of type 'IOObject' (aka 'enum IOObject') is 
always true [-Wtautological-constant-out-of-range-compare]
 ayu   | 2023-02-25 23:02:08 | pgstat_io.c:43:16: warning: comparison 
of constant 4 with expression of type 'IOContext' (aka 'enum IOContext') is 
always true [-Wtautological-constant-out-of-range-compare]
 ayu   | 2023-02-25 23:02:08 | pgstat_io.c:70:19: warning: comparison 
of constant 2 with expression of type 'IOObject' (aka 'enum IOObject') is 
always true [-Wtautological-constant-out-of-range-compare]
 ayu   | 2023-02-25 23:02:08 | pgstat_io.c:71:20: warning: comparison 
of constant 4 with expression of type 'IOContext' (aka 'enum IOContext') is 
always true [-Wtautological-constant-out-of-range-compare]
 ayu   | 2023-02-25 23:02:08 | pgstat_io.c:115:14: warning: comparison 
of constant 2 with expression of type 'IOObject' (aka 'enum IOObject') is 
always true [-Wtautological-constant-out-of-range-compare]
 ayu   | 2023-02-25 23:02:08 | pgstat_io.c:118:16: warning: comparison 
of constant 4 with expression of type 'IOContext' (aka 'enum IOContext') is 
always true [-Wtautological-constant-out-of-range-compare]
 ayu   | 2023-02-25 23:02:08 | pgstatfuncs.c:1329:12: warning: 
comparison of constant 2 with expression of type 'IOObject' (aka 'enum 
IOObject') is always true [-Wtautological-constant-out-of-range-compare]
 ayu   | 2023-02-25 23:02:08 | pgstatfuncs.c:1334:17: warning: 
comparison of constant 4 with expression of type 'IOContext' (aka 'enum 
IOContext') is always true [-Wtautological-constant-out-of-range-compare]

That is, these compilers think that comparisons like

io_object < IOOBJECT_NUM_TYPES
io_context < IOCONTEXT_NUM_TYPES

are constant-true.  This seems not good; if they were to actually
act on this observation, by removing those loop-ending tests,
we'd have a problem.

The issue seems to be that code like this:

typedef enum IOContext
{
IOCONTEXT_BULKREAD,
IOCONTEXT_BULKWRITE,
IOCONTEXT_NORMAL,
IOCONTEXT_VACUUM,
} IOContext;

#define IOCONTEXT_FIRST IOCONTEXT_BULKREAD
#define IOCONTEXT_NUM_TYPES (IOCONTEXT_VACUUM + 1)

is far too cute for its own good.  I'm not sure about how to fix it
either.  I thought of defining

#define IOCONTEXT_LAST IOCONTEXT_VACUUM

and make the loop conditions like "io_context <= IOCONTEXT_LAST",
but that doesn't actually fix the problem.

(Even aside from that, I do not find this coding even a little bit
mistake-proof: you still have to remember to update the #define
when adding another enum value.)

We have similar code involving enum ForkNumber but it looks to me
like the loop variables are always declared as plain "int".  That
might be the path of least resistance here.

regards, tom lane




Re: stopgap fix for signal handling during restore_command

2023-02-26 Thread Andres Freund
Hi,

On 2023-02-25 14:06:29 -0800, Nathan Bossart wrote:
> On Sat, Feb 25, 2023 at 11:52:53AM -0800, Andres Freund wrote:
> > I think I opined on this before, but we really ought to have a function to 
> > do
> > some minimal signal safe output. Implemented centrally, instead of being 
> > open
> > coded in a bunch of places.
> 
> While looking around for the right place to put this, I noticed that
> there's a write_stderr() function in elog.c that we might be able to use.
> I used that in v9.  WDYT?

write_stderr() isn't signal safe, from what I can tell.




Re: meson logs environment

2023-02-26 Thread Andres Freund
Hi,

On 2023-02-20 20:47:59 -0500, Andrew Dunstan wrote:
> I've noticed that `meson test` logs the complete environment in
> meson_logs/testlog.txt. That seems unnecessary and probably undesirable for
> the buildfarm client.

It doesn't seem unnecessary to me, at all. It's what you need to rerun the
test in a precise way.

Greetings,

Andres Freund




Re: allow meson to find ICU in non-standard localtion

2023-02-26 Thread Andres Freund
Hi,

On 2023-02-22 10:26:23 -0800, Jeff Davis wrote:
> I attached a simple patch that allows meson to find ICU in a non-
> standard location if if you specify -Dextra_lib_dirs and
> -Dextra_include_dirs.

If you tell meson where to find the pkg-config file in those directories it'd
also work. -Dpkg_config_path=...

Does that suffice?

Greetings,

Andres Freund




Re: windows/meson cfbot warnings

2023-02-26 Thread Andres Freund
Hi,

On 2023-02-25 16:45:38 -0600, Justin Pryzby wrote:
> Unrelated, but something else changed and now there's this.
> 
> https://cirrus-ci.com/task/6202242768830464
> 
> [20:10:34.310] c:\cirrus>call sh -c 'if grep ": warning " build.txt; then 
> exit 1; fi; exit 0' 
> [20:10:34.397] C:\python\Include\pyconfig.h(117): warning C4005: 'MS_WIN64': 
> macro redefinition
> [20:10:34.397] C:\python\Include\pyconfig.h(117): warning C4005: 'MS_WIN64': 
> macro redefinition
> [20:10:34.397] C:\python\Include\pyconfig.h(117): warning C4005: 'MS_WIN64': 
> macro redefinition
> [20:10:34.397] C:\python\Include\pyconfig.h(117): warning C4005: 'MS_WIN64': 
> macro redefinition

Hm, odd.

There's a bit more context about the warning in the output:

[21:43:58.782] [1509/2165] Compiling C object 
src/pl/plpython/plpython3.dll.p/plpy_exec.c.obj
[21:43:58.782] C:\python\Include\pyconfig.h(117): warning C4005: 'MS_WIN64': 
macro redefinition
[21:43:58.924] src/pl/plpython/plpython3.dll.p/meson_pch-c.c: note: see 
previous definition of 'MS_WIN64'

I suspect one would have to look at the external source files to find where
it's also defined. The way the warning is triggered, it seems it has to be
predefined somewhere in/below postgres.h.
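
For what it's worth, the usual shape of a fix for this kind of clash is a
guarded define; this is only a hypothetical sketch, not a claim about what
pyconfig.h or our headers actually do:

/* hypothetical guard: define MS_WIN64 only if no header already did */
#if defined(_MSC_VER) && defined(_WIN64) && !defined(MS_WIN64)
#define MS_WIN64
#endif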

It might be that we'll get more information after disabling precompiled headers.

Greetings,

Andres Freund




Re: What's the prefix?

2023-02-26 Thread David G. Johnston
On Sun, Feb 26, 2023 at 9:16 AM jack...@gmail.com  wrote:

> use these sqls:
> create table t(a text);
> insert into t values('a');
> select lp,lp_len,t_data from heap_page_items(get_raw_page('t',0));
> lp | lp_len | t_data
> ++
>   1 | 26 | \x0561
> As you can see, the 61 is 'a', so what's the 05? Strange.
>

text is variable length so there is header information built into the
datatype representation that indicates how long the content is.
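
As a sketch of the arithmetic (the decoding below mirrors PostgreSQL's short
1-byte varlena header format on little-endian machines; the program itself is
invented for illustration):

#include <stdint.h>
#include <stdio.h>

int
main(void)
{
    /* the t_data bytes from the example above: \x0561 */
    const uint8_t datum[] = {0x05, 0x61};
    uint8_t     header = datum[0];

    if (header & 0x01)          /* low bit set: 1-byte varlena header */
    {
        int         total = (header >> 1) & 0x7F;   /* 0x05 -> 2 bytes total */
        int         datalen = total - 1;            /* minus the header byte */

        /* prints: total=2 content=a */
        printf("total=%d content=%.*s\n", total, datalen,
               (const char *) (datum + 1));
    }
    return 0;
}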

David J.


What's the prefix?

2023-02-26 Thread jack...@gmail.com
use these sqls:
create table t(a text);
insert into t values('a');
select lp,lp_len,t_data from heap_page_items(get_raw_page('t',0));
lp | lp_len | t_data 
++
  1 | 26 | \x0561
As you can see, the 61 is 'a', so what's the 05? Strange.


jack...@gmail.com


Track IO times in pg_stat_io

2023-02-26 Thread Melanie Plageman
Hi,

As suggested in [1], the attached patch adds IO times to pg_stat_io.

I added docs but haven't added any tests. The timings will only be
non-zero when track_io_timing is on; the only existing tests with
track_io_timing on are in explain.sql, and the IO timings I added to
pg_stat_io would not be visible there.

I didn't split it up into two patches (one with the changes to track IO
timing and one with the view additions and docs), because I figured the
overall diff is pretty small.

There is one minor question (in the code as a TODO) which is whether or
not it is worth cross-checking that IO counts and times are either both
zero or neither zero in the validation function
pgstat_bktype_io_stats_valid().

- Melanie

[1] 
https://www.postgresql.org/message-id/20230209050319.chyyup4vtq4jzobq%40awork3.anarazel.de
From f0c96e638e33f7404b44b936d5dfa6d4945b99d0 Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Sat, 25 Feb 2023 18:09:10 -0500
Subject: [PATCH v1] Track IO times in pg_stat_io

Add IO timing for reads, writes, extends, and fsyncs to pg_stat_io.
---
 doc/src/sgml/monitoring.sgml   | 48 +++
 src/backend/catalog/system_views.sql   |  4 ++
 src/backend/storage/buffer/bufmgr.c| 34 +++
 src/backend/storage/buffer/localbuf.c  | 14 +
 src/backend/storage/smgr/md.c  | 30 ++
 src/backend/utils/activity/pgstat_io.c | 83 +-
 src/backend/utils/adt/pgstatfuncs.c| 40 +++--
 src/include/catalog/pg_proc.dat|  6 +-
 src/include/pgstat.h   |  5 +-
 src/test/regress/expected/rules.out|  6 +-
 10 files changed, 246 insertions(+), 24 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index b0b997f092..e74d9c1cf1 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3814,6 +3814,18 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
   
  
 
+ <row>
+  <entry role="catalog_table_entry"><para role="column_definition">
+   <structfield>read_time</structfield> <type>double precision</type>
+  </para>
+  <para>
+   Time spent in read operations in milliseconds (if
+   <xref linkend="guc-track-io-timing"/> is enabled, otherwise zero)
+  </para></entry>
+ </row>
+
  
   

@@ -3826,6 +3838,18 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
   
  
 
+ <row>
+  <entry role="catalog_table_entry"><para role="column_definition">
+   <structfield>write_time</structfield> <type>double precision</type>
+  </para>
+  <para>
+   Time spent in write operations in milliseconds (if
+   <xref linkend="guc-track-io-timing"/> is enabled, otherwise zero)
+  </para></entry>
+ </row>
+
  
   

@@ -3838,6 +3862,18 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
   
  
 
+ <row>
+  <entry role="catalog_table_entry"><para role="column_definition">
+   <structfield>extend_time</structfield> <type>double precision</type>
+  </para>
+  <para>
+   Time spent in extend operations in milliseconds (if
+   <xref linkend="guc-track-io-timing"/> is enabled, otherwise zero)
+  </para></entry>
+ </row>
+
  
   

@@ -3902,6 +3938,18 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
   
  
 
+ <row>
+  <entry role="catalog_table_entry"><para role="column_definition">
+   <structfield>fsync_time</structfield> <type>double precision</type>
+  </para>
+  <para>
+   Time spent in fsync operations in milliseconds (if
+   <xref linkend="guc-track-io-timing"/> is enabled, otherwise zero)
+  </para></entry>
+ </row>
+
  
   

diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 34ca0e739f..39391bc2fc 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1123,12 +1123,16 @@ SELECT
b.io_object,
b.io_context,
b.reads,
+   b.read_time,
b.writes,
+   b.write_time,
b.extends,
+   b.extend_time,
b.op_bytes,
b.evictions,
b.reuses,
b.fsyncs,
+   b.fsync_time,
b.stats_reset
 FROM pg_stat_get_io() b;
 
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 98904a7c05..52302b317e 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1000,11 +1000,28 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 
 	if (isExtend)
 	{
+		instr_time	io_start,
+	io_time;
+
 		/* new buffers are zero-filled */
 		MemSet((char *) bufBlock, 0, BLCKSZ);
+
+		if (track_io_timing)
+			INSTR_TIME_SET_CURRENT(io_start);
+		else
+			INSTR_TIME_SET_ZERO(io_start);
+
 		/* don't set checksum for all-zero page */
 		smgrextend(smgr, forkNum, blockNum, (char *) bufBlock, false);
 
+		if (track_io_timing)
+		{
+			INSTR_TIME_SET_CURRENT(io_time);
+			INSTR_TIME_SUBTRACT(io_time, io_start);
+			pgstat_count_io_time(io_object, io_context, IOOP_EXTEND, io_time);
+		}
+
+
 		pgstat_count_io_op(io_object, io_context, IOOP_EXTEND);
 
 		/*
@@ -1042,6 +1059,7 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 INSTR_TIME_SUBTRACT(io_time, io_start);
 pgstat_count_buffer_read_time(INSTR_TIME_GET_MICROSEC(io_time));
 INSTR_TIME_ADD(pgBufferUsage.blk_read_time, io_time);
+pgstat_count_io_time(io_objec

Re: Re: Give me more details of some bits in infomask!!

2023-02-26 Thread David G. Johnston
On Sun, Feb 26, 2023 at 8:36 AM jack...@gmail.com  wrote:

> > CID means "command ID" i.e. sequential ID assigned to commands in a
> > single session (for visibility checks, so that a query doesn't see data
> > deleted by earlier commands in the same session). See
> > src/backend/utils/time/combocid.c for basic explanation of what "combo
> > CID" is.
> I think if the CID is used for visibility checks in one session, that's
> meaningless, because we can use the t_xmin and t_xmax to
> achieve this goal. Is this
>
>
I think the word "session" is wrong.  It should be "transaction".

IIUC, it is what is changed when one issues CommandCounterIncrement within
a transaction.  And you need somewhere to save which CCI step deletes rows,
in particular due to the savepoint feature.

David J.


Re: Re: Why the lp_len is 28 not 32?

2023-02-26 Thread David G. Johnston
On Sun, Feb 26, 2023 at 8:11 AM jack...@gmail.com  wrote:

>
> *From:* Tomas Vondra 
>
> > +++
> >   1 |   8160 | 28 | \x0100
> >
> 
>
> Pretty sure this is because we align the data to MAXALIGN, and on x86_64
> that's 8 bytes. 28 is not a multiple of 8 while 32 is.
>
> >> Yes, so it should be 32 bytes, not 28 bytes, but the SQL result is 28!!
> >> That's false
>
>
No, that is a definition not matching your expectation.  Are you trying to
demonstrate a bug here or just observing that your intuition of this didn't
work here?

The content doesn't include alignment padding.  The claim isn't that the
size is "the number of bytes consumed in some place within the page" but
rather the size is "the number of bytes needed to get the content required
to be passed into the input function for the datatype".  Nicely, it is
trivial to then align the value to figure out the consumed width.  If you
just have the aligned size you would never know how many bytes you need for
the data value.

David J.


Re: Re: Give me more details of some bits in infomask!!

2023-02-26 Thread jack...@gmail.com
 
From: Tomas Vondra
Date: 2023-02-26 23:23
To: jack...@gmail.com; pgsql-hackers
Subject: Re: Give me more details of some bits in infomask!!
On 2/26/23 15:30, jack...@gmail.com wrote:
> here are the source codes from src/include/access/htup_details.h.
> /*
>  * information stored in t_infomask:
>  */
> #define HEAP_HASNULL            0x0001  /* has null attribute(s) */
> #define HEAP_HASVARWIDTH        0x0002  /* has variable-width attribute(s) */
> #define HEAP_HASEXTERNAL        0x0004  /* has external stored attribute(s) */
> #define HEAP_HASOID_OLD         0x0008  /* has an object-id field */
> #define HEAP_XMAX_KEYSHR_LOCK   0x0010  /* xmax is a key-shared locker */
> #define HEAP_COMBOCID           0x0020  /* t_cid is a combo CID */
> #define HEAP_XMAX_EXCL_LOCK     0x0040  /* xmax is exclusive locker */
> #define HEAP_XMAX_LOCK_ONLY     0x0080  /* xmax, if valid, is only a locker */
> 
> And I can't understand these attrs:
 
I suggest you try something like 'git grep HEAP_HASEXTERNAL' which shows
you where the flag is used, which should tell you what it means. These
short descriptions generally assume you know enough about the internals.
 
> 1. external stored attribute(s): what is this? Can you give a CREATE
> statement to show me?
 
external = value stored in a TOAST table
 
> 2. xmax is a key-shared locker/exclusive locker/only a locker, so how
> do you use this? Can you give me a scenario?
> Let me try to explain it:
>  If there is a txn trying to read this heap tuple,
> the HEAP_XMAX_KEYSHR_LOCK bit will be set to 1.
>  If there is a txn trying to delete/update this heap tuple,
> the HEAP_XMAX_EXCL_LOCK bit will be set to 1.
>  But HEAP_XMAX_LOCK_ONLY I can't understand.
> And another thought is that at most one of these three bits can be
> set to 1.
 
I believe HEAP_XMAX_LOCK_ONLY means the xmax transaction only locked the
tuple, without deleting/updating it.
 
> 3. t_cid is a combo CID? What's a CID? Give me an example, please.
 
CID means "command ID" i.e. sequential ID assigned to commands in a
single session (for visibility checks, so that a query doesn't see data
deleted by earlier commands in the same session). See
src/backend/utils/time/combocid.c for basic explanation of what "combo
CID" is.
 
 
regards
 
-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


> I believe HEAP_XMAX_LOCK_ONLY means the xmax transaction only locked the
> tuple, without deleting/updating it.
If so, you mean when I read this tuple, this bit will be set to 1, but I think
this duplicates HEAP_XMAX_KEYSHR_LOCK.

> CID means "command ID" i.e. sequential ID assigned to commands in a
> single session (for visibility checks, so that a query doesn't see data
> deleted by earlier commands in the same session). See
> src/backend/utils/time/combocid.c for basic explanation of what "combo
> CID" is.
I think if the CID is used for visibility checks in one session, that's
meaningless, because we can use the t_xmin and t_xmax to
achieve this goal. Is this


Re: Why the lp_len is 28 not 32?

2023-02-26 Thread Tomas Vondra
On 2/26/23 16:11, jack...@gmail.com wrote:

> 
> Yes, so it should be 32 bytes, not 28 bytes, but the SQL result is 28!!
> That's false

No. The tuple is 28 bytes long, and that's what's stored in lp_len. But
we align the start of the tuple to a multiple of 8 bytes. So it's at
offset 8160 because that's the closest multiple of 8. Then there's 28
bytes of data and then 4 empty bytes.
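
A quick sketch of that arithmetic (the macro below mirrors 8-byte MAXALIGN
rounding; the names are invented for illustration):

#include <stdio.h>

/* round LEN up to the next multiple of 8, as MAXALIGN does on x86_64 */
#define MAXALIGN8(LEN) (((unsigned long) (LEN) + 7) & ~(unsigned long) 7)

int
main(void)
{
    /* lp_len stores the raw tuple length; the page slot is padded */
    printf("lp_len=28 padded=%lu\n", MAXALIGN8(28));    /* padded=32 */
    return 0;
}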

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Give me more details of some bits in infomask!!

2023-02-26 Thread Tomas Vondra
On 2/26/23 15:30, jack...@gmail.com wrote:
> here are the source codes from src/include/access/htup_details.h.
> /*
>  * information stored in t_infomask:
>  */
> #define HEAP_HASNULL            0x0001  /* has null attribute(s) */
> #define HEAP_HASVARWIDTH        0x0002  /* has variable-width attribute(s) */
> #define HEAP_HASEXTERNAL        0x0004  /* has external stored attribute(s) */
> #define HEAP_HASOID_OLD         0x0008  /* has an object-id field */
> #define HEAP_XMAX_KEYSHR_LOCK   0x0010  /* xmax is a key-shared locker */
> #define HEAP_COMBOCID           0x0020  /* t_cid is a combo CID */
> #define HEAP_XMAX_EXCL_LOCK     0x0040  /* xmax is exclusive locker */
> #define HEAP_XMAX_LOCK_ONLY     0x0080  /* xmax, if valid, is only a locker */
> 
> And I can't understand these attrs:

I suggest you try something like 'git grep HEAP_HASEXTERNAL' which shows
you where the flag is used, which should tell you what it means. These
short descriptions generally assume you know enough about the internals.

> 1. external stored attribute(s): what is this? Can you give a CREATE
> statement to show me?

external = value stored in a TOAST table

> 2. xmax is a key-shared locker/exclusive locker/only a locker, so how
> do you use this? Can you give me a scenario?
> Let me try to explain it:
>  If there is a txn trying to read this heap tuple,
> the HEAP_XMAX_KEYSHR_LOCK bit will be set to 1.
>  If there is a txn trying to delete/update this heap tuple,
> the HEAP_XMAX_EXCL_LOCK bit will be set to 1.
>  But HEAP_XMAX_LOCK_ONLY I can't understand.
> And another thought is that at most one of these three bits can be
> set to 1.

I believe HEAP_XMAX_LOCK_ONLY means the xmax transaction only locked the
tuple, without deleting/updating it.

> 3. t_cid is a combo CID? What's a CID? Give me an example, please.

CID means "command ID" i.e. sequential ID assigned to commands in a
single session (for visibility checks, so that a query doesn't see data
deleted by earlier commands in the same session). See
src/backend/utils/time/combocid.c for basic explanation of what "combo
CID" is.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Re: Why the lp_len is 28 not 32?

2023-02-26 Thread jack...@gmail.com
 
From: Tomas Vondra
Date: 2023-02-26 23:07
To: jack...@gmail.com; pgsql-hackers
Subject: Re: Why the lp_len is 28 not 32?
On 2/26/23 15:35, jack...@gmail.com wrote:
> use these sqls below:
> create table t(a int);
> insert into t values(1);
> select lp,lp_off,lp_len,t_data from heap_page_items(get_raw_page('t',0));
>  lp | lp_off | lp_len |   t_data   
> +++
>   1 |   8160 | 28 | \x0100
> 
 
Pretty sure this is because we align the data to MAXALIGN, and on x86_64
that's 8 bytes. 28 is not a multiple of 8 while 32 is.
 
regards
 
-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Yes, so it should be 32 bytes, not 28 bytes, but the SQL result is 28!!
That's false
-
jack...@gmail.com;


Re: Why the lp_len is 28 not 32?

2023-02-26 Thread Tomas Vondra
On 2/26/23 15:35, jack...@gmail.com wrote:
> use these sqls below:
> create table t(a int);
> insert into t values(1);
> select lp,lp_off,lp_len,t_data from heap_page_items(get_raw_page('t',0));
>  lp | lp_off | lp_len |   t_data   
> +++
>   1 |   8160 |     28 | \x0100
> 

Pretty sure this is because we align the data to MAXALIGN, and on x86_64
that's 8 bytes. 28 is not a multiple of 8 while 32 is.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add LZ4 compression in pg_dump

2023-02-26 Thread Tomas Vondra



On 2/25/23 06:02, Justin Pryzby wrote:
> I have some fixes (attached) and questions while polishing the patch for
> zstd compression.  The fixes are small and could be integrated with the
> patch for zstd, but could be applied independently.
> 
> - I'm unclear about get_error_func().  That's called in three places
>   from pg_backup_directory.c, after failures from write_func(), to
>   supply an compression-specific error message to pg_fatal().  But it's
>   not being used outside of directory format, nor for errors for other
>   function pointers, or even for all errors in write_func().  Is there
>   some reason why each compression method's write_func() shouldn't call
>   pg_fatal() directly, with its compression-specific message ?
> 

I think there are a couple more places that might/should call
get_error_func(). For example ahwrite() in pg_backup_archiver.c now
simply does

if (bytes_written != size * nmemb)
WRITE_ERROR_EXIT;

but perhaps it should call get_error_func() too. There are probably
other places that call write_func() and should use get_error_func().

> - I still think supports_compression() should be renamed, or made into a
>   static function in the necessary file.  The main reason is that it's
>   more clear what it indicates - whether compression is "implemented by
>   pgdump" and not whether compression is "supported by this postgres
>   build".  It also seems possible that we'd want to add a function
>   called something like supports_compression(), indicating whether the
>   algorithm is supported by the current build.  It'd be better if pgdump
>   didn't subjugate that name.
> 

If we choose to rename this to have pgdump_ prefix, fine with me. But I
don't think there's a realistic chance of conflict, as it's restricted
to pgdump header etc. And it's not part of an API, so I guess we could
rename that in the future if needed.

> - Finally, the "Nothing to do in the default case" comment comes from
>   Michael's commit 5e73a6048:
> 
> +   /*
> +* Custom and directory formats are compressed by default with gzip 
> when
> +* available, not the others.
> +*/
> +   if ((archiveFormat == archCustom || archiveFormat == archDirectory) &&
> +   !user_compression_defined)
> {
>  #ifdef HAVE_LIBZ
> -   if (archiveFormat == archCustom || archiveFormat == 
> archDirectory)
> -   compressLevel = Z_DEFAULT_COMPRESSION;
> -   else
> +   parse_compress_specification(PG_COMPRESSION_GZIP, NULL,
> +
> &compression_spec);
> +#else
> +   /* Nothing to do in the default case */
>  #endif
> -   compressLevel = 0;
> }
> 
> 
> As the comment says: for -Fc and -Fd, the compression is set to zlib, if
> enabled, and when not otherwise specified by the user.
> 
> Before 5e73a6048, this set compressLevel=0 for -Fp and -Ft, *and* when
> zlib was unavailable.
> 
> But I'm not sure why there's now an empty "#else".  I also don't know
> what "the default case" refers to.
> 
> Maybe the best thing here is to move the preprocessor #if, since it's no
> longer in the middle of a runtime conditional:
> 
>  #ifdef HAVE_LIBZ
> +   if ((archiveFormat == archCustom || archiveFormat == archDirectory) &&
> +   !user_compression_defined)
> +   parse_compress_specification(PG_COMPRESSION_GZIP, NULL,
> +&compression_spec);
>  #endif
> 
> ...but that elicits a warning about "variable set but not used"...
> 

Not sure, I need to think about this a bit.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Why the lp_len is 28 not 32?

2023-02-26 Thread jack...@gmail.com
use these sqls below:
create table t(a int);
insert into t values(1);
select lp,lp_off,lp_len,t_data from heap_page_items(get_raw_page('t',0));
 lp | lp_off | lp_len |   t_data   
+++
  1 |   8160 | 28 | \x0100

jack...@gmail.com


Give me more details of some bits in infomask!!

2023-02-26 Thread jack...@gmail.com
here are the source codes from src/include/access/htup_details.h.
/*
 * information stored in t_infomask:
 */
#define HEAP_HASNULL 0x0001 /* has null attribute(s) */
#define HEAP_HASVARWIDTH 0x0002 /* has variable-width attribute(s) */
#define HEAP_HASEXTERNAL 0x0004 /* has external stored attribute(s) */
#define HEAP_HASOID_OLD 0x0008 /* has an object-id field */
#define HEAP_XMAX_KEYSHR_LOCK 0x0010 /* xmax is a key-shared locker */
#define HEAP_COMBOCID 0x0020 /* t_cid is a combo CID */
#define HEAP_XMAX_EXCL_LOCK 0x0040 /* xmax is exclusive locker */
#define HEAP_XMAX_LOCK_ONLY 0x0080 /* xmax, if valid, is only a locker */

And I can't understand these attributes:
1. Externally stored attribute(s): what is this? Can you give a CREATE
statement that shows it?
2. "xmax is a key-shared locker / exclusive locker / only a locker": how are
these used? Can you give me a scenario?
Let me try to explain it:
 If a transaction is trying to read this heap tuple, the
HEAP_XMAX_KEYSHR_LOCK bit will be set to 1.
 If a transaction is trying to delete/update this heap tuple, the
HEAP_XMAX_EXCL_LOCK bit will be set to 1.
 But I can't understand HEAP_XMAX_LOCK_ONLY.
Another thought is that at most one of these three bits can be set to 1.
3. t_cid is a combo CID? What is a CID? Please give me an example.
--
jack...@gmail.com
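
For reference, here is roughly what these bits mean in practice.
HEAP_HASEXTERNAL is set when at least one column value is stored out of
line in the table's TOAST relation, e.g. after inserting a very long text
value into a text column. The lock bits are set by explicit row-level
locks, not by plain reads: SELECT ... FOR KEY SHARE sets
HEAP_XMAX_KEYSHR_LOCK, SELECT ... FOR UPDATE sets HEAP_XMAX_EXCL_LOCK, and
a share lock (SELECT ... FOR SHARE) is encoded with both bits set at once,
so the bits are not mutually exclusive. HEAP_XMAX_LOCK_ONLY means the
transaction in xmax only locked the row and did not delete or update it. A
CID is the command counter within a transaction; a combo CID stands in for
both cmin and cmax when the same transaction creates and later
updates/deletes the same tuple. A small sketch that decodes the
lock-related bits (decode_xmax_bits is an invented helper, not PostgreSQL
code):

#include <stdio.h>

/* Bit values copied from the htup_details.h excerpt above. */
#define HEAP_XMAX_KEYSHR_LOCK 0x0010
#define HEAP_XMAX_EXCL_LOCK   0x0040
#define HEAP_XMAX_LOCK_ONLY   0x0080

/* Invented helper: report what a t_infomask value says about xmax. */
static void
decode_xmax_bits(unsigned int infomask)
{
    unsigned int shr = HEAP_XMAX_KEYSHR_LOCK | HEAP_XMAX_EXCL_LOCK;

    if ((infomask & shr) == shr)
        puts("share lock (e.g. SELECT ... FOR SHARE)");
    else if (infomask & HEAP_XMAX_KEYSHR_LOCK)
        puts("key-share lock (e.g. SELECT ... FOR KEY SHARE)");
    else if (infomask & HEAP_XMAX_EXCL_LOCK)
        puts("exclusive lock (e.g. SELECT ... FOR UPDATE, or UPDATE/DELETE)");

    if (infomask & HEAP_XMAX_LOCK_ONLY)
        puts("xmax only locked the row; it did not delete or update it");
}

int
main(void)
{
    /* What SELECT ... FOR KEY SHARE would leave behind. */
    decode_xmax_bits(HEAP_XMAX_KEYSHR_LOCK | HEAP_XMAX_LOCK_ONLY);
    return 0;
}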


Re: locale/encoding vs vcregress.pl installcheck

2023-02-26 Thread Andrew Dunstan


On 2023-02-25 Sa 12:13, Andrew Dunstan wrote:


vcregress's installcheck_internal has "--encoding=SQL_ASCII 
--no-locale" hardcoded. It's been like that for a long time, for no 
good reason that I can see. The practical effect is to make it well 
nigh impossible to run the regular regression tests against any other 
encoding/locale. This in turn has apparently masked an issue with the 
collate.windows.win1252 test, which only runs on a WIN1252-encoded 
database.


I propose simply to remove those settings for the installcheck target. 
We already run the regression tests under these conditions in 
'vcregress.pl check', so we wouldn't be giving up anything important. 
Although this particular test is only present in HEAD, I think we 
should backpatch the change to all live branches.


(Yes, I know we are trying to get rid of these tools, but we haven't 
done so yet. I'm working on it for the buildfarm, which is how I 
discovered this issue.)





Done.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com


Re: pg_stat_statements and "IN" conditions

2023-02-26 Thread Dmitry Dolgov
> On Thu, Feb 23, 2023 at 09:48:35AM +0100, David Geier wrote:
> Hi,
>
> > > Seems like supporting only constants is a good starting
> > > point. The only thing that is likely confusing for users is that NUMERICs
> > > (and potentially constants of other types) are unsupported. Wouldn't it be
> > > fairly simple to support them via something like the following?
> > >
> > >      is_const(element) || (is_coercion(element) && is_const(element->child))
> > It definitely makes sense to implement that, although I don't think it's
> > going to be acceptable to do that via directly listing conditions an
> > element has to satisfy. It probably has to be more flexible, since we
> > would like to extend it in the future. My plan is to address this in a
> > follow-up patch, when the main mechanism is approved. Would you agree
> > with this approach?
>
> I still think it's counterintuitive, and I'm pretty sure people would even
> report this as a bug because, not knowing about the difference in internal
> representation, they would expect NUMERICs to work the same way other
> constants work. If anything, we would have to document it.
>
> Can't we do something pragmatic and have something like
> IsMergableInElement() which for now only supports constants and later can be
> extended? Or what exactly do you mean by "more flexible"?

Here is how I see it (please correct me if I'm wrong at any point). To
support numerics as presented in the tests from this patch, we have to
deal with FuncExpr (the function converting a value into a numeric).
With only numerics in mind, we would need to filter out any other
FuncExpr (which already sounds dubious). Then we would need to validate,
for example, that the function is immutable and has constant arguments,
which is already implemented in evaluate_function and is part of
eval_const_expressions. There is nothing special about numerics at this
point, and this approach leads us back to eval_const_expressions to a
certain degree. Do you see any other way?
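
For concreteness, the pragmatic check might look roughly like the
following (a sketch only: IsMergeableElement is an invented name, and a
real version would still need the immutability validation mentioned
above):

#include "postgres.h"
#include "nodes/primnodes.h"
#include "nodes/pg_list.h"

/*
 * Sketch of the "constant, or coercion of constants" test.  A production
 * version would also have to verify that the coercion function is
 * immutable, which is what evaluate_function()/eval_const_expressions()
 * already handle.
 */
static bool
IsMergeableElement(Node *element)
{
    if (IsA(element, Const))
        return true;

    if (IsA(element, FuncExpr))
    {
        FuncExpr   *func = (FuncExpr *) element;
        ListCell   *lc;

        /* accept only nodes that represent casts, not arbitrary calls */
        if (func->funcformat != COERCE_IMPLICIT_CAST &&
            func->funcformat != COERCE_EXPLICIT_CAST)
            return false;

        foreach(lc, func->args)
        {
            if (!IsA(lfirst(lc), Const))
                return false;
        }
        return true;
    }

    return false;
}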

I'm thinking about Michael's idea in this context, and want to see if it
would be possible to make the mechanism more flexible using some node
attributes. But I see that only as a follow-up step, not a prerequisite.




Re: MERGE ... RETURNING

2023-02-26 Thread Dean Rasheed
On Fri, 24 Feb 2023 at 05:46, Dean Rasheed  wrote:
>
> Rebased version attached.
>

Another rebase.

Regards,
Dean
diff --git a/doc/src/sgml/dml.sgml b/doc/src/sgml/dml.sgml
new file mode 100644
index cbbc5e2..ff2a827
--- a/doc/src/sgml/dml.sgml
+++ b/doc/src/sgml/dml.sgml
@@ -283,10 +283,15 @@ DELETE FROM products;
RETURNING
   
 
+  
+   MERGE
+   RETURNING
+  
+
   
Sometimes it is useful to obtain data from modified rows while they are
being manipulated.  The INSERT, UPDATE,
-   and DELETE commands all have an
+   DELETE, and MERGE commands all have an
optional RETURNING clause that supports this.  Use
of RETURNING avoids performing an extra database query to
collect the data, and is especially valuable when it would otherwise be
@@ -339,6 +344,24 @@ DELETE FROM products
 
   
 
+  
+   In a MERGE, the data available to RETURNING is
+   the content of the source row plus the content of the inserted, updated, or
+   deleted target row.  In addition, the merge support functions
+   pg_merge_action() and pg_merge_when_clause()
+   may be used to return information about which merge action was executed for
+   each row.  Since it is quite common for the source and target to have many
+   of the same columns, specifying RETURNING * can lead to a
+   lot of duplicated columns, so it is often more useful to just return the
+   target row.  For example:
+
+MERGE INTO products p USING new_products n ON p.product_no = n.product_no
+  WHEN NOT MATCHED THEN INSERT VALUES (n.product_no, n.name, n.price)
+  WHEN MATCHED THEN UPDATE SET name = n.name, price = n.price
+  RETURNING pg_merge_action(), p.*;
+
+  
+
   
If there are triggers () on the target table,
the data available to RETURNING is the row as modified by
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
new file mode 100644
index 0cbdf63..224e9b8
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -21216,6 +21216,99 @@ SELECT count(*) FROM sometable;
 
  
 
+ 
+  Merge Support Functions
+
+  
+   MERGE
+   RETURNING
+  
+
+  
+   The merge support functions shown in
+   the following table may be used in the
+   RETURNING list of a MERGE command
+   to return additional information about the action taken for each row.
+  
+
+  
+Merge Support Functions
+
+
+ 
+  
+   
+Function
+   
+   
+Description
+   
+  
+ 
+
+ 
+  
+   
+
+ pg_merge_action
+
+pg_merge_action ( )
+text
+   
+   
+Returns the action performed on the current row ('INSERT',
+'UPDATE', or 'DELETE').
+   
+  
+
+  
+   
+
+ pg_merge_when_clause
+
+pg_merge_when_clause ( )
+integer
+   
+   
+Returns the 1-based index of the WHEN clause executed
+for the current row.
+   
+  
+ 
+
+  
+
+  
+   Example:
+
+  
+
+  
+   Note that it is an error to use these functions anywhere other than the
+   RETURNING list of a MERGE command.
+  
+
+ 
+
  
   Subquery Expressions
 
diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml
new file mode 100644
index 7c01a54..4317cfc
--- a/doc/src/sgml/glossary.sgml
+++ b/doc/src/sgml/glossary.sgml
@@ -1323,9 +1323,9 @@
  to a client upon the
  completion of an SQL command, usually a
  SELECT but it can be an
- INSERT, UPDATE, or
- DELETE command if the RETURNING
- clause is specified.
+ INSERT, UPDATE,
+ DELETE, or MERGE command if the
+ RETURNING clause is specified.
 
 
  The fact that a result set is a relation means that a query can be used
diff --git a/doc/src/sgml/plpgsql.sgml b/doc/src/sgml/plpgsql.sgml
new file mode 100644
index 7c8a49f..b839695
--- a/doc/src/sgml/plpgsql.sgml
+++ b/doc/src/sgml/plpgsql.sgml
@@ -1020,8 +1020,8 @@ INSERT INTO mytable VALUES (1,'one'), (2
 
 
 
- If the command does return rows (for example SELECT,
- or INSERT/UPDATE/DELETE
+ If the command does return rows (for example SELECT, or
+ INSERT/UPDATE/DELETE/MERGE
  with RETURNING), there are two ways to proceed.
  When the command will return at most one row, or you only care about
  the first row of output, write the command as usual but add
@@ -1149,6 +1149,7 @@
SELECT select_expressions INTO STRICT target FROM ...;
INSERT ... RETURNING expressions INTO STRICT target;
 UPDATE ... RETURNING expressions INTO STRICT target;
 DELETE ... RETURNING expressions INTO STRICT target;
+MERGE ... RETURNING expressions INTO STRICT target;
 
 
  where target can be a record variable, a row
@@ -1159,8 +1160,8 @@ DELETE ... RETURNING expres
  INTO clause) just as described above,
  and the plan is cached in the same way.
  This works for SELECT,
- INSERT/UPDATE/DELETE with
- RETURNING, and certain utility commands
+ INSERT/UPDATE/DELETE/MERGE
+ with RETURNING, and certain utility commands
  that return row sets, such as EXPLAIN.
  Except for the INTO clause, the SQL command is the same
 

Re: Doc updates for MERGE

2023-02-26 Thread Dean Rasheed
On Fri, 24 Feb 2023 at 09:28, Dean Rasheed  wrote:
>
> On Fri, 24 Feb 2023 at 08:56, Alvaro Herrera  wrote:
> >
> > I assume you're proposing to back-patch this.
>
> Yes. Will do.
>

Done.

Regards,
Dean