Re: Should CSV parsing be stricter about mid-field quotes?

2023-07-01 Thread Noah Misch
On Sat, May 20, 2023 at 09:16:30AM +0200, Joel Jacobson wrote:
> On Fri, May 19, 2023, at 18:06, Daniel Verite wrote:
> > COPY FROM file CSV somewhat differs as your example shows,
> > but it still mishandles \. when unquoted. For instance, consider this
> > file to load with COPY  t FROM '/tmp/t.csv' WITH CSV
> > $ cat /tmp/t.csv
> > line 1
> > \.
> > line 3
> > line 4
> >
> > It results in having only "line 1" being imported.
> 
> Hmm, this is a problem for one of the new use-cases I brought up that would be
> possible with DELIMITER NONE QUOTE NONE, i.e. to import unstructured log 
> files,
> where each raw line should be imported "as is" into a single text column.
> 
> Is there a valid reason why \. is needed for COPY FROM filename?

No.

> It seems to me it would only be necessary for the COPY FROM STDIN case,
> since files have a natural end-of-file and a known file size.

Right.  Even for COPY FROM STDIN, it's not needed anymore since the removal of
protocol v2.  psql would still use it to find the end of inline COPY data,
though.  Here's another relevant thread:
https://postgr.es/m/flat/bfcd57e4-8f23-4c3e-a5db-2571d09208e2%40beta.fastmail.com
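
For illustration, here is a minimal sketch of the inline-data case where psql
still relies on the terminator (hypothetical table t, text format for
simplicity):

CREATE TABLE t (line text);
COPY t FROM STDIN;
line 1
line 2
\.

With COPY ... FROM a file, by contrast, end-of-file already marks the end of
the data.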




Re: Incremental View Maintenance, take 2

2023-07-01 Thread jian he
ok. Now I really found a small bug.

this works as intended:
BEGIN;
CREATE INCREMENTAL MATERIALIZED VIEW test_ivm AS SELECT i, MIN(j) as
min_j  FROM mv_base_a group by 1;
INSERT INTO mv_base_a select 1,-2 where false;
rollback;

however the following one:
BEGIN;
CREATE INCREMENTAL MATERIALIZED VIEW test_ivm1 AS SELECT MIN(j) as
min_j  FROM mv_base_a;
INSERT INTO mv_base_a  select 1, -2 where false;
rollback;

will evaluate
tuplestore_tuple_count(new_tuplestores) to 1, so it walks through the
IVM_immediate_maintenance function to apply_delta.
But shouldn't it be zero?




Re: possible bug in handling of contrecords in dd38ff28ad (Fix recovery_prefetch with low maintenance_io_concurrency)

2023-07-01 Thread Thomas Munro
On Sun, Jul 2, 2023 at 1:40 AM Tomas Vondra wrote:
> I think there's some sort of bug in how dd38ff28ad deals with
> contrecords. Consider something as simple as
>
>   pgbench -i -s 100
>
> and then doing pg_waldump on the WAL segments, I get this for every
> single one:
>
>   pg_waldump: error: error in WAL record at 0/198: missing
>  contrecord at 0/1E0
>
> This only happens since dd38ff28ad, and revert makes it disappear.
>
> It's possible we still have some issue with contrecords, but IIUC we
> fixed those. So unless there's some unknown one (and considering this
> seems to happen for *every* WAL segment that's hard to believe), this
> seems more like an issue in the error detection.

Yeah.  That message is due to this part of dd38ff28ad's change:

Also add an explicit error message for missing contrecords.  It was a
bit strange that we didn't report an error already, and became a latent
bug with prefetching, since the internal state that tracks aborted
contrecords would not survive retrying, as revealed by
026_overwrite_contrecord.pl with this adjustment.  Reporting an error
prevents that.

We can change 'missing contrecord' back to silent end-of-decoding (as
it was in 14) with the attached.  Here [1] is some analysis of this
error that I wrote last year.  The reason for my hesitation in pushing
a fix was something like this:

1.  For pg_waldump, it's "you told me to decode only this WAL segment,
so decoding failed here", which is useless noise
2.  For walsender, it's "you told me to shut down, so decoding failed
here", which is useless noise
3.  For crash recovery, "I ran out of data, so decoding failed here",
which seems like a report-worthy condition, I think?

When I added that new error I was thinking about that third case.  We
generally expect to detect the end of WAL replay after a crash with an
error ("invalid record length ...: wanted 24, got 0" + several other
possibilities), but in this rare case it would be silent.  The effect
on the first two cases is cosmetic, but certainly annoying.  Perhaps I
should go ahead and back-patch the attached change, and then we can
discuss whether/how we should do a better job of distinguishing "user
requested artificial end of decoding" from "unexpectedly ran out of
data" for v17?

[1] 
https://www.postgresql.org/message-id/ca+hukg+wkszpdoryeqm8_rk5uqpcqs2hgy92wimgfsk2wvk...@mail.gmail.com
From a6fbff139d27f53cf39521e0e78b734c59fd8211 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Sun, 2 Jul 2023 13:07:17 +1200
Subject: [PATCH] Silence "missing contrecord" error.

Commit dd38ff28ad added a new error message "missing contrecord" when
decoding fails at a contrecord.  Unfortunately that caused noisy
messages to be logged by pg_waldump at end of segment, and walsender
when asked to shut down on a segment boundary.

Remove the new error message, so that this condition signals end-of-
WAL without an error.  It's arguably a reportable condition that
should not be silenced while performing crash recovery, but fixing that
without introducing noise in the other cases will require more research.

Back-patch to 15.

Reported-by: Tomas Vondra 
Discussion: https://postgr.es/m/6a1df56e-4656-b3ce-4b7a-a9cb41df8189%40enterprisedb.com
---
 src/backend/access/transam/xlogreader.c | 12 
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 2e7b1ba8e1..c9f9f6e98f 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -914,15 +914,11 @@ err:
 		state->missingContrecPtr = targetPagePtr;
 
 		/*
-		 * If we got here without reporting an error, report one now so that
-		 * XLogPrefetcherReadRecord() doesn't bring us back a second time and
-		 * clobber the above state.  Otherwise, the existing error takes
-		 * precedence.
+		 * If we got here without reporting an error, make sure an error is
+		 * queued so that XLogPrefetcherReadRecord() doesn't bring us back a
+		 * second time and clobber the above state.
 		 */
-		if (!state->errormsg_buf[0])
-			report_invalid_record(state,
-  "missing contrecord at %X/%X",
-  LSN_FORMAT_ARGS(RecPtr));
+		state->errormsg_deferred = true;
 	}
 
 	if (decoded && decoded->oversized)
-- 
2.40.1



Re: Incremental View Maintenance, take 2

2023-07-01 Thread jian he
This is probably not trivial.
In the function apply_new_delta_with_count:

 appendStringInfo(,
"WITH updt AS (" /* update a tuple if this exists in the view */
"UPDATE %s AS mv SET %s = mv.%s OPERATOR(pg_catalog.+) diff.%s "
"%s " /* SET clauses for aggregates */
"FROM %s AS diff "
"WHERE %s " /* tuple matching condition */
"RETURNING %s" /* returning keys of updated tuples */
") INSERT INTO %s (%s)" /* insert a new tuple if this doesn't existw */
"SELECT %s FROM %s AS diff "
"WHERE NOT EXISTS (SELECT 1 FROM updt AS mv WHERE %s);",

-
") INSERT INTO %s (%s)" /* insert a new tuple if this doesn't existw */
"SELECT %s FROM %s AS diff "

the INSERT INTO line should end with a white space, shouldn't it?
Also, "existw" should be "exists".


Re: Adding SHOW CREATE TABLE

2023-07-01 Thread Kirk Wolak
On Fri, Jun 30, 2023 at 1:56 PM Kirk Wolak  wrote:

> On Wed, Jun 21, 2023 at 8:52 PM Kirk Wolak  wrote:
>
>> On Mon, Jun 5, 2023 at 7:43 AM Jelte Fennema  wrote:
>>
>>> On Thu, 1 Jun 2023 at 18:57, Kirk Wolak  wrote:
>>>
>>
I definitely have the questions from the previous email, but I CERTAINLY
appreciate this output.
(I don't like the +, but pg_get_viewdef() presents its output the same way.)
Will psql doing \st pg_class be able to just call/output this so that the
output is nice and clean?

At this point I will keep pressing forward, cleaning things up, and then
send a patch for others to play with.
(Probably bad timing with V16 wrapping up.)



*select pg_get_tabledef('pg_class'::regclass);*
pg_get_tabledef

CREATE TABLE pg_class (oid oid NOT NULL,+
 relname name NOT NULL COLLATE "C", +
 relnamespace oid NOT NULL, +
 reltype oid NOT NULL,  +
 reloftype oid NOT NULL,+
 relowner oid NOT NULL, +
 relam oid NOT NULL,+
 relfilenode oid NOT NULL,  +
 reltablespace oid NOT NULL,+
 relpages integer NOT NULL, +
 reltuples real NOT NULL,   +
 relallvisible integer NOT NULL,+
 reltoastrelid oid NOT NULL,+
 relhasindex boolean NOT NULL,  +
 relisshared boolean NOT NULL,  +
 relpersistence "char" NOT NULL,+
 relkind "char" NOT NULL,   +
 relnatts smallint NOT NULL,+
 relchecks smallint NOT NULL,   +
 relhasrules boolean NOT NULL,  +
 relhastriggers boolean NOT NULL,   +
 relhassubclass boolean NOT NULL,   +
 relrowsecurity boolean NOT NULL,   +
 relforcerowsecurity boolean NOT NULL,  +
 relispopulated boolean NOT NULL,   +
 relreplident "char" NOT NULL,  +
 relispartition boolean NOT NULL,   +
 relrewrite oid NOT NULL,   +
 relfrozenxid xid NOT NULL, +
 relminmxid xid NOT NULL,   +
 relacl aclitem[],  +
 reloptions text[] COLLATE "C", +
 relpartbound pg_node_tree COLLATE "C"  +
) USING heap


Re: RFC: pg_stat_logmsg

2023-07-01 Thread Joe Conway

On 6/30/23 23:20, Pavel Stehule wrote:
On Sat, Jul 1, 2023 at 1:57 AM Joe Conway wrote:

Part of the thinking is that people with fleets of postgres instances
can use this to scan for various errors that they care about.
Additionally it would be useful to look for "should not happen" errors.

I will register this in the July CF and will appreciate feedback.

This can be a very interesting feature. I like it.


Thanks!

FWIW, I just modified it to provide the localized text of the elevel 
rather than the internal number. I also localized the message format string:


8<--
psql (16beta2)
Type "help" for help.

test=# \x
Expanded display is on.
test=# select * from pg_stat_logmsg where elevel = 'ERROR' and 
sqlerrcode = 'XX000' and count > 1;

-[ RECORD 1 ]-
filename   | tablecmds.c
lineno | 10908
elevel | ERROR
funcname   | ATExecAlterConstraint
sqlerrcode | XX000
message| cannot alter constraint "%s" on relation "%s"
count  | 2
-[ RECORD 2 ]-
filename   | user.c
lineno | 2130
elevel | ERROR
funcname   | check_role_membership_authorization
sqlerrcode | XX000
message| role "%s" cannot have explicit members
count  | 2

test=# set lc_messages ='sv_SE.UTF8';
SET
test=# select * from pg_stat_logmsg where elevel = 'FEL' and sqlerrcode 
= 'XX000' and count > 1;

-[ RECORD 1 ]-
filename   | tablecmds.c
lineno | 10908
elevel | FEL
funcname   | ATExecAlterConstraint
sqlerrcode | XX000
message| kan inte ändra villkoret "%s" i relation "%s"
count  | 2
-[ RECORD 2 ]-
filename   | user.c
lineno | 2130
elevel | FEL
funcname   | check_role_membership_authorization
sqlerrcode | XX000
message| rollen "%s" kan inte ha explicita medlemmar
count  | 2
8<--

New tarball attached.

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


pg_stat_logmsg-001.tgz
Description: application/compressed-tar


Re: ICU locale validation / canonicalization

2023-07-01 Thread Noah Misch
On Sat, May 20, 2023 at 10:19:30AM -0700, Jeff Davis wrote:
> On Tue, 2023-05-02 at 07:29 -0700, Noah Misch wrote:
> > On Thu, Mar 30, 2023 at 08:59:41AM +0200, Peter Eisentraut wrote:
> > > On 30.03.23 04:33, Jeff Davis wrote:
> > > > Attached is a new version of the final patch, which performs
> > > > canonicalization. I'm not 100% sure that it's wanted, but it
> > > > still
> > > > seems like a good idea to get the locales into a standard format
> > > > in the
> > > > catalogs, and if a lot more people start using ICU in v16
> > > > (because it's
> > > > the default), then it would be a good time to do it. But perhaps
> > > > there
> > > > are risks?
> > > 
> > > I say, let's do it.
> > 
> > The following is not cause for postgresql.git changes at this time,
> > but I'm
> > sharing it in case it saves someone else the study effort.  Commit
> > ea1db8a
> > ("Canonicalize ICU locale names to language tags.") slowed buildfarm
> > member
> > hoverfly, but that disappears if I drop debug_parallel_query from its
> > config.
> > Typical end-to-end duration rose from 2h5m to 2h55m.  Most-affected
> > were
> > installcheck runs, which rose from 11m to 19m.  (The "check" stage
> > uses
> > NO_LOCALE=1, so it changed less.)  From profiles, my theory is that
> > each of
> > the many parallel workers burns notable CPU and I/O opening its ICU
> > collator
> > for the first time.
> 
> I didn't repro the overall test timings (mine is ~1m40s compared to
> ~11-19m on hoverfly) but I think a microbenchmark on the ICU calls
> showed a possible cause.
> 
> I ran open in a loop 10M times on the requested locale. The root locale
> ("und"[1], "root" and "") take about 1.3s to open 10M times; simple
> locales like 'en' and 'fr-CA' and 'de-DE' are all a little shower at
> 3.3s.
> 
> Unrecognized locales like "xyz" take about 10 times as long: 13s to
> open 10M times, presumably to perform the fallback logic that
> ultimately opens the root locale. Not sure if 10X slower in the open
> path is enough to explain the overall test slowdown.
> 
> My guess is that the ICU locale for these tests is not recognized, or
> is some other locale that opens slowly. Can you tell me the actual
> daticulocale?

As of commit b8c3f6d, InstallCheck-C got daticulocale=en-US-u-va-posix.  Check
got daticulocale=NULL.
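
(For anyone wanting to reproduce this, the effective per-database setting can
be checked directly; the column is named daticulocale in v15/v16:)

SELECT datname, datlocprovider, daticulocale FROM pg_database;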

(The machine in question was unusable for PostgreSQL from 2023-05-12 to
2023-06-30, due to https://stackoverflow.com/q/76369660/16371536.  That
delayed my response.)




Re: Does a cancelled REINDEX CONCURRENTLY need to be messy?

2023-07-01 Thread Thom Brown
On Thu, 29 Jun 2023, 14:45 Álvaro Herrera,  wrote:

> ALTER TABLE DETACH CONCURRENTLY had to deal with this also, and it did it
> by having a COMPLETE option you can run later in case things got stuck the
> first time around. I suppose we could do something similar, where the
> server automatically does the needful, whatever that is.
>

So there doesn't appear to be provision for deferred activities.

Could, perhaps, the fact that it is an invalid index with no locks on it,
and dependent on the table, mean it could be removed by a VACUUM?

I just don't like the idea of the user needing to remove broken things.
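
(For reference, the manual cleanup being discussed amounts to something like
this; the index name is hypothetical:)

-- find indexes left invalid by a cancelled REINDEX / CREATE INDEX CONCURRENTLY
SELECT indexrelid::regclass AS index_name, indrelid::regclass AS table_name
FROM pg_index
WHERE NOT indisvalid;

DROP INDEX CONCURRENTLY accounts_balance_idx_ccnew;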

Thom


Re: check_strxfrm_bug()

2023-07-01 Thread Noah Misch
On Wed, Jun 28, 2023 at 01:02:21PM +1200, Thomas Munro wrote:
> On Wed, Jun 28, 2023 at 11:03 AM Thomas Munro  wrote:
> > The GCC build farm has just received some SPARC hardware new enough to
> > run modern Solaris (hostname gcc106), so if wrasse were moved over
> > there we could finally assume all systems have POSIX 2008 (AKA
> > SUSv4)'s locale_t.
> 
> That would look something like this.

This removes thirty-eight ifdefs, most of them located in the middle of
function bodies.  That's far more beneficial than most proposals to raise
minimum requirements.  +1 for revoking support for wrasse's OS version.
(wrasse wouldn't move, but it would stop testing v17+.)




Re: Preventing non-superusers from altering session authorization

2023-07-01 Thread Joseph Koshakow
>> That might be a good change? If the original authenticated role ID no
>> longer exists then we may want to return an error when trying to set
>> your session authorization to that role.
>
> I was curious why we don't block DROP ROLE if there are active sessions
for
> the role or terminate any such sessions as part of the command, and I
found
> this discussion from 2016:
>
>https://postgr.es/m/flat/56E87CD8.60007%40ohmu.fi

Ah, that makes sense that we don't prevent DROP ROLE on active roles.
Though we do error when you try to set your role or session
authorization to a dropped role. So erroring on RESET SESSION
AUTHORIZATION when the original role is dropped makes it consistent
with SET SESSION AUTHORIZATION to that role. On the other
hand, it makes it inconsistent with RESET ROLE, which does not error on
a dropped role.
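
To spell out the inconsistency, a two-session sketch (role name alice is
hypothetical):

-- session 1: connect as alice (the authenticated role)

-- session 2 (superuser):
DROP ROLE alice;                    -- not blocked by alice's open session

-- back in session 1:
SET SESSION AUTHORIZATION alice;    -- ERROR:  role "alice" does not exist
RESET ROLE;                         -- no error
RESET SESSION AUTHORIZATION;        -- succeeds today, FATALs with the patch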

- Joe Koshakow

On Fri, Jun 23, 2023 at 1:54 PM Nathan Bossart wrote:

> On Thu, Jun 22, 2023 at 06:39:45PM -0400, Joseph Koshakow wrote:
> > On Wed, Jun 21, 2023 at 11:48 PM Nathan Bossart <
> nathandboss...@gmail.com>
> > wrote:
> >> I see that RESET SESSION AUTHORIZATION
> >> with a concurrently dropped role will FATAL with your patch but succeed
> >> without it, which could be part of the reason.
> >
> > That might be a good change? If the original authenticated role ID no
> > longer exists then we may want to return an error when trying to set
> > your session authorization to that role.
>
> I was curious why we don't block DROP ROLE if there are active sessions for
> the role or terminate any such sessions as part of the command, and I found
> this discussion from 2016:
>
> https://postgr.es/m/flat/56E87CD8.60007%40ohmu.fi
>
> --
> Nathan Bossart
> Amazon Web Services: https://aws.amazon.com
>


[PATCH] pgrowlocks: Make mode names consistent with docs

2023-07-01 Thread David Cook
I noticed that pgrowlocks will use different names for shared locks
depending on whether the locks are intermediated by a multixact or not.
Particularly, if a single transaction has locked a row, it may return "For
Key Share" or "For Share" in the "modes" array, while if multiple
transactions have locked a row, it may return "Key Share" or "Share". The
documentation of the pgrowlocks function only lists "Key Share" and "Share"
as possible modes. (The four exclusive lock modes use the correct names in
both cases)
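
To make the discrepancy easy to reproduce, a two-session sketch (table name
accounts is hypothetical):

-- session 1: a single transaction locks a row (no multixact involved)
BEGIN;
SELECT * FROM accounts FOR KEY SHARE;

-- session 2: before the patch this reports {"For Key Share"}, although the
-- documentation (and the multixact case) uses "Key Share"
SELECT locked_row, multi, modes FROM pgrowlocks('accounts');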

The attached patch (against the master branch) fixes this discrepancy, by
using "Key Share" and "Share" in the single transaction case, since that
matches the documentation. I also updated the test's expected output so it
passes again.

Thanks,
--David Cook
From dd7c5dcb0bd8457bd04f5ef762466f2b77209ad4 Mon Sep 17 00:00:00 2001
From: David Cook 
Date: Fri, 30 Jun 2023 09:22:20 -0500
Subject: [PATCH] pgrowlocks: Make mode names consistent with docs

---
 contrib/pgrowlocks/expected/pgrowlocks.out | 28 +++---
 contrib/pgrowlocks/pgrowlocks.c|  4 ++--
 2 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/contrib/pgrowlocks/expected/pgrowlocks.out b/contrib/pgrowlocks/expected/pgrowlocks.out
index 725467266a..625246711f 100644
--- a/contrib/pgrowlocks/expected/pgrowlocks.out
+++ b/contrib/pgrowlocks/expected/pgrowlocks.out
@@ -10,10 +10,10 @@ a|b
 (2 rows)
 
 step s2_rowlocks: SELECT locked_row, multi, modes FROM pgrowlocks('multixact_conflict');
-locked_row|multi|modes
---+-+-
-(0,1) |f|{"For Key Share"}
-(0,2) |f|{"For Key Share"}
+locked_row|multi|modes
+--+-+-
+(0,1) |f|{"Key Share"}
+(0,2) |f|{"Key Share"}
 (2 rows)
 
 step s1_commit: COMMIT;
@@ -28,10 +28,10 @@ a|b
 (2 rows)
 
 step s2_rowlocks: SELECT locked_row, multi, modes FROM pgrowlocks('multixact_conflict');
-locked_row|multi|modes
---+-+-
-(0,1) |f|{"For Share"}
-(0,2) |f|{"For Share"}
+locked_row|multi|modes  
+--+-+---
+(0,1) |f|{Share}
+(0,2) |f|{Share}
 (2 rows)
 
 step s1_commit: COMMIT;
@@ -111,10 +111,10 @@ a|b
 (2 rows)
 
 step s2_rowlocks: SELECT locked_row, multi, modes FROM pgrowlocks('multixact_conflict');
-locked_row|multi|modes
---+-+-
-(0,1) |f|{"For Key Share"}
-(0,2) |f|{"For Key Share"}
+locked_row|multi|modes
+--+-+-
+(0,1) |f|{"Key Share"}
+(0,2) |f|{"Key Share"}
 (2 rows)
 
 step s1_commit: COMMIT;
@@ -208,7 +208,7 @@ step s2_rowlocks: SELECT locked_row, multi, modes FROM pgrowlocks('multixact_con
 locked_row|multi|modes   
 --+-+
 (0,1) |t|{"Key Share",Update}
-(0,2) |f|{"For Key Share"}   
+(0,2) |f|{"Key Share"}   
 (2 rows)
 
 step s1_commit: COMMIT;
@@ -226,7 +226,7 @@ step s1_updateb: UPDATE multixact_conflict SET b = 11 WHERE b = 4;
 step s2_rowlocks: SELECT locked_row, multi, modes FROM pgrowlocks('multixact_conflict');
 locked_row|multi|modes
 --+-+-
-(0,1) |f|{"For Key Share"}
+(0,1) |f|{"Key Share"}
 (0,2) |t|{"Key Share","No Key Update"}
 (2 rows)
 
diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index c543277b7c..c2d66dda35 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -230,9 +230,9 @@ pgrowlocks(PG_FUNCTION_ARGS)
 if (infomask & HEAP_XMAX_LOCK_ONLY)
 {
 	if (HEAP_XMAX_IS_SHR_LOCKED(infomask))
-		snprintf(values[Atnum_modes], NCHARS, "{For Share}");
+		snprintf(values[Atnum_modes], NCHARS, "{Share}");
 	else if (HEAP_XMAX_IS_KEYSHR_LOCKED(infomask))
-		snprintf(values[Atnum_modes], NCHARS, "{For Key Share}");
+		snprintf(values[Atnum_modes], NCHARS, "{Key Share}");
 	else if (HEAP_XMAX_IS_EXCL_LOCKED(infomask))
 	{
 		if (tuple->t_data->t_infomask2 & HEAP_KEYS_UPDATED)
-- 
2.30.2



Re: ProcessStartupPacket(): database_name and user_name truncation

2023-07-01 Thread Drouvot, Bertrand

Hi,

On 6/30/23 7:32 PM, Drouvot, Bertrand wrote:

Hi,

On 6/30/23 5:54 PM, Tom Lane wrote:

Nathan Bossart  writes:

After taking another look at this, I wonder if it'd be better to fail as
soon as we see the database or user name is too long instead of lugging
them around when authentication is destined to fail.


If we're agreed that we aren't going to truncate these identifiers,
that seems like a reasonable way to handle it.



Yeah agree, thanks Nathan for the idea.
I'll work on a new patch version proposal.



Please find V2 attached; it fails as soon as the database name or
user name is detected as overlength.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From 7747032129fb66891805a8a2b5e06cbce8df2d2a Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot 
Date: Wed, 21 Jun 2023 18:28:13 +
Subject: [PATCH v2] Reject incoming username and database name in case of
 overlength

---
 src/backend/postmaster/postmaster.c | 25 -
 1 file changed, 16 insertions(+), 9 deletions(-)
 100.0% src/backend/postmaster/

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 4c49393fc5..03289f2093 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2183,9 +2183,25 @@ retry1:
valptr = buf + valoffset;
 
if (strcmp(nameptr, "database") == 0)
+   {
+   /* Overlength database name has been provided. */
+   if (strlen(valptr) >= NAMEDATALEN)
+   ereport(FATAL,
+   (errcode(ERRCODE_UNDEFINED_DATABASE),
+errmsg("database \"%s\" does not exist", valptr)));
+
port->database_name = pstrdup(valptr);
+   }
else if (strcmp(nameptr, "user") == 0)
+   {
+   /* Overlength user name has been provided. */
+   if (strlen(valptr) >= NAMEDATALEN)
+   ereport(FATAL,
+   (errcode(ERRCODE_INVALID_AUTHORIZATION_SPECIFICATION),
+errmsg("role \"%s\" does not exist", valptr)));
+
port->user_name = pstrdup(valptr);
+   }
else if (strcmp(nameptr, "options") == 0)
port->cmdline_options = pstrdup(valptr);
else if (strcmp(nameptr, "replication") == 0)
@@ -2290,15 +2306,6 @@ retry1:
}
}
 
-   /*
-* Truncate given database and user names to length of a Postgres name.
-* This avoids lookup failures when overlength names are given.
-*/
-   if (strlen(port->database_name) >= NAMEDATALEN)
-   port->database_name[NAMEDATALEN - 1] = '\0';
-   if (strlen(port->user_name) >= NAMEDATALEN)
-   port->user_name[NAMEDATALEN - 1] = '\0';
-
if (am_walsender)
MyBackendType = B_WAL_SENDER;
else
-- 
2.34.1



possible bug in handling of contrecords in dd38ff28ad (Fix recovery_prefetch with low maintenance_io_concurrency)

2023-07-01 Thread Tomas Vondra
Hi,

I think there's some sort of bug in how dd38ff28ad deals with
contrecords. Consider something as simple as

  pgbench -i -s 100

and then doing pg_waldump on the WAL segments, I get this for every
single one:

  pg_waldump: error: error in WAL record at 0/198: missing
 contrecord at 0/1E0

This only happens since dd38ff28ad, and revert makes it disappear.

It's possible we still have some issue with contrecords, but IIUC we
fixed those. So unless there's some unknown one (and considering this
seems to happen for *every* WAL segment that's hard to believe), this
seems more like an issue in the error detection.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: MERGE ... WHEN NOT MATCHED BY SOURCE

2023-07-01 Thread Dean Rasheed
On Tue, 21 Mar 2023 at 12:26, Alvaro Herrera  wrote:
>
> On 2023-Mar-21, Dean Rasheed wrote:
>
> > Looking at it with fresh eyes though, I realise that I could have just 
> > written
> >
> > action->qual = make_and_qual((Node *) ntest, action->qual);
> >
> > which is equivalent, but more concise.
>
> Nice.
>
> I have no further observations about this patch.
>

Looking at this one afresh, it seems that the change to make Vars
outer-join aware broke it -- the Var in the qual to test whether the
source row is null needs to be marked as nullable by the join added by
transform_MERGE_to_join(). That's something that needs to be done in
transform_MERGE_to_join(), so it makes more sense to add the new qual
there rather than in transformMergeStmt().

Also, now that MERGE has ruleutils support, it's clear that adding the
qual in transformMergeStmt() isn't right anyway, since it would then
appear in the deparsed output.

So attached is an updated patch doing that, which seems neater all
round, since adding the qual is closely related to the join-type
choice, which is now a decision taken entirely in
transform_MERGE_to_join(). This requires a new "mergeSourceRelation"
field on the Query structure, but as before, it does away with the
"mergeUseOuterJoin" field.

I've also updated the ruleutils support. In the absence of any WHEN
NOT MATCHED BY SOURCE actions, this will output not-matched actions
simply as "WHEN NOT MATCHED" for backwards compatibility, and to be
SQL-standard-compliant. If there are any WHEN NOT MATCHED BY SOURCE
actions though, I think it's preferable to output explicit "BY SOURCE"
and "BY TARGET" qualifiers for all not-matched actions, to make the
meaning clearer.
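
For reference, a complete statement using the new syntax looks something like
this (table and column names are hypothetical):

MERGE INTO target t
USING source s ON t.id = s.id
WHEN MATCHED THEN
  UPDATE SET val = s.val
WHEN NOT MATCHED BY SOURCE THEN
  DELETE
WHEN NOT MATCHED BY TARGET THEN
  INSERT (id, val) VALUES (s.id, s.val);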

Regards,
Dean
diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml
new file mode 100644
index f8f83d4..6ef0c2b
--- a/doc/src/sgml/mvcc.sgml
+++ b/doc/src/sgml/mvcc.sgml
@@ -396,8 +396,8 @@
 originally matched appears later in the list of actions.
 On the other hand, if the row is concurrently updated or deleted so
 that the join condition fails, then MERGE will
-evaluate the condition's NOT MATCHED actions next,
-and execute the first one that succeeds.
+evaluate the condition's NOT MATCHED [BY TARGET]
+actions next, and execute the first one that succeeds.
 If MERGE attempts an INSERT
 and a unique index is present and a duplicate row is concurrently
 inserted, then a uniqueness violation error is raised;
diff --git a/doc/src/sgml/ref/merge.sgml b/doc/src/sgml/ref/merge.sgml
new file mode 100644
index 0995fe0..8ef121a
--- a/doc/src/sgml/ref/merge.sgml
+++ b/doc/src/sgml/ref/merge.sgml
@@ -33,7 +33,8 @@ USING dat
 and when_clause is:
 
 { WHEN MATCHED [ AND condition ] THEN { merge_update | merge_delete | DO NOTHING } |
-  WHEN NOT MATCHED [ AND condition ] THEN { merge_insert | DO NOTHING } }
+  WHEN NOT MATCHED BY SOURCE [ AND condition ] THEN { merge_update | merge_delete | DO NOTHING } |
+  WHEN NOT MATCHED [BY TARGET] [ AND condition ] THEN { merge_insert | DO NOTHING } }
 
 and merge_insert is:
 
@@ -70,7 +71,9 @@ DELETE
from data_source to
target_table_name
producing zero or more candidate change rows.  For each candidate change
-   row, the status of MATCHED or NOT MATCHED
+   row, the status of MATCHED,
+   NOT MATCHED BY SOURCE,
+   or NOT MATCHED [BY TARGET]
is set just once, after which WHEN clauses are evaluated
in the order specified.  For each candidate change row, the first clause to
evaluate as true is executed.  No more than one WHEN
@@ -226,16 +229,37 @@ DELETE
   At least one WHEN clause is required.
  
  
+  The WHEN clause may specify WHEN MATCHED,
+  WHEN NOT MATCHED BY SOURCE, or
+  WHEN NOT MATCHED [BY TARGET].
+  Note that the SQL standard only defines
+  WHEN MATCHED and WHEN NOT MATCHED
+  (which is defined to mean no matching target row).
+  WHEN NOT MATCHED BY SOURCE is an extension to the
+  SQL standard, as is the option to append
+  BY TARGET to WHEN NOT MATCHED, to
+  make its meaning more explicit.
+ 
+ 
   If the WHEN clause specifies WHEN MATCHED
-  and the candidate change row matches a row in the
+  and the candidate change row matches a row in the source to a row in the
   target_table_name,
   the WHEN clause is executed if the
   condition is
   absent or it evaluates to true.
  
  
-  Conversely, if the WHEN clause specifies
-  WHEN NOT MATCHED
+  If the WHEN clause specifies
+  WHEN NOT MATCHED BY SOURCE and the candidate change
+  row represents a row in the
+  target_table_name that does
+  not match a source row, the WHEN clause is executed
+  if the condition is
+  absent or it evaluates to true.
+ 
+ 
+  If the WHEN clause specifies
+  WHEN NOT MATCHED [BY TARGET]
   and the candidate change row does not match a row in the
   target_table_name,
   the WHEN 

Re: SPI isolation changes

2023-07-01 Thread Seino Yuki

On 2023-07-01 01:47, Tom Lane wrote:

Seino Yuki  writes:

Of course, executing SET TRANSACTION ISOLATION LEVEL with SPI_execute
will result in an error.
---
SPI_execute("SET TRANSACTION ISOLATION LEVEL SERIALIZABLE", false, 0);



(Log Output)
ERROR:  SET TRANSACTION ISOLATION LEVEL must be called before any 
query

CONTEXT:  SQL statement "SET TRANSACTION ISOLATION LEVEL SERIALIZABLE"


Even if you just did SPI_commit?  That *should* fail if you just do
it right off the bat in a SPI-using procedure, because you're already
within the transaction that called the procedure.  But I think it
will work if you do SPI_commit followed by this SPI_execute.

regards, tom lane


I'm sorry, I misunderstood.
SPI_execute("SET TRANSACTION ISOLATION LEVEL ...") after executing
SPI_commit succeeded.


Thank you. My problem is solved.

Regards,
--
Seino Yuki
NTT DATA CORPORATION




Re: MERGE ... RETURNING

2023-07-01 Thread Dean Rasheed
On Mon, 13 Mar 2023 at 13:36, Dean Rasheed  wrote:
>
> And another rebase.
>

I ran out of cycles to pursue the MERGE patches in v16, but hopefully
I can make more progress in v17.

Looking at this one with fresh eyes, it looks mostly in good shape. To
recap, this adds support for the RETURNING clause in MERGE, together
with new support functions pg_merge_action() and
pg_merge_when_clause() that can be used in the RETURNING list of MERGE
to retrieve the kind of action (INSERT/UPDATE/DELETE), and the index
of the WHEN clause executed for each row merged. In addition,
RETURNING support allows MERGE to be used as the source query in COPY
TO and WITH queries.

One minor annoyance is that, due to the way that pg_merge_action() and
pg_merge_when_clause() require access to the MergeActionState, they
only work if they appear directly in the RETURNING list. They can't,
for example, appear in a subquery in the RETURNING list, and I don't
see an easy way round that limitation.
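
To make that concrete, using the tables from the documentation example below,
the direct calls work but wrapping them in a subquery does not:

MERGE INTO products p USING new_products n ON p.product_no = n.product_no
  WHEN MATCHED THEN UPDATE SET name = n.name, price = n.price
  WHEN NOT MATCHED THEN INSERT VALUES (n.product_no, n.name, n.price)
  RETURNING pg_merge_action(), pg_merge_when_clause(), p.*;

-- not supported:  RETURNING (SELECT pg_merge_action()), p.*;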

Attached is an updated patch with some cosmetic updates, plus updated
ruleutils support.

Regards,
Dean
diff --git a/doc/src/sgml/dml.sgml b/doc/src/sgml/dml.sgml
new file mode 100644
index cbbc5e2..ff2a827
--- a/doc/src/sgml/dml.sgml
+++ b/doc/src/sgml/dml.sgml
@@ -283,10 +283,15 @@ DELETE FROM products;
RETURNING
   
 
+  
+   MERGE
+   RETURNING
+  
+
   
Sometimes it is useful to obtain data from modified rows while they are
being manipulated.  The INSERT, UPDATE,
-   and DELETE commands all have an
+   DELETE, and MERGE commands all have an
optional RETURNING clause that supports this.  Use
of RETURNING avoids performing an extra database query to
collect the data, and is especially valuable when it would otherwise be
@@ -339,6 +344,24 @@ DELETE FROM products
 
   
 
+  
+   In a MERGE, the data available to RETURNING is
+   the content of the source row plus the content of the inserted, updated, or
+   deleted target row.  In addition, the merge support functions
+and 
+   may be used to return information about which merge action was executed for
+   each row.  Since it is quite common for the source and target to have many
+   of the same columns, specifying RETURNING * can lead to a
+   lot of duplicated columns, so it is often more useful to just return the
+   target row.  For example:
+
+MERGE INTO products p USING new_products n ON p.product_no = n.product_no
+  WHEN NOT MATCHED THEN INSERT VALUES (n.product_no, n.name, n.price)
+  WHEN MATCHED THEN UPDATE SET name = n.name, price = n.price
+  RETURNING pg_merge_action(), p.*;
+
+  
+
   
If there are triggers () on the target table,
the data available to RETURNING is the row as modified by
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
new file mode 100644
index 5a47ce4..ca1b3cc
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -21705,6 +21705,100 @@ SELECT count(*) FROM sometable;
 
  
 
+ 
+  Merge Support Functions
+
+  
+   MERGE
+   RETURNING
+  
+
+  
+   The merge support functions shown in
+may be used in the
+   RETURNING list of a  command
+   to return additional information about the action taken for each row.
+  
+
+  
+Merge Support Functions
+
+
+ 
+  
+   
+Function
+   
+   
+Description
+   
+  
+ 
+
+ 
+  
+   
+
+ pg_merge_action
+
+pg_merge_action ( )
+text
+   
+   
+Returns the action performed on the current row ('INSERT',
+'UPDATE', or 'DELETE').
+   
+  
+
+  
+   
+
+ pg_merge_when_clause
+
+pg_merge_when_clause ( )
+integer
+   
+   
+Returns the 1-based index of the WHEN clause executed
+for the current row.
+   
+  
+ 
+
+  
+
+  
+   Example:
+
+  
+
+  
+   Note that these functions can only be used in the RETURNING
+   list of a MERGE command.  It is an error to use them in
+   any other part of a query, or in a subquery.
+  
+
+ 
+
  
   Subquery Expressions
 
diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml
new file mode 100644
index fe8def4..51a15ca
--- a/doc/src/sgml/glossary.sgml
+++ b/doc/src/sgml/glossary.sgml
@@ -1387,9 +1387,9 @@
  to a client upon the
  completion of an SQL command, usually a
  SELECT but it can be an
- INSERT, UPDATE, or
- DELETE command if the RETURNING
- clause is specified.
+ INSERT, UPDATE,
+ DELETE, or MERGE command if the
+ RETURNING clause is specified.
 
 
  The fact that a result set is a relation means that a query can be used
diff --git a/doc/src/sgml/plpgsql.sgml b/doc/src/sgml/plpgsql.sgml
new file mode 100644
index f55e901..6803240
--- a/doc/src/sgml/plpgsql.sgml
+++ b/doc/src/sgml/plpgsql.sgml
@@ -1020,8 +1020,8 @@ INSERT INTO mytable VALUES (1,'one'), (2
 
 
 
- If the command does return rows (for example SELECT,
- or 

Re: Do we want a hashset type?

2023-07-01 Thread Joel Jacobson
On Fri, Jun 30, 2023, at 06:50, jian he wrote:
> more like a C questions
> in this context does
> #define HASHSET_GET_VALUES(set) ((int32 *) ((set)->data +
> CEIL_DIV((set)->capacity, 8)))
> define first, then define struct int4hashset_t. Is this normally ok?

Yes, it's fine. Macros are just text substitutions done pre-compilation.

> Also does
> #define HASHSET_GET_VALUES(set) ((int32 *) ((set)->data +
> CEIL_DIV((set)->capacity, 8)))
>
> remove (int32 *) will make it generic? then when you use it, you can
> cast whatever type you like?

Maybe, but it might be less error-prone and more descriptive to have different
macros for each type, e.g. INT4HASHSET_GET_VALUES,
similar to the existing PG_GETARG_INT4HASHSET

Curious to hear what everybody thinks about the interface, documentation,
semantics and implementation?

Is there anything missing or something that you think should be 
changed/improved?

/Joel




Re: pg_upgrade instructions involving "rsync --size-only" might lead to standby corruption?

2023-07-01 Thread Andrey M. Borodin
Fast upgrade of a highly available cluster is a vital part of being an
industry-acceptable solution for any data management system, because the
cluster is required to be highly available.

Without this documented technique, upgrading a 1 TB cluster would take many
hours, not seconds.
There are industry concerns about scalability beyond tens of terabytes per
cluster, but such downtime would significantly lower that boundary.

> On 1 Jul 2023, at 01:16, Robert Haas  wrote:
> 
>  If somebody
> wants to write a reliable tool for this to ship as part of PostgreSQL,
> well and good.

IMV that's a good idea. We could teach pg_upgrade or some new tool to do that
reliably. The tricky part is that the tool must stop and start the standby
remotely...


Best regards, Andrey Borodin.

Outdated description of PG_CACHE_LINE_SIZE

2023-07-01 Thread Julien Rouhaud
Hi,

I just noticed that the comment for PG_CACHE_LINE_SIZE still says that "it's
currently used in xlog.c", which hasn't been true for quite some time.

PFA a naive patch to make the description more generic.
>From a554ee9ca3558c1cc67b2f4024c13b26aacff3c9 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Sat, 1 Jul 2023 15:41:54 +0800
Subject: [PATCH v1] Fix PG_CACHE_LINE_SIZE description.

PG_CACHE_LINE_SIZE was originally only used in xlog.c, but this hasn't been
true for a very long time and is now widely used, so modify its description to
not mention any explicit source code file.

Author: Julien Rouhaud
Reviewed-by: FIXME
Discussion: FIXME
---
 src/include/pg_config_manual.h | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/src/include/pg_config_manual.h b/src/include/pg_config_manual.h
index a1a93ad706..01fe2af499 100644
--- a/src/include/pg_config_manual.h
+++ b/src/include/pg_config_manual.h
@@ -218,12 +218,11 @@
 
 /*
  * Assumed cache line size. This doesn't affect correctness, but can be used
- * for low-level optimizations. Currently, this is used to pad some data
- * structures in xlog.c, to ensure that highly-contended fields are on
- * different cache lines. Too small a value can hurt performance due to false
- * sharing, while the only downside of too large a value is a few bytes of
- * wasted memory. The default is 128, which should be large enough for all
- * supported platforms.
+ * for low-level optimizations. This is mostly used to pad various data
+ * structures, to ensure that highly-contended fields are on different cache
+ * lines. Too small a value can hurt performance due to false sharing, while
+ * the only downside of too large a value is a few bytes of wasted memory. The
+ * default is 128, which should be large enough for all supported platforms.
  */
 #define PG_CACHE_LINE_SIZE		128
 
-- 
2.37.0