Re: Enable data checksums by default

2024-10-16 Thread Peter Eisentraut

On 16.10.24 08:54, Peter Eisentraut wrote:

On 14.10.24 11:28, Peter Eisentraut wrote:

On 03.10.24 23:13, Nathan Bossart wrote:

On Tue, Oct 01, 2024 at 11:15:02AM -0400, Peter Eisentraut wrote:
I have committed 0001 (the new option) and 0004 (the docs tweak).  I think
there is consensus for the rest, too, but I'll leave it for a few more days
to think about.  I guess the test failure has to be addressed.


Here is a rebased patch with the test fix (for cfbot).  I have made no
other changes.


I have committed the test changes (patch 0002).  (I renamed the option 
to no_data_checksums to keep the wording consistent with the initdb 
option.)


Right now, with checksums off by default, this doesn't do much, but 
you can test this like


 PG_TEST_INITDB_EXTRA_OPTS=--data-checksums meson test ...

and everything will pass.  To make that work, I had to adjust the 
order of how the initdb options are assembled in Cluster.pm a bit.


I will work on the patch that flips the default next.


The patch that flips the default has been committed.

I also started a PG18 open items page and made a note that we follow up 
on the upgrade experience, as was discussed in this thread.


Ah yes, and the upgrade tests on the buildfarm don't like this.  What 
shall we do about this?  Maybe adjust the buildfarm scripts to use 
--no-data-checksums?






Re: Doc: typo in config.sgml

2024-10-16 Thread Peter Eisentraut

On 15.10.24 23:51, Bruce Momjian wrote:

I don't see why we need to enforce this at this level.  Whatever downstream
toolchain has requirements about which characters are allowed will complain
if it encounters a character it doesn't like.


Uh, the PDF build does not complain if you pass it non-Latin-1 UTF8
characters.  To test this I added some Russian characters (non-Latin-1)
to release.sgml:

(⟨б⟩, ⟨в⟩, ⟨г⟩, ⟨д⟩, ⟨ж⟩, ⟨з⟩, ⟨к⟩, ⟨л⟩, ⟨м⟩, ⟨н⟩, ⟨п⟩, ⟨р⟩, ⟨с⟩, ⟨т⟩,
⟨ф⟩, ⟨х⟩, ⟨ц⟩, ⟨ч⟩, ⟨ш⟩, ⟨щ⟩), ten vowels (⟨а⟩, ⟨е⟩, ⟨ё⟩, ⟨и⟩, ⟨о⟩, ⟨у⟩,
⟨ы⟩, ⟨э⟩, ⟨ю⟩, ⟨я⟩), a semivowel / consonant (⟨й⟩), and two modifier
letters or "signs" (⟨ъ⟩, ⟨ь⟩)

and I ran 'make postgres-US.pdf', and then removed the Russian
characters and ran the same command again.  The output, including stderr
was identical.  The PDFs, of course, were not, with the Russian
characters showing as "".  Makefile output attached.


Hmm, mine complains:

/opt/homebrew/bin/fop -fo postgres-A4.fo -pdf postgres-A4.pdf
Picked up JAVA_TOOL_OPTIONS: -Djava.awt.headless=true
[WARN] FOUserAgent - Font "Symbol,normal,700" not found. Substituting 
with "Symbol,normal,400".
[WARN] FOUserAgent - Font "ZapfDingbats,normal,700" not found. 
Substituting with "ZapfDingbats,normal,400".

[WARN] FOUserAgent - Glyph "⟨" (0x27e8) not available in font "Times-Roman".
[WARN] FOUserAgent - Glyph "б" (0x431, afii10066) not available in font 
"Times-Roman".

[WARN] FOUserAgent - Glyph "⟩" (0x27e9) not available in font "Times-Roman".
[WARN] FOUserAgent - Glyph "в" (0x432, afii10067) not available in font 
"Times-Roman".
[WARN] FOUserAgent - Glyph "г" (0x433, afii10068) not available in font 
"Times-Roman".
[WARN] FOUserAgent - Glyph "д" (0x434, afii10069) not available in font 
"Times-Roman".
[WARN] FOUserAgent - Glyph "ж" (0x436, afii10072) not available in font 
"Times-Roman".
[WARN] FOUserAgent - Glyph "з" (0x437, afii10073) not available in font 
"Times-Roman".
[WARN] PropertyMaker - span="inherit" on fo:block, but no explicit value 
found on the parent FO.






Re: Doc: typo in config.sgml

2024-10-16 Thread Peter Eisentraut

On 15.10.24 23:51, Bruce Momjian wrote:

On Tue, Oct 15, 2024 at 05:27:49PM -0400, Tom Lane wrote:

Bruce Momjian  writes:

Well, we can only use Latin-1, so the idea is that we will be explicit
about specifying Latin-1 only as HTML entities, rather than letting
non-Latin-1 creep in as UTF8.  We can exclude certain UTF8 or SGML files
if desired.


That policy would cause substantial problems with contributor names
in the release notes.  I agree with Peter that we don't need this.
Catching otherwise-invisible characters seems sufficient.


Uh, why can't we use HTML entities going forward?  Is that harder?


I think the question should be the other way around.  The entities are a 
historical workaround for when encoding support and rendering support 
were poor.  Now you can just type in the characters you want as is, which 
seems nicer.
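
For instance, both of the following spellings render identically in the built
docs; the names are only illustrative here, but the second form, typing the
UTF-8 character directly, is what the argument above prefers:

    <!-- historical workaround: Latin-1 HTML entities -->
    <para>Fix something or other (Daniel V&eacute;rit&eacute;)</para>

    <!-- direct UTF-8, which now renders just as well -->
    <para>Fix something or other (Daniel Vérité)</para>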






Re: Enable data checksums by default

2024-10-16 Thread Daniel Gustafsson
> On 16 Oct 2024, at 09:49, Peter Eisentraut  wrote:

> Ah yes, and the upgrade tests on the buildfarm don't like this.  What shall 
> we do about this?  Maybe adjust the buildfarm scripts to use 
> --no-data-checksums?

I can't see many other options, and it represents a reasonable use-case, so +1.

--
Daniel Gustafsson





Re: Vacuum statistics

2024-10-16 Thread Alena Rybakina

Hi!

On 16.10.2024 13:31, Ilia Evdokimov wrote:



On 08.10.2024 19:18, Alena Rybakina wrote:

Made a rebase on a fresh master branch.
--
Regards,
Alena Rybakina
Postgres Professional


Thank you for rebasing.

I have noticed that when I create a table or an index on this table, 
there is no information about the table or index in 
pg_stat_vacuum_tables and pg_stat_vacuum_indexes until we perform a 
VACUUM.


Example:

CREATE TABLE t (i INT, j INT);
INSERT INTO t SELECT i/10, i/100 FROM GENERATE_SERIES(1,100) i;
SELECT * FROM pg_stat_vacuum_tables WHERE relname = 't';

(0 rows)
CREATE INDEX ON t (i);
SELECT * FROM pg_stat_vacuum_indexes WHERE relname = 't_i_idx';
...
(0 rows)

I can see the entries only after VACUUM has been run manually or by 
autovacuum. I would suggest adding a line about the relation even if it 
has not yet been processed by vacuum. Interestingly, this issue does not 
occur with pg_stat_vacuum_database:


CREATE DATABASE example_db;
SELECT * FROM pg_stat_vacuum_database WHERE dbname = 'example_db';
dboid |   dbname | ...
 ...  | example_db | ...
(1 row)

BTW, I recommend renaming the view pg_stat_vacuum_database to 
pg_stat_vacuum_databases for consistency with pg_stat_vacuum_tables 
and pg_stat_vacuum_indexes.


Thanks for the review. I'm investigating this. I agree with the 
renaming, I will do it in the next version of the patch.


--
Regards,
Alena Rybakina
Postgres Professional


Re: Add “FOR UPDATE NOWAIT” lock details to the log.

2024-10-16 Thread Fujii Masao




On 2024/09/13 20:49, Seino Yuki wrote:

Hello,

I would like to add the information of the PID that caused the failure
when acquiring a lock with "FOR UPDATE NOWAIT".

When "FOR UPDATE" is executed and interrupted by lock_timeout,
xid and PID are output in the logs,


Could you explain how to reproduce this? In my quick test,
lock waits canceled by lock_timeout didn’t report either xid or PID,
so I might be missing something.



but in the case of "FOR UPDATE NOWAIT",
no information is output, making it impossible to identify the cause of the 
lock failure.
Therefore, I would like to output information in the logs in the same way as
when "FOR UPDATE" is executed and interrupted by lock_timeout.


I think NOWAIT is often used for lock waits that can fail frequently.
Logging detailed information for each failed NOWAIT lock wait could
lead to excessive and potentially noisy log messages for some users.

On the other hand, I understand that even with NOWAIT, we might want
to investigate why a lock wait failed. In such cases,
detailed information about the locking process would be useful.

Considering both points, I’m leaning toward adding a new GUC parameter
to control whether detailed logs should be generated for failed
NOWAIT locks (and possibly for those canceled by lock_timeout).
Alternatively, if adding a new GUC is not ideal, we could extend
log_lock_waits to accept a new value like 'error', which would trigger
detailed logging when a lock wait fails due to NOWAIT, lock_timeout,
or cancellation.
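
As a rough sketch of the two alternatives (both hypothetical; neither the GUC
name nor the new value exists today):

    # Alternative 1: a dedicated GUC for detailed logging of failed lock waits
    log_lock_failure_details = on

    # Alternative 2: extend the existing log_lock_waits with a new value
    log_lock_waits = 'error'   # details when a wait fails (NOWAIT, lock_timeout, cancel)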



The patch is attached as well.


I got the following compiler errors;

proc.c:1240:3: error: call to undeclared function 
'CollectLockHoldersAndWaiters'; ISO C99 and later do not support implicit 
function declarations [-Wimplicit-function-declaration]
 1240 | CollectLockHoldersAndWaiters(proclock, lock, 
&lock_holders_sbuf, &lock_waiters_sbuf, &lockHoldersNum);
  | ^
proc.c:1549:4: error: call to undeclared function 
'CollectLockHoldersAndWaiters'; ISO C99 and later do not support implicit 
function declarations [-Wimplicit-function-declaration]
 1549 | CollectLockHoldersAndWaiters(NULL, lock, 
&lock_holders_sbuf, &lock_waiters_sbuf, &lockHoldersNum);
  | ^
proc.c:1126:8: warning: unused variable 'first_holder' [-Wunused-variable]
 1126 | boolfirst_holder = true,
  | ^~~~
proc.c:1127:5: warning: unused variable 'first_waiter' [-Wunused-variable]
 1127 | first_waiter = true;
  | ^~~~
proc.c:1982:1: error: conflicting types for 'CollectLockHoldersAndWaiters'
 1982 | CollectLockHoldersAndWaiters(PROCLOCK *waitProcLock, LOCK *lock, 
StringInfo lock_holders_sbuf, StringInfo lock_waiters_sbuf, int *lockHoldersNum)
  | ^
proc.c:1240:3: note: previous implicit declaration is here
 1240 | CollectLockHoldersAndWaiters(proclock, lock, 
&lock_holders_sbuf, &lock_waiters_sbuf, &lockHoldersNum);
  | ^
2 warnings and 3 errors generated.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION





Re: replace strtok()

2024-10-16 Thread Ranier Vilela
Em qua., 16 de out. de 2024 às 04:45, Peter Eisentraut 
escreveu:

> On 15.10.24 14:07, Ranier Vilela wrote:
> > I also wonder, if other places touched by 5d2e1cc11 need corrections
> > too.
> > I played with
> > PG_COLOR=always PG_COLORS="error=01;31" .../initdb
> >
> > and it looks like this free() call in pg_logging_init():
> >  char   *colors = strdup(pg_colors_env);
> >
> >  if (colors)
> >  {
> > ...
> >  while ((token = strsep(&colors, ":")))
> >  {
> > ...
> >  }
> >
> >  free(colors);
> >  }
> > gets null in colors.
> >
> > Yeah, I also saw this usage, but I was waiting for a definition for the
> > first report.
> > The solution IMO, would be the same.
> >
> > diff --git a/src/common/logging.c b/src/common/logging.c
> > index aedd1ae2d8..45b5316d48 100644
> > --- a/src/common/logging.c
> > +++ b/src/common/logging.c
> > @@ -121,7 +121,7 @@ pg_logging_init(const char *argv0)
> >{
> >char   *token;
> >
> > - while ((token = strsep(&colors, ":")))
> > + while ((token = strsep(&colors, ":")) != NULL && colors != NULL)
> >{
> >char   *e = strchr(token, '=');
> > The advantage of this change is that it would avoid processing
> > unnecessary tokens.
>
> This wouldn't fix anything, I think.  If colors is NULL, then strsep()
> already returns NULL, so the added code does nothing.
>
If colors is NULL, then the delimiter is not found and strsep will return
the entire string *stringp, so the token becomes invalid.

IMO, I think it is necessary to check whether colors is NULL too.

best regards,
Ranier Vilela


Re: New function normal_rand_array function to contrib/tablefunc.

2024-10-16 Thread Andy Fan

Hello Dean,

 Thanks for the detailed feedback! Here is the rebased version. 

> 1). In the tests:
>
> +select setseed(0.8);
> +select rand_array(10, 0, 3, 50::int, 80::int);
..
> this should really have a comment block to distinguish these new tests
> from the preceeding normal_rand() tests.

> 3). It's only necessary to call setseed() once to get a reproducible
> set of results from then on.

> 4). I'd use setseed(0) or setseed(0.5), since those values are exactly
> representable as double precision values, unlike 0.8, ensuring that it
> works the same on all platforms. It might not matter, but why take the
> risk?

All done; the explanation of why 0.8 is a bad choice is very interesting.

> 2). It's worth also having tests for the error cases.

After the code refactoring, it looks like the ERROR is not reachable, so I
added both Assert(false) and elog(ERROR), with no test case for that.

>
> 5). The float8 case still has minimum and maximum value arguments that
> it just ignores. It should either not take those arguments, or it
> should respect them, and try to return float8 values in the requested
> range. I suspect that trying to return  float8 values in the requested
> range would be hard, if not impossible, due to the inexact nature of
> double precision arithmetic. That's why I suggested earlier having the
> float8 function not take minval/maxval arguments, and just return
> values in the range 0 to 1.
>
> 11). If the float8 case is made to not have minval/maxval arguments, this 
> code:
>
> +Datum minval = PG_GETARG_DATUM(3),
> +maxval = PG_GETARG_DATUM(4);
>
> and the FunctionCallInfo setup code needs to be different for the
> float8 case, in order to avoid reading and setting undefined
> arguments. Perhaps introduce a "need_val_bounds" boolean variable,
> based on the datatype.

I should have noticed the float8 issue after reading your first
reply.  Actually I don't have a special reason to support float8.  When
working on the patch, I wanted some TOASTable and non-TOASTable data
types, so supporting both int/bigint and numeric should be good, at least
for my purpose.  So in the attached version, I removed the float8 support
for simplification.
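
For reference, a usage sketch based on the signature exercised in the quoted
tests (the function comes from the proposed contrib/tablefunc patch, not from
core; the bigint call is an assumed variant of the same signature):

    SELECT setseed(0.5);
    -- 10 rows of int4[] arrays, each 0 to 3 elements long, values between 50 and 80
    SELECT rand_array(10, 0, 3, 50::int, 80::int);
    -- same idea for the bigint variant
    SELECT rand_array(10, 0, 3, 50::bigint, 80::bigint);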

>
> 6). This new function:
>
> +static Datum
> +rand_array_internal(FunctionCallInfo fcinfo, Oid datatype)
> +{
>
> should have a comment block. In particular, it should document what
> its inputs and outputs are.

Done.
>
> 7). This code:
>
> +int num_tuples = PG_GETARG_INT32(0);
> +int minlen = PG_GETARG_INT32(1);
> +int maxlen = PG_GETARG_INT32(2);
> +Datum minval = PG_GETARG_DATUM(3),
> +maxval = PG_GETARG_DATUM(4);
> +rand_array_fctx *fctx;
> +
> +if (datatype == INT4OID)
> +random_fn_oid = F_RANDOM_INT4_INT4;
> +else if (datatype == INT8OID)
...
> should be inside the "if (SRF_IS_FIRSTCALL())" block. There's no point
> repeating all those checks for every call.

Done.
>
> 8). I think it would be neater to code the "if (datatype == INT4OID)
> ... else if ..." as a switch statement.
Done.
>
>
> 9). I would swap the last 2 bound checks round, and simplify the error
> messages as follows:

Done.
>
> Also note that primary error messages should not end with a period.
> See https://www.postgresql.org/docs/current/error-style-guide.html

Thanks for sharing this!

> 10). In this error:
>
> +elog(ERROR, "unsupported type %d for rand_array function.",
> + datatype);
>
> "datatype" is of type Oid, which is unsigned, and so it should use
> "%u" not "%d". Also, as above, it should not end with a period, so it
> should be:
>
> +elog(ERROR, "unsupported type %u for rand_array function",
> + datatype);

Done.

>
> 12). This code:
>
> +random_len_fcinfo->args[0].value = minlen;
> +random_len_fcinfo->args[1].value = maxlen;
>
> should really be:
>
> +random_len_fcinfo->args[0].value = Int32GetDatum(minlen);
> +random_len_fcinfo->args[1].value = Int32GetDatum(maxlen);
>
> It amounts to the same thing, but documents the fact that it's
> converting an integer to a Datum.

Done. 
>
>
> 13). These new functions are significantly under-documented,
> especially when compared to all the other functions on
> https://www.postgresql.org/docs/current/tablefunc.html
>
> They really should have their own subsection, along the same lines as
> "F.41.1.1. Normal_rand", explaining what the functions do, what their
> arguments mean, and giving a couple of usage examples.

Done.  Very impressive feedback; now I understand why PostgreSQL can
always have such user-friendly documentation.

-- 
Best Regards
Andy Fan

>From 2024871241c35959a259e54a8151d4e9372dbfbf Mon Sep 17 00:00:00 2001
From: Andy Fan 
Date: Mon, 26 Aug 2024 18:50:57 +0800
Subject: [PATCH v20241016 1/1] Add functions rand_array function to
 contrib/tablefunc.

It produces an array of numeric_type with its length in the range of
[minlen,maxlen] and each value is in the range of [minval,maxval].
---
 contrib/tab

Re: replace strtok()

2024-10-16 Thread Peter Eisentraut

On 15.10.24 14:07, Ranier Vilela wrote:

I also wonder, if other places touched by 5d2e1cc11 need corrections
too.
I played with
PG_COLOR=always PG_COLORS="error=01;31" .../initdb

and it looks like this free() call in pg_logging_init():
         char       *colors = strdup(pg_colors_env);

         if (colors)
         {
...
             while ((token = strsep(&colors, ":")))
             {
...
             }

             free(colors);
         }
gets null in colors.

Yeah, I also saw this usage, but I was waiting for a definition for the 
first report.

The solution IMO, would be the same.

diff --git a/src/common/logging.c b/src/common/logging.c
index aedd1ae2d8..45b5316d48 100644
--- a/src/common/logging.c
+++ b/src/common/logging.c
@@ -121,7 +121,7 @@ pg_logging_init(const char *argv0)
   {
   char   *token;

- while ((token = strsep(&colors, ":")))
+ while ((token = strsep(&colors, ":")) != NULL && colors != NULL)
   {
   char   *e = strchr(token, '=');
The advantage of this change is that it would avoid processing 
unnecessary tokens.


This wouldn't fix anything, I think.  If colors is NULL, then strsep() 
already returns NULL, so the added code does nothing.
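
For illustration, a minimal self-contained sketch of the strsep() pattern from
its man page, which is what the code above needs to follow: keep a second
pointer to the strdup()'d string so it can still be freed after strsep() has
advanced the working pointer to NULL.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int
    main(void)
    {
        char   *colors = strdup("error=01;31:warning=01;35");
        char   *cp = colors;    /* working pointer that strsep() advances */
        char   *token;

        while ((token = strsep(&cp, ":")) != NULL)
            printf("token: %s\n", token);   /* "error=01;31", then "warning=01;35" */

        /* cp is now NULL, so freeing it would leak; freeing colors does not */
        free(colors);
        return 0;
    }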






Re: Vacuum statistics

2024-10-16 Thread Andrei Zubkov
Hi Ilia,

On Wed, 2024-10-16 at 13:31 +0300, Ilia Evdokimov wrote:
> BTW, I recommend renaming the view pg_stat_vacuum_database to
> pg_stat_vacuum_databases for consistency with pg_stat_vacuum_tables
> and pg_stat_vacuum_indexes

Such renaming doesn't seem correct to me, because
pg_stat_vacuum_database is consistent with the pg_stat_database view, while
pg_stat_vacuum_tables is consistent with pg_stat_all_tables.

This inconsistency exists in the Postgres views themselves, so it should be
changed synchronously.
-- 
regards, Andrei Zubkov





Re: ECPG Refactor: move sqlca variable in ecpg_log()

2024-10-16 Thread Fujii Masao




On 2024/10/16 15:04, Yuto Sasaki (Fujitsu) wrote:

 >Hmm, I'm not sure we want that, because if we do this, then
 >ECPGget_sqlca() gets run with the debug_mutex held.  In the original
 >coding, it's run without the mutex.

Thank you for pointing that out. I agree with your observation. But there is
room for another improvement: we can avoid unnecessary calls to ECPGget_sqlca()
by moving the call to just before the mutex lock.


+1


PSA a patch.


The patch looks good to me.
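
For readers without the attachment, a rough, self-contained sketch of the
reordering being discussed; simple_debug here is only a stand-in for ecpglib's
debug-enabled flag, and the real ecpg_log() does considerably more:

    #include <pthread.h>
    #include <stdarg.h>
    #include <stdio.h>

    static int simple_debug = 1;    /* stand-in for "is debug logging enabled?" */
    static pthread_mutex_t debug_mutex = PTHREAD_MUTEX_INITIALIZER;

    struct sqlca_t { long sqlcode; };           /* placeholder for the real type */
    static struct sqlca_t *ECPGget_sqlca(void)  /* placeholder for the real call */
    {
        static struct sqlca_t sqlca;
        return &sqlca;
    }

    static void
    ecpg_log(const char *format, ...)
    {
        struct sqlca_t *sqlca;
        va_list     ap;

        if (!simple_debug)          /* cheap check first: bail out early */
            return;

        sqlca = ECPGget_sqlca();    /* called only when we will actually log, */
        pthread_mutex_lock(&debug_mutex);   /* and before the mutex is taken */

        va_start(ap, format);
        fprintf(stderr, "[sqlcode %ld] ", sqlca->sqlcode);
        vfprintf(stderr, format, ap);
        va_end(ap);

        pthread_mutex_unlock(&debug_mutex);
    }

    int
    main(void)
    {
        ecpg_log("hello %s\n", "world");
        return 0;
    }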

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION





Re: BF mamba failure

2024-10-16 Thread Bertrand Drouvot
Hi,

On Wed, Oct 16, 2024 at 09:43:48AM +0900, Michael Paquier wrote:
> On Fri, Oct 11, 2024 at 08:18:58AM +, Bertrand Drouvot wrote:
> > On Fri, Oct 11, 2024 at 11:07:29AM +0300, Kouber Saparev wrote:
> >> Unfortunately not, we are running 15.4 and planning to upgrade very soon.
> >> Is the patch mentioned already merged in PostgreSQL 16?
> > 
> > Yes, as of 16.4.
> 
> Right.  I'd surely welcome more eyes on what I have posted here:
> https://www.postgresql.org/message-id/zm-8xo93k9yd9...@paquier.xyz

I applied your patch and can confirm that it fixes the issue. While that
works, I wonder whether there is another way to fix it.

Indeed, in pgstat_release_entry_ref(), we're doing:

"
if (pg_atomic_fetch_sub_u32(&entry_ref->shared_entry->refcount, 1) == 1)
.
.
shent = dshash_find(pgStatLocal.shared_hash,
&entry_ref->shared_entry->key,
true);
"

I wonder if we are not decrementing &entry_ref->shared_entry->refcount too
early.

I mean, wouldn't it make sense to decrement it after the dshash_find() call?
(to ensure "proper" locking of the whole entry, done in dshash_find(), first)

> I am a bit annoyed by the fact of making PgStatShared_HashEntry
> slightly larger to track the age of the entries,

Yeah, FWIW, we would be going from 32 bytes:

(gdb)  ptype /o struct PgStatShared_HashEntry
/* offset  |size */  type = struct PgStatShared_HashEntry {
/*  0  |  16 */PgStat_HashKey key;
/* 16  |   1 */_Bool dropped;
/* XXX  3-byte hole  */
/* 20  |   4 */pg_atomic_uint32 refcount;
/* 24  |   8 */dsa_pointer body;

   /* total size (bytes):   32 */
 }

to 40 bytes (with the patch applied):

(gdb) ptype /o struct PgStatShared_HashEntry
/* offset  |size */  type = struct PgStatShared_HashEntry {
/*  0  |  16 */PgStat_HashKey key;
/* 16  |   1 */_Bool dropped;
/* XXX  3-byte hole  */
/* 20  |   4 */pg_atomic_uint32 refcount;
/* 24  |   4 */pg_atomic_uint32 agecount;
/* XXX  4-byte hole  */
/* 32  |   8 */dsa_pointer body;

   /* total size (bytes):   40 */
 }

Due to the padding, that's a 8 bytes increase while we're adding "only" 4 bytes.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Remove deprecated -H option from oid2name

2024-10-16 Thread Daniel Gustafsson
> On 15 Oct 2024, at 22:48, Peter Eisentraut  wrote:
> 
> On 09.10.24 20:30, Daniel Gustafsson wrote:
>>> On 9 Oct 2024, at 19:15, Nathan Bossart  wrote:
>>> Another problem is that "deprecated" may or may not imply that the feature
>>> will be removed in the future.  IMHO we should be clear about that when we
>>> intend to remove something down the road (e.g., "this flag is deprecated
>>> and will be removed in a future major release of PostgreSQL").
>> That's a fair point, but if we don't aim to remove something we have, IMHO, a
>> social contract to maintain the feature instead and at that point it's
>> questionable if it is indeed deprecated.  I guess I think we should separate
>> between discouraged and deprecated.
> 
> The dictionary definition of "deprecate" is "to express disapproval of", 
> which I think is about the same as your "discouraged".  If we need another 
> term, then we should say something like "planned/scheduled to be removed".

I guess my interpretation of "deprecated" is more in line with the Wikipedia
paragraph on deprecation in software.

"Deprecated status may also indicate the feature will be removed in the
future.  Features are deprecated, rather than immediately removed, to
provide backward compatibility and to give programmers time to bring
affected code into compliance with the new standard."

Either way, it's not a hill to die on, if the consensus is to keep features
marked deprecated then I'll withdraw this one.  It's not an important thing,
just something stumbled upon when looking at another thing.

--
Daniel Gustafsson





Re: Set query_id for query contained in utility statement

2024-10-16 Thread Anthonin Bonnefoy
On Tue, Oct 15, 2024 at 3:23 PM jian he  wrote:
> explain(verbose) SELECT 1, 2, 3\;  explain SELECT 1, 2, 3, 4;
> will transformed to
> explain(verbose) SELECT 1, 2, 3;  explain SELECT 1, 2, 3, 4;
>
> it seems to me your patch care about following position.
> explain(verbose) SELECT 1, 2, 3;  explain SELECT 1, 2, 3, 4;
>   ^
> but this patch [1] at another thread will get the top level statement
> (passed the raw parse, no syntax error) beginning more effortless.
> explain(verbose) SELECT 1, 2, 3;  explain SELECT 1, 2, 3, 4;
> ^  ^
>
> can you try to looking at [1]. it may help to resolve this patch problem.
>
> [1] https://www.postgresql.org/message-id/2245576.1728418678%40sss.pgh.pa.us

The top level statement beginning doesn't have an impact on this
patch. The patch's focus is on the nested statement and RawStmt's
location and length are only used to get the nested statement length.
I will need the nested statement's location and length no matter what,
to handle cases like "COPY (UPDATE ...) to stdout;" and only report
the statement within parentheses.

The linked patch will allow a simplification: the "if (@$ < 0) @$ =
@2;" I've added won't be needed anymore. But I think that's the only
possible simplification? I've run the tests on top of the linked patch
and there was no change in the regression output.




Re: Vacuum statistics

2024-10-16 Thread Ilia Evdokimov


On 08.10.2024 19:18, Alena Rybakina wrote:

Made a rebase on a fresh master branch.
--
Regards,
Alena Rybakina
Postgres Professional


Thank you for rebasing.

I have noticed that when I create a table or an index on this table, 
there is no information about the table or index in 
pg_stat_vacuum_tables and pg_stat_vacuum_indexes until we perform a VACUUM.


Example:

CREATE TABLE t (i INT, j INT);
INSERT INTO t SELECT i/10, i/100 FROM  GENERATE_SERIES(1,100) i;
SELECT * FROM pg_stat_vacuum_tables WHERE relname = 't';

(0 rows)
CREATE INDEX ON t (i);
SELECT * FROM pg_stat_vacuum_indexes WHERE relname = 't_i_idx';
...
(0 rows)

I can see the entries after running VACUUM or executing autovacuum. or 
when autovacuum is executed. I would suggest adding a line about the 
relation even if it has not yet been processed by vacuum. Interestingly, 
this issue does not occur with pg_stat_vacuum_database:


CREATE DATABASE example_db;
SELECT * FROM pg_stat_vacuum_database WHERE dbname = 'example_db';
dboid |   dbname | ...
 ...  | example_db | ...
(1 row)

BTW, I recommend renaming the view pg_stat_vacuum_database to 
pg_stat_vacuum_database_S_  for consistency with pg_stat_vacuum_tables 
and pg_stat_vacuum_indexes


--
Regards,
Ilia Evdokimov,
Tantor Labs LLC.


Re: replace strtok()

2024-10-16 Thread Peter Eisentraut

On 15.10.24 14:00, Alexander Lakhin wrote:

I also wonder, if other places touched by 5d2e1cc11 need corrections too.
I played with
PG_COLOR=always PG_COLORS="error=01;31" .../initdb

and it looks like this free() call in pg_logging_init():
         char       *colors = strdup(pg_colors_env);

         if (colors)
         {
...
             while ((token = strsep(&colors, ":")))
             {
...
             }

             free(colors);
         }
gets null in colors.


Yes, this is indeed incorrect.  We need to keep a separate pointer to 
the start of the string to free later.  This matches the example on the 
strsep man page (https://man.freebsd.org/cgi/man.cgi?strsep(3)).  Patch 
attached.
From 1b842bfdc2b303264134687f3ec21fd3e53a0c82 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 16 Oct 2024 09:37:54 +0200
Subject: [PATCH] Fix memory leaks from incorrect strsep() uses

Commit 5d2e1cc117b introduced some strsep() uses, but it did the
memory management wrong in some cases.  We need to keep a separate
pointer to the allocated memory so that we can free it later, because
strsep() advances the pointer we pass to it, and at the end it
will be NULL, so any free() calls won't do anything.

(This fixes two of the four places changed in commit 5d2e1cc117b.  The
other two don't have this problem.)

Reported-by: Alexander Lakhin 
Discussion: 
https://www.postgresql.org/message-id/flat/79692bf9-17d3-41e6-b9c9-fc8c39442...@eisentraut.org
---
 src/common/logging.c  | 3 ++-
 src/test/regress/pg_regress.c | 7 +--
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/src/common/logging.c b/src/common/logging.c
index aedd1ae2d8c..3cf119090a5 100644
--- a/src/common/logging.c
+++ b/src/common/logging.c
@@ -120,8 +120,9 @@ pg_logging_init(const char *argv0)
if (colors)
{
char   *token;
+   char   *cp = colors;
 
-   while ((token = strsep(&colors, ":")))
+   while ((token = strsep(&cp, ":")))
{
char   *e = strchr(token, '=');
 
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 5157629b1cc..6c188954b14 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -233,14 +233,17 @@ free_stringlist(_stringlist **listhead)
 static void
 split_to_stringlist(const char *s, const char *delim, _stringlist **listhead)
 {
-   char   *sc = pg_strdup(s);
char   *token;
+   char   *sc;
+   char   *tofree;
+
+   tofree = sc = pg_strdup(s);
 
while ((token = strsep(&sc, delim)))
{
add_stringlist_item(listhead, token);
}
-   free(sc);
+   free(tofree);
 }
 
 /*
-- 
2.47.0



Re: System views for versions reporting

2024-10-16 Thread Peter Eisentraut

On 06.10.24 17:36, Dmitry Dolgov wrote:

Based on the feedback in [1], here is my attempt at implementing system
views for versions reporting. It adds pg_system_versions for showing
things like core version, compiler, LLVM, etc, and pg_system_libraries
for showing linked shared objects.


Is a system view the right interface?  For example, in pgbouncer we just 
added it to the version output:


$ pgbouncer --version
PgBouncer 1.18.0
libevent 2.1.12-stable
adns: c-ares 1.19.0
tls: OpenSSL 3.3.2 3 Sep 2024

That way, you can get this information without having to start a server 
instance.  (Maybe you can't start a server instance because it just 
crashed because of some library version issue ...)





Re: System views for versions reporting

2024-10-16 Thread Joe Conway

On 10/16/24 08:47, Peter Eisentraut wrote:

On 06.10.24 17:36, Dmitry Dolgov wrote:

Based on the feedback in [1], here is my attempt at implementing system
views for versions reporting. It adds pg_system_versions for showing
things like core version, compiler, LLVM, etc, and pg_system_libraries
for showing linked shared objects.


Is a system view the right interface?  For example, in pgbouncer we just
added it to the version output:

$ pgbouncer --version
PgBouncer 1.18.0
libevent 2.1.12-stable
adns: c-ares 1.19.0
tls: OpenSSL 3.3.2 3 Sep 2024

That way, you can get this information without having to start a server
instance.  (Maybe you can't start a server instance because it just
crashed because of some library version issue ...)



While it is also useful to be able to get the info without being able to 
start the server, I think that would be an addition not a replacement. 
When you have a fleet with no direct access to run shell commands, being 
able to get this info via SQL is valuable.
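
For example, with the views proposed upthread (names as posted; the exact
column layout is up to the patch), a fleet operator could simply run:

    SELECT * FROM pg_system_versions;
    SELECT * FROM pg_system_libraries;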


--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: BitmapHeapScan streaming read user and prelim refactoring

2024-10-16 Thread Tomas Vondra
Hi,

I took a quick look on the first couple parts of this series, up to
0007. I only have a couple minor review comments:

1) 0001

I find it a bit weird that we use ntuples to determine if it's exact or
lossy. Not an issue caused by this patch, of course, but maybe we could
improve that somehow? I mean this:

if (tbmres->ntuples >= 0)
(*exact_pages)++;
else
(*lossy_pages)++;

Also, maybe consider manually wrapping long lines - I haven't tried, but
I guess pgindent did not wrap this. I mean this line, for example:

... whitespace ... &node->stats.lossy_pages, &node->stats.exact_pages))


2) 0003

The "tidr" name is not particularly clear, IMO. Maybe just "range" would
be better?

+  struct
+  {
+  ItemPointerData rs_mintid;
+  ItemPointerData rs_maxtid;
+  }   tidr;

Isn't it a bit strange that this patch removes tbmres from
heapam_scan_bitmap_next_block, when it was already removed from
table_scan_bitmap_next_tuple in the previous commit? Does it make sense
to separate it like this? (Maybe it does, not sure.)

I find this not very clear:

+ *recheck do current page's tuples need recheck
+ *blockno used to validate pf and current block in sync
+ *pfblockno   used to validate pf stays ahead of current block

The "blockno" sounds weird - shouldn't it say "are in sync"? Also, not
clear what "pf" stands for without context (sure, it's "prefetch"). But
Why not to not write "prefetch_blockno" or something like that?


3) 0006

Seems unnecessary to keep this as a separate commit, it's just a log
message improvement. I'd merge it into the earlier patch.

4) 0007

Seems fine to me in principle. This adds "subclass" similar to what we
do for plan nodes, except that states are not derived from Node. But
it's the same approach.

Why not do

scan = (HeapScanDesc *) bscan;

instead of

scan = &bscan->rs_heap_base;

I think the simple cast is what we do for the plan nodes, and we even do
it this way in the opposite direction in a couple places:

BitmapHeapScanDesc *bscan = (BitmapHeapScanDesc *) scan;

BTW would it be a good idea for heapam_scan_bitmap_next_block to check
the passed "scan" has SO_TYPE_BITMAPSCAN? We haven't been checking that
so far I think, but we only had a single struct ...


regards

-- 
Tomas Vondra





Re: Proposal to Enable/Disable Index using ALTER INDEX

2024-10-16 Thread Peter Eisentraut

On 24.09.24 20:21, Tom Lane wrote:

Peter Eisentraut  writes:

On 23.09.24 22:51, Shayon Mukherjee wrote:

I am happy to draft a patch for this as well. I think I have a working
idea so far of where the necessary checks might go. However if you don’t
mind, can you elaborate further on how the effect would be similar to
enable_indexscan?



Planner settings like enable_indexscan used to just add a large number
(disable_cost) to the estimated plan node costs.  It's a bit more
sophisticated in PG17.  But in any case, I imagine "disabling an index"
could use the same mechanism.  Or maybe not, maybe the setting would
just completely ignore the index.


Yeah, I'd be inclined to implement this by having create_index_paths
just not make any paths for rejected indexes.  Or you could back up
another step and keep plancat.c from building IndexOptInfos for them.
The latter might have additional effects, such as preventing the plan
from relying on a uniqueness condition enforced by the index.  Not
clear to me if that's desirable or not.


Yes, I think you'd want that, because one of the purposes of this 
feature would be to test whether it's okay to drop an index.  So you 
don't want the planner to take any account of the index for its decisions.






Re: Proposal to Enable/Disable Index using ALTER INDEX

2024-10-16 Thread Robert Haas
On Wed, Oct 16, 2024 at 8:33 AM Peter Eisentraut  wrote:
> > Yeah, I'd be inclined to implement this by having create_index_paths
> > just not make any paths for rejected indexes.  Or you could back up
> > another step and keep plancat.c from building IndexOptInfos for them.
> > The latter might have additional effects, such as preventing the plan
> > from relying on a uniqueness condition enforced by the index.  Not
> > clear to me if that's desirable or not.
>
> Yes, I think you'd want that, because one of the purposes of this
> feature would be to test whether it's okay to drop an index.  So you
> don't want the planner to take any account of the index for its decisions.

I think this is right. I think we want to avoid invalidating the
index, so we still need to consider it in determining where HOT
updates must be performed, but we don't want to "improve" the plan
using the index if it's disabled.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Large expressions in indexes can't be stored (non-TOASTable)

2024-10-16 Thread Nathan Bossart
On Wed, Oct 16, 2024 at 10:24:32AM +0900, Michael Paquier wrote:
> On Tue, Oct 15, 2024 at 08:20:17PM -0500, Nathan Bossart wrote:
>> I assume all of this will get compiled out in non-USE_ASSERT_CHECKING
>> builds as-is, but I see no problem with surrounding it with an #ifdef to be
>> sure.
> 
> Yeah, I'm not sure that that would always be the case when optimized.
> Code generated can be dumb sometimes even if compilers got much
> smarter in the last 10 years or so (not compiler guy here).

Here is a new version of the patch set with this #ifdef added.  I plan to
give each of the code paths adjusted by 0001 a closer look, but otherwise
I'm hoping to commit these soon.

-- 
nathan
>From b9444fd22d8dcc05e3a27817943be7f5296503fa Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Mon, 14 Oct 2024 14:43:26 -0500
Subject: [PATCH v4 1/3] Ensure we have a snapshot when updating various system
 catalogs.

Reported-by: Alexander Lakhin
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/18127-fe54b6a667f29658%40postgresql.org
Discussion: https://postgr.es/m/18309-c0bf914950c46692%40postgresql.org
Discussion: https://postgr.es/m/ZvMSUPOqUU-VNADN%40nathan
Backpatch-through: FIXME
---
 src/backend/commands/tablecmds.c| 13 +++
 src/backend/replication/logical/tablesync.c | 21 ++
 src/backend/replication/logical/worker.c| 24 +
 3 files changed, 58 insertions(+)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 1ccc80087c..c6f82e93b9 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -19283,9 +19283,22 @@ ATExecDetachPartition(List **wqueue, AlteredTableInfo 
*tab, Relation rel,
tab->rel = rel;
}
 
+   /*
+* Detaching the partition might involve TOAST table access, so ensure 
we
+* have a valid snapshot.  We only expect to get here without a snapshot
+* in the concurrent path.
+*/
+   if (concurrent)
+   PushActiveSnapshot(GetTransactionSnapshot());
+   else
+   Assert(HaveRegisteredOrActiveSnapshot());
+
/* Do the final part of detaching */
DetachPartitionFinalize(rel, partRel, concurrent, defaultPartOid);
 
+   if (concurrent)
+   PopActiveSnapshot();
+
ObjectAddressSet(address, RelationRelationId, 
RelationGetRelid(partRel));
 
/* keep our lock until commit */
diff --git a/src/backend/replication/logical/tablesync.c 
b/src/backend/replication/logical/tablesync.c
index e03e761392..d3fbb18438 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -377,6 +377,12 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn)
replorigin_session_origin_lsn = InvalidXLogRecPtr;
replorigin_session_origin_timestamp = 0;
 
+   /*
+* Updating pg_replication_origin might involve TOAST table 
access, so
+* ensure we have a valid snapshot.
+*/
+   PushActiveSnapshot(GetTransactionSnapshot());
+
/*
 * Drop the tablesync's origin tracking if exists.
 *
@@ -387,6 +393,8 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn)
 */
replorigin_drop_by_name(originname, true, false);
 
+   PopActiveSnapshot();
+
finish_sync_worker();
}
else
@@ -483,6 +491,12 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
started_tx = true;
}
 
+   /*
+* Updating pg_replication_origin might involve 
TOAST table
+* access, so ensure we have a valid snapshot.
+*/
+   PushActiveSnapshot(GetTransactionSnapshot());
+
/*
 * Remove the tablesync origin tracking if 
exists.
 *
@@ -500,6 +514,8 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)

   sizeof(originname));
replorigin_drop_by_name(originname, true, 
false);
 
+   PopActiveSnapshot();
+
/*
 * Update the state to READY only after the 
origin cleanup.
 */
@@ -1455,8 +1471,13 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
 * Then advance to the LSN got from walrcv_create_slot. This is 
WAL
 * logged for the purpose of recovery. Locks are to prevent the
 * replication origin from vanishing while advancing.
+   

Re: sunsetting md5 password support

2024-10-16 Thread Nathan Bossart
On Fri, Oct 11, 2024 at 04:36:27PM -0500, Nathan Bossart wrote:
> Here is a first attempt at a patch for marking MD5 passwords as deprecated.
> It's quite bare-bones at the moment, so I anticipate future revisions will
> add more content.  Besides sprinkling several deprecation notices
> throughout the documentation, this patch teaches CREATE ROLE and ALTER ROLE
> to emit warnings when setting MD5 passwords.  A new GUC named
> md5_password_warnings can be set to "off" to disable these warnings.  I
> considered adding even more warnings (e.g., when authenticating), but I
> felt that would be far too noisy.

In v2, I've added an entry for the new md5_password_warnings GUC to the
documentation, and I've simplified the passwordcheck test changes a bit.

-- 
nathan
>From e7786f4d2e4c892d34c7992ffddbd1cf9e109be7 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Fri, 11 Oct 2024 16:21:09 -0500
Subject: [PATCH v2 1/1] Deprecate MD5 passwords.

MD5 has been considered to be unsuitable for use as a cryptographic
hash algorithm for some time.  Furthermore, MD5 password hashes in
PostgreSQL are vulnerable to pass-the-hash attacks, i.e., knowing
the username and hashed password is sufficient to authenticate.
The SCRAM-SHA-256 method added in v10 is not subject to these
problems and is considered to be superior to MD5.

This commit marks MD5 password support in PostgreSQL as deprecated
and to be removed in a future release.  The documentation now
contains several deprecation notices, and CREATE ROLE and ALTER
ROLE now emit deprecation warnings when setting MD5 passwords.  The
warnings can be disabled by setting the md5_password_warnings
parameter to "off".

Reviewed-by: ???
Discussion: https://postgr.es/m/ZwbfpJJol7lDWajL%40nathan
---
 .../passwordcheck/expected/passwordcheck.out  |  1 +
 .../expected/passwordcheck_1.out  |  1 +
 contrib/passwordcheck/sql/passwordcheck.sql   |  1 +
 doc/src/sgml/catalogs.sgml|  9 +++
 doc/src/sgml/client-auth.sgml |  8 +++
 doc/src/sgml/config.sgml  | 24 +++
 doc/src/sgml/libpq.sgml   |  9 +++
 doc/src/sgml/protocol.sgml|  8 +++
 doc/src/sgml/ref/create_role.sgml |  8 +++
 doc/src/sgml/runtime.sgml | 10 
 src/backend/libpq/crypt.c | 10 
 src/backend/utils/misc/guc_tables.c   |  9 +++
 src/backend/utils/misc/postgresql.conf.sample |  1 +
 src/include/libpq/crypt.h |  3 +++
 src/test/modules/test_misc/t/003_check_guc.pl |  2 +-
 src/test/regress/expected/password.out| 15 
 src/test/regress/expected/password_1.out  |  9 +++
 17 files changed, 127 insertions(+), 1 deletion(-)

diff --git a/contrib/passwordcheck/expected/passwordcheck.out 
b/contrib/passwordcheck/expected/passwordcheck.out
index 2027681daf..dfb2ccfe00 100644
--- a/contrib/passwordcheck/expected/passwordcheck.out
+++ b/contrib/passwordcheck/expected/passwordcheck.out
@@ -1,3 +1,4 @@
+SET md5_password_warnings = off;
 LOAD 'passwordcheck';
 CREATE USER regress_passwordcheck_user1;
 -- ok
diff --git a/contrib/passwordcheck/expected/passwordcheck_1.out 
b/contrib/passwordcheck/expected/passwordcheck_1.out
index 5d8d5dcc1c..9519d60a49 100644
--- a/contrib/passwordcheck/expected/passwordcheck_1.out
+++ b/contrib/passwordcheck/expected/passwordcheck_1.out
@@ -1,3 +1,4 @@
+SET md5_password_warnings = off;
 LOAD 'passwordcheck';
 CREATE USER regress_passwordcheck_user1;
 -- ok
diff --git a/contrib/passwordcheck/sql/passwordcheck.sql 
b/contrib/passwordcheck/sql/passwordcheck.sql
index 1fbd6b0e96..5953ece5c2 100644
--- a/contrib/passwordcheck/sql/passwordcheck.sql
+++ b/contrib/passwordcheck/sql/passwordcheck.sql
@@ -1,3 +1,4 @@
+SET md5_password_warnings = off;
 LOAD 'passwordcheck';
 
 CREATE USER regress_passwordcheck_user1;
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 964c819a02..0b9ca087c8 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -1618,6 +1618,15 @@
will store the md5 hash of xyzzyjoe.
   
 
+  
+   
+Support for MD5-encrypted passwords is deprecated and will be removed in a
+future release of PostgreSQL.  Refer to
+ for details about migrating to another
+password type.
+   
+  
+
   
If the password is encrypted with SCRAM-SHA-256, it has the format:
 
diff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml
index 51343de7ca..60a1d16288 100644
--- a/doc/src/sgml/client-auth.sgml
+++ b/doc/src/sgml/client-auth.sgml
@@ -1260,6 +1260,14 @@ omicron bryanh  guest1
server is encrypted for SCRAM (see below), then SCRAM-based
authentication will automatically be chosen instead.
   
+
+  
+   
+Support for MD5-encrypted passwords is deprecated and will be removed
+in a future release of PostgreS

Re: Add support to TLS 1.3 cipher suites and curves lists

2024-10-16 Thread Jacob Champion
On Tue, Oct 15, 2024 at 3:42 AM Daniel Gustafsson  wrote:
> Thanks!  I think the v8 posted todays is about ready to go in and unless there
> are objections I'll go ahead with it shortly.

This new paragraph is missing a close-paren:

> + 
> +  Additionally, LibreSSL is supported 
> using the
> +  OpenSSL compatibility layer.  The 
> minimum
> +  required version is 3.4 (from  class="osname">OpenBSD
> +  version 7.0.
>   

Other than that, LGTM!

Thanks,
--Jacob




Re: Proposal to Enable/Disable Index using ALTER INDEX (with patch)

2024-10-16 Thread Shayon Mukherjee
On Oct 15, 2024, at 7:25 PM, David Rowley  wrote:

> On Wed, 16 Oct 2024 at 03:40, Robert Haas  wrote:
>> On Sat, Oct 12, 2024 at 5:56 AM Shayon Mukherjee  wrote:
>>> Thank you for sharing this Robert. I like the idea behind "allowing
>>> extensions to control planner behavior" overall and I think it does help
>>> towards a powerful extension ecosystem too. I wonder if there is a reality
>>> where we can achieve both the outcomes here
>>> - Support disabling of indexes [1] through ALTER command
>>> - While also building on "allowing extensions to control planner behavior"
>>>   for the reasons above?
>>> [1] https://www.postgresql.org/message-id/ABD42A12-4DCF-4EE4-B903-4C657903CECE%40gmail.com
>>
>> Yes, I think we can do both things.
>
> I think so too.  I imagine there'd be cases where even hints global to
> all queries running on the server wouldn't result in the index being
> completely disabled.  For example, a physical replica might not be
> privy to the hints defined on the primary and it might just be the
> queries running on the physical replica that are getting the most use
> out of the given index.  Having the change made in pg_index would mean
> physical replicas have the index disabled too.  For the primary use
> case I have in mind (test disabling indexes you're considering
> dropping), having the disabledness replicate would be very useful.

+1 and I agree.

That said - thank you everyone for the discussions and pointers.  I now have
a new patch that introduces the ability to enable or disable indexes using
ALTER INDEX and CREATE INDEX commands, and updates get_relation_info in
plancat.c to skip disabled indexes entirely by baking the concept into the
IndexOptInfo structure.  Below are all the relevant details.

Original motivation for the problem and proposal for a patch can be found
at [1].

This patch passes all the existing tests and the newly added regression
tests.  The patch is ready for review and testing; it compiles, so it can be
applied for testing as well.

Implementation details:

- New grammar:
  - ALTER INDEX ... ENABLE/DISABLE
  - CREATE INDEX ... DISABLE
- Default state is enabled.  Indexes are only disabled when explicitly
  instructed via CREATE INDEX ... DISABLE or ALTER INDEX ... DISABLE.
- Primary key and unique constraint indexes are always enabled as well.  The
  ENABLE/DISABLE grammar is not supported for these types of indexes.  They
  can be later disabled via ALTER INDEX ... ENABLE/DISABLE if needed.
- ALTER INDEX ... ENABLE/DISABLE performs an in-place update of the pg_index
  catalog to protect against indcheckxmin [2] (older unrelated thread).
- pg_get_indexdef() supports the new functionality and grammar.  This change
  is reflected in \d output for tables and in pg_dump.  We show the DISABLE
  syntax accordingly.
- Updated the create_index.sql regression test to cover the new grammar and
  verify that disabled indexes are not used in queries (bumping
  CATALOG_VERSION_NO).  The tests cover:
  - Basic single-column and multi-column indexes
  - Partial indexes
  - Expression indexes
  - Join indexes
  - GIN and GiST indexes
  - Covering indexes
  - Range indexes
  - Unique indexes and constraints
- Adds a new enabled attribute to the IndexOptInfo structure.
- Modifies get_relation_info in plancat.c to skip disabled indexes entirely,
  thus reducing the number of places where we need to check whether an index
  is disabled.  Inspired by the conversation starting at [3].
- I chose to modify the logic within get_relation_info, as compared to, say,
  reducing the cost so the planner does not consider an index during
  planning, mostly to keep the number of changes being introduced to a
  minimum and also to keep the logic self-contained and perhaps easier to
  understand (?).
- No changes are made to stop the index from getting rebuilt.  This way we
  ensure no data is missed or corrupted when the index is re-enabled.
- TOAST indexes are supported and enabled by default as well.
- REINDEX CONCURRENTLY is supported as well, and the existing state of
  pg_index.indisenabled is carried over accordingly.
- See the changes in create_index.sql to get an idea of the grammar and SQL
  statements.
- See the changes in create_index.out to get an idea of the catalog states
  and the EXPLAIN output showing when an index is or isn't used (when
  disabled).
- Incorporated David Rowley's feedback from [4] around documentation; also,
  by skipping disabled indexes entirely in get_relation_info in plancat.c
  (as mentioned above), we address the other mentioned issues as well.

Looking forward to any and all feedback on this patch, including but not
limited to code quality, tests, and fundamental logic.

[1] https://www.postgresql.org/message-id/CANqtF-oXKe0M%3D0QOih6H%2BsZRjE2BWAbkW_1%2B9nMEAMLxUJg5jA%40mail.gmail.com
[2] https://www.postgresql.org/message-id/20180618215635.m5vrnxdxhxytv...@alap3.anarazel.de
[3] https://www.postgresql.org/message-id/3465209.1727202064%40sss.pgh.pa.us
[4] https://www.postgresql.org/message-id/CAApHDvpUNu%3DiVcdJ74sypvgeaCF%2Btfpyb8VRhZaF7DTd1xVr7g%40mail.gmail.com

Thanks
Shayon
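
A quick usage sketch of the proposed grammar described above (this is syntax
from the patch, not existing PostgreSQL syntax; table and index names are
illustrative):

    -- disable an existing index without dropping it; it is still maintained on writes
    ALTER INDEX t_i_idx DISABLE;

    -- the planner no longer considers it
    EXPLAIN SELECT * FROM t WHERE i = 42;

    -- re-enable it once convinced it is still needed
    ALTER INDEX t_i_idx ENABLE;

    -- or create an index in the disabled state to begin with
    CREATE INDEX t_j_idx ON t (j) DISABLE;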

v1-0001-Introduce-the-ability-to-enable-disa

Re: ECPG cleanup and fix for clang compile-time problem

2024-10-16 Thread Tom Lane
Alexander Lakhin  writes:
> Maybe you would like to fix in passing several (not new) defects, I've
> found while playing with ecpg under Valgrind:

Done.  After evaluation I concluded that none of these were worth the
trouble to back-patch, but by all means let's fix such things in HEAD.

regards, tom lane




Re: New "raw" COPY format

2024-10-16 Thread Daniel Verite
Joel Jacobson wrote:

> However, I thinking rejecting such column data seems like the
> better alternative, to ensure data exported with COPY TO
> can always be imported back using COPY FROM,
> for the same format. 

On the other hand, that might prevent cases where we
want to export, for instance, a valid json array:

  copy (select json_agg(col) from table ) to 'file' RAW

This is a variant of the discussion in [1] where the OP does:

 copy (select json_agg(row_to_json(t)) from  t) TO 'file' 

and he complains that both text and csv "break the JSON".
That discussion morphed into a proposed patch adding JSON
format to COPY,  but RAW would work directly as the OP
expected.

That is, unless the data happens to include JSON fields with LF/CRLF
in them, and the RAW format says this is an error condition.
In that case it's quite annoying to make it an error, rather than
simply let it pass.


[1]
https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU=k...@mail.gmail.com


Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite




Re: Misleading error "permission denied for table"

2024-10-16 Thread Tom Lane
Nathan Bossart  writes:
> On Wed, Oct 16, 2024 at 07:36:29PM +0530, Ashutosh Bapat wrote:
>> Shouldn't we report "permission denied for column atest5.three"?

> We do have "permission denied for column" messages in aclchk.c (e.g.,
> aclcheck_error_col()), but I don't see them actually used anywhere (or at
> least they don't show up in any expected regression test output).  I'm
> inclined to agree that a more specific error would be nice, but I worry
> there's some hidden complexity that's prevented it thus far...

See ExecCheckOneRelPerms, which seems to regard this as a TODO.
I think the hard part is how to report the cases that involve
pg_attribute_aclcheck_all(..., ACLMASK_ANY), which means

 * If 'how' is ACLMASK_ANY, then returns ACLCHECK_OK if user has any of the
 * privileges identified by 'mode' on any non-dropped column in the relation;

so that you can't really finger a specific column as being in
violation.  Maybe we could leave those cases as just mentioning the
rel; not sure if that leads to failing to move the goalposts much.
Otherwise, we could extend ExecCheckOneRelPerms and
pg_attribute_aclcheck_all to pass back a column number, and
then modify the error reporting in ExecCheckPermissions.

regards, tom lane




Re: allowing extensions to control planner behavior

2024-10-16 Thread Robert Haas
On Mon, Oct 14, 2024 at 6:02 AM Jakub Wartak
 wrote:
>> I wonder if we could think about reversing the order of operations
>> here and making it so that we do the distinct-ification during parse
>> analysis or maybe early in planning, so that the name you see EXPLAIN
>> print out is the real name of that thing and not just a value invented
>> for display purposes.
>
> This if that's possible?, or simply some counter and numbering the plan 
> operation? or Andrei's response/idea of using hashing??

I spent some time looking at this. I think everything that we might
ever want to do here is possible, but what I unfortunately couldn't
find is a place to do it where it would be more-or-less free.

In parse analysis, we do detect certain kinds of namespace collisions.
For example, "SELECT * FROM x, x" or "SELECT * FROM x, y x" will fail
with ERROR:  table name "x" specified more than once. Similarly,
"SELECT * FROM generate_series(1,2), x generate_series" will fail with
ERROR:  table name "generate_series" specified more than once, even
though generate_series is not, strictly speaking, a table name. From
this you might infer that each alias can be used only once, but it
turns out that's not entirely correct. If you say "SELECT * FROM x,
q.x" that is totally fine even though both relations end up with the
alias "x". There's a specific exception in the code to allow this
case, when the aliases are the same but they refer to different
relation OIDs, and this is justified by reference to the SQL standard.
Furthermore, "SELECT * FROM (x JOIN y ON true) x, y" is totally fine
even though there are two x's and two y's. The fact that the
parenthesized part has its own alias makes everything inside the
parentheses a separate scope from the names outside the parentheses,
so there's no name conflict. If you delete the "x" just after the
closing parenthesis then it's all a single scope and you get a
complaint about "y" being used twice. In this example, we're allowed
to have collisions even though there are no subqueries, but it is also
true that each subquery level gets its own scope e.g. "SELECT * FROM
(SELECT * FROM x) x" is permitted.
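
To make the cases above concrete (assuming tables x and y exist, plus a
separate schema q containing its own x):

    SELECT * FROM x, x;                      -- ERROR: table name "x" specified more than once
    SELECT * FROM x, y x;                    -- ERROR: same alias used twice in one scope
    SELECT * FROM generate_series(1,2), x generate_series;
                                             -- ERROR: "generate_series" specified more than once
    SELECT * FROM x, q.x;                    -- OK: same alias, different relation OIDs
    SELECT * FROM (x JOIN y ON true) x, y;   -- OK: the aliased join is its own scope
    SELECT * FROM (SELECT * FROM x) x;       -- OK: the subquery is its own scope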

What I think all this means is that piggy-backing on the existing
logic here doesn't look especially promising. If we're trying to
identify a specific usage of a specific relation inside of a query, we
want global uniqueness across the whole query, but this isn't even
local uniqueness within a certain query level -- it's uniqueness
within a part of a query level with a special carve-out for same-named
tables in different schemas. Just to drive the point home, the
duplicate checks are done with two nested for loops, a good
old-fashioned O(n^2) algorithm, instead of something like entering
everything into a hash table and complaining if the entry already
exists. We're basically relying on the fact that there won't be very
many items at a single level of a FROM-list for this to have
reasonable performance; and scaling it even as far as the whole query
sounds like it might be dubious. Maybe it's possible to rewrite this
somehow to use a hash table and distinguish the cases where it
shouldn't complain, but it doesn't look like a ton of fun.

Once we get past parse analysis, there's really nothing that cares
about duplicate names, as far as I can see. The planner is perfectly
happy to work on multiple instances of the same table or of different
tables with the same alias, and it really has no reason to care about
what anything would look like if it were deparsed and printed out. I
suppose this is why the code works the way it does: by postponing
renaming until somebody actually does EXPLAIN or some other deparsing
operation, we avoid doing it when it isn't "really needed". Aside from
this nudge-the-planner use case there's no real reason to do anything
else.

In terms of solving the problem, nothing prevents us from installing a
hash table in PlannerGlobal and using that to label every
RangeTblEntry (except unnamed joins, probably) with a unique alias. My
tentative idea is to have the hash table key be a string. We'd check
whether the table alias is of the form
string+'_'+positive_integer_without_a_leading_zero. If so, we'd look
up the string in the hash table, else the entire alias. The hash table
entry would tell us which integers were already used for that string,
so then we could arrange to use one that isn't used yet, essentially
re-aliasing tables wherever collisions occur. However, in the "normal"
case where the query plan is not being printed and no
query-plan-frobbing extensions are in use, this effort is all wasted.
Maybe there's a way to do this kind of work only on demand, but it
seems to be somewhat complicated by the way we handle flattening the
range table: initially, every subquery level gets its own range table,
and at the end of planning, those range tables are all collapsed into
one. This means that depending on WHEN we get asked for information
about the re-aliased table 

Re: New "raw" COPY format

2024-10-16 Thread Jacob Champion
On Tue, Oct 15, 2024 at 1:38 PM Joel Jacobson  wrote:
>
> However, I think rejecting such column data seems like the
> better alternative, to ensure data exported with COPY TO
> can always be imported back using COPY FROM,
> for the same format. If text column data contains newlines,
> users probably ought to be using the text or csv format instead.

Yeah. I think _someone's_ going to have strong opinions one way or the
other, but that person is not me. And I assume a contents check during
COPY TO is going to have a noticeable performance impact...

> > - RAW seems like an okay-ish label, but for something that's doing as
> > much magic end-of-line detection as this patch is, I'd personally
> > prefer SINGLE (as in, "single column").
>
> It's actually the same end-of-line detection as the text format
> in copyfromparse.c's CopyReadLineText(), except the code
> is simpler thanks to not having to deal with quotes or escapes.

Right, sorry, I hadn't meant to imply that you made it up. :D Just
that a "raw" format that is actually automagically detecting things
doesn't seem very "raw" to me, so I prefer the other name.

> It basically just learns the newline sequence based on the first
> occurrence, and then requires it to be the same throughout the file.

A hypothetical type whose text representation can contain '\r' but not
'\n' still can't be unambiguously round-tripped under this scheme:
COPY FROM will see the "mixed" line endings and complain, even though
there's no ambiguity.

Maybe no one will run into that problem in practice? But if they did,
I think that'd be a pretty frustrating limitation. It'd be nice to
override the behavior, to change it from "do what you think I mean" to
"do what I say".

> > - Speaking of magic end-of-line detection, can there be a way to turn
> > that off? Say, via DELIMITER?
> > - Generic DELIMITER support, for any single-byte separator at all,
> > might make a "single-column" format more generally applicable. But I
> > might be over-architecting. And it would make the COPY TO issue even
> > worse...
>
> That's an interesting idea that would provide more flexibility,
> though, at the cost of complicating things by overloading the meaning
> of DELIMITER.

I think that'd be a docs issue rather than a conceptual one, though...
it's still a delimiter. I wouldn't really expect end-user confusion.

> If aiming to make this more generally applicable,
> then at least DELIMITER would need to be multi-byte,
> since otherwise the Windows case \r\n couldn't be specified.

True.

> What I found appealing with the idea of a new COPY format,
> was that instead of overloading the existing options
> with more complexity, a new format wouldn't need to affect
> the existing options, and the new format could be explained
> separately, without making things worse for users not
> using this format.

I agree that we should not touch the existing formats. If
RAW/SINGLE/whatever needed a multibyte line delimiter, I'm not
proposing that the other formats should change.

--Jacob




Re: Misleading error "permission denied for table"

2024-10-16 Thread Nathan Bossart
On Wed, Oct 16, 2024 at 07:36:29PM +0530, Ashutosh Bapat wrote:
> In privileges.sql there are tests for column level privileges e.g.
> 
> INSERT INTO atest5(two) VALUES (6) ON CONFLICT (two) DO UPDATE set
> three = 10 RETURNING atest5.three;
> ERROR:  permission denied for table atest5
> 
> In the above case the current user regress_priv_user4, doesn't have
> privileges to access atest5.three. But the error does not mention
> atest5.three anywhere. In fact, if the same query were to be changed
> to return atest5.four, it would succeed since the user has privileges
> to access column atest5.four.
> 
> Shouldn't we report "permission denied for column atest5.three"?

We do have "permission denied for column" messages in aclchk.c (e.g.,
aclcheck_error_col()), but I don't see them actually used anywhere (or at
least they don't show up in any expected regression test output).  I'm
inclined to agree that a more specific error would be nice, but I worry
there's some hidden complexity that's prevented it thus far...

-- 
nathan




Re: System views for versions reporting

2024-10-16 Thread Tom Lane
Joe Conway  writes:
> On 10/16/24 08:47, Peter Eisentraut wrote:
>> That way, you can get this information without having to start a server
>> instance.  (Maybe you can't start a server instance because it just
>> crashed because of some library version issue ...)

> While it is also useful to be able to get the info without being able to 
> start the server, I think that would be an addition not a replacement. 
> When you have a fleet with no direct access to run shell commands, being 
> able to get this info via SQL is valuable.

Yeah.  In addition, I envisioned that this might include information
that's only readily available at runtime.  Don't have a concrete
example at hand (-ENOCAFFEINE) but I think that --version is
necessarily going to be exceedingly constrained in what it can do.

Another problem is that, just like with version(), there is already
code making assumptions about what --version will output.  Most of
that is under our control, but perhaps not all.  The main value
of a new system view, IMV, is that it's a completely green field
for us to define the contents of.

regards, tom lane




Re: Doc: shared_memory_size_in_huge_pages with the "SHOW" command.

2024-10-16 Thread Nathan Bossart
On Wed, Oct 16, 2024 at 03:31:05PM +0900, Fujii Masao wrote:
> Another idea is to use the same wording as for num_os_semaphores in 
> runtime.sgml, like this:
> 
> This parameter can be viewed
> before starting the server with a postgres command 
> like:

WFM

-- 
nathan




Misleading error "permission denied for table"

2024-10-16 Thread Ashutosh Bapat
Hi hackers,
In privileges.sql there are tests for column level privileges e.g.

INSERT INTO atest5(two) VALUES (6) ON CONFLICT (two) DO UPDATE set
three = 10 RETURNING atest5.three;
ERROR:  permission denied for table atest5

In the above case the current user regress_priv_user4, doesn't have
privileges to access atest5.three. But the error does not mention
atest5.three anywhere. In fact, if the same query were to be changed
to return atest5.four, it would succeed since the user has privileges
to access column atest5.four.

Shouldn't we report "permission denied for column atest5.three"?

-- 
Best Wishes,
Ashutosh Bapat




Re: Doc: typo in config.sgml

2024-10-16 Thread Bruce Momjian
On Wed, Oct 16, 2024 at 10:00:15AM +0200, Peter Eisentraut wrote:
> On 15.10.24 23:51, Bruce Momjian wrote:
> > On Tue, Oct 15, 2024 at 05:27:49PM -0400, Tom Lane wrote:
> > > Bruce Momjian  writes:
> > > > Well, we can only use Latin-1, so the idea is that we will be explicit
> > > > about specifying Latin-1 only as HTML entities, rather than letting
> > > > non-Latin-1 creep in as UTF8.  We can exclude certain UTF8 or SGML files
> > > > if desired.
> > > 
> > > That policy would cause substantial problems with contributor names
> > > in the release notes.  I agree with Peter that we don't need this.
> > > Catching otherwise-invisible characters seems sufficient.
> > 
> > Uh, why can't we use HTML entities going forward?  Is that harder?
> 
> I think the question should be the other way around.  The entities are a
> historical workaround for when encoding support and rendering support was
> poor.  Now you can just type in the characters you want as is, which seems
> nicer.

Yes, that does make sense, and if we fully supported Unicode, we could
ignore all of this.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  When a patient asks the doctor, "Am I going to die?", he means 
  "Am I going to die soon?"




Re: Doc: typo in config.sgml

2024-10-16 Thread Bruce Momjian
On Wed, Oct 16, 2024 at 09:58:23AM +0200, Peter Eisentraut wrote:
> On 15.10.24 23:51, Bruce Momjian wrote:
> > > I don't see why we need to enforce this at this level.  Whatever 
> > > downstream
> > > toolchain has requirements about which characters are allowed will 
> > > complain
> > > if it encounters a character it doesn't like.
> > 
> > Uh, the PDF build does not complain if you pass it a non-Latin-1 UTF8
> > characters.  To test this I added some Russian characters (non-Latin-1)
> > to release.sgml:
> > 
> > (⟨б⟩, ⟨в⟩, ⟨г⟩, ⟨д⟩, ⟨ж⟩, ⟨з⟩, ⟨к⟩, ⟨л⟩, ⟨м⟩, ⟨н⟩, ⟨п⟩, ⟨р⟩, ⟨с⟩, ⟨т⟩,
> > ⟨ф⟩, ⟨х⟩, ⟨ц⟩, ⟨ч⟩, ⟨ш⟩, ⟨щ⟩), ten vowels (⟨а⟩, ⟨е⟩, ⟨ё⟩, ⟨и⟩, ⟨о⟩, ⟨у⟩,
> > ⟨ы⟩, ⟨э⟩, ⟨ю⟩, ⟨я⟩), a semivowel / consonant (⟨й⟩), and two modifier
> > letters or "signs" (⟨ъ⟩, ⟨ь⟩)
> > 
> > and I ran 'make postgres-US.pdf', and then removed the Russian
> > characters and ran the same command again.  The output, including stderr
> > was identical.  The PDFs, of course, were not, with the Russian
> > characters showing as "".  Makefile output attached.
> 
> Hmm, mine complains:

My Debian 12 toolchain must be older.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  When a patient asks the doctor, "Am I going to die?", he means 
  "Am I going to die soon?"




Re: [PoC] Federated Authn/z with OAUTHBEARER

2024-10-16 Thread Peter Eisentraut

On 15.10.24 20:10, Jacob Champion wrote:

On Tue, Oct 15, 2024 at 11:00 AM Alexander Lakhin  wrote:

I've discovered that starting from 0785d1b8b,
make check -C src/bin/pg_combinebackup
fails under Valgrind, with the following diagnostics:


Yep, sorry for that (and thanks for the report!). It's currently
tracked over at [1], but I should have mentioned it here. The patch I
used is attached, renamed to not stress out the cfbot.


I have committed this fix.





Re: Improve node type forward reference

2024-10-16 Thread Peter Eisentraut

On 15.10.24 16:43, Nathan Bossart wrote:

On Tue, Oct 15, 2024 at 09:02:48AM +0200, Peter Eisentraut wrote:

On 14.10.24 23:28, Nathan Bossart wrote:

On Mon, Oct 14, 2024 at 09:47:59AM +0200, Peter Eisentraut wrote:

But we can do this better by using an incomplete struct, like

  struct Query *viewQuery ...;

That way, everything has the correct type and fewer casts are required. This
technique is already used elsewhere in node type definitions.


I noticed that the examples in parsenodes.h are for structs defined within
the same file.  If the struct is defined in a separate file, I guess you
might need to include another header file wherever it is used, but that
doesn't seem too bad.


No, you can leave the struct incomplete.  You only need to provide its full
definition (= include the other header file) if you actually want to access
the struct's fields.


That's what I figured.  This one LGTM, too, then.


Committed, thanks.
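
For readers less familiar with the C idiom being discussed, here is a minimal,
self-contained sketch of an incomplete struct used across headers. The file
names and the field accessed at the end are hypothetical, not taken from the
PostgreSQL tree.

/* wrapper.h -- hypothetical header: only a pointer to struct Query is
 * needed here, so an incomplete (forward-declared) type is enough and
 * no extra #include is required. */
struct Query;                       /* incomplete type declaration */

typedef struct ViewInfo
{
    struct Query *viewQuery;        /* pointer to incomplete type: OK */
} ViewInfo;

/* wrapper.c -- the full definition is only needed where the struct's
 * fields are actually accessed, so only this file includes the header
 * that defines struct Query. */
#include "wrapper.h"
#include "query.h"                  /* hypothetical header defining struct Query */

static int
view_command_type(const ViewInfo *vi)
{
    return vi->viewQuery->commandType;  /* dereference needs the full definition */
}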





Re: minor doc issue in 9.16.2.1.1. Boolean Predicate Check Expressions

2024-10-16 Thread jian he
On Thu, Oct 17, 2024 at 7:59 AM Bruce Momjian  wrote:
>
>
> Where are we on this?  I still see this behavior.
>
> ---

> > but I found following two examples returning different results,
> > i think they should return the same value.
> > select json_value('1', '$ == "1"' returning jsonb error on error);
> > select json_query('1', '$ == "1"' returning jsonb error on error);

This part has been resolved.
see the Note section in
https://www.postgresql.org/docs/current/functions-json.html#SQLJSON-QUERY-FUNCTIONS


>
> On Fri, Jun 21, 2024 at 04:53:55PM +0800, jian he wrote:
> > On Fri, Jun 21, 2024 at 11:11 AM David G. Johnston
> >  wrote:
> > >
> > > On Thu, Jun 20, 2024 at 7:30 PM jian he  
> > > wrote:
> > >>
> > >> "predicate check expressions return the single three-valued result of
> > >>
> > >> the predicate: true, false, or unknown."
> > >> "unknown" is wrong, because `select 'unknown'::jsonb;` will fail.
> > >> here "unknown" should be "null"? see jsonb_path_query doc entry also.
> > >>

doc 
(https://www.postgresql.org/docs/devel/functions-json.html#FUNCTIONS-SQLJSON-CHECK-EXPRESSIONS)
<>
While SQL-standard path expressions return the relevant element(s) of
the queried JSON value, predicate check expressions return the single
three-valued result of the predicate: true, false, or unknown.
<>

https://www.postgresql.org/docs/current/datatype-boolean.html
says
"The boolean type can have several states: “true”, “false”, and a
third state, “unknown”, which is represented by the SQL null value."

but here
select jsonb_path_query('1', '$ == "a"');
returns a JSON null value, not an SQL null value.

However,
select jsonb_path_match('1', '$ == "a"');
returns an SQL null value.


maybe we can change to
"predicate check expressions return the single three-valued result of
the predicate: true, false, or null"

Then in the Note section mention that
when predicate check expressions cannot be applied, they return JSON
null for the function jsonb_path_query, and
SQL NULL for the function jsonb_path_match or the @@ operator.
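
For quick reference, the behavior described above can be reproduced like this
(results summarized in the comments; this only illustrates the existing
behavior, it is not a proposed change):

SELECT jsonb_path_query('1', '$ == "a"');
-- one row containing the JSON value null

SELECT jsonb_path_match('1', '$ == "a"');
-- SQL NULL

SELECT '1'::jsonb @@ '$ == "a"';
-- SQL NULL, same as jsonb_path_match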




Re: Set query_id for query contained in utility statement

2024-10-16 Thread jian he
hi. Anthonin
please check attached v9-0001, v9-0002, v9-003.

v9-0001-Better-error-reporting-from-extension-scripts-Was.patch
same as v4-0001-Improve-parser-s-reporting-of-statement-start-loc.patch in [1]

v9-0002-Add-tests-covering-pgss-nested-queries.patch same as
v8-0001-Add-tests-covering-pgss-nested-queries.patch in [2]
which is your work.

v9-0003-Track-location-in-nested-explain-statement.patch
is the main change I made based on your patch.

in [3] I mentioned adding "ParseLoc location" to ExplainStmt, then you
found some problems at [4] with multi statements,
now I found a way to resolve it.
I also add "ParseLoc location;" to typedef struct CopyStmt.
copy (select 1) to stdout;
I tested my change can tracking
beginning location and  length of the nested query ("select 1")


I didn't touch other nested query cases yet; so far only ExplainStmt
and CopyStmt.
IMHO, it's neater than your patches.
Can you give it a try?

[1] https://www.postgresql.org/message-id/2245576.1728418678%40sss.pgh.pa.us
[2] 
https://www.postgresql.org/message-id/CAO6_XqqMYOxJmHJWCKjP44T9AsW0MmKV87XUYCP3R9JZvYcVaw%40mail.gmail.com
[3] 
https://www.postgresql.org/message-id/CACJufxEXSfk4o2jHDhf50fOY6WC%2BdFQke2gmpcz%2BEHVUsmEptg%40mail.gmail.com
[4] 
https://www.postgresql.org/message-id/CAO6_Xqrjr_1Ss0bRe5VFm6OsUwX2nuN_VhbhYj0LFP3acoaaWw%40mail.gmail.com


SELECT pg_stat_statements_reset() IS NOT NULL AS t;
explain(verbose) SELECT 1, 2, 3;
explain(verbose) (SELECT 1, 2, 3);
SELECT toplevel, calls, query FROM pg_stat_statements ORDER BY query
COLLATE "C", toplevel;
will have 2 calls for "SELECT $1, $2, $3"

SELECT pg_stat_statements_reset() IS NOT NULL AS t;
explain(verbose) (SELECT 1, 2, 3);
explain(verbose) SELECT 1, 2, 3;
SELECT toplevel, calls, query FROM pg_stat_statements ORDER BY query
COLLATE "C", toplevel;
will have 2 calls for " (SELECT $1, $2, $3)"
I think that's fine.
From 8718894eecfaf03dcce44f6dd3c90e0bd7294291 Mon Sep 17 00:00:00 2001
From: jian he 
Date: Wed, 16 Oct 2024 20:56:28 +0800
Subject: [PATCH v9 1/3] Better error reporting from extension scripts (Was:
 Extend ALTER OPERATOR)

---
 .../pg_stat_statements/expected/select.out|  5 +-
 contrib/pg_stat_statements/sql/select.sql |  3 +-
 src/backend/nodes/queryjumblefuncs.c  |  6 ++
 src/backend/parser/gram.y | 66 +++
 4 files changed, 34 insertions(+), 46 deletions(-)

diff --git a/contrib/pg_stat_statements/expected/select.out b/contrib/pg_stat_statements/expected/select.out
index dd6c756f67..e0e2fa265c 100644
--- a/contrib/pg_stat_statements/expected/select.out
+++ b/contrib/pg_stat_statements/expected/select.out
@@ -19,8 +19,9 @@ SELECT 1 AS "int";
1
 (1 row)
 
+/* this comment should not appear in the output */
 SELECT 'hello'
-  -- multiline
+  -- but this one will appear
   AS "text";
  text  
 ---
@@ -129,7 +130,7 @@ SELECT calls, rows, query FROM pg_stat_statements ORDER BY query COLLATE "C";
 ---+--+--
  1 |1 | PREPARE pgss_test (int) AS SELECT $1, $2 LIMIT $3
  4 |4 | SELECT $1   +
-   |  |   -- multiline  +
+   |  |   -- but this one will appear   +
|  |   AS "text"
  2 |2 | SELECT $1 + $2
  3 |3 | SELECT $1 + $2 + $3 AS "add"
diff --git a/contrib/pg_stat_statements/sql/select.sql b/contrib/pg_stat_statements/sql/select.sql
index eb45cb81ad..e0be58d5e2 100644
--- a/contrib/pg_stat_statements/sql/select.sql
+++ b/contrib/pg_stat_statements/sql/select.sql
@@ -12,8 +12,9 @@ SELECT pg_stat_statements_reset() IS NOT NULL AS t;
 --
 SELECT 1 AS "int";
 
+/* this comment should not appear in the output */
 SELECT 'hello'
-  -- multiline
+  -- but this one will appear
   AS "text";
 
 SELECT 'world' AS "text";
diff --git a/src/backend/nodes/queryjumblefuncs.c b/src/backend/nodes/queryjumblefuncs.c
index 5e43fd9229..e8bf95690b 100644
--- a/src/backend/nodes/queryjumblefuncs.c
+++ b/src/backend/nodes/queryjumblefuncs.c
@@ -90,6 +90,12 @@ CleanQuerytext(const char *query, int *location, int *len)
 	/*
 	 * Discard leading and trailing whitespace, too.  Use scanner_isspace()
 	 * not libc's isspace(), because we want to match the lexer's behavior.
+	 *
+	 * Note: the parser now strips leading comments and whitespace from the
+	 * reported stmt_location, so this first loop will only iterate in the
+	 * unusual case that the location didn't propagate to here.  But the
+	 * statement length will extend to the end-of-string or terminating
+	 * semicolon, so the second loop often does something useful.
 	 */
 	while (query_len > 0 && scanner_isspace(query[0]))
 		query++, query_location++, query_len--;
diff --git a/src/backend/parser/gram.y b/src/backend/

Re: Should we document how column DEFAULT expressions work?

2024-10-16 Thread Andrei Lepikhov

On 10/17/24 06:19, Bruce Momjian wrote:

On Fri, Jul  5, 2024 at 05:11:22PM -0400, Bruce Momjian wrote:

On Fri, Jul  5, 2024 at 05:03:35PM -0400, Tom Lane wrote:

Bruce Momjian  writes:

Well, 'now()' certainly _looks_ like a function call, though it isn't.
The fact that 'now()'::timestamptz and 'now'::timestamptz generate
volatile results via a function call was my point.


The only reason 'now()'::timestamptz works is that timestamptz_in
ignores irrelevant punctuation (or what it thinks is irrelevant,
anyway).  I do not think we should include examples that look like
that, because it will further confuse readers who don't already
have a solid grasp of how this works.


Wow, I see that now:

test=> SELECT 'now('::timestamptz;
  timestamptz
---
 2024-07-05 17:04:33.457915-04

If I remove the 'now()' mention in the docs, patch attached, I am
concerned people will be confused whether it is the removal of the
single quotes or the use of "()" which causes insert-time evaluation,
and they might try 'now()'.


Does anyone like this patch?  I changed now()::timestamptz to
now::timestamptz.
Pardon the noise, but can you consider the idea of replacing the phrase 
'data insertion time' with something like 'immediately before the 
insertion operation starts'? Sometimes people (especially younglings) 
ask which time it is precisely: will it differ for each tuple?


--
regards, Andrei Lepikhov





Re: BF mamba failure

2024-10-16 Thread Michael Paquier
On Wed, Oct 16, 2024 at 10:11:08AM +, Bertrand Drouvot wrote:
> Indeed, in pgstat_release_entry_ref(), we're doing:
> 
> if (pg_atomic_fetch_sub_u32(&entry_ref->shared_entry->refcount, 1) == 1)
> .
> .
>   shent = dshash_find(pgStatLocal.shared_hash,
> &entry_ref->shared_entry->key,
> true);
> 
> I wonder if we are not decrementing &entry_ref->shared_entry->refcount too 
> early.
> 
> I mean, wouldn't that make sense to decrement it after the dshash_find() call?
> (to ensure a "proper" whole entry locking, done in dshash_find(), first)

Making that a two-step process (first step to read the refcount,
second step to decrement it after taking the exclusive lock through  
dshash_find) would make the logic more complicated than what we have
now, without offering more protection, afaik, because you'd still need
to worry about a race condition between the first and second steps.
Making this a single step would incur more dshash_find() calls than
we actually need, and I would not underestimate the cost of that under a
heavily-concurrent pgstat_release_entry_ref() path.

> Yeah, FWIW, we would be going from 32 bytes:
>/* total size (bytes):   32 */
> 
> to 40 bytes (with the patch applied):
>/* total size (bytes):   40 */
> 
> Due to the padding, that's a 8 bytes increase while we're adding "only" 4 
> bytes.

I have spent some time digging into all the original pgstat threads 
dealing with the shmem implementation or dshash, and I'm not really
seeing anybody stressing on the size of the individual hash entries.
This stuff was already wasting space with padding originally, so
perhaps we're stressing too much here?  Would anybody like to comment?
--
Michael




Re: Considering fractional paths in Append node

2024-10-16 Thread Andrei Lepikhov

On 10/17/24 07:05, Andy Fan wrote:

Nikita Malakhov  writes:

Hello Nikita,


Hi hackers!

Sorry, I've forgot to attach the patch itself. Please check it out.


Could you check if [1] is related to this subject? I think the hard part
would be which tuple_fraction to use during adding
add_paths_to_append_rel since root->tuple_fraction is on subquery level,
while the add_paths_to_append_rel is only on RelOptInfo level.

[1]
https://www.postgresql.org/message-id/CAApHDvry0nSV62kAOH3iccvfPhGPLN0Q97%2B%3Db1RsDPXDz3%3DCiQ%40mail.gmail.com

Yes, this thread is connected to the code Nikita has proposed.
It is mostly about partitioned cases, of course.
I'm not sure about partitionwise joins - they look overcomplicated for
the moment. We just see cases where a SeqScan is preferred to an IndexScan.
One logical way is to add an alternative fractional path to the Append
node; then, if at the top level some Limit, IncrementalSort or another
fractional-friendly node requests only a small part of the data, the
optimiser will have the option to choose the fractional path.


--
regards, Andrei Lepikhov
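
For concreteness, a sketch of the query shape being discussed, on a
hypothetical partitioned table (not taken from the thread's test case):

-- Only a few rows are requested, so a cheap-startup ("fractional") child
-- path under the Append/MergeAppend, e.g. an ordered index scan per
-- partition, can beat building full SeqScan + Sort paths.
SELECT *
FROM measurements          -- hypothetical partitioned table
WHERE device_id = 42
ORDER BY recorded_at
LIMIT 10;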





Re: Should we document how column DEFAULT expressions work?

2024-10-16 Thread Tom Lane
Andrei Lepikhov  writes:
> Pardon the noise, but can you consider the idea of replacing the phrase 
> 'data insertion time' with something like 'immediately before the 
> insertion operation starts'? Sometimes people (especially younglings) 
> ask which time it is precisely: will it differ for each tuple?

If we're discussing the meaning of "now", it's the transaction
start timestamp, cf

https://www.postgresql.org/docs/devel/functions-datetime.html#FUNCTIONS-DATETIME-CURRENT

I do not think other wordings will improve on that.

regards, tom lane




Re: Doc: shared_memory_size_in_huge_pages with the "SHOW" command.

2024-10-16 Thread Seino Yuki

Thank you, everyone.


the point being to encourage its use before the server is running
as we don't want to allocate anything when tuning it.
I was mistaken and now understand that it needs to be run before the 
server is running.


Another idea is to use the same wording as for num_os_semaphores in 
runtime.sgml, like this:


WFM

I've fixed the patch and will register it in the CF.

Regards,
--
Yuki Seino
NTT DATA CORPORATION
diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml
index 2c4d5ef640..63669f2d2e 100644
--- a/doc/src/sgml/runtime.sgml
+++ b/doc/src/sgml/runtime.sgml
@@ -1431,9 +1431,9 @@ export PG_OOM_ADJUST_VALUE=0
 the operating system to provide enough huge pages of the desired size.
 To determine the number of huge pages needed, use the
 postgres command to see the value of
-.  Note that the
-server must be shut down to view this runtime-computed parameter.
-This might look like:
+.  This parameter
+can be viewed before starting the server with a postgres
+command like:
 
 $ postgres -D $PGDATA -C shared_memory_size_in_huge_pages
 3170


Re: Should we document how column DEFAULT expressions work?

2024-10-16 Thread Bruce Momjian
On Wed, Oct 16, 2024 at 04:45:39PM -0700, David G. Johnston wrote:
> On Wednesday, October 16, 2024, Bruce Momjian  wrote:
> I do not, but maybe I’m being overly pedantic.  All string literals are parsed
> during the create table command.  It’s only the situations where that parsing
> is non-deterministic that cause an issue.
> 
> Is there anything wrong with the patch I proposed?

I thought it was too verbose so the point was not clear.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  When a patient asks the doctor, "Am I going to die?", he means 
  "Am I going to die soon?"




Re: ltree docs imprecise about sorting order

2024-10-16 Thread Bruce Momjian
On Thu, May 23, 2024 at 04:17:50PM +, PG Doc comments form wrote:
> The following documentation comment has been logged on the website:
> 
> Page: https://www.postgresql.org/docs/16/ltree.html
> Description:
> 
> The ltree docs available at
> https://www.postgresql.org/docs/current/ltree.html state "Comparison sorts
> in the order of a tree traversal" without specifying the strategy
> implemented to walk the tree. 
> A quick experiment suggests that the implemented solution is pre-ordered
> depth-first search.
> I suggest the ltree docs be amended to "Comparison sorts in the order of a
> pre-ordered depth-first tree traversal".

[ moved to hackers ]

Can someone confirm this and/or create a patch?

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  When a patient asks the doctor, "Am I going to die?", he means 
  "Am I going to die soon?"




Re: Considering fractional paths in Append node

2024-10-16 Thread Nikita Malakhov
Hi hackers!

Sorry, I've forgot to attach the patch itself. Please check it out.


-- 
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
https://postgrespro.ru/


0001_append_limit_v1.patch
Description: Binary data


Re: Statistics Import and Export

2024-10-16 Thread Corey Huinker
>
> Code fix with comment on why nobody expects a relpages -1. Test case to
> demonstrate that relpages -1 can happen, and updated doc to reflect the new
> lower bound.
>

Additional fixes, now in a patch-set:

1. Allow relpages to be set to -1 (partitioned tables with partitions have
this value after ANALYZE).
2. Turn off autovacuum on tables (where possible) if they are going to be
the target of pg_set_relation_stats().
3. Allow pg_set_relation_stats to continue past an out-of-range detection
on one attribute, rather than immediately returning false.
From c0efe3fb2adac102e186e086bf94fb29fabba28b Mon Sep 17 00:00:00 2001
From: Corey Huinker 
Date: Tue, 15 Oct 2024 19:09:59 -0400
Subject: [PATCH v2 1/3] Allow pg_set_relation_stats() to set relpages to -1.

While the default value for relpages is 0, if a partitioned table with
at least one child has been analyzed, then the partititoned table will
have a relpages value of -1. pg_set_relation_stats() should therefore be
able to set relpages to -1.

Add test case to demonstrate the behavior of ANALYZE on a partitioned
table with child table(s), and to demonstrate that
pg_set_relation_stats() now accepts -1.
---
 src/backend/statistics/relation_stats.c|  8 +++--
 src/test/regress/expected/stats_import.out | 38 +-
 src/test/regress/sql/stats_import.sql  | 25 ++
 doc/src/sgml/func.sgml |  2 +-
 4 files changed, 69 insertions(+), 4 deletions(-)

diff --git a/src/backend/statistics/relation_stats.c b/src/backend/statistics/relation_stats.c
index 26f15061e8..b90f70b190 100644
--- a/src/backend/statistics/relation_stats.c
+++ b/src/backend/statistics/relation_stats.c
@@ -99,11 +99,15 @@ relation_statistics_update(FunctionCallInfo fcinfo, int elevel)
 	{
 		int32		relpages = PG_GETARG_INT32(RELPAGES_ARG);
 
-		if (relpages < 0)
+		/*
+		 * While the default value for relpages is 0, a partitioned table with at
+		 * least one child partition can have a relpages of -1.
+		 */
+		if (relpages < -1)
 		{
 			ereport(elevel,
 	(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-	 errmsg("relpages cannot be < 0")));
+	 errmsg("relpages cannot be < -1")));
 			table_close(crel, RowExclusiveLock);
 			return false;
 		}
diff --git a/src/test/regress/expected/stats_import.out b/src/test/regress/expected/stats_import.out
index cd1b80aa43..46e45a5fa3 100644
--- a/src/test/regress/expected/stats_import.out
+++ b/src/test/regress/expected/stats_import.out
@@ -135,9 +135,45 @@ SELECT
 'stats_import.testview'::regclass);
 ERROR:  cannot modify statistics for relation "testview"
 DETAIL:  This operation is not supported for views.
+-- Partitioned tables with at least 1 child partition will, when analyzed,
+-- have a relpages of -1
+CREATE TABLE stats_import.part_parent ( i integer ) PARTITION BY RANGE(i);
+CREATE TABLE stats_import.part_child_1
+  PARTITION OF stats_import.part_parent
+  FOR VALUES FROM (0) TO (10);
+ANALYZE stats_import.part_parent;
+SELECT relpages
+FROM pg_class
+WHERE oid = 'stats_import.part_parent'::regclass;
+ relpages 
+--
+   -1
+(1 row)
+
+-- nothing stops us from setting it to not -1
+SELECT
+pg_catalog.pg_set_relation_stats(
+relation => 'stats_import.part_parent'::regclass,
+relpages => 2::integer);
+ pg_set_relation_stats 
+---
+ t
+(1 row)
+
+-- nothing stops us from setting it to -1
+SELECT
+pg_catalog.pg_set_relation_stats(
+relation => 'stats_import.part_parent'::regclass,
+relpages => -1::integer);
+ pg_set_relation_stats 
+---
+ t
+(1 row)
+
 DROP SCHEMA stats_import CASCADE;
-NOTICE:  drop cascades to 4 other objects
+NOTICE:  drop cascades to 5 other objects
 DETAIL:  drop cascades to type stats_import.complex_type
 drop cascades to table stats_import.test
 drop cascades to sequence stats_import.testseq
 drop cascades to view stats_import.testview
+drop cascades to table stats_import.part_parent
diff --git a/src/test/regress/sql/stats_import.sql b/src/test/regress/sql/stats_import.sql
index 3e9f6d9124..ad129b1ab9 100644
--- a/src/test/regress/sql/stats_import.sql
+++ b/src/test/regress/sql/stats_import.sql
@@ -95,4 +95,29 @@ SELECT
 pg_catalog.pg_clear_relation_stats(
 'stats_import.testview'::regclass);
 
+-- Partitioned tables with at least 1 child partition will, when analyzed,
+-- have a relpages of -1
+CREATE TABLE stats_import.part_parent ( i integer ) PARTITION BY RANGE(i);
+CREATE TABLE stats_import.part_child_1
+  PARTITION OF stats_import.part_parent
+  FOR VALUES FROM (0) TO (10);
+
+ANALYZE stats_import.part_parent;
+
+SELECT relpages
+FROM pg_class
+WHERE oid = 'stats_import.part_parent'::regclass;
+
+-- nothing stops us from setting it to not -1
+SELECT
+pg_catalog.pg_set_relation_stats(
+relation => 'stats_import.part_parent'::regclass,
+relpages => 2::integer);
+
+-- nothing stops us from setting it to -1
+SELECT
+pg_catalog.pg_set_r

Re: Should we document how column DEFAULT expressions work?

2024-10-16 Thread Bruce Momjian
On Fri, Jul  5, 2024 at 05:11:22PM -0400, Bruce Momjian wrote:
> On Fri, Jul  5, 2024 at 05:03:35PM -0400, Tom Lane wrote:
> > Bruce Momjian  writes:
> > > Well, 'now()' certainly _looks_ like a function call, though it isn't. 
> > > The fact that 'now()'::timestamptz and 'now'::timestamptz generate
> > > volatile results via a function call was my point.
> > 
> > The only reason 'now()'::timestamptz works is that timestamptz_in
> > ignores irrelevant punctuation (or what it thinks is irrelevant,
> > anyway).  I do not think we should include examples that look like
> > that, because it will further confuse readers who don't already
> > have a solid grasp of how this works.
> 
> Wow, I see that now:
> 
>   test=> SELECT 'now('::timestamptz;
> timestamptz
>   ---
>2024-07-05 17:04:33.457915-04
> 
> If I remove the 'now()' mention in the docs, patch attached, I am
> concerned people will be confused whether it is the removal of the
> single quotes or the use of "()" which causes insert-time evaluation,
> and they might try 'now()'.

Does anyone like this patch?  I changed now()::timestamptz to
now::timestamptz.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  When a patient asks the doctor, "Am I going to die?", he means 
  "Am I going to die soon?"
diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 83859bac76f..f5b861b387f 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -897,6 +897,13 @@ WITH ( MODULUS numeric_literal, REM
   does not specify a value for the column.  If there is no default
   for a column, then the default is null.
  
+
+ 
+  Note, a string that returns a volatile result once cast to a data
+  type, like 'now'::timestamptz, is evaluated at
+  table creation time, while now::timestamptz
+  (without quotes) is evaluated at data insertion time.
+ 
 

 


Re: pgindent exit status if a file encounters an error

2024-10-16 Thread Bruce Momjian


Uh, where are we on this?

---

On Fri, Jun 28, 2024 at 06:05:35PM +0530, Ashutosh Bapat wrote:
> 
> 
> On Wed, Jun 26, 2024 at 8:54 PM Tom Lane  wrote:
> 
> Ashutosh Bapat  writes:
> > The usage help mentions exit code 2 specifically while describing 
> --check
> > option but it doesn't mention exit code 1. Neither does the README. So I
> > don't think we need to document exit code 3 anywhere. Please let me know
> if
> > you think otherwise.
> 
> I think we should have at least a code comment summarizing the
> possible exit codes, along the lines of
> 
> # Exit codes:
> #   0 -- all OK
> #   1 -- could not invoke pgindent, nothing done
> #   2 -- --check mode and at least one file requires changes
> #   3 -- pgindent failed on at least one file
> 
> 
> Thanks. Here's a patch with these lines.
>  
> In an offline chat Andrew mentioned that he expects more changes in the patch
> and he would take care of those. Will review and test his patch.
> 
> --
> Best Wishes,
> Ashutosh Bapat

> From 2a61830f0a2b6559d04bef75c6aa224e30ed1609 Mon Sep 17 00:00:00 2001
> From: Ashutosh Bapat 
> Date: Wed, 26 Jun 2024 15:46:53 +0530
> Subject: [PATCH 1/2] pgindent exit status on error
> 
> When pg_bsd_indent exits with a non-zero status or reports an error,
> make pgindent exit with non-zero status 3. The program does not exit on
> the first instance of the error. Instead it continues to process
> remaining files as long as some other exit condition is encountered, in
> which case exit code 3 is reported.
> 
> This helps to detect errors automatically when pgindent is run in shells
> which interpret non-zero exit status as failure.
> 
> Author: Ashutosh Bapat
> Reviewed by: Andrew Dunstan
> Discussion: 
> https://www.postgresql.org/message-id/caexhw5sprsifeldp-u1fa5ba7ys2f0gvljmkoobopkadjwq...@mail.gmail.com
> ---
>  src/tools/pgindent/pgindent | 10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/src/tools/pgindent/pgindent b/src/tools/pgindent/pgindent
> index 48d83bc434..03b319c9c3 100755
> --- a/src/tools/pgindent/pgindent
> +++ b/src/tools/pgindent/pgindent
> @@ -1,5 +1,12 @@
>  #!/usr/bin/perl
>  
> +# Program to maintain uniform layout style in our C and Perl code.
> +# Exit codes:
> +#   0 -- all OK
> +#   1 -- could not invoke pgindent, nothing done
> +#   2 -- --check mode and at least one file requires changes
> +#   3 -- pgindent failed on at least one file
> +
>  # Copyright (c) 2021-2024, PostgreSQL Global Development Group
>  
>  use strict;
> @@ -409,6 +416,7 @@ foreach my $source_filename (@files)
>   if ($source eq "")
>   {
>   print STDERR "Failure in $source_filename: " . $error_message . 
> "\n";
> + $status = 3;
>   next;
>   }
>  
> @@ -429,7 +437,7 @@ foreach my $source_filename (@files)
>  
>   if ($check)
>   {
> - $status = 2;
> + $status ||= 2;
>   last unless $diff;
>   }
>   }
> -- 
> 2.34.1
> 


-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  When a patient asks the doctor, "Am I going to die?", he means 
  "Am I going to die soon?"




Re: Limiting overshoot in nbtree's parallel SAOP index scans

2024-10-16 Thread Matthias van de Meent
On Thu, 17 Oct 2024 at 00:33, Peter Geoghegan  wrote:
>
> On Wed, Oct 16, 2024 at 5:48 PM Matthias van de Meent
>  wrote:
> > In v17 and the master branch you'll note 16 buffer hits for the test
> > query. However, when we use more expensive btree compare operations
> > (e.g. by adding pg_usleep(1) to both btint8cmp and btint4cmp), the
> > buffer access count starts to vary a lot and skyrockets to 30+ on my
> > machine, in some cases reaching >100 buffer hits. After applying my
> > patch, the buffer access count is capped to a much more agreeable
> > 16-18 hits - it still shows signs of overshooting the serial bounds,
> > but the number of buffers we overshoot our target is capped and thus
> > significantly lower.
>
> It's not exactly capped, though. Since in any case you're always prone
> to getting extra leaf page reads at the end of each primitive index
> scan. That's not something that's new to Postgres 17, though.

True, but the SAOP-enabled continued overshoot _is_ new: previously,
each backend would only get up to one additional buffer access for
every SAOP scan entry, while now it's only limited by outer SAOP
bounds and index size.

> Anyway, I'm still not convinced. Your test case requires adding a one
> second delay to each ORDER proc comparison,

microsecond. pg_usleep() takes microseconds as argument, and so
pg_usleep(1) should sleep for >=1 microsecond, but definitely much
less than 1 second.

> and so has an unrealistic adversarial character.

I don't think it's too adversarial to expect some compare operations
for a page to take one microsecond, especially when that compare
operation is related to SAOPs. I think complex comparators like those
for jsonb or collated text can certainly take much more CPU time than
your usual integer compare operation, which can definitely be long
enough for other workers to seize the scan. It's just that I didn't
want to spend a lot of time building a dataset that would expose the
issue when a single and obvious modification can do the same.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)




Should we document how column DEFAULT expressions work?

2024-10-16 Thread David G. Johnston
On Wednesday, October 16, 2024, Bruce Momjian  wrote:

> On Fri, Jul  5, 2024 at 05:11:22PM -0400, Bruce Momjian wrote:
> > On Fri, Jul  5, 2024 at 05:03:35PM -0400, Tom Lane wrote:
> > > Bruce Momjian  writes:
> > > > Well, 'now()' certainly _looks_ like a function call, though it
> isn't.
> > > > The fact that 'now()'::timestamptz and 'now'::timestamptz generate
> > > > volatile results via a function call was my point.
> > >
> > > The only reason 'now()'::timestamptz works is that timestamptz_in
> > > ignores irrelevant punctuation (or what it thinks is irrelevant,
> > > anyway).  I do not think we should include examples that look like
> > > that, because it will further confuse readers who don't already
> > > have a solid grasp of how this works.
> >
> > Wow, I see that now:
> >
> >   test=> SELECT 'now('::timestamptz;
> > timestamptz
> >   ---
> >2024-07-05 17:04:33.457915-04
> >
> > If I remove the 'now()' mention in the docs, patch attached, I am
> > concerned people will be confused whether it is the removal of the
> > single quotes or the use of "()" which causes insert-time evaluation,
> > and they might try 'now()'.
>
> Does anyone like this patch?  I changed now()::timestamptz to
> now::timestamptz.
>

I do not, but maybe I’m being overly pedantic.  All string literals are
parsed during the create table command.  It’s only the situations where
that parsing is non-deterministic that cause an issue.

Is there anything wrong with the patch I proposed?

David J.


Re: POC, WIP: OR-clause support for indexes

2024-10-16 Thread Alexander Korotkov
On Wed, Oct 16, 2024 at 7:22 AM Andrei Lepikhov  wrote:
> On 10/12/24 21:25, Alexander Korotkov wrote:
> > I forgot to specify (COSTS OFF) for EXPLAINs in regression tests.  Fixed in 
> > v42.
> I've passed through the patch set.
>
> Let me put aside the v42-0003  patch—it looks debatable, and I need time
> to analyse the change in regression tests caused by this patch.

Yes, 0003 patch is for illustration purposes for now.  I will not keep
rebasing it.  We can pick it later when main patches are committed.

> Comments look much better according to my current language level. Ideas
> with fast exits also look profitable and are worth an additional
> 'matched' variable.
>
> So, in general, it is ok. I think only one place with
> inner_other_clauses can be improved. Maybe it will be enough to create
> this list only once, outside the 'foreach(j, groupedArgs)' loop? Also, the
> comment on the necessity of this operation was unclear to me. See the
> attachment for my modest attempt at improving it.

Thank you, I've integrated your patch with minor edits from me.

--
Regards,
Alexander Korotkov
Supabase


v43-0002-Teach-bitmap-path-generation-about-transforming-.patch
Description: Binary data


v43-0001-Transform-OR-clauses-to-SAOP-s-during-index-matc.patch
Description: Binary data
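
As a rough illustration of what the patch set above is about (the exact
transformation rules are defined by the patches; table and column names here
are invented, and this is only the general idea):

-- An OR chain over the same indexed column ...
SELECT * FROM t WHERE a = 1 OR a = 2 OR a = 3;

-- ... can be matched to the index as if it were a single
-- scalar-array-operation (SAOP) qual:
SELECT * FROM t WHERE a = ANY (ARRAY[1, 2, 3]);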


Re: Using per-transaction memory contexts for storing decoded tuples

2024-10-16 Thread Masahiko Sawada
On Wed, Oct 16, 2024 at 10:32 AM Masahiko Sawada  wrote:
>
> On Tue, Oct 15, 2024 at 9:01 PM Amit Kapila  wrote:
> >
> > On Tue, Oct 15, 2024 at 11:15 PM Masahiko Sawada  
> > wrote:
> > >
> > > On Sun, Oct 13, 2024 at 11:00 PM Amit Kapila  
> > > wrote:
> > > >
> > > > On Fri, Oct 11, 2024 at 3:40 AM Masahiko Sawada  
> > > > wrote:
> > > > >
> > > > > Please find the attached patches.
> > > > >
> > >
> > > Thank you for reviewing the patch!
> > >
> > > >
> > > > @@ -343,9 +343,9 @@ ReorderBufferAllocate(void)
> > > >   */
> > > >   buffer->tup_context = GenerationContextCreate(new_ctx,
> > > > "Tuples",
> > > > -   SLAB_LARGE_BLOCK_SIZE,
> > > > -   SLAB_LARGE_BLOCK_SIZE,
> > > > -   SLAB_LARGE_BLOCK_SIZE);
> > > > +   SLAB_DEFAULT_BLOCK_SIZE,
> > > > +   SLAB_DEFAULT_BLOCK_SIZE,
> > > > +   SLAB_DEFAULT_BLOCK_SIZE);
> > > >
> > > > Shouldn't we change the comment atop this change [1] which states that
> > > > we should benchmark the existing values?
> > >
> > > Agreed.
> > >
> >
> > Can we slightly tweak the comments as follows so that it doesn't sound
> > like a fix for a bug?
> >
> > /* To minimize memory fragmentation caused by long-running
> > transactions with changes spanning multiple memory blocks, we use a
> > single fixed-size memory block for decoded tuple storage. The tests
> > showed that the default memory block size maintains logical decoding
> > performance without causing fragmentation due to concurrent
> > transactions. One might think that we can use the max size as
> > SLAB_LARGE_BLOCK_SIZE but the test also showed it doesn't help resolve
> > the memory fragmentation.
>
> Agreed. I've incorporated your comment in the latest patches. I'm
> going to push them today, barring any objections or further comments.

Pushed. Thank you all for reviewing and testing the patch.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Should we document how column DEFAULT expressions work?

2024-10-16 Thread Tom Lane
Bruce Momjian  writes:
> Does anyone like this patch?  I changed now()::timestamptz to
> now::timestamptz.

No, because you clearly didn't bother to test it:

regression=# select now::timestamptz;
ERROR:  column "now" does not exist
LINE 1: select now::timestamptz;
   ^

Also "a string that returns a volatile result once cast to a data
type" is pretty meaningless to my eyes, not least because the real
problem is that the construct is *not* volatile, but gets folded to
a constant at CREATE TABLE time.

The distinction we want to draw is that 'now'::timestamptz is a
parse-time constant and so is not equivalent to the function call
now() (which already produces timestamptz, so no need for a cast).
This is already covered at the end of section 9.9.5 Current Date/Time,
although I have no objection to repeating it in some form in the
CREATE TABLE docs.

regards, tom lane
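
A minimal illustration of the distinction described above, consistent with
section 9.9.5 (the table names are invented):

-- The quoted literal is coerced while CREATE TABLE is parsed, so the
-- default becomes a fixed timestamp (roughly the table creation time):
CREATE TABLE t_const (created timestamptz DEFAULT 'now');

-- now() is a genuine function call, evaluated at row insertion time
-- (it returns the transaction start timestamp):
CREATE TABLE t_func (created timestamptz DEFAULT now());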




Re: Set AUTOCOMMIT to on in script output by pg_dump

2024-10-16 Thread Shinya Kato

On 2024-10-10 14:56, Shinya Kato wrote:

A new patch is attached.
I am not a native English speaker, so corrections to the text are welcome.


I created a commit fest entry.
https://commitfest.postgresql.org/50/5306/

--
Regards,
Shinya Kato
NTT DATA GROUP CORPORATION




Re: MergeAppend could consider sorting cheapest child path

2024-10-16 Thread Andy Fan
Bruce Momjian  writes:

> Is this still being considered?

I'd +1 this feature. I guess this would be more useful in the parallel
case, where the Sort can be pushed down to a parallel worker, and in the
distributed database case, where the Sort can be pushed down to multiple
nodes; as a result, the leader just does the merge work.

At the high level of the implementation, sorting the *cheapest* child path
doesn't look like it adds too much overhead to the planning effort.

-- 
Best Regards
Andy Fan





Re: wrong comment in libpq.h

2024-10-16 Thread Bruce Momjian
On Fri, May 10, 2024 at 11:14:51AM -0700, David Zhang wrote:
> 
> > It looks like this wording "prototypes for functions in" is used many
> > times in src/include/, in many cases equally inaccurately, so I would
> > suggest creating a more comprehensive patch for this.
> 
> I noticed this "prototypes for functions in" in many header files and
> briefly checked them. It kind of all make sense except the bufmgr.h has
> something like below.
> 
> /* in buf_init.c */
> extern void InitBufferPool(void);
> extern Size BufferShmemSize(void);
> 
> /* in localbuf.c */
> extern void AtProcExit_LocalBuffers(void);
> 
> /* in freelist.c */
> 
> which doesn't say "prototypes for functions xxx", but it still makes sense
> to me.

I looked at this and decided the GUC section was illogical, so I just
moved the variables up into the be-secure.c section.  Patch attached.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  When a patient asks the doctor, "Am I going to die?", he means 
  "Am I going to die soon?"
diff --git a/src/include/libpq/libpq.h b/src/include/libpq/libpq.h
index 142c98462ed..2a1ff89eb71 100644
--- a/src/include/libpq/libpq.h
+++ b/src/include/libpq/libpq.h
@@ -86,19 +86,6 @@ extern bool pq_check_connection(void);
 /*
  * prototypes for functions in be-secure.c
  */
-extern PGDLLIMPORT char *ssl_library;
-extern PGDLLIMPORT char *ssl_cert_file;
-extern PGDLLIMPORT char *ssl_key_file;
-extern PGDLLIMPORT char *ssl_ca_file;
-extern PGDLLIMPORT char *ssl_crl_file;
-extern PGDLLIMPORT char *ssl_crl_dir;
-extern PGDLLIMPORT char *ssl_dh_params_file;
-extern PGDLLIMPORT char *ssl_passphrase_command;
-extern PGDLLIMPORT bool ssl_passphrase_command_supports_reload;
-#ifdef USE_SSL
-extern PGDLLIMPORT bool ssl_loaded_verify_locations;
-#endif
-
 extern int	secure_initialize(bool isServerStart);
 extern bool secure_loaded_verify_locations(void);
 extern void secure_destroy(void);
@@ -109,6 +96,27 @@ extern ssize_t secure_write(Port *port, void *ptr, size_t len);
 extern ssize_t secure_raw_read(Port *port, void *ptr, size_t len);
 extern ssize_t secure_raw_write(Port *port, const void *ptr, size_t len);
 
+/*
+ * declarations for variables defined in be-secure.c
+ */
+extern PGDLLIMPORT char *ssl_library;
+extern PGDLLIMPORT char *ssl_ca_file;
+extern PGDLLIMPORT char *ssl_cert_file;
+extern PGDLLIMPORT char *ssl_crl_file;
+extern PGDLLIMPORT char *ssl_crl_dir;
+extern PGDLLIMPORT char *ssl_key_file;
+extern PGDLLIMPORT int ssl_min_protocol_version;
+extern PGDLLIMPORT int ssl_max_protocol_version;
+extern PGDLLIMPORT char *ssl_passphrase_command;
+extern PGDLLIMPORT bool ssl_passphrase_command_supports_reload;
+extern PGDLLIMPORT char *ssl_dh_params_file;
+extern PGDLLIMPORT char *SSLCipherSuites;
+extern PGDLLIMPORT char *SSLECDHCurve;
+extern PGDLLIMPORT bool SSLPreferServerCiphers;
+#ifdef USE_SSL
+extern PGDLLIMPORT bool ssl_loaded_verify_locations;
+#endif
+
 /*
  * prototypes for functions in be-secure-gssapi.c
  */
@@ -116,13 +124,6 @@ extern ssize_t secure_raw_write(Port *port, const void *ptr, size_t len);
 extern ssize_t secure_open_gssapi(Port *port);
 #endif
 
-/* GUCs */
-extern PGDLLIMPORT char *SSLCipherSuites;
-extern PGDLLIMPORT char *SSLECDHCurve;
-extern PGDLLIMPORT bool SSLPreferServerCiphers;
-extern PGDLLIMPORT int ssl_min_protocol_version;
-extern PGDLLIMPORT int ssl_max_protocol_version;
-
 enum ssl_protocol_versions
 {
 	PG_TLS_ANY = 0,


Re: Fixing Hash Join bug I caused with adf97c156

2024-10-16 Thread David Rowley
On Wed, 16 Oct 2024 at 15:21, David Rowley  wrote:
> Here's a patch including a test this time.

I've pushed this patch. Thanks for looking.

David




Re: ltree docs imprecise about sorting order

2024-10-16 Thread David G. Johnston
On Wednesday, October 16, 2024, Bruce Momjian  wrote:

> On Thu, May 23, 2024 at 04:17:50PM +, PG Doc comments form wrote:
> > The following documentation comment has been logged on the website:
> >
> > Page: https://www.postgresql.org/docs/16/ltree.html
> > Description:
> >
> > The ltree docs available at
> > https://www.postgresql.org/docs/current/ltree.html state "Comparison
> sorts
> > in the order of a tree traversal" without specifying the strategy
> > implemented to walk the tree.
> > A quick experiment suggests that the implemented solution is pre-ordered
> > depth-first search.
> > I suggest the ltree docs be amended to "Comparison sorts in the order of
> a
> > pre-ordered depth-first tree traversal".
>
> [ moved to hackers ]
>
> Can someone confirm this and/or create a patch?
>

If we are going to update the description of sorting we should probably
take the chance to mention collation, or I believe the lack thereof.
Including an example with order by wouldn’t hurt either.

The example data is also a perfect tree, all intermediate nodes exist as
their own rows.  It may be informative to include exceptions to this rule.

David J.
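
Something along the lines of the ORDER BY example being suggested (data
invented; the ordering shown is what a pre-order depth-first traversal would
give, with path labels compared element by element):

CREATE EXTENSION IF NOT EXISTS ltree;
CREATE TABLE paths (p ltree);
INSERT INTO paths VALUES
    ('Top.Science.Astronomy'),
    ('Top'),
    ('Top.Collections'),
    ('Top.Science');

SELECT p FROM paths ORDER BY p;
-- Top
-- Top.Collections
-- Top.Science
-- Top.Science.Astronomy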


Re: New "raw" COPY format

2024-10-16 Thread Joel Jacobson
On Wed, Oct 16, 2024, at 18:04, Jacob Champion wrote:
> A hypothetical type whose text representation can contain '\r' but not
> '\n' still can't be unambiguously round-tripped under this scheme:
> COPY FROM will see the "mixed" line endings and complain, even though
> there's no ambiguity.

Yeah, that's quite an ugly limitation.

> Maybe no one will run into that problem in practice? But if they did,
> I think that'd be a pretty frustrating limitation. It'd be nice to
> override the behavior, to change it from "do what you think I mean" to
> "do what I say".

That would be nice.

>> That's an interesting idea that would provide more flexibility,
>> though, at the cost of complicating things by overloading the meaning
>> of DELIMITER.
>
> I think that'd be a docs issue rather than a conceptual one, though...
> it's still a delimiter. I wouldn't really expect end-user confusion.

Yeah, I meant the docs, but that's probably fine,
we could just add  to DELIMITER.

>> What I found appealing with the idea of a new COPY format,
>> was that instead of overloading the existing options
>> with more complexity, a new format wouldn't need to affect
>> the existing options, and the new format could be explained
>> separately, without making things worse for users not
>> using this format.
>
> I agree that we should not touch the existing formats. If
> RAW/SINGLE/whatever needed a multibyte line delimiter, I'm not
> proposing that the other formats should change.

Right, I didn't think you did either, I meant overloading the existing
options, from a docs perspective.

But I agree it's probably fine if we just overload DELIMITER in the docs,
that should be possible to explain in a pedagogic way,
without causing confusion.

/Joel




Re: ECPG cleanup and fix for clang compile-time problem

2024-10-16 Thread Tom Lane
I wrote:
> Attached are rebased and renumbered 0006-0008, mostly to keep the
> cfbot happy.

For some reason I thought the stuff I pushed later on Monday
didn't interact with these patches, but the cfbot disabused me
of that folly.  Here's a rebased v7 --- no substantive change.

regards, tom lane

From 52e50770ca723278cf04da52fee0ffaddfae8235 Mon Sep 17 00:00:00 2001
From: Tom Lane 
Date: Wed, 16 Oct 2024 13:38:13 -0400
Subject: [PATCH v7 1/3] ecpg: fix some memory leakage of data-type-related
 structures.

ECPGfree_type() and related functions were quite incomplete
about removing subsidiary data structures.  Possibly this is
because ecpg wasn't careful to make sure said data structures
always had their own storage.  Previous patches in this series
cleaned up a lot of that, and I had to add a couple more
mm_strdup's here.

Also, ecpg.trailer tended to overwrite struct_member_list[struct_level]
without bothering to free up its previous contents, thus potentially
leaking a lot of struct-member-related storage.  Add
ECPGfree_struct_member() calls at appropriate points.  (Note: the
lifetime of those lists is not obvious.  They are still live after
initial construction, in order to handle cases like

struct foo { ... } foovar1, foovar2;

We can't delete the list immediately after parsing the right brace,
because it needs to be copied for each of the variables.  Instead,
it's kept around until the next struct declaration.)

Discussion: https://postgr.es/m/2011420.1713493...@sss.pgh.pa.us
---
 src/interfaces/ecpg/preproc/ecpg.header  |  3 ++-
 src/interfaces/ecpg/preproc/ecpg.trailer | 11 +--
 src/interfaces/ecpg/preproc/type.c   | 15 +--
 src/interfaces/ecpg/preproc/type.h   |  5 +++--
 src/interfaces/ecpg/preproc/variable.c   |  7 ++-
 5 files changed, 29 insertions(+), 12 deletions(-)

diff --git a/src/interfaces/ecpg/preproc/ecpg.header b/src/interfaces/ecpg/preproc/ecpg.header
index a415780faf..9554e2b02e 100644
--- a/src/interfaces/ecpg/preproc/ecpg.header
+++ b/src/interfaces/ecpg/preproc/ecpg.header
@@ -507,11 +507,12 @@ add_typedef(const char *name, const char *dimension, const char *length,
 		this->name = mm_strdup(name);
 		this->brace_level = braces_open;
 		this->type = (struct this_type *) mm_alloc(sizeof(struct this_type));
+		this->type->type_storage = NULL;
 		this->type->type_enum = type_enum;
 		this->type->type_str = mm_strdup(name);
 		this->type->type_dimension = mm_strdup(dimension); /* dimension of array */
 		this->type->type_index = mm_strdup(length);	/* length of string */
-		this->type->type_sizeof = ECPGstruct_sizeof;
+		this->type->type_sizeof = ECPGstruct_sizeof ? mm_strdup(ECPGstruct_sizeof) : NULL;
 		this->struct_member_list = (type_enum == ECPGt_struct || type_enum == ECPGt_union) ?
 			ECPGstruct_member_dup(struct_member_list[struct_level]) : NULL;
 
diff --git a/src/interfaces/ecpg/preproc/ecpg.trailer b/src/interfaces/ecpg/preproc/ecpg.trailer
index e48ea2..0b15252433 100644
--- a/src/interfaces/ecpg/preproc/ecpg.trailer
+++ b/src/interfaces/ecpg/preproc/ecpg.trailer
@@ -752,6 +752,7 @@ var_type: simple_type
 			else
 $$.type_sizeof = cat_str(3, "sizeof(", this->name, ")");
 
+			ECPGfree_struct_member(struct_member_list[struct_level]);
 			struct_member_list[struct_level] = ECPGstruct_member_dup(this->struct_member_list);
 		}
 	}
@@ -879,6 +880,7 @@ var_type: simple_type
 			else
 $$.type_sizeof = cat_str(3, "sizeof(", this->name, ")");
 
+			ECPGfree_struct_member(struct_member_list[struct_level]);
 			struct_member_list[struct_level] = ECPGstruct_member_dup(this->struct_member_list);
 		}
 	}
@@ -901,6 +903,7 @@ var_type: simple_type
 			$$.type_dimension = this->type->type_dimension;
 			$$.type_index = this->type->type_index;
 			$$.type_sizeof = this->type->type_sizeof;
+			ECPGfree_struct_member(struct_member_list[struct_level]);
 			struct_member_list[struct_level] = ECPGstruct_member_dup(this->struct_member_list);
 		}
 		else
@@ -910,6 +913,7 @@ var_type: simple_type
 			$$.type_dimension = "-1";
 			$$.type_index = "-1";
 			$$.type_sizeof = "";
+			ECPGfree_struct_member(struct_member_list[struct_level]);
 			struct_member_list[struct_level] = NULL;
 		}
 	}
@@ -925,6 +929,7 @@ enum_definition: '{' c_list '}'
 
 struct_union_type_with_symbol: s_struct_union_symbol
 	{
+		ECPGfree_struct_member(struct_member_list[struct_level]);
 		struct_member_list[struct_level++] = NULL;
 		if (struct_level >= STRUCT_DEPTH)
 			mmerror(PARSE_ERROR, ET_ERROR, "too many levels in nested structure/union definition");
@@ -966,12 +971,13 @@ struct_union_type_with_symbol: s_struct_union_symbol
 		this->name = mm_strdup(su_type.type_str);
 		this->brace_level = braces_open;
 		this->type = (struct this_type *) mm_alloc(sizeof(struct this_type));
+		this->type->type_storage = NULL;
 		this->type->type_enum = su_type.type_enum;
 		this->type->type_str = mm_strdup(su_type.type_str);
 		this->type->type_dimension = mm_s

Re: Fix for consume_xids advancing XIDs incorrectly

2024-10-16 Thread Masahiko Sawada
On Tue, Oct 15, 2024 at 10:06 PM Yushi Ogiwara
 wrote:
>
> Hi,
>
> Thank you for your comment.
>
> Regarding the first patch, I believe it works correctly when
> consume_xids(1) is called.  This is because the lastxid variable in the
> consume_xids_common function is initialized as lastxid =
> ReadNextFullTransactionId(), where the ReadNextFullTransactionId
> function returns the (current XID) + 1.

But it's possible that concurrent transactions consume XIDs in the meantime, no?

>
> Separately, I found that consume_xids(0) does not behave as expected.
> Below is an example:
>
> postgres=# select txid_current();
>   txid_current
> --
>  45496
> (1 row)
>
> postgres=# select consume_xids(0);
>   consume_xids
> --
>  45497
> (1 row)
>
> postgres=# select consume_xids(0);
>   consume_xids
> --
>  45497
> (1 row)
>
> In the example, the argument to consume_xids is 0, meaning it should not
> consume any XIDs. However, the first invocation of consume_xids(0) looks
> as if it unexpectedly consumed 1 XID, even though it actually consumes none.
> This happens because consume_xids(0) returns the value from
> ReadNextFullTransactionId.
>
> I have updated the patch (skip.diff, attached to this e-mail) to address
> this issue. Now, when consume_xids(0) is called, it returns
> ReadNextFullTransactionId().value - 1, ensuring no XID is consumed as
> shown below:
>
> postgres=# select txid_current();
>   txid_current
> --
>  45498
> (1 row)
>
> postgres=# select consume_xids(0);
>   consume_xids
> --
>  45498
> (1 row)

Hmm, I think if we expect this function to return the last XID that
the function actually consumed, calling consume_xids with 0 should
raise an error instead. Even if it returns
ReadNextFullTransactionId().value - 1 as you proposed, other
concurrent transactions might consume XIDs between txid_current() and
consume_xids(0), resulting in consume_xids() appearing to have
consumed XIDs.
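
If we go that way, the check could be as simple as the following rough
sketch (not from the patch; the parameter name "nxids" and the exact
message wording are just assumptions):

    if (nxids == 0)
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                 errmsg("number of XIDs to consume must be greater than zero")));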

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Pgoutput not capturing the generated columns

2024-10-16 Thread Shubham Khanna
On Wed, Oct 9, 2024 at 11:00 AM vignesh C  wrote:
>
> On Tue, 8 Oct 2024 at 11:37, Shubham Khanna  
> wrote:
> >
> > On Fri, Oct 4, 2024 at 9:36 AM Peter Smith  wrote:
> > >
> > > Hi Shubham, here are my review comments for v36-0001.
> > >
> > > ==
> > > 1. General  - merge patches
> > >
> > > It is long past due when patches 0001 and 0002 should've been merged.
> > > AFAIK the split was only because historically these parts had
> > > different authors. But, keeping them separated is not helpful anymore.
> > >
> > > ==
> > > src/backend/catalog/pg_publication.c
> > >
> > > 2.
> > >  Bitmapset *
> > > -pub_collist_validate(Relation targetrel, List *columns)
> > > +pub_collist_validate(Relation targetrel, List *columns, bool pubgencols)
> > >
> > > Since you removed the WARNING, this parameter 'pubgencols' is unused
> > > so it should also be removed.
> > >
> > > ==
> > > src/backend/replication/pgoutput/pgoutput.c
> > >
> > > 3.
> > >   /*
> > > - * If the publication is FOR ALL TABLES then it is treated the same as
> > > - * if there are no column lists (even if other publications have a
> > > - * list).
> > > + * To handle cases where the publish_generated_columns option isn't
> > > + * specified for all tables in a publication, we must create a column
> > > + * list that excludes generated columns. So, the publisher will not
> > > + * replicate the generated columns.
> > >   */
> > > - if (!pub->alltables)
> > > + if (!(pub->alltables && pub->pubgencols))
> > >
> > > I still found that comment hard to understand. Does this mean to say
> > > something like:
> > >
> > > --
> > > Process potential column lists for the following cases:
> > >
> > > a. Any publication that is not FOR ALL TABLES.
> > >
> > > b. When the publication is FOR ALL TABLES and
> > > 'publish_generated_columns' is false.
> > > A FOR ALL TABLES publication doesn't have user-defined column lists,
> > > so all columns will be replicated by default. However, if
> > > 'publish_generated_columns' is set to false, column lists must still
> > > be created to exclude any generated columns from being published
> > > --
> > >
> > > ==
> > > src/test/regress/sql/publication.sql
> > >
> > > 4.
> > > +SET client_min_messages = 'WARNING';
> > > +CREATE TABLE gencols (a int, gen1 int GENERATED ALWAYS AS (a * 2) 
> > > STORED);
> > >
> > > AFAIK you don't need to keep changing 'client_min_messages',
> > > particularly now that you've removed the WARNING message that was
> > > previously emitted.
> > >
> > > ~
> > >
> > > 5.
> > > nit - minor comment changes.
> > >
> > > ==
> > > Please refer to the attachment which implements any nits from above.
> > >
> >
> > I have fixed all the given comments. Also, I have created a new 0003
> > patch for the TAP-Tests related to the '011_generated.pl' file. I am
> > planning to merge 0001 and 0003 patches once they will get fixed.
> > The attached patches contain the required changes.
>
> There is inconsistency in replication when a generated column is
> specified in the column list. The generated column data is not
> replicated during initial sync whereas it is getting replicated during
> incremental sync:
> -- publisher
> CREATE TABLE t1(c1 int, c2 int GENERATED ALWAYS AS (c1 * 2) STORED)
> INSERT INTO t1 VALUES (1);
> CREATE PUBLICATION pub1 for table t1(c1, c2);
>
> --subscriber
> CREATE TABLE t1(c1 int, c2 int)
> CREATE SUBSCRIPTION sub1 connection 'dbname=postgres host=localhost
> port=5432' PUBLICATION pub1;
>
> -- Generated column data is not synced during initial sync
> postgres=# select * from t1;
>  c1 | c2
> ----+----
>   1 |
> (1 row)
>
> -- publisher
> INSERT INTO t1 VALUES (2);
>
> -- Whereas generated column data is synced during incremental sync
> postgres=# select * from t1;
>  c1 | c2
> ----+----
>   1 |
>   2 |  4
> (2 rows)
>

There was an issue for this scenario:
CREATE TABLE t1(c1 int, c2 int GENERATED ALWAYS AS (c1 * 2) STORED)
create publication pub1 for table t1(c1, c2)

In this case included_cols was getting set to NULL.
I changed it to keep included_cols as-is instead of replacing it with
NULL, and changed the condition to:
if (server_version >= 18)
{
  remotegenlist[natt] = DatumGetBool(slot_getattr(slot, 5, &isnull));
  /*
   * If the column is generated and neither the generated column
   * option is specified nor it appears in the column list, we will
   * skip it.
   */
  if (remotegenlist[natt] && !has_pub_with_pubgencols && !included_cols)
  {
    ExecClearTuple(slot);
    continue;
  }
}

I will think further about whether there is a better solution for this.
Please refer to the updated v39 patches in [1] for the changes added.

[1] 
https://www.postgresql.org/message-id/CAHv8RjLjb%2B98i5ZQUphivxdOZ3hSGLfq2SiWQetUvk8zGyAQwQ%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.




Re: Pgoutput not capturing the generated columns

2024-10-16 Thread Shubham Khanna
On Wed, Oct 9, 2024 at 11:13 AM Peter Smith  wrote:
>
> Hi, here are my review comments for patch v37-0001.
>
> ==
> Commit message
>
> 1.
> Example usage of subscription option:
> CREATE PUBLICATION FOR TABLE tab_gencol WITH (publish_generated_columns
> = true);
>
> ~
>
> This is wrong -- it's not a "subscription option". Better to just say
> "Example usage:"
>
> ~~~
>
> 2.
> When 'copy_data' is true, during the initial sync, the data is replicated from
> the publisher to the subscriber using the COPY command. The normal COPY
> command does not copy generated columns, so when 'publish_generated_columns'
> is true...
>
> ~
>
> By only mentioning the "when ... is true" case this description does
> not cover the scenario when 'publish_generated_columns' is false when
> the publication column list has a generated column.
>
> ~~~
>
> 3.
> typo - /replication of generated column/replication of generated columns/
> typo - /filed/filled/
> typo - 'pg_publicataion' catalog
>
> ==
> src/backend/replication/logical/tablesync.c
>
> make_copy_attnamelist:
> 4.
> nit - missing word in a comment
>
> ~~~
>
> fetch_remote_table_info:
> 5.
> + appendStringInfo(&cmd,
>   "  FROM pg_catalog.pg_attribute a"
>   "  LEFT JOIN pg_catalog.pg_index i"
>   "   ON (i.indexrelid = pg_get_replica_identity_index(%u))"
>   " WHERE a.attnum > 0::pg_catalog.int2"
> - "   AND NOT a.attisdropped %s"
> + "   AND NOT a.attisdropped", lrel->remoteid);
> +
> + appendStringInfo(&cmd,
>   "   AND a.attrelid = %u"
>   " ORDER BY a.attnum",
> - lrel->remoteid,
> - (walrcv_server_version(LogRepWorkerWalRcvConn) >= 12 ?
> -   "AND a.attgenerated = ''" : ""),
>   lrel->remoteid);
>
> Version v37-0001 has removed a condition previously between these two
> appendStringInfo's. But, that now means there is no reason to keep
> these statements separated. These should be combined now to use one
> appendStringInfo.
>
> ~
>
> 6.
> + if (server_version >= 12)
> + remotegenlist[natt] = DatumGetBool(slot_getattr(slot, 5, &isnull));
> +
>
> Are you sure the version check for 12 is correct? IIUC, this 5
> matches the 'attgenerated' column, but the SQL for that was
> constructed using a different condition:
> if (server_version >= 18)
>   appendStringInfo(&cmd, ", a.attgenerated != ''");
>
> It is this 12 versus 18 difference that makes me suspicious of
> a potential mistake.
>
> ~~~
>
> 7.
> + /*
> + * If the column is generated and neither the generated column option
> + * is specified nor it appears in the column list, we will skip it.
> + */
> + if (remotegenlist[natt] && !has_pub_with_pubgencols &&
> + !bms_is_member(attnum, included_cols))
> + {
> + ExecClearTuple(slot);
> + continue;
> + }
>
> 7b.
> I am also suspicious about how this condition interacts with the other
> condition (shown below) that came earlier:
> /* If the column is not in the column list, skip it. */
> if (included_cols != NULL && !bms_is_member(attnum, included_cols))
>
> Something doesn't seem right. e.g. If we can only get here by passing
> the earlier condition, then it means we already know the generated
> condition was *not* a member of a column list in which case that
> should affect this new condition and the new comment too.
>
> ==
> src/backend/replication/pgoutput/pgoutput.c
>
> pgoutput_column_list_init:
>
> 8.
>   /*
> - * If the publication is FOR ALL TABLES then it is treated the same as
> - * if there are no column lists (even if other publications have a
> - * list).
> + * Process potential column lists for the following cases: a. Any
> + * publication that is not FOR ALL TABLES. b. When the publication is
> + * FOR ALL TABLES and 'publish_generated_columns' is false. FOR ALL
> + * TABLES publication doesn't have user-defined column lists, so all
> + * columns will be replicated by default. However, if
> + * 'publish_generated_columns' is set to false, column lists must
> + * still be created to exclude any generated columns from being
> + * published.
>   */
>
> nit - please reformat this comment so the bullets are readable
>

I have fixed all the comments and posted the v39 patches for them.
Please refer to the updated v39 patches in [1] for the changes added.

[1] 
https://www.postgresql.org/message-id/CAHv8RjLjb%2B98i5ZQUphivxdOZ3hSGLfq2SiWQetUvk8zGyAQwQ%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.




Soft errors and ThrowErrorData() comment updates

2024-10-16 Thread Jeff Davis
The comments around ThrowErrorData() and how it might apply to soft
errors are slightly confusing. Attached is a patch which hopefully
clarifies things.

From a distance, ThrowErrorData() is somewhat like ReThrowError(), but
it's actually quite different. The former expects a free-standing
ErrorData that hasn't been processed at all, while the latter is
specifically for an ERROR for which errstart() and errfinish() have
already been called.

We also might consider adding some asserts.
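
For illustration, the soft-error pattern the updated comments are aiming
at looks roughly like this (a sketch only; the variables typinput,
value_str, typioparam, and typmod are assumed, and the
InputFunctionCallSafe() call reflects my understanding of the PG16-era
interface):

    ErrorSaveContext escontext = {T_ErrorSaveContext};
    Datum       result;

    escontext.details_wanted = true;

    if (!InputFunctionCallSafe(&typinput, value_str, typioparam, typmod,
                               (Node *) &escontext, &result))
    {
        ErrorData  *edata = escontext.error_data;

        edata->elevel = WARNING;    /* downgrade the soft error */
        ThrowErrorData(edata);      /* calls errstart()/errfinish() itself */
    }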

Regards,
Jeff Davis

From 2fc18f71afc89d95eb38fe2d2606b5ff07a8b8f9 Mon Sep 17 00:00:00 2001
From: Jeff Davis 
Date: Wed, 16 Oct 2024 13:16:02 -0700
Subject: [PATCH v1] Improve ThrowErrorData() comments for use with soft
 errors.

---
 src/backend/utils/error/elog.c | 15 +--
 src/include/nodes/miscnodes.h  |  7 ---
 2 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 987ff98067..8acca3e0a0 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -1881,12 +1881,15 @@ FlushErrorState(void)
 /*
  * ThrowErrorData --- report an error described by an ErrorData structure
  *
- * This is somewhat like ReThrowError, but it allows elevels besides ERROR,
- * and the boolean flags such as output_to_server are computed via the
- * default rules rather than being copied from the given ErrorData.
- * This is primarily used to re-report errors originally reported by
- * background worker processes and then propagated (with or without
- * modification) to the backend responsible for them.
+ * This function should be called on an ErrorData structure that isn't stored
+ * on the errordata stack and hasn't been processed yet. It will call
+ * errstart() and errfinish() as needed, so those should not have already been
+ * called.
+ *
+ * ThrowErrorData() is useful for handling soft errors. It's also useful for
+ * re-reporting errors originally reported by background worker processes and
+ * then propagated (with or without modification) to the backend responsible
+ * for them.
  */
 void
 ThrowErrorData(ErrorData *edata)
diff --git a/src/include/nodes/miscnodes.h b/src/include/nodes/miscnodes.h
index 1612b63acd..f6f2dd4377 100644
--- a/src/include/nodes/miscnodes.h
+++ b/src/include/nodes/miscnodes.h
@@ -36,9 +36,10 @@
  * After calling code that might report an error this way, check
  * error_occurred to see if an error happened.  If so, and if details_wanted
  * is true, error_data has been filled with error details (stored in the
- * callee's memory context!).  FreeErrorData() can be called to release
- * error_data, although that step is typically not necessary if the called
- * code was run in a short-lived context.
+ * callee's memory context!).  The ErrorData can be modified (i.e. downgraded
+ * to a WARNING) and reported with ThrowErrorData().  FreeErrorData() can be
+ * called to release error_data, although that step is typically not necessary
+ * if the called code was run in a short-lived context.
  */
 typedef struct ErrorSaveContext
 {
-- 
2.34.1



Re: [BUG FIX] Fix validation of COPY options FORCE_NOT_NULL/FORCE_NULL

2024-10-16 Thread Michael Paquier
On Wed, Oct 16, 2024 at 02:50:53PM +0900, Michael Paquier wrote:
> You are right.  f6d4c9cf162b got that wrong.  Will fix and backpatch
> with the extra tests.

And done down to 17 for 0002, down to 16 for 0001, with tweaks in 0001
to limit the use of COPY TO in the queries where we want to force
error patterns linked to the TO clause.

The two tests for the all-column cases with force_quote were not
strictly required for the bug, but they are still useful to have for
coverage purposes.
--
Michael




Re: minor doc issue in 9.16.2.1.1. Boolean Predicate Check Expressions

2024-10-16 Thread Bruce Momjian


Where are we on this?  I still see this behavior.

---

On Fri, Jun 21, 2024 at 04:53:55PM +0800, jian he wrote:
> On Fri, Jun 21, 2024 at 11:11 AM David G. Johnston
>  wrote:
> >
> > On Thu, Jun 20, 2024 at 7:30 PM jian he  wrote:
> >>
> >> "predicate check expressions return the single three-valued result of
> >>
> >> the predicate: true, false, or unknown."
> >> "unknown" is wrong, because `select 'unknown'::jsonb;` will fail.
> >> here "unknown" should be "null"? see jsonb_path_query doc entry also.
> >>
> >
> > The syntax for json_exists belies this claim (assuming our docs are 
> > accurate there).  Its "on error" options are true/false/unknown.  
> > Additionally, the predicate test operator is named "is unknown" not "is 
> > null".
> >
> > The result of the predicate test, which is never produced as a value, only 
> > a concept, is indeed "unknown" - which then devolves to false when it is 
> > practically applied to determining whether to output the path item being 
> > tested.  As it does also when used in a path expression.
> >
> 
> in [1] says
> The similar predicate check expression simply returns true, indicating
> that a match exists:
> 
> => select jsonb_path_query(:'json', '$.track.segments[*].HR > 130');
>  jsonb_path_query
> --
>  true
> 
> 
> 
> but in this example
> select jsonb_path_query('1', '$ == "1"');
> return null.
> 
> I guess here, the match evaluation cannot be applied, thus returning null.
> 
> 
> So summary:
> if the boolean predicate check expressions are applicable, return true or 
> false.
> 
> if the boolean predicate check expressions are not applicable, return null.
> example: select jsonb_path_query('1', '$ == "a"');
> 
> 
> but I found following two examples returning different results,
> i think they should return the same value.
> select json_value('1', '$ == "1"' returning jsonb error on error);
> select json_query('1', '$ == "1"' returning jsonb error on error);
> 
> [1] 
> https://www.postgresql.org/docs/devel/functions-json.html#FUNCTIONS-SQLJSON-CHECK-EXPRESSIONS
> 
> 

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  When a patient asks the doctor, "Am I going to die?", he means 
  "Am I going to die soon?"




Re: MergeAppend could consider sorting cheapest child path

2024-10-16 Thread Bruce Momjian


Is this still being considered?

---

On Tue, Jun 18, 2024 at 07:45:09PM +0300, Alexander Pyhalov wrote:
> Hi.
> 
> Now when the planner finds suitable pathkeys in generate_orderedappend_paths(),
> it uses them, even if an explicit sort of the cheapest child path could be more
> efficient.
> 
> We encountered this issue on a partitioned table with two indexes, where one
> is suitable for sorting and another is good for selecting data. A MergeAppend
> was generated with subpaths doing an index scan on the suitably ordered index
> and filtering a lot of data.
> The suggested fix allows MergeAppend to consider sorting the cheapest
> childrel total path as an alternative.
> 
> -- 
> Best regards,
> Alexander Pyhalov,
> Postgres Professional

> From d5eb3d326d83e2ca47c17552fcc6fd27b6b98d4e Mon Sep 17 00:00:00 2001
> From: Alexander Pyhalov 
> Date: Tue, 18 Jun 2024 15:56:18 +0300
> Subject: [PATCH] MergeAppend could consider using sorted best path.
> 
> This helps when index with suitable pathkeys is not
> good for filtering data.
> ---
>  .../postgres_fdw/expected/postgres_fdw.out|  6 +-
>  src/backend/optimizer/path/allpaths.c | 21 +
>  src/test/regress/expected/inherit.out | 45 +-
>  src/test/regress/expected/partition_join.out  | 87 +++
>  src/test/regress/expected/union.out   |  6 +-
>  src/test/regress/sql/inherit.sql  | 17 
>  6 files changed, 141 insertions(+), 41 deletions(-)
> 
> diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out 
> b/contrib/postgres_fdw/expected/postgres_fdw.out
> index ea566d50341..687591e4a97 100644
> --- a/contrib/postgres_fdw/expected/postgres_fdw.out
> +++ b/contrib/postgres_fdw/expected/postgres_fdw.out
> @@ -10074,13 +10074,15 @@ SELECT t1.a, t2.b FROM fprt1 t1 INNER JOIN fprt2 t2 
> ON (t1.a = t2.b) WHERE t1.a
> ->  Nested Loop
>   Join Filter: (t1.a = t2.b)
>   ->  Append
> -   ->  Foreign Scan on ftprt1_p1 t1_1
> +   ->  Sort
> + Sort Key: t1_1.a
> + ->  Foreign Scan on ftprt1_p1 t1_1
> ->  Foreign Scan on ftprt1_p2 t1_2
>   ->  Materialize
> ->  Append
>   ->  Foreign Scan on ftprt2_p1 t2_1
>   ->  Foreign Scan on ftprt2_p2 t2_2
> -(10 rows)
> +(12 rows)
>  
>  SELECT t1.a, t2.b FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) WHERE 
> t1.a % 25 = 0 ORDER BY 1,2 FOR UPDATE OF t1;
>a  |  b  
> diff --git a/src/backend/optimizer/path/allpaths.c 
> b/src/backend/optimizer/path/allpaths.c
> index 4895cee9944..827bc469269 100644
> --- a/src/backend/optimizer/path/allpaths.c
> +++ b/src/backend/optimizer/path/allpaths.c
> @@ -1845,6 +1845,27 @@ generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
>   /* Assert we do have an unparameterized path for this child */
>   Assert(cheapest_total->param_info == NULL);
>   }
> + else
> + {
> + /*
> +  * Even if we found necessary pathkeys, using unsorted path
> +  * can be more efficient.
> +  */
> + Path sort_path;
> +
> + cost_sort(&sort_path,
> +   root,
> +   pathkeys,
> +   childrel->cheapest_total_path->total_cost,
> +   childrel->cheapest_total_path->rows,
> +   childrel->cheapest_total_path->pathtarget->width,
> +   0.0,
> +   work_mem,
> +   -1.0 /* need all tuples to sort them */ );
> +
> + if (compare_path_costs(&sort_path, cheapest_total, TOTAL_COST) < 0)
> + cheapest_total = childrel->cheapest_total_path;
>  
>   /*
>* When building a fractional path, determine a cheapest
> diff --git a/src/test/regress/expected/inherit.out 
> b/src/test/regress/expected/inherit.out
> index ad732134148..16e78c8d2ff 100644
> --- a/src/test/regress/expected/inherit.out
> +++ b/src/test/regress/expected/inherit.out
> @@ -1555,6 +1555,7 @@ insert into matest2 (name) values ('Test 4');
>  insert into matest3 (name) values ('Test 5');
>  insert into matest3 (name) values ('Test 6');
>  set enable_indexscan = off;  -- force use of seqscan/sort, so no merge
> +set enable_sort = off; -- avoid sorting below MergeAppend
>  explain (verbose, cost

Re: Considering fractional paths in Append node

2024-10-16 Thread Andy Fan
Nikita Malakhov  writes:

Hello Nikita,

> Hi hackers!
>
> Sorry, I've forgot to attach the patch itself. Please check it out.

Could you check if [1] is related to this subject? I think the hard part
would be deciding which tuple_fraction to use in
add_paths_to_append_rel, since root->tuple_fraction is at the subquery level,
while add_paths_to_append_rel works only at the RelOptInfo level.

[1]
https://www.postgresql.org/message-id/CAApHDvry0nSV62kAOH3iccvfPhGPLN0Q97%2B%3Db1RsDPXDz3%3DCiQ%40mail.gmail.com

-- 
Best Regards
Andy Fan





Re: Popcount optimization using AVX512

2024-10-16 Thread Nathan Bossart
On Tue, Oct 08, 2024 at 09:36:03PM -0500, Nathan Bossart wrote:
> On Wed, Jul 31, 2024 at 04:43:02PM -0500, Nathan Bossart wrote:
>> On Wed, Jul 31, 2024 at 01:52:54PM -0700, Andres Freund wrote:
>>> I think we'd be better off enabling architectural features on a per-function
>>> basis, roughly like this:
>>>
>>> [...]
>>> 
>>> /* FIXME: Should be gated by configure check of -mavx512vpopcntdq 
>>> -mavx512bw support */
>>> pg_enable_target("avx512vpopcntdq,avx512bw")
>>> uint64_t
>>> pg_popcount_avx512(const char *buf, int bytes)
>>> ...
>> 
>> I remember wondering why the CRC-32C code wasn't already doing something
>> like this (old compiler versions? non-gcc-like compilers?), and I'm not
>> sure I ever discovered the reason, so out of an abundance of caution I used
>> the same approach for AVX-512.  If we can convince ourselves that
>> __attribute__((target("..."))) is standard enough at this point, +1 for
>> moving to that.
> 
> [...]
> 
> So, at least for the CRC code, __attribute__((target("..."))) was probably
> not widely available enough yet when it was first added.  Unfortunately,
> the ARMv8 CRC target support (without -march) is still pretty new, but it
> might be possible to switch the others to a per-function approach in v18.

Here is a first attempt at using __attribute__((target("..."))) for the
AVX-512 stuff.  Besides allowing us to consolidate the code into a single
file, this simplifies the build file changes quite a bit.
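
For readers skimming the thread, the per-function shape boils down to
something like the following standalone sketch (not the patch itself; it
only handles whole 64-byte chunks and leaves out the runtime CPU checks):

    #include <immintrin.h>
    #include <stddef.h>
    #include <stdint.h>

    __attribute__((target("avx512vpopcntdq,avx512bw")))
    static uint64_t
    popcount_avx512_chunks(const char *buf, size_t bytes)
    {
        __m512i     accum = _mm512_setzero_si512();
        size_t      i;

        /* process whole 64-byte chunks; a real version must handle the tail */
        for (i = 0; i + sizeof(__m512i) <= bytes; i += sizeof(__m512i))
        {
            const __m512i val = _mm512_loadu_si512((const void *) (buf + i));

            accum = _mm512_add_epi64(accum, _mm512_popcnt_epi64(val));
        }
        return (uint64_t) _mm512_reduce_add_epi64(accum);
    }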

-- 
nathan
>From c97e25e56347c90f169a5ce069a9ea06c873915b Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Wed, 16 Oct 2024 15:57:55 -0500
Subject: [PATCH v1 1/1] use __attribute__((target(...))) for AVX-512 stuff

---
 config/c-compiler.m4 |  60 +-
 configure| 163 ++-
 configure.ac |  17 +--
 meson.build  |  17 +--
 src/Makefile.global.in   |   5 -
 src/include/c.h  |  10 ++
 src/makefiles/meson.build|   4 +-
 src/port/Makefile|  12 +-
 src/port/meson.build |   7 +-
 src/port/pg_popcount_avx512.c|  86 +-
 src/port/pg_popcount_avx512_choose.c | 102 -
 11 files changed, 171 insertions(+), 312 deletions(-)
 delete mode 100644 src/port/pg_popcount_avx512_choose.c

diff --git a/config/c-compiler.m4 b/config/c-compiler.m4
index 10f8c7bd0a..aa90f8ef33 100644
--- a/config/c-compiler.m4
+++ b/config/c-compiler.m4
@@ -700,20 +700,20 @@ undefine([Ac_cachevar])dnl
 # Check if the compiler supports the XSAVE instructions using the _xgetbv
 # intrinsic function.
 #
-# An optional compiler flag can be passed as argument (e.g., -mxsave).  If the
-# intrinsic is supported, sets pgac_xsave_intrinsics and CFLAGS_XSAVE.
+# If the intrinsics are supported, sets pgac_xsave_intrinsics.
 AC_DEFUN([PGAC_XSAVE_INTRINSICS],
-[define([Ac_cachevar], [AS_TR_SH([pgac_cv_xsave_intrinsics_$1])])dnl
-AC_CACHE_CHECK([for _xgetbv with CFLAGS=$1], [Ac_cachevar],
-[pgac_save_CFLAGS=$CFLAGS
-CFLAGS="$pgac_save_CFLAGS $1"
-AC_LINK_IFELSE([AC_LANG_PROGRAM([#include ],
-  [return _xgetbv(0) & 0xe0;])],
+[define([Ac_cachevar], [AS_TR_SH([pgac_cv_xsave_intrinsics])])dnl
+AC_CACHE_CHECK([for _xgetbv], [Ac_cachevar],
+[AC_LINK_IFELSE([AC_LANG_PROGRAM([#include 
+__attribute__((target("xsave")))
+static int xsave_test(void)
+{
+  return _xgetbv(0) & 0xe0;
+}],
+  [return xsave_test();])],
   [Ac_cachevar=yes],
-  [Ac_cachevar=no])
-CFLAGS="$pgac_save_CFLAGS"])
+  [Ac_cachevar=no])])
 if test x"$Ac_cachevar" = x"yes"; then
-  CFLAGS_XSAVE="$1"
   pgac_xsave_intrinsics=yes
 fi
 undefine([Ac_cachevar])dnl
@@ -725,29 +725,27 @@ undefine([Ac_cachevar])dnl
 # _mm512_setzero_si512, _mm512_maskz_loadu_epi8, _mm512_popcnt_epi64,
 # _mm512_add_epi64, and _mm512_reduce_add_epi64 intrinsic functions.
 #
-# Optional compiler flags can be passed as argument (e.g., -mavx512vpopcntdq
-# -mavx512bw).  If the intrinsics are supported, sets
-# pgac_avx512_popcnt_intrinsics and CFLAGS_POPCNT.
+# If the intrinsics are supported, sets pgac_avx512_popcnt_intrinsics.
 AC_DEFUN([PGAC_AVX512_POPCNT_INTRINSICS],
-[define([Ac_cachevar], [AS_TR_SH([pgac_cv_avx512_popcnt_intrinsics_$1])])dnl
-AC_CACHE_CHECK([for _mm512_popcnt_epi64 with CFLAGS=$1], [Ac_cachevar],
-[pgac_save_CFLAGS=$CFLAGS
-CFLAGS="$pgac_save_CFLAGS $1"
-AC_LINK_IFELSE([AC_LANG_PROGRAM([#include ],
-  [const char buf@<:@sizeof(__m512i)@:>@;
-   PG_INT64_TYPE popcnt = 0;
-   __m512i accum = _mm512_setzero_si512();
-   const __m512i val = _mm512_maskz_loadu_epi8((__mmask64) 0xf0f0f0f0f0f0f0f0, 
(const __m512i *) buf);
-   const __m512i cnt = _mm512_popcnt_epi64(val);
-   accum = _mm512_add_epi64(accum, cnt);
-   popcnt = _mm512_reduce_add_epi64(accum);
-   /* return computed value, to prevent the above being optimized away */
-   return popcnt == 0;])],
+[define([Ac_cachevar], [AS_TR_SH([pgac_cv_avx512_popcnt_intrinsics]

Re: Limiting overshoot in nbtree's parallel SAOP index scans

2024-10-16 Thread Matthias van de Meent
On Wed, 16 Oct 2024 at 20:52, Peter Geoghegan  wrote:
>
> On Fri, Oct 11, 2024 at 10:27 AM Matthias van de Meent
>  wrote:
> > With the introduction of the new SAOP handling in PG17, however, the
> > shared state has become a bit more muddied. Because the default has
> > changed from "stop scanning at the end of a SAOP value's range" to
> > "continue scanning, unless ...", and because it is relatively
> > expensive to determine whether we need to stop or continue we release
> > the parallel scan to continue before we know if a new primitive scan
> > would be preferred.
>
> It's not just relatively expensive. It's essential that _bt_readpage
> be able to release the parallel scan at the earliest opportunity (just
> as soon as the next sibling link is known).

Why is this essential? AFAIK, that's only for performance reasons -
and it's also only for performance reasons that we start a new
primitive scan, so in my view there is a trade-off between releasing
the scan early (and making better use of parallel resources) and
scheduling a new primitive scan (to reduce overall resource usage).

> Whereas a parallel backend
> worker will only decide that it's likely a good idea to do another
> primitive index scan when _bt_readpage has finished executing.

Correct.

> Clearly it would not be sensible to wait until _bt_readpage had gotten
> as far as deciding on the need for another descent of the index. To do
> so would essentially serialize the scan. There are clear advantages to
> doing as little coordination as possible among backends. (I don't
> think that you disagree with any of this, but just want to establish
> context for anybody else that's reading along -- you seem to want to
> do some limited/conditional form of this behavior.)

In general, yes, but if we see strong signals that it may be worth the
wait, then why not?

> > Problem
> > ---
> > The new parallel index scan code can only get into a new primitive
> > scan IFF no parallel worker has taken the next block in the time
> > between the call to _bt_parallel_release at the start of _bt_readpage,
> > and the call to _bt_parallel_primscan_schedule through _bt_checkkeys
> > slightly later in _bt_readpage. This makes the code susceptible to
> > race conditions
>
> Do you have a test case?

Yes, see below.

> I was aware of the issues in this area when designing the
> _bt_parallel_primscan_schedule mechanism. I believe it's true that the
> scan is only strictly guaranteed to be able to make forward progress
> via naively scanning the next leaf page, again and again. This is a
> fairly theoretical point, though.
>
> I believe that it's practically impossible for this to cause us
> problems. That would require a group of backends that all want to
> start another primitive index scan, but each block each other out,
> constantly. Not just once, mind you -- again and again (enough to be
> noticeable). This would have to happen in spite of the fact that we
> only release one backend per _bt_parallel_release call
> (_bt_parallel_release uses ConditionVariableSignal under the hood,
> which only releases the oldest worker waiting on the CV, not all
> backends).

I may not have communicated this well enough, but what happens in the
topical issue is as follows:

0. backend 1 has seized the scan; backend 2 is now waiting for another
page to process.
1. backend 1: _release
2. backend 1: start _readpage
3. backend 2: wake up
4. backend 2: _seize
5. backend 1: finish _readpage;
6. backend 1: _primscan_schedule -> fail to schedule.
... swap roles of backend 1 and 2, and restart at 0.

This is easy to replicate with a somewhat expensive compare operator,
one which takes enough CPU time so that a signaled backend can wake up
and seize the scan before a page is processed in _readpage. Because
the pages contain no interesting tuples, the scan infrastructure won't
ever need to visit HEAP VM pages or return any data from the scan in
this overshoot case, and thus won't easily get into a state that might
stop seizing the scan for long enough that _primscan_schedule actually
does something - after _readpage finds out a new primitive scan is
needed, it immediately returns and is ready to _seize the scan again.

A test case for this:

Schema and data population:
CREATE TABLE test (a int, b bigint);
INSERT INTO test (a, b) SELECT i % 101, i % 1001 FROM
generate_series(1, 1000) i;
CREATE INDEX ON test (a, b);
DELETE FROM test WHERE a = 0 AND b = 1000;
DELETE FROM test WHERE a = 100 AND b = 0;
VACUUM (FREEZE, INDEX_CLEANUP on, ANALYZE) test;

This gets you an index of 9239 blocks.

Config to force actual parallel scan (note: debug_parallel_query
doesn't exhibit the issue due to single-worker processing of data :-(
):
SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;
SET min_parallel_table_scan_size = 1;
SET min_parallel_index_scan_size = 1;

Test query:
SELECT count(*) FROM test WHERE a IN (0, 100) AND b IN (0, 1000); -- count = 196

In v17 and the master branch you'll no

Re: Proposal to Enable/Disable Index using ALTER INDEX (with patch)

2024-10-16 Thread Shayon Mukherjee

> On Oct 16, 2024, at 2:15 PM, Shayon Mukherjee  wrote:
> 
> 
>> On Oct 16, 2024, at 12:19 PM, Shayon Mukherjee  wrote:
>> 
>> - ALTER INDEX ... ENABLE/DISABLE performs an in-place update of the pg_index 
>>   catalog to protect against indcheckxmin [2] (older unrelated thread).
> 
> Performing the in place update of the pg_index row from 
> ATExecEnableDisableIndex using systable_inplace_update_begin was failing in 
> CI weirdly but not on my local MacBook when running spec suite. I am also 
> wondering what is causing the requirement [1] to update the row in-place vs 
> say using CatalogTupleUpdate since using the later is working well locally + 
> CI? 
> 


I believe I somewhat understand why the issue might be occurring, where CI is 
failing the create_index regression test and crashing (signal 6 in postmaster 
logs). I suspect it might be due to a segmentation fault or a similar issue.

IsInplaceUpdateOid is used in systable_inplace_update_begin (recently 
introduced), but it doesn’t yet support pg_index. Based on 
check_lock_if_inplace_updateable_rel in heapam.c and IsInplaceUpdateOid in 
catalog.c, introducing support for in-place updates via this new mechanism 
might not be straightforward for pg_index. It would require updating the 
callers to handle in-place updates and locks, I presume?

I did confirm that, based on the v2 patch, when not doing in-place updates on
pg_index, the xmin of the pg_index row advances. I suppose there's a risk,
with indcheckxmin marked TRUE, that the row's xmin becomes too new,
potentially causing the index to be accidentally skipped? (I am not 100% sure)

I'll take some time to think this through and familiarize myself with the new 
systable_inplace_update_begin. In the meantime, aside from the in-place updates 
on pg_index, I would love to learn any first impressions or feedback on the 
patch folks may have.
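
For context, the user-facing syntax being discussed is along these lines
(just a sketch of the intended usage as I understand the proposal; the
index name is made up):

    ALTER INDEX idx_orders_created_at DISABLE;  -- index still maintained, not considered by the planner
    ALTER INDEX idx_orders_created_at ENABLE;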

Thank you,
Shayon



Re: Using per-transaction memory contexts for storing decoded tuples

2024-10-16 Thread Masahiko Sawada
On Tue, Oct 15, 2024 at 9:01 PM Amit Kapila  wrote:
>
> On Tue, Oct 15, 2024 at 11:15 PM Masahiko Sawada  
> wrote:
> >
> > On Sun, Oct 13, 2024 at 11:00 PM Amit Kapila  
> > wrote:
> > >
> > > On Fri, Oct 11, 2024 at 3:40 AM Masahiko Sawada  
> > > wrote:
> > > >
> > > > Please find the attached patches.
> > > >
> >
> > Thank you for reviewing the patch!
> >
> > >
> > > @@ -343,9 +343,9 @@ ReorderBufferAllocate(void)
> > >   */
> > >   buffer->tup_context = GenerationContextCreate(new_ctx,
> > > "Tuples",
> > > -   SLAB_LARGE_BLOCK_SIZE,
> > > -   SLAB_LARGE_BLOCK_SIZE,
> > > -   SLAB_LARGE_BLOCK_SIZE);
> > > +   SLAB_DEFAULT_BLOCK_SIZE,
> > > +   SLAB_DEFAULT_BLOCK_SIZE,
> > > +   SLAB_DEFAULT_BLOCK_SIZE);
> > >
> > > Shouldn't we change the comment atop this change [1] which states that
> > > we should benchmark the existing values?
> >
> > Agreed.
> >
>
> Can we slightly tweak the comments as follows so that it doesn't sound
> like a fix for a bug?
>
> /* To minimize memory fragmentation caused by long-running
> transactions with changes spanning multiple memory blocks, we use a
> single fixed-size memory block for decoded tuple storage. The tests
> showed that the default memory block size maintains logical decoding
> performance without causing fragmentation due to concurrent
> transactions. One might think that we can use the max size as
> SLAB_LARGE_BLOCK_SIZE but the test also showed it doesn't help resolve
> the memory fragmentation.

Agreed. I've incorporated your comment in the latest patches. I'm
going to push them today, barring any objections or further comments.

> Other than that the changes in the patch look good to me. Note that I
> haven't tested the patch by myself but the test results shown by you
> and others in the thread seem sufficient to proceed with this change.

Understood. Thank you for the discussion and your help reviewing the patch.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


REL16_v3-0001-Reduce-memory-block-size-for-decoded-tuple-storag.patch
Description: Binary data


REL15_v3-0001-Reduce-memory-block-size-for-decoded-tuple-storag.patch
Description: Binary data


REL14_v3-0001-Reduce-memory-block-size-for-decoded-tuple-storag.patch
Description: Binary data


master_v3-0001-Reduce-memory-block-size-for-decoded-tuple-storag.patch
Description: Binary data


REL17_v3-0001-Reduce-memory-block-size-for-decoded-tuple-storag.patch
Description: Binary data


REL12_v3-0001-Reduce-memory-block-size-for-decoded-tuple-storag.patch
Description: Binary data


REL13_v3-0001-Reduce-memory-block-size-for-decoded-tuple-storag.patch
Description: Binary data


Failing assertion in predicate.c

2024-10-16 Thread Yurii Rashkovskii
Hi pg-hackers,

I am encountering an assertion failure in predicate.c when running a high
volume of short serializable transactions. The error occurs during stress
testing with many concurrent connections.

I have replicated this issue on two separate macOS M1 systems, but I have
not been able to reproduce it on Linux. The issue arises in both PostgreSQL
16 and 17. Interestingly, introducing a delay of just 1 ms between
statements seems to prevent the assertion from failing.

The specific failure I encounter is:

```
TRAP: failed Assert("TransactionIdIsValid(tailXid)"), File: "predicate.c",
Line: 885, PID: 1350
```

For anyone interested, I have created a reproduction program available
here: https://github.com/yrashk/pg-predicate-tailxid-assertion-issue. The
program establishes a connection pool and issues simple INSERT statements,
each wrapped in its own serializable transaction.
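
In SQL terms the workload shape is roughly the following, run concurrently
from many connections (the table name here is made up; the repository above
has the actual program):

    CREATE TABLE events (id bigserial PRIMARY KEY, payload text);

    -- each client loops over:
    BEGIN ISOLATION LEVEL SERIALIZABLE;
    INSERT INTO events (payload) VALUES ('x');
    COMMIT;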

I would appreciate any insights into this issue or potential steps for
further debugging.


-- 
Founder at Omnigres
https://omnigres.com


Re: Limiting overshoot in nbtree's parallel SAOP index scans

2024-10-16 Thread Peter Geoghegan
On Wed, Oct 16, 2024 at 5:48 PM Matthias van de Meent
 wrote:
> In v17 and the master branch you'll note 16 buffer hits for the test
> query. However, when we use more expensive btree compare operations
> (e.g. by adding pg_usleep(1) to both btint8cmp and btint4cmp), the
> buffer access count starts to vary a lot and skyrockets to 30+ on my
> machine, in some cases reaching >100 buffer hits. After applying my
> patch, the buffer access count is capped to a much more agreeable
> 16-18 hits - it still shows signs of overshooting the serial bounds,
> but the number of buffers we overshoot our target is capped and thus
> significantly lower.

It's not exactly capped, though. Since in any case you're always prone
to getting extra leaf page reads at the end of each primitive index
scan. That's not something that's new to Postgres 17, though.

Anyway, I'm still not convinced. Your test case requires adding a one
second delay to each ORDER proc comparison, and so has an unrealistic
adversarial character. It uses an index-only scan that is drastically
faster if we don't use a parallel scan at all. The serial case is
0.046 ms for me, whereas the parallel case is 3.094 ms (obviously
that's without the addition of a 1 second delay). You've thrown
everything but the kitchen sink at the issue, and yet the impact on
buffer hits really isn't too bad.

Does anybody else have an opinion on this?

-- 
Peter Geoghegan




Re: ActiveState Perl is not valid anymore to build PG17 on the Windows 10/11 platforms, So Documentation still suggesting it should be updated

2024-10-16 Thread Bruce Momjian
On Tue, Jul 16, 2024 at 08:23:11AM -0400, Andrew Dunstan wrote:
> 
> On 2024-07-16 Tu 7:46 AM, Yasir wrote:
> 
> Hi Hackers, 
> 
Recently, I compiled PG17 on Windows. Up to PG16, "ActiveState Perl", as
instructed in the documentation, was being used successfully on Windows
10/11 to compile PG.
However, it looks like "ActiveState Perl" is not valid anymore for
compiling PG17 on Windows 10/11, but the documentation still suggests it.
Therefore, I think the documentation needs to be updated.
Moreover, I had to install "Strawberry Perl" in order to compile PG17 on
Windows 10/11. Please check out the thread "errors building on windows
using meson" highlighting the issue.
> 
> 
> 
> See  https://postgr.es/m/4acddcd4-1c08-44e7-ba60-cab102259...@dunslane.net
> 
> I agree we should fix the docco.

I have written the attached patch to make these changes.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  When a patient asks the doctor, "Am I going to die?", he means 
  "Am I going to die soon?"
diff --git a/doc/src/sgml/installation.sgml b/doc/src/sgml/installation.sgml
index 3a491b59896..0e16bb01d85 100644
--- a/doc/src/sgml/installation.sgml
+++ b/doc/src/sgml/installation.sgml
@@ -3809,14 +3809,12 @@ make: *** [postgres] Error 1
 
  
   
-   ActiveState Perl
+   Strawberry Perl

-ActiveState Perl is required to run the build generation scripts. MinGW
+Strawberry Perl is required to run the build generation scripts. MinGW
 or Cygwin Perl will not work. It must also be present in the PATH.
 Binaries can be downloaded from
-https://www.activestate.com";>
-(Note: version 5.14 or later is required,
-the free Standard Distribution is sufficient).
+https://strawberryperl.com";>.

   
 
@@ -3868,10 +3866,9 @@ make: *** [postgres] Error 1
 
  
   
-   ActiveState Tcl
+   Magicsplat Tcl

-Required for building PL/Tcl (Note: version
-8.4 is required, the free Standard Distribution is sufficient).
+Required for building PL/Tcl.

   
 


Re: DOCS - pg_replication_slot . Fix the 'inactive_since' description

2024-10-16 Thread Bruce Momjian
On Mon, Sep  9, 2024 at 01:15:32PM +1000, Peter Smith wrote:
> On Mon, Sep 9, 2024 at 12:20 PM David G. Johnston
>  wrote:
> >
> >
> >
> > On Sun, Sep 8, 2024, 18:55 Peter Smith  wrote:
> >>
> >> Saying "The time..." is fine, but the suggestions given seem backwards to 
> >> me:
> >> - The time this slot was inactivated
> >> - The time when the slot became inactive.
> >> - The time when the slot was deactivated.
> >>
> >> e.g. It is not like light switch. So, there is no moment when the
> >> active slot "became inactive" or "was deactivated".
> >
> >
> > While this is plausible the existing wording and the name of the field 
> > definitely fail to convey this.
> >
> >>
> >> Rather, the 'inactive_since' timestamp field is simply:
> >> - The time the slot was last active.
> >> - The last time the slot was active.
> >
> >
> > I see your point but that wording is also quite confusing when an active 
> > slot returns null for this field.
> >
> > At this point I'm confused enough to need whatever wording is taken to be 
> > supported by someone explaining the code that interacts with this field.
> >
> 
> Me too. I created this thread primarily to get the description changed
> to clarify this field represents a moment in time, rather than a
> duration. So I will be happy with any wording that addresses that.

I dug into the code and came up with the attached patch.  "active" means
there is a process streaming the slot, and the "inactive_since" time
means the last time synchronous slot streaming was stopped.  Doc patch
attached.

I am not sure what other things are needed, but this is certainly
unclear.  This comment from src/backend/replication/logical/slotsync.c
helped me understand this:


 * We need to update inactive_since only when we are promoting standby to
 * correctly interpret the inactive_since if the standby gets promoted
 * without a restart. We don't want the slots to appear inactive for a
 * long time after promotion if they haven't been synchronized recently.
 * Whoever acquires the slot, i.e., makes the slot active, will reset it.
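
As a quick way to see the field alongside the related columns (an
illustrative query only, not part of the patch):

    SELECT slot_name, active, synced, inactive_since
      FROM pg_replication_slots
     ORDER BY slot_name;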

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  When a patient asks the doctor, "Am I going to die?", he means 
  "Am I going to die soon?"
diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml
index 61d28e701f2..934104e1bc9 100644
--- a/doc/src/sgml/system-views.sgml
+++ b/doc/src/sgml/system-views.sgml
@@ -2435,7 +2435,7 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
active bool
   
   
-   True if this slot is currently actively being used
+   True if this slot is currently being streamed
   
  
 
@@ -2444,9 +2444,8 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
active_pid int4
   
   
-   The process ID of the session using this slot if the slot
-   is currently actively being used. NULL if
-   inactive.
+   The process ID of the session streaming data for this slot.
+   NULL if inactive.
   
  
 
@@ -2566,15 +2565,13 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
inactive_since timestamptz
   
   
-The time since the slot has become inactive.
-NULL if the slot is currently being used.
-Note that for slots on the standby that are being synced from a
-primary server (whose synced field is
-true), the
-inactive_since indicates the last
-synchronization (see
-)
-time.
+The time slot synchronization (see )
+was most recently stopped.  NULL if the slot
+has always been synchronized.  This is useful for slots on the
+standby that are intended to be synced from a primary server (whose
+synced field is true)
+so they know when the slot stopped being synchronized.
   
  
 
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index f9649eec1a5..cbd8c00aa3f 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -1517,7 +1517,7 @@ update_synced_slots_inactive_since(void)
 	 * correctly interpret the inactive_since if the standby gets promoted
 	 * without a restart. We don't want the slots to appear inactive for a
 	 * long time after promotion if they haven't been synchronized recently.
-	 * Whoever acquires the slot i.e.makes the slot active will reset it.
+	 * Whoever acquires the slot, i.e., makes the slot active, will reset it.
 	 */
 	if (!StandbyMode)
 		return;
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 45582cf9d89..e3e0816bcd3 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -205,7 +205,7 @@ typedef struct ReplicationSlot
 	 */
 	XLogRecPtr	last_saved_confirmed_flush;
 
-	/* The time since the slot has become inactiv

Re: Avoiding superfluous buffer locking during nbtree backwards scans

2024-10-16 Thread Peter Geoghegan
On Fri, Oct 11, 2024 at 7:29 PM Peter Geoghegan  wrote:
> Attached is v5

Now I'm attaching a v6, which further polishes things. Current plan is to
commit something very close to this in the next day or two.

v6 is mostly just further comment polishing. But it also merges together
the two existing _bt_readnextpage loops (for forward and backward scan
directions) into one single loop that handles everything. This is
definitely a win for brevity, and might also be a win for clarity --
but I'm not 100% sure which way I prefer it just yet.

I'll need to make a final decision on this loop merging business
before committing. Anybody else have an opinion, either way?

--
Peter Geoghegan
From 17959762e34cf8ee75af37f98dc5b085d05c7860 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan 
Date: Tue, 8 Oct 2024 11:40:54 -0400
Subject: [PATCH v6] Optimize and simplify nbtree backwards scans.

This enhancement completely supersedes the one added by commit 3f44959f.

Author: Matthias van de Meent 
Author: Peter Geoghegan 
Discussion: https://postgr.es/m/CAEze2WgpBGRgTTxTWVPXc9+PB6fc1a7t+VyGXHzfnrFXcQVxnA@mail.gmail.com
Discussion: https://postgr.es/m/CAH2-WzkBTuFv7W2+84jJT8mWZLXVL0GHq2hMUTn6c9Vw=eYrCw@mail.gmail.com
---
 src/include/access/nbtree.h   |  23 +-
 src/backend/access/nbtree/nbtree.c|  77 ++-
 src/backend/access/nbtree/nbtsearch.c | 744 --
 src/backend/access/nbtree/nbtutils.c  |   2 +-
 4 files changed, 401 insertions(+), 445 deletions(-)

diff --git a/src/include/access/nbtree.h b/src/include/access/nbtree.h
index 90a68375a..22bf38934 100644
--- a/src/include/access/nbtree.h
+++ b/src/include/access/nbtree.h
@@ -954,6 +954,7 @@ typedef struct BTScanPosData
 
 	XLogRecPtr	lsn;			/* pos in the WAL stream when page was read */
 	BlockNumber currPage;		/* page referenced by items array */
+	BlockNumber prevPage;		/* page's left link when we scanned it */
 	BlockNumber nextPage;		/* page's right link when we scanned it */
 
 	/*
@@ -965,15 +966,6 @@ typedef struct BTScanPosData
 	bool		moreLeft;
 	bool		moreRight;
 
-	/*
-	 * Direction of the scan at the time that _bt_readpage was called.
-	 *
-	 * Used by btrestrpos to "restore" the scan's array keys by resetting each
-	 * array to its first element's value (first in this scan direction). This
-	 * avoids the need to directly track the array keys in btmarkpos.
-	 */
-	ScanDirection dir;
-
 	/*
 	 * If we are doing an index-only scan, nextTupleOffset is the first free
 	 * location in the associated tuple storage workspace.
@@ -1022,6 +1014,7 @@ typedef BTScanPosData *BTScanPos;
 #define BTScanPosInvalidate(scanpos) \
 	do { \
 		(scanpos).currPage = InvalidBlockNumber; \
+		(scanpos).prevPage = InvalidBlockNumber; \
 		(scanpos).nextPage = InvalidBlockNumber; \
 		(scanpos).buf = InvalidBuffer; \
 		(scanpos).lsn = InvalidXLogRecPtr; \
@@ -1073,6 +1066,7 @@ typedef struct BTScanOpaqueData
 	 * markPos.
 	 */
 	int			markItemIndex;	/* itemIndex, or -1 if not valid */
+	ScanDirection markDir;		/* direction of scan with mark */
 
 	/* keep these last in struct for efficiency */
 	BTScanPosData currPos;		/* current position data */
@@ -1091,7 +1085,6 @@ typedef struct BTReadPageState
 	OffsetNumber minoff;		/* Lowest non-pivot tuple's offset */
 	OffsetNumber maxoff;		/* Highest non-pivot tuple's offset */
 	IndexTuple	finaltup;		/* Needed by scans with array keys */
-	BlockNumber prev_scan_page; /* previous _bt_parallel_release block */
 	Page		page;			/* Page being read */
 
 	/* Per-tuple input parameters, set by _bt_readpage for _bt_checkkeys */
@@ -1192,12 +1185,14 @@ extern int	btgettreeheight(Relation rel);
 /*
  * prototypes for internal functions in nbtree.c
  */
-extern bool _bt_parallel_seize(IndexScanDesc scan, BlockNumber *pageno,
-			   bool first);
-extern void _bt_parallel_release(IndexScanDesc scan, BlockNumber scan_page);
+extern bool _bt_parallel_seize(IndexScanDesc scan, BlockNumber *next_scan_page,
+			   BlockNumber *curr_page, bool first);
+extern void _bt_parallel_release(IndexScanDesc scan,
+ BlockNumber next_scan_page,
+ BlockNumber curr_page);
 extern void _bt_parallel_done(IndexScanDesc scan);
 extern void _bt_parallel_primscan_schedule(IndexScanDesc scan,
-		   BlockNumber prev_scan_page);
+		   BlockNumber curr_page);
 
 /*
  * prototypes for functions in nbtdedup.c
diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index 9b968aa0f..ff4fa2cf9 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -66,7 +66,9 @@ typedef enum
  */
 typedef struct BTParallelScanDescData
 {
-	BlockNumber btps_scanPage;	/* latest or next page to be scanned */
+	BlockNumber btps_nextScanPage;	/* next page to be scanned */
+	BlockNumber btps_lastCurrPage;	/* page whose sibling link was copied into
+	 * btps_scanPage */
 	BTPS_State	btps_pageStatus;	/* indicates whether next page is
 	 * available for scan.

Re: Pgoutput not capturing the generated columns

2024-10-16 Thread Shubham Khanna
On Wed, Oct 9, 2024 at 11:52 AM vignesh C  wrote:
>
> On Tue, 8 Oct 2024 at 11:37, Shubham Khanna  
> wrote:
> >
> > On Fri, Oct 4, 2024 at 9:36 AM Peter Smith  wrote:
> > >
> > > Hi Shubham, here are my review comments for v36-0001.
> > >
> > > ==
> > > 1. General  - merge patches
> > >
> > > It is long past due when patches 0001 and 0002 should've been merged.
> > > AFAIK the split was only because historically these parts had
> > > different authors. But, keeping them separated is not helpful anymore.
> > >
> > > ==
> > > src/backend/catalog/pg_publication.c
> > >
> > > 2.
> > >  Bitmapset *
> > > -pub_collist_validate(Relation targetrel, List *columns)
> > > +pub_collist_validate(Relation targetrel, List *columns, bool pubgencols)
> > >
> > > Since you removed the WARNING, this parameter 'pubgencols' is unused
> > > so it should also be removed.
> > >
> > > ==
> > > src/backend/replication/pgoutput/pgoutput.c
> > >
> > > 3.
> > >   /*
> > > - * If the publication is FOR ALL TABLES then it is treated the same as
> > > - * if there are no column lists (even if other publications have a
> > > - * list).
> > > + * To handle cases where the publish_generated_columns option isn't
> > > + * specified for all tables in a publication, we must create a column
> > > + * list that excludes generated columns. So, the publisher will not
> > > + * replicate the generated columns.
> > >   */
> > > - if (!pub->alltables)
> > > + if (!(pub->alltables && pub->pubgencols))
> > >
> > > I still found that comment hard to understand. Does this mean to say
> > > something like:
> > >
> > > --
> > > Process potential column lists for the following cases:
> > >
> > > a. Any publication that is not FOR ALL TABLES.
> > >
> > > b. When the publication is FOR ALL TABLES and
> > > 'publish_generated_columns' is false.
> > > A FOR ALL TABLES publication doesn't have user-defined column lists,
> > > so all columns will be replicated by default. However, if
> > > 'publish_generated_columns' is set to false, column lists must still
> > > be created to exclude any generated columns from being published
> > > --
> > >
> > > ==
> > > src/test/regress/sql/publication.sql
> > >
> > > 4.
> > > +SET client_min_messages = 'WARNING';
> > > +CREATE TABLE gencols (a int, gen1 int GENERATED ALWAYS AS (a * 2) 
> > > STORED);
> > >
> > > AFAIK you don't need to keep changing 'client_min_messages',
> > > particularly now that you've removed the WARNING message that was
> > > previously emitted.
> > >
> > > ~
> > >
> > > 5.
> > > nit - minor comment changes.
> > >
> > > ==
> > > Please refer to the attachment which implements any nits from above.
> > >
> >
> > I have fixed all the given comments. Also, I have created a new 0003
> > patch for the TAP-Tests related to the '011_generated.pl' file. I am
> > planning to merge 0001 and 0003 patches once they will get fixed.
> > The attached patches contain the required changes.
>
> Few comments:
> 1) I felt this change need not be part of this patch, if required it
> can be proposed as a separate patch:
> +   if (server_version >= 15)
> {
> WalRcvExecResult *pubres;
> TupleTableSlot *tslot;
> Oid attrsRow[] = {INT2VECTOROID};
> -   StringInfoData pub_names;
> -
> -   initStringInfo(&pub_names);
> -   foreach(lc, MySubscription->publications)
> -   {
> -   if (foreach_current_index(lc) > 0)
> -   appendStringInfoString(&pub_names, ", ");
> -   appendStringInfoString(&pub_names,
> quote_literal_cstr(strVal(lfirst(lc;
> -   }
> +   StringInfo  pub_names = makeStringInfo();
>
> 2) These two statements can be combined into a single appendStringInfo:
> +   appendStringInfo(&cmd,
>  "  FROM pg_catalog.pg_attribute a"
>  "  LEFT JOIN pg_catalog.pg_index i"
>  "   ON (i.indexrelid =
> pg_get_replica_identity_index(%u))"
>  " WHERE a.attnum > 
> 0::pg_catalog.int2"
> -"   AND NOT a.attisdropped %s"
> +"   AND NOT a.attisdropped",
> lrel->remoteid);
> +
> +   appendStringInfo(&cmd,
>  "   AND a.attrelid = %u"
>  " ORDER BY a.attnum",
> -lrel->remoteid,
> -
> (walrcv_server_version(LogRepWorkerWalRcvConn) >= 12 ?
> - "AND a.attgenerated = ''" : ""),
>  lrel->remoteid);
>
> 3) In which scenario will this be hit:
> +   /*
> +* Construct column list for COPY, excluding columns that are
> subscription
> +  

Re: Pgoutput not capturing the generated columns

2024-10-16 Thread Shubham Khanna
On Thu, Oct 10, 2024 at 10:53 AM Peter Smith  wrote:
>
> Here are some comments for TAP test patch v37-0003.
>
> I’m not in favour of the removal of such a large number of
> 'combination' and other 'misc' tests. In the commit message, please
> delete me as a "co-author" of this patch.
>
> ==
>
> 1.
> Any description or comment that still mentions "all combinations" is
> no longer valid:
>
> (e.g. in the comment message)
> Add tests for all combinations of generated column replication.
>
> (e.g. in the test file)
> # The following test cases exercise logical replication for all combinations
> # where there is a generated column on one or both sides of pub/sub:
>
> and
>
> # Furthermore, all combinations are tested using:
>
> ==
> 2.
> +# --
> +# Testcase: generated -> normal
> +# Publisher table has generated column 'b'.
> +# Subscriber table has normal column 'b'.
> +# --
> +
>
> Now that COPY for generated columns is already implemented in patch
> 0001, shouldn't this test be using 'copy_data' enabled, so it can test
> replication both for initial tablesync as well as normal replication?
>
> That was the whole point of having the "# XXX copy_data=false for now.
> This will be changed later." reminder comment in this file.
>
> ==
>
> 3.
> Previously there were some misc tests to ensure that a generated
> column which was then altered using DROP EXPRESSION would work as
> expected. The test scenario was commented like:
>
> +# 
> =
> +# Misc test.
> +#
> +# A "normal -> generated" replication fails, reporting an error that the
> +# subscriber side column is missing.
> +#
> +# In this test case we use DROP EXPRESSION to change the subscriber generated
> +# column into a normal column, then verify replication works ok.
> +# 
> =
>
> Now in patch v37 this test no longer exists. Why?
>
> ==
> 4.
> +# 
> =
> +# The following test cases demonstrate behavior of generated column 
> replication
> +# when publish_generated_colums=false/true:
> +#
> +# Test: column list includes gencols, when publish_generated_columns=false
> +# Test: column list does not include gencols, when
> publish_generated_columns=false
> +#
> +# Test: column list includes gencols, when publish_generated_columns=true
> +# Test: column list does not include gencols, when
> publish_generated_columns=true
> +# Test: no column list, when publish_generated_columns=true
> +# 
> =
>
> These tests are currently only testing the initial tablesync
> replication. Since the COPY logic is different from the normal
> replication logic, I think it would be better to test some normal
> replication records as well, to make sure both parts work
> consistently. This comment applies to all of the following test cases.
>
> ~~~
>
> 5.
> +# Create table and publications.
> +$node_publisher->safe_psql(
> + 'postgres', qq(
> + CREATE TABLE nogen_to_gen3 (a int, b int, gen1 int GENERATED ALWAYS
> AS (a * 2) STORED, gen2 int GENERATED ALWAYS AS (a * 2) STORED);
> + CREATE TABLE nogen_to_gen4 (c int, d int, gen1 int GENERATED ALWAYS
> AS (c * 2) STORED, gen2 int GENERATED ALWAYS AS (c * 2) STORED);
> + INSERT INTO nogen_to_gen3 VALUES (1, 1);
> + INSERT INTO nogen_to_gen4 VALUES (1, 1);
> + CREATE PUBLICATION pub1 FOR table nogen_to_gen3, nogen_to_gen4(gen1)
> WITH (publish_generated_columns=true);
> +));
> +
>
> 5a.
> The code should do only what the comments say it does. So, the INSERTS
> should be done separately after the CREATE PUBLICATION, but before the
> CREATE SUBSCRIPTION. A similar change should be made for all of these
> test cases.
>
> # Insert some initial data
> INSERT INTO nogen_to_gen3 VALUES (1, 1);
> INSERT INTO nogen_to_gen4 VALUES (1, 1);
>
> ~
>
> 5b.
> The tables are badly named. Why are they 'nogen_to_gen', when the
> publisher side has generated cols and the subscriber side does not?
> This problem seems repeated in multiple subsequent test cases.
>
> ~
>
> 6.
> +$result = $node_subscriber->safe_psql('postgres',
> + "SELECT * FROM gen_to_nogen ORDER BY a");
> +is($result, qq(1|1||2),
> + 'gen_to_nogen initial sync, when publish_generated_columns=false');
> +
> +$result = $node_subscriber->safe_psql('postgres',
> + "SELECT * FROM gen_to_nogen2 ORDER BY c");
> +is($result, qq(1|1||),
> + 'gen_to_nogen2 initial sync, when publish_generated_columns=false');
>
> IMO all the "result" queries like these ones ought to have to have a
> comment which explains the reason for the expected results. This
> review comment applies to multiple places. Please add comments to all
> of them.
>
> ~~~
>
> 7.
> +# -

Re: optimize hashjoin

2024-10-16 Thread Bruce Momjian
On Fri, Aug 23, 2024 at 08:17:26AM -0400, Robert Haas wrote:
> On Fri, Aug 23, 2024 at 7:02 AM bucoo  wrote:
> > However, the non-parallel hashjoin indeed showed about a 10% performance
> > improvement.
> >->  Hash Join  (cost=508496.00..2302429.31 rows=47989008 width=0) 
> > (actual time=1075.213..9503.727 rows=47989007 loops=1)
> >->  Hash Join  (cost=508496.00..2302429.31 rows=47989008 width=0) 
> > (actual time=1087.588..8726.441 rows=47989007 loops=1)
> 
> It's not a good idea to test performance with EXPLAIN ANALYZE,
> generally speaking. And you usually need to test a few times and
> average or something, rather than just a single test. But also, this
> doesn't show the hash join being 10% faster. It shows the hash join
> being essentially the same speed (1075ms unpatched, 1087ms patched),
> and the aggregate node on top of it being faster.
> 
> Now, it does seem possible to me that changing one node could cause a
> performance improvement for the node above it, but I don't quite see
> why that would happen in this case.

Where are we on this patch?

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  When a patient asks the doctor, "Am I going to die?", he means 
  "Am I going to die soon?"




Re: Proposal to Enable/Disable Index using ALTER INDEX (with patch)

2024-10-16 Thread Shayon Mukherjee
On Oct 16, 2024, at 12:19 PM, Shayon Mukherjee wrote:

- ALTER INDEX ... ENABLE/DISABLE performs an in-place update of the pg_index
  catalog to protect against indcheckxmin [2] (older unrelated thread).

Performing the in-place update of the pg_index row from ATExecEnableDisableIndex
using systable_inplace_update_begin was failing in CI, oddly, but not on my local
MacBook when running the test suite. I am also wondering what is causing the
requirement [1] to update the row in-place rather than, say, using
CatalogTupleUpdate, since the latter is working well both locally and in CI.

I have attached a v2 patch (following on from the last v1 patch [1]) that uses
CatalogTupleUpdate, and local + CI [2] is passing.

[1] https://www.postgresql.org/message-id/20180618215635.m5vrnxdxhxytv...@alap3.anarazel.de
[2] https://www.postgresql.org/message-id/EF2313B8-A017-4869-9B7F-A24EDD8795DE%40gmail.com
[3] https://github.com/shayonj/postgres/pull/1

Thanks
Shayon
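For readers skimming the thread, a rough sketch of the user-facing syntax this
patch series proposes (not available in any released PostgreSQL; object names
are illustrative, and the exact semantics are whatever the patch implements):

    -- assumes a server patched with the proposed ALTER INDEX ... ENABLE/DISABLE
    CREATE INDEX idx_orders_created_at ON orders (created_at);

    -- per the proposal, the planner is meant to ignore a disabled index
    ALTER INDEX idx_orders_created_at DISABLE;

    -- make it visible to the planner again
    ALTER INDEX idx_orders_created_at ENABLE;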

v2-0001-Introduce-the-ability-to-enable-disable-indexes-u.patch
Description: Binary data


Re: Better error reporting from extension scripts (Was: Extend ALTER OPERATOR)

2024-10-16 Thread Tom Lane
jian he  writes:
> just found out the "elog(INFO, "should not reached here");" part is never reached.

You didn't check any of the cases we were discussing I guess?
(That is, places in gram.y that throw an error without a
parser_errposition call.)

Note that even if we fix all of those and keep them fixed, we still
couldn't assume the case is unreachable, because gram.y isn't
self-contained.  For instance, if we hit out-of-memory during raw
parsing, the OOM error out of mcxt.c isn't going to provide a syntax
error position.  I'm not too concerned about doing better than what
the patch does now (i.e. nothing) in such edge cases, but we can't
do worse.

> i guess we don't need performance in script_error_callback,
> but arranging the code in script_error_callback to separate syntax errors
> (raw parser) from other errors seems good.
> please check the attached minor refactor.

I do not think that's an improvement.  It's more complicated and
less readable, and I don't see why we need to squeeze more
performance out of this error-reporting path that should never
be taken in production.

regards, tom lane




Re: Limiting overshoot in nbtree's parallel SAOP index scans

2024-10-16 Thread Peter Geoghegan
On Fri, Oct 11, 2024 at 10:27 AM Matthias van de Meent
 wrote:
> With the introduction of the new SAOP handling in PG17, however, the
> shared state has become a bit more muddied. Because the default has
> changed from "stop scanning at the end of a SAOP value's range" to
> "continue scanning, unless ...", and because it is relatively
> expensive to determine whether we need to stop or continue we release
> the parallel scan to continue before we know if a new primitive scan
> would be preferred.

It's not just relatively expensive. It's essential that _bt_readpage
be able to release the parallel scan at the earliest opportunity (just
as soon as the next sibling link is known). Whereas a parallel backend
worker will only decide that it's likely a good idea to do another
primitive index scan when _bt_readpage has finished executing.

Clearly it would not be sensible to wait until _bt_readpage had gotten
as far as deciding on the need for another descent of the index. To do
so would essentially serialize the scan. There are clear advantages to
doing as little coordination as possible among backends. (I don't
think that you disagree with any of this, but just want to establish
context for anybody else that's reading along -- you seem to want to
do some limited/conditional form of this behavior.)

> Problem
> ---
> The new parallel index scan code can only get into a new primitive
> scan IFF no parallel worker has taken the next block in the time
> between the call to _bt_parallel_release at the start of _bt_readpage,
> and the call to _bt_parallel_primscan_schedule through _bt_checkkeys
> slightly later in _bt_readpage. This makes the code susceptible to
> race conditions

Do you have a test case?

I was aware of the issues in this area when designing the
_bt_parallel_primscan_schedule mechanism. I believe it's true that the
scan is only strictly guaranteed to be able to make forward progress
via naively scanning the next leaf page, again and again. This is a
fairly theoretical point, though.

I believe that it's practically impossible for this to cause us
problems. That would require a group of backends that all want to
start another primitive index scan, but each block each other out,
constantly. Not just once, mind you -- again and again (enough to be
noticeable). This would have to happen in spite of the fact that we
only release one backend per _bt_parallel_release call
(_bt_parallel_release uses ConditionVariableSignal under the hood,
which only releases the oldest worker waiting on the CV, not all
backends).

There is no signalling of condition variables within
_bt_parallel_primscan_schedule, either. So you just have this one call
to ConditionVariableSignal. Why should it be possible to get a
stampede of parallel backends? (I'm not 100% sure if your concern is
even theoretically valid.)

> Given the above, a parallel index scan with ScanKey `id = ANY
> (minvalue, maxvalue)` and bad enough timing/luck could thus scan the
> whole index, while just 2 primitive index scans (the behaviour before
> PG17) are sufficient.
>
> In practice it is quite unlikely that parallel index scans have such
> bad luck, but I don't think we should have luck or timing as a big
> factor for the performance picture in parallel index scans.

It's also possible for a hash table to degenerate when there are
excessively many hash collisions. This is certainly possible with
adversarial inputs -- it's not hard to write a test case that'll
generate some. So hash tables generally also "have luck as a big
factor", in about the same sense. No?

I don't think that it's fundamentally unreasonable to have concerns
about things like this, of course. Safety-critical avionics systems
tend to just not use hash tables at all.

> I think I've found a reasonably elegant solution, which limits the
> number of buffers we can fail to start a new primitive scan to the
> number of parallel workers + 1:  If we detect that a parallel worker
> in the same primscan range thinks this is the right moment to start a
> new primscan (but couldn't due to concurrent advancement), we don't
> release the parallel scan immediately, but instead only release it
> after reading the pages contents, to find out if we really should
> start a new primitive scan.  If so, we do that, and if not, we instead
> mark that the primitive scan has reached a new primscan range, do some
> cleanup, and then continue business as usual.

I'm opposed to this patch. It'll introduce a behavior that's more
difficult to reason about and test than the one that it purports to
fix.

-- 
Peter Geoghegan




Re: New "raw" COPY format

2024-10-16 Thread Joel Jacobson
On Wed, Oct 16, 2024, at 18:34, Daniel Verite wrote:
> Joel Jacobson wrote:
>
>> However, I think rejecting such column data seems like the
>> better alternative, to ensure data exported with COPY TO
>> can always be imported back using COPY FROM
>> with the same format.
>
> On the other hand, that might prevent cases where we
> want to export, for instance, a valid json array:
>
>   copy (select json_agg(col) from table ) to 'file' RAW
>
> This is a variant of the discussion in [1] where the OP does:
>
>  copy (select json_agg(row_to_json(t)) from  t) TO 'file' 
>
> and he complains that both text and csv "break the JSON".
> That discussion morphed into a proposed patch adding JSON
> format to COPY,  but RAW would work directly as the OP
> expected.
>
> That is, unless  happens to include JSON fields with LF/CRLF
> in them, and the RAW format says this is an error condition.
> In that case it's quite annoying to make it an error, rather than
> simply let it pass.
>
>
> [1]
> https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU=k...@mail.gmail.com

Thanks for finding this related thread.

Very good example, that would be solved with RAW.

I've used a different hack myself many times to export text "as is",
which is to use COPY TO ... BINARY, and then to manually edit the
file using vim, stripping away the header and footer from the file. :D
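(For reference, that hack is roughly the following; the file and column names
are made up, and the binary-format header and trailer still have to be removed
by hand afterwards:)

    -- binary COPY of a single text column, then open the file and strip the
    -- header and trailer manually
    COPY (SELECT body FROM docs) TO '/tmp/body.bin' WITH (FORMAT binary);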

But I don't see how JSON fields that come from PostgreSQL
could contain LF/CRLF though, could they?
Since LF/CR must be escaped inside a JSON field,
and when casting JSONB to text, there are no newlines
injected anywhere in between fields.
I can only see how a value of the legacy JSON type could
have newlines in it.
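A quick psql sketch of that difference (output abbreviated):

    -- jsonb re-serializes its value on output, so the newline stays escaped
    -- and the text form contains no literal line break
    SELECT ('{"note": "line1\nline2"}'::jsonb)::text;
    -- => {"note": "line1\nline2"}

    -- the legacy json type stores the original text verbatim, so a newline
    -- between tokens survives the round trip
    SELECT (E'{"a": 1,\n "b": 2}'::json)::text;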

That doesn't matter though, this is still a very good example
on the need to export text "as is" from PostgreSQL to a file.

I like Jacob's idea of letting the user specify a DELIMITER
also for the RAW format, to override the automagical
newline detection. This would, for example, allow importing values
containing \r when the newline delimiter is \n,
which would otherwise be rejected with an
"inconsistent newline style" error.

I think it would also be useful to allow specifying
DELIMITER NONE, which would allow importing an
entire text file, "as is", into a single column and single row.

To export a single column single row, the delimiter
wouldn't matter, since there wouldn't be any.
But DELIMITER NONE would be useful also for COPY TO,
if wanting to concatenate text "as is" without adding
newlines in between, to a file, similar to the
UNIX "cat" command.

Regarding the name for the format, I thought SINGLE
was nice before I read this message, but now since
the need for more rawness seems desirable,
I think I like RAW the most again, since if the
automagical newline detection can be overridden,
then it's really raw for real.

A final thought is to maybe consider just skipping
the automagical newline detection for RAW?

Instead of the automagical detection,
the default newline delimiter could be the OS default,
similar to how COPY TO works.

That way, it would almost always just work for most users,
as long as processing files within their OS,
and when not, they would just need to specify the DELIMITER.

/Joel




Considering fractional paths in Append node

2024-10-16 Thread Nikita Malakhov
Hi hackers!

A colleague of mine, Andrei Lepikhov, has found an interesting behavior
in path cost calculation for the Append node: when evaluating the cheapest
path, it does not take fractional path costs into account.

We've prepared a patch that forces the add_paths_to_append_rel function
to consider non-parameterized fractional path costs.

The effect is easily seen in one of standard PG tests:
Vanilla (current master):
explain analyze
select t1.unique1 from tenk1 t1
inner join tenk2 t2 on t1.tenthous = t2.tenthous and t2.thousand = 0
   union all
(values(1)) limit 1;
  QUERY PLAN

--
 Limit  (cost=0.00..219.55 rows=1 width=4) (actual time=6.309..6.312 rows=1
loops=1)
   ->  Append  (cost=0.00..2415.09 rows=11 width=4) (actual
time=6.308..6.310 rows=1 loops=1)
 ->  Nested Loop  (cost=0.00..2415.03 rows=10 width=4) (actual
time=6.307..6.308 rows=1 loops=1)
   Join Filter: (t1.tenthous = t2.tenthous)
   Rows Removed by Join Filter: 4210
   ->  Seq Scan on tenk1 t1  (cost=0.00..445.00 rows=1
width=8) (actual time=0.004..0.057 rows=422 loops=1)
   ->  Materialize  (cost=0.00..470.05 rows=10 width=4) (actual
time=0.000..0.014 rows=10 loops=422)
 Storage: Memory  Maximum Storage: 17kB
 ->  Seq Scan on tenk2 t2  (cost=0.00..470.00 rows=10
width=4) (actual time=0.076..5.535 rows=10 loops=1)
   Filter: (thousand = 0)
   Rows Removed by Filter: 9990
 ->  Result  (cost=0.00..0.01 rows=1 width=4) (never executed)
 Planning Time: 0.324 ms
 Execution Time: 6.336 ms
(14 rows)

Patched, the same test:
explain analyze
select t1.unique1 from tenk1 t1
inner join tenk2 t2 on t1.tenthous = t2.tenthous and t2.thousand = 0
   union all
(values(1)) limit 1;
                                  QUERY PLAN
--
 Limit  (cost=0.29..126.00 rows=1 width=4) (actual time=0.105..0.106 rows=1
loops=1)
   ->  Append  (cost=0.29..1383.12 rows=11 width=4) (actual
time=0.104..0.105 rows=1 loops=1)
 ->  Nested Loop  (cost=0.29..1383.05 rows=10 width=4) (actual
time=0.104..0.104 rows=1 loops=1)
   ->  Seq Scan on tenk2 t2  (cost=0.00..470.00 rows=10
width=4) (actual time=0.076..0.076 rows=1 loops=1)
 Filter: (thousand = 0)
 Rows Removed by Filter: 421
   ->  Index Scan using tenk1_thous_tenthous on tenk1 t1
 (cost=0.29..91.30 rows=1 width=8) (actual time=0.026..0.026 rows=1 loops=1)
 Index Cond: (tenthous = t2.tenthous)
 ->  Result  (cost=0.00..0.01 rows=1 width=4) (never executed)
 Planning Time: 0.334 ms
 Execution Time: 0.130 ms
(11 rows)

Hope this optimization could be useful.

-- 
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
https://postgrespro.ru/


Re: New "raw" COPY format

2024-10-16 Thread Joel Jacobson
On Wed, Oct 16, 2024, at 20:30, Joel Jacobson wrote:
> A final thought is to maybe consider just skipping
> the automagical newline detection for RAW?
>
> Instead of the automagical detection,
> the default newline delimiter could be the OS default,
> similar to how COPY TO works.
>
> That way, it would almost always just work for most users,
> as long as processing files within their OS,
> and when not, they would just need to specify the DELIMITER.

I would guess that nowadays, dealing with unstructured text files
is probably less common than dealing with structured text files
such as JSON, YAML, TOML, XML, etc.

Therefore, maybe DELIMITER NONE would be a better default
for RAW? Especially since it's then also more honest in being "raw".

If needing to import an unstructured text file that is just newline
delimited, and not wanting the entire file as a single value,
the newline style would then just need to be specified
using the DELIMITER option.
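Roughly like this, if DELIMITER NONE became the default for RAW (again,
hypothetical syntax; names are illustrative):

    -- one row per line, with the newline style stated explicitly
    COPY lines (ln) FROM '/var/log/app.log' WITH (FORMAT raw, DELIMITER E'\n');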

/Joel




Re: POC, WIP: OR-clause support for indexes

2024-10-16 Thread Andrei Lepikhov

On 10/17/24 03:39, Alexander Korotkov wrote:

On Wed, Oct 16, 2024 at 7:22 AM Andrei Lepikhov  wrote:

On 10/12/24 21:25, Alexander Korotkov wrote:

I forgot to specify (COSTS OFF) for EXPLAINs in regression tests.  Fixed in v42.

I've passed through the patch set.

Let me put aside the v42-0003  patch—it looks debatable, and I need time
to analyse the change in regression tests caused by this patch.


Yes, the 0003 patch is for illustration purposes for now.  I will not keep
rebasing it.  We can pick it up later once the main patches are committed.

Got it. I will save it into the TODO list.



The comments look much better, as far as my current language level lets me
judge. The ideas with fast exits also look profitable and are worth the
additional 'matched' variable.

So, in general, it is OK. I think only one place, involving
inner_other_clauses, can be improved. Maybe it would be enough to create
this list only once, outside the 'foreach(j, groupedArgs)' loop? Also, the
comment on the necessity of this operation was unclear to me. See the
attachment for my modest attempt at improving it.


Thank you, I've integrated your patch with minor edits from me.
Thanks. I'm not sure about the necessity of checking whether the list is NIL
(list_free also handles it), but I'm OK with the edits.


--
regards, Andrei Lepikhov