Re: Synchronizing slots from primary to standby

2024-03-31 Thread Amit Kapila
On Mon, Apr 1, 2024 at 10:01 AM Bharath Rupireddy
 wrote:
>
> On Thu, Mar 28, 2024 at 10:08 AM Zhijie Hou (Fujitsu)
>  wrote:
> >
> > [2] The steps to reproduce the data miss issue on a primary->standby setup:
>
> I'm trying to reproduce the problem with [1], but I can see the
> changes after the standby is promoted. Am I missing anything here?
>
> ubuntu:~/postgres/pg17/bin$ ./psql -d postgres -p 5433 -c "select *
> from pg_logical_slot_get_changes('lrep_sync_slot', NULL, NULL);"
>     lsn    | xid |             data
> -----------+-----+---------------------------------------------
>  0/3B0 | 738 | BEGIN 738
>  0/3017FC8 | 738 | COMMIT 738
>  0/3017FF8 | 739 | BEGIN 739
>  0/3019A38 | 739 | COMMIT 739
>  0/3019A38 | 740 | BEGIN 740
>  0/3019A38 | 740 | table public.dummy1: INSERT: a[integer]:999
>  0/3019AA8 | 740 | COMMIT 740
> (7 rows)
>
> [1]
> -#define LOG_SNAPSHOT_INTERVAL_MS 15000
> +#define LOG_SNAPSHOT_INTERVAL_MS 150
>
> ./initdb -D db17
> echo "archive_mode = on
> archive_command='cp %p /home/ubuntu/postgres/pg17/bin/archived_wal/%f'
> wal_level='logical'
> autovacuum = off
> checkpoint_timeout='1h'" | tee -a db17/postgresql.conf
>
> ./pg_ctl -D db17 -l logfile17 start
>
> rm -rf sbdata logfilesbdata
> ./pg_basebackup -D sbdata
>
> ./psql -d postgres -p 5432 -c "SELECT
> pg_create_logical_replication_slot('lrep_sync_slot', 'test_decoding',
> false, false, true);"
> ./psql -d postgres -p 5432 -c "SELECT
> pg_create_physical_replication_slot('phy_repl_slot', true, false);"
>
> echo "port=5433
> primary_conninfo='host=localhost port=5432 dbname=postgres user=ubuntu'
> primary_slot_name='phy_repl_slot'
> restore_command='cp /home/ubuntu/postgres/pg17/bin/archived_wal/%f %p'
> hot_standby_feedback=on
> sync_replication_slots=on" | tee -a sbdata/postgresql.conf
>
> touch sbdata/standby.signal
>
> ./pg_ctl -D sbdata -l logfilesbdata start
> ./psql -d postgres -p 5433 -c "SELECT pg_is_in_recovery();"
>
> ./psql -d postgres
>
> SESSION1, TXN1
> BEGIN;
> create table dummy1(a int);
>
> SESSION2, TXN2
> SELECT pg_log_standby_snapshot();
>
> SESSION1, TXN1
> COMMIT;
>
> SESSION1, TXN1
> BEGIN;
> create table dummy2(a int);
>
> SESSION2, TXN2
> SELECT pg_log_standby_snapshot();
>
> SESSION1, TXN1
> COMMIT;
>
> ./psql -d postgres -p 5432 -c "SELECT
> pg_replication_slot_advance('lrep_sync_slot', pg_current_wal_lsn());"
>

After this step and before the next, did you ensure that the slot sync
has synced the latest confirmed_flush/restart LSNs? You can query:
"select slot_name,restart_lsn, confirmed_flush_lsn from
pg_replication_slots;" to ensure the same on both the primary and
standby.
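
For instance, with the ports used in your reproduction steps, a quick
comparison could look like this (just a sketch; adjust connection options
as needed):

-- run against both the primary (-p 5432) and the standby (-p 5433) and compare
SELECT slot_name, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots
WHERE slot_name = 'lrep_sync_slot';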

-- 
With Regards,
Amit Kapila.




Re: Synchronizing slots from primary to standby

2024-03-31 Thread Bharath Rupireddy
On Thu, Mar 28, 2024 at 10:08 AM Zhijie Hou (Fujitsu)
 wrote:
>
> [2] The steps to reproduce the data miss issue on a primary->standby setup:

I'm trying to reproduce the problem with [1], but I can see the
changes after the standby is promoted. Am I missing anything here?

ubuntu:~/postgres/pg17/bin$ ./psql -d postgres -p 5433 -c "select *
from pg_logical_slot_get_changes('lrep_sync_slot', NULL, NULL);"
    lsn    | xid |             data
-----------+-----+---------------------------------------------
 0/3B0 | 738 | BEGIN 738
 0/3017FC8 | 738 | COMMIT 738
 0/3017FF8 | 739 | BEGIN 739
 0/3019A38 | 739 | COMMIT 739
 0/3019A38 | 740 | BEGIN 740
 0/3019A38 | 740 | table public.dummy1: INSERT: a[integer]:999
 0/3019AA8 | 740 | COMMIT 740
(7 rows)

[1]
-#define LOG_SNAPSHOT_INTERVAL_MS 15000
+#define LOG_SNAPSHOT_INTERVAL_MS 150

./initdb -D db17
echo "archive_mode = on
archive_command='cp %p /home/ubuntu/postgres/pg17/bin/archived_wal/%f'
wal_level='logical'
autovacuum = off
checkpoint_timeout='1h'" | tee -a db17/postgresql.conf

./pg_ctl -D db17 -l logfile17 start

rm -rf sbdata logfilesbdata
./pg_basebackup -D sbdata

./psql -d postgres -p 5432 -c "SELECT
pg_create_logical_replication_slot('lrep_sync_slot', 'test_decoding',
false, false, true);"
./psql -d postgres -p 5432 -c "SELECT
pg_create_physical_replication_slot('phy_repl_slot', true, false);"

echo "port=5433
primary_conninfo='host=localhost port=5432 dbname=postgres user=ubuntu'
primary_slot_name='phy_repl_slot'
restore_command='cp /home/ubuntu/postgres/pg17/bin/archived_wal/%f %p'
hot_standby_feedback=on
sync_replication_slots=on" | tee -a sbdata/postgresql.conf

touch sbdata/standby.signal

./pg_ctl -D sbdata -l logfilesbdata start
./psql -d postgres -p 5433 -c "SELECT pg_is_in_recovery();"

./psql -d postgres

SESSION1, TXN1
BEGIN;
create table dummy1(a int);

SESSION2, TXN2
SELECT pg_log_standby_snapshot();

SESSION1, TXN1
COMMIT;

SESSION1, TXN1
BEGIN;
create table dummy2(a int);

SESSION2, TXN2
SELECT pg_log_standby_snapshot();

SESSION1, TXN1
COMMIT;

./psql -d postgres -p 5432 -c "SELECT
pg_replication_slot_advance('lrep_sync_slot', pg_current_wal_lsn());"
./psql -d postgres -p 5432 -c "INSERT INTO dummy1 VALUES(999);"

./psql -d postgres -p 5433 -c "SELECT pg_promote();"
./psql -d postgres -p 5433 -c "SELECT pg_is_in_recovery();"

./psql -d postgres -p 5433 -c "select * from
pg_logical_slot_get_changes('lrep_sync_slot', NULL, NULL);"

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: remaining sql/json patches

2024-03-31 Thread jian he
typedef struct JsonTableExecContext
{
int magic;
JsonTablePlanState *rootplanstate;
JsonTablePlanState **colexprplans;
} JsonTableExecContext;

imho, this naming is inconsistent: "state" and "plan" are mixed together.
Maybe:

typedef struct JsonTableExecContext
{
int magic;
JsonTablePlanState *rootplanstate;
JsonTablePlanState **colexprstates;
} JsonTableExecContext;


+ cxt->colexprplans = palloc(sizeof(JsonTablePlanState *) *
+   list_length(tf->colvalexprs));
+
  /* Initialize plan */
- cxt->rootplanstate = JsonTableInitPlan(cxt, rootplan, args,
+ cxt->rootplanstate = JsonTableInitPlan(cxt, (Node *) rootplan, NULL, args,
CurrentMemoryContext);
I think the comment "Initialize plan" is not quite right: here we
initialize the rootplanstate (JsonTablePlanState), and for each
(non-ordinality) column we also initialize its specific
JsonTablePlanState.

 static void JsonTableRescan(JsonTablePlanState *planstate);
@@ -331,6 +354,9 @@ static Datum JsonTableGetValue(TableFuncScanState
*state, int colnum,
Oid typid, int32 typmod, bool *isnull);
 static void JsonTableDestroyOpaque(TableFuncScanState *state);
 static bool JsonTablePlanNextRow(JsonTablePlanState *planstate);
+static bool JsonTablePlanPathNextRow(JsonTablePlanState *planstate);
+static void JsonTableRescan(JsonTablePlanState *planstate);

JsonTableRescan is declared twice?




Re: Improve eviction algorithm in ReorderBuffer

2024-03-31 Thread Masahiko Sawada
On Mon, Apr 1, 2024 at 11:26 AM Masahiko Sawada  wrote:
>
> On Fri, Mar 29, 2024 at 7:37 PM Amit Kapila  wrote:
> >
> > On Fri, Mar 29, 2024 at 12:13 PM Masahiko Sawada  
> > wrote:
> > >
> > > On Fri, Mar 29, 2024 at 2:09 PM vignesh C  wrote:
> > > >
> > > > On Tue, 26 Mar 2024 at 10:05, Masahiko Sawada  
> > > > wrote:
> > > > >
> > > > > On Thu, Mar 14, 2024 at 12:02 PM Masahiko Sawada 
> > > > >  wrote:
> > > > > >
> > > > > >
> > > > > > I've attached new version patches.
> > > > >
> > > > > Since the previous patch conflicts with the current HEAD, I've
> > > > > attached the rebased patches.
> > > >
> > > > Thanks for the updated patch.
> > > > One comment:
> > > > I felt we can mention the improvement where we update memory
> > > > accounting info at transaction level instead of per change level which
> > > > is done in ReorderBufferCleanupTXN, ReorderBufferTruncateTXN, and
> > > > ReorderBufferSerializeTXN also in the commit message:
> > >
> > > Agreed.
> > >
> > > I think the patch is in good shape. I'll push the patch with the
> > > suggestion next week, barring any objections.
> > >
> >
> > Few minor comments:
> > 1.
> > @@ -3636,6 +3801,8 @@ ReorderBufferCheckMemoryLimit(ReorderBuffer *rb)
> >   Assert(txn->nentries_mem == 0);
> >   }
> >
> > + ReorderBufferMaybeResetMaxHeap(rb);
> > +
> >
> > Can we write a comment about why this reset is required here?
> > Otherwise, the reason is not apparent.
>
> Yes, added.
>
> >
> > 2.
> > Although using max-heap to select the largest
> > + * transaction is effective when there are many transactions being decoded,
> > + * there is generally no need to use it as long as all transactions being
> > + * decoded are top-level transactions. Therefore, we use MaxConnections as 
> > the
> > + * threshold so we can prevent switching to the state unless we use
> > + * subtransactions.
> > + */
> > +#define MAX_HEAP_TXN_COUNT_THRESHOLD MaxConnections
> >
> > Isn't using max-heap equally effective in finding the largest
> > transaction whether there are top-level or top-level plus
> > subtransactions? This comment indicates it is only effective when
> > there are subtransactions.
>
> You're right. Updated the comment.
>
> I've attached the updated patches.
>

While reviewing the patches, I realized the comment of
binaryheap_allocate() should also be updated. So I've attached the
new patches.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


v12-0001-Make-binaryheap-enlargeable.patch
Description: Binary data


v12-0002-Add-functions-to-binaryheap-for-efficient-key-re.patch
Description: Binary data


v12-0003-Improve-eviction-algorithm-in-Reorderbuffer-usin.patch
Description: Binary data


Re: Introduce XID age and inactive timeout based replication slot invalidation

2024-03-31 Thread Amit Kapila
On Fri, Mar 29, 2024 at 6:17 PM Bertrand Drouvot
 wrote:
>
> On Fri, Mar 29, 2024 at 03:03:01PM +0530, Amit Kapila wrote:
> > On Fri, Mar 29, 2024 at 11:49 AM Bertrand Drouvot
> >  wrote:
> > >
> > > On Fri, Mar 29, 2024 at 09:39:31AM +0530, Amit Kapila wrote:
> > > >
> > > > Commit message states: "why we can't just update inactive_since for
> > > > synced slots on the standby with the value received from remote slot
> > > > on the primary. This is consistent with any other slot parameter i.e.
> > > > all of them are synced from the primary."
> > > >
> > > > The inactive_since is not consistent with other slot parameters which
> > > > we copy. We don't perform anything related to those other parameters
> > > > like say two_phase phase which can change that property. However, we
> > > > do acquire the slot, advance the slot (as per recent discussion [1]),
> > > > and release it. Since these operations can impact inactive_since, it
> > > > seems to me that inactive_since is not the same as other parameters.
> > > > It can have a different value than the primary. Why would anyone want
> > > > to know the value of inactive_since from primary after the standby is
> > > > promoted?
> > >
> > > I think it can be useful "before" it is promoted and in case the primary 
> > > is down.
> > >
> >
> > It is not clear to me what the user is going to do by checking the
> > inactivity time for slots when the corresponding server is down.
>
> Say a failover needs to be done, then it could be useful to know for which
> slots the activity needs to be resumed (thinking about an external logical
> decoding plugin, not about pub/sub here). If one sees an inactive slot
> (inactive for long "enough"), then they can start to reason about what to
> do with it.
>
> > I thought the idea was to check such slots and see if they need to be
> > dropped or enabled again to avoid excessive disk usage, etc.
>
> Yeah that's the case but it does not mean inactive_since can't be useful in 
> other
> ways.
>
> Also, say the slot has been invalidated on the primary (due to inactivity
> timeout), the primary is down, and there is a failover. By keeping the
> inactive_since from the primary, one could know when the inactivity that
> led to the timeout started.
>

So, this means that at promotion we won't set inactive_since to the
current time, which is not what the currently proposed patch does.
Moreover, doing the invalidation on the promoted standby based on the
inactive_since of the primary node sounds debatable because the
inactive_timeout could be different on the new node (the promoted
standby).

> Again, more concerned about external logical decoding plugin than pub/sub 
> here.
>
> > > I agree that tracking the activity time of a synced slot can be useful, 
> > > why
> > > not creating a dedicated field for that purpose (and keep inactive_since a
> > > perfect "copy" of the primary)?
> > >
> >
> > We can have a separate field for this but not sure if it is worth it.
>
> OTOH I'm not sure that erasing this information from the primary is useful. I
> think that 2 fields would be the best option and would be less subject to
> misinterpretation.
>
> > > > Now, the other concern is that calling GetCurrentTimestamp()
> > > > could be costly when the values for the slot are not going to be
> > > > updated but if that happens we can optimize such that before acquiring
> > > > the slot we can have some minimal pre-checks to ensure whether we need
> > > > to update the slot or not.
> > >
> > > Right, but for a very active slot it is likely that we call 
> > > GetCurrentTimestamp()
> > > during almost each sync cycle.
> > >
> >
> > True, but if we have to save a slot to disk each time to persist the
> > changes (for an active slot) then probably GetCurrentTimestamp()
> > shouldn't be costly enough to matter.
>
> Right, persisting the changes to disk would be even more costly.
>

The point I was making is that currently, after copying the remote
node's values, we always persist the slots to disk, so the cost of
getting the current time shouldn't matter much. Now, if the values
don't change, there is probably some cost, but in most cases (active
slots) the values will always change. Also, if all the slots are
inactive, then we will slow down the speed of sync. We also need to
consider whether we want to copy the value of inactive_since from the
primary and, if that is the only value that changed, whether we should
persist the slot or not.

-- 
With Regards,
Amit Kapila.




Re: Introduce XID age and inactive timeout based replication slot invalidation

2024-03-31 Thread Bharath Rupireddy
On Fri, Mar 29, 2024 at 9:39 AM Amit Kapila  wrote:
>
> Commit message states: "why we can't just update inactive_since for
> synced slots on the standby with the value received from remote slot
> on the primary. This is consistent with any other slot parameter i.e.
> all of them are synced from the primary."
>
> The inactive_since is not consistent with other slot parameters which
> we copy. We don't perform anything related to those other parameters
> like say two_phase phase which can change that property. However, we
> do acquire the slot, advance the slot (as per recent discussion [1]),
> and release it. Since these operations can impact inactive_since, it
> seems to me that inactive_since is not the same as other parameters.
> It can have a different value than the primary. Why would anyone want
> to know the value of inactive_since from primary after the standby is
> promoted?

After thinking about it for a while now, it feels to me that the
synced slots (slots on the standby that are being synced from the
primary) can have their own inactive_since value. Fundamentally,
inactive_since is set to 0 when a slot is acquired and set to the
current time when the slot is released, no matter who acquires and
releases it - be it walsenders for replication, or backends for slot
advance, or backends for slot sync using pg_sync_replication_slots, or
backends for other slot functions, or the background sync worker.
Remember the earlier patch was updating inactive_since just for
walsenders, but then the suggestion was to update it unconditionally -
https://www.postgresql.org/message-id/CAJpy0uD64X%3D2ENmbHaRiWTKeQawr-rbGoy_GdhQQLVXzUSKTMg%40mail.gmail.com.
Whoever syncs the slot *actually* acquires the slot, i.e. makes it
theirs, syncs it from the primary, and releases it. IMO, no
differentiation is to be made for synced slots.

There was a suggestion on using the inactive_since of the synced slot
on the standby to know the inactivity of the slot on the primary. If
one wants to do that, they had better look at/monitor the primary's
slot info/logs/pg_replication_slots/whatever. I really don't see a
point in having two different meanings for a single property of a
replication slot: inactive_since for a regular slot tells since when
the slot has been inactive, while for a synced slot it would tell
since when the corresponding remote slot has been inactive. I think
this will confuse users for sure.
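
For example, monitoring this on the primary could be as simple as the
following (a sketch, using the inactive_since column this work adds to
pg_replication_slots):

SELECT slot_name, active, inactive_since
FROM pg_replication_slots
ORDER BY inactive_since NULLS FIRST;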

Also, if inactive_since changes so frequently on the primary while
none of the other parameters change, and we copy inactive_since to the
synced slots, then the standby will just be doing *sync* work (marking
the slots dirty and saving them to disk) to update inactive_since. I
think this is clearly unnecessary behaviour.

Coming to a future patch for inactive timeout based slot invalidation,
we can either allow invalidation without any differentiation for
synced slots or restrict invalidation to avoid more sync work. For
instance, if the inactive timeout is kept low on the standby, the sync
worker will be doing more work as it drops and recreates a slot
repeatedly if it keeps getting invalidated. Another thing is that the
standby takes independent invalidation decisions for synced slots.
AFAICS, invalidation due to wal_removal is the only reason (out
of all available invalidation reasons) for a synced slot to get
invalidated independently of the primary. Check
https://www.postgresql.org/message-id/CAA4eK1JXBwTaDRD_%3D8t6UB1fhRNjC1C%2BgH4YdDxj_9U6djLnXw%40mail.gmail.com
for the suggestion that we'd better not differentiate invalidation
decisions for synced slots.

The assumption of letting synced slots have their own inactive_since
not only simplifies the code, but also looks less confusing and more
meaningful to the user. The only code that we put in on top of the
committed code is to use InRecovery in place of RecoveryInProgress()
in RestoreSlotFromDisk() to fix the issue raised by Shveta upthread.

> Now, the other concern is that calling GetCurrentTimestamp()
> could be costly when the values for the slot are not going to be
> updated but if that happens we can optimize such that before acquiring
> the slot we can have some minimal pre-checks to ensure whether we need
> to update the slot or not.
>
> [1] - 
> https://www.postgresql.org/message-id/OS0PR01MB571615D35F486080616CA841943A2%40OS0PR01MB5716.jpnprd01.prod.outlook.com

A quick test with a function to measure the cost of
GetCurrentTimestamp [1] on my Ubuntu dev system (an AWS EC2 c5.4xlarge
instance), gives me [2]. It took 0.388 ms, 2.269 ms, 21.144 ms,
209.333 ms, 2091.174 ms, 20908.942 ms for 10K, 100K, 1million,
10million, 100million, 1billion times respectively. Costs might be
different on various systems with different OS, but it gives us a
rough idea.

If we are too concerned about the cost of GetCurrentTimestamp(),
a possible approach is to just not set inactive_since for slots being
synced on the standby. Just let the first acquisition and release
after the promotion do that job. We 

Re: [PoC] Improve dead tuple storage for lazy vacuum

2024-03-31 Thread Masahiko Sawada
On Fri, Mar 29, 2024 at 4:21 PM John Naylor  wrote:
>
> On Thu, Mar 28, 2024 at 12:55 PM Masahiko Sawada  
> wrote:
> > I think the patch is in good shape. Do you have other comments or
> > suggestions, John?
>
> --- a/doc/src/sgml/config.sgml
> +++ b/doc/src/sgml/config.sgml
> @@ -1918,11 +1918,6 @@ include_dir 'conf.d'
>  too high.  It may be useful to control for this by separately
>  setting .
> 
> -   
> -Note that for the collection of dead tuple identifiers,
> -VACUUM is only able to utilize up to a maximum of
> -1GB of memory.
> -   
>
>   
>
> This is mentioned twice for two different GUCs -- need to remove the
> other one, too.

Good catch, removed.

> Other than that, I just have minor nits:
>
> - * The major space usage for vacuuming is storage for the array of dead TIDs
> + * The major space usage for vacuuming is TID store, a storage for dead TIDs
>
> I think I've helped edit this sentence before, but I still don't quite
> like it. I'm thinking now "is storage for the dead tuple IDs".
>
> - * set upper bounds on the number of TIDs we can keep track of at once.
> + * set upper bounds on the maximum memory that can be used for keeping track
> + * of dead TIDs at once.
>
> I think "maximum" is redundant with "upper bounds".

Fixed.

>
> I also feel the commit message needs more "meat" -- we need to clearly
> narrate the features and benefits. I've attached how I would write it,
> but feel free to use what you like to match your taste.

Well, that's much better than mine.

>
> I've marked it Ready for Committer.

Thank you! I've attached the patch that I'm going to push tomorrow.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


v82-0001-Use-TidStore-for-dead-tuple-TIDs-storage-during-.patch
Description: Binary data


Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2024-03-31 Thread Masahiko Sawada
On Fri, Mar 29, 2024 at 11:54 AM torikoshia  wrote:
>
> On 2024-03-28 21:54, Masahiko Sawada wrote:
> > On Thu, Mar 28, 2024 at 9:38 PM torikoshia 
> > wrote:
> >>
> >> On 2024-03-28 10:20, Masahiko Sawada wrote:
> >> > Hi,
> >> >
> >> > On Thu, Jan 18, 2024 at 5:33 PM Masahiko Sawada 
> >> > wrote:
> >> >>
> >> >> On Thu, Jan 18, 2024 at 4:59 PM Alexander Korotkov
> >> >>  wrote:
> >> >> >
> >> >> > On Thu, Jan 18, 2024 at 4:16 AM torikoshia 
> >> >> >  wrote:
> >> >> > > On 2024-01-18 10:10, jian he wrote:
> >> >> > > > On Thu, Jan 18, 2024 at 8:57 AM Masahiko Sawada 
> >> >> > > > 
> >> >> > > > wrote:
> >> >> > > >> On Thu, Jan 18, 2024 at 6:38 AM Tom Lane  
> >> >> > > >> wrote:
> >> >> > > >> > Kyotaro-san's suggestion isn't bad, though I might shorten it 
> >> >> > > >> > to
> >> >> > > >> > error_action {error|ignore|log} (or perhaps "stop" instead of 
> >> >> > > >> > "error")?
> >> >> > > >> > You will need a separate parameter anyway to specify the 
> >> >> > > >> > destination
> >> >> > > >> > of "log", unless "none" became an illegal table name when I 
> >> >> > > >> > wasn't
> >> >> > > >> > looking.  I don't buy that one parameter that has some special 
> >> >> > > >> > values
> >> >> > > >> > while other values could be names will be a good design.  
> >> >> > > >> > Moreover,
> >> >> > > >> > what if we want to support (say) log-to-file along with 
> >> >> > > >> > log-to-table?
> >> >> > > >> > Trying to distinguish a file name from a table name without 
> >> >> > > >> > any other
> >> >> > > >> > context seems impossible.
> >> >> > > >>
> >> >> > > >> I've been thinking we can add more values to this option to log 
> >> >> > > >> errors
> >> >> > > >> not only to the server logs but also to the error table (not sure
> >> >> > > >> details but I imagined an error table is created for each table 
> >> >> > > >> on
> >> >> > > >> error), without an additional option for the destination name. 
> >> >> > > >> The
> >> >> > > >> values would be like error_action 
> >> >> > > >> {error|ignore|save-logs|save-table}.
> >> >> > > >>
> >> >> > > >
> >> >> > > > another idea:
> >> >> > > > on_error {error|ignore|other_future_option}
> >> >> > > > if not specified then by default ERROR.
> >> >> > > > You can also specify ERROR or IGNORE for now.
> >> >> > > >
> >> >> > > > I agree, the parameter "error_action" is better than "location".
> >> >> > >
> >> >> > > I'm not sure whether error_action or on_error is better, but either 
> >> >> > > way
> >> >> > > "error_action error" and "on_error error" seems a bit odd to me.
> >> >> > > I feel "stop" is better for both cases as Tom suggested.
> >> >> >
> >> >> > OK.  What about this?
> >> >> > on_error {stop|ignore|other_future_option}
> >> >> > where other_future_option might be compound like "file 'copy.log'" or
> >> >> > "table 'copy_log'".
> >> >>
> >> >> +1
> >> >>
> >> >
> >> > I realized that the ON_ERROR syntax synopsis in the documentation is not
> >> > correct. The option doesn't require the value to be quoted and the
> >> > value can be omitted. The attached patch fixes it.
> >> >
> >> > Regards,
> >>
> >> Thanks!
> >>
> >> Attached patch fixes the doc, but I'm wondering perhaps it might be
> >> better to modify the codes to prohibit abbreviation of the value.
> >>
> >> When seeing the query which abbreviates ON_ERROR value, I feel it's
> >> not
> >> obvious what happens compared to other options which tolerates
> >> abbreviation of the value such as FREEZE or HEADER.
> >>
> >>COPY t1 FROM stdin WITH (ON_ERROR);
> >>
> >> What do you think?
> >
> > Indeed. Looking at options of other commands such as VACUUM and
> > EXPLAIN, I can see that we can omit a boolean value, but non-boolean
> > parameters require its value. The HEADER option is not a pure boolean
> > parameter but we can omit the value. It seems to be for backward
> > compatibility; it used to be a boolean parameter. I agree that the
> > above example would confuse users.
> >
> > Regards,
>
> Thanks for your comment!
>
> Attached a patch which modifies the code to prohibit omission of its
> value.
>
> I was a little unsure about adding a regression test for this, but I
> have not added it since other COPY options don't test the omission of
> their values.

Should we probably change the doc as well, since the ON_ERROR value
doesn't necessarily need to be single-quoted?
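
For example, both of the following forms are accepted (a quick sketch,
reusing the table name from the example upthread):

COPY t1 FROM stdin WITH (ON_ERROR 'ignore');
COPY t1 FROM stdin WITH (ON_ERROR ignore);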

The rest looks good to me.

Alexander, what do you think about this change as you're the committer
of this feature?

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Improve eviction algorithm in ReorderBuffer

2024-03-31 Thread Masahiko Sawada
On Fri, Mar 29, 2024 at 8:48 PM Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear Sawada-san,
>
> > Agreed.
> >
> > I think the patch is in good shape. I'll push the patch with the
> > suggestion next week, barring any objections.
>
> Thanks for working on this. Agreed it is committable.
> Few minor comments:

Thank you for the comments!

>
> ```
> + * Either txn or change must be non-NULL at least. We update the memory
> + * counter of txn if it's non-NULL, otherwise change->txn.
> ```
>
> IIUC no one checks the restriction. Should we add Assert() for it, e.g.:
> Assert(txn || change)?

Agreed to add it.

>
> ```
> +/* make sure enough space for a new node */
> ...
> +/* make sure enough space for a new node */
> ```
>
> Should be started with upper case?

I don't think we need to change it. There are other single-line
comments in the same file that start with lowercase.

I've just submitted the updated patches [1].

Regards,

[1] 
https://www.postgresql.org/message-id/CAD21AoA6%3D%2BtL%3DbtB_s9N%2BcZK7tKz1W%3DPQyNq72nzjUcdyE%2BwZw%40mail.gmail.com

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Improve eviction algorithm in ReorderBuffer

2024-03-31 Thread Masahiko Sawada
On Fri, Mar 29, 2024 at 7:37 PM Amit Kapila  wrote:
>
> On Fri, Mar 29, 2024 at 12:13 PM Masahiko Sawada  
> wrote:
> >
> > On Fri, Mar 29, 2024 at 2:09 PM vignesh C  wrote:
> > >
> > > On Tue, 26 Mar 2024 at 10:05, Masahiko Sawada  
> > > wrote:
> > > >
> > > > On Thu, Mar 14, 2024 at 12:02 PM Masahiko Sawada 
> > > >  wrote:
> > > > >
> > > > >
> > > > > I've attached new version patches.
> > > >
> > > > Since the previous patch conflicts with the current HEAD, I've
> > > > attached the rebased patches.
> > >
> > > Thanks for the updated patch.
> > > One comment:
> > > I felt we can mention the improvement where we update memory
> > > accounting info at transaction level instead of per change level which
> > > is done in ReorderBufferCleanupTXN, ReorderBufferTruncateTXN, and
> > > ReorderBufferSerializeTXN also in the commit message:
> >
> > Agreed.
> >
> > I think the patch is in good shape. I'll push the patch with the
> > suggestion next week, barring any objections.
> >
>
> Few minor comments:
> 1.
> @@ -3636,6 +3801,8 @@ ReorderBufferCheckMemoryLimit(ReorderBuffer *rb)
>   Assert(txn->nentries_mem == 0);
>   }
>
> + ReorderBufferMaybeResetMaxHeap(rb);
> +
>
> Can we write a comment about why this reset is required here?
> Otherwise, the reason is not apparent.

Yes, added.

>
> 2.
> Although using max-heap to select the largest
> + * transaction is effective when there are many transactions being decoded,
> + * there is generally no need to use it as long as all transactions being
> + * decoded are top-level transactions. Therefore, we use MaxConnections as 
> the
> + * threshold so we can prevent switching to the state unless we use
> + * subtransactions.
> + */
> +#define MAX_HEAP_TXN_COUNT_THRESHOLD MaxConnections
>
> Isn't using max-heap equally effective in finding the largest
> transaction whether there are top-level or top-level plus
> subtransactions? This comment indicates it is only effective when
> there are subtransactions.

You're right. Updated the comment.

I've attached the updated patches.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


v11-0002-Add-functions-to-binaryheap-for-efficient-key-re.patch
Description: Binary data


v11-0001-Make-binaryheap-enlargeable.patch
Description: Binary data


v11-0003-Improve-eviction-algorithm-in-Reorderbuffer-usin.patch
Description: Binary data


Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-03-31 Thread Bharath Rupireddy
On Mon, Apr 1, 2024 at 5:54 AM Alexander Korotkov  wrote:
>
> > 9. To me the following query blocks even though I didn't mention timeout.
> > CALL pg_wal_replay_wait('0/fff');
>
> If your primary server is freshly initialized, you need to do quite a lot of
> data modifications to reach this LSN.

Right, but why does pg_wal_replay_wait block without a timeout? Shouldn't
it return an error saying it can't reach the target LSN?
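
For reference, with a timeout the call is at least bounded (a sketch
assuming the procedure signature used elsewhere in this thread; the LSN is
arbitrary):

CALL pg_wal_replay_wait('0/fff', 1000);  -- waits at most 1 second, then errors out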

Did you forget to attach the new patch?

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Extract numeric filed in JSONB more effectively

2024-03-31 Thread Andy Fan

Here is the latest version; nothing changed besides the rebase onto the
latest master. The three most recent questions should be addressed.

- The error message compatibility issue [1] and Peter's answer at [2].
- Peter's new question at [2] and my answer at [3].

Any effort to move this patch ahead is welcome, and thanks to all the
people who provided high-quality feedback so far, especially Chapman!

[1] https://www.postgresql.org/message-id/87r0hmvuvr@163.com
[2]
https://www.postgresql.org/message-id/8102ff5b-b156-409e-a48f-e53e63a39b36%40eisentraut.org
[3] https://www.postgresql.org/message-id/8734t6c5rh.fsf%40163.com
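
For readers skimming the thread, this is the kind of expression the
attached patch speeds up (a minimal sketch; the table and key names are
made up for illustration):

CREATE TABLE tb (a jsonb);
INSERT INTO tb VALUES ('{"price": 42.5}');
-- previously: JsonbValue -> Jsonb -> numeric via the cast machinery;
-- with the patch the numeric value is extracted directly.
SELECT (a -> 'price')::numeric FROM tb;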

-- 
Best Regards
Andy Fan

From fb38be5addb93d7c0b8c1a3e8376751c9b3be5f5 Mon Sep 17 00:00:00 2001
From: "yizhi.fzh" 
Date: Mon, 1 Apr 2024 09:36:08 +0800
Subject: [PATCH v17 1/1] Improve the performance of Jsonb numeric/bool 
 extraction.

A JSONB object uses a numeric format that is binary-compatible with the
numeric data type in SQL. However, in the past, extracting a numeric value
from a JSONB field still needed to find the corresponding JsonbValue first,
then convert the JsonbValue to Jsonb, and finally use the cast system to
convert the Jsonb to a Numeric data type. This approach was very
inefficient in terms of performance.

With the current patch, the JsonbValue is converted to the numeric data
type directly.  This is done by a planner support function which detects
the above case and simplifies the expression.  Because the boolean type and
the numeric type share certain similarities in their attributes, we have
implemented the same optimization approach for both.  In the ideal test
case, the performance can be 2x what it was before.

The optimized functions and operators include:
1. jsonb_object_field / ->
2. jsonb_array_element / ->
3. jsonb_extract_path / #>
4. jsonb_path_query
5. jsonb_path_query_first
---
 src/backend/utils/adt/jsonb.c | 206 ++
 src/backend/utils/adt/jsonbsubs.c |   4 +-
 src/backend/utils/adt/jsonfuncs.c | 123 ++-
 src/backend/utils/adt/jsonpath_exec.c |  32 +++-
 src/include/catalog/pg_proc.dat   |  46 +-
 src/include/utils/jsonb.h |  11 +-
 src/test/regress/expected/jsonb.out   | 112 +-
 src/test/regress/sql/jsonb.sql|  66 -
 src/tools/pgindent/typedefs.list  |   1 +
 9 files changed, 542 insertions(+), 59 deletions(-)

diff --git a/src/backend/utils/adt/jsonb.c b/src/backend/utils/adt/jsonb.c
index a5e48744ac..6e93b34fd6 100644
--- a/src/backend/utils/adt/jsonb.c
+++ b/src/backend/utils/adt/jsonb.c
@@ -16,9 +16,15 @@
 #include "catalog/pg_proc.h"
 #include "catalog/pg_type.h"
 #include "funcapi.h"
+#include "nodes/makefuncs.h"
+#include "nodes/supportnodes.h"
+#include "parser/parse_coerce.h"
 #include "libpq/pqformat.h"
 #include "miscadmin.h"
 #include "utils/builtins.h"
+#include "utils/date.h"
+#include "utils/datetime.h"
+#include "utils/fmgroids.h"
 #include "utils/json.h"
 #include "utils/jsonb.h"
 #include "utils/jsonfuncs.h"
@@ -2035,6 +2041,206 @@ cannotCastJsonbValue(enum jbvType type, const char *sqltype)
 	elog(ERROR, "unknown jsonb type: %d", (int) type);
 }
 
+
+/*
+ * jsonb_cast_support()
+ *
+ * Planner support function for extracting numeric or bool data type more
+ * effectively. After finding out the corresponding JsonbValue, instead of
+ * casting it to Jsonb as an intermediate type, we convert it to the desired
+ * data type directly.
+ */
+Datum
+jsonb_cast_support(PG_FUNCTION_ARGS)
+{
+	Node	   *rawreq = (Node *) PG_GETARG_POINTER(0);
+
+	if (IsA(rawreq, SupportRequestSimplify))
+	{
+		SupportRequestSimplify *req = (SupportRequestSimplify *) rawreq;
+		FuncExpr   *fexpr = req->fcall;
+		FuncExpr   *jsonb_start_func = NULL,
+   *jsonb_finish_func = NULL,
+   *final_func = NULL;
+		Node	   *input;
+		Oid			new_func_id = InvalidOid;
+		List	   *args;
+		Oid			input_func_id,
+	collid,
+	inputcollid;
+		bool		retset = false,
+	variadic = false;
+
+		Assert(list_length(fexpr->args) == 1);
+		input = (Node *) linitial(fexpr->args);
+
+		if (IsA(input, OpExpr))
+		{
+			OpExpr	   *opExpr = castNode(OpExpr, input);
+
+			input_func_id = opExpr->opfuncid;
+			collid = opExpr->opcollid;
+			inputcollid = opExpr->inputcollid;
+			args = opExpr->args;
+		}
+		else if (IsA(input, FuncExpr))
+		{
+			FuncExpr   *funcExpr = castNode(FuncExpr, input);
+
+			input_func_id = funcExpr->funcid;
+			collid = funcExpr->funccollid;
+			inputcollid = funcExpr->inputcollid;
+			args = funcExpr->args;
+		}
+		else
+			/* not the desired pattern. */
+			PG_RETURN_POINTER(NULL);
+
+		/* build a function to return the JsonbValue directly. */
+		switch (input_func_id)
+		{
+			case F_JSONB_OBJECT_FIELD:
+new_func_id = F_JSONB_OBJECT_FIELD_START;
+break;
+			case F_JSONB_ARRAY_ELEMENT:
+new_func_id = F_JSONB_ARRAY_ELEMENT_START;
+break;
+			case F_JSONB_EXTRACT_PATH:
+new_func_id = F_JSONB_EXTRACT_PATH_START;
+variadic = true;
+

Re: Statistics Import and Export

2024-03-31 Thread Corey Huinker
>
> IIRC, "variadic any" requires having at least one variadic parameter.
> But that seems fine --- what would be the point, or even the
> semantics, of calling pg_set_attribute_stats with no data fields?
>

If my pg_dump run emitted a bunch of stats that could never be imported,
I'd want to know. With silent failures, I don't.



> Perhaps we could
> invent a new backend function that extracts the actual element type
> of a non-null anyarray argument.
>

A backend function that we can't guarantee exists on the source system. :(


> Another way we could get to no-coercions is to stick with your
> signature but declare the relevant parameters as anyarray instead of
> text.  I still think though that we'd be better off to leave the
> parameter matching to runtime, so that we-don't-recognize-that-field
> can be a warning not an error.
>

I'm a bit confused here. AFAIK we can't construct an anyarray in SQL:

# select '{1,2,3}'::anyarray;
ERROR:  cannot accept a value of type anyarray


> I think you missed my point: you're doing that inefficiently,
> and maybe even with race conditions.  Use the relcache's copy
> of the pg_class row.
>

Roger Wilco.


> Well, I'm here to debate it if you want, but I'll just note that *one*
> error will be enough to abort a pg_upgrade entirely, and most users
> these days get scared by errors during manual dump/restore too.  So we
> had better not be throwing errors except for cases that we don't think
> pg_dump could ever emit.
>

That's pretty persuasive. It also means that we need to trap for error in
the array_in() calls, as that function does not yet have a _safe() mode.


Re: Popcount optimization using AVX512

2024-03-31 Thread Nathan Bossart
On Sat, Mar 30, 2024 at 03:03:29PM -0500, Nathan Bossart wrote:
> My current plan is to add some new tests for
> pg_popcount() with many bytes, and then I'll give it a few more days for
> any additional feedback before committing.

Here is a v18 with a couple of new tests.  Otherwise, it is the same as
v17.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
From 86a571721ed3ed4ca7e04134b9541fc3ac43b9f1 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Wed, 27 Mar 2024 16:39:24 -0500
Subject: [PATCH v18 1/1] AVX512 popcount support

---
 config/c-compiler.m4 |  58 ++
 configure| 252 +++
 configure.ac |  51 ++
 meson.build  |  87 +
 src/Makefile.global.in   |   5 +
 src/include/pg_config.h.in   |  12 ++
 src/include/port/pg_bitutils.h   |  15 ++
 src/makefiles/meson.build|   4 +-
 src/port/Makefile|  11 ++
 src/port/meson.build |   6 +-
 src/port/pg_bitutils.c   |  56 +++---
 src/port/pg_popcount_avx512.c|  49 ++
 src/port/pg_popcount_avx512_choose.c |  71 
 src/test/regress/expected/bit.out|  24 +++
 src/test/regress/sql/bit.sql |   4 +
 15 files changed, 666 insertions(+), 39 deletions(-)
 create mode 100644 src/port/pg_popcount_avx512.c
 create mode 100644 src/port/pg_popcount_avx512_choose.c

diff --git a/config/c-compiler.m4 b/config/c-compiler.m4
index 3268a780bb..5fb60775ca 100644
--- a/config/c-compiler.m4
+++ b/config/c-compiler.m4
@@ -694,3 +694,61 @@ if test x"$Ac_cachevar" = x"yes"; then
 fi
 undefine([Ac_cachevar])dnl
 ])# PGAC_LOONGARCH_CRC32C_INTRINSICS
+
+# PGAC_XSAVE_INTRINSICS
+# -
+# Check if the compiler supports the XSAVE instructions using the _xgetbv
+# intrinsic function.
+#
+# An optional compiler flag can be passed as argument (e.g., -mxsave).  If the
+# intrinsic is supported, sets pgac_xsave_intrinsics and CFLAGS_XSAVE.
+AC_DEFUN([PGAC_XSAVE_INTRINSICS],
+[define([Ac_cachevar], [AS_TR_SH([pgac_cv_xsave_intrinsics_$1])])dnl
+AC_CACHE_CHECK([for _xgetbv with CFLAGS=$1], [Ac_cachevar],
+[pgac_save_CFLAGS=$CFLAGS
+CFLAGS="$pgac_save_CFLAGS $1"
+AC_LINK_IFELSE([AC_LANG_PROGRAM([#include <immintrin.h>],
+  [return _xgetbv(0) & 0xe0;])],
+  [Ac_cachevar=yes],
+  [Ac_cachevar=no])
+CFLAGS="$pgac_save_CFLAGS"])
+if test x"$Ac_cachevar" = x"yes"; then
+  CFLAGS_XSAVE="$1"
+  pgac_xsave_intrinsics=yes
+fi
+undefine([Ac_cachevar])dnl
+])# PGAC_XSAVE_INTRINSICS
+
+# PGAC_AVX512_POPCNT_INTRINSICS
+# -
+# Check if the compiler supports the AVX512 POPCNT instructions using the
+# _mm512_setzero_si512, _mm512_loadu_si512, _mm512_popcnt_epi64,
+# _mm512_add_epi64, and _mm512_reduce_add_epi64 intrinsic functions.
+#
+# An optional compiler flag can be passed as argument
+# (e.g., -mavx512vpopcntdq).  If the intrinsics are supported, sets
+# pgac_avx512_popcnt_intrinsics and CFLAGS_POPCNT.
+AC_DEFUN([PGAC_AVX512_POPCNT_INTRINSICS],
+[define([Ac_cachevar], [AS_TR_SH([pgac_cv_avx512_popcnt_intrinsics_$1])])dnl
+AC_CACHE_CHECK([for _mm512_popcnt_epi64 with CFLAGS=$1], [Ac_cachevar],
+[pgac_save_CFLAGS=$CFLAGS
+CFLAGS="$pgac_save_CFLAGS $1"
+AC_LINK_IFELSE([AC_LANG_PROGRAM([#include <immintrin.h>],
+  [const char buf@<:@sizeof(__m512i)@:>@;
+   PG_INT64_TYPE popcnt = 0;
+   __m512i accum = _mm512_setzero_si512();
+   const __m512i val = _mm512_loadu_si512((const __m512i *) buf);
+   const __m512i cnt = _mm512_popcnt_epi64(val);
+   accum = _mm512_add_epi64(accum, cnt);
+   popcnt = _mm512_reduce_add_epi64(accum);
+   /* return computed value, to prevent the above being optimized away */
+   return popcnt == 0;])],
+  [Ac_cachevar=yes],
+  [Ac_cachevar=no])
+CFLAGS="$pgac_save_CFLAGS"])
+if test x"$Ac_cachevar" = x"yes"; then
+  CFLAGS_POPCNT="$1"
+  pgac_avx512_popcnt_intrinsics=yes
+fi
+undefine([Ac_cachevar])dnl
+])# PGAC_AVX512_POPCNT_INTRINSICS
diff --git a/configure b/configure
index 36feeafbb2..b48ed7f271 100755
--- a/configure
+++ b/configure
@@ -647,6 +647,9 @@ MSGFMT_FLAGS
 MSGFMT
 PG_CRC32C_OBJS
 CFLAGS_CRC
+PG_POPCNT_OBJS
+CFLAGS_POPCNT
+CFLAGS_XSAVE
 LIBOBJS
 OPENSSL
 ZSTD
@@ -17404,6 +17407,40 @@ $as_echo "#define HAVE__GET_CPUID 1" >>confdefs.h
 
 fi
 
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for __get_cpuid_count" >&5
+$as_echo_n "checking for __get_cpuid_count... " >&6; }
+if ${pgac_cv__get_cpuid_count+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#include <cpuid.h>
+int
+main ()
+{
+unsigned int exx[4] = {0, 0, 0, 0};
+  __get_cpuid_count(7, 0, &exx[0], &exx[1], &exx[2], &exx[3]);
+
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_link "$LINENO"; then :
+  pgac_cv__get_cpuid_count="yes"
+else
+  pgac_cv__get_cpuid_count="no"
+fi
+rm -f core conftest.err conftest.$ac_objext \
+conftest$ac_exeext conftest.$ac_ext
+fi
+{ $as_echo 

Re: Add new error_action COPY ON_ERROR "log"

2024-03-31 Thread Masahiko Sawada
On Sat, Mar 30, 2024 at 11:05 PM Bharath Rupireddy
 wrote:
>
> On Thu, Mar 28, 2024 at 6:31 PM Masahiko Sawada  wrote:
> >
> > That is,
> > since the LOG_VERBOSITY option is an enum parameter, it might make
> > more sense to require the value, instead of making the value optional.
> > For example, the following command could not be obvious for users:
> >
> > COPY test FROM stdin (ON_ERROR ignore, LOG_VERBOSITY);
>
> Agreed. Please see the attached v14 patch.

Thank you for updating the patch!

>  The LOG_VERBOSITY now needs
> a value to be specified. Note that I've not added any test for this
> case as there seemed to be no such tests so far generating "ERROR:
> <> requires a parameter". I don't mind adding one for
> LOG_VERBOSITY though.

+1

One minor point:

 ENCODING 'encoding_name'
+LOG_VERBOSITY [ mode ]
 

'[' and ']' are not necessary because the value is no longer optional.
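
So a command now has to spell out the mode explicitly, for example (a
sketch; 'verbose' being one of the modes accepted by the patch):

COPY test FROM stdin (ON_ERROR ignore, LOG_VERBOSITY verbose);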

I've attached the updated patch. I'll push it, barring any objections.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


v15-0001-Add-new-COPY-option-LOG_VERBOSITY.patch
Description: Binary data


RE: Synchronizing slots from primary to standby

2024-03-31 Thread Zhijie Hou (Fujitsu)
On Friday, March 29, 2024 2:50 PM Amit Kapila  wrote:
> 
> On Fri, Mar 29, 2024 at 6:36 AM Zhijie Hou (Fujitsu) 
> wrote:
> >
> >
> > Attach a new version patch which fixed an un-initialized variable
> > issue and added some comments.
> >
> 
> The other approach to fix this issue could be that the slotsync worker get the
> serialized snapshot using pg_read_binary_file() corresponding to restart_lsn
> and writes those at standby. But there are cases when we won't have such a 
> file
> like (a) when we initially create the slot and reach the consistent_point, or 
> (b)
> also by the time the slotsync worker starts to read the remote snapshot file, 
> the
> snapshot file could have been removed by the checkpointer on the primary (if
> the restart_lsn of the remote has been advanced in this window). So, in such
> cases, we anyway need to advance the slot. I think these could be 
> optimizations
> that we could do in the future.
> 
> Few comments:

Thanks for the comments.

> =
> 1.
> - if (slot->data.database != MyDatabaseId)
> + if (slot->data.database != MyDatabaseId && !fast_forward)
>   ereport(ERROR,
>   (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>   errmsg("replication slot \"%s\" was not created in this database", @@ -526,7
> +527,7 @@ CreateDecodingContext(XLogRecPtr start_lsn,
>   * Do not allow consumption of a "synchronized" slot until the standby
>   * gets promoted.
>   */
> - if (RecoveryInProgress() && slot->data.synced)
> + if (RecoveryInProgress() && slot->data.synced &&
> + !IsSyncingReplicationSlots())
> 
> 
> Add comments at both of the above places.

Added.

> 
> 
> 2.
> +extern XLogRecPtr pg_logical_replication_slot_advance(XLogRecPtr moveto,
> +   bool *found_consistent_point);
> +
> 
> This API looks a bit awkward as the functionality doesn't match the name. How
> about having a function with name
> LogicalSlotAdvanceAndCheckReadynessForDecoding(moveto,
> ready_for_decoding) with the same functionality as your patch has for
> pg_logical_replication_slot_advance() and then invoke it both from
> pg_logical_replication_slot_advance and slotsync.c. The function name is too
> big, we can think of a shorter name. Any ideas?

How about LogicalSlotAdvanceAndCheckDecodingState()? Or just
LogicalSlotAdvanceAndCheckDecoding()? (I used the suggested
LogicalSlotAdvanceAndCheckReadynessForDecoding in this version; it can be
renamed in the next version if we agree.)

Attached is the V3 patch which addressed the above comments and Kuroda-san's
comments [1]. I also adjusted the tap-test to only check the confirmed_flush_lsn
after syncing, as the restart_lsn could be different from the remote one due to
the new slot_advance() call. I am also testing some optimization ideas locally
and will share them if ready.

[1] 
https://www.postgresql.org/message-id/TYCPR01MB1207757BB2A32B6815CE1CCE7F53A2%40TYCPR01MB12077.jpnprd01.prod.outlook.com

Best Regards,
Hou zj


v3-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch
Description:  v3-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch


Cant link libpq with a pg extension using cmake

2024-03-31 Thread Tony Wayne
I am getting linking issues when trying to link libpq with my pg extension.
I am using pg's libpq (libpq is built along with pg), so I did this in
my extension's CMakeLists.txt:

file (GLOB storage_SRC CONFIGURE_DEPENDS "*.cpp" )
add_library(storage OBJECT ${storage_SRC})
target_link_libraries(storage PRIVATE pq)

By the way, I included all required include dirs in my top-level CMakeLists.txt.

Then I got "undefined symbol: pqsecure_write", but I don't know why. If I give
the pg_config --libdir/libpq.a path in target_link_libraries instead of the lib
name, then it links, but the walreceiver process can't start and I get: FATAL:
could not connect to the primary server: libpq is incorrectly linked to
backend functions


Re: Statistics Import and Export

2024-03-31 Thread Tom Lane
Corey Huinker  writes:
>> and I really think that we need to provide
>> the source server's major version number --- maybe we will never
>> need that, but if we do and we don't have it we will be sad.

> The JSON had it, and I never did use it. Not against having it again.

Well, you don't need it now seeing that the definition of pg_stats
columns hasn't changed in the past ... but there's no guarantee we
won't want to change them in the future.

>> So this leads me to suggest that we'd be best off with a VARIADIC
>> ANY signature, where the variadic part consists of alternating
>> parameter labels and values:
>> pg_set_attribute_stats(table regclass, attribute name,
>>inherited bool, source_version int,
>>variadic "any") returns void

> I'm not aware of how strict works with variadics. Would the lack of any
> variadic parameters trigger it?

IIRC, "variadic any" requires having at least one variadic parameter.
But that seems fine --- what would be the point, or even the
semantics, of calling pg_set_attribute_stats with no data fields?

> Also going with strict means that an inadvertent explicit NULL in one
> parameter would cause the entire attribute import to fail silently. I'd
> rather fail loudly.

Not really convinced that that is worth any trouble...

> * We can require the calling statement to cast arguments, particularly
>> arrays, to the proper type, removing the need for conversions within
>> the stats-setting function.  (But instead, it'd need to check that the
>> next "any" argument is the type it ought to be based on the type of
>> the target column.)

> So, that's tricky. The type of the values is not always the attribute type,

Hmm.  You would need to have enough smarts in pg_set_attribute_stats
to identify the appropriate array type in any case: as coded, it needs
that for coercion, whereas what I'm suggesting would only require it
for checking, but either way you need it.  I do concede that pg_dump
(or other logic generating the calls) needs to know more under my
proposal than before.  I had been thinking that it would not need to
hard-code that because it could look to see what the actual type is
of the array it's dumping.  However, I see that pg_typeof() doesn't
work for that because it just returns anyarray.  Perhaps we could
invent a new backend function that extracts the actual element type
of a non-null anyarray argument.
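
For instance (a quick check against any table that has statistics):

SELECT pg_typeof(histogram_bounds) FROM pg_stats
WHERE histogram_bounds IS NOT NULL LIMIT 1;
-- returns "anyarray", not the concrete array type of the column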

Another way we could get to no-coercions is to stick with your
signature but declare the relevant parameters as anyarray instead of
text.  I still think though that we'd be better off to leave the
parameter matching to runtime, so that we-don't-recognize-that-field
can be a warning not an error.

>> * why is check_relation_permissions looking up the pg_class row?
>> There's already a copy of that in the Relation struct.

> To prove that the caller is the owner (or better) of the table.

I think you missed my point: you're doing that inefficiently,
and maybe even with race conditions.  Use the relcache's copy
of the pg_class row.

>> * I'm dubious that we can fully vet the contents of these arrays,
>> and even a little dubious that we need to try.

> A lot of the feedback I got on this patch over the months concerned giving
> inaccurate, nonsensical, or malicious data to the planner. Surely the
> planner does do *some* defensive programming when fetching these values,
> but this is the first time those values were potentially set by a user, not
> by our own internal code. We can try to match types, collations, etc from
> source to dest, but even that would fall victim to another glibc-level
> collation change.

That sort of concern is exactly why I think the planner has to, and
does, defend itself.  Even if you fully vet the data at the instant
of loading, we might have the collation change under us later.

It could be argued that feeding bogus data to the planner for testing
purposes is a valid use-case for this feature.  (Of course, as
superuser we could inject bogus data into pg_statistic manually,
so it's not necessary to have this feature for that purpose.)
I guess I'm a great deal more sanguine than other people about the
planner's ability to tolerate inconsistent data; but in any case
I don't have a lot of faith in relying on checks in
pg_set_attribute_stats to substitute for that ability.  That idea
mainly leads to having a whole lot of code that has to be kept in
sync with other code that's far away from it and probably isn't
coded in a parallel fashion either.

>> * There's a lot of ERROR cases that maybe we ought to downgrade
>> to WARN-and-press-on, in the service of not breaking the restore
>> completely in case of trouble.

> All cases were made error precisely to spark debate about which cases we'd
> want to continue from and which we'd want to error from.

Well, I'm here to debate it if you want, but I'll just note that *one*
error will be enough to abort a pg_upgrade entirely, and most users
these 

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-03-31 Thread Alexander Korotkov
Hi Bharath,

Thank you for your feedback.

On Sun, Mar 31, 2024 at 8:44 AM Bharath Rupireddy
 wrote:
> On Sun, Mar 31, 2024 at 7:41 AM Alexander Korotkov  
> wrote:
> Thanks for the patch. I have a few comments on the v16 patch.
>
> 1. Is there any specific reason for pg_wal_replay_wait() being a
> procedure rather than a function? I haven't read the full thread, but
> I vaguely noticed the discussion on the new wait mechanism holding up
> a snapshot or some other resource. Is that the reason to use a stored
> procedure over a function? If yes, can we specify it somewhere in the
> commit message and just before the procedure definition in
> system_functions.sql?

Surely, there is a reason.  A function has to be executed within a snapshot,
which can prevent WAL records from being replayed.  See [1] for a
particular test scenario.  In a procedure we may enforce a non-atomic
context and release the snapshot.

I've mentioned that in the commit message and in the procedure code.
I don't think system_functions.sql is the place for this type of
comment.  We only use system_functions.sql to push the default values.

> 2. Is the pg_wal_replay_wait first procedure that postgres provides
> out of the box?

Yes, it appears first.  I see nothing wrong about that.

> 3. Defining a procedure for the first time in system_functions.sql
> which is supposed to be for functions seems a bit unusual to me.

From the perspective of DDL and the system catalog, a procedure is just
another kind of function (prokind == 'p').  So, I don't feel there is
anything wrong with that.

> 4.
> +
> +endtime = TimestampTzPlusMilliseconds(GetCurrentTimestamp(), timeout);
> +
>
> +if (timeout > 0)
> +{
> +delay_ms = (endtime - GetCurrentTimestamp()) / 1000;
> +latch_events |= WL_TIMEOUT;
> +if (delay_ms <= 0)
> +break;
> +}
>
> Why is endtime calculated even for timeout <= 0 only to just skip it
> later? Can't we just do a fastpath exit if timeout = 0 and targetLSN <

OK, fixed.

> 5.
>  Parameter
> +timeout is the time in milliseconds to wait
> +for the target_lsn
> +replay. When timeout value equals to zero no
> +timeout is applied.
>
> It turns out to be "When timeout value equals to zero no timeout is
> applied." I guess, we can just say something like the following which
> I picked up from pg_terminate_backend timeout parameter description.
>
>
> If timeout is not specified or zero, this
> function returns if  the WAL upto
> target_lsn  is replayed.
> If the timeout is specified (in
> milliseconds) and greater than zero, the function waits until the
> server actually replays the WAL upto target_lsn  or
> until the given time has passed. On timeout, an error is emitted.
>

Applied as you suggested with some edits from me.

> 6.
> +ereport(ERROR,
> +(errcode(ERRCODE_QUERY_CANCELED),
> + errmsg("canceling waiting for LSN due to timeout")));
>
> We can be a bit more informative here and say targetLSN and currentLSN
> something like - "timed out while waiting for target LSN %X/%X to be
> replayed; current LSN %X/%X"?

Done this way.  Adjusted other ereport()'s as well.

> 7.
> +if (context->atomic)
> +ereport(ERROR,
> +(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("pg_wal_replay_wait() must be only called in
> non-atomic context")));
> +
>
> Can we say what a "non-atomic context" is in a user-understandable
> way, like explicit txn or whatever that might be? "non-atomic context"
> might not sound great to the end user.

Added errdetail() to this ereport().

> 8.
> +the movie table and get the lsn 
> after
> +changes just made.  This example uses
> pg_current_wal_insert_lsn
> +to get the lsn given that
> synchronous_commit
> +could be set to off.
>
> Can we just mention that pg_current_wal_insert_lsn should be run on the primary?

The mention is added.

> 9. To me the following query blocks even though I didn't mention timeout.
> CALL pg_wal_replay_wait('0/fff');

If your primary server is freshly initialized, you need to do quite a lot
of data modifications to reach this LSN.

> 10. Can't we do some input validation on the timeout parameter and
> emit an error for negative values just like pg_terminate_backend?
> CALL pg_wal_replay_wait('0/ff', -100);

Reasonable, added.

> 11.
> +
> +if (timeout > 0)
> +{
> +delay_ms = (endtime - GetCurrentTimestamp()) / 1000;
> +latch_events |= WL_TIMEOUT;
> +if (delay_ms <= 0)
> +break;
> +}
> +
>
> Can we avoid calling GetCurrentTimestamp in a for loop which can be
> costly at times especially when pg_wal_replay_wait is called with
> larger timeouts on multiple backends? Can't we reuse
> 

Re: Statistics Import and Export

2024-03-31 Thread Corey Huinker
>
>
> I concur with the plan of extracting data from pg_stats not
> pg_statistics, and with emitting a single "set statistics"
> call per attribute.  (I think at one point I'd suggested a call
> per stakind slot, but that would lead to a bunch of UPDATEs on
> existing pg_attribute tuples and hence a bunch of dead tuples
> at the end of an import, so it's not the way to go.  A series
> of UPDATEs would likely also play poorly with a background
> auto-ANALYZE happening concurrently.)
>

That was my reasoning as well.



> I do not like the current design for pg_set_attribute_stats' API
> though: I don't think it's at all future-proof.  What happens when
> somebody adds a new stakind (and hence new pg_stats column)?
> You could try to add an overloaded pg_set_attribute_stats
> version with more parameters, but I'm pretty sure that would
> lead to "ambiguous function call" failures when trying to load
> old dump files containing only the original parameters.


I don't think we'd overload, we'd just add new parameters to the function
signature.


>   The
> present design is also fragile in that an unrecognized parameter
> will lead to a parse-time failure and no function call happening,
> which is less robust than I'd like.


There was a lot of back-and-forth about what sorts of failures were
error-worthy, and which were warn-worthy. I'll discuss further below.


>   As lesser points,
> the relation argument ought to be declared regclass not oid for
> convenience of use,


+1


> and I really think that we need to provide
> the source server's major version number --- maybe we will never
> need that, but if we do and we don't have it we will be sad.
>

The JSON had it, and I never did use it. Not against having it again.


>
> So this leads me to suggest that we'd be best off with a VARIADIC
> ANY signature, where the variadic part consists of alternating
> parameter labels and values:
>
> pg_set_attribute_stats(table regclass, attribute name,
>inherited bool, source_version int,
>variadic "any") returns void
>
> where a call might look like
>
> SELECT pg_set_attribute_stats('public.mytable', 'mycolumn',
>   false, -- not inherited
>   16, -- source server major version
>   -- pairs of labels and values follow
>   'null_frac', 0.4,
>   'avg_width', 42,
>   'histogram_bounds',
>   array['a', 'b', 'c']::text[],
>   ...);
>
> Note a couple of useful things here:
>
> * AFAICS we could label the function strict and remove all those ad-hoc
> null checks.  If you don't have a value for a particular stat, you
> just leave that pair of arguments out (exactly as the existing code
> in 0002 does, just using a different notation).  This also means that
> we don't need any default arguments and so no need for hackery in
> system_functions.sql.
>

I'm not aware of how strict works with variadics. Would the lack of any
variadic parameters trigger it?

Also going with strict means that an inadvertent explicit NULL in one
parameter would cause the entire attribute import to fail silently. I'd
rather fail loudly.
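
To illustrate the silent-failure concern with a generic sketch (not the
proposed function itself): a STRICT function that receives a NULL is simply
not called and returns NULL, so nothing would be imported and no error raised:

-- Any NULL argument makes a STRICT function return NULL without executing.
CREATE FUNCTION add_strict(a int, b int) RETURNS int
    STRICT LANGUAGE sql AS 'SELECT a + b';
SELECT add_strict(1, NULL);   -- returns NULL, no error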



> * If we don't recognize a parameter label at runtime, we can treat
> that as a warning rather than a hard error, and press on.  This case
> would mostly be useful in major version downgrades I suppose, but
> that will be something people will want eventually.
>

Interesting.

* We can require the calling statement to cast arguments, particularly
> arrays, to the proper type, removing the need for conversions within
> the stats-setting function.  (But instead, it'd need to check that the
> next "any" argument is the type it ought to be based on the type of
> the target column.)
>

So, that's tricky. The type of the values is not always the attribute type:
for expression indexes we call exprType() and exprCollation(), in which case
we always use the expression type over the attribute type, but only use the
expression's collation if the attribute has no collation. This mimics the
behavior of ANALYZE.
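
As a concrete illustration of the expression-index case (example constructed
here, not taken from the patch): the statistics row for an expression column
is keyed on the expression's result type, not the underlying column's type.

CREATE TABLE people (name text);
CREATE INDEX people_namelen_idx ON people ((length(name)));
INSERT INTO people SELECT 'name_' || g FROM generate_series(1, 1000) g;
ANALYZE people;
-- The stats row for the index's expression column holds integer-typed values
-- (the result type of length()), even though the table column is text.
SELECT tablename, attname FROM pg_stats WHERE tablename = 'people_namelen_idx';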

Then, for the MCELEM and DECHIST stakinds we have to find the type's
element type, and that has special logic for tsvectors, ranges, and other
non-scalars, borrowing from the various *_typanalyze() functions. For that
matter, the existing typanalyze functions don't grab the < operator, which
I need for later data validations, so using examine_attribute() was
simultaneously overkill and insufficient.

None of this functionality is accessible from a client program, so we'd
have to pull in a lot of backend stuff to pg_dump to make it resolve the
typecasts correctly. Text and array_in() was just easier.


> pg_set_relation_stats is simpler in that the set of stats values
> to be set will probably remain fairly static, and there seems little
> reason to allow 

Re: remaining sql/json patches

2024-03-31 Thread jian he
FAILED: src/interfaces/ecpg/test/sql/sqljson_jsontable.c
/home/jian/postgres/buildtest6/src/interfaces/ecpg/preproc/ecpg
--regression -I../../Desktop/pg_src/src6/postgres/src/interfaces/ecpg/test/sql
-I../../Desktop/pg_src/src6/postgres/src/interfaces/ecpg/include/ -o
src/interfaces/ecpg/test/sql/sqljson_jsontable.c
../../Desktop/pg_src/src6/postgres/src/interfaces/ecpg/test/sql/sqljson_jsontable.pgc
../../Desktop/pg_src/src6/postgres/src/interfaces/ecpg/test/sql/sqljson_jsontable.pgc:21:
WARNING: unsupported feature will be passed to server
../../Desktop/pg_src/src6/postgres/src/interfaces/ecpg/test/sql/sqljson_jsontable.pgc:32:
ERROR: syntax error at or near ";"
need an extra closing parenthesis?

   
The rows produced by JSON_TABLE are laterally
joined to the row that generated them, so you do not have to explicitly join
the constructed view with the original table holding JSON
-   data.
needs a closing </para> tag.

SELECT * FROM JSON_TABLE('[]', 'strict $.a' COLUMNS (js2 text  PATH
'$' error on empty error on error) EMPTY ON ERROR);
should I expect it to return one row?
Is there any example that makes it return one row from a top-level "EMPTY ON ERROR"?


+ {
+ JsonTablePlan *scan = (JsonTablePlan *) plan;
+
+ JsonTableInitPathScan(cxt, planstate, args, mcxt);
+
+ planstate->nested = scan->child ?
+ JsonTableInitPlan(cxt, scan->child, planstate, args, mcxt) : NULL;
+ }
first line seems strange, do we just simply change from "plan" to "scan"?


+ case JTC_REGULAR:
+ typenameTypeIdAndMod(pstate, rawc->typeName, &typid, &typmod);
+
+ /*
+ * Use implicit FORMAT JSON for composite types (arrays and
+ * records) or if a non-default WRAPPER / QUOTES behavior is
+ * specified.
+ */
+ if (typeIsComposite(typid) ||
+ rawc->quotes != JS_QUOTES_UNSPEC ||
+ rawc->wrapper != JSW_UNSPEC)
+ rawc->coltype = JTC_FORMATTED;
per previous discussion, should we refactor the above comment?


+/* Recursively set 'reset' flag of planstate and its child nodes */
+static void
+JsonTablePlanReset(JsonTablePlanState *planstate)
+{
+ if (IsA(planstate->plan, JsonTableSiblingJoin))
+ {
+ JsonTablePlanReset(planstate->left);
+ JsonTablePlanReset(planstate->right);
+ planstate->advanceRight = false;
+ }
+ else
+ {
+ planstate->reset = true;
+ planstate->advanceNested = false;
+
+ if (planstate->nested)
+ JsonTablePlanReset(planstate->nested);
+ }
Per coverage, the first part of the IF branch is never executed.
I also found that JsonTablePlanReset is quite similar to JsonTableRescan;
I don't fully understand these two functions, though.


SELECT * FROM JSON_TABLE(jsonb'{"a": {"z":[]}, "b": 1,"c": 2, "d":
91}', '$' COLUMNS (
c int path '$.c',
d int path '$.d',
id1 for ordinality,
NESTED PATH '$.a.z[*]' columns (z int path '$', id for ordinality)
));
doc seems to say that duplicated ordinality columns in different nest
levels are not allowed?


"currentRow" naming seems misleading, generally, when we think of "row",
we think of several (not one) datums, or several columns.
but here, we only have one datum.
I don't have good optional naming though.


+ case JTC_FORMATTED:
+ case JTC_EXISTS:
+ {
+ Node   *je;
+ CaseTestExpr *param = makeNode(CaseTestExpr);
+
+ param->collation = InvalidOid;
+ param->typeId = contextItemTypid;
+ param->typeMod = -1;
+
+ je = transformJsonTableColumn(rawc, (Node *) param,
+  NIL, errorOnError);
+
+ colexpr = transformExpr(pstate, je, EXPR_KIND_FROM_FUNCTION);
+ assign_expr_collations(pstate, colexpr);
+
+ typid = exprType(colexpr);
+ typmod = exprTypmod(colexpr);
+ break;
+ }
+
+ default:
+ elog(ERROR, "unknown JSON_TABLE column type: %d", rawc->coltype);
+ break;
+ }
+
+ tf->coltypes = lappend_oid(tf->coltypes, typid);
+ tf->coltypmods = lappend_int(tf->coltypmods, typmod);
+ tf->colcollations = lappend_oid(tf->colcollations, get_typcollation(typid));
+ tf->colvalexprs = lappend(tf->colvalexprs, colexpr);

why not use exprCollation(colexpr) for tf->colcollations, similar to
exprType(colexpr)?




+-- Should fail (JSON arguments are not passed to column paths)
+SELECT *
+FROM JSON_TABLE(
+ jsonb '[1,2,3]',
+ '$[*] ? (@ < $x)'
+ PASSING 10 AS x
+ COLUMNS (y text FORMAT JSON PATH '$ ? (@ < $x)')
+ ) jt;
+ERROR:  could not find jsonpath variable "x"

the error message does not correspond to the comment's intention.
Also, "y text FORMAT JSON" should be fine, shouldn't it?

Only the second-to-last example really uses the PASSING clause.
Should the following query work just fine in this context?

create table s(js jsonb);
insert into s select '{"a":{"za":[{"z1": [11,]},{"z21": [22,
234,2345]}]},"c": 3}';
SELECT sub.* FROM s,JSON_TABLE(js, '$' passing 11 AS "b c", 1 + 2 as y
COLUMNS (xx int path '$.c ? (@ == $y)')) sub;


I thought the json and text data types were quite similar.
Should the following two queries return the same result?

SELECT sub.* FROM s, JSON_TABLE(js, '$' COLUMNS(
xx int path '$.c',
nested PATH '$.a.za[1]' columns (NESTED PATH '$.z21[*]' COLUMNS (a12
jsonb path '$'))
))sub;

SELECT sub.* FROM s,JSON_TABLE(js, '$' 

Re: Catalog domain not-null constraints

2024-03-31 Thread jian he
On Tue, Mar 26, 2024 at 2:28 AM Dean Rasheed  wrote:
>
> On Fri, 22 Mar 2024 at 08:28, jian he  wrote:
> >
> > On Thu, Mar 21, 2024 at 7:23 PM Peter Eisentraut  
> > wrote:
> > >
> > > Hmm.  CREATE DOMAIN uses column constraint syntax, but ALTER DOMAIN uses
> > > table constraint syntax.  Attached is a patch to try to sort this out.
> >
> > also you should also change src/backend/utils/adt/ruleutils.c?
> >
> > src6=# \dD
> >   List of domains
> >  Schema |Name |  Type   | Collation | Nullable | Default |
> >  Check
> > +-+-+---+--+-+--
> >  public | domain_test | integer |   | not null | |
> > CHECK (VALUE > 0) NOT NULL VALUE
> > (1 row)
> >
> > probably change to CHECK (VALUE IS NOT NULL)
>
> I'd say it should just output "NOT NULL", since that's the input
> syntax that created the constraint. But then again, why display NOT
> NULL constraints in that column at all, when there's a separate
> "Nullable" column?
>
create table sss(a int not null);
SELECT pg_get_constraintdef(oid) FROM pg_constraint WHERE conname =
'sss_a_not_null';
returns
" NOT NULL a"

I think just outputting "NOT NULL" is OK for the domain, given that the
table constraint is "NOT NULL" + table column, per the above example.
Yeah, we already have a "Nullable" column, so we don't need to display
NOT NULL constraints.
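
For anyone reproducing this, the domain-side counterpart of the query above
can be pulled with something like the following (the domain name is made up;
what the not-null row deparses to is exactly the point under discussion):

-- Hypothetical domain mirroring the \dD example quoted earlier.
CREATE DOMAIN domain_test AS integer NOT NULL CHECK (VALUE > 0);
-- How each of the domain's constraints deparses.
SELECT conname, contype, pg_get_constraintdef(oid)
FROM pg_constraint
WHERE contypid = 'domain_test'::regtype;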




Re: Statistics Import and Export

2024-03-31 Thread Tom Lane
Corey Huinker  writes:
>> I can't quibble with that view of what has priority.  I'm just
>> suggesting that redesigning what pg_upgrade does in this area
>> should come later than doing something about extended stats.

> I mostly agree, with the caveat that pg_upgrade's existing message saying
> that optimizer stats were not carried over wouldn't be 100% true anymore.

I think we can tweak the message wording.  I just don't want to be
doing major redesign of the behavior, nor adding fundamentally new
monitoring capabilities.

regards, tom lane




Re: Psql meta-command conninfo+

2024-03-31 Thread Imseih (AWS), Sami
>. However,
> I didn't complete item 4. I'm not sure, but I believe that linking it to the 
> documentation
> could confuse the user a bit. I chose to keep the descriptions as they were. 
> However, if
> you have any ideas on how we could outline it, let me know and perhaps we can
> implement it.



That is fair; after spending some time thinking about this, it is better
to make the documentation as crystal clear as possible.

I do have some rewording suggestions for the following fields.

Database:
The database name of the connection.

Authenticated User:
The authenticated database user of the connection.

Socket Directory:
The Unix socket directory of the connection. NULL if host or hostaddr are
specified.

Host:
The host name or address of the connection. NULL if a Unix socket is used.

Server Port:
The port of the connection. NULL if a Unix socket is used.

Server Address:
The IP address of the host name. NULL if a Unix socket is used.

Client Address:
The IP address of the client connected to this backend. NULL if a Unix
socket is used.

Client Port:
The port of the client connected to this backend. NULL if a Unix socket is
used.

Backend PID:
The process id of the backend for the connection.

Application name:
psql is the default for a psql connection, unless otherwise specified in the
 configuration parameter.

For System User, you should use the full definition here:
https://github.com/postgres/postgres/blob/master/doc/src/sgml/func.sgml#L24251-L24259.
The current patch is missing the full description.





Regards,



Sami




Re: Statistics Import and Export

2024-03-31 Thread Corey Huinker
>
> I wonder if the right answer to that is "let's enhance the I/O
> functions for those types".  But whether that helps or not, it's
> v18-or-later material for sure.
>

That was Stephen's take as well, and I agreed given that I had to throw the
kitchen-sink of source-side oid mappings (attname, types, collatons,
operators) into the JSON to work around the limitation.


> I can't quibble with that view of what has priority.  I'm just
> suggesting that redesigning what pg_upgrade does in this area
> should come later than doing something about extended stats.
>

I mostly agree, with the caveat that pg_upgrade's existing message saying
that optimizer stats were not carried over wouldn't be 100% true anymore.


Re: Statistics Import and Export

2024-03-31 Thread Tom Lane
Corey Huinker  writes:
> On Sun, Mar 31, 2024 at 2:41 PM Tom Lane  wrote:
>> There's a bigger issue though: AFAICS this patch set does nothing
>> about dumping extended statistics.  I surely don't want to hold up
>> the patch insisting that that has to happen before we can commit the
>> functionality proposed here.  But we cannot rip out pg_upgrade's
>> support for post-upgrade ANALYZE processing before we do something
>> about extended statistics, and that means it's premature to be
>> designing any changes to how that works.  So I'd set that whole
>> topic on the back burner.

> So Extended Stats _were_ supported by earlier versions where the medium of
> communication was JSON. However, there were several problems with adapting
> that to the current model where we match params to stat types:

> * Several of the column types do not have functional input functions, so we
> must construct the data structure internally and pass them to
> statext_store().
> * The output functions for some of those column types have lists of
> attnums, with negative values representing positional expressions in the
> stat definition. This information is not translatable to another system
> without also passing along the attnum/attname mapping of the source system.

I wonder if the right answer to that is "let's enhance the I/O
functions for those types".  But whether that helps or not, it's
v18-or-later material for sure.

> At least three people told me "nobody uses extended stats" and to just drop
> that from the initial version.

I can't quibble with that view of what has priority.  I'm just
suggesting that redesigning what pg_upgrade does in this area
should come later than doing something about extended stats.

regards, tom lane




Re: Statistics Import and Export

2024-03-31 Thread Corey Huinker
On Sun, Mar 31, 2024 at 2:41 PM Tom Lane  wrote:

> Corey Huinker  writes:
> > Having given this some thought, I'd be inclined to create a view,
> > pg_stats_missing, with the same security barrier as pg_stats, but looking
> > for tables that lack stats on at least one column, or lack stats on an
> > extended statistics object.
>
> The week before feature freeze is no time to be designing something
> like that, unless you've abandoned all hope of getting this into v17.
>

It was a response to the suggestion that there be some way for
tools/automation to read the status of stats. I would view it as a separate
patch, as such a view would be useful now for knowing which tables to
ANALYZE, regardless of whether this patch goes in or not.
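
If such a view existed, it could drive targeted re-analysis directly, e.g.:

-- Sketch: generate per-table ANALYZE commands from the proposed view
-- (view and column names are assumptions based on the proposal in this thread).
SELECT format('ANALYZE %I.%I;', schemaname, tablename)
FROM pg_stats_missing;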


> There's a bigger issue though: AFAICS this patch set does nothing
> about dumping extended statistics.  I surely don't want to hold up
> the patch insisting that that has to happen before we can commit the
> functionality proposed here.  But we cannot rip out pg_upgrade's
> support for post-upgrade ANALYZE processing before we do something
> about extended statistics, and that means it's premature to be
> designing any changes to how that works.  So I'd set that whole
> topic on the back burner.
>

So Extended Stats _were_ supported by earlier versions where the medium of
communication was JSON. However, there were several problems with adapting
that to the current model where we match params to stat types:

* Several of the column types do not have functional input functions, so we
must construct the data structure internally and pass them to
statext_store().
* The output functions for some of those column types have lists of
attnums, with negative values representing positional expressions in the
stat definition. This information is not translatable to another system
without also passing along the attnum/attname mapping of the source system.

At least three people told me "nobody uses extended stats" and to just drop
that from the initial version. Unhappy with this assessment, I inquired as
to whether my employer (AWS) had some internal databases that used extended
stats so that I could get good test data, and came up with nothing, nor did
anyone know of customers who used the feature. So when the fourth person
told me that nobody uses extended stats, and not to let a rarely-used
feature get in the way of a feature that would benefit nearly 100% of
users, I dropped it.


> It's possible that we could drop the analyze-in-stages recommendation,
> figuring that this functionality will get people to the
> able-to-limp-along level immediately and that all that is needed is a
> single mop-up ANALYZE pass.  But I think we should leave that till we
> have a bit more than zero field experience with this feature.


It may be that we leave the recommendation exactly as it is.

Perhaps we enhance the error messages in pg_set_*_stats() to indicate what
command would remediate the issue.


RE: Psql meta-command conninfo+

2024-03-31 Thread Maiquel Grassi
Note that, in the patch as posted, the column names are not
translatable.  In order to be translatable, the code needs to do
something like

appendPQExpBuffer(&buf,
"  NULL AS \"%s\",\n"
"  NULL AS \"%s\",\n"
"  NULL AS \"%s\",\n"
"  NULL AS \"%s\",\n",
_("SSL Connection"),
_("SSL Protocol"),
_("SSL Cipher"),
_("SSL Compression"));

instead of

   appendPQExpBuffer(&buf,
 "  NULL AS \"SSL Connection\",\n"
 "  NULL AS \"SSL Protocol\",\n"
 "  NULL AS \"SSL Cipher\",\n"
 "  NULL AS \"SSL Compression\",\n");


Please list me as reviewer for this patch, as I provided significant
guidance before it was even posted.

-//-

Hello!

(v23)

Yes Álvaro, I have already appointed you as the patch reviewer.
It's true, even before publishing it on the Commitfest, you had
already provided good ideas and guidance.

I adjusted the translation syntax for the SSL and GSSAPI columns.
After your validation, that is, an okay confirmation that it's fine, I'll
proceed with the others.

Test:

[postgres@localhost bin]$ ./psql -x -p 5000 -h 127.0.0.1
psql (17devel)
Type "help" for help.

postgres=# \conninfo
You are connected to database "postgres" as user "postgres" on host "127.0.0.1" 
at port "5000".
postgres=# \conninfo+
Current Connection Information
-[ RECORD 1 ]+--
Database | postgres
Authenticated User   | postgres
Socket Directory |
Host | 127.0.0.1
Server Port  | 5000
Server Address   | 127.0.0.1
Client Address   | 127.0.0.1
Client Port  | 52966
Backend PID  | 1693
System User  |
Current User | postgres
Session User | postgres
Application Name | psql
SSL Connection   | f
SSL Protocol |
SSL Cipher   |
SSL Compression  |
GSSAPI Authenticated | f
GSSAPI Principal |
GSSAPI Encrypted | f
GSSAPI Credentials Delegated | f

Regards,
Maiquel Grassi.


v23-0001-psql-meta-command-conninfo-plus.patch
Description: v23-0001-psql-meta-command-conninfo-plus.patch


Re: Security lessons from liblzma

2024-03-31 Thread Andres Freund
Hi,

On 2024-03-31 12:18:29 +0200, Michael Banck wrote:
> If you ask where they are maintained, the answer is here:
>
> https://salsa.debian.org/postgresql/postgresql/-/tree/17/debian/patches?ref_type=heads
>
> the other major versions have their own branch.

Luckily these are all quite small, leaving little space to hide stuff.  I'd
still like to get rid of at least some of them.

I've previously proposed a patch to make pkglibdir configurable, I think we
should just go for that.

For the various defines, ISTM we should just make them #ifndef guarded, then
they could be overridden by defining them at configure time. Some of them,
like DEFAULT_PGSOCKET_DIR seem to be overridden by just about every
distro. And others would be nice to easily override anyway, I e.g. dislike the
default DEFAULT_PAGER value.


On 2024-03-31 16:55:26 +0100, Devrim Gündüz wrote:
> Here are the v17 patches:
>
> https://git.postgresql.org/gitweb/?p=pgrpms.git;a=tree;f=rpm/redhat/main/non-common/postgresql-17/main

A bit bigger/more patches, but still not too bad.

postgresql-17-conf.patch

Uncomments a few values to their default, that's a bit odd.


postgresql-17-ecpg_config.h:
postgresql-17-pg_config.h:

Hm, wonder if we should make this easier somehow. Perhaps we ought to support
installing the various *config.h headers into a different directory than the
architecture independent stuff?  At least on debian based systems it seems we
ought to support installing pg_config.h etc into /usr/include/ or
something along those lines.


postgresql-17-rpm-pgsql.patch:

We should probably make this stuff properly configurable. The current logic
with inferring whether to add /postgresql is just weird. Perhaps a configure
option that defaults to the current logic when set to an empty string but can
be overridden?


Greetings,

Andres Freund




Re: Security lessons from liblzma

2024-03-31 Thread Tom Lane
Michael Banck  writes:
> On Sun, Mar 31, 2024 at 01:05:40PM -0400, Joe Conway wrote:
>> But it has always bothered me how many patches get applied to the upstream
>> tarballs by the OS package builders.

> I think this more an artifact of how RHEL development works, i.e. trying
> to ship the same major version of glibc for 10 years, but still fix lots
> of bugs and possibly some performance improvements your larger customers
> ask for. So I guess a lot of those 1000 patches are just cherry-picks /
> backports of upstream commits from newer releases.

Yeah.  Also, precisely because they keep supporting versions that are
out-of-support according to upstream, the idea that all the patches
can be moved upstream isn't going to work for them, and they're
unlikely to be excited about partial solutions.

The bigger problem though is: if we do this, are we going to take
patches that we fundamentally don't agree with?  For instance,
if a packager chooses to rip out the don't-run-server-as-root check.
(Pretty sure I've heard of people doing that.)  That would look like
blessing things we don't think are good ideas, and it would inevitably
lead to long arguments with packagers about why-dont-you-do-this-some-
other-way.  I'm not excited about that prospect.

regards, tom lane




Re: Add column name to error description

2024-03-31 Thread Tom Lane
Erik Wienhold  writes:
> On 2024-03-31 15:22 +0200, Marcos Pegoraro wrote:
>> This is my first patch, so sorry if I miss something.

> Please make sure that tests are passing by running make check:

check-world, in fact.

> The format "%d-%s" is not ideal.  I suggest "%d (%s)".

I didn't like that either, for two reasons: if we have a column name
it ought to be the prominent label, but we might not have one if the
TupleDesc came from some anonymous source (possibly that case explains
the test crash?  Although I think the attname would be an empty string
rather than missing entirely in such cases).  I think it'd be worth
providing two distinct message strings:

"Returned type %s does not match expected type %s in column \"%s\" (position 
%d)."
"Returned type %s does not match expected type %s in column position %d."

I'd suggest dropping the column number entirely in the first case,
were it not that the attnames might well not be unique if we're
dealing with an anonymous record type such as a SELECT result.

regards, tom lane




Re: Security lessons from liblzma

2024-03-31 Thread Michael Banck
Hi,

On Sun, Mar 31, 2024 at 01:05:40PM -0400, Joe Conway wrote:
> But it has always bothered me how many patches get applied to the upstream
> tarballs by the OS package builders. Some of them, e.g. glibc on RHEL 7,
> include more than 1000 patches that you would have to manually vet if you
> cared enough and had the skills. Last time I looked at the openssl package
> sources it was similar in volume and complexity. They might as well be
> called forks if everyone were being honest about it...

I think this more an artifact of how RHEL development works, i.e. trying
to ship the same major version of glibc for 10 years, but still fix lots
of bugs and possibly some performance improvements your larger customers
ask for. So I guess a lot of those 1000 patches are just cherry-picks /
backports of upstream commits from newer releases.

I guess it would be useful to maybe have another look at the patches
that are being applied for apt/yum.postgresql.org for the 18 release
cycle, but I do not think those are a security problem. Not sure about
RPM builds, but at least in theory the APT builds should be
reproducible.

What would be a significant gain in security/trust was an easy
service/recipe on how to verify the reproducibility (i) by independently
building packages (and maybe the more popular extensions) and comparing
them to the {apt,yum}.postgresql.org repository packages (ii) by being
able to build the release tarballs reproducibly.


Michael




Re: Add column name to error description

2024-03-31 Thread Erik Wienhold
On 2024-03-31 15:22 +0200, Marcos Pegoraro wrote:
> This is my first patch, so sorry if I miss something.

Please make sure that tests are passing by running make check:
https://www.postgresql.org/docs/current/regress-run.html#REGRESS-RUN-TEMP-INST

The patch breaks src/test/regress/sql/plpgsql.sql at:

-- this does not work currently (no implicit casting)
create or replace function compos() returns compostype as $$
begin
  return (1, 'hello');
end;
$$ language plpgsql;
select compos();
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
connection to server was lost

> If I have a function which returns lots of columns and any of these columns
> returns a wrong type it's not easy to see which one is that column because
> it points me only to its position, not its name. So, this patch only adds
> that column name, just that.

+1 for this improvement.

> create function my_f(a integer, b integer) returns table(first_col integer,
> lots_of_cols_later numeric) language plpgsql as $function$
> begin
>   return query select a, b;
> end;$function$;
> 
> select * from my_f(1,1);
> --ERROR: structure of query does not match function result type
> --Returned type integer does not match expected type numeric in column 2.
> 
> For a function which has just 2 columns it is easy, but if it returns a
> hundred columns, which one is the 66th column?
> 
> My patch just adds column name to that description message.
> --ERROR: structure of query does not match function result type
> --Returned type integer does not match expected type numeric in column 2-
> lots_of_cols_later.

> diff --git a/src/backend/access/common/attmap.c 
> b/src/backend/access/common/attmap.c
> index b0fe27ef57..85f7c0cb8c 100644
> --- a/src/backend/access/common/attmap.c
> +++ b/src/backend/access/common/attmap.c
> @@ -118,12 +118,13 @@ build_attrmap_by_position(TupleDesc indesc,
>  ereport(ERROR,
>  (errcode(ERRCODE_DATATYPE_MISMATCH),
>   errmsg_internal("%s", _(msg)),
> - errdetail("Returned type %s does not match expected 
> type %s in column %d.",
> + errdetail("Returned type %s does not match expected 
> type %s in column %d-%s.",

The format "%d-%s" is not ideal.  I suggesst "%d (%s)".

> format_type_with_typemod(att->atttypid,
>  att->atttypmod),
> format_type_with_typemod(atttypid,
>  atttypmod),
> -   noutcols)));
> +   noutcols,
> +   att->attname)));

Must be NameStr(att->attname) for use with printf's %s.  GCC even prints
a warning:

    attmap.c:121:60: warning: format ‘%s’ expects argument of type ‘char *’, but argument 5 has type ‘NameData’ {aka ‘struct nameData’} [-Wformat=]

-- 
Erik




Re: Statistics Import and Export

2024-03-31 Thread Tom Lane
My apologies for having paid so little attention to this thread for
months.  I got around to reading the v15 patches today, and while
I think they're going in more or less the right direction, there's
a long way to go IMO.

I concur with the plan of extracting data from pg_stats not
pg_statistics, and with emitting a single "set statistics"
call per attribute.  (I think at one point I'd suggested a call
per stakind slot, but that would lead to a bunch of UPDATEs on
existing pg_attribute tuples and hence a bunch of dead tuples
at the end of an import, so it's not the way to go.  A series
of UPDATEs would likely also play poorly with a background
auto-ANALYZE happening concurrently.)

I do not like the current design for pg_set_attribute_stats' API
though: I don't think it's at all future-proof.  What happens when
somebody adds a new stakind (and hence new pg_stats column)?
You could try to add an overloaded pg_set_attribute_stats
version with more parameters, but I'm pretty sure that would
lead to "ambiguous function call" failures when trying to load
old dump files containing only the original parameters.  The
present design is also fragile in that an unrecognized parameter
will lead to a parse-time failure and no function call happening,
which is less robust than I'd like.  As lesser points,
the relation argument ought to be declared regclass not oid for
convenience of use, and I really think that we need to provide
the source server's major version number --- maybe we will never
need that, but if we do and we don't have it we will be sad.

So this leads me to suggest that we'd be best off with a VARIADIC
ANY signature, where the variadic part consists of alternating
parameter labels and values:

pg_set_attribute_stats(table regclass, attribute name,
   inherited bool, source_version int,
   variadic "any") returns void

where a call might look like

SELECT pg_set_attribute_stats('public.mytable', 'mycolumn',
  false, -- not inherited
  16, -- source server major version
  -- pairs of labels and values follow
  'null_frac', 0.4,
  'avg_width', 42,
  'histogram_bounds',
  array['a', 'b', 'c']::text[],
  ...);

Note a couple of useful things here:

* AFAICS we could label the function strict and remove all those ad-hoc
null checks.  If you don't have a value for a particular stat, you
just leave that pair of arguments out (exactly as the existing code
in 0002 does, just using a different notation).  This also means that
we don't need any default arguments and so no need for hackery in
system_functions.sql.

* If we don't recognize a parameter label at runtime, we can treat
that as a warning rather than a hard error, and press on.  This case
would mostly be useful in major version downgrades I suppose, but
that will be something people will want eventually.

* We can require the calling statement to cast arguments, particularly
arrays, to the proper type, removing the need for conversions within
the stats-setting function.  (But instead, it'd need to check that the
next "any" argument is the type it ought to be based on the type of
the target column.)

If we write the labels as undecorated string literals as I show
above, I think they will arrive at the function as "unknown"-type
constants, which is a little weird but doesn't seem like it's
really a big problem.  The alternative is to cast them all to text
explicitly, but that's adding notation to no great benefit IMO.

pg_set_relation_stats is simpler in that the set of stats values
to be set will probably remain fairly static, and there seems little
reason to allow only part of them to be supplied (so personally I'd
drop the business about accepting nulls there too).  If we do grow
another value or values for it to set there shouldn't be much problem
with overloading it with another version with more arguments.
Still needs to take regclass not oid though ...

I've not read the patches in great detail, but I did note a
few low-level issues:

* why is check_relation_permissions looking up the pg_class row?
There's already a copy of that in the Relation struct.  Likewise
for the other caller of can_modify_relation (but why is that
caller not using check_relation_permissions?)  That all looks
overly complicated and duplicative.  I think you don't need two
layers of function there.

* I find the stuff with enums and "const char *param_names" to
be way too cute and unlike anything we do elsewhere.  Please
don't invent your own notations for coding patterns that have
hundreds of existing instances.  pg_set_relation_stats, for
example, has absolutely no reason not to look like the usual

Oid relid = PG_GETARG_OID(0);
float4  relpages = PG_GETARG_FLOAT4(1);
... etc ...

* The 

Re: Statistics Import and Export

2024-03-31 Thread Tom Lane
Corey Huinker  writes:
> Having given this some thought, I'd be inclined to create a view,
> pg_stats_missing, with the same security barrier as pg_stats, but looking
> for tables that lack stats on at least one column, or lack stats on an
> extended statistics object.

The week before feature freeze is no time to be designing something
like that, unless you've abandoned all hope of getting this into v17.

There's a bigger issue though: AFAICS this patch set does nothing
about dumping extended statistics.  I surely don't want to hold up
the patch insisting that that has to happen before we can commit the
functionality proposed here.  But we cannot rip out pg_upgrade's
support for post-upgrade ANALYZE processing before we do something
about extended statistics, and that means it's premature to be
designing any changes to how that works.  So I'd set that whole
topic on the back burner.

It's possible that we could drop the analyze-in-stages recommendation,
figuring that this functionality will get people to the
able-to-limp-along level immediately and that all that is needed is a
single mop-up ANALYZE pass.  But I think we should leave that till we
have a bit more than zero field experience with this feature.

regards, tom lane




Re: Security lessons from liblzma

2024-03-31 Thread Joe Conway

On 3/31/24 11:49, Tom Lane wrote:

Joe Conway  writes:
I am saying maybe those patches should be eliminated in favor of our 
tree including build options that would produce the same result.


I don't really see how that can be expected to work sanely.
It turns the responsibility for platform-specific build issues
on its head, and it doesn't work at all for issues discovered
after we make a release.  The normal understanding of how you
can vet a distro's package is that you look at the package
contents (the SRPM in Red Hat world and whatever the equivalent
concept is elsewhere), check that the contained tarball
matches upstream and that the patches and build instructions
look sane, and then build it locally and check for a match to
the distro's binary package.  Even if we could overcome the
obstacles to putting the patch files into the upstream tarball,
we're surely not going to include the build instructions, so
we'd not have moved the needle very far in terms of whether the
packager could do something malicious.


True enough I guess.

But it has always bothered me how many patches get applied to the 
upstream tarballs by the OS package builders. Some of them, e.g. glibc 
on RHEL 7, include more than 1000 patches that you would have to 
manually vet if you cared enough and had the skills. Last time I looked 
at the openssl package sources it was similar in volume and complexity. 
They might as well be called forks if everyone were being honest about it...


I know our PGDG packages are no big deal compared to that, fortunately.

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com





Re: Security lessons from liblzma

2024-03-31 Thread Devrim Gündüz
Hi,

On Sat, 2024-03-30 at 21:52 -0400, Bruce Momjian wrote:
> How would someone access the out-of-tree patches? 

Here are the v17 patches:

https://git.postgresql.org/gitweb/?p=pgrpms.git;a=tree;f=rpm/redhat/main/non-common/postgresql-17/main

You can replace -17 with -16 (and etc) for the other major releases.

Please note that both Debian folks and me build about 300 other packages
to support the ecosystem. Just saying.

Regards,
-- 
Devrim Gündüz
Open Source Solution Architect, PostgreSQL Major Contributor
Twitter: @DevrimGunduz , @DevrimGunduzTR




Re: Security lessons from liblzma

2024-03-31 Thread Tom Lane
Joe Conway  writes:
> I am saying maybe those patches should be eliminated in favor of our 
> tree including build options that would produce the same result.

I don't really see how that can be expected to work sanely.
It turns the responsibility for platform-specific build issues
on its head, and it doesn't work at all for issues discovered
after we make a release.  The normal understanding of how you
can vet a distro's package is that you look at the package
contents (the SRPM in Red Hat world and whatever the equivalent
concept is elsewhere), check that the contained tarball
matches upstream and that the patches and build instructions
look sane, and then build it locally and check for a match to
the distro's binary package.  Even if we could overcome the
obstacles to putting the patch files into the upstream tarball,
we're surely not going to include the build instructions, so
we'd not have moved the needle very far in terms of whether the
packager could do something malicious.

regards, tom lane




Re: New Table Access Methods for Multi and Single Inserts

2024-03-31 Thread Bharath Rupireddy
On Wed, Mar 27, 2024 at 1:42 PM Jeff Davis  wrote:
>
> On Wed, 2024-03-27 at 01:19 +0530, Bharath Rupireddy wrote:
> >
> > Similarly, with this new AM, the onus lies on the table AM
> > implementers to provide an implementation for these new AMs even if
> > they just do single inserts.
>
> Why not fall back to using the plain tuple_insert? Surely some table
> AMs might be simple and limited, and we shouldn't break them just
> because they don't implement the new APIs.

Hm. That might complicate table_modify_begin,
table_modify_buffer_insert and table_modify_end a bit. What do we put
in TableModifyState then? Do we create the bulk insert state
(BulkInsertStateData) outside? I think to give a better interface, can
we let TAM implementers support these new APIs in their own way? If
this sounds rather intrusive, we can just implement the fallback to
tuple_insert if these new API are not supported in the caller, for
example, do something like below in createas.c and matview.c.
Thoughts?

if (table_modify_buffer_insert() is defined)
   table_modify_buffer_insert(...);
else
{
  myState->bistate = GetBulkInsertState();
  table_tuple_insert(...);
}

> > table_multi_insert needs to be there for sure as COPY ... FROM uses
> > it.
>
> After we have these new APIs fully in place and used by COPY, what will
> happen to those other APIs? Will they be deprecated or will there be a
> reason to keep them?

Deprecated perhaps?

Please find the attached v16 patches for further review.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From 85410b429917cf388c4b58883ddc304118c73143 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Sun, 31 Mar 2024 15:34:16 +
Subject: [PATCH v16 1/2] Introduce new table modify access methods

---
 src/backend/access/heap/heapam.c | 189 ++-
 src/backend/access/heap/heapam_handler.c |   5 +
 src/backend/access/table/tableamapi.c|   4 +
 src/include/access/heapam.h  |  41 +
 src/include/access/tableam.h |  58 +++
 src/tools/pgindent/typedefs.list |   3 +
 6 files changed, 299 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index b661d9811e..69f8c597d8 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -64,6 +64,7 @@
 #include "storage/standby.h"
 #include "utils/datum.h"
 #include "utils/inval.h"
+#include "utils/memutils.h"
 #include "utils/relcache.h"
 #include "utils/snapmgr.h"
 #include "utils/spccache.h"
@@ -107,7 +108,8 @@ static int	bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate);
 static XLogRecPtr log_heap_new_cid(Relation relation, HeapTuple tup);
 static HeapTuple ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_required,
 		bool *copy);
-
+static void heap_modify_buffer_flush(TableModifyState *state);
+static void heap_modify_insert_end(TableModifyState *state);
 
 /*
  * Each tuple lock mode has a corresponding heavyweight lock, and one or two
@@ -2441,6 +2443,191 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 	*insert_indexes = true;
 }
 
+/*
+ * Initialize heap modify state.
+ */
+TableModifyState *
+heap_modify_begin(Relation rel, int modify_flags, CommandId cid,
+  int options)
+{
+	TableModifyState *state;
+	MemoryContext context;
+	MemoryContext oldcontext;
+
+	context = AllocSetContextCreate(CurrentMemoryContext,
+	"heap_modify memory context",
+	ALLOCSET_DEFAULT_SIZES);
+
+	oldcontext = MemoryContextSwitchTo(context);
+	state = palloc0(sizeof(TableModifyState));
+	state->rel = rel;
+	state->modify_flags = modify_flags;
+	state->mctx = context;
+	state->cid = cid;
+	state->options = options;
+	state->insert_indexes = false;
+	state->modify_end_cb = NULL;	/* To be installed lazily */
+	MemoryContextSwitchTo(oldcontext);
+
+	return state;
+}
+
+/*
+ * Store passed-in tuple into in-memory buffered slots. When full, insert
+ * multiple tuples from the buffers into heap.
+ */
+void
+heap_modify_buffer_insert(TableModifyState *state,
+		  TupleTableSlot *slot)
+{
+	TupleTableSlot *dstslot;
+	HeapInsertState *istate;
+	HeapMultiInsertState *mistate;
+	MemoryContext oldcontext;
+
+	oldcontext = MemoryContextSwitchTo(state->mctx);
+
+	/* First time through, initialize heap insert state */
+	if (state->data == NULL)
+	{
+		istate = (HeapInsertState *) palloc0(sizeof(HeapInsertState));
+		istate->bistate = NULL;
+		istate->mistate = NULL;
+		state->data = istate;
+
+		if ((state->modify_flags & TM_FLAG_MULTI_INSERTS) != 0)
+		{
+			mistate = (HeapMultiInsertState *) palloc0(sizeof(HeapMultiInsertState));
+			mistate->slots = (TupleTableSlot **) palloc0(sizeof(TupleTableSlot *) * HEAP_MAX_BUFFERED_SLOTS);
+			istate->mistate = mistate;
+		}
+
+		if ((state->modify_flags & TM_FLAG_BAS_BULKWRITE) != 0)
+			istate->bistate = GetBulkInsertState();
+
+		

Re: Security lessons from liblzma

2024-03-31 Thread Devrim Gündüz
Hi,

On Sun, 2024-03-31 at 08:15 -0400, Joe Conway wrote:
> 
> I am saying maybe those patches should be eliminated in favor of our 
> tree including build options that would produce the same result.

Works for me, as a long as I can commit them and upcoming potential
patches to PostgreSQL git repo.

OTOH, we also carry non-patches like README files, systemd unit files,
pam files, setup script, etc., which are very RPM specific.

Regards,
-- 
Devrim Gündüz
Open Source Solution Architect, PostgreSQL Major Contributor
Twitter: @DevrimGunduz , @DevrimGunduzTR




Re: Building with musl in CI and the build farm

2024-03-31 Thread walther

About building one of the CI tasks with musl:

Andres Freund:

I'd rather adapt one of the existing tasks, to avoid increasing CI costs unduly.


I looked into this and I think the only task that could be changed is 
the SanityCheck. This is because this builds without any additional 
features enabled. I guess that makes sense, because otherwise those 
dependencies would first have to be built with musl-gcc as well.




FWIW, except for one small issue, building postgres against musl works on 
debian and the tests pass if I install first.


After the fix for LD_LIBRARY_PATH this now works as expected without 
installing first. I confirmed it works on debian with CC=musl-gcc.




The small problem mentioned above is that on debian linux/fs.h isn't available
when building with musl, which in turn causes src/bin/pg_upgrade/file.c to
fail to compile.


According to [1], this can be worked around by linking some folders:

ln -s /usr/include/linux /usr/include/x86_64-linux-musl/
ln -s /usr/include/asm-generic /usr/include/x86_64-linux-musl/
ln -s /usr/include/x86_64-linux-gnu/asm /usr/include/x86_64-linux-musl/

Please find a patch to use musl-gcc in SanityCheck attached. Logs from 
the CI run are in [2]. It has this in the configure phase:


[13:19:52.712] Using 'CC' from environment with value: 'ccache musl-gcc'
[13:19:52.712] C compiler for the host machine: ccache musl-gcc (gcc 
10.2.1 "cc (Debian 10.2.1-6) 10.2.1 20210110")

[13:19:52.712] C linker for the host machine: musl-gcc ld.bfd 2.35.2
[13:19:52.712] Using 'CC' from environment with value: 'ccache musl-gcc'

So meson picks up musl-gcc properly. I also double checked that without 
the links above, the build does indeed fail with the linux/fs.h error.


I assume the installation of musl-tools should be done in the 
pg-vm-images repo instead of the additional script here?


Best,

Wolfgang

[1]: 
https://debian-bugs-dist.debian.narkive.com/VlFkLigg/bug-789789-musl-fails-to-compile-stuff-that-depends-on-kernel-headers

[2]: https://cirrus-ci.com/task/5741892590108672From 4a69d9851e7bbd7cd521d236847af9ebf5e6253b Mon Sep 17 00:00:00 2001
From: Wolfgang Walther 
Date: Sun, 31 Mar 2024 15:17:43 +0200
Subject: [PATCH] Build SanityCheck against musl

---
 .cirrus.tasks.yml | 9 +
 1 file changed, 9 insertions(+)

diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml
index 72f553e69f4..5815c51abbe 100644
--- a/.cirrus.tasks.yml
+++ b/.cirrus.tasks.yml
@@ -69,6 +69,7 @@ task:
 CCACHE_DIR: ${CIRRUS_WORKING_DIR}/ccache_dir
 # no options enabled, should be small
 CCACHE_MAXSIZE: "150M"
+CC: ccache musl-gcc
 
   # While containers would start up a bit quicker, building is a bit
   # slower. This way we don't have to maintain a container image.
@@ -77,6 +78,14 @@ task:
   ccache_cache:
 folder: $CCACHE_DIR
 
+  setup_additional_packages_script: |
+apt-get update
+DEBIAN_FRONTEND=noninteractive apt-get -y install musl-tools
+ln -s -t /usr/include/x86_64-linux-musl/ \
+  /usr/include/linux \
+  /usr/include/asm-generic \
+  /usr/include/x86_64-linux-gnu/asm
+
   create_user_script: |
 useradd -m postgres
 chown -R postgres:postgres .
-- 
2.44.0



Re: pgbnech: allow to cancel queries during benchmark

2024-03-31 Thread Yugo NAGATA
On Sat, 30 Mar 2024 14:35:37 +0100
Alvaro Herrera  wrote:

> Due to commit 61461a300c1c, this patch needs to be reworked.

Thank you for pointing out this.

Although cfbot doesn't report any failure, PQcancel is now
deprecated and insecure. I'll consider that too while fixing a
problem I found in [1].

[1] 
https://www.postgresql.org/message-id/20240207101903.b5846c25808f64a91ee2e7a2%40sraoss.co.jp

Regards,
Yugo Nagata

> -- 
> Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
> "La fuerza no está en los medios físicos
> sino que reside en una voluntad indomable" (Gandhi)


-- 
Yugo NAGATA 




Add column name to error description

2024-03-31 Thread Marcos Pegoraro
This is my first patch, so sorry if I miss something.

If I have a function which returns lots of columns and any of these columns
returns a wrong type it's not easy to see which one is that column because
it points me only to its position, not its name. So, this patch only adds
that column name, just that.

create function my_f(a integer, b integer) returns table(first_col integer,
lots_of_cols_later numeric) language plpgsql as $function$
begin
  return query select a, b;
end;$function$;

select * from my_f(1,1);
--ERROR: structure of query does not match function result type
--Returned type integer does not match expected type numeric in column 2.

For a function which has just 2 columns it is easy, but if it returns a
hundred columns, which one is the 66th column?

My patch just adds column name to that description message.
--ERROR: structure of query does not match function result type
--Returned type integer does not match expected type numeric in column 2-
lots_of_cols_later.

regards
Marcos


Add column name to error description.patch
Description: Binary data


Re: Security lessons from liblzma

2024-03-31 Thread Joe Conway

On 3/30/24 21:52, Bruce Momjian wrote:

On Sat, Mar 30, 2024 at 07:54:00PM -0400, Joe Conway wrote:

Virtually every RPM source, including ours, contains out of tree patches
that get applied on top of the release tarball. At least for the PGDG
packages, it would be nice to integrate them into our git repo as build
options or whatever so that the packages could be built without any patches
applied to it. Add a tarball that is signed and traceable back to the git
tag, and we would be in a much better place than we are now.


How would someone access the out-of-tree patches?  I think Debian
includes the patches in its source tarball.


I am saying maybe those patches should be eliminated in favor of our 
tree including build options that would produce the same result.


For example, these patches are applied to our release tarball files when 
the RPM is being built for pg16 on RHEL 9:


-
https://git.postgresql.org/gitweb/?p=pgrpms.git;a=blob;f=rpm/redhat/main/non-common/postgresql-16/main/postgresql-16-rpm-pgsql.patch;h=d9b6d12b7517407ac81352fa325ec91b05587641;hb=HEAD

https://git.postgresql.org/gitweb/?p=pgrpms.git;a=blob;f=rpm/redhat/main/non-common/postgresql-16/main/postgresql-16-var-run-socket.patch;h=f2528efaf8f4681754b20283463eff3e14eedd39;hb=HEAD

https://git.postgresql.org/gitweb/?p=pgrpms.git;a=blob;f=rpm/redhat/main/non-common/postgresql-16/main/postgresql-16-conf.patch;h=da28ed793232316dd81fdcbbe59a6479b054a364;hb=HEAD

https://git.postgresql.org/gitweb/?p=pgrpms.git;a=blob;f=rpm/redhat/main/non-common/postgresql-16/main/postgresql-16-perl-rpath.patch;h=748c42f0ec2c9730af3143e90e5b205c136f40d9;hb=HEAD
-

Nothing too crazy, but wouldn't it be better if no patches were required 
at all?


Ideally we should have reproducible builds so that starting with our 
tarball (which is traceable back to the git release tag) one can easily 
obtain the same binary as what the RPMs/DEBs deliver.


--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com





Re: Statistics Import and Export

2024-03-31 Thread Corey Huinker
>
> That will make it *really* hard for any form of automation or drivers of
> this. The information needs to go somewhere where such tools can easily
> consume it, and an informational message during runtime (which is also
> likely to be translated in many environments) is the exact opposite of that.
>

Having given this some thought, I'd be inclined to create a view,
pg_stats_missing, with the same security barrier as pg_stats, but looking
for tables that lack stats on at least one column, or lack stats on an
extended statistics object.

Table structure would be

schemaname name
tablename name
attnames text[]
ext_stats text[]
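
A minimal sketch of what the defining query could look like, covering only the
column-statistics half and ignoring extended statistics and the security
barrier for brevity (sketch only, not part of the proposal):

-- Sketch only: tables (or matviews) with at least one analyzable column
-- that has no corresponding row in pg_stats.
SELECT n.nspname AS schemaname,
       c.relname AS tablename,
       array_agg(a.attname) AS attnames
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
JOIN pg_attribute a ON a.attrelid = c.oid
                   AND a.attnum > 0
                   AND NOT a.attisdropped
LEFT JOIN pg_stats s ON s.schemaname = n.nspname
                    AND s.tablename = c.relname
                    AND s.attname = a.attname
WHERE c.relkind IN ('r', 'm')
  AND s.attname IS NULL
GROUP BY n.nspname, c.relname;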


The informational message, if it changes at all, could reference this new
view as the place to learn about how well the stats import went.

vacuumdb might get a --missing-only option in addition to
--analyze-in-stages.

For that matter, we could add --analyze-missing options to pg_restore and
pg_upgrade to do the mopping up themselves.


Re: ALTER TABLE SET ACCESS METHOD on partitioned tables

2024-03-31 Thread Justin Pryzby
On Sun, Mar 31, 2024 at 12:00:00PM +0300, Alexander Lakhin wrote:
> Hello Alvaro,
> 
> 28.03.2024 18:58, Alvaro Herrera wrote:
> > Grumble.  I don't like initialization at declare time, so far from the
> > code that depends on the value.  But the alternative would have been to
> > assign right where this block starts, an additional line.  I pushed it
> > like you had it.
> 
> I've stumbled upon a test failure caused by the test query added in that
> commit:
> +ERROR:  deadlock detected
> +DETAIL:  Process 3076180 waits for AccessShareLock on relation 1259 of 
> database 16386; blocked by process 3076181.
> +Process 3076181 waits for AccessShareLock on relation 2601 of database 
> 16386; blocked by process 3076180.

I think this means that, although it was cute to use pg_am in the reproducer
given in the problem report, it's not a good choice to use here in the
sql regression tests.

-- 
Justin




Re: Security lessons from liblzma

2024-03-31 Thread Michael Banck
On Sat, Mar 30, 2024 at 09:52:47PM -0400, Bruce Momjian wrote:
> On Sat, Mar 30, 2024 at 07:54:00PM -0400, Joe Conway wrote:
> > Virtually every RPM source, including ours, contains out of tree patches
> > that get applied on top of the release tarball. At least for the PGDG
> > packages, it would be nice to integrate them into our git repo as build
> > options or whatever so that the packages could be built without any patches
> > applied to it. Add a tarball that is signed and traceable back to the git
> > tag, and we would be in a much better place than we are now.
> 
> How would someone access the out-of-tree patches?  I think Debian
> includes the patches in its source tarball.

If you ask where they are maintained, the answer is here:

https://salsa.debian.org/postgresql/postgresql/-/tree/17/debian/patches?ref_type=heads

the other major versions have their own branch.


Michael




Re: [PATCH] kNN for btree

2024-03-31 Thread Andrey M. Borodin



> On 15 Jan 2024, at 13:11, Anton A. Melnikov  wrote:
> 
> If there are any ideas pro and contra would be glad to discuss them.

Hi, Anton!

This is kind of ancient thread. I've marked CF entry [0] as "Needs review" and 
you as an author (seems like you are going to be a point of correspondence on 
this feature).

At this point it's obvious that the feature won't make it to 17, so let's move 
to the July CF. Given 7 years of history in this thread, you might want to start a
new one with a summary of this thread. This will attract more reviewers;
otherwise they will have to dig through it on their own.

Thanks!


Best regards, Andrey Borodin.

[0] https://commitfest.postgresql.org/47/4871/



Re: logicalrep_worker_launch -- counting/checking the worker limits

2024-03-31 Thread Andrey M. Borodin



> On 15 Aug 2023, at 07:38, Peter Smith  wrote:
> 
> A rebase was needed due to a recent push [1].
> 
> PSA v3.


> On 14 Jan 2024, at 10:43, vignesh C  wrote:
> 
> I have changed the status of the patch to "Waiting on Author" as
> Amit's queries at [1] have not been verified and concluded. Please
> feel free to address them and change the status back again.

Hi Peter!

Are you still interested in this thread? If so, please post an answer to
Amit's question.
If you are not interested, please withdraw the CF entry [0].

Thanks!


Best regards, Andrey Borodin.

[0] https://commitfest.postgresql.org/47/4499/



Re: Wrong results with grouping sets

2024-03-31 Thread Andrey M. Borodin



> On 11 Jan 2024, at 20:10, vignesh C  wrote:
> 
> I have changed the status of the patch to "Waiting on Author" as Tom
> Lane's comments have not yet been addressed, feel free to address them
> and update the commitfest entry accordingly.

This CF entry seems to be a fix for genuinely unexpected behaviour. But it seems
like we need another fix.
Richard, Alena, what do you think? Should we mark CF entry [0] "RwF" or leave
it to wait for a better fix?

Best regards, Andrey Borodin.


[0] https://commitfest.postgresql.org/47/4583/



Re: ALTER TABLE SET ACCESS METHOD on partitioned tables

2024-03-31 Thread Alexander Lakhin

Hello Alvaro,

28.03.2024 18:58, Alvaro Herrera wrote:

Grumble.  I don't like initialization at declare time, so far from the
code that depends on the value.  But the alternative would have been to
assign right where this block starts, an additional line.  I pushed it
like you had it.


I've stumbled upon a test failure caused by the test query added in that
commit:
--- .../src/test/regress/expected/create_am.out  2024-03-28 12:14:11.700764888 -0400
+++ .../src/test/recovery/tmp_check/results/create_am.out 2024-03-31 03:10:28.172244122 -0400
@@ -549,7 +549,10 @@
 ERROR:  access method "btree" is not of type TABLE
 -- Other weird invalid cases that cause problems
 CREATE FOREIGN TABLE fp PARTITION OF pg_am DEFAULT SERVER x;
-ERROR:  "pg_am" is not partitioned
+ERROR:  deadlock detected
+DETAIL:  Process 3076180 waits for AccessShareLock on relation 1259 of 
database 16386; blocked by process 3076181.
+Process 3076181 waits for AccessShareLock on relation 2601 of database 16386; 
blocked by process 3076180.
+HINT:  See server log for query details.
 -- Drop table access method, which fails as objects depends on it
 DROP ACCESS METHOD heap2;
 ERROR:  cannot drop access method heap2 because other objects depend on it

027_stream_regress_primary.log contains:
2024-03-31 03:10:26.728 EDT [3076181] pg_regress/vacuum LOG: statement: VACUUM 
FULL pg_class;
...
2024-03-31 03:10:26.797 EDT [3076180] pg_regress/create_am LOG: statement: CREATE FOREIGN TABLE fp PARTITION OF pg_am DEFAULT SERVER x;

...
2024-03-31 03:10:28.183 EDT [3076181] pg_regress/vacuum LOG: statement: VACUUM 
FULL pg_database;

This simple demo confirms the issue:
for ((i=1;i<=20;i++)); do
echo "iteration $i"
echo "VACUUM FULL pg_class;" | psql >psql-1.log &
echo "CREATE FOREIGN TABLE fp PARTITION OF pg_am DEFAULT SERVER x;" | psql 
>psql-2.log &
wait
done

...
iteration 15
ERROR:  "pg_am" is not partitioned
iteration 16
ERROR:  deadlock detected
DETAIL:  Process 2556377 waits for AccessShareLock on relation 1259 of database 
16384; blocked by process 2556378.
Process 2556378 waits for AccessShareLock on relation 2601 of database 16384; 
blocked by process 2556377.
HINT:  See server log for query details.
...

Best regards,
Alexander