Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }
On 12.08.23 04:35, Jeff Davis wrote:
> The attached patch implements a new SEARCH clause for CREATE FUNCTION. The
> SEARCH clause controls the search_path used when executing functions that
> were created without a SET clause.

I don't understand this. This adds a new option for cases where the existing option wasn't specified. Why not specify the existing option then? Is it not good enough? Can we improve it?
Re: Extract numeric field in JSONB more effectively
Update with the correct patch.

v8-0001-optimize-casting-jsonb-to-a-given-type.patch Description: Binary data
Re: Remove distprep
On 14.07.23 11:48, Tom Lane wrote:
> Peter Eisentraut writes:
>> Ah, there was a reason. The headerscheck and cpluspluscheck targets need
>> jsonpath_gram.h to be built first.
>
> Which previously happened indirectly somehow?

I have fixed this in the new patch version. I also fixed the issue that Álvaro reported nearby.

> Have we yet run this concept past the packagers list? I'm still wondering
> if they will raise any objection to getting rid of all the prebuilt files.

So far there hasn't been any feedback from packagers that would appear to affect the outcome here.

> Also, I personally don't like the fact that you have removed the
> distinction between distclean and maintainer-clean. I use
> distclean-and-reconfigure quite a lot to avoid having to rebuild
> bison/flex outputs. This patch seems to have destroyed that workflow
> optimization in return for not much.

The distclean target has a standard meaning that is baked into downstream build systems, so it would be pretty disruptive if distclean didn't actually clean everything back down to what was in the distribution tarball. We could add a different clean target that cleans not quite everything, if you can suggest a definition of what that should do.
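For reference, the layering being discussed follows the usual GNU make conventions; a sketch of what an intermediate target could look like (target names, variables, and file lists here are illustrative assumptions, not part of the patch):

```make
# Hypothetical sketch of clean-target layering, GNU make conventions.
# "mostlyclean" removes rebuilt objects but keeps expensive generated files.
mostlyclean:
	rm -f $(OBJS)

# distclean keeps its standard meaning: back to distribution-tarball state,
# including removal of configure output.
distclean: mostlyclean
	rm -f config.status config.cache GNUmakefile

# Only maintainer-clean removes the bison/flex outputs that are slow to
# regenerate, which is what the distclean-and-reconfigure workflow relies on.
maintainer-clean: distclean
	rm -f gram.c scan.c
```

The workflow question above is exactly which of these layers the bison/flex outputs should live in.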
Re: WIP: new system catalog pg_wait_event
On Wed, Aug 16, 2023 at 07:04:53AM +0200, Drouvot, Bertrand wrote:
> I'd prefer the singular form. There are a lot of places where it's already
> used (pg_database, pg_user, pg_namespace... to name a few) and it looks
> like uses of the plural form are the exceptions.

Okay, fine by me. Also, I would remove the "wait_event_" prefixes from the field names for the attribute names.

> Yeah, now that af720b4c50 is done, I'll add the custom wait events handling
> in v6.

Thanks. I guess that the hash tables had better remain local to wait_event.c.
--
Michael

signature.asc Description: PGP signature
Re: Extract numeric field in JSONB more effectively
On Tue, Aug 15, 2023 at 1:24 PM Pavel Stehule wrote:
> Hi
>
> On Tue, Aug 15, 2023 at 5:24 AM, Andy Fan wrote:
>>
>> jsonb_extract_xx_type just cares about the argtype, but
>> 'explain select xx' will still access the const->constvalue.
>> const->constvalue is 0, which is set by makeNullConst currently,
>> and that is ok for the currently supported types.
>>
>> The exception is the numeric data type, whose constvalue can't be 0,
>> so I hacked it with the lines below. Maybe not good enough, but I
>> have no better solution now.
>>
>> +	Const *target = makeNullConst(fexpr->funcresulttype, -1, InvalidOid);
>>
>> +	/*
>> +	 * Since all the above functions are strict, we can't input
>> +	 * a NULL value.
>> +	 */
>> +	target->constisnull = false;
>> +
>> +	Assert(target->constbyval || target->consttype == NUMERICOID);
>> +
>> +	/* Mock a valid datum for !constbyval type. */
>> +	if (fexpr->funcresulttype == NUMERICOID)
>> +		target->constvalue = DirectFunctionCall1(numeric_in, CStringGetDatum("0"));
>
> Personally I think this workaround is too dirty, and it would be better to
> use a strict function (I believe the overhead for NULL values is
> acceptable).

In the patch v8, I created a new routine named makeDummyConst, which sits right beside makeNullConst. It may be helpful to some extent: a) the code is self-documenting for the user/reader; b) we have a central place to maintain this routine.

Besides the framework, the trouble for a reviewer may be whether the code has some corner-case issue or behavior change, especially since I did some code refactoring when working on jsonb_extract_path. The attached test.sql is designed for this: I have compared the results between the master and patched versions, and I think reviewers can do some extra testing with it.

v8 is the finished version in my mind, so I think it is ready for review now.

--
Best Regards
Andy Fan

v8-0001-optimize-casting-jsonb-to-a-given-type.patch.bak Description: Binary data
test.sql Description: Binary data
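To make the discussed shape concrete, here is a standalone sketch of a makeDummyConst-style helper. The types and the helper name are simplified stand-ins (the real routine would live beside makeNullConst in makefuncs.c and operate on PostgreSQL's actual Const node); the point illustrated is that a pass-by-reference result type such as numeric must get a real dummy datum, so that code inspecting constvalue (e.g. EXPLAIN) never dereferences a 0 "pointer".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for PostgreSQL's Oid/Datum/Const. */
typedef unsigned int Oid;
typedef uintptr_t Datum;

#define NUMERICOID 1700			/* OID of numeric in pg_type */
#define INT4OID    23			/* OID of int4 in pg_type */

typedef struct Const
{
	Oid		consttype;
	int		consttypmod;
	Datum	constvalue;
	bool	constisnull;
	bool	constbyval;
} Const;

/*
 * Sketch: populate a non-null Const carrying a harmless dummy value.  For a
 * by-value type any bit pattern is safe; for a pass-by-reference type the
 * value must point at real storage (the v8 patch uses numeric_in("0")).
 */
static Const *
make_dummy_const(Oid consttype, int consttypmod, Const *c)
{
	static const char dummy_numeric[] = "0";	/* stand-in storage */

	c->consttype = consttype;
	c->consttypmod = consttypmod;
	c->constisnull = false;		/* the callers' functions are strict */
	c->constbyval = (consttype != NUMERICOID);	/* simplified rule */
	if (c->constbyval)
		c->constvalue = (Datum) 0;
	else
		c->constvalue = (Datum) (uintptr_t) dummy_numeric;
	return c;
}
```

The self-documenting benefit argued above is that every caller says `make_dummy_const(...)` instead of repeating the constisnull/constvalue dance inline.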
some code cleanup in index.c and indexcmds.c
Here is a patch set with some straightforward code cleanup in index.c and indexcmds.c and some adjacent places.

First, I have added const qualifiers to all the function prototypes as appropriate. This didn't require any additional casts or tricks.

Then, I have renamed some function arguments for clarity. For example, several functions had an argument like

    Oid *classObjectId

This is confusing in more than one way: the "class" is actually the operator class, not the pg_class entry, and the argument is actually an array, not a single value as the name would suggest. The amended version

    const Oid *opclassIds

should be much clearer. Also, about half the code in these files already used the better naming system, so this change also makes everything within these files more consistent.

Third, I removed some line breaks around the places that I touched anyway. In some cases, with the renaming, the lines didn't seem long enough anymore to warrant a line break.

From dfa595a003ea237a7431603824b9e5f28c5b20fa Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Wed, 16 Aug 2023 07:45:28 +0200
Subject: [PATCH 1/3] Add const decorations in index.c and indexcmds.c and
 some adjacent places.

This especially makes it easier to understand for some complicated function signatures which are the input and the output arguments.
---
 src/backend/bootstrap/bootstrap.c |  2 +-
 src/backend/catalog/heap.c        |  2 +-
 src/backend/catalog/index.c       | 66 +++
 src/backend/catalog/namespace.c   | 10 ++---
 src/backend/commands/indexcmds.c  | 58 +--
 src/include/bootstrap/bootstrap.h |  2 +-
 src/include/catalog/heap.h        |  2 +-
 src/include/catalog/index.h       | 28 ++---
 src/include/catalog/namespace.h   | 10 ++---
 src/include/commands/defrem.h     |  8 ++--
 10 files changed, 94 insertions(+), 94 deletions(-)

diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index 49e956b2c5..4cc2efa95c 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -914,7 +914,7 @@ AllocateAttribute(void)
 void
 index_register(Oid heap,
 			   Oid ind,
-			   IndexInfo *indexInfo)
+			   const IndexInfo *indexInfo)
 {
 	IndexList  *newind;
 	MemoryContext oldcxt;
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 4c30c7d461..c89b0bfa75 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -697,7 +697,7 @@ void
 InsertPgAttributeTuples(Relation pg_attribute_rel,
 						TupleDesc tupdesc,
 						Oid new_rel_oid,
-						Datum *attoptions,
+						const Datum *attoptions,
 						CatalogIndexState indstate)
 {
 	TupleTableSlot **slot;
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index eb2b8d84c3..2a30eb3a32 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -106,20 +106,20 @@ typedef struct
 /* non-export function prototypes */
 static bool relationHasPrimaryKey(Relation rel);
 static TupleDesc ConstructTupleDescriptor(Relation heapRelation,
-										  IndexInfo *indexInfo,
-										  List *indexColNames,
+										  const IndexInfo *indexInfo,
+										  const List *indexColNames,
 										  Oid accessMethodObjectId,
-										  Oid *collationObjectId,
-										  Oid *classObjectId);
+										  const Oid *collationObjectId,
+										  const Oid *classObjectId);
 static void InitializeAttributeOids(Relation indexRelation,
 									int numatts, Oid indexoid);
-static void AppendAttributeTuples(Relation indexRelation, Datum *attopts);
+static void AppendAttributeTuples(Relation indexRelation, const Datum *attopts);
 static void UpdateIndexRelation(Oid indexoid, Oid heapoid,
 								Oid parentIndexId,
-								IndexInfo *indexInfo,
-								Oid *collation
Re: Synchronizing slots from primary to standby
On Mon, Aug 14, 2023 at 8:38 PM shveta malik wrote:
> On Mon, Aug 14, 2023 at 3:22 PM shveta malik wrote:
> > On Tue, Aug 8, 2023 at 11:11 AM Drouvot, Bertrand wrote:
> > > Hi,
> > >
> > > On 8/8/23 7:01 AM, shveta malik wrote:
> > > > On Mon, Aug 7, 2023 at 3:17 PM Drouvot, Bertrand wrote:
> > > > > On 8/4/23 1:32 PM, shveta malik wrote:
> > > > > > On Fri, Aug 4, 2023 at 2:44 PM Drouvot, Bertrand wrote:
> > > > > > > On 7/28/23 4:39 PM, Bharath Rupireddy wrote:
> > > >
> > > > Agreed. That is why in the v10, v11 patches, we have different infra
> > > > for the sync-slot worker, i.e. it is not relying on "logical
> > > > replication background worker" anymore.
> > >
> > > yeah saw that, looks like the right way to go to me.
> > >
> > > > > Maybe we should start some tests/benchmark with only one sync
> > > > > worker to get numbers and start from there?
> > > >
> > > > Yes, we can do that performance testing to figure out the difference
> > > > between the two modes. I will try to get some statistics on this.
> > >
> > > Great, thanks!
> >
> > We (myself and Ajin) performed the tests to compute the lag in standby
> > slots as compared to primary slots with different numbers of slot-sync
> > workers configured.
> >
> > 3 DBs were created, each with 30 tables and each table having one
> > logical-pub/sub configured. So this made a total of 90 logical
> > replication slots to be synced. Then the workload was run for approx 10
> > mins. During this workload, at regular intervals, primary and standby
> > slots' lsns were captured (from pg_replication_slots) and compared. At
> > each capture, the intent was to know how much each standby slot is
> > lagging behind the corresponding primary slot, by taking the distance
> > between the confirmed_flush_lsn of the primary and standby slot.
Then we took the average (integer value) of this distance over the span of the 10-min workload, and this is what we got:

> I have attached the scripts for schema-setup, running the workload, and
> capturing lag. Please go through the Readme for details.

I did some more tests for 10, 20 and 40 slots to calculate the average lsn distance between slots, comparing 1 worker and 3 workers. My results are as follows:

10 slots
1 worker: 5529.75527426 (average lsn distance between primary and standby per slot)
3 workers: 2224.57589134

20 slots
1 worker: 9592.87234043
3 workers: 3194.6293

40 slots
1 worker: 20566.093
3 workers: 7885.80952381

90 slots
1 worker: 36706.8405797
3 workers: 10236.6393162

regards,
Ajin Cherian
Fujitsu Australia
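For reference, the per-capture distance described above can be computed from pg_replication_slots with pg_wal_lsn_diff(); a sketch (the LSN literals and filter are illustrative, not from the actual test scripts):

```sql
-- On the primary and on the standby, capture each logical slot's position:
SELECT slot_name, confirmed_flush_lsn
FROM pg_replication_slots
WHERE slot_type = 'logical';

-- The per-slot lag is then the byte difference between the two captured
-- LSNs, e.g.:
SELECT pg_wal_lsn_diff('0/3000060'::pg_lsn, '0/3000000'::pg_lsn) AS lag_bytes;
-- lag_bytes = 96 for this pair
```

Averaging lag_bytes per slot over all captures gives numbers of the kind reported above.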
Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning
On Fri, Aug 4, 2023 at 2:11 PM Ashutosh Bapat wrote:
> Attached patchset fixing those.
> 0001 - patch to report planning memory, with the explain.out regression
> output fix. We may consider committing this as well.
> 0002 - with your comment addressed above.

0003 - Added this patch for handling SpecialJoinInfos for inner joins. These SpecialJoinInfos are created on the fly for parent joins. They can be created on the fly for child joins as well without requiring any translations. Thus they do not require any new memory. This patch is intended to be merged into 0002 finally.

Added to the upcoming commitfest https://commitfest.postgresql.org/44/4500/

--
Best Wishes,
Ashutosh Bapat

From 1c39902456e715902e0e24b7cfdaf7135c472ae8 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat
Date: Mon, 7 Aug 2023 11:00:53 +0530
Subject: [PATCH 3/3] Do not translate dummy SpecialJoinInfos

Plain inner joins do not have SpecialJoinInfos associated with them.
They are crafted on the fly for parent joins. Do the same for child
joins.

Ashutosh Bapat
---
 src/backend/optimizer/path/costsize.c | 37 ++
 src/backend/optimizer/path/joinrels.c | 72 ++-
 src/include/optimizer/paths.h         |  2 +
 3 files changed, 54 insertions(+), 57 deletions(-)

diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index d6ceafd51c..43e0c9621c 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -5021,23 +5021,7 @@ compute_semi_anti_join_factors(PlannerInfo *root,
 	/*
 	 * Also get the normal inner-join selectivity of the join clauses.
 	 */
-	norm_sjinfo.type = T_SpecialJoinInfo;
-	norm_sjinfo.min_lefthand = outerrel->relids;
-	norm_sjinfo.min_righthand = innerrel->relids;
-	norm_sjinfo.syn_lefthand = outerrel->relids;
-	norm_sjinfo.syn_righthand = innerrel->relids;
-	norm_sjinfo.jointype = JOIN_INNER;
-	norm_sjinfo.ojrelid = 0;
-	norm_sjinfo.commute_above_l = NULL;
-	norm_sjinfo.commute_above_r = NULL;
-	norm_sjinfo.commute_below_l = NULL;
-	norm_sjinfo.commute_below_r = NULL;
-	/* we don't bother trying to make the remaining fields valid */
-	norm_sjinfo.lhs_strict = false;
-	norm_sjinfo.semi_can_btree = false;
-	norm_sjinfo.semi_can_hash = false;
-	norm_sjinfo.semi_operators = NIL;
-	norm_sjinfo.semi_rhs_exprs = NIL;
+	make_dummy_sjinfo(&norm_sjinfo, outerrel, innerrel);
 
 	nselec = clauselist_selectivity(root,
 									joinquals,
@@ -5190,23 +5174,8 @@ approx_tuple_count(PlannerInfo *root, JoinPath *path, List *quals)
 	/*
 	 * Make up a SpecialJoinInfo for JOIN_INNER semantics.
 	 */
-	sjinfo.type = T_SpecialJoinInfo;
-	sjinfo.min_lefthand = path->outerjoinpath->parent->relids;
-	sjinfo.min_righthand = path->innerjoinpath->parent->relids;
-	sjinfo.syn_lefthand = path->outerjoinpath->parent->relids;
-	sjinfo.syn_righthand = path->innerjoinpath->parent->relids;
-	sjinfo.jointype = JOIN_INNER;
-	sjinfo.ojrelid = 0;
-	sjinfo.commute_above_l = NULL;
-	sjinfo.commute_above_r = NULL;
-	sjinfo.commute_below_l = NULL;
-	sjinfo.commute_below_r = NULL;
-	/* we don't bother trying to make the remaining fields valid */
-	sjinfo.lhs_strict = false;
-	sjinfo.semi_can_btree = false;
-	sjinfo.semi_can_hash = false;
-	sjinfo.semi_operators = NIL;
-	sjinfo.semi_rhs_exprs = NIL;
+	make_dummy_sjinfo(&sjinfo, path->outerjoinpath->parent,
+					  path->innerjoinpath->parent);
 
 	/* Get the approximate selectivity */
 	foreach(l, quals)
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 1a341c9b73..540eda612f 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -44,7 +44,8 @@ static void try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1,
 								   List *parent_restrictlist);
 static void build_child_join_sjinfo(PlannerInfo *root,
 									SpecialJoinInfo *parent_sjinfo,
-									Relids left_relids, Relids right_relids,
+									RelOptInfo *left_child,
+									RelOptInfo *right_child,
 									SpecialJoinInfo *child_sjinfo);
 static void free_child_sjinfo_members(SpecialJoinInfo *child_sjinfo);
 static void compute_partition_bounds(PlannerInfo *root, RelOptInfo *rel1,
@@ -655,6 +656,32 @@ join_is_legal(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
 	return true;
 }
 
+/*
+ * make_dummy_sjinfo
+ *	  Populate the given SpecialJoinInfo for a plain inner join, between rel1
+ *	  and rel2, which does not have a SpecialJoinInfo associated with it.
+ */
+void
+make_dummy_sjinfo(SpecialJoinInfo *sjinfo, RelOptInfo *rel1, RelOptInfo *rel2)
+{
+	sjinfo->type = T_SpecialJoinInfo;
+	sjinfo->min_lefthand = rel1->relids;
+	sjinfo->min_righthand = rel2->relids;
+	sjinfo->syn_lefthand = rel1->relids;
+	sjinfo->syn_righthand = rel2->relids;
+	sjinfo->jointype = JOIN_INNER;
+	sjinfo->ojrelid = 0;
+	sjinfo->commute_above_l = NULL;
+	sjinfo->commute_above_r = NULL;
+	sjinfo->commute_below_l = NULL;
+	sjinfo->commute_below_r = NULL;
+	/* we don'
Re: WIP: new system catalog pg_wait_event
Hi,

On 8/14/23 6:37 AM, Michael Paquier wrote:
> On Thu, Aug 10, 2023 at 08:09:34PM +0200, Drouvot, Bertrand wrote:
>> Agree that's worth it given the fact that iterating one more time is not
>> that costly here.
>
> I have reviewed v4, and finished by putting my hands on it to see what I
> could do.

Thanks!

> There are two things that we could do:
> - Hide that behind a macro defined in wait_event_funcs.c.
> - Feed the data generated here into a static structure, like:
> +static const struct
> +{
> +	const char *type;
> +	const char *name;
> +	const char *description;
> +}
> After experimenting with both, I've found the latter a tad cleaner, so the
> attached version does that.

Yeah, looking at what you've done in v5, I also agree that's better than what was done in v4 (I also think it will be easier to maintain).

> I am not sure that "pg_wait_event" is a good idea for the name of the new
> view. How about "pg_wait_events" instead, in plural form? There is more
> than one wait event listed.

I'd prefer the singular form. There are a lot of places where it's already used (pg_database, pg_user, pg_namespace... to name a few) and it looks like uses of the plural form are the exceptions.

> One log entry in Solution.pm has missed the addition of a reference to
> wait_event_funcs_data.c.

Oh right, thanks for fixing it in v5.

> And.. src/backend/Makefile missed two updates for maintainer-clean & co, no?

Oh right, thanks for fixing it in v5.

> One thing that the patch is still missing is the handling of custom wait
> events for extensions.

Yeah, now that af720b4c50 is done, I'll add the custom wait events handling in v6.

> So this still requires more code. I was thinking about listing these
> events as:
> - Type: "Extension"
> - Name: What a module has registered.
> - Description: "Custom wait event \"%Name%\" defined by extension".

That sounds good to me, I'll do that.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
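As a minimal standalone sketch of the static-structure approach chosen for v5: the array below stands in for the generated wait_event_funcs_data.c contents (the entries here are illustrative, not the real generated data), and the lookup shows how a consumer such as the view's SRF could walk it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Sketch of the generated table: in the patch, an array of this shape is
 * produced at build time into wait_event_funcs_data.c and consumed by the
 * function backing the pg_wait_event view.  Entries are illustrative only.
 */
static const struct
{
	const char *type;
	const char *name;
	const char *description;
}			wait_events[] =
{
	{"Activity", "ArchiverMain", "Waiting in main loop of archiver process"},
	{"Client", "ClientRead", "Waiting to read data from the client"},
	{"Extension", "MyCustomWait", "Custom wait event \"MyCustomWait\" defined by extension"},
};

/* Look up one event's description by type and name; NULL if absent. */
static const char *
wait_event_description(const char *type, const char *name)
{
	for (size_t i = 0; i < sizeof(wait_events) / sizeof(wait_events[0]); i++)
	{
		if (strcmp(wait_events[i].type, type) == 0 &&
			strcmp(wait_events[i].name, name) == 0)
			return wait_events[i].description;
	}
	return NULL;
}
```

The "Extension" entry mirrors the description format proposed above for custom wait events.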
Re: remaining sql/json patches
On Tue, Aug 15, 2023 at 5:58 PM jian he wrote:
> Hi.
> in v11, json_query:
>
> +    The returned data_type has the same semantics
> +    as for constructor functions like json_objectagg;
> +    the default returned type is text.
> +    The ON EMPTY clause specifies the behavior if the
> +    path_expression yields no value at all; the
> +    default when ON ERROR is not specified is to return a
> +    null value.
>
> the default returned type is jsonb?

You are correct.

> Also in the above quote, the second last line should be ON EMPTY?

Correct too.

> Other than that, the doc looks good.

Thanks for the review. I will post a new version after finishing a few other improvements I am working on.

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
Re: pg_logical_emit_message() misses a XLogFlush()
Hi,

On 2023-08-16 12:37:21 +0900, Michael Paquier wrote:
> I won't fight much if there are objections to backpatching, but that's
> not really wise (no idea how much EDB's close flavor of BDR relies on
> that).

To be clear: I don't just object to backpatching, I also object to making existing invocations flush WAL in HEAD. I do not at all object to adding a parameter that indicates flushing, or a separate function to do so. The latter might be better, because it allows you to flush a group of messages, rather than a single one.

Greetings,

Andres Freund
Re: pg_logical_emit_message() misses a XLogFlush()
Hi,

On 2023-08-16 03:20:53 +0200, Tomas Vondra wrote:
> On 8/16/23 02:33, Andres Freund wrote:
> > On 2023-08-16 06:58:56 +0900, Michael Paquier wrote:
> >> On Tue, Aug 15, 2023 at 11:37:32AM +0200, Tomas Vondra wrote:
> >>> Shouldn't the flush be done only for non-transactional messages? The
> >>> transactional case will be flushed by regular commit flush.
> >>
> >> Indeed, that would be better. I am sending an updated patch.
> >>
> >> I'd like to backpatch that, would there be any objections to that?
> >
> > Yes, I object. This would completely cripple the performance of some
> > uses of logical messages - a slowdown of several orders of magnitude.
> > It's not clear to me that flushing would be the right behaviour if it
> > weren't released, but it certainly doesn't seem right to make such a
> > change in a minor release.
>
> So are you objecting to adding the flush in general, or just to the
> backpatching part?

Both, I think. I don't object to adding a way to trigger flushing, but I think it needs to be optional.

> IMHO we either guarantee durability of non-transactional messages, in
> which case this would be a clear bug - and I'd say a fairly serious one.
> I'm curious what the workload that'd see order of magnitude of slowdown
> does with logical messages, but even if such workload exists, would it
> really be enough to fix any other durability bug?

Not sure what you mean with the last sentence?

I've e.g. used non-transactional messages for:

- A non-transactional queuing system, where sometimes one would dump a portion of tables into messages, with something like
    SELECT pg_logical_emit_message(false, 'app:', to_json(r)) FROM r;
  Obviously flushing after every row would be bad. This is useful when you need to coordinate with other systems in a non-transactional way, e.g. letting other parts of the system know that files on disk (or in S3 or ...) were created/deleted, since a database rollback wouldn't unlink/revive the files.
- Audit logging, when you want to log in a way that isn't undone by rolling back a transaction - just flushing every pg_logical_emit_message() would increase the WAL flush rate many times, because instead of once per transaction, you'd now flush once per modified row. It'd basically make it impractical to use for such things.

- Optimistic locking. Emitting things that need to be locked on logical replicas, to be able to commit on the primary. A pre-commit hook would wait for the WAL to be replayed sufficiently - but only once per transaction, not once per object.

> Or perhaps we don't want to guarantee durability for such messages, in
> which case we don't need to fix it at all (even in master).

Well, I can see adding an option to flush, or perhaps a separate function to flush, to master.

> The docs are not very clear on what to expect, unfortunately. It says
> that non-transactional messages are "written immediately" which I could
> interpret in either way.

Yea, the docs certainly should be improved, regardless what we end up with.

Greetings,

Andres Freund
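As a sketch of the first two patterns above (the prefixes and table name are made-up examples; pg_logical_emit_message(transactional, prefix, content) is the real function and returns the LSN of the emitted record):

```sql
-- Queue-style dump: one non-transactional message per row.  Forcing a WAL
-- flush inside every call would mean one flush per row instead of one per
-- transaction.
SELECT pg_logical_emit_message(false, 'app:queue', r::text)
FROM my_table r;

-- Audit-style message: emitted non-transactionally, so it stays in WAL
-- even though the surrounding transaction rolls back.
BEGIN;
SELECT pg_logical_emit_message(false, 'audit', 'user 42 viewed record 7');
ROLLBACK;
```

The durability question in this thread is whether such records are guaranteed to be flushed to disk before a crash, not whether they are written to WAL at all.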
Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication
On Fri, Aug 11, 2023 at 11:45 PM Melih Mutlu wrote:
> Again, I couldn't reproduce the cases where you saw significantly degraded
> performance. I wonder if I'm missing something. Did you do anything not
> included in the test scripts you shared? Do you think v26-0001 will
> perform 84% worse than HEAD, if you try again? I just want to be sure that
> it was not a random thing.
> Interestingly, I also don't see an improvement in the above results as big
> as in your results when the inserts/tx ratio is smaller, even though it
> certainly is improved in such cases.

TEST ENVIRONMENTS

I am running the tests on a high-spec machine:

-- NOTE: Nobody else is using this machine during our testing, so there are no unexpected influences messing up the results.

Linux
Architecture: x86_64
CPU(s): 120
Thread(s) per core: 2
Core(s) per socket: 15

          total    used    free    shared    buff/cache    available
Mem:       755G    5.7G    737G       49M           12G         748G
Swap:      4.0G      0B    4.0G

~~~

The results I am seeing are not random. HEAD+v26-0001 is consistently worse than HEAD, but only for some settings. With these settings, I see bad results consistently every time using the dedicated test machine. Hou-san also reproduced bad results using a different high-spec machine. Vignesh also reproduced bad results using just his laptop, but in his case it did *not* occur every time. As discussed elsewhere, the problem is timing-related, so sometimes you may be lucky and sometimes not.

~

I expect you are running everything correctly, but if you are using just a laptop (like Vignesh) then, like him, you might need to try multiple times before you can hit the problem in your environment. Anyway, in case there is some other reason you are not seeing the bad results, I have re-attached the scripts and re-described every step below.

==

BUILDING

-- NOTE: I have a very minimal configuration without any optimization/debug flags etc.
See config.log
$ ./configure --prefix=/home/peter/pg_oss

-- NOTE: Of course, make sure to be running using the correct Postgres:
echo 'set environment variables for OSS work'
export PATH=/home/peter/pg_oss/bin:$PATH

-- NOTE: Be sure to do git stash or whatever so you don't accidentally build a patched version thinking it is the HEAD version.
-- NOTE: Be sure to do a full clean build and apply (or don't apply) v26-0001 according to the test you wish to run.

STEPS
1. sudo make clean
2. make
3. sudo make install

==

SCRIPTS & STEPS

SCRIPTS
testrun.sh
do_one_test_setup.sh
do_one_test_PUB.sh
do_one_test_SUB.sh

---

STEPS

Step-1. Edit the testrun.sh
tables=( 100 )
workers=( 2 4 8 16 )
size="0"
prefix="0816headbusy"  <-- edit to differentiate each test run

~

Step-2. Edit the do_one_test_PUB.sh
IF commit_counter = 1000 THEN  <-- edit this if needed. I wanted 1000 inserts/tx, so nothing to do

~

Step-3: Check nothing else is running. If yes, then clean it up.
[peter@localhost testing_busy]$ ps -eaf | grep postgres
peter 111924 100103 0 19:31 pts/0 00:00:00 grep --color=auto postgres

~

Step-4: Run the tests
[peter@localhost testing_busy]$ ./testrun.sh
num_tables=100, size=0, num_workers=2, run #1  <-- check the echo matches the config you set in Step-1
waiting for server to shut down done
server stopped
waiting for server to shut down done
server stopped
num_tables=100, size=0, num_workers=2, run #2
waiting for server to shut down done
server stopped
waiting for server to shut down done
server stopped
num_tables=100, size=0, num_workers=2, run #3
...

~

Step-5: Sanity check
When the test completes the current folder will be full of .log and .dat* files.
Check for sanity that no errors happened:
[peter@localhost testing_busy]$ cat *.log | grep ERROR
[peter@localhost testing_busy]$

~

Step-6: Collect the results
The results are output (by the do_one_test_SUB.sh) into the *.dat_SUB files. Use grep to extract them:
[peter@localhost testing_busy]$ cat 0816headbusy_100t_0_2w_*.dat_SUB | grep RESULT | grep -v duration | awk '{print $3}'
11742.019
12157.355
11773.807
11582.981
12220.962
12546.325
12210.713
12614.892
12015.489
13527.05

Repeat the grep for the other files:
$ cat 0816headbusy_100t_0_4w_*.dat_SUB | grep RESULT | grep -v duration | awk '{print $3}'
$ cat 0816headbusy_100t_0_8w_*.dat_SUB | grep RESULT | grep -v duration | awk '{print $3}'
$ cat 0816headbusy_100t_0_16w_*.dat_SUB | grep RESULT | grep -v duration | awk '{print $3}'

~

Step-7: Summarise the results
Now I just cut/paste the results from Step-6 into a spreadsheet and report the median of the runs. For example, for the above HEAD run, it was:

     2w     4w    8w    16w
1  11742  5996  1919  1582
2  12157  5960  1871  1469
3  11774  5926  2101  1571
4  11583  6155  1883  1671
5  12221  6310  1895  1707
6  12
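The spreadsheet step can also be scripted; a small sketch (my own helper, not part of the shared test scripts) that reads one RESULT value per line and prints the median:

```shell
# Hypothetical helper: read one numeric result per line on stdin and print
# the median.  For an even count it averages the two middle values.
median() {
  sort -n | awk '{ a[NR] = $1 }
    END {
      if (NR % 2) print a[(NR + 1) / 2];
      else printf "%g\n", (a[NR / 2] + a[NR / 2 + 1]) / 2
    }'
}

# Example: feed it the values extracted with the grep pipeline above.
printf '%s\n' 11742.019 12157.355 11773.807 | median   # prints 11773.807
```

It can be chained directly after the `grep RESULT | ... | awk '{print $3}'` pipeline from Step-6.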
Re: pg_logical_emit_message() misses a XLogFlush()
On Wed, Aug 16, 2023 at 03:20:53AM +0200, Tomas Vondra wrote:
> So are you objecting to adding the flush in general, or just to the
> backpatching part?
>
> IMHO we either guarantee durability of non-transactional messages, in
> which case this would be a clear bug - and I'd say a fairly serious one.
> I'm curious what the workload that'd see order of magnitude of slowdown
> does with logical messages, but even if such workload exists, would it
> really be enough to fix any other durability bug?

Yes, I also think that this is a pretty serious issue to not ensure durability in the non-transactional case if you have solutions designed around that.

> Or perhaps we don't want to guarantee durability for such messages, in
> which case we don't need to fix it at all (even in master).

I mean, just look at the first message of the thread I am mentioning at the top of this thread: it lists three cases where something like pg_logical_emit_message() was wanted. And it looks like a serious issue to me for the first two at least (out-of-order messages and inter-node communication), because an application may want to send something from node1 to node2, and node1 may just forget about it entirely if it crashes, finishing WAL redo at an earlier point than the record inserted.

> The docs are not very clear on what to expect, unfortunately. It says
> that non-transactional messages are "written immediately" which I could
> interpret in either way.

Agreed. The original thread never mentions "flush", "sync" or "durab".

I won't fight much if there are objections to backpatching, but that's not really wise (no idea how much EDB's close flavor of BDR relies on that).
--
Michael

signature.asc Description: PGP signature
Re: [PATCH] Add function to_oct
On Wed, Aug 16, 2023 at 12:17 AM Nathan Bossart wrote:
> On Tue, Aug 15, 2023 at 01:53:25PM +0700, John Naylor wrote:
> > -	*ptr = '\0';
> > +	Assert(base == 2 || base == 8 || base == 16);
> >
> > +	*ptr = '\0';
> >
> > Spurious whitespace change?
>
> I think this might just be a weird artifact of the diff algorithm.

Don't believe everything you think. :-)

```
	*ptr = '\0';

	do
```

to

```
	*ptr = '\0';
	do
```

> > -	char	buf[32];	/* bigger than needed, but reasonable */
> > +	char   *buf = palloc(sizeof(uint64) * BITS_PER_BYTE + 1);
> >
> > Why is this no longer allocated on the stack? Maybe needs a comment about
> > the size calculation.
>
> It really should be. IIRC I wanted to avoid passing a pre-allocated buffer
> to convert_to_base(), but I don't remember why. I fixed this in v5.

Now I'm struggling to understand why each and every instance has its own nominal buffer, passed down to the implementation. All we care about is the result -- is there some reason not to confine the buffer declaration to the general implementation? -- John Naylor EDB: http://www.enterprisedb.com
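To illustrate the shape being argued for, here is a standalone sketch: a single shared implementation owning a stack buffer sized for the worst case (a binary uint64, i.e. sizeof(uint64) * BITS_PER_BYTE digits plus the terminator). The function name matches the thread's convert_to_base, but the signature and surrounding code are simplified stand-ins, not the patch's actual internals.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BITS_PER_BYTE 8

/*
 * Sketch of a shared convert_to_base(): the working buffer is declared here,
 * in the one place that needs it, rather than in each per-base caller
 * (to_bin/to_oct/to_hex would just pass their base).
 */
static void
convert_to_base(uint64_t value, int base, char *out, size_t outlen)
{
	/* Worst case: 64 binary digits plus the NUL terminator. */
	char		buf[sizeof(uint64_t) * BITS_PER_BYTE + 1];
	char	   *ptr = buf + sizeof(buf);

	assert(base == 2 || base == 8 || base == 16);

	*--ptr = '\0';
	do
	{
		*--ptr = "0123456789abcdef"[value % base];
		value /= base;
	} while (value != 0);

	/* Copy only the digits actually produced into the caller's result. */
	strncpy(out, ptr, outlen);
	out[outlen - 1] = '\0';
}
```

With this shape, callers only supply where the result should go; no per-caller buffer sizing is needed.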
RE: [PoC] pg_upgrade: allow to upgrade publisher node
Dear Hou,

Thanks for posting the patch! I want to open a question to gather opinions from others.

> > It was primarily for upgrade purposes only. So, as we can't see a good
> > reason to go via pg_dump, let's do it in upgrade unless someone thinks
> > otherwise.
>
> Removed the new option in pg_dump and modified pg_upgrade to directly use
> the slot info to restore the slots in the new cluster.

In this version, creations of logical slots are serialized, whereas the old approach parallelized them per db. Do you think it should be parallelized again? I have tested locally and it seems harmless. Also, this approach allows logging the executed SQLs.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
Re: pg_logical_emit_message() misses a XLogFlush()
On 8/16/23 02:33, Andres Freund wrote:
> Hi,
>
> On 2023-08-16 06:58:56 +0900, Michael Paquier wrote:
>> On Tue, Aug 15, 2023 at 11:37:32AM +0200, Tomas Vondra wrote:
>>> Shouldn't the flush be done only for non-transactional messages? The
>>> transactional case will be flushed by regular commit flush.
>>
>> Indeed, that would be better. I am sending an updated patch.
>>
>> I'd like to backpatch that, would there be any objections to that?
>
> Yes, I object. This would completely cripple the performance of some uses
> of logical messages - a slowdown of several orders of magnitude. It's not
> clear to me that flushing would be the right behaviour if it weren't
> released, but it certainly doesn't seem right to make such a change in a
> minor release.

So are you objecting to adding the flush in general, or just to the backpatching part?

IMHO we either guarantee durability of non-transactional messages, in which case this would be a clear bug - and I'd say a fairly serious one. I'm curious what the workload that'd see order of magnitude of slowdown does with logical messages, but even if such workload exists, would it really be enough to fix any other durability bug?

Or perhaps we don't want to guarantee durability for such messages, in which case we don't need to fix it at all (even in master).

The docs are not very clear on what to expect, unfortunately. It says that non-transactional messages are "written immediately", which I could interpret in either way.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Test case for parameterized remote path in postgres_fdw
On Tue, Aug 15, 2023 at 7:50 PM Etsuro Fujita wrote: > So we should have modified the second one as well? Attached is a > small patch for that. Agreed, nice catch! +1 to the patch. Thanks Richard
Re: pg_logical_emit_message() misses a XLogFlush()
Hi, On 2023-08-16 06:58:56 +0900, Michael Paquier wrote: > On Tue, Aug 15, 2023 at 11:37:32AM +0200, Tomas Vondra wrote: > > Shouldn't the flush be done only for non-transactional messages? The > > transactional case will be flushed by regular commit flush. > > Indeed, that would be better. I am sending an updated patch. > > I'd like to backpatch that, would there be any objections to that? Yes, I object. This would completely cripple the performance of some uses of logical messages - a slowdown of several orders of magnitude. It's not clear to me that flushing would be the right behaviour if it weren't released, but it certainly doesn't seem right to make such a change in a minor release. Greetings, Andres Freund
Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication
Here is another review comment about patch v26-0001. The tablesync worker processes include the 'relid' as part of their name. See launcher.c:

snprintf(bgw.bgw_name, BGW_MAXLEN,
		 "logical replication tablesync worker for subscription %u sync %u",
		 subid, relid);

~~ And if that worker is "reused" by v26-0001 to process another relation there is a LOG:

if (reuse_worker)
	ereport(LOG,
			errmsg("logical replication table synchronization worker for subscription \"%s\" will be reused to sync table \"%s\" with relid %u.",
				   MySubscription->name,
				   get_rel_name(MyLogicalRepWorker->relid),
				   MyLogicalRepWorker->relid));

AFAICT, when being "reused" the original process name remains unchanged, and so I think it will continue to appear to any user looking at it that the tablesync process is just taking a very long time handling the original 'relid'. Won't the stale process name cause confusion to the users? -- Kind Regards, Peter Smith. Fujitsu Australia
Re: should frontend tools use syncfs() ?
On Tue, Aug 08, 2023 at 01:06:06PM -0700, Nathan Bossart wrote: > I ran a couple of tests for pg_upgrade with 100k tables (created using the > script here [0]) in order to demonstrate the potential benefits of this > patch. That shows some nice numbers with many files, indeed. How does the size of each file influence the difference in time?

+else
+{
+	while (errno = 0, (de = readdir(dir)) != NULL)
+	{
+		char		subpath[MAXPGPATH * 2];
+
+		if (strcmp(de->d_name, ".") == 0 ||
+			strcmp(de->d_name, "..") == 0)
+			continue;

It seems to me that there is no need to do that for in-place tablespaces. There are relative paths in pg_tblspc/, so they would be taken care of by the syncfs() done on the main data folder. This does not really check whether the mount points of the tablespaces are different, either. For example, imagine that you have two tablespaces on the same disk: syncfs() would then be called twice. Perhaps the current logic is OK anyway as long as the behavior is optional, but it should be explained in the docs, at least. I'm finding it a bit confusing that fsync_pgdata() is coded in such a way that it does a silent fallback to the cascading syncs through walkdir() when syncfs is specified but not available in the build. Perhaps an error is more helpful because one would then know that they are trying something that's not around?

+	pg_log_error("could not synchronize file system for file \"%s\": %m", path);
+	(void) close(fd);
+	exit(EXIT_FAILURE);

walkdir() reports errors and does not consider these fatal. Why the early exit()? I am a bit concerned about the amount of duplication this patch introduces in the docs. Perhaps this had better be moved into a new section of the docs to explain the tradeoffs, with each tool linking to it? Do we actually need --no-sync at all if --sync-method is around? We could have an extra --sync-method=none at option level with --no-sync still around mainly for compatibility? Or perhaps that's just over-designing things?
-- Michael signature.asc Description: PGP signature
Re: Avoid a potential unstable test case: xmlmap.sql
> > > Please look at the bug #18014: > > https://www.postgresql.org/message-id/flat/18014-28c81cb79d44295d%40postgresql.org > There were other aspects of the xmlmap test failure discussed in that > thread as well. > Thank you, Alexander, for the information; I will go through the discussion in that thread. -- Best Regards Andy Fan
Re: Logging of matching pg_hba.conf entry during auth skips trust auth, potential security issue
On Tue, Aug 15, 2023 at 03:39:10PM -0700, Jacob Champion wrote: > I'm not super comfortable with saying "connection authenticated" when > it explicitly hasn't been (nor with switching the meaning of a > non-NULL SYSTEM_USER from "definitely authenticated somehow" to "who > knows; parse it apart to see"). But adding a log entry ("connection > trusted:" or some such?) with the pointer to the HBA line that made it > happen seems like a useful audit helper to me. Yeah, thanks for confirming. That's also the impression I get after reading again the original thread and the idea of how this code path is handled in this commit. We could do something like a LOG "connection: method=%s user=%s (%s:%d)", without the "authenticated" and "identity" terms from set_authn_id(). Just to drop an idea. -- Michael signature.asc Description: PGP signature
Re: Would it be possible to backpatch Close support in libpq (28b5726) to PG16?
On 2023-Aug-16, Michael Paquier wrote: > > Personally I think backpatching 28b5726 has a really low risk of > > breaking anything. > > I agree about the low-risk argument, though. This is just new code. Here's a way to think about it. If 16.1 was already out, would we add libpq support for Close to 16.2? -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "La rebeldía es la virtud original del hombre" (Arthur Schopenhauer)
Re: Logging of matching pg_hba.conf entry during auth skips trust auth, potential security issue
On Tue, Aug 15, 2023 at 3:24 PM Michael Paquier wrote: > The first message from Jacob outlines the idea behind the handling of > trust. We could perhaps add one extra set_authn_id() for the uaTrust > case (not uaCert!) if that's more helpful. I'm not super comfortable with saying "connection authenticated" when it explicitly hasn't been (nor with switching the meaning of a non-NULL SYSTEM_USER from "definitely authenticated somehow" to "who knows; parse it apart to see"). But adding a log entry ("connection trusted:" or some such?) with the pointer to the HBA line that made it happen seems like a useful audit helper to me. --Jacob
Re: Would it be possible to backpatch Close support in libpq (28b5726) to PG16?
On Wed, Aug 16, 2023 at 12:14:21AM +0200, Jelte Fennema wrote: > 28b5726 allows sending Close messages from libpq, as opposed to > sending DEALLOCATE queries to deallocate prepared statements. Without > support for Close messages, libpq based clients won't be able to > deallocate prepared statements on PgBouncer, because PgBouncer does > not parse SQL queries and only looks at protocol level messages (i.e. > Close messages for deallocation). The RMT has the final word on anything related to the release, but we are discussing adding something new to a branch that has gone through two beta cycles with a GA targeted around the end of September ~ beginning of October based on the trends of the recent years. That's out of the picture, IMO. This comes once every year. > Personally I think backpatching 28b5726 has a really low risk of > breaking anything. I agree about the low-risk argument, though. This is just new code. -- Michael signature.asc Description: PGP signature
Re: Logging of matching pg_hba.conf entry during auth skips trust auth, potential security issue
On Tue, Aug 15, 2023 at 04:49:47PM -0500, Shaun Thomas wrote: > The switch statement that decodes port->hba->auth_method ends by > simply setting status = STATUS_OK; with no supplementary output since > it never calls set_authn_id. So in theory, a malicious user could add > a trust line to pg_hba.conf and have unlimited unlogged access to the > database. That's the same as giving access to your data folder. Updating pg_hba.conf is only the tip of the iceberg if one has write access to your data folder. > Unless you happen to notice that the "connection > authenticated" line has disappeared, it would look like normal > activity. "trust" is not really an authentication method because it does nothing, it just authorizes things to go through, so you cannot really say that it can have an authn ID (grep for "pedantic" around here): https://www.postgresql.org/message-id/c55788dd1773c521c862e8e0dddb367df51222be.ca...@vmware.com > Would it make sense to decouple the hba info from set_authn_id so that > it is always logged even when new auth methods get added in the > future? Or alternatively create a function call specifically for that > output so it can be produced from the trust case statement and > anywhere else that needs to tag the auth line. I personally would love > to see if someone got in through a trust line, ESPECIALLY if it isn't > supposed to be there. Like: > > 2023-08-15 13:26:03.159 PDT [692166] postgres@snip LOG: connection > authenticated: identity="postgres" method=trust > (/etc/postgresql/15/main/pg_hba.conf:1) You mean outside the switch/case in ClientAuthentication(). Some methods have an authn that is implementation-dependent, like ldap. I am not sure if switching the logic would lead to a gain, like calling set_authn_id() once per method vs. passing a string up to set in a single call of set_authn_id(). > I read through the discussion, and it doesn't seem like the security > aspect of simply hiding trust auths from the log was considered.
Since > this is a new capability, I suppose nothing is really different from > say Postgres 14 and below. Still, it never hurts to ask. The first message from Jacob outlines the idea behind the handling of trust. We could perhaps add one extra set_authn_id() for the uaTrust case (not uaCert!) if that's more helpful. -- Michael signature.asc Description: PGP signature
Would it be possible to backpatch Close support in libpq (28b5726) to PG16?
Hi, I know that this will probably get a staunch "No" as an answer, but... I'm still going to ask: Would it be possible to backport 28b5726 to the PG16 branch? Even though it's clearly a new feature? I'm working on named prepared statement support in PgBouncer: https://github.com/pgbouncer/pgbouncer/pull/845 That PR is pretty close to mergable (IMO) and my intention is to release a PgBouncer version with prepared statement support within a few months. 28b5726 allows sending Close messages from libpq, as opposed to sending DEALLOCATE queries to deallocate prepared statements. Without support for Close messages, libpq based clients won't be able to deallocate prepared statements on PgBouncer, because PgBouncer does not parse SQL queries and only looks at protocol level messages (i.e. Close messages for deallocation). Personally I think backpatching 28b5726 has a really low risk of breaking anything. And since PgBouncer is used a lot in the Postgres ecosystem, especially with libpq based clients, IMO it might be worth deviating from the rule of not backporting features after a STABLE branch has been cut. Otherwise all libpq based clients will have only limited support for prepared statements with PgBouncer until PG17 is released. Jelte
Re: pg_logical_emit_message() misses a XLogFlush()
On Tue, Aug 15, 2023 at 11:37:32AM +0200, Tomas Vondra wrote: > Shouldn't the flush be done only for non-transactional messages? The > transactional case will be flushed by regular commit flush. Indeed, that would be better. I am sending an updated patch. I'd like to backpatch that, would there be any objections to that? This may depend on how much logical solutions depend on this routine. -- Michael

diff --git a/src/backend/replication/logical/message.c b/src/backend/replication/logical/message.c
index c5de14afc6..cb3a806114 100644
--- a/src/backend/replication/logical/message.c
+++ b/src/backend/replication/logical/message.c
@@ -47,6 +47,7 @@ LogLogicalMessage(const char *prefix, const char *message, size_t size,
 				  bool transactional)
 {
 	xl_logical_message xlrec;
+	XLogRecPtr	lsn;
 
 	/*
 	 * Force xid to be allocated if we're emitting a transactional message.
@@ -71,7 +72,15 @@ LogLogicalMessage(const char *prefix, const char *message, size_t size,
 	/* allow origin filtering */
 	XLogSetRecordFlags(XLOG_INCLUDE_ORIGIN);
 
-	return XLogInsert(RM_LOGICALMSG_ID, XLOG_LOGICAL_MESSAGE);
+	lsn = XLogInsert(RM_LOGICALMSG_ID, XLOG_LOGICAL_MESSAGE);
+
+	/*
+	 * Make sure that the message hits disk before leaving if not emitting a
+	 * transactional message.
+	 */
+	if (!transactional)
+		XLogFlush(lsn);
+	return lsn;
 }
 
 /*
signature.asc Description: PGP signature
Logging of matching pg_hba.conf entry during auth skips trust auth, potential security issue
Hi everyone, This started as a conversation on Discord. Someone asked if Postgres logs which line in pg_hba.conf matched against a certain login attempt, and I said no. That's not quite right, as enabling log_connections includes a line like this: 2023-08-15 13:26:03.159 PDT [692166] postgres@snip LOG: connection authenticated: identity="postgres" method=md5 (/etc/postgresql/15/main/pg_hba.conf:107) But I wasn't getting that output. I finally gave up and looked at the code, where I found that this particular output is only generated by the set_authn_id function. So if that function is never called, there's no message saying which line from the pg_hba.conf file matched a particular login. The switch statement that decodes port->hba->auth_method ends by simply setting status = STATUS_OK; with no supplementary output since it never calls set_authn_id. So in theory, a malicious user could add a trust line to pg_hba.conf and have unlimited unlogged access to the database. Unless you happen to notice that the "connection authenticated" line has disappeared, it would look like normal activity. Would it make sense to decouple the hba info from set_authn_id so that it is always logged even when new auth methods get added in the future? Or alternatively create a function call specifically for that output so it can be produced from the trust case statement and anywhere else that needs to tag the auth line. I personally would love to see if someone got in through a trust line, ESPECIALLY if it isn't supposed to be there. Like: 2023-08-15 13:26:03.159 PDT [692166] postgres@snip LOG: connection authenticated: identity="postgres" method=trust (/etc/postgresql/15/main/pg_hba.conf:1) Perhaps I'm being too paranoid; it just seemed to be an odd omission.
Euler Taveira clued me into the initial patch which introduced the pg_hba.conf tattling: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=9afffcb833d3c5e59a328a2af674fac7e7334fc1 I read through the discussion, and it doesn't seem like the security aspect of simply hiding trust auths from the log was considered. Since this is a new capability, I suppose nothing is really different from say Postgres 14 and below. Still, it never hurts to ask. Cheers! -- Shaun Thomas High Availability Architect EDB www.enterprisedb.com
Re: Using defines for protocol characters
> On Wed, Aug 16, 2023 at 06:25:09AM +0900, Tatsuo Ishii wrote: >> Currently pqcomm.h needs c.h which is not problem for Pgpool-II. But >> what about other middleware? > > Why do you need to include directly c.h? There are definitions in > there that are not intended to be exposed. pqcomm.h has this:

#define UNIXSOCK_PATH(path, port, sockdir) \
	(AssertMacro(sockdir), \
	 AssertMacro(*(sockdir) != '\0'), \
	 snprintf(path, sizeof(path), "%s/.s.PGSQL.%d", \
			  (sockdir), (port)))

AssertMacro is defined in c.h. Best regards, -- Tatsuo Ishii SRA OSS LLC English: http://www.sraoss.co.jp/index_en/ Japanese: http://www.sraoss.co.jp
Re: Using defines for protocol characters
On 2023-Aug-16, Michael Paquier wrote: > On Wed, Aug 16, 2023 at 06:25:09AM +0900, Tatsuo Ishii wrote: > > Currently pqcomm.h needs c.h which is not problem for Pgpool-II. But > > what about other middleware? > > Why do you need to include directly c.h? There are definitions in > there that are not intended to be exposed. What this argument says is that these new defines should be in a separate file, not in pqcomm.h. IMO that makes sense, precisely because these defines should be usable by third parties. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Re: Using defines for protocol characters
On Wed, Aug 16, 2023 at 06:25:09AM +0900, Tatsuo Ishii wrote: > Currently pqcomm.h needs c.h which is not problem for Pgpool-II. But > what about other middleware? Why do you need to include directly c.h? There are definitions in there that are not intended to be exposed. -- Michael signature.asc Description: PGP signature
Re: Using defines for protocol characters
> On Tue, Aug 15, 2023 at 02:46:24PM +0900, Tatsuo Ishii wrote: >> Is it possible to put the new define stuff into protocol.h then let >> pqcomm.h include protocol.h? This makes Pgpool-II and other middle >> ware/drivers written in C easier to use the defines so that they only >> include protocol.h to use the defines. > > It is possible, of course, but are there any reasons such programs couldn't > include pqcomm.h? It looks relatively inexpensive to me. That being said, > I'm fine with the approach you describe if the folks in this thread agree. Currently pqcomm.h needs c.h, which is not a problem for Pgpool-II. But what about other middleware? Best regards, -- Tatsuo Ishii SRA OSS LLC English: http://www.sraoss.co.jp/index_en/ Japanese: http://www.sraoss.co.jp
Re: run pgindent on a regular basis / scripted manner
On Fri, Aug 11, 2023 at 01:59:40PM -0700, Peter Geoghegan wrote: > I'm starting to have doubts about this policy. There have now been > quite a few follow-up "fixes" to indentation issues that koel > complained about. None of these fixups have been included in > .git-blame-ignore-revs. If things continue like this then "git blame" > is bound to become much less usable over time. Should we add those? Patch attached. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com

>From f4f158e153f9027330ce841ca79b5650dfd24a0e Mon Sep 17 00:00:00 2001
From: Nathan Bossart
Date: Tue, 15 Aug 2023 13:24:18 -0700
Subject: [PATCH v1 1/1] Add a few recent commits to .git-blame-ignore-revs.

---
 .git-blame-ignore-revs | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/.git-blame-ignore-revs b/.git-blame-ignore-revs
index 00b548c7b0..2ff86c8d0d 100644
--- a/.git-blame-ignore-revs
+++ b/.git-blame-ignore-revs
@@ -14,6 +14,27 @@
 #
 # $ git log --pretty=format:"%H # %cd%n# %s" $PGINDENTGITHASH -1 --date=iso
 
+89be0b89ae60c63856fd26d82a104781540e2312 # 2023-08-15 17:45:00 +0900
+# Fix code indentation vioaltion introduced in commit 9e9931d2b.
+
+5dc456b7dda4f7d0d7735b066607c190c74d174a # 2023-08-11 20:43:34 +0900
+# Fix code indentation violations introduced by recent commit
+
+62e9af4c63fbd36fb9af8450fb44bece76d7766f # 2023-07-25 12:35:58 +0530
+# Fix code indentation vioaltion introduced in commit d38ad8e31d.
+
+4e465aac36ce9a9533c68dbdc83e67579880e628 # 2023-07-18 14:04:31 +0900
+# Fix indentation in twophase.c
+
+328f492d2565cfbe383f13a69425d751fd79415f # 2023-07-13 22:26:10 +0900
+# Fix code indentation violation in commit b6e1157e7d
+
+69a674a170058e63e8178aec8a36a673efce8801 # 2023-07-06 11:49:18 +0530
+# Fix code indentation vioaltion introduced in commit cc32ec24fd.
+
+a4cfeeca5a97f2b5969c31aa69ba775af95ee5a3 # 2023-07-03 12:47:49 +0200
+# Fix code indentation violations
+
 b334612b8aee9f9a34378982d8938b201dfad323 # 2023-06-20 09:50:43 -0400
 # Pre-beta2 mechanical code beautification.
-- 
2.25.1
Re: [PATCH] Add function to_oct
On Tue, Aug 15, 2023 at 01:53:25PM +0700, John Naylor wrote: > If we're not going to have a general SQL conversion function, here are some > comments on the current patch. I appreciate the review. > +static char *convert_to_base(uint64 value, int base); > > Not needed if the definition is above the callers. Done. > + * Workhorse for to_binary, to_oct, and to_hex. Note that base must be > either > + * 2, 8, or 16. > > Why wouldn't it work with any base <= 16? You're right. I changed this in v5. > - *ptr = '\0'; > + Assert(base == 2 || base == 8 || base == 16); > > + *ptr = '\0'; > > Spurious whitespace change? I think this might just be a weird artifact of the diff algorithm. > - char buf[32]; /* bigger than needed, but reasonable */ > + char *buf = palloc(sizeof(uint64) * BITS_PER_BYTE + 1); > > Why is this no longer allocated on the stack? Maybe needs a comment about > the size calculation. It really should be. IIRC I wanted to avoid passing a pre-allocated buffer to convert_to_base(), but I don't remember why. I fixed this in v5. > +static char * > +convert_to_base(uint64 value, int base) > > On my machine this now requires a function call and a DIV instruction, even > though the patch claims not to support anything but a power of two. It's > tiny enough to declare it inline so the compiler can specialize for each > call site. This was on my list of things to check before committing. I assumed that it would be automatically inlined, but given your analysis, I went ahead and added the inline keyword. My compiler took the hint and inlined the function, and it used SHR instead of DIV instructions. The machine code for to_hex32/64 is still a couple of instructions longer than before (probably because of the uint64 casts), but I don't know if we need to worry about micro-optimizing these functions any further. > +{ oid => '5101', descr => 'convert int4 number to binary', > > Needs to be over 8000. Done. 
-- Nathan Bossart Amazon Web Services: https://aws.amazon.com

>From 2ad83f8781a013ef5e5e5a62a422f3b73c6c5f2d Mon Sep 17 00:00:00 2001
From: Nathan Bossart
Date: Tue, 25 Jul 2023 16:09:01 -0700
Subject: [PATCH v5 1/1] add to_binary() and to_oct()

---
 doc/src/sgml/func.sgml                |  42 +++
 src/backend/utils/adt/varlena.c       | 103 +++---
 src/include/catalog/pg_proc.dat       |  12 +++
 src/test/regress/expected/strings.out |  26 ++-
 src/test/regress/sql/strings.sql      |   9 ++-
 5 files changed, 163 insertions(+), 29 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index be2f54c914..23b5ac7457 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -3737,6 +3737,48 @@ repeat('Pg', 4) PgPgPgPg
+
+ to_binary
+
+to_binary ( integer )
+text
+
+to_binary ( bigint )
+text
+
+Converts the number to its equivalent binary representation.
+
+to_binary(2147483647)
+111
+
+ to_oct
+
+to_oct ( integer )
+text
+
+to_oct ( bigint )
+text
+
+Converts the number to its equivalent octal representation.
+
+to_oct(2147483647)
+177
+

diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index b1ec5c32ce..b955e54611 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -4919,54 +4919,105 @@ array_to_text_internal(FunctionCallInfo fcinfo, ArrayType *v,
 	return result;
 }
 
-#define HEXBASE 16
 /*
- * Convert an int32 to a string containing a base 16 (hex) representation of
- * the number.
+ * We size the buffer for to_binary's longest possible return value, including
+ * the terminating '\0'.
  */
-Datum
-to_hex32(PG_FUNCTION_ARGS)
+#define CONVERT_TO_BASE_BUFSIZE (sizeof(uint64) * BITS_PER_BYTE + 1)
+
+/*
+ * Workhorse for to_binary, to_oct, and to_hex.  Note that base must be > 1 and
+ * <= 16.  buf must be at least CONVERT_TO_BASE_BUFSIZE bytes.
+ */
+static inline char *
+convert_to_base(uint64 value, int base, char *buf)
 {
-	uint32		value = (uint32) PG_GETARG_INT32(0);
-	char	   *ptr;
 	const char *digits = "0123456789abcdef";
-	char		buf[32];		/* bigger than needed, but reasonable */
+	char	   *ptr = buf + CONVERT_TO_BASE_BUFSIZE - 1;
 
-	ptr = buf + sizeof(buf) - 1;
-	*ptr = '\0';
+	Assert(base > 1);
+	Assert(base <= 16);
 
+	*ptr = '\0';
 	do
 	{
-		*--ptr = digits[value % HEXBASE];
-		value /= HEXBASE;
+		*--ptr = digits[value % base];
+		value /= base;
 	} while (ptr > buf && value);
 
-	PG_RETURN_TEXT_P(cstring_to_text(ptr));
+	return ptr;
+}
+
+/*
+ * Convert an integer to a string containing a base-2 (binary) representation
+ * of the number.
+
Re: initial pruning in parallel append
On Wed, Aug 9, 2023 at 8:57 AM Amit Langote wrote: > I checked enough to be sure that IsParallelWorker() is reliable at the > time of ExecutorStart() / ExecInitNode() in ParallelQueryMain() in a > worker. However, ParallelWorkerContext is not available at that > point. Here's the relevant part of ParallelQueryMain(): > > /* Start up the executor */ > queryDesc->plannedstmt->jitFlags = fpes->jit_flags; > ExecutorStart(queryDesc, fpes->eflags); > > /* Special executor initialization steps for parallel workers */ > queryDesc->planstate->state->es_query_dsa = area; > if (DsaPointerIsValid(fpes->param_exec)) > { > char *paramexec_space; > > paramexec_space = dsa_get_address(area, fpes->param_exec); > RestoreParamExecParams(paramexec_space, queryDesc->estate); > } > pwcxt.toc = toc; > pwcxt.seg = seg; > ExecParallelInitializeWorker(queryDesc->planstate, &pwcxt); > > BTW, we do also use IsParallelWorker() in ExecGetRangeTableRelation() > which also probably only runs during ExecInitNode(), same as > ExecInitPartitionPruning() that this patch adds it to. I don't know if that's a great idea, but I guess if we're already doing it, it doesn't hurt to expand the use a little bit. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Faster "SET search_path"
On Mon, Aug 7, 2023 at 7:24 PM Jeff Davis wrote: > I might just avoid guc.c entirely and directly set > namespace_search_path and baseSearchPathValid=false. The main thing we > lose there is the GUC stack (AtEOXact_GUC()), but there's already a > PG_TRY/PG_FINALLY block in fmgr_security_definer, so we can use that to > change it back safely. (I think? I need to convince myself that it's > safe to skip the work in guc.c, at least for the special case of > search_path, and that it won't be too fragile.) I suspect that dodging the GUC stack machinery is not a very good idea. The timing of when TRY/CATCH blocks are called is different from when subtransactions are aborted, and that distinction has messed me up more than once when trying to code various things. Specifically, changes that happen in a CATCH block happen before we actually reach Abort(Sub)Transaction, which sort of makes it sound like you'd be reverting back to the old value too early. The GUC stack is also pretty complicated because of the SET/SAVE and LOCAL/non-LOCAL stuff. I can't say there isn't a way to make something like what you're talking about here work, but I bet it will be difficult to get all of the corner cases right, and I suspect that trying to speed up the existing mechanism is a better plan than trying to go around it. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Extending SMgrRelation lifetimes
On 14/08/2023 05:41, Thomas Munro wrote: The new idea is to overload smgrrelease(it) so that it also clears the owner, which means that AtEOXact_SMgr() will eventually smgrclose(it), unless it is re-owned by a relation before then. That choice stems from the complete lack of information available via sinval in the case of an overflow. We must (1) close all descriptors because any file might have been unlinked, (2) keep all pointers valid and yet (3) not leak dropped smgr objects forever. In this patch, smgrreleaseall() achieves those goals. Makes sense. Some of the smgrclose() calls could perhaps still be smgrclose(). If you can guarantee that no-one is holding the SMgrRelation, it's still OK to call smgrclose(). There's little gain from closing earlier, though. Proof-of-concept patch attached. Are there holes in this scheme? Better ideas welcome. In terms of spelling, another option would be to change the behaviour of smgrclose() to work as described, ie it would call smgrrelease() and then also disown, so we don't have to change most of those callers, and then add a new function smgrdestroy() for the few places that truly need it. Or something like that. If you change smgrclose() to do what smgrrelease() does now, then it will apply automatically to extensions. If an extension is currently using smgropen()/smgrclose() correctly, this patch alone won't make it wrong, so it's not very critical for extensions to adopt the change. However, if after this we consider it OK to hold a pointer to SMgrRelation for longer, and start doing that in the backend, then extensions need to be adapted too. While studying this I noticed a minor thinko in smgrrelease() in 15+16, so here's a fix for that also. I haven't figured out a sequence that makes anything bad happen, but we should really invalidate smgr_targblock when a relfilenode is reused, since it might point past the end. This becomes more obvious once smgrrelease() is used for truncation, as proposed here. +1. 
You can move the smgr_targblock clearing out of the loop, though. -- Heikki Linnakangas Neon (https://neon.tech)
Re: Using defines for protocol characters
On Tue, Aug 15, 2023 at 02:46:24PM +0900, Tatsuo Ishii wrote: > Is it possible to put the new define stuff into protocol.h then let > pqcomm.h include protocol.h? This makes Pgpool-II and other middle > ware/drivers written in C easier to use the defines so that they only > include protocol.h to use the defines. It is possible, of course, but are there any reasons such programs couldn't include pqcomm.h? It looks relatively inexpensive to me. That being said, I'm fine with the approach you describe if the folks in this thread agree. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: [PATCH] Add function to_oct
On Tue, Aug 15, 2023 at 07:58:17AM +0200, Vik Fearing wrote: > On 8/15/23 06:11, Nathan Bossart wrote: >> If there are no objections, I'd like to commit this patch soon. > > I just took a look at this (and the rest of the thread). I am a little bit > disappointed that we don't have a generic base conversion function, but even > if we did I think these specialized functions would still be useful. > > No objection from me. Thanks for taking a look. I don't mean for this to preclude a generic base conversion function that would supersede the functions added by this patch. However, I didn't want to hold up $SUBJECT because of something that may or may not happen after lots of discussion. If it does happen within the v17 development cycle, we could probably just remove to_oct/binary. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: pgbench - adding pl/pgsql versions of tests
On Tue, Aug 15, 2023 at 09:46:59AM +0200, Fabien COELHO wrote: > I'm unclear about what variety of scripts that could be provided given the > tables made available with pgbench. ISTM that other scenarios would involve > both an initialization and associated scripts, and any proposal would be > barred because it would open the door to anything. Why's that? I'm not aware of any project policy that prohibits such enhancements to pgbench. It might take some effort to gather consensus on a proposal like this, but IMHO that doesn't mean we shouldn't try. If the prevailing wisdom is that we shouldn't add more built-in scripts because there is an existing way to provide custom ones, then it's not clear that we should proceed with $SUBJECT, anyway. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Replace known_assigned_xids_lck by memory barrier
On Tue, Aug 15, 2023 at 12:29:24PM +0200, Michail Nikolaev wrote: >> What sort of benefits do you see from this patch? It might be worthwhile >> in itself to remove spinlocks when possible, but IME it's much easier to >> justify such changes when there is a tangible benefit we can point to. > > Oh, it is not an easy question :) > > The answer, probably, looks like this: > 1) performance benefits of spin lock acquire removing in > KnownAssignedXidsGetOldestXmin and KnownAssignedXidsSearch > 2) it is closing 13-year-old tech debt > > But in reality, it is not easy to measure performance improvement > consistently for this change. Okay. Elsewhere, it seems like folks are fine with patches that reduce shared memory space via atomics or barriers even if there's no immediate benefit [0], so I think it's fine to proceed. >> Are the assignments in question guaranteed to be atomic? IIUC we assume >> that aligned 4-byte loads/stores are atomic, so we should be okay as long >> as we aren't handling anything larger. > > Yes, 4-byte assignments are atomic; locking is used to ensure memory > write ordering in this place. Yeah, it looks like both the values that are protected by known_assigned_xids_lck are integers, so this should be okay. One remaining question I have is whether it is okay if we see an updated value for one of the head/tail variables but not the other. It looks like the tail variable is only updated with ProcArrayLock held exclusively, which IIUC wouldn't prevent such mismatches even today, since we use a separate spinlock for reading them in some cases. [0] https://postgr.es/m/20230524214958.mt6f5xokpumvnrio%40awork3.anarazel.de -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Converting sql anywhere to postgres
Hi there, I am trying to convert a SQL Anywhere database to Postgres. Within SQL Anywhere a field can have a default value of 'last user'. This means that when you perform an update on a table, if the field is not explicitly set then the current user is used. So for instance, if I have a field called mod_user in a table and I do an update on the table without setting mod_user, then SQL Anywhere sets the field to the current user. I have tried to replicate this using a Postgres trigger in the BEFORE UPDATE. However, if I do not set the value then it automatically picks up the value that was already in the field. Is there a way to tell the difference between me setting the value to the same as the previous value and Postgres automatically picking it up? If the field myfield contains the word 'me', can I tell the difference between: UPDATE table1 SET field1='something', myfield='me' and UPDATE table1 SET field1='something'?
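For what it's worth, inside a row-level trigger there is no way to distinguish a column that was explicitly set to its old value from one that was simply carried over: both arrive in NEW looking identical. A common workaround is to adopt a convention on the client side, e.g. applications that want the 'last user' behavior leave the column NULL (or set it to DEFAULT) and let the trigger fill it in. A minimal sketch, using the table and column names from the question:

```sql
-- Sketch only: emulates SQL Anywhere's "last user" default under the
-- convention that callers who do NOT want to set mod_user set it to NULL.
CREATE OR REPLACE FUNCTION set_mod_user() RETURNS trigger AS $$
BEGIN
    IF NEW.mod_user IS NULL THEN
        NEW.mod_user := current_user;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER table1_set_mod_user
    BEFORE INSERT OR UPDATE ON table1
    FOR EACH ROW
    EXECUTE FUNCTION set_mod_user();
```

With this convention, `UPDATE table1 SET field1='something', mod_user=NULL` gets the current user, while `UPDATE table1 SET field1='something', mod_user='me'` keeps the explicit value. An UPDATE that omits mod_user entirely keeps the old value, which is exactly the case a trigger cannot detect on its own.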
Re: Avoid a potential unstable test case: xmlmap.sql
Hi Andy, 15.08.2023 14:09, Andy Fan wrote: Hi: In the test case of xmlmap.sql, we have the query below under schema_to_xml. Please look at the bug #18014: https://www.postgresql.org/message-id/flat/18014-28c81cb79d44295d%40postgresql.org There were other aspects of the xmlmap test failure discussed in that thread as well. Best regards, Alexander
Re: BUG #17946: LC_MONETARY & DO LANGUAGE plperl - BUG
On 01/08/2023 16:48, Joe Conway wrote: Any further comments on the posted patch[1]? I would like to apply/push this prior to the beta and minor releases next week. I'm not sure about the placement of the uselocale() calls. In plperl_spi_exec(), for example, I think we should switch to the global locale right after the check_spi_usage_allowed() call. Otherwise, if an error happens in BeginInternalSubTransaction() or in pg_verifymbstr(), the error would be processed with the perl locale. Maybe that's harmless, error processing hardly cares about LC_MONETARY, but it seems wrong in principle. Hmm, come to think of it, if BeginInternalSubTransaction() throws an error, we just jump out of the perl interpreter? That doesn't seem cool. But that's not new with this patch. If I'm reading correctly, compile_plperl_function() calls select_perl_context(), which calls plperl_trusted_init(), which calls uselocale(). So it leaves the locale set to the perl locale. Who sets it back? How about adding a small wrapper around eval_pl() that sets and restores the locale, just when we enter the interpreter? It's easier to see that we are doing the calls in the right places if we make them as close as possible to entering/exiting the interpreter. Are there other functions in addition to eval_pl() that need to be called with the perl locale? /* * plperl_xact_callback --- cleanup at main-transaction end. */ static void plperl_xact_callback(XactEvent event, void *arg) { /* ensure global locale is the current locale */ if (uselocale((locale_t) 0) != LC_GLOBAL_LOCALE) perl_locale_obj = uselocale(LC_GLOBAL_LOCALE); } So the assumption is that if the current locale is not LC_GLOBAL_LOCALE, then it was the perl locale. Seems true today, but this could cause confusion if anything else calls uselocale(). In particular, if another PL implementation copies this, and you use plperl and the other PL at the same time, they would get mixed up. 
I think we need another "bool perl_locale_obj_in_use" variable to track explicitly whether the perl locale is currently active. If we are careful to put the uselocale() calls in the right places so that we never ereport() while in perl locale, this callback isn't needed. Maybe it's still a good idea, though, to be extra sure that things get reset to a sane state if something unexpected happens. If multiple interpreters are used, is the single perl_locale_obj variable still enough? Each interpreter can have their own locale I believe. PS. please pgindent -- Heikki Linnakangas Neon (https://neon.tech)
Re: postgres_fdw: wrong results with self join + enable_nestloop off
Hi Etsuro, all The commit[1] seems to break some queries in Citus[2], an extension that relies on set_join_pathlist_hook. Although the comment says "/* Finally, give extensions a chance to manipulate the path list. */", we use it to extract lots of information about the joins and do the planning based on that information. Now, for some joins where consider_join_pushdown=false, we cannot get the information that we used to get, which prevents doing distributed planning for certain queries. We wonder if it is possible to allow extensions to access the join info under all circumstances, as it used to be? Basically, removing the additional check: diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c index 03b3185984..080e76cbe9 100644 --- a/src/backend/optimizer/path/joinpath.c +++ b/src/backend/optimizer/path/joinpath.c @@ -349,8 +349,7 @@ add_paths_to_joinrel(PlannerInfo *root, /* * 6. Finally, give extensions a chance to manipulate the path list. */ - if (set_join_pathlist_hook && - consider_join_pushdown) + if (set_join_pathlist_hook) set_join_pathlist_hook(root, joinrel, outerrel, innerrel, jointype, &extra); } Thanks, Onder [1]: https://github.com/postgres/postgres/commit/b0e390e6d1d68b92e9983840941f8f6d9e083fe0 [2]: https://github.com/citusdata/citus/issues/7119 Etsuro Fujita wrote on Tue, 15 Aug 2023 at 11:05: > On Tue, Aug 8, 2023 at 6:30 PM Richard Guo wrote: > > On Tue, Aug 8, 2023 at 4:40 PM Etsuro Fujita > wrote: > >> I modified the code a bit further to use an if-test to avoid a useless > >> function call, and added/tweaked comments and docs further. Attached > >> is a new version of the patch. I am planning to commit this, if there > >> are no objections. > > > +1 to the v4 patch. It looks good to me. > > Pushed after some copy-and-paste editing of comments/documents. > > Thanks! > > Best regards, > Etsuro Fujita
Re: [PoC] Improve dead tuple storage for lazy vacuum
On Tue, Aug 15, 2023 at 9:34 AM Masahiko Sawada wrote: > BTW cfbot reported that some regression tests failed due to OOM. I've > attached the patch to fix it. Seems worth doing now rather than later, so added this and squashed most of the rest together. I wonder if that test uses too much memory in general. Maybe using the full uint64 is too much. > On Mon, Aug 14, 2023 at 8:05 PM John Naylor > wrote: > > This is possibly faster than v38-0010, but looking like not worth the complexity, assuming the other way avoids the bug going forward. > > I prefer 0010 but is it worth testing with other compilers such as clang? Okay, keeping 0010 with a comment, and leaving out 0011 for now. Clang is aggressive about unrolling loops, so may be worth looking globally at some point. > > v38-0012-Use-branch-free-coding-to-skip-new-element-index.patch > > Within noise level of v38-0011, but it's small and simple, so I like it, at least for small arrays. > > Agreed. Keeping 0012 and not 0013. -- John Naylor EDB: http://www.enterprisedb.com v39-ART.tar.gz Description: application/gzip
Test case for parameterized remote path in postgres_fdw
Hi, While working on the join pushdown issue, I noticed this bit in commit e4106b252: --- parameterized remote path +-- parameterized remote path for foreign table EXPLAIN (VERBOSE, COSTS false) - SELECT * FROM ft2 a, ft2 b WHERE a.c1 = 47 AND b.c1 = a.c2; + SELECT * FROM "S 1"."T 1" a, ft2 b WHERE a."C 1" = 47 AND b.c1 = a.c2; SELECT * FROM ft2 a, ft2 b WHERE a.c1 = 47 AND b.c1 = a.c2; + The first statement was modified to test the intended behavior, but the second one was not. The second one as-is performs a foreign join: EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft2 a, ft2 b WHERE a.c1 = 47 AND b.c1 = a.c2; QUERY PLAN --- Foreign Scan Output: a.c1, a.c2, a.c3, a.c4, a.c5, a.c6, a.c7, a.c8, b.c1, b.c2, b.c3, b.c4, b.c5, b.c6, b.c7, b.c8 Relations: (public.ft2 a) INNER JOIN (public.ft2 b) Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8, r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8 FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1.c2 = r2."C 1")) AND ((r1."C 1" = 47 (4 rows) So we should have modified the second one as well? Attached is a small patch for that. Best regards, Etsuro Fujita postgres-fdw-parameterized-path-test-case.patch Description: Binary data
Re: Avoid a potential unstable test case: xmlmap.sql
I overlooked the fact that even in the bitmap index scan loose mode, the recheck is still executed before the qual, so bitmap index scan is OK in this case. Sort Output: oid, relname Sort Key: pg_class.relname -> Bitmap Heap Scan on pg_catalog.pg_class Output: oid, relname Recheck Cond: (pg_class.relnamespace = '28601'::oid) Filter: (has_table_privilege(pg_class.oid, 'SELECT'::text) AND (pg_class.relkind = ANY ('{r,m,v}'::"char"[]))) -> Bitmap Index Scan on pg_class_relname_nsp_index Index Cond: (pg_class.relnamespace = '28601'::oid) v2 attached. -- Best Regards Andy Fan v2-0001-Avoid-a-potential-unstable-testcase.patch Description: Binary data
Avoid a potential unstable test case: xmlmap.sql
Hi: In the test case of xmlmap.sql, we have the query below under schema_to_xml. explain (costs off, verbose) SELECT oid FROM pg_catalog.pg_class WHERE relnamespace = 28601 AND relkind IN ('r','m','v') AND pg_catalog.has_table_privilege (oid, 'SELECT') ORDER BY relname; If the query is using a SeqScan, the execution order of the quals is: has_table_privilege(pg_class.oid, 'SELECT'::text) AND (pg_class.relnamespace = '28601'::oid) AND (pg_class.relkind = ANY ('{r,m,v}'::"char"[])) based on the current cost settings and algorithm. With this plan, has_table_privilege(pg_class.oid, 'SELECT'::text) may be executed against all the relations (not just the given namespace), so if a tuple in pg_class is scanned and the relation is dropped before has_table_privilege is called, then we will get the error: ERROR: relation with OID xxx does not exist To overcome this, we disable seqscan; then only an index scan on relnamespace is possible, so relnamespace = '28601'::oid is filtered first, before calling has_table_privilege. And in this test case we are sure that relations belonging to the current namespace will never be dropped, so no error is possible. Here are the plans for reference. Seq Scan: Sort Output: oid, relname Sort Key: pg_class.relname -> Seq Scan on pg_catalog.pg_class Output: oid, relname Filter: (has_table_privilege(pg_class.oid, 'SELECT'::text) AND (pg_class.relnamespace = '28601'::oid) AND (pg_class.relkind = ANY ('{r,m,v}'::"char"[]))) With enable_seqscan set to off: QUERY PLAN -- Index Scan using pg_class_relname_nsp_index on pg_catalog.pg_class Output: oid, relname Index Cond: (pg_class.relnamespace = '28601'::oid) Filter: (has_table_privilege(pg_class.oid, 'SELECT'::text) AND (pg_class.relkind = ANY ('{r,m,v}'::"char"[]))) Patch is attached. -- Best Regards Andy Fan v1-0001-Avoid-a-potential-unstable-testcase.patch Description: Binary data
Re: Replace known_assigned_xids_lck by memory barrier
Hello, Nathan. > What sort of benefits do you see from this patch? It might be worthwhile > in itself to remove spinlocks when possible, but IME it's much easier to > justify such changes when there is a tangible benefit we can point to. Oh, it is not an easy question :) The answer, probably, looks like this: 1) the performance benefit of removing the spinlock acquisition in KnownAssignedXidsGetOldestXmin and KnownAssignedXidsSearch 2) it is closing 13-year-old tech debt But in reality, it is not easy to measure performance improvement consistently for this change. > Are the assignments in question guaranteed to be atomic? IIUC we assume > that aligned 4-byte loads/stores are atomic, so we should be okay as long > as we aren't handling anything larger. Yes, 4-byte assignments are atomic; locking is used to ensure memory write ordering in this place. > This use of pg_write_barrier() looks correct to me, but don't we need > corresponding read barriers wherever we obtain the pointers? FWIW I tend > to review src/backend/storage/lmgr/README.barrier in its entirety whenever > I deal with this stuff. Oh, yeah, you're right! [1] I'll prepare an updated version of the patch soon. I don't know why I was assuming pg_write_barrier is enough (⊙_⊙') [1]: https://github.com/postgres/postgres/blob/master/src/backend/storage/lmgr/README.barrier#L125
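For readers following along, the pairing rule from README.barrier that is being discussed can be illustrated with a standalone C11 sketch (this is not the actual KnownAssignedXids code): the release store on the head variable plays the role of pg_write_barrier(), and the matching acquire load on the reader side plays the role of the read barrier the first patch version was missing.

```c
#include <stdatomic.h>

/*
 * Standalone illustration of write/read barrier pairing using C11
 * atomics.  A writer fills a slot and then publishes it by advancing
 * `head`; a reader that observes the new head must also observe the
 * slot contents.
 */
static int xids[8];
static atomic_int head;                 /* number of published entries */

static void
writer_add(int xid)
{
    int h = atomic_load_explicit(&head, memory_order_relaxed);

    xids[h] = xid;                      /* fill the slot first ... */
    /* ... then publish: release orders the slot write before head */
    atomic_store_explicit(&head, h + 1, memory_order_release);
}

static int
reader_latest(void)
{
    /* acquire pairs with the writer's release, making the slot visible */
    int h = atomic_load_explicit(&head, memory_order_acquire);

    return (h > 0) ? xids[h - 1] : -1;
}
```

Without the acquire on the reader side, another CPU could in principle observe the incremented head but a stale slot, which is exactly the mismatch the spinlock (and the proposed barrier pair) prevents.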
Re: cataloguing NOT NULL constraints
On 2023-Aug-15, Dean Rasheed wrote: > I think perhaps for ALTER TABLE INHERIT, it should check that the > child has a NOT NULL constraint, and error out if not. That's the > current behaviour, and also matches other constraints types (e.g., > CHECK constraints). Yeah, I reached the same conclusion yesterday while trying it out, so that's what I implemented. I'll post later today. > More generally though, I'm worried that this is starting to get very > complicated. I wonder if there might be a different, simpler approach. > One vague idea is to have a new attribute on the column that counts > the number of constraints (local and inherited PK and NOT NULL > constraints) that make the column not null. Hmm. I grant that this is different, but I don't see that it is simpler. > Something else I noticed when reading the SQL standard is that a > user-defined CHECK (col IS NOT NULL) constraint should be recognised > by the system as also making the column not null (setting its > "nullability characteristic" to "known not nullable"). I agree with this view actually, but I've refrained from implementing it(*) because our SQL-standards people have advised against it. Insider knowledge? I don't know. I think this is a comparatively smaller consideration though, and we can adjust for it afterwards. (*) Rather: at some point I removed the implementation of that from the patch. > I'm also wondering whether creating a pg_constraint entry for *every* > not-nullable column is actually going too far. If we were to > distinguish between "defined as NOT NULL" and being not null as a > result of one or more constraints, in the way that the standard seems > to suggest, perhaps the former (likely to be much more common) could > simply be a new attribute stored on the column. I think we actually > only need to create pg_constraint entries if a constraint name or any > additional constraint properties such as NOT VALID are specified. 
That > would lead to far fewer new constraints, less catalog bloat, and less > noise in the \d output. There is a problem if we do this, though, which is that we cannot use the constraints for the things that we want them for -- for example, remove_useless_groupby_columns() would like to use unique constraints, not just primary keys; but it depends on the NOT NULL rows being there for invalidation reasons (namely: if the NOT NULL constraint is dropped, we need to be able to replan. Without catalog rows, we don't have a mechanism to let that happen). If we don't add all those redundant catalog rows, then this is all for naught. -- Álvaro HerreraBreisgau, Deutschland — https://www.EnterpriseDB.com/ Bob [Floyd] used to say that he was planning to get a Ph.D. by the "green stamp method," namely by saving envelopes addressed to him as 'Dr. Floyd'. After collecting 500 such letters, he mused, a university somewhere in Arizona would probably grant him a degree. (Don Knuth)
Re: cataloguing NOT NULL constraints
On Fri, 11 Aug 2023 at 14:54, Alvaro Herrera wrote: > > Right, in the end I got around to that point of view. I abandoned the > idea of adding these dependency links, and I'm back at relying on the > coninhcount/conislocal markers. But there were a couple of bugs in the > accounting for that, so I've fixed some of those, but it's not yet > complete: > > - ALTER TABLE parent ADD PRIMARY KEY > needs to create NOT NULL constraints in children. I added this, but > I'm not yet sure it works correctly (for example, if a child already > has a NOT NULL constraint, we need to bump its inhcount, but we > don't.) > - ALTER TABLE parent ADD PRIMARY KEY USING index > Not sure if this is just as above or needs separate handling > - ALTER TABLE DROP PRIMARY KEY > needs to decrement inhcount or drop the constraint if there are no > other sources for that constraint to exist. I've adjusted the drop > constraint code to do this. > - ALTER TABLE INHERIT > needs to create a constraint on the new child, if parent has PK. Not > implemented > - ALTER TABLE NO INHERIT > needs to delink any constraints (decrement inhcount, possibly drop > the constraint). > I think perhaps for ALTER TABLE INHERIT, it should check that the child has a NOT NULL constraint, and error out if not. That's the current behaviour, and also matches other constraints types (e.g., CHECK constraints). More generally though, I'm worried that this is starting to get very complicated. I wonder if there might be a different, simpler approach. One vague idea is to have a new attribute on the column that counts the number of constraints (local and inherited PK and NOT NULL constraints) that make the column not null. Something else I noticed when reading the SQL standard is that a user-defined CHECK (col IS NOT NULL) constraint should be recognised by the system as also making the column not null (setting its "nullability characteristic" to "known not nullable"). 
I think that's more than just an artefact of how they say NOT NULL constraints should be implemented, because the effect of such a CHECK constraint should be exposed in the "columns" view of the information schema -- the value of "is_nullable" should be "NO" if the column is "known not nullable". In this sense, the standard does allow multiple not null constraints on a column, independently of whether the column is "defined as NOT NULL". My understanding of the standard is that ALTER COLUMN ... SET/DROP NOT NULL change whether or not the column is "defined as NOT NULL", and manage a single system-generated constraint, but there may be any number of other user-defined constraints that also make the column "known not nullable", and they need to be tracked in some way. I'm also wondering whether creating a pg_constraint entry for *every* not-nullable column is actually going too far. If we were to distinguish between "defined as NOT NULL" and being not null as a result of one or more constraints, in the way that the standard seems to suggest, perhaps the former (likely to be much more common) could simply be a new attribute stored on the column. I think we actually only need to create pg_constraint entries if a constraint name or any additional constraint properties such as NOT VALID are specified. That would lead to far fewer new constraints, less catalog bloat, and less noise in the \d output. Regards, Dean
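The information_schema point can be made concrete with a quick sketch (the table is made up). As things stand, only a column defined as NOT NULL is reported as not nullable; an equivalent CHECK constraint is not recognized:

```sql
CREATE TABLE t (
    a int NOT NULL,
    b int CHECK (b IS NOT NULL)
);

SELECT column_name, is_nullable
FROM information_schema.columns
WHERE table_name = 't'
ORDER BY column_name;

-- Currently this reports is_nullable = 'NO' for a but 'YES' for b,
-- whereas under the standard's "known not nullable" reading both
-- should arguably report 'NO'.
```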
Re: pgbench: allow to exit immediately when any client is aborted
About pgbench exit on abort v4: Patch applies cleanly, compiles, local make check ok, doc looks ok. This looks ok to me. -- Fabien.
Re: pg_logical_emit_message() misses a XLogFlush()
On 8/15/23 08:38, Michael Paquier wrote: > Hi all, > > While playing with pg_logical_emit_message() and WAL replay, I have > noticed that LogLogicalMessage() inserts a record but forgets to make > sure that the record has been flushed. So, for example, if the system > crashes the message inserted can get lost. > > I was writing some TAP tests for it for the sake of a bug, and I have > found this the current behavior annoying because one cannot really > rely on it when emulating crashes. > > This has been introduced in 3fe3511 (from 2016), and there is no > mention of that on the original thread that led to this commit: > https://www.postgresql.org/message-id/flat/5685F999.6010202%402ndquadrant.com > > This could be an issue for anybody using LogLogicalMessage() out of > core, as well, because it would mean some records lost. So, perhaps > this should be treated as a bug, sufficient for a backpatch? > Shouldn't the flush be done only for non-transactional messages? The transactional case will be flushed by regular commit flush. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
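For reference, the distinction being drawn maps directly onto the function's first argument at the SQL level (a sketch; the prefix and payload strings are arbitrary):

```sql
-- Non-transactional: the record is inserted immediately, outside any
-- commit, so nothing else forces its WAL to disk; this is the case
-- that can be lost on a crash without an explicit flush.
SELECT pg_logical_emit_message(false, 'my-prefix', 'could be lost on crash');

-- Transactional: the message is decoded as part of the transaction,
-- and the commit's own WAL flush covers it.
BEGIN;
SELECT pg_logical_emit_message(true, 'my-prefix', 'flushed at commit');
COMMIT;
```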
Re: LLVM 16 (opaque pointers)
[...] Further changes are already needed for their "main" branch (LLVM 17-to-be), so this won't quite be enough to shut seawasp up. For information, the physical server which was hosting my 2 bf animals (seawasp and moonjelly) has given up rebooting after a power cut a few weeks/months ago, and I have not setup a replacement (yet). -- Fabien.
Generating memory trace from Postgres backend process
Hi, Has anyone tried generating a dynamic memory trace of a backend Postgres process while it's running a query? I want to characterize the memory access pattern of the Postgres database engine when it's running any given query. The usual way to do this would be to attach a dynamic instrumentation tool like DynamoRIO or Intel Pin to the backend process running the query (?). How could I find the exact backend process to attach to? Also, if there's any prior experience with generating such memory traces, it would be helpful to know what tools were used. best regards, Muneeb Khan
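In case it helps with the "which process" question: each client connection is served by exactly one backend process, and its PID is visible from SQL (a sketch using the standard system function and view):

```sql
-- From the session that will run the query to be traced:
SELECT pg_backend_pid();

-- Or, from another session, list client backends and what they are running:
SELECT pid, usename, state, query
FROM pg_stat_activity
WHERE backend_type = 'client backend';
```

With that PID you can point an attach-mode instrumentation tool at the backend, e.g. something along the lines of `pin -pid <pid> -t mytool.so` for Intel Pin; treat the exact flags as an approximation, since they depend on the Pin version.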
Re: remaining sql/json patches
Hi. In v11, json_query: +The returned data_type has the same semantics +as for constructor functions like json_objectagg; +the default returned type is text. +The ON EMPTY clause specifies the behavior if the +path_expression yields no value at all; the +default when ON ERROR is not specified is to return a +null value. Shouldn't the default returned type be jsonb? Also, in the second-to-last quoted line, shouldn't it say ON EMPTY rather than ON ERROR? Other than that, the doc looks good.
Re: New WAL record to detect the checkpoint redo location
On Fri, Jul 14, 2023 at 8:46 PM Andres Freund wrote: > > Hi, > > As I think I mentioned before, I like this idea. However, I don't like the > implementation too much. Thanks for looking into it. > This might work right now, but doesn't really seem maintainable. Nor do I like > adding branches into this code a whole lot. Okay, Now I have moved the XlogInsert for the special record outside the WalInsertLock so we don't need this special handling here. > > @@ -6597,6 +6612,32 @@ CreateCheckPoint(int flags) > > I think the commentary above this function would need a fair bit of > revising... > > >*/ > > RedoRecPtr = XLogCtl->Insert.RedoRecPtr = checkPoint.redo; > > > > + /* > > + * Insert a special purpose CHECKPOINT_REDO record as the first > > record at > > + * checkpoint redo lsn. Although we have the checkpoint record that > > + * contains the exact redo lsn, there might have been some other > > records > > + * those got inserted between the redo lsn and the actual checkpoint > > + * record. So when processing the wal, we cannot rely on the > > checkpoint > > + * record if we want to stop at the checkpoint-redo LSN. > > + * > > + * This special record, however, is not required when we doing a > > shutdown > > + * checkpoint, as there will be no concurrent wal insertions during > > that > > + * time. So, the shutdown checkpoint LSN will be the same as > > + * checkpoint-redo LSN. > > + * > > + * This record is guaranteed to be the first record at checkpoint > > redo lsn > > + * because we are inserting this while holding the exclusive wal > > insertion > > + * lock. > > + */ > > + if (!shutdown) > > + { > > + int dummy = 0; > > + > > + XLogBeginInsert(); > > + XLogRegisterData((char *) &dummy, sizeof(dummy)); > > + recptr = XLogInsert(RM_XLOG_ID, XLOG_CHECKPOINT_REDO); > > + } > > It seems to me that we should be able to do better than this. > > I suspect we might be able to get rid of the need for exclusive inserts > here. 
If we get rid of that, we could determine the redo location based on the > LSN determined by the XLogInsert(). Yeah, good idea. Actually, we can do this insert outside of the exclusive insert lock and set the LSN of this insert as the checkpoint redo location. So now we do not need to compute the checkpoint redo based on the current insertion point; we can directly use the LSN of this record. I have modified this, and I have also moved some other code that is not required to be inside the WAL insertion lock. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com v2-0001-New-WAL-record-for-checkpoint-redo-location.patch Description: Binary data
Re: postgres_fdw: wrong results with self join + enable_nestloop off
On Tue, Aug 8, 2023 at 6:30 PM Richard Guo wrote: > On Tue, Aug 8, 2023 at 4:40 PM Etsuro Fujita wrote: >> I modified the code a bit further to use an if-test to avoid a useless >> function call, and added/tweaked comments and docs further. Attached >> is a new version of the patch. I am planning to commit this, if there >> are no objections. > +1 to the v4 patch. It looks good to me. Pushed after some copy-and-paste editing of comments/documents. Thanks! Best regards, Etsuro Fujita
Re: pgbench - adding pl/pgsql versions of tests
Hello Nathan, 1. so I don't have to create the script and function manually each time I want to test mainly the database (instead of the client-database system) 2. so that new users of PostgreSQL can easily see how much better OLTP workloads perform when packaged up as a server-side function I'm not sure we should add micro-optimized versions of the existing scripts to pgbench. Your point about demonstrating the benefits of server-side functions seems reasonable, but it also feels a bit like artificially improving pgbench numbers. I think I'd rather see some more variety in the built-in scripts so that folks can more easily test a wider range of common workloads. Perhaps this could include a test that is focused on server-side functions. ISTM that your argument suggests keeping the tpcb-like PL/pgSQL version. It is the more beneficial one anyway, as it merges 4-5 commands into one call, so it demonstrates the effect of investing in this kind of approach. I'm unclear about what variety of scripts could be provided given the tables made available with pgbench. ISTM that other scenarios would involve both an initialization and associated scripts, and any proposal would be barred because it would open the door to anything. In any case, it looks like there is unaddressed feedback for this patch, so I'm marking it as "waiting on author." Indeed. -- Fabien.
Re: Extract numeric filed in JSONB more effectively
On Tue, 15 Aug 2023 at 9:05, Andy Fan wrote: > > >> a) effectiveness. The ending performance should be similar to your >> current patch, but without the necessity of using the planner support API. >> > > So the cost is we need to create a new & different framework. > yes, it can be less work and code than, for example, the introduction of "anycompatible". > >> > b) because you can write only var := j->'f', and plpgsql forces cast >> function execution, but not via planner. >> > > var a := 1 needs to go through the planner, IIUC, same with j->'f'. > I was wrong: the planner work is full, but the executor work is reduced. > > c) nothing else. It should not require modifying cast function >> definitions >> > > If you look at how the planner support function works, that is > pretty simple: just modify the prosupport attribute. I'm not sure > this should be called an issue, or that avoiding it can be described > as a benefit. > > I don't think the current case is as bad as the other ones, like > users needing to modify their queries or the type-safety system > being broken. So personally I'm not willing to create something > new & heavy. However, I'm open to seeing what others say. > ok regards Pavel > > -- > Best Regards > Andy Fan >
Re: Extract numeric filed in JSONB more effectively
> > a) effectiveness. The ending performance should be similar to your > current patch, but without the necessity of using the planner support API. > So the cost is we need to create a new & different framework. > > b) because you can write only var := j->'f', and plpgsql forces cast > function execution, but not via planner. > var a := 1 needs to go through the planner, IIUC, same with j->'f'. c) nothing else. It should not require modifying cast function > definitions > If you look at how the planner support function works, that is pretty simple: just modify the prosupport attribute. I'm not sure this should be called an issue, or that avoiding it can be described as a benefit. I don't think the current case is as bad as the other ones, like users needing to modify their queries or the type-safety system being broken. So personally I'm not willing to create something new & heavy. However, I'm open to seeing what others say. -- Best Regards Andy Fan
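For readers unfamiliar with the mechanism being referenced: attaching a planner support function is a small amount of DDL once the support function itself exists. A hedged sketch (both function names below are hypothetical; the support function must be written in C and implement the SupportRequest* protocol):

```sql
-- Attach a support function at creation time ...
CREATE FUNCTION my_jsonb_to_numeric(jsonb) RETURNS numeric
    AS 'MODULE_PATHNAME', 'my_jsonb_to_numeric'
    LANGUAGE C STRICT
    SUPPORT my_support_fn;

-- ... or attach one to an existing function afterwards:
ALTER FUNCTION my_jsonb_to_numeric(jsonb) SUPPORT my_support_fn;
```

This is what "just modify the prosupport attribute" amounts to in practice: the planner then consults my_support_fn when it plans calls to the target function.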